Re: [Patch, mips] MIPS performance patch for PR 56552
Steve Ellcey sell...@mips.com writes:

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 0cda169..49c2bf7 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -6721,7 +6721,7 @@
 (define_insn "*mov<GPR:mode>_on_<MOVECC:mode>"
   [(set (match_operand:GPR 0 "register_operand" "=d,d")
	 (if_then_else:GPR
-	  (match_operator:MOVECC 4 "equality_operator"
+	  (match_operator 4 "equality_operator"
	   [(match_operand:MOVECC 1 "register_operand" "<MOVECC:reg>,<MOVECC:reg>")
	    (const_int 0)])
	  (match_operand:GPR 2 "reg_or_0_operand" "dJ,0")

Sorry, I didn't notice this before, but we should remove "_on_<MOVECC:mode>"
from the name of the insn.  Same for the FP version.  OK with that change,
thanks.  It'd be good to add a testcase too.  E.g. we could take your
example in the PR and check for the redundant 0x.

Richard
Re: [PATCH] MIPS: MIPS32r2 FP reciprocal instruction set support
Maciej W. Rozycki ma...@codesourcery.com writes:

  Note that these instructions were allowed in either FPU mode in the
  MIPS IV ISA, but for forward ISA compatibility this change does not
  enable them for -march=mips4 in the 32-bit FPR mode because the
  original revision of the MIPS64 ISA did not support it.

Yeah, sounds good.

Index: gcc-fsf-trunk-quilt/gcc/config/mips/mips.h
===
--- gcc-fsf-trunk-quilt.orig/gcc/config/mips/mips.h	2013-11-12 15:31:46.758734464 +0000
+++ gcc-fsf-trunk-quilt/gcc/config/mips/mips.h	2013-11-12 15:33:22.277646941 +0000
@@ -921,6 +921,21 @@ struct mips_cpu_info {
    'c = -((a * b) [+-] c)'.  */
 #define ISA_HAS_NMADD3_NMSUB3	TARGET_LOONGSON_2EF
 
+/* ISA has floating-point RECIP.fmt and RSQRT.fmt instructions.  The
+   MIPS64 rev. 1 ISA says that RECIP.D and RSQRT.D are unpredictable when
+   doubles are stored in pairs of FPRs, so for safety's sake, we apply
+   this restriction to the MIPS IV ISA too.  */
+#define ISA_HAS_FP_RECIP_RSQRT(MODE)		\
+  (((ISA_HAS_FP4				\
+     || (ISA_MIPS32R2 && !TARGET_MIPS16))	\
+    && ((MODE) == SFmode			\
+	|| ((TARGET_FLOAT64			\
+	     || !(ISA_MIPS4			\
+		  || ISA_MIPS64))		\
+	    && (MODE) == DFmode)))		\
+   || ((TARGET_SB1 && !TARGET_MIPS16)		\
+       && (MODE) == V2SFmode))

I think the !(ISA_MIPS4 || ISA_MIPS64) part is really "r2 or later", which
elsewhere we test as ISA_MIPS32R2 || ISA_MIPS64R2.  Obviously that isn't
as future-proof, but I think consistency wins here.  (E.g. the earlier
ISA_MIPS32R2 seems like it's really "r2 or later" too.)  Cleaning up these
macros has been on my todo list for about ten years :-(

Please also test !TARGET_MIPS16 at the outermost level, so that there's
only one instance.  I think that gives something like:

#define ISA_HAS_FP_RECIP_RSQRT(MODE)		\
  ((((ISA_HAS_FP4				\
      || ISA_MIPS32R2)				\
     && ((MODE) == SFmode			\
	 || ((TARGET_FLOAT64			\
	      || ISA_MIPS32R2			\
	      || ISA_MIPS64R2)			\
	     && (MODE) == DFmode)))		\
    || (TARGET_SB1				\
	&& (MODE) == V2SFmode))			\
   && !TARGET_MIPS16)

OK with those changes, thanks.

Richard
Re: [PATCH] Fix lto bootstrap verification failure with -freorder-blocks-and-partition
When testing with -freorder-blocks-and-partition enabled, I hit a
verification failure in an LTO profiled bootstrap.  Edge forwarding
performed when we went into cfg layout mode after bb reordering (during
compgotos) created a situation where a hot block was then dominated by a
cold block and was therefore remarked as cold.  Because bb reorder was
complete at that point, it was not moved in the physical layout, and we
incorrectly went in and out of the cold section multiple times.  The
following patch addresses that by fixing the layout when we move blocks
to the cold section after bb reordering is complete.

Tested with an LTO profiled bootstrap with -freorder-blocks-and-partition
enabled.  Ok for trunk?

Thanks, Teresa

2013-11-15  Teresa Johnson  tejohn...@google.com

	* cfgrtl.c (fixup_partitions): Reorder blocks if necessary.

computed_gotos just unfactors unified blocks that we use to avoid CFGs
with O(n^2) edges.  This is mostly to avoid problems with nonlinearity of
other passes and to reduce the quadratic memory use case to one function
at a time.  I wonder if it wouldn't be cleaner to simply unfactor those
just before pass_reorder_blocks.  Computed gotos are used e.g. in the
libjava interpreter to optimize the tight interpreting loop.  I think
those cases would benefit from having at least scheduling/reordering and
alignments done right.  Of course it depends on how bad the compile-time
implications are (I think in addition to libjava, there was a testcase
from Lucier that made us go for this trick), but I would prefer it over
adding yet another hack into cfgrtl...  We also may just avoid cfglayout
cleanup_cfg while doing computed gotos...

Honza
Re: [PATCH] MIPS: MIPS32r2 FP indexed access instruction set support
Maciej W. Rozycki ma...@codesourcery.com writes:

  2013-11-14  Maciej W. Rozycki  ma...@codesourcery.com

  gcc/
	* config/mips/mips.h (ISA_HAS_FP4): Remove TARGET_FLOAT64
	restriction for ISA_MIPS32R2.
	(ISA_HAS_FP_MADD4_MSUB4): Remove ISA_MIPS32R2 special-casing.
	(ISA_HAS_NMADD4_NMSUB4): Likewise.
	(ISA_HAS_FP_RECIP_RSQRT): Likewise.
	(ISA_HAS_PREFETCHX): Redefine in terms of ISA_HAS_FP4.

Nice.  So the reasoning is that, after your RECIP.fmt patch, the only
direct uses of ISA_HAS_FP4 for instruction selection are indexed loads
and stores.  That's why extending them to ISA_MIPS32R2 && !TARGET_FLOAT64
allows ISA_HAS_FP4 to be simplified.  But if we keep:

@@ -906,16 +906,14 @@ struct mips_cpu_info {
 #define GENERATE_MADD_MSUB	(TARGET_IMADD && !TARGET_MIPS16)
 
 /* ISA has floating-point madd and msub instructions 'd = a * b [+-] c'.  */
-#define ISA_HAS_FP_MADD4_MSUB4	(ISA_HAS_FP4				\
-				 || (ISA_MIPS32R2 && !TARGET_MIPS16))
+#define ISA_HAS_FP_MADD4_MSUB4	ISA_HAS_FP4
 
 /* ISA has floating-point madd and msub instructions 'c = a * b [+-] c'.  */
 #define ISA_HAS_FP_MADD3_MSUB3	TARGET_LOONGSON_2EF
 
 /* ISA has floating-point nmadd and nmsub instructions
    'd = -((a * b) [+-] c)'.  */
-#define ISA_HAS_NMADD4_NMSUB4	(ISA_HAS_FP4				\
-				 || (ISA_MIPS32R2 && !TARGET_MIPS16))
+#define ISA_HAS_NMADD4_NMSUB4	ISA_HAS_FP4

then I think we should also have a macro like:

/* ISA has indexed floating-point loads and stores (LWXC1, LDXC1, SWXC1
   and SDXC1).  */
#define ISA_HAS_LXC1_SXC1	ISA_HAS_FP4

and add:

   Note that this macro should only be used by other ISA_HAS_* macros.

to the ISA_HAS_FP4 comment.  OK with those changes, thanks.

Richard
Re: [ia64] [PR target/57491] internal compiler error: in ia64_split_tmode -O2, quadmath
As far as I understand the semantics of this insn:

(insn 200 199 0 (set (reg:DI 15 r15)
        (mem:DI (post_dec:DI (reg/f:DI 15 r15 [447]))
            [3 *_61[_12]{lb: 1 sz: 64}.text+8 S8 A64])) -1
     (nil))

what is done is (in that sequence):

1. Calculate the address of the MEM: get the r15 value.
2. Decrement the r15 value.
3. Load the MEM into r15.

Point 2 is useless, as we kill it by 3.  So it is clobbered, and as
mentioned in the comment it is sometimes ok to overwrite a pointer with
the value it points to.

That depends on the semantics of the hardware instruction though: does it
really guarantee 1/2/3 in that order?

We need to set the `dead' flag only when the address is actually going to
be killed by the load.  Patch at the bottom.  The test from the PR passes.

The patch looks good to me if you also adjust the last sentence in the
comment just above the block:

  /* It is possible for reload to decide to overwrite a pointer with
     the value it points to.  In that case we have to do the loads in
     the appropriate order so that the pointer is not destroyed too
     early.  Also we must not generate a postmodify for that second
     load, or rws_access_regno will die.  */

Something like: "And we must not generate a postmodify for the second
load if the destination register overlaps with the base register."

Thanks for fixing this.

-- Eric Botcazou
Re: [PATCH][3/3] Re-submission of Altera Nios II port, libgcc parts
On 2013/7/14 03:55 PM, Chung-Lin Tang wrote:

  nios2 libgcc parts.  Since the original post, the only main change has
  been the fdpbit vs soft-fp issue raised by Joseph, which has been
  resolved.  Other parts are mostly the same.

The Nios II libgcc parts have been further updated to include a
sfp-machine.h file, and the Linux atomic cmpxchg updated to now use a
fixed-address kernel helper cmpxchg routine, similar to ARM.

Thanks,
Chung-Lin

2013-11-16  Sandra Loosemore  san...@codesourcery.com
	    Chung-Lin Tang  clt...@codesourcery.com

	Based on patches from Altera Corporation

	* config.host (nios2-*-*,nios2-*-linux*): Add nios2 host cases.
	* config/nios2/lib2-nios2.h: New file.
	* config/nios2/lib2-divmod-hi.c: New file.
	* config/nios2/linux-unwind.h: New file.
	* config/nios2/lib2-divmod.c: New file.
	* config/nios2/linux-atomic.c: New file.
	* config/nios2/t-nios2: New file.
	* config/nios2/crti.asm: New file.
	* config/nios2/t-linux: New file.
	* config/nios2/lib2-divtable.c: New file.
	* config/nios2/lib2-mul.c: New file.
	* config/nios2/tramp.c: New file.
	* config/nios2/crtn.asm: New file.
	* config/nios2/sfp-machine.h: New file.
Index: libgcc/config.host
===
--- libgcc/config.host	(revision 204897)
+++ libgcc/config.host	(working copy)
@@ -146,6 +146,9 @@ mips*-*-*)
 nds32*-*)
 	cpu_type=nds32
 	;;
+nios2*-*-*)
+	cpu_type=nios2
+	;;
 powerpc*-*-*)
 	cpu_type=rs6000
 	;;
@@ -876,6 +879,15 @@ nds32*-elf*)
 	;;
 	esac
 	;;
+nios2-*-linux*)
+	tmake_file="$tmake_file nios2/t-nios2 nios2/t-linux t-libgcc-pic t-slibgcc-libgcc"
+	extra_parts="$extra_parts crti.o crtn.o"
+	md_unwind_header=nios2/linux-unwind.h
+	;;
+nios2-*-*)
+	tmake_file="$tmake_file nios2/t-nios2 t-softfp-sfdf t-softfp-excl t-softfp"
+	extra_parts="$extra_parts crti.o crtn.o"
+	;;
 pdp11-*-*)
 	tmake_file="pdp11/t-pdp11 t-fdpbit"
 	;;
Index: libgcc/config/nios2/t-linux
===
--- libgcc/config/nios2/t-linux	(revision 0)
+++ libgcc/config/nios2/t-linux	(revision 0)
@@ -0,0 +1,7 @@
+# Soft-float functions go in glibc only, to facilitate the possible
+# future addition of exception and rounding mode support integrated
+# with fenv.h.
+
+LIB2FUNCS_EXCLUDE = _floatdidf _floatdisf _fixunsdfsi _fixunssfsi \
+	_fixunsdfdi _fixdfdi _fixunssfdi _fixsfdi _floatundidf _floatundisf
+LIB2ADD += $(srcdir)/config/nios2/linux-atomic.c
Index: libgcc/config/nios2/sfp-machine.h
===
--- libgcc/config/nios2/sfp-machine.h	(revision 0)
+++ libgcc/config/nios2/sfp-machine.h	(revision 0)
@@ -0,0 +1,78 @@
+/* Soft-FP definitions for Altera Nios II.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+http://www.gnu.org/licenses/.  */
+
+#define _FP_W_TYPE_SIZE		32
+#define _FP_W_TYPE		unsigned long
+#define _FP_WS_TYPE		signed long
+#define _FP_I_TYPE		long
+
+#define _FP_MUL_MEAT_S(R,X,Y)				\
+  _FP_MUL_MEAT_1_wide(_FP_WFRACBITS_S,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_D(R,X,Y)				\
+  _FP_MUL_MEAT_2_wide(_FP_WFRACBITS_D,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_Q(R,X,Y)				\
+  _FP_MUL_MEAT_4_wide(_FP_WFRACBITS_Q,R,X,Y,umul_ppmm)
+
+#define _FP_DIV_MEAT_S(R,X,Y)	_FP_DIV_MEAT_1_loop(S,R,X,Y)
+#define _FP_DIV_MEAT_D(R,X,Y)	_FP_DIV_MEAT_2_udiv(D,R,X,Y)
+#define _FP_DIV_MEAT_Q(R,X,Y)	_FP_DIV_MEAT_4_udiv(Q,R,X,Y)
+
+#define _FP_NANFRAC_S		((_FP_QNANBIT_S << 1) - 1)
+#define _FP_NANFRAC_D		((_FP_QNANBIT_D << 1) - 1), -1
+#define _FP_NANFRAC_Q		((_FP_QNANBIT_Q << 1) - 1), -1, -1, -1
+#define _FP_NANSIGN_S		0
+#define _FP_NANSIGN_D		0
+#define _FP_NANSIGN_Q		0
+
+#define _FP_KEEPNANFRACP 1
+#define _FP_QNANNEGATEDP 0
+
+/* Someone please check this.  */
+#define _FP_CHOOSENAN(fs, wc, R, X, Y, OP)			\
+  do {								\
+    if ((_FP_FRAC_HIGH_RAW_##fs(X) & _FP_QNANBIT_##fs)		\
+	&& !(_FP_FRAC_HIGH_RAW_##fs(Y) & _FP_QNANBIT_##fs))	\
+      {								\
+	R##_s = Y##_s;						\
+	_FP_FRAC_COPY_##wc(R,Y);				\
+      }								\
+    else
Re: [PATCH][2/3] Re-submission of Altera Nios II port, testsuite parts
On 2013/10/17 10:20 PM, Bernd Schmidt wrote:

  On 07/14/2013 09:54 AM, Chung-Lin Tang wrote:

    These are nios2 patches for the gcc testsuite.  Some new testcases
    were added since the last posting.

Index: gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c
===
--- gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c	(revision 200946)
+++ gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c	(working copy)
@@ -124,16 +124,17 @@ __memmove_chk (void *dst, const void *src, __SIZE_
 void *
 memset (void *dst, int c, __SIZE_TYPE__ n)
 {
+  while (n-- != 0)
+    n[(char *) dst] = c;
+
   /* Single-byte memsets should be done inline when optimisation
-     is enabled.  */
+     is enabled.  Do this after the copy in case we're being called to
+     initialize bss.  */
 #ifdef __OPTIMIZE__
   if (memset_disallowed && inside_main && n < 2)
     abort ();
 #endif
-  while (n-- != 0)
-    n[(char *) dst] = c;
-
   return dst;
 }

  I'm not sure I understand this change.  Is nios2 the only target calling
  memset to initialize bss, and memset_disallowed is nonzero at the start
  of execution?

This appears to be for the nios2-elf bare-metal testing.  Looking at the
upstream libgloss sources, nios2 is indeed not the only target that calls
memset for zeroing bss.  Note, however, that in a somewhat reversed
situation: https://sourceware.org/ml/newlib/2013/msg00264.html it appears
that due to the presumed usage model for Nios II, Sandra did not
contribute the libgloss port.  So the original code that needed this
testsuite change is probably not there.  OTOH, if this change is not
deemed harmful, then it might further robustify the testsuite.

Index: gcc/testsuite/gcc.target/nios2/nios2-int-types.c
===
--- gcc/testsuite/gcc.target/nios2/nios2-int-types.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/nios2-int-types.c	(revision 0)
@@ -0,0 +1,34 @@
+/* Test that various types are all derived from int.  */
+/* { dg-do compile { target nios2-*-* } } */

  I think you can lose the { target nios2-*-* } for everything inside
  gcc.target/nios2.

Done.
The new attached patch also has the Dxx constraint test removed, as that
feature is now removed from the compiler.  The memset() change mentioned
above is still in the patch, but I will remove it before committing if it
is not approved.

Thanks,
Chung-Lin

2013-11-16  Sandra Loosemore  san...@codesourcery.com
	    Chung-Lin Tang  clt...@codesourcery.com

	Based on patches from Altera Corporation

	* gcc.dg/stack-usage-1.c (SIZE): Define case for __nios2__.
	* gcc.dg/20040813-1.c: Skip for nios2-*-*.
	* gcc.dg/20020312-2.c: Add __nios2__ case.
	* g++.dg/other/PR23205.C: Skip for nios2-*-*.
	* g++.dg/other/pr23205-2.C: Skip for nios2-*-*.
	* g++.dg/cpp0x/constexpr-rom.C: Skip for nios2-*-*.
	* g++.dg/cpp0x/alias-decl-debug-0.C: Skip for nios2-*-*.
	* g++.old-deja/g++.jason/thunk3.C: Skip for nios2-*-*.
	* lib/target-supports.exp (check_profiling_available): Check
	for nios2-*-elf.
	* gcc.c-torture/execute/pr47237.x: Skip for nios2-*-*.
	* gcc.c-torture/execute/20101011-1.c: Skip for nios2-*-*.
	* gcc.c-torture/execute/builtins/lib/chk.c (memset): Place
	char-based memset loop before inline check, to prevent problems
	when called to initialize .bss.  Update comments.
	* gcc.target/nios2/nios2.exp: New DejaGNU file.
	* gcc.target/nios2/nios2-custom-1.c: New test.
	* gcc.target/nios2/nios2-trap-insn.c: New test.
	* gcc.target/nios2/nios2-builtin-custom.c: New test.
	* gcc.target/nios2/nios2-builtin-io.c: New test.
	* gcc.target/nios2/nios2-stack-check-1.c: New test.
	* gcc.target/nios2/nios2-stack-check-2.c: New test.
	* gcc.target/nios2/nios2-rdctl.c: New test.
	* gcc.target/nios2/nios2-wrctl.c: New test.
	* gcc.target/nios2/nios2-wrctl-zero.c: New test.
	* gcc.target/nios2/nios2-wrctl-not-zero.c: New test.
	* gcc.target/nios2/nios2-rdwrctl-1.c: New test.
	* gcc.target/nios2/nios2-ashlsi3-one_shift.c: New test.
	* gcc.target/nios2/nios2-mul-options-1.c: New test.
	* gcc.target/nios2/nios2-mul-options-2.c: New test.
	* gcc.target/nios2/nios2-mul-options-3.c: New test.
	* gcc.target/nios2/nios2-mul-options-4.c: New test.
	* gcc.target/nios2/nios2-nor.c: New test.
	* gcc.target/nios2/nios2-stxio.c: New test.
	* gcc.target/nios2/custom-fp-1.c: New test.
	* gcc.target/nios2/custom-fp-2.c: New test.
	* gcc.target/nios2/custom-fp-3.c: New test.
	* gcc.target/nios2/custom-fp-4.c: New test.
	* gcc.target/nios2/custom-fp-5.c: New test.
	* gcc.target/nios2/custom-fp-6.c: New test.
	* gcc.target/nios2/custom-fp-7.c: New test.
	*
Re: [PATCH] Generate a label for the split cold function while using -freorder-blocks-and-partition
Cary Coutant ccout...@google.com writes: Isn't this something that should be expressed in DWARF with DW_AT_ranges? See DWARF4, section 2.17, Does GCC generate such ranges? GCC does generate these ranges. However, according to Cary many tools do not rely on dwarf info for locating the corresponding function name, they just look at the symbols to identify what function an address resides in. Nor would we want tools such as objdump and profilers to rely on dwarf for locating the function names as this would not work for binaries that were generated without -g options or had their debug info stripped. Yes, while the information needed is in the DWARF info, I don't think it's a good idea to depend on having debug info in all binaries. It's quite common to need to symbolize binaries that don't have debug info, and without a symbol such as Sri and Teresa are proposing, the result will be not just an address that didn't get symbolized, but an address that gets symbolized incorrectly (in a way that will often be quite misleading). +1 FWIW. Another reason is that on MIPS, we could be throwing cold MIPS and MIPS16/microMIPS code into the same section. Tools like objdump rely on symbols to figure out which ISA mode is being used where. Thanks, Richard
Re: [wide-int] Documentation and comment tweaks
Richard Sandiford rdsandif...@googlemail.com writes:

  Some minor tweaks to the documentation and commentary.  The hyphenation
  and "non zero" -> "nonzero" changes are supposed to be per guidelines:
  http://gcc.gnu.org/codingconventions.html#Spelling  Hope I got them
  right.  OK to install?

Ping.

Index: gcc/dfp.c
===
--- gcc/dfp.c	2013-11-09 09:50:47.392396760 +0000
+++ gcc/dfp.c	2013-11-09 11:07:22.754160541 +0000
@@ -605,8 +605,8 @@ decimal_real_to_integer (const REAL_VALU
   return real_to_integer (&to);
 }
 
-/* Likewise, but returns a wide_int with PRECISION.  Fail
-   is set if the value does not fit.  */
+/* Likewise, but returns a wide_int with PRECISION.  *FAIL is set if the
+   value does not fit.  */
 
 wide_int
 decimal_real_to_integer (const REAL_VALUE_TYPE *r, bool *fail, int precision)
Index: gcc/doc/rtl.texi
===
--- gcc/doc/rtl.texi	2013-11-09 09:50:47.392396760 +0000
+++ gcc/doc/rtl.texi	2013-11-09 11:07:22.755160549 +0000
@@ -1542,11 +1542,10 @@ Similarly, there is only one object for
 @findex const_double
 @item (const_double:@var{m} @var{i0} @var{i1} @dots{})
 This represents either a floating-point constant of mode @var{m} or
-(on ports older ports that do not define
+(on older ports that do not define
 @code{TARGET_SUPPORTS_WIDE_INT}) an integer constant too large to fit
 into @code{HOST_BITS_PER_WIDE_INT} bits but small enough to fit within
-twice that number of bits (GCC does not provide a mechanism to
-represent even larger constants).  In the latter case, @var{m} will be
+twice that number of bits.  In the latter case, @var{m} will be
 @code{VOIDmode}.  For integral values constants for modes with more
 bits than twice the number in @code{HOST_WIDE_INT} the implied high
 order bits of that constant are copies of the top bit of
@@ -1576,25 +1575,25 @@ the precise bit pattern used by the targ
 This contains an array of @code{HOST_WIDE_INTS} that is large enough
 to hold any constant that can be represented on the target.
 This form of rtl is only used on targets that define
-@code{TARGET_SUPPORTS_WIDE_INT} to be non zero and then
-@code{CONST_DOUBLES} are only used to hold floating point values.  If
+@code{TARGET_SUPPORTS_WIDE_INT} to be nonzero and then
+@code{CONST_DOUBLE}s are only used to hold floating-point values.  If
 the target leaves @code{TARGET_SUPPORTS_WIDE_INT} defined as 0,
 @code{CONST_WIDE_INT}s are not used and @code{CONST_DOUBLE}s are as
 they were before.
 
-The values are stored in a compressed format.  The higher order
+The values are stored in a compressed format.  The higher-order
 0s or -1s are not represented if they are just the logical sign
 extension of the number that is represented.
 
 @findex CONST_WIDE_INT_VEC
 @item CONST_WIDE_INT_VEC (@var{code})
 Returns the entire array of @code{HOST_WIDE_INT}s that are used to
-store the value. This macro should be rarely used.
+store the value.  This macro should be rarely used.
 
 @findex CONST_WIDE_INT_NUNITS
 @item CONST_WIDE_INT_NUNITS (@var{code})
 The number of @code{HOST_WIDE_INT}s used to represent the number.
-Note that this generally be smaller than the number of
+Note that this generally is smaller than the number of
 @code{HOST_WIDE_INT}s implied by the mode size.
 
 @findex CONST_WIDE_INT_ELT
Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi	2013-11-09 09:50:47.392396760 +0000
+++ gcc/doc/tm.texi	2013-11-09 11:07:22.757160564 +0000
@@ -9683,10 +9683,9 @@ Returns the negative of the floating point value @var{x}.
 Returns the absolute value of @var{x}.
 @end deftypefn
 
-@deftypefn Macro void REAL_VALUE_FROM_INT (REAL_VALUE_TYPE @var{x}, HOST_WIDE_INT @var{val}, enum machine_mode @var{mode})
-Converts a double-precision integer found in @var{val},
-into a floating point value which is then stored into @var{x}.  The
-value is truncated to fit in mode @var{mode}.
+@deftypefn Macro void REAL_VALUE_FROM_INT (REAL_VALUE_TYPE @var{x}, const wide_int_ref &@var{val}, enum machine_mode @var{mode})
+Converts integer @var{val} into a floating-point value which is then
+stored into @var{x}.  The value is truncated to fit in mode @var{mode}.
 @end deftypefn
 
 @node Mode Switching
@@ -11497,15 +11496,15 @@ The default value of this hook is based
 @defmac TARGET_SUPPORTS_WIDE_INT
 On older ports, large integers are stored in @code{CONST_DOUBLE} rtl
-objects.  Newer ports define @code{TARGET_SUPPORTS_WIDE_INT} to be non
-zero to indicate that large integers are stored in
+objects.  Newer ports define @code{TARGET_SUPPORTS_WIDE_INT} to be nonzero
+to indicate that large integers are stored in
 @code{CONST_WIDE_INT} rtl objects.  The @code{CONST_WIDE_INT} allows
 very large integer constants to be represented.  @code{CONST_DOUBLE}
-are
Re: [PATCH] Avoid some unnecessary set_cfun calls
Jakub Jelinek ja...@redhat.com writes: On Wed, Nov 13, 2013 at 11:27:10AM +0100, Richard Biener wrote: Also, I wonder if we couldn't defer the expensive ira_init, if the info computed by it is used only during RTL optimization passes (haven't verified it yet), then supposedly we could just remember using some target hook what the last state was when we did ira_init last time, and call ira_init again at the start of expansion or so if it is different from the last time. For i?86/x86_64/ppc* this would be whether the current function's DECL_FUNCTION_SPECIFIC_TARGET is the same as one for which ira_init has been called, for rx whether interrupt attribute is the same and for mips whatever is needed. I wonder why we cannot move all the stuff we re-init to a member of struct function (or rather have a pointer to that info there to cache it across functions with the same options). That is, get rid of more global state? That would make switching back and forth cheaper. Isn't that what the SWITCHABLE_TARGET stuff is all about? So, perhaps we should just define SWITCHABLE_TARGET on i?86/x86_64/powerpc* (and rx if maintainer cares) and tweak it to attach somehow struct target_globals * to TARGET_OPTION_NODE somehow. A problem might be that lots of the save_target_globals allocated structures are heap allocated rather than GC, so we might leak memory. Wonder if save_target_globals couldn't just compute the aggregate size of all the structures it allocates with XCNEW right now (plus required alignment if needed) and just allocate them together with the ggc_alloc_target_globals after the target_globals structure itself. Yeah, that might be worth doing. I think the only non-GCed structures with subpointers are target_ira_int and target_lra_int, but we could probably convert them to GCed structures. (And perhaps use the same technique recursively. E.g. LRA could work out the maximum number of operand_alternative structures needed and allocate them in one go.) Thanks, Richard
Re: [PATCH] Time profiler - phase 2
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index c566a85..1562098 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,15 @@
+2013-11-13  Martin Liska  marxin.li...@gmail.com
+	    Jan Hubicka  j...@suse.cz
+
+	* cgraphunit.c (node_cmp): New function.
+	(expand_all_functions): Function ordering added.
+	* common.opt: New profile based function reordering flag introduced.
+	* coverage.c (get_coverage_counts): Wrong profile handled.
+	* ipa.c (cgraph_externally_visible_p): New late flag introduced.
+	* lto-partition.c: Support for time profile added.
+	* lto.c: Likewise.
+	* value-prof.c: Histogram instrumentation switch added.
+
 2013-11-13  Vladimir Makarov  vmaka...@redhat.com
 
 	PR rtl-optimization/59036
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 4765e6a..7cdd9a4 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1821,6 +1821,17 @@ expand_function (struct cgraph_node *node)
   ipa_remove_all_references (&node->ref_list);
 }
 
+/* Node comparer that is responsible for the order that corresponds
+   to time when a function was launched for the first time.  */
+
+static int
+node_cmp (const void *pa, const void *pb)
+{
+  const struct cgraph_node *a = *(const struct cgraph_node * const *) pa;
+  const struct cgraph_node *b = *(const struct cgraph_node * const *) pb;
+
+  return b->tp_first_run - a->tp_first_run;

Please stabilize this by using node->order when tp_first_run is
equivalent.  Later we ought to use a better heuristic here, but order may
be good enough to start with.
diff --git a/gcc/ipa.c b/gcc/ipa.c
index a11b1c7..d92a332 100644
--- a/gcc/ipa.c
+++ b/gcc/ipa.c
@@ -761,10 +761,14 @@ cgraph_externally_visible_p (struct cgraph_node *node,
      This improves code quality and we know we will duplicate them at most twice
      (in the case that we are not using plugin and link with object file
       implementing same COMDAT)  */
-  if ((in_lto_p || whole_program)
-      && DECL_COMDAT (node->decl)
-      && comdat_can_be_unshared_p (node))
-    return false;
+  if ((in_lto_p || whole_program || profile_arc_flag)
+      && DECL_COMDAT (node->decl)
+      && comdat_can_be_unshared_p (node))
+    {
+      gcc_checking_assert (cgraph_function_body_availability (node)
+			   > AVAIL_OVERWRITABLE);
+      return false;
+    }
 
   /* When doing link time optimizations, hidden symbols become local.  */
   if (in_lto_p
@@ -932,7 +936,7 @@ function_and_variable_visibility (bool whole_program)
 	}
       gcc_assert ((!DECL_WEAK (node->decl) && !DECL_COMDAT (node->decl))
-		  || TREE_PUBLIC (node->decl)
+		  || TREE_PUBLIC (node->decl)
 		  || node->weakref
 		  || DECL_EXTERNAL (node->decl));
       if (cgraph_externally_visible_p (node, whole_program))
@@ -949,7 +953,7 @@ function_and_variable_visibility (bool whole_program)
 	  && node->definition && !node->weakref
 	  && !DECL_EXTERNAL (node->decl))
 	{
-	  gcc_assert (whole_program || in_lto_p
+	  gcc_assert (whole_program || in_lto_p || profile_arc_flag
 		      || !TREE_PUBLIC (node->decl));
 	  node->unique_name = ((node->resolution == LDPR_PREVAILING_DEF_IRONLY
 				|| node->resolution == LDPR_PREVAILING_DEF_IRONLY_EXP)

These changes are unrelated, please remove them.

@@ -395,6 +397,20 @@ node_cmp (const void *pa, const void *pb)
 {
   const struct cgraph_node *a = *(const struct cgraph_node * const *) pa;
   const struct cgraph_node *b = *(const struct cgraph_node * const *) pb;
+
+  /* Profile reorder flag enables function reordering based on first execution
+     of a function.  All functions with profile are placed in ascending
+     order at the beginning.
+  */
+
+  if (flag_profile_reorder_functions && a->tp_first_run != b->tp_first_run)
+    {
+      if (a->tp_first_run && b->tp_first_run)
+	return a->tp_first_run - b->tp_first_run;
+
+      if (a->tp_first_run || b->tp_first_run)
+	return b->tp_first_run - a->tp_first_run;

Drop a comment explaining the logic here ;)

@@ -449,7 +465,7 @@ void
 lto_balanced_map (void)
 {
   int n_nodes = 0;
-  int n_varpool_nodes = 0, varpool_pos = 0, best_varpool_pos = 0;
+  int n_varpool_nodes = 0, varpool_pos = 0;
   struct cgraph_node **order = XNEWVEC (struct cgraph_node *, cgraph_max_uid);
   struct varpool_node **varpool_order = NULL;
   int i;
@@ -481,10 +497,13 @@ lto_balanced_map (void)
      get better about minimizing the function bounday, but until that
      things works smoother if we order in source order.  */
   qsort (order, n_nodes, sizeof (struct cgraph_node *), node_cmp);
+
+  if (cgraph_dump_file)
+    for (i = 0; i < n_nodes; i++)
+      fprintf (cgraph_dump_file, "Balanced map symbol order:%s:%u\n", cgraph_node_asm_name
[PowerPC] libffi fixes and support for PowerPC64 ELFv2
The following six patches correspond to patches posted to the libffi mailing list a few days ago to add support for PowerPC64 ELFv2. The patch series has been tested on powerpc-linux, powerpc64-linux, powerpc64le-linux and powerpc-freebsd by running the libffi testsuite, and on powerpc64-linux and powerpc64le-linux by gcc bootstrap and regression testing. I guess the normal procedure would be to wait for upstream approval before applying here, but since Uli's gcc support for ELFv2 is in, it would be nice to have a working libffi along with that. -- Alan Modra Australia Development Lab, IBM
Reinstate powerpc bounce buffer copying in ffi.c
The first patch in the series is a little different to the corresponding
upstream libffi patch, because there I needed to revert some fixes first.
The second patch in the series is entirely missing due to the testsuite
already being fixed in gcc.

This patch properly copies the bounce buffer to destination, and only
uses the bounce buffer for FFI_SYSV.  I also fix an accounting error in
integer register usage.

	* src/powerpc/ffi.c (ffi_prep_cif_machdep): Do not consume an int
	arg when returning a small struct for FFI_SYSV ABI.
	(ffi_call): Only use bounce buffer when FLAG_RETURNS_SMST.
	Properly copy bounce buffer to destination.

diff -urp gcc-virgin/libffi/src/powerpc/ffi.c gcc1/libffi/src/powerpc/ffi.c
--- gcc-virgin/libffi/src/powerpc/ffi.c	2013-06-25 09:36:39.259402853 +0930
+++ gcc1/libffi/src/powerpc/ffi.c	2013-11-15 23:06:57.313036827 +1030
@@ -691,7 +691,7 @@
     case FFI_TYPE_STRUCT:
       /*
        * The final SYSV ABI says that structures smaller or equal 8 bytes
-       * are returned in r3/r4. The FFI_GCC_SYSV ABI instead returns them
+       * are returned in r3/r4.  The FFI_GCC_SYSV ABI instead returns them
        * in memory.
        *
        * NOTE: The assembly code can safely assume that it just needs to
@@ -700,7 +700,10 @@
        * set.
        */
       if (cif->abi == FFI_SYSV && size <= 8)
-	flags |= FLAG_RETURNS_SMST;
+	{
+	  flags |= FLAG_RETURNS_SMST;
+	  break;
+	}
       intarg_count++;
       flags |= FLAG_RETVAL_REFERENCE;
       /* Fall through.  */
@@ -919,30 +922,25 @@
 {
   /*
    * The final SYSV ABI says that structures smaller or equal 8 bytes
-   * are returned in r3/r4. The FFI_GCC_SYSV ABI instead returns them
+   * are returned in r3/r4.  The FFI_GCC_SYSV ABI instead returns them
    * in memory.
    *
-   * Just to keep things simple for the assembly code, we will always
-   * bounce-buffer struct return values less than or equal to 8 bytes.
-   * This allows the ASM to handle SYSV small structures by directly
-   * writing r3 and r4 to memory without worrying about struct size.
+   * We bounce-buffer SYSV small struct return values so that sysv.S
+   * can write r3 and r4 to memory without worrying about struct size.
    */
   unsigned int smst_buffer[2];
   extended_cif ecif;
-  unsigned int rsize = 0;
 
   ecif.cif = cif;
   ecif.avalue = avalue;
 
-  /* Ensure that we have a valid struct return value */
   ecif.rvalue = rvalue;
-  if (cif->rtype->type == FFI_TYPE_STRUCT) {
-    rsize = cif->rtype->size;
-    if (rsize <= 8)
-      ecif.rvalue = smst_buffer;
-    else if (!rvalue)
-      ecif.rvalue = alloca(rsize);
-  }
+  if ((cif->flags & FLAG_RETURNS_SMST) != 0)
+    ecif.rvalue = smst_buffer;
+  /* Ensure that we have a valid struct return value.
+     FIXME: Isn't this just papering over a user problem?  */
+  else if (!rvalue && cif->rtype->type == FFI_TYPE_STRUCT)
+    ecif.rvalue = alloca (cif->rtype->size);
 
   switch (cif->abi)
     {
@@ -967,7 +965,21 @@
 
   /* Check for a bounce-buffered return value */
   if (rvalue && ecif.rvalue == smst_buffer)
-    memcpy(rvalue, smst_buffer, rsize);
+    {
+      unsigned int rsize = cif->rtype->size;
+#ifndef __LITTLE_ENDIAN__
+      /* The SYSV ABI returns a structure of up to 4 bytes in size
+	 left-padded in r3.  */
+      if (rsize <= 4)
+	memcpy (rvalue, (char *) smst_buffer + 4 - rsize, rsize);
+      /* The SYSV ABI returns a structure of up to 8 bytes in size
+	 left-padded in r3/r4.  */
+      else if (rsize <= 8)
+	memcpy (rvalue, (char *) smst_buffer + 8 - rsize, rsize);
+      else
+#endif
+	memcpy (rvalue, smst_buffer, rsize);
+    }
 }

-- Alan Modra
Australia Development Lab, IBM
libffi doc fixes
This enshrines the current testsuite practice of using ffi_arg for returned values.  It would be reasonable and logical to use the actual return argument type as passed to ffi_prep_cif, but this would mean changing a large number of tests that use ffi_arg and all backends that write results to an ffi_arg.

	* doc/libffi.texi: Correct example code.

diff -urp gcc1/libffi/doc/libffi.texi gcc3/libffi/doc/libffi.texi
--- gcc1/libffi/doc/libffi.texi	2013-06-13 21:03:53.000000000 +0930
+++ gcc3/libffi/doc/libffi.texi	2013-11-15 23:16:06.811643952 +1030
@@ -214,7 +214,7 @@ int main()
   ffi_type *args[1];
   void *values[1];
   char *s;
-  int rc;
+  ffi_arg rc;
 
   /* Initialize the argument info vectors */
   args[0] = &ffi_type_pointer;
@@ -222,7 +222,7 @@ int main()
 
   /* Initialize the cif */
   if (ffi_prep_cif(&cif, FFI_DEFAULT_ABI, 1,
-		   &ffi_type_uint, args) == FFI_OK)
+		   &ffi_type_sint, args) == FFI_OK)
     @{
       s = "Hello World!";
       ffi_call(&cif, puts, &rc, values);
@@ -414,6 +414,7 @@ Here is the corresponding code to descri
   int i;
 
   tm_type.size = tm_type.alignment = 0;
+  tm_type.type = FFI_TYPE_STRUCT;
   tm_type.elements = tm_type_elements;
 
   for (i = 0; i < 9; i++)
@@ -540,7 +541,7 @@ A trivial example that creates a new @co
 #include <ffi.h>
 
 /* Acts like puts with the file given at time of enclosure. */
-void puts_binding(ffi_cif *cif, unsigned int *ret, void* args[],
+void puts_binding(ffi_cif *cif, ffi_arg *ret, void* args[],
                   FILE *stream)
 @{
   *ret = fputs(*(char **)args[0], stream);
@@ -565,7 +566,7 @@ int main()
 
   /* Initialize the cif */
   if (ffi_prep_cif(&cif, FFI_DEFAULT_ABI, 1,
-		   &ffi_type_uint, args) == FFI_OK)
+		   &ffi_type_sint, args) == FFI_OK)
     @{
       /* Initialize the closure, setting stream to stdout */
       if (ffi_prep_closure_loc(closure, &cif, puts_binding,

-- 
Alan Modra
Australia Development Lab, IBM
Pass floating point values on powerpc64 as per ABI
The powerpc64 support opted to pass floating point values both in the fpr area and the parameter save area, necessary when the backend doesn't know if a function argument corresponds to the ellipsis arguments of a variadic function. This patch adds powerpc support for variadic functions, and changes the code to only pass fp in the ABI mandated area. ELFv2 needs this change since the parameter save area may not exist there. This also fixes two faulty tests that used a non-variadic function cast to call a variadic function, and spuriously reasoned that this is somehow necessary for static functions.. The whitespace changes, and comment changes in the tests, are to make the gcc versions of these files mirror upstream libffi. * src/powerpc/ffitarget.h (FFI_TARGET_SPECIFIC_VARIADIC): Define. (FFI_EXTRA_CIF_FIELDS): Define. * src/powerpc/ffi.c (ffi_prep_args64): Save fprs as per the ABI, not to both fpr and param save area. (ffi_prep_cif_machdep_core): Renamed from ffi_prep_cif_machdep. Keep initial flags. Formatting. Remove dead FFI_LINUX_SOFT_FLOAT code. (ffi_prep_cif_machdep, ffi_prep_cif_machdep_var): New functions. (ffi_closure_helper_LINUX64): Pass floating point as per ABI, not to both fpr and parameter save areas. * libffi/testsuite/libffi.call/cls_double_va.c (main): Correct function cast and don't call ffi_prep_cif. * libffi/testsuite/libffi.call/cls_longdouble_va.c (main): Likewise. diff -urp gcc3/libffi/src/powerpc/ffitarget.h gcc4/libffi/src/powerpc/ffitarget.h --- gcc3/libffi/src/powerpc/ffitarget.h 2013-11-15 23:03:07.313959745 +1030 +++ gcc4/libffi/src/powerpc/ffitarget.h 2013-11-15 23:19:21.692053339 +1030 @@ -106,6 +106,10 @@ typedef enum ffi_abi { #define FFI_CLOSURES 1 #define FFI_NATIVE_RAW_API 0 +#if defined (POWERPC) || defined (POWERPC_FREEBSD) +# define FFI_TARGET_SPECIFIC_VARIADIC 1 +# define FFI_EXTRA_CIF_FIELDS unsigned nfixedargs +#endif /* For additional types like the below, take care about the order in ppc_closures.S. 
They must follow after the FFI_TYPE_LAST. */ diff -urp gcc3/libffi/src/powerpc/ffi.c gcc4/libffi/src/powerpc/ffi.c --- gcc3/libffi/src/powerpc/ffi.c 2013-11-15 23:06:57.313036827 +1030 +++ gcc4/libffi/src/powerpc/ffi.c 2013-11-15 23:47:24.402296569 +1030 @@ -443,9 +443,9 @@ ffi_prep_args64 (extended_cif *ecif, uns /* 'fpr_base' points at the space for fpr3, and grows upwards as we use FPR registers. */ valp fpr_base; - int fparg_count; + unsigned int fparg_count; - int i, words; + unsigned int i, words, nargs, nfixedargs; ffi_type **ptr; double double_tmp; union { @@ -482,30 +482,34 @@ ffi_prep_args64 (extended_cif *ecif, uns /* Now for the arguments. */ p_argv.v = ecif-avalue; - for (ptr = ecif-cif-arg_types, i = ecif-cif-nargs; - i 0; - i--, ptr++, p_argv.v++) + nargs = ecif-cif-nargs; + nfixedargs = ecif-cif-nfixedargs; + for (ptr = ecif-cif-arg_types, i = 0; + i nargs; + i++, ptr++, p_argv.v++) { switch ((*ptr)-type) { case FFI_TYPE_FLOAT: double_tmp = **p_argv.f; - *next_arg.f = (float) double_tmp; + if (fparg_count NUM_FPR_ARG_REGISTERS64 i nfixedargs) + *fpr_base.d++ = double_tmp; + else + *next_arg.f = (float) double_tmp; if (++next_arg.ul == gpr_end.ul) next_arg.ul = rest.ul; - if (fparg_count NUM_FPR_ARG_REGISTERS64) - *fpr_base.d++ = double_tmp; fparg_count++; FFI_ASSERT (flags FLAG_FP_ARGUMENTS); break; case FFI_TYPE_DOUBLE: double_tmp = **p_argv.d; - *next_arg.d = double_tmp; + if (fparg_count NUM_FPR_ARG_REGISTERS64 i nfixedargs) + *fpr_base.d++ = double_tmp; + else + *next_arg.d = double_tmp; if (++next_arg.ul == gpr_end.ul) next_arg.ul = rest.ul; - if (fparg_count NUM_FPR_ARG_REGISTERS64) - *fpr_base.d++ = double_tmp; fparg_count++; FFI_ASSERT (flags FLAG_FP_ARGUMENTS); break; @@ -513,18 +517,20 @@ ffi_prep_args64 (extended_cif *ecif, uns #if FFI_TYPE_LONGDOUBLE != FFI_TYPE_DOUBLE case FFI_TYPE_LONGDOUBLE: double_tmp = (*p_argv.d)[0]; - *next_arg.d = double_tmp; + if (fparg_count NUM_FPR_ARG_REGISTERS64 i nfixedargs) + *fpr_base.d++ = double_tmp; + 
else + *next_arg.d = double_tmp; if (++next_arg.ul == gpr_end.ul) next_arg.ul = rest.ul; - if (fparg_count NUM_FPR_ARG_REGISTERS64) - *fpr_base.d++ = double_tmp; fparg_count++; double_tmp = (*p_argv.d)[1]; - *next_arg.d = double_tmp; + if (fparg_count NUM_FPR_ARG_REGISTERS64 i nfixedargs) + *fpr_base.d++ = double_tmp; + else + *next_arg.d =
Support PowerPC64 ELFv2 ABI
Finally, this adds _CALL_ELF == 2 support. ELFv1 objects can't be linked with ELFv2 objects, so this is one case where preprocessor tests in ffi.c are fine. Also, there is no need to define a new FFI_ELFv2 or somesuch value in enum ffi_abi. FFI_LINUX64 will happily serve both ABIs. * src/powerpc/ffitarget.h (FFI_V2_TYPE_FLOAT_HOMOG, FFI_V2_TYPE_DOUBLE_HOMOG, FFI_V2_TYPE_SMALL_STRUCT): Define. (FFI_TRAMPOLINE_SIZE): Define variant for ELFv2. * src/powerpc/ffi.c (FLAG_ARG_NEEDS_PSAVE): Define. (discover_homogeneous_aggregate): New function. (ffi_prep_args64): Adjust start of param save area for ELFv2. Handle homogenous floating point struct parms. (ffi_prep_cif_machdep_core): Adjust space calculation for ELFv2. Handle ELFv2 return values. Set FLAG_ARG_NEEDS_PSAVE. Handle homogenous floating point structs. (ffi_call): Increase size of smst_buffer for ELFv2. Handle ELFv2. (flush_icache): Compile for ELFv2. (ffi_prep_closure_loc): Set up ELFv2 trampoline. (ffi_closure_helper_LINUX64): Don't return all structs directly to caller. Handle homogenous floating point structs. Handle ELFv2 struct return values. * src/powerpc/linux64.S (ffi_call_LINUX64): Set up r2 for ELFv2. Adjust toc save location. Call function pointer using r12. Handle FLAG_RETURNS_SMST. Don't predict branches. * src/powerpc/linux64_closure.S (ffi_closure_LINUX64): Set up r2 for ELFv2. Define ELFv2 versions of STACKFRAME, PARMSAVE, and RETVAL. Handle possibly missing parameter save area. Handle ELFv2 return values. (.note.GNU-stack): Move inside outer #ifdef. diff -urp gcc6/libffi/src/powerpc/ffitarget.h gcc7/libffi/src/powerpc/ffitarget.h --- gcc6/libffi/src/powerpc/ffitarget.h 2013-11-15 23:19:21.692053339 +1030 +++ gcc7/libffi/src/powerpc/ffitarget.h 2013-11-15 23:48:02.452807673 +1030 @@ -122,14 +122,23 @@ typedef enum ffi_abi { defined in ffi.c, to determine the exact return type and its size. 
 */
 #define FFI_SYSV_TYPE_SMALL_STRUCT (FFI_TYPE_LAST + 2)
 
-#if defined(POWERPC64) || defined(POWERPC_AIX)
+/* Used by ELFv2 for homogenous structure returns.  */
+#define FFI_V2_TYPE_FLOAT_HOMOG		(FFI_TYPE_LAST + 1)
+#define FFI_V2_TYPE_DOUBLE_HOMOG	(FFI_TYPE_LAST + 2)
+#define FFI_V2_TYPE_SMALL_STRUCT	(FFI_TYPE_LAST + 3)
+
+#if _CALL_ELF == 2
+# define FFI_TRAMPOLINE_SIZE 32
+#else
+# if defined(POWERPC64) || defined(POWERPC_AIX)
 #  if defined(POWERPC_DARWIN64)
 #    define FFI_TRAMPOLINE_SIZE 48
 #  else
 #    define FFI_TRAMPOLINE_SIZE 24
 #  endif
-#else /* POWERPC || POWERPC_AIX */
+# else /* POWERPC || POWERPC_AIX */
 #  define FFI_TRAMPOLINE_SIZE 40
+# endif
 #endif
 
 #ifndef LIBFFI_ASM
diff -urp gcc6/libffi/src/powerpc/ffi.c gcc7/libffi/src/powerpc/ffi.c
--- gcc6/libffi/src/powerpc/ffi.c	2013-11-15 23:47:40.153680507 +1030
+++ gcc7/libffi/src/powerpc/ffi.c	2013-11-15 23:51:02.333766929 +1030
@@ -49,6 +49,7 @@ enum {
   FLAG_RETURNS_128BITS  = 1 << (31-27), /* cr6  */
 
   FLAG_ARG_NEEDS_COPY   = 1 << (31- 7),
+  FLAG_ARG_NEEDS_PSAVE  = FLAG_ARG_NEEDS_COPY, /* Used by ELFv2 */
 #ifndef __NO_FPRS__
   FLAG_FP_ARGUMENTS     = 1 << (31- 6), /* cr1.eq; specified by ABI */
 #endif
@@ -383,6 +384,45 @@ enum {
 };
 enum { ASM_NEEDS_REGISTERS64 = 4 };
 
+#if _CALL_ELF == 2
+static unsigned int
+discover_homogeneous_aggregate (const ffi_type *t, unsigned int *elnum)
+{
+  switch (t->type)
+    {
+    case FFI_TYPE_FLOAT:
+    case FFI_TYPE_DOUBLE:
+      *elnum = 1;
+      return (int) t->type;
+
+    case FFI_TYPE_STRUCT:;
+      {
+	unsigned int base_elt = 0, total_elnum = 0;
+	ffi_type **el = t->elements;
+	while (*el)
+	  {
+	    unsigned int el_elt, el_elnum = 0;
+	    el_elt = discover_homogeneous_aggregate (*el, &el_elnum);
+	    if (el_elt == 0
+		|| (base_elt && base_elt != el_elt))
+	      return 0;
+	    base_elt = el_elt;
+	    total_elnum += el_elnum;
+	    if (total_elnum > 8)
+	      return 0;
+	    el++;
+	  }
+	*elnum = total_elnum;
+	return base_elt;
+      }
+
+    default:
+      return 0;
+    }
+}
+#endif
+
+
 /* ffi_prep_args64 is called by the assembly routine once stack space
    has been allocated for the
function's arguments. @@ -470,7 +510,11 @@ ffi_prep_args64 (extended_cif *ecif, uns stacktop.c = (char *) stack + bytes; gpr_base.ul = stacktop.ul - ASM_NEEDS_REGISTERS64 - NUM_GPR_ARG_REGISTERS64; gpr_end.ul = gpr_base.ul + NUM_GPR_ARG_REGISTERS64; +#if _CALL_ELF == 2 + rest.ul = stack + 4 + NUM_GPR_ARG_REGISTERS64; +#else rest.ul = stack + 6 + NUM_GPR_ARG_REGISTERS64; +#endif fpr_base.d = gpr_base.d - NUM_FPR_ARG_REGISTERS64; fparg_count = 0; next_arg.ul = gpr_base.ul; @@ -492,6 +536,8 @@ ffi_prep_args64 (extended_cif *ecif, uns i nargs;
Tidy powerpc linux64_closure.S with defines for stack offsets
This patch prepares for ELFv2, where sizes of these areas change. It also makes some minor changes to improve code efficiency. * src/powerpc/linux64.S (ffi_call_LINUX64): Tweak restore of r28. (.note.GNU-stack): Move inside outer #ifdef. * src/powerpc/linux64_closure.S (STACKFRAME, PARMSAVE, RETVAL): Define and use throughout. (ffi_closure_LINUX64): Save fprs before buying stack. (.note.GNU-stack): Move inside outer #ifdef. diff -urp gcc4/libffi/src/powerpc/linux64.S gcc5/libffi/src/powerpc/linux64.S --- gcc4/libffi/src/powerpc/linux64.S 2013-11-15 23:03:07.337958821 +1030 +++ gcc5/libffi/src/powerpc/linux64.S 2013-11-15 23:37:49.672792802 +1030 @@ -130,7 +130,7 @@ ffi_call_LINUX64: /* Restore the registers we used and return. */ mr %r1, %r28 ld %r0, 16(%r28) - ld %r28, -32(%r1) + ld %r28, -32(%r28) mtlr%r0 ld %r29, -24(%r1) ld %r30, -16(%r1) @@ -197,8 +197,8 @@ ffi_call_LINUX64: .uleb128 0x4 .align 3 .LEFDE1: -#endif -#if defined __ELF__ defined __linux__ +# if (defined __ELF__ defined __linux__) || _CALL_ELF == 2 .section.note.GNU-stack,,@progbits +# endif #endif diff -urp gcc4/libffi/src/powerpc/linux64_closure.S gcc5/libffi/src/powerpc/linux64_closure.S --- gcc4/libffi/src/powerpc/linux64_closure.S 2013-11-15 23:03:07.333958973 +1030 +++ gcc5/libffi/src/powerpc/linux64_closure.S 2013-11-15 23:37:49.672792802 +1030 @@ -50,53 +50,57 @@ ffi_closure_LINUX64: .text .ffi_closure_LINUX64: #endif + +# 48 bytes special reg save area + 64 bytes parm save area +# + 16 bytes retval area + 13*8 bytes fpr save area + round to 16 +# define STACKFRAME 240 +# define PARMSAVE 48 +# define RETVAL PARMSAVE+64 + .LFB1: - # save general regs into parm save area - std %r3, 48(%r1) - std %r4, 56(%r1) - std %r5, 64(%r1) - std %r6, 72(%r1) mflr%r0 + # Save general regs into parm save area + # This is the parameter save area set up by our caller. 
+ std %r3, PARMSAVE+0(%r1) + std %r4, PARMSAVE+8(%r1) + std %r5, PARMSAVE+16(%r1) + std %r6, PARMSAVE+24(%r1) + std %r7, PARMSAVE+32(%r1) + std %r8, PARMSAVE+40(%r1) + std %r9, PARMSAVE+48(%r1) + std %r10, PARMSAVE+56(%r1) - std %r7, 80(%r1) - std %r8, 88(%r1) - std %r9, 96(%r1) - std %r10, 104(%r1) std %r0, 16(%r1) - # mandatory 48 bytes special reg save area + 64 bytes parm save area - # + 16 bytes retval area + 13*8 bytes fpr save area + round to 16 - stdu%r1, -240(%r1) -.LCFI0: + # load up the pointer to the parm save area + addi%r5, %r1, PARMSAVE # next save fpr 1 to fpr 13 - stfd %f1, 128+(0*8)(%r1) - stfd %f2, 128+(1*8)(%r1) - stfd %f3, 128+(2*8)(%r1) - stfd %f4, 128+(3*8)(%r1) - stfd %f5, 128+(4*8)(%r1) - stfd %f6, 128+(5*8)(%r1) - stfd %f7, 128+(6*8)(%r1) - stfd %f8, 128+(7*8)(%r1) - stfd %f9, 128+(8*8)(%r1) - stfd %f10, 128+(9*8)(%r1) - stfd %f11, 128+(10*8)(%r1) - stfd %f12, 128+(11*8)(%r1) - stfd %f13, 128+(12*8)(%r1) + stfd%f1, -104+(0*8)(%r1) + stfd%f2, -104+(1*8)(%r1) + stfd%f3, -104+(2*8)(%r1) + stfd%f4, -104+(3*8)(%r1) + stfd%f5, -104+(4*8)(%r1) + stfd%f6, -104+(5*8)(%r1) + stfd%f7, -104+(6*8)(%r1) + stfd%f8, -104+(7*8)(%r1) + stfd%f9, -104+(8*8)(%r1) + stfd%f10, -104+(9*8)(%r1) + stfd%f11, -104+(10*8)(%r1) + stfd%f12, -104+(11*8)(%r1) + stfd%f13, -104+(12*8)(%r1) - # set up registers for the routine that actually does the work - # get the context pointer from the trampoline - mr %r3, %r11 + # load up the pointer to the saved fpr registers */ + addi%r6, %r1, -104 - # now load up the pointer to the result storage - addi %r4, %r1, 112 + # load up the pointer to the result storage + addi%r4, %r1, -STACKFRAME+RETVAL - # now load up the pointer to the parameter save area - # in the previous frame - addi %r5, %r1, 240 + 48 + stdu%r1, -STACKFRAME(%r1) +.LCFI0: - # now load up the pointer to the saved fpr registers */ - addi %r6, %r1, 128 + # get the context pointer from the trampoline + mr %r3, %r11 # make the call #ifdef _CALL_LINUX @@ -115,7 +119,7 @@ 
ffi_closure_LINUX64: mflr %r4# move address of .Lret to r4 sldi %r3, %r3, 4# now multiply return type by 16 addi %r4, %r4, .Lret_type0 - .Lret - ld %r0, 240+16(%r1) + ld %r0, STACKFRAME+16(%r1) add %r3, %r3, %r4 # add contents of table to table
Align powerpc64 structs passed by value as per ABI
The powerpc64 ABIs align structs passed by value, a fact ignored by gcc for quite some time. Since gcc now does the correct alignment, libffi needs to follow suit. This ought to be made selectable via a new abi value, and the #ifdefs removed from ffi.c along with almost all the other #ifdefs present there and in assembly. * src/powerpc/ffi.c (ffi_prep_args64): Align struct parameters according to __STRUCT_PARM_ALIGN__. (ffi_prep_cif_machdep_core): Likewise. (ffi_closure_helper_LINUX64): Likewise. diff -urp gcc5/libffi/src/powerpc/ffi.c gcc6/libffi/src/powerpc/ffi.c --- gcc5/libffi/src/powerpc/ffi.c 2013-11-15 23:47:31.890003986 +1030 +++ gcc6/libffi/src/powerpc/ffi.c 2013-11-15 23:47:40.153680507 +1030 @@ -428,6 +428,7 @@ ffi_prep_args64 (extended_cif *ecif, uns unsigned long *ul; float *f; double *d; +size_t p; } valp; /* 'stacktop' points at the previous backchain pointer. */ @@ -462,6 +463,9 @@ ffi_prep_args64 (extended_cif *ecif, uns double **d; } p_argv; unsigned long gprvalue; +#ifdef __STRUCT_PARM_ALIGN__ + unsigned long align; +#endif stacktop.c = (char *) stack + bytes; gpr_base.ul = stacktop.ul - ASM_NEEDS_REGISTERS64 - NUM_GPR_ARG_REGISTERS64; @@ -538,6 +542,13 @@ ffi_prep_args64 (extended_cif *ecif, uns #endif case FFI_TYPE_STRUCT: +#ifdef __STRUCT_PARM_ALIGN__ + align = (*ptr)-alignment; + if (align __STRUCT_PARM_ALIGN__) + align = __STRUCT_PARM_ALIGN__; + if (align 1) + next_arg.p = ALIGN (next_arg.p, align); +#endif words = ((*ptr)-size + 7) / 8; if (next_arg.ul = gpr_base.ul next_arg.ul + words gpr_end.ul) { @@ -828,6 +839,10 @@ ffi_prep_cif_machdep_core (ffi_cif *cif) else for (ptr = cif-arg_types, i = cif-nargs; i 0; i--, ptr++) { +#ifdef __STRUCT_PARM_ALIGN__ + unsigned int align; +#endif + switch ((*ptr)-type) { #if FFI_TYPE_LONGDOUBLE != FFI_TYPE_DOUBLE @@ -843,6 +858,14 @@ ffi_prep_cif_machdep_core (ffi_cif *cif) break; case FFI_TYPE_STRUCT: +#ifdef __STRUCT_PARM_ALIGN__ + align = (*ptr)-alignment; + if (align __STRUCT_PARM_ALIGN__) + align = 
__STRUCT_PARM_ALIGN__; + align = align / 8; + if (align 1) + intarg_count = ALIGN (intarg_count, align); +#endif intarg_count += ((*ptr)-size + 7) / 8; break; @@ -1383,6 +1406,9 @@ ffi_closure_helper_LINUX64 (ffi_closure unsigned long i, avn, nfixedargs; ffi_cif *cif; ffi_dblfl *end_pfr = pfr + NUM_FPR_ARG_REGISTERS64; +#ifdef __STRUCT_PARM_ALIGN__ + unsigned long align; +#endif cif = closure-cif; avalue = alloca (cif-nargs * sizeof (void *)); @@ -1437,6 +1463,13 @@ ffi_closure_helper_LINUX64 (ffi_closure break; case FFI_TYPE_STRUCT: +#ifdef __STRUCT_PARM_ALIGN__ + align = arg_types[i]-alignment; + if (align __STRUCT_PARM_ALIGN__) + align = __STRUCT_PARM_ALIGN__; + if (align 1) + pst = (unsigned long *) ALIGN ((size_t) pst, align); +#endif #ifndef __LITTLE_ENDIAN__ /* Structures with size less than eight bytes are passed left-padded. */ -- Alan Modra Australia Development Lab, IBM
Committed: arc/constraints.md: simplify Rcq definition
2013-11-16  Joern Rennecke  <joern.renne...@embecosm.com>

	* config/arc/constraints.md (Rcq): Simplify register number test.

Index: config/arc/constraints.md
===================================================================
--- config/arc/constraints.md	(revision 204899)
+++ config/arc/constraints.md	(revision 204900)
@@ -338,7 +338,7 @@
 (define_constraint "Rcq"
   (and (match_code "REG")
        (match_test "TARGET_Rcq && !arc_ccfsm_cond_exec_p ()
-		    && ((((REGNO (op) & 7) ^ 4) - 4) & 15) == REGNO (op)")))
+		    && IN_RANGE (REGNO (op) ^ 4, 4, 11)")))
 
 ; If we need a reload, we generally want to steer reload to use three-address
 ; alternatives in preference of two-address alternatives, unless the
Committed: arc.c: Make predication in delay slots explicit
2013-11-16 Joern Rennecke joern.renne...@embecosm.com * config/arc/arc.c (arc_predicate_delay_insns): New function. (pass_data_arc_predicate_delay_insns): New pass_data instance. (pass_arc_predicate_delay_insns): New subclass of rtl_opt_class. (make_pass_arc_predicate_delay_insns): New function. (arc_init): Register pass_arc_predicate_delay_insns if flag_delayed_branch is active. Index: config/arc/arc.c === --- config/arc/arc.c(revision 204900) +++ config/arc/arc.c(revision 204901) @@ -632,6 +632,44 @@ make_pass_arc_ifcvt (gcc::context *ctxt) return new pass_arc_ifcvt (ctxt); } +static unsigned arc_predicate_delay_insns (void); + +namespace { + +const pass_data pass_data_arc_predicate_delay_insns = +{ + RTL_PASS, + arc_predicate_delay_insns, /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + false, /* has_gate */ + true,/* has_execute */ + TV_IFCVT2, /* tv_id */ + 0, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + TODO_df_finish /* todo_flags_finish */ +}; + +class pass_arc_predicate_delay_insns : public rtl_opt_pass +{ +public: + pass_arc_predicate_delay_insns(gcc::context *ctxt) + : rtl_opt_pass(pass_data_arc_predicate_delay_insns, ctxt) + {} + + /* opt_pass methods: */ + unsigned int execute () { return arc_predicate_delay_insns (); } +}; + +} // anon namespace + +rtl_opt_pass * +make_pass_arc_predicate_delay_insns (gcc::context *ctxt) +{ + return new pass_arc_predicate_delay_insns (ctxt); +} + /* Called by OVERRIDE_OPTIONS to initialize various things. 
*/ void @@ -752,6 +790,16 @@ arc_init (void) register_pass (arc_ifcvt4_info); register_pass (arc_ifcvt5_info); } + + if (flag_delayed_branch) +{ + opt_pass *pass_arc_predicate_delay_insns + = make_pass_arc_predicate_delay_insns (g); + struct register_pass_info arc_predicate_delay_info + = { pass_arc_predicate_delay_insns, dbr, 1, PASS_POS_INSERT_AFTER }; + + register_pass (arc_predicate_delay_info); +} } /* Check ARC options, generate derived target attributes. */ @@ -8296,6 +8344,74 @@ arc_ifcvt (void) } return 0; } + +/* Find annulled delay insns and convert them to use the appropriate predicate. + This allows branch shortening to size up these insns properly. */ + +static unsigned +arc_predicate_delay_insns (void) +{ + for (rtx insn = get_insns (); insn; insn = NEXT_INSN (insn)) +{ + rtx pat, jump, dlay, src, cond, *patp; + int reverse; + + if (!NONJUMP_INSN_P (insn) + || GET_CODE (pat = PATTERN (insn)) != SEQUENCE) + continue; + jump = XVECEXP (pat, 0, 0); + dlay = XVECEXP (pat, 0, 1); + if (!JUMP_P (jump) || !INSN_ANNULLED_BRANCH_P (jump)) + continue; + /* If the branch insn does the annulling, leave the delay insn alone. */ + if (!TARGET_AT_DBR_CONDEXEC !INSN_FROM_TARGET_P (dlay)) + continue; + /* ??? Could also leave DLAY un-conditionalized if its target is dead +on the other path. 
*/ + gcc_assert (GET_CODE (PATTERN (jump)) == SET); + gcc_assert (SET_DEST (PATTERN (jump)) == pc_rtx); + src = SET_SRC (PATTERN (jump)); + gcc_assert (GET_CODE (src) == IF_THEN_ELSE); + cond = XEXP (src, 0); + if (XEXP (src, 2) == pc_rtx) + reverse = 0; + else if (XEXP (src, 1) == pc_rtx) + reverse = 1; + else + gcc_unreachable (); + if (!INSN_FROM_TARGET_P (dlay) != reverse) + { + enum machine_mode ccm = GET_MODE (XEXP (cond, 0)); + enum rtx_code code = reverse_condition (GET_CODE (cond)); + if (code == UNKNOWN || ccm == CC_FP_GTmode || ccm == CC_FP_GEmode) + code = reverse_condition_maybe_unordered (GET_CODE (cond)); + + cond = gen_rtx_fmt_ee (code, GET_MODE (cond), +copy_rtx (XEXP (cond, 0)), +copy_rtx (XEXP (cond, 1))); + } + else + cond = copy_rtx (cond); + patp = PATTERN (dlay); + pat = *patp; + /* dwarf2out.c:dwarf2out_frame_debug_expr doesn't know +what to do with COND_EXEC. */ + if (RTX_FRAME_RELATED_P (dlay)) + { + /* As this is the delay slot insn of an anulled branch, +dwarf2out.c:scan_trace understands the anulling semantics +without the COND_EXEC. */ + rtx note = alloc_reg_note (REG_FRAME_RELATED_EXPR, pat, +REG_NOTES (dlay)); + validate_change (dlay, REG_NOTES (dlay), note, 1); + } +
Re: Implement C11/C++11 set of UCNs allowed in identifiers
On Fri, 15 Nov 2013, Tom Tromey wrote:
> Joseph> Any comments on whether we should consider the Unicode character
> Joseph> data - UnicodeData.txt and DerivedNormalizationProps.txt, a total
> Joseph> of about 2MB - as source code for the generated ucnid.h that
> Joseph> should be checked into the repository and included in releases,
> Joseph> or as an external build tool or system library that doesn't need
> Joseph> including in the GCC source code?
>
> The last time this came up, for something in libgcj, it wasn't
> permissible, according to the Unicode rules, to check in the file.
> I haven't checked whether this has changed.

According to the NEWS for GNU miscfiles-1.4, "License worries about the
Unicode data are no longer a problem due to a change in the Unicode
license.", and according to http://www.gnu.org/licenses/license-list.html,
"It is a lax permissive license, compatible with all versions of the
GPL.".  My recollection is that previously there was a license peculiarity
meaning that you could import the character data and export an equivalent
file under a free software license, but not distribute the original file
under such a license.  (The license text is included in the generated
ucnid.h.)

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
On Sat, 16 Nov 2013, Chung-Lin Tang wrote:
> >> +/* Local prototypes.  */
> >
> > I'd much prefer not to have any of those.  Achieve this by putting
> >
> > +struct gcc_target targetm = TARGET_INITIALIZER;
> >
> > along with all the necessary definitions at the end of the file (and
> > reordering some other functions).
>
> I would rather keep it that way.  The ARM backend is another example of
> this.

I agree with Bernd's preference of topologically sorting static functions
/ variables so forward declarations are only needed in cases of recursion.

I sometimes think it should be possible to convert many target macros to
hooks, including generating function definitions from the macro
definitions in .h files, with a lot more automation than I think has been
used for that before.  Some back ends using a style that rather requires
forward function declarations is a needless complication for that sort of
thing (indeed, if anyone were working on automated target macro to hook
conversion, I'd suggest an early automated change should be making all
back ends define targetm at the end of the file and avoid forward static
declarations where possible).

-- 
Joseph S. Myers
jos...@codesourcery.com
[0/10] Replace host_integerp and tree_low_cst
After the patch that went in yesterday, all calls to host_integerp and
tree_low_cst pass a constant "pos" argument.  This series replaces each
function with two separate ones:

    host_integerp (x, 0) -> tree_fits_shwi_p (x)
    host_integerp (x, 1) -> tree_fits_uhwi_p (x)
    tree_low_cst (x, 0) -> tree_to_shwi (x)
    tree_low_cst (x, 1) -> tree_to_uhwi (x)

The change is part of the wide-int conversion.  In some ways it's one of
the more bikesheddy parts because, unlike wide_int itself, it just changes
an interface without adding new functionality.  The two main reasons for
doing it IMO are:

1. the new functions are direct analogues of wide-int functions

2. the return type of tree_to_*hwi matches the function name, whereas
   tree_low_cst (x, 1) gets an unsigned value as a signed type

The series is pretty laboured because I wanted to separate out the large
mechanical changes from the small manual changes for ease of review.

Tested by building:

    aarch64-linux-gnueabi alpha-linux-gnu arm-linux-gnueabi c6x-elf
    epiphany-elf ia64-linux-gnu iq2000-elf m32c-elf mep-elf mips-linux-gnu
    picochip-elf powerpc-linux-gnu s390-linux-gnu sparc-linux-gnu
    x86_64-darwin

before and after the patch, checking that there were no new warnings, and
comparing the before and after assembly output at -O2 for gcc.dg, g++.dg
and gcc.c-torture.  Also tested normally on x86_64-linux-gnu and
powerpc64-linux-gnu.

Thanks,
Richard
[1/10] Add tree_fits_shwi_p and tree_fits_uhwi_p
Add tree_fits_shwi_p and tree_fits_uhwi_p.  The implementations are taken
directly from host_integerp.

Thanks,
Richard

gcc/
	* tree.h (tree_fits_shwi_p, tree_fits_uhwi_p): Declare.
	* tree.c (tree_fits_shwi_p, tree_fits_uhwi_p): Define.

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2013-11-16 09:09:56.388037088 +0000
+++ gcc/tree.h	2013-11-16 09:11:53.535874667 +0000
@@ -3659,6 +3659,16 @@ extern int host_integerp (const_tree, in
   ATTRIBUTE_PURE /* host_integerp is pure only when checking is disabled.  */
 #endif
   ;
+extern bool tree_fits_shwi_p (const_tree)
+#ifndef ENABLE_TREE_CHECKING
+  ATTRIBUTE_PURE /* tree_fits_shwi_p is pure only when checking is disabled.  */
+#endif
+  ;
+extern bool tree_fits_uhwi_p (const_tree)
+#ifndef ENABLE_TREE_CHECKING
+  ATTRIBUTE_PURE /* tree_fits_uhwi_p is pure only when checking is disabled.  */
+#endif
+  ;
 extern HOST_WIDE_INT tree_low_cst (const_tree, int);
 #if !defined ENABLE_TREE_CHECKING && (GCC_VERSION >= 4003)
 extern inline __attribute__ ((__gnu_inline__)) HOST_WIDE_INT
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2013-11-16 09:09:56.388037088 +0000
+++ gcc/tree.c	2013-11-16 09:11:53.534874659 +0000
@@ -6990,6 +6990,32 @@ host_integerp (const_tree t, int pos)
 	      || (pos && TREE_INT_CST_HIGH (t) == 0)));
 }
 
+/* Return true if T is an INTEGER_CST whose numerical value (extended
+   according to TYPE_UNSIGNED) fits in a signed HOST_WIDE_INT.  */
+
+bool
+tree_fits_shwi_p (const_tree t)
+{
+  return (t != NULL_TREE
+	  && TREE_CODE (t) == INTEGER_CST
+	  && ((TREE_INT_CST_HIGH (t) == 0
+	       && (HOST_WIDE_INT) TREE_INT_CST_LOW (t) >= 0)
+	      || (TREE_INT_CST_HIGH (t) == -1
+		  && (HOST_WIDE_INT) TREE_INT_CST_LOW (t) < 0
+		  && !TYPE_UNSIGNED (TREE_TYPE (t)))));
+}
+
+/* Return true if T is an INTEGER_CST whose numerical value (extended
+   according to TYPE_UNSIGNED) fits in an unsigned HOST_WIDE_INT.  */
+
+bool
+tree_fits_uhwi_p (const_tree t)
+{
+  return (t != NULL_TREE
+	  && TREE_CODE (t) == INTEGER_CST
+	  && TREE_INT_CST_HIGH (t) == 0);
+}
+
 /* Return the HOST_WIDE_INT least significant bits of T if it is an
    INTEGER_CST and there is no overflow.  POS is nonzero if the result
    must be non-negative.  We must be able to satisfy the above conditions.  */
[2/10] Mechanical replacement of host_integerp (..., 0)
This is the result of using sed to replace all single-line
"host_integerp (x, 0)"s with "tree_fits_shwi_p (x)", taking care to
handle bracket nesting in x.

Thanks,
Richard

gcc/ada/
	* gcc-interface/cuintp.c: Replace host_integerp (..., 0)
	with tree_fits_shwi_p throughout.

gcc/c-family/
	* c-ada-spec.c, c-common.c, c-format.c, c-pretty-print.c: Replace
	host_integerp (..., 0) with tree_fits_shwi_p throughout.

gcc/c/
	* c-parser.c: Replace host_integerp (..., 0) with
	tree_fits_shwi_p throughout.

gcc/cp/
	* error.c, init.c, parser.c, semantics.c: Replace
	host_integerp (..., 0) with tree_fits_shwi_p throughout.

gcc/go/
	* gofrontend/expressions.cc: Replace host_integerp (..., 0) with
	tree_fits_shwi_p throughout.

gcc/java/
	* class.c, expr.c: Replace host_integerp (..., 0) with
	tree_fits_shwi_p throughout.

gcc/
	* builtins.c, config/alpha/alpha.c, config/c6x/predicates.md,
	config/ia64/predicates.md, config/iq2000/iq2000.c, config/mips/mips.c,
	config/s390/s390.c, dbxout.c, dwarf2out.c, except.c, explow.c, expr.c,
	expr.h, fold-const.c, gimple-fold.c, gimple-ssa-strength-reduction.c,
	gimple.c, godump.c, graphite-scop-detection.c,
	graphite-sese-to-poly.c, omp-low.c, predict.c, rtlanal.c, sdbout.c,
	simplify-rtx.c, stor-layout.c, tree-data-ref.c, tree-dfa.c,
	tree-pretty-print.c, tree-sra.c, tree-ssa-alias.c,
	tree-ssa-forwprop.c, tree-ssa-loop-ivopts.c, tree-ssa-loop-prefetch.c,
	tree-ssa-math-opts.c, tree-ssa-phiopt.c, tree-ssa-reassoc.c,
	tree-ssa-sccvn.c, tree-ssa-strlen.c, tree-ssa-structalias.c,
	tree-vect-data-refs.c, tree-vect-patterns.c, tree-vectorizer.h,
	tree.c, var-tracking.c, varasm.c: Replace host_integerp (..., 0)
	with tree_fits_shwi_p throughout.

tree-to-shwi.diff.bz2
Description: BZip2 compressed data
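A sed pattern of the kind described has to capture one level of bracket nesting inside the first argument. The invocation below is a hypothetical illustration of such a rewrite, not the script actually used for the conversion; it handles a single nesting level only.

```shell
# Rewrite a single-line "host_integerp (X, 0)" call to
# "tree_fits_shwi_p (X)", allowing one level of parentheses in X.
echo 'if (host_integerp (TYPE_SIZE (type), 0))' \
  | sed 's/host_integerp (\(\([^()]*([^()]*)\)*[^()]*\), 0)/tree_fits_shwi_p (\1)/'
```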
[3/10] Mechanical replacement of host_integerp (..., 1)
Like the previous patch, but for host_integerp (x, 1) -> tree_fits_uhwi_p (x).

Thanks,
Richard

gcc/ada/
	* gcc-interface/decl.c, gcc-interface/misc.c, gcc-interface/utils.c:
	Replace host_integerp (..., 1) with tree_fits_uhwi_p throughout.

gcc/c-family/
	* c-ada-spec.c, c-common.c, c-pretty-print.c: Replace
	host_integerp (..., 1) with tree_fits_uhwi_p throughout.

gcc/cp/
	* decl.c: Replace host_integerp (..., 1) with tree_fits_uhwi_p
	throughout.

gcc/
	* builtins.c, config/alpha/alpha.c, config/iq2000/iq2000.c,
	config/mips/mips.c, dbxout.c, dwarf2out.c, expr.c, fold-const.c,
	gimple-fold.c, godump.c, omp-low.c, predict.c, sdbout.c,
	stor-layout.c, tree-dfa.c, tree-sra.c, tree-ssa-forwprop.c,
	tree-ssa-loop-prefetch.c, tree-ssa-phiopt.c, tree-ssa-sccvn.c,
	tree-ssa-strlen.c, tree-ssa-structalias.c, tree-vect-data-refs.c,
	tree-vect-patterns.c, tree.c, varasm.c, alias.c, cfgexpand.c,
	config/aarch64/aarch64.c, config/arm/arm.c,
	config/epiphany/epiphany.c, config/i386/i386.c,
	config/m32c/m32c-pragma.c, config/mep/mep-pragma.c,
	config/rs6000/rs6000.c, config/sparc/sparc.c, emit-rtl.c, function.c,
	gimplify.c, ipa-prop.c, stmt.c, trans-mem.c, tree-cfg.c,
	tree-object-size.c, tree-ssa-ccp.c, tree-ssa-loop-ivcanon.c,
	tree-stdarg.c, tree-switch-conversion.c, tree-vect-generic.c,
	tree-vrp.c, tsan.c, ubsan.c: Replace host_integerp (..., 1) with
	tree_fits_uhwi_p throughout.

tree-to-uhwi.diff.bz2
Description: BZip2 compressed data
[4/10] Mop up remaining host_integerp calls
Handle host_integerp references that weren't caught by the sed. Thanks, Richard gcc/ada/ * gcc-interface/cuintp.c: Update comments to refer to tree_fits_shwi_p rather than host_integerp. * gcc-interface/decl.c (gnat_to_gnu_entity): Use tree_fits_uhwi_p rather than host_integerp. * gcc-interface/utils.c (rest_of_record_type_compilation): Likewise. gcc/ * expr.h: Update comments to refer to tree_fits_[su]hwi_p rather than host_integerp.

Index: gcc/ada/gcc-interface/cuintp.c
===================================================================
--- gcc/ada/gcc-interface/cuintp.c	2013-11-16 09:14:25.293995960 +0000
+++ gcc/ada/gcc-interface/cuintp.c	2013-11-16 09:33:31.591920981 +0000
@@ -150,7 +150,7 @@ UI_From_gnu (tree Input)
   Int_Vector vec;
 
 #if HOST_BITS_PER_WIDE_INT == 64
-  /* On 64-bit hosts, host_integerp tells whether the input fits in a
+  /* On 64-bit hosts, tree_fits_shwi_p tells whether the input fits in a
      signed 64-bit integer.  Then a truncation tells whether it fits
      in a signed 32-bit integer.  */
   if (tree_fits_shwi_p (Input))
@@ -162,7 +162,7 @@ UI_From_gnu (tree Input)
   else
     return No_Uint;
 #else
-  /* On 32-bit hosts, host_integerp tells whether the input fits in a
+  /* On 32-bit hosts, tree_fits_shwi_p tells whether the input fits in a
      signed 32-bit integer.  Then a sign test tells whether it fits in
      a signed 64-bit integer.  */
   if (tree_fits_shwi_p (Input))
Index: gcc/ada/gcc-interface/decl.c
===================================================================
--- gcc/ada/gcc-interface/decl.c	2013-11-16 09:22:06.982466042 +0000
+++ gcc/ada/gcc-interface/decl.c	2013-11-16 09:33:31.614921192 +0000
@@ -1480,8 +1480,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
	      && AGGREGATE_TYPE_P (gnu_type)
	      && tree_fits_uhwi_p (TYPE_SIZE_UNIT (gnu_type))
	      && !(TYPE_IS_PADDING_P (gnu_type)
-		   && !host_integerp (TYPE_SIZE_UNIT
-				      (TREE_TYPE (TYPE_FIELDS (gnu_type))), 1)))
+		   && !tree_fits_uhwi_p (TYPE_SIZE_UNIT
+					 (TREE_TYPE (TYPE_FIELDS (gnu_type)))))
	    static_p = true;
 
       /* Now create the variable or the constant and set various flags.  */
Index: gcc/ada/gcc-interface/utils.c
===================================================================
--- gcc/ada/gcc-interface/utils.c	2013-11-16 09:22:06.985466064 +0000
+++ gcc/ada/gcc-interface/utils.c	2013-11-16 09:33:31.616921211 +0000
@@ -1753,8 +1753,8 @@ rest_of_record_type_compilation (tree re
	      && TREE_CODE (curpos) == PLUS_EXPR
	      && tree_fits_uhwi_p (TREE_OPERAND (curpos, 1))
	      && TREE_CODE (TREE_OPERAND (curpos, 0)) == MULT_EXPR
-	      && host_integerp
-		 (TREE_OPERAND (TREE_OPERAND (curpos, 0), 1), 1))
+	      && tree_fits_uhwi_p
+		 (TREE_OPERAND (TREE_OPERAND (curpos, 0), 1)))
	    {
	      tree offset = TREE_OPERAND (TREE_OPERAND (curpos, 0), 0);
	      unsigned HOST_WIDE_INT addend
Index: gcc/expr.h
===================================================================
--- gcc/expr.h	2013-11-16 09:14:25.398996758 +0000
+++ gcc/expr.h	2013-11-16 09:33:31.719922154 +0000
@@ -26,7 +26,7 @@
 #define GCC_EXPR_H
 
 #include "rtl.h"	/* For optimize_size */
 #include "flags.h"
-/* For host_integerp, tree_low_cst, fold_convert, size_binop, ssize_int,
+/* For tree_fits_[su]hwi_p, tree_low_cst, fold_convert, size_binop, ssize_int,
    TREE_CODE, TYPE_SIZE, int_size_in_bytes,    */
 #include "tree-core.h"	/* For GET_MODE_BITSIZE, word_mode */
[5/10] Add tree_to_shwi and tree_to_uhwi
Add tree_to_shwi and tree_to_uhwi. Initially tree_to_uhwi returns a HOST_WIDE_INT, so that it's a direct replacement for tree_low_cst. Patch 10 makes it return unsigned HOST_WIDE_INT instead. Thanks, Richard gcc/ * tree.h (tree_to_shwi, tree_to_uhwi): Declare, with inline expansions. * tree.c (tree_to_shwi, tree_to_uhwi): New functions.

Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2013-11-15 16:46:27.420395607 +0000
+++ gcc/tree.c	2013-11-15 16:47:15.226216885 +0000
@@ -7027,6 +7027,28 @@ tree_low_cst (const_tree t, int pos)
   return TREE_INT_CST_LOW (t);
 }
 
+/* T is an INTEGER_CST whose numerical value (extended according to
+   TYPE_UNSIGNED) fits in a signed HOST_WIDE_INT.  Return that
+   HOST_WIDE_INT.  */
+
+HOST_WIDE_INT
+tree_to_shwi (const_tree t)
+{
+  gcc_assert (tree_fits_shwi_p (t));
+  return TREE_INT_CST_LOW (t);
+}
+
+/* T is an INTEGER_CST whose numerical value (extended according to
+   TYPE_UNSIGNED) fits in an unsigned HOST_WIDE_INT.  Return that
+   HOST_WIDE_INT.  */
+
+HOST_WIDE_INT
+tree_to_uhwi (const_tree t)
+{
+  gcc_assert (tree_fits_uhwi_p (t));
+  return TREE_INT_CST_LOW (t);
+}
+
 /* Return the most significant (sign) bit of T.  */
 
 int
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2013-11-15 16:46:26.263399881 +0000
+++ gcc/tree.h	2013-11-15 16:46:56.569287095 +0000
@@ -3662,6 +3662,8 @@ extern bool tree_fits_uhwi_p (const_tree
 #endif
   ;
 extern HOST_WIDE_INT tree_low_cst (const_tree, int);
+extern HOST_WIDE_INT tree_to_shwi (const_tree);
+extern HOST_WIDE_INT tree_to_uhwi (const_tree);
 #if !defined ENABLE_TREE_CHECKING && (GCC_VERSION >= 4003)
 extern inline __attribute__ ((__gnu_inline__)) HOST_WIDE_INT
 tree_low_cst (const_tree t, int pos)
@@ -3669,6 +3671,20 @@ tree_low_cst (const_tree t, int pos)
   gcc_assert (host_integerp (t, pos));
   return TREE_INT_CST_LOW (t);
 }
+
+extern inline __attribute__ ((__gnu_inline__)) HOST_WIDE_INT
+tree_to_shwi (const_tree t)
+{
+  gcc_assert (tree_fits_shwi_p (t));
+  return TREE_INT_CST_LOW (t);
+}
+
+extern inline __attribute__ ((__gnu_inline__)) HOST_WIDE_INT
+tree_to_uhwi (const_tree t)
+{
+  gcc_assert (tree_fits_uhwi_p (t));
+  return TREE_INT_CST_LOW (t);
+}
 #endif
 extern int tree_int_cst_sgn (const_tree);
 extern int tree_int_cst_sign_bit (const_tree);
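To make the shwi/uhwi distinction concrete outside of GCC: the sketch below models the two "fits" predicates over the double-word constants of this era, with int64_t standing in for HOST_WIDE_INT and invented struct and field names — an illustration of the semantics, not GCC code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a pre-wide-int INTEGER_CST: the value is
   HIGH * 2^64 + LOW in two's complement, plus the signedness of its
   type.  int64_t stands in for HOST_WIDE_INT.  */
struct int_cst
{
  int64_t high;		/* plays TREE_INT_CST_HIGH */
  uint64_t low;		/* plays TREE_INT_CST_LOW */
  bool unsigned_type;	/* plays TYPE_UNSIGNED */
};

/* Model of tree_fits_shwi_p: the value fits in a signed
   HOST_WIDE_INT, i.e. it is either a non-negative single-word value
   or a negative value of a signed type whose upper word is all
   ones.  */
static bool
fits_shwi_p (struct int_cst t)
{
  return (t.high == 0 && (int64_t) t.low >= 0)
	 || (t.high == -1 && (int64_t) t.low < 0 && !t.unsigned_type);
}

/* Model of tree_fits_uhwi_p: the value fits in an unsigned
   HOST_WIDE_INT, i.e. the upper word is zero.  */
static bool
fits_uhwi_p (struct int_cst t)
{
  return t.high == 0;
}
```

E.g. 2^64 - 1 fits an unsigned HWI but not a signed one, while -5 is the other way around — which is exactly why the mechanical patches have to pick the right predicate per call site.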
[6/10] Mechanical replacement of tree_low_cst (..., 0)
Like patch 2, but using sed to replace tree_low_cst (x, 0) with tree_to_shwi (x). Thanks, Richard gcc/c-family/ * c-common.c, c-format.c, c-omp.c, c-pretty-print.c: Replace tree_low_cst (..., 0) with tree_to_shwi throughout. gcc/c/ * c-parser.c: Replace tree_low_cst (..., 0) with tree_to_shwi throughout. gcc/cp/ * class.c, dump.c, error.c, init.c, method.c, parser.c, semantics.c: Replace tree_low_cst (..., 0) with tree_to_shwi throughout. gcc/go/ * gofrontend/expressions.cc: Replace tree_low_cst (..., 0) with tree_to_shwi throughout. gcc/java/ * class.c, expr.c: Replace tree_low_cst (..., 0) with tree_to_shwi throughout. gcc/objc/ * objc-next-runtime-abi-02.c: Replace tree_low_cst (..., 0) with tree_to_shwi throughout. gcc/ * builtins.c, cilk-common.c, config/aarch64/aarch64.c, config/alpha/alpha.c, config/arm/arm.c, config/c6x/predicates.md, config/i386/i386.c, config/ia64/predicates.md, config/s390/s390.c, coverage.c, dbxout.c, dwarf2out.c, except.c, explow.c, expr.c, expr.h, fold-const.c, gimple-fold.c, godump.c, ipa-prop.c, omp-low.c, predict.c, rtlanal.c, sdbout.c, stmt.c, stor-layout.c, targhooks.c, tree-cfg.c, tree-data-ref.c, tree-inline.c, tree-ssa-forwprop.c, tree-ssa-loop-prefetch.c, tree-ssa-phiopt.c, tree-ssa-sccvn.c, tree-ssa-strlen.c, tree-stdarg.c, tree-vect-data-refs.c, tree-vect-patterns.c, tree.c, tree.h, var-tracking.c, varasm.c: Replace tree_low_cst (..., 0) with tree_to_shwi throughout. tree-to-shwi.diff.bz2 Description: BZip2 compressed data
[7/10] Mechanical replacement of tree_low_cst (..., 1)
Like the previous patch, but for tree_low_cst (x, 1) -> tree_to_uhwi (x). Thanks, Richard gcc/ada/ * gcc-interface/decl.c, gcc-interface/utils.c, gcc-interface/utils2.c: Replace tree_low_cst (..., 1) with tree_to_uhwi throughout. gcc/c-family/ * c-common.c, c-cppbuiltin.c: Replace tree_low_cst (..., 1) with tree_to_uhwi throughout. gcc/c/ * c-decl.c, c-typeck.c: Replace tree_low_cst (..., 1) with tree_to_uhwi throughout. gcc/cp/ * call.c, class.c, decl.c, error.c: Replace tree_low_cst (..., 1) with tree_to_uhwi throughout. gcc/objc/ * objc-encoding.c: Replace tree_low_cst (..., 1) with tree_to_uhwi throughout. gcc/ * alias.c, asan.c, builtins.c, cfgexpand.c, cgraph.c, config/aarch64/aarch64.c, config/alpha/predicates.md, config/arm/arm.c, config/darwin.c, config/epiphany/epiphany.c, config/i386/i386.c, config/iq2000/iq2000.c, config/m32c/m32c-pragma.c, config/mep/mep-pragma.c, config/mips/mips.c, config/picochip/picochip.c, config/rs6000/rs6000.c, cppbuiltin.c, dbxout.c, dwarf2out.c, emit-rtl.c, except.c, expr.c, fold-const.c, function.c, gimple-fold.c, godump.c, ipa-cp.c, ipa-prop.c, omp-low.c, predict.c, sdbout.c, stor-layout.c, trans-mem.c, tree-object-size.c, tree-sra.c, tree-ssa-ccp.c, tree-ssa-forwprop.c, tree-ssa-loop-ivcanon.c, tree-ssa-loop-ivopts.c, tree-ssa-loop-niter.c, tree-ssa-loop-prefetch.c, tree-ssa-strlen.c, tree-stdarg.c, tree-switch-conversion.c, tree-vect-generic.c, tree-vect-loop.c, tree-vect-patterns.c, tree-vrp.c, tree.c, tsan.c, ubsan.c, varasm.c: Replace tree_low_cst (..., 1) with tree_to_uhwi throughout. tree-to-uhwi.diff.bz2 Description: BZip2 compressed data
[8/10] Mop up remaining tree_low_cst calls
Handle tree_low_cst references that weren't caught by the sed. Thanks, Richard gcc/ada/ * gcc-interface/cuintp.c (UI_From_gnu): Use tree_to_shwi rather than tree_low_cst. gcc/c-family/ * c-common.c (fold_offsetof_1): Use tree_to_uhwi rather than tree_low_cst. (complete_array_type): Update comment to refer to tree_to_[su]hwi rather than tree_low_cst. gcc/c/ * c-decl.c (grokdeclarator): Update comment to refer to tree_to_[su]hwi rather than tree_low_cst. gcc/cp/ * decl.c (reshape_init_array_1): Use tree_to_uhwi rather than tree_low_cst. (grokdeclarator): Update comment to refer to tree_to_[su]hwi rather than tree_low_cst. gcc/ * expr.h: Update comments to refer to tree_to_[su]hwi rather than tree_low_cst. * fold-const.c (fold_binary_loc): Likewise. * expr.c (store_constructor): Use tree_to_uhwi rather than tree_low_cst. * ipa-utils.h (possible_polymorphic_call_target_p): Likewise. * stmt.c (emit_case_dispatch_table): Likewise. * tree-switch-conversion.c (emit_case_bit_tests): Likewise.

Index: gcc/ada/gcc-interface/cuintp.c
===================================================================
--- gcc/ada/gcc-interface/cuintp.c	2013-11-16 13:08:22.531824320 +0000
+++ gcc/ada/gcc-interface/cuintp.c	2013-11-16 13:08:24.254837390 +0000
@@ -176,9 +176,9 @@ UI_From_gnu (tree Input)
 
       for (i = Max_For_Dint - 1; i >= 0; i--)
	{
-	  v[i] = tree_low_cst (fold_build1 (ABS_EXPR, gnu_type,
+	  v[i] = tree_to_shwi (fold_build1 (ABS_EXPR, gnu_type,
					    fold_build2 (TRUNC_MOD_EXPR, gnu_type,
-							 gnu_temp, gnu_base)), 0);
+							 gnu_temp, gnu_base)));
	  gnu_temp = fold_build2 (TRUNC_DIV_EXPR, gnu_type, gnu_temp, gnu_base);
	}
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	2013-11-16 13:08:22.531824320 +0000
+++ gcc/c-family/c-common.c	2013-11-16 13:08:46.45771 +0000
@@ -9721,8 +9721,7 @@ fold_offsetof_1 (tree expr)
	    return error_mark_node;
	  }
	off = size_binop_loc (input_location, PLUS_EXPR, DECL_FIELD_OFFSET (t),
-			      size_int (tree_low_cst (DECL_FIELD_BIT_OFFSET (t),
-						      1)
+			      size_int (tree_to_uhwi (DECL_FIELD_BIT_OFFSET (t))
					/ BITS_PER_UNIT));
	break;
 
@@ -10091,7 +10090,7 @@ complete_array_type (tree *ptype, tree i
	{
	  error ("size of array is too large");
	  /* If we proceed with the array type as it is, we'll eventually
-	     crash in tree_low_cst().  */
+	     crash in tree_to_[su]hwi().  */
	  type = error_mark_node;
	}
Index: gcc/c/c-decl.c
===================================================================
--- gcc/c/c-decl.c	2013-11-16 13:08:22.531824320 +0000
+++ gcc/c/c-decl.c	2013-11-16 13:08:24.258837421 +0000
@@ -5912,7 +5912,7 @@ grokdeclarator (const struct c_declarato
	    else
	      error_at (loc, "size of unnamed array is too large");
	    /* If we proceed with the array type as it is, we'll eventually
-	       crash in tree_low_cst().  */
+	       crash in tree_to_[su]hwi().  */
	    type = error_mark_node;
	  }
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	2013-11-16 13:08:22.531824320 +0000
+++ gcc/cp/decl.c	2013-11-16 13:09:31.845353189 +0000
@@ -5095,8 +5095,7 @@ reshape_init_array_1 (tree elt_type, tre
	max_index_cst = tree_to_uhwi (max_index);
       /* sizetype is sign extended, not zero extended.  */
       else
-	max_index_cst = tree_low_cst (fold_convert (size_type_node, max_index),
-				      1);
+	max_index_cst = tree_to_uhwi (fold_convert (size_type_node, max_index));
     }
 
   /* Loop until there are no more initializers.  */
@@ -10031,7 +10030,7 @@ grokdeclarator (const cp_declarator *dec
	{
	  error ("size of array %qs is too large", name);
	  /* If we proceed with the array type as it is, we'll eventually
-	     crash in tree_low_cst().  */
+	     crash in tree_to_[su]hwi().  */
	  type = error_mark_node;
	}
Index: gcc/expr.h
===================================================================
--- gcc/expr.h	2013-11-16 13:08:22.531824320 +0000
+++ gcc/expr.h	2013-11-16 13:08:24.263837459 +0000
@@ -26,8 +26,8 @@
 #define GCC_EXPR_H
 
 #include "rtl.h"	/* For optimize_size */
 #include "flags.h"
-/* For tree_fits_[su]hwi_p, tree_low_cst, fold_convert, size_binop, ssize_int,
-   TREE_CODE, TYPE_SIZE, int_size_in_bytes,    */
+/* For tree_fits_[su]hwi_p, tree_to_[su]hwi, fold_convert, size_binop,
+   ssize_int, TREE_CODE, TYPE_SIZE,
[9/10] Remove host_integerp and tree_low_cst
Remove the old functions, which are now unused. Thanks, Richard gcc/ * tree.h (host_integerp, tree_low_cst): Delete. * tree.c (host_integerp, tree_low_cst): Delete.

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2013-11-16 09:35:59.381239766 +0000
+++ gcc/tree.h	2013-11-16 10:14:00.618868694 +0000
@@ -3654,11 +3654,6 @@ extern int attribute_list_contained (con
 extern int tree_int_cst_equal (const_tree, const_tree);
 extern int tree_int_cst_lt (const_tree, const_tree);
 extern int tree_int_cst_compare (const_tree, const_tree);
-extern int host_integerp (const_tree, int)
-#ifndef ENABLE_TREE_CHECKING
-  ATTRIBUTE_PURE /* host_integerp is pure only when checking is disabled.  */
-#endif
-  ;
 extern bool tree_fits_shwi_p (const_tree)
 #ifndef ENABLE_TREE_CHECKING
   ATTRIBUTE_PURE /* tree_fits_shwi_p is pure only when checking is disabled.  */
@@ -3669,18 +3664,10 @@ extern bool tree_fits_uhwi_p (const_tree
   ATTRIBUTE_PURE /* tree_fits_uhwi_p is pure only when checking is disabled.  */
 #endif
   ;
-extern HOST_WIDE_INT tree_low_cst (const_tree, int);
 extern HOST_WIDE_INT tree_to_shwi (const_tree);
 extern HOST_WIDE_INT tree_to_uhwi (const_tree);
 #if !defined ENABLE_TREE_CHECKING && (GCC_VERSION >= 4003)
 extern inline __attribute__ ((__gnu_inline__)) HOST_WIDE_INT
-tree_low_cst (const_tree t, int pos)
-{
-  gcc_assert (host_integerp (t, pos));
-  return TREE_INT_CST_LOW (t);
-}
-
-extern inline __attribute__ ((__gnu_inline__)) HOST_WIDE_INT
 tree_to_shwi (const_tree t)
 {
   gcc_assert (tree_fits_shwi_p (t));
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2013-11-16 09:59:37.205620348 +0000
+++ gcc/tree.c	2013-11-16 10:14:00.604868554 +0000
@@ -6970,26 +6970,6 @@ tree_int_cst_compare (const_tree t1, con
   return 0;
 }
 
-/* Return 1 if T is an INTEGER_CST that can be manipulated efficiently on
-   the host.  If POS is zero, the value can be represented in a single
-   HOST_WIDE_INT.  If POS is nonzero, the value must be non-negative and can
-   be represented in a single unsigned HOST_WIDE_INT.  */
-
-int
-host_integerp (const_tree t, int pos)
-{
-  if (t == NULL_TREE)
-    return 0;
-
-  return (TREE_CODE (t) == INTEGER_CST
-	  && ((TREE_INT_CST_HIGH (t) == 0
-	       && (HOST_WIDE_INT) TREE_INT_CST_LOW (t) >= 0)
-	      || (! pos && TREE_INT_CST_HIGH (t) == -1
-		  && (HOST_WIDE_INT) TREE_INT_CST_LOW (t) < 0
-		  && !TYPE_UNSIGNED (TREE_TYPE (t)))
-	      || (pos && TREE_INT_CST_HIGH (t) == 0)));
-}
-
 /* Return true if T is an INTEGER_CST whose numerical value (extended
    according to TYPE_UNSIGNED) fits in a signed HOST_WIDE_INT.  */
 
@@ -7016,17 +6996,6 @@ tree_fits_uhwi_p (const_tree t)
	  && TREE_INT_CST_HIGH (t) == 0);
 }
 
-/* Return the HOST_WIDE_INT least significant bits of T if it is an
-   INTEGER_CST and there is no overflow.  POS is nonzero if the result must
-   be non-negative.  We must be able to satisfy the above conditions.  */
-
-HOST_WIDE_INT
-tree_low_cst (const_tree t, int pos)
-{
-  gcc_assert (host_integerp (t, pos));
-  return TREE_INT_CST_LOW (t);
-}
-
 /* T is an INTEGER_CST whose numerical value (extended according to
    TYPE_UNSIGNED) fits in a signed HOST_WIDE_INT.  Return that
    HOST_WIDE_INT.  */
[10/10] Make tree_to_uhwi return unsigned
This is probably the only non-obvious part of the series. I went through all callers to tree_to_uhwi to see whether they were used in a context where signedness mattered. If so, I tried to adjust the casting to match. This mostly meant removing casts to unsigned types. There are a couple of cases where I added casts to HOST_WIDE_INT though, to mimic the old tree_low_cst behaviour: - In cfgexpand.c and trans-mem.c, where we're comparing the value with an int PARAM_VALUE. The test isn't watertight since any unsigned constant > HOST_WIDE_INT_MAX is going to be accepted. That's a preexisting problem though and it can be fixed more easily with wi:: routines. Until then this preserves the current behaviour. - In the AArch32/64 and powerpc ABI handling. Here too count is an int and is probably not safe for large values anyway; e.g.: count *= (1 + tree_to_uhwi (TYPE_MAX_VALUE (index)) - tree_to_uhwi (TYPE_MIN_VALUE (index))); is done without overflow checking. This too is easier to fix with wi::, so I've just kept it as a signed comparison for now. Thanks, Richard gcc/c-family/ * c-common.c (convert_vector_to_pointer_for_subscript): Remove cast to unsigned type. gcc/ * tree.h (tree_to_uhwi): Return an unsigned HOST_WIDE_INT. * tree.c (tree_to_uhwi): Return an unsigned HOST_WIDE_INT. (tree_ctz): Remove cast to unsigned type. * builtins.c (fold_builtin_memory_op): Likewise. * dwarf2out.c (descr_info_loc): Likewise. * godump.c (go_output_typedef): Likewise. * omp-low.c (expand_omp_simd): Likewise. * stor-layout.c (excess_unit_span): Likewise. * tree-object-size.c (addr_object_size): Likewise. * tree-sra.c (analyze_all_variable_accesses): Likewise. * tree-ssa-forwprop.c (simplify_builtin_call): Likewise. (simplify_rotate): Likewise. * tree-ssa-strlen.c (adjust_last_stmt, handle_builtin_memcpy) (handle_pointer_plus): Likewise. * tree-switch-conversion.c (check_range): Likewise. * tree-vect-patterns.c (vect_recog_rotate_pattern): Likewise.
* tsan.c (instrument_builtin_call): Likewise. * cfgexpand.c (defer_stack_allocation): Add cast to HOST_WIDE_INT. * trans-mem.c (tm_log_add): Likewise. * config/aarch64/aarch64.c (aapcs_vfp_sub_candidate): Likewise. * config/arm/arm.c (aapcs_vfp_sub_candidate): Likewise. * config/rs6000/rs6000.c (rs6000_aggregate_candidate): Likewise. * config/mips/mips.c (r10k_safe_mem_expr_p): Make offset unsigned.

Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	2013-11-16 10:13:53.825800713 +0000
+++ gcc/c-family/c-common.c	2013-11-16 10:14:40.373263297 +0000
@@ -11702,8 +11702,7 @@ convert_vector_to_pointer_for_subscript
       if (TREE_CODE (index) == INTEGER_CST)
	if (!tree_fits_uhwi_p (index)
-	    || ((unsigned HOST_WIDE_INT) tree_to_uhwi (index)
-		>= TYPE_VECTOR_SUBPARTS (type)))
+	    || tree_to_uhwi (index) >= TYPE_VECTOR_SUBPARTS (type))
	  warning_at (loc, OPT_Warray_bounds, "index value is out of bound");
 
       c_common_mark_addressable_vec (*vecp);
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2013-11-16 10:14:00.618868694 +0000
+++ gcc/tree.h	2013-11-16 10:14:40.488264431 +0000
@@ -3665,7 +3665,7 @@ extern bool tree_fits_uhwi_p (const_tree
 #endif
   ;
 extern HOST_WIDE_INT tree_to_shwi (const_tree);
-extern HOST_WIDE_INT tree_to_uhwi (const_tree);
+extern unsigned HOST_WIDE_INT tree_to_uhwi (const_tree);
 #if !defined ENABLE_TREE_CHECKING && (GCC_VERSION >= 4003)
 extern inline __attribute__ ((__gnu_inline__)) HOST_WIDE_INT
 tree_to_shwi (const_tree t)
@@ -3674,7 +3674,7 @@ tree_to_shwi (const_tree t)
   return TREE_INT_CST_LOW (t);
 }
 
-extern inline __attribute__ ((__gnu_inline__)) HOST_WIDE_INT
+extern inline __attribute__ ((__gnu_inline__)) unsigned HOST_WIDE_INT
 tree_to_uhwi (const_tree t)
 {
   gcc_assert (tree_fits_uhwi_p (t));
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2013-11-16 10:14:00.604868554 +0000
+++ gcc/tree.c	2013-11-16 10:14:40.488264431 +0000
@@ -2211,8 +2211,7 @@ tree_ctz (const_tree expr)
     case LSHIFT_EXPR:
       ret1 = tree_ctz (TREE_OPERAND (expr, 0));
       if (tree_fits_uhwi_p (TREE_OPERAND (expr, 1))
-	  && ((unsigned HOST_WIDE_INT) tree_to_uhwi (TREE_OPERAND (expr, 1))
-	      < (unsigned HOST_WIDE_INT) prec))
+	  && (tree_to_uhwi (TREE_OPERAND (expr, 1)) < prec))
	{
	  ret2 = tree_to_uhwi (TREE_OPERAND (expr, 1));
	  return MIN (ret1 + ret2, prec);
@@ -2220,8 +2219,7 @@ tree_ctz (const_tree expr)
       return ret1;
     case RSHIFT_EXPR:
       if (tree_fits_uhwi_p (TREE_OPERAND (expr, 1))
-	  && ((unsigned
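The cfgexpand.c/trans-mem.c remark above — a signed comparison quietly accepting unsigned constants above HOST_WIDE_INT_MAX — can be reproduced standalone. A hedged sketch, with int64_t standing in for HOST_WIDE_INT, invented function names, and the usual two's-complement conversion assumed:

```c
#include <stdbool.h>
#include <stdint.h>

/* Old behaviour: tree_low_cst returned a signed HOST_WIDE_INT, so a
   size above HOST_WIDE_INT_MAX wrapped to a negative value and passed
   a "size < param" test.  The [10/10] patch keeps this by casting the
   now-unsigned result back to HOST_WIDE_INT at these call sites.
   (Out-of-range signed conversion is implementation-defined in C;
   two's-complement wrap is assumed here.)  */
static bool
below_param_signed (uint64_t size, int param)
{
  return (int64_t) size < param;
}

/* What the naive port would do: with an unsigned left-hand side the
   usual arithmetic conversions make the comparison unsigned, so huge
   sizes are rejected — a behaviour change, even if arguably the
   better one.  */
static bool
below_param_unsigned (uint64_t size, int param)
{
  return size < (uint64_t) param;
}
```

Casting back to HOST_WIDE_INT, as the patch does, deliberately keeps the first (leaky) behaviour until the wi:: routines can do the comparison at full precision.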
Re: [3/10] Mechanical replacement of host_integerp (..., 1)
Richard Sandiford rdsandif...@googlemail.com writes: Like the previous patch, but for host_integerp (x, 1) -> tree_fits_uhwi_p (x). Should have been this patch. tree-fits-uhwi-p.diff.bz2 Description: BZip2 compressed data
Re: [2/10] Mechanical replacement of host_integerp (..., 0)
Richard Sandiford rdsandif...@googlemail.com writes: This is the result of using sed to replace all single-line host_integerp (x, 0)s with tree_fits_shwi_p (x), taking care to handle bracket nesting in x. Bah, wrong patch, sorry. tree-fits-shwi-p.diff.bz2 Description: BZip2 compressed data
Re: [Patch, mips] MIPS performance patch for PR 56552
Richard Sandiford rdsandif...@googlemail.com writes: Steve Ellcey sell...@mips.com writes:

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 0cda169..49c2bf7 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -6721,7 +6721,7 @@
 (define_insn "*mov<GPR:mode>_on_<MOVECC:mode>"
   [(set (match_operand:GPR 0 "register_operand" "=d,d")
	(if_then_else:GPR
-	 (match_operator:MOVECC 4 "equality_operator"
+	 (match_operator 4 "equality_operator"
		[(match_operand:MOVECC 1 "register_operand" "<MOVECC:reg>,<MOVECC:reg>")
		 (const_int 0)])
	 (match_operand:GPR 2 "reg_or_0_operand" "dJ,0")

Sorry, I didn't notice this before, but we should remove _on_<MOVECC:mode> from the name of the insn. Same for the FP version. OK with that change, thanks. Sorry, MOVECC is still used of course. The patch is OK as-is. Thanks, Richard
Re: [PATCH] Fix lto bootstrap verification failure with -freorder-blocks-and-partition
On Sat, Nov 16, 2013 at 12:33 AM, Jan Hubicka hubi...@ucw.cz wrote: When testing with -freorder-blocks-and-partition enabled, I hit a verification failure in an LTO profiledbootstrap. Edge forwarding performed when we went into cfg layout mode after bb reordering (during compgotos) created a situation where a hot block was then dominated by a cold block and was therefore remarked as cold. Because bb reorder was complete at that point, it was not moved in the physical layout, and we incorrectly went in and out of the cold section multiple times. The following patch addresses that by fixing the layout when we move blocks to the cold section after bb reordering is complete. Tested with an LTO profiledbootstrap with -freorder-blocks-and-partition enabled. Ok for trunk? Thanks, Teresa 2013-11-15 Teresa Johnson tejohn...@google.com * cfgrtl.c (fixup_partitions): Reorder blocks if necessary. computed_gotos just unfactors unified blocks that we use to avoid CFGs with O(n^2) edges. This is mostly to avoid problems with nonlinearity of other passes and to reduce the quadratic memory use case to one function at a time. I wonder if it won't be cleaner to simply unfactor those just before pass_reorder_blocks. Computed gotos are used e.g. in the libjava interpreter to optimize the tight interpreting loop. I think those cases would benefit from having at least scheduling/reordering and alignments done right. Of course it depends on how bad the compile time implications are (I think in addition to libjava, there was Lucier's testcase that made us go for this trick), but I would prefer it over adding yet another hack into cfgrtl... We also may just avoid cfglayout cleanup_cfg while doing computed gotos... Note I haven't done an extensive check to see if compgotos is the only phase that goes back into cfglayout mode after bb reordering is done, that's just the one that hit this. Eventually it might be good to prevent going into cfglayout mode after bb reordering.
For now we could either fix up the layout as I am doing here, or, as you suggest, prevent some cleanup/cfg optimization after bb reordering is done. I thought about preventing the forwarding optimization after bb reordering when splitting was on initially, but didn't want enabling -freorder-blocks-and-partition to unnecessarily prevent optimization. The reordering seemed reasonably straightforward so I went with that solution in this patch. Let me know if you'd rather have the solution of preventing the forwarding (or maybe all of try_optimize_cfg to be safe) under -freorder-blocks-and-partition after bb reordering. Thanks, Teresa Honza -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Fix lto bootstrap verification failure with -freorder-blocks-and-partition
Note I haven't done an extensive check to see if compgotos is the only phase that goes back into cfglayout mode after bb reordering is done, that's just the one that hit this. Eventually it might be good to prevent going into cfglayout mode after bb reordering. Can we just try to abort when cfg layout mode is entered after bb reorder? It seems to make sense to avoid that - going in/out will definitely result in misplaced gotos and other stuff. For now we could either fix up the layout as I am doing here. Or as you suggest, prevent some cleanup/cfg optimization after bb reordering is done. I thought about preventing the forwarding optimization after bb reordering when splitting was on initially, but didn't want enabling -freorder-blocks-and-partition to unnecessarily prevent optimization. The reordering seemed reasonably straightforward so I went with that solution in this patch. Let me know if you'd rather have the solution of preventing the forwarding (or maybe all of try_optimize_cfg to be safe) under -freorder-blocks-and-partition after bb reordering. Generally I would like to be consistent about the stage of IL - i.e. go to cfglayout after RTL expansion and stay in it until after the bb reorder and then consistently work with the actual insn layout we decided on. Honza Thanks, Teresa Honza -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[PING] [PATCH] Add POST_LINK_SPEC for adding possibility of additional steps after linking
On 11/05/2013 02:09 PM, Andris Pavenis wrote: Attached patch adds a possibility to add additional build steps after linking. Without this patch the only possibility is to redefine the entire LINK_COMMAND_SPEC. Currently only DJGPP seems to need it. 2013-11-05 Andris Pavenis andris.pave...@iki.fi * gcc/gcc.c: Add macro POST_LINK_SPEC for specifying additional steps to invoke after linking. * gcc/doc/tm.texi.in (POST_LINK_SPEC): New. * gcc/doc/tm.texi: Regenerate. Bootstrapped and tested on Linux x86_64 (Fedora 19) Andris Original post http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00367.html Andris
Re: libbacktrace patch RFC: Look up variables in backtrace_syminfo
On Fri, Nov 15, 2013 at 1:34 PM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Nov 15, 2013 at 01:26:54PM -0800, Ian Lance Taylor wrote: Jakub asked whether it would be possible to extend backtrace_syminfo to work for variables as well as functions. It's a straightforward extension, implemented by this patch. Bootstrapped and ran libbacktrace tests on x86_64-unknown-linux-gnu. Any comments on this patch before I submit it? Looks good to me. Committed. OT, the text "permanent buffer. If THREADED is non-zero the state may be accessed by multiple threads simultaneously, and the library will use appropriate locks (this requires that the library be configured with --enable-backtrace-threads). If THREADED is zero the state" in the backtrace.h backtrace_create_state comment doesn't look to be up to date, there is no --enable-backtrace-threads it seems, just depending on configure either it is thread safe or not (and doesn't use locks). Thanks. I committed the following patch to correct the comment. Ian 2013-11-16 Ian Lance Taylor i...@google.com * backtrace.h (backtrace_create_state): Correct comment about threading.

Index: backtrace.h
===================================================================
--- backtrace.h	(revision 204904)
+++ backtrace.h	(working copy)
@@ -89,8 +89,7 @@ typedef void (*backtrace_error_callback)
    system-specific path names.  If not NULL, FILENAME must point to a
    permanent buffer.  If THREADED is non-zero the state may be
    accessed by multiple threads simultaneously, and the library will
-   use appropriate locks (this requires that the library be configured
-   with --enable-backtrace-threads).  If THREADED is zero the state
+   use appropriate atomic operations.  If THREADED is zero the state
    may only be accessed by one thread at a time.  This returns a state
    pointer on success, NULL on error.  If an error occurs, this will
    call the ERROR_CALLBACK routine.  */
Re: [PATCH][ARM] Add Cortex-A53 rtx costs table
On 15 Nov 2013, at 15:42, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This patch adds the rtx costs table for the Cortex-A53. It goes in the new aarch-cost-tables.h file because we will want to share it with AArch64. We add a corresponding tuning struct and set the tuning from generic cortex tuning to the new one. Tested arm-none-eabi on model. Ok for trunk? Thanks, Kyrill 2013-11-15 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/aarch-cost-tables.h (cortexa53_extra_costs): New table. * config/arm/arm.c (arm_cortex_a53_tune): New. * config/arm/arm-cores.def (cortex-a53): Use cortex_a53 tuning struct. a53-costs.patch Ok. R.
Re: [wide-int] Documentation and comment tweaks
On 11/16/2013 05:49 AM, Richard Sandiford wrote: Richard Sandiford rdsandif...@googlemail.com writes: Some minor tweaks to the documentation and commentary. The hyphenation and "non zero" -> "nonzero" changes are supposed to be per guidelines: http://gcc.gnu.org/codingconventions.html#Spelling Hope I got them right. OK to install? sorry, yes it is ok to install. Ping. Index: gcc/dfp.c === --- gcc/dfp.c 2013-11-09 09:50:47.392396760 +0000 +++ gcc/dfp.c 2013-11-09 11:07:22.754160541 +0000 @@ -605,8 +605,8 @@ decimal_real_to_integer (const REAL_VALU return real_to_integer (to); } -/* Likewise, but returns a wide_int with PRECISION. Fail - is set if the value does not fit. */ +/* Likewise, but returns a wide_int with PRECISION. *FAIL is set if the + value does not fit. */ wide_int decimal_real_to_integer (const REAL_VALUE_TYPE *r, bool *fail, int precision) Index: gcc/doc/rtl.texi === --- gcc/doc/rtl.texi 2013-11-09 09:50:47.392396760 +0000 +++ gcc/doc/rtl.texi 2013-11-09 11:07:22.755160549 +0000 @@ -1542,11 +1542,10 @@ Similarly, there is only one object for @findex const_double @item (const_double:@var{m} @var{i0} @var{i1} @dots{}) This represents either a floating-point constant of mode @var{m} or -(on ports older ports that do not define +(on older ports that do not define @code{TARGET_SUPPORTS_WIDE_INT}) an integer constant too large to fit into @code{HOST_BITS_PER_WIDE_INT} bits but small enough to fit within -twice that number of bits (GCC does not provide a mechanism to -represent even larger constants). In the latter case, @var{m} will be +twice that number of bits. In the latter case, @var{m} will be @code{VOIDmode}. For integral values constants for modes with more bits than twice the number in @code{HOST_WIDE_INT} the implied high order bits of that constant are copies of the top bit of @@ -1576,25 +1575,25 @@ the precise bit pattern used by the targ This contains an array of @code{HOST_WIDE_INTS} that is large enough to hold any constant that can be represented on the target.
This form of rtl is only used on targets that define -@code{TARGET_SUPPORTS_WIDE_INT} to be non zero and then -@code{CONST_DOUBLES} are only used to hold floating point values. If +@code{TARGET_SUPPORTS_WIDE_INT} to be nonzero and then +@code{CONST_DOUBLE}s are only used to hold floating-point values. If the target leaves @code{TARGET_SUPPORTS_WIDE_INT} defined as 0, @code{CONST_WIDE_INT}s are not used and @code{CONST_DOUBLE}s are as they were before. -The values are stored in a compressed format. The higher order +The values are stored in a compressed format. The higher-order 0s or -1s are not represented if they are just the logical sign extension of the number that is represented. @findex CONST_WIDE_INT_VEC @item CONST_WIDE_INT_VEC (@var{code}) Returns the entire array of @code{HOST_WIDE_INT}s that are used to -store the value. This macro should be rarely used. +store the value. This macro should be rarely used. @findex CONST_WIDE_INT_NUNITS @item CONST_WIDE_INT_NUNITS (@var{code}) The number of @code{HOST_WIDE_INT}s used to represent the number. -Note that this generally be smaller than the number of +Note that this generally is smaller than the number of @code{HOST_WIDE_INT}s implied by the mode size. @findex CONST_WIDE_INT_ELT Index: gcc/doc/tm.texi === --- gcc/doc/tm.texi 2013-11-09 09:50:47.392396760 + +++ gcc/doc/tm.texi 2013-11-09 11:07:22.757160564 + @@ -9683,10 +9683,9 @@ Returns the negative of the floating poi Returns the absolute value of @var{x}. @end deftypefn -@deftypefn Macro void REAL_VALUE_FROM_INT (REAL_VALUE_TYPE @var{x}, HOST_WIDE_INT @var{val}, enum machine_mode @var{mode}) -Converts a double-precision integer found in @var{val}, -into a floating point value which is then stored into @var{x}. The -value is truncated to fit in mode @var{mode}. 
+@deftypefn Macro void REAL_VALUE_FROM_INT (REAL_VALUE_TYPE @var{x}, const wide_int_ref @var{val}, enum machine_mode @var{mode}) +Converts integer @var{val} into a floating-point value which is then +stored into @var{x}. The value is truncated to fit in mode @var{mode}. @end deftypefn @node Mode Switching @@ -11497,15 +11496,15 @@ The default value of this hook is based @defmac TARGET_SUPPORTS_WIDE_INT On older ports, large integers are stored in @code{CONST_DOUBLE} rtl -objects. Newer ports define @code{TARGET_SUPPORTS_WIDE_INT} to be non -zero to indicate that large integers are stored in +objects. Newer ports define @code{TARGET_SUPPORTS_WIDE_INT} to be nonzero +to indicate that large integers are stored in @code{CONST_WIDE_INT} rtl objects. The @code{CONST_WIDE_INT} allows very large integer constants
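The compressed CONST_WIDE_INT storage described in the rtl.texi hunk — higher-order 0s or -1s are dropped when they are just the sign extension of the value below — can be sketched outside GCC. `min_elts` here is a hypothetical stand-in for that rule, not the real wide-int API:

```c
#include <stdint.h>

/* Sketch of the CONST_WIDE_INT compression rule: return how many
   64-bit elements of VAL (least-significant first) are really needed
   once leading elements that merely sign-extend the element below
   them are dropped.  Illustrative only; not GCC's implementation.  */
static int
min_elts (const int64_t *val, int n)
{
  while (n > 1)
    {
      /* The top element is redundant if it equals the sign extension
	 (all 0s or all -1s) of the element below it.  */
      int64_t sign = val[n - 2] < 0 ? -1 : 0;
      if (val[n - 1] != sign)
	break;
      n--;
    }
  return n;
}
```

So a small positive or negative value needs one element even in a wider mode, which is why CONST_WIDE_INT_NUNITS is generally smaller than the count implied by the mode size.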
Re: [PATCH] Fix various reassoc issues (PR tree-optimization/58791, tree-optimization/58775)
On Tue, Oct 22, 2013 at 6:09 AM, Jakub Jelinek ja...@redhat.com wrote: Hi! I've spent over two days looking at reassoc, fixing spots where we invalidly reused SSA_NAMEs (this results in wrong-debug, as the added guality testcases show, even some ICEs (pr58791-3.c) and wrong range info for SSA_NAMEs) and cleaning up the stmt scheduling stuff (e.g. all gsi_move* calls are gone, if we need to move something or set an SSA_NAME to different value than previously, we'll now always create new stmt and the old one depending on the case either remove or mark as visited zero uses, so that it will be removed later on by reassociate_bb. Of course some gimple_assign_set_rhs* etc. calls are still valid even without creating new stmts, optimizing some statement to equivalent computation is fine, but computing something different in an old SSA_NAME is not. I've also noticed that build_and_add_sum was using different framework from rewrite_expr_tree, the former was using stmt_dominates_stmt_p (which is IMHO quite clean interface, but with the added uid stuff in reassoc can be unnecessarily slow on large basic blocks) and rewrite_expr_tree was using worse APIs, but using the uids. So, the patch also unifies that, into a new reassoc_stmt_dominates_stmt_p that has the same semantics as the tree-ssa-loop-niter.c function, but uses uids internally. rewrite_expr_tree is changed so that it recurses first, then handles current level (which is needed if the recursion needs to create new stmt and give back a new SSA_NAME), which allowed removing the ensure_ops_are_available recursive stuff. Also, uids are now computed in break_up_subtract_bb (and are per-bb, starting with 1, we never compare uids from different bbs), which allows us to get rid of an extra whole IL walk. 
For the inter-bb optimization, I had to stop modifying stmts right away in update_range_test, because we don't want to reuse SSA_NAMEs and if we modified there, we'd need to modify potentially many dependent SSA_NAMEs and sometimes many times. So, now it instead just updates oe-op values and maybe_optimize_range_tests just looks at those values and updates what is needed. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? For 4.8 a partial backport would be possible, but quite a lot of work, for 4.7 I'd prefer not to backport given that there gsi_for_stmt isn't O(1). 2013-10-22 Jakub Jelinek ja...@redhat.com PR tree-optimization/58775 PR tree-optimization/58791 * tree-ssa-reassoc.c (reassoc_stmt_dominates_stmt_p): New function. (insert_stmt_after): Rewritten, don't move the stmt, but really insert it. (get_stmt_uid_with_default): Remove. (build_and_add_sum): Use insert_stmt_after and reassoc_stmt_dominates_stmt_p. Fix up uid if bb contains only labels. (update_range_test): Set uid on stmts added by force_gimple_operand_gsi. Don't immediately modify statements in inter-bb optimization, just update oe-op values. (optimize_range_tests): Return bool whether any changed have been made. (update_ops): New function. (struct inter_bb_range_test_entry): New type. (maybe_optimize_range_tests): Perform statement changes here. (not_dominated_by, appears_later_in_bb, get_def_stmt, ensure_ops_are_available): Remove. (find_insert_point): Rewritten. (rewrite_expr_tree): Remove MOVED argument, add CHANGED argument, return LHS of the (new resp. old) stmt. Don't call ensure_ops_are_available, don't reuse SSA_NAMEs, recurse first instead of last, move new stmt at the right place. (linearize_expr, repropagate_negates): Don't reuse SSA_NAMEs. (negate_value): Likewise. Set uids. (break_up_subtract_bb): Initialize uids. (reassociate_bb): Adjust rewrite_expr_tree caller. (do_reassoc): Don't call renumber_gimple_stmt_uids. 
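The uid scheme the ChangeLog describes — per-bb uids assigned in statement order, so that dominance between two statements in the same block reduces to an integer comparison instead of a walk — can be modelled with a toy structure. The names below are illustrative, not GCC's:

```c
#include <stdbool.h>

/* Toy model of reassoc's per-bb uid scheme: uids start at 1 within
   each basic block and increase in statement order, so for two
   statements in the same block "A dominates B" is just a uid
   comparison.  Cross-block queries would fall back to a CFG dominance
   test, which is not modelled here.  */
struct toy_stmt
{
  int bb;	/* basic block index */
  int uid;	/* per-bb uid, starting at 1 */
};

static bool
same_bb_dominates_p (const struct toy_stmt *a, const struct toy_stmt *b)
{
  return a->bb == b->bb && a->uid < b->uid;
}
```

Because uids never compare across blocks, computing them per-bb in break_up_subtract_bb avoids a whole-IL renumbering walk.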
It caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59154 H.J.
Re: [PATCH][1-3] New configure option to enable Position independent executable as default.
On Wed, 13 Nov 2013 23:28:45 +0100 Magnus Granberg zo...@gentoo.org wrote: Hi This patchset will add a new configure options --enable-default-pie. With the new option enable will make it pass -fPIE and -pie from the gcc and g++ frontend. Have only add the support for two targets but should work on more targes. In configure.ac we add the new option. We can't compile the compiler or the crt stuff with -fPIE it will brake the PCH and the crtbegin and crtend files. The disabling is done in the Makefiles. The needed spec is added to DRIVER_SELF_SPECS. We disable all the profiling test for the linking will fail.Tested on x86_64 linux (Gentoo). /Magnus Granberg Hey Magnus. Some nits: --- a/gcc/configure.ac2013-09-25 18:10:35.0 +0200 +++ b/gcc/configure.ac2013-10-22 21:26:56.287602139 +0200 @@ -5434,6 +5434,31 @@ if test x${LINKER_HASH_STYLE} != x; th [The linker hash style]) fi +# Check whether --enable-default-pie was given and target have the support. +AC_ARG_ENABLE(default-pie, +[AS_HELP_STRING([--enable-default-pie], [Enable Position independent executable as default. Help strings begin with a lowercase letter and do not end with a period. enable Position Independent Executables by default. + If we have suppot for it when compiling and linking. + Linux targets supported i?86 and x86_64.])], I would drop these lines. +enable_default_pie=$enableval, +enable_default_pie=no) +if test x$enable_default_pie = xyes; then + AC_MSG_CHECKING(if $target support to default with -fPIE and link with -pie as default) if $target supports default PIE + enable_default_pie=no + case $target in +i?86*-*-linux* | x86_64*-*-linux*) + enable_default_pie=yes + ;; +*) + ;; +esac + AC_MSG_RESULT($enable_default_pie) +fi +if test x$enable_default_pie == xyes ; then + AC_DEFINE(ENABLE_DEFAULT_PIE, 1, + [Define if your target support default-pie and you have enable it.]) supports default PIE and it is enabled. 
+fi +AC_SUBST([enable_default_pie]) + # Configure the subdirectories # AC_CONFIG_SUBDIRS($subdirs) --- a/gcc/doc/install.texi2013-10-01 19:29:40.0 +0200 +++ b/gcc/doc/install.texi2013-11-09 15:40:20.831402110 +0100 @@ -1421,6 +1421,11 @@ do a @samp{make -C gcc gnatlib_and_tools Specify that the run-time libraries for stack smashing protection should not be built. +@item --enable-default-pie +We will turn on @option{-fPIE} and @option{-pie} as default when +compileing and linking if the support is there. We only support +i?86-*-linux* and x86-64-*-linux* as target for now. Turn on @option{-fPIE} and @option{-pie} by default if supported. Currently supported targets are i?86-*-linux* and x86-64-*-linux*. Also two spaces between sentences. --- a/gcc/doc/invoke.texi 2012-03-01 10:57:59.0 +0100 +++ b/gcc/doc/invoke.texi 2012-07-30 00:57:03.766847851 +0200 @@ -9457,6 +9480,12 @@ For predictable results, you must also s that were used to generate code (@option{-fpie}, @option{-fPIE}, or model suboptions) when you specify this option. +NOTE: With configure --enable-default-pie this option is enabled by default Extra space (also in the hunk for fPIE). +for C, C++, ObjC, ObjC++, if none of @option{-fno-PIE}, @option{-fno-pie}, +@option{-fPIC}, @option{-fpic}, @option{-fno-PIC}, @option{-fno-pic}, +@option{-nostdlib}, @option{-nostartfiles}, @option{-shared}, +@option{-nodefaultlibs}, nor @option{static} are found. Looks like nodefaultlibs is missing from PIE_DRIVER_SELF_SPECS or this needs to be updated. Thanks! -- Ryan Hillpsn: dirtyepic_sk gcc-porting/toolchain/wxwidgets @ gentoo.org 47C3 6D62 4864 0E49 8E9E 7F92 ED38 BD49 957A 8463 signature.asc Description: PGP signature
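The driver behaviour described in the invoke.texi hunk — the configured-in default PIE is suppressed when any of a list of options appears on the command line — can be sketched as a plain option scan. The flag list copies the hunk above (with `-static` assumed for the hunk's bare `static`); this is an illustrative sketch, not the actual DRIVER_SELF_SPECS machinery:

```c
#include <stdbool.h>
#include <string.h>

/* Options that suppress default PIE, per the documentation hunk above.
   Illustrative sketch only.  */
static const char *const pie_suppressors[] = {
  "-fno-PIE", "-fno-pie", "-fPIC", "-fpic", "-fno-PIC", "-fno-pic",
  "-nostdlib", "-nostartfiles", "-shared", "-nodefaultlibs", "-static",
};

/* Return true if none of the suppressing options is present, i.e. the
   driver would add -fPIE/-pie by default.  */
static bool
default_pie_active_p (const char **args, int nargs)
{
  for (int i = 0; i < nargs; i++)
    for (size_t j = 0;
	 j < sizeof pie_suppressors / sizeof *pie_suppressors; j++)
      if (strcmp (args[i], pie_suppressors[j]) == 0)
	return false;
  return true;
}
```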
[PATCH, rs6000] Emit correct note for DWARF CFI information on LE prolog VSX stores
Hi, For VSX in little endian we currently split vector register stores into a permute/store pair. For prolog stores, this results in a REG_FRAME_RELATED_EXPR note that doesn't have a simple register for its RHS, which it needs to have. This patch detects that situation and ensures we produce the correct note. This problem was breaking bootstrap when configured with --with-cpu=power7, something we hadn't tried before. With the patch we now get past stage 1. There is at least one wrong-code bug to track down in stage 2, but modifying this note is clearly not involved with that. Otherwise bootstrapped and tested on powerpc64-unknown-linux-gnu with no regressions on the big-endian side, also bootstrapped with --with-cpu=power7. Is this OK for trunk? Thanks, Bill

2013-11-16  Bill Schmidt  wschm...@linux.vnet.ibm.com

	* config/rs6000/rs6000.c (rs6000_frame_related): Add split_reg
	parameter and use it in REG_FRAME_RELATED_EXPR note.
	(emit_frame_save): Call rs6000_frame_related with extra NULL_RTX
	parameter.
	(rs6000_emit_prologue): Likewise, but for little endian VSX stores,
	pass the source register of the store instead.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c (revision 204861)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -21439,7 +21439,7 @@ output_probe_stack_range (rtx reg1, rtx reg2)
 
 static rtx
 rs6000_frame_related (rtx insn, rtx reg, HOST_WIDE_INT val,
-		      rtx reg2, rtx rreg)
+		      rtx reg2, rtx rreg, rtx split_reg)
 {
   rtx real, temp;
 
@@ -21530,6 +21530,11 @@ rs6000_frame_related (rtx insn, rtx reg, HOST_WIDE
 	}
     }
 
+  /* If a store insn has been split into multiple insns, the
+     true source register is given by split_reg.
     */
+  if (split_reg != NULL_RTX)
+    real = gen_rtx_SET (VOIDmode, SET_DEST (real), split_reg);
+
   RTX_FRAME_RELATED_P (insn) = 1;
   add_reg_note (insn, REG_FRAME_RELATED_EXPR, real);
 
@@ -21637,7 +21642,7 @@ emit_frame_save (rtx frame_reg, enum machine_mode
   reg = gen_rtx_REG (mode, regno);
   insn = emit_insn (gen_frame_store (reg, frame_reg, offset));
   return rs6000_frame_related (insn, frame_reg, frame_reg_to_sp,
-			       NULL_RTX, NULL_RTX);
+			       NULL_RTX, NULL_RTX, NULL_RTX);
 }
 
 /* Emit an offset memory reference suitable for a frame store, while
@@ -22217,7 +2,7 @@ rs6000_emit_prologue (void)
       insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p));
       rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
-			    treg, GEN_INT (-info->total_size));
+			    treg, GEN_INT (-info->total_size), NULL_RTX);
       sp_off = frame_off = info->total_size;
     }
 
@@ -22302,7 +22307,7 @@ rs6000_emit_prologue (void)
 	  insn = emit_move_insn (mem, reg);
 	  rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
-				NULL_RTX, NULL_RTX);
+				NULL_RTX, NULL_RTX, NULL_RTX);
 	  END_USE (0);
 	}
     }
 
@@ -22358,7 +22363,7 @@ rs6000_emit_prologue (void)
 				     info->lr_save_offset, DFmode, sel);
       rs6000_frame_related (insn, ptr_reg, sp_off,
-			    NULL_RTX, NULL_RTX);
+			    NULL_RTX, NULL_RTX, NULL_RTX);
       if (lr)
 	END_USE (0);
     }
 
@@ -22437,7 +22442,7 @@ rs6000_emit_prologue (void)
 				  SAVRES_SAVE | SAVRES_GPR);
       rs6000_frame_related (insn, spe_save_area_ptr, sp_off - save_off,
-			    NULL_RTX, NULL_RTX);
+			    NULL_RTX, NULL_RTX, NULL_RTX);
     }
 
   /* Move the static chain pointer back.
     */
@@ -22487,7 +22492,7 @@ rs6000_emit_prologue (void)
 				     info->lr_save_offset + ptr_off,
 				     reg_mode, sel);
       rs6000_frame_related (insn, ptr_reg, sp_off - ptr_off,
-			    NULL_RTX, NULL_RTX);
+			    NULL_RTX, NULL_RTX, NULL_RTX);
       if (lr)
 	END_USE (0);
     }
 
@@ -22503,7 +22508,7 @@ rs6000_emit_prologue (void)
 			     info->gp_save_offset + frame_off + reg_size * i);
       insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p));
       rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
-			    NULL_RTX, NULL_RTX);
+			    NULL_RTX, NULL_RTX, NULL_RTX);
     }
   else if (!WORLD_SAVE_P (info))
     {
@@ -22826,7 +22831,7 @@ rs6000_emit_prologue (void)
 					 info->altivec_save_offset + ptr_off,
 					 0, V4SImode, SAVRES_SAVE | SAVRES_VR);
       rs6000_frame_related (insn, scratch_reg, sp_off - ptr_off,
-
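The fix's core idea — when the store was split into a permute/store pair, the REG_FRAME_RELATED_EXPR note must record the original source register, not the temporary produced by the permute — can be modelled with a toy SET. `make_frame_note` and the register names are hypothetical, not GCC's rtl API:

```c
#include <stddef.h>

/* Toy model of the split_reg handling: a frame-related note records
   "dest = src", and when the real store was split (split_reg non-null)
   the note must name the true source register rather than the
   permuted temporary.  Illustrative only.  */
struct toy_set
{
  const char *dest;
  const char *src;
};

static struct toy_set
make_frame_note (struct toy_set real, const char *split_reg)
{
  if (split_reg != NULL)
    real.src = split_reg;	/* analogous to rebuilding the SET */
  return real;
}
```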
Re: [PowerPC] libffi fixes and support for PowerPC64 ELFv2
On Sat, Nov 16, 2013 at 10:18:05PM +1030, Alan Modra wrote: The following six patches correspond to patches posted to the libffi mailing list a few days ago to add support for PowerPC64 ELFv2. The ChangeLog just became easier to write. :)

	* src/powerpc/ffitarget.h: Import from upstream.
	* src/powerpc/ffi.c: Likewise.
	* src/powerpc/linux64.S: Likewise.
	* src/powerpc/linux64_closure.S: Likewise.
	* doc/libffi.texi: Likewise.
	* testsuite/libffi.call/cls_double_va.c: Likewise.
	* testsuite/libffi.call/cls_longdouble_va.c: Likewise.

OK to apply? -- Alan Modra Australia Development Lab, IBM
[RFA][PATCH] Fix 59019
59019 is currently latent on the trunk, but it's likely to fail again at some point. The problem is that combine transforms a conditional trap into an unconditional trap. Conditional traps are not considered control flow insns, but unconditional traps are. Thus, if we turn a conditional trap in the middle of a block into an unconditional trap, we end up with a control flow insn in the middle of a block and trip a checking assert. This is, IMHO, a band-aid. The inconsistency is amazingly annoying, but I've got bigger fish to fry and I was unhappy with the number of issues I ran into when I tried to make conditional traps control flow insns. Basically, when we see an unconditional trap after we've done combining, we remove all the insns after the trap to the end of the block, delete the block's outgoing edges, and emit a barrier into the block's footer. It's similar in spirit to the cleanups we do for other situations. Bootstrapped on ia64 with a hack installed to make this situation more likely to arise. OK for the trunk if it passes a bootstrap and regression test on x86_64-unknown-linux-gnu? Jeff

	* combine.c (try_combine): If we have created an unconditional trap,
	make sure to fix up the insn stream and CFG appropriately.

diff --git a/gcc/combine.c b/gcc/combine.c
index 13f5e29..b3d20f2 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -4348,6 +4348,37 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p,
       update_cfg_for_uncondjump (undobuf.other_insn);
     }
 
+  /* If we might have created an unconditional trap, then we have
+     cleanup work to do.
+
+     The fundamental problem is a conditional trap is not considered
+     control flow altering, while an unconditional trap is considered
+     control flow altering.
+
+     So while we could have a conditional trap in the middle of a block,
+     we cannot have an unconditional trap in the middle of a block.
     */
+  if (GET_CODE (i3) == INSN
+      && GET_CODE (PATTERN (i3)) == TRAP_IF
+      && XEXP (PATTERN (i3), 0) == const1_rtx)
+    {
+      basic_block bb = BLOCK_FOR_INSN (i3);
+      rtx last = get_last_bb_insn (bb);
+
+      /* First remove all the insns after the trap.  */
+      if (i3 != last)
+	delete_insn_chain (NEXT_INSN (i3), last, true);
+
+      /* And ensure there's no outgoing edges anymore.  */
+      while (EDGE_COUNT (bb->succs) > 0)
+	remove_edge (EDGE_SUCC (bb, 0));
+
+      /* And ensure cfglayout knows this block does not fall through.  */
+      emit_barrier_after_bb (bb);
+
+      /* Not exactly true, but gets the effect we want.  */
+      *new_direct_jump_p = 1;
+    }
+
   /* A noop might also need cleaning up of CFG, if it comes from
      the simplification of a jump.  */
   if (JUMP_P (i3)
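The cleanup sequence in the patch — delete everything after the now-unconditional trap, then drop the block's outgoing edges and add a barrier — can be sketched on a toy insn chain. The linked list below merely stands in for the RTL stream; nothing here is GCC API:

```c
#include <stdlib.h>

/* Toy insn chain standing in for the RTL insn stream.  */
struct toy_insn
{
  int uid;
  struct toy_insn *next;
};

/* Mirror of the cleanup above: once TRAP is known to be unconditional,
   every insn after it in the block is unreachable and is deleted
   (cf. delete_insn_chain (NEXT_INSN (i3), last, true)).  Edge removal
   and the barrier are not modelled.  */
static void
truncate_after_trap (struct toy_insn *trap)
{
  struct toy_insn *p = trap->next;
  trap->next = NULL;		/* the trap becomes the block's last insn */
  while (p)
    {
      struct toy_insn *dead = p->next;
      free (p);
      p = dead;
    }
}
```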