[PATCH][RFC] Sanitize POINTER_PLUS_EXPR offset operand requirements
This implements my idea of how POINTER_PLUS_EXPR should end up being changed. This addresses two things, 1) reduce the use of sizetype nodes, 2) properly handle targets where sizeof (void *) != sizeof (sizetype). Especially 2) is important as currently we lose information about the signedness of the offset and thus have no idea on whether to sign- or zero-extend. The following patch just implements the different checking and handles the desired semantics during expansion. It does not yet touch the various places that build POINTER_PLUS_EXPRs (in fact, a sizetype offset operand is still valid). A final transition will be as tedious as the original patch introducing POINTER_PLUS_EXPRs, but the obvious starting points are the frontends and fold. Partial transitions will of course also eventually expose issues with this new design. Like, how do we fold (p + short a) + unsigned int b. Straight-forward answer: as p + ((uintptr_t)a + (uintptr_t)b). Which we eventually can narrow again if we know sth about the value-ranges of a and b (remember, overflow in ptr + offset is undefined). Comments? Thanks, Richard. 2011-06-17 Richard Guenther rguent...@suse.de * expr.c (expand_expr_real_2): Extend the POINTER_PLUS_EXPR offset operand to pointer precision. * tree-cfg.c (verify_expr): Allow arbitrary integral types for the POINTER_PLUS_EXPR offset operand. (verify_gimple_assign_binary): Likewise. * tree.c (build2_stat): Likewise. * tree.def (POINTER_PLUS_EXPR): Adjust documentation. Index: trunk/gcc/expr.c === *** trunk.orig/gcc/expr.c 2011-06-16 16:56:02.0 +0200 --- trunk/gcc/expr.c2011-06-17 11:33:52.0 +0200 *** expand_expr_real_2 (sepops ops, rtx targ *** 7422,7436 } case POINTER_PLUS_EXPR: ! /* Even though the sizetype mode and the pointer's mode can be different ! expand is able to handle this correctly and get the correct result out ! of the PLUS_EXPR code. */ ! /* Make sure to sign-extend the sizetype offset in a POINTER_PLUS_EXPR ! if sizetype precision is smaller than pointer precision. */ ! if (TYPE_PRECISION (sizetype) TYPE_PRECISION (type)) ! treeop1 = fold_convert_loc (loc, type, ! fold_convert_loc (loc, ssizetype, ! treeop1)); case PLUS_EXPR: /* If we are adding a constant, a VAR_DECL that is sp, fp, or ap, and something else, make sure we add the register to the constant and --- 7422,7433 } case POINTER_PLUS_EXPR: ! /* Extend/truncate the offset operand to pointer width according ! to its signedness. */ ! if (TYPE_PRECISION (type) != TYPE_PRECISION (treeop1)) ! treeop1 = fold_convert_loc (loc, type, treeop1); ! ! /* Fallthru. */ case PLUS_EXPR: /* If we are adding a constant, a VAR_DECL that is sp, fp, or ap, and something else, make sure we add the register to the constant and Index: trunk/gcc/tree-cfg.c === *** trunk.orig/gcc/tree-cfg.c 2011-06-16 16:55:45.0 +0200 --- trunk/gcc/tree-cfg.c2011-06-17 11:35:17.0 +0200 *** verify_expr (tree *tp, int *walk_subtree *** 2845,2857 error (invalid operand to pointer plus, first operand is not a pointer); return t; } ! /* Check to make sure the second operand is an integer with type of !sizetype. */ ! if (!useless_type_conversion_p (sizetype, !TREE_TYPE (TREE_OPERAND (t, 1 { error (invalid operand to pointer plus, second operand is not an !integer with type of sizetype); return t; } /* FALLTHROUGH */ --- 2845,2855 error (invalid operand to pointer plus, first operand is not a pointer); return t; } ! /* Check to make sure the second operand is an integer. */ ! if (!INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (t, 1 { error (invalid operand to pointer plus, second operand is not an !integer type); return t; } /* FALLTHROUGH */ *** verify_gimple_assign_binary (gimple stmt *** 3609,3615 do_pointer_plus_expr_check: if (!POINTER_TYPE_P (rhs1_type) || !useless_type_conversion_p (lhs_type, rhs1_type) ! || !useless_type_conversion_p (sizetype, rhs2_type)) { error (type mismatch in pointer plus expression); debug_generic_stmt (lhs_type); --- 3607,3613 do_pointer_plus_expr_check: if (!POINTER_TYPE_P (rhs1_type) || !useless_type_conversion_p (lhs_type, rhs1_type) !
Re: PING: PATCH: PR other/49325: Incorrect target HAVE_INITFINI_ARRAY check
Why are you not changing the gcc_AC_ macro and instead introducing duplicate code? Perhaps the right solution is to add a final argument to the AC_RUN_IFELSE macro (don't know, I should be on holiday and hence I do not have the source code at hand :)); in any case this is _not_ how you add tests that work for cross compilation. Paolo On Wed, Jun 15, 2011 at 00:05, H.J. Lu hjl.to...@gmail.com wrote: On Wed, Jun 8, 2011 at 9:49 AM, H.J. Lu hongjiu...@intel.com wrote: Hi, Target HAVE_INITFINI_ARRAY support was added by: http://gcc.gnu.org/ml/gcc-patches/2002-11/msg00387.html Unfortunately, it checks if host supports init_array/fini_array sections, not target. It will generate wrong result for cross compiler. This patch uses the target compiler instead of host compiler to check this feature. For ia64, we set HAVE_INITFINI_ARRAY to 1 if init_array/fini_array section can be used. For other targets, we set it only if we can mix init_array/fini_array sections with ctors/dtors sections. OK for trunk? Thanks. H.J. --- 2011-06-07 H.J. Lu hongjiu...@intel.com PR other/49325 * acinclude.m4 (gcc_AC_INITFINI_ARRAY): Removed. * configure.ac: Remove gcc_AC_INITFINI_ARRAY. Add --enable-initfini-array and check if .init_array can be used with .ctors. * configure: Regenerated. diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4 index 3eec559..0f5f686 100644 --- a/gcc/acinclude.m4 +++ b/gcc/acinclude.m4 @@ -369,26 +369,6 @@ else fi fi]) -AC_DEFUN([gcc_AC_INITFINI_ARRAY], -[AC_ARG_ENABLE(initfini-array, - [ --enable-initfini-array use .init_array/.fini_array sections], - [], [ -AC_CACHE_CHECK(for .preinit_array/.init_array/.fini_array support, - gcc_cv_initfini_array, [dnl - AC_RUN_IFELSE([AC_LANG_SOURCE([ -static int x = -1; -int main (void) { return x; } -int foo (void) { x = 0; } -int (*fp) (void) __attribute__ ((section (.init_array))) = foo;])], - [gcc_cv_initfini_array=yes], [gcc_cv_initfini_array=no], - [gcc_cv_initfini_array=no])]) - enable_initfini_array=$gcc_cv_initfini_array -]) -if test $enable_initfini_array = yes; then - AC_DEFINE(HAVE_INITFINI_ARRAY, 1, - [Define .init_array/.fini_array sections are available and working.]) -fi]) - dnl # _gcc_COMPUTE_GAS_VERSION dnl # Used by gcc_GAS_VERSION_GTE_IFELSE dnl # diff --git a/gcc/configure b/gcc/configure index a373518..a23e2fa 100755 --- a/gcc/configure +++ b/gcc/configure @@ -10447,6 +10447,7 @@ fi # Restore CFLAGS from before the gcc_AC_NEED_DECLARATIONS tests. CFLAGS=$saved_CFLAGS +# Check if .init_array can be used with .ctors. # Check whether --enable-initfini-array was given. if test ${enable_initfini_array+set} = set; then : enableval=$enable_initfini_array; @@ -10457,16 +10458,114 @@ $as_echo_n checking for .preinit_array/.init_array/.fini_array support... 6 if test ${gcc_cv_initfini_array+set} = set; then : $as_echo_n (cached) 6 else + if test x${build} = x${target} ; then if test $cross_compiling = yes; then : gcc_cv_initfini_array=no else cat confdefs.h - _ACEOF conftest.$ac_ext /* end confdefs.h. */ +#ifdef __ia64__ +/* We turn on .preinit_array/.init_array/.fini_array support for ia64 + if it can be used. */ static int x = -1; int main (void) { return x; } int foo (void) { x = 0; } int (*fp) (void) __attribute__ ((section (.init_array))) = foo; +#else +extern void abort (); +static int count; + +static void +init1005 () +{ + if (count != 0) + abort (); + count = 1005; +} +void (*const init_array1005) () + __attribute__ ((section (.init_array.01005), aligned (sizeof (void * + = { init1005 }; +static void +fini1005 () +{ + if (count != 1005) + abort (); +} +void (*const fini_array1005) () + __attribute__ ((section (.fini_array.01005), aligned (sizeof (void * + = { fini1005 }; + +static void +ctor1007 () +{ + if (count != 1005) + abort (); + count = 1007; +} +void (*const ctors1007) () + __attribute__ ((section (.ctors.64528), aligned (sizeof (void * + = { ctor1007 }; +static void +dtor1007 () +{ + if (count != 1007) + abort (); + count = 1005; +} +void (*const dtors1007) () + __attribute__ ((section (.dtors.64528), aligned (sizeof (void * + = { dtor1007 }; + +static void +init65530 () +{ + if (count != 1007) + abort (); + count = 65530; +} +void (*const init_array65530) () + __attribute__ ((section (.init_array.65530), aligned (sizeof (void * + = { init65530 }; +static void +fini65530 () +{ + if (count != 65530) + abort (); + count = 1007; +} +void (*const fini_array65530) () + __attribute__ ((section (.fini_array.65530), aligned (sizeof (void * + = { fini65530 }; + +static void +ctor65535 () +{ + if (count != 65530) + abort (); + count = 65535; +} +void (*const ctors65535)
Re: [PATCH PR45098] Disallow NULL pointer in pointer arithmetic
On Fri, Jun 17, 2011 at 12:40 PM, Tom de Vries vr...@codesourcery.com wrote: On 06/17/2011 12:01 AM, Jeff Law wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 06/16/11 00:39, Tom de Vries wrote: Hi, Consider the following example. extern unsigned int foo (int*) __attribute__((pure)); unsigned int tr (int array[], int n) { unsigned int i; unsigned int sum = 0; for (i = 0; i n; i++) sum += foo (array[i]); return sum; } For 32-bit pointers, the analysis in infer_loop_bounds_from_pointer_arith currently concludes that the range of valid array[i] is array[0x0] to array[0x3fff], meaning 0x4000 distinct values. This implies that i n is executed at most 0x4001 times, and i n cannot be eliminated by an 32-bit iterator with step 4, since that one has only 0x4000 distinct values. The patch reasons that NULL cannot be used or produced by pointer arithmetic, and that we can exclude the possibility of the NULL pointer in the range. So the range of valid array[i] is array[0] to array[0x3ffe], meaning 0x3fff distinct values. This implies that i n is executed at most 0x4000 times and i n can be eliminated. The patch implements this new limitation by changing the (low, high, step) triplet in infer_loop_bounds_from_pointer_arith from (0x0, 0x, 0x4) to (0x4, 0x, 0x4). I'm not too happy about the test for C-like language: ptrdiff_type_node != NULL_TREE, but I'm not sure how else to test for this. Bootstrapped and reg-tested on x86_64. I will sent the adapted test cases in a separate email. Interesting. I'd never thought about the generation/use angle to prove a pointer was non-null. ISTM we could use that same logic to infer that more pointers are non-null in extract_range_from_binary_expr. Interested in tackling that improvement, obviously as an independent patch? I'm not familiar with vrp code, but.. something like this? Index: tree-vrp.c === --- tree-vrp.c (revision 173703) +++ tree-vrp.c (working copy) @@ -2273,7 +2273,12 @@ extract_range_from_binary_expr (value_ra { /* For pointer types, we are really only interested in asserting whether the expression evaluates to non-NULL. */ - if (range_is_nonnull (vr0) || range_is_nonnull (vr1)) + if (flag_delete_null_pointer_checks nowrap_type_p (expr_type)) the latter would always return true Btw, I guess you'll miscompile a load of code that is strictly undefined. So I'm not sure we want to do this against our users ... Oh, and of course it's even wrong. I thing it needs !range_includes_zero (vr1) (which we probably don't have). The offset may be 0 and NULL + 0 is still NULL. Richard. + { + set_value_range_to_nonnull (vr, expr_type); + set_value_range_to_nonnull (vr0, expr_type); + } + else if (range_is_nonnull (vr0) || range_is_nonnull (vr1)) set_value_range_to_nonnull (vr, expr_type); else if (range_is_null (vr0) range_is_null (vr1)) set_value_range_to_null (vr, expr_type); Thanks, - Tom
Re: [PATCH PR45098] Disallow NULL pointer in pointer arithmetic
Interesting. I'd never thought about the generation/use angle to prove a pointer was non-null. ISTM we could use that same logic to infer that more pointers are non-null in extract_range_from_binary_expr. Interested in tackling that improvement, obviously as an independent patch? I'm not familiar with vrp code, but.. something like this? Index: tree-vrp.c === --- tree-vrp.c (revision 173703) +++ tree-vrp.c (working copy) @@ -2273,7 +2273,12 @@ extract_range_from_binary_expr (value_ra { /* For pointer types, we are really only interested in asserting whether the expression evaluates to non-NULL. */ - if (range_is_nonnull (vr0) || range_is_nonnull (vr1)) + if (flag_delete_null_pointer_checks nowrap_type_p (expr_type)) the latter would always return true Btw, I guess you'll miscompile a load of code that is strictly undefined. So I'm not sure we want to do this against our users ... Probably not, at least unless the user explicitly asks for it -- for example, we could have -fdelete-null-pointer-checks=2. In fact, it might be a good idea to implement this flag anyway, since some current uses of flag_delete_null_pointer_checks can lead to miscompilations when user makes an error in their code and would probably appreciate more having their program crash. Oh, and of course it's even wrong. I thing it needs !range_includes_zero (vr1) (which we probably don't have). The offset may be 0 and NULL + 0 is still NULL. actually, the result of NULL + 0 is undefined (pointer arithmetics is only defined for pointers to actual objects, and NULL cannot point to one). Zdenek
Re: [PATCH PR45098] Disallow NULL pointer in pointer arithmetic
2011/6/17 Zdenek Dvorak rakd...@kam.mff.cuni.cz: Interesting. I'd never thought about the generation/use angle to prove a pointer was non-null. ISTM we could use that same logic to infer that more pointers are non-null in extract_range_from_binary_expr. Interested in tackling that improvement, obviously as an independent patch? I'm not familiar with vrp code, but.. something like this? Index: tree-vrp.c === --- tree-vrp.c (revision 173703) +++ tree-vrp.c (working copy) @@ -2273,7 +2273,12 @@ extract_range_from_binary_expr (value_ra { /* For pointer types, we are really only interested in asserting whether the expression evaluates to non-NULL. */ - if (range_is_nonnull (vr0) || range_is_nonnull (vr1)) + if (flag_delete_null_pointer_checks nowrap_type_p (expr_type)) the latter would always return true Btw, I guess you'll miscompile a load of code that is strictly undefined. So I'm not sure we want to do this against our users ... Probably not, at least unless the user explicitly asks for it -- for example, we could have -fdelete-null-pointer-checks=2. In fact, it might be a good idea to implement this flag anyway, since some current uses of flag_delete_null_pointer_checks can lead to miscompilations when user makes an error in their code and would probably appreciate more having their program crash. Oh, and of course it's even wrong. I thing it needs !range_includes_zero (vr1) (which we probably don't have). The offset may be 0 and NULL + 0 is still NULL. actually, the result of NULL + 0 is undefined (pointer arithmetics is only defined for pointers to actual objects, and NULL cannot point to one). It's maybe undefined in C, but is it undefined in the middle-end? Thus, are you sure we never generate it from (void *)((uintptr_t)p + obfuscated_0)? I'm sure we simply fold that to p + obfuscated_0. Richard. Zdenek
Re: [PATCH]: Pass -no_pie on SYSTEMSPEC for darwin11
Hi Jack, On 17 Jun 2011, at 03:21, Jack Howarth wrote: The gcj compiler needs to pass -no_pie for linkage on darwin11 due to the new -pie default of the linker. The attached patch accomplishes this by passing -no_pie on SYSTEMSPEC for *-*-darwin[12]*. Since Darwin10 supports -no_pie in its linker, I included it in the triplet match to simplify the syntax. Bootstrap and tested on x86_64- apple-darwin11. Okay for gcc trunk? Jack 2011-06-16 Jack Howarth howa...@bromo.med.uc.edu * libjava/configure.ac (SYSTEMSPEC): Pass -no_pie for darwin11. * libjava/configure: Regenerate. I would like to see some more analysis of what the underlying reasons for failure are. -fpie works fine with darwin 9 and darwin 10 libjava [XCode 3.1.4 and 3.2.5 respectively, bootstrap w/4.2.1] (modulo suppressing it when building test-suite .dylibs *** - which is a testsuite options handling issue - not a fundamental problem). so: make -k check-target-libjava RUNTESTFLAGS=--target_board=unix/-fpie\{- m32,-m64\} passes without regression (re the no pie case) given that pie is suppressed for dylibs. - So is your proposed patch a work-around for (as yet unreleased) darwin 11 tool-chain bugs or ... ? Iain === kludge to suppress pie for for dylibs (use in place of the darwin9.h hunk from the attachment on PR49371). Index: gcc/config/darwin9.h === --- gcc/config/darwin9.h(revision 175110) +++ gcc/config/darwin9.h(working copy) @@ -35,6 +35,12 @@ along with GCC; see the file COPYING3. If not see /* Tell collect2 to run dsymutil for us as necessary. */ #define COLLECT_RUN_DSYMUTIL 1 +#undef PIE_SPEC +#define PIE_SPEC \ + %{fpie|pie|fPIE: %{!Zdynamiclib: \ + %{mdynamic-no-pic: %n'-mdynamic-no-pic' overrides '-pie', '- fpie' or '-fPIE'; \ + :-pie}}} + #undef ASM_OUTPUT_ALIGNED_COMMON #define ASM_OUTPUT_ALIGNED_COMMON(FILE, NAME, SIZE, ALIGN) \ do { \
Re: [PATCH PR45098] Disallow NULL pointer in pointer arithmetic
Hi, Index: tree-vrp.c === --- tree-vrp.c (revision 173703) +++ tree-vrp.c (working copy) @@ -2273,7 +2273,12 @@ extract_range_from_binary_expr (value_ra { /* For pointer types, we are really only interested in asserting whether the expression evaluates to non-NULL. */ - if (range_is_nonnull (vr0) || range_is_nonnull (vr1)) + if (flag_delete_null_pointer_checks nowrap_type_p (expr_type)) the latter would always return true Btw, I guess you'll miscompile a load of code that is strictly undefined. So I'm not sure we want to do this against our users ... Probably not, at least unless the user explicitly asks for it -- for example, we could have -fdelete-null-pointer-checks=2. In fact, it might be a good idea to implement this flag anyway, since some current uses of flag_delete_null_pointer_checks can lead to miscompilations when user makes an error in their code and would probably appreciate more having their program crash. Oh, and of course it's even wrong. I thing it needs !range_includes_zero (vr1) (which we probably don't have). The offset may be 0 and NULL + 0 is still NULL. actually, the result of NULL + 0 is undefined (pointer arithmetics is only defined for pointers to actual objects, and NULL cannot point to one). It's maybe undefined in C, but is it undefined in the middle-end? Thus, are you sure we never generate it from (void *)((uintptr_t)p + obfuscated_0)? I'm sure we simply fold that to p + obfuscated_0. if we do, we definitely should not -- the only point of such a construction is to bypass the pointer arithmetics restrictions, Zdenek
Re: [pph] Fix cxx_binding streaming logic (issue4646041)
On Thu, Jun 16, 2011 at 18:57, Gabriel Charette gch...@google.com wrote: At the end of pph_out_cxx_binding I could also simply do pph_out_uchar (stream, PPH_RECORD_END); instead of pph_out_cxx_binding_1 (stream, NULL, ref_p); which has the exact same effect. That's fine. I kind of prefer the way you did it. 2011-06-16 Gabriel Charette gch...@google.com * gcc/cp/pph-streamer-in.c (pph_in_cxx_binding): Fix streaming logic. * gcc/cp/pph-streamer-out.c (pph_out_cxx_binding): Fix streaming logic. OK, committed in rev 175143. Diego.
Re: [PATCH PR45098] Disallow NULL pointer in pointer arithmetic
2011/6/17 Zdenek Dvorak rakd...@kam.mff.cuni.cz: Hi, Index: tree-vrp.c === --- tree-vrp.c (revision 173703) +++ tree-vrp.c (working copy) @@ -2273,7 +2273,12 @@ extract_range_from_binary_expr (value_ra { /* For pointer types, we are really only interested in asserting whether the expression evaluates to non-NULL. */ - if (range_is_nonnull (vr0) || range_is_nonnull (vr1)) + if (flag_delete_null_pointer_checks nowrap_type_p (expr_type)) the latter would always return true Btw, I guess you'll miscompile a load of code that is strictly undefined. So I'm not sure we want to do this against our users ... Probably not, at least unless the user explicitly asks for it -- for example, we could have -fdelete-null-pointer-checks=2. In fact, it might be a good idea to implement this flag anyway, since some current uses of flag_delete_null_pointer_checks can lead to miscompilations when user makes an error in their code and would probably appreciate more having their program crash. Oh, and of course it's even wrong. I thing it needs !range_includes_zero (vr1) (which we probably don't have). The offset may be 0 and NULL + 0 is still NULL. actually, the result of NULL + 0 is undefined (pointer arithmetics is only defined for pointers to actual objects, and NULL cannot point to one). It's maybe undefined in C, but is it undefined in the middle-end? Thus, are you sure we never generate it from (void *)((uintptr_t)p + obfuscated_0)? I'm sure we simply fold that to p + obfuscated_0. if we do, we definitely should not -- the only point of such a construction is to bypass the pointer arithmetics restrictions, Ok, we don't. Where does the C standard say that NULL + 0 is undefined? Richard. Zdenek
Re: [Patch, AVR]: QI builtins for parity, popcount, 1 n
On Fri, 17 Jun 2011, Georg-Johann Lay wrote: I don't see what's bat with the patch, it's straight forward. C is a high-level language, defined in terms of high-level semantics rather than machine instructions. C code should be written where possible using machine-independent functionality, falling into machine-dependent operations only where the semantics cannot readily be represented in a machine-independent way; the compiler should generally be responsible for picking optimal instructions for the source code. C code should write 1 n (or 1 (n 7) if that's what it wants) for shifts rather than __builtin_avr_pow2. That way it's more portable and more readable to general C programmers who aren't familiar with all the machine-specific interfaces. Similarly, code wanting to add values should use the + operator rather than each target having its own __builtin_arch_add. And since parity and population-count operations are widely present and generically useful, GNU C has generic built-in functions for those, so code should use __builtin_parity and __builtin_popcount on all machines rather than machine-specific variants; such machine-specific variants should not exist. The machine-independent parts of the compiler should know about the equivalence of (popcount X) and (popcount (zero_extend X)), (parity X) and (parity (zero_extend X)) and (parity X) and (parity (sign_extend X)) if the sign-extension is by an even number of bits - in fact, there is already code in simplify_unary_operation_1 that does know this (the assumption that all sign-extensions are by an even number of bits is hardcoded). The target should just need to describe how to code these operations on various modes. If the existing code doesn't suffice to cause popcountqi etc. patterns to be used for the generic built-in functions, the appropriate fix is generic rather than adding new target-specific built-in functions - just as if the compiler didn't generate your target's addition instruction from the '+' operator, the right fix is not to add __builtin_arch_add to generate that instruction but rather to make '+' generate it. -- Joseph S. Myers jos...@codesourcery.com
[PATCH] Fix ICEs with out of range immediates in SSE*/AVX*/XOP* intrinsics (PR target/49411)
On Fri, Jun 17, 2011 at 01:31:14AM +0200, Jakub Jelinek wrote: Not here, those are handled by ix86_expand_args_builtin instead of ix86_expand_multi_arg_builtin. Furthermore, only CODE_FOR_vcvtps2ph and CODE_FOR_vcvtps2ph256 have CONST_INT argument. And I believe ix86_expand_args_builtin handles it fine, what's wrong is the actual predicates those insns use. Ok, had a deeper look into this and it seems there are other issues, some of them even without test coverage regressed since 4.6. Some problems result in ICEs, other fail to assemble. Had to revert the blendbits removal patch, because that removal results in out of range immediates not to be reported as predicate failures, but instead as ICEs. So here is an updated patch that adds test coverage. Regtested on x86_64-linux {-m32,-m64}, ok for trunk (and backport for 4.6)? There are still a couple of things I'm unsure about (not tested by the testcases, compile fine): #include x86intrin.h __m128i i1, i2, i3, i4; __m128 a1, a2, a3, a4; __m128d d1, d2, d3, d4; __m256i l1, l2, l3, l4; __m256 b1, b2, b3, b4; __m256d e1, e2, e3, e4; __m64 m1, m2, m3, m4; int k1, k2, k3, k4; float f1, f2, f3, f4; void foo (void) { /* 8 bit imm only? This compiles fine, but one ends up with number modulo 256 in the insn. To make it error out const_0_to_255_operand would need to be used. */ e1 = _mm256_shuffle_pd (e2, e3, 256); b1 = _mm256_shuffle_ps (b2, b3, 256); i1 = _mm_shuffle_epi32 (i2, 256); i1 = _mm_shufflehi_epi16 (i2, 256); i1 = _mm_shufflelo_epi16 (i2, 256); d1 = _mm_shuffle_pd (d2, d3, 256); m1 = _mm_shuffle_pi16 (m2, 256); a1 = _mm_shuffle_ps (a2, a3, 256); /* What about these? Similarly to the above, they result in imm modulo 16 resp. imm modulo 4. */ e1 = _mm256_permute_pd (e2, 16); d1 = _mm_permute_pd (d2, 4); } 2011-06-17 Jakub Jelinek ja...@redhat.com PR target/49411 * config/i386/i386.c (ix86_expand_multi_arg_builtins): If last_arg_constant and last argument doesn't match its predicate, for xop_vpermil2mode3 error out and for xop_rotlmode3 if it is CONST_INT, mask it, otherwise expand using rotlmode3. (ix86_expand_sse_pcmpestr, ix86_expand_sse_pcmpistr): Fix spelling of error message. * config/i386/sse.md (sse4a_extrqi, sse4a_insertqi, vcvtps2ph, *vcvtps2ph, *vcvtps2ph_store, vcvtps2ph256): Use const_0_to_255_operand instead of const_int_operand. Revert: 2011-05-09 Uros Bizjak ubiz...@gmail.com * config/i386/sse.md (blendbits): Remove mode attribute. (sse4_1_blendssemodesuffixavxsizesuffix): Use const_int_operand instead of const_0_to_blendbits_operand for operand 3 predicate. Check integer value of operand 3 in insn constraint. * gcc.target/i386/testimm-1.c: New test. * gcc.target/i386/testimm-2.c: New test. * gcc.target/i386/testimm-3.c: New test. * gcc.target/i386/testimm-4.c: New test. * gcc.target/i386/testimm-5.c: New test. * gcc.target/i386/testimm-6.c: New test. * gcc.target/i386/testimm-7.c: New test. * gcc.target/i386/testimm-8.c: New test. * gcc.target/i386/xop-vpermil2px-2.c: New test. * gcc.target/i386/xop-rotate1-int.c: New test. * gcc.target/i386/xop-rotate2-int.c: New test. --- gcc/config/i386/i386.c.jj 2011-06-17 11:02:11.0 +0200 +++ gcc/config/i386/i386.c 2011-06-17 13:35:26.0 +0200 @@ -25566,16 +25566,61 @@ ix86_expand_multi_arg_builtin (enum insn int adjust = (comparison_p) ? 1 : 0; enum machine_mode mode = insn_data[icode].operand[i+adjust+1].mode; - if (last_arg_constant i == nargs-1) + if (last_arg_constant i == nargs - 1) { - if (!CONST_INT_P (op)) + if (!insn_data[icode].operand[i + 1].predicate (op, mode)) { - error (last argument must be an immediate); - return gen_reg_rtx (tmode); + enum insn_code new_icode = icode; + switch (icode) + { + case CODE_FOR_xop_vpermil2v2df3: + case CODE_FOR_xop_vpermil2v4sf3: + case CODE_FOR_xop_vpermil2v4df3: + case CODE_FOR_xop_vpermil2v8sf3: + error (the last argument must be a 2-bit immediate); + return gen_reg_rtx (tmode); + case CODE_FOR_xop_rotlv2di3: + new_icode = CODE_FOR_rotlv2di3; + goto xop_rotl; + case CODE_FOR_xop_rotlv4si3: + new_icode = CODE_FOR_rotlv4si3; + goto xop_rotl; + case CODE_FOR_xop_rotlv8hi3: + new_icode = CODE_FOR_rotlv8hi3; + goto xop_rotl; + case CODE_FOR_xop_rotlv16qi3: + new_icode = CODE_FOR_rotlv16qi3; + xop_rotl: + if (CONST_INT_P (op)) + { +
Re: unwinding fallback for mips-irix6 n32
Hi Olivier, Rainer Orth wrote: I've finally gotten around to this. Apart from some comment and code cleanups along the lines of the sol2-unwind.h files, I had to minimally adapt the N32 multithreaded code sequence for IRIX 6.5.30 that I'm running here. While I was at it, I added N64 support which proved to be almost trivial. You'll probably have to adapt this for the version of IRIX 6.5 you're running, or we could simply skip the single varying insn. Either way is fine with me. There's a micro stronger confidence in exact matches, but this could lead to spurious propagation failures on other variants of the OS where a third version of that insn could show up while still part of a valid context. agreed: if the number variations we observe remain in the 3-5 range, we can continue with matching all of them. I've installed the patch in the meantime, so feel free to update it with the N64 variants you see on IRIX != 6.5.30 (might even vary in patches, though). 64-bit stack_check2.adb remains broken, though. It SEGVs in memcpy, but the stack is corrupted, so I cannot say yet what's going on. Still can't, but am currently looking into the remaining libjava failures. I'll run a full bootstrap over the weekend. It seems that the remaining libjava failures are unrelated. OK ... They were, but libjava on both IRIX and Tru64 UNIX didn't make use of MD_FALLBACK_FRAME_STATE_FOR yet. I've just fixed that, which allowed me to get rid of another testsuite failure there. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: avx2 branch is created
On 06/17/2011 12:32 AM, H.J. Lu wrote: Jason, can you add it to git mirror? Done. Jason
Re: RFA (fold): PATCH for c++/49290 (folding *(T*)(ar+10))
Hi, On Thu, 16 Jun 2011, Richard Guenther wrote: If people want to not create useless conversions in the first place, though, I suspect there are lots of places that create useless conversions in the compiler. Yeah, the above looks it comes from the frontends - gimplification should strip this conversion, if it doesn't that is a bug ;) Well, Mike said this view_convert_expr actually is on the LHS of an assignment. I wouldn't be surprised if nobody strips useless conversions of LHSes, exactly because such things on a LHS are no conversions. Ciao, Michael.
Re: [PATCH]: Pass -no_pie on SYSTEMSPEC for darwin11
On Fri, Jun 17, 2011 at 12:04:34PM +0100, IainS wrote: Hi Jack, On 17 Jun 2011, at 03:21, Jack Howarth wrote: The gcj compiler needs to pass -no_pie for linkage on darwin11 due to the new -pie default of the linker. The attached patch accomplishes this by passing -no_pie on SYSTEMSPEC for *-*-darwin[12]*. Since Darwin10 supports -no_pie in its linker, I included it in the triplet match to simplify the syntax. Bootstrap and tested on x86_64- apple-darwin11. Okay for gcc trunk? Jack 2011-06-16 Jack Howarth howa...@bromo.med.uc.edu * libjava/configure.ac (SYSTEMSPEC): Pass -no_pie for darwin11. * libjava/configure: Regenerate. I would like to see some more analysis of what the underlying reasons for failure are. Iain, Stock gcc trunk without my patch, which creates a gcj (ecj1) linked with -pie, crashes as... [MacPro:~] howarth% gcj-fsf-4.7 --main=testme -O testme.java -pie -v Using built-in specs. Reading specs from /sw/lib/gcc4.7/lib/gcc/x86_64-apple-darwin11.0.0/4.7.0/../../../libgcj.spec rename spec startfile to startfileorig rename spec lib to liborig COLLECT_GCC=gcj-fsf-4.7 COLLECT_LTO_WRAPPER=/sw/lib/gcc4.7/libexec/gcc/x86_64-apple-darwin11.0.0/4.7.0/lto-wrapper Target: x86_64-apple-darwin11.0.0 Configured with: ../gcc-4.7-20110617/configure --prefix=/sw --prefix=/sw/lib/gcc4.7 --mandir=/sw/share/man --infodir=/sw/lib/gcc4.7/info --enable-languages=c,c++,fortran,objc,obj-c++,java --with-gmp=/sw --with-libiconv-prefix=/sw --with-ppl=/sw --with-cloog=/sw --with-mpc=/sw --with-system-zlib --x-includes=/usr/X11R6/include --x-libraries=/usr/X11R6/lib --program-suffix=-fsf-4.7 --enable-checking=yes --enable-cloog-backend=isl Thread model: posix gcc version 4.7.0 20110617 (experimental) (GCC) COLLECT_GCC_OPTIONS='-fsaw-java-file' '-mmacosx-version-min=10.7.0' '-O' '-pie' '-v' '-fbootclasspath=./:/sw/lib/gcc4.7/share/java/libgcj-4.7.0.jar' '-shared-libgcc' '-mtune=core2' /sw/lib/gcc4.7/libexec/gcc/x86_64-apple-darwin11.0.0/4.7.0/ecj1 testme.java -fbootclasspath=./:/sw/lib/gcc4.7/share/java/libgcj-4.7.0.jar -fsource=1.5 -ftarget=1.5 -fzip-dependency /var/folders/1l/n78sywl52lz6kkys6nv7mnphgp/T//ccxj1zOQ.zip -fzip-target /var/folders/1l/n78sywl52lz6kkys6nv7mnphgp/T//ccXXudTY.jar Exception in thread main java.lang.ClassFormatError: org.eclipse.jdt.internal.compiler.Compiler (erroneous exception handler info) at java.lang.VMClassLoader.defineClass(libgcj.12.dylib) at java.lang.ClassLoader.defineClass(libgcj.12.dylib) at java.security.SecureClassLoader.defineClass(libgcj.12.dylib) at java.net.URLClassLoader.findClass(libgcj.12.dylib) at java.lang.ClassLoader.loadClass(libgcj.12.dylib) at java.lang.ClassLoader.loadClass(libgcj.12.dylib) at org.eclipse.jdt.internal.compiler.impl.CompilerOptions.resetDefaults(CompilerOptions.java:963) at org.eclipse.jdt.internal.compiler.impl.CompilerOptions.init(CompilerOptions.java:371) at org.eclipse.jdt.internal.compiler.impl.CompilerOptions.init(CompilerOptions.java:363) at org.eclipse.jdt.internal.compiler.batch.Main.initialize(Main.java:3548) at org.eclipse.jdt.internal.compiler.batch.Main.init(Main.java:1435) at org.eclipse.jdt.internal.compiler.batch.Main.init(Main.java:1423) at org.eclipse.jdt.internal.compiler.batch.GCCMain.init(GCCMain.java:62) at org.eclipse.jdt.internal.compiler.batch.GCCMain.main(GCCMain.java:498) Running... MacPro:~] howarth% gdb /sw/lib/gcc4.7/libexec/gcc/x86_64-apple-darwin11.0.0/4.7.0/ecj1 ... (gdb) r testme.java -fbootclasspath=./:/sw/lib/gcc4.7/share/java/libgcj-4.7.0.jar -fsource=1.5 -ftarget=1.5 -fzip-dependency testme.zip -fzip-target testme.jar Starting program: /sw/lib/gcc4.7/libexec/gcc/x86_64-apple-darwin11.0.0/4.7.0/ecj1 testme.java -fbootclasspath=./:/sw/lib/gcc4.7/share/java/libgcj-4.7.0.jar -fsource=1.5 -ftarget=1.5 -fzip-dependency testme.zip -fzip-target testme.jar Reading symbols for shared libraries + done Program exited normally. ...produces no backtraces (and no a.out). Jack -fpie works fine with darwin 9 and darwin 10 libjava [XCode 3.1.4 and 3.2.5 respectively, bootstrap w/4.2.1] (modulo suppressing it when building test-suite .dylibs *** - which is a testsuite options handling issue - not a fundamental problem). so: make -k check-target-libjava RUNTESTFLAGS=--target_board=unix/-fpie\{- m32,-m64\} passes without regression (re the no pie case) given that pie is suppressed for dylibs. - So is your proposed patch a work-around for (as yet unreleased) darwin 11 tool-chain bugs or ... ? Iain === kludge to suppress pie for for dylibs (use in place of the darwin9.h hunk from the attachment on PR49371). Index: gcc/config/darwin9.h === --- gcc/config/darwin9.h(revision 175110) +++ gcc/config/darwin9.h(working copy) @@ -35,6
Re: [Patch, AVR]: PR49313, fix PR29524
On 06/17/2011 02:20 AM, Georg-Johann Lay wrote: Richard Henderson schrieb: On 06/15/2011 02:47 AM, Georg-Johann Lay wrote: +#if defined (L_loop_ffsqi2) +;; Helper for ffshi2, ffssi2 +;; r25:r24 = r26 + zero_extend16 (ffs8(r24)) +;; r24 must be != 0 +;; clobbers: r26 +DEFUN __loop_ffsqi2 Why does this function have loop in its name? The actual implementation is surely irrelevant. hmmm. I needed some global name that can be referenced from __ffshi2 resp. __ffssi2. The function in itself is not very helpful as stand alone. You prefer some other naming for such global helpers? __ffsqi_nz perhaps? The following instruction is BRNE, a conditional branch. Oops, missed that, sorry. Yes, you are right. Following patchlet ok? Johann * config/avr/libgcc.S (__ctzsi2, __ctzhi2): Map zero to 255. You'd also delete the COUNT_LEADING_ZEROS_0 definition in longlong.h. r~
Re: [Patch, AVR]: PR49313, fix PR29524
Richard Henderson schrieb: On 06/17/2011 02:20 AM, Georg-Johann Lay wrote: Richard Henderson schrieb: On 06/15/2011 02:47 AM, Georg-Johann Lay wrote: Yes, you are right. Following patchlet ok? Johann * config/avr/libgcc.S (__ctzsi2, __ctzhi2): Map zero to 255. You'd also delete the COUNT_LEADING_ZEROS_0 definition in longlong.h. r~ __clzsi2(0) still returns 32 as it does not use ctz. So if implementation of ctz affects validity of COUNT_LEADING_ZEROS_0 that's bit confusing... Johann
Re: [PATCH]: Pass -no_pie on SYSTEMSPEC for darwin11
On Jun 16, 2011, at 7:21 PM, Jack Howarth wrote: The gcj compiler needs to pass -no_pie for linkage on darwin11 due to the new -pie default of the linker. The attached patch accomplishes this by passing -no_pie on SYSTEMSPEC for *-*-darwin[12]*. Since Darwin10 supports -no_pie in its linker, I included it in the triplet match to simplify the syntax. Bootstrap and tested on x86_64-apple-darwin11. Okay for gcc trunk? The darwin aspects of this seem right to me. I'm less familiar with the SYSTEMSPEC bit in libjava, someone want to give a quick comment on if this is the right knob to twist? Roughly, there are unresolved issues with position independent code with the garbage collector (or was it a problem with some unknown bit in the compiler), and this just turns off a new OS default for java to work around the issue. I hope that's an at least half way accurate description. We're aiming to have the default for C be -fpie, but for java, no pie. For the C compiler, we must build the compiler without -pie in order for PCH to work. It is reasonable to turn off pie in the java compiler as well. Jack, do we have a PR number for this? If so, please include in the changelog in the usual spot. 2011-06-16 Jack Howarth howa...@bromo.med.uc.edu * libjava/configure.ac (SYSTEMSPEC): Pass -no_pie for darwin11. * libjava/configure: Regenerate. Index: libjava/configure.ac === --- libjava/configure.ac (revision 175131) +++ libjava/configure.ac (working copy) @@ -898,9 +898,12 @@ case ${host} in SYSTEMSPEC=-lunicows $SYSTEMSPEC fi ;; -*-*-darwin[[912]]*) +*-*-darwin9*) SYSTEMSPEC=%{!Zdynamiclib:%{!Zbundle:-allow_stack_execute}} ;; +*-*-darwin[[12]]*) + SYSTEMSPEC=-no_pie %{!Zdynamiclib:%{!Zbundle:-allow_stack_execute}} +;; *) SYSTEMSPEC= ;;
[PATCH] parallelize g++ testing a bit more
I've done a lot of g++-only testsuite runs lately and I noticed that it didn't parallelize all that well. The patch below adds a couple more .exp files to the parallel infrastructure. dg-torture.exp is the big one; it takes about as much time as old-deja.exp. Other valid candidates are lto.exp and debug.exp, but the patch cuts g++ testing time in half as-is, so I felt it was a sufficient stopping point. OK to commit? -Nathan gcc/cp/ * Make-lang.in (check_g++_parallelize): Add more .exp files. diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in index 45efd67..95bae37 100644 --- a/gcc/cp/Make-lang.in +++ b/gcc/cp/Make-lang.in @@ -154,7 +154,7 @@ check-c++-subtargets : check-g++-subtargets lang_checks += check-g++ lang_checks_parallelized += check-g++ # For description see comment above check_gcc_parallelize in gcc/Makefile.in. -check_g++_parallelize = old-deja.exp dg.exp +check_g++_parallelize = old-deja.exp dg.exp dg-torture.exp struct-layout-1.exp # # Install hooks:
Re: [PATCH PR45098] Disallow NULL pointer in pointer arithmetic
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 06/17/11 08:12, Zdenek Dvorak wrote: if we do, we definitely should not -- the only point of such a construction is to bypass the pointer arithmetics restrictions, Ok, we don't. Where does the C standard say that NULL + 0 is undefined? I don't think it explicitly states that it is undefined. But 6.5.6 #7 and #8 only give semantics for pointers to objects, Right. I think we need to review the standard closely, but IIRC pointers are allowed to be NULL, point into an object or point to one element beyond an object. If that's correct P+C could only result in a NULL value if P was one element beyond the end of an object and C happened to be the exact magic value to wrap P to zero. I have a hard time seeing how code could which did that (and relied upon it) could ever be standard conforming. Closely related, given P+C where the result is a pointer ought to tells us that P is non-null since the only way it could be null would be if C was the exact magic constant to make the result point to a valid object. Jeff -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJN+5ZSAAoJEBRtltQi2kC7H34H/RG+9VrV6RqeolgSIqjyLsUf WbJwl2AWo3xIMt/8OKpWG5zzhgJk67tW+PpSHhzs615kjRPryOiiNq+GBIelteKT ho3mgbBeU5qJsPLU6j2feBR4+OEdo/oJZxHm/m8zUwWcGuZtazNGtoiuq7rlNr52 PDl7DM7ZWK731nFZfKYPq/fYMgNxWhxTBo9ucH3s9yXKJ6LYbUGHpKNyP14nB3n3 bJhdPF8A365uLYz5xCmP0QOKInbzNclN+gbTVZXc+NxtOYNUM16NalsxWhHSALCB U0S9hpDMDznWWEwPDNdNN2oRphSGmIB0oYxeuaga5RgviNiPNVgEzANkGIv7OEo= =z8kH -END PGP SIGNATURE-
Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
On Fri, Jun 17, 2011 at 10:45 AM, Fang, Changpeng changpeng.f...@amd.com wrote: Why not just move AVX256_SPLIT_UNALIGNED_STORE and AVX256_SPLIT_UNALIGNED_LOAD to ix86_tune_indices? I would like to keep the -m option so that at least we can explicitly turn off the splittings when regressions are found! I prefer to implement it the same way as: x86_accumulate_outgoing_args x86_arch_always_fancy_math_387 By the way, I can add an index for store splitting, if you want. Yes, please. -- H.J.
Re: [PATCH] parallelize g++ testing a bit more
OK. Jason
Re: [Patch, AVR]: QI builtins for parity, popcount, 1 n
Joseph S. Myers schrieb: On Fri, 17 Jun 2011, Georg-Johann Lay wrote: I don't see what's bat with the patch, it's straight forward. C is a high-level language, defined in terms of high-level semantics rather than machine instructions. C code should be written where possible using machine-independent functionality, falling into machine-dependent operations only where the semantics cannot readily be represented in a machine-independent way; the compiler should generally be responsible for picking optimal instructions for the source code. C code should write 1 n (or 1 (n 7) if that's what it wants) for shifts rather than __builtin_avr_pow2. That way it's more portable and more readable to general C programmers who aren't familiar with all the machine-specific interfaces. Similarly, code wanting to add values should use the + operator rather than each target having its own __builtin_arch_add. And since parity and population-count operations are widely present and generically useful, GNU C has generic built-in functions for those, so code should use __builtin_parity and __builtin_popcount on all machines rather than machine-specific variants; such machine-specific variants should not exist. The machine-independent parts of the compiler should know about the equivalence of (popcount X) and (popcount (zero_extend X)), (parity X) and (parity (zero_extend X)) and (parity X) and (parity (sign_extend X)) if the sign-extension is by an even number of bits - in fact, there is already code in simplify_unary_operation_1 that does know this (the assumption that all sign-extensions are by an even number of bits is hardcoded). The target should just need to describe how to code these operations on various modes. If the existing code doesn't suffice to cause popcountqi etc. patterns to be used for the generic built-in functions, the appropriate fix is generic rather than adding new target-specific built-in functions - just as if the compiler didn't generate your target's addition instruction from the '+' operator, the right fix is not to add __builtin_arch_add to generate that instruction but rather to make '+' generate it. Joseph S. Myers Thanks for you detailed explanations. For popcount and parity I agree, the patch is cleaner now and there are no ad-hoc builtins needed to get the QI versions produced. Trying popcountsi2 confused be a bit: Without popcountsi2 in md, optabs expands to a function as HI = popcount (SI) as I expect from the prototype/documentation. For the expander, however, destination (operand0) must be SI in order to get it to work. With HI destination the expander is ignored. For the 1 n type of shifts the situation on a 8-bit µC is considerably different to a PC: On bare metal with RAM of 128 bytes, 2K for the program and at your option also hard real time constraints, the situation is different and sometimes each instruction/tick/byte counts. In my programs I avoid shifting by variable offsets if possible; but some guys do it. The resulting code is a loop because AVR can only shift by one bit. Together with C spec that results in a bulky 16-bit loop, see PR29560. A typical piece of code might look like this: #define SFR (*((volatile unsigned char*) 0x2b)) void set_port (unsigned char pin) { SFR |= (1 pin); } which compiles to set_port: in r25,43-0x20 ; 7 *movqi/4[length = 1] ldi r18,lo8(1) ; 9 *movhi/4[length = 2] ldi r19,hi8(1) rjmp 2f ; 10 ashlhi3/1 [length = 6] 1: lsl r18 rol r19 2: dec r24 brpl 1b or r25,r18 ; 11 iorqi3/1[length = 1] out 43-0x20,r25 ; 13 *movqi/3[length = 1] ret ; 21 return [length = 1] Trying to hack around that and to get at least an 8-bit loop fails. Sprinkling (unsigned char) casts won't help and writing 1 (pin 7) will add just another instruction (andqi3). Insn combine does not come up with more than (set (reg:HI) (ashift:HI (const_int 1) (reg:QI))) and even if a suitable pattern would show up it is not nice to clutter a backend with this sorts of code like (set (reg:QI) (subreg:QI (ashift:HI (const_int 1) (and:QI (reg:QI) (const_int 7))) 0)) or whatever. There is a ashlqi3 pattern in avr BE but I never managed to produce it (except with -mint8 which changes ABI and is not really supported any more). For the addition, some users torture poor AVR with long long. I agree that it's overkill and anyone that writes that in C must know what he is doing. Anyway, there are programmers that write such code and expect avr-gcc to come up with code like subi r18, -1 sbci r19, -1 sbci r20, -1 sbci r21, -1 sbci r22, -1 sbci r23, -1 sbci r24, -1 sbci r25, -1 for x=x+1. What they actually will get is mov r14,r18 ; 215 *movqi/1[length
Re: [Patch, AVR]: QI builtins for parity, popcount, 1 n
Georg-Johann Lay schrieb: To come back to the original topic, here is a tentative patch for better popcount and parity: * config/avr/t-avr (LIB1ASMFUNCS): Rename _loop_ffsqi2 to _ffsqi2_nz. * confif/avr/libgcc.S: Ditto. Rename __loop_ffsqi2 to __ffsqi2_nz. (__ctzsi2, __ctzhi2): Map zero to 255. (__popcounthi2): Use r27 instead of r30. (__popcountdi2): Use r30 instead of r27. * config/avr/avr.md (parityhi2): New expander. (popcounthi2): New expander. (popcountsi2): New expander. (*parityhi2.libgcc): New insn. (*parityqihi2.libgcc): New insn. (*popcounthi2.libgcc): New insn. (*popcountsi2.libgcc): New insn. (*popcountqi2.libgcc): New insn. (*popcountqihi2.libgcc): New insn_and_split. Johann Oops, picked the wrong file. Index: config/avr/libgcc.S === --- config/avr/libgcc.S (revision 175149) +++ config/avr/libgcc.S (working copy) @@ -935,7 +935,7 @@ DEFUN __ffssi2 brne 1f ret 1: mov r24, r22 -XJMP __loop_ffsqi2 +XJMP __ffsqi2_nz ENDF __ffssi2 #endif /* defined (L_ffssi2) */ @@ -946,7 +946,7 @@ ENDF __ffssi2 DEFUN __ffshi2 clr r26 cpse r24, __zero_reg__ -1: XJMP __loop_ffsqi2 +1: XJMP __ffsqi2_nz ldi r26, 8 or r24, r25 brne 1b @@ -954,20 +954,20 @@ DEFUN __ffshi2 ENDF __ffshi2 #endif /* defined (L_ffshi2) */ -#if defined (L_loop_ffsqi2) +#if defined (L_ffsqi2_nz) ;; Helper for ffshi2, ffssi2 ;; r25:r24 = r26 + zero_extend16 (ffs8(r24)) ;; r24 must be != 0 ;; clobbers: r26 -DEFUN __loop_ffsqi2 +DEFUN __ffsqi2_nz inc r26 lsr r24 -brcc __loop_ffsqi2 +brcc __ffsqi2_nz mov r24, r26 clr r25 ret -ENDF __loop_ffsqi2 -#endif /* defined (L_loop_ffsqi2) */ +ENDF __ffsqi2_nz +#endif /* defined (L_ffsqi2_nz) */ /** @@ -977,12 +977,11 @@ ENDF __loop_ffsqi2 #if defined (L_ctzsi2) ;; count trailing zeros ;; r25:r24 = ctz32 (r25:r22) -;; ctz(0) = 32 +;; ctz(0) = 255 +;; Note that ctz(0) is undefined for GCC. DEFUN __ctzsi2 XCALL __ffssi2 dec r24 -sbrc r24, 7 -ldi r24, 32 ret ENDF __ctzsi2 #endif /* defined (L_ctzsi2) */ @@ -990,12 +989,11 @@ ENDF __ctzsi2 #if defined (L_ctzhi2) ;; count trailing zeros ;; r25:r24 = ctz16 (r25:r24) -;; ctz(0) = 16 +;; ctz(0) = 255 +;; Note that ctz(0) is undefined for GCC. DEFUN __ctzhi2 XCALL __ffshi2 dec r24 -sbrc r24, 7 -ldi r24, 16 ret ENDF __ctzhi2 #endif /* defined (L_ctzhi2) */ @@ -1129,13 +1127,13 @@ ENDF __parityqi2 #if defined (L_popcounthi2) ;; population count ;; r25:r24 = popcount16 (r25:r24) -;; clobbers: r30, __tmp_reg__ +;; clobbers: r27, __tmp_reg__ DEFUN __popcounthi2 XCALL __popcountqi2 -mov r30, r24 +mov r27, r24 mov r24, r25 XCALL __popcountqi2 -add r24, r30 +add r24, r27 clr r25 ret ENDF __popcounthi2 @@ -1144,7 +1142,7 @@ ENDF __popcounthi2 #if defined (L_popcountsi2) ;; population count ;; r25:r24 = popcount32 (r25:r22) -;; clobbers: r26, r30, __tmp_reg__ +;; clobbers: r26, r27, __tmp_reg__ DEFUN __popcountsi2 XCALL __popcounthi2 mov r26, r24 @@ -1162,13 +1160,13 @@ ENDF __popcountsi2 ;; clobbers: r22, r23, r26, r27, r30, __tmp_reg__ DEFUN __popcountdi2 XCALL __popcountsi2 -mov r27, r24 +mov r30, r24 mov_l r22, r18 mov_h r23, r19 mov_l r24, r20 mov_h r25, r21 XCALL __popcountsi2 -add r24, r27 +add r24, r30 ret ENDF __popcountdi2 #endif /* defined (L_popcountdi2) */ Index: config/avr/avr.md === --- config/avr/avr.md (revision 175149) +++ config/avr/avr.md (working copy) @@ -3321,6 +3321,92 @@ (define_insn delay_cycles_4 [(set_attr length 9) (set_attr cc clobber)]) +(define_expand parityhi2 + [(set (reg:HI 24) +(match_operand:HI 1 register_operand )) + (set (reg:HI 24) +(parity:HI (reg:HI 24))) + (set (match_operand:HI 0 register_operand ) +(reg:HI 24))] + + ) + +(define_insn *parityhi2.libgcc + [(set (reg:HI 24) +(parity:HI (reg:HI 24)))] + + %~call __parityhi2 + [(set_attr type xcall) + (set_attr cc clobber)]) + +(define_insn *parityqihi2.libgcc + [(set (reg:HI 24) +(parity:HI (reg:QI 24)))] + + %~call __parityqi2 + [(set_attr type xcall) + (set_attr cc clobber)]) + +(define_expand popcounthi2 + [(set (reg:HI 24) +(match_operand:HI 1 register_operand )) + (parallel[(set (reg:HI 24) + (popcount:HI (reg:HI 24))) + (clobber (reg:QI 27))]) + (set (match_operand:HI 0 register_operand ) +(reg:HI 24))] + + ) + +(define_expand popcountsi2 + [(set (reg:SI 22) +(match_operand:SI 1 register_operand )) + (parallel[(set (reg:HI 24) + (popcount:HI (reg:SI 22))) +
[Patch, Fortran, committed] PR 48699: [4.6/4.7 Regression] [OOP] MOVE_ALLOC inside SELECT TYPE
Hi all, I have just committed to trunk a simple patch for an OOP regression with MOVE_ALLOC (see comment #9-#11 in the PR): http://gcc.gnu.org/viewcvs?view=revisionrevision=175151 I will apply this to the 4.6 branch soon. Cheers, Janus
Re: Improve DSE in the presence of calls
This patch seems to break ia64 and some other targets. I have updated Bug 49429 with a test case triage. It looks like for some EXPR, may_be_aliased returns incorrect value and hence can_escape incorrectly concludes the variable can't escape resulting in removal of useful stores. (So can_escape() returns false. But later on, in the same BB, I see: In the following list of instructions, (insn 4 3 6 2 (set (mem/s/c:DI (reg/f:DI 341) [2 s1+0 S8 A64]) (reg:DI 112 in0)) y.c:23 5 {movdi_internal} (expr_list:REG_DEAD (reg:DI 112 in0) (nil))) ... (insn 36 30 37 2 (set (reg:DI 120 out0) (reg/f:DI 357)) 5 {movdi_internal} (expr_list:REG_EQUAL (plus:DI (reg/f:DI 328 sfp) (const_int 62 [0x3e])) (nil))) (insn 37 36 38 2 (set (reg:DI 121 out1) (reg/f:DI 341)) 5 {movdi_internal} (expr_list:REG_DEAD (reg/f:DI 341) (expr_list:REG_EQUAL (plus:DI (reg/f:DI 328 sfp) (const_int 96 [0x60])) (nil (insn 38 37 39 2 (set (reg:DI 122 out2) (const_int 31 [0x1f])) 5 {movdi_internal} (nil)) (call_insn 39 38 42 2 (parallel [ (set (reg:DI 8 r8) (call (mem:DI (symbol_ref:DI (memcpy) [flags 0x41] function_decl 0x770d2e00 memcpy) [0 memcpy S8 A64]) (const_int 1 [0x1]))) (clobber (reg:DI 320 b0)) (clobber (scratch:DI)) (clobber (scratch:DI)) ]) 332 {call_value_gp} (expr_list:REG_DEAD (reg:DI 122 out2) (expr_list:REG_DEAD (reg:DI 121 out1) (expr_list:REG_DEAD (reg:DI 120 out0) (expr_list:REG_UNUSED (reg:DI 8 r8) (expr_list:REG_EH_REGION (const_int 0 [0]) (nil)) (expr_list:REG_DEP_TRUE (use (reg:DI 1 r1)) (expr_list:REG_DEP_TRUE (use (reg:DI 122 out2)) (expr_list:REG_DEP_TRUE (use (reg:DI 121 out1)) (expr_list:REG_DEP_TRUE (use (reg:DI 120 out0)) (nil)) for the memory expression (set (mem/s/c:DI (reg/f:DI 341) [2 s1+0 S8 A64]) (reg:DI 112 in0)) may_be_aliased() returns false, but register 341 is passed as the second parameter to memcpy. I suspect this is a bug elsewhere which is exposed by this patch. If someone knows why this might be happening, I can tighten the can_escape() function appropriately. Thanks, Easwaran On Tue, Jun 14, 2011 at 9:34 AM, Jeff Law l...@redhat.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 05/10/11 13:18, Easwaran Raman wrote: I am not sure I understand the problem here. If there is a wild read from asm, the instruction has the wild_read flag set. The if statement checks if that flag is set and if so it clears the bitmap - which was the original behavior. Originally, only if read_rec is non NULL you need to recompute the kill set. Now, even if read_rec is NULL, non_frame_wild_read could be set requiring the kill set to be modified, which is what this patch does. In fact, isn't what you have written above the equivalent to what is in the patch as '/* Leave this clause unchanged */' is the same as if (dump_file) fprintf (dump_file, regular read\n); scan_reads_nospill (insn_info, v, NULL); -Easwaran Ping. I have changed the test case to use int and added another test case that shows DSE doesn't happen when the struct instance is volatile (wild_read gets set in that case) What's the purpose behind using unit64_t in the testcase? Somehow I suspect using int64_t means the test is unlikely not going to work across targets with different word sizes. Sorry for the exceedingly long wait. Things have been a bit crazy the last several weeks. On a positive note, re-reading things now I think my objection/comment was mis-guided. Patch approved, and again, sorry for the absurdly long period of non-responsiveness. jeff -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJN942IAAoJEBRtltQi2kC7Fj4IAIUvXsEKHZEKHS2k/psJWyaM Uo/vW3CLydRP0+Np/VVSwzHlmWfdWmOj1WPw1Svhvr4gP8BrZ13okVv5jbw1Hh3Y R4mShXFK5eYzmGx5wL54hOze5zViN3gomNGbDAAhk6TCzNXmPyLT/6V1tLFTNhD5 6zOiW8pXh9ik6qTTCKbG0EMuJXDnIbYrJs4d/gHFerUgmRPc8adKjF3PCngD3F4r 40n9W/UxUejYUddavDW1fIdALWYc56F3glplFsII7SMnOmih8MTFYOvk6SZsLS5O G2nzmnUuwt6tPWTyk9bpVKQi5dn8MmLkM13w22t36GKIg6OER2KfUdv44dgE7yw= =o7AI -END PGP SIGNATURE-
Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
On Fri, Jun 17, 2011 at 3:18 PM, Fang, Changpeng changpeng.f...@amd.com wrote: Hi, I added AVX256_SPLIT_UNALIGNED_STORE to ix86_tune_indices and put m_COREI7, m_BDVER1 and m_GENERIC as the targets that enable it. Is this OK? Can you do something similar to how MASK_ACCUMULATE_OUTGOING_ARGS is handled? Thanks. H.J.
[pph] Initialize cache_ix in all paths in pph_start_record (issue4642045)
Read the comment in the diff for all the details. I found this while working on my current patch, adding some assertion at the end of the function would create a new build error and moving it around in the function would stop showing the error. I discussed this with Collin and Jeff and it appears that the compiler changes it's inlining decisions based on very picky details. In this particular case, inlining the function resulted in an error. Tested with bootstrap build and pph regression testing. 2011-06-17 Gabriel Charette gch...@google.com * gcc/cp/pph-streamer-in.c (pph_start_record): Initialize cache_ix in all paths. diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c index b3c2ac9..186100f 100644 --- a/gcc/cp/pph-streamer-in.c +++ b/gcc/cp/pph-streamer-in.c @@ -204,7 +204,15 @@ pph_start_record (pph_stream *stream, unsigned *cache_ix) if (marker == PPH_RECORD_START || marker == PPH_RECORD_SHARED) *cache_ix = pph_in_uint (stream); else -gcc_assert (marker == PPH_RECORD_END); +{ + gcc_assert (marker == PPH_RECORD_END); + /* Initialize CACHE_IX to an invalid index. Even though this +is never used in practice, the compiler will throw an error +if the optimizer inlines this function and discards the asserts +in a given build as it will complain that 'ix' may be used +unititialized. */ + *cache_ix = -1; +} return marker; } -- This patch is available for review at http://codereview.appspot.com/4642045
[pph] Rename two pph_start_record functions adding in/out. (issue4629049)
There were two pph_start_record functions, one for the in part, one for the out part. I renamed them to have different names to avoid confusion for the reader (and it also makes it easier for tags). 2011-06-17 Gabriel Charette gch...@google.com * gcc/cp/pph-streamer-in.c (pph_in_start_record): Rename from pph_start_record. Update all users. * gcc/cp/pph-streamer-out.c (pph_out_start_record): Rename from pph_start_record. Update all users. diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c index b3c2ac9..cd611c7 100644 --- a/gcc/cp/pph-streamer-in.c +++ b/gcc/cp/pph-streamer-in.c @@ -192,7 +192,7 @@ pph_init_read (pph_stream *stream) materialized structure. */ static inline enum pph_record_marker -pph_start_record (pph_stream *stream, unsigned *cache_ix) +pph_in_start_record (pph_stream *stream, unsigned *cache_ix) { enum pph_record_marker marker; @@ -211,7 +211,7 @@ pph_start_record (pph_stream *stream, unsigned *cache_ix) /* Return a shared pointer from the streamer cache in STREAM. This is - called when pph_start_record returns PPH_RECORD_SHARED. It means + called when pph_in_start_record returns PPH_RECORD_SHARED. It means that the data structure we are about to read has been instantiated before and is present in the streamer cache. */ @@ -330,7 +330,7 @@ pph_in_cxx_binding_1 (pph_stream *stream) enum pph_record_marker marker; unsigned ix; - marker = pph_start_record (stream, ix); + marker = pph_in_start_record (stream, ix); if (marker == PPH_RECORD_END) return NULL; else if (marker == PPH_RECORD_SHARED) @@ -378,7 +378,7 @@ pph_in_class_binding (pph_stream *stream) enum pph_record_marker marker; unsigned ix; - marker = pph_start_record (stream, ix); + marker = pph_in_start_record (stream, ix); if (marker == PPH_RECORD_END) return NULL; else if (marker == PPH_RECORD_SHARED) @@ -401,7 +401,7 @@ pph_in_label_binding (pph_stream *stream) enum pph_record_marker marker; unsigned ix; - marker = pph_start_record (stream, ix); + marker = pph_in_start_record (stream, ix); if (marker == PPH_RECORD_END) return NULL; else if (marker == PPH_RECORD_SHARED) @@ -425,7 +425,7 @@ pph_in_binding_level (pph_stream *stream) struct bitpack_d bp; enum pph_record_marker marker; - marker = pph_start_record (stream, ix); + marker = pph_in_start_record (stream, ix); if (marker == PPH_RECORD_END) return NULL; else if (marker == PPH_RECORD_SHARED) @@ -495,7 +495,7 @@ pph_in_c_language_function (pph_stream *stream) enum pph_record_marker marker; unsigned ix; - marker = pph_start_record (stream, ix); + marker = pph_in_start_record (stream, ix); if (marker == PPH_RECORD_END) return NULL; else if (marker == PPH_RECORD_SHARED) @@ -521,7 +521,7 @@ pph_in_language_function (pph_stream *stream) enum pph_record_marker marker; unsigned ix; - marker = pph_start_record (stream, ix); + marker = pph_in_start_record (stream, ix); if (marker == PPH_RECORD_END) return NULL; else if (marker == PPH_RECORD_SHARED) @@ -614,7 +614,7 @@ pph_in_struct_function (pph_stream *stream) gcc_assert (stream-data_in != NULL); - marker = pph_start_record (stream, ix); + marker = pph_in_start_record (stream, ix); if (marker == PPH_RECORD_END) return NULL; @@ -713,7 +713,7 @@ pph_in_lang_specific (pph_stream *stream, tree decl) enum pph_record_marker marker; unsigned ix; - marker = pph_start_record (stream, ix); + marker = pph_in_start_record (stream, ix); if (marker == PPH_RECORD_END) return; @@ -828,7 +828,7 @@ pph_in_sorted_fields_type (pph_stream *stream) enum pph_record_marker marker; unsigned ix; - marker = pph_start_record (stream, ix); + marker = pph_in_start_record (stream, ix); if (marker == PPH_RECORD_END) return NULL; else if (marker == PPH_RECORD_SHARED) @@ -908,7 +908,7 @@ pph_in_lang_type_class (pph_stream *stream, ltc-typeinfo_var = pph_in_tree (stream); ltc-vbases = pph_in_tree_vec (stream); - marker = pph_start_record (stream, ix); + marker = pph_in_start_record (stream, ix); if (marker == PPH_RECORD_START) { ltc-nested_udts = pph_in_binding_table (stream); @@ -950,7 +950,7 @@ pph_in_lang_type (pph_stream *stream) enum pph_record_marker marker; unsigned ix; - marker = pph_start_record (stream, ix); + marker = pph_in_start_record (stream, ix); if (marker == PPH_RECORD_END) return NULL; else if (marker == PPH_RECORD_SHARED) diff --git a/gcc/cp/pph-streamer-out.c b/gcc/cp/pph-streamer-out.c index 49847a9..83e90c3 100644 --- a/gcc/cp/pph-streamer-out.c +++ b/gcc/cp/pph-streamer-out.c @@ -200,7 +200,7 @@ pph_flush_buffers (pph_stream *stream) cache, write a shared-record marker and return false. */ static inline bool -pph_start_record (pph_stream *stream, void *data) +pph_out_start_record (pph_stream *stream, void *data) { if
Re: [Patch, libfortran] PR 49296 List read of file without EOR
On 06/17/2011 12:02 PM, Janne Blomqvist wrote: PING. Also, here's a slightly simplfied testcase: Yes, OK Jerry
[PATCH] move libiconv linkage from libgcj.spec.in to libgcj_la_LDFLAGS
The attached patch moves the linkage on $(LDLIBICONV) from libgcj.spec.in to a more correct direct linkage on libgcj which contains the unresolved symbols. This change also solves problems when a multilib only has libiconv for one of the two architectures (ie building gcc trunk on x86_64 fink which lacks a i386 libiconv). Bootstrap and libjava regression tested on x86_64 darwin11. Okay for gcc trunk? Jack 2011-06-18 Jack Howarth howa...@bromo.med.uc.edu * libjava/libgcj.spec.in: Don't pass @LDLIBICONV@. * libjava/Makefile.am (libgcj_la_LDFLAGS): Pass @LDLIBICONV@. * libjava/Makefile.in: Regenerate. Index: libjava/libgcj.spec.in === --- libjava/libgcj.spec.in (revision 175163) +++ libjava/libgcj.spec.in (working copy) @@ -7,6 +7,6 @@ *startfile: @THREADSTARTFILESPEC@ %(startfileorig) %rename lib liborig -*lib: @LD_START_STATIC_SPEC@ @LIBGCJ_SPEC@ @LD_FINISH_STATIC_SPEC@ @LIBMATHSPEC@ @LDLIBICONV@ @GCSPEC@ @THREADSPEC@ @ZLIBSPEC@ @SYSTEMSPEC@ %(libgcc) @LIBSTDCXXSPEC@ %(liborig) +*lib: @LD_START_STATIC_SPEC@ @LIBGCJ_SPEC@ @LD_FINISH_STATIC_SPEC@ @LIBMATHSPEC@ @GCSPEC@ @THREADSPEC@ @ZLIBSPEC@ @SYSTEMSPEC@ %(libgcc) @LIBSTDCXXSPEC@ %(liborig) *jc1: @HASH_SYNC_SPEC@ @DIVIDESPEC@ @CHECKREFSPEC@ @JC1GCSPEC@ @EXCEPTIONSPEC@ @BACKTRACESPEC@ @IEEESPEC@ @ATOMICSPEC@ @LIBGCJ_BC_SPEC@ -fkeep-inline-functions Index: libjava/Makefile.am === --- libjava/Makefile.am (revision 175163) +++ libjava/Makefile.am (working copy) @@ -492,7 +492,7 @@ xlib_nat_files = $(xlib_nat_source_files libgcj_la_LDFLAGS = -rpath $(toolexeclibdir) $(THREADLDFLAGS) $(extra_ldflags) $(THREADLIBS) \ $(LIBLTDL) $(SYS_ZLIBS) $(LIBJAVA_LDFLAGS_NOUNDEF) \ -version-info `grep -v '^\#' $(srcdir)/libtool-version` \ - $(LIBGCJ_LD_SYMBOLIC_FUNCTIONS) $(LIBGCJ_LD_EXPORT_ALL) + $(LIBGCJ_LD_SYMBOLIC_FUNCTIONS) $(LIBGCJ_LD_EXPORT_ALL) $(LDLIBICONV) libgcj_la_LIBADD = \ classpath/native/fdlibm/libfdlibm.la \ java/lang/Object.lo \
C++ PATCH for c++/49458 (rvalue reference to function and conversion ops)
Another special case for the rule that an rvalue reference to function is an lvalue. Tested x86_64-pc-linux-gnu, applying to trunk. commit df1f90245e5436ceadf6b3df5cc6f7ef7e0536c0 Author: Jason Merrill ja...@redhat.com Date: Fri Jun 17 16:45:44 2011 -0400 PR c++/49458 * call.c (convert_class_to_reference_1): Allow binding function lvalue to rvalue reference. diff --git a/gcc/cp/call.c b/gcc/cp/call.c index b43d078..05bf983 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -1362,6 +1362,8 @@ convert_class_to_reference_1 (tree reference_type, tree s, tree expr, int flags) /* Don't allow binding of lvalues to rvalue references. */ if (TYPE_REF_IS_RVALUE (reference_type) + /* Function lvalues are OK, though. */ + TREE_CODE (TREE_TYPE (reference_type)) != FUNCTION_TYPE !TYPE_REF_IS_RVALUE (TREE_TYPE (TREE_TYPE (cand-fn cand-second_conv-bad_p = true; } diff --git a/gcc/testsuite/g++.dg/cpp0x/rv-func2.C b/gcc/testsuite/g++.dg/cpp0x/rv-func2.C new file mode 100644 index 000..b792342 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/rv-func2.C @@ -0,0 +1,10 @@ +// PR c++/49458 +// { dg-options -std=c++0x } + +typedef void ftype(); + +struct A { + operator ftype(void); +}; + +ftype frvref = A();