[PATCH] PR52528, combine fix
Hi,

As described in the PR, a testcase compiled for PowerPC:

    struct S { unsigned a : 30; unsigned b : 2; };

    int foo (int b)
    {
      struct S s = {0};
      s.b = b;
      return bar (0x000b0010, 0x00040100ULL, *(unsigned long *)&s);
    }

is currently compiled to:

    foo:
            lis 6,0x4
            li 5,0
            ori 6,6,256
            li 7,0
            crxor 6,6,6
            b bar

Notice the incorrect code generated: the 1st arg (reg 3) is never constructed, and the code for the 3rd arg (reg 7) is wrong.

The problem seems to be in combine, during calls from try_combine() to can_combine_p(): can_combine_p() calls expand_field_assignment(), which may call get_last_value() during its simplification operations (through the reg_nonzero_bits_for_combine() hook); not setting subst_low_luid properly affects its correctness.

So the fix is a one-liner that sets subst_low_luid before the expand_field_assignment() call. Bootstrapped and tested on i686, x86-64 and powerpc64. Cross-tested on ARM.

I was a bit wary that some optimization regression might appear, which would complicate things, but everything looks fine. I have a larger (customer-provided) testcase that exposed this bug after rev. 161655 (the mem-ref2 merge; it may be related to effects on bitfields). So if suitable, please also approve this patch for the 4.6/4.7 branches.

Thanks,
Chung-Lin

2012-03-10  Chung-Lin Tang  clt...@codesourcery.com

        PR rtl-optimization/52528
        * combine.c (can_combine_p): Add setting of subst_low_luid
        before call to expand_field_assignment().

Index: combine.c
===
--- combine.c (revision 185168)
+++ combine.c (working copy)
@@ -1822,6 +1822,10 @@ can_combine_p (rtx insn, rtx i3, rtx pred ATTRIBUT
   if (set == 0)
     return 0;
 
+  /* The simplification in expand_field_assignment() may call back to
+     get_last_value(), so set safe guard here.  */
+  subst_low_luid = DF_INSN_LUID (insn);
+
   set = expand_field_assignment (set);
   src = SET_SRC (set), dest = SET_DEST (set);
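As an illustration only (this is not the testcase attached to the PR), a self-checking variant of the example might look like the sketch below. The recording stub `bar`, the globals `got_a`/`got_b`/`got_c`, and the helpers `struct_bits` and `check_foo` are hypothetical names introduced here. The sketch also assumes that `struct S` occupies exactly one `unsigned int`, and it reads the struct through memcpy instead of the pointer cast so that it does not depend on the size of `long`.

```c
#include <assert.h>
#include <string.h>

struct S { unsigned a : 30; unsigned b : 2; };

/* Assumption of this sketch: the two bit-fields fill one unsigned int.  */
_Static_assert (sizeof (struct S) == sizeof (unsigned int),
                "struct S is expected to be one word");

/* Hypothetical recording stub standing in for the external bar().  */
static unsigned long got_a;
static unsigned long long got_b;
static unsigned int got_c;

static int bar (unsigned long a, unsigned long long b, unsigned int c)
{
  got_a = a;
  got_b = b;
  got_c = c;
  return 0;
}

/* Read the raw bits of S without relying on a pointer cast.  */
static unsigned int struct_bits (struct S s)
{
  unsigned int bits;
  memcpy (&bits, &s, sizeof bits);
  return bits;
}

int foo (int b)
{
  struct S s = {0};
  s.b = b;
  return bar (0x000b0010, 0x00040100ULL, struct_bits (s));
}

/* All three arguments must arrive intact, whatever the optimizer does.  */
void check_foo (int b)
{
  struct S ref = {0};
  ref.b = b;
  foo (b);
  assert (got_a == 0x000b0010UL);
  assert (got_b == 0x00040100ULL);
  assert (got_c == struct_bits (ref));
}
```

Calling `check_foo (3)` exercises the same three-argument call as the original `foo`. Because `bar` is static here, the compiler is free to inline it, so this shows what "correct" means rather than being a guaranteed reproducer of the combine bug.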
Re: [PATCH] PR52528, combine fix
> So the fix is a one-liner that sets subst_low_luid before the
> expand_field_assignment() call. Bootstrapped and tested on i686,
> x86-64 and powerpc64. Cross-tested on ARM.
>
> I was a bit wary that some optimization regression might appear,
> which would complicate things, but everything looks fine. I have a
> larger (customer-provided) testcase that exposed this bug after
> rev. 161655 (the mem-ref2 merge; it may be related to effects on
> bitfields). So if suitable, please also approve this patch for the
> 4.6/4.7 branches.
>
> Thanks,
> Chung-Lin
>
> 2012-03-10  Chung-Lin Tang  clt...@codesourcery.com
>
>         PR rtl-optimization/52528
>         * combine.c (can_combine_p): Add setting of subst_low_luid
>         before call to expand_field_assignment().

OK for mainline, 4.7 branch (once 4.7.0 is released) and 4.6 branch, modulo:

> +  /* The simplification in expand_field_assignment() may call back to
> +     get_last_value(), so set safe guard here.  */
> +  subst_low_luid = DF_INSN_LUID (insn);

No () in comments, just use the function name.

-- 
Eric Botcazou
[google/integration] Add XFAIL file for arm-grtev2-linux-gnueabi target (issue5798046)
Hi Diego,

This patch adds an .xfail file for the arm-grtev2-linux-gnueabi target in the integration branch.

-Doug

2012-03-10  Doug Kwan  dougk...@google.com

        * contrib/testsuite-management/arm-grtev2-linux-gnueabi.xfail: New file.

Index: contrib/testsuite-management/arm-grtev2-linux-gnueabi.xfail
===
--- contrib/testsuite-management/arm-grtev2-linux-gnueabi.xfail (revision 0)
+++ contrib/testsuite-management/arm-grtev2-linux-gnueabi.xfail (revision 0)
@@ -0,0 +1,126 @@
+# Failures in ./gcc/testsuite/gcc/gcc.sum:
+# *** gcc:
+FAIL: gcc.c-torture/compile/920928-2.c -Os (internal compiler error)
+FAIL: gcc.c-torture/compile/920928-2.c -Os (test for excess errors)
+FAIL: gcc.dg/builtin-apply2.c execution test
+FAIL: gcc.dg/cproj-fails-with-broken-glibc.c execution test
+FAIL: gcc.dg/di-longlong64-sync-1.c (test for excess errors)
+UNRESOLVED: gcc.dg/di-longlong64-sync-1.c compilation failed to produce executable
+FAIL: gcc.dg/di-sync-multithread.c execution test
+FAIL: gcc.dg/pr49994-3.c (test for excess errors)
+FAIL: gcc.dg/tls/pr42894.c (test for excess errors)
+FAIL: gcc.dg/torture/stackalign/builtin-apply-2.c -O0 execution test
+FAIL: gcc.dg/torture/stackalign/builtin-apply-2.c -O0 execution test
+FAIL: gcc.dg/torture/stackalign/builtin-apply-2.c -O1 execution test
+FAIL: gcc.dg/torture/stackalign/builtin-apply-2.c -Os execution test
+
+# There are flaky when running on QEMU
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O0 execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O1 execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O2 execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O3 -fomit-frame-pointer execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O3 -g execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -Os execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O0 -fpic execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O1 -fpic execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O2 -fpic execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O3 -fomit-frame-pointer -fpic execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O3 -g -fpic execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -Os -fpic execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O0 -fPIC execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O1 -fPIC execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O2 -fPIC execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O3 -fomit-frame-pointer -fPIC execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O3 -g -fPIC execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -Os -fPIC execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O0 -pie -fpie execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O1 -pie -fpie execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O2 -pie -fpie execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O3 -fomit-frame-pointer -pie -fpie execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O3 -g -pie -fpie execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -Os -pie -fpie execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O0 -pie -fPIE execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O1 -pie -fPIE execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O2 -pie -fPIE execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O3 -fomit-frame-pointer -pie -fPIE execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O3 -g -pie -fPIE execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -Os -pie -fPIE execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
+flaky | FAIL: gcc.dg/torture/tls/tls-test.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
+
+FAIL: gcc.dg/tree-ssa/sra-12.c scan-tree-dump-times release_ssa l; 0
+FAIL: gcc.dg/vect/vect-104.c scan-tree-dump-times vect possible dependence between data-refs 1
+FAIL: gcc.dg/vect/vect-multitypes-11.c scan-tree-dump-times vect vectorized 1 loops 1
+FAIL: gcc.dg/vect/vect-multitypes-12.c scan-tree-dump-times vect vectorized 1 loops 1
+FAIL: gcc.dg/vect/vect-outer-1-big-array.c scan-tree-dump-times vect strided access in outer loop 1
+FAIL: gcc.dg/vect/vect-outer-1.c scan-tree-dump-times vect strided access in outer loop 1
+FAIL: gcc.dg/vect/vect-outer-1a-big-array.c scan-tree-dump-times vect strided access in outer loop 1
+FAIL: gcc.dg/vect/vect-outer-1a.c scan-tree-dump-times vect strided access in outer loop 1
+FAIL: gcc.dg/vect/vect-outer-1b-big-array.c scan-tree-dump-times vect strided access in outer loop 1
+FAIL: gcc.dg/vect/vect-outer-1b.c scan-tree-dump-times vect strided access in outer loop 1
Many regressions with: [patch] Cleanup fortran/convert.c
Steven Bosscher wrote:
> This cleans up some remnants of the ancestors of fortran's convert.c,
> which was copied from GNAT IIRC. I would bootstrap and test this, but
> trunk appears to be broken for x86_64-linux right now (ICE in
> patch_jump_insn). But I can post this for review, at least.
>
> OK for trunk, after bootstrap+test?

Your patch seems to have caused many Fortran regressions. With 185156 I see only one (known) failure, cf.
http://gcc.gnu.org/ml/gcc-testresults/2012-03/msg01069.html

whereas starting with 185160 there are many, many gfortran failures, cf.
http://gcc.gnu.org/ml/gcc-testresults/2012-03/msg01073.html

Tobias

>         * Make-lang.in (convert.o): Depend on convert.h.
>         * convert.c: Header and comment cleanups.
>         (gfc_truthvalue_conversion): Rename static function
>         to truthvalue_conversion. Do not use 'internal_error' from here,
>         use 'gcc_unreachable' instead.
>         (convert): Do not use 'error' for conversions to void, use
>         'gcc_unreachable' instead. Likewise for conversions to non-scalar
>         types. Do not handle ENUMERAL_TYPE, the front end never creates them.
>         Clean up #if 0 code.
Re: Many regressions with: [patch] Cleanup fortran/convert.c
On Sat, Mar 10, 2012 at 11:19 AM, Tobias Burnus bur...@net-b.de wrote:
> Steven Bosscher wrote:
>> This cleans up some remnants of the ancestors of fortran's convert.c,
>> which was copied from GNAT IIRC. I would bootstrap and test this, but
>> trunk appears to be broken for x86_64-linux right now (ICE in
>> patch_jump_insn). But I can post this for review, at least.
>>
>> OK for trunk, after bootstrap+test?
>
> Your patch seems to have caused many Fortran regressions. At least I see
> with 185156 only one (known) failure, cf.
> http://gcc.gnu.org/ml/gcc-testresults/2012-03/msg01069.html
>
> While starting with 185160 there are many, many gfortran failures, cf.
> http://gcc.gnu.org/ml/gcc-testresults/2012-03/msg01073.html

Yes, it seems that different boolean types aren't allowed. I must have looked at the wrong test results somehow.

I'm testing this fix:

Index: convert.c
===
--- convert.c (revision 185160)
+++ convert.c (working copy)
@@ -95,7 +95,8 @@ convert (tree type, tree expr)
   if (code == VOID_TYPE)
     return fold_build1_loc (input_location, CONVERT_EXPR, type, e);
   if (code == BOOLEAN_TYPE)
-    return truthvalue_conversion (e);
+    return fold_build1_loc (input_location, NOP_EXPR, type,
+                            truthvalue_conversion (e));
   if (code == INTEGER_TYPE)
     return fold (convert_to_integer (type, e));
   if (code == POINTER_TYPE || code == REFERENCE_TYPE)
Re: PATCH: Properly generate X32 IE sequence
On Fri, Mar 9, 2012 at 11:26 PM, H.J. Lu hjl.to...@gmail.com wrote:
On Mon, Mar 5, 2012 at 9:25 AM, Uros Bizjak ubiz...@gmail.com wrote:
On Mon, Mar 5, 2012 at 6:03 PM, H.J. Lu hjl.to...@gmail.com wrote:

X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC by checking

    movq foo@gottpoff(%rip), %reg

and

    addq foo@gottpoff(%rip), %reg

It uses the REX prefix to avoid the last byte of the previous instruction. With 32bit Pmode, we may not have the REX prefix and the last byte of the previous instruction may be an offset, which may look like a REX prefix. IE-LE optimization will generate corrupted binary. This patch makes sure we always output a REX prefix for UNSPEC_GOTNTPOFF. OK for trunk?

Actually, linker has:

    case R_X86_64_GOTTPOFF:
      /* Check transition from IE access model:
           mov foo@gottpoff(%rip), %reg
           add foo@gottpoff(%rip), %reg
       */

      /* Check REX prefix first.  */
      if (offset >= 3 && (offset + 4) <= sec->size)
        {
          val = bfd_get_8 (abfd, contents + offset - 3);
          if (val != 0x48 && val != 0x4c)
            {
              /* X32 may have 0x44 REX prefix or no REX prefix.  */
              if (ABI_64_P (abfd))
                return FALSE;
            }
        }
      else
        {
          /* X32 may not have any REX prefix.  */
          if (ABI_64_P (abfd))
            return FALSE;
          if (offset < 2 || (offset + 3) > sec->size)
            return FALSE;
        }

So, it should handle the case without REX just OK. If it doesn't, then this is a bug in binutils.

The last byte of the displacement in the previous instruction may happen to look like a REX byte. In that case, linker will overwrite the last byte of the previous instruction and generate the wrong instruction sequence. I need to update linker to enforce the REX byte check.

One important observation: if we want to follow the x86_64 TLS spec strictly, we have to use existing DImode patterns only. This also means that we should NOT convert other TLS patterns to Pmode, since they explicitly state movq and addq. If this is not the case, then we need new TLS specification for X32.

Here is a patch to properly generate X32 IE sequence.

This is the summary of differences between x86-64 TLS and x32 TLS:

                   x86-64                                     x32
GD
  .byte 0x66; leaq foo@tlsgd(%rip),%rdi;         leaq foo@tlsgd(%rip),%rdi;
  .word 0x; rex64; call __tls_get_addr@plt       .word 0x; rex64; call __tls_get_addr@plt

GD-IE optimization
  movq %fs:0,%rax; addq x@gottpoff(%rip),%rax    movl %fs:0,%eax; addq x@gottpoff(%rip),%rax

GD-LE optimization
  movq %fs:0,%rax; leaq x@tpoff(%rax),%rax       movl %fs:0,%eax; leaq x@tpoff(%rax),%rax

LD
  leaq foo@tlsld(%rip),%rdi;                     leaq foo@tlsld(%rip),%rdi;
  call __tls_get_addr@plt                        call __tls_get_addr@plt

LD-LE optimization
  .word 0x; .byte 0x66; movq %fs:0, %rax         nopl 0x0(%rax); movl %fs:0, %eax

IE
  movq %fs:0,%reg64;                             movl %fs:0,%reg32;
  addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32

  or                                             Not supported if Pmode == SImode
  movq x@gottpoff(%rip),%reg64;                  movq x@gottpoff(%rip),%reg64;
  movq %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

IE-LE optimization
  movq %fs:0,%reg64;                             movl %fs:0,%reg32;
  addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32

  to
  movq %fs:0,%reg64;                             movl %fs:0,%reg32;
  addq foo@tpoff, %reg64                         addl foo@tpoff, %reg32

  movq %fs:0,%reg64;                             movl %fs:0,%reg32;
  leaq foo@tpoff(%reg64), %reg64                 leal foo@tpoff(%reg32), %reg32

  or
  movq x@gottpoff(%rip),%reg64                   movq x@gottpoff(%rip),%reg64;
  movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

  to
  movq foo@tpoff, %reg64                         movq foo@tpoff, %reg64
  movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

LE
  movq %fs:0,%reg64;                             movl %fs:0,%reg32;
  leaq x@tpoff(%reg64),%reg32                    leal x@tpoff(%reg32),%reg32

  or
  movq %fs:0,%reg64;                             movl %fs:0,%reg32;
  addq $x@tpoff,%reg64                           addl $x@tpoff,%reg32

  or
  movq %fs:0,%reg64;                             movl %fs:0,%reg32;
  movl x@tpoff(%reg64),%reg32                    movl x@tpoff(%reg32),%reg32

  or
  movl %fs:x@tpoff,%reg32                        movl %fs:x@tpoff,%reg32

X32 TLS implementation is straightforward, except for IE:

1. Since address override works only on the (reg32) part in fs:(reg32), we can't use it as memory operand. This patch
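To make the REX-prefix concern discussed in this thread concrete, here is a minimal sketch in C. It is not binutils code: `looks_like_rex_w` and `rex_check_demo` are hypothetical names, and the function only mirrors the byte test from the linker snippet quoted above (the byte three positions before the relocated field must be 0x48 or 0x4c). The second byte sequence shows, under these assumptions, how a REX-less `movl` preceded by an instruction whose last byte happens to be 0x48 passes the same test.

```c
#include <assert.h>
#include <stddef.h>

/* OFFSET is the position of the 4-byte relocated field.  For
   "movq/addq foo@gottpoff(%rip), %reg" the REX prefix sits three bytes
   before it (REX, opcode, ModRM, then the field).  */
static int looks_like_rex_w (const unsigned char *contents, size_t size,
                             size_t offset)
{
  unsigned char val;

  if (offset < 3 || offset + 4 > size)
    return 0;
  val = contents[offset - 3];
  return val == 0x48 || val == 0x4c;
}

void rex_check_demo (void)
{
  /* movq foo@gottpoff(%rip), %rax: REX.W, opcode 0x8b, ModRM 0x05,
     then the 32-bit field.  */
  static const unsigned char with_rex[] =
    { 0x48, 0x8b, 0x05, 0, 0, 0, 0 };

  /* movl $0x48000000, %eax (b8 00 00 00 48) followed by a REX-less
     movl foo@gottpoff(%rip), %eax (8b 05 + field).  The trailing 0x48
     of the first instruction is mistaken for a REX prefix.  */
  static const unsigned char no_rex[] =
    { 0xb8, 0x00, 0x00, 0x00, 0x48, 0x8b, 0x05, 0, 0, 0, 0 };

  assert (looks_like_rex_w (with_rex, sizeof with_rex, 3));
  assert (looks_like_rex_w (no_rex, sizeof no_rex, 7));
}
```

A byte test alone cannot tell the two cases apart, which is why the thread discusses either always emitting the REX prefix or tightening the linker check.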
Re: [PR51752] publication safety violations in loop invariant motion pass
On Fri, 2012-03-09 at 15:48 -0600, Aldy Hernandez wrote:
> Torvald is this what you were thinking of?

Yes, but with an exit in the else branch or something that can cause x not being read after the condition. I _suppose_ that your original example would be an allowed transformation but just because x would be read anyway independently of flag's value; we can assume data-race freedom, and thus we must be able to read x in a data-race-free way even if flag is false, so flag's value actually doesn't matter.

What about modifying the example like below? In this case, if flag2 is true, flag's value will matter and we can't move the load to x before it. Will PRE still introduce "tmp = x + 4" in such an example?

Torvald

> +	__transaction_atomic {
> +	  if (flag)
> +	    y = x + 4;
> +	  else
> +	    // stuff
	    if (flag2) return;
> +	  z = x + 4;
> +	}
> +
> +     PRE can rewrite this into:
> +
> +	__transaction_atomic {
> +	  if (flag) {
> +	    tmp = x + 4;
> +	    y = tmp;
> +	  } else {
> +	    // stuff
> +	    tmp = x + 4;
	    if (flag2) return;
> +	  }
> +	  z = tmp;
> +	}
> +
> +     A later pass can move the now totally redundant [x + 4]
> +     before its publication predicated by flag:
> +
> +	__transaction_atomic {
> +	  tmp = x + 4;
> +	  if (flag) {
> +	  } else {
> +	    // stuff
	    if (flag2) return;
> +	  }
> +	  z = tmp;
> +  */
Re: [Patch, Fortran] PR 52542 - Fix PROCEDURE() with Bind(C)
Tobias Burnus wrote:
> If the interface in a PROCEDURE() statement is Bind(C), also the
> procedure (pointer) declared in that statement is BIND(C).
>
> From the F2008 standard: "A proc-language-binding-spec without a NAME=
> is allowed, but is redundant with the proc-interface required by C1222."
>
> Build and currently regtested on x86-64-linux.
> OK for the trunk (if regtesting succeeded)?

Well, it didn't, as I forgot to reset two variables: one then either gets an error that a binding name has been specified, or the wrong binding name might be used.

Build and regtested on x86-64-linux.
OK?

Tobias

2012-03-10  Tobias Burnus  bur...@net-b.de

        PR fortran/52542
        * decl.c (match_procedure_decl): If the interface
        is bind(C), the procedure is as well.

2012-03-10  Tobias Burnus  bur...@net-b.de

        PR fortran/52542
        * gfortran.dg/proc_ptr_35.f90: New.

--- /dev/null 2012-03-09 19:41:57.079829322 +0100
+++ gcc/gcc/testsuite/gfortran.dg/proc_ptr_35.f90 2012-03-09 22:22:31.0 +0100
@@ -0,0 +1,16 @@
+! { dg-do compile }
+!
+! PR fortran/52542
+!
+! Ensure that the procedure myproc is Bind(C).
+!
+! Contribute by Mat Cross of NAG
+!
+interface
+  subroutine s() bind(c)
+  end subroutine s
+end interface
+procedure(s) :: myproc
+call myproc()
+end
+! { dg-final { scan-assembler-not myproc_ } }
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 75b8a89..4da21c3 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -4855,6 +4855,13 @@ match_procedure_decl (void)
   if (m == MATCH_ERROR)
     return MATCH_ERROR;
 
+  if (proc_if && proc_if->attr.is_bind_c && !current_attr.is_bind_c)
+    {
+      current_attr.is_bind_c = 1;
+      has_name_equals = 0;
+      curr_binding_label = NULL;
+    }
+
   /* Get procedure symbols.  */
   for(num=1;;num++)
     {
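For readers coming from the C side, the following sketch shows what the testcase's scan-assembler-not check is about. It is an illustration under assumptions, not part of the patch: with the interface being bind(C), the Fortran call is expected to reference the plain symbol myproc rather than the decorated myproc_, so a C definition with exactly that name would satisfy the call. The counter and the demo helper are hypothetical.

```c
#include <assert.h>

/* Hypothetical call counter, only here so the effect is observable.  */
static int myproc_calls;

/* C counterpart of "subroutine s() bind(c)": no arguments, no result,
   and the linker symbol is the unmangled name "myproc".  */
void myproc (void)
{
  ++myproc_calls;
}

int myproc_call_count (void)
{
  return myproc_calls;
}

/* Stand-in for the Fortran "call myproc()".  */
void myproc_demo (void)
{
  int before = myproc_calls;
  myproc ();
  assert (myproc_calls == before + 1);
}
```

Without the bind(C) attribute on the procedure, the Fortran side would look for a symbol such as myproc_, and a C definition like the one above would not be found at link time.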
Re: PATCH: Properly check mode for x86 call/jmp address
On Wed, Mar 7, 2012 at 1:58 PM, Uros Bizjak ubiz...@gmail.com wrote:
On Wed, Mar 7, 2012 at 5:03 PM, H.J. Lu hjl.to...@gmail.com wrote:

 (define_insn *call
-  [(call (mem:QI (match_operand:P 0 call_insn_operand czw))
+  [(call (mem:QI (match_operand:C 0 call_insn_operand czw))
         (match_operand 1 ))]
-  !SIBLING_CALL_P (insn)
+  !SIBLING_CALL_P (insn)
+   && (GET_CODE (operands[0]) == SYMBOL_REF
+       || GET_MODE (operands[0]) == word_mode)

There are enough copies of this extra constraint that I wonder if it simply ought to be folded into call_insn_operand. Which would need to be changed to define_special_predicate, since you'd be doing your own mode checking. Probably similar changes to sibcall_insn_operand.

Here is the updated patch. I changed constant_call_address_operand and call_register_no_elim_operand to use define_special_predicate. OK for trunk?

Please do not complicate matters that much. Just stick word_mode overrides for register operands in predicates.md, like in attached patch. These changed predicates now allow registers only in word_mode (and VOIDmode). You can now remove all new mode iterators and leave call patterns untouched.

@@ -22940,14 +22940,18 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
            && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
            && !local_symbolic_operand (XEXP (fnaddr, 0), VOIDmode))
     fnaddr = gen_rtx_MEM (QImode, construct_plt_address (XEXP (fnaddr, 0)));
-  else if (sibcall
-           ? !sibcall_insn_operand (XEXP (fnaddr, 0), Pmode)
-           : !call_insn_operand (XEXP (fnaddr, 0), Pmode))
+  else if (!(constant_call_address_operand (XEXP (fnaddr, 0), Pmode)
+             || call_register_no_elim_operand (XEXP (fnaddr, 0),
+                                               word_mode)
+             || (!sibcall
+                 && !TARGET_X32
+                 && memory_operand (XEXP (fnaddr, 0), word_mode))))
     {
       fnaddr = XEXP (fnaddr, 0);
-      if (GET_MODE (fnaddr) != Pmode)
-        fnaddr = convert_to_mode (Pmode, fnaddr, 1);
-      fnaddr = gen_rtx_MEM (QImode, copy_to_mode_reg (Pmode, fnaddr));
+      if (GET_MODE (fnaddr) != word_mode)
+        fnaddr = convert_to_mode (word_mode, fnaddr, 1);
+      fnaddr = gen_rtx_MEM (QImode,
+                            copy_to_mode_reg (word_mode, fnaddr));
     }
   vec_len = 0;

Please update the above part. It looks like you don't even have to change the condition with the new predicates. Basically, you should only convert the address to word_mode instead of Pmode.

+  if (TARGET_X32)
+    operands[0] = convert_memory_address (word_mode, operands[0]);

This addition to indirect_jump and tablejump should be the only change needed in i386.md now. Please write the condition as

    if (Pmode != word_mode)

for consistency.

BTW: The attached patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

Uros.

It doesn't work:

x.i:7:1: error: unrecognizable insn:
(call_insn/j 8 7 9 3 (call (mem:QI (reg:DI 62) [0 *foo.0_1 S1 A8])
        (const_int 0 [0])) x.i:6 -1
     (nil)
    (nil))
x.i:7:1: internal compiler error: in extract_insn, at recog.c:2123
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
make: *** [x.s] Error 1

I will investigate it.

For reference, attached is the complete patch that uses define_special_predicate. This patch works OK with the current mainline, with an additional patch to i386.h, where

Index: i386.h
===
--- i386.h (revision 185079)
+++ i386.h (working copy)
@@ -1744,7 +1744,7 @@
 /* Specify the machine mode that pointers have.
    After generation of rtl, the compiler makes no further distinction
    between pointers and any other objects of this machine mode.  */
-#define Pmode (TARGET_64BIT ? DImode : SImode)
+#define Pmode (TARGET_LP64 ? DImode : SImode)
 
 /* A C expression whose value is zero if pointers that need to be extended
    from being `POINTER_SIZE' bits wide to `Pmode' are sign-extended and

Uros.

I tested this patch and it passed all my x32 tests.

Thanks.

-- 
H.J.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c2cad5a..33ef330 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -23032,13 +23031,13 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
            && !local_symbolic_operand (XEXP (fnaddr, 0), VOIDmode))
     fnaddr = gen_rtx_MEM (QImode, construct_plt_address (XEXP (fnaddr, 0)));
   else if (sibcall
-           ? !sibcall_insn_operand (XEXP (fnaddr, 0), Pmode)
-           : !call_insn_operand (XEXP (fnaddr, 0), Pmode))
+           ? !sibcall_insn_operand (XEXP (fnaddr, 0), word_mode)
+           : !call_insn_operand (XEXP (fnaddr, 0), word_mode))
     {
       fnaddr = XEXP (fnaddr, 0);
-      if (GET_MODE (fnaddr) != Pmode)
-        fnaddr = convert_to_mode (Pmode, fnaddr, 1);
-
Re: [Patch, Fortran] Change array descriptor's data to base_addr for TS 29113
Tobias,

These patches are OK for trunk and fortran-dev.

Many thanks

Paul

On Sat, Mar 10, 2012 at 4:53 PM, Tobias Burnus bur...@net-b.de wrote:
> The attached patch renames (in libgfortran/) the array descriptor's
> "data" field to "base_addr" and "lbound" to "lower_bound".
>
> The reason is that Technical Specification (TS) 29113* uses those names
> in their C bindings, defined in ISO_Fortran_binding.h. But I would like
> to include that header file in libgfortran/libgfortran.h (cf. fortran-dev
> branch). Hence, the renaming.
>
> In order to make later merging of the fortran-dev branch into the trunk
> easier to review, I'd prefer to commit this patch already to the trunk,
> but it can also be committed to the branch.
>
> The patch shouldn't have any effect in terms of the ABI; however, I am
> not sure whether it formally fulfills the criteria in C99's 6.2.7p1**
> (compatible type and composite type). On the other hand, the fields in
> the dimension triplet are already differently named: ubound
> (gcc/fortran/trans-types.c) vs. _ubound (libgfortran/libgfortran.h).
>
> Okay for the trunk? (Or for the fortran-dev branch?) Comments?
>
> (Bootstrapped and regtested on x86-64-linux.)
>
> Tobias
>
> * Current TS 29113 draft:
> ftp://ftp.nag.co.uk/sc22wg5/N1901-N1950/N1904.pdf
> ** C99 plus TC1 to TC3,
> http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf

-- 
The knack of flying is learning how to throw yourself at the ground and miss.
       --Hitchhikers Guide to the Galaxy
[patch, 4.7] libitm: Fix lost wake-up in serial lock.
This patch fixes PR52526, a lost wake-up in libitm (i.e., one or more threads could hang and not get woken up anymore).

The problem was missing handling of one corner case in the futex-based serial lock implementation (config/linux/rwlock.cc, read_lock()): Multiple readers would set READERS to 1 and only call futex_wait(&readers, 1) if there were any writers. Writers would set READERS to 0 and then call futex_wake(&readers). That's fine, but because there are multiple readers, it can happen that some would set READERS to 1 after the writer's futex_wake() call, enabling the futex_wait() in other readers (because READERS isn't 0 anymore).

This patch fixes this by having readers wake up all potentially waiting readers when they set READERS to 1 without an existing writer (thus taking over what the writer would do).

OK for trunk? OK for 4.7 too? This is a showstopper if users hit it, so I'd prefer if it could go into 4.7 as well.

commit 07d6d68b423797311bb04d8eb571f053d2078aa4
Author: Torvald Riegel trie...@redhat.com
Date:   Sat Mar 10 17:44:37 2012 +0100

    libitm: Fix lost wake-up in serial lock.

    PR libitm/52526
    * config/linux/rwlock.cc (GTM::gtm_rwlock::read_lock): Fix lost
    wake-up.

diff --git a/libitm/config/linux/rwlock.cc b/libitm/config/linux/rwlock.cc
index ad1b042..cf1fdd5 100644
--- a/libitm/config/linux/rwlock.cc
+++ b/libitm/config/linux/rwlock.cc
@@ -74,6 +74,32 @@ gtm_rwlock::read_lock (gtm_thread *tx)
 	  atomic_thread_fence (memory_order_seq_cst);
 	  if (writers.load (memory_order_relaxed))
 	    futex_wait(&readers, 1);
+	  else
+	    {
+	      // There is no writer, actually.  However, we can have enabled
+	      // a futex_wait in other readers by previously setting readers
+	      // to 1, so we have to wake them up because there is no writer
+	      // that will do that.  We don't know whether the wake-up is
+	      // really necessary, but we can get lost wake-up situations
+	      // otherwise.
+	      // No additional barrier nor a nonrelaxed load is required due
+	      // to coherency constraints.  write_unlock() checks readers to
+	      // see if any wake-up is necessary, but it is not possible that
+	      // a reader's store prevents a required later writer wake-up;
+	      // If the waking reader's store (value 0) is in modification
+	      // order after the waiting readers store (value 1), then the
+	      // latter will have to read 0 in the futex due to coherency
+	      // constraints and the happens-before enforced by the futex
+	      // (paragraph 6.10 in the standard, 6.19.4 in the Batty et al
+	      // TR); second, the writer will be forced to read in
+	      // modification order too due to Dekker-style synchronization
+	      // with the waiting reader (see write_unlock()).
+	      // ??? Can we avoid the wake-up if readers is zero (like in
+	      // write_unlock())?  Anyway, this might happen too infrequently
+	      // to improve performance significantly.
+	      readers.store (0, memory_order_relaxed);
+	      futex_wake(&readers, INT_MAX);
+	    }
 	}
 
       // And we try again to acquire a read lock.
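The interleaving described in this message can be replayed with a small single-threaded model. The sketch below is a simplification written for illustration, not libitm code: `struct model`, `futex_wait_would_block` and `r1_sleeps_forever` are hypothetical names, the futex is reduced to its documented behaviour of sleeping only while the word still holds the expected value, and memory-ordering subtleties are ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state: the READERS futex word and the writers flag.  */
struct model { int readers; int writers; };

/* futex_wait (&readers, val) only goes to sleep if readers == val.  */
static bool futex_wait_would_block (const struct model *m, int val)
{
  return m->readers == val;
}

/* Replays one problematic schedule with two readers (R1, R2) and one
   writer.  Returns true if R1 ends up asleep with nobody left to wake
   it.  WITH_FIX selects the behaviour added by the patch.  */
bool r1_sleeps_forever (bool with_fix)
{
  struct model m = { 0, 1 };            /* a writer holds the lock */
  bool r1_saw_writer;

  /* R1: announces a waiting reader, then sees the writer and decides
     to wait, but is delayed before calling futex_wait.  */
  m.readers = 1;
  r1_saw_writer = m.writers != 0;

  /* Writer: releases the lock, resets READERS and wakes sleepers.
     Nobody is asleep yet, so the wake-up has no effect.  */
  m.writers = 0;
  m.readers = 0;

  /* R2: sets READERS to 1 after the writer's wake-up, then finds that
     there is no writer.  With the patch it stores 0 again and wakes
     all potential waiters.  */
  m.readers = 1;
  if (m.writers == 0 && with_fix)
    m.readers = 0;

  /* R1: finally calls futex_wait (&readers, 1).  */
  return r1_saw_writer && futex_wait_would_block (&m, 1);
}
```

In this model `r1_sleeps_forever (false)` reports the hang and `r1_sleeps_forever (true)` does not, because R1's futex_wait finds READERS already back at 0 and returns immediately.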
Re: PATCH RFA: Update Go frontend on gcc 4.7 branch
On Fri, Mar 09, 2012 at 02:20:14PM -0800, Ian Lance Taylor wrote:
> I would like to update the Go support on the 4.7 branch. As I've
> mentioned before, Go is working toward a stable Go 1 release. That
> release is not complete, but it is quite close. The 4.7 branch was made
> at a slightly unstable point in the process. I've updated the library
> one more time, and I've spent the week testing the result on a bunch of
> Google-internal programs. What I have now is not perfect, but it is
> better than what is on the 4.7 branch today.

I'm not very excited by such huge changes, but I've tested this on Fedora 17 (various architectures) and RHEL6/5 today, let's check this in. But certainly no further such large change will be accepted on the 4.7 branch.

FYI, on Fedora 17 I had recent testresults without the patch, so below are just testsuite differences for that (debug/dwarf fails consistently everywhere), on RHEL5/6 I didn't have earlier go testsuite results, so I'm just providing summaries there.

Fedora 17

i686-linux
-FAIL: database/sql
+FAIL: debug/dwarf

x86_64-linux
+FAIL: go.test/test/stack.go execution, -O2 -g
+FAIL: debug/dwarf
-FAIL: database/sql
+FAIL: debug/dwarf

ppc-linux
+FAIL: log
-FAIL: database/sql
+FAIL: debug/dwarf
-FAIL: exp/signal
+FAIL: net/http/httptest
+FAIL: os/signal
-FAIL: testing/script

ppc64-linux
-FAIL: exp/signal
+FAIL: net/http/httptest
+FAIL: os/signal
-FAIL: testing/script
+FAIL: log
+FAIL: debug/dwarf

s390-linux
+FAIL: log
+FAIL: debug/dwarf
-FAIL: sync/atomic

s390x-linux
+FAIL: log
-FAIL: sync/atomic
+FAIL: debug/dwarf
+FAIL: log
+FAIL: debug/dwarf
-FAIL: sync/atomic

RHEL 5, x86_64-linux (insufficient .cfi* support, so -fsplit-stack not supported):

		=== go Summary ===

# of expected passes		1045
# of unexpected failures	556
# of expected failures		4
# of untested testcases		535

		=== libgo tests ===

Running target unix
FAIL: debug/dwarf

		=== libgo Summary for unix ===

# of expected passes		122
# of unexpected failures	1

Running target unix/-m32
FAIL: net
FAIL: debug/dwarf

RHEL6, x86_64-linux

		=== go Summary ===

# of expected passes		3296
# of expected failures		4
# of untested testcases		4

		=== libgo tests ===

Running target unix
FAIL: debug/dwarf

		=== libgo Summary for unix ===

# of expected passes		122
# of unexpected failures	1

	Jakub
Re: PATCH: Properly generate X32 IE sequence
On Sat, Mar 10, 2012 at 5:09 AM, Uros Bizjak ubiz...@gmail.com wrote:
On Fri, Mar 9, 2012 at 11:26 PM, H.J. Lu hjl.to...@gmail.com wrote:
On Mon, Mar 5, 2012 at 9:25 AM, Uros Bizjak ubiz...@gmail.com wrote:
On Mon, Mar 5, 2012 at 6:03 PM, H.J. Lu hjl.to...@gmail.com wrote:

X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC by checking

    movq foo@gottpoff(%rip), %reg

and

    addq foo@gottpoff(%rip), %reg

It uses the REX prefix to avoid the last byte of the previous instruction. With 32bit Pmode, we may not have the REX prefix and the last byte of the previous instruction may be an offset, which may look like a REX prefix. IE-LE optimization will generate corrupted binary. This patch makes sure we always output a REX prefix for UNSPEC_GOTNTPOFF. OK for trunk?

Actually, linker has:

    case R_X86_64_GOTTPOFF:
      /* Check transition from IE access model:
           mov foo@gottpoff(%rip), %reg
           add foo@gottpoff(%rip), %reg
       */

      /* Check REX prefix first.  */
      if (offset >= 3 && (offset + 4) <= sec->size)
        {
          val = bfd_get_8 (abfd, contents + offset - 3);
          if (val != 0x48 && val != 0x4c)
            {
              /* X32 may have 0x44 REX prefix or no REX prefix.  */
              if (ABI_64_P (abfd))
                return FALSE;
            }
        }
      else
        {
          /* X32 may not have any REX prefix.  */
          if (ABI_64_P (abfd))
            return FALSE;
          if (offset < 2 || (offset + 3) > sec->size)
            return FALSE;
        }

So, it should handle the case without REX just OK. If it doesn't, then this is a bug in binutils.

The last byte of the displacement in the previous instruction may happen to look like a REX byte. In that case, linker will overwrite the last byte of the previous instruction and generate the wrong instruction sequence. I need to update linker to enforce the REX byte check.

One important observation: if we want to follow the x86_64 TLS spec strictly, we have to use existing DImode patterns only. This also means that we should NOT convert other TLS patterns to Pmode, since they explicitly state movq and addq. If this is not the case, then we need new TLS specification for X32.

Here is a patch to properly generate X32 IE sequence.

This is the summary of differences between x86-64 TLS and x32 TLS:

                   x86-64                                     x32
GD
  .byte 0x66; leaq foo@tlsgd(%rip),%rdi;         leaq foo@tlsgd(%rip),%rdi;
  .word 0x; rex64; call __tls_get_addr@plt       .word 0x; rex64; call __tls_get_addr@plt

GD-IE optimization
  movq %fs:0,%rax; addq x@gottpoff(%rip),%rax    movl %fs:0,%eax; addq x@gottpoff(%rip),%rax

GD-LE optimization
  movq %fs:0,%rax; leaq x@tpoff(%rax),%rax       movl %fs:0,%eax; leaq x@tpoff(%rax),%rax

LD
  leaq foo@tlsld(%rip),%rdi;                     leaq foo@tlsld(%rip),%rdi;
  call __tls_get_addr@plt                        call __tls_get_addr@plt

LD-LE optimization
  .word 0x; .byte 0x66; movq %fs:0, %rax         nopl 0x0(%rax); movl %fs:0, %eax

IE
  movq %fs:0,%reg64;                             movl %fs:0,%reg32;
  addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32

  or                                             Not supported if Pmode == SImode
  movq x@gottpoff(%rip),%reg64;                  movq x@gottpoff(%rip),%reg64;
  movq %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

IE-LE optimization
  movq %fs:0,%reg64;                             movl %fs:0,%reg32;
  addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32

  to
  movq %fs:0,%reg64;                             movl %fs:0,%reg32;
  addq foo@tpoff, %reg64                         addl foo@tpoff, %reg32

  movq %fs:0,%reg64;                             movl %fs:0,%reg32;
  leaq foo@tpoff(%reg64), %reg64                 leal foo@tpoff(%reg32), %reg32

  or
  movq x@gottpoff(%rip),%reg64                   movq x@gottpoff(%rip),%reg64;
  movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

  to
  movq foo@tpoff, %reg64                         movq foo@tpoff, %reg64
  movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

LE
  movq %fs:0,%reg64;                             movl %fs:0,%reg32;
  leaq x@tpoff(%reg64),%reg32                    leal x@tpoff(%reg32),%reg32

  or
  movq %fs:0,%reg64;                             movl %fs:0,%reg32;
  addq $x@tpoff,%reg64                           addl $x@tpoff,%reg32

  or
  movq %fs:0,%reg64;                             movl %fs:0,%reg32;
  movl x@tpoff(%reg64),%reg32                    movl x@tpoff(%reg32),%reg32

  or
  movl %fs:x@tpoff,%reg32                        movl %fs:x@tpoff,%reg32

X32 TLS implementation is straightforward, except for IE:

1. Since address override works only on the
Re: PATCH RFA: Update Go frontend on gcc 4.7 branch
Jakub Jelinek ja...@redhat.com writes:

I'm not very excited by such huge changes, but I've tested this on Fedora 17 (various architectures) and RHEL6/5 today, let's check this in.

Thanks. Committed.

But certainly no further such large change will be accepted on the 4.7 branch.

Understood.

FYI, on Fedora 17 I had recent testresults without the patch, so below are just testsuite differences for that (debug/dwarf fails consistently everywhere), on RHEL5/6 I didn't have earlier go testsuite results, so I'm just providing summaries there.

I will look into these failures, not sure what is happening here.

Ian
PATCH: Check Pmode in lwp_slwpcb
Hi,

Pmode may be SImode for TARGET_64BIT. This patch checks Pmode instead of TARGET_64BIT in lwp_slwpcb. Tested on Linux/x86-64. OK for trunk?

Thanks.

H.J.
---
2012-03-02  H.J. Lu  hongjiu...@intel.com

  * config/i386/i386.md (lwp_slwpcb): Check Pmode instead of TARGET_64BIT.

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 7f5a9e0..8fc7918 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -18015,7 +18065,7 @@
 {
   rtx (*insn)(rtx);
 
-  insn = (TARGET_64BIT
+  insn = (Pmode == DImode
           ? gen_lwp_slwpcbdi
           : gen_lwp_slwpcbsi);
--
1.7.6.5
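To make the difference between the two conditions concrete, here is a small C model. It is a sketch only: enum mode, struct target and both pick_* functions are hypothetical stand-ins, not GCC internals; only the generator names gen_lwp_slwpcbdi and gen_lwp_slwpcbsi come from the patch. It assumes an x32-like target is 64-bit but has a 32-bit Pmode.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-ins for GCC's machine modes; not the real enum.  */
enum mode { SImode, DImode };

/* Hypothetical target description.  */
struct target
{
  int is_64bit;      /* models TARGET_64BIT */
  enum mode pmode;   /* models Pmode */
};

/* Assumed example targets: plain x86-64 and x32.  */
const struct target lp64_target = { 1, DImode };
const struct target x32_target = { 1, SImode };

/* Before the patch: choose the generator from TARGET_64BIT.  */
const char *
pick_by_target_64bit (const struct target *t)
{
  return t->is_64bit ? "gen_lwp_slwpcbdi" : "gen_lwp_slwpcbsi";
}

/* After the patch: choose the generator from Pmode.  */
const char *
pick_by_pmode (const struct target *t)
{
  return t->pmode == DImode ? "gen_lwp_slwpcbdi" : "gen_lwp_slwpcbsi";
}
```

For lp64_target both functions agree. For x32_target the old condition still selects the DImode generator, while the Pmode check selects the SImode one.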
Re: [google/integration] Add XFAIL file for arm-gretv2-linux-gnueabi target (issue5798046)
On 10/03/12 01:16, Doug Kwan wrote:
* contrib/testsuite-management/arm-grtev2-linux-gnueabi.xfail: New file.

OK.

Diego.
Re: [committed] Update baseline symbols for hppa-linux-gnu
On Sat, Mar 10, 2012 at 04:27:47PM -0500, John David Anglin wrote:
Tested on hppa-unknown-linux-gnu and committed to trunk. Ok for 4.7?

Ok, but please leave the two TLS: lines out (similarly to how they are left out for other targets) for now.

@@ -3288,3 +3613,5 @@
 OBJECT:8:_ZTTSo@@GLIBCXX_3.4
 OBJECT:8:_ZTTSt13basic_istreamIwSt11char_traitsIwEE@@GLIBCXX_3.4
 OBJECT:8:_ZTTSt13basic_ostreamIwSt11char_traitsIwEE@@GLIBCXX_3.4
+TLS:4:_ZSt11__once_call@@GLIBCXX_3.4.11
+TLS:4:_ZSt15__once_callable@@GLIBCXX_3.4.11

Jakub
[committed] Skip gcc.dg/torture/pr52402.c execution on 32-bit hppa*-*-hpux*
Tested on hppa2.0w-hp-hpux11.11 and hppa64-hp-hpux11.11. Committed to trunk. Ok for 4.7?

Dave
--
J. David Anglin dave.ang...@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)

2012-03-10  John David Anglin  dave.ang...@nrc-cnrc.gc.ca

  PR target/52450
  * gcc.dg/torture/pr52402.c: Skip execution on 32-bit hppa*-*-hpux*.

Index: gcc.dg/torture/pr52402.c
===
--- gcc.dg/torture/pr52402.c (revision 185121)
+++ gcc.dg/torture/pr52402.c (working copy)
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-options "-w -Wno-psabi" } */
 /* { dg-require-effective-target int32plus } */
+/* { dg-xfail-run-if "pr52450" { { hppa*-*-hpux* } && { ! lp64 } } } */
 
 typedef int v4si __attribute__((vector_size(16)));
 struct T { v4si i[2]; int j; } __attribute__((packed));
Re: [Ping][PATCH, libstdc++-v3] Enable to cross-test libstdc++ on simulator
On 7 March 2012 05:22, Terry Guo wrote:
Hello, Can anybody please review and approve the following simple patch? Thanks very much.
http://gcc.gnu.org/ml/libstdc++/2011-08/msg00063.html

I think this looks OK but I'm not familiar with those details of the testsuite - do any ARM or other maintainers have any comments?
Re: [PATCH 07/10] addr32: Use word_mode instead of Pmode in loop expand
On Thu, Mar 8, 2012 at 3:22 AM, Uros Bizjak ubiz...@gmail.com wrote:
On Fri, Mar 2, 2012 at 10:02 PM, H.J. Lu hongjiu...@intel.com wrote:

This patch uses word_mode instead of Pmode in loop expand since word_mode may have bigger size than Pmode. OK for trunk?

Thanks.

H.J.
---
2012-03-02  H.J. Lu  hongjiu...@intel.com

  * config/i386/i386.c (ix86_expand_movmem): Use word_mode instead of Pmode on loop.
  (ix86_expand_setmem): Likewise.

Jan, can you please comment on the changes in this patch?

Here is a complete updated patch to use word_mode in ix86_expand_movmem and ix86_expand_setmem. It also fixes ix86_zero_extend_to_Pmode to handle Pmode != DImode. OK for trunk?

Thanks.

--
H.J.
---
2012-03-10  H.J. Lu  hongjiu...@intel.com

  * config/i386/i386.c (ix86_zero_extend_to_Pmode): Handle Pmode != DImode.
  (ix86_expand_movmem): Use word_mode for size needed for loop.
  (ix86_expand_setmem): Likewise.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index bc144a9..a51c6b4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21031,7 +21031,11 @@ ix86_zero_extend_to_Pmode (rtx exp)
   if (GET_MODE (exp) == Pmode)
     return copy_to_mode_reg (Pmode, exp);
   r = gen_reg_rtx (Pmode);
-  emit_insn (gen_zero_extendsidi2 (r, exp));
+  if (Pmode == DImode)
+    emit_insn (gen_zero_extendsidi2 (r, exp));
+  else
+    emit_move_insn (r,
+                    simplify_gen_subreg (Pmode, exp, GET_MODE (exp), 0));
   return r;
 }
 
@@ -22060,11 +22064,11 @@ ix86_expand_movmem (rtx dst, rtx src, rtx count_exp, rtx align_exp,
       gcc_unreachable ();
     case loop:
       need_zero_guard = true;
-      size_needed = GET_MODE_SIZE (Pmode);
+      size_needed = GET_MODE_SIZE (word_mode);
       break;
     case unrolled_loop:
       need_zero_guard = true;
-      size_needed = GET_MODE_SIZE (Pmode) * (TARGET_64BIT ? 4 : 2);
+      size_needed = GET_MODE_SIZE (word_mode) * (TARGET_64BIT ? 4 : 2);
       break;
     case rep_prefix_8_byte:
       size_needed = 8;
@@ -22230,13 +22234,13 @@ ix86_expand_movmem (rtx dst, rtx src, rtx count_exp, rtx align_exp,
       break;
     case loop:
       expand_set_or_movmem_via_loop (dst, src, destreg, srcreg, NULL,
-                                     count_exp, Pmode, 1, expected_size);
+                                     count_exp, word_mode, 1, expected_size);
       break;
     case unrolled_loop:
       /* Unroll only by factor of 2 in 32bit mode, since we don't have enough
          registers for 4 temporaries anyway.  */
       expand_set_or_movmem_via_loop (dst, src, destreg, srcreg, NULL,
-                                     count_exp, Pmode, TARGET_64BIT ? 4 : 2,
+                                     count_exp, word_mode, TARGET_64BIT ? 4 : 2,
                                      expected_size);
       break;
     case rep_prefix_8_byte:
@@ -22448,11 +22452,11 @@ ix86_expand_setmem (rtx dst, rtx count_exp, rtx val_exp, rtx align_exp,
       gcc_unreachable ();
     case loop:
       need_zero_guard = true;
-      size_needed = GET_MODE_SIZE (Pmode);
+      size_needed = GET_MODE_SIZE (word_mode);
       break;
     case unrolled_loop:
       need_zero_guard = true;
-      size_needed = GET_MODE_SIZE (Pmode) * 4;
+      size_needed = GET_MODE_SIZE (word_mode) * 4;
       break;
     case rep_prefix_8_byte:
       size_needed = 8;
@@ -22623,11 +22627,11 @@ ix86_expand_setmem (rtx dst, rtx count_exp, rtx val_exp, rtx align_exp,
       break;
     case loop:
       expand_set_or_movmem_via_loop (dst, NULL, destreg, NULL, promoted_val,
-                                     count_exp, Pmode, 1, expected_size);
+                                     count_exp, word_mode, 1, expected_size);
       break;
     case unrolled_loop:
       expand_set_or_movmem_via_loop (dst, NULL, destreg, NULL, promoted_val,
-                                     count_exp, Pmode, 4, expected_size);
+                                     count_exp, word_mode, 4, expected_size);
       break;
     case rep_prefix_8_byte:
       expand_setmem_via_rep_stos (dst, destreg, promoted_val, count_exp,
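The effect of the unrolled_loop change in ix86_expand_movmem can be illustrated with a small C model. This is a sketch under assumptions, not GCC code: enum mode, struct target, mode_size and the three example targets are hypothetical, and it assumes that an x32-like target has a 4-byte Pmode but an 8-byte word_mode. Only the expression GET_MODE_SIZE (mode) * (TARGET_64BIT ? 4 : 2) is taken from the diff.

```c
#include <assert.h>

/* Simplified stand-ins for GCC's machine modes; not the real enum.  */
enum mode { SImode, DImode };

/* Hypothetical target description.  */
struct target
{
  int is_64bit;          /* models TARGET_64BIT */
  enum mode pmode;       /* models Pmode */
  enum mode word_mode;   /* models word_mode */
};

/* Assumed example targets.  */
const struct target ia32_target = { 0, SImode, SImode };
const struct target x32_target = { 1, SImode, DImode };
const struct target lp64_target = { 1, DImode, DImode };

/* Size in bytes of the simplified modes above.  */
unsigned
mode_size (enum mode m)
{
  return m == DImode ? 8 : 4;
}

/* size_needed for the unrolled_loop case before the patch.  */
unsigned
unrolled_size_old (const struct target *t)
{
  return mode_size (t->pmode) * (t->is_64bit ? 4 : 2);
}

/* size_needed for the unrolled_loop case after the patch.  */
unsigned
unrolled_size_new (const struct target *t)
{
  return mode_size (t->word_mode) * (t->is_64bit ? 4 : 2);
}
```

In this model the two computations agree for ia32_target and lp64_target, and differ only for x32_target, where the per-iteration size grows from 16 to 32 bytes because the loop moves full words rather than pointer-sized chunks.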
PATCH: Use Pmode on x86_64 this parameter
Hi,

This patch replaces DImode with Pmode on x86_64 this parameter. OK for trunk?

Thanks.

H.J.
---
2012-03-10  H.J. Lu  hongjiu...@intel.com

  * config/i386/i386.c (x86_this_parameter): Replace DImode with Pmode.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index bc144a9..bfa3cdc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -31971,7 +31978,7 @@ x86_this_parameter (tree function)
         parm_regs = x86_64_ms_abi_int_parameter_registers;
       else
         parm_regs = x86_64_int_parameter_registers;
-      return gen_rtx_REG (DImode, parm_regs[aggr]);
+      return gen_rtx_REG (Pmode, parm_regs[aggr]);
     }
 
   nregs = ix86_function_regparm (type, function);
PATCH: Check ptr_mode and use Pmode in ix86_trampoline_init
Hi,

x86 trampoline depends on ptr_mode. This patch checks ptr_mode, instead of TARGET_X32. Also we should use Pmode for address mode. Tested on Linux/x86-64. OK for trunk?

Thanks.

H.J.
---
2012-03-10  H.J. Lu  hongjiu...@intel.com

  * config/i386/i386.c (ix86_trampoline_init): Use movl for 64bit if ptr_mode == SImode. Replace DImode with Pmode or ptr_mode.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index bc144a9..bfa3cdc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -24309,10 +24313,13 @@ ix86_trampoline_init (rtx m_tramp, tree fndecl, rtx chain_value)
       /* Load the function address to r11.  Try to load address using
          the shorter movl instead of movabs.  We may want to support
          movq for kernel mode, but kernel does not use trampolines at
-         the moment.  */
-      if (x86_64_zext_immediate_operand (fnaddr, VOIDmode))
+         the moment.  FNADDR is a 32bit address and may not be in
+         DImode when ptr_mode == SImode.  Always use movl in this
+         case.  */
+      if (ptr_mode == SImode
+          || x86_64_zext_immediate_operand (fnaddr, VOIDmode))
         {
-          fnaddr = copy_to_mode_reg (DImode, fnaddr);
+          fnaddr = copy_to_mode_reg (Pmode, fnaddr);
 
           mem = adjust_address (m_tramp, HImode, offset);
           emit_move_insn (mem, gen_int_mode (0xbb41, HImode));
@@ -24331,9 +24338,9 @@ ix86_trampoline_init (rtx m_tramp, tree fndecl, rtx chain_value)
           offset += 10;
         }
 
-      /* Load static chain using movabs to r10.  Use the
-         shorter movl instead of movabs for x32.  */
-      if (TARGET_X32)
+      /* Load static chain using movabs to r10.  Use the shorter movl
+         instead of movabs when ptr_mode == SImode.  */
+      if (ptr_mode == SImode)
         {
           opcode = 0xba41;
           size = 6;
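For readers unfamiliar with the encodings, the static chain load can be sketched as a standalone C routine. This is an illustration, not the GCC implementation: emit_static_chain_load, demo_buf and the ptr32 flag are hypothetical names. The value 0xba41 with size 6 comes from the hunk above; the 0xba49 / 10-byte case is not shown in the quoted diff and is filled in here from the x86-64 encoding of movabs to %r10 (REX.W+B prefix 0x49, opcode 0xba).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scratch buffer for the example; large enough for the 10-byte form.  */
unsigned char demo_buf[10];

/* Hypothetical helper: write the instruction that loads CHAIN into
   r10 at BUF and return its length in bytes.  A nonzero PTR32 models
   ptr_mode == SImode.  */
size_t
emit_static_chain_load (unsigned char *buf, uint64_t chain, int ptr32)
{
  /* Stored little-endian, 0xba41 becomes the bytes 41 ba
     (movl $imm32, %r10d) and 0xba49 becomes 49 ba
     (movabs $imm64, %r10).  */
  uint16_t opcode = ptr32 ? 0xba41 : 0xba49;
  size_t imm_bytes = ptr32 ? 4 : 8;
  size_t i;

  assert (buf != NULL);
  buf[0] = opcode & 0xff;
  buf[1] = opcode >> 8;
  /* The immediate follows in little-endian byte order.  */
  for (i = 0; i < imm_bytes; i++)
    buf[2 + i] = (chain >> (8 * i)) & 0xff;
  return 2 + imm_bytes;
}
```

Calling emit_static_chain_load (demo_buf, 0x12345678, 1) produces the 6-byte movl form, which matches size = 6 in the patch; with ptr32 set to 0 the same routine produces the 10-byte movabs form.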