Re: RFA: Use precision in get_mode_bounds()
Hi Volker, apparently you added the wrong PR number to the patch. You probably meant PR 59613. Would you mind fixing that in the ChangeLog? Doh! Sorry - fixed. Cheers Nick
Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622)
On Tue, Dec 31, 2013 at 08:39:02AM +0100, Richard Biener wrote: That said, fold_stmt callers have to be prepared to handle say a normal call becoming a noreturn call, consider say: struct A { virtual int foo (); }; struct B : public A { int foo () __attribute__((noreturn)); }; int B::foo () { __builtin_exit (0); } int bar () { B b; B *p = &b; return p->foo (); } Is that a valid specialization though? I think so, after all don't we set the noreturn attribute automatically even if it is missing when IPA can prove the function never returns? If we fold_stmt after that, the above testcase even without an explicit noreturn attribute would need cfg cleanup. Perhaps gimple_fold_call should punt and not change fndecl if !inplace if some call flags have changed that would require cfg cleanup, making at least fold_stmt_inplace callers not have to deal with it, and make sure fold_stmt callers schedule cleanup_cfg when fold_stmt returns true? Jakub
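Written out as a standalone translation unit, Jakub's example looks like the sketch below. To make it testable without terminating the process, `throw` stands in for `__builtin_exit (0)` — the point is only that `B::foo` never returns normally, so devirtualizing `p->foo ()` turns a normal call into a noreturn call. The harness names are mine, not from the mail.

```cpp
#include <cassert>

struct A { virtual int foo (); };
struct B : public A { int foo () __attribute__((noreturn)); };

int A::foo () { return 1; }

// Stand-in for __builtin_exit (0): any body that never returns normally.
int B::foo () { throw 0; }

int bar ()
{
  B b;
  B *p = &b;
  return p->foo ();  // devirtualized to a direct call to the noreturn B::foo
}
```

Once the call is folded to the noreturn `B::foo`, everything after it in the basic block is dead — which is exactly why the caller then needs a cfg cleanup.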
[Patch, Fortran] PR 59023: [4.9 regression] ICE in gfc_search_interface with BIND(C)
Hi all, ... and the reg-bashing continues: Here is a small patch to fix a bind(c) problem. It essentially prevents 'resolve_global_procedure' from being applied to bind(c) procedures. Regtested on x86_64-unknown-linux-gnu. Ok for trunk? Cheers, Janus 2013-12-31 Janus Weil ja...@gcc.gnu.org PR fortran/59023 * resolve.c (resolve_global_procedure): Don't apply to c-binding procedures. (gfc_verify_binding_labels): Remove duplicate line. 2013-12-31 Janus Weil ja...@gcc.gnu.org PR fortran/59023 * gfortran.dg/bind_c_procs_2.f90: New. Index: gcc/fortran/resolve.c === --- gcc/fortran/resolve.c (revision 206252) +++ gcc/fortran/resolve.c (working copy) @@ -2351,6 +2351,7 @@ resolve_global_procedure (gfc_symbol *sym, locus * if ((sym->attr.if_source == IFSRC_UNKNOWN || sym->attr.if_source == IFSRC_IFBODY) && gsym->type != GSYM_UNKNOWN + && !gsym->binding_label && gsym->ns && gsym->ns->resolved != -1 && gsym->ns->proc_name @@ -10163,7 +10164,6 @@ gfc_verify_binding_labels (gfc_symbol *sym) gsym->where = sym->declared_at; gsym->sym_name = sym->name; gsym->binding_label = sym->binding_label; - gsym->binding_label = sym->binding_label; gsym->ns = sym->ns; gsym->mod_name = module; if (sym->attr.function) ! { dg-do compile } ! ! PR 59023: [4.9 regression] ICE in gfc_search_interface with BIND(C) ! ! Contributed by Francois-Xavier Coudert fxcoud...@gcc.gnu.org type t integer hidden end type contains subroutine bar type(t) :: toto interface integer function helper() bind(c) end function end interface toto = t(helper()) end subroutine end
Re: [PATCH i386 9/8] [AVX512] Add forgotten kmovw insn, built-in and test.
RA figured out that operation with general registers results in less moves (you already have x1 in general reg). This is exactly the reason why I think unspecs are not needed. It is the job of the compiler to choose the most appropriate approach, and its behavior should be adjusted with appropriate cost functions. I guess that if you load x from memory, RA will allocate mask registers all the way to the add insn. I tried loading from memory and the result is the same. Without unspec this intrinsic is just return __A and is completely useless. As for RA choosing the best approach, the big concern was generating klogic for normal instructions, so the current implementation of masks is conservative and RA chooses gpr alternatives. So I think that a kmov intrinsic with unspec has its uses as a hint to the compiler. If you are against this approach, here is a version without unspec. --- gcc/config/i386/avx512fintrin.h | 7 +++ 1 file changed, 7 insertions(+) diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h index 0a43b1e..e0e74cf 100644 --- a/gcc/config/i386/avx512fintrin.h +++ b/gcc/config/i386/avx512fintrin.h @@ -14826,6 +14826,13 @@ _mm_maskz_fnmsub_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B) _MM_FROUND_CUR_DIRECTION); } +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_kmov (__mmask16 __A) +{ + return __A; +} + #ifdef __DISABLE_AVX512F__ #undef __DISABLE_AVX512F__ #pragma GCC pop_options -- 1.8.3.1
Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622)
Jakub Jelinek ja...@redhat.com wrote: On Tue, Dec 31, 2013 at 08:39:02AM +0100, Richard Biener wrote: That said, fold_stmt callers have to be prepared to handle say a normal call becoming a noreturn call, consider say: struct A { virtual int foo (); }; struct B : public A { int foo () __attribute__((noreturn)); }; int B::foo () { __builtin_exit (0); } int bar () { B b; B *p = &b; return p->foo (); } Is that a valid specialization though? I think so, after all don't we set the noreturn attribute automatically even if it is missing when IPA can prove the function never returns? If we fold_stmt after that, the above testcase even without an explicit noreturn attribute would need cfg cleanup. Perhaps gimple_fold_call should punt and not change fndecl if !inplace if some call flags have changed that would require cfg cleanup, making at least fold_stmt_inplace callers not have to deal with it, and make sure fold_stmt callers schedule cleanup_cfg when fold_stmt returns true? It would be nice to audit callers and finally document what callers are required to do ... I can look at this next week. I agree that the in-place variant should avoid this kind of side-effect. Meanwhile your patch is ok. Thanks, Richard. Jakub
Re: [PATCH i386 9/8] [AVX512] Add forgotten kmovw insn, built-in and test.
On Tue, Dec 31, 2013 at 11:22 AM, Ilya Tocar tocarip.in...@gmail.com wrote: RA figured out that operation with general registers results in less moves (you already have x1 in general reg). This is exactly the reason why I think unspecs are not needed. It is the job of the compiler to choose the most appropriate approach, and its behavior should be adjusted with appropriate cost functions. I guess that if you load x from memory, RA will allocate mask registers all the way to the add insn. I tried loading from memory and the result is the same. Without unspec this intrinsic is just return __A and is completely useless. As for RA choosing the best approach, the big concern was generating klogic for normal instructions, so the current implementation of masks is conservative and RA chooses gpr alternatives. So I think that a kmov intrinsic with unspec has its uses as a hint to the compiler. If you are against this approach, here is a version without unspec. No, this explanation sounds reasonable. I was trying to draw parallels with MMX insns, where we were able to avoid DImode MMX moves by penalizing various moves between register sets. However, apart from DImode shifts, MMX didn't have equivalent logical or arithmetic DImode operations. When _mm512_kmov is used, the user clearly wants to use mask registers and operations on mask registers. Based on these facts, I think that the lesser evil is to use UNSPEC moves. So, your original patch is OK. Thanks, Uros.
Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622)
I think so, after all don't we set the noreturn attribute automatically even if it is missing when IPA can prove the function never returns? If we fold_stmt after that, the above testcase even without an explicit noreturn attribute would need cfg cleanup. Perhaps gimple_fold_call should punt and not change fndecl if !inplace if some call flags have changed that would require cfg cleanup, making at least fold_stmt_inplace callers not have to deal with it, and make sure fold_stmt callers schedule cleanup_cfg when fold_stmt returns true? It would be nice to audit callers and finally document what callers are required to do ... I can look at this next week. I agree that the in-place variant should avoid this kind of side-effect. Yep, I think with in-place fold we really can't do this. This makes me wonder: how do we arrange for statements to be folded later if we don't fold statements that a given pass did not touch? Note that with LTO and an invalid C++ program, I think devirtualization can return a function of a different type just because the slots in virtual tables are used differently in each unit. We should not ICE here. Honza Meanwhile your patch is ok. Thanks, Richard. Jakub
Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622)
Jan Hubicka hubi...@ucw.cz wrote: I think so, after all don't we set the noreturn attribute automatically even if it is missing when IPA can prove the function never returns? If we fold_stmt after that, the above testcase even without an explicit noreturn attribute would need cfg cleanup. Perhaps gimple_fold_call should punt and not change fndecl if !inplace if some call flags have changed that would require cfg cleanup, making at least fold_stmt_inplace callers not have to deal with it, and make sure fold_stmt callers schedule cleanup_cfg when fold_stmt returns true? It would be nice to audit callers and finally document what callers are required to do ... I can look at this next week. I agree that the in-place variant should avoid this kind of side-effect. Yep, I think with in-place fold we really can't do this. This makes me wonder: how do we arrange for statements to be folded later if we don't fold statements that a given pass did not touch? We don't fold all statements, that is a general issue. I have old patches that ensure we at least fold all statements after gimplifying. Propagators should fold what they propagate into. That leaves us with lto, which could fold each statement after reading it. I've not done this to avoid differences with regard to the gimplification issue. Richard. Note that with LTO and an invalid C++ program, I think devirtualization can return a function of a different type just because the slots in virtual tables are used differently in each unit. We should not ICE here. Honza Meanwhile your patch is ok. Thanks, Richard. Jakub
Re: RFA: Use precision in get_mode_bounds()
Richard Biener richard.guent...@gmail.com writes: Nick Clifton ni...@redhat.com wrote: Hi Guys, I have tracked down a bug reported against the MSP430 (PR target/59613) which turns out to be a generic problem. The function get_mode_bounds() in stor-layout.c computes the minimum and maximum values for a given mode, but it uses the bitsize of the mode, not the precision. This is incorrect if the two values are different (which can happen when the target is using partial integer modes), since values outside of the precision cannot be stored in the mode, no matter how many bits the mode uses. The result, for the MSP430 in LARGE mode, is that calling get_mode_bounds() on PSImode returns a maximum value of -1 instead of 1 << 20. Note - what happens is that get_mode_bounds computes the maximum as (1 << 31) - 1 and then calls gen_int_mode to create an RTX for this value. But gen_int_mode calls trunc_int_for_mode, which uses GET_MODE_PRECISION to determine the sign bits, and these bits overwrite some of the zero bits in (1 << 31) - 1, changing it into -1. The result of this behaviour is the bug that I was tracking down: simplify_const_relational_operation() was erroneously discarding a comparison of a pointer against zero, because get_mode_bounds told it that the pointer could only have values between 0 and -1... The proposed patch is simple - use the mode's precision and not its bitsize to compute the bounds. This seems like an obvious fix, but I am not familiar with all of the uses of get_mode_bounds() so maybe there is a situation where using the bitsize is correct. There were no regressions and several fixed test cases with the msp430-elf toolchain that I was using, and no regressions with an i686-pc-linux-gnu toolchain. OK to apply ? Ok without the comment (this is obvious) This completely breaks ia64, see PR59649. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
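The truncation Nick describes can be sketched in plain C. These are hypothetical helpers, not GCC's actual `trunc_int_for_mode`/`get_mode_bounds` code, and they assume 64-bit two's-complement host arithmetic with an arithmetic right shift (as GCC targets provide):

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend VALUE from PREC bits -- the canonical form of a CONST_INT,
   roughly what trunc_int_for_mode does.  Assumes arithmetic right shift. */
static int64_t
trunc_for_prec (int64_t value, int prec)
{
  int shift = 64 - prec;
  return (int64_t) ((uint64_t) value << shift) >> shift;
}

/* Maximum of a signed mode computed from a raw bit count, the way the
   unfixed get_mode_bounds used the mode's bitsize. */
static int64_t
max_from_bits (int bits)
{
  return (1LL << (bits - 1)) - 1;
}
```

For a partial mode with bitsize 32 but precision 20, `max_from_bits (32)` is `(1 << 31) - 1`; sign-truncating that to 20 bits of precision turns every low bit on and the sign bit with them, yielding -1 — exactly the bogus upper bound Nick saw. A bound computed from the precision survives the truncation unchanged.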
Re: [C++,doc] vector conditional expression
On Mon, 2 Dec 2013, Gerald Pfeifer wrote: On Mon, 2 Dec 2013, Marc Glisse wrote: Index: doc/extend.texi === +In C++, the ternary operator @code{?:} is available. @code{a?b:c}, where +@code{b} and @code{c} are vectors of the same type and @code{a} is an +integer vector of the same size and number of elements as @code{b} and +@code{c} Why same size and number of elements in the above? What is the difference between these two? (on x86_64) A vector of 4 int and a vector of 4 long have the same number of elements but not the same size. A vector of 8 int and a vector of 4 long have the same size but not the same number of elements. For semantics, we want the same number of elements. To match the hardware, we want the same size. Ah, so it was good I asked. :-) Thanks for your explanation. It seems the way this is intended is integer vector of the (same size and number of elements) as whereas I parsed it as (integer vector of the same size) and (number of elements) as hence wondering what the difference between the size of the vector and the number of elements was. I think you had parsed it ok. In code terms: size: sizeof(vec) number of elements: sizeof(vec)/sizeof(vec[0]) Now when the number of elements is fixed, saying that vectors have the same (total) size or that they have elements of the same size is equivalent, so either interpretation is fine. Rephrasing this as the same number and size of elements as or better the same number of elements of the same size as may help avoid this. Ok. Like this then? +In C++, the ternary operator @code{?:} is available. @code{a?b:c}, where +@code{b} and @code{c} are vectors of the same type and @code{a} is an +integer vector with the same number of elements of the same size as @code{b} +and @code{c}, computes all three arguments and creates a vector +@code{@{a[0]?b[0]:c[0], a[1]?b[1]:c[1], @dots{}@}}. Note that unlike in +OpenCL, @code{a} is thus interpreted as @code{a != 0} and not @code{a < 0}.
+As in the case of binary operations, this syntax is also accepted when +one of @code{b} or @code{c} is a scalar that is then transformed into a +vector. If both @code{b} and @code{c} are scalars and the type of +@code{true?b:c} has the same size as the element type of @code{a}, then +@code{b} and @code{c} are converted to a vector type whose elements have +this type and with the same number of elements as @code{a}. (though arguably one could parse this to mean that the elements of a have the same size as the whole vector b, but I am fine with ignoring this) -- Marc Glisse
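The semantics being documented can be emulated element by element in scalar C. This is an illustration with made-up names, not the extension itself; it shows the `a != 0` test that distinguishes GCC's vector `?:` from OpenCL's sign-bit (`a < 0`) select:

```c
#include <assert.h>

/* Scalar emulation of the documented vector ?: semantics:
   r[i] = a[i] ? b[i] : c[i], i.e. each a[i] is tested as a[i] != 0. */
static void
vec_cond (const int *a, const int *b, const int *c, int *r, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = a[i] != 0 ? b[i] : c[i];
}
```

With `a = {1, 0, -1, 2}` every nonzero lane selects from `b`, including the positive lane `2` — under OpenCL's `a < 0` rule that lane would select from `c` instead.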
Re: [C,C++] integer constants in attribute arguments
Ping http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03822.html On Sat, 30 Nov 2013, Marc Glisse wrote: Hello, we currently reject: constexpr int s = 32; typedef double VEC __attribute__ ((__vector_size__ (s))); and similarly for other attributes, while we accept s+0 or (int)s, etc. The code is basically copied from the constructor attribute. The C front-end is much less forgiving than the C++ one, so we need to protect the call to default_conversion (as in PR c/59280), and for some reason one of the attributes can see a FUNCTION_DECL where others see an IDENTIFIER_NODE, I didn't try to understand why and just added that check to the code. Bootstrap and testsuite on x86_64-linux-gnu. 2013-11-30 Marc Glisse marc.gli...@inria.fr PR c++/53017 PR c++/59211 gcc/c-family/ * c-common.c (handle_aligned_attribute, handle_alloc_size_attribute, handle_vector_size_attribute, handle_nonnull_attribute): Call default_conversion on the attribute argument. gcc/cp/ * tree.c (handle_init_priority_attribute): Likewise. gcc/ * doc/extend.texi (Function Attributes): Typo. gcc/testsuite/ * c-c++-common/attributes-1.c: New testcase. * g++.dg/cpp0x/constexpr-attribute2.C: Likewise. -- Marc Glisse
[C++] PR59641: error recovery in vector condition
Hello, here is a simple patch for error recovery. We already check the arguments earlier, but force_rvalue can replace them with errors. Bootstrap+testsuite on x86_64-unknown-linux-gnu. 2014-01-01 Marc Glisse marc.gli...@inria.fr PR c++/59641 gcc/cp/ * call.c (build_conditional_expr_1): Check the return value of force_rvalue. gcc/testsuite/ * g++.dg/cpp0x/pr59641.C: New file. -- Marc Glisse Index: gcc/cp/call.c === --- gcc/cp/call.c (revision 206265) +++ gcc/cp/call.c (working copy) @@ -4395,20 +4395,26 @@ build_conditional_expr_1 (location_t loc orig_arg2 = arg2; orig_arg3 = arg3; if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (arg1))) { arg1 = force_rvalue (arg1, complain); arg2 = force_rvalue (arg2, complain); arg3 = force_rvalue (arg3, complain); + /* force_rvalue can return error_mark on valid arguments. */ + if (error_operand_p (arg1) + || error_operand_p (arg2) + || error_operand_p (arg3)) + return error_mark_node; + tree arg1_type = TREE_TYPE (arg1); arg2_type = TREE_TYPE (arg2); arg3_type = TREE_TYPE (arg3); if (TREE_CODE (arg2_type) != VECTOR_TYPE && TREE_CODE (arg3_type) != VECTOR_TYPE) { /* Rely on the error messages of the scalar version. */ tree scal = build_conditional_expr_1 (loc, integer_one_node, orig_arg2, orig_arg3, complain); Index: gcc/testsuite/g++.dg/cpp0x/pr59641.C === --- gcc/testsuite/g++.dg/cpp0x/pr59641.C (revision 0) +++ gcc/testsuite/g++.dg/cpp0x/pr59641.C (working copy) @@ -0,0 +1,8 @@ +// { dg-options "-std=gnu++11" } +typedef int T __attribute__((vector_size(2*sizeof(int)))); + +void foo(T &r, const T &a, const T &b) +{ + constexpr T c = a < b; // { dg-error "constant" } + r = c ? a : b; +} Property changes on: gcc/testsuite/g++.dg/cpp0x/pr59641.C ___ Added: svn:keywords ## -0,0 +1 ## +Author Date Id Revision URL \ No newline at end of property Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property
Re: [PATCH] Vectorization for store with negative step
Bingfeng Mei wrote: Hi, I created PR59544 and here is the patch. OK to commit? Thanks, Bingfeng 2013-12-18 Bingfeng Mei b...@broadcom.com PR tree-optimization/59544 * tree-vect-stmts.c (perm_mask_for_reverse): Move before vectorizable_store. (vectorizable_store): Handle negative step. 2013-12-18 Bingfeng Mei b...@broadcom.com PR tree-optimization/59544 * gcc.target/i386/pr59544.c: New test Hi Bingfeng, Your patch seems to have a dependence calculation bug (I think) due to which gcc.dg/torture/pr52943.c regresses on aarch64. I've raised http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59651. Do you think you could have a look? Thanks, Tejas. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Richard Biener Sent: 18 December 2013 11:47 To: Bingfeng Mei Cc: gcc-patches@gcc.gnu.org Subject: Re: Vectorization for store with negative step On Wed, Dec 18, 2013 at 12:34 PM, Bingfeng Mei b...@broadcom.com wrote: Thanks, Richard. I will file a bug report and prepare a complete patch. For the perm_mask_for_reverse function, should I move it before vectorizable_store or add a declaration? Move it. Richard. Bingfeng -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: 18 December 2013 11:26 To: Bingfeng Mei Cc: gcc-patches@gcc.gnu.org Subject: Re: Vectorization for store with negative step On Mon, Dec 16, 2013 at 5:54 PM, Bingfeng Mei b...@broadcom.com wrote: Hi, I was looking at some loops that can be vectorized by LLVM, but not GCC. One type of loop is with store of negative step. void test1(short * __restrict__ x, short * __restrict__ y, short * __restrict__ z) { int i; for (i=127; i>=0; i--) { x[i] = y[127-i] + z[127-i]; } } I don't know why GCC only implements negative step for load, but not store. I implemented a patch, very similar to code in vectorizable_load.
~/scratch/install-x86/bin/gcc ghs-dec.c -ftree-vectorize -S -O2 -mavx Without patch: test1: .LFB0: addq$254, %rdi xorl%eax, %eax .p2align 4,,10 .p2align 3 .L2: movzwl (%rsi,%rax), %ecx subq$2, %rdi addw(%rdx,%rax), %cx addq$2, %rax movw%cx, 2(%rdi) cmpq$256, %rax jne .L2 rep; ret With patch: test1: .LFB0: vmovdqa .LC0(%rip), %xmm1 xorl%eax, %eax .p2align 4,,10 .p2align 3 .L2: vmovdqu (%rsi,%rax), %xmm0 movq%rax, %rcx negq%rcx vpaddw (%rdx,%rax), %xmm0, %xmm0 vpshufb %xmm1, %xmm0, %xmm0 addq$16, %rax cmpq$256, %rax vmovups %xmm0, 240(%rdi,%rcx) jne .L2 rep; ret Performance is definitely improved here. It is bootstrapped for x86_64-unknown-linux-gnu, and has no additional regressions on my machine. For reference, LLVM seems to use different instructions and slightly worse code. I am not so familiar with x86 assemble code. The patch is originally for our private port. test1: # @test1 .cfi_startproc # BB#0: # %entry addq$240, %rdi xorl%eax, %eax .align 16, 0x90 .LBB0_1:# %vector.body # =This Inner Loop Header: Depth=1 movdqu (%rsi,%rax,2), %xmm0 movdqu (%rdx,%rax,2), %xmm1 paddw %xmm0, %xmm1 shufpd $1, %xmm1, %xmm1# xmm1 = xmm1[1,0] pshuflw $27, %xmm1, %xmm0 # xmm0 = xmm1[3,2,1,0,4,5,6,7] pshufhw $27, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2,3,7,6,5,4] movdqu %xmm0, (%rdi) addq$8, %rax addq$-16, %rdi cmpq$128, %rax jne .LBB0_1 # BB#2: # %for.end ret Any comment? Looks good to me. One of the various TODOs in vectorizable_store I presume. Needs a testcase and at this stage a bugreport that is fixed by it. Thanks, Richard. Bingfeng Mei Broadcom UK
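For reference, the loop under discussion can be run standalone; the vectorized version loads `y`/`z` forward, permutes (reverses) the vector, and stores it at the descending addresses of `x`, so both versions must produce identical results. The test values below are mine, not from the thread:

```c
#include <assert.h>

/* The negative-step store loop from the thread. */
static void
test1 (short *restrict x, const short *restrict y, const short *restrict z)
{
  for (int i = 127; i >= 0; i--)
    x[i] = y[127 - i] + z[127 - i];
}
```

With `y[i] = i` and `z[i] = 2 * i`, each `x[i]` must equal `3 * (127 - i)` regardless of whether the store was vectorized.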
[PATCH] Tiny predcom improvement (PR tree-optimization/59643)
Hi! As written in the PR, I've been looking at why llvm 3.[34] is so much faster on the Scimark2 SOR benchmark, and the reason is that its predictive commoning or whatever it uses doesn't give up on the inner loop, while our predcom unnecessarily gives up, because there are reads that could alias the write. This simple patch improves the benchmark by 42%. We already ignore unsuitable dependencies for read/read, the patch extends that for unsuitable dependencies for read/write by just putting the read (and anything in its component) into the bad component which is ignored. pcom doesn't optimize away the writes and will keep the potentially aliasing reads unmodified as well. Without the patch we'd merge the two components, and as !determine_offset between the two DRs, it would mean the whole merged component would be always unsuitable and thus ignored. With the patch we'll hopefully have some other reads with known offset to the write and can optimize that, so the patch should always either handle what it did before or handle perhaps some more cases. The inner loop from the (public domain) benchmark is added in the two tests, one runtime test and one test looking whether pcom actually optimized it. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-12-31 Jakub Jelinek ja...@redhat.com PR tree-optimization/59643 * tree-predcom.c (split_data_refs_to_components): If one dr is read and one write, determine_offset fails and the write isn't in the bad component, just put the read into the bad component. * gcc.dg/pr59643.c: New test. * gcc.c-torture/execute/pr59643.c: New test. --- gcc/tree-predcom.c.jj 2013-12-31 12:50:47.169748385 +0100 +++ gcc/tree-predcom.c 2013-12-31 15:39:44.422297404 +0100 @@ -772,10 +772,37 @@ split_data_refs_to_components (struct lo bad = component_of (comp_father, n); /* If both A and B are reads, we may ignore unsuitable dependences. 
*/ - if (DR_IS_READ (dra) && DR_IS_READ (drb) - && (ia == bad || ib == bad - || !determine_offset (dra, drb, &dummy_off))) - continue; + if (DR_IS_READ (dra) && DR_IS_READ (drb)) + { + if (ia == bad || ib == bad + || !determine_offset (dra, drb, &dummy_off)) + continue; + } + /* If A is read and B write or vice versa and there is unsuitable +dependence, instead of merging both components into a component +that will certainly not pass suitable_component_p, just put the +read into bad component, perhaps at least the write together with +all the other data refs in its component will be optimizable. */ + else if (DR_IS_READ (dra) && ib != bad) + { + if (ia == bad) + continue; + else if (!determine_offset (dra, drb, &dummy_off)) + { + merge_comps (comp_father, comp_size, bad, ia); + continue; + } + } + else if (DR_IS_READ (drb) && ia != bad) + { + if (ib == bad) + continue; + else if (!determine_offset (dra, drb, &dummy_off)) + { + merge_comps (comp_father, comp_size, bad, ib); + continue; + } + } merge_comps (comp_father, comp_size, ia, ib); } --- gcc/testsuite/gcc.dg/pr59643.c.jj 2013-12-31 15:34:48.584810067 +0100 +++ gcc/testsuite/gcc.dg/pr59643.c 2013-12-31 15:34:48.584810067 +0100 @@ -0,0 +1,15 @@ +/* PR tree-optimization/59643 */ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-pcom-details" } */ + +void +foo (double *a, double *b, double *c, double d, double e, int n) +{ + int i; + for (i = 1; i < n - 1; i++) +a[i] = d * (b[i] + c[i] + a[i - 1] + a[i + 1]) + e * a[i]; +} + +/* { dg-final { scan-tree-dump-times "Before commoning:" 1 "pcom" } } */ +/* { dg-final { scan-tree-dump-times "Unrolling 2 times" 1 "pcom" } } */ +/* { dg-final { cleanup-tree-dump "pcom" } } */ --- gcc/testsuite/gcc.c-torture/execute/pr59643.c.jj 2013-12-31 15:34:48.584810067 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr59643.c 2013-12-31 15:34:48.584810067 +0100 @@ -0,0 +1,39 @@ +/* PR tree-optimization/59643 */ + +#define N 32 + +__attribute__((noinline, noclone)) void +foo (double *a, double *b, double *c, 
double d, double e, int n) +{ + int i; + for (i = 1; i < n - 1; i++) +a[i] = d * (b[i] + c[i] + a[i - 1] + a[i + 1]) + e * a[i]; +} + +double expected[] = { + 0.0, 10.0, 44.0, 110.0, 232.0, 490.0, 1020.0, 2078.0, 4152.0, 8314.0, + 16652.0, 33326.0, 4.0, 133354.0, 266748.0, 533534.0, 1067064.0, + 2134138.0, 4268300.0, 8536622.0, 17073256.0, 34146538.0, 68293116.0, + 136586270.0, 273172536.0, 546345082.0, 1092690188.0, 2185380398.0, + 4370760808.0, 8741521642.0, 17483043324.0, 6.0 +}; + +int +main () +{ + int i; + double a[N], b[N], c[N]; + if (__DBL_MANT_DIG__ <= 35) +return 0; + for (i = 0; i < N; i++) +{ + a[i] = (i & 3) * 2.0; + b[i]
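The reuse the patch lets predcom exploit can be shown with a hand-commoned scalar version of the SOR-style loop. This is an editorial sketch (names mine): the value stored to `a[i]` is exactly the `a[i - 1]` read on the next iteration, so it can be carried in a scalar instead of being reloaded — which is what the pass does once the aliasing reads no longer force it to give up.

```c
#include <assert.h>

/* The inner loop from the PR, as written. */
static void
sor_naive (double *a, const double *b, const double *c,
           double d, double e, int n)
{
  for (int i = 1; i < n - 1; i++)
    a[i] = d * (b[i] + c[i] + a[i - 1] + a[i + 1]) + e * a[i];
}

/* Manually "commoned" version: a[i - 1] carried in a scalar. */
static void
sor_commoned (double *a, const double *b, const double *c,
              double d, double e, int n)
{
  double prev = a[0];           /* holds a[i - 1] across iterations */
  for (int i = 1; i < n - 1; i++)
    {
      double cur = d * (b[i] + c[i] + prev + a[i + 1]) + e * a[i];
      a[i] = cur;
      prev = cur;
    }
}
```

Both perform the same floating-point operations in the same order, so the results are bit-identical; the commoned form just trades a reload for a register copy.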
[PATCH] CSE fix for UNSIGNED_FLOAT (PR rtl-optimization/59647)
Hi! This patch fixes a 4.8 ICE (on trunk the bug went latent by GIMPLE optimization changes). The problem is we have a REG_EQUAL (unsigned_float:DF (reg:SI ...)) and cse_process_notes_1 attempts to replace the reg with a CONST_INT that has the topmost bit set. As CONST_INTs are canonicalized by sign extending them, simplify_unary* ICEs on it because it doesn't know the original mode of the argument and thus whether the e.g. -1 in the CONST_INT is 0xff, 0xffff, 0xffffffff, or 0xffffffffffffffffULL. cse_process_notes_1 already punts similarly on SIGN_EXTEND/ZERO_EXTEND/SUBREG; this just handles UNSIGNED_FLOAT similarly, but allows unsigned values that aren't negative when canonicalized. One doesn't need the mode in that case: if the CONST_INT was valid for the original mode, then it shouldn't have any bits set beyond its precision. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8? 2013-12-31 Jakub Jelinek ja...@redhat.com PR rtl-optimization/59647 * cse.c (cse_process_notes_1): Don't substitute negative VOIDmode new_rtx into UNSIGNED_FLOAT rtxes. * g++.dg/opt/pr59647.C: New test. --- gcc/cse.c.jj 2013-03-16 08:30:37.0 +0100 +++ gcc/cse.c 2013-12-31 15:31:16.469885722 +0100 @@ -6082,6 +6082,18 @@ cse_process_notes_1 (rtx x, rtx object, return x; } +case UNSIGNED_FLOAT: + { + rtx new_rtx = cse_process_notes (XEXP (x, 0), object, changed); + /* We don't substitute negative VOIDmode constants into these rtx, + since they would impede folding. 
*/ + if (GET_MODE (new_rtx) != VOIDmode + || (CONST_INT_P (new_rtx) INTVAL (new_rtx) = 0) + || (CONST_DOUBLE_P (new_rtx) CONST_DOUBLE_HIGH (new_rtx) = 0)) + validate_change (object, XEXP (x, 0), new_rtx, 0); + return x; + } + case REG: i = REG_QTY (REGNO (x)); --- gcc/testsuite/g++.dg/opt/pr59647.C.jj 2013-12-31 15:16:16.259454995 +0100 +++ gcc/testsuite/g++.dg/opt/pr59647.C 2013-12-31 15:15:51.0 +0100 @@ -0,0 +1,32 @@ +// PR rtl-optimization/59647 +// { dg-do compile } +// { dg-options -O2 -fno-tree-vrp } +// { dg-additional-options -msse2 -mfpmath=sse { target { { i?86-*-* x86_64-*-* } ia32 } } } + +void f1 (int); +void f2 (); +double f3 (int); + +struct A +{ + int f4 () const + { +if (a == 0) + return 1; +return 0; + } + unsigned f5 () + { +if (!f4 ()) + f2 (); +return a; + } + int a; +}; + +void +f6 (A *x) +{ + unsigned b = x-f5 (); + f1 (b - 1 - f3 (x-f5 () - 1U)); +} Jakub
Re: [PATCH] CSE fix for UNSIGNED_FLOAT (PR rtl-optimization/59647)
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8? 2013-12-31 Jakub Jelinek ja...@redhat.com PR rtl-optimization/59647 * cse.c (cse_process_notes_1): Don't substitute negative VOIDmode new_rtx into UNSIGNED_FLOAT rtxes. * g++.dg/opt/pr59647.C: New test. OK, thanks. -- Eric Botcazou
Re: [PATCH] Tiny predcom improvement (PR tree-optimization/59643)
On 12/31/2013, 2:04 PM, Jakub Jelinek wrote: Hi! As written in the PR, I've been looking at why llvm 3.[34] is so much faster on the Scimark2 SOR benchmark, and the reason is that its predictive commoning or whatever it uses doesn't give up on the inner loop, while our predcom unnecessarily gives up, because there are reads that could alias the write. This simple patch improves the benchmark by 42%. We already ignore unsuitable dependencies for read/read, the patch extends that for unsuitable dependencies for read/write by just putting the read (and anything in its component) into the bad component which is ignored. pcom doesn't optimize away the writes and will keep the potentially aliasing reads unmodified as well. Without the patch we'd merge the two components, and as !determine_offset between the two DRs, it would mean the whole merged component would be always unsuitable and thus ignored. With the patch we'll hopefully have some other reads with known offset to the write and can optimize that, so the patch should always either handle what it did before or handle perhaps some more cases. Congratulations, Jakub! Scimark2 is always used by Phoronix to show how bad GCC is in comparison with LLVM. It is understandable. For some reason Phoronix is very biased toward LLVM and, I'd say, a marketing machine for LLVM. They use very small selected benchmarks. Benchmarking is evil, but I believe more in SPEC2000/SPEC2006, which use bigger real programs. With their approach, they could use whetstone instead of Scimark, but the results would be very bad for LLVM (e.g. 64-bit code for -O3 on Haswell): dfa: MWIPS 3705.311 -- LLVM 3.3 nor: MWIPS 4587.555 -- GCC 4.8 It would be nice to fix also the Scimark2 LU GCC-4.8 degradation to finally stop this nonsense from Phoronix.
[buildrobot] [PATCH] Fix redefinition of BITS_PER_UNIT (was: nios2 port committed)
On Tue, 2013-12-31 15:24:52 +0800, Chung-Lin Tang clt...@codesourcery.com wrote: The nios2 port was just committed. Thanks to all that gave time and effort to review this. Just a heads-up: I see a lot of warnings about BITS_PER_UNIT being redefined, see e.g. http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=74923 as an example. 2013-12-31 Jan-Benedict Glaw jbg...@lug-owl.de * config/nios2/nios2.h (BITS_PER_UNIT): Don't define it. diff --git a/gcc/config/nios2/nios2.h b/gcc/config/nios2/nios2.h index 8e6941b..f333be3 100644 --- a/gcc/config/nios2/nios2.h +++ b/gcc/config/nios2/nios2.h @@ -73,7 +73,6 @@ #define BITS_BIG_ENDIAN 0 #define BYTES_BIG_ENDIAN (TARGET_BIG_ENDIAN != 0) #define WORDS_BIG_ENDIAN (TARGET_BIG_ENDIAN != 0) -#define BITS_PER_UNIT 8 #define BITS_PER_WORD 32 #define UNITS_PER_WORD 4 #define POINTER_SIZE 32 Ok? MfG, JBG -- Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481 Signature of: They that give up essential liberty to obtain temporary safety, the second : deserve neither liberty nor safety. (Ben Franklin)
[PATCH] Fix up --enable-checking=rtl SSE/AVX regressions
Hi! In --enable-checking=rtl bootstrap most of the SSE/AVX tests fail, because *mov<mode>_internal is using REGNO on operands that are e.g. MEM or CONST_INT etc. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk as obvious. 2014-01-01 Jakub Jelinek ja...@redhat.com * config/i386/sse.md (*mov<mode>_internal): Guard EXT_REX_SSE_REGNO_P (REGNO ()) uses with REG_P. --- gcc/config/i386/sse.md.jj 2013-12-31 13:52:45.0 +0100 +++ gcc/config/i386/sse.md 2013-12-31 20:13:32.673520479 +0100 @@ -670,8 +670,10 @@ (define_insn "*mov<mode>_internal" in avx512f, so we need to use workarounds, to access sse registers 16-31, which are evex-only. */ if (TARGET_AVX512F && GET_MODE_SIZE (<MODE>mode) < 64 - && (EXT_REX_SSE_REGNO_P (REGNO (operands[0])) - || EXT_REX_SSE_REGNO_P (REGNO (operands[1])))) + && ((REG_P (operands[0]) + && EXT_REX_SSE_REGNO_P (REGNO (operands[0]))) + || (REG_P (operands[1]) + && EXT_REX_SSE_REGNO_P (REGNO (operands[1]))))) { if (memory_operand (operands[0], <MODE>mode)) { Jakub
[PATCH] Fix check_effective_target_avx512f
Hi!

I've noticed that all abi-avx512f.exp tests fail when the assembler doesn't support avx512f. The problem is that the builtin has been optimized away as unused, and thus the effective-target check passed even with an assembler that doesn't support EVEX. Fixed thusly, tested on x86_64-linux, committed to trunk as obvious.

Happy New Year to everyone!

2014-01-01  Jakub Jelinek  ja...@redhat.com

	* lib/target-supports.exp (check_effective_target_avx512f): Make
	sure the builtin isn't optimized away as unused.

--- gcc/testsuite/lib/target-supports.exp.jj	2013-12-31 12:51:09.0 +0100
+++ gcc/testsuite/lib/target-supports.exp	2014-01-01 01:00:28.086093259 +0100
@@ -5176,9 +5176,9 @@ proc check_effective_target_avx512f { }
     return [check_no_compiler_messages avx512f object {
	typedef double __m512d __attribute__ ((__vector_size__ (64)));

-	void _mm512_add (__m512d a)
+	__m512d _mm512_add (__m512d a)
	{
-	    __builtin_ia32_addpd512_mask (a, a, a, 1, 4);
+	    return __builtin_ia32_addpd512_mask (a, a, a, 1, 4);
	}
     } "-O2 -mavx512f" ]
 }

	Jakub
pch bug fix
In testing for wide-int, we discovered that someone seems to have blown pch. The problem is that the optabs field is not a string: its size is not determined by strlen, yet strlen is the semantics gty attaches to unsigned char * data. By changing the type of optabs to any other type, the length is instead determined by the ggc subsystem, which knows the actual length of the data (2K on my system).

I don't really like the type lying going on; I'd prefer if a tree-core person would get honest with the types. Another style of fix would be to decide that atomic in this case means something else, but then we'd have to have a gty person implement the new semantics. Until that is done, this problem is bad enough that the below should go in until someone can fix gty, if someone doesn't want to be honest with the type system.

This bug comes across as a random memory smasher and is incredibly annoying to track down. There is non-determinism in the process, and it causes non-deterministic memory crashes, but only on pch use. To track down why, you have to trace back through the pch writing process and realize that the data in memory is completely valid; it is only the meta information about that data that pch uses from the GTY world that causes any problem.

Ok?

Index: optabs.c
===================================================================
--- optabs.c	(revision 206086)
+++ optabs.c	(working copy)
@@ -6242,7 +6242,7 @@
   /* If the optabs changed, record it.  */
   if (memcmp (tmp_optabs, this_target_optabs,
	      sizeof (struct target_optabs)))
-    TREE_OPTIMIZATION_OPTABS (optnode) = (unsigned char *) tmp_optabs;
+    TREE_OPTIMIZATION_OPTABS (optnode) = (struct target_optabs_S *) tmp_optabs;
   else
     {
       TREE_OPTIMIZATION_OPTABS (optnode) = NULL;
Index: tree-core.h
===================================================================
--- tree-core.h	(revision 206086)
+++ tree-core.h	(working copy)
@@ -1559,6 +1559,9 @@
   struct tree_statement_list_node *tail;
 };

+struct GTY(()) target_optabs_S {
+  unsigned char e[1];
+};
+
 /* Optimization options used by a function.  */
@@ -1570,7 +1573,7 @@
   /* Target optabs for this set of optimization options.  This is of
      type `struct target_optabs *'.  */
-  unsigned char *GTY ((atomic)) optabs;
+  struct target_optabs_S *GTY((atomic)) optabs;

   /* The value of this_target_optabs against which the optabs above
      were generated.  */
Re: [buildrobot] [PATCH] Fix redefinition of BITS_PER_UNIT (was: nios2 port committed)
On Dec 31, 2013, at 12:26 PM, Jan-Benedict Glaw jbg...@lug-owl.de wrote:

> On Tue, 2013-12-31 15:24:52 +0800, Chung-Lin Tang clt...@codesourcery.com wrote:
>> The nios2 port was just committed. Thanks to all that gave time and effort to review this.
>
> Just a heads-up: I see a lot of warnings about BITS_PER_UNIT being redefined, see e.g. http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=74923 as an example.
>
> 2013-12-31  Jan-Benedict Glaw  jbg...@lug-owl.de
>
> 	* config/nios2/nios2.h (BITS_PER_UNIT): Don't define it.
>
> diff --git a/gcc/config/nios2/nios2.h b/gcc/config/nios2/nios2.h
> index 8e6941b..f333be3 100644
> --- a/gcc/config/nios2/nios2.h
> +++ b/gcc/config/nios2/nios2.h
> @@ -73,7 +73,6 @@
>  #define BITS_BIG_ENDIAN 0
>  #define BYTES_BIG_ENDIAN (TARGET_BIG_ENDIAN != 0)
>  #define WORDS_BIG_ENDIAN (TARGET_BIG_ENDIAN != 0)
> -#define BITS_PER_UNIT 8
>  #define BITS_PER_WORD 32
>  #define UNITS_PER_WORD 4
>  #define POINTER_SIZE 32
>
> Ok?

Ok.
Re: pch bug fix
On Tue, Dec 31, 2013 at 10:39:58PM -0800, Mike Stump wrote:
> In testing for wide-int, we discovered that someone seems to have blown
> pch. The problem is that the optabs field is not a string: its size is
> not determined by strlen, yet strlen is the semantics gty attaches to
> unsigned char * data. By changing the type of optabs to any other type,
> the length is instead determined by the ggc subsystem, which knows the
> actual length of the data (2K on my system).
>
> I don't really like the type lying going on; I'd prefer if a tree-core
> person would get honest with the types. Another style of fix would be
> to decide that atomic in this case means something else, but then we'd
> have to have a gty person implement the new semantics. Until that is
> done, this problem is bad enough that the below should go in until
> someone can fix gty, if someone doesn't want to be honest with the type
> system.
>
> This bug comes across as a random memory smasher and is incredibly
> annoying to track down. There is non-determinism in the process, and it
> causes non-deterministic memory crashes, but only on pch use. To track
> down why, you have to trace back through the pch writing process and
> realize that the data in memory is completely valid; it is only the
> meta information about that data that pch uses from the GTY world that
> causes any problem.

Thanks for tracking this down; this sounds like PR59436. How did you manage to track it down? I also wonder why it doesn't seem to affect 4.8, which has the same change.

Based on the comments in gengtype.c, I'd expect

  unsigned char *GTY ((atomic, length (sizeof (struct target_optabs)))) optabs;

to work (or, perhaps instead of sizeof, directly call a function that is defined in optabs.c and returns that value; perhaps optabs.h isn't included where it should be), but unfortunately it seems to be rejected right now. So, the question is how hard it would be to support that.

	Jakub