Re: [PATCH] Power/GCC: Subword atomic operation endianness check bug fix
On Wed, 2 Jul 2014, David Edelsohn wrote: * config/rs6000/rs6000.c (rs6000_adjust_atomic_subword): Use BYTES_BIG_ENDIAN rather than WORDS_BIG_ENDIAN to check for byte endianness. This patch is okay. Thanks for noticing it. Committed, thanks. Maciej
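Why the byte order (rather than the word order) is the relevant property for subword atomics can be seen in a small sketch. This is not the rs6000 code; `subword_shift` and `extract_subword` are hypothetical helpers, and the 32-bit container word is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from rs6000.c: bit position of a
   SIZE-byte subword (SIZE is 1 or 2) stored at byte OFFSET inside an
   aligned 32-bit word.  Only the byte order decides where the subword
   sits inside the word, which is why a byte-endianness test is the
   natural one here.  */
static unsigned
subword_shift (unsigned offset, unsigned size, int bytes_big_endian)
{
  return (bytes_big_endian ? 4 - size - offset : offset) * 8;
}

/* Extract the subword from WORD using the shift computed above.  */
static uint32_t
extract_subword (uint32_t word, unsigned offset, unsigned size,
                 int bytes_big_endian)
{
  uint32_t mask = size == 1 ? 0xffu : 0xffffu;
  return (word >> subword_shift (offset, size, bytes_big_endian)) & mask;
}
```

For the word 0x11223344, the byte at offset 0 is 0x11 when bytes are big-endian and 0x44 when they are little-endian.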
[PATCH] Fix simplify_comparison in the combiner (PR rtl-optimization/61673)
Hi!

The following testcase is miscompiled on s390-linux (31-bit). r202393 changed:

@@ -11946,11 +11949,11 @@
 	  if (op1 == const0_rtx && (code == LT || code == GE)
 	      && HWI_COMPUTABLE_MODE_P (mode))
 	    {
+	      unsigned HOST_WIDE_INT sign
+		= (unsigned HOST_WIDE_INT) 1 << (GET_MODE_BITSIZE (mode) - 1);
 	      op0 = simplify_gen_binary (AND, tmode,
 					 gen_lowpart (tmode, op0),
-					 GEN_INT ((unsigned HOST_WIDE_INT) 1
-						  << (GET_MODE_BITSIZE (mode)
-						      - 1)));
+					 gen_int_mode (sign, mode));
 	      code = (code == LT) ? NE : EQ;
 	      break;
 	    }

This code creates AND of a paradoxical subreg where the bits above mode are undefined, so of course the mask has to check just the single sign bit rather than that bit + all bits above it. In this particular testcase, mode is QImode and tmode is SImode, previously and with my patch we were masking with 128, current 4.9 branch and trunk masks with -128.

Bootstrapped/regtested on x86_64-linux, i686-linux and s390{,x}-linux. Ok for trunk/4.9?

2014-07-03  Jakub Jelinek  ja...@redhat.com

	PR rtl-optimization/61673
	* combine.c (simplify_comparison): Test just mode's sign bit in
	tmode rather than the sign bit and any bits above it.

	* gcc.c-torture/execute/pr61673.c: New test.

--- gcc/combine.c.jj 2014-03-28 20:49:52.892077022 +0100
+++ gcc/combine.c 2014-07-02 16:56:02.260456040 +0200
@@ -11987,7 +11987,7 @@ simplify_comparison (enum rtx_code code,
 		= (unsigned HOST_WIDE_INT) 1 << (GET_MODE_BITSIZE (mode) - 1);
 	      op0 = simplify_gen_binary (AND, tmode,
 					 gen_lowpart (tmode, op0),
-					 gen_int_mode (sign, mode));
+					 gen_int_mode (sign, tmode));
 	      code = (code == LT) ? NE : EQ;
 	      break;
 	    }
--- gcc/testsuite/gcc.c-torture/execute/pr61673.c.jj 2014-07-02 17:17:01.398908630 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr61673.c 2014-07-02 17:12:36.0 +0200
@@ -0,0 +1,50 @@
+/* PR rtl-optimization/61673 */
+
+char e;
+
+__attribute__((noinline, noclone)) void
+bar (char x)
+{
+  if (x != 0x54 && x != (char) 0x87)
+    __builtin_abort ();
+}
+
+__attribute__((noinline, noclone)) void
+foo (const char *x)
+{
+  char d = x[0];
+  int c = d;
+  if ((c >= 0 && c <= 0x7f) == 0)
+    e = d;
+  bar (d);
+}
+
+__attribute__((noinline, noclone)) void
+baz (const char *x)
+{
+  char d = x[0];
+  int c = d;
+  if ((c >= 0 && c <= 0x7f) == 0)
+    e = d;
+}
+
+int
+main ()
+{
+  const char c[] = { 0x54, 0x87 };
+  e = 0x21;
+  foo (c);
+  if (e != 0x21)
+    __builtin_abort ();
+  foo (c + 1);
+  if (e != (char) 0x87)
+    __builtin_abort ();
+  e = 0x21;
+  baz (c);
+  if (e != 0x21)
+    __builtin_abort ();
+  baz (c + 1);
+  if (e != (char) 0x87)
+    __builtin_abort ();
+  return 0;
+}

Jakub
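The difference between masking with 128 and with -128 can be illustrated outside the compiler. The sketch below is only a model: `narrow_is_negative` is a hypothetical helper, and a 32-bit value with arbitrary upper bits stands in for the paradoxical SImode subreg of a QImode value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the sign test: WIDE carries an 8-bit value in
   its low byte; bits 8..31 play the role of the undefined upper bits
   of a paradoxical subreg.  Returns 1 if the test through MASK says
   the narrow value is negative.  */
static int
narrow_is_negative (uint32_t wide, uint32_t mask)
{
  return (wide & mask) != 0;
}

/* 128: only the QImode sign bit, as with gen_int_mode (sign, tmode).  */
static const uint32_t sign_bit_only = 0x00000080u;

/* -128 in 32 bits: sign bit plus every bit above it, as with
   gen_int_mode (sign, mode) when mode is QImode.  */
static const uint32_t sign_and_above = 0xffffff80u;
```

With a low byte of 0x54 and garbage in the upper bits, the single-bit mask correctly reports a non-negative value, while the wider mask is fooled by the garbage.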
Re: [PATCH] Don't run guality.exp tests with LTO_TORTURE_OPTIONS.
On July 3, 2014 1:06:30 AM CEST, Jason Merrill ja...@redhat.com wrote: I think that makes sense; I'm not aware of anyone working on improving LTO debugging. I've done that in the past. So it would be nice to verify we don't regress existing tests. Richard. Jason
Re: [PATCH] Don't run guality.exp tests with LTO_TORTURE_OPTIONS.
On July 3, 2014 7:37:13 AM CEST, Jakub Jelinek ja...@redhat.com wrote: On Wed, Jul 02, 2014 at 04:06:30PM -0700, Jason Merrill wrote: I think that makes sense; I'm not aware of anyone working on improving LTO debugging. I think at this point all we care about is that with -flto we don't ICE on those, perhaps we should arrange to change all the tests into dg-do compile with -flto and ignore all gdb-test and have some env var override which would force full testing also with -flto? I think the individual tests that currently fail can be appropriately changed, no? It would be bad to lose the lto regression testing here. Richard. Jakub
Re: [PATCH] Don't run guality.exp tests with LTO_TORTURE_OPTIONS.
On Thu, Jul 03, 2014 at 09:41:15AM +0200, Richard Biener wrote: On July 3, 2014 7:37:13 AM CEST, Jakub Jelinek ja...@redhat.com wrote: On Wed, Jul 02, 2014 at 04:06:30PM -0700, Jason Merrill wrote: I think that makes sense; I'm not aware of anyone working on improving LTO debugging. I think at this point all we care about is that with -flto we don't ICE on those, perhaps we should arrange to change all the tests into dg-do compile with -flto and ignore all gdb-test and have some env var override which would force full testing also with -flto? I think the individual tests that currently fail can be appropriately changed, no? That is hard, as whether a test fails heavily depends on the optimization flags and targets, so maintaining xfails would be a nightmare. BTW, the trunk has lots of guality regressions even on x86_64-linux compared to 4.9 branch now :(, some of them are LTO only, but others are not. +FAIL: gcc.dg/guality/pr36728-1.c -O1 line 16 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-1.c -O1 line 18 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-1.c -O2 line 16 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-1.c -O2 line 18 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-1.c -O3 -fomit-frame-pointer line 16 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-1.c -O3 -fomit-frame-pointer line 18 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-1.c -O3 -g line 16 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-1.c -O3 -g line 18 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-1.c -Os line 16 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-1.c -Os line 18 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 16 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 18 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 16 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 18 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-3.c -O1 line 
14 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-3.c -O1 line 16 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-3.c -O2 line 14 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-3.c -O2 line 16 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-3.c -O3 -fomit-frame-pointer line 14 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-3.c -O3 -fomit-frame-pointer line 16 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-3.c -O3 -g line 14 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-3.c -O3 -g line 16 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-3.c -Os line 14 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-3.c -Os line 16 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 14 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 16 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-3.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 14 arg7 == 30 +FAIL: gcc.dg/guality/pr36728-3.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 16 arg7 == 30 -XPASS: gcc.dg/guality/pr41353-1.c -O1 line 28 j == 28 + 37 -XPASS: gcc.dg/guality/pr41353-1.c -O2 line 28 j == 28 + 37 -XPASS: gcc.dg/guality/pr41353-1.c -O3 -fomit-frame-pointer line 28 j == 28 + 37 -XPASS: gcc.dg/guality/pr41353-1.c -O3 -g line 28 j == 28 + 37 -XPASS: gcc.dg/guality/pr41353-1.c -Os line 28 j == 28 + 37 -XPASS: gcc.dg/guality/pr41353-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 28 j == 28 + 37 -XPASS: gcc.dg/guality/pr41353-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 28 j == 28 + 37 +FAIL: gcc.dg/guality/pr43051-1.c -O2 line 35 v == 1 +FAIL: gcc.dg/guality/pr43051-1.c -O2 line 36 e == a[1] +FAIL: gcc.dg/guality/pr43051-1.c -O2 line 39 c == a[0] +FAIL: gcc.dg/guality/pr43051-1.c -O2 line 40 v == 1 +FAIL: gcc.dg/guality/pr43051-1.c -O2 line 41 e == a[1] +FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer line 35 v == 1 +FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer line 36 e == a[1] +FAIL: gcc.dg/guality/pr43051-1.c 
-O3 -fomit-frame-pointer line 39 c == a[0] +FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer line 40 v == 1 +FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer line 41 e == a[1] +FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer -funroll-loops line 35 v == 1 +FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer -funroll-loops line 36 e == a[1] +FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer -funroll-loops line 39 c == a[0] +FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer -funroll-loops line 40 v == 1 +FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer -funroll-loops line 41 e == a[1] +FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions line 35 v == 1 +FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
Re: combination of read/write and earlyclobber constraint modifier
On 2 July 2014 09:02, Tom de Vries tom_devr...@mentor.com wrote: On 02-07-14 08:23, Marc Glisse wrote: In the first example you gave, looking at the pattern (no match_dup, setting the full register), it seems that it may have wanted = instead of +. [ move discussion from gcc ml to gcc-patches ml ] Marcus, The + constraint on operand 0 of vec_unpack_trunc_mode seems wrong, since the template does not use the operand as input. This patch fixes that. OK for trunk if aarch64 build regtest succeeds ? Your patch looks fine, operand 0 isn't used for input. OK assuming no regression. Did you find this by inspection or is this the cause of some bug? /Marcus
Re: combination of read/write and earlyclobber constraint modifier
On 03-07-14 10:20, Marcus Shawcroft wrote: On 2 July 2014 09:02, Tom de Vries tom_devr...@mentor.com wrote: On 02-07-14 08:23, Marc Glisse wrote: In the first example you gave, looking at the pattern (no match_dup, setting the full register), it seems that it may have wanted = instead of +. [ move discussion from gcc ml to gcc-patches ml ] Marcus, The + constraint on operand 0 of vec_unpack_trunc_mode seems wrong, since the template does not use the operand as input. This patch fixes that. OK for trunk if aarch64 build regtest succeeds ? Your patch looks fine, operand 0 isn't used for input. OK assuming no regression. Did you find this by inspection or is this the cause of some bug? Marcus, I found this by inspection: https://gcc.gnu.org/ml/gcc/2014-07/msg7.html . Thanks, - Tom
Re: [AArch64,PATCH] Refactor acquire/release determination into output template
On 4 June 2014 01:07, Jones, Joel joel.jo...@caviumnetworks.com wrote: There is duplicate code for determining whether a load or store instruction needs acquire or release semantics. This patch removes the duplicated code and uses a modifying operator to output a/l instead. Since the testsuite already contains tests for the atomic functions, no new testcases are needed. OK? Built and tested for aarch64-elf using Cavium's internal simulator with no regressions. Thanks, Joel Jones There are a limited number of single character output modifiers. I'd rather not consume the available character space if it isn't necessary, therefore I prefer not to restructure the code in the manner proposed by this patch. Cheers /Marcus
[C++ Patch] PR 51488, 53618, 58059 (Take 2)
Hi again, this is IMHO more spot-on, because I figured out where exactly things go wrong as part of the most_specialized_class call. In complete analogy with the get_bindings case for functions, the problem happens in get_class_bindings, thus I added a simple push_tinst_level check around the tsubst there, which works fine for the testcases we have in this area. Tested x86_64-linux. Thanks, Paolo. / /cp 2014-07-03 Paolo Carlini paolo.carl...@oracle.com PR c++/51488 PR c++/53618 PR c++/58059 * pt.c (get_class_bindings): Call push_tinst_level/pop_tinst_level around tsubst. /testsuite 2014-07-03 Paolo Carlini paolo.carl...@oracle.com PR c++/51488 PR c++/53618 PR c++/58059 * g++.dg/cpp0x/template-recurse1.C: New. * g++.dg/template/recurse4.C: Likewise. * g++.dg/template/recurse.C: Adjust. Index: cp/pt.c === --- cp/pt.c (revision 212223) +++ cp/pt.c (working copy) @@ -18826,6 +18826,13 @@ get_class_bindings (tree tmpl, tree tparms, tree s if (! TREE_VEC_ELT (innermost_deduced_args, i)) return NULL_TREE; + tree tinst = build_tree_list (tmpl, args); + if (! push_tinst_level (tinst)) +{ + ggc_free (tinst); + return NULL_TREE; +} + /* Verify that nondeduced template arguments agree with the type obtained from argument deduction. @@ -18839,6 +18846,9 @@ get_class_bindings (tree tmpl, tree tparms, tree s `T' is `A' but unify () does not check whether `typename T::X' is `int'. 
*/ spec_args = tsubst (spec_args, deduced_args, tf_none, NULL_TREE); + + pop_tinst_level (); + spec_args = coerce_template_parms (DECL_INNERMOST_TEMPLATE_PARMS (tmpl), spec_args, tmpl, tf_none, false, false); Index: testsuite/g++.dg/cpp0x/template-recurse1.C === --- testsuite/g++.dg/cpp0x/template-recurse1.C (revision 0) +++ testsuite/g++.dg/cpp0x/template-recurse1.C (working copy) @@ -0,0 +1,25 @@ +// PR c++/58059 +// { dg-do compile { target c++11 } } + +templatebool, typename T = void struct enable_if { typedef T type; }; +templatetypename T struct enable_iffalse, T { }; + +// This code is nonsense; it was produced by minimizing the problem repeatedly. +constexpr bool test_func(int value) { + return true; +} +template int TParm, class Enable=void +struct test_class { + static constexpr int value = 0; +}; +template int TParm +struct test_class +TParm, +// This line ultimately causes the crash. +typename enable_iftest_func(test_classTParm-1::value)::type // { dg-error depth exceeds } + { + static constexpr int value = 1; +}; + +// This instantiation is required in order to crash. +template class test_class2,void; Index: testsuite/g++.dg/template/recurse.C === --- testsuite/g++.dg/template/recurse.C (revision 21) +++ testsuite/g++.dg/template/recurse.C (working copy) @@ -5,9 +5,7 @@ template int I struct F { int operator()() { - FI+1 f;// { dg-error incomplete type incomplete } - // { dg-bogus exceeds maximum.*exceeds maximum exceeds { xfail *-*-* } 8 } -// { dg-error exceeds maximum exceeds { xfail *-*-* } 8 } + FI+1 f;// { dg-error depth exceeds|incomplete } return f()*I; // { dg-message recursively recurse } } }; Index: testsuite/g++.dg/template/recurse4.C === --- testsuite/g++.dg/template/recurse4.C(revision 0) +++ testsuite/g++.dg/template/recurse4.C(working copy) @@ -0,0 +1,5 @@ +// PR c++/51488 + +templateclass T,class U=void struct s; +templateclass T struct sT,typename sT::a {}; +sint ca; // { dg-error depth exceeds|incomplete }
Re: [PATCH, x86] Improves x86 permutation expand
The following patch should fix 61618 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61618

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8046c67..2cffcef 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -43211,12 +43211,10 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
   bool ok;
 
   /* Use the same checks as in expand_vec_perm_blend, but skipping
-     AVX2 as it requires more than 2 instructions for general case.  */
+     AVX float case and AVX2 as they require more than 2 instructions.  */
   if (d->one_operand_p)
     return false;
-  if (TARGET_AVX && (vmode == V4DFmode || vmode == V8SFmode))
-    ;
-  else if (TARGET_SSE4_1 && GET_MODE_SIZE (vmode) == 16)
+  if (TARGET_SSE4_1 && GET_MODE_SIZE (vmode) == 16)
     ;
   else
     return false;

Evgeny

On Mon, Jun 9, 2014 at 11:49 PM, Richard Henderson r...@redhat.com wrote:
On 06/09/2014 12:10 PM, Evgeny Stupachenko wrote:
Nice catch. Patch with corresponding changes:
Looks ok with an appropriate changelog.
r~
[fortran,patch] Support for IEEE underflow control on x86/x86_64
Hi all, The attached patch provides support for underflow control in the IEEE_ARITHMETIC module, for x86/x86_64 targets (our main user base). Bootstrapped and regtested on x86_64-apple-darwin13. Comes with a testcase. OK to commit? FX underflow.ChangeLog Description: Binary data underflow.diff Description: Binary data
Re: [fortran,patch] Support for IEEE underflow control on x86/x86_64
On Thu, Jul 3, 2014 at 11:06 AM, FX fxcoud...@gmail.com wrote:

Hi all, The attached patch provides support for underflow control in the IEEE_ARITHMETIC module, for x86/x86_64 targets (our main user base). Bootstrapped and regtested on x86_64-apple-darwin13. Comes with a testcase.

+int
+support_fpu_underflow_control (int kind)
+{
+  return (has_sse() && (kind == 4 || kind == 8)) ? 1 : 0;
+}

Please split this condition to improve readability:

if (!has_sse)
  return 0;
...

Index: gcc/testsuite/gfortran.dg/ieee/underflow_1.f90
===
--- gcc/testsuite/gfortran.dg/ieee/underflow_1.f90 (revision 0)
+++ gcc/testsuite/gfortran.dg/ieee/underflow_1.f90 (working copy)
@@ -0,0 +1,75 @@
+! { dg-do run }
+! { dg-additional-options -O0 }
+! { dg-additional-options -msse -mfpmath=sse { target { i?86-*-* x86_64-*-* } } }
+

! { dg-do run }
! { dg-require-effective-target sse2_runtime { target { i?86-*-* x86_64-*-* } } }
! { dg-additional-options -msse2 -mfpmath=sse { target { i?86-*-* x86_64-*-* } } }

(I don't think -O0 is needed, but have to check with a testsuite run.)

Uros.
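One way the split condition requested in this review could look is sketched below. The `has_sse` stub and the `sse_available` flag are hypothetical stand-ins so the fragment compiles on its own; the real detection routine lives elsewhere in the patch and is not reproduced here.

```c
#include <assert.h>

/* Hypothetical switch used only by this sketch to simulate a CPU
   with or without SSE.  */
static int sse_available = 1;

/* Hypothetical stub standing in for the real SSE detection.  */
static int
has_sse (void)
{
  return sse_available;
}

/* Same result as the single-expression version, but with the SSE
   check separated from the kind check as an early return.  */
int
support_fpu_underflow_control (int kind)
{
  if (!has_sse ())
    return 0;

  return (kind == 4 || kind == 8) ? 1 : 0;
}
```

The early return keeps the hardware check apart from the kind check, which is the readability gain the review asks for.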
Re: [C/C++ PATCH] Implement -Wsizeof-array-argument (PR c/6940)
On Wed, Jul 02, 2014 at 07:27:07PM -0700, Jason Merrill wrote: On 06/26/2014 03:22 PM, Marek Polacek wrote: The following is a revamped patch for -Wsizeof-array-argument. Its purpose is to detect suspicious usage of the sizeof operator on an array function parameter. Then the name should be -Wsizeof-array-parm, not -argument. Yeah, but since clang calls this warnings -Wsizeof-array-argument, I thought it's better to keep the names in sync. @@ -9550,6 +9551,8 @@ grokdeclarator (const cp_declarator *declarator, array. */ returned_attrs = chainon (returned_attrs, declarator-std_attributes); + if (decl_context == PARM) +array_parameter_p = true; break; Setting this here means that you'll treat a parameter with pointer-to-array type as an array parm. I think you want to set it here, instead: /* A parameter declared as an array of T is really a pointer to T. One declared as a function is really a pointer to a function. One declared as a member is really a pointer to member. */ if (TREE_CODE (type) == ARRAY_TYPE) { /* Transfer const-ness of array into that of type pointed to. */ type = build_pointer_type (TREE_TYPE (type)); type_quals = TYPE_UNQUALIFIED; } Ah! Thanks for catching it. I added a test for that, with some typedefs too. (The C FE didn't need similar fix.) Bootstrapped/regtested on x86_64-linux, ok for trunk? 2014-07-03 Marek Polacek pola...@redhat.com PR c/6940 * doc/invoke.texi: Document -Wsizeof-array-argument. c-family/ * c.opt (Wsizeof-array-argument): New option. c/ * c-decl.c (grokdeclarator): Set C_ARRAY_PARAMETER. * c-tree.h (C_ARRAY_PARAMETER): Define. * c-typeck.c (c_expr_sizeof_expr): Warn when using sizeof on an array function parameter. cp/ * cp-tree.h (DECL_ARRAY_PARAMETER_P): Define. * decl.c (grokdeclarator): Set DECL_ARRAY_PARAMETER_P. * typeck.c (cxx_sizeof_expr): Warn when using sizeof on an array function parameter. testsuite/ * c-c++-common/Wsizeof-pointer-memaccess1.c: Use -Wno-sizeof-array-argument. 
* c-c++-common/Wsizeof-pointer-memaccess2.c: Likewise. * g++.dg/warn/Wsizeof-pointer-memaccess-1.C: Likewise. * gcc.dg/Wsizeof-pointer-memaccess1.c: Likewise. * g++.dg/torture/Wsizeof-pointer-memaccess1.C: Likewise. * g++.dg/torture/Wsizeof-pointer-memaccess2.C: Likewise. * gcc.dg/torture/Wsizeof-pointer-memaccess1.c: Likewise. * c-c++-common/sizeof-array-argument.c: New test. * gcc.dg/vla-5.c: Add dg-warnings. ../libgomp/ * testsuite/libgomp.c/appendix-a/a.29.1.c (f): Add dg-warnings. diff --git gcc/gcc/c-family/c.opt gcc/gcc/c-family/c.opt index c89040a..faef774 100644 --- gcc/gcc/c-family/c.opt +++ gcc/gcc/c-family/c.opt @@ -534,6 +534,10 @@ Wsizeof-pointer-memaccess C ObjC C++ ObjC++ Var(warn_sizeof_pointer_memaccess) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall) Warn about suspicious length parameters to certain string functions if the argument uses sizeof +Wsizeof-array-argument +C ObjC C++ ObjC++ Var(warn_sizeof_array_argument) Warning Init(1) +Warn when sizeof is applied on a parameter declared as an array + Wsuggest-attribute=format C ObjC C++ ObjC++ Var(warn_suggest_attribute_format) Warning Warn about functions which might be candidates for format attributes diff --git gcc/gcc/c/c-decl.c gcc/gcc/c/c-decl.c index 3dec90b..0ca2e0d 100644 --- gcc/gcc/c/c-decl.c +++ gcc/gcc/c/c-decl.c @@ -6103,6 +6103,7 @@ grokdeclarator (const struct c_declarator *declarator, if (decl_context == PARM) { tree promoted_type; + bool array_parameter_p = false; /* A parameter declared as an array of T is really a pointer to T. One declared as a function is really a pointer to a function. 
*/ @@ -6124,6 +6125,7 @@ grokdeclarator (const struct c_declarator *declarator, attributes in parameter array declarator ignored); size_varies = false; + array_parameter_p = true; } else if (TREE_CODE (type) == FUNCTION_TYPE) { @@ -6148,6 +6150,7 @@ grokdeclarator (const struct c_declarator *declarator, PARM_DECL, declarator-u.id, type); if (size_varies) C_DECL_VARIABLE_SIZE (decl) = 1; + C_ARRAY_PARAMETER (decl) = array_parameter_p; /* Compute the type actually passed in the parmlist, for the case where there is no prototype. diff --git gcc/gcc/c/c-tree.h gcc/gcc/c/c-tree.h index 133930f..f97d0d5 100644 --- gcc/gcc/c/c-tree.h +++ gcc/gcc/c/c-tree.h @@ -66,6 +66,9 @@ along with GCC; see the file COPYING3. If not see /* For a FUNCTION_DECL, nonzero if it was an implicit declaration. */ #define C_DECL_IMPLICIT(EXP) DECL_LANG_FLAG_2 (EXP)
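The distinction discussed in this thread, between a parameter declared as an array and a parameter that is a pointer to an array, can be shown with a short sketch. The function names are hypothetical; a compiler carrying this warning would be expected to diagnose the first function, and that diagnostic is the point of the example.

```c
#include <assert.h>
#include <stddef.h>

/* A is declared as an array of 10 ints, but a parameter of array type
   is adjusted to "int *", so sizeof yields the size of a pointer.
   This is the suspicious usage the new warning targets.  */
static size_t
size_of_array_parm (int a[10])
{
  return sizeof a;
}

/* P is a genuine pointer to an array of 10 ints.  No adjustment
   happens to the pointed-to type, so sizeof *p is the whole array.
   Such a parameter should not be treated as an array parameter.  */
static size_t
size_via_pointer_to_array (int (*p)[10])
{
  return sizeof *p;
}
```

Called with `int v[10]`, the first function returns `sizeof (int *)` and the second, called with `&v`, returns `10 * sizeof (int)`.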
[PATCH, libbacktrace]: fix pointer from integer without a cast
Hello the following patch fixes some post-build-checks from our distro build system, better to upstream it: [ 4077s] E: rust 64bit-portability-issue /home/abuild/rpmbuild/BUILD/rust-0.11.0+git.1403898616.aa1163b/src/libbacktrace/dwarf.c:2690, 2873, 3005 [ 4077s] E: rust 64bit-portability-issue /home/abuild/rpmbuild/BUILD/rust-0.11.0+git.1403898616.aa1163b/src/libbacktrace/elf.c:448, 493, 970 [ 4077s] E: rust 64bit-portability-issue /home/abuild/rpmbuild/BUILD/rust-0.11.0+git.1403898616.aa1163b/src/libbacktrace/fileline.c:75 libbacktrace/ChangeLog: 2014-07-03 Yaroslav Sidlovsky zawer...@gmail.com * dwarf.c, elf.c, fileline.c: fix pointer from integer without a cast diff -U 3 -H -d -r -N -- rust-0.10.orig/src/libbacktrace/dwarf.c rust-0.10/src/libbacktrace/dwarf.c --- rust-0.10.orig/src/libbacktrace/dwarf.c 2014-04-03 04:03:08.0 +0400 +++ rust-0.10/src/libbacktrace/dwarf.c 2014-04-21 12:01:59.803278408 +0400 @@ -2687,7 +2687,7 @@ } if (state-threaded) -lines = backtrace_atomic_load_pointer (u-lines); +lines = (struct line *) backtrace_atomic_load_pointer (u-lines); new_data = 0; if (lines == NULL) @@ -2870,7 +2870,7 @@ pp = (struct dwarf_data **) (void *) state-fileline_data; while (1) { - ddata = backtrace_atomic_load_pointer (pp); + ddata = (struct dwarf_data *) backtrace_atomic_load_pointer (pp); if (ddata == NULL) break; @@ -3002,7 +3002,7 @@ { struct dwarf_data *p; - p = backtrace_atomic_load_pointer (pp); + p = (struct dwarf_data *) backtrace_atomic_load_pointer (pp); if (p == NULL) break; diff -U 3 -H -d -r -N -- rust-0.10.orig/src/libbacktrace/elf.c rust-0.10/src/libbacktrace/elf.c --- rust-0.10.orig/src/libbacktrace/elf.c 2014-04-03 04:03:08.0 +0400 +++ rust-0.10/src/libbacktrace/elf.c 2014-04-21 12:17:50.977257617 +0400 @@ -445,7 +445,7 @@ { struct elf_syminfo_data *p; - p = backtrace_atomic_load_pointer (pp); + p = (struct elf_syminfo_data *) backtrace_atomic_load_pointer (pp); if (p == NULL) break; @@ -490,7 +490,7 @@ pp = (struct elf_syminfo_data **) 
(void *) state-syminfo_data; while (1) { - edata = backtrace_atomic_load_pointer (pp); + edata = (struct elf_syminfo_data *) backtrace_atomic_load_pointer (pp); if (edata == NULL) break; @@ -967,7 +967,7 @@ { fileline current_fn; - current_fn = backtrace_atomic_load_pointer (state-fileline_fn); + current_fn = (fileline) backtrace_atomic_load_pointer (state-fileline_fn); if (current_fn == NULL || current_fn == elf_nodebug) *fileline_fn = elf_fileline_fn; } diff -U 3 -H -d -r -N -- rust-0.10.orig/src/libbacktrace/fileline.c rust-0.10/src/libbacktrace/fileline.c --- rust-0.10.orig/src/libbacktrace/fileline.c 2014-04-03 04:03:08.0 +0400 +++ rust-0.10/src/libbacktrace/fileline.c 2014-04-21 11:33:37.790315610 +0400 @@ -72,7 +72,7 @@ if (!state-threaded) fileline_fn = state-fileline_fn; else -fileline_fn = backtrace_atomic_load_pointer (state-fileline_fn); +fileline_fn = (fileline) backtrace_atomic_load_pointer (state-fileline_fn); if (fileline_fn != NULL) return 1;
Re: [fortran,patch] Support for IEEE underflow control on x86/x86_64
(I don't think -O0 is needed, but have to check with a testsuite run.) On x86_64-apple-darwin, -O0 or -O1 are needed: at -O2 my “use_real” call is optimized out anyway, and the division simplified at compile time. FX
[PATCH] Add guality [p]type test.
Hi, I pulled out the guality.exp [p]type test extension from the actual dwarf2out.c changes (which I will repost soon with some tweaks). I think the test extension itself is useful on its own (and will use it to add tests for my new patches). All new tests PASS, except when using -flto, so you'll need the Don't run guality.exp tests with LTO_TORTURE_OPTIONS if you don't want to add new FAILs. But I hope this patch can go in even without that because I do think it is useful on its own. Add a new type:var variant to the guality.exp testsuite to check that gdb gets the correct type for a variable or function. To use it in a guality test add something like: /* { dg-final { gdb-test 50 type:main int (int, char **) } } */ Which will put a breakpoint at line 50 and check that the type of main equals int (int, char **) according to gdb. The test harness will make sure to squash all extra whitespace/newlines that gdb might use to make comparisons of large structs easy. gcc/testsuite/ChangeLog * lib/gcc-gdb-test.exp (gdb-test): Handle type:var for gdb ptype matching. Catch 'unknown type in ' to recognize older gdb versions. * gcc.dg/guality/const-volatile.c: New test. --- gcc/testsuite/ChangeLog |6 ++ gcc/testsuite/gcc.dg/guality/const-volatile.c | 83 + gcc/testsuite/lib/gcc-gdb-test.exp| 47 +- 3 files changed, 132 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/guality/const-volatile.c diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 421e006..1abc700 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,9 @@ +2014-07-03 Mark Wielaard m...@redhat.com + + * lib/gcc-gdb-test.exp (gdb-test): Handle type:var for gdb ptype + matching. Catch 'unknown type in ' to recognize older gdb versions. + * gcc.dg/guality/const-volatile.c: New test. 
+ 2014-07-02 Mark Wielaard m...@redhat.com * gcc.dg/guality/guality.exp: Remove LTO_TORTURE_OPTIONS from diff --git a/gcc/testsuite/gcc.dg/guality/const-volatile.c b/gcc/testsuite/gcc.dg/guality/const-volatile.c new file mode 100644 index 000..6c2b617 --- /dev/null +++ b/gcc/testsuite/gcc.dg/guality/const-volatile.c @@ -0,0 +1,83 @@ +/* debuginfo tests for combinations of const and volatile type qualifiers. */ +/* { dg-do run } */ +/* { dg-options -g } */ + +int i; +const int ci; +volatile int vi; +const volatile int cvi; + +int *pi; +const int *pci; +volatile int *pvi; +const volatile int *pcvi; + +int * const cip; +int * volatile vip; +int * const volatile cvip; + +volatile struct +{ + const long cli; + const signed char csc; +} vs; + +struct foo +{ + const long cli; + const signed char csc; +}; + +struct foo foo; +const struct foo cfoo; +volatile struct foo vfoo; +const volatile struct foo cvfoo; + +typedef volatile signed char score; + +score s; +const score cs; + +static __attribute__((noclone, noinline)) int +f (const char *progname, volatile struct foo *dummy, const score s) +{ + return progname == 0 || dummy == 0 || dummy-csc == s; +} + +int +main (int argc, char **argv) +{ + score as = argc; + struct foo dummy = { 1, 1 }; + return f (argv[0], dummy, as) - 1; +} + +/* { dg-final { gdb-test 50 type:main int (int, char **) } } */ + +/* { dg-final { gdb-test 50 type:i int } } */ +/* { dg-final { gdb-test 50 type:ci const int } } */ +/* { dg-final { gdb-test 50 type:vi volatile int } } */ +/* { dg-final { gdb-test 50 type:cvi const volatile int } } */ + +/* { dg-final { gdb-test 50 type:pi int * } } */ +/* { dg-final { gdb-test 50 type:pci const int * } } */ +/* { dg-final { gdb-test 50 type:pvi volatile int * } } */ +/* { dg-final { gdb-test 50 type:pcvi const volatile int * } } */ + +/* { dg-final { gdb-test 50 type:cip int * const } } */ +/* { dg-final { gdb-test 50 type:vip int * volatile } } */ +/* { dg-final { gdb-test 50 type:cvip int * const volatile } 
} */ + +/* { dg-final { gdb-test 50 type:vs volatile struct { const long cli; const signed char csc; } } } */ + +/* { dg-final { gdb-test 50 type:cvip int * const volatile } } */ + +/* { dg-final { gdb-test 50 type:foo struct foo { const long cli; const signed char csc; } } } */ +/* { dg-final { gdb-test 50 type:cfoo const struct foo { const long cli; const signed char csc; } } } */ +/* { dg-final { gdb-test 50 type:vfoo volatile struct foo { const long cli; const signed char csc; } } } */ +/* { dg-final { gdb-test 50 type:cvfoo const volatile struct foo { const long cli; const signed char csc; } } } */ + +/* { dg-final { gdb-test 58 type:s volatile signed char } } */ +/* { dg-final { gdb-test 50 type:cs const volatile signed char } } */ + +/* { dg-final { gdb-test 50 type:f int (const char *, volatile struct foo *, const score) } } */ diff --git a/gcc/testsuite/lib/gcc-gdb-test.exp b/gcc/testsuite/lib/gcc-gdb-test.exp index d182d88..c729793 100644 --- a/gcc/testsuite/lib/gcc-gdb-test.exp +++
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
The expand_vec_perm_palignr is similar for SSSE3 and AVX2 cases, but AVX2 requires more instructions to complete the scheme. The patch below adds AVX2 support for six instructions, leaving SSSE3 for two.

Is it ok?

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2cffcef..70fc832 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -43130,23 +43130,38 @@ expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d)
   return true;
 }

+static bool
+expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d);
+
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
-   the permutation using the SSSE3 palignr instruction.  This succeeds
+   the permutation using the SSSE3/AVX2 palignr instruction.  This succeeds
    when all of the elements in PERM fit within one vector and we merely
    need to shift them down so that a single vector permutation has a
    chance to succeed.  */
 static bool
-expand_vec_perm_palignr (struct expand_vec_perm_d *d)
+expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)
 {
   unsigned i, nelt = d->nelt;
   unsigned min, max;
   bool in_order, ok;
-  rtx shift, target;
+  rtx shift, shift1, target, tmp;
   struct expand_vec_perm_d dcopy;

-  /* Even with AVX, palignr only operates on 128-bit vectors.  */
-  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
+  /* SSSE3 is required to apply PALIGNR on 16 bytes operands.  */
+  if (GET_MODE_SIZE (d->vmode) == 16)
+    {
+      if (!TARGET_SSSE3)
+        return false;
+    }
+  /* AVX2 is required to apply PALIGNR on 32 bytes operands.  */
+  else if (GET_MODE_SIZE (d->vmode) == 32)
+    {
+      if (!TARGET_AVX2)
+        return false;
+    }
+  /* Other sizes are not supported.  */
+  else
     return false;

   min = nelt, max = 0;
@@ -43168,9 +43183,34 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)

   dcopy = *d;
   shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
-  target = gen_reg_rtx (TImode);
-  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
-                                  gen_lowpart (TImode, d->op0), shift));
+  shift1 = GEN_INT ((min - nelt / 2)
+                    * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
+
+  if (GET_MODE_SIZE (d->vmode) != 32)
+    {
+      target = gen_reg_rtx (TImode);
+      emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
+                                      gen_lowpart (TImode, d->op0), shift));
+    }
+  else
+    {
+      target = gen_reg_rtx (V2TImode);
+      tmp = gen_reg_rtx (V4DImode);
+      emit_insn (gen_avx2_permv2ti (tmp,
+                                    gen_lowpart (V4DImode, d->op0),
+                                    gen_lowpart (V4DImode, d->op1),
+                                    GEN_INT (33)));
+      if (min < nelt / 2)
+        emit_insn (gen_avx2_palignrv2ti (target,
+                                         gen_lowpart (V2TImode, tmp),
+                                         gen_lowpart (V2TImode, d->op0),
+                                         shift));
+      else
+        emit_insn (gen_avx2_palignrv2ti (target,
+                                         gen_lowpart (V2TImode, d->op1),
+                                         gen_lowpart (V2TImode, tmp),
+                                         shift1));
+    }
   dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
   dcopy.one_operand_p = true;

@@ -43192,9 +43232,22 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
       return true;
     }

-  ok = expand_vec_perm_1 (&dcopy);
-  gcc_assert (ok);
-
+  /* For SSSE3 we need 1 instruction for palignr plus 1 for one
+     operand permutation.  */
+  if (insn_num == 2)
+    {
+      ok = expand_vec_perm_1 (&dcopy);
+      gcc_assert (ok);
+    }
+  /* For AVX2 we need 2 instructions for the shift: vpalignr and
+     vperm plus 4 instructions for one operand permutation.  */
+  else if (insn_num == 6)
+    {
+      ok = expand_vec_perm_vpshufb2_vpermq (&dcopy);
+      gcc_assert (ok);
+    }
+  else
+    ok = false;
   return ok;
 }

@@ -44627,7 +44680,7 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_pshuflw_pshufhw (d))
     return true;

-  if (expand_vec_perm_palignr (d))
+  if (expand_vec_perm_palignr (d, 2))
     return true;

   if (expand_vec_perm_interleave2 (d))
@@ -44680,6 +44733,10 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_even_odd (d))
     return true;

+  /* Try sequences of six instructions.  */
+  if (expand_vec_perm_palignr (d, 6))
+    return true;
+
   /* Even longer sequences.  */
   if (expand_vec_perm_vpshufb4_vpermq2 (d))
     return true;

On Mon, May 19, 2014 at 7:32 PM, Richard Henderson r...@redhat.com wrote:
On 05/05/2014 09:49 AM, Evgeny Stupachenko wrote:
@@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_pshufb (d))
     return true;

+  /* Try the AVX2 vpshufb.  */
+  if
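For readers unfamiliar with the instruction, the idea behind the palignr expansion can be sketched in plain C. This is only a model of the 16-byte case under simplifying assumptions: it works on byte arrays rather than RTL, and the names palignr_bytes and fits_one_vector are hypothetical, not functions from i386.c. The first helper concatenates two vectors and shifts the pair down by a byte count; the second checks whether all selected indices fall inside a window of one vector, which is the condition the comment in the patch describes.

```c
#include <stdbool.h>
#include <string.h>

#define NELT 16  /* elements (bytes) per 128-bit vector in this model */

/* Model of a byte-granular PALIGNR: view HI:LO as one 32-byte value,
   shift it right by SHIFT bytes and keep the low 16 bytes.  Bytes
   shifted in from beyond the pair are zero.  */
static void
palignr_bytes (unsigned char *dst, const unsigned char *hi,
               const unsigned char *lo, unsigned shift)
{
  unsigned char concat[2 * NELT];

  memcpy (concat, lo, NELT);
  memcpy (concat + NELT, hi, NELT);
  for (unsigned i = 0; i < NELT; i++)
    dst[i] = shift + i < 2 * NELT ? concat[shift + i] : 0;
}

/* Return true when every index in PERM (each in 0 .. 2*NELT-1, indexing
   the concatenated pair) lies inside one window of NELT elements.  The
   smallest index is stored in *MIN_OUT; it is the shift count to use.  */
static bool
fits_one_vector (const unsigned char *perm, unsigned *min_out)
{
  unsigned min = 2 * NELT, max = 0;

  for (unsigned i = 0; i < NELT; i++)
    {
      if (perm[i] < min)
        min = perm[i];
      if (perm[i] > max)
        max = perm[i];
    }
  *min_out = min;
  return max - min < NELT;
}
```

After such a shift, every requested element sits in a single vector, so a one-operand shuffle can finish the permutation. The 32-byte AVX2 case in the patch needs the extra cross-lane permute because vpalignr shifts each 128-bit lane separately; that part is not modelled here.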
Re: [PATCH] Add guality [p]type test.
On Thu, Jul 03, 2014 at 11:53:29AM +0200, Mark Wielaard wrote:
I pulled out the guality.exp [p]type test extension from the actual dwarf2out.c changes (which I will repost soon with some tweaks). I think the test extension itself is useful on its own (and will use it to add tests for my new patches). All new tests PASS, except when using -flto, so you'll need the "Don't run guality.exp tests with LTO_TORTURE_OPTIONS" patch if you don't want to add new FAILs. But I hope this patch can go in even without that because I do think it is useful on its own.

Is what gdb prints for ptype stable across different gdb versions (except for whitespace that you canonicalize)? If yes, this looks good to me.

Jakub
Re: [PATCH] Don't ICE with huge alignment (PR middle-end/60226)
On Mon, Jun 30, 2014 at 01:50:12PM -0600, Jeff Law wrote:
On 03/04/14 09:40, Marek Polacek wrote:
This should fix ICE on insane alignment. Normally, check_user_alignment detects e.g. alignment 1 << 32, but not 1 << 28. However, record_align is in bits, so it's actually 8 * (1 << 28) and that's greater than INT_MAX. This patch rejects such code. In the middle hunk, we should give up when an error occurs, we don't want to call finalize_type_size in that case -- we'd ICE in there.

Regtested/bootstrapped on x86_64-linux, ok for trunk?

2014-03-04  Marek Polacek  pola...@redhat.com

	PR middle-end/60226
	* stor-layout.c (layout_type): Return if alignment of array elements
	is greater than element size.  Error out if requested alignment is too
	large.
cp/
	* class.c (layout_class_type): Error out if requested alignment is too
	large.
testsuite/
	* c-c++-common/pr60226.c: New test.

Is this still applicable after the wide-int changes? I haven't looked closely.

Yeah, it applies cleanly. But I tried the int -> unsigned change which Mike suggested and that cures the ICE. I'll send a patch momentarily.

Marek
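The arithmetic behind the report can be checked with a few lines of C. This is a hedged sketch, not GCC code: align_bits_fit_int is a hypothetical helper, and the concrete results assume the usual 32-bit int and 8-bit char. It shows why an alignment of 1 << 28 bytes is already too large once it is expressed in bits.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: return true when an alignment of ALIGN_BYTES bytes,
   converted to bits, still fits in a signed int.  The multiplication is
   done in unsigned long long so the check itself cannot overflow for the
   values discussed in the thread.  */
static bool
align_bits_fit_int (unsigned long long align_bytes)
{
  return align_bytes * CHAR_BIT <= (unsigned long long) INT_MAX;
}
```

With those assumptions, 1 << 27 bytes is 2^30 bits and fits, while 1 << 28 bytes is 2^31 bits, one more than INT_MAX, which is the case that slipped past check_user_alignment.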
Re: [fortran,patch] Support for IEEE underflow control on x86/x86_64
On Thu, Jul 3, 2014 at 11:25 AM, Uros Bizjak ubiz...@gmail.com wrote:
The attached patch provides support for underflow control in the IEEE_ARITHMETIC module, for x86/x86_64 targets (our main user base). Bootstrapped and regtested on x86_64-apple-darwin13. Comes with a testcase.

Index: gcc/testsuite/gfortran.dg/ieee/underflow_1.f90

I'd suggest to name this file ieee_underflow_1.f90 for consistency.

BTW: underflow control also works on alpha, using following code:

--cut here--
int
support_fpu_underflow_control (int kind)
{
  return (kind == 4 || kind == 8) ? 1 : 0;
}

int
get_fpu_underflow_mode (void)
{
  fenv_t state = __ieee_get_fp_control ();

  /* Return 0 for abrupt underflow (flush to zero), 1 for gradual underflow.  */
  return (state & FE_MAP_UMZ) ? 0 : 1;
}

void
set_fpu_underflow_mode (int gradual __attribute__((unused)))
{
  fenv_t state = __ieee_get_fp_control ();

  if (gradual)
    state &= ~FE_MAP_UMZ;
  else
    state |= FE_MAP_UMZ;

  __ieee_set_fp_control (state);
}
--cut here--

This non-portable code was stuffed into fpu-glibc.h so should be #ifdef'd with some __ALPHA__ and/or FE_MAP_UMZ check.

Uros.
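The get/set pair above is the usual read, mask, write pattern on a control word. A portable sketch of just that pattern, with everything Alpha-specific replaced by stand-ins, might look as follows. DEMO_MAP_UMZ, demo_fp_control and both function names are hypothetical; the real bit value and the __ieee_get_fp_control/__ieee_set_fp_control calls come from the Alpha system headers and are not reproduced here.

```c
/* Hypothetical stand-in for the Alpha FE_MAP_UMZ control bit; the value
   is arbitrary and chosen only for illustration.  */
#define DEMO_MAP_UMZ 0x2000UL

/* Hypothetical software copy of the floating-point control word.  */
static unsigned long demo_fp_control;

/* Return 0 for abrupt underflow (flush to zero), 1 for gradual underflow.  */
static int
demo_get_underflow_mode (void)
{
  return (demo_fp_control & DEMO_MAP_UMZ) ? 0 : 1;
}

/* Clear the flush-to-zero bit for gradual underflow, set it otherwise.  */
static void
demo_set_underflow_mode (int gradual)
{
  if (gradual)
    demo_fp_control &= ~DEMO_MAP_UMZ;
  else
    demo_fp_control |= DEMO_MAP_UMZ;
}
```

The point of the sketch is only that the test uses a bitwise AND and the clearing branch uses AND with the complemented mask, so other bits in the control word are left untouched.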
Re: [i386] Replace builtins with vector extensions
Hello Marc,

On 28 Jun 12:42, Marc Glisse wrote:
It would enable a number of optimizations, like constant propagation, FMA contraction, etc. It would also allow us to remove several builtins.

This should be the main motivation for replacing built-ins. But IMHO this approach should be used for `obvious' cases only. I mean: + - / * and friends. I think it shouldn't apply to shuffles or broadcasts. But we have to define the border between `obvious' intrinsics and the rest.

On the other hand, an intrinsic updated in such a way may actually generate a different instruction than intended (e.g. the FMA case). For ICC it is generally OK to generate different instructions; only the semantics should be obeyed.

--
Thanks, K
Re: [PATCH] Don't ICE with huge alignment (PR middle-end/60226)
On Mon, Jun 30, 2014 at 03:40:18PM -0700, Mike Stump wrote:
I glanced at it:

(gdb) p/x TYPE_ALIGN (type)
$1 = 2147483648
(gdb) p/x TYPE_ALIGN (type)
$2 = 0x8000

The callee is int, the caller uses unsigned int. The assert I see is because the routines are not type correct:

=>TYPE_SIZE (type) = round_up (TYPE_SIZE (type), TYPE_ALIGN (type));

(gdb) ptype TYPE_ALIGN (type)
type = unsigned int

tree
round_up_loc (location_t loc, tree value, int divisor)
{
  tree div = NULL_TREE;

=>gcc_assert (divisor > 0);

Would be nice if the routine was type correct (wrt unsigned).

Yeah, I did that. One issue with that is that round_up now wraps the value, so I had to add a check for huge size before rounding up, otherwise we'd regress on e.g. PR42611. How about the following?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2014-07-03  Marek Polacek  pola...@redhat.com

	PR c/60226
	* fold-const.c (round_up_loc): Change the parameter type.
	Remove assert.
	* fold-const.h (round_up_loc): Adjust declaration.
	* stor-layout.c (finalize_record_size): Check for too large types.

	* c-c++-common/pr60226.c: New test.

diff --git gcc/fold-const.c gcc/fold-const.c
index d22eac1..c57ac7b 100644
--- gcc/fold-const.c
+++ gcc/fold-const.c
@@ -16647,11 +16647,10 @@ fold_ignored_result (tree t)
 /* Return the value of VALUE, rounded up to a multiple of DIVISOR.  */

 tree
-round_up_loc (location_t loc, tree value, int divisor)
+round_up_loc (location_t loc, tree value, unsigned int divisor)
 {
   tree div = NULL_TREE;

-  gcc_assert (divisor > 0);
   if (divisor == 1)
     return value;

diff --git gcc/fold-const.h gcc/fold-const.h
index dcb97a1..3b5fd84 100644
--- gcc/fold-const.h
+++ gcc/fold-const.h
@@ -144,7 +144,7 @@ extern tree combine_comparisons (location_t, enum tree_code,
 				 enum tree_code,
 extern void debug_fold_checksum (const_tree);
 extern bool may_negate_without_overflow_p (const_tree);
 #define round_up(T,N) round_up_loc (UNKNOWN_LOCATION, T, N)
-extern tree round_up_loc (location_t, tree, int);
+extern tree round_up_loc (location_t, tree, unsigned int);
 #define round_down(T,N) round_down_loc (UNKNOWN_LOCATION, T, N)
 extern tree round_down_loc (location_t, tree, int);
 extern tree size_int_kind (HOST_WIDE_INT, enum size_type_kind);
diff --git gcc/stor-layout.c gcc/stor-layout.c
index cfd436f..19e7adb 100644
--- gcc/stor-layout.c
+++ gcc/stor-layout.c
@@ -1587,6 +1587,11 @@ finalize_record_size (record_layout_info rli)
     unpadded_size_unit
       = size_binop (PLUS_EXPR, unpadded_size_unit, size_one_node);

+  if (TREE_CODE (unpadded_size_unit) == INTEGER_CST
+      && !TREE_OVERFLOW (unpadded_size_unit)
+      && !valid_constant_size_p (unpadded_size_unit))
+    error ("type %qT is too large", rli->t);
+
   /* Round the size up to be a multiple of the required alignment.  */
   TYPE_SIZE (rli->t) = round_up (unpadded_size, TYPE_ALIGN (rli->t));
   TYPE_SIZE_UNIT (rli->t)
diff --git gcc/testsuite/c-c++-common/pr60226.c gcc/testsuite/c-c++-common/pr60226.c
index e69de29..3a1c261 100644
--- gcc/testsuite/c-c++-common/pr60226.c
+++ gcc/testsuite/c-c++-common/pr60226.c
@@ -0,0 +1,14 @@
+/* PR c/60226 */
+/* { dg-do compile } */
+/* { dg-options "-Wno-c++-compat" { target c } } */
+
+typedef int __attribute__ ((aligned (1 << 28))) int28;
+int28 foo[4] = {}; /* { dg-error "alignment of array elements is greater than element size" } */
+typedef int __attribute__ ((aligned (1 << 29))) int29; /* { dg-error "requested alignment is too large" } */
+
+void
+f (void)
+{
+  struct { __attribute__((aligned (1 << 28))) double a; } x1;
+  struct { __attribute__((aligned (1 << 29))) double a; } x2; /* { dg-error "requested alignment is too large" } */
+}

Marek
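The remark that round_up "now wraps the value" with an unsigned divisor can be illustrated outside the compiler. The sketch below is an assumption-laden model, not the fold-const.c implementation: it uses fixed 32-bit integers instead of trees, assumes a power-of-two divisor, and both function names are hypothetical. The checked variant mirrors the idea of testing for a too-large size before rounding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round VALUE up to a multiple of DIVISOR, which must be a power of two.
   The addition is done modulo 2^32, so values near UINT32_MAX wrap
   around to a small result instead of trapping.  */
static uint32_t
round_up_u32 (uint32_t value, uint32_t divisor)
{
  return (uint32_t) (value + (divisor - 1u)) & ~(divisor - 1u);
}

/* Variant that refuses to round when the addition would wrap.  Returns
   false and leaves *RESULT untouched in that case.  */
static bool
round_up_checked (uint32_t value, uint32_t divisor, uint32_t *result)
{
  if (value > UINT32_MAX - (divisor - 1u))
    return false;
  *result = round_up_u32 (value, divisor);
  return true;
}
```

Once the assert on a positive signed divisor is gone, nothing stops the silent wrap, which is why a separate size check ahead of the rounding step is needed.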
Re: [Patch, libgomp] libgomp.fortran/fortran.exp - add -fintrinsic-modules-path ${blddir}
Hi!

On Wed, 2 Jul 2014 21:14:11 +0200, Tobias Burnus bur...@net-b.de wrote:
Thomas Schwinge wrote:
Reopening this oldie:

index 5fa42f4..68440d18 100644
--- a/libgomp/testsuite/libgomp.fortran/fortran.exp
+++ b/libgomp/testsuite/libgomp.fortran/fortran.exp
@@ -14,6 +14,7 @@ set quadmath_library_path "../libquadmath/.libs"
 dg-init

 if { $blddir != "" } {
+    lappend ALWAYS_CFLAGS "additional_flags=-fintrinsic-modules-path ${blddir}"

How about the following (only lightly tested). I wonder why I didn't use it before – but it looks obvious.

--- a/libgomp/testsuite/libgomp.fortran/fortran.exp
+++ b/libgomp/testsuite/libgomp.fortran/fortran.exp
@@ -48,5 +48,5 @@ if { $lang_test_file_found } {
 	 || [file exists ${blddir}/${quadmath_library_path}/libquadmath.${shlib_ext}] } {
-	lappend ALWAYS_CFLAGS "ldflags=-L${blddir}/${quadmath_library_path}/"
+	lappend ALWAYS_FFLAGS "ldflags=-L${blddir}/${quadmath_library_path}/"
 	# Allow for spec subsitution.
-	lappend ALWAYS_CFLAGS "additional_flags=-B${blddir}/${quadmath_library_path}/"
+	lappend ALWAYS_FFLAGS "additional_flags=-B${blddir}/${quadmath_library_path}/"
 	set ld_library_path "$always_ld_library_path:${blddir}/${lang_library_path}:${blddir}/${quadmath_library_path}"

I don't understand -- there is no ALWAYS_FFLAGS, and, it's not -L and -B options that are the problem here, but rather -fintrinsic-modules-path passed to xgcc running in non-Fortran mode when doing check_effective-target_* checks and the like.

My understanding is that these checks will always be using ${tool}_target_compile, so the problem is ALWAYS_CFLAGS usage in libgomp/testsuite/lib/libgomp.exp:libgomp_target_compile: this will be set up for Fortran testing in libgomp/testsuite/libgomp.fortran/fortran.exp (including the -fintrinsic-modules-path option), but then used by check_effective-target_* with C language test cases, hence the compiler warning.

I found the following to work (but so far only did libgomp testing), but that is a little bit more intrusive, but may actually be the right thing to do. (Possibly also in additional places where ${tool}_target_compile is used? CCing testsuite maintainers.) Comments?

Patch to relax checking for compiler warnings when determining features supported by the target:

--- gcc/testsuite/lib/target-supports.exp
+++ gcc/testsuite/lib/target-supports.exp
@@ -78,6 +78,9 @@ proc check_compile {basename type contents args} {
     set lines [${tool}_target_compile $src $output $compile_type "$options"]
     file delete $src

+    # Mask out messages from gcc that aren't useful for our purposes here.
+    set lines [string trim [prune_gcc_output $lines]]
+
     set scan_output $output
     # Don't try folding this into the switch above; calling "glob" before the
     # file is created won't work.

Patch to mask out »valid for Fortran but not for C« warnings.

--- gcc/testsuite/lib/prune.exp
+++ gcc/testsuite/lib/prune.exp
@@ -46,6 +46,10 @@ proc prune_gcc_output { text } {
     regsub -all "(^|\n)\[^\n\]*: Additional NOP may be necessary to workaround Itanium processor A/B step errata" $text "" text
     regsub -all "(^|\n)\[^\n*\]*: Assembler messages:\[^\n\]*" $text "" text

+    # Ignore warning for gfortran options passed to xgcc not running in Fortran
+    # mode.
+    regsub -all "(^|\n)\[^\n\]*: warning: command line option .-f\[^\n\]*. is valid for Fortran but not for C\[^\n\]*" $text "" text
+
     # Ignore harmless VTA note.
     regsub -all "(^|\n)\[^\n\]*: note: variable tracking size limit exceeded with -fvar-tracking-assignments, retrying without\[^\n\]*" $text "" text

Grüße, Thomas
Re: [fortran,patch] Support for IEEE underflow control on x86/x86_64
On Thu, Jul 3, 2014 at 11:42 AM, FX fxcoud...@gmail.com wrote:
(I don't think -O0 is needed, but have to check with a testsuite run.)

On x86_64-apple-darwin, -O0 or -O1 are needed: at -O2 my “use_real” call is optimized out anyway, and the division simplified at compile time.

You can mark variables with:

  real, volatile :: x
  double precision, volatile :: y

and you don't even need the call to use_real anymore. I have tested on alpha that this approach works for all optimization levels.

Uros.
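The same trick carries over to C, which may help when reasoning about what the Fortran testcase checks. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the target is assumed to use IEEE 754 doubles, and the program is assumed not to be built with options such as -ffast-math that enable flush-to-zero. Marking the operands volatile keeps the compiler from folding the division at compile time, so the result reflects the run-time underflow mode.

```c
#include <float.h>

/* Hypothetical probe: divide the smallest normal double by two at run
   time.  Under gradual underflow the quotient is a nonzero subnormal;
   under abrupt underflow (flush to zero) it is 0.0.  The volatile
   qualifiers force the division to happen at run time at any -O level.  */
static int
gradual_underflow_in_effect (void)
{
  volatile double x = DBL_MIN;
  volatile double y = x / 2.0;

  return y > 0.0;
}
```

No separate "use" function is needed to keep the computation alive, which matches the observation above that the call to use_real becomes unnecessary.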
Re: [fortran,patch] Support for IEEE underflow control on x86/x86_64
I'd suggest to name this file ieee_underflow_1.f90 for consistency.

In fact, since the directory is called ieee/, I think I’ll rename the others so they don’t all start with ieee_

BTW: underflow control also works on alpha, using following code:

Could you test the attached libgfortran/config/fpu-glibc.h file on alpha?

You can mark variables with “volatile”

Indeed, I should have thought of that. Once you report the results of the alpha modification, I’ll propose an updated patch with all of those remarks.

Thanks, FX
Re: [fortran,patch] Support for IEEE underflow control on x86/x86_64
On Thu, Jul 3, 2014 at 12:26 PM, FX fxcoud...@gmail.com wrote:
I'd suggest to name this file ieee_underflow_1.f90 for consistency.

In fact, since the directory is called ieee/, I think I’ll rename the others so they don’t all start with ieee_

BTW: underflow control also works on alpha, using following code:

Could you test the attached libgfortran/config/fpu-glibc.h file on alpha?

You can mark variables with “volatile”

Indeed, I should have thought of that. Once you report the results of the alpha modification, I’ll propose an updated patch with all of those remarks.

(I'd also make the new code dependent on __alpha__.) Otherwise, the new header works OK.

Thanks, Uros.
Re: [PATCH] Implement -fsanitize=bounds and internal calls in FEs
On Sat, Jun 28, 2014 at 06:52:00PM +0200, Gerald Pfeifer wrote: On Fri, 20 Jun 2014, Marek Polacek wrote: +@item -fsanitize=bounds +@opindex fsanitize=bounds + +This option enables instrumentation of array bounds. Various out of bounds +accesses are detected. Flexible array members are not instrumented, as well +as initializers of variables with static storage. Can you make this Flexible array members and initializers... (or ...as well as...)? The current wording confused me a bit at first. And I believe there should be no empty line after @opindex. Thanks, I'll fix both with the following. Also -fsanitize=float-divide-by-zero and -fsanitize=float-cast-overflow descriptions were at a wrong place, so moved a little bit above. Applying to trunk as obvious. 2014-07-03 Marek Polacek pola...@redhat.com * doc/invoke.texi (-fsanitize=bounds): Tweak wording. (-fsanitize=float-divide-by-zero): Move to the table with -fsanitize=undefined suboptions. (-fsanitize=float-cast-overflow): Likewise. diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi index b1f6f4b..046ea58 100644 --- gcc/doc/invoke.texi +++ gcc/doc/invoke.texi @@ -5400,26 +5400,22 @@ at runtime. Current suboptions are: @item -fsanitize=shift @opindex fsanitize=shift - This option enables checking that the result of a shift operation is not undefined. Note that what exactly is considered undefined differs slightly between C and C++, as well as between ISO C90 and C99, etc. @item -fsanitize=integer-divide-by-zero @opindex fsanitize=integer-divide-by-zero - Detect integer division by zero as well as @code{INT_MIN / -1} division. @item -fsanitize=unreachable @opindex fsanitize=unreachable - With this option, the compiler will turn the @code{__builtin_unreachable} call into a diagnostics message call instead. When reaching the @code{__builtin_unreachable} call, the behavior is undefined. 
@item -fsanitize=vla-bound @opindex fsanitize=vla-bound - This option instructs the compiler to check that the size of a variable length array is positive. This option does not have any effect in @option{-std=c++1y} mode, as the standard requires the exception be thrown @@ -5427,7 +5423,6 @@ instead. @item -fsanitize=null @opindex fsanitize=null - This option enables pointer checking. Particularly, the application built with this option turned on will issue an error message when it tries to dereference a NULL pointer, or if a reference (possibly an @@ -5435,7 +5430,6 @@ rvalue reference) is bound to a NULL pointer. @item -fsanitize=return @opindex fsanitize=return - This option enables return statement checking. Programs built with this option turned on will issue an error message when the end of a non-void function is reached without actually @@ -5443,7 +5437,6 @@ returning a value. This option works in C++ only. @item -fsanitize=signed-integer-overflow @opindex fsanitize=signed-integer-overflow - This option enables signed integer overflow checking. We check that the result of @code{+}, @code{*}, and both unary and binary @code{-} does not overflow in the signed arithmetics. Note, integer promotion @@ -5456,20 +5449,12 @@ a++; @item -fsanitize=bounds @opindex fsanitize=bounds - This option enables instrumentation of array bounds. Various out of bounds -accesses are detected. Flexible array members are not instrumented, as well -as initializers of variables with static storage. - -@end table - -While @option{-ftrapv} causes traps for signed overflows to be emitted, -@option{-fsanitize=undefined} gives a diagnostic message. -This currently works only for the C family of languages. +accesses are detected. Flexible array members and initializers of variables +with static storage are not instrumented. @item -fsanitize=float-divide-by-zero @opindex fsanitize=float-divide-by-zero - Detect floating-point division by zero. 
Unlike other similar options, @option{-fsanitize=float-divide-by-zero} is not enabled by @option{-fsanitize=undefined}, since floating-point division by zero can @@ -5477,11 +5462,16 @@ be a legitimate way of obtaining infinities and NaNs. @item -fsanitize=float-cast-overflow @opindex fsanitize=float-cast-overflow - This option enables floating-point type to integer conversion checking. We check that the result of the conversion does not overflow. This option does not work well with @code{FE_INVALID} exceptions enabled. +@end table + +While @option{-ftrapv} causes traps for signed overflows to be emitted, +@option{-fsanitize=undefined} gives a diagnostic message. +This currently works only for the C family of languages. + @item -fsanitize-recover @opindex fsanitize-recover By default @option{-fsanitize=undefined} sanitization (and its suboptions Marek
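The documentation text above notes that floating-point division by zero can be a legitimate way of obtaining infinities. A small C sketch of that behaviour follows; it is not part of the patch, the function name is hypothetical, and it assumes an IEEE 754 target where the division is performed at run time without -fsanitize=float-divide-by-zero (with that option a diagnostic would be expected instead).

```c
#include <float.h>

/* Hypothetical probe: divide 1.0 by 0.0 at run time.  On an IEEE 754
   target the quotient is +infinity, which compares greater than the
   largest finite double.  The volatile qualifiers keep the compiler
   from evaluating the division at compile time.  */
static int
division_by_zero_gives_infinity (void)
{
  volatile double one = 1.0;
  volatile double zero = 0.0;
  double q = one / zero;

  return q > DBL_MAX;
}
```

This is the reason the option is kept separate from -fsanitize=undefined: code of this kind is often intentional.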
Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
2014-07-02 21:03 GMT+04:00 Uros Bizjak ubiz...@gmail.com:
Hello!

Silvermont processors have penalty for instructions having 4+ bytes of prefixes (including escape bytes in opcode). This situation happens when REX prefix is used in SSE4 instructions. This patch tries to avoid such situation by preferring xmm0-xmm7 usage over xmm8-xmm15 in those instructions. I achieved it by adding new tuning flag and new alternatives affected by tuning. SSE4 instructions are not very widely used by GCC but I see some significant gains caused by this patch (tested on Avoton on -O3).

2014-07-02  Ilya Enkovich  ilya.enkov...@intel.com

	* config/i386/constraints.md (Yr): New.
	* config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS.
	(REG_CLASS_NAMES): Likewise.
	(REG_CLASS_CONTENTS): Likewise.
	* config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives
	which use only NO_REX_SSE_REGS.

You don't need to add alternatives, just change existing alternatives from x to Yr. The allocator will handle reduced register set just fine.

Hi,

Thanks for the review! My first patch version did such a replacement. Performance results were OK but I ran into stability issues due to the peephole2 pass. Peepholes may exchange operands of instructions and ignore register restrictions, assuming all SSE registers are homogeneous. That caused unrecognized instructions on some tests. I preferred to add a new alternative instead of fixing the peepholes and possibly other similar problems.

BTW: I think that Yr is a very confusing name for the alternative. I'd suggest Ya.

Will rename.

Ilya

Uros.
Re: [PATCH] Implement -fsanitize=bounds and internal calls in FEs
On Thu, Jul 03, 2014 at 12:41:46PM +0200, Marek Polacek wrote: On Sat, Jun 28, 2014 at 06:52:00PM +0200, Gerald Pfeifer wrote: On Fri, 20 Jun 2014, Marek Polacek wrote: +@item -fsanitize=bounds +@opindex fsanitize=bounds + +This option enables instrumentation of array bounds. Various out of bounds +accesses are detected. Flexible array members are not instrumented, as well +as initializers of variables with static storage. Can you make this Flexible array members and initializers... (or ...as well as...)? The current wording confused me a bit at first. And I believe there should be no empty line after @opindex. Thanks, I'll fix both with the following. Also -fsanitize=float-divide-by-zero and -fsanitize=float-cast-overflow descriptions were at a wrong place, so moved a little bit above. Applying to trunk as obvious. 2014-07-03 Marek Polacek pola...@redhat.com * doc/invoke.texi (-fsanitize=bounds): Tweak wording. (-fsanitize=float-divide-by-zero): Move to the table with -fsanitize=undefined suboptions. (-fsanitize=float-cast-overflow): Likewise. Those two aren't -fsanitize=undefined suboptions, so shouldn't be included in there. Jakub
Re: [PATCH] Memory leak in parallel/unique_copy
On 02/07/14 22:42 +0100, Goncalo Carvalho wrote:
Hi, In parallel/unique_copy.h __counter is never deleted. I'm also trying to follow from other posts how to submit a patch but it is well possible I missed some of the conventions. Many apologies if that's the case.

Thanks for this, it looks correct. (I thought I remembered finding something similar in another parallel header, but don't see anything in the ChangeLog.) Do you have a testcase to reproduce the leak that we could add to the testsuite? Or even just to run once with valgrind and verify it's fixed (I tried a trivial test and didn't see a leak).

libstdc++-v3/
	* include/parallel/unique_copy.h: prevent memory leak of __counter

Index: libstdc++-v3/include/parallel/unique_copy.h
===================================================================
--- libstdc++-v3/include/parallel/unique_copy.h (revision 212239)
+++ libstdc++-v3/include/parallel/unique_copy.h (working copy)
@@ -171,6 +171,7 @@
       for (_ThreadIndex __t = 0; __t < __num_threads + 1; __t++)
         __end_output += __counter[__t];

+      delete[] __counter;
       delete[] __borders;

       return __result + __end_output;
Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
2014-07-02 20:21 GMT+04:00 Andi Kleen a...@firstfloor.org:
Ilya Enkovich enkovich@gmail.com writes:
Silvermont processors have penalty for instructions having 4+ bytes of prefixes (including escape bytes in opcode). This situation happens when REX prefix is used in SSE4 instructions. This patch tries to avoid such situation by preferring xmm0-xmm7 usage over xmm8-xmm15 in those instructions. I achieved it by adding new tuning flag and new alternatives affected by tuning.

Why make it a tuning flag? Shouldn't this help unconditionally for code size everywhere? Or is there some drawback?

There is already a higher priority for registers not requiring REX. My patch affects cases when the compiler has to use xmm8-15, and it just tries to tell LRA to assign them to non-SSE4 instructions. I doubt it would be of use for targets other than Silvermont.

Ilya

-Andi -- a...@linux.intel.com -- Speaking for myself only
Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
2014-07-02 20:27 GMT+04:00 Jakub Jelinek ja...@redhat.com: On Wed, Jul 02, 2014 at 09:21:25AM -0700, Andi Kleen wrote: Ilya Enkovich enkovich@gmail.com writes: Silvermont processors have penalty for instructions having 4+ bytes of prefixes (including escape bytes in opcode). This situation happens when REX prefix is used in SSE4 instructions. This patch tries to avoid such situation by preferring xmm0-xmm7 usage over xmm8-xmm15 in those instructions. I achieved it by adding new tuning flag and new alternatives affected by tuning. Why make it a tuning flag? Shouldn't this help unconditionally for code size everywhere? Or is there some drawback? I don't think it will make code smaller, if you already have some value in xmm8..xmm15 register, then by not allowing those registers directly on SSE4 insns just means it reloading and larger code. BTW, is that change needed also when emitting AVX insns instead of SSE4? This is for Silvermont only. It does not have AVX. Ilya Jakub
Re: gfortran-dg-runtest, torture options
Hi!

On Tue, 13 Aug 2013 13:06:30 +0200, I wrote:
I noticed something strange in the libgomp testresults (but not necessarily specific to libgomp): an arbitrary set of the Fortran execution tests are run just for -O, and others for each of the full set of torture options: -O0, -O1, -O2, and so on. After some time I realized it's the set of tests that contain an explicit »dg-do run« directive that are run for all torture levels, and the tests that inherit the default »set dg-do-what-default run« from libgomp/testsuite/lib/libgomp.exp are only run for -O. This is coming from the special handling in gcc/testsuite/lib/gfortran-dg.exp:gfortran-dg-test (which seems to be present approximately forever). Should this consider the dg-do-what-default case, too? Why is torture testing done only for execution tests? And, why only for Fortran? Is this behavior generally intentional -- of course, bigger testing coverage is nice, but this seems a bit arbitrary to me?

Thanks Janis and Mikael for your replies (nearly a year ago...), but still my questions remain to be answered: in my understanding, the libgomp testsuite is not the place for compiler torture testing (different optimization flags and all that -- and, that is done for Fortran only; gfortran-dg-runtest), but rather, I understand the libgomp testsuite to be the place for libgomp library testing ;-), and hence I propose to remove that special casing of Fortran test cases:

--- libgomp/testsuite/lib/libgomp-dg.exp
+++ libgomp/testsuite/lib/libgomp-dg.exp
@@ -5,3 +5,19 @@ proc libgomp-dg-test { prog do_what extra_tool_flags } {
 proc libgomp-dg-prune { system text } {
     return [gcc-dg-prune $system $text]
 }
+
+# Modified dg-runtest that deal with Fortran modules cleanup.
+proc dg-runtest-fortran { testcases flags default-extra-flags } {
+    global runtests
+
+    foreach testcase $testcases {
+	# If we're only testing specific files and this isn't one of them, skip it.
+	if {![runtest_file_p $runtests $testcase]} {
+	    continue
+	}
+	verbose "Testing [file tail [file dirname $testcase]]/[file tail $testcase]"
+	list-module-names $testcase
+	dg-test $testcase $flags ${default-extra-flags}
+	cleanup-modules
+    }
+}
--- libgomp/testsuite/libgomp.fortran/fortran.exp
+++ libgomp/testsuite/libgomp.fortran/fortran.exp
@@ -11,6 +11,10 @@ set lang_link_flags "-lgfortran"
 set lang_test_file_found 0
 set quadmath_library_path "../libquadmath/.libs"

+# If a testcase doesn't have special options, use these.
+if ![info exists DEFAULT_CFLAGS] then {
+    set DEFAULT_CFLAGS "-O2"
+}

 # Initialize dg.
 dg-init
@@ -60,7 +64,7 @@ if { $lang_test_file_found } {
     set_ld_library_path_env_vars

     # Main loop.
-    gfortran-dg-runtest $tests
+    dg-runtest-fortran $tests "" $DEFAULT_CFLAGS
 }

 # All done.

A follow-up patch could then be to remove all the redundant »dg-do run« directives from the individual test cases, as that's the default set in libgomp/testsuite/lib/libgomp.exp: »set dg-do-what-default run«.

Grüße, Thomas
Re: [PATCH] Implement -fsanitize=bounds and internal calls in FEs
On Thu, Jul 03, 2014 at 12:46:35PM +0200, Jakub Jelinek wrote: On Thu, Jul 03, 2014 at 12:41:46PM +0200, Marek Polacek wrote: On Sat, Jun 28, 2014 at 06:52:00PM +0200, Gerald Pfeifer wrote: On Fri, 20 Jun 2014, Marek Polacek wrote: +@item -fsanitize=bounds +@opindex fsanitize=bounds + +This option enables instrumentation of array bounds. Various out of bounds +accesses are detected. Flexible array members are not instrumented, as well +as initializers of variables with static storage. Can you make this Flexible array members and initializers... (or ...as well as...)? The current wording confused me a bit at first. And I believe there should be no empty line after @opindex. Thanks, I'll fix both with the following. Also -fsanitize=float-divide-by-zero and -fsanitize=float-cast-overflow descriptions were at a wrong place, so moved a little bit above. Applying to trunk as obvious. 2014-07-03 Marek Polacek pola...@redhat.com * doc/invoke.texi (-fsanitize=bounds): Tweak wording. (-fsanitize=float-divide-by-zero): Move to the table with -fsanitize=undefined suboptions. (-fsanitize=float-cast-overflow): Likewise. Those two aren't -fsanitize=undefined suboptions, so shouldn't be included in there. But they're parts of ubsan and at least -fsanitize=float-divide-by-zero says it is not enabled by -fsanitize=undefined. Dunno, I can move it back if you want. Marek
Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
On Thu, Jul 03, 2014 at 02:49:10PM +0400, Ilya Enkovich wrote: 2014-07-02 20:21 GMT+04:00 Andi Kleen a...@firstfloor.org: Ilya Enkovich enkovich@gmail.com writes: Silvermont processors have penalty for instructions having 4+ bytes of prefixes (including escape bytes in opcode). This situation happens when REX prefix is used in SSE4 instructions. This patch tries to avoid such situation by preferring xmm0-xmm7 usage over xmm8-xmm15 in those instructions. I achieved it by adding new tuning flag and new alternatives affected by tuning. Why make it a tuning flag? Shouldn't this help unconditionally for code size everywhere? Or is there some drawback? There is already a higher priority for registers not requiring REX. My patch affects cases when compiler has to use xmm8-15 and it just tries to say LRA to assign them for non SSE4 instructions. I doubt it would have some use for other targets than Silvermont. When it is just a hint, shouldn't there be something like Ya,???x or Ya,!x or similar in the SSE4 constraints? I mean, xmm{8-15} can be used, just is costly. Jakub
Re: gfortran-dg-runtest, torture options
On Thu, Jul 03, 2014 at 12:54:32PM +0200, Thomas Schwinge wrote: Thanks Janis and Mikael for your replies (nearly a year ago...), but still my questions remain to be answered: in my understanding, the libgomp testsuite is not the place for compiler torture testing (different optimization flags and all that -- and, that is done for Fortran only; gfortran-dg-runtest), but rather, I understand the libgomp testsuite to be the place for libgomp library testing ;-), and hence I propose to remove that special casing of Fortran test cases: No, it is intentional that we torture test those, libgomp is the place for all OpenMP runtime tests, not just for library testing. Jakub
Re: gfortran-dg-runtest, torture options
Hi! On Thu, 3 Jul 2014 12:58:32 +0200, Jakub Jelinek ja...@redhat.com wrote: On Thu, Jul 03, 2014 at 12:54:32PM +0200, Thomas Schwinge wrote: Thanks Janis and Mikael for your replies (nearly a year ago...), but still my questions remain to be answered: in my understanding, the libgomp testsuite is not the place for compiler torture testing (different optimization flags and all that -- and, that is done for Fortran only; gfortran-dg-runtest), but rather, I understand the libgomp testsuite to be the place for libgomp library testing ;-), and hence I propose to remove that special casing of Fortran test cases: No, it is intentional that we torture test those, libgomp is the place for all OpenMP runtime tests, not just for library testing. But then, the obvious question: why for Fortran only, but not for C and C++? Grüße, Thomas
Re: [Patch ARM-AArch64/testsuite v2 02/21] Add unary operators: vabs and vneg.
On Tue, Jul 1, 2014 at 11:05 AM, Christophe Lyon christophe.l...@linaro.org wrote: diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 3a0f99b..44c4990 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,5 +1,11 @@ 2014-06-30 Christophe Lyon christophe.l...@linaro.org + * gcc.target/aarch64/neon-intrinsics/unary_op.inc: New file. + * gcc.target/aarch64/neon-intrinsics/vabs.c: Likewise. + * gcc.target/aarch64/neon-intrinsics/vneg.c: Likewise. + +2014-06-30 Christophe Lyon christophe.l...@linaro.org + * gcc.target/arm/README.neon-intrinsics: New file. * gcc.target/aarch64/neon-intrinsics/README: Likewise. * gcc.target/aarch64/neon-intrinsics/arm-neon-ref.h: Likewise. Ok for ARM if no regressions. Wait for an ack from AArch64 maintainers. Ramana diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/unary_op.inc b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/unary_op.inc new file mode 100644 index 000..33f9b5f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/unary_op.inc @@ -0,0 +1,72 @@ +/* Template file for unary operator validation. + + This file is meant to be included by the relevant test files, which + have to define the intrinsic family to test. If a given intrinsic + supports variants which are not supported by all the other unary + operators, these can be tested by providing a definition for + EXTRA_TESTS. */ + +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +#define FNNAME1(NAME) exec_ ## NAME +#define FNNAME(NAME) FNNAME1(NAME) + +void FNNAME (INSN_NAME) (void) +{ + /* Basic test: y=OP(x), then store the result. 
*/ +#define TEST_UNARY_OP1(INSN, Q, T1, T2, W, N) \ + VECT_VAR(vector_res, T1, W, N) = \ +INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N)); \ + vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N)) + +#define TEST_UNARY_OP(INSN, Q, T1, T2, W, N) \ + TEST_UNARY_OP1(INSN, Q, T1, T2, W, N) \ + + /* No need for 64 bits variants in the general case. */ + DECL_VARIABLE(vector, int, 8, 8); + DECL_VARIABLE(vector, int, 16, 4); + DECL_VARIABLE(vector, int, 32, 2); + DECL_VARIABLE(vector, int, 8, 16); + DECL_VARIABLE(vector, int, 16, 8); + DECL_VARIABLE(vector, int, 32, 4); + + DECL_VARIABLE(vector_res, int, 8, 8); + DECL_VARIABLE(vector_res, int, 16, 4); + DECL_VARIABLE(vector_res, int, 32, 2); + DECL_VARIABLE(vector_res, int, 8, 16); + DECL_VARIABLE(vector_res, int, 16, 8); + DECL_VARIABLE(vector_res, int, 32, 4); + + clean_results (); + + /* Initialize input vector from buffer. */ + VLOAD(vector, buffer, , int, s, 8, 8); + VLOAD(vector, buffer, , int, s, 16, 4); + VLOAD(vector, buffer, , int, s, 32, 2); + VLOAD(vector, buffer, q, int, s, 8, 16); + VLOAD(vector, buffer, q, int, s, 16, 8); + VLOAD(vector, buffer, q, int, s, 32, 4); + + /* Apply a unary operator named INSN_NAME. 
*/ + TEST_UNARY_OP(INSN_NAME, , int, s, 8, 8); + TEST_UNARY_OP(INSN_NAME, , int, s, 16, 4); + TEST_UNARY_OP(INSN_NAME, , int, s, 32, 2); + TEST_UNARY_OP(INSN_NAME, q, int, s, 8, 16); + TEST_UNARY_OP(INSN_NAME, q, int, s, 16, 8); + TEST_UNARY_OP(INSN_NAME, q, int, s, 32, 4); + + CHECK_RESULTS (TEST_MSG, ); + +#ifdef EXTRA_TESTS + EXTRA_TESTS(); +#endif +} + +int main (void) +{ + FNNAME (INSN_NAME)(); + + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vabs.c b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vabs.c new file mode 100644 index 000..ca3901a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vabs.c @@ -0,0 +1,74 @@ +#define INSN_NAME vabs +#define TEST_MSG VABS/VABSQ + +/* Extra tests for functions requiring floating-point types. */ +void exec_vabs_f32(void); +#define EXTRA_TESTS exec_vabs_f32 + +#include unary_op.inc + +/* Expected results. */ +VECT_VAR_DECL(expected,int,8,8) [] = { 0x10, 0xf, 0xe, 0xd, + 0xc, 0xb, 0xa, 0x9 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0x10, 0xf, 0xe, 0xd }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0x10, 0xf }; +VECT_VAR_DECL(expected,int,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected,uint,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x,
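Note: the template above builds each intrinsic name by token pasting (INSN##Q##_##T2##W). A small sketch of that mechanism follows; INTRINSIC_NAME, STRINGIFY and the helper functions are hypothetical names introduced only for illustration, and the sketch prints the pasted identifier instead of calling a NEON intrinsic, so it builds on any target.

```cpp
#include <cassert>
#include <string>

// Two-level stringification so the argument is macro-expanded first.
#define STRINGIFY1(x) #x
#define STRINGIFY(x) STRINGIFY1(x)

// Same pasting pattern as INSN##Q##_##T2##W in the template file.
// An empty Q selects the 64-bit form, Q = q the 128-bit form.
#define INTRINSIC_NAME(INSN, Q, T2, W) INSN##Q##_##T2##W

// Name built for the 64-bit (D register) variant.
inline std::string d_form_name() { return STRINGIFY(INTRINSIC_NAME(vabs, , s, 8)); }

// Name built for the 128-bit (Q register) variant.
inline std::string q_form_name() { return STRINGIFY(INTRINSIC_NAME(vabs, q, s, 8)); }

// Small self-check showing the two names side by side.
inline void check_names() {
    assert(d_form_name() == "vabs_s8");
    assert(q_form_name() == "vabsq_s8");
}
```

This is why a test file only has to define INSN_NAME (for example vabs or vneg): the same macro invocations then cover every width and both register forms.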
Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
2014-07-03 14:56 GMT+04:00 Jakub Jelinek ja...@redhat.com:
On Thu, Jul 03, 2014 at 02:49:10PM +0400, Ilya Enkovich wrote:
2014-07-02 20:21 GMT+04:00 Andi Kleen a...@firstfloor.org:
Ilya Enkovich enkovich@gmail.com writes:

Silvermont processors have penalty for instructions having 4+ bytes of prefixes (including escape bytes in opcode). This situation happens when REX prefix is used in SSE4 instructions. This patch tries to avoid such situation by preferring xmm0-xmm7 usage over xmm8-xmm15 in those instructions. I achieved it by adding new tuning flag and new alternatives affected by tuning.

Why make it a tuning flag? Shouldn't this help unconditionally for code size everywhere? Or is there some drawback?

There is already a higher priority for registers not requiring REX. My patch affects cases when compiler has to use xmm8-15 and it just tries to say LRA to assign them for non SSE4 instructions. I doubt it would have some use for other targets than Silvermont.

When it is just a hint, shouldn't there be something like Ya,???x or Ya,!x or similar in the SSE4 constraints? I mean, xmm{8-15} can be used, just is costly.

I made it Ya,?x. I do not know how many '?' would be reasonable here. One was enough to get gain on tests where I expected it.

Ilya

Jakub
Re: [Patch ARM-AArch64/testsuite v2 03/21] Add binary operators: vadd, vand, vbic, veor, vorn, vorr, vsub.
On Tue, Jul 1, 2014 at 11:05 AM, Christophe Lyon christophe.l...@linaro.org wrote: diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 44c4990..73709c6 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,5 +1,16 @@ 2014-06-30 Christophe Lyon christophe.l...@linaro.org + * gcc.target/aarch64/neon-intrinsics/binary_op.inc: New file. + * gcc.target/aarch64/neon-intrinsics/vadd.c: Likewise. + * gcc.target/aarch64/neon-intrinsics/vand.c: Likewise. + * gcc.target/aarch64/neon-intrinsics/vbic.c: Likewise. + * gcc.target/aarch64/neon-intrinsics/veor.c: Likewise. + * gcc.target/aarch64/neon-intrinsics/vorn.c: Likewise. + * gcc.target/aarch64/neon-intrinsics/vorr.c: Likewise. + * gcc.target/aarch64/neon-intrinsics/vsub.c: Likewise. + Ok for the ARM backend. Wait for an ack from an AArch64 maintainer. Ramana +2014-06-30 Christophe Lyon christophe.l...@linaro.org + * gcc.target/aarch64/neon-intrinsics/unary_op.inc: New file. * gcc.target/aarch64/neon-intrinsics/vabs.c: Likewise. * gcc.target/aarch64/neon-intrinsics/vneg.c: Likewise. diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/binary_op.inc b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/binary_op.inc new file mode 100644 index 000..3483e0e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/binary_op.inc @@ -0,0 +1,70 @@ +/* Template file for binary operator validation. + + This file is meant to be included by the relevant test files, which + have to define the intrinsic family to test. If a given intrinsic + supports variants which are not supported by all the other binary + operators, these can be tested by providing a definition for + EXTRA_TESTS. */ + +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +#define FNNAME1(NAME) exec_ ## NAME +#define FNNAME(NAME) FNNAME1(NAME) + +void FNNAME (INSN_NAME) (void) +{ + /* Basic test: y=OP(x1,x2), then store the result. 
*/ +#define TEST_BINARY_OP1(INSN, Q, T1, T2, W, N) \ + VECT_VAR(vector_res, T1, W, N) = \ +INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N), \ + VECT_VAR(vector2, T1, W, N)); \ + vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N)) + +#define TEST_BINARY_OP(INSN, Q, T1, T2, W, N) \ + TEST_BINARY_OP1(INSN, Q, T1, T2, W, N) \ + + DECL_VARIABLE_ALL_VARIANTS(vector); + DECL_VARIABLE_ALL_VARIANTS(vector2); + DECL_VARIABLE_ALL_VARIANTS(vector_res); + + clean_results (); + + /* Initialize input vector from buffer. */ + TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer); + + /* Fill input vector2 with arbitrary values. */ + VDUP(vector2, , int, s, 8, 8, 2); + VDUP(vector2, , int, s, 16, 4, -4); + VDUP(vector2, , int, s, 32, 2, 3); + VDUP(vector2, , int, s, 64, 1, 100); + VDUP(vector2, , uint, u, 8, 8, 20); + VDUP(vector2, , uint, u, 16, 4, 30); + VDUP(vector2, , uint, u, 32, 2, 40); + VDUP(vector2, , uint, u, 64, 1, 2); + VDUP(vector2, q, int, s, 8, 16, -10); + VDUP(vector2, q, int, s, 16, 8, -20); + VDUP(vector2, q, int, s, 32, 4, -30); + VDUP(vector2, q, int, s, 64, 2, 24); + VDUP(vector2, q, uint, u, 8, 16, 12); + VDUP(vector2, q, uint, u, 16, 8, 3); + VDUP(vector2, q, uint, u, 32, 4, 55); + VDUP(vector2, q, uint, u, 64, 2, 3); + + /* Apply a binary operator named INSN_NAME. */ + TEST_MACRO_ALL_VARIANTS_1_5(TEST_BINARY_OP, INSN_NAME); + + CHECK_RESULTS (TEST_MSG, ); + +#ifdef EXTRA_TESTS + EXTRA_TESTS(); +#endif +} + +int main (void) +{ + FNNAME (INSN_NAME) (); + + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vadd.c b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vadd.c new file mode 100644 index 000..f08c620 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vadd.c @@ -0,0 +1,81 @@ +#define INSN_NAME vadd +#define TEST_MSG VADD/VADDQ + +/* Extra tests for functions requiring floating-point types. 
*/ +void exec_vadd_f32(void); +#define EXTRA_TESTS exec_vadd_f32 + +#include binary_op.inc + +/* Expected results. */ +VECT_VAR_DECL(expected,int,8,8) [] = { 0xf2, 0xf3, 0xf4, 0xf5, + 0xf6, 0xf7, 0xf8, 0xf9 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0xffec, 0xffed, 0xffee, 0xffef }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0xfff3, 0xfff4 }; +VECT_VAR_DECL(expected,int,64,1) [] = { 0x54 }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0x4, 0x5, 0x6, 0x7, + 0x8, 0x9, 0xa, 0xb }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0xe, 0xf, 0x10, 0x11 }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0x18, 0x19 };
Re: gfortran-dg-runtest, torture options
On Thu, Jul 03, 2014 at 01:06:48PM +0200, Thomas Schwinge wrote:
On Thu, 3 Jul 2014 12:58:32 +0200, Jakub Jelinek ja...@redhat.com wrote:
On Thu, Jul 03, 2014 at 12:54:32PM +0200, Thomas Schwinge wrote:

Thanks Janis and Mikael for your replies (nearly a year ago...), but still my questions remain to be answered: in my understanding, the libgomp testsuite is not the place for compiler torture testing (different optimization flags and all that -- and, that is done for Fortran only; gfortran-dg-runtest), but rather, I understand the libgomp testsuite to be the place for libgomp library testing ;-), and hence I propose to remove that special casing of Fortran test cases:

No, it is intentional that we torture test those, libgomp is the place for all OpenMP runtime tests, not just for library testing.

But then, the obvious question: why for Fortran only, but not for C and C++?

Fortran has far more tests with arrays etc. that testing just -O0 or -O2 is insufficient, that is typically not the case for C/C++.

Jakub
Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
On Thu, Jul 3, 2014 at 12:45 PM, Ilya Enkovich enkovich@gmail.com wrote:

Silvermont processors have penalty for instructions having 4+ bytes of prefixes (including escape bytes in opcode). This situation happens when REX prefix is used in SSE4 instructions. This patch tries to avoid such situation by preferring xmm0-xmm7 usage over xmm8-xmm15 in those instructions. I achieved it by adding new tuning flag and new alternatives affected by tuning. SSE4 instructions are not very widely used by GCC but I see some significant gains caused by this patch (tested on Avoton on -O3).

2014-07-02 Ilya Enkovich ilya.enkov...@intel.com

* config/i386/constraints.md (Yr): New.
* config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
* config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives which use only NO_REX_SSE_REGS.

You don't need to add alternatives, just change existing alternatives from x to Yr. The allocator will handle reduced register set just fine.

Hi, Thanks for review! My first patch version did such replacement. Performance results were OK but I got into stability issues due to peephole2 pass. Peepholes may exchange operands of instructions and ignore register restrictions assuming all SSE registers are homogeneous. It caused unrecognized instructions on some tests. I preferred to add a new alternative instead of fixing peephole and possibly other similar problems.

No, please rather fix the peephole2 patterns. It is just a matter of putting satisfies_constraint_Xx to their insn condition. In effect, peephole2 pass is nullifying your optimization. Also, RA is still free to allocate unwanted registers, even when prefixed with ?.

Uros.
Re: [PATCH] Add guality [p]type test.
On Thu, 2014-07-03 at 12:05 +0200, Jakub Jelinek wrote:
On Thu, Jul 03, 2014 at 11:53:29AM +0200, Mark Wielaard wrote:

I pulled out the guality.exp [p]type test extension from the actual dwarf2out.c changes (which I will repost soon with some tweaks). I think the test extension itself is useful on its own (and will use it to add tests for my new patches). All new tests PASS, except when using -flto, so you'll need the "Don't run guality.exp tests with LTO_TORTURE_OPTIONS" patch if you don't want to add new FAILs. But I hope this patch can go in even without that because I do think it is useful on its own.

Is what gdb prints for ptype stable across different gdb versions (except for whitespace that you canonicalize)? If yes, this looks good to me.

Yes, I believe it is (I tested against gdb git master and gdb 7.6.50). It tries to print the expression as a canonical C type, so it should be stable. GDB itself contains similar tests, but for pregenerated .S files or synthetic generated DWARF. This just extends it to make sure gcc and gdb agree on the produced/consumed debuginfo.

Cheers,
Mark
Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
Hello!

There is already a higher priority for registers not requiring REX. My patch affects cases when compiler has to use xmm8-15 and it just tries to say LRA to assign them for non SSE4 instructions. I doubt it would have some use for other targets than Silvermont.

When it is just a hint, shouldn't there be something like Ya,???x or Ya,!x or similar in the SSE4 constraints? I mean, xmm{8-15} can be used, just is costly.

Maybe we can use Ya*x, similar to *pushdf pattern, where it is costly - but tolerable - to push DFmode value through integer registers.

Oh, and I didn't notice that Ya name is already taken...

Uros.
Re: [Patch, libgomp] libgomp.fortran/fortran.exp - add -fintrinsic-modules-path ${blddir}
On Thu, Jul 03, 2014 at 12:19:15PM +0200, Thomas Schwinge wrote: I found the following to work (but so far only did libgomp testing), but that is a little bit more intrusive, but may actually be the right thing to do. (Possibly also in additional places where ${tool}_target_compile is used? CCing testsuite maintainers.) Comments? What about this instead, pass it only for Fortran tests and nothing else? Only very lightly tested so far. 2014-07-03 Jakub Jelinek ja...@redhat.com * testsuite/lib/libgomp.exp (libgomp_target_compile): If $source matches regex $lang_source_re, add $lang_include_flags to options. * testsuite/libgomp.c/c.exp: Unset lang_include_flags. * testsuite/libgomp.c++/c++.exp: Likewise. * testsuite/libgomp.fortran/fortran.exp: Likewise. Set lang_source_re and lang_include_flags instead of adding -fintrinsic-modules-path= to ALWAYS_CFLAGS. * testsuite/libgomp.graphite/graphite.exp: Unset lang_include_flags. --- libgomp/testsuite/lib/libgomp.exp.jj2013-11-12 11:30:59.0 +0100 +++ libgomp/testsuite/lib/libgomp.exp 2014-07-03 13:24:31.951953289 +0200 @@ -184,6 +184,8 @@ proc libgomp_target_compile { source des global lang_test_file global lang_library_path global lang_link_flags +global lang_include_flags +global lang_source_re if { [info exists lang_test_file] } { if { $blddir != } { @@ -193,6 +195,10 @@ proc libgomp_target_compile { source des lappend options ldflags=-L${blddir}/${lang_library_path} } lappend options ldflags=${lang_link_flags} + if { [info exists lang_include_flags] \ + [regexp ${lang_source_re} ${source}] } { + lappend options additional_flags=${lang_include_flags} + } } if { [target_info needs_status_wrapper] != [info exists gluefile] } { --- libgomp/testsuite/libgomp.c/c.exp.jj2013-11-12 11:30:59.0 +0100 +++ libgomp/testsuite/libgomp.c/c.exp 2014-07-03 12:43:08.091889315 +0200 @@ -5,6 +5,9 @@ if [info exists lang_library_path] then if [info exists lang_test_file] then { unset lang_test_file } +if [info exists lang_include_flags] then 
{ +unset lang_include_flags +} load_lib libgomp-dg.exp load_gcc_lib gcc-dg.exp --- libgomp/testsuite/libgomp.c++/c++.exp.jj2013-11-12 11:30:59.0 +0100 +++ libgomp/testsuite/libgomp.c++/c++.exp 2014-07-03 12:43:23.488808394 +0200 @@ -7,6 +7,9 @@ set shlib_ext [get_shlib_extension] set lang_link_flags -lstdc++ set lang_test_file_found 0 set lang_library_path ../libstdc++-v3/src/.libs +if [info exists lang_include_flags] then { +unset lang_include_flags +} # Initialize dg. dg-init --- libgomp/testsuite/libgomp.fortran/fortran.exp.jj2013-11-12 11:30:59.0 +0100 +++ libgomp/testsuite/libgomp.fortran/fortran.exp 2014-07-03 13:23:26.074295258 +0200 @@ -8,6 +8,9 @@ global ALWAYS_CFLAGS set shlib_ext [get_shlib_extension] set lang_library_path ../libgfortran/.libs set lang_link_flags-lgfortran +if [info exists lang_include_flags] then { +unset lang_include_flags +} set lang_test_file_found 0 set quadmath_library_path ../libquadmath/.libs @@ -19,7 +22,8 @@ dg-init lappend ALWAYS_CFLAGS additional_flags=-fopenmp if { $blddir != } { -lappend ALWAYS_CFLAGS additional_flags=-fintrinsic-modules-path=${blddir} +set lang_source_re {^.*\.[fF](|90|95|03|08)$} +set lang_include_flags -fintrinsic-modules-path=${blddir} # Look for a static libgfortran first. if [file exists ${blddir}/${lang_library_path}/libgfortran.a] { set lang_test_file ${lang_library_path}/libgfortran.a --- libgomp/testsuite/libgomp.graphite/graphite.exp.jj 2014-01-03 11:41:28.0 +0100 +++ libgomp/testsuite/libgomp.graphite/graphite.exp 2014-07-03 12:42:59.942930755 +0200 @@ -21,6 +21,9 @@ if [info exists lang_library_path] then if [info exists lang_test_file] then { unset lang_test_file } +if [info exists lang_include_flags] then { +unset lang_include_flags +} load_lib libgomp-dg.exp load_gcc_lib gcc-dg.exp Jakub
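Note: the idea of the patch above (add the include flag only when the source file name looks like Fortran) can be sketched outside Tcl as follows. This is a hedged illustration, not the testsuite code: is_fortran_source and extra_flags_for are hypothetical helpers, and the optional group (90|95|03|08)? is used as an assumed equivalent of the empty alternative in the Tcl pattern.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when `source` ends in .f or .F, optionally followed by 90, 95, 03
// or 08, mirroring the intent of lang_source_re in the patch.
inline bool is_fortran_source(const std::string& source) {
    static const std::regex re(R"(^.*\.[fF](90|95|03|08)?$)");
    return std::regex_match(source, re);
}

// Hypothetical helper: hand back the extra flag only for Fortran sources,
// so C and C++ compilations never see it.
inline std::string extra_flags_for(const std::string& source,
                                   const std::string& lang_include_flags) {
    return is_fortran_source(source) ? lang_include_flags : std::string();
}
```

For example, a file named like foo.f90 would receive the flag while a C helper such as foo.c compiled by the same .exp file would not, which is the behaviour the ChangeLog entry for libgomp_target_compile describes.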
Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
2014-07-03 15:11 GMT+04:00 Uros Bizjak ubiz...@gmail.com:
On Thu, Jul 3, 2014 at 12:45 PM, Ilya Enkovich enkovich@gmail.com wrote:

Silvermont processors have penalty for instructions having 4+ bytes of prefixes (including escape bytes in opcode). This situation happens when REX prefix is used in SSE4 instructions. This patch tries to avoid such situation by preferring xmm0-xmm7 usage over xmm8-xmm15 in those instructions. I achieved it by adding new tuning flag and new alternatives affected by tuning. SSE4 instructions are not very widely used by GCC but I see some significant gains caused by this patch (tested on Avoton on -O3).

2014-07-02 Ilya Enkovich ilya.enkov...@intel.com

* config/i386/constraints.md (Yr): New.
* config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
* config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives which use only NO_REX_SSE_REGS.

You don't need to add alternatives, just change existing alternatives from x to Yr. The allocator will handle reduced register set just fine.

Hi, Thanks for review! My first patch version did such replacement. Performance results were OK but I got into stability issues due to peephole2 pass. Peepholes may exchange operands of instructions and ignore register restrictions assuming all SSE registers are homogeneous. It caused unrecognized instructions on some tests. I preferred to add a new alternative instead of fixing peephole and possibly other similar problems.

No, please rather fix the peephole2 patterns. It is just a matter of putting satisfies_constraint_Xx to their insn condition. In effect, peephole2 pass is nullifying your optimization. Also, RA is still free to allocate unwanted registers, even when prefixed with ?.

I didn't find a nice way to fix peephole2 patterns to take register constraints into account. Is there any way to do it? Also fully restrict xmm8-15 does not seem right. It is just costly but not fully disallowed.

Ilya

Uros.
Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
On Thu, Jul 3, 2014 at 1:50 PM, Ilya Enkovich enkovich@gmail.com wrote:

I didn't find a nice way to fix peephole2 patterns to take register constraints into account. Is there any way to do it?

Use REX_SSE_REGNO_P (REGNO (operands[...])) in the insn C constraint.

Also fully restrict xmm8-15 does not seem right. It is just costly but not fully disallowed.

As said earlier, you can try Ya*x as a constraint.

Uros.
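Note: the suggestion above amounts to guarding an operand-swapping transformation with a register-range check. The sketch below only illustrates that idea and is not GCC code: is_rex_sse_reg stands in for a REX_SSE_REGNO_P-style predicate, registers are numbered 0..15 for xmm0..xmm15 (an assumption; GCC's hard register numbers differ), and try_swap_operands is a hypothetical stand-in for a peephole condition.

```cpp
#include <cassert>
#include <utility>

// Assumed numbering for this sketch: 0..15 means xmm0..xmm15.
// True for the registers that need a REX prefix to encode.
inline bool is_rex_sse_reg(int xmm) { return xmm >= 8 && xmm <= 15; }

// Peephole-style operand swap. The swap is refused when it would move a
// REX-requiring register into the destination slot, which in this sketch
// is the slot restricted to xmm0-xmm7.
inline bool try_swap_operands(int& dst, int& src) {
    if (is_rex_sse_reg(src))
        return false;  // condition fails, leave the instruction alone
    std::swap(dst, src);
    return true;
}
```

The point of putting the check in the condition is that the transformation simply does not fire in the problematic case, instead of producing an instruction that no pattern recognizes.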
Re: [Patch, libgomp] libgomp.fortran/fortran.exp - add -fintrinsic-modules-path ${blddir}
Hi!

On Thu, 3 Jul 2014 13:35:15 +0200, Jakub Jelinek ja...@redhat.com wrote:
On Thu, Jul 03, 2014 at 12:19:15PM +0200, Thomas Schwinge wrote:

I found the following to work (but so far only did libgomp testing), but that is a little bit more intrusive, but may actually be the right thing to do. (Possibly also in additional places where ${tool}_target_compile is used? CCing testsuite maintainers.) Comments?

What about this instead, pass it only for Fortran tests and nothing else? Only very lightly tested so far.

Confirming that this does the right thing for my case.

* testsuite/lib/libgomp.exp (libgomp_target_compile): If $source matches regex $lang_source_re, add $lang_include_flags to options.
* testsuite/libgomp.c/c.exp: Unset lang_include_flags.
* testsuite/libgomp.c++/c++.exp: Likewise.
* testsuite/libgomp.fortran/fortran.exp: Likewise. Set lang_source_re and lang_include_flags instead of adding -fintrinsic-modules-path= to ALWAYS_CFLAGS.
* testsuite/libgomp.graphite/graphite.exp: Unset lang_include_flags.

Thanks! Yes, that looks less intrusive than my patch.

Grüße,
Thomas
Re: gfortran-dg-runtest, torture options
Hi!

On Thu, 3 Jul 2014 13:09:57 +0200, Jakub Jelinek ja...@redhat.com wrote:
On Thu, Jul 03, 2014 at 01:06:48PM +0200, Thomas Schwinge wrote:
On Thu, 3 Jul 2014 12:58:32 +0200, Jakub Jelinek ja...@redhat.com wrote:
On Thu, Jul 03, 2014 at 12:54:32PM +0200, Thomas Schwinge wrote:

Thanks Janis and Mikael for your replies (nearly a year ago...), but still my questions remain to be answered: in my understanding, the libgomp testsuite is not the place for compiler torture testing (different optimization flags and all that -- and, that is done for Fortran only; gfortran-dg-runtest), but rather, I understand the libgomp testsuite to be the place for libgomp library testing ;-), and hence I propose to remove that special casing of Fortran test cases:

No, it is intentional that we torture test those, libgomp is the place for all OpenMP runtime tests, not just for library testing.

But then, the obvious question: why for Fortran only, but not for C and C++?

Fortran has far more tests with arrays etc. that testing just -O0 or -O2 is insufficient, that is typically not the case for C/C++.

OK to document as follows?

2014-07-03 Jakub Jelinek ja...@redhat.com

libgomp/
* testsuite/libgomp.fortran/fortran.exp: Explain gfortran-dg-runtest usage.

--- libgomp/testsuite/libgomp.fortran/fortran.exp
+++ libgomp/testsuite/libgomp.fortran/fortran.exp
@@ -59,7 +59,9 @@ if { $lang_test_file_found } {
 append ld_library_path [gcc-set-multilib-library-path $GCC_UNDER_TEST]
 set_ld_library_path_env_vars
-# Main loop.
+# For Fortran we're doing torture testing, as Fortran has far more tests
+# with arrays etc. that testing just -O0 or -O2 is insufficient, that is
+# typically not the case for C/C++.
 gfortran-dg-runtest $tests
 }

Grüße,
Thomas
Re: [PATCH PR61576]
Ping!

2014-06-24 13:37 GMT+04:00 Yuri Rumyantsev ysrum...@gmail.com:

Hi All,

Here is a fix for PR 61576: an additional check was added that the block containing the reduction statement is a predecessor of the block containing the phi, so that the correct condition is chosen. Bootstrap and regression testing did not show any new failures. Is it OK for trunk?

gcc/ChangeLog
2014-06-24 Yuri Rumyantsev ysrum...@gmail.com

PR tree-optimization/61576
* tree-if-conv.c (is_cond_scalar_reduction): Add check that basic block containing reduction statement is predecessor of phi basic block.

gcc/testsuite/ChangeLog
* gcc.dg/torture/pr61576.c: New test.
Re: [PATCH] Memory leak in parallel/unique_copy
Hi,

Many thanks! I'll try to add a test to the suite (unsure how foolproof it will be in terms of detecting memory usage). The 11000 is simply to go beyond the minimum unique_count needed to specialise the parallel version.

This was on g++ (GCC) 4.4.5 20110214 (Red Hat 4.4.5-6) but the issue still exists on latest SVN. I'll try to replicate the same test on a more recent install later on.

//g++ -std=c++0x -Wall -ggdb -O3 -fopenmp -o uq_leak uq_leak.cpp
#include <cmath>     // std::sqrt
#include <vector>
#include <iostream>
#include <omp.h>     // omp_get_max_threads
#include <parallel/algorithm>

int main()
{
  {
    size_t difftypesiz = sizeof(std::vector<double>::iterator::difference_type);
    std::cout << "num_threads: " << omp_get_max_threads()
              << " difference_type_size: " << difftypesiz << std::endl;

    // 11000 elements: enough to take the parallel code path.
    std::vector<double> v(11000);
    std::vector<double> u(v.size());
    for (auto i = 0u; i < v.size(); ++i)
      v[i] = std::sqrt(i);

    // Each call leaks its __counter array without the patch.
    for (auto i = 0; i < 10; ++i)
      {
        auto e = std::__parallel::unique_copy(v.begin(), v.end(), u.begin());
      }
  }
}

Valgrind output below (without patch is first):

[gdecarv@devapp10 dev]$ valgrind ./uq_leak
==26489== Memcheck, a memory error detector
==26489== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==26489== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info
==26489== Command: ./uq_leak
==26489==
num_threads: 24 difference_type_size: 8
==26489==
==26489== HEAP SUMMARY:
==26489==     in use at exit: 13,416 bytes in 36 blocks
==26489==   total heap usage: 57 allocs, 21 frees, 227,784 bytes allocated
==26489==
==26489== LEAK SUMMARY:
==26489==    definitely lost: 2,000 bytes in 10 blocks
==26489==    indirectly lost: 0 bytes in 0 blocks
==26489==      possibly lost: 6,992 bytes in 23 blocks
==26489==    still reachable: 4,424 bytes in 3 blocks
==26489==         suppressed: 0 bytes in 0 blocks
==26489== Rerun with --leak-check=full to see details of leaked memory
==26489==
==26489== For counts of detected and suppressed errors, rerun with: -v
==26489== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)

[gdecarv@devapp10 dev]$ g++ -std=c++0x -Wall -ggdb -O3 -fopenmp -o uq_leak uq_leak.cpp
[gdecarv@devapp10 dev]$ valgrind ./uq_leak
==26530== Memcheck, a memory error detector
==26530== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==26530== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info
==26530== Command: ./uq_leak
==26530==
num_threads: 24 difference_type_size: 8
==26530==
==26530== HEAP SUMMARY:
==26530==     in use at exit: 11,416 bytes in 26 blocks
==26530==   total heap usage: 57 allocs, 31 frees, 227,784 bytes allocated
==26530==
==26530== LEAK SUMMARY:
==26530==    definitely lost: 0 bytes in 0 blocks
==26530==    indirectly lost: 0 bytes in 0 blocks
==26530==      possibly lost: 6,992 bytes in 23 blocks
==26530==    still reachable: 4,424 bytes in 3 blocks
==26530==         suppressed: 0 bytes in 0 blocks
==26530== Rerun with --leak-check=full to see details of leaked memory
==26530==
==26530== For counts of detected and suppressed errors, rerun with: -v
==26530== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)

On 3 July 2014 11:47, Jonathan Wakely jwak...@redhat.com wrote:
On 02/07/14 22:42 +0100, Goncalo Carvalho wrote:

Hi, In parallel/unique_copy.h __counter is never deleted. I'm also trying to follow from other posts how to submit a patch but it is well possible I missed some of the conventions. Many apologies if that's the case.

Thanks for this, it looks correct. (I thought I remembered finding something similar in another parallel header, but don't see anything in the ChangeLog.) Do you have a testcase to reproduce the leak that we could add to the testsuite? Or even just to run once with valgrind and verify it's fixed (I tried a trivial test and didn't see a leak).

libstdc++-v3/
* include/parallel/unique_copy.h: prevent memory leak of __counter

Index: libstdc++-v3/include/parallel/unique_copy.h
===
--- libstdc++-v3/include/parallel/unique_copy.h (revision 212239)
+++ libstdc++-v3/include/parallel/unique_copy.h (working copy)
@@ -171,6 +171,7 @@
 for (_ThreadIndex __t = 0; __t < __num_threads + 1; __t++)
   __end_output += __counter[__t];
+ delete[] __counter;
 delete[] __borders;
 return __result + __end_output;

--
http://www.cryogenicgraphics.com
http://www.flickr.com/photos/hdrflow
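Note: the leak pattern fixed above, and one way to avoid it altogether, can be sketched in isolation. This is a hedged illustration rather than the libstdc++ code: sum_counts_raw and sum_counts_owned are hypothetical functions, and the per-chunk counts are passed in directly instead of being produced by worker threads.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Raw-array version: mirrors the shape of the patched code, where the
// array allocated with new[] must be released with delete[] before return.
inline std::size_t sum_counts_raw(const std::vector<std::size_t>& per_chunk) {
    const std::size_t n = per_chunk.size();
    std::size_t* counter = new std::size_t[n];
    for (std::size_t t = 0; t < n; ++t)
        counter[t] = per_chunk[t];
    std::size_t end_output = 0;
    for (std::size_t t = 0; t < n; ++t)
        end_output += counter[t];
    delete[] counter;  // the release that was missing for __counter
    return end_output;
}

// Owning version: std::unique_ptr<T[]> calls delete[] automatically, so
// no early return or exception can skip the release.
inline std::size_t sum_counts_owned(const std::vector<std::size_t>& per_chunk) {
    const std::size_t n = per_chunk.size();
    std::unique_ptr<std::size_t[]> counter(new std::size_t[n]);
    for (std::size_t t = 0; t < n; ++t)
        counter[t] = per_chunk[t];
    std::size_t end_output = 0;
    for (std::size_t t = 0; t < n; ++t)
        end_output += counter[t];
    return end_output;
}
```

The figures in the valgrind logs are consistent with this reading: 10 calls account for the 10 definitely lost blocks in the first run and the 10 extra frees in the second. Whether an owning wrapper would be acceptable in the parallel-mode headers is a separate question that this sketch does not answer.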
Re: gfortran-dg-runtest, torture options
On Thu, Jul 03, 2014 at 02:37:41PM +0200, Thomas Schwinge wrote:

OK to document as follows?

2014-07-03 Jakub Jelinek ja...@redhat.com

libgomp/
* testsuite/libgomp.fortran/fortran.exp: Explain gfortran-dg-runtest usage.

You wrote the patch, so put your name on it. Ok with that change.

--- libgomp/testsuite/libgomp.fortran/fortran.exp
+++ libgomp/testsuite/libgomp.fortran/fortran.exp
@@ -59,7 +59,9 @@ if { $lang_test_file_found } {
 append ld_library_path [gcc-set-multilib-library-path $GCC_UNDER_TEST]
 set_ld_library_path_env_vars
-# Main loop.
+# For Fortran we're doing torture testing, as Fortran has far more tests
+# with arrays etc. that testing just -O0 or -O2 is insufficient, that is
+# typically not the case for C/C++.
 gfortran-dg-runtest $tests
 }

Jakub
Re: [PATCH] Memory leak in parallel/unique_copy
On 03/07/14 13:40 +0100, Goncalo Carvalho wrote:

Hi, Many thanks! I'll try to add a test to the suite (unsure how foolproof it will be in terms of detecting memory usage).

Yes, it might not be worth adding to the testsuite, but I want to be able to verify the patch changes something :-)

The 11000 is simply to go beyond the minimum unique_count needed to specialise the parallel version.

Ah, that's what I was missing, I didn't bother checking the minimum I needed to pass.

I'll try to replicate the same test on a more recent install later on.

No need, I can do so - thanks for the test.
Normalize interface for all *-dg-runtest
Hi! I have a need to pass »flags« to a gfortran-dg-runtest call, but found that not to be possible as the *-dg-runtest interfaces are narrowed compared to dg-runtest. Here is a patch to fix that. So far only tested in libgomp. OK in principle? I'll then test this thoroughly, and, of course, before commit make sure that no further *-dg-runtest calls have been added compared to my patch's baseline. gcc/testsuite/ * lib/g++-dg.exp (g++-dg-runtest): Change interface to dg-runtest's. Adapt all callers. * lib/gcc-dg.exp (gcc-dg-runtest): Likewise. * lib/gfortran-dg.exp (gfortran-dg-runtest): Likewise. * lib/go-dg.exp (go-dg-runtest): Likewise. * lib/obj-c++-dg.exp (obj-c++-dg-runtest): Likewise. * lib/objc-dg.exp (objc-dg-runtest): Likewise. libffi/ * testsuite/lib/libffi.exp (libffi-dg-runtest): Change interface to dg-runtest's. diff --git gcc/testsuite/g++.dg/asan/asan.exp gcc/testsuite/g++.dg/asan/asan.exp index 30fbb1d..98ff59c 100644 --- gcc/testsuite/g++.dg/asan/asan.exp +++ gcc/testsuite/g++.dg/asan/asan.exp @@ -29,7 +29,7 @@ dg-init if [asan_init] { # Main loop. -gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.C $srcdir/c-c++-common/asan/*.c]] +gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.C $srcdir/c-c++-common/asan/*.c]] } diff --git gcc/testsuite/g++.dg/charset/charset.exp gcc/testsuite/g++.dg/charset/charset.exp index 3ca071e..c54e676 100644 --- gcc/testsuite/g++.dg/charset/charset.exp +++ gcc/testsuite/g++.dg/charset/charset.exp @@ -38,7 +38,7 @@ dg-init # Main loop. g++-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.{c,cc,S} ]] \ -$DEFAULT_CHARSETCFLAGS + $DEFAULT_CHARSETCFLAGS # All done. 
dg-finish diff --git gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp index 0cb6539..b0f0362 100644 --- gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp +++ gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp @@ -26,12 +26,12 @@ if { ![check_effective_target_cilkplus] } { dg-init if [cilkplus_init] { # Run the tests that are shared with C. -g++-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/PS/*.c]] +g++-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/PS/*.c]] dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/SE/*.c]] -O3 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/SE/*.c]] dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/SE/*.c]] -g -O2 # Run the C++ only tests. -g++-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.C]] +g++-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.C]] dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/AN/*.c]] -fcilkplus dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/AN/*.c]] -O1 -fcilkplus diff --git gcc/testsuite/g++.dg/debug/dwarf2/dwarf2.exp gcc/testsuite/g++.dg/debug/dwarf2/dwarf2.exp index d947a0e..b9eb97f 100644 --- gcc/testsuite/g++.dg/debug/dwarf2/dwarf2.exp +++ gcc/testsuite/g++.dg/debug/dwarf2/dwarf2.exp @@ -36,7 +36,7 @@ if { ! [string match *: target system does not support the * debug format* \ $comp_output] } { remove-build-file trivial.S g++-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.C $srcdir/c-c++-common/dwarf2/*.c]] \ - $DEFAULT_CFLAGS +$DEFAULT_CFLAGS } # All done. diff --git gcc/testsuite/g++.dg/dfp/dfp.exp gcc/testsuite/g++.dg/dfp/dfp.exp index fceb126..3cfe03c 100644 --- gcc/testsuite/g++.dg/dfp/dfp.exp +++ gcc/testsuite/g++.dg/dfp/dfp.exp @@ -50,10 +50,10 @@ dg-init # Main loop. Run the tests that are specific to C++. 
g++-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[C]] \ -$DEFAULT_CXXFLAGS + $DEFAULT_CXXFLAGS # Run tests that are shared with C testing. g++-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/dfp/*.c]] \ -$DEFAULT_CXXFLAGS + $DEFAULT_CXXFLAGS # All done. dg-finish diff --git gcc/testsuite/g++.dg/dg.exp gcc/testsuite/g++.dg/dg.exp index aeae8f3..14beae1 100644 --- gcc/testsuite/g++.dg/dg.exp +++ gcc/testsuite/g++.dg/dg.exp @@ -57,14 +57,14 @@ set tests [prune $tests $srcdir/$subdir/ubsan/*] set tests [prune $tests $srcdir/$subdir/tsan/*] # Main loop. -g++-dg-runtest $tests $DEFAULT_CXXFLAGS +g++-dg-runtest $tests $DEFAULT_CXXFLAGS # C/C++ common tests. g++-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/*.\[cSi\]]] \ - + g++-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cpp/*.\[cS\]]] \ - + # All done. diff --git gcc/testsuite/g++.dg/gcov/gcov.exp gcc/testsuite/g++.dg/gcov/gcov.exp index 892baa8..20cd9d0 100644 --- gcc/testsuite/g++.dg/gcov/gcov.exp +++ gcc/testsuite/g++.dg/gcov/gcov.exp @@ -39,6 +39,6 @@ if { $files != } { } # Main loop. -g++-dg-runtest
Re: [fortran,patch] Support for IEEE underflow control on x86/x86_64
Here’s an updated patch, providing support for underflow control in the IEEE_ARITHMETIC module, for x86/x86_64 targets and alpha-glibc. Bootstrapped and regtested on x86_64-apple-darwin13, tested by Uros on alpha. OK to commit? underflow.ChangeLog Description: Binary data underflow.diff Description: Binary data
Re: [fortran,patch] Support for IEEE underflow control on x86/x86_64
On Thu, Jul 3, 2014 at 2:43 PM, FX fxcoud...@gmail.com wrote:
> Here’s an updated patch, providing support for underflow control in the IEEE_ARITHMETIC module, for x86/x86_64 targets and alpha-glibc. Bootstrapped and regtested on x86_64-apple-darwin13, tested by Uros on alpha.

The testcase still needs:

! { dg-do run }
! { dg-require-effective-target sse2_runtime { target { i?86-*-* x86_64-*-* } } }
! { dg-additional-options "-msse2 -mfpmath=sse" { target { i?86-*-* x86_64-*-* } } }

Uros.
Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
2014-07-03 16:07 GMT+04:00 Uros Bizjak ubiz...@gmail.com: On Thu, Jul 3, 2014 at 1:50 PM, Ilya Enkovich enkovich@gmail.com wrote: I didn't find a nice way to fix peephole2 patterns to take register constraints into account. Is there any way to do it? Use REX_SSE_REGNO_P (REGNO (operands[...])) in the insn C constraint. Peephole doesn't know whether it works with tuned instruction or not, right? I would need to mark all instructions I modify with some attribute and then check for it in peephole. Also fully restrict xmm8-15 does not seem right. It is just costly but not fully disallowed. As said earlier, you can try Ya*x as a constraint. I tried it. It does not seem to affect allocation much. I do not see any gain on targeted tests. Ilya Uros.
Re: [Patch AArch64_be] Fix some vec_concat big-endian confusions
*ping*

Thanks,
James

On Tue, Jun 24, 2014 at 09:45:28AM +0100, James Greenhalgh wrote:

Hi,

vec_concat ( { a, b }, { c, d }) should give a new vector { a, b, c, d }.

On big-endian aarch64 targets, we have to think carefully about what this means as we map GCC's view of endian-ness on to ours. GCC (for reasons I have yet to understand) likes to describe lane-extracts from a vector as endian-ness dependent bit-field extracts. This causes major headaches, and means we have to pretend throughout the backend that lane zero is at the high bits of a vector register.

When we have a machine instruction which zeroes the high bits of a vector register, and we want to describe it in RTL, the natural little-endian view is vec_concat ( operand, zeroes ). The reality described above implies that the correct description on big-endian systems is vec_concat ( zeroes, operand ).

This also affects arm_neon.h intrinsics. When we say vcombine (a, b) we mean that a should occupy the low 64 bits and b the high 64 bits. We therefore need to take care to swap the operands to vec_concat when we are targeting big-endian.

This patch is messy, but it gives a notable improvement in the PASS rates for an internal testsuite for Neon intrinsics.

Tested on aarch64-none-elf and aarch64_be-none-elf with no issues, but no improvements either.

OK for trunk?

Thanks,
James

---
gcc/

2014-06-20  James Greenhalgh  james.greenha...@arm.com

	* config/aarch64/aarch64-simd.md (move_lo_quad_internal_mode): New.
	(move_lo_quad_internal_be_mode): Likewise.
	(move_lo_quad_mode): Convert to define_expand.
	(aarch64_simd_move_hi_quad_mode): Gate on BYTES_BIG_ENDIAN.
	(aarch64_simd_move_hi_quad_be_mode): New.
	(move_hi_quad_mode): Use appropriate insn for BYTES_BIG_ENDIAN.
	(aarch64_combinezmode): Gate on BYTES_BIG_ENDIAN.
	(aarch64_combinez_bemode): New.
	(aarch64_combinemode): Convert to define_expand.
	(aarch64_combine_internalmode): New.
	(aarch64_simd_combinemode): Remove bogus RTL description.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 6b81d811b70bd157207f7753027309442ec9e8b5..00e2206b200fd32c6df5987d7317687488e8dadd 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -942,14 +942,38 @@ (define_insn sumaxminmode3 [(set_attr type neon_minmaxq)] ) -;; Move into low-half clearing high half to 0. +;; vec_concat gives a new vector with the low elements from operand 1, and +;; the high elements from operand 2. That is to say, given op1 = { a, b } +;; op2 = { c, d }, vec_concat (op1, op2) = { a, b, c, d }. +;; What that means, is that the RTL descriptions of the below patterns +;; need to change depending on endianness. + +;; Move to the low architectural bits of the register. +;; On little-endian this is { operand, zeroes } +;; On big-endian this is { zeroes, operand } -(define_insn move_lo_quad_mode +(define_insn move_lo_quad_internal_mode [(set (match_operand:VQ 0 register_operand =w,w,w) (vec_concat:VQ (match_operand:VHALF 1 register_operand w,r,r) (vec_duplicate:VHALF (const_int 0] - TARGET_SIMD + TARGET_SIMD !BYTES_BIG_ENDIAN + @ + dup\\t%d0, %1.d[0] + fmov\\t%d0, %1 + dup\\t%d0, %1 + [(set_attr type neon_dupq,f_mcr,neon_dupq) + (set_attr simd yes,*,yes) + (set_attr fp *,yes,*) + (set_attr length 4)] +) + +(define_insn move_lo_quad_internal_be_mode + [(set (match_operand:VQ 0 register_operand =w,w,w) +(vec_concat:VQ + (vec_duplicate:VHALF (const_int 0)) + (match_operand:VHALF 1 register_operand w,r,r)))] + TARGET_SIMD BYTES_BIG_ENDIAN @ dup\\t%d0, %1.d[0] fmov\\t%d0, %1 @@ -960,7 +984,23 @@ (define_insn move_lo_quad_mode (set_attr length 4)] ) -;; Move into high-half. 
+(define_expand move_lo_quad_mode + [(match_operand:VQ 0 register_operand) + (match_operand:VQ 1 register_operand)] + TARGET_SIMD +{ + if (BYTES_BIG_ENDIAN) +emit_insn (gen_move_lo_quad_internal_be_mode (operands[0], operands[1])); + else +emit_insn (gen_move_lo_quad_internal_mode (operands[0], operands[1])); + DONE; +} +) + +;; Move operand1 to the high architectural bits of the register, keeping +;; the low architectural bits of operand2. +;; For little-endian this is { operand2, operand1 } +;; For big-endian this is { operand1, operand2 } (define_insn aarch64_simd_move_hi_quad_mode [(set (match_operand:VQ 0 register_operand +w,w) @@ -969,12 +1009,25 @@ (define_insn aarch64_simd_move_hi_quad_ (match_dup 0) (match_operand:VQ 2 vect_par_cnst_lo_half )) (match_operand:VHALF 1 register_operand w,r)))] - TARGET_SIMD + TARGET_SIMD !BYTES_BIG_ENDIAN @ ins\\t%0.d[1], %1.d[0] ins\\t%0.d[1], %1 - [(set_attr type neon_ins) -
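The operand ordering described in this message can be modelled in plain C. What follows is a minimal sketch and not code from the patch: `struct q_reg`, `vec_concat_model`, `combine_model` and `move_lo_quad_model` are hypothetical names, and a 128-bit register is reduced to its two architectural 64-bit halves. It assumes, as the message states, that the backend treats lane zero as sitting in the high bits on big-endian.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register, split into its
   architectural low and high 64-bit halves.  */
struct q_reg {
    uint64_t lo;
    uint64_t hi;
};

/* vec_concat (x, y) puts the lanes of x before the lanes of y.
   Little-endian: lane 0 is the low bits, so x lands in lo.
   Big-endian (the backend's view): lane 0 is the high bits, so x lands in hi.  */
struct q_reg vec_concat_model(uint64_t x, uint64_t y, int big_endian)
{
    struct q_reg r;
    if (big_endian) {
        r.hi = x;
        r.lo = y;
    } else {
        r.lo = x;
        r.hi = y;
    }
    return r;
}

/* vcombine (a, b): a must occupy the low 64 bits and b the high 64 bits on
   both endiannesses, so the operands are swapped for big-endian.  */
struct q_reg combine_model(uint64_t a, uint64_t b, int big_endian)
{
    return big_endian ? vec_concat_model(b, a, 1)
                      : vec_concat_model(a, b, 0);
}

/* Move into the low architectural half and zero the high half:
   { operand, zeroes } on little-endian, { zeroes, operand } on big-endian.  */
struct q_reg move_lo_quad_model(uint64_t operand, int big_endian)
{
    struct q_reg r = big_endian ? vec_concat_model(0, operand, 1)
                                : vec_concat_model(operand, 0, 0);
    assert(r.hi == 0);  /* the high architectural bits are always cleared */
    return r;
}
```

With this model, `combine_model (1, 2, 0)` and `combine_model (1, 2, 1)` both yield `lo == 1` and `hi == 2`, which is the endian-independent result the intrinsic is meant to have.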
Re: [PATCH]Enable elimination of IV use with unsigned type candidate
On Tue, Jul 1, 2014 at 10:32 AM, Bin.Cheng amker.ch...@gmail.com wrote: Sorry for this late reply, I spent some time in understanding the problem. On Tue, Jun 24, 2014 at 12:36 PM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Jun 23, 2014 at 11:49 AM, Bin Cheng bin.ch...@arm.com wrote: expressions. It's possible to have iv_elimination_compare_lt to do some undo transformation on may_be_zero, but I found it's difficult for cases involving signed/unsigned conversion like case loop-41.c. Since I think there is no obvious benefit to fold may_be_zero here (somehow because the two operands are already in folded forms), this patch just calls build2_loc instead. But it may fold to true/false, no? You are right, it can be folded to false in many cases. Thus I managed to check specific folded forms of may_be_zero, as in attached patch. So far it works for tests I added, but there are some other folded forms not handled. When GCC trying to eliminate use 0 with cand 0, the miscellaneous trees in iv_elimination_compare_lt are like below with i_1 of signed type: B: i_1 + 1 A: 0 niter-niter: (unsigned int)i_1 Apparently, (B-A-1) is i_1, which doesn't equal to (unsigned int)i_1. Without this patch, it is considered equal to each other. just looking at this part. Do you have a testcase that exhibits a difference when just applying patch A? So I can have a look here? From the code in iv_elimination_compare_lt I can't figure why we'd end up with i_1 instead of (unsigned int)i_1 as we convert to a common type. I suppose the issue may be that at tree_to_aff_combination time we strip all nops with STRIP_NOPS but when operating on -rest via convert/scale or add we do not strip them again. But then 'nit' should be i_1, not (unsigned int)i_1. So the analysis above really doesn't look correct. Just to make sure we don't paper over an issue in tree-affine.c. Thus - testcase? On x86 we don't run into this place in iv_elimination_compare_lt (on an unpatched tree). 
CCing Zdenek for the meat of patch B.

For the record, here is my understanding about the problem.

Consider the simple loop below, where we don't restrict P to any specific type as GCC now does:

  some_type p, p_0;
  int a, b, i;

  i = a;
  p = p_0;
  do
    {
      use(p);
      p += step;
      i++;
    }
  while (i < b);

We want to optimize it into the form below, given that no overflow/wrap behavior is introduced, where NITER is the number of iterations of the loop:

  p = p_0;
  do
    {
      use(p);
      p += step;
    }
  while (p < p_0 + (NITER + 1) * step);

For convenience, we assume a positive step for now. I think it's safe (in other words, no new overflow or wrap) if the two conditions below are satisfied:

  1) When the original latch executes N (> 0) times, the new latch executes the same number of times.
  2) When the original latch executes ZERO times, the new latch executes ZERO times.

For 1), the expression p_0 + (NITER + 1) * step should not overflow/wrap upward. This is currently checked by a code snippet like:

  /* We need to know that the candidate induction variable does not overflow.
     While more complex analysis may be used to prove this, for now just
     check that the variable appears in the original program and that it
     is computed in a type that guarantees no overflows.  */
  cand_type = TREE_TYPE (cand->iv->base);
  if (cand->pos != IP_ORIGINAL || !nowrap_type_p (cand_type))
    return false;

GCC only checks whether the variable appears in the original program and has a type that does not overflow/wrap. Just as the comment states, it can be improved (something like what is introduced in my patch).

For 2), the expression p_0 + (NITER + 1) * step should not overflow/wrap downward; in other words, p >= p_0 + (NITER + 1) * step should hold when may_be_zero (i.e., a + 1 > b) holds. Since NITER is computed as (unsigned_type)B - (unsigned_type)A - 1, the new bound is effectively of the form p_0 + (unsigned_type)B * step - (unsigned_type)A * step.
Due to the folded form of may_be_zero, GCC only handles cases in which B/A are of unsigned type, we only need to make sure that p_0 - (unsigned_type)A * step holds, given (unsigned_type)B = (unsigned_type)A already holds. When comes to cases in which B is of signed type, we need to make sure that (unsigned_type)B doesn't overflow/wrap. So the conclusions are: 1) With the original patch B, patch A is needed because we relax GCC for cases in which B is of signed type. 2) With the updated patch B, patch A won't be needed. Thanks, bin
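The loop rewrite discussed in this message can be tried out in ordinary C. The following is a minimal sketch under stated assumptions, not GCC's implementation: `sum_counter` and `sum_pointer` are hypothetical names, `use(p)` is taken to be "add `*p` to a sum", the step is positive, and the may_be_zero case is handled with an explicit branch instead of relying on wrapping arithmetic. It therefore only illustrates that the two loop forms visit the same elements; it does not model the overflow analysis itself.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: the exit test is written in terms of the counter i.  */
long sum_counter(const int *p0, int a, int b, ptrdiff_t step)
{
    const int *p = p0;
    long sum = 0;
    int i = a;

    do {
        sum += *p;      /* stands in for use(p) */
        p += step;
        i++;
    } while (i < b);
    return sum;
}

/* Rewritten form: i is eliminated and the exit test compares p against
   p0 + (NITER + 1) * step.  */
long sum_pointer(const int *p0, int a, int b, ptrdiff_t step)
{
    assert(step > 0);   /* the sketch assumes a positive step */

    /* NITER is the number of latch executions.  When may_be_zero holds
       (a + 1 > b, written here as !(a < b) to avoid signed overflow) the
       latch never runs, so NITER stays 0.  Otherwise it is computed in
       unsigned arithmetic as B - A - 1.  */
    ptrdiff_t niter = 0;
    if (a < b)
        niter = (ptrdiff_t)((unsigned)b - (unsigned)a - 1u);

    const int *bound = p0 + (niter + 1) * step;
    const int *p = p0;
    long sum = 0;

    do {
        sum += *p;
        p += step;
    } while (p < bound);
    return sum;
}
```

Both functions execute the body once when `a >= b` and `b - a` times otherwise, provided the array is large enough that `bound` stays within it or one element past its end.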
Re: More informative ODR warnings
> +@opindex Wodr
> +@opindex Wno-odr
> +@opindex Wodr
> +Warn about One Definition Rule violations during link time optimization.
> +Require @option{-flto-odr-type-merging} to be enabled. Enabled by default

Duplicated @opindex Wodr. (@item is missing)
Requires.
Period after default

But according to current practice (which I dislike), this should be

@item -Wno-odr
@opindex Wodr
@opindex Wno-odr
Disable warnings about One Definition Rule violations during link time optimization. These warnings are enabled by default and require @option{-flto-odr-type-merging} to be enabled.

See the example for -Woverflow just above where you added Wodr.

> + if (!warning_at (DECL_SOURCE_LOCATION (TYPE_NAME (t1)), OPT_Wodr,
> + "type %qT violates one definition rule ",
> + t1))
> +return;

Why the trailing whitespace within the warning message?

> + if (!warning_at (DECL_SOURCE_LOCATION (st1), OPT_Wodr,
> + "field %qD (of type %qT) violates one definition rule ",
> + st1, t1))

I agree with others that say that (of type %qT) is understood as %qD being of type %qT. I would also suggest to drop the parentheses in the new message.

> +inform (UNKNOWN_LOCATION, "Conflicting compilation units: %s and %s",
> +IDENTIFIER_POINTER (name),
> +IDENTIFIER_POINTER (name1));

GNU-standard diagnostics do not start with uppercase. What do you think about using %qs here?

> +G_("a type with attributes "
> + "is defined in another translation unit"

Is this a type with different attributes?

> + bool warned = 0;

We should really use true/false for booleans.

> + FIXME: disable for now; because ODR types are now build during
> + streaming in, the variants do not need to be linked to the type,
> + yet. We need to do the merging in cleanup pass to be implemented
> + soon. */
>   if (!flag_ltrans && merge
> + && 0
>   && TREE_CODE (val->type) == RECORD_TYPE
>   && TREE_CODE (type) == RECORD_TYPE
>   && TYPE_BINFO (val->type) && TYPE_BINFO (type)
> @@ -569,7 +1076,6 @@ add_type_duplicate (odr_type val, tree t
>   == master_binfo)
>   set_type_binfo ((*val->types)[i], TYPE_BINFO (type));
>   }
> - BINFO_TYPE (TYPE_BINFO (type)) = val->type;
>   }
>   else

Why commit this part if it is disabled? There seem to be other parts of the code that are known to not work but will be committed. Perhaps I am misunderstanding and this is going to a development branch and not trunk.

And testcases? Shouldn't there be one testcase for each possible diagnostic? Otherwise we get diagnostics that are never tested and they stop working (or they even didn't work in the first place) even though there are huge chunks of code devoted to them.

Cheers,
Manuel.
Re: [PATCH] [ARM] [RFC] Fix longstanding push_minipool_fix ICE (PR49423, lp1296601)
On 02/07/14 13:05, Charles Baylis wrote:
> On 30 June 2014 14:26, Richard Earnshaw rearn...@arm.com wrote:
>> On 30/06/14 13:53, Charles Baylis wrote:
>>> I see two options to fix it - one is to teach the back-end to successfully generate code for this insn, and the other is to teach the back-end that such an insn is not valid. My proposed patch does the former. The latter can presumably be achieved by providing a different kind of memory constraint which disallows constant pool references for these insns although I haven't tried this yet.
>>
>> I think we should be doing the latter (not permitting these operations). If we wanted to do the former, we could just add an offset range for the insn. The reason we don't want the former is that the offset ranges are too small and overly constrain literal pool placement.
>
> The attached patch adds a 'Uh' constraint, which means the same as 'm', except that literal pool references are not allowed. Patterns which generate ldr[s]b or ldr[s]h have been updated to use it, and the pool_range attributes have been removed from those patterns.
>
> Bootstrapped and make-checked with no regressions on qemu for arm-unknown-linux-gnueabihf.
>
> date  Charles Baylis  charles.bay...@linaro.org
>
> 	PR target/49423
> 	* config/arm/arm-protos.h (arm_legitimate_address_p,
> 	arm_is_constant_pool_ref): Add prototypes.
> 	* config/arm/arm.c (arm_legitimate_address_p): Remove static.
> 	(arm_is_constant_pool_ref) New function.
> 	* config/arm/arm.md (unaligned_loadhis, arm_zero_extendhisi2_v6,
> 	arm_zero_extendqisi2_v6): Use Uh constraint for memory operand.
> 	(arm_extendhisi2, arm_extendhisi2_v6): Use Uh constraint for memory
> 	operand and remove pool_range and neg_pool_range attributes.

^ Better to start a new sentence here.

> 	(arm_extendqihi_insn, arm_extendqisi, arm_extendqisi_v6): Remove
> 	pool_range and neg_pool_range attributes.
> 	* config/arm/constraints.md (Uh): New constraint. (Uq): Don't allow
> 	constant pool references.

(Uq) should be on a new line.

> OK for trunk?
I certainly think this is the right approach. My only worry is whether we also need new predicates that have similar restrictions (see, for example how Uq is matched with arm_extendqisi_mem_op). On balance, I think we're OK without that change, since I don't expect that we'll see MEM (CONST_POOL_ADDR) unless reload has already decided to re-materialize a constant register. So OK, but if you're considering back-ports, I suggest you let it bake a while on trunk first. R.
Re: [PATCH] rs6000: Fix the shift patterns, and add test
On Wed, Jul 2, 2014 at 5:06 PM, Segher Boessenkool seg...@kernel.crashing.org wrote:
> Firstly, it adds back the split conditions that I accidentally removed. Without it the dot insns are never generated, or rather, always split back to a separate compare instruction.
>
> Secondly, the shift amount should be SI always, not GPR, or GCC will insert a zero-extend at expand time that it cannot get rid of later. Ugh.
>
> The test tests whether dot-form instructions are generated for both dot and dot2 cases, that is, with just a CC output or also a GPR output; for all four basic shifts, with a register amount or an immediate amount. It also tests for superfluous zero-extends. This also tests if combine simplifies the rotates to right-rotates, which it shouldn't do anymore.
>
> Bootstrapped and tested as usual. Okay to commit?
>
> Segher
>
> 2014-07-02  Segher Boessenkool  seg...@kernel.crashing.org
>
> gcc/
> 	* config/rs6000/rs6000.md (rotlmode3, ashlmode3, lshrmode3,
> 	ashrmode3): Correct mode of operands[2].
> 	(rotlmode3_dot, rotlmode3_dot2, ashlmode3_dot, ashlmode3_dot2,
> 	lshrmode3_dot, lshrmode3_dot2, ashrmode3_dot, ashrmode3_dot2):
> 	Correct mode of operands[2]. Fix split condition.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/shift-dot.c: New test.

Okay.

Thanks, David
Re: [PATCH] PR preprocessor/60723 - missing system-ness marks for macro
Jason Merrill ja...@redhat.com writes: On 06/27/2014 03:27 AM, Dodji Seketeli wrote: + print.prev_was_system_token != !!in_system_header_at(loc)) +/* The system-ness of this token is different from the one + of the previous token. Let's emit a line change to + mark the new system-ness before we emit the token. */ +line_marker_emitted = do_line_change(pfile, token, loc, false); Missing spaces before '('. OK with that fixed. Thanks. It appeared that the patch was too eager to emit line changes, even for cases (like when preprocessing asm files) where a new line between tokens can be significant and turn a valid statement into an invalid one. I have updated the patch to prevent that and tested it again on x86_64-unknown-linux-gnu. Christophe Lyon (who reported this latest issue) tested it on his ARM-based system that exhibited the issue. The relevant hunk that changes is this one: @@ -248,9 +252,20 @@ scan_translation_unit (cpp_reader *pfile) if (cpp_get_options (parse_in)-debug) linemap_dump_location (line_table, token-src_loc, print.outf); + + if (do_line_adjustments + !in_pragma + !line_marker_emitted + print.prev_was_system_token != !!in_system_header_at(loc)) + /* The system-ness of this token is different from the one + of the previous token. Let's emit a line change to + mark the new system-ness before we emit the token. */ + line_marker_emitted = do_line_change (pfile, token, loc, false); cpp_output_token (token, print.outf); + line_marker_emitted = false; } + print.prev_was_system_token = !!in_system_header_at(loc); /* CPP_COMMENT tokens and raw-string literal tokens can have embedded new-line characters. Rather than enumerating all the possible token types just check if token uses In there, the change is that I am now testing that line adjustments are allowed and that we are not inside pragmas with the: + if (do_line_adjustments + !in_pragma This make the change coherent with what is done elsewhere in scan_translation_unit. 
OK to commit this latest version to trunk? gcc/c-family/ChangeLog: * c-ppoutput.c (struct print::prev_was_system_token): New data member. (init_pp_output): Initialize it. (maybe_print_line_1, maybe_print_line, print_line_1, print_line) (do_line_change): Return a flag saying if a line marker was emitted or not. (scan_translation_unit): Detect if the system-ness of the token we are about to emit is different from the one of the previously emitted token. If so, emit a line marker. Avoid emitting useless adjacent line markers. (scan_translation_unit_directives_only): Adjust. gcc/testsuite/ChangeLog: * gcc.dg/cpp/syshdr{4,5}.{c,h}: New test files. Signed-off-by: Dodji Seketeli do...@redhat.com git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@212194 138bc75d-0d04-0410-961f-82ee72b054a4 Signed-off-by: Dodji Seketeli do...@redhat.com --- gcc/c-family/ChangeLog | 15 gcc/c-family/c-ppoutput.c | 78 ++ gcc/testsuite/ChangeLog| 5 +++ gcc/testsuite/gcc.dg/cpp/syshdr4.c | 24 gcc/testsuite/gcc.dg/cpp/syshdr4.h | 8 gcc/testsuite/gcc.dg/cpp/syshdr5.c | 14 +++ gcc/testsuite/gcc.dg/cpp/syshdr5.h | 6 +++ 7 files changed, 126 insertions(+), 24 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/cpp/syshdr4.c create mode 100644 gcc/testsuite/gcc.dg/cpp/syshdr4.h create mode 100644 gcc/testsuite/gcc.dg/cpp/syshdr5.c create mode 100644 gcc/testsuite/gcc.dg/cpp/syshdr5.h diff --git a/gcc/c-family/c-ppoutput.c b/gcc/c-family/c-ppoutput.c index f3b5fa4..400d3a7 100644 --- a/gcc/c-family/c-ppoutput.c +++ b/gcc/c-family/c-ppoutput.c @@ -36,6 +36,8 @@ static struct unsigned char printed; /* Nonzero if something output at line. */ bool first_time; /* pp_file_change hasn't been called yet. */ const char *src_file;/* Current source file. 
*/ + bool prev_was_system_token; /* True if the previous token was a + system token.*/ } print; /* Defined and undefined macros being queued for output with -dU at @@ -58,11 +60,11 @@ static void account_for_newlines (const unsigned char *, size_t); static int dump_macro (cpp_reader *, cpp_hashnode *, void *); static void dump_queued_macros (cpp_reader *); -static void print_line_1 (source_location, const char*, FILE *); -static void print_line (source_location, const char *); -static void maybe_print_line_1 (source_location, FILE *); -static void maybe_print_line (source_location); -static void do_line_change (cpp_reader *, const cpp_token *, +static bool print_line_1
[patch] fix build failure of x86_64-mingw32, missing crtbegin/crtend.o
Hello,

From gcc/config/i386/mingw32.h, STARTFILE_SPEC and ENDFILE_SPEC include crtbegin.o and crtend.o unconditionally.

libgcc/config.host includes crtbegin.o and crtend.o in extra_parts for i[34567]86-*-mingw* but not for x86_64-*-mingw*.

Building a toolchain for x86_64-pc-mingw32 then rapidly fails with complaints about crtbegin.o and crtend.o missing.

This patch is a proposal to fix this by adding the objects to extra_parts, as well as i386/t-cygming to tmake_file so rules are available to build the objects.

Tested by verifying that a build with --target=x86_64-pc-mingw32 proceeds to completion after the change.

OK to commit?

Thanks in advance for your feedback,

With Kind Regards,

Olivier

2014-07-02  Olivier Hainque  hain...@adacore.com

libgcc/
	* config.host (x86_64-*-mingw*): Add i386/t-cygming to tmake_file
	and crtbegin.o + crtend.o to extra_parts.

mingw-crtstuff.diff Description: Binary data
[PATCH, ARM] Work around erratum in VFP9
The VFP9 floating-point unit (as occasionally used with ARM9 devices) has an erratum (760019) whereby it is possible for floating-point division and square-root instructions to be executed twice. This is not a problem if the destination register is not used as an input, but can cause incorrect results if it is.

The safest work-around for this issue is to make the compiler treat these instructions as early-clobber; this ensures that the conditions for result corruption cannot occur.

This patch takes that approach, but relaxes back to the original behaviour when either the architecture level is ARMv6 or higher or the VFP sub-architecture level is VFPv3 or higher; if either of these is true then the code cannot run on an affected part.

2014-07-03  Richard Earnshaw  rearn...@arm.com

	* arm.md (arch): Add armv6_or_vfpv3.
	(arch_enabled): Add test for the above.
	* vfp.md (divsf_vfp, divdf_vfp): Add earlyclobber when code can run
	on VFP9.
	(sqrtsf_vfp, sqrtdf_vfp): Likewise.

Committed to trunk.

R.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 97753ce..674565c 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -127,7 +127,7 @@
 ; for ARM or Thumb-2 with arm_arch6, and nov6 for ARM without
 ; arm_arch6. This attribute is used to compute attribute enabled,
 ; use type any to enable an alternative in all cases.
-(define_attr arch any,a,t,32,t1,t2,v6,nov6,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2 +(define_attr arch any,a,t,32,t1,t2,v6,nov6,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3 (const_string any)) (define_attr arch_enabled no,yes @@ -174,7 +174,12 @@ (and (eq_attr arch iwmmxt2) (match_test TARGET_REALLY_IWMMXT2)) -(const_string yes)] +(const_string yes) + +(and (eq_attr arch armv6_or_vfpv3) + (match_test arm_arch6 || TARGET_VFP3)) +(const_string yes) + ] (const_string no))) diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md index e1a48ee..1c9ff19 100644 --- a/gcc/config/arm/vfp.md +++ b/gcc/config/arm/vfp.md @@ -714,25 +714,30 @@ ;; Division insns +; VFP9 Erratum 760019: It's potentially unsafe to overwrite the input +; operands, so mark the output as early clobber for VFPv2 on ARMv5 or +; earlier. (define_insn *divsf3_vfp - [(set (match_operand:SF0 s_register_operand =t) - (div:SF (match_operand:SF 1 s_register_operand t) - (match_operand:SF 2 s_register_operand t)))] + [(set (match_operand:SF0 s_register_operand =t,t) + (div:SF (match_operand:SF 1 s_register_operand t,t) + (match_operand:SF 2 s_register_operand t,t)))] TARGET_32BIT TARGET_HARD_FLOAT TARGET_VFP fdivs%?\\t%0, %1, %2 [(set_attr predicable yes) (set_attr predicable_short_it no) + (set_attr arch *,armv6_or_vfpv3) (set_attr type fdivs)] ) (define_insn *divdf3_vfp - [(set (match_operand:DF0 s_register_operand =w) - (div:DF (match_operand:DF 1 s_register_operand w) - (match_operand:DF 2 s_register_operand w)))] + [(set (match_operand:DF0 s_register_operand =w,w) + (div:DF (match_operand:DF 1 s_register_operand w,w) + (match_operand:DF 2 s_register_operand w,w)))] TARGET_32BIT TARGET_HARD_FLOAT TARGET_VFP_DOUBLE fdivd%?\\t%P0, %P1, %P2 [(set_attr predicable yes) (set_attr predicable_short_it no) + (set_attr arch *,armv6_or_vfpv3) (set_attr type fdivd)] ) @@ -1070,23 +1075,28 @@ ;; Sqrt insns. 
+; VFP9 Erratum 760019: It's potentially unsafe to overwrite the input +; operands, so mark the output as early clobber for VFPv2 on ARMv5 or +; earlier. (define_insn *sqrtsf2_vfp - [(set (match_operand:SF 0 s_register_operand =t) - (sqrt:SF (match_operand:SF 1 s_register_operand t)))] + [(set (match_operand:SF 0 s_register_operand =t,t) + (sqrt:SF (match_operand:SF 1 s_register_operand t,t)))] TARGET_32BIT TARGET_HARD_FLOAT TARGET_VFP fsqrts%?\\t%0, %1 [(set_attr predicable yes) (set_attr predicable_short_it no) + (set_attr arch *,armv6_or_vfpv3) (set_attr type fsqrts)] ) (define_insn *sqrtdf2_vfp - [(set (match_operand:DF 0 s_register_operand =w) - (sqrt:DF (match_operand:DF 1 s_register_operand w)))] + [(set (match_operand:DF 0 s_register_operand =w,w) + (sqrt:DF (match_operand:DF 1 s_register_operand w,w)))] TARGET_32BIT TARGET_HARD_FLOAT TARGET_VFP_DOUBLE fsqrtd%?\\t%P0, %P1 [(set_attr predicable yes) (set_attr predicable_short_it no) + (set_attr arch *,armv6_or_vfpv3) (set_attr type fsqrtd)] )
[PATCH][ARM/AArch64 Testsuite] Fix vext[us]64_1.c test on ARM by unsharing test body
Moving into own thread from https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01895.html

This fixes the compilation failures of gcc.target/arm/simd/vexts64_1.c and gcc.target/arm/simd/vextu64_1.c that I introduced in r by unsharing the test body on AArch64. (As [u]int64x1_t are vector types on AArch64 but scalar types on ARM.)

gcc/testsuite/ChangeLog:

* gcc.target/arm/simd/vexts64_1.c: Remove #include, inline test body.
* gcc.target/arm/simd/vextu64_1.c: Likewise.
* gcc.target/aarch64/simd/ext_s64_1.c: Likewise.
* gcc.target/aarch64/simd/ext_u64_1.c: Likewise.
* gcc.target/aarch64/simd/ext_s64.x: Remove.
* gcc.target/aarch64/simd/ext_u64.x: Remove.

On arm-none-eabi (arm-eabi-aem/-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard):

FAIL->PASS: gcc.target/arm/simd/vexts64_1.c (test for excess errors)
UNRESOLVED->NA: gcc.target/arm/simd/vexts64_1.c compilation failed to produce executable
NA->PASS: gcc.target/arm/simd/vexts64_1.c execution test
FAIL->PASS: gcc.target/arm/simd/vextu64_1.c (test for excess errors)
UNRESOLVED->NA: gcc.target/arm/simd/vextu64_1.c compilation failed to produce executable
NA->PASS: gcc.target/arm/simd/vextu64_1.c execution test

No changes on aarch64-none-elf.

Index: gcc/testsuite/gcc.target/arm/simd/vexts64_1.c
===
--- gcc/testsuite/gcc.target/arm/simd/vexts64_1.c (revision 211933)
+++ gcc/testsuite/gcc.target/arm/simd/vexts64_1.c (working copy)
@@ -6,7 +6,22 @@
 /* { dg-add-options arm_neon } */
 
 #include <arm_neon.h>
-#include "../../aarch64/simd/ext_s64.x"
+extern void abort (void);
+
+int
+main (int argc, char **argv)
+{
+  int64_t arr1[] = {0};
+  int64x1_t in1 = vld1_s64 (arr1);
+  int64_t arr2[] = {1};
+  int64x1_t in2 = vld1_s64 (arr2);
+  int64x1_t actual = vext_s64 (in1, in2, 0);
+  if (actual != in1)
+    abort ();
+
+  return 0;
+}
+
 
 /* Don't scan assembler for vext - it can be optimized into a move from r0.  */
 /* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/arm/simd/vextu64_1.c
===
--- gcc/testsuite/gcc.target/arm/simd/vextu64_1.c (revision 211933)
+++ gcc/testsuite/gcc.target/arm/simd/vextu64_1.c (working copy)
@@ -6,7 +6,22 @@
 /* { dg-add-options arm_neon } */
 
 #include <arm_neon.h>
-#include "../../aarch64/simd/ext_u64.x"
+extern void abort (void);
+
+int
+main (int argc, char **argv)
+{
+  uint64_t arr1[] = {0};
+  uint64x1_t in1 = vld1_u64 (arr1);
+  uint64_t arr2[] = {1};
+  uint64x1_t in2 = vld1_u64 (arr2);
+  uint64x1_t actual = vext_u64 (in1, in2, 0);
+  if (actual != in1)
+    abort ();
+
+  return 0;
+}
+
 
 /* Don't scan assembler for vext - it can be optimized into a move from r0.  */
 /* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/aarch64/simd/ext_s64.x
===
--- gcc/testsuite/gcc.target/aarch64/simd/ext_s64.x (revision 211933)
+++ gcc/testsuite/gcc.target/aarch64/simd/ext_s64.x (working copy)
@@ -1,17 +0,0 @@
-extern void abort (void);
-
-int
-main (int argc, char **argv)
-{
-  int i, off;
-  int64_t arr1[] = {0};
-  int64x1_t in1 = vld1_s64 (arr1);
-  int64_t arr2[] = {1};
-  int64x1_t in2 = vld1_s64 (arr2);
-  int64x1_t actual = vext_s64 (in1, in2, 0);
-  if (actual[0] != in1[0])
-    abort ();
-
-  return 0;
-}
-
Index: gcc/testsuite/gcc.target/aarch64/simd/ext_u64.x
===
--- gcc/testsuite/gcc.target/aarch64/simd/ext_u64.x (revision 211933)
+++ gcc/testsuite/gcc.target/aarch64/simd/ext_u64.x (working copy)
@@ -1,17 +0,0 @@
-extern void abort (void);
-
-int
-main (int argc, char **argv)
-{
-  int i, off;
-  uint64_t arr1[] = {0};
-  uint64x1_t in1 = vld1_u64 (arr1);
-  uint64_t arr2[] = {1};
-  uint64x1_t in2 = vld1_u64 (arr2);
-  uint64x1_t actual = vext_u64 (in1, in2, 0);
-  if (actual[0] != in1[0])
-    abort ();
-
-  return 0;
-}
-
Index: gcc/testsuite/gcc.target/aarch64/simd/ext_u64_1.c
===
--- gcc/testsuite/gcc.target/aarch64/simd/ext_u64_1.c (revision 211933)
+++ gcc/testsuite/gcc.target/aarch64/simd/ext_u64_1.c (working copy)
@@ -4,8 +4,23 @@
 /* { dg-options "-save-temps -O3 -fno-inline" } */
 
 #include <arm_neon.h>
-#include "ext_u64.x"
+extern void abort (void);
+
+int
+main (int argc, char **argv)
+{
+  uint64_t arr1[] = {0};
+  uint64x1_t in1 = vld1_u64 (arr1);
+  uint64_t arr2[] = {1};
+  uint64x1_t in2 = vld1_u64 (arr2);
+  uint64x1_t actual = vext_u64 (in1, in2, 0);
+  if (actual[0] != in1[0])
+    abort ();
+
+  return 0;
+}
+
 
 /* Do not scan-assembler. An EXT instruction could be emitted, but would merely
    return its first argument, so it is legitimate to optimize it out.  */
 /* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/aarch64/simd/ext_s64_1.c
Re: [patch 1/4] change specific int128 - generic intN
Yes. That's exactly the problem I'm trying to solve here. I'm making partial int modes have real corresponding types, and they can be any bit size, with target PS*modes to match. The MSP430, for example, has 20-bit modes, 20-bit operands, and __int20. Rounding up to byte sizes forces everything into an emulated SImode which makes code size huge and performance much worse.

And the hardware really loads 20 bits and not 24 bits? If so, I think you might want to consider changing the unit to 4 bits instead of 8 bits. If no, the mode is padded and has 24-bit size so why is setting TYPE_PRECISION to 20 not sufficient to achieve what you want?

Thus, in these cases, TYPE_SIZE and TYPE_SIZE_UNIT no longer have a * BITS_PER_UNIT mathematical relationship.

I'm skeptical this can work, it's pretty fundamental.

-- 
Eric Botcazou
Re: [patch 1/4] change specific int128 - generic intN
And the hardware really loads 20 bits and not 24 bits? If so, I think you might want to consider changing the unit to 4 bits instead of 8 bits. If no, the mode is padded and has 24-bit size so why is setting TYPE_PRECISION to 20 not sufficient to achieve what you want?

The hardware transfers data in and out of byte-oriented memory in TYPE_SIZE_UNIT chunks. Once in a hardware register, all operations are either 8, 16, or 20 bits (TYPE_SIZE) in size. So yes, values are padded in memory, but no, they are not padded in registers.

Setting TYPE_PRECISION is mostly useless, because most of gcc assumes it's the same as TYPE_SIZE and ignores it. Heck, most of gcc is oblivious to the idea that types might not be powers-of-two in size. GCC doesn't even bother with a DECL_PRECISION.

Thus, in these cases, TYPE_SIZE and TYPE_SIZE_UNIT no longer have a * BITS_PER_UNIT mathematical relationship.

I'm skeptical this can work, it's pretty fundamental.

It seems to work just fine in testing, and I'm trying to make it non-fundamental.
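To make the size-versus-precision distinction in this exchange concrete, here is a minimal sketch in plain C. It is not GCC code: `units_for_precision` and `wrap_to_precision` are hypothetical helpers, and `BITS_PER_UNIT` is simply assumed to be 8. It only shows that a 20-bit value needs at least three bytes of storage, so storage size times 8 no longer equals the precision, while register-style arithmetic wraps at 20 bits. A real target may pad the in-memory form further than this minimum.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_UNIT 8 /* assumed: byte-addressed memory */

/* Hypothetical helper: smallest number of storage units that can hold
   `precision` bits.  */
static unsigned units_for_precision(unsigned precision)
{
    return (precision + BITS_PER_UNIT - 1) / BITS_PER_UNIT;
}

/* Hypothetical helper: wrap a value to `precision` bits, the way an
   unpadded register of that width would.  */
static uint32_t wrap_to_precision(uint32_t value, unsigned precision)
{
    assert(precision > 0 && precision < 32);
    return value & ((UINT32_C(1) << precision) - 1);
}
```

For a 16-bit type the two views agree (2 units, 16 bits); for a 20-bit type they do not, which is the mismatch the thread is arguing about.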
Re: [patch 1/4] change specific int128 - generic intN
On 07/03/2014 06:12 PM, DJ Delorie wrote:

The hardware transfers data in and out of byte-oriented memory in TYPE_SIZE_UNIT chunks. Once in a hardware register, all operations are either 8, 16, or 20 bits (TYPE_SIZE) in size. So yes, values are padded in memory, but no, they are not padded in registers. Setting TYPE_PRECISION is mostly useless, because most of gcc assumes it's the same as TYPE_SIZE and ignores it.

That's what'll need fixing then. I doubt there are too many places that require changing. Also, the above seems inaccurate:

$ grep TYPE_PREC *.c|wc -l
633
$ grep TYPE_SIZE *.c|wc -l
551

Heck, most of gcc is oblivious to the idea that types might not be powers-of-two in size. GCC doesn't even bother with a DECL_PRECISION.

Sure - why would you even need one?

Thus, in these cases, TYPE_SIZE and TYPE_SIZE_UNIT no longer have a * BITS_PER_UNIT mathematical relationship. I'm skeptical this can work, it's pretty fundamental.

It seems to work just fine in testing, and I'm trying to make it non-fundamental.

I also think this is not a very good idea.

Bernd
Re: [PATCH, ARM] Work around erratum in VFP9
On Thu, Jul 3, 2014 at 4:15 PM, Richard Earnshaw rearn...@arm.com wrote:

The VFP9 floating-point unit (as occasionally used with ARM9 devices) has an erratum (760019) whereby it is possible for floating-point division and square-root instructions to be executed twice. This is not a problem if the destination register is not used as an input, but it can cause incorrect results if it is.

The safest work-around for this issue is to make the compiler treat these instructions as early-clobber; this ensures that the conditions for result corruption cannot occur.

This patch takes that approach, but relaxes back to the original behaviour when either the architecture level is ARMv6 or higher or the VFP sub-architecture level is VFPv3 or higher; if either of these is true then the code cannot run on an affected part.

2014-07-03 Richard Earnshaw rearn...@arm.com

* arm.md (arch): Add armv6_or_vfpv3.
(arch_enabled): Add test for the above.
* vfp.md (divsf_vfp, divdf_vfp): Add earlyclobber when code can run on VFP9.
(sqrtsf_vfp, sqrtdf_vfp): Likewise.

Richard, you would need full relative path for back-end files here.

Thanks, bin

Committed to trunk. R.
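Why a repeated division only matters when the destination overlaps an input can be shown with a toy simulation. This is a sketch under stated assumptions: the four-entry register file, `div_maybe_twice` and `run` are all hypothetical and do not model real VFP9 hardware or GCC's register allocator. It only illustrates the arithmetic behind the early-clobber work-around.

```c
#include <assert.h>

/* Toy register file; purely illustrative, not a model of VFP9.  */
static double regs[4];

/* Hypothetical divide that a faulty unit may execute once or twice.  */
static void div_maybe_twice(int dst, int num, int den, int times)
{
    assert(times == 1 || times == 2);
    for (int i = 0; i < times; i++)
        regs[dst] = regs[num] / regs[den];
}

/* Reset the registers, run one divide, and return the destination.  */
static double run(int dst, int num, int den, int times)
{
    regs[0] = 0.0;
    regs[1] = 12.0;
    regs[2] = 4.0;
    regs[3] = 0.0;
    div_maybe_twice(dst, num, den, times);
    return regs[dst];
}
```

With a distinct destination, `run(0, 1, 2, 2)` still yields 3.0 because the second execution recomputes the same quotient. With the destination aliased to the numerator, `run(1, 1, 2, 2)` divides the first result again and yields 0.75. Forcing the destination to differ from every input, which is what an early-clobber constraint asks of the allocator, rules out the second case.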
Re: [GOMP4, OpenACC] Fixed-form Fortran code failing to parse
Hi Cesar!

On Wed, 2 Jul 2014 17:27:48 -0700, Cesar Philippidis ce...@codesourcery.com wrote:

Thomas, is this patch ok for gomp-4_0-branch?

Tobias has approved the patch (and I'm confirming it does fix the issue); thanks to you and Tobias for looking into this!

If so, please check it in.

I'm happy to, but why don't we just get you set up? You're covered by the general Mentor Graphics/CodeSourcery copyright assignment. Please request an account per http://gcc.gnu.org/svnwrite.html, and on https://sourceware.org/cgi-bin/pdw/ps_form.cgi put in tho...@schwinge.name as approver.

Grüße, Thomas
PR C++/60209 - Declaration of user-defined literal operator cause error
Support operator (...) per CWG 1473. I'll be AFK over the holiday. Bootstrapped and tested on x86_64-linux. OK? I'm less sure if this is appropriate for 4.9. Index: cp/parser.c === --- cp/parser.c (revision 212248) +++ cp/parser.c (working copy) @@ -1895,7 +1895,7 @@ static tree cp_parser_identifier (cp_parser *); static tree cp_parser_string_literal - (cp_parser *, bool, bool); + (cp_parser *, bool, bool, bool); static tree cp_parser_userdef_char_literal (cp_parser *); static tree cp_parser_userdef_string_literal @@ -3566,7 +3566,8 @@ FUTURE: ObjC++ will need to handle @-strings here. */ static tree -cp_parser_string_literal (cp_parser *parser, bool translate, bool wide_ok) +cp_parser_string_literal (cp_parser *parser, bool translate, bool wide_ok, + bool lookup_udlit = true) { tree value; size_t count; @@ -3721,7 +3722,10 @@ { tree literal = build_userdef_literal (suffix_id, value, OT_NONE, NULL_TREE); - value = cp_parser_userdef_string_literal (literal); + if (lookup_udlit) + value = cp_parser_userdef_string_literal (literal); + else + value = literal; } } else @@ -12635,7 +12639,7 @@ { tree id = NULL_TREE; cp_token *token; - bool bad_encoding_prefix = false; + bool utf8 = false; /* Peek at the next token. */ token = cp_lexer_peek_token (parser-lexer); @@ -12835,83 +12839,73 @@ cp_parser_require (parser, CPP_CLOSE_SQUARE, RT_CLOSE_SQUARE); return ansi_opname (ARRAY_REF); +case CPP_UTF8STRING: +case CPP_UTF8STRING_USERDEF: + utf8 = true; +case CPP_STRING: case CPP_WSTRING: case CPP_STRING16: case CPP_STRING32: -case CPP_UTF8STRING: - bad_encoding_prefix = true; - /* Fall through. */ - -case CPP_STRING: - if (cxx_dialect == cxx98) - maybe_warn_cpp0x (CPP0X_USER_DEFINED_LITERALS); - if (bad_encoding_prefix) - { - error (invalid encoding prefix in literal operator); - return error_mark_node; - } - if (TREE_STRING_LENGTH (token-u.value) 2) - { - error (expected empty string after %operator% keyword); - return error_mark_node; - } - /* Consume the string. 
*/ - cp_lexer_consume_token (parser-lexer); - /* Look for the suffix identifier. */ - token = cp_lexer_peek_token (parser-lexer); - if (token-type == CPP_NAME) - { - id = cp_parser_identifier (parser); - if (id != error_mark_node) - { - const char *name = IDENTIFIER_POINTER (id); - return cp_literal_operator_id (name); - } - } - else if (token-type == CPP_KEYWORD) - { - error (unexpected keyword; - remove space between quotes and suffix identifier); - return error_mark_node; - } - else - { - error (expected suffix identifier); - return error_mark_node; - } - +case CPP_STRING_USERDEF: case CPP_WSTRING_USERDEF: case CPP_STRING16_USERDEF: case CPP_STRING32_USERDEF: -case CPP_UTF8STRING_USERDEF: - bad_encoding_prefix = true; - /* Fall through. */ + { + tree str, string_tree; + int sz, len; -case CPP_STRING_USERDEF: - if (cxx_dialect == cxx98) - maybe_warn_cpp0x (CPP0X_USER_DEFINED_LITERALS); - if (bad_encoding_prefix) - { - error (invalid encoding prefix in literal operator); + if (cxx_dialect == cxx98) + maybe_warn_cpp0x (CPP0X_USER_DEFINED_LITERALS); + + /* Consume the string. */ + str = cp_parser_string_literal (parser, /*translate=*/true, + /*wide_ok=*/true, /*lookup_udlit=*/false); + if (str == error_mark_node) return error_mark_node; - } - { - tree string_tree = USERDEF_LITERAL_VALUE (token-u.value); - if (TREE_STRING_LENGTH (string_tree) 2) + else if (TREE_CODE (str) == USERDEF_LITERAL) { + string_tree = USERDEF_LITERAL_VALUE (str); + id = USERDEF_LITERAL_SUFFIX_ID (str); + } + else + { + string_tree = str; + /* Look for the suffix identifier. */ + token = cp_lexer_peek_token (parser-lexer); + if (token-type == CPP_NAME) + id = cp_parser_identifier (parser); + else if (token-type == CPP_KEYWORD) + { + error (unexpected keyword; + remove space between quotes and suffix identifier); + return error_mark_node; + } + else + { + error (expected suffix identifier); + return error_mark_node; + } + } + sz = TREE_INT_CST_LOW
Re: Strenghten assumption about dynamic type changes (placement new)
On 07/02/2014 06:30 PM, Jan Hubicka wrote:

But this is one of the things that was not quite clear to me. I know that polymorphic type A was created at a given memory location. This means that accesses to that location in one alias class have been made. Now I destroy A and turn it into B, construct B and make memory accesses in a different alias set. I see this has a chance to work if one is a base of another, but if B is a completely different type, I think strict aliasing should just make those accesses not alias and in turn make the whole thing undefined?

Right, if they're unrelated types the accesses don't alias (3.10p10).

On the subject of aliasing, there's a proposal to add explicit alias sets to C++: http://open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3988.pdf

Any thoughts?

Thanks! I will take a look.

I would like to decide what to do with this approach. I can see

1) I can start explicitly tracking whether a type came from a declaration or was detected dynamically (by seeing a vtable write or constructor call). Currently we don't do the second, but I would like to understand whether these make a difference.

2) The code in question first detects that a type in a given variable is fully constructed and then starts tracking it across function calls. If needed I can check if my unwind stack contains some additional constructors/destructors that may possibly be currently destructing the type, if it is valid to call the destructor and construct the same type there again. If one can destruct the type there and build a completely different type, we need updates elsewhere in the devirt machinery, too.

Thanks! Honza

Jason
Re: [PATCH] Don't run guality.exp tests with LTO_TORTURE_OPTIONS.
On July 3, 2014 9:55:36 AM CEST, Jakub Jelinek ja...@redhat.com wrote:
On Thu, Jul 03, 2014 at 09:41:15AM +0200, Richard Biener wrote:
On July 3, 2014 7:37:13 AM CEST, Jakub Jelinek ja...@redhat.com wrote:
On Wed, Jul 02, 2014 at 04:06:30PM -0700, Jason Merrill wrote:

I think that makes sense; I'm not aware of anyone working on improving LTO debugging.

I think at this point all we care about is that with -flto we don't ICE on those, perhaps we should arrange to change all the tests into dg-do compile with -flto and ignore all gdb-test and have some env var override which would force full testing also with -flto?

I think the individual tests that currently fail can be appropriately changed, no?

That is hard, as whether a test fails heavily depends on the optimization flags and targets, so maintaining xfails would be a nightmare.

Well, simply removing the regression testing for LTO is a maintenance nightmare as well. The guality testsuite is very noisy anyway with all the xfail and xpass.

Richard.

BTW, the trunk has lots of guality regressions even on x86_64-linux compared to 4.9 branch now :(, some of them are LTO only, but others are not.
+FAIL: gcc.dg/guality/pr36728-1.c -O1 line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-1.c -O1 line 18 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-1.c -O2 line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-1.c -O2 line 18 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-1.c -O3 -fomit-frame-pointer line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-1.c -O3 -fomit-frame-pointer line 18 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-1.c -O3 -g line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-1.c -O3 -g line 18 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-1.c -Os line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-1.c -Os line 18 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 18 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 18 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-3.c -O1 line 14 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-3.c -O1 line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-3.c -O2 line 14 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-3.c -O2 line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-3.c -O3 -fomit-frame-pointer line 14 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-3.c -O3 -fomit-frame-pointer line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-3.c -O3 -g line 14 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-3.c -O3 -g line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-3.c -Os line 14 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-3.c -Os line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 14 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-3.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 14 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-3.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 16 arg7 == 30
-XPASS: gcc.dg/guality/pr41353-1.c -O1 line 28 j == 28 + 37
-XPASS: gcc.dg/guality/pr41353-1.c -O2 line 28 j == 28 + 37
-XPASS: gcc.dg/guality/pr41353-1.c -O3 -fomit-frame-pointer line 28 j == 28 + 37
-XPASS: gcc.dg/guality/pr41353-1.c -O3 -g line 28 j == 28 + 37
-XPASS: gcc.dg/guality/pr41353-1.c -Os line 28 j == 28 + 37
-XPASS: gcc.dg/guality/pr41353-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 28 j == 28 + 37
-XPASS: gcc.dg/guality/pr41353-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 28 j == 28 + 37
+FAIL: gcc.dg/guality/pr43051-1.c -O2 line 35 v == 1
+FAIL: gcc.dg/guality/pr43051-1.c -O2 line 36 e == a[1]
+FAIL: gcc.dg/guality/pr43051-1.c -O2 line 39 c == a[0]
+FAIL: gcc.dg/guality/pr43051-1.c -O2 line 40 v == 1
+FAIL: gcc.dg/guality/pr43051-1.c -O2 line 41 e == a[1]
+FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer line 35 v == 1
+FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer line 36 e == a[1]
+FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer line 39 c == a[0]
+FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer line 40 v == 1
+FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer line 41 e == a[1]
+FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer -funroll-loops line 35 v == 1
+FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer -funroll-loops line 36 e == a[1]
+FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer -funroll-loops line 39 c == a[0]
+FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer -funroll-loops line 40 v == 1
+FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer -funroll-loops line
Re: [PATCH] Don't run guality.exp tests with LTO_TORTURE_OPTIONS.
On Thu, Jul 03, 2014 at 08:37:07PM +0200, Richard Biener wrote:

Well, simply removing the regression testing for LTO is a maintenance nightmare as well. The guality testsuite is very noisy anyway with all the xfail and xpass.

Let's keep it as is then?

Jakub
[PATCH] Fix high handling in wi::mul_internal (PR tree-optimization/61682)
Hi!

Several places in wi::mul_internal didn't handle the high parameter and would return the low bits instead of the high bits.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-07-03 Jakub Jelinek ja...@redhat.com

PR tree-optimization/61682
* wide-int.cc (wi::mul_internal): Handle high correctly for umul_ppmm using cases and when one of the operands is equal to 1.

* gcc.c-torture/execute/pr61682.c: New test.

--- gcc/wide-int.cc.jj 2014-05-30 10:51:11.0 +0200
+++ gcc/wide-int.cc 2014-07-03 09:35:11.084228924 +0200
@@ -1,5 +1,5 @@
 /* Operations with very long integers.
-   Copyright (C) 2012-2013 Free Software Foundation, Inc.
+   Copyright (C) 2012-2014 Free Software Foundation, Inc.
    Contributed by Kenneth Zadeck zad...@naturalbridge.com
 
 This file is part of GCC.
@@ -1282,6 +1282,12 @@ wi::mul_internal (HOST_WIDE_INT *val, co
           && wi::fits_uhwi_p (op1)
           && wi::fits_uhwi_p (op2))
         {
+          /* This case never overflows.  */
+          if (high)
+            {
+              val[0] = 0;
+              return 1;
+            }
           umul_ppmm (val[1], val[0], op1.ulow (), op2.ulow ());
           return 1 + (val[1] != 0 || val[0] < 0);
         }
@@ -1294,6 +1300,8 @@ wi::mul_internal (HOST_WIDE_INT *val, co
           umul_ppmm (upper, val[0], op1.ulow (), op2.ulow ());
           if (needs_overflow)
             *overflow = (upper != 0);
+          if (high)
+            val[0] = upper;
           return 1;
         }
     }
@@ -1302,12 +1310,28 @@ wi::mul_internal (HOST_WIDE_INT *val, co
   /* Handle multiplications by 1.  */
   if (op1 == 1)
     {
+      if (high)
+        {
+          if (sgn == SIGNED && wi::neg_p (op2))
+            val[0] = -1;
+          else
+            val[0] = 0;
+          return 1;
+        }
       for (i = 0; i < op2len; i++)
         val[i] = op2val[i];
       return op2len;
     }
   if (op2 == 1)
     {
+      if (high)
+        {
+          if (sgn == SIGNED && wi::neg_p (op1))
+            val[0] = -1;
+          else
+            val[0] = 0;
+          return 1;
+        }
       for (i = 0; i < op1len; i++)
         val[i] = op1val[i];
       return op1len;
--- gcc/testsuite/gcc.c-torture/execute/pr61682.c.jj 2014-03-19 15:57:57.735114622 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr61682.c 2014-07-03 09:40:26.215520476 +0200
@@ -0,0 +1,17 @@
+/* PR tree-optimization/61682 */
+
+int a, b;
+static int *c = &b;
+
+int
+main ()
+{
+  int *d = &a;
+  for (a = 0; a < 12; a++)
+    *c |= *d / 9;
+
+  if (b != 1)
+    __builtin_abort ();
+
+  return 0;
+}

Jakub
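The multiply-by-1 shortcut in the patch can be cross-checked at a small fixed width. The following is a minimal sketch in plain C with 32-bit operands and a 64-bit product. It is not the wide-int code, and the function names are hypothetical. It shows that the high half of `x * 1` is all ones for a negative signed `x` and zero otherwise, which is what the patch stores in `val[0]` when `high` is set.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of the 64-bit product of two signed 32-bit values.  */
static uint32_t mul_high_signed(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    return (uint32_t)((uint64_t)product >> 32);
}

/* High 32 bits of the 64-bit product of two unsigned 32-bit values.  */
static uint32_t mul_high_unsigned(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Shortcut for multiplication by 1: the high half is pure sign fill.  */
static uint32_t high_of_times_one(int32_t a, int is_signed)
{
    return (is_signed && a < 0) ? UINT32_MAX : 0;
}

/* Compare the shortcut against the full multiplication for one value.  */
static void check_shortcut(int32_t a)
{
    assert(mul_high_signed(a, 1) == high_of_times_one(a, 1));
    assert(mul_high_unsigned((uint32_t)a, 1) == high_of_times_one(a, 0));
}
```

Calling `check_shortcut` on a few positive and negative values exercises both branches. The unsigned case always gives zero because a 32-bit value times 1 never carries into the upper half.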
[PATCH] Fix recognize_single_bit_test (PR tree-optimization/61684)
Hi!

The rhs1 of CONVERT_EXPR_CODE_P doesn't have to be a SSA_NAME, can be e.g. invariant like ADDR_EXPR of a var, but ifcombine didn't think about that possibility.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.9/4.8?

2014-07-03 Jakub Jelinek ja...@redhat.com

PR tree-optimization/61684
* tree-ssa-ifcombine.c (recognize_single_bit_test): Make sure rhs1 of conversion is a SSA_NAME before using SSA_NAME_DEF_STMT on it.

* gcc.c-torture/compile/pr61684.c: New test.

--- gcc/tree-ssa-ifcombine.c.jj 2014-06-06 09:19:22.0 +0200
+++ gcc/tree-ssa-ifcombine.c 2014-07-03 11:46:25.868335148 +0200
@@ -233,7 +233,8 @@ recognize_single_bit_test (gimple cond,
       while (is_gimple_assign (stmt)
              && ((CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt))
                   && (TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt)))
-                      <= TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (stmt)))))
+                      <= TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (stmt))))
+                  && TREE_CODE (gimple_assign_rhs1 (stmt)) == SSA_NAME)
                  || gimple_assign_ssa_name_copy_p (stmt)))
         stmt = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt));
 
--- gcc/testsuite/gcc.c-torture/compile/pr61684.c.jj 2014-07-03 12:06:46.654858358 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr61684.c 2014-07-03 12:09:05.016123771 +0200
@@ -0,0 +1,15 @@
+/* PR tree-optimization/61684 */
+
+int a, c;
+static int *b = 0;
+short d;
+static short **e = 0;
+
+void
+foo ()
+{
+  for (; c < 1; c++)
+    ;
+  *e = &d;
+  a = d && (c && 1) & *b;
+}

Jakub
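The shape of this fix, checking what kind of operand you hold before following its definition, can be sketched outside GCC. Everything below is hypothetical: `toy_operand`, `toy_stmt` and `toy_skip_conversions` are illustrative stand-ins, not GCC's gimple or tree types. The point is only that a conversion whose operand is an invariant has no defining statement to walk to, so the loop must stop there.

```c
#include <assert.h>
#include <stddef.h>

/* Toy operand kinds; stand-ins for an SSA name versus an invariant
   such as the address of a variable.  */
enum toy_kind { TOY_SSA_NAME, TOY_INVARIANT };

struct toy_stmt;

struct toy_operand {
    enum toy_kind kind;
    struct toy_stmt *def; /* only meaningful when kind == TOY_SSA_NAME */
};

struct toy_stmt {
    int is_conversion;
    struct toy_operand rhs1;
};

/* Walk back through conversions, but only while the operand really is a
   name with a defining statement.  */
static const struct toy_stmt *toy_skip_conversions(const struct toy_stmt *s)
{
    assert(s != NULL);
    while (s->is_conversion
           && s->rhs1.kind == TOY_SSA_NAME
           && s->rhs1.def != NULL)
        s = s->rhs1.def;
    return s;
}
```

Without the kind check, the walk would read `def` from an operand for which that field has no meaning, which is the toy analogue of calling SSA_NAME_DEF_STMT on something that is not an SSA_NAME.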
[PATCH] Fix ICE with thunks (PR middle-end/61654)
Hi!

update_ssa that expand_thunk calls, if it needs to change anything, computes CDI_DOMINATORS, but we assert that dominators are not computed when we release e.g. an unused thunk.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.9?

2014-07-03 Jakub Jelinek ja...@redhat.com

PR middle-end/61654
* cgraphunit.c (expand_thunk): Call free_dominance_info.

* g++.dg/opt/pr61654.C: New test.

--- gcc/cgraphunit.c.jj 2014-07-01 19:38:24.0 +0200
+++ gcc/cgraphunit.c 2014-07-03 15:51:44.329423346 +0200
@@ -1693,6 +1693,7 @@ expand_thunk (struct cgraph_node *node,
 #ifdef ENABLE_CHECKING
       verify_flow_info ();
 #endif
+      free_dominance_info (CDI_DOMINATORS);
 
       /* Since we want to emit the thunk, we explicitly mark its name as
          referenced.  */
--- gcc/testsuite/g++.dg/opt/pr61654.C.jj 2014-07-03 15:55:42.413163208 +0200
+++ gcc/testsuite/g++.dg/opt/pr61654.C 2014-07-03 15:55:16.0 +0200
@@ -0,0 +1,27 @@
+// PR middle-end/61654
+// { dg-do compile }
+
+class A
+{
+  virtual int a (int, int = 0) = 0;
+  int b (const int &);
+  int c;
+};
+
+class B : virtual A
+{
+  int d;
+  int a (int, int);
+};
+
+int
+A::b (const int &)
+{
+  return a ('\0');
+}
+
+int
+B::a (int, int)
+{
+  return 0 ? 0 : d;
+}

Jakub
Re: Normalize interface for all *-dg-runtest
On Jul 3, 2014, at 5:49 AM, Thomas Schwinge tho...@codesourcery.com wrote: I have a need to pass »flags« to a gfortran-dg-runtest call, but found that not to be possible as the *-dg-runtest interfaces are narrowed compared to dg-runtest. Here is a patch to fix that. So far only tested in libgomp. OK in principle? Ok. As usual, watch for any screams…
Re: [GOMP4, OpenACC] Fixed-form Fortran code failing to parse
On 07/03/2014 10:01 AM, Thomas Schwinge wrote: I'm happy to, but why don't we just get you set up? You're covered by the general Mentor Graphics/CodeSourcery copyright assignment. Please request an account per http://gcc.gnu.org/svnwrite.html, and on https://sourceware.org/cgi-bin/pdw/ps_form.cgi put in tho...@schwinge.name as approver. Thank you for sponsoring me Thomas! Fixed in r212269. Cesar
[wwwdocs] Adjust two textual references to http://gcc.gnu.org
There were two cases, where we did not have links any more, but textual references to gcc.gnu.org via http. This addresses it. Applied. Gerald Index: gcc-2.96.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-2.96.html,v retrieving revision 1.3 diff -u -r1.3 gcc-2.96.html --- gcc-2.96.html 21 Sep 2006 14:17:36 - 1.3 +++ gcc-2.96.html 3 Jul 2014 18:59:27 - @@ -40,7 +40,7 @@ versions that were not issued by the GCC team./p pPlease see -a href=snapshots.htmlhttp://gcc.gnu.org/snapshots.html/a +a href=snapshots.htmlhttps://gcc.gnu.org/snapshots.html/a if you want to use our latest snapshots. We suggest you use 2.95.2 if you are uncertain./p Index: news/javaannounce.html === RCS file: /cvs/gcc/wwwdocs/htdocs/news/javaannounce.html,v retrieving revision 1.5 diff -u -r1.5 javaannounce.html --- news/javaannounce.html 21 Jan 2002 10:24:45 - 1.5 +++ news/javaannounce.html 3 Jul 2014 18:59:27 - @@ -29,8 +29,8 @@ suite of free software tools for compiled Java./p pFor instructions on downloading and installation, and for more -information on gcj and libgcj in general, see the Gcj and Libgcj -homepage at a href=../java/http://gcc.gnu.org/java//a./p +information on gcj and libgcj in general, see the +a href=../java/Gcj and Libgcj homepage/a./p /body /html
[wwwdocs] Buildstat update for 4.7
Latest results for 4.7.x -tgc Testresults for 4.7.4: i386-pc-solaris2.8 i386-pc-solaris2.9 sparc-sun-solaris2.8 sparc-sun-solaris2.9 sparc64-sun-solaris2.8 x86_64-apple-darwin13.2.0 x86_64-pc-linux-gnu Index: buildstat.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/buildstat.html,v retrieving revision 1.13 diff -u -r1.13 buildstat.html --- buildstat.html 11 Jun 2014 18:49:25 - 1.13 +++ buildstat.html 3 Jul 2014 19:25:20 - @@ -115,6 +115,7 @@ tdi386-pc-solaris2.8/td tdnbsp;/td tdTest results: +a href=https://gcc.gnu.org/ml/gcc-testresults/2014-07/msg00073.html;4.7.4/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2013-08/msg00255.html;4.7.3/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2013-08/msg00254.html;4.7.3/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2013-05/msg00269.html;4.7.3/a, @@ -131,6 +132,7 @@ tdi386-pc-solaris2.9/td tdnbsp;/td tdTest results: +a href=https://gcc.gnu.org/ml/gcc-testresults/2014-07/msg00074.html;4.7.4/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2013-08/msg00227.html;4.7.3/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2013-08/msg00226.html;4.7.3/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2012-03/msg02618.html;4.7.0/a @@ -217,6 +219,7 @@ tdsparc-sun-solaris2.8/td tdnbsp;/td tdTest results: +a href=https://gcc.gnu.org/ml/gcc-testresults/2014-07/msg00075.html;4.7.4/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2013-08/msg00238.html;4.7.3/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2013-08/msg00237.html;4.7.3/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2012-03/msg02928.html;4.7.0/a, @@ -228,6 +231,7 @@ tdsparc-sun-solaris2.9/td tdnbsp;/td tdTest results: +a href=https://gcc.gnu.org/ml/gcc-testresults/2014-07/msg00076.html;4.7.4/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2013-08/msg00242.html;4.7.3/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2013-05/msg00265.html;4.7.3/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2012-03/msg02620.html;4.7.0/a @@ -246,6 +250,7 @@ tdsparc64-sun-solaris2.8/td 
tdnbsp;/td tdTest results: +a href=https://gcc.gnu.org/ml/gcc-testresults/2014-07/msg00077.html;4.7.4/a a href=https://gcc.gnu.org/ml/gcc-testresults/2013-09/msg02154.html;4.7.3/a /td /tr @@ -293,6 +298,14 @@ /tr tr +tdx86_64-apple-darwin13.2.0/td +tdnbsp;/td +tdTest results: +a href=https://gcc.gnu.org/ml/gcc-testresults/2014-06/msg01320.html;4.7.4/a +/td +/tr + +tr tdx86_64-pc-solaris2.10/td tdnbsp;/td tdTest results: @@ -301,6 +314,14 @@ /tr tr +tdx86_64-pc-linux-gnu/td +tdnbsp;/td +tdTest results: +a href=https://gcc.gnu.org/ml/gcc-testresults/2014-06/msg01475.html;4.7.4/a, +/td +/tr + +tr tdx86_64-unknown-linux-gnu/td tdnbsp;/td tdTest results:
[wwwdocs] Buildstat update for 4.8
Latest results for 4.8.x -tgc Testresults for 4.8.3: i686-pc-linux-gnu sparc-sun-solaris2.9 sparc64-sun-solaris2.9 Index: buildstat.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/buildstat.html,v retrieving revision 1.8 diff -u -r1.8 buildstat.html --- buildstat.html 11 Jun 2014 18:49:26 - 1.8 +++ buildstat.html 3 Jul 2014 19:25:27 - @@ -130,6 +130,7 @@ tdi686-pc-linux-gnu/td tdnbsp;/td tdTest results: +a href=https://gcc.gnu.org/ml/gcc-testresults/2014-06/msg02861.html;4.8.3/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01348.html;4.8.2/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01313.html;4.8.2/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2013-07/msg02349.html;4.8.1/a, @@ -212,6 +213,7 @@ tdsparc-sun-solaris2.9/td tdnbsp;/td tdTest results: +a href=https://gcc.gnu.org/ml/gcc-testresults/2014-07/msg00071.html;4.8.3/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2013-10/msg02067.html;4.8.2/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2013-08/msg00243.html;4.8.1/a, a href=https://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02774.html;4.8.0/a, @@ -231,6 +233,7 @@ tdsparc64-sun-solaris2.9/td tdnbsp;/td tdTest results: +a href=https://gcc.gnu.org/ml/gcc-testresults/2014-07/msg00070.html;4.8.3/a a href=https://gcc.gnu.org/ml/gcc-testresults/2013-09/msg02155.html;4.8.1/a /td /tr
Re: [PATCH] Add guality [p]type test.
Is what gdb prints for ptype stable across different gdb versions (except for whitespace that you canonicalize)? If yes, this looks good to me. Mark Yes, I believe it is (I tested against gdb git master and gdb 7.6.50). Mark It tries to print the expression as a canonical C type, so it should be Mark stable. GDB itself contains similar tests, but for pregenerated .S files Mark or synthetic generated DWARF. This just extends it to make sure gcc and Mark gdb agree on the produced/consumed debuginfo. I think it should be reasonably reliable. Something like this (but for value-printing) is already done in the libstdc++ test suite. Tom
Re: [PATCH] Don't run guality.exp tests with LTO_TORTURE_OPTIONS.
On July 3, 2014 8:38:14 PM CEST, Jakub Jelinek ja...@redhat.com wrote:
On Thu, Jul 03, 2014 at 08:37:07PM +0200, Richard Biener wrote:

Well, simply removing the regression testing for LTO is a maintenance nightmare as well. The guality testsuite is very noisy anyway with all the xfail and xpass.

Let's keep it as is then?

That works for me.

Richard.

Jakub
Re: [patch 1/4] change specific int128 - generic intN
That's what'll need fixing then.

Can I change TYPE_SIZE to TYPE_SIZE_WITH_PADDING then? Because it's not reflecting the type's size any more. Why do we have to round up a type's size anyway? That's a pointless assumption *unless* you're allocating memory space for it, and in that case, you want TYPE_SIZE_UNIT anyway.

I doubt there are too many places that require changing.

I don't doubt it, because I've been fighting these assumptions for years.

Heck, most of gcc is oblivious to the idea that types might not be powers-of-two in size. GCC doesn't even bother with a DECL_PRECISION.

Sure - why would you even need one?

Why do we need to have DECL_SIZE_UNIT (the size of the type, rounded up to whole number of bytes) and DECL_SIZE (the size of the type, rounded up to whole number of bytes), yet not have something that says how big the decl *really is*? A pointer on MSP430 is 20 bits. All the general registers are 20 bits. Not 16, and not 24. 20. There's nothing in a decl that says "I'm 20 bits" and inevitably it ends up being SImode instead of PSImode.

It seems to work just fine in testing, and I'm trying to make it non-fundamental.

I also think this is not a very good idea.

Then please provide a very good idea for how to teach gcc about true 20-bit types in a system with 8-bit memory and 16-bit words.
Re: [PATCH] Don't run guality.exp tests with LTO_TORTURE_OPTIONS.
On Thu, 2014-07-03 at 21:52 +0200, Richard Biener wrote:
On July 3, 2014 8:38:14 PM CEST, Jakub Jelinek ja...@redhat.com wrote:
On Thu, Jul 03, 2014 at 08:37:07PM +0200, Richard Biener wrote:

Well, simply removing the regression testing for LTO is a maintenance nightmare as well. The guality testsuite is very noisy anyway with all the xfail and xpass.

Let's keep it as is then?

That works for me.

I don't find that very satisfactory. I want to add more guality tests, but the fact that they are unreliable and by default introduce even more FAILs when LTO is enabled makes that not very attractive.

I do like Jakub's suggestion to stop running the guality tests with LTO by default, but provide an environment variable to enable them for those that want to try them anyway. Shall I implement that?

Thanks, Mark
Re: [PATCH] Don't run guality.exp tests with LTO_TORTURE_OPTIONS.
On Thu, Jul 03, 2014 at 10:04:35PM +0200, Mark Wielaard wrote:
On Thu, 2014-07-03 at 21:52 +0200, Richard Biener wrote:
On July 3, 2014 8:38:14 PM CEST, Jakub Jelinek ja...@redhat.com wrote:
On Thu, Jul 03, 2014 at 08:37:07PM +0200, Richard Biener wrote:

Well, simply removing the regression testing for LTO is a maintenance nightmare as well. The guality testsuite is very noisy anyway with all the xfail and xpass.

Let's keep it as is then?

That works for me.

I don't find that very satisfactory. I want to add more guality tests, but the fact that they are unreliable and by default introduce even more FAILs when LTO is enabled makes that not very attractive.

I do like Jakub's suggestion to stop running the guality tests with LTO by default, but provide an environment variable to enable them for those that want to try them anyway. Shall I implement that?

They aren't that unreliable (at least, if people committing patches don't ignore regressions in there). One should just diff contrib/test_summary output from earlier builds against the latest; that way it is clear what is a regression and what is not.

Jakub
Re: [PATCH] Don't run guality.exp tests with LTO_TORTURE_OPTIONS.
On Thu, 2014-07-03 at 22:14 +0200, Jakub Jelinek wrote: On Thu, Jul 03, 2014 at 10:04:35PM +0200, Mark Wielaard wrote: On Thu, 2014-07-03 at 21:52 +0200, Richard Biener wrote: On July 3, 2014 8:38:14 PM CEST, Jakub Jelinek ja...@redhat.com wrote: On Thu, Jul 03, 2014 at 08:37:07PM +0200, Richard Biener wrote: Well, simply removing the regression testing for LTO is a maintenance nightmare as well. The guality testsuite is very noisy anyway with all the xfail and xpass. Let's keep it as is then? That works for me. I don't find that very satisfactory. I want to add more guality tests, but the fact that they are unreliable and by default introduce even more FAILs when LTO is enabled makes that not very attractive. I do like Jakub's suggestion to stop running the guality tests with LTO by default, but to provide an environment variable that enables them for those who want to try them anyway. Shall I implement that? They aren't that unreliable (at least, if people committing patches don't ignore regressions in there). One should just diff the contrib/test_summary output from earlier builds against the latest; that way it is clear what is a regression and what is not. They are much more unreliable than any other test. With guality.exp disabled one can just eyeball the results and investigate new FAILs. There are only a handful. When you include guality.exp you can easily get the impression that the gcc testsuite is really bad (and it isn't!). And the problem is that it makes adding new tests a pain. See my new tests: they introduce new FAILs because LTO is enabled by default for guality.exp at the moment. It just results in a slow increase of FAILs that people have to ignore. And I am afraid that will just result in people missing real regressions. 
I don't mind if there is active work to fix the LTO DWARF debuginfo generation issues and the guality.exp LTO failures will soon disappear. But if there is no active work on reducing the number of failures, and introducing new guality.exp testcases keeps adding more FAILs, I think we are much better off disabling them for now. Thanks, Mark
[PATCH] PowerPC: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV
This patch implements the TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook for powerpc-fpu. I had to adjust the current c11-atomic-exec-5 testcase because, for IBM long double, 0 += LDBL_MAX might generate overflow/underflow in the internal __gcc_qadd calculations. c11-atomic-exec-5 now passes on linux/powerpc; checked on powerpc32-linux-fpu, powerpc64-linux, and powerpc64le-linux.

--

2014-07-03 Adhemerval Zanella azane...@linux.vnet.ibm.com

gcc:
* config/rs6000/rs6000.c (rs6000_atomic_assign_expand_fenv): New function.

gcc/testsuite:
* gcc.dg/atomic/c11-atomic-exec-5.c (test_main_long_double_add_overflow): Define and run only for LDBL_MANT_DIG != 106.
(test_main_complex_long_double_add_overflow): Likewise.
(test_main_long_double_sub_overflow): Likewise.
(test_main_complex_long_double_sub_overflow): Likewise.

---

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index bf67e72..75a2a45 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1621,6 +1621,9 @@ static const struct attribute_spec rs6000_attribute_table[] = #undef TARGET_CAN_USE_DOLOOP_P #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost + +#undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV +#define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv /* Processor table. */ @@ -32991,6 +32994,105 @@ emit_fusion_gpr_load (rtx *operands) return ; } +/* Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook.
*/ + +static void +rs6000_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update) +{ + if (!TARGET_HARD_FLOAT || !TARGET_FPRS) +return; + + tree mffs = rs6000_builtin_decls[RS6000_BUILTIN_MFFS]; + tree mtfsf = rs6000_builtin_decls[RS6000_BUILTIN_MTFSF]; + tree call_mffs = build_call_expr (mffs, 0); + + /* Generates the equivalent of feholdexcept (fenv_var) + + *fenv_var = __builtin_mffs (); + double fenv_hold; + *(uint64_t*)fenv_hold = *(uint64_t*)fenv_var 0x0007LL; + __builtin_mtfsf (0xff, fenv_hold); */ + + /* Mask to clear everything except for the rounding modes and non-IEEE + arithmetic flag. */ + const unsigned HOST_WIDE_INT hold_exception_mask = +HOST_WIDE_INT_C (0x0007); + + tree fenv_var = create_tmp_var (double_type_node, NULL); + + tree hold_mffs = build2 (MODIFY_EXPR, void_type_node, fenv_var, call_mffs); + + tree fenv_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, fenv_var); + tree fenv_llu_and = build2 (BIT_AND_EXPR, uint64_type_node, fenv_llu, +build_int_cst (uint64_type_node, hold_exception_mask)); + + tree fenv_mtfsf = build1 (VIEW_CONVERT_EXPR, double_type_node, fenv_llu_and); + + tree hold_mtfsf = build_call_expr (mtfsf, 2, +build_int_cst (unsigned_type_node, 0xff), fenv_mtfsf); + + *hold = build2 (COMPOUND_EXPR, void_type_node, hold_mffs, hold_mtfsf); + + /* Generates the equivalent of feclearexcept (FE_ALL_EXCEPT): + + double fenv_clear = __builtin_mffs (); + *(uint64_t)fenv_clear = 0xLL; + __builtin_mtfsf (0xff, fenv_clear); */ + + /* Mask to clear everything except for the rounding modes and non-IEEE + arithmetic flag. 
*/ + const unsigned HOST_WIDE_INT clear_exception_mask = +HOST_WIDE_INT_UC (0x); + + tree fenv_clear = create_tmp_var (double_type_node, NULL); + + tree clear_mffs = build2 (MODIFY_EXPR, void_type_node, fenv_clear, call_mffs); + + tree fenv_clean_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, fenv_var); + tree fenv_clear_llu_and = build2 (BIT_AND_EXPR, uint64_type_node, +fenv_clean_llu, build_int_cst (uint64_type_node, clear_exception_mask)); + + tree fenv_clear_mtfsf = build1 (VIEW_CONVERT_EXPR, double_type_node, +fenv_clear_llu_and); + + tree clear_mtfsf = build_call_expr (mtfsf, 2, +build_int_cst (unsigned_type_node, 0xff), fenv_clear_mtfsf); + + *clear = build2 (COMPOUND_EXPR, void_type_node, clear_mffs, clear_mtfsf); + + /* Generates the equivalent of feupdateenv (fenv_var) + + double old_fenv = __builtin_mffs (); + double fenv_update; + *(uint64_t*)fenv_update = (*(uint64_t*)old 0x1f00LL) | +(*(uint64_t*)fenv_var 0x1ff80fff); + __builtin_mtfsf (0xff, fenv_update); */ + + const unsigned HOST_WIDE_INT update_exception_mask = +HOST_WIDE_INT_UC (0x1f00); + const unsigned HOST_WIDE_INT new_exception_mask = +HOST_WIDE_INT_UC (0x1ff80fff); + + tree old_fenv = create_tmp_var (double_type_node, NULL); + tree update_mffs = build2 (MODIFY_EXPR, void_type_node, old_fenv, call_mffs); + + tree old_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, update_mffs); + tree old_llu_and = build2 (BIT_AND_EXPR, uint64_type_node, +old_llu, build_int_cst (uint64_type_node, update_exception_mask)); + + tree new_llu_and = build2 (BIT_AND_EXPR, uint64_type_node, fenv_llu, +build_int_cst (uint64_type_node, new_exception_mask)); + + tree
RE: [PATCH] PR preprocessor/60723 - missing system-ness marks for macro
Hello Dodji,

I found time this morning to run your changes through our system. I patched our gcc-4.8.1 with your latest change, and ran it through our folly testsuite.

One thing that I immediately noticed was that this increased the preprocessed size substantially. When preprocessing my favorite .cpp file, its .ii grew from 137k lines to 145k, a 5% increase.

All the folly code compiled and ran successfully under the changes.

I looked at some of the preprocessed output. I was pleased to see that consecutive macros that expanded entirely to system tokens did not insert unnecessary line directives between them. I did, however, notice that __LINE__ was treated as belonging to the calling file, even when its token appears in the system file. That is to say:

CODE:
// system macro
#define FOO() sys_token __LINE__ sys_token
// non-system callsite
FOO()
// preprocessed output
# 3 test.cpp 3 4
sys_token
# 3 test.cpp
3
# 3 test.cpp 3 4
sys_token
:CODE

This seems to generalize to other builtin macros, like __FILE__.

Otherwise, the code looks fine. There is only one thing I noticed:

+ if (do_line_adjustments
+     && !in_pragma
+     && !line_marker_emitted
+     && print.prev_was_system_token != !!in_system_header_at(loc))
+   /* The system-ness of this token is different from the one
+      of the previous token. Let's emit a line change to
+      mark the new system-ness before we emit the token. */
+   line_marker_emitted = do_line_change (pfile, token, loc, false);

This line_marker_emitted assignment is immediately overwritten, two lines below. However, from a maintainability perspective, this is probably a good assignment to keep.

  cpp_output_token (token, print.outf);
+ line_marker_emitted = false;
  }

Thanks for this diff!

Cheers,
Nicholas
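As a small aside on the __LINE__ observation above: in standard C, __LINE__ inside a macro body evaluates to the line of the macro invocation, not the line of the #define, which is consistent with the token being attributed to the calling file. A minimal sketch; LINE_OF_CALL and the function names are hypothetical, and the macro merely stands in for one defined in a system header.

```c
#include <assert.h>

/* Hypothetical macro standing in for one defined in a system header.  */
#define LINE_OF_CALL() __LINE__

/* The two functions sit on consecutive source lines on purpose.  */
static int first_call (void) { return LINE_OF_CALL (); }
static int second_call (void) { return LINE_OF_CALL (); }

/* __LINE__ tracks the call site, so the two results differ by exactly
   one even though both expand the same macro definition.  */
static void
check_line_follows_call_site (void)
{
  assert (second_call () == first_call () + 1);
}
```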
Re: [wwwdocs] Buildstat update for 4.8
On Thu, 3 Jul 2014, Tom G. Christensen wrote: Latest results for 4.8.x Thanks, applied! Gerald
Re: [PATCH] Fix high handling in wi::mul_internal (PR tree-optimization/61682)
Jakub Jelinek ja...@redhat.com writes:
@@ -1302,12 +1310,28 @@ wi::mul_internal (HOST_WIDE_INT *val, co
   /* Handle multiplications by 1. */
   if (op1 == 1)
     {
+      if (high)
+        {
+          if (sgn == SIGNED && wi::neg_p (op2))
+            val[0] = -1;
+          else
+            val[0] = 0;
+          return 1;
+        }
       for (i = 0; i < op2len; i++)
         val[i] = op2val[i];
       return op2len;
     }
   if (op2 == 1)
     {
+      if (high)
+        {
+          if (sgn == SIGNED && wi::neg_p (op1))
+            val[0] = -1;
+          else
+            val[0] = 0;
+          return 1;
+        }
       for (i = 0; i < op1len; i++)
         val[i] = op1val[i];
       return op1len;

I think the preferred way of writing this is wi::neg_p (op1, sgn).

OK otherwise, thanks, and sorry for the multiple breakage.

Richard
Re: [PATCH] Fix high handling in wi::mul_internal (PR tree-optimization/61682)
On Jul 3, 2014, at 2:53 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Jakub Jelinek ja...@redhat.com writes: + if (sgn == SIGNED && wi::neg_p (op1)) I think the preferred way of writing this is wi::neg_p (op1, sgn) Yes.
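To make the reasoning behind the patch concrete: when the high half of a product is requested and one operand is 1, the high half is just the sign extension of the other operand, that is, all ones for a negative signed value and zero otherwise. The sketch below checks that shortcut against a full 32x32->64 multiplication. It is an illustration only: it uses fixed 32-bit operands instead of wide-int, and sketch_signop, sketch_neg_p and the other names are hypothetical stand-ins, with sketch_neg_p playing the role of wi::neg_p (x, sgn).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for wide-int's signop.  */
typedef enum { SKETCH_UNSIGNED, SKETCH_SIGNED } sketch_signop;

/* Analogue of wi::neg_p (x, sgn): a value is negative only when it is
   interpreted as signed and its top bit is set.  */
static int
sketch_neg_p (uint32_t x, sketch_signop sgn)
{
  return sgn == SKETCH_SIGNED && (x >> 31) != 0;
}

/* Reference: upper 32 bits of the full 64-bit product.  Operands are
   sign- or zero-extended by hand, so only unsigned (modular)
   arithmetic is used.  */
static uint32_t
mul_high_reference (uint32_t a, uint32_t b, sketch_signop sgn)
{
  uint64_t wide_a = a, wide_b = b;
  if (sketch_neg_p (a, sgn))
    wide_a |= UINT64_C (0xffffffff00000000);
  if (sketch_neg_p (b, sgn))
    wide_b |= UINT64_C (0xffffffff00000000);
  return (uint32_t) ((wide_a * wide_b) >> 32);
}

/* Shortcut for x * 1: the high half is the sign extension of x.  */
static uint32_t
mul_high_by_one (uint32_t x, sketch_signop sgn)
{
  return sketch_neg_p (x, sgn) ? UINT32_MAX : 0;
}

/* The shortcut must agree with the reference for both signednesses.  */
static void
check_shortcut (uint32_t x)
{
  assert (mul_high_by_one (x, SKETCH_SIGNED)
          == mul_high_reference (x, 1, SKETCH_SIGNED));
  assert (mul_high_by_one (x, SKETCH_UNSIGNED)
          == mul_high_reference (x, 1, SKETCH_UNSIGNED));
}
```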
[PATCH] Fix confusion between target, host and symbolic number byte sizes
The bswap pass deals with three possibly different byte sizes: that of the host, that of the target, and the size a byte marker occupies in the symbolic_number structure [1]. However, as of now the code mixes the three sizes. This works in practice, as the pass is only enabled for targets with BITS_PER_UNIT == 8 and nobody runs GCC on a host with CHAR_BIT != 8. As prompted by Jakub Jelinek, this patch fixes this mess. Byte markers are 8-bit quantities (they could be made 4-bit quantities, but I preferred to keep the code working the same as before), for which a new macro is introduced (BITS_PER_MARKER). Anything related to storing the value or a byte marker in a variable should check the host byte size or wide integer size, and anything aimed at manipulating the target value should check BITS_PER_UNIT.

[1] Although the comment for this structure implies that a byte marker has the same size as the host byte, the way it is used in the code (even before any of my patches) shows that it uses a fixed size of 8 [2].

[2] Note that since the pass is only active for targets with BITS_PER_UNIT == 8, it might be using the target byte size.

gcc/ChangeLog:

2014-07-04 Thomas Preud'homme thomas.preudho...@arm.com

* tree-ssa-math-opts.c (struct symbolic_number): Clarify comment about the size of byte markers.
(do_shift_rotate): Fix confusion between host, target and marker byte size.
(verify_symbolic_number_p): Likewise.
(find_bswap_or_nop_1): Likewise.
(find_bswap_or_nop): Likewise.

diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index ca2b30d..55c5df7 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1602,11 +1602,10 @@ make_pass_cse_sincos (gcc::context *ctxt) /* A symbolic number is used to detect byte permutation and selection patterns.
Therefore the field N contains an artificial number - consisting of byte size markers: + consisting of octet sized markers: - 0- byte has the value 0 - 1..size - byte contains the content of the byte - number indexed with that value minus one. + 0- target byte has the value 0 + 1..size - marker value is the target byte index minus one. To detect permutations on memory sources (arrays and structures), a symbolic number is also associated a base address (the array or structure the load is @@ -1631,6 +1630,8 @@ struct symbolic_number { unsigned HOST_WIDE_INT range; }; +#define BITS_PER_MARKER 8 + /* The number which the find_bswap_or_nop_1 result should match in order to have a nop. The number is masked according to the size of the symbolic number before using it. */ @@ -1652,15 +1653,16 @@ do_shift_rotate (enum tree_code code, struct symbolic_number *n, int count) { - int bitsize = TYPE_PRECISION (n-type); + int size = TYPE_PRECISION (n-type) / BITS_PER_UNIT; - if (count % 8 != 0) + if (count % BITS_PER_UNIT != 0) return false; + count = (count / BITS_PER_UNIT) * BITS_PER_MARKER; /* Zero out the extra bits of N in order to avoid them being shifted into the significant bits. */ - if (bitsize 8 * (int)sizeof (int64_t)) -n-n = ((uint64_t)1 bitsize) - 1; + if (size 64 / BITS_PER_MARKER) +n-n = ((uint64_t) 1 (size * BITS_PER_MARKER)) - 1; switch (code) { @@ -1670,22 +1672,22 @@ do_shift_rotate (enum tree_code code, case RSHIFT_EXPR: /* Arithmetic shift of signed type: result is dependent on the value. 
*/ if (!TYPE_UNSIGNED (n-type) - (n-n ((uint64_t) 0xff (bitsize - 8 + (n-n ((uint64_t) 0xff ((size - 1) * BITS_PER_MARKER return false; n-n = count; break; case LROTATE_EXPR: - n-n = (n-n count) | (n-n (bitsize - count)); + n-n = (n-n count) | (n-n ((size * BITS_PER_MARKER) - count)); break; case RROTATE_EXPR: - n-n = (n-n count) | (n-n (bitsize - count)); + n-n = (n-n count) | (n-n ((size * BITS_PER_MARKER) - count)); break; default: return false; } /* Zero unused bits for size. */ - if (bitsize 8 * (int)sizeof (int64_t)) -n-n = ((uint64_t)1 bitsize) - 1; + if (size 64 / BITS_PER_MARKER) +n-n = ((uint64_t) 1 (size * BITS_PER_MARKER)) - 1; return true; } @@ -1726,13 +1728,13 @@ init_symbolic_number (struct symbolic_number *n, tree src) if (size % BITS_PER_UNIT != 0) return false; size /= BITS_PER_UNIT; - if (size (int)sizeof (uint64_t)) + if (size 64 / BITS_PER_MARKER) return false; n-range = size; n-n = CMPNOP; - if (size (int)sizeof (int64_t)) -n-n = ((uint64_t)1 (size * BITS_PER_UNIT)) - 1; + if (size 64 / BITS_PER_MARKER) +n-n = ((uint64_t) 1 (size * BITS_PER_MARKER)) - 1; return true; } @@ -1870,15 +1872,17 @@ find_bswap_or_nop_1 (gimple stmt, struct symbolic_number *n, int limit) case BIT_AND_EXPR: { int i, size =
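To illustrate the marker idea the patch is untangling, here is a small standalone sketch of a symbolic number: each 8-bit marker records which source byte currently sits at that position (0 meaning a known-zero byte), shifts are converted from target bits to marker bits, and the final pattern is compared against a byte-swap pattern. This is not the pass's code. The SKETCH_ constants and all function names are assumptions made for the example, and the target byte is assumed to be 8 bits wide; only BITS_PER_MARKER and the masking expression are taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_MARKER 8       /* marker width, as introduced by the patch */
#define SKETCH_BITS_PER_UNIT 8  /* assumed target byte width */
/* Assumed reference patterns: identity and full 8-byte swap.  */
#define SKETCH_CMPNOP  UINT64_C (0x0807060504030201)
#define SKETCH_CMPXCHG UINT64_C (0x0102030405060708)

/* Keep only the markers that belong to a value of SIZE target bytes.  */
static uint64_t
mask_markers (uint64_t n, int size)
{
  assert (size >= 1 && size <= 64 / BITS_PER_MARKER);
  if (size < 64 / BITS_PER_MARKER)
    n &= (UINT64_C (1) << (size * BITS_PER_MARKER)) - 1;
  return n;
}

/* Shift the markers by COUNT_BITS target bits, converted to marker bits.  */
static uint64_t
shift_markers (uint64_t n, int size, int count_bits, int left)
{
  assert (count_bits >= 0 && count_bits < size * SKETCH_BITS_PER_UNIT);
  assert (count_bits % SKETCH_BITS_PER_UNIT == 0);
  int count = (count_bits / SKETCH_BITS_PER_UNIT) * BITS_PER_MARKER;
  n = mask_markers (n, size);
  n = left ? n << count : n >> count;
  return mask_markers (n, size);
}

/* AND with a constant: a marker survives only where the corresponding
   target byte of VALUE_MASK is all ones.  */
static uint64_t
and_markers (uint64_t n, uint64_t value_mask, int size)
{
  uint64_t marker_mask = 0;
  for (int i = 0; i < size; i++)
    {
      uint64_t byte = (value_mask >> (i * SKETCH_BITS_PER_UNIT)) & 0xff;
      assert (byte == 0 || byte == 0xff);
      if (byte == 0xff)
        marker_mask |= UINT64_C (0xff) << (i * BITS_PER_MARKER);
    }
  return n & marker_mask;
}

/* Symbolically evaluate the classic 32-bit byte swap
   (x << 24) | (x >> 24) | ((x >> 8) & 0xff00) | ((x << 8) & 0xff0000).  */
static uint64_t
symbolic_bswap32 (void)
{
  const int size = 4;
  uint64_t n = mask_markers (SKETCH_CMPNOP, size);
  return shift_markers (n, size, 24, 1)
         | shift_markers (n, size, 24, 0)
         | and_markers (shift_markers (n, size, 8, 0), 0x0000ff00, size)
         | and_markers (shift_markers (n, size, 8, 1), 0x00ff0000, size);
}
```

In this sketch the expression is recognised as a byte swap when its symbolic result equals the swap pattern shifted down to the four markers in use, i.e. SKETCH_CMPXCHG >> 32. Keeping the marker width separate from the target byte width is what lets the same bookkeeping survive a target whose byte is not 8 bits.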
[PATCH] PR rtl-optimization/61712
Hi,

This crash occurs because the code fails to consider that the insn variable may not be an insn at all.

arm.c (thumb1_reorg): If the selected insn is not an insn, continue to the next bb.

---
 gcc/config/arm/arm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 89684bb..50ae64b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -16720,7 +16720,7 @@ thumb1_reorg (void)
 	insn = PREV_INSN (insn);

       /* Find the last cbranchsi4_insn in basic block BB. */
-      if (INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn)
+      if (!INSN_P (insn) || (INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn))
 	continue;

       /* Get the register with which we are comparing. */
--
1.9.1
[PATCH 2/2] Remove x86 cmpstrnsi
From: Andi Kleen a...@linux.intel.com

In my tests the optimized glibc out-of-line strcmp is always faster than using inline rep ; cmpsb, even for small strings. The Intel optimization manual also recommends against using it. So remove the cmpstrnsi instruction. Tested on Sandy Bridge and Westmere Intel CPUs.

gcc/:

2014-07-02 Andi Kleen a...@linux.intel.com

* config/i386/i386.md (cmpstrnsi, cmpintqi): Remove expanders.

---
 gcc/config/i386/i386.md | 85 -
 1 file changed, 85 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 5f32a24..67f1343 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15878,91 +15878,6 @@ (const_string *))) (set_attr mode QI)]) -(define_expand cmpstrnsi - [(set (match_operand:SI 0 register_operand) - (compare:SI (match_operand:BLK 1 general_operand) - (match_operand:BLK 2 general_operand))) - (use (match_operand 3 general_operand)) - (use (match_operand 4 immediate_operand))] - -{ - rtx addr1, addr2, out, outlow, count, countreg, align; - - if (optimize_insn_for_size_p () !TARGET_INLINE_ALL_STRINGOPS) -FAIL; - - /* Can't use this if the user has appropriated ecx, esi or edi. */ - if (fixed_regs[CX_REG] || fixed_regs[SI_REG] || fixed_regs[DI_REG]) -FAIL; - - out = operands[0]; - if (!REG_P (out)) -out = gen_reg_rtx (SImode); - - addr1 = copy_addr_to_reg (XEXP (operands[1], 0)); - addr2 = copy_addr_to_reg (XEXP (operands[2], 0)); - if (addr1 != XEXP (operands[1], 0)) -operands[1] = replace_equiv_address_nv (operands[1], addr1); - if (addr2 != XEXP (operands[2], 0)) -operands[2] = replace_equiv_address_nv (operands[2], addr2); - - count = operands[3]; - countreg = ix86_zero_extend_to_Pmode (count); - - /* %%% Iff we are testing strict equality, we can use known alignment - to good advantage. This may be possible with combine, particularly - once cc0 is dead. 
*/ - align = operands[4]; - - if (CONST_INT_P (count)) -{ - if (INTVAL (count) == 0) - { - emit_move_insn (operands[0], const0_rtx); - DONE; - } - emit_insn (gen_cmpstrnqi_nz_1 (addr1, addr2, countreg, align, -operands[1], operands[2])); -} - else -{ - rtx (*gen_cmp) (rtx, rtx); - - gen_cmp = (TARGET_64BIT -? gen_cmpdi_1 : gen_cmpsi_1); - - emit_insn (gen_cmp (countreg, countreg)); - emit_insn (gen_cmpstrnqi_1 (addr1, addr2, countreg, align, - operands[1], operands[2])); -} - - outlow = gen_lowpart (QImode, out); - emit_insn (gen_cmpintqi (outlow)); - emit_move_insn (out, gen_rtx_SIGN_EXTEND (SImode, outlow)); - - if (operands[0] != out) -emit_move_insn (operands[0], out); - - DONE; -}) - -;; Produce a tri-state integer (-1, 0, 1) from condition codes. - -(define_expand cmpintqi - [(set (match_dup 1) - (gtu:QI (reg:CC FLAGS_REG) (const_int 0))) - (set (match_dup 2) - (ltu:QI (reg:CC FLAGS_REG) (const_int 0))) - (parallel [(set (match_operand:QI 0 register_operand) - (minus:QI (match_dup 1) -(match_dup 2))) - (clobber (reg:CC FLAGS_REG))])] - -{ - operands[1] = gen_reg_rtx (QImode); - operands[2] = gen_reg_rtx (QImode); -}) - ;; memcmp recognizers. The `cmpsb' opcode does nothing if the count is ;; zero. Emit extra code to make sure that a zero-length compare is EQ. -- 2.0.0
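For context, the removed cmpintqi expander turns the flags of the byte comparison into a tri-state integer (-1, 0, 1) by subtracting the "below" result from the "above" result, and the remaining comment notes that a zero-length compare must come out as EQ. A plain C sketch of that computation follows. The function names are hypothetical, and the loop is a memcmp-style stand-in for the repz cmpsb sequence rather than a model of its exact semantics.

```c
#include <assert.h>
#include <stddef.h>

/* Tri-state from one unsigned byte comparison: (above) - (below),
   mirroring the gtu / ltu / minus structure of the removed expander.  */
static int
tri_state (unsigned char left, unsigned char right)
{
  return (left > right) - (left < right);
}

/* Compare COUNT bytes and return -1, 0 or 1 for the first difference.  */
static int
compare_bytes (const unsigned char *a, const unsigned char *b, size_t count)
{
  assert (count == 0 || (a != NULL && b != NULL));
  for (size_t i = 0; i < count; i++)
    if (a[i] != b[i])
      return tri_state (a[i], b[i]);
  /* A zero-length or fully equal compare is EQ.  */
  return 0;
}
```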
[PATCH 1/2] Remove i386 cmpstrnsi peephole
From: Andi Kleen a...@linux.intel.com The peephole that removes the code to compute a tristate for cmpstrnsi when only a boolean jump is needed never triggers in my tests. Just remove it. gcc/: 2014-07-02 Andi Kleen a...@linux.intel.com * config/i386/i386.md: Remove peepholes for cmpstrn*. --- gcc/config/i386/i386.md | 77 - 1 file changed, 77 deletions(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 695b981..5f32a24 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -16078,83 +16078,6 @@ (const_string 0) (const_string *))) (set_attr prefix_rep 1)]) - -;; Peephole optimizations to clean up after cmpstrn*. This should be -;; handled in combine, but it is not currently up to the task. -;; When used for their truth value, the cmpstrn* expanders generate -;; code like this: -;; -;; repz cmpsb -;; seta %al -;; setb %dl -;; cmpb %al, %dl -;; jcc label -;; -;; The intermediate three instructions are unnecessary. - -;; This one handles cmpstrn*_nz_1... -(define_peephole2 - [(parallel[ - (set (reg:CC FLAGS_REG) - (compare:CC (mem:BLK (match_operand 4 register_operand)) - (mem:BLK (match_operand 5 register_operand - (use (match_operand 6 register_operand)) - (use (match_operand:SI 3 immediate_operand)) - (clobber (match_operand 0 register_operand)) - (clobber (match_operand 1 register_operand)) - (clobber (match_operand 2 register_operand))]) - (set (match_operand:QI 7 register_operand) - (gtu:QI (reg:CC FLAGS_REG) (const_int 0))) - (set (match_operand:QI 8 register_operand) - (ltu:QI (reg:CC FLAGS_REG) (const_int 0))) - (set (reg FLAGS_REG) - (compare (match_dup 7) (match_dup 8))) - ] - peep2_reg_dead_p (4, operands[7]) peep2_reg_dead_p (4, operands[8]) - [(parallel[ - (set (reg:CC FLAGS_REG) - (compare:CC (mem:BLK (match_dup 4)) - (mem:BLK (match_dup 5 - (use (match_dup 6)) - (use (match_dup 3)) - (clobber (match_dup 0)) - (clobber (match_dup 1)) - (clobber (match_dup 2))])]) - -;; ...and this one handles cmpstrn*_1. 
-(define_peephole2 - [(parallel[ - (set (reg:CC FLAGS_REG) - (if_then_else:CC (ne (match_operand 6 register_operand) - (const_int 0)) - (compare:CC (mem:BLK (match_operand 4 register_operand)) - (mem:BLK (match_operand 5 register_operand))) - (const_int 0))) - (use (match_operand:SI 3 immediate_operand)) - (use (reg:CC FLAGS_REG)) - (clobber (match_operand 0 register_operand)) - (clobber (match_operand 1 register_operand)) - (clobber (match_operand 2 register_operand))]) - (set (match_operand:QI 7 register_operand) - (gtu:QI (reg:CC FLAGS_REG) (const_int 0))) - (set (match_operand:QI 8 register_operand) - (ltu:QI (reg:CC FLAGS_REG) (const_int 0))) - (set (reg FLAGS_REG) - (compare (match_dup 7) (match_dup 8))) - ] - peep2_reg_dead_p (4, operands[7]) peep2_reg_dead_p (4, operands[8]) - [(parallel[ - (set (reg:CC FLAGS_REG) - (if_then_else:CC (ne (match_dup 6) - (const_int 0)) - (compare:CC (mem:BLK (match_dup 4)) - (mem:BLK (match_dup 5))) - (const_int 0))) - (use (match_dup 3)) - (use (reg:CC FLAGS_REG)) - (clobber (match_dup 0)) - (clobber (match_dup 1)) - (clobber (match_dup 2))])]) ;; Conditional move instructions. -- 2.0.0