RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
-Original Message- From: Dr. Philipp Tomsich [mailto:philipp.toms...@theobroma-systems.com] Sent: Monday, June 29, 2015 2:17 PM To: Kumar, Venkataramanan Cc: pins...@gmail.com; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

Kumar,

This is not unexpected, as the initial estimation and each iteration will add an architecturally-defined number of bits of precision (ARMv8 guarantees only a minimum number of bits per operation… the exact number is specific to each micro-arch, though). Depending on your architecture and on the number of precise bits required by any given benchmark, one may see miscompares.

True. Do you know the exact number of bits that the initial estimate and the subsequent refinement steps add for your micro-arch?

I am not sure about this. I need to check for the cortex-a57 case. But the best thing for the cortex-a57 case is not to use this optimization by default. If we get an -mrecip-sqrt command-line option, then we can add it for gromacs-like applications to get gains. Any thoughts on this?

Regards, Venkat.

Thanks, Philipp.

On 29 Jun 2015, at 10:17, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote:

Hmm, reducing the iterations to 1 step for float and 2 steps for double, I got VE (miscompares) on the following benchmarks: 416.gamess, 453.povray, 454.calculix, 459.GemsFDTD.

Benedikt, I get an ICE for 444.namd with your patch; not sure if something is wrong in my local tree.

Regards, Venkat.

-Original Message- From: pins...@gmail.com [mailto:pins...@gmail.com] Sent: Sunday, June 28, 2015 8:35 PM To: Kumar, Venkataramanan Cc: Dr. Philipp Tomsich; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

On Jun 25, 2015, at 9:44 AM, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote:

I got around ~12% gain with -Ofast -mcpu=cortex-a57.
I get around 11-12% on thunderX with the patch plus the reduced-iterations change (1/2) compared to without the patch.

Thanks, Andrew

Regards, Venkat.

-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Dr. Philipp Tomsich Sent: Thursday, June 25, 2015 9:13 PM To: Kumar, Venkataramanan Cc: Benedikt Huber; pins...@gmail.com; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

Kumar, what is the relative gain that you see on Cortex-A57? Thanks, Philipp.

On 25 Jun 2015, at 17:35, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote:

Changing to 1 step for float and 2 steps for double gives better gains now for gromacs on cortex-a57. Regards, Venkat.

-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Benedikt Huber Sent: Thursday, June 25, 2015 4:09 PM To: pins...@gmail.com Cc: gcc-patches@gcc.gnu.org; philipp.tomsich@theobroma-systems.com Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

Andrew,

This is NOT a win on thunderX, at least for single precision, because the divide and sqrt take the same time as 5 multiplies (estimate and step are multiplies in the thunderX pipeline). Double is 10 multiplies, which is just the same as what the patch does (it is really slightly less than 10; I rounded up). So in the end this is NOT a win at all for thunderX unless we do one less step for both single and double.

Yes, the expected benefit from rsqrt estimation is implementation specific. If one has a better initial rsqrte or an application that can trade precision for execution time, we could offer a command-line option to do only 2 steps for double and 1 step for float, similar to -mrecip-precision for PowerPC. What are your thoughts on that?

Best regards, Benedikt
Re: [AArch64][2/2] Implement -fpic for -mcmodel=small
On 27 June 2015 at 14:49, Jiong Wang jiong.w...@arm.com wrote:

Andreas Schwab writes:
Jiong Wang jiong.w...@arm.com writes:
Andreas Schwab writes:

spawn -ignore SIGHUP /opt/gcc/gcc-20150627/Build/gcc/xgcc -B/opt/gcc/gcc-20150627/Build/gcc/ -fno-diagnostics-show-caret -fdiagnostics-color=never -O0 -w -fpie -c -o pr65593.o /opt/gcc/gcc-20150627/gcc/testsuite/gcc.c-torture/compile/pr65593.c
/tmp/cc0Pymaf.s: Assembler messages:
/tmp/cc0Pymaf.s:11: Error: unknown relocation modifier at operand 2 -- `ldr x0,[x0,#:gotpage_lo15:a]'
/tmp/cc0Pymaf.s:19: Error: unknown relocation modifier at operand 2 -- `ldr x2,[x0,#:gotpage_lo15:bar]'

Andreas, the binutils patch has been upstreamed already. Please check out the latest binutils code.

That must work with the current binutils.

I see, agreed. Will adopt the current ILP32 feature detection mechanism into -fpic. And some new TLS patches may need a similar check also. Thanks.

Hi Jiong,

I'm not sure I fully understand your answer. I believe you are going to change the testcase, right? I'm seeing whole-toolchain build failures (when building glibc) since you committed this, because of the new relocations. I understand your suggestion is to move to trunk binutils to have support for these new relocations. Is there anything else I could do to avoid changing the binutils version I use?

Thanks, Christophe.

-- Regards, Jiong
[wwwdocs] Knuth's pages have moved, adjust URLs
Applied. Gerald Index: readings.html === RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v retrieving revision 1.239 diff -u -r1.239 readings.html --- readings.html 27 Jun 2015 19:26:46 - 1.239 +++ readings.html 29 Jun 2015 11:30:46 - @@ -209,13 +209,12 @@ [EM-micks]. The number stands for the average of numbers of 14 actual computers very similar to MMIX. The name may also be due to a predecessor appropriately named MIX. - br /MMIX is used in program examples in a - href=http://www-cs-faculty.stanford.edu/~knuth/;Donald E. Knuth/a's - a href=http://www-cs-faculty.stanford.edu/~knuth/taocp.html;The Art + br /MMIX is used in program examples in Donald E. Knuth's + a href=http://www-cs-faculty.stanford.edu/~uno/taocp.html;The Art of Computer Programming/a (ISBN 0-201-89683-4). - br /a href=http://www-cs-faculty.stanford.edu/~knuth/mmix.html;Knuth's - MMIX page/a has more information about MMIX. Knuth also wrote a - a href=http://www-cs-faculty.stanford.edu/~knuth/mmixware.html;book + br /The a href=http://www-cs-faculty.stanford.edu/~uno/mmix.html;MMIX + page/a has more information about MMIX. Knuth also wrote a + a href=http://www-cs-faculty.stanford.edu/~uno/mmixware.html;book specifically about MMIX/a (MMIXware, ISBN 3-540-66938-8). /li Index: gcc-3.1/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-3.1/changes.html,v retrieving revision 1.68 diff -u -r1.68 changes.html --- gcc-3.1/changes.html28 Jun 2014 07:45:10 - 1.68 +++ gcc-3.1/changes.html29 Jun 2015 11:30:46 - @@ -256,7 +256,7 @@ ul liHans-Peter Nilsson has contributed a port to a -href=http://www-cs-faculty.stanford.edu/~knuth/mmix.html;MMIX/a, +href=http://www-cs-faculty.stanford.edu/~uno/mmix.html;MMIX/a, the CPU architecture used in new editions of Donald E. 
Knuth's emThe Art of Computer Programming/em./li Index: projects/prefetch.html === RCS file: /cvs/gcc/wwwdocs/htdocs/projects/prefetch.html,v retrieving revision 1.25 diff -u -r1.25 prefetch.html --- projects/prefetch.html 30 Jan 2011 18:38:35 - 1.25 +++ projects/prefetch.html 29 Jun 2015 11:30:46 - @@ -786,14 +786,14 @@ pa name=ref_11[11]/a emMMIX Op Codes/em, Don Knuth; -a href=http://www-cs-faculty.stanford.edu/~knuth/mmop.html; -http://www-cs-faculty.stanford.edu/~knuth/mmop.html/a./p +a href=http://www-cs-faculty.stanford.edu/~uno/mmop.html; +http://www-cs-faculty.stanford.edu/~uno/mmop.html/a./p pa name=ref_12[12]/a emThe Art of Computer Programming, Fascicle 1: MMIX/em, Don Knuth, Addison Wesley Longman, 2001; a href=http://www-cs-faculty.stanford.edu/~knuth/fasc1.ps.gz; -http://www-cs-faculty.stanford.edu/~knuth/fasc1.ps.gz/a./p +http://www-cs-faculty.stanford.edu/~uno/fasc1.ps.gz/a./p pa name=ref_13[13]/a emPA-RISC 2.0 Instruction Set Architecture/em;
Re: [wwwdocs] Knuth's pages have moved, adjust URLs
Oh, and the patch also removed a link to his home page. Enough links in place already. Gerald
Re: C/C++ PATCH to smarten up -Wswitch-bool (PR c/66322)
On Thu, Jun 25, 2015 at 12:40:09PM -0600, Jeff Law wrote: If this patch is approved, I'd like to backport it even to 5 branch after some time so it's fixed in 5.2. I've committed the patch to trunk. If it's OK with Jakub, it's fine with me. So Jakub, is it ok to backport this to 5.2 after a few days? OK. Thanks for review! Marek
[PATCH] Add canonicalize_funcptr_for_compare target-insn
Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2015-06-29 Richard Biener rguent...@suse.de * target-insns.def (canonicalize_funcptr_for_compare): Add. * fold-const.c (build_range_check): Replace uses of HAVE_canonicalize_funcptr_for_compare. (fold_widened_comparison): Likewise. (fold_sign_changed_comparison): Likewise. * dojump.c: Include target.h. (do_compare_and_jump): Replace uses of HAVE_canonicalize_funcptr_for_compare and gen_canonicalize_funcptr_for_compare. * expr.c (do_store_flag): Likewise. Index: gcc/target-insns.def === --- gcc/target-insns.def(revision 225115) +++ gcc/target-insns.def(working copy) @@ -32,3 +32,4 @@ Instructions should be documented in md.texi rather than here. */ DEF_TARGET_INSN (return, (void)) DEF_TARGET_INSN (simple_return, (void)) +DEF_TARGET_INSN (canonicalize_funcptr_for_compare, (rtx x0, rtx x1)) Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 225115) +++ gcc/fold-const.c(working copy) @@ -4488,14 +4488,12 @@ build_range_check (location_t loc, tree { tree etype = TREE_TYPE (exp), value; -#ifdef HAVE_canonicalize_funcptr_for_compare /* Disable this optimization for function pointer expressions on targets that require function pointer canonicalization. */ - if (HAVE_canonicalize_funcptr_for_compare + if (targetm.have_canonicalize_funcptr_for_compare () TREE_CODE (etype) == POINTER_TYPE TREE_CODE (TREE_TYPE (etype)) == FUNCTION_TYPE) return NULL_TREE; -#endif if (! in_p) { @@ -6964,14 +6962,12 @@ fold_widened_comparison (location_t loc, return NULL_TREE; shorter_type = TREE_TYPE (arg0_unw); -#ifdef HAVE_canonicalize_funcptr_for_compare /* Disable this optimization if we're casting a function pointer type on targets that require function pointer canonicalization. 
*/ - if (HAVE_canonicalize_funcptr_for_compare + if (targetm.have_canonicalize_funcptr_for_compare () TREE_CODE (shorter_type) == POINTER_TYPE TREE_CODE (TREE_TYPE (shorter_type)) == FUNCTION_TYPE) return NULL_TREE; -#endif if (TYPE_PRECISION (TREE_TYPE (arg0)) = TYPE_PRECISION (shorter_type)) return NULL_TREE; @@ -7059,14 +7055,12 @@ fold_sign_changed_comparison (location_t arg0_inner = TREE_OPERAND (arg0, 0); inner_type = TREE_TYPE (arg0_inner); -#ifdef HAVE_canonicalize_funcptr_for_compare /* Disable this optimization if we're casting a function pointer type on targets that require function pointer canonicalization. */ - if (HAVE_canonicalize_funcptr_for_compare + if (targetm.have_canonicalize_funcptr_for_compare () TREE_CODE (inner_type) == POINTER_TYPE TREE_CODE (TREE_TYPE (inner_type)) == FUNCTION_TYPE) return NULL_TREE; -#endif if (TYPE_PRECISION (inner_type) != TYPE_PRECISION (outer_type)) return NULL_TREE; Index: gcc/dojump.c === --- gcc/dojump.c(revision 225115) +++ gcc/dojump.c(working copy) @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3. #include predict.h #include basic-block.h #include tm_p.h +#include target.h static bool prefer_and_bit_test (machine_mode, int); static void do_jump_by_parts_greater (tree, tree, int, @@ -1204,13 +1205,12 @@ do_compare_and_jump (tree treeop0, tree unsignedp = TYPE_UNSIGNED (type); code = unsignedp ? unsigned_code : signed_code; -#ifdef HAVE_canonicalize_funcptr_for_compare /* If function pointers need to be canonicalized before they can be reliably compared, then canonicalize them. Only do this if *both* sides of the comparison are function pointers. If one side isn't, we want a noncanonicalized comparison. See PR middle-end/17564. 
*/ - if (HAVE_canonicalize_funcptr_for_compare + if (targetm.have_canonicalize_funcptr_for_compare () TREE_CODE (TREE_TYPE (treeop0)) == POINTER_TYPE TREE_CODE (TREE_TYPE (TREE_TYPE (treeop0))) == FUNCTION_TYPE @@ -1221,13 +1221,12 @@ do_compare_and_jump (tree treeop0, tree rtx new_op0 = gen_reg_rtx (mode); rtx new_op1 = gen_reg_rtx (mode); - emit_insn (gen_canonicalize_funcptr_for_compare (new_op0, op0)); + emit_insn (targetm.gen_canonicalize_funcptr_for_compare (new_op0, op0)); op0 = new_op0; - emit_insn (gen_canonicalize_funcptr_for_compare (new_op1, op1)); + emit_insn (targetm.gen_canonicalize_funcptr_for_compare (new_op1, op1)); op1 = new_op1; } -#endif do_compare_rtx_and_jump (op0, op1, code, unsignedp, mode, ((mode == BLKmode) Index: gcc/expr.c === --- gcc/expr.c (revision 225115) +++ gcc/expr.c
[AArch64] Fall back to -fPIC if no assembler support for the -fpic relocation modifier
This patch fixes the breakage caused by https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01913.html. We fall back to -fPIC if there is no assembler support for the new relocation modifiers used by -fpic. OK for trunk?

gcc/
* configure.ac: Add check for aarch64 assembler -fpic relocation modifier support.
* configure: Regenerate.
* config.in: Regenerate.
* config/aarch64/aarch64.c (initialize_aarch64_code_model): Fall back to -fPIC if the assembler does not support the -fpic relocation modifiers.

-- Regards, Jiong

diff --git a/gcc/config.in b/gcc/config.in index 3aee936..476d585 100644 --- a/gcc/config.in +++ b/gcc/config.in @@ -606,6 +606,12 @@ #endif +/* Define if your assembler supports relocs needed by -fpic. */ +#ifndef USED_FOR_TARGET +#undef HAVE_AS_SPIC_RELOCS +#endif + + /* Define if your assembler and linker support thread-local storage. */ #ifndef USED_FOR_TARGET #undef HAVE_AS_TLS diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index f130f8d..7e09e3b 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -7581,9 +7581,13 @@ initialize_aarch64_code_model (void) aarch64_cmodel = AARCH64_CMODEL_TINY_PIC; break; case AARCH64_CMODEL_SMALL: +#ifdef HAVE_AS_SPIC_RELOCS aarch64_cmodel = (flag_pic == 2 ? AARCH64_CMODEL_SMALL_PIC : AARCH64_CMODEL_SMALL_SPIC); +#else + aarch64_cmodel = AARCH64_CMODEL_SMALL_PIC; +#endif break; case AARCH64_CMODEL_LARGE: sorry (code model %qs with -f%s, large, diff --git a/gcc/configure b/gcc/configure index 3f3f578..b72cde7 100755 --- a/gcc/configure +++ b/gcc/configure @@ -24228,6 +24228,40 @@ $as_echo #define HAVE_AS_MABI_OPTION 1 confdefs.h done fi fi +# Check if we have binutils support for relocations types needed by -fpic +{ $as_echo $as_me:${as_lineno-$LINENO}: checking assembler for -fpic relocs 5 +$as_echo_n checking assembler for -fpic relocs...
6; } +if test ${gcc_cv_as_aarch64_picreloc+set} = set; then : + $as_echo_n (cached) 6 +else + gcc_cv_as_aarch64_picreloc=no + if test x$gcc_cv_as != x; then +$as_echo ' + .text + ldr x0, [x2, #:gotpage_lo15:globalsym] +' conftest.s +if { ac_try='$gcc_cv_as $gcc_cv_as_flags -o conftest.o conftest.s 5' + { { eval echo \\$as_me\:${as_lineno-$LINENO}: \$ac_try\; } 5 + (eval $ac_try) 25 + ac_status=$? + $as_echo $as_me:${as_lineno-$LINENO}: \$? = $ac_status 5 + test $ac_status = 0; }; } +then + gcc_cv_as_aarch64_picreloc=yes +else + echo configure: failed program was 5 + cat conftest.s 5 +fi +rm -f conftest.o conftest.s + fi +fi +{ $as_echo $as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_aarch64_picreloc 5 +$as_echo $gcc_cv_as_aarch64_picreloc 6; } +if test $gcc_cv_as_aarch64_picreloc = yes; then + +$as_echo #define HAVE_AS_SPIC_RELOCS 1 confdefs.h + +fi # Enable default workaround for AArch64 Cortex-A53 erratum 835769. # Check whether --enable-fix-cortex-a53-835769 was given. if test ${enable_fix_cortex_a53_835769+set} = set; then : diff --git a/gcc/configure.ac b/gcc/configure.ac index 85f72d5..fb7dbfb 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -3562,6 +3562,13 @@ case $target in done fi fi +# Check if we have binutils support for relocations types needed by -fpic +gcc_GAS_CHECK_FEATURE([-fpic relocs], gcc_cv_as_aarch64_picreloc,,, +[ + .text + ldr x0, [[x2, #:gotpage_lo15:globalsym]] +],,[AC_DEFINE(HAVE_AS_SPIC_RELOCS, 1, + [Define if your assembler supports relocs needed by -fpic.])]) # Enable default workaround for AArch64 Cortex-A53 erratum 835769. AC_ARG_ENABLE(fix-cortex-a53-835769, [
Re: conditional lim
On Tue, Jun 9, 2015 at 10:11 PM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: On Tue, Jun 9, 2015 at 3:46 PM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 29, 2015 at 3:14 PM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: Hi Richard, Here is some explanation. I hope you let me know if I need to clarify something. Also, you asked me about concrete example, to make sure you don’t miss my answer here is the link: https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02417.html. Also, I doubt whether it’s convenient for you to create a build with my patch or not. May be to clarify things you could send me some examples/concrete cases, then I’ll compile them with –fdump-tree-loopinit-details and –fdump-tree-lim-details and send you these dumps. May be these dumps will be useful. (I’ll only disable cleanup_cfg TODO after lim to let you know the exact picture after lim). What do you think? 1. invariantness _dom_walker – 1.1 for each GIMPLE_COND in given bb calls handle_cond_stmt to call for true and false edges handle_branch_edge, which calls SET_TARGET_OF for all bb ‘predicated’ by given GIMPLE_COND. SET_TARGET_OF sets in basic_blocks aux 2 facts: a) this is true or false edge; b) link to cond stmt; Handle_branch_edge works this way: If (cond1) { bb1; if (cond2} { bb2; } Being called for cond1, it sets cond1 as condition for both bb1 and bb2 (the whole branch for cond1, ie also for bb containing cond2), then this method will be called (as there is dominance order) for cond2 to correct things (ie to set cond2 as condition for bb2). Hmm, why not track the current condition as state during the DOM walk and thus avoid processing more than one basic-block in handle_branch_edge? Thus get rid of handle_branch_edge and instead do everything in handle_cond_stmt plus the dom-walkers BB visitor? I need to look more carefully how to implement it, but I think I understand what you mean and this optimization of course looks reasonable to me. Will do. 
I see you don't handle BBs with multiple predecessors - that's ok, but are you sure you don't run into correctness issues when not marking such BBs as predicated? This misses handling of, say, if (a || b) bb; which is a pity (but can be fixed later if desired).

I had a test (in the gcc testsuite or bootstrap build) which worked incorrectly because of multiple predecessors. As far as I remember, the situation was like this (later I will make some notes about such tests to clarify this better); I mean with the previous version of my code, which handled bbs with 2 predecessors:

if (a)
  tmpvar = something;
while ()
  if (a || b)
    basic_block { do something with tmpvar; }  /* a basic block predicated by the bb with a and the bb with b */

So, if a is false (that is, we didn't do tmpvar = something outside the loop) BUT we are in basic_block (we went via the bb with b), we got a segmentation fault in basic_block { do something with tmpvar; }. I think we can reproduce all the details of this test if I remove the handling restriction for bbs with 2 predecessors. So I wouldn't move bbs with 2 predecessors (such a bb is neither an always-executed bb in any loop nor a conditionally executed one; they will not be moved at all). This is my more detailed explanation on this point. Perhaps I didn't understand your question about correctness. Could you repeat it in other words (based on this new clarification)? So I think, according to the current code, it will not be moved. What incorrectness do you mean?

If the block isn't marked as predicated, the question is whether it is handled correctly or assumed to be unconditionally executed.

I note that collecting predicates has similarities to what if-conversion does in tree-if-conv.c (even if its implementation is completely different, of course).

Ok, I'll look at this. But could you please clarify your point? (Should I just take this into account with low priority and look at this later, or do you want some refactoring?)

I just noted similar code exists elsewhere - it may be possible to factor it out, but I didn't investigate.
And no, doing that isn't a prerequisite for this patch.

1.2 As 1.1 runs, we identify whether a given bb is predicated by some condition or not. bb->aux->type will be [TRUE/FALSE]_TARGET_OF and bb->aux->cond_stmt = the cond stmt (the nearest condition). If the bb is always executed, bb->aux->type = ALWAYS_EXECUTED_IN, bb->loop->loop (this info was available in the clean build).

1.3 As this walker is called in dominance order, information about the condition is available when invariantness_dom_walker is called for a given bb. So we can make some computations based on the bb->aux structure. This is done in check_conditionally_executed. The main goal of this method is to check that the root condition is always executed in the loop. I did so to avoid situations like this:

Loop:
  Jmp somewhere;
  If (cond1)
    If (cond2)
Re: Fix PR43404, PR48470, PR64744 ICE on naked functions
I've updated the patch with an attribute lookup. Is it OK? -- Alexander

2015-06-26 9:33 GMT+03:00 Alexander Basov coo...@gmail.com:
2015-06-25 21:47 GMT+03:00 Jeff Law l...@redhat.com:
On 06/03/2015 02:15 PM, Alexander Basov wrote:

Hello Jeff, please find the updated patch attached.

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index b190f91..c6db8a9 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -1382,7 +1382,15 @@ expand_one_var (tree var, bool toplevel, bool really_expand) else { if (really_expand) -expand_one_stack_var (origvar); +{ + if (!targetm.calls.allocate_stack_slots_for_args ()) +error (cannot allocate stack for variable %q+D, naked function., + var); + + expand_one_stack_var (origvar); +}

So how do you know ORIGVAR is an argument here before issuing the error? I.e., shouldn't you verify that the underlying object is a PARM_DECL? If there's some way we already know we're dealing with a PARM_DECL, then just say so.

In the case of a naked function the stack should not be used, not only for function args but also for any local variables. So I think we don't need to check whether the underlying object is a PARM_DECL.

Then that would indicate that we're using the wrong test (allocate_stack_slots_for_args). That hook is for whether or not arguments should have stack slots allocated. Yet you're issuing an error for more than just PARM_DECLs. Shouldn't you instead be checking whether the current function is a naked function by checking the attributes of the current function? Jeff

All allocate_stack_slots_for_args does is check whether the current function is naked or not. Maybe it would be better to remove allocate_stack_slots_for_args and replace it with an explicit check for the naked attribute? -- Alexander

commit 3a72dac72beb713ab6a566728b77c4da6d297755 Author: Alexander Basov coo...@gmail.com Date: Tue Mar 10 14:15:24 2015 +0300

PR middle-end/64744 PR middle-end/48470 PR middle-end/43404
* gcc/cfgexpand.c (expand_one_var): Add check if stack is going to be used in naked function.
* gcc/expr.c (expand_expr_addr_expr_1): Remove excess checking whether expression should not reside in MEM.
* gcc/function.c (use_register_for_decl): Do not use registers for non-register things (volatile, float, BLKMode) in naked functions.
* gcc/testsuite/gcc.target/arm/pr43404.c: New testcase.
* gcc/testsuite/gcc.target/arm/pr48470.c: New testcase.
* gcc/testsuite/gcc.target/arm/pr64744-1.c: New testcase.
* gcc/testsuite/gcc.target/arm/pr64744-2.c: New testcase.

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index 05eb2ad..b7b4804 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -1349,7 +1349,16 @@ expand_one_var (tree var, bool toplevel, bool really_expand) else { if (really_expand) -expand_one_stack_var (origvar); +{ + if (lookup_attribute (naked, +DECL_ATTRIBUTES (current_function_decl))) +error (cannot allocate stack for variable %q+D, naked function., + var); + + expand_one_stack_var (origvar); +} + + return tree_to_uhwi (DECL_SIZE_UNIT (var)); } return 0; diff --git a/gcc/expr.c b/gcc/expr.c index 408ae1a..34cd7de 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -7631,15 +7631,7 @@ expand_expr_addr_expr_1 (tree exp, rtx target, machine_mode tmode, marked TREE_ADDRESSABLE, which will be either a front-end or a tree optimizer bug. */ - if (TREE_ADDRESSABLE (exp) - ! MEM_P (result) - ! targetm.calls.allocate_stack_slots_for_args ()) - { - error (local frame unavailable (naked function?)); - return result; - } - else - gcc_assert (MEM_P (result)); + gcc_assert (MEM_P (result)); result = XEXP (result, 0); /* ??? Is this needed anymore? */ diff --git a/gcc/function.c b/gcc/function.c index cffe323..0866c49 100644 --- a/gcc/function.c +++ b/gcc/function.c @@ -2110,9 +2110,6 @@ aggregate_value_p (const_tree exp, const_tree fntype) bool use_register_for_decl (const_tree decl) { - if (!targetm.calls.allocate_stack_slots_for_args ()) -return true; - /* Honor volatile.
*/ if (TREE_SIDE_EFFECTS (decl)) return false; @@ -2140,6 +2137,9 @@ use_register_for_decl (const_tree decl) if (flag_float_store FLOAT_TYPE_P (TREE_TYPE (decl))) return false; + if (!targetm.calls.allocate_stack_slots_for_args ()) +return true; + /* If we're not interested in tracking debugging information for this decl, then we can certainly put it in a register. */ if (DECL_IGNORED_P (decl)) diff --git a/gcc/testsuite/gcc.target/arm/pr43404.c b/gcc/testsuite/gcc.target/arm/pr43404.c new file mode 100644 index 000..4f2291d --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr43404.c @@ -0,0 +1,10
Re: [PATCH] Move X - (X / Y) * Y folding to match.pd
On Mon, 29 Jun 2015, Marek Polacek wrote: On Mon, Jun 29, 2015 at 09:36:59AM +0200, Richard Biener wrote: Anything wrong with this? +/* X - (X / Y) * Y is the same as X % Y. */ +(simplify + (minus (convert? @0) (convert? (mult (trunc_div @0 @1) @1))) + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) + (convert (trunc_mod @0 @1 That looks awfully similar to a variant I also tried (but I remember having convert1? and convert2? in it). Not sure what was wrong with that one; certainly yours seems to work fine. Not sure whether we need some tree_nop_conversion_p in it. Perhaps not. Yes. Eventually even (convert? (mult (convert1? (trunc_div ...)? Of course with matching @0 between the two operands of the minus you constrain types quite a bit. I'm starting to dislike this whole convert business ;). ;) fold-const.c STRIP_NOPS certainly was both convenient and error-prone at the same time. I'd say just single-step through fold and see what types it get present when folding a - (unsigned) ((a / b) * b). I did that. The whole expression has type unsigned int, arg0 is a of type int and arg1 is (a / b) * b of type int. The following version is what Marc suggests. Bootstrapped/regtested on x86_64-linux, ok for trunk? Ok. Thanks, Richard. 2015-06-29 Marek Polacek pola...@redhat.com Marc Glisse marc.gli...@inria.fr * fold-const.c (fold_binary_loc): Move X - (X / Y) * Y - X % Y to ... * match.pd: ... pattern here. diff --git gcc/fold-const.c gcc/fold-const.c index 6f12dd0..01e3983 100644 --- gcc/fold-const.c +++ gcc/fold-const.c @@ -10509,19 +10509,6 @@ fold_binary_loc (location_t loc, fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0))); - /* X - (X / Y) * Y is X % Y. 
*/ - if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) -TREE_CODE (arg1) == MULT_EXPR -TREE_CODE (TREE_OPERAND (arg1, 0)) == TRUNC_DIV_EXPR -operand_equal_p (arg0, - TREE_OPERAND (TREE_OPERAND (arg1, 0), 0), 0) -operand_equal_p (TREE_OPERAND (TREE_OPERAND (arg1, 0), 1), - TREE_OPERAND (arg1, 1), 0)) - return - fold_convert_loc (loc, type, - fold_build2_loc (loc, TRUNC_MOD_EXPR, TREE_TYPE (arg0), - arg0, TREE_OPERAND (arg1, 1))); - if (! FLOAT_TYPE_P (type)) { /* Fold A - (A B) into ~B A. */ diff --git gcc/match.pd gcc/match.pd index b2f8429..2bc158b 100644 --- gcc/match.pd +++ gcc/match.pd @@ -238,6 +238,12 @@ along with GCC; see the file COPYING3. If not see tree_nop_conversion_p (type, TREE_TYPE (@1))) (trunc_mod @0 (convert @1 +/* X - (X / Y) * Y is the same as X % Y. */ +(simplify + (minus (convert? @0) (convert? (mult (trunc_div @0 @1) @1))) + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) + (convert (trunc_mod @0 @1 + /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR, i.e. X % C into X (C - 1), if X and C are positive. Also optimize A % (C N) where C is a power of 2, Marek -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)
Re: Debug mode enhancements
On 12/06/15 19:11 +0200, François Dumont wrote:

Hi,

This is a patch to:
- Enhance __get_distance to give better feedback about the distance between iterators, so that we can make sharper decisions about what is right or not. This function is now aware of safe iterators and leverages them a little like std::distance does with C++11 list::iterator.
- Make debug mode aware of the iterator adapters reverse_iterator and move_iterator.
- Thanks to the previous points, this patch also extends the situations where it is possible to remove debug layers on iterators, to lower the performance impact of this mode. We now detect at runtime whether we know enough about the iterator range to get rid of the potential debug layer.

For the last point I introduced __gnu_debug::__unsafe, which removes the debug layer unconditionally, as opposed to __gnu_debug::__base, which does so only for random access iterators. The latter has been kept to be used in the context of constructors. I had to introduce new debug headers to limit the impact on stl_iterator.h. We shall not include debug.h there, as the purpose is not to inject debug checks into the normal code.

Note that the new __get_distance will be very useful for implementing proper debug algos.

I didn't check every line of this, but it looks good and I trust your judgment for debug mode changes, so it's OK for trunk, thanks.
[PATCH, PR66652] Use max_loop_iterations in transform_to_exit_first_loop_alt
Hi, this patch fixes PR66652. It uses max_loop_iterations in transform_to_exit_first_loop_alt to ensure that the new loop bound nit + 1 doesn't overflow. Bootstrapped and reg-tested on x86_64. OK for trunk? Thanks, - Tom Use max_loop_iterations in transform_to_exit_first_loop_alt 2015-06-29 Tom de Vries t...@codesourcery.com PR tree-optimization/66652 * tree-parloops.c (try_transform_to_exit_first_loop_alt): Use max_loop_iterations to determine if nit + 1 overflows. * testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c (f): Rewrite using restrict pointers. (main): Add arguments to calls to f. * testsuite/libgomp.c/parloops-exit-first-loop-alt.c: Same. * gcc.dg/parloops-exit-first-loop-alt-pr66652.c: New test. * gcc.dg/parloops-exit-first-loop-alt-3.c (f): Rewrite using restrict pointers. * gcc.dg/parloops-exit-first-loop-alt.c: Same. --- .../gcc.dg/parloops-exit-first-loop-alt-3.c| 2 +- .../gcc.dg/parloops-exit-first-loop-alt-pr66652.c | 31 ++ .../gcc.dg/parloops-exit-first-loop-alt.c | 19 + gcc/tree-parloops.c| 31 ++ .../libgomp.c/parloops-exit-first-loop-alt-3.c | 6 ++--- .../libgomp.c/parloops-exit-first-loop-alt.c | 7 ++--- 6 files changed, 77 insertions(+), 19 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-pr66652.c diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c index b0fde37..fec53a1 100644 --- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c +++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c @@ -7,7 +7,7 @@ unsigned int *a; unsigned int -f (unsigned int n) +f (unsigned int n, unsigned int *__restrict__ a) { int i; unsigned int sum = 1; diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-pr66652.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-pr66652.c new file mode 100644 index 000..2ea097d --- /dev/null +++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-pr66652.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { 
dg-require-effective-target pthread } */
+/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <limits.h>
+
+unsigned int
+f (unsigned int n, unsigned int sum)
+{
+  unsigned int i;
+
+  i = UINT_MAX;
+  do
+    {
+      sum += i % 13;
+      i++;
+    }
+  while (i < n - 1);
+
+  return sum;
+}
+
+/* Four times % 13:
+   - once in f._loopfn.0
+   - once in the parallel
+   - once in the low iteration count loop
+   - once for a peeled off last iteration following the parallel.
+   In other words, we want try_transform_to_exit_first_loop_alt to fail.  */
+/* { dg-final { scan-tree-dump-times "(?n)% 13" 4 "parloops" } } */
diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c
index b36f01b..e088fa1 100644
--- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c
+++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c
@@ -4,14 +4,9 @@
 /* Variable bound, vector addition.  */
-#define N 1000
-
-unsigned int a[N];
-unsigned int b[N];
-unsigned int c[N];
-
 void
-f (unsigned int n)
+f (unsigned int n, unsigned int *__restrict__ a, unsigned int *__restrict__ b,
+   unsigned int *__restrict__ c)
 {
   int i;
 
@@ -19,9 +14,9 @@ f (unsigned int n)
     c[i] = a[i] + b[i];
 }
 
-/* Three times a store:
-   - one in f._loopfn.0
-   - one in the parallel
-   - one in the low iteration count loop
    Crucially, none for a peeled off last iteration following the parallel.
 */
-/* { dg-final { scan-tree-dump-times "(?n)\\\[i" 9 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "(?n)^  \\*_\[0-9\]*" 3 "parloops" } } */
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index cb0ba54..32d059a 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1795,8 +1795,39 @@ try_transform_to_exit_first_loop_alt (struct loop *loop,
   gcc_assert (TREE_CODE (nit) == SSA_NAME);
 
+  /* Variable nit is the loop bound as returned by canonicalize_loop_ivs, for an
+     iv with base 0 and step 1 that is incremented in the latch, like this:
+
+     <bb header>:
+     # iv_1 = PHI <0 (preheader), iv_2 (latch)>
+     ...
+     if (iv_1 < nit)
+       goto <bb latch>;
+     else
+       goto <bb exit>;
+
+     <bb latch>:
+     iv_2 = ivtmp_1 + 1;
+     goto <bb header>;
+
+     The range of iv_1 is [0, nit].  The latch edge is taken for
+     iv_1 in [0, nit - 1] and the exit edge is taken for iv_1 == nit.  So the
+     number of latch executions is equal to nit.
+
+     The function max_loop_iterations gives us the maximum number of latch
+     executions, so it gives us the maximum value of nit.  */
+  widest_int nit_max;
+  if (!max_loop_iterations
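The situation the patch guards against is ordinary unsigned wrap-around: the alternative loop form needs nit + 1 iterations, and for nit == UINT_MAX that bound wraps to 0. A minimal illustration (the helper name is made up, not from the patch):

```cpp
#include <cassert>
#include <climits>

// The transformed loop would run nit + 1 times.  When nit is already the
// maximum value of its unsigned type, nit + 1 wraps to 0, so the transform
// must be rejected unless max_loop_iterations proves nit cannot be UINT_MAX.
unsigned int alt_loop_bound (unsigned int nit)
{
  return nit + 1;  // well-defined unsigned wrap-around
}
```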
Re: [PATCH 3/3][ARM][PR target/65697] Add tests for __sync builtins.
On Mon, Jun 22, 2015 at 10:52 AM, Matthew Wahab matthew.wa...@arm.com wrote: This is the ARM version of the patches to strengthen memory barriers for the __sync builtins on ARMv8 targets (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html). This patch adds tests for the code generated by the ARM backend for the __sync builtins. Tested the series for arm-none-linux-gnueabihf with check-gcc. Ok for trunk? Matthew gcc/testsuite 2015-06-22 Matthew Wahab matthew.wa...@arm.com PR Target/65697 And as earlier ... s/Target/target :) * gcc.target/arm/armv8-sync-comp-swap.c: New. * gcc.target/arm/armv8-sync-op-acquire.c: New. * gcc.target/arm/armv8-sync-op-full.c: New. * gcc.target/arm/armv8-sync-op-release.c: New. OK. Ramana
[RFC, PATCH] Split pool_allocator and create a new object_allocator
Hello. This mail thread is follow-up from the previous thread, where Ulrich Weigand spotted a miscompilation: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00868.html Following patch changes behavior of the pool_allocator: 1) pool_allocator is typeless and does not call any ctor 2) pool_allocator returns void* as a result of allocate function 3) all classes (e.g. et_occ) which use an operator new utilize the typeless allocator 4) object_allocator is a new type-based allocator which is used for allocation of objects 5) for a type T, the allocator calls placement new constructor and T* is return type of the allocate function 6) the object_allocator internally utilizes pool_allocator with m_size == sizeof(T) The patch can bootstrap on x86_64-unknown-linux-gnu and ppc64le-unknown-linux-gnu and regression tests have been running. Thoughts? Thanks, Martin From 9702a6ea0b60985f08ff28b0977c1dc46c25f27b Mon Sep 17 00:00:00 2001 From: mliska mli...@suse.cz Date: Wed, 24 Jun 2015 13:42:52 +0200 Subject: [PATCH] Add new object_allocator. gcc/c-family/ChangeLog: 2015-06-24 Martin Liska mli...@suse.cz * c-format.c (static void check_format_info_main): Use object_allocator instead of pool_allocator. (check_format_arg): Likewise. (check_format_info_main): Likewise. gcc/ChangeLog: 2015-06-24 Martin Liska mli...@suse.cz * alloc-pool.h (pool_allocator::initialize): Change to type-less pool_allocator. (pool_allocator::allocate): Likewise. (pool_allocator::remove): Likewise. * asan.c (struct asan_mem_ref): Likewise. * cfg.c (initialize_original_copy_tables): Use object_allocator instead of pool_allocator. * cselib.c (struct elt_list): Change declaration of used pool_allocator. (new_cselib_val): Likewise. * cselib.h (struct cselib_val): Change to type-less pool_allocator.. (struct elt_loc_list): Likewise. * df-problems.c (df_chain_alloc): Use object_allocator instead of pool_allocator.. * df-scan.c (struct df_scan_problem_data): Likewise. (df_scan_alloc): Likewise. 
* df.h (struct dataflow): Likewise. * dse.c (struct read_info_type): Change to type-less pool_allocator. (struct insn_info_type): Likewise. (struct dse_bb_info_type): Likewise. (struct group_info): Likewise. (struct deferred_change): Likewise. * et-forest.c (struct et_occ): Likewise. * et-forest.h (struct et_node): Likewise. * ipa-cp.c: Use object_allocator instead of pool_allocator. * ipa-inline-analysis.c: Likewise. * ipa-profile.c: Likewise. * ipa-prop.c: Likewise. * ipa-prop.h: Likewise. * ira-build.c (initiate_cost_vectors): Change to type-less pool_allocator. (ira_allocate_cost_vector): Likewise. * ira-color.c (struct update_cost_record): Use object_allocator instead of pool_allocator. * lra-int.h (struct lra_live_range): Change to type-less pool_allocator. (struct lra_copy): Likewise. (struct lra_insn_reg): Likewise. * lra-lives.c: Likewise. * lra.c: Likewise. * regcprop.c (struct queued_debug_insn_change): Use object_allocator instead of pool_allocator. * sched-deps.c (sched_deps_init): Likewise. * sel-sched-ir.c: Likewise. * sel-sched-ir.h: Likewise. * stmt.c (expand_case): Likewise. (expand_sjlj_dispatch_table): Likewise. * tree-sra.c (struct access): Change to type-less pool_allocator. (struct assign_link): Likewise. * tree-ssa-math-opts.c (pass_cse_reciprocals::execute): Use object_allocator instead of pool_allocator. * tree-ssa-pre.c: Likewise. * tree-ssa-reassoc.c: Likewise. * tree-ssa-sccvn.c (allocate_vn_table): Likewise. * tree-ssa-strlen.c: Likewise. * tree-ssa-structalias.c: Likewise. * var-tracking.c (onepart_pool_allocate): Change to type-less pool_allocator. (unshare_variable): Likewise. (variable_merge_over_cur): Likewise. (variable_from_dropped): Likewise. (variable_was_changed): Likewise. (set_slot_part): Likewise. (emit_notes_for_differences_1): Likewise. 
--- gcc/alloc-pool.h | 176 + gcc/asan.c | 8 +-- gcc/c-family/c-format.c| 7 +- gcc/cfg.c | 5 +- gcc/cselib.c | 17 +++-- gcc/cselib.h | 12 ++-- gcc/df-problems.c | 2 +- gcc/df-scan.c | 24 +++ gcc/df.h | 2 +- gcc/dse.c | 50 +++-- gcc/et-forest.c| 11 ++- gcc/et-forest.h| 6 +- gcc/ipa-cp.c | 8 +-- gcc/ipa-inline-analysis.c | 2 +- gcc/ipa-profile.c | 2 +- gcc/ipa-prop.c | 2 +- gcc/ipa-prop.h | 8 +-- gcc/ira-build.c| 18 ++--- gcc/ira-color.c| 8 +-- gcc/lra-int.h | 19 +++-- gcc/lra-lives.c| 3 +- gcc/lra.c | 4 +- gcc/regcprop.c | 4 +- gcc/sched-deps.c | 8 +-- gcc/sel-sched-ir.c | 2 +- gcc/sel-sched-ir.h | 2 +- gcc/stmt.c | 7 +- gcc/tree-sra.c | 16 ++---
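In outline, the proposed split looks roughly like the sketch below. This is a simplification for illustration: the names mirror the patch's description (points 1-6 above), but the real pool_allocator manages free lists and block chains rather than calling operator new per chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// (1)(2) pool_allocator is typeless: it hands out raw fixed-size chunks,
// never runs a constructor, and allocate returns void *.
class pool_allocator
{
public:
  explicit pool_allocator (size_t size) : m_size (size) {}

  void *allocate ()
  {
    void *p = ::operator new (m_size);
    m_blocks.push_back (p);  // stand-in for the real free-list/block logic
    return p;
  }

  void remove (void *) { /* return chunk to the free list; elided here */ }

  ~pool_allocator ()
  {
    for (void *p : m_blocks)
      ::operator delete (p);
  }

private:
  size_t m_size;
  std::vector<void *> m_blocks;
};

// (4)(5)(6) object_allocator layers type knowledge on top: it wraps a
// pool_allocator with m_size == sizeof (T) and placement-news on allocate.
template <typename T>
class object_allocator
{
public:
  object_allocator () : m_pool (sizeof (T)) {}

  T *allocate () { return ::new (m_pool.allocate ()) T (); }

  void remove (T *p)
  {
    p->~T ();
    m_pool.remove (p);
  }

private:
  pool_allocator m_pool;
};
```

Classes that overload operator new (point 3, e.g. et_occ) would keep using the typeless allocator directly, since operator new must return raw storage anyway.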
Re: [PATCH] Move X - (X / Y) * Y folding to match.pd
On Mon, Jun 29, 2015 at 09:36:59AM +0200, Richard Biener wrote: Anything wrong with this? +/* X - (X / Y) * Y is the same as X % Y. */ +(simplify + (minus (convert? @0) (convert? (mult (trunc_div @0 @1) @1))) + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) + (convert (trunc_mod @0 @1)))) That looks awfully similar to a variant I also tried (but I remember having convert1? and convert2? in it). Not sure what was wrong with that one; certainly yours seems to work fine. Not sure whether we need some tree_nop_conversion_p in it. Perhaps not. Yes. Eventually even (convert? (mult (convert1? (trunc_div ...)? Of course with matching @0 between the two operands of the minus you constrain types quite a bit. I'm starting to dislike this whole convert business ;). I'd say just single-step through fold and see what types it gets presented when folding a - (unsigned) ((a / b) * b). I did that. The whole expression has type unsigned int, arg0 is a of type int and arg1 is (a / b) * b of type int. The following version is what Marc suggests. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2015-06-29 Marek Polacek pola...@redhat.com Marc Glisse marc.gli...@inria.fr * fold-const.c (fold_binary_loc): Move X - (X / Y) * Y -> X % Y to ... * match.pd: ... pattern here. diff --git gcc/fold-const.c gcc/fold-const.c index 6f12dd0..01e3983 100644 --- gcc/fold-const.c +++ gcc/fold-const.c @@ -10509,19 +10509,6 @@ fold_binary_loc (location_t loc, fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0))); - /* X - (X / Y) * Y is X % Y. */ - if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) - && TREE_CODE (arg1) == MULT_EXPR - && TREE_CODE (TREE_OPERAND (arg1, 0)) == TRUNC_DIV_EXPR - && operand_equal_p (arg0, - TREE_OPERAND (TREE_OPERAND (arg1, 0), 0), 0) - && operand_equal_p (TREE_OPERAND (TREE_OPERAND (arg1, 0), 1), - TREE_OPERAND (arg1, 1), 0)) - return - fold_convert_loc (loc, type, - fold_build2_loc (loc, TRUNC_MOD_EXPR, TREE_TYPE (arg0), -arg0, TREE_OPERAND (arg1, 1))); - if (!
FLOAT_TYPE_P (type)) { /* Fold A - (A & B) into ~B & A. */ diff --git gcc/match.pd gcc/match.pd index b2f8429..2bc158b 100644 --- gcc/match.pd +++ gcc/match.pd @@ -238,6 +238,12 @@ along with GCC; see the file COPYING3. If not see tree_nop_conversion_p (type, TREE_TYPE (@1))) (trunc_mod @0 (convert @1 +/* X - (X / Y) * Y is the same as X % Y. */ +(simplify + (minus (convert? @0) (convert? (mult (trunc_div @0 @1) @1))) + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) + (convert (trunc_mod @0 @1)))) + /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR, i.e. X % C into X & (C - 1), if X and C are positive. Also optimize A % (C << N) where C is a power of 2, Marek
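The identity behind the pattern is easy to sanity-check for truncating integer division (C/C++ semantics; the function name below is just for illustration):

```cpp
#include <cassert>

// X - (X / Y) * Y with truncating division is exactly X % Y, which is
// what the rule (minus @0 (mult (trunc_div @0 @1) @1)) -> (trunc_mod @0 @1)
// encodes.  Since C99/C++11 both / and % truncate toward zero, so the
// identity also holds for negative operands.
int mod_via_div (int x, int y)
{
  return x - (x / y) * y;
}
```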
[committed] Simplify structure try_transform_to_exit_first_loop_alt
Hi, this patch simplifies the structure of function try_transform_to_exit_first_loop_alt, no functional changes. Bootstrapped on x86_64, retested parloops-exit-first-loop-alt*.* testcases. Committed as trivial. Thanks, - Tom Simplify structure try_transform_to_exit_first_loop_alt 2015-06-29 Tom de Vries t...@codesourcery.com * tree-parloops.c (try_transform_to_exit_first_loop_alt): Simplify function structure. --- gcc/tree-parloops.c | 59 ++--- 1 file changed, 15 insertions(+), 44 deletions(-) diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index ab77f32..ec708c6 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -1788,60 +1788,31 @@ try_transform_to_exit_first_loop_alt (struct loop *loop, nit, build_one_cst (nit_type)); gcc_assert (TREE_CODE (alt_bound) == INTEGER_CST); + transform_to_exit_first_loop_alt (loop, reduction_list, alt_bound); + return true; } else { /* Todo: Figure out if we can trigger this, if it's worth to handle optimally, and if we can handle it optimally. 
*/ + return false; } } - else -{ - gcc_assert (TREE_CODE (nit) == SSA_NAME); - - gimple def = SSA_NAME_DEF_STMT (nit); - - if (def - && is_gimple_assign (def) - && gimple_assign_rhs_code (def) == PLUS_EXPR) - { - tree op1 = gimple_assign_rhs1 (def); - tree op2 = gimple_assign_rhs2 (def); - if (integer_minus_onep (op1)) - alt_bound = op2; - else if (integer_minus_onep (op2)) - alt_bound = op1; - } - /* There is a number of test-cases for which we don't get an alt_bound - here: they're listed here, with the lhs of the last stmt as the nit: + gcc_assert (TREE_CODE (nit) == SSA_NAME); - libgomp.graphite/force-parallel-1.c: - _21 = (signed long) N_6(D); - _19 = _21 + -1; - _7 = (unsigned long) _19; + gimple def = SSA_NAME_DEF_STMT (nit); - libgomp.graphite/force-parallel-2.c: - _33 = (signed long) N_9(D); - _16 = _33 + -1; - _37 = (unsigned long) _16; - - libgomp.graphite/force-parallel-5.c: - <bb 6>: - # graphite_IV.5_46 = PHI <0(5), graphite_IV.5_47(11)> - <bb 7>: - _33 = (unsigned long) graphite_IV.5_46; - - g++.dg/tree-ssa/pr34355.C: - _2 = (unsigned int) i_9; - _3 = 4 - _2; - - gcc.dg/pr53849.c: - _5 = d.0_11 + -2; - _18 = (unsigned int) _5; - - We will be able to handle some of these cases, if we can determine when - it's safe to look past casts. */ + if (def + && is_gimple_assign (def) + && gimple_assign_rhs_code (def) == PLUS_EXPR) +{ + tree op1 = gimple_assign_rhs1 (def); + tree op2 = gimple_assign_rhs2 (def); + if (integer_minus_onep (op1)) + alt_bound = op2; + else if (integer_minus_onep (op2)) + alt_bound = op1; } if (alt_bound == NULL_TREE) -- 1.9.1
Re: conditional lim
On Mon, Jun 29, 2015 at 5:10 PM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Jun 9, 2015 at 10:11 PM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: On Tue, Jun 9, 2015 at 3:46 PM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 29, 2015 at 3:14 PM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: Hi Richard, Here is some explanation. I hope you let me know if I need to clarify something. Also, you asked me about a concrete example; to make sure you don't miss my answer, here is the link: https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02417.html. Also, I doubt whether it's convenient for you to create a build with my patch or not. Maybe to clarify things you could send me some examples/concrete cases, then I'll compile them with -fdump-tree-loopinit-details and -fdump-tree-lim-details and send you these dumps. Maybe these dumps will be useful. (I'll only disable the cleanup_cfg TODO after lim to let you know the exact picture after lim.) What do you think? 1. invariantness_dom_walker: 1.1 for each GIMPLE_COND in a given bb, calls handle_cond_stmt to call, for the true and false edges, handle_branch_edge, which calls SET_TARGET_OF for all bbs 'predicated' by the given GIMPLE_COND. SET_TARGET_OF sets in basic_blocks aux 2 facts: a) this is a true or false edge; b) a link to the cond stmt. handle_branch_edge works this way: if (cond1) { bb1; if (cond2) { bb2; } } Being called for cond1, it sets cond1 as condition for both bb1 and bb2 (the whole branch for cond1, i.e. also for the bb containing cond2); then this method will be called (as there is dominance order) for cond2 to correct things (i.e. to set cond2 as condition for bb2). Hmm, why not track the current condition as state during the DOM walk and thus avoid processing more than one basic-block in handle_branch_edge? Thus get rid of handle_branch_edge and instead do everything in handle_cond_stmt plus the dom-walkers BB visitor?
I need to look more carefully how to implement it, but I think I understand what you mean and this optimization of course looks reasonable to me. Will do. I see you don't handle BBs with multiple predecessors - that's ok, but are you sure you don't run into correctness issues when not marking such BBs as predicated? This misses handling of, say if (a || b) bb; which is a pity (but can be fixed later if desired). I had some test (in the gcc testsuite or bootstrap build) which worked incorrectly because of multiple predecessors. As far as I remember the situation was (further, I will make some notes about such tests to clarify this better), I mean with the previous version of my code which handled bbs with 2 predecessors: if (a) tmpvar=something; while() if (a || b) basic_block {do something with tmpvar;} // I mean the basic block predicated by the bb with a and the bb with b So, if a is false, I mean we didn't do tmpvar=something (outside the loop), BUT we are in basic_block (we went by the bb with b), we got a Segmentation fault in basic_block {do something with tmpvar;}. I think we can reproduce all the details of this test if I remove the not-handling of bbs with 2 predecessors. So I wouldn't move bbs with 2 predecessors (this is not an always executed bb in any loop, nor conditionally; they will not be moved at all). This is my more detailed explanation on this point. Perhaps I didn't understand your question about correctness. Could you repeat it in other words (based on this new clarification)? So I think according to the current code it will not be moved. What incorrectness do you mean? If the block isn't marked as predicated the question is whether it is handled correctly or assumed to be unconditionally executed. I note that collecting predicates has similarities to what if-conversion does in tree-ifcvt.c (even if its implementation is completely different, of course). Ok, I'll look at this. But could you please clarify your point?
(Should I just take this into account with low priority and look at this later or do you want some refactoring?) I just noted similar code exists elsewhere - it may be possible to factor it out but I didn't investigate. And no, doing that isn't a prerequisite for this patch. 1.2 As 1.1 goes we identify whether some bb is predicated by some condition or not. bb->aux->type will be [TRUE/FALSE]_TARGET_OF and bb->aux->cond_stmt = cond stmt (the nearest condition). If bb is always executed, bb->aux->type = ALWAYS_EXECUTED_IN, bb->aux->loop = loop (this info was available in the clean build). 1.3 As this walker is called in dominance order, information about the condition is available when invariantness_dom_walker is called for a given bb. So we can make some computations based on the bb->aux structure. This is done in check_conditionally_executed. The main goal of this method is to check that the root condition is always executed in the loop. I did so to
Re: [PATCH 2/3][ARM][PR target/65697] Strengthen barriers for compare-and-swap builtin.
On Mon, Jun 22, 2015 at 10:50 AM, Matthew Wahab matthew.wa...@arm.com wrote: This is the ARM version of the patches to strengthen memory barriers for the __sync builtins on ARMv8 targets (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html). This patch changes the code generated for __sync_type_compare_and_swap to remove the acquire-barrier from the load and end the operation with a fence. This also strengthens the acquire barrier generated for __sync_lock_test_and_set which, like compare-and-swap, is implemented as a form of atomic exchange. Tested as part of a series for arm-none-linux-gnueabihf with check-gcc. Ok for trunk? Matthew gcc/ 2015-06-22 Matthew Wahab matthew.wa...@arm.com PR Target/65697 s/Target/target * config/arm/arm.c (arm_split_compare_and_swap): For ARMv8, replace an initial acquire barrier with a final full barrier. OK. Ramana
Re: [PATCH 1/3][ARM][PR target/65697] Strengthen memory barriers for __sync builtins
On Mon, Jun 22, 2015 at 10:48 AM, Matthew Wahab matthew.wa...@arm.com wrote: This is the ARM version of the patches to strengthen memory barriers for the __sync builtins on ARMv8 targets (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html). The problem is that the barriers generated for the __sync builtins for ARMv8 targets are too weak. This affects the full and the acquire barriers in the __sync fetch-and-op, compare-and-swap functions and __sync_lock_test_and_set. This patch series changes the code to strengthen the barriers by replacing initial load-acquires with a simple load and adding a final memory barrier to prevent code hoisting. - Full barriers: __sync_fetch_and_op, __sync_op_and_fetch __sync_*_compare_and_swap [load-acquire; code; store-release] becomes [load; code; store-release; barrier]. - Acquire barriers: __sync_lock_test_and_set [load-acquire; code; store] becomes [load; code; store; barrier] This patch changes the code generated for __sync_fetch_and_op and __sync_op_and_fetch builtins. Tested as part of a series for arm-none-linux-gnueabihf with check-gcc. Ok for trunk? Matthew gcc/ 2015-06-22 Matthew Wahab matthew.wa...@arm.com PR Target/65697 s/Target/target * config/arm/arm.c (arm_split_atomic_op): For ARMv8, replace an initial acquire barrier with a final full barrier. Otherwise OK. Ramana
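For reference, the source-level builtins whose barriers this series strengthens are the legacy __sync GCC extensions; a minimal sketch of their documented semantics (illustration only, not the patch's code):

```cpp
#include <cassert>

// __sync_fetch_and_add: full barrier, returns the old value.
// __sync_bool_compare_and_swap: full barrier, returns whether it stored.
// __sync_lock_test_and_set: acquire barrier, atomic exchange (some targets
// only guarantee storing the constant 1, so 1 is used here).
// __sync_lock_release: release barrier, stores 0.
int sync_demo ()
{
  int v = 1;
  int old = __sync_fetch_and_add (&v, 2);             // old = 1, v = 3
  bool ok = __sync_bool_compare_and_swap (&v, 3, 5);  // stored, v = 5
  int prev = __sync_lock_test_and_set (&v, 1);        // prev = 5, v = 1
  __sync_lock_release (&v);                           // v = 0
  return old * 100 + (ok ? 10 : 0) + prev;            // 100 + 10 + 5
}
```

The series' point is that on ARMv8 the instruction sequences emitted for these must end with a barrier strong enough to match these documented full/acquire semantics.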
Re: [PATCH, C++-1z] Implement N4197 - Adding u8 character literals
On 06/27/2015 08:14 PM, Ed Smith-Rowland wrote: In c-ada-specs.c/print_ada_macros() I just write these as a char constant rather than spelling the token. We could do the latter. You'd see the u8 then I think. I couldn't find in the Ada test suite where this was exercised. If Ada folks don't respond, let's do the latter. OK with that change. Jason
Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
Hmm, Reducing the iterations to 1 step for float and 2 steps for double I got VE (miscompares) on following benchmarks 416.gamess 453.povray 454.calculix 459.GemsFDTD Benedikt , I have ICE for 444.namd with your patch, not sure if something wrong in my local tree. I could not reproduce the ICE for namd. Could you provide the spec config entry or the command line options, please. Which ICE did you get? Please provide the output. Thank you and best regards, Benedikt
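For context, the scheme under discussion computes an initial hardware estimate (FRSQRTE on AArch64) and refines it with Newton-Raphson steps (FRSQRTS). Each step roughly squares the relative error, i.e. doubles the number of correct bits, which is why one step for float and two for double can be too few on a micro-arch whose initial estimate carries few bits, producing the miscompares above. A scalar model of the refinement (illustration only, not the patch's code):

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for y ~ 1/sqrt(x):
//   y' = y * (1.5 - 0.5 * x * y * y)
// Starting from a crude estimate, each step roughly doubles the number of
// correct bits; too few steps leaves a result accurate enough for some
// workloads but not for others.
double rsqrt_refine (double x, double y, int steps)
{
  for (int i = 0; i < steps; ++i)
    y = y * (1.5 - 0.5 * x * y * y);
  return y;
}
```

Starting from the deliberately poor estimate y = 1.0 for x = 0.25 (true value 2.0), two steps still leave a visible error while six converge to full precision, mirroring why the step count matters.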
Re: [AArch64][2/2] Implement -fpic for -mcmodel=small
Christophe Lyon writes: On 27 June 2015 at 14:49, Jiong Wang jiong.w...@arm.com wrote: Andreas Schwab writes: Jiong Wang jiong.w...@arm.com writes: Andreas Schwab writes: spawn -ignore SIGHUP /opt/gcc/gcc-20150627/Build/gcc/xgcc -B/opt/gcc/gcc-20150627/Build/gcc/ -fno-diagnostics-show-caret -fdiagnostics-color=never -O0 -w -fpie -c -o pr65593.o /opt/gcc/gcc-20150627/gcc/testsuite/gcc.c-torture/compile/pr65593.c /tmp/cc0Pymaf.s: Assembler messages: /tmp/cc0Pymaf.s:11: Error: unknown relocation modifier at operand 2 -- `ldr x0,[x0,#:gotpage_lo15:a]' /tmp/cc0Pymaf.s:19: Error: unknown relocation modifier at operand 2 -- `ldr x2,[x0,#:gotpage_lo15:bar]' Andreas, The binutils patch has been upstreamed already. Please checkout the latest binutils code. That must work with the current binutils. I see, agree. Will adopt current ILP32 feature detection mechanism into -fpic. And some new TLS patches may need similiar check also. Thanks. Hi Jiong, I'm not sure I fully understand your answer. I believe you are going to change the testcase, right? Sorry about the trouble. I am going to add gcc_GAS_CHECK_FEATURE to detect whether those relocs needed by -fpic supported. Just like what's done for ILP32 detection. commit 891f2090d994d420cb56942e486f6bba3ede265f Author: clyon clyon@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Wed Dec 11 12:57:08 2013 + I'm seeing whole toolchain build failures (when building glibc) since you committed this, because of the new relocations. I understand your suggestion is to move to trunk binutils to have support for these new relocations. Any other things I could do to avoid changing the binutils version I use? Thanks Christophe. -- Regards, Jiong -- Regards, Jiong
Re: Move ABS detection from fold-const.c to match.pd
On Sun, Jun 28, 2015 at 8:34 PM, Marc Glisse marc.gli...@inria.fr wrote: (this message looks like it was lost in my draft folder...) On Tue, 26 May 2015, Richard Biener wrote: +(match zerop integer_zerop) +(match zerop real_zerop) Would it also include fixed_zerop? Probably, yes. The main issue is that I know next to nothing about fixed-point types, so I am always unsure how to handle them (when I don't forget them completely). For instance, in the recently added -A CMP -B, we could probably replace (if (FLOAT_TYPE_P (TREE_TYPE (@0)) || (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0)) && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))) with (if (FLOAT_TYPE_P (TREE_TYPE (@0)) || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))) Not sure if TYPE_OVERFLOW_UNDEFINED says sth sensible for fixed-point types given that there is no overflow for them but they saturate. As far as I see the check would even ICE without guarding it with ANY_INTEGRAL_TYPE_P. So it would be || NON_SAT_FIXED_POINT_TYPE_P (TREE_TYPE (@0)) where I am not sure whether overflow is undefined for non-saturating fixed-point types ... Note that with inlining implemented it would duplicate the pattern for each match variant thus in this case adding a tree.[ch] function zerop () might be better. Ah... I actually thought we might end up moving things like integer_zerop from tree.c to match.pd, especially since predicates are not declared 'static'... Ok, reverse gear. Yeah, I don't think match.pd is a good fit for them. Note that inlining does not seem necessary to implement more advanced predicates like negated_value_for_comparison in the parent message. Sure not necessary but one point of match-and-simplify was that the pattern matching is fast because it uses a decision tree. Once you introduce predicates that are in functions with their own decision tree you get back to testing all of them.
+ (simplify +(cnd (cmp @0 zerop) (convert?@2 @0) (negate@1 @2)) +(if (cmp == EQ_EXPR || cmp == UNEQ_EXPR) + @1) +(if (cmp == NE_EXPR || cmp == LTGT_EXPR) + (non_lvalue @2)) +(if (TYPE_SIGN (TREE_TYPE (@0)) == SIGNED /* implicit */ + && TYPE_SIGN (type) == SIGNED + && element_precision (type) <= element_precision (TREE_TYPE (@0))) + (if (cmp == GE_EXPR || cmp == GT_EXPR + || (!flag_trapping_math && (cmp == UNGE_EXPR || cmp == UNGT_EXPR))) + (abs @2)) + (if (cmp == LE_EXPR || cmp == LT_EXPR + || (!flag_trapping_math && (cmp == UNLE_EXPR || cmp == UNLT_EXPR))) + (negate (abs @2) + /* Now with the branches swapped. */ + (simplify +(cnd (cmp @0 zerop) (negate@1 (convert?@2 @0)) @2) not obvious from a quick look - but would you be able to remove the swapped branch variant if (cnd:c (cmp @0 zerop) X Y) would work by swapping X and Y? Hmm. How do I test if I am currently in the original or commuted version of the simplification? You can't. I could add a with block that defines truecmp as either cmp or invert_tree_comparison (cmp) and test that. Otherwise, I would need a test before each return as swapped versions don't return the same thing. It might make a slight difference on the handling of flag_trapping_math, but that handling already seems strange to me... (cnd:c (cmp @0 zerop) (convert?@2 @0) (negate@1 @2)) would get you (cnd (cmp @0 zerop) (convert?@2 @0) (negate@1 @2)) and (cnd (cmp @0 zerop) (negate@1 @2) (convert?@2 @0)) in the patterns it almost literally looked like what you did manually. The fold-const.c code doesn't seem to handle as many variants (esp. the swapping?), The fold-const.c function is called twice, once on regular operands, once with inverted comparison and swapped operands. I really don't think I am handling more cases (except maybe the silly a?a:0 is extended to unsigned). Ok. so maybe you can add a testcase that exercises some of the above on GIMPLE? So mostly the VEC_COND_EXPR version? We don't seem to have that much COND_EXPR left in gimple. Ah, true.
Yes, the vector variant then. Thanks, Richard. -- Marc Glisse
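The source shape these patterns target is the classic conditional negate; on scalars it typically only survives as a COND_EXPR before gimplification, hence the suggestion to test the VEC_COND_EXPR form. A scalar illustration of the basic A < 0 ? -A : A -> ABS fold:

```cpp
#include <cassert>

// The COND_EXPR shape (cnd (lt @0 zero) (negate @0) @0) that the discussed
// match.pd pattern rewrites to (abs @0) for signed types (leaving aside
// the INT_MIN overflow caveat the TYPE_OVERFLOW_UNDEFINED checks guard).
int cond_abs (int a)
{
  return a < 0 ? -a : a;
}
```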
Re: [patch] Fix PR middle-end/66633
Don't you need to handle convert_nonlocal_omp_clauses similarly (need_chain in that case)? At least looking at your r211308 commit, for !optimize you force not just the frame, but also chain. You're very likely right, although I didn't manage to write a Fortran testcase with my limited knowledge of the language (I get a Bad statement code ICE). In fact the conditions should also mimic those of the aforementioned commit. I initially didn't realize it, because I didn't quite grasp how the GOMP stuff was interacting with the nesting stuff, but I guess that the code ought not to do anything if there aren't pre-existing nested functions in the sources. Tested on x86_64-suse-linux, OK for the mainline and 5 branch? 2015-06-29 Eric Botcazou ebotca...@adacore.com PR middle-end/66633 * tree-nested.c (convert_nonlocal_omp_clauses): Initialize need_chain to true if the function is nested and if not optimizing. (convert_local_omp_clauses): Initialize need_frame to true if the function contains nested functions and if not optimizing. 2015-06-29 Eric Botcazou ebotca...@adacore.com * gfortran.dg/gomp/pr66633.f90: New test. -- Eric Botcazou Index: tree-nested.c === --- tree-nested.c (revision 225111) +++ tree-nested.c (working copy) @@ -1069,7 +1069,9 @@ static bool convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi) { struct nesting_info *const info = (struct nesting_info *) wi->info; - bool need_chain = false, need_stmts = false; + /* If not optimizing, we will force the creation of the CHAIN object in + convert_all_function_calls, so we need to take it into account here.
*/ + bool need_chain = info->outer && !optimize, need_stmts = false; tree clause, decl; int dummy; bitmap new_suppress; @@ -1697,7 +1699,9 @@ static bool convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi) { struct nesting_info *const info = (struct nesting_info *) wi->info; - bool need_frame = false, need_stmts = false; + /* If not optimizing, we will force the creation of the FRAME object in + convert_all_function_calls, so we need to take it into account here. */ + bool need_frame = info->inner && !optimize, need_stmts = false; tree clause, decl; int dummy; bitmap new_suppress; ! PR middle-end/66633 ! Testcase by Andrew Benson abenso...@gmail.com ! { dg-do compile } ! { dg-options "-O0 -fopenmp" } module spls contains function spl() !$omp parallel write (0,*) igrt(fli) !$omp end parallel contains double precision function fli() end function fli end function spl end module spls
Re: [PATCH, stage1] Make parloops gate more strict
On Fri, Jun 19, 2015 at 9:47 AM, Tom de Vries tom_devr...@mentor.com wrote: On 16/06/15 13:18, Richard Biener wrote: On Mon, Jun 15, 2015 at 12:38 AM, Tom de Vries tom_devr...@mentor.com wrote: On 14/06/15 23:49, Bernhard Reutner-Fischer wrote: On June 14, 2015 10:55:59 AM GMT+02:00, Tom de Vries tom_devr...@mentor.com wrote: On 13/03/15 11:36, Richard Biener wrote: On Fri, Mar 13, 2015 at 11:32 AM, Tom de Vries tom_devr...@mentor.com wrote: Hi, this patch moves a bunch of early-out tests from the parloops pass to the gate function. The only effect is for functions that we don't consider at all for parallelization in the parloops pass. We no longer dump those in the parloops dump file. Bootstrapped and reg-tested on x86_64. OK for stage1 trunk? Does it work with -fdump-passes? Hi, with -fdump-passes now fixed to work on a dummy function (r222129), I'm resubmitting this patch, split up in two patches. The first patch moves two trivial early-exit tests to the parloops gate. The second patch moves the number_of_loops test to the parloops gate, and adds a dummy loops structure in the dummy function for -fdump-passes. Bootstrapped and reg-tested on x86_64. Both patches OK for trunk? diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 02f44eb..a1659a3 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -2535,10 +2535,6 @@ parallelize_loops (void) source_location loop_loc; /* Do not parallelize loops in the functions created by parallelization. */ - if (parallelized_function_p (cfun->decl)) -return false; - if (cfun->has_nonlocal_label) -return false; gcc_obstack_init (parloop_obstack); reduction_info_table_type reduction_list (10); Now stray comment? Stopped reading here. Fixed in updated patch. Also: - made sure cfun is not used in the gate function - added missing update of function header comment for init_loops_structure - improved comment in pass_manager::dump_passes. OK for trunk?
For -fdump-passes this doesn't make much sense but I suppose you are after not getting the untouched functions dumped. Note that generally this is undesired for debugging (in my opinion) as diffs from the pass dump before parloops to parloops will contain a lot of spurious changes (and if the preceeding pass is changed similarly the function we run parloops on is maybe not even dumped there!). So I question the general idea of the change. I suppose there are two competing principles: 1. ensure the state before the pass is in the previous dump 2. only dump if changed Indeed in general we use the first principle, although it doesn't hold for f.i. pass_tree_loop and a function without loops. The problems I'm trying to solve with this patch are the following: - even if you're compiling a single function, part of the function occurs twice (once in the original, once in the split-off function), making formulating scan-tree-dumps more complicated for parloops. - the intuitive thing is to look in the parloops tree-dump and look at the original function and the split-off function, and think that the split-off function is the immediate result of the split-off that happened in the pass, which is incorrect (since it jumps back in the pass list and traverses other passes before arriving back at parloops). Adding something like a flag -fno-dump-new-function would solve the first problem, but not the second. Yeah, any pass introducing new functions has this issue. We could also dump new functions in a different dump file src.c.new.123t.pass, and this would solve both problems. But it will cause the 'where did that function go' confusion, certainly initially. Any other ideas? No, not really. But it sounds like a very special thing you try to solve. You could also dump the split off function twice - once next to the function it was split off and once when it arrives at parloops again... Richard. Thanks, - Tom
Re: [PATCH 1/2] Allow REG_EQUAL for ZERO_EXTRACT
On Jun 28, 2015, at 2:28 PM, Kugan kugan.vivekanandara...@linaro.org wrote: This patch allows setting REG_EQUAL for ZERO_EXTRACT and handles it in cse (where the src for the ZERO_EXTRACT needs to be calculated). Thanks, Kugan From 75e746e559ffd21b25542b3db627e3b318118569 Mon Sep 17 00:00:00 2001 From: Kugan Vivekanandarajah kugan.vivekanandara...@linaro.org Date: Fri, 26 Jun 2015 17:12:07 +1000 Subject: [PATCH 1/2] Allow adding REG_EQUAL for ZERO_EXTRACT --- gcc/ChangeLog | 6 ++ gcc/cse.c | 41 - gcc/emit-rtl.c | 3 ++- 3 files changed, 48 insertions(+), 2 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 080aa39..d4a73d6 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,9 @@ +2015-06-26 Kugan Vivekanandarajah kug...@linaro.org + + * cse.c (cse_insn): Calculate src_eqv for ZERO_EXTRACT. + * emit-rtl.c (set_for_reg_notes): Allow ZERO_EXTRACT to set + REG_EQUAL note. + 2015-06-25 H.J. Lu hongjiu...@intel.com * gentarget-def.c (def_target_insn): Cast return of strtol to diff --git a/gcc/cse.c b/gcc/cse.c index 100c9c8..8add651 100644 --- a/gcc/cse.c +++ b/gcc/cse.c @@ -4531,8 +4531,47 @@ cse_insn (rtx_insn *insn) if (n_sets == 1 && REG_NOTES (insn) != 0 && (tem = find_reg_note (insn, REG_EQUAL, NULL_RTX)) != 0 && (! rtx_equal_p (XEXP (tem, 0), SET_SRC (sets[0].rtl)) + || GET_CODE (SET_DEST (sets[0].rtl)) == ZERO_EXTRACT || GET_CODE (SET_DEST (sets[0].rtl)) == STRICT_LOW_PART)) -src_eqv = copy_rtx (XEXP (tem, 0)); +{ + src_eqv = copy_rtx (XEXP (tem, 0)); + + /* If DEST is of the form ZERO_EXTRACT, as in: + (set (zero_extract:SI (reg:SI 119) + (const_int 16 [0x10]) + (const_int 16 [0x10])) + (const_int 51154 [0xc7d2])) + the REG_EQUAL note will specify the value of register (reg:SI 119) at this + point. Note that this is different from SRC_EQV. We can however + calculate SRC_EQV with the position and width of ZERO_EXTRACT. 
*/ + if (GET_CODE (SET_DEST (sets[0].rtl)) == ZERO_EXTRACT) Consider changing if (something && (!rtx_equal_p || ZERO_EXTRACT || STRICT_LOW_PART)) to if (something && !rtx_equal_p) { if (ZERO_EXTRACT) { } else if (STRICT_LOW_PART) { } } Otherwise looks good to me, but you still need another approval. + { + if (CONST_INT_P (src_eqv) + && CONST_INT_P (XEXP (SET_DEST (sets[0].rtl), 1)) + && CONST_INT_P (XEXP (SET_DEST (sets[0].rtl), 2))) + { + rtx dest_reg = XEXP (SET_DEST (sets[0].rtl), 0); + rtx width = XEXP (SET_DEST (sets[0].rtl), 1); + rtx pos = XEXP (SET_DEST (sets[0].rtl), 2); + HOST_WIDE_INT val = INTVAL (src_eqv); + HOST_WIDE_INT mask; + unsigned int shift; + if (BITS_BIG_ENDIAN) + shift = GET_MODE_PRECISION (GET_MODE (dest_reg)) + - INTVAL (pos) - INTVAL (width); + else + shift = INTVAL (pos); + if (INTVAL (width) == HOST_BITS_PER_WIDE_INT) + mask = ~(HOST_WIDE_INT) 0; + else + mask = ((HOST_WIDE_INT) 1 << INTVAL (width)) - 1; + val = (val >> shift) & mask; + src_eqv = GEN_INT (val); + } + else + src_eqv = 0; + } +} /* Set sets[i].src_elt to the class each source belongs to. Detect assignments from or to volatile things diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c index e7f7eab..cb891b1 100644 --- a/gcc/emit-rtl.c +++ b/gcc/emit-rtl.c @@ -5228,7 +5228,8 @@ set_for_reg_notes (rtx insn) reg = SET_DEST (pat); /* Notes apply to the contents of a STRICT_LOW_PART. */ - if (GET_CODE (reg) == STRICT_LOW_PART) + if (GET_CODE (reg) == STRICT_LOW_PART + || GET_CODE (reg) == ZERO_EXTRACT) reg = XEXP (reg, 0); /* Check that we have a register. */ -- 1.9.1 -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
On Mon, Jun 29, 2015 at 10:18:23AM +0100, Kumar, Venkataramanan wrote: -Original Message- From: Dr. Philipp Tomsich [mailto:philipp.toms...@theobroma-systems.com] Sent: Monday, June 29, 2015 2:17 PM To: Kumar, Venkataramanan Cc: pins...@gmail.com; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Kumar, This does not come unexpected, as the initial estimation and each iteration will add an architecturally-defined number of bits of precision (ARMv8 guarantuees only a minimum number of bits provided per operation… the exact number is specific to each micro-arch, though). Depending on your architecture and on the required number of precise bits by any given benchmark, one may see miscompares. True. I would be very uncomfortable with this approach. From Richard Biener's post in the thread Michael Matz linked earlier in the thread: It would follow existing practice of things we allow in -funsafe-math-optimizations. Existing practice in that we want to allow -ffast-math use with common benchmarks we care about. https://gcc.gnu.org/ml/gcc-patches/2009-11/msg00100.html With the solution you seem to be converging on (2-steps for some microarchitectures, 3 for others), a binary generated for one micro-arch may drop below a minimum guarantee of precision when run on another. This seems to go against the spirit of the practice above. I would only support adding this optimization to -Ofast if we could keep to architectural guarantees of precision in the generated code (i.e. 3-steps everywhere). I don't object to adding a -mlow-precision-recip-sqrt style option, which would be off by default, would enable the 2-step mode, and would need to be explicitly enabled (i.e. not implied by -mcpu=foo) but I don't see what this buys you beyond the Gromacs boost (and even there you would be creating an Invalid Run as optimization flags must be applied across all workloads). 
For the 3-step optimization, it is clear to me that for generic tuning we don't want this to be enabled by default experimental results and advice in this thread argues against it for thunderx and cortex-a57 targets. However, enabling it based on the CPU tuning selected seems fine to me. Thanks, James Do you know the exact number of bits that the initial estimate and the subsequent refinement steps add for your micro-arch? I am not sure on this. I need to check for cortex-a57 case. But best thing for cortex-a57 case is not to use this optimization by default. If we get an -mrecip-sqrt command line , then we can add it for gromacs kind application to get gains. Any thoughts on this? Regards, Venkat. Thanks, Philipp. On 29 Jun 2015, at 10:17, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote: Hmm, Reducing the iterations to 1 step for float and 2 steps for double I got VE (miscompares) on following benchmarks 416.gamess 453.povray 454.calculix 459.GemsFDTD Benedikt , I have ICE for 444.namd with your patch, not sure if something wrong in my local tree. Regards, Venkat. -Original Message- From: pins...@gmail.com [mailto:pins...@gmail.com] Sent: Sunday, June 28, 2015 8:35 PM To: Kumar, Venkataramanan Cc: Dr. Philipp Tomsich; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math On Jun 25, 2015, at 9:44 AM, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote: I got around ~12% gain with -Ofast -mcpu=cortex-a57. I get around 11/12% on thunderX with the patch and the decreasing the iterations change (1/2) compared to without the patch. Thanks, Andrew Regards, Venkat. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Dr. 
Philipp Tomsich Sent: Thursday, June 25, 2015 9:13 PM To: Kumar, Venkataramanan Cc: Benedikt Huber; pins...@gmail.com; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Kumar, what is the relative gain that you see on Cortex-A57? Thanks, Philipp. On 25 Jun 2015, at 17:35, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote: Changing to 1 step for float and 2 steps for double gives better gains now for gromacs on cortex-a57. Regards, Venkat. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Benedikt Huber Sent: Thursday, June 25, 2015 4:09 PM To: pins...@gmail.com Cc: gcc-patches@gcc.gnu.org; philipp.tomsich@theobroma- systems.com Subject: Re: [PATCH] [aarch64] Implemented reciprocal square
Re: [PATCH 2/2] Set REG_EQUAL
On Jun 28, 2015, at 2:30 PM, Kugan kugan.vivekanandara...@linaro.org wrote: This patch sets REG_EQUAL when emitting arm_emit_movpair. Thanks, Kugan gcc/testsuite/ChangeLog: 2015-06-26 Kugan Vivekanandarajah kug...@linaro.org * gcc.target/arm/reg_equal_test.c: New test. gcc/ChangeLog: 2015-06-26 Kugan Vivekanandarajah kug...@linaro.org * config/arm/arm.c (arm_emit_movpair): Add REG_EQUAL notes to instruction. 0002-Add-REG_EQUAL-note-for-arm_emit_movpair.patch LGTM, but you need ARM maintainer's approval. -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
James, On 29 Jun 2015, at 13:36, James Greenhalgh james.greenha...@arm.com wrote: On Mon, Jun 29, 2015 at 10:18:23AM +0100, Kumar, Venkataramanan wrote: -Original Message- From: Dr. Philipp Tomsich [mailto:philipp.toms...@theobroma-systems.com] Sent: Monday, June 29, 2015 2:17 PM To: Kumar, Venkataramanan Cc: pins...@gmail.com; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Kumar, This does not come unexpected, as the initial estimation and each iteration will add an architecturally-defined number of bits of precision (ARMv8 guarantuees only a minimum number of bits provided per operation… the exact number is specific to each micro-arch, though). Depending on your architecture and on the required number of precise bits by any given benchmark, one may see miscompares. True. I would be very uncomfortable with this approach. Same here. The default must be safe. Always. Unlike other architectures, we don’t have a problem with making the proper defaults for “safety”, as the ARMv8 ISA guarantees a minimum number of precise bits per iteration. From Richard Biener's post in the thread Michael Matz linked earlier in the thread: It would follow existing practice of things we allow in -funsafe-math-optimizations. Existing practice in that we want to allow -ffast-math use with common benchmarks we care about. https://gcc.gnu.org/ml/gcc-patches/2009-11/msg00100.html With the solution you seem to be converging on (2-steps for some microarchitectures, 3 for others), a binary generated for one micro-arch may drop below a minimum guarantee of precision when run on another. This seems to go against the spirit of the practice above. I would only support adding this optimization to -Ofast if we could keep to architectural guarantees of precision in the generated code (i.e. 3-steps everywhere). 
I don't object to adding a -mlow-precision-recip-sqrt style option, which would be off by default, would enable the 2-step mode, and would need to be explicitly enabled (i.e. not implied by -mcpu=foo) but I don't see what this buys you beyond the Gromacs boost (and even there you would be creating an Invalid Run as optimization flags must be applied across all workloads). Any flag that reduces precision (and thus breaks IEEE floating-point semantics) needs to be gated with an “unsafe” flag (i.e. one that is never on by default). As a consequence, the “peak”-tuning for SPEC will turn this on… but barely anyone else would. For the 3-step optimization, it is clear to me that for generic tuning we don't want this to be enabled by default experimental results and advice in this thread argues against it for thunderx and cortex-a57 targets. However, enabling it based on the CPU tuning selected seems fine to me. I do not agree on this one, as I would like to see the safe form (i.e. 3 and 5 iterations respectively) to become the default. Most “server-type” chips should not see a performance regression, while it will be easier to optimise for this in hardware than for a (potentially microcoded) sqrt-instruction (and subsequent, dependent divide). I have not heard anyone claim a performance regression (either on thunderx or on cortex-a57), but merely heard a “no speed-up”. So I am strongly in favor of defaulting to the ‘safe’ number of iterations, even when compiling for a generic target. Best, Philipp.
[PATCH][7/n] Remove GENERIC stmt combining from SCCVN
This moves a few more patterns from fold-const.c to match.pd, also covering two cases where fold basically recurses (calling fold_unary or using negate_expr_p/negate_expr). For these cases we have to implement a subset of all possible cases (all fold cases basically searching for some stuff arbitrarily deep in an expression tree would need to be converted to a propagator-like pass rather than too many patterns). It also moves (x y) | x - x before (x | CST1) CST2 - (x CST2) | (CST1 CST2) so it has the chance to match (x | 1) 1 which with the first pattern simplifies without needing intermediate stmts (which I don't allow for SCCVN for efficiency reasons). As for other generator inputs what comes earlier is matched earlier. Bootstrap and regtest running on x86_64-unknown-linux-gnu. I think next I'll merge trunk into match-and-simplify and see what patterns I have implemented there and will either kill them off if they are broken or move them. Richard. 2015-06-29 Richard Biener rguent...@suse.de * fold-const.c (fold_unary_loc): Move abs(abs(x)) - abs(x), ~ (-A) to A - 1, ~ (A - 1) or ~ (A + -1) to -A and some cases of ~(X ^ Y) to ~X ^ Y or X ^ ~Y if ~X or ~Y simplify to ... * match.pd: ... here. Add a few cases of A - B - A + (-B) when B easily negates. Move (x y) | x - x and friends before (x | CST1) CST2 - (x CST2) | (CST1 CST2). Index: gcc/fold-const.c === *** gcc/fold-const.c(revision 225115) --- gcc/fold-const.c(working copy) *** fold_unary_loc (location_t loc, enum tre *** 8131,8139 TREE_TYPE (targ0), targ0)); } - /* ABS_EXPRABS_EXPRx = ABS_EXPRx even if flag_wrapv is on. */ - else if (TREE_CODE (arg0) == ABS_EXPR) - return arg0; /* Strip sign ops from argument. */ if (TREE_CODE (type) == REAL_TYPE) --- 8131,8136 *** fold_unary_loc (location_t loc, enum tre *** 8161,8193 return NULL_TREE; case BIT_NOT_EXPR: - /* Convert ~ (-A) to A - 1. 
*/ - if (INTEGRAL_TYPE_P (type) TREE_CODE (arg0) == NEGATE_EXPR) - return fold_build2_loc (loc, MINUS_EXPR, type, - fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0)), - build_int_cst (type, 1)); - /* Convert ~ (A - 1) or ~ (A + -1) to -A. */ - else if (INTEGRAL_TYPE_P (type) - ((TREE_CODE (arg0) == MINUS_EXPR -integer_onep (TREE_OPERAND (arg0, 1))) - || (TREE_CODE (arg0) == PLUS_EXPR - integer_all_onesp (TREE_OPERAND (arg0, 1) - { - /* Perform the negation in ARG0's type and only then convert -to TYPE as to avoid introducing undefined behavior. */ - tree t = fold_build1_loc (loc, NEGATE_EXPR, - TREE_TYPE (TREE_OPERAND (arg0, 0)), - TREE_OPERAND (arg0, 0)); - return fold_convert_loc (loc, type, t); - } /* Convert ~(X ^ Y) to ~X ^ Y or X ^ ~Y if ~X or ~Y simplify. */ ! else if (TREE_CODE (arg0) == BIT_XOR_EXPR ! (tem = fold_unary_loc (loc, BIT_NOT_EXPR, type, !fold_convert_loc (loc, type, ! TREE_OPERAND (arg0, 0) return fold_build2_loc (loc, BIT_XOR_EXPR, type, tem, ! fold_convert_loc (loc, type, ! TREE_OPERAND (arg0, 1))); else if (TREE_CODE (arg0) == BIT_XOR_EXPR (tem = fold_unary_loc (loc, BIT_NOT_EXPR, type, fold_convert_loc (loc, type, --- 8158,8171 return NULL_TREE; case BIT_NOT_EXPR: /* Convert ~(X ^ Y) to ~X ^ Y or X ^ ~Y if ~X or ~Y simplify. */ ! if (TREE_CODE (arg0) == BIT_XOR_EXPR ! (tem = fold_unary_loc (loc, BIT_NOT_EXPR, type, ! fold_convert_loc (loc, type, ! TREE_OPERAND (arg0, 0) return fold_build2_loc (loc, BIT_XOR_EXPR, type, tem, ! fold_convert_loc (loc, type, ! TREE_OPERAND (arg0, 1))); else if (TREE_CODE (arg0) == BIT_XOR_EXPR (tem = fold_unary_loc (loc, BIT_NOT_EXPR, type, fold_convert_loc (loc, type, Index: gcc/match.pd === *** gcc/match.pd(revision 225115) --- gcc/match.pd(working copy) *** (define_operator_list swapped_tcc_compar *** 378,389 --- 378,409 (bit_and @0 @1)) (simplify
Re: [gomp4][PATCH] Handle casts in bound in try_transform_to_exit_first_loop_alt
On 22/06/15 16:36, Richard Biener wrote: On Thu, 18 Jun 2015, Tom de Vries wrote: On 13/06/15 16:24, Tom de Vries wrote: Hi, this patch allows try_transform_to_exit_first_loop_alt to succeed when handling cases where the expression representing the number of iterations contains a cast. Currently, transform_to_exit_first_loop_alt testcase gfortran/parloops-exit-first-loop-alt.f95 will fail. The nit is _19, which is defined as follows: ... _20 = _6 + -1; _19 = (unsigned int) _20; ... And transform_to_exit_first_loop_alt currently only handles nits with defining stmt 'nit = x - 1', for which it finds alt_bound 'x'. The patch: - uses try_get_loop_niter to get nit as a nested tree expression '(unsigned int) (_6 + -1)' - strips the outer nops (assuming no change in value) - uses '(unsigned int)_6' as the alt_bound, and - gimplifies the expression. Bootstrapped and reg-tested on x86_64. Cleaned up whitespace in testcases. Committed to gomp-4_0-branch as atttached. OK for trunk? I assume the above also handles the reverse, (int) (_6 + -1)? AFAIU, niter.niter is always unsigned. In this case what happens if _6 == INT_MAX + 1? nit is INT_MAX but (int) _6 is INT_MIN. Likewise what happens if _6 + -1 under-/overflows? While playing around with underflow/overflow cases, I ran into PR66652, which reproduces on trunk. I've submitted for trunk: - https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02084.html , which fixes PR66652, and should handle all overflow concerns. - https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02085.html , which deals with the case that nit is not defined by assignment nit = n - 1, in other words, with the case that this patch is trying to address. I'll revert this patch on gomp-4_0-branch, and commit these patches instead (in addition to https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01768.html, which fixes low iteration count reduction loops handled by transform_to_exit_first_loop_alt). Thanks, - Tom
[committed] Flags outputs for asms
Sorry for the delay, but here's the third and final version. This includes the requested cpp symbol and updated documentation. r~ * config/i386/constraints.md (Bf): New constraint. * config/i386/i386-c.c (ix86_target_macros): Define __GCC_ASM_FLAG_OUTPUTS__. * config/i386/i386.c (ix86_md_asm_adjust): Handle =@cc* constraints as flags outputs. * doc/extend.texi (FlagOutputOperands): Document them. * gcc.target/i386/asm-flag-1.c: New. * gcc.target/i386/asm-flag-2.c: New. * gcc.target/i386/asm-flag-3.c: New. * gcc.target/i386/asm-flag-4.c: New. * gcc.target/i386/asm-flag-5.c: New. diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md index c718bc1..2861d8d 100644 --- a/gcc/config/i386/constraints.md +++ b/gcc/config/i386/constraints.md @@ -146,11 +146,16 @@ @internal Lower SSE register when avoiding REX prefix and all SSE registers otherwise.) ;; We use the B prefix to denote any number of internal operands: +;; f FLAGS_REG ;; g GOT memory operand. ;; s Sibcall memory operand, not valid for TARGET_X32 ;; w Call memory operand, not valid for TARGET_X32 ;; z Constant call address operand. +(define_constraint Bf + @internal Flags register operand. + (match_operand 0 flags_reg_operand)) + (define_constraint Bg @internal GOT memory operand. 
(match_operand 0 GOT_memory_operand)) diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c index 28444f9..d506345 100644 --- a/gcc/config/i386/i386-c.c +++ b/gcc/config/i386/i386-c.c @@ -560,6 +560,8 @@ ix86_target_macros (void) cpp_define_formatted (parse_in, __ATOMIC_HLE_ACQUIRE=%d, IX86_HLE_ACQUIRE); cpp_define_formatted (parse_in, __ATOMIC_HLE_RELEASE=%d, IX86_HLE_RELEASE); + cpp_define (parse_in, __GCC_ASM_FLAG_OUTPUTS__); + ix86_target_macros_internal (ix86_isa_flags, ix86_arch, ix86_tune, diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 9598600..144c1a6 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -4,21 +4,144 @@ ix86_c_mode_for_suffix (char suffix) /* Worker function for TARGET_MD_ASM_ADJUST. - We do this in the new i386 backend to maintain source compatibility + We implement asm flag outputs, and maintain source compatibility with the old cc0-based compiler. */ static rtx_insn * -ix86_md_asm_adjust (vecrtx /*outputs*/, vecrtx /*inputs*/, - vecconst char * /*constraints*/, +ix86_md_asm_adjust (vecrtx outputs, vecrtx /*inputs*/, + vecconst char * constraints, vecrtx clobbers, HARD_REG_SET clobbered_regs) { - clobbers.safe_push (gen_rtx_REG (CCmode, FLAGS_REG)); clobbers.safe_push (gen_rtx_REG (CCFPmode, FPSR_REG)); - - SET_HARD_REG_BIT (clobbered_regs, FLAGS_REG); SET_HARD_REG_BIT (clobbered_regs, FPSR_REG); - return NULL; + bool saw_asm_flag = false; + + start_sequence (); + for (unsigned i = 0, n = outputs.length (); i n; ++i) +{ + const char *con = constraints[i]; + if (strncmp (con, =@cc, 4) != 0) + continue; + con += 4; + if (strchr (con, ',') != NULL) + { + error (alternatives not allowed in asm flag output); + continue; + } + + bool invert = false; + if (con[0] == 'n') + invert = true, con++; + + machine_mode mode = CCmode; + rtx_code code = UNKNOWN; + + switch (con[0]) + { + case 'a': + if (con[1] == 0) + mode = CCAmode, code = EQ; + else if (con[1] == 'e' con[2] == 0) + mode = CCCmode, code = EQ; 
+ break; + case 'b': + if (con[1] == 0) + mode = CCCmode, code = EQ; + else if (con[1] == 'e' con[2] == 0) + mode = CCAmode, code = NE; + break; + case 'c': + if (con[1] == 0) + mode = CCCmode, code = EQ; + break; + case 'e': + if (con[1] == 0) + mode = CCZmode, code = EQ; + break; + case 'g': + if (con[1] == 0) + mode = CCGCmode, code = GT; + else if (con[1] == 'e' con[2] == 0) + mode = CCGCmode, code = GE; + break; + case 'l': + if (con[1] == 0) + mode = CCGCmode, code = LT; + else if (con[1] == 'e' con[2] == 0) + mode = CCGCmode, code = LE; + break; + case 'o': + if (con[1] == 0) + mode = CCOmode, code = EQ; + break; + case 'p': + if (con[1] == 0) + mode = CCPmode, code = EQ; + break; + case 's': + if (con[1] == 0) + mode = CCSmode, code = EQ; + break; + case 'z': + if (con[1] == 0) + mode = CCZmode, code = EQ; + break; + } + if (code == UNKNOWN) + { + error (unknown asm flag output %qs,
Four jit backports to gcc 5 branch
I've gone over the changes to the gcc/jit and gcc/testsuite/jit.dg directories in trunk since gcc 5 and backported the following 4 changes from trunk to the gcc-5-branch: gcc-5-branch's r225123: * trunk's r222863 (8120405bda9bf28b004e09b23e74eda57ae1, addition of test-benchmark.c): https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00500.html gcc-5-branch's r225125: * trunk's r224531 (8154a3514d5fc8067a6928531d5f61cd768bd62c along with trunk's r224535 (1828cd755cf5e4a34d5638f543a059f3562ad957, PR jit/66539: Add parentheses as needed to gcc_jit_object_get_debug_string): https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01158.html gcc-5-branch's r225127: * trunk's r224536 (3052eeefc4607a7147fdc55af6d86845890eb281, jit: Add a test for compound assignment): https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01164.html gcc-5-branch's r225129: * trunk's r224565 (6689f47f53079d76bbb051d3b5da9018c2e0161a, jit: Add missing type-checking to gcc_jit_{l|r}value_access_field) https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01211.html Note to self: I believe that covers everything on master up to ec2e0095a3a5988b03a2706b5ffe0e807b238ba8 (on 2015-06-25) that might be backportable. Dave
[PATCH] Insert new bound in try_transform_to_exit_first_loop_alt
Hi, this patch allows try_transform_to_exit_first_loop_alt to handle the case that the new loop bound nit + 1 is not available as ssa-name n in the assignment nit = n - 1, by inserting the new loop bound. Bootstrapped and reg-tested on x86_64. OK for trunk? Thanks, - Tom Insert new bound in try_transform_to_exit_first_loop_alt 2015-06-29 Tom de Vries t...@codesourcery.com * tree-parloops.c (try_transform_to_exit_first_loop_alt): If not found, insert nit + 1 bound. * testsuite/libgomp.fortran/parloops-exit-first-loop-alt-2.f95: New test. * testsuite/libgomp.fortran/parloops-exit-first-loop-alt.f95: New test. * gfortran.dg/parloops-exit-first-loop-alt-2.f95: New test. * gfortran.dg/parloops-exit-first-loop-alt.f95: New test. --- .../gfortran.dg/parloops-exit-first-loop-alt-2.f95 | 24 + .../gfortran.dg/parloops-exit-first-loop-alt.f95 | 25 + gcc/tree-parloops.c| 18 +- .../parloops-exit-first-loop-alt-2.f95 | 40 + .../parloops-exit-first-loop-alt.f95 | 41 ++ 5 files changed, 147 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt-2.f95 create mode 100644 gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt.f95 create mode 100644 libgomp/testsuite/libgomp.fortran/parloops-exit-first-loop-alt-2.f95 create mode 100644 libgomp/testsuite/libgomp.fortran/parloops-exit-first-loop-alt.f95 diff --git a/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt-2.f95 b/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt-2.f95 new file mode 100644 index 000..f26a6e3 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt-2.f95 @@ -0,0 +1,24 @@ +! { dg-additional-options -O2 } +! { dg-require-effective-target pthread } +! { dg-additional-options -ftree-parallelize-loops=2 } +! { dg-additional-options -fdump-tree-parloops } + +! Constant bound, vector addition. 
+ +subroutine foo () + integer, parameter :: n = 1000 + integer, dimension (0:n-1) :: a, b, c + common a, b, c + integer :: ii + + do ii = 0, n - 1 + c(ii) = a(ii) + b(ii) + 25 + end do +end subroutine foo + +! Three times plus 25: +! - once in f._loopfn.0 +! - once in the parallel +! - once in the low iteration count loop +! Crucially, none for a peeled off last iteration following the parallel. +! { dg-final { scan-tree-dump-times (?n) \\+ 25; 3 parloops } } diff --git a/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt.f95 b/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt.f95 new file mode 100644 index 000..6dc8a38 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt.f95 @@ -0,0 +1,25 @@ +! { dg-additional-options -O2 } +! { dg-require-effective-target pthread } +! { dg-additional-options -ftree-parallelize-loops=2 } +! { dg-additional-options -fdump-tree-parloops } + +! Variable bound, vector addition. + +subroutine foo (nr) + integer, intent(in) :: nr + integer, parameter :: n = 1000 + integer, dimension (0:n-1) :: a, b, c + common a, b, c + integer :: ii + + do ii = 0, nr - 1 + c(ii) = a(ii) + b(ii) + 25 + end do +end subroutine foo + +! Three times plus 25: +! - once in f._loopfn.0 +! - once in the parallel +! - once in the low iteration count loop +! Crucially, none for a peeled off last iteration following the parallel. +! { dg-final { scan-tree-dump-times (?n) \\+ 25; 3 parloops } } diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 32d059a..7a07c7d 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -1840,8 +1840,24 @@ try_transform_to_exit_first_loop_alt (struct loop *loop, alt_bound = op1; } + /* If not found, insert nit + 1. 
*/ if (alt_bound == NULL_TREE) -return false; +{ + alt_bound = fold_build2 (PLUS_EXPR, nit_type, nit, + build_int_cst_type (nit_type, 1)); + + gimple_seq pre = NULL, post = NULL; + push_gimplify_context (true); + gimplify_expr (alt_bound, pre, post, is_gimple_reg, + fb_rvalue); + pop_gimplify_context (NULL); + + gimple_seq_add_seq (pre, post); + + gimple_stmt_iterator gsi + = gsi_last_bb (loop_preheader_edge (loop)-src); + gsi_insert_seq_after (gsi, pre, GSI_CONTINUE_LINKING); +} transform_to_exit_first_loop_alt (loop, reduction_list, alt_bound); return true; diff --git a/libgomp/testsuite/libgomp.fortran/parloops-exit-first-loop-alt-2.f95 b/libgomp/testsuite/libgomp.fortran/parloops-exit-first-loop-alt-2.f95 new file mode 100644 index 000..56add65 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/parloops-exit-first-loop-alt-2.f95 @@ -0,0 +1,40 @@ +! { dg-do run } +! { dg-additional-options -O2 } +! { dg-additional-options -ftree-parallelize-loops=2 } + +! Constant bound, vector addition. + +subroutine foo () + integer, parameter :: n = 1000 + integer, dimension (0:n-1) :: a, b, c + common a, b, c + integer :: ii + +
Re: [Patch Vax] zero/sign extend patterns need to be SUBREG aware
On Fri, Jun 19, 2015 at 05:30:21PM +0100, Matt Thomas wrote: On Jun 19, 2015, at 8:51 AM, Jan-Benedict Glaw jbg...@lug-owl.de wrote: Hi James, On Tue, 2015-06-16 10:58:48 +0100, James Greenhalgh james.greenha...@arm.com wrote: The testcase in this patch, from libgcc, causes an ICE in the Vax build. [...] As far as I know, reload is going to get rid of these SUBREGs for us, so we don't need to modify the output statement. Tested that this restores the VAX build and that the code-gen is sensible for the testcase. OK? Looks good to me, but Matt has to ACK this fix. I so ACK. Now that this has had a few days sitting on trunk without seeing any complaints, would you mind if I backported it to the GCC 5 branch? Thanks, James
Re: [Patch Vax] zero/sign extend patterns need to be SUBREG aware
On Jun 29, 2015, at 8:19 AM, James Greenhalgh james.greenha...@arm.com wrote: Now that this has had a few days sitting on trunk without seeing any complaints, would you mind if I backported it to the GCC 5 branch? I don’t have a problem with that.
[PATCH C/C++/middle-end] move Wunused-parameter to the FEs
Wunused-parameter warnings are given from cgraph::finalize_function, which is the middle-end. This is an oddity compared to other -Wunused-* warnings. Moreover, Fortran has its own definition of -Wunused-parameter that conflicts with the middle-end definition. This patch moves the middle-end part of Wunused-parameter to the C/C++ FEs. I'm not sure if other FEs expected this warning to work. If so, they do not seem to test for it. Ada, for example, explicitly disables it. This also fixes [fortran/66605] -Wunused-parameter causes internal compiler error with gfortran 5.1.0. Bootstrapped regression tested on x86_64-linux-gnu with languages c,c++,objc,fortran,ada,obj-c++ OK? gcc/ChangeLog: 2015-06-28 Manuel López-Ibáñez m...@gcc.gnu.org PR fortran/66605 * cgraphunit.c (cgraph_node::finalize_function): Do not call do_warn_unused_parameter. * function.c (do_warn_unused_parameter): Move from here. * function.h (do_warn_unused_parameter): Do not declare. gcc/c-family/ChangeLog: 2015-06-28 Manuel López-Ibáñez m...@gcc.gnu.org PR fortran/66605 * c-common.c (do_warn_unused_parameter): Move here. * c-common.h (do_warn_unused_parameter): Declare. gcc/ada/ChangeLog: 2015-06-28 Manuel López-Ibáñez m...@gcc.gnu.org PR fortran/66605 * gcc-interface/misc.c (gnat_post_options): No need to disable warn_unused_parameter anymore. gcc/cp/ChangeLog: 2015-06-28 Manuel López-Ibáñez m...@gcc.gnu.org PR fortran/66605 * decl.c (finish_function): Call do_warn_unused_parameter. gcc/testsuite/ChangeLog: 2015-06-28 Manuel López-Ibáñez m...@gcc.gnu.org PR fortran/66605 * gfortran.dg/wunused-parameter.f90: New test. gcc/c/ChangeLog: 2015-06-28 Manuel López-Ibáñez m...@gcc.gnu.org PR fortran/66605 * c-decl.c (finish_function): Call do_warn_unused_parameter. 
Index: gcc/c-family/c-common.c === --- gcc/c-family/c-common.c (revision 224997) +++ gcc/c-family/c-common.c (working copy) @@ -12045,10 +12045,27 @@ do_warn_double_promotion (tree result_ty else return; warning_at (loc, OPT_Wdouble_promotion, gmsgid, source_type, result_type); } +/* Possibly warn about unused parameters. */ + +void +do_warn_unused_parameter (tree fn) +{ + tree decl; + + for (decl = DECL_ARGUMENTS (fn); + decl; decl = DECL_CHAIN (decl)) +if (!TREE_USED (decl) TREE_CODE (decl) == PARM_DECL +DECL_NAME (decl) !DECL_ARTIFICIAL (decl) +!TREE_NO_WARNING (decl)) + warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wunused_parameter, + unused parameter %qD, decl); +} + + /* Setup a TYPE_DECL node as a typedef representation. X is a TYPE_DECL for a typedef statement. Create a brand new ..._TYPE node (which will be just a variant of the existing ..._TYPE node with identical properties) and then install X Index: gcc/c-family/c-common.h === --- gcc/c-family/c-common.h (revision 224997) +++ gcc/c-family/c-common.h (working copy) @@ -1041,10 +1041,11 @@ extern void warn_for_div_by_zero (locati extern void warn_for_sign_compare (location_t, tree orig_op0, tree orig_op1, tree op0, tree op1, tree result_type, enum tree_code resultcode); +extern void do_warn_unused_parameter (tree); extern void do_warn_double_promotion (tree, tree, tree, const char *, location_t); extern void set_underlying_type (tree); extern void record_types_used_by_current_var_decl (tree); extern void record_locally_defined_typedef (tree); Index: gcc/c/c-decl.c === --- gcc/c/c-decl.c (revision 224997) +++ gcc/c/c-decl.c (working copy) @@ -9026,10 +9026,14 @@ finish_function (void) /* Complain about locally defined typedefs that are not used in this function. */ maybe_warn_unused_local_typedefs (); + /* Possibly warn about unused parameters. */ + if (warn_unused_parameter) +do_warn_unused_parameter (fndecl); + /* Store the end of the function, so that we get good line number info for the epilogue. 
*/ cfun->function_end_locus = input_location; /* Finalize the ELF visibility for the function. */ Index: gcc/cgraphunit.c === --- gcc/cgraphunit.c (revision 224997) +++ gcc/cgraphunit.c (working copy) @@ -470,14 +470,10 @@ cgraph_node::finalize_function (tree dec /* If we've not yet emitted decl, tell the debug info about it. */ if (!TREE_ASM_WRITTEN (decl)) (*debug_hooks->deferred_inline_function) (decl); - /* Possibly warn about unused parameters. */ - if (warn_unused_parameter) - do_warn_unused_parameter (decl); - if
Re: [PATCH C/C++/middle-end] move Wunused-parameter to the FEs
OK. Jason
Re: [PATCH] top-level for libvtv: use normal (not raw_cxx) target exports
On 06/25/2015 09:15 PM, Jeff Law wrote: On 06/09/2015 08:22 AM, Michael Haubenwallner wrote: Hi build machinery maintainers, since we always build the C++ compiler now, I fail to see the need to still use RAW_CXX_TARGET_EXPORTS for libvtv. But why is vtv special here? Wouldn't this also apply to libstdc++-v3 which would still have raw_cxx=true after your change? libstdc++ breaks with shell syntax errors in maintainer-mode, which more obviously spots the configure script recreation while running as the breakage's origin, where the simple enough workaround is to run 'make bootstrap' again. This is because libstdc++-v3/configure runs 'generate-headers' (via config.status), which is 'cd include && make', causing dependencies in 'include/Makefile' to be evaluated, causing make to run autoconf to recreate libstdc++-v3/configure. Actually, an unexpected libstdc++-v3/configure change is introduced by some inconsistency in https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=223196 where acinclude.m4 contains different code than the recreated configure: acinclude.m4-3973: case "${target_os}" in acinclude.m4#3974: gnu* | linux* | solaris*) acinclude.m4-3975: GCC_TRY_COMPILE_OR_LINK( configure-79218: case "${target_os}" in configure#79219: gnu* | linux* | kfreebsd*-gnu | knetbsd*-gnu | solaris*) configure-79220: if test x$gcc_no_link = xyes; then Not sure how to handle such kind of inconsistencies though... It's unfortunate that David didn't document the meaning of raw_cxx... I've read the threads from 2004 and I'm still lost. For vtv the breakage once resulted in the different CC/CXX values here, but that must've been related to maintainer-mode actually, not raw_cxx. From my point of view, this breakage vanished as reason for the raw_cxx patch. /haubi/
RE: [Patch, MIPS] Enable fp-contract on MIPS and update -mfused-madd
Richard, please have a look at my question below in a reference to your previous statement. On Thu, 18 Jun 2015, Steve Ellcey wrote: OK, I checked in the prequel patch and here is a new copy of the original patch based off of that (and with no HONOR_NAN checks in the fma/madd instructions). OK for checkin? Please see below for my notes. 2015-06-18 Steve Ellcey sell...@imgtec.com * config.gcc (mips*-*-*): Add fused-madd.opt. Please use angle brackets as per https://www.gnu.org/prep/standards/html_node/Indicating-the-Part-Changed.html, i.e.: * config.gcc <mips*-*-*>: Add fused-madd.opt. There's no function or similar entity involved here and `mips*-*-*' is a case value like with the C language's `switch' statement where you'd use angle brackets too to refer to individual cases. (*nmsub4<mode>_fastmath) Update condition. Extraneous space here. diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md index f6912e1..4f5692c 100644 --- a/gcc/config/mips/mips.md +++ b/gcc/config/mips/mips.md [...] +;; fnma is defined in GCC as (fma (neg op1) op2 op3) +;; (-op1 * op2) + op3 == -(op1 * op2) + op3 == -((op1 * op2) - op3) +;; The mips nmsub instructions implement -((op1 * op2) - op3) +;; This transformation means we may return the wrong signed zero +;; so we check HONOR_SIGNED_ZEROS. + +(define_expand "fnma<mode>4" + [(set (match_operand:ANYF 0 "register_operand") + (fma:ANYF (neg:ANYF (match_operand:ANYF 1 "register_operand")) + (match_operand:ANYF 2 "register_operand") + (match_operand:ANYF 3 "register_operand")))] + "(ISA_HAS_FUSED_MADD3 || ISA_HAS_FUSED_MADD4) + && !HONOR_SIGNED_ZEROS (<MODE>mode)") Have you considered the alternative/complementary approach proposed by Richard here: http://gcc.gnu.org/ml/gcc-patches/2010-11/msg00680.html, i.e. to introduce further expanders, e.g.: fman<M>4: (neg:M (fma:M OP1 OP2 OP3)) (multiply-add, negated) fmsn<M>4: (neg:M (fma:M OP1 OP2 (neg:M OP3))) (multiply-subtract, negated) ? 
These patterns wouldn't need a check for !HONOR_SIGNED_ZEROS as they match the respective hardware instructions in an exact manner. Therefore I think they would be more useful as they would also suit software that claims/requires full IEEE Std 754 compliance. Richard, do you maintain the introduction of these additional operations would be a good idea and one you're willing to support for the purpose of patch acceptance/approval if implemented? +;; fnms is defined as: (fma (neg op1) op2 (neg op3)) +;; ((-op1) * op2) - op3 == -(op1 * op2) - op3 == -((op1 * op2) + op3) +;; The mips nmadd instructions implement -((op1 * op2) + op3) +;; This transformation means we may return the wrong signed zero +;; so we check HONOR_SIGNED_ZEROS. + +(define_expand "fnms<mode>4" + [(set (match_operand:ANYF 0 "register_operand") + (fma:ANYF + (neg:ANYF (match_operand:ANYF 1 "register_operand")) + (match_operand:ANYF 2 "register_operand") + (neg:ANYF (match_operand:ANYF 3 "register_operand"))))] + "(ISA_HAS_FUSED_MADD3 || ISA_HAS_FUSED_MADD4) + && !HONOR_SIGNED_ZEROS (<MODE>mode)") Same observation here. The change looks good to me otherwise. Maciej
[nios2, committed] fix bad assertion
When I was preparing to regression-test something else in a nios2-linux-gnu build, I discovered it was ICE'ing while building shared libraries with -fpic (glibc, libgomp). I tracked this down to having started with r224048, but on further investigation I decided that commit merely exposed a latent bug. The trouble is that the assertion in nios2_delegitimize_address is too restrictive compared to what nios2_legitimize_address can produce. It's expecting to find a SYMBOL_REF underneath but in one case it was crashing on a LABEL_REF (for a computed goto), and in another case it was a symbol + offset expression which is even documented with a big block of comments in nios2_legitimize_address. I've checked in this patch to relax the assertion; it allows the toolchain to build again, and test results look decent. -Sandra 2015-06-29 Sandra Loosemore san...@codesourcery.com gcc/ * config/nios2/nios2.c (nios2_delegitimize_address): Make assert less restrictive. Index: gcc/config/nios2/nios2.c === --- gcc/config/nios2/nios2.c (revision 225094) +++ gcc/config/nios2/nios2.c (working copy) @@ -1920,7 +1920,7 @@ nios2_delegitimize_address (rtx x) case UNSPEC_LOAD_TLS_IE: case UNSPEC_ADD_TLS_LE: x = XVECEXP (XEXP (x, 0), 0, 0); - gcc_assert (GET_CODE (x) == SYMBOL_REF); + gcc_assert (CONSTANT_P (x)); break; } }
[gomp4] Remove unnecessary conversions
Hi, This patch removes the conversion of the arguments. As the arguments are already of the proper type, via the front ends, there's no need to the conversion. Furthermore, if the node is a 'const int', then the conversion function will create a node that can't be used as an argument to a gimple call. Committed to gomp-4_0-branch. Jim diff --git a/gcc/ChangeLog.gomp b/gcc/ChangeLog.gomp index 4d449b0..f2b6c4b 100644 --- a/gcc/ChangeLog.gomp +++ b/gcc/ChangeLog.gomp @@ -1,3 +1,7 @@ +2015-06-29 James Norris jnor...@codesourcery.com + + * omp-low.c (expand_omp_target): Remove unnecessary conversion. + 2015-06-26 Nathan Sidwell nat...@codesourcery.com * config/nvptx/nvptx.md (UNSPEC_ID): Rename to ... diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 5914fcf..24fac7c 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -10138,22 +10138,17 @@ expand_omp_target (struct omp_region *region) = fold_convert_loc (gimple_location (entry_stmt), integer_type_node, integer_one_node); /* ..., but if present, use the value specified by the respective - clause, making sure that are of the correct type. */ + clause. */ c = find_omp_clause (clauses, OMP_CLAUSE_NUM_GANGS); if (c) - t_num_gangs = fold_convert_loc (OMP_CLAUSE_LOCATION (c), - integer_type_node, - OMP_CLAUSE_NUM_GANGS_EXPR (c)); + t_num_gangs = OMP_CLAUSE_NUM_GANGS_EXPR (c); c = find_omp_clause (clauses, OMP_CLAUSE_NUM_WORKERS); if (c) - t_num_workers = fold_convert_loc (OMP_CLAUSE_LOCATION (c), - integer_type_node, - OMP_CLAUSE_NUM_WORKERS_EXPR (c)); + t_num_workers = OMP_CLAUSE_NUM_WORKERS_EXPR (c); c = find_omp_clause (clauses, OMP_CLAUSE_VECTOR_LENGTH); if (c) - t_vector_length = fold_convert_loc (OMP_CLAUSE_LOCATION (c), - integer_type_node, - OMP_CLAUSE_VECTOR_LENGTH_EXPR (c)); + t_vector_length = OMP_CLAUSE_VECTOR_LENGTH_EXPR (c); + args.quick_push (t_num_gangs); args.quick_push (t_num_workers); args.quick_push (t_vector_length);
Re: [PATCH] Yet another simple fix to enhance outer-loop vectorization.
Hi All, Here is updated patch containing missed change in slpeel_tree_peel_loop_to_edge which prevents renaming of exit PHI uses in inner loop. ChangeLog: 2015-06-29 Yuri Rumyantsev ysrum...@gmail.com * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument to allow renaming of PHI arguments on edges incoming from outer loop header, add corresponding check before start PHI iterator. (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool variable DUPLICATE_OUTER_LOOP and set it to true for outer loops with true force_vectorize. Set-up dominator for outer loop too. Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb. (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it was marked with force_vectorize and has restricted cfg. (slpeel_tree_peel_loop_to_edge): Do not rename exit PHI uses in inner loop. * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Do not do peeling for outer loops. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-outer-simd-2.c: New test. 2015-06-17 19:35 GMT+03:00 Yuri Rumyantsev ysrum...@gmail.com: Richard, I attached updated patch. You asked me to explain why I did changes for renaming. If we do not change PHI arguments for inner loop header we can get the following IR: source outer loop: bb 5: outer-loop header # i_45 = PHI 0(4), i_18(9) # .MEM_17 = PHI .MEM_4(D)(4), .MEM_44(9) bb 6:inner-loop header # .MEM_46 = PHI .MEM_44(7), .MEM_17(5) after duplication we have (without argument correction): bb 12: # i_74 = PHI i_64(13), 0(17) # .MEM_73 = PHI .MEM_97(13), .MEM_4(D)(17) bb 15: # .MEM_63 = PHI .MEM_17(12), .MEM_97(16) and later we get verifier error: test1.c:20:6: error: definition in block 6 does not dominate use in block 10 void foo (int n) ^ for SSA_NAME: .MEM_17 in statement: .MEM_63 = PHI .MEM_17(10), .MEM_97(14) and you can see that we need to rename MEM_17 argument for out-coming edge to MEM_73 since MEM_17 was converted to MEM_73 in outer-loop header. 
This explains my simple fix in rename_variables_in_bb. Note also that loop distribution is not performed for outer loops. I also did a change in slpeel_can_duplicate_loop_p to simplify check. Any comments will be appreciated. Yuri. 2015-06-17 15:24 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Tue, Jun 16, 2015 at 4:12 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Thanks a lot Richard for your review. I presented updated patch which is not gated by force_vectorize. I added test on outer-loop in vect_enhance_data_refs_alignment and it returns false for it because we can not improve dr alighment through outer-loop peeling in general. So I assume that only versioning for alignment can be applied for targets do not support unaligned memory access. @@ -998,7 +998,12 @@ gimple stmt = DR_STMT (dr); stmt_vec_info stmt_info = vinfo_for_stmt (stmt); tree vectype = STMT_VINFO_VECTYPE (stmt_info); + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); + /* Peeling of outer loops can't improve alignment. */ + if (loop_vinfo LOOP_VINFO_LOOP (loop_vinfo)-inner) +return false; + but this looks wrong. It depends on the context (DRs in the outer loop can improve alignment by peeling the outer loop and we can still peel the inner loop for alignment). So IMHO the correct place to amend is vect_enhance_data_refs_alignment (which it seems doesn't consider peeling the inner loop). I'd say you should simply add || loop-inner) to the /* Check if we can possibly peel the loop. */ if (!vect_can_advance_ivs_p (loop_vinfo) || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))) do_peeling = false; check. I did not change tests for outer loops in slpeel_can_duplicate_loop_p as you proposed since it is not called outside vectorization. There is still no reason for this complex condition, so please remove it. _Please_ also generate diffs with -p, it is very tedious to see patch hunks without a function name context. 
You didn't explain why you needed the renaming changes as I don't remember needing it when using the code from loop distribution. I also noticed one not-resolved issue with outer-loop peeling - we don't consider remainder for possible vectorization of inner-loop as we can see on the following example: for (i = 0; i < n; i++) { diff = 0; for (j = 0; j < M; j++) { diff += in[j+i]*coeff[j]; } out[i] = diff; } Is it worth to fix it? You mean vectorizing the inner loop in the niter peel epilogue loop of the outer loop? Possibly yes. Thanks, Richard. 2015-06-16 Yuri Rumyantsev ysrum...@gmail.com * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument to allow renaming of PHI arguments on edges incoming from outer loop header, add corresponding check before start PHI iterator. (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool variable
Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
On Jun 29, 2015, at 4:44 AM, Dr. Philipp Tomsich philipp.toms...@theobroma-systems.com wrote: James, On 29 Jun 2015, at 13:36, James Greenhalgh james.greenha...@arm.com wrote: On Mon, Jun 29, 2015 at 10:18:23AM +0100, Kumar, Venkataramanan wrote: -Original Message- From: Dr. Philipp Tomsich [mailto:philipp.toms...@theobroma-systems.com] Sent: Monday, June 29, 2015 2:17 PM To: Kumar, Venkataramanan Cc: pins...@gmail.com; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Kumar, This does not come unexpected, as the initial estimation and each iteration will add an architecturally-defined number of bits of precision (ARMv8 guarantuees only a minimum number of bits provided per operation… the exact number is specific to each micro-arch, though). Depending on your architecture and on the required number of precise bits by any given benchmark, one may see miscompares. True. I would be very uncomfortable with this approach. Same here. The default must be safe. Always. Unlike other architectures, we don’t have a problem with making the proper defaults for “safety”, as the ARMv8 ISA guarantees a minimum number of precise bits per iteration. From Richard Biener's post in the thread Michael Matz linked earlier in the thread: It would follow existing practice of things we allow in -funsafe-math-optimizations. Existing practice in that we want to allow -ffast-math use with common benchmarks we care about. https://gcc.gnu.org/ml/gcc-patches/2009-11/msg00100.html With the solution you seem to be converging on (2-steps for some microarchitectures, 3 for others), a binary generated for one micro-arch may drop below a minimum guarantee of precision when run on another. This seems to go against the spirit of the practice above. I would only support adding this optimization to -Ofast if we could keep to architectural guarantees of precision in the generated code (i.e. 3-steps everywhere). 
I don't object to adding a -mlow-precision-recip-sqrt style option, which would be off by default, would enable the 2-step mode, and would need to be explicitly enabled (i.e. not implied by -mcpu=foo) but I don't see what this buys you beyond the Gromacs boost (and even there you would be creating an Invalid Run as optimization flags must be applied across all workloads). Any flag that reduces precision (and thus breaks IEEE floating-point semantics) needs to be gated with an “unsafe” flag (i.e. one that is never on by default). As a consequence, the “peak”-tuning for SPEC will turn this on… but barely anyone else would. For the 3-step optimization, it is clear to me that for generic tuning we don't want this to be enabled by default experimental results and advice in this thread argues against it for thunderx and cortex-a57 targets. However, enabling it based on the CPU tuning selected seems fine to me. I do not agree on this one, as I would like to see the safe form (i.e. 3 and 5 iterations respectively) to become the default. Most “server-type” chips should not see a performance regression, while it will be easier to optimise for this in hardware than for a (potentially microcoded) sqrt-instruction (and subsequent, dependent divide). I have not heard anyone claim a performance regression (either on thunderx or on cortex-a57), but merely heard a “no speed-up”. Actually it does regress performance on thunderX, I just assumed that when I said not going to be a win it was taken as a slow down. It regress gromacs by more than 10% on thunderX but I can't remember how much as i had someone else run it. The latency difference is also over 40%; for example single precision: 29 cycles with div (12) sqrt(17) directly vs 42 cycles with the rsqrte and 2 iterations of 2mul/rsqrts (double is 53 vs 60). That is huge difference right there. ThunderX has a fast div and a fast sqrt for 32bit and a reasonable one for double. 
So again this is not just not a win but rather a regression for thunderX. I suspect cortex-a57 is also true. Thanks, Andrew So I am strongly in favor of defaulting to the ‘safe’ number of iterations, even when compiling for a generic target. Best, Philipp.
Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
Benedikt , I have ICE for 444.namd with your patch, not sure if something wrong in my local tree. Venkat, now I could reproduce it. Strangely it does not happen with -flto. I will try to find out the reason for that. Thank you for the catch, Benedikt signature.asc Description: Message signed with OpenPGP using GPGMail
Fix PR ada/63310
This fixes the -fcompare-debug bootstrap failure reported by Jakub. Tested on x86_64-suse-linux, applied on the mainline. 2015-06-29 Eric Botcazou ebotca...@adacore.com PR ada/63310 * gcc-interface/utils.c (gnat_write_global_declarations): Always build the dummy global variable if code was generated. -- Eric Botcazou Index: gcc-interface/utils.c === --- gcc-interface/utils.c (revision 225111) +++ gcc-interface/utils.c (working copy) @@ -5202,8 +5202,9 @@ gnat_write_global_declarations (void) tree iter; /* If we have declared types as used at the global level, insert them in - the global hash table. We use a dummy variable for this purpose. */ - if (types_used_by_cur_var_decl && !types_used_by_cur_var_decl->is_empty ()) + the global hash table. We use a dummy variable for this purpose, but + we need to build it unconditionally to avoid -fcompare-debug issues. */ + if (first_global_object_name) { struct varpool_node *node; char *label; @@ -5218,11 +5219,12 @@ gnat_write_global_declarations (void) node->definition = 1; node->force_output = 1; - while (!types_used_by_cur_var_decl->is_empty ()) - { - tree t = types_used_by_cur_var_decl->pop (); - types_used_by_var_decl_insert (t, dummy_global); - } + if (types_used_by_cur_var_decl) + while (!types_used_by_cur_var_decl->is_empty ()) + { + tree t = types_used_by_cur_var_decl->pop (); + types_used_by_var_decl_insert (t, dummy_global); + } } /* Output debug information for all global type declarations first. This
[PATCH, i386] Remove MPX jump insn patterns
Hello! Attached patch removes special MPX jump insn patterns and extends generic jump insn patterns to handle bnd prefix using existing MPX infrastructure. 2015-06-29 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (*jcc_1): Use %! in asm template. Set attribute length_nobnd instead of length. (*jcc_2): Ditto. (jump): Ditto. (*jcc_1_bnd, *jcc_2_bnd, jump_bnd): Remove insn patterns. Patch was tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN. Uros. Index: config/i386/i386.md === --- config/i386/i386.md (revision 225123) +++ config/i386/i386.md (working copy) @@ -10948,24 +10948,6 @@ ;; Basic conditional jump instructions. ;; We ignore the overflow flag for signed branch instructions. -(define_insn *jcc_1_bnd - [(set (pc) - (if_then_else (match_operator 1 ix86_comparison_operator - [(reg FLAGS_REG) (const_int 0)]) - (label_ref (match_operand 0)) - (pc)))] - TARGET_MPX ix86_bnd_prefixed_insn_p (insn) - bnd %+j%C1\t%l0 - [(set_attr type ibr) - (set_attr modrm 0) - (set (attr length) - (if_then_else (and (ge (minus (match_dup 0) (pc)) - (const_int -126)) - (lt (minus (match_dup 0) (pc)) - (const_int 128))) -(const_int 3) -(const_int 7)))]) - (define_insn *jcc_1 [(set (pc) (if_then_else (match_operator 1 ix86_comparison_operator @@ -10973,35 +10955,18 @@ (label_ref (match_operand 0)) (pc)))] - %+j%C1\t%l0 + %!%+j%C1\t%l0 [(set_attr type ibr) (set_attr modrm 0) - (set (attr length) - (if_then_else (and (ge (minus (match_dup 0) (pc)) - (const_int -126)) - (lt (minus (match_dup 0) (pc)) - (const_int 128))) -(const_int 2) -(const_int 6)))]) + (set (attr length_nobnd) + (if_then_else + (and (ge (minus (match_dup 0) (pc)) + (const_int -126)) + (lt (minus (match_dup 0) (pc)) + (const_int 128))) + (const_int 2) + (const_int 6)))]) -(define_insn *jcc_2_bnd - [(set (pc) - (if_then_else (match_operator 1 ix86_comparison_operator - [(reg FLAGS_REG) (const_int 0)]) - (pc) - (label_ref (match_operand 0] - TARGET_MPX ix86_bnd_prefixed_insn_p (insn) - bnd %+j%c1\t%l0 
- [(set_attr type ibr) - (set_attr modrm 0) - (set (attr length) - (if_then_else (and (ge (minus (match_dup 0) (pc)) - (const_int -126)) - (lt (minus (match_dup 0) (pc)) - (const_int 128))) -(const_int 3) -(const_int 7)))]) - (define_insn *jcc_2 [(set (pc) (if_then_else (match_operator 1 ix86_comparison_operator @@ -11009,16 +10974,17 @@ (pc) (label_ref (match_operand 0] - %+j%c1\t%l0 + %!%+j%c1\t%l0 [(set_attr type ibr) (set_attr modrm 0) - (set (attr length) - (if_then_else (and (ge (minus (match_dup 0) (pc)) - (const_int -126)) - (lt (minus (match_dup 0) (pc)) - (const_int 128))) -(const_int 2) -(const_int 6)))]) + (set (attr length_nobnd) + (if_then_else + (and (ge (minus (match_dup 0) (pc)) + (const_int -126)) + (lt (minus (match_dup 0) (pc)) + (const_int 128))) + (const_int 2) + (const_int 6)))]) ;; In general it is not safe to assume too much about CCmode registers, ;; so simplify-rtx stops when it sees a second one. Under certain @@ -11452,35 +11418,21 @@ ;; Unconditional and other jump instructions -(define_insn jump_bnd - [(set (pc) - (label_ref (match_operand 0)))] - TARGET_MPX ix86_bnd_prefixed_insn_p (insn) - bnd jmp\t%l0 - [(set_attr type ibr) - (set (attr length) - (if_then_else (and (ge (minus (match_dup 0) (pc)) - (const_int -126)) - (lt (minus (match_dup 0) (pc)) - (const_int 128))) -(const_int 3) -(const_int 6))) - (set_attr modrm 0)]) - (define_insn jump [(set (pc) (label_ref (match_operand 0)))] - jmp\t%l0 + %!jmp\t%l0 [(set_attr type ibr) - (set (attr length) - (if_then_else (and (ge (minus (match_dup 0) (pc)) - (const_int -126)) - (lt (minus (match_dup 0) (pc)) - (const_int 128))) -
Re: [PATCH, PR66652] Use max_loop_iterations in transform_to_exit_first_loop_alt
On 06/29/2015 08:24 AM, Tom de Vries wrote: Hi, this patch fixes PR66652. It uses max_loop_iterations in transform_to_exit_first_loop_alt to ensure that the new loop bound nit + 1 doesn't overflow. Bootstrapped and reg-tested on x86_64. OK for trunk? Thanks, - Tom 0001-Use-max_loop_iterations-in-transform_to_exit_first_l.patch Use max_loop_iterations in transform_to_exit_first_loop_alt 2015-06-29 Tom de Vriest...@codesourcery.com PR tree-optimization/66652 * tree-parloops.c (try_transform_to_exit_first_loop_alt): Use max_loop_iterations to determine if nit + 1 overflows. * testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c (f): Rewrite using restrict pointers. (main): Add arguments to calls to f. * testsuite/libgomp.c/parloops-exit-first-loop-alt.c: Same. * gcc.dg/parloops-exit-first-loop-alt-pr66652.c: New test. * gcc.dg/parloops-exit-first-loop-alt-3.c (f): Rewrite using restrict pointers. * gcc.dg/parloops-exit-first-loop-alt.c: Same. OK. Make sure to put the PR marker in the testsuite/ChangeLog entry and drop the testsuite/ prefix in the testsuite/ChangeLog entry. jeff
Re: [Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
On Mon, 29 Jun 2015, David Sherwood wrote: Hi, I have added new STRICT_MAX_EXPR and STRICT_MIN_EXPR expressions to support the IEEE versions of fmin and fmax. This is done by recognising the math library fmax and fmin builtin functions in a similar way to how this is done for -ffast-math. This also allows us to vectorise the IEEE max/min functions for targets that support it, for example aarch64/aarch32. This patch is missing documentation. You need to document the new insn patterns in md.texi and the new tree codes in generic.texi. -- Joseph S. Myers jos...@codesourcery.com
RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
Hi, -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Benedikt Huber Sent: Monday, June 29, 2015 11:04 PM To: Kumar, Venkataramanan Cc: pins...@gmail.com; Dr. Philipp Tomsich; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Benedikt , I have ICE for 444.namd with your patch, not sure if something wrong in my local tree. Venkat, now I could reproduce it. Strangely it does not happen with -flto. I will try to find out the reason for that. Sure, I kind of remember that RTX for target in builtin rsqrt was 0. But not sure if we need to create reg rtx in such cases. Thank you for the catch, Benedikt
Re: [PATCH 1/2] Add gcc/typed-splay-tree.h
On Thu, 2015-06-25 at 13:18 -0600, Jeff Law wrote: On 06/25/2015 01:13 PM, David Malcolm wrote: I found when implementing switch statements for the jit that it was much easier to work with libiberty's splay-tree.h by first wrapping it in a C++ wrapper to add typesafety. This patch adds such a wrapper, implementing the methods I needed. It's unused in this patch, but is used in the followup patch (which only touches the jit). OK for trunk? gcc/ChangeLog: * typed-splay-tree.h: New file. OK. Well, this is embarrassing, it seems the patch I posted to the list doesn't actually compile. The underlying splay_tree_insert returns a splay_tree_node, I had the typed_splay_tree::insert returning a value_type. I dropped this bogus return type from the insert method in the implementation, in favor of void, but I forgot to update the declaration, leading to errors when attempting to actually compile this (via jit/jit-recording.c in the followup patch). The attached one-liner patch drops it from the declaration, and applies to [PATCH 1/2]. I don't know if I can count this as obvious... It does compile now, and make check-jit looks good. Suitably mortified Dave commit 839fdd693d2fe8d12f04ab5bde20e8595b3031f2 Author: David Malcolm dmalc...@redhat.com Date: Fri Jun 26 05:22:32 2015 -0400 fix typed-splay-tree.h insert diff --git a/gcc/typed-splay-tree.h b/gcc/typed-splay-tree.h index 1ec4894..7849862 100644 --- a/gcc/typed-splay-tree.h +++ b/gcc/typed-splay-tree.h @@ -42,7 +42,7 @@ class typed_splay_tree value_type lookup (key_type k); value_type predecessor (key_type k); value_type successor (key_type k); - value_type insert (key_type k, value_type v); + void insert (key_type k, value_type v); private: static value_type node_to_value (splay_tree_node node);
Re: [RFA] Factor conversion out of COND_EXPR using match.pd pattern
On 06/01/2015 04:55 AM, Richard Biener wrote: On Sat, May 30, 2015 at 11:11 AM, Marc Glisse marc.gli...@inria.fr wrote: (only commenting on the technique, not on the transformation itself) +(simplify + (cond @0 (convert @1) INTEGER_CST@2) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@1)) +COMPARISON_CLASS_P (@0) If you add COMPARISON_CLASS_P to define_predicates, you could write: (cond COMPARISON_CLASS_P@0 (convert @1) INTEGER_CST@2) But that would fail to match on GIMPLE, so I don't like either variant as Jeffs relies on the awkward fact that on GIMPLE cond expr conditions are GENERIC and yours wouldn't work. That said - for this kind of patterns testcases that exercise the patterns on GIMPLE would be very appreciated. It may be the case that these patterns don't make a lot of sense on gimple and should be restricted to generic, at least with our current infrastructure. The problem is when we lower from generic to gimple, we end up with branchy code, not straight line code and there's no good way I see to write a match.pd pattern which encompasses flow control. So to discover the MIN/MAX with typecast, we're probably stuck hacking minmax_replacement to know when it can ignore the typecast. That may in fact be a good thing -- I haven't looked closely yet, but 45397 may be of a similar nature (no good chance to see straight line code for match.pd, thus we're forced to handle it in phiopt). So do we want to restrict the new pattern on GENERIC, then rely on phiopt to get smarter and catch this stuff for GIMPLE? Or drop the pattern totally and do everything in phiopt on GIMPLE? or maybe use a for loop on comparisons, which would give names to TREE_OPERAND (@0, *). This should even handle the operand_equal_p alternative: (cond (cmp:c@0 @1 @2) (convert @1) INTEGER_CST@2) Yes, that would be my reference. But won't this require pointer equivalence? Are INTEGER_CST nodes fully shared? What if @1 is something more complex than a _DECL node (remember, we're working with GENERIC). 
So something like (cond (cmp:c@0 @1 @2) (convert @3) INTEGER_CST@4)) And using operand_equal_p seems more appropriate to me (and is still better than the original (cond @0 ...) and grubbing around inside @0 to look at operands. + && int_fits_type_p (@2, TREE_TYPE (@1)) + && ((operand_equal_p (TREE_OPERAND (@0, 0), @2, 0) + && operand_equal_p (TREE_OPERAND (@0, 1), @1, 0)) + || (operand_equal_p (TREE_OPERAND (@0, 0), @1, 0) + && operand_equal_p (TREE_OPERAND (@0, 1), @2, 0)))) + (with { tree itype = TREE_TYPE (@1); tree otype = TREE_TYPE (@2); } + (convert:otype (cond:itype @0 @1 (convert:itype @2)))) This should be enough, no need to specify the outer type (convert (cond:itype @0 @1 (convert:itype @2))) Yes. I believe we should not have to write cond:itype here, cond should be made to use the type of its second argument instead of the first one, by default (expr::gen_transform already has a few special cases). Indeed. Patch welcome (I'd have expected it already works...) With Marc's fix, those explicit types are no longer needed. Jeff
Re: [gomp4, fortran] Patch to fix continuation checks of OpenACC and OpenMP directives
Hi Cesar! Thanks for your review! 08.06.2015, 17:59, Cesar Philippidis ce...@codesourcery.com: On 06/07/2015 02:05 PM, Ilmir Usmanov wrote: Fixed fortran mail-list address. Sorry for inconvenience. 08.06.2015, 00:01, Ilmir Usmanov m...@ilmir.us: Hi Cesar! This patch fixes checks of OpenMP and OpenACC continuations in case if someone mixes them (i.e. continues OpenMP directive with !$ACC sentinel or vice versa). OK for gomp branch? Thanks for working on this. Does this fix PR63858 by any chance? No problem. I had a feeling that something is wrong in the scanner since I've committed an initial support of OpenACC ver. 1.0 to gomp branch (more than a year ago). Now it does fix the PR, because I've added support of fixed form to the patch. BTW, your test in the PR has a wrong continuation. Fixed test added to the patch. two minor nits... 0001-Fix-mix-of-OpenACC-and-OpenMP-sentinels-in-continuat.patch From 5492bf5bc991b6924f5e3b35c11eeaed745df073 Mon Sep 17 00:00:00 2001 From: Ilmir Usmanov i.usma...@samsung.com Date: Sun, 7 Jun 2015 23:55:22 +0300 Subject: [PATCH] Fix mix of OpenACC and OpenMP sentinels in continuation --- gcc/fortran/ChangeLog | 5 + Use ChangeLog.gomp for gomp-4_0-branch. Done. + /* In case we have an OpenMP directive continued by OpenACC + sentinel, or vice versa, we get both openmp_flag and + openacc_flag on. */ + + if (openacc_flag && openmp_flag) + { + int is_openmp = 0; + for (i = 0; i < 5; i++, c = next_char ()) + { + if (gfc_wide_tolower (c) != (unsigned char) "!$acc"[i]) + is_openmp = 1; + if (i == 4) + old_loc = gfc_current_locus; + } + gfc_error ("Wrong %s continuation at %C: expected %s, got %s", + is_openmp ? "OpenACC" : "OpenMP", + is_openmp ? "!$ACC" : "!$OMP", + is_openmp ? "!$OMP" : "!$ACC"); I think it's better for the translation project if you made this a complete string. So maybe change this line into gfc_error (is_openmp ? 
"Wrong continuation at %C: expected !$ACC, got !$OMP" : "Wrong continuation at %C: expected !$OMP, got !$ACC"); Done Other than that, it looks fine. Thanks, Cesar OK for gomp branch? -- Ilmir. From dc98062b499838ca4391c0ed497f40205267a44e Mon Sep 17 00:00:00 2001 From: Ilmir Usmanov m...@ilmir.us Date: Tue, 30 Jun 2015 03:05:23 +0300 Subject: [PATCH] PR/63858 --- gcc/fortran/ChangeLog.gomp | 10 ++ gcc/fortran/scanner.c | 259 +--- gcc/testsuite/ChangeLog.gomp| 7 + gcc/testsuite/gfortran.dg/goacc/omp-fixed.f | 32 gcc/testsuite/gfortran.dg/goacc/omp.f95 | 8 + 5 files changed, 216 insertions(+), 100 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/goacc/omp-fixed.f diff --git a/gcc/fortran/ChangeLog.gomp b/gcc/fortran/ChangeLog.gomp index aa02ffe..0a5bf47 100644 --- a/gcc/fortran/ChangeLog.gomp +++ b/gcc/fortran/ChangeLog.gomp @@ -1,3 +1,13 @@ +2015-06-30 Ilmir Usmanov m...@ilmir.us + + PR/63858 + Fix mix of OpenACC and OpenMP sentinels in continuation. + * scanner.c (skip_omp_attribute_fixed, skip_oacc_attribute_fixed): New + functions. + (skip_fixed_comments): Fix mix of OpenACC and OpenMP sentinels in + continuation. + (gfc_next_char_literal): Likewise. + 2015-06-18 James Norris jnor...@codesourcery.com * trans-decl.c (find_module_oacc_declare_clauses): Fix setting of diff --git a/gcc/fortran/scanner.c b/gcc/fortran/scanner.c index f0e6404..d66cf65 100644 --- a/gcc/fortran/scanner.c +++ b/gcc/fortran/scanner.c @@ -935,6 +935,63 @@ skip_free_comments (void) return false; } +/* Return true if MP was matched in fixed form. */ +static bool +skip_omp_attribute_fixed (locus *start) +{ + gfc_char_t c; + if (((c = next_char ()) == 'm' || c == 'M') + && ((c = next_char ()) == 'p' || c == 'P')) +{ + c = next_char (); + if (c != '\n' + && (continue_flag + || c == ' ' || c == '\t' || c == '0')) + { + do + c = next_char (); + while (gfc_is_whitespace (c)); + if (c != '\n' && c != '!') + { + /* Canonicalize to *$omp. 
*/ + *start->nextc = '*'; + openmp_flag = 1; + gfc_current_locus = *start; + return true; + } + } +} + return false; +} + +/* Return true if CC was matched in fixed form. */ +static bool +skip_oacc_attribute_fixed (locus *start) +{ + gfc_char_t c; + if (((c = next_char ()) == 'c' || c == 'C') + && ((c = next_char ()) == 'c' || c == 'C')) +{ + c = next_char (); + if (c != '\n' + && (continue_flag + || c == ' ' || c == '\t' || c == '0')) + { + do + c = next_char (); + while (gfc_is_whitespace (c)); + if (c != '\n' && c != '!') + { + /* Canonicalize to *$acc. */ + *start->nextc = '*'; + openacc_flag = 1; + gfc_current_locus = *start; + return true; + } + } +} + return false; +} /* Skip comment lines in fixed source mode. We have the same rules as in skip_free_comment(), except that we can
[PATCH][6/n] Remove GENERIC stmt combining from SCCVN
This moves X ==/!= ptr comparison folding which triggers surprisingly often. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2015-06-29 Richard Biener rguent...@suse.de * genmatch.c (add_operator): Treat ADDR_EXPR as atom. * fold-const.c (fold_binary_loc): Move A - B simplification via ptr_difference_const ... * match.pd: ... here. When matching (X ^ Y) == Y also match with swapped operands. Index: gcc/genmatch.c === --- gcc/genmatch.c (revision 225007) +++ gcc/genmatch.c (working copy) @@ -324,6 +324,9 @@ add_operator (enum tree_code code, const /* And allow CONSTRUCTOR for vector initializers. */ && !(code == CONSTRUCTOR)) return; + /* Treat ADDR_EXPR as atom, thus don't allow matching its operand. */ + if (code == ADDR_EXPR) +nargs = 0; operator_id *op = new operator_id (code, id, nargs, tcc); id_base **slot = operators->find_slot_with_hash (op, op->hashval, INSERT); if (*slot) Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 225007) +++ gcc/fold-const.c(working copy) @@ -10618,16 +10562,6 @@ fold_binary_loc (location_t loc, fold_convert_loc (loc, type, negate_expr (arg1))); - /* Try folding difference of addresses. */ - { - HOST_WIDE_INT diff; - - if ((TREE_CODE (arg0) == ADDR_EXPR -|| TREE_CODE (arg1) == ADDR_EXPR) - && ptr_difference_const (arg0, arg1, &diff)) - return build_int_cst_type (type, diff); - } - /* Fold a[i] - a[j] to i-j. */ if (TREE_CODE (arg0) == ADDR_EXPR && TREE_CODE (TREE_OPERAND (arg0, 0)) == ARRAY_REF Index: gcc/match.pd === *** gcc/match.pd(revision 225007) --- gcc/match.pd(working copy) *** (define_operator_list swapped_tcc_compar *** 545,550 --- 545,565 (with { tree algn = wide_int_to_tree (TREE_TYPE (@0), wi::bit_not (@1)); } (bit_and @0 { algn; }))) + /* Try folding difference of addresses. 
*/ + (simplify + (minus (convert ADDR_EXPR@0) (convert @1)) + (if (tree_nop_conversion_p (type, TREE_TYPE (@0))) + (with { HOST_WIDE_INT diff; } +(if (ptr_difference_const (@0, @1, &diff)) + { build_int_cst_type (type, diff); })))) + (simplify + (minus (convert @0) (convert ADDR_EXPR@1)) + (if (tree_nop_conversion_p (type, TREE_TYPE (@0))) + (with { HOST_WIDE_INT diff; } +(if (ptr_difference_const (@0, @1, &diff)) + { build_int_cst_type (type, diff); })))) + + /* We can't reassociate at all for saturating types. */ (if (!TYPE_SATURATING (type)) *** (define_operator_list swapped_tcc_compar *** 1229,1235 /* (X ^ Y) == Y becomes X == 0. Likewise (X ^ Y) == X becomes Y == 0. */ (simplify ! (cmp (bit_xor:c @0 @1) @0) (cmp @1 { build_zero_cst (TREE_TYPE (@1)); })) /* (X ^ C1) op C2 can be rewritten as X op (C1 ^ C2). */ --- 1244,1250 /* (X ^ Y) == Y becomes X == 0. Likewise (X ^ Y) == X becomes Y == 0. */ (simplify ! (cmp:c (bit_xor:c @0 @1) @0) (cmp @1 { build_zero_cst (TREE_TYPE (@1)); })) /* (X ^ C1) op C2 can be rewritten as X op (C1 ^ C2). */
RE: [Patch, MIPS] Enable fp-contract on MIPS and update -mfused-madd
Maciej W. Rozycki ma...@linux-mips.org writes: Richard, please have a look at my question below in a reference to your previous statement. On Thu, 18 Jun 2015, Steve Ellcey wrote: OK, I checked in the prequel patch and here is a new copy of the original patch based off of that (and with no HONOR_NAN checks in the fma/madd instructions). OK for checkin? Please see below for my notes. 2015-06-18 Steve Ellcey sell...@imgtec.com * config.gcc (mips*-*-*): Add fused-madd.opt. Please use angle brackets as per https://www.gnu.org/prep/standards/html_node/Indicating-the-Part-Changed.html, i.e.: * config.gcc <mips*-*-*>: Add fused-madd.opt. There's no function or similar entity involved here and `mips*-*-*' is a case value like with the C language's `switch' statement where you'd use angle brackets too to refer to individual cases. (*nmsub4<mode>_fastmath): Update condition. Extraneous space here. The change looks good to me otherwise. Maciej If nobody can identify any further functional issues with this patch then I'd like to get this committed and pursue enhancements as a second round. Catherine, would you be happy for this to be committed on that basis? Thanks, Matthew
[PATCH, libcpp]: Use asm flag outputs in search_line_sse42 main loop
Hello! Attached patch introduces asm flag outputs in seach_line_sse42 main loop to handle carry flag value from pcmpestri insn. Slightly improved old code that uses asm loop compiles to: 96:66 0f 6f 05 00 00 00 movdqa 0x0(%rip),%xmm0 9d:00 9e:48 83 ef 10 sub$0x10,%rdi a2:ba 10 00 00 00 mov$0x10,%edx a7:b8 04 00 00 00 mov$0x4,%eax ac:0f 1f 40 00 nopl 0x0(%rax) b0:48 83 c7 10 add$0x10,%rdi b4:66 0f 3a 61 07 00pcmpestri $0x0,(%rdi),%xmm0 ba:73 f4jaeb0 _ZL17search_line_sse42PKhS0_+0x20 bc:48 8d 04 0f lea(%rdi,%rcx,1),%rax c0:c3 retq and new code results in: 96:66 0f 6f 05 00 00 00 movdqa 0x0(%rip),%xmm0 9d:00 9e:ba 10 00 00 00 mov$0x10,%edx a3:b8 04 00 00 00 mov$0x4,%eax a8:66 0f 3a 61 07 00pcmpestri $0x0,(%rdi),%xmm0 ae:72 0cjb bc _ZL17search_line_sse42PKhS0_+0x2c b0:48 83 c7 10 add$0x10,%rdi b4:66 0f 3a 61 07 00pcmpestri $0x0,(%rdi),%xmm0 ba:73 f4jaeb0 _ZL17search_line_sse42PKhS0_+0x20 bc:48 8d 04 0f lea(%rdi,%rcx,1),%rax c0:c3 retq which looks like an improvement to me. 2015-06-29 Uros Bizjak ubiz...@gmail.com * lex.c (search_line_sse42) [__GCC_ASM_FLAG_OUTPUTS__]: New main loop using asm flag outputs. Patch was bootstrapped and regression tested on x86_64-linux-gnu {,m32} (ivybridge), so both code paths were exercised. Since this is a new feature - does the approach look OK? Uros. Index: lex.c === --- lex.c (revision 225138) +++ lex.c (working copy) @@ -450,15 +450,30 @@ search_line_sse42 (const uchar *s, const uchar *en s = (const uchar *)((si + 16) -16); } - /* Main loop, processing 16 bytes at a time. By doing the whole loop - in inline assembly, we can make proper use of the flags set. */ - __asm ( sub $16, %1\n - .balign 16\n + /* Main loop, processing 16 bytes at a time. 
*/ +#ifdef __GCC_ASM_FLAG_OUTPUTS__ + while (1) +{ + char f; + __asm ("%vpcmpestri\t$0, %2, %3" +: "=c"(index), "=@ccc"(f) +: "m"(*s), "x"(search), "a"(4), "d"(16)); + if (f) + break; + + s += 16; +} +#else + s -= 16; + /* By doing the whole loop in inline assembly, + we can make proper use of the flags set. */ + __asm ( ".balign 16\n" "0: add $16, %1\n" - " %vpcmpestri $0, (%1), %2\n" + " %vpcmpestri\t$0, (%1), %2\n" " jnc 0b" : "=c"(index), "+r"(s) : "x"(search), "a"(4), "d"(16)); +#endif found: return s + index;
RE: [Patch, MIPS] Enable fp-contract on MIPS and update -mfused-madd
-Original Message- From: Matthew Fortune [mailto:matthew.fort...@imgtec.com] Sent: Monday, June 29, 2015 3:07 PM To: Maciej W. Rozycki; Steve Ellcey; Moore, Catherine; Richard Biener Cc: Richard Sandiford; Richard Sandiford; Myers, Joseph; gcc- patc...@gcc.gnu.org Subject: RE: [Patch, MIPS] Enable fp-contract on MIPS and update -mfused- madd Maciej W. Rozycki ma...@linux-mips.org writes: Richard, please have a look at my question below in a reference to your previous statement. On Thu, 18 Jun 2015, Steve Ellcey wrote: OK, I checked in the prequel patch and here is a new copy of the original patch based off of that (and with no HONOR_NAN checks in the fma/madd instructions). OK for checkin? Please see below for my notes. 2015-06-18 Steve Ellcey sell...@imgtec.com * config.gcc (mips*-*-*): Add fused-madd.opt. Please use angle brackets as per https://www.gnu.org/prep/standards/html_node/Indicating-the-Part- Chan ged.html, i.e.: * config.gcc mips*-*-*: Add fused-madd.opt. There's no function or similar entity involved here and `mips*-*-*' is a case value like with the C language's `switch' statement where you'd use angle brackets too to refer to individual cases. (*nmsub4mode_fastmath) Update condition. Extraneous space here. The change looks good to me otherwise. Maciej If nobody can identify any further functional issues with this patch then I'd like to get this committed and pursue enhancements as a second round. Catherine, would you be happy for this to be committed on that basis? Yes, this is fine with me.
RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
Hi, -Original Message- From: pins...@gmail.com [mailto:pins...@gmail.com] Sent: Monday, June 29, 2015 10:23 PM To: Dr. Philipp Tomsich Cc: James Greenhalgh; Kumar, Venkataramanan; Benedikt Huber; gcc- patc...@gcc.gnu.org; Marcus Shawcroft; Ramana Radhakrishnan; Richard Earnshaw Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math On Jun 29, 2015, at 4:44 AM, Dr. Philipp Tomsich philipp.toms...@theobroma-systems.com wrote: James, On 29 Jun 2015, at 13:36, James Greenhalgh james.greenha...@arm.com wrote: On Mon, Jun 29, 2015 at 10:18:23AM +0100, Kumar, Venkataramanan wrote: -Original Message- From: Dr. Philipp Tomsich [mailto:philipp.toms...@theobroma-systems.com] Sent: Monday, June 29, 2015 2:17 PM To: Kumar, Venkataramanan Cc: pins...@gmail.com; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Kumar, This does not come unexpected, as the initial estimation and each iteration will add an architecturally-defined number of bits of precision (ARMv8 guarantuees only a minimum number of bits provided per operation… the exact number is specific to each micro-arch, though). Depending on your architecture and on the required number of precise bits by any given benchmark, one may see miscompares. True. I would be very uncomfortable with this approach. Same here. The default must be safe. Always. Unlike other architectures, we don’t have a problem with making the proper defaults for “safety”, as the ARMv8 ISA guarantees a minimum number of precise bits per iteration. From Richard Biener's post in the thread Michael Matz linked earlier in the thread: It would follow existing practice of things we allow in -funsafe-math-optimizations. Existing practice in that we want to allow -ffast-math use with common benchmarks we care about. 
https://gcc.gnu.org/ml/gcc-patches/2009-11/msg00100.html With the solution you seem to be converging on (2-steps for some microarchitectures, 3 for others), a binary generated for one micro-arch may drop below a minimum guarantee of precision when run on another. This seems to go against the spirit of the practice above. I would only support adding this optimization to -Ofast if we could keep to architectural guarantees of precision in the generated code (i.e. 3-steps everywhere). I don't object to adding a -mlow-precision-recip-sqrt style option, which would be off by default, would enable the 2-step mode, and would need to be explicitly enabled (i.e. not implied by -mcpu=foo) but I don't see what this buys you beyond the Gromacs boost (and even there you would be creating an Invalid Run as optimization flags must be applied across all workloads). Any flag that reduces precision (and thus breaks IEEE floating-point semantics) needs to be gated with an “unsafe” flag (i.e. one that is never on by default). As a consequence, the “peak”-tuning for SPEC will turn this on… but barely anyone else would. For the 3-step optimization, it is clear to me that for generic tuning we don't want this to be enabled by default experimental results and advice in this thread argues against it for thunderx and cortex- a57 targets. However, enabling it based on the CPU tuning selected seems fine to me. I do not agree on this one, as I would like to see the safe form (i.e. 3 and 5 iterations respectively) to become the default. Most “server-type” chips should not see a performance regression, while it will be easier to optimise for this in hardware than for a (potentially microcoded) sqrt-instruction (and subsequent, dependent divide). I have not heard anyone claim a performance regression (either on thunderx or on cortex-a57), but merely heard a “no speed-up”. 
Actually it does regress performance on thunderX; I just assumed that when I said "not going to be a win" it was taken as a slowdown. It regresses gromacs by more than 10% on thunderX, but I can't remember by how much as I had someone else run it. The latency difference is also over 40%; for example, single precision: 29 cycles with div (12) and sqrt (17) directly vs 42 cycles with the rsqrte and 2 iterations of 2mul/rsqrts (double is 53 vs 60). That is a huge difference right there. ThunderX has a fast div and a fast sqrt for 32bit and a reasonable one for double. So again this is not just not a win but rather a regression for thunderX. I suspect the same is true for cortex-a57. Thanks, Andrew Yes, theoretically it should be true for the cortex-a57 case as well. But I believe hardware pipelining with instruction scheduling in the compiler helps a little for the gromacs case, ~3% to 4% with the original patch. I have not tested other FP benchmarks. As James said a flag -mlow-precision-recip-sqrt if allowed can be
Re: C++ PATCH for c++/66255 (ICE with non-type template parameter of typedef type)
On 06/27/2015 04:28 AM, Andreas Schwab wrote: * pt.c (check_unstripped_args): Mark parameter as unused. Thanks. Jason
Re: [RFA] Factor conversion out of COND_EXPR using match.pd pattern
On Mon, Jun 29, 2015 at 3:12 PM, Marc Glisse marc.gli...@inria.fr wrote: On Mon, 29 Jun 2015, Jeff Law wrote: That said - for this kind of patterns testcases that exercise the patterns on GIMPLE would be very appreciated. It may be the case that these patterns don't make a lot of sense on gimple and should be restricted to generic, at least with our current infrastructure. The problem is when we lower from generic to gimple, we end up with branchy code, not straight line code and there's no good way I see to write a match.pd pattern which encompasses flow control. Andrew was working on a way to generate phiopt code automatically from match.pd patterns involving cond_expr, I don't know what the status is. I stopped due to many different things. One was match.pd was too immature for conditional expressions. The other reason was because I had other things to do internally. I might pick up but it won't be for another three months. Thanks, Andrew or maybe use a for loop on comparisons, which would give names to TREE_OPERAND (@0, *). This should even handle the operand_equal_p alternative: (cond (cmp:c@0 @1 @2) (convert @1) INTEGER_CST@2) Hmm, looks like I wrote INTEGER_CST before the second occurence of @2 instead of the first, so it is probably ignored :-( Yes, that would be my reference. But won't this require pointer equivalence? Are INTEGER_CST nodes fully shared? What if @1 is something more complex than a _DECL node (remember, we're working with GENERIC). So something like (cond (cmp:c@0 @1 @2) (convert @3) INTEGER_CST@4)) And using operand_equal_p seems more appropriate to me (and is still better than the original (cond @0 ...) and grubbing around inside @0 to look at operands. I don't understand this comment. When you write @1 twice, genmatch emits a call to operand_equal_p to check that they are the same. -- Marc Glisse
Re: [C++ Patch] PR 65977
OK. Jason
Re: [PATCH] Move X - (X / Y) * Y folding to match.pd
On Mon, 29 Jun 2015, Marek Polacek wrote: On Mon, Jun 29, 2015 at 09:36:59AM +0200, Richard Biener wrote: Anything wrong with this? +/* X - (X / Y) * Y is the same as X % Y. */ +(simplify + (minus (convert? @0) (convert? (mult (trunc_div @0 @1) @1))) + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) + (convert (trunc_mod @0 @1 That looks awfully similar to a variant I also tried (but I remember having convert1? and convert2? in it). Not sure what was wrong with that one; certainly yours seems to work fine. Afterwards I thought of a limitation. Nothing bad, but it highlights a trap I regularly fall into: several @0 in the same pattern may have different types (for INTEGER_CST, operand_equal_p mostly ignores the type). So for an int x, 42L-42/x*x should fail to match, while using convert1? and convert2? should match. -- Marc Glisse
[C++/66443]
This patch fixes 66443, a defect introduced by a C++14 change concerning virtual bases of abstract classes. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66443 points at DR1611 and DR1658, which describe the issue. Essentially, an abstract class type can only exist as a base of some other (non-abstract) class. Any virtual bases of the abstract class will be constructed by the containing class itself. In this specific instance, a deleted default ctor of such a virtual base should not delete the abstract class's default ctor. Built and tested on x86_64-linux, OK? nathan 2015-06-29 Nathan Sidwell nat...@acm.org PR c++/66443 * init.c (emit_mem_initializers): Do not emit initializers for virtual bases of abstract classes. * method.c (synthesized_method_walk): Skip virtual bases of abstract classes in C++14 mode. PR c++/66443 * cpp1y/pr66443.C: New test. Index: cp/init.c === --- cp/init.c (revision 225103) +++ cp/init.c (working copy) @@ -1139,9 +1139,7 @@ emit_mem_initializers (tree mem_inits) } /* Initialize the base. */ - if (BINFO_VIRTUAL_P (subobject)) - construct_virtual_base (subobject, arguments); - else + if (!BINFO_VIRTUAL_P (subobject)) { tree base_addr; @@ -1155,6 +1153,8 @@ emit_mem_initializers (tree mem_inits) tf_warning_or_error); expand_cleanup_for_base (subobject, NULL_TREE); } + else if (!ABSTRACT_CLASS_TYPE_P (current_class_type)) + construct_virtual_base (subobject, arguments); } in_base_initializer = 0; Index: cp/method.c === --- cp/method.c (revision 225103) +++ cp/method.c (working copy) @@ -1505,7 +1505,10 @@ synthesized_method_walk (tree ctype, spe vbases = CLASSTYPE_VBASECLASSES (ctype); if (vec_safe_is_empty (vbases)) /* No virtual bases to worry about. */; - else if (!assign_p) + else if (!assign_p + /* C++14 ignores virtual bases of abstract classes, as they + can never be directly called. 
*/ + && (cxx_dialect < cxx14 || !ABSTRACT_CLASS_TYPE_P (ctype))) { if (constexpr_p) *constexpr_p = false; } Index: testsuite/g++.dg/cpp1y/pr66443.C === --- testsuite/g++.dg/cpp1y/pr66443.C (revision 0) +++ testsuite/g++.dg/cpp1y/pr66443.C (working copy) @@ -0,0 +1,22 @@ +// PR c++/66443 +// { dg-do compile { target c++14 } } + +class A { +public: +A( int ) { } +}; + +// B's virtual base is ignored for default ctor determination as B is +// abstract. DR1611 DR1658 + +class B: virtual public A { +public: +virtual void do_something() = 0; +}; + +class C: public B { +public: +C(): A( 1 ) { } +virtual void do_something() { } +}; +
Re: [PATCH] Use PIE_SPEC/NO_PIE_SPEC for crtend.o/crtendS.o
On Sun, 28 Jun 2015, H.J. Lu wrote: Here is a patch. OK for trunk? Thanks. I'd like to check in this patch. Any comments, objections? Thanks. H.J. From 50bebf531193c18efb0982ac119694aa9f650e44 Mon Sep 17 00:00:00 2001 From: H.J. Lu hjl.to...@gmail.com Date: Thu, 25 Jun 2015 03:04:56 -0700 Subject: [PATCH 1/2] Use PIE_SPEC/NO_PIE_SPEC for crtend.o/crtendS.o We need to link with crtend.o and crtendS.o properly for GCC configured to generate PIE by default. * config/gnu-user.h (GNU_USER_TARGET_ENDFILE_SPEC): Use PIE_SPEC and NO_PIE_SPEC if HAVE_LD_PIE is defined. OK. -- Joseph S. Myers jos...@codesourcery.com
Re: Four jit backports to gcc 5 branch
On 29 June 2015 at 16:44, David Malcolm dmalc...@redhat.com wrote: I've gone over the changes to the gcc/jit and gcc/testsuite/jit.dg directories in trunk since gcc 5 and backported the following 4 changes from trunk to the gcc-5-branch: gcc-5-branch's r225123: * trunk's r222863 (8120405bda9bf28b004e09b23e74eda57ae1, addition of test-benchmark.c): https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00500.html gcc-5-branch's r225125: * trunk's r224531 (8154a3514d5fc8067a6928531d5f61cd768bd62c along with trunk's r224535 (1828cd755cf5e4a34d5638f543a059f3562ad957, PR jit/66539: Add parentheses as needed to gcc_jit_object_get_debug_string): https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01158.html gcc-5-branch's r225127: * trunk's r224536 (3052eeefc4607a7147fdc55af6d86845890eb281, jit: Add a test for compound assignment): https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01164.html gcc-5-branch's r225129: * trunk's r224565 (6689f47f53079d76bbb051d3b5da9018c2e0161a, jit: Add missing type-checking to gcc_jit_{l|r}value_access_field) https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01211.html Hi Dave - is this what will go into 5.2? If so, is the option to disable unreachable block validation going to be available? Thanks and Regards Dibyendu
Re: [RFA] Factor conversion out of COND_EXPR using match.pd pattern
On Mon, 29 Jun 2015, Jeff Law wrote: That said - for this kind of patterns testcases that exercise the patterns on GIMPLE would be very appreciated. It may be the case that these patterns don't make a lot of sense on gimple and should be restricted to generic, at least with our current infrastructure. The problem is when we lower from generic to gimple, we end up with branchy code, not straight line code and there's no good way I see to write a match.pd pattern which encompasses flow control. Andrew was working on a way to generate phiopt code automatically from match.pd patterns involving cond_expr, I don't know what the status is. or maybe use a for loop on comparisons, which would give names to TREE_OPERAND (@0, *). This should even handle the operand_equal_p alternative: (cond (cmp:c@0 @1 @2) (convert @1) INTEGER_CST@2) Hmm, looks like I wrote INTEGER_CST before the second occurence of @2 instead of the first, so it is probably ignored :-( Yes, that would be my reference. But won't this require pointer equivalence? Are INTEGER_CST nodes fully shared? What if @1 is something more complex than a _DECL node (remember, we're working with GENERIC). So something like (cond (cmp:c@0 @1 @2) (convert @3) INTEGER_CST@4)) And using operand_equal_p seems more appropriate to me (and is still better than the original (cond @0 ...) and grubbing around inside @0 to look at operands. I don't understand this comment. When you write @1 twice, genmatch emits a call to operand_equal_p to check that they are the same. -- Marc Glisse
Re: [patch] PR debug/66653: avoid late_global_decl on decl_type_context()s
On 06/29/2015 05:07 AM, Richard Biener wrote: On Fri, Jun 26, 2015 at 11:59 PM, Jason Merrill ja...@redhat.com wrote: On 06/26/2015 05:37 AM, Richard Biener wrote: Can we defer TLS model setting to template instantiation? We need to represent somehow that __thread (or thread_local) was used in the declaration, but DECL_THREAD_LOCAL_P was changed to refer to the TLS model. Ok, so easiest would be to allocate a bit from decl_with_vis for this... Or I can find a flag in the front end. I guess I'll do that. Jason
[PATCH] Discard Scops for which entry==exit
In this patch we discard the scops where entry and exit are the same BB. This is an effort to remove graphite-scop-detection.c:limit_scops. Removing the limit_scops function introduces correctness regressions. We are making relevant changes in incremental steps to fix those bugs, and finally we intend to remove limit_scops. 2015-06-29 Aditya Kumar aditya...@samsung.com Sebastian Pop s@samsung.com * graphite-scop-detection.c (build_scops_1): Discard scops for which entry==exit --- gcc/graphite-scop-detection.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index e8ddecd..f57cc4a 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -810,7 +810,14 @@ build_scops_1 (basic_block current, loop_p outermost_loop, { open_scop.exit = sinfo.exit; gcc_assert (open_scop.exit); - scops->safe_push (open_scop); + if (open_scop.entry != open_scop.exit) + scops->safe_push (open_scop); + else + { + sinfo.difficult = true; + sinfo.exits = false; + sinfo.exit = NULL; + } } result.exit = sinfo.exit; -- 2.1.0.243.g30d45f7
Re: Thinking about libgccjit SONAME bump for gcc 5.2 (was Re: Four jit backports to gcc 5 branch)
On 29 June 2015 at 22:26, David Malcolm dmalc...@redhat.com wrote: I'm working on that, I know that it's important to Ravi (the deadline for gcc-5.2 is Friday; 2015-07-03). Thanks! There's no good way to track enums in binary metadata, so I'm currently thinking of this approach for the new options: extern void gcc_jit_context_set_bool_allow_unreachable_blocks (gcc_jit_context *ctxt, int bool_value); tagging it within the linker mapfile as being a new API. I'm looking at ways to manage libgccjit API/ABI as per this thread: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01982.html by transitioning to using symbol versioning, so that the linker can tag subsets of libgccjit symbols in both libgccjit and in client binaries. Adding symbol versioning will involve a SONAME bump (from libgccjit.so.0 to libgccjit.so.1), thus requiring anyone using libgccjit to recompile for this one change (the transition to versioned symbols, that is); hopefully we never need to bump the SONAME again. Given that this is needed by Ravi I feel that this break is worth backporting to gcc 5.2 (I'd rather do this one-time ABI break for the benefit of specific users, than not do it out of concern for hypothetical ones). Hope this sounds sane. If you are just adding some API then does it need to be versioned? Apologies if I am misunderstanding the reason here. A breaking change would be if you changed an existing API. Regards Dibyendu
[PATCH] Graphite cannot handle return stmt
No regressions. 2015-06-29 Aditya Kumar aditya...@samsung.com Sebastian Pop s@samsung.com * graphite-scop-detection.c (stmt_simple_for_scop_p): Bail out in case of a return statement. --- gcc/graphite-scop-detection.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index e8ddecd..a10702e 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -365,6 +365,8 @@ stmt_simple_for_scop_p (basic_block scop_entry, loop_p outermost_loop, switch (gimple_code (stmt)) { case GIMPLE_RETURN: + return false; + case GIMPLE_LABEL: return true; -- 2.1.0.243.g30d45f7
Re: Thinking about libgccjit SONAME bump for gcc 5.2 (was Re: Four jit backports to gcc 5 branch)
On Mon, Jun 29, 2015 at 5:26 PM, David Malcolm dmalc...@redhat.com wrote: I'm looking at ways to manage libgccjit API/ABI as per this thread: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01982.html by transitioning to using symbol versioning, so that the linker can tag subsets of libgccjit symbols in both libgccjit and in client binaries. You don't have to bump the SONAME to introduce symbol versioning. glibc in the beginning didn't have symbol versioning and we wrote the linker and dynamic linker support so that no SONAME change was necessary. The idea is that unversioned symbols are satisfied by the oldest symbol version.
[C++ Patch] PR 65977
Hi, given in particular Richard Smith's analysis here: http://stackoverflow.com/questions/29871138/constexpr-is-not-allowed-in-declaration-of-friend-template-specialization/29957648#29957648 I think it's pretty straightforward that we don't have a good reason to reject constexpr friend declarations. In practice the most recent compilers I have at hand all agree. Tested x86_64-linux. Thanks, Paolo. /// /cp 2015-06-29 Paolo Carlini paolo.carl...@oracle.com PR c++/65977 * decl.c (grokfndecl): Allow constexpr declarations of friend template specializations. /testsuite 2015-06-29 Paolo Carlini paolo.carl...@oracle.com PR c++/65977 * g++.dg/cpp0x/constexpr-friend-3.C: New. * g++.dg/cpp0x/constexpr-friend-2.C: Adjust. Index: cp/decl.c === --- cp/decl.c (revision 225123) +++ cp/decl.c (working copy) @@ -7738,15 +7738,12 @@ grokfndecl (tree ctype, } if (inlinep & 1) - error ("%<inline%> is not allowed in declaration of friend " - "template specialization %qD", - decl); - if (inlinep & 2) - error ("%<constexpr%> is not allowed in declaration of friend " - "template specialization %qD", - decl); - if (inlinep) - return NULL_TREE; + { + error ("%<inline%> is not allowed in declaration of friend " +"template specialization %qD", +decl); + return NULL_TREE; + } } } Index: testsuite/g++.dg/cpp0x/constexpr-friend-2.C === --- testsuite/g++.dg/cpp0x/constexpr-friend-2.C (revision 225123) +++ testsuite/g++.dg/cpp0x/constexpr-friend-2.C (working copy) @@ -3,5 +3,5 @@ template<typename T> void f(T); template <class T> class A { - friend constexpr void f(int); // { dg-error "'constexpr' is not allowed" } + friend constexpr void f(int); }; Index: testsuite/g++.dg/cpp0x/constexpr-friend-3.C === --- testsuite/g++.dg/cpp0x/constexpr-friend-3.C (revision 0) +++ testsuite/g++.dg/cpp0x/constexpr-friend-3.C (working copy) @@ -0,0 +1,21 @@ +// PR c++/65977 +// { dg-do compile { target c++11 } } + +template<__SIZE_TYPE__> +class bitset; + +template<__SIZE_TYPE__ N> +constexpr bool operator==(const bitset<N>&, const bitset<N>&) noexcept; 
+ +template<__SIZE_TYPE__ N> +class bitset +{ + friend constexpr bool operator==<> (const bitset<N>&, + const bitset<N>&) noexcept; +}; + +template<__SIZE_TYPE__ N> +constexpr bool operator==(const bitset<N>&, const bitset<N>&) noexcept +{ + return true; +}
Thinking about libgccjit SONAME bump for gcc 5.2 (was Re: Four jit backports to gcc 5 branch)
On Mon, 2015-06-29 at 21:57 +0100, Dibyendu Majumdar wrote: On 29 June 2015 at 16:44, David Malcolm dmalc...@redhat.com wrote: I've gone over the changes to the gcc/jit and gcc/testsuite/jit.dg directories in trunk since gcc 5 and backported the following 4 changes from trunk to the gcc-5-branch: gcc-5-branch's r225123: * trunk's r222863 (8120405bda9bf28b004e09b23e74eda57ae1, addition of test-benchmark.c): https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00500.html gcc-5-branch's r225125: * trunk's r224531 (8154a3514d5fc8067a6928531d5f61cd768bd62c along with trunk's r224535 (1828cd755cf5e4a34d5638f543a059f3562ad957, PR jit/66539: Add parentheses as needed to gcc_jit_object_get_debug_string): https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01158.html gcc-5-branch's r225127: * trunk's r224536 (3052eeefc4607a7147fdc55af6d86845890eb281, jit: Add a test for compound assignment): https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01164.html gcc-5-branch's r225129: * trunk's r224565 (6689f47f53079d76bbb051d3b5da9018c2e0161a, jit: Add missing type-checking to gcc_jit_{l|r}value_access_field) https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01211.html Hi Dave - is this what will go into 5.2? No, it's just the things that I've backported so far. If so, is the option to disable unreachable block validation going to be available? I'm working on that, I know that it's important to Ravi (the deadline for gcc-5.2 is Friday; 2015-07-03). There's no good way to track enums in binary metadata, so I'm currently thinking of this approach for the new options: extern void gcc_jit_context_set_bool_allow_unreachable_blocks (gcc_jit_context *ctxt, int bool_value); tagging it within the linker mapfile as being a new API. I'm looking at ways to manage libgccjit API/ABI as per this thread: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01982.html by transitioning to using symbol versioning, so that the linker can tag subsets of libgccjit symbols in both libgccjit and in client binaries. 
Adding symbol versioning will involve a SONAME bump (from libgccjit.so.0 to libgccjit.so.1), thus requiring anyone using libgccjit to recompile for this one change (the transition to versioned symbols, that is); hopefully we never need to bump the SONAME again. Given that this is needed by Ravi I feel that this break is worth backporting to gcc 5.2 (I'd rather do this one-time ABI break for the benefit of specific users, than not do it out of concern for hypothetical ones). Hope this sounds sane. As mentioned in that other thread, I'm wondering if linker experts could advise on the sanity of using a feature-based naming scheme for symbol map tags, rather than pure numbers? e.g. LIBGCCJIT_ABI_INITIAL LIBGCCJIT_ABI_WITH_SWITCH_STATEMENTS LIBGCCJIT_ABI_WITH_BOOL_OPTION_ERROR_ON_UNREACHABLE_BLOCKS etc (how long can they be, and is there a cost? can they include lower case?) Dave
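For readers following along, a feature-named version script in the style Dave proposes might look like the sketch below. This is illustrative only: apart from gcc_jit_context_set_bool_allow_unreachable_blocks, the symbol names and the node grouping are invented, not the actual libgccjit ABI history.

```
/* Hypothetical libgccjit.map -- a sketch of feature-named version nodes. */
LIBGCCJIT_ABI_INITIAL {
  global:
    gcc_jit_context_acquire;
    gcc_jit_context_release;
    /* ... every symbol exported before versioning was introduced ... */
  local: *;
};

/* Each new API entry point gets its own feature-named node, chained to
   the node it builds on via the trailing node name.  */
LIBGCCJIT_ABI_WITH_BOOL_OPTION_ERROR_ON_UNREACHABLE_BLOCKS {
  global:
    gcc_jit_context_set_bool_allow_unreachable_blocks;
} LIBGCCJIT_ABI_INITIAL;
```

As far as I'm aware, GNU ld version node names are ordinary strings (lower case included), so feature-based names should be mechanically fine; whether long names carry a measurable size cost in .gnu.version_d is exactly the kind of thing the linker experts would need to confirm.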
Re: [PATCH] Move X - (X / Y) * Y folding to match.pd
On Sat, 27 Jun 2015, Marc Glisse wrote: On Fri, 26 Jun 2015, Marek Polacek wrote: This is an attempt to move one pattern from fold-const.c to match.pd. It ought to be 1:1, but is not; e.g. with this patch we won't fold int f (int a, int b) { return a - (unsigned) ((a / b) * b); } anymore, but we're able to fold int ff (int a, unsigned int b) { return a - ((a / b) * b); } and fold-const.c is not. Interesting. I played around with converts, but didn't find anything that would work well. Any suggestions how to make this pattern better? Anything wrong with this? +/* X - (X / Y) * Y is the same as X % Y. */ +(simplify + (minus (convert? @0) (convert? (mult (trunc_div @0 @1) @1))) + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) + (convert (trunc_mod @0 @1)))) Yes. Eventually even (convert? (mult (convert1? (trunc_div ...)? Of course with matching @0 between the two operands of the minus you constrain types quite a bit. I'd say just single-step through fold and see what types it gets presented with when folding a - (unsigned) ((a / b) * b). Thanks, Richard. (the other div/mod pairs could benefit from the same transformation as long as there are no conversions, but the conversion seems easier to handle with trunc_) diff --git gcc/match.pd gcc/match.pd index b2f8429..2bc158b 100644 --- gcc/match.pd +++ gcc/match.pd @@ -238,6 +238,12 @@ along with GCC; see the file COPYING3. If not see tree_nop_conversion_p (type, TREE_TYPE (@1))) (trunc_mod @0 (convert @1)))) +/* X - (X / Y) * Y is the same as X % Y. */ +(simplify + (minus @0 (mult (trunc_div @0 @1) @1)) + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) + (trunc_mod @0 @1))) + /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR, i.e. X % C into X & (C - 1), if X and C are positive. Also optimize A % (C << N) where C is a power of 2, Marek -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)
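For reference, the identity behind the pattern is easy to sanity-check in plain C (a standalone snippet, not part of the patch): with C's truncating division, x == (x/y)*y + x%y holds for any nonzero y, so the subtraction below is exactly x % y, including for negative operands.

```c
#include <assert.h>

/* X - (X / Y) * Y computed by hand; C99's truncating division makes
   this exactly X % Y for every nonzero Y.  */
static int sub_div_mul (int x, int y)
{
  return x - (x / y) * y;
}
```

The convert? issue in the thread is precisely about when this still holds after the multiply has been computed in a different (but same-width) type.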
RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
Hmm, Reducing the iterations to 1 step for float and 2 steps for double I got VE (miscompares) on the following benchmarks: 416.gamess 453.povray 454.calculix 459.GemsFDTD Benedikt, I have an ICE for 444.namd with your patch; not sure if something is wrong in my local tree. Regards, Venkat. -Original Message- From: pins...@gmail.com [mailto:pins...@gmail.com] Sent: Sunday, June 28, 2015 8:35 PM To: Kumar, Venkataramanan Cc: Dr. Philipp Tomsich; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math On Jun 25, 2015, at 9:44 AM, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote: I got around ~12% gain with -Ofast -mcpu=cortex-a57. I get around 11/12% on thunderX with the patch and the decreased-iterations change (1/2) compared to without the patch. Thanks, Andrew Regards, Venkat. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Dr. Philipp Tomsich Sent: Thursday, June 25, 2015 9:13 PM To: Kumar, Venkataramanan Cc: Benedikt Huber; pins...@gmail.com; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Kumar, what is the relative gain that you see on Cortex-A57? Thanks, Philipp. On 25 Jun 2015, at 17:35, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote: Changing to 1 step for float and 2 steps for double gives better gains now for gromacs on cortex-a57. Regards, Venkat.
-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Benedikt Huber Sent: Thursday, June 25, 2015 4:09 PM To: pins...@gmail.com Cc: gcc-patches@gcc.gnu.org; philipp.tomsich@theobroma-systems.com Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Andrew, This is NOT a win on thunderX at least for single precision because you have to do the divide and sqrt in the same time as it takes 5 multiplies (estimate and step are multiplies in the thunderX pipeline). Doubles is 10 multiplies, which is just the same as what the patch does (but it is really slightly less than 10, I rounded up). So in the end this is NOT a win at all for thunderX unless we do one less step for both single and double. Yes, the expected benefit from rsqrt estimation is implementation specific. If one has a better initial rsqrte or an application that can trade precision for execution time, we could offer a command line option to do only 2 steps for double and 1 step for float; similar to -mrecip-precision for PowerPC. What are your thoughts on that? Best regards, Benedikt
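For readers unfamiliar with the trade-off being discussed: the patch replaces 1/sqrt(x) with a hardware estimate refined by Newton-Raphson steps, and each step roughly squares the relative error. The scalar sketch below models the idea only; the table-driven FRSQRTE seed is stood in for by the classic bit-trick seed, so the exact precision numbers do not match any real micro-arch.

```c
#include <stdint.h>
#include <math.h>
#include <assert.h>

/* Low-precision seed standing in for FRSQRTE (the classic bit trick;
   real hardware uses a lookup table with implementation-defined
   accuracy).  */
static float rsqrt_seed (float x)
{
  union { float f; uint32_t u; } v = { x };
  v.u = 0x5f3759dfu - (v.u >> 1);
  return v.f;
}

/* One Newton-Raphson refinement, y' = y * (3 - x*y*y) / 2 -- the form
   the AArch64 sequence builds out of FRSQRTS plus multiplies.  Each
   step roughly squares the relative error.  */
static float rsqrt_step (float x, float y)
{
  return y * (3.0f - x * y * y) * 0.5f;
}

static float rsqrt_est (float x, int nsteps)
{
  float y = rsqrt_seed (x);
  for (int i = 0; i < nsteps; i++)
    y = rsqrt_step (x, y);
  return y;
}
```

This is why dropping from 3/2 steps to 2/1 (double/float) is a precision-for-speed trade: some benchmarks tolerate the lost low bits and others (gamess, povray, calculix, GemsFDTD above) miscompare.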
Re: [PATCH][5/n] Remove GENERIC stmt combining from SCCVN
On Sat, 27 Jun 2015, Marc Glisse wrote: On Fri, 26 Jun 2015, Richard Biener wrote: + /* Equality compare simplifications from fold_binary */ + (for cmp (eq ne) + + /* If we have (A | C) == D where C & ~D != 0, convert this into 0. + Similarly for NE_EXPR. */ + (simplify + (cmp (convert?@3 (bit_ior @0 INTEGER_CST@1)) INTEGER_CST@2) + (if (tree_nop_conversion_p (TREE_TYPE (@3), TREE_TYPE (@0)) + && wi::bit_and_not (@1, @2) != 0) +{ constant_boolean_node (cmp == NE_EXPR, type); })) + + /* (X ^ Y) == 0 becomes X == Y, and (X ^ Y) != 0 becomes X != Y. */ + (simplify + (cmp (bit_xor @0 @1) integer_zerop) + (cmp @0 @1)) + + /* (X ^ Y) == Y becomes X == 0. + Likewise (X ^ Y) == X becomes Y == 0. */ + (simplify + (cmp (bit_xor:c @0 @1) @0) Don't you need cmp:c for this one? The transformation still somehow happens through forward_propagate_into_comparison, but it looks like an accident. Yeah, I think it canonicalizes operand order (and uses the GENERIC folding path). I'm testing a patch changing the above to cmp:c. + (cmp @1 { build_zero_cst (TREE_TYPE (@1)); })) + + /* (X ^ C1) op C2 can be rewritten as X op (C1 ^ C2). */ + (simplify + (cmp (convert?@3 (bit_xor @0 INTEGER_CST@1)) INTEGER_CST@2) + (if (tree_nop_conversion_p (TREE_TYPE (@3), TREE_TYPE (@0))) +(cmp @0 (bit_xor @1 (convert @2))))) I guess we'll have to generalize this to vectors at some point... True - I'm trying to move patterns 1:1 leaving improvements to followups Thanks, Richard.
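The individual rules being moved are easy to check on scalars (an illustrative snippet; the helper names are invented, and each helper's body is the unsimplified form the pattern rewrites):

```c
#include <stdint.h>
#include <assert.h>

/* (X ^ Y) == 0  simplifies to  X == Y.  */
static int xor_eq_zero (uint32_t x, uint32_t y) { return (x ^ y) == 0; }

/* (X ^ Y) == Y  simplifies to  X == 0.  */
static int xor_eq_opnd (uint32_t x, uint32_t y) { return (x ^ y) == y; }

/* (X ^ C1) == C2  simplifies to  X == (C1 ^ C2).  */
static int xor_eq_cst (uint32_t x, uint32_t c1, uint32_t c2)
{ return (x ^ c1) == c2; }

/* (X | C) == D is identically false when C & ~D != 0: OR can never
   clear the bits that C sets but D lacks.  */
static int ior_eq_cst (uint32_t x, uint32_t c, uint32_t d)
{ return (x | c) == d; }
```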
[Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
Hi, I have added new STRICT_MAX_EXPR and STRICT_MIN_EXPR expressions to support the IEEE versions of fmin and fmax. This is done by recognising the math library fmax and fmin builtin functions in a similar way to how this is done for -ffast-math. This also allows us to vectorise the IEEE max/min functions for targets that support it, for example aarch64/aarch32. Tested: x86_64-linux: no regressions aarch64-none-elf: no regressions aarch64_be-none-elf: no regressions arm-none-eabi: no regressions ChangeLog: 2015-06-26 David Sherwood david.sherw...@arm.com gcc/ * builtins.c (integer_valued_real_p): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR. (fold_builtin_fmin_fmax): For strict math, convert builtin fmin and fmax to STRICT_MIN_EXPR and STRICT_MAX_EXPR, respectively. * expr.c (expand_expr_real_2): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR. * fold-const.c (const_binop): Likewise. (fold_binary_loc, tree_binary_nonnegative_warnv_p): Likewise. (tree_binary_nonzero_warnv_p): Likewise. * optabs.h (strict_minmax_support): Declare. * optabs.def: Add new optabs strict_max_optab/strict_min_optab. * optabs.c (optab_for_tree_code): Return new optabs for STRICT_MIN_EXPR and STRICT_MAX_EXPR. (strict_minmax_support): New function. * real.c (real_arithmetic): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR. * tree.def: Likewise. * tree.c (associative_tree_code, commutative_tree_code): Likewise. * tree-cfg.c (verify_expr): Likewise. (verify_gimple_assign_binary): Likewise. * tree-inline.c (estimate_operator_cost): Likewise. * tree-pretty-print.c (dump_generic_node, op_code_prio): Likewise. (op_symbol_code): Likewise. * config/aarch64/aarch64.md: New pattern. * config/aarch64/aarch64-simd.md: Likewise. * config/aarch64/iterators.md: New unspecs, iterators. * config/arm/iterators.md: New iterators. * config/arm/unspecs.md: New unspecs. * config/arm/neon.md: New pattern. * config/arm/vfp.md: Likewise. gcc/testsuite * gcc.target/aarch64/maxmin_strict.c: New test.
* gcc.target/arm/maxmin_strict.c: New test. * lib/target-supports.exp (check_effective_target_arm_v8_vfp_neon_ok): New check. (add_options_for_arm_v8_vfp_neon): New options. Regards, David Sherwood.
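For context on why a separate STRICT_MIN_EXPR/STRICT_MAX_EXPR is needed rather than reusing MIN_EXPR/MAX_EXPR: IEEE-conformant fmin/fmax return the non-NaN operand when exactly one operand is a NaN, which the bare compare-and-select that -ffast-math emits does not honour. A small illustration (naive_max is my sketch of the fast-math lowering, not code from the patch):

```c
#include <math.h>
#include <assert.h>

/* What a fast-math MAX_EXPR amounts to: a bare compare-and-select.
   Any comparison involving a NaN is false, so a NaN second operand
   leaks through unchanged.  */
static double naive_max (double a, double b)
{
  return a > b ? a : b;
}
```

IEEE fmax (NAN, 1.0) and fmax (1.0, NAN) both yield 1.0, while naive_max (1.0, NAN) yields the NaN; signed zeros are a similar hazard. Hence the distinct tree codes and optabs.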
Re: [PATCH][4/n] Remove GENERIC stmt combining from SCCVN
On Fri, 26 Jun 2015, Jeff Law wrote: On 06/26/2015 03:24 AM, Richard Biener wrote: On Thu, 25 Jun 2015, Richard Biener wrote: This moves fold_sign_changed_comparison. Shows up in gcc.dg/pr55833.c I'll eventually massage it according to Jakub's suggestion to do a #ifndef HAVE_canonicalize_funcptr_for_compare #define HAVE_canonicalize_funcptr_for_compare 0 #endif somewhere (defaults.h should work I guess). Bootstrap and regtest running on x86_64-unknown-linux-gnu. This runs into Running target unix//-m32 FAIL: g++.dg/tree-ssa/calloc.C -std=gnu++11 scan-tree-dump-not optimized malloc where we now optimize n_5 = (size_t) n_4(D); ... bb 5: - if (n_5 != 0) + if (n_4(D) != 0) but both VRP and DOM fail to record equivalences for n_5 from the updated condition (I have a patch to fix VRP but not DOM). So you want an equivalence recorded for n_5, even though it no longer appears in the conditional (but presumably has other uses)? DOM has some code for this already, but it's kind-of backwards from what you're trying to do in terms of when it's used. That code might be factorable and usable elsewhere in DOM. Not sure, I came across something like that. In principle the following works for the testcase (even w/o fixing the VRP part). Index: gcc/tree-ssa-dom.c === --- gcc/tree-ssa-dom.c (revision 225007) +++ gcc/tree-ssa-dom.c (working copy) @@ -1409,6 +1409,14 @@ simplify_stmt_for_jump_threading (gimple return lookup_avail_expr (stmt, false); } +static tree +dom_valueize (tree t) +{ + if (TREE_CODE (t) == SSA_NAME) +return SSA_NAME_VALUE (t); + return t; +} + /* Record into the equivalence tables any equivalences implied by traversing edge E (which are cached in E->aux). @@ -1429,7 +1437,33 @@ record_temporary_equivalences (edge e) /* If we have a simple NAME = VALUE equivalence, record it.
*/ if (lhs && TREE_CODE (lhs) == SSA_NAME) - const_and_copies->record_const_or_copy (lhs, rhs); + { + gimple use_stmt; + imm_use_iterator iter; + const_and_copies->record_const_or_copy (lhs, rhs); + FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs) + { + /* Only bother to record more equivalences for lhs that +can be directly used by e->dest. +??? If the code gets re-organized to a worklist to +catch more indirect opportunities and it is made to +handle PHIs then this should only consider use_stmts +in basic-blocks we have already visited. */ + if (!dominated_by_p (CDI_DOMINATORS, + e->dest, gimple_bb (use_stmt))) + continue; + tree lhs = gimple_get_lhs (use_stmt); + if (lhs && TREE_CODE (lhs) == SSA_NAME) + { + tree res = gimple_fold_stmt_to_constant_1 (use_stmt, +dom_valueize, + no_follow_ssa_edges); + if (TREE_CODE (res) == SSA_NAME + || is_gimple_min_invariant (res)) + const_and_copies->record_const_or_copy (lhs, res); + } + } + } /* If we have 0 = COND or 1 = COND equivalences, record them into our expression hash tables. */ it's not using DOM's own stmt visiting machinery as that always modifies stmts in-place. As stated in the comment it doesn't catch secondary opportunities. That would be possible by using a work-list seeded by LHS we recorded new const/copies for and re-visiting their uses. You can get extra fancy here by properly handling PHIs and conditionals. But it's a question of cost here, of course. Note that I think this isn't really backward propagation but just context sensitive value-numbering. Like VRP restricts what it inserts asserts for, stmt re-visiting could also be constrained on availability of uses dominated by the edge providing the original equivalency. So in some sense the above is a hack (similar to all the special-cases VRP has for similar situations). Under what constraints do you think something like the above is ok to put on trunk? Thanks, Richard. I think we're simply missing a pass that can properly deal with this kind of stuff.
For example DOM could re-visit stmts which have uses of SSA names we added temporary equivalences for (ones that dominate the current block, others we'll (re-)visit anyway). That would fix this testcase but to catch secondary effects you'd need to re-visit uses of the defs of the revisited stmts as well (if anything interesting was produced, of course). This problem feels a bit like it's better handled by an optimizer independent of DOM. Probably a backwards IL walking context
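A reduced C shape of the calloc.C situation under discussion (names invented, purely illustrative): after the fold-sign-changed-comparison change the condition tests the narrow original while the allocation uses the widened copy, so the malloc+memset pair is only recognizable as calloc if the optimizers record the copy's nonzero-ness along the true edge.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Illustrative reduction, not the actual testcase.  'len' is a
   sign-change copy of 'n'; the folded condition now tests 'n', so
   inside the branch the calloc recognizer must learn len != 0 from
   the n != 0 guard via the recorded equivalence.  */
void *alloc_zeroed (unsigned n)
{
  size_t len = (size_t) n;          /* len is equivalent to n here */
  if (n != 0)                       /* condition tests n, not len */
    {
      void *p = malloc (len * sizeof (int));
      if (p)
        memset (p, 0, len * sizeof (int));  /* foldable to calloc */
      return p;
    }
  return NULL;
}
```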
[PATCH] Bump LTO_major_version on trunk
Was still the same as on the GCC 5 branch. Committed as obvious. Richard. 2015-06-29 Richard Biener rguent...@suse.de * lto-streamer.h (LTO_major_version): Bump to 5. Index: gcc/lto-streamer.h === --- gcc/lto-streamer.h (revision 225112) +++ gcc/lto-streamer.h (working copy) @@ -130,7 +130,7 @@ along with GCC; see the file COPYING3. String are represented in the table as pairs, a length in ULEB128 form followed by the data for the string. */ -#define LTO_major_version 4 +#define LTO_major_version 5 #define LTO_minor_version 0 typedef unsigned char lto_decl_flags_t;
Re: Remove redundant AND from count reduction loop
On Sun, Jun 28, 2015 at 1:45 PM, Marc Glisse marc.gli...@inria.fr wrote: On Fri, 26 Jun 2015, Richard Biener wrote: OK. The reason I was being paranoid was that I couldn't see anywhere where we enforced that the vector condition in a VEC_COND had to have the same element width as the values being selected. We don't require that indeed. tree-cfg.c only checks that rhs2 and rhs3 are compatible with the result. There doesn't seem to be any checking of rhs1 vs. the other types. So I wasn't sure whether anything stopped us from, e.g., comparing two V4HIs and using the result to select between two V4SIs. Nothing does (or should). The documentation patch you approved in https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01109.html says something different. If it is really wrong, could you fix it? Hmm, that simplifies things. On the other hand, vectors of bools could be (I haven't thought about it much) nice to have, especially for avx512 (and at least one other arch, maybe sparc). It would be nice if these constraints would also be checked in the gimple verifier... This passed bootstrap+testsuite on powerpc64le-unknown-linux-gnu. Ok. Thanks, Richard. 2015-06-29 Marc Glisse marc.gli...@inria.fr * tree-cfg.c (verify_gimple_assign_ternary) <VEC_COND_EXPR>: Check the first argument. -- Marc Glisse Index: gcc/tree-cfg.c === --- gcc/tree-cfg.c (revision 225104) +++ gcc/tree-cfg.c (working copy) @@ -4001,8 +4001,22 @@ } break; +case VEC_COND_EXPR: + if (!VECTOR_INTEGER_TYPE_P (rhs1_type) + || TYPE_SIGN (rhs1_type) != SIGNED + || TYPE_SIZE (rhs1_type) != TYPE_SIZE (lhs_type) + || TYPE_VECTOR_SUBPARTS (rhs1_type) +!= TYPE_VECTOR_SUBPARTS (lhs_type)) + { + error ("the first argument of a VEC_COND_EXPR must be of a signed " +"integral vector type of the same size and number of " +"elements as the result"); + debug_generic_expr (lhs_type); + debug_generic_expr (rhs1_type); + return true; + } + /* Fallthrough.
*/ case COND_EXPR: -case VEC_COND_EXPR: if (!useless_type_conversion_p (lhs_type, rhs2_type) || !useless_type_conversion_p (lhs_type, rhs3_type)) {
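The constraint the new check enforces is what a vector comparison in GCC's generic vector extension already produces: a signed integral mask vector with the same element size and lane count as the compared operands. An illustrative sketch (not GCC internals code) of the mask and the selection a VEC_COND_EXPR performs:

```c
#include <assert.h>

typedef int   v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* A vector comparison yields a signed integral vector with the same
   element size and lane count as the operands; true lanes are all
   ones (-1), false lanes are 0 -- exactly what the verifier demands
   of a VEC_COND_EXPR condition.  */
static v4si mask_of (v4sf a, v4sf b)
{
  return a > b;
}

/* The selection spelled out with the mask: all-ones lanes take 'a',
   all-zeros lanes take 'b' (bit reinterpretation, not conversion).  */
static v4sf blend (v4si m, v4sf a, v4sf b)
{
  return (v4sf) (((v4si) a & m) | ((v4si) b & ~m));
}
```

The same-size requirement is what makes the bitwise blend well defined; a V4HI mask selecting between V4SIs has no such lane-for-lane correspondence.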
Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
Kumar, This is not unexpected, as the initial estimation and each iteration will add an architecturally-defined number of bits of precision (ARMv8 guarantees only a minimum number of bits provided per operation… the exact number is specific to each micro-arch, though). Depending on your architecture and on the required number of precise bits by any given benchmark, one may see miscompares. Do you know the exact number of bits that the initial estimate and the subsequent refinement steps add for your micro-arch? Thanks, Philipp. On 29 Jun 2015, at 10:17, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote: Hmm, Reducing the iterations to 1 step for float and 2 steps for double I got VE (miscompares) on the following benchmarks: 416.gamess 453.povray 454.calculix 459.GemsFDTD Benedikt, I have an ICE for 444.namd with your patch; not sure if something is wrong in my local tree. Regards, Venkat. -Original Message- From: pins...@gmail.com [mailto:pins...@gmail.com] Sent: Sunday, June 28, 2015 8:35 PM To: Kumar, Venkataramanan Cc: Dr. Philipp Tomsich; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math On Jun 25, 2015, at 9:44 AM, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote: I got around ~12% gain with -Ofast -mcpu=cortex-a57. I get around 11/12% on thunderX with the patch and the decreased-iterations change (1/2) compared to without the patch. Thanks, Andrew Regards, Venkat. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Dr. Philipp Tomsich Sent: Thursday, June 25, 2015 9:13 PM To: Kumar, Venkataramanan Cc: Benedikt Huber; pins...@gmail.com; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Kumar, what is the relative gain that you see on Cortex-A57? Thanks, Philipp.
On 25 Jun 2015, at 17:35, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote: Changing to 1 step for float and 2 steps for double gives better gains now for gromacs on cortex-a57. Regards, Venkat. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Benedikt Huber Sent: Thursday, June 25, 2015 4:09 PM To: pins...@gmail.com Cc: gcc-patches@gcc.gnu.org; philipp.tomsich@theobroma-systems.com Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Andrew, This is NOT a win on thunderX at least for single precision because you have to do the divide and sqrt in the same time as it takes 5 multiplies (estimate and step are multiplies in the thunderX pipeline). Doubles is 10 multiplies, which is just the same as what the patch does (but it is really slightly less than 10, I rounded up). So in the end this is NOT a win at all for thunderX unless we do one less step for both single and double. Yes, the expected benefit from rsqrt estimation is implementation specific. If one has a better initial rsqrte or an application that can trade precision for execution time, we could offer a command line option to do only 2 steps for double and 1 step for float; similar to -mrecip-precision for PowerPC. What are your thoughts on that? Best regards, Benedikt
Re: [patch] PR debug/66653: avoid late_global_decl on decl_type_context()s
On Fri, Jun 26, 2015 at 11:59 PM, Jason Merrill ja...@redhat.com wrote: On 06/26/2015 05:37 AM, Richard Biener wrote: Can we defer TLS model setting to template instantiation? We need to represent somehow that __thread (or thread_local) was used in the declaration, but DECL_THREAD_LOCAL_P was changed to refer to the TLS model. Ok, so easiest would be to allocate a bit from decl_with_vis for this... Richard. Jason
Re: [PATCH] Graphite cannot handle return stmt
On Mon, Jun 29, 2015 at 3:58 PM, Aditya Kumar hiradi...@msn.com wrote: No regressions. 2015-06-29 Aditya Kumar aditya...@samsung.com Sebastian Pop s@samsung.com * graphite-scop-detection.c (stmt_simple_for_scop_p): Bail out in case of a return statement. Looks good to me. Tobi, do you see a good reason not to cut scops at return stmts? Thanks, Sebastian --- gcc/graphite-scop-detection.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index e8ddecd..a10702e 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -365,6 +365,8 @@ stmt_simple_for_scop_p (basic_block scop_entry, loop_p outermost_loop, switch (gimple_code (stmt)) { case GIMPLE_RETURN: + return false; + case GIMPLE_LABEL: return true; -- 2.1.0.243.g30d45f7
[PATCH, rs6000] Add more overloaded built-ins for vec_cmpge and vec_cmple
Hi, This patch adds built-in functions for vec_cmpge and vec_cmple required by the ELFv2 ABI but not yet present in gcc. These make use of the existing patterns for gt:VEC_I and gtu:VEC_I, applying the not operator and reversing the order of operands as needed. The result is generation of a vcmpgt[su][bhwd] instruction followed by an xxlnor. Since it is likely that a vec_cmp[gl]e will be followed by a vec_sel, it will be important to optimize the xxlnor/xxsel* to remove the xxlnor by reversing the sense of the select. A separate patch will follow to add this pattern to simplify-rtx.c. Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Ok for trunk? Thanks, Bill [gcc] 2015-06-29 Bill Schmidt wschm...@linux.vnet.ibm.com * config/rs6000/rs6000-builtin.def (CMPGE_16QI): New built-in definition. (CMPGE_8HI): Likewise. (CMPGE_4SI): Likewise. (CMPGE_2DI): Likewise. (CMPGE_U16QI): Likewise. (CMPGE_U8HI): Likewise. (CMPGE_U4SI): Likewise. (CMPGE_U2DI): Likewise. (CMPLE_16QI): Likewise. (CMPLE_8HI): Likewise. (CMPLE_4SI): Likewise. (CMPLE_2DI): Likewise. (CMPLE_U16QI): Likewise. (CMPLE_U8HI): Likewise. (CMPLE_U4SI): Likewise. (CMPLE_U2DI): Likewise. * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add overloads for ALTIVEC_BUILTIN_VEC_CMPGE and ALTIVEC_BUILTIN_VEC_CMPLE. * config/rs6000/vector.md (vector_ge<mode>): Restrict to floating-point vector modes. (vector_nlt<mode>): New define_expand. (vector_nltu<mode>): Likewise. (vector_ngt<mode>): Likewise. (vector_ngtu<mode>): Likewise. [gcc/testsuite] 2015-06-29 Bill Schmidt wschm...@linux.vnet.ibm.com * gcc.target/powerpc/vec-cmp.c: New test.
Index: gcc/config/rs6000/rs6000-builtin.def === --- gcc/config/rs6000/rs6000-builtin.def(revision 224924) +++ gcc/config/rs6000/rs6000-builtin.def(working copy) @@ -1284,6 +1284,24 @@ BU_VSX_2 (XVCVUXDDP_SCALE,xvcvuxddp_scale, C BU_VSX_2 (XVCVDPSXDS_SCALE, xvcvdpsxds_scale, CONST, vsx_xvcvdpsxds_scale) BU_VSX_2 (XVCVDPUXDS_SCALE, xvcvdpuxds_scale, CONST, vsx_xvcvdpuxds_scale) +BU_VSX_2 (CMPGE_16QI, cmpge_16qi, CONST, vector_nltv16qi) +BU_VSX_2 (CMPGE_8HI, cmpge_8hi, CONST, vector_nltv8hi) +BU_VSX_2 (CMPGE_4SI, cmpge_4si, CONST, vector_nltv4si) +BU_VSX_2 (CMPGE_2DI, cmpge_2di, CONST, vector_nltv2di) +BU_VSX_2 (CMPGE_U16QI,cmpge_u16qi,CONST, vector_nltuv16qi) +BU_VSX_2 (CMPGE_U8HI, cmpge_u8hi, CONST, vector_nltuv8hi) +BU_VSX_2 (CMPGE_U4SI, cmpge_u4si, CONST, vector_nltuv4si) +BU_VSX_2 (CMPGE_U2DI, cmpge_u2di, CONST, vector_nltuv2di) + +BU_VSX_2 (CMPLE_16QI, cmple_16qi, CONST, vector_ngtv16qi) +BU_VSX_2 (CMPLE_8HI, cmple_8hi, CONST, vector_ngtv8hi) +BU_VSX_2 (CMPLE_4SI, cmple_4si, CONST, vector_ngtv4si) +BU_VSX_2 (CMPLE_2DI, cmple_2di, CONST, vector_ngtv2di) +BU_VSX_2 (CMPLE_U16QI,cmple_u16qi,CONST, vector_ngtuv16qi) +BU_VSX_2 (CMPLE_U8HI, cmple_u8hi, CONST, vector_ngtuv8hi) +BU_VSX_2 (CMPLE_U4SI, cmple_u4si, CONST, vector_ngtuv4si) +BU_VSX_2 (CMPLE_U2DI, cmple_u2di, CONST, vector_ngtuv2di) + /* VSX abs builtin functions. 
*/ BU_VSX_A (XVABSDP, xvabsdp,CONST, absv2df2) BU_VSX_A (XVNABSDP, xvnabsdp, CONST, vsx_nabsv2df2) Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c(revision 224924) +++ gcc/config/rs6000/rs6000-c.c(working copy) @@ -1096,6 +1096,26 @@ const struct altivec_builtin_types altivec_overloa RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 }, { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_XVCMPGEDP, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 }, + { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_16QI, +RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0}, + { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_U16QI, +RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, +RS6000_BTI_unsigned_V16QI, 0}, + { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_8HI, +RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0}, + { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_U8HI, +RS6000_BTI_bool_V8HI, RS6000_BTI_unsigned_V8HI, +RS6000_BTI_unsigned_V8HI, 0}, + { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_4SI, +RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0}, + { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_U4SI, +RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI, +RS6000_BTI_unsigned_V4SI, 0}, + { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_2DI, +
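The lane-wise identity the new vector_nlt/vector_ngt expanders rely on can be modelled on a single lane (an illustrative scalar model, not the rs6000 code): with all-ones/all-zeros comparison results, not(a < b) is exactly a >= b, which is what the vcmpgt-with-swapped-operands plus xxlnor sequence computes.

```c
#include <stdint.h>
#include <assert.h>

/* One lane of a vector "less than": all ones (-1) when true, 0 when
   false (models vcmpgt with the operands swapped).  */
static int32_t lane_lt (int32_t a, int32_t b)
{
  return a < b ? -1 : 0;
}

/* One lane of "greater than or equal" built as not(lt) -- the bitwise
   NOT models the trailing xxlnor.  */
static int32_t lane_ge_via_not_lt (int32_t a, int32_t b)
{
  return ~lane_lt (a, b);
}
```

The follow-up simplify-rtx.c patch Bill mentions would then fold not(mask) feeding a select into a select with the arms swapped, removing the xxlnor entirely.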
Re: [patch] fix regrename pass to ensure renamings produce valid insns
Hi all: This patch seems like it will be broken for c6x when assert checking is disabled (the gcc_assert arguments would then not be evaluated): Index: gcc/config/c6x/c6x.c === --- gcc/config/c6x/c6x.c (revision 225104) +++ gcc/config/c6x/c6x.c (working copy) @@ -3516,7 +3516,7 @@ try_rename_operands (rtx_insn *head, rtx best_reg = find_rename_reg (this_head, super_class, unavailable, old_reg, true); - regrename_do_replace (this_head, best_reg); + gcc_assert (regrename_do_replace (this_head, best_reg)); count_unit_reqs (new_reqs, head, PREV_INSN (tail)); merge_unit_reqs (new_reqs); @@ -3529,7 +3529,7 @@ try_rename_operands (rtx_insn *head, rtx unit_req_imbalance (reqs), unit_req_imbalance (new_reqs)); } if (unit_req_imbalance (new_reqs) > unit_req_imbalance (reqs)) -regrename_do_replace (this_head, old_reg); +gcc_assert (regrename_do_replace (this_head, old_reg)); else memcpy (reqs, new_reqs, sizeof (unit_req_table)); On Mon, Jun 29, 2015 at 5:08 AM, Sandra Loosemore san...@codesourcery.com wrote: On 06/24/2015 09:46 PM, Jeff Law wrote: On 06/23/2015 07:00 PM, Sandra Loosemore wrote: On 06/18/2015 11:32 AM, Eric Botcazou wrote: The attached patch teaches regrename to validate insns affected by each register renaming before making the change. I can see at least two other ways to handle this -- earlier, by rejecting renamings that result in invalid instructions when it's searching for the best renaming; or later, by validating the entire set of renamings as a group instead of incrementally for each one -- but doing it all in regrename_do_replace seems least disruptive and risky in terms of the existing code. OK, but the patch looks incomplete, rename_chains should be adjusted as well, i.e. regrename_do_replace should now return a boolean. Like this? I tested this on nios2 and x86_64-linux-gnu, as before, plus built for aarch64-linux-gnu and ran the gcc testsuite. The c6x back end also calls regrename_do_replace.
I am not set up to build or test on that target, and Bernd told me off-list that it would never fail on that target anyway so I have left that code alone. -Sandra regrename-2.log 2015-06-23 Chung-Lin Tang clt...@codesourcery.com Sandra Loosemore san...@codesourcery.com gcc/ * regrename.h (regrename_do_replace): Change to return bool. * regrename.c (rename_chains): Check return value of regrename_do_replace. (regrename_do_replace): Re-validate the modified insns and return bool status. * config/aarch64/cortex-a57-fma-steering.c (rename_single_chain): Update to match rename_chains changes. As Eric mentioned, please put an assert to verify that the call from the c6x backend never fails. The regrename and ARM bits are fine. Do you have a testcase that you can add to the suite? If so it'd be appreciated if you could include that too. Approved with the c6x assert if a testcase isn't available or exceedingly difficult to produce. Thanks. I've committed the attached version. Re the testcase, this fixed 16 FAILs on existing tests in the gcc testsuite with the forthcoming nios2 load/store multiple instruction support, all assembler errors due to the bad instructions being generated. There's nothing I can do on nios2 for a testcase until I get those patches committed (I'm still trying to re-test and tidy them up for submission), plus I think the failures are rather fragile -- depending on the register allocator choosing an initial register numbering that allows peephole optimizers to trigger, etc. But, I will revisit this later and see what I can do. -Sandra
Re: [PATCH GCC][refacor]Manage allocation of struct iv in obstack.
On Sat, Jun 27, 2015 at 5:13 AM, Jeff Law l...@redhat.com wrote: On 06/26/2015 03:02 AM, Bin Cheng wrote: Hi, GCC avoids multi-pointers/dangling-pointers of struct iv by allocating multiple copies of the structure. This patch is an obvious fix to the issue by managing iv structures in obstack. Bootstrap on x86_64, will apply to trunk if no objection. Thanks, bin 2015-06-26 Bin Cheng bin.ch...@arm.com * tree-ssa-loop-ivopts.c (struct ivopts_data): New field iv_obstack. (tree_ssa_iv_optimize_init): Initialize iv_obstack. (alloc_iv): New parameter. Allocate struct iv using obstack_alloc. (set_iv, find_interesting_uses_address, add_candidate_1): New argument. (find_interesting_uses_op): Don't duplicate struct iv. (free_loop_data): Don't free iv structure explicitly. (tree_ssa_iv_optimize_finalize): Free iv_obstack. Presumably you're trying to simplify the memory management here so that you don't have to track lifetimes of the IV structures so carefully, which in turn simplifies some upcoming patch? Yes, that's exactly the reason. I am still on the way fixing missed-optimizations in IVO, and plan to do some refactoring/simplification afterwards. Note we don't have a no objection policy for this kind of patch. However, I think it may make sense to look into having you as a maintainer for the IV optimizations if you're interested. Oh, that would be my great honor. Thanks, bin Jeff
Re: [PATCH] Discard Scops for which entry==exit
On Mon, Jun 29, 2015 at 3:04 PM, Aditya Kumar hiradi...@msn.com wrote: In this patch we discard the scops where entry and exit are the same BB. This is an effort to remove graphite-scop-detection.c:limit_scops. Removing the limit_scops function introduces correctness regressions. We are making relevant changes in incremental steps to fix those bugs, and finally we intend to remove limit_scops. 2015-06-29 Aditya Kumar aditya...@samsung.com Sebastian Pop s@samsung.com * graphite-scop-detection.c (build_scops_1): Discard scops for which entry==exit. Looks good to me. Let's wait on comments from Tobi before pushing this patch. Thanks, Sebastian --- gcc/graphite-scop-detection.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index e8ddecd..f57cc4a 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -810,7 +810,14 @@ build_scops_1 (basic_block current, loop_p outermost_loop, { open_scop.exit = sinfo.exit; gcc_assert (open_scop.exit); - scops->safe_push (open_scop); + if (open_scop.entry != open_scop.exit) + scops->safe_push (open_scop); + else + { + sinfo.difficult = true; + sinfo.exits = false; + sinfo.exit = NULL; + } } result.exit = sinfo.exit; -- 2.1.0.243.g30d45f7
Re: [s390] Revert TPF C++ library changes
But don't we need to support the older system (with 2 libstdc++s) and the newer system (just one libstdc++)? Which implies these changes need to be conditional, right? The CPP2 configuration was never shipped to TPF customers, so there's no need to retain both ways.
[PATCH, ARM] stop changing signedness in PROMOTE_MODE
This is my suggested fix for PR 65932, which is a linux kernel miscompile with gcc-5.1. The problem here is caused by a chain of events. The first is that the relatively new eipa_sra pass creates fake parameters that behave slightly differently than normal parameters. The second is that the optimizer creates phi nodes that copy local variables to fake parameters and/or vice versa. The third is that the out-of-ssa pass assumes that it can emit simple move instructions for these phi nodes. And the fourth is that the ARM port has a PROMOTE_MODE macro that forces QImode and HImode to unsigned, but a TARGET_PROMOTE_FUNCTION_MODE hook that does not. So signed char and short parameters have different in-register representations than local variables, and require a conversion when copying between them, a conversion that the out-of-ssa pass can't easily emit. Ultimately, I think this is a problem in the arm backend. It should not have a PROMOTE_MODE macro that is changing the sign of char and short local variables. I also think that we should merge the PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to prevent this from happening again. I see four general problems with the current ARM PROMOTE_MODE definition. 1) Unsigned char is only faster for armv5 and earlier, before the sxtb instruction was added. It is a loss for armv6 and later. 2) Unsigned short was only faster for targets that don't support unaligned accesses. Support for these targets was removed a while ago, and this PROMOTE_MODE hunk should have been removed at the same time. It was accidentally left behind. 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook; when it was converted to a function, the PROMOTE_MODE code was copied without the UNSIGNEDP changes. Thus it is only an accident that TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree. Changing TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE changes to resolve the difference are safe. 
4) There is a general principle that you should only change signedness in PROMOTE_MODE if the hardware forces it, as otherwise this results in extra conversion instructions that make code slower. The mips64 hardware for instance requires that 32-bit values be sign-extended regardless of type, and instructions may trap if this is not true. However, it has a set of 32-bit instructions that operate on these values, and hence no conversions are required. There is no similar case on ARM. Thus the conversions are unnecessary and unwise. This can be seen in the testcases where gcc emits both a zero-extend and a sign-extend inside a loop, as the sign-extend is required for a compare, and the zero-extend is required by PROMOTE_MODE. My change was tested with an arm bootstrap, make check, and SPEC CPU2000 run. The original poster verified that this gives a linux kernel that boots correctly. The PROMOTE_MODE change causes 3 testsuite testcases to fail. These are tests to verify that smulbb and/or smlabb are generated. Eliminating the unnecessary sign conversions causes us to get better code that doesn't include the smulbb and smlabb instructions. I had to modify the testcases to get them to emit the desired instructions. With the testcase changes there are no additional testsuite failures, though I'm concerned that these testcases with the changes may be fragile, and future changes may break them again. If there are ARM parts where smulbb/smlabb are faster than mul/mla, then maybe we should try to add new patterns to get the instructions emitted again for the unmodified testcases. Jim gcc/ 2015-06-29 Jim Wilson jim.wil...@linaro.org PR target/65932 * config/arm/arm.h (PROMOTE_MODE): Don't set UNSIGNEDP for QImode and HImode. gcc/testsuite/ 2015-06-29 Jim Wilson jim.wil...@linaro.org PR target/65932 * gcc.target/arm/wmul-1.c (mac): Change a and b to int pointers. Cast multiply operands to short. * gcc.target/arm/wmul-2.c (vec_mpy): Cast multiply result to short. 
* gcc.target/arm/wmul-3.c (mac): Cast multiply results to short. Index: config/arm/arm.h === --- config/arm/arm.h (revision 224672) +++ config/arm/arm.h (working copy) @@ -523,16 +523,10 @@ extern int arm_arch_crc; type, but kept valid in the wider mode. The signedness of the extension may differ from that of the type. */ -/* It is far faster to zero extend chars than to sign extend them */ - #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE) \ if (GET_MODE_CLASS (MODE) == MODE_INT \ && GET_MODE_SIZE (MODE) < 4) \ { \ - if (MODE == QImode) \ - UNSIGNEDP = 1;\ - else if (MODE == HImode) \ - UNSIGNEDP = 1;\ (MODE) = SImode;\ } Index: testsuite/gcc.target/arm/wmul-1.c === --- testsuite/gcc.target/arm/wmul-1.c (revision 224672) +++ testsuite/gcc.target/arm/wmul-1.c (working copy) @@ -2,14 +2,14 @@
Re: Thinking about libgccjit SONAME bump for gcc 5.2 (was Re: Four jit backports to gcc 5 branch)
On Mon, 2015-06-29 at 18:34 -0400, Ulrich Drepper wrote: On Mon, Jun 29, 2015 at 5:26 PM, David Malcolm dmalc...@redhat.com wrote: I'm looking at ways to manage libgccjit API/ABI as per this thread: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01982.html by transitioning to using symbol versioning, so that the linker can tag subsets of libgccjit symbols in both libgccjit and in client binaries. You don't have to bump the SONAME to introduce symbol versioning. glibc in the beginning didn't have symbol versioning, and we wrote the linker and dynamic linker support so that no SONAME change was necessary. The idea is that unversioned symbols are satisfied by the oldest symbol version. Aha! Thanks. I won't bump the SONAME, in that case; I'll just add symbol versioning, and each new symbol will go in a new tag.
Re: [PATCH, rs6000] Add more overloaded built-ins for vec_cmpge and vec_cmple
On Mon, Jun 29, 2015 at 10:05 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, This patch adds built-in functions for vec_cmpge and vec_cmple required by the ELFv2 ABI but not yet present in gcc. These make use of the existing patterns for gt:VEC_I and gtu:VEC_I, applying the not operator and reversing the order of operands as needed. The result is generation of a vcmpgt[su][bhwd] instruction followed by an xxlnor. Since it is likely that a vec_cmp[gl]e will be followed by a vec_sel, it will be important to optimize the xxlnor/xxsel* to remove the xxlnor by reversing the sense of the select. A separate patch will follow to add this pattern to simplify-rtx.c. Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Ok for trunk? Thanks, Bill [gcc] 2015-06-29 Bill Schmidt wschm...@linux.vnet.ibm.com * config/rs6000/rs6000-builtin.def (CMPGE_16QI): New built-in definition. (CMPGE_8HI): Likewise. (CMPGE_4SI): Likewise. (CMPGE_2DI): Likewise. (CMPGE_U16QI): Likewise. (CMPGE_U8HI): Likewise. (CMPGE_U4SI): Likewise. (CMPGE_U2DI): Likewise. (CMPLE_16QI): Likewise. (CMPLE_8HI): Likewise. (CMPLE_4SI): Likewise. (CMPLE_2DI): Likewise. (CMPLE_U16QI): Likewise. (CMPLE_U8HI): Likewise. (CMPLE_U4SI): Likewise. (CMPLE_U2DI): Likewise. * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add overloads for ALTIVEC_BUILTIN_VEC_CMPGE and ALTIVEC_BUILTIN_VEC_CMPLE. * config/rs6000/vector.md (vector_ge<mode>): Restrict to floating-point vector modes. (vector_nlt<mode>): New define_expand. (vector_nltu<mode>): Likewise. (vector_ngt<mode>): Likewise. (vector_ngtu<mode>): Likewise. [gcc/testsuite] 2015-06-29 Bill Schmidt wschm...@linux.vnet.ibm.com * gcc.target/powerpc/vec-cmp.c: New test. Okay. Thanks, David
Re: [PATCH 1/2] Allow REG_EQUAL for ZERO_EXTRACT
On 29/06/15 21:56, Maxim Kuvyrkov wrote: On Jun 28, 2015, at 2:28 PM, Kugan kugan.vivekanandara...@linaro.org wrote: This patch allows setting REG_EQUAL for ZERO_EXTRACT and handles that in cse (where the src for the ZERO_EXTRACT needs to be calculated). Thanks, Kugan From 75e746e559ffd21b25542b3db627e3b318118569 Mon Sep 17 00:00:00 2001 From: Kugan Vivekanandarajah kugan.vivekanandara...@linaro.org Date: Fri, 26 Jun 2015 17:12:07 +1000 Subject: [PATCH 1/2] Allow adding REG_EQUAL for ZERO_EXTRACT --- gcc/ChangeLog | 6 ++ gcc/cse.c | 41 - gcc/emit-rtl.c | 3 ++- 3 files changed, 48 insertions(+), 2 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 080aa39..d4a73d6 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,9 @@ +2015-06-26 Kugan Vivekanandarajah kug...@linaro.org + +* cse.c (cse_insn): Calculate src_eqv for ZERO_EXTRACT. +* emit-rtl.c (set_for_reg_notes): Allow ZERO_EXTRACT to set +REG_EQUAL note. + 2015-06-25 H.J. Lu hongjiu...@intel.com * gentarget-def.c (def_target_insn): Cast return of strtol to diff --git a/gcc/cse.c b/gcc/cse.c index 100c9c8..8add651 100644 --- a/gcc/cse.c +++ b/gcc/cse.c @@ -4531,8 +4531,47 @@ cse_insn (rtx_insn *insn) if (n_sets == 1 && REG_NOTES (insn) != 0 && (tem = find_reg_note (insn, REG_EQUAL, NULL_RTX)) != 0 && (! rtx_equal_p (XEXP (tem, 0), SET_SRC (sets[0].rtl)) + || GET_CODE (SET_DEST (sets[0].rtl)) == ZERO_EXTRACT || GET_CODE (SET_DEST (sets[0].rtl)) == STRICT_LOW_PART)) -src_eqv = copy_rtx (XEXP (tem, 0)); +{ + src_eqv = copy_rtx (XEXP (tem, 0)); + + /* If DEST is of the form ZERO_EXTRACT, as in: + (set (zero_extract:SI (reg:SI 119) + (const_int 16 [0x10]) + (const_int 16 [0x10])) + (const_int 51154 [0xc7d2])) + REG_EQUAL note will specify the value of register (reg:SI 119) at this + point. Note that this is different from SRC_EQV. We can however + calculate SRC_EQV with the position and width of ZERO_EXTRACT. 
*/ + if (GET_CODE (SET_DEST (sets[0].rtl)) == ZERO_EXTRACT) Consider changing if (something && (!rtx_equal_p || ZERO_EXTRACT || STRICT_LOW_PART)) to if (something && !rtx_equal_p) { if (ZERO_EXTRACT) { } else if (STRICT_LOW_PART) { } } Otherwise looks good to me, but you still need another approval. Thanks Maxim for the review. How about the attached patch? Thanks, Kugan +{ + if (CONST_INT_P (src_eqv) + && CONST_INT_P (XEXP (SET_DEST (sets[0].rtl), 1)) + && CONST_INT_P (XEXP (SET_DEST (sets[0].rtl), 2))) +{ + rtx dest_reg = XEXP (SET_DEST (sets[0].rtl), 0); + rtx width = XEXP (SET_DEST (sets[0].rtl), 1); + rtx pos = XEXP (SET_DEST (sets[0].rtl), 2); + HOST_WIDE_INT val = INTVAL (src_eqv); + HOST_WIDE_INT mask; + unsigned int shift; + if (BITS_BIG_ENDIAN) +shift = GET_MODE_PRECISION (GET_MODE (dest_reg)) + - INTVAL (pos) - INTVAL (width); + else +shift = INTVAL (pos); + if (INTVAL (width) == HOST_BITS_PER_WIDE_INT) +mask = ~(HOST_WIDE_INT) 0; + else +mask = ((HOST_WIDE_INT) 1 << INTVAL (width)) - 1; + val = (val >> shift) & mask; + src_eqv = GEN_INT (val); +} + else +src_eqv = 0; +} +} /* Set sets[i].src_elt to the class each source belongs to. Detect assignments from or to volatile things diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c index e7f7eab..cb891b1 100644 --- a/gcc/emit-rtl.c +++ b/gcc/emit-rtl.c @@ -5228,7 +5228,8 @@ set_for_reg_notes (rtx insn) reg = SET_DEST (pat); /* Notes apply to the contents of a STRICT_LOW_PART. */ - if (GET_CODE (reg) == STRICT_LOW_PART) + if (GET_CODE (reg) == STRICT_LOW_PART + || GET_CODE (reg) == ZERO_EXTRACT) reg = XEXP (reg, 0); /* Check that we have a register. 
*/ -- 1.9.1 -- Maxim Kuvyrkov www.linaro.org From 3754bcd11fb732e03a39d6625a9eb14934b36643 Mon Sep 17 00:00:00 2001 From: Kugan Vivekanandarajah kugan.vivekanandara...@linaro.org Date: Fri, 26 Jun 2015 17:12:07 +1000 Subject: [PATCH 1/2] Allow adding REG_EQUAL for ZERO_EXTRACT --- gcc/ChangeLog | 6 ++ gcc/cse.c | 49 ++--- gcc/emit-rtl.c | 3 ++- 3 files changed, 50 insertions(+), 8 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 080aa39..d4a73d6 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,9 @@ +2015-06-26 Kugan Vivekanandarajah kug...@linaro.org + + * cse.c (cse_insn): Calculate src_eqv for ZERO_EXTRACT. + * emit-rtl.c (set_for_reg_notes): Allow ZERO_EXTRACT to set + REG_EQUAL note. + 2015-06-25 H.J. Lu
Re: [patch] fix regrename pass to ensure renamings produce valid insns
On 06/29/2015 09:07 PM, Kito Cheng wrote: Hi all: This patch seems like it will break the build for c6x when assert checking is disabled: Index: gcc/config/c6x/c6x.c === --- gcc/config/c6x/c6x.c (revision 225104) +++ gcc/config/c6x/c6x.c (working copy) @@ -3516,7 +3516,7 @@ try_rename_operands (rtx_insn *head, rtx best_reg = find_rename_reg (this_head, super_class, unavailable, old_reg, true); - regrename_do_replace (this_head, best_reg); + gcc_assert (regrename_do_replace (this_head, best_reg)); count_unit_reqs (new_reqs, head, PREV_INSN (tail)); merge_unit_reqs (new_reqs); @@ -3529,7 +3529,7 @@ try_rename_operands (rtx_insn *head, rtx unit_req_imbalance (reqs), unit_req_imbalance (new_reqs)); } if (unit_req_imbalance (new_reqs) < unit_req_imbalance (reqs)) -regrename_do_replace (this_head, old_reg); +gcc_assert (regrename_do_replace (this_head, old_reg)); else memcpy (reqs, new_reqs, sizeof (unit_req_table)); I'm sorry; do you have a suggestion for a fix? I thought this was the change I was asked to make, and as I noted previously, I'm not set up to test (or even build) for this target. -Sandra the obviously confused :-(
Re: [patch] fix regrename pass to ensure renamings produce valid insns
On 2015/6/30 12:22 PM, Sandra Loosemore wrote: On 06/29/2015 09:07 PM, Kito Cheng wrote: Hi all: This patch seems like it will break the build for c6x when assert checking is disabled: Index: gcc/config/c6x/c6x.c === --- gcc/config/c6x/c6x.c (revision 225104) +++ gcc/config/c6x/c6x.c (working copy) @@ -3516,7 +3516,7 @@ try_rename_operands (rtx_insn *head, rtx best_reg = find_rename_reg (this_head, super_class, unavailable, old_reg, true); - regrename_do_replace (this_head, best_reg); + gcc_assert (regrename_do_replace (this_head, best_reg)); count_unit_reqs (new_reqs, head, PREV_INSN (tail)); merge_unit_reqs (new_reqs); @@ -3529,7 +3529,7 @@ try_rename_operands (rtx_insn *head, rtx unit_req_imbalance (reqs), unit_req_imbalance (new_reqs)); } if (unit_req_imbalance (new_reqs) < unit_req_imbalance (reqs)) -regrename_do_replace (this_head, old_reg); +gcc_assert (regrename_do_replace (this_head, old_reg)); else memcpy (reqs, new_reqs, sizeof (unit_req_table)); I'm sorry; do you have a suggestion for a fix? I thought this was the change I was asked to make, and as I noted previously, I'm not set up to test (or even build) for this target. -Sandra the obviously confused :-( You probably have to separate out the regrename_do_replace() bool result into a variable; placing the whole call into the gcc_assert() might make it disappear when assertions are turned off. Chung-Lin
Re: [C++/66443]
On 06/29/2015 06:57 PM, Nathan Sidwell wrote: * method.c (synthesized_method_walk): Skip virtual bases of abstract classes in C++14 mode. Let's not limit this to C++14 mode; most DRs apply to earlier standards as well. Jason
Re: [s390] Revert TPF C++ library changes
On 06/29/2015 07:00 PM, DJ Delorie wrote: But don't we need to support the older system (with 2 libstdc++s) and the newer system (just one libstdc++)? Which implies these changes need to be conditional, right? The CPP2 configuration was never shipped to TPF customers, so there's no need to retain both ways. Ah, then approved. Jeff
Re: [patch] fix regrename pass to ensure renamings produce valid insns
On 2015/6/30 1:13 PM, Chung-Lin Tang wrote: On 2015/6/30 12:22 PM, Sandra Loosemore wrote: On 06/29/2015 09:07 PM, Kito Cheng wrote: Hi all: This patch seems like it will break the build for c6x when assert checking is disabled: Index: gcc/config/c6x/c6x.c === --- gcc/config/c6x/c6x.c (revision 225104) +++ gcc/config/c6x/c6x.c (working copy) @@ -3516,7 +3516,7 @@ try_rename_operands (rtx_insn *head, rtx best_reg = find_rename_reg (this_head, super_class, unavailable, old_reg, true); - regrename_do_replace (this_head, best_reg); + gcc_assert (regrename_do_replace (this_head, best_reg)); count_unit_reqs (new_reqs, head, PREV_INSN (tail)); merge_unit_reqs (new_reqs); @@ -3529,7 +3529,7 @@ try_rename_operands (rtx_insn *head, rtx unit_req_imbalance (reqs), unit_req_imbalance (new_reqs)); } if (unit_req_imbalance (new_reqs) < unit_req_imbalance (reqs)) -regrename_do_replace (this_head, old_reg); +gcc_assert (regrename_do_replace (this_head, old_reg)); else memcpy (reqs, new_reqs, sizeof (unit_req_table)); I'm sorry; do you have a suggestion for a fix? I thought this was the change I was asked to make, and as I noted previously, I'm not set up to test (or even build) for this target. -Sandra the obviously confused :-( You probably have to separate out the regrename_do_replace() bool result into a variable; placing the whole call into the gcc_assert() might make it disappear when assertions are turned off. I notice that, the way gcc_assert() is defined in system.h now, the test won't disappear even when runtime checks are disabled, though you might still adjust it to avoid any programmer confusion. Chung-Lin
Re: [PATCH] config/bfin/bfin.c (hwloop_optimize): Use return false instead of gcc_assert for checking jump_insn.
On 06/28/2015 04:15 PM, Chen Gang wrote: For bfin looping optimization, after lsetup optimization, it can have the correct lsetup-related insns, which causes the gcc_assert failure for jump_insn. I've been debugging this for a bit, and at least the explanation of the patch is wrong - it's finding an LSETUP for a different loop. There seems to be an inconsistency in the CFG, and it looks like it's caused by the unusual (?) situation that both arms out of a conditional branch lead directly to a hwloop candidate. So, not OK until further investigation I think. Bernd