Re: Phone call (was Re: Armhf dynamic linker path)
On Thu, Apr 12, 2012 at 11:22:13AM +1200, Michael Hope wrote: All good. My vote is for /lib/ld-arm-linux-gnueabihf.so.3 as it: The directory should be /libhf/ or /libhfp/ for that, for consistency with all the other architectures. Note e.g. that the x86_64 dynamic linker is /lib64/ld-linux-x86-64.so.2, not /lib/ld-linux-x86-64.so.2. For distros that choose to multilib softfp vs. hardfp, all the hardfp libraries would go into the usual */lib{qual} paths (for qual hf resp. hfp); for others /libhf can be a symlink to /lib, or for those doing multiarch stuff it can be a symlink to the multiarch location of the thing. I'm fine with arm and hf (resp. hfp) being mentioned in the name of the dynamic linker, but IMNSHO having the gnu and eabi strings there is overkill - why mention gnu there, when all the other architectures which also have GNU libc dynamic linkers don't? That part is implicit. And EABI is implied by so.3: the softfp dynamic linker for EABI is /lib/ld-linux.so.3, while the softfp dynamic linker for the old ABI is /lib/ld-linux.so.2. Jakub
[PATCH] Option to build bare-metal ARM cross-compiler for arm-none-eabi target without libunwind
We've been using a bare-metal 'arm-elf' cross-compiler toolchain over the years since GCC 2.95.x/GCC3/GCC4, and it has worked fine. Now, though, the 'arm-elf' target seems obsolete, and we are trying to switch to 'arm-none-eabi'. We use no standard libs at all; we are really hard bare-metal, with our own memcpy() implementation etc. Now, when we try to build GCC 4.7.0, we get errors that libgcc has dependencies on unwind stuff we cannot get rid of. The dependencies are caused by some division library functions like _divdi3 that can throw exceptions like division-by-zero. These exceptions are, I guess, mainly targeting C++/Java exceptions, and we want to avoid dependencies on libunwind when building libgcc for our bare-metal, pure-C toolchain. I attach a patch that adds an option to configure to avoid building libgcc with libunwind dependencies. The new option is called '--disable-libunwind-exceptions'. Over the years several people have experienced the same problem; here is some background from the gcc mailing lists: http://gcc.gnu.org/ml/gcc-help/2012-03/msg00352.html http://gcc.gnu.org/ml/gcc-help/2009-10/msg00302.html Please review and comment! Best Regards, Fredrik Hederstierna Securitas Direct AB Malmoe Sweden disable_libunwind_exceptions.patch Description: Binary data
Re: Phone call (was Re: Armhf dynamic linker path)
On Thu, Apr 12, 2012 at 10:33:08AM +0300, Riku Voipio wrote: On 12 April 2012 09:05, Jakub Jelinek ja...@redhat.com wrote: On Thu, Apr 12, 2012 at 11:22:13AM +1200, Michael Hope wrote: All good. My vote is for /lib/ld-arm-linux-gnueabihf.so.3 as it: The directory should be /libhf/ or /libhfp/ for that, for consistency with all the other architectures. Note e.g. the x86_64 dynamic linker is /lib64/ld-linux-x86-64.so.2, not /lib/ld-linux-x86-64.so.2. For some value of consistency. x86_64, mips64, powerpc64 and sparc64 install to /lib64. But on ia64 it is /lib/ld-linux-ia64.so.2 and installs in /lib, because it isn't a multilibbed architecture. On s390x it is /lib/ld64.so.1 [1]. Ok, I forgot about this. I tried to convince the s390x folks to move it to /lib64/ld64.so.1 many years ago, but that just didn't happen, so /lib/ld64.so.1 is just a symlink to /lib64/ld64.so.1. Upstream glibc binaries use /lib64/ld64.so.1 as their dynamic linker, while all other packages use /lib/ld64.so.1, as that is hardcoded in gcc. That is an argument that perhaps /lib/ld-linux-armhf.so.3 could be acceptable too, as it would follow the s390x model. I wouldn't be terribly happy about that, but I could live with it. Jakub
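For reference, the per-architecture dynamic linker locations cited in this thread can be tabulated. A minimal Python sketch — the paths are the ones quoted in the emails above, and the armhf entry is only the /lib-based proposal under discussion, not a settled decision:

```python
# Dynamic linker paths cited in this thread; the armhf entry is only
# the proposal being debated, not a final choice.
DYNAMIC_LINKERS = {
    "x86_64":            "/lib64/ld-linux-x86-64.so.2",  # multilibbed: /lib64
    "ia64":              "/lib/ld-linux-ia64.so.2",      # not multilibbed: /lib
    "s390x":             "/lib/ld64.so.1",               # hardcoded in gcc; a /lib64 name also exists
    "arm-softfp-eabi":   "/lib/ld-linux.so.3",           # EABI implied by .so.3
    "arm-softfp-oldabi": "/lib/ld-linux.so.2",
    "armhf-proposal":    "/lib/ld-linux-armhf.so.3",
}

def interp_dir(arch):
    """Return the directory component of the arch's PT_INTERP path."""
    path = DYNAMIC_LINKERS[arch]
    return path.rsplit("/", 1)[0] or "/"

# The "s390x model" Jakub describes: a /lib interpreter name even though
# the architecture's libraries otherwise live in /lib64.
print(interp_dir("x86_64"))  # /lib64
print(interp_dir("s390x"))   # /lib
```

This is just a lookup illustrating the consistency argument: every multilibbed 64-bit port except s390x (and the proposed armhf path) keeps its interpreter under the qualified library directory.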
Re: [C++ Patch] for c++/52465
2012/4/11 Jason Merrill ja...@redhat.com: [...] Your ChangeLog needs to be adjusted. :) I did write an updated ChangeLog, but I pasted the wrong one here... I committed it to trunk yesterday, hopefully with the correct ChangeLog. I'll be backporting it to 4.7 after a relaxing weekend :-) -- Fabien
Re: [cxx-conversion] Switch default to build in C++
On Wed, Apr 11, 2012 at 6:37 PM, Diego Novillo dnovi...@google.com wrote: This patch makes the branch build in C++ mode by default. Tested on x86_64. Committed. Note that for merging the branch the support to build with a C host compiler should be removed (and docs updated). Richard.

ChangeLog.cxx-conversion
	* configure.ac (ENABLE_BUILD_WITH_CXX): Default to 'yes'.
	* configure: Regenerate.

gcc/ChangeLog.cxx-conversion
	* configure.ac (ENABLE_BUILD_WITH_CXX): Default to 'yes'.
	* configure: Regenerate.

libcpp/ChangeLog.cxx-conversion
	* configure.ac (ENABLE_BUILD_WITH_CXX): Default to 'yes'.
	* configure: Regenerate.

Index: configure.ac
===
--- configure.ac	(revision 186326)
+++ configure.ac	(working copy)
@@ -1191,7 +1191,7 @@
 [AS_HELP_STRING([--enable-build-with-cxx],
 		[build with C++ compiler instead of C compiler])],
 ENABLE_BUILD_WITH_CXX=$enableval,
-ENABLE_BUILD_WITH_CXX=no)
+ENABLE_BUILD_WITH_CXX=yes)
 
 # Build stage1 with C and build stages 2 and 3 with C++.
 AC_ARG_ENABLE(build-poststage1-with-cxx,
Index: gcc/configure.ac
===
--- gcc/configure.ac	(revision 186326)
+++ gcc/configure.ac	(working copy)
@@ -608,7 +608,7 @@
 [AS_HELP_STRING([--enable-build-with-cxx],
 		[build with C++ compiler instead of C compiler])],
 ENABLE_BUILD_WITH_CXX=$enableval,
-ENABLE_BUILD_WITH_CXX=no)
+ENABLE_BUILD_WITH_CXX=yes)
 AC_SUBST(ENABLE_BUILD_WITH_CXX)
 
 if test "$ENABLE_BUILD_WITH_CXX" = "yes"; then
   AC_DEFINE(ENABLE_BUILD_WITH_CXX, 1,
Index: libcpp/configure.ac
===
--- libcpp/configure.ac	(revision 186326)
+++ libcpp/configure.ac	(working copy)
@@ -22,7 +22,7 @@
 AC_ARG_ENABLE(build-with-cxx,
 [  --enable-build-with-cxx build with C++ compiler instead of C compiler],
 ENABLE_BUILD_WITH_CXX=$enableval,
-ENABLE_BUILD_WITH_CXX=no)
+ENABLE_BUILD_WITH_CXX=yes)
 AC_SUBST(ENABLE_BUILD_WITH_CXX)
 
 MISSING=`cd $ac_aux_dir && ${PWDCMD-pwd}`/missing
Re: [IA-64] Fix PR target/48496
On Wed, Apr 11, 2012 at 11:01 PM, Eric Botcazou ebotca...@adacore.com wrote: Hi, this is the spurious error on asm statements of the form: error: 'asm' operand requires impossible reload present for IA-64 on mainline and 4.7 branch. As diagnosed by Ulrich, the code responsible for the error implicitly assumes that constraints accepting memory operands also accept pseudo-registers during reload. That isn't true for the Q constraint of IA-64. The patch also fixes the MEM_VOLATILE_P access not properly guarded by a MEM_P test in the expression. Bootstrapped/regtested on IA-64/Linux, OK for mainline and 4.7 branch? How ugly ;) Can we have a generic helper for the pseudo-created-by-reload test on trunk please? Ok. Thanks, Richard. 2012-04-11 Eric Botcazou ebotca...@adacore.com PR target/48496 * config/ia64/constraints.md (Q): Only accept non-volatile MEMs and also pseudo-registers during reload. 2012-04-11 Eric Botcazou ebotca...@adacore.com * gcc.target/ia64/pr48496.c: New test. * gcc.target/ia64/pr52657.c: Likewise. -- Eric Botcazou
Re: [i386, patch, RFC] HLE support in GCC
Perhaps HLE_ACQUIRE / HLE_RELEASE should be something like HLE_START / HLE_END instead? Not particularly great names, but at least it avoids overloading ACQUIRE/RELEASE and thus should make it clearer that you still need to specify a memory order. IMHO this is also not as good, since ACQUIRE/RELEASE reflect the actual prefix names. HLE_START/END may confuse the user even more. I also agree with Jakub that such things shouldn't be implied. Going to remove it from the patch. K
Re: [IA-64] Fix PR target/48496
On Wed, Apr 11, 2012 at 11:01:40PM +0200, Eric Botcazou wrote:

2012-04-11  Eric Botcazou  ebotca...@adacore.com

	PR target/48496
	* config/ia64/constraints.md (Q): Only accept non-volatile MEMs
	and also pseudo-registers during reload.

2012-04-11  Eric Botcazou  ebotca...@adacore.com

	* gcc.target/ia64/pr48496.c: New test.
	* gcc.target/ia64/pr52657.c: Likewise.

Is the standard condition for define_memory_constraint here

  /* Likewise if the address will be reloaded because reg_equiv_address
     is nonzero.  For reg_equiv_mem we have to check.  */
  else if (REG_P (operand)
	   && REGNO (operand) >= FIRST_PSEUDO_REGISTER
	   && reg_renumber[REGNO (operand)] < 0
	   && ((reg_equiv_mem (REGNO (operand)) != 0
		&& EXTRA_CONSTRAINT_STR (reg_equiv_mem (REGNO (operand)), c, p))
	       || (reg_equiv_address (REGNO (operand)) != 0)))
    win = 1;

If so, shouldn't you check those conditions as well, or at least something similar? Not sure if reg_equiv_address needs to be allowed there, and guess reg_equiv_mem should satisfy the Q constraint, i.e. !MEM_VOLATILE_P && memory_operand. Accepting any pseudo there sounds too risky to me...

Index: config/ia64/constraints.md
===
--- config/ia64/constraints.md	(revision 186272)
+++ config/ia64/constraints.md	(working copy)
@@ -111,11 +111,16 @@ (define_constraint "H"
 ;; Note that while this accepts mem, it only accepts non-volatile mem,
 ;; and so cannot be fixed by adjusting the address.  Thus it cannot
-;; and does not use define_memory_constraint.
+;; and does not use define_memory_constraint.  But it needs to accept
+;; pseudo-registers during reload like a define_memory_constraint.
 
 (define_constraint "Q"
   "Non-volatile memory for FP_REG loads/stores"
-  (and (match_operand 0 "memory_operand")
-       (match_test "!MEM_VOLATILE_P (op)")))
+  (ior (and (match_code "mem")
+	    (match_test "!MEM_VOLATILE_P (op)")
+	    (match_operand 0 "memory_operand"))
+       (and (match_code "reg")
+	    (match_test "!HARD_REGISTER_P (op)")
+	    (match_test "reload_in_progress"))))
 
 (define_constraint "R"
   "1..4 for shladd arguments"

Jakub
[Patch, Fortran] PR52864 - Fix pointer-intent regression
That's a GCC 4.6-4.8 regression. Pointer intents apply to the association status and not to the value. Thus, assigning to an intent(in) pointer is fine. The problem was that the LHS is no pointer due to the array access (dt%ptr(1) =); thus, the check got triggered. Built and regtested on x86-64-linux. OK for the trunk and the 4.6/4.7 branches? Tobias

2012-04-12  Tobias Burnus  bur...@net-b.de

	PR fortran/52864
	* expr.c (gfc_check_vardef_context): Fix assignment check for
	pointer components.

2012-04-12  Tobias Burnus  bur...@net-b.de

	PR fortran/52864
	* gfortran.dg/pointer_intent_6.f90: New.

diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index e6a9c88..7ce693b 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -4656,7 +4680,11 @@ gfc_check_vardef_context (gfc_expr* e, bool pointer, bool alloc_obj,
       if (ptr_component && ref->type == REF_COMPONENT)
	 check_intentin = false;
       if (ref->type == REF_COMPONENT && ref->u.c.component->attr.pointer)
-	ptr_component = true;
+	{
+	  ptr_component = true;
+	  if (!pointer)
+	    check_intentin = false;
+	}
     }
   if (check_intentin && sym->attr.intent == INTENT_IN)
     {
--- /dev/null	2012-04-12 06:55:49.927755790 +0200
+++ gcc/gcc/testsuite/gfortran.dg/pointer_intent_6.f90	2012-04-12 09:35:25.0 +0200
@@ -0,0 +1,19 @@
+! { dg-do compile }
+!
+! PR fortran/52864
+!
+! Assigning to an intent(in) pointer (which is valid).
+!
+program test
+  type PoisFFT_Solver3D
+    complex, dimension(:,:,:), &
+      pointer :: work => null()
+  end type PoisFFT_Solver3D
+contains
+  subroutine PoisFFT_Solver3D_FullPeriodic(D, p)
+    type(PoisFFT_Solver3D), intent(in) :: D
+    real, intent(in), pointer :: p(:)
+    D%work(i,j,k) = 0.0
+    p = 0.0
+  end subroutine
+end
Re: [PATCH] [ARM] thumb1 imm move 256-510
On 12/04/12 02:48, Joey Ye wrote: For thumb1, use move + add instructions for immediate moves of [256-510]. The following is the complete range when combining an imm mov with the listed instructions. Among them, lsls and neg have already been implemented. The only missing opportunity is add, which I enabled in this patch. Others are replicated by lsls or neg.

1. Result imm range with 16-bit movs

Insn	Result imm with movs
lsls	[0x1-0xff] shifted by 1-31
add	[256, 510]
neg	[-255, 0]
rev	[0x100-0xff00] step 0x100
rev16	[0x100-0xff00] step 0x100
revsh	[0x1-0xff] step 0x1
rsb	[-255, 0]
mvn	[-255, 0]
sub	[-255, 0]
sxtb	[-255, 0]

ChangeLog:
	* config/arm/constraints.md (Pe): New constraint.
	* config/arm/arm.md: New split for imm 256-510.

Testcase:
	* gcc.target/arm/thumb1-imm.c: New testcase.

OK. R.
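The [256, 510] window comes from splitting the constant into an 8-bit movs plus an 8-bit adds. A small Python sketch of the range arithmetic — an illustration only, not the GCC split pattern itself:

```python
def movs_adds_reachable():
    """Values reachable by 'movs rd, #a' followed by 'adds rd, #b'
    with a, b each an 8-bit immediate (0..255).

    Constants below 256 already fit a single movs, so the split in
    the patch only matters for [256, 510] (a = 255, b = n - 255).
    """
    return {a + b for a in range(256) for b in range(256)}

reach = movs_adds_reachable()
# Every constant in [256, 510] needs exactly two 16-bit instructions:
assert all(n in reach for n in range(256, 511))
# 510 = 255 + 255 is the largest reachable value with this pair:
assert max(reach) == 510
```

This is why 510 is the upper bound in the table above: two 8-bit immediates cannot sum past 255 + 255.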
Re: [i386, patch, RFC] HLE support in GCC
Folks, Here is the patch with the implied atomic ACQUIRE/RELEASE removed. Could you please have a look? ChangeLog entry:

2012-04-12  Kirill Yukhin  kirill.yuk...@intel.com

	* builtins.c (get_memmodel): Remove check of upper bound.
	(expand_builtin_atomic_compare_exchange): Mask memmodel values.
	* config/i386/cpuid.h (bit_HLE): New.
	* config/i386/driver-i386.c (host_detect_local_cpu): Detect
	HLE support.
	* config/i386/i386-protos.h (ix86_generate_hle_prefix): New.
	* config/i386/i386-c.c (ix86_target_macros_internal): Set
	HLE defines.
	(ix86_target_string) <-mhle>: New.
	(ix86_option_override_internal) <PTA_HLE>: Ditto.
	(ix86_valid_target_attribute_inner_p) <OPT_mhle>: Ditto.
	* config/i386/i386.c (ix86_target_string) <OPTION_MASK_ISA_HLE>: New.
	(ix86_valid_target_attribute_inner_p) <OPT_mhle>: Ditto.
	(ix86_generate_hle_prefix): Ditto.
	* config/i386/i386.h (OPTION_ISA_HLE): Ditto.
	(IX86_HLE_ACQUIRE): Ditto.
	(IX86_HLE_RELEASE): Ditto.
	* config/i386/i386.h (ix86_generate_hle_prefix): Ditto.
	* config/i386/i386.opt (mhle): Ditto.
	* config/i386/sync.md (atomic_compare_and_swap<mode>): Pass
	success model to instruction emitter.
	(atomic_compare_and_swap_single<mode>): Define and use argument
	for success model.
	(atomic_compare_and_swap_double<mode>): Ditto.
	* configure.ac: Check if assembler supports HLE prefixes.
	* configure: Regenerate.
	* config.in: Ditto.

Thanks, K hle-rfc-5.gcc.patch Description: Binary data
Re: [i386, patch, RFC] HLE support in GCC
On Thu, Apr 12, 2012 at 01:37:24PM +0400, Kirill Yukhin wrote: Folks, Here is the patch with the implied atomic ACQUIRE/RELEASE removed. Could you please have a look?

+
+  sprintf (hle_macro, "__ATOMIC_HLE_ACQUIRE=%d", IX86_HLE_ACQUIRE);
+  def_or_undef (parse_in, hle_macro);
+
+  sprintf (hle_macro, "__ATOMIC_HLE_RELEASE=%d", IX86_HLE_RELEASE);
+  def_or_undef (parse_in, hle_macro);

This doesn't belong in ix86_target_macros_internal, but in ix86_target_macros. It is defined unconditionally, so you don't want to undef and define it again on each target pragma, and furthermore cpp_undef with __ATOMIC_HLE_ACQUIRE=something wouldn't work (for undef you'd need just __ATOMIC_HLE_ACQUIRE). And in ix86_target_macros you should be able to use cpp_define_formatted and avoid the temporary buffer. As for the rest of the patch, I'd like Richard to chime in... Jakub
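The reason get_memmodel's upper-bound check has to go, and why expand_builtin_atomic_compare_exchange now masks the memmodel, is that the HLE hint bits ride in the same integer argument as the C++11 memory order, above the order field. A Python sketch of that encoding — the exact bit positions here are illustrative assumptions for the sketch, not values taken from the patch:

```python
# C++11 memory orders occupy the low bits of the memmodel argument.
(MEMMODEL_RELAXED, MEMMODEL_CONSUME, MEMMODEL_ACQUIRE,
 MEMMODEL_RELEASE, MEMMODEL_ACQ_REL, MEMMODEL_SEQ_CST) = range(6)
MEMMODEL_MASK = 0xFFFF          # assumed width of the order field

# Target-specific HLE hints live above the mask (assumed positions,
# mirroring the IX86_HLE_ACQUIRE / IX86_HLE_RELEASE idea).
ATOMIC_HLE_ACQUIRE = 1 << 16
ATOMIC_HLE_RELEASE = 1 << 17

def split_model(model):
    """Separate the C++11 order from the target hint bits, the way a
    backend would mask the memmodel before validating the order."""
    return model & MEMMODEL_MASK, model & ~MEMMODEL_MASK

# The user still specifies a real memory order alongside the HLE hint,
# e.g. __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE:
order, hints = split_model(MEMMODEL_ACQUIRE | ATOMIC_HLE_ACQUIRE)
assert order == MEMMODEL_ACQUIRE and hints == ATOMIC_HLE_ACQUIRE
```

With this layout a plain range check on the raw argument would reject any model carrying a hint bit, which is exactly why the bound check is removed and masking is done instead.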
strengthen protection against REG_EQUIV/EQUAL on !REG set
Hello, This is a followup on a suggestion made in the thread at http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00967.html where we were observing the middle-end setting invalid REG_EQUIV notes on set(mem) insns. At the time we had fixed specific locations where this was happening via explicit calls to set_unique_reg_note. Steven suggested to add some code to set_unique_reg_note as well. This patch implements this suggestion. We (AdaCore) have been using it without problems for a while now; it turned out useful to prevent a real case in the gcc-4.5 series, and we never saw the original issue resurface again since then. Bootstrapped and regtested on x86_64-suse-linux. OK to commit? Thanks in advance, With Kind Regards, Olivier

2012-04-12  Olivier Hainque  hain...@adacore.com

	* emit-rtl.c (set_unique_reg_note): Don't add REG_EQUAL or REG_EQUIV
	on a SET if the destination isn't a REG or a SUBREG of REG.

regequiv.dif Description: video/dv
Re: strengthen protection against REG_EQUIV/EQUAL on !REG set
Clarifying: On Apr 12, 2012, at 11:54 , Olivier Hainque wrote: At the time we had fixed specific locations where this was happening via explicit calls to set_unique_reg_note. We had fixed the problems observable at the time by preventing calls to set_unique_reg_notes when they would lead to invalid settings. We did that at specific locations where we had observed real problematic calls taking place. Steven suggested to add some code to set_unique_reg_note as well. This patch implements this suggestion. Olivier
[PATCH] Also use max-iteration knowledge for cost consideration
This makes us also use max-iteration knowledge when considering costs for prefetching and parallelization, instead of only relying on iteration estimates, which are sometimes not available. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard.

2012-04-11  Richard Guenther  rguent...@suse.de

	* tree-parloops.c (parallelize_loops): Also consult the upper
	bound for the number of iterations.
	* tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Likewise.
	(loop_prefetch_arrays): Likewise.

Index: trunk/gcc/tree-parloops.c
===
*** trunk.orig/gcc/tree-parloops.c	2012-04-11 16:31:08.000000000 +0200
--- trunk/gcc/tree-parloops.c	2012-04-11 16:39:25.271205694 +0200
*** parallelize_loops (void)
*** 2192,2203 ****
  	     header-copied loops correctly - see PR46886.  */
  	  || !do_while_loop_p (loop))
  	continue;
        estimated = estimated_stmt_executions_int (loop);
        /* FIXME: Bypass this check as graphite doesn't update the
! 	 count and frequency correctly now.  */
        if (!flag_loop_parallelize_all
! 	  && ((estimated !=-1
! 	       && estimated <= (HOST_WIDE_INT) n_threads * MIN_PER_THREAD)
  	      /* Do not bother with loops in cold areas.  */
  	      || optimize_loop_nest_for_size_p (loop)))
  	continue;
--- 2192,2206 ----
  	     header-copied loops correctly - see PR46886.  */
  	  || !do_while_loop_p (loop))
  	continue;
+       estimated = estimated_stmt_executions_int (loop);
+       if (estimated == -1)
+ 	estimated = max_stmt_executions_int (loop);
        /* FIXME: Bypass this check as graphite doesn't update the
! 	 count and frequency correctly now.  */
        if (!flag_loop_parallelize_all
! 	  && ((estimated != -1
! 	       && estimated <= (HOST_WIDE_INT) n_threads * MIN_PER_THREAD)
  	      /* Do not bother with loops in cold areas.  */
  	      || optimize_loop_nest_for_size_p (loop)))
  	continue;
Index: trunk/gcc/tree-ssa-loop-prefetch.c
===
*** trunk.orig/gcc/tree-ssa-loop-prefetch.c	2012-04-11 16:31:08.000000000 +0200
--- trunk/gcc/tree-ssa-loop-prefetch.c	2012-04-11 16:42:45.864193724 +0200
*** determine_loop_nest_reuse (struct loop *
*** 1549,1555 ****
        aloop = VEC_index (loop_p, vloops, i);
        vol = estimated_stmt_executions_int (aloop);
!       if (vol < 0)
  	vol = expected_loop_iterations (aloop);
        volume *= vol;
      }
--- 1549,1555 ----
        aloop = VEC_index (loop_p, vloops, i);
        vol = estimated_stmt_executions_int (aloop);
!       if (vol == -1)
  	vol = expected_loop_iterations (aloop);
        volume *= vol;
      }
*** loop_prefetch_arrays (struct loop *loop)
*** 1801,1806 ****
--- 1801,1808 ----
    ahead = (PREFETCH_LATENCY + time - 1) / time;
    est_niter = estimated_stmt_executions_int (loop);
+   if (est_niter == -1)
+     est_niter = max_stmt_executions_int (loop);
 
    /* Prefetching is not likely to be profitable if the trip count to
       ahead ratio is too small.  */
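The shape of the change is the same in each spot: when the estimated trip count is unknown (-1), fall back to the statically known upper bound before running the profitability test. A Python sketch of that decision logic — the names mirror the patch, but the model and numbers are made up for illustration:

```python
def profitable_to_parallelize(estimated, max_bound, n_threads,
                              min_per_thread, cold=False):
    """Simplified mirror of the parallelize_loops cost check.

    estimated / max_bound are statement-execution counts, -1 if unknown.
    """
    if estimated == -1:        # no estimate available...
        estimated = max_bound  # ...consult the max-iteration bound instead
    if estimated != -1 and estimated <= n_threads * min_per_thread:
        return False           # too little work to pay for the threads
    if cold:
        return False           # do not bother with loops in cold areas
    return True

# Before the patch, a loop with only an upper bound (estimate unknown)
# slipped past this check; now a bound of 10 iterations rejects it:
assert not profitable_to_parallelize(-1, 10, n_threads=4, min_per_thread=100)
assert profitable_to_parallelize(-1, 10_000, n_threads=4, min_per_thread=100)
```

The same fallback pattern appears in the two prefetch sites above, where the bound stands in for the missing estimate when sizing the prefetch-ahead distance.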
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On Wed, Apr 11, 2012 at 5:38 PM, Andi Kleen a...@firstfloor.org wrote: Igor Zamyatin izamya...@gmail.com writes: Hi All! It is known that imul placement is rather critical for Atom processors and changes try to improve imul scheduling for Atom. This gives +5% performance on several tests from new OA 2.0 testsuite from EEMBC. Tested for i386 and x86-64, ok for trunk? Did you measure how much this slows down the compiler when compiling for Atom? The new pass looks rather slow. No, since this pass is applied rarely. Conditions are mentioned in the comments to ix86_sched_reorder -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On Wed, Apr 11, 2012 at 6:10 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Apr 11, 2012 at 3:38 PM, Andi Kleen a...@firstfloor.org wrote: Igor Zamyatin izamya...@gmail.com writes: Hi All! It is known that imul placement is rather critical for Atom processors and changes try to improve imul scheduling for Atom. This gives +5% performance on several tests from new OA 2.0 testsuite from EEMBC. Tested for i386 and x86-64, ok for trunk? Did you measure how much this slows down the compiler when compiling for Atom? The new pass looks rather slow. Also please explain why adjusting the automaton for Atom is not a way to attack this issue. If I understand the question correctly - it's a dynamic check and it's not clear how to describe this adjusting statically in machine description Richard. -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On Thu, Apr 12, 2012 at 12:20 PM, Igor Zamyatin izamya...@gmail.com wrote: On Wed, Apr 11, 2012 at 6:10 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Apr 11, 2012 at 3:38 PM, Andi Kleen a...@firstfloor.org wrote: Igor Zamyatin izamya...@gmail.com writes: Hi All! It is known that imul placement is rather critical for Atom processors and changes try to improve imul scheduling for Atom. This gives +5% performance on several tests from new OA 2.0 testsuite from EEMBC. Tested for i386 and x86-64, ok for trunk? Did you measure how much this slows down the compiler when compiling for Atom? The new pass looks rather slow. Also please explain why adjusting the automaton for Atom is not a way to attack this issue. If I understand the question correctly - it's a dynamic check and it's not clear how to describe this adjusting statically in machine description From reading the code (the comments are not clear to me) you seem to produce two adjacent IMUL instructions from within the ready list - but first check that there is only a single one. So I fail to see how it can work. Can atom execute two IMUL in parallel? Or what exactly is the pipeline behavior? You miss a testcase that would show the effect of your patch. Richard. Richard. -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: [PATCH] Atom: Enabling unroll at O2 optimization level
On Wed, Apr 11, 2012 at 5:34 PM, Andi Kleen a...@firstfloor.org wrote: Richard Guenther richard.guent...@gmail.com writes: 5% is not moderate. Your patch does enable unrolling at -O2 but not -O3, why? Why do you disable register renaming? check_imull requires a function comment. This completely looks like a hack for EEMBC2.0, so it's definitely not ok. -O2 is not supposed to give the best benchmark results. Besides, it is against the Intel Optimization Manual recommendation to prefer small code on Atom, to avoid falling out of the predecode hints in the cache. Yes, this is a well-known concern for Atom. But at the same time unrolling can help a lot on in-order machines, because it provides more opportunities to the compiler scheduler. And experiments showed that unrolling can really help. So this would need much more benchmarking on macro workloads first at least. Like what, for example? I believe in this case everything also strongly depends on the test usage model (e.g. it is usually compiled with -Os, not -O2) and, let's say, internal test structure - whether there are hot loops suitable for unrolling. -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: [PATCH] Atom: Enabling unroll at O2 optimization level
On Wed, Apr 11, 2012 at 12:39 PM, Richard Guenther richard.guent...@gmail.com wrote: On Tue, Apr 10, 2012 at 8:43 PM, Igor Zamyatin izamya...@gmail.com wrote: Hi All! Here is a patch that enables unroll at O2 for Atom. This gives a good performance boost on EEMBC 2.0 (~+8% in Geomean for 32 bits) with a quite moderate code size increase (~5% for EEMBC2.0, 32 bits). 5% is not moderate. Your patch does enable unrolling at -O2 but not -O3, why? Why do you disable register renaming? check_imull requires a function comment. Sure, enabling unroll for O3 could be the next step. We can't avoid code size increase with unroll - what number do you think would be appropriate? Register renaming was the reason for several degradations during the tuning process. A comment for check_imull was added. This completely looks like a hack for EEMBC2.0, so it's definitely not ok. Why? EEMBC was measured and the result provided here just because this benchmark is considered very relevant for Atom. -O2 is not supposed to give best benchmark results. O2 is widely used, so a performance improvement could be important for users. Thanks, Richard. Tested for i386 and x86-64, ok for trunk? Updated patch attached. Thanks, Igor

ChangeLog:
2012-04-10  Yakovlev Vladimir  vladimir.b.yakov...@intel.com

	* gcc/config/i386/i386.c (check_imul): New routine.
	(ix86_loop_unroll_adjust): New target hook.
	(ix86_option_override_internal): Enable unrolling on Atom at -O2.
	(TARGET_LOOP_UNROLL_ADJUST): New define.

unroll1.patch Description: Binary data
Re: [PATCH] Atom: Enabling unroll at O2 optimization level
On Thu, Apr 12, 2012 at 1:05 PM, Igor Zamyatin izamya...@gmail.com wrote: On Wed, Apr 11, 2012 at 12:39 PM, Richard Guenther richard.guent...@gmail.com wrote: On Tue, Apr 10, 2012 at 8:43 PM, Igor Zamyatin izamya...@gmail.com wrote: Hi All! Here is a patch that enables unroll at O2 for Atom. This gives good performance boost on EEMBC 2.0 (~+8% in Geomean for 32 bits) with quite moderate code size increase (~5% for EEMBC2.0, 32 bits). 5% is not moderate. Your patch does enable unrolling at -O2 but not -O3, why? Why do you disable register renaming? check_imull requires a function comment. Sure, enabling unroll for O3 could be the next step. We can't avoid code size increase with unroll - what number do you think will be appropriate? Register renaming was the reason of several degradations during tuning process Comment for check_imull was added This completely looks like a hack for EEMBC2.0, so it's definitely not ok. Why? EEMBC was measured and result provided here just because this benchmark considers to be very relevant for Atom I'd say that SPEC INT (2000 / 2006) is more relevant for Atom (SPEC FP would be irrelevant OTOH). Similar code size for, say, Mozilla Firefox or GCC itself would be important. -O2 is not supposed to give best benchmark results. O2 is wide-used so performance improvement could be important for users. But not at a 5% size cost. Please also always check the compile-time effect which is important for -O2 as well. Richard. Thanks, Richard. Tested for i386 and x86-64, ok for trunk? Updated patch attached Thanks, Igor ChangeLog: 2012-04-10 Yakovlev Vladimir vladimir.b.yakov...@intel.com * gcc/config/i386/i386.c (check_imul): New routine. (ix86_loop_unroll_adjust): New target hook. (ix86_option_override_internal): Enable unrolling on Atom at -O2. (TARGET_LOOP_UNROLL_ADJUST): New define.
Re: [PATCH, i386, Android] -mandroid support for i386 target
This is a big change. Maybe it is better to break it into 2 parts: 1. Introduce gnu-user32.h only, which can be reviewed by x86 backend maintainers. Another possibility is to add gnu-user-common.h instead of gnu-user32.h so that no changes to config.gcc are needed. 2. Add Android support, which can reviewed by Android maintainer. Thanks. -- H.J. OK, I'll split this patch and send new patches in separate threads. Thanks Ilya
[PATCH] Fix PR52943
Data-dependence analysis gets analyzing a[3] vs. a[{3, +, -1}_1] wrong, where it claims there is no dependence. That is because analyze_siv_subscript_cst_affine uses chrec_is_positive to distinguish cases, but that always returns negative for zero (zero, in this case, being the distance between 3 and 3). The following patch makes chrec_is_positive fail for zero, which is neither positive nor negative. It also special-cases the zero-distance case because it can be handled independently of the direction or value of the increment. For the questionable handling of 'zero' I moved chrec_is_positive to a private place. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. I will apply this to the trunk and the 4.7 branch where this latent bug shows up first. Thanks, Richard.

2012-04-12  Richard Guenther  rguent...@suse.de

	PR tree-optimization/52943
	* tree-chrec.h (chrec_is_positive): Remove.
	* tree-scalar-evolution.c (chrec_is_positive): Move ...
	* tree-data-ref.c (chrec_is_positive): ... here.  Make static.
	Return false for a constant zero instead of negative.
	(analyze_siv_subscript_cst_affine): Handle zero difference
	in the initial condition explicitly.

	* gcc.dg/torture/pr52943.c: New testcase.

Index: gcc/tree-chrec.h
===
*** gcc/tree-chrec.h	(revision 186372)
--- gcc/tree-chrec.h	(working copy)
*** extern void for_each_scev_op (tree *, bo
*** 77,83 ****
  /* Observers.  */
  extern bool eq_evolutions_p (const_tree, const_tree);
  extern bool is_multivariate_chrec (const_tree);
- extern bool chrec_is_positive (tree, bool *);
  extern bool chrec_contains_symbols (const_tree);
  extern bool chrec_contains_symbols_defined_in_loop (const_tree, unsigned);
  extern bool chrec_contains_undetermined (const_tree);
--- 77,82 ----
Index: gcc/tree-scalar-evolution.c
===
*** gcc/tree-scalar-evolution.c	(revision 186372)
--- gcc/tree-scalar-evolution.c	(working copy)
*** compute_overall_effect_of_inner_loop (st
*** 501,565 ****
      return chrec_dont_know;
  }
  
- /* Determine whether the CHREC is always positive/negative.  If the expression
-    cannot be statically analyzed, return false, otherwise set the answer into
-    VALUE.  */
- 
- bool
- chrec_is_positive (tree chrec, bool *value)
- {
-   bool value0, value1, value2;
-   tree end_value, nb_iter;
- 
-   switch (TREE_CODE (chrec))
-     {
-     case POLYNOMIAL_CHREC:
-       if (!chrec_is_positive (CHREC_LEFT (chrec), &value0)
- 	  || !chrec_is_positive (CHREC_RIGHT (chrec), &value1))
- 	return false;
- 
-       /* FIXME -- overflows.  */
-       if (value0 == value1)
- 	{
- 	  *value = value0;
- 	  return true;
- 	}
- 
-       /* Otherwise the chrec is under the form: {-197, +, 2}_1,
- 	 and the proof consists in showing that the sign never
- 	 changes during the execution of the loop, from 0 to
- 	 loop->nb_iterations.  */
-       if (!evolution_function_is_affine_p (chrec))
- 	return false;
- 
-       nb_iter = number_of_latch_executions (get_chrec_loop (chrec));
-       if (chrec_contains_undetermined (nb_iter))
- 	return false;
- 
- #if 0
-       /* TODO -- If the test is after the exit, we may decrease the number of
- 	 iterations by one.  */
-       if (after_exit)
- 	nb_iter = chrec_fold_minus (type, nb_iter, build_int_cst (type, 1));
- #endif
- 
-       end_value = chrec_apply (CHREC_VARIABLE (chrec), chrec, nb_iter);
- 
-       if (!chrec_is_positive (end_value, &value2))
- 	return false;
- 
-       *value = value0;
-       return value0 == value1;
- 
-     case INTEGER_CST:
-       *value = (tree_int_cst_sgn (chrec) == 1);
-       return true;
- 
-     default:
-       return false;
-     }
- }
- 
  /* Associate CHREC to SCALAR.  */
  
  static void
--- 501,506 ----
Index: gcc/tree-data-ref.c
===
*** gcc/tree-data-ref.c	(revision 186372)
--- gcc/tree-data-ref.c	(working copy)
*** max_stmt_executions_tree (struct loop *l
*** 1718,1723 ****
--- 1718,1793 ----
      return double_int_to_tree (unsigned_type_node, nit);
  }
  
+ /* Determine whether the CHREC is always positive/negative.  If the expression
+    cannot be statically analyzed, return false, otherwise set the answer into
+    VALUE.  */
+ 
+ static bool
+ chrec_is_positive (tree chrec, bool *value)
+ {
+   bool value0, value1, value2;
+   tree end_value, nb_iter;
+ 
+   switch (TREE_CODE (chrec))
+     {
+     case POLYNOMIAL_CHREC:
+       if (!chrec_is_positive (CHREC_LEFT (chrec), &value0)
+ 	  || !chrec_is_positive (CHREC_RIGHT (chrec), &value1))
+ 	return false;
+ 
+       /* FIXME -- overflows.  */
+       if (value0 == value1)
+ 	{
+ 	  *value = value0;
+ 	  return true;
+ 	}
+ 
+ 
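The underlying point is that for a distance of zero, "positive" and "negative" are both wrong answers: a[3] and a[{3, +, -1}_1] touch the same element at the initial condition, so there is a dependence. A Python sketch of the corrected three-valued test for the constant case — an illustration of the idea, not the GCC code:

```python
def chrec_is_positive_const(value):
    """Sign test for a constant chrec: returns (ok, positive).

    ok is False for zero: zero is neither positive nor negative, so a
    caller must special-case a zero distance instead of letting a sign
    pick a dependence direction.
    """
    if value == 0:
        return (False, None)
    return (True, value > 0)

# Distance between a[3] and a[{3, +, -1}_1] at the initial condition:
distance = 3 - 3
ok, _ = chrec_is_positive_const(distance)
# The old code answered "negative" here and wrongly concluded there
# was no dependence; now the sign test simply refuses to answer.
assert not ok
assert chrec_is_positive_const(-1) == (True, False)
assert chrec_is_positive_const(2) == (True, True)
```

With the sign test failing on zero, the zero-distance case is handled on its own, which is safe regardless of the direction or value of the increment.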
Re: [i386, patch, RFC] HLE support in GCC
On Thu, Apr 12, 2012 at 12:38:44AM +0200, Torvald Riegel wrote: On Wed, 2012-04-11 at 15:06 +0200, Andi Kleen wrote: Tests passing, bootstrap in progress. Comments? Do you really imply ACQUIRE/RELEASE with HLE_ACQUIRE/RELEASE now? I don't see that in the code. I think that's really required, otherwise the optimizer will do the wrong thing and move memory references outside the region. Perhaps HLE_ACQUIRE / HLE_RELEASE should be something like HLE_START / HLE_END instead? Not particularly great names, but at least it avoids overloading ACQUIRE/RELEASE and thus should make it clearer that you still need to specify a memory order. It still seems wrong to me. HLE is an x86 construct, so weaker memory orders on the compiler level than what the instruction implements do not really make sense to me. And the instruction just has LOCK semantics. Currently it's highly error-prone -- on the Rusty Russell hard-to-misuse scale, not higher than 1 as is [1]. At the minimum it would need a warning with RELAXED, as suggested by Jakub earlier. I agree with Jakub that users really should specify memory order bits if they want ordering. Andi, I also see your point regarding catching bugs, but this is really expert stuff, and my hope is that we can make HLE really transparent or at least provide better abstractions around it. At least this form of HLE cannot be transparent; it has to be annotated by the programmer. -Andi [1] http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html -- a...@linux.intel.com -- Speaking for myself only.
Re: [PATCH] Atom: Scheduler improvements for better imul placement
Can atom execute two IMUL in parallel? Or what exactly is the pipeline behavior? As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycles latency instruction can be issued after IMUL without a stall. In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions. From reading the patch, I could not understand the link between pipeline behavior and what the patch appears to do. Hope that helps. Alexander
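[Editorial note: the issue rule Alexander describes can be written down as a tiny model. The 4-cycle figure and the function name below are taken from the prose above and are illustrative assumptions, not a cycle-accurate Atom simulator.]

```cpp
#include <cassert>

// Minimal sketch of the IMUL issue rule described above: an instruction
// issued right after IMUL stalls for IMUL's full latency if its own
// latency is shorter, and issues back-to-back (one cycle later) if it is
// itself a long-latency (4-or-more cycle) instruction.  The latency
// value and the function name are assumptions taken from the prose.
static int delay_after_imul(int follower_latency) {
  const int imul_latency = 4;
  return follower_latency < imul_latency ? imul_latency : 1;
}
```

Under this model, interleaving IMULs with other long-latency operations hides the multiplier latency, which is what the scheduling patch discussed in this thread tries to exploit.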
Re: [IA-64] Fix PR target/48496
Is the standard condition for define_memory_constraint here

	  /* Likewise if the address will be reloaded because
	     reg_equiv_address is nonzero.  For reg_equiv_mem we have
	     to check.  */
	  else if (REG_P (operand)
		   && REGNO (operand) >= FIRST_PSEUDO_REGISTER
		   && reg_renumber[REGNO (operand)] < 0
		   && ((reg_equiv_mem (REGNO (operand)) != 0
			&& EXTRA_CONSTRAINT_STR (reg_equiv_mem (REGNO (operand)), c, p))
		       || (reg_equiv_address (REGNO (operand)) != 0)))
	    win = 1;

If so, shouldn't you check those conditions as well, or at least something similar? Not sure if reg_equiv_address needs to be allowed there, and guess reg_equiv_mem should satisfy the Q constraint, i.e. !MEM_VOLATILE_P && memory_operand. Accepting any pseudo there sounds too risky to me...

You're right, and modifying a constraint to silence a bogus error is probably too dangerous in any case. And there may be other affected architectures. So patch withdrawn. The best fix is very likely in reload1.c in the end.

-- 
Eric Botcazou
[v3] libstdc++/52942
Hi,

this is what I'm going to commit to mainline and 4_7-branch to fix the regression. Tested x86_64-linux, debug-mode too.

Thanks,
Paolo.

2012-04-12  Paolo Carlini  <paolo.carl...@oracle.com>

	PR libstdc++/52942
	* include/bits/stl_function.h (_Identity, _Select1st, _Select2nd):
	In C++11 mode do not derive from std::unary_function.
	* include/ext/functional (identity, select1st, select2nd): Adjust.
	* testsuite/23_containers/unordered_map/requirements/52942.cc: New.
	* testsuite/23_containers/unordered_set/requirements/52942.cc:
	Likewise.

Index: include/ext/functional
===================================================================
--- include/ext/functional	(revision 186366)
+++ include/ext/functional	(working copy)
@@ -1,6 +1,6 @@
 // Functional extensions -*- C++ -*-
 
-// Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010
+// Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010, 2012
 // Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
@@ -183,7 +183,13 @@
    *  @addtogroup SGIextensions
    */
   template <class _Tp>
-    struct identity : public std::_Identity<_Tp> {};
+    struct identity
+#ifdef __GXX_EXPERIMENTAL_CXX0X__
+    : public std::unary_function<_Tp, _Tp>,
+      public std::_Identity<_Tp> {};
+#else
+    : public std::_Identity<_Tp> {};
+#endif
 
   /** @c select1st and @c select2nd are extensions provided by SGI.  Their
    *  @c operator()s
@@ -197,11 +203,23 @@
    */
   /// An \link SGIextensions SGI extension \endlink.
   template <class _Pair>
-    struct select1st : public std::_Select1st<_Pair> {};
+    struct select1st
+#ifdef __GXX_EXPERIMENTAL_CXX0X__
+    : public std::unary_function<_Pair, typename _Pair::first_type>,
+      public std::_Select1st<_Pair> {};
+#else
+    : public std::_Select1st<_Pair> {};
+#endif
 
   /// An \link SGIextensions SGI extension \endlink.
   template <class _Pair>
-    struct select2nd : public std::_Select2nd<_Pair> {};
+    struct select2nd
+#ifdef __GXX_EXPERIMENTAL_CXX0X__
+    : public std::unary_function<_Pair, typename _Pair::second_type>,
+      public std::_Select2nd<_Pair> {};
+#else
+    : public std::_Select2nd<_Pair> {};
+#endif
 
 /** @} */
 
 // extension documented next

Index: include/bits/stl_function.h
===================================================================
--- include/bits/stl_function.h	(revision 186372)
+++ include/bits/stl_function.h	(working copy)
@@ -1,6 +1,7 @@
 // Functor implementations -*- C++ -*-
 
-// Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010, 2011
+// Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010,
+// 2011, 2012
 // Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
@@ -471,7 +472,12 @@
   /** @} */
 
   template<typename _Tp>
-    struct _Identity : public unary_function<_Tp, _Tp>
+    struct _Identity
+#ifndef __GXX_EXPERIMENTAL_CXX0X__
+    // unary_function itself is deprecated in C++11 and deriving from
+    // it can even be a nuisance (see PR 52942).
+    : public unary_function<_Tp, _Tp>
+#endif
     {
       _Tp&
       operator()(_Tp& __x) const
@@ -483,8 +489,10 @@
     };
 
   template<typename _Pair>
-    struct _Select1st : public unary_function<_Pair,
-					      typename _Pair::first_type>
+    struct _Select1st
+#ifndef __GXX_EXPERIMENTAL_CXX0X__
+    : public unary_function<_Pair, typename _Pair::first_type>
+#endif
     {
       typename _Pair::first_type&
       operator()(_Pair& __x) const
@@ -508,8 +516,10 @@
     };
 
   template<typename _Pair>
-    struct _Select2nd : public unary_function<_Pair,
-					      typename _Pair::second_type>
+    struct _Select2nd
+#ifndef __GXX_EXPERIMENTAL_CXX0X__
+    : public unary_function<_Pair, typename _Pair::second_type>
+#endif
     {
       typename _Pair::second_type&
       operator()(_Pair& __x) const

Index: testsuite/23_containers/unordered_map/requirements/52942.cc
===================================================================
--- testsuite/23_containers/unordered_map/requirements/52942.cc	(revision 0)
+++ testsuite/23_containers/unordered_map/requirements/52942.cc	(revision 0)
@@ -0,0 +1,37 @@
+// { dg-do compile }
+// { dg-options "-std=gnu++11" }
+
+// Copyright (C) 2012 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+//
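[Editorial note: the "nuisance" the new comment refers to can be illustrated without libstdc++ internals. The sketch below uses a stand-in template instead of the deprecated std::unary_function, and it shows one generic failure mode of such derivation (ambiguous nested typedefs through multiple bases), not necessarily the exact failure mode of PR 52942.]

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Stand-in for std::unary_function (deprecated in C++11, removed in
// C++17); names below are illustrative, not libstdc++'s.
template <typename Arg, typename Result>
struct unary_function_like {
  typedef Arg argument_type;
  typedef Result result_type;
};

// Pre-patch style: the library-internal helper derives from the base.
template <typename T>
struct identity_helper : unary_function_like<T, T> {
  const T& operator()(const T& x) const { return x; }
};

// A user functor that also derives from the base, with different types.
struct to_length : unary_function_like<const char*, std::size_t> {};

// Inheriting both makes 'argument_type' ambiguous: spelling
// 'combined::argument_type' here would be a compile error.
struct combined : identity_helper<const char*>, to_length {};

// Post-patch style: the helper no longer derives from the base, so the
// nested-typedef lookup below is unambiguous again.
template <typename T>
struct identity_helper_fixed {
  const T& operator()(const T& x) const { return x; }
};
struct combined_fixed : identity_helper_fixed<const char*>, to_length {};

static_assert(std::is_same<combined_fixed::argument_type,
                           const char*>::value,
              "only to_length provides argument_type now");
```

This mirrors the direction of the patch: keep the typedef-carrying base only where the documented SGI extension requires it, and drop it from the internal helpers.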
Re: [cxx-conversion] Switch default to build in C++
On 4/12/12 4:31 AM, Richard Guenther wrote: On Wed, Apr 11, 2012 at 6:37 PM, Diego Novillo dnovi...@google.com wrote: This patch makes the branch build in C++ mode by default. Tested on x86_64. Committed. Note that for merging the branch the support to build with a C host compiler should be removed (and docs updated). Thanks. Yes, I already had that in mind. Diego.
Re: [i386, patch, RFC] HLE support in GCC
On 04/11/2012 06:38 PM, Torvald Riegel wrote: On Wed, 2012-04-11 at 15:06 +0200, Andi Kleen wrote: Perhaps HLE_ACQUIRE / HLE_RELEASE should be something like HLE_START / HLE_END instead? Not particularly great names, but at least it avoids overloading ACQUIRE/RELEASE and thus should make it clearer that you still need to specify a memory order. Does it make any sense to simply predefine the possible valid combinations with the HLE bit already set? it at least removes any possible invalid combinations and forces the programmer to consciously choose their memory model. ie, __ATOMIC_HLE_XACQ_CONSUME __ATOMIC_HLE_XACQ_ACQUIRE __ATOMIC_HLE_XACQ_ACQ_REL __ATOMIC_HLE_XACQ_SEQ_CST __ATOMIC_HLE_XREL_RELEASE __ATOMIC_HLE_XREL_ACQ_REL __ATOMIC_HLE_XREL_SEQ_CST or whatever happens to be valid... Doesn't really scale to adding more new bits later, but perhaps that doesn't matter. Just a thought. Andrew
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov amona...@ispras.ru wrote: Can atom execute two IMUL in parallel? Or what exactly is the pipeline behavior? As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycles latency instruction can be issued after IMUL without a stall. In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions. It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
                    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                     atom-port-0")

;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")

at least from my very limited guessing of what the above does. So, did you analyze why the scheduler does not properly schedule things for you? Richard. From reading the patch, I could not understand the link between pipeline behavior and what the patch appears to do. Hope that helps. Alexander
Re: [i386, patch, RFC] HLE support in GCC
On Thu, Apr 12, 2012 at 08:21:47AM -0400, Andrew MacLeod wrote: On 04/11/2012 06:38 PM, Torvald Riegel wrote: On Wed, 2012-04-11 at 15:06 +0200, Andi Kleen wrote: Perhaps HLE_ACQUIRE / HLE_RELEASE should be something like HLE_START / HLE_END instead? Not particularly great names, but at least it avoids overloading ACQUIRE/RELEASE and thus should make it clearer that you still need to specify a memory order. Does it make any sense to simply predefine the possible valid combinations with the HLE bit already set? it at least removes any possible invalid combinations and forces the programmer to consciously choose their memory model. ie, __ATOMIC_HLE_XACQ_CONSUME __ATOMIC_HLE_XACQ_ACQUIRE __ATOMIC_HLE_XACQ_ACQ_REL __ATOMIC_HLE_XACQ_SEQ_CST __ATOMIC_HLE_XREL_RELEASE __ATOMIC_HLE_XREL_ACQ_REL __ATOMIC_HLE_XREL_SEQ_CST or whatever happens to be valid... Doesn't really scale to adding more new bits later, but perhaps that doesn't matter. I'd prefer not to predefine these. They can be surely defined in some intrinsic header, but the number of predefined macros is already huge and is quite costly (it appears in all -g3 macro info, increases compiler initialization time even for empty sources, etc.). Jakub
Re: [i386, patch, RFC] HLE support in GCC
On Thu, 2012-04-12 at 13:36 +0200, Andi Kleen wrote: On Thu, Apr 12, 2012 at 12:38:44AM +0200, Torvald Riegel wrote: On Wed, 2012-04-11 at 15:06 +0200, Andi Kleen wrote: Tests passing, bootstrap in progress. Comments? Do you really imply ACQUIRE/RELEASE with HLE_ACQUIRE/RELEASE now? I don't see that in the code. I think that's really required, otherwise the optimizer will do the wrong thing and move memory references outside the region. Perhaps HLE_ACQUIRE / HLE_RELEASE should be something like HLE_START / HLE_END instead? Not particularly great names, but at least it avoids overloading ACQUIRE/RELEASE and thus should make it clearer that you still need to specify a memory order. It still seems wrong to me. HLE is an x86 construct, so weaker memory orders on the compiler level than what the instruction implements do not really make sense to me. And the instruction just has LOCK semantics. What if another vendor shows up, perhaps on another architecture? Currently it's highly error-prone: on Rusty Russell's hard-to-misuse scale it scores no higher than 1 as is [1]. It would be a three, if the patch contained documentation of the additional bit. I guess that's a hint :) It could even be a four, depending on the point of view. Not from the POV of the Intel HLE feature. But from a conceptual POV, HLE is pretty independent of memory orders. At the minimum it would need a warning with RELAXED as suggested by Jakub earlier. That could be helpful. However, I'm not 100% sure that HLE is only useful paired with acquire/release memory orders in general (i.e., possibly on other archs). For example, if you only care about having acq_rel atomics being protected by your possibly-elided lock, then they won't get moved out of the critical section (unless I read the C++11 memory model too conservatively). Therefore, given that I don't see the atomic builtins being used by lots of programmers, I'd rather make them more general.
I agree with Jakub that users really should specify memory order bits, if they want ordering. Andi, I also see your point regarding catching bugs, but this is really expert stuff, and my hope is that we can make HLE really transparent or at least provide better abstractions around it. At least this form of HLE cannot be transparent, it has to be annotated by the programmer. Let me elaborate. The point I was trying to make is that it should be transparent for the non-concurrency-expert programmer. Or at least make this specific detail we're discussing here transparent (i.e., whether HLE_ACQUIRE should imply a certain memory order). That is, if non-experts only see a default lock implementation, or see a lock implementation with a use-HLE-here-if-possible flag, then they don't have to deal with memory orders anyway.
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov amona...@ispras.ru wrote: Can atom execute two IMUL in parallel? Or what exactly is the pipeline behavior? As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycles latency instruction can be issued after IMUL without a stall. In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions. It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
                    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                     atom-port-0")

;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")

The main idea is quite simple: if we are going to schedule an IMUL instruction (it is at the top of the ready list), we try to find the producer of another (independent) IMUL instruction that is in the ready list too. The goal is to try to schedule such a producer so that another IMUL gets into the ready list and we get two successive IMUL instructions scheduled. And the MD only allows us to prefer scheduling of successive IMUL instructions, i.e. if an IMUL was just scheduled and the ready list contains another IMUL instruction, then it will be chosen as the best candidate for scheduling. at least from my very limited guessing of what the above does. So, did you analyze why the scheduler does not properly schedule things for you? Richard. From reading the patch, I could not understand the link between pipeline behavior and what the patch appears to do. Hope that helps. Alexander
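[Editorial note: the heuristic Igor describes can be sketched independently of GCC's scheduler interface. Every structure and name below is illustrative; the real patch hooks into the i386 backend's scheduling callbacks rather than using a standalone function like this.]

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of the reordering described above: when an IMUL sits at the
// top of the ready list, look for another ready instruction whose result
// feeds a not-yet-ready IMUL, and issue that producer first, so that a
// second IMUL joins the ready list and two IMULs can be scheduled
// back-to-back.
struct ToyInsn {
  std::string name;
  bool is_imul;
  std::string produces;  // value this insn defines ("" if none)
  std::string consumes;  // value this insn waits on ("" if none)
};

// Returns the index into `ready` of the instruction to issue next.
static std::size_t pick_next(const std::vector<ToyInsn>& ready,
                             const std::vector<ToyInsn>& not_ready) {
  if (ready.empty() || !ready.front().is_imul)
    return 0;  // default: keep the scheduler's order
  for (std::size_t i = 1; i < ready.size(); ++i)
    for (const ToyInsn& w : not_ready)
      if (w.is_imul && !w.consumes.empty()
          && w.consumes == ready[i].produces)
        return i;  // schedule the producer of another IMUL first
  return 0;
}
```

With a ready list of {imul1, add producing t} and a waiting imul2 that consumes t, the sketch picks the add, so imul2 becomes ready next and pairs with imul1.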
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin izamya...@gmail.com wrote: On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov amona...@ispras.ru wrote: Can atom execute two IMUL in parallel? Or what exactly is the pipeline behavior? As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycles latency instruction can be issued after IMUL without a stall. In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions. It seems to be modeled in the pipeline description though: ;;; imul insn has 5 cycles latency (define_reservation atom-imul-32 atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4, atom-port-0) ;;; imul instruction excludes other non-FP instructions. (exclusion_set atom-eu-0, atom-eu-1 atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4) The main idea is quite simple: If we are going to schedule IMUL instruction (it is on the top of ready list) we try to find out producer of other (independent) IMUL instruction that is in ready list too. The goal is try to schedule such a producer to get another IMUL in ready list and get scheduling of 2 successive IMUL instructions. Why does that not happen without your patch? Does it never happen without your patch or does it merely not happen for one EEMBC benchmark (can you provide a testcase?)? And MD allows us only prefer scheduling of successive IMUL instructions, i.e. If IMUL was just scheduled and ready list contains another IMUL instruction then it will be chosen as the best candidate for scheduling. at least from my very limited guessing of what the above does. So, did you analyze why the scheduler does not properly schedule things for you? Richard. 
From reading the patch, I could not understand the link between pipeline behavior and what the patch appears to do. Hope that helps. Alexander
Re: [i386, patch, RFC] HLE support in GCC
Hi Andrew, Does it make any sense to simply predefine the possible valid combinations with the HLE bit already set? It at least removes any possible invalid combinations and forces the programmer to consciously choose their memory model. ie, __ATOMIC_HLE_XACQ_CONSUME __ATOMIC_HLE_XACQ_ACQUIRE __ATOMIC_HLE_XACQ_ACQ_REL __ATOMIC_HLE_XACQ_SEQ_CST __ATOMIC_HLE_XREL_RELEASE __ATOMIC_HLE_XREL_ACQ_REL __ATOMIC_HLE_XREL_SEQ_CST or whatever happens to be valid... Doesn't really scale to adding more new bits later, but perhaps that doesn't matter. The idea sounds good to me. It would certainly be much harder to misuse, so generally a better interface. As to what combinations make sense: an HLE region ACQUIRE has somewhat interesting ordering semantics. It's a fairly strong barrier (LOCK prefix) for reads and writes. The HLE RELEASE is either LOCK too, or MOV without LOCK. If it's running transactionally, the whole block acts like a LOCK too. But we have to use the weakest. I suppose that would map to always _SEQ_CST for most instructions, except for the MOV release, which can be _RELEASE too (and would need an additional MFENCE generated for anything stronger). Probably there is not a lot of value in allowing the optimizer weaker models than what the CPU does. -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: [i386, patch, RFC] HLE support in GCC
What if another vendor shows up, perhaps on another architecture? HLE code is currently usually x86 specific. e.g. for practical spin locks you have to include a __builtin_ia32_pause() when the lock is locked to stop speculation, otherwise the lock path will speculate too, which is very inefficient. So if you wanted abstracted HLE, you would need more abstracted builtins first. Currently it's highly error-prone: on Rusty Russell's hard-to-misuse scale it scores no higher than 1 as is [1]. It would be a three, if the patch contained documentation of the additional bit. I guess that's a hint :) Yes :) However, I'm not 100% sure that HLE is only useful paired with acquire/release memory orders in general (i.e., possibly on other archs). It's actually stronger on the instruction level (see other mail), except for the MOV release. -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: [i386, patch, RFC] HLE support in GCC
__ATOMIC_HLE_XACQ_CONSUME __ATOMIC_HLE_XACQ_ACQUIRE __ATOMIC_HLE_XACQ_ACQ_REL __ATOMIC_HLE_XACQ_SEQ_CST __ATOMIC_HLE_XREL_RELEASE __ATOMIC_HLE_XREL_ACQ_REL __ATOMIC_HLE_XREL_SEQ_CST or whatever happens to be valid... Doesn't really scale to adding more new bits later, but perhaps that doesn't matter. I'd prefer not to predefine these. They can be surely defined in some intrinsic header. I think only a very small number make sense for x86 HLE: two, or three if you include the weak MOV RELEASE. That's only one more than currently. -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On 12.04.2012 16:38, Richard Guenther wrote: On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin izamya...@gmail.com wrote: On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov amona...@ispras.ru wrote: Can atom execute two IMUL in parallel? Or what exactly is the pipeline behavior? As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycles latency instruction can be issued after IMUL without a stall. In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions. It seems to be modeled in the pipeline description though: ;;; imul insn has 5 cycles latency (define_reservation "atom-imul-32" "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4, atom-port-0") ;;; imul instruction excludes other non-FP instructions. (exclusion_set "atom-eu-0, atom-eu-1" "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4") The main idea is quite simple: If we are going to schedule IMUL instruction (it is on the top of ready list) we try to find out producer of other (independent) IMUL instruction that is in ready list too. The goal is to try to schedule such a producer to get another IMUL in ready list and get scheduling of 2 successive IMUL instructions. Why does that not happen without your patch? Does it never happen without your patch or does it merely not happen for one EEMBC benchmark (can you provide a testcase?)? It does not happen because the scheduler by itself does not do such specific reordering. That said, it is easy to imagine cases where this patch will make things worse rather than better. Igor, why not try different, subtler mechanisms like adjust_priority, which gets called when an insn is added to the ready list? E.g. increase the producer's priority.
The patch as is misses checks for NONDEBUG_INSN_P. Also, why bail out when you have more than one imul in the ready list? Don't you want to bump the priority of the other imul found? Andrey And MD allows us only prefer scheduling of successive IMUL instructions, i.e. If IMUL was just scheduled and ready list contains another IMUL instruction then it will be chosen as the best candidate for scheduling. at least from my very limited guessing of what the above does. So, did you analyze why the scheduler does not properly schedule things for you? Richard. From reading the patch, I could not understand the link between pipeline behavior and what the patch appears to do. Hope that helps. Alexander
Re: [patch optimization]: Add some basic folding for ==/!= comparison
On Thu, Apr 5, 2012 at 6:15 PM, Kai Tietz ktiet...@googlemail.com wrote:

Hello,

this patch adds some basic folding capabilities to fold-const for equal and non-equal comparisons on integer-typed arguments.

ChangeLog

2012-04-05  Kai Tietz  kti...@redhat.com

	* fold-const.c (fold_comparison_1): New function.
	(fold_comparison): Use fold_comparison_1.

2012-04-05  Kai Tietz  kti...@redhat.com

	* gcc.dg/fold-compare-1.c: Adjust matching rule for a == b
	without argument swap.
	* gcc.dg/fold-compare-7.c: New test.

Regression tested for x86_64-unknown-linux-gnu for all languages (including Ada and Obj-C++). Ok for apply?

Regards,
Kai

Index: gcc/gcc/fold-const.c
===================================================================
--- gcc.orig/gcc/fold-const.c
+++ gcc/gcc/fold-const.c
@@ -8739,6 +8739,241 @@ pointer_may_wrap_p (tree base, tree offs
   return total.low > (unsigned HOST_WIDE_INT) size;
 }
 
+/* Sub-routine of fold_comparison.  Folding of EQ_EXPR/NE_EXPR
+   comparisons with integral typed arguments.  */
+
+static tree
+fold_comparison_1 (location_t loc, enum tree_code code, tree type,
+		   tree arg0, tree arg1)

Please name it more specific, like for example fold_integral_equality_comparison.

+{
+  enum tree_code c1, c2;
+  tree optype, op0, op1, opr0, opr1, tem;
+
+  if (code != NE_EXPR && code != EQ_EXPR)
+    return NULL_TREE;
+
+  optype = TREE_TYPE (arg0);
+  if (!INTEGRAL_TYPE_P (optype))
+    return NULL_TREE;
+
+  c1 = TREE_CODE (arg0);
+  c2 = TREE_CODE (arg1);
+
+  /* Integer constant is on right-hand side.  */
+  if (c1 == INTEGER_CST
+      && c2 != c1)
+    return fold_build2_loc (loc, code, type, arg1, arg0);

  /* If one arg is a real or integer constant, put it last.  */
  if (tree_swap_operands_p (arg0, arg1, true))
    return fold_build2_loc (loc, swap_tree_comparison (code), type, op1, op0);

in fold_comparison already ensures this.

+  if (!TREE_SIDE_EFFECTS (arg0)
+      && operand_equal_p (arg0, arg1, 0))
+    {
+      if (code == EQ_EXPR)
+	return build_one_cst (type);
+      return build_zero_cst (type);
+    }

This is already done in a more general way in fold_comparison:

  /* Simplify comparison of something with itself.  (For IEEE
     floating-point, we can only do some of these simplifications.)  */
  if (operand_equal_p (arg0, arg1, 0))
    {
      switch (code)
	{
	case EQ_EXPR:
...

which also shows how to fold to true/false - using constant_boolean_node.

+
+  if (c1 == NEGATE_EXPR)
+    {
+      op0 = TREE_OPERAND (arg0, 0);
+      /* -X ==/!= -Y -> X ==/!= Y.  */
+      if (c2 == c1)
+	return fold_build2_loc (loc, code, type,
+				op0,
+				TREE_OPERAND (arg1, 0));

This is already done, in a more general way but only for float types, in fold_comparison. It's beyond me why it is conditional on float types there and does not check for trapping math and NaNs (maybe that's well-defined - one would need to double-check). For integral types you'd have to care for undefined overflow (or issue a warning), and ...

+      /* -X ==/!= CST -> X ==/!= CST' with CST' = -CST.  */
+      else if (c2 == INTEGER_CST)
+	return fold_build2_loc (loc, code, type, op0,
+				fold_build1_loc (loc, NEGATE_EXPR,
+						 optype, arg1));

... generalizing this the code should use negate_expr_p / negate_expr to for example handle folding -b != b - a to b != a - b (of course you'd require at least one explicit NEGATE_EXPR - similar foldings elsewhere will tell you what to do).

+    }
+  else if (c1 == BIT_NOT_EXPR)
+    {
+      op0 = TREE_OPERAND (arg0, 0);
+      /* ~X ==/!= ~Y -> X ==/!= Y.  */
+      if (c2 == c1)
+	return fold_build2_loc (loc, code, type, op0,
+				TREE_OPERAND (arg1, 0));

This can be generalized to relational comparisons as well I think. It's also already done in fold_comparison here:

  /* Fold ~X op ~Y as Y op X.  */
  if (TREE_CODE (arg0) == BIT_NOT_EXPR
      && TREE_CODE (arg1) == BIT_NOT_EXPR)
    {

+      /* ~X ==/!= CST -> X ==/!= CST' with CST' = ~CST.  */
+      else if (c2 == INTEGER_CST)
+	return fold_build2_loc (loc, code, type, op0,
+				fold_build1_loc (loc, BIT_NOT_EXPR,
+						 optype, arg1));

Possibly unified with having a new predicate/worker invert_expr_p / invert_expr.

+    }
+
+  /* See if we can simplify case X == (Y +|-|^ Z).  */
+  if (c1 != PLUS_EXPR && c1 != MINUS_EXPR && c1 != BIT_XOR_EXPR)
+    {
+      if ((c2 != PLUS_EXPR && c2 != MINUS_EXPR
+	   && c2 != BIT_XOR_EXPR)
+	  || TREE_SIDE_EFFECTS (arg0))
+	return NULL_TREE;

(I'm not sure why you are testing for side-effects - if you omit sth use omit_*_operand ())

+
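[Editorial aside: the ==/!= identities being reviewed here are easy to sanity-check numerically. The sketch below deliberately uses unsigned arithmetic so negation and complement wrap without undefined behavior, sidestepping the signed-overflow concern Richard raises; for signed types the fold needs the extra care he describes.]

```cpp
#include <cassert>
#include <cstdint>

// Checks that the folds discussed above preserve the comparison result
// for a given pair of inputs:
//   -X == CST  <=>  X == -CST
//   ~X == CST  <=>  X == ~CST
// Unsigned arithmetic is used deliberately so that negation wraps.
static bool neg_fold_holds(uint32_t x, uint32_t c) {
  return ((0u - x) == c) == (x == (0u - c));
}

static bool not_fold_holds(uint32_t x, uint32_t c) {
  return ((~x) == c) == (x == (~c));
}
```

Checking a few boundary values (0, 1, 0x80000000, ~0) is usually enough to catch a sign or off-by-one error in folds like these.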
Re: [RS6000] Stack tie fix.
On Wed, Apr 11, 2012 at 11:00:04AM +0200, Richard Guenther wrote: On Wed, Apr 11, 2012 at 6:15 AM, Alan Modra amo...@gmail.com wrote: Well - you are expecting generic code to understand your arch-specific UNSPEC. It of course cannot. Right. (USE (mem:BLK (reg 1))) (CLOBBER (mem:BLK (reg 1))) repeated, for each register (maybe you can avoid the USE if the CLOBBER

I tried that. It doesn't work without something else in the insn to stop rtl-dce deleting it, so you may as well use SETs. But thanks for the prod in the right direction. We do get slightly better results when the regs are not hidden away in an UNSPEC, for instance non-stack writes/reads are seen by the alias oracle to not conflict with the epilogue frame deallocation. Bootstrapped etc. powerpc-linux. OK to apply, David?

	PR target/52828
	* config/rs6000/rs6000.c (rs6000_emit_stack_tie): Rewrite with
	tie regs on destination of sets.  Delete forward declaration.
	(rs6000_emit_stack_reset): Update rs6000_emit_stack_tie calls.
	(rs6000_emit_prologue): Likewise.
	(rs6000_emit_epilogue): Likewise.  Use in place of gen_frame_tie
	and gen_stack_tie.
	(is_mem_ref): Use tie_operand to recognise stack ties.
	* config/rs6000/predicates.md (tie_operand): New.
	* config/rs6000/rs6000.md (UNSPEC_TIE): Delete.
	(restore_stack_block): Generate new stack tie rtl.
	(restore_stack_nonlocal): Likewise.
	(stack_tie): Update.
	(frame_tie): Delete.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 186373)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -925,7 +925,6 @@ static const char *rs6000_invalid_within_doloop (c
 static bool rs6000_legitimate_address_p (enum machine_mode, rtx, bool);
 static bool rs6000_debug_legitimate_address_p (enum machine_mode, rtx, bool);
 static rtx rs6000_generate_compare (rtx, enum machine_mode);
-static void rs6000_emit_stack_tie (void);
 static bool spe_func_has_64bit_regs_p (void);
 static rtx gen_frame_mem_offset (enum machine_mode, rtx, int);
 static unsigned rs6000_hash_constant (rtx);
@@ -18505,12 +18504,29 @@ rs6000_aix_asm_output_dwarf_table_ref (char * fram
    and the change to the stack pointer.  */
 
 static void
-rs6000_emit_stack_tie (void)
+rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)
 {
-  rtx mem = gen_frame_mem (BLKmode,
-			   gen_rtx_REG (Pmode, STACK_POINTER_REGNUM));
+  rtvec p;
+  int i;
+  rtx regs[3];
 
-  emit_insn (gen_stack_tie (mem));
+  i = 0;
+  regs[i++] = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
+  if (hard_frame_needed)
+    regs[i++] = gen_rtx_REG (Pmode, HARD_FRAME_POINTER_REGNUM);
+  if (!(REGNO (fp) == STACK_POINTER_REGNUM
+	|| (hard_frame_needed
+	    && REGNO (fp) == HARD_FRAME_POINTER_REGNUM)))
+    regs[i++] = fp;
+
+  p = rtvec_alloc (i);
+  while (--i >= 0)
+    {
+      rtx mem = gen_frame_mem (BLKmode, regs[i]);
+      RTVEC_ELT (p, i) = gen_rtx_SET (VOIDmode, mem, const0_rtx);
+    }
+
+  emit_insn (gen_stack_tie (gen_rtx_PARALLEL (VOIDmode, p)));
 }
 
 /* Emit the correct code for allocating stack space, as insns.
@@ -19130,7 +19146,7 @@ rs6000_emit_stack_reset (rs6000_stack_t *info,
       || (TARGET_SPE_ABI
	   && info->spe_64bit_regs_used != 0
	   && info->first_gp_reg_save != 32))
-    rs6000_emit_stack_tie ();
+    rs6000_emit_stack_tie (frame_reg_rtx, frame_pointer_needed);
 
   if (frame_reg_rtx != sp_reg_rtx)
     {
@@ -19350,7 +19366,7 @@ rs6000_emit_prologue (void)
 	}
       rs6000_emit_allocate_stack (info->total_size, copy_reg);
       if (frame_reg_rtx != sp_reg_rtx)
-	rs6000_emit_stack_tie ();
+	rs6000_emit_stack_tie (frame_reg_rtx, false);
     }
 
   /* Handle world saves specially here.  */
@@ -19854,7 +19870,7 @@ rs6000_emit_prologue (void)
	  sp_offset = info->total_size;
       rs6000_emit_allocate_stack (info->total_size, copy_reg);
       if (frame_reg_rtx != sp_reg_rtx)
-	rs6000_emit_stack_tie ();
+	rs6000_emit_stack_tie (frame_reg_rtx, false);
     }
 
   /* Set frame pointer, if needed.  */
@@ -20425,13 +20441,7 @@ rs6000_emit_epilogue (int sibcall)
   /* Prevent reordering memory accesses against stack pointer restore.  */
   else if (cfun->calls_alloca
	   || offset_below_red_zone_p (-info->total_size))
-    {
-      rtx mem1 = gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx);
-      rtx mem2 = gen_rtx_MEM (BLKmode, sp_reg_rtx);
-      MEM_NOTRAP_P (mem1) = 1;
-      MEM_NOTRAP_P (mem2) = 1;
-      emit_insn (gen_frame_tie (mem1, mem2));
-    }
+    rs6000_emit_stack_tie (frame_reg_rtx, true);
 
   insn = emit_insn (gen_add3_insn (frame_reg_rtx, hard_frame_pointer_rtx,
				    GEN_INT (info->total_size)));
@@ -20444,11 +20454,7 @@ rs6000_emit_epilogue (int sibcall)
   /* Prevent reordering memory
Re: [i386, patch, RFC] HLE support in GCC
On Thu, 2012-04-12 at 08:21 -0400, Andrew MacLeod wrote: On 04/11/2012 06:38 PM, Torvald Riegel wrote: On Wed, 2012-04-11 at 15:06 +0200, Andi Kleen wrote: Perhaps HLE_ACQUIRE / HLE_RELEASE should be something like HLE_START / HLE_END instead? Not particularly great names, but at least it avoids overloading ACQUIRE/RELEASE and thus should make it clearer that you still need to specify a memory order. Does it make any sense to simply predefine the possible valid combinations with the HLE bit already set? it at least removes any possible invalid combinations and forces the programmer to consciously choose their memory model. ie, __ATOMIC_HLE_XACQ_CONSUME __ATOMIC_HLE_XACQ_ACQUIRE __ATOMIC_HLE_XACQ_ACQ_REL __ATOMIC_HLE_XACQ_SEQ_CST __ATOMIC_HLE_XREL_RELEASE __ATOMIC_HLE_XREL_ACQ_REL __ATOMIC_HLE_XREL_SEQ_CST or whatever happens to be valid... Doesn't really scale to adding more new bits later, but perhaps that doesn't matter. I would suggest that we keep the HLE acq/rel bits independent of the memory order bits. Both are independent on a conceptual level. And we should add documentation that tells programmers that memory orders need always be specified. This way, we: - keep it generic and don't add any arch-specific assumptions, - remain closer to a true extension of the C++11/C11 atomics instead of a more strongly coupled mix-up, - can still do arch-specific selection of memory orders (e.g., even just the memory orders are more fine-granular than you'd need for just x86), - can still do arch-specific warnings if some combinations really don't make sense, - keep the number of macros small as preferred by Jakub. Would that be an acceptable option for everyone? Andi, would proper documentation resolve your ease-of-use concerns?
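[Editorial note: to make the proposal concrete, here is a sketch of what user code would look like with the HLE hint kept as a separate bit OR'ed into the (still mandatory) memory-order argument. The __ATOMIC_HLE_* names follow the discussion; the zero fallback definitions and the spinlock itself are illustrative assumptions, not a shipped interface.]

```cpp
#include <cassert>

// If the compiler/target does not provide HLE bits, fall back to zero so
// the code degrades to an ordinary acquire/release spinlock.  (The
// fallback values are an assumption for illustration only.)
#ifndef __ATOMIC_HLE_ACQUIRE
#define __ATOMIC_HLE_ACQUIRE 0
#endif
#ifndef __ATOMIC_HLE_RELEASE
#define __ATOMIC_HLE_RELEASE 0
#endif

// The memory order is specified explicitly, as argued above; the HLE bit
// only asks the hardware to try eliding the lock.
static void hle_lock(int* l) {
  while (__atomic_exchange_n(l, 1, __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE))
    ;  // a production x86 spin loop would also execute PAUSE here
}

static void hle_unlock(int* l) {
  __atomic_store_n(l, 0, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
}
```

Keeping the HLE bit orthogonal like this means non-experts never see it: a lock implementation can add the bit internally while its users only deal with lock/unlock.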
Re: [PATCH] Atom: Enabling unroll at O2 optimization level
So would need much more benchmarking on macro workloads first at least.

Like what, for example? I believe in this case everything also strongly depends on the test usage model (e.g. it is usually compiled with -Os, not -O2) and, let's say, on internal test structure - whether there are hot loops that are suitable for unrolling.

Normally the compiler doesn't know if a loop is hot unless you use profile feedback. So in the worst case, on a big code base, you may end up with a lot of unnecessary unrolling. On cold code it's just wasted bytes, but there could be already icache-limited code where it would be worse. How about just a compiler bootstrap on Atom as a worst case? For the benchmark, can you use profile feedback?

BTW I know some loops are unrolled at -O3 by default at tree level because the vectorizer likes it. I actually have an older patch to dial this down for some common cases.

-Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: [i386, patch, RFC] HLE support in GCC
Would that be an acceptable option for everyone? Andi, would proper documentation resolve your ease-of-use concerns? Proper documentation is needed in any case, but I would strongly prefer an inherently hard-to-misuse interface. I think Andrew's proposal is the best for that so far. -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: [PATCH] Atom: Scheduler improvements for better imul placement
2012/4/12 Andrey Belevantsev a...@ispras.ru:

On 12.04.2012 16:38, Richard Guenther wrote:

On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin izamya...@gmail.com wrote:

On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther richard.guent...@gmail.com wrote:

On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov amona...@ispras.ru wrote:

Can atom execute two IMUL in parallel? Or what exactly is the pipeline behavior?

As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more-cycle latency instruction can be issued after IMUL without a stall. In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions.

It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
		    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
		     atom-port-0")

;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
	       "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")

The main idea is quite simple: if we are going to schedule an IMUL instruction (it is on the top of the ready list) we try to find the producer of another (independent) IMUL instruction that is in the ready list too. The goal is to try to schedule such a producer to get another IMUL in the ready list and get scheduling of 2 successive IMUL instructions.

Why does that not happen without your patch? Does it never happen without your patch or does it merely not happen for one EEMBC benchmark (can you provide a testcase?)?

It does not happen because the scheduler by itself does not do such specific reordering. That said, it is easy to imagine the cases where this patch will make things worse rather than better.

That surprises me. What is so specific about this reordering?
Igor, why not try different, subtler mechanisms like adjust_priority, which gets called when an insn is added to the ready list? E.g. increase the producer's priority.

The patch as is misses checks for NONDEBUG_INSN_P. Also, why bail out when you have more than one imul in the ready list? Don't you want to bump the priority of the other imul found?

Andrey

And the MD allows us only to prefer scheduling of successive IMUL instructions, i.e. if an IMUL was just scheduled and the ready list contains another IMUL instruction, then it will be chosen as the best candidate for scheduling.

at least from my very limited guessing of what the above does. So, did you analyze why the scheduler does not properly schedule things for you?

Richard.

From reading the patch, I could not understand the link between pipeline behavior and what the patch appears to do.

Hope that helps. Alexander
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On 12.04.2012 17:54, Richard Guenther wrote: 2012/4/12 Andrey Belevantseva...@ispras.ru: On 12.04.2012 16:38, Richard Guenther wrote: On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatinizamya...@gmail.com wrote: On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther richard.guent...@gmail.comwrote: On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakovamona...@ispras.ru wrote: Can atom execute two IMUL in parallel? Or what exactly is the pipeline behavior? As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycles latency instruction can be issued after IMUL without a stall. In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions. It seems to be modeled in the pipeline description though: ;;; imul insn has 5 cycles latency (define_reservation atom-imul-32 atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4, atom-port-0) ;;; imul instruction excludes other non-FP instructions. (exclusion_set atom-eu-0, atom-eu-1 atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4) The main idea is quite simple: If we are going to schedule IMUL instruction (it is on the top of ready list) we try to find out producer of other (independent) IMUL instruction that is in ready list too. The goal is try to schedule such a producer to get another IMUL in ready list and get scheduling of 2 successive IMUL instructions. Why does that not happen without your patch? Does it never happen without your patch or does it merely not happen for one EEMBC benchmark (can you provide a testcase?)? It does not happen because the scheduler by itself does not do such specific reordering. That said, it is easy to imagine the cases where this patch will make things worse rather than better. That surprises me. What is so specific about this reordering? 
I mean that the scheduler does things like sort the ready list according to a number of heuristics and to the insn priority, then choose the insn that would allow the maximum of ready insns to be issued on the current cycle. The heuristics are rather general. The scheduler does not do things like if some random insn is ready, then choose some other random insn from the ready list and schedule it (which is what the patch does). This is what scheduler hooks are for, to account for some target-specific heuristic. The problem is that this particular implementation looks somewhat like an overkill and also motivated by a single benchmark. Testing on a wider set of benchmarks and checking compile-time hit would make the motivation more convincing. Andrey Igor, why not try different subtler mechanisms like adjust_priority, which is get called when an insn is added to the ready list? E.g. increase the producer's priority. The patch as is misses checks for NONDEBUG_INSN_P. Also, why bail out when you have more than one imul in the ready list? Don't you want to bump the priority of the other imul found? Andrey And MD allows us only prefer scheduling of successive IMUL instructions, i.e. If IMUL was just scheduled and ready list contains another IMUL instruction then it will be chosen as the best candidate for scheduling. at least from my very limited guessing of what the above does. So, did you analyze why the scheduler does not properly schedule things for you? Richard. From reading the patch, I could not understand the link between pipeline behavior and what the patch appears to do. Hope that helps. Alexander
Re: [PATCH, ARM] PR49448 incorrectly detecting big-endian arm-linux triplets
On 28/02/12 15:27, Richard Earnshaw wrote:

The pattern to match a big-endian machine for linux is ambiguous as reported by the PR and can cause some little-endian triplets to be confused as big-endian. This patch makes the string unambiguous.

R.

	* config.gcc (arm*-*-linux*): Use an unambiguous pattern for
	detecting big-endian triplets.

be-lin.patch

Index: config.gcc
===================================================================
--- config.gcc	(revision 184624)
+++ config.gcc	(working copy)
@@ -825,7 +825,7 @@ arm*-*-netbsdelf*)
 arm*-*-linux*)			# ARM GNU/Linux with ELF
	tm_file="dbxelf.h elfos.h gnu-user.h linux.h linux-android.h glibc-stdint.h arm/elf.h arm/linux-gas.h arm/linux-elf.h"
	case $target in
-	arm*b-*)
+	arm*b-*-linux*)
		tm_defines="${tm_defines} TARGET_BIG_ENDIAN_DEFAULT=1"
		;;
	esac

For the record, I've back-ported this patch to gcc-4.5 and gcc-4.6.

R.
Re: [i386, patch, RFC] HLE support in GCC
I would suggest that we keep the HLE acq/rel bits independent of the memory order bits. Both are independent on a conceptual level. And we should add documentation that tells programmers that memory orders always need to be specified.

Sorry, I didn't get your point. Are you proposing to separate out the HLE-capable builtins entirely? How can we make it independent without introducing a new parameter? Introducing a new parameter would mean a new builtin. We already had that: see the initial patch - it contains HLE-featured intrinsics (and builtins), independent of __atomic*.

K
[PATCH] More code refactoring
Eventually I'll succeed in making tree-optimize.c empty. At least the pass stuff I'm interested in gets better now.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2012-04-12  Richard Guenther  rguent...@suse.de

	* Makefile.in (cgraphunit.o): Add $(EXCEPT_H) dependency.
	* cgraph.h (tree_rest_of_compilation): Remove.
	* cgraph.c (cgraph_add_new_function): Move ...
	* cgraphunit.c (cgraph_add_new_function): ... here.
	(tree_rest_of_compilation): Make static.
	(cgraph_expand_function): Do not set cgraph_function_flags_ready.
	* tree-optimize.c (gate_all_optimizations, pass_all_optimizations,
	gate_all_early_local_passes, execute_all_early_local_passes,
	pass_early_local_passes, gate_all_early_optimizations,
	pass_all_early_optimizations): Move ...
	* passes.c (gate_all_optimizations, pass_all_optimizations,
	gate_all_early_local_passes, execute_all_early_local_passes,
	pass_early_local_passes, gate_all_early_optimizations,
	pass_all_early_optimizations): ... here.
	* tree-optimize.c (execute_free_datastructures): Remove.
	* tree-flow.h (execute_free_datastructures): Remove.
	* tree-optimize.c (execute_init_datastructures,
	pass_init_datastructures): Move ...
	* tree-ssa.c (execute_init_datastructures,
	pass_init_datastructures): ... here.
	* cfgexpand.c (gimple_expand_cfg): Inline-expand call to
	execute_free_datastructures.
Index: gcc/Makefile.in
===================================================================
*** gcc/Makefile.in.orig	2012-04-12 14:05:23.000000000 +0200
--- gcc/Makefile.in	2012-04-12 14:05:51.274596925 +0200
*************** cgraphunit.o : cgraphunit.c $(CONFIG_H)
*** 2922,2928 ****
  	$(FIBHEAP_H) output.h $(PARAMS_H) $(RTL_H) $(TIMEVAR_H) $(IPA_PROP_H) \
  	gt-cgraphunit.h tree-iterator.h $(COVERAGE_H) $(TREE_DUMP_H) \
  	tree-pretty-print.h gimple-pretty-print.h ipa-inline.h $(IPA_UTILS_H) \
! 	$(LTO_STREAMER_H) output.h $(REGSET_H)
  cgraphbuild.o : cgraphbuild.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
  	$(TREE_H) langhooks.h $(CGRAPH_H) intl.h pointer-set.h $(GIMPLE_H) \
  	$(TREE_FLOW_H) $(TREE_PASS_H) $(IPA_UTILS_H) $(EXCEPT_H) \
--- 2922,2928 ----
  	$(FIBHEAP_H) output.h $(PARAMS_H) $(RTL_H) $(TIMEVAR_H) $(IPA_PROP_H) \
  	gt-cgraphunit.h tree-iterator.h $(COVERAGE_H) $(TREE_DUMP_H) \
  	tree-pretty-print.h gimple-pretty-print.h ipa-inline.h $(IPA_UTILS_H) \
! 	$(LTO_STREAMER_H) output.h $(REGSET_H) $(EXCEPT_H)
  cgraphbuild.o : cgraphbuild.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
  	$(TREE_H) langhooks.h $(CGRAPH_H) intl.h pointer-set.h $(GIMPLE_H) \
  	$(TREE_FLOW_H) $(TREE_PASS_H) $(IPA_UTILS_H) $(EXCEPT_H) \
Index: gcc/cgraph.c
===================================================================
*** gcc/cgraph.c.orig	2012-04-12 14:05:23.000000000 +0200
--- gcc/cgraph.c	2012-04-12 14:05:51.276596925 +0200
*************** cgraph_function_body_availability (struc
*** 2397,2485 ****
    return avail;
  }
  
- /* Add the function FNDECL to the call graph.
-    Unlike cgraph_finalize_function, this function is intended to be used
-    by middle end and allows insertion of new function at arbitrary point
-    of compilation.  The function can be either in high, low or SSA form
-    GIMPLE.
- 
-    The function is assumed to be reachable and have address taken (so no
-    API breaking optimizations are performed on it).
- 
-    Main work done by this function is to enqueue the function for later
-    processing to avoid need the passes to be re-entrant.  */
- 
- void
- cgraph_add_new_function (tree fndecl, bool lowered)
- {
-   struct cgraph_node *node;
-   switch (cgraph_state)
-     {
-     case CGRAPH_STATE_CONSTRUCTION:
-       /* Just enqueue function to be processed at nearest occurrence.  */
-       node = cgraph_create_node (fndecl);
-       node->next_needed = cgraph_new_nodes;
-       if (lowered)
- 	node->lowered = true;
-       cgraph_new_nodes = node;
-       break;
- 
-     case CGRAPH_STATE_IPA:
-     case CGRAPH_STATE_IPA_SSA:
-     case CGRAPH_STATE_EXPANSION:
-       /* Bring the function into finalized state and enqueue for later
- 	 analyzing and compilation.  */
-       node = cgraph_get_create_node (fndecl);
-       node->local.local = false;
-       node->local.finalized = true;
-       node->reachable = node->needed = true;
-       if (!lowered && cgraph_state == CGRAPH_STATE_EXPANSION)
- 	{
- 	  push_cfun (DECL_STRUCT_FUNCTION (fndecl));
- 	  current_function_decl = fndecl;
- 	  gimple_register_cfg_hooks ();
- 	  bitmap_obstack_initialize (NULL);
- 	  execute_pass_list (all_lowering_passes);
- 	  execute_pass_list (pass_early_local_passes.pass.sub);
- 	  bitmap_obstack_release (NULL);
- 	  pop_cfun ();
- 	  current_function_decl =
Re: [PATCH] Atom: Scheduler improvements for better imul placement
2012/4/12 Andrey Belevantsev a...@ispras.ru: On 12.04.2012 17:54, Richard Guenther wrote: 2012/4/12 Andrey Belevantseva...@ispras.ru: On 12.04.2012 16:38, Richard Guenther wrote: On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatinizamya...@gmail.com wrote: On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakovamona...@ispras.ru wrote: Can atom execute two IMUL in parallel? Or what exactly is the pipeline behavior? As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycles latency instruction can be issued after IMUL without a stall. In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions. It seems to be modeled in the pipeline description though: ;;; imul insn has 5 cycles latency (define_reservation atom-imul-32 atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4, atom-port-0) ;;; imul instruction excludes other non-FP instructions. (exclusion_set atom-eu-0, atom-eu-1 atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4) The main idea is quite simple: If we are going to schedule IMUL instruction (it is on the top of ready list) we try to find out producer of other (independent) IMUL instruction that is in ready list too. The goal is try to schedule such a producer to get another IMUL in ready list and get scheduling of 2 successive IMUL instructions. Why does that not happen without your patch? Does it never happen without your patch or does it merely not happen for one EEMBC benchmark (can you provide a testcase?)? It does not happen because the scheduler by itself does not do such specific reordering. That said, it is easy to imagine the cases where this patch will make things worse rather than better. That surprises me. 
What is so specific about this reordering? I mean that the scheduler does things like sort the ready list according to a number of heuristics and to the insn priority, then choose the insn that would allow the maximum of ready insns to be issued on the current cycle. The heuristics are rather general. The scheduler does not do things like if some random insn is ready, then choose some other random insn from the ready list and schedule it (which is what the patch does). This is what scheduler hooks are for, to account for some target-specific heuristic. The problem is that this particular implementation looks somewhat like an overkill and also motivated by a single benchmark. Testing on a wider set of benchmarks and checking compile-time hit would make the motivation more convincing. Yeah, and especially looking _why_ the generic heuristics are not working and if they could be improved. After all the issue seems to be properly modeled in the DFA. Richard. Andrey Igor, why not try different subtler mechanisms like adjust_priority, which is get called when an insn is added to the ready list? E.g. increase the producer's priority. The patch as is misses checks for NONDEBUG_INSN_P. Also, why bail out when you have more than one imul in the ready list? Don't you want to bump the priority of the other imul found? Andrey And MD allows us only prefer scheduling of successive IMUL instructions, i.e. If IMUL was just scheduled and ready list contains another IMUL instruction then it will be chosen as the best candidate for scheduling. at least from my very limited guessing of what the above does. So, did you analyze why the scheduler does not properly schedule things for you? Richard. From reading the patch, I could not understand the link between pipeline behavior and what the patch appears to do. Hope that helps. Alexander
Re: [PATCH] Fix ICE in expand_cse_reciprocals (PR tree-optimization/42078)
On Mon, Apr 9, 2012 at 8:37 AM, Alexandre Oliva aol...@redhat.com wrote:

On Jun 3, 2011, Alexandre Oliva aol...@redhat.com wrote:

According to http://gcc.gnu.org/ml/gcc-patches/2009-11/msg01082.html on Nov 20, 2009, Alexandre Oliva aol...@redhat.com wrote:

On Nov 19, 2009, Richard Guenther richard.guent...@gmail.com wrote:

In fact this exchanging of the LHS (or rather invalidating of the SSA name value) should be a helper function that knows the implementation details and avoids going through releasing and allocating the name.

Okie dokie, here's a sequence of patches that implements helper functions for this sort of stuff.

The first patch introduces machinery to propagate “dying” DEFs into debug stmts, while replacing them with other SSA_NAMEs. This is already in.

The second extends it so as to enable the old LHS to be redefined e.g. in terms of the new LHS. IIRC this may be useful in some other transformations that, in the process of introducing VTA, I changed from modifying stmts in place to inserting new stmts and removing others. I haven't looked for them yet.

The third uses this new feature for the case at hand, while avoiding computing the reciprocal expression if we know it won't be used.

Updated versions of these follow. Regstrapped on x86_64-linux-gnu and i686-linux-gnu. Ok to install?

I was asked to submit these again in stage1, so... Ping? (updated and re-tested)

+  /* If the conditions in which this function uses VALUE change,
+     adjust gimple_replace_lhs_wants_value().  */
+  gcc_assert (gimple_replace_lhs_wants_value ()
+	      == MAY_HAVE_DEBUG_STMTS);
+

that looks ... odd. But I see you want to conditionally compute value.

+static inline bool
+gimple_replace_lhs_wants_value (void)
+{
+  return MAY_HAVE_DEBUG_STMTS;
+}

but should this not depend on the old stmt / lhs? For example do we really want to do this for artificial variables? I suppose not.

Thanks,
Richard.
-- Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist Red Hat Brazil Compiler Engineer
Re: [PATCH] Fix PR18589
On Thu, Apr 5, 2012 at 3:49 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote: On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Unfortunately this seems to be necessary if I name the two passes reassoc1 and reassoc2. If I try to name both of them reassoc I get failures in other tests like gfortran.dg/reassoc_4, where -fdump-tree-reassoc1 doesn't work. Unless I'm missing something obvious, I think I need to keep that change. Hm, naming them reassoc1 and reassoc2 is a hack. Naming both reassoc will not trigger re-naming them to reassoc1 and reassoc2 I think. How ugly. Especially that -fdump-tree-reassoc will no longer work. Maybe instead of using two pass structs resort to using the existing hack with using first_pass_instance and TODO_mark_first_instance. OK, that seems to be the best among evils. Using the first_pass_instance hack, the patch is transformed as below. Regstrapped on powerpc64-linux, no additional failures. OK for trunk? Ok. Thanks, Richard. Thanks, Bill gcc: 2012-04-05 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/18589 * tree-ssa-reassoc.c (reassociate_stats): Add two fields. (operand_entry): Add count field. (add_repeat_to_ops_vec): New function. (completely_remove_stmt): Likewise. (remove_def_if_absorbed_call): Likewise. (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls. (acceptable_pow_call): New function. (linearize_expr_tree): Look for builtin pow/powi calls and add operand entries with repeat counts when found. (repeat_factor_d): New struct and associated typedefs. (repeat_factor_vec): New static vector variable. (compare_repeat_factors): New function. (get_reassoc_pow_ssa_name): Likewise. (attempt_builtin_powi): Likewise. (reassociate_bb): Call attempt_builtin_powi. (fini_reassoc): Two new calls to statistics_counter_event. 
gcc/testsuite:

2012-04-05  Bill Schmidt  wschm...@linux.vnet.ibm.com

	PR tree-optimization/18589
	* gcc.dg/tree-ssa/pr18589-1.c: New test.
	* gcc.dg/tree-ssa/pr18589-2.c: Likewise.
	* gcc.dg/tree-ssa/pr18589-3.c: Likewise.
	* gcc.dg/tree-ssa/pr18589-4.c: Likewise.
	* gcc.dg/tree-ssa/pr18589-5.c: Likewise.
	* gcc.dg/tree-ssa/pr18589-6.c: Likewise.
	* gcc.dg/tree-ssa/pr18589-7.c: Likewise.
	* gcc.dg/tree-ssa/pr18589-8.c: Likewise.
	* gcc.dg/tree-ssa/pr18589-9.c: Likewise.
	* gcc.dg/tree-ssa/pr18589-10.c: Likewise.

Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c	(revision 0)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
+
+double baz (double x, double y, double z, double u)
+{
+  return x * x * y * y * y * z * z * z * z * u;
+}
+
+/* { dg-final { scan-tree-dump-times "\\*" 6 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c	(revision 0)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
+
+double baz (double x, double y, double z, double u)
+{
+  return x * x * x * y * y * y * z * z * z * z * u * u * u * u;
+}
+
+/* { dg-final { scan-tree-dump-times "\\*" 6 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c	(revision 0)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
+
+double baz (double x, double y)
+{
+  return __builtin_pow (x, 3.0) * __builtin_pow (y, 4.0);
+}
+
+/* { dg-final { scan-tree-dump-times "\\*" 4 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c	(revision 0)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
+
+float baz (float x, float y)
+{
+  return x * x * x * x * y * y * y * y;
+}
+
+/* { dg-final { scan-tree-dump-times "\\*" 3 "optimized" } } */
+/* { dg-final { cleanup-tree-dump optimized
Re: [PATCH][ARM] NEON DImode neg
Ping.

On 26/03/12 11:14, Andrew Stubbs wrote:

On 28/02/12 17:45, Andrew Stubbs wrote:

Hi all, This patch adds a DImode negate pattern for NEON. Unfortunately, the NEON vneg instruction only supports vectors, not singletons, so there's no direct way to do it in DImode, and the compiler ends up moving the value back to core registers, negating it, and returning to NEON afterwards:

	fmrrd	r2, r3, d16	@ int
	negs	r2, r2
	sbc	r3, r3, r3, lsl #1
	fmdrr	d16, r2, r3	@ int

The new patch does it entirely in NEON:

	vmov.i32	d17, #0	@ di
	vsub.i64	d16, d17, d16

(Note that this is the result when combined with my recent patch for NEON DImode immediates. Without that you get a constant pool load.)

This update fixes a bootstrap failure caused by an early clobber error. I've also got a native regression test running now.

OK?

Andrew
[PATCH][C] Fix PR52549
This fixes PR52549 - we are running into an overzealous assert that wants to make sure we don't have PLUS_EXPR on pointers. But that code does not really check this and falls foul of the conversion removal code right before it that transforms (void *)(a + b) to a + b. Fixed as follows.

Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

Ok?

Thanks,
Richard.

2012-04-12  Richard Guenther  rguent...@suse.de

	PR c/52549
	* c-typeck.c (pointer_diff): Remove bogus assert.

	* gcc.dg/pr52549.c: New testcase.

Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 186373)
+++ gcc/c-typeck.c	(working copy)
@@ -3446,8 +3446,6 @@ pointer_diff (location_t loc, tree op0,
   else
     con1 = op1;
 
-  gcc_assert (TREE_CODE (con0) != PLUS_EXPR
-	      && TREE_CODE (con1) != PLUS_EXPR);
   if (TREE_CODE (con0) == POINTER_PLUS_EXPR)
     {
       lit0 = TREE_OPERAND (con0, 1);
Index: gcc/testsuite/gcc.dg/pr52549.c
===================================================================
--- gcc/testsuite/gcc.dg/pr52549.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr52549.c	(revision 0)
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+
+_mark (long obj, int i, char *a)
+{
+  (char *)(((long *)(obj)) [i]) - a;
+}
[PATCH] Fix PR52862
This removes the special casing of zero from convert_to_pointer (why would only that need overflow flag handling? and why would the generic code below not properly handle it ...) That sounds better than papering over the issue with noting that integer_zerop can return true for non-INTEGER_CSTs and checking for that before querying TREE_OVERFLOW.

Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

Richard.

2012-04-12  Richard Guenther  rguent...@suse.de

	PR c/52862
	* convert.c (convert_to_pointer): Remove special-casing of zero.

	* gcc.dg/pr52862.c: New testcase.

Index: gcc/convert.c
===================================================================
*** gcc/convert.c	(revision 186373)
--- gcc/convert.c	(working copy)
*************** convert_to_pointer (tree type, tree expr
*** 44,54 ****
    if (TREE_TYPE (expr) == type)
      return expr;
  
-   /* Propagate overflow to the NULL pointer.  */
-   if (integer_zerop (expr))
-     return force_fit_type_double (type, double_int_zero, 0,
- 				  TREE_OVERFLOW (expr));
- 
    switch (TREE_CODE (TREE_TYPE (expr)))
      {
      case POINTER_TYPE:
--- 44,49 ----
Index: gcc/testsuite/gcc.dg/pr52862.c
===================================================================
*** gcc/testsuite/gcc.dg/pr52862.c	(revision 0)
--- gcc/testsuite/gcc.dg/pr52862.c	(revision 0)
***************
*** 0 ****
--- 1,9 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O" } */
+ 
+ void ASMAtomicWritePtrVoid(const void *pv);
+ void rtThreadDestroy(void)
+ {
+   void * const pvTypeChecked = ((void *)0);
+   ASMAtomicWritePtrVoid((void *)(pvTypeChecked));
+ }
Re: [PATCH, powerpc] Fix PR52775, enable FCFID on power4
On Wed, Apr 11, 2012 at 4:12 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote:

It was brought to my attention that when I rewrote the floating point conversion operations for power7, I did not notice that the power4 and 970 powerpcs actually support the FCFID (floating point convert) instruction in 32-bit mode. This patch fixes it. Is it ok to apply? I did bootstraps with and without the patch, and there were no regressions.

[gcc]
2012-04-11  Michael Meissner  meiss...@linux.vnet.ibm.com

	PR target/52775
	* config/rs6000/rs6000.h (TARGET_FCFID): Add TARGET_PPC_GPOPT to
	the list of options to enable the FCFID instruction.
	(TARGET_EXTRA_BUILTINS): Adjust comment.

[gcc/testsuite]
2012-04-11  Michael Meissner  meiss...@linux.vnet.ibm.com

	PR target/52775
	* gcc.target/powerpc/pr52775.c: New file.

Okay.

Thanks, David
[Patch, Fortran] PR52864 - fix actual/formal checks
This patch is a kind of follow up to the other one for the same PR - though this one is for a separate test case, it is not a regression and it's about actual/formal checks. When trying to fix the rejects-valid bug, I realized that one function was never accessed as a call to expr.c's gfc_check_vardef_context is done before. I made some cleanup and added some code to ensure pointer CLASS are correctly handled. I am not positive that the removed code is unreachable, but I failed to produce reachable code and also the test suit passed. Thus, this patch removed a rejects-valid bug, an accepts-invalid bug, cleans up the code a bit and adds a test case for existing checks (besides testing the bug fixes). Build and regtested on x86-64-linux. OK for the trunk? Tobias 20012-04-12 Tobias Burnus bur...@net-b.de PR fortran/52864 * interface.c (compare_parameter_intent): Remove. (check_intents): Remove call, handle CLASS pointer. (compare_actual_formal): Handle CLASS pointer. 20012-04-12 Tobias Burnus bur...@net-b.de PR fortran/52864 * gfortran.dg/pointer_intent_7.f90: New. * gfortran.dg/pure_formal_3.f90: New. diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c index 298ae23..3c8f9cb 100644 --- a/gcc/fortran/interface.c +++ b/gcc/fortran/interface.c @@ -2504,7 +2520,9 @@ compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal, ? _(actual argument to INTENT = OUT/INOUT) : NULL); - if (f-sym-attr.pointer + if (((f-sym-ts.type == BT_CLASS f-sym-attr.class_ok + CLASS_DATA (f-sym)-attr.class_pointer) + || (f-sym-ts.type != BT_CLASS f-sym-attr.pointer)) gfc_check_vardef_context (a-expr, true, false, context) == FAILURE) return 0; @@ -2799,25 +2817,6 @@ check_some_aliasing (gfc_formal_arglist *f, gfc_actual_arglist *a) } -/* Given a symbol of a formal argument list and an expression, - return nonzero if their intents are compatible, zero otherwise. 
*/ - -static int -compare_parameter_intent (gfc_symbol *formal, gfc_expr *actual) -{ - if (actual-symtree-n.sym-attr.pointer !formal-attr.pointer) -return 1; - - if (actual-symtree-n.sym-attr.intent != INTENT_IN) -return 1; - - if (formal-attr.intent == INTENT_INOUT || formal-attr.intent == INTENT_OUT) -return 0; - - return 1; -} - - /* Given formal and actual argument lists that correspond to one another, check that they are compatible in the sense that intents are not mismatched. */ @@ -2839,25 +2838,11 @@ check_intents (gfc_formal_arglist *f, gfc_actual_arglist *a) f_intent = f-sym-attr.intent; - if (!compare_parameter_intent(f-sym, a-expr)) - { - gfc_error (Procedure argument at %L is INTENT(IN) while interface - specifies INTENT(%s), a-expr-where, - gfc_intent_string (f_intent)); - return FAILURE; - } - if (gfc_pure (NULL) gfc_impure_variable (a-expr-symtree-n.sym)) { - if (f_intent == INTENT_INOUT || f_intent == INTENT_OUT) - { - gfc_error (Procedure argument at %L is local to a PURE - procedure and is passed to an INTENT(%s) argument, - a-expr-where, gfc_intent_string (f_intent)); - return FAILURE; - } - - if (f-sym-attr.pointer) + if ((f-sym-ts.type == BT_CLASS CLASS_DATA (f-sym) + CLASS_DATA (f-sym)-attr.class_pointer) + || (f-sym-ts.type != BT_CLASS f-sym-attr.pointer)) { gfc_error (Procedure argument at %L is local to a PURE procedure and has the POINTER attribute, @@ -2877,7 +2862,9 @@ check_intents (gfc_formal_arglist *f, gfc_actual_arglist *a) return FAILURE; } - if (f-sym-attr.pointer) + if ((f-sym-ts.type == BT_CLASS CLASS_DATA (f-sym) +CLASS_DATA (f-sym)-attr.class_pointer) + || (f-sym-ts.type != BT_CLASS f-sym-attr.pointer)) { gfc_error (Coindexed actual argument at %L in PURE procedure is passed to a POINTER dummy argument, --- /dev/null 2012-04-12 06:55:49.927755790 +0200 +++ gcc/gcc/testsuite/gfortran.dg/pointer_intent_7.f90 2012-04-12 12:21:37.0 +0200 @@ -0,0 +1,45 @@ +! { dg-do compile } +! +! PR fortran/ +! +! Contributed by Neil Carlson +! 
+! Check whether passing an intent(in) pointer
+! to an intent(inout) nonpointer is allowed
+!
+module modA
+  type :: typeA
+    integer, pointer :: ptr
+  end type
+contains
+  subroutine foo (a,b,c)
+    type(typeA), intent(in) :: a
+    type(typeA), intent(in), pointer :: b
+    class(typeA), intent(in), pointer :: c
+
+    call bar (a%ptr)
+    call bar2 (b)
+    call bar3 (b)
+    call bar2 (c)
+    call bar3 (c)
+    call bar2p (b) ! { dg-error "INTENT\\(IN\\) in pointer association context \\(actual argument to INTENT = OUT/INOUT" }
+    call bar3p (b) ! { dg-error "INTENT\\(IN\\) in pointer association context \\(actual argument to INTENT = OUT/INOUT" }
+    call bar2p (c) ! { dg-error "INTENT\\(IN\\) in pointer association context \\(actual argument to INTENT = OUT/INOUT" }
+
Re: [RS6000] Stack tie fix.
On Thu, Apr 12, 2012 at 9:22 AM, Alan Modra amo...@gmail.com wrote: I tried that. It doesn't work without something else in the insn to stop rtl-dce deleting it, so you may as well use SETs. But thanks for the prod in the right direction. We do get slightly better results when the regs are not hidden away in an UNSPEC, for instance non-stack writes/reads are seen by the alias oracle to not conflict with the epilogue frame deallocation. Bootstrapped etc. powerpc-linux. OK to apply, David? PR target/52828 * config/rs6000/rs6000.c (rs6000_emit_stack_tie): Rewrite with tie regs on destination of sets. Delete forward declaration. (rs6000_emit_stack_reset): Update rs6000_emit_stack_tie calls. (rs6000_emit_prologue): Likewise. (rs6000_emit_epilogue): Likewise. Use in place of gen_frame_tie and gen_stack_tie. (is_mem_ref): Use tie_operand to recognise stack ties. * config/rs6000/predicates.md (tie_operand): New. * config/rs6000/rs6000.md (UNSPEC_TIE): Delete. (restore_stack_block): Generate new stack tie rtl. (restore_stack_nonlocal): Likewise. (stack_tie): Update. (frame_tie): Delete. This probably is getting close to the best we can do with GCC's RTL alias analysis infrastructure. This version is okay, but I also want to give Richi and Olivier an opportunity to comment if they still have any concerns. Thanks, David
Re: [PATCH] Dissociate store_expr's temp from exp so that it is not marked as addressable
Hi,

On Fri, Apr 06, 2012 at 06:13:20PM +0200, Eric Botcazou wrote:

2012-04-03  Martin Jambor  mjam...@suse.cz

	* expr.c (expand_expr_real_1): Pass type, not the expression, to
	set_mem_attributes for a memory temporary. Do not call the function for
	temporaries with a different alias set.

The last sentence is imprecise, this would rather be: Do not call the function for the memory temporary created for a bitfield.

OK.

I wonder whether we should simplify the bitfield case in the process. Once we remove the call to set_mem_attributes, I think the

  /* If the reference doesn't use the alias set of its type, we cannot create the temporary using that type. */

is useless, so we could try to revert r122014 in the process.

Well, the commit did not add a testcase and when I looked up the patch in the mailing list archive (http://gcc.gnu.org/ml/gcc-patches/2006-11/msg01449.html) it said it was fixing problems not reproducible on trunk so it's basically impossible for me to evaluate whether it is still necessary by some simple testing. Having said that, I guess I can give it a round of regular testing on all the platforms I have currently set up.

On Fri, Apr 06, 2012 at 06:21:55PM +0200, Eric Botcazou wrote:

@@ -9870,7 +9871,14 @@ expand_expr_real_1 (tree exp, rtx target
 	  if (op0 == orig_op0)
 	    op0 = copy_rtx (op0);

-	  set_mem_attributes (op0, exp, 0);
+	  /* If op0 is a temporary because of forcing to memory, pass only the
+	     type to set_mem_attributes so that the original expression is never
+	     marked as ADDRESSABLE through MEM_EXPR of the temporary.  */
+	  if (mem_attrs_from_type)
+	    set_mem_attributes (op0, TREE_TYPE (exp), 0);

set_mem_attributes (op0, type, 0); is equivalent.

OK. So basically I'd like to commit the following. It has been successfully bootstrapped and tested on x86_64-linux, i686-linux, sparc64-linux (without Java), ia64-linux (without Ada) and tested on hppa-linux (C and C++ only). OK for trunk?
Thanks, Martin 2012-04-10 Martin Jambor mjam...@suse.cz * expr.c (expand_expr_real_1): Pass type, not the expression, to set_mem_attributes for a memory temporary. Do not call the function for the memory temporary created for a bitfield. Index: src/gcc/expr.c === --- src.orig/gcc/expr.c +++ src/gcc/expr.c @@ -9598,6 +9598,7 @@ expand_expr_real_1 (tree exp, rtx target tree tem = get_inner_reference (exp, bitsize, bitpos, offset, mode1, unsignedp, volatilep, true); rtx orig_op0, memloc; + bool mem_attrs_from_type = false; /* If we got back the original object, something is wrong. Perhaps we are evaluating an expression too early. In any event, don't @@ -9703,6 +9704,7 @@ expand_expr_real_1 (tree exp, rtx target memloc = assign_temp (nt, 1, 1, 1); emit_move_insn (memloc, op0); op0 = memloc; + mem_attrs_from_type = true; } if (offset) @@ -9875,7 +9877,6 @@ expand_expr_real_1 (tree exp, rtx target emit_move_insn (new_rtx, op0); op0 = copy_rtx (new_rtx); PUT_MODE (op0, BLKmode); - set_mem_attributes (op0, exp, 1); } return op0; @@ -9896,7 +9897,14 @@ expand_expr_real_1 (tree exp, rtx target if (op0 == orig_op0) op0 = copy_rtx (op0); - set_mem_attributes (op0, exp, 0); + /* If op0 is a temporary because of forcing to memory, pass only the + type to set_mem_attributes so that the original expression is never + marked as ADDRESSABLE through MEM_EXPR of the temporary. */ + if (mem_attrs_from_type) + set_mem_attributes (op0, type, 0); + else + set_mem_attributes (op0, exp, 0); + if (REG_P (XEXP (op0, 0))) mark_reg_pointer (XEXP (op0, 0), MEM_ALIGN (op0));
Re: [PATCH][ARM] NEON DImode neg
On 26/03/12 11:14, Andrew Stubbs wrote: On 28/02/12 17:45, Andrew Stubbs wrote: Hi all, This patch adds a DImode negate pattern for NEON. Unfortunately, the NEON vneg instruction only supports vectors, not singletons, so there's no direct way to do it in DImode, and the compiler ends up moving the value back to core registers, negating it, and returning to NEON afterwards: fmrrd r2, r3, d16 @ int negs r2, r2 sbc r3, r3, r3, lsl #1 fmdrr d16, r2, r3 @ int The new patch does it entirely in NEON: vmov.i32 d17, #0 @ di vsub.i64 d16, d17, d16 (Note that this is the result when combined with my recent patch for NEON DImode immediates. Without that you get a constant pool load.) This updates fixes a bootstrap failure caused by an early clobber error. I've also got a native regression test running now. OK? Andrew neon-neg64.patch 2012-03-26 Andrew Stubbs a...@codesourcery.com gcc/ * config/arm/arm.md (negdi2): Use gen_negdi2_neon. * config/arm/neon.md (negdi2_neon): New insn. Also add splitters for core and NEON registers. --- gcc/config/arm/arm.md |8 +++- gcc/config/arm/neon.md | 37 + 2 files changed, 44 insertions(+), 1 deletions(-) diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 751997f..f1dbbf7 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -4048,7 +4048,13 @@ (neg:DI (match_operand:DI 1 s_register_operand ))) (clobber (reg:CC CC_REGNUM))])] TARGET_EITHER - + { +if (TARGET_NEON) + { +emit_insn (gen_negdi2_neon (operands[0], operands[1])); + DONE; + } + } ) ;; The constraints here are to prevent a *partial* overlap (where %Q0 == %R1). 
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 3c88568..bf229a7 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -922,6 +922,43 @@
        (const_string "neon_int_3")))]
 )

+(define_insn "negdi2_neon"
+  [(set (match_operand:DI 0 "s_register_operand"	 "=w,?r,?r,?w")
+	(neg:DI (match_operand:DI 1 "s_register_operand" " w, 0, r, w")))
+   (clobber (match_scratch:DI 2				 "=w, X, X, w"))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_NEON"
+  "#"
+  [(set_attr "length" "8")
+   (set_attr "arch" "nota8,*,*,onlya8")]
+)
+

If negation in Neon needs a scratch register, it seems to me to be somewhat odd that we're disparaging the ARM version.

Also, wouldn't it be sensible to support a variant that was early-clobber on operand 0, but loaded immediate zero into that value first:

	vmov	Dd, #0
	vsub	Dd, Dd, Dm

That way you'll never need more than two registers, whereas today you want three.

R.
Re: [RS6000] Stack tie fix.
On Thu, Apr 12, 2012 at 5:34 PM, David Edelsohn dje@gmail.com wrote: On Thu, Apr 12, 2012 at 9:22 AM, Alan Modra amo...@gmail.com wrote: I tried that. It doesn't work without something else in the insn to stop rtl-dce deleting it, so you may as well use SETs. But thanks for the prod in the right direction. We do get slightly better results when the regs are not hidden away in an UNSPEC, for instance non-stack writes/reads are seen by the alias oracle to not conflict with the epilogue frame deallocation. Bootstrapped etc. powerpc-linux. OK to apply, David? PR target/52828 * config/rs6000/rs6000.c (rs6000_emit_stack_tie): Rewrite with tie regs on destination of sets. Delete forward declaration. (rs6000_emit_stack_reset): Update rs6000_emit_stack_tie calls. (rs6000_emit_prologue): Likewise. (rs6000_emit_epilogue): Likewise. Use in place of gen_frame_tie and gen_stack_tie. (is_mem_ref): Use tie_operand to recognise stack ties. * config/rs6000/predicates.md (tie_operand): New. * config/rs6000/rs6000.md (UNSPEC_TIE): Delete. (restore_stack_block): Generate new stack tie rtl. (restore_stack_nonlocal): Likewise. (stack_tie): Update. (frame_tie): Delete. This probably is getting close to the best we can do with GCC's RTL alias analysis infrastructure. This version is okay, but I also want to give Richi and Olivier an opportunity to comment if they still have any concerns. It looks fine to me. Richard. Thanks, David
Re: [RFC] Should SRA stop producing COMPONENT_REF for non-bit-fields (again)?
Hi, On Wed, Apr 04, 2012 at 04:42:05PM +0200, Richard Guenther wrote: On Wed, 4 Apr 2012, Martin Jambor wrote: Hi everyone, especially Richi and Eric, I'd like to know what is your attitude to changing SRA's build_ref_for_model to what it once looked like, so that it produces COMPONENT_REFs only for bit-fields. The non-bit field handling was added in order to work-around problems when expanding non-aligned MEM_REFs on strict-alignment targets but that should not be a problem now and my experiments confirm that. Last week I successfully bootstrapped and tested this patch on sparc64-linux (with the temporary MEM_EXPR patch, not including Java), ia64-linux (without Ada), x86_64-linux, i686-linux and tested it on hppa-linux (only C and C++). ... 2012-03-20 Martin Jambor mjam...@suse.cz * tree-sra.c (build_ref_for_model): Create COMPONENT_REFs only for bit-fields. Index: src/gcc/tree-sra.c === *** src.orig/gcc/tree-sra.c --- src/gcc/tree-sra.c *** build_ref_for_offset (location_t loc, tr *** 1489,1558 return fold_build2_loc (loc, MEM_REF, exp_type, base, off); } - DEF_VEC_ALLOC_P_STACK (tree); - #define VEC_tree_stack_alloc(alloc) VEC_stack_alloc (tree, alloc) - /* Construct a memory reference to a part of an aggregate BASE at the given !OFFSET and of the type of MODEL. In case this is a chain of references !to component, the function will replicate the chain of COMPONENT_REFs of !the expression of MODEL to access it. GSI and INSERT_AFTER have the same !meaning as in build_ref_for_offset. */ static tree build_ref_for_model (location_t loc, tree base, HOST_WIDE_INT offset, struct access *model, gimple_stmt_iterator *gsi, bool insert_after) { ! tree type = model-type, t; ! VEC(tree,stack) *cr_stack = NULL; ! ! if (TREE_CODE (model-expr) == COMPONENT_REF) { ! tree expr = model-expr; ! ! /* Create a stack of the COMPONENT_REFs so later we can walk them in !order from inner to outer. */ ! cr_stack = VEC_alloc (tree, stack, 6); ! ! do { ! 
tree field = TREE_OPERAND (expr, 1); ! tree cr_offset = component_ref_field_offset (expr); ! HOST_WIDE_INT bit_pos ! = tree_low_cst (cr_offset, 1) * BITS_PER_UNIT ! + TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (field)); ! /* We can be called with a model different from the one associated ! with BASE so we need to avoid going up the chain too far. */ ! if (offset - bit_pos 0) ! break; ! ! offset -= bit_pos; ! VEC_safe_push (tree, stack, cr_stack, expr); ! ! expr = TREE_OPERAND (expr, 0); ! type = TREE_TYPE (expr); ! } while (TREE_CODE (expr) == COMPONENT_REF); } ! ! t = build_ref_for_offset (loc, base, offset, type, gsi, insert_after); ! ! if (TREE_CODE (model-expr) == COMPONENT_REF) ! { ! unsigned i; ! tree expr; ! ! /* Now replicate the chain of COMPONENT_REFs from inner to outer. */ ! FOR_EACH_VEC_ELT_REVERSE (tree, cr_stack, i, expr) ! { ! tree field = TREE_OPERAND (expr, 1); ! t = fold_build3_loc (loc, COMPONENT_REF, TREE_TYPE (field), t, field, ! TREE_OPERAND (expr, 2)); ! } ! ! VEC_free (tree, stack, cr_stack); ! } ! ! return t; } /* Construct a memory reference consisting of component_refs and array_refs to --- 1489,1520 return fold_build2_loc (loc, MEM_REF, exp_type, base, off); } /* Construct a memory reference to a part of an aggregate BASE at the given !OFFSET and of the same type as MODEL. In case this is a reference to a !bit-field, the function will replicate the last component_ref of model's !expr to access it. GSI and INSERT_AFTER have the same meaning as in !build_ref_for_offset. */ static tree build_ref_for_model (location_t loc, tree base, HOST_WIDE_INT offset, struct access *model, gimple_stmt_iterator *gsi, bool insert_after) { ! if (TREE_CODE (model-expr) == COMPONENT_REF !DECL_BIT_FIELD (TREE_OPERAND (model-expr, 1))) I think you need to test DECL_BIT_FIELD_TYPE here I have no problems changing that but in between revisions 164280 and 166730 this test worked fine. What exactly is the difference? { ! /* This access represents a bit-field. */ ! 
tree t, exp_type; ! offset -= int_bit_position (TREE_OPERAND (model-expr, 1)); I'm not sure that offset is now byte-aligned for all the funny Ada bit-layouted records. But maybe we don't even try to SRA those? Aggregates containing aggregate fields which start at position straddling
Re: [RS6000] Stack tie fix.
On Apr 12, 2012, at 17:34 , David Edelsohn wrote: This version is okay, but I also want to give Richi and Olivier an opportunity to comment if they still have any concerns. Thanks :) I'm rebuilding our old compiler with the patch and will double check how things go on our original failing case. Olivier
Re: [PATCH][ARM] NEON DImode neg
On 12/04/12 16:48, Richard Earnshaw wrote: If negation in Neon needs a scratch register, it seems to me to be somewhat odd that we're disparaging the ARM version. Good point, I'm really meaning to provide missing functionality in NEON to prevent unnecessary moves, rather than cause them, so I should remove the '?'s. It probably doesn't need disparaging on A8, either? Also, wouldn't it be sensible to support a variant that was early-clobber on operand 0, but loaded immediate zero into that value first: vmov Dd, #0 vsub Dd, Dd, Dm That way you'll never need more than two registers, whereas today you want three. That does seem like a good option. I'll have a go and see what happens. Andrew
Re: [PATCH][ARM] NEON DImode neg
On 12/04/12 17:16, Andrew Stubbs wrote: On 12/04/12 16:48, Richard Earnshaw wrote: If negation in Neon needs a scratch register, it seems to me to be somewhat odd that we're disparaging the ARM version. Good point, I'm really meaning to provide missing functionality in NEON to prevent unnecessary moves, rather than cause them, so I should remove the '?'s. It probably doesn't need disparaging on A8, either? Possibly not. Which then begs the question as to whether we need an A8 variant at all. Also, wouldn't it be sensible to support a variant that was early-clobber on operand 0, but loaded immediate zero into that value first: vmov Dd, #0 vsub Dd, Dd, Dm That way you'll never need more than two registers, whereas today you want three. That does seem like a good option. I'll have a go and see what happens. Andrew
Re: [PATCH] Fix PR18589
On Thu, Apr 5, 2012 at 6:49 AM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote: On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Unfortunately this seems to be necessary if I name the two passes reassoc1 and reassoc2. If I try to name both of them reassoc I get failures in other tests like gfortran.dg/reassoc_4, where -fdump-tree-reassoc1 doesn't work. Unless I'm missing something obvious, I think I need to keep that change. Hm, naming them reassoc1 and reassoc2 is a hack. Naming both reassoc will not trigger re-naming them to reassoc1 and reassoc2 I think. How ugly. Especially that -fdump-tree-reassoc will no longer work. Maybe instead of using two pass structs resort to using the existing hack with using first_pass_instance and TODO_mark_first_instance. OK, that seems to be the best among evils. Using the first_pass_instance hack, the patch is transformed as below. Regstrapped on powerpc64-linux, no additional failures. OK for trunk? Thanks, Bill gcc: 2012-04-05 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/18589 * tree-ssa-reassoc.c (reassociate_stats): Add two fields. (operand_entry): Add count field. (add_repeat_to_ops_vec): New function. (completely_remove_stmt): Likewise. (remove_def_if_absorbed_call): Likewise. (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls. (acceptable_pow_call): New function. (linearize_expr_tree): Look for builtin pow/powi calls and add operand entries with repeat counts when found. (repeat_factor_d): New struct and associated typedefs. (repeat_factor_vec): New static vector variable. (compare_repeat_factors): New function. (get_reassoc_pow_ssa_name): Likewise. (attempt_builtin_powi): Likewise. (reassociate_bb): Call attempt_builtin_powi. (fini_reassoc): Two new calls to statistics_counter_event. 
It breaks bootstrap on Linux/ia32:

../../src-trunk/gcc/tree-ssa-reassoc.c: In function 'void attempt_builtin_powi(gimple, VEC_operand_entry_t_heap**, tree_node**)':
../../src-trunk/gcc/tree-ssa-reassoc.c:3189:41: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Werror=format]
       fprintf (dump_file, ")^%ld\n", power);
                                         ^
../../src-trunk/gcc/tree-ssa-reassoc.c:3222:44: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Werror=format]
	     fprintf (dump_file, ")^%ld\n", power);
						^
cc1plus: all warnings being treated as errors

H.J.
Re: [PATCH] Caret diagnostics
Gaby == Gabriel Dos Reis g...@integrable-solutions.net writes: Manuel So, in fact, libcpp is buggy for not implementing this (as can be seen Manuel in emacs). If/When libcpp is fixed, the column info will be correct Manuel for tabs. And then, one may care about printing tabs as anything Manuel different from a single space. But until then, a single space is a Manuel good as anything. In fact this is what is done in c-ppoutput.c. Gaby I would not necessarily say that libcpp is buggy in this aspect. Gaby However, I do agree that removing the tab expansion simplifies Gaby the code and is better. Unfortunately I think it really is buggy. Other GNU programs, including Emacs, already conform to the GCS here. Arguably the GCS made the wrong decision, but it is already baked in all over. Tom
Re: [PATCH] Caret diagnostics
On Thu, Apr 12, 2012 at 11:53 AM, Tom Tromey tro...@redhat.com wrote: Gaby == Gabriel Dos Reis g...@integrable-solutions.net writes: Manuel So, in fact, libcpp is buggy for not implementing this (as can be seen Manuel in emacs). If/When libcpp is fixed, the column info will be correct Manuel for tabs. And then, one may care about printing tabs as anything Manuel different from a single space. But until then, a single space is a Manuel good as anything. In fact this is what is done in c-ppoutput.c. Gaby I would not necessarily say that libcpp is buggy in this aspect. Gaby However, I do agree that removing the tab expansion simplifies Gaby the code and is better. Unfortunately I think it really is buggy. Other GNU programs, including Emacs, already conform to the GCS here. Arguably the GCS made the wrong decision, but it is already baked in all over. OK. -- Gaby
Re: [PATCH] Fix PR18589
On Thu, 2012-04-12 at 09:50 -0700, H.J. Lu wrote: On Thu, Apr 5, 2012 at 6:49 AM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote: On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Unfortunately this seems to be necessary if I name the two passes reassoc1 and reassoc2. If I try to name both of them reassoc I get failures in other tests like gfortran.dg/reassoc_4, where -fdump-tree-reassoc1 doesn't work. Unless I'm missing something obvious, I think I need to keep that change. Hm, naming them reassoc1 and reassoc2 is a hack. Naming both reassoc will not trigger re-naming them to reassoc1 and reassoc2 I think. How ugly. Especially that -fdump-tree-reassoc will no longer work. Maybe instead of using two pass structs resort to using the existing hack with using first_pass_instance and TODO_mark_first_instance. OK, that seems to be the best among evils. Using the first_pass_instance hack, the patch is transformed as below. Regstrapped on powerpc64-linux, no additional failures. OK for trunk? Thanks, Bill gcc: 2012-04-05 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/18589 * tree-ssa-reassoc.c (reassociate_stats): Add two fields. (operand_entry): Add count field. (add_repeat_to_ops_vec): New function. (completely_remove_stmt): Likewise. (remove_def_if_absorbed_call): Likewise. (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls. (acceptable_pow_call): New function. (linearize_expr_tree): Look for builtin pow/powi calls and add operand entries with repeat counts when found. (repeat_factor_d): New struct and associated typedefs. (repeat_factor_vec): New static vector variable. (compare_repeat_factors): New function. (get_reassoc_pow_ssa_name): Likewise. (attempt_builtin_powi): Likewise. (reassociate_bb): Call attempt_builtin_powi. (fini_reassoc): Two new calls to statistics_counter_event. 
It breaks bootstrap on Linux/ia32:

../../src-trunk/gcc/tree-ssa-reassoc.c: In function 'void attempt_builtin_powi(gimple, VEC_operand_entry_t_heap**, tree_node**)':
../../src-trunk/gcc/tree-ssa-reassoc.c:3189:41: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Werror=format]
       fprintf (dump_file, ")^%ld\n", power);
                                         ^
../../src-trunk/gcc/tree-ssa-reassoc.c:3222:44: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Werror=format]
	     fprintf (dump_file, ")^%ld\n", power);
						^
cc1plus: all warnings being treated as errors

H.J.

Whoops. Looks like I need to use HOST_WIDE_INT_PRINT_DEC instead of %ld in those spots. I'll get a fix prepared.
Re: [RS6000] Stack tie fix.
On Apr 12, 2012, at 18:07 , Olivier Hainque wrote: I'm rebuilding our old compiler with the patch and will double check how things go on our original failing case. All is well :) Thanks Alan!
Re: [PATCH] Dissociate store_expr's temp from exp so that it is not marked as addressable
Well, the commit did not add a testcase and when I looked up the patch in the mailing list archive (http://gcc.gnu.org/ml/gcc-patches/2006-11/msg01449.html) it said it was fixing problems not reproducible on trunk so it's basically impossible for me to evaluate whether it is still necessary by some simple testing. Having said that, I guess I can give it a round of regular testing on all the platforms I have currently set up. The problem was that, for the same address, you had the alias set of the type on one MEM and the alias set of the reference on the other MEM. If the alias set of the reference doesn't conflict with that of the type (this can happen in Ada because of DECL_NONADDRESSABLE_P), the RAW dependency may be missed. If we don't put the alias set of the reference on one of the MEM, then I don't think that we need to put it on the other MEM. That's what's done for the first, non-bitfield temporary now. 2012-04-10 Martin Jambor mjam...@suse.cz * expr.c (expand_expr_real_1): Pass type, not the expression, to set_mem_attributes for a memory temporary. Do not call the function for the memory temporary created for a bitfield. Fine with me, but the now dangling code in the bitfield case is a bit annoying. -- Eric Botcazou
Re: Phone call (was Re: Armhf dynamic linker path)
On Thursday 12 April 2012 02:05:23 Jakub Jelinek wrote: On Thu, Apr 12, 2012 at 11:22:13AM +1200, Michael Hope wrote: All good. My vote is for /lib/ld-arm-linux-gnueabihf.so.3 as it: The directory should be /libhf/ or /libhfp/ for that for consistency with all the other architectures. i think the idea was that no one is looking to do multilib here. so we won't have softfloat in /lib/ and hardfloat in /libhf/. we're just changing the ldso to reflect a change in the ABI. you could also make this argument for EABI and OABI -- the EABI ldso should not be in /lib/. but since we've got OABI and EABI both in /lib/ and people are happy with that, as well as the hardfloat ldso in /lib/, there's no need for a sep /libhf/. I'm fine with arm and hf (resp. hfp) being mentioned in the name of the dynamic linker, but IMNSHO having there gnu and eabi strings is an overkill - why mention gnu there, when all the other architectures which also have GNU libc dynamic linkers don't? That part is implicit. And, EABI is implied by so.3, softfp dynamic linker for EABI is /lib/ld-linux.so.3 while softfp dynamic linker for the old ABI is /lib/ld-linux.so.2. i have no opinion either way here. uClibc has already opted to name things with -uClibc- in them, so we're clear of collisions there. -mike signature.asc Description: This is a digitally signed message part.
Re: Phone call (was Re: Armhf dynamic linker path)
On Thursday 12 April 2012 03:47:29 Jakub Jelinek wrote: On Thu, Apr 12, 2012 at 10:33:08AM +0300, Riku Voipio wrote: On 12 April 2012 09:05, Jakub Jelinek ja...@redhat.com wrote: On Thu, Apr 12, 2012 at 11:22:13AM +1200, Michael Hope wrote: All good. My vote is for /lib/ld-arm-linux-gnueabihf.so.3 as it: The directory should be /libhf/ or /libhfp/ for that for consistency with all the other architectures. Note e.g. x86_64 dynamic linker is /lib64/ld-linux-x86-64.so.2, not /lib/ld-linux-x86-64.so.2. For some value of consistency. x86_64, mips64, powerpc64 and sparc64 install to /lib64. But on ia64 it is /lib/ld-linux-ia64.so.2 and on ia64 installs in /lib, because it isn't a multilibbed architecture. because distros choose not to support it. in first gen chips, there was hardware support for running x86. so if we were to be strict, there should have been /libia64/ (or whatever) while the current x86 32bit code would stay in /lib/. but no distro was interested in supporting that (probably due to the 32bit x86 layers being balls-ass slow), so it never happened. s390x it is /lib/ld64.so.1 [1]. Ok, I forgot about this, I've tried to convince s390x folks to move it to /lib64/ld64.so.1 many years ago, but that just didn't happen, so /lib/ld64.so.1 is just a symlink to /lib64/ld64.so.1. Upstream glibc binaries use /lib64/ld64.so.1 as their dynamic linker, while all other packages use /lib/ld64.so.1 as that is hardcoded in gcc. i always thought this was weird. glad to know i'm not the only one :). -mike signature.asc Description: This is a digitally signed message part.
Re: Phone call (was Re: Armhf dynamic linker path)
On Thu, Apr 12, 2012 at 01:49:16PM -0400, Mike Frysinger wrote: ia64 installs in /lib, because it isn't a multilibbed architecture. because distros choose not to support it. in first gen chips, there was hardware support for running x86. so if we were to be strict, there should have been /libia64/ (or whatever) while the current x86 32bit code would stay in /lib/. but no distro was interested in supporting that (probably due to the 32bit x86 layers being balls-ass slow), so it never happened. We even carried patches (not applied) for lib64 ia64 support in our binutils/glibc/gcc for a while, but the final decision after these were written was that it is going to stay in /lib. Jakub
Re: [i386, patch, RFC] HLE support in GCC
On Thu, 2012-04-12 at 18:13 +0400, Kirill Yukhin wrote: I would suggest that we keep the HLE acq/rel bits independent of the memory order bits. Both are independent on a conceptual level. And we should add documentation that tells programmers that memory orders need always be specified. Sorry, I didn't get your point. You propose to separate HLE-capable builtins at all? No, just the bits; programmers would need to do __atomic_...(..., __ATOMIC_RELEASE | HLE_RELEASE); I believe this is what you had in one of your versions of the patch. My suggestion was not about doing something new but instead a suggestion/poll for a resolution of the discussion.
[steve.mcint...@linaro.org: Phone call (was Re: Armhf dynamic linker path)]
Getting bounces from this message. Let's try again. - Forwarded message from Steve McIntyre steve.mcint...@linaro.org - From: Steve McIntyre steve.mcint...@linaro.org Date: Wed, 11 Apr 2012 23:37:55 +0100 To: cross-dis...@lists.linaro.org Cc: Adam Conrad adcon...@debian.org, linaro-toolch...@lists.linaro.org, Jeff Law l...@redhat.com, libc-po...@sourceware.org, gcc-patches@gcc.gnu.org Subject: Phone call (was Re: Armhf dynamic linker path) X-attached: unknown On Wed, Apr 11, 2012 at 02:06:09AM +0100, Steve McIntyre wrote: And here's the details as promised. I've started a wiki page at https://wiki.linaro.org/OfficeofCTO/HardFloat/LinkerPathCallApr2012 with a strawman agenda for now, and a Doodle poll at http://www.doodle.com/93bitkqeb7auyxn7 to see when the best time is for the call on Thursday/Friday. Please fill in the times that work for you ASAP and I'll announce the result during Wednesday. Ideally we'd like stakeholders from all the relevant distros and the upstream toolchain developers to be there, able to represent their groups and (importantly) able to make a decision here on what we should do. Apologies for the short notice, but we need a decision quickly. And the best time turns out to be Friday at 15:00 UTC (16:00 BST, 11:00 EDT etc.). Of the 10 people who responded in the poll, the only person who can't make that time is Michael in .nz. Sorry, Michael. Phone numbers are listed in the wiki page above. I'll take notes and post minutes afterwards. Let's talk Friday and get this shaken out to a working conclusion. Thanks, -- Steve McIntyresteve.mcint...@linaro.org http://www.linaro.org/ Linaro.org | Open source software for ARM SoCs - End forwarded message - Cheers, -- Steve McIntyresteve.mcint...@linaro.org http://www.linaro.org/ Linaro.org | Open source software for ARM SoCs
[PATCH, i386]: Fix PR 52932 - _mm256_permutevar8x32_ps and _mm256_permutevar8x32_ps
Hello!

Attached patch fixes issues around AVX2 vpermps and vpermd instructions.

1. Changes the second argument of _mm256_permutevar8x32_ps to __m256i type and consequently changes the second argument of __builtin_ia32_permvarsf256 to __v8si type.
2. Changes the avx2_permvarv8sf pattern to accept a v8si mask operand as its 2nd operand.
3. Changes the avx2_permvarv8si pattern in a similar way, so it accepts the mask as its 2nd operand.
4. Macroizes the avx2_permvarv8sf and avx2_permvarv8si patterns.
5. Mechanically updates all calls to these two expanders.
6. Fixes testcases accordingly.

2012-04-12  Uros Bizjak  ubiz...@gmail.com

	PR target/52932
	* config/i386/avx2intrin.h (_mm256_permutevar8x32_ps): Change second
	argument type to __m256i. Update call to __builtin_ia32_permvarsf256.
	* config/i386/sse.md (UNSPEC_VPERMVAR): New.
	(UNSPEC_VPERMSI, UNSPEC_VPERMSF): Remove.
	(avx2_permvarv8sf, avx2_permvarv8si): Switch operands 1 and 2.
	(avx2_permvar<mode>): Macroize insn from avx2_permvarv8sf and
	avx2_permvarv8si using VI4F_256 mode iterator.
	* config/i386/i386.c (bdesc_args) <__builtin_ia32_permvarsf256>:
	Update builtin type to V8SF_FTYPE_V8SF_V8SI.
	(ix86_expand_vec_perm): Update calls to gen_avx2_permvarv8si and
	gen_avx2_permvarv8sf.
	(expand_vec_perm_pshufb): Ditto.

testsuite/ChangeLog:

2012-04-12  Uros Bizjak  ubiz...@gmail.com

	PR target/52932
	* gcc.target/i386/avx2-vpermps-1.c (avx2_test): Use __m256i type for
	second function argument.
	* gcc.target/i386/avx2-vpermps-2.c (init_permps): Update declaration.
	(calc_permps): Update declaration. Calculate result correctly.
	(avx2_test): Change src2 type to union256i_d.
	* gcc.target/i386/avx2-vpermd-2.c (calc_permd): Calculate result
	correctly.

Patch was tested on x86_64-pc-linux-gnu {,-m32}. Earlier version of the patch (without mechanical changes) was also tested on AVX2 target by Kirill.

Patch was committed to mainline SVN, will be committed to 4.7.1 in a few days.

Uros.
Index: config/i386/avx2intrin.h
===================================================================
--- config/i386/avx2intrin.h	(revision 186383)
+++ config/i386/avx2intrin.h	(working copy)
@@ -1034,9 +1034,9 @@ _mm256_permute4x64_pd (__m256d __X, const int __M)
 extern __inline __m256
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_permutevar8x32_ps (__m256 __X, __m256 __Y)
+_mm256_permutevar8x32_ps (__m256 __X, __m256i __Y)
 {
-  return (__m256) __builtin_ia32_permvarsf256 ((__v8sf)__X, (__v8sf)__Y);
+  return (__m256) __builtin_ia32_permvarsf256 ((__v8sf)__X, (__v8si)__Y);
 }
 
 #ifdef __OPTIMIZE__
Index: config/i386/sse.md
===================================================================
--- config/i386/sse.md	(revision 186383)
+++ config/i386/sse.md	(working copy)
@@ -79,8 +79,7 @@
   UNSPEC_VCVTPS2PH
 
   ;; For AVX2 support
-  UNSPEC_VPERMSI
-  UNSPEC_VPERMSF
+  UNSPEC_VPERMVAR
   UNSPEC_VPERMTI
   UNSPEC_GATHER
   UNSPEC_VSIBADDR
@@ -11901,30 +11900,18 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx2_permvarv8si"
-  [(set (match_operand:V8SI 0 "register_operand" "=x")
-	(unspec:V8SI
-	  [(match_operand:V8SI 1 "register_operand" "x")
-	   (match_operand:V8SI 2 "nonimmediate_operand" "xm")]
-	  UNSPEC_VPERMSI))]
+(define_insn "avx2_permvar<mode>"
+  [(set (match_operand:VI4F_256 0 "register_operand" "=x")
+	(unspec:VI4F_256
+	  [(match_operand:VI4F_256 1 "nonimmediate_operand" "xm")
+	   (match_operand:V8SI 2 "register_operand" "x")]
+	  UNSPEC_VPERMVAR))]
   "TARGET_AVX2"
-  "vpermd\t{%2, %1, %0|%0, %1, %2}"
+  "vperm<ssemodesuffix>\t{%1, %2, %0|%0, %2, %1}"
   [(set_attr "type" "sselog")
   (set_attr "prefix" "vex")
   (set_attr "mode" "OI")])
 
-(define_insn "avx2_permvarv8sf"
-  [(set (match_operand:V8SF 0 "register_operand" "=x")
-	(unspec:V8SF
-	  [(match_operand:V8SF 1 "register_operand" "x")
-	   (match_operand:V8SF 2 "nonimmediate_operand" "xm")]
-	  UNSPEC_VPERMSF))]
-  "TARGET_AVX2"
-  "vpermps\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "sselog")
-   (set_attr "prefix" "vex")
-   (set_attr "mode" "OI")])
-
 (define_expand "avx2_perm<mode>"
   [(match_operand:VI8F_256 0 "register_operand")
   (match_operand:VI8F_256 1 "nonimmediate_operand")
Index: config/i386/i386.c
===================================================================
---
config/i386/i386.c (revision 186383) +++ config/i386/i386.c (working copy) @@ -19937,7 +19937,7 @@ ix86_expand_vec_perm (rtx operands[]) vt = force_reg (maskmode, vt); mask = gen_lowpart (maskmode, mask); if (maskmode == V8SImode) - emit_insn (gen_avx2_permvarv8si (t1, vt, mask)); + emit_insn (gen_avx2_permvarv8si (t1, mask, vt)); else emit_insn (gen_avx2_pshufbv32qi3 (t1, mask, vt)); @@ -19971,13 +19971,13 @@ ix86_expand_vec_perm (rtx operands[]) the
libstdc++-v3: Run prettyprinters.exp
prettyprinters.exp is never run except in non-parallel check or when requested explicitly. Tested on powerpc-linux and checked in as obvious. Andreas.

2012-04-12  Andreas Schwab  sch...@linux-m68k.org

	* testsuite/Makefile.am (check_DEJAGNUnormal0): Run
	prettyprinters.exp.
	* testsuite/Makefile.in: Regenerated.

diff --git a/libstdc++-v3/testsuite/Makefile.am b/libstdc++-v3/testsuite/Makefile.am
index 7094ad5..0cf8de5 100644
--- a/libstdc++-v3/testsuite/Makefile.am
+++ b/libstdc++-v3/testsuite/Makefile.am
@@ -134,7 +134,7 @@ check-DEJAGNU $(check_DEJAGNU_normal_targets): check-DEJAGNU%: site.exp
 	  normal0) \
 	    if $(SHELL) -c "$$runtest --version" > /dev/null 2>&1; then \
 	      $$runtest $(AM_RUNTESTFLAGS) $(RUNTESTDEFAULTFLAGS) \
-		$(RUNTESTFLAGS) abi.exp; \
+		$(RUNTESTFLAGS) abi.exp prettyprinters.exp; \
 	    else echo "WARNING: could not find \`runtest'" 1>&2; :;\
 	    fi; \
 	  dirs=`cd $$srcdir; echo [013-9][0-9]_*/*`;; \
-- 
1.7.10

-- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Re: libstdc++-v3: Run prettyprinters.exp
Hi, prettyprinters.exp is never run except in non-parallel check or when requested explicitly. Tested on powerpc-linux and checked in as obvious. Thanks Andreas. Since you seem to pay good attention to these testing details, I wonder if you could also help with the issue I noticed again over the last days with make check-performance not actually doing much if make check isn't run before. I bet a fix would turn out to be just a few lines.. Paolo
Re: [PATCH] PR c++/50852 - loose template parameter comparison
Sorry for the delay in getting back to you on this; I just keep thinking there has to be a better way to deal with this issue than this increasingly complex fixup machinery. The basic problem we're dealing with is that two typedefs from different contexts are considered identical for the purpose of template argument comparison, and we are leaving them in non-type template arguments, so when we look for a pre-existing specialization one typedef gets replaced by the other, which breaks. The fixup stuff is a solution to the problem of typedefs with incompatible template parameters leading to crashes, but it seems to me that this is just a symptom of the problem of replacing one typedef with another, and we should attack that underlying problem instead. That problem is also visible when the typedefs have different names and an error message ends up talking about the wrong one. It seems to me that the way to do this is to strip typedefs from non-type arguments like strip_typedefs does for types. Did we try this before? Ideal would be to treat a specialization using the typedef-using arguments as a typedef to the specialization using the canonical arguments, but that could follow on later. Jason
[google/google-main] Fix regression - SUBTARGET_EXTRA_SPECS overridden by LINUX_GRTE_EXTRA_SPECS (issue 6016047)
Reviewers: asharif1, jingyu, Diego Novillo,

Message: Hi, the newest chrome gcc (from google-main) fails to link anything; looking into the specs file, it seems that the 'link_emulation' section is missing. The problem I found is that SUBTARGET_EXTRA_SPECS, which is not empty for chromeos, is overridden by LINUX_GRTE_EXTRA_SPECS. Please review the proposed patch. (Tested by buildit bootstrap). -Han

Description: In linux.h, macro SUBTARGET_EXTRA_SPECS is overridden by LINUX_GRTE_EXTRA_SPECS, so the original content of SUBTARGET_EXTRA_SPECS never gets written into the gcc spec, which causes a linking failure for chrome gcc.

Please review this at http://codereview.appspot.com/6016047/

Affected files: M gcc/config/i386/i386.h M gcc/config/i386/linux.h

Index: gcc/config/i386/i386.h
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 7721c46..29fdd0b 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -621,8 +621,9 @@ enum target_cpu_default
 #endif
 
 #define EXTRA_SPECS					\
-  { "cc1_cpu", CC1_CPU_SPEC },				\
-  SUBTARGET_EXTRA_SPECS
+  { "cc1_cpu", CC1_CPU_SPEC },				\
+  LINUX_GRTE_EXTRA_SPECS				\
+  SUBTARGET_EXTRA_SPECS
 
 /* Set the value of FLT_EVAL_METHOD in float.h.  When using only the
 
Index: gcc/config/i386/linux.h
diff --git a/gcc/config/i386/linux.h b/gcc/config/i386/linux.h
index ade524c..383dedf 100644
--- a/gcc/config/i386/linux.h
+++ b/gcc/config/i386/linux.h
@@ -30,7 +30,3 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef LINUX_GRTE_EXTRA_SPECS
 #define LINUX_GRTE_EXTRA_SPECS
 #endif
-
-#undef SUBTARGET_EXTRA_SPECS
-#define SUBTARGET_EXTRA_SPECS \
-  LINUX_GRTE_EXTRA_SPECS
Re: [PATCH] Fix PR18589
On Thu, 2012-04-12 at 09:50 -0700, H.J. Lu wrote: On Thu, Apr 5, 2012 at 6:49 AM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote: On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Unfortunately this seems to be necessary if I name the two passes reassoc1 and reassoc2. If I try to name both of them reassoc I get failures in other tests like gfortran.dg/reassoc_4, where -fdump-tree-reassoc1 doesn't work. Unless I'm missing something obvious, I think I need to keep that change. Hm, naming them reassoc1 and reassoc2 is a hack. Naming both reassoc will not trigger re-naming them to reassoc1 and reassoc2 I think. How ugly. Especially that -fdump-tree-reassoc will no longer work. Maybe instead of using two pass structs resort to using the existing hack with using first_pass_instance and TODO_mark_first_instance. OK, that seems to be the best among evils. Using the first_pass_instance hack, the patch is transformed as below. Regstrapped on powerpc64-linux, no additional failures. OK for trunk? Thanks, Bill gcc: 2012-04-05 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/18589 * tree-ssa-reassoc.c (reassociate_stats): Add two fields. (operand_entry): Add count field. (add_repeat_to_ops_vec): New function. (completely_remove_stmt): Likewise. (remove_def_if_absorbed_call): Likewise. (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls. (acceptable_pow_call): New function. (linearize_expr_tree): Look for builtin pow/powi calls and add operand entries with repeat counts when found. (repeat_factor_d): New struct and associated typedefs. (repeat_factor_vec): New static vector variable. (compare_repeat_factors): New function. (get_reassoc_pow_ssa_name): Likewise. (attempt_builtin_powi): Likewise. (reassociate_bb): Call attempt_builtin_powi. (fini_reassoc): Two new calls to statistics_counter_event. 
It breaks bootstrap on Linux/ia32:

../../src-trunk/gcc/tree-ssa-reassoc.c: In function 'void attempt_builtin_powi(gimple, VEC_operand_entry_t_heap**, tree_node**)':
../../src-trunk/gcc/tree-ssa-reassoc.c:3189:41: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Werror=format]
   fprintf (dump_file, ")^%ld\n", power);
                                         ^
../../src-trunk/gcc/tree-ssa-reassoc.c:3222:44: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Werror=format]
   fprintf (dump_file, ")^%ld\n", power);
                                         ^
cc1plus: all warnings being treated as errors

H.J.

Thanks, H.J. Sorry for the problem! Fixing as follows. I'll plan to commit as obvious shortly.

2012-04-12  Bill Schmidt  wschm...@linux.vnet.ibm.com

	* tree-ssa-reassoc.c (attempt_builtin_powi_stats): Change %ld to
	HOST_WIDE_INT_PRINT_DEC in format strings.

Index: gcc/tree-ssa-reassoc.c
===================================================================
--- gcc/tree-ssa-reassoc.c	(revision 186384)
+++ gcc/tree-ssa-reassoc.c	(working copy)
@@ -3186,7 +3186,8 @@ attempt_builtin_powi (gimple stmt, VEC(operand_ent
 	      if (elt < vec_len - 1)
 		fputs (" * ", dump_file);
 	    }
-	  fprintf (dump_file, ")^%ld\n", power);
+	  fprintf (dump_file, ")^" HOST_WIDE_INT_PRINT_DEC "\n",
+		   power);
 	}
     }
 }
@@ -3219,7 +3220,7 @@ attempt_builtin_powi (gimple stmt, VEC(operand_ent
 	      if (elt < vec_len - 1)
 		fputs (" * ", dump_file);
 	    }
-	  fprintf (dump_file, ")^%ld\n", power);
+	  fprintf (dump_file, ")^" HOST_WIDE_INT_PRINT_DEC "\n", power);
 	}
 
       reassociate_stats.pows_created++;
Re: libstdc++-v3: Run prettyprinters.exp
Paolo Carlini pcarl...@gmail.com writes: Since you seem to pay good attention to these testing details, I wonder if you could also help with the issue I noticed again over the last days with make check-performance not actually doing much if make check isn't run before. I bet a fix would turn out to be just a few lines.. Perhaps because # XXX Need to add dependency on libtestc++.a Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
[PR tree-optimization/52558]: RFC: questions on store data race
Here we have a testcase that affects both the C++ memory model and transactional memory. [Hans, this is caused by the same problem that is causing the speculative register promotion issue you and Torvald pointed me at]. In the following testcase (adapted from the PR), the loop invariant motion pass caches a pre-existing value for g_2, and then performs a store to g_2 on every path, causing a store data race:

int g_1 = 1;
int g_2 = 0;

int func_1(void)
{
  int l;
  for (l = 0; l < 1234; l++)
    {
      if (g_1)
	return l;
      else
	g_2 = 0;	/* Store to g_2 should only happen if !g_1 */
    }
  return 999;
}

This gets transformed into something like:

	g_2_lsm = g_2;
	if (g_1) {
		g_2 = g_2_lsm;	// boo! hiss!
		return 0;
	} else {
		g_2_lsm = 0;
		g_2 = g_2_lsm;
	}

The spurious write to g_2 could cause a data race. Andrew has suggested verifying that the store to g_2 was actually different than on entry, and letting PHI copy propagation optimize the redundant comparisons. Like this:

	g_2_lsm = g_2;
	if (g_1) {
		if (g_2_lsm != g_2)	// hopefully optimized away
			g_2 = g_2_lsm;	// hopefully optimized away
		return 0;
	} else {
		g_2_lsm = 0;
		if (g_2_lsm != g_2)	// hopefully optimized away
			g_2 = g_2_lsm;
	}

...which PHI copy propagation and/or friends should be able to optimize. For that matter, regardless of the memory model, the proposed solution would improve the existing code by removing the extra store (in this contrived test case anyhow). Anyone see a problem with this approach (see attached patch)?

Assuming the unlikely scenario that everyone likes this :), I have two problems that I would like answered. But feel free to ignore if the approach is a no go.

1. No pass can figure out the equality (or inequality) of g_2_lsm and g_2. In the PR, Richard mentions that both FRE/PRE can figure this out, but they are not run after store motion. DOM should also be able to, but apparently does not :(. Tips?

2. The GIMPLE_CONDs I create in this patch are currently causing problems with complex floats/doubles.
do_jump is complaining that it can't compare them, so I assume it is too late (in tree-ssa-loop-im.c) to compare complex values since complex lowering has already happened (??). Is there a more canonical way of creating a GIMPLE_COND that may contain complex floats at this stage? Thanks folks.

	* tree-ssa-loop-im.c (execute_sm): Guard store motion with a
	conditional.
	* opts.c (finish_options): Do not allow store or load data races
	with -fgnu-tm.

Index: tree-ssa-loop-im.c
===================================================================
--- tree-ssa-loop-im.c	(revision 186108)
+++ tree-ssa-loop-im.c	(working copy)
@@ -1999,8 +1999,59 @@ execute_sm (struct loop *loop, VEC (edge
 
   FOR_EACH_VEC_ELT (edge, exits, i, ex)
     {
-      store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
-      gsi_insert_on_edge (ex, store);
+      basic_block new_bb, then_bb, old_dest;
+      edge then_old_edge;
+      gimple_stmt_iterator gsi;
+      gimple stmt;
+      tree t1;
+
+      if (PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES))
+	{
+	  store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
+	  gsi_insert_on_edge (ex, store);
+	}
+      else
+	{
+	  old_dest = ex->dest;
+	  new_bb = split_edge (ex);
+	  then_bb = create_empty_bb (new_bb);
+	  if (current_loops && new_bb->loop_father)
+	    add_bb_to_loop (then_bb, new_bb->loop_father);
+
+	  gsi = gsi_start_bb (new_bb);
+	  t1 = make_rename_temp (TREE_TYPE (ref->mem), NULL);
+	  stmt = gimple_build_assign (t1, unshare_expr (ref->mem));
+	  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+	  stmt = gimple_build_cond (NE_EXPR, t1, tmp_var,
+				    NULL_TREE, NULL_TREE);
+	  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+	  gsi = gsi_start_bb (then_bb);
+	  store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
+	  gsi_insert_after (&gsi, store, GSI_CONTINUE_LINKING);
+
+	  make_edge (new_bb, then_bb, EDGE_TRUE_VALUE);
+	  make_edge (new_bb, old_dest, EDGE_FALSE_VALUE);
+	  then_old_edge = make_edge (then_bb, old_dest, EDGE_FALLTHRU);
+	  set_immediate_dominator (CDI_DOMINATORS, then_bb, new_bb);
+
+	  for (gsi = gsi_start_phis (old_dest); !gsi_end_p (gsi); gsi_next (&gsi))
+	    {
+	      gimple phi = gsi_stmt (gsi);
+	      unsigned i;
+
+	      for (i = 0; i < gimple_phi_num_args (phi); i++)
+		if (gimple_phi_arg_edge (phi, i)->src == new_bb)
+		  {
+		    tree arg = gimple_phi_arg_def (phi, i);
+
Re: libstdc++-v3: Run prettyprinters.exp
On 04/12/2012 11:43 PM, Andreas Schwab wrote: Paolo Carlinipcarl...@gmail.com writes: Since you seem to pay good attention to these testing details, I wonder if you could also help with the issue I noticed again over the last days with make check-performance not actually doing much if make check isn't run before. I bet a fix would turn out to be just a few lines.. Perhaps because # XXX Need to add dependency on libtestc++.a Indeed, I was looking for some sort of missing dependency... Any patchlet forthcoming for that? ;) Thanks, Paolo.
Re: Support for Runtime CPU type detection via builtins (issue5754058)
Ping. On Tue, Apr 3, 2012 at 12:47 PM, Sriraman Tallam tmsri...@google.com wrote: Hi, i386 maintainers - Is this patch ok? Thanks, -Sri. On Mon, Apr 2, 2012 at 5:48 PM, Sriraman Tallam tmsri...@google.com wrote: On Mon, Apr 2, 2012 at 5:38 AM, Richard Guenther richard.guent...@gmail.com wrote: On Sat, Mar 31, 2012 at 1:03 AM, Sriraman Tallam tmsri...@google.com wrote: On Fri, Mar 30, 2012 at 5:47 AM, Michael Matz m...@suse.de wrote: Hi, On Thu, 29 Mar 2012, Sriraman Tallam wrote: +struct __processor_model +{ + /* Vendor. */ + unsigned int __cpu_is_amd : 1; + unsigned int __cpu_is_intel : 1; + /* CPU type. */ + unsigned int __cpu_is_intel_atom : 1; + unsigned int __cpu_is_intel_core2 : 1; + unsigned int __cpu_is_intel_corei7 : 1; + unsigned int __cpu_is_intel_corei7_nehalem : 1; + unsigned int __cpu_is_intel_corei7_westmere : 1; + unsigned int __cpu_is_intel_corei7_sandybridge : 1; + unsigned int __cpu_is_amdfam10h : 1; + unsigned int __cpu_is_amdfam10h_barcelona : 1; + unsigned int __cpu_is_amdfam10h_shanghai : 1; + unsigned int __cpu_is_amdfam10h_istanbul : 1; + unsigned int __cpu_is_amdfam15h_bdver1 : 1; + unsigned int __cpu_is_amdfam15h_bdver2 : 1; +} __cpu_model; It doesn't make sense for the model to be a bitfield, a processor will have only ever exactly one model. Just make it an enum or even just an int. Not entirely true, nehalem and corei7 can be both set. However, I modified this by dividing it into types and sub types and then did what you said. Uh... then I suppose you need to document somewhere what names match to what cpuid family/model (supposedly thats where your two-layer hierarchy comes from, which incidentially misses one layer, the vendor?) Added documentation to extend.texi Patch available for review here: http://codereview.appspot.com/5754058 Thanks, -Sri. Richard. * config/i386/i386.c (build_processor_features_struct): New function. (build_processor_model_struct): New function. (make_var_decl): New function. 
(get_field_from_struct): New function. (fold_builtin_target): New function. (ix86_fold_builtin): New function. (ix86_expand_builtin): Expand new builtins by folding them. (make_cpu_type_builtin): New functions. (ix86_init_platform_type_builtins): Make the new builtins. (ix86_init_builtins): Make new builtins to detect CPU type. (TARGET_FOLD_BUILTIN): New macro. (IX86_BUILTIN_CPU_INIT): New enum value. (IX86_BUILTIN_CPU_IS): New enum value. (IX86_BUILTIN_CPU_SUPPORTS): New enum value. * config/i386/i386-builtin-types.def: New function type. * testsuite/gcc.target/builtin_target.c: New testcase. * libgcc/config/i386/i386-cpuinfo.c: New file. * libgcc/config/i386/t-cpuinfo: New file. * libgcc/config.host: Include t-cpuinfo. * libgcc/config/i386/libgcc-glibc.ver: Version symbols __cpu_model and __cpu_features. Patch available for review here: http://codereview.appspot.com/5754058 Thanks, -Sri. Ciao, Michael.
[v3] hashtable aliases
This patch restructures the std::unordered_* containers, as per the tempting comment in unordered_set: // NB: When we get typedef templates these class definitions // will be unnecessary. This required some changes in _Hashtable, forced the consolidation of insert() logic in a base class, and necessitated the restructuring of type definitions in the _Hashtable. I think the current form allows some nice advantages in terms of nested types and expressiveness. tested x86/linux tested x86/linux -debug -benjamin 20120412-334.patch.bz2 Description: application/bzip