Re: [PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.
On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson r...@redhat.com wrote: On 2012-06-18 13:19, Uros Bizjak wrote: /* ??? The builtin doesn't understand that the PCMPESTRI read from memory need not be aligned. */ - __asm (%vpcmpestri $0, (%1), %2 - : =c(index) : r(s), x(search), a(4), d(16)); + sv = __builtin_ia32_loaddqu ((const char *) s); + index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0); + Surely the comment can be removed too then? I'm not sure there. The builtin, as defined, expects V16QI operand with xm constraint. Using: int test (const char *s1) { const v16qi *p = (const v16qi *)(unsigned long) s1; return __builtin_ia32_pcmpistri128 (*p, ...); } will generate movdqa before pcmpistri. With x86 pcmp[ie]str patch, we trick gcc to pass unaligned memory to the pcmp[ie]str RTX, but we still need __builtin_ia32_loaddqu in front of __builtin_ia32_pcmpestri128. Uros.
Re: [PATCH 2/3] Add XLP-specific atomic instructions and tweaks.
On 16/06/2012, at 7:45 PM, Richard Sandiford wrote: Maxim Kuvyrkov ma...@codesourcery.com writes: Updated patch attached. Any further comments? It's due to my bad explanation, sorry, but this isn't what I meant. The two main changes I was looking for were: 1) Your pattern uses: [(mem:GPR (match_operand:P 1 "register_operand" "d"))] Instead, we should define a new memory predicate/constraint pair for memories that only accept register addresses. I.e. there should be a new predicate to go alongside things like memory_operand and stack_operand, except that the new one would be even more restrictive in the set of addresses that it allows. mem_reg_operand seems as good a name as any, but I'm not wedded to a particular name. The new memory constraint would likewise go alongside m, W, etc., except that (like the predicate) it too would only allow register addresses. We're running low on constraint letters, so a two-letter one like ZR might be OK. We can then use Z as a prefix for other MIPS-specific memory and address constraints in future. The atomic_exchange and atomic_fetch_add expanders should use the code I quoted in the earlier message to force the original memory_operand into this more restrictive form: if (!mem_reg_operand (operands[1], <MODE>mode)) { addr = force_reg (Pmode, XEXP (operands[1], 0)); operands[1] = replace_equiv_address (operands[1], addr); } The reason is that hard-coding (mem ...) in named define_insns (i.e. those with a gen_* function) is usually a mistake. We end up discarding the original MEM and losing track of its MEM_ATTRs. (Note that this change means we don't need separate Pmode == SImode and Pmode == DImode patterns.) 2) Your pattern has: (match_operand:GPR 2 "arith_operand" "0") to match: (match_operand:GPR 0 "register_operand" "=d") Operand 2 doesn't accept constants, so it should be a register_operand rather than an arith_operand. 
Then the atomic_exchange and atomic_fetch_add expanders should use force_reg to turn _their_ arith_operands into register_operands before calling gen_atomic_fetch_add<mode>_ldadd and gen_atomic_fetch<mode>_swap. Your new comment says: /* Spill the address to a register upfront to simplify reload's job. */ But this isn't about making reload's job easier. Reload can cope just fine with the arith_operand above and would cope just fine with: (match_operand ... "memory_operand" "ZR") with ZR defined as above. Instead, we're trying to describe the instruction as accurately as possible so that the pre-reload passes (including IRA) are in a position to make good optimisation decisions. They're less able to do that if patterns claim to accept more things than they actually do. I.e. it's the same reason that we don't just use general_operand for all reloadable rvalues and nonimmediate_operand for all reloadable lvalues. Trying to use accurate predicates is such standard practice that I think it'd be better to drop the comment here. Having one gives the impression that we're trying to cope with some special case, which AFAICT we're not. Richard, Thank you for a thoughtful write-up. I really appreciate the time you are taking to educate me. I've incorporated your and Richard H.'s comments (stole pieces from the ARM port) and attached is the updated patch. The only other change I made that was not in your comments is the addition of the b mips_print_operand specifier. The LDADD and SWAP instructions accept their address as a plain register without parentheses, so I've added the specifier to skip outputting the parentheses. Any further comments? -- Maxim Kuvyrkov CodeSourcery / Mentor Graphics
Re: [PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.
On Tue, Jun 19, 2012 at 8:38 AM, Uros Bizjak ubiz...@gmail.com wrote: On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson r...@redhat.com wrote: On 2012-06-18 13:19, Uros Bizjak wrote: /* ??? The builtin doesn't understand that the PCMPESTRI read from memory need not be aligned. */ - __asm (%vpcmpestri $0, (%1), %2 - : =c(index) : r(s), x(search), a(4), d(16)); + sv = __builtin_ia32_loaddqu ((const char *) s); + index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0); + Surely the comment can be removed too then? I'm not sure there. The builtin, as defined, expects V16QI operand with xm constraint. Using: int test (const char *s1) { const v16qi *p = (const v16qi *)(unsigned long) s1; return __builtin_ia32_pcmpistri128 (*p, ...); } will generate movdqa before pcmpistri. Pedantic correction: __builtin_ia32_pcmpistri128 (v16qi_arg, *p, N); movdqa in front of this builtin will be generated with -O0. Uros.
Re: [v3] PR 53270 fix hppa-linux bootstrap regression
On 14 June 2012 23:23, Jonathan Wakely wrote: For 4.6.4 and 4.7.2 I plan to make a less intrusive change, #undef'ing the __GTHREAD_MUTEX_INIT, __GTHREAD_RECURSIVE_MUTEX_INIT and __GTHREAD_COND_INIT macros on hppa-linux in C++11 mode, so that the init functions are used instead. This fixes the bootstrap regression on hppa-linux without affecting other targets. Here's the simpler patch I'm committing to the 4.7 and 4.6 branches. PR libstdc++/53270 * config/os/gnu-linux/os_defines.h: Disable static initializer macros for gthreads types in C++11 mode. Tested hppa-linux. commit 82976f5a0e4a69d247bded9d8bae99a633360f20 Author: Jonathan Wakely jwakely@gmail.com Date: Tue Jun 19 01:07:54 2012 +0100 PR libstdc++/53270 * config/os/gnu-linux/os_defines.h: Disable static initializer macros for gthreads types in C++11 mode. diff --git a/libstdc++-v3/config/os/gnu-linux/os_defines.h b/libstdc++-v3/config/os/gnu-linux/os_defines.h index c4aa305..f41160f 100644 --- a/libstdc++-v3/config/os/gnu-linux/os_defines.h +++ b/libstdc++-v3/config/os/gnu-linux/os_defines.h @@ -46,4 +46,10 @@ # undef _GLIBCXX_HAVE_GETS #endif +#if defined(__hppa__) && defined(__GXX_EXPERIMENTAL_CXX0X__) +# define _GTHREAD_USE_MUTEX_INIT_FUNC +# define _GTHREAD_USE_RECURSIVE_MUTEX_INIT_FUNC +# define _GTHREAD_USE_COND_INIT_FUNC +#endif + #endif
Re: [Patch] Adjustments for Windows x64 SEH
On Jun 18, 2012, at 4:28 PM, Kai Tietz wrote: Hello Tristan, patch works for me, too. Just one nit about the patch. 2012/6/18 Tristan Gingold ging...@adacore.com: @@ -8558,6 +8558,11 @@ ix86_frame_pointer_required (void) if (TARGET_32BIT_MS_ABI && cfun->calls_setjmp) return true; + /* Win64 SEH, very large frames need a frame-pointer as maximum stack + allocation is 4GB (add a safety guard for saved registers). */ + if (TARGET_64BIT_MS_ABI && get_frame_size () + 4096 > SEH_MAX_FRAME_SIZE) +return true; Where does this magic 4096 come from? Is it intended to be the page size, or is it meant to be the maximum stack frame consumed by the prologue? It is an upper bound for the maximum stack frame consumed by the prologue. I would suggest using this here instead: + if (TARGET_64BIT_MS_ABI && get_frame_size () > (SEH_MAX_FRAME_SIZE - 4096)) +return true; Additionally, a testcase for a big stack frame would be interesting. You won't need to make it an execution test here; an assembler scan would be enough. I think that a simple build test should do it. Thanks, Tristan.
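For readers comparing the two spellings of the check discussed above, here is a minimal sketch in plain C. It is not the GCC code: DEMO_MAX_FRAME_SIZE and DEMO_PROLOGUE_GUARD are hypothetical stand-ins (the 4 GiB value is taken only from the comment in the patch, not from the real SEH_MAX_FRAME_SIZE definition), and the frame size is modelled as a non-negative 64-bit integer. Under those assumptions the two forms agree; the second one simply keeps the guard arithmetic on the constant side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; these are NOT the
   values GCC uses.  */
#define DEMO_MAX_FRAME_SIZE  INT64_C(0x100000000) /* assumed 4 GiB limit */
#define DEMO_PROLOGUE_GUARD  INT64_C(4096)        /* assumed prologue bound */

/* Form used in the posted patch: add the guard to the frame size.  */
static bool frame_needs_fp_add(int64_t frame_size)
{
    assert(frame_size >= 0);
    return frame_size + DEMO_PROLOGUE_GUARD > DEMO_MAX_FRAME_SIZE;
}

/* Form suggested in review: subtract the guard from the limit, so the
   variable operand is compared directly against a constant.  */
static bool frame_needs_fp_sub(int64_t frame_size)
{
    assert(frame_size >= 0);
    return frame_size > DEMO_MAX_FRAME_SIZE - DEMO_PROLOGUE_GUARD;
}
```

Both helpers flip from false to true at the same frame size, one byte above the limit minus the guard.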
Re: RFA: Fix PR53688
On Mon, Jun 18, 2012 at 4:59 PM, Michael Matz m...@suse.de wrote: Hi, now that we regard MEM_EXPR as a conservative approximation for MEM_SIZE (and MEM_OFFSET) we must ensure that this is really the case. It isn't currently for the string expanders, as they use the MEM_REF (whose address was taken) directly as the one to use for MEM_EXPR on the MEM rtx. That's wrong, on gimple side we take the address only and hence its size is arbitrary. So, we have to build a memref always and rewrite its type to one representing the real size. Note that TYPE_MAX_VALUE may be NULL, so we don't need to check for 'len' being null or not. This fixes the C testcase (don't know about fma 3d), and is in regstrapping on x86_64-linux. Okay if that passes? Ok. Note that as a followup you should be able to remove the whole /* Allow the string and memory builtins to overflow from one field into another, see http://gcc.gnu.org/PR23561. Thus avoid COMPONENT_REFs in MEM_EXPR unless we know the whole memory accessed by the string or memory builtin will fit within the field. */ if (MEM_EXPR (mem) TREE_CODE (MEM_EXPR (mem)) == COMPONENT_REF) { block. Also practically (as we are expanding from GIMPLE now), off should always be zero and TREE_CODE (exp) should never be POINTER_PLUS_EXPR, nor should there be wrapping conversions. The 'off' case can also be dealt with by using the offset operand of the MEM_REF we build. Finally MEM_EXPR itself has invalid type-based aliasing properties (it has so even before your patch), of course that doesn't really matter, as below we do set_mem_alias_set (mem, 0). 
Still with MEM_REF you should be able to do Index: gcc/builtins.c === --- gcc/builtins.c (revision 188733) +++ gcc/builtins.c (working copy) @@ -1250,132 +1250,27 @@ expand_builtin_prefetch (tree exp) static rtx get_memory_rtx (tree exp, tree len) { - tree orig_exp = exp; rtx addr, mem; - HOST_WIDE_INT off; - /* When EXP is not resolved SAVE_EXPR, MEM_ATTRS can be still derived - from its expression, for expr-a.b only variable.a.b is recorded. */ - if (TREE_CODE (exp) == SAVE_EXPR !SAVE_EXPR_RESOLVED_P (exp)) -exp = TREE_OPERAND (exp, 0); - - addr = expand_expr (orig_exp, NULL_RTX, ptr_mode, EXPAND_NORMAL); + addr = expand_expr (exp, NULL_RTX, ptr_mode, EXPAND_NORMAL); mem = gen_rtx_MEM (BLKmode, memory_address (BLKmode, addr)); - /* Get an expression we can use to find the attributes to assign to MEM. - If it is an ADDR_EXPR, use the operand. Otherwise, dereference it if - we can. First remove any nops. */ - while (CONVERT_EXPR_P (exp) - POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (exp, 0 -exp = TREE_OPERAND (exp, 0); - - off = 0; - if (TREE_CODE (exp) == POINTER_PLUS_EXPR - TREE_CODE (TREE_OPERAND (exp, 0)) == ADDR_EXPR - host_integerp (TREE_OPERAND (exp, 1), 0) - (off = tree_low_cst (TREE_OPERAND (exp, 1), 0)) 0) -exp = TREE_OPERAND (TREE_OPERAND (exp, 0), 0); - else if (TREE_CODE (exp) == ADDR_EXPR) -exp = TREE_OPERAND (exp, 0); - else if (POINTER_TYPE_P (TREE_TYPE (exp))) -exp = build1 (INDIRECT_REF, TREE_TYPE (TREE_TYPE (exp)), exp); - else -exp = NULL; + /* Build a memory reference suitable for MEM_EXPR for use by the + alias oracle. Make sure to give that memory reference a proper + access size as well as alias-set zero. */ + exp = fold_build2 (MEM_REF, +build_array_type (char_type_node, + build_range_type (sizetype, +size_one_node, len)), +exp, build_int_cst (ptr_type_node, 0)); /* Honor attributes derived from exp, except for the alias set (as builtin stringops may alias with anything) and the size (as stringops may access multiple array elements). 
*/ - if (exp) -{ - set_mem_attributes (mem, exp, 0); - - if (off) - mem = adjust_automodify_address_nv (mem, BLKmode, NULL, off); - - /* Allow the string and memory builtins to overflow from one -field into another, see http://gcc.gnu.org/PR23561. -Thus avoid COMPONENT_REFs in MEM_EXPR unless we know the whole -memory accessed by the string or memory builtin will fit -within the field. */ - if (MEM_EXPR (mem) TREE_CODE (MEM_EXPR (mem)) == COMPONENT_REF) - { - tree mem_expr = MEM_EXPR (mem); - HOST_WIDE_INT offset = -1, length = -1; - tree inner = exp; - - while (TREE_CODE (inner) == ARRAY_REF -|| CONVERT_EXPR_P (inner) -|| TREE_CODE (inner) == VIEW_CONVERT_EXPR -|| TREE_CODE (inner) == SAVE_EXPR) - inner = TREE_OPERAND (inner, 0); - - gcc_assert (TREE_CODE (inner) == COMPONENT_REF); - - if (MEM_OFFSET_KNOWN_P
Re: [patch] Fix PR48109 using artificial top-level asm statements (darwin/objc)
On Mon, Jun 18, 2012 at 7:51 PM, Steven Bosscher stevenb@gmail.com wrote: Hello, This patch started as an attempt to remove #include "output.h" from objc/: Instead of writing references directly to asm_out_file, the references are output as top-level asm statements. It's a bit of a hack, but it works and it's a better hack than writing to asm_out_file from a front end, and it also happens to fix PR objc/48109 to make ObjC on darwin/-m32 LTO-compatible. Bootstrapped and tested on darwin by Iain, and bootstrapped and tested by me on x86_64-unknown-linux-gnu. OK for trunk? Ok for the general idea and implementation, I'd still ask for a darwin maintainer ack though. Thanks, Richard. Ciao! Steven
Re: [4.6][ARM] Backport MCR Not available in Thumb1
On 19/06/12 04:03, Joey Ye wrote: Backporting trunk r179979 OK for 4.6? Backported from mainline 2011-10-14 David Alan Gilbert david.gilb...@linaro.org PR target/48126 * config/arm/arm.c (arm_output_sync_loop): Move label before barrier. Index: gcc/config/arm/arm.h === --- gcc/config/arm/arm.h (revision 188331) +++ gcc/config/arm/arm.h (working copy) @@ -294,7 +294,8 @@ #define TARGET_HAVE_DMB (arm_arch7) /* Nonzero if this chip implements a memory barrier via CP15. */ -#define TARGET_HAVE_DMB_MCR (arm_arch6k && ! TARGET_HAVE_DMB) +#define TARGET_HAVE_DMB_MCR (arm_arch6 && ! TARGET_HAVE_DMB \ + && ! TARGET_THUMB1) /* Nonzero if this chip implements a memory barrier instruction. */ #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR) Not ok (yet), the ChangeLog entry doesn't match the patch. R.
Re: [PATCH] Bad code generation: incorrect folding of TARGET_MEM_REF into a constant
On Mon, Jun 18, 2012 at 9:51 PM, Jiří Hruška ji...@fud.cz wrote: Hi all, I have tracked down a bug which results in invalid code being generated for indexed TARGET_MEM_REF expressions during dominator optimization. The conditions are: accessing objects adjacent in memory in a loop (in order to generate the TARGET_MEM_REF gimple) and optimizing this tree item during dom optimization (to trigger folding). There might be another set of conditions which get to the same state through a different The problem is that get_ref_base_and_extent() for TARGET_MEM_REF with variable index sets `maxsize' to -1 to signal that via index or index2, the whole object can be reached and returns. But before that, if the target object is a declaration with known size and `maxsize' is -1, it is updated, which can be taken by the caller (if `maxsize' equals to basic `size') as possibility to fold the expression into a constant. Assuming I understood the code and comments right, the solution is then to really take a quick exit in the abovementioned indexed case instead of just breaking the loop and letting the rest of function change the `maxsize' parameter. A quick search did not reveal any existing ticket for this problem. The bug was originally found in GCC 4.6.1 while compiling x86 code under MinGW, which is what the attached simplified testcase is based upon (compilation with -O1 is OK, anything higher fails). GCC 3.4.6 seems unaffected. Also the relevant code parts seem unchanged in current trunk. Patched build of 4.7.1 survived bootstrap on x86_64-rhel fine. The attached patch and all changes provided therein are released to public domain and can be freely used or modified by anyone. (This is my first time dealing with GCC bowels, please excuse my superficial understanding of everything I have written above.) The issue is that your testcase is invalid. 
__attribute__((section(".rodata$int0"))) const int fooS = 0; __attribute__((section(".rodata$int1"))) const int foo1 = 1; __attribute__((section(".rodata$int2"))) const int foo2 = 2; __attribute__((section(".rodata$int3"))) const int foo3 = 3; __attribute__((section(".rodata$int4"))) const int fooE = 0; ... int x = ret(*(&fooS + i)); This access is only ever valid for i == 0, as otherwise you are creating a pointer that points outside of the object fooS. Richard. Thanks, Jiri Hruska
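As an illustration of the point above, one way to iterate over adjacent constants without forming pointers outside an object is to place them in a single array. This is only a sketch under that assumption; the names foo_table, foo_at and foo_sum are hypothetical and do not come from the original testcase, which relied on linker section ordering instead.

```c
#include <assert.h>
#include <stddef.h>

/* All values live in ONE object, so indexing 0..FOO_COUNT-1 stays
   inside it.  Hypothetical replacement for fooS, foo1..foo3, fooE.  */
static const int foo_table[] = { 0, 1, 2, 3, 0 };
enum { FOO_COUNT = sizeof foo_table / sizeof foo_table[0] };

/* Bounds-checked accessor: every valid index refers to an element of
   the same array object.  */
static int foo_at(size_t i)
{
    assert(i < FOO_COUNT);
    return foo_table[i];
}

/* Sum all entries by walking the array.  */
static int foo_sum(void)
{
    int sum = 0;
    for (size_t i = 0; i < FOO_COUNT; i++)
        sum += foo_at(i);
    return sum;
}
```

With an array, the compiler knows the extent of the object being indexed, so folding a variable-index access into a constant is no longer a concern for valid indices.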
Re: [PATCH, testsuite]: Fix scan-tree-dump-times argument order in gcc.dg/tree-ssa/vrp68.c.
On Mon, Jun 18, 2012 at 10:01 PM, Janis Johnson janis_john...@mentor.com wrote: On 06/17/2012 05:03 AM, Richard Guenther wrote: On Sun, Jun 17, 2012 at 10:41 AM, Uros Bizjak ubiz...@gmail.com wrote: Hello! The testcase still fails on x86_64-pc-linux-gnu with: FAIL: gcc.dg/tree-ssa/vrp68.c scan-tree-dump-times vrp1 link_error 1 since there are two calls to link_error. Oops. I wonder how I did not see those failures myself ... Richard. I'm confused about what this test is supposed to do. It uses dg-do link which means the compile (test for excess errors) will fail if there is a reference to link_error. There are two uses of scan-tree-dump-times for the same string in the same file, so one of those is guaranteed to fail. It looks like the scans aren't needed, and dg-do link is the thing that needs the xfail. No, the scan-tree-dump-times are supposed to catch that already VRP1 has done the optimization - it does not so fully, which is why I added the XFAILed scan-tree-dump-times. But we still catch that XFAILed case with subsequent optimizations so the link succeeds nevertheless. The testcase fails now, I must have broken the optimization somehow and I am looking into it. Richard. Janis
[patch] Fix failing nested-3.C on ARM.
The regexp in nested-3.C has to parse the machine-specific comment character; on ARM that is '@'. Tested on arm-eabi, where this test now passes. OK? R. * g++.dg/debug/dwarf2/nested-3.C: Add ARM comment character to regexp. --- g++.dg/debug/dwarf2/nested-3.C (revision 188750) +++ g++.dg/debug/dwarf2/nested-3.C (local) @@ -59,4 +59,4 @@ main () // // Hence the scary regexp: // -// { dg-final { scan-assembler \[^\n\r\]*\\(DIE \\(0x(\[0-9a-f\]+)\\) DW_TAG_namespace\\)\[\n\r\]+\[^\n\r\]*\thread\[\^\n\r]+\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*\\(DIE \\(0x(\[0-9a-f\]+)\\) DW_TAG_class_type\\)(\[\n\r\]+\[^\n\r\]*)+\Executor\[^\n\r\]+\[\n\r\]+\[^\n\r\]*DW_AT_declaration\[\n\r\]+\[^\n\r\]*DW_AT_signature\[^#/!|\]*\[#/!|\] \[^\n\r\]*\\(DIE\[^\n\r\]*DW_TAG_subprogram\\)\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*\CurrentExecutor\[^\n\r\]+\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*end of children of DIE 0x\\3\[\n\r]+\[^\n\r\]*end of children of DIE 0x\\1\[\n\r]+ } } +// { dg-final { scan-assembler \[^\n\r\]*\\(DIE \\(0x(\[0-9a-f\]+)\\) DW_TAG_namespace\\)\[\n\r\]+\[^\n\r\]*\thread\[\^\n\r]+\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*\\(DIE \\(0x(\[0-9a-f\]+)\\) DW_TAG_class_type\\)(\[\n\r\]+\[^\n\r\]*)+\Executor\[^\n\r\]+\[\n\r\]+\[^\n\r\]*DW_AT_declaration\[\n\r\]+\[^\n\r\]*DW_AT_signature\[^#/!|@\]*\[#/!|@\] \[^\n\r\]*\\(DIE\[^\n\r\]*DW_TAG_subprogram\\)\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*\CurrentExecutor\[^\n\r\]+\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*end of children of DIE 0x\\3\[\n\r]+\[^\n\r\]*end of children of DIE 0x\\1\[\n\r]+ } }
[PATCH] Fix PR53708
We are too eager to bump alignment of some decls when vectorizing. The fix is to not bump alignment of decls the user explicitly aligned or that are used in an unknown way. Bootstrapped and tested on i686-darwin9 and x86_64-apple-darwin10 and powerpc-apple-darwin9 by darwin folks, applied. Richard. 2012-06-19 Richard Guenther rguent...@suse.de PR tree-optimization/53708 * tree-vect-data-refs.c (vect_can_force_dr_alignment_p): Preserve user-supplied alignment and alignment of decls with the used attribute. Index: gcc/tree-vect-data-refs.c === --- gcc/tree-vect-data-refs.c (revision 188733) +++ gcc/tree-vect-data-refs.c (working copy) @@ -4731,6 +4720,12 @@ vect_can_force_dr_alignment_p (const_tre if (TREE_ASM_WRITTEN (decl)) return false; + /* Do not override explicit alignment set by the user or the alignment + as specified by the ABI when the used attribute is set. */ + if (DECL_USER_ALIGN (decl) + || DECL_PRESERVE_P (decl)) +return false; + if (TREE_STATIC (decl)) return (alignment <= MAX_OFILE_ALIGNMENT); else
Re: [PATCH] Add vector cost model density heuristic
On Mon, 18 Jun 2012, William J. Schmidt wrote: On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: On Fri, 8 Jun 2012, William J. Schmidt wrote: snip Hmm. I don't like this patch or its general idea too much. Instead I'd like us to move more of the cost model detail to the target, giving it a chance to look at the whole loop before deciding on a cost. ISTR posting the overall idea at some point, but let me repeat it here instead of trying to find that e-mail. The basic interface of the cost model should be, in targetm.vectorize /* Tell the target to start cost analysis of a loop or a basic-block (if the loop argument is NULL). Returns an opaque pointer to target-private data. */ void *init_cost (struct loop *loop); /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ void add_stmt_cost (void *data, unsigned n, vectorized-stmt-kind, enum machine_mode vector_mode); /* Tell the target to compute and return the cost of the accumulated statements and free any target-private data. */ unsigned finish_cost (void *data); with eventually slightly different signatures for add_stmt_cost (like pass in the original scalar stmt?). It allows the target, at finish_cost time, to evaluate things like register pressure and resource utilization. Thanks, Richard. I've been looking at this in between other projects. I wanted to be sure I understood the SLP infrastructure and whether it would cause any problems. It looks to me like it will be mostly ok. One issue I noticed is a possible difference in the order in which SLP instructions are analyzed and the order in which the instructions are issued during transformation. For both loop analysis and basic block analysis, SLP trees are constructed and analyzed prior to examining other vectorizable instructions. Their costs are calculated and stored in the SLP trees at this time. 
Later, when transforming statements to their vector equivalents, instructions in the block (or loop body) are processed in order until the first instruction that's part of an SLP tree is encountered. At that point, every instruction that's part of any SLP tree is transformed; then the vectorizer continues with the remaining non-SLP vectorizable statements. So if we do the natural and easy thing of placing calls to add_stmt_cost everywhere that costs are calculated today, the order that those costs are presented to the back end model will possibly be different than the order they are actually emitted. Interesting. But I suppose this is similar to how pattern statements are handled? Thus, the whole pattern sequence is processed when we encounter the main pattern statement? For a first cut at this, I suggest ignoring the problem other than to document it as an opportunity for improvement. Later we could improve it by using an add_stmt_slp_cost () interface (or adding an is_slp flag), and another interface to be called at the time during analysis when the SLP statements will be issued during transformation. This would allow the back end model to queue up the SLP costs in a separate vector and later place them in its internal structures at the appropriate place. It should eventually be possible to remove these fields/accessors: * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST However, I think this should be delayed until we have the basic infrastructure in place for the new model and well-tested. Indeed. The other issue is that we should have the model track both the inside and outside costs if we're going to get everything into the target model. For a first pass we can ignore this and keep the existing logic for the outside costs. 
Later we should add some interfaces analogous to add_stmt_cost such as add_stmt_prolog_cost and add_stmt_epilog_cost so the model can track this stuff as carefully as it wants to. Outside costs are merely added to the niter * inner-cost metric to be compared with the scalar cost niter * scalar-cost, right? Thus they would be tracked completely separate - eventually similar to how we compute the cost of the scalar loop. So, I'd propose going at this in several phases: (1) Add calls to the new interface without disturbing existing logic; modify the profitability algorithms to query the new model for inside costs. Default algorithm for the model is to just sum costs as is done today. Right. (x) Add heuristics to target models as desired. (2) Handle the SLP ordering problem. (3) Handle outside costs in the target model. (4) Remove the now unnecessary cost fields and the calls that set them. Item (x) can happen anytime after item (1). I don't think this work is terribly difficult, just a bit tedious. The only really time-consuming aspect of it will be in
Re: [PATCH] Add vector cost model density heuristic
On Mon, 18 Jun 2012, William J. Schmidt wrote: On Mon, 2012-06-18 at 13:49 -0500, William J. Schmidt wrote: On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: On Fri, 8 Jun 2012, William J. Schmidt wrote: snip Hmm. I don't like this patch or its general idea too much. Instead I'd like us to move more of the cost model detail to the target, giving it a chance to look at the whole loop before deciding on a cost. ISTR posting the overall idea at some point, but let me repeat it here instead of trying to find that e-mail. The basic interface of the cost model should be, in targetm.vectorize /* Tell the target to start cost analysis of a loop or a basic-block (if the loop argument is NULL). Returns an opaque pointer to target-private data. */ void *init_cost (struct loop *loop); /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ void add_stmt_cost (void *data, unsigned n, vectorized-stmt-kind, enum machine_mode vector_mode); /* Tell the target to compute and return the cost of the accumulated statements and free any target-private data. */ unsigned finish_cost (void *data); By the way, I don't see much point in passing the void *data around here. Too many levels of interfaces that we'd have to pass it around in the vectorizer, so it would just sit in a static variable. Might as well let the data be wholly private to the target. Ok, so you'd have void init_cost (struct loop *) and unsigned finish_cost (void); then? Static variables are of course not properly abstracted, so we can't ever compute two sets of costs at the same time ... but that's true all-over-the-place in GCC ... With previous discussion the add_stmt_cost hook would be split up to also allow passing the operation code for example. Richard.
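To make the proposed init_cost / add_stmt_cost / finish_cost shape concrete, here is a small standalone sketch of the "default" behaviour mentioned in the thread (simply summing costs). Everything prefixed demo_ is hypothetical: the statement kinds, the per-kind unit costs and the struct are invented for illustration, the loop and machine-mode arguments are omitted, and none of this is the actual GCC target hook interface.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical statement kinds; the thread leaves "vectorized-stmt-kind"
   unspecified.  */
enum demo_stmt_kind {
    DEMO_SCALAR_STMT,
    DEMO_VECTOR_STMT,
    DEMO_VECTOR_LOAD,
    DEMO_VECTOR_STORE,
    DEMO_NUM_KINDS
};

/* Invented unit costs, indexed by demo_stmt_kind.  */
static const unsigned demo_unit_cost[DEMO_NUM_KINDS] = { 1, 1, 2, 2 };

/* Target-private accumulator hidden behind the opaque pointer.  */
struct demo_cost_data {
    unsigned total;
};

/* Start cost analysis; returns opaque target-private data.  */
static void *demo_init_cost(void)
{
    struct demo_cost_data *d = calloc(1, sizeof *d);
    assert(d != NULL);
    return d;
}

/* Record N statements of the given kind.  A real target could queue
   them here and inspect the whole set later.  */
static void demo_add_stmt_cost(void *data, unsigned n, enum demo_stmt_kind kind)
{
    struct demo_cost_data *d = data;
    assert(d != NULL && kind < DEMO_NUM_KINDS);
    d->total += n * demo_unit_cost[kind];
}

/* Compute the final cost and release the private data.  */
static unsigned demo_finish_cost(void *data)
{
    struct demo_cost_data *d = data;
    assert(d != NULL);
    unsigned total = d->total;
    free(d);
    return total;
}

/* Usage example: two vector loads, three vector ops, one vector store.  */
static unsigned demo_example_block_cost(void)
{
    void *data = demo_init_cost();
    demo_add_stmt_cost(data, 2, DEMO_VECTOR_LOAD);
    demo_add_stmt_cost(data, 3, DEMO_VECTOR_STMT);
    demo_add_stmt_cost(data, 1, DEMO_VECTOR_STORE);
    return demo_finish_cost(data);
}
```

Keeping the accumulator behind an opaque pointer, rather than in a static variable, is what would allow two cost computations to be in flight at once, which is the trade-off raised in the message above.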
Re: RFA: Fix PR53688
Hi, On Tue, 19 Jun 2012, Richard Guenther wrote: So, we have to build a memref always and rewrite its type to one representing the real size. Note that TYPE_MAX_VALUE may be NULL, so we don't need to check for 'len' being null or not. This fixes the C testcase (don't know about fma 3d), and is in regstrapping on x86_64-linux. Okay if that passes? Ok. Thanks, but I now know why we built an INDIRECT_REF :) build_simple_mem_ref() only handles some very constrained arguments, namely pointers and offsetted ADDR_EXPRs when the offset is a constant. It doesn't for instance handle &bla->a[i] (it asserts). So the patch trips over the assert in build_simple_mem_ref on __builtin_memset (&p->c[i], 0, 42);. I could build INDIRECT_REFs always instead of MEM_REFs; this fixes the bug too, but it wouldn't generate any MEM_EXPRs anymore, and hence the whole brouhaha would be dead code (well, except for alignment setting). Or I could build MEM_REFs directly, not via build_simple_mem_ref; that also works, but leaves us with such MEM_EXPRs sometimes: (mem/c:BLK (reg:DI 65) [0 MEM[(void *)&p_1(D)->c[i_2(D)]]+0 A8]) Note the complicated and non-canonical expression in the MEM[]. I'm not sure if the disambiguators do anything interesting with such expressions. If they don't, we'd save memory by not generating this MEM_EXPR at all. If the latter is acceptable, then I indeed can as well wrap everything in a MEM_REF like you proposed (possibly with a predicate simple enough that reflects what build_simple_mem_ref is also checking) and be done with it. So, what should it be? Ciao, Michael.
Re: [PATCH] Fix PR53708
Richard Guenther rguent...@suse.de writes: We are too eager to bump alignment of some decls when vectorizing. The fix is to not bump alignment of decls the user explicitly aligned or that are used in an unknown way. I thought __attribute__((__aligned__)) only set a minimum alignment for variables? Most uses I've seen have been trying to get better performance from higher alignment, so it might not go down well if the attribute stopped the vectoriser from increasing the alignment still further. Richard
Re: [PATCH] Fix PR53708
On Tue, 19 Jun 2012, Richard Sandiford wrote: Richard Guenther rguent...@suse.de writes: We are too eager to bump alignment of some decls when vectorizing. The fix is to not bump alignment of decls the user explicitly aligned or that are used in an unknown way. I thought __attribute__((__aligned__)) only set a minimum alignment for variables? Most uses I've seen have been trying to get better performance from higher alignment, so it might not go down well if the attribute stopped the vectoriser from increasing the alignment still further. That's what the documentation says indeed. I'm not sure which part of the patch fixes the ObjC failures where the alignment is part of the ABI (and I suppose ObjC then mis-uses the aligned attribute?). Richard.
Re: [PATCH] Improve pattern recognizer for division by constant (PR tree-optimization/51581)
On Mon, Jun 18, 2012 at 04:44:21PM -0700, Richard Henderson wrote: On 2012-06-14 13:58, Jakub Jelinek wrote: + if (!supportable_widening_operation (WIDEN_MULT_EXPR, last_stmt, + vecwtype, vectype, + dummy, dummy, dummy_code, + dummy_code, dummy_int, dummy_vec)) +return NULL; It would be nice to be able to handle high-part multiplies as well, e.g. VEC_WIDEN_MULT_HI_EXPR. Which is what Altivec provides, and not VEC_WIDEN_MULT. Sure, but we don't have a tree code for that right now, do we? VEC_WIDEN_MULT_HI_EXPR is just one half of the widened multiply results, not all the high halves of the widened multiply. For 16-bit multiplication we could also use {,V}PMULH{,U}W (for 32-bit multiplication we use two {,V}PMUL{,U}DQ plus shifts afterwards). Jakub
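For context on why a high-part multiply is useful for division by a constant, the scalar sketch below divides a 16-bit unsigned value by 3 using only the upper half of a 16x16 multiply followed by a shift. It is an illustration, not the vectorizer's code: mulhi_u16 models per-lane what an unsigned high-multiply such as PMULHUW produces, and the magic constant 0xAAAB (that is, ceil(2^17 / 3)) is the standard choice for this particular divisor and width. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* High 16 bits of the 32-bit product of two unsigned 16-bit values.  */
static uint16_t mulhi_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* x / 3 for any 16-bit unsigned x: multiply by ceil(2^17 / 3) = 0xAAAB,
   keep the high half (a shift by 16), then shift right once more.  */
static uint16_t div3_u16(uint16_t x)
{
    return (uint16_t)(mulhi_u16(x, 0xAAAB) >> 1);
}

/* Exhaustively compare against the ordinary division operator.  */
static bool div3_u16_matches_everywhere(void)
{
    for (uint32_t x = 0; x <= UINT16_MAX; x++) {
        if (div3_u16((uint16_t)x) != (uint16_t)(x / 3))
            return false;
    }
    return true;
}
```

Only the high halves of the products are needed, which is why an operation returning all the high halves would be enough here, as opposed to one half of the full widened results.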
Re: RFA: Fix PR53688
On Tue, Jun 19, 2012 at 12:13 PM, Michael Matz m...@suse.de wrote: Hi, On Tue, 19 Jun 2012, Richard Guenther wrote: So, we have to build a memref always and rewrite its type to one representing the real size. Note that TYPE_MAX_VALUE may be NULL, so we don't need to check for 'len' being null or not. This fixes the C testcase (don't know about fma 3d), and is in regstrapping on x86_64-linux. Okay if that passes? Ok. Thanks, but I now know why we built an INDIRECT_REF :) build_simple_mem_ref() only handles some very constrained arguments, namely pointers and offsetted ADDR_EXPRs when the offset is a constant. It doesn't for instance handle &bla->a[i] (it asserts). So the patch trips over the assert in build_simple_mem_ref on __builtin_memset (&p->c[i], 0, 42);. I could build INDIRECT_REFs always instead of MEM_REFs; this fixes the bug too, but it wouldn't generate any MEM_EXPRs anymore, and hence the whole brouhaha would be dead code (well, except for alignment setting). Or I could build MEM_REFs directly, not via build_simple_mem_ref; that also works, but leaves us with such MEM_EXPRs sometimes: (mem/c:BLK (reg:DI 65) [0 MEM[(void *)&p_1(D)->c[i_2(D)]]+0 A8]) Note the complicated and non-canonical expression in the MEM[]. I'm not sure if the disambiguators do anything interesting with such expressions. If they don't, we'd save memory by not generating this MEM_EXPR at all. If the latter is acceptable, then I indeed can as well wrap everything in a MEM_REF like you proposed (possibly with a predicate simple enough that reflects what build_simple_mem_ref is also checking) and be done with it. So, what should it be? The MEM_REF is acceptable to the tree oracle and it can extract points-to information from it. Thus for simplicity unconditionally building the above is the best. We can always massage both fold to handle more complex cases (like the POINTER_PLUS_EXPR case) and set_mem_attributes to canonicalize / strip the above from useless parts. Thanks, Richard. Ciao, Michael.
RE: [4.6][ARM] Backport MCR Not available in Thumb1
Oops! Sorry for such a stupid problem. 2012-06-18 Joey Ye joey...@arm.com Backported from mainline 2011-10-14 David Alan Gilbert david.gilb...@linaro.org * config/arm/arm.h (TARGET_HAVE_DMB_MCR): MCR Not available in Thumb1. Index: gcc/config/arm/arm.h === --- gcc/config/arm/arm.h (revision 188331) +++ gcc/config/arm/arm.h (working copy) @@ -294,7 +294,8 @@ #define TARGET_HAVE_DMB (arm_arch7) /* Nonzero if this chip implements a memory barrier via CP15. */ -#define TARGET_HAVE_DMB_MCR (arm_arch6k && ! TARGET_HAVE_DMB) +#define TARGET_HAVE_DMB_MCR (arm_arch6 && ! TARGET_HAVE_DMB \ + && ! TARGET_THUMB1) /* Nonzero if this chip implements a memory barrier instruction. */ #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR) -Original Message- From: Richard Earnshaw Sent: Tuesday, June 19, 2012 16:43 To: Joey Ye Cc: GCC Patches Subject: Re: [4.6][ARM] Backport MCR Not available in Thumb1 On 19/06/12 04:03, Joey Ye wrote: Backporting trunk r179979 OK for 4.6? Backported from mainline 2011-10-14 David Alan Gilbert david.gilb...@linaro.org PR target/48126 * config/arm/arm.c (arm_output_sync_loop): Move label before barrier. Index: gcc/config/arm/arm.h === --- gcc/config/arm/arm.h (revision 188331) +++ gcc/config/arm/arm.h (working copy) @@ -294,7 +294,8 @@ #define TARGET_HAVE_DMB (arm_arch7) /* Nonzero if this chip implements a memory barrier via CP15. */ -#define TARGET_HAVE_DMB_MCR (arm_arch6k && ! TARGET_HAVE_DMB) +#define TARGET_HAVE_DMB_MCR (arm_arch6 && ! TARGET_HAVE_DMB \ + && ! TARGET_THUMB1) /* Nonzero if this chip implements a memory barrier instruction. */ #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR) Not ok (yet), the ChangeLog entry doesn't match the patch. R.
Re: [PATCH] Add vector cost model density heuristic
On Tue, 2012-06-19 at 12:08 +0200, Richard Guenther wrote: On Mon, 18 Jun 2012, William J. Schmidt wrote: On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: On Fri, 8 Jun 2012, William J. Schmidt wrote: snip Hmm. I don't like this patch or its general idea too much. Instead I'd like us to move more of the cost model detail to the target, giving it a chance to look at the whole loop before deciding on a cost. ISTR posting the overall idea at some point, but let me repeat it here instead of trying to find that e-mail. The basic interface of the cost model should be, in targetm.vectorize /* Tell the target to start cost analysis of a loop or a basic-block (if the loop argument is NULL). Returns an opaque pointer to target-private data. */ void *init_cost (struct loop *loop); /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ void add_stmt_cost (void *data, unsigned n, vectorized-stmt-kind, enum machine_mode vector_mode); /* Tell the target to compute and return the cost of the accumulated statements and free any target-private data. */ unsigned finish_cost (void *data); with eventually slightly different signatures for add_stmt_cost (like pass in the original scalar stmt?). It allows the target, at finish_cost time, to evaluate things like register pressure and resource utilization. Thanks, Richard. I've been looking at this in between other projects. I wanted to be sure I understood the SLP infrastructure and whether it would cause any problems. It looks to me like it will be mostly ok. One issue I noticed is a possible difference in the order in which SLP instructions are analyzed and the order in which the instructions are issued during transformation. For both loop analysis and basic block analysis, SLP trees are constructed and analyzed prior to examining other vectorizable instructions. Their costs are calculated and stored in the SLP trees at this time. 
Later, when transforming statements to their vector equivalents, instructions in the block (or loop body) are processed in order until the first instruction that's part of an SLP tree is encountered. At that point, every instruction that's part of any SLP tree is transformed; then the vectorizer continues with the remaining non-SLP vectorizable statements. So if we do the natural and easy thing of placing calls to add_stmt_cost everywhere that costs are calculated today, the order that those costs are presented to the back end model will possibly be different than the order they are actually emitted. Interesting. But I suppose this is similar to how pattern statements are handled? Thus, the whole pattern sequence is processed when we encounter the main pattern statement? Yes, but the difference is that both vect_analyze_stmt and vect_transform_loop handle the pattern statements in the same order (thankfully -- I would hate to have to deal with the pattern mess). With SLP, all SLP statements are analyzed ahead of time, but they aren't transformed until one of them is encountered in the statement walk. For a first cut at this, I suggest ignoring the problem other than to document it as an opportunity for improvement. Later we could improve it by using an add_stmt_slp_cost () interface (or adding an is_slp flag), and another interface to be called at the time during analysis when the SLP statements will be issued during transformation. This would allow the back end model to queue up the SLP costs in a separate vector and later place them in its internal structures at the appropriate place. It should eventually be possible to remove these fields/accessors: * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST However, I think this should be delayed until we have the basic infrastructure in place for the new model and well-tested. Indeed. 
The other issue is that we should have the model track both the inside and outside costs if we're going to get everything into the target model. For a first pass we can ignore this and keep the existing logic for the outside costs. Later we should add some interfaces analogous to add_stmt_cost such as add_stmt_prolog_cost and add_stmt_epilog_cost so the model can track this stuff as carefully as it wants to. Outside costs are merely added to the niter * inner-cost metric to be compared with the scalar cost niter * scalar-cost, right? Thus they would be tracked completely separate - eventually similar to how we compute the cost of the scalar loop. Yes, that's the way they're used today, and probably nobody will ever want to get fancier than that. But as you say, the idea would be to let them be tracked similarly, but in
Re: [PATCH] Add vector cost model density heuristic
On Tue, 19 Jun 2012, William J. Schmidt wrote: On Tue, 2012-06-19 at 12:08 +0200, Richard Guenther wrote: On Mon, 18 Jun 2012, William J. Schmidt wrote: On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: On Fri, 8 Jun 2012, William J. Schmidt wrote: snip Hmm. I don't like this patch or its general idea too much. Instead I'd like us to move more of the cost model detail to the target, giving it a chance to look at the whole loop before deciding on a cost. ISTR posting the overall idea at some point, but let me repeat it here instead of trying to find that e-mail. The basic interface of the cost model should be, in targetm.vectorize /* Tell the target to start cost analysis of a loop or a basic-block (if the loop argument is NULL). Returns an opaque pointer to target-private data. */ void *init_cost (struct loop *loop); /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ void add_stmt_cost (void *data, unsigned n, vectorized-stmt-kind, enum machine_mode vector_mode); /* Tell the target to compute and return the cost of the accumulated statements and free any target-private data. */ unsigned finish_cost (void *data); with eventually slightly different signatures for add_stmt_cost (like pass in the original scalar stmt?). It allows the target, at finish_cost time, to evaluate things like register pressure and resource utilization. Thanks, Richard. I've been looking at this in between other projects. I wanted to be sure I understood the SLP infrastructure and whether it would cause any problems. It looks to me like it will be mostly ok. One issue I noticed is a possible difference in the order in which SLP instructions are analyzed and the order in which the instructions are issued during transformation. For both loop analysis and basic block analysis, SLP trees are constructed and analyzed prior to examining other vectorizable instructions. Their costs are calculated and stored in the SLP trees at this time. 
Later, when transforming statements to their vector equivalents, instructions in the block (or loop body) are processed in order until the first instruction that's part of an SLP tree is encountered. At that point, every instruction that's part of any SLP tree is transformed; then the vectorizer continues with the remaining non-SLP vectorizable statements. So if we do the natural and easy thing of placing calls to add_stmt_cost everywhere that costs are calculated today, the order that those costs are presented to the back end model will possibly be different than the order they are actually emitted. Interesting. But I suppose this is similar to how pattern statements are handled? Thus, the whole pattern sequence is processed when we encounter the main pattern statement? Yes, but the difference is that both vect_analyze_stmt and vect_transform_loop handle the pattern statements in the same order (thankfully -- I would hate to have to deal with the pattern mess). With SLP, all SLP statements are analyzed ahead of time, but they aren't transformed until one of them is encountered in the statement walk. Ah, ok. I suppose we can simply declare that when we register vectorized stmts with the backend they are in arbitrary order. After all this is not supposed to be another machine dependent reorg phase (to quote David). For a first cut at this, I suggest ignoring the problem other than to document it as an opportunity for improvement. Later we could improve it by using an add_stmt_slp_cost () interface (or adding an is_slp flag), and another interface to be called at the time during analysis when the SLP statements will be issued during transformation. This would allow the back end model to queue up the SLP costs in a separate vector and later place them in its internal structures at the appropriate place. 
It should eventually be possible to remove these fields/accessors: * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST However, I think this should be delayed until we have the basic infrastructure in place for the new model and well-tested. Indeed. The other issue is that we should have the model track both the inside and outside costs if we're going to get everything into the target model. For a first pass we can ignore this and keep the existing logic for the outside costs. Later we should add some interfaces analogous to add_stmt_cost such as add_stmt_prolog_cost and add_stmt_epilog_cost so the model can track this stuff as carefully as it wants to. Outside costs are merely added to the niter *
Re: [4.6][ARM] Backport MCR Not available in Thumb1
On 19/06/12 12:26, Joey Ye wrote: Oops! Sorry for such a stupid problem. 2012-06-18 Joey Ye joey...@arm.com Backported from mainline 2011-10-14 David Alan Gilbert david.gilb...@linaro.org * config/arm/arm.h (TARGET_HAVE_DMB_MCR): MCR Not available in Thumb1. OK. R.
Re: [PATCH] Add vector cost model density heuristic
On Tue, 2012-06-19 at 12:10 +0200, Richard Guenther wrote: On Mon, 18 Jun 2012, William J. Schmidt wrote: On Mon, 2012-06-18 at 13:49 -0500, William J. Schmidt wrote: On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: On Fri, 8 Jun 2012, William J. Schmidt wrote: snip Hmm. I don't like this patch or its general idea too much. Instead I'd like us to move more of the cost model detail to the target, giving it a chance to look at the whole loop before deciding on a cost. ISTR posting the overall idea at some point, but let me repeat it here instead of trying to find that e-mail. The basic interface of the cost model should be, in targetm.vectorize /* Tell the target to start cost analysis of a loop or a basic-block (if the loop argument is NULL). Returns an opaque pointer to target-private data. */ void *init_cost (struct loop *loop); /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ void add_stmt_cost (void *data, unsigned n, vectorized-stmt-kind, enum machine_mode vector_mode); /* Tell the target to compute and return the cost of the accumulated statements and free any target-private data. */ unsigned finish_cost (void *data); By the way, I don't see much point in passing the void *data around here. Too many levels of interfaces that we'd have to pass it around in the vectorizer, so it would just sit in a static variable. Might as well let the data be wholly private to the target. Ok, so you'd have void init_cost (struct loop *) and unsigned finish_cost (void); then? Static variables are of course not properly abstracted so we can't ever compute two sets of costs at the same time ... but that's true all-over-the-place in GCC ... It's a fair point, and perhaps I'll decide to pass the data pointer around anyway to keep that option open. We'll see which looks uglier. With previous discussion the add_stmt_cost hook would be split up to also allow passing the operation code for example. 
I remember having this discussion, and I was looking for it to check on the details, but I can't seem to find it either in my inbox or in the archives. Can you please point me to that again? Sorry for the bother. Thanks, Bill Richard.
Re: [PATCH] Add vector cost model density heuristic
On Tue, 19 Jun 2012, William J. Schmidt wrote: On Tue, 2012-06-19 at 12:10 +0200, Richard Guenther wrote: On Mon, 18 Jun 2012, William J. Schmidt wrote: On Mon, 2012-06-18 at 13:49 -0500, William J. Schmidt wrote: On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: On Fri, 8 Jun 2012, William J. Schmidt wrote: snip Hmm. I don't like this patch or its general idea too much. Instead I'd like us to move more of the cost model detail to the target, giving it a chance to look at the whole loop before deciding on a cost. ISTR posting the overall idea at some point, but let me repeat it here instead of trying to find that e-mail. The basic interface of the cost model should be, in targetm.vectorize /* Tell the target to start cost analysis of a loop or a basic-block (if the loop argument is NULL). Returns an opaque pointer to target-private data. */ void *init_cost (struct loop *loop); /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ void add_stmt_cost (void *data, unsigned n, vectorized-stmt-kind, enum machine_mode vector_mode); /* Tell the target to compute and return the cost of the accumulated statements and free any target-private data. */ unsigned finish_cost (void *data); By the way, I don't see much point in passing the void *data around here. Too many levels of interfaces that we'd have to pass it around in the vectorizer, so it would just sit in a static variable. Might as well let the data be wholly private to the target. Ok, so you'd have void init_cost (struct loop *) and unsigned finish_cost (void); then? Static variables are of course not properly abstracted so we can't ever compute two sets of costs at the same time ... but that's true all-over-the-place in GCC ... It's a fair point, and perhaps I'll decide to pass the data pointer around anyway to keep that option open. We'll see which looks uglier. With previous discussion the add_stmt_cost hook would be split up to also allow passing the operation code for example. 
I remember having this discussion, and I was looking for it to check on the details, but I can't seem to find it either in my inbox or in the archives. Can you please point me to that again? Sorry for the bother. It was in the Correct cost model for strided loads thread. Richard.
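For readers following the interface discussion in this thread, here is a minimal C sketch of the opaque-pointer variant of the three hooks. It is an illustration under stated assumptions, not the targetm.vectorize code: `enum stmt_kind`, `struct cost_data` and the per-kind weights are hypothetical, and the `struct loop *` argument of init_cost is dropped to keep the sketch self-contained. What it tries to show is why handing back a `void *` instead of keeping a static variable allows two sets of costs to be accumulated at the same time.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the "vectorized-stmt-kind" in the proposal.  */
enum stmt_kind { KIND_VECTOR_STMT, KIND_VECTOR_LOAD, KIND_VECTOR_STORE };

/* Hypothetical target-private data; callers only ever see a void *.  */
struct cost_data { unsigned total; };

/* Start a cost analysis and return the opaque handle.  The proposal
   passes a struct loop * here; it is omitted in this sketch.  */
void *
init_cost (void)
{
  struct cost_data *d = calloc (1, sizeof *d);
  assert (d != NULL);
  return d;
}

/* Add the cost of N statements of KIND.  The weights are made up:
   1 for a plain vector statement, 2 for a load or a store.  */
void
add_stmt_cost (void *data, unsigned n, enum stmt_kind kind)
{
  struct cost_data *d = data;
  unsigned per_stmt = (kind == KIND_VECTOR_STMT) ? 1 : 2;
  d->total += n * per_stmt;
}

/* Return the accumulated cost and free the private data.  */
unsigned
finish_cost (void *data)
{
  struct cost_data *d = data;
  unsigned total = d->total;
  free (d);
  return total;
}
```

With two handles obtained from two init_cost calls, costs added to one do not leak into the other; a design built on a single static variable could not keep them apart.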
Re: [PATCH] Fix PR53708
On Tue, 19 Jun 2012, Richard Guenther wrote: Richard Guenther rguent...@suse.de writes: We are too eager to bump alignment of some decls when vectorizing. The fix is to not bump alignment of decls the user explicitly aligned or that are used in an unknown way. I thought attribute((__aligned__)) only set a minimum alignment for variables? Most uses I've seen have been trying to get better performance from higher alignment, so it might not go down well if the attribute stopped the vectoriser from increasing the alignment still further. That's what the documentation says indeed. I'm not sure which part of the patch fixes the ObjC failures where the alignment is part of the ABI (and I suppose ObjC then mis-uses the aligned attribute?). A quick test shows that if (DECL_PRESERVE_P (decl)) alone is enough to fix the objc failures, while they are still there if one uses only if (DECL_USER_ALIGN (decl)). Dominique
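To make the "minimum alignment" reading concrete, here is a small C sketch using the GNU `aligned` variable attribute. The names `buffer`, `is_multiple_of` and `buffer_meets_minimum` are made up for illustration. The check only verifies that the requested lower bound holds; it says nothing about whether the compiler chose a larger alignment, which is exactly the freedom being discussed.

```c
#include <assert.h>
#include <stdint.h>

/* Request at least 32-byte alignment (GNU extension).  This is treated
   here as a lower bound; the compiler may pick something larger.  */
static char buffer[64] __attribute__ ((aligned (32)));

/* Return nonzero if ADDR is a multiple of ALIGN (ALIGN must be nonzero).  */
int
is_multiple_of (uintptr_t addr, uintptr_t align)
{
  assert (align != 0);
  return addr % align == 0;
}

/* Return nonzero if BUFFER honours the requested minimum alignment.  */
int
buffer_meets_minimum (void)
{
  return is_multiple_of ((uintptr_t) buffer, 32);
}
```

A buffer that ended up 64- or 128-byte aligned would pass the same check, so code relying on the attribute for performance would not notice a further increase by the vectoriser.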
[PATCH][5/n] VRP and anti-ranges
This adjusts intersect_ranges to match what will become union_ranges (but in a separate patch). Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2012-06-19 Richard Guenther rguent...@suse.de * tree-vrp.c (intersect_ranges): Handle more cases. (vrp_intersect_ranges): Dump what we intersect and call ... (vrp_intersect_ranges_1): ... this. Index: gcc/tree-vrp.c === *** gcc/tree-vrp.c (revision 188771) --- gcc/tree-vrp.c (working copy) *** intersect_ranges (enum value_range_type *** 6781,6789 enum value_range_type vr1type, tree vr1min, tree vr1max) { /* [] is vr0, () is vr1 in the following classification comments. */ ! if (operand_less_p (*vr0max, vr1min) == 1 ! || operand_less_p (vr1max, *vr0min) == 1) { /* [ ] ( ) or ( ) [ ] If the ranges have an empty intersection, the result of the --- 6781,6811 enum value_range_type vr1type, tree vr1min, tree vr1max) { + bool mineq = operand_equal_p (*vr0min, vr1min, 0); + bool maxeq = operand_equal_p (*vr0max, vr1max, 0); + /* [] is vr0, () is vr1 in the following classification comments. */ ! if (mineq maxeq) ! { ! /* [( )] */ ! if (*vr0type == vr1type) ! /* Nothing to do for equal ranges. */ ! ; ! else if ((*vr0type == VR_RANGE !vr1type == VR_ANTI_RANGE) ! || (*vr0type == VR_ANTI_RANGE ! vr1type == VR_RANGE)) ! { ! /* For anti-range with range intersection the result is empty. */ ! *vr0type = VR_UNDEFINED; ! *vr0min = NULL_TREE; ! *vr0max = NULL_TREE; ! } ! else ! gcc_unreachable (); ! } ! else if (operand_less_p (*vr0max, vr1min) == 1 ! || operand_less_p (vr1max, *vr0min) == 1) { /* [ ] ( ) or ( ) [ ] If the ranges have an empty intersection, the result of the *** intersect_ranges (enum value_range_type *** 6813,6831 /* Take VR0. */ } } ! else if (operand_less_p (vr1max, *vr0max) == 1 ! operand_less_p (*vr0min, vr1min) == 1) { ! /* [ ( ) ] */ ! if (*vr0type == VR_RANGE) { ! /* If the outer is a range choose the inner one. !??? If the inner is an anti-range this arbitrarily chooses !the anti-range. 
*/ *vr0type = vr1type; *vr0min = vr1min; *vr0max = vr1max; } else if (*vr0type == VR_ANTI_RANGE vr1type == VR_ANTI_RANGE) /* If both are anti-ranges the result is the outer one. */ --- 6835,6882 /* Take VR0. */ } } ! else if ((maxeq || operand_less_p (vr1max, *vr0max) == 1) ! (mineq || operand_less_p (*vr0min, vr1min) == 1)) { ! /* [ ( ) ] or [( ) ] or [ ( )] */ ! if (*vr0type == VR_RANGE ! vr1type == VR_RANGE) { ! /* If both are ranges the result is the inner one. */ *vr0type = vr1type; *vr0min = vr1min; *vr0max = vr1max; } + else if (*vr0type == VR_RANGE + vr1type == VR_ANTI_RANGE) + { + /* Choose the right gap if the left one is empty. */ + if (mineq) + { + if (TREE_CODE (vr1max) == INTEGER_CST) + *vr0min = int_const_binop (PLUS_EXPR, vr1max, integer_one_node); + else + *vr0min = vr1max; + } + /* Choose the left gap if the right one is empty. */ + else if (maxeq) + { + if (TREE_CODE (vr1min) == INTEGER_CST) + *vr0max = int_const_binop (MINUS_EXPR, vr1min, + integer_one_node); + else + *vr0max = vr1min; + } + /* Choose the anti-range if the range is effectively varying. */ + else if (vrp_val_is_min (*vr0min) + vrp_val_is_max (*vr0max)) + { + *vr0type = vr1type; + *vr0min = vr1min; + *vr0max = vr1max; + } + /* Else choose the range. */ + } else if (*vr0type == VR_ANTI_RANGE vr1type == VR_ANTI_RANGE) /* If both are anti-ranges the result is the outer one. */ *** intersect_ranges (enum value_range_type *** 6841,6856 else gcc_unreachable (); } ! else if (operand_less_p (*vr0max, vr1max) == 1 ! operand_less_p (vr1min, *vr0min) == 1) { ! /* ( [ ] ) */ ! if (vr1type == VR_RANGE) ! /* If the outer is a range, choose the inner one. ! ??? If the inner is an anti-range this arbitrarily chooses ! the anti-range. */ ; else if (*vr0type ==
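The range/anti-range cases in the patch above can be sketched on plain integers. This is a simplified model, not tree-vrp.c: `struct vr` and `intersect_range_anti` are hypothetical names, bounds are `long` values instead of trees, and only the situation where the anti-range lies inside the range (possibly sharing a bound) is covered. It follows the same order of decisions: equal bounds give an empty result, a shared lower bound keeps the right gap, a shared upper bound keeps the left gap, a full-width range yields the anti-range, and otherwise the range is kept.

```c
#include <assert.h>
#include <limits.h>

enum vr_type { VR_UNDEFINED, VR_RANGE, VR_ANTI_RANGE };

/* Simplified value range: [min, max] or ~[min, max].  */
struct vr { enum vr_type type; long min, max; };

/* Intersect range R with anti-range A, assuming A lies within R
   (R.min <= A.min and A.max <= R.max).  */
struct vr
intersect_range_anti (struct vr r, struct vr a)
{
  assert (r.type == VR_RANGE && a.type == VR_ANTI_RANGE);
  assert (r.min <= a.min && a.max <= r.max);

  if (r.min == a.min && r.max == a.max)
    {
      /* [( )]: a range intersected with its own complement is empty.  */
      r.type = VR_UNDEFINED;
      r.min = r.max = 0;
    }
  else if (r.min == a.min)
    /* [( ) ]: the left gap is empty, keep the right one.  */
    r.min = a.max + 1;
  else if (r.max == a.max)
    /* [ ( )]: the right gap is empty, keep the left one.  */
    r.max = a.min - 1;
  else if (r.min == LONG_MIN && r.max == LONG_MAX)
    /* The range is effectively varying, so the anti-range says more.  */
    r = a;
  /* Else keep the range, which is a conservative superset.  */
  return r;
}
```

For example, [0, 10] intersected with ~[0, 3] becomes [4, 10], while [0, 10] intersected with ~[3, 5] stays [0, 10] because two disjoint pieces cannot be represented by a single range.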
[arm] Remove obsolete FPA support (7/n): Tidy up attributes
This patch cleans up some more of the resulting fall-out from removing the FPA and maverick co-processors. In particular it covers: - Removing the redundant states from the type attributes - Removing some now redundant UNSPEC values. - Removing some state from the generic scheduler description that is now no-longer needed. Tested on arm-eabi and installed on trunk. * arm.md (enum unspec): Delete UNSPEC_SIN and UNSPEC_COS. (attr type): Remove fmul, ffmul, farith, ffarith, float_em f_fpa_load, f_fpa_store, f_mem_r, r_mem_f. (attr write_conflict, attr core_cycles): Update. * arm-generic.md (r_mem_f_wbuf): Delete reservation. R.Index: config/arm/arm.md === --- config/arm/arm.md (revision 188771) +++ config/arm/arm.md (working copy) @@ -65,12 +65,6 @@ (define_constants ;; Unspec enumerators for iwmmxt2 are defined in iwmmxt2.md (define_c_enum unspec [ - UNSPEC_SIN; `sin' operation (MODE_FLOAT): -; operand 0 is the result, -; operand 1 the parameter. - UNPSEC_COS; `cos' operation (MODE_FLOAT): -; operand 0 is the result, -; operand 1 the parameter. UNSPEC_PUSH_MULT ; `push multiple' operation: ; operand 0 is the first register, ; subsequent registers are in parallel (use ...) @@ -321,21 +315,11 @@ (define_attr insn ; floata floating point arithmetic operation (subject to expansion) ; fdivdDFmode floating point division ; fdivsSFmode floating point division -; fmul Floating point multiply -; ffmulFast floating point multiply -; farith Floating point arithmetic (4 cycle) -; ffarith Fast floating point arithmetic (2 cycle) -; float_em a floating point arithmetic operation that is normally emulated -; even on a machine with an fpa. -; f_fpa_load a floating point load from memory. Only for the FPA. -; f_fpa_store a floating point store to memory. Only for the FPA. ; f_load[sd] A single/double load from memory. Used for VFP unit. ; f_store[sd] A single/double store to memory. Used for VFP unit. 
; f_flag a transfer of co-processor flags to the CPSR -; f_mem_r a transfer of a floating point register to a real reg via mem -; r_mem_f the reverse of f_mem_r -; f_2_rfast transfer float to arm (no memory needed) -; r_2_ffast transfer arm to float +; f_2_rtransfer float to core (no memory needed) +; r_2_ftransfer core to float ; f_cvtconvert floating-integral ; branch a branch ; call a subroutine call @@ -351,18 +335,59 @@ (define_attr insn ; (define_attr type - alu,alu_shift,alu_shift_reg,mult,block,float,fdivx,fdivd,fdivs,fmul,fmuls,fmuld,fmacs,fmacd,ffmul,farith,ffarith,f_flag,float_em,f_fpa_load,f_fpa_store,f_loads,f_loadd,f_stores,f_stored,f_mem_r,r_mem_f,f_2_r,r_2_f,f_cvt,branch,call,load_byte,load1,load2,load3,load4,store1,store2,store3,store4,fconsts,fconstd,fadds,faddd,ffariths,ffarithd,fcmps,fcmpd,fcpys - (if_then_else -(eq_attr insn smulxy,smlaxy,smlalxy,smulwy,smlawx,mul,muls,mla,mlas,umull,umulls,umlal,umlals,smull,smulls,smlal,smlals) -(const_string mult) -(const_string alu))) + alu,\ + alu_shift,\ + alu_shift_reg,\ + mult,\ + block,\ + float,\ + fdivd,\ + fdivs,\ + fmuls,\ + fmuld,\ + fmacs,\ + fmacd,\ + f_flag,\ + f_loads,\ + f_loadd,\ + f_stores,\ + f_stored,\ + f_2_r,\ + r_2_f,\ + f_cvt,\ + branch,\ + call,\ + load_byte,\ + load1,\ + load2,\ + load3,\ + load4,\ + store1,\ + store2,\ + store3,\ + store4,\ + fconsts,\ + fconstd,\ + fadds,\ + faddd,\ + ffariths,\ + ffarithd,\ + fcmps,\ + fcmpd,\ + fcpys + (if_then_else +(eq_attr insn smulxy,smlaxy,smlalxy,smulwy,smlawx,mul,muls,mla,mlas,\ +umull,umulls,umlal,umlals,smull,smulls,smlal,smlals) +(const_string mult) +(const_string alu))) ; Is this an (integer side) multiply with a 64-bit result? 
(define_attr mul64 no,yes -(if_then_else - (eq_attr insn smlalxy,umull,umulls,umlal,umlals,smull,smulls,smlal,smlals) - (const_string yes) - (const_string no))) + (if_then_else +(eq_attr insn + smlalxy,umull,umulls,umlal,umlals,smull,smulls,smlal,smlals) +(const_string yes) +(const_string no))) ; wtype for WMMX insn scheduling purposes. (define_attr wtype @@ -486,7 +511,7 @@ (define_attr model_wbuf no,yes (cons ; to stall the processor. Used with model_wbuf above. (define_attr write_conflict no,yes (if_then_else (eq_attr type - block,float_em,f_fpa_load,f_fpa_store,f_mem_r,r_mem_f,call,load1) +block,call,load1)
[PATCH][AARCH64]: Invent new regclass - FP low regs.
Hi, The attached patch invents a new register class V0 - V15 that is needed for some lane variants of AdvSIMD instructions that can only take V0 - V15 as their indexed register when working on half-word type. Regression tests are happy. OK? Thanks, Tejas Belagod. ARM. Changelog: 2012-06-19 Tejas Belagod tejas.bela...@arm.com gcc/ * config/aarch64/aarch64-simd.md (aarch64_sqrdmulh_lanemode, aarch64_sqdmlSBINQOPS:asl_lanemode_internal, aarch64_sqdmlal_lanemode, aarch64_sqdmlal_laneqmode, aarch64_sqdmlsl_lanemode, aarch64_sqdmlsl_laneqmode, aarch64_sqdmlSBINQOPS:asl2_lanemode_internal, aarch64_sqdmlal2_lanemode, aarch64_sqdmlal2_laneqmode, aarch64_sqdmlsl2_lanemode, aarch64_sqdmlsl2_laneqmode, aarch64_sqdmull_lanemode_internal, aarch64_sqdmull_lanemode, aarch64_sqdmull_laneqmode, aarch64_sqdmull2_lanemode_internal, aarch64_sqdmull2_lanemode, aarch64_sqdmull2_laneqmode): Change the constraint of the indexed operand to use vwl instead of w. * config/aarch64/aarch64.c (aarch64_hard_regno_nregs): Add case for FP_LO_REGS class. (aarch64_regno_regclass): Return FP_LO_REGS if register in V0 - V15. (aarch64_secondary_reload): Change condition to check for both FP reg classes. (aarch64_class_max_nregs): Add case for FP_LO_REGS. * config/aarch64/aarch64.h (reg_class): New register class FP_LO_REGS. (REG_CLASS_NAMES): Likewise. (REG_CLASS_CONTENTS): Likewise. (FP_LO_REGNUM_P): New. * config/aarch64/aarch64.md (V15_REGNUM): New. * config/aarch64/constraints.md (x): New register constraint. 
* config/aarch64/iterators.md (vwx): New.diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 9ceefee..43017df 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1897,7 +1897,7 @@ (unspec:VSDQ_HSI [(match_operand:VSDQ_HSI 1 register_operand w) (vec_select:VEL - (match_operand:VCON 2 register_operand w) + (match_operand:VCON 2 register_operand vwx) (parallel [(match_operand:SI 3 immediate_operand i)]))] VQDMULH))] TARGET_SIMD @@ -1940,7 +1940,7 @@ (sign_extend:VWIDE (vec_duplicate:VD_HSI (vec_select:VEL - (match_operand:VCON 3 register_operand w) + (match_operand:VCON 3 register_operand vwx) (parallel [(match_operand:SI 4 immediate_operand i)]))) )) (const_int 1] @@ -1960,7 +1960,7 @@ (match_operand:SD_HSI 2 register_operand w)) (sign_extend:VWIDE (vec_select:VEL - (match_operand:VCON 3 register_operand w) + (match_operand:VCON 3 register_operand vwx) (parallel [(match_operand:SI 4 immediate_operand i)]))) ) (const_int 1] @@ -1974,7 +1974,7 @@ [(match_operand:VWIDE 0 register_operand =w) (match_operand:VWIDE 1 register_operand 0) (match_operand:VSD_HSI 2 register_operand w) - (match_operand:VCON 3 register_operand w) + (match_operand:VCON 3 register_operand vwx) (match_operand:SI 4 immediate_operand i)] TARGET_SIMD { @@ -1989,7 +1989,7 @@ [(match_operand:VWIDE 0 register_operand =w) (match_operand:VWIDE 1 register_operand 0) (match_operand:VSD_HSI 2 register_operand w) - (match_operand:VCON 3 register_operand w) + (match_operand:VCON 3 register_operand vwx) (match_operand:SI 4 immediate_operand i)] TARGET_SIMD { @@ -2004,7 +2004,7 @@ [(match_operand:VWIDE 0 register_operand =w) (match_operand:VWIDE 1 register_operand 0) (match_operand:VSD_HSI 2 register_operand w) - (match_operand:VCON 3 register_operand w) + (match_operand:VCON 3 register_operand vwx) (match_operand:SI 4 immediate_operand i)] TARGET_SIMD { @@ -2019,7 +2019,7 @@ [(match_operand:VWIDE 0 register_operand =w) 
(match_operand:VWIDE 1 register_operand 0) (match_operand:VSD_HSI 2 register_operand w) - (match_operand:VCON 3 register_operand w) + (match_operand:VCON 3 register_operand vwx) (match_operand:SI 4 immediate_operand i)] TARGET_SIMD { @@ -2114,7 +2114,7 @@ (sign_extend:VWIDE (vec_duplicate:VHALF (vec_select:VEL - (match_operand:VCON 3 register_operand w) + (match_operand:VCON 3 register_operand vwx) (parallel [(match_operand:SI 4 immediate_operand i)]) (const_int 1] @@ -2128,7 +2128,7 @@ [(match_operand:VWIDE 0 register_operand =w) (match_operand:VWIDE 1 register_operand w) (match_operand:VQ_HSI 2 register_operand w) - (match_operand:VCON 3 register_operand w) + (match_operand:VCON 3
[PATCH] AIX pthread.h fixincludes
AIX 5.2 pthread.h uses the wrong number of braces for more of the PTHREAD initializers. This patch extends the earlier patch to fix the other broken macros. * inclhack.def (aix_mutex_initializer_1, aix_cond_initializer_1, aix_rwlock_initializer): New. * fixincl.x: Regenerate. * tests/base/pthread.h [AIX_MUTEX_INITIALIZER_1_CHECK, AIX_COND_INITIALIZER_1_CHECK, AIX_RWLOCK_INITIALIZER_1_CHECK]: New. Okay? Thanks, David Index: inclhack.def === --- inclhack.def(revision 188738) +++ inclhack.def(working copy) @@ -397,7 +397,9 @@ }; /* - * pthread.h on AIX defines PTHREAD_ONCE_INIT without enough braces. + * pthread.h on AIX defines PTHREAD_ONCE_INIT, PTHREAD_MUTEX_INITIALIZER, + * PTHREAD_COND_INITIALIZER and PTHREAD_RWLOCK_INITIALIZER without enough + * braces. */ fix = { hackname = aix_once_init_1; @@ -425,6 +427,45 @@ }\n; }; +fix = { +hackname = aix_mutex_initializer_1; +mach = *-*-aix*; +files = pthread.h; +select= #define[ \t]PTHREAD_MUTEX_INITIALIZER \n + \\{ \n; +c_fix = format; +c_fix_arg = #define PTHREAD_MUTEX_INITIALIZER \\\n + {{ \\\n; +test_text = #define PTHREAD_MUTEX_INITIALIZER \n + { \n; +}; + +fix = { +hackname = aix_cond_initializer_1; +mach = *-*-aix*; +files = pthread.h; +select= #define[ \t]PTHREAD_COND_INITIALIZER \n + \\{ \n; +c_fix = format; +c_fix_arg = #define PTHREAD_COND_INITIALIZER \\\n + {{ \\\n; +test_text = #define PTHREAD_COND_INITIALIZER \n + { \n; +}; + +fix = { +hackname = aix_rwlock_initializer_1; +mach = *-*-aix*; +files = pthread.h; +select= #define[ \t]PTHREAD_RWLOCK_INITIALIZER \n + \\{ \n; +c_fix = format; +c_fix_arg = #define PTHREAD_RWLOCK_INITIALIZER \\\n + {{ \\\n; +test_text = #define PTHREAD_RWLOCK_INITIALIZER \n + { \n; +}; + /* * pthread.h on AIX 4.3.3 tries to define a macro without whitspace * which violates a requirement of ISO C.
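As background on what "the wrong number of braces" means, here is a small C sketch. The type `fake_lock_t` and the macro `FAKE_LOCK_INITIALIZER` are hypothetical and are not the AIX definitions; the sketch only assumes that the pthread objects are structs wrapping another aggregate. In that situation a fully braced initializer needs one brace level per aggregate level. The under-braced form `{ 1, 2, 3 }` is still accepted by C through brace elision, but GCC's -Wmissing-braces diagnoses it, which is presumably what the fixincludes hack is meant to avoid.

```c
#include <assert.h>

/* Hypothetical stand-in for a pthread object: a struct wrapping an
   array.  This is NOT the real AIX definition.  */
typedef struct { long words[3]; } fake_lock_t;

/* One brace level per aggregate level: the outer pair is for the
   struct, the inner pair for the array member.  */
#define FAKE_LOCK_INITIALIZER {{ 1, 2, 3 }}

/* Return word I of a freshly initialized object.  */
long
fake_lock_word (int i)
{
  fake_lock_t lock = FAKE_LOCK_INITIALIZER;
  assert (i >= 0 && i < 3);
  return lock.words[i];
}
```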
Re: [PATCH] Add vector cost model density heuristic
On Tue, 2012-06-19 at 14:48 +0200, Richard Guenther wrote: On Tue, 19 Jun 2012, William J. Schmidt wrote: I remember having this discussion, and I was looking for it to check on the details, but I can't seem to find it either in my inbox or in the archives. Can you please point me to that again? Sorry for the bother. It was in the Correct cost model for strided loads thread. Ah, right, thanks. I think it will be best to make that a separate patch in the series. Like so: (1) Add calls to the new interface without disturbing existing logic; modify the profitability algorithms to query the new model for inside costs. Default algorithm for the model is to just sum costs as is done today. (1a) Split up the cost hooks (one for loads/stores with misalign parm, one for vector_stmt with tree_code, etc.). (x) Add heuristics to target models as desired. (2) Handle the SLP ordering problem. (3) Handle outside costs in the target model. (4) Remove the now unnecessary cost fields and the calls that set them. I'll start work on this series of patches as I have time between other projects. Thanks, Bill Richard.
Re: [PATCH] Add vector cost model density heuristic
On Tue, 19 Jun 2012, William J. Schmidt wrote:
> On Tue, 2012-06-19 at 14:48 +0200, Richard Guenther wrote:
> > On Tue, 19 Jun 2012, William J. Schmidt wrote:
> > > I remember having this discussion, and I was looking for it to check on
> > > the details, but I can't seem to find it either in my inbox or in the
> > > archives. Can you please point me to that again? Sorry for the bother.
> >
> > It was in the "Correct cost model for strided loads" thread.
>
> Ah, right, thanks. I think it will be best to make that a separate patch in the series. Like so:
>
> (1) Add calls to the new interface without disturbing existing logic; modify the profitability algorithms to query the new model for inside costs. Default algorithm for the model is to just sum costs as is done today.
> (1a) Split up the cost hooks (one for loads/stores with misalign parm, one for vector_stmt with tree_code, etc.).
> (x) Add heuristics to target models as desired.
> (2) Handle the SLP ordering problem.
> (3) Handle outside costs in the target model.
> (4) Remove the now unnecessary cost fields and the calls that set them.
>
> I'll start work on this series of patches as I have time between other projects.

Thanks!

Richard.
Re: [PATCH] AIX pthread.h fixincludes
Hi David, On Tue, Jun 19, 2012 at 7:16 AM, David Edelsohn dje@gmail.com wrote: Okay? Okay. Cheers - Bruce
[testsuite] Clear hwcap_2 with Sun ld
In recent Solaris 11 Update 1 builds, the Sun assembler tags AVX2 object files with a hardware capability that isn't cleared by the current gcc/testsuite/gcc.target/i386/clearcap.map file. There are some new capabilities in sys/auxv_386.h in AT_SUN_CAP_HW2, but unfortunately the old linker map syntax has no support for setting/clearing hwcap_2, and won't ever get it. To deal with this situation, I've introduced a new mapfile using the v2 syntax which does support clearing hwcap_2, but now I need to determine if the linker supports that syntax before using it. Solaris 11 ld has the necessary support, and it was backported to Solaris 10 Update 10. Older Solaris 10 updates and Solaris 8/9 lack it, though. The following patch does just that. Tested with the appropriate runtest invocation on i386-pc-solaris2.11 (ld v2 support), i386-pc-solaris2.9 (ld v1 support only), and x86_64-unknown-linux-gnu (GNU ld which doesn't support either syntax). Unless someone finds fault with the patch, I'll commit it in a day. Rainer 2012-06-19 Rainer Orth r...@cebitec.uni-bielefeld.de * gcc.target/i386/clearcapv2.map: New file. * gcc.target/i386/i386.exp: Try it first before clearcap.map. 
# HG changeset patch
# Parent 02789d700fe014df8358c45b8dc09a6b104fbb6b
Clear hwcap_2 with Sun ld

diff --git a/gcc/testsuite/gcc.target/i386/clearcapv2.map b/gcc/testsuite/gcc.target/i386/clearcapv2.map
new file mode 100644
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/clearcapv2.map
@@ -0,0 +1,7 @@
+# clear all hardware capabilities emitted by Sun as: the tests here
+# guard against execution at runtime
+# uses mapfile v2 syntax which is the only way to clear AT_SUN_CAP_HW2 flags
+$mapfile_version 2
+CAPABILITY {
+	HW = ;
+};
diff --git a/gcc/testsuite/gcc.target/i386/i386.exp b/gcc/testsuite/gcc.target/i386/i386.exp
--- a/gcc/testsuite/gcc.target/i386/i386.exp
+++ b/gcc/testsuite/gcc.target/i386/i386.exp
@@ -256,12 +256,23 @@ proc check_effective_target_rtm { } {
 
 # If the linker used understands -M mapfile, pass it to clear hardware
 # capabilities set by the Sun assembler.
-set clearcap_ldflags "-Wl,-M,$srcdir/$subdir/clearcap.map"
+# Try mapfile syntax v2 first which is the only way to clear hwcap_2 flags.
+set clearcap_ldflags "-Wl,-M,$srcdir/$subdir/clearcapv2.map"
 
-if [check_no_compiler_messages mapfile executable {
+if ![check_no_compiler_messages mapfilev2 executable {
+	int main (void) { return 0; }
+} $clearcap_ldflags ] {
+    # If this doesn't work, fall back to the less capable v1 syntax.
+    set clearcap_ldflags "-Wl,-M,$srcdir/$subdir/clearcap.map"
+
+    if ![check_no_compiler_messages mapfile executable {
 	int main (void) { return 0; }
-  } $clearcap_ldflags ] {
+    } $clearcap_ldflags ] {
+	unset clearcap_ldflags
+    }
+}
 
+if [info exists clearcap_ldflags] {
     if { [info procs gcc_target_compile] != [list] \
	  && [info procs saved_gcc_target_compile] == [list] } {
	rename gcc_target_compile saved_gcc_target_compile

--
Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH][7/n] VRP and anti-ranges
And here is the union_ranges part. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2012-06-19 Richard Guenther rguent...@suse.de * tree-vrp.c (union_ranges): New function. (vrp_meet_1): Use union_ranges. (vrp_meet): Dump what we union and call vrp_meet_1. Index: gcc/tree-vrp.c === *** gcc/tree-vrp.c.orig 2012-06-19 15:18:34.0 +0200 --- gcc/tree-vrp.c 2012-06-19 15:23:20.803752745 +0200 *** vrp_visit_stmt (gimple stmt, edge *taken *** 6770,6775 --- 6770,7032 return SSA_PROP_VARYING; } + /* Union the two value-ranges { *VR0TYPE, *VR0MIN, *VR0MAX } and +{ VR1TYPE, VR0MIN, VR0MAX } and store the result +in { *VR0TYPE, *VR0MIN, *VR0MAX }. This may not be the smallest +possible such range. The resulting range is not canonicalized. */ + + static void + union_ranges (enum value_range_type *vr0type, + tree *vr0min, tree *vr0max, + enum value_range_type vr1type, + tree vr1min, tree vr1max) + { + bool mineq = operand_equal_p (*vr0min, vr1min, 0); + bool maxeq = operand_equal_p (*vr0max, vr1max, 0); + + /* [] is vr0, () is vr1 in the following classification comments. */ + if (mineq maxeq) + { + /* [( )] */ + if (*vr0type == vr1type) + /* Nothing to do for equal ranges. */ + ; + else if ((*vr0type == VR_RANGE +vr1type == VR_ANTI_RANGE) + || (*vr0type == VR_ANTI_RANGE + vr1type == VR_RANGE)) + { + /* For anti-range with range union the result is varying. */ + goto give_up; + } + else + gcc_unreachable (); + } + else if (operand_less_p (*vr0max, vr1min) == 1 + || operand_less_p (vr1max, *vr0min) == 1) + { + /* [ ] ( ) or ( ) [ ] +If the ranges have an empty intersection, result of the union +operation is the anti-range or if both are anti-ranges +it covers all. 
*/ + if (*vr0type == VR_ANTI_RANGE + vr1type == VR_ANTI_RANGE) + goto give_up; + else if (*vr0type == VR_ANTI_RANGE + vr1type == VR_RANGE) + ; + else if (*vr0type == VR_RANGE + vr1type == VR_ANTI_RANGE) + { + *vr0type = vr1type; + *vr0min = vr1min; + *vr0max = vr1max; + } + else if (*vr0type == VR_RANGE + vr1type == VR_RANGE) + { + /* The result is the convex hull of both ranges. */ + if (operand_less_p (*vr0max, vr1min) == 1) + { + /* If the result can be an anti-range, create one. */ + if (TREE_CODE (*vr0max) == INTEGER_CST + TREE_CODE (vr1min) == INTEGER_CST + vrp_val_is_min (*vr0min) + vrp_val_is_max (vr1max)) + { + tree min = int_const_binop (PLUS_EXPR, + *vr0max, integer_one_node); + tree max = int_const_binop (MINUS_EXPR, + vr1min, integer_one_node); + if (!operand_less_p (max, min)) + { + *vr0type = VR_ANTI_RANGE; + *vr0min = min; + *vr0max = max; + } + else + *vr0max = vr1max; + } + else + *vr0max = vr1max; + } + else + { + /* If the result can be an anti-range, create one. */ + if (TREE_CODE (vr1max) == INTEGER_CST + TREE_CODE (*vr0min) == INTEGER_CST + vrp_val_is_min (vr1min) + vrp_val_is_max (*vr0max)) + { + tree min = int_const_binop (PLUS_EXPR, + vr1max, integer_one_node); + tree max = int_const_binop (MINUS_EXPR, + *vr0min, integer_one_node); + if (!operand_less_p (max, min)) + { + *vr0type = VR_ANTI_RANGE; + *vr0min = min; + *vr0max = max; + } + else + *vr0min = vr1min; + } + else + *vr0min = vr1min; + } + } + else + gcc_unreachable (); + } + else if ((maxeq || operand_less_p (vr1max, *vr0max) == 1) + (mineq || operand_less_p (*vr0min, vr1min) == 1)) + { + /* [ ( ) ] or [( ) ] or [ ( )] */ + if (*vr0type == VR_RANGE + vr1type == VR_RANGE) + ; + else if (*vr0type == VR_ANTI_RANGE + vr1type == VR_ANTI_RANGE) + { + *vr0type = vr1type; + *vr0min = vr1min; + *vr0max = vr1max;
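
The disjoint range/range case in the patch above (the "convex hull" branch with the "create an anti-range if we can" special case) can be sketched on plain ints. This is a simplified illustration under stated assumptions, not the GCC implementation: the demo_* names are made up, trees are replaced by int, INT_MIN/INT_MAX stand in for vrp_val_is_min/vrp_val_is_max, and only two plain ranges with an empty intersection are handled.

```c
#include <limits.h>

enum demo_kind { DEMO_RANGE, DEMO_ANTI_RANGE };

struct demo_vr
{
  enum demo_kind kind;
  int min, max;
};

/* Union two plain ranges A and B that are known to be disjoint.
   If together they cover everything except a non-empty gap, the
   result is the anti-range of that gap; otherwise it is the convex
   hull of both ranges.  */
static struct demo_vr
demo_union_disjoint (struct demo_vr a, struct demo_vr b)
{
  struct demo_vr r;

  /* Order the operands so that A lies entirely below B.  */
  if (b.max < a.min)
    {
      struct demo_vr tmp = a;
      a = b;
      b = tmp;
    }

  /* Default: convex hull.  */
  r.kind = DEMO_RANGE;
  r.min = a.min;
  r.max = b.max;

  /* a.max < b.min holds here, so neither a.max + 1 nor b.min - 1
     can overflow.  */
  if (a.min == INT_MIN && b.max == INT_MAX && a.max + 1 <= b.min - 1)
    {
      r.kind = DEMO_ANTI_RANGE;
      r.min = a.max + 1;
      r.max = b.min - 1;
    }
  return r;
}
```

With [INT_MIN, 3] and [10, INT_MAX] this yields ~[4, 9]; with [1, 3] and [10, 20] it falls back to the hull [1, 20], which is larger than the exact union but still a valid over-approximation, matching the "this may not be the smallest possible such range" caveat in the function comment.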
Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc
On Mon, 18 Jun 2012, Iain Buclaw wrote: These series of patches are for the D compiler frontend for inclusion into GCC. http://www.gdcproject.org/files/gdc_frontend.patch.gz http://www.gdcproject.org/files/gdc_libphobos.patch.gz http://www.gdcproject.org/files/gdc_testsuite.patch.gz http://www.gdcproject.org/files/gdc_gcc.patch.gz Please provide GNU ChangeLog entries for each patch, for each relevant ChangeLog file. It would be best to post those in plain text to the list, even if the patches themselves are too big. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH][AARCH64]: Invent new regclass - FP low regs.
On 19/06/12 15:03, Tejas Belagod wrote: Hi, The attached patch invents a new register class V0 - V15 that is needed for some lane variants of AdvSIMD instructions that can only take V0 - V15 as their indexed register when working on half-word type. Regression tests are happy. OK? OK /Marcus
Re: [PATCH 4/4] - Merging gdc (GNU D Compiler) into gcc
On Mon, 18 Jun 2012, Iain Buclaw wrote:

> --- gcc-4.8-20120617/gcc/doc/install.texi 2012-05-29 15:14:06.0 +0100
> +++ gcc-4.8/gcc/doc/install.texi 2012-06-18 20:39:45.058591380 +0100
> @@ -1360,12 +1360,12 @@ their runtime libraries should be built.
>  grep language= */config-lang.in
>  @end smallexample
>  Currently, you can use any of the following:
> -@code{all}, @code{ada}, @code{c}, @code{c++}, @code{fortran},
> +@code{all}, @code{ada}, @code{c}, @code{c++}, @code{d}, @code{fortran},
>  @code{go}, @code{java}, @code{objc}, @code{obj-c++}.
>  Building the Ada compiler has special requirements, see below.
>  If you do not pass this flag, or specify the option @code{all}, then all
>  default languages available in the @file{gcc} sub-tree will be configured.
> -Ada, Go and Objective-C++ are not default languages; the rest are.
> +Ada, D, Go and Objective-C++ are not default languages; the rest are.

Maybe this should be true, but I don't see a build_by_default=no setting in config-lang.in (in gdc_frontend.patch.gz) to make it so.

> --- gcc-4.8-20120617/gcc/doc/standards.texi 2011-12-21 17:53:58.0 +
> +++ gcc-4.8/gcc/doc/standards.texi 2012-04-22 17:11:38.553880036 +0100
> @@ -289,6 +289,16 @@ a specific version. In general GCC trac
>  closely, and any given release will support the language as of the
>  date that the release was frozen.
>
> +@section D language
> +
> +The D language continues to evolve as of this writing; see the
> +@uref{http://golang.org/@/doc/@/go_spec.html, current language
> +specifications}. At present there are no specific versions of Go, and
> +there is no way to describe the language supported by GCC in terms of
> +a specific version. In general GCC tracks the evolving specification
> +closely, and any given release will support the language as of the
> +date that the release was frozen.

Referring to Go in a section about D doesn't make sense.

I don't see entries in contrib.texi in this patch. I'd also expect contrib/gcc_update to be updated to handle timestamp ordering for generated files in libphobos.
Are you volunteering to be appointed maintainer for this front end by the SC? -- Joseph S. Myers jos...@codesourcery.com
[PATCH] Fix vrp68 testcase
This fixes the testcase to match reality - and update the comments appropriately in it. Tested on x86_64-unknown-linux-gnu, applied. Richard. 2012-06-19 Richard Guenther rguent...@suse.de * gcc.dg/tree-ssa/vrp68.c: Adjust testcase. Index: gcc/testsuite/gcc.dg/tree-ssa/vrp68.c === --- gcc/testsuite/gcc.dg/tree-ssa/vrp68.c (revision 188780) +++ gcc/testsuite/gcc.dg/tree-ssa/vrp68.c (working copy) @@ -8,17 +8,11 @@ void test1 (int i, int j, int b) RANGE(i, 2, 6); ANTI_RANGE(j, 1, 7); MERGE(b, i, j); - CHECK_ANTI_RANGE(i, 7, 7); CHECK_ANTI_RANGE(i, 1, 1); - /* If we swap the anti-range tests the ~[6, 6] test is never eliminated. */ } int main() { } -/* While subsequent VRP/DOM passes manage to even recognize the ~[6, 6] - test as redundant a single VRP run will arbitrarily choose ~[0, 0] when - merging [1, 5] with ~[0, 6] so the first VRP pass can only eliminate - the ~[0, 0] check as redundant. */ +/* VRP will arbitrarily choose ~[1, 1] when merging [2, 6] with ~[1, 7]. */ -/* { dg-final { scan-tree-dump-times link_error 0 vrp1 { xfail *-*-* } } } */ -/* { dg-final { scan-tree-dump-times link_error 1 vrp1 } } */ +/* { dg-final { scan-tree-dump-times link_error 0 vrp1 } } */ /* { dg-final { cleanup-tree-dump vrp1 } } */
Re: [PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.
On 2012-06-18 23:38, Uros Bizjak wrote: On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson r...@redhat.com wrote: On 2012-06-18 13:19, Uros Bizjak wrote: /* ??? The builtin doesn't understand that the PCMPESTRI read from memory need not be aligned. */ - __asm (%vpcmpestri $0, (%1), %2 - : =c(index) : r(s), x(search), a(4), d(16)); + sv = __builtin_ia32_loaddqu ((const char *) s); + index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0); + Surely the comment can be removed too then? I'm not sure there. The builtin, as defined, expects V16QI operand with xm constraint. Fair enough. I'm ok with the patch as-is. r~
Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc
Hello, I had a very quick look through the gdc_frontend patch. Below are a couple of comments on it: http://www.gdcproject.org/files/gdc_frontend.patch.gz [PATCH 1/4]: The D compiler frontend - gcc/d How did you test this? You include rtl.h/expr.h in d-builtins.c and d-gcc-includes.h, which should both be in ALL_HOST_FRONTEND_OBJS and fail to build because IN_GCC_FRONTEND is defined and GCC_RTL_H is poisoned. See system.h: /* Front ends should never have to include middle-end headers. Enforce this by poisoning the header double-include protection defines. */ #ifdef IN_GCC_FRONTEND #pragma GCC poison GCC_RTL_H GCC_EXCEPT_H GCC_EXPR_H #endif Do you somehow bypass the normal build system? Or maybe you don't include system.h? Either way, front ends should never have to include RTL headers. BTW you also include output.h in those two files, and I am about two patches away from adding output.h to the list of headers that no front end should ever include (a front end should never have to write out assembly). Can you please check what you need output.h for, and fix this? What are you calling targetm.asm_out.output_mi_thunk and targetm.asm_out.generate_internal_label for? Thunks and aliases should go through cgraphunit. (NB: This also means that this front end cannot work with LTO. IMHO we shouldn't let in new front ends that don't work with LTO.) Many functions have no leading comment, and other GNU coding standard requirements are not followed either. Those should IMHO be fixed also, before this front end can be accepted. There is this comment: +/* GCC does not support jumps from asm statements. This isn't really true anymore, as your patch also notes: + -- + %% Fix for GCC-4.5+ + GCC now accepts a 5th operand, ASM_LABELS. (...) + For prior versions of gcc, this requires a backpatch. It seems to me that if this front end is contributed, handling of prior version of gcc isn't necessary anymore - that code should just be removed. 
+ + case Op_de: +#ifndef TARGET_80387 +#define XFmode TFmode +#endif + mode = XFmode; // not TFmode What is this hack for? This is not the way to find the right mode for this operation. +#ifdef TARGET_80387 +#include d-asm-i386.h +#else +#define D_NO_INLINE_ASM_AT_ALL +#endif + +/* Apple GCC extends ASM_EXPR to five operands; cannot use build4. */ Idem here. And Apple GCC is irrelevant too, if this front end lands on FSF trunk. What is d/d-asm-i386.h for? It looks like i386 is a special case throughout the front end. In d-gcc-tree.h: +// normally include config.h (hconfig.h, tconfig.h?), but that +// includes things that cause problems, so... + +union tree_node; +typedef union tree_node *tree; See coretypes.h. Ciao! Steven
Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc
On Mon, 18 Jun 2012, Iain Buclaw wrote: [PATCH 1/4]: The D compiler frontend - gcc/d Only selectively reviewed, but here are some comments: diff -Naur gcc-4.8-20120617/gcc/d/asmstmt.cc gcc-4.8/gcc/d/asmstmt.cc --- gcc-4.8-20120617/gcc/d/asmstmt.cc 1970-01-01 01:00:00.0 +0100 +++ gcc-4.8/gcc/d/asmstmt.cc2012-06-05 13:42:09.044876794 +0100 @@ -0,0 +1,2731 @@ +// asmstmt.cc -- D frontend for GCC. +// Originally contributed by David Friedman +// Maintained by Iain Buclaw + +// GCC is free software; you can redistribute it and/or modify it under Every file more than ten lines long needs a copyright notice as well as the license notice. See http://www.gnu.org/prep/maintain/html_node/Copyright-Notices.html for instructions, including the case of multiple copyright holders - though if there are any significant (more than fifteen lines of copyrightable text or so) contributors not assigning copyright to the FSF then special approval from the FSF will be needed to include the front end. I would say that the files in dfrontend/ need copyright and license notices as well, though not necessarily in exactly GNU form. Thus, you will need to get Digital Mars to approve appropriate notices for those files (aav.c is the first I see that's lacking such a notice but is long enough to need one; likewise async.c, gnuc.c, speller.c; rmem.c just says All Rights Reserved and needs a proper license notice like other files; likewise rmem.h). +#ifdef TARGET_80387 +#include d-asm-i386.h +#else +#define D_NO_INLINE_ASM_AT_ALL +#endif Ugh. We want to move away from target macros, and this isn't even a proper target macro. It would be better to define target hooks for the D inline asm support - possibly with a D-specific hook structure, like the C hooks structure. (Even if you avoid needing copyright assignments for the front end itself, such hook implementations will probably need to be assigned.) +/* Apple GCC extends ASM_EXPR to five operands; cannot use build4. 
*/ I don't see why that should be in the least relevant to a contribution to FSF GCC. If you can do things in a more natural way in FSF GCC, then do so. Each function in the GCC-specific parts of the code should have a comment on it, explaining the semantics of the function, its operands and its return value if any. For new code in GCC, it's better to use snprintf than sprintf. +extern void decode_options (struct gcc_options *, struct gcc_options *, Please use appropriate headers rather than local declarations of GCC functions. +// d-bi-attr.h -- D frontend for GCC. This file looks like it's largely copied from elsewhere in GCC. In such a case, please work out a better way to refactor the code so that it can be shared rather than duplicated. (Again, such common code will no doubt need full copyright assignments.) I don't know whether your assignment Assigns Past and Future Changes to the GNU D Compiler (GDC) covers changes elsewhere in GCC. But I expect a general assignment for GCC to be needed for any refactoring involved in adapting common code for use in D. (And such refactoring would be a new contribution so there shouldn't be any issues with unknown previous contributors without assignments - those would only arise if significant amounts of previously written D front-end code are being moved into common code.) +#if D_VA_LIST_TYPE_VOIDPTR Please avoid #if conditionals on anything that could be a target property. It's generally better to use if conditionals instead of #if, so that all cases are checked for syntax in all compiles. I see #if conditions on defines such as V2 and V1 as well. Unless something is an *existing* target macro or configure macro in GCC, use if conditions and ensure that the macro is defined to true or false values (rather than defined or not defined). But if a macro is always defined, or never defined, then just avoiding the conditionals may be better. 
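
Two of the points above (prefer `if` over `#if` on a macro that is always defined to a true or false value, and prefer snprintf over sprintf) can be illustrated with a small sketch. The macro and function names are hypothetical and are not taken from the gdc sources.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical configuration macro.  It is always defined, to 0 or 1,
   rather than being defined or not defined.  */
#ifndef DEMO_VA_LIST_IS_VOIDPTR
#define DEMO_VA_LIST_IS_VOIDPTR 0
#endif

/* Both arms are parsed and type-checked in every build, unlike the
   arms of an #if/#else pair; the optimizer removes the dead one.  */
static const char *
demo_va_list_kind (void)
{
  if (DEMO_VA_LIST_IS_VOIDPTR)
    return "void pointer";
  else
    return "target-specific";
}

/* Format PREFIX followed by N into BUF, which has room for SIZE bytes.
   Return 1 if the whole label fit, 0 if it was truncated or an error
   occurred.  snprintf never writes more than SIZE bytes, whereas
   sprintf would overrun a buffer that is too small.  */
static int
demo_format_label (char *buf, size_t size, const char *prefix, unsigned n)
{
  int len = snprintf (buf, size, "%s%u", prefix, n);

  return len >= 0 && (size_t) len < size;
}
```

A caller can treat a 0 return from demo_format_label as "buffer too small" and retry with a larger buffer, which has no equivalent with plain sprintf.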
The gcc/d/dfrontend/readme.txt says: +These sources are free, they are redistributable and modifiable +under the terms of the GNU General Public License (attached as gpl.txt), +or the Artistic License (attached as artistic.txt). But that license is GPLv2. We need an explicit notice (approved by the copyright holder) saying that *any later version* may be used. If Digital Mars wishes to license the separately maintained dfrontend/ code under GPLv2+ rather than GPLv3+, that's fine, just like the gofrontend/ code is under a permissive license - but it needs to be explicit that any later version may be used. I haven't studied the details of the dfrontend/ code. But if you are to follow the Go model - separately maintained code for the front end proper that may be used verbatim in multiple compilers, with the code outside dfrontend/ doing everything related to interfacing with GCC, and only what's related to interfacing with GCC - then the +/* NOTE: This file has been patched
[PATCH, i386]: Introduce FRNDINT_ROUNDING int iterator
Hello! 2012-06-19 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (FRNDINT_ROUNDING): New int iterator. (rounding): New int attribute. (ROUNDING): Ditto. (frndintxf2_rounding): Macroize insn from frndintxf2_{floor,ceil,trunc} using FRNDINT_ROUNDING int iterator. (frndintxf2_rounding_i387): Macroize insn from frndintxf2_{floor,ceil,trunc}_i387 using FRNDINT_ROUNDING int iterator. Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}. Will be committed to mainline SVN. BTW: A follow-up patch will also macroize fistmode2_{floor,ceil} and friends. Uros. Index: i386.md === --- i386.md (revision 188781) +++ i386.md (working copy) @@ -15099,11 +15099,26 @@ DONE; }) +(define_int_iterator FRNDINT_ROUNDING + [UNSPEC_FRNDINT_FLOOR +UNSPEC_FRNDINT_CEIL +UNSPEC_FRNDINT_TRUNC]) + +(define_int_attr rounding + [(UNSPEC_FRNDINT_FLOOR floor) +(UNSPEC_FRNDINT_CEIL ceil) +(UNSPEC_FRNDINT_TRUNC trunc)]) + +(define_int_attr ROUNDING + [(UNSPEC_FRNDINT_FLOOR FLOOR) +(UNSPEC_FRNDINT_CEIL CEIL) +(UNSPEC_FRNDINT_TRUNC TRUNC)]) + ;; Rounding mode control word calculation could clobber FLAGS_REG. 
-(define_insn_and_split frndintxf2_floor +(define_insn_and_split frndintxf2_rounding [(set (match_operand:XF 0 register_operand) (unspec:XF [(match_operand:XF 1 register_operand)] -UNSPEC_FRNDINT_FLOOR)) + FRNDINT_ROUNDING)) (clobber (reg:CC FLAGS_REG))] TARGET_USE_FANCY_MATH_387 flag_unsafe_math_optimizations @@ -15112,30 +15127,30 @@ 1 [(const_int 0)] { - ix86_optimize_mode_switching[I387_FLOOR] = 1; + ix86_optimize_mode_switching[I387_ROUNDING] = 1; operands[2] = assign_386_stack_local (HImode, SLOT_CW_STORED); - operands[3] = assign_386_stack_local (HImode, SLOT_CW_FLOOR); + operands[3] = assign_386_stack_local (HImode, SLOT_CW_ROUNDING); - emit_insn (gen_frndintxf2_floor_i387 (operands[0], operands[1], - operands[2], operands[3])); + emit_insn (gen_frndintxf2_rounding_i387 (operands[0], operands[1], +operands[2], operands[3])); DONE; } [(set_attr type frndint) - (set_attr i387_cw floor) + (set_attr i387_cw rounding) (set_attr mode XF)]) -(define_insn frndintxf2_floor_i387 +(define_insn frndintxf2_rounding_i387 [(set (match_operand:XF 0 register_operand =f) (unspec:XF [(match_operand:XF 1 register_operand 0)] -UNSPEC_FRNDINT_FLOOR)) + FRNDINT_ROUNDING)) (use (match_operand:HI 2 memory_operand m)) (use (match_operand:HI 3 memory_operand m))] TARGET_USE_FANCY_MATH_387 flag_unsafe_math_optimizations fldcw\t%3\n\tfrndint\n\tfldcw\t%2 [(set_attr type frndint) - (set_attr i387_cw floor) + (set_attr i387_cw rounding) (set_attr mode XF)]) (define_expand floorxf2 @@ -15357,45 +15372,6 @@ DONE; }) -;; Rounding mode control word calculation could clobber FLAGS_REG. 
-(define_insn_and_split frndintxf2_ceil - [(set (match_operand:XF 0 register_operand) - (unspec:XF [(match_operand:XF 1 register_operand)] -UNSPEC_FRNDINT_CEIL)) - (clobber (reg:CC FLAGS_REG))] - TARGET_USE_FANCY_MATH_387 -flag_unsafe_math_optimizations -can_create_pseudo_p () - # - 1 - [(const_int 0)] -{ - ix86_optimize_mode_switching[I387_CEIL] = 1; - - operands[2] = assign_386_stack_local (HImode, SLOT_CW_STORED); - operands[3] = assign_386_stack_local (HImode, SLOT_CW_CEIL); - - emit_insn (gen_frndintxf2_ceil_i387 (operands[0], operands[1], - operands[2], operands[3])); - DONE; -} - [(set_attr type frndint) - (set_attr i387_cw ceil) - (set_attr mode XF)]) - -(define_insn frndintxf2_ceil_i387 - [(set (match_operand:XF 0 register_operand =f) - (unspec:XF [(match_operand:XF 1 register_operand 0)] -UNSPEC_FRNDINT_CEIL)) - (use (match_operand:HI 2 memory_operand m)) - (use (match_operand:HI 3 memory_operand m))] - TARGET_USE_FANCY_MATH_387 -flag_unsafe_math_optimizations - fldcw\t%3\n\tfrndint\n\tfldcw\t%2 - [(set_attr type frndint) - (set_attr i387_cw ceil) - (set_attr mode XF)]) - (define_expand ceilxf2 [(use (match_operand:XF 0 register_operand)) (use (match_operand:XF 1 register_operand))] @@ -15613,45 +15589,6 @@ DONE; }) -;; Rounding mode control word calculation could clobber FLAGS_REG. -(define_insn_and_split frndintxf2_trunc - [(set (match_operand:XF 0 register_operand) - (unspec:XF [(match_operand:XF 1 register_operand)] -UNSPEC_FRNDINT_TRUNC)) - (clobber (reg:CC FLAGS_REG))] - TARGET_USE_FANCY_MATH_387 -flag_unsafe_math_optimizations -can_create_pseudo_p () - # - 1 - [(const_int 0)] -{ - ix86_optimize_mode_switching[I387_TRUNC] = 1; - - operands[2] =
Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc
On Mon, 18 Jun 2012, Iain Buclaw wrote: http://www.gdcproject.org/files/gdc_libphobos.patch.gz Same comments as before about FSF postal addresses. Although runtime libraries need not be assigned to the FSF (as per the GCC Mission Statement), all significant files should still have copyright and license notices (approved by all significant contributors) so that people know the free software terms under which they may be used. E.g., libphobos/libdruntime/config/x3.c appears to be missing such notices. Without a license (or a dedication to the public domain), a file is presumptively copyright and has no license for anyone to use it at all. +if true; then if true seems odd; if you have a good reason for it, you need to comment it. +# generated automatically by aclocal 1.9.6 -*- Autoconf -*- Please use the standard documented autoconf/automake versions for GCC (autoconf 2.64, automake 1.11.1). diff -Naur gcc-4.8-20120617/libphobos/autom4te.cache/output.0 gcc-4.8/libphobos/autom4te.cache/output.0 We don't check in autom4te.cache directories. +# libphobos is usually a symlink to gcc/d/phobos, so libphobos/.. No it's not. No runtime libraries should go under gcc/ any more at all. +dnl Copied from libstdc++-v3/acinclude.m4. Indeed, multilib will not work Refactor into the config/ directory, don't copy. \ No newline at end of file Add any missing newlines to text files in all patches. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc
On Mon, 18 Jun 2012, Iain Buclaw wrote: http://www.gdcproject.org/files/gdc_testsuite.patch.gz I have no comments on this patch for now. -- Joseph S. Myers jos...@codesourcery.com
Re: [Patch] Adjustments for Windows x64 SEH
On 2012-06-18 05:22, Tristan Gingold wrote:
> +  /* Win64 SEH, very large frames need a frame-pointer as maximum stack
> +     allocation is 4GB (add a safety guard for saved registers).  */
> +  if (TARGET_64BIT_MS_ABI && get_frame_size () + 4096 > SEH_MAX_FRAME_SIZE)
> +    return true;

Elsewhere you say this is an upper bound for stack use by the prologue. It's clearly a wild guess. The maximum stack use is 10*sse + 8*int registers saved, which is a lot less than 4096. That said, I'm ok with *using* 4096 so long as the comment clearly states that it's a large over-estimate. I do suggest, however, folding this into the SEH_MAX_FRAME_SIZE value, and expanding on the comment there. I see no practical difference between 0x8000 and 0x7fffe000 being the limit.

> +/* Output assembly code to get the establisher frame (Windows x64 only).
> +   This corresponds to what will be computed by Windows from Frame Register
> +   and Frame Register Offset fields of the UNWIND_INFO structure.  Since
> +   these values are computed very late (by ix86_expand_prologue), we cannot
> +   express this using only RTL.  */
> +
> +const char *
> +ix86_output_establisher_frame (rtx target)
> +{
> +  if (!frame_pointer_needed)
> +    {
> +      /* Note that we have advertized an lea operation.  */
> +      output_asm_insn ("lea{q}\t{0(%%rsp), %0|%0, 0[rsp]}", &target);
> +    }
> +  else
> +    {
> +      rtx xops[3];
> +      struct ix86_frame frame;
> +
> +      /* Recompute the frame layout here.  */
> +      ix86_compute_frame_layout (&frame);
> +
> +      /* Closely follow how the frame pointer is set in
> +         ix86_expand_prologue.  */
> +      xops[0] = target;
> +      xops[1] = hard_frame_pointer_rtx;
> +      if (frame.hard_frame_pointer_offset == frame.reg_save_offset)
> +        xops[2] = GEN_INT (0);
> +      else
> +        xops[2] = GEN_INT (-(frame.stack_pointer_offset
> +                             - frame.hard_frame_pointer_offset));
> +      output_asm_insn ("lea{q}\t{%a2(%1), %0|%0, %a2[%1]}", xops);

This is what register elimination is for; the value substitution happens during reload.
Now, one *could* add a new pseudo-hard-register for this (we support as many register eliminations as needed), but before we do that we need to decide if we can adjust the soft frame pointer to be the value required. If so, you can then rely on the existing __builtin_frame_address, which is a very attractive sounding solution. I'm 99% sure moving the sfp will work.

r~
Re: [PATCH] Bad code generation: incorrect folding of TARGET_MEM_REF into a constant
On Tue, Jun 19, 2012 at 10:54 AM, Richard Guenther richard.guent...@gmail.com wrote:
> The issue is that your testcase is invalid.
>
>   int x = ret(*(&fooS + i));
>
> this access is only ever valid for i == 0 as otherwise you are creating a
> pointer that points outside of the object fooS.

Richard, thanks for your reply.

The testcase is invalid for other reasons as well, a big one being that the automatic sorting and merging of sections with a dollar sign in their names is a Windows-originated extension used for the PE target only, so it does not work elsewhere. Sorry about that, I'll refrain from using anything non-standard here.

Accessing outside object bounds is IMO a common C practice allowed by the existence of pointers. This exact technique is used for decentralized lists created at compile time, be it extensible handler/hook structures, pointers to init/fini functions etc. It has notable use e.g. in the Linux kernel [1], [2]. The programmer places defined data in a special linker section in individual compilation units, then traverses it using linker-provided symbols (e.g. ld creates __start_<section-name> and __stop_<section-name> automatically), as test0.c shows:

$ gcc -O1 -m32 -fno-toplevel-reorder test0.c && ./a.out
0: 1
1: 2
2: 3

The sole reason for messing with the section attributes is to keep the values together. Because I can force the order (to the necessary extent) by -fno-toplevel-reorder, the program can be changed to use just bounding variables without any linker magic (test1.c):

$ gcc -O1 -m32 -fno-toplevel-reorder test1.c && ./a.out
0: 1
1: 2
2: 3

The only changes in the code are removing the section attributes and adding an offset of one to skip the starting element (as __start_foo has a size now).

Now, changing the end condition from a test for the end address to a test for the end sentinel -1, and duplicating the printf() line (to hit the right optimization spot), something weird happens (test2.c):

$ gcc -O1 -m32 -fno-toplevel-reorder test2.c && ./a.out
0: 1
0: -1
1: 2
1: -1
2: 3
2: -1

Why is the second line in each iteration different from the first? It should be printing exactly the same expression. Analyzing the dom phase log shows that the memory access is optimized to the constant value of the base variable, hence -1. And without optimization, both of them are correct:

$ gcc -O0 -m32 test2.c && ./a.out
0: 1
0: 1
1: 2
1: 2
2: 3
2: 3

That is the problem I am talking about and which the patch aims to address.

Jiri

[1] http://www.compsoc.man.ac.uk/~moz/kernelnewbies/documents/initcall/index.html
[2] http://lkml.indiana.edu/hypermail/linux/kernel/0706.2/2552.html

test0.c:

#include <stdio.h>

__attribute__((section("foo"))) const int foo1 = 1;
__attribute__((section("foo"))) const int foo2 = 2;
__attribute__((section("foo"))) const int foo3 = 3;

extern const int __start_foo, __stop_foo;

int main(void)
{
	int i;

	i = 0;
	do {
		printf("%d: %d\n", i, *(&__start_foo + i));
		i++;
	} while(&__start_foo + i != &__stop_foo);
	return 0;
}

test1.c:

#include <stdio.h>

const int __start_foo = -1;
const int foo1 = 1;
const int foo2 = 2;
const int foo3 = 3;
const int __stop_foo = -1;

int main(void)
{
	int i;

	i = 0;
	do {
		printf("%d: %d\n", i, *(&__start_foo + 1 + i));
		i++;
	} while(&__start_foo + 1 + i != &__stop_foo);
	return 0;
}

test2.c:

#include <stdio.h>

const int __start_foo = -1;
const int foo1 = 1;
const int foo2 = 2;
const int foo3 = 3;
const int __stop_foo = -1;

int main(void)
{
	int i;

	i = 0;
	do {
		printf("%d: %d\n", i, *(&__start_foo + 1 + i));
		printf("%d: %d\n", i, *(&__start_foo + 1 + i));
		i++;
	} while(*(&__start_foo + 1 + i) != -1);
	return 0;
}
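
For comparison, a sketch of the same sentinel-terminated traversal written against a single array object. This is the form that stays within the rule Richard cites (pointer arithmetic confined to one object), so the optimizer has no licence to fold the loads. It does not reproduce the decentralized, per-compilation-unit registration that the section trick provides; demo_table and the function names are illustrative only.

```c
#include <stddef.h>

/* All entries live in one array object, terminated by the -1
   sentinel, so every table[n] access below is in bounds.  */
static const int demo_table[] = { 1, 2, 3, -1 };

/* Count the entries that precede the sentinel.  */
static size_t
demo_count_until_sentinel (const int *table)
{
  size_t n = 0;

  while (table[n] != -1)
    n++;
  return n;
}

/* Sum the entries that precede the sentinel.  */
static int
demo_sum_until_sentinel (const int *table)
{
  int sum = 0;

  for (; *table != -1; table++)
    sum += *table;
  return sum;
}
```

Whether the separate-variable layout in test1.c and test2.c should also be supported is exactly the point under discussion in this thread; the sketch only shows the variant whose behavior is not in dispute.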
Re: Updated to respond to various email comments from Jason, Diego and Cary (issue6197069)
On Wed, Jun 13, 2012 at 10:47 PM, Jason Merrill ja...@redhat.com wrote: On 06/13/2012 04:26 PM, Sterling Augustine wrote: I lean toward -g myself, since there doesn't seem to be a strong rule one way or the other. Unless there are further comments, I'll stick with -g then. I think that covers all the comments, so I think I will commit this Friday morning unless I hear anything further. Weren't you going to repost the patch first? :) I hate how codereview.appspot.com doesn't connect some messages properly. After this prompting, I re-posted the patch here: http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00949.html As this has addressed all previous comments, and barring any objections, I'll check it in tomorrow morning. Sterling
Re: [PATCH 2/3] Add XLP-specific atomic instructions and tweaks.
Maxim Kuvyrkov ma...@codesourcery.com writes: The only other change that I made that was not in your comments is the addition of b mips_print_operand specifier. The LDADD and SWAP instructions accept their address as a plain register without parentheses, Ouch. so I've added the specifier to skip outputting parentheses. Yeah, good idea. Patch is OK, thanks. Richard
Re: [PATCH, i386]: Introduce FIST_ROUNDING int iterator
Hello! 2012-06-19 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (FIST_ROUNDING): New int iterator. (rounding): Handle UNSPEC_FIST_{FLOOR,CEIL}. (ROUNDING): Ditto. (*fistmode2_rounding_1): Macroize insn from *fistmode2_{floor,ceil}_1 using FIST_ROUNDING int iterator. (fistdi2_rounding): Macroize insn from fistdi2_{floor,ceil} using FIST_ROUNDING int iterator. (fistdi2_rounding_with_temp and splitters): Macroize insn and corresponding splitters from fistdi2_{floor,ceil} and corresponding splitters using FIST_ROUNDING int iterator. (fistmode2_rounding): Macroize insn from fistmode2_{floor,ceil} using FIST_ROUNDING int iterator. (fistmode2_rounding_with_temp and splitters): Macroize insn and corresponding splitters from fistmode2_{floor,ceil} and corresponding splitters using FIST_ROUNDING int iterator. (lroundingxfmode2): Macroize expander from l{floor,ceil}xfmode2 using FIST_ROUNDING int iterator. Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN. Uros. Index: config/i386/i386.md === --- config/i386/i386.md (revision 188783) +++ config/i386/i386.md (working copy) @@ -15104,15 +15104,23 @@ UNSPEC_FRNDINT_CEIL UNSPEC_FRNDINT_TRUNC]) +(define_int_iterator FIST_ROUNDING + [UNSPEC_FIST_FLOOR +UNSPEC_FIST_CEIL]) + (define_int_attr rounding [(UNSPEC_FRNDINT_FLOOR floor) (UNSPEC_FRNDINT_CEIL ceil) -(UNSPEC_FRNDINT_TRUNC trunc)]) +(UNSPEC_FRNDINT_TRUNC trunc) +(UNSPEC_FIST_FLOOR floor) +(UNSPEC_FIST_CEIL ceil)]) (define_int_attr ROUNDING [(UNSPEC_FRNDINT_FLOOR FLOOR) (UNSPEC_FRNDINT_CEIL CEIL) -(UNSPEC_FRNDINT_TRUNC TRUNC)]) +(UNSPEC_FRNDINT_TRUNC TRUNC) +(UNSPEC_FIST_FLOOR FLOOR) +(UNSPEC_FIST_CEIL CEIL)]) ;; Rounding mode control word calculation could clobber FLAGS_REG. 
(define_insn_and_split frndintxf2_rounding @@ -15205,174 +15213,59 @@ DONE; }) -(define_insn_and_split *fistmode2_floor_1 - [(set (match_operand:SWI248x 0 nonimmediate_operand) - (unspec:SWI248x [(match_operand:XF 1 register_operand)] - UNSPEC_FIST_FLOOR)) - (clobber (reg:CC FLAGS_REG))] +(define_expand ceilxf2 + [(use (match_operand:XF 0 register_operand)) + (use (match_operand:XF 1 register_operand))] TARGET_USE_FANCY_MATH_387 -flag_unsafe_math_optimizations -can_create_pseudo_p () - # - 1 - [(const_int 0)] +flag_unsafe_math_optimizations { - ix86_optimize_mode_switching[I387_FLOOR] = 1; + if (optimize_insn_for_size_p ()) +FAIL; + emit_insn (gen_frndintxf2_ceil (operands[0], operands[1])); + DONE; +}) - operands[2] = assign_386_stack_local (HImode, SLOT_CW_STORED); - operands[3] = assign_386_stack_local (HImode, SLOT_CW_FLOOR); - if (memory_operand (operands[0], VOIDmode)) -emit_insn (gen_fistmode2_floor (operands[0], operands[1], - operands[2], operands[3])); +(define_expand ceilmode2 + [(use (match_operand:MODEF 0 register_operand)) + (use (match_operand:MODEF 1 register_operand))] + (TARGET_USE_FANCY_MATH_387 + (!(SSE_FLOAT_MODE_P (MODEmode) TARGET_SSE_MATH) + || TARGET_MIX_SSE_I387) + flag_unsafe_math_optimizations) + || (SSE_FLOAT_MODE_P (MODEmode) TARGET_SSE_MATH +!flag_trapping_math) +{ + if (SSE_FLOAT_MODE_P (MODEmode) TARGET_SSE_MATH + !flag_trapping_math) +{ + if (TARGET_ROUND) + emit_insn (gen_sse4_1_roundmode2 + (operands[0], operands[1], GEN_INT (ROUND_CEIL))); + else if (optimize_insn_for_size_p ()) + FAIL; + else if (TARGET_64BIT || (MODEmode != DFmode)) + ix86_expand_floorceil (operands[0], operands[1], false); + else + ix86_expand_floorceildf_32 (operands[0], operands[1], false); +} else { - operands[4] = assign_386_stack_local (MODEmode, SLOT_TEMP); - emit_insn (gen_fistmode2_floor_with_temp (operands[0], operands[1], - operands[2], operands[3], - operands[4])); -} - DONE; -} - [(set_attr type fistp) - (set_attr i387_cw floor) - (set_attr mode 
MODE)]) + rtx op0, op1; -(define_insn fistdi2_floor - [(set (match_operand:DI 0 memory_operand =m) - (unspec:DI [(match_operand:XF 1 register_operand f)] - UNSPEC_FIST_FLOOR)) - (use (match_operand:HI 2 memory_operand m)) - (use (match_operand:HI 3 memory_operand m)) - (clobber (match_scratch:XF 4 =1f))] - TARGET_USE_FANCY_MATH_387 -flag_unsafe_math_optimizations - * return output_fix_trunc (insn, operands, false); - [(set_attr type fistp) - (set_attr i387_cw floor) - (set_attr mode DI)]) + if
Re: [PATCH 2/3] Use synth_mult for vector multiplies vs scalar constant
On 2012-06-16 04:19, Eric Botcazou wrote: @@ -179,7 +179,11 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES]; extern CONST_MODE_SIZE unsigned char mode_size[NUM_MACHINE_MODES]; #define GET_MODE_SIZE(MODE)((unsigned short) mode_size[MODE]) -#define GET_MODE_BITSIZE(MODE) ((unsigned short) (GET_MODE_SIZE (MODE) * BITS_PER_UNIT)) + +#define GET_MODE_BITSIZE(MODE) \ + ((unsigned short) (GET_MODE_SIZE (MODE) * BITS_PER_UNIT)) +#define GET_MODE_UNIT_BITSIZE(MODE) \ + ((unsigned short) (GET_MODE_UNIT_SIZE (MODE) * BITS_PER_UNIT)) /* Get the number of value bits of an object of mode MODE. */ extern const unsigned short mode_precision[NUM_MACHINE_MODES]; Can you move GET_MODE_UNIT_BITSIZE to after GET_MODE_UNIT_SIZE, changing size in bytes to size in bytes and bits in the comment just above? Because the overloading of UNIT in the macro makes the whole thing slightly confusing. :-) Done in the committed patch. r~
Re: [PATCH] Bad code generation: incorrect folding of TARGET_MEM_REF into a constant
Jiří Hruška ji...@fud.cz writes:

> #include <stdio.h>
>
> __attribute__((section("foo"))) const int foo1 = 1;
> __attribute__((section("foo"))) const int foo2 = 2;
> __attribute__((section("foo"))) const int foo3 = 3;
>
> extern const int __start_foo, __stop_foo;

Declare them as arrays.

extern const int __start_foo[], __stop_foo[];

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
And now for something completely different.
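A minimal sketch of the array declaration suggested above, assuming an ELF target linked with GNU ld, which provides __start_foo and __stop_foo for a section whose name is a valid C identifier. The FOO_ENTRY macro and the foo_count/foo_sum helpers are invented for this illustration; subtracting the two symbols relies on the linker placing them at the ends of the same section, which the C standard itself does not guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Place one int in section "foo"; "used" keeps it from being discarded.
   FOO_ENTRY is a hypothetical helper, not something from the thread. */
#define FOO_ENTRY(name, value) \
    static const int name __attribute__((section("foo"), used)) = (value)

FOO_ENTRY(foo1, 1);
FOO_ENTRY(foo2, 2);
FOO_ENTRY(foo3, 3);

/* Section bounds provided by the linker, declared as arrays of unknown size. */
extern const int __start_foo[], __stop_foo[];

/* Number of ints the linker collected into the section. */
size_t foo_count(void)
{
    assert(__stop_foo >= __start_foo);
    return (size_t)(__stop_foo - __start_foo);
}

/* Sum of all collected ints; independent of their order in the section. */
int foo_sum(void)
{
    int sum = 0;

    for (const int *p = __start_foo; p != __stop_foo; p++)
        sum += *p;
    return sum;
}
```

With the array form the compiler cannot treat __start_foo as a single int of known size, so indexing past the first element is no longer visibly out of bounds.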
Re: [PATCH] backport darwin12 fixes to gcc-4_7-branch
Ok.
Re: [RFC 0/3] Stuff related to pr53533
On 2012-06-15 13:57, Richard Henderson wrote: Bootstrapped and tested on x86_64, but I'll leave some time for comment before committing any of this. Patches now committed. r~
Re: [PATCH] Improve pattern recognizer for division by constant (PR tree-optimization/51581)
On 2012-06-18 22:46, Jakub Jelinek wrote: On Mon, Jun 18, 2012 at 04:44:21PM -0700, Richard Henderson wrote: On 2012-06-14 13:58, Jakub Jelinek wrote: + if (!supportable_widening_operation (WIDEN_MULT_EXPR, last_stmt, + vecwtype, vectype, + dummy, dummy, dummy_code, + dummy_code, dummy_int, dummy_vec)) +return NULL; It would be nice to be able to handle high-part multiplies as well, e.g. VEC_WIDEN_MULT_HI_EXPR. Which is what Altivec provides, and not VEC_WIDEN_MULT. Sure, but we don't have a tree code for that right now, do we? VEC_WIDEN_MULT_HI_EXPR is just one half of the widened multiply results, not all the high halves of the widened multiply. Actually, it is all the high parts of the multiply results. The comment in tree.def is incorrect. Likewise MULT_LO_EXPR is the low parts (and fully redundant with plain MULT_EXPR, really). For 16-bit multiplication we could also use {,V}PMULH{,U}W (for 32-bit multiplication we use two {,V}PMUL{,U}DQ plus shifts afterwards). Well, a single interleave, not shifts, but yes. r~
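To make the role of a high-part multiply in division by a constant concrete, here is a scalar sketch for one 16-bit unsigned lane. It is only an illustration of the arithmetic, not the vectorizer's pattern code; mulhi_u16 models what an unsigned high-part multiply would return per element, and the magic constant 0xAAAB (ceil(2^17 / 3)) is the usual choice for dividing a 16-bit value by 3.

```c
#include <assert.h>
#include <stdint.h>

/* High 16 bits of the 32-bit product: the per-lane result of an
   unsigned 16-bit high-part multiply. */
uint16_t mulhi_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* x / 3 without a divide: multiply by 0xAAAB = ceil(2^17 / 3), keep the
   high part, then shift right by one more bit (17 bits in total). */
uint16_t div3_u16(uint16_t x)
{
    uint16_t q = (uint16_t)(mulhi_u16(x, 0xAAAB) >> 1);

    assert(q <= x);
    return q;
}
```

Since 3 * 0xAAAB = 2^17 + 1, the rounding error stays below 1/3 for every 16-bit input, so the result equals x / 3 exactly.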
Re: [PATCH] Fix PR53708
On 19 Jun 2012, at 13:53, Dominique Dhumieres wrote: On Tue, 19 Jun 2012, Richard Guenther wrote: Richard Guenther rguent...@suse.de writes: We are too eager to bump alignment of some decls when vectorizing. The fix is to not bump alignment of decls the user explicitly aligned or that are used in an unknown way. I thought attribute((__aligned__)) only set a minimum alignment for variables? Most uses I've seen have been trying to get better performance from higher alignment, so it might not go down well if the attribute stopped the vectoriser from increasing the alignment still further. That's what the documentation says indeed. I'm not sure which part of the patch fixes the ObjC failures where the alignment is part of the ABI (and I suppose ObjC then mis-uses the aligned attribute?). A quick test shows that if (DECL_PRESERVE_P (decl)) alone is enough to fix the objc failures, while they are still there if one uses only if (DECL_USER_ALIGN (decl)) That makes sense, I had a quick look at the ObjC code, and it appears that the explicit ALIGNs were never committed to trunk. Thus, the question becomes: what should ObjC (or any other) FE do to ensure that specific ABI (upper) alignment constraints are met? Iain
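As a small illustration of the "minimum alignment" reading of the aligned attribute discussed above, the sketch below requests 16-byte alignment for a buffer and checks the address at run time. The buffer and the helper names are invented for this example; the check only shows that the address is at least 16-byte aligned and says nothing about whether the compiler chose a larger alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Ask for at least 16-byte alignment; the compiler is free to use more. */
static unsigned char buffer[64] __attribute__((aligned(16)));

/* Return nonzero if p is aligned to `align` bytes (align: power of two). */
int is_aligned(const void *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Expose the buffer address so callers can inspect its alignment. */
const void *get_buffer(void)
{
    return buffer;
}
```

An ABI that needs an exact (upper-bounded) layout, as in the ObjC metadata case, cannot be expressed this way, which is the question Iain raises.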
Re: [patch] Deal with #ident without
On Thu, Jun 7, 2012 at 11:22 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Jun 7, 2012 at 8:16 AM, Andreas Schwab sch...@linux-m68k.org wrote: Steven Bosscher stevenb@gmail.com writes: Index: doc/tm.texi === --- doc/tm.texi (revision 188182) +++ doc/tm.texi (working copy) @@ -5847,6 +5847,10 @@ value is 0. @end deftypevr @deftypefn {Target Hook} void TARGET_ASM_OUTPUT_ANCHOR (rtx @var{x}) + +@deftypefn {Target Hook} void TARGET_ASM_OUTPUT_IDENT (const char *@var{name}) +Generate a string based on @var{name}, suitable for the @samp{#ident} directive, or the equivalent directive or pragma in non-C-family languages. If this hook is not defined, nothing is output for the @samp{#ident} directive. +@end deftypefn That looks misplaced. Ok after double-checking the above. I've now committed this, see r188791. Ciao! Steven
Re: [patch] Fix PR48109 using artificial top-level asm statements (darwin/objc)
On Jun 18, 2012, at 10:51 AM, Steven Bosscher stevenb@gmail.com wrote: This patch started as an attempt to remove #include output.h from objc/: Instead of writing references directly to asm_out_file, the references are output as top-level asm statements. OK for trunk? Ok.
Re: [patch] Use IDENTIFIER_LENGTH instead of strlen(IDENTIFIER_POINTER) in a few places
On Jun 18, 2012, at 8:55 AM, Steven Bosscher stevenb@gmail.com wrote: Obvious enough objc/ * objc-encoding.c (encode_aggregate_fields): Use IDENTIFIER_LENGTH instead of strlen(IDENTIFIER_POINTER). (encode_aggregate_within): Likewise. Ok.
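The idea behind the change can be sketched with a stand-in type: when the length is stored next to the characters, reading it is a field access rather than a scan of the string. The struct ident type and the IDENT_POINTER/IDENT_LENGTH macros below are hypothetical and only mimic the shape of GCC's identifier accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical identifier record: text plus a cached length. */
struct ident {
    const char *str;
    size_t len;
};

#define IDENT_POINTER(id) ((id).str)
#define IDENT_LENGTH(id)  ((id).len)

/* Compute the length once, when the identifier is created. */
struct ident make_ident(const char *s)
{
    struct ident id;

    assert(s != NULL);
    id.str = s;
    id.len = strlen(s);
    return id;
}
```

Later users can then call IDENT_LENGTH (id) instead of strlen (IDENT_POINTER (id)) and get the same value without walking the string again.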
Re: [PATCH] Bad code generation: incorrect folding of TARGET_MEM_REF into a constant
On Tue, Jun 19, 2012 at 8:59 PM, Andreas Schwab sch...@linux-m68k.org wrote: Declare them as arrays. extern const int __start_foo[], __stop_foo[]; Thanks, that's a good suggestion, cleans the code nicely! (Though, of course, both ways work here and the strange things happen only in the 3rd testcase, which does not use these special variables.)
Re: [PATCH] Fix PR tree-optimization/53636 (SLP generates invalid misaligned access)
Richard Guenther writes: On Fri, Jun 15, 2012 at 5:00 PM, Ulrich Weigand uweig...@de.ibm.com wrote: Richard Guenther wrote: On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand uweig...@de.ibm.com wrote: However, there is a second case where we need to check every pass: if we're not actually vectorizing any loop, but are performing basic-block SLP. In this case, it would appear that we need the same check as described in the comment above, i.e. to verify that the stride is a multiple of the vector size. The patch below adds this check, and this indeed fixes the invalid access I was seeing in the test case (in the final assembler, we now get a vld1.16 instead of vldr). Tested on arm-linux-gnueabi with no regressions. OK for mainline? Ok. Thanks for the quick review; I've checked this in to mainline now. I just noticed that the test case also crashes on 4.7, but not on 4.6. Would a backport to 4.7 also be OK, once testing passes? Yes. Please leave it on mainline a few days to catch fallout from autotesters. This patch caused FAIL: gcc.dg/vect/bb-slp-16.c scan-tree-dump-times slp basic block vectorized using SLP 1 on sparc64-linux. Comparing the pre and post patch dumps for that file shows 22: vect_compute_data_ref_alignment: 22: misalign = 4 bytes of ref MEM[(unsigned int *)pout_90 + 28B] 22: vect_compute_data_ref_alignment: -22: force alignment of arr[i_87] -22: misalign = 0 bytes of ref arr[i_87] +22: SLP: step doesn't divide the vector-size. +22: Unknown alignment for access: arr (lots of stuff that's simply gone) -22: BASIC BLOCK VECTORIZED - -22: basic block vectorized using SLP +22: not vectorized: unsupported unaligned store.arr[i_87] +22: not vectorized: unsupported alignment in basic block. /Mikael
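The condition Ulrich describes can be stated as a one-line predicate. The sketch below is a simplified model with made-up names, not the vectorizer's code: it assumes byte counts for both the step and the vector size and ignores negative steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if an access that starts vector-aligned stays aligned after
   advancing by step_bytes, i.e. the step is a multiple of the vector
   size. Hypothetical helper for illustration only. */
bool step_keeps_alignment(size_t step_bytes, size_t vector_bytes)
{
    assert(vector_bytes != 0);
    return step_bytes % vector_bytes == 0;
}
```

In this model a 4-byte step with an 8-byte vector fails the test, which would correspond to the "not vectorized" outcome in the sparc64 dump quoted above.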
Re: [patch] Fix failing nested-3.C on ARM.
On Jun 19, 2012, at 2:18 AM, Richard Earnshaw rearn...@arm.com wrote: The regexp in nested-3.C has to parse the machine-specific comment character; on ARM that is '@'. Tested on arm-eabi, where this test now passes. OK? Ok.
Re: [PATCH] Fix PR53708
On Jun 19, 2012, at 12:22 PM, Iain Sandoe i...@codesourcery.com wrote: On 19 Jun 2012, at 13:53, Dominique Dhumieres wrote: On Tue, 19 Jun 2012, Richard Guenther wrote: Richard Guenther rguent...@suse.de writes: We are too eager to bump alignment of some decls when vectorizing. The fix is to not bump alignment of decls the user explicitly aligned or that are used in an unknown way. I thought attribute((__aligned__)) only set a minimum alignment for variables? Most uses I've seen have been trying to get better performance from higher alignment, so it might not go down well if the attribute stopped the vectoriser from increasing the alignment still further. That's what the documentation says indeed. I'm not sure which part of the patch fixes the ObjC failures where the alignment is part of the ABI (and I suppose ObjC then mis-uses the aligned attribute?). A quick test shows that if (DECL_PRESERVE_P (decl)) alone is enough to fix the objc failures, while they are still there if one uses only if (DECL_USER_ALIGN (decl)) That makes sense, I had a quick look at the ObjC code, and it appears that the explicit ALIGNs were never committed to trunk. Thus, the question becomes: what should ObjC (or any other) FE do to ensure that specific ABI (upper) alignment constraints are met? Hum, upper is easy... I thought the issue was that extra alignment would kill it? I know that extra alignment does kill some of the objc metadata.
Re: [PATCH] Fix PR53708
On Jun 19, 2012, at 5:53 AM, domi...@lps.ens.fr (Dominique Dhumieres) wrote: On Tue, 19 Jun 2012, Richard Guenther wrote: Richard Guenther rguent...@suse.de writes: We are too eager to bump alignment of some decls when vectorizing. The fix is to not bump alignment of decls the user explicitly aligned or that are used in an unknown way. I thought attribute((__aligned__)) only set a minimum alignment for variables? Most uses I've seen have been trying to get better performance from higher alignment, so it might not go down well if the attribute stopped the vectoriser from increasing the alignment still further. That's what the documentation says indeed. I'm not sure which part of the patch fixes the ObjC failures where the alignment is part of the ABI (and I suppose ObjC then mis-uses the aligned attribute?). A quick test shows that if (DECL_PRESERVE_P (decl)) alone is enough to fix the objc failures, Sounds good to me. It seems ok to me for the optimizer to bump up the alignment on things that aren't special. DECL_PRESERVE seems like a reasonable way to declare they are special.
Re: [testsuite] profopt.exp and friends: use expected list of options
On Jun 18, 2012, at 4:51 PM, Janis Johnson janis_john...@mentor.com wrote: There are tests in g++.tree-prof that have non-unique lines in test summaries for scan-*-dump checks. Investigation showed that these tests were being run multiple times, for a list of options that had leaked over from another set of profile-directed optimization tests. This patch makes it use [ { -O2 } {-O3 } ] so the options tested there will get some coverage with optimization, although not as much as originally planned when the tests were added years and years ago. Sounds ok to me, but I'd be happy to have a prof champion chime in, if they disagree. OK for mainline? Ok, with the caveat that I'll defer to a prof champion.
Re: [PATCH] Fix PR53708
On 19 Jun 2012, at 22:41, Mike Stump wrote: On Jun 19, 2012, at 12:22 PM, Iain Sandoe i...@codesourcery.com wrote: On 19 Jun 2012, at 13:53, Dominique Dhumieres wrote: On Tue, 19 Jun 2012, Richard Guenther wrote: Richard Guenther rguent...@suse.de writes: We are too eager to bump alignment of some decls when vectorizing. The fix is to not bump alignment of decls the user explicitly aligned or that are used in an unknown way. I thought attribute((__aligned__)) only set a minimum alignment for variables? Most uses I've seen have been trying to get better performance from higher alignment, so it might not go down well if the attribute stopped the vectoriser from increasing the alignment still further. That's what the documentation says indeed. I'm not sure which part of the patch fixes the ObjC failures where the alignment is part of the ABI (and I suppose ObjC then mis-uses the aligned attribute?). A quick test shows that if (DECL_PRESERVE_P (decl)) alone is enough to fix the objc failures, while they are still there if one uses only if (DECL_USER_ALIGN (decl)) That makes sense, I had a quick look at the ObjC code, and it appears that the explicit ALIGNs were never committed to trunk. Thus, the question becomes: what should ObjC (or any other) FE do to ensure that specific ABI (upper) alignment constraints are met? Hum, upper is easy... I thought the issue was that extra alignment would kill it? I know that extra alignment does kill some of the objc metadata. Clearly, ambiguous phrasing on my part. I mean the case where we want to say "no more than this much".
Fix e500 vector ICE with string constants
On some tests involving storing a pointer to a string constant in a vector, on powerpc with SPE vectors, an ICE occurs of the form: t2.c: In function 'f': t2.c:7:1: error: unrecognizable insn: } ^ (insn 9 8 10 2 (set (subreg:SI (reg:V2SI 125 [ D.1618 ]) 4) (lo_sum:SI (reg:SI 126) (symbol_ref/f:SI (*.LC0) [flags 0x82] var_decl 0xf745b000 *.LC0))) t2.c:6 -1 (nil)) t2.c:7:1: internal compiler error: in extract_insn, at recog.c:2130 Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. The patterns to set individual words of SPE vectors only allow input_operand and do not allow for the LO_SUM constructs used for pointers to strings. This patch fixes things by adding further patterns for the LO_SUM case. (It's possible the issue could also arise with the patterns for subregs of TFmode at offset 8 and 12, but I couldn't get the compiler to generate stores of string constant pointers to such subregs.) The original test I had for this issue in a 4.6-based compiler simplified to char *a1[20]; int a2[20]; char a3[1]; void f (void) { int i; for (i = 1; i 20; i++) { a1[i] = ; a2[i] = 0; } } with -O3, where the vectors were generated internally, but that doesn't ICE with trunk, so I created the synthetic testcases in this patch that do ICE with trunk. Tested with no regressions with cross to powerpc-eabispe. OK to commit? 2012-06-19 Joseph Myers jos...@codesourcery.com * config/rs6000/spe.md (*mov_simode_e500_subreg0): Rename to mov_simode_e500_subreg0. (*mov_simode_e500_subreg0_elf_low) (*mov_simode_e500_subreg4_elf_low): New patterns. testsuite: 2012-06-19 Joseph Myers jos...@codesourcery.com * gcc.c-torture/compile/vector-5.c, gcc.c-torture/compile/vector-6.c: New tests. 
Index: gcc/testsuite/gcc.c-torture/compile/vector-5.c === --- gcc/testsuite/gcc.c-torture/compile/vector-5.c (revision 0) +++ gcc/testsuite/gcc.c-torture/compile/vector-5.c (revision 0) @@ -0,0 +1,7 @@ +typedef int v2si __attribute__((__vector_size__(8))); + +v2si +f (int x) +{ + return (v2si) { x, (__INTPTR_TYPE__) }; +} Index: gcc/testsuite/gcc.c-torture/compile/vector-6.c === --- gcc/testsuite/gcc.c-torture/compile/vector-6.c (revision 0) +++ gcc/testsuite/gcc.c-torture/compile/vector-6.c (revision 0) @@ -0,0 +1,7 @@ +typedef int v2si __attribute__((__vector_size__(8))); + +v2si +f (int x) +{ + return (v2si) { (__INTPTR_TYPE__) , x }; +} Index: gcc/config/rs6000/spe.md === --- gcc/config/rs6000/spe.md(revision 188753) +++ gcc/config/rs6000/spe.md(working copy) @@ -1,5 +1,5 @@ ;; e500 SPE description -;; Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 +;; Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2011, 2012 ;; Free Software Foundation, Inc. ;; Contributed by Aldy Hernandez (a...@quesejoda.com) @@ -2329,7 +2329,7 @@ evmergehi %0,%1,%1\;mr %L0,%1\;evmergehi %Y0,%L1,%L1\;mr %Z0,%L1 [(set_attr length 16)]) -(define_insn *mov_simode_e500_subreg0 +(define_insn mov_simode_e500_subreg0 [(set (subreg:SI (match_operand:SPE64TF 0 register_operand +r,r) 0) (match_operand:SI 1 input_operand r,m))] (TARGET_E500_DOUBLE (MODEmode == DFmode || MODEmode == TFmode)) @@ -2339,6 +2339,24 @@ evmergelohi %0,%0,%0\;{l%U1%X1|lwz%U1%X1} %0,%1\;evmergelohi %0,%0,%0 [(set_attr length 4,12)]) +(define_insn_and_split *mov_simode_e500_subreg0_elf_low + [(set (subreg:SI (match_operand:SPE64TF 0 register_operand +r) 0) + (lo_sum:SI (match_operand:SI 1 gpc_reg_operand r) + (match_operand 2 )))] + ((TARGET_E500_DOUBLE (MODEmode == DFmode || MODEmode == TFmode)) +|| (TARGET_SPE MODEmode != DFmode MODEmode != TFmode)) +TARGET_ELF !TARGET_64BIT can_create_pseudo_p () + # + 1 + [(pc)] +{ + rtx tmp = gen_reg_rtx (SImode); + emit_insn (gen_elf_low (tmp, operands[1], 
operands[2])); + emit_insn (gen_mov_simode_e500_subreg0 (operands[0], tmp)); + DONE; +} + [(set_attr length 8)]) + ;; ??? Could use evstwwe for memory stores in some cases, depending on ;; the offset. (define_insn *mov_simode_e500_subreg0_2 @@ -2360,6 +2378,15 @@ mr %0,%1 {l%U1%X1|lwz%U1%X1} %0,%1) +(define_insn *mov_simode_e500_subreg4_elf_low + [(set (subreg:SI (match_operand:SPE64TF 0 register_operand +r) 4) + (lo_sum:SI (match_operand:SI 1 gpc_reg_operand r) + (match_operand 2 )))] + ((TARGET_E500_DOUBLE (MODEmode == DFmode || MODEmode == TFmode)) +|| (TARGET_SPE MODEmode != DFmode MODEmode != TFmode)) +TARGET_ELF !TARGET_64BIT + {ai|addic} %0,%1,%K2) + (define_insn *mov_simode_e500_subreg4_2 [(set (match_operand:SI 0
[patch][PCH] Do not write/read asm_out_file, take 2
Hello, The attached patch removes one more #include output.h, this time from c-family/c-pch.c. Anything written out to asm_out_file between pch_init and c_common_write_pch is read back in by c_common_write_pch and dumped to the PCH that's being written out. In c_common_read_pch this data is written out verbatim to asm_out_file again. But nothing should write to asm_out_file between pch_init and c_common_write_pch. I suppose this happened before unit-at-a-time became the only supported compilation mode, but these days there's nothing, AFAICT, that should be written to asm_out_file by a front end during PCH generation. This patch was bootstrapped and tested on powerpc64-unknown-linux-gnu. The issues with #ident have already been addressed, and this patch adds a new test case, to make sure... OK for trunk? Ciao! Steven 01_c_pch_no_asm_out_file.diff Description: Binary data
Re: Fix e500 vector ICE with string constants
On Tue, Jun 19, 2012 at 5:56 PM, Joseph S. Myers jos...@codesourcery.com wrote: 2012-06-19 Joseph Myers jos...@codesourcery.com * config/rs6000/spe.md (*mov_simode_e500_subreg0): Rename to mov_simode_e500_subreg0. (*mov_simode_e500_subreg0_elf_low) (*mov_simode_e500_subreg4_elf_low): New patterns. testsuite: 2012-06-19 Joseph Myers jos...@codesourcery.com * gcc.c-torture/compile/vector-5.c, gcc.c-torture/compile/vector-6.c: New tests. Okay. Thanks, David
[patch committed testsuite] Tweak gcc.dg/stack-usage-1.c on SH
Hi, I've applied the attached patch which is a tiny SH specific change of gcc.dg/stack-usage-1.c test. Tested on sh-linux and i686-pc-linux-gnu. Regards, kaz -- 2012-06-19 Kaz Kojima kkoj...@gcc.gnu.org * gcc.dg/stack-usage-1.c: Use sh*-*-* instead of sh-*-*. --- ORIG/trunk/gcc/testsuite/gcc.dg/stack-usage-1.c 2012-06-16 09:29:54.0 +0900 +++ trunk/gcc/testsuite/gcc.dg/stack-usage-1.c 2012-06-19 07:55:54.0 +0900 @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options -fstack-usage } */ -/* { dg-options -fstack-usage -fomit-frame-pointer { target { sh-*-* } } } */ +/* { dg-options -fstack-usage -fomit-frame-pointer { target { sh*-*-* } } } */ /* This is aimed at testing basic support for -fstack-usage in the back-ends. See the SPARC back-end for example (grep flag_stack_usage_info in sparc.c).
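The effect of widening the selector from sh-*-* to sh*-*-* can be checked with ordinary glob matching. The sketch below uses POSIX fnmatch as a stand-in; it assumes the testsuite's target selectors follow comparable glob rules, and the triplets used in the usage note are examples rather than an exhaustive list of SH configurations.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the target triplet matches the glob-style selector. */
bool target_matches(const char *selector, const char *triplet)
{
    assert(selector != NULL && triplet != NULL);
    return fnmatch(selector, triplet, 0) == 0;
}
```

Under these rules sh-*-* matches a triplet such as sh-unknown-elf but not sh4-unknown-linux-gnu, while sh*-*-* matches both.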
[patch][ARM] Do not include output.h in arm-c.c
Hello, Only a few front-end files to go that need output.h, and some of them are in the c_target_objs: arm, mep, m32c, and rl78. This patch tackles the ARM case. arm-c.c needs output.h because EMIT_EABI_ATTRIBUTE wants to print to asm_out_file. Solved by replacing EMIT_EABI_ATTRIBUTE with a function arm.c:arm_emit_eabi_attribute. Tested by building a cross-compiler from powerpc64-unknown-linux-gnu X arm-eabi, and comparing assembly on a set of files. OK for trunk? Ciao! Steven arm_C_no_output_h.diff Description: Binary data
Re: [RFC 0/3] Stuff related to pr53533
On 2012-06-15 13:57, Richard Henderson wrote: Bootstrapped and tested on x86_64, but I'll leave some time for comment before committing any of this. Patches now committed. Hey Richard, Thanks for taking on some of these issues. I'm not seeing much of an improvement yet when manually applying the patches to 4.7, but it looks like steps in the right direction. Having to turn off vectorization to approximate previous compiler performance was disappointing given it's supposed to give us a boost on some of these architectures ;) Would it be possible to commit these to 4_7-branch as well? (One of the patches looks relevant to 4.6 as well, and applied cleanly, but I haven't tested to see if it had a noticeable effect.) Thanks again! -- tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
[cxx-conversion] Remove option to build without a C++ compiler (issue6296093)
Remove option to build without a C++ compiler. This patch removes all the configuration code that allowed GCC to build without a C++ compiler. After this patch the following configuration flags are no longer valid: --enable-build-with-cxx --enable-build-poststage1-with-cxx All builds will unconditionally use C++. Tested on x86_64. Ian, could you please take a look to double check I have not missed anything? There was more code dealing with it than I was expecting. I'm also not sure how to propagate the changes in go/gofrontend, but we don't need to worry about that until we do the actual merge into trunk. Thanks. Diego. 2012-06-19 Diego Novillo dnovi...@google.com ChangeLog.cxx-conversion * Makefile.tpl (STAGE[+id+]_CXXFLAGS): Remove POSTSTAGE1_CONFIGURE_FLAGS. * Makefile.in: Regenerate. * configure.ac (ENABLE_BUILD_WITH_CXX): Remove. Update all users. * configure: Regenerate. gcc/ChangeLog.cxx-conversion * Makefile.in: Remove all handlers of ENABLE_BUILD_WITH_CXX. * configure.ac: Likewise. * configure: Regenerate. * config.in: Regenerate. * doc/install.texi: Remove documentation for --enable-build-with-cxx and --enable-build-poststage1-with-cxx. gcc/go/ChangeLog.cxx-conversion * go-c.h: Remove all handlers of ENABLE_BUILD_WITH_CXX. * go-gcc.cc: Likewise. * go-system.h: Likewise. libcpp/ChangeLog.cxx-conversion * Makefile.in: Remove all handlers of ENABLE_BUILD_WITH_CXX. * configure.ac: Likewise. * configure: Regenerate. diff --git a/Makefile.in b/Makefile.in index def860e..d81fb97 100644 --- a/Makefile.in +++ b/Makefile.in @@ -422,7 +422,6 @@ TFLAGS = STAGE_CFLAGS = $(BOOT_CFLAGS) STAGE_TFLAGS = $(TFLAGS) STAGE_CONFIGURE_FLAGS=@stage2_werror_flag@ -POSTSTAGE1_CONFIGURE_FLAGS = @POSTSTAGE1_CONFIGURE_FLAGS@ # Defaults for stage 1; some are overridden below. 
@@ -433,10 +432,7 @@ STAGE1_CXXFLAGS = $(CXXFLAGS) STAGE1_CXXFLAGS = $(STAGE1_CFLAGS) @endif target-libstdc++-v3-bootstrap STAGE1_TFLAGS = $(STAGE_TFLAGS) -# STAGE1_CONFIGURE_FLAGS overridden below, so we can use -# POSTSTAGE1_CONFIGURE_FLAGS here. -STAGE1_CONFIGURE_FLAGS = \ - $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS) +STAGE1_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS) # Defaults for stage 2; some are overridden below. STAGE2_CFLAGS = $(STAGE_CFLAGS) @@ -446,10 +442,7 @@ STAGE2_CXXFLAGS = $(CXXFLAGS) STAGE2_CXXFLAGS = $(STAGE2_CFLAGS) @endif target-libstdc++-v3-bootstrap STAGE2_TFLAGS = $(STAGE_TFLAGS) -# STAGE1_CONFIGURE_FLAGS overridden below, so we can use -# POSTSTAGE1_CONFIGURE_FLAGS here. -STAGE2_CONFIGURE_FLAGS = \ - $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS) +STAGE2_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS) # Defaults for stage 3; some are overridden below. STAGE3_CFLAGS = $(STAGE_CFLAGS) @@ -459,10 +452,7 @@ STAGE3_CXXFLAGS = $(CXXFLAGS) STAGE3_CXXFLAGS = $(STAGE3_CFLAGS) @endif target-libstdc++-v3-bootstrap STAGE3_TFLAGS = $(STAGE_TFLAGS) -# STAGE1_CONFIGURE_FLAGS overridden below, so we can use -# POSTSTAGE1_CONFIGURE_FLAGS here. -STAGE3_CONFIGURE_FLAGS = \ - $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS) +STAGE3_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS) # Defaults for stage 4; some are overridden below. STAGE4_CFLAGS = $(STAGE_CFLAGS) @@ -472,10 +462,7 @@ STAGE4_CXXFLAGS = $(CXXFLAGS) STAGE4_CXXFLAGS = $(STAGE4_CFLAGS) @endif target-libstdc++-v3-bootstrap STAGE4_TFLAGS = $(STAGE_TFLAGS) -# STAGE1_CONFIGURE_FLAGS overridden below, so we can use -# POSTSTAGE1_CONFIGURE_FLAGS here. -STAGE4_CONFIGURE_FLAGS = \ - $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS) +STAGE4_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS) # Defaults for stage profile; some are overridden below. 
STAGEprofile_CFLAGS = $(STAGE_CFLAGS) @@ -485,10 +472,7 @@ STAGEprofile_CXXFLAGS = $(CXXFLAGS) STAGEprofile_CXXFLAGS = $(STAGEprofile_CFLAGS) @endif target-libstdc++-v3-bootstrap STAGEprofile_TFLAGS = $(STAGE_TFLAGS) -# STAGE1_CONFIGURE_FLAGS overridden below, so we can use -# POSTSTAGE1_CONFIGURE_FLAGS here. -STAGEprofile_CONFIGURE_FLAGS = \ - $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS) +STAGEprofile_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS) # Defaults for stage feedback; some are overridden below. STAGEfeedback_CFLAGS = $(STAGE_CFLAGS) @@ -498,10 +482,7 @@ STAGEfeedback_CXXFLAGS = $(CXXFLAGS) STAGEfeedback_CXXFLAGS = $(STAGEfeedback_CFLAGS) @endif target-libstdc++-v3-bootstrap STAGEfeedback_TFLAGS = $(STAGE_TFLAGS) -# STAGE1_CONFIGURE_FLAGS overridden below, so we can use -# POSTSTAGE1_CONFIGURE_FLAGS here. -STAGEfeedback_CONFIGURE_FLAGS = \ - $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS) +STAGEfeedback_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS) # Only build the C compiler for stage1, because that is the only one that @@ -519,9 +500,6 @@
Re: [patch] Deal with #ident without
On Tue, 19 Jun 2012, Steven Bosscher wrote: I've now committed this, see r188791. Breaking cris-elf. Just try rebuilding cc1: ./gcc/gcc/../libdecnumber/dpd -I../libdecnumber\ /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c -o cris.o /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c: In function 'cris_asm_output_ident': /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2480: error: 'cgraph_state' undeclared (first use in this function) /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2480: error: (Each undeclared identifier is reported only once /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2480: error: for each function it appears in.) /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2480: error: 'CGRAPH_STATE_PARSING' undeclared (first use in this funct\ ion) /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2478: warning: unused variable 'buf' /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2477: warning: unused variable 'size' /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2476: warning: unused variable 'section_asm_op' /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c: In function 'cris_option_override': /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2538: error: 'flag_no_gcc_ident' undeclared (first use in this function\ ) make[2]: *** [cris.o] Error 1 brgds, H-P
Re: [RFC 0/3] Stuff related to pr53533
On 2012-06-19 15:55, Matt wrote: On 2012-06-15 13:57, Richard Henderson wrote: Bootstrapped and tested on x86_64, but I'll leave some time for comment before committing any of this. Patches now committed. Hey Richard, Thanks for taking on some of these issues. I'm not seeing much of an improvement yet when manually applying the patches to 4.7... Of course not. None of them address the real problem. They merely fix warts discovered along the way. Would it be possible to commit these to 4_7-branch as well? No, I don't think so. r~
Re: Updated to respond to various email comments from Jason, Diego and Cary (issue6197069)
On 06/19/2012 10:12 AM, Sterling Augustine wrote:
+      /* If we're putting types in their own .debug_types sections,
+         the .debug_pubtypes table will still point to the compile
+         unit (not the type unit), so we want to use the offset of
+         the skeleton DIE (if there is one).  */
+      if (pub->die->comdat_type_p && names == pubtype_table)
+        {
+          comdat_type_node_ref type_node = pub->die->die_id.die_type_node;
+
+          if (type_node != NULL && type_node->skeleton_die != NULL)
+            die_offset = type_node->skeleton_die->die_offset;
+        }

I think we had agreed that if there is no skeleton, we should use an offset of 0.

Jason
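For illustration only, the behaviour Jason asks for could be sketched as below. The two structs are hypothetical stand-ins for the dwarf2out.c types and model only the fields the example needs; `pubtype_offset` is an invented name, and the sketch covers just the case of a type that lives in its own type unit.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real DIE and type-unit structures.  */
struct die { unsigned long die_offset; };
struct type_unit { struct die *skeleton_die; };

/* Offset to record in .debug_pubtypes for a type placed in its own
   type unit: the skeleton DIE's offset when a skeleton exists,
   otherwise 0, as suggested in the review comment.  Treating a NULL
   type_node the same way is an assumption of this sketch.  */
unsigned long
pubtype_offset (const struct type_unit *type_node)
{
  if (type_node != NULL && type_node->skeleton_die != NULL)
    return type_node->skeleton_die->die_offset;
  return 0;
}
```

The point of the explicit `return 0` is that the offset never silently falls back to the type unit's own DIE offset.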
Re: User directed Function Multiversioning via Function Overloading (issue5752064)
Ping. On Thu, Jun 14, 2012 at 1:13 PM, Sriraman Tallam tmsri...@google.com wrote: +cc c++ front-end maintainers Hi, C++ Frontend maintainers, Could you please take a look at the front-end part when you find the time? Honza, your thoughts on the callgraph part? Richard, any further comments/feedback? Additionally, I am working on generating better mangled names for function versions, along the lines of C++ thunks. Thanks, -Sri. On Mon, Jun 4, 2012 at 11:59 AM, Sriraman Tallam tmsri...@google.com wrote: Hi, Attaching updated patch for function multiversioning which brings in plenty of changes. * As suggested by Richard earlier, I have made cgraph aware of function versions. All nodes of function versions are chained and the dispatcher bodies are created on demand while building cgraph edges. The dispatcher body will be created if and only if there is a call or reference to a versioned function. Previously, I was maintaining the list of versions separately in a hash map, all that is gone now. * Now, the file multiverison.c has some helper routines that are used in the context of function versioning. There are no new passes and no new globals. * More tests, updated existing tests. * Fixed lots of bugs. * Updated patch description. Patch attached. Patch also available for review at http://codereview.appspot.com/5752064 Please let me know what you think, Thanks, -Sri. On Mon, May 14, 2012 at 11:28 AM, Sriraman Tallam tmsri...@google.com wrote: Hi H.J, Attaching new patch with 2 test cases, mv2.C checks ISAs only and mv1.C checks ISAs and arches mixed. Right now, checking only arches is not needed as they are mutually exclusive, any order should be fine. Patch also available for review here: http://codereview.appspot.com/5752064 Thanks, -Sri. On Sat, May 12, 2012 at 6:37 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, May 11, 2012 at 7:04 PM, Sriraman Tallam tmsri...@google.com wrote: Hi H.J., I have updated the patch to improve the dispatching method like we discussed. 
Each feature gets a priority now, and the dispatching is done in priority order. Please see i386.c for the changes. Patch also available for review here: http://codereview.appspot.com/5752064 I think you need 3 tests: 1. Only with ISA. 2. Only with arch 3. Mixed with ISA and arch since test mixed ISA and arch may hide issues with ISA only or arch only. -- H.J.
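As a rough picture of the priority-ordered dispatching mentioned in this thread, the sketch below picks the highest-priority version whose required features are all present. It is not the code in i386.c: the feature bits, the `struct version` layout and `pick_version` are all hypothetical names chosen for the example.

```c
#include <stddef.h>

/* Hypothetical feature bits; the real patch has its own list.  */
enum { FEAT_SSE2 = 1 << 0, FEAT_SSE4_2 = 1 << 1, FEAT_AVX = 1 << 2 };

/* One function version: the features it needs and its priority.  */
struct version
{
  unsigned required;
  unsigned priority;
  const char *name;
};

/* Return the name of the highest-priority version whose required
   features are all available, or "default" if none qualifies.  */
const char *
pick_version (const struct version *v, size_t n, unsigned cpu_features)
{
  const struct version *best = NULL;
  size_t i;

  for (i = 0; i < n; i++)
    if ((v[i].required & ~cpu_features) == 0
        && (best == NULL || v[i].priority > best->priority))
      best = &v[i];
  return best ? best->name : "default";
}
```

Because selection depends only on the priority field, the order in which versions are listed does not matter, which is one reason separate ISA-only, arch-only and mixed tests are useful.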
Re: [PATCH, MIPS] Add most common atomic patterns
I've now checked in these patches. Tom, thanks for the great work optimizing sync and atomic builtins for MIPS and XLP, and, Richard, thanks for the reviews and education on writing good .md descriptions.

-- Maxim Kuvyrkov CodeSourcery / Mentor Graphics

On 13/06/2012, at 5:50 PM, Maxim Kuvyrkov wrote:

This patch series adds the necessary patterns for the __atomic_compare_exchange[_n], __atomic_exchange[_n] and __atomic_fetch_add builtins. These are the builtins that correspond to the inline assembly that the MIPS GLIBC port is using. The patches were originally developed by Tom de Vries a while ago, and I've rewritten parts of them to be better suited for upstream.

The second patch adds XLP-specific patterns to support its swap and ldadd instructions. Unfortunately, there seems to be a problem in reload that prevents it from properly spilling the address for these two patterns. I will work with reload experts on investigating and fixing this problem, but, meanwhile, the patch contains a workaround that avoids the problem.

The third patch is a small optimization to alleviate the __atomic_compare_exchange[_n] builtins being a one-size-fits-all solution. These builtins return both the boolean success and the oldval results. As most cases use only one of the results, this optimization looks at REG_UNUSED notes to determine whether the instructions that set these results can be omitted.

The patch series was tested by running the GLIBC testsuite for the n32, n64 and o32 ABIs on XLP and [in-progress] non-XLP MIPS boards, with no regressions, together with a corresponding patch to the MIPS GLIBC port to use the new atomic builtins.

-- Maxim Kuvyrkov CodeSourcery / Mentor Graphics
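For readers unfamiliar with the three builtins named above, here is a minimal usage sketch in plain C. The wrapper names are made up for the example, and sequentially consistent ordering is chosen only for simplicity. Note how the compare-exchange builtin reports both a success flag and, on failure, the observed old value through `expected`; that pair of results is what the third patch looks at.

```c
/* Thin wrappers around the GCC __atomic builtins (GCC 4.7 or later).
   The wrapper names are hypothetical.  */

/* Store NEWVAL in *P and return the previous value.  */
int
swap_value (int *p, int newval)
{
  return __atomic_exchange_n (p, newval, __ATOMIC_SEQ_CST);
}

/* Add DELTA to *P and return the value *P held before the addition.  */
int
fetch_then_add (int *p, int delta)
{
  return __atomic_fetch_add (p, delta, __ATOMIC_SEQ_CST);
}

/* Strong compare-and-swap: returns nonzero on success.  On failure,
   *EXPECTED is overwritten with the value actually found in *P.  */
int
compare_and_swap (int *p, int *expected, int desired)
{
  return __atomic_compare_exchange_n (p, expected, desired, 0,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

A caller that only tests the return value never needs the old value, and a caller that only reads `*expected` never needs the flag, so either half of the generated sequence may be dead.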
Re: [PATCH] Unify emit_{pre,post}_atomic_barrier across Alpha, ARM, MIPS and TileGX
On 15/06/2012, at 11:16 AM, Richard Henderson wrote:
On 2012-06-14 16:06, Maxim Kuvyrkov wrote:

2012-06-15  Maxim Kuvyrkov  ma...@codesourcery.com

	* emit-rtl.c (need_atomic_barrier_p): New function.
	* emit-rtl.h (need_atomic_barrier_p): Declare it.
	* config/alpha/alpha.c (alpha_{pre,post}_atomic_barrier): Remove,
	use generic version instead.
	* config/arm/arm.c (arm_{pre,post}_atomic_barrier): Remove,
	use generic version instead.
	* config/mips/mips.c (mips_{pre,post}_atomic_barrier_p): Remove,
	use generic version instead.
	* config/tilegx/tilegx.c, config/tilegx/tilegx-protos.h,
	* config/tilegx/sync.md (tilegx_{pre,post}_atomic_barrier): Remove,
	use generic version instead.

Ok.

Since I didn't hear any objections from target maintainers I've checked in this patch.

Thank you,

-- Maxim Kuvyrkov CodeSourcery / Mentor Graphics
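The kind of decision such a shared helper has to make can be sketched as follows. This is a standalone illustration, not the code added to emit-rtl.c: the enum and the function name `barrier_needed` are stand-ins, and the mapping shown is the conventional one for C11-style memory models.

```c
#include <stdbool.h>

/* Stand-in for the compiler's memory model enumeration.  */
enum model { RELAXED, CONSUME, ACQUIRE, RELEASE, ACQ_REL, SEQ_CST };

/* Return true if a barrier is needed before (PRE true) or after
   (PRE false) an atomic operation performed with model M.  */
bool
barrier_needed (enum model m, bool pre)
{
  switch (m)
    {
    case RELAXED:
    case CONSUME:
      return false;     /* No ordering required on either side.  */
    case RELEASE:
      return pre;       /* Earlier accesses must complete first.  */
    case ACQUIRE:
      return !pre;      /* Later accesses must not move above.  */
    case ACQ_REL:
    case SEQ_CST:
      return true;      /* Barriers on both sides.  */
    }
  return true;          /* Be conservative for unknown values.  */
}
```

Keeping this table in one place is what lets the per-target pre and post barrier helpers listed in the ChangeLog be removed.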
C++ PATCH for c++/53651 (ICE with ill-formed use of decltype)
A decltype doesn't have a name.

Tested x86_64-pc-linux-gnu, applying to trunk and 4.7.

commit bab2f5e9e77bd41b91ca6eae34483eb159307519
Author: Jason Merrill ja...@redhat.com
Date:   Thu Jun 14 17:28:08 2012 -0700

    PR c++/53651
    * name-lookup.c (constructor_name_p): Don't try to look at the
    name of a DECLTYPE_TYPE.

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 0f28820..cc8439c 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -1966,6 +1966,11 @@ constructor_name_p (tree name, tree type)
   if (TREE_CODE (name) != IDENTIFIER_NODE)
     return false;

+  /* These don't have names.  */
+  if (TREE_CODE (type) == DECLTYPE_TYPE
+      || TREE_CODE (type) == TYPEOF_TYPE)
+    return false;
+
   ctor_name = constructor_name_full (type);
   if (name == ctor_name)
     return true;
diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype37.C b/gcc/testsuite/g++.dg/cpp0x/decltype37.C
new file mode 100644
index 000..c885e9a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/decltype37.C
@@ -0,0 +1,14 @@
+// PR c++/53651
+// { dg-do compile { target c++11 } }
+
+template<typename> struct wrap { void bar(); };
+
+template<typename T> auto foo(T* t) -> wrap<T>* { return 0; }
+
+template<typename T>
+struct holder : decltype(*foo((T*)0)) // { dg-error "class type" }
+{
+    using decltype(*foo((T*)0))::bar; // { dg-error "is not a base" }
+};
+
+holder<int> h;
Re: [patch] Remove NO_IMPLICIT_EXTERN_C target macro
On Mon, 18 Jun 2012, Steven Bosscher wrote: The attached patch removes NO_IMPLICIT_EXTERN_C, and replaces its sole user with IMPLICIT_EXTERN_C to avoid the double negations (#ifndef NO_IMPLICIT_EXTERN_C, etc.). Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? I saw it wasn't part of this patch so: when and if this eventually gets in, please don't forget to poison it, see system.h. brgds, H-P
Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
Ping. Teresa On Fri, May 18, 2012 at 7:21 AM, Teresa Johnson tejohn...@google.com wrote: Ping? Teresa On Fri, May 11, 2012 at 6:11 AM, Teresa Johnson tejohn...@google.com wrote: Ping? Teresa On Fri, May 4, 2012 at 3:41 PM, Teresa Johnson tejohn...@google.com wrote: On David's suggestion, I have removed the changes that rename niter_desc to loop_desc from this patch to focus the patch on the unrolling changes. I can submit a cleanup patch to do the renaming as soon as this one goes in. Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk? Thanks, Teresa Here is the new description of improvements from the original patch: Improved patch based on feedback. Main changes are: 1) Improve efficiency by caching loop analysis results in the loop auxiliary info structure hanging off the loop structure. Added a new routine, analyze_loop_insns, to fill in information about the average and total number of branches, as well as whether there are any floating point set and call instructions in the loop. The new routine is invoked when we first create a loop's niter_desc struct, and the caller (get_simple_loop_desc) has been modified to handle creating a niter_desc for the fake outermost loop. 2) Improvements to max_unroll_with_branches: - Treat the fake outermost loop (the procedure body) as we would a hot outer loop, i.e. compute the max unroll looking at its nested branches, instead of shutting off unrolling when we reach the fake outermost loop. - Pull the checks previously done in the caller into the routine (e.g. whether the loop iterates frequently or contains fp instructions). - Fix a bug in the previous version that sometimes caused overflow in the new unroll factor. 3) Remove float variables, and use integer computation to compute the average number of branches in the loop. 4) Detect more types of floating point computations in the loop by walking all set instructions, not just single sets. 
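To make points 2 and 3 above a little more concrete, here is a small integer-only sketch of averaging a branch count and capping an unroll factor by a branch budget. It is not the logic of max_unroll_with_branches; both function names and the exact arithmetic are assumptions made for illustration.

```c
/* Average number of branches per iteration, rounded to nearest,
   using only integer arithmetic.  With no iteration count the total
   is returned unchanged (an arbitrary choice for this sketch).  */
unsigned
average_branches (unsigned total_branches, unsigned iterations)
{
  if (iterations == 0)
    return total_branches;
  return (total_branches + iterations / 2) / iterations;
}

/* Limit NUNROLL so that NUNROLL copies of a body containing
   BRANCHES_PER_ITER branches stay within BRANCH_BUDGET.  A body
   without branches is never limited.  */
unsigned
cap_unroll (unsigned nunroll, unsigned branches_per_iter,
            unsigned branch_budget)
{
  unsigned cap;

  if (branches_per_iter == 0)
    return nunroll;
  cap = branch_budget / branches_per_iter;
  return nunroll < cap ? nunroll : cap;
}
```

Dividing the budget rather than multiplying the unroll factor keeps the computation free of overflow for any inputs, which is the sort of issue the bug fix in point 2 refers to.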
2012-05-04  Teresa Johnson  tejohn...@google.com

	* doc/invoke.texi: Update the documentation with new params.
	* loop-unroll.c (max_unroll_with_branches): New function.
	(decide_unroll_constant_iterations, decide_unroll_runtime_iterations):
	Add heuristic to avoid increasing branch mispredicts when unrolling.
	(decide_peel_simple, decide_unroll_stupid): Retrieve number of
	branches from niter_desc instead of via function that walks loop.
	* loop-iv.c (get_simple_loop_desc): Invoke new analyze_loop_insns
	function, and add guards to enable this function to work for the
	outermost loop.
	* cfgloop.c (insn_has_fp_set, analyze_loop_insns): New functions.
	(num_loop_branches): Remove.
	* cfgloop.h (struct loop_desc): Added new fields to cache additional
	loop analysis information.
	(num_loop_branches): Remove.
	(analyze_loop_insns): Declare.
	* params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param.
	(PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto.

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 187013)
+++ doc/invoke.texi (working copy)
@@ -8842,6 +8842,12 @@ The maximum number of insns of an unswitched loop.
 @item max-unswitch-level
 The maximum number of branches unswitched in a single loop.

+@item min-iter-unroll-with-branches
+Minimum iteration count to ignore branch effects when unrolling.
+
+@item unroll-outer-loop-branch-budget
+Maximum number of branches allowed in hot outer loop region after unroll.
+
 @item lim-expensive
 The minimum cost of an expensive expression in the loop invariant motion.
Index: loop-unroll.c
===================================================================
--- loop-unroll.c (revision 187013)
+++ loop-unroll.c (working copy)
@@ -152,6 +152,99 @@ static void combine_var_copies_in_loop_exit (struc
                                              basic_block);
 static rtx get_expansion (struct var_to_expand *);

+/* Compute the maximum number of times LOOP can be unrolled without exceeding
+   a branch budget, which can increase branch mispredictions.
+   The number of
+   branches is computed by weighting each branch with its expected execution
+   probability through the loop based on profile data.  If no profile feedback
+   data exists, simply return the current NUNROLL factor.  */
+
+static unsigned
+max_unroll_with_branches(struct loop *loop, unsigned nunroll)
+{
+  struct loop *outer;
+  struct niter_desc *outer_desc = 0;
+  int outer_niters = 1;
+  int frequent_iteration_threshold;
+  unsigned branch_budget;
+  struct niter_desc *desc = get_simple_loop_desc (loop);
+
+  /* Ignore loops with FP computation as these tend to benefit much more
+     consistently from unrolling.  */
+  if (desc->has_fp)
+    return nunroll;
+
+  frequent_iteration_threshold = PARAM_VALUE
Re: [PING ARM Patches] PR53447: optimizations of 64bit ALU operation with constant
On 18 June 2012 22:17, Carrot Wei car...@google.com wrote: Hi Could ARM maintainers review following patches? http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00497.html 64bit add/sub constants. http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01834.html 64bit and with constants. http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01974.html 64bit xor with constants. http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00287.html 64bit ior with constants. Hi Carrot. Out of interest, how do these interact with the 64 bit in NEON patches that Andrew has been doing? They seem to touch many of the same patterns and I'm concerned that they'd cause GCC to prefer core registers instead of NEON, especially as the constant values you can use in a vmov are limited. There's a (in progress) summary of the current state for the standard C operators here: https://wiki.linaro.org/MichaelHope/Sandbox/64BitOperations -- Michael
RE: [PATCH, ARM] New CPU support for Marvell PJ4 cores
marvell-pj4 has been added to BE8_LINK_SPEC. The modified patch is attached.

Thanks!

B.R.
Yi-Hsiu, Hsu

-----Original Message-----
From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org]
Sent: Thursday, June 14, 2012 2:19 AM
To: Yi-Hsiu Hsu
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, ARM] New CPU support for Marvell PJ4 cores

On 29 May 2012 10:07, Yi-Hsiu Hsu a...@marvell.com wrote:

Hi,

This patch adds a pipeline description for Marvell PJ4 cores. I ran the ARM testsuite on arm-linux-gnueabi and found no extra regressions.

	* config/arm/marvell-pj4.md: New marvell-pj4 pipeline description.
	* config/arm/arm.c (arm_issue_rate): Add marvell_pj4.
	* config/arm/arm-cores.def: Add core marvell-pj4.
	* config/arm/arm-tune.md: Regenerated.
	* config/arm/arm-tables.opt: Regenerated.
	* doc/invoke.texi: Added entry for marvell-pj4.

This command line option should also be added to BE8_LINK_SPEC, similar to what's done for the other v7-a cores. Ok with that change.

regards,
Ramana

Thanks!

P.S. I created the patch against revision 187308, but that revision fails to build, so I applied the patch to revision 187623, which builds successfully and passes the testsuite.

Attachment: marvell-pj4-core.patch
[PR debug/53682] avoid crash in cselib promote_debug_loc
When promote_debug_loc was first introduced, it would never be called with a NULL loc list. However, because of the strategy of temporarily resetting loc lists before recursion, introduced a few months ago in alias.c, the earlier assumption no longer holds. This patch adjusts promote_debug_loc to deal with this case.

Ok to install?

for gcc/ChangeLog
from  Alexandre Oliva  aol...@redhat.com

	PR debug/53682
	* cselib.c (promote_debug_loc): Don't crash on NULL argument.

Index: gcc/cselib.c
===================================================================
--- gcc/cselib.c.orig 2012-06-17 22:52:27.740087279 -0300
+++ gcc/cselib.c 2012-06-18 08:55:32.948832112 -0300
@@ -322,7 +322,7 @@ new_elt_loc_list (cselib_val *val, rtx l
 static inline void
 promote_debug_loc (struct elt_loc_list *l)
 {
-  if (l->setting_insn && DEBUG_INSN_P (l->setting_insn)
+  if (l && l->setting_insn && DEBUG_INSN_P (l->setting_insn)
       && (!cselib_current_insn || !DEBUG_INSN_P (cselib_current_insn)))
     {
       n_debug_values--;

--
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
Re: [PR49888, VTA] don't keep VALUEs bound to modified MEMs
On Jun 16, 2012, H.J. Lu hjl.to...@gmail.com wrote: If I understand it correctly, the new approach fails to handle push properly. It's actually cselib that didn't deal with push properly, so it thinks incoming stack arguments may be clobbered by them. But that's not the whole story, unfortunately. I still don't have a complete fix for the problem, but I have some patches that restore nearly all of the passes. The first one extends RTX alias analysis so that cselib can recognize that (mem:SI ARGP) and (mem:SI (plus (and (plus ARGP #-4) #-32) #-4)) don't alias. Before the patch, we'd go for infinite sized objects upon AND. The second introduces an entry-point equivalence between ARGP and SP, so that SP references in push and stack-align sequences can be canonicalized to ARGP-based. The third introduces address canonicalization that uses information in the dataflow variable set in addition to the static cselib table. This is the one I'm still working on, because some expressions still fail to canonicalize to ARGP although they could. The fourth removes a now-redundant equivalence from the dynamic table; the required information is always preserved in the static table. I've regstrapped (and checked results! :-) all of these on x86_64-linux-gnu and i686-linux-gnu. It fixes all visible regressions in x86_64-linux-gnu, and nearly all on i686-linux-gnu. May I check these in and keep on working to complete the fix, or should I revert the original patch and come back only with a patchset that fixes all debug info regressions? for gcc/ChangeLog from Alexandre Oliva aol...@redhat.com PR debug/53671 PR debug/49888 * alias.c (memrefs_conflict_p): Improve handling of AND for alignment. Index: gcc/alias.c === --- gcc/alias.c.orig 2012-06-17 22:52:27.551102225 -0300 +++ gcc/alias.c 2012-06-17 22:59:00.674994588 -0300 @@ -2103,17 +2103,31 @@ memrefs_conflict_p (int xsize, rtx x, in at least as large as the alignment, assume no other overlap. 
     */
  if (GET_CODE (x) == AND && CONST_INT_P (XEXP (x, 1)))
    {
-      if (GET_CODE (y) == AND || ysize < -INTVAL (XEXP (x, 1)))
+      HOST_WIDE_INT sc = INTVAL (XEXP (x, 1));
+      unsigned HOST_WIDE_INT uc = sc;
+      if (xsize > 0 && sc < 0 && -uc == (uc & -uc))
+        {
+          xsize -= sc + 1;
+          c -= sc;
+        }
+      else if (GET_CODE (y) == AND || ysize < -INTVAL (XEXP (x, 1)))
         xsize = -1;
       return memrefs_conflict_p (xsize, canon_rtx (XEXP (x, 0)), ysize, y, c);
     }
   if (GET_CODE (y) == AND && CONST_INT_P (XEXP (y, 1)))
     {
+      HOST_WIDE_INT sc = INTVAL (XEXP (y, 1));
+      unsigned HOST_WIDE_INT uc = sc;
+      if (ysize > 0 && sc < 0 && -uc == (uc & -uc))
+        {
+          ysize -= sc + 1;
+          c += sc;
+        }
       /* ??? If we are indexing far enough into the array/structure, we
          may yet be able to determine that we can not overlap.  But we
          also need to that we are far enough from the end not to overlap
          a following reference, so we do nothing with that for now.  */
-      if (GET_CODE (x) == AND || xsize < -INTVAL (XEXP (y, 1)))
+      else if (GET_CODE (x) == AND || xsize < -INTVAL (XEXP (y, 1)))
         ysize = -1;
       return memrefs_conflict_p (xsize, x, ysize, canon_rtx (XEXP (y, 0)), c);
     }

for gcc/ChangeLog
from  Alexandre Oliva  aol...@redhat.com

	PR debug/53671
	PR debug/49888
	* var-tracking.c (vt_initialize): Record initial offset between
	arg pointer and stack pointer.
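A note on the alias.c hunk before moving on to var-tracking.c: as I read it, the new condition recognizes AND masks that only clear low-order bits, such as #-32, by checking that the negated mask is a power of two via the test `-uc == (uc & -uc)`. The standalone sketch below isolates that check. The function name is hypothetical and `long long` stands in for HOST_WIDE_INT.

```c
#include <stdbool.h>

/* Return true if SC is an alignment mask, i.e. a negative constant
   whose negation is a power of two (-1, -2, -4, ..., -32, ...).
   UC & -UC isolates the lowest set bit of the mask; it equals -UC
   exactly when -UC has a single bit set.  */
bool
is_alignment_mask (long long sc)
{
  unsigned long long uc = (unsigned long long) sc;

  return sc < 0 && -uc == (uc & -uc);
}
```

For the mask -32, negation gives 32 and the lowest set bit of the mask is also 32, so the test passes; for -3 the negation is 3 but the lowest set bit is 1, so it fails.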
Index: gcc/var-tracking.c === --- gcc/var-tracking.c.orig 2012-06-17 23:00:45.793675979 -0300 +++ gcc/var-tracking.c 2012-06-17 23:01:02.525351931 -0300 @@ -9507,6 +9507,41 @@ vt_initialize (void) valvar_pool = NULL; } + if (MAY_HAVE_DEBUG_INSNS) +{ + rtx reg, expr; + int ofst; + cselib_val *val; + +#ifdef FRAME_POINTER_CFA_OFFSET + reg = frame_pointer_rtx; + ofst = FRAME_POINTER_CFA_OFFSET (current_function_decl); +#else + reg = arg_pointer_rtx; + ofst = ARG_POINTER_CFA_OFFSET (current_function_decl); +#endif + + ofst -= INCOMING_FRAME_SP_OFFSET; + + val = cselib_lookup_from_insn (reg, GET_MODE (reg), 1, + VOIDmode, get_insns ()); + preserve_value (val); + cselib_preserve_cfa_base_value (val, REGNO (reg)); + expr = plus_constant (GET_MODE (stack_pointer_rtx), + stack_pointer_rtx, -ofst); + cselib_add_permanent_equiv (val, expr, get_insns ()); + + if (ofst) + { + val = cselib_lookup_from_insn (stack_pointer_rtx, + GET_MODE (stack_pointer_rtx), 1, + VOIDmode, get_insns ()); + preserve_value (val); + expr = plus_constant (GET_MODE (reg), reg, ofst); + cselib_add_permanent_equiv (val, expr, get_insns ()); + } +} + /* In order to factor out the adjustments made to the stack pointer or to the hard frame pointer and thus be able to use DW_OP_fbreg operations instead of individual location lists, we're going to