[PR50672, PATCH] Fix ice triggered by -ftree-tail-merge: verify_ssa failed: no immediate_use list
Richard,

I have a patch for PR50672.

When compiling the testcase from the PR with -ftree-tail-merge, the scenario is as follows. We start out tail_merge_optimize with blocks 14 and 20, which are alike, but not equal, since they have different successors:
...
# BLOCK 14 freq:690
# PRED: 25 [61.0%]  (false,exec)
if (wD.2197_57(D) != 0B)
  goto <bb 15>;
else
  goto <bb 16>;
# SUCC: 15 [78.4%]  (true,exec) 16 [21.6%]  (false,exec)

# BLOCK 20 freq:2900
# PRED: 29 [100.0%]  (fallthru) 31 [100.0%]  (fallthru)
# .MEMD.2447_209 = PHI <.MEMD.2447_125(29), .MEMD.2447_129(31)>
if (wD.2197_57(D) != 0B)
  goto <bb 5>;
else
  goto <bb 6>;
# SUCC: 5 [85.0%]  (true,exec) 6 [15.0%]  (false,exec)
...

In the first iteration, we merge block 5 with block 15 and block 6 with block 16. After that, blocks 14 and 20 are equal. In the second iteration, blocks 14 and 20 are merged, by redirecting the incoming edges of block 20 to block 14, and removing block 20. Block 20 also contains the definition of .MEMD.2447_209. Removing the definition delinks the vuse of .MEMD.2447_209 in block 5:
...
# BLOCK 5 freq:6036
# PRED: 20 [85.0%]  (true,exec)
# PT = nonlocal escaped
D.2306_58 = thisD.2200_10(D)->D.2156;
# .MEMD.2447_132 = VDEF <.MEMD.2447_209>
# USE = anything
# CLB = anything
drawLineD.2135 (D.2306_58, wD.2197_57(D), gcD.2198_59(D));
goto <bb 17>;
# SUCC: 17 [100.0%]  (fallthru,exec)
...

After the pass, when executing TODO_update_ssa_only_virtuals, we update the drawLine call in block 5 using rewrite_update_stmt, which calls maybe_replace_use for the vuse operand. However, maybe_replace_use has no effect, since the old vuse and the new vuse happen to be the same (rdef == use), so SET_USE is not called and the vuse remains delinked:
...
      if (rdef && rdef != use)
	SET_USE (use_p, rdef);
...

The patch fixes this by forcing SET_USE for delinked uses.

Bootstrapped and reg-tested on x86_64. OK for trunk?
Thanks,
- Tom

2011-10-12  Tom de Vries  t...@codesourcery.com

	PR tree-optimization/50672
	* tree-into-ssa.c (maybe_replace_use): Force SET_USE for delinked
	uses.

Index: gcc/tree-into-ssa.c
===
--- gcc/tree-into-ssa.c	(revision 179592)
+++ gcc/tree-into-ssa.c	(working copy)
@@ -1908,7 +1908,7 @@ maybe_replace_use (use_operand_p use_p)
   else if (is_old_name (use))
     rdef = get_reaching_def (use);
 
-  if (rdef && rdef != use)
+  if (rdef && (rdef != use || (!use_p->next && !use_p->prev)))
     SET_USE (use_p, rdef);
 }
Re: int_cst_hash_table mapping persistence and the garbage collector
On 10/11/11 11:05:18, Eric Botcazou wrote:

    One easy way to address the current issue is to call tree_int_cst_equal() if the integer constant tree pointers do not match:

      if ((c1 != c2) && !tree_int_cst_equal (c1, c2))
        /* integer constants aren't equal.  */

  You have two objects C1 and C2 for the same constant and you're comparing them. One was created first, say C1. If C1 was still live when C2 was created, why was C2 created in the first place? If C1 wasn't live anymore when C2 was created, why are you still using C1 here?

Eric, this note provides some more detail in addition to my earlier reply to Richard. The problem is that the references to objects C1 and C2 live in a hash table, and that although the referenced nodes will be retained by the garbage collector, their mapping in int_cst_hash_table is deleted by the GC. Thus, we follow the diagram:

  tree (type) -> [ upc_block_factor_for_type ] -> tree (integer constant)
  tree (integer constant) -> [ int_cst_hash_table ] {unique map} -> tree (integer constant)

Given two tree nodes, P (prototype) and F (function): they declare a parameter that is a pointer to a UPC shared object, and this pointer is declared with a UPC blocking factor of 1000. Without garbage collection, the mappings look like this:

  P.param -> C1, F.param -> C1

where C1 is an integer constant of the form (sizetype, 1000). But when GC kicks in, it decides that the hash table entry in int_cst_hash_table can be deleted, because it doesn't think that C1 is live. Therefore the next attempt to map (sizetype, 1000) will yield a new integer constant tree node, C2. Then the mapping changes to:

  P.param -> C1, F.param -> C2

and we can no longer use TYPE_UPC_BLOCKING_FACTOR (P.param) == TYPE_UPC_BLOCKING_FACTOR (F.param) to check that the blocking factors of P.param and F.param are equal.
For the GC to know that the int_cst_hash_table entry is needed, perhaps upc_block_factor_for_type needs to be traversed and each mapped integer constant marked, or the constant has to be re-hashed into int_cst_hash_table and the actual hash table entry marked. I am not familiar with the details of garbage collection and pretty much just try to use existing code as a model. Apparently, this sequence of statements is insufficient to tell the GC that it should mark the integer constants referenced in this hash table as in use:

  static GTY ((if_marked (tree_map_marked_p),
               param_is (struct tree_map)))
    htab_t upc_block_factor_for_type;
  [...]
  upc_block_factor_for_type
    = htab_create_ggc (512, tree_map_hash, tree_map_eq, 0);

Reading the GC code:

  static int
  ggc_htab_delete (void **slot, void *info)
  {
    const struct ggc_cache_tab *r = (const struct ggc_cache_tab *) info;

    if (! (*r->marked_p) (*slot))
      htab_clear_slot (*r->base, slot);
    else
      (*r->cb) (*slot);

    return 1;
  }

it appears that the int_cst_hash_table entry for C1 needs to be marked or it will be cleared. I don't know how to set things up so that the garbage collecting mechanisms are in place to do that, and was hoping that the tree_map hash table would provide the required mechanisms. Apparently, this is not the case. I had hoped that this declaration would be sufficient to convince the GC to consider all mapped integer constant nodes to be live. If not, then perhaps I need a GC hook associated with upc_block_factor_for_type that does something like the following:

  for t in (each used upc_block_factor_for_type entry):
    c = t.to        # the mapped integer constant
    if is_integer_constant (c):
      h = int_cst_hash_table.hash (c)
      gc_mark_htab (int_cst_hash_table, h)

or perhaps this is sufficient?

  for t in (each used upc_block_factor_for_type entry):
    c = t.to
    gc_mark_tree_node (c)

However, I thought that this would already have been done automatically by the GC hash tree implementation.
If either of those methods are required, I would appreciate suggestions/pointers/code that would help me make sure that this approach is implemented correctly. thanks, - Gary
[PATCH] Add capability to run several iterations of early optimizations
The following patch adds a new knob to make GCC perform several iterations of early optimizations and inlining. This is for dont-care-about-compile-time-optimize-all-you-can scenarios.

Performing several iterations of optimizations does significantly improve code speed on a certain proprietary source base. Some hand-tuning of the parameter value is required to get optimum performance. Another good use for this option is for search and ad-hoc analysis of cases where GCC misses optimization opportunities. With the default setting of '1', nothing is changed from the current status quo.

The patch was bootstrapped and regtested with 3 iterations set by default on i686-linux-gnu. The only failures in the regression testsuite were due to latent bugs in handling of EH information, which are being discussed in a different thread.

Performance impact on the standard benchmarks is not conclusive: there are improvements in SPEC2000 of up to 4% and regressions down to -2%, see [*]. SPEC2006 benchmarks will take another day or two to complete, and I will update the spreadsheet then. The benchmarks were run on a Core2 system for all combinations of {-m32/-m64}{-O2/-O3}. The effect on compilation time is fairly predictable: about a 10% compile-time increase with 3 iterations.

OK for trunk?

[*] https://docs.google.com/spreadsheet/ccc?key=0AvK0Y-Pgj7bNdFBQMEJ6d3laeFdvdk9lQ1p0LUFkVFEhl=en_US

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics

fsf-gcc-iter-eipa.ChangeLog
Description: Binary data

fsf-gcc-iter-eipa.patch
Description: Binary data
[patch i386]: Unbreak bootstrap for x64 SEH enabled target
Hello,

recent changes caused gcc to begin moving code into the prologue region. For x64 SEH this is an issue, as the unwind table information for the prologue is limited to 255 bytes of code. So we need to avoid moving additional code into the prologue. To achieve this we mark all standard and xmm registers as prologue-used at the end of the prologue. Also we need to emit a memory blockage.

ChangeLog

2011-10-12  Kai Tietz  kti...@redhat.com

	* config/i386/i386.c (ix86_expand_prologue): Mark for TARGET_SEH
	all sse/integer registers as prologue-used.

Tested for x86_64-w64-mingw32. Ok for apply?

Regards,
Kai

Index: i386.c
===
--- i386.c	(revision 179824)
+++ i386.c	(working copy)
@@ -10356,7 +10356,24 @@
      Further, prevent alloca modifications to the stack pointer from being
      combined with prologue modifications.  */
   if (TARGET_SEH)
-    emit_insn (gen_prologue_use (stack_pointer_rtx));
+    {
+      int i;
+
+      /* Due to the limited prologue-code size of 255 bytes, we need
+	 to prevent the scheduler from sinking instructions into the
+	 prologue code.  Therefore we mark all standard, sse, fpu, and
+	 the pc registers as prologue-used to prevent this.  Also a
+	 memory blockage is necessary.  */
+      emit_insn (gen_memory_blockage ());
+
+      for (i = 0; i <= 7; i++)
+	{
+	  emit_insn (gen_prologue_use (gen_rtx_REG (Pmode, AX_REG + i)));
+	  emit_insn (gen_prologue_use (gen_rtx_REG (Pmode, R8_REG + i)));
+	  emit_insn (gen_prologue_use (gen_rtx_REG (TImode, XMM0_REG + i)));
+	  emit_insn (gen_prologue_use (gen_rtx_REG (TImode, XMM8_REG + i)));
+	}
+    }
 }
 
 /* Emit code to restore REG using a POP insn.  */
Re: int_cst_hash_table mapping persistence and the garbage collector
  The problem is that the references to objects C1 and C2 live in a hash table, and that although the referenced nodes will be retained by the garbage collector, their mapping in int_cst_hash_table is deleted by the GC.

This isn't a simple hash table, is it?

  I am not familiar with the details of garbage collection and pretty much just try to use existing code as a model. Apparently, this sequence of statements is insufficient to tell the GC that it should mark the integer constants referenced in this hash table as in use.

    static GTY ((if_marked (tree_map_marked_p),
                 param_is (struct tree_map)))
      htab_t upc_block_factor_for_type;
    [...]
    upc_block_factor_for_type
      = htab_create_ggc (512, tree_map_hash, tree_map_eq, 0);

This is a map, so it is garbage-collected as a map: if the key isn't marked, then the value isn't either. Hence 2 questions:
- why use a map and not a simple hash table?
- what is the key, and why isn't it marked?

--
Eric Botcazou
Re: [PATCH] Mark static const strings as read-only.
On 10/10/2011 05:50 PM, Eric Botcazou wrote: So, the patch for build_constant_desc does not have the desired effect. OK, too bad that we need to play this back-and-forth game with MEMs. So the original patch is OK (with TREE_READONLY (base) on the next line to mimic what is done just above and without the gcc/ prefix in the ChangeLog). If you have some available cycles, you can test and install the build_constant_desc change in the same commit, otherwise I'll do it myself. I'll include the build_constant_desc change in a bootstrap/reg-test on x86_64. Thanks, - Tom
Re: int_cst_hash_table mapping persistence and the garbage collector
  It maps a type node into a corresponding integer node that is the blocking factor associated with the type node. Before the advent of this hash table the blocking factor was stored in a dedicated field in the tree type node. The suggestion was made to move this into a hash table to save space. I chose the tree map hash table because I thought it could do the job.

So this isn't a simple hash table, since this is a map. A simple hash table doesn't store the key in the slot, only the value; a map does.

  The keys are valid. In the example discussed in this thread, there is a pointer to a type node that is used in a parameter declaration of a function prototype and also in the similarly named parameter of the function definition. Both tree pointers are used as keys, and they are live at the point that the GC runs. But somehow they aren't marked by the GC.

You need to find out why, since the value will be kept only if the key is already marked by the GC. By the time a GC pass is run, all trees to be kept must be linked to a GC root. You said that the pass was run between the point that the function prototype tree node was created and the point at which the function declaration was processed. To which GC root are the keys linked between these points?

--
Eric Botcazou
Re: [Patch,AVR]: Fix PR49939: Skip 2-word insns
Denis Chertykov schrieb:

  2011/10/11 Georg-Johann Lay a...@gjlay.de:

    This patch teaches avr-gcc to skip 2-word instructions like STS and LDS. It's just about looking into an insn and checking if it's a 2-word instruction or not. Passes without regression. Ok to install?

  Please commit.

  Denis.

Committed with the following change:

-      && avr_2word_insn_p (next_nonnote_nondebug_insn (insn)))
+      && avr_2word_insn_p (next_active_insn (insn)))

Otherwise a code label would knock out the optimization, as in:

char c;
void foo (char a, char b)
{
  if (a || b)
    c = b;
}

which now compiles to:

foo:
	tst r24
	brne .L4
	cpse r22,__zero_reg__
.L4:
	sts c,r22
.L3:
	ret

Johann
Commit: ARM: Add comments to emitted .eabi_attribute directives
Hi Guys,

I am checking in the patch below to add comments to the .eabi_attribute assembler directives emitted by the ARM backend, when commented assembler output is enabled.

Cheers
Nick

gcc/ChangeLog
2011-10-12  Nick Clifton  ni...@redhat.com

	* config/arm/arm.h (EMIT_EABI_ATTRIBUTE): New macro.  Used to
	emit a .eabi_attribute assembler directive, possibly with a
	comment attached.
	* config/arm/arm.c (arm_file_start): Use the new macro.
	* config/arm/arm-c.c (arm_output_c_attributes): Likewise.

Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c	(revision 179839)
+++ gcc/config/arm/arm.c	(working copy)
@@ -22291,9 +22291,9 @@
       if (arm_fpu_desc->model == ARM_FP_MODEL_VFP)
	{
	  if (TARGET_HARD_FLOAT)
-	    asm_fprintf (asm_out_file, "\t.eabi_attribute 27, 3\n");
+	    EMIT_EABI_ATTRIBUTE (Tag_ABI_HardFP_use, 27, 3);
	  if (TARGET_HARD_FLOAT_ABI)
-	    asm_fprintf (asm_out_file, "\t.eabi_attribute 28, 1\n");
+	    EMIT_EABI_ATTRIBUTE (Tag_ABI_VFP_args, 28, 1);
	}
     }
   asm_fprintf (asm_out_file, "\t.fpu %s\n", fpu_name);
@@ -22302,31 +22302,24 @@
      are used.  However we don't have any easy way of figuring this out.
      Conservatively record the setting that would have been used.  */

-  /* Tag_ABI_FP_rounding.  */
   if (flag_rounding_math)
-    asm_fprintf (asm_out_file, "\t.eabi_attribute 19, 1\n");
+    EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_rounding, 19, 1);
+
   if (!flag_unsafe_math_optimizations)
     {
-      /* Tag_ABI_FP_denomal.  */
-      asm_fprintf (asm_out_file, "\t.eabi_attribute 20, 1\n");
-      /* Tag_ABI_FP_exceptions.  */
-      asm_fprintf (asm_out_file, "\t.eabi_attribute 21, 1\n");
+      EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_denormal, 20, 1);
+      EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_exceptions, 21, 1);
     }
-  /* Tag_ABI_FP_user_exceptions.  */
   if (flag_signaling_nans)
-    asm_fprintf (asm_out_file, "\t.eabi_attribute 22, 1\n");
-  /* Tag_ABI_FP_number_model.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 23, %d\n",
-	       flag_finite_math_only ? 1 : 3);
+    EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_user_exceptions, 22, 1);
 
-  /* Tag_ABI_align8_needed.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 24, 1\n");
-  /* Tag_ABI_align8_preserved.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 25, 1\n");
-  /* Tag_ABI_enum_size.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 26, %d\n",
-	       flag_short_enums ? 1 : 2);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_number_model, 23,
+		       flag_finite_math_only ? 1 : 3);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_align8_needed, 24, 1);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_align8_preserved, 25, 1);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_enum_size, 26, flag_short_enums ? 1 : 2);
+
   /* Tag_ABI_optimization_goals.  */
   if (optimize_size)
     val = 4;
@@ -22336,21 +22329,18 @@
     val = 1;
   else
     val = 6;
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 30, %d\n", val);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_optimization_goals, 30, val);
 
-  /* Tag_CPU_unaligned_access.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 34, %d\n",
-	       unaligned_access);
+  EMIT_EABI_ATTRIBUTE (Tag_CPU_unaligned_access, 34, unaligned_access);
 
-  /* Tag_ABI_FP_16bit_format.  */
   if (arm_fp16_format)
-    asm_fprintf (asm_out_file, "\t.eabi_attribute 38, %d\n",
-		 (int) arm_fp16_format);
+    EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_16bit_format, 38, (int) arm_fp16_format);
 
   if (arm_lang_output_object_attributes_hook)
     arm_lang_output_object_attributes_hook();
 
 static void
Index: gcc/config/arm/arm.h
===
--- gcc/config/arm/arm.h	(revision 179839)
+++ gcc/config/arm/arm.h	(working copy)
@@ -2235,4 +2235,19 @@
   "%{mcpu=generic-*:-march=%*; \
     :%{mcpu=*:-mcpu=%*} %{march=*:-march=%*}}"
 
+/* This macro is used to emit an EABI tag and its associated value.
+   We emit the numerical value of the tag in case the assembler does not
+   support textual tags.  (Eg gas prior to 2.20).  If requested we include
+   the tag name in a comment so that anyone reading the assembler output
+   will know which tag is being set.  */
+#define EMIT_EABI_ATTRIBUTE(NAME,NUM,VAL) \
+  do \
+    { \
+      asm_fprintf (asm_out_file, "\t.eabi_attribute %d, %d", NUM, VAL); \
+      if (flag_verbose_asm || flag_debug_asm) \
+	asm_fprintf
[C++ Patch] PR 50594 (C++ front-end bits)
Hi,

thus, per the discussion in the audit trail, I'm proceeding with decorating with __attribute__((externally_visible)) both the 8 operator new and operator delete in <new>, and the 4 pre-declared by the C++ front-end. The below is what I regression tested successfully, together with the library bits, on x86_64-linux. I'm also attaching, for convenience, the library work (I took the occasion to adjust noexcept vs throw(), etc, otherwise the patch would be tiny).

What do you think?

Thanks,
Paolo.

2011-10-12  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/50594
	* decl.c (cxx_init_decl_processing): Add
	__attribute__((externally_visible)) to operator new and
	operator delete library fn.

Index: decl.c
===
--- decl.c	(revision 179842)
+++ decl.c	(working copy)
@@ -3654,7 +3654,7 @@ cxx_init_decl_processing (void)
   current_lang_name = lang_name_cplusplus;
 
   {
-    tree newattrs;
+    tree newattrs, delattrs;
     tree newtype, deltype;
     tree ptr_ftype_sizetype;
     tree new_eh_spec;
@@ -3687,9 +3687,16 @@ cxx_init_decl_processing (void)
     newattrs = build_tree_list (get_identifier ("alloc_size"),
				build_tree_list (NULL_TREE, integer_one_node));
+    newattrs
+      = chainon (newattrs, build_tree_list
+		 (get_identifier ("externally_visible"), NULL_TREE));
     newtype = cp_build_type_attribute_variant (ptr_ftype_sizetype, newattrs);
     newtype = build_exception_variant (newtype, new_eh_spec);
-    deltype = build_exception_variant (void_ftype_ptr, empty_except_spec);
+    delattrs
+      = build_tree_list (get_identifier ("externally_visible"),
+			 build_tree_list (NULL_TREE, integer_one_node));
+    deltype = cp_build_type_attribute_variant (void_ftype_ptr, delattrs);
+    deltype = build_exception_variant (deltype, empty_except_spec);
     push_cp_library_fn (NEW_EXPR, newtype);
     push_cp_library_fn (VEC_NEW_EXPR, newtype);
     global_delete_fndecl = push_cp_library_fn (DELETE_EXPR, deltype);

Index: include/bits/c++config
===
--- include/bits/c++config	(revision 179842)
+++ include/bits/c++config	(working copy)
@@ -103,9 +103,11 @@
 # ifdef __GXX_EXPERIMENTAL_CXX0X__
 #  define _GLIBCXX_NOEXCEPT noexcept
 #  define _GLIBCXX_USE_NOEXCEPT noexcept
+#  define _GLIBCXX_THROW(_EXC)
 # else
 #  define _GLIBCXX_NOEXCEPT
 #  define _GLIBCXX_USE_NOEXCEPT throw()
+#  define _GLIBCXX_THROW(_EXC) throw(_EXC)
 # endif
 #endif

Index: libsupc++/del_op.cc
===
--- libsupc++/del_op.cc	(revision 179842)
+++ libsupc++/del_op.cc	(working copy)
@@ -1,6 +1,7 @@
 // Boilerplate support routines for -*- C++ -*- dynamic memory management.
 
-// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2007, 2009 Free Software Foundation
+// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2007, 2009, 2010, 2011
+// Free Software Foundation
 //
 // This file is part of GCC.
 //
@@ -41,7 +42,7 @@
 #include <new>
 
 _GLIBCXX_WEAK_DEFINITION void
-operator delete(void* ptr) throw ()
+operator delete(void* ptr) _GLIBCXX_USE_NOEXCEPT
 {
   if (ptr)
     std::free(ptr);

Index: libsupc++/new_opv.cc
===
--- libsupc++/new_opv.cc	(revision 179842)
+++ libsupc++/new_opv.cc	(working copy)
@@ -1,6 +1,7 @@
 // Boilerplate support routines for -*- C++ -*- dynamic memory management.
 
-// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009 Free Software Foundation
+// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009, 2010, 2011
+// Free Software Foundation
 //
 // This file is part of GCC.
 //
@@ -27,7 +28,7 @@
 #include <new>
 
 _GLIBCXX_WEAK_DEFINITION void*
-operator new[] (std::size_t sz) throw (std::bad_alloc)
+operator new[] (std::size_t sz) _GLIBCXX_THROW (std::bad_alloc)
 {
   return ::operator new(sz);
 }

Index: libsupc++/new_op.cc
===
--- libsupc++/new_op.cc	(revision 179842)
+++ libsupc++/new_op.cc	(working copy)
@@ -42,7 +42,7 @@
 extern new_handler __new_handler;
 
 _GLIBCXX_WEAK_DEFINITION void *
-operator new (std::size_t sz) throw (std::bad_alloc)
+operator new (std::size_t sz) _GLIBCXX_THROW (std::bad_alloc)
 {
   void *p;

Index: libsupc++/del_opv.cc
===
--- libsupc++/del_opv.cc	(revision 179842)
+++ libsupc++/del_opv.cc	(working copy)
@@ -1,6 +1,7 @@
 // Boilerplate support routines for -*- C++ -*- dynamic memory management.
 
-// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009 Free Software Foundation
+// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009, 2010, 2011
+// Free Software Foundation
 //
 // This file is part of GCC.
 //
@@ -27,7 +28,7 @@
 #include <new>
 
 _GLIBCXX_WEAK_DEFINITION void
Re: Fix PR 50565 (offsetof-type expressions in static initializers)
On Tue, Oct 11, 2011 at 5:32 PM, Joseph S. Myers jos...@codesourcery.com wrote: This patch fixes PR 50565, a failure to accept certain offsetof-type expressions in static initializers introduced by my constant expressions changes. (These expressions are permitted but not required by ISO C to be accepted; the intent of my constant expressions model is that they should be valid in GNU C.) The problem comes down to an expression with the difference of two pointers being cast to int on a 64-bit system, resulting in convert_to_integer moving the conversions inside the subtraction. (These optimizations at conversion time should really be done later as a part of folding, or even later than that, rather than unconditionally in convert_to_*, but that's another issue.) So when the expression reaches c_fully_fold it is a difference of narrowed pointers being folded, which the compiler cannot optimize as it can a difference of unnarrowed pointers with the same base object. Before the introduction of c_fully_fold the difference would have been folded when built and so the narrowing of operands would never have been applied to it. This patch disables the narrowing in the case of pointer subtraction, as it doesn't seem particularly likely to be useful there and is known to prevent this folding required for these initializers to be accepted. Bootstrapped with no regressions on x86_64-unknown-linux-gnu. OK to commit? Ok. Thanks, Richard. 2011-10-11 Joseph Myers jos...@codesourcery.com PR c/50565 * convert.c (convert_to_integer): Do not narrow operands of pointer subtraction. testsuite: 2011-10-11 Joseph Myers jos...@codesourcery.com PR c/50565 * gcc.c-torture/compile/pr50565-1.c, gcc.c-torture/compile/pr50565-2.c: New tests. 
Index: gcc/testsuite/gcc.c-torture/compile/pr50565-1.c
===
--- gcc/testsuite/gcc.c-torture/compile/pr50565-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/compile/pr50565-1.c	(revision 0)
@@ -0,0 +1,4 @@
+struct s { char p[2]; };
+static struct s v;
+const int o0 = (int) ((void *) &v.p[0] - (void *) &v) + 0U;
+const int o1 = (int) ((void *) &v.p[0] - (void *) &v) + 1U;
Index: gcc/testsuite/gcc.c-torture/compile/pr50565-2.c
===
--- gcc/testsuite/gcc.c-torture/compile/pr50565-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/compile/pr50565-2.c	(revision 0)
@@ -0,0 +1,4 @@
+struct s { char p[2]; };
+static struct s v;
+const int o0 = (int) ((void *) &v.p[0] - (void *) &v) + 0;
+const int o1 = (int) ((void *) &v.p[0] - (void *) &v) + 1;
Index: gcc/convert.c
===
--- gcc/convert.c	(revision 179754)
+++ gcc/convert.c	(working copy)
@@ -745,6 +745,15 @@ convert_to_integer (tree type, tree expr
	    tree arg0 = get_unwidened (TREE_OPERAND (expr, 0), type);
	    tree arg1 = get_unwidened (TREE_OPERAND (expr, 1), type);
 
+	    /* Do not try to narrow operands of pointer subtraction;
+	       that will interfere with other folding.  */
+	    if (ex_form == MINUS_EXPR
+		&& CONVERT_EXPR_P (arg0)
+		&& CONVERT_EXPR_P (arg1)
+		&& POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (arg0, 0)))
+		&& POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0))))
+	      break;
+
	    if (outprec >= BITS_PER_WORD
		|| TRULY_NOOP_TRUNCATION (outprec, inprec)
		|| inprec > TYPE_PRECISION (TREE_TYPE (arg0))

--
Joseph S. Myers
jos...@codesourcery.com
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi,

On Wed, 12 Oct 2011, Kai Tietz wrote:

    And I think it could use some overview of the transformation done like in the initial patch, ala: Transform ((A && B) && C) into (A && (B & C)), and (A && B) into (A & B), for this part:

    +  /* Needed for sequence points to handle trappings, and side-effects.  */
    +  else if (simple_operand_p_2 (arg0))
    +    return fold_build2_loc (loc, ncode, type, arg0, arg1);

  Well, to use here the binary form of the operation seems to me misleading.

Hmm? What do you mean? Both operations are binary. ANDIF is '&&', AND is '&'. In fold-const.c comments we usually use the C notations of the operations.

  It is right that the non-IF AND/OR finally has the same behavior as the binary form in gimple. Nevertheless it isn't the same at AST level. But sure, I can add comments for operations like (A OR/AND-IF B) OR/AND-IF C -> (A OR/AND-IF (B OR/AND C)) and A OR/AND-IF C -> (A OR/AND C)

Too much noise; leave out the || variant, and just say once "Same for ||."

Ciao,
Michael.
Re: [PATCH] Add capability to run several iterations of early optimizations
On Wed, Oct 12, 2011 at 8:50 AM, Maxim Kuvyrkov ma...@codesourcery.com wrote: The following patch adds new knob to make GCC perform several iterations of early optimizations and inlining. This is for dont-care-about-compile-time-optimize-all-you-can scenarios. Performing several iterations of optimizations does significantly improve code speed on a certain proprietary source base. Some hand-tuning of the parameter value is required to get optimum performance. Another good use for this option is for search and ad-hoc analysis of cases where GCC misses optimization opportunities. With the default setting of '1', nothing is changed from the current status quo. The patch was bootstrapped and regtested with 3 iterations set by default on i686-linux-gnu. The only failures in regression testsuite were due to latent bugs in handling of EH information, which are being discussed in a different thread. Performance impact on the standard benchmarks is not conclusive, there are improvements in SPEC2000 of up to 4% and regressions down to -2%, see [*]. SPEC2006 benchmarks will take another day or two to complete and I will update the spreadsheet then. The benchmarks were run on a Core2 system for all combinations of {-m32/-m64}{-O2/-O3}. Effect on compilation time is fairly predictable, about 10% compile time increase with 3 iterations. OK for trunk? I don't think this is a good idea, especially in the form you implemented it. If we'd want to iterate early optimizations we'd want to do it by iterating an IPA pass so that we benefit from more precise size estimates when trying to inline a function the second time. Also statically scheduling the passes will mess up dump files and you have no chance of say, noticing that nothing changed for function f and its callees in iteration N and thus you can skip processing them in iteration N + 1. 
So, at least you should split the pass_early_local_passes IPA pass into three; you'd iterate over the 2nd (definitely not over pass_split_functions though), and the third would be pass_profile and pass_split_functions only. And you'd iterate from the place the 2nd IPA pass is executed, not by scheduling them N times. Then you'd have to analyze the compile-time impact of the IPA splitting on its own when not iterating. Then you should look at which optimizations actually were performed that led to the improvement (I can see some indirect inlining happening, but everything else would be a bug in present optimizers in the early pipeline - they are all designed to be roughly independent of each other and _not_ expose new opportunities by iteration). Thus - testcases?

Thanks,
Richard.

  [*] https://docs.google.com/spreadsheet/ccc?key=0AvK0Y-Pgj7bNdFBQMEJ6d3laeFdvdk9lQ1p0LUFkVFEhl=en_US

  Thank you,

  --
  Maxim Kuvyrkov
  CodeSourcery / Mentor Graphics
Re: int_cst_hash_table mapping persistence and the garbage collector
On Wed, Oct 12, 2011 at 10:29 AM, Eric Botcazou ebotca...@adacore.com wrote: It maps a type node into a corresponding integer node that is the blocking factor associated with the type node. Before the advent of this hash table the blocking factor was stored in a dedicated field in the tree type node. The suggestion was made to move this into a hash table to save space. I chose the tree map hash table because I thought it could do the job. So this isn't a simple hash table since this is a map. A simple hash table doesn't store the key in the slot, only the value; a map does. The keys are valid. In the example discussed in this thread, there is a pointer to type node that used in a parameter declaration of a function prototype and also in the similarly named parameter of the function definition. Both tree pointers are used as keys, and they are live at the point that the GC runs. But somehow they aren't marked by the GC. You need to find out why, since the value will be kept only if the key is already marked by the GC. By the time a GC pass is run, all trees to be kept must be linked to a GC root. You said that the pass was run between the point that the function prototype tree node was created and the point at which the function declaration was processed. To which GC root are the keys linked between these points? I think there is an issue when two cache htabs refer to each other with respect to GC, you might search the list to find out more. Richard. -- Eric Botcazou
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/12 Michael Matz m...@suse.de:

  Hi,

  On Wed, 12 Oct 2011, Kai Tietz wrote:

    And I think it could use some overview of the transformation done like in the initial patch, ala: Transform ((A && B) && C) into (A && (B & C)), and (A && B) into (A & B), for this part:

    + /* Needed for sequence points to handle trappings, and side-effects. */
    + else if (simple_operand_p_2 (arg0))
    +   return fold_build2_loc (loc, ncode, type, arg0, arg1);

    Well, to use here the binary form of the operation seems to me misleading.

  Hmm? What do you mean? Both operations are binary. ANDIF is '&&', AND is '&'. In fold-const.c comments we usually use the C notations of the operations.

See, TRUTH_AND_EXPR is in C notation '&' and TRUTH_ANDIF_EXPR is '&&'. The transcription to binary form is done in the gimplifier. Btw, I just noticed that with this patch a latent bug in the gimplifier about boolification for TRUTH_NOT_EXPR/TRUTH_AND/OR... is present. In Fortran there are different boolean-kind types with different precision. This makes them incompatible to each other in gimple (as useless_type_conversion_p returns false for them). The gimplifier needs to ensure that the operands of those TRUTH_... expressions have a type compatible with the final expression type. I will send a patch for this as soon as I have completed regression-testing for it.

  It is right that the non-IF AND/OR finally has the same behavior as the binary form in gimple. Nevertheless it isn't the same at AST level. But sure, I can add comments for operations like (A OR/AND-IF B) OR/AND-IF C -> (A OR/AND-IF (B OR/AND C)) and A OR/AND-IF C -> (A OR/AND C)

  Too much noise; leave out the || variant, and just say once "Same for ||."

  Ciao,
  Michael.

Cheers,
Kai
Re: [PATCH] [Annotalysis] Bugfix where lock function is attached to a base class.
On Tue, Oct 11, 2011 at 13:52, Delesley Hutchins deles...@google.com wrote: This patch fixes an error where Annotalysis generates bogus warnings when using lock and unlock functions that are attached to a base class. The canonicalize routine did not work correctly in this case. Bootstrapped and passed gcc regression testsuite on x86_64-unknown-linux-gnu. Okay for google/gcc-4_6? -DeLesley Changelog.google-4_6: 2011-10-11 DeLesley Hutchins deles...@google.com * tree-threadsafe-analyze.c (get_canonical_lock_expr) testsuite/Changelog.google-4_6: 2011-10-11 DeLesley Hutchins deles...@google.com * g++.dg/thread-ann/thread_annot_lock-83.C ChangeLog entries are missing. OK otherwise. Diego.
Re: [PATCH] RFC: Cache LTO streamer mappings
On Sun, Oct 9, 2011 at 13:11, Andi Kleen a...@firstfloor.org wrote: Is it still a good idea? Given that you found no speedups and it introduces added complexity, I think it's best if we revisit the idea later. I never found bytecode reading to be a bottleneck in LTO, but perhaps Jan can comment what the experience is with Mozilla builds. Diego.
Re: [gimplefe][patch] A bugfix for a missed symbol
On Mon, Oct 10, 2011 at 17:28, Sandeep Soni soni.sande...@gmail.com wrote: 2011-10-11 Sandeep Soni soni.sande...@gmail.com * parser.c (gp_parse_var_decl): Fixed a bug for the missing symbol 'CPP_LESS' in the 'INTEGER_TYPE' declaration. OK. Diego.
Re: [C++ Patch] PR 50594 (C++ front-end bits)
On 10/12/2011 07:26 AM, Paolo Carlini wrote: +delattrs + = build_tree_list (get_identifier ("externally_visible"), +build_tree_list (NULL_TREE, integer_one_node)); Why integer_one_node? Jason
Re: [PR50672, PATCH] Fix ice triggered by -ftree-tail-merge: verify_ssa failed: no immediate_use list
On Wed, Oct 12, 2011 at 8:35 AM, Tom de Vries vr...@codesourcery.com wrote: Richard, I have a patch for PR50672. When compiling the testcase from the PR with -ftree-tail-merge, the scenario is as follows: We start out tail_merge_optimize with blocks 14 and 20, which are alike, but not equal, since they have different successors: ... # BLOCK 14 freq:690 # PRED: 25 [61.0%] (false,exec) if (wD.2197_57(D) != 0B) goto <bb 15>; else goto <bb 16>; # SUCC: 15 [78.4%] (true,exec) 16 [21.6%] (false,exec) # BLOCK 20 freq:2900 # PRED: 29 [100.0%] (fallthru) 31 [100.0%] (fallthru) # .MEMD.2447_209 = PHI <.MEMD.2447_125(29), .MEMD.2447_129(31)> if (wD.2197_57(D) != 0B) goto <bb 5>; else goto <bb 6>; # SUCC: 5 [85.0%] (true,exec) 6 [15.0%] (false,exec) ... In the first iteration, we merge block 5 with block 15 and block 6 with block 16. After that, the blocks 14 and 20 are equal. In the second iteration, the blocks 14 and 20 are merged, by redirecting the incoming edges of block 20 to block 14, and removing block 20. Block 20 also contains the definition of .MEMD.2447_209. Removing the definition delinks the vuse of .MEMD.2447_209 in block 5: ... # BLOCK 5 freq:6036 # PRED: 20 [85.0%] (true,exec) # PT = nonlocal escaped D.2306_58 = thisD.2200_10(D)->D.2156; # .MEMD.2447_132 = VDEF <.MEMD.2447_209> # USE = anything # CLB = anything drawLineD.2135 (D.2306_58, wD.2197_57(D), gcD.2198_59(D)); goto <bb 17>; # SUCC: 17 [100.0%] (fallthru,exec) ... And block 5 is retained and block 15 is discarded? After the pass, when executing the TODO_update_ssa_only_virtuals, we update the drawLine call in block 5 using rewrite_update_stmt, which calls maybe_replace_use for the vuse operand. However, maybe_replace_use doesn't have an effect since the old vuse and the new vuse happen to be the same (rdef == use), so SET_USE is not called and the vuse remains delinked: ... if (rdef && rdef != use) SET_USE (use_p, rdef); ... The patch fixes this by forcing SET_USE for delinked uses. That isn't the correct fix.
Whoever unlinks the vuse (by removing its definition) has to replace it with something valid, which is either the bare symbol .MEM, or the VUSE associated with the removed VDEF (thus, as unlink_stmt_vdef does). Richard. Bootstrapped and reg-tested on x86_64. OK for trunk? Thanks, - Tom 2011-10-12 Tom de Vries t...@codesourcery.com PR tree-optimization/50672 * tree-into-ssa.c (maybe_replace_use): Force SET_USE for delinked uses.
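For readers without the SSA operand internals fresh in mind: every use operand sits on a doubly-linked "immediate uses" list hanging off its definition, and removing the definition delinks the use (its prev/next pointers become NULL) while the use keeps pointing at the stale def. Tom's patch makes maybe_replace_use call SET_USE even when the reaching def is unchanged, so a delinked use gets spliced back in. A toy model of just that condition — the real operand structures and SET_USE are of course richer:

```c
#include <stddef.h>

struct use_node { struct use_node *prev, *next; int *def; };
struct def_node { struct use_node uses; int value; };  /* uses = list head */

/* Model of SET_USE: point the use at D's value and splice it into
   D's immediate-use list.  */
static void set_use(struct def_node *d, struct use_node *u) {
  u->def = &d->value;
  u->next = d->uses.next;
  u->prev = &d->uses;
  if (d->uses.next)
    d->uses.next->prev = u;
  d->uses.next = u;
}

/* Model of removing the definition: the use is delinked but still
   points at the stale def.  */
static void delink(struct use_node *u) {
  if (u->prev) u->prev->next = u->next;
  if (u->next) u->next->prev = u->prev;
  u->prev = u->next = NULL;
}

/* The patched condition: relink when the use is delinked, even if the
   reaching def equals the one already recorded.  */
static void maybe_replace_use(struct def_node *d, struct use_node *u) {
  int *rdef = &d->value;  /* stand-in for get_reaching_def */
  if (rdef && (rdef != u->def || (!u->next && !u->prev)))
    set_use(d, u);
}
```

With the unpatched condition (rdef && rdef != u->def), the delinked use in the scenario above would stay off the list forever, which is exactly the verify_ssa "no immediate_use list" failure — and Richard's point is that whoever removed the definition should have relinked it (as unlink_stmt_vdef does) rather than patching the update machinery.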
Re: [gimplefe][patch] The symbol table for declarations
On Tue, Oct 11, 2011 at 09:42, Tom Tromey tro...@redhat.com wrote: Sandeep == Sandeep Soni soni.sande...@gmail.com writes: Sandeep The following patch is a basic attempt to build a symbol table that Sandeep stores the names of all the declarations made in the input file. I don't know anything about gimplefe, but unless you have complicated needs, it is more usual to just put a symbol's value directly into the identifier node. The C front end is a good example of this. Granted, but a central symbol table simplifies processing like generating gimple output. The gimple FE will want to emit a text file with transformed gimple. Diego.
[PATCH] Fix PR50700
This fixes __builtin_object_size folding on static storage accessed via a type with a trailing array element. Bootstrapped and tested on x86_64-unknown-linux-gnu, will apply after testing on the 4.6 branch. Thanks, Richard. 2011-10-12 Richard Guenther rguent...@suse.de PR tree-optimization/50700 * tree-object-size.c (addr_object_size): Simplify and treat MEM_REF bases consistently. * gcc.dg/builtin-object-size-12.c: New testcase. Index: gcc/tree-object-size.c === *** gcc/tree-object-size.c (revision 179757) --- gcc/tree-object-size.c (working copy) *** addr_object_size (struct object_size_inf *** 166,189 gcc_assert (TREE_CODE (ptr) == ADDR_EXPR); pt_var = TREE_OPERAND (ptr, 0); ! if (REFERENCE_CLASS_P (pt_var)) ! pt_var = get_base_address (pt_var); if (pt_var && TREE_CODE (pt_var) == MEM_REF && TREE_CODE (TREE_OPERAND (pt_var, 0)) == SSA_NAME && POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (pt_var, 0)))) { unsigned HOST_WIDE_INT sz; ! if (!osi || (object_size_type & 1) != 0) { sz = compute_builtin_object_size (TREE_OPERAND (pt_var, 0), object_size_type & ~1); - if (host_integerp (TREE_OPERAND (pt_var, 1), 0)) - sz -= TREE_INT_CST_LOW (TREE_OPERAND (pt_var, 1)); - else - sz = offset_limit; } else { --- 166,184 gcc_assert (TREE_CODE (ptr) == ADDR_EXPR); pt_var = TREE_OPERAND (ptr, 0); ! while (handled_component_p (pt_var)) ! pt_var = TREE_OPERAND (pt_var, 0); if (pt_var && TREE_CODE (pt_var) == MEM_REF) { unsigned HOST_WIDE_INT sz; ! if (!osi || (object_size_type & 1) != 0 ! || TREE_CODE (pt_var) != SSA_NAME) { sz = compute_builtin_object_size (TREE_OPERAND (pt_var, 0), object_size_type & ~1); } else { *** addr_object_size (struct object_size_inf *** 195,204 sz = object_sizes[object_size_type][SSA_NAME_VERSION (var)]; else sz = unknown[object_size_type]; ! if (host_integerp (TREE_OPERAND (pt_var, 1), 0)) ! sz -= TREE_INT_CST_LOW (TREE_OPERAND (pt_var, 1)); else !
sz = offset_limit; } if (sz != unknown[object_size_type] && sz < offset_limit) --- 190,206 sz = object_sizes[object_size_type][SSA_NAME_VERSION (var)]; else sz = unknown[object_size_type]; ! } ! if (sz != unknown[object_size_type]) ! { ! double_int dsz = double_int_sub (uhwi_to_double_int (sz), ! mem_ref_offset (pt_var)); ! if (double_int_negative_p (dsz)) ! sz = 0; ! else if (double_int_fits_in_uhwi_p (dsz)) ! sz = double_int_to_uhwi (dsz); else ! sz = unknown[object_size_type]; } if (sz != unknown[object_size_type] && sz < offset_limit) *** addr_object_size (struct object_size_inf *** 211,217 tree_low_cst (DECL_SIZE_UNIT (pt_var), 1) < offset_limit) pt_var_size = DECL_SIZE_UNIT (pt_var); else if (pt_var && (SSA_VAR_P (pt_var) || TREE_CODE (pt_var) == STRING_CST) && TYPE_SIZE_UNIT (TREE_TYPE (pt_var)) && host_integerp (TYPE_SIZE_UNIT (TREE_TYPE (pt_var)), 1) && (unsigned HOST_WIDE_INT) --- 213,219 tree_low_cst (DECL_SIZE_UNIT (pt_var), 1) < offset_limit) pt_var_size = DECL_SIZE_UNIT (pt_var); else if (pt_var && TREE_CODE (pt_var) == STRING_CST && TYPE_SIZE_UNIT (TREE_TYPE (pt_var)) && host_integerp (TYPE_SIZE_UNIT (TREE_TYPE (pt_var)), 1) && (unsigned HOST_WIDE_INT) Index: gcc/testsuite/gcc.dg/builtin-object-size-12.c === *** gcc/testsuite/gcc.dg/builtin-object-size-12.c (revision 0) --- gcc/testsuite/gcc.dg/builtin-object-size-12.c (revision 0) *** *** 0 --- 1,19 + /* { dg-do run } */ + /* { dg-options "-O2" } */ + + extern void abort (void); + struct S { + int len; + char s[0]; + }; + int main() + { + char buf[sizeof (struct S) + 32]; + if (__builtin_object_size (((struct S *)&buf[0])->s, 1) != 32) + abort (); + if (__builtin_object_size (((struct S *)&buf[1])->s, 1) != 31) + abort (); + if (__builtin_object_size (((struct S *)&buf[64])->s, 0) != 0) + abort (); + return 0; + }
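The arithmetic the fixed code performs for such accesses — the size of the enclosing object minus the offset of the accessed member, clamped at zero — can be checked independently of the compiler. A sketch of the values the new testcase expects, assuming the usual layout where the zero-length trailing array starts at sizeof (struct S):

```c
#include <stddef.h>

struct S { int len; char s[0]; };   /* zero-length trailing array (GCC extension) */

/* Bytes remaining in a BUFSIZE-byte buffer for the member 's' of a
   struct S placed at byte offset OFF, clamped at zero -- the value a
   correct mode-1 __builtin_object_size should fold to here.  */
static size_t remaining(size_t bufsize, size_t off) {
  size_t start = off + offsetof(struct S, s);
  return start >= bufsize ? 0 : bufsize - start;
}
```

With a 36-byte buffer on a typical ABI this yields 32 at offset 0, 31 at offset 1, and 0 once the struct is placed past the end of the buffer — matching the three checks in builtin-object-size-12.c.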
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Wed, 12 Oct 2011, Kai Tietz wrote: Hmm? What do you mean? Both operations are binary. ANDIF is '&&', AND is '&'. In fold-const.c comments we usually use the C notations of the operations. See, TRUTH_AND_EXPR is & in C-notation and TRUTH_ANDIF_EXPR is also &&. Ah right, confusing but there we are. A comment using ANDIF and AND it is then. Ciao, Michael.
[PATCH] Fix PR50189
This changes VRP to use the type of the variable we record an assertion for to look for TYPE_MIN/MAX_VALUEs rather than the limit that it is tested against. That makes sense anyway and happens to mitigate the wrong-code bug for the testcase in PR50189. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Thanks, Richard. 2011-10-12 Paul Koning pkon...@gcc.gnu.org PR tree-optimization/50189 * tree-vrp.c (extract_range_from_assert): Use the type of the variable, not the limit. * g++.dg/torture/pr50189.C: New testcase. Index: gcc/tree-vrp.c === *** gcc/tree-vrp.c (revision 179757) --- gcc/tree-vrp.c (working copy) *** extract_range_from_assert (value_range_t *** 1476,1482 limit = avoid_overflow_infinity (limit); ! type = TREE_TYPE (limit); gcc_assert (limit != var); /* For pointer arithmetic, we only keep track of pointer equality --- 1476,1482 limit = avoid_overflow_infinity (limit); ! type = TREE_TYPE (var); gcc_assert (limit != var); /* For pointer arithmetic, we only keep track of pointer equality Index: gcc/testsuite/g++.dg/torture/pr50189.C === *** gcc/testsuite/g++.dg/torture/pr50189.C (revision 0) --- gcc/testsuite/g++.dg/torture/pr50189.C (revision 0) *** *** 0 --- 1,121 + // { dg-do run } + // { dg-options "-fstrict-enums" } + + extern "C" void abort (void); + class CCUTILS_KeyedScalarLevelPosition + { + public: + + typedef enum + { + UNINITED = 0, + AT_BEGIN = 1, + AT_END = 2, + AT_KEY = 3 + + } position_t; + + bool is_init() const + { return(m_timestamp != UNINITED); } + + bool is_at_begin() const + { return(m_timestamp == AT_BEGIN); } + + position_t get_state() const + { + return((m_timestamp >= AT_KEY) + ?
AT_KEY + : ((position_t)m_timestamp)); + } + + void set_at_begin() + { m_timestamp = AT_BEGIN; } + + unsigned int get_index() const + { return(m_index); } + + void set_pos(unsigned int a_index, unsigned int a_timestmap) + { + m_index = a_index; + m_timestamp = a_timestmap; + } + + bool check_pos(unsigned int a_num_entries, unsigned int a_timestamp) const + { + if (get_state() != AT_KEY) + return(false); + + if (m_timestamp != a_timestamp) + return(false); + + return(m_index < a_num_entries); + } + + void set_not_init() + { m_timestamp = 0; } + + private: + + unsigned int m_timestamp; + unsigned int m_index; + + }; + + class CCUTILS_KeyedScalarPosition + { + public: + + CCUTILS_KeyedScalarLevelPosition m_L1; + CCUTILS_KeyedScalarLevelPosition m_L2; + }; + + class baz + { + public: + int *n[20]; + unsigned int m_cur_array_len; + unsigned int m_timestamp; + + unsigned int _get_timestamp() const + { return(m_timestamp); } + + bool _check_L1_pos(const CCUTILS_KeyedScalarPosition &a_position) const + { + return(a_position.m_L1.check_pos( +m_cur_array_len, _get_timestamp())); + } + + void *next (CCUTILS_KeyedScalarPosition &); + }; + + void * baz::next (CCUTILS_KeyedScalarPosition &a_position) + { + if (a_position.m_L1.is_at_begin() || (!a_position.m_L1.is_init())) + { + a_position.m_L1.set_pos(0, _get_timestamp()); + a_position.m_L2.set_at_begin(); + } + else if (!_check_L1_pos(a_position)) + return(0); + + return n[a_position.m_L1.get_index ()]; + } + + int main (int, char **) + { + baz obj; + CCUTILS_KeyedScalarPosition a_pos; + void *ret; + int n[5]; + + obj.n[0] = n; + obj.m_cur_array_len = 1; + obj.m_timestamp = 42; + + a_pos.m_L1.set_pos (0, 42); + + ret = obj.next (a_pos); + if (ret == 0) + abort (); + return 0; + }
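The essence of the one-line fix: when VRP records a range from an assertion like var <= limit, the bound that caps the range must come from var's own type (which with -fstrict-enums can be as narrow as the enum's value range), not from limit's type. A simplified model with invented names — real VRP works on value_range_t and tree types, not longs:

```c
struct range { long min, max; };

/* Model of extract_range_from_assert for "var <= limit": the range is
   clamped to the min/max of VAR's type (the fix), not of LIMIT's type.  */
static struct range range_for_le(long limit, long var_min, long var_max) {
  struct range r;
  r.min = var_min;
  r.max = limit < var_max ? limit : var_max;
  return r;
}
```

For a position_t variable whose strict-enum range is [0, 3], clamping against the limit's (wider) type instead of [0, 3] would let a stale wide bound leak into later comparisons — the class of wrong-code the testcase exercises.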
Re: [C++ Patch] PR 50594 (C++ front-end bits)
On 10/12/2011 02:18 PM, Jason Merrill wrote: On 10/12/2011 07:26 AM, Paolo Carlini wrote: +delattrs + = build_tree_list (get_identifier ("externally_visible"), + build_tree_list (NULL_TREE, integer_one_node)); Why integer_one_node? To be honest? No idea, I copied what pre-existed for operator new. Shall I test (NULL_TREE, NULL_TREE)?? Paolo.
Re: [C++ Patch] PR 50594 (C++ front-end bits)
On Wed, Oct 12, 2011 at 2:29 PM, Paolo Carlini paolo.carl...@oracle.com wrote: On 10/12/2011 02:18 PM, Jason Merrill wrote: On 10/12/2011 07:26 AM, Paolo Carlini wrote: + delattrs + = build_tree_list (get_identifier ("externally_visible"), + build_tree_list (NULL_TREE, integer_one_node)); Why integer_one_node? To be honest? No idea, I copied what pre-existed for operator new. Shall I test (NULL_TREE, NULL_TREE)?? build_tree_list (get_identifier ("externally_visible"), NULL_TREE) Paolo.
Re: [PATCH] Fix PR50204
Hi, On Tue, 11 Oct 2011, Richard Guenther wrote: Since we have the alias oracle we no longer optimize the testcase below because I initially restricted the stmt walking to give up for PHIs with more than 2 arguments because of compile-time complexity issues. But it's easy to see that compile-time is not an issue when we reduce PHI args pairwise to a single dominating operand. Of course it is, not a different complexity class, but a constant factor. You have to do N-1 pairwise reductions, meaning with a large fan-in block you pay N-1 times the price, not just once for one pair, and if the price happens to be walking all up to the function start you indeed then are at N*M. I think there should be a cutoff, possibly not at two. Think about the generated testcases with many large switches. Ciao, Michael.
Re: Out-of-order update of new_spill_reg_store[]
On 10/11/11 14:35, Richard Sandiford wrote: No, reload 1 is inherited by a later instruction. And it's inherited correctly, in terms of the register contents being what we expect. (Reload 1 is the one that survives to the end of the instruction's reload sequence. Reload 2, in contrast, is clobbered by reload 1, so could not be inherited. So when we record inheritance information in emit_reload_insns, reload_reg_reaches_end_p correctly stops us from recording reload 2 but allows us to record reload 1.) The problem is that we record the wrong instruction for reload 1. We say that reload 1 is performed by the instruction that performs reload 2. So spill_reg_store[] contains the instruction for reload 2 rather than the instruction for reload 1. We delete it in delete_output_reload at the point of inheritance. Ok. So, would the minimal fix of testing !new_spill_reg_store[..] before writing to it also work? Seems to me this would cope with the out-of-order writes by only allowing the first. If so, then I think I'd prefer that, but we could gcc_assert (reload_reg_reaches_end (..)) as a bit of a verification of that function. Bernd
Re: [PATCH] RFC: Cache LTO streamer mappings
On 11-10-12 08:25 , Jan Hubicka wrote: WPA is currently about 1/3 of reading & type merging, 1/3 of streaming out and 1/3 of inlining. Inlining is relatively easy to cure, so yes, streaming performance is important. The very basic streaming primitives actually still show up at the top of the profile, along with the hashing and type-comparing code. I will post some updated oprofiles into the Mozilla PR. OK, thanks. My numbers are from very early LTO development. Honestly, I think we won't get any great speedups unless we work on reducing the amount of unnecessary info we pickle/unpickle. That's what I was leaning towards. Optimizing the basic access patterns may not buy us as much as just reducing the amount of clutter we have to deal with. It may make sense, however, as a subsequent optimization. Diego.
Re: [Patch,AVR]: Fix PR49939: Skip 2-word insns
2011/10/12 Georg-Johann Lay a...@gjlay.de: Denis Chertykov schrieb: 2011/10/11 Georg-Johann Lay a...@gjlay.de: This patch teaches avr-gcc to skip 2-word instructions like STS and LDS. It's just about looking into a 2-word insn and checking whether it's a 2-word instruction or not. Passes without regression. Ok to install? Please commit. Denis. Committed with the following change: - avr_2word_insn_p (next_nonnote_nondebug_insn (insn))); + avr_2word_insn_p (next_active_insn (insn))); It was discussed in another thread. Denis.
Re: [C++ Patch] PR 50594 (C++ front-end bits)
... or like this, maybe better. Paolo. Index: decl.c === --- decl.c (revision 179842) +++ decl.c (working copy) @@ -3654,7 +3654,7 @@ cxx_init_decl_processing (void) current_lang_name = lang_name_cplusplus; { -tree newattrs; +tree newattrs, extvisattr; tree newtype, deltype; tree ptr_ftype_sizetype; tree new_eh_spec; @@ -3687,9 +3687,13 @@ cxx_init_decl_processing (void) newattrs = build_tree_list (get_identifier ("alloc_size"), build_tree_list (NULL_TREE, integer_one_node)); +extvisattr = build_tree_list (get_identifier ("externally_visible"), + NULL_TREE); +newattrs = chainon (newattrs, extvisattr); newtype = cp_build_type_attribute_variant (ptr_ftype_sizetype, newattrs); newtype = build_exception_variant (newtype, new_eh_spec); -deltype = build_exception_variant (void_ftype_ptr, empty_except_spec); +deltype = cp_build_type_attribute_variant (void_ftype_ptr, extvisattr); +deltype = build_exception_variant (deltype, empty_except_spec); push_cp_library_fn (NEW_EXPR, newtype); push_cp_library_fn (VEC_NEW_EXPR, newtype); global_delete_fndecl = push_cp_library_fn (DELETE_EXPR, deltype);
Re: [gimplefe][patch] The symbol table for declarations
On 11-10-10 17:47 , Sandeep Soni wrote: Hi, The following patch is a basic attempt to build a symbol table that stores the names of all the declarations made in the input file. Index: gcc/gimple/parser.c === --- gcc/gimple/parser.c (revision 174754) +++ gcc/gimple/parser.c (working copy) @@ -28,6 +28,7 @@ #include "tree.h" #include "gimple.h" #include "parser.h" +#include "hashtab.h" #include "ggc.h" /* The GIMPLE parser. Note: do not use this variable directly. It is @@ -44,6 +45,43 @@ /* EOF token. */ static gimple_token gl_eof_token = { CPP_EOF, 0, 0, 0 }; +/* The GIMPLE symbol table entry. */ + +struct GTY (()) gimple_symtab_entry_def +{ + /* Variable that is declared. */ + tree decl; + +}; No blank line before '};' Add 'typedef struct gimple_symtab_entry_def gimple_symtab_entry;' to shorten declarations. + +/* Gimple symbol table. */ +static htab_t gimple_symtab; + +/* Return the hash value of the declaration name of a gimple_symtab_entry_def + object pointed by ENTRY. */ + +static hashval_t +gimple_symtab_entry_hash (const void *entry) +{ + const struct gimple_symtab_entry_def *base = +(const struct gimple_symtab_entry_def *)entry; + return IDENTIFIER_HASH_VALUE (DECL_NAME(base->decl)); Space after DECL_NAME. +} + +/* Returns non-zero if ENTRY1 and ENTRY2 points to gimple_symtab_entry_def s/points/point/ + objects corresponding to the same declaration. */ + +static int +gimple_symtab_eq_hash (const void *entry1, const void *entry2) +{ + const struct gimple_symtab_entry_def *p1 = +(const struct gimple_symtab_entry_def *)entry1; + const struct gimple_symtab_entry_def *p2 = +(const struct gimple_symtab_entry_def *)entry2; + + return DECL_NAME(p1->decl) == DECL_NAME(p2->decl); Space after DECL_NAME. +} + /* Return the string representation of token TOKEN. */ static const char * @@ -807,6 +845,7 @@ } } + /* The Declaration section within a .gimple file can consist of a) Declaration of variables. b) Declaration of functions.
@@ -870,11 +909,17 @@ static void gp_parse_var_decl (gimple_parser *parser) { - const gimple_token *next_token; + const gimple_token *next_token, *name_token; + const char* name; s/char* /char */ enum tree_code code ; + struct gimple_symtab_entry_def e; gl_consume_expected_token (parser->lexer, CPP_LESS); - gl_consume_expected_token (parser->lexer, CPP_NAME); + name_token = gl_consume_expected_token (parser->lexer, CPP_NAME); + name = gl_token_as_text (name_token); + e.decl = + build_decl (UNKNOWN_LOCATION, VAR_DECL, get_identifier(name), void_type_node); No need to use UNKNOWN_LOCATION. Get the location for E.DECL from name_token.location. Additionally, before building the decl, you should make sure that the symbol table does not already have it. So, instead of looking up with a DECL, you should look it up using IDENTIFIER_NODEs. There are two approaches you can use: 1- Add an identifier field to gimple_symtab_entry_def. Use that field for hash table lookups (in this code you'd then fill E.ID with NAME_TOKEN). 2- Use a pointer_map_t and a VEC(). With this approach, you use a pointer map to map identifier nodes to unsigned integers. These integers are the index into the VEC() array where the corresponding decl is stored. In this case, I think #1 is the simplest approach. + htab_find_slot (gimple_symtab, &e, INSERT); This looks wrong. Where are you actually filling in the slot? You need to check the returned slot, if it's empty, you fill it in with E.DECL. See other uses of htab_*.
gl_consume_expected_token (parser->lexer, CPP_COMMA); next_token = gl_consume_token (parser->lexer); @@ -981,6 +1027,7 @@ gimple_parser *parser = ggc_alloc_cleared_gimple_parser (); line_table = parser->line_table = ggc_alloc_cleared_line_maps (); parser->ident_hash = ident_hash; + linemap_init (parser->line_table); parser->lexer = gl_init (parser, fname); @@ -1403,6 +1450,9 @@ if (parser->lexer->filename == NULL) return; + gimple_symtab = +htab_create_ggc (1021, gimple_symtab_entry_hash, +gimple_symtab_eq_hash, NULL); Do you need to indent it this way? Seems to me that the call to htab_create_ggc can fit in the line above. Diego.
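Diego's "check the returned slot" point is the standard insert-or-find idiom: the find-slot call returns a pointer to the slot, and only when that slot is empty does the caller allocate and store a new entry. A self-contained toy version of the idiom — table size, probing scheme, and entry layout are all invented for the sketch; libiberty's htab does the probing for you:

```c
#include <string.h>

#define NSLOTS 64

struct entry { const char *name; int decl; };
static struct entry *table[NSLOTS];

static unsigned hash_name(const char *s) {
  unsigned h = 5381;
  while (*s) h = h * 33 + (unsigned char)*s++;
  return h;
}

/* Return the entry for NAME, creating it with DECL only if absent --
   an existing entry is never overwritten.  */
static struct entry *get_or_insert(const char *name, int decl) {
  unsigned i = hash_name(name) % NSLOTS;
  while (table[i] && strcmp(table[i]->name, name) != 0)
    i = (i + 1) % NSLOTS;              /* linear probing */
  if (!table[i]) {                     /* empty slot: fill it in */
    static struct entry pool[NSLOTS];
    static int used;
    pool[used].name = name;
    pool[used].decl = decl;
    table[i] = &pool[used++];
  }
  return table[i];
}
```

The bug in the reviewed patch is visible by contrast: it passed a stack-allocated entry to the insert call and never stored anything into the returned slot, so the table ended up holding a dangling (or empty) slot rather than the decl.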
[patch] dwarf2out: Drop the size + performance overhead of DW_AT_sibling
Hi, dropping the optional DWARF attribute DW_AT_sibling has only advantages and no disadvantages: For files with .gdb_index GDB initial scan does not use DW_AT_sibling at all. For files without .gdb_index GDB initial scan has 1.79% time _improvement_. For .debug files it brings 3.49% size decrease (7.84% for rpm compressed files). I guess DW_AT_sibling had real performance gains on CPUs with 1x (=no) clock multipliers. Nowadays mostly only the data size transferred over FSB matters. I do not think there would be any DWARF consumers compatibility problems as DW_AT_sibling has always been optional but I admit I have tested only GDB. clean is FSF GCC+GDB, ns is FSF GCC with the patch applied. gdbindex -readnow 100x warm: clean: 56.975 57.161 57.738 58.243 57.529249 seconds ns: 57.799 58.008 58.202 58.473 58.1204993 seconds +1.03% = performance decrease but it should be 0%, it is a measurement error gdbindex -readnow 20x warm(gdb) cold(data): clean: 57.989 ns: 58.538 +0.95% = performance decrease but it should be 0%, it is a measurement error 200x warm: clean: 14.393 14.414 14.587 14.496 14.4724998 seconds ns: 14.202 14.160 14.174 14.318 14.2134998 seconds -1.79% = performance improvement of non-gdbindex scan (dwarf2_build_psymtabs_hard) gdbindex .debug: clean = 5589272 bytes ns = 5394120 bytes -3.49% = size improvement gdbindex .debug.xz9: clean = 1158696 bytes ns = 1067900 bytes -7.84% = size improvement .debug_info + .debug_types: clean = 0x1a11a0+0x08f389 bytes ns = 0x184205+0x0833b0 bytes -7.31% = size improvement Intel i7-920 CPU and only libstdc++ from GCC 4.7.0 20111002 and `-O2 -gdwarf-4 -fdebug-types-section' were used for the benchmark. GCC 4.7.0 20111002 --enable-languages=c++ was used for `make check' regression testing. Thanks, Jan gcc/ 2011-10-12 Jan Kratochvil jan.kratoch...@redhat.com Stop producing DW_AT_sibling. * dwarf2out.c (add_sibling_attributes): Remove the declaration. (add_sibling_attributes): Remove the function. 
(dwarf2out_finish): Remove calls of add_sibling_attributes. --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -3316,7 +3316,6 @@ static int htab_cu_eq (const void *, const void *); static void htab_cu_del (void *); static int check_duplicate_cu (dw_die_ref, htab_t, unsigned *); static void record_comdat_symbol_number (dw_die_ref, htab_t, unsigned); -static void add_sibling_attributes (dw_die_ref); static void build_abbrev_table (dw_die_ref); static void output_location_lists (dw_die_ref); static int constant_size (unsigned HOST_WIDE_INT); @@ -7482,24 +7481,6 @@ copy_decls_for_unworthy_types (dw_die_ref unit) unmark_dies (unit); } -/* Traverse the DIE and add a sibling attribute if it may have the - effect of speeding up access to siblings. To save some space, - avoid generating sibling attributes for DIE's without children. */ - -static void -add_sibling_attributes (dw_die_ref die) -{ - dw_die_ref c; - - if (! die->die_child) -return; - - if (die->die_parent && die != die->die_parent->die_child) -add_AT_die_ref (die, DW_AT_sibling, die->die_sib); - - FOR_EACH_CHILD (die, c, add_sibling_attributes (c)); -} - /* Output all location lists for the DIE and its children. */ static void @@ -22496,14 +22477,6 @@ dwarf2out_finish (const char *filename) prune_unused_types (); } - /* Traverse the DIE's and add sibling attributes to those DIE's - that have children. */ - add_sibling_attributes (comp_unit_die ()); - for (node = limbo_die_list; node; node = node->next) -add_sibling_attributes (node->die); - for (ctnode = comdat_type_list; ctnode != NULL; ctnode = ctnode->next) -add_sibling_attributes (ctnode->root_die); - /* Output a terminator label for the .text section. */ switch_to_section (text_section); targetm.asm_out.internal_label (asm_out_file, TEXT_END_LABEL, 0);
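For context on what is being given up: DW_AT_sibling lets a consumer jump from a DIE straight to its next sibling, instead of decoding every descendant in between. Jan's numbers suggest modern GDB rarely takes the shortcut, so the attribute mostly costs space. A toy model of the traversal cost, nothing DWARF-specific:

```c
struct die {
  struct die *child;   /* first child */
  struct die *sib;     /* next sibling */
};

/* Cost of reaching the next sibling WITHOUT a sibling attribute: the
   consumer must decode every DIE in the subtree it is skipping.
   Returns the number of descendant DIEs visited.  */
static int count_skipped(const struct die *d) {
  int n = 0;
  for (const struct die *c = d->child; c; c = c->sib)
    n += 1 + count_skipped(c);
  return n;
}
```

With the attribute, the same skip is a single offset chase (the model's d->sib pointer); without it, the cost grows with the subtree size — which is exactly the trade Tristan worries about for debuggers other than GDB.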
[patch, Fortran] Change -std=f2008tr to f2008ts, update *.texi status and TR29113-TS29113
Hello all, this patch does three things: a) It updates the Fortran 2003 and TR/TS 29113 status in the GNU Fortran manual. b) It changes all references to Technical Report 29113 to Technical Specification 29113 c) It changes -std=f2008tr to -std=f2008ts (a) is obvious. Regarding (b): For some reason, ISO's SC22 thinks that one should not use Technical Reports (TR) but a Technical Specification (TS) for the Further Interoperability of Fortran with C document - and later also for the coarray extensions. Glancing at the documentation, I think they are right that a TS is better; there are procedural differences, but for us the main difference is the name. As the final word is TS, I think gfortran should use TS and not TR throughout for the post-F2008 technical documents. Cf. ftp://ftp.nag.co.uk/sc22wg5/N1851-N1900/N1879.txt : JTC 1/SC 22 instructs the JTC 1/SC 22/WG 5 Convenor to submit future drafts of TR 29113, Further Interoperability of Fortran with C as Technical Specifications For the difference between TS and TR, see also http://www.iso.org/iso/standards_development/processes_and_procedures/deliverables.htm; for the different approval scheme also the following flow chart (clickable): http://www.iso.org/iso/standards_development/it_tools/flowchart_main.htm Regarding (c): If we switch to TS everywhere, I think it makes sense to also call the flag -std=f2008ts; the flag stands for: Follow the standard according to Fortran 2008 with the extensions defined in the post-F2008 (pre-F2013) standard. Namely, TS 29113 on further interoperability of Fortran with C and the coarray TS, which is in a rather early stage. (TS 29113 is already past a PDTS voting and a DTS should be submitted by June 2012.) Given that -std=f2008tr was never included in a released GCC version and given that it currently only allows very few features, I think no one actually uses it. Hence, I decided that one can simply change it without taking care of still accepting the f2008tr version.
(Currently supported TS 29113 features: OPTIONAL with BIND(C) - absent arguments are indicated as a NULL pointer, matching the internal implementation. RANK() intrinsic - which is boring without assumed-rank arrays. ASYNCHRONOUS - well, only the semantics have changed a bit since F2003/GCC 4.6; however, GCC's middle end uses ASYNCHRONOUS semantics by default; turning it off is a missed-optimization bug.) The patch was built and regtested on x86-64-linux. OK for the trunk? Tobias PS: I will also update the release notes after the patch has been committed. 2011-10-12 Tobias Burnus bur...@net-b.de * gfortran.texi (Fortran 2008 status, TS 29113 status, Further Interoperability of Fortran with C): Update implementation status, change references from TR 29113 to TS 29113. * intrinsic.texi (RANK): Change TR 29113 to TS 29113. * invoke.texi (-std=): Ditto, change -std=f2008tr to -std=f2008ts. * lang.opt (std=): Ditto. * options.c (gfc_handle_option, set_default_std_flags): Ditto and change GFC_STD_F2008_TR to GFC_STD_F2008_TS. * libgfortran.h: Ditto. * intrinsic.c (add_functions, gfc_check_intrinsic_standard): Ditto. * decl.c (verify_c_interop_param): Ditto. 2011-10-12 Tobias Burnus bur...@net-b.de * gfortran.dg/bind_c_usage_23.f90: Change TR 29113 to TS 29113 in the comments. * gfortran.dg/bind_c_usage_24.f90: Ditto. * gfortran.dg/rank_3.f90: Ditto. * gfortran.dg/bind_c_usage_22.f90: Ditto, change -std=f2008tr to -std=f2008ts in dg-options. * gfortran.dg/rank_4.f90: Ditto.
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index 0ee2575..9f3a39e 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -1069,7 +1069,7 @@ verify_c_interop_param (gfc_symbol *sym) retval = FAILURE; } else if (sym-attr.optional == 1 - gfc_notify_std (GFC_STD_F2008_TR, TR29113: Variable '%s' + gfc_notify_std (GFC_STD_F2008_TS, TS29113: Variable '%s' at %L with OPTIONAL attribute in procedure '%s' which is BIND(C), sym-name, (sym-declared_at), diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi index 389c05b..f847df3 100644 --- a/gcc/fortran/gfortran.texi +++ b/gcc/fortran/gfortran.texi @@ -772,7 +772,7 @@ compile option was used. @menu * Fortran 2003 status:: * Fortran 2008 status:: -* TR 29113 status:: +* TS 29113 status:: @end menu @node Fortran 2003 status @@ -1003,8 +1003,11 @@ the intrinsic module @code{ISO_FORTRAN_ENV}. @code{ISO_C_BINDINGS} and @code{COMPILER_VERSION} and @code{COMPILER_OPTIONS} of @code{ISO_FORTRAN_ENV}. -@item Experimental coarray, use the @option{-fcoarray=single} or -@option{-fcoarray=lib} flag to enable it. +@item Coarray support for serial programs with @option{-fcoarray=single} flag +and experimental support for multiple images with the @option{-fcoarray=lib} +flag. + +@item The @code{DO CONCURRENT} construct is supported. @item The @code{BLOCK} construct is supported. @@ -1049,19
Re: [patch] dwarf2out: Drop the size + performance overhead of DW_AT_sibling
On Oct 12, 2011, at 3:50 PM, Jan Kratochvil wrote: Hi, dropping the optional DWARF attribute DW_AT_sibling has only advantages and no disadvantages: For files with .gdb_index, GDB's initial scan does not use DW_AT_sibling at all. For files without .gdb_index, GDB's initial scan gets a 1.79% time _improvement_. For .debug files it brings a 3.49% size decrease (7.84% for rpm-compressed files). I guess DW_AT_sibling had real performance gains on CPUs with 1x (=no) clock multipliers. Nowadays mostly only the data size transferred over the FSB matters. I do not think there would be any DWARF consumer compatibility problems as DW_AT_sibling has always been optional, but I admit I have tested only GDB. I fear that this may degrade performance of other debuggers. What about adding a command-line option? Tristan.
[Patch, Fortran, Committed] Update -f(no-)whole-file in invoke.texi (was: Re: [Patch, Fortran, committed] PR 50585: [4.6/4.7 Regression] ICE with assumed length character array argument)
I have committed the attached patch to the 4.7 trunk (rev 179854) and the 4.6 branch (rev 179855). invoke.texi wasn't updated when -fwhole-file became the default in GCC 4.6. This was spotted by Janus, who created the first draft patch. This patch was approved by Janus off list. Tobias On 10/09/2011 07:01 PM, Tobias Burnus wrote: On 08.10.2011 11:51, Janus Weil wrote: Thanks! What's about the .texi change for -fwhole-file? Will do. Should I include a note about deprecation? And if yes, do you have a suggestion for the wording? How about the following attachment? Index: ChangeLog === --- ChangeLog (revision 179852) +++ ChangeLog (working copy) @@ -1,3 +1,9 @@ +2011-10-11 Tobias Burnus bur...@net-b.de + Janus Weil ja...@gcc.gnu.org + + * invoke.texi (-fwhole-file): Update wording since -fwhole-file + is now enabled by default. + 2011-10-11 Michael Meissner meiss...@linux.vnet.ibm.com * trans-expr.c (gfc_conv_power_op): Delete old interface with two Index: invoke.texi === --- invoke.texi (revision 179852) +++ invoke.texi (working copy) @@ -164,7 +164,7 @@ @item Code Generation Options @xref{Code Gen Options,,Options for code generation conventions}. @gccoptlist{-fno-automatic -ff2c -fno-underscoring @gol --fwhole-file -fsecond-underscore @gol +-fno-whole-file -fsecond-underscore @gol -fbounds-check -fcheck-array-temporaries -fmax-array-constructor =@var{n} @gol -fcheck=@var{all|array-temps|bounds|do|mem|pointer|recursion} @gol -fcoarray=@var{none|single|lib} -fmax-stack-var-size=@var{n} @gol @@ -1225,19 +1225,22 @@ prevent accidental linking between procedures with incompatible interfaces. -@item -fwhole-file -@opindex @code{fwhole-file} -By default, GNU Fortran parses, resolves and translates each procedure -in a file separately. Using this option modifies this such that the -whole file is parsed and placed in a single front-end tree. 
During -resolution, in addition to all the usual checks and fixups, references +@item -fno-whole-file +@opindex @code{fno-whole-file} +This flag causes the compiler to resolve and translate each procedure in +a file separately. + +By default, the whole file is parsed and placed in a single front-end tree. +During resolution, in addition to all the usual checks and fixups, references to external procedures that are in the same file effect resolution of -that procedure, if not already done, and a check of the interfaces. The +that procedure, if not already done, and a check of the interfaces. The dependences are resolved by changing the order in which the file is translated into the backend tree. Thus, a procedure that is referenced is translated before the reference and the duplication of backend tree declarations eliminated. +The @option{-fno-whole-file} option is deprecated and may lead to wrong code. + @item -fsecond-underscore @opindex @code{fsecond-underscore} @cindex underscore Index: ChangeLog === --- ChangeLog (revision 179794) +++ ChangeLog (working copy) @@ -1,5 +1,11 @@ 2011-10-11 Tobias Burnus bur...@net-b.de + Janus Weil ja...@gcc.gnu.org + * invoke.texi (-fwhole-file): Update wording since -fwhole-file + is now enabled by default. + +2011-10-11 Tobias Burnus bur...@net-b.de + PR fortran/50273 * trans-common.c (translate_common): Fix -Walign-commons check. Index: invoke.texi === --- invoke.texi (revision 179793) +++ invoke.texi (working copy) @@ -163,7 +163,7 @@ @item Code Generation Options @xref{Code Gen Options,,Options for code generation conventions}. 
@gccoptlist{-fno-automatic -ff2c -fno-underscoring @gol --fwhole-file -fsecond-underscore @gol +-fno-whole-file -fsecond-underscore @gol -fbounds-check -fcheck-array-temporaries -fmax-array-constructor =@var{n} @gol -fcheck=@var{all|array-temps|bounds|do|mem|pointer|recursion} @gol -fcoarray=@var{none|single} -fmax-stack-var-size=@var{n} @gol @@ -1206,19 +1206,22 @@ prevent accidental linking between procedures with incompatible interfaces. -@item -fwhole-file -@opindex @code{fwhole-file} -By default, GNU Fortran parses, resolves and translates each procedure -in a file separately. Using this option modifies this such that the -whole file is parsed and placed in a single front-end tree. During -resolution, in addition to all the usual checks and fixups, references +@item -fno-whole-file +@opindex @code{fno-whole-file} +This flag causes the compiler to resolve and translate each procedure in +a file separately. + +By default, the whole file is parsed and placed in a single front-end tree. +During resolution, in addition to all the usual checks and fixups, references to external procedures that are in the same file effect resolution of -that procedure, if not already done,
Re: [patch] dwarf2out: Drop the size + performance overhead of DW_AT_sibling
On Wed, 12 Oct 2011 16:07:24 +0200, Tristan Gingold wrote: I fear that this may degrade performance of other debuggers. What about adding a command line option? I can test idb; there aren't so many DWARF debuggers out there, I think. If DW_AT_sibling is removed by default, a new option may make sense as a compatibility safeguard. Thanks, Jan
Re: [PATCH] RFC: Cache LTO streamer mappings
On 11-10-12 08:25 , Jan Hubicka wrote: WPA is currently about 1/3 of reading & type merging, 1/3 of streaming out and 1/3 of inlining. Inlining is relatively easy to cure, so yes, streaming performance is important. The very basic streaming primitives actually still show at the top of the profile along with hashing and type comparing code. I will post some updated oprofiles into the Mozilla PR. OK, thanks. My numbers are from very early LTO development. Yeah, the problem is minor on small projects and C projects. C++ tends to carry a lot of context with it - both in the files streamed from compilation to WPA (a lot of types and such) as well as into individual ltrans units. We still need to stream in and out about 2GB from WPA to ltrans (combined sizes of ltrans0 to ltrans31) and since we are at 3 minutes of compilation now, seconds actually count. Honestly I think we won't get any great speedups unless we work on reducing the amount of unnecessary info we pickle/unpickle. That's what I was leaning towards. Optimizing the basic access patterns may not buy us as much as just reducing the amount of clutter we have to deal with. It may make sense, however, as a subsequent optimization. I will give this patch a try on Mozilla to see if I can report some positive numbers. Obviously having the basic I/O effective is also important. Honza Diego.
Re: [PATCH] [Annotalysis] Bugfix for spurious thread safety warnings with shared mutexes
I don't think that will fix this bug. The bug occurs if: (1) The exclusive lock set has error_mark_node. (2) The shared lock set has the actual lock. In this case, remove_lock_from_lockset thinks that it has found the lock in the exclusive lock set, and fails to remove it from the shared lock set. To fix the bug, the first call to lock_set_contains should ignore the universal lock and return null, so that remove_lock will continue on to search the shared lock set. If I understand your suggested fix correctly, lock_set_contains would still return non-null when the universal lock was present, which is not what we want. IMHO, lock_set_contains is operating correctly; it was just passed the wrong arguments. -DeLesley On Tue, Oct 11, 2011 at 2:34 PM, Ollie Wild a...@google.com wrote: On Mon, Oct 10, 2011 at 3:37 PM, Delesley Hutchins deles...@google.com wrote: --- gcc/tree-threadsafe-analyze.c (revision 179771) +++ gcc/tree-threadsafe-analyze.c (working copy) @@ -1830,14 +1830,27 @@ remove_lock_from_lockset (tree lockable, struct po This feels like a bug in lock_set_contains(), not remove_lock_from_lockset(). I'd modify lock_set_contains() as follows: 1) During the universal lock conditional, remove the return statement. Instead, set default_lock = lock (where default_lock is a new variable initialized to NULL_TREE). 2) Anywhere NULL_TREE is returned later, replace it with default_lock. Ollie -- DeLesley Hutchins | Software Engineer | deles...@google.com | 505-206-0315
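The two-set lookup being debated can be modeled in a few lines of self-contained C. The names and data shapes below are simplified stand-ins, not the actual annotalysis structures: the point is only that a "universal" lock (error_mark_node in the real code) must not short-circuit an exact-match search, so that the caller can go on to search the shared lock set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified model: a lock set may contain a "universal"
   lock that matches any lock.  When the caller asks for an exact hit,
   remember the universal lock but keep searching; only fall back to it
   when accept_universal is set.  */

#define UNIVERSAL_LOCK "<universal>"

static const char *
lock_set_contains (const char **set, size_t n, const char *lock,
                   int accept_universal)
{
  const char *universal = NULL;
  size_t i;
  for (i = 0; i < n; i++)
    {
      if (strcmp (set[i], lock) == 0)
        return set[i];            /* Exact match always wins.  */
      if (strcmp (set[i], UNIVERSAL_LOCK) == 0)
        universal = set[i];       /* Remember, but keep looking.  */
    }
  return accept_universal ? universal : NULL;
}
```

With exact matching requested, a universal lock in the exclusive set no longer hides the real lock sitting in the shared set.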
Re: [PATCH] Fix PR50189
On Wed, 12 Oct 2011, Richard Guenther wrote: This changes VRP to use the type of the variable we record an assertion for to look for TYPE_MIN/MAX_VALUEs rather than the limit that it is tested against. That makes sense anyway and happens to mitigate the wrong-code bug for the testcase in PR50189. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Which shows I need to adjust types for plus/minus we build because of possible mismatches. Bootstrapped and tested on x86_64-unknown-linux-gnu, installed. Richard. 2011-10-12 Paul Koning pkon...@gcc.gnu.org PR tree-optimization/50189 * tree-vrp.c (extract_range_from_assert): Use the type of the variable, not the limit. * g++.dg/torture/pr50189.C: New testcase. Index: gcc/tree-vrp.c === *** gcc/tree-vrp.c (revision 179850) --- gcc/tree-vrp.c (working copy) *** extract_range_from_assert (value_range_t *** 1476,1482 limit = avoid_overflow_infinity (limit); ! type = TREE_TYPE (limit); gcc_assert (limit != var); /* For pointer arithmetic, we only keep track of pointer equality --- 1476,1482 limit = avoid_overflow_infinity (limit); ! type = TREE_TYPE (var); gcc_assert (limit != var); /* For pointer arithmetic, we only keep track of pointer equality *** extract_range_from_assert (value_range_t *** 1650,1657 /* For LT_EXPR, we create the range [MIN, MAX - 1]. */ if (cond_code == LT_EXPR) { ! tree one = build_int_cst (type, 1); ! max = fold_build2 (MINUS_EXPR, type, max, one); if (EXPR_P (max)) TREE_NO_WARNING (max) = 1; } --- 1650,1657 /* For LT_EXPR, we create the range [MIN, MAX - 1]. */ if (cond_code == LT_EXPR) { ! tree one = build_int_cst (TREE_TYPE (max), 1); ! max = fold_build2 (MINUS_EXPR, TREE_TYPE (max), max, one); if (EXPR_P (max)) TREE_NO_WARNING (max) = 1; } *** extract_range_from_assert (value_range_t *** 1685,1692 /* For GT_EXPR, we create the range [MIN + 1, MAX]. */ if (cond_code == GT_EXPR) { ! tree one = build_int_cst (type, 1); ! 
min = fold_build2 (PLUS_EXPR, type, min, one); if (EXPR_P (min)) TREE_NO_WARNING (min) = 1; } --- 1685,1692 /* For GT_EXPR, we create the range [MIN + 1, MAX]. */ if (cond_code == GT_EXPR) { ! tree one = build_int_cst (TREE_TYPE (min), 1); ! min = fold_build2 (PLUS_EXPR, TREE_TYPE (min), min, one); if (EXPR_P (min)) TREE_NO_WARNING (min) = 1; } Index: gcc/testsuite/g++.dg/torture/pr50189.C === *** gcc/testsuite/g++.dg/torture/pr50189.C (revision 0) --- gcc/testsuite/g++.dg/torture/pr50189.C (revision 0) *** *** 0 --- 1,121 + // { dg-do run } + // { dg-options "-fstrict-enums" } + + extern "C" void abort (void); + class CCUTILS_KeyedScalarLevelPosition + { + public: + + typedef enum + { + UNINITED = 0, + AT_BEGIN = 1, + AT_END = 2, + AT_KEY = 3 + + } position_t; + + bool is_init() const + { return(m_timestamp != UNINITED); } + + bool is_at_begin() const + { return(m_timestamp == AT_BEGIN); } + + position_t get_state() const + { + return((m_timestamp >= AT_KEY) + ? AT_KEY + : ((position_t)m_timestamp)); + } + + void set_at_begin() + { m_timestamp = AT_BEGIN; } + + unsigned int get_index() const + { return(m_index); } + + void set_pos(unsigned int a_index, unsigned int a_timestmap) + { + m_index = a_index; + m_timestamp = a_timestmap; + } + + bool check_pos(unsigned int a_num_entries, unsigned int a_timestamp) const + { + if (get_state() != AT_KEY) + return(false); + + if (m_timestamp != a_timestamp) + return(false); + + return(m_index < a_num_entries); + } + + void set_not_init() + { m_timestamp = 0; } + + private: + + unsigned int m_timestamp; + unsigned int m_index; + + }; + + class CCUTILS_KeyedScalarPosition + { + public: + + CCUTILS_KeyedScalarLevelPosition m_L1; + CCUTILS_KeyedScalarLevelPosition m_L2; + }; + + class baz + { + public: + int *n[20]; + unsigned int m_cur_array_len; + unsigned int m_timestamp; + + unsigned int _get_timestamp() const + { return(m_timestamp); } + + bool _check_L1_pos(const CCUTILS_KeyedScalarPosition &a_position) const + { + 
return(a_position.m_L1.check_pos( +
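The range-building step changed by this patch can be sketched without any GCC internals. The toy model below (its names and shapes are invented for illustration) shows the intent: for an assertion VAR < LIMIT we build [TYPE_MIN, LIMIT - 1], and for VAR > LIMIT we build [LIMIT + 1, TYPE_MAX], where the TYPE_MIN/TYPE_MAX bounds now come from the type of the variable rather than the type of the limit.

```c
#include <assert.h>

/* Toy model of extract_range_from_assert's endpoint construction.
   A "type" is modeled as an explicit min/max pair, standing in for
   TYPE_MIN_VALUE/TYPE_MAX_VALUE of the variable's type.  */

struct type_bounds { long min, max; };
struct range { long min, max; };

static struct range
range_for_lt (struct type_bounds var_type, long limit)
{
  /* VAR < LIMIT  =>  [TYPE_MIN, LIMIT - 1]  */
  struct range r = { var_type.min, limit - 1 };
  return r;
}

static struct range
range_for_gt (struct type_bounds var_type, long limit)
{
  /* VAR > LIMIT  =>  [LIMIT + 1, TYPE_MAX]  */
  struct range r = { limit + 1, var_type.max };
  return r;
}
```

For an enum-like variable with bounds [0, 3] (as with -fstrict-enums in the testcase), asserting VAR < 3 yields [0, 2] and VAR > 0 yields [1, 3] - bounds that only make sense when taken from the variable's own type.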
Re: [PATCH] Fix PR50204
On Wed, 12 Oct 2011, Michael Matz wrote: Hi, On Tue, 11 Oct 2011, Richard Guenther wrote: Since we have the alias oracle we no longer optimize the testcase below because I initially restricted the stmt walking to give up for PHIs with more than 2 arguments because of compile-time complexity issues. But it's easy to see that compile-time is not an issue when we reduce PHI args pairwise to a single dominating operand. Of course it is, not a different complexity class, but a constant factor. You have to do N-1 pairwise reductions, meaning with a large fan-in block you pay N-1 times the price, not just once for one pair, and if the price happens to be walking all up to the function start you indeed then are at N*M. I think there should be a cutoff, possibly not at two. Think about the generated testcases with many large switches. Indeed we can do a little better by also caching at possible branches. The easiest is to have cache points at the first store we visit in a basic-block (thus we have at most two bits in the visited bitmap per BB). We can also make the result (more) independent of the order of PHI arguments by disambiguating against a VUSE that (possibly) dominates all other VUSEs. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2011-10-12 Richard Guenther rguent...@suse.de * tree-ssa-alias.c (maybe_skip_until): Cache also at the point of the first store we visit in a basic-block. (get_continuation_for_phi): Search for a candidate VUSE that might dominate all others. Do pairwise disambiguation against that candidate. 
Index: gcc/tree-ssa-alias.c === --- gcc/tree-ssa-alias.c(revision 179849) +++ gcc/tree-ssa-alias.c(working copy) @@ -1846,6 +1846,8 @@ static bool maybe_skip_until (gimple phi, tree target, ao_ref *ref, tree vuse, bitmap *visited) { + basic_block bb = gimple_bb (phi); + if (!*visited) *visited = BITMAP_ALLOC (NULL); @@ -1870,6 +1872,14 @@ maybe_skip_until (gimple phi, tree targe else if (gimple_nop_p (def_stmt) || stmt_may_clobber_ref_p_1 (def_stmt, ref)) return false; + /* If we reach a new basic-block see if we already skipped it + in a previous walk that ended successfully. */ + if (gimple_bb (def_stmt) != bb) + { + if (!bitmap_set_bit (*visited, SSA_NAME_VERSION (vuse))) + return true; + bb = gimple_bb (def_stmt); + } vuse = gimple_vuse (def_stmt); } return true; @@ -1948,18 +1958,35 @@ get_continuation_for_phi (gimple phi, ao until we hit the phi argument definition that dominates the other one. */ else if (nargs >= 2) { - tree arg0 = PHI_ARG_DEF (phi, 0); - tree arg1; - unsigned i = 1; - do + tree arg0, arg1; + unsigned i; + + /* Find a candidate for the virtual operand which definition +dominates those of all others. */ + arg0 = PHI_ARG_DEF (phi, 0); + if (!SSA_NAME_IS_DEFAULT_DEF (arg0)) + for (i = 1; i < nargs; ++i) + { + arg1 = PHI_ARG_DEF (phi, i); + if (SSA_NAME_IS_DEFAULT_DEF (arg1)) + { + arg0 = arg1; + break; + } + if (dominated_by_p (CDI_DOMINATORS, + gimple_bb (SSA_NAME_DEF_STMT (arg0)), + gimple_bb (SSA_NAME_DEF_STMT (arg1)))) + arg0 = arg1; + } + + /* Then pairwise reduce against the found candidate. */ + for (i = 0; i < nargs; ++i) { arg1 = PHI_ARG_DEF (phi, i); arg0 = get_continuation_for_phi_1 (phi, arg0, arg1, ref, visited); if (!arg0) return NULL_TREE; - } - while (++i < nargs); return arg0; }
Re: [PATCH] Fix PR50204
Hi, no need to test a phi argument with itself (testing arg0 != arg1 is not the right test, though, so remembering the candidate index): @@ -1948,18 +1958,35 @@ get_continuation_for_phi (gimple phi, ao until we hit the phi argument definition that dominates the other one. */ else if (nargs >= 2) { - tree arg0 = PHI_ARG_DEF (phi, 0); - tree arg1; - unsigned i = 1; - do + tree arg0, arg1; + unsigned i; unsigned j; + /* Find a candidate for the virtual operand which definition + dominates those of all others. */ + arg0 = PHI_ARG_DEF (phi, 0); j = 0; + if (!SSA_NAME_IS_DEFAULT_DEF (arg0)) + for (i = 1; i < nargs; ++i) + { + arg1 = PHI_ARG_DEF (phi, i); + if (SSA_NAME_IS_DEFAULT_DEF (arg1)) + { + arg0 = arg1; j = i; + break; + } + if (dominated_by_p (CDI_DOMINATORS, + gimple_bb (SSA_NAME_DEF_STMT (arg0)), + gimple_bb (SSA_NAME_DEF_STMT (arg1)))) + arg0 = arg1, j = i; + } + + /* Then pairwise reduce against the found candidate. */ + for (i = 0; i < nargs; ++i) { arg1 = PHI_ARG_DEF (phi, i); if (i != j) arg0 = get_continuation_for_phi_1 (phi, arg0, arg1, ref, visited); if (!arg0) return NULL_TREE; - } - while (++i < nargs); Ciao, Michael.
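The candidate-then-pairwise scheme discussed in this thread can be sketched in isolation. In the toy model below (names and the "depth" encoding are invented for illustration; the real code walks virtual operand def-stmts and dominator information), a smaller depth number stands for "defined earlier, dominates the others":

```c
#include <assert.h>
#include <stddef.h>

/* Pick the PHI argument whose definition dominates the others,
   modeled here as the argument with the smallest depth.  */
static size_t
pick_candidate (const int *depth, size_t nargs)
{
  size_t i, j = 0;
  for (i = 1; i < nargs; i++)
    if (depth[i] < depth[j])    /* depth[i]'s def dominates depth[j]'s */
      j = i;
  return j;
}

/* Then disambiguate every other argument pairwise against that
   candidate, skipping the candidate itself (Michael's point above).
   Returns the number of pairwise walks performed: N-1 for N args,
   a constant factor, not a different complexity class.  */
static int
reduce_pairwise (const int *depth, size_t nargs)
{
  size_t i, j = pick_candidate (depth, nargs);
  int walks = 0;
  for (i = 0; i < nargs; i++)
    if (i != j)
      walks++;                  /* one disambiguation walk per arg */
  return walks;
}
```

This also makes the trade-off in the thread concrete: a PHI with a large fan-in still costs N-1 walks, which is why a cutoff and the per-BB caching were added.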
Re: New warning for expanded vector operations
On Tue, Oct 11, 2011 at 9:11 AM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: Committed with the revision 179807. This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50704 -- H.J.
[PATCH] Optimize some loops using bool types (PR tree-optimization/50596)
Hi! This patch allows vectorization of some loops that use bool (which is especially important now that we use bool more often even for stmts that weren't originally using bool in the sources), in particular when bool is cast to an integer type and the bool rhs has def stmts within the loop that are either BIT_{AND,IOR,XOR}_EXPR, plain SSA_NAME assigns, bool -> another bool casts, or comparisons (tested recursively). In that case the pattern recognizer transforms the comparisons into COND_EXPRs using a suitable integer type (the same width as the comparison operands) and other bools to suitable integer types with casts added where needed. The patch doesn't yet handle vectorization of storing into a bool array; I'll work on that later. Bootstrapped/regtested on x86_64-linux and i686-linux. Ok for trunk? 2011-10-12 Jakub Jelinek ja...@redhat.com PR tree-optimization/50596 * tree-vectorizer.h (NUM_PATTERNS): Increase to 7. * tree-vect-patterns.c (vect_vect_recog_func_ptrs): Add vect_recog_bool_pattern. (check_bool_pattern, adjust_bool_pattern_cast, adjust_bool_pattern, vect_recog_bool_pattern): New functions. * gcc.dg/vect/vect-cond-9.c: New test. --- gcc/tree-vectorizer.h.jj2011-10-10 09:41:29.0 +0200 +++ gcc/tree-vectorizer.h 2011-10-10 10:12:03.0 +0200 @@ -902,7 +902,7 @@ extern void vect_slp_transform_bb (basic Additional pattern recognition functions can (and will) be added in the future. */ typedef gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *); -#define NUM_PATTERNS 6 +#define NUM_PATTERNS 7 void vect_pattern_recog (loop_vec_info); /* In tree-vectorizer.c. 
*/ --- gcc/tree-vect-patterns.c.jj 2011-10-10 09:41:29.0 +0200 +++ gcc/tree-vect-patterns.c2011-10-10 18:23:41.0 +0200 @@ -51,13 +51,15 @@ static gimple vect_recog_over_widening_p tree *); static gimple vect_recog_mixed_size_cond_pattern (VEC (gimple, heap) **, tree *, tree *); +static gimple vect_recog_bool_pattern (VEC (gimple, heap) **, tree *, tree *); static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = { vect_recog_widen_mult_pattern, vect_recog_widen_sum_pattern, vect_recog_dot_prod_pattern, vect_recog_pow_pattern, vect_recog_over_widening_pattern, - vect_recog_mixed_size_cond_pattern}; + vect_recog_mixed_size_cond_pattern, + vect_recog_bool_pattern}; /* Function widened_name_p @@ -1068,10 +1070,8 @@ vect_operation_fits_smaller_type (gimple constants. Check if S3 and S4 can be done on a smaller type than 'TYPE', it can either be 'type' or some intermediate type. For now, we expect S5 to be a type - demotion operation. We also check that S3 and S4 have only one use. -. + demotion operation. We also check that S3 and S4 have only one use. */ -*/ static gimple vect_recog_over_widening_pattern (VEC (gimple, heap) **stmts, tree *type_in, tree *type_out) @@ -1333,6 +1333,356 @@ vect_recog_mixed_size_cond_pattern (VEC } +/* Helper function of vect_recog_bool_pattern. Called recursively, return + true if bool VAR can be optimized that way. 
*/ + +static bool +check_bool_pattern (tree var, loop_vec_info loop_vinfo) +{ + gimple def_stmt; + enum vect_def_type dt; + tree def, rhs1; + enum tree_code rhs_code; + + if (!vect_is_simple_use (var, loop_vinfo, NULL, &def_stmt, &def, &dt)) +return false; + + if (dt != vect_internal_def) +return false; + + if (!is_gimple_assign (def_stmt)) +return false; + + if (!has_single_use (def)) +return false; + + rhs1 = gimple_assign_rhs1 (def_stmt); + rhs_code = gimple_assign_rhs_code (def_stmt); + switch (rhs_code) +{ +case SSA_NAME: + return check_bool_pattern (rhs1, loop_vinfo); + +CASE_CONVERT: + if ((TYPE_PRECISION (TREE_TYPE (rhs1)) != 1 + || !TYPE_UNSIGNED (TREE_TYPE (rhs1))) + && TREE_CODE (TREE_TYPE (rhs1)) != BOOLEAN_TYPE) + return false; + return check_bool_pattern (rhs1, loop_vinfo); + +case BIT_NOT_EXPR: + return check_bool_pattern (rhs1, loop_vinfo); + +case BIT_AND_EXPR: +case BIT_IOR_EXPR: +case BIT_XOR_EXPR: + if (!check_bool_pattern (rhs1, loop_vinfo)) + return false; + return check_bool_pattern (gimple_assign_rhs2 (def_stmt), loop_vinfo); + +default: + if (TREE_CODE_CLASS (rhs_code) == tcc_comparison) + { + tree vecitype, comp_vectype; + + comp_vectype = get_vectype_for_scalar_type (TREE_TYPE (rhs1)); + if (comp_vectype == NULL_TREE) + return false; + + if (TREE_CODE (TREE_TYPE (rhs1)) != INTEGER_TYPE) + { + enum machine_mode mode = TYPE_MODE (TREE_TYPE (rhs1)); + tree itype + =
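The shape of the recursive check above can be seen more easily stripped of the GIMPLE plumbing. The miniature below models only the recursion structure (the expr struct and kind names are invented for illustration; the real check_bool_pattern also verifies single-use SSA defs and vector-type availability): accept comparisons as leaves, recurse through copies, NOT, and the bitwise AND/IOR/XOR combiners, and reject everything else.

```c
#include <assert.h>
#include <stddef.h>

enum kind { K_COPY, K_NOT, K_AND, K_OR, K_XOR, K_CMP, K_CALL };

struct expr {
  enum kind kind;
  const struct expr *op0, *op1;   /* NULL where unused.  */
};

/* Return nonzero if the whole boolean expression tree is built from
   shapes the pattern recognizer can rewrite into COND_EXPRs.  */
static int
check_bool_pattern (const struct expr *e)
{
  switch (e->kind)
    {
    case K_CMP:
      return 1;                       /* Comparison: a valid leaf.  */
    case K_COPY:
    case K_NOT:
      return check_bool_pattern (e->op0);
    case K_AND:
    case K_OR:
    case K_XOR:
      return check_bool_pattern (e->op0)
             && check_bool_pattern (e->op1);
    default:
      return 0;                       /* Anything else defeats the pattern.  */
    }
}
```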
Re: [patch i386]: Unbreak bootstrap for x64 SEH enabled target
On 10/12/2011 12:07 AM, Kai Tietz wrote: Hello, by recent changes gcc begun to move code into the prologue region. This is for x64 SEH an issue, as here the table-information for prologue is limited to 255 bytes size. So we need to avoid moving additional code into prologue. To achieve this we mark all standard and xmm registers as prologue-used at the end of prologue. Also we need to emit a memory blockage. ChangeLog 2011-10-12 Kai Tietz kti...@redhat.com * config/i386/i386.c (ix86_expand_prologue): Mark for TARGET_SEH all sse/integer registers as prologue-used. Tested for x86_64-w64-mingw32. Ok for apply? Regards, Kai Index: i386.c === --- i386.c (revision 179824) +++ i386.c (working copy) @@ -10356,7 +10356,24 @@ Further, prevent alloca modifications to the stack pointer from being combined with prologue modifications. */ if (TARGET_SEH) -emit_insn (gen_prologue_use (stack_pointer_rtx)); +{ + int i; + + /* Due limited size of prologue-code size of 255 bytes, + we need to prevent scheduler to sink instructions into + prologue code. Therefore we mark all standard, sse, fpu, + and the pc registers as prologue-used to prevent this. + Also an memory-blockage is necessary. */ + emit_insn (gen_memory_blockage ()); + + for (i = 0; i <= 7; i++) +{ + emit_insn (gen_prologue_use (gen_rtx_REG (Pmode, AX_REG + i))); + emit_insn (gen_prologue_use (gen_rtx_REG (Pmode, R8_REG + i))); + emit_insn (gen_prologue_use (gen_rtx_REG (TImode, XMM0_REG + i))); + emit_insn (gen_prologue_use (gen_rtx_REG (TImode, XMM8_REG + i))); + } +} This is overkill. We simply need to disable shrink-wrapping for SEH. The easiest way to do that is to add !TARGET_SEH (and a comment) to the simple_return pattern predicate. r~
[PATCH] AVX2 vector permutation fixes plus vec_pack_trunc_{v16hi,v8si,v4di} support
Hi! This patch started with noticing while working on PR50596 that #define N 1024 long long a[N]; char b[N]; void foo (void) { int i; for (i = 0; i < N; i++) b[i] = a[i]; } is even with -O3 -mavx2 vectorized just with 16-byte vectors instead of 32-byte vectors and has various fixes I've noticed when diving into it. The vector permutations with AVX2 aren't very easy, because some instructions don't shuffle cross-lane, some do but only for some modes. The patch adds AVX2 vec_pack_trunc* expanders so that the above can be vectorized, and implements a couple of permutation sequences, including for a single operand __builtin_vec_shuffle a 4 insn sequence that handles arbitrary V32QI/V16HI constant permutations (and some cases where 1 insn is possible too) and also variable-mask V{32Q,16H,8S,4D}I permutations. I think we badly need testcases which will try all possible constant permutations (probably one testcase per mode), even for V32QImode that's just 32x32 plus 32x64 tests (if split into 32 tests in a function times 96 noinline functions), but with that I'd like to wait for Richard's permutation improvements, because although currently the backend signals it can handle some constant argument e.g. V32QImode permutation, as there is no V32QImode permutation builtin __builtin_shuffle emits it as a variable mask operation. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/i386.md (UNSPEC_VPERMDI): Remove. * config/i386/i386.c (ix86_expand_vec_perm): Handle V16QImode and V32QImode for TARGET_AVX2. (MAX_VECT_LEN): Increase to 32. (expand_vec_perm_blend): Add support for 32-byte integer vectors with TARGET_AVX2. (valid_perm_using_mode_p): New function. (expand_vec_perm_pshufb): Add support for 32-byte integer vectors with TARGET_AVX2. (expand_vec_perm_vpshufb2_vpermq): New function. (expand_vec_perm_vpshufb2_vpermq_even_odd): New function. 
(expand_vec_perm_even_odd_1): Handle 32-byte integer vectors with TARGET_AVX2. (ix86_expand_vec_perm_builtin_1): Try expand_vec_perm_vpshufb2_vpermq and expand_vec_perm_vpshufb2_vpermq_even_odd. * config/i386/sse.md (VEC_EXTRACT_EVENODD_MODE): Add for TARGET_AVX2 32-byte integer vector modes. (vec_pack_trunc_<mode>): Use VI248_AVX2 instead of VI248_128. (avx2_interleave_highv32qi, avx2_interleave_lowv32qi): Remove pasto. (avx2_pshufdv3, avx2_pshuflwv3, avx2_pshufhwv3): Generate 4 new operands. (avx2_pshufd_1, avx2_pshuflw_1, avx2_pshufhw_1): Don't use match_dup, instead add 4 new operands and require they have right cross-lane values. (avx2_permv4di): Change into define_expand. (avx2_permv4di_1): New instruction. (avx2_permv2ti): Use nonimmediate_operand instead of register_operand for xm constrained operand. (VEC_PERM_AVX2): Add V32QI and V16QI for TARGET_AVX2. --- gcc/config/i386/i386.md.jj 2011-10-06 16:42:12.0 +0200 +++ gcc/config/i386/i386.md 2011-10-11 10:07:04.0 +0200 @@ -235,7 +235,6 @@ (define_c_enum unspec [ UNSPEC_VPERMSI UNSPEC_VPERMDF UNSPEC_VPERMSF - UNSPEC_VPERMDI UNSPEC_VPERMTI UNSPEC_GATHER --- gcc/config/i386/i386.c.jj 2011-10-10 09:41:28.0 +0200 +++ gcc/config/i386/i386.c 2011-10-12 11:05:06.0 +0200 @@ -19334,7 +19334,7 @@ ix86_expand_vec_perm (rtx operands[]) rtx op0 = operands[1]; rtx op1 = operands[2]; rtx mask = operands[3]; - rtx t1, t2, vt, vec[16]; + rtx t1, t2, t3, t4, vt, vt2, vec[32]; enum machine_mode mode = GET_MODE (op0); enum machine_mode maskmode = GET_MODE (mask); int w, e, i; @@ -19343,50 +19343,72 @@ ix86_expand_vec_perm (rtx operands[]) /* Number of elements in the vector. */ w = GET_MODE_NUNITS (mode); e = GET_MODE_UNIT_SIZE (mode); - gcc_assert (w <= 16); + gcc_assert (w <= 32); if (TARGET_AVX2) { - if (mode == V4DImode || mode == V4DFmode) + if (mode == V4DImode || mode == V4DFmode || mode == V16HImode) { /* Unfortunately, the VPERMQ and VPERMPD instructions only support a constant shuffle operand. 
With a tiny bit of effort we can use VPERMD instead. A re-interpretation stall for V4DFmode is -unfortunate but there's no avoiding it. */ - t1 = gen_reg_rtx (V8SImode); +unfortunate but there's no avoiding it. +Similarly for V16HImode we don't have instructions for variable +shuffling, while for V32QImode we can use after preparing suitable +masks vpshufb; vpshufb; vpermq; vpor. */ + + if (mode == V16HImode) + { + maskmode = mode = V32QImode; + w = 32; + e = 1; + } + else + { + maskmode = mode = V8SImode; +
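The vpshufb; vpshufb; vpermq; vpor idea mentioned in the patch description can be modeled in scalar C to see why it works. This is a behavioral sketch, not the patch's code: vpshufb can only pick bytes within each 16-byte lane, so one shufb gathers the "same lane" bytes, the lanes are swapped (the vpermq step), a second shufb gathers the "other lane" bytes, and vpor merges the two halves. Mask semantics follow vpshufb: a set high bit yields zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar emulation of vpshufb on a 32-byte vector: each result byte is
   taken from within its own 16-byte lane, or zeroed if bit 7 of the
   mask byte is set.  */
static void
pshufb32 (uint8_t *dst, const uint8_t *src, const uint8_t *mask)
{
  for (int i = 0; i < 32; i++)
    {
      int lane = i & ~15;
      dst[i] = (mask[i] & 0x80) ? 0 : src[lane + (mask[i] & 15)];
    }
}

/* Arbitrary single-operand 32-byte permutation: sel[i] in 0..31 names
   the source byte for result byte i.  */
static void
perm32 (uint8_t *dst, const uint8_t *src, const uint8_t *sel)
{
  uint8_t m1[32], m2[32], t1[32], t2[32], swapped[32];
  for (int i = 0; i < 32; i++)
    {
      int same_lane = ((sel[i] ^ i) & 16) == 0;
      m1[i] = same_lane ? (sel[i] & 15) : 0x80;   /* from src        */
      m2[i] = same_lane ? 0x80 : (sel[i] & 15);   /* from other lane */
    }
  memcpy (swapped, src + 16, 16);   /* the vpermq step: swap lanes */
  memcpy (swapped + 16, src, 16);
  pshufb32 (t1, src, m1);           /* same-lane bytes  */
  pshufb32 (t2, swapped, m2);       /* cross-lane bytes */
  for (int i = 0; i < 32; i++)      /* the vpor step    */
    dst[i] = t1[i] | t2[i];
}
```

Since each result byte comes from exactly one of the two shufb results (the other contributes zero), the OR is an exact merge, which is what makes the 4-insn sequence handle arbitrary V32QI permutations.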
[PATCH] Add mulv32qi3 support
Hi! On long long a[1024], c[1024]; char b[1024]; void foo (void) { int i; for (i = 0; i < 1024; i++) b[i] = a[i] + 3 * c[i]; } I've noticed that while the i?86 backend supports mulv16qi3, it doesn't support mulv32qi3 even with AVX2. The following patch implements that similarly to how mulv16qi3 is implemented. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? BTW, I wonder if vector multiply expansion when one argument is a VECTOR_CST with all elements the same shouldn't use something similar to what expand_mult does, not sure if in the generic code or at least in the backends. Testing the costs will be harder, maybe it could just test fewer algorithms and perhaps just count number of instructions or something similar. But certainly e.g. v32qi multiplication by 3 is quite costly (4 interleaves, 2 v16hi multiplications, 4 insns to select even from the two), while two vector additions (tmp = x + x; result = x + tmp;) would do the job. 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/sse.md (vec_avx2): New mode_attr. (mulv16qi3): Macroize to cover also mulv32qi3 for TARGET_AVX2 into ... (mul<mode>3): ... this. 
--- gcc/config/i386/sse.md.jj 2011-10-12 09:23:37.0 +0200 +++ gcc/config/i386/sse.md 2011-10-12 12:16:39.0 +0200 @@ -163,6 +163,12 @@ (define_mode_attr avx_avx2 (V4SI avx2) (V2DI avx2) (V8SI avx2) (V4DI avx2)]) +(define_mode_attr vec_avx2 + [(V16QI "vec") (V32QI "avx2") + (V8HI "vec") (V16HI "avx2") + (V4SI "vec") (V8SI "avx2") + (V2DI "vec") (V4DI "avx2")]) ;; Mapping of logic-shift operators (define_code_iterator lshift [lshiftrt ashift]) @@ -4838,10 +4844,10 @@ (define_insn *sse2_avx2_plusminus_in (set_attr prefix orig,vex) (set_attr mode TI)]) -(define_insn_and_split "mulv16qi3" - [(set (match_operand:V16QI 0 "register_operand" "") - (mult:V16QI (match_operand:V16QI 1 "register_operand" "") - (match_operand:V16QI 2 "register_operand" "")))] +(define_insn_and_split "mul<mode>3" + [(set (match_operand:VI1_AVX2 0 "register_operand" "") + (mult:VI1_AVX2 (match_operand:VI1_AVX2 1 "register_operand" "") + (match_operand:VI1_AVX2 2 "register_operand" "")))] "TARGET_SSE2 && can_create_pseudo_p ()" "#" @@ -4850,34 +4856,41 @@ (define_insn_and_split mulv16qi3 { rtx t[6]; int i; + enum machine_mode mulmode = <sseunpackmode>mode; for (i = 0; i < 6; ++i) -t[i] = gen_reg_rtx (V16QImode); +t[i] = gen_reg_rtx (<MODE>mode); /* Unpack data such that we've got a source byte in each low byte of each word. We don't care what goes into the high byte of each word. Rather than trying to get zero in there, most convenient is to let it be a copy of the low byte. 
*/ - emit_insn (gen_vec_interleave_highv16qi (t[0], operands[1], operands[1])); - emit_insn (gen_vec_interleave_highv16qi (t[1], operands[2], operands[2])); - emit_insn (gen_vec_interleave_lowv16qi (t[2], operands[1], operands[1])); - emit_insn (gen_vec_interleave_lowv16qi (t[3], operands[2], operands[2])); + emit_insn (gen_<vec_avx2>_interleave_high<mode> (t[0], operands[1], + operands[1])); + emit_insn (gen_<vec_avx2>_interleave_high<mode> (t[1], operands[2], + operands[2])); + emit_insn (gen_<vec_avx2>_interleave_low<mode> (t[2], operands[1], + operands[1])); + emit_insn (gen_<vec_avx2>_interleave_low<mode> (t[3], operands[2], + operands[2])); /* Multiply words. The end-of-line annotations here give a picture of what the output of that instruction looks like. Dot means don't care; the letters are the bytes of the result with A being the most significant. */ - emit_insn (gen_mulv8hi3 (gen_lowpart (V8HImode, t[4]), /* .A.B.C.D.E.F.G.H */ - gen_lowpart (V8HImode, t[0]), - gen_lowpart (V8HImode, t[1]))); - emit_insn (gen_mulv8hi3 (gen_lowpart (V8HImode, t[5]), /* .I.J.K.L.M.N.O.P */ - gen_lowpart (V8HImode, t[2]), - gen_lowpart (V8HImode, t[3]))); + emit_insn (gen_rtx_SET (VOIDmode, gen_lowpart (mulmode, t[4]), + gen_rtx_MULT (mulmode,/* .A.B.C.D.E.F.G.H */ + gen_lowpart (mulmode, t[0]), + gen_lowpart (mulmode, t[1])))); + emit_insn (gen_rtx_SET (VOIDmode, gen_lowpart (mulmode, t[5]), + gen_rtx_MULT (mulmode,/* .I.J.K.L.M.N.O.P */ + gen_lowpart (mulmode, t[2]), + gen_lowpart (mulmode, t[3])))); /* Extract the even bytes and merge them back together. */ ix86_expand_vec_extract_even_odd (operands[0], t[5], t[4], 0); set_unique_reg_note (get_last_insn (), REG_EQUAL, -
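The widening scheme the split implements - unpack bytes into 16-bit lanes, multiply there, keep the low byte of each product - comes down to the observation that byte multiplication is naturally modulo 256. A scalar sketch (illustrative only, per-element rather than vectorized):

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the mul<mode>3 split above: each pair of bytes is
   widened to 16 bits (the interleave steps), multiplied there (the
   word multiply), and only the low byte of the product is kept (the
   extract-even step).  The result equals (a[i] * b[i]) mod 256.  */
static void
mulv_bytes (uint8_t *dst, const uint8_t *a, const uint8_t *b, int n)
{
  for (int i = 0; i < n; i++)
    {
      uint16_t wide = (uint16_t) a[i] * (uint16_t) b[i];
      dst[i] = (uint8_t) wide;    /* keep the "even" (low) byte */
    }
}
```

It also illustrates the multiply-by-constant remark at the end of the mail: for b[i] == 3 everywhere, tmp = a + a followed by result = a + tmp gives the same bytes with two additions instead of the interleave/multiply/extract dance.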
Re: int_cst_hash_table mapping persistence and the garbage collector
On 10/12/11 14:00:54, Richard Guenther wrote: I think there is an issue when two cache htabs refer to each other with respect to GC, you might search the list to find out more. Richard, thanks. I thought that might be the case, but I don't understand the GC well enough to make this determination. - Gary
Re: [PATCH 3/7] Emit macro expansion related diagnostics
On Tue, Oct 11, 2011 at 9:47 AM, Jason Merrill ja...@redhat.com wrote: That looks pretty good, but do you really need to build up a separate data structure to search? You seem to be searching it in the same order that it's built up, so why not just walk the expansion chain directly when searching? Agreed. Also, please keep linemap_location_before_p, if needed as macro that expands to the comparison function. That aids readability.
Re: [patch i386]: Unbreak bootstrap for x64 SEH enabled target
2011/10/12 Richard Henderson r...@redhat.com: On 10/12/2011 12:07 AM, Kai Tietz wrote: Hello, by recent changes gcc begun to move code into the prologue region. This is for x64 SEH an issue, as here the table-information for prologue is limited to 255 bytes size. So we need to avoid moving additional code into prologue. To achieve this we mark all standard and xmm registers as prologue-used at the end of prologue. Also we need to emit a memory blockage. ChangeLog 2011-10-12 Kai Tietz kti...@redhat.com * config/i386/i386.c (ix86_expand_prologue): Mark for TARGET_SEH all sse/integer registers as prologue-used. Tested for x86_64-w64-mingw32. Ok for apply? Regards, Kai Index: i386.c === --- i386.c (revision 179824) +++ i386.c (working copy) @@ -10356,7 +10356,24 @@ Further, prevent alloca modifications to the stack pointer from being combined with prologue modifications. */ if (TARGET_SEH) - emit_insn (gen_prologue_use (stack_pointer_rtx)); + { + int i; + + /* Due limited size of prologue-code size of 255 bytes, + we need to prevent scheduler to sink instructions into + prologue code. Therefore we mark all standard, sse, fpu, + and the pc registers as prologue-used to prevent this. + Also an memory-blockage is necessary. */ + emit_insn (gen_memory_blockage ()); + + for (i = 0; i <= 7; i++) + { + emit_insn (gen_prologue_use (gen_rtx_REG (Pmode, AX_REG + i))); + emit_insn (gen_prologue_use (gen_rtx_REG (Pmode, R8_REG + i))); + emit_insn (gen_prologue_use (gen_rtx_REG (TImode, XMM0_REG + i))); + emit_insn (gen_prologue_use (gen_rtx_REG (TImode, XMM8_REG + i))); + } + } This is overkill. We simply need to disable shrink-wrapping for SEH. The easiest way to do that is to add !TARGET_SEH (and a comment) to the simple_return pattern predicate. r~ Thanks, this is indeed more simple. I wasn't aware that the named simple_return expander also enables shrink-wrapping into the prologue. Patch tested for x86_64-w64-mingw32. Ok? 
Regards, Kai Index: i386.md === --- i386.md (revision 179824) +++ i386.md (working copy) @@ -11708,9 +11708,13 @@ } }) +;; We need to disable this for TARGET_SEH, as otherwise +;; shrink-wrapped prologue gets enabled too. This might exceed +;; the maximum size of prologue in unwind information. + (define_expand "simple_return" [(simple_return)] - "" + "!TARGET_SEH" { if (crtl->args.pops_args) {
Re: [patch i386]: Unbreak bootstrap for x64 SEH enabled target
On 10/12/2011 09:54 AM, Kai Tietz wrote: +;; We need to disable this for TARGET_SEH, as otherwise +;; shrink-wrapped prologue gets enabled too. This might exceed +;; the maximum size of prologue in unwind information. + (define_expand "simple_return" [(simple_return)] - "" + "!TARGET_SEH" Ok. r~
Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD
On Tue, Oct 11, 2011 at 8:37 AM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Oct 11, 2011 at 3:12 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hi Uros, you was right both with fpmath and configflags. That is why it was passing for me. Attached patch which cures the problem. testsuite/ChangeLog entry: 2011-10-11 Kirill Yukhin kirill.yuk...@intel.com * gcc.target/i386/fma_double_1.c: Add -mfpmath=sse. * gcc.target/i386/fma_double_2.c: Ditto. * gcc.target/i386/fma_double_3.c: Ditto. * gcc.target/i386/fma_double_4.c: Ditto. * gcc.target/i386/fma_double_5.c: Ditto. * gcc.target/i386/fma_double_6.c: Ditto. * gcc.target/i386/fma_float_1.c: Ditto. * gcc.target/i386/fma_float_2.c: Ditto. * gcc.target/i386/fma_float_3.c: Ditto. * gcc.target/i386/fma_float_4.c: Ditto. * gcc.target/i386/fma_float_5.c: Ditto. * gcc.target/i386/fma_float_6.c: Ditto. * gcc.target/i386/l_fma_double_1.c: Ditto. * gcc.target/i386/l_fma_double_2.c: Ditto. * gcc.target/i386/l_fma_double_3.c: Ditto. * gcc.target/i386/l_fma_double_4.c: Ditto. * gcc.target/i386/l_fma_double_5.c: Ditto. * gcc.target/i386/l_fma_double_6.c: Ditto. * gcc.target/i386/l_fma_float_1.c: Ditto. * gcc.target/i386/l_fma_float_2.c: Ditto. * gcc.target/i386/l_fma_float_3.c: Ditto. * gcc.target/i386/l_fma_float_4.c: Ditto. * gcc.target/i386/l_fma_float_5.c: Ditto. * gcc.target/i386/l_fma_float_6.c: Ditto. * gcc.target/i386/l_fma_run_double_1.c: Ditto. * gcc.target/i386/l_fma_run_double_2.c: Ditto. * gcc.target/i386/l_fma_run_double_3.c: Ditto. * gcc.target/i386/l_fma_run_double_4.c: Ditto. * gcc.target/i386/l_fma_run_double_5.c: Ditto. * gcc.target/i386/l_fma_run_double_6.c: Ditto. * gcc.target/i386/l_fma_run_float_1.c: Ditto. * gcc.target/i386/l_fma_run_float_2.c: Ditto. * gcc.target/i386/l_fma_run_float_3.c: Ditto. * gcc.target/i386/l_fma_run_float_4.c: Ditto. * gcc.target/i386/l_fma_run_float_5.c: Ditto. * gcc.target/i386/l_fma_run_float_6.c: Ditto. Could you please have a look? 
Sorry for the inconvenience, K All double vector tests fail when GCC is configured with --with-cpu=atom since the double vectorizer is turned off by default. You should add -mtune=generic to those tests. I checked in this patch to add -mfpmath=sse/-mtune=generic to the FMA tests. I also removed the extra dg-options. Tested on Linux/ia32 and Linux/x86-64. -- H.J. --- diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 5af301f..11a3cc6 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,32 @@ +2011-10-12 H.J. Lu hongjiu...@intel.com + + * gcc.target/i386/fma_run_double_1.c: Add -mfpmath=sse. + * gcc.target/i386/fma_run_double_2.c: Likewise. + * gcc.target/i386/fma_run_double_3.c: Likewise. + * gcc.target/i386/fma_run_double_4.c: Likewise. + * gcc.target/i386/fma_run_double_5.c: Likewise. + * gcc.target/i386/fma_run_double_6.c: Likewise. + * gcc.target/i386/fma_run_float_1.c: Likewise. + * gcc.target/i386/fma_run_float_2.c: Likewise. + * gcc.target/i386/fma_run_float_3.c: Likewise. + * gcc.target/i386/fma_run_float_4.c: Likewise. + * gcc.target/i386/fma_run_float_5.c: Likewise. + * gcc.target/i386/fma_run_float_6.c: Likewise. + + * gcc.target/i386/l_fma_double_1.c: Add -mtune=generic and + remove the extra dg-options. + * gcc.target/i386/l_fma_double_2.c: Likewise. + * gcc.target/i386/l_fma_double_3.c: Likewise. + * gcc.target/i386/l_fma_double_4.c: Likewise. + * gcc.target/i386/l_fma_double_5.c: Likewise. + * gcc.target/i386/l_fma_double_6.c: Likewise. + * gcc.target/i386/l_fma_float_1.c: Likewise. + * gcc.target/i386/l_fma_float_2.c: Likewise. + * gcc.target/i386/l_fma_float_3.c: Likewise. + * gcc.target/i386/l_fma_float_4.c: Likewise. + * gcc.target/i386/l_fma_float_5.c: Likewise. + * gcc.target/i386/l_fma_float_6.c: Likewise.
+ 2011-10-12 Paul Koning pkon...@gcc.gnu.org PR tree-optimization/50189 diff --git a/gcc/testsuite/gcc.target/i386/fma_run_double_1.c b/gcc/testsuite/gcc.target/i386/fma_run_double_1.c index d46327d..79b219b 100644 --- a/gcc/testsuite/gcc.target/i386/fma_run_double_1.c +++ b/gcc/testsuite/gcc.target/i386/fma_run_double_1.c @@ -1,7 +1,7 @@ /* { dg-do run } */ /* { dg-prune-output ".*warning: 'sseregparm' attribute ignored.*" } */ /* { dg-require-effective-target fma } */ -/* { dg-options "-O3 -mfma" } */ +/* { dg-options "-O3 -mfpmath=sse -mfma" } */ /* Test that the compiler properly optimizes floating point multiply and add instructions into FMA3 instructions. */ diff --git a/gcc/testsuite/gcc.target/i386/fma_run_double_2.c
Re: [PATCH] AVX2 vector permutation fixes plus vec_pack_trunc_{v16hi,v8si,v4di} support
On 10/12/2011 09:09 AM, Jakub Jelinek wrote: /* Multiply the shuffle indices by two. */ - emit_insn (gen_avx2_lshlv8si3 (t1, t1, const1_rtx)); + if (maskmode == V8SImode) + emit_insn (gen_avx2_lshlv8si3 (t1, t1, const1_rtx)); + else + emit_insn (gen_addv32qi3 (t1, t1, t1)); I guess it would be cleaner to use plus always. And thus expand_simple_binop instead of (a couple of) these mode tests. + case V32QImode: + t1 = gen_reg_rtx (V32QImode); + t2 = gen_reg_rtx (V32QImode); + t3 = gen_reg_rtx (V32QImode); + vt2 = GEN_INT (128); + for (i = 0; i < 32; i++) + vec[i] = vt2; + vt = gen_rtx_CONST_VECTOR (V32QImode, gen_rtvec_v (32, vec)); + vt = force_reg (V32QImode, vt); + for (i = 0; i < 32; i++) + vec[i] = i < 16 ? vt2 : const0_rtx; + vt2 = gen_rtx_CONST_VECTOR (V32QImode, gen_rtvec_v (32, vec)); + vt2 = force_reg (V32QImode, vt2); + emit_insn (gen_avx2_lshlv4di3 (gen_lowpart (V4DImode, t1), + gen_lowpart (V4DImode, mask), + GEN_INT (3))); + emit_insn (gen_avx2_andnotv32qi3 (t2, vt, mask)); + emit_insn (gen_xorv32qi3 (t1, t1, vt2)); + emit_insn (gen_andv32qi3 (t1, t1, vt)); + emit_insn (gen_iorv32qi3 (t3, t1, t2)); + emit_insn (gen_xorv32qi3 (t1, t1, vt)); + emit_insn (gen_avx2_permv4di_1 (gen_lowpart (V4DImode, t3), + gen_lowpart (V4DImode, t3), + const2_rtx, GEN_INT (3), + const0_rtx, const1_rtx)); + emit_insn (gen_iorv32qi3 (t1, t1, t2)); Some commentary here is required. I might have expected to see a compare, or something, but the logical operations here are less than obvious. I believe I've commented on everything else in the previous messages. r~
Re: PR c++/30195
Copying the decl is unlikely to do what we want, I think. Does putting the target decl directly into the method vec work? Unfortunately not, it ends up with the same error: undefined reference. Hunh, that's surprising. Furthermore, I don't think it is the right approach since the access may be different between the member function and the using declaration... Never mind. I would expect the existing access declaration code to deal with that, though I could be wrong. There don't seem to be any tests for a class that both uses and defines functions with the same name to verify that both functions can be called; I suspect that doesn't work yet with this patch. If we can't put the used functions directly into CLASSTYPE_METHOD_VEC, we need to combine them with functions from there at lookup time. + if (TREE_CODE (target_field) == FUNCTION_DECL + && DECL_NAME (OVL_CURRENT (target_field)) == name) Checking for FUNCTION_DECL won't work if the target is overloaded. Jason
Re: [C++ Patch] PR 50594 (C++ front-end bits)
On 10/12/2011 09:18 AM, Paolo Carlini wrote: newattrs = build_tree_list (get_identifier ("alloc_size"), build_tree_list (NULL_TREE, integer_one_node)); +extvisattr = build_tree_list (get_identifier ("externally_visible"), + NULL_TREE); +newattrs = chainon (newattrs, extvisattr); Instead of chainon you could build newattrs after extvisattr with tree_cons. Jason
Re: RFC: Add ADD_RESTRICT tree code
On Wed, Oct 12, 2011 at 07:16:56PM +0200, Michael Matz wrote: This patch will fix the currently XFAILed tree-ssa/restrict-4.c again, as well as fix PR 50419. It also still fixes the original testcase of PR 49279. But it will break the checked in testcase for this bug report. I believe the checked in testcase is invalid as follows: struct S { int a; int *__restrict p; }; int foo (int *p, int *q) { struct S s, *t; s.a = 1; s.p = p; // 1 t = wrap (&s); // 2 t == &s in effect, but GCC doesn't see this t->p = q; // 3 s.p[0] = 0; // 4 t->p[0] = 1; // 5 return s.p[0]; // 6 } I'm fairly sure this is completely valid. Assignment 2 means that t->p points to s.p. Assignment 3 changes t->p and s.p, but the change to s.p doesn't occur through a pointer based on t->p or any other restrict pointer, in fact it doesn't occur through any explicit initialization or assignment, but rather through an indirect access via a different pointer. Hence the accesses to the same memory object at s.p[0] and t->p[0] were undefined because both accesses weren't through pointers based on each other. Only the field p in the structure is restrict qualified, there is no restrict qualification on the other pointers (e.g. t is not restrict). Thus, it is valid that t points to s. And, the s.p[0] access is based on s.p as well as t->p, and similarly the t->p[0] access is based on s.p as well as t->p, in the sense of the ISO C99 restrict wording. Because, if you change t->p (or s.p) at some point in between t->p = q; and s.p[0]; (i.e. prior to the access) to point to a copy of the array, both s.p and t->p change. In what follows, a pointer expression E is said to be based on object P if (at some sequence point in the execution of B prior to the evaluation of E) modifying P to point to a copy of the array object into which it formerly pointed would change the value of E. Note that ‘‘based’’ is defined only for expressions with pointer types. 
Which means that for memory restricts (fields in particular) we need to limit ourselves to the cases where the field is accessed through a restricted pointer or doesn't have address taken. Jakub
Re: [C++-11] User defined literals
On 10/12/2011 01:05 AM, Ed Smith-Rowland wrote: cp_parser_operator(function_id) is simply run twice in cp_parser_unqualified_id. Once inside cp_parser_template_id called at parser.c:4515. Once directly inside cp_parser_unqualified_id at parser.c:4525. Ah. You could try replacing the operator X tokens with a single CPP_LITERAL_OPERATOR token, like we do for CPP_NESTED_NAME_SPECIFIER and CPP_TEMPLATE_ID. cp_parser_template_id never succeeds with literal operator templates. I find that curious. But I haven't looked real hard and the things do get parsed somehow. I'd only expect it to succeed if you actually wrote, e.g., operator "" _c<'a','b','c'>(); Jason
Re: [C++ Patch] PR 50594 (C++ front-end bits)
On 10/12/2011 07:56 PM, Jason Merrill wrote: On 10/12/2011 09:18 AM, Paolo Carlini wrote: newattrs = build_tree_list (get_identifier ("alloc_size"), build_tree_list (NULL_TREE, integer_one_node)); +extvisattr = build_tree_list (get_identifier ("externally_visible"), + NULL_TREE); +newattrs = chainon (newattrs, extvisattr); Instead of chainon you could build newattrs after extvisattr with tree_cons. Yes. Like this? Paolo. / Index: decl.c === --- decl.c (revision 179859) +++ decl.c (working copy) @@ -3654,7 +3654,7 @@ cxx_init_decl_processing (void) current_lang_name = lang_name_cplusplus; { -tree newattrs; +tree newattrs, extvisattr; tree newtype, deltype; tree ptr_ftype_sizetype; tree new_eh_spec; @@ -3684,12 +3684,15 @@ cxx_init_decl_processing (void) /* Ensure attribs.c is initialized. */ init_attributes (); -newattrs - = build_tree_list (get_identifier ("alloc_size"), -build_tree_list (NULL_TREE, integer_one_node)); +extvisattr = build_tree_list (get_identifier ("externally_visible"), + NULL_TREE); +newattrs = tree_cons (get_identifier ("alloc_size"), + build_tree_list (NULL_TREE, integer_one_node), + extvisattr); newtype = cp_build_type_attribute_variant (ptr_ftype_sizetype, newattrs); newtype = build_exception_variant (newtype, new_eh_spec); -deltype = build_exception_variant (void_ftype_ptr, empty_except_spec); +deltype = cp_build_type_attribute_variant (void_ftype_ptr, extvisattr); +deltype = build_exception_variant (deltype, empty_except_spec); push_cp_library_fn (NEW_EXPR, newtype); push_cp_library_fn (VEC_NEW_EXPR, newtype); global_delete_fndecl = push_cp_library_fn (DELETE_EXPR, deltype);
Re: [C++ Patch] PR 50594 (C++ front-end bits)
OK. Jason
[PATCH] Slightly fix up vgather* patterns (take 2)
On Sun, Oct 09, 2011 at 12:55:40PM +0200, Uros Bizjak wrote: BTW: No need to use %c modifier: /* Meaning of CODE: L,W,B,Q,S,T -- print the opcode suffix for specified size of operand. C -- print opcode suffix for set/cmov insn. c -- like C, but print reversed condition ... */ Well, something needs to be used there, because otherwise we get addresses like (%rax, %ymm0, $4) instead of the needed (%rax, %ymm0, 4) I've used %p6 instead of %c6 in the patch below. On Mon, Oct 10, 2011 at 01:47:49PM -0700, Richard Henderson wrote: The use of match_dup in the clobber is wrong. We should not be clobbering the user-visible copy of the operand. That does not make sense when dealing with the user-visible builtin. Ok. Instead, use (clobber (match_scratch)) and matching constraints with operand 4. Ok. I think that a (mem (scratch)) as input to the unspec is probably best. The exact memory usage is almost certainly too complex to describe in a useful way. Ok, so how about this (so far untested, will bootstrap/regtest it soon)? 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/sse.md (avx2_gathersimode, avx2_gatherdimode, avx2_gatherdimode256): Add clobber of match_scratch, change memory_operand to register_operand, add (mem:BLK (scratch)) use. (*avx2_gathersimode, *avx2_gatherdimode, *avx2_gatherdimode256): Add clobber of match_scratch, add earlyclobber to the output operand and match_scratch, add (mem:BLK (scratch)) use, change the other mem to match_operand. Use %p6 instead of %c6 in the pattern. * config/i386/i386.c (ix86_expand_builtin): Adjust for operand 2 being a Pmode register_operand instead of memory_operand. --- gcc/config/i386/i386.c.jj 2011-10-12 16:15:50.0 +0200 +++ gcc/config/i386/i386.c 2011-10-12 19:12:15.0 +0200 @@ -28862,7 +28862,6 @@ rdrand_step: op4 = expand_normal (arg4); /* Note the arg order is different from the operand order. 
*/ mode0 = insn_data[icode].operand[1].mode; - mode1 = insn_data[icode].operand[2].mode; mode2 = insn_data[icode].operand[3].mode; mode3 = insn_data[icode].operand[4].mode; mode4 = insn_data[icode].operand[5].mode; @@ -28876,12 +28875,11 @@ rdrand_step: if (GET_MODE (op1) != Pmode) op1 = convert_to_mode (Pmode, op1, 1); op1 = force_reg (Pmode, op1); - op1 = gen_rtx_MEM (mode1, op1); if (!insn_data[icode].operand[1].predicate (op0, mode0)) op0 = copy_to_mode_reg (mode0, op0); - if (!insn_data[icode].operand[2].predicate (op1, mode1)) - op1 = copy_to_mode_reg (mode1, op1); + if (!insn_data[icode].operand[2].predicate (op1, Pmode)) + op1 = copy_to_mode_reg (Pmode, op1); if (!insn_data[icode].operand[3].predicate (op2, mode2)) op2 = copy_to_mode_reg (mode2, op2); if (!insn_data[icode].operand[4].predicate (op3, mode3)) --- gcc/config/i386/sse.md.jj 2011-10-12 16:16:49.0 +0200 +++ gcc/config/i386/sse.md 2011-10-12 19:19:55.0 +0200 @@ -12582,55 +12582,61 @@ (define_mode_attr VEC_GATHER_MODE (V8SI V8SI) (V8SF V8SI)]) (define_expand avx2_gathersimode - [(set (match_operand:VEC_GATHER_MODE 0 register_operand ) - (unspec:VEC_GATHER_MODE - [(match_operand:VEC_GATHER_MODE 1 register_operand ) - (match_operand:ssescalarmode 2 memory_operand ) - (match_operand:VEC_GATHER_MODE 3 register_operand ) - (match_operand:VEC_GATHER_MODE 4 register_operand ) - (match_operand:SI 5 const1248_operand )] - UNSPEC_GATHER))] + [(parallel [(set (match_operand:VEC_GATHER_MODE 0 register_operand ) + (unspec:VEC_GATHER_MODE +[(match_operand:VEC_GATHER_MODE 1 register_operand ) + (match_operand 2 register_operand ) + (mem:BLK (scratch)) + (match_operand:VEC_GATHER_MODE 3 register_operand ) + (match_operand:VEC_GATHER_MODE 4 register_operand ) + (match_operand:SI 5 const1248_operand )] +UNSPEC_GATHER)) + (clobber (match_scratch:VEC_GATHER_MODE 6 ))])] TARGET_AVX2) (define_insn *avx2_gathersimode - [(set (match_operand:VEC_GATHER_MODE 0 register_operand =x) + [(set (match_operand:VEC_GATHER_MODE 0 
register_operand =x) (unspec:VEC_GATHER_MODE - [(match_operand:VEC_GATHER_MODE 1 register_operand 0) - (mem:ssescalarmode -(match_operand:P 2 register_operand r)) - (match_operand:VEC_GATHER_MODE 3 register_operand x) - (match_operand:VEC_GATHER_MODE 4 register_operand x) - (match_operand:SI 5 const1248_operand n)] - UNSPEC_GATHER))] + [(match_operand:VEC_GATHER_MODE 2 register_operand 0) + (match_operand:P 3 register_operand r) + (mem:BLK
[v3] PR C++/50594
Hi, these are the library bits, which I committed together with the front-end bits approved by Jason. Tested x86_64-linux. Paolo. 2011-10-12 Paolo Carlini paolo.carl...@oracle.com PR c++/50594 * libsupc++/new (operator new, operator delete): Decorate with __attribute__((__externally_visible__)). * include/bits/c++config: Add _GLIBCXX_THROW. * libsupc++/del_op.cc: Adjust. * libsupc++/del_opv.cc: Likewise. * libsupc++/del_opnt.cc: Likewise. * libsupc++/del_opvnt.cc: Likewise. * libsupc++/new_op.cc: Likewise. * libsupc++/new_opv.cc: Likewise. * libsupc++/new_opnt.cc: Likewise. * libsupc++/new_opvnt.cc: Likewise. * testsuite/18_support/50594.cc: New. * testsuite/ext/profile/mutex_extensions_neg.cc: Adjust dg-error line number. Index: include/bits/c++config === --- include/bits/c++config (revision 179842) +++ include/bits/c++config (working copy) @@ -103,9 +103,11 @@ # ifdef __GXX_EXPERIMENTAL_CXX0X__ # define _GLIBCXX_NOEXCEPT noexcept # define _GLIBCXX_USE_NOEXCEPT noexcept +# define _GLIBCXX_THROW(_EXC) # else # define _GLIBCXX_NOEXCEPT # define _GLIBCXX_USE_NOEXCEPT throw() +# define _GLIBCXX_THROW(_EXC) throw(_EXC) # endif #endif Index: libsupc++/del_op.cc === --- libsupc++/del_op.cc (revision 179842) +++ libsupc++/del_op.cc (working copy) @@ -1,6 +1,7 @@ // Boilerplate support routines for -*- C++ -*- dynamic memory management. -// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2007, 2009 Free Software Foundation +// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2007, 2009, 2010, 2011 +// Free Software Foundation // // This file is part of GCC. // @@ -41,7 +42,7 @@ #include new _GLIBCXX_WEAK_DEFINITION void -operator delete(void* ptr) throw () +operator delete(void* ptr) _GLIBCXX_USE_NOEXCEPT { if (ptr) std::free(ptr); Index: libsupc++/new_opv.cc === --- libsupc++/new_opv.cc(revision 179842) +++ libsupc++/new_opv.cc(working copy) @@ -1,6 +1,7 @@ // Boilerplate support routines for -*- C++ -*- dynamic memory management. 
-// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009 Free Software Foundation +// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009, 2010, 2011 +// Free Software Foundation // // This file is part of GCC. // @@ -27,7 +28,7 @@ #include new _GLIBCXX_WEAK_DEFINITION void* -operator new[] (std::size_t sz) throw (std::bad_alloc) +operator new[] (std::size_t sz) _GLIBCXX_THROW (std::bad_alloc) { return ::operator new(sz); } Index: libsupc++/new_op.cc === --- libsupc++/new_op.cc (revision 179842) +++ libsupc++/new_op.cc (working copy) @@ -42,7 +42,7 @@ extern new_handler __new_handler; _GLIBCXX_WEAK_DEFINITION void * -operator new (std::size_t sz) throw (std::bad_alloc) +operator new (std::size_t sz) _GLIBCXX_THROW (std::bad_alloc) { void *p; Index: libsupc++/del_opv.cc === --- libsupc++/del_opv.cc(revision 179842) +++ libsupc++/del_opv.cc(working copy) @@ -1,6 +1,7 @@ // Boilerplate support routines for -*- C++ -*- dynamic memory management. -// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009 Free Software Foundation +// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009, 2010, 2011 +// Free Software Foundation // // This file is part of GCC. // @@ -27,7 +28,7 @@ #include new _GLIBCXX_WEAK_DEFINITION void -operator delete[] (void *ptr) throw () +operator delete[] (void *ptr) _GLIBCXX_USE_NOEXCEPT { ::operator delete (ptr); } Index: libsupc++/del_opnt.cc === --- libsupc++/del_opnt.cc (revision 179842) +++ libsupc++/del_opnt.cc (working copy) @@ -1,6 +1,7 @@ // Boilerplate support routines for -*- C++ -*- dynamic memory management. -// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009 Free Software Foundation +// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009, 2010, 2011 +// Free Software Foundation // // This file is part of GCC. 
// @@ -29,7 +30,7 @@ extern "C" void free (void *); _GLIBCXX_WEAK_DEFINITION void -operator delete (void *ptr, const std::nothrow_t&) throw () +operator delete (void *ptr, const std::nothrow_t&) _GLIBCXX_USE_NOEXCEPT { free (ptr); } Index: libsupc++/new === --- libsupc++/new (revision 179842) +++ libsupc++/new (working copy) @@ -1,7 +1,7 @@ // The -*- C++ -*- dynamic memory management header. // Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, -// 2003, 2004, 2005, 2006, 2007, 2009, 2010 +// 2003, 2004, 2005, 2006,
Re: [Patch, Fortran, committed] PR 50659: [4.4/4.5/4.6/4.7 Regression] ICE with PROCEDURE statement
Committed to the 4.6 branch as r179864: http://gcc.gnu.org/viewcvs?view=revision&revision=179723 Cheers, Janus 2011/10/9 Janus Weil ja...@gcc.gnu.org: Hi all, I have just committed as obvious a patch for an ICE-on-valid problem with PROCEDURE statements: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=179723 The problem was the following: When setting up an external procedure or procedure pointer (declared via a PROCEDURE statement), we copy the expressions for the array bounds and string length from the interface symbol given in the PROCEDURE declaration (cf. 'resolve_procedure_interface'). If those expressions depend on the actual args of the interface, we have to replace those args by the args of the new procedure symbol that we're setting up. This is what 'gfc_expr_replace_symbols' / 'replace_symbol' does. Unfortunately we failed to check whether the symbol we try to replace is actually a dummy! Contrary to Andrew's initial assumption, I think the test case is valid. I could neither find a compiler which rejects it, nor a restriction in the standard which makes it invalid. The relevant part of F08 is probably chapter 7.1.11 (Specification expression). This states that a specification expression can contain variables, which are made accessible via use association. I'm planning to apply the patch to the 4.6, 4.5 and 4.4 branches soon. Cheers, Janus
Factor out allocation of sorted_fields (issue5253050)
This moves the allocation of sorted_fields_type elements into a new allocator function. It's not strictly necessary in trunk, but on the pph branch we need to allocate this type from pph images, so we need to call it from outside of class.c. OK for trunk? Tested on x86_64. Diego. * class.c (sorted_fields_type_new): Factor out of ... (finish_struct_1): ... here. diff --git a/gcc/cp/class.c b/gcc/cp/class.c index 2df9177..6185054 100644 --- a/gcc/cp/class.c +++ b/gcc/cp/class.c @@ -5663,6 +5663,22 @@ determine_key_method (tree type) return; } + +/* Allocate and return an instance of struct sorted_fields_type with + N fields. */ + +static struct sorted_fields_type * +sorted_fields_type_new (int n) +{ + struct sorted_fields_type *sft; + sft = ggc_alloc_sorted_fields_type (sizeof (struct sorted_fields_type) + + n * sizeof (tree)); + sft->len = n; + + return sft; +} + + /* Perform processing required when the definition of T (a class type) is complete. */ @@ -5792,9 +5808,7 @@ finish_struct_1 (tree t) n_fields = count_fields (TYPE_FIELDS (t)); if (n_fields >= 7) { - struct sorted_fields_type *field_vec = ggc_alloc_sorted_fields_type -(sizeof (struct sorted_fields_type) + n_fields * sizeof (tree)); - field_vec->len = n_fields; + struct sorted_fields_type *field_vec = sorted_fields_type_new (n_fields); add_fields_to_record_type (TYPE_FIELDS (t), field_vec, 0); qsort (field_vec->elts, n_fields, sizeof (tree), field_decl_cmp); -- 1.7.3.1 -- This patch is available for review at http://codereview.appspot.com/5253050
Add new debugging routines to the C++ parser (issue5232053)
I added this code while learning my way through the parser. It dumps most of the internal parser state. It also changes the lexer dumper to support dumping a window of tokens and highlighting a specific token when dumping. Tested on x86_64. OK for trunk? Diego. * parser.c: Remove ENABLE_CHECKING markers around debugging routines. (cp_lexer_dump_tokens): Add arguments START_TOKEN and CURR_TOKEN. Make static When printing CURR_TOKEN surround it in [[ ]]. Start printing at START_TOKEN. Update all users. (cp_debug_print_tree_if_set): New. (cp_debug_print_context): New. (cp_debug_print_context_stack): New. (cp_debug_print_flag): New. (cp_debug_print_unparsed_function): New. (cp_debug_print_unparsed_queues): New. (cp_debug_parser_tokens): New. (cp_debug_parser): New. (cp_lexer_start_debugging): Set cp_lexer_debug_stream to stderr. (cp_lexer_stop_debugging): Set cp_lexer_debug_stream to NULL. * parser.h (cp_lexer_dump_tokens): Remove declaration. (cp_debug_parser): Declare. diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index cabe9aa..48d92bb 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -210,7 +210,6 @@ static void cp_lexer_commit_tokens (cp_lexer *); static void cp_lexer_rollback_tokens (cp_lexer *); -#ifdef ENABLE_CHECKING static void cp_lexer_print_token (FILE *, cp_token *); static inline bool cp_lexer_debugging_p @@ -219,15 +218,6 @@ static void cp_lexer_start_debugging (cp_lexer *) ATTRIBUTE_UNUSED; static void cp_lexer_stop_debugging (cp_lexer *) ATTRIBUTE_UNUSED; -#else -/* If we define cp_lexer_debug_stream to NULL it will provoke warnings - about passing NULL to functions that require non-NULL arguments - (fputs, fprintf). It will never be used, so all we need is a value - of the right type that's guaranteed not to be NULL. 
*/ -#define cp_lexer_debug_stream stdout -#define cp_lexer_print_token(str, tok) (void) 0 -#define cp_lexer_debugging_p(lexer) 0 -#endif /* ENABLE_CHECKING */ static cp_token_cache *cp_token_cache_new (cp_token *, cp_token *); @@ -241,33 +231,64 @@ static void cp_parser_initial_pragma /* Variables. */ -#ifdef ENABLE_CHECKING /* The stream to which debugging output should be written. */ static FILE *cp_lexer_debug_stream; -#endif /* ENABLE_CHECKING */ /* Nonzero if we are parsing an unevaluated operand: an operand to sizeof, typeof, or alignof. */ int cp_unevaluated_operand; -#ifdef ENABLE_CHECKING -/* Dump up to NUM tokens in BUFFER to FILE. If NUM is 0, dump all the - tokens. */ +/* Dump up to NUM tokens in BUFFER to FILE starting with token + START_TOKEN. If START_TOKEN is NULL, the dump starts with the + first token in BUFFER. If NUM is 0, dump all the tokens. If + CURR_TOKEN is set and it is one of the tokens in BUFFER, it will be + highlighted by surrounding it in [[ ]]. */ -void -cp_lexer_dump_tokens (FILE *file, VEC(cp_token,gc) *buffer, unsigned num) +static void +cp_lexer_dump_tokens (FILE *file, VEC(cp_token,gc) *buffer, + cp_token *start_token, unsigned num, + cp_token *curr_token) { - unsigned i; + unsigned i, nprinted; cp_token *token; + bool do_print; fprintf (file, "%u tokens\n", VEC_length (cp_token, buffer)); + if (buffer == NULL) +return; + if (num == 0) num = VEC_length (cp_token, buffer); - for (i = 0; VEC_iterate (cp_token, buffer, i, token) && i < num; i++) + if (start_token == NULL) +start_token = VEC_address (cp_token, buffer); + + if (start_token > VEC_address (cp_token, buffer)) +{ + cp_lexer_print_token (file, VEC_index (cp_token, buffer, 0)); + fprintf (file, "... "); +} + + do_print = false; + nprinted = 0; + for (i = 0; VEC_iterate (cp_token, buffer, i, token) && nprinted < num; i++) { + if (token == start_token) + do_print = true; + + if (!do_print) + continue; + + nprinted++; + if (token == curr_token) + fprintf (file, "[["); + cp_lexer_print_token (file, token); + + if (token == curr_token) + fprintf (file, "]]"); + switch (token->type) { case CPP_SEMICOLON: @@ -298,9 +319,218 @@ cp_lexer_dump_tokens (FILE *file, VEC(cp_token,gc) *buffer, unsigned num) void cp_lexer_debug_tokens (VEC(cp_token,gc) *buffer) { - cp_lexer_dump_tokens (stderr, buffer, 0); + cp_lexer_dump_tokens (stderr, buffer, NULL, 0, NULL); +} + + +/* Dump the cp_parser tree field T to FILE if T is non-NULL. DESC is the + description for T. */ + +static void +cp_debug_print_tree_if_set (FILE *file, const char *desc, tree t) +{ + if (t) +{ + fprintf (file, "%s: ", desc); + print_node_brief (file, "", t, 0); +} +} + + +/* Dump parser context C to FILE. */ + +static void +cp_debug_print_context (FILE *file, cp_parser_context *c) +{ + const char *status_s[] = {
[PATCH] Fix number of arguments in call to alloca_with_align
Richard, This patch fixes a trivial problem in gimplify_parameters, introduced by the patch that introduced BUILT_IN_ALLOCA_WITH_ALIGN. BUILT_IN_ALLOCA_WITH_ALIGN has 2 parameters, so the number of arguments in the corresponding build_call_expr should be 2, not 1. Bootstrapped and reg-tested (including Ada) on x86_64. OK for trunk? Thanks, - Tom 2011-10-12 Tom de Vries t...@codesourcery.com * function.c (gimplify_parameters): Set number of arguments of call to BUILT_IN_ALLOCA_WITH_ALIGN to 2. Index: gcc/function.c === --- gcc/function.c (revision 179773) +++ gcc/function.c (working copy) @@ -3636,7 +3636,7 @@ gimplify_parameters (void) local = build_fold_indirect_ref (addr); t = built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN]; - t = build_call_expr (t, 1, DECL_SIZE_UNIT (parm), + t = build_call_expr (t, 2, DECL_SIZE_UNIT (parm), size_int (DECL_ALIGN (parm))); /* The call has been built for a variable-sized object. */
Re: [PATCH] Slightly fix up vgather* patterns (take 2)
On 10/12/2011 11:25 AM, Jakub Jelinek wrote: * config/i386/sse.md (avx2_gathersimode, avx2_gatherdimode, avx2_gatherdimode256): Add clobber of match_scratch, change memory_operand to register_operand, add (mem:BLK (scratch)) use. (*avx2_gathersimode, *avx2_gatherdimode, *avx2_gatherdimode256): Add clobber of match_scratch, add earlyclobber to the output operand and match_scratch, add (mem:BLK (scratch)) use, change the other mem to match_operand. Use %p6 instead of %c6 in the pattern. * config/i386/i386.c (ix86_expand_builtin): Adjust for operand 2 being a Pmode register_operand instead of memory_operand. Ok. It looks like these 4 patterns could be macro-ized some more. But that can wait for a follow-up. r~
Re: [PATCH] Add mulv32qi3 support
On 10/12/2011 09:24 AM, Jakub Jelinek wrote: BTW, I wonder if vector multiply expansion when one argument is a VECTOR_CST with all elements the same shouldn't use something similar to what expand_mult does, not sure if in the generic code or at least in the backends. Testing the costs will be harder, maybe it could just test fewer algorithms and perhaps just count the number of instructions or something similar. But certainly e.g. v32qi multiplication by 3 is quite costly (4 interleaves, 2 v16hi multiplications, 4 insns to select even from the two), while two vector additions (tmp = x + x; result = x + tmp;) would do the job. It would certainly be a good thing to try to do this in the middle-end. 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/sse.md (vec_avx2): New mode_attr. (mulv16qi3): Macroize to cover also mulv32qi3 for TARGET_AVX2 into ... (mul<mode>3): ... this. Ok. r~
Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.
From: Eric Botcazou ebotca...@adacore.com Date: Wed, 12 Oct 2011 00:33:43 +0200 I see, so we can test the code generation in the testsuite even if the compiler was built against an assembler without support for the instructions. At least partially, yes. But in such a case, I'm unsure if I understand why i386.exp needs these tests at all. The presence of support for a particular i386 intrinsic is an implicit property of the gcc sources that these test cases are a part of. If the tests are properly added only once the code to support the i386 intrinsic is added as well, the checks seem superfluous. The check is an _object_ check, for example: ... So the first category of tests will always be executed, whereas the latter two will only be executed if you have the binutils support. Thanks a lot for explaining things. I'm currently testing the following patch in various scenarios, I'm pretty sure this is what you had in mind. Any feedback is appreciated, thanks again Eric. diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c index 9c7cc56..fa790b3 100644 --- a/gcc/config/sparc/sparc.c +++ b/gcc/config/sparc/sparc.c @@ -850,7 +850,11 @@ sparc_option_override (void) cpu = cpu_table[(int) sparc_cpu_and_features]; target_flags &= ~cpu->disable; - target_flags |= cpu->enable; + target_flags |= (cpu->enable +#ifndef HAVE_AS_FMAF_HPC_VIS3 + & ~(MASK_FMAF | MASK_VIS3) +#endif + ); /* If -mfpu or -mno-fpu was explicitly used, don't override with the processor default. 
*/ diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h index 0642ff2..669f106 100644 --- a/gcc/config/sparc/sparc.h +++ b/gcc/config/sparc/sparc.h @@ -1871,10 +1871,6 @@ extern int sparc_indent_opcode; #ifndef HAVE_AS_FMAF_HPC_VIS3 #define AS_NIAGARA3_FLAG "b" -#undef TARGET_FMAF -#define TARGET_FMAF 0 -#undef TARGET_VIS3 -#define TARGET_VIS3 0 #else #define AS_NIAGARA3_FLAG "d" #endif diff --git a/gcc/testsuite/gcc.target/sparc/cmask.c b/gcc/testsuite/gcc.target/sparc/cmask.c index 989274c..b3168ec 100644 --- a/gcc/testsuite/gcc.target/sparc/cmask.c +++ b/gcc/testsuite/gcc.target/sparc/cmask.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { vis3 } } } */ +/* { dg-do compile } */ /* { dg-options "-mcpu=niagara3 -mvis" } */ void test_cm8 (long x) diff --git a/gcc/testsuite/gcc.target/sparc/fhalve.c b/gcc/testsuite/gcc.target/sparc/fhalve.c index 737fc71..340b936 100644 --- a/gcc/testsuite/gcc.target/sparc/fhalve.c +++ b/gcc/testsuite/gcc.target/sparc/fhalve.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { vis3 } } } */ +/* { dg-do compile } */ /* { dg-options "-mcpu=niagara3 -mvis" } */ float test_fhadds (float x, float y) diff --git a/gcc/testsuite/gcc.target/sparc/fnegop.c b/gcc/testsuite/gcc.target/sparc/fnegop.c index 3e3e72c..25f8c19 100644 --- a/gcc/testsuite/gcc.target/sparc/fnegop.c +++ b/gcc/testsuite/gcc.target/sparc/fnegop.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { vis3 } } } */ +/* { dg-do compile } */ /* { dg-options "-O2 -mcpu=niagara3 -mvis" } */ float test_fnadds(float x, float y) diff --git a/gcc/testsuite/gcc.target/sparc/fpadds.c b/gcc/testsuite/gcc.target/sparc/fpadds.c index f55cb05..d0704e0 100644 --- a/gcc/testsuite/gcc.target/sparc/fpadds.c +++ b/gcc/testsuite/gcc.target/sparc/fpadds.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { vis3 } } } */ +/* { dg-do compile } */ /* { dg-options "-mcpu=niagara3 -mvis" } */ typedef int __v2si __attribute__((vector_size(8))); typedef int __v1si __attribute__((vector_size(4))); diff --git 
a/gcc/testsuite/gcc.target/sparc/fshift.c b/gcc/testsuite/gcc.target/sparc/fshift.c index 6adbed6..a12df04 100644 --- a/gcc/testsuite/gcc.target/sparc/fshift.c +++ b/gcc/testsuite/gcc.target/sparc/fshift.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { vis3 } } } */ +/* { dg-do compile } */ /* { dg-options "-mcpu=niagara3 -mvis" } */ typedef int __v2si __attribute__((vector_size(8))); typedef short __v4hi __attribute__((vector_size(8))); diff --git a/gcc/testsuite/gcc.target/sparc/fucmp.c b/gcc/testsuite/gcc.target/sparc/fucmp.c index 4e7ecad..7f291c3 100644 --- a/gcc/testsuite/gcc.target/sparc/fucmp.c +++ b/gcc/testsuite/gcc.target/sparc/fucmp.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { vis3 } } } */ +/* { dg-do compile } */ /* { dg-options "-mcpu=niagara3 -mvis" } */ typedef unsigned char vec8 __attribute__((vector_size(8))); diff --git a/gcc/testsuite/gcc.target/sparc/lzd.c b/gcc/testsuite/gcc.target/sparc/lzd.c index 5ffaf56..a897829 100644 --- a/gcc/testsuite/gcc.target/sparc/lzd.c +++ b/gcc/testsuite/gcc.target/sparc/lzd.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { vis3 } } } */ +/* { dg-do compile } */ /* { dg-options "-mcpu=niagara3" } */ int test_clz(int a) { diff --git a/gcc/testsuite/gcc.target/sparc/vis3misc.c b/gcc/testsuite/gcc.target/sparc/vis3misc.c index e3ef49e..8a9535e 100644 --- a/gcc/testsuite/gcc.target/sparc/vis3misc.c +++
Rename some fields in struct language_function. (issue5229058)
This patch is needed in the pph branch because streamers need access to the fields in struct language_function without going through cp_function_chain. Since these fields are named exactly like their #define counterparts, we cannot reference them without the pre-processor expanding the #defines, which causes build errors. OK for trunk? Tested on x86_64. Diego. * cp-tree.h (struct language_function): Rename returns_value to x_returns_value. Rename returns_null to x_returns_null. Rename returns_abnormally to x_returns_abnormally. Rename in_function_try_handler to x_in_function_try_handler. Rename in_base_initializer to x_in_base_initializer. Update all users. diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index b53accf..a163cd2 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -1050,11 +1050,11 @@ struct GTY(()) language_function { tree x_vtt_parm; tree x_return_value; - BOOL_BITFIELD returns_value : 1; - BOOL_BITFIELD returns_null : 1; - BOOL_BITFIELD returns_abnormally : 1; - BOOL_BITFIELD in_function_try_handler : 1; - BOOL_BITFIELD in_base_initializer : 1; + BOOL_BITFIELD x_returns_value : 1; + BOOL_BITFIELD x_returns_null : 1; + BOOL_BITFIELD x_returns_abnormally : 1; + BOOL_BITFIELD x_in_function_try_handler : 1; + BOOL_BITFIELD x_in_base_initializer : 1; /* True if this function can throw an exception. */ BOOL_BITFIELD can_throw : 1; @@ -1107,23 +1107,23 @@ struct GTY(()) language_function { /* Set to 0 at beginning of a function definition, set to 1 if a return statement that specifies a return value is seen. */ -#define current_function_returns_value cp_function_chain->returns_value +#define current_function_returns_value cp_function_chain->x_returns_value /* Set to 0 at beginning of a function definition, set to 1 if a return statement with no argument is seen. 
*/ -#define current_function_returns_null cp_function_chain->returns_null +#define current_function_returns_null cp_function_chain->x_returns_null /* Set to 0 at beginning of a function definition, set to 1 if a call to a noreturn function is seen. */ #define current_function_returns_abnormally \ - cp_function_chain->returns_abnormally + cp_function_chain->x_returns_abnormally /* Nonzero if we are processing a base initializer. Zero elsewhere. */ -#define in_base_initializer cp_function_chain->in_base_initializer +#define in_base_initializer cp_function_chain->x_in_base_initializer -#define in_function_try_handler cp_function_chain->in_function_try_handler +#define in_function_try_handler cp_function_chain->x_in_function_try_handler /* Expression always returned from function, or error_mark_node otherwise, for use by the automatic named return value optimization. */ -- This patch is available for review at http://codereview.appspot.com/5229058
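The macro-capture problem described above can be reproduced in isolation; a minimal sketch (only the `in_base_initializer` macro mirrors cp-tree.h, the surrounding names are simplified for illustration):

```c
/* With the accessor macro in scope, a field spelled the same way can
   never be accessed directly: lf->in_base_initializer would expand to
   lf->cp_function_chain->x_in_base_initializer and fail to compile.
   Renaming the field with an x_ prefix keeps the two names distinct.  */
struct language_function { int x_in_base_initializer; };
static struct language_function *cp_function_chain;
#define in_base_initializer cp_function_chain->x_in_base_initializer

static int
read_field_directly (struct language_function *lf)
{
  return lf->x_in_base_initializer;  /* fine: no macro by this name */
}
```

This is exactly what a streamer needs: direct field access on an arbitrary `struct language_function *`, independent of whatever `cp_function_chain` currently points at.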
Re: Out-of-order update of new_spill_reg_store[]
Bernd Schmidt ber...@codesourcery.com writes: On 10/11/11 14:35, Richard Sandiford wrote: No, reload 1 is inherited by a later instruction. And it's inherited correctly, in terms of the register contents being what we expect. (Reload 1 is the one that survives to the end of the instruction's reload sequence. Reload 2, in contrast, is clobbered by reload 1, so could not be inherited. So when we record inheritance information in emit_reload_insns, reload_reg_reaches_end_p correctly stops us from recording reload 2 but allows us to record reload 1.) The problem is that we record the wrong instruction for reload 1. We say that reload 1 is performed by the instruction that performs reload 2. So spill_reg_store[] contains the instruction for reload 2 rather than the instruction for reload 1. We delete it in delete_output_reload at the point of inheritance. Ok. So, would the minimal fix of testing !new_spill_reg_store[..] before writing to it also work? Seems to me this would cope with the out-of-order writes by only allowing the first. If so, then I think I'd prefer that, but we could gcc_assert (reload_reg_reaches_end (..)) as a bit of a verification of that function. I don't think the assert would be safe. We could have similar reuse in cases where the first reload (in rld order) is a double-register value starting in $4 and the second reload uses just $5. In that case, the first reload will have set new_spill_reg_store[4], so new_spill_reg_store[5] will still be null. But $5 in the second reload won't survive until the end of the sequence. So we'd try to set new_spill_reg_store[5] and trip the assert. IMO it's a choice between just checking for null and not asserting (if that really is safe, since we'll be storing instructions that don't actually reach the end of the reload sequence), or checking reload_reg_reaches_end. I prefer the second, since it seems more direct, and matches the corresponding code in emit_reload_insns. Richard
Re: Rename some fields in struct language_function. (issue5229058)
On 10/12/2011 04:48 PM, Diego Novillo wrote: This patch is needed in the pph branch because streamers need access to the fields in struct language_function without going through cp_function_chain. Since these fields are named exactly like their #define counterparts, we cannot reference them without the pre-processor expanding the #defines, which causes build errors. -#define current_function_returns_value cp_function_chain->returns_value +#define current_function_returns_value cp_function_chain->x_returns_value -#define current_function_returns_null cp_function_chain->returns_null +#define current_function_returns_null cp_function_chain->x_returns_null #define current_function_returns_abnormally \ - cp_function_chain->returns_abnormally + cp_function_chain->x_returns_abnormally Doesn't seem necessary for these three. OK for in_*. Jason
Re: Factor out allocation of sorted_fields (issue5253050)
OK. Jason
Re: int_cst_hash_table mapping persistence and the garbage collector
I think there is an issue when two cache htabs refer to each other with respect to GC, you might search the list to find out more. I'm not sure this is the case here, there seems to be a clear hierarchy. -- Eric Botcazou
Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.
I'm currently testing the following patch in various scenerios, I'm pretty sure this is what you had in mind. Yes, this seems to go in the right direction. Don't you need to pass -mvis3 instead of -mvis? Do you need to pass -mcpu=niagara3 at all? -- Eric Botcazou
Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.
From: Eric Botcazou ebotca...@adacore.com Date: Wed, 12 Oct 2011 23:08:39 +0200 I'm currently testing the following patch in various scenerios, I'm pretty sure this is what you had in mind. Yes, this seems to go in the right direction. Don't you need to pass -mvis3 instead of -mvis? Do you need to pass -mcpu=niagara3 at all? Yes, I need to correct the testcase flags now. I just noticed this while testing. I will post a finalized patch later tonight.
[PATCH] AVX2 vector permutation fixes plus vec_pack_trunc_{v16hi,v8si,v4di} support (take 2)
On Wed, Oct 12, 2011 at 10:49:33AM -0700, Richard Henderson wrote: I believe I've commented on everything else in the previous messages. Here is an updated patch which should incorporate your comments from both mails (thanks for them). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/i386.md (UNSPEC_VPERMDI): Remove. * config/i386/i386.c (ix86_expand_vec_perm): Handle V16QImode and V32QImode for TARGET_AVX2. (MAX_VECT_LEN): Increase to 32. (expand_vec_perm_blend): Add support for 32-byte integer vectors with TARGET_AVX2. (valid_perm_using_mode_p): New function. (expand_vec_perm_pshufb): Add support for 32-byte integer vectors with TARGET_AVX2. (expand_vec_perm_vpshufb2_vpermq): New function. (expand_vec_perm_vpshufb2_vpermq_even_odd): New function. (expand_vec_perm_even_odd_1): Handle 32-byte integer vectors with TARGET_AVX2. (ix86_expand_vec_perm_builtin_1): Try expand_vec_perm_vpshufb2_vpermq and expand_vec_perm_vpshufb2_vpermq_even_odd. * config/i386/sse.md (VEC_EXTRACT_EVENODD_MODE): Add for TARGET_AVX2 32-byte integer vector modes. (vec_pack_trunc_<mode>): Use VI248_AVX2 instead of VI248_128. (avx2_interleave_highv32qi, avx2_interleave_lowv32qi): Remove pasto. (avx2_pshufdv3, avx2_pshuflwv3, avx2_pshufhwv3): Generate 4 new operands. (avx2_pshufd_1, avx2_pshuflw_1, avx2_pshufhw_1): Don't use match_dup, instead add 4 new operands and require they have right cross-lane values. (avx2_permv4di): Change into define_expand. (avx2_permv4di_1): New instruction. (avx2_permv2ti): Use nonimmediate_operand instead of register_operand for the "xm" constrained operand. (VEC_PERM_AVX2): Add V32QI and V16QI for TARGET_AVX2. 
--- gcc/config/i386/i386.md.jj 2011-10-12 20:28:19.0 +0200 +++ gcc/config/i386/i386.md 2011-10-12 20:30:00.0 +0200 @@ -235,7 +235,6 @@ (define_c_enum unspec [ UNSPEC_VPERMSI UNSPEC_VPERMDF UNSPEC_VPERMSF - UNSPEC_VPERMDI UNSPEC_VPERMTI UNSPEC_GATHER --- gcc/config/i386/i386.c.jj 2011-10-12 20:28:19.0 +0200 +++ gcc/config/i386/i386.c 2011-10-12 21:01:39.0 +0200 @@ -19334,7 +19334,7 @@ ix86_expand_vec_perm (rtx operands[]) rtx op0 = operands[1]; rtx op1 = operands[2]; rtx mask = operands[3]; - rtx t1, t2, vt, vec[16]; + rtx t1, t2, t3, t4, vt, vt2, vec[32]; enum machine_mode mode = GET_MODE (op0); enum machine_mode maskmode = GET_MODE (mask); int w, e, i; @@ -19343,50 +19343,68 @@ ix86_expand_vec_perm (rtx operands[]) /* Number of elements in the vector. */ w = GET_MODE_NUNITS (mode); e = GET_MODE_UNIT_SIZE (mode); - gcc_assert (w <= 16); + gcc_assert (w <= 32); if (TARGET_AVX2) { - if (mode == V4DImode || mode == V4DFmode) + if (mode == V4DImode || mode == V4DFmode || mode == V16HImode) { /* Unfortunately, the VPERMQ and VPERMPD instructions only support a constant shuffle operand. With a tiny bit of effort we can use VPERMD instead. A re-interpretation stall for V4DFmode is -unfortunate but there's no avoiding it. */ - t1 = gen_reg_rtx (V8SImode); +unfortunate but there's no avoiding it. +Similarly for V16HImode we don't have instructions for variable +shuffling, while for V32QImode we can use after preparing suitable +masks vpshufb; vpshufb; vpermq; vpor. */ + + if (mode == V16HImode) + { + maskmode = mode = V32QImode; + w = 32; + e = 1; + } + else + { + maskmode = mode = V8SImode; + w = 8; + e = 4; + } + t1 = gen_reg_rtx (maskmode); /* Replicate the low bits of the V4DImode mask into V8SImode: mask = { A B C D } t1 = { A A B B C C D D }. 
*/ - for (i = 0; i < 4; ++i) + for (i = 0; i < w / 2; ++i) vec[i*2 + 1] = vec[i*2] = GEN_INT (i * 2); - vt = gen_rtx_CONST_VECTOR (V8SImode, gen_rtvec_v (8, vec)); - vt = force_reg (V8SImode, vt); - mask = gen_lowpart (V8SImode, mask); - emit_insn (gen_avx2_permvarv8si (t1, vt, mask)); + vt = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec)); + vt = force_reg (maskmode, vt); + mask = gen_lowpart (maskmode, mask); + if (maskmode == V8SImode) + emit_insn (gen_avx2_permvarv8si (t1, vt, mask)); + else + emit_insn (gen_avx2_pshufbv32qi3 (t1, mask, vt)); /* Multiply the shuffle indicies by two. */ - emit_insn (gen_avx2_lshlv8si3 (t1, t1, const1_rtx)); + t1 = expand_simple_binop (maskmode, PLUS, t1, t1, t1, 1, +
[PATCH] Add VEC_UNPACK_{HI,LO}_EXPR support for V{32QI,16HI,8SI} with AVX2
Hi! This patch makes it possible to vectorize char a[1024], c[1024]; long long b[1024]; void foo (void) { int i; for (i = 0; i < 1024; i++) b[i] = a[i] + 3 * c[i]; } using 32-byte vectors with -mavx2. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/sse.md (vec_unpacks_lo_<mode>, vec_unpacks_hi_<mode>, vec_unpacku_lo_<mode>, vec_unpacku_hi_<mode>): Change VI124_128 mode to VI124_AVX2. * config/i386/i386.c (ix86_expand_sse_unpack): Handle V32QImode, V16HImode and V8SImode for TARGET_AVX2. --- gcc/config/i386/sse.md.jj 2011-10-12 15:42:12.0 +0200 +++ gcc/config/i386/sse.md 2011-10-12 16:16:49.0 +0200 @@ -7536,25 +7536,25 @@ (define_insn "vec_concatv2di" (define_expand "vec_unpacks_lo_<mode>" [(match_operand:<sseunpackmode> 0 "register_operand" "") - (match_operand:VI124_128 1 "register_operand" "")] + (match_operand:VI124_AVX2 1 "register_operand" "")] "TARGET_SSE2" "ix86_expand_sse_unpack (operands, false, false); DONE;") (define_expand "vec_unpacks_hi_<mode>" [(match_operand:<sseunpackmode> 0 "register_operand" "") - (match_operand:VI124_128 1 "register_operand" "")] + (match_operand:VI124_AVX2 1 "register_operand" "")] "TARGET_SSE2" "ix86_expand_sse_unpack (operands, false, true); DONE;") (define_expand "vec_unpacku_lo_<mode>" [(match_operand:<sseunpackmode> 0 "register_operand" "") - (match_operand:VI124_128 1 "register_operand" "")] + (match_operand:VI124_AVX2 1 "register_operand" "")] "TARGET_SSE2" "ix86_expand_sse_unpack (operands, true, false); DONE;") (define_expand "vec_unpacku_hi_<mode>" [(match_operand:<sseunpackmode> 0 "register_operand" "") - (match_operand:VI124_128 1 "register_operand" "")] + (match_operand:VI124_AVX2 1 "register_operand" "")] "TARGET_SSE2" "ix86_expand_sse_unpack (operands, true, true); DONE;") --- gcc/config/i386/i386.c.jj 2011-10-12 14:19:26.0 +0200 +++ gcc/config/i386/i386.c 2011-10-12 16:15:50.0 +0200 @@ -19658,9 +19658,38 @@ ix86_expand_sse_unpack (rtx operands[2], if (TARGET_SSE4_1) { rtx (*unpack)(rtx, rtx); + rtx (*extract)(rtx, rtx) = NULL; + enum 
machine_mode halfmode = BLKmode; switch (imode) { + case V32QImode: + if (unsigned_p) + unpack = gen_avx2_zero_extendv16qiv16hi2; + else + unpack = gen_avx2_sign_extendv16qiv16hi2; + halfmode = V16QImode; + extract + = high_p ? gen_vec_extract_hi_v32qi : gen_vec_extract_lo_v32qi; + break; + case V16HImode: + if (unsigned_p) + unpack = gen_avx2_zero_extendv8hiv8si2; + else + unpack = gen_avx2_sign_extendv8hiv8si2; + halfmode = V8HImode; + extract + = high_p ? gen_vec_extract_hi_v16hi : gen_vec_extract_lo_v16hi; + break; + case V8SImode: + if (unsigned_p) + unpack = gen_avx2_zero_extendv4siv4di2; + else + unpack = gen_avx2_sign_extendv4siv4di2; + halfmode = V4SImode; + extract + = high_p ? gen_vec_extract_hi_v8si : gen_vec_extract_lo_v8si; + break; case V16QImode: if (unsigned_p) unpack = gen_sse4_1_zero_extendv8qiv8hi2; @@ -19683,7 +19712,12 @@ ix86_expand_sse_unpack (rtx operands[2], gcc_unreachable (); } - if (high_p) + if (GET_MODE_SIZE (imode) == 32) + { + tmp = gen_reg_rtx (halfmode); + emit_insn (extract (tmp, operands[1])); + } + else if (high_p) { /* Shift higher 8 bytes to lower 8 bytes. */ tmp = gen_reg_rtx (imode); Jakub
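For reference, the unpack operations being wired up here have simple element-wise semantics; a scalar model of the signed 32-byte case (the function name is invented for illustration):

```c
/* Scalar model of vec_unpacks_lo_v32qi / vec_unpacks_hi_v32qi:
   sign-extend the low (or high) 16 chars of a 32-element vector into
   16 halfwords, matching the extract-half-then-sign-extend sequence
   that ix86_expand_sse_unpack emits for 32-byte AVX2 vectors above.  */
void
unpacks_v32qi (const signed char in[32], short out[16], int high_p)
{
  int i;
  for (i = 0; i < 16; i++)
    out[i] = in[i + (high_p ? 16 : 0)];
}
```

The unsigned (`vec_unpacku_*`) variants are identical except that the input elements are zero-extended instead of sign-extended.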
Re: New warning for expanded vector operations
This patch fixes PR50704. gcc/testsuite: * gcc.target/i386/warn-vect-op-3.c: Exclude ia32 target. * gcc.target/i386/warn-vect-op-1.c: Ditto. * gcc.target/i386/warn-vect-op-2.c: Ditto. Ok for trunk? Artem. On Wed, Oct 12, 2011 at 4:40 PM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Oct 11, 2011 at 9:11 AM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: Committed with the revision 179807. This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50704 -- H.J. fix-performance-tests.diff
Re: [PATCH] AVX2 vector permutation fixes plus vec_pack_trunc_{v16hi,v8si,v4di} support (take 2)
On 10/12/2011 02:23 PM, Jakub Jelinek wrote: 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/i386.md (UNSPEC_VPERMDI): Remove. * config/i386/i386.c (ix86_expand_vec_perm): Handle V16QImode and V32QImode for TARGET_AVX2. (MAX_VECT_LEN): Increase to 32. (expand_vec_perm_blend): Add support for 32-byte integer vectors with TARGET_AVX2. (valid_perm_using_mode_p): New function. (expand_vec_perm_pshufb): Add support for 32-byte integer vectors with TARGET_AVX2. (expand_vec_perm_vpshufb2_vpermq): New function. (expand_vec_perm_vpshufb2_vpermq_even_odd): New function. (expand_vec_perm_even_odd_1): Handle 32-byte integer vectors with TARGET_AVX2. (ix86_expand_vec_perm_builtin_1): Try expand_vec_perm_vpshufb2_vpermq and expand_vec_perm_vpshufb2_vpermq_even_odd. * config/i386/sse.md (VEC_EXTRACT_EVENODD_MODE): Add for TARGET_AVX2 32-byte integer vector modes. (vec_pack_trunc_<mode>): Use VI248_AVX2 instead of VI248_128. (avx2_interleave_highv32qi, avx2_interleave_lowv32qi): Remove pasto. (avx2_pshufdv3, avx2_pshuflwv3, avx2_pshufhwv3): Generate 4 new operands. (avx2_pshufd_1, avx2_pshuflw_1, avx2_pshufhw_1): Don't use match_dup, instead add 4 new operands and require they have right cross-lane values. (avx2_permv4di): Change into define_expand. (avx2_permv4di_1): New instruction. (avx2_permv2ti): Use nonimmediate_operand instead of register_operand for the "xm" constrained operand. (VEC_PERM_AVX2): Add V32QI and V16QI for TARGET_AVX2. Ok. r~
Re: [PATCH] Add VEC_UNPACK_{HI,LO}_EXPR support for V{32QI,16HI,8SI} with AVX2
On 10/12/2011 02:28 PM, Jakub Jelinek wrote: 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/sse.md (vec_unpacks_lo_mode, vec_unpacks_hi_mode, vec_unpacku_lo_mode, vec_unpacku_hi_mode): Change VI124_128 mode to VI124_AVX2. * config/i386/i386.c (ix86_expand_sse_unpack): Handle V32QImode, V16HImode and V8SImode for TARGET_AVX2. Ok. r~
[Ada] Enable vectorization for loops with dynamic bounds
Loops with static bounds are reasonably well vectorized in Ada. Problems arise when things start to go dynamic, because of the dynamic bounds themselves but also because of the checks. This patch is a first step towards enabling more vectorization in the dynamic cases. The generated code isn't pretty though... Tested on i586-suse-linux, applied on the mainline. 2011-10-12 Eric Botcazou ebotca...@adacore.com * gcc-interface/ada-tree.h (DECL_LOOP_PARM_P): New flag. (DECL_INDUCTION_VAR): New macro. (SET_DECL_INDUCTION_VAR): Likewise. * gcc-interface/gigi.h (convert_to_index_type): Declare. (gnat_invariant_expr): Likewise. * gcc-interface/decl.c (gnat_to_gnu_entity) <object>: If this is a loop parameter, set DECL_LOOP_PARM_P on it. * gcc-interface/misc.c (gnat_print_decl) <VAR_DECL>: If DECL_LOOP_PARM_P is set, print DECL_INDUCTION_VAR instead of DECL_RENAMED_OBJECT. * gcc-interface/trans.c (gnu_loop_label_stack): Delete. (struct range_check_info_d): New type. (struct loop_info_d): Likewise. (gnu_loop_stack): New stack. (Identifier_to_gnu): Set TREE_READONLY flag on the first dereference built for a by-double-ref read-only parameter. If DECL_LOOP_PARM_P is set, do not test DECL_RENAMED_OBJECT. (push_range_check_info): New function. (Loop_Statement_to_gnu): Push a new struct loop_info_d instead of just the label. Reference the label and the iteration variable from it. Build the special induction variable in the unsigned version of the size type, if it is larger than the base type. And attach it to the iteration variable if the latter isn't by-ref. In the iteration scheme case, initialize the invariant conditions in front of the loop if deemed profitable. Use gnu_loop_stack. (gnat_to_gnu) <N_Exit_Statement>: Use gnu_loop_stack. <N_Raise_Constraint_Error>: Always process the reason. In the range check and related cases, and if loop unswitching is enabled, compute invariant conditions and push this information onto the stack. 
Do not translate the condition again if it has already been translated. * gcc-interface/utils.c (record_global_renaming_pointer): Assert that DECL_LOOP_PARM_P isn't set. (convert_to_index_type): New function. * gcc-interface/utils2.c (build_binary_op) <ARRAY_REF>: Use it in order to convert the index from the base index type to sizetype. (gnat_invariant_expr): New function. 2011-10-12 Eric Botcazou ebotca...@adacore.com * gnat.dg/vect1.ad[sb]: New test. * gnat.dg/vect1_pkg.ads: New helper. * gnat.dg/vect2.ad[sb]: New test. * gnat.dg/vect2_pkg.ads: New helper. * gnat.dg/vect3.ad[sb]: New test. * gnat.dg/vect3_pkg.ads: New helper. * gnat.dg/vect4.ad[sb]: New test. * gnat.dg/vect4_pkg.ads: New helper. * gnat.dg/vect5.ad[sb]: New test. * gnat.dg/vect5_pkg.ads: New helper. * gnat.dg/vect6.ad[sb]: New test. * gnat.dg/vect6_pkg.ads: New helper. -- Eric Botcazou Index: gcc-interface/utils.c === --- gcc-interface/utils.c (revision 179844) +++ gcc-interface/utils.c (working copy) @@ -1771,7 +1771,7 @@ process_attributes (tree decl, struct at void record_global_renaming_pointer (tree decl) { - gcc_assert (DECL_RENAMED_OBJECT (decl)); + gcc_assert (!DECL_LOOP_PARM_P (decl) && DECL_RENAMED_OBJECT (decl)); VEC_safe_push (tree, gc, global_renaming_pointers, decl); } @@ -4247,6 +4247,92 @@ convert (tree type, tree expr) gcc_unreachable (); } } + +/* Create an expression whose value is that of EXPR converted to the common + index type, which is sizetype. EXPR is supposed to be in the base type + of the GNAT index type. Calling it is equivalent to doing + + convert (sizetype, expr) + + but we try to distribute the type conversion with the knowledge that EXPR + cannot overflow in its type. This is a best-effort approach and we fall + back to the above expression as soon as difficulties are encountered. 
+ + This is necessary to overcome issues that arise when the GNAT base index + type and the GCC common index type (sizetype) don't have the same size, + which is quite frequent on 64-bit architectures. In this case, and if + the GNAT base index type is signed but the iteration type of the loop has + been forced to unsigned, the loop scalar evolution engine cannot compute + a simple evolution for the general induction variables associated with the + array indices, because it will preserve the wrap-around semantics in the + unsigned type of their inner part. As a result, many loop optimizations + are blocked. + + The solution is to use a special (basic) induction variable that is at + least as large as
[Ada] Housekeeping work in gigi (39/n)
Tested on i586-suse-linux, applied on the mainline. 2011-10-12 Eric Botcazou ebotca...@adacore.com * gcc-interface/trans.c (Attribute_to_gnu): Use remove_conversions. (push_range_check_info): Likewise. (gnat_to_gnu) <N_Code_Statement>: Likewise. * gcc-interface/utils2.c (build_unary_op) <INDIRECT_REF>: Likewise. (gnat_invariant_expr): Likewise. * gcc-interface/utils.c (compute_related_constant): Likewise. (max_size): Fix handling of SAVE_EXPR. (remove_conversions): Fix formatting. -- Eric Botcazou Index: gcc-interface/utils.c === --- gcc-interface/utils.c (revision 179868) +++ gcc-interface/utils.c (working copy) @@ -1147,11 +1147,11 @@ compute_related_constant (tree op0, tree static tree split_plus (tree in, tree *pvar) { - /* Strip NOPS in order to ease the tree traversal and maximize the - potential for constant or plus/minus discovery. We need to be careful + /* Strip conversions in order to ease the tree traversal and maximize the + potential for constant or plus/minus discovery. We need to be careful to always return and set *pvar to bitsizetype trees, but it's worth the effort. */ - STRIP_NOPS (in); + in = remove_conversions (in, false); *pvar = convert (bitsizetype, in); @@ -2288,7 +2288,9 @@ max_size (tree exp, bool max_p) switch (TREE_CODE_LENGTH (code)) { case 1: - if (code == NON_LVALUE_EXPR) + if (code == SAVE_EXPR) + return exp; + else if (code == NON_LVALUE_EXPR) return max_size (TREE_OPERAND (exp, 0), max_p); else return @@ -2330,9 +2332,7 @@ max_size (tree exp, bool max_p) } case 3: - if (code == SAVE_EXPR) - return exp; - else if (code == COND_EXPR) + if (code == COND_EXPR) return fold_build2 (max_p ? 
MAX_EXPR : MIN_EXPR, type, max_size (TREE_OPERAND (exp, 1), max_p), max_size (TREE_OPERAND (exp, 2), max_p)); @@ -4359,8 +4359,9 @@ remove_conversions (tree exp, bool true_ return remove_conversions (TREE_OPERAND (exp, 0), true_address); break; -case VIEW_CONVERT_EXPR: case NON_LVALUE_EXPR: CASE_CONVERT: +case VIEW_CONVERT_EXPR: +case NON_LVALUE_EXPR: return remove_conversions (TREE_OPERAND (exp, 0), true_address); default: Index: gcc-interface/utils2.c === --- gcc-interface/utils2.c (revision 179868) +++ gcc-interface/utils2.c (working copy) @@ -1277,13 +1277,8 @@ build_unary_op (enum tree_code op_code, case INDIRECT_REF: { - bool can_never_be_null; - tree t = operand; - - while (CONVERT_EXPR_P (t) || TREE_CODE (t) == VIEW_CONVERT_EXPR) - t = TREE_OPERAND (t, 0); - - can_never_be_null = DECL_P (t) && DECL_CAN_NEVER_BE_NULL_P (t); + tree t = remove_conversions (operand, false); + bool can_never_be_null = DECL_P (t) && DECL_CAN_NEVER_BE_NULL_P (t); /* If TYPE is a thin pointer, first convert to the fat pointer. */ if (TYPE_IS_THIN_POINTER_P (type) @@ -2608,16 +2603,13 @@ gnat_invariant_expr (tree expr) { tree type = TREE_TYPE (expr), t; - STRIP_NOPS (expr); + expr = remove_conversions (expr, false); while ((TREE_CODE (expr) == CONST_DECL || (TREE_CODE (expr) == VAR_DECL && TREE_READONLY (expr))) && decl_function_context (expr) == current_function_decl && DECL_INITIAL (expr)) -{ - expr = DECL_INITIAL (expr); - STRIP_NOPS (expr); -} +expr = remove_conversions (DECL_INITIAL (expr), false); if (TREE_CONSTANT (expr)) return fold_convert (type, expr); Index: gcc-interface/trans.c === --- gcc-interface/trans.c (revision 179868) +++ gcc-interface/trans.c (working copy) @@ -1364,10 +1364,7 @@ Attribute_to_gnu (Node_Id gnat_node, tre don't try to build a trampoline. 
*/ if (attribute == Attr_Code_Address) { - for (gnu_expr = gnu_result; - CONVERT_EXPR_P (gnu_expr); - gnu_expr = TREE_OPERAND (gnu_expr, 0)) - TREE_CONSTANT (gnu_expr) = 1; + gnu_expr = remove_conversions (gnu_result, false); if (TREE_CODE (gnu_expr) == ADDR_EXPR) TREE_NO_TRAMPOLINE (gnu_expr) = TREE_CONSTANT (gnu_expr) = 1; @@ -1378,10 +1375,7 @@ Attribute_to_gnu (Node_Id gnat_node, tre a useful warning with -Wtrampolines. */ else if (TREE_CODE (TREE_TYPE (gnu_prefix)) == FUNCTION_TYPE) { - for (gnu_expr = gnu_result; - CONVERT_EXPR_P (gnu_expr); - gnu_expr = TREE_OPERAND (gnu_expr, 0)) - ; + gnu_expr = remove_conversions (gnu_result, false); if (TREE_CODE (gnu_expr) == ADDR_EXPR && decl_function_context (TREE_OPERAND (gnu_expr, 0))) @@ -2156,8 +2150,7 @@ push_range_check_info (tree var) if (VEC_empty (loop_info, gnu_loop_stack)) return NULL; - while (CONVERT_EXPR_P (var) || TREE_CODE (var) == VIEW_CONVERT_EXPR) -var = TREE_OPERAND (var, 0); + var =
[rs6000] Enable scalar shifts of vectors
I suppose technically the middle-end could be improved to implement ashl<mode> as vashl<mode> by broadcasting the scalar, but Altivec is the only extant SIMD ISA that would make use of this. All of the others can arrange for constant shifts to be encoded into the insn, and so implement the ashl<mode> named pattern. Tested on ppc64-linux, --with-cpu=G5. Ok? r~ * config/rs6000/rs6000.c (rs6000_expand_vector_broadcast): New. * config/rs6000/rs6000-protos.h: Update. * config/rs6000/vector.md (ashl<VEC_I>3): New. (lshr<VEC_I>3, ashr<VEC_I>3): New. commit 63a6b475bcde403cc4e220827370e6ecea9aad33 Author: Richard Henderson r...@twiddle.net Date: Mon Oct 10 12:34:59 2011 -0700 rs6000: Implement scalar shifts of vectors. diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index 73da0f6..4dee23f 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -55,6 +55,7 @@ extern void rs6000_expand_vector_init (rtx, rtx); extern void paired_expand_vector_init (rtx, rtx); extern void rs6000_expand_vector_set (rtx, rtx, int); extern void rs6000_expand_vector_extract (rtx, rtx, int); +extern rtx rs6000_expand_vector_broadcast (enum machine_mode, rtx); extern void build_mask64_2_operands (rtx, rtx *); extern int expand_block_clear (rtx[]); extern int expand_block_move (rtx[]); diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 63c0f0c..786736d 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -4890,6 +4890,35 @@ rs6000_expand_vector_extract (rtx target, rtx vec, int elt) emit_move_insn (target, adjust_address_nv (mem, inner_mode, 0)); } +/* Broadcast an element to all parts of a vector, loaded into a register. + Used to turn vector shifts by a scalar into vector shifts by a vector. 
*/ + +rtx +rs6000_expand_vector_broadcast (enum machine_mode mode, rtx elt) +{ + rtx repl, vec[16]; + int i, n; + + n = GET_MODE_NUNITS (mode); + for (i = 0; i < n; ++i) +vec[i] = elt; + + if (CONSTANT_P (elt)) +{ + repl = gen_rtx_CONST_VECTOR (mode, gen_rtvec_v (n, vec)); + repl = force_reg (mode, repl); +} + else +{ + rtx par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (n, vec)); + repl = gen_reg_rtx (mode); + rs6000_expand_vector_init (repl, par); +} + + return repl; +} + + /* Generates shifts and masks for a pair of rldicl or rldicr insns to implement ANDing by the mask IN. */ void diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md index 0179cd9..24b473e 100644 --- a/gcc/config/rs6000/vector.md +++ b/gcc/config/rs6000/vector.md @@ -987,6 +987,16 @@ "TARGET_ALTIVEC" "") +(define_expand "ashl<mode>3" + [(set (match_operand:VEC_I 0 "vint_operand" "") + (ashift:VEC_I + (match_operand:VEC_I 1 "vint_operand" "") + (match_operand:<VEC_base> 2 "nonmemory_operand" "")))] + "TARGET_ALTIVEC" +{ + operands[2] = rs6000_expand_vector_broadcast (<MODE>mode, operands[2]); +}) + ;; Expanders for logical shift right on each vector element (define_expand "vlshr<mode>3" [(set (match_operand:VEC_I 0 "vint_operand" "") @@ -995,6 +1005,16 @@ "TARGET_ALTIVEC" "") +(define_expand "lshr<mode>3" + [(set (match_operand:VEC_I 0 "vint_operand" "") + (lshiftrt:VEC_I + (match_operand:VEC_I 1 "vint_operand" "") + (match_operand:<VEC_base> 2 "nonmemory_operand" "")))] + "TARGET_ALTIVEC" +{ + operands[2] = rs6000_expand_vector_broadcast (<MODE>mode, operands[2]); +}) + ;; Expanders for arithmetic shift right on each vector element (define_expand "vashr<mode>3" [(set (match_operand:VEC_I 0 "vint_operand" "") @@ -1002,6 +1022,16 @@ (match_operand:VEC_I 2 "vint_operand" "")))] "TARGET_ALTIVEC" "") + +(define_expand "ashr<mode>3" + [(set (match_operand:VEC_I 0 "vint_operand" "") + (ashiftrt:VEC_I + (match_operand:VEC_I 1 "vint_operand" "") + (match_operand:<VEC_base> 2 "nonmemory_operand" "")))] + "TARGET_ALTIVEC" +{ + operands[2] = rs6000_expand_vector_broadcast (<MODE>mode, 
operands[2]); +}) ;; Vector reduction expanders for VSX
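To make the semantics of the new expanders concrete: a scalar-shift pattern like ashl<mode>3 shifts every lane by the same count, and the expander realizes it by broadcasting the scalar and reusing the existing vector-by-vector shift. A minimal host-side sketch of that semantics, using GCC vector extensions (the function and type names here are illustrative, not part of the rs6000 backend):

```c
#include <stdint.h>

typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Reference semantics of ashl<mode>3 with a scalar count: broadcast the
   scalar into a vector, then do a lane-wise shift -- exactly what
   rs6000_expand_vector_broadcast plus vashl<mode>3 produce together.  */
static v4si
ashl_v4si_by_scalar (v4si x, int count)
{
  v4si vcount = { count, count, count, count };  /* the broadcast step */
  return x << vcount;                            /* lane-wise shift */
}

/* Read one lane portably.  */
static int32_t
lane (v4si v, int i)
{
  union { v4si v; int32_t a[4]; } u = { v };
  return u.a[i];
}

/* Self-check: {1,2,3,4} << 2 should give {4,8,12,16} in every lane.  */
static int
ashl_demo_ok (void)
{
  v4si x = { 1, 2, 3, 4 };
  v4si r = ashl_v4si_by_scalar (x, 2);
  return lane (r, 0) == 4 && lane (r, 1) == 8
         && lane (r, 2) == 12 && lane (r, 3) == 16;
}
```

This only mirrors the semantics; the actual expander of course emits the broadcast and the vector shift as RTL, not as C.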
Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.
From: David Miller da...@davemloft.net Date: Wed, 12 Oct 2011 17:14:59 -0400 (EDT) From: Eric Botcazou ebotca...@adacore.com Date: Wed, 12 Oct 2011 23:08:39 +0200 I'm currently testing the following patch in various scenarios, I'm pretty sure this is what you had in mind. Yes, this seems to go in the right direction. Don't you need to pass -mvis3 instead of -mvis? Do you need to pass -mcpu=niagara3 at all? Yes, I need to correct the testcase flags now. I just noticed this while testing. I will post a finalized patch later tonight. Ok, I tested that this does the right thing both with and without a vis3/fmaf-capable assembler. Committed to trunk. Eric, let me know if there are any further tweaks you'd like me to implement. Fix sparc when assembler lacks support for vis3/fmaf instructions. gcc/ * config/sparc/sparc.h: Do not force TARGET_VIS3 and TARGET_FMAF to zero when assembler lacks support for such instructions. * config/sparc/sparc.c (sparc_option_override): Clear MASK_VIS3 and MASK_FMAF in defaults when assembler lacks necessary support. gcc/testsuite/ * gcc.target/sparc/cmask.c: Remove 'vis3' target check and specify '-mvis3' instead of '-mcpu=niagara3' in options. * gcc.target/sparc/fhalve.c: Likewise. * gcc.target/sparc/fnegop.c: Likewise. * gcc.target/sparc/fpadds.c: Likewise. * gcc.target/sparc/fshift.c: Likewise. * gcc.target/sparc/fucmp.c: Likewise. * gcc.target/sparc/lzd.c: Likewise. * gcc.target/sparc/vis3misc.c: Likewise. * gcc.target/sparc/xmul.c: Likewise. 
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@179875 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog |7 +++ gcc/config/sparc/sparc.c |6 +- gcc/config/sparc/sparc.h |4 gcc/testsuite/ChangeLog | 13 + gcc/testsuite/gcc.target/sparc/cmask.c|4 ++-- gcc/testsuite/gcc.target/sparc/fhalve.c |4 ++-- gcc/testsuite/gcc.target/sparc/fnegop.c |4 ++-- gcc/testsuite/gcc.target/sparc/fpadds.c |4 ++-- gcc/testsuite/gcc.target/sparc/fshift.c |4 ++-- gcc/testsuite/gcc.target/sparc/fucmp.c|4 ++-- gcc/testsuite/gcc.target/sparc/lzd.c |4 ++-- gcc/testsuite/gcc.target/sparc/vis3misc.c |4 ++-- gcc/testsuite/gcc.target/sparc/xmul.c |4 ++-- 13 files changed, 43 insertions(+), 23 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index cdc9391..017594f 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,10 @@ +2011-10-12 David S. Miller da...@davemloft.net + + * config/sparc/sparc.h: Do not force TARGET_VIS3 and TARGET_FMAF + to zero when assembler lacks support for such instructions. + * config/sparc/sparc.c (sparc_option_override): Clear MASK_VIS3 + and MASK_FMAF in defaults when assembler lacks necessary support. + 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/sse.md (vec_unpacks_lo_mode, diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c index 9c7cc56..fa790b3 100644 --- a/gcc/config/sparc/sparc.c +++ b/gcc/config/sparc/sparc.c @@ -850,7 +850,11 @@ sparc_option_override (void) cpu = &cpu_table[(int) sparc_cpu_and_features]; target_flags &= ~cpu->disable; - target_flags |= cpu->enable; + target_flags |= (cpu->enable +#ifndef HAVE_AS_FMAF_HPC_VIS3 + & ~(MASK_FMAF | MASK_VIS3) +#endif + ); /* If -mfpu or -mno-fpu was explicitly used, don't override with the processor default. 
*/ diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h index 0642ff2..669f106 100644 --- a/gcc/config/sparc/sparc.h +++ b/gcc/config/sparc/sparc.h @@ -1871,10 +1871,6 @@ extern int sparc_indent_opcode; #ifndef HAVE_AS_FMAF_HPC_VIS3 #define AS_NIAGARA3_FLAG "b" -#undef TARGET_FMAF -#define TARGET_FMAF 0 -#undef TARGET_VIS3 -#define TARGET_VIS3 0 #else #define AS_NIAGARA3_FLAG "d" #endif diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 9e8f1f9..943f36f 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,16 @@ +2011-10-12 David S. Miller da...@davemloft.net + + * gcc.target/sparc/cmask.c: Remove 'vis3' target check and specify + '-mvis3' instead of '-mcpu=niagara3' in options. + * gcc.target/sparc/fhalve.c: Likewise. + * gcc.target/sparc/fnegop.c: Likewise. + * gcc.target/sparc/fpadds.c: Likewise. + * gcc.target/sparc/fshift.c: Likewise. + * gcc.target/sparc/fucmp.c: Likewise. + * gcc.target/sparc/lzd.c: Likewise. + * gcc.target/sparc/vis3misc.c: Likewise. + * gcc.target/sparc/xmul.c: Likewise. + 2011-10-12 Eric Botcazou ebotca...@adacore.com * gnat.dg/vect1.ad[sb]: New test. diff --git a/gcc/testsuite/gcc.target/sparc/cmask.c b/gcc/testsuite/gcc.target/sparc/cmask.c index
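The shape of the sparc_option_override change — masking unsupported features out of the per-CPU default enable set at option-processing time instead of hard-wiring the TARGET_* macros to zero in sparc.h — can be sketched in isolation (mask values and the predicate argument below are illustrative stand-ins for the real HAVE_AS_FMAF_HPC_VIS3 check):

```c
/* Illustrative feature masks, not the real sparc.h values.  */
#define MASK_FMAF (1u << 0)
#define MASK_VIS3 (1u << 1)
#define MASK_FPU  (1u << 2)

/* Mirror of the patch's logic: start from the cpu table's default
   enable mask, but drop FMAF/VIS3 from the defaults when the
   assembler cannot emit the corresponding instructions.  */
static unsigned
filter_cpu_enable (unsigned cpu_enable, int have_as_fmaf_vis3)
{
  if (!have_as_fmaf_vis3)
    cpu_enable &= ~(MASK_FMAF | MASK_VIS3);
  return cpu_enable;
}
```

The presumable advantage over the old `#undef TARGET_VIS3` approach is that only the *defaults* are filtered, so an explicit `-mvis3` on the command line (as the adjusted testcases now pass) still takes effect.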
PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
Hi, When combine tries to combine: (insn 37 35 39 3 (set (reg:SI 90) (plus:SI (mult:SI (reg/v:SI 84 [ i ]) (const_int 4 [0x4])) (reg:SI 106))) x.i:11 247 {*leasi_2} (nil)) (insn 39 37 41 3 (set (mem:SI (zero_extend:DI (reg:SI 90)) [3 MEM[symbol: x, index: D.2741_12, step: 4, offset: 4294967292B]+0 S4 A32]) (reg/v:SI 84 [ i ])) x.i:11 64 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 90) (nil))) it optimizes (zero_extend:DI (plus:SI (mult:SI (reg/v:SI 84 [ i ]) (const_int 4 [0x4])) (reg:SI 106))) into (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) (const_int 4294967292 [0xfffffffc])) in make_compound_operation. The x86 backend doesn't accept the new expression as a valid address while (zero_extend:DI) works just fine. This patch keeps ZERO_EXTEND when zero-extending an address to Pmode. It reduces the number of lea from 24173 to 21428 in x32 libgfortran.so. Does it make any sense? Thanks. H.J. --- 2011-10-12 H.J. Lu hongjiu...@intel.com PR rtl-optimization/50696 * combine.c (subst): If an address is zero-extended to Pmode, replace FROM with TO while keeping ZERO_EXTEND. diff --git a/gcc/combine.c b/gcc/combine.c index 6c3b17c..45180e5 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -5078,6 +5078,23 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy) } } } +#ifdef POINTERS_EXTEND_UNSIGNED + else if (POINTERS_EXTEND_UNSIGNED > 0 + && code == MEM + && GET_CODE (XEXP (x, 0)) == ZERO_EXTEND + && GET_MODE (XEXP (x, 0)) == Pmode) +{ + /* If an address is zero-extended to Pmode, replace FROM with +TO while keeping ZERO_EXTEND. */ + new_rtx = subst (XEXP (XEXP (x, 0), 0), from, to, 0, 0, + unique_copy); + /* Drop ZERO_EXTEND on constant. */ + if (CONST_INT_P (new_rtx)) + SUBST (XEXP (x, 0), new_rtx); + else + SUBST (XEXP (XEXP (x, 0), 0), new_rtx); +} +#endif else { len = GET_RTX_LENGTH (code);
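The equivalence combine is exploiting here is that zero-extending an SImode value to DImode computes the same thing as ANDing with the low 32-bit mask; combine then narrows the mask further (to 0xfffffffc above) using its nonzero-bits knowledge of the scaled index, and it is that narrowed AND form that the x32 address patterns fail to recognize. A small host-side check of the underlying identity (function names are mine):

```c
#include <stdint.h>

/* (zero_extend:DI x:SI) ...  */
static uint64_t
via_zero_extend (uint32_t lo)
{
  return (uint64_t) lo;
}

/* ... computes the same value as (and:DI x (const_int 0xffffffff)).
   Combine additionally shrinks the mask, e.g. to 0xfffffffc, when it
   can prove the low bits are zero (an index scaled by 4).  */
static uint64_t
via_and_mask (uint64_t x)
{
  return x & UINT64_C (0xffffffff);
}
```

Both forms describe the same x32 address computation; the disagreement in the thread is only about which form the backend should be taught to accept.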
Re: [rs6000] Enable scalar shifts of vectors
From: Richard Henderson r...@redhat.com Date: Wed, 12 Oct 2011 15:32:46 -0700 I suppose technically the middle-end could be improved to implement ashl<mode> as vashl<mode> by broadcasting the scalar, but Altivec is the only extant SIMD ISA that would make use of this. All of the others can arrange for constant shifts to be encoded into the insn, and so implement the ashl<mode> named pattern. I'm pretty sure Sparc's VIS3 can do this too, see the '<vis3_shift_insn><vbits>_vis' patterns in sparc.md
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
The x86 backend doesn't accept the new expression as a valid address while (zero_extend:DI) works just fine. This patch keeps ZERO_EXTEND when zero-extending an address to Pmode. It reduces the number of lea from 24173 to 21428 in x32 libgfortran.so. Does it make any sense? I'd be inclined to have the x86 backend accept combine's canonicalized form rather than doing a patch such as this.
[rs6000, spu] Add vec_perm named pattern
The generic support for vector permutation will allow for automatic lowering to V*QImode, so all we need to add to support for these targets is the single V16QI pattern that represents the base permutation insn. I'm not touching any of the other ways that the permutation insn could be generated. After the generic support is added, I'll leave it to the port maintainers to determine what they want to keep. I suspect in many cases using the generic __builtin_shuffle plus some casting in the target-specific header files would be sufficient, eliminating several dozen builtins. Ok? r~ * config/rs6000/altivec.md (vec_permv16qi): New. * config/spu/spu.md (vec_permv16qi): New. commit f2d8929afb989a09d7e287dc171607440c1a Author: Richard Henderson r...@twiddle.net Date: Mon Oct 10 12:35:25 2011 -0700 rs6000: Implement vec_permv16qi. diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 9e7437e..84c5444 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -1357,6 +1357,15 @@ "vperm %0,%1,%2,%3" [(set_attr "type" "vecperm")]) +(define_expand "vec_permv16qi" + [(set (match_operand:V16QI 0 "register_operand" "") + (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "") + (match_operand:V16QI 2 "register_operand" "") + (match_operand:V16QI 3 "register_operand" "")] + UNSPEC_VPERM))] + "TARGET_ALTIVEC" + "") + (define_insn "altivec_vrfip" ; ceil [(set (match_operand:V4SF 0 "register_operand" "=v") (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")] commit a67ea08189a4399d6ade00c15e69447304f85f96 Author: Richard Henderson r...@twiddle.net Date: Mon Oct 10 12:35:50 2011 -0700 spu: Implement vec_permv16qi. 
diff --git a/gcc/config/spu/spu.md b/gcc/config/spu/spu.md index 676d54e..00cfaa4 100644 --- a/gcc/config/spu/spu.md +++ b/gcc/config/spu/spu.md @@ -4395,6 +4395,18 @@ selb\t%0,%4,%0,%3 shufb\t%0,%1,%2,%3 [(set_attr "type" "shuf")]) +(define_expand "vec_permv16qi" + [(set (match_operand:V16QI 0 "spu_reg_operand" "") + (unspec:V16QI + [(match_operand:V16QI 1 "spu_reg_operand" "") + (match_operand:V16QI 2 "spu_reg_operand" "") + (match_operand:V16QI 3 "spu_reg_operand" "")] + UNSPEC_SHUFB))] + "" + { +operands[3] = gen_lowpart (TImode, operands[3]); + }) + (define_insn "nop" [(unspec_volatile [(const_int 0)] UNSPECV_NOP)]
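For reference, the generic facility these vec_permv16qi patterns will back is the byte-level permute that __builtin_shuffle exposes at the source level (added to GCC in the 4.7 timeframe this thread leads up to): the selector vector picks, per result byte, a byte of the input, much as Altivec vperm and SPU shufb do in one instruction. A minimal illustration using GCC vector extensions:

```c
#include <stdint.h>

typedef uint8_t v16qi __attribute__ ((vector_size (16)));

/* Reverse the 16 bytes of a vector with a constant selector -- the
   kind of single-instruction permute vperm/shufb provide.  */
static v16qi
reverse_bytes (v16qi x)
{
  const v16qi sel = { 15, 14, 13, 12, 11, 10, 9, 8,
                      7, 6, 5, 4, 3, 2, 1, 0 };
  return __builtin_shuffle (x, sel);
}

/* Read one byte portably.  */
static uint8_t
byte_of (v16qi v, int i)
{
  union { v16qi v; uint8_t a[16]; } u = { v };
  return u.a[i];
}

/* Self-check: bytes 0..15 come back as 15..0.  */
static int
shuffle_demo_ok (void)
{
  union { v16qi v; uint8_t a[16]; } u;
  int i, ok = 1;
  for (i = 0; i < 16; i++)
    u.a[i] = (uint8_t) i;
  for (i = 0; i < 16; i++)
    ok &= (byte_of (reverse_bytes (u.v), i) == 15 - i);
  return ok;
}
```

On the targets in this patch, a shuffle like this with a non-constant selector is exactly what would lower to the new V16QI permute pattern.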
[Ada] Enable -W -Wall across the entire build
They weren't enabled for the Ada part of the front-end and the C part of the library. Of course there are a few warnings... Tested on i586-suse-linux, applied on the mainline. 2011-10-12 Eric Botcazou ebotca...@adacore.com gnattools/ * Makefile.in (LOOSE_WARN): Delete. (GCC_WARN_CFLAGS): Set to -W -Wall. (TOOLS_FLAGS_TO_PASS_1): Delete. (TOOLS_FLAGS_TO_PASS_1re): Rename into... (TOOLS_FLAGS_TO_PASS_RE): ...this. (gnattools-native): Use TOOLS_FLAGS_TO_PASS_NATIVE. (regnattools): Use TOOLS_FLAGS_TO_PASS_RE. libada/ * Makefile.in (LOOSE_WARN): Delete. (GCC_WARN_CFLAGS): Likewise. (WARN_CFLAGS): Likewise. (GNATLIBFLAGS): Add -nostdinc. (GNATLIBCFLAGS_FOR_C): Add -W -Wall. (LIBADA_FLAGS_TO_PASS): Remove WARN_CFLAGS. * configure.ac (warn_cflags): Delete. * configure: Regenerate. gcc/ada/ * sem_util.adb (Denotes_Same_Prefix): Fix fatal warning. * gcc-interface/Make-lang.in (WARN_ADAFLAGS): New. (ALL_ADAFLAGS): Include WARN_ADAFLAGS. (ADA_FLAGS_TO_PASS): Likewise. (COMMON_FLAGS_TO_PASS): New. (ADA_TOOLS_FLAGS_TO_PASS): Use COMMON_FLAGS_TO_PASS. In the regular native case, also use FLAGS_TO_PASS and ADA_FLAGS_TO_PASS. (gnatlib): Use COMMON_FLAGS_TO_PASS. (ada.install-common): Likewise. (install-gnatlib): Likewise. (install-gnatlib-obj): Likewise. (gnattools): Use ADA_TOOLS_FLAGS_TO_PASS for gnattools1 as well. (gnat-cross): Delete. (gnatboot): Likewise. (gnatboot2): Likewise. (gnatboot3): Likewise. (gnatstage1): Likewise. (gnatstage2): Likewise. * gcc-interface/Makefile.in (SOME_ADAFLAGS): Likewise. (MOST_ADAFLAGS): Likewise. (LOOSE_CFLAGS): Likewise. (gnat-cross): Likewise. (GNATLIBFLAGS): Add -W -Wall. (GNATLIBCFLAGS_FOR_C): Likewise. * gcc-interface/lang.opt: Remove C-specific warnings. Add doc lines. * gcc-interface/misc.c (gnat_handle_option): Remove obsolete cases. 
-- Eric Botcazou Index: gnattools/Makefile.in === --- gnattools/Makefile.in (revision 179844) +++ gnattools/Makefile.in (working copy) @@ -44,8 +44,7 @@ PWD_COMMAND = $${PWDCMD-pwd} # The tedious process of getting CFLAGS right. CFLAGS=-g -LOOSE_WARN = -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -GCC_WARN_CFLAGS = $(LOOSE_WARN) +GCC_WARN_CFLAGS = -W -Wall WARN_CFLAGS = @warn_cflags@ ADA_CFLAGS=@ADA_CFLAGS@ @@ -64,8 +63,8 @@ INCLUDES_FOR_SUBDIR = -I. -I.. -I../.. - -I$(fsrcdir)/../include -I$(fsrcdir) ADA_INCLUDES_FOR_SUBDIR = -I. -I$(fsrcdir)/ada -# Variables for gnattools1, native -TOOLS_FLAGS_TO_PASS_1= \ +# Variables for gnattools, native +TOOLS_FLAGS_TO_PASS_NATIVE= \ CC=../../xgcc -B../../ \ CFLAGS=$(CFLAGS) $(WARN_CFLAGS) \ LDFLAGS=$(LDFLAGS) \ @@ -76,11 +75,13 @@ TOOLS_FLAGS_TO_PASS_1= \ exeext=$(exeext) \ fsrcdir=$(fsrcdir) \ srcdir=$(fsrcdir) \ + GNATMAKE=../../gnatmake \ + GNATLINK=../../gnatlink \ GNATBIND=../../gnatbind \ TOOLSCASE=native # Variables for regnattools -TOOLS_FLAGS_TO_PASS_1re= \ +TOOLS_FLAGS_TO_PASS_RE= \ CC=../../xgcc -B../../ \ CFLAGS=$(CFLAGS) \ ADAFLAGS=$(ADAFLAGS) \ @@ -93,24 +94,7 @@ TOOLS_FLAGS_TO_PASS_1re= \ GNATMAKE=../../gnatmake \ GNATLINK=../../gnatlink \ GNATBIND=../../gnatbind \ - TOOLSCASE=cross \ - INCLUDES= - -# Variables for gnattools2, native -TOOLS_FLAGS_TO_PASS_NATIVE= \ - CC=../../xgcc -B../../ \ - CFLAGS=$(CFLAGS) \ - ADAFLAGS=$(ADAFLAGS) \ - ADA_CFLAGS=$(ADA_CFLAGS) \ - INCLUDES=$(INCLUDES_FOR_SUBDIR) \ - ADA_INCLUDES=-I../rts $(ADA_INCLUDES_FOR_SUBDIR) \ - exeext=$(exeext) \ - fsrcdir=$(fsrcdir) \ - srcdir=$(fsrcdir) \ - GNATMAKE=../../gnatmake \ - GNATLINK=../../gnatlink \ - GNATBIND=../../gnatbind \ - TOOLSCASE=native + TOOLSCASE=cross # Variables for gnattools, cross TOOLS_FLAGS_TO_PASS_CROSS= \ @@ -177,7 +161,7 @@ $(GCC_DIR)/stamp-tools: gnattools-native: $(GCC_DIR)/stamp-tools $(GCC_DIR)/stamp-gnatlib-rts # gnattools1 $(MAKE) -C $(GCC_DIR)/ada/tools -f ../Makefile \ - 
$(TOOLS_FLAGS_TO_PASS_1) \ + $(TOOLS_FLAGS_TO_PASS_NATIVE) \ ../../gnatmake$(exeext) ../../gnatlink$(exeext) # gnattools2 $(MAKE) -C $(GCC_DIR)/ada/tools -f ../Makefile \ @@ -189,7 +173,7 @@ gnattools-native: $(GCC_DIR)/stamp-tools regnattools: $(GCC_DIR)/stamp-gnatlib-rts # gnattools1-re $(MAKE) -C $(GCC_DIR)/ada/tools -f ../Makefile \ - $(TOOLS_FLAGS_TO_PASS_1re) \ + $(TOOLS_FLAGS_TO_PASS_RE) INCLUDES= \ gnatmake-re gnatlink-re # gnattools2 $(MAKE) -C $(GCC_DIR)/ada/tools -f ../Makefile \ Index: libada/Makefile.in === --- libada/Makefile.in (revision 179844) +++ libada/Makefile.in (working copy) @@ -45,21 +45,17 @@ AWK=@AWK@ # Variables for the user (or the top
Re: [rs6000] Enable scalar shifts of vectors
On 10/12/2011 03:37 PM, David Miller wrote: From: Richard Henderson r...@redhat.com Date: Wed, 12 Oct 2011 15:32:46 -0700 I suppose technically the middle-end could be improved to implement ashl<mode> as vashl<mode> by broadcasting the scalar, but Altivec is the only extant SIMD ISA that would make use of this. All of the others can arrange for constant shifts to be encoded into the insn, and so implement the ashl<mode> named pattern. I'm pretty sure Sparc's VIS3 can do this too, see the '<vis3_shift_insn><vbits>_vis' patterns in sparc.md Ok, if I read the rtl correctly, you can perform a vector shift, where each shift count comes from the corresponding element of op2. But VIS has no vector shift where the shift count comes from a single scalar (immediate or register)? If so, please rename this pattern to the v<shift_pat_name><mode>3 form and I'll work on more middle-end support for re-use of the v<shift_pat_name> optab. r~
[lto] Add streamer hooks for emitting location_t (issue5249058)
The pph streamer does not write out expanded locations. It emits the line map tables exactly as it found them on the initial compile so that it can recreate them when the pph image is restored. This allows it to emit location_t values as integers and produce the same line locations as the original compile. This patch adds the two hooks needed to make sure that the tree streamer writes locations using the pph routines. Barring any objections, I'll commit this patch to mainline in the next couple of days. Diego. * streamer-hooks.h (struct streamer_hooks): Add hooks input_location and output_location. * lto-streamer-in.c (lto_input_location): Use streamer_hooks.input_location, if set. * lto-streamer-out.c (lto_output_location): Use streamer_hooks.output_location, if set. diff --git a/gcc/lto-streamer-in.c b/gcc/lto-streamer-in.c index d4e80c7..f18b944 100644 --- a/gcc/lto-streamer-in.c +++ b/gcc/lto-streamer-in.c @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3. If not see #include lto-streamer.h #include tree-streamer.h #include tree-pass.h +#include streamer-hooks.h /* The table to hold the file names. */ static htab_t file_name_hash_table; @@ -180,15 +181,23 @@ lto_input_location_bitpack (struct data_in *data_in, struct bitpack_d *bp) } -/* Read a location from input block IB. */ +/* Read a location from input block IB. + If the input_location streamer hook exists, call it. + Otherwise, proceed with reading the location from the + expanded location bitpack. */ location_t lto_input_location (struct lto_input_block *ib, struct data_in *data_in) { - struct bitpack_d bp; + if (streamer_hooks.input_location) +return streamer_hooks.input_location (ib, data_in); + else +{ + struct bitpack_d bp; - bp = streamer_read_bitpack (ib); - return lto_input_location_bitpack (data_in, &bp); + bp = streamer_read_bitpack (ib); + return lto_input_location_bitpack (data_in, &bp); +} } diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c index c14b3a9..4d88f62 100644 --- a/gcc/lto-streamer-out.c +++ b/gcc/lto-streamer-out.c @@ -172,15 +172,21 @@ lto_output_location_bitpack (struct bitpack_d *bp, /* Emit location LOC to output block OB. - When bitpack is handy, it is more space effecient to call + If the output_location streamer hook exists, call it. + Otherwise, when bitpack is handy, it is more space efficient to call lto_output_location_bitpack with existing bitpack. */ void lto_output_location (struct output_block *ob, location_t loc) { - struct bitpack_d bp = bitpack_create (ob->main_stream); - lto_output_location_bitpack (&bp, ob, loc); - streamer_write_bitpack (&bp); + if (streamer_hooks.output_location) +streamer_hooks.output_location (ob, loc); + else +{ + struct bitpack_d bp = bitpack_create (ob->main_stream); + lto_output_location_bitpack (&bp, ob, loc); + streamer_write_bitpack (&bp); +} } diff --git a/gcc/streamer-hooks.h b/gcc/streamer-hooks.h index b4c6562..0c1d483 100644 --- a/gcc/streamer-hooks.h +++ b/gcc/streamer-hooks.h @@ -51,6 +51,16 @@ struct streamer_hooks { and descriptors needed by the unpickling routines. It returns the tree instantiated from the stream. */ tree (*read_tree) (struct lto_input_block *, struct data_in *); + + /* [OPT] Called by lto_input_location to retrieve the source location of the + tree currently being read. If this hook returns NULL, lto_input_location + defaults to calling lto_input_location_bitpack. */ + location_t (*input_location) (struct lto_input_block *, struct data_in *); + + /* [OPT] Called by lto_output_location to write the source_location of the + tree currently being written. If this hook returns NULL, + lto_output_location defaults to calling lto_output_location_bitpack. */ + void (*output_location) (struct output_block *, location_t); }; #define stream_write_tree(OB, EXPR, REF_P) \ -- This patch is available for review at http://codereview.appspot.com/5249058
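The control flow this patch adds is the usual optional-hook-with-default pattern: if the client (here, pph) installed a hook, dispatch to it; otherwise fall back to the default bitpack encoding. A reduced sketch of that pattern (struct and function names are stand-ins, not the real lto-streamer API):

```c
#include <stddef.h>

typedef unsigned int location_t;

/* Optional override; NULL means "use the default".  */
struct hooks_sketch
{
  location_t (*input_location) (const unsigned int *stream);
};

/* Stand-in for lto_input_location_bitpack.  */
static location_t
default_input_location (const unsigned int *stream)
{
  return stream[0];
}

/* Stand-in for lto_input_location: dispatch to the hook if set.  */
static location_t
read_location (const struct hooks_sketch *h, const unsigned int *stream)
{
  if (h->input_location)
    return h->input_location (stream);
  return default_input_location (stream);
}

/* An example override, standing in for a pph-style raw reader.  */
static location_t
doubling_hook (const unsigned int *stream)
{
  return stream[0] * 2;
}

/* Exercise both paths.  */
static location_t
read_via (int use_hook)
{
  struct hooks_sketch h = { use_hook ? doubling_hook : NULL };
  unsigned int stream[1];
  stream[0] = use_hook ? 21 : 42;
  return read_location (&h, stream);
}
```

The important property, which the patch preserves, is that the default path is untouched for every existing LTO caller; only a client that explicitly sets the hook changes behavior.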
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On Wed, Oct 12, 2011 at 3:40 PM, Richard Kenner ken...@vlsi1.ultra.nyu.edu wrote: The x86 backend doesn't accept the new expression as a valid address while (zero_extend:DI) works just fine. This patch keeps ZERO_EXTEND when zero-extending an address to Pmode. It reduces the number of lea from 24173 to 21428 in x32 libgfortran.so. Does it make any sense? I'd be inclined to have the x86 backend accept combine's canonicalized form rather than doing a patch such as this. The address format generated by combine is very unusual in 2 aspects: 1. The placement of subreg in (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) isn't supported by the x86 backend. 2. The biggest problem is optimizing mask 0xffffffff to 0xfffffffc by keeping track of non-zero bits in registers. The x86 backend doesn't have such information to know ADDR & 0xfffffffc == ADDR & 0xffffffff. -- H.J.
Re: [rs6000] Enable scalar shifts of vectors
From: Richard Henderson r...@redhat.com Date: Wed, 12 Oct 2011 15:49:28 -0700 Ok, if I read the rtl correctly, you can perform a vector shift, where each shift count comes from the corresponding element of op2. But VIS has no vector shift where the shift count comes from a single scalar (immediate or register)? That's correct. If so, please rename this pattern to the v<shift_pat_name><mode>3 form and I'll work on more middle-end support for re-use of the v<shift_pat_name> optab. Will do, thanks Richard.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
1. The placement of subreg in (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) isn't supported by the x86 backend. That's easy to fix. 2. The biggest problem is optimizing mask 0xffffffff to 0xfffffffc by keeping track of non-zero bits in registers. The x86 backend doesn't have such information to know ADDR & 0xfffffffc == ADDR & 0xffffffff. But this indeed isn't. I withdraw my comment. I still don't like the patch, but I'm no longer as familiar with the code as I used to be so can't suggest a replacement. Let's see what others think about it.
[cxx-mem-model] merge from trunk @ 179855
Nothing major to report. I found a small buglet in gcc/testsuite/g++.dg/dg.exp that is left over from the memmodel -> simulate-thread rename. I also found that trunk does not have the following change to g++.dg/dg.exp and I will submit this as a follow up (to trunk). Committed to branch. houston:/source/cxx-mem-model-merge/gcc/testsuite/g++.dg$ svn diff *.exp Index: dg.exp === --- dg.exp (revision 179857) +++ dg.exp (working copy) @@ -48,7 +48,7 @@ set tests [prune $tests $srcdir/$subdir/ set tests [prune $tests $srcdir/$subdir/torture/*] set tests [prune $tests $srcdir/$subdir/graphite/*] set tests [prune $tests $srcdir/$subdir/guality/*] -set tests [prune $tests $srcdir/$subdir/memmodel/*] +set tests [prune $tests $srcdir/$subdir/simulate-thread/*] # Main loop. dg-runtest $tests $DEFAULT_CXXFLAGS
[testsuite] require arm_little_endian in two tests
Tests gcc.target/arm/pr48252.c and gcc.target/arm/neon-vset_lanes8.c expect little-endian code and fail when compiled with -mbig-endian. This patch skips the test if the current multilib does not generate little-endian code. I'm not able to run execution tests for -mbig-endian for GCC mainline but have tested this patch with CodeSourcery's GCC 4.6. OK for trunk? 2011-10-12 Janis Johnson jani...@codesourcery.com * gcc.target/arm/pr48252.c: Require arm_little_endian. * gcc.target/arm/neon-vset_lanes8.c: Likewise. Index: gcc/testsuite/gcc.target/arm/pr48252.c === --- gcc/testsuite/gcc.target/arm/pr48252.c (revision 344214) +++ gcc/testsuite/gcc.target/arm/pr48252.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-require-effective-target arm_neon_hw } */ +/* { dg-require-effective-target arm_little_endian } */ /* { dg-options -O2 } */ /* { dg-add-options arm_neon } */ Index: gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c === --- gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c (revision 344214) +++ gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c (working copy) @@ -2,6 +2,7 @@ /* { dg-do run } */ /* { dg-require-effective-target arm_neon_hw } */ +/* { dg-require-effective-target arm_little_endian } */ /* { dg-options -O0 } */ /* { dg-add-options arm_neon } */
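The endianness sensitivity these tests trip over comes from the mapping between a vector lane number and its byte offset in memory, which flips between little- and big-endian multilibs; expected lane values hard-coded for little-endian therefore fail under -mbig-endian. A simplified model of that mapping (illustrative only, not the precise NEON/AAPCS rule for every element ordering):

```c
/* Byte offset in memory of lane LANE in a VEC_BYTES-byte vector with
   ELEM_SIZE-byte elements, under each endianness: on little-endian,
   lane 0 starts at offset 0; on big-endian, lane 0 sits at the
   highest-addressed element.  */
static unsigned
lane_offset (unsigned lane, unsigned elem_size, unsigned vec_bytes,
             int big_endian)
{
  if (big_endian)
    return vec_bytes - elem_size - lane * elem_size;
  return lane * elem_size;
}
```

Skipping the tests via an arm_little_endian effective-target check, as the patch does, sidesteps encoding both mappings in the expected results.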
Re: RFC: Add ADD_RESTRICT tree code
Hi, On Wed, 12 Oct 2011, Jakub Jelinek wrote: Assignment 2 means that t->p points to s.p. Assignment 3 changes t->p and s.p, but the change to s.p doesn't occur through a pointer based on t->p or any other restrict pointer, in fact it doesn't occur through any explicit initialization or assignment, but rather through an indirect access via a different pointer. Hence the accesses to the same memory object at s.p[0] and t->p[0] were undefined because both accesses weren't through pointers based on each other. Only the field p in the structure is restrict qualified, there is no restrict qualification on the other pointers (e.g. t is not restrict). Thus, it is valid that t points to s. And, the s.p[0] access is based on s.p as well as t->p and similarly the t->p[0] access is based on s.p as well as t->p, in the sense of the ISO C99 restrict wording. IMO reading the standard to allow an access to be based on s.p _as well as_ t->p and that this should result in any sensible behaviour regarding restrict is interpreting too much into it. Let's do away with the fields, trying to capture the core of the disagreement. What you seem to be saying is that this code is well-defined and shouldn't return 1: int foo (int * _a, int * _b) { int * restrict a = _a; int * restrict b = _b; int * restrict *pa = wrap (&a); *pa = _b; // 1 *a = 0; **pa = 1; return *a; } I think that would go straight against the intent of restrict. I'd read the standard as making the above trick undefined. Because, if you change t->p (or s.p) at some point in between t->p = q; and s.p[0]; (i.e. prior to the access) to point to a copy of the array, both s.p and t->p change. Yes, but the question is, if the very modification of t->p was valid to start with. In my example above insn 1 is a funny way to write a = _b, i.e. reassigning the already set restrict pointer a to the one that also is already in b. 
Simplifying the above then leads to: int foo (int * _a, int * _b) { int * restrict a = _a; int * restrict b = _b; a = _b; *a = 0; *b = 1; return *a; } which I think is undefined because of the fourth clause (multiple modifying accesses to the same underlying object X need to go through one particular restrict chain). Seen from another perspective your reading would introduce an inconsistency with composition. Let's assume we have this function available: int tail (int * restrict a, int * restrict b) { *a = 0; *b = 1; return *a; } Clearly we can optimize this into { *a=0;*b=1;return 0; } without looking at the context. Now write the testcase or my example above in terms of that function: int goo (int *p, int *q) { struct S s, *t; s.a = 1; s.p = p; // 1 t = wrap(&s); // 2 t=&s in effect, but GCC doesn't see this t->p = q; // 3 return tail (s.p, t->p); } Now we get the same behaviour of returning a zero. Something must be undefined here, and it's not in tail itself. It's either the call of tail, the implicit modification of s.p with writes to t->p or the existence of two separate restrict pointers of the same value. I think the production of two separate equal-value restrict pointers via indirect modification is the undefinedness, and _if_ the standard can be read in a way that this is supposed to be valid then it needs to be clarified to not allow that anymore. I believe the standard should say something to the effect of disallowing modifying restrict pointers after they are initialized/assigned to once. Ciao, Michael.
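The `tail` function in the discussion is exactly the case restrict exists for: the compiler may fold `return *a;` to `return 0;` because `*b = 1;` cannot legally modify `*a`. The unqualified variant below shows what a compiler must assume without restrict — calling the restrict version with aliasing arguments (as `goo` effectively arranges) is the undefined behavior at issue. A compilable illustration:

```c
/* Without restrict, a and b may alias, so the final load must observe
   the store through b.  */
static int
tail_norestrict (int *a, int *b)
{
  *a = 0;
  *b = 1;
  return *a;   /* 1 when a == b, 0 when they don't alias */
}

/* Aliased call: well-defined for this unqualified version, but
   undefined for the restrict-qualified tail -- which is why GCC is
   entitled to compile the restrict version to return 0.  */
static int
tail_aliased (void)
{
  int x = 5;
  return tail_norestrict (&x, &x);
}
```

In other words, the composition argument in the mail holds precisely because `tail` may be optimized in isolation; the undefinedness has to live at the call site that smuggles in aliasing pointers.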
Re: [wwwdocs] gcc-4.6/porting_to.html
On Tue, 11 Oct 2011, Benjamin Kosnik wrote: Many users still won't have GCC 4.6 deployed yet, so I think it's still worth it. Ouch. I see this is not in, and I thought I checked in the draft months ago. Please check this in immediately!!! Done last evening, and made some further tweaks. For reference here is the full patch that's now live on the system. Gerald Index: porting_to.html === RCS file: porting_to.html diff -N porting_to.html --- /dev/null 1 Jan 1970 00:00:00 - +++ porting_to.html 12 Oct 2011 16:16:54 - 1.3 @@ -0,0 +1,150 @@ +<html> + +<head> +<title>Porting to GCC 4.6</title> +</head> + +<body> +<h1>Porting to GCC 4.6</h1> + +<p> +The GCC 4.6 release series differs from previous GCC releases in more +than the usual list of +<a href="http://gcc.gnu.org/gcc-4.6/changes.html">changes</a>. Some of +these are a result of bug fixing, and some old behaviors have been +intentionally changed in order to support new standards, or relaxed +in standards-conforming ways to facilitate compilation or runtime +performance. Some of these changes are not visible to the naked eye +and will not cause problems when updating from older versions. +</p> + +<p> +However, some of these changes are visible, and can cause grief to +users porting to GCC 4.6. This document is an effort to identify major +issues and provide clear solutions in a quick and easily searched +manner. Additions and suggestions for improvement are welcome. +</p> + +<h2>C language issues</h2> + +<h3>New warnings for unused variables and parameters</h3> + +<p> +The behavior of <code>-Wall</code> has changed and now includes the +new warning flags <code>-Wunused-but-set-variable</code> and +(with <code>-Wall +-Wextra</code>) <code>-Wunused-but-set-parameter</code>. This may +result in new warnings in code that compiled cleanly with previous +versions of GCC. +</p> + +<p>For example,</p> +<pre> + void fn (void) + { +int foo; +foo = bar (); /* foo is never used. */ + } +</pre> +<p>Gives the following diagnostic:</p> +<pre> +warning: variable 'foo' set but not used [-Wunused-but-set-variable] +</pre> + +<p>Although these warnings will not result in compilation failure, +often <code>-Wall</code> is used in conjunction with +<code>-Werror</code> and as a result, new warnings are turned into +new errors.</p> + +<p>To fix, first see if the unused variable or parameter can be removed +without changing the result or logic of the surrounding code. If not, +annotate it with <code>__attribute__((__unused__))</code>.</p> + +<p>As a workaround, remove <code>-Werror</code> until the new warnings +are fixed. For conversion warnings add +<code>-Wno-unused-but-set-variable</code> or +<code>-Wno-unused-but-set-parameter</code>.</p> + +<h3>Strict overflow warnings</h3> + +<p>Using the <code>-Wstrict-overflow</code> flag with +<code>-Werror</code> and optimization flags above <code>-O2</code> +may result in compile errors when using glibc optimizations +for <code>strcmp</code>.</p> + +<p>For example,</p> +<pre> +#include &lt;string.h&gt; +void do_rm_rf (const char *p) { if (strcmp (p, "/") == 0) return; } +</pre> +<p>Results in the following diagnostic:</p> +<pre> +error: assuming signed overflow does not occur when changing X +- C1 cmp C2 to X cmp C1 +- C2 [-Werror=strict-overflow] +</pre> + +<p>To work around this, use <code>-D__NO_STRING_INLINES</code>.</p> + +<h2>C++ language issues</h2> + +<h3>Header dependency changes</h3> + +<p> +Many of the standard C++ library include files have been edited to no +longer include &lt;cstddef&gt; to get <code>namespace std</code> +-scoped versions of <code>size_t</code> and <code>ptrdiff_t</code>. +</p> + +<p> +As such, C++ programs that used the macros <code>NULL</code> +or <code>offsetof</code> without including &lt;cstddef&gt; will no +longer compile. The diagnostic produced is similar to: +</p> + +<pre> +error: 'ptrdiff_t' does not name a type +</pre> + +<pre> +error: 'size_t' has not been declared +</pre> + +<pre> +error: 'NULL' was not declared in this scope +</pre> + +<pre> +error: there are no arguments to 'offsetof' that depend on a template +parameter, so a declaration of 'offsetof' must be available +</pre> + +<p> +Fixing this issue is easy: just include &lt;cstddef&gt;. +</p> + +<!-- +<h3>Java issues</h3> +--> + +<h3>Links</h3> + +<p> +Jakub Jelinek, + <a href="http://lists.fedoraproject.org/pipermail/devel/2011-February/148523.html">GCC +4.6 related common package rebuild failures (was Re: mass rebuild status)</a> +</p> + +<p> +Matthias Klose, +<a href="http://lists.debian.org/debian-devel-announce/2011/02/msg00012.html">prepare +to fix build failures with new GCC versions</a> +</p> + +<p> +Jim Meyering, + <a href="http://lists.fedoraproject.org/pipermail/devel/2011-March/149355.html">gcc-4.6.0-0.12.fc15.x86_64 breaks strcmp?</a> +</p> + +</body> +</html> + +
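For completeness, the two remedies the page recommends for the new -Wunused-but-set-variable warning can be shown side by side; this compiles cleanly with `gcc -Wall -Werror` (`__attribute__` is a GNU extension, and `bar` is a stand-in for the function in the page's example):

```c
static int bar (void) { return 42; }

/* Preferred fix: actually use (or remove) the variable.  */
int
use_result (void)
{
  int foo = bar ();
  return foo;
}

/* Fallback when the value is deliberately discarded: annotate it so
   -Wunused-but-set-variable stays quiet.  */
void
annotate_unused (void)
{
  int foo __attribute__ ((__unused__));
  foo = bar ();
}
```

Either form avoids turning the new warning into a -Werror build failure.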
[lto] Factor out code for streaming struct function. (issue5253051)
The pph branch also needs to stream struct function objects, but it doesn't
need to stream things like SSA names and the other data processed by
output_function/input_function.  This patch factors out the common code into
separate functions.  I've made them static for now, since no other external
code calls them yet.  This will minimize differences between pph and trunk.

Tested on x86_64 with LTO profiled bootstrap.

Barring any objections, I'll commit this to trunk in the next couple of days.


Diego.


2011-10-12  Lawrence Crowl  <cr...@google.com>
	    Diego Novillo  <dnovi...@google.com>

	* lto-streamer-in.c (input_struct_function_base): Factor out of ...
	(input_function): ... here.
	* lto-streamer-out.c (output_struct_function_base): Factor out of ...
	(output_function): ... here.

diff --git a/gcc/lto-streamer-in.c b/gcc/lto-streamer-in.c
index f18b944..1847738 100644
--- a/gcc/lto-streamer-in.c
+++ b/gcc/lto-streamer-in.c
@@ -764,27 +764,40 @@ fixup_call_stmt_edges (struct cgraph_node *orig, gimple *stmts)
     }
 }
 
-/* Read the body of function FN_DECL from DATA_IN using input block IB.  */
+
+/* Input the base body of struct function FN from DATA_IN
+   using input block IB.  */
 
 static void
-input_function (tree fn_decl, struct data_in *data_in,
-		struct lto_input_block *ib)
+input_struct_function_base (struct function *fn, struct data_in *data_in,
+			    struct lto_input_block *ib)
 {
-  struct function *fn;
-  enum LTO_tags tag;
-  gimple *stmts;
-  basic_block bb;
   struct bitpack_d bp;
-  struct cgraph_node *node;
-  tree args, narg, oarg;
   int len;
 
-  fn = DECL_STRUCT_FUNCTION (fn_decl);
-  tag = streamer_read_record_start (ib);
-  clear_line_info (data_in);
+  /* Read the static chain and non-local goto save area.  */
+  fn->static_chain_decl = stream_read_tree (ib, data_in);
+  fn->nonlocal_goto_save_area = stream_read_tree (ib, data_in);
 
-  gimple_register_cfg_hooks ();
-  lto_tag_check (tag, LTO_function);
+  /* Read all the local symbols.  */
+  len = streamer_read_hwi (ib);
+  if (len > 0)
+    {
+      int i;
+      VEC_safe_grow (tree, gc, fn->local_decls, len);
+      for (i = 0; i < len; i++)
+	{
+	  tree t = stream_read_tree (ib, data_in);
+	  VEC_replace (tree, fn->local_decls, i, t);
+	}
+    }
+
+  /* Input the function start and end loci.  */
+  fn->function_start_locus = lto_input_location (ib, data_in);
+  fn->function_end_locus = lto_input_location (ib, data_in);
+
+  /* Input the current IL state of the function.  */
+  fn->curr_properties = streamer_read_uhwi (ib);
 
   /* Read all the attributes for FN.  */
   bp = streamer_read_bitpack (ib);
@@ -802,30 +815,30 @@ input_function (tree fn_decl, struct data_in *data_in,
   fn->calls_setjmp = bp_unpack_value (&bp, 1);
   fn->va_list_fpr_size = bp_unpack_value (&bp, 8);
   fn->va_list_gpr_size = bp_unpack_value (&bp, 8);
+}
 
-  /* Input the function start and end loci.  */
-  fn->function_start_locus = lto_input_location (ib, data_in);
-  fn->function_end_locus = lto_input_location (ib, data_in);
 
-  /* Input the current IL state of the function.  */
-  fn->curr_properties = streamer_read_uhwi (ib);
+/* Read the body of function FN_DECL from DATA_IN using input block IB.  */
 
-  /* Read the static chain and non-local goto save area.  */
-  fn->static_chain_decl = stream_read_tree (ib, data_in);
-  fn->nonlocal_goto_save_area = stream_read_tree (ib, data_in);
+static void
+input_function (tree fn_decl, struct data_in *data_in,
+		struct lto_input_block *ib)
+{
+  struct function *fn;
+  enum LTO_tags tag;
+  gimple *stmts;
+  basic_block bb;
+  struct cgraph_node *node;
+  tree args, narg, oarg;
 
-  /* Read all the local symbols.  */
-  len = streamer_read_hwi (ib);
-  if (len > 0)
-    {
-      int i;
-      VEC_safe_grow (tree, gc, fn->local_decls, len);
-      for (i = 0; i < len; i++)
-	{
-	  tree t = stream_read_tree (ib, data_in);
-	  VEC_replace (tree, fn->local_decls, i, t);
-	}
-    }
+  fn = DECL_STRUCT_FUNCTION (fn_decl);
+  tag = streamer_read_record_start (ib);
+  clear_line_info (data_in);
+
+  gimple_register_cfg_hooks ();
+  lto_tag_check (tag, LTO_function);
+
+  input_struct_function_base (fn, data_in, ib);
 
   /* Read all function arguments.  We need to re-map them here to the
      arguments of the merged function declaration.  */

diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index 4d88f62..1ae4a4b 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -719,36 +719,30 @@ produce_asm (struct output_block *ob, tree fn)
 }
 
 
-/* Output the body of function NODE->decl.  */
+/* Output the base body of struct function FN using output block OB.  */
 
-static void
-output_function (struct cgraph_node *node)
+void
+output_struct_function_base (struct output_block *ob, struct function *fn)
 {
   struct bitpack_d bp;
-  tree function;
-  struct function *fn;
-
Re: [PATCH] [Annotalysis] Bugfix for spurious thread safety warnings with shared mutexes
On Wed, Oct 12, 2011 at 9:58 AM, Delesley Hutchins <deles...@google.com> wrote:
> I don't think that will fix this bug.  The bug occurs if: (1) The
> exclusive lock set has error_mark_node.  (2) The shared lock set has
> the actual lock.

Oh, I see.  This change looks fine for google/gcc-4_6, then.

> If I understand your suggested fix correctly, lock_set_contains would
> still return non-null when the universal lock was present, which is
> not what we want.  IMHO, lock_set_contains is operating correctly; it
> was just passed the wrong arguments.

I still think there may be a bug in lock_set_contains, but my knowledge of
the code is insufficient to know if this can lead to problems in practice.
Suppose the lock set contains both the supplied lock and the universal
lock, and ignore_universal_lock is false.  Then lock_set_contains() will
return the lock directly.  However, it *should* return the canonicalized
version of the lock.

Ollie