Re: [PATCH] Keep REG_INC note in subreg2 pass
On 30 October 2013 02:47, Jeff Law l...@redhat.com wrote: On 10/24/13 02:20, Zhenqiang Chen wrote: Hi, REG_INC note is lost in subreg2 pass when resolve_simple_move, which might lead to wrong dependence for ira. e.g. In function validate_equiv_mem of ira.c, it checks REG_INC note: for (note = REG_NOTES (insn); note; note = XEXP (note, 1)) if ((REG_NOTE_KIND (note) == REG_INC || REG_NOTE_KIND (note) == REG_DEAD) REG_P (XEXP (note, 0)) reg_overlap_mentioned_p (XEXP (note, 0), memref)) return 0; Without REG_INC note, validate_equiv_mem will return a wrong result. Referhttps://bugs.launchpad.net/gcc-linaro/+bug/1243022 for more detail about a real case in kernel. Bootstrap and no make check regression on X86-64 and ARM. Is it OK for trunk and 4.8? Thanks! -Zhenqiang ChangeLog: 2013-10-24 Zhenqiang Chenzhenqiang.c...@linaro.org * lower-subreg.c (resolve_simple_move): Copy REG_INC note. testsuite/ChangeLog: 2013-10-24 Zhenqiang Chenzhenqiang.c...@linaro.org * gcc.target/arm/lp1243022.c: New test. This clearly handles adding a note when the destination is a MEM with a side effect. What about cases where the side effect is associated with a load from memory rather than a store to memory? Yes. We should handle load from memory. lp1243022.patch diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c index 57b4b3c..e710fa5 100644 --- a/gcc/lower-subreg.c +++ b/gcc/lower-subreg.c @@ -1056,6 +1056,22 @@ resolve_simple_move (rtx set, rtx insn) mdest = simplify_gen_subreg (orig_mode, dest, GET_MODE (dest), 0); minsn = emit_move_insn (real_dest, mdest); +#ifdef AUTO_INC_DEC + /* Copy the REG_INC notes. */ + if (MEM_P (real_dest) !(resolve_reg_p (real_dest) +|| resolve_subreg_p (real_dest))) + { + rtx note = find_reg_note (insn, REG_INC, NULL_RTX); + if (note) + { + if (!REG_NOTES (minsn)) + REG_NOTES (minsn) = note; + else + add_reg_note (minsn, REG_INC, note); + } + } +#endif If MINSN does not have any notes, then this results in MINSN and INSN sharing the note. Note carefully that notes are chained (see implementation of add_reg_note). Thus the sharing would result in MINSN and INSN actually sharing a chain of notes. I'm pretty sure that's not what you intended. I think you need to always use add_reg_note. Yes. I should use add_reg_note. Here is the updated patch: diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c index ebf364f..16dfa62 100644 --- a/gcc/lower-subreg.c +++ b/gcc/lower-subreg.c @@ -967,7 +967,20 @@ resolve_simple_move (rtx set, rtx insn) rtx reg; reg = gen_reg_rtx (orig_mode); + +#ifdef AUTO_INC_DEC + { + rtx move = emit_move_insn (reg, src); + if (MEM_P (src)) + { + rtx note = find_reg_note (insn, REG_INC, NULL_RTX); + if (note) + add_reg_note (move, REG_INC, XEXP (note, 0)); + } + } +#else emit_move_insn (reg, src); +#endif src = reg; } @@ -1057,6 +1070,16 @@ resolve_simple_move (rtx set, rtx insn) mdest = simplify_gen_subreg (orig_mode, dest, GET_MODE (dest), 0); minsn = emit_move_insn (real_dest, mdest); +#ifdef AUTO_INC_DEC + if (MEM_P (real_dest) !(resolve_reg_p (real_dest) +|| resolve_subreg_p (real_dest))) +{ + rtx note = find_reg_note (insn, REG_INC, NULL_RTX); + if (note) + add_reg_note (minsn, REG_INC, XEXP (note, 0)); +} +#endif + smove = single_set (minsn); gcc_assert (smove != NULL_RTX);
Re: libsanitizer merge from upstream r191666
On Tue, Oct 29, 2013 at 05:15:24PM -0700, Konstantin Serebryany wrote: Actually, I guessed the flags: You don't have to guess them, if you look into testsuite/g++/g++.log (or testsuite/gcc/gcc.log etc.), you can find them there. % ../gcc-inst/bin/g++ -g -fsanitize=address -static-libasan -O2 -flto -fno-use-linker-plugin -flto-partition=none ../gcc/gcc/testsuite/c-c++-common/asan/stack-overflow-1.c; ./a.out 21 /tmp/ccgSw6NI.lto.o: In function `main': ../gcc/gcc/testsuite/c-c++-common/asan/stack-overflow-1.c:13: undefined reference to `.LASANPC0.2585' collect2: error: ld returned 1 exit status Looks like this patch is not friendly to -flto I guess if you do: ... - tree str_cst; + tree str_cst, decl, id; ... + ASM_GENERATE_INTERNAL_LABEL (buf, LASANPC, current_function_funcdef_no); + id = get_identifier (buf); + decl = build_decl (DECL_SOURCE_LOCATION (current_function_decl), +VAR_DECL, id, char_type_node); + SET_DECL_ASSEMBLER_NAME (decl, id); + TREE_ADDRESSABLE (decl) = 1; ... it might work even for -flto. The problem with -flto is that the default set_decl_assembler_name langhook for -flto appends dot and some number to the non-exported names, which is undesirable here, because we already make sure those first numbers are unique for the whole compilation unit. Jakub
Re: [PATCH] Invalid unpoisoning of stack redzones on ARM
so in the end I guess I have nothing against the original patch (with whatever form of the copy to reg call is desirable). So ok to commit? -Y
Re: [PATCH] Invalid unpoisoning of stack redzones on ARM
On Wed, Oct 30, 2013 at 11:32:14AM +0400, Yury Gribov wrote: so in the end I guess I have nothing against the original patch (with whatever form of the copy to reg call is desirable). So ok to commit? Ok with the change suggested by Richard, I think it was: addr = copy_to_mode_reg (Pmode, XEXP (shadow_mem, 0)); Jakub
Re: [Patch, C, C++] Accept GCC ivdep for 'do' and 'while', and for C++11's range-based loops
Jason Merrill wrote: On 10/28/2013 05:48 PM, Tobias Burnus wrote: DEFTREECODE (RANGE_FOR_STMT, range_for_stmt, tcc_statement, 4) has now a 5th operator (RANGE_FOR_IVDEP), which has the value boolean_true_node - or NULL_TREE (via begin_range_for_stmt). You don't need to add an operand, you can use one of the TREE_LANG_FLAGs for a boolean flag. Yes, of course. I somehow thought that the TREE_LANG_FLAG had to be attached to RANGE_FOR_EXPR (because the annotate expression will be placed around the condition). Trying to do so, I had problems to find a free TREE_LANG_FLAG. But it makes much more sense to add the flag to RANGE_FOR_STMT; in that case, one can freely choose a flag. Build and regtested on x86-64-gnu-linux. OK? Tobias 2013-10-30 Tobias Burnus bur...@net-b.de gcc/cp/ PR other/33426 * cp-tree.h (RANGE_FOR_IVDEP): Define. (cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt, finish_for_cond): Take 'bool ivdep' parameter. * cp-array-notation.c (create_an_loop): Update call. * init.c (build_vec_init): Ditto. * pt.c (tsubst_expr): Ditto. * parser.c (cp_parser_iteration_statement, cp_parser_for, cp_parser_range_for, cp_convert_range_for): Update calls. (cp_parser_pragma): Accept GCC ivdep for 'while' and 'do'. * semantics.c (finish_while_stmt_cond, finish_do_stmt, finish_for_cond): Optionally build ivdep annotation. gcc/testsuite/ PR other/33426 * g++.dg/vect/pr33426-ivdep-2.cc: New. * g++.dg/vect/pr33426-ivdep-3.cc: New. * g++.dg/vect/pr33426-ivdep-4.cc: New. gcc/ PR other/33426 * gcc/tree-cfg.c (replace_loop_annotate): Replace warning by warning_at. diff --git a/gcc/cp/cp-array-notation.c b/gcc/cp/cp-array-notation.c index c700f58..e1fb0ee 100644 --- a/gcc/cp/cp-array-notation.c +++ b/gcc/cp/cp-array-notation.c @@ -71,7 +71,7 @@ create_an_loop (tree init, tree cond, tree incr, tree body) finish_expr_stmt (init); for_stmt = begin_for_stmt (NULL_TREE, NULL_TREE); finish_for_init_stmt (for_stmt); - finish_for_cond (cond, for_stmt); + finish_for_cond (cond, for_stmt, false); finish_for_expr (incr, for_stmt); finish_expr_stmt (body); finish_for_stmt (for_stmt); diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 507b389..fd79adb 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -116,6 +116,7 @@ c-common.h, not after. 6: IDENTIFIER_REPO_CHOSEN (in IDENTIFIER_NODE) DECL_CONSTRUCTION_VTABLE_P (in VAR_DECL) TYPE_MARKED_P (in _TYPE) + RANGE_FOR_IVDEP (in RANGE_FOR_STMT) Usage of TYPE_LANG_FLAG_?: 0: TYPE_DEPENDENT_P @@ -4088,6 +4089,7 @@ more_aggr_init_expr_args_p (const aggr_init_expr_arg_iterator *iter) #define RANGE_FOR_EXPR(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 1) #define RANGE_FOR_BODY(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 2) #define RANGE_FOR_SCOPE(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 3) +#define RANGE_FOR_IVDEP(NODE) TREE_LANG_FLAG_6 (RANGE_FOR_STMT_CHECK (NODE)) #define SWITCH_STMT_COND(NODE) TREE_OPERAND (SWITCH_STMT_CHECK (NODE), 0) #define SWITCH_STMT_BODY(NODE) TREE_OPERAND (SWITCH_STMT_CHECK (NODE), 1) @@ -4321,7 +4323,7 @@ extern int comparing_specializations; sizeof can be nested. */ extern int cp_unevaluated_operand; -extern tree cp_convert_range_for (tree, tree, tree); +extern tree cp_convert_range_for (tree, tree, tree, bool); extern bool parsing_nsdmi (void); /* in pt.c */ @@ -5671,16 +5673,16 @@ extern void begin_else_clause (tree); extern void finish_else_clause (tree); extern void finish_if_stmt (tree); extern tree begin_while_stmt (void); -extern void finish_while_stmt_cond (tree, tree); +extern void finish_while_stmt_cond (tree, tree, bool); extern void finish_while_stmt (tree); extern tree begin_do_stmt (void); extern void finish_do_body (tree); -extern void finish_do_stmt (tree, tree); +extern void finish_do_stmt (tree, tree, bool); extern tree finish_return_stmt (tree); extern tree begin_for_scope (tree *); extern tree begin_for_stmt (tree, tree); extern void finish_for_init_stmt (tree); -extern void finish_for_cond (tree, tree); +extern void finish_for_cond (tree, tree, bool); extern void finish_for_expr (tree, tree); extern void finish_for_stmt (tree); extern tree begin_range_for_stmt (tree, tree); diff --git a/gcc/cp/init.c b/gcc/cp/init.c index 78ea986..bfd9152 100644 --- a/gcc/cp/init.c +++ b/gcc/cp/init.c @@ -3667,7 +3667,7 @@ build_vec_init (tree base, tree maxindex, tree init, finish_for_init_stmt (for_stmt); finish_for_cond (build2 (NE_EXPR, boolean_type_node, iterator, build_int_cst (TREE_TYPE (iterator), -1)), - for_stmt); + for_stmt, false); elt_init = cp_build_unary_op (PREDECREMENT_EXPR, iterator, 0, complain); if (elt_init == error_mark_node) diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 8deffc3..9e28ced 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -1978,7 +1978,7 @@ static tree
Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins
On Tue, Oct 29, 2013 at 8:48 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/10/29 Jeff Law l...@redhat.com: On 10/29/13 07:52, Ilya Enkovich wrote: Yeah. I'm working on it right now. I've fixed known issues and now I'm looking for others. Meanwhile here is a new patch version with required renames and without LTO restriction. I can't help but but curious, what turned out to be the root cause of those LTO problems? There were three different problems fixed. The first one was SSA_NAME in DECL_INITIAL of local var. Instrumentation used it to initialize var with input arg value (default SSA_NAME of PARM_DECL was used). LTO cannot handle it because when it reads symbols, it does not have SSA_NAMEs. It caused ICE. Obviously putting things in trees is bad. Another problem was in LTO front-end. I did not realize it has own langhooks. It caused reset of flag_check_pointer_bounds in process_options by my own code. And the last one was in initialization of checker structures. Some structures were initialized during checker pass and then used in other passes (e.g. expand). With LTO checker pass is not executed after LTO front-end and following passes could work with uninitialized checker structures. Looks badly designed then - any function related information should be hooked off struct function and streamed by LTO. Or the info should be present in the IL. Richard. 2013-10-29 Ilya Enkovich ilya.enkov...@intel.com * builtin-types.def (BT_FN_VOID_CONST_PTR): New. (BT_FN_PTR_CONST_PTR): New. (BT_FN_CONST_PTR_CONST_PTR): New. (BT_FN_PTR_CONST_PTR_SIZE): New. (BT_FN_PTR_CONST_PTR_CONST_PTR): New. (BT_FN_VOID_PTRPTR_CONST_PTR): New. (BT_FN_VOID_CONST_PTR_SIZE): New. (BT_FN_PTR_CONST_PTR_CONST_PTR_SIZE): New. * chkp-builtins.def: New. * builtins.def: include chkp-builtins.def. (DEF_CHKP_BUILTIN): New. * builtins.c (expand_builtin): Support BUILT_IN_CHKP_INIT_PTR_BOUNDS, BUILT_IN_CHKP_NULL_PTR_BOUNDS, BUILT_IN_CHKP_COPY_PTR_BOUNDS, BUILT_IN_CHKP_CHECK_PTR_LBOUNDS, BUILT_IN_CHKP_CHECK_PTR_UBOUNDS, BUILT_IN_CHKP_CHECK_PTR_BOUNDS, BUILT_IN_CHKP_SET_PTR_BOUNDS, BUILT_IN_CHKP_NARROW_PTR_BOUNDS, BUILT_IN_CHKP_STORE_PTR_BOUNDS, BUILT_IN_CHKP_GET_PTR_LBOUND, BUILT_IN_CHKP_GET_PTR_UBOUND, BUILT_IN_CHKP_BNDMK, BUILT_IN_CHKP_BNDSTX, BUILT_IN_CHKP_BNDCL, BUILT_IN_CHKP_BNDCU, BUILT_IN_CHKP_BNDLDX, BUILT_IN_CHKP_BNDRET, BUILT_IN_CHKP_INTERSECT, BUILT_IN_CHKP_ARG_BND, BUILT_IN_CHKP_NARROW, BUILT_IN_CHKP_EXTRACT_LOWER, BUILT_IN_CHKP_EXTRACT_UPPER. * common.opt (fcheck-pointer-bounds): New. * toplev.c (process_options): Check Pointer Bounds Checker is supported. * doc/extend.texi: Document Pointer Bounds Checker built-in functions. This is fine. Please install. Thanks! Ilya Thanks, jeff
Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces
On Tue, Oct 29, 2013 at 10:53 PM, Ilya Enkovich enkovich@gmail.com wrote: On 29 Oct 13:39, Jeff Law wrote: On 10/24/13 08:43, Ilya Enkovich wrote: +/* Return the number of arguments used by call statement GS. */ + +static inline unsigned +gimple_call_num_nobnd_args (const_gimple gs) +{ + unsigned num_args = gimple_call_num_args (gs); + unsigned res = num_args; + for (unsigned n = 0; n num_args; n++) +if (POINTER_BOUNDS_P (gimple_call_arg (gs, n))) + res--; + return res; +} number of arguments seems wrong. Aren't you counting the number of arguments without bounds? Damned copy-paste. + +/* Nonzero if this type supposes bounds existence. */ +#define BOUNDED_TYPE_P(type) \ + (TREE_CODE (type) == POINTER_TYPE \ +|| TREE_CODE (type) == REFERENCE_TYPE) So how is this different than POINTER_TYPE_P? If you really want BOUNDED_TYPE_P, perhaps define it in terms of POINTER_TYPE_P? With that and the comment fix, this is fine. jeff I'd like to keep this macro. Currently it is equal to POINTER_TYPE_P but semantics is a little different. Below is a fixed version to be committed. Thanks! Ilya -- diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c index e4b0f81..248dfea 100644 --- a/gcc/gimple-pretty-print.c +++ b/gcc/gimple-pretty-print.c @@ -539,11 +539,12 @@ dump_gimple_assign (pretty_printer *buffer, gimple gs, int spc, int flags) static void dump_gimple_return (pretty_printer *buffer, gimple gs, int spc, int flags) { - tree t; + tree t, t2; t = gimple_return_retval (gs); + t2 = gimple_return_retbnd (gs); if (flags TDF_RAW) -dump_gimple_fmt (buffer, spc, flags, %G %T, gs, t); +dump_gimple_fmt (buffer, spc, flags, %G %T %T, gs, t, t2); else { pp_string (buffer, return); @@ -552,6 +553,11 @@ dump_gimple_return (pretty_printer *buffer, gimple gs, int spc, int flags) pp_space (buffer); dump_generic_node (buffer, t, spc, flags, false); } + if (t2) + { + pp_string (buffer, , ); + dump_generic_node (buffer, t2, spc, flags, false); + } pp_semicolon (buffer); } } diff --git a/gcc/gimple.c b/gcc/gimple.c index 3ddceb9..20f6010 100644 --- a/gcc/gimple.c +++ b/gcc/gimple.c @@ -174,7 +174,7 @@ gimple_build_with_ops_stat (enum gimple_code code, unsigned subcode, gimple gimple_build_return (tree retval) { - gimple s = gimple_build_with_ops (GIMPLE_RETURN, ERROR_MARK, 1); + gimple s = gimple_build_with_ops (GIMPLE_RETURN, ERROR_MARK, 2); Ick - you enlarge all return statements? But you don't set the actual value? So why allocate it with 2 ops in the first place?? [Seems I completely missed that MPX changes gimple and the design document that was posted somewhere??] Bah. Where is the update to gimple.texi and tree.texi? Richard. if (retval) gimple_return_set_retval (s, retval); return s; @@ -366,6 +366,26 @@ gimple_build_call_from_tree (tree t) } +/* Return index of INDEX's non bound argument of the call. */ + +unsigned +gimple_call_get_nobnd_arg_index (const_gimple gs, unsigned index) +{ + unsigned num_args = gimple_call_num_args (gs); + for (unsigned n = 0; n num_args; n++) +{ + if (POINTER_BOUNDS_P (gimple_call_arg (gs, n))) + continue; + else if (index) + index--; + else + return n; +} + + gcc_unreachable (); +} + + /* Extract the operands and code for expression EXPR into *SUBCODE_P, *OP1_P, *OP2_P and *OP3_P respectively. */ diff --git a/gcc/gimple.h b/gcc/gimple.h index fef64cd..c7ce394 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -919,6 +919,7 @@ extern tree get_initialized_tmp_var (tree, gimple_seq *, gimple_seq *); extern tree get_formal_tmp_var (tree, gimple_seq *); extern void declare_vars (tree, gimple, bool); extern void annotate_all_with_location (gimple_seq, location_t); +extern unsigned gimple_call_get_nobnd_arg_index (const_gimple, unsigned); /* Validation of GIMPLE expressions. Note that these predicates only check the basic form of the expression, they don't recurse to make sure that @@ -2414,6 +2415,32 @@ gimple_call_arg (const_gimple gs, unsigned index) } +/* Return the number of arguments used by call statement GS + ignoring bound ones. */ + +static inline unsigned +gimple_call_num_nobnd_args (const_gimple gs) +{ + unsigned num_args = gimple_call_num_args (gs); + unsigned res = num_args; + for (unsigned n = 0; n num_args; n++) +if (POINTER_BOUNDS_P (gimple_call_arg (gs, n))) + res--; + return res; +} + + +/* Return INDEX's call argument ignoring bound ones. */ +static inline tree +gimple_call_nobnd_arg (const_gimple gs, unsigned index) +{ + /* No bound args may exist if pointers checker is off. */ + if (!flag_check_pointer_bounds) +return gimple_call_arg (gs, index); + return
Re: [PATCH] Vectorizing abs(char/short/int) on x86.
On Tue, Oct 29, 2013 at 6:18 PM, Cong Hou co...@google.com wrote: For the define_expand I added as below, the else body is there to avoid fall-through transformations to ABS operation in optabs.c. Otherwise ABS will be converted to other operations even that we have corresponding instructions from SSSE3. No, it wont be. Fallthrough will generate the pattern that will be matched by the insn pattern above, just like you are doing by hand below. I think the case is special for abs(). In optabs.c, there is a function expand_abs() in which the function expand_abs_nojump() is called. This function first tries the expand function defined for the target and if it fails it will try max(v, -v) then shift-xor-sub method. If I don't generate any instruction for SSSE3, the fall-through will be max(v, -v). I have tested it on my machine. I have tested the usual approach in i386.md, shown exactly by the patch below (and using your other changes to i386.c): -- cut here -- Index: sse.md === --- sse.md (revision 204149) +++ sse.md (working copy) @@ -10270,7 +10270,7 @@ (set (attr prefix_rex) (symbol_ref x86_extended_reg_mentioned_p (insn))) (set_attr mode DI)]) -(define_insn absmode2 +(define_insn *absmode2 [(set (match_operand:VI124_AVX2_48_AVX512F 0 register_operand =v) (abs:VI124_AVX2_48_AVX512F (match_operand:VI124_AVX2_48_AVX512F 1 nonimmediate_operand vm)))] @@ -10282,6 +10282,19 @@ (set_attr prefix maybe_vex) (set_attr mode sseinsnmode)]) +(define_expand absmode2 + [(set (match_operand:VI124_AVX2_48_AVX512F 0 register_operand) + (abs:VI124_AVX2_48_AVX512F + (match_operand:VI124_AVX2_48_AVX512F 1 nonimmediate_operand)))] + TARGET_SSE2 +{ + if (!TARGET_SSSE3) +{ + ix86_expand_sse2_abs (operands[0], operands[1]); + DONE; +} +}) + (define_insn absmode2 [(set (match_operand:MMXMODEI 0 register_operand =y) (abs:MMXMODEI -- cut here -- using following testcase: --cut here-- #define N 32 int ca[N]; int cb[N]; void test1 (void) { int i; for (i = 0; i N; ++i) cb[i] = abs (ca[i]); } -- cut here -- Compiling on x86_64-pc-linux-gnu target (inherently -msse2): ~/gcc-build-fast/gcc/cc1 -O2 -ftree-vectorize -dp t.c .L2: movdqa ca(%rax), %xmm0 # 25*movv4si_internal/2 [length = 8] addq$16, %rax # 30*adddi_1/1 [length = 4] movdqa ca-16(%rax), %xmm1 # 46*movv4si_internal/2 [length = 8] psrad $31, %xmm0 # 26ashrv4si3/1 [length = 5] pxor%xmm0, %xmm1# 47*xorv4si3/1 [length = 4] psubd %xmm0, %xmm1# 28*subv4si3/1 [length = 4] movaps %xmm1, cb-16(%rax) # 29*movv4si_internal/3 [length = 7] cmpq$128, %rax # 32*cmpdi_1/1 [length = 6] jne .L2 # 33*jcc_1 [length = 2] ~/gcc-build-fast/gcc/cc1 -O2 -ftree-vectorize -mssse3 -dp t.c .L2: pabsd ca(%rax), %xmm0 # 25*absv4si2 [length = 9] addq$16, %rax # 27*adddi_1/1 [length = 4] movaps %xmm0, cb-16(%rax) # 26*movv4si_internal/3 [length = 7] cmpq$128, %rax # 29*cmpdi_1/1 [length = 6] jne .L2 # 30*jcc_1 [length = 2] As shown above, it worked OK for both with -mssse3 (generating pabsd insn) and without (generating your V4SI sequence). Uros.
RE: [PATCH 1/n] Add conditional compare support
-Original Message- From: Richard Henderson [mailto:r...@redhat.com] Sent: Monday, October 28, 2013 11:07 PM To: Zhenqiang Chen; Richard Earnshaw; 'Richard Biener' Cc: GCC Patches Subject: Re: [PATCH 1/n] Add conditional compare support On 10/28/2013 01:32 AM, Zhenqiang Chen wrote: Patch is updated according to your comments. Main changes are: * Add two hooks: legitimize_cmp_combination and legitimize_ccmp_combination * Improve document. No, these are not the hooks I proposed. You should *not* have a ccmp_optab, because the middle-end has absolutely no idea what mode TARGET should be in. The hook needs to return an rtx expression appropriate for feeding to cbranch et al. E.g. for arm, (ne (reg:CC_Z CC_REGNUM) (const_int 0)) after having emitted the instruction that sets CC_REGNUM. We need to push this to the backend because for ia64 the expression needs to be of te form (ne (reg:BI new_pseudo) (const_int 0)) Thanks for the clarification. Patch is updated: * ccmp_optab is removed. * Add two hooks gen_ccmp_with_cmp_cmp and gen_ccmp_with_ccmp_cmp for backends to generate the conditional compare instructions. Thanks! -Zhenqiang ccmp-hook2.patch Description: Binary data
Re: [RFA][PATCH] Minor fix to aliasing machinery
On Tue, Oct 29, 2013 at 10:36 PM, Jeff Law l...@redhat.com wrote: Marc pointed out that the handling of various BUILT_IN_MEM* and BUILT_IN_STR* functions in tree-ssa-alias.c probably wasn't working as intended because the code wasn't prepared for a common return value from ao_ref_base, particularly returns of MEM_REFs. This patch fixes the code to handle the trivial case of returning a MEM_REF and adds a simple testcase. There's probably a lot more that could be done here. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. Ok for the trunk? Thanks, Jeff * tree-ssa-alias.c (stmt_kills_ref_p_1): Handle case where ao_ref_base returns a MEM_REF. * gcc.dg/tree-ssa/alias-26.c: New test. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/alias-26.c b/gcc/testsuite/gcc.dg/tree-ssa/alias-26.c new file mode 100644 index 000..b5625b8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/alias-26.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options -O1 -fdump-tree-optimized } */ + +void f (long *p) { + *p = 42; + p[4] = 42; + __builtin_memset (p, 0, 100); +} + +/* { dg-final { scan-tree-dump-not = 42 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ + diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c index 4db83bd..5120e72 100644 --- a/gcc/tree-ssa-alias.c +++ b/gcc/tree-ssa-alias.c @@ -2079,6 +2079,7 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref) tree dest = gimple_call_arg (stmt, 0); tree len = gimple_call_arg (stmt, 2); tree base = NULL_TREE; + tree ref_base; HOST_WIDE_INT offset = 0; if (!host_integerp (len, 0)) return false; @@ -2087,8 +2088,11 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref) offset); else if (TREE_CODE (dest) == SSA_NAME) base = dest; + ref_base = ao_ref_base (ref); if (base - base == ao_ref_base (ref)) + ((TREE_CODE (ref_base) == MEM_REF + base == TREE_OPERAND (ref_base, 0)) That's not sufficient - ref_base may have an offset, so for correctness you have to check that integer_zerop (TREE_OPERAND (ref_base, 0)). But this now looks convoluted and somewhat backward, and still does not catch all cases (including the def-stmt lookup recently added to ao_ref_from_ptr_and_size). Richard. + || ref_base == base)) { HOST_WIDE_INT size = TREE_INT_CST_LOW (len); if (offset = ref-offset / BITS_PER_UNIT
Re: [RFA][PATCH] Minor fix to aliasing machinery
On Tue, Oct 29, 2013 at 11:23 PM, Marc Glisse marc.gli...@inria.fr wrote: On Tue, 29 Oct 2013, Jeff Law wrote: Marc pointed out that the handling of various BUILT_IN_MEM* and BUILT_IN_STR* functions in tree-ssa-alias.c probably wasn't working as intended because the code wasn't prepared for a common return value from ao_ref_base, particularly returns of MEM_REFs. Hmm, ao_ref_base is never a pointer, so I'd say the issue is really with trying to use the SSA_NAME directly. Yes. Note that the code tries to relate two pointers but one is a memory-reference (ao_ref_base) and one is either a SSA name pointer or the result of get_addr_base_and_unit_offset (a memory reference as well). This patch fixes the code to handle the trivial case of returning a MEM_REF and adds a simple testcase. There's probably a lot more that could be done here. Thanks. I am not sure we want to keep the variable base that is either a decl/ref (from get_addr_base_and_unit_offset) or a pointer (dest). We know which case is which, but then forget it by storing both into base. Maybe something like this would be more type-safe. bool same = false; if (TREE_CODE (dest) == ADDR_EXPR) same = (ref_base == get_addr_base_and_unit_offset (TREE_OPERAND (dest, 0), offset)); else if (TREE_CODE (dest) == SSA_NAME TREE_CODE (ref_base) == MEM_REF) integer_zerop (TREE_OPERAND (ref_base, 1)) same = (TREE_OPERAND (ref_base, 0) == dest); if (same) ... Btw, get_addr_base_and_unit_offset may also return an offsetted MEM_REF (from MEM [p_3, 17] for example). As we are interested in pointers this could be handled by not requiring a memory reference but extracting the base address and offset, covering more cases. By the way, I think the patch is fine as is, I am only discussing possible follow-ups. Only slightly wrong-codish ;) Richard. (see http://gcc.gnu.org/ml/gcc-patches/2013-10/txto0PQEYpiuz.txt for another approach using ao_ref_init_from_ptr_and_size) -- Marc Glisse
[RFC/CFT] auto-wipe dump files [was: Re: [committed] Fix up bb-slp-31.c testcase]
Hi! I've noticed that this testcase doesn't clean up after itself. Fixed thusly, committed as obvious to trunk. This was nagging me last weekend.. ;) What about automating this? Manual part is attached. The Adjust all callers below is too big to send to the list: git grep -l -E (cleanup-.*-dump|cleanup-saved-temps) | \ egrep -v (ChangeLog|/lib/) | sed -e s|[^/]*$|| | sort | uniq | \ while read d; do find $d -type f \ -exec sed -i -e /cleanup-[^-]*[-]*dump/d;/cleanup-saved-temps/d {} + done Full regstrap on x86_64-unknown-linux-gnu with no regressions with trunk@204119 for configure \ --enable-bootstrap \ --with-system-zlib \ --without-included-gettext \ --disable-werror \ --enable-link-mutex \ --enable-nls \ --enable-plugin \ --enable-__cxa_atexit \ --enable-debug \ --enable-checking=yes,rtl \ --enable-gather-detailed-mem-stats \ --enable-multilib \ --enable-multiarch \ --with-linker-hash-style=both \ --with-as=$BINU/as \ --with-ld=$BINU/ld.gold \ --enable-languages=c,c++,fortran,lto,go,objc,obj-c++ \ make -k check -j4 Ok for trunk? Comments? Given the Fix comment delimiter hunks in the manual patch, i'd suggest to add -Wcomment as default flags where possible to catch these early on in the future. gcc/testsuite/ChangeLog 2013-10-12 Bernhard Reutner-Fischer al...@gcc.gnu.org * lib/gcc-dg.exp (cleanup-ipa-dump, cleanup-rtl-dump, cleanup-tree-dump, cleanup-dump): Remove. Adjust all callers. (schedule-cleanups): New proc. (gcc-dg-test-1): Call it. * lib/profopt.exp (profopt-execute): Likewise. * g++.dg/cdce3.C: Adjust expected line numbers. * gcc.dg/cdce1.c: Likewise. * gcc.dg/cdce2.c: Likewise. * gcc.dg/strlenopt-22.c: Fix comment delimiter. * gcc.dg/strlenopt-24.c: Likewise. * gcc.dg/tree-ssa/vrp26.c: Likewise. * gcc.dg/tree-ssa/vrp28.c: Likewise. * obj-c++.dg/encode-2.mm: Likewise. libgomp/ChangeLog 2013-10-12 Bernhard Reutner-Fischer al...@gcc.gnu.org * testsuite/libgomp.graphite/bounds.c: Adjust for cleanup-tree-dump removal. * testsuite/libgomp.graphite/force-parallel-1.c: Likewise. * testsuite/libgomp.graphite/force-parallel-2.c: Likewise. * testsuite/libgomp.graphite/force-parallel-3.c: Likewise. * testsuite/libgomp.graphite/force-parallel-4.c: Likewise. * testsuite/libgomp.graphite/force-parallel-5.c: Likewise. * testsuite/libgomp.graphite/force-parallel-6.c: Likewise. * testsuite/libgomp.graphite/force-parallel-7.c: Likewise. * testsuite/libgomp.graphite/force-parallel-8.c: Likewise. * testsuite/libgomp.graphite/force-parallel-9.c: Likewise. * testsuite/libgomp.graphite/pr41118.c: Likewise. gcc/ChangeLog 2013-10-12 Bernhard Reutner-Fischer al...@gcc.gnu.org * config/arm/neon-testgen.ml (emit_epilogue): Remove manual call to cleanup-saved-temps. gcc/doc/ChangeLog 2013-10-12 Bernhard Reutner-Fischer al...@gcc.gnu.org * doc/sourcebuild.texi (Clean up generated test files): Expand introduction. (cleanup-ipa-dump, cleanup-rtl-dump, cleanup-tree-dump, cleanup-saved-temps): Remove. From dc181880947cbfb3d652c6d9530cea07cf8280d8 Mon Sep 17 00:00:00 2001 From: Bernhard Reutner-Fischer rep.dot@gmail.com Date: Fri, 18 Oct 2013 21:08:49 +0200 Subject: [PATCH] auto-wipe dump files --- gcc/config/arm/neon-testgen.ml| 1 - gcc/doc/sourcebuild.texi | 19 ++-- gcc/testsuite/g++.dg/cdce3.C | 5 +- gcc/testsuite/gcc.dg/cdce1.c | 3 +- gcc/testsuite/gcc.dg/cdce2.c | 3 +- gcc/testsuite/gcc.dg/strlenopt-22.c | 3 +- gcc/testsuite/gcc.dg/strlenopt-24.c | 3 +- gcc/testsuite/gcc.dg/tree-ssa/vrp26.c | 3 +- gcc/testsuite/gcc.dg/tree-ssa/vrp28.c | 3 +- gcc/testsuite/lib/dg-pch.exp | 18 +++- gcc/testsuite/lib/gcc-dg.exp | 160 -- gcc/testsuite/lib/profopt.exp | 3 + gcc/testsuite/obj-c++.dg/encode-2.mm | 3 +- 13 files changed, 151 insertions(+), 76 deletions(-) diff --git a/gcc/config/arm/neon-testgen.ml b/gcc/config/arm/neon-testgen.ml index 543318b..4734ac0 100644 --- a/gcc/config/arm/neon-testgen.ml +++ b/gcc/config/arm/neon-testgen.ml @@ -139,7 +139,6 @@ let emit_epilogue chan features regexps = else () ); -Printf.fprintf chan /* { dg-final { cleanup-saved-temps } } */\n (* Check a list of C types to determine which ones are pointers and which ones are const. *) diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index 1a70916..7e0ebd9 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -2145,13 +2145,17 @@ Check branch and/or call counts, in addition to line counts, in @subsubsection Clean up generated test files +Usually the test-framework removes files that were generated during +testing. If a testcase, for example, uses any
Re: Aliasing: look through pointer's def stmt
On Wed, Oct 30, 2013 at 1:09 AM, Marc Glisse marc.gli...@inria.fr wrote: On Tue, 29 Oct 2013, Richard Biener wrote: For the POINTER_PLUS_EXPR offset argument you should use int_cst_value () to access it (it will unconditionally sign-extend) and use host_integerp (..., 0). That leaves the overflow possibility in place (and you should multiply by BITS_PER_UNIT) which we ignore in enough other places similar to this to ignore ... Like this? (passes bootstrap+testsuite on x86_64-linux-gnu) 2013-10-30 Marc Glisse marc.gli...@inria.fr gcc/ * tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Look for a POINTER_PLUS_EXPR in the defining statement. gcc/testsuite/ * gcc.dg/tree-ssa/alias-24.c: New file. -- Marc Glisse Index: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c === --- gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(working copy) @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-optimized } */ + +void f (const char *c, int *i) +{ + *i = 42; + __builtin_memcpy (i + 1, c, sizeof (int)); + if (*i != 42) __builtin_abort(); +} + +extern void keepit (); +void g (const char *c, int *i) +{ + *i = 33; + __builtin_memcpy (i - 1, c, 3 * sizeof (int)); + if (*i != 33) keepit(); +} + +/* { dg-final { scan-tree-dump-not abort optimized } } */ +/* { dg-final { scan-tree-dump keepit optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ + Property changes on: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c ___ Added: svn:keywords ## -0,0 +1 ## +Author Date Id Revision URL \ No newline at end of property Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Index: gcc/tree-ssa-alias.c === --- gcc/tree-ssa-alias.c(revision 204188) +++ gcc/tree-ssa-alias.c(working copy) @@ -567,20 +567,29 @@ void ao_ref_init_from_ptr_and_size (ao_ref *ref, tree ptr, tree size) { HOST_WIDE_INT t1, t2; ref-ref = NULL_TREE; if (TREE_CODE (ptr) == SSA_NAME) { gimple stmt = SSA_NAME_DEF_STMT (ptr); if (gimple_assign_single_p (stmt) gimple_assign_rhs_code (stmt) == ADDR_EXPR) ptr = gimple_assign_rhs1 (stmt); + else if (is_gimple_assign (stmt) + gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR + host_integerp (gimple_assign_rhs2 (stmt), 0) + (t1 = int_cst_value (gimple_assign_rhs2 (stmt))) = 0) No need to restrict this to positive offsets I think. + { + ao_ref_init_from_ptr_and_size (ref, gimple_assign_rhs1 (stmt), size); Please don't recurse - forwprop combines series of POINTER_PLUS_EXPRs and MEM[ptr, offset] so it shouldn't be necessary. + ref-offset += 8 * t1; BITS_PER_UNIT instead of 8. I'd say just have a 0-initialized additional_offset var that you unconditionally add ... + return; + } } if (TREE_CODE (ptr) == ADDR_EXPR) ref-base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0), ref-offset, t1, t2); else { ref-base = build2 (MEM_REF, char_type_node, ptr, null_pointer_node); ref-offset = 0; ... here at the end. Thanks, Richard.
RE: [PATCH 1/n] Add conditional compare support
-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Hans-Peter Nilsson Sent: Tuesday, October 29, 2013 10:38 AM To: Zhenqiang Chen Cc: Richard Earnshaw; 'Richard Biener'; GCC Patches Subject: RE: [PATCH 1/n] Add conditional compare support On Tue, 22 Oct 2013, Zhenqiang Chen wrote: ChangeLog: 2013-10-22 Zhenqiang Chen zhenqiang.c...@linaro.org * config/arm/arm.c (arm_fixed_condition_code_regs, arm_ccmode_to_code, arm_select_dominance_ccmp_mode): New functions. (arm_select_dominance_cc_mode_1): New function extracted from arm_select_dominance_cc_mode. (arm_select_dominance_cc_mode): Call arm_select_dominance_cc_mode_1. * config/arm/arm.md (ccmp, cbranchcc4, ccmp_and, ccmp_ior, ccmp_ior_scc_scc, ccmp_ior_scc_scc_cmp, ccmp_and_scc_scc, ccmp_and_scc_scc_cmp): New. * config/arm/arm-protos.h (arm_select_dominance_ccmp_mode): New. * expr.c (ccmp_candidate_p, used_in_cond_stmt_p, expand_ccmp_expr_2, expand_ccmp_expr_3, expand_ccmp_expr_1, expand_ccmp_expr): New. (expand_expr_real_1): Handle ccmp. * optabs.c: Include gimple.h. (expand_ccmp_op): New. (get_rtx_code): Handle BIT_AND_EXPR and BIT_IOR_EXPR. * optabs.def (ccmp): New. * optabs.h (expand_ccmp_op): New. * doc/md.texi (ccmp): New index. One thing I don't see other people mentioning, is that this patch has just too much code inside #ifdef HAVE_ccmp ... #endif. I couldn't actually find the part that *requires* that, i.e. code that does something like gen_ccmp (...) but maybe it's there. In the arm.md, I define (define_expand ccmp It will be transferred to function gen_ccmp (...). And it will define HAVE_ccmp. In the updated patch, I add two backends hooks to generate conditional compare instruction. Where needed and where the conditioned code is more than a few lines, such code is preferably transformed into if (0):d code using constructs like that at builtins.c:5354: #ifndef HAVE_atomic_clear # define HAVE_atomic_clear 0 # define gen_atomic_clear(x,y) (gcc_unreachable (), NULL_RTX) #endif Right, this causes dead code, but for maintenance it's *much* better than when only a fraction of the code being compiled for other targets. (Also, the HAVE_ccmp is automatically generated when the target has ccmp instruction. The code segments in #ifdef HAVE_ccmp ... #endif will *not* be compiled for other targets, which do not define ccmp instruction. dead code may be eliminated by gcc.) Unfortunately the number of examples (as above) are few compared to the pages of #if HAVE_thisorthat'd code. :( (And IMHO that whole construct should be the default implementation and shouldn't have to be written manually in the first place. But that's material for an invasive patch.) After adding another two hooks, the code in middle-end is just a framework. It fully depends on backend to generate the instruction. Thanks! -Zhenqiang
[RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads
Hi! Yesterday I've noticed that for AVX which allows unaligned operands in AVX arithmetics instructions we still don't combine unaligned loads with the AVX arithmetics instructions. So say for -O2 -mavx -ftree-vectorize void f1 (int *__restrict e, int *__restrict f) { int i; for (i = 0; i 1024; i++) e[i] = f[i] * 7; } void f2 (int *__restrict e, int *__restrict f) { int i; for (i = 0; i 1024; i++) e[i] = f[i]; } we have: vmovdqu (%rsi,%rax), %xmm0 vpmulld %xmm1, %xmm0, %xmm0 vmovups %xmm0, (%rdi,%rax) in the first loop. Apparently all the MODE_VECTOR_INT and MODE_VECTOR_FLOAT *movmode_internal patterns (and various others) use misaligned_operand to see if they should emit vmovaps or vmovups (etc.), so as suggested by Richard on IRC it isn't necessary to either allow UNSPEC_LOADU in memory operands of all the various non-move AVX instructions for TARGET_AVX, or add extra patterns to help combine, this patch instead just uses the *movmode_internal in that case (assuming initially misaligned_operand doesn't become !misaligned_operand through RTL optimizations). Additionally the patch attempts to avoid gen_lowpart on the non-MEM lhs of the unaligned loads, which usually means combine will fail, by doing the load into a temporary pseudo in that case and then doing a pseudo to pseudo move with gen_lowpart on the rhs (which will be merged soon after into following instructions). I'll bootstrap/regtest this on x86_64-linux and i686-linux, unfortunately my bootstrap/regtest server isn't AVX capable. 2013-10-30 Jakub Jelinek ja...@redhat.com * config/i386/i386.c (ix86_avx256_split_vector_move_misalign): If op1 is misaligned_operand, just use *movmode_internal insn rather than UNSPEC_LOADU load. (ix86_expand_vector_move_misalign): Likewise (for TARGET_AVX only). Avoid gen_lowpart on op0 if it isn't MEM. --- gcc/config/i386/i386.c.jj 2013-10-30 08:15:38.0 +0100 +++ gcc/config/i386/i386.c 2013-10-30 10:20:22.684708729 +0100 @@ -16560,6 +16560,12 @@ ix86_avx256_split_vector_move_misalign ( r = gen_rtx_VEC_CONCAT (GET_MODE (op0), r, m); emit_move_insn (op0, r); } + /* Normal *movmode_internal pattern will handle +unaligned loads just fine if misaligned_operand +is true, and without the UNSPEC it can be combined +with arithmetic instructions. */ + else if (misaligned_operand (op1, GET_MODE (op1))) + emit_insn (gen_rtx_SET (VOIDmode, op0, op1)); else emit_insn (load_unaligned (op0, op1)); } @@ -16634,7 +16640,7 @@ ix86_avx256_split_vector_move_misalign ( void ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[]) { - rtx op0, op1, m; + rtx op0, op1, orig_op0 = NULL_RTX, m; rtx (*load_unaligned) (rtx, rtx); rtx (*store_unaligned) (rtx, rtx); @@ -16647,7 +16653,16 @@ ix86_expand_vector_move_misalign (enum m { case MODE_VECTOR_INT: case MODE_INT: - op0 = gen_lowpart (V16SImode, op0); + if (GET_MODE (op0) != V16SImode) + { + if (!MEM_P (op0)) + { + orig_op0 = op0; + op0 = gen_reg_rtx (V16SImode); + } + else + op0 = gen_lowpart (V16SImode, op0); + } op1 = gen_lowpart (V16SImode, op1); /* FALLTHRU */ @@ -16676,6 +16691,8 @@ ix86_expand_vector_move_misalign (enum m emit_insn (store_unaligned (op0, op1)); else gcc_unreachable (); + if (orig_op0) + emit_move_insn (orig_op0, gen_lowpart (GET_MODE (orig_op0), op0)); break; default: @@ -16692,12 +16709,23 @@ ix86_expand_vector_move_misalign (enum m { case MODE_VECTOR_INT: case MODE_INT: - op0 = gen_lowpart (V32QImode, op0); + if (GET_MODE (op0) != V32QImode) + { + if (!MEM_P (op0)) + { + orig_op0 = op0; + op0 = gen_reg_rtx (V32QImode); + } + else + op0 = gen_lowpart (V32QImode, op0); + } op1 = gen_lowpart (V32QImode, op1); /* FALLTHRU */ case MODE_VECTOR_FLOAT: ix86_avx256_split_vector_move_misalign (op0, op1); + if (orig_op0) + emit_move_insn (orig_op0, gen_lowpart (GET_MODE (orig_op0), op0)); break; default: @@ -16709,15 +16737,30 @@ ix86_expand_vector_move_misalign (enum m if (MEM_P (op1)) { + /* Normal *movmode_internal pattern will handle +unaligned loads just fine if misaligned_operand +is true, and without the UNSPEC it can be combined +with arithmetic instructions. */ + if (TARGET_AVX + (GET_MODE_CLASS (mode) == MODE_VECTOR_INT + || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT) + misaligned_operand
Re: [PATCH] Fix PR middle-end/58134
On Wed, Oct 30, 2013 at 1:55 AM, Sharad Singhai sing...@google.com wrote: This patch removes a deprecated option -ftree-vectorizer-verbose. It was already implemented in terms of -fopt-info and makes little sense to keep it around longer. It causes needless confusion as reported in PR middle-end/58134. I noticed that several gettext related gcc/po files contain help translation for the option being removed. However, I haven't updated them as I wasn't sure if these files should be regenerated using a separate mechanism. If needed, I can include manual updates to those .po files as well. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. Okay for the trunk? Please keep the common.opt entry as ignored for backward compatibility (see several examples in that file). Ok with that change. Thanks, Richard. Thanks, Sharad 2013-10-29 Sharad Singhai sing...@google.com PR middle-end/58134 * opts.c (common_handle_option): Remove deprecated option -ftree-vectorizer-verbose. * doc/invoke.texi (Debugging Options): Ditto. * common.opt: Ditto. * opts-global.c (handle_common_deferred_options): Ditto. (dump_remap_tree_vectorizer_verbose): Delete.
Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins
2013/10/30 Richard Biener richard.guent...@gmail.com: On Tue, Oct 29, 2013 at 8:48 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/10/29 Jeff Law l...@redhat.com: On 10/29/13 07:52, Ilya Enkovich wrote: Yeah. I'm working on it right now. I've fixed known issues and now I'm looking for others. Meanwhile here is a new patch version with required renames and without LTO restriction. I can't help but but curious, what turned out to be the root cause of those LTO problems? There were three different problems fixed. The first one was SSA_NAME in DECL_INITIAL of local var. Instrumentation used it to initialize var with input arg value (default SSA_NAME of PARM_DECL was used). LTO cannot handle it because when it reads symbols, it does not have SSA_NAMEs. It caused ICE. Obviously putting things in trees is bad. Yes, it is fixed. Another problem was in LTO front-end. I did not realize it has own langhooks. It caused reset of flag_check_pointer_bounds in process_options by my own code. And the last one was in initialization of checker structures. Some structures were initialized during checker pass and then used in other passes (e.g. expand). With LTO checker pass is not executed after LTO front-end and following passes could work with uninitialized checker structures. Looks badly designed then - any function related information should be hooked off struct function and streamed by LTO. Or the info should be present in the IL. Info is local to pass and is not required to be streamed by LTO. It is just stored using checker interfaces in its structures. Ilya Richard. 2013-10-29 Ilya Enkovich ilya.enkov...@intel.com * builtin-types.def (BT_FN_VOID_CONST_PTR): New. (BT_FN_PTR_CONST_PTR): New. (BT_FN_CONST_PTR_CONST_PTR): New. (BT_FN_PTR_CONST_PTR_SIZE): New. (BT_FN_PTR_CONST_PTR_CONST_PTR): New. (BT_FN_VOID_PTRPTR_CONST_PTR): New. (BT_FN_VOID_CONST_PTR_SIZE): New. (BT_FN_PTR_CONST_PTR_CONST_PTR_SIZE): New. * chkp-builtins.def: New. * builtins.def: include chkp-builtins.def. (DEF_CHKP_BUILTIN): New. * builtins.c (expand_builtin): Support BUILT_IN_CHKP_INIT_PTR_BOUNDS, BUILT_IN_CHKP_NULL_PTR_BOUNDS, BUILT_IN_CHKP_COPY_PTR_BOUNDS, BUILT_IN_CHKP_CHECK_PTR_LBOUNDS, BUILT_IN_CHKP_CHECK_PTR_UBOUNDS, BUILT_IN_CHKP_CHECK_PTR_BOUNDS, BUILT_IN_CHKP_SET_PTR_BOUNDS, BUILT_IN_CHKP_NARROW_PTR_BOUNDS, BUILT_IN_CHKP_STORE_PTR_BOUNDS, BUILT_IN_CHKP_GET_PTR_LBOUND, BUILT_IN_CHKP_GET_PTR_UBOUND, BUILT_IN_CHKP_BNDMK, BUILT_IN_CHKP_BNDSTX, BUILT_IN_CHKP_BNDCL, BUILT_IN_CHKP_BNDCU, BUILT_IN_CHKP_BNDLDX, BUILT_IN_CHKP_BNDRET, BUILT_IN_CHKP_INTERSECT, BUILT_IN_CHKP_ARG_BND, BUILT_IN_CHKP_NARROW, BUILT_IN_CHKP_EXTRACT_LOWER, BUILT_IN_CHKP_EXTRACT_UPPER. * common.opt (fcheck-pointer-bounds): New. * toplev.c (process_options): Check Pointer Bounds Checker is supported. * doc/extend.texi: Document Pointer Bounds Checker built-in functions. This is fine. Please install. Thanks! Ilya Thanks, jeff
Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads
On Wed, Oct 30, 2013 at 10:47:13AM +0100, Jakub Jelinek wrote: Hi! Yesterday I've noticed that for AVX which allows unaligned operands in AVX arithmetics instructions we still don't combine unaligned loads with the AVX arithmetics instructions. So say for -O2 -mavx -ftree-vectorize void f1 (int *__restrict e, int *__restrict f) { int i; for (i = 0; i 1024; i++) e[i] = f[i] * 7; } void f2 (int *__restrict e, int *__restrict f) { int i; for (i = 0; i 1024; i++) e[i] = f[i]; } we have: vmovdqu (%rsi,%rax), %xmm0 vpmulld %xmm1, %xmm0, %xmm0 vmovups %xmm0, (%rdi,%rax) in the first loop. Apparently all the MODE_VECTOR_INT and MODE_VECTOR_FLOAT *movmode_internal patterns (and various others) use misaligned_operand to see if they should emit vmovaps or vmovups (etc.), so as suggested by That is intentional. In pre-haswell architectures splitting load is faster than 32 byte load. See Intel® 64 and IA-32 Architectures Optimization Reference Manual for details.
[PATCH C++/testsuite] Remove pchtest check objects and compile with current tool
Hi, The pch-init leaves check objects lying around, remove them. While at it, i noticed that i was getting warnings from the check since it was invoced with xg++ -nostdinc++ on C source (in one of the two iterations the check is run -- once per tool). My suggestion is to use the correct tool to perform the pch check. Does that make sense to you? Regstrapped (together with the auto-removal patch just sent) on x86_64-unknown-linux-gnu with trunk@204119 without regressions. Ok for trunk? gcc/testsuite/ChangeLog 2013-10-12 Bernhard Reutner-Fischer al...@gcc.gnu.org * lib/dg-pch.exp (pch-init): Remove pchtest check objects. Compile pchtest with current tool. diff --git a/gcc/testsuite/lib/dg-pch.exp b/gcc/testsuite/lib/dg-pch.exp index d82c669..41de454 100644 --- a/gcc/testsuite/lib/dg-pch.exp +++ b/gcc/testsuite/lib/dg-pch.exp @@ -18,6 +18,18 @@ load_lib copy-file.exp proc pch-init { args } { global pch_unsupported_debug pch_unsupported +global tool + +set chk_content +set chk_lang c-header +switch -- $tool { + g++ { + set chk_content // C++ + set chk_lang c++-header + } +} +append chk_content \nint i; +set chk_lang -x $chk_lang if [info exists pch_unsupported_debug] { error pch-init: pch_unsupported_debug is not empty as expected @@ -26,20 +38,22 @@ proc pch-init { args } { error pch-init: pch_unsupported is not empty as expected } -set result [check_compile pchtest object int i; -g -x c-header] +set result [check_compile pchtest object $chk_content -g $chk_lang] set pch_unsupported_debug \ [regexp debug format cannot be used with pre-compiled headers \ [lindex $result 0]] +remove-build-file [lindex $result 1] set pch_unsupported 0 if { $pch_unsupported_debug } { verbose -log pch is unsupported with the debug info format - set result [check_compile pchtest object int i; -x c-header] + set result [check_compile pchtest object $chk_type $chk_lang] set pch_unsupported \ [regexp debug format cannot be used with pre-compiled headers \ [lindex $result 0]] } +remove-build-file [lindex $result 1] } proc pch-finish { args } {
Re: [wide-int] More optimisations
On Tue, 29 Oct 2013, Mike Stump wrote: On Oct 29, 2013, at 5:43 AM, Richard Sandiford rsand...@linux.vnet.ibm.com wrote: Richard Biener rguent...@suse.de writes: I think the cases Mike added should only be enabled when we can figure them out at compile-time, too. Well, the idea with most of these functions was to handle the simple everything is one HWI cases inline. Yes. The idea is that 99.99% of all cases are 1 HWI or less, dynamically. By inlining and doing those cases inline, provided the code is relatively small, seemed like a win. This win comes at the cost of duplicative code in the wider path, as it checks the precision/length as well, but slowing down those cases seemed reasonable with respect to how infrequent we expect them. Now, if the slow path code is the same speed as the inline code would have been, certainly the duplication just hurts. This is why I was posting the code fragments for the fast path case. I was aiming for the fast path to be really nice to help ensure that we don't don't slow down compiles on narrow code any. Handling the len == 1 case inline is fine, just don't add too many fast special cases (mixed len == 1, len 1, special known extended cases, etc.) that are not decidable at compile-time as branchy code is both hard to optimize, and puts load on I-cache and the branch predictor. Richard.
Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads
On Wed, Oct 30, 2013 at 10:53:58AM +0100, Ondřej Bílka wrote: Yesterday I've noticed that for AVX which allows unaligned operands in AVX arithmetics instructions we still don't combine unaligned loads with the AVX arithmetics instructions. So say for -O2 -mavx -ftree-vectorize void f1 (int *__restrict e, int *__restrict f) { int i; for (i = 0; i 1024; i++) e[i] = f[i] * 7; } void f2 (int *__restrict e, int *__restrict f) { int i; for (i = 0; i 1024; i++) e[i] = f[i]; } we have: vmovdqu (%rsi,%rax), %xmm0 vpmulld %xmm1, %xmm0, %xmm0 vmovups %xmm0, (%rdi,%rax) in the first loop. Apparently all the MODE_VECTOR_INT and MODE_VECTOR_FLOAT *movmode_internal patterns (and various others) use misaligned_operand to see if they should emit vmovaps or vmovups (etc.), so as suggested by That is intentional. In pre-haswell architectures splitting load is faster than 32 byte load. But the above is 16 byte unaligned load. Furthermore, GCC supports -mavx256-split-unaligned-load and can emit 32 byte loads either as an unaligned 32 byte load, or merge of 16 byte unaligned loads. The patch affects only the cases where we were already emitting 16 byte or 32 byte unaligned loads rather than split loads. Jakub
Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads
On Wed, Oct 30, 2013 at 11:00:13AM +0100, Jakub Jelinek wrote: But the above is 16 byte unaligned load. Furthermore, GCC supports -mavx256-split-unaligned-load and can emit 32 byte loads either as an unaligned 32 byte load, or merge of 16 byte unaligned loads. The patch affects only the cases where we were already emitting 16 byte or 32 byte unaligned loads rather than split loads. With my patch, the differences (in all cases only on f1) for -O2 -mavx -ftree-vectorize with the patch is (16 byte unaligned load, not split): - vmovdqu (%rsi,%rax), %xmm0 - vpmulld %xmm1, %xmm0, %xmm0 + vpmulld (%rsi,%rax), %xmm1, %xmm0 vmovups %xmm0, (%rdi,%rax) with -O2 -mavx2 -ftree-vectorize (again, load wasn't split): - vmovdqu (%rsi,%rax), %ymm0 - vpmulld %ymm1, %ymm0, %ymm0 + vpmulld (%rsi,%rax), %ymm1, %ymm0 vmovups %ymm0, (%rdi,%rax) and with -O2 -mavx2 -mavx256-split-unaligned-load: vmovdqu (%rsi,%rax), %xmm0 vinserti128 $0x1, 16(%rsi,%rax), %ymm0, %ymm0 - vpmulld %ymm1, %ymm0, %ymm0 + vpmulld %ymm0, %ymm1, %ymm0 vmovups %ymm0, (%rdi,%rax) (the last change is just giving RTL optimizers more freedom by not doing the SUBREG on the lhs). Jakub
Re: C++ PATCH to deal with trivial but non-callable [cd]tors
In C++ all classes have destructors, but we try to defer building the implicit declaration. My patch causes us to build those implicit declarations more often, which is probably a bit of a memory regression, but it would be good for your code to handle the dtor being declared. OK, patch attached. It's large because of some internal shuffling, but it just implements that. Tested on x86-64/Linux, any objections? 2013-10-31 Eric Botcazou ebotca...@adacore.com c-family/ * c-ada-spec.h (cpp_operation): Add HAS_TRIVIAL_DESTRUCTOR. (dump_ada_specs): Adjust prototype of second callback. * c-ada-spec.c (cpp_check): New global variable. (dump_ada_nodes): Remove cpp_check parameter and do not pass it down. (print_generic_ada_decl): Likewise. (has_static_fields): Change return type to bool and add guard. (is_trivial_method): New predicate. (has_nontrivial_methods): Likewise. (is_tagged_type): Change return type to bool. (separate_class_package): Call has_nontrivial_methods. (pp_ada_tree_identifier): Minor tweaks. (dump_ada_function_declaration): Adjust calls to dump_generic_ada_node. (dump_ada_array_domains): Likewise. (dump_ada_array_type): Likewise. (dump_template_types): Remove cpp_check parameter and do not pass it to dump_generic_ada_node. (dump_ada_template): Likewise. (dump_generic_ada_node): Remove cpp_check parameter and do not pass it recursively. (print_ada_methods): Change return type to integer. Remove cpp_check parameter and do not pass it down. (dump_nested_types): Remove cpp_check parameter and do not pass it to dump_generic_ada_node. (print_ada_declaration): Likewise. Do not print trivial methods. Test RECORD_OR_UNION_TYPE_P predicate before accessing methods. (print_ada_struct_decl): Remove cpp_check parameter and do not pass it down. Use has_nontrivial_methods to recognize C++ classes. Use return value of print_ada_methods. (dump_ads): Rename cpp_check parameter to check and adjust prototype. Set cpp_check to it before invoking dump_ada_nodes. (dump_ada_specs): Likewise. cp/ * decl2.c (cpp_check): Change type of first parameter and deal with HAS_TRIVIAL_DESTRUCTOR. -- Eric BotcazouIndex: c-family/c-ada-spec.h === --- c-family/c-ada-spec.h (revision 204152) +++ c-family/c-ada-spec.h (working copy) @@ -29,13 +29,14 @@ typedef enum { IS_CONSTRUCTOR, IS_DESTRUCTOR, IS_COPY_CONSTRUCTOR, - IS_TEMPLATE + IS_TEMPLATE, + HAS_TRIVIAL_DESTRUCTOR } cpp_operation; extern location_t decl_sloc (const_tree, bool); extern void collect_ada_nodes (tree, const char *); extern void collect_source_ref (const char *); extern void dump_ada_specs (void (*)(const char *), - int (*)(tree, cpp_operation)); + int (*)(const_tree, cpp_operation)); #endif /* ! C_ADA_SPEC_H */ Index: c-family/c-ada-spec.c === --- c-family/c-ada-spec.c (revision 204152) +++ c-family/c-ada-spec.c (working copy) @@ -46,32 +46,31 @@ along with GCC; see the file COPYING3. #endif /* HOST_BITS_PER_WIDE_INT == HOST_BITS_PER_LONG */ /* Local functions, macros and variables. */ -static int dump_generic_ada_node (pretty_printer *, tree, tree, - int (*)(tree, cpp_operation), int, int, bool); -static int print_ada_declaration (pretty_printer *, tree, tree, - int (*cpp_check)(tree, cpp_operation), int); -static void print_ada_struct_decl (pretty_printer *, tree, tree, - int (*cpp_check)(tree, cpp_operation), int, - bool); +static int dump_generic_ada_node (pretty_printer *, tree, tree, int, int, + bool); +static int print_ada_declaration (pretty_printer *, tree, tree, int); +static void print_ada_struct_decl (pretty_printer *, tree, tree, int, bool); static void dump_sloc (pretty_printer *buffer, tree node); static void print_comment (pretty_printer *, const char *); -static void print_generic_ada_decl (pretty_printer *, tree, -int (*)(tree, cpp_operation), const char *); +static void print_generic_ada_decl (pretty_printer *, tree, const char *); static char *get_ada_package (const char *); -static void dump_ada_nodes (pretty_printer *, const char *, - int (*)(tree, cpp_operation)); +static void dump_ada_nodes (pretty_printer *, const char *); static void reset_ada_withs (void); static void dump_ada_withs (FILE *); static void dump_ads (const char *, void (*)(const char *), - int (*)(tree, cpp_operation)); + int (*)(const_tree, cpp_operation)); static char *to_ada_name (const char *, int *); static bool separate_class_package (tree); -#define INDENT(SPACE) do { \ - int i; for (i = 0; iSPACE; i++) pp_space (buffer); } while (0) +#define INDENT(SPACE) \
Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces
On 30 Oct 10:26, Richard Biener wrote: Ick - you enlarge all return statements? But you don't set the actual value? So why allocate it with 2 ops in the first place?? When return does not return bounds it has operand with zero value similar to case when it does not return value. What is the difference then? [Seems I completely missed that MPX changes gimple and the design document that was posted somewhere??] Design is on GCC Wiki and link was posted few times: http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler. Here is quote about return statements: Returns instrumentation. We add new operand to return statement to hold returned bounds and instrumentation pass is responsible to fill this operand with correct bounds Bah. Where is the update to gimple.texi and tree.texi? Richard. Unfortunately patch has been already installed. Should we uninstall it? If not, then here is patch for documentation. Thanks, Ilya -- gcc/ 2013-10-30 Ilya Enkovich ilya.enkov...@intel.com * doc/gimple.texi (gimple_call_num_nobnd_args): New. (gimple_call_nobnd_arg): New. (gimple_return_retbnd): New. (gimple_return_set_retbnd): New. (gimple_call_get_nobnd_arg_index): New. diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi index 7bd9fd5..be74170 100644 --- a/gcc/doc/gimple.texi +++ b/gcc/doc/gimple.texi @@ -1240,11 +1240,25 @@ Set @code{CHAIN} to be the static chain for call statement @code{G}. Return the number of arguments used by call statement @code{G}. @end deftypefn +@deftypefn {GIMPLE function} unsigned gimple_call_num_nobnd_args (gimple g) +Return the number of arguments used by call statement @code{G} +ignoring bound ones. +@end deftypefn + @deftypefn {GIMPLE function} tree gimple_call_arg (gimple g, unsigned index) Return the argument at position @code{INDEX} for call statement @code{G}. The first argument is 0. @end deftypefn +@deftypefn {GIMPLE function} tree gimple_call_nobnd_arg (gimple g, unsigned index) +Return the argument at position @code{INDEX} for call statement @code{G} +ignoring bound ones. The first argument is 0. +@end deftypefn + +@deftypefn {GIMPLE function} unsigned gimple_call_get_nobnd_arg_index (gimple g, unsigned index) +Return index of @code{INDEX}'s non bound argument of the call statement @code{G} +@end deftypefn + @deftypefn {GIMPLE function} {tree *} gimple_call_arg_ptr (gimple g, unsigned index) Return a pointer to the argument at position @code{INDEX} for call statement @code{G}. @@ -2029,6 +2043,15 @@ Return the return value for @code{GIMPLE_RETURN} @code{G}. Set @code{RETVAL} to be the return value for @code{GIMPLE_RETURN} @code{G}. @end deftypefn +@deftypefn {GIMPLE function} tree gimple_return_retbnd (gimple g) +Return the bounds of return value for @code{GIMPLE_RETURN} @code{G}. +@end deftypefn + +@deftypefn {GIMPLE function} void gimple_return_set_retbnd (gimple g, tree retbnd) +Set @code{RETBND} to be the bounds of return value for @code{GIMPLE_RETURN} +@code{G}. +@end deftypefn + @node @code{GIMPLE_SWITCH} @subsection @code{GIMPLE_SWITCH} @cindex @code{GIMPLE_SWITCH}
Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp
Without that patch, which I have copied from asan-dg.exp, I get tons of failures because ld cannot find libcilkrts. OK for committal? Tobias 2013-10-30 Tobias Burnus bur...@net-b.de * gcc.dg/cilk-plus/cilk-plus.exp: Add the libcilkrts library path to the compile flags. diff --git a/gcc/testsuite/gcc.dg/cilk-plus/cilk-plus.exp b/gcc/testsuite/gcc.dg/cilk-plus/cilk-plus.exp index a8f00d4..0a9d19b 100644 --- a/gcc/testsuite/gcc.dg/cilk-plus/cilk-plus.exp +++ b/gcc/testsuite/gcc.dg/cilk-plus/cilk-plus.exp @@ -26,7 +26,24 @@ if { ![check_effective_target_cilkplus] } { verbose $tool $libdir 1 set library_var [get_multilibs] # Pointing the ld_library_path to the Cilk Runtime library binaries. -set ld_library_path ${library_var}/libcilkrts/.libs +if { $gccpath != } { + if { [file exists ${gccpath}/libcilkrts/.libs/libcilkrts.a] + || [file exists ${gccpath}/libcilkrts/.libs/libcilkrts.${shlib_ext}] } { + append flags -B${gccpath}/libcilkrts/ + append flags -L${gccpath}/libcilkrts/.libs + append ld_library_path :${gccpath}/libcilkrts/.libs + } +} else { + global tool_root_dir + + set libcilkrts [lookfor_file ${tool_root_dir} libcilkrts] + if { $libcilkrts != } { + append flags -L${libcilkrts} + append ld_library_path :${libcilkrts} + } +} + +set_ld_library_path_env_vars dg-init dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/AN/*.c]] -fcilkplus
Re: [PATCH, MPX, 2/X] Pointers Checker [3/25] Attributes
On 24 Oct 23:34, Jeff Law wrote: On 10/21/13 05:59, Ilya Enkovich wrote: Hi, This patch adds attributes 'bnd_variable_size' and 'bnd_legacy' used by Pointers Checker. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2013-10-21 Ilya Enkovich ilya.enkov...@intel.com * c-family/c-common.c (handle_bnd_variable_size_attribute): New. (handle_bnd_legacy): New. (c_common_attribute_table): Add bnd_variable_size and bnd_legacy. * doc/extend.texi: Document bnd_variable_size and bnd_legacy attributes. OK with the same terminology changes from the 2/25 Builtins patch. jeff Thanks! Below is the installed version. Ilya -- diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index b20fdd6..f519489 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -375,6 +375,8 @@ static tree handle_omp_declare_simd_attribute (tree *, tree, tree, int, bool *); static tree handle_omp_declare_target_attribute (tree *, tree, tree, int, bool *); +static tree handle_bnd_variable_size_attribute (tree *, tree, tree, int, bool *); +static tree handle_bnd_legacy (tree *, tree, tree, int, bool *); static void check_function_nonnull (tree, int, tree *); static void check_nonnull_arg (void *, tree, unsigned HOST_WIDE_INT); @@ -757,6 +759,10 @@ const struct attribute_spec c_common_attribute_table[] = handle_omp_declare_simd_attribute, false }, { omp declare target, 0, 0, true, false, false, handle_omp_declare_target_attribute, false }, + { bnd_variable_size, 0, 0, true, false, false, + handle_bnd_variable_size_attribute, false }, + { bnd_legacy, 0, 0, true, false, false, + handle_bnd_legacy, false }, { NULL, 0, 0, false, false, false, NULL, false } }; @@ -8013,6 +8019,38 @@ handle_fnspec_attribute (tree *node ATTRIBUTE_UNUSED, tree ARG_UNUSED (name), return NULL_TREE; } +/* Handle a bnd_variable_size attribute; arguments as in + struct attribute_spec.handler. */ + +static tree +handle_bnd_variable_size_attribute (tree *node, tree name, tree ARG_UNUSED (args), + int ARG_UNUSED (flags), bool *no_add_attrs) +{ + if (TREE_CODE (*node) != FIELD_DECL) +{ + warning (OPT_Wattributes, %qE attribute ignored, name); + *no_add_attrs = true; +} + + return NULL_TREE; +} + +/* Handle a bnd_legacy attribute; arguments as in + struct attribute_spec.handler. */ + +static tree +handle_bnd_legacy (tree *node, tree name, tree ARG_UNUSED (args), + int ARG_UNUSED (flags), bool *no_add_attrs) +{ + if (TREE_CODE (*node) != FUNCTION_DECL) +{ + warning (OPT_Wattributes, %qE attribute ignored, name); + *no_add_attrs = true; +} + + return NULL_TREE; +} + /* Handle a warn_unused attribute; arguments as in struct attribute_spec.handler. */ diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 8ca3137..1d52e42 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -2138,7 +2138,7 @@ attributes are currently defined for functions on all targets: @code{returns_nonnull}, @code{gnu_inline}, @code{externally_visible}, @code{hot}, @code{cold}, @code{artificial}, @code{no_sanitize_address}, @code{no_address_safety_analysis}, -@code{no_sanitize_undefined}, +@code{no_sanitize_undefined}, @code{bnd_legacy}, @code{error} and @code{warning}. Several other attributes are defined for functions on particular target systems. Other attributes, including @code{section} are @@ -3549,6 +3549,12 @@ The @code{no_sanitize_undefined} attribute on functions is used to inform the compiler that it should not check for undefined behavior in the function when compiling with the @option{-fsanitize=undefined} option. +@item bnd_legacy +@cindex @code{bnd_legacy} function attribute +The @code{bnd_legacy} attribute on functions is used to inform +compiler that function should not be instrumented when compiled +with @option{-fcheck-pointers} option. + @item regparm (@var{number}) @cindex @code{regparm} attribute @cindex functions that are passed arguments in registers on the 386 @@ -5321,12 +5327,12 @@ placed in either the @code{.bss_below100} section or the The keyword @code{__attribute__} allows you to specify special attributes of @code{struct} and @code{union} types when you define such types. This keyword is followed by an attribute specification -inside double parentheses. Seven attributes are currently defined for +inside double parentheses. Eight attributes are currently defined for types: @code{aligned}, @code{packed}, @code{transparent_union}, -@code{unused}, @code{deprecated}, @code{visibility}, and -@code{may_alias}. Other attributes are defined for functions -(@pxref{Function
Re: [PATCH, MPX, 2/X] Pointers Checker [4/25] Constructors
On 24 Oct 23:55, Jeff Law wrote: On 10/21/13 06:10, Ilya Enkovich wrote: Hi, This patch introduces two new contructor types supported by cgraph_build_static_cdtor. 'B' type is used to initialize static objects (bounds) created by Pointers Checker. The difference of this type from the regular constructor is that 'B' constructor is never instrumented by Pointers Checker. 'P' type is used by Pointers Checker to generate constructors to initialize bounds of statically initialized pointers. Pointers Checker remove all stores from such constructors after instrumentation. Since 'P' type constructors are created for statically initialized objects, we need to avoid creation of such objects during its gimplification. New restriction was added to gimplify_init_constructor. Bootstrapped and checked on linux-x86_64. Thanks, Ilya -- gcc/ 2013-10-21 Ilya Enkovich ilya.enkov...@intel.com * ipa.c (cgraph_build_static_cdtor_1): Support contructors with chkp ctor and bnd_legacy attributes. * gimplify.c (gimplify_init_constructor): Avoid infinite loop during gimplification of bounds initializer. This is OK. As a side note, it seems awfully strange to be passing in the type of ctor/dtor in a char varaible. I'd look favorably upon changing that to an enum where the enum values describe the cases they handle. The existing code seems so, umm, 80s/90s style. Obviously not something that's required of you to move this patch forward. Thanks! Installed to trunk. Ilya jeff
Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces
On Wed, Oct 30, 2013 at 11:34 AM, Ilya Enkovich enkovich@gmail.com wrote: On 30 Oct 10:26, Richard Biener wrote: Ick - you enlarge all return statements? But you don't set the actual value? So why allocate it with 2 ops in the first place?? When return does not return bounds it has operand with zero value similar to case when it does not return value. What is the difference then? [Seems I completely missed that MPX changes gimple and the design document that was posted somewhere??] Design is on GCC Wiki and link was posted few times: http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler. Here is quote about return statements: Returns instrumentation. We add new operand to return statement to hold returned bounds and instrumentation pass is responsible to fill this operand with correct bounds foo (int * p, unsigned int size) { unnamed type __bound_tmp.0; long unsigned int D.2239; long unsigned int _2; sizetype _6; int * _7; bb 3: __bound_tmp.0_4 = __builtin_ia32_arg_bnd (p_3(D)); bb 2: _2 = (long unsigned int) size_1(D); __builtin_ia32_bndcl (__bound_tmp.0_4, p_3(D)); _6 = _2 + 18446744073709551615; _7 = p_3(D) + _6; __builtin_ia32_bndcu (__bound_tmp.0_4, _7); access_and_store (p_3(D), __bound_tmp.0_4, size_1(D)); so it seems there is now a mismatch between DECL_ARGUMENTS and the GIMPLE call stmt arguments. How (if) did you amend the GIMPLE stmt verifier for this? How does regular code deal with this which may expect matching to DECL_ARGUMENTS? In fact interleaving the additional arguments sounds very error-prone for existing code - I'd have appended all bound args at the end. Also you unconditionally claim all pointer arguments have a bound - that looks like bad design as well. Why didn't you add a flag to the relevant PARM_DECL (and then, what do you do for indirect calls?). /* Return the number of arguments used by call statement GS ignoring bound ones. */ static inline unsigned gimple_call_num_nobnd_args (const_gimple gs) { unsigned num_args = gimple_call_num_args (gs); unsigned res = num_args; for (unsigned n = 0; n num_args; n++) if (POINTER_BOUNDS_P (gimple_call_arg (gs, n))) res--; return res; } the choice means that gimple_call_num_nobnd_args is not O(1). /* Return INDEX's call argument ignoring bound ones. */ static inline tree gimple_call_nobnd_arg (const_gimple gs, unsigned index) { /* No bound args may exist if pointers checker is off. */ if (!flag_check_pointer_bounds) return gimple_call_arg (gs, index); return gimple_call_arg (gs, gimple_call_get_nobnd_arg_index (gs, index)); } GIMPLE layout depending on flag_check_pointer_bounds sounds like a recipie for desaster if you consider TUs compiled with and TUs compiled without and LTO. Or if you consider using optimized attribute with that flag. I wish I had seen all this before. Bah. Where is the update to gimple.texi and tree.texi? Richard. Unfortunately patch has been already installed. Should we uninstall it? If not, then here is patch for documentation. I hope the reviewers that approved the patch will work with you to address the above issues. I can't be everywhere. Richard. Thanks, Ilya -- gcc/ 2013-10-30 Ilya Enkovich ilya.enkov...@intel.com * doc/gimple.texi (gimple_call_num_nobnd_args): New. (gimple_call_nobnd_arg): New. (gimple_return_retbnd): New. (gimple_return_set_retbnd): New. (gimple_call_get_nobnd_arg_index): New. diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi index 7bd9fd5..be74170 100644 --- a/gcc/doc/gimple.texi +++ b/gcc/doc/gimple.texi @@ -1240,11 +1240,25 @@ Set @code{CHAIN} to be the static chain for call statement @code{G}. Return the number of arguments used by call statement @code{G}. @end deftypefn +@deftypefn {GIMPLE function} unsigned gimple_call_num_nobnd_args (gimple g) +Return the number of arguments used by call statement @code{G} +ignoring bound ones. +@end deftypefn + @deftypefn {GIMPLE function} tree gimple_call_arg (gimple g, unsigned index) Return the argument at position @code{INDEX} for call statement @code{G}. The first argument is 0. @end deftypefn +@deftypefn {GIMPLE function} tree gimple_call_nobnd_arg (gimple g, unsigned index) +Return the argument at position @code{INDEX} for call statement @code{G} +ignoring bound ones. The first argument is 0. +@end deftypefn + +@deftypefn {GIMPLE function} unsigned gimple_call_get_nobnd_arg_index (gimple g, unsigned index) +Return index of @code{INDEX}'s non bound argument of the call statement @code{G} +@end deftypefn + @deftypefn {GIMPLE function} {tree *} gimple_call_arg_ptr (gimple g, unsigned index) Return a pointer to the argument at position @code{INDEX} for call statement @code{G}. @@ -2029,6 +2043,15 @@ Return the return value for @code{GIMPLE_RETURN} @code{G}. Set @code{RETVAL} to be the
Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces
On Wed, Oct 30, 2013 at 11:48 AM, Richard Biener richard.guent...@gmail.com wrote: On Wed, Oct 30, 2013 at 11:34 AM, Ilya Enkovich enkovich@gmail.com wrote: On 30 Oct 10:26, Richard Biener wrote: Ick - you enlarge all return statements? But you don't set the actual value? So why allocate it with 2 ops in the first place?? When return does not return bounds it has operand with zero value similar to case when it does not return value. What is the difference then? [Seems I completely missed that MPX changes gimple and the design document that was posted somewhere??] Design is on GCC Wiki and link was posted few times: http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler. Here is quote about return statements: Returns instrumentation. We add new operand to return statement to hold returned bounds and instrumentation pass is responsible to fill this operand with correct bounds foo (int * p, unsigned int size) { unnamed type __bound_tmp.0; long unsigned int D.2239; long unsigned int _2; sizetype _6; int * _7; bb 3: __bound_tmp.0_4 = __builtin_ia32_arg_bnd (p_3(D)); bb 2: _2 = (long unsigned int) size_1(D); __builtin_ia32_bndcl (__bound_tmp.0_4, p_3(D)); _6 = _2 + 18446744073709551615; _7 = p_3(D) + _6; __builtin_ia32_bndcu (__bound_tmp.0_4, _7); access_and_store (p_3(D), __bound_tmp.0_4, size_1(D)); so it seems there is now a mismatch between DECL_ARGUMENTS and the GIMPLE call stmt arguments. How (if) did you amend the GIMPLE stmt verifier for this? How does regular code deal with this which may expect matching to DECL_ARGUMENTS? In fact interleaving the additional arguments sounds very error-prone for existing code - I'd have appended all bound args at the end. Also you unconditionally claim all pointer arguments have a bound - that looks like bad design as well. Why didn't you add a flag to the relevant PARM_DECL (and then, what do you do for indirect calls?). /* Return the number of arguments used by call statement GS ignoring bound ones. */ static inline unsigned gimple_call_num_nobnd_args (const_gimple gs) { unsigned num_args = gimple_call_num_args (gs); unsigned res = num_args; for (unsigned n = 0; n num_args; n++) if (POINTER_BOUNDS_P (gimple_call_arg (gs, n))) res--; return res; } the choice means that gimple_call_num_nobnd_args is not O(1). /* Return INDEX's call argument ignoring bound ones. */ static inline tree gimple_call_nobnd_arg (const_gimple gs, unsigned index) { /* No bound args may exist if pointers checker is off. */ if (!flag_check_pointer_bounds) return gimple_call_arg (gs, index); return gimple_call_arg (gs, gimple_call_get_nobnd_arg_index (gs, index)); } GIMPLE layout depending on flag_check_pointer_bounds sounds like a recipie for desaster if you consider TUs compiled with and TUs compiled without and LTO. Or if you consider using optimized attribute with that flag. Btw, we have this kind of mess pre-existing with stuff like flag_wrapv and flag_trapv, adding a new case should be a 100% no-go. If you really need to have this conditional per call then add a flag on CALL_EXPR and GIMPLE_CALL saying this call has bound arguments and return value. And append the bound arguments at the end. Richard. I wish I had seen all this before. Bah. Where is the update to gimple.texi and tree.texi? Richard. Unfortunately patch has been already installed. Should we uninstall it? If not, then here is patch for documentation. I hope the reviewers that approved the patch will work with you to address the above issues. I can't be everywhere. Richard. Thanks, Ilya -- gcc/ 2013-10-30 Ilya Enkovich ilya.enkov...@intel.com * doc/gimple.texi (gimple_call_num_nobnd_args): New. (gimple_call_nobnd_arg): New. (gimple_return_retbnd): New. (gimple_return_set_retbnd): New. (gimple_call_get_nobnd_arg_index): New. diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi index 7bd9fd5..be74170 100644 --- a/gcc/doc/gimple.texi +++ b/gcc/doc/gimple.texi @@ -1240,11 +1240,25 @@ Set @code{CHAIN} to be the static chain for call statement @code{G}. Return the number of arguments used by call statement @code{G}. @end deftypefn +@deftypefn {GIMPLE function} unsigned gimple_call_num_nobnd_args (gimple g) +Return the number of arguments used by call statement @code{G} +ignoring bound ones. +@end deftypefn + @deftypefn {GIMPLE function} tree gimple_call_arg (gimple g, unsigned index) Return the argument at position @code{INDEX} for call statement @code{G}. The first argument is 0. @end deftypefn +@deftypefn {GIMPLE function} tree gimple_call_nobnd_arg (gimple g, unsigned index) +Return the argument at position @code{INDEX} for call statement @code{G} +ignoring bound ones. The first argument is 0. +@end deftypefn + +@deftypefn {GIMPLE
Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads
On Wed, Oct 30, 2013 at 11:05:58AM +0100, Jakub Jelinek wrote: On Wed, Oct 30, 2013 at 11:00:13AM +0100, Jakub Jelinek wrote: But the above is 16 byte unaligned load. Furthermore, GCC supports -mavx256-split-unaligned-load and can emit 32 byte loads either as an unaligned 32 byte load, or merge of 16 byte unaligned loads. The patch affects only the cases where we were already emitting 16 byte or 32 byte unaligned loads rather than split loads. With my patch, the differences (in all cases only on f1) for -O2 -mavx -ftree-vectorize with the patch is (16 byte unaligned load, not split): My point was that this could mask split loads, thank for clarifying
Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads
On Wed, Oct 30, 2013 at 10:47 AM, Jakub Jelinek ja...@redhat.com wrote: Yesterday I've noticed that for AVX which allows unaligned operands in AVX arithmetics instructions we still don't combine unaligned loads with the AVX arithmetics instructions. So say for -O2 -mavx -ftree-vectorize This is actually PR 47754 that fell below radar for some reason... we have: vmovdqu (%rsi,%rax), %xmm0 vpmulld %xmm1, %xmm0, %xmm0 vmovups %xmm0, (%rdi,%rax) in the first loop. Apparently all the MODE_VECTOR_INT and MODE_VECTOR_FLOAT *movmode_internal patterns (and various others) use misaligned_operand to see if they should emit vmovaps or vmovups (etc.), so as suggested by Richard on IRC it isn't necessary to either allow UNSPEC_LOADU in memory operands of all the various non-move AVX instructions for TARGET_AVX, or add extra patterns to help combine, this patch instead just uses the *movmode_internal in that case (assuming initially misaligned_operand doesn't become !misaligned_operand through RTL optimizations). Additionally No worries here. We will generate movdqa, and it is definitely a gcc bug if RTL optimizations change misaligned operand to aligned. the patch attempts to avoid gen_lowpart on the non-MEM lhs of the unaligned loads, which usually means combine will fail, by doing the load into a temporary pseudo in that case and then doing a pseudo to pseudo move with gen_lowpart on the rhs (which will be merged soon after into following instructions). Is this similar to PR44141? There were similar problems with V4SFmode subregs, so combine was not able to merge load to the arithemtic insn. I'll bootstrap/regtest this on x86_64-linux and i686-linux, unfortunately my bootstrap/regtest server isn't AVX capable. I can bootstrap the patch later today on IvyBridge with --with-arch=core-avx-i --with-cpu=core-avx-i --with-fpmath=avx. Uros.
[Patch ARM] Fix PR target/58854
PR58854 is another manifestation of the usual issue with the scheduler not understanding sp fp overlap issues and hoisting such adjustments (see PR38644 for a long history on an earlier one) over accesses from the frame. The usual bodge in the backend is attached. Tested arm-none-eabi cross - tested that the testcase works just fine. Applied to trunk and will backport to the 4.8 branch in a day or so after the auto-testers have had a chance to play with this. regards Ramana 2013-10-30 Ramana Radhakrishnan ramana.radhakrish...@arm.com PR target/58854 * config/arm/arm.c (arm_expand_epilogue_apcs_frame): Emit blockage.diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 212a4bc..23dfc0e 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -26547,6 +26547,7 @@ arm_expand_epilogue_apcs_frame (bool really_return) num_regs = bit_count (saved_regs_mask); if ((offsets-outgoing_args != (1 + num_regs)) || cfun-calls_alloca) { + emit_insn (gen_blockage ()); /* Unwind the stack to just below the saved registers. */ emit_insn (gen_addsi3 (stack_pointer_rtx, hard_frame_pointer_rtx,
Re: [PATCH] Introduce [sg]et_nonzero_bits
On Tue, 29 Oct 2013, Jakub Jelinek wrote: On Tue, Oct 29, 2013 at 12:55:04PM +0100, Richard Biener wrote: Surely you can't rely on CCP and VRP compute exactly the same nonzero_bits. As you don't record/compute zero_bits you can't tell whether a not set bit in nonzer_bits is don't know or if it is zero. And you cannot do an assert during iteration (during iteration optimistically 'wrong' values can disappear). During CCP iteration the rule is that bits may only be added to mask (and obviously the constant itself on a CONSTANT lattice value may not change). Factor this out to commonize with code in evaluate_stmt Ok. + tem = val-mask.low; + align = (tem -tem); + if (align 1) + set_ptr_info_alignment (get_ptr_info (name), align, + (TREE_INT_CST_LOW (val-value) + (align - 1))); + } + else + { + double_int nonzero_bits = val-mask; + nonzero_bits = nonzero_bits | tree_to_double_int (val-value); + nonzero_bits = get_nonzero_bits (name); This looks odd to me - shouldn't it be simply nonzero_bits = ~val-mask tree_to_double_int (val-value); Bits set in the mask are for the bits where the value is unknown, so potentially nonzero bits. Where bits are clear in the mask, but set in the value, those are surely nonzero bits. The only known zero bits are where both mask and value are zero, and as nonzero_bits is defined to be 1 where the bit may be non-zero, 0 where it must be zero (like it is for nonzero_bits* in RTL), I think val-mask | tree_to_double_int (val-value); is exactly what we want. Ah, I think I missed that detail (1 in nonzero_bits means maybe non-zero and 0 means must-be zero) - a bit odd, but yeah, making it consistent with RTL sounds good. ? = of get_nonzero_bits either is not necessary or shows the lattice is unnecessarily conservative because you apply it too early during propagation? + set_nonzero_bits (name, nonzero_bits); + } } /* Perform substitutions based on the known constant values. */ @@ -1671,6 +1700,25 @@ evaluate_stmt (gimple stmt) is_constant = (val.lattice_val == CONSTANT); } + if (flag_tree_bit_ccp + !is_constant ^^^ ah, so if the bit-CCP part figured out a set of known bits then you don't do anything here while in fact you can still remove bits from mask with get_nonzero_bits info. Yeah, that was not to lose info for the case where the lattice is already CONSTANT. CONSTANT and with mask == 0, sure. Remember the lattice is also CONSTANT if only some bits are known. Originally, I had code like: --- tree-ssa-ccp.c.xx 2013-10-29 15:31:03.0 +0100 +++ tree-ssa-ccp.c2013-10-29 15:34:44.257977817 +0100 @@ -1701,8 +1701,7 @@ evaluate_stmt (gimple stmt) } if (flag_tree_bit_ccp - !is_constant - likelyvalue != UNDEFINED + (is_constant || likelyvalue != UNDEFINED) gimple_get_lhs (stmt) TREE_CODE (gimple_get_lhs (stmt)) == SSA_NAME) { @@ -1711,11 +1710,27 @@ evaluate_stmt (gimple stmt) double_int mask = double_int::mask (TYPE_PRECISION (TREE_TYPE (lhs))); if (nonzero_bits != double_int_minus_one nonzero_bits != mask) { - val.lattice_val = CONSTANT; - val.value = build_zero_cst (TREE_TYPE (lhs)); - /* CCP wants the bits above precision set. */ - val.mask = nonzero_bits | ~mask; - is_constant = true; + if (!is_constant) + { + val.lattice_val = CONSTANT; + val.value = build_zero_cst (TREE_TYPE (lhs)); + /* CCP wants the bits above precision set. */ + val.mask = nonzero_bits | ~mask; + is_constant = true; + } + else + { + double_int valv = tree_to_double_int (val.value); + gcc_assert ((valv ~val.mask +~nonzero_bits mask).is_zero ()); + if (!(valv ~nonzero_bits mask).is_zero ()) + val.value = double_int_to_tree (TREE_TYPE (lhs), + valv nonzero_bits); + if (nonzero_bits.is_zero ()) + val.mask = double_int_zero; + else + val.mask = val.mask (nonzero_bits | ~mask); + } } } in the patch, but that breaks e.g. on void foo (unsigned int a, unsigned b, unsigned **c, void *d[8], int e) { int i; for (i = 0; i != e; i++); if (a) { for (i = 0;;) { i--; if (bar ()) goto lab2; } lab1:; __builtin_printf (%d\n, i 0 ? -1 - i : i); return; } lab2:; if (!d[b] *c) goto lab1; } at -O2 - VRP1 figures out: if (i_3 0) goto bb 10; else goto bb 11; bb 10: # RANGE [0, 2147483647] NONZERO 0x07fff iftmp.0_23
Re: [wide-int] Update main comment
Kenneth Zadeck zad...@naturalbridge.com writes: On 10/29/2013 06:37 PM, Richard Sandiford wrote: This patch tries to update the main wide_int comment to reflect the current implementation. - bitsizetype is TImode on x86_64 and others, so I don't think it's necessarily true that all offset_ints are signed. (widest_int are though.) i am wondering if this is too conservative an interpretation.I believe that they are ti mode because that is the next thing after di mode and so they wanted to accommodate the 3 extra bits. Certainly there is no x86 that is able to address more than 64 bits. Right, but my point is that it's a different case from widest_int. It'd be just as valid to do bitsizetype arithmetic using wide_int rather than offset_int, and those wide_ints would have precision 128, just like the offset_ints. And I wouldn't really say that those wide_ints were fundamentally signed in any way. Although the tree layer might know that X upper bits of the bitsizetype are always signs, the tree-wide_int interface treats them in the same way as any other 128-bit type. Maybe I'm just being pedantic, but I think offset_int would only be like widest_int if bitsizetype had precision 67 or whatever. Then we could say that both offset_int and widest_int must be wider than any inputs, meaning that there's at least one leading sign bit. This is related to the way that we have to assert: template int N inline wi::extended_tree N::extended_tree (const_tree t) : m_t (t) { gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) = N); } rather than: template int N inline wi::extended_tree N::extended_tree (const_tree t) : m_t (t) { gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) N); } (which would give slightly better offset_int code, because we could then always use TREE_INT_CST_EXT_NUNITS.) Thanks, Richard
Re: [PATCH] Introduce [sg]et_nonzero_bits
On Tue, 29 Oct 2013, Jakub Jelinek wrote: Hi! On Tue, Oct 29, 2013 at 04:29:56PM +0100, Jakub Jelinek wrote: Surely you can't rely on CCP and VRP compute exactly the same nonzero_bits. As you don't record/compute zero_bits you can't tell whether a not set bit in nonzer_bits is don't know or if it is zero. And you cannot do an assert during iteration Unset bits in get_nonzero_bits are always known to be zero, set bits are don't know or known to be nonzero. (during iteration optimistically 'wrong' values can disappear). During CCP iteration the rule is that bits may only be added to mask (and obviously the constant itself on a CONSTANT lattice value may not change). Here is an untested safer variant of the original patch that attempts to honor the constraints you've mentioned by looking at get_value and either making nonzero_bits more conservative or not applying it at all if the above rules would be violated. Still, not doing anything for the !is_constant case, because I don't know what would be the right thing. Say, if for iteration we assume some constant value that has some bits unset in mask (known) and set in value (known non-zero), yet they are clear in get_nonzero_bits, shall we punt, or ignore what the iteration would try to do and adjust from get_nonzero_bits instead, something else? Well, you'd unconditionally make the lattice value more constant using get_nonzero_bits (which doesn't change during iteration). If it is applied consistently then convergence shouldn't change. So as how get_nonzero_bits works you can only force the kown zero bits to the lattice. You don't need to be concerned about invalid lattice transitions as with always applying a constant op you'll never get those. [and if you'd ever arrive at get_nonzero_bits saying that a bit is zero but the lattice value is not zero in that bit you have the choice of dropping to don't know for that bit or enforcing the zeroness of the bit by changing the lattice value as well as its mask] Richard. Factor this out to commonize with code in evaluate_stmt Ok. But in this more conservative patch I don't see enough code to commonize any more. The only change in the patch below since the previous patch is in the evaluate_stmt function: 2013-10-29 Jakub Jelinek ja...@redhat.com * gimple-pretty-print.c (dump_ssaname_info): Print newline also in case of VR_VARYING. Print get_nonzero_bits if not all ones. * tree-ssanames.h (struct range_info_def): Add nonzero_bits field. (set_nonzero_bits, get_nonzero_bits): New prototypes. * tree-ssa-ccp.c (get_default_value): Use get_range_info to see if a default def isn't partially constant. (ccp_finalize): If after IPA, set_range_info if integral SSA_NAME is known to be partially zero. (evaluate_stmt): If we'd return otherwise VARYING, use get_range_info to see if a default def isn't partially constant. * tree-ssanames.c (set_range_info): Initialize nonzero_bits upon creation of a range, if VR_RANGE, try to improve nonzero_bits from the range. (set_nonzero_bits, get_nonzero_bits): New functions. * g++.dg/warn/pr33738.C (main): Initialize a2 again to make sure we warn about it already during VRP1 pass. --- gcc/gimple-pretty-print.c.jj 2013-10-23 14:43:12.0 +0200 +++ gcc/gimple-pretty-print.c 2013-10-24 17:26:59.650945232 +0200 @@ -1731,7 +1731,7 @@ dump_ssaname_info (pretty_printer *buffe if (!POINTER_TYPE_P (TREE_TYPE (node)) SSA_NAME_RANGE_INFO (node)) { - double_int min, max; + double_int min, max, nonzero_bits; value_range_type range_type = get_range_info (node, min, max); if (range_type == VR_VARYING) @@ -1744,8 +1744,20 @@ dump_ssaname_info (pretty_printer *buffe pp_printf (buffer, , ); pp_double_int (buffer, max, TYPE_UNSIGNED (TREE_TYPE (node))); pp_printf (buffer, ]); - newline_and_indent (buffer, spc); } + nonzero_bits = get_nonzero_bits (node); + if (nonzero_bits != double_int_minus_one +(nonzero_bits + != double_int::mask (TYPE_PRECISION (TREE_TYPE (node) + { + pp_string (buffer, NONZERO ); + sprintf (pp_buffer (buffer)-digit_buffer, +HOST_WIDE_INT_PRINT_DOUBLE_HEX, +(unsigned HOST_WIDE_INT) nonzero_bits.high, +nonzero_bits.low); + pp_string (buffer, pp_buffer (buffer)-digit_buffer); + } + newline_and_indent (buffer, spc); } } --- gcc/tree-ssanames.h.jj2013-09-27 15:42:44.0 +0200 +++ gcc/tree-ssanames.h 2013-10-24 15:52:53.765955605 +0200 @@ -52,6 +52,8 @@ struct GTY (()) range_info_def { double_int min; /* Maximum for value range. */ double_int max; + /* Non-zero bits - bits not set are guaranteed to be always zero. */ +
Re: [PATCH] Use get_nonzero_bits to improve vectorization
On Tue, 29 Oct 2013, Jakub Jelinek wrote: On Tue, Oct 29, 2013 at 01:11:53PM +0100, Richard Biener wrote: +/* Return number of known trailing zero bits in EXPR, or, if the value of + EXPR is known to be zero, the precision of it's type. */ + +int unsigned int? Ok. +case PLUS_EXPR: +case MINUS_EXPR: +case BIT_IOR_EXPR: +case BIT_XOR_EXPR: +case MIN_EXPR: +case MAX_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); This first recurses but if either one returns 0 you don't have to (that cuts down the recursion from exponential to linear in the common case?). Thus, early out on ret == 0? Yeah, that is reasonable. Usually we use this during expansion etc., trees won't be arbitrarily large and for SSA_NAMEs we don't recurse on definitions, so I think we are unlikely to ever see very large expressions there though. I've added similar early out to the COND_EXPR case. + return MIN (ret1, ret2); +case POINTER_PLUS_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + ret2 = MIN (ret2, prec); Why do you need that here but not elsewhere when processing binary ops? Because POINTER_PLUS_EXPR (well, also shifts, but for those we don't call tree_ctz on the second argument) is the only binop we handle that uses different types for the operands, for the first argument we know it has the same precision as the result, but what if sizetype has bigger precision than TYPE_PRECISION of pointers? I know it typically doesn't, just wanted to make sure we never return an out of range return value, tree_ctz result should be in [0, prec] always. Ah, indeed. Maybe worth a comment. + return MIN (ret1, ret2); +case BIT_AND_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + return MAX (ret1, ret2); +case MULT_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + return MIN (ret1 + ret2, prec); +case LSHIFT_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + if (host_integerp (TREE_OPERAND (expr, 1), 1) check that first before recursing for op0 - if op1 is negative you simply return ret1 which looks wrong, too. If op1 is negative, then it is undefined behavior, so assuming in that case the same thing as when op1 is not constant (i.e. that we worst case shift left by 0 and thus not increase number of trailing zeros, or shift left by 0 and increase it) is IMHO correct. I don't think we treat left shift by negative value as right shift anywhere, do we? We do during constant folding I think. +((unsigned HOST_WIDE_INT) tree_low_cst (TREE_OPERAND (expr, 1), 1) +(unsigned HOST_WIDE_INT) prec)) This check is to avoid overflowing ret1 + ret2? If so, why not add + { + ret2 = tree_low_cst (TREE_OPERAND (expr, 1), 1); ret2 = MIN (ret2, prec); instead? Because ret{1,2} are int (well, with the change you've suggested unsigned int), but tree_low_cst is UHWI, so if the shift count is say UINT_MAX + 1 (yeah, surely undefined behavior), I just didn't want to lose any of the upper bits. Sure, alternatively I could have an UHWI temporary variable and assign tree_low_cst to it and do the MIN on that temporary and prec, but still I think it is better to handle out of range constants as variable shift count rather than say shift count up by prec - 1. Ok. Thanks, Richard. +case TRUNC_DIV_EXPR: +case CEIL_DIV_EXPR: +case FLOOR_DIV_EXPR: +case ROUND_DIV_EXPR: +case EXACT_DIV_EXPR: + if (TREE_CODE (TREE_OPERAND (expr, 1)) == INTEGER_CST) + { + ret2 = tree_log2 (TREE_OPERAND (expr, 1)); + if (ret2 = 0 tree_int_cst_sgn (TREE_OPERAND (expr, 1)) == 1) cheaper to test the sign first. Ok. + { + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + if (ret1 ret2) + return ret1 - ret2; + } + } + return 0; +CASE_CONVERT: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + if (ret1 ret1 == TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (expr, 0 + ret1 = prec; + return MIN (ret1, prec); +case SAVE_EXPR: + return tree_ctz (TREE_OPERAND (expr, 0)); +case COND_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 1)); + ret2 = tree_ctz (TREE_OPERAND (expr, 2)); + return MIN (ret1, ret2); +case COMPOUND_EXPR: + return tree_ctz (TREE_OPERAND (expr, 1)); +case ADDR_EXPR: + ret1 = get_object_alignment (TREE_OPERAND (expr, 0)); Use get_pointer_alignment, this isn't a memory reference so type alignment rules don't apply. Ok.
Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads
On Wed, Oct 30, 2013 at 11:55:44AM +0100, Uros Bizjak wrote: On Wed, Oct 30, 2013 at 10:47 AM, Jakub Jelinek ja...@redhat.com wrote: Yesterday I've noticed that for AVX which allows unaligned operands in AVX arithmetics instructions we still don't combine unaligned loads with the AVX arithmetics instructions. So say for -O2 -mavx -ftree-vectorize This is actually PR 47754 that fell below radar for some reason... Apparently yes. the patch attempts to avoid gen_lowpart on the non-MEM lhs of the unaligned loads, which usually means combine will fail, by doing the load into a temporary pseudo in that case and then doing a pseudo to pseudo move with gen_lowpart on the rhs (which will be merged soon after into following instructions). Is this similar to PR44141? There were similar problems with V4SFmode subregs, so combine was not able to merge load to the arithemtic insn. From the work on the vectorization last year I remember many cases where subregs (even equal size) on the LHS of instructions prevented combiner or other RTL optimizations from doing it's job. I believe I've changed some easy places that did that completely unnecessarily, but certainly have not went through all the code to look for other places where this is done. Perhaps let's hack up a checking pass that will after expansion walk the whole IL and complain about same sized subregs on the LHS of insns, then do make check with it for a couple of ISAs (-msse2,-msse4,-mavx,-mavx2 e.g.? I'll bootstrap/regtest this on x86_64-linux and i686-linux, unfortunately my bootstrap/regtest server isn't AVX capable. I can bootstrap the patch later today on IvyBridge with --with-arch=core-avx-i --with-cpu=core-avx-i --with-fpmath=avx. That would be greatly appreciated, thanks. Jakub
Re: [PATCH] Introducing SAD (Sum of Absolute Differences) operation to GCC vectorizer.
On Tue, 29 Oct 2013, Cong Hou wrote: Hi SAD (Sum of Absolute Differences) is a common and important algorithm in image processing and other areas. SSE2 even introduced a new instruction PSADBW for it. A SAD loop can be greatly accelerated by this instruction after being vectorized. This patch introduced a new operation SAD_EXPR and a SAD pattern recognizer in vectorizer. The pattern of SAD is shown below: unsigned type x_t, y_t; signed TYPE1 diff, abs_diff; TYPE2 sum = init; loop: sum_0 = phi init, sum_1 S1 x_t = ... S2 y_t = ... S3 x_T = (TYPE1) x_t; S4 y_T = (TYPE1) y_t; S5 diff = x_T - y_T; S6 abs_diff = ABS_EXPR diff; [S7 abs_diff = (TYPE2) abs_diff; #optional] S8 sum_1 = abs_diff + sum_0; where 'TYPE1' is at least double the size of type 'type', and 'TYPE2' is the same size of 'TYPE1' or bigger. This is a special case of a reduction computation. For SSE2, type is char, and TYPE1 and TYPE2 are int. In order to express this new operation, a new expression SAD_EXPR is introduced in tree.def, and the corresponding entry in optabs is added. The patch also added the define_expand for SSE2 and AVX2 platforms for i386. The patch is pasted below and also attached as a text file (in which you can see tabs). Bootstrap and make check got passed on x86. Please give me your comments. Apart from the testcase comment made earlier +++ b/gcc/tree-cfg.c @@ -3797,6 +3797,7 @@ verify_gimple_assign_ternary (gimple stmt) return false; case DOT_PROD_EXPR: +case SAD_EXPR: case REALIGN_LOAD_EXPR: /* FIXME. */ return false; please add proper verification of the operand types. +/* Widening sad (sum of absolute differences). + The first two arguments are of type t1 which should be unsigned integer. + The third argument and the result are of type t2, such that t2 is at least + twice the size of t1. SAD_EXPR(arg1,arg2,arg3) is equivalent to: + tmp1 = WIDEN_MINUS_EXPR (arg1, arg2); + tmp2 = ABS_EXPR (tmp1); + arg3 = PLUS_EXPR (tmp2, arg3); */ +DEFTREECODE (SAD_EXPR, sad_expr, tcc_expression, 3) WIDEN_MINUS_EXPR doesn't exist so you have to explain on its operation (it returns a signed wide difference?). Why should the first two arguments be unsigned? I cannot see a good reason to require that (other than that maybe the x86 target only has support for widened unsigned difference?). So if you want to make that restriction maybe change the name to SADU_EXPR (sum of absolute differences of unsigned)? I suppose you tried introducing WIDEN_MINUS_EXPR instead and letting combine do it's work, avoiding the very special optab? Thanks, Richard. thanks, Cong diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a38316..d528307 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,23 @@ +2013-10-29 Cong Hou co...@google.com + + * tree-vect-patterns.c (vect_recog_sad_pattern): New function for SAD + pattern recognition. + (type_conversion_p): PROMOTION is true if it's a type promotion + conversion, and false otherwise. Return true if the given expression + is a type conversion one. + * tree-vectorizer.h: Adjust the number of patterns. + * tree.def: Add SAD_EXPR. + * optabs.def: Add sad_optab. + * cfgexpand.c (expand_debug_expr): Add SAD_EXPR case. + * expr.c (expand_expr_real_2): Likewise. + * gimple-pretty-print.c (dump_ternary_rhs): Likewise. + * gimple.c (get_gimple_rhs_num_ops): Likewise. + * optabs.c (optab_for_tree_code): Likewise. + * tree-cfg.c (estimate_operator_cost): Likewise. + * tree-ssa-operands.c (get_expr_operands): Likewise. + * tree-vect-loop.c (get_initial_def_for_reduction): Likewise. + * config/i386/sse.md: Add SSE2 and AVX2 expand for SAD. + 2013-10-14 David Malcolm dmalc...@redhat.com * dumpfile.h (gcc::dump_manager): New class, to hold state diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index 7ed29f5..9ec761a 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -2730,6 +2730,7 @@ expand_debug_expr (tree exp) { case COND_EXPR: case DOT_PROD_EXPR: + case SAD_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: case FMA_EXPR: diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index c3f6c94..ca1ab70 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -6052,6 +6052,40 @@ DONE; }) +(define_expand sadv16qi + [(match_operand:V4SI 0 register_operand) + (match_operand:V16QI 1 register_operand) + (match_operand:V16QI 2 register_operand) + (match_operand:V4SI 3 register_operand)] + TARGET_SSE2 +{ + rtx t1 = gen_reg_rtx (V2DImode); + rtx t2 = gen_reg_rtx (V4SImode); + emit_insn (gen_sse2_psadbw (t1, operands[1], operands[2])); + convert_move (t2, t1, 0); + emit_insn (gen_rtx_SET (VOIDmode, operands[0], + gen_rtx_PLUS (V4SImode, + operands[3], t2))); + DONE; +}) +
RE: [PATCH 1/n] Add conditional compare support
On Wed, 30 Oct 2013, Zhenqiang Chen wrote: -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Hans-Peter Nilsson One thing I don't see other people mentioning, is that this patch has just too much code inside #ifdef HAVE_ccmp ... #endif. I couldn't actually find the part that *requires* that, i.e. code that does something like gen_ccmp (...) but maybe it's there. In the arm.md, I define (define_expand ccmp It will be transferred to function gen_ccmp (...). And it will define HAVE_ccmp. I know. I have not misunderstood that machinery. The point again, is that inside the #ifdef HAVE_ccmp there should contain as few lines as possible. In the updated patch, I add two backends hooks to generate conditional compare instruction. And that is good. Where you use hooks, we don't need the #ifdef HAVE_ccmp. That is one of the major points with hooks. Where needed and where the conditioned code is more than a few lines, such code is preferably transformed into if (0):d code using constructs like that at builtins.c:5354: #ifndef HAVE_atomic_clear # define HAVE_atomic_clear 0 # define gen_atomic_clear(x,y) (gcc_unreachable (), NULL_RTX) #endif Right, this causes dead code, but for maintenance it's *much* better than when only a fraction of the code being compiled for other targets. (Also, the HAVE_ccmp is automatically generated when the target has ccmp instruction. Exactly like with my example; atomic_clear is generated from an insn pattern. ...or actually, it appears it's *supposed* to be, but I don't see any target defining it and it's not special in genconfig.c and not even documented in md.texi so that might be code dead everywhere! We have atomic_store but no target defines atomic_clear; certainly not without a mode. My bad for using a bad example. For a better, live example, see function.c:4665 same construct as above but with HAVE_stack_protect_set. The code segments in #ifdef HAVE_ccmp ... #endif will *not* be compiled for other targets, which do not define ccmp instruction. I think you misunderstood: *that is the problem*. We want as little as possible code to be conditional and inside #if... preprocessor macros; we want it to use hooks and if (0) code. brgds, H-P
Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces
2013/10/30 Richard Biener richard.guent...@gmail.com: On Wed, Oct 30, 2013 at 11:34 AM, Ilya Enkovich enkovich@gmail.com wrote: On 30 Oct 10:26, Richard Biener wrote: Ick - you enlarge all return statements? But you don't set the actual value? So why allocate it with 2 ops in the first place?? When return does not return bounds it has operand with zero value similar to case when it does not return value. What is the difference then? [Seems I completely missed that MPX changes gimple and the design document that was posted somewhere??] Design is on GCC Wiki and link was posted few times: http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler. Here is quote about return statements: Returns instrumentation. We add new operand to return statement to hold returned bounds and instrumentation pass is responsible to fill this operand with correct bounds foo (int * p, unsigned int size) { unnamed type __bound_tmp.0; long unsigned int D.2239; long unsigned int _2; sizetype _6; int * _7; bb 3: __bound_tmp.0_4 = __builtin_ia32_arg_bnd (p_3(D)); bb 2: _2 = (long unsigned int) size_1(D); __builtin_ia32_bndcl (__bound_tmp.0_4, p_3(D)); _6 = _2 + 18446744073709551615; _7 = p_3(D) + _6; __builtin_ia32_bndcu (__bound_tmp.0_4, _7); access_and_store (p_3(D), __bound_tmp.0_4, size_1(D)); so it seems there is now a mismatch between DECL_ARGUMENTS and the GIMPLE call stmt arguments. How (if) did you amend the GIMPLE stmt verifier for this? Verifier just ignores bound arguments while iterating through them. How does regular code deal with this which may expect matching to DECL_ARGUMENTS? In fact interleaving the additional arguments sounds very error-prone for existing code - I'd have appended all bound args at the end. Also you unconditionally claim all pointer arguments have a bound - that looks like bad design as well. Why didn't you add a flag to the relevant PARM_DECL (and then, what do you do for indirect calls?). I'll consider using another layout for bound args. But why should we have any PARM_DECL or other pointer not having bounds? /* Return the number of arguments used by call statement GS ignoring bound ones. */ static inline unsigned gimple_call_num_nobnd_args (const_gimple gs) { unsigned num_args = gimple_call_num_args (gs); unsigned res = num_args; for (unsigned n = 0; n num_args; n++) if (POINTER_BOUNDS_P (gimple_call_arg (gs, n))) res--; return res; } the choice means that gimple_call_num_nobnd_args is not O(1). Yes. And even having all bound args at the end would not fix it. We have to additionally keep number of bounds if want to fix it. But I do not see the strong reason for that. Currently there are just three calls to this function in whole GCC. /* Return INDEX's call argument ignoring bound ones. */ static inline tree gimple_call_nobnd_arg (const_gimple gs, unsigned index) { /* No bound args may exist if pointers checker is off. */ if (!flag_check_pointer_bounds) return gimple_call_arg (gs, index); return gimple_call_arg (gs, gimple_call_get_nobnd_arg_index (gs, index)); } GIMPLE layout depending on flag_check_pointer_bounds sounds like a recipie for desaster if you consider TUs compiled with and TUs compiled without and LTO. Or if you consider using optimized attribute with that flag. Yep. Marking instrumented calls and functions would be useful in LTO case. I wish I had seen all this before. Bah. Where is the update to gimple.texi and tree.texi? Richard. Unfortunately patch has been already installed. Should we uninstall it? If not, then here is patch for documentation. I hope the reviewers that approved the patch will work with you to address the above issues. I can't be everywhere. Thanks for valuable input! Ilya Richard. Thanks, Ilya -- gcc/ 2013-10-30 Ilya Enkovich ilya.enkov...@intel.com * doc/gimple.texi (gimple_call_num_nobnd_args): New. (gimple_call_nobnd_arg): New. (gimple_return_retbnd): New. (gimple_return_set_retbnd): New. (gimple_call_get_nobnd_arg_index): New. diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi index 7bd9fd5..be74170 100644 --- a/gcc/doc/gimple.texi +++ b/gcc/doc/gimple.texi @@ -1240,11 +1240,25 @@ Set @code{CHAIN} to be the static chain for call statement @code{G}. Return the number of arguments used by call statement @code{G}. @end deftypefn +@deftypefn {GIMPLE function} unsigned gimple_call_num_nobnd_args (gimple g) +Return the number of arguments used by call statement @code{G} +ignoring bound ones. +@end deftypefn + @deftypefn {GIMPLE function} tree gimple_call_arg (gimple g, unsigned index) Return the argument at position @code{INDEX} for call statement @code{G}. The first argument is 0. @end deftypefn +@deftypefn {GIMPLE function} tree gimple_call_nobnd_arg (gimple
[PATCH] Fix PR57100, add pre_and_rev_post_order_compute_fn
This adds a struct function taking variant of pre_and_rev_post_order_compute that also drops the assert that all blocks were visited (consistent with the function comment). This makes graph.c not ICE on CFGs with unreachable blocks which it tries to handle already, but it didn't notice that RPO order is filled from the end and that pre_and_rev_post_order_compute ICEs anyway for n != n_basic_blocks_for_function ... The reverse filling is a good reason to keep the assert in the pre_and_rev_post_order_compute variant that now wraps pre_and_rev_post_order_compute_fn. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2013-10-30 Richard Biener rguent...@suse.de PR middle-end/57100 * basic-block.h (pre_and_rev_post_order_compute_fn): New function. * cfganal.c (pre_and_rev_post_order_compute_fn): New function as worker for ... (pre_and_rev_post_order_compute): ... which now wraps it. * graph.c (draw_cfg_nodes_no_loops): Use pre_and_rev_post_order_compute_fn to avoid ICEing and dependence on cfun. Index: gcc/basic-block.h === *** gcc/basic-block.h (revision 204202) --- gcc/basic-block.h (working copy) *** extern void connect_infinite_loops_to_ex *** 795,800 --- 795,802 extern int post_order_compute (int *, bool, bool); extern basic_block dfs_find_deadend (basic_block); extern int inverted_post_order_compute (int *); + extern int pre_and_rev_post_order_compute_fn (struct function *, + int *, int *, bool); extern int pre_and_rev_post_order_compute (int *, int *, bool); extern int dfs_enumerate_from (basic_block, int, bool (*)(const_basic_block, const void *), Index: gcc/cfganal.c === *** gcc/cfganal.c (revision 204202) --- gcc/cfganal.c (working copy) *** inverted_post_order_compute (int *post_o *** 878,897 return post_order_num; } ! /* Compute the depth first search order and store in the array ! PRE_ORDER if nonzero, marking the nodes visited in VISITED. If ! REV_POST_ORDER is nonzero, return the reverse completion number for each ! node. Returns the number of nodes visited. A depth first search ! tries to get as far away from the starting point as quickly as ! possible. ! ! pre_order is a really a preorder numbering of the graph. ! rev_post_order is really a reverse postorder numbering of the graph. ! */ int ! pre_and_rev_post_order_compute (int *pre_order, int *rev_post_order, ! bool include_entry_exit) { edge_iterator *stack; int sp; --- 878,899 return post_order_num; } ! /* Compute the depth first search order of FN and store in the array !PRE_ORDER if nonzero. If REV_POST_ORDER is nonzero, return the !reverse completion number for each node. Returns the number of nodes !visited. A depth first search tries to get as far away from the starting !point as quickly as possible. ! !In case the function has unreachable blocks the number of nodes !visited does not include them. ! !pre_order is a really a preorder numbering of the graph. !rev_post_order is really a reverse postorder numbering of the graph. */ int ! pre_and_rev_post_order_compute_fn (struct function *fn, ! int *pre_order, int *rev_post_order, ! bool include_entry_exit) { edge_iterator *stack; int sp; *** pre_and_rev_post_order_compute (int *pre *** 921,927 bitmap_clear (visited); /* Push the first edge on to the stack. */ ! stack[sp++] = ei_start (ENTRY_BLOCK_PTR-succs); while (sp) { --- 923,929 bitmap_clear (visited); /* Push the first edge on to the stack. */ ! stack[sp++] = ei_start (ENTRY_BLOCK_PTR_FOR_FUNCTION (fn)-succs); while (sp) { *** pre_and_rev_post_order_compute (int *pre *** 935,941 dest = ei_edge (ei)-dest; /* Check if the edge destination has been visited yet. */ ! if (dest != EXIT_BLOCK_PTR ! bitmap_bit_p (visited, dest-index)) { /* Mark that we have visited the destination. */ bitmap_set_bit (visited, dest-index); --- 937,944 dest = ei_edge (ei)-dest; /* Check if the edge destination has been visited yet. */ ! if (dest != EXIT_BLOCK_PTR_FOR_FUNCTION (fn) ! ! bitmap_bit_p (visited, dest-index)) { /* Mark that we have visited the destination. */ bitmap_set_bit (visited, dest-index); *** pre_and_rev_post_order_compute (int *pre *** 956,962 } else { ! if (ei_one_before_end_p (ei) src != ENTRY_BLOCK_PTR
[C++ Patch / RFC] PR 29234
Hi, I'm having a look at this very old parsing issue. We reject things like: struct S { void operator () (); }; void foo () { ( S()() ); } because the parenthesized S()() triggers an hard error from groktypename as called at the end of cp_parser_type_id (called by cp_parser_cast_expression), about a function returning a function. Of course we have to parse it as an unary-expression. I scratched my head for a while, but then I noticed that in cp_parser_cast_expression, a bit later the cp_parser_type_id call at issue, we are *already* using a cp_parser_tokens_start_cast_expression which helps use figuring out whether the tokens following the outer ')' make sense as a cast-expression. Using it a bit earlier to completely avoid calling cp_parser_type_id in the cases at issue, appears to work well, for a - negligeable in my opinion - additional computational cost. Then, cp_parser_unary_expression is actually used. As part of that, cp_parser_postfix_expression looks for a compound-literal, but not in the robust way we are using elsewhere involving a cp_parser_skip_to_closing_parenthesis, but just using tentative parsing with cp_parser_type_id. Thus we are incurring again in the same issue. Parsing the compound-literal in the same way used elesewhere - thus looking for the '{' after the ')', solves the problem. In my personal opinion, again, the computational cost shouldn't be too high, because if we don't find '{' we completely avoid cp_parser_type_id. Is this the kind of approach we want to pursue for this issue? Tested x86_64-linux. Thanks! Paolo. / Index: cp/parser.c === --- cp/parser.c (revision 204202) +++ cp/parser.c (working copy) @@ -5800,31 +5800,45 @@ cp_parser_postfix_expression (cp_parser *parser, b cp_lexer_next_token_is (parser-lexer, CPP_OPEN_PAREN)) { tree initializer = NULL_TREE; - bool saved_in_type_id_in_expr_p; + bool compound_literal_p; cp_parser_parse_tentatively (parser); /* Consume the `('. */ cp_lexer_consume_token (parser-lexer); - /* Parse the type. */ - saved_in_type_id_in_expr_p = parser-in_type_id_in_expr_p; - parser-in_type_id_in_expr_p = true; - type = cp_parser_type_id (parser); - parser-in_type_id_in_expr_p = saved_in_type_id_in_expr_p; - /* Look for the `)'. */ - cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN); + + /* Avoid calling cp_parser_type_id pointlessly, see comment + in cp_parser_cast_expression about c++/29234. */ + cp_lexer_save_tokens (parser-lexer); + + compound_literal_p + = (cp_parser_skip_to_closing_parenthesis (parser, false, false, + /*consume_paren=*/true) + cp_lexer_next_token_is (parser-lexer, CPP_OPEN_BRACE)); + + /* Roll back the tokens we skipped. */ + cp_lexer_rollback_tokens (parser-lexer); + + if (!compound_literal_p) + cp_parser_simulate_error (parser); + else + { + /* Parse the type. */ + bool saved_in_type_id_in_expr_p = parser-in_type_id_in_expr_p; + parser-in_type_id_in_expr_p = true; + type = cp_parser_type_id (parser); + parser-in_type_id_in_expr_p = saved_in_type_id_in_expr_p; + /* Look for the `)'. */ + cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN); + } + /* If things aren't going well, there's no need to keep going. */ if (!cp_parser_error_occurred (parser)) { - if (cp_lexer_next_token_is (parser-lexer, CPP_OPEN_BRACE)) - { - bool non_constant_p; - /* Parse the brace-enclosed initializer list. */ - initializer = cp_parser_braced_list (parser, -non_constant_p); - } - else - cp_parser_simulate_error (parser); + bool non_constant_p; + /* Parse the brace-enclosed initializer list. */ + initializer = cp_parser_braced_list (parser, +non_constant_p); } /* If that worked, we're definitely looking at a compound-literal expression. */ @@ -7559,7 +7573,9 @@ cp_parser_cast_expression (cp_parser *parser, bool { tree type = NULL_TREE; tree expr = NULL_TREE; + bool skip_to_closing_ok_p; bool compound_literal_p; + bool cast_expression_p; const char *saved_message; /* There's no way to know yet whether or not this is a cast.
Re: [PATCH 1/n] Add conditional compare support
Il 18/09/2013 11:45, Zhenqiang Chen ha scritto: +;; Expand conditional compare according to the instruction patterns. +(define_expand conditional_compare + [(set (match_operand 0 s_register_operand ) + (match_operator 1 + [(match_operator:SI 2 arm_comparison_operator + [(match_operand:SI 3 s_register_operand ) +(match_operand:SI 4 arm_add_operand )]) My first reaction to this is that this operand should only match const0_rtx, and operands 2/3 should be the same mode as operand 0 rather than SImode. In particular, perhaps they could be CCmode or BImode on some architectures. But then Richard is right that this would look a lot like the old compare + branch patterns. So either you could do this at expansion time using hooks, like Richard mentioned, or you could do a pattern matching pass that is specific to this pattern and combines sets with MODE_CC destinations across multiple basic blocks. Paolo + (match_operator:SI 5 arm_comparison_operator + [(match_operand:SI 6 s_register_operand ) +(match_operand:SI 7 arm_add_operand )])]))] + TARGET_ARM || TARGET_THUMB2 + { + enum machine_mode mode; + HOST_WIDE_INT cond_or; + rtx par; + enum rtx_code code = GET_CODE (operands[1]); + cond_or = code == AND? DOM_CC_X_AND_Y : DOM_CC_X_OR_Y; + + mode = arm_select_dominance_cc_mode (operands[2], + operands[5], + cond_or); + /* To match and/ior_scc_scc, mode should not be CCmode. */ + if (mode == CCmode) + FAIL; + + operands[2] = gen_rtx_fmt_ee (GET_CODE (operands[2]), +GET_MODE (operands[3]), +operands[3], operands[4]); + operands[5] = gen_rtx_fmt_ee (GET_CODE (operands[5]), +GET_MODE (operands[5]), +operands[6], operands[7]); + operands[1] = gen_rtx_fmt_ee (GET_CODE (operands[1]), SImode, +operands[2], operands[5]); + + /* Generate insn to match and/ior_scc_scc. */ + par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2)); + XVECEXP (par, 0, 0) = gen_rtx_SET (GET_MODE (operands[0]), + operands[0], operands[1]); + XVECEXP (par, 0, 1) = gen_rtx_CLOBBER (CCmode, + gen_rtx_REG (CCmode, CC_REGNUM)); + + emit_insn (par); + + DONE; + }
Re: Aliasing: look through pointer's def stmt
On Wed, 30 Oct 2013, Richard Biener wrote: On Wed, Oct 30, 2013 at 1:09 AM, Marc Glisse marc.gli...@inria.fr wrote: On Tue, 29 Oct 2013, Richard Biener wrote: For the POINTER_PLUS_EXPR offset argument you should use int_cst_value () to access it (it will unconditionally sign-extend) and use host_integerp (..., 0). That leaves the overflow possibility in place (and you should multiply by BITS_PER_UNIT) which we ignore in enough other places similar to this to ignore ... Like this? (passes bootstrap+testsuite on x86_64-linux-gnu) 2013-10-30 Marc Glisse marc.gli...@inria.fr gcc/ * tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Look for a POINTER_PLUS_EXPR in the defining statement. gcc/testsuite/ * gcc.dg/tree-ssa/alias-24.c: New file. -- Marc Glisse Index: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c === --- gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(working copy) @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-optimized } */ + +void f (const char *c, int *i) +{ + *i = 42; + __builtin_memcpy (i + 1, c, sizeof (int)); + if (*i != 42) __builtin_abort(); +} + +extern void keepit (); +void g (const char *c, int *i) +{ + *i = 33; + __builtin_memcpy (i - 1, c, 3 * sizeof (int)); + if (*i != 33) keepit(); +} + +/* { dg-final { scan-tree-dump-not abort optimized } } */ +/* { dg-final { scan-tree-dump keepit optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ + Property changes on: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c ___ Added: svn:keywords ## -0,0 +1 ## +Author Date Id Revision URL \ No newline at end of property Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Index: gcc/tree-ssa-alias.c === --- gcc/tree-ssa-alias.c(revision 204188) +++ gcc/tree-ssa-alias.c(working copy) @@ -567,20 +567,29 @@ void ao_ref_init_from_ptr_and_size (ao_ref *ref, tree ptr, tree size) { HOST_WIDE_INT t1, t2; ref-ref = NULL_TREE; if (TREE_CODE (ptr) == SSA_NAME) { gimple stmt = SSA_NAME_DEF_STMT (ptr); if (gimple_assign_single_p (stmt) gimple_assign_rhs_code (stmt) == ADDR_EXPR) ptr = gimple_assign_rhs1 (stmt); + else if (is_gimple_assign (stmt) + gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR + host_integerp (gimple_assign_rhs2 (stmt), 0) + (t1 = int_cst_value (gimple_assign_rhs2 (stmt))) = 0) No need to restrict this to positive offsets I think. Actually there is. The function documents that it only handles nonnegative offsets, and if we ignore that, the keepit part of the testcase fails because ranges_overlap_p works with unsigned offsets. I can try to change ranges_overlap_p and remove this restriction from ao_ref_init_from_ptr_and_size, but that's more risky so I'd prefer to do that as a separate follow-up patch if that's ok. + { + ao_ref_init_from_ptr_and_size (ref, gimple_assign_rhs1 (stmt), size); Please don't recurse - forwprop combines series of POINTER_PLUS_EXPRs and MEM[ptr, offset] so it shouldn't be necessary. Done. (note that if it isn't necessary, then it doesn't hurt either ;-) + ref-offset += 8 * t1; BITS_PER_UNIT instead of 8. I'd say just have a 0-initialized additional_offset var that you unconditionally add ... I also changed a few other 8s. + return; + } } if (TREE_CODE (ptr) == ADDR_EXPR) ref-base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0), ref-offset, t1, t2); else { ref-base = build2 (MEM_REF, char_type_node, ptr, null_pointer_node); ref-offset = 0; ... here at the end. Here is a new version, same testing, same ChangeLog. -- Marc GlisseIndex: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c === --- gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(working copy) @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-optimized } */ + +void f (const char *c, int *i) +{ + *i = 42; + __builtin_memcpy (i + 1, c, sizeof (int)); + if (*i != 42) __builtin_abort(); +} + +extern void keepit (); +void g (const char *c, int *i) +{ + *i = 33; + __builtin_memcpy (i - 1, c, 3 * sizeof (int)); + if (*i != 33) keepit(); +} + +/* { dg-final { scan-tree-dump-not abort optimized } } */ +/* { dg-final { scan-tree-dump keepit optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ + Property changes on: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c ___ Added: svn:keywords
Re: [wide-int] Update main comment
On 10/30/2013 07:01 AM, Richard Sandiford wrote: Kenneth Zadeck zad...@naturalbridge.com writes: On 10/29/2013 06:37 PM, Richard Sandiford wrote: This patch tries to update the main wide_int comment to reflect the current implementation. - bitsizetype is TImode on x86_64 and others, so I don't think it's necessarily true that all offset_ints are signed. (widest_int are though.) i am wondering if this is too conservative an interpretation.I believe that they are ti mode because that is the next thing after di mode and so they wanted to accommodate the 3 extra bits. Certainly there is no x86 that is able to address more than 64 bits. Right, but my point is that it's a different case from widest_int. It'd be just as valid to do bitsizetype arithmetic using wide_int rather than offset_int, and those wide_ints would have precision 128, just like the offset_ints. And I wouldn't really say that those wide_ints were fundamentally signed in any way. Although the tree layer might know that X upper bits of the bitsizetype are always signs, the tree-wide_int interface treats them in the same way as any other 128-bit type. Maybe I'm just being pedantic, but I think offset_int would only be like widest_int if bitsizetype had precision 67 or whatever. Then we could say that both offset_int and widest_int must be wider than any inputs, meaning that there's at least one leading sign bit. this was of course what mike and i wanted, but we could not really figure out how to pull it off. in particular, we could not find any existing reliable marker in the targets to say what the width of the widest pointer on any implementation. We actually used the number 68 rather than 67 because we assumed 64 for the widest pointer on any existing platform, 3 bits for the bits and 1 bit for the sign. This is related to the way that we have to assert: template int N inline wi::extended_tree N::extended_tree (const_tree t) : m_t (t) { gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) = N); } rather than: template int N inline wi::extended_tree N::extended_tree (const_tree t) : m_t (t) { gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) N); } (which would give slightly better offset_int code, because we could then always use TREE_INT_CST_EXT_NUNITS.) Thanks, Richard
[c++-concepts] Constrained scope bugfix
Partially fixing a bug that caused lookup errors in valid programs. For example: templateInt T, Float U pairT, U::type void f(T, U); // Error, no such pair When entering a template scope, we tried to match the template to one having the same constraints. Obviously pair doesn't have Int and Float constraints, and it probably doesn't have a partial specialization with those constraints either. I relaxed the fixup_template_type function so that it would just return the looked-up type without emitting a diagnostic. This fix makes the following a legal, however: templatetypename T struct S { void f(); } templateInt T void ST::f() { } // Should be an error The right solution seems to be to diagnose the error only when defining an out-of-class member by verifying that each template scope in the qualified name matches a declaration with the same constraints. 2013-10-30 Andrew Sutton andrew.n.sut...@gmail.com * gcc/cp/semantics.c (fixup_template_type): Don't emit errors when no templates can be found with matching constraints. Index: gcc/cp/semantics.c === --- gcc/cp/semantics.c (revision 203626) +++ gcc/cp/semantics.c (working copy) @@ -2847,8 +2847,35 @@ finish_template_decl (tree parms) // Returns the template type of the class scope being entered. If we're // entering a constrained class scope. TYPE is the class template -// scope being entered. If TYPE is not a class-type (e.g. a typename type), -// then no fixup is needed. +// scope being entered and we may need to match the intended type with +// a constrained specialization. For example: +// +//templateObject T +// struct S { void f(); }; #1 +// +//templateObject T +// void ST::f() { } #2 +// +// We check, in #2, that ST refers precisely to the type declared by +// #1 (i.e., that the constraints match). Note that the following should +// be an error since there is no specialization of ST that is +// unconstrained, but this is not diagnosed here. +// +//templatetypename T +// void ST::f() { } +// +// We cannot diagnose this problem here since this function also matches +// qualified template names that are not part of a definition. For example: +// +//templateIntegral T, Floating_point U +// typename pairT, U::first_type void f(T, U); +// +// Here, it is unlikely that there is a partial specialization of +// pair constrained for for Integral and Floating_point arguments. +// +// The general rule is: if a constrained specialization with matching +// constraints is found return that type. Alos note that if TYPE is not a +// class-type (e.g. a typename type), then no fixup is needed. static tree fixup_template_type (tree type) { @@ -2866,14 +2893,8 @@ fixup_template_type (tree type) return type; tree cur_constr = TEMPLATE_PARMS_CONSTRAINTS (parms); - // Do the constraints match those of the most general template? - // If the constraints are NULL_TREE, this will match the most general - // template iff it is unconstrained. + // Search for a specialization whose constraints match. tree tmpl = CLASSTYPE_TI_TEMPLATE (type); - if (equivalent_constraints (cur_constr, DECL_CONSTRAINTS (tmpl))) -return type; - - // Can we find a specialization that matches? tree specs = DECL_TEMPLATE_SPECIALIZATIONS (tmpl); while (specs) { @@ -2883,10 +2904,8 @@ fixup_template_type (tree type) specs = TREE_CHAIN (specs); } - // Emit an error, but return the type to allow processing to continue. - // TODO: We should emit candidates since we've just scanned the - // list of template constraints. - error (type %qT does not match any declarations, type); + // If no specialization matches, then must return the type + // previously found. return type; } @@ -2904,7 +2923,7 @@ finish_template_type (tree name, tree ar type = lookup_template_class (name, args, NULL_TREE, NULL_TREE, entering_scope, tf_warning_or_error | tf_user); - + // If entering a scope, correct the lookup to account for constraints. if (entering_scope) type = fixup_template_type (type);
Re: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C (and C++)
On Mon, Oct 28, 2013 at 09:01:45PM +, Iyer, Balaji V wrote: Thanks! I will extract and check in the Cilk_spawn and _Cilk_sync for C work. This broke bootstrap on i686-linux, fixed thusly, committed as obvious: 2013-10-30 Jakub Jelinek ja...@redhat.com * cilk.c (create_cilk_helper_decl): Use HOST_WIDE_INT_PRINT_DEC. --- gcc/c-family/cilk.c.jj 2013-10-30 13:53:52.0 +0100 +++ gcc/c-family/cilk.c 2013-10-30 14:44:49.358912539 +0100 @@ -287,9 +287,9 @@ create_cilk_helper_decl (struct wrapper_ { char name[20]; if (wd-type == CILK_BLOCK_FOR) -sprintf (name, _cilk_for_%ld, cilk_wrapper_count++); +sprintf (name, _cilk_for_ HOST_WIDE_INT_PRINT_DEC, cilk_wrapper_count++); else if (wd-type == CILK_BLOCK_SPAWN) -sprintf (name, _cilk_spn_%ld, cilk_wrapper_count++); +sprintf (name, _cilk_spn_ HOST_WIDE_INT_PRINT_DEC, cilk_wrapper_count++); else gcc_unreachable (); Jakub
[c++-concepts] Diagnostics patch
Applying a patch from Ville that adds diagnostics for the concept specifier. Thanks Ville! 2013-10-30 Ville Voutilainen ville.voutilai...@gmail.com * gcc/cp/decl.c (grokdeclarator): Reject concept keyword in typedefs, function parameters, data members, non-static member functions and variables. Allow static member functions to be concepts. Andrew Sutton Index: gcc/cp/decl.c === --- gcc/cp/decl.c (revision 204092) +++ gcc/cp/decl.c (working copy) @@ -9074,6 +9074,12 @@ if (name == NULL) name = decl_context == PARM ? parameter : type name; + if (concept_p typedef_p) +{ + error (%concept% cannot appear in a typedef declaration); + return error_mark_node; +} + if (constexpr_p typedef_p) { error (%constexpr% cannot appear in a typedef declaration); @@ -9387,9 +9393,12 @@ || thread_p) error (storage class specifiers invalid in parameter declarations); + /* Function parameters cannot be concept. */ + if (concept_p) + error (a parameter cannot be declared %concept%); /* Function parameters cannot be constexpr. If we saw one, moan and pretend it wasn't there. */ - if (constexpr_p) + else if (constexpr_p) { error (a parameter cannot be declared %constexpr%); constexpr_p = 0; @@ -10619,6 +10628,11 @@ uqname, ctype); return error_mark_node; } +if (concept_p) + { +error (a destructor cannot be %concept%); +return error_mark_node; + } if (constexpr_p) { error (a destructor cannot be %constexpr%); @@ -10632,6 +10646,12 @@ id_declarator-u.id.unqualified_name); return error_mark_node; } + if (sfk == sfk_constructor) +if (concept_p) + { +error (a constructor cannot be %concept%); +return error_mark_node; + } /* Tell grokfndecl if it needs to set TREE_PUBLIC on the node. */ function_context = (ctype != NULL_TREE) ? @@ -10645,7 +10665,7 @@ unqualified_id, virtualp, flags, memfn_quals, rqual, raises, friendp ? -1 : 0, friendp, publicp, - inlinep | (2 * constexpr_p), + inlinep | (2 * constexpr_p) | (4 * concept_p), sfk, funcdef_flag, template_count, in_namespace, attrlist, declarator-id_loc); @@ -10739,8 +10759,12 @@ if (declspecs-gnu_thread_keyword_p) DECL_GNU_TLS_P (decl) = true; } - - if (constexpr_p !initialized) + if (concept_p) + // TODO: This needs to be revisited once variable + // templates are supported + error (static data member %qE declared %concept%, + unqualified_id); + else if (constexpr_p !initialized) { error (constexpr static data member %qD must have an initializer, decl); @@ -10749,7 +10773,10 @@ } else { -if (constexpr_p) + if (concept_p) + error (non-static data member %qE declared %concept%, + unqualified_id); +else if (constexpr_p) { error (non-static data member %qE declared %constexpr%, unqualified_id); @@ -10897,6 +10924,15 @@ { /* It's a variable. */ + // TODO: This needs to be revisited once variable + // templates are supported + if (concept_p) + { + error (variable %qE declared %concept%, + unqualified_id); + return error_mark_node; + } + /* An uninitialized decl with `extern' is a reference. */ decl = grokvardecl (type, unqualified_id, declspecs, Index: gcc/testsuite/g++.dg/concepts/decl-diagnose.C === --- gcc/testsuite/g++.dg/concepts/decl-diagnose.C (revision 0) +++ gcc/testsuite/g++.dg/concepts/decl-diagnose.C (working copy) @@ -0,0 +1,20 @@ +// { dg-options -std=c++11 } +typedef concept int CINT; // { dg-error 'concept' cannot appear in a typedef declaration } + +void f(concept int); // { dg-error a parameter cannot be declared 'concept' } + +concept int f2(); // { dg-error result must be bool } +concept bool f3(); + +struct X +{ + concept int f4(); // { dg-error result must be bool|declared with function parameters } + concept bool f5(); // { dg-error declared with function parameters } + static concept bool f6(); + static concept bool x; // { dg-error declared 'concept' } + concept int x2; // { dg-error declared 'concept' } + concept ~X(); // { dg-error a destructor cannot be 'concept' } + concept X(); // { dg-error a constructor cannot be 'concept' } +}; + +concept bool X2; // { dg-error declared 'concept' }
Re: Aliasing: look through pointer's def stmt
On Wed, Oct 30, 2013 at 1:43 PM, Marc Glisse marc.gli...@inria.fr wrote: On Wed, 30 Oct 2013, Richard Biener wrote: On Wed, Oct 30, 2013 at 1:09 AM, Marc Glisse marc.gli...@inria.fr wrote: On Tue, 29 Oct 2013, Richard Biener wrote: For the POINTER_PLUS_EXPR offset argument you should use int_cst_value () to access it (it will unconditionally sign-extend) and use host_integerp (..., 0). That leaves the overflow possibility in place (and you should multiply by BITS_PER_UNIT) which we ignore in enough other places similar to this to ignore ... Like this? (passes bootstrap+testsuite on x86_64-linux-gnu) 2013-10-30 Marc Glisse marc.gli...@inria.fr gcc/ * tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Look for a POINTER_PLUS_EXPR in the defining statement. gcc/testsuite/ * gcc.dg/tree-ssa/alias-24.c: New file. -- Marc Glisse Index: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c === --- gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(working copy) @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-optimized } */ + +void f (const char *c, int *i) +{ + *i = 42; + __builtin_memcpy (i + 1, c, sizeof (int)); + if (*i != 42) __builtin_abort(); +} + +extern void keepit (); +void g (const char *c, int *i) +{ + *i = 33; + __builtin_memcpy (i - 1, c, 3 * sizeof (int)); + if (*i != 33) keepit(); +} + +/* { dg-final { scan-tree-dump-not abort optimized } } */ +/* { dg-final { scan-tree-dump keepit optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ + Property changes on: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c ___ Added: svn:keywords ## -0,0 +1 ## +Author Date Id Revision URL \ No newline at end of property Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Index: gcc/tree-ssa-alias.c === --- gcc/tree-ssa-alias.c(revision 204188) +++ gcc/tree-ssa-alias.c(working copy) @@ -567,20 +567,29 @@ void ao_ref_init_from_ptr_and_size (ao_ref *ref, tree ptr, tree size) { HOST_WIDE_INT t1, t2; ref-ref = NULL_TREE; if (TREE_CODE (ptr) == SSA_NAME) { gimple stmt = SSA_NAME_DEF_STMT (ptr); if (gimple_assign_single_p (stmt) gimple_assign_rhs_code (stmt) == ADDR_EXPR) ptr = gimple_assign_rhs1 (stmt); + else if (is_gimple_assign (stmt) + gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR + host_integerp (gimple_assign_rhs2 (stmt), 0) + (t1 = int_cst_value (gimple_assign_rhs2 (stmt))) = 0) No need to restrict this to positive offsets I think. Actually there is. The function documents that it only handles nonnegative offsets, and if we ignore that, the keepit part of the testcase fails because ranges_overlap_p works with unsigned offsets. I can try to change ranges_overlap_p and remove this restriction from ao_ref_init_from_ptr_and_size, but that's more risky so I'd prefer to do that as a separate follow-up patch if that's ok. + { + ao_ref_init_from_ptr_and_size (ref, gimple_assign_rhs1 (stmt), size); Please don't recurse - forwprop combines series of POINTER_PLUS_EXPRs and MEM[ptr, offset] so it shouldn't be necessary. Done. (note that if it isn't necessary, then it doesn't hurt either ;-) + ref-offset += 8 * t1; BITS_PER_UNIT instead of 8. I'd say just have a 0-initialized additional_offset var that you unconditionally add ... I also changed a few other 8s. + return; + } } if (TREE_CODE (ptr) == ADDR_EXPR) ref-base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0), ref-offset, t1, t2); else { ref-base = build2 (MEM_REF, char_type_node, ptr, null_pointer_node); ref-offset = 0; ... here at the end. Here is a new version, same testing, same ChangeLog. Ok. Thanks, Richard. -- Marc Glisse Index: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c === --- gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(working copy) @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-optimized } */ + +void f (const char *c, int *i) +{ + *i = 42; + __builtin_memcpy (i + 1, c, sizeof (int)); + if (*i != 42) __builtin_abort(); +} + +extern void keepit (); +void g (const char *c, int *i) +{ + *i = 33; + __builtin_memcpy (i - 1, c, 3 * sizeof (int)); + if (*i != 33) keepit(); +} + +/* { dg-final { scan-tree-dump-not abort optimized } } */ +/* { dg-final { scan-tree-dump keepit
Re: [PATCH GCC]Simplify address expression in IVOPT
On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng bin.ch...@arm.com wrote: Hi, I noticed that IVOPT generates complex address expressions like below for iv base. arr_base[0].y arr[0] MEM[p+o] It's even worse for targets support auto-increment addressing mode because IVOPT adjusts such base expression with +/- step, then creates below: arr_base[0].y +/- step arr[0] +/- step MEM[p+o] +/- step It has two disadvantages: 1) Cost computation in IVOPT can't handle complex address expression and general returns spill_cost for it, which is bad since address iv is important to IVOPT. 2) IVOPT creates duplicate candidates for IVs which have same value in different forms, for example, two candidates are generated with each for a[0] and a. Again, it's even worse for auto-increment addressing mode. This patch fixes the issue by simplifying address expression at the entry of allocating IV struct. Maybe the simplification can be put in various fold* functions but I think it might be better in this way, because: 1) fold* functions are used from front-end to various tree optimizations, the simplified address expressions may not be what each optimizer wanted. Think about parallelism related passes, they might want the array index information kept for further analysis. 2) In some way, the simplification is conflict with current implementation of fold* function. Take fold_binary_loc as an example, it tries to simplify a[i1] +p c* i2 into a[i1+i2]. Of course we can simplify in this way for IVOPT too, but that will cause new problems like: a) we have to add code in IVOPT to cope with complex ARRAY_REF which is the exactly thing we want to avoid; b) the simplification can't always be done because of the sign/unsigned offset problem (especially for auto-increment addressing mode). 3) There are many entry point for fold* functions, the change will be non-trivial. 4) The simplification is only done in alloc_iv for true (not duplicate ones) iv struct, the number of such iv should be moderate. With these points, I think it might be a win to do the simplification in IVOPT and create a kind of sand box to let IVOPT play. Any suggestions? Bootstrap and tested on x86/x86_64/arm. The patch causes three cases failed on some target, but all of them are false alarm, which can be resolved by refining test cases to check more accurate information. Is it OK? Hmm. I think you want what get_inner_reference_aff does, using the return value of get_inner_reference as starting point for determine_base_object. And you don't want to restrict yourselves so much on what exprs to process, but only exclude DECL_Ps. Just amend get_inner_reference_aff to return the tree base object. Note that this isn't really simplifying but rather lowering all addresses. The other changes in this patch are unrelated, right? Thanks, Richard. Thanks. bin gcc/testsuite/ChangeLog 2013-10-29 Bin Cheng bin.ch...@arm.com * gcc.dg/tree-ssa/loop-2.c: Refine check condition. * gcc.dg/tree-ssa/ivopt_infer_2.c: Ditto. * gcc.dg/tree-ssa/ivopt_mult_3.c: Ditto. 2013-10-29 Bin Cheng bin.ch...@arm.com * tree-ssa-loop-ivopts.c (alloc_iv): Simplify base expression. (get_shiftadd_cost): Check equality using operand_equal_p. (force_expr_to_var_cost): Refactor the code. Handle type conversion. (split_address_cost): Call force_expr_to_var_cost.
Re: [PATCH][ubsan] Add VLA bound instrumentation
On Fri, Oct 25, 2013 at 03:04:41PM -0400, Jason Merrill wrote: I'm sorry, you want me to move the c++1y VLA check into compute_array_index_type, or just do the ubsan instrumentation in there? Thanks, Both. Unfortunately, I'm having quite a lot of trouble with side-effects. :( For e.g. int x = 1; int a[++x]; with the following hunk --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -8394,6 +8382,18 @@ compute_array_index_type (tree name, tree size, tsubst_flags_t com if (found) itype = variable_size (fold (newitype)); } + + if ((flag_sanitize SANITIZE_VLA) + !processing_template_decl + /* From C++1y onwards, we throw an exception on a negative +length size of an array; see above */ + cxx_dialect cxx1y) + { + tree x = cp_save_expr (size); + x = build2 (COMPOUND_EXPR, TREE_TYPE (x), + ubsan_instrument_vla (input_location, x), x); + finish_expr_stmt (x); + } } /* Make sure that there was no overflow when creating to a signed index type. (For example, on a 32-bit machine, an array with we generate int x = 1; int a[0:(sizetype) SAVE_EXPR D.2143]; cleanup_point int x = 1;; cleanup_point Unknown tree: expr_stmt if (SAVE_EXPR ++x = 0) { __builtin___ubsan_handle_vla_bound_not_positive (*.Lubsan_data0, (unsigned long) SAVE_EXPR ++x); } else { 0 }, (void) SAVE_EXPR ++x; ; ssizetype D.2143; cleanup_point Unknown tree: expr_stmt (void) (D.2143 = (ssizetype) ++x + -1) ; cleanup_point int a[0:(sizetype) SAVE_EXPR D.2143];; that is, x is incremented twice and that is wrong. Is it possible to tell x has already been evaluated, don't evaluate it again so that the x isn't incremented in the cleanup_point? Or, would you, please, have some other advice? I've been looking into this for quite some time now, but haven't been able to come up with anything better than moving the checks back to create_array_type_for_decl, where it all started ;). Marek
patch to fix PR58784 (ARM LRA crash)
The following patch fixes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58784 LRA has an old check of legitimate addresses. It was written before a newer address decomposition code which makes more correct checks of addresses. So I removed the old check. Committed as rev. 204215. 2013-10-30 Vladimir Makarov vmaka...@redhat.com PR target/58784 * lra.c (check_rtl): Remove address check before LRA work. 2013-10-30 Vladimir Makarov vmaka...@redhat.com PR target/58784 * gcc.target/arm/pr58784.c: New. Index: lra.c === --- lra.c (revision 204131) +++ lra.c (working copy) @@ -2017,10 +2017,8 @@ static void check_rtl (bool final_p) { - int i; basic_block bb; rtx insn; - lra_insn_recog_data_t id; lra_assert (! final_p || reload_completed); FOR_EACH_BB (bb) @@ -2036,31 +2034,13 @@ lra_assert (constrain_operands (1)); continue; } + /* LRA code is based on assumption that all addresses can be + correctly decomposed. LRA can generate reloads for + decomposable addresses. The decomposition code checks the + correctness of the addresses. So we don't need to check + the addresses here. */ if (insn_invalid_p (insn, false)) fatal_insn_not_found (insn); - if (asm_noperands (PATTERN (insn)) = 0) - continue; - id = lra_get_insn_recog_data (insn); - /* The code is based on assumption that all addresses in - regular instruction are legitimate before LRA. The code in - lra-constraints.c is based on assumption that there is no - subreg of memory as an insn operand. */ - for (i = 0; i id-insn_static_data-n_operands; i++) - { - rtx op = *id-operand_loc[i]; - - if (MEM_P (op) -(GET_MODE (op) != BLKmode - || GET_CODE (XEXP (op, 0)) != SCRATCH) -! memory_address_p (GET_MODE (op), XEXP (op, 0)) - /* Some ports don't recognize the following addresses - as legitimate. Although they are legitimate if - they satisfies the constraints and will be checked - by insn constraints which we ignore here. */ -GET_CODE (XEXP (op, 0)) != UNSPEC -GET_RTX_CLASS (GET_CODE (XEXP (op, 0))) != RTX_AUTOINC) - fatal_insn_not_found (insn); - } } } #endif /* #ifdef ENABLE_CHECKING */ Index: testsuite/gcc.target/arm/pr58784.c === --- testsuite/gcc.target/arm/pr58784.c (revision 0) +++ testsuite/gcc.target/arm/pr58784.c (working copy) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options -march=armv7-a -mfloat-abi=hard -mfpu=neon -marm -O2 } */ + +typedef struct __attribute__ ((__packed__)) +{ +char valueField[2]; +} ptp_tlv_t; +typedef struct __attribute__ ((__packed__)) +{ +char stepsRemoved; +ptp_tlv_t tlv[1]; +} ptp_message_announce_t; +int ptplib_send_announce(int sequenceId, int i) +{ +ptp_message_announce_t tx_packet; +((long long *)tx_packet.tlv[0].valueField)[sequenceId] = i; +f(tx_packet); +}
RE: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp
Hello Everyone, What I ideally wanted to do with my testsuite files was that I want all the Cilk keywords test to compile no matter what the architecture is, but it should only run in certain architectures where the runtime is enabled (this is known statically and thus the testsuite doesn't have to do anything to figure it out.). Can someone please tell me how do I do this? Thanks, Balaji V. Iyer. -Original Message- From: Tobias Burnus [mailto:bur...@net-b.de] Sent: Wednesday, October 30, 2013 6:35 AM To: gcc patches; Iyer, Balaji V Subject: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk- plus/cilk-plus.exp Without that patch, which I have copied from asan-dg.exp, I get tons of failures because ld cannot find libcilkrts. OK for committal? Tobias
RFA: patch to fix PR58785 (an ARM LRA crash)
The following patch fixes: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58785 LRA chooses constraint 'm' for const_int operand. It means that the const_int should be placed in memory but it does not happen as preferred reload class hook returns LO_REGS for class NO_REGS which is result of LRA choosing 'm'. I don't know why reload pass needs such value but it should be return NO_REGS IMHO as it results in much less reload insns. Is this patch ok to commit to the trunk? 2013-10-30 Vladimir Makarov vmaka...@redhat.com PR target/58785 * config/arm/arm.c (arm_preferred_reload_class): Don't return LO_REGS for NO_REGS for LRA. 2013-10-30 Vladimir Makarov vmaka...@redhat.com PR target/58785 * gcc.target/arm/pr58785.c: New. Index: config/arm/arm.c === --- config/arm/arm.c(revision 204213) +++ config/arm/arm.c(working copy) @@ -6884,7 +6884,7 @@ arm_preferred_reload_class (rtx x ATTRIB { if (rclass == GENERAL_REGS || rclass == HI_REGS - || rclass == NO_REGS + || (! lra_in_progress rclass == NO_REGS) || rclass == STACK_REG) return LO_REGS; else Index: testsuite/gcc.target/arm/pr58785.c === --- testsuite/gcc.target/arm/pr58785.c (revision 0) +++ testsuite/gcc.target/arm/pr58785.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -mthumb } */ + +typedef _Fract HQtype __attribute__ ((mode (HQ))); +extern void *memcpy (void *,const void *,int); + +HQtype f() +{ + HQtype c; + int z = 0xfada; + memcpy (c, z, 2); + return c; +}
[patch] fix libstdc++/58912
This isn't a regression but is tiny and removes an unnecessary pessimisation, so I'm committing it to the 4.7 and 4.8 branches. The problem doesn't exist on the trunk. Tested x86_64-linux. 2013-10-30 Chris Studholme c...@cs.utoronto.ca PR libstdc++/58912 * include/bits/shared_ptr_base.h (_Sp_counted_ptr_inplace): Remove unnecessary initialization of storage buffer. --- a/libstdc++-v3/include/bits/shared_ptr_base.h +++ b/libstdc++-v3/include/bits/shared_ptr_base.h @@ -394,7 +394,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION public: templatetypename... _Args _Sp_counted_ptr_inplace(_Alloc __a, _Args... __args) - : _M_impl(__a), _M_storage() + : _M_impl(__a) { _M_impl._M_ptr = static_cast_Tp*(static_castvoid*(_M_storage)); // _GLIBCXX_RESOLVE_LIB_DEFECTS
Re: [PATCH] Do not append *INTERNAL* to the decl name
On 10/29/2013 01:37 PM, Dehao Chen wrote: If we're actually emitting the name now, we need to give it a name different from the complete constructor. I suppose it makes sense to go with C4/D4 as in the decloning patch, Shall we do it in a separate patch? And I suppose binutils also need to be updated for C4/D4? In the same patch, please. And yes, the demangler will need to be updated. Jason
PATCH to use -Wno-format during stage1
I find -Wformat warnings about unknown % codes when building with an older GCC to be mildly annoying noise; this patch avoids them by passing -Wno-format during stage 1. Tested x86_64-pc-linux-gnu. Is this OK for trunk? Do you have another theory of how this should work? commit c40b06619fc9ef74e4d4d8b299a6c77c6fb63df5 Author: Jason Merrill ja...@redhat.com Date: Mon Oct 28 16:45:05 2013 -0400 / * Makefile.tpl (STAGE1_CONFIGURE_FLAGS): Pass --disable-build-format-warnings. gcc/ * configure.ac (loose_warn): Add -Wno-format if --disable-build-format-warnings. diff --git a/Makefile.in b/Makefile.in index 572b3d0..e0ba784 100644 --- a/Makefile.in +++ b/Makefile.in @@ -498,8 +498,10 @@ STAGE1_LANGUAGES = @stage1_languages@ # the last argument when conflicting --enable arguments are passed. # * Likewise, we force-disable coverage flags, since the installed # compiler probably has never heard of them. +# * We also disable -Wformat, since older GCCs don't understand newer %s. STAGE1_CONFIGURE_FLAGS = --disable-intermodule $(STAGE1_CHECKING) \ - --disable-coverage --enable-languages=$(STAGE1_LANGUAGES) + --disable-coverage --enable-languages=$(STAGE1_LANGUAGES) \ + --disable-build-format-warnings STAGEprofile_CFLAGS = $(STAGE2_CFLAGS) -fprofile-generate STAGEprofile_TFLAGS = $(STAGE2_TFLAGS) diff --git a/Makefile.tpl b/Makefile.tpl index 3e187e1..65d070b 100644 --- a/Makefile.tpl +++ b/Makefile.tpl @@ -451,8 +451,10 @@ STAGE1_LANGUAGES = @stage1_languages@ # the last argument when conflicting --enable arguments are passed. # * Likewise, we force-disable coverage flags, since the installed # compiler probably has never heard of them. +# * We also disable -Wformat, since older GCCs don't understand newer %s. STAGE1_CONFIGURE_FLAGS = --disable-intermodule $(STAGE1_CHECKING) \ - --disable-coverage --enable-languages=$(STAGE1_LANGUAGES) + --disable-coverage --enable-languages=$(STAGE1_LANGUAGES) \ + --disable-build-format-warnings STAGEprofile_CFLAGS = $(STAGE2_CFLAGS) -fprofile-generate STAGEprofile_TFLAGS = $(STAGE2_TFLAGS) diff --git a/gcc/configure b/gcc/configure index 1e7bcb6..ea91906 100755 --- a/gcc/configure +++ b/gcc/configure @@ -875,6 +875,7 @@ with_demangler_in_ld with_gnu_as with_as enable_largefile +enable_build_format_warnings enable_werror_always enable_checking enable_coverage @@ -1569,6 +1570,8 @@ Optional Features: for creating source tarballs for users without texinfo bison or flex --disable-largefile omit support for large files + --disable-build-format-warnings + don't use -Wformat while building GCC --enable-werror-always enable -Werror despite compiler version --enable-checking[=LIST] enable expensive run-time checks. With LIST, enable @@ -6270,9 +6273,22 @@ fi # * C++11 narrowing conversions in { } # So, we only use -pedantic if we can disable those warnings. +# In stage 1, disable -Wformat warnings from old GCCs about new % codes +# Check whether --enable-build-format-warnings was given. +if test ${enable_build_format_warnings+set} = set; then : + enableval=$enable_build_format_warnings; +else + enable_build_format_warnings=yes +fi + +if test $enable_build_format_warnings = no; then : + wf_opt=-Wno-format +else + wf_opt= +fi loose_warn= save_CFLAGS=$CFLAGS -for real_option in -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual; do +for real_option in -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual $wf_opt; do # Do the check with the no- prefix removed since gcc silently # accepts any -Wno-* option on purpose case $real_option in @@ -17897,7 +17913,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat conftest.$ac_ext _LT_EOF -#line 17900 configure +#line 17916 configure #include confdefs.h #if HAVE_DLFCN_H @@ -18003,7 +18019,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat conftest.$ac_ext _LT_EOF -#line 18006 configure +#line 18022 configure #include confdefs.h #if HAVE_DLFCN_H diff --git a/gcc/configure.ac b/gcc/configure.ac index 5e686db..3d3b26b 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -326,8 +326,14 @@ GCC_STDINT_TYPES # * C++11 narrowing conversions in { } # So, we only use -pedantic if we can disable those warnings. +# In stage 1, disable -Wformat warnings from old GCCs about new % codes +AC_ARG_ENABLE(build-format-warnings, + AS_HELP_STRING([--disable-build-format-warnings],[don't use -Wformat while building GCC]), + [],[enable_build_format_warnings=yes]) +AS_IF([test $enable_build_format_warnings = no], + [wf_opt=-Wno-format],[wf_opt=]) ACX_PROG_CC_WARNING_OPTS( - m4_quote(m4_do([-W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual])), [loose_warn]) + m4_quote(m4_do([-W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual $wf_opt])),
Re: [PATCH][ubsan] Add VLA bound instrumentation
On 10/30/2013 10:52 AM, Marek Polacek wrote: + if ((flag_sanitize SANITIZE_VLA) + !processing_template_decl You don't need to check processing_template_decl; the template case was already handled above. + tree x = cp_save_expr (size); + x = build2 (COMPOUND_EXPR, TREE_TYPE (x), + ubsan_instrument_vla (input_location, x), x); + finish_expr_stmt (x); Saving 'size' here doesn't help since it's already been used above. Could you use itype instead of size here? Jason
Re: [RFA][PATCH] Minor fix to aliasing machinery
On 10/30/13 03:34, Richard Biener wrote: * tree-ssa-alias.c (stmt_kills_ref_p_1): Handle case where ao_ref_base returns a MEM_REF. * gcc.dg/tree-ssa/alias-26.c: New test. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/alias-26.c b/gcc/testsuite/gcc.dg/tree-ssa/alias-26.c new file mode 100644 index 000..b5625b8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/alias-26.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options -O1 -fdump-tree-optimized } */ + +void f (long *p) { + *p = 42; + p[4] = 42; + __builtin_memset (p, 0, 100); +} + +/* { dg-final { scan-tree-dump-not = 42 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ + diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c index 4db83bd..5120e72 100644 --- a/gcc/tree-ssa-alias.c +++ b/gcc/tree-ssa-alias.c @@ -2079,6 +2079,7 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref) tree dest = gimple_call_arg (stmt, 0); tree len = gimple_call_arg (stmt, 2); tree base = NULL_TREE; + tree ref_base; HOST_WIDE_INT offset = 0; if (!host_integerp (len, 0)) return false; @@ -2087,8 +2088,11 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref) offset); else if (TREE_CODE (dest) == SSA_NAME) base = dest; + ref_base = ao_ref_base (ref); if (base - base == ao_ref_base (ref)) + ((TREE_CODE (ref_base) == MEM_REF + base == TREE_OPERAND (ref_base, 0)) That's not sufficient - ref_base may have an offset, so for correctness you have to check that integer_zerop (TREE_OPERAND (ref_base, 0)). But this now looks convoluted and somewhat backward, and still does not catch all cases (including the def-stmt lookup recently added to ao_ref_from_ptr_and_size). So how do you want to proceed? I'm not really up for burning through this code right now and trying to sort out how it ought to work. Perhaps checkin the test (xfailed) and wait for someone with the interest and time to push this through to completion? jeff
Re: [PATCH v2 3/6] Split symtab_node declarations onto multiple lines
On Tue, 2013-09-10 at 15:36 +0200, Jan Hubicka wrote: Amongst other things, the rename_symtab.py script converts symtab_node to symtab_node *. This will lead to broken code on declarations that declare more than one variable (only the first would get a *), so split up such declarations. gcc/ * cgraphunit.c (analyze_functions): Split symtab_node declarations onto multiple lines to make things easier for rename_symtab.py. * symtab.c (symtab_dissolve_same_comdat_group_list): Likewise. (symtab_semantically_equivalent_p): Likewise. gcc/lto * lto-symtab.c (lto_symtab_merge_decls_2): Split symtab_node declarations onto multiple lines to make things easier for rename_symtab.py. (lto_symtab_merge_decls_1): Likewise. (lto_symtab_merge_symbols_1): Likewise. OK Thanks; committed to trunk as r204216.
Re: [PATCH] Keep REG_INC note in subreg2 pass
On 10/30/13 00:09, Zhenqiang Chen wrote: On 30 October 2013 02:47, Jeff Law l...@redhat.com wrote: On 10/24/13 02:20, Zhenqiang Chen wrote: Hi, REG_INC note is lost in subreg2 pass when resolve_simple_move, which might lead to wrong dependence for ira. e.g. In function validate_equiv_mem of ira.c, it checks REG_INC note: for (note = REG_NOTES (insn); note; note = XEXP (note, 1)) if ((REG_NOTE_KIND (note) == REG_INC || REG_NOTE_KIND (note) == REG_DEAD) REG_P (XEXP (note, 0)) reg_overlap_mentioned_p (XEXP (note, 0), memref)) return 0; Without REG_INC note, validate_equiv_mem will return a wrong result. Referhttps://bugs.launchpad.net/gcc-linaro/+bug/1243022 for more detail about a real case in kernel. Bootstrap and no make check regression on X86-64 and ARM. Is it OK for trunk and 4.8? Thanks! -Zhenqiang ChangeLog: 2013-10-24 Zhenqiang Chenzhenqiang.c...@linaro.org * lower-subreg.c (resolve_simple_move): Copy REG_INC note. testsuite/ChangeLog: 2013-10-24 Zhenqiang Chenzhenqiang.c...@linaro.org * gcc.target/arm/lp1243022.c: New test. This clearly handles adding a note when the destination is a MEM with a side effect. What about cases where the side effect is associated with a load from memory rather than a store to memory? Yes. We should handle load from memory. lp1243022.patch diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c index 57b4b3c..e710fa5 100644 --- a/gcc/lower-subreg.c +++ b/gcc/lower-subreg.c @@ -1056,6 +1056,22 @@ resolve_simple_move (rtx set, rtx insn) mdest = simplify_gen_subreg (orig_mode, dest, GET_MODE (dest), 0); minsn = emit_move_insn (real_dest, mdest); +#ifdef AUTO_INC_DEC + /* Copy the REG_INC notes. */ + if (MEM_P (real_dest) !(resolve_reg_p (real_dest) +|| resolve_subreg_p (real_dest))) + { + rtx note = find_reg_note (insn, REG_INC, NULL_RTX); + if (note) + { + if (!REG_NOTES (minsn)) + REG_NOTES (minsn) = note; + else + add_reg_note (minsn, REG_INC, note); + } + } +#endif If MINSN does not have any notes, then this results in MINSN and INSN sharing the note. Note carefully that notes are chained (see implementation of add_reg_note). Thus the sharing would result in MINSN and INSN actually sharing a chain of notes. I'm pretty sure that's not what you intended. I think you need to always use add_reg_note. Yes. I should use add_reg_note. Here is the updated patch: diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c index ebf364f..16dfa62 100644 --- a/gcc/lower-subreg.c +++ b/gcc/lower-subreg.c @@ -967,7 +967,20 @@ resolve_simple_move (rtx set, rtx insn) rtx reg; reg = gen_reg_rtx (orig_mode); + +#ifdef AUTO_INC_DEC + { + rtx move = emit_move_insn (reg, src); + if (MEM_P (src)) + { + rtx note = find_reg_note (insn, REG_INC, NULL_RTX); + if (note) + add_reg_note (move, REG_INC, XEXP (note, 0)); + } + } +#else emit_move_insn (reg, src); +#endif src = reg; } @@ -1057,6 +1070,16 @@ resolve_simple_move (rtx set, rtx insn) mdest = simplify_gen_subreg (orig_mode, dest, GET_MODE (dest), 0); minsn = emit_move_insn (real_dest, mdest); +#ifdef AUTO_INC_DEC + if (MEM_P (real_dest) !(resolve_reg_p (real_dest) +|| resolve_subreg_p (real_dest))) Formatting nit. This should be formatted as if (MEM_P (real_dest) !(resolve_reg_p (real_dest) || resolve_subreg_p (real_dest))) If that results in too long of a line, then it should wrap like this: if (MEM_P (real_dest) !(resolve_reg_p (real_dest) || resolve_subreg_p (real_dest))) OK with that change. Please install on the trunk. The 4.8 maintainers have the final call for the 4.8 release branch. Thanks, Jeff
Re: PATCH to use -Wno-format during stage1
Il 30/10/2013 16:47, Jason Merrill ha scritto: I find -Wformat warnings about unknown % codes when building with an older GCC to be mildly annoying noise; this patch avoids them by passing -Wno-format during stage 1. Tested x86_64-pc-linux-gnu. Is this OK for trunk? Ok.
Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces
On 10/30/13 03:26, Richard Biener wrote: diff --git a/gcc/gimple.c b/gcc/gimple.c index 3ddceb9..20f6010 100644 --- a/gcc/gimple.c +++ b/gcc/gimple.c @@ -174,7 +174,7 @@ gimple_build_with_ops_stat (enum gimple_code code, unsigned subcode, gimple gimple_build_return (tree retval) { - gimple s = gimple_build_with_ops (GIMPLE_RETURN, ERROR_MARK, 1); + gimple s = gimple_build_with_ops (GIMPLE_RETURN, ERROR_MARK, 2); Ick - you enlarge all return statements? But you don't set the actual value? So why allocate it with 2 ops in the first place?? [Seems I completely missed that MPX changes gimple and the design document that was posted somewhere??] I'd assumed that'd be in a follow-up patch and probably would be bounds on the return value or NULL if there were no bounds. I'm terribly concerned about the enlarging of the return statements. On the other hand, if something enlarged a GIMPLE_ASSIGN, then I would have called it out for further discussion. Bah. Where is the update to gimple.texi and tree.texi? Good catch. Ilya, please send a patch to update the docs.
Re: [PATCH][ubsan] Add VLA bound instrumentation
On Wed, Oct 30, 2013 at 11:56:25AM -0400, Jason Merrill wrote: On 10/30/2013 10:52 AM, Marek Polacek wrote: + if ((flag_sanitize SANITIZE_VLA) + !processing_template_decl You don't need to check processing_template_decl; the template case was already handled above. Right, removed. + tree x = cp_save_expr (size); + x = build2 (COMPOUND_EXPR, TREE_TYPE (x), + ubsan_instrument_vla (input_location, x), x); + finish_expr_stmt (x); Saving 'size' here doesn't help since it's already been used above. Could you use itype instead of size here? I already experimented with that and I think I can't, since we call the finish_expr_stmt too soon, which results in: int x = 1; int a[0:(sizetype) SAVE_EXPR D.2143]; cleanup_point int x = 1;; cleanup_point Unknown tree: expr_stmt if (SAVE_EXPR D.2143 = 0) { __builtin___ubsan_handle_vla_bound_not_positive (*.Lubsan_data0, (unsigned long) SAVE_EXPR D.2143); } else { 0 }, (void) SAVE_EXPR D.2143; ; ssizetype D.2143; cleanup_point Unknown tree: expr_stmt (void) (D.2143 = (ssizetype) ++x + -1) ; and that ICEs in gimplify_var_or_parm_decl, presumably because the if (SAVE_EXPR D.2143 = 0) { ... } should be emitted *after* that cleanup_point. When we generated the C++1y check in cp_finish_decl, we emitted the check after the cleanup_point, and everything was OK. I admit I don't understand the cleanup_points very much and I don't know exactly where they are coming from, because normally I don't see them coming out of C FE. :) Thanks. Marek
Re: [RFA][PATCH] Minor fix to aliasing machinery
On Wed, 30 Oct 2013, Richard Biener wrote: Btw, get_addr_base_and_unit_offset may also return an offsetted MEM_REF (from MEM [p_3, 17] for example). As we are interested in pointers this could be handled by not requiring a memory reference but extracting the base address and offset, covering more cases. I tried the attached patch, and it almost worked, except for one fortran testcase (widechar_intrinsics_10.f90): ! { dg-do run } ! { dg-options -fbackslash } implicit none character(kind=1,len=3) :: s1(3) s1 = [ abc, def, ghi ] if (any (cshift (s1, -1) /= [ s1(3), s1(1:2) ])) call abort end we end up with a double array_ref, one of variable index, and get_ref_base_and_extent signals that by returning as size the size of the whole declaration (the double-array). However, ao_ref_init_from_ptr_and_size happily ignores the last two parameters of get_ref_base_and_extent and continues as if nothing was wrong (I don't know how to easily check that something went wrong either). Whether we think this is a good/bad patch for memcpy, we have a latent bug that the recent patches are making more visible. -- Marc GlisseIndex: testsuite/gcc.dg/tree-ssa/alias-26.c === --- testsuite/gcc.dg/tree-ssa/alias-26.c(revision 0) +++ testsuite/gcc.dg/tree-ssa/alias-26.c(working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options -O1 -fdump-tree-optimized } */ + +void f (long *p) { + *p = 42; + p[4] = 42; + __builtin_memset (p, 0, 100); +} + +/* { dg-final { scan-tree-dump-not = 42 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Property changes on: testsuite/gcc.dg/tree-ssa/alias-26.c ___ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +Author Date Id Revision URL \ No newline at end of property Index: tree-ssa-alias.c === --- tree-ssa-alias.c(revision 204199) +++ tree-ssa-alias.c(working copy) @@ -1982,23 +1982,24 @@ stmt_may_clobber_ref_p (gimple stmt, tre return stmt_may_clobber_ref_p_1 (stmt, r); } /* If STMT kills the memory reference REF return true, otherwise return false. */ static bool stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref) { /* For a must-alias check we need to be able to constrain - the access properly. */ - ao_ref_base (ref); - if (ref-max_size == -1) + the access properly. + FIXME: except for BUILTIN_FREE. */ + if (!ao_ref_base (ref) + || ref-max_size == -1) return false; if (gimple_has_lhs (stmt) TREE_CODE (gimple_get_lhs (stmt)) != SSA_NAME /* The assignment is not necessarily carried out if it can throw and we can catch it in the current function where we could inspect the previous value. ??? We only need to care about the RHS throwing. For aggregate assignments or similar calls and non-call exceptions the LHS might throw as well. */ @@ -2071,37 +2072,47 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref case BUILT_IN_MEMPCPY: case BUILT_IN_MEMMOVE: case BUILT_IN_MEMSET: case BUILT_IN_MEMCPY_CHK: case BUILT_IN_MEMPCPY_CHK: case BUILT_IN_MEMMOVE_CHK: case BUILT_IN_MEMSET_CHK: { tree dest = gimple_call_arg (stmt, 0); tree len = gimple_call_arg (stmt, 2); - tree base = NULL_TREE; - HOST_WIDE_INT offset = 0; + tree rbase = ref-base; + HOST_WIDE_INT roffset = ref-offset; if (!host_integerp (len, 0)) return false; - if (TREE_CODE (dest) == ADDR_EXPR) - base = get_addr_base_and_unit_offset (TREE_OPERAND (dest, 0), - offset); - else if (TREE_CODE (dest) == SSA_NAME) - base = dest; - if (base - base == ao_ref_base (ref)) + ao_ref dref; + ao_ref_init_from_ptr_and_size (dref, dest, len); + tree base = ao_ref_base (dref); + HOST_WIDE_INT offset = dref.offset; + if (!base) + return false; + if (TREE_CODE (base) == MEM_REF) + { + if (TREE_CODE (rbase) != MEM_REF) + return false; + // Compare pointers. + offset += BITS_PER_UNIT + * TREE_INT_CST_LOW (TREE_OPERAND (base, 1)); + roffset += BITS_PER_UNIT +* TREE_INT_CST_LOW (TREE_OPERAND (rbase, 1)); + base = TREE_OPERAND (base, 0); + rbase = TREE_OPERAND (rbase, 0); + } + if (base == rbase) { -
Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads
On 10/30/2013 02:47 AM, Jakub Jelinek wrote: 2013-10-30 Jakub Jelinek ja...@redhat.com * config/i386/i386.c (ix86_avx256_split_vector_move_misalign): If op1 is misaligned_operand, just use *movmode_internal insn rather than UNSPEC_LOADU load. (ix86_expand_vector_move_misalign): Likewise (for TARGET_AVX only). Avoid gen_lowpart on op0 if it isn't MEM. Ok. r~
Re: C++ PATCH to deal with trivial but non-callable [cd]tors
On 10/30/2013 06:14 AM, Eric Botcazou wrote: +/* Return whether DECL, a method of a C++ TYPE, is trivial, that is to say + doesn't do anything for the objects of TYPE. */ + +static bool +is_trivial_method (const_tree decl, const_tree type) +{ + if (cpp_check (decl, IS_CONSTRUCTOR) !TYPE_NEEDS_CONSTRUCTING (type)) +return true; This will tell you whether decl is a constructor for a type with some non-trivial constructor, but not whether decl itself is non-trivial. I think a good way to check for any non-trivial methods would be to check trivial_type_p in the front end and then see if there are any !DECL_ARTIFICIAL decls in TYPE_METHODS. Jason
Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces
On 10/30/13 04:34, Ilya Enkovich wrote: On 30 Oct 10:26, Richard Biener wrote: Ick - you enlarge all return statements? But you don't set the actual value? So why allocate it with 2 ops in the first place?? When return does not return bounds it has operand with zero value similar to case when it does not return value. What is the difference then? In general, when someone proposes a change in the size of tree, rtl or gimple nodes, it's a yellow flag that something may need further investigation. In this specific instance, I could trivially predict how that additional field would be used and a GIMPLE_RETURN isn't terribly important from a size standpoint, so I didn't call it out. Returns instrumentation. We add new operand to return statement to hold returned bounds and instrumentation pass is responsible to fill this operand with correct bounds Exactly what I expected. Unfortunately patch has been already installed. Should we uninstall it? If not, then here is patch for documentation. I think we're OK for now. If Richi wants it out, he'll say so explicitly. Thanks, Ilya -- gcc/ 2013-10-30 Ilya Enkovich ilya.enkov...@intel.com * doc/gimple.texi (gimple_call_num_nobnd_args): New. (gimple_call_nobnd_arg): New. (gimple_return_retbnd): New. (gimple_return_set_retbnd): New. (gimple_call_get_nobnd_arg_index): New. Can you also fixup the GIMPLE_RETURN documentation in gimple.texi. It needs a minor update after these changes. jeff
Re: [PATCH] Vectorizing abs(char/short/int) on x86.
I found my problem: I put DONE outside of if not inside. You are right. I have updated my patch. I appreciate your comment and test on it! thanks, Cong diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a38316..84c7ab5 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,10 @@ +2013-10-22 Cong Hou co...@google.com + + PR target/58762 + * config/i386/i386-protos.h (ix86_expand_sse2_abs): New function. + * config/i386/i386.c (ix86_expand_sse2_abs): New function. + * config/i386/sse.md: Add SSE2 support to abs (8/16/32-bit-int). + 2013-10-14 David Malcolm dmalc...@redhat.com * dumpfile.h (gcc::dump_manager): New class, to hold state diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 3ab2f3a..ca31224 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -238,6 +238,7 @@ extern void ix86_expand_mul_widen_evenodd (rtx, rtx, rtx, bool, bool); extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool); extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx); extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx); +extern void ix86_expand_sse2_abs (rtx, rtx); /* In i386-c.c */ extern void ix86_target_macros (void); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 02cbbbd..71905fc 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -41696,6 +41696,53 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2) gen_rtx_MULT (mode, op1, op2)); } +void +ix86_expand_sse2_abs (rtx op0, rtx op1) +{ + enum machine_mode mode = GET_MODE (op0); + rtx tmp0, tmp1; + + switch (mode) +{ + /* For 32-bit signed integer X, the best way to calculate the absolute + value of X is (((signed) X (W-1)) ^ X) - ((signed) X (W-1)). */ + case V4SImode: + tmp0 = expand_simple_binop (mode, ASHIFTRT, op1, +GEN_INT (GET_MODE_BITSIZE + (GET_MODE_INNER (mode)) - 1), +NULL, 0, OPTAB_DIRECT); + if (tmp0) + tmp1 = expand_simple_binop (mode, XOR, op1, tmp0, + NULL, 0, OPTAB_DIRECT); + if (tmp0 tmp1) + expand_simple_binop (mode, MINUS, tmp1, tmp0, + op0, 0, OPTAB_DIRECT); + break; + + /* For 16-bit signed integer X, the best way to calculate the absolute + value of X is max (X, -X), as SSE2 provides the PMAXSW insn. */ + case V8HImode: + tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0); + if (tmp0) + expand_simple_binop (mode, SMAX, op1, tmp0, op0, 0, + OPTAB_DIRECT); + break; + + /* For 8-bit signed integer X, the best way to calculate the absolute + value of X is min ((unsigned char) X, (unsigned char) (-X)), + as SSE2 provides the PMINUB insn. */ + case V16QImode: + tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0); + if (tmp0) + expand_simple_binop (V16QImode, UMIN, op1, tmp0, op0, 0, + OPTAB_DIRECT); + break; + + default: + break; +} +} + /* Expand an insert into a vector register through pinsr insn. Return true if successful. */ diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index c3f6c94..46e1df4 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -8721,7 +8721,7 @@ (set (attr prefix_rex) (symbol_ref x86_extended_reg_mentioned_p (insn))) (set_attr mode DI)]) -(define_insn absmode2 +(define_insn *absmode2 [(set (match_operand:VI124_AVX2_48_AVX512F 0 register_operand =v) (abs:VI124_AVX2_48_AVX512F (match_operand:VI124_AVX2_48_AVX512F 1 nonimmediate_operand vm)))] @@ -8733,6 +8733,19 @@ (set_attr prefix maybe_vex) (set_attr mode sseinsnmode)]) +(define_expand absmode2 + [(set (match_operand:VI124_AVX2_48_AVX512F 0 register_operand) + (abs:VI124_AVX2_48_AVX512F + (match_operand:VI124_AVX2_48_AVX512F 1 nonimmediate_operand)))] + TARGET_SSE2 +{ + if (!TARGET_SSSE3) +{ + ix86_expand_sse2_abs (operands[0], operands[1]); + DONE; +} +}) + (define_insn absmode2 [(set (match_operand:MMXMODEI 0 register_operand =y) (abs:MMXMODEI diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 075d071..cf5b942 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2013-10-22 Cong Hou co...@google.com + + PR target/58762 + * gcc.dg/vect/pr58762.c: New test. + 2013-10-14 Tobias Burnus bur...@net-b.de PR fortran/58658 diff --git a/gcc/testsuite/gcc.dg/vect/pr58762.c b/gcc/testsuite/gcc.dg/vect/pr58762.c new file mode 100644 index 000..6468d0a --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr58762.c @@ -0,0 +1,28 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-do compile } */ +/* { dg-options -O2 -ftree-vectorize } */ + +void test1 (char* a, char* b) +{ + int i; + for (i = 0; i 1; ++i) +a[i] = abs (b[i]); +} + +void test2 (short* a, short* b) +{ + int i; + for (i = 0; i 1; ++i) +a[i] = abs (b[i]); +} + +void test3 (int* a, int* b) +{ + int i; + for (i = 0; i 1; ++i) +a[i] = abs (b[i]); +} + +/* { dg-final { scan-tree-dump-times vectorized 1 loops 3 vect + {
Re: [PATCH] Vectorizing abs(char/short/int) on x86.
Forget to attach the patch file. thanks, Cong On Wed, Oct 30, 2013 at 10:01 AM, Cong Hou co...@google.com wrote: I found my problem: I put DONE outside of if not inside. You are right. I have updated my patch. I appreciate your comment and test on it! thanks, Cong diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a38316..84c7ab5 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,10 @@ +2013-10-22 Cong Hou co...@google.com + + PR target/58762 + * config/i386/i386-protos.h (ix86_expand_sse2_abs): New function. + * config/i386/i386.c (ix86_expand_sse2_abs): New function. + * config/i386/sse.md: Add SSE2 support to abs (8/16/32-bit-int). + 2013-10-14 David Malcolm dmalc...@redhat.com * dumpfile.h (gcc::dump_manager): New class, to hold state diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 3ab2f3a..ca31224 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -238,6 +238,7 @@ extern void ix86_expand_mul_widen_evenodd (rtx, rtx, rtx, bool, bool); extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool); extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx); extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx); +extern void ix86_expand_sse2_abs (rtx, rtx); /* In i386-c.c */ extern void ix86_target_macros (void); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 02cbbbd..71905fc 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -41696,6 +41696,53 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2) gen_rtx_MULT (mode, op1, op2)); } +void +ix86_expand_sse2_abs (rtx op0, rtx op1) +{ + enum machine_mode mode = GET_MODE (op0); + rtx tmp0, tmp1; + + switch (mode) +{ + /* For 32-bit signed integer X, the best way to calculate the absolute + value of X is (((signed) X (W-1)) ^ X) - ((signed) X (W-1)). */ + case V4SImode: + tmp0 = expand_simple_binop (mode, ASHIFTRT, op1, +GEN_INT (GET_MODE_BITSIZE + (GET_MODE_INNER (mode)) - 1), +NULL, 0, OPTAB_DIRECT); + if (tmp0) + tmp1 = expand_simple_binop (mode, XOR, op1, tmp0, + NULL, 0, OPTAB_DIRECT); + if (tmp0 tmp1) + expand_simple_binop (mode, MINUS, tmp1, tmp0, + op0, 0, OPTAB_DIRECT); + break; + + /* For 16-bit signed integer X, the best way to calculate the absolute + value of X is max (X, -X), as SSE2 provides the PMAXSW insn. */ + case V8HImode: + tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0); + if (tmp0) + expand_simple_binop (mode, SMAX, op1, tmp0, op0, 0, + OPTAB_DIRECT); + break; + + /* For 8-bit signed integer X, the best way to calculate the absolute + value of X is min ((unsigned char) X, (unsigned char) (-X)), + as SSE2 provides the PMINUB insn. */ + case V16QImode: + tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0); + if (tmp0) + expand_simple_binop (V16QImode, UMIN, op1, tmp0, op0, 0, + OPTAB_DIRECT); + break; + + default: + break; +} +} + /* Expand an insert into a vector register through pinsr insn. Return true if successful. */ diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index c3f6c94..46e1df4 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -8721,7 +8721,7 @@ (set (attr prefix_rex) (symbol_ref x86_extended_reg_mentioned_p (insn))) (set_attr mode DI)]) -(define_insn absmode2 +(define_insn *absmode2 [(set (match_operand:VI124_AVX2_48_AVX512F 0 register_operand =v) (abs:VI124_AVX2_48_AVX512F (match_operand:VI124_AVX2_48_AVX512F 1 nonimmediate_operand vm)))] @@ -8733,6 +8733,19 @@ (set_attr prefix maybe_vex) (set_attr mode sseinsnmode)]) +(define_expand absmode2 + [(set (match_operand:VI124_AVX2_48_AVX512F 0 register_operand) + (abs:VI124_AVX2_48_AVX512F + (match_operand:VI124_AVX2_48_AVX512F 1 nonimmediate_operand)))] + TARGET_SSE2 +{ + if (!TARGET_SSSE3) +{ + ix86_expand_sse2_abs (operands[0], operands[1]); + DONE; +} +}) + (define_insn absmode2 [(set (match_operand:MMXMODEI 0 register_operand =y) (abs:MMXMODEI diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 075d071..cf5b942 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2013-10-22 Cong Hou co...@google.com + + PR target/58762 + * gcc.dg/vect/pr58762.c: New test. + 2013-10-14 Tobias Burnus bur...@net-b.de PR fortran/58658 diff --git a/gcc/testsuite/gcc.dg/vect/pr58762.c b/gcc/testsuite/gcc.dg/vect/pr58762.c new file mode 100644 index 000..6468d0a --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr58762.c @@ -0,0 +1,28 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-do compile } */ +/* { dg-options -O2 -ftree-vectorize } */ + +void test1 (char* a, char* b) +{ + int i; + for (i = 0; i 1; ++i) +a[i] = abs (b[i]); +} + +void test2
Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp
On 10/30/13 09:09, Iyer, Balaji V wrote: Hello Everyone, What I ideally wanted to do with my testsuite files was that I want all the Cilk keywords test to compile no matter what the architecture is, but it should only run in certain architectures where the runtime is enabled (this is known statically and thus the testsuite doesn't have to do anything to figure it out.). Can someone please tell me how do I do this? I can't recall a similar situation off the top of my head. Are you using the dg framework? Can you have multiple dg-do directives? ie, /* { dg-do compile } */ /* { dg-do run { target i?86-*-* x86-64-*-* } } */ ? That'd be my best guess. jeff
RE: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp
-Original Message- From: Jeff Law [mailto:l...@redhat.com] Sent: Wednesday, October 30, 2013 1:08 PM To: Iyer, Balaji V; Tobias Burnus; gcc patches Subject: Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp On 10/30/13 09:09, Iyer, Balaji V wrote: Hello Everyone, What I ideally wanted to do with my testsuite files was that I want all the Cilk keywords test to compile no matter what the architecture is, but it should only run in certain architectures where the runtime is enabled (this is known statically and thus the testsuite doesn't have to do anything to figure it out.). Can someone please tell me how do I do this? I can't recall a similar situation off the top of my head. Are you using the dg framework? Can you have multiple dg-do directives? ie, /* { dg-do compile } */ /* { dg-do run { target i?86-*-* x86-64-*-* } } */ Yes, I am using the dg-framework. I tried that couple months back, and from what it looked like, it was replacing the compile command with the run command, and then on non-x86 architectures, it was just ignoring the file... But, will try it again to see if it works. ? That'd be my best guess. jeff
Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp
On Wed, 30 Oct 2013, Jeff Law wrote: /* { dg-do compile } */ /* { dg-do run { target i?86-*-* x86-64-*-* } } */ But with an effective-target keyword cilkplusrts or similar, rather than hardcoding the same list of targets in lots of places, please. -- Joseph S. Myers jos...@codesourcery.com
RE: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp
-Original Message- From: Joseph Myers [mailto:jos...@codesourcery.com] Sent: Wednesday, October 30, 2013 1:15 PM To: Jeff Law Cc: Iyer, Balaji V; Tobias Burnus; gcc patches Subject: Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp On Wed, 30 Oct 2013, Jeff Law wrote: /* { dg-do compile } */ /* { dg-do run { target i?86-*-* x86-64-*-* } } */ But with an effective-target keyword cilkplusrts or similar, rather than hardcoding the same list of targets in lots of places, please. Wow, I didn't know you could do that. Do you have an example where it is done that I can model after? -- Joseph S. Myers jos...@codesourcery.com
[Patch] Fix canadian cross build on systems with no fenv.h
I ran into a build problem while doing a canadian cross build of GCC. I was building on linux to create a Windows (mingw) GCC that generates code for mips-mti-elf. The mips-mti-elf target uses newlib for its system headers and libraries and the headers do not include a fenv.h header file. However, when doing a canadian cross build libstdc++ is built with the C++ compiler that runs on linux (the build system), not the C++ that was just built for Windows (the host system). The problem is that the libstdc++ configure script (in GLIBCXX_CHECK_C99_TR1) is checking for the fenv.h using C++ (not C) and the C++ compiler running on linux does have an fenv.h header because the latest libstdc++ builds always create this header for C++ regardless of whether there is one on the system or not. This results in the newly created libstdc++ library having a fenv.h header that tries to include the system fenv.h header file which does not exist and the build fails. My fix for this is to explicitly check for fenv.h and complex.h in the libstdc++ configure.ac script before calling GLIBCXX_CHECK_C99_TR1. This makes the configure script check for these headers using the C compiler instead of the C++ compiler and when GLIBCXX_CHECK_C99_TR1 is run it uses that information (saved in a autoconf variable) to correctly ascertain that fenv.h does not exist. I could put these checks in GLIBCXX_CHECK_C99_TR1 if that were considered preferable, it would just have to be done before we call 'AC_LANG_CPLUSPLUS'. Tested with both my canadian cross build and a standard cross build targetting mips-mti-elf. OK for checkin? Steve Ellcey sell...@mips.com 2013-10-30 Steve Ellcey sell...@mips.com * configure.ac: Add header checks for fenv.h and complex.h. * configure: Regenerate. diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac index dd13b01..22fc840 100644 --- a/libstdc++-v3/configure.ac +++ b/libstdc++-v3/configure.ac @@ -195,6 +195,12 @@ GLIBCXX_CHECK_S_ISREG_OR_S_IFREG AC_CHECK_HEADERS(sys/uio.h) GLIBCXX_CHECK_WRITEV +# Check for fenv.h and complex.h before GLIBCXX_CHECK_C99_TR1 +# so that the check is done with the C compiler (not C++). +# Checking with C++ can break a canadian cross build if either +# file does not exist in C but does in C++. +AC_CHECK_HEADERS(fenv.h complex.h) + # For C99 support to TR1. GLIBCXX_CHECK_C99_TR1
Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp
On 10/30/13 11:12, Iyer, Balaji V wrote: -Original Message- From: Jeff Law [mailto:l...@redhat.com] Sent: Wednesday, October 30, 2013 1:08 PM To: Iyer, Balaji V; Tobias Burnus; gcc patches Subject: Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp On 10/30/13 09:09, Iyer, Balaji V wrote: Hello Everyone, What I ideally wanted to do with my testsuite files was that I want all the Cilk keywords test to compile no matter what the architecture is, but it should only run in certain architectures where the runtime is enabled (this is known statically and thus the testsuite doesn't have to do anything to figure it out.). Can someone please tell me how do I do this? I can't recall a similar situation off the top of my head. Are you using the dg framework? Can you have multiple dg-do directives? ie, /* { dg-do compile } */ /* { dg-do run { target i?86-*-* x86-64-*-* } } */ Yes, I am using the dg-framework. I tried that couple months back, and from what it looked like, it was replacing the compile command with the run command, and then on non-x86 architectures, it was just ignoring the file... But, will try it again to see if it works. Hmm, if that's the case, you may need distinct tests. Judicious use of #include files would avoid unnecessary duplication. jeff
C++ PATCH to C++1y VLA of 0 length
At the Chicago meeting the EWG agreed that we don't need to throw on 0-length VLAs. Tested x86_64-pc-linux-gnu, applying to trunk. commit 1e328cbd26bfb641db8e218e4a4c32fc1a9a8d9d Author: Jason Merrill ja...@redhat.com Date: Fri Oct 25 06:15:01 2013 -0400 * decl.c (cp_finish_decl): Never throw for VLA bound == 0. diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 1e92f2a..476d559 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -6404,11 +6404,7 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p, /* If the VLA bound is larger than half the address space, or less than zero, throw std::bad_array_length. */ tree max = convert (ssizetype, TYPE_MAX_VALUE (TYPE_DOMAIN (type))); - /* C++1y says we should throw for length = 0, but we have - historically supported zero-length arrays. Let's treat that as an - extension to be disabled by -std=c++NN. */ - int lower = flag_iso ? 0 : -1; - tree comp = build2 (LT_EXPR, boolean_type_node, max, ssize_int (lower)); + tree comp = build2 (LT_EXPR, boolean_type_node, max, ssize_int (-1)); comp = build3 (COND_EXPR, void_type_node, comp, throw_bad_array_length (), void_zero_node); finish_expr_stmt (comp);
Re: [PATCH] Vectorizing abs(char/short/int) on x86.
On Wed, Oct 30, 2013 at 6:01 PM, Cong Hou co...@google.com wrote: I found my problem: I put DONE outside of if not inside. You are right. I have updated my patch. OK, great that we put things in order ;) Does this patch need some extra middle-end functionality? I was not able to vectorize char and short part of your patch. Regarding the testcase - please put it to gcc.target/i386/ directory. There is nothing generic in the test, as confirmed by target-dependent scan test. You will find plenty of examples in the mentioned directory. I'd suggest to split the testcase in three files, and to simplify it to something like the testcase with global variables I used earlier. Modulo testcase, the patch is OK otherwise, but middle-end parts should be committed first. Thanks, Uros.
Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp
On 10/30/13 11:16, Iyer, Balaji V wrote: -Original Message- From: Joseph Myers [mailto:jos...@codesourcery.com] Sent: Wednesday, October 30, 2013 1:15 PM To: Jeff Law Cc: Iyer, Balaji V; Tobias Burnus; gcc patches Subject: Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp On Wed, 30 Oct 2013, Jeff Law wrote: /* { dg-do compile } */ /* { dg-do run { target i?86-*-* x86-64-*-* } } */ But with an effective-target keyword cilkplusrts or similar, rather than hardcoding the same list of targets in lots of places, please. Wow, I didn't know you could do that. Do you have an example where it is done that I can model after? Look at testsuite/lib/target-support.exp for check_*. Once you see one or two, you can search in testsuite/gcc.dg for examples. jeff
Re: [Patch, C, C++] Accept GCC ivdep for 'do' and 'while', and for C++11's range-based loops
OK. Jason
Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces
On 10/30/13 04:48, Richard Biener wrote: foo (int * p, unsigned int size) { unnamed type __bound_tmp.0; long unsigned int D.2239; long unsigned int _2; sizetype _6; int * _7; bb 3: __bound_tmp.0_4 = __builtin_ia32_arg_bnd (p_3(D)); bb 2: _2 = (long unsigned int) size_1(D); __builtin_ia32_bndcl (__bound_tmp.0_4, p_3(D)); _6 = _2 + 18446744073709551615; _7 = p_3(D) + _6; __builtin_ia32_bndcu (__bound_tmp.0_4, _7); access_and_store (p_3(D), __bound_tmp.0_4, size_1(D)); so it seems there is now a mismatch between DECL_ARGUMENTS and the GIMPLE call stmt arguments. How (if) did you amend the GIMPLE stmt verifier for this? Effectively the bounds are passed on the side. How does regular code deal with this which may expect matching to DECL_ARGUMENTS? In fact interleaving the additional arguments sounds very error-prone for existing code - I'd have appended all bound args at the end. Also you unconditionally claim all pointer arguments have a bound - that looks like bad design as well. Why didn't you add a flag to the relevant PARM_DECL (and then, what do you do for indirect calls?). You can't actually interleave them -- that results in MPX and normal code not being able to interact. Passing the bound at the end doesn't really work either -- varargs and the desire to pass some of the bounds around in bound registers. /* Return the number of arguments used by call statement GS ignoring bound ones. */ static inline unsigned gimple_call_num_nobnd_args (const_gimple gs) { unsigned num_args = gimple_call_num_args (gs); unsigned res = num_args; for (unsigned n = 0; n num_args; n++) if (POINTER_BOUNDS_P (gimple_call_arg (gs, n))) res--; return res; } the choice means that gimple_call_num_nobnd_args is not O(1). Yes, but I don't see that's terribly problematical. /* Return INDEX's call argument ignoring bound ones. */ static inline tree gimple_call_nobnd_arg (const_gimple gs, unsigned index) { /* No bound args may exist if pointers checker is off. */ if (!flag_check_pointer_bounds) return gimple_call_arg (gs, index); return gimple_call_arg (gs, gimple_call_get_nobnd_arg_index (gs, index)); } GIMPLE layout depending on flag_check_pointer_bounds sounds like a recipie for desaster if you consider TUs compiled with and TUs compiled without and LTO. Or if you consider using optimized attribute with that flag. Sorry, I don't follow. Can you elaborate please. I hope the reviewers that approved the patch will work with you to address the above issues. I can't be everywhere. Obviously I will. jeff
Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads
On Wed, Oct 30, 2013 at 09:17:04AM -0700, Richard Henderson wrote: On 10/30/2013 02:47 AM, Jakub Jelinek wrote: 2013-10-30 Jakub Jelinek ja...@redhat.com * config/i386/i386.c (ix86_avx256_split_vector_move_misalign): If op1 is misaligned_operand, just use *movmode_internal insn rather than UNSPEC_LOADU load. (ix86_expand_vector_move_misalign): Likewise (for TARGET_AVX only). Avoid gen_lowpart on op0 if it isn't MEM. Ok. Testing revealed some testsuite failures, due to either trying to match insn names in -dp dump or counting specific FMA insns, where with the patch there are changes like: - vmovupd 0(%r13,%rax), %ymm0 - vfmadd231pd %ymm1, %ymm2, %ymm0 + vmovapd %ymm2, %ymm0 + vfmadd213pd 0(%r13,%rax), %ymm1, %ymm0 So, here is updated patch with those testsuite changes and added PR line to ChangeLog. I'll wait for Uros' testresults. 2013-10-30 Jakub Jelinek ja...@redhat.com PR target/47754 * config/i386/i386.c (ix86_avx256_split_vector_move_misalign): If op1 is misaligned_operand, just use *movmode_internal insn rather than UNSPEC_LOADU load. (ix86_expand_vector_move_misalign): Likewise (for TARGET_AVX only). Avoid gen_lowpart on op0 if it isn't MEM. * gcc.target/i386/avx256-unaligned-load-1.c: Adjust scan-assembler and scan-assembler-not regexps. * gcc.target/i386/avx256-unaligned-load-2.c: Likewise. * gcc.target/i386/avx256-unaligned-load-3.c: Likewise. * gcc.target/i386/avx256-unaligned-load-4.c: Likewise. * gcc.target/i386/l_fma_float_1.c: Expect vf{,n}m{add,sub}213*p* instead of vf{,n}m{add,sub}231*p*. * gcc.target/i386/l_fma_float_3.c: Likewise. * gcc.target/i386/l_fma_double_1.c: Likewise. * gcc.target/i386/l_fma_double_3.c: Likewise. --- gcc/config/i386/i386.c.jj 2013-10-30 08:15:38.0 +0100 +++ gcc/config/i386/i386.c 2013-10-30 10:20:22.684708729 +0100 @@ -16560,6 +16560,12 @@ ix86_avx256_split_vector_move_misalign ( r = gen_rtx_VEC_CONCAT (GET_MODE (op0), r, m); emit_move_insn (op0, r); } + /* Normal *movmode_internal pattern will handle +unaligned loads just fine if misaligned_operand +is true, and without the UNSPEC it can be combined +with arithmetic instructions. */ + else if (misaligned_operand (op1, GET_MODE (op1))) + emit_insn (gen_rtx_SET (VOIDmode, op0, op1)); else emit_insn (load_unaligned (op0, op1)); } @@ -16634,7 +16640,7 @@ ix86_avx256_split_vector_move_misalign ( void ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[]) { - rtx op0, op1, m; + rtx op0, op1, orig_op0 = NULL_RTX, m; rtx (*load_unaligned) (rtx, rtx); rtx (*store_unaligned) (rtx, rtx); @@ -16647,7 +16653,16 @@ ix86_expand_vector_move_misalign (enum m { case MODE_VECTOR_INT: case MODE_INT: - op0 = gen_lowpart (V16SImode, op0); + if (GET_MODE (op0) != V16SImode) + { + if (!MEM_P (op0)) + { + orig_op0 = op0; + op0 = gen_reg_rtx (V16SImode); + } + else + op0 = gen_lowpart (V16SImode, op0); + } op1 = gen_lowpart (V16SImode, op1); /* FALLTHRU */ @@ -16676,6 +16691,8 @@ ix86_expand_vector_move_misalign (enum m emit_insn (store_unaligned (op0, op1)); else gcc_unreachable (); + if (orig_op0) + emit_move_insn (orig_op0, gen_lowpart (GET_MODE (orig_op0), op0)); break; default: @@ -16692,12 +16709,23 @@ ix86_expand_vector_move_misalign (enum m { case MODE_VECTOR_INT: case MODE_INT: - op0 = gen_lowpart (V32QImode, op0); + if (GET_MODE (op0) != V32QImode) + { + if (!MEM_P (op0)) + { + orig_op0 = op0; + op0 = gen_reg_rtx (V32QImode); + } + else + op0 = gen_lowpart (V32QImode, op0); + } op1 = gen_lowpart (V32QImode, op1); /* FALLTHRU */ case MODE_VECTOR_FLOAT: ix86_avx256_split_vector_move_misalign (op0, op1); + if (orig_op0) + emit_move_insn (orig_op0, gen_lowpart (GET_MODE (orig_op0), op0)); break; default: @@ -16709,15 +16737,30 @@ ix86_expand_vector_move_misalign (enum m if (MEM_P (op1)) { + /* Normal *movmode_internal pattern will handle +unaligned loads just fine if misaligned_operand +is true, and without the UNSPEC it can be combined +with arithmetic instructions. */ + if (TARGET_AVX + (GET_MODE_CLASS (mode) == MODE_VECTOR_INT + || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT) + misaligned_operand (op1, GET_MODE (op1))) + emit_insn
Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads
On Wed, Oct 30, 2013 at 12:11 PM, Jakub Jelinek ja...@redhat.com wrote: Yesterday I've noticed that for AVX which allows unaligned operands in AVX arithmetics instructions we still don't combine unaligned loads with the AVX arithmetics instructions. So say for -O2 -mavx -ftree-vectorize This is actually PR 47754 that fell below radar for some reason... Apparently yes. the patch attempts to avoid gen_lowpart on the non-MEM lhs of the unaligned loads, which usually means combine will fail, by doing the load into a temporary pseudo in that case and then doing a pseudo to pseudo move with gen_lowpart on the rhs (which will be merged soon after into following instructions). Is this similar to PR44141? There were similar problems with V4SFmode subregs, so combine was not able to merge load to the arithemtic insn. From the work on the vectorization last year I remember many cases where subregs (even equal size) on the LHS of instructions prevented combiner or other RTL optimizations from doing it's job. I believe I've changed some easy places that did that completely unnecessarily, but certainly have not went through all the code to look for other places where this is done. Perhaps let's hack up a checking pass that will after expansion walk the whole IL and complain about same sized subregs on the LHS of insns, then do make check with it for a couple of ISAs (-msse2,-msse4,-mavx,-mavx2 e.g.? I'll bootstrap/regtest this on x86_64-linux and i686-linux, unfortunately my bootstrap/regtest server isn't AVX capable. I can bootstrap the patch later today on IvyBridge with --with-arch=core-avx-i --with-cpu=core-avx-i --with-fpmath=avx. That would be greatly appreciated, thanks. The bootstrap and regression test was OK for x86_64-linux-gnu {,-m32}. The failures in the attached report are either pre-existing or benign due to core-avx-i default to AVX. Please also mention PR44141 in the ChangeLog entry. Uros. mail-report.log.gz Description: GNU Zip compressed data
Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads
On Wed, Oct 30, 2013 at 6:42 PM, Uros Bizjak ubiz...@gmail.com wrote: On Wed, Oct 30, 2013 at 12:11 PM, Jakub Jelinek ja...@redhat.com wrote: Yesterday I've noticed that for AVX which allows unaligned operands in AVX arithmetics instructions we still don't combine unaligned loads with the AVX arithmetics instructions. So say for -O2 -mavx -ftree-vectorize This is actually PR 47754 that fell below radar for some reason... Apparently yes. the patch attempts to avoid gen_lowpart on the non-MEM lhs of the unaligned loads, which usually means combine will fail, by doing the load into a temporary pseudo in that case and then doing a pseudo to pseudo move with gen_lowpart on the rhs (which will be merged soon after into following instructions). Is this similar to PR44141? There were similar problems with V4SFmode subregs, so combine was not able to merge load to the arithemtic insn. From the work on the vectorization last year I remember many cases where subregs (even equal size) on the LHS of instructions prevented combiner or other RTL optimizations from doing it's job. I believe I've changed some easy places that did that completely unnecessarily, but certainly have not went through all the code to look for other places where this is done. Perhaps let's hack up a checking pass that will after expansion walk the whole IL and complain about same sized subregs on the LHS of insns, then do make check with it for a couple of ISAs (-msse2,-msse4,-mavx,-mavx2 e.g.? I'll bootstrap/regtest this on x86_64-linux and i686-linux, unfortunately my bootstrap/regtest server isn't AVX capable. I can bootstrap the patch later today on IvyBridge with --with-arch=core-avx-i --with-cpu=core-avx-i --with-fpmath=avx. That would be greatly appreciated, thanks. The bootstrap and regression test was OK for x86_64-linux-gnu {,-m32}. The failures in the attached report are either pre-existing or benign due to core-avx-i default to AVX. I was referring to the *runtime* failures here. Please also mention PR44141 in the ChangeLog entry. Ops, this should be PR47754. Thanks, Uros.
Re: RFA: patch to fix PR58785 (an ARM LRA crash)
On 30 Oct 2013, at 08:16, Vladimir Makarov vmaka...@redhat.com wrote: The following patch fixes: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58785 LRA chooses constraint 'm' for const_int operand. It means that the const_int should be placed in memory but it does not happen as preferred reload class hook returns LO_REGS for class NO_REGS which is result of LRA choosing 'm'. I don't know why reload pass needs such value but it should be return NO_REGS IMHO as it results in much less reload insns. Is this patch ok to commit to the trunk? 2013-10-30 Vladimir Makarov vmaka...@redhat.com PR target/58785 * config/arm/arm.c (arm_preferred_reload_class): Don't return LO_REGS for NO_REGS for LRA. 2013-10-30 Vladimir Makarov vmaka...@redhat.com PR target/58785 * gcc.target/arm/pr58785.c: New. pr58785.patch We've been suspicious of this hunk of code for a while now. One reading of the manual suggests that p_r_c can only return a subset of rclass, not a different class. On that basis, lo_regs as a result should only be returned when rclass is general_regs, even for traditional reload. R.
Re: [PATCH] Vectorizing abs(char/short/int) on x86.
On Wed, Oct 30, 2013 at 10:22 AM, Uros Bizjak ubiz...@gmail.com wrote: On Wed, Oct 30, 2013 at 6:01 PM, Cong Hou co...@google.com wrote: I found my problem: I put DONE outside of if not inside. You are right. I have updated my patch. OK, great that we put things in order ;) Does this patch need some extra middle-end functionality? I was not able to vectorize char and short part of your patch. In the original patch, I converted abs() on short and char values to their own types by removing type casts. That is, originally char_val1 = abs(char_val2) will be converted to char_val1 = (char) abs((int) char_val2) in the frontend, and I would like to convert it back to char_val1 = abs(char_val2). But after several discussions, it seems this conversion has some problems such as overflow converns, and I thereby removed that part. Now you should still be able to vectorize abs(char) and abs(short) but with packing and unpacking. Later I will consider to write pattern recognizer for abs(char) and abs(short) and then the expand on abs(char)/abs(short) in this patch will be used during vectorization. Regarding the testcase - please put it to gcc.target/i386/ directory. There is nothing generic in the test, as confirmed by target-dependent scan test. You will find plenty of examples in the mentioned directory. I'd suggest to split the testcase in three files, and to simplify it to something like the testcase with global variables I used earlier. I have done it. The test case is split into three for s8/s16/s32 in gcc.target/i386. Thank you! Cong diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a38316..84c7ab5 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,10 @@ +2013-10-22 Cong Hou co...@google.com + + PR target/58762 + * config/i386/i386-protos.h (ix86_expand_sse2_abs): New function. + * config/i386/i386.c (ix86_expand_sse2_abs): New function. + * config/i386/sse.md: Add SSE2 support to abs (8/16/32-bit-int). + 2013-10-14 David Malcolm dmalc...@redhat.com * dumpfile.h (gcc::dump_manager): New class, to hold state diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 3ab2f3a..ca31224 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -238,6 +238,7 @@ extern void ix86_expand_mul_widen_evenodd (rtx, rtx, rtx, bool, bool); extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool); extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx); extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx); +extern void ix86_expand_sse2_abs (rtx, rtx); /* In i386-c.c */ extern void ix86_target_macros (void); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 02cbbbd..71905fc 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -41696,6 +41696,53 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2) gen_rtx_MULT (mode, op1, op2)); } +void +ix86_expand_sse2_abs (rtx op0, rtx op1) +{ + enum machine_mode mode = GET_MODE (op0); + rtx tmp0, tmp1; + + switch (mode) +{ + /* For 32-bit signed integer X, the best way to calculate the absolute + value of X is (((signed) X (W-1)) ^ X) - ((signed) X (W-1)). */ + case V4SImode: + tmp0 = expand_simple_binop (mode, ASHIFTRT, op1, +GEN_INT (GET_MODE_BITSIZE + (GET_MODE_INNER (mode)) - 1), +NULL, 0, OPTAB_DIRECT); + if (tmp0) + tmp1 = expand_simple_binop (mode, XOR, op1, tmp0, + NULL, 0, OPTAB_DIRECT); + if (tmp0 tmp1) + expand_simple_binop (mode, MINUS, tmp1, tmp0, + op0, 0, OPTAB_DIRECT); + break; + + /* For 16-bit signed integer X, the best way to calculate the absolute + value of X is max (X, -X), as SSE2 provides the PMAXSW insn. */ + case V8HImode: + tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0); + if (tmp0) + expand_simple_binop (mode, SMAX, op1, tmp0, op0, 0, + OPTAB_DIRECT); + break; + + /* For 8-bit signed integer X, the best way to calculate the absolute + value of X is min ((unsigned char) X, (unsigned char) (-X)), + as SSE2 provides the PMINUB insn. */ + case V16QImode: + tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0); + if (tmp0) + expand_simple_binop (V16QImode, UMIN, op1, tmp0, op0, 0, + OPTAB_DIRECT); + break; + + default: + break; +} +} + /* Expand an insert into a vector register through pinsr insn. Return true if successful. */ diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index c3f6c94..46e1df4 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -8721,7 +8721,7 @@ (set (attr prefix_rex) (symbol_ref x86_extended_reg_mentioned_p (insn))) (set_attr mode DI)]) -(define_insn absmode2 +(define_insn *absmode2 [(set (match_operand:VI124_AVX2_48_AVX512F 0 register_operand =v) (abs:VI124_AVX2_48_AVX512F (match_operand:VI124_AVX2_48_AVX512F 1 nonimmediate_operand vm)))] @@ -8733,6 +8733,19 @@ (set_attr prefix maybe_vex) (set_attr mode sseinsnmode)]) +(define_expand absmode2
Re: [PATCH] Vectorizing abs(char/short/int) on x86.
Also, as the current expand for abs() on 8/16bit integer is not used at all, should I comment them temporarily now? Later I can uncomment them once I finished the pattern recognizer. thanks, Cong On Wed, Oct 30, 2013 at 10:22 AM, Uros Bizjak ubiz...@gmail.com wrote: On Wed, Oct 30, 2013 at 6:01 PM, Cong Hou co...@google.com wrote: I found my problem: I put DONE outside of if not inside. You are right. I have updated my patch. OK, great that we put things in order ;) Does this patch need some extra middle-end functionality? I was not able to vectorize char and short part of your patch. Regarding the testcase - please put it to gcc.target/i386/ directory. There is nothing generic in the test, as confirmed by target-dependent scan test. You will find plenty of examples in the mentioned directory. I'd suggest to split the testcase in three files, and to simplify it to something like the testcase with global variables I used earlier. Modulo testcase, the patch is OK otherwise, but middle-end parts should be committed first. Thanks, Uros.
Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces
On 30 Oct 11:40, Jeff Law wrote: On 10/30/13 04:48, Richard Biener wrote: foo (int * p, unsigned int size) { unnamed type __bound_tmp.0; long unsigned int D.2239; long unsigned int _2; sizetype _6; int * _7; bb 3: __bound_tmp.0_4 = __builtin_ia32_arg_bnd (p_3(D)); bb 2: _2 = (long unsigned int) size_1(D); __builtin_ia32_bndcl (__bound_tmp.0_4, p_3(D)); _6 = _2 + 18446744073709551615; _7 = p_3(D) + _6; __builtin_ia32_bndcu (__bound_tmp.0_4, _7); access_and_store (p_3(D), __bound_tmp.0_4, size_1(D)); so it seems there is now a mismatch between DECL_ARGUMENTS and the GIMPLE call stmt arguments. How (if) did you amend the GIMPLE stmt verifier for this? Effectively the bounds are passed on the side. How does regular code deal with this which may expect matching to DECL_ARGUMENTS? In fact interleaving the additional arguments sounds very error-prone for existing code - I'd have appended all bound args at the end. Also you unconditionally claim all pointer arguments have a bound - that looks like bad design as well. Why didn't you add a flag to the relevant PARM_DECL (and then, what do you do for indirect calls?). You can't actually interleave them -- that results in MPX and normal code not being able to interact. Passing the bound at the end doesn't really work either -- varargs and the desire to pass some of the bounds around in bound registers. /* Return the number of arguments used by call statement GS ignoring bound ones. */ static inline unsigned gimple_call_num_nobnd_args (const_gimple gs) { unsigned num_args = gimple_call_num_args (gs); unsigned res = num_args; for (unsigned n = 0; n num_args; n++) if (POINTER_BOUNDS_P (gimple_call_arg (gs, n))) res--; return res; } the choice means that gimple_call_num_nobnd_args is not O(1). Yes, but I don't see that's terribly problematical. /* Return INDEX's call argument ignoring bound ones. */ static inline tree gimple_call_nobnd_arg (const_gimple gs, unsigned index) { /* No bound args may exist if pointers checker is off. */ if (!flag_check_pointer_bounds) return gimple_call_arg (gs, index); return gimple_call_arg (gs, gimple_call_get_nobnd_arg_index (gs, index)); } GIMPLE layout depending on flag_check_pointer_bounds sounds like a recipie for desaster if you consider TUs compiled with and TUs compiled without and LTO. Or if you consider using optimized attribute with that flag. Sorry, I don't follow. Can you elaborate please. I suppose the possile problem here is when we run LTO compiler without -fcheck-pointer-bounds and give instrumented code as input. gimple_call_nobnd_arg would work wrong for instrumented code. Actually there are other places in subsequent patches wich assume that flag_check_pointer_bounds is 1 if we have instrumented code. Ilya I hope the reviewers that approved the patch will work with you to address the above issues. I can't be everywhere. Obviously I will. jeff
Re: [PATCH] Vectorizing abs(char/short/int) on x86.
On Wed, Oct 30, 2013 at 7:01 PM, Cong Hou co...@google.com wrote: I found my problem: I put DONE outside of if not inside. You are right. I have updated my patch. OK, great that we put things in order ;) Does this patch need some extra middle-end functionality? I was not able to vectorize char and short part of your patch. In the original patch, I converted abs() on short and char values to their own types by removing type casts. That is, originally char_val1 = abs(char_val2) will be converted to char_val1 = (char) abs((int) char_val2) in the frontend, and I would like to convert it back to char_val1 = abs(char_val2). But after several discussions, it seems this conversion has some problems such as overflow converns, and I thereby removed that part. Now you should still be able to vectorize abs(char) and abs(short) but with packing and unpacking. Later I will consider to write pattern recognizer for abs(char) and abs(short) and then the expand on abs(char)/abs(short) in this patch will be used during vectorization. OK, this seems reasonable. We already have unused SSSE3 8/16 bit abs pattern, so I think we can commit SSE2 expanders, even if they will be unused for now. The proposed recognizer will benefit SSE2 as well as existing SSSE3 patterns. Regarding the testcase - please put it to gcc.target/i386/ directory. There is nothing generic in the test, as confirmed by target-dependent scan test. You will find plenty of examples in the mentioned directory. I'd suggest to split the testcase in three files, and to simplify it to something like the testcase with global variables I used earlier. I have done it. The test case is split into three for s8/s16/s32 in gcc.target/i386. OK. The patch is OK for mainline, but please check formatting and whitespace before the patch is committed. Thanks, Uros.
Re: [wide-int] Update main comment
Kenneth Zadeck zad...@naturalbridge.com writes: On 10/30/2013 07:01 AM, Richard Sandiford wrote: Kenneth Zadeck zad...@naturalbridge.com writes: On 10/29/2013 06:37 PM, Richard Sandiford wrote: This patch tries to update the main wide_int comment to reflect the current implementation. - bitsizetype is TImode on x86_64 and others, so I don't think it's necessarily true that all offset_ints are signed. (widest_int are though.) i am wondering if this is too conservative an interpretation.I believe that they are ti mode because that is the next thing after di mode and so they wanted to accommodate the 3 extra bits. Certainly there is no x86 that is able to address more than 64 bits. Right, but my point is that it's a different case from widest_int. It'd be just as valid to do bitsizetype arithmetic using wide_int rather than offset_int, and those wide_ints would have precision 128, just like the offset_ints. And I wouldn't really say that those wide_ints were fundamentally signed in any way. Although the tree layer might know that X upper bits of the bitsizetype are always signs, the tree-wide_int interface treats them in the same way as any other 128-bit type. Maybe I'm just being pedantic, but I think offset_int would only be like widest_int if bitsizetype had precision 67 or whatever. Then we could say that both offset_int and widest_int must be wider than any inputs, meaning that there's at least one leading sign bit. this was of course what mike and i wanted, but we could not really figure out how to pull it off. in particular, we could not find any existing reliable marker in the targets to say what the width of the widest pointer on any implementation. We actually used the number 68 rather than 67 because we assumed 64 for the widest pointer on any existing platform, 3 bits for the bits and 1 bit for the sign. Ah yeah, 68 would be better for signed types. Is the patch OK while we still have 128-bit bitsizetypes though? I agree the current comment would be right if we ever did switch to sub-128 bitsizes. Thanks, Richard
Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces
2013/10/30 Jeff Law l...@redhat.com: On 10/30/13 04:34, Ilya Enkovich wrote: On 30 Oct 10:26, Richard Biener wrote: Ick - you enlarge all return statements? But you don't set the actual value? So why allocate it with 2 ops in the first place?? When return does not return bounds it has operand with zero value similar to case when it does not return value. What is the difference then? In general, when someone proposes a change in the size of tree, rtl or gimple nodes, it's a yellow flag that something may need further investigation. In this specific instance, I could trivially predict how that additional field would be used and a GIMPLE_RETURN isn't terribly important from a size standpoint, so I didn't call it out. Returns instrumentation. We add new operand to return statement to hold returned bounds and instrumentation pass is responsible to fill this operand with correct bounds Exactly what I expected. Unfortunately patch has been already installed. Should we uninstall it? If not, then here is patch for documentation. I think we're OK for now. If Richi wants it out, he'll say so explicitly. Thanks, Ilya -- gcc/ 2013-10-30 Ilya Enkovich ilya.enkov...@intel.com * doc/gimple.texi (gimple_call_num_nobnd_args): New. (gimple_call_nobnd_arg): New. (gimple_return_retbnd): New. (gimple_return_set_retbnd): New. (gimple_call_get_nobnd_arg_index): New. Can you also fixup the GIMPLE_RETURN documentation in gimple.texi. It needs a minor update after these changes. I could not find anything but accessors for GIMPLE_RETURN in gimple.texi. And new accessors are in my doc patch already. Ilya jeff
Re: C++ PATCH to deal with trivial but non-callable [cd]tors
+/* Return whether DECL, a method of a C++ TYPE, is trivial, that is to say + doesn't do anything for the objects of TYPE. */ + +static bool +is_trivial_method (const_tree decl, const_tree type) +{ + if (cpp_check (decl, IS_CONSTRUCTOR) !TYPE_NEEDS_CONSTRUCTING (type)) +return true; This will tell you whether decl is a constructor for a type with some non-trivial constructor, but not whether decl itself is non-trivial. I think that's good enough though, in practice we only need to eliminate constructors/destructors for POD types. As soon as there is one non-trivial method, the game is essentially over. I think a good way to check for any non-trivial methods would be to check trivial_type_p in the front end and then see if there are any !DECL_ARTIFICIAL decls in TYPE_METHODS. That sounds interesting indeed, thanks for the tip. I was initially reluctant to call into the front-end because of side-effects, but the various predicates in tree.c seem fine in this respect. -- Eric Botcazou
[C++ Patch] PR 58581
Hi, to resolve this simple ICE we only have to check the return value of mark_used, like we do in a number of other places. Tested x86_64-linux. Thanks! Paolo. / /cp 2013-10-30 Paolo Carlini paolo.carl...@oracle.com PR c++/58581 * call.c (build_over_call): Check return value of mark_used. /testsuite 2013-10-30 Paolo Carlini paolo.carl...@oracle.com PR c++/58581 * g++.dg/cpp0x/deleted1.C: New. Index: cp/call.c === --- cp/call.c (revision 204219) +++ cp/call.c (working copy) @@ -7112,8 +7112,9 @@ build_over_call (struct z_candidate *cand, int fla mark_versions_used (fn); } - if (!already_used) -mark_used (fn); + if (!already_used + !mark_used (fn)) +return error_mark_node; if (DECL_VINDEX (fn) (flags LOOKUP_NONVIRTUAL) == 0 /* Don't mess with virtual lookup in fold_non_dependent_expr; virtual Index: testsuite/g++.dg/cpp0x/deleted1.C === --- testsuite/g++.dg/cpp0x/deleted1.C (revision 0) +++ testsuite/g++.dg/cpp0x/deleted1.C (working copy) @@ -0,0 +1,6 @@ +// PR c++/58581 +// { dg-do compile { target c++11 } } + +templatetypename T int foo(T) noexcept(T()) = delete; + +int i = foo(0); // { dg-error deleted }
Re: [PATCH][RFC] fix reload causing ICE in subreg_get_info on m68k (PR58369)
On 10/18/13 14:17, Mikael Pettersson wrote: Thanks Mikael. My only concern is the lack of adjustment when the value found was already a SUBREG. ie, let's assume rld[r].in_reg was something like (subreg:XF (reg:DF) 0) and our target is (reg:DF) In this case it seems to me we still want to compute the subreg offset, right? jeff Thanks Jeff. Sorry about the delay, but it took me a while to work out the details of the various cases (I was afraid of having to do something like simplify_subreg, but the cases in reload are simpler), and then to re-test on my various targets. No worries. This stuff is complex and taking the time to thoroughly analyze the cases and test is warranted and greatly appreciated. Let the reloadee be rld[r].in_reg, outermode be its mode, and innermode be the mode of the hard reg in last_reg. The trivial cases are when the reloadee is a plain REG or a SUBREG of a hard reg. For these, reload virtually forms a normal lowpart subreg of last_reg, and subreg_lowpart_offset (outermode, innermode) computes the endian-correct offset for subreg_regno_offset. This is exactly what my previous patch did. Right. In remaining cases the reloadee is a SUBREG of a pseudo. Let outer_offset be its BYTE, and middlemode be the mode of its REG. Another simple case is when the reloadee is paradoxical. Then outer_offset is zero (by convention), and reload should just form a normal lowpart subreg as in the trivial cases. Even though the reloadee is paradoxical, this subreg will fit thanks to the mode size check on lines 6546-6547. Agreed. If the reloadee is a normal lowpart SUBREG, then again reload should just form a normal lowpart subreg as in the trivial cases. (But see below.) The tricky case is when the reloadee is a normal but not lowpart SUBREG. We get the correct offset for reload's virtual subreg by adding outer_offset to the lowpart offset of middlemode and innermode. This works for both big-endian and little-endian. It also handles normal lowpart SUBREGs, so we don't need to check for lowpart vs non-lowpart normal SUBREGs. Sounds right. Tested with trunk and 4.8 on {m68k,sparc64,powerpc64,x86_64}-linux-gnu and armv5tel-linux-gnueabi. No regressions. Ok for trunk? gcc/ 2013-10-18 Mikael Pettersson mikpeli...@gmail.com PR rtl-optimization/58369 * reload1.c (compute_reload_subreg_offset): Define. (choose_reload_regs): Use it to pass endian-correct offset to subreg_regno_offset. I fixed a few trivial whitespace issues and added a testcase. Installed onto the trunk on your behalf. Thanks for your patience, Jeff diff --git a/gcc/ChangeLog b/gcc/ChangeLog index ef40a43..e3d2abd 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,10 @@ +2013-10-18 Mikael Pettersson mikpeli...@gmail.com + + PR rtl-optimization/58369 + * reload1.c (compute_reload_subreg_offset): New function. + (choose_reload_regs): Use it to pass endian-correct + offset to subreg_regno_offset. + 2013-10-30 Tobias Burnus bur...@net-b.de PR other/33426 diff --git a/gcc/reload1.c b/gcc/reload1.c index d56c554..b62b047 100644 --- a/gcc/reload1.c +++ b/gcc/reload1.c @@ -6371,6 +6371,37 @@ replaced_subreg (rtx x) } #endif +/* Compute the offset to pass to subreg_regno_offset, for a pseudo of + mode OUTERMODE that is available in a hard reg of mode INNERMODE. + SUBREG is non-NULL if the pseudo is a subreg whose reg is a pseudo, + otherwise it is NULL. */ + +static int +compute_reload_subreg_offset (enum machine_mode outermode, + rtx subreg, + enum machine_mode innermode) +{ + int outer_offset; + enum machine_mode middlemode; + + if (!subreg) +return subreg_lowpart_offset (outermode, innermode); + + outer_offset = SUBREG_BYTE (subreg); + middlemode = GET_MODE (SUBREG_REG (subreg)); + + /* If SUBREG is paradoxical then return the normal lowpart offset + for OUTERMODE and INNERMODE. Our caller has already checked + that OUTERMODE fits in INNERMODE. */ + if (outer_offset == 0 + GET_MODE_SIZE (outermode) GET_MODE_SIZE (middlemode)) +return subreg_lowpart_offset (outermode, innermode); + + /* SUBREG is normal, but may not be lowpart; return OUTER_OFFSET + plus the normal lowpart offset for MIDDLEMODE and INNERMODE. */ + return outer_offset + subreg_lowpart_offset (middlemode, innermode); +} + /* Assign hard reg targets for the pseudo-registers we must reload into hard regs for this insn. Also output the instructions to copy them in and out of the hard regs. @@ -6508,6 +6539,7 @@ choose_reload_regs (struct insn_chain *chain) int byte = 0; int regno = -1; enum machine_mode mode = VOIDmode; + rtx subreg = NULL_RTX; if (rld[r].in == 0) ; @@ -6528,7 +6560,10 @@ choose_reload_regs (struct insn_chain
Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp
Iyer, Balaji V wrote: What I ideally wanted to do with my testsuite files was that I want all the Cilk keywords test to compile no matter what the architecture is, but it should only run in certain architectures where the runtime is enabled (this is known statically and thus the testsuite doesn't have to do anything to figure it out.). Can someone please tell me how do I do this? Which is a bit orthogonal to my patch, which helps that the tests pass on systems which are supported. – Thus, let's start with the question what do you think of that patch? On the other hand, one could use the existence of libcilkrts* as detected by the patch to decide whether to link or not: If the library is there, one can link – if not found, it is unlikely to work (unless it is, e.g. found in /usr/lib). * * * Regarding compile vs. run: I think one possibility would be to have no dg-do in the files and simply change the default to compile or run, depending whether the architecture is supported. On the other hand, that can be confusing as an explicit dg-do run will break it on some systems. * * * Actually, I was wondering whether -fcilkplus should always automatically link libcilkrts – akin to -fopenmp which links libgomp. Currently, one has to specify it manually.* Or are the features which do not need libcilkplus common enough that one doesn't always want to link it? [For OpenMP, GCC will have -fopenmp-simd [patch posted, 1], which doesn't link libgomp. I could imagine that one would like to have Cilk Plus' #pragma simd and the array syntax without enabling threads (cilk_sync, cilk_spawn, cilk_for, reducers – and thus libcilkrts linkage).] Tobias * libgomp is handled via spec files – including one which adds librt when libgomp is linked. [1] -fopenmp-simd: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02275.html
RE: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp
* * * Actually, I was wondering whether -fcilkplus should always automatically link libcilkrts - akin to -fopenmp which links libgomp. Currently, one has to specify it manually.* Yes, I would like that to happen. Do you have any pointers about how to do that?
Re: [C++ Patch] PR 58581
OK. Jason
Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp
Iyer, Balaji V wrote: * * * Actually, I was wondering whether -fcilkplus should always automatically link libcilkrts - akin to -fopenmp which links libgomp. Currently, one has to specify it manually.* Yes, I would like that to happen. Do you have any pointers about how to do that? Well, if you need some special libraries like librt, see: libgomp/libgomp.spec.in libgomp/configure.ac (search for link_gomp). Otherwise, simply have a look at gcc/gcc.c – search for fopenmp (which adds -pthread). Tobias
Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp
On 10/30/13 13:27, Iyer, Balaji V wrote: * * * Actually, I was wondering whether -fcilkplus should always automatically link libcilkrts - akin to -fopenmp which links libgomp. Currently, one has to specify it manually.* Yes, I would like that to happen. Do you have any pointers about how to do that? LINK_SPEC and its friends. jeff
Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp
On 10/30/13 04:34, Tobias Burnus wrote: Without that patch, which I have copied from asan-dg.exp, I get tons of failures because ld cannot find libcilkrts. OK for committal? Tobias cilk.diff 2013-10-30 Tobias Burnusbur...@net-b.de * gcc.dg/cilk-plus/cilk-plus.exp: Add the libcilkrts library path to the compile flags. Yea, seems good to me. g++.dg/cilk-plus/cilk-plus.exp may have the same problem (haven't looked). jeff