RE: [PATCH GCC]Simplify address expression in IVOPT
-----Original Message-----
From: Richard Biener [mailto:richard.guent...@gmail.com]
Sent: Wednesday, October 30, 2013 10:46 PM
To: Bin Cheng
Cc: GCC Patches
Subject: Re: [PATCH GCC]Simplify address expression in IVOPT

On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng <bin.ch...@arm.com> wrote:

> Hmm. I think you want what get_inner_reference_aff does, using the return
> value of get_inner_reference as the starting point for
> determine_base_object. And you don't want to restrict yourself so much on
> which exprs to process, but only exclude DECL_Ps.

Did you mean I should pass all address expressions to get_inner_reference_aff except the ones with a declaration operand? I am a little confused here about why DECL_Ps should be handled specially; it seems they can be handled properly by the get_* function. Is there anything important I missed?

> Just amend get_inner_reference_aff to return the tree base object.

I found it inappropriate to do this because functions like get_inner_reference* only accept reference expressions, while base_object has to be computed for other kinds of expressions too. Take the gimple statement "a_1 = *ptr_1" as an example: the base passed to alloc_iv is ptr_1; the base_object computed by determine_base_object is also ptr_1, which we can't get from get_inner_reference*.

Also, it seems the comment before determine_base_object might be inaccurate: "Returns a memory object to that EXPR points", which I think should be "Returns a pointer pointing to the main memory object to which EXPR points". Right?

> Note that this isn't really simplifying but rather lowering all addresses.
> The other changes in this patch are unrelated, right?

Right, I should have named this message something like "refine cost computation of expressions in IVOPT", just as the patch file. Thanks.

bin
Re: [wide-int] Update main comment
On Wed, 30 Oct 2013, Richard Sandiford wrote:

Kenneth Zadeck <zad...@naturalbridge.com> writes:

On 10/30/2013 07:01 AM, Richard Sandiford wrote:

Kenneth Zadeck <zad...@naturalbridge.com> writes:

On 10/29/2013 06:37 PM, Richard Sandiford wrote:

This patch tries to update the main wide_int comment to reflect the current implementation.

- bitsizetype is TImode on x86_64 and others, so I don't think it's necessarily true that all offset_ints are signed. (widest_ints are though.)

i am wondering if this is too conservative an interpretation. I believe that they are TImode because that is the next thing after DImode and so they wanted to accommodate the 3 extra bits. Certainly there is no x86 that is able to address more than 64 bits.

Right, but my point is that it's a different case from widest_int. It'd be just as valid to do bitsizetype arithmetic using wide_int rather than offset_int, and those wide_ints would have precision 128, just like the offset_ints. And I wouldn't really say that those wide_ints were fundamentally signed in any way. Although the tree layer might know that the X upper bits of the bitsizetype are always signs, the tree-wide_int interface treats them in the same way as any other 128-bit type. Maybe I'm just being pedantic, but I think offset_int would only be like widest_int if bitsizetype had precision 67 or whatever. Then we could say that both offset_int and widest_int must be wider than any inputs, meaning that there's at least one leading sign bit.

this was of course what mike and i wanted, but we could not really figure out how to pull it off. in particular, we could not find any existing reliable marker in the targets to say what the width of the widest pointer on any implementation is. We actually used the number 68 rather than 67 because we assumed 64 for the widest pointer on any existing platform, 3 bits for the bits and 1 bit for the sign.

Ah yeah, 68 would be better for signed types.
Is the patch OK while we still have 128-bit bitsizetypes though? I agree the current comment would be right if we ever did switch to sub-128-bit bitsizes.

The issue with a sub-128-bit bitsizetype is code generation quality. We do generate code for bitsizetype operations (at least from Ada), so a power-of-two precision is required to avoid a lot of masking operations.

Richard.
[wide-int] Fix UNSIGNED_FIX typo
The "x" had got switched to "t" during the conversion. This showed up in one of the alpha g++.dg tests. Applied as obvious.

Richard

Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2013-11-03 13:01:39.083385620 +0000
+++ gcc/simplify-rtx.c	2013-11-03 20:22:41.115469359 +0000
@@ -1819,7 +1819,8 @@ simplify_const_unary_operation (enum rtx
 	  if (REAL_VALUES_LESS (t, x))
 	    return immed_wide_int_const (wmax, mode);
-	  return immed_wide_int_const (real_to_integer (&t, &fail, width), mode);
+	  return immed_wide_int_const (real_to_integer (&x, &fail, width),
+				       mode);
 	  break;

 	default:
[wide-int] Make integer_onep return false for signed 1-bit bitfields
As discussed on gcc@, integer_onep is supposed to return false for a nonzero 1-bit value if the type is signed. Tested on x86_64-linux-gnu and powerpc64-linux-gnu. OK to install?

Thanks,
Richard

Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2013-10-29 19:19:27.623468618 +0000
+++ gcc/tree.c	2013-11-02 17:25:06.499657501 +0000
@@ -2092,7 +2092,7 @@ integer_onep (const_tree expr)
   switch (TREE_CODE (expr))
     {
     case INTEGER_CST:
-      return wi::eq_p (expr, 1);
+      return wi::eq_p (wi::to_widest (expr), 1);
     case COMPLEX_CST:
       return (integer_onep (TREE_REALPART (expr))
	      && integer_zerop (TREE_IMAGPART (expr)));
Re: [wide-int] Avoid some changes in output code
On Mon, 4 Nov 2013, Richard Biener wrote:

On Fri, 1 Nov 2013, Richard Sandiford wrote:

I'm building one target for each supported CPU and comparing the wide-int assembly output of gcc.c-torture, gcc.dg and g++.dg with the corresponding output from the merge point. This patch removes all the differences I saw for alpha-linux-gnu in gcc.c-torture.

Hunk 1: Preserve the current trunk behaviour in which the shift count is truncated if SHIFT_COUNT_TRUNCATED and not otherwise. This was by inspection after hunk 5.

Hunks 2 and 3: Two cases where force_fit_to_type could extend to wider types and where we weren't extending according to the sign of the source. We should probably assert that the input is at least as wide as the type...

Hunk 4: The "&&" in:

  if ((TREE_INT_CST_HIGH (arg1) & mask_hi) == mask_hi
      && (TREE_INT_CST_LOW (arg1) & mask_lo) == mask_lo)

had got dropped during the conversion.

Hunk 5: The old code was:

  if (shift > 0)
    {
      *mask = r1mask.llshift (shift, TYPE_PRECISION (type));
      *val = r1val.llshift (shift, TYPE_PRECISION (type));
    }
  else if (shift < 0)
    {
      shift = -shift;
      *mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns);
      *val = r1val.rshift (shift, TYPE_PRECISION (type), !uns);
    }

and these precision arguments had two purposes: to control which bits of the first argument were shifted, and to control the truncation mask for SHIFT_COUNT_TRUNCATED. We need to pass a width to the shift functions for the second.

(BTW, I'm running the comparisons with CONST_WIDE_INT locally moved to the end of the !GENERATOR_FILE list in rtl.def, since the current position caused some spurious differences. The problem AFAICT is that hash_rtx hashes on code, RTL PRE creates registers in the hash order of the associated expressions, RA uses register numbers as a tie-breaker during ordering, and so the order of rtx_code can indirectly influence register allocation. First time I'd realised that could happen, so just thought I'd mention it.
I think we should keep rtl.def in the current (logical) order though.)

Tested on x86_64-linux-gnu and powerpc64-linux-gnu. OK for wide-int?

Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED from double-int.c on the trunk? If not then putting this at the callers of wi::rshift and friends is clearly asking for future mistakes.

Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else than in machine instruction patterns was a mistake in the past. Oh, and my understanding is that it's maybe required for correctness on the RTL side if middle-end code removes masking operations. Constant propagation later may not change the result because of that. I'm not sure this is the way it happens, but if we have

  (lshift:si (reg:si 1) (and:qi (reg:qi 2) 0x1f))

and we'd transform it to (based on SHIFT_COUNT_TRUNCATED)

  (lshift:si (reg:si 1) (reg:qi 2))

and later constant propagate 77 to reg:qi 2. That isn't the way SHIFT_COUNT_TRUNCATED should work, of course; instead the backends should have a define_insn which matches the 'and' and combine should try to match that. You can see that various architectures may have truncating shifts, but not for all patterns, which results in stuff like

  config/aarch64/aarch64.h:#define SHIFT_COUNT_TRUNCATED !TARGET_SIMD

which clearly means that it's not implemented in terms of providing shift insns which can match a shift operand that is masked. That is, either try to clean that up (hooray!), or preserve the double-int.[ch] behavior (how does CONST_INT RTL constant folding honor it?).

Richard.

Thanks,
Richard.

[btw, please use diff -p to get us at least some context if you are not writing changelogs ...]

Thanks,
Richard

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 204247)
+++ gcc/fold-const.c	(working copy)
@@ -1008,9 +1008,12 @@
 	   The following code ignores overflow; perhaps a C standard
 	   interpretation ruling is needed.  */
	res = wi::rshift (arg1, arg2, sign,
-			  GET_MODE_BITSIZE (TYPE_MODE (type)));
+			  SHIFT_COUNT_TRUNCATED
+			  ? GET_MODE_BITSIZE (TYPE_MODE (type)) : 0);
       else
-	res = wi::lshift (arg1, arg2, GET_MODE_BITSIZE (TYPE_MODE (type)));
+	res = wi::lshift (arg1, arg2,
+			  SHIFT_COUNT_TRUNCATED
+			  ? GET_MODE_BITSIZE (TYPE_MODE (type)) : 0);
       break;

     case RROTATE_EXPR:
@@ -6870,7 +6873,8 @@
       return NULL_TREE;

   if (TREE_CODE (arg1) == INTEGER_CST)
-    arg1 = force_fit_type (inner_type, arg1, 0, TREE_OVERFLOW (arg1));
+    arg1 = force_fit_type (inner_type, wi::to_widest (arg1), 0,
+
Re: [wide-int] Avoid some changes in output code
Richard Biener <rguent...@suse.de> writes:
> On Fri, 1 Nov 2013, Richard Sandiford wrote:
>> [...]
>
> Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED from
> double-int.c on the trunk? If not then putting this at the callers of
> wi::rshift and friends is clearly asking for future mistakes. Oh, and I
> think that honoring SHIFT_COUNT_TRUNCATED anywhere else than in machine
> instruction patterns was a mistake in the past.

Sure, I can try, but what counts as success? For better or worse:

  unsigned int i = 1 << 32;

has traditionally honoured target/SHIFT_COUNT_TRUNCATED behaviour, so (e.g.) i is 0 for x86_64 and 1 for MIPS. So if the idea is to get rid of SHIFT_COUNT_TRUNCATED at the tree level, it will be a visible change (but presumably only for undefined code).

> [btw, please use diff -p to get us at least some context if you are not
> writing changelogs ...]

Yeah, sorry, this whole no-changelog thing has messed up my workflow :-) The scripts I normally use did that for me.

Thanks,
Richard
Re: [wide-int] Avoid some changes in output code
Richard Biener <rguent...@suse.de> writes:
> On Mon, 4 Nov 2013, Richard Biener wrote:
>> [...]
>
> [...] That isn't the way SHIFT_COUNT_TRUNCATED should work, of course;
> instead the backends should have a define_insn which matches the 'and'
> and combine should try to match that. [...] That is, either try to clean
> that up (hooray!), or preserve the double-int.[ch] behavior (how does
> CONST_INT RTL constant folding honor it?).

The CONST_INT code does the modulus explicitly:

  /* Truncate the shift if SHIFT_COUNT_TRUNCATED, otherwise make sure the
     value is in range.  We can't return any old value for out-of-range
     arguments because either the middle-end (via shift_truncation_mask)
     or the back-end might be relying on target-specific knowledge.
     Nor can we rely on shift_truncation_mask, since the shift might
     not be part of an ashlM3, lshrM3 or ashrM3 instruction.  */
  if (SHIFT_COUNT_TRUNCATED)
    arg1 = (unsigned HOST_WIDE_INT) arg1 % width;
  else if (arg1 < 0 || arg1 >= GET_MODE_BITSIZE (mode))
    return 0;

There was a strong feeling (from me and others) that wide-int.h should not depend on tm.h. I don't think wi::lshift itself should know about SHIFT_COUNT_TRUNCATED. Especially since (and I think we agree on this :-)) it's a terrible interface. It would be better if it took a mode as an argument and (perhaps) returned a mask as a result. But that would make it even harder for wide-int.h to do the right thing, because it doesn't know about modes. Having the callers do
Re: [RFC] libgcov.c re-factoring and offline profile-tool
Thanks! Can you attach the patch file for the proposed change?

David

On Fri, Nov 1, 2013 at 2:08 PM, Rong Xu <x...@google.com> wrote:

libgcov.c is the main file used to generate libgcov.a. This single source generates 21 object files which are then archived into a library. These objects serve very different purposes, but they all sit in the same file, guarded by various macros. The source file has become quite large and its readability very low. In its current state:

1) The code is very hard to understand by new developers;

Agreed, libgcov has become hard to maintain since several parts are written in a very low-level way.

2) It makes regressions more likely when making simple changes; and more importantly, 3) it makes code reuse/sharing very difficult. For instance: using libgcov to implement an offline profile tool (see below), kernel FDO support, etc.

Yep, it was my longer-term plan to do this, too.

We came to this issue when working on an offline tool to handle profile counters. Our plan is to have a profile-tool that can merge, diff, collect statistics from, and better dump raw profile counters. This is one of the most requested features from our internal users. This tool can also be very useful for any FDO/coverage users. In the first part of the tool, we want to have the weighted merge. Again, to reuse the code, we have to change libgcov.c. But we don't want to further diverge libgcov.a in our google branches from the trunk.

Another issue in libgcov.c is the function gcov_exit(). It is a huge function with about 467 lines of code. This function traverses gcov_list and dumps all the gcda files. The code for merging the gcda files also sits in the same function. This structure makes code reuse and extension really difficult (as in kernel FDO, where we had to break up this function to reuse some of the code).

We propose to break gcov_exit into smaller functions and split libgcov.c into several files. Our plan for profile-tool is listed in (3).

1.
break gcov_exit() into the following structure:

  gcov_exit()
    -- gcov_exit_compute_summary()
    -- allocate_filename_struct()
    for gi_ptr in gcov_list
      -- gcov_exit_dump_gcov()
         -- gcov_exit_open_gcda_file()
         -- gcov_exit_merge_gcda()
         -- gcov_exit_write_gcda()

Just a short comment that summaries are also produced during the merging - i.e. they are done on a per-file basis. But otherwise I agree. We should be careful not to increase the footprint of libgcov much for embedded users, but I believe changes like this can be done w/o additional overhead in the optimized library.

2. split libgcov.c into the following files:

  libgcov-profiler.c
  libgcov-merge.c
  libgcov-interface.c
  libgcov-driver.c
  libgcov-driver-system.c (source included into libgcov-driver.c)

Seems reasonable. Splitting the i/o stuff into a separate module (driver) is something I have planned for a long time. What is the difference between libgcov-interface and libgcov-driver?

The details of the splitting:

libgcov-interface.c
  /* This file contains API functions to other programs and the supporting
     functions. */
  __gcov_dump() __gcov_execl() __gcov_execle() __gcov_execlp()
  __gcov_execv() __gcov_execve() __gcov_execvp() __gcov_flush()
  __gcov_fork() __gcov_reset() init_mx() init_mx_once()

libgcov-profiler.c
  /* This file contains runtime profile handlers. */
  variables: __gcov_indirect_call_callee __gcov_indirect_call_counters
  functions: __gcov_average_profiler() __gcov_indirect_call_profiler()
  __gcov_indirect_call_profiler_v2() __gcov_interval_profiler()
  __gcov_ior_profiler() __gcov_one_value_profiler()
  __gcov_one_value_profiler_body() __gcov_pow2_profiler()

libgcov-merge.c
  /* This file contains the merge functions for various counters. */
  functions: __gcov_merge_add() __gcov_merge_delta() __gcov_merge_ior()
  __gcov_merge_single()

libgcov-driver.c
  /* This file contains the gcov dumping functions. We separate the system
     dependent part to libgcov-driver-system.c.
     */
  variables: gcov_list gcov_max_filename gcov_dump_complete
  -- newly added file static variables --
  this_prg all_prg crc32 gi_filename fn_buffer fn_tail sum_buffer
  next_sum_buffer sum_tail
  -- end --
  functions: free_fn_data() buffer_fn_data() crc32_unsigned()
  gcov_version() gcov_histogram_insert() gcov_compute_histogram()
  gcov_clear() __gcov_init() gcov_exit()
  -- newly added static functions --
  gcov_exit_compute_summary() gcov_exit_merge_gcda()
  gcov_exit_write_gcda() gcov_exit_dump_gcov()
  -- end --
Re: [PATCH 1/n] Add conditional compare support
+  "TARGET_ARM || TARGET_THUMB2 && TARGET_32BIT"
+  static const char *const ite = "it\t%d4";
+  static const int cmp_idx[9] = {0, 0, 1, 0, 1};

s/9/5/

On Wed, Oct 30, 2013 at 5:32 PM, Zhenqiang Chen <zhenqiang.c...@arm.com> wrote:

-----Original Message-----
From: Richard Henderson [mailto:r...@redhat.com]
Sent: Monday, October 28, 2013 11:07 PM
To: Zhenqiang Chen; Richard Earnshaw; 'Richard Biener'
Cc: GCC Patches
Subject: Re: [PATCH 1/n] Add conditional compare support

On 10/28/2013 01:32 AM, Zhenqiang Chen wrote:
> Patch is updated according to your comments. Main changes are:
> * Add two hooks: legitimize_cmp_combination and
>   legitimize_ccmp_combination
> * Improve document.

No, these are not the hooks I proposed. You should *not* have a ccmp_optab, because the middle-end has absolutely no idea what mode TARGET should be in. The hook needs to return an rtx expression appropriate for feeding to cbranch et al. E.g. for arm,

  (ne (reg:CC_Z CC_REGNUM) (const_int 0))

after having emitted the instruction that sets CC_REGNUM. We need to push this to the backend because for ia64 the expression needs to be of the form

  (ne (reg:BI new_pseudo) (const_int 0))

Thanks for the clarification. Patch is updated:
* ccmp_optab is removed.
* Add two hooks gen_ccmp_with_cmp_cmp and gen_ccmp_with_ccmp_cmp for backends to generate the conditional compare instructions.

Thanks!
-Zhenqiang
Re: [wide-int] Avoid some changes in output code
On Mon, 4 Nov 2013, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
>> [...]
>> Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED from
>> double-int.c on the trunk? [...] Oh, and I think that honoring
>> SHIFT_COUNT_TRUNCATED anywhere else than in machine instruction patterns
>> was a mistake in the past.
>
> Sure, I can try, but what counts as success? For better or worse:
>
>   unsigned int i = 1 << 32;
>
> has traditionally honoured target/SHIFT_COUNT_TRUNCATED behaviour, so
> (e.g.) i is 0 for x86_64 and 1 for MIPS. So if the idea is to get rid of
> SHIFT_COUNT_TRUNCATED at the tree level, it will be a visible change (but
> presumably only for undefined code).

Sure - but the above is undefined (as you say). Generally constant folding should follow language standards, not machine-dependent behavior.

> [btw, please use diff -p to get us at least some context if you are not
> writing changelogs ...]
>
> Yeah, sorry, this whole no-changelog thing has messed up my workflow :-)
> The scripts I normally use did that for me.

;)

Richard.
RE: [PATCH 1/n] Add conditional compare support
-----Original Message-----
From: Richard Henderson [mailto:r...@redhat.com]
Sent: Friday, November 01, 2013 12:27 AM
To: Zhenqiang Chen
Cc: Richard Earnshaw; 'Richard Biener'; GCC Patches
Subject: Re: [PATCH 1/n] Add conditional compare support

On 10/31/2013 02:06 AM, Zhenqiang Chen wrote:
> With two compares, we might swap the order of the two compares to get a
> valid combination. This is what the current ARM backend does (please
> refer to arm_select_dominance_cc_mode, cmp_and, cmp_ior, and_scc_scc and
> ior_scc_scc). To improve gen_ccmp_first, we need another function/hook to
> determine which compare is done first. I will double check the ARM
> backend to avoid a hook.

Hmm. I hadn't thought of that, and missed that point when looking through your previous patches. I think you should have a third hook for that. ARM will need it, but IA-64 doesn't.

Thanks. I added a new hook. The default function will return -1 if the target does not care about the order.

+DEFHOOK
+(select_ccmp_cmp_order,
+ "For some targets (like ARM), the order of two compares is sensitive for\n\
+conditional compare.  cmp0-cmp1 might be an invalid combination, but when\n\
+swapping the order, cmp1-cmp0 is valid.  The function will return\n\
+  -1: if @code{code1} and @code{code2} are a valid combination.\n\
+   1: if @code{code2} and @code{code1} are a valid combination.\n\
+   0: both are invalid.",
+ int, (int code1, int code2),
+ default_select_ccmp_cmp_order)

> Hook gen_ccmp_next needs one more parameter to indicate AND/IOR since
> they will generate different instructions.

True, I forgot about that while writing the hook descriptions.

For gen_ccmp_next, I added another parameter CC_P to indicate whether the result is used as CC or not. If CC_P is false, gen_ccmp_next will return a general register. This is for code like

  int test (int a, int b)
  {
    return a > 0 && b > 0;
  }

During expand, there might be no branch at all, so gen_ccmp_next cannot return a CC for "a > 0 && b > 0".
+DEFHOOK
+(gen_ccmp_next,
+ "This function emits a conditional comparison within a sequence of\n\
+conditional comparisons.  The @code{prev} expression is the result of a\n\
+prior call to @code{gen_ccmp_first} or @code{gen_ccmp_next}.  It may return\n\
+@code{NULL} if the combination of @code{prev} and this comparison is\n\
+not supported, otherwise the result must be appropriate for passing to\n\
+@code{gen_ccmp_next} or @code{cbranch_optab} if @code{cc_p} is true.\n\
+If @code{cc_p} is false, it returns a general register.  @code{bit_code}\n\
+is AND or IOR, which is the op on the two compares.",
+ rtx, (rtx prev, int cmp_code, rtx op0, rtx op1, int bit_code, bool cc_p),
+ NULL)

An updated patch is attached.

Thanks!
-Zhenqiang

[attachment: ccmp-hook3.patch]
Re: [PATCH] Introducing SAD (Sum of Absolute Differences) operation to GCC vectorizer.
On Fri, Nov 01, 2013 at 04:48:53PM +0000, Cong Hou wrote:

> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 2a5a2e1..8f5d39a 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4705,6 +4705,16 @@ wider mode, is computed and added to operand 3.
>  Operand 3 is of a mode equal or wider than the mode of the product. The
>  result is placed in operand 0, which is of the same mode as operand 3.
> +@cindex @code{ssad@var{m}} instruction pattern
> +@item @samp{ssad@var{m}}
> +@cindex @code{usad@var{m}} instruction pattern
> +@item @samp{usad@var{m}}
> +Compute the sum of absolute differences of two signed/unsigned elements.
> +Operand 1 and operand 2 are of the same mode. Their absolute difference,
> +which is of a wider mode, is computed and added to operand 3. Operand 3
> +is of a mode equal or wider than the mode of the absolute difference.
> +The result is placed in operand 0, which is of the same mode as
> +operand 3.
> +
>  @cindex @code{ssum_widen@var{m3}} instruction pattern
>  @item @samp{ssum_widen@var{m3}}
>  @cindex @code{usum_widen@var{m3}} instruction pattern
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 4975a64..1db8a49 100644

I'm not sure I follow, and if I do - I don't think it matches what you have implemented for i386. From your text description I would guess the series of operations to be:

  v1 = widen (operands[1])
  v2 = widen (operands[2])
  v3 = abs (v1 - v2)
  operands[0] = v3 + operands[3]

But if I understand the behaviour of PSADBW correctly, what you have actually implemented is:

  v1 = widen (operands[1])
  v2 = widen (operands[2])
  v3 = abs (v1 - v2)
  v4 = reduce_plus (v3)
  operands[0] = v4 + operands[3]

To my mind, synthesizing the reduce_plus step will be wasteful for targets which do not get this for free with their absolute-difference step. Imagine a simple loop where we have synthesized the reduce_plus: we compute partial sums on each loop iteration, though we would be better off leaving the reduce_plus step until after the loop. REDUC_PLUS_EXPR would be the appropriate tree code for this.
I would prefer to see this Tree code not imply the reduce_plus. Thanks, James
Re: [wide-int] Avoid some changes in output code
On Mon, 4 Nov 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: On Mon, 4 Nov 2013, Richard Biener wrote: On Fri, 1 Nov 2013, Richard Sandiford wrote: I'm building one target for each supported CPU and comparing the wide-int assembly output of gcc.c-torture, gcc.dg and g++.dg with the corresponding output from the merge point. This patch removes all the differences I saw for alpha-linux-gnu in gcc.c-torture. Hunk 1: Preserve the current trunk behaviour in which the shift count is truncated if SHIFT_COUNT_TRUNCATED and not otherwise. This was by inspection after hunk 5. Hunks 2 and 3: Two cases where force_fit_to_type could extend to wider types and where we weren't extending according to the sign of the source. We should probably assert that the input is at least as wide as the type... Hunk 4: The in: if ((TREE_INT_CST_HIGH (arg1) mask_hi) == mask_hi (TREE_INT_CST_LOW (arg1) mask_lo) == mask_lo) had got dropped during the conversion. Hunk 5: The old code was: if (shift 0) { *mask = r1mask.llshift (shift, TYPE_PRECISION (type)); *val = r1val.llshift (shift, TYPE_PRECISION (type)); } else if (shift 0) { shift = -shift; *mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns); *val = r1val.rshift (shift, TYPE_PRECISION (type), !uns); } and these precision arguments had two purposes: to control which bits of the first argument were shifted, and to control the truncation mask for SHIFT_TRUNCATED. We need to pass a width to the shift functions for the second. (BTW, I'm running the comparisons with CONST_WIDE_INT locally moved to the end of the !GENERATOR_FILE list in rtl.def, since the current position caused some spurious differences. The problem AFAICT is that hash_rtx hashes on code, RTL PRE creates registers in the hash order of the associated expressions, RA uses register numbers as a tie-breaker during ordering, and so the order of rtx_code can indirectly influence register allocation. 
First time I'd realised that could happen, so just thought I'd mention it. I think we should keep rtl.def in the current (logical) order though.) Tested on x86_64-linux-gnu and powerpc64-linux-gnu. OK for wide-int? Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED from double-int.c on the trunk? If not then putting this at the callers of wi::rshift and friends is clearly asking for future mistakes. Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else than in machine instruction patterns was a mistake in the past. Oh, and my understanding is that it's maybe required for correctness on the RTL side if middle-end code removes masking operations. Constant propagation results later may not change the result because of that. I'm not sure this is the way it happens, but if we have (lshift:si (reg:si 1) (and:qi (reg:qi 2) 0x1f)) and we'd transform it to (based on SHIFT_COUNT_TRUNCATED) (lshift:si (reg:si 1) (reg:qi 2)) and later constant propagate 77 to reg:qi 2. That isn't the way SHIFT_COUNT_TRUNCATED should work, of course, but instead the backends should have a define_insn which matches the 'and' and combine should try to match that. You can see that various architectures may have truncating shifts but not for all patterns which results in stuff like config/aarch64/aarch64.h:#define SHIFT_COUNT_TRUNCATED !TARGET_SIMD which clearly means that it's not implemented in terms of providing shift insns which can match a shift operand that is masked. That is, either try to clean that up (hooray!), or preserve the double-int.[ch] behavior (how does CONST_INT RTL constant folding honor it?). The CONST_INT code does the modulus explicitly: /* Truncate the shift if SHIFT_COUNT_TRUNCATED, otherwise make sure the value is in range. We can't return any old value for out-of-range arguments because either the middle-end (via shift_truncation_mask) or the back-end might be relying on target-specific knowledge. 
Nor can we rely on shift_truncation_mask, since the shift might not be part of an ashlM3, lshrM3 or ashrM3 instruction. */ if (SHIFT_COUNT_TRUNCATED) arg1 = (unsigned HOST_WIDE_INT) arg1 % width; else if (arg1 < 0 || arg1 >= GET_MODE_BITSIZE (mode)) return 0; There was a strong feeling (from me and others) that wide-int.h should not depend on tm.h. I don't think wi::lshift itself should know about SHIFT_COUNT_TRUNCATED. Especially since (and I think we agree on this :-)) it's a terrible interface. It would be better if it took a mode as an argument and (perhaps) returned a mask as a result.
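The two folding policies in the CONST_INT code quoted above can be modelled outside GCC. This is a hedged sketch, not GCC source: `shift_truncated` and `shift_strict` are invented names standing in for the SHIFT_COUNT_TRUNCATED and non-truncating paths of the constant folder.

```c
#include <assert.h>

/* Model of the truncating policy: the out-of-range count is reduced
   modulo the operand width, like "arg1 % width" above.  */
static unsigned int
shift_truncated (unsigned int val, unsigned int count, unsigned int width)
{
  return val << (count % width);
}

/* Model of the strict policy: an out-of-range count means we must
   refuse to fold (return 0 here), because the target may define any
   result for it.  */
static int
shift_strict (unsigned int val, unsigned int count, unsigned int width,
              unsigned int *out)
{
  if (count >= width)
    return 0;   /* don't fold: result is target-specific */
  *out = val << count;
  return 1;
}
```

The point of the thread is that this policy choice belongs at this level (the RTL folder, near the target description), not inside `wi::lshift` itself.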
Re: [PATCH] Fix up reassoc (PR tree-optimization/58946)
On Fri, 1 Nov 2013, Jakub Jelinek wrote: Hi! My recent reassoc patch caused following regression (though, it only started failing on this testcase with Andrew's ifcombine changes). The issue is that update_ops relies on walking the same stmts as get_ops did, and uses has_single_use (either directly or indirectly through is_reassociable_op). optimize_range_tests itself doesn't change the IL except for inserting new stmts using values on which get_ops already didn't recurse (either because they were multiple uses or non-reassociable). The problem in the testcase is when optimizing a GIMPLE_COND directly: there is no guarantee of single use, we treat the condition as the starting point of init_range_info and thus SSA_NAME != 0 or SSA_NAME == 0 etc., and that is just fine if SSA_NAME has multiple uses. So if we first change the condition to something else (as instructed by the changed ops[i]->op value from NULL to some SSA_NAME), we might turn something update_ops looks at from multiple uses into single use. This patch fixes it by doing all the update_ops calls before changing GIMPLE_CONDs themselves. I believe it is safe: update_ops will walk only single use SSA_NAMEs and thus they occur only in the single particular update_ops call, and never removes anything, only adds new stmts (which can make single use SSA_NAMEs into multiple use, but that happens after we've walked that originally single use exactly once), and GIMPLE_COND adjustments never use has_single_use, thus they can be safely done after all update_ops have been called. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. 2013-11-01 Jakub Jelinek ja...@redhat.com PR tree-optimization/58946 * tree-ssa-reassoc.c (maybe_optimize_range_tests): Update all bbs with bbinfo[idx].op != NULL before all blocks with bbinfo[idx].op == NULL. * gcc.c-torture/compile/pr58946.c: New test. 
--- gcc/tree-ssa-reassoc.c.jj 2013-10-24 10:19:21.0 +0200 +++ gcc/tree-ssa-reassoc.c 2013-11-01 09:23:09.264615181 +0100 @@ -2657,6 +2657,7 @@ maybe_optimize_range_tests (gimple stmt) edge e; vec<operand_entry_t> ops = vNULL; vec<inter_bb_range_test_entry> bbinfo = vNULL; + bool any_changes = false; /* Consider only basic blocks that end with GIMPLE_COND or a cast statement satisfying final_range_test_p. All @@ -2870,41 +2871,31 @@ maybe_optimize_range_tests (gimple stmt) break; } if (ops.length () > 1) + any_changes = optimize_range_tests (ERROR_MARK, &ops); + if (any_changes) { unsigned int idx; - bool any_changes = optimize_range_tests (ERROR_MARK, &ops); - for (bb = last_bb, idx = 0; any_changes; bb = single_pred (bb), idx++) + /* update_ops relies on has_single_use predicates returning the + same values as it did during get_ops earlier. Additionally it + never removes statements, only adds new ones and it should walk + from the single imm use and check the predicate already before + making those changes. + On the other side, the handling of GIMPLE_COND directly can turn + previously multiply used SSA_NAMEs into single use SSA_NAMEs, so + it needs to be done in a separate loop afterwards. 
*/ + for (bb = last_bb, idx = 0; ; bb = single_pred (bb), idx++) { - if (bbinfo[idx].first_idx < bbinfo[idx].last_idx) + if (bbinfo[idx].first_idx < bbinfo[idx].last_idx + && bbinfo[idx].op != NULL_TREE) { - gimple stmt = last_stmt (bb); tree new_op; - if (bbinfo[idx].op == NULL_TREE) - { - if (ops[bbinfo[idx].first_idx]->op != NULL_TREE) - { - if (integer_zerop (ops[bbinfo[idx].first_idx]->op)) - gimple_cond_make_false (stmt); - else if (integer_onep (ops[bbinfo[idx].first_idx]->op)) - gimple_cond_make_true (stmt); - else - { - gimple_cond_set_code (stmt, NE_EXPR); - gimple_cond_set_lhs (stmt, - ops[bbinfo[idx].first_idx]->op); - gimple_cond_set_rhs (stmt, boolean_false_node); - } - update_stmt (stmt); - } - bbinfo[idx].op = new_op = boolean_false_node; - } - else - new_op = update_ops (bbinfo[idx].op, - (enum tree_code) - ops[bbinfo[idx].first_idx]->rank, - ops, bbinfo[idx].first_idx, - loop_containing_stmt (stmt)); + stmt = last_stmt (bb); + new_op = update_ops (bbinfo[idx].op, +
PATCH: middle-end/58981: movmem/setmem use mode wider than Pmode for size
emit_block_move_via_movmem and set_storage_via_setmem have for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode; mode = GET_MODE_WIDER_MODE (mode)) { enum insn_code code = direct_optab_handler (movmem_optab, mode); if (code != CODE_FOR_nothing /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT here because if SIZE is less than the mode mask, as it is returned by the macro, it will definitely be less than the actual mode mask. */ && ((CONST_INT_P (size) && ((unsigned HOST_WIDE_INT) INTVAL (size) <= (GET_MODE_MASK (mode) >> 1))) || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD)) { The backend may assume the mode of size in the movmem and setmem expanders is no wider than Pmode, since size is within the Pmode address space. The x86 backend's expand_set_or_movmem_prologue_epilogue_by_misaligned has rtx saveddest = *destptr; gcc_assert (desired_align <= size); /* Align destptr up, place it to new register. */ *destptr = expand_simple_binop (GET_MODE (*destptr), PLUS, *destptr, GEN_INT (prolog_size), NULL_RTX, 1, OPTAB_DIRECT); *destptr = expand_simple_binop (GET_MODE (*destptr), AND, *destptr, GEN_INT (-desired_align), *destptr, 1, OPTAB_DIRECT); /* See how many bytes we skipped. */ saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, saveddest, *destptr, saveddest, 1, OPTAB_DIRECT); /* Adjust srcptr and count. */ if (!issetmem) *srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr, saveddest, *srcptr, 1, OPTAB_DIRECT); *count = expand_simple_binop (GET_MODE (*count), PLUS, *count, saveddest, *count, 1, OPTAB_DIRECT); saveddest is a negative number in Pmode and *count is in word_mode. For x32, when Pmode is SImode and word_mode is DImode, saveddest + *count leads to overflow. We could fix it by using the mode of saveddest to compute saveddest + *count. But that leads to extra conversions and other backends may run into the same problem. A better fix is to limit the mode of size in the movmem and setmem expanders to Pmode. 
It generates better and correct memcpy and memset for x32. There is also a typo in comments. It should be BITS_PER_WORD, not BITS_PER_HOST_WIDE_INT. Tested on x32. OK to install? Thanks. H.J. --- gcc/ 2013-11-04 H.J. Lu hongjiu...@intel.com PR middle-end/58981 * expr.c (emit_block_move_via_movmem): Don't use mode wider than Pmode for size. (set_storage_via_setmem): Likewise. gcc/testsuite/ 2013-11-04 H.J. Lu hongjiu...@intel.com PR middle-end/58981 * gcc.dg/pr58981.c: New test. diff --git a/gcc/expr.c b/gcc/expr.c index 551a660..1a869650 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -1294,13 +1294,16 @@ emit_block_move_via_movmem (rtx x, rtx y, rtx size, unsigned int align, enum insn_code code = direct_optab_handler (movmem_optab, mode); if (code != CODE_FOR_nothing - /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT -here because if SIZE is less than the mode mask, as it is + /* We don't need MODE to be narrower than BITS_PER_WORD here +because if SIZE is less than the mode mask, as it is returned by the macro, it will definitely be less than the -actual mode mask. */ +actual mode mask unless MODE is wider than Pmode. Since +SIZE is within the Pmode address space, we should use +Pmode in this case. */ && ((CONST_INT_P (size) && ((unsigned HOST_WIDE_INT) INTVAL (size) <= (GET_MODE_MASK (mode) >> 1))) + || GET_MODE_BITSIZE (mode) >= GET_MODE_BITSIZE (Pmode) || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD)) { struct expand_operand ops[6]; @@ -2879,13 +2882,16 @@ set_storage_via_setmem (rtx object, rtx size, rtx val, unsigned int align, enum insn_code code = direct_optab_handler (setmem_optab, mode); if (code != CODE_FOR_nothing - /* We don't need MODE to be narrower than -BITS_PER_HOST_WIDE_INT here because if SIZE is less than -the mode mask, as it is returned by the macro, it will -definitely be less than the actual mode mask. 
*/ + /* We don't need MODE to be narrower than BITS_PER_WORD here +because if SIZE is less than the mode mask, as it is +returned by the macro, it will definitely be less than the +actual mode mask unless MODE is wider than Pmode. Since +SIZE is within the Pmode
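The x32 failure mode described in the message above can be reproduced in miniature. This is a hedged sketch, not the x86 backend code: `delta_pmode` is an invented stand-in for the negative byte adjustment held in 32-bit Pmode, and `count` stands in for the length held in 64-bit word_mode; widening the Pmode value without sign extension corrupts the count.

```c
#include <assert.h>
#include <stdint.h>

/* On x32, Pmode is 32 bits and word_mode is 64 bits.  saveddest ends
   up holding a small negative number in Pmode; widening it by a plain
   bit copy (zero extension) before the add turns -8 into 0xfffffff8,
   so the count grows instead of shrinking.  */
uint64_t
adjust_count_buggy (uint64_t count, uint32_t delta_pmode)
{
  return count + delta_pmode;                      /* zero-extends: wrong */
}

/* Doing the arithmetic in the narrow mode (or sign-extending first)
   gives the intended adjustment.  */
uint64_t
adjust_count_fixed (uint64_t count, uint32_t delta_pmode)
{
  return count + (int64_t) (int32_t) delta_pmode;  /* sign-extends */
}
```

Limiting the size mode to Pmode sidesteps the problem by keeping both operands in modes the backend's assumptions cover.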
Re: RFC: simd enabled functions (omp declare simd / elementals)
On Fri, Nov 1, 2013 at 4:04 AM, Aldy Hernandez al...@redhat.com wrote: Hello gentlemen. I'm CCing all of you, because each of you can provide valuable feedback to various parts of the compiler which I touch. I have sprinkled love notes with your names throughout the post :). This is a patch against the gomp4 branch. It provides initial support for simd-enabled functions which are #pragma omp declare simd in the OpenMP world and elementals in Cilk Plus nomenclature. The parsing bits for OpenMP are already in trunk, but they are silently ignored. This patch aims to remedy the situation. The Cilk Plus parsing bits, OTOH, are not ready, but could trivially be adapted to use this infrastructure (see below). I would like to at least get this into the gomp4 branch for now, because I am accumulating far too many changes locally. The main idea is that for a simd annotated function, we can create one or more cloned vector variants of a scalar function that can later be used by the vectorizer. For a simple example with multiple returns... #pragma omp declare simd simdlen(4) notinbranch int foo (int a, int b) { if (a == b) return 555; else return 666; } ...we would generate with this patch (unoptimized): Just a quick question, queueing the thread for later review (aww, queue back at 100 threads again - I can't get any work done anymore :(). What does #pragma omp declare simd guarantee about memory side-effects and memory use in the function? That is, unless the function can be safely annotated with the 'const' attribute the whole thing is useless for the vectorizer. Thanks, Richard. 
foo.simdclone.0 (vector(4) int simd.4, vector(4) int simd.5) { unsigned int iter.6; int b.3[4]; int a.2[4]; int retval.1[4]; int _3; int _5; int _6; vector(4) int _7; bb 2: a.2 = VIEW_CONVERT_EXPRint[4](simd.4); b.3 = VIEW_CONVERT_EXPRint[4](simd.5); iter.6_12 = 0; bb 3: # iter.6_9 = PHI iter.6_12(2), iter.6_14(6) _5 = a.2[iter.6_9]; _6 = b.3[iter.6_9]; if (_5 == _6) goto bb 5; else goto bb 4; bb 4: bb 5: # _3 = PHI 555(3), 666(4) retval.1[iter.6_9] = _3; iter.6_14 = iter.6_9 + 1; if (iter.6_14 4) goto bb 6; else goto bb 7; bb 6: goto bb 3; bb 7: _7 = VIEW_CONVERT_EXPRvector(4) int(retval.1); return _7; } The new loop is properly created and annotated with loop-force_vect=true and loop-safelen set. A possible use may be: int array[1000]; void bar () { int i; for (i=0; i 1000; ++i) array[i] = foo(i, 123); } In which case, we would use the simd clone if available: bar () { vector(4) int vect_cst_.21; vector(4) int vect_i_6.20; vector(4) int * vectp_array.19; vector(4) int * vectp_array.18; vector(4) int vect_cst_.17; vector(4) int vect__4.16; vector(4) int vect_vec_iv_.15; vector(4) int vect_cst_.14; vector(4) int vect_cst_.13; int stmp_var_.12; int i; unsigned int ivtmp_1; int _4; unsigned int ivtmp_7; unsigned int ivtmp_20; unsigned int ivtmp_21; bb 2: vect_cst_.13_8 = { 0, 1, 2, 3 }; vect_cst_.14_2 = { 4, 4, 4, 4 }; vect_cst_.17_13 = { 123, 123, 123, 123 }; vectp_array.19_15 = array; vect_cst_.21_5 = { 1, 1, 1, 1 }; goto bb 4; bb 3: bb 4: # i_9 = PHI i_6(3), 0(2) # ivtmp_1 = PHI ivtmp_7(3), 1000(2) # vect_vec_iv_.15_11 = PHI vect_vec_iv_.15_12(3), vect_cst_.13_8(2) # vectp_array.18_16 = PHI vectp_array.18_17(3), vectp_array.19_15(2) # ivtmp_20 = PHI ivtmp_21(3), 0(2) vect_vec_iv_.15_12 = vect_vec_iv_.15_11 + vect_cst_.14_2; vect__4.16_14 = foo.simdclone.0 (vect_vec_iv_.15_11, vect_cst_.17_13); _4 = 0; MEM[(int *)vectp_array.18_16] = vect__4.16_14; vect_i_6.20_19 = vect_vec_iv_.15_11 + vect_cst_.21_5; i_6 = i_9 + 1; ivtmp_7 = ivtmp_1 - 1; vectp_array.18_17 = 
vectp_array.18_16 + 16; ivtmp_21 = ivtmp_20 + 1; if (ivtmp_21 250) goto bb 3; else goto bb 5; bb 5: return; } That's the idea. Some of the ABI issues still need to be resolved (mangling for avx-512, what to do with non x86 architectures, what (if any) default clones will be created when no vector length is specified, etc etc), but the main functionality can be seen above. Uniform and linear parameters (which are passed as scalars) are still not handled. Also, Jakub mentioned that with the current vectorizer we probably can't make good use of the inbranch/masked clones. I have a laundry list of missing things prepended by // FIXME if anyone is curious. I'd like some feedback from y'all in your respective areas, since this touches a few places besides OpenMP. For instance... [Honza] Where do you suggest I place a list of simd clones for a particular (scalar) function? Right now I have added a simdclone_of field in cgraph_node and am (temporarily) serially scanning all functions in get_simd_clone(). This is obviously
Re: [PATCH] Time profiler - phase 1
diff --git a/gcc/ChangeLog b/gcc/ChangeLog index fca665b..3b62bcc 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,31 @@ +2013-10-29 Martin Liska marxin.li...@gmail.com + Jan Hubicka j...@suse.cz + + * cgraph.c (dump_cgraph_node): Profile dump added. + * cgraph.h (struct cgraph_node): New time profile variable added. + * cgraphclones.c (cgraph_clone_node): Time profile is cloned. + * gcov-io.h (gcov_type): New profiler type introduced. + * ipa-profile.c (lto_output_node): Streaming for time profile added. + (input_node): Time profiler is read from LTO stream. + * predict.c (maybe_hot_count_p): Hot prediction changed. + * profile.c (instrument_values): New case for time profiler added. + (compute_value_histograms): Read of time profile. + * tree-pretty-print.c (dump_function_header): Time profiler is dumped. + * tree-profile.c (init_ic_make_global_vars): Time profiler function added. + (gimple_init_edge_profiler): TP function instrumentation. + (gimple_gen_time_profiler): New. + * value-prof.c (gimple_add_histogram_value): Support for time profiler + added. + (dump_histogram_value): TP type added to dumps. + (visit_hist): More sensitive check that takes TP into account. + (gimple_find_values_to_profile): TP instrumentation. + * value-prof.h (hist_type): New histogram type added. + (struct histogram_value_t): Pointer to struct function added. + * libgcc/Makefile.in: New GCOV merge function for TP added. + * libgcov.c: function_counter variable introduced. + (_gcov_merge_time_profile): New. + (_gcov_time_profiler): New. 
+ 2013-10-29 David Malcolm dmalc...@redhat.com * doc/gty.texi (Inheritance and GTY): Make it clear that diff --git a/gcc/cgraph.c b/gcc/cgraph.c index 52d9ab0..c95a54e 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -1890,6 +1890,7 @@ dump_cgraph_node (FILE *f, struct cgraph_node *node) if (node->profile_id) fprintf (f, "  Profile id: %i\n", node->profile_id); + fprintf (f, "  First run: %i\n", node->tp_first_run); fprintf (f, "  Function flags:"); if (node->count) fprintf (f, " executed " HOST_WIDEST_INT_PRINT_DEC "x", diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 7706419..479d49f 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -247,7 +247,6 @@ struct GTY(()) cgraph_clone_info bitmap combined_args_to_skip; }; - /* The cgraph data structure. Each function decl has assigned cgraph_node listing callees and callers. */ @@ -324,6 +323,8 @@ struct GTY(()) cgraph_node { unsigned tm_clone : 1; /* True if this decl is a dispatcher for function versions. */ unsigned dispatcher_function : 1; + /* Time profiler: first run of function. */ + int tp_first_run; Move this up after profile_id. --- a/gcc/gcov-io.c +++ b/gcc/gcov-io.c @@ -68,7 +68,7 @@ gcov_open (const char *name, int mode) #if IN_LIBGCOV const int mode = 0; #endif -#if GCOV_LOCKED +#if GCOV_LOCKED struct flock s_flock; int fd; Accidental change? @@ -651,6 +658,9 @@ lto_symtab_prevailing_decl (tree decl) if (TREE_CODE (decl) == FUNCTION_DECL && DECL_BUILT_IN (decl)) return decl; + if (!DECL_ASSEMBLER_NAME_SET_P (decl)) + return decl; + /* Ensure DECL_ASSEMBLER_NAME will not set assembler name. */ gcc_assert (DECL_ASSEMBLER_NAME_SET_P (decl)); Remove this change - it is unrelated hack from my old tree. diff --git a/gcc/predict.c b/gcc/predict.c index cc9a053..4b655d3 100644 --- a/gcc/predict.c +++ b/gcc/predict.c @@ -170,7 +170,7 @@ maybe_hot_count_p (struct function *fun, gcov_type count) if (fun && profile_status_for_function (fun) != PROFILE_READ) return true; /* Code executed at most once is not hot. 
*/ - if (profile_info->runs >= count) + if (count <= 1) return false; return (count >= get_hot_bb_threshold ()); } And also this change. @@ -895,9 +907,19 @@ compute_value_histograms (histogram_values values, unsigned cfg_checksum, hist->hvalue.counters = XNEWVEC (gcov_type, hist->n_counters); for (j = 0; j < hist->n_counters; j++) if (aact_count) - hist->hvalue.counters[j] = aact_count[j]; - else - hist->hvalue.counters[j] = 0; + hist->hvalue.counters[j] = aact_count[j]; + else + hist->hvalue.counters[j] = 0; + + if (hist->type == HIST_TYPE_TIME_PROFILE) + { + node = cgraph_get_node (hist->fun->decl); + + node->tp_first_run = hist->hvalue.counters[0]; + + if (dump_file) + fprintf (dump_file, "Read tp_first_run: %d\n", node->tp_first_run); + } Probably add a comment why you need to annotate counter here. diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c index b2c5411..0fe262c 100644 --- a/gcc/tree-pretty-print.c +++
Re: [PATCH, MPX, 2/X] Pointers Checker [8/25] Languages support
On Thu, Oct 31, 2013 at 10:11 AM, Ilya Enkovich enkovich@gmail.com wrote: Hi, This patch adds support Pointer Bounds Checker into c-family and LTO front-ends. The main purpose of changes in front-end is to register all statically initialized objects for checker. We also need to register such objects created by compiler. You define CHKP as supported in LTO. That means it has to be supported for all languages and thus you should drop the langhook. Richard. Thanks, Ilya -- gcc/ 2013-10-29 Ilya Enkovich ilya.enkov...@intel.com * c/c-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * cp/cp-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * objc/objc-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * objcp/objcp-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * lto/lto-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * c/c-parser.c (c_parser_declaration_or_fndef): Register statically initialized decls in Pointer Bounds Checker. * cp/decl.c (cp_finish_decl): Likewise. * gimplify.c (gimplify_init_constructor): Likewise. diff --git a/gcc/c/c-lang.c b/gcc/c/c-lang.c index 614c46d..a32bc6b 100644 --- a/gcc/c/c-lang.c +++ b/gcc/c/c-lang.c @@ -43,6 +43,8 @@ enum c_language_kind c_language = clk_c; #define LANG_HOOKS_INIT c_objc_common_init #undef LANG_HOOKS_INIT_TS #define LANG_HOOKS_INIT_TS c_common_init_ts +#undef LANG_HOOKS_CHKP_SUPPORTED +#define LANG_HOOKS_CHKP_SUPPORTED true /* Each front end provides its own lang hook initializer. */ struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER; diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 9ccae3b..65d83c8 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -1682,6 +1682,12 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok, maybe_warn_string_init (TREE_TYPE (d), init); finish_decl (d, init_loc, init.value, init.original_type, asm_name); + + /* Register all decls with initializers in Pointer +Bounds Checker to generate required static bounds +initializers. 
*/ + if (DECL_INITIAL (d) != error_mark_node) + chkp_register_var_initializer (d); } } else diff --git a/gcc/cp/cp-lang.c b/gcc/cp/cp-lang.c index a7fa8e4..6d138bd 100644 --- a/gcc/cp/cp-lang.c +++ b/gcc/cp/cp-lang.c @@ -81,6 +81,8 @@ static tree get_template_argument_pack_elems_folded (const_tree); #define LANG_HOOKS_EH_PERSONALITY cp_eh_personality #undef LANG_HOOKS_EH_RUNTIME_TYPE #define LANG_HOOKS_EH_RUNTIME_TYPE build_eh_type_type +#undef LANG_HOOKS_CHKP_SUPPORTED +#define LANG_HOOKS_CHKP_SUPPORTED true /* Each front end provides its own lang hook initializer. */ struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER; diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 1e92f2a..db40e75 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -6379,6 +6379,12 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p, the class specifier. */ if (!DECL_EXTERNAL (decl)) var_definition_p = true; + + /* If var has an initializer then we need to register it in +Pointer Bounds Checker to generate static bounds initializer +if required. */ + if (DECL_INITIAL (decl) && DECL_INITIAL (decl) != error_mark_node) + chkp_register_var_initializer (decl); } /* If the variable has an array type, lay out the type, even if there is no initializer. It is valid to index through the diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 4f52c27..503450f 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -4111,6 +4111,11 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, walk_tree (&ctor, force_labels_r, NULL, NULL); ctor = tree_output_constant_def (ctor); + + /* We need to register created constant object to + initialize bounds for pointers in it. 
*/ + chkp_register_var_initializer (ctor); + if (!useless_type_conversion_p (type, TREE_TYPE (ctor))) ctor = build1 (VIEW_CONVERT_EXPR, type, ctor); TREE_OPERAND (*expr_p, 1) = ctor; diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c index 0fa0fc9..b6073d9 100644 --- a/gcc/lto/lto-lang.c +++ b/gcc/lto/lto-lang.c @@ -1278,6 +1278,8 @@ static void lto_init_ts (void) #undef LANG_HOOKS_INIT_TS #define LANG_HOOKS_INIT_TS lto_init_ts +#undef LANG_HOOKS_CHKP_SUPPORTED +#define LANG_HOOKS_CHKP_SUPPORTED true struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER; diff --git a/gcc/objc/objc-lang.c b/gcc/objc/objc-lang.c index bc0008b..5e7e43b 100644 --- a/gcc/objc/objc-lang.c +++ b/gcc/objc/objc-lang.c @@ -49,6 +49,8 @@ enum c_language_kind
Re: PR 58958: wrong aliasing info
On Fri, Nov 1, 2013 at 11:39 PM, Marc Glisse marc.gli...@inria.fr wrote: Hello, the issue was described in the PR and the message linked from there. ao_ref_init_from_ptr_and_size calls get_ref_base_and_extent, which may detect an array_ref of variable index, but ao_ref_init_from_ptr_and_size never learns of it and uses the offset+size as if they were meaningful. Well... it certainly learns of it, but it chooses to ignore... if (TREE_CODE (ptr) == ADDR_EXPR) - ref->base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0), - &ref->offset, &t1, &t2); + { + ref->base = get_addr_base_and_unit_offset_1 (TREE_OPERAND (ptr, 0), + &t, 0, &safe); + ref->offset = BITS_PER_UNIT * t; + } safe == (t1 != -1 && t1 == t2) note that ao_ref_init_from_ptr_and_size gets the size fed in as argument so I fail to see why it matters at all ...? That is, if you feed in a wrong size then it's the caller's error. Richard. Bootstrap+testsuite on x86_64-unknown-linux-gnu. 2013-11-04 Marc Glisse marc.gli...@inria.fr PR tree-optimization/ gcc/ * tree-dfa.h (get_addr_base_and_unit_offset_1): Add error reporting. * tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Use get_addr_base_and_unit_offset_1 instead of get_ref_base_and_extent. gcc/testsuite/ * gcc.dg/tree-ssa/pr58958.c: New file. 
-- Marc Glisse Index: gcc/testsuite/gcc.dg/tree-ssa/pr58958.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr58958.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr58958.c (working copy) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +double a[10]; +int f(int n){ + a[3]=9; + __builtin_memset(&a[n],3,sizeof(double)); + return a[3]==9; +} + +/* { dg-final { scan-tree-dump "== 9" "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Property changes on: gcc/testsuite/gcc.dg/tree-ssa/pr58958.c ___ Added: svn:keywords ## -0,0 +1 ## +Author Date Id Revision URL \ No newline at end of property Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Index: gcc/tree-dfa.h === --- gcc/tree-dfa.h (revision 204302) +++ gcc/tree-dfa.h (working copy) @@ -30,65 +30,70 @@ extern tree ssa_default_def (struct func extern void set_ssa_default_def (struct function *, tree, tree); extern tree get_or_create_ssa_default_def (struct function *, tree); extern tree get_ref_base_and_extent (tree, HOST_WIDE_INT *, HOST_WIDE_INT *, HOST_WIDE_INT *); extern tree get_addr_base_and_unit_offset (tree, HOST_WIDE_INT *); extern bool stmt_references_abnormal_ssa_name (gimple); extern void dump_enumerated_decls (FILE *, int); /* Returns the base object and a constant BITS_PER_UNIT offset in *POFFSET that denotes the starting address of the memory access EXP. - Returns NULL_TREE if the offset is not constant or any component - is not BITS_PER_UNIT-aligned. + If the offset is not constant or any component is not BITS_PER_UNIT-aligned, + sets *SAFE to false or returns NULL_TREE if SAFE is NULL. VALUEIZE if non-NULL is used to valueize SSA names. It should return its argument or a constant if the argument is known to be constant. */ /* ??? This is a static inline here to avoid the overhead of the indirect calls to VALUEIZE. But is this overhead really that significant? 
And should we perhaps just rely on WHOPR to specialize the function? */ static inline tree get_addr_base_and_unit_offset_1 (tree exp, HOST_WIDE_INT *poffset, -tree (*valueize) (tree)) +tree (*valueize) (tree), bool *safe = NULL) { HOST_WIDE_INT byte_offset = 0; + bool issafe = true; /* Compute cumulative byte-offset for nested component-refs and array-refs, and find the ultimate containing object. */ while (1) { switch (TREE_CODE (exp)) { case BIT_FIELD_REF: { HOST_WIDE_INT this_off = TREE_INT_CST_LOW (TREE_OPERAND (exp, 2)); if (this_off % BITS_PER_UNIT) - return NULL_TREE; - byte_offset += this_off / BITS_PER_UNIT; + issafe = false; + else + byte_offset += this_off / BITS_PER_UNIT; } break; case COMPONENT_REF: { tree field = TREE_OPERAND (exp, 1); tree this_offset = component_ref_field_offset (exp); HOST_WIDE_INT hthis_offset; if (!this_offset || TREE_CODE (this_offset) != INTEGER_CST ||
Re: RFC: simd enabled functions (omp declare simd / elementals)
Hi! On Mon, Nov 04, 2013 at 11:37:19AM +0100, Richard Biener wrote: Just a quick question, queueing the thread for later review (aww, queue back at 100 threads again - I can't get any work done anymore :(). What does #pragma omp declare simd guarantee about memory side-effects and memory use in the function? That is, unless the function can be safely annotated with the 'const' attribute the whole thing is useless for the vectorizer. The main restriction is: The execution of the function or subroutine cannot have any side effects that would alter its execution for concurrent iterations of a SIMD chunk. There are other restrictions, omp declare simd functions can't call setjmp/longjmp or throw, etc. The functions certainly can't be in the general case annotated with the const attribute, they can read and write memory, but the user is responsible for making sure that the effects of running the function sequentially as part of a non-vectorized loop are the same as running the simd clone of that as part of a vectorized loop with the given simdlen vectorization factor. So it is certainly meant to be used by the vectorizer; after all, that is its sole purpose. Jakub
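Jakub's point can be illustrated with an invented example (not from the thread): a 'declare simd' function may legitimately read and write memory, so 'const' is too strong; the user's promise is only that concurrent lanes of one SIMD chunk do not interfere. Here each lane updates a distinct array element because `i` is declared linear.

```c
#include <assert.h>

int hits[1024];

/* Each lane touches a distinct element of 'hits', so executing four
   iterations concurrently has the same effect as executing them
   sequentially -- the condition the OpenMP wording imposes.  The
   function still has a memory side effect, so it is not 'const'.
   Without -fopenmp the pragma is simply ignored and this is plain C.  */
#pragma omp declare simd simdlen(4) uniform(base) linear(i) notinbranch
int
tally (int *base, int i)
{
  base[i] += 1;
  return base[i];
}
```

This is why the vectorizer cannot treat such clones as pure value functions; it has to preserve the per-iteration memory semantics the user promised are lane-independent.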
Re: Expect JUMP_TABLE_DATA in NEXT_INSN(label)
On Sat, Nov 2, 2013 at 1:45 AM, Steven Bosscher stevenb@gmail.com wrote: Hello, As $SUBJECT. The assert for this has been in place for several months now without triggering any issues, so I'd like to make this next step before stage1 is over. OK for trunk? Ok. Can you verify this from some RTL insn verifier (if it isn't already)? Thanks, Richard. Ciao! Steven
Re: PR 58958: wrong aliasing info
On Mon, 4 Nov 2013, Richard Biener wrote: On Fri, Nov 1, 2013 at 11:39 PM, Marc Glisse marc.gli...@inria.fr wrote: Hello, the issue was described in the PR and the message linked from there. ao_ref_init_from_ptr_and_size calls get_ref_base_and_extent, which may detect an array_ref of variable index, but ao_ref_init_from_ptr_and_size never learns of it and uses the offset+size as if they were meaningful. Well... it certainly learns of it, but it chooses to ignore... if (TREE_CODE (ptr) == ADDR_EXPR) -ref-base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0), -ref-offset, t1, t2); +{ + ref-base = get_addr_base_and_unit_offset_1 (TREE_OPERAND (ptr, 0), + t, 0, safe); + ref-offset = BITS_PER_UNIT * t; +} safe == (t1 != -1 t1 == t2) I'll try that... (I need to think whether that's sufficient to be safe) note that ao_ref_init_from_ptr_and_size gets the size fed in as argument so I fail to see why it matters at all ...? That is, if you feed in a wrong size then it's the callers error. The caller is feeding the right size. The issue is that get_ref_base_and_extent cannot determine the offset as a constant. Normally, get_ref_base_and_extent then gives you a safe combination of offset+maxsize to cover the whole decl. Here, we don't want to use the size determined by get_ref_base_and_extent, we know better, but that also means we have to handle the case where the offset couldn't be determined as a constant. -- Marc Glisse
Re: PATCH: middle-end/58981: movmem/setmem use mode wider than Pmode for size
H.J. Lu hongjiu...@intel.com writes: emit_block_move_via_movmem and set_storage_via_setmem have for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode; mode = GET_MODE_WIDER_MODE (mode)) { enum insn_code code = direct_optab_handler (movmem_optab, mode); if (code != CODE_FOR_nothing /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT here because if SIZE is less than the mode mask, as it is returned by the macro, it will definitely be less than the actual mode mask. */ ((CONST_INT_P (size) ((unsigned HOST_WIDE_INT) INTVAL (size) = (GET_MODE_MASK (mode) 1))) || GET_MODE_BITSIZE (mode) = BITS_PER_WORD)) { Backend may assume mode of size in movmem and setmem expanders is no widder than Pmode since size is within the Pmode address space. X86 backend expand_set_or_movmem_prologue_epilogue_by_misaligned has rtx saveddest = *destptr; gcc_assert (desired_align = size); /* Align destptr up, place it to new register. */ *destptr = expand_simple_binop (GET_MODE (*destptr), PLUS, *destptr, GEN_INT (prolog_size), NULL_RTX, 1, OPTAB_DIRECT); *destptr = expand_simple_binop (GET_MODE (*destptr), AND, *destptr, GEN_INT (-desired_align), *destptr, 1, OPTAB_DIRECT); /* See how many bytes we skipped. */ saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, saveddest, *destptr, saveddest, 1, OPTAB_DIRECT); /* Adjust srcptr and count. */ if (!issetmem) *srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr, saveddest, *srcptr, 1, OPTAB_DIRECT); *count = expand_simple_binop (GET_MODE (*count), PLUS, *count, saveddest, *count, 1, OPTAB_DIRECT); saveddest is a negative number in Pmode and *count is in word_mode. For x32, when Pmode is SImode and word_mode is DImode, saveddest + *count leads to overflow. We could fix it by using mode of saveddest to compute saveddest + *count. But it leads to extra conversions and other backends may run into the same problem. A better fix is to limit mode of size in movmem and setmem expanders to Pmode. 
It generates better and correct memcpy and memset for x32. There is also a typo in the comments: it should be BITS_PER_WORD, not BITS_PER_HOST_WIDE_INT. I don't think it's a typo. It's explaining why we don't have to worry about GET_MODE_BITSIZE (mode) > BITS_PER_HOST_WIDE_INT in the CONST_INT_P test (because in that case the GET_MODE_MASK macro will be an all-1 HOST_WIDE_INT, even though that's narrower than the real mask). I don't think the current comment covers the BITS_PER_WORD test at all. AIUI it's there because the pattern is defined as taking a length of at most word_mode, so we should stop once we reach it. FWIW, I agree Pmode makes more sense on face value. But shouldn't we replace the BITS_PER_WORD test instead of adding to it? Having both would only make a difference for Pmode > word_mode targets, which might be able to handle full Pmode lengths. Either way, the md.texi documentation should be updated too. Thanks, Richard
Re: PR 58958: wrong aliasing info
On Mon, 4 Nov 2013, Richard Biener wrote: On Mon, Nov 4, 2013 at 11:55 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Nov 1, 2013 at 11:39 PM, Marc Glisse marc.gli...@inria.fr wrote: Hello, the issue was described in the PR and the message linked from there. ao_ref_init_from_ptr_and_size calls get_ref_base_and_extent, which may detect an array_ref of variable index, but ao_ref_init_from_ptr_and_size never learns of it and uses the offset+size as if they were meaningful. Well... it certainly learns of it, but it chooses to ignore...

   if (TREE_CODE (ptr) == ADDR_EXPR)
-    ref->base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0),
-                                         &ref->offset, &t1, &t2);
+    {
+      ref->base = get_addr_base_and_unit_offset_1 (TREE_OPERAND (ptr, 0),
+                                                   &t, 0, &safe);
+      ref->offset = BITS_PER_UNIT * t;
+    }

safe == (t1 != -1 && t1 == t2)

note that ao_ref_init_from_ptr_and_size gets the size fed in as argument so I fail to see why it matters at all ...? That is, if you feed in a wrong size then it's the caller's error. I think one issue is that the above code uses get_ref_base_and_extent on an address. It should have used get_addr_base_and_unit_offset. Isn't that what my patch does? Except that get_addr_base_and_unit_offset often gives up and returns NULL_TREE, whereas I believe we still want a base even if we have trouble determining a constant offset, so I modified get_addr_base_and_unit_offset_1 a bit. (not saying the result is pretty...) -- Marc Glisse
Re: [wide-int] Do not treat rtxes as sign-extended
On Sat, Nov 2, 2013 at 3:25 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Kenneth Zadeck zad...@naturalbridge.com writes: On 11/02/2013 06:30 AM, Richard Sandiford wrote: Bah. After all that effort, it turns out that -- by design -- there is one special case where CONST_INTs are not sign-extended. Nonzero/true BImode integers are stored as STORE_FLAG_VALUE, which can be 1 rather than -1. So (const_int 1) can be a valid BImode integer -- and consequently (const_int -1) can be wrong -- even though BImode only has 1 bit. It might be nice to change that, but for wide-int I think we should just treat rtxes like trees for now. Tested on powerpc64-linux-gnu and x86_64-linux-gnu. It fixes some ICEs seen on bfin-elf. OK to install? do we have to throw away the baby with the bath water here? I guess what you are saying is that it is worse to have is_sign_extended be a variable that is almost always true than to be a hard false. Right. is_sign_extended is only useful if it's a compile-time value. Making it a run-time value would negate the benefit. I think in practice STORE_FLAG_VALUE is a compile-time constant too, so we could set is_sign_extended to STORE_FLAG_VALUE == -1. But AFAICT that would only help SPU and m68k. also we could preserve the test and make it not apply to bimode. You mean the one in the assert? Yeah, OK. How about this version? Richard

Index: gcc/rtl.h
===
--- gcc/rtl.h	2013-11-02 11:06:12.738517644 +0000
+++ gcc/rtl.h	2013-11-02 14:22:05.636007860 +0000
@@ -1408,7 +1408,9 @@ typedef std::pair <rtx, enum machine_mod
 {
   static const enum precision_type precision_type = VAR_PRECISION;
   static const bool host_dependent_precision = false;
-  static const bool is_sign_extended = true;
+  /* This ought to be true, except for the special case that BImode
+     is canonicalized to STORE_FLAG_VALUE, which might be 1.  */
+  static const bool is_sign_extended = false;
   static unsigned int get_precision (const rtx_mode_t &);
   static wi::storage_ref decompose (HOST_WIDE_INT *, unsigned int,
                                     const rtx_mode_t &);
@@ -1432,7 +1434,8 @@ wi::int_traits <rtx_mode_t>::decompose (
     case CONST_INT:
       if (precision < HOST_BITS_PER_WIDE_INT)
         gcc_checking_assert (INTVAL (x.first)
-                             == sext_hwi (INTVAL (x.first), precision));
+                             == sext_hwi (INTVAL (x.first), precision)
+                             || (precision == 1 && INTVAL (x.first) == 1));

please add a comment here (and a check for BImode?). Thanks, Richard. return wi::storage_ref (&INTVAL (x.first), 1, precision);
[C++ PATCH] Fix PR58979
Even RO_ARROW_STAR can end up in invalid_indirection_error in invalid code, so handle that instead of ICEing later on. Regtested/bootstrapped on x86_64-linux, ok for trunk and 4.8?

2013-11-04  Marek Polacek  pola...@redhat.com

	PR c++/58979
c-family/
	* c-common.c (invalid_indirection_error): Handle RO_ARROW_STAR case.
testsuite/
	* g++.dg/diagnostic/pr58979.C: New test.

--- gcc/c-family/c-common.c.mp2	2013-11-04 10:30:15.265951084 +0100
+++ gcc/c-family/c-common.c	2013-11-04 10:47:29.164783425 +0100
@@ -9930,6 +9930,7 @@ invalid_indirection_error (location_t lo
 		type);
       break;
     case RO_ARROW:
+    case RO_ARROW_STAR:
       error_at (loc,
 		"invalid type argument of %<->%> (have %qT)",
 		type);
--- gcc/testsuite/g++.dg/diagnostic/pr58979.C.mp2	2013-11-04 11:09:30.515553664 +0100
+++ gcc/testsuite/g++.dg/diagnostic/pr58979.C	2013-11-04 11:09:26.505538785 +0100
@@ -0,0 +1,4 @@
+// PR c++/58979
+// { dg-do compile }
+
+int i = 0->*0;	// { dg-error "invalid type argument of" }

Marek
Re: [wide-int] Simplify some force_fit_type calls
On Sun, Nov 3, 2013 at 11:31 AM, Richard Sandiford rdsandif...@googlemail.com wrote: When going through force_fit_type calls to see whether they were extending correctly, I noticed some of the calls in VRP could be simplified. There's no change in behaviour, it's just shorter and more efficient. Tested on powerpc64-linux-gnu and x86_64-linux-gnu. OK to install? Ok. Thanks, Richard. Thanks, Richard

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c	2013-11-03 10:04:56.004019113 +0000
+++ gcc/tree-vrp.c	2013-11-03 10:25:45.398984735 +0000
@@ -1617,16 +1617,8 @@ extract_range_from_assert (value_range_t
       /* Make sure to not set TREE_OVERFLOW on the final type
	  conversion.  We are willingly interpreting large positive
	  unsigned values as negative signed values here.  */
-      min = force_fit_type (TREE_TYPE (var),
-			    wide_int::from (min,
-					    TYPE_PRECISION (TREE_TYPE (var)),
-					    TYPE_SIGN (TREE_TYPE (min))),
-			    0, false);
-      max = force_fit_type (TREE_TYPE (var),
-			    wide_int::from (max,
-					    TYPE_PRECISION (TREE_TYPE (var)),
-					    TYPE_SIGN (TREE_TYPE (max))),
-			    0, false);
+      min = force_fit_type (TREE_TYPE (var), wi::to_widest (min), 0, false);
+      max = force_fit_type (TREE_TYPE (var), wi::to_widest (max), 0, false);

       /* We can transform a max, min range to an anti-range or
	  vice-versa.  Use set_and_canonicalize_value_range which does
@@ -3235,20 +3227,12 @@ extract_range_from_unary_expr_1 (value_r
	      if (is_overflow_infinity (vr0.min))
		new_min = negative_overflow_infinity (outer_type);
	      else
-		new_min = force_fit_type (outer_type,
-					  wide_int::from
-					  (vr0.min,
-					   TYPE_PRECISION (outer_type),
-					   TYPE_SIGN (TREE_TYPE (vr0.min))),
+		new_min = force_fit_type (outer_type, wi::to_widest (vr0.min),
					  0, false);
	      if (is_overflow_infinity (vr0.max))
		new_max = positive_overflow_infinity (outer_type);
	      else
-		new_max = force_fit_type (outer_type,
-					  wide_int::from
-					  (vr0.max,
-					   TYPE_PRECISION (outer_type),
-					   TYPE_SIGN (TREE_TYPE (vr0.max))),
+		new_max = force_fit_type (outer_type, wi::to_widest (vr0.max),
					  0, false);
	      set_and_canonicalize_value_range (vr, vr0.type, new_min,
						new_max, NULL);
Re: PATCH: middle-end/58981: movmem/setmem use mode wider than Pmode for size
On Mon, Nov 4, 2013 at 3:11 AM, Richard Sandiford rsand...@linux.vnet.ibm.com wrote: H.J. Lu hongjiu...@intel.com writes: emit_block_move_via_movmem and set_storage_via_setmem have

  for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
       mode = GET_MODE_WIDER_MODE (mode))
    {
      enum insn_code code = direct_optab_handler (movmem_optab, mode);

      if (code != CODE_FOR_nothing
          /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT
             here because if SIZE is less than the mode mask, as it is
             returned by the macro, it will definitely be less than the
             actual mode mask.  */
          && ((CONST_INT_P (size)
               && ((unsigned HOST_WIDE_INT) INTVAL (size)
                   <= (GET_MODE_MASK (mode) >> 1)))
              || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD))
        {

A backend may assume that the mode of the size operand in the movmem and setmem expanders is no wider than Pmode, since the size is within the Pmode address space.  The x86 backend's expand_set_or_movmem_prologue_epilogue_by_misaligned has

  rtx saveddest = *destptr;

  gcc_assert (desired_align <= size);
  /* Align destptr up, place it to new register.  */
  *destptr = expand_simple_binop (GET_MODE (*destptr), PLUS, *destptr,
                                  GEN_INT (prolog_size),
                                  NULL_RTX, 1, OPTAB_DIRECT);
  *destptr = expand_simple_binop (GET_MODE (*destptr), AND, *destptr,
                                  GEN_INT (-desired_align),
                                  *destptr, 1, OPTAB_DIRECT);
  /* See how many bytes we skipped.  */
  saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, saveddest,
                                   *destptr,
                                   saveddest, 1, OPTAB_DIRECT);
  /* Adjust srcptr and count.  */
  if (!issetmem)
    *srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr,
                                   saveddest, *srcptr, 1, OPTAB_DIRECT);
  *count = expand_simple_binop (GET_MODE (*count), PLUS, *count,
                                saveddest, *count, 1, OPTAB_DIRECT);

saveddest is a negative number in Pmode and *count is in word_mode.  For x32, where Pmode is SImode and word_mode is DImode, saveddest + *count leads to overflow.  We could fix it by using the mode of saveddest to compute saveddest + *count, but that leads to extra conversions, and other backends may run into the same problem.  A better fix is to limit the mode of the size operand in the movmem and setmem expanders to Pmode.  It generates better and correct memcpy and memset for x32.  There is also a typo in the comments: it should be BITS_PER_WORD, not BITS_PER_HOST_WIDE_INT.  I don't think it's a typo.  It's explaining why we don't have to worry about GET_MODE_BITSIZE (mode) > BITS_PER_HOST_WIDE_INT in the CONST_INT_P test (because in that case the GET_MODE_MASK macro will be an all-1 HOST_WIDE_INT, even though that's narrower than the real mask).  Thanks for the explanation.  I don't think the current comment covers the BITS_PER_WORD test at all.  AIUI it's there because the pattern is defined as taking a length of at most word_mode, so we should stop once we reach it.  I see.  FWIW, I agree Pmode makes more sense on face value.  But shouldn't we replace the BITS_PER_WORD test instead of adding to it?  Having both would only make a difference for Pmode > word_mode targets, which might be able to handle full Pmode lengths.  Do we ever have a target where Pmode is wider than word_mode?  If not, we can check Pmode instead.  Either way, the md.texi documentation should be updated too.  Thanks, Richard -- H.J.
Re: [PATCH GCC]Compute, cache and use cost of auto-increment rtx patterns in IVOPT
On Mon, Nov 4, 2013 at 4:31 AM, bin.cheng bin.ch...@arm.com wrote: Hi, IVOPT in GCC has a problem: it does not use the cost of the auto-increment address expression in its accounting, but instead falls back to the cost of the plain address expression as if auto-increment addressing modes were unavailable. For example, on the ARM target: 1) the cost of [reg] (which is 6) is used for address expression [reg], #off; 2) the cost of [reg+off] (which is 2) is used for address expression [reg, #off]!. This causes: 1) the cost of a non-auto-increment address expression to be used for an auto-increment address expression; 2) different address costs to be used for pre- and post-increment address expressions. This patch fixes the problem by computing, caching and using the cost of auto-increment address expressions. Bootstrapped and tested on x86/arm. Is it OK? But don't you need to adjust

static bool
determine_use_iv_cost_address (struct ivopts_data *data,
                               struct iv_use *use, struct iv_cand *cand)
{
  bitmap depends_on;
  bool can_autoinc;
  int inv_expr_id = -1;
  comp_cost cost = get_computation_cost (data, use, cand, true, &depends_on,
                                         &can_autoinc, &inv_expr_id);

  if (cand->ainc_use == use)
    {
      if (can_autoinc)
        cost.cost -= cand->cost_step;

this which seems to try to compensate for your issue? Or maybe I don't understand. CCing Bernd who implemented this IIRC. Richard.

2013-11-01  Bin Cheng  bin.ch...@arm.com

	* tree-ssa-loop-ivopts.c (enum ainc_type): New.
	(address_cost_data): New field.
	(get_address_cost): Compute auto-increment rtx cost in ainc_costs.
	Use ainc_costs for auto-increment rtx patterns.  Cleanup TWS.
Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals
Jakub Jelinek ja...@redhat.com writes: I think even as a fallback is the patch too expensive. I'd say best would be to write some getline API compatible function and just use it, using fread on say fixed size buffer. OK, thanks for the insight. I have just used the getline_fallback function you proposed, slightly amended to use the memory allocation routines commonly used in gcc and renamed into get_line, with a hopefully complete comment explaining where this function comes from etc. [...] A slight complication is what to do on mingw/cygwin and other DOS or Mac style line ending environments, no idea what fgets exactly does there. Actually, I think that even fgets just deals with '\n'. The reason, from what I gathered being that on windows, we fopen the input file in text mode; and in that mode, the \r\n is transformed into just \n. Apparently OSX is compatible with '\n' too. Someone corrects me if I am saying non-sense here. So the patch below is what I am bootstrapping at the moment. OK if it passes bootstrap on x86_64-unknown-linux-gnu against trunk? BTW, we probably want to do something with the speed of the caret diagnostics too, right now it opens the file again for each single line to be printed in caret diagnostics and reads all lines until the right one, so imagine how fast is printing of many warnings on almost adjacent lines near the end of many megabytes long file. Perhaps we could remember the last file we've opened for caret diagnostics, don't fclose the file right away but only if a new request is for a different file, perhaps keep some vector of line start offsets (say starting byte of every 100th line or similar) and also remember the last read line offset, so if a new request is for the same file, but higher line than last, we can just keep getlineing, and if it is smaller line than last, we look up the offset of the line / 100, fseek to it and just getline only modulo 100 lines. 
Maybe we should keep not just one, but 2 or 4 opened files as cache (again, with the starting line offsets vectors). I like this idea. I'll try and work on it. And now the patch. Cheers.

gcc/ChangeLog:

	* input.h (location_get_source_line): Take an additional line_size
	parameter by reference.
	* input.c (get_line): New static function definition.
	(read_line): Take an additional line_length output parameter to be
	set to the size of the line.  Use the new get_line function to
	compute the size of the line returned by fgets, rather than using
	strlen.  Ensure that the buffer is initially zeroed; ensure that
	when growing the buffer too.
	(location_get_source_line): Take an additional output line_len
	parameter.  Update the use of read_line to pass it the line_len
	parameter.
	* diagnostic.c (adjust_line): Take an additional input parameter
	for the length of the line, rather than calculating it with strlen.
	(diagnostic_show_locus): Adjust the use of location_get_source_line
	and adjust_line with respect to their new signature.  While
	displaying a line now, do not stop at the first null byte.  Rather,
	display the zero byte as a space and keep going until we reach the
	size of the line.

gcc/testsuite/ChangeLog:

	* c-c++-common/cpp/warning-zero-in-literals-1.c: New test file.
---
 gcc/diagnostic.c                                   |  17 ++--
 gcc/input.c                                        | 104 ++---
 gcc/input.h                                        |   3 +-
 .../c-c++-common/cpp/warning-zero-in-literals-1.c  | Bin 0 -> 240 bytes
 4 files changed, 103 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c

diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 36094a1..0ca7081 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -259,12 +259,13 @@ diagnostic_build_prefix (diagnostic_context *context,
    MAX_WIDTH by some margin, then adjust the start of the line such
    that the COLUMN is smaller than MAX_WIDTH minus the margin.  The
    margin is either 10 characters or the difference between the column
-   and the length of the line, whatever is smaller.  */
+   and the length of the line, whatever is smaller.  The length of
+   LINE is given by LINE_WIDTH.  */
 static const char *
-adjust_line (const char *line, int max_width, int *column_p)
+adjust_line (const char *line, int line_width,
+	     int max_width, int *column_p)
 {
   int right_margin = 10;
-  int line_width = strlen (line);
   int column = *column_p;

   right_margin = MIN (line_width - column, right_margin);
@@ -284,6 +285,7 @@ diagnostic_show_locus (diagnostic_context * context,
		       const diagnostic_info *diagnostic)
 {
   const char *line;
+  int line_width;
   char *buffer;
   expanded_location s;
   int max_width;
@@ -297,22 +299,25 @@
Re: Patch RFA: With -fnon-call-exceptions sync builtins may throw
On Mon, Nov 4, 2013 at 7:01 AM, Ian Lance Taylor i...@google.com wrote: The middle-end currently marks all the sync builtins as nothrow. That is incorrect for most of them when using -fnon-call-exceptions. The -fnon-call-exceptions option means that instructions that trap may throw exceptions. Most of the sync builtins access memory via pointers passed by user code. Therefore, they may trap. (I noticed this because Go uses -fnon-call-exceptions.) This patch fixes the problem by introducing a new internal function attribute nothrow call. The new attribute is exactly like nothrow, except that it is ignored when using -fnon-call-exceptions. It is an internal attribute because there is no reason for a user to use this. The problem only arises when the middle-end sees a function call that the backend turns into an instruction sequence that directly references memory. That can only happen for functions that are built in to the compiler. This requires changes to the C family, LTO, and Ada frontends. I think I'm handling the option correctly for LTO, but it would be good if somebody could check the logic for me. Bootstrapped and ran tests on x86_64-unknown-linux-gnu. OK for mainline? Hmm. I think you can handle this in a simpler way by just doing sth similar to #define ATTR_MATHFN_FPROUNDING_ERRNO (flag_errno_math ? \ ATTR_NOTHROW_LEAF_LIST : ATTR_MATHFN_FPROUNDING) that is, conditionally drop the _NOTHROW part. Btw, I also think that if it can throw then it isn't LEAF (it may return exceptionally into another function in the unit). Also this whole conditional-on-a-flag thing doesn't work well with using -fnon-call-exceptions in the optimize attribute or with different -fnon-call-exceptions settings in different TUs we LTO together. Because we only have a single builtin decl (so for proper operation you'd have to clone builtin decls based on the matrix of flags that generate different attributes and use the correct one depending on context). 
That said, I'd prefer a simpler approach for now, mimicing flag_errno_math handling. Your lto-wrapper parts are clearly wrong (in a conservative way though). They follow -fexceptions so the lto-wrapper.c and lto-opts.c parts are ok. Fixing the LTO pieces would require to really stream most builtins instead of generating them in the LTO frontend and streaming them via DECL_FUNCTION_CODE. tree merging should get rid of the duplicates. I suppose I should try that again ;) Thanks, Richard. Ian gcc/ChangeLog: 2013-11-03 Ian Lance Taylor i...@google.com * builtin-attrs.def (ATTR_NOTHROWCALL): Define. (ATTR_NOTHROWCALL_LIST, ATTR_NOTHROWCALL_LEAF_LIST): Define. * sync-builtins.def: Use ATTR_NOTHROWCALL_LEAF_LIST for all sync builtins that take pointers. * lto-opts.c (lto_write_options): Write -fnon-call-exceptions if set. * lto-wrapper.c (merge_and_complain): Collect OPT_fnon_call_exceptions. (run_gcc): Pass -fnon-call-exceptions. gcc/c-family/ChangeLog: 2013-11-03 Ian Lance Taylor i...@google.com * c-common.c (c_common_attribute_table): Add nothrow call. (handle_nothrowcall_attribute): New static function. gcc/lto/ChangeLog: 2013-11-03 Ian Lance Taylor i...@google.com * lto-lang.c (lto_attribute_table): Add nothrow call (handle_nothrowcall_attribute): New static function. gcc/ada/ChangeLog: 2013-11-03 Ian Lance Taylor i...@google.com * gcc-interface/utils.c (gnat_internal_attribute_table): Add nothrow call. (handle_nothrowcall_attribute): New static function. gcc/testsuite/ChangeLog: 2013-11-03 Ian Lance Taylor i...@google.com * g++.dg/ext/sync-4.C: New test.
Re: PR 58958: wrong aliasing info
On Mon, Nov 4, 2013 at 12:18 PM, Marc Glisse marc.gli...@inria.fr wrote: On Mon, 4 Nov 2013, Richard Biener wrote: On Mon, Nov 4, 2013 at 11:55 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Nov 1, 2013 at 11:39 PM, Marc Glisse marc.gli...@inria.fr wrote: Hello, the issue was described in the PR and the message linked from there. ao_ref_init_from_ptr_and_size calls get_ref_base_and_extent, which may detect an array_ref of variable index, but ao_ref_init_from_ptr_and_size never learns of it and uses the offset+size as if they were meaningful. Well... it certainly learns of it, but it chooses to ignore...

   if (TREE_CODE (ptr) == ADDR_EXPR)
-    ref->base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0),
-                                         &ref->offset, &t1, &t2);
+    {
+      ref->base = get_addr_base_and_unit_offset_1 (TREE_OPERAND (ptr, 0),
+                                                   &t, 0, &safe);
+      ref->offset = BITS_PER_UNIT * t;
+    }

safe == (t1 != -1 && t1 == t2)

note that ao_ref_init_from_ptr_and_size gets the size fed in as argument so I fail to see why it matters at all ...? That is, if you feed in a wrong size then it's the caller's error. I think one issue is that the above code uses get_ref_base_and_extent on an address. It should have used get_addr_base_and_unit_offset. Isn't that what my patch does? Except that get_addr_base_and_unit_offset often gives up and returns NULL_TREE, whereas I believe we still want a base even if we have trouble determining a constant offset, so I modified get_addr_base_and_unit_offset_1 a bit. (not saying the result is pretty...) I don't think we want get_addr_base_and_unit_offset to return non-NULL for a non-constant offset. But yes, inside your patch is the correct fix, so if you drop changing get_addr_base_and_unit_offset ... Richard. -- Marc Glisse
RE: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals
Hi, I see another read_line at gcov.c, which seems to be a copy. Maybe this should be changed too? What do you think? Bernd. On Mon, 4 Nov 2013 12:46:10, Dodji Seketeli wrote: Jakub Jelinek ja...@redhat.com writes: I think even as a fallback is the patch too expensive. I'd say best would be to write some getline API compatible function and just use it, using fread on say fixed size buffer. OK, thanks for the insight. I have just used the getline_fallback function you proposed, slightly amended to use the memory allocation routines commonly used in gcc and renamed into get_line, with a hopefully complete comment explaining where this function comes from etc. [...] A slight complication is what to do on mingw/cygwin and other DOS or Mac style line ending environments, no idea what fgets exactly does there. Actually, I think that even fgets just deals with '\n'. The reason, from what I gathered being that on windows, we fopen the input file in text mode; and in that mode, the \r\n is transformed into just \n. Apparently OSX is compatible with '\n' too. Someone corrects me if I am saying non-sense here. So the patch below is what I am bootstrapping at the moment. OK if it passes bootstrap on x86_64-unknown-linux-gnu against trunk? BTW, we probably want to do something with the speed of the caret diagnostics too, right now it opens the file again for each single line to be printed in caret diagnostics and reads all lines until the right one, so imagine how fast is printing of many warnings on almost adjacent lines near the end of many megabytes long file. 
Perhaps we could remember the last file we've opened for caret diagnostics, don't fclose the file right away but only if a new request is for a different file, perhaps keep some vector of line start offsets (say starting byte of every 100th line or similar) and also remember the last read line offset, so if a new request is for the same file, but higher line than last, we can just keep getlineing, and if it is smaller line than last, we look up the offset of the line / 100, fseek to it and just getline only modulo 100 lines. Maybe we should keep not just one, but 2 or 4 opened files as cache (again, with the starting line offsets vectors). I like this idea. I'll try and work on it. And now the patch. Cheers. gcc/ChangeLog: * input.h (location_get_source_line): Take an additional line_size parameter by reference. * input.c (get_line): New static function definition. (read_line): Take an additional line_length output parameter to be set to the size of the line. Use the new get_line function to compute the size of the line returned by fgets, rather than using strlen. Ensure that the buffer is initially zeroed; ensure that when growing the buffer too. (location_get_source_line): Take an additional output line_len parameter. Update the use of read_line to pass it the line_len parameter. * diagnostic.c (adjust_line): Take an additional input parameter for the length of the line, rather than calculating it with strlen. (diagnostic_show_locus): Adjust the use of location_get_source_line and adjust_line with respect to their new signature. While displaying a line now, do not stop at the first null byte. Rather, display the zero byte as a space and keep going until we reach the size of the line. gcc/testsuite/ChangeLog: * c-c++-common/cpp/warning-zero-in-literals-1.c: New test file. 
---
 gcc/diagnostic.c                                   |  17 ++--
 gcc/input.c                                        | 104 ++---
 gcc/input.h                                        |   3 +-
 .../c-c++-common/cpp/warning-zero-in-literals-1.c  | Bin 0 -> 240 bytes
 4 files changed, 103 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c

diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 36094a1..0ca7081 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -259,12 +259,13 @@ diagnostic_build_prefix (diagnostic_context *context,
    MAX_WIDTH by some margin, then adjust the start of the line such
    that the COLUMN is smaller than MAX_WIDTH minus the margin.  The
    margin is either 10 characters or the difference between the column
-   and the length of the line, whatever is smaller.  */
+   and the length of the line, whatever is smaller.  The length of
+   LINE is given by LINE_WIDTH.  */
 static const char *
-adjust_line (const char *line, int max_width, int *column_p)
+adjust_line (const char *line, int line_width,
+	     int max_width, int *column_p)
 {
   int right_margin = 10;
-  int line_width = strlen (line);
   int column = *column_p;

   right_margin = MIN (line_width - column, right_margin);
@@ -284,6 +285,7 @@ diagnostic_show_locus (diagnostic_context * context,
		       const diagnostic_info *diagnostic)
 {
   const char *line;
+  int line_width;
   char *buffer;
   expanded_location s;
   int max_width;
@@ -297,22 +299,25 @@ diagnostic_show_locus (diagnostic_context * context,
   context->last_location = diagnostic->location;
   s =
Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals
On Mon, Nov 04, 2013 at 12:59:49PM +0100, Bernd Edlinger wrote: I see another read_line at gcov.c, which seems to be a copy. Copy of what? gcov.c read_line hardly can be allowed to fail because out of mem unlike this one for caret diagnostics. Though, surely, this one could be somewhat adjusted so that it really doesn't use a temporary buffer but reads directly into the initially malloced, then realloced, buffer. But, if we want it to eventually switch to caching the caret diagnostics, it won't be possible/desirable anymore. Jakub
RE: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals
On Mon, Nov 04, 2013 at 12:59:49PM +0100, Bernd Edlinger wrote: I see another read_line at gcov.c, which seems to be a copy. Copy of what? gcov.c read_line hardly can be allowed to fail because of out of memory, unlike this one for caret diagnostics. Though, surely, this one could be somewhat adjusted so that it really doesn't use a temporary buffer but reads directly into the initially malloced, then realloced, buffer. But, if we want it to eventually switch to caching the caret diagnostics, it won't be possible/desirable anymore. Jakub gcov.c and input.c currently both have a static function read_line; they are currently 100% in sync. Both _can_ fail, if the file gets deleted or modified while the function executes. If gcov.c crashes in that event, I'd call it a bug. Bernd.
Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces
On Wed, Oct 30, 2013 at 5:20 PM, Jeff Law l...@redhat.com wrote: On 10/30/13 04:34, Ilya Enkovich wrote: On 30 Oct 10:26, Richard Biener wrote: Ick - you enlarge all return statements? But you don't set the actual value? So why allocate it with 2 ops in the first place?? When return does not return bounds it has operand with zero value similar to case when it does not return value. What is the difference then? In general, when someone proposes a change in the size of tree, rtl or gimple nodes, it's a yellow flag that something may need further investigation. In this specific instance, I could trivially predict how that additional field would be used and a GIMPLE_RETURN isn't terribly important from a size standpoint, so I didn't call it out. Btw, both for regular return with no value and this case only required operands could be allocated. There may be legacy issues with the regular return value but for new uses always allocating the operand seems wrong (even if cheap in practice). Richard. Returns instrumentation. We add new operand to return statement to hold returned bounds and instrumentation pass is responsible to fill this operand with correct bounds Exactly what I expected. Unfortunately patch has been already installed. Should we uninstall it? If not, then here is patch for documentation. I think we're OK for now. If Richi wants it out, he'll say so explicitly. Thanks, Ilya -- gcc/ 2013-10-30 Ilya Enkovich ilya.enkov...@intel.com * doc/gimple.texi (gimple_call_num_nobnd_args): New. (gimple_call_nobnd_arg): New. (gimple_return_retbnd): New. (gimple_return_set_retbnd): New. (gimple_call_get_nobnd_arg_index): New. Can you also fixup the GIMPLE_RETURN documentation in gimple.texi. It needs a minor update after these changes. jeff
Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces
On Wed, Oct 30, 2013 at 6:40 PM, Jeff Law l...@redhat.com wrote: On 10/30/13 04:48, Richard Biener wrote:

foo (int * p, unsigned int size)
{
  <unnamed type> __bound_tmp.0;
  long unsigned int D.2239;
  long unsigned int _2;
  sizetype _6;
  int * _7;

  <bb 3>:
  __bound_tmp.0_4 = __builtin_ia32_arg_bnd (p_3(D));

  <bb 2>:
  _2 = (long unsigned int) size_1(D);
  __builtin_ia32_bndcl (__bound_tmp.0_4, p_3(D));
  _6 = _2 + 18446744073709551615;
  _7 = p_3(D) + _6;
  __builtin_ia32_bndcu (__bound_tmp.0_4, _7);
  access_and_store (p_3(D), __bound_tmp.0_4, size_1(D));

so it seems there is now a mismatch between DECL_ARGUMENTS and the GIMPLE call stmt arguments. How (if) did you amend the GIMPLE stmt verifier for this? Effectively the bounds are passed on the side. Well, not exactly. I can see __bound_tmp.0_4 passed to access_and_store. Are __builtin_ia32_bndcu and access_and_store supposed to be called directly after each other? What if for example profile instrumentation inserts a call inbetween them? How does regular code deal with this which may expect matching to DECL_ARGUMENTS? In fact interleaving the additional arguments sounds very error-prone for existing code - I'd have appended all bound args at the end. Also you unconditionally claim all pointer arguments have a bound - that looks like bad design as well. Why didn't you add a flag to the relevant PARM_DECL (and then, what do you do for indirect calls?). You can't actually interleave them -- that results in MPX and normal code not being able to interact. Passing the bound at the end doesn't really work either -- varargs and the desire to pass some of the bounds around in bound registers. I'm only looking at how bound arguments are passed at the GIMPLE level - which I think is arbitrary given they are passed on-the-side during code-generation.
So I'm looking for a more sensible option for the GIMPLE representation which looks somewhat fragile to me (it looks the argument is only there to have a data dependence between the bndcu intrinsic and the actual call?) /* Return the number of arguments used by call statement GS ignoring bound ones. */ static inline unsigned gimple_call_num_nobnd_args (const_gimple gs) { unsigned num_args = gimple_call_num_args (gs); unsigned res = num_args; for (unsigned n = 0; n num_args; n++) if (POINTER_BOUNDS_P (gimple_call_arg (gs, n))) res--; return res; } the choice means that gimple_call_num_nobnd_args is not O(1). Yes, but I don't see that's terribly problematical. Well, consider compile.exp limits-fnargs.c written with int * parameters and bound support ;) [there is a limit on the number of bound registers available, but the GIMPLE parts doesn't put any limit on instrumented args?] /* Return INDEX's call argument ignoring bound ones. */ static inline tree gimple_call_nobnd_arg (const_gimple gs, unsigned index) { /* No bound args may exist if pointers checker is off. */ if (!flag_check_pointer_bounds) return gimple_call_arg (gs, index); return gimple_call_arg (gs, gimple_call_get_nobnd_arg_index (gs, index)); } GIMPLE layout depending on flag_check_pointer_bounds sounds like a recipie for desaster if you consider TUs compiled with and TUs compiled without and LTO. Or if you consider using optimized attribute with that flag. Sorry, I don't follow. Can you elaborate please. Compile TU1 with -fno-chk, TU2 with -fchk, LTO both object files which gives you - well, either -fno-chk or -fchk. Then you read in the GIMPLE and when calling gimple_call_nobnd_arg you get a mismatch between where you generated the gimple and where you use the gimple. A flag on whether a call is instrumented is better (consider inlining from TU1 into TU2 where a flag on struct function for example wouldn't be enough either). Richard. 
I hope the reviewers that approved the patch will work with you to address the above issues. I can't be everywhere. Obviously I will. jeff
Re: PATCH to use -Wno-format during stage1
On Wed, Oct 30, 2013 at 8:55 PM, Ian Lance Taylor i...@google.com wrote: On Wed, Oct 30, 2013 at 8:47 AM, Jason Merrill ja...@redhat.com wrote: I find -Wformat warnings about unknown % codes when building with an older GCC to be mildly annoying noise; this patch avoids them by passing -Wno-format during stage 1. Tested x86_64-pc-linux-gnu. Is this OK for trunk? Do you have another theory of how this should work? Thank you! Please make sure the warning is preserved when not bootstrapping, it's useful to have them in your dev-tree and not notice errors only during -Werror bootstrap ... Richard. Ian
Re: [PATCH][RFA] Improvements to infer_nonnull_range
On Thu, Oct 31, 2013 at 6:41 AM, Jeff Law l...@redhat.com wrote: On 10/28/13 23:02, Jeff Law wrote: Based on a suggestion from Marc, I want to use infer_nonnull_range in the erroneous path isolation optimization. In the process of doing that, I found a few deficiencies in infer_nonnull_range that need to be addressed. First, infer_nonnull_range_p doesn't infer ranges from a GIMPLE_RETURN if the current function is marked as returns_nonnull. That's fixed in the relatively obvious way. Second, when presented with an arglist where the non-null attribute applies to all pointer arguments, infer_nonnull_range_p won't bother to determine if OP is actually one of the arguments :( It just assumes that OP is in the argument list. Oops. Third, I want to be able to call infer_nonnull_range with OP being 0B. That lets me use infer_nonnull_range to look for explicit null pointer dereferences, explicit uses of null in a return statement in functions that can't return null, and explicit uses of null arguments when those arguments can't be null. Sadly, to make that work we need to use operand_equal_p rather than simple pointer comparisons to see if OP shows up in STMT. Finally, for detecting explicit null pointers, infer_nonnull_range calls count_ptr_derefs. count_ptr_derefs counts things in two ways: once with a FOR_EACH_SSA_TREE_OPERAND, then again with simple walks of the tree structures. Not surprisingly, if we're looking for an explicit 0B, the loop over the operands finds nothing, but walking the tree structures does, and the checking assert triggers. This change removes the assert and instead sets *num_uses_p to a sane value. I don't have testcases for this stuff that are independent of the erroneous path isolation optimization. However, each is triggered by tests I'll include in the erroneous path isolation patch. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. OK for the trunk? I'm withdrawing this patch.
While everything works exactly how I wanted, I think it's best to rework things a bit and totally eliminate count_ptr_derefs entirely. Yes! And count_uses_and_derefs, too. See walk_stmt_load_store_addr_ops. Richard. That change makes it significantly easier to move infer_nonnull_range out of tree-vrp.c and into gimple.c (or wherever Andrew wants it) without a significant modularity break. Jeff
Re: [PATCH GCC]Compute, cache and use cost of auto-increment rtx patterns in IVOPT
On Mon, Nov 4, 2013 at 7:38 PM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Nov 4, 2013 at 4:31 AM, bin.cheng bin.ch...@arm.com wrote: Hi, The IVOPT in GCC has a problem that it does not use cost of auto-increment address expression in accounting, while it retreats to cost of address expression if auto-increment addressing mode is unavailable. For example, on ARM target: 1) the cost of [reg] (which is 6) is used for address expression [reg], #off; 2) the cost of [reg+off] (which is 2) is used for address expression [reg, #off]!; This causes: 1) cost of non-auto increment address expression is used for auto-increment address expression; 2) different address costs are used for pre/post increment address expressions. This patch fixes the problem by computing, caching and using the cost of auto-increment address expressions. Bootstrap and test on x86/arm. Is it OK? But don't you need to adjust static bool determine_use_iv_cost_address (struct ivopts_data *data, struct iv_use *use, struct iv_cand *cand) { bitmap depends_on; bool can_autoinc; int inv_expr_id = -1; comp_cost cost = get_computation_cost (data, use, cand, true, depends_on, can_autoinc, inv_expr_id); if (cand-ainc_use == use) { if (can_autoinc) cost.cost -= cand-cost_step; this which seems to try to compensate for your issue? That's where problem gets complicated depending on how backend defines address cost. For back ends define cost according to the true cost of addressing mode approximately, the address cost of auto-increment addressing mode doesn't take the saved stepping instruction into consideration, so the code is necessary. Moreover, according to gcc internal's description about TARGET_ADDRESS_COST, RISC machines may define different address cost for addressing modes which actually have equal execution on micro-architecture level (like ARM for now). 
The problems are: 1) Though address costs are defined in this discriminated way, it's unlikely to have the saved stepping instruction considered either. The address cost of auto-increment address expression shouldn't go so far. 2) We think the discriminated address cost model is established before gimple pass and is outdated. The true micro-architecture address cost (or cost normalized with COSTS_N_INSNS) should be used in GCC nowadays. The rtl passes like fwprop_addr which use address cost as heuristic information should be refined... but that's totally another problem (am investigating it). So overall, I think the stepping cost should to be subtracted here. Or maybe I don't understand. CCing Bernd who implemented this IIRC. Any suggestions appreciated. Thanks. bin -- Best Regards.
Re: PATCH: middle-end/58981: movmem/setmem use mode wider than Pmode for size
On Mon, Nov 4, 2013 at 3:34 AM, H.J. Lu hjl.to...@gmail.com wrote: Y On Mon, Nov 4, 2013 at 3:11 AM, Richard Sandiford rsand...@linux.vnet.ibm.com wrote: H.J. Lu hongjiu...@intel.com writes: emit_block_move_via_movmem and set_storage_via_setmem have for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode; mode = GET_MODE_WIDER_MODE (mode)) { enum insn_code code = direct_optab_handler (movmem_optab, mode); if (code != CODE_FOR_nothing /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT here because if SIZE is less than the mode mask, as it is returned by the macro, it will definitely be less than the actual mode mask. */ ((CONST_INT_P (size) ((unsigned HOST_WIDE_INT) INTVAL (size) = (GET_MODE_MASK (mode) 1))) || GET_MODE_BITSIZE (mode) = BITS_PER_WORD)) { Backend may assume mode of size in movmem and setmem expanders is no widder than Pmode since size is within the Pmode address space. X86 backend expand_set_or_movmem_prologue_epilogue_by_misaligned has rtx saveddest = *destptr; gcc_assert (desired_align = size); /* Align destptr up, place it to new register. */ *destptr = expand_simple_binop (GET_MODE (*destptr), PLUS, *destptr, GEN_INT (prolog_size), NULL_RTX, 1, OPTAB_DIRECT); *destptr = expand_simple_binop (GET_MODE (*destptr), AND, *destptr, GEN_INT (-desired_align), *destptr, 1, OPTAB_DIRECT); /* See how many bytes we skipped. */ saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, saveddest, *destptr, saveddest, 1, OPTAB_DIRECT); /* Adjust srcptr and count. */ if (!issetmem) *srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr, saveddest, *srcptr, 1, OPTAB_DIRECT); *count = expand_simple_binop (GET_MODE (*count), PLUS, *count, saveddest, *count, 1, OPTAB_DIRECT); saveddest is a negative number in Pmode and *count is in word_mode. For x32, when Pmode is SImode and word_mode is DImode, saveddest + *count leads to overflow. We could fix it by using mode of saveddest to compute saveddest + *count. 
But it leads to extra conversions and other backends may run into the same problem. A better fix is to limit mode of size in movmem and setmem expanders to Pmode. It generates better and correct memcpy and memset for x32. There is also a typo in comments. It should be BITS_PER_WORD, not BITS_PER_HOST_WIDE_INT. I don't think it's a typo. It's explaining why we don't have to worry about: GET_MODE_BITSIZE (mode) BITS_PER_HOST_WIDE_INT in the CONST_INT_P test (because in that case the GET_MODE_MASK macro will be an all-1 HOST_WIDE_INT, even though that's narrower than the real mask). Thanks for explanation. I don't think the current comment covers the BITS_PER_WORD test at all. AIUI it's there because the pattern is defined as taking a length of at most word_mode, so we should stop once we reach it. I see. FWIW, I agree Pmode makes more sense on face value. But shouldn't we replace the BITS_PER_WORD test instead of adding to it? Having both would only make a difference for Pmode word_mode targets, which might be able to handle full Pmode lengths. Do we ever have a target with Pmode is wider than word_mode? If not, we can check Pmode instead. Either way, the md.texi documentation should be updated too. This is the updated patch with md.texi change. The testcase is the same. Tested on x32. OK to install? Thanks. -- H.J. --- gcc/ 2013-11-04 H.J. Lu hongjiu...@intel.com PR middle-end/58981 * doc/md.texi (@code{movmem@var{m}}): Specify Pmode as mode of pattern, instead of word_mode. * expr.c (emit_block_move_via_movmem): Don't use mode wider than Pmode for size. (set_storage_via_setmem): Likewise. gcc/testsuite/ 2013-11-04 H.J. Lu hongjiu...@intel.com PR middle-end/58981 * gcc.dg/pr58981.c: New test. diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index ac10a0a..1e22b88 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5291,12 +5291,13 @@ are the first two operands, and both are @code{mem:BLK}s with an address in mode @code{Pmode}. 
The number of bytes to move is the third operand, in mode @var{m}. -Usually, you specify @code{word_mode} for @var{m}. However, if you can +Usually, you specify @code{Pmode} for @var{m}. However, if you can generate better code knowing the range of valid lengths is smaller than -those representable in a full word, you should provide a pattern with a +those representable in a
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On 11/01/2013 06:42 PM, David Malcolm wrote: ! gimple_statement_call *call; ! call = gimple_build_call_vec (build_fold_addr_expr_loc (0, alias), vargs); gimple_call_set_lhs (call, atree); that is looking much better :-) I'd love to make use of compile-time type-safety like this, but does this have to gate this initial step? no, I wasn't implying it did... just mapping out what I think we can go on to do with it. And eventually we can pull the accessor routines and others into the class itself: gimple_call call; call = gimple_build_call_vec (build_fold_addr_expr_loc (0, alias), vargs); call-set_lhs (atree); Nice. It's readable (IMHO), and the type-checking (that it's a call) is enforced at compile-time, rather than at run-time (and only in ENABLE_CHECKING builds). Sadly, this may need some further gengtype work: the simple inheritance support I'm using in this patch series will barf if it sees methods within a class. So I can have a look at supporting methods in gengtype, I guess. Though the change above of gimple_call_set_lhs to accepting a subclass ptr gives a middle-ground. Eventually this is going to be an issue elsewhere as well. Whats the problem with methods in gengtype? it just doesn't parse them? or something deeper and more insidious? Which results in a similar feel to the new gimple_type, gimple_decl, etc. classes with my upcoming tree wrappers. Changing gimple statements to behave something like this is actually mentioned in the plan, but out in the future once trees have been taken care of. I would also plan to create instances for each of the gimple statements By instances presumably you meant subclasses? yeah. That all said, this change enables that work to proceed if someone wants to do it. My question is: Is anyone going to do it, and if so, who and when? :-) I'd be up for working on a followup patch that adds such subclasses, and I'd be up for changing the accessors to give up compile-time type-safety. I'd do it one accessor at a time. 
Its probably pushing it for 4.9... and doesn't really serve much purpose to try do do any of this followup work now. Probably a better task for the next stage 1 anyway when there is time to fully flush it out and prove the various improvements to have full support from everyone :-) Again, as noted in the earlier patch series, the names of the structs are rather verbose. I would prefer to also rename them all to eliminate the _statement component: gimple_statement_base - gimple_base gimple_statement_phi - gimple_phi gimple_statement_omp - gimple_omp etc, but I didn't do this to mimimize the patch size. But if the core maintainers are up for that, I can redo the patch series with that change also, or do that as a followup. As mentioned, I'd rather see the short names be typedefs for pointers to the long names since we're typically using the pointers. [FWIW, I'm not a fan of such typedefs, but I'll defer to you in this] Just expressing my opinion :-) Presumably we'd want typedefs for the extra subclasses as well? Also, I take it we'd use the printable_name from gimple.def - though would we use it for the subclass name, or for the ptr typedef? I would think a typedef for any type that we would directly refer to in the code, like a gimple_omp_clause, gimple_assign, or whatever. Other than stomping on the same bits at the moment (less so once gimple-stmt.[ch] is split out), these changes are completely orthogonal, but in line with with my direction. I think It should be done sooner or later My preference is to also see the follow up work carried out as well. Or at least a plan for it. So you like the approach, provided I commit to the followup work? [1] OK, I'll try to create some followup patches next week. I'd prefer to get the conversion to inheritance into trunk sooner rather than later, though. Thanks for looking at this (and for your cleanup work!) Dave [1] and the patches still need formal review. 
Im not the one that has to be sold on this, I'm actually ambivalent to the change this close to the end of stage1... I'm pointing out what we can eventually do with it in case it sways opinion.. I would have considered something similar in a while when the other refactoring work reached that point. In some ways it might be better for next stage 1, but perhaps you have outlined enough benefits for it to be attractive to someone today. It seems to be pretty neutral in most other respects. Andrew
[Patch, avr] Emit diagnostics if -f{pic,PIC,pie,PIE} or -shared is passed
The AVR backend does not generate position-independent code, yet it happily accepts -fpic, -fPIC, -fpie and -fPIE. The generated code doesn't change at all. Also, it accepts the -shared option to generate a shared library, without really doing anything with it. This causes one of the regression tests (gcc.dg/lto/pr54709 c_lto_pr54709_0.o-c_lto_pr54709_1.o link) to fail with an 'undefined reference to main' error when the test is trying to build a shared object. The attached patch generates a warning if one of the -f{pic,PIC,pie,PIE} options is provided, and an error if -shared is provided (config/mep/mep.c and config/s390/tpf.h already do something very similar). Regression tested with no new failures. Tests which exercise PIC now report as unsupported. If ok, could someone commit please? I don't have commit access. Regards Senthil gcc/ChangeLog 2013-11-04 Senthil Kumar Selvaraj senthil_kumar.selva...@atmel.com * config/avr/avr.c (avr_option_override): Warn if asked to generate position independent code. * config/avr/avr.h: Modify LINK_SPEC to reject -shared.
diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
index e7e1c2f..cf4b8ca 100644
--- gcc/config/avr/avr.c
+++ gcc/config/avr/avr.c
@@ -310,6 +310,15 @@ avr_option_override (void)
       flag_omit_frame_pointer = 0;
     }
 
+  if (flag_pic == 1)
+    warning (OPT_fpic, "-fpic is not supported");
+  if (flag_pic == 2)
+    warning (OPT_fPIC, "-fPIC is not supported");
+  if (flag_pie == 1)
+    warning (OPT_fpie, "-fpie is not supported");
+  if (flag_pie == 2)
+    warning (OPT_fPIE, "-fPIE is not supported");
+
   avr_current_device = &avr_mcu_types[avr_mcu_index];
   avr_current_arch = &avr_arch_types[avr_current_device->arch];
diff --git gcc/config/avr/avr.h gcc/config/avr/avr.h
index f223a61..1eff5be 100644
--- gcc/config/avr/avr.h
+++ gcc/config/avr/avr.h
@@ -522,7 +522,8 @@ extern const char *avr_device_to_sp8 (int argc, const char **argv);
 mmcu=at90can64*|\
 mmcu=at90usb64*:--pmem-wrap-around=64k}}}\
 %:device_to_ld(%{mmcu=*:%*})\
-%:device_to_data_start(%{mmcu=*:%*})
+%:device_to_data_start(%{mmcu=*:%*})\
+%{shared:%eshared is not supported}
 
 #define LIB_SPEC \
 "%{!mmcu=at90s1*:%{!mmcu=attiny11:%{!mmcu=attiny12:%{!mmcu=attiny15:%{!mmcu=attiny28: -lc }"
Re: [RFA][PATCH] Isolate erroneous paths optimization
On Thu, Oct 31, 2013 at 7:11 AM, Jeff Law l...@redhat.com wrote: I've incorporated the various suggestions from Marc and Richi, except for Richi's to integrate this into jump threading. I've also made the following changes since the last version: 1. Added more testcases. 2. Use infer_nonnull_range, moving it from tree-vrp.c into gimple.c. Minor improvements to infer_nonnull_range to make it handle more cases we care about and avoid using unnecessary routines from tree-ssa.c (which can now be removed) 3. Multiple undefined statements in a block are handled in the logical way. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. OK for the trunk? Comments inline Thanks, Jeff * Makefile.in (OBJS): Add gimple-ssa-isolate-paths.o * common.opt (-fisolate-erroneous-paths): Add option and documentation. * gimple-ssa-isolate-paths.c: New file. * gimple.c (check_loadstore): New function. (infer_nonnull_range): Moved into gimple.c from tree-vrp.c Verify OP is in the argument list and the argument corresponding to OP is a pointer type. Use operand_equal_p rather than pointer equality when testing if OP is on the nonnull list. Use check_loadstore rather than count_ptr_derefs. Handle GIMPLE_RETURN statements. * tree-vrp.c (infer_nonnull_range): Remove. * gimple.h (infer_nonnull_range): Declare. (gsi_start_nondebug_after_labels): New function. * opts.c (default_options_table): Add OPT_fisolate_erroneous_paths. * passes.def: Add pass_isolate_erroneous_paths. * timevar.def (TV_ISOLATE_ERRONEOUS_PATHS): New timevar. * tree-pass.h (make_pass_isolate_erroneous_paths): Declare. * tree-ssa.c (struct count_ptr_d): Remove. (count_ptr_derefs, count_uses_and_derefs): Remove. * tree-ssa.h (count_uses_and_derefs): Remove. * gcc.dg/pr38984.c: Add -fno-isolate-erroneous-paths. * gcc.dg/tree-ssa/20030711-3.c: Update expected output. * gcc.dg/tree-ssa/isolate-1.c: New test. * gcc.dg/tree-ssa/isolate-2.c: New test. * gcc.dg/tree-ssa/isolate-3.c: New test. 
* gcc.dg/tree-ssa/isolate-4.c: New test. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 29609fd..7e9a702 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1233,6 +1233,7 @@ OBJS = \ gimple-fold.o \ gimple-low.o \ gimple-pretty-print.o \ + gimple-ssa-isolate-paths.o \ gimple-ssa-strength-reduction.o \ gimple-streamer-in.o \ gimple-streamer-out.o \ diff --git a/gcc/common.opt b/gcc/common.opt index deeb3f2..6db9f56 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2104,6 +2104,12 @@ foptimize-strlen Common Report Var(flag_optimize_strlen) Optimization Enable string length optimizations on trees +fisolate-erroneous-paths +Common Report Var(flag_isolate_erroneous_paths) Init(1) Optimization Drop Init(1) (see below) +Detect paths which trigger erroneous or undefined behaviour. Isolate those +paths from the main control flow and turn the statement with erroneous or +undefined behaviour into a trap. + ftree-loop-distribution Common Report Var(flag_tree_loop_distribution) Optimization Enable loop distribution on trees diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c new file mode 100644 index 000..aa526cc --- /dev/null +++ b/gcc/gimple-ssa-isolate-paths.c @@ -0,0 +1,332 @@ +/* Detect paths through the CFG which can never be executed in a conforming + program and isolate them. + + Copyright (C) 2013 + Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. 
If not see +http://www.gnu.org/licenses/. */ + +#include config.h +#include system.h +#include coretypes.h +#include tree.h +#include flags.h +#include basic-block.h +#include gimple.h +#include tree-ssa.h +#include gimple-ssa.h +#include tree-ssa-operands.h +#include tree-phinodes.h +#include ssa-iterators.h +#include cfgloop.h +#include tree-pass.h + + +static bool cfg_altered; + +/* BB when reached via incoming edge E will exhibit undefined behaviour + at STMT. Isolate and optimize the path which exhibits undefined + behaviour. + + Isolation is simple. Duplicate BB and redirect E to BB'. +
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On 11/01/2013 06:58 PM, David Malcolm wrote: On Fri, 2013-11-01 at 22:57 +0100, Jakub Jelinek wrote: On Fri, Nov 01, 2013 at 05:47:14PM -0400, Andrew MacLeod wrote: On 11/01/2013 05:41 PM, Jakub Jelinek wrote: On Fri, Nov 01, 2013 at 05:36:34PM -0400, Andrew MacLeod wrote: static inline void ! gimple_call_set_lhs (gimple gs, tree lhs) { - GIMPLE_CHECK (gs, GIMPLE_CALL); The checking you are removing here. What checking? There ought to be no checking at all in this example... gimple_build_call_vec returns a gimple_call, and gimple_call_set_lhs() doesn't have to check anything because it only accepts gimple_call's.. so there is no checking other than the usual does my parameter match that the compiler has to do... and want to replace it by checking of the types at compile time. The problem is that it uglifies the source too much, and, when you actually don't have a gimple_call but supposedly a base class of it, I expect you'd do as_a which is not only further uglification, but has runtime cost also for --enable-checking=release. I can have a look next week at every call to gimple_call_set_lhs in the tree, and see to what extent we know at compile-time that the initial arg is indeed a call (of the ones I quickly grepped just now, most are from gimple_build_call and friends, but one was from a gimple_copy). FWIW I did some performance testing of the is_a/as_a code in the earlier version of the patch, and it didn't have a noticable runtime cost compared to the GIMPLE_CHECK in the existing code: Size of compiler executable: http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01920.html Compile times: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00171.html I actually really dislike as_a and is_a, and think code needs to be restructured rather than use them, other than possibly at the very bottom level when we're allocating memory or something like that, or some kind of emergency :-)... If we require frequent uses of those, I'd be against it, I find them quite ugly. 
Like I said in the other reply, no rush, I don't think any of this follow up is appropriate this late in stage 1. It would be more of an interest examination right now.. at least in my opinion... I suspect thinks like gimple_assign are more complex cases, but without looking its hard to tell for sure. Andrew
Re: Pre-Patch RFC: proposed changes to option-lookup
On Wed, Oct 30, 2013 at 11:56 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Wed, 30 Oct 2013, David Malcolm wrote: My idea is to introduce a GCC_OPTION macro, and replace the above with: static bool gate_vrp (void) { return GCC_OPTION (flag_tree_vrp) != 0; } That's only slightly shorter than the full expansion using global_options; I'd prefer using the full expansion. (Of course the ideal is to use explicit options pointers instead of global_options. For example, if such a pointer were associated with the current function, it might make function-specific options handling a bit less fragile.) Seconded - ideally the 'struct function' a pass is supposed to work on would be passed as argument here and the options in effect should be able to be extracted from it. [that is, we shouldn't have push/pop_cfun or cfun or current_function_decl at all] Note that I think expanding this to global_options.x_flag_tree_vrp wouldn't be a step forward. Richard. -- Joseph S. Myers jos...@codesourcery.com
[PATCH, rs6000] Fix vect_pack_trunc_v2df pattern for little endian
Hi, This cleans up another case where a vector-pack operation needs to reverse its operand order for little endian. This fixes the last remaining vector test failure in the test suite when building for little endian. Next I'll be spending some time looking for additional little endian issues for which we don't yet have test coverage. As an example, there are several patterns similar to vect_pack_trunc_v2df that probably need the same treatment as this patch, but we don't have any test cases that expose a problem. I need to verify those are broken and add test cases when fixing them. Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no regressions. Is this ok for trunk? Thanks, Bill

2013-11-04 Bill Schmidt wschm...@linux.vnet.ibm.com

	* config/rs6000/vector.md (vec_pack_trunc_v2df): Adjust for little endian.

Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md (revision 204320)
+++ gcc/config/rs6000/vector.md (working copy)
@@ -830,7 +830,12 @@
   emit_insn (gen_vsx_xvcvdpsp (r1, operands[1]));
   emit_insn (gen_vsx_xvcvdpsp (r2, operands[2]));
-  rs6000_expand_extract_even (operands[0], r1, r2);
+
+  if (BYTES_BIG_ENDIAN)
+    rs6000_expand_extract_even (operands[0], r1, r2);
+  else
+    rs6000_expand_extract_even (operands[0], r2, r1);
+
   DONE;
})
Re: [PATCH, MPX, 2/X] Pointers Checker [7/25] Suppress BUILT_IN_CHKP_ARG_BND optimizations.
On Thu, Oct 31, 2013 at 10:02 AM, Ilya Enkovich enkovich@gmail.com wrote: Hi, Here is a patch which hadles the problem with optimization of BUILT_IN_CHKP_ARG_BND calls. Pointer Bounds Checker expects that argument of this call is a default SSA_NAME of the PARM_DECL whose bounds we want to get. The problem is in optimizations which may replace arg with it's copy or a known value. This patch suppress such modifications. This doesn't seem like a good fix. I suppose you require the same on RTL, that is, have the incoming arg reg coalesced with the use? In that case better set SSA_NAME_OCCURS_IN_ABNORMAL_PHI. Richard. Thanks, Ilya -- gcc/ 2013-10-28 Ilya Enkovich ilya.enkov...@intel.com * tree-into-ssa.c: Include target.h (rewrite_update_stmt): Skip BUILT_IN_CHKP_ARG_BND calls. * tree-ssa-dom.c: Include target.h (cprop_into_stmt): Skip BUILT_IN_CHKP_ARG_BND calls. diff --git a/gcc/tree-into-ssa.c b/gcc/tree-into-ssa.c index 981e9f4..8d48f6d 100644 --- a/gcc/tree-into-ssa.c +++ b/gcc/tree-into-ssa.c @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see #include params.h #include diagnostic-core.h #include tree-into-ssa.h +#include target.h /* This file builds the SSA form for a function as described in: @@ -1921,8 +1922,14 @@ rewrite_update_stmt (gimple stmt, gimple_stmt_iterator gsi) } /* Rewrite USES included in OLD_SSA_NAMES and USES whose underlying - symbol is marked for renaming. */ - if (rewrite_uses_p (stmt)) + symbol is marked for renaming. + Skip calls to BUILT_IN_CHKP_ARG_BND whose arg should never be + renamed. */ + if (rewrite_uses_p (stmt) + !(flag_check_pointer_bounds + (gimple_code (stmt) == GIMPLE_CALL) + gimple_call_fndecl (stmt) + == targetm.builtin_chkp_function (BUILT_IN_CHKP_ARG_BND))) { if (is_gimple_debug (stmt)) { diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c index 211bfcf..445278a 100644 --- a/gcc/tree-ssa-dom.c +++ b/gcc/tree-ssa-dom.c @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3. 
If not see #include params.h #include tree-ssa-threadedge.h #include tree-ssa-dom.h +#include target.h /* This file implements optimizations on the dominator tree. */ @@ -2266,6 +2267,16 @@ cprop_into_stmt (gimple stmt) use_operand_p op_p; ssa_op_iter iter; + /* Call used to obtain bounds of input arg by Pointer Bounds Checker + should not be optimized. Argument of the call is a default + SSA_NAME of PARM_DECL. It should never be replaced by value. */ + if (flag_check_pointer_bounds gimple_code (stmt) == GIMPLE_CALL) +{ + tree fndecl = gimple_call_fndecl (stmt); + if (fndecl == targetm.builtin_chkp_function (BUILT_IN_CHKP_ARG_BND)) + return; +} + FOR_EACH_SSA_USE_OPERAND (op_p, stmt, iter, SSA_OP_USE) cprop_operand (stmt, op_p); }
Re: [PATCH, MPX, 2/X] Pointers Checker [10/25] Calls copy and verification
On Thu, Oct 31, 2013 at 10:24 AM, Ilya Enkovich enkovich@gmail.com wrote: Hi, Here is a patch to support of instrumented code in calls verifiers and calls copy with skipped args. Thanks, Ilya -- gcc/ 2013-10-29 Ilya Enkovich ilya.enkov...@intel.com * cgraph.c (gimple_check_call_args): Handle bound args. * gimple.c (gimple_call_copy_skip_args): Likewise. (validate_call): Likewise. diff --git a/gcc/cgraph.c b/gcc/cgraph.c index 52d9ab0..9d7ae85 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -3030,40 +3030,54 @@ gimple_check_call_args (gimple stmt, tree fndecl, bool args_count_match) { for (i = 0, p = DECL_ARGUMENTS (fndecl); i nargs; - i++, p = DECL_CHAIN (p)) + i++) { - tree arg; + tree arg = gimple_call_arg (stmt, i); + + /* Skip bound args inserted by Pointer Bounds Checker. */ + if (POINTER_BOUNDS_P (arg)) + continue; + /* We cannot distinguish a varargs function from the case of excess parameters, still deferring the inlining decision to the callee is possible. */ if (!p) break; - arg = gimple_call_arg (stmt, i); + if (p == error_mark_node || arg == error_mark_node || (!types_compatible_p (DECL_ARG_TYPE (p), TREE_TYPE (arg)) !fold_convertible_p (DECL_ARG_TYPE (p), arg))) return false; + + p = DECL_CHAIN (p); } if (args_count_match p) return false; } else if (parms) { - for (i = 0, p = parms; i nargs; i++, p = TREE_CHAIN (p)) + for (i = 0, p = parms; i nargs; i++) { - tree arg; + tree arg = gimple_call_arg (stmt, i); + + /* Skip bound args inserted by Pointer Bounds Checker. */ + if (POINTER_BOUNDS_P (arg)) + continue; + /* If this is a varargs function defer inlining decision to callee. 
*/ if (!p) break; - arg = gimple_call_arg (stmt, i); + if (TREE_VALUE (p) == error_mark_node || arg == error_mark_node || TREE_CODE (TREE_VALUE (p)) == VOID_TYPE || (!types_compatible_p (TREE_VALUE (p), TREE_TYPE (arg)) !fold_convertible_p (TREE_VALUE (p), arg))) return false; + + p = TREE_CHAIN (p); } } else diff --git a/gcc/gimple.c b/gcc/gimple.c index 20f6010..dc85bf8 100644 --- a/gcc/gimple.c +++ b/gcc/gimple.c @@ -3048,15 +3048,20 @@ canonicalize_cond_expr_cond (tree t) gimple gimple_call_copy_skip_args (gimple stmt, bitmap args_to_skip) { - int i; + int i, bit; int nargs = gimple_call_num_args (stmt); vectree vargs; vargs.create (nargs); gimple new_stmt; - for (i = 0; i nargs; i++) -if (!bitmap_bit_p (args_to_skip, i)) - vargs.quick_push (gimple_call_arg (stmt, i)); + for (i = 0, bit = 0; i nargs; i++, bit++) + if (POINTER_BOUNDS_P (gimple_call_arg (stmt, i))) + { + if (!bitmap_bit_p (args_to_skip, --bit)) + vargs.quick_push (gimple_call_arg (stmt, i)); + } + else if (!bitmap_bit_p (args_to_skip, bit)) + vargs.quick_push (gimple_call_arg (stmt, i)); The new code is completely confusing. You need to update comments with what bits args_to_skip refers to and what happens with pointer bounds. I suppose the bitmap also contains bound parameters as they are in calls (but not in fndecls -- I think this discrepancy still is a way to desaster). if (gimple_call_internal_p (stmt)) new_stmt = gimple_build_call_internal_vec (gimple_call_internal_fn (stmt), @@ -3702,6 +3707,9 @@ validate_call (gimple stmt, tree fndecl) if (!targs) return true; tree arg = gimple_call_arg (stmt, i); + /* Skip bounds. */ + if (flag_check_pointer_bounds POINTER_BOUNDS_P (arg)) + continue; Why check flag_check_pointer_bounds here but not elsewhere? if (INTEGRAL_TYPE_P (TREE_TYPE (arg)) INTEGRAL_TYPE_P (TREE_VALUE (targs))) ;
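The shape of the fix is easy to lose in the diff noise: bound arguments inserted by the checker must be skipped without consuming a formal parameter, so the parameter cursor now advances only when an argument was actually matched. A stand-alone sketch of that idea (illustrative types and names, not GCC code, and with the varargs special-casing omitted):

```cpp
#include <vector>
#include <cassert>
#include <cstddef>

// ARG_BOUND stands in for arguments satisfying POINTER_BOUNDS_P.
enum ArgKind { ARG_VALUE, ARG_BOUND };

struct Arg { ArgKind kind; int type; };

// Check that the value arguments of a call match the parameter types in
// order, ignoring checker-inserted bound arguments.  The parameter index
// only advances for a matched value argument, mirroring the patch's move
// of the DECL_CHAIN step into the loop body.
bool check_call_args (const std::vector<Arg> &args,
                      const std::vector<int> &param_types)
{
  std::size_t p = 0;
  for (const Arg &a : args)
    {
      if (a.kind == ARG_BOUND)
        continue;                    // skip; do not consume a parameter
      if (p >= param_types.size ())
        return false;                // more value args than parameters
      if (a.type != param_types[p])
        return false;                // type mismatch
      ++p;
    }
  return p == param_types.size ();   // all parameters consumed
}
```

With this arrangement a call carrying an extra bound argument still matches its declared parameter list, which is the behavior the patch needs.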
[Patch, testsuite] Require scheduling support in gcc.dg/superblock.c
Hi, gcc.dg/superblock.c uses the -fdump-rtl-sched2 option and then scans the resulting dump. This fails for the AVR target, as it doesn't have any instruction scheduling support. The attached patch makes this test require scheduling support in the target. If ok, could someone commit please? I do not have commit access. Regards Senthil gcc/testsuite 2013-11-04 Senthil Kumar Selvaraj senthil_kumar.selva...@atmel.com * gcc.dg/superblock.c: Require scheduling support diff --git gcc/testsuite/gcc.dg/superblock.c gcc/testsuite/gcc.dg/superblock.c index 2b9aedf..272d161 100644 --- gcc/testsuite/gcc.dg/superblock.c +++ gcc/testsuite/gcc.dg/superblock.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fno-asynchronous-unwind-tables -fsched2-use-superblocks -fdump-rtl-sched2 -fdump-rtl-bbro" } */ +/* { dg-require-effective-target scheduling } */ typedef int aligned __attribute__ ((aligned (64))); extern void abort (void);
Re: Aliasing: look through pointer's def stmt
On Thu, Oct 31, 2013 at 6:11 PM, Marc Glisse marc.gli...@inria.fr wrote: On Wed, 30 Oct 2013, Richard Biener wrote: --- gcc/tree-ssa-alias.c(revision 204188) +++ gcc/tree-ssa-alias.c(working copy) @@ -567,20 +567,29 @@ void ao_ref_init_from_ptr_and_size (ao_ref *ref, tree ptr, tree size) { HOST_WIDE_INT t1, t2; ref-ref = NULL_TREE; if (TREE_CODE (ptr) == SSA_NAME) { gimple stmt = SSA_NAME_DEF_STMT (ptr); if (gimple_assign_single_p (stmt) gimple_assign_rhs_code (stmt) == ADDR_EXPR) ptr = gimple_assign_rhs1 (stmt); + else if (is_gimple_assign (stmt) + gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR + host_integerp (gimple_assign_rhs2 (stmt), 0) + (t1 = int_cst_value (gimple_assign_rhs2 (stmt))) = 0) No need to restrict this to positive offsets I think. Here is the follow-up patch to remove this restriction (bootstrap+testsuite on x86_64-unknown-linux-gnu). I don't really know how safe it is. I couldn't use host_integerp(,1) since it returns false for (size_t)(-1) on x86_64, and host_integerp(,0) seemed strange, but I could use it if you want. Well, host_integer_p (, 0) looks correct to me. But ... Index: gcc/tree-ssa-alias.h === --- gcc/tree-ssa-alias.h(revision 204267) +++ gcc/tree-ssa-alias.h(working copy) @@ -139,30 +139,30 @@ extern void pt_solution_set_var (struct extern void dump_pta_stats (FILE *); extern GTY(()) struct pt_solution ipa_escaped_pt; /* Return true, if the two ranges [POS1, SIZE1] and [POS2, SIZE2] overlap. SIZE1 and/or SIZE2 can be (unsigned)-1 in which case the range is open-ended. Otherwise return false. 
*/ static inline bool -ranges_overlap_p (unsigned HOST_WIDE_INT pos1, - unsigned HOST_WIDE_INT size1, - unsigned HOST_WIDE_INT pos2, - unsigned HOST_WIDE_INT size2) +ranges_overlap_p (HOST_WIDE_INT pos1, + HOST_WIDE_INT size1, + HOST_WIDE_INT pos2, + HOST_WIDE_INT size2) I think size[12] should still be unsigned (we don't allow negative sizes but only the special value -1U which we special-case only to avoid pos + size to wrap) Otherwise ok. Thanks, Richard. { if (pos1 = pos2 - (size2 == (unsigned HOST_WIDE_INT)-1 + (size2 == -1 || pos1 (pos2 + size2))) return true; if (pos2 = pos1 - (size1 == (unsigned HOST_WIDE_INT)-1 + (size1 == -1 || pos2 (pos1 + size1))) return true; return false; } #endif /* TREE_SSA_ALIAS_H */
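For reference, here is the function under discussion in compilable form, keeping the sizes unsigned as Richard asks and using (uhwi)-1 as the open-ended sentinel; `uhwi` stands in for unsigned HOST_WIDE_INT:

```cpp
#include <cassert>

typedef unsigned long long uhwi;  // stand-in for unsigned HOST_WIDE_INT

// Return true if the two ranges [POS1, SIZE1] and [POS2, SIZE2] overlap.
// SIZE1 and/or SIZE2 can be (uhwi)-1, in which case that range is
// open-ended; the special-casing avoids pos + size wrapping around.
bool ranges_overlap_p (uhwi pos1, uhwi size1, uhwi pos2, uhwi size2)
{
  if (pos1 >= pos2 && (size2 == (uhwi) -1 || pos1 < pos2 + size2))
    return true;
  if (pos2 >= pos1 && (size1 == (uhwi) -1 || pos2 < pos1 + size1))
    return true;
  return false;
}
```

Keeping the parameters unsigned means only the single sentinel value is treated specially; other "negative-looking" sizes are just large positions, which is exactly the property the review wants preserved.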
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On 11/01/2013 05:57 PM, Jakub Jelinek wrote: On Fri, Nov 01, 2013 at 05:47:14PM -0400, Andrew MacLeod wrote: On 11/01/2013 05:41 PM, Jakub Jelinek wrote: On Fri, Nov 01, 2013 at 05:36:34PM -0400, Andrew MacLeod wrote: static inline void ! gimple_call_set_lhs (gimple gs, tree lhs) { - GIMPLE_CHECK (gs, GIMPLE_CALL); The checking you are removing here. What checking? There ought to be no checking at all in this example... gimple_build_call_vec returns a gimple_call, and gimple_call_set_lhs() doesn't have to check anything because it only accepts gimple_call's.. so there is no checking other than the usual does my parameter match that the compiler has to do... and want to replace it by checking of the types at compile time. The problem is that it uglifies the source too much, and, when you actually don't have a gimple_call but supposedly a base class of it, But when you convert all the source base for gimple calls, then you have context pretty much everywhere.. you wont be doing it from a base class... at least pretty infrequently. I expect you'd do as_a which is not only further uglification, but has runtime cost also for --enable-checking=release. Im not a fan of as_a or is_a at all. I really dislike them. They should be kept to the barest minimum, or even non existent if possible. If you need it, you probably need restructured source. ie, functions which work with switches on gimple_code() need to be restructured... And the end result should look cleaner. Of course, thats what makes it a big project too :-) Andrew
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On Mon, Nov 04, 2013 at 08:54:40AM -0500, Andrew MacLeod wrote: What checking? There ought to be no checking at all in this example... gimple_build_call_vec returns a gimple_call, and gimple_call_set_lhs() doesn't have to check anything because it only accepts gimple_call's.. so there is no checking other than the usual does my parameter match that the compiler has to do... and want to replace it by checking of the types at compile time. The problem is that it uglifies the source too much, and, when you actually don't have a gimple_call but supposedly a base class of it, But when you convert all the source base for gimple calls, then you have context pretty much everywhere.. you wont be doing it from a base class... at least pretty infrequently. Usually you just have IL which contains various kinds of statements, it can be a call, assign, dozens of other gimple stmt forms. If you need to modify something in these, you'd need to uglify with as_a/is_a if you require everybody using these setters/getters to use gimple_call * rather than gimple, pointer to any kind of stmt. Jakub
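For readers unfamiliar with the pattern being debated, here is a minimal sketch of the trade-off (class and function names are made up, not the proposed GCC API): a checked downcast in the is_a/as_a style pays a runtime test, while a setter typed directly on the subclass needs no check at all because the compiler has already proved the type.

```cpp
#include <cassert>

struct gimple_stmt { int code; };        // base: any kind of statement
struct gimple_call : gimple_stmt {};     // subclass: calls only

const int CALL_CODE = 1;                 // illustrative statement code

bool is_a_call (const gimple_stmt *gs) { return gs->code == CALL_CODE; }

// as_a-style checked downcast: the assert is the runtime cost Jakub
// mentions, paid even in release checking if kept unconditional.
gimple_call *as_a_call (gimple_stmt *gs)
{
  assert (is_a_call (gs));
  return static_cast<gimple_call *> (gs);
}

// Typed on the subclass: no GIMPLE_CHECK-style runtime test is needed,
// since only a gimple_call* can be passed here.
void set_call_lhs (gimple_call *, int /* lhs */) {}
```

The disagreement in the thread is about where the cast happens: code that already holds a `gimple_call*` never needs `as_a_call`, but code iterating over arbitrary statements would have to cast at each use.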
Re: [PATCH, rs6000] Fix vect_pack_trunc_v2df pattern for little endian
On Mon, Nov 4, 2013 at 8:28 AM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, This cleans up another case where a vector-pack operation needs to reverse its operand order for little endian. This fixes the last remaining vector test failure in the test suite when building for little endian. Next I'll be spending some time looking for additional little endian issues for which we don't yet have test coverage. As an example, there are several patterns similar to vect_pack_trunc_v2df that probably need the same treatment as this patch, but we don't have any test cases that expose a problem. I need to verify those are broken and add test cases when fixing them. Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no regressions. Is this ok for trunk? Thanks, Bill 2013-11-04 Bill Schmidt wschm...@linux.vnet.ibm.com * config/rs6000/vector.md (vec_pack_trunc_v2df): Adjust for little endian. This patch is okay. Thanks, David
Re: [PATCH, rs6000] Re-permute source register for postreload splits of VSX LE stores
On Sat, Nov 2, 2013 at 7:44 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, When I created the patch to split VSX loads and stores to add permutes so the register image was correctly little endian, I forgot to implement a known requirement. When a VSX store is split after reload, we must reuse the source register for the permute, which leaves it in a swapped state after the store. If the register remains live, this is invalid. After the store, we need to permute the register back to its original value. For each of the little endian VSX store patterns, I've replaced the define_insn_and_split with a define_insn and two define_splits, one for prior to reload and one for after reload. The post-reload split has the extra behavior. I don't know of a way to set the insn's length attribute conditionally, so I'm optimistically setting this to 8, though it will be 12 in the post-reload case. Is there any concern with that? Is there a way to make it fully accurate? Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no regressions. This fixes two failing test cases. Is this ok for trunk? Thanks! Bill 2013-11-02 Bill Schmidt wschm...@linux.vnet.ibm.com * config/rs6000/vsx.md (*vsx_le_perm_store_mode for VSX_D): Replace the define_insn_and_split with a define_insn and two define_splits, with the split after reload re-permuting the source register to its original value. (*vsx_le_perm_store_mode for VSX_W): Likewise. (*vsx_le_perm_store_v8hi): Likewise. (*vsx_le_perm_store_v16qi): Likewise. This is okay, but you must change the insn length to the conservative, larger value. insn length is used to reverse branches when the displacement is too large. Thanks, David
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On 11/04/2013 09:00 AM, Jakub Jelinek wrote: On Mon, Nov 04, 2013 at 08:54:40AM -0500, Andrew MacLeod wrote: What checking? There ought to be no checking at all in this example... gimple_build_call_vec returns a gimple_call, and gimple_call_set_lhs() doesn't have to check anything because it only accepts gimple_call's.. so there is no checking other than the usual does my parameter match that the compiler has to do... and want to replace it by checking of the types at compile time. The problem is that it uglifies the source too much, and, when you actually don't have a gimple_call but supposedly a base class of it, But when you convert all the source base for gimple calls, then you have context pretty much everywhere.. you wont be doing it from a base class... at least pretty infrequently. Usually you just have IL which contains various kinds of statements, it can be a call, assign, dozens of other gimple stmt forms. If you need to modify something in these, you'd need to uglify with as_a/is_a if you require everybody using these setters/getters to use gimple_call * rather than gimple, pointer to any kind of stmt. Jakub In many cases splitting out the case body and typing functions will surprisingly cover many of the the uses required. Proper subclassing will cover a lot too if the get/set methods are in the right classes. In cases where none of this works... well, there is an appropriate time for a virtual function... Thats really all those kinds off switches are implementing but I won't get into that meta discussion right now :-). We can save that for someday when there is actually a real proposal on the table. Andrew
Re: LRA vs reload on powerpc: 2 extra FAILs that are actually improvements?
Hi, Steven Thanks for investigating this. This presumably was the reason that Vlad changed the constraint modifier for that pattern in his patch for LRA. I don't think that using memory is an improvement, but Mike is the best person to comment. Thanks, David On Sat, Nov 2, 2013 at 6:48 PM, Steven Bosscher stevenb@gmail.com wrote: Hello, Today's powerpc64-linux gcc has 2 extra failures with -mlra vs. reload (i.e. svn unpatched). (I'm excluding guality failure differences here because there are too many of them that seem to fail at random after minimal changes anywhere in the compiler...). Test results are posted here: reload: http://gcc.gnu.org/ml/gcc-testresults/2013-11/msg00128.html lra: http://gcc.gnu.org/ml/gcc-testresults/2013-11/msg00129.html The new failures and total score is as follows (+=lra, -=reload): +FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times stwbrx 6 +FAIL: gcc.target/powerpc/pr58330.c scan-assembler-not stwbrx === gcc Summary === -# of expected passes 97887 -# of unexpected failures 536 +# of expected passes 97903 +# of unexpected failures 538 # of unexpected successes 38 # of expected failures 244 -# of unsupported tests 1910 +# of unsupported tests 1892 The failure of pr53199.c is because of different instruction selection for bswap. Test case is reduced to just one function: /* { dg-options -O2 -mcpu=power6 -mavoid-indexed-addresses } */ long long reg_reverse (long long x) { return __builtin_bswap64 (x); } Reload left vs. LRA right: reg_reverse: reg_reverse: srdi 8,3,32 | addi 8,1,-16 rlwinm 7,3,8,0x | srdi 10,3,32 rlwinm 9,8,8,0x | addi 9,8,4 rlwimi 7,3,24,0,7| stwbrx 3,0,8 rlwimi 7,3,24,16,23 | stwbrx 10,0,9 rlwimi 9,8,24,0,7| ld 3,-16(1) rlwimi 9,8,24,16,23 sldi 7,7,32 or 7,7,9 mr 3,7 blrblr This same difference is responsible for the failure of pr58330.c which also uses __builtin_bswap64(). The difference in RTL for the test case is this (after reload vs. 
after LRA): - 11: {%7:DI=bswap(%3:DI);clobber %8:DI;clobber %9:DI;clobber %10:DI;} - 20: %3:DI=%7:DI + 20: %8:DI=%1:DI-0x10 + 21: %8:DI=%8:DI // stupid no-op move + 11: {[%8:DI]=bswap(%3:DI);clobber %9:DI;clobber %10:DI;clobber scratch;} + 19: %3:DI=[%1:DI-0x10] So LRA believes going through memory is better than using a register, even though obviously there are plenty registers available. What LRA does: Creating newreg=129 Removing SCRATCH in insn #11 (nop 2) Creating newreg=130 Removing SCRATCH in insn #11 (nop 3) Creating newreg=131 Removing SCRATCH in insn #11 (nop 4) // at this point the insn would be a bswapdi2_64bit: // 11: {%3:DI=bswap(%3:DI);clobber r129;clobber r130;clobber r131;} // cost calculation for the insn alternatives: 0 Early clobber: reject++ 1 Non-pseudo reload: reject+=2 1 Spill pseudo in memory: reject+=3 2 Scratch win: reject+=2 3 Scratch win: reject+=2 4 Scratch win: reject+=2 alt=0,overall=18,losers=1,rld_nregs=0 0 Non-pseudo reload: reject+=2 0 Spill pseudo in memory: reject+=3 0 Non input pseudo reload: reject++ 2 Scratch win: reject+=2 3 Scratch win: reject+=2 alt=1,overall=16,losers=1,rld_nregs=0 Staticly defined alt reject+=12 0 Early clobber: reject++ 2 Scratch win: reject+=2 3 Scratch win: reject+=2 4 Scratch win: reject+=2 0 Conflict early clobber reload: reject-- alt=2,overall=24,losers=1,rld_nregs=0 Choosing alt 1 in insn 11: (0) Z (1) r (2) b (3) r (4) X {*bswapdi2_64bit} Change to class BASE_REGS for r129 Change to class GENERAL_REGS for r130 Creating newreg=132 from oldreg=3, assigning class NO_REGS to r132 Change to class NO_REGS for r131 11: {r132:DI=bswap(%3:DI);clobber r129:DI;clobber r130:DI;clobber r131:DI;} REG_UNUSED r131:DI REG_UNUSED r130:DI REG_UNUSED r129:DI LRA selects alternative 1 (Z,r,b,r,X) which seems to be the right choice, from looking at the constraints. Reload selects alternative 2 which is slightly^2 discouraged: (??r,r,r,r,r). Is this an improvement or a regression? 
If it's an improvement then these two test cases should be adjusted :-) Ciao! Steven
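The selection step visible in the debug dump follows a simple rule: each alternative accumulates reject points, an overall score is formed from reject plus a penalty per loser, and the smallest overall wins. The factor of 6 below is inferred from the numbers in the dump (12+6=18, 10+6=16, 18+6=24), not quoted from the source. A toy version:

```cpp
#include <vector>
#include <cstddef>
#include <cassert>

struct alternative { int reject; int losers; };

// Pick the insn alternative with the lowest overall score, where
// overall = reject + 6 * losers (weighting inferred from the dump).
// Ties go to the earlier alternative, matching the dump's behavior.
std::size_t choose_alternative (const std::vector<alternative> &alts)
{
  std::size_t best = 0;
  int best_overall = alts[0].reject + 6 * alts[0].losers;
  for (std::size_t i = 1; i < alts.size (); i++)
    {
      int overall = alts[i].reject + 6 * alts[i].losers;
      if (overall < best_overall)
        {
          best_overall = overall;
          best = i;
        }
    }
  return best;
}
```

Feeding in the three reject totals from the dump reproduces the choice of alternative 1, the (Z,r,b,r,X) memory alternative, which is why LRA goes through memory here.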
Re: [wide-int] Avoid some changes in output code
On 11/04/2013 04:12 AM, Richard Biener wrote: On Fri, 1 Nov 2013, Richard Sandiford wrote: I'm building one target for each supported CPU and comparing the wide-int assembly output of gcc.c-torture, gcc.dg and g++.dg with the corresponding output from the merge point. This patch removes all the differences I saw for alpha-linux-gnu in gcc.c-torture. Hunk 1: Preserve the current trunk behaviour in which the shift count is truncated if SHIFT_COUNT_TRUNCATED and not otherwise. This was by inspection after hunk 5. Hunks 2 and 3: Two cases where force_fit_to_type could extend to wider types and where we weren't extending according to the sign of the source. We should probably assert that the input is at least as wide as the type... Hunk 4: The in: if ((TREE_INT_CST_HIGH (arg1) mask_hi) == mask_hi (TREE_INT_CST_LOW (arg1) mask_lo) == mask_lo) had got dropped during the conversion. Hunk 5: The old code was: if (shift 0) { *mask = r1mask.llshift (shift, TYPE_PRECISION (type)); *val = r1val.llshift (shift, TYPE_PRECISION (type)); } else if (shift 0) { shift = -shift; *mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns); *val = r1val.rshift (shift, TYPE_PRECISION (type), !uns); } and these precision arguments had two purposes: to control which bits of the first argument were shifted, and to control the truncation mask for SHIFT_TRUNCATED. We need to pass a width to the shift functions for the second. (BTW, I'm running the comparisons with CONST_WIDE_INT locally moved to the end of the !GENERATOR_FILE list in rtl.def, since the current position caused some spurious differences. The problem AFAICT is that hash_rtx hashes on code, RTL PRE creates registers in the hash order of the associated expressions, RA uses register numbers as a tie-breaker during ordering, and so the order of rtx_code can indirectly influence register allocation. First time I'd realised that could happen, so just thought I'd mention it. 
I think we should keep rtl.def in the current (logical) order though.) Tested on x86_64-linux-gnu and powerpc64-linux-gnu. OK for wide-int? Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED from double-int.c on the trunk? If not then putting this at the callers of wi::rshift and friends is clearly asking for future mistakes. Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else than in machine instruction patterns was a mistake in the past. I disagree. The programmer has the right to expect that the optimizer transforms the program in a manner that is consistent, but faster, with the underlying machine. Saying you are going to do it in the patterns is not realistic. I should point out that i had originally put the shift count truncated inside wide-int and you (richi) told me to push it back out to the callers. I personally think that it should be returned to wide-int to assure consistent behavior. However, i would also say that it would be nice if were actually done right. There are (as far as i know) actually 3 ways handling shift amounts that have been used in the past: (1) mask the shift amount by (bitsize - 1). This is the most popular and is what is done if you set SHIFT_COUNT_TRUNCATED. (2) mask the shift amount by ((2*bitsize) -1). There are a couple of architectures that do this including the ppc. However, gcc has never done it correctly for these platforms. (3) assume that the shift amount can be a big number.The only machine that i know of that ever did this was the Knuth's mmix, and AFAIK, it has never known silicon. (it turns out that this is almost impossible to build efficiently). And yet, we support this if you do not set SHIFT_COUNT_TRUNCATED. I will, if you like, fix this properly. 
But i want to see a plan that at least richi and richard are happy with first. My gut feeling is to provide a target hook that takes a bitsize and a shift value as wide-int (or possibly a hwi) and returns a machine-specific shift amount. Then i would put that inside of the shift routines of wide-int. kenny
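The three conventions listed above can be made concrete with a toy 32-bit shift helper. This is a model of the semantics only, not GCC code, and the convention numbers match the list in the message:

```cpp
#include <cstdint>
#include <cassert>

// Model of the three shift-amount conventions for a 32-bit left shift.
std::uint32_t shift_left_32 (std::uint32_t x, unsigned amount, int convention)
{
  switch (convention)
    {
    case 1:  // SHIFT_COUNT_TRUNCATED style: mask by (bitsize - 1)
      return x << (amount & 31);
    case 2:  // mask by (2*bitsize - 1); amounts in [32,63] shift all bits out
      amount &= 63;
      return amount < 32 ? x << amount : 0;
    default: // convention 3: any amount is meaningful; >= bitsize yields 0
      return amount < 32 ? x << amount : 0;
    }
}
```

Note how an amount of 33 gives different answers under conventions 1 and 2, which is exactly why an optimizer that folds shifts at compile time must know which convention the target implements.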
Re: libsanitizer merge from upstream r191666
+glider, our Mac expert. Hi Jack, This patch has not been tested on Mac and we generally do not claim that gcc-asan is supported on Mac. clang-asan *is* supported on Mac and our bots are green (this patch is a merge of the sources which are regularly tested on Mac, but the build procedure is different). The failing assertion is weird, I haven't seen it since last year (https://code.google.com/p/address-sanitizer/issues/detail?id=117), but Alexander should know more. What is your version of Mac OS? --kcc On Sat, Nov 2, 2013 at 10:25 AM, Jack Howarth howa...@bromo.med.uc.edu wrote: On Fri, Nov 01, 2013 at 04:13:05PM -0700, Konstantin Serebryany wrote: Jukub, This works! New patch attached. Konstantin, This patch, when applied on top of r204305, bootstraps fine on x86_64-apple-darwin12 but produces a lot of regressions with... make -k check RUNTESTFLAGS=asan.exp --target_board=unix'{-m64}' Native configuration is x86_64-apple-darwin12.5.0 === g++ tests === Running target unix/-m64 FAIL: c-c++-common/asan/global-overflow-1.c -O0 execution test FAIL: c-c++-common/asan/global-overflow-1.c -O1 execution test FAIL: c-c++-common/asan/global-overflow-1.c -O2 execution test FAIL: c-c++-common/asan/global-overflow-1.c -O3 -fomit-frame-pointer execution test FAIL: c-c++-common/asan/global-overflow-1.c -O3 -g execution test FAIL: c-c++-common/asan/global-overflow-1.c -Os execution test FAIL: c-c++-common/asan/global-overflow-1.c -O2 -flto -flto-partition=none execution test FAIL: c-c++-common/asan/global-overflow-1.c -O2 -flto execution test FAIL: c-c++-common/asan/heap-overflow-1.c -O0 execution test FAIL: c-c++-common/asan/heap-overflow-1.c -O1 execution test FAIL: c-c++-common/asan/heap-overflow-1.c -O2 execution test FAIL: c-c++-common/asan/heap-overflow-1.c -O3 -fomit-frame-pointer execution test FAIL: c-c++-common/asan/heap-overflow-1.c -O3 -g execution test FAIL: c-c++-common/asan/heap-overflow-1.c -Os execution test FAIL: c-c++-common/asan/heap-overflow-1.c -O2 -flto 
-flto-partition=none execution test FAIL: c-c++-common/asan/heap-overflow-1.c -O2 -flto execution test FAIL: c-c++-common/asan/memcmp-1.c -O0 execution test FAIL: c-c++-common/asan/memcmp-1.c -O1 execution test FAIL: c-c++-common/asan/memcmp-1.c -O2 execution test FAIL: c-c++-common/asan/memcmp-1.c -O3 -fomit-frame-pointer execution test FAIL: c-c++-common/asan/memcmp-1.c -O3 -g execution test FAIL: c-c++-common/asan/memcmp-1.c -Os execution test FAIL: c-c++-common/asan/memcmp-1.c -O2 -flto -flto-partition=none execution test FAIL: c-c++-common/asan/memcmp-1.c -O2 -flto execution test FAIL: c-c++-common/asan/null-deref-1.c -O0 execution test FAIL: c-c++-common/asan/null-deref-1.c -O1 execution test FAIL: c-c++-common/asan/null-deref-1.c -O2 execution test FAIL: c-c++-common/asan/null-deref-1.c -O3 -fomit-frame-pointer execution test FAIL: c-c++-common/asan/null-deref-1.c -O3 -g execution test FAIL: c-c++-common/asan/null-deref-1.c -Os execution test FAIL: c-c++-common/asan/null-deref-1.c -O2 -flto -flto-partition=none execution test FAIL: c-c++-common/asan/null-deref-1.c -O2 -flto execution test FAIL: c-c++-common/asan/sanity-check-pure-c-1.c -O0 execution test FAIL: c-c++-common/asan/sanity-check-pure-c-1.c -O1 execution test FAIL: c-c++-common/asan/sanity-check-pure-c-1.c -O2 execution test FAIL: c-c++-common/asan/sanity-check-pure-c-1.c -O3 -fomit-frame-pointer execution test FAIL: c-c++-common/asan/sanity-check-pure-c-1.c -O3 -g execution test FAIL: c-c++-common/asan/sanity-check-pure-c-1.c -Os execution test FAIL: c-c++-common/asan/sanity-check-pure-c-1.c -O2 -flto -flto-partition=none execution test FAIL: c-c++-common/asan/sanity-check-pure-c-1.c -O2 -flto execution test FAIL: c-c++-common/asan/sleep-before-dying-1.c -O2 execution test FAIL: c-c++-common/asan/sleep-before-dying-1.c -O2 -flto -flto-partition=none execution test FAIL: c-c++-common/asan/sleep-before-dying-1.c -O2 -flto execution test FAIL: c-c++-common/asan/stack-overflow-1.c -O0 execution test 
FAIL: c-c++-common/asan/stack-overflow-1.c -O1 execution test FAIL: c-c++-common/asan/stack-overflow-1.c -O2 execution test FAIL: c-c++-common/asan/stack-overflow-1.c -O3 -fomit-frame-pointer execution test FAIL: c-c++-common/asan/stack-overflow-1.c -O3 -g execution test FAIL: c-c++-common/asan/stack-overflow-1.c -Os execution test FAIL: c-c++-common/asan/stack-overflow-1.c -O2 -flto -flto-partition=none execution test FAIL: c-c++-common/asan/stack-overflow-1.c -O2 -flto execution test FAIL: c-c++-common/asan/strip-path-prefix-1.c -O2 execution test FAIL: c-c++-common/asan/strip-path-prefix-1.c -O2 -flto -flto-partition=none execution test FAIL: c-c++-common/asan/strip-path-prefix-1.c -O2 -flto execution test FAIL: c-c++-common/asan/strncpy-overflow-1.c -O0
Re: Ping^3 Re: Add predefined macros for library use in defining __STDC_IEC_559*
On Mon, Nov 04, 2013 at 02:51:37PM +, Joseph S. Myers wrote: Ping^3. The addition of a new target hook in this patch http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01131.html is still pending review. Ok. Jakub
Re: [wide-int] Avoid some changes in output code
Kenneth Zadeck zad...@naturalbridge.com writes: However, i would also say that it would be nice if were actually done right. There are (as far as i know) actually 3 ways handling shift amounts that have been used in the past: (1) mask the shift amount by (bitsize - 1). This is the most popular and is what is done if you set SHIFT_COUNT_TRUNCATED. (2) mask the shift amount by ((2*bitsize) -1). There are a couple of architectures that do this including the ppc. However, gcc has never done it correctly for these platforms. There's TARGET_SHIFT_TRUNCATION_MASK for these cases. It applies only to the shift instruction patterns though, not to any random rtl shift expression. Part of the reason for this mess is that the things represented as rtl shifts didn't necessarily start out as shifts. E.g. during combine, extractions are rewritten as shifts, optimised, then converted back. So since SHIFT_COUNT_TRUNCATED applies to all rtl shift expressions, it also has to apply to all bitfield operations, since bitfield operations temporarily appear as shifts (see tm.texi). TARGET_SHIFT_TRUNCATION_MASK deliberately limits the scope to the shift instruction patterns because that's easier to contain. E.g. knowing that shifts operate to a certain mask allows you to generate more efficient doubleword shift sequences, which was why TARGET_SHIFT_TRUNCATION_MASK was added in the first place. (3) assume that the shift amount can be a big number.The only machine that i know of that ever did this was the Knuth's mmix, and AFAIK, it has never known silicon. (it turns out that this is almost impossible to build efficiently). And yet, we support this if you do not set SHIFT_COUNT_TRUNCATED. Well, we support it in the sense that we make no assumptions about what happens, at least at the rtl level. We just punt on any out-of-range shift and leave it to be evaluated at run time. Thanks, Richard
Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals
Bernd Edlinger bernd.edlin...@hotmail.de writes: I see another read_line at gcov.c, which seems to be a copy. Maybe this should be changed too? I have seen it as well. I'd rather have the patch be reviewed and everthing, and only then propose to share the implementation with the gcov module. -- Dodji
[omp4]
Hi. While looking over some of your testcases I noticed that array subscripts are not being properly adjusted: foo(int i) { array[i] = } The `i' should use the simd array magic instead of ICEing :). Is the attached patch OK for the branch? Aldy gcc/ChangeLog.gomp * omp-low.c (ipa_simd_modify_function_body): Adjust tree operands on the LHS of an assign. (ipa_simd_modify_function_body_ops_1): New. (ipa_simd_modify_function_body_ops): New. diff --git a/gcc/omp-low.c b/gcc/omp-low.c index d30fb17..94058af 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -11049,6 +11049,35 @@ simd_clone_init_simd_arrays (struct cgraph_node *node, return seq; } +static tree +ipa_simd_modify_function_body_ops_1 (tree *tp, int *walk_subtrees, void *data) +{ + struct walk_stmt_info *wi = (struct walk_stmt_info *) data; + ipa_parm_adjustment_vec *adjustments = (ipa_parm_adjustment_vec *) wi-info; + tree t = *tp; + + if (DECL_P (t) || TREE_CODE (t) == SSA_NAME) +return (tree) sra_ipa_modify_expr (tp, true, *adjustments); + else +*walk_subtrees = 1; + return NULL_TREE; +} + +/* Helper function for ipa_simd_modify_function_body. Make any + necessary adjustments for tree operators. */ + +static bool +ipa_simd_modify_function_body_ops (tree *tp, + ipa_parm_adjustment_vec *adjustments) +{ + struct walk_stmt_info wi; + memset (wi, 0, sizeof (wi)); + wi.info = adjustments; + bool res = (bool) walk_tree (tp, ipa_simd_modify_function_body_ops_1, + wi, NULL); + return res; +} + /* Traverse the function body and perform all modifications as described in ADJUSTMENTS. At function return, ADJUSTMENTS will be modified such that the replacement/reduction value will now be an @@ -11121,6 +11150,11 @@ ipa_simd_modify_function_body (struct cgraph_node *node, case GIMPLE_ASSIGN: t = gimple_assign_lhs_ptr (stmt); modified |= sra_ipa_modify_expr (t, false, adjustments); + + /* The LHS may have operands that also need adjusting +(e.g. `foo' in array[foo]). 
*/ + modified |= ipa_simd_modify_function_body_ops (t, adjustments); + for (i = 0; i gimple_num_ops (stmt); ++i) { t = gimple_op_ptr (stmt, i); diff --git a/gcc/testsuite/gcc.dg/gomp/simd-clones-6.c b/gcc/testsuite/gcc.dg/gomp/simd-clones-6.c new file mode 100644 index 000..8818594 --- /dev/null +++ b/gcc/testsuite/gcc.dg/gomp/simd-clones-6.c @@ -0,0 +1,11 @@ +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */ +/* { dg-options -fopenmp } */ + +/* Test that array subscripts are properly adjusted. */ + +int array[1000]; +#pragma omp declare simd notinbranch simdlen(4) +void foo (int i) +{ + array[i] = 555; +}
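The patch leans on the walk_tree callback protocol: the callback may prune a subtree by clearing *walk_subtrees, and returning a non-null value stops the walk and propagates that value to the caller. A toy tree walker with the same protocol (this is the shape of the mechanism, not GCC's walk_tree):

```cpp
#include <cstddef>
#include <cassert>

struct node { int value; node *left; node *right; };

// Callback contract: set *walk_subtrees = 0 to skip children; return
// non-null to stop the whole walk and hand that node back.
typedef node *(*walk_fn) (node *n, int *walk_subtrees, void *data);

node *walk (node *n, walk_fn fn, void *data)
{
  if (!n)
    return nullptr;
  int walk_subtrees = 1;
  if (node *r = fn (n, &walk_subtrees, data))
    return r;                          // callback requested a stop
  if (!walk_subtrees)
    return nullptr;                    // callback pruned this subtree
  if (node *r = walk (n->left, fn, data))
    return r;
  return walk (n->right, fn, data);
}
```

In the patch, the callback handles DECL_P and SSA_NAME operands itself (so their innards are not walked) and recurses everywhere else, which is how `i` inside `array[i]` gets found and replaced.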
Re: [omp4] fix array subscripts in simd clones
On 11/04/13 08:19, Aldy Hernandez wrote: Hi. While looking over some of your testcases I noticed that array subscripts are not being properly adjusted: foo(int i) { array[i] = } The `i' should use the simd array magic instead of ICEing :). Is the attached patch OK for the branch? Aldy Ughh, my apologies for the lack of subject. Fixed in this followup.
Re: PATCH to use -Wno-format during stage1
On 11/04/2013 07:41 AM, Richard Biener wrote: Please make sure the warning is preserved when not bootstrapping, it's useful to have them in your dev-tree and not notice errors only during -Werror bootstrap ... I found it to be useless noise when not bootstrapping, since my system compiler isn't likely to recognize recently added % codes any time soon. Jason
Re: [C++ PATCH] Fix PR58979
On 11/04/2013 05:50 AM, Marek Polacek wrote: case RO_ARROW: +case RO_ARROW_STAR: error_at (loc, "invalid type argument of %<->%> (have %qT)", type); I think the diagnostic for RO_ARROW_STAR should say ->*, not just ->. Jason
Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals
Jakub Jelinek ja...@redhat.com writes: [...] Eventually, I think using int for sizes is just a ticking bomb, what if somebody uses 2GB long lines? Surely, on 32-bit hosts we are unlikely to handle it, but why couldn't 64-bit host handle it? Column info maybe bogus in there, sure, but at least we should not crash or overflow buffers on it ;). Anyway, not something needed to be fixed right now, but in the future it would be nicer to use size_t and/or ssize_t here. Yes. I initially tried to use size_t but found that I'd need to modify several other places to shutdown many warning because these places where using int :-(. So I felt that would be a battle for later. But I am adding this to my TODO. I'll send a patch later that changes this to size_t then, and adjusts the other places that need it as well. [...] context-last_location = diagnostic-location; s = expand_location_to_spelling_point (diagnostic-location); - line = location_get_source_line (s); + line = location_get_source_line (s, line_width); I think richi didn't like C++ reference arguments to be used that way (and perhaps guidelines don't either), because it isn't immediately obvious that line_width is modified by the call. Can you change it to a pointer argument instead and pass line_width? Sure. I have done the change in the patch below. Sorry for this reflex. I tend to use pointers like these only in places where we can allow them to be NULL. XNEWVEC or XRESIZEVEC will never return NULL though, so it doesn't have to be tested. Though, the question is if that is what we want, caret diagnostics should be optional, if we can't print it, we just won't. Hmmh. This particular bug was noticed because of the explicit OOM message displayed by XNEWVEC/XRESIZEVEC; otherwise, I bet this could have just felt through the crack for a little longer. So I'd say let's just use XNEWVEC/XRESIZEVEC and remove the test, as you first suggested. 
The caret diagnostics functionality as a whole can be disabled with -fno-diagnostic-show-caret. [...] Otherwise, LGTM. Thanks. So here is the patch that bootstraps. gcc/ChangeLog: * input.h (location_get_source_line): Take an additional line_size parameter by reference. * input.c (get_line): New static function definition. (read_line): Take an additional line_length output parameter to be set to the size of the line. Use the new get_line function to compute the size of the line returned by fgets, rather than using strlen. Ensure that the buffer is initially zeroed; ensure that when growing the buffer too. (location_get_source_line): Take an additional output line_len parameter. Update the use of read_line to pass it the line_len parameter. * diagnostic.c (adjust_line): Take an additional input parameter for the length of the line, rather than calculating it with strlen. (diagnostic_show_locus): Adjust the use of location_get_source_line and adjust_line with respect to their new signature. While displaying a line now, do not stop at the first null byte. Rather, display the zero byte as a space and keep going until we reach the size of the line. gcc/testsuite/ChangeLog: * c-c++-common/cpp/warning-zero-in-literals-1.c: New test file. --- gcc/diagnostic.c | 17 ++-- gcc/input.c| 100 ++--- gcc/input.h| 3 +- .../c-c++-common/cpp/warning-zero-in-literals-1.c | Bin 0 - 240 bytes 4 files changed, 99 insertions(+), 21 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c index 36094a1..0ca7081 100644 --- a/gcc/diagnostic.c +++ b/gcc/diagnostic.c @@ -259,12 +259,13 @@ diagnostic_build_prefix (diagnostic_context *context, MAX_WIDTH by some margin, then adjust the start of the line such that the COLUMN is smaller than MAX_WIDTH minus the margin. The margin is either 10 characters or the difference between the column - and the length of the line, whatever is smaller. 
*/ + and the length of the line, whatever is smaller. The length of + LINE is given by LINE_WIDTH. */ static const char * -adjust_line (const char *line, int max_width, int *column_p) +adjust_line (const char *line, int line_width, +int max_width, int *column_p) { int right_margin = 10; - int line_width = strlen (line); int column = *column_p; right_margin = MIN (line_width - column, right_margin); @@ -284,6 +285,7 @@ diagnostic_show_locus (diagnostic_context * context, const diagnostic_info *diagnostic) { const char *line; + int line_width; char *buffer; expanded_location s; int max_width; @@ -297,22 +299,25 @@ diagnostic_show_locus
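The core idea of the patch above is to stop deriving a line's length with strlen, which truncates at the first embedded zero byte, and instead to carry the length alongside the buffer. A minimal sketch of that idea, using a hypothetical helper (not GCC's actual read_line/location_get_source_line):

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Illustrative sketch: read one line from FP, keeping embedded '\0'
// bytes, so the caller gets the true length via .size() rather than
// a strlen result that stops at the first NUL.
std::string read_line_with_len (std::FILE *fp)
{
  std::string line;
  int c;
  while ((c = std::fgetc (fp)) != EOF && c != '\n')
    line.push_back (static_cast<char> (c));  // '\0' bytes are preserved
  return line;  // line.size () is the real line length
}
```

With a line such as "ab\0cd", strlen reports 2 while the true length is 5, which is exactly the discrepancy that caused the original bug.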
Re: PATCH to use -Wno-format during stage1
On 11/04/2013 04:34 PM, Jason Merrill wrote: On 11/04/2013 07:41 AM, Richard Biener wrote: Please make sure the warning is preserved when not bootstrapping, it's useful to have them in your dev-tree and not notice errors only during -Werror bootstrap ... I found it to be useless noise when not bootstrapping, since my system compiler isn't likely to recognize recently added % codes any time soon. Likewise here. Paolo.
Re: [omp4]
On Mon, Nov 04, 2013 at 08:19:12AM -0700, Aldy Hernandez wrote: Hi. While looking over some of your testcases I noticed that array subscripts are not being properly adjusted: foo(int i) { array[i] = } The `i' should use the simd array magic instead of ICEing :). Is the attached patch OK for the branch? I guess short time yes, but I wonder if it wouldn't be better to use walk_gimple_op and do all changes in the callback. Instead of passing adjustments pass around a struct containing the adjustments, current stmt and the modified flag. You can use the val_only and is_lhs to determine what you need to do (probably need to reset those two for the subtrees to val_only = true, is_lhs = false and not walk subtrees of types) and you could (if not val_only) immediately gimplify it properly (insert temporary SSA_NAME setter before resp. store after depending on is_lhs). Then you could avoid the regimplification. Jakub
Re: PATCH to use -Wno-format during stage1
On Mon, Nov 04, 2013 at 10:34:55AM -0500, Jason Merrill wrote: On 11/04/2013 07:41 AM, Richard Biener wrote: Please make sure the warning is preserved when not bootstrapping, it's useful to have them in your dev-tree and not notice errors only during -Werror bootstrap ... I found it to be useless noise when not bootstrapping, since my system compiler isn't likely to recognize recently added % codes any time soon. That's true, but it is easy to ignore those, while without -Wformat you won't notice until you bootstrap that you actually passed incorrect parameters to some *printf etc. family function. Generally, in non-bootstrap compilers I tend to ignore warnings except for the file(s) I've been hacking on. Jakub
Re: [PATCH, rs6000] (3/3) Fix mulv4si3 and mulv8hi3 patterns for little endian
Bill Schmidt wschm...@linux.vnet.ibm.com writes: + /* We need this to be vmulouh for both big and little endian, + but for little endian we would swap this, so avoid that. */ + if (BYTES_BIG_ENDIAN) + emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two)); + else + emit_insn (gen_vec_widen_umult_even_v8hi (low_product, one, two)); FWIW, an alternative would be to turn vec_widen_smult_{even,odd}_* into define_expands and have define_insns for the underlying instructions. E.g. vec_widen_umult_even_v16qi could call gen_vmuleub or gen_vmuloub depending on endianness. Then the unspec name would always match the instruction, and you could also use gen_vmulouh rather than gen_vec_widen_umult_*_v8hi above. It probably works out as more code overall, but maybe it means jumping through fewer mental hoops... Thanks, Richard
Re: [C++ PATCH] Fix PR58979
On Mon, Nov 04, 2013 at 10:36:35AM -0500, Jason Merrill wrote: On 11/04/2013 05:50 AM, Marek Polacek wrote: case RO_ARROW: +case RO_ARROW_STAR: error_at (loc, "invalid type argument of %<->%> (have %qT)", type); I think the diagnostic for RO_ARROW_STAR should say ->*, not just ->. Ah, sorry about that. What about this one? 2013-11-04 Marek Polacek pola...@redhat.com PR c++/58979 c-family/ * c-common.c (invalid_indirection_error): Handle RO_ARROW_STAR case. testsuite/ * g++.dg/diagnostic/pr58979.C: New test. --- gcc/c-family/c-common.c.mp 2013-11-04 16:44:48.674023654 +0100 +++ gcc/c-family/c-common.c 2013-11-04 16:45:52.632267904 +0100 @@ -9934,6 +9934,11 @@ invalid_indirection_error (location_t lo "invalid type argument of %<->%> (have %qT)", type); break; +case RO_ARROW_STAR: + error_at (loc, + "invalid type argument of %<->*%> (have %qT)", + type); + break; case RO_IMPLICIT_CONVERSION: error_at (loc, "invalid type argument of implicit conversion (have %qT)", --- gcc/testsuite/g++.dg/diagnostic/pr58979.C.mp2 2013-11-04 11:09:30.515553664 +0100 +++ gcc/testsuite/g++.dg/diagnostic/pr58979.C 2013-11-04 11:09:26.505538785 +0100 @@ -0,0 +1,4 @@ +// PR c++/58979 +// { dg-do compile } + +int i = 0->*0; // { dg-error "invalid type argument of" } Marek
Re: [C++ PATCH] Fix PR58979
OK. Jason
Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp
Balaji, I am seeing a large number of libcilkrts failures on AIX. These all are of the form: Executing on host: /tmp/20131103/gcc/xgcc -B/tmp/20131103/gcc/ /nasfarm/edelsohn/src/src/gcc/testsuite/c-c++-common/cilk-plus/CK/sync_wo_spawn.c -fno-diagnostics-show-caret -fdiagnostics-color=never -O0 -flto -g -fcilkplus -L/tmp/20131103/powerpc-ibm-aix7.1.0.0/./libcilkrts/.libs -fcilkplus -S -o sync_wo_spawn.s (timeout = 300) cc1: error: LTO support has not been enabled in this configuration compiler exited with status 1 If the feature / functionality requires LTO, the tests should check that LTO is supported with check_effective_target_lto, either in each relevant test or in cilk-plus.exp. Thanks, David
patch to fix pr58968
The following patch fixes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58968 It is actually the 3rd try to fix the problem. Removing no-op moves may result in mode-switching crashes on x86/x86-64 as it expects a specific code (by the way, reload has the same problem on x86/x86-64). The patch was successfully bootstrapped and tested on x86/x86-64 and ppc64 (with LRA). Committed as rev. 204353. 2013-11-04 Vladimir Makarov vmaka...@redhat.com PR rtl-optimization/58968 * lra-spills.c (return_regno_p): New function. (lra_final_code_change): Use it. 2013-11-04 Vladimir Makarov vmaka...@redhat.com PR rtl-optimization/58968 * gfortran.dg/pr58968.f: New Index: lra-spills.c === --- lra-spills.c (revision 204352) +++ lra-spills.c (working copy) @@ -618,6 +618,33 @@ alter_subregs (rtx *loc, bool final_p) return res; } +/* Return true if REGNO is used for return in the current + function. */ +static bool +return_regno_p (unsigned int regno) +{ + rtx outgoing = crtl->return_rtx; + + if (! outgoing) +return false; + + if (REG_P (outgoing)) +return REGNO (outgoing) == regno; + else if (GET_CODE (outgoing) == PARALLEL) +{ + int i; + + for (i = 0; i < XVECLEN (outgoing, 0); i++) + { + rtx x = XEXP (XVECEXP (outgoing, 0, i), 0); + + if (REG_P (x) && REGNO (x) == regno) + return true; + } +} + return false; +} + /* Final change of pseudos got hard registers into the corresponding hard registers and removing temporary clobbers. 
*/ void @@ -625,7 +652,7 @@ lra_final_code_change (void) { int i, hard_regno; basic_block bb; - rtx insn, curr, set; + rtx insn, curr; int max_regno = max_reg_num (); for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++) @@ -636,7 +663,6 @@ lra_final_code_change (void) FOR_BB_INSNS_SAFE (bb, insn, curr) if (INSN_P (insn)) { - bool change_p; rtx pat = PATTERN (insn); if (GET_CODE (pat) == CLOBBER && LRA_TEMP_CLOBBER_P (pat)) @@ -649,12 +675,24 @@ lra_final_code_change (void) continue; } - set = single_set (insn); - change_p = (set != NULL && REG_P (SET_SRC (set)) && REG_P (SET_DEST (set)) && REGNO (SET_SRC (set)) >= FIRST_PSEUDO_REGISTER && REGNO (SET_DEST (set)) >= FIRST_PSEUDO_REGISTER); - + /* IRA can generate move insns involving pseudos. It is + better to remove them earlier to speed up the compiler a bit. + It is also better to do it here as they might not pass + final RTL check in LRA, (e.g. insn moving a control + register into itself). So remove a useless move insn + unless the next insn is a USE marking the return reg (we should + save this as some subsequent optimizations assume that + such original insns are saved). */ + if (NONJUMP_INSN_P (insn) && GET_CODE (pat) == SET + && REG_P (SET_SRC (pat)) && REG_P (SET_DEST (pat)) + && REGNO (SET_SRC (pat)) == REGNO (SET_DEST (pat)) + && ! return_regno_p (REGNO (SET_SRC (pat)))) + { + lra_invalidate_insn_data (insn); + delete_insn (insn); + continue; + } + lra_insn_recog_data_t id = lra_get_insn_recog_data (insn); struct lra_static_insn_data *static_id = id->insn_static_data; bool insn_change_p = false; @@ -668,20 +706,5 @@ lra_final_code_change (void) } if (insn_change_p) lra_update_operator_dups (id); - - if (change_p && REGNO (SET_SRC (set)) == REGNO (SET_DEST (set))) - { - /* Remove a useless move insn but only involving - pseudos as some subsequent optimizations are based on - that move insns involving originally hard registers - are preserved. IRA can generate move insns involving - pseudos. It is better to remove them earlier to speed - up the compiler a bit. 
It is also better to do it here - as they might not pass final RTL check in LRA, - (e.g. insn moving a control register into - itself). */ - lra_invalidate_insn_data (insn); - delete_insn (insn); - } } } Index: testsuite/gfortran.dg/pr58968.f === --- testsuite/gfortran.dg/pr58968.f (revision 0) +++ testsuite/gfortran.dg/pr58968.f (working copy) @@ -0,0 +1,96 @@ +C PR rtl-optimization/58968.f +C { dg-do compile { target powerpc*-*-*} } +C { dg-options -mcpu=power7 -O3 -w -ffast-math -funroll-loops } + SUBROUTINE MAKTABS(IW,SOME,LBOX1,LBOX2,LBOX3,NSPACE,NA,NB, + *LBST,X, + *NX,IAMA,IAMI,IBMA,IBMI,MNUM,IDIM,MSTA,IBO, + *IDSYM,ISYM1,NSYM, + *NACT,LWRK,KTAB,LGMUL, + *LCON,LCOA,LCOB, + *LANDET,LBNDET,NAST,NBST,LSYMA,LSYMB,LGCOM, + *MINI,MAXI,LSPA,LSPB,LDISB, + *LSAS,LSBS,LSAC,LSBC, + *ITGA,ITGB,IAST,IBST,NCI,NA1EX,NB1EX,FDIRCT) + IMPLICIT DOUBLE PRECISION(A-H,O-Z) + LOGICAL SOME +
Re: [omp4]
On Mon, Nov 04, 2013 at 09:22:39AM -0700, Aldy Hernandez wrote: Ok, then let's leave this for later. I'm following what we've done in tree-sra.c to be consistent. Perhaps later we could even clean up ipa_sra_modify_function_body in tree-sra.c. But for now I think we can concentrate on the basic functionality. Well, ipa_sra_modify_function_body doesn't need to regimplify, unlike the omp-low.c version thereof (supposedly because SRA only modifies loads/stores, but doesn't change SSA_NAMEs on arithmetics etc.). But yeah, this isn't requirement for hacking on this on gomp-4_0-branch, but it would be nice to clean it up before merging to trunk. Committing to omp4 branch. Thanks. Jakub
Re: Aliasing: look through pointer's def stmt
On Mon, 4 Nov 2013, Richard Biener wrote: Well, host_integer_p (, 0) looks correct to me. But ... Ok, I'll put it back. Index: gcc/tree-ssa-alias.h === --- gcc/tree-ssa-alias.h(revision 204267) +++ gcc/tree-ssa-alias.h(working copy) @@ -139,30 +139,30 @@ extern void pt_solution_set_var (struct extern void dump_pta_stats (FILE *); extern GTY(()) struct pt_solution ipa_escaped_pt; /* Return true, if the two ranges [POS1, SIZE1] and [POS2, SIZE2] overlap. SIZE1 and/or SIZE2 can be (unsigned)-1 in which case the range is open-ended. Otherwise return false. */ static inline bool -ranges_overlap_p (unsigned HOST_WIDE_INT pos1, - unsigned HOST_WIDE_INT size1, - unsigned HOST_WIDE_INT pos2, - unsigned HOST_WIDE_INT size2) +ranges_overlap_p (HOST_WIDE_INT pos1, + HOST_WIDE_INT size1, + HOST_WIDE_INT pos2, + HOST_WIDE_INT size2) I think size[12] should still be unsigned (we don't allow negative sizes but only the special value -1U which we special-case only to avoid pos + size to wrap) But all we do with size[12] is compare it to -1 and perform arithmetic with pos[12]. At least for the second one, we need to cast size to a signed type so the comparisons make sense (unless we decide to use a double_int there). So I thought I would do the cast in the argument. Otherwise, I'll start the function with: HOST_WIDE_INT ssize1 = (HOST_WIDE_INT)size1; and replace size1 with ssize1 through the function. Is the reason of keeping size[12] unsigned for documentation? Or am I missing a reason why I may be breaking things? -- Marc Glisse
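The special role of size == (unsigned)-1 in ranges_overlap_p is what makes the signedness question above delicate: the sentinel value means "open-ended", and pos + size must not be allowed to wrap. A simplified, self-contained model of the overlap test (illustrative only, not GCC's exact implementation):

```cpp
#include <cassert>

// Simplified model of gcc's ranges_overlap_p: do [pos1, pos1+size1)
// and [pos2, pos2+size2) overlap?  A size equal to (unsigned long)-1
// marks an open-ended range, so we must test for the sentinel before
// doing the pos + size arithmetic that could otherwise wrap.
static bool
ranges_overlap_p (unsigned long pos1, unsigned long size1,
                  unsigned long pos2, unsigned long size2)
{
  if (pos1 >= pos2
      && (size2 == (unsigned long) -1 || pos1 < pos2 + size2))
    return true;
  if (pos2 >= pos1
      && (size1 == (unsigned long) -1 || pos2 < pos1 + size1))
    return true;
  return false;
}
```

Casting the sizes to a signed type, as discussed above, changes which of these comparisons is safe, which is why the sentinel handling has to move along with any signedness change.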
[C++ Patch, Java related/RFC] PR 11006
Hi, we have got this very old bug, where we ICE in build_java_class_ref because TYPE is an INTEGER_TYPE and we try to use TYPE_FIELDS on it. Shall we apply something like the below and resolve it for good? Alternatively, we could maybe provide the same message we currently provide in release builds, when we get to: if (!field) { error ("can%'t find %<class$%> in %qT", type); return error_mark_node; } Thanks! Paolo. // Index: cp/init.c === --- cp/init.c (revision 204353) +++ cp/init.c (working copy) @@ -2461,9 +2461,16 @@ build_new_1 (vec<tree, va_gc> **placement, tree ty if (vec_safe_is_empty (*placement) && TYPE_FOR_JAVA (elt_type)) { tree class_addr; - tree class_decl = build_java_class_ref (elt_type); + tree class_decl; static const char alloc_name[] = "_Jv_AllocObject"; + if (!MAYBE_CLASS_TYPE_P (elt_type)) + { + error ("%qT isn%'t a valid Java class type", elt_type); + return error_mark_node; + } + + class_decl = build_java_class_ref (elt_type); if (class_decl == error_mark_node) return error_mark_node; Index: testsuite/g++.dg/other/java3.C === --- testsuite/g++.dg/other/java3.C (revision 0) +++ testsuite/g++.dg/other/java3.C (working copy) @@ -0,0 +1,7 @@ +// PR c++/11006 + +typedef int* jclass; + +void foo () { + new __java_boolean; // { dg-error "valid" } +}
Re: libsanitizer merge from upstream r191666
On Mon, Nov 04, 2013 at 06:47:13AM -0800, Konstantin Serebryany wrote: +glider, our Mac expert. Hi Jack, This patch has not been tested on Mac and we generally do not claim that gcc-asan is supported on Mac. clang-asan *is* supported on Mac and our bots are green (this patch is a merge of the sources which are regularly tested on Mac, but the build procedure is different). The failing assertion is weird, I haven't seen it since last year (https://code.google.com/p/address-sanitizer/issues/detail?id=117), but Alexander should know more. What is your version of Mac OS? I am testing on MacOS X 10.8.5 with FSF gcc trunk built as... % gcc-fsf-4.9 -v Using built-in specs. COLLECT_GCC=gcc-fsf-4.9 COLLECT_LTO_WRAPPER=/sw/lib/gcc4.9/libexec/gcc/x86_64-apple-darwin12.5.0/4.9.0/lto-wrapper Target: x86_64-apple-darwin12.5.0 Configured with: ../gcc-4.9-20131101/configure --prefix=/sw --prefix=/sw/lib/gcc4.9 --mandir=/sw/share/man --infodir=/sw/lib/gcc4.9/info --enable-languages=c,c++,fortran,lto,objc,obj-c++,java --with-gmp=/sw --with-libiconv-prefix=/sw --with-isl=/sw --with-cloog=/sw --with-mpc=/sw --with-system-zlib --enable-checking=yes --x-includes=/usr/X11R6/include --x-libraries=/usr/X11R6/lib --program-suffix=-fsf-4.9 Thread model: posix gcc version 4.9.0 20131101 (experimental) (GCC) --kcc On Sat, Nov 2, 2013 at 10:25 AM, Jack Howarth howa...@bromo.med.uc.edu wrote: On Fri, Nov 01, 2013 at 04:13:05PM -0700, Konstantin Serebryany wrote: Jukub, This works! New patch attached. Konstantin, This patch, when applied on top of r204305, bootstraps fine on x86_64-apple-darwin12 but produces a lot of regressions with... 
make -k check RUNTESTFLAGS=asan.exp --target_board=unix'{-m64}' Native configuration is x86_64-apple-darwin12.5.0 === g++ tests === Running target unix/-m64 FAIL: c-c++-common/asan/global-overflow-1.c -O0 execution test FAIL: c-c++-common/asan/global-overflow-1.c -O1 execution test FAIL: c-c++-common/asan/global-overflow-1.c -O2 execution test FAIL: c-c++-common/asan/global-overflow-1.c -O3 -fomit-frame-pointer execution test FAIL: c-c++-common/asan/global-overflow-1.c -O3 -g execution test FAIL: c-c++-common/asan/global-overflow-1.c -Os execution test FAIL: c-c++-common/asan/global-overflow-1.c -O2 -flto -flto-partition=none execution test FAIL: c-c++-common/asan/global-overflow-1.c -O2 -flto execution test FAIL: c-c++-common/asan/heap-overflow-1.c -O0 execution test FAIL: c-c++-common/asan/heap-overflow-1.c -O1 execution test FAIL: c-c++-common/asan/heap-overflow-1.c -O2 execution test FAIL: c-c++-common/asan/heap-overflow-1.c -O3 -fomit-frame-pointer execution test FAIL: c-c++-common/asan/heap-overflow-1.c -O3 -g execution test FAIL: c-c++-common/asan/heap-overflow-1.c -Os execution test FAIL: c-c++-common/asan/heap-overflow-1.c -O2 -flto -flto-partition=none execution test FAIL: c-c++-common/asan/heap-overflow-1.c -O2 -flto execution test FAIL: c-c++-common/asan/memcmp-1.c -O0 execution test FAIL: c-c++-common/asan/memcmp-1.c -O1 execution test FAIL: c-c++-common/asan/memcmp-1.c -O2 execution test FAIL: c-c++-common/asan/memcmp-1.c -O3 -fomit-frame-pointer execution test FAIL: c-c++-common/asan/memcmp-1.c -O3 -g execution test FAIL: c-c++-common/asan/memcmp-1.c -Os execution test FAIL: c-c++-common/asan/memcmp-1.c -O2 -flto -flto-partition=none execution test FAIL: c-c++-common/asan/memcmp-1.c -O2 -flto execution test FAIL: c-c++-common/asan/null-deref-1.c -O0 execution test FAIL: c-c++-common/asan/null-deref-1.c -O1 execution test FAIL: c-c++-common/asan/null-deref-1.c -O2 execution test FAIL: c-c++-common/asan/null-deref-1.c -O3 -fomit-frame-pointer execution 
test FAIL: c-c++-common/asan/null-deref-1.c -O3 -g execution test FAIL: c-c++-common/asan/null-deref-1.c -Os execution test FAIL: c-c++-common/asan/null-deref-1.c -O2 -flto -flto-partition=none execution test FAIL: c-c++-common/asan/null-deref-1.c -O2 -flto execution test FAIL: c-c++-common/asan/sanity-check-pure-c-1.c -O0 execution test FAIL: c-c++-common/asan/sanity-check-pure-c-1.c -O1 execution test FAIL: c-c++-common/asan/sanity-check-pure-c-1.c -O2 execution test FAIL: c-c++-common/asan/sanity-check-pure-c-1.c -O3 -fomit-frame-pointer execution test FAIL: c-c++-common/asan/sanity-check-pure-c-1.c -O3 -g execution test FAIL: c-c++-common/asan/sanity-check-pure-c-1.c -Os execution test FAIL: c-c++-common/asan/sanity-check-pure-c-1.c -O2 -flto -flto-partition=none execution test FAIL: c-c++-common/asan/sanity-check-pure-c-1.c -O2 -flto execution test FAIL: c-c++-common/asan/sleep-before-dying-1.c -O2 execution test FAIL: c-c++-common/asan/sleep-before-dying-1.c -O2 -flto -flto-partition=none execution test FAIL:
Re: PR 58958: wrong aliasing info
On Mon, 4 Nov 2013, Richard Biener wrote: On Mon, Nov 4, 2013 at 12:18 PM, Marc Glisse marc.gli...@inria.fr wrote: On Mon, 4 Nov 2013, Richard Biener wrote: On Mon, Nov 4, 2013 at 11:55 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Nov 1, 2013 at 11:39 PM, Marc Glisse marc.gli...@inria.fr wrote: Hello, the issue was described in the PR and the message linked from there. ao_ref_init_from_ptr_and_size calls get_ref_base_and_extent, which may detect an array_ref of variable index, but ao_ref_init_from_ptr_and_size never learns of it and uses the offset+size as if they were meaningful. Well... it certainly learns of it, but it chooses to ignore... if (TREE_CODE (ptr) == ADDR_EXPR) -ref->base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0), -&ref->offset, &t1, &t2); +{ + ref->base = get_addr_base_and_unit_offset_1 (TREE_OPERAND (ptr, 0), + &t, 0, &safe); + ref->offset = BITS_PER_UNIT * t; +} safe == (t1 != -1 && t1 == t2) note that ao_ref_init_from_ptr_and_size gets the size fed in as argument so I fail to see why it matters at all ...? That is, if you feed in a wrong size then it's the caller's error. I think one issue is that the above code uses get_ref_base_and_extent on an address. It should have used get_addr_base_and_unit_offset. Isn't that what my patch does? Except that get_addr_base_and_unit_offset often gives up and returns NULL_TREE, whereas I believe we still want a base even if we have trouble determining a constant offset, so I modified get_addr_base_and_unit_offset_1 a bit. (not saying the result is pretty...) I don't think we want get_addr_base_and_unit_offset to return non-NULL for a non-constant offset. But yes, inside your patch is the correct fix, so if you drop changing get_addr_base_and_unit_offset ... That means that in a number of cases, ao_ref_init_from_ptr_and_size will produce a meaningless ao_ref (both .ref and .base are 0). I expect that will cause regressions in code quality at least. Or did you mean something else? 
get_inner_reference might be an option too (we have quite a few functions doing similar things). -- Marc Glisse
Re: [C++ Patch, Java related/RFC] PR 11006
Surely it should be valid to allocate a Java boolean type. Andrew, how should that work? Jason
Re: PATCH to use -Wno-format during stage1
On 11/04/2013 10:50 AM, Jakub Jelinek wrote: That's true, but it is easy to ignore those, while without -Wformat you won't notice until you bootstrap that you actually passed incorrect parameters to some *printf etc. family function. Sure, but that is much less common than the false positives, and will be caught anyway when you actually bootstrap (or rebuild stage 3). Jason
Re: [Patch, avr] Emit diagnostics if -f{pic,PIC,pie,PIE} or -shared is passed
As Senthil Kumar Selvaraj wrote: If ok, could someone commit please? I don't have commit access. I don't have commit access either, but the changes look reasonable to me. -- Joerg Wunsch * Development engineer, Dresden, Germany Atmel Automotive GmbH, Theresienstrasse 2, D-74027 Heilbronn Geschäftsführung: Steven Laub, Steve Skaggs, Dr. Matthias Kästner Amtsgericht Stuttgart, Registration HRB 106594
Re: [PATCH, rs6000] (2/3) Fix widening multiply high/low operations for little endian
Per Richard S's suggestion, I'm reworking parts 1 and 3 of the patch set, but this one will remain unchanged and is ready for review. Thanks, Bill On Sun, 2013-11-03 at 23:34 -0600, Bill Schmidt wrote: Hi, This patch fixes the widening multiply high/low operations to work correctly in the presence of the first patch of this series, which reverses the meanings of multiply even/odd instructions. Here we reorder the input operands to the vector merge low/high instructions. The general rule is that vmrghh(x,y) [BE] = vmrglh(y,x) [LE], and so on; that is, we need to reverse the usage of merge high and merge low, and also swap their inputs, to obtain the same semantics. In this case we are only swapping the inputs, because the reversed usage of high and low has already been done for us in the generic handling code for VEC_WIDEN_MULT_LO_EXPR. Bootstrapped and tested with the rest of the patch set on powerpc64{,le}-unknown-linux-gnu, with no regressions. Is this ok for trunk? Thanks, Bill 2013-11-03 Bill Schmidt wschm...@linux.vnet.ibm.com * config/rs6000/altivec.md (vec_widen_umult_hi_v16qi): Swap arguments to merge instruction for little endian. (vec_widen_umult_lo_v16qi): Likewise. (vec_widen_smult_hi_v16qi): Likewise. (vec_widen_smult_lo_v16qi): Likewise. (vec_widen_umult_hi_v8hi): Likewise. (vec_widen_umult_lo_v8hi): Likewise. (vec_widen_smult_hi_v8hi): Likewise. (vec_widen_smult_lo_v8hi): Likewise. 
Index: gcc/config/rs6000/altivec.md === --- gcc/config/rs6000/altivec.md (revision 204192) +++ gcc/config/rs6000/altivec.md (working copy) @@ -2185,7 +2235,10 @@ emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) +emit_insn (gen_altivec_vmrghh (operands[0], ve, vo)); + else +emit_insn (gen_altivec_vmrghh (operands[0], vo, ve)); DONE; }) @@ -2202,7 +2255,10 @@ emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) +emit_insn (gen_altivec_vmrglh (operands[0], ve, vo)); + else +emit_insn (gen_altivec_vmrglh (operands[0], vo, ve)); DONE; }) @@ -2219,7 +2275,10 @@ emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) +emit_insn (gen_altivec_vmrghh (operands[0], ve, vo)); + else +emit_insn (gen_altivec_vmrghh (operands[0], vo, ve)); DONE; }) @@ -2236,7 +2295,10 @@ emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) +emit_insn (gen_altivec_vmrglh (operands[0], ve, vo)); + else +emit_insn (gen_altivec_vmrglh (operands[0], vo, ve)); DONE; }) @@ -2253,7 +2315,10 @@ emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) +emit_insn (gen_altivec_vmrghw (operands[0], ve, vo)); + else +emit_insn (gen_altivec_vmrghw 
(operands[0], vo, ve)); DONE; }) @@ -2270,7 +2335,10 @@ emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) +emit_insn (gen_altivec_vmrglw (operands[0], ve, vo)); + else +emit_insn (gen_altivec_vmrglw (operands[0], vo, ve)); DONE; }) @@ -2287,7 +2355,10 @@ emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) +emit_insn (gen_altivec_vmrghw (operands[0], ve, vo)); + else +emit_insn (gen_altivec_vmrghw (operands[0], vo, ve)); DONE; }) @@ -2304,7 +2375,10 @@ emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) +emit_insn (gen_altivec_vmrglw (operands[0], ve, vo)); + else +
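The merge rule the patch relies on, vmrghh(x,y) [BE] = vmrglh(y,x) [LE], can be checked with a small array model: with big-endian lane numbering, merge-high interleaves lanes 0..3 of each operand and merge-low lanes 4..7, while little endian views the same register with the lane order reversed. The names below are illustrative, not real intrinsics:

```cpp
#include <array>
#include <cassert>

typedef std::array<unsigned short, 8> v8hi;

// BE-semantics merge-high: interleave lanes 0..3 of a and b.
static v8hi mrgh (const v8hi &a, const v8hi &b)
{
  v8hi r;
  for (int i = 0; i < 4; i++) { r[2*i] = a[i]; r[2*i+1] = b[i]; }
  return r;
}

// BE-semantics merge-low: interleave lanes 4..7 of a and b.
static v8hi mrgl (const v8hi &a, const v8hi &b)
{
  v8hi r;
  for (int i = 0; i < 4; i++) { r[2*i] = a[i+4]; r[2*i+1] = b[i+4]; }
  return r;
}

// Little-endian view of a register: the lane order is reversed.
static v8hi rev (const v8hi &a)
{
  v8hi r;
  for (int i = 0; i < 8; i++) r[i] = a[7 - i];
  return r;
}
```

Asserting rev (mrgh (x, y)) == mrgl (rev (y), rev (x)) (and the high/low dual) confirms the "swap high/low and swap the inputs" rule that the altivec.md changes above implement.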
Re: Patch RFA: With -fnon-call-exceptions sync builtins may throw
Ian Lance Taylor i...@google.com wrote: On Mon, Nov 4, 2013 at 3:52 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Nov 4, 2013 at 7:01 AM, Ian Lance Taylor i...@google.com wrote: The middle-end currently marks all the sync builtins as nothrow. That is incorrect for most of them when using -fnon-call-exceptions. The -fnon-call-exceptions option means that instructions that trap may throw exceptions. Most of the sync builtins access memory via pointers passed by user code. Therefore, they may trap. (I noticed this because Go uses -fnon-call-exceptions.) This patch fixes the problem by introducing a new internal function attribute nothrow call. The new attribute is exactly like nothrow, except that it is ignored when using -fnon-call-exceptions. It is an internal attribute because there is no reason for a user to use this. The problem only arises when the middle-end sees a function call that the backend turns into an instruction sequence that directly references memory. That can only happen for functions that are built in to the compiler. This requires changes to the C family, LTO, and Ada frontends. I think I'm handling the option correctly for LTO, but it would be good if somebody could check the logic for me. Bootstrapped and ran tests on x86_64-unknown-linux-gnu. OK for mainline? Hmm. I think you can handle this in a simpler way by just doing sth similar to #define ATTR_MATHFN_FPROUNDING_ERRNO (flag_errno_math ? \ ATTR_NOTHROW_LEAF_LIST : ATTR_MATHFN_FPROUNDING) that is, conditionally drop the _NOTHROW part. Good point. Btw, I also think that if it can throw then it isn't LEAF (it may return exceptionally into another function in the unit). According to the docs, a leaf function is permitted to return by throwing an exception. Also this whole conditional-on-a-flag thing doesn't work well with using -fnon-call-exceptions in the optimize attribute or with different -fnon-call-exceptions settings in different TUs we LTO together. 
Because we only have a single builtin decl (so for proper operation you'd have to clone builtin decls based on the matrix of flags that generate different attributes and use the correct one depending on context). Yes, but as you know this problem already exists with -fexceptions. I'm not really making it worse. That said, I'd prefer a simpler approach for now, mimicking the flag_errno_math handling. Done. Attached. Bootstrapped on x86_64-unknown-linux-gnu, running tests now. OK for mainline if the tests pass? Ok Thanks, Richard Ian gcc/ChangeLog: 2013-11-04 Ian Lance Taylor i...@google.com * builtins.def (ATTR_NOTHROWCALL_LEAF_LIST): Define. * sync-builtins.def: Use ATTR_NOTHROWCALL_LEAF_LIST for all sync builtins that take pointers. * lto-opts.c (lto_write_options): Write -fnon-call-exceptions if set. * lto-wrapper.c (merge_and_complain): Collect OPT_fnon_call_exceptions. (run_gcc): Pass -fnon-call-exceptions. gcc/testsuite/ChangeLog: 2013-11-03 Ian Lance Taylor i...@google.com * g++.dg/ext/sync-4.C: New test.
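The flag_errno_math-style scheme agreed on above amounts to selecting the attribute list at the point the builtin is defined, based on the flag. A minimal sketch with stub values (the macro name mirrors the ChangeLog, but the enum and flag here are illustrative stand-ins, not GCC's real definitions):

```cpp
#include <cassert>

// Stand-in for the -fnon-call-exceptions flag.
static bool flag_non_call_exceptions = false;

// Stand-ins for GCC's attribute lists.
enum attr_list { ATTR_LEAF_LIST, ATTR_NOTHROW_LEAF_LIST };

// When -fnon-call-exceptions is in effect, the NOTHROW part is
// dropped, because a sync builtin that dereferences a user pointer
// may trap and therefore throw.
#define ATTR_NOTHROWCALL_LEAF_LIST \
  (flag_non_call_exceptions ? ATTR_LEAF_LIST : ATTR_NOTHROW_LEAF_LIST)
```

As discussed above, this inherits the existing limitation that there is a single builtin decl per function, so differing flag settings across TUs (e.g. under LTO) see whichever attribute list was chosen when the decl was built.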
Re: lto-plugin: mismatch between ld's architecture and GCC's configure --host
Ping. To sum it up, with these patches applied, there are no changes for a regular build (not using the new configure options). On the other hand, configuring GCC as described, it is possible use the 32-bit x86 linker for/with a x86_64 build, and get the very same GCC test results as when using a x86_64 linker. Sorry, I've been hoping someone with more global approval authority would respond. Allow overriding the libiberty used for building the LTO plugin. lto-plugin/ * configure.ac (--with-libiberty): New configure option. * configure: Regenerate. * Makefile.am (libiberty, libiberty_pic): New variables. (liblto_plugin_la_LIBADD, liblto_plugin_la_LDFLAGS) (liblto_plugin_la_DEPENDENCIES): Use them. * Makefile.in: Regenerate. These look OK to me. Allow for overriding a module's srcdir. * Makefile.tpl (configure-[+prefix+][+module+]) (configure-stage[+id+]-[+prefix+][+module+]): Allow for overriding a module's srcdir. * Makefile.in: Regenerate. These look OK, but I think a global maintainer or build machinery maintainer should give approval. Non-host system configuration for linker plugins. * configure.ac (--enable-linker-plugin-flags) (--enable-linker-plugin-configure-flags): New flags. (configdirs): Conditionally add libiberty-linker-plugin. * configure: Regenerate. * Makefile.def (host_modules): Add libiberty-linker-plugin. (host_modules) lto-plugin: Pay attention to @extra_linker_plugin_flags@ and @extra_linker_plugin_configure_flags@. (all-lto-plugin): Also depend on all-libiberty-linker-plugin. * Makefile.in: Regenerate. gcc/ * doc/install.texi (--enable-linker-plugin-flags) (--enable-linker-plugin-configure-flags): Document new flags. Same here. GNU ld can use linker plugins, too. gcc/ * doc/sourcebuild.texi (Top Level) lto-plugin: GNU ld can use linker plugins, too. OK. Fix typo. * Makefile.tpl: Fix typo. * Makefile.in: Regenerate. OK (qualifies as trivial and obvious). -cary
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On 11/01/13 15:41, Jakub Jelinek wrote: On Fri, Nov 01, 2013 at 05:36:34PM -0400, Andrew MacLeod wrote: static inline void ! gimple_call_set_lhs (gimple gs, tree lhs) { - GIMPLE_CHECK (gs, GIMPLE_CALL); gimple_set_op (gs, 0, lhs); to static inline void ! gimple_call_set_lhs (gimple_statement_call *gs, tree lhs) { gimple_set_op (gs, 0, lhs); but then every location that calls it needs an appropriate change: ! gimple call; ! call = gimple_build_call_vec (build_fold_addr_expr_loc (0, alias), vargs); gimple_call_set_lhs (call, atree); --- 1518,1524 ! gimple_statement_call *call; ! call = as_a <gimple_statement_call> (gimple_build_call_vec (build_fold_addr_expr_loc (0, alias), vargs)); gimple_call_set_lhs (call, atree); And in fact there is a ripple effect to then change gimple_build_call_vec to simply return a gimple_statement_call *... Then this doesn't look as ugly either... ! gimple_statement_call *call; ! call = gimple_build_call_vec (build_fold_addr_expr_loc (0, alias), vargs); gimple_call_set_lhs (call, atree); that is looking much better :-) Do you seriously think this is an improvement? The cost of changing the --enable-checking=yes cost to compile time checking in either case sounds way too high to me. Please don't. ?!? One of the things we're really trying to do here is be type safe and use the type system to check for things at compile time rather than with runtime checks. jeff
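The static-typing argument can be sketched outside GCC. The names below are hypothetical stand-ins, not the real gimple classes: with a subclass pointer in the signature, passing the wrong statement kind fails at compile time instead of tripping a runtime check.

```cpp
#include <cassert>

/* Stand-ins (hypothetical, not GCC's real types) for the trade-off
   discussed above: runtime tag checks vs. compile-time type checks.  */
struct stmt { int code; };                 /* base, like 'gimple' */
struct call_stmt : stmt { int lhs; };      /* subclass, like gimple_statement_call */

enum { CODE_CALL = 1 };

/* Runtime-checked style: accepts any stmt*, verifies the tag when run.  */
inline void set_lhs_checked (stmt *s, int lhs)
{
  assert (s->code == CODE_CALL);           /* GIMPLE_CHECK analogue */
  static_cast<call_stmt *> (s)->lhs = lhs;
}

/* Statically-checked style: a non-call statement simply doesn't convert,
   so the mistake is a compile error and no runtime check is needed.  */
inline void set_lhs_typed (call_stmt *s, int lhs)
{
  s->lhs = lhs;
}

/* Builder returning the subclass -- the 'ripple effect' change proposed
   for gimple_build_call_vec -- so callers need no cast at all.  */
inline call_stmt *make_call ()
{
  call_stmt *c = new call_stmt;
  c->code = CODE_CALL;
  c->lhs = 0;
  return c;
}
```

With the builder returning `call_stmt *`, the call site needs neither a cast nor a runtime check, which is the "looking much better" version in the thread.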
Re: [PATCH] Try to avoid vector mode punning in SET_DEST on i?86
On Thu, Oct 31, 2013 at 04:49:47PM +, Jakub Jelinek wrote: 2013-10-31 Jakub Jelinek ja...@redhat.com * optabs.c (expand_vec_perm): Avoid vector mode punning SUBREGs in SET_DEST. * expmed.c (store_bit_field_1): Likewise. * config/i386/sse.md (movdi_to_sse, vec_pack_sfix_trunc_v2df, vec_pack_sfix_v2df, vec_shl_mode, vec_shr_mode, vec_interleave_highmode, vec_interleave_lowmode): Likewise. * config/i386/i386.c (ix86_expand_vector_move_misalign, ix86_expand_sse_movcc, ix86_expand_int_vcond, ix86_expand_vec_perm, ix86_expand_sse_unpack, ix86_expand_args_builtin, ix86_expand_vector_init_duplicate, ix86_expand_vector_set, emit_reduc_half, expand_vec_perm_blend, expand_vec_perm_pshufb, expand_vec_perm_interleave2, expand_vec_perm_pshufb2, expand_vec_perm_vpshufb2_vpermq, expand_vec_perm_vpshufb2_vpermq_even_odd, expand_vec_perm_even_odd_1, expand_vec_perm_broadcast_1, expand_vec_perm_vpshufb4_vpermq2, ix86_expand_sse2_mulv4si3, ix86_expand_pinsr): Likewise. (expand_vec_perm_palignr): Likewise. Modify a copy of *d rather than *d itself. --- gcc/optabs.c.jj 2013-10-29 09:25:45.0 +0100 +++ gcc/optabs.c2013-10-31 13:20:40.384808642 +0100 @@ -6674,7 +6674,7 @@ expand_vec_perm (enum machine_mode mode, } tmp = gen_rtx_CONST_VECTOR (qimode, vec); sel = gen_lowpart (qimode, sel); - sel = expand_vec_perm (qimode, sel, sel, tmp, NULL); + sel = expand_vec_perm (qimode, gen_reg_rtx (qimode), sel, tmp, NULL); gcc_assert (sel != NULL); /* Add the byte offset to each byte element. */ This hunk causes issues on AArch64 and ARM. We look to see which permute operation we should generate in aarch64_expand_vec_perm_const. If we notice that all elements would select from op0 we copy op0 to op1 and generate appropriate code. With this hunk applied we end up selecting from the register generated by gen_reg_rtx (qimode), rather than from 'sel' as we intended. Thus we lose the value of 'sel' and everything starts to go wrong!
The hunk looks suspicious to me (why do we pick a register out of thin air?), and reverting it fixes the problems I see. Could you give me a pointer as to what this hunk fixes on i?86? I don't think I can fix this up in the expander very easily? Thanks, James
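A scalar model of the two-operand permute (illustrative only, not the RTL expander) shows why the fresh register is a problem: selector values below the vector length read lanes of the first operand, so substituting an uninitialized register for 'sel' in the op0 slot means exactly those lanes read junk.

```cpp
#include <cassert>
#include <vector>

/* Scalar model of vec_perm (op0, op1, sel): each selector element picks
   a lane from the concatenation of op0 and op1.  If op0 is replaced by
   a fresh uninitialized register, every selector value < n selects an
   undefined lane instead of a lane of the intended input.  */
std::vector<int> vec_perm_model (const std::vector<int> &op0,
                                 const std::vector<int> &op1,
                                 const std::vector<int> &sel)
{
  const int n = (int) op0.size ();
  std::vector<int> out;
  out.reserve (sel.size ());
  for (int s : sel)
    out.push_back (s < n ? op0[s] : op1[s - n]);   /* lane select */
  return out;
}
```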
Re: [Patch, ObjC, ObjC++, Darwin] Remove -fobjc-sjlj-exceptions
On Sun, 3 Nov 2013, Iain Sandoe wrote: gcc/c-family: * c-opts.c (c_common_post_options): Remove handling of flag_objc_sjlj_exceptions. * c.opt (fobjc-sjlj-exceptions): Issue a warning if this option is given. The c-family changes are OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [driver] Force options Wa, Wl, Wp, to take a mandatory argument
On Mon, 4 Nov 2013, Mingjie Xing wrote: Hello, This patch forces options Wa, Wl, Wp, to take a mandatory argument, which can fix the bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55651. Tested on i686-pc-linux-gnu. 2013-11-04 Mingjie Xing mingjie.x...@gmail.com * common.opt (Wa, Wl, Wp,): Change JoinedOrMissing to Joined. Is it OK? As discussed in the thread starting at http://gcc.gnu.org/ml/gcc-patches/2010-04/msg00848.html, this is incorrect; these options should pass through empty strings unchanged. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Introducing SAD (Sum of Absolute Differences) operation to GCC vectorizer.
On Mon, Nov 4, 2013 at 2:06 AM, James Greenhalgh james.greenha...@arm.com wrote: On Fri, Nov 01, 2013 at 04:48:53PM +, Cong Hou wrote: diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 2a5a2e1..8f5d39a 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -4705,6 +4705,16 @@ wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or wider than the mode of the product. The result is placed in operand 0, which is of the same mode as operand 3. +@cindex @code{ssad@var{m}} instruction pattern +@item @samp{ssad@var{m}} +@cindex @code{usad@var{m}} instruction pattern +@item @samp{usad@var{m}} +Compute the sum of absolute differences of two signed/unsigned elements. +Operand 1 and operand 2 are of the same mode. Their absolute difference, which +is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode +equal or wider than the mode of the absolute difference. The result is placed +in operand 0, which is of the same mode as operand 3. + @cindex @code{ssum_widen@var{m3}} instruction pattern @item @samp{ssum_widen@var{m3}} @cindex @code{usum_widen@var{m3}} instruction pattern diff --git a/gcc/expr.c b/gcc/expr.c index 4975a64..1db8a49 100644 I'm not sure I follow, and if I do - I don't think it matches what you have implemented for i386. From your text description I would guess the series of operations to be: v1 = widen (operands[1]) v2 = widen (operands[2]) v3 = abs (v1 - v2) operands[0] = v3 + operands[3] But if I understand the behaviour of PSADBW correctly, what you have actually implemented is: v1 = widen (operands[1]) v2 = widen (operands[2]) v3 = abs (v1 - v2) v4 = reduce_plus (v3) operands[0] = v4 + operands[3] To my mind, synthesizing the reduce_plus step will be wasteful for targets who do not get this for free with their Absolute Difference step. 
Imagine a simple loop where we have synthesized the reduce_plus: we compute partial sums each loop iteration, though we would do better to leave the reduce_plus step until after the loop. REDUC_PLUS_EXPR would be the appropriate Tree code for this. What do you mean when you use synthesizing here? For each pattern, the only synthesized operation is the one being returned from the pattern recognizer. In this case, it is USAD_EXPR. The recognition of reduce sum is necessary as we need corresponding prolog and epilog for reductions, which is already done before pattern recognition. Note that reduction is not a pattern but is a type of vector definition. A vectorization pattern can still be a reduction operation as long as STMT_VINFO_RELATED_STMT of this pattern is a reduction operation. You can check the other two reduction patterns: widen_sum_pattern and dot_prod_pattern for reference. Thank you for your comment! Cong I would prefer to see this Tree code not imply the reduce_plus. Thanks, James
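A scalar reference (a sketch of the second behaviour James describes, not GCC code) makes the difference concrete: the absolute differences are reduced across all elements before the single accumulation into operand 3.

```cpp
#include <cassert>
#include <cstdint>

/* Scalar model (illustrative only) of usad with the built-in reduction:
   widen each element, take absolute differences, reduce-plus across the
   vector, then add operand 3.  */
int32_t usad_acc (const uint8_t *op1, const uint8_t *op2, int n, int32_t op3)
{
  int32_t sum = 0;                      /* reduce_plus accumulator */
  for (int i = 0; i < n; i++)
    {
      int32_t a = op1[i];               /* widen */
      int32_t b = op2[i];
      sum += a > b ? a - b : b - a;     /* absolute difference */
    }
  return sum + op3;                     /* accumulate into operand 3 */
}
```

In the first semantics James lists, the per-element widened absolute differences would instead be kept as a vector and added element-wise to operand 3, with no cross-lane reduction.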
Re: changing a collision resolution strategy of the symbol table of identifiers
2013/10/20 Ondřej Bílka nel...@seznam.cz: On Sun, Oct 20, 2013 at 06:55:40PM +0600, Roman Gareev wrote: Dear gcc contributors, Recently I came across the list of ideas for speeding up GCC (http://gcc.gnu.org/wiki/Speedup_areas). Among others, it was suggested to replace the identifier hash table with another data structure. Please find attached a patch that switches the collision resolution strategy of the hash table used in the symbol table of identifiers from open addressing to separate chaining with a linked list. I would be very grateful for your comments, feedback and ideas about further enhancements to gcc, in which I could take part. I am pursuing a master of science degree and looking for a thesis theme that is compiler related. Below are results of testing and the patch. During testing of the linux kernel (3.8.8) compilation time, the acquired results were the following: an increase of 0.17% for version 4.8.0, an increase of 1.12% for version 4.8.1, and a decrease of 0.598% for trunk (these are average values). Tested x86_64-unknown-linux-gnu, applying to 4.8.0, 4.8.1 and trunk. diff --git a/gcc/stringpool.c b/gcc/stringpool.c index f4d0dae..cc696f7 100644 --- a/gcc/stringpool.c +++ b/gcc/stringpool.c @@ -241,12 +241,8 @@ gt_pch_nx (unsigned char *x, gt_pointer_operator op, void *cookie) /* SPD is saved in the PCH file and holds the information needed to restore the string pool. */ -struct GTY(()) string_pool_data { - ht_identifier_ptr * - GTY((length (%h.nslots), Mail client broke it? Sorry, an error occurred somewhere. Below is an attachment with the patch. patch Description: Binary data
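For readers unfamiliar with the two strategies, here is a minimal sketch (illustrative names, not GCC's ht_* code): separate chaining resolves a collision by walking a per-bucket linked list, where open addressing would instead probe other slots of the same array.

```cpp
#include <cassert>
#include <string>

/* A node in a bucket's chain; one per interned identifier.  */
struct node { std::string key; node *next; };

struct chained_table
{
  enum { nbuckets = 64 };
  node *buckets[nbuckets];

  chained_table () { for (node *&b : buckets) b = nullptr; }

  /* djb2-style string hash; any reasonable hash works here.  */
  static unsigned hash (const std::string &s)
  {
    unsigned h = 5381;
    for (char c : s) h = h * 33 + (unsigned char) c;
    return h;
  }

  /* Intern KEY: walk the chain for its bucket; on a miss, prepend a
     new node.  Collisions never touch other buckets, unlike open
     addressing, where a collision probes further slots of the array.  */
  node *insert (const std::string &key)
  {
    unsigned b = hash (key) % nbuckets;
    for (node *n = buckets[b]; n; n = n->next)
      if (n->key == key)
        return n;                         /* already interned */
    node *n = new node { key, buckets[b] };
    buckets[b] = n;
    return n;
  }
};
```

Interning the same identifier twice must return the same node, which is the invariant the symbol table relies on.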
[PATCH] Optional alternative base_expr in finding basis for CAND_REFs
Hi, This patch extends the slsr pass to optionally use an alternative base expression in finding a basis for CAND_REFs. Currently the pass uses a hash-based algorithm to match the base_expr in a candidate. Given a test case like the following, slsr will not be able to recognize that the two CAND_REFs have the same basis, as their base_exprs are different SSA_NAMEs: typedef int arr_2[20][20]; void foo (arr_2 a2, int i, int j) { a2[i][j] = 1; a2[i + 10][j] = 2; } The gimple dump before slsr is like the following (using an arm-none-eabi gcc): i.0_2 = (unsigned int) i_1(D); _3 = i.0_2 * 80; _5 = a2_4(D) + _3; *_5[j_7(D)] = 1; _9 = _3 + 800; _10 = a2_4(D) + _9; *_10[j_7(D)] = 2; Here are the dumps for the two CAND_REFs generated for the two statements pointed to by the arrows: 4 [2] _5 = a2_4(D) + _3; ADD : a2_4(D) + (80 * i_1(D)) : int[20] * basis: 0 dependent: 0 sibling: 0 next-interp: 0 dead-savings: 0 8 [2] *_10[j_7(D)] = 2; REF : _10 + ((sizetype) j_7(D) * 4) + 0 : int[20] * basis: 5 dependent: 0 sibling: 0 next-interp: 0 dead-savings: 0 As mentioned previously, slsr cannot establish that candidate 4 is the basis for candidate 8, as they have different base_exprs: a2_4(D) and _10, respectively. However, the two references actually only differ by an immediate offset (800). This patch uses the tree affine combination facilities to create an optional alternative base expression to be used in finding (as well as recording) the basis. It calls tree_to_aff_combination_expand on base_expr, resets the offset field of the generated aff_tree to 0 and generates a tree from it by calling aff_combination_to_tree. The new tree is recorded as a potential basis, and when find_basis_for_candidate fails to find a basis for a CAND_REF in its normal approach, it searches again using a tree expanded in this way. Such an expanded tree usually discloses the expression behind an SSA_NAME.
In the example above, instead of seeing the strength reduction candidate chains like this: _5 -> 5 _10 -> 8 we now have: _5 -> 5 _10 -> 8 a2_4(D) + (sizetype) i_1(D) * 80 -> 5 -> 8 With the candidates 5 and 8 linked to the same tree expression (a2_4(D) + (sizetype) i_1(D) * 80), slsr is now able to establish that 5 is the basis of 8. The patch doesn't attempt to change the content of any CAND_REF though. It only enables CAND_REFs which (1) have the same stride and (2) have underlying expressions of their base_expr that differ only in immediate offsets, to be recognized to have the same basis. The statements with such CAND_REFs will be lowered to MEM_REFs, and later on the RTL expander shall be able to fold and re-associate the immediate offsets to the rightmost side of the addressing expression, and therefore expose the common sub-expression successfully. The code-gen difference of the example code on arm with -O2 -mcpu=cortex-a15 is: mov r3, r1, asl #6 - add ip, r0, r2, asl #2 str lr, [sp, #-4]! + mov ip, #1 + mov lr, #2 add r1, r3, r1, asl #4 - mov lr, #1 - mov r3, #2 add r0, r0, r1 - add r0, r0, #800 - str lr, [ip, r1] - str r3, [r0, r2, asl #2] + add r3, r0, r2, asl #2 + str ip, [r0, r2, asl #2] + str lr, [r3, #800] ldr pc, [sp], #4 One fewer instruction in this simple case. The example used in the illustration is too simple to show a code-gen difference on x86_64, but the included test case will show the benefit of the patch quite obviously. The patch has passed * bootstrapping on arm and x86_64 * regtest on arm-none-eabi, aarch64-none-elf and x86_64 There is no regression in SPEC2K on arm or x86_64. OK to commit to the trunk? Any comment is welcomed! Thanks, Yufeng gcc/ * gimple-ssa-strength-reduction.c: Include tree-affine.h. (find_basis_for_base_expr): Update comment. (find_basis_for_candidate): Add new parameter 'alt_base_expr' of type 'tree'. Optionally call find_basis_for_base_expr with 'alt_base_expr'.
(record_potential_basis): Add new parameter 'alt_base_expr' of type 'tree'; set node->base_expr with 'alt_base_expr' if it is not NULL. (name_expansions): New static variable. (get_alternative_base): New function. (alloc_cand_and_find_basis): Call get_alternative_base for CAND_REF. Update calls to find_basis_for_candidate and record_potential_basis. (execute_strength_reduction): Call free_affine_expand_cache with name_expansions. gcc/testsuite/ * gcc.dg/tree-ssa/slsr-41.c: New test. diff --git a/gcc/gimple-ssa-strength-reduction.c b/gcc/gimple-ssa-strength-reduction.c index 9a5072c..3150046 100644 --- a/gcc/gimple-ssa-strength-reduction.c +++ b/gcc/gimple-ssa-strength-reduction.c @@ -48,6
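The 800-byte constant at the heart of the example can be checked directly: a row of arr_2 is 20 * sizeof(int) bytes (80 with 4-byte int, an assumption of this sketch), so the two stores differ by exactly 10 rows regardless of i and j, which is why the two CAND_REFs can share a basis once the SSA name is expanded.

```cpp
#include <cassert>
#include <cstddef>

typedef int arr_2[20][20];

/* Byte distance between the two addresses stored to in the example's
   foo().  It is a compile-time constant (10 rows * 20 * sizeof(int)
   = 800 bytes on a 4-byte-int target), independent of i and j -- the
   property the alternative base expression exposes to slsr.  */
std::ptrdiff_t ref_delta (arr_2 a2, int i, int j)
{
  return (char *) &a2[i + 10][j] - (char *) &a2[i][j];
}
```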
Re: [PATCH, MPX, 2/X] Pointers Checker [7/25] Suppress BUILT_IN_CHKP_ARG_BND optimizations.
Richard Biener richard.guent...@gmail.com wrote: On Thu, Oct 31, 2013 at 10:02 AM, Ilya Enkovich enkovich@gmail.com wrote: Hi, Here is a patch which handles the problem with optimization of BUILT_IN_CHKP_ARG_BND calls. Pointer Bounds Checker expects that the argument of this call is a default SSA_NAME of the PARM_DECL whose bounds we want to get. The problem is in optimizations which may replace the arg with its copy or a known value. This patch suppresses such modifications. This doesn't seem like a good fix. I suppose you require the same on RTL, that is, have the incoming arg reg coalesced with the use? In that case better set SSA_NAME_OCCURS_IN_ABNORMAL_PHI. Btw, I would have chosen P_2 = __builtin_xyz (p_1, bound) Call (p_2) Thus make the builtins a transparent pass-through which effectively binds parameters to their bounds, removing the need for the artificial arguments and making propagation a non-issue. Also, how does the feature interact with other extensions such as nested functions or optimizations like inlining? Richard. Richard. Thanks, Ilya -- gcc/ 2013-10-28 Ilya Enkovich ilya.enkov...@intel.com * tree-into-ssa.c: Include target.h (rewrite_update_stmt): Skip BUILT_IN_CHKP_ARG_BND calls. * tree-ssa-dom.c: Include target.h (cprop_into_stmt): Skip BUILT_IN_CHKP_ARG_BND calls. diff --git a/gcc/tree-into-ssa.c b/gcc/tree-into-ssa.c index 981e9f4..8d48f6d 100644 --- a/gcc/tree-into-ssa.c +++ b/gcc/tree-into-ssa.c @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see #include "params.h" #include "diagnostic-core.h" #include "tree-into-ssa.h" +#include "target.h" /* This file builds the SSA form for a function as described in: @@ -1921,8 +1922,14 @@ rewrite_update_stmt (gimple stmt, gimple_stmt_iterator gsi) } /* Rewrite USES included in OLD_SSA_NAMES and USES whose underlying - symbol is marked for renaming. */ - if (rewrite_uses_p (stmt)) + symbol is marked for renaming. + Skip calls to BUILT_IN_CHKP_ARG_BND whose arg should never be + renamed.
*/ + if (rewrite_uses_p (stmt) + && !(flag_check_pointer_bounds + && (gimple_code (stmt) == GIMPLE_CALL) + && gimple_call_fndecl (stmt) + == targetm.builtin_chkp_function (BUILT_IN_CHKP_ARG_BND))) { if (is_gimple_debug (stmt)) { diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c index 211bfcf..445278a 100644 --- a/gcc/tree-ssa-dom.c +++ b/gcc/tree-ssa-dom.c @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3. If not see #include "params.h" #include "tree-ssa-threadedge.h" #include "tree-ssa-dom.h" +#include "target.h" /* This file implements optimizations on the dominator tree. */ @@ -2266,6 +2267,16 @@ cprop_into_stmt (gimple stmt) use_operand_p op_p; ssa_op_iter iter; + /* Call used to obtain bounds of input arg by Pointer Bounds Checker + should not be optimized. Argument of the call is a default + SSA_NAME of PARM_DECL. It should never be replaced by value. */ + if (flag_check_pointer_bounds && gimple_code (stmt) == GIMPLE_CALL) +{ + tree fndecl = gimple_call_fndecl (stmt); + if (fndecl == targetm.builtin_chkp_function (BUILT_IN_CHKP_ARG_BND)) + return; +} + FOR_EACH_SSA_USE_OPERAND (op_p, stmt, iter, SSA_OP_USE) cprop_operand (stmt, op_p); }
Re: [C++ Patch, Java related/RFC] PR 11006
On 11/04/2013 06:21 PM, Jason Merrill wrote: Surely it should be valid to allocate a Java boolean type. Andrew, how should that work? Thanks. The problem we are facing (assuming we want to resolve this old issue) is that something seems seriously broken for the builtin Java types as declared in record_builtin_java_type, not just __java_boolean: all of them are just INTEGER_TYPEs or FLOAT_TYPEs. I'm wondering if we should special case the mangling for those in build_java_class_ref, something morally like the below. Then the testcase doesn't ICE anymore and if we add some missing declarations (definitely a _Jv_AllocObject) it even compiles. But my knowledge of Java is extremely weak, I have no idea how this is going to work. Paolo. / Index: init.c === --- init.c (revision 204355) +++ init.c (working copy) @@ -3068,22 +3068,29 @@ build_java_class_ref (tree type) jclass_node = TREE_TYPE (jclass_node); } - /* Mangle the class$ field. */ - { -tree field; -for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field)) - if (DECL_NAME (field) == CL_suffix) + if (MAYBE_CLASS_TYPE_P (type)) +{ + /* Mangle the class$ field. */ + tree field; + for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field)) + if (DECL_NAME (field) == CL_suffix) + { + mangle_decl (field); + name = DECL_ASSEMBLER_NAME (field); + break; + } + if (!field) { - mangle_decl (field); - name = DECL_ASSEMBLER_NAME (field); - break; + error ("can%'t find %<class$%> in %qT", type); + return error_mark_node; } -if (!field) - { - error ("can%'t find %<class$%> in %qT", type); - return error_mark_node; - } - } +} + else +{ + /* Standard Java types per record_builtin_java_type.
*/ mangle_decl (TYPE_NAME (type)); + name = DECL_ASSEMBLER_NAME (TYPE_NAME (type)); +} class_decl = IDENTIFIER_GLOBAL_VALUE (name); if (class_decl == NULL_TREE) extern "Java" { namespace java { namespace lang { class Class; class Object; } } } typedef struct java::lang::Object* jobject; typedef class java::lang::Class* jclass; extern "C" jobject _Jv_AllocObject (jclass) __attribute__((__malloc__)); void foo () { new __java_boolean; }
[PATCH] PR58985: testcase error.
Hi, This is to fix the testcase error reported in PR58985. The intention of the testcase was to ensure there were no REG_EQUIV notes generated for a reg which was used in a paradoxical subreg. When the target was x86, there was a subreg generated, so I omitted adding the subreg to the regexp pattern. However there is no subreg generated for target cris-axis-elf, so REG_EQUIV should be allowed. Is it ok for trunk and the gcc-4.8 branch? Thanks, Wei Mi. 2013-11-04 Wei Mi w...@google.com PR regression/58985 * testsuite/gcc.dg/pr57518.c: Add subreg in regexp pattern. Index: testsuite/gcc.dg/pr57518.c === --- testsuite/gcc.dg/pr57518.c (revision 204353) +++ testsuite/gcc.dg/pr57518.c (working copy) @@ -2,7 +2,7 @@ /* { dg-do compile } */ /* { dg-options -O2 -fdump-rtl-ira } */ -/* { dg-final { scan-rtl-dump-not REG_EQUIV\[^\n\]*mem\[^\n\]*\ip\ ira } } */ +/* { dg-final { scan-rtl-dump-not REG_EQUIV\[^\n\]*mem\[^\n\]*\ip\.*subreg ira } } */ char ip[10]; int total;
Re: changing a collision resolution strategy of the symbol table of identifiers
2013/10/31 Florian Weimer fwei...@redhat.com: On 10/20/2013 02:55 PM, Roman Gareev wrote: During testing of the linux kernel (3.8.8) compilation time, the acquired results were the following: an increase of 0.17% for version 4.8.0, an increase of 1.12% for version 4.8.1, a decrease of 0.598% for trunk (these are average values). Can you share the raw numbers? Are the differences statistically significant? -- Florian Weimer / Red Hat Product Security Team These are the raw numbers. I'll share results of checking for statistical significance as soon as more compilations are made. results Description: Binary data
RFA: patch to fix PR58967
The following patch fixes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58967 The removed code is too old. To be honest, I even don't remember why I added this. LRA has been changed a lot since this change and now it works fine without it. There is no test for this case as it is too big. The patch was successfully bootstrapped on ppc64 (with LRA) and tested on ppc{32|64}. Is it ok to commit it to the trunk? 2013-11-04 Vladimir Makarov vmaka...@redhat.com PR rtl-optimization/58967 * config/rs6000/rs6000.c (legitimate_lo_sum_address_p): Remove !lra_in_progress for mode sizes bigger than word. Index: config/rs6000/rs6000.c === --- config/rs6000/rs6000.c (revision 204305) +++ config/rs6000/rs6000.c (working copy) @@ -6388,7 +6388,7 @@ legitimate_lo_sum_address_p (enum machin return false; if (GET_MODE_NUNITS (mode) != 1) return false; - if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD + if (GET_MODE_SIZE (mottp://gcc.gnu.org/bugzilla/show_bug.cgi?id=58967de) > UNITS_PER_WORD && !(/* ??? Assume floating point reg based on mode? */ TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT && (mode == DFmode || mode == DDmode)))
Re: RFA: patch to fix PR58967
On Mon, Nov 4, 2013 at 2:23 PM, Vladimir Makarov vmaka...@redhat.com wrote: The following patch fixes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58967 The removed code is too old. To be honest, I even don't remember why I added this. LRA has been changed a lot since this change and now it works fine without it. There is no test for this case as it is too big. The patch was successfully bootstrapped on ppc64 (with LRA) and tested on ppc{32|64}. Is it ok to commit it to the trunk? 2013-11-04 Vladimir Makarov vmaka...@redhat.com PR rtl-optimization/58967 * config/rs6000/rs6000.c (legitimate_lo_sum_address_p): Remove !lra_in_progress for mode sizes bigger than word. Index: config/rs6000/rs6000.c === --- config/rs6000/rs6000.c (revision 204305) +++ config/rs6000/rs6000.c (working copy) @@ -6388,7 +6388,7 @@ legitimate_lo_sum_address_p (enum machin return false; if (GET_MODE_NUNITS (mode) != 1) return false; - if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD + if (GET_MODE_SIZE (mottp://gcc.gnu.org/bugzilla/show_bug.cgi?id=58967de) > UNITS_PER_WORD This is okay, assuming the weird bugzilla URL paste shown in the patch is not really included. Thanks, David
Re: [PATCH, rs6000] (2/3) Fix widening multiply high/low operations for little endian
On Mon, Nov 4, 2013 at 12:34 AM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, This patch fixes the widening multiply high/low operations to work correctly in the presence of the first patch of this series, which reverses the meanings of multiply even/odd instructions. Here we reorder the input operands to the vector merge low/high instructions. The general rule is that vmrghh(x,y) [BE] = vmrglh(y,x) [LE], and so on; that is, we need to reverse the usage of merge high and merge low, and also swap their inputs, to obtain the same semantics. In this case we are only swapping the inputs, because the reversed usage of high and low has already been done for us in the generic handling code for VEC_WIDEN_MULT_LO_EXPR. Bootstrapped and tested with the rest of the patch set on powerpc64{,le}-unknown-linux-gnu, with no regressions. Is this ok for trunk? Thanks, Bill 2013-11-03 Bill Schmidt wschm...@linux.vnet.ibm.com * config/rs6000/altivec.md (vec_widen_umult_hi_v16qi): Swap arguments to merge instruction for little endian. (vec_widen_umult_lo_v16qi): Likewise. (vec_widen_smult_hi_v16qi): Likewise. (vec_widen_smult_lo_v16qi): Likewise. (vec_widen_umult_hi_v8hi): Likewise. (vec_widen_umult_lo_v8hi): Likewise. (vec_widen_smult_hi_v8hi): Likewise. (vec_widen_smult_lo_v8hi): Likewise. This patch is okay. I agree with Richard's suggestion to address the other parts of the patch by restructuring the patterns. Thanks, David