Re: [Patch, libstdc++/63920] Fix regex_constants::match_not_null behavior
On Mon, Nov 24, 2014 at 3:17 AM, Jonathan Wakely jwak...@redhat.com wrote:

OK for trunk - thanks.

Committed. :) Thanks!

--
Regards,
Tim Shen
Re: [PATCH] Fix find_base_term in 32-bit -fpic code (PR lto/64025)
On Tue, Nov 25, 2014 at 8:40 AM, Uros Bizjak ubiz...@gmail.com wrote:

On Tue, Nov 25, 2014 at 12:25 AM, Jakub Jelinek ja...@redhat.com wrote:

The fallback delegitimization I've added as last option mainly for debug info purposes, when we don't know if the base is a PIC register or say a PIC register plus some addend, unfortunately in some tests broke find_base_term, which for PLUS looks only at the first operand and recursion on it finds a base term, it returns it immediately. So, it found base term of _GLOBAL_OFFSET_TABLE_, when the right base term is actually in the second operand.

This patch fixes it by swapping the operands, debug info doesn't care about the order, it won't match in any instruction anyway, but helps alias.c.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-11-24 Jakub Jelinek ja...@redhat.com

    PR lto/64025
    * config/i386/i386.c (ix86_delegitimize_address): Ensure result
    comes before (addend - _GLOBAL_OFFSET_TABLE_) term.

Can you also swap operands of (%ecx - %ebx) + foo? There is no point digging into RTX involving registers only when we know that we are looking for foo. This will also be consistent with the code you patched below.

Something like attached prototype patch.

Uros.

Index: i386.c
===================================================================
--- i386.c (revision 218037)
+++ i386.c (working copy)
@@ -14847,19 +14847,20 @@ ix86_delegitimize_address (rtx x)
         leal (%ebx, %ecx, 4), %ecx
         ...
         movl foo@GOTOFF(%ecx), %edx
-     in which case we return (%ecx - %ebx) + foo
-     or (%ecx - _GLOBAL_OFFSET_TABLE_) + foo if pseudo_pic_reg
+     in which case we return foo + (%ecx - %ebx)
+     or foo + (%ecx - _GLOBAL_OFFSET_TABLE_) if pseudo_pic_reg
      and reload has completed.  */
   if (pic_offset_table_rtx
       && (!reload_completed || !ix86_use_pseudo_pic_reg ()))
-    result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend),
-                                                 pic_offset_table_rtx),
-                           result);
+    result = gen_rtx_PLUS (Pmode, result,
+                           gen_rtx_MINUS (Pmode, copy_rtx (addend),
+                                          pic_offset_table_rtx));
   else if (pic_offset_table_rtx && !TARGET_MACHO && !TARGET_VXWORKS_RTP)
     {
       rtx tmp = gen_rtx_SYMBOL_REF (Pmode, GOT_SYMBOL_NAME);
-      tmp = gen_rtx_MINUS (Pmode, copy_rtx (addend), tmp);
-      result = gen_rtx_PLUS (Pmode, tmp, result);
+      result = gen_rtx_PLUS (Pmode, result,
+                             gen_rtx_MINUS (Pmode, copy_rtx (addend),
+                                            tmp));
     }
   else
     return orig_x;
Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
On Mon, Nov 24, 2014 at 10:28 PM, James Greenhalgh james.greenha...@arm.com wrote:

On Fri, Nov 14, 2014 at 02:43:12AM +0000, Bin.Cheng wrote:

On Fri, Nov 7, 2014 at 7:13 AM, Jeff Law l...@redhat.com wrote:

On 11/05/14 02:30, Bin.Cheng wrote:

Thanks very much for reviewing. I refined the patch according to your comments. Also made two small changes:
a) skip breaking dependency between memory access and the corresponding base-reg modifying instruction. This feature doesn't help load/store pair that much and only increases compilation time.
b) a minor bug fix in arm backend hook when calculating priority for memory accesses with minus offset.

I am running bootstrap/test against latest trunk, and will adapt ChangeLog once get approved generally. So how about this one?

OK for the trunk. Thanks for your patience.

Jeff

Thanks for reviewing. For the record, attached patch is committed. The only update is I disabled the pass if peephole2 isn't in effect because it relies on peephole2 to do real fusion work.

Hi Bin,

The documentation for TARGET_SCHED_FUSION_PRIORITY doesn't look right to me (see: https://gcc.gnu.org/onlinedocs/gccint/Scheduling.html ). I think you'll need to wrap your examples in something like @smallexample tags if you want to maintain their formatting.

Hi James,

Thanks very much for reporting this, will fix it.

Thanks,
bin
Re: [PATCH AARCH64]load store pair optimization using sched_fusion pass.
Ping. Anybody have a look?

Thanks,
bin

On Tue, Nov 18, 2014 at 4:34 PM, Bin Cheng bin.ch...@arm.com wrote:

Hi,

This is the patch implementing ldp/stp optimization for aarch64. It consists of two parts. The first one is the peephole part, which includes ldp/stp patterns (both peephole patterns and the insn match patterns) and auxiliary functions (both checking the validity and merging). The second part implements the aarch64 backend hook for the sched-fusion pass, which calculates appropriate priorities for different kinds of load/store instructions. With these priorities, the sched-fusion pass can schedule as many load/store instructions together as possible, so that the following peephole2 pass can merge them.

I collected data for miscellaneous benchmarks. Some cases are improved; most of the remaining cases are not regressed; only a couple of them regress a little, by 2-3%. After looking into the regressions I can confirm that the code transformation is generally good, with many loads/stores paired. These regressions are most probably false alarms caused by other issues. The conclusion is that this patch can pair lots of consecutive load/store instructions into ldp/stp. This is supported by the code size improvement of the benchmarks. E.g., in general it cuts the text size of spec2k6 binaries (O3 level, not statically linked in my build) by 1.68%.

Bootstrap and test on aarch64. Is it OK?

2014-11-18 Bin Cheng bin.ch...@arm.com

    * config/aarch64/aarch64.md (load_pair<mode>): Split to
    load_pairsi, load_pairdi, load_pairsf and load_pairdf.
    (load_pairsi, load_pairdi, load_pairsf, load_pairdf): Split
    from load_pair<mode>. New alternative to support int/fp
    registers in fp/int mode patterns.
    (store_pair<mode>): Split to store_pairsi, store_pairdi,
    store_pairsf and store_pairdf.
    (store_pairsi, store_pairdi, store_pairsf, store_pairdf): Split
    from store_pair<mode>. New alternative to support int/fp
    registers in fp/int mode patterns.
    (*load_pair_extendsidi2_aarch64): New pattern.
    (*load_pair_zero_extendsidi2_aarch64): New pattern.
    (aarch64-ldpstp.md): Include.
    * config/aarch64/aarch64-ldpstp.md: New file.
    * config/aarch64/aarch64-protos.h (aarch64_gen_adjusted_ldpstp): New.
    (extract_base_offset_in_addr): New.
    (aarch64_operands_ok_for_ldpstp): New.
    (aarch64_operands_adjust_ok_for_ldpstp): New.
    * config/aarch64/aarch64.c (enum sched_fusion_type): New enum.
    (TARGET_SCHED_FUSION_PRIORITY): New hook.
    (fusion_load_store): New function.
    (extract_base_offset_in_addr): New function.
    (aarch64_gen_adjusted_ldpstp): New function.
    (aarch64_sched_fusion_priority): New function.
    (aarch64_operands_ok_for_ldpstp): New function.
    (aarch64_operands_adjust_ok_for_ldpstp): New function.

2014-11-18 Bin Cheng bin.ch...@arm.com

    * gcc.target/aarch64/ldp-stp-1.c: New test.
    * gcc.target/aarch64/ldp-stp-2.c: New test.
    * gcc.target/aarch64/ldp-stp-3.c: New test.
    * gcc.target/aarch64/ldp-stp-4.c: New test.
    * gcc.target/aarch64/ldp-stp-5.c: New test.
    * gcc.target/aarch64/lr_free_1.c: Disable scheduling fusion
    and peephole2 pass.
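
The effect the message describes, getting neighbouring memory accesses next to each other so that a later step can merge them, can be illustrated with a toy model. The sketch below is not the aarch64 hook or the sched-fusion pass: the Access struct and pair_accesses are hypothetical names, and sorting stands in for the priority-driven scheduling that the real pass performs before peephole2 does the merging.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical description of one load or store: base register number,
// byte offset from that base, and access size in bytes.
struct Access {
  int base_reg;
  long offset;
  int size;
};

// Sort accesses by (base register, offset), then greedily pair neighbours
// that have the same base and size and are adjacent in memory.  Sorting
// plays the role of the fusion priorities; pairing plays the role of the
// later merge into ldp/stp.
std::vector<std::pair<Access, Access>> pair_accesses(std::vector<Access> accs) {
  std::sort(accs.begin(), accs.end(), [](const Access &a, const Access &b) {
    return a.base_reg != b.base_reg ? a.base_reg < b.base_reg
                                    : a.offset < b.offset;
  });
  std::vector<std::pair<Access, Access>> pairs;
  for (std::size_t i = 0; i + 1 < accs.size();) {
    const Access &a = accs[i];
    const Access &b = accs[i + 1];
    if (a.base_reg == b.base_reg && a.size == b.size
        && a.offset + a.size == b.offset) {
      pairs.emplace_back(a, b);
      i += 2;  // both accesses are consumed by the pair
    } else {
      ++i;     // leave this access unpaired and try the next one
    }
  }
  return pairs;
}
```

With three 8-byte accesses at offsets 0, 8 and 16 from one base, the model pairs the first two and leaves the third alone, which is why an odd access count never pairs completely.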
Re: [PATCH] Fix find_base_term in 32-bit -fpic code (PR lto/64025)
On Tue, Nov 25, 2014 at 09:13:10AM +0100, Uros Bizjak wrote:

On Tue, Nov 25, 2014 at 8:40 AM, Uros Bizjak ubiz...@gmail.com wrote:

On Tue, Nov 25, 2014 at 12:25 AM, Jakub Jelinek ja...@redhat.com wrote:

The fallback delegitimization I've added as last option mainly for debug info purposes, when we don't know if the base is a PIC register or say a PIC register plus some addend, unfortunately in some tests broke find_base_term, which for PLUS looks only at the first operand and recursion on it finds a base term, it returns it immediately. So, it found base term of _GLOBAL_OFFSET_TABLE_, when the right base term is actually in the second operand.

This patch fixes it by swapping the operands, debug info doesn't care about the order, it won't match in any instruction anyway, but helps alias.c.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-11-24 Jakub Jelinek ja...@redhat.com

    PR lto/64025
    * config/i386/i386.c (ix86_delegitimize_address): Ensure result
    comes before (addend - _GLOBAL_OFFSET_TABLE_) term.

Can you also swap operands of (%ecx - %ebx) + foo? There is no point digging into RTX involving registers only when we know that we are looking for foo. This will also be consistent with the code you patched below.

Something like attached prototype patch.

Actually, thinking about it more, at least according to commutative_operand_precedence the canonical order is what we used to return (i.e. (something - _G_O_T_) + (symbol_ref) or (something - _G_O_T_) + (const (symbol_ref +- const))).

So perhaps better fix is to follow find_base_value, which does something like:

      /* Guess which operand is the base address:
         If either operand is a symbol, then it is the base.  If
         either operand is a CONST_INT, then the other is the base.  */
      if (CONST_INT_P (src_1) || CONSTANT_P (src_0))
        return find_base_value (src_0);
      else if (CONST_INT_P (src_0) || CONSTANT_P (src_1))
        return find_base_value (src_1);

and do something similar in find_base_term too. I.e. perhaps even with higher precedence over REG_P with REG_POINTER (or lower, in these cases it doesn't really matter, neither argument is REG_P), choose first operand that is CONSTANT_P and not CONST_INT_P.

Jakub
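
The selection rule suggested at the end of the message (take the first operand that is CONSTANT_P but not CONST_INT_P) can be sketched outside GCC. Everything below is a hypothetical model: Kind, Operand, is_constant, is_const_int and pick_base_operand are illustration names, not GCC's rtx codes or predicates, and the sketch covers only this one rule, not the rest of find_base_term.

```cpp
// Toy stand-in for the operand kinds that matter to the heuristic.
enum class Kind { Reg, ConstInt, SymbolRef, Minus };

struct Operand {
  Kind kind;
};

// Rough analogue of CONSTANT_P: symbols and integer constants.
bool is_constant(const Operand &op) {
  return op.kind == Kind::ConstInt || op.kind == Kind::SymbolRef;
}

// Rough analogue of CONST_INT_P.
bool is_const_int(const Operand &op) {
  return op.kind == Kind::ConstInt;
}

// For a PLUS with operands op0 and op1, return the index (0 or 1) of the
// operand to search for the base term, or -1 when the rule has no opinion
// and the caller should fall back to its other checks.
int pick_base_operand(const Operand &op0, const Operand &op1) {
  if (is_constant(op0) && !is_const_int(op0))
    return 0;
  if (is_constant(op1) && !is_const_int(op1))
    return 1;
  return -1;
}
```

Under this rule the operand order no longer matters for the case in the PR: both (something - _G_O_T_) + symbol and symbol + (something - _G_O_T_) select the symbol.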
Re: [PATCH] Fix building of gengtype
On Tue, Nov 25, 2014 at 12:35:09AM +0100, Jakub Jelinek wrote:

My last 2 bootstraps failed, both because of a race while building host gengtype (each time different gengtype*.o).

Found bootstrap failures even with this patch (dunno what changed on my box that I started getting these last night, make has not changed), that time with errors.o and gcc-ar.o.

The generated headers are solved these days in automatic dependencies world through

    # In order for parallel make to really start compiling the expensive
    # objects from $(OBJS) as early as possible, build all their
    # prerequisites strictly before all objects.
    $(ALL_HOST_OBJS) : | $(generated_files)

and build/*.o have explicit dependencies.

I've tried to compare $(ALL_HOST_OBJS) on my box with all *.o */*.o files I had in stage3 directory, and besides build/*.o, I found:

    crtbegin.o crtbeginS.o crtbeginT.o crtend.o crtendS.o crtfastmath.o
    crtprec32.o crtprec64.o crtprec80.o errors.o gcc-ar.o gcc-nm.o
    gcc-ranlib.o gengtype-lex.o gengtype.o gengtype-parse.o gengtype-state.o

not being listed in ALL_HOST_OBJS. The crt*.o files come from libgcc build and thus are ok, the rest I've tried to handle in the following updated patch.

If the #define GENERATOR_FILE inside of the 5 files is too ugly, another alternative might be to define both -DHOST_GENERATOR_FILE -DGENERATOR_FILE in Makefile.in and don't error in config.h if GENERATOR_FILE is defined, if HOST_GENERATOR_FILE is also defined.

2014-11-25 Jakub Jelinek ja...@redhat.com

    * Makefile.in (ALL_HOST_BACKEND_OBJS): Add $(GENGTYPE_OBJS),
    gcc-ar.o, gcc-nm.o and gcc-ranlib.o.
    (GENGTYPE_OBJS): New.
    (gengtype-lex.o, gengtype-parse.o, gengtype-state.o, gengtype.o):
    Remove explicit dependencies.
    (CFLAGS-gengtype-lex.o, CFLAGS-gengtype-parse.o,
    CFLAGS-gengtype-state.o, CFLAGS-gengtype.o): Add
    -DHOST_GENERATOR_FILE instead of -DGENERATOR_FILE.
    (CFLAGS-errors.o): New.
    * gengtype.c: Instead of testing GENERATOR_FILE define, test
    HOST_GENERATOR_FILE. If defined, include config.h and define
    GENERATOR_FILE afterwards, otherwise include bconfig.h.
    * gengtype-parse.c: Likewise.
    * gengtype-state.c: Likewise.
    * gengtype-lex.l: Likewise.
    * errors.c: Likewise.

--- gcc/Makefile.in.jj 2014-11-25 00:06:43.122178737 +0100
+++ gcc/Makefile.in 2014-11-25 08:55:34.727300843 +0100
@@ -1509,7 +1509,8 @@ ALL_HOST_FRONTEND_OBJS = $(foreach v,$(C
 ALL_HOST_BACKEND_OBJS = $(GCC_OBJS) $(OBJS) $(OBJS-libcommon) \
   $(OBJS-libcommon-target) @TREEBROWSER@ main.o c-family/cppspec.o \
   $(COLLECT2_OBJS) $(EXTRA_GCC_OBJS) $(GCOV_OBJS) $(GCOV_DUMP_OBJS) \
-  $(GCOV_TOOL_OBJS) lto-wrapper.o collect-utils.o
+  $(GCOV_TOOL_OBJS) $(GENGTYPE_OBJS) gcc-ar.o gcc-nm.o gcc-ranlib.o \
+  lto-wrapper.o collect-utils.o

 # This lists all host object files, whether they are included in this
 # compilation or not.
@@ -2484,30 +2485,31 @@ build/gengenrtl.o : gengenrtl.c $(BCONFI
 # on BCONFIG_H.  For the build objects, add -DGENERATOR_FILE manually,
 # the build-%: rule doesn't apply to them.

+GENGTYPE_OBJS = gengtype.o gengtype-parse.o gengtype-state.o \
+  gengtype-lex.o errors.o
+
 gengtype-lex.o build/gengtype-lex.o : gengtype-lex.c gengtype.h $(SYSTEM_H)
-gengtype-lex.o: $(CONFIG_H) $(BCONFIG_H)
-CFLAGS-gengtype-lex.o += -DGENERATOR_FILE
+CFLAGS-gengtype-lex.o += -DHOST_GENERATOR_FILE
 build/gengtype-lex.o: $(BCONFIG_H)

 gengtype-parse.o build/gengtype-parse.o : gengtype-parse.c gengtype.h \
   $(SYSTEM_H)
-gengtype-parse.o: $(CONFIG_H)
-CFLAGS-gengtype-parse.o += -DGENERATOR_FILE
+CFLAGS-gengtype-parse.o += -DHOST_GENERATOR_FILE
 build/gengtype-parse.o: $(BCONFIG_H)

 gengtype-state.o build/gengtype-state.o: gengtype-state.c $(SYSTEM_H) \
   gengtype.h errors.h double-int.h version.h $(HASHTAB_H) $(OBSTACK_H) \
   $(XREGEX_H)
-gengtype-state.o: $(CONFIG_H)
-CFLAGS-gengtype-state.o += -DGENERATOR_FILE
+CFLAGS-gengtype-state.o += -DHOST_GENERATOR_FILE
 build/gengtype-state.o: $(BCONFIG_H)

 gengtype.o build/gengtype.o : gengtype.c $(SYSTEM_H) gengtype.h \
   rtl.def insn-notes.def errors.h double-int.h version.h \
   $(HASHTAB_H) $(OBSTACK_H) $(XREGEX_H)
-gengtype.o: $(CONFIG_H)
-CFLAGS-gengtype.o += -DGENERATOR_FILE
+CFLAGS-gengtype.o += -DHOST_GENERATOR_FILE
 build/gengtype.o: $(BCONFIG_H)

+CFLAGS-errors.o += -DHOST_GENERATOR_FILE
+
 build/genmddeps.o: genmddeps.c $(BCONFIG_H) $(SYSTEM_H) coretypes.h \
   errors.h $(READ_MD_H)
 build/genmodes.o : genmodes.c $(BCONFIG_H) $(SYSTEM_H) errors.h \
--- gcc/gengtype.c.jj 2014-11-21 10:17:06.135695325 +0100
+++ gcc/gengtype.c 2014-11-25 08:56:18.042523089 +0100
@@ -17,10 +17,11 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */

-#ifdef GENERATOR_FILE
-#include "bconfig.h"
-#else
+#ifdef HOST_GENERATOR_FILE
 #include "config.h"
+#define GENERATOR_FILE 1
+#else
+#include "bconfig.h"
Re: [Patch, libstdc++/63497] Avoid dereferencing invalid iterator in regex_executor
On Wed, Oct 22, 2014 at 8:19 PM, Tim Shen tims...@google.com wrote:

Committed. Thank you too!

I'm backporting this patch to gcc-4_9-branch. Do we usually boot test it and then commit directly, or it should be reviewed again?

--
Regards,
Tim Shen

commit 1e146769d08ff19cc01a08b91ca8fd3151f34faf
Author: timshen tims...@google.com
Date: Tue Nov 25 00:36:25 2014 -0800

    PR libstdc++/63497
    include/bits/regex_executor.h (_Executor::_M_word_boundary):
    Remove unused parameter.
    include/bits/regex_executor.tcc (_Executor::_M_dfs,
    _Executor::_M_word_boundary): Avoid dereferencing _M_current at
    _M_end or other invalid position.

diff --git a/libstdc++-v3/include/bits/regex_executor.h b/libstdc++-v3/include/bits/regex_executor.h
index 708c78e..0d1b676 100644
--- a/libstdc++-v3/include/bits/regex_executor.h
+++ b/libstdc++-v3/include/bits/regex_executor.h
@@ -134,7 +134,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       }

       bool
-      _M_word_boundary(_State<_TraitsT> __state) const;
+      _M_word_boundary() const;

       bool
       _M_lookahead(_State<_TraitsT> __state);
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index 052302b..ef49161 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -257,7 +257,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
           _M_dfs<__match_mode>(__state._M_next);
         break;
       case _S_opcode_word_boundary:
-        if (_M_word_boundary(__state) == !__state._M_neg)
+        if (_M_word_boundary() == !__state._M_neg)
           _M_dfs<__match_mode>(__state._M_next);
         break;
       // Here __state._M_alt offers a single start node for a sub-NFA.
@@ -267,9 +267,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
           _M_dfs<__match_mode>(__state._M_next);
         break;
       case _S_opcode_match:
+        if (_M_current == _M_end)
+          break;
         if (__dfs_mode)
           {
-            if (_M_current != _M_end && __state._M_matches(*_M_current))
+            if (__state._M_matches(*_M_current))
               {
                 ++_M_current;
                 _M_dfs<__match_mode>(__state._M_next);
@@ -348,25 +350,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _BiIter, typename _Alloc, typename _TraitsT,
     bool __dfs_mode>
     bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
-    _M_word_boundary(_State<_TraitsT> __state) const
+    _M_word_boundary() const
     {
-      // By definition.
-      bool __ans = false;
-      auto __pre = _M_current;
-      --__pre;
-      if (!(_M_at_begin() && _M_at_end()))
+      bool __left_is_word = false;
+      if (_M_current != _M_begin
+          || (_M_flags & regex_constants::match_prev_avail))
         {
-          if (_M_at_begin())
-            __ans = _M_is_word(*_M_current)
-              && !(_M_flags & regex_constants::match_not_bow);
-          else if (_M_at_end())
-            __ans = _M_is_word(*__pre)
-              && !(_M_flags & regex_constants::match_not_eow);
-          else
-            __ans = _M_is_word(*_M_current)
-              != _M_is_word(*__pre);
+          auto __prev = _M_current;
+          if (_M_is_word(*std::prev(__prev)))
+            __left_is_word = true;
         }
-      return __ans;
+      bool __right_is_word =
+        _M_current != _M_end && _M_is_word(*_M_current);
+
+      if (__left_is_word == __right_is_word)
+        return false;
+      if (__left_is_word && !(_M_flags & regex_constants::match_not_eow))
+        return true;
+      if (__right_is_word && !(_M_flags & regex_constants::match_not_bow))
+        return true;
+      return false;
     }

 _GLIBCXX_END_NAMESPACE_VERSION
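
The rewritten boundary test reads more easily outside the executor. The sketch below restates the same left/right comparison over a std::string and an index; it is an illustration, not libstdc++ code. The names is_word and word_boundary and the two bool parameters (standing in for match_not_bow and match_not_eow) are assumptions, and match_prev_avail is left out because a std::string has no character before position 0.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Word characters in this sketch: alphanumerics and underscore.
bool is_word(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Is there a word boundary at position pos of s (0 <= pos <= s.size())?
// Neither side is read unless it exists, which is the point of the fix:
// nothing is dereferenced before the beginning or at the end.
bool word_boundary(const std::string &s, std::size_t pos,
                   bool not_bow = false, bool not_eow = false) {
  bool left_is_word = pos > 0 && is_word(s[pos - 1]);
  bool right_is_word = pos < s.size() && is_word(s[pos]);

  if (left_is_word == right_is_word)
    return false;                 // both word or both non-word: no boundary
  if (left_is_word && !not_eow)
    return true;                  // end of a word
  if (right_is_word && !not_bow)
    return true;                  // beginning of a word
  return false;
}
```

For the string "ab cd", positions 0, 2, 3 and 5 are boundaries and position 1 is not; passing not_bow suppresses the boundaries at 0 and 3, and not_eow suppresses those at 2 and 5.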
RE: [PATCH, AARCH64] Fix ICE in CCMP (PR64015)
-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Richard Henderson
Sent: Monday, November 24, 2014 4:57 PM
To: Zhenqiang Chen; gcc-patches@gcc.gnu.org
Cc: Marcus Shawcroft
Subject: Re: [PATCH, AARCH64] Fix ICE in CCMP (PR64015)

On 11/24/2014 06:11 AM, Zhenqiang Chen wrote:

Expand pass always uses sign-extend to represent constant value. For the case in the patch, an 8-bit unsigned value 252 is represented as -4, which passes the ccmn check. After mode conversion, -4 becomes 252, which leads to mismatch.

This sort of thing is why I suggested from the beginning that expansion happen directly from trees instead of sort-of re-expanding from rtl. I think you're better off fixing this properly than hacking around it here.

Thanks for the comments. Here were your previous comments:

We could avoid that by using struct expand_operand, create_input_operand et al, then expand_insn. That does require that the target hooks be given trees rather than rtl as input.

I want to confirm with you two things before I rework it.

(1) expand_insn needs an optab_handler as input. Do I need to define a ccmp_optab with different mode support in optabs.def?

(2) To make sure later operands not clobber CC, all operands are expanded before ccmp-first in current implementation. If taking tree/gimple as input, what's your preferred logic to guarantee CC not clobbered?

Thanks!
-Zhenqiang
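
The 252 versus -4 mismatch described above is plain 8-bit arithmetic and can be reproduced in isolation. In this sketch sign_extend_8 and zero_extend_8 are hypothetical helpers, and fits_ccmn is a simplified stand-in for the backend's check, assuming the compare-negative immediate must be a negated value in the range 0..31; it is not the actual aarch64 predicate.

```cpp
// Interpret the low 8 bits of v as a signed value, the way the constant is
// represented after sign extension.
int sign_extend_8(unsigned v) {
  int x = static_cast<int>(v & 0xffu);
  return x >= 128 ? x - 256 : x;
}

// Interpret the low 8 bits of v as an unsigned value, the way the constant
// looks after conversion to a wider unsigned mode.
int zero_extend_8(int v) {
  return static_cast<int>(static_cast<unsigned>(v) & 0xffu);
}

// Simplified stand-in for a "usable as a negated compare immediate" check
// (assumption: -v must fit in 5 bits).
bool fits_ccmn(int v) {
  return v < 0 && -v <= 31;
}
```

Here sign_extend_8(252) gives -4, which passes fits_ccmn, while zero_extend_8(-4) gives 252, which does not, so a check made on one representation says nothing about the other.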
[PATCH, PR63995, CHKP] Use single static bounds var for varpool nodes sharing asm name
Hi,

This patch partly fixes PR bootstrap/63995 by avoiding duplicating static bounds vars. With this fix bootstrap still fails at stage 2 and 3 comparison.

Bootstrapped and checked on x86_64-unknown-linux-gnu. OK for trunk?

Thanks,
Ilya
--
gcc/

2014-11-25 Ilya Enkovich ilya.enkov...@intel.com

    PR bootstrap/63995
    * tree-chkp.c (chkp_make_static_bounds): Share bounds var
    between nodes sharing assembler name.

gcc/testsuite

2014-11-25 Ilya Enkovich ilya.enkov...@intel.com

    PR bootstrap/63995
    * g++.dg/dg.exp: Add mpx-dg.exp.
    * g++.dg/pr63995-1.C: New.

diff --git a/gcc/testsuite/g++.dg/dg.exp b/gcc/testsuite/g++.dg/dg.exp
index 14beae1..44eab0c 100644
--- a/gcc/testsuite/g++.dg/dg.exp
+++ b/gcc/testsuite/g++.dg/dg.exp
@@ -18,6 +18,7 @@

 # Load support procs.
 load_lib g++-dg.exp
+load_lib mpx-dg.exp

 # If a testcase doesn't have special options, use these.
 global DEFAULT_CXXFLAGS
diff --git a/gcc/testsuite/g++.dg/pr63995-1.C b/gcc/testsuite/g++.dg/pr63995-1.C
new file mode 100644
index 0000000..82e7606
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr63995-1.C
@@ -0,0 +1,16 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-effective-target mpx } */
+/* { dg-options "-O2 -g -fcheck-pointer-bounds -mmpx" } */
+
+int test1 (int i)
+{
+  extern const int arr[10];
+  return arr[i];
+}
+
+extern const int arr[10];
+
+int test2 (int i)
+{
+  return arr[i];
+}
diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c
index 3e38691..d425084 100644
--- a/gcc/tree-chkp.c
+++ b/gcc/tree-chkp.c
@@ -2727,9 +2727,29 @@ chkp_make_static_bounds (tree obj)
   /* First check if we already have required var.  */
   if (chkp_static_var_bounds)
     {
-      slot = chkp_static_var_bounds->get (obj);
-      if (slot)
-        return *slot;
+      /* If there is a symbol sharing assembler name with obj,
+         we may use its bounds.  */
+      if (TREE_CODE (obj) == VAR_DECL)
+        {
+          varpool_node *node = varpool_node::get_create (obj);
+
+          while (node->previous_sharing_asm_name)
+            node = (varpool_node *)node->previous_sharing_asm_name;
+
+          while (node)
+            {
+              slot = chkp_static_var_bounds->get (node->decl);
+              if (slot)
+                return *slot;
+              node = (varpool_node *)node->next_sharing_asm_name;
+            }
+        }
+      else
+        {
+          slot = chkp_static_var_bounds->get (obj);
+          if (slot)
+            return *slot;
+        }
     }

   /* Build decl for bounds var.  */
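
The lookup added to chkp_make_static_bounds rewinds to the head of the list of nodes sharing an assembler name and then scans forward for any node that already has a bounds variable. A self-contained model of that traversal is sketched below. Node and find_shared_bounds are hypothetical; only the two link field names are borrowed from the patch, decls are modelled as ints, and bounds variables as strings.

```cpp
#include <string>
#include <unordered_map>

// Toy stand-in for a varpool node on a doubly linked "same assembler
// name" chain.  This is not GCC's symtab API.
struct Node {
  int decl;  // stand-in for the node's declaration
  Node *previous_sharing_asm_name;
  Node *next_sharing_asm_name;

  explicit Node(int d)
      : decl(d), previous_sharing_asm_name(nullptr),
        next_sharing_asm_name(nullptr) {}
};

// Return the bounds variable already recorded for any declaration on the
// chain containing node, or nullptr if none of them has one yet.
const std::string *find_shared_bounds(
    Node *node, const std::unordered_map<int, std::string> &bounds) {
  // Rewind to the first node of the chain.
  while (node->previous_sharing_asm_name)
    node = node->previous_sharing_asm_name;

  // Scan forward; the first hit is shared by every node on the chain.
  for (; node; node = node->next_sharing_asm_name) {
    auto it = bounds.find(node->decl);
    if (it != bounds.end())
      return &it->second;
  }
  return nullptr;
}
```

In the test case from the patch, the block-scope and file-scope declarations of arr would sit on one such chain, so whichever is seen second reuses the first one's bounds variable instead of creating a duplicate.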
Re: [Patch, libstdc++/63775] Fix regex bracket expression parsing
On Wed, Nov 12, 2014 at 11:45 PM, Tim Shen tims...@google.com wrote:

Committed with comment fix and slight change on testcase (VERIFY(false) at end of the try block -- must throw).

Is it possible to backport this patch to 4.9 branch? It's an important fix, but I'm not sure if there's any binary compatibility problem. Is it fine because it's only _Compiler class, which is an intermediate class, that's modified?

--
Regards,
Tim Shen
[PATCH, PR64056, i386] Fix chkp tests requiring mempcpy
Hi,

This patch adds check for mempcpy availability for tests requiring it. Checked with RUNTESTFLAGS="--target_board='unix{-m32,}' i386.exp=chkp-*". OK for trunk?

Thanks,
Ilya
--
2014-11-25 Ilya Enkovich ilya.enkov...@intel.com

    PR target/64056
    * gcc.target/i386/chkp-strlen-4.c: Add mempcpy target check.
    * gcc.target/i386/chkp-stropt-4.c: Likewise.
    * gcc.target/i386/chkp-stropt-8.c: Likewise.
    * gcc.target/i386/chkp-stropt-12.c: Likewise.
    * gcc.target/i386/chkp-stropt-16.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/i386/chkp-strlen-4.c b/gcc/testsuite/gcc.target/i386/chkp-strlen-4.c
index a9ebe2b..2da762a 100644
--- a/gcc/testsuite/gcc.target/i386/chkp-strlen-4.c
+++ b/gcc/testsuite/gcc.target/i386/chkp-strlen-4.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target mpx } */
+/* { dg-require-effective-target mempcpy } */
 /* { dg-options "-fcheck-pointer-bounds -mmpx -O2 -fdump-tree-strlen -D_GNU_SOURCE" } */
 /* { dg-final { scan-tree-dump-times "strlen" 1 "strlen" } } */
 /* { dg-final { cleanup-tree-dump "strlen" } } */
diff --git a/gcc/testsuite/gcc.target/i386/chkp-stropt-12.c b/gcc/testsuite/gcc.target/i386/chkp-stropt-12.c
index 94e936d..01a5159 100644
--- a/gcc/testsuite/gcc.target/i386/chkp-stropt-12.c
+++ b/gcc/testsuite/gcc.target/i386/chkp-stropt-12.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target mpx } */
+/* { dg-require-effective-target mempcpy } */
 /* { dg-options "-fcheck-pointer-bounds -mmpx -O2 -fdump-tree-chkpopt -fchkp-use-fast-string-functions -D_GNU_SOURCE" } */
 /* { dg-final { scan-tree-dump-not "mempcpy_nobnd" "chkpopt" } } */
 /* { dg-final { cleanup-tree-dump "chkpopt" } } */
diff --git a/gcc/testsuite/gcc.target/i386/chkp-stropt-16.c b/gcc/testsuite/gcc.target/i386/chkp-stropt-16.c
index 4b26d58..f925ef9 100644
--- a/gcc/testsuite/gcc.target/i386/chkp-stropt-16.c
+++ b/gcc/testsuite/gcc.target/i386/chkp-stropt-16.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target mpx } */
+/* { dg-require-effective-target mempcpy } */
 /* { dg-options "-fcheck-pointer-bounds -mmpx -O2 -fdump-tree-chkpopt -fchkp-use-nochk-string-functions -fchkp-use-fast-string-functions -D_GNU_SOURCE" } */
 /* { dg-final { scan-tree-dump "mempcpy_nobnd_nochk" "chkpopt" } } */
 /* { dg-final { cleanup-tree-dump "chkpopt" } } */
diff --git a/gcc/testsuite/gcc.target/i386/chkp-stropt-4.c b/gcc/testsuite/gcc.target/i386/chkp-stropt-4.c
index 4ee2390..3ae6bf5 100644
--- a/gcc/testsuite/gcc.target/i386/chkp-stropt-4.c
+++ b/gcc/testsuite/gcc.target/i386/chkp-stropt-4.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target mpx } */
+/* { dg-require-effective-target mempcpy } */
 /* { dg-options "-fcheck-pointer-bounds -mmpx -O2 -fdump-tree-chkpopt -fchkp-use-nochk-string-functions -D_GNU_SOURCE" } */
 /* { dg-final { scan-tree-dump "mempcpy_nochk" "chkpopt" } } */
 /* { dg-final { cleanup-tree-dump "chkpopt" } } */
diff --git a/gcc/testsuite/gcc.target/i386/chkp-stropt-8.c b/gcc/testsuite/gcc.target/i386/chkp-stropt-8.c
index 8c3b15d..6d6d55e 100644
--- a/gcc/testsuite/gcc.target/i386/chkp-stropt-8.c
+++ b/gcc/testsuite/gcc.target/i386/chkp-stropt-8.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target mpx } */
+/* { dg-require-effective-target mempcpy } */
 /* { dg-options "-fcheck-pointer-bounds -mmpx -O2 -fdump-tree-chkpopt -fchkp-use-fast-string-functions -D_GNU_SOURCE" } */
 /* { dg-final { scan-tree-dump "mempcpy_nobnd" "chkpopt" } } */
 /* { dg-final { cleanup-tree-dump "chkpopt" } } */
Re: [PATCH, ciklplus]: Use -ffloat-store for 32bit x86 in cilk-plus/AN/builtin_fn_{custom,mutating}.c
On Mon, Nov 24, 2014 at 10:33 PM, Jeff Law l...@redhat.com wrote:

On 11/22/14 11:50, Uros Bizjak wrote:

Hello!

These two tests fix PR target/63847 [1], where x87 excess precision causes testcase to fail. The problem was triggered by -fpic, please see the PR for analysis.

The patch adds -ffloat-store for 32bit x86 target, a standard and well tested solution for this problem.

2014-11-22 Uros Bizjak ubiz...@gmail.com

    PR target/63847
    * c-c++-common/cilk-plus/AN/builtin_fn_custom.c: Add
    -ffloat-store for 32bit x86 targets.
    * c-c++-common/cilk-plus/AN/builtin_fn_mutating.c: Ditto.

OK.

Don't we have -fexcess-precision=standard for this now?

Richard.

Jeff
Re: [PATCH] Fix linemap_line_start (PR preprocessor/60436)
On Tue, Nov 25, 2014 at 12:22 AM, Jakub Jelinek ja...@redhat.com wrote:

Hi!

As mentioned in the PR, when preprocessing very large files, if there are huge numbers of lines where no #line is emitted, we might not detect overflowing into adhoc locations. Apparently in the add_map case we already handle that fine, by first stopping tracking columns and after another 256M lines give up on tracking locations, so this patch just makes sure we enter that path if going over those limits.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

2014-11-24 Jakub Jelinek ja...@redhat.com

    PR preprocessor/60436
    * line-map.c (linemap_line_start): If highest is above 0x60000000
    and we are still tracking columns or highest is above 0x70000000,
    force add_map.

--- libcpp/line-map.c.jj 2014-11-12 08:06:57.000000000 +0100
+++ libcpp/line-map.c 2014-11-24 12:14:52.691276169 +0100
@@ -529,10 +529,10 @@ linemap_line_start (struct line_maps *se
           && line_delta * ORDINARY_MAP_NUMBER_OF_COLUMN_BITS (map) > 1000)
       || (max_column_hint >= (1U << ORDINARY_MAP_NUMBER_OF_COLUMN_BITS (map)))
       || (max_column_hint <= 80
-          && ORDINARY_MAP_NUMBER_OF_COLUMN_BITS (map) >= 10))
-    {
-      add_map = true;
-    }
+          && ORDINARY_MAP_NUMBER_OF_COLUMN_BITS (map) >= 10)
+      || (highest > 0x60000000
+          && (set->max_column_hint || highest > 0x70000000)))
+    add_map = true;
   else
     max_column_hint = set->max_column_hint;
   if (add_map)
@@ -543,7 +543,7 @@ linemap_line_start (struct line_maps *se
           /* If the column number is ridiculous or we've allocated a huge
              number of source_locations, give up on column numbers. */
           max_column_hint = 0;
-          if (highest >0x70000000)
+          if (highest > 0x70000000)
             return 0;
           column_bits = 0;
         }

Jakub
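
The two limits can be read as a small decision table. The sketch below models only the condition the patch adds, assuming the thresholds are 0x60000000 and 0x70000000 (they differ by 0x10000000, the "another 256M" in the message). Action, classify and the parameter names are hypothetical and are not libcpp's interface; the other reasons linemap_line_start has for starting a new map are ignored.

```cpp
#include <cstdint>

// What the simplified model decides for the next line.
enum class Action { Keep, DropColumns, GiveUp };

// Assumed thresholds; their difference is the 256M gap.
constexpr std::uint32_t kStopColumns = 0x60000000;
constexpr std::uint32_t kGiveUp = 0x70000000;
static_assert(kGiveUp - kStopColumns == 256u * 1024u * 1024u,
              "the two limits are 256M apart");

// highest: largest location handed out so far.
// tracking_columns: whether column numbers are still being recorded.
Action classify(std::uint32_t highest, bool tracking_columns) {
  bool force_new_map =
      highest > kStopColumns && (tracking_columns || highest > kGiveUp);
  if (!force_new_map)
    return Action::Keep;         // nothing changes because of the limits
  if (highest > kGiveUp)
    return Action::GiveUp;       // stop handing out locations
  return Action::DropColumns;    // new map without column information
}
```

Once columns have been dropped, the model stays in Keep until the second limit is crossed, which matches the description of first giving up on columns and only later on locations altogether.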
Re: [PATCH, AARCH64] Fix ICE in CCMP (PR64015)
On 11/25/2014 09:41 AM, Zhenqiang Chen wrote:

I want to confirm with you two things before I rework it.

(1) expand_insn needs an optab_handler as input. Do I need to define a ccmp_optab with different mode support in optabs.def?

No, look again: expand_insn needs an enum insn_code as input. Since this is the backend, you can use any icode name you like, which means that you can use CODE_FOR_ccmp_and etc directly.

(2) To make sure later operands not clobber CC, all operands are expanded before ccmp-first in current implementation. If taking tree/gimple as input, what's your preferred logic to guarantee CC not clobbered?

Hmm. Perhaps the target hook will need to output two sequences, each of which will be concatenated while looping around the calls to gen_ccmp_next. The first sequence will be operand preparation and the second sequence will be ccmp generation. Something like

  bool
  aarch64_gen_ccmp_start(rtx *prep_seq, rtx *gen_seq,
                         int cmp_code, int bit_code, tree op0, tree op1)
  {
    bool success;

    start_sequence ();
    // Widen and expand operands
    *prep_seq = get_insns ();
    end_sequence ();

    start_sequence ();
    // Generate the first compare
    *gen_seq = get_insns ();
    end_sequence ();

    return success;
  }

  bool
  aarch64_gen_ccmp_next(rtx *prep_seq, rtx *gen_seq, rtx prev,
                        int cmp_code, int bit_code, tree op0, tree op1)
  {
    bool success;

    push_to_sequence (*prep_seq);
    // Widen and expand operands
    *prep_seq = get_insns ();
    end_sequence ();

    push_to_sequence (*gen_seq);
    // Generate the next ccmp
    *gen_seq = get_insns ();
    end_sequence ();

    return success;
  }

If there are ever any failures, the middle-end can simply discard the sequences. If everything succeeds, it simply calls emit_insn on both sequences.

r~
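
The two-sequence idea can be modelled with ordinary containers to show why it keeps the condition flags safe: every operand-preparation step lands in one list, every compare in another, and the lists are only concatenated at the end. The sketch below is a toy under stated assumptions. Seq, gen_ccmp_step and expand_chain are hypothetical names, instructions are plain strings, and an empty operand name stands in for "the hook failed"; none of this is GCC's insn sequence API.

```cpp
#include <string>
#include <vector>

using Seq = std::vector<std::string>;  // stand-in for an insn sequence

// Append the operand setup and the compare for one condition to the two
// sequences.  Returns false if this step cannot be handled.
bool gen_ccmp_step(Seq &prep_seq, Seq &gen_seq, const std::string &op) {
  if (op.empty())
    return false;
  prep_seq.push_back("prepare " + op);
  // The first compare is a plain cmp, later ones are conditional compares.
  gen_seq.push_back((gen_seq.empty() ? "cmp " : "ccmp ") + op);
  return true;
}

// Expand a chain of conditions.  On success, emit all preparation steps
// followed by all compares, so no preparation step can clobber the flags
// between two compares.  On any failure, discard both sequences.
Seq expand_chain(const std::vector<std::string> &ops) {
  Seq prep_seq, gen_seq;
  for (const std::string &op : ops)
    if (!gen_ccmp_step(prep_seq, gen_seq, op))
      return Seq();
  Seq out = prep_seq;
  out.insert(out.end(), gen_seq.begin(), gen_seq.end());
  return out;
}
```

For the conditions a and b this yields "prepare a", "prepare b", "cmp a", "ccmp b", which is the ordering the reply is after.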
Re: [PATCH] Add verify_sese
On Tue, Nov 25, 2014 at 1:01 AM, Tom de Vries tom_devr...@mentor.com wrote:

Richard,

I ran into a problem with my oacc kernels directive patch series where tail-merge added another entry into a region that was previously single-entry-single-exit.

That resulted in hitting this assert in calc_dfs_tree:
...
  /* This aborts e.g. when there is _no_ path from ENTRY to EXIT at all. */
  gcc_assert (di->nodes == (unsigned int) n_basic_blocks_for_fn (cfun) - 1);
...
during a call to move_sese_region_to_fn.

This patch makes sure that we abort earlier, with a clearer message of what is actually wrong.

Bootstrapped and reg-tested on x86_64.

OK for trunk/stage3?

I believe someone made the function work for SEME regions and I believe it is actually used to copy loops with multiple exits so I don't see how the patch can work in these cases?

Thanks,
Richard.

Thanks,
- Tom
Re: [RFC] First steps towards segregating types.
On Tue, Nov 25, 2014 at 2:06 AM, Joseph Myers jos...@codesourcery.com wrote:

On Mon, 24 Nov 2014, Richard Biener wrote:

TREE_LIST should die (with the typical replacement being vec<something>); most lists do not need all the overhead of individually allocated objects with (code, flags, type, chain, value, purpose). Probably TREE_VEC too.

Note that there is nothing wrong with TREE_LIST or TREE_VEC if they were based off tree_base only. That they inherit from tree_common is the bug to fix - either by not using TREE_LIST or TREE_VEC from the users that use fields from tree_common or tree_typed or by adjusting those users to not need those fields.

Even inheriting from tree_base, typically lists don't need code (because you know statically that something is a list) or flags. And because generally something is statically a list or not a list, there is no particular benefit from sharing the static type of tree, and better compile-time checking if there is no such common base class for list and other objects at all. (Identifiers are another case that doesn't generally benefit from having a common static type of tree.)

All true - but 'tree's were built on the premise that everything is a tree. An incremental change is to make that sane - removing bits out of the tree space is also possible (though please not by a wart like a TYPE_REF tree node ...)

Richard.

--
Joseph S. Myers
jos...@codesourcery.com
Re: [Patch] Improving jump-thread pass for PR 54742
On Mon, Nov 24, 2014 at 11:05 PM, Sebastian Pop seb...@gmail.com wrote: Jeff Law wrote: On 11/23/14 15:22, Sebastian Pop wrote: The second patch attached limits the search for FSM jump threads to loops. With that patch, we are now down to 470 jump threads in an x86_64-linux bootstrap (and 424 jump threads on powerpc64-linux bootstrap.) Yea, that was one of the things I was going to poke at as well as a quick scan of your patch gave me the impression it wasn't limited to loops. Again, I haven't looked much at the patch, but I got the impression you're doing a backwards walk through the predecessors to discover the result of the COND_EXPR. Correct? Yes. That's something I'd been wanting to do -- basically start with a COND_EXPR, then walk the dataflow backwards substituting values into the COND_EXPR (possibly creating non-gimple). Ultimately the goal is to substitute and fold, getting to a constant :-) The forward exhaustive stuff we do now is, crazy. The backwards approach could be decoupled from DOM VRP into an independent pass, which I think would be wise. Using a SEME region copier is also something I really wanted to do long term. In fact, I believe a lot of tree-ssa-threadupdate.c ought to be ripped out and replaced with a SEME based copier. I did an experiment around these lines over the week-end, and now that you mention it, I feel less shy to speak about; well the patch does not yet pass bootstrap, and there still are about 20 failing test-cases. I feel better reading the code generation part of jump-threading after this patch ;-) Basically I think all the tree-ssa-threadupdate.c can be replaced by duplicate_seme_region that generalizes the code generation. Btw I once thought about doing on-the-fly lattice use/update and folding during basic-block copying (or even re-generating expressions via simplifying gimple_build ()). Or have a substitute-and-fold like facility that can run on SEME regions and do this. Richard. 
It appears you've built at least parts of two pieces needed to do all this as a Bodik style optimizer. Which is exactly the long term direction I think this code ought to take. One of the reasons I think we see more branches is that in SESE region copying we do not use the knowledge of the value of the condition for the last branch in a jump-thread path: we rely on other propagation passes to remove the branch. The last attached patch adds: /* Remove the last branch in the jump thread path. */ remove_ctrl_stmt_and_useless_edges (region_copy[n_region - 1], exit->dest); That's certainly a possibility. But I would expect that even with this limitation something would be picking up the fact that the branch is statically computable (even if it's an RTL optimizer). But it's definitely something to look for. Please let me know if the attached patches are producing better results on gcc. For the trunk: instructions: 1339016494968 branches: 243568982489 First version of your patch: instructions: 1339739533291 branches: 243806615986 Latest version of your patch: instructions: 1339749122609 branches: 243809838262 I think I got about the same results. I got my scripts installed on the gcc-farm. I first used an x86_64 gcc75 and valgrind was crashing, not recognizing how to decode an instruction.
Then I moved to gcc112 a powerpc64-linux where I got this data from stage2 cc1plus compiling the same file alias.ii at -O2: (I got 3 runs of each mostly because there is a bit of noise in all these numbers) $ valgrind --tool=cachegrind --cache-sim=no --branch-sim=yes ./cc1plus -O2 ~/alias.ii all 4 patches: ==153617== I refs: 13,914,038,211 ==153617== ==153617== Branches: 1,926,407,760 (1,879,827,481 cond + 46,580,279 ind) ==153617== Mispredicts: 144,890,904 ( 132,094,105 cond + 12,796,799 ind) ==153617== Mispred rate: 7.5% ( 7.0% + 27.4% ) ==34993== I refs: 13,915,335,629 ==34993== ==34993== Branches: 1,926,597,919 (1,880,017,558 cond + 46,580,361 ind) ==34993== Mispredicts: 144,974,266 ( 132,177,440 cond + 12,796,826 ind) ==34993== Mispred rate: 7.5% ( 7.0% + 27.4% ) ==140841== I refs: 13,915,334,459 ==140841== ==140841== Branches: 1,926,597,819 (1,880,017,458 cond + 46,580,361 ind) ==140841== Mispredicts: 144,974,296 ( 132,177,470 cond + 12,796,826 ind) ==140841== Mispred rate: 7.5% ( 7.0% + 27.4% ) patch 1: ==99902== I refs: 13,915,069,710 ==99902== ==99902== Branches: 1,926,963,813 (1,880,376,148 cond + 46,587,665 ind) ==99902== Mispredicts: 145,501,564 ( 132,656,576 cond + 12,844,988 ind) ==99902== Mispred rate: 7.5% ( 7.0% + 27.5% ) ==3907== I refs: 13,915,082,469 ==3907== ==3907== Branches:
Re: [PATCH, ciklplus]: Use -ffloat-store for 32bit x86 in cilk-plus/AN/builtin_fn_{custom,mutating}.c
On Tue, Nov 25, 2014 at 10:23 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Nov 24, 2014 at 10:33 PM, Jeff Law l...@redhat.com wrote: On 11/22/14 11:50, Uros Bizjak wrote: Hello! These two tests fix PR target/63847 [1], where x87 excess precision causes testcase to fail. The problem was triggered by -fpic, please see the PR for analysis. The patch adds -ffloat-store for 32bit x86 target, a standard and well tested solution for this problem. 2014-11-22 Uros Bizjak ubiz...@gmail.com PR target/63847 * c-c++-common/cilk-plus/AN/builtin_fn_custom.c: Add -ffloat-store for 32bit x86 targets. * c-c++-common/cilk-plus/AN/builtin_fn_mutating.c: Ditto. OK. Don't we have -fexcess-precision=standard for this now? Oh ... indeed. I will update the patch to enable it for all x86 targets. Thanks, Uros.
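The excess-precision problem discussed above can be sketched in a few lines. This is only an illustration, not code from the patch or the testcases: the names store_as_double, may_have_excess_precision and sum_with_stores are hypothetical, and the volatile store merely imitates by hand what -ffloat-store asks the compiler to do for every floating-point variable.

```cpp
#include <cfloat>

// Hypothetical helper: pushing the value through a volatile double forces it
// out to memory, which rounds away any extra x87 register precision.
double store_as_double(double value) {
    volatile double stored = value;
    return stored;
}

// FLT_EVAL_METHOD of 0 or 1 means double expressions are evaluated in double;
// any other value (2, or a negative "indeterminate" value) means intermediates
// may be kept in a wider format.
bool may_have_excess_precision() {
    return FLT_EVAL_METHOD != 0 && FLT_EVAL_METHOD != 1;
}

// Accumulate with a forced store after every addition, so each partial sum is
// rounded to double regardless of how the target evaluates expressions.
double sum_with_stores(const double* values, int count) {
    double total = 0.0;
    for (int i = 0; i < count; ++i)
        total = store_as_double(total + values[i]);
    return total;
}
```

On targets where may_have_excess_precision() is false the stores change nothing; on 32-bit x86 with x87 arithmetic they are what keeps comparisons against a reference result stable.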
Re: [PATCH, PR63995, CHKP] Use single static bounds var for varpool nodes sharing asm name
On Tue, Nov 25, 2014 at 9:45 AM, Ilya Enkovich enkovich@gmail.com wrote: Hi, This patch partly fixes PR bootstrap/63995 by avoiding duplicating static bounds vars. With this fix bootstrap still fails at stage 2 and 3 comparison. Bootstrapped and checked on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Ilya -- gcc/ 2014-11-25 Ilya Enkovich ilya.enkov...@intel.com PR bootstrap/63995 * tree-chkp (chkp_make_static_bounds): Share bounds var between nodes sharing assembler name. gcc/testsuite 2014-11-25 Ilya Enkovich ilya.enkov...@intel.com PR bootstrap/63995 * g++.dg/dg.exp: Add mpx-dg.exp. * g++.dg/pr63995-1.C: New. diff --git a/gcc/testsuite/g++.dg/dg.exp b/gcc/testsuite/g++.dg/dg.exp index 14beae1..44eab0c 100644 --- a/gcc/testsuite/g++.dg/dg.exp +++ b/gcc/testsuite/g++.dg/dg.exp @@ -18,6 +18,7 @@ # Load support procs. load_lib g++-dg.exp +load_lib mpx-dg.exp # If a testcase doesn't have special options, use these. global DEFAULT_CXXFLAGS diff --git a/gcc/testsuite/g++.dg/pr63995-1.C b/gcc/testsuite/g++.dg/pr63995-1.C new file mode 100644 index 000..82e7606 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr63995-1.C @@ -0,0 +1,16 @@ +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */ +/* { dg-require-effective-target mpx } */ +/* { dg-options -O2 -g -fcheck-pointer-bounds -mmpx } */ + +int test1 (int i) +{ + extern const int arr[10]; + return arr[i]; +} + +extern const int arr[10]; + +int test2 (int i) +{ + return arr[i]; +} diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c index 3e38691..d425084 100644 --- a/gcc/tree-chkp.c +++ b/gcc/tree-chkp.c @@ -2727,9 +2727,29 @@ chkp_make_static_bounds (tree obj) /* First check if we already have required var. */ if (chkp_static_var_bounds) { - slot = chkp_static_var_bounds-get (obj); - if (slot) - return *slot; + /* If there is a symbol sharing assembler name with obj, +we may use its bounds. 
*/ + if (TREE_CODE (obj) == VAR_DECL) + { + varpool_node *node = varpool_node::get_create (obj); + + while (node->previous_sharing_asm_name) + node = (varpool_node *)node->previous_sharing_asm_name; + + while (node) + { + slot = chkp_static_var_bounds->get (node->decl); + if (slot) + return *slot; + node = (varpool_node *)node->next_sharing_asm_name; + } Hum. varpool_node::get returns the ultimate alias target thus the walking shouldn't be necessary. Just node = varpool_node::get_create (obj); slot = chkp_static_var_bounds->get (node->decl); if (slot) return *slot; and then making sure to set the decl also for node->decl. I suppose it really asks for making chkp_static_var_bounds->get based on a varpool node and not a decl so you consistently use the ultimate alias target. Richard. + } + else + { + slot = chkp_static_var_bounds->get (obj); + if (slot) + return *slot; + } } /* Build decl for bounds var. */
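The lookup proposed in the patch can be modelled outside GCC roughly as follows. This is a simplified sketch under stated assumptions: Node is a hypothetical stand-in for varpool_node, integer ids stand in for tree decls and bounds variables, and BoundsMap stands in for chkp_static_var_bounds. Only the two link fields mirror names from the patch.

```cpp
#include <unordered_map>

// Hypothetical stand-in for varpool_node: declarations that share an
// assembler name are linked into a doubly linked chain.
struct Node {
    int decl;                                   // stand-in for a tree decl
    Node* previous_sharing_asm_name = nullptr;
    Node* next_sharing_asm_name = nullptr;
    explicit Node(int d) : decl(d) {}
};

// Stand-in for chkp_static_var_bounds: decl -> bounds variable id.
using BoundsMap = std::unordered_map<int, int>;

// Rewind to the head of the chain, then return the first bounds entry
// recorded for any declaration in it, or nullptr if there is none.
const int* find_shared_bounds(const BoundsMap& bounds, const Node* node) {
    while (node->previous_sharing_asm_name)
        node = node->previous_sharing_asm_name;
    for (; node; node = node->next_sharing_asm_name) {
        auto it = bounds.find(node->decl);
        if (it != bounds.end())
            return &it->second;
    }
    return nullptr;
}
```

With the block-scope and file-scope declarations of arr from the testcase linked into one chain, a bounds variable recorded for either declaration would be found from the other, which is the duplication the patch aims to avoid.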
Re: [PATCH, PR64056, i386] Fix chkp tests requiring mempcpy
On Tue, Nov 25, 2014 at 10:11 AM, Ilya Enkovich enkovich@gmail.com wrote: Hi, This patch adds check for mempcpy availability for tests requiring it. Checked with RUNTESTFLAGS=--target_board='unix{-m32,}' i386.exp=chkp-*. OK for trunk? Ok. Thanks, Richard. Thanks, Ilya -- 2014-11-25 Ilya Enkovich ilya.enkov...@intel.com PR target/64056 * gcc.target/i386/chkp-strlen-4.c: Add mempcpy target check. * gcc.target/i386/chkp-stropt-4.c: Likewise. * gcc.target/i386/chkp-stropt-8.c: Likewise. * gcc.target/i386/chkp-stropt-12.c: Likewise. * gcc.target/i386/chkp-stropt-16.c: Likewise. diff --git a/gcc/testsuite/gcc.target/i386/chkp-strlen-4.c b/gcc/testsuite/gcc.target/i386/chkp-strlen-4.c index a9ebe2b..2da762a 100644 --- a/gcc/testsuite/gcc.target/i386/chkp-strlen-4.c +++ b/gcc/testsuite/gcc.target/i386/chkp-strlen-4.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target mpx } */ +/* { dg-require-effective-target mempcpy } */ /* { dg-options -fcheck-pointer-bounds -mmpx -O2 -fdump-tree-strlen -D_GNU_SOURCE } */ /* { dg-final { scan-tree-dump-times strlen 1 strlen } } */ /* { dg-final { cleanup-tree-dump strlen } } */ diff --git a/gcc/testsuite/gcc.target/i386/chkp-stropt-12.c b/gcc/testsuite/gcc.target/i386/chkp-stropt-12.c index 94e936d..01a5159 100644 --- a/gcc/testsuite/gcc.target/i386/chkp-stropt-12.c +++ b/gcc/testsuite/gcc.target/i386/chkp-stropt-12.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target mpx } */ +/* { dg-require-effective-target mempcpy } */ /* { dg-options -fcheck-pointer-bounds -mmpx -O2 -fdump-tree-chkpopt -fchkp-use-fast-string-functions -D_GNU_SOURCE } */ /* { dg-final { scan-tree-dump-not mempcpy_nobnd chkpopt } } */ /* { dg-final { cleanup-tree-dump chkpopt } } */ diff --git a/gcc/testsuite/gcc.target/i386/chkp-stropt-16.c b/gcc/testsuite/gcc.target/i386/chkp-stropt-16.c index 4b26d58..f925ef9 100644 --- a/gcc/testsuite/gcc.target/i386/chkp-stropt-16.c +++ b/gcc/testsuite/gcc.target/i386/chkp-stropt-16.c 
@@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target mpx } */ +/* { dg-require-effective-target mempcpy } */ /* { dg-options -fcheck-pointer-bounds -mmpx -O2 -fdump-tree-chkpopt -fchkp-use-nochk-string-functions -fchkp-use-fast-string-functions -D_GNU_SOURCE } */ /* { dg-final { scan-tree-dump mempcpy_nobnd_nochk chkpopt } } */ /* { dg-final { cleanup-tree-dump chkpopt } } */ diff --git a/gcc/testsuite/gcc.target/i386/chkp-stropt-4.c b/gcc/testsuite/gcc.target/i386/chkp-stropt-4.c index 4ee2390..3ae6bf5 100644 --- a/gcc/testsuite/gcc.target/i386/chkp-stropt-4.c +++ b/gcc/testsuite/gcc.target/i386/chkp-stropt-4.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target mpx } */ +/* { dg-require-effective-target mempcpy } */ /* { dg-options -fcheck-pointer-bounds -mmpx -O2 -fdump-tree-chkpopt -fchkp-use-nochk-string-functions -D_GNU_SOURCE } */ /* { dg-final { scan-tree-dump mempcpy_nochk chkpopt } } */ /* { dg-final { cleanup-tree-dump chkpopt } } */ diff --git a/gcc/testsuite/gcc.target/i386/chkp-stropt-8.c b/gcc/testsuite/gcc.target/i386/chkp-stropt-8.c index 8c3b15d..6d6d55e 100644 --- a/gcc/testsuite/gcc.target/i386/chkp-stropt-8.c +++ b/gcc/testsuite/gcc.target/i386/chkp-stropt-8.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target mpx } */ +/* { dg-require-effective-target mempcpy } */ /* { dg-options -fcheck-pointer-bounds -mmpx -O2 -fdump-tree-chkpopt -fchkp-use-fast-string-functions -D_GNU_SOURCE } */ /* { dg-final { scan-tree-dump mempcpy_nobnd chkpopt } } */ /* { dg-final { cleanup-tree-dump chkpopt } } */
Re: [PATCH] Fix regressions in libgomp testsuite: set flag_fat_lto_objects for offload
On Mon, Nov 24, 2014 at 5:44 PM, Ilya Verbin iver...@gmail.com wrote: On 17 Nov 10:57, Richard Biener wrote: On Fri, Nov 14, 2014 at 6:08 PM, Ilya Verbin iver...@gmail.com wrote: On 14 Nov 09:01, H.J. Lu wrote: On Fri, Nov 14, 2014 at 8:51 AM, Ilya Verbin iver...@gmail.com wrote: On 14 Nov 08:46, H.J. Lu wrote: What happens when -flto is used on command line? Will we generate both LTO IR and offload IR? Right. I'm not sure whether we should make slim objects in case of LTO + offload IR... Isn't __gnu_lto_slim only applied to regular LTO IR? Should offload IR be handled separately from regular LTO IR? It is odd to use flag_fat_lto_objects to control offload IR. It is handled separately, but it uses a common infrastructure with regular LTO for streaming, therefore compile_file automatically emits __gnu_lto_slim when there is at least one section with IR (flag_generate_lto is set). You propose to introduce a second flag like flag_fat_lto_objects to disable __gnu_lto_slim? Err... why is offloading not guarded with a new symbol like __gnu_lto_offload? Well, it's possible to guard offload IR with a new symbol, using a patch like this (it is not fully regtested). But I don't like it... Maybe we could just change the meaning of __gnu_lto_v1 from object contains LTO IR to object contains any IR? In collect2 both LTO and offload cases are handled identically. Is there other place where the symbol is used? I don't think so (and even collect2.c should be changed to use simple-object to identify LTO objects rather than ar...). But I think libtool uses it as well. In the patch adding flag_generate_offload sounds like a good solution, I didn't like emitting fat LTO objects unconditionally just because we offload. Richard. 
-- Ilya diff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c index c133a22..f09d79d 100644 --- a/gcc/ada/gcc-interface/decl.c +++ b/gcc/ada/gcc-interface/decl.c @@ -1490,7 +1490,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, int definition) && definition && debug_info_p && !optimize - && !flag_generate_lto) + && !flag_generate_lto + && !flag_generate_offload) { tree param = create_param_decl (gnu_entity_name, gnu_type, false); gnat_pushdecl (param, gnat_entity); diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index 2fd99a7..fed1a3e 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -2075,7 +2075,7 @@ ipa_passes (void) } /* Some targets need to handle LTO assembler output specially. */ - if (flag_generate_lto) + if (flag_generate_lto || flag_generate_offload) targetm.asm_out.lto_start (); if (!in_lto_p) @@ -2092,7 +2092,7 @@ ipa_passes (void) } } - if (flag_generate_lto) + if (flag_generate_lto || flag_generate_offload) targetm.asm_out.lto_end (); if (!flag_ltrans && (in_lto_p || !flag_lto || flag_fat_lto_objects)) @@ -2176,10 +2176,10 @@ symbol_table::compile (void) /* Offloading requires LTO infrastructure. */ if (!in_lto_p && g->have_offload) -flag_generate_lto = 1; +flag_generate_offload = 1; /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE. */ - if (flag_generate_lto) + if (flag_generate_lto || flag_generate_offload) lto_streamer_hooks_init (); /* Don't run the IPA passes if there was any error or sorry messages. */ diff --git a/gcc/collect2.c b/gcc/collect2.c index 9c3a1c5..2dcebcd 100644 --- a/gcc/collect2.c +++ b/gcc/collect2.c @@ -2392,12 +2392,16 @@ scan_prog_file (const char *prog_name, scanpass which_pass, if (found_lto) continue; - /* Look for the LTO info marker symbol, and add filename to + /* Look for the LTO or offload info marker symbol, and add filename to the LTO objects list if found. */ for (p = buf; (ch = *p) != '\0' && ch != '\n'; p++) if (ch == ' ' && p[1] == '_' && p[2] == '_' - && (strncmp (p + (p[3] == '_' ? 
2 : 1), "__gnu_lto_v1", 12) == 0) - && ISSPACE (p[p[3] == '_' ? 14 : 13])) + && (((strncmp (p + (p[3] == '_' ? 2 : 1), + "__gnu_lto_v1", 12) == 0) + && ISSPACE (p[p[3] == '_' ? 14 : 13])) + || ((strncmp (p + (p[3] == '_' ? 2 : 1), + "__gnu_offload_v1", 16) == 0) + && ISSPACE (p[p[3] == '_' ? 18 : 17])))) { add_lto_object (&lto_objects, prog_name); diff --git a/gcc/common.opt b/gcc/common.opt index 41c8d4e..11a5500 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -67,6 +67,10 @@ int *param_values Variable int flag_generate_lto +; Nonzero if we should write GIMPLE bytecode for offload compilation. +Variable +int
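The marker scan in the collect2.c hunk can be written as a standalone sketch. The function names matches_marker and line_has_ir_marker are hypothetical, and one simplification is an assumption of this sketch rather than collect2 behaviour: end of string is accepted as a terminator in addition to whitespace. The two marker names and the optional extra leading underscore come from the patch.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>

// True if the text at p is exactly `marker`, followed by whitespace or the
// end of the string.
static bool matches_marker(const char* p, const char* marker) {
    std::size_t len = std::strlen(marker);
    if (std::strncmp(p, marker, len) != 0)
        return false;
    unsigned char next = static_cast<unsigned char>(p[len]);
    return next == '\0' || std::isspace(next);
}

// Scan one line of nm-style output for either IR marker symbol.
bool line_has_ir_marker(const char* line) {
    for (const char* p = line; *p != '\0' && *p != '\n'; ++p) {
        if (*p != ' ')
            continue;
        const char* sym = p + 1;
        // Some targets prepend one extra underscore to symbol names.
        if (sym[0] == '_' && sym[1] == '_' && sym[2] == '_')
            ++sym;
        if (matches_marker(sym, "__gnu_lto_v1")
            || matches_marker(sym, "__gnu_offload_v1"))
            return true;
    }
    return false;
}
```

Treating both markers identically here reflects the point made in the thread that collect2 handles the LTO and offload cases the same way.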
Re: [PATCH, PR63995, CHKP] Use single static bounds var for varpool nodes sharing asm name
2014-11-25 12:43 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Tue, Nov 25, 2014 at 9:45 AM, Ilya Enkovich enkovich@gmail.com wrote: Hi, This patch partly fixes PR bootstrap/63995 by avoiding duplicating static bounds vars. With this fix bootstrap still fails at stage 2 and 3 comparison. Bootstrapped and checked on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Ilya -- gcc/ 2014-11-25 Ilya Enkovich ilya.enkov...@intel.com PR bootstrap/63995 * tree-chkp (chkp_make_static_bounds): Share bounds var between nodes sharing assembler name. gcc/testsuite 2014-11-25 Ilya Enkovich ilya.enkov...@intel.com PR bootstrap/63995 * g++.dg/dg.exp: Add mpx-dg.exp. * g++.dg/pr63995-1.C: New. diff --git a/gcc/testsuite/g++.dg/dg.exp b/gcc/testsuite/g++.dg/dg.exp index 14beae1..44eab0c 100644 --- a/gcc/testsuite/g++.dg/dg.exp +++ b/gcc/testsuite/g++.dg/dg.exp @@ -18,6 +18,7 @@ # Load support procs. load_lib g++-dg.exp +load_lib mpx-dg.exp # If a testcase doesn't have special options, use these. global DEFAULT_CXXFLAGS diff --git a/gcc/testsuite/g++.dg/pr63995-1.C b/gcc/testsuite/g++.dg/pr63995-1.C new file mode 100644 index 000..82e7606 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr63995-1.C @@ -0,0 +1,16 @@ +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */ +/* { dg-require-effective-target mpx } */ +/* { dg-options -O2 -g -fcheck-pointer-bounds -mmpx } */ + +int test1 (int i) +{ + extern const int arr[10]; + return arr[i]; +} + +extern const int arr[10]; + +int test2 (int i) +{ + return arr[i]; +} diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c index 3e38691..d425084 100644 --- a/gcc/tree-chkp.c +++ b/gcc/tree-chkp.c @@ -2727,9 +2727,29 @@ chkp_make_static_bounds (tree obj) /* First check if we already have required var. */ if (chkp_static_var_bounds) { - slot = chkp_static_var_bounds-get (obj); - if (slot) - return *slot; + /* If there is a symbol sharing assembler name with obj, +we may use its bounds. 
*/ + if (TREE_CODE (obj) == VAR_DECL) + { + varpool_node *node = varpool_node::get_create (obj); + + while (node->previous_sharing_asm_name) + node = (varpool_node *)node->previous_sharing_asm_name; + + while (node) + { + slot = chkp_static_var_bounds->get (node->decl); + if (slot) + return *slot; + node = (varpool_node *)node->next_sharing_asm_name; + } Hum. varpool_node::get returns the ultimate alias target thus the walking shouldn't be necessary. Just node = varpool_node::get_create (obj); slot = chkp_static_var_bounds->get (node->decl); if (slot) return *slot; and then making sure to set the decl also for node->decl. I suppose it really asks for making chkp_static_var_bounds->get based on a varpool node and not a decl so you consistently use the ultimate alias target. varpool_node::get just returns symtab_node::get which returns decl->decl_with_vis.symtab_node - thus no aliases walkthrough. Also none of two varpool_nodes is an alias. The only connection between these nodes seems to be {next,previous}_sharing_asm_name.
Here is how these nodes look: (gdb) p *$2 $3 = {symtab_node = {type = SYMTAB_VARIABLE, resolution = LDPR_UNKNOWN, definition = 0, alias = 0, weakref = 0, cpp_implicit_alias = 0, analyzed = 0, writeonly = 0, refuse_visibility_changes = 0, externally_visible = 0, no_reorder = 0, force_output = 0, forced_by_abi = 0, unique_name = 0, implicit_section = 0, body_removed = 1, used_from_other_partition = 0, in_other_partition = 0, address_taken = 0, in_init_priority_hash = 0, need_lto_streaming = 0, offloadable = 0, order = 3, decl = 0x77dd2cf0, next = 0x77f46200, previous = 0x77dd67a8, next_sharing_asm_name = 0x0, previous_sharing_asm_name = 0x77f46200, same_comdat_group = 0x0, ref_list = {references = 0x0, referring = {m_vec = 0x23856b0}}, alias_target = 0x0, lto_file_data = 0x0, aux = 0x0, x_comdat_group = 0x0, x_section = 0x0}, output = 0, need_bounds_init = 0, dynamically_initialized = 0, tls_model = TLS_MODEL_NONE, used_by_single_function = 0} (gdb) p *$5 $6 = {symtab_node = {type = SYMTAB_VARIABLE, resolution = LDPR_UNKNOWN, definition = 0, alias = 0, weakref = 0, cpp_implicit_alias = 0, analyzed = 0, writeonly = 0, refuse_visibility_changes = 0, externally_visible = 0, no_reorder = 0, force_output = 0, forced_by_abi = 0, unique_name = 0, implicit_section = 0, body_removed = 1, used_from_other_partition = 0, in_other_partition = 0, address_taken = 0, in_init_priority_hash = 0, need_lto_streaming = 0, offloadable = 0, order = 2, decl = 0x77dd2bd0, next = 0x77dd6620, previous = 0x77f46300, next_sharing_asm_name = 0x77f46300, previous_sharing_asm_name = 0x0, same_comdat_group = 0x0, ref_list = {references =
Re: [Patch] Improving jump-thread pass for PR 54742
On 2014.11.24 at 22:05 +, Sebastian Pop wrote: I got my scripts installed on the gcc-farm. I first used an x86_64 gcc75 and valgrind was crashing not recognizing how to decode an instruction. Then I moved to gcc112 a powerpc64-linux where I got this data from stage2 cc1plus compiling the same file alias.ii at -O2: (I got 3 runs of each mostly because there is a bit of noise in all these numbers) $ valgrind --tool=cachegrind --cache-sim=no --branch-sim=yes ./cc1plus -O2 ~/alias.ii
BTW perf is also available on gcc112:
trippels@gcc2-power8 ~ % perf list
List of pre-defined events (to be used in -e):
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
cache-references [Hardware event]
cache-misses [Hardware event]
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
stalled-cycles-backend OR idle-cycles-backend [Hardware event]
cpu-clock [Software event]
task-clock [Software event]
page-faults OR faults [Software event]
context-switches OR cs [Software event]
cpu-migrations OR migrations [Software event]
minor-faults [Software event]
major-faults [Software event]
alignment-faults [Software event]
emulation-faults [Software event]
dummy [Software event]
L1-dcache-loads [Hardware cache event]
L1-dcache-load-misses [Hardware cache event]
L1-dcache-store-misses [Hardware cache event]
L1-dcache-prefetches [Hardware cache event]
L1-icache-loads [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
L1-icache-prefetches [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-load-misses [Hardware cache event]
LLC-stores [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-prefetches [Hardware cache event]
dTLB-load-misses [Hardware cache event]
iTLB-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
branch-load-misses [Hardware cache event]
rNNN [Raw hardware event descriptor]
cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor] (see 'man perf-list' on how to encode it)
mem:addr[:access] [Hardware breakpoint]
-- Markus
Re: [Patch, libstdc++/63497] Avoid dereferencing invalid iterator in regex_executor
On 25/11/14 00:41 -0800, Tim Shen wrote: On Wed, Oct 22, 2014 at 8:19 PM, Tim Shen tims...@google.com wrote: Committed. Thank you too! I'm backporting this patch to gcc-4_9-branch. Do we usually boot test it and then commit directly, or it should be reviewed again? I approved it for the branch (in the bugzilla comments) so usually you could just test it and commit it ... but since you asked ... maybe you should leave the _M_word_boundary signature unchanged for the branch, since the unused parameter doesn't do any harm and removing it isn't needed for the fix to work.
Re: [PATCH, ciklplus]: Use -ffloat-store for 32bit x86 in cilk-plus/AN/builtin_fn_{custom,mutating}.c
On Tue, Nov 25, 2014 at 10:38 AM, Uros Bizjak ubiz...@gmail.com wrote: These two tests fix PR target/63847 [1], where x87 excess precision causes testcase to fail. The problem was triggered by -fpic, please see the PR for analysis. The patch adds -ffloat-store for 32bit x86 target, a standard and well tested solution for this problem. 2014-11-22 Uros Bizjak ubiz...@gmail.com PR target/63847 * c-c++-common/cilk-plus/AN/builtin_fn_custom.c: Add -ffloat-store for 32bit x86 targets. * c-c++-common/cilk-plus/AN/builtin_fn_mutating.c: Ditto. OK. Don't we have -fexcess-precision=standard for this now? Oh ... indeed. I will update the patch to enable it for all x86 targets. cc1plus: sorry, unimplemented: -fexcess-precision=standard for C++ Uros.
RE: [PATCH] Fix PR ipa/61190, updated
Hi Honza, On Mon, 24 Nov 2014 16:57:42 +0100, Jan Hubicka wrote: +cgraph_node::call_for_symbol_thunks_and_aliases_1 (bool (*callback) + (cgraph_node *, void *), + void *data, + bool include_overwritable, + bool exclude_virtual_thunks) Instead of adding _1 variant into public API, please just add implicit argument bool exclude_virtual_thunks=false into +cgraph_node::call_for_symbol_thunks_and_aliases Ok, done. Index: gcc/ipa-pure-const.c === --- gcc/ipa-pure-const.c (revision 215888) +++ gcc/ipa-pure-const.c (working copy) @@ -744,6 +744,8 @@ analyze_function (struct cgraph_node *fn, bool ipa { /* Thunk gets propagated through, so nothing interesting happens. */ gcc_assert (ipa); + if (fn->thunk.virtual_offset_p) + l->pure_const_state = IPA_NEITHER; return l; } Hmm, I looked again at the above if statement, and I think now it should better be if (fn->thunk.thunk_p && fn->thunk.virtual_offset_p), because thunk.virtual_offset_p is probably not well defined if we come here because of fn->alias == true. This makes the lattice be initialized correctly, but you also need the function_symbol calls that will skip thunks replaced by something like function_or_non_virtual_thunk_symbol. Oh, I see what you mean, thanks. I created a new method function_or_virtual_thunk_symbol() for this. And simplified the algorithm of both function_symbol variants a bit. Attached, you'll find my updated patch for review. Boot-strapped and regression tested on x86_64-linux-gnu. OK for trunk? Thanks Bernd. Can you, please, send the updated patch? Sorry for late review, Honza 2014-11-25 Bernd Edlinger bernd.edlin...@hotmail.de PR ipa/61190 * cgraph.h (symtab_node::call_for_symbol_and_aliases): Fix comment. (cgraph_node::function_or_virtual_thunk_symbol): New function. (cgraph_node::call_for_symbol_and_aliases): Fix comment. (cgraph_node::call_for_symbol_thunks_and_aliases): Adjust comment. Add new optional parameter exclude_virtual_thunks.
* cgraph.c (cgraph_node::call_for_symbol_thunks_and_aliases): Add new optional parameter exclude_virtual_thunks. (cgraph_node::set_const_flag): Don't propagate to virtual thunks. (cgraph_node::set_pure_flag): Likewise. (cgraph_node::function_symbol): Simplified. (cgraph_node::function_or_virtual_thunk_symbol): New function. * ipa-pure-const.c (analyze_function): For virtual thunks set pure_const_state to IPA_NEITHER. (propagate_pure_const): Use function_or_virtual_thunk_symbol. testsuite/ChangeLog: 2014-11-25 Bernd Edlinger bernd.edlin...@hotmail.de PR ipa/61190 * g++.old-deja/g++.mike/p4736b.C: Use -O2.
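The API shape requested in the review, a defaulted trailing parameter instead of a separate _1 variant, can be illustrated with a small sketch. Nothing here is GCC code: Symbol, call_for_symbols and count_symbol are hypothetical stand-ins, and only the parameter name exclude_virtual_thunks and its default of false are taken from the thread.

```cpp
#include <vector>

// Hypothetical stand-in for cgraph_node.
struct Symbol {
    bool is_virtual_thunk;
};

// Invoke callback on each symbol and stop early, returning true, as soon as
// the callback returns true. Callers that omit the last argument get the old
// behaviour, so no second entry point is needed.
bool call_for_symbols(const std::vector<Symbol>& symbols,
                      bool (*callback)(const Symbol&, void*),
                      void* data,
                      bool exclude_virtual_thunks = false) {
    for (const Symbol& s : symbols) {
        if (exclude_virtual_thunks && s.is_virtual_thunk)
            continue;
        if (callback(s, data))
            return true;
    }
    return false;
}

// Example callback: count visited symbols through the opaque data pointer.
bool count_symbol(const Symbol&, void* data) {
    ++*static_cast<int*>(data);
    return false;
}
```

Existing call sites keep compiling unchanged, while a caller that must not propagate to virtual thunks passes true as the last argument.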
Re: [PATCH, PR63995, CHKP] Use single static bounds var for varpool nodes sharing asm name
On Tue, Nov 25, 2014 at 11:19 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-11-25 12:43 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Tue, Nov 25, 2014 at 9:45 AM, Ilya Enkovich enkovich@gmail.com wrote: Hi, This patch partly fixes PR bootstrap/63995 by avoiding duplicating static bounds vars. With this fix bootstrap still fails at stage 2 and 3 comparison. Bootstrapped and checked on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Ilya -- gcc/ 2014-11-25 Ilya Enkovich ilya.enkov...@intel.com PR bootstrap/63995 * tree-chkp (chkp_make_static_bounds): Share bounds var between nodes sharing assembler name. gcc/testsuite 2014-11-25 Ilya Enkovich ilya.enkov...@intel.com PR bootstrap/63995 * g++.dg/dg.exp: Add mpx-dg.exp. * g++.dg/pr63995-1.C: New. diff --git a/gcc/testsuite/g++.dg/dg.exp b/gcc/testsuite/g++.dg/dg.exp index 14beae1..44eab0c 100644 --- a/gcc/testsuite/g++.dg/dg.exp +++ b/gcc/testsuite/g++.dg/dg.exp @@ -18,6 +18,7 @@ # Load support procs. load_lib g++-dg.exp +load_lib mpx-dg.exp # If a testcase doesn't have special options, use these. global DEFAULT_CXXFLAGS diff --git a/gcc/testsuite/g++.dg/pr63995-1.C b/gcc/testsuite/g++.dg/pr63995-1.C new file mode 100644 index 000..82e7606 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr63995-1.C @@ -0,0 +1,16 @@ +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */ +/* { dg-require-effective-target mpx } */ +/* { dg-options -O2 -g -fcheck-pointer-bounds -mmpx } */ + +int test1 (int i) +{ + extern const int arr[10]; + return arr[i]; +} + +extern const int arr[10]; + +int test2 (int i) +{ + return arr[i]; +} diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c index 3e38691..d425084 100644 --- a/gcc/tree-chkp.c +++ b/gcc/tree-chkp.c @@ -2727,9 +2727,29 @@ chkp_make_static_bounds (tree obj) /* First check if we already have required var. 
*/ if (chkp_static_var_bounds) { - slot = chkp_static_var_bounds->get (obj); - if (slot) - return *slot; + /* If there is a symbol sharing assembler name with obj, +we may use its bounds. */ + if (TREE_CODE (obj) == VAR_DECL) + { + varpool_node *node = varpool_node::get_create (obj); + + while (node->previous_sharing_asm_name) + node = (varpool_node *)node->previous_sharing_asm_name; + + while (node) + { + slot = chkp_static_var_bounds->get (node->decl); + if (slot) + return *slot; + node = (varpool_node *)node->next_sharing_asm_name; + } Hum. varpool_node::get returns the ultimate alias target thus the walking shouldn't be necessary. Just node = varpool_node::get_create (obj); slot = chkp_static_var_bounds->get (node->decl); if (slot) return *slot; and then making sure to set the decl also for node->decl. I suppose it really asks for making chkp_static_var_bounds->get based on a varpool node and not a decl so you consistently use the ultimate alias target. varpool_node::get just returns symtab_node::get which returns decl->decl_with_vis.symtab_node - thus no aliases walkthrough. Also none of two varpool_nodes is an alias. The only connection between these nodes seems to be {next,previous}_sharing_asm_name. Here is how these nodes look: Ok, then it's get_for_asmname (). That said - the above loops look bogus to me. Honza - any better ideas? Richard.
(gdb) p *$2 $3 = {symtab_node = {type = SYMTAB_VARIABLE, resolution = LDPR_UNKNOWN, definition = 0, alias = 0, weakref = 0, cpp_implicit_alias = 0, analyzed = 0, writeonly = 0, refuse_visibility_changes = 0, externally_visible = 0, no_reorder = 0, force_output = 0, forced_by_abi = 0, unique_name = 0, implicit_section = 0, body_removed = 1, used_from_other_partition = 0, in_other_partition = 0, address_taken = 0, in_init_priority_hash = 0, need_lto_streaming = 0, offloadable = 0, order = 3, decl = 0x77dd2cf0, next = 0x77f46200, previous = 0x77dd67a8, next_sharing_asm_name = 0x0, previous_sharing_asm_name = 0x77f46200, same_comdat_group = 0x0, ref_list = {references = 0x0, referring = {m_vec = 0x23856b0}}, alias_target = 0x0, lto_file_data = 0x0, aux = 0x0, x_comdat_group = 0x0, x_section = 0x0}, output = 0, need_bounds_init = 0, dynamically_initialized = 0, tls_model = TLS_MODEL_NONE, used_by_single_function = 0} (gdb) p *$5 $6 = {symtab_node = {type = SYMTAB_VARIABLE, resolution = LDPR_UNKNOWN, definition = 0, alias = 0, weakref = 0, cpp_implicit_alias = 0, analyzed = 0, writeonly = 0, refuse_visibility_changes = 0, externally_visible = 0, no_reorder = 0, force_output = 0, forced_by_abi = 0, unique_name = 0, implicit_section = 0, body_removed = 1, used_from_other_partition = 0, in_other_partition = 0, address_taken = 0, in_init_priority_hash = 0, need_lto_streaming =
Re: [PATCH, 1/8] Expand oacc kernels after pass_build_ealias
On 24-11-14 11:56, Tom de Vries wrote: On 15-11-14 18:19, Tom de Vries wrote: On 15-11-14 13:14, Tom de Vries wrote: Hi, I'm submitting a patch series with initial support for the oacc kernels directive. The patch series uses pass_parallelize_loops to implement parallelization of loops in the oacc kernels region. The patch series consists of these 8 patches: ... 1 Expand oacc kernels after pass_build_ealias 2 Add pass_oacc_kernels 3 Add pass_ch_oacc_kernels to pass_oacc_kernels 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels 5 Add pass_loop_im to pass_oacc_kernels 6 Add pass_ccp to pass_oacc_kernels 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels 8 Do simple omp lowering for no address taken var ... This patch moves omp expansion of the oacc kernels directive to after pass_build_ealias. The rationale is that in order to use pass_parallelize_loops for analysis and transformation of an oacc kernels region, we postpone omp expansion of that region until the earliest point in the pass list where enough information is available to run pass_parallelize_loops, in other words, after pass_build_ealias. The patch postpones expansion in expand_omp, and ensures expansion by adding pass_expand_omp_ssa: - after pass_build_ealias, and - after pass_all_early_optimizations for the case we're not optimizing. In order to make sure the oacc kernels region arrives at pass_expand_omp_ssa, the way it left expand_omp, the patch makes pass_ccp and pass_forwprop aware of lowered omp code, to handle it conservatively. The patch contains changes in expand_omp_target to deal with ssa-code, similar to what is already present in expand_omp_taskreg. Furthermore, the patch forces the .omp_data_sizes and .omp_data_kinds to not be static for oacc kernels. It does this to get some references to .omp_data_sizes and .omp_data_kinds in the ssa code. Without these references, the definitions will be removed.
The reference of the variables in GIMPLE_OACC_KERNELS is not enough to have them not removed. [ In vries/oacc-kernels, I used a BUILT_IN_USE kludge for this purpose ]. Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in the original function of which the definition has been removed (as in moved to the split off function). TODO_remove_unused_locals takes care of some of them, but not the anonymous ones. So the patch iterates over all SSA_NAMEs to find these dangling SSA_NAMEs and releases them. Reposting with small update: I've replaced the use of the rather generic gimple_stmt_omp_lowering_p with the more specific gimple_stmt_omp_data_i_init_p. Bootstrapped and reg-tested in the same way as before. I've moved pass_expand_omp_ssa one down in the pass list, past pass_fre. This allows fre to unify references to the same omp variable before entering pass_oacc_kernels, which helps pass_lim in pass_oacc_kernels. F.i. this reduction fragment: ... # VUSE .MEM_8 # PT = { D.2282 } _67 = .omp_data_i_59-sumD.2270; # VUSE .MEM_8 _68 = *_67; _70 = _66 + _68; # VUSE .MEM_8 # PT = { D.2282 } _69 = .omp_data_i_59-sumD.2270; # .MEM_71 = VDEF .MEM_8 *_69 = _70; ... is transformed by fre into: ... # VUSE .MEM_8 # PT = { D.2282 } _67 = .omp_data_i_59-sumD.2270; # VUSE .MEM_8 _68 = *_67; _70 = _66 + _68; # .MEM_71 = VDEF .MEM_8 *_67 = _70; ... In order for pass_fre to respect the kernels region boundaries, I've added a change in tree-ssa-sccvn.c:visit_use to handle the .omp_data_i init conservatively. Bootstrapped and reg-tested as before. OK for trunk? Thanks, - Tom [PATCH 1/7] Expand oacc kernels after pass_fre 2014-11-25 Tom de Vries t...@codesourcery.com * function.h (struct function): Add contains_oacc_kernels field. * gimplify.c (gimplify_omp_workshare): Set contains_oacc_kernels. * omp-low.c: Include gimple-pretty-print.h. (release_first_vuse_in_edge_dest): New function. (expand_omp_target): Handle ssa-code. 
(expand_omp): Don't expand GIMPLE_OACC_KERNELS when not in ssa. (pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in properties_provided field. (pass_expand_omp::execute): Set PROP_gimple_eomp in cfun-curr_properties only if cfun does not contain oacc kernels. (pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to todo_flags_finish field. (pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling execute_expand_omp. (lower_omp_target): Add static_arrays variable, init to 1. Don't use static arrays for kernels directive. Use static_arrays variable. Handle case that .omp_data_kinds is not static. (gimple_stmt_ssa_operand_references_var_p) (gimple_stmt_omp_data_i_init_p): New function. * omp-low.h (gimple_stmt_omp_data_i_init_p): Declare. * passes.def: Add pass_expand_omp_ssa after pass_fre. Add pass_expand_omp_ssa after pass_all_early_optimizations. * tree-ssa-ccp.c: Include omp-low.h. (surely_varying_stmt_p, ccp_visit_stmt): Handle
Re: [PATCH, 2/8] Add pass_oacc_kernels
On 15-11-14 18:20, Tom de Vries wrote: On 15-11-14 13:14, Tom de Vries wrote: Hi, I'm submitting a patch series with initial support for the oacc kernels directive. The patch series uses pass_parallelize_loops to implement parallelization of loops in the oacc kernels region. The patch series consists of these 8 patches: ... 1 Expand oacc kernels after pass_build_ealias 2 Add pass_oacc_kernels 3 Add pass_ch_oacc_kernels to pass_oacc_kernels 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels 5 Add pass_loop_im to pass_oacc_kernels 6 Add pass_ccp to pass_oacc_kernels 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels 8 Do simple omp lowering for no address taken var ... This patch adds a pass group pass_oacc_kernels. The rationale is that we want a pass group to run oacc kernels region related (optimization) passes in. Updated for moving pass_oacc_kernels down past pass_fre in the pass list. Bootstrapped and reg-tested as before. OK for trunk? Thanks, - Tom [PATCH 2/7] Add pass_oacc_kernels 2014-11-25 Tom de Vries t...@codesourcery.com * passes.def: Add pass group pass_oacc_kernels. * tree-pass.h (make_pass_oacc_kernels): Declare. * tree-ssa-loop.c (gate_oacc_kernels): New static function. (pass_data_oacc_kernels): New pass_data. (class pass_oacc_kernels): New pass. (make_pass_oacc_kernels): New function. --- gcc/passes.def | 7 ++- gcc/tree-pass.h | 1 + gcc/tree-ssa-loop.c | 48 3 files changed, 55 insertions(+), 1 deletion(-) diff --git a/gcc/passes.def b/gcc/passes.def index bf1cd34..efb3d8c 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -86,7 +86,12 @@ along with GCC; see the file COPYING3. If not see execute TODO_rebuild_alias at this point. */ NEXT_PASS (pass_build_ealias); NEXT_PASS (pass_fre); - NEXT_PASS (pass_expand_omp_ssa); + /* Pass group that runs when there are oacc kernels in the + function. 
*/ + NEXT_PASS (pass_oacc_kernels); + PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels) + NEXT_PASS (pass_expand_omp_ssa); + POP_INSERT_PASSES () NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_cd_dce); NEXT_PASS (pass_early_ipa_sra); diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index 75f8aa5..d63ab2b 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -449,6 +449,7 @@ extern gimple_opt_pass *make_pass_strength_reduction (gcc::context *ctxt); extern gimple_opt_pass *make_pass_vtable_verify (gcc::context *ctxt); extern gimple_opt_pass *make_pass_ubsan (gcc::context *ctxt); extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt); /* IPA Passes */ extern simple_ipa_opt_pass *make_pass_ipa_lower_emutls (gcc::context *ctxt); diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c index 758b5fc..c29aa22 100644 --- a/gcc/tree-ssa-loop.c +++ b/gcc/tree-ssa-loop.c @@ -157,6 +157,54 @@ make_pass_tree_loop (gcc::context *ctxt) return new pass_tree_loop (ctxt); } +/* Gate for oacc kernels pass group. */ + +static bool +gate_oacc_kernels (function *fn) +{ + if (!flag_openacc) +return false; + + return fn-contains_oacc_kernels; +} + +/* The oacc kernels superpass. 
*/ + +namespace { + +const pass_data pass_data_oacc_kernels = +{ + GIMPLE_PASS, /* type */ + oacc_kernels, /* name */ + OPTGROUP_LOOP, /* optinfo_flags */ + TV_TREE_LOOP, /* tv_id */ + PROP_cfg, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_oacc_kernels : public gimple_opt_pass +{ +public: + pass_oacc_kernels (gcc::context *ctxt) +: gimple_opt_pass (pass_data_oacc_kernels, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *fn) { return gate_oacc_kernels (fn); } + +}; // class pass_oacc_kernels + +} // anon namespace + +gimple_opt_pass * +make_pass_oacc_kernels (gcc::context *ctxt) +{ + return new pass_oacc_kernels (ctxt); +} + /* The no-loop superpass. */ namespace { -- 1.9.1
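The gating scheme this patch introduces can be pictured with a small standalone model. This is only a sketch under stated assumptions, not GCC code: `Function`, `PassGroup` and `run_group` are hypothetical names made up for illustration, and the sub-pass list is reduced to plain strings. Only the idea is taken from the patch: the sub-passes of the group run when the gate accepts the function, and the gate requires both the OpenACC option and the `contains_oacc_kernels` flag.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for GCC's 'function'; it only carries the flag
// that the gate inspects.
struct Function {
  bool contains_oacc_kernels = false;
};

// Hypothetical pass group: a gate plus an ordered list of sub-pass names.
struct PassGroup {
  std::function<bool(const Function &)> gate;
  std::vector<std::string> sub_passes;
};

// Same logic as gate_oacc_kernels in the patch, with the global
// flag_openacc turned into a parameter so the sketch is self-contained.
inline bool gate_oacc_kernels(bool flag_openacc, const Function &fn) {
  if (!flag_openacc)
    return false;
  return fn.contains_oacc_kernels;
}

// Returns the names of the sub-passes that would run for FN.
inline std::vector<std::string> run_group(const PassGroup &group,
                                          const Function &fn) {
  std::vector<std::string> executed;
  if (!group.gate(fn))
    return executed;  // Gate closed: the whole group is skipped.
  for (const std::string &name : group.sub_passes)
    executed.push_back(name);
  return executed;
}
```

In this model a function without a kernels region never reaches the sub-passes, which appears to be why the group can later be filled with additional passes without affecting ordinary code.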
Re: [PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels
On 15-11-14 18:21, Tom de Vries wrote: On 15-11-14 13:14, Tom de Vries wrote:

Hi,

I'm submitting a patch series with initial support for the oacc kernels directive. The patch series uses pass_parallelize_loops to implement parallelization of loops in the oacc kernels region.

The patch series consists of these 8 patches:
...
1 Expand oacc kernels after pass_build_ealias
2 Add pass_oacc_kernels
3 Add pass_ch_oacc_kernels to pass_oacc_kernels
4 Add pass_tree_loop_{init,done} to pass_oacc_kernels
5 Add pass_loop_im to pass_oacc_kernels
6 Add pass_ccp to pass_oacc_kernels
7 Add pass_parloops_oacc_kernels to pass_oacc_kernels
8 Do simple omp lowering for no address taken var
...

This patch adds a pass_ch_oacc_kernels to the pass group pass_oacc_kernels.

The idea is that pass_parallelize_loops only deals with loops for which the header has been copied, so the easiest way to meet that requirement when running pass_parallelize_loops in group pass_oacc_kernels, is to run pass_ch as a part of pass_oacc_kernels.

We define a separate pass pass_ch_oacc_kernels, to leave all loops that aren't part of a kernels region alone.

Updated for moving pass_oacc_kernels down past pass_fre in the pass list.

Bootstrapped and reg-tested as before.

OK for trunk?

Thanks,
- Tom

[PATCH 3/7] Add pass_ch_oacc_kernels to pass_oacc_kernels

2014-11-25 Tom de Vries t...@codesourcery.com

* omp-low.c (loop_in_oacc_kernels_region_p): New function.
* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
* passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels.
* tree-pass.h (make_pass_ch_oacc_kernels): Declare.
* tree-ssa-loop-ch.c: Include omp-low.h.
(pass_ch_execute): Declare.
(pass_ch::execute): Factor out ...
(pass_ch_execute): ... this new function. If handling oacc kernels, skip loops that are not in oacc kernels region.
(pass_ch_oacc_kernels::execute):
(pass_data_ch_oacc_kernels): New pass_data.
(class pass_ch_oacc_kernels): New pass.
(pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New function. --- gcc/omp-low.c | 83 ++ gcc/omp-low.h | 2 ++ gcc/passes.def | 1 + gcc/tree-pass.h| 1 + gcc/tree-ssa-loop-ch.c | 59 +-- 5 files changed, 144 insertions(+), 2 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 3ac546c..543dd48 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -13912,4 +13912,87 @@ gimple_stmt_omp_data_i_init_p (gimple stmt) SSA_OP_DEF); } +/* Return true if LOOP is inside a kernels region. */ + +bool +loop_in_oacc_kernels_region_p (struct loop *loop, basic_block *region_entry, + basic_block *region_exit) +{ + bitmap excludes_bitmap = BITMAP_GGC_ALLOC (); + bitmap region_bitmap = BITMAP_GGC_ALLOC (); + bitmap_clear (region_bitmap); + + if (region_entry != NULL) +*region_entry = NULL; + if (region_exit != NULL) +*region_exit = NULL; + + basic_block bb; + gimple last; + FOR_EACH_BB_FN (bb, cfun) +{ + if (bitmap_bit_p (region_bitmap, bb-index)) + continue; + + last = last_stmt (bb); + if (!last) + continue; + + if (gimple_code (last) != GIMPLE_OACC_KERNELS) + continue; + + bitmap_clear (excludes_bitmap); + bitmap_set_bit (excludes_bitmap, bb-index); + + vecbasic_block dominated + = get_all_dominated_blocks (CDI_DOMINATORS, bb); + + unsigned di; + basic_block dom; + + basic_block end_region = NULL; + FOR_EACH_VEC_ELT (dominated, di, dom) + { + if (dom == bb) + continue; + + last = last_stmt (dom); + if (!last) + continue; + + if (gimple_code (last) != GIMPLE_OMP_RETURN) + continue; + + if (end_region == NULL + || dominated_by_p (CDI_DOMINATORS, end_region, dom)) + end_region = dom; + } + + vecbasic_block excludes + = get_all_dominated_blocks (CDI_DOMINATORS, end_region); + + unsigned di2; + basic_block exclude; + + FOR_EACH_VEC_ELT (excludes, di2, exclude) + if (exclude != end_region) + bitmap_set_bit (excludes_bitmap, exclude-index); + + FOR_EACH_VEC_ELT (dominated, di, dom) + if (!bitmap_bit_p (excludes_bitmap, dom-index)) + bitmap_set_bit (region_bitmap, dom-index); 
+ + if (bitmap_bit_p (region_bitmap, loop-header-index)) + { + if (region_entry != NULL) + *region_entry = bb; + if (region_exit != NULL) + *region_exit = end_region; + return true; + } +} + + return false; +} + #include gt-omp-low.h diff --git a/gcc/omp-low.h b/gcc/omp-low.h index 32076e4..30df867 100644 --- a/gcc/omp-low.h +++ b/gcc/omp-low.h @@ -29,6 +29,8 @@ extern tree omp_reduction_init (tree, tree); extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *); extern void omp_finish_file (void); extern bool gimple_stmt_omp_data_i_init_p (gimple); +extern bool loop_in_oacc_kernels_region_p (struct loop *, basic_block *, +
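The region computation in loop_in_oacc_kernels_region_p can be restated on a toy dominator tree. The following is a sketch under assumptions, not the patch's code: the `DomTree` representation and the names `collect_dominated` and `region_blocks` are hypothetical, and blocks are plain integers. It mirrors the bitmap logic above: take everything the entry block dominates, then exclude the entry block itself and everything strictly dominated by the block that ends the region.

```cpp
#include <set>
#include <vector>

// Hypothetical dominator tree given as child lists: children[b] holds the
// blocks whose immediate dominator is b.
using DomTree = std::vector<std::vector<int>>;

// Collects B and every block it dominates (its subtree in the tree).
inline void collect_dominated(const DomTree &children, int b,
                              std::set<int> &out) {
  out.insert(b);
  for (int child : children[b])
    collect_dominated(children, child, out);
}

// Blocks of the region opened at ENTRY and closed at EXIT_BLOCK: everything
// ENTRY dominates, minus ENTRY itself and minus what lies strictly below
// EXIT_BLOCK in the dominator tree.
inline std::set<int> region_blocks(const DomTree &children, int entry,
                                   int exit_block) {
  std::set<int> dominated, excluded, region;
  collect_dominated(children, entry, dominated);
  collect_dominated(children, exit_block, excluded);
  excluded.erase(exit_block);  // The exit block itself stays in the region.
  excluded.insert(entry);      // The entry block is not part of the region.
  for (int b : dominated)
    if (excluded.count(b) == 0)
      region.insert(b);
  return region;
}
```

A loop would then count as inside the region when its header block is a member of the returned set, which corresponds to the final bitmap_bit_p test on the loop header in the patch.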
Re: [PATCH, 4/8] Add pass_tree_loop_{init,done} to pass_oacc_kernels
On 15-11-14 18:21, Tom de Vries wrote: On 15-11-14 13:14, Tom de Vries wrote:

Hi,

I'm submitting a patch series with initial support for the oacc kernels directive. The patch series uses pass_parallelize_loops to implement parallelization of loops in the oacc kernels region.

The patch series consists of these 8 patches:
...
1 Expand oacc kernels after pass_build_ealias
2 Add pass_oacc_kernels
3 Add pass_ch_oacc_kernels to pass_oacc_kernels
4 Add pass_tree_loop_{init,done} to pass_oacc_kernels
5 Add pass_loop_im to pass_oacc_kernels
6 Add pass_ccp to pass_oacc_kernels
7 Add pass_parloops_oacc_kernels to pass_oacc_kernels
8 Do simple omp lowering for no address taken var
...

This patch adds pass_tree_loop_init and pass_tree_loop_done to pass_oacc_kernels.

Pass_parallelize_loops is run between these passes in the pass group pass_tree_loop, since it requires loop information. We do the same for pass_oacc_kernels.

Updated for moving pass_oacc_kernels down past pass_fre in the pass list.

Bootstrapped and reg-tested as before.

OK for trunk?

Thanks,
- Tom

[PATCH 4/7] Add pass_tree_loop_{init,done} to pass_oacc_kernels

2014-11-25 Tom de Vries t...@codesourcery.com

* passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass group pass_oacc_kernels.
* tree-ssa-loop.c (pass_tree_loop_init::clone)
(pass_tree_loop_done::clone): New function.

---
 gcc/passes.def      | 2 ++
 gcc/tree-ssa-loop.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/gcc/passes.def b/gcc/passes.def
index 01368bb..37e08a8 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -91,7 +91,9 @@ along with GCC; see the file COPYING3. If not see
       NEXT_PASS (pass_oacc_kernels);
       PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
           NEXT_PASS (pass_ch_oacc_kernels);
+          NEXT_PASS (pass_tree_loop_init);
           NEXT_PASS (pass_expand_omp_ssa);
+          NEXT_PASS (pass_tree_loop_done);
       POP_INSERT_PASSES ()
       NEXT_PASS (pass_merge_phi);
       NEXT_PASS (pass_cd_dce);
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index c29aa22..c78b013 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -269,6 +269,7 @@ public:
 
   /* opt_pass methods: */
   virtual unsigned int execute (function *);
+  opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); }
 
 }; // class pass_tree_loop_init
 
@@ -563,6 +564,7 @@ public:
 
   /* opt_pass methods: */
   virtual unsigned int execute (function *) { return tree_ssa_loop_done (); }
+  opt_pass * clone () { return new pass_tree_loop_done (m_ctxt); }
 
 }; // class pass_tree_loop_done
 
-- 
1.9.1
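The two clone () methods are presumably needed because both passes now appear at a second place in the pass list. The following standalone sketch illustrates that idea; it is a simplified model, not GCC's pass manager. The classes `Pass`, `LoopInit` and `NoClone` are hypothetical, and the assumption that a pass without a clone override cannot be scheduled twice is mine, not something stated in the patch.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical miniature of a pass base class. By default a pass refuses
// to be duplicated, so it can only be scheduled once.
class Pass {
public:
  explicit Pass(std::string name) : name_(std::move(name)) {}
  virtual ~Pass() = default;

  // Passes that may be listed more than once override this.
  virtual std::unique_ptr<Pass> clone() const {
    throw std::logic_error("pass " + name_ + " does not support cloning");
  }

  const std::string &name() const { return name_; }

private:
  std::string name_;
};

// A pass that opts in to being scheduled several times.
class LoopInit : public Pass {
public:
  LoopInit() : Pass("loopinit") {}
  std::unique_ptr<Pass> clone() const override {
    return std::unique_ptr<Pass>(new LoopInit());
  }
};

// A pass that keeps the default and therefore cannot be duplicated.
class NoClone : public Pass {
public:
  NoClone() : Pass("noclone") {}
};
```

Each call to clone() on a `LoopInit` yields an independent instance, so per-instance state is not shared between the two positions in the pipeline.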
Re: [PATCH, 5/8] Add pass_loop_im to pass_oacc_kernels
On 15-11-14 18:22, Tom de Vries wrote: On 15-11-14 13:14, Tom de Vries wrote: Hi, I'm submitting a patch series with initial support for the oacc kernels directive. The patch series uses pass_parallelize_loops to implement parallelization of loops in the oacc kernels region. The patch series consists of these 8 patches: ... 1 Expand oacc kernels after pass_build_ealias 2 Add pass_oacc_kernels 3 Add pass_ch_oacc_kernels to pass_oacc_kernels 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels 5 Add pass_loop_im to pass_oacc_kernels 6 Add pass_ccp to pass_oacc_kernels 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels 8 Do simple omp lowering for no address taken var ... This patch adds pass_loop_im to pass group pass_oacc_kernels. We need this pass to simplify the loop body, and allow pass_parloops to detect that loop iterations are independent. Updated for moving pass_oacc_kernels down past pass_fre in the pass list. Bootstrapped and reg-tested as before. OK for trunk? Thanks, - Tom [PATCH 5/7] Add pass_loop_im to pass_oacc_kernels 2014-11-25 Tom de Vries t...@codesourcery.com * passes.def: Add pass_lim in pass group pass_ch_oacc_kernels. * c-c++-common/restrict-2.c: Update for new pass_lim. * c-c++-common/restrict-4.c: Same. * g++.dg/tree-ssa/pr33615.C: Same. * g++.dg/tree-ssa/restrict1.C: Same. * gcc.dg/tm/pub-safety-1.c: Same. * gcc.dg/tm/reg-promotion.c: Same. * gcc.dg/tree-ssa/20050314-1.c: Same. * gcc.dg/tree-ssa/loop-32.c: Same. * gcc.dg/tree-ssa/loop-33.c: Same. * gcc.dg/tree-ssa/loop-34.c: Same. * gcc.dg/tree-ssa/loop-35.c: Same. * gcc.dg/tree-ssa/loop-7.c: Same. * gcc.dg/tree-ssa/pr23109.c: Same. * gcc.dg/tree-ssa/restrict-3.c: Same. * gcc.dg/tree-ssa/ssa-lim-1.c: Same. * gcc.dg/tree-ssa/ssa-lim-10.c: Same. * gcc.dg/tree-ssa/ssa-lim-11.c: Same. * gcc.dg/tree-ssa/ssa-lim-12.c: Same. * gcc.dg/tree-ssa/ssa-lim-2.c: Same. * gcc.dg/tree-ssa/ssa-lim-3.c: Same. * gcc.dg/tree-ssa/ssa-lim-6.c: Same. * gcc.dg/tree-ssa/ssa-lim-7.c: Same. 
* gcc.dg/tree-ssa/ssa-lim-8.c: Same. * gcc.dg/tree-ssa/ssa-lim-9.c: Same. * gcc.dg/tree-ssa/structopt-1.c: Same. * gfortran.dg/pr32921.f: Same. --- gcc/passes.def | 1 + gcc/testsuite/c-c++-common/restrict-2.c | 6 +++--- gcc/testsuite/c-c++-common/restrict-4.c | 6 +++--- gcc/testsuite/g++.dg/tree-ssa/pr33615.C | 6 +++--- gcc/testsuite/g++.dg/tree-ssa/restrict1.C | 6 +++--- gcc/testsuite/gcc.dg/tm/pub-safety-1.c | 6 +++--- gcc/testsuite/gcc.dg/tm/reg-promotion.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/loop-32.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/loop-33.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/loop-34.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/loop-35.c | 8 gcc/testsuite/gcc.dg/tree-ssa/loop-7.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/pr23109.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c | 8 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c | 6 +++--- gcc/testsuite/gfortran.dg/pr32921.f | 6 +++--- 27 files changed, 81 insertions(+), 80 deletions(-) diff --git a/gcc/passes.def b/gcc/passes.def index 37e08a8..438d292 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -92,6 +92,7 @@ along with GCC; see the file COPYING3. 
If not see PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels) NEXT_PASS (pass_ch_oacc_kernels); NEXT_PASS (pass_tree_loop_init); + NEXT_PASS (pass_lim); NEXT_PASS (pass_expand_omp_ssa); NEXT_PASS (pass_tree_loop_done); POP_INSERT_PASSES () diff --git a/gcc/testsuite/c-c++-common/restrict-2.c b/gcc/testsuite/c-c++-common/restrict-2.c index 3f71b77..f0b0e15a 100644 --- a/gcc/testsuite/c-c++-common/restrict-2.c +++ b/gcc/testsuite/c-c++-common/restrict-2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -O -fno-strict-aliasing -fdump-tree-lim1-details } */ +/* { dg-options -O -fno-strict-aliasing -fdump-tree-lim2-details } */ void foo (float * __restrict__ a, float * __restrict__ b, int n, int j) { @@ -10,5 +10,5 @@ void foo (float * __restrict__ a, float * __restrict__ b, int n, int j) /* We should move the RHS of the store out of the loop. */ -/* { dg-final { scan-tree-dump-times Moving statement 11 lim1 } } */ -/* { dg-final { cleanup-tree-dump lim1 } } */
Re: [PATCH, 6/8] Add pass_ccp to pass_oacc_kernels
On 15-11-14 18:22, Tom de Vries wrote: On 15-11-14 13:14, Tom de Vries wrote: Hi, I'm submitting a patch series with initial support for the oacc kernels directive. The patch series uses pass_parallelize_loops to implement parallelization of loops in the oacc kernels region. The patch series consists of these 8 patches: ... 1 Expand oacc kernels after pass_build_ealias 2 Add pass_oacc_kernels 3 Add pass_ch_oacc_kernels to pass_oacc_kernels 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels 5 Add pass_loop_im to pass_oacc_kernels 6 Add pass_ccp to pass_oacc_kernels 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels 8 Do simple omp lowering for no address taken var ... This patch adds pass_loop_ccp to pass group pass_oacc_kernels. We need this pass to simplify the loop body, and allow pass_parloops to detect that loop iterations are independent. As suggested here ( https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02993.html ) I've replaced the pass_ccp with pass_copyprop, which performs trivial constant propagation in addition to copy propagation. Bootstrapped and reg-tested as before. OK for trunk? Thanks, - Tom [PATCH 6/7] Add pass_copy_prop in pass_oacc_kernels 2014-11-25 Tom de Vries t...@codesourcery.com * passes.def: Add pass_copy_prop to pass group pass_oacc_kernels. * tree-ssa-copy.c (stmt_may_generate_copy): Handle .omp_data_i init conservatively. --- gcc/passes.def | 1 + gcc/tree-ssa-copy.c | 4 2 files changed, 5 insertions(+) diff --git a/gcc/passes.def b/gcc/passes.def index 438d292..fb0d331 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -93,6 +93,7 @@ along with GCC; see the file COPYING3. 
If not see NEXT_PASS (pass_ch_oacc_kernels); NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); + NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_expand_omp_ssa); NEXT_PASS (pass_tree_loop_done); POP_INSERT_PASSES () diff --git a/gcc/tree-ssa-copy.c b/gcc/tree-ssa-copy.c index 7c22c5e..d6eb7a7 100644 --- a/gcc/tree-ssa-copy.c +++ b/gcc/tree-ssa-copy.c @@ -55,6 +55,7 @@ along with GCC; see the file COPYING3. If not see #include tree-scalar-evolution.h #include tree-ssa-dom.h #include tree-ssa-loop-niter.h +#include omp-low.h /* This file implements the copy propagation pass and provides a @@ -110,6 +111,9 @@ stmt_may_generate_copy (gimple stmt) if (gimple_has_volatile_ops (stmt)) return false; + if (gimple_stmt_omp_data_i_init_p (stmt)) +return false; + /* Statements with loads and/or stores will never generate a useful copy. */ if (gimple_vuse (stmt)) return false; -- 1.9.1
Re: [PATCH, 7/8] Add pass_parloops_oacc_kernels to pass_oacc_kernels
On 15-11-14 18:23, Tom de Vries wrote: On 15-11-14 13:14, Tom de Vries wrote: Hi, I'm submitting a patch series with initial support for the oacc kernels directive. The patch series uses pass_parallelize_loops to implement parallelization of loops in the oacc kernels region. The patch series consists of these 8 patches: ... 1 Expand oacc kernels after pass_build_ealias 2 Add pass_oacc_kernels 3 Add pass_ch_oacc_kernels to pass_oacc_kernels 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels 5 Add pass_loop_im to pass_oacc_kernels 6 Add pass_ccp to pass_oacc_kernels 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels 8 Do simple omp lowering for no address taken var ... This patch adds: - a specialized version of pass_parallelize_loops called pass_parloops_oacc_kernels to pass group pass_oacc_kernels, and - relevant test-cases. The pass only handles loops that are in a kernels region, and skips over bits of pass_parallelize_loops that are already done for oacc kernels. The pass reintroduces the use of omp_expand_local, I haven't managed to make it work yet using the external pass pass_expand_omp_ssa. An obvious limitation of the patch is the fact that we copy over the clauses from the kernels directive to the generated parallel directive. We'll need to do something more intelligent here, f.i. setting vector_length based on the parallelization factor. Another limitation is that the pass still needs -ftree-parallelize-loops to trigger. Updated for using pass_copyprop instead of pass_ccp in pass_oacc_kernels. Bootstrapped and reg-tested as before. OK for trunk? Thanks, - Tom [PATCH 7/7] Add pass_parloops_oacc_kernels to pass_oacc_kernels 2014-11-25 Tom de Vries t...@codesourcery.com * passes.def: Add pass_parallelize_loops_oacc_kernels in pass group pass_oacc_kernels. Move pass_expand_omp_ssa into pass group pass_oacc_kernels. * tree-parloops.c (create_parallel_loop): Add function parameters region_entry and bool oacc_kernels_p. Handle oacc_kernels_p. 
(gen_parallel_loop): Same. Use omp_expand_local if oacc_kernels_p. Call create_parallel_loop with additional args. (parallelize_loops): Add function parameter oacc_kernels_p. Calculate dominance info. Skip loops that are not in a kernels region. Call gen_parallel_loop with additional args. (pass_parallelize_loops::execute): Call parallelize_loops with false argument. (pass_data_parallelize_loops_oacc_kernels): New pass_data. (class pass_parallelize_loops_oacc_kernels): New pass. (pass_parallelize_loops_oacc_kernels::execute) (make_pass_parallelize_loops_oacc_kernels): New function. * tree-pass.h (make_pass_parallelize_loops_oacc_kernels): Declare. * testsuite/libgomp.oacc-c/oacc-kernels-2-run.c: New test. * testsuite/libgomp.oacc-c/oacc-kernels-run.c: New test. * gcc.dg/oacc-kernels-2.c: New test. * gcc.dg/oacc-kernels.c: New test. --- gcc/passes.def | 1 + gcc/testsuite/gcc.dg/oacc-kernels-2.c | 79 +++ gcc/testsuite/gcc.dg/oacc-kernels.c| 71 ++ gcc/tree-parloops.c| 242 - gcc/tree-pass.h| 2 + .../testsuite/libgomp.oacc-c/oacc-kernels-2-run.c | 65 ++ .../testsuite/libgomp.oacc-c/oacc-kernels-run.c| 59 + 7 files changed, 464 insertions(+), 55 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/oacc-kernels-2.c create mode 100644 gcc/testsuite/gcc.dg/oacc-kernels.c create mode 100644 libgomp/testsuite/libgomp.oacc-c/oacc-kernels-2-run.c create mode 100644 libgomp/testsuite/libgomp.oacc-c/oacc-kernels-run.c diff --git a/gcc/passes.def b/gcc/passes.def index fb0d331..d91283b 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -94,6 +94,7 @@ along with GCC; see the file COPYING3. 
If not see NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); + NEXT_PASS (pass_parallelize_loops_oacc_kernels); NEXT_PASS (pass_expand_omp_ssa); NEXT_PASS (pass_tree_loop_done); POP_INSERT_PASSES () diff --git a/gcc/testsuite/gcc.dg/oacc-kernels-2.c b/gcc/testsuite/gcc.dg/oacc-kernels-2.c new file mode 100644 index 000..1ff4bad --- /dev/null +++ b/gcc/testsuite/gcc.dg/oacc-kernels-2.c @@ -0,0 +1,79 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target fopenacc } */ +/* { dg-options -fopenacc -ftree-parallelize-loops=32 -O2 -std=c99 -fdump-tree-parloops_oacc_kernels-all -fdump-tree-copyrename } */ + +#include stdlib.h +#include stdio.h + +#define N (1024 * 512) +#define N_REF 4293394432 + +#if 1 +#define COUNTERTYPE unsigned int +#else +#define COUNTERTYPE int +#endif + +int +main (void) +{ + unsigned int i; + + unsigned int *__restrict a; + unsigned int *__restrict b; + unsigned int *__restrict c; + + a = malloc (N * sizeof (unsigned int)); + b = malloc (N * sizeof (unsigned int)); + c =
[C++ Patch] PR 63786
Hi,

we are crashing on this kind of invalid code because we don't early check the case with check_for_bare_parameter_packs.

Tested x86_64-linux.

Thanks,
Paolo.

/cp
2014-11-25 Paolo Carlini paolo.carl...@oracle.com

PR c++/63786
* parser.c (cp_parser_label_for_labeled_statement): Check the case with check_for_bare_parameter_packs.

/testsuite
2014-11-25 Paolo Carlini paolo.carl...@oracle.com

PR c++/63786
* g++.dg/cpp0x/variadic163.C: New.

Index: cp/parser.c
===================================================================
--- cp/parser.c (revision 218039)
+++ cp/parser.c (working copy)
@@ -9820,6 +9820,8 @@ cp_parser_label_for_labeled_statement (cp_parser*
         cp_lexer_consume_token (parser->lexer);
         /* Parse the constant-expression.  */
         expr = cp_parser_constant_expression (parser);
+        if (check_for_bare_parameter_packs (expr))
+          expr = error_mark_node;
 
         ellipsis = cp_lexer_peek_token (parser->lexer);
         if (ellipsis->type == CPP_ELLIPSIS)
@@ -9826,8 +9828,9 @@ cp_parser_label_for_labeled_statement (cp_parser*
           {
             /* Consume the `...' token.  */
             cp_lexer_consume_token (parser->lexer);
-            expr_hi =
-              cp_parser_constant_expression (parser);
+            expr_hi = cp_parser_constant_expression (parser);
+            if (check_for_bare_parameter_packs (expr_hi))
+              expr_hi = error_mark_node;
 
             /* We don't need to emit warnings here, as the common code
                will do this for us.  */
Index: testsuite/g++.dg/cpp0x/variadic163.C
===================================================================
--- testsuite/g++.dg/cpp0x/variadic163.C (revision 0)
+++ testsuite/g++.dg/cpp0x/variadic163.C (working copy)
@@ -0,0 +1,21 @@
+// PR c++/63786
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+template <int... Is>
+int f(int i) {
+    switch (i) {
+        case Is:       // { dg-error "not expanded" }
+            return 0;
+    }
+
+    switch (i) {
+        case 0 ...Is:  // { dg-error "not expanded" }
+            return 0;
+    }
+    return 0;
+}
+
+int main() {
+    f<1,2,3>(1);
+}
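For contrast with the rejected test case, here is a use of a parameter pack in a context where a pack expansion is permitted. This is a minimal sketch for illustration only; the function name `matches_any` is made up and is not part of the patch or the testsuite.

```cpp
#include <initializer_list>

// Returns true if I equals one of the template arguments Is.
// 'Is...' is a pack expansion inside a braced-init-list, which is a
// context where expansion is allowed; a bare 'case Is:' is not, and is
// what the patch diagnoses.
template <int... Is>
bool matches_any(int i) {
  const std::initializer_list<int> values = {Is...};
  for (int v : values)
    if (v == i)
      return true;
  return false;
}
```

A call such as `matches_any<1, 2, 3>(1)` compiles under C++11 because the pack is expanded before it is used.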
Re: [PATCH, PR63995, CHKP] Use single static bounds var for varpool nodes sharing asm name
2014-11-25 14:11 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Tue, Nov 25, 2014 at 11:19 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-11-25 12:43 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Tue, Nov 25, 2014 at 9:45 AM, Ilya Enkovich enkovich@gmail.com wrote: Hi, This patch partly fixes PR bootstrap/63995 by avoiding duplicating static bounds vars. With this fix bootstrap still fails at stage 2 and 3 comparison. Bootstrapped and checked on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Ilya -- gcc/ 2014-11-25 Ilya Enkovich ilya.enkov...@intel.com PR bootstrap/63995 * tree-chkp (chkp_make_static_bounds): Share bounds var between nodes sharing assembler name. gcc/testsuite 2014-11-25 Ilya Enkovich ilya.enkov...@intel.com PR bootstrap/63995 * g++.dg/dg.exp: Add mpx-dg.exp. * g++.dg/pr63995-1.C: New. diff --git a/gcc/testsuite/g++.dg/dg.exp b/gcc/testsuite/g++.dg/dg.exp index 14beae1..44eab0c 100644 --- a/gcc/testsuite/g++.dg/dg.exp +++ b/gcc/testsuite/g++.dg/dg.exp @@ -18,6 +18,7 @@ # Load support procs. load_lib g++-dg.exp +load_lib mpx-dg.exp # If a testcase doesn't have special options, use these. global DEFAULT_CXXFLAGS diff --git a/gcc/testsuite/g++.dg/pr63995-1.C b/gcc/testsuite/g++.dg/pr63995-1.C new file mode 100644 index 000..82e7606 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr63995-1.C @@ -0,0 +1,16 @@ +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */ +/* { dg-require-effective-target mpx } */ +/* { dg-options -O2 -g -fcheck-pointer-bounds -mmpx } */ + +int test1 (int i) +{ + extern const int arr[10]; + return arr[i]; +} + +extern const int arr[10]; + +int test2 (int i) +{ + return arr[i]; +} diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c index 3e38691..d425084 100644 --- a/gcc/tree-chkp.c +++ b/gcc/tree-chkp.c @@ -2727,9 +2727,29 @@ chkp_make_static_bounds (tree obj) /* First check if we already have required var. 
*/ if (chkp_static_var_bounds) { - slot = chkp_static_var_bounds-get (obj); - if (slot) - return *slot; + /* If there is a symbol sharing assembler name with obj, +we may use its bounds. */ + if (TREE_CODE (obj) == VAR_DECL) + { + varpool_node *node = varpool_node::get_create (obj); + + while (node-previous_sharing_asm_name) + node = (varpool_node *)node-previous_sharing_asm_name; + + while (node) + { + slot = chkp_static_var_bounds-get (node-decl); + if (slot) + return *slot; + node = (varpool_node *)node-next_sharing_asm_name; + } Hum. varpool_node::get returns the ultimate alias target thus the walking shouldn't be necessary. Just node = varpool_node::get_create (obj); slot = chkp_static_var_bounds-get (node-decl); if (slot) return *slot; and then making sure to set the decl also for node-decl. I suppose it really asks for making chkp_static_var_bounds-get based on a varpool node and not a decl so you consistently use the ultimate alias target. varpool_node::get just returns symtab_node::get which returns decl-decl_with_vis.symtab_node - thus no aliases walkthrough. Also none of two varpool_nodes is an alias. The only connection between these nodes seems to be {next,previous}_sharing_asm_name. Here is how these nodes look: Ok, then it's get_for_asmname (). That said - the above loops look bogus to me. Honza - any better ideas? get_for_asmname () returns the first element in a chain of nodes with the same asm name. May I rely on the order of nodes in this chain? Probably use ASSEMBLER_NAME as a key in chkp_static_var_bounds hash? Thanks, Ilya Richard. 
(gdb) p *$2 $3 = {symtab_node = {type = SYMTAB_VARIABLE, resolution = LDPR_UNKNOWN, definition = 0, alias = 0, weakref = 0, cpp_implicit_alias = 0, analyzed = 0, writeonly = 0, refuse_visibility_changes = 0, externally_visible = 0, no_reorder = 0, force_output = 0, forced_by_abi = 0, unique_name = 0, implicit_section = 0, body_removed = 1, used_from_other_partition = 0, in_other_partition = 0, address_taken = 0, in_init_priority_hash = 0, need_lto_streaming = 0, offloadable = 0, order = 3, decl = 0x77dd2cf0, next = 0x77f46200, previous = 0x77dd67a8, next_sharing_asm_name = 0x0, previous_sharing_asm_name = 0x77f46200, same_comdat_group = 0x0, ref_list = {references = 0x0, referring = {m_vec = 0x23856b0}}, alias_target = 0x0, lto_file_data = 0x0, aux = 0x0, x_comdat_group = 0x0, x_section = 0x0}, output = 0, need_bounds_init = 0, dynamically_initialized = 0, tls_model = TLS_MODEL_NONE, used_by_single_function = 0} (gdb) p *$5 $6 = {symtab_node = {type = SYMTAB_VARIABLE, resolution = LDPR_UNKNOWN, definition = 0, alias = 0, weakref = 0, cpp_implicit_alias = 0, analyzed = 0, writeonly = 0,
[PATCH] Remove unnecessary calls to strchr.
Hi,

As proposed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63853
this patch replaces some function calls with pointer arithmetic.
I didn't mention PR in Changelog, as they are not actually related.
Ok for trunk?

gcc/
	* gcc.c (handle_foffload_option): Remove unnecessary calls to strchr,
	strlen, strncpy.
	* lto-wrapper.c (append_offload_options): Likewise.

---
 gcc/gcc.c         | 24 +---
 gcc/lto-wrapper.c |  2 +-
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 653ca8d..4731eec 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -3384,11 +3384,11 @@ handle_foffload_option (const char *arg)
     {
       next = strchr (cur, ',');
       if (next == NULL)
-        next = strchr (cur, '\0');
+        next = end;
       next = (next > end) ? end : next;
 
       target = XNEWVEC (char, next - cur + 1);
-      strncpy (target, cur, next - cur);
+      memcpy (target, cur, next - cur);
       target[next - cur] = '\0';
 
       /* If 'disable' is passed to the option, stop parsing the option and clean
@@ -3408,8 +3408,7 @@ handle_foffload_option (const char *arg)
           if (n == NULL)
             n = strchr (c, '\0');
 
-          if (strlen (target) == (size_t) (n - c)
-              && strncmp (target, c, n - c) == 0)
+          if (next - cur == n - c && strncmp (target, c, n - c) == 0)
             break;
 
           c = *n ? n + 1 : NULL;
@@ -3420,7 +3419,10 @@ handle_foffload_option (const char *arg)
                      target);
 
       if (!offload_targets)
-        offload_targets = xstrdup (target);
+        {
+          offload_targets = target;
+          target = NULL;
+        }
       else
         {
           /* Check that the target hasn't already presented in the list.  */
@@ -3431,8 +3433,7 @@ handle_foffload_option (const char *arg)
               if (n == NULL)
                 n = strchr (c, '\0');
 
-              if (strlen (target) == (size_t) (n - c)
-                  && strncmp (c, target, n - c) == 0)
+              if (next - cur == n - c && strncmp (c, target, n - c) == 0)
                 break;
Re: [PATCH] Remove unnecessary calls to strchr.
On Tue, Nov 25, 2014 at 03:15:04PM +0300, Ilya Tocar wrote:
> As proposed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63853
> this patch replaces some function calls with pointer arithmetic.
> I didn't mention PR in Changelog, as they are not actually related.
> Ok for trunk?

> @@ -3408,8 +3408,7 @@ handle_foffload_option (const char *arg)
>            if (n == NULL)
>              n = strchr (c, '\0');
>  
> -          if (strlen (target) == (size_t) (n - c)
> -              && strncmp (target, c, n - c) == 0)
> +          if (next - cur == n - c && strncmp (target, c, n - c) == 0)

I suppose you could use memcmp here, you know the string lengths.

> @@ -3431,8 +3433,7 @@ handle_foffload_option (const char *arg)
>                if (n == NULL)
>                  n = strchr (c, '\0');
>  
> -              if (strlen (target) == (size_t) (n - c)
> -                  && strncmp (c, target, n - c) == 0)
> +              if (next - cur == n - c && strncmp (c, target, n - c) == 0)
>                  break;

And here too. Ok with or without those changes.

	Jakub
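For readers following the memcmp suggestion, here is a minimal sketch of the idea, assuming a comma-separated, NUL-terminated list like the one handle_foffload_option walks. The helper name list_contains and the sample strings are hypothetical and are not taken from gcc.c; the point is only that once both lengths are known and equal, memcmp is enough.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of gcc.c): returns true if the token
// [tok, tok + tok_len) appears as a complete entry in the
// comma-separated, NUL-terminated string LIST.
bool list_contains (const char *list, const char *tok, std::size_t tok_len)
{
  const char *c = list;
  while (c != nullptr)
    {
      const char *n = std::strchr (c, ',');
      if (n == nullptr)
        n = c + std::strlen (c);   // last entry: stop at the terminating NUL

      // The lengths are compared first, so memcmp only ever reads
      // tok_len bytes that are known to exist in both strings.
      if (static_cast<std::size_t> (n - c) == tok_len
          && std::memcmp (c, tok, tok_len) == 0)
        return true;

      c = (*n != '\0') ? n + 1 : nullptr;
    }
  return false;
}

// Small self-check for the sketch above.
void check_list_contains ()
{
  assert (list_contains ("aa,bbb,c", "bbb", 3));
  assert (!list_contains ("aa,bbb", "bb", 2));   // a prefix is not a match
}
```

The length test short-circuits the comparison, which is why a prefix such as "bb" does not match the entry "bbb".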
[PATCH] sreal class fix for PR64050 and PR64060
Hello.

The following patch fixes the sreal problems mentioned in PR64050 and
PR64060. I added a new GCC plugin test that exercises sreal arithmetic and
number comparison.

The patch bootstraps on ppc64-linux-pc and x86_64-linux-pc and passes
regression tests.

Thanks,
Martin

gcc/ChangeLog:

2014-11-25  Martin Liska  Martin li...@suse.cz

	PR bootstrap/64050
	PR ipa/64060
	* sreal.c (sreal::operator+): Addition fixed.
	(sreal::signedless_plus): Negative numbers are handled correctly.
	(sreal::operator-): Subtraction is fixed.
	(sreal::signedless_minus): Negative numbers are handled correctly.
	* sreal.h (sreal::operator<): Equal negative numbers are compared
	correctly.
	(sreal::shift): New checking asserts are introduced. Operation is
	fixed.

gcc/testsuite/ChangeLog:

2014-11-25  Martin Liska  Martin li...@suse.cz

	PR bootstrap/64050
	PR ipa/64060
	* gcc.dg/plugin/plugin.exp: New plugin.
	* gcc.dg/plugin/sreal-test-1.c: New test.
	* gcc.dg/plugin/sreal_plugin.c: New test.

diff --git a/gcc/sreal.c b/gcc/sreal.c
index 0337f9e..2b5e3ae 100644
--- a/gcc/sreal.c
+++ b/gcc/sreal.c
@@ -182,9 +182,9 @@ sreal::operator+ (const sreal &other) const
     {
       sreal tmp = -(*b_p);
       if (*a_p < tmp)
-        return signedless_minus (tmp, *a_p, false);
+        return signedless_minus (tmp, *a_p, true);
       else
-        return signedless_minus (*a_p, tmp, true);
+        return signedless_minus (*a_p, tmp, false);
     }
 
   gcc_checking_assert (a_p->m_negative == b_p->m_negative);
@@ -203,7 +203,7 @@ sreal::signedless_plus (const sreal &a, const sreal &b, bool negative)
   const sreal *a_p = &a;
   const sreal *b_p = &b;
 
-  if (*a_p < *b_p)
+  if (a_p->m_exp < b_p->m_exp)
     std::swap (a_p, b_p);
 
   dexp = a_p->m_exp - b_p->m_exp;
@@ -211,6 +211,7 @@ sreal::signedless_plus (const sreal &a, const sreal &b, bool negative)
   if (dexp > SREAL_BITS)
     {
       r.m_sig = a_p->m_sig;
+      r.m_negative = negative;
       return r;
     }
 
@@ -248,11 +249,11 @@ sreal::operator- (const sreal &other) const
   /* We want to substract a smaller number from bigger
      for nonegative numbers.  */
   if (!m_negative && *this < other)
-    return -signedless_minus (other, *this, true);
+    return signedless_minus (other, *this, true);
 
   /* Example: -2 - (-3) = 3 - 2 */
   if (m_negative && *this > other)
-    return signedless_minus (-other, -(*this), true);
+    return signedless_minus (-other, -(*this), false);
 
   sreal r = signedless_minus (*this, other, m_negative);
 
@@ -274,6 +275,7 @@ sreal::signedless_minus (const sreal &a, const sreal &b, bool negative)
   if (dexp > SREAL_BITS)
     {
       r.m_sig = a_p->m_sig;
+      r.m_negative = negative;
       return r;
     }
   if (dexp == 0)
diff --git a/gcc/sreal.h b/gcc/sreal.h
index 1362bf6..3938c6e 100644
--- a/gcc/sreal.h
+++ b/gcc/sreal.h
@@ -60,6 +60,11 @@ public:
 
   bool operator< (const sreal &other) const
   {
+    /* We negate result in case of negative numbers and
+       it would return true for equal negative numbers.  */
+    if (*this == other)
+      return false;
+
     if (m_negative != other.m_negative)
       return m_negative > other.m_negative;
 
@@ -86,10 +91,19 @@ public:
     return tmp;
   }
 
-  sreal shift (int sig) const
+  sreal shift (int s) const
   {
+    gcc_checking_assert (s <= SREAL_BITS);
+    gcc_checking_assert (s >= -SREAL_BITS);
+
+    /* Exponent should never be so large because shift_right is used only by
+       sreal_add and sreal_sub ant thus the number cannot be shifted out from
+       exponent range.  */
+    gcc_checking_assert (m_exp + s <= SREAL_MAX_EXP);
+    gcc_checking_assert (m_exp + s >= -SREAL_MAX_EXP);
+
     sreal tmp = *this;
-    tmp.m_sig += sig;
+    tmp.m_exp += s;
 
     return tmp;
   }
diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index e4b5f54..c12b3da 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -59,6 +59,7 @@ set plugin_test_list [list \
     { selfassign.c self-assign-test-1.c self-assign-test-2.c } \
     { ggcplug.c ggcplug-test-1.c } \
     { one_time_plugin.c one_time-test-1.c } \
+    { sreal_plugin.c sreal-test-1.c } \
     { start_unit_plugin.c start_unit-test-1.c } \
     { finish_unit_plugin.c finish_unit-test-1.c } \
 ]
diff --git a/gcc/testsuite/gcc.dg/plugin/sreal-test-1.c b/gcc/testsuite/gcc.dg/plugin/sreal-test-1.c
new file mode 100644
index 000..1bce2cc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/sreal-test-1.c
@@ -0,0 +1,8 @@
+/* Test that pass is inserted and invoked once. */
+/* { dg-do compile } */
+/* { dg-options -O } */
+
+int main (int argc, char **argv)
+{
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/sreal_plugin.c b/gcc/testsuite/gcc.dg/plugin/sreal_plugin.c
new file mode 100644
index 000..f113816
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/sreal_plugin.c
@@ -0,0
PATCH: PR rtl-optimization/64037: Miscompilation with -Os and enum class : char parameter
Hi,

The enclosed testcase fails on x86 when compiled with -Os since we pass a
byte parameter with a byte load in the caller and read it as an int in the
callee. The reason it only shows up with -Os is that the x86 backend encodes
a byte load with an int load if -O isn't used. When a byte load is used, the
upper 24 bits of the register have random values for
non-WORD_REGISTER_OPERATIONS targets.

It happens because setup_incoming_promotions in combine.c has

      /* The mode and signedness of the argument before any promotions happen
         (equal to the mode of the pseudo holding it at that stage).  */
      mode1 = TYPE_MODE (TREE_TYPE (arg));
      uns1 = TYPE_UNSIGNED (TREE_TYPE (arg));

      /* The mode and signedness of the argument after any source language and
         TARGET_PROMOTE_PROTOTYPES-driven promotions.  */
      mode2 = TYPE_MODE (DECL_ARG_TYPE (arg));
      uns3 = TYPE_UNSIGNED (DECL_ARG_TYPE (arg));

      /* The mode and signedness of the argument as it is actually passed,
         after any TARGET_PROMOTE_FUNCTION_ARGS-driven ABI promotions.  */
      mode3 = promote_function_mode (DECL_ARG_TYPE (arg), mode2, &uns3,
                                     TREE_TYPE (cfun->decl), 0);

while they are actually passed in register by assign_parm_setup_reg in
function.c:

  /* Store the parm in a pseudoregister during the function, but we may
     need to do it in a wider mode.  Using 2 here makes the result
     consistent with promote_decl_mode and thus expand_expr_real_1.  */
  promoted_nominal_mode
    = promote_function_mode (data->nominal_type, data->nominal_mode,
                             &unsignedp,
                             TREE_TYPE (current_function_decl), 2);

where nominal_type and nominal_mode are set up with TREE_TYPE (parm)
and TYPE_MODE (nominal_type).
TREE_TYPE here is (gdb) call debug_tree (type) enumeral_type 0x719f85e8 X type integer_type 0x718a93f0 unsigned char public unsigned string-flag QI size integer_cst 0x718a5fa8 constant 8 unit size integer_cst 0x718a5fc0 constant 1 align 8 symtab 0 alias set -1 canonical type 0x718a93f0 precision 8 min integer_cst 0x718a5fd8 0 max integer_cst 0x718a5f78 255 static unsigned type_5 QI size integer_cst 0x718a5fa8 8 unit size integer_cst 0x718a5fc0 1 align 8 symtab 0 alias set -1 canonical type 0x719f85e8 precision 8 min integer_cst 0x718a5fd8 0 max integer_cst 0x718a5f78 255 values tree_list 0x719fb028 purpose identifier_node 0x719f6738 V bindings (nil) local bindings (nil) value const_decl 0x718c21c0 V type enumeral_type 0x719f85e8 X readonly constant used VOID file pr64037.ii line 2 col 3 align 1 context enumeral_type 0x719f85e8 X initial integer_cst 0x719d8d08 2 context translation_unit_decl 0x77ff91e0 D.1 chain type_decl 0x719f5c78 X (gdb) and DECL_ARG_TYPE is (gdb) call debug_tree (type) integer_type 0x718a9690 int public SI size integer_cst 0x718a5e70 type integer_type 0x718a9150 bitsizetype constant 32 unit size integer_cst 0x718a5e88 type integer_type 0x718a90a8 sizetype constant 4 align 32 symtab 0 alias set 1 canonical type 0x718a9690 precision 32 min integer_cst 0x718c60c0 -2147483648 max integer_cst 0x718c60d8 2147483647 pointer_to_this pointer_type 0x718cb930 (gdb) This mismatch makes combine thinks a byte parameter is passed as int in register and turns (insn 9 6 10 2 (set (reg:SI 92 [ b ]) (zero_extend:SI (subreg:QI (reg:SI 91 [ b ]) 0))) pr64037.ii:9 138 {*zero_extendqisi2} (expr_list:REG_DEAD (reg:SI 91 [ b ]) (nil))) (insn 10 9 0 2 (set (mem:SI (reg/v/f:SI 88 [ out ]) [1 *out_4(D)+0 S4 A32]) (reg:SI 92 [ b ])) pr64037.ii:9 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 92 [ b ]) (expr_list:REG_DEAD (reg/v/f:SI 88 [ out ]) (nil into Trying 9 - 10: Successfully matched this instruction: (set (mem:SI (reg/v/f:SI 88 [ out ]) [1 *out_4(D)+0 S4 A32]) 
(reg:SI 91 [ b ])) allowing combination of insns 9 and 10 original costs 6 + 4 = 10 replacement cost 4 deferring deletion of insn with uid = 9. modifying insn i310: [r88:SI]=r91:SI REG_DEAD r91:SI REG_DEAD r88:SI This patch makes setup_incoming_promotions to match assign_parm_setup_reg. Tested on Linux/x86-64 without regressions. OK for trunk and backport? Thanks. H.J. diff --git a/gcc/combine.c b/gcc/combine.c index 1808f97..a0449a2 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -1561,8 +1561,8 @@ setup_incoming_promotions (rtx_insn *first) uns3 = TYPE_UNSIGNED (DECL_ARG_TYPE (arg)); /* The mode and signedness of the argument as it is actually passed, - after any TARGET_PROMOTE_FUNCTION_ARGS-driven ABI promotions. */ - mode3 = promote_function_mode (DECL_ARG_TYPE (arg), mode2, uns3, + see assign_parm_setup_reg in function.c. */ +
Re: [PATCH v2] gcc/c-family/c-cppbuiltin.c: Let buffer enough to print host wide integer value
On 11/25/14 7:56, Joseph Myers wrote:
> On Sun, 23 Nov 2014, Chen Gang wrote:
>
>> +  gcc_assert (wi::fits_to_tree_p(value, integer_type_node));
>
> Watch formatting: space before '(' in the wi::fits_to_tree_p call. Applies
> elsewhere in this patch as well.
>

OK, thanks, I will watch for that next time.

> When making such an interface change, (a) you should update the comment on
> builtin_define_with_int_value to explain the new interface, and (b) you
> should check existing callers to make sure their values are indeed in
> range, and describe the check you did.
>
> In fact, -fabi-version=0 results in __GXX_ABI_VERSION being defined to
> 99 using builtin_define_with_int_value. That's out of range of int on
> targets with 16-bit int. So that indicates against requiring the value to
> be within range of int. It might however be OK to require the value to be
> within range of target long.
>

I think builtin_define_with_int_value() can accept all kinds of integer
values, and the assert then needs to be:

  gcc_assert (wi::fits_to_tree_p (value, char_type_node)
              || wi::fits_to_tree_p (value, short_integer_type_node)
              || wi::fits_to_tree_p (value, integer_type_node)
              || wi::fits_to_tree_p (value, long_integer_type_node)
              || wi::fits_to_tree_p (value, long_long_integer_type_node));

If it really can accept all kinds of integer values, then I think the related
comments of builtin_define_with_int_value() need not be changed.

>> +  if (value >= 0)
>> +    {
>> +      sprintf (buf, "%s="HOST_WIDE_INT_PRINT_DEC"%s",
>> +               macro, value,
>> +               value <= HOST_INT_MAX
>> +               ? ""
>> +               : value <= HOST_LONG_MAX
>> +                 ? "L" : "LL");
>
> Limits on the host's int and long are completely irrelevant here. The
> question is the target's int and long, not the host's - and consistency
> indicates checking with wi::fits_to_tree_p (value, integer_type_node) if
> the assertion checked with long_integer_type_node.
>

OK, thanks. I think the related sprintf() should be:

  sprintf (buf, "%s=%s"HOST_WIDE_INT_PRINT_DEC"%s%s",
           macro,
           value < 0 ? "(" : "",
           value,
           wi::fits_to_tree_p (value, char_type_node)
           || wi::fits_to_tree_p (value, short_integer_type_node)
           || wi::fits_to_tree_p (value, integer_type_node)
           ? ""
           : wi::fits_to_tree_p (value, long_integer_type_node)
             ? "L" : "LL",
           value < 0 ? ")" : "");

Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed
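To make the point about target widths concrete, here is a minimal standalone sketch of choosing the literal suffix from the target's int and long sizes rather than the host's. Everything in it is an assumption for illustration: fits_signed and define_with_int_value are hypothetical helpers that take the target widths as plain parameters, whereas the real code would ask wi::fits_to_tree_p with the target's type nodes.

```cpp
#include <cassert>
#include <string>

// Hypothetical check: does V fit in a signed two's-complement type of
// BITS bits?  BITS is assumed to be between 1 and 64.
bool fits_signed (long long v, unsigned bits)
{
  if (bits >= 64)
    return true;
  const long long max = (1LL << (bits - 1)) - 1;
  return v >= -max - 1 && v <= max;
}

// Hypothetical analogue of builtin_define_with_int_value: builds
// "MACRO=VALUE" with a suffix chosen from the *target* widths, and
// parenthesizes negative values.
std::string define_with_int_value (const std::string &macro, long long value,
                                   unsigned target_int_bits,
                                   unsigned target_long_bits)
{
  const char *suffix = fits_signed (value, target_int_bits) ? ""
                       : fits_signed (value, target_long_bits) ? "L"
                       : "LL";
  std::string text = std::to_string (value) + suffix;
  if (value < 0)
    text = "(" + text + ")";
  return macro + "=" + text;
}

// Small self-check for the sketch above.
void check_define_with_int_value ()
{
  // On a target with 16-bit int and 32-bit long, 40000 needs an L suffix.
  assert (define_with_int_value ("X", 40000, 16, 32) == "X=40000L");
  assert (define_with_int_value ("X", -1, 32, 64) == "X=(-1)");
}
```

The same value gets a different suffix depending on the target passed in, which is the behaviour a host-limit check cannot provide when cross-compiling.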
Re: [C++ Patch] PR 63786
OK. Jason
[PATCH] Fix PR61927
I am testing the following patch which reverts the order of group and
pattern analysis to the 4.8 state. It doesn't really matter, but it avoids
the need for pattern analysis to know about groups; its failure to do so
causes the wrong-code in the PR.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Help with making the testcase in the PR suitable for the testsuite is
appreciated - my Fortran fu is limited.

Richard.

2014-11-25  Richard Biener  rguent...@suse.de

	PR tree-optimization/61927
	* tree-vect-loop.c (vect_analyze_loop_2): Revert ordering
	of group and pattern analysis to the one in GCC 4.8.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c	(revision 218019)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -1662,6 +1662,13 @@ vect_analyze_loop_2 (loop_vec_info loop_
       return false;
     }
 
+  /* Classify all cross-iteration scalar data-flow cycles.
+     Cross-iteration cycles caused by virtual phis are analyzed separately.  */
+
+  vect_analyze_scalar_cycles (loop_vinfo);
+
+  vect_pattern_recog (loop_vinfo, NULL);
+
   /* Analyze the access patterns of the data-refs in the loop (consecutive,
      complex, etc.). FORNOW: Only handle consecutive access pattern.  */
 
@@ -1674,13 +1681,6 @@ vect_analyze_loop_2 (loop_vec_info loop_
       return false;
     }
 
-  /* Classify all cross-iteration scalar data-flow cycles.
-     Cross-iteration cycles caused by virtual phis are analyzed separately.  */
-
-  vect_analyze_scalar_cycles (loop_vinfo);
-
-  vect_pattern_recog (loop_vinfo, NULL);
-
   /* Data-flow analysis to detect stmts that do not need to be vectorized.  */
 
   ok = vect_mark_stmts_to_be_vectorized (loop_vinfo);
[PATCH] Fix PR64065(?)
The following might fix PR64065 but is certainly a bug.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2014-11-25  Richard Biener  rguent...@suse.de

	PR lto/64065
	* lto-streamer-out.c (output_struct_function_base): Stream
	last_clique field.
	* lto-streamer-in.c (input_struct_function_base): Likewise.

Index: gcc/lto-streamer-out.c
===
--- gcc/lto-streamer-out.c	(revision 218019)
+++ gcc/lto-streamer-out.c	(working copy)
@@ -1956,6 +1956,7 @@ output_struct_function_base (struct outp
   bp_pack_value (&bp, fn->has_simduid_loops, 1);
   bp_pack_value (&bp, fn->va_list_fpr_size, 8);
   bp_pack_value (&bp, fn->va_list_gpr_size, 8);
+  bp_pack_value (&bp, fn->last_clique, sizeof (short) * 8);
 
   /* Output the function start and end loci.  */
   stream_output_location (ob, &bp, fn->function_start_locus);
Index: gcc/lto-streamer-in.c
===
--- gcc/lto-streamer-in.c	(revision 218019)
+++ gcc/lto-streamer-in.c	(working copy)
@@ -903,6 +903,7 @@ input_struct_function_base (struct funct
   fn->has_simduid_loops = bp_unpack_value (&bp, 1);
   fn->va_list_fpr_size = bp_unpack_value (&bp, 8);
   fn->va_list_gpr_size = bp_unpack_value (&bp, 8);
+  fn->last_clique = bp_unpack_value (&bp, sizeof (short) * 8);
 
   /* Input the function start and end loci.  */
   fn->function_start_locus = stream_input_location (&bp, data_in);
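The patch adds the new field on both the writer and the reader side, in the same position and with the same width. A minimal sketch of why that symmetry matters, using a toy bit packer: BitPack, FnFlags, write_flags and read_flags are hypothetical names for illustration and are not GCC's bitpack_d API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy bit packer (hypothetical, not GCC's bitpack_d).  Fields are written
// and read back strictly in order, NBITS at a time (NBITS <= 64).
class BitPack
{
public:
  void pack (std::uint64_t value, unsigned nbits)
  {
    for (unsigned i = 0; i < nbits; ++i)
      bits_.push_back (((value >> i) & 1u) != 0);
  }

  std::uint64_t unpack (unsigned nbits)
  {
    std::uint64_t value = 0;
    for (unsigned i = 0; i < nbits; ++i)
      {
        assert (pos_ < bits_.size ());   // reading past the end is a bug
        if (bits_[pos_++])
          value |= std::uint64_t{1} << i;
      }
    return value;
  }

private:
  std::vector<bool> bits_;
  std::size_t pos_ = 0;
};

// A few fields modelled loosely on the ones touched by the patch.
struct FnFlags
{
  bool has_simduid_loops;
  unsigned va_list_fpr_size;
  unsigned va_list_gpr_size;
  unsigned short last_clique;
};

void write_flags (BitPack &bp, const FnFlags &f)
{
  bp.pack (f.has_simduid_loops, 1);
  bp.pack (f.va_list_fpr_size, 8);
  bp.pack (f.va_list_gpr_size, 8);
  bp.pack (f.last_clique, sizeof (short) * 8);
}

FnFlags read_flags (BitPack &bp)
{
  // Must mirror write_flags exactly: same order, same widths.
  FnFlags f;
  f.has_simduid_loops = bp.unpack (1) != 0;
  f.va_list_fpr_size = static_cast<unsigned> (bp.unpack (8));
  f.va_list_gpr_size = static_cast<unsigned> (bp.unpack (8));
  f.last_clique = static_cast<unsigned short> (bp.unpack (sizeof (short) * 8));
  return f;
}
```

If only one side streamed last_clique, every field after it would be decoded from the wrong bit offset, so the two functions have to change together.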
Re: [PATCH] rs6000: Replace a stray addic with addi
On Mon, Nov 24, 2014 at 10:18 PM, Segher Boessenkool seg...@kernel.crashing.org wrote:
> Tested as usual... okay for trunk?
>
>
> Segher
>
>
> 2014-11-24  Segher Boessenkool  seg...@kernel.crashing.org
>
> gcc/
>         * config/rs6000/sysv4.h (ASM_OUTPUT_REG_POP): Use addi instead
>         of addic.

Okay.

Thanks, David
Re: [PATCH] rs6000: Remove iorxor/IORXOR code attrs
On Mon, Nov 24, 2014 at 10:11 PM, Segher Boessenkool seg...@kernel.crashing.org wrote:
> As Richard pointed out, those do nothing more than code/CODE.
>
> Tested etc.; okay for trunk?
>
>
> Segher
>
>
> 2014-11-21  Segher Boessenkool  seg...@kernel.crashing.org
>
> gcc/
>         * config/rs6000/rs6000.md (iorxor, IORXOR): Delete code_attrs.
>         (rest of file): Replace those with code resp. CODE.

Okay.

thanks, David
Re: [PATCH] AIX: Filename-based shared library versioning for libgcc_s
On Tue, Nov 11, 2014 at 10:42 AM, Michael Haubenwallner michael.haubenwall...@ssi-schaefer.com wrote:
> On 11/11/2014 04:02 PM, David Edelsohn wrote:
>> Michael,
>>
>> Why does the configure change match with p*-*-aix... instead of power*
>> or powerpc*? Yes, it's unique and will match, but why make it as
>> short as possible, which doesn't match other uses?
>
> Actually I did have powerpc* initially, but gmp-6.0.0 config.guess'ed
> power7-ibm-aix6.1.0.0 now. Then I've thought that one may use ppc as well,
> but now I see this config.sub's to powerpc anyway, so power* is fine.
> Patch updated.
>
>> In your documentation, how are you distinguishing between Dynamic
>> Linking and Runtime Linking?
>
> I've tried to use the same naming scheme as in the ld Command Reference
> and the dlopen Subroutine man pages.
>
> Actually, there is
>   at linktime:
>     Dynamic Linking: also known as Dynamic Mode or (more common) Shared
>                      Linking: record a shared object's name into the
>                      created binary
>   at runtime:
>     Runtime Loading: load these shared objects at process startup
>     Runtime Linking: resolve the symbols after loading shared objects
>     Dynamic Loading: load shared objects by application logic with dlopen()
>
> I'm unsure how to make this as clear as possible though.

Now that things have calmed down with respect to breakage on AIX, the
patch for building libgcc_s is okay.

Thanks, David
[PATCH][AArch64]Fix ICE at -O0 on vld1_lane intrinsics
vld1_lane intrinsics ICE at -O0 because they contain a call to the vset_lane
intrinsics, through which the lane index is not constant-propagated. (They are
fine at -O1 and higher!). This fixes the ICE by replacing said call by a macro.

Rather than defining many individual macros
__aarch64_vset(q?)_lane_[uspf](8|16|32|64), instead this introduces a
__AARCH64_NUM_LANES macro using sizeof(), such that a single
__aarch64_vset_lane_any macro handles all variants (with bounds-checking and
endianness-flipping). This reduces potential for error vs. writing the number
of lanes for each variant by hand as previously.

Also factor the endianness-flipping out to a separate macro __aarch64_lane; I
intend to use this for vget_lane too in another patch.

Tested with check-gcc on aarch64-none-elf and aarch64_be-none-elf (including
new test that FAILs without this patch).

Ok for trunk?

gcc/ChangeLog:

	* config/aarch64/arm_neon.h (__AARCH64_NUM_LANES, __aarch64_lane *2):
	New.
	(aarch64_vset_lane_any): Redefine using previous, same for BE + LE.
	(vset_lane_f32, vset_lane_f64, vset_lane_p8, vset_lane_p16,
	vset_lane_s8, vset_lane_s16, vset_lane_s32, vset_lane_s64,
	vset_lane_u8, vset_lane_u16, vset_lane_u32, vset_lane_u64): Remove
	number of lanes.
	(vld1_lane_f32, vld1_lane_f64, vld1_lane_p8, vld1_lane_p16,
	vld1_lane_s8, vld1_lane_s16, vld1_lane_s32, vld1_lane_s64,
	vld1_lane_u8, vld1_lane_u16, vld1_lane_u32, vld1_lane_u64): Call
	__aarch64_vset_lane_any rather than vset_lane_xxx.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vld1_lane-o0.c: New test.

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 921a5db..1291a8d 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -604,173 +604,28 @@ typedef struct poly16x8x4_t
 #define __aarch64_vdupq_laneq_u64(__a, __b) \
    __aarch64_vdup_lane_any (u64, q, q, __a, __b)
 
-/* vset_lane and vld1_lane internal macro.  */
+/* Internal macro for lane indices.
*/ + +#define __AARCH64_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0])) -#ifdef __AARCH64EB__ /* For big-endian, GCC's vector indices are the opposite way around to the architectural lane indices used by Neon intrinsics. */ -#define __aarch64_vset_lane_any(__vec, __index, __val, __lanes) \ - __extension__ \ - ({\ -__builtin_aarch64_im_lane_boundsi (__index, __lanes); \ -__vec[__lanes - 1 - __index] = __val; \ -__vec; \ - }) +#ifdef __AARCH64EB__ +#define __aarch64_lane(__vec, __idx) (__AARCH64_NUM_LANES (__vec) - 1 - __idx) #else -#define __aarch64_vset_lane_any(__vec, __index, __val, __lanes) \ - __extension__ \ - ({\ -__builtin_aarch64_im_lane_boundsi (__index, __lanes); \ -__vec[__index] = __val; \ -__vec; \ - }) +#define __aarch64_lane(__vec, __idx) __idx #endif -/* vset_lane */ - -__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) -vset_lane_f32 (float32_t __elem, float32x2_t __vec, const int __index) -{ - return __aarch64_vset_lane_any (__vec, __index, __elem, 2); -} - -__extension__ static __inline float64x1_t __attribute__ ((__always_inline__)) -vset_lane_f64 (float64_t __elem, float64x1_t __vec, const int __index) -{ - return __aarch64_vset_lane_any (__vec, __index, __elem, 1); -} - -__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) -vset_lane_p8 (poly8_t __elem, poly8x8_t __vec, const int __index) -{ - return __aarch64_vset_lane_any (__vec, __index, __elem, 8); -} - -__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) -vset_lane_p16 (poly16_t __elem, poly16x4_t __vec, const int __index) -{ - return __aarch64_vset_lane_any (__vec, __index, __elem, 4); -} - -__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) -vset_lane_s8 (int8_t __elem, int8x8_t __vec, const int __index) -{ - return __aarch64_vset_lane_any (__vec, __index, __elem, 8); -} - -__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) -vset_lane_s16 (int16_t __elem, int16x4_t 
__vec, const int __index) -{ - return __aarch64_vset_lane_any (__vec, __index, __elem, 4); -} - -__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) -vset_lane_s32 (int32_t __elem, int32x2_t __vec, const int __index) -{ - return __aarch64_vset_lane_any (__vec, __index, __elem, 2); -} - -__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) -vset_lane_s64 (int64_t __elem, int64x1_t __vec, const int __index) -{ - return __aarch64_vset_lane_any (__vec, __index, __elem, 1); -} - -__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) -vset_lane_u8 (uint8_t __elem, uint8x8_t __vec, const int __index) -{ - return __aarch64_vset_lane_any (__vec, __index, __elem, 8); -} - -__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
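As a rough illustration of the two ideas in the patch description (deriving the lane count with sizeof, and flipping the index on big-endian), here is a minimal sketch on a plain array. NUM_LANES and lane_index are hypothetical names; the endianness is passed as a parameter here purely so both cases can be exercised, whereas arm_neon.h selects it with #ifdef __AARCH64EB__ and works on vector types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical analogue of __AARCH64_NUM_LANES: element count from sizeof,
// so the lane count never has to be written by hand per type.
#define NUM_LANES(v) (sizeof (v) / sizeof ((v)[0]))

// Hypothetical analogue of __aarch64_lane: maps an architectural lane
// number to a storage index, reversing the order for big-endian.
constexpr std::size_t lane_index (std::size_t n_lanes, std::size_t idx,
                                  bool big_endian)
{
  return big_endian ? n_lanes - 1 - idx : idx;
}

// Small self-check for the sketch above.
void check_lane_index ()
{
  std::int16_t v[4] = {0, 0, 0, 0};
  assert (NUM_LANES (v) == 4);

  // Little-endian: architectural lane 1 is storage index 1.
  v[lane_index (NUM_LANES (v), 1, false)] = 5;
  assert (v[1] == 5);

  // Big-endian: architectural lane 1 of a 4-lane vector is storage index 2.
  v[lane_index (NUM_LANES (v), 1, true)] = 7;
  assert (v[2] == 7);
}
```

Because the count comes from the operand itself, one definition covers the 1-, 2-, 4- and 8-lane cases that the removed functions spelled out individually.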
[PATCH] Fix PR62238
I will test the following patch fixing a tree sharing issue in PR62238
and plugging a SSA name leak.

The issue here is that force_gimple_operand and friends modify trees
in-place, injecting SSA name uses into them. If you end up not emitting
their definitions, or end up re-using those trees in inappropriate places,
you'll break things. Fixed by unsharing the tree.

The following also plugs the SSA name leak which makes the SSA verifier
ICE become a segfault (a released SSA name leaked into a tree used
otherwise).

Richard.

2014-11-25  Richard Biener  rguent...@suse.de

	PR tree-optimization/62238
	* tree-predcom.c (ref_at_iteration): Unshare the expression
	before gimplifying it.
	(prepare_initializers_chain): Discard unused seq.

	* gcc.dg/torture/pr62238.c: New testcase.

Index: gcc/tree-predcom.c
===
--- gcc/tree-predcom.c	(revision 218019)
+++ gcc/tree-predcom.c	(working copy)
@@ -1402,8 +1402,8 @@ ref_at_iteration (data_reference_p dr, i
   off = size_binop (PLUS_EXPR, off,
                     size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter)));
   tree addr = fold_build_pointer_plus (DR_BASE_ADDRESS (dr), off);
-  addr = force_gimple_operand_1 (addr, stmts, is_gimple_mem_ref_addr,
-                                 NULL_TREE);
+  addr = force_gimple_operand_1 (unshare_expr (addr), stmts,
+                                 is_gimple_mem_ref_addr, NULL_TREE);
   tree alias_ptr = fold_convert (reference_alias_ptr_type (DR_REF (dr)), coff);
   /* While data-ref analysis punts on bit offsets it still handles
      bitfield accesses at byte boundaries.  Cope with that.  Note that
@@ -2354,7 +2354,6 @@ prepare_initializers_chain (struct loop
   unsigned i, n = (chain->type == CT_INVARIANT) ? 1 : chain->length;
   struct data_reference *dr = get_chain_root (chain)->ref;
   tree init;
-  gimple_seq stmts;
   dref laref;
   edge entry = loop_preheader_edge (loop);
 
@@ -2378,12 +2377,17 @@ prepare_initializers_chain (struct loop
 
   for (i = 0; i < n; i++)
     {
+      gimple_seq stmts = NULL;
+
       if (chain->inits[i] != NULL_TREE)
         continue;
 
       init = ref_at_iteration (dr, (int) i - n, &stmts);
       if (!chain->all_always_accessed && tree_could_trap_p (init))
-        return false;
+        {
+          gimple_seq_discard (stmts);
+          return false;
+        }
 
       if (stmts)
         gsi_insert_seq_on_edge_immediate (entry, stmts);
Index: gcc/testsuite/gcc.dg/torture/pr62238.c
===
--- gcc/testsuite/gcc.dg/torture/pr62238.c	(revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr62238.c	(working copy)
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+
+int a[4], b, c, d;
+
+int
+fn1 (int p)
+{
+  for (; d; d++)
+    {
+      unsigned int h;
+      for (h = 0; h < 3; h++)
+        {
+          if (a[c+c+h])
+            {
+              if (p)
+                break;
+              return 0;
+            }
+          b = 0;
+        }
+    }
+  return 0;
+}
+
+int
+main ()
+{
+  fn1 (0);
+  return 0;
+}
[PATCH] libgcc: Add CFI directives to the floating point support code for ARM.
This patch adds CFI directives to the floating point support code for ARM.

Previously, if we tried to do a backtrace from that code in a debug session
we'd get something like this:

(gdb) bt
#0  __nedf2 () at ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1082
#1  0x0db6 in __aeabi_cdcmple () at ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1158
#2  0xf5c28f5c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Now we'll get something like this:

(gdb) bt
#0  __nedf2 () at ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1156
#1  0x0db6 in __aeabi_cdcmple () at ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1263
#2  0x0dc8 in __aeabi_dcmpeq () at ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1285
#3  0x0504 in main ()

I don't have write access, so it'd be nice if someone could commit this one
for me after reviewing. Thanks a lot!

libgcc/ChangeLog:

2014-11-25  Martin Galvan  martin.gal...@tallertechnologies.com

	* config/arm/lib1funcs.S (CFI_START_FUNCTION, CFI_END_FUNCTION):
	New macros.
	* config/arm/ieee754-df.S: Add CFI directives.
	* config/arm/ieee754-sf.S: Add CFI directives.

diff --git a/libgcc/config/arm/ieee754-df.S b/libgcc/config/arm/ieee754-df.S
index 1c45a39..5b34a04 100644
--- a/libgcc/config/arm/ieee754-df.S
+++ b/libgcc/config/arm/ieee754-df.S
@@ -33,8 +33,12 @@
  * Only the default rounding mode is intended for best performances.
  * Exceptions aren't supported yet, but that can be added quite easily
  * if necessary without impacting performances.
+ *
+ * In the CFI related comments, 'previousOffset' refers to the previous offset
+ * from sp used to compute the CFA.
*/ +.cfi_sections .debug_frame #ifndef __ARMEB__ #define xl r0 @@ -53,11 +57,13 @@ ARM_FUNC_START negdf2 ARM_FUNC_ALIAS aeabi_dneg negdf2 +CFI_START_FUNCTION @ flip sign bit eor xh, xh, #0x8000 RET +CFI_END_FUNCTION FUNC_END aeabi_dneg FUNC_END negdf2 @@ -66,6 +72,7 @@ ARM_FUNC_ALIAS aeabi_dneg negdf2 #ifdef L_arm_addsubdf3 ARM_FUNC_START aeabi_drsub +CFI_START_FUNCTION eor xh, xh, #0x8000 @ flip sign bit of first arg b 1f @@ -81,7 +88,11 @@ ARM_FUNC_ALIAS aeabi_dsub subdf3 ARM_FUNC_START adddf3 ARM_FUNC_ALIAS aeabi_dadd adddf3 -1: do_push {r4, r5, lr} +1: do_push {r4, r5, lr}@ sp -= 12 +.cfi_adjust_cfa_offset 12 @ CFA is now sp + previousOffset + 12 +.cfi_rel_offset r4, 0 @ Registers are saved from sp to sp + 8 +.cfi_rel_offset r5, 4 +.cfi_rel_offset lr, 8 @ Look for zeroes, equal values, INF, or NAN. shift1 lsl, r4, xh, #1 @@ -148,6 +159,11 @@ ARM_FUNC_ALIAS aeabi_dadd adddf3 @ Since this is not common case, rescale them off line. teq r4, r5 beq LSYM(Lad_d) + +@ CFI note: we're lucky that the branches to Lad_* that appear after this function +@ have a CFI state that's exactly the same as the one we're in at this +@ point. Otherwise the CFI would change to a different state after the branch, +@ which would be disastrous for backtracing. LSYM(Lad_x): @ Compensate for the exponent overlapping the mantissa MSB added later @@ -413,6 +429,7 @@ LSYM(Lad_i): orrne xh, xh, #0x0008 @ quiet NAN RETLDM r4, r5 +CFI_END_FUNCTION FUNC_END aeabi_dsub FUNC_END subdf3 FUNC_END aeabi_dadd @@ -420,12 +437,19 @@ LSYM(Lad_i): ARM_FUNC_START floatunsidf ARM_FUNC_ALIAS aeabi_ui2d floatunsidf +CFI_START_FUNCTION teq r0, #0 do_it eq, t moveq r1, #0 RETc(eq) -do_push {r4, r5, lr} + +do_push {r4, r5, lr}@ sp -= 12 +.cfi_adjust_cfa_offset 12 @ CFA is now sp + previousOffset + 12 +.cfi_rel_offset r4, 0 @ Registers are saved from sp + 0 to sp + 8. 
+.cfi_rel_offset r5, 4 +.cfi_rel_offset lr, 8 + mov r4, #0x400 @ initial exponent add r4, r4, #(52-1 - 1) mov r5, #0 @ sign bit is 0 @@ -435,17 +459,25 @@ ARM_FUNC_ALIAS aeabi_ui2d floatunsidf mov xh, #0 b LSYM(Lad_l) +CFI_END_FUNCTION FUNC_END aeabi_ui2d FUNC_END floatunsidf ARM_FUNC_START floatsidf ARM_FUNC_ALIAS aeabi_i2d floatsidf +CFI_START_FUNCTION teq r0, #0 do_it eq, t moveq r1, #0 RETc(eq) -do_push {r4, r5, lr} + +do_push {r4, r5, lr}@ sp -= 12 +.cfi_adjust_cfa_offset 12 @ CFA is now sp + previousOffset + 12 +.cfi_rel_offset r4, 0 @ Registers are saved from sp + 0 to sp + 8. +.cfi_rel_offset r5, 4 +.cfi_rel_offset lr, 8 + mov r4, #0x400 @ initial exponent add r4, r4, #(52-1 - 1) andsr5, r0, #0x8000 @ sign bit in r5 @@ -457,11 +489,13 @@ ARM_FUNC_ALIAS aeabi_i2d floatsidf mov xh, #0 b LSYM(Lad_l) +CFI_END_FUNCTION FUNC_END aeabi_i2d FUNC_END floatsidf ARM_FUNC_START extendsfdf2 ARM_FUNC_ALIAS aeabi_f2d extendsfdf2 +
Re: [PATCH] gcc parallel make check
On 15-09-14 18:05, Jakub Jelinek wrote:
> libstdc++-v3/
>         * testsuite/Makefile.am (check_p_numbers0, check_p_numbers1,
>         check_p_numbers2, check_p_numbers3, check_p_numbers4,
>         check_p_numbers5, check_p_numbers6, check_p_numbers,
>         check_p_subdirs): New variables.
>         (check_DEJAGNU_normal_targets): Use check_p_subdirs.
>         (check-DEJAGNU): Rewritten so that for parallelized
>         testing each job runs all the *.exp files, with
>         GCC_RUNTEST_PARALLELIZE_DIR set in environment.
>         * testsuite/Makefile.in: Regenerated.
>         * testsuite/lib/libstdc++.exp (gcc_parallel_test_run_p,
>         gcc_parallel_test_enable): New procedures. If
>         GCC_RUNTEST_PARALLELIZE_DIR is set in environment, override
>         runtest_file_p to invoke also gcc_parallel_test_run_p.
>         * testsuite/libstdc++-abi/abi.exp: Run all the tests serially
>         by the first parallel runtest encountering it. Fix up path
>         of the extract_symvers script.
>         * testsuite/libstdc++-xmethods/xmethods.exp: Run all the tests
>         serially by the first parallel runtest encountering it. Run
>         dg-finish even in case of error.

When comparing test results of patch builds with test results of reference
builds, the only differences I'm seeing are random differences in amount of
'UNSUPPORTED: prettyprinter.exp'.

This patch fixes that by ensuring that we print that unsupported message only
once.

The resulting test result comparison diff is:
...
--- without/FAIL	2014-11-24 17:46:32.202673282 +0100
+++ with/FAIL	2014-11-25 13:45:15.636131571 +0100
 libstdc++-v3/testsuite/libstdc++.sum:UNSUPPORTED: prettyprinters.exp
-libstdc++-v3/testsuite/libstdc++.sum:UNSUPPORTED: prettyprinters.exp
-libstdc++-v3/testsuite/libstdc++.sum:UNSUPPORTED: prettyprinters.exp
-libstdc++-v3/testsuite/libstdc++.sum:UNSUPPORTED: prettyprinters.exp
-libstdc++-v3/testsuite/libstdc++.sum:UNSUPPORTED: prettyprinters.exp
 libstdc++-v3/testsuite/libstdc++.sum:UNSUPPORTED: xmethods.exp
...

Furthermore, the patch adds a dg-finish in case the prettyprinters.exp file is
unsupported, which AFAIU is also required in that case.

Bootstrapped and reg-tested on x86_64.

OK for trunk/stage3?

Thanks,
- Tom

2014-11-25  Tom de Vries  t...@codesourcery.com

	* testsuite/libstdc++-prettyprinters/prettyprinters.exp: Add missing
	dg-finish. Only print unsupported message once.
---
 libstdc++-v3/testsuite/libstdc++-prettyprinters/prettyprinters.exp | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/prettyprinters.exp b/libstdc++-v3/testsuite/libstdc++-prettyprinters/prettyprinters.exp
index a57660f..e5be5b5 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/prettyprinters.exp
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/prettyprinters.exp
@@ -30,7 +30,14 @@ if ![info exists ::env(GUALITY_GDB_NAME)] {
 }
 
 if {! [gdb_version_check]} {
+    dg-finish
+    # Only print unsupported message in one instance.
+    if ![gcc_parallel_test_run_p prettyprinters] {
+        return
+    }
+    gcc_parallel_test_enable 0
     unsupported prettyprinters.exp
+    gcc_parallel_test_enable 1
     return
 }
 
-- 
1.9.1
C++ PATCH to lookup_template_variable
We need to use unknown_type_node for non-dependent arguments, too; we don't know what type the variable has until we look up the specialization.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit c348ed4ea7152054ff623a3efbca7fab49227a5f
Author: Jason Merrill ja...@redhat.com
Date: Mon Nov 24 19:14:04 2014 -0500

    * pt.c (lookup_template_variable): Always unknown_type_node.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 1d6b916..29fb2e1 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8026,19 +8026,14 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
   return ret;
 }

-/* Return a TEMPLATE_ID_EXPR for the given variable template and ARGLIST.
-   If the ARGLIST refers to any template parameters, the type of the
-   expression is the unknown_type_node since the template-id could
-   refer to an explicit or partial specialization. */
+/* Return a TEMPLATE_ID_EXPR for the given variable template and ARGLIST.
+   The type of the expression is the unknown_type_node since the
+   template-id could refer to an explicit or partial specialization. */

 tree
 lookup_template_variable (tree templ, tree arglist)
 {
-  tree type;
-  if (uses_template_parms (arglist))
-    type = unknown_type_node;
-  else
-    type = TREE_TYPE (templ);
+  tree type = unknown_type_node;
   tsubst_flags_t complain = tf_warning_or_error;
   tree parms = INNERMOST_TEMPLATE_PARMS (DECL_TEMPLATE_PARMS (templ));
   arglist = coerce_template_parms (parms, arglist, templ, complain,
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ17.C b/gcc/testsuite/g++.dg/cpp1y/var-templ17.C
new file mode 100644
index 000..c6d97eb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ17.C
@@ -0,0 +1,9 @@
+// DR 1727: a specialization doesn't need to have the same type
+// { dg-do compile { target c++14 } }
+
+template <class T> T t = 42;
+template <> void* t<int> = 0;
+
+template<class T, class U> struct same;
+template<class T> struct same<T,T> {};
+same<void*,decltype(t<int>)> s;
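As an illustration of why the type cannot be taken from the primary template, here is a small C++14 sketch in the spirit of the new test. The names `value`, `specialization_has_own_type` and `run_example` are made up for this example and do not appear in the patch; the sketch only assumes a compiler that implements DR 1727.

```cpp
#include <cassert>
#include <type_traits>

// Primary variable template: value<T> has type T.
template <class T> T value = 42;

// Explicit specialization whose type differs from the primary template
// (allowed by DR 1727).
template <> void* value<int> = nullptr;

// Returns true when each template-id has the type of the specialization
// that is actually selected, not a type guessed from the primary template.
bool specialization_has_own_type()
{
  return std::is_same<decltype(value<long>), long>::value
         && std::is_same<decltype(value<int>), void*>::value;
}

// Hypothetical driver for the example.
void run_example()
{
  assert(specialization_has_own_type());
}
```

The point of the sketch is only that `decltype(value<int>)` depends on which specialization is found, which matches the reasoning given for always using unknown_type_node.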
RE: [PATCH][MIPS] Fix P5600 memory cost
Hi Prachi,

OK with fixes to the changelog entry: latency not latency. Remember to tab in the changelog entry and split the line as it will exceed 80 chars. Also two spaces between the date/name and name/email. E.g.

2014-11-05  Prachi Godbole  prachi.godb...@imgtec.com

	* config/mips/mips.c (mips_rtx_cost_data): Fix memory_latency cost
	for p5600.

I can't see this committed in svn trunk, did you find a problem with the patch?

Thanks,
Matthew
Re: [PATCH] sreal class fix for PR64050 and PR64060
On Tue, Nov 25, 2014 at 1:55 PM, Martin Liška mli...@suse.cz wrote:

Hello.

The following patch fixes the sreal problems that are mentioned in PR64050 and PR64060. I added a new GCC plugin test where I test sreal arithmetic and number comparison.

The patch bootstraps on ppc64-linux-pc and x86_64-linux-pc and passes the regression tests.

Ok.

Thanks,
Richard.

Thanks,
Martin
Re: [PATCH] Add verify_sese
On 25-11-14 10:28, Richard Biener wrote:

On Tue, Nov 25, 2014 at 1:01 AM, Tom de Vries tom_devr...@mentor.com wrote:

Richard,

I ran into a problem with my oacc kernels directive patch series where tail-merge added another entry into a region that was previously single-entry-single-exit. That resulted in hitting this assert in calc_dfs_tree:
...
  /* This aborts e.g. when there is _no_ path from ENTRY to EXIT at all. */
  gcc_assert (di->nodes == (unsigned int) n_basic_blocks_for_fn (cfun) - 1);
...
during a call to move_sese_region_to_fn.

This patch makes sure that we abort earlier, with a clearer message of what is actually wrong.

Bootstrapped and reg-tested on x86_64.

OK for trunk/stage3?

I believe someone made the function work for SEME regions and I believe it is actually used to copy loops with multiple exits

This is the first part of the function comment for move_sese_region_to_fn:
...
/* Move a single-entry, single-exit region delimited by ENTRY_BB and
   EXIT_BB to function DEST_CFUN. The whole region is replaced by a
   single basic block in the original CFG and the new basic block is
   returned. DEST_CFUN must not have a CFG yet.

   Note that the region need not be a pure SESE region. Blocks inside
   the region may contain calls to abort/exit. The only restriction
   is that ENTRY_BB should be the only entry point and it must
   dominate EXIT_BB.
...

I'm guessing you're referring to the 'not pure SESE region' bit?

So in fact, it's not a single-entry-single-exit region, but more a single-entry-at-most-one-continuation region. [ Note that in the case of, for instance, an infinite loop, we can also have single entry, no continuation. ]

so I don't see how the patch can work in these cases?

The bbs with calls to abort/exit don't have any successor edges. verify_sese doesn't assert anything specific about such bbs.

Thanks,
- Tom
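The property under discussion can be sketched on a toy CFG. This is not the verify_sese from the patch (which is not shown in this thread); `Edge` and `region_has_single_entry_and_exit` are hypothetical names, and the sketch only assumes that the check is "no edge enters the region except at the entry block, and no edge leaves it except from the exit block".

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// A toy CFG: blocks are integers, edges are (source, destination) pairs.
using Edge = std::pair<int, int>;

// Returns true if REGION is entered only through ENTRY_BB and left only
// through EXIT_BB.  Blocks without successors (for example blocks ending
// in a call to abort) contribute no outgoing edges, so they never violate
// the check.
bool region_has_single_entry_and_exit(const std::vector<Edge> &edges,
                                      const std::set<int> &region,
                                      int entry_bb, int exit_bb)
{
  assert(region.count(entry_bb) != 0 && region.count(exit_bb) != 0);
  for (const Edge &e : edges)
    {
      bool src_in = region.count(e.first) != 0;
      bool dst_in = region.count(e.second) != 0;
      if (!src_in && dst_in && e.second != entry_bb)
        return false;  // A second way into the region.
      if (src_in && !dst_in && e.first != exit_bb)
        return false;  // A second way out of the region.
    }
  return true;
}
```

With region {1, 2, 3, 4}, entry 1 and exit 4, a dead-end block 3 is accepted, while an extra edge from an outside block into block 2 (the situation tail-merge created in the report above) makes the check fail.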
Re: PATCH: PR rtl-optimization/64037: Miscompilation with -Os and enum class : char parameter
On Tue, Nov 25, 2014 at 1:57 PM, H.J. Lu hongjiu...@intel.com wrote:

Hi,

The enclosed testcase fails on x86 when compiled with -Os since we pass a byte parameter with a byte load in caller and read it as an int in callee. The reason it only shows up with -Os is x86 backend encodes a byte load with an int load if -O isn't used. When a byte load is used, the upper 24 bits of the register have random values on non-WORD_REGISTER_OPERATIONS targets.

It happens because setup_incoming_promotions in combine.c has

  /* The mode and signedness of the argument before any promotions happen
     (equal to the mode of the pseudo holding it at that stage). */
  mode1 = TYPE_MODE (TREE_TYPE (arg));
  uns1 = TYPE_UNSIGNED (TREE_TYPE (arg));

  /* The mode and signedness of the argument after any source language and
     TARGET_PROMOTE_PROTOTYPES-driven promotions. */
  mode2 = TYPE_MODE (DECL_ARG_TYPE (arg));
  uns3 = TYPE_UNSIGNED (DECL_ARG_TYPE (arg));

  /* The mode and signedness of the argument as it is actually passed,
     after any TARGET_PROMOTE_FUNCTION_ARGS-driven ABI promotions. */
  mode3 = promote_function_mode (DECL_ARG_TYPE (arg), mode2, uns3,
                                 TREE_TYPE (cfun->decl), 0);

while they are actually passed in register by assign_parm_setup_reg in function.c:

  /* Store the parm in a pseudoregister during the function, but we may
     need to do it in a wider mode. Using 2 here makes the result
     consistent with promote_decl_mode and thus expand_expr_real_1. */
  promoted_nominal_mode
    = promote_function_mode (data->nominal_type, data->nominal_mode, unsignedp,
                             TREE_TYPE (current_function_decl), 2);

where nominal_type and nominal_mode are set up with TREE_TYPE (parm) and TYPE_MODE (nominal_type). TREE_TYPE here is

I think the bug is here, not in combine.c. Can you try going back in history for both snippets and see if they matched at some point?

Thanks,
Richard.
(gdb) call debug_tree (type) enumeral_type 0x719f85e8 X type integer_type 0x718a93f0 unsigned char public unsigned string-flag QI size integer_cst 0x718a5fa8 constant 8 unit size integer_cst 0x718a5fc0 constant 1 align 8 symtab 0 alias set -1 canonical type 0x718a93f0 precision 8 min integer_cst 0x718a5fd8 0 max integer_cst 0x718a5f78 255 static unsigned type_5 QI size integer_cst 0x718a5fa8 8 unit size integer_cst 0x718a5fc0 1 align 8 symtab 0 alias set -1 canonical type 0x719f85e8 precision 8 min integer_cst 0x718a5fd8 0 max integer_cst 0x718a5f78 255 values tree_list 0x719fb028 purpose identifier_node 0x719f6738 V bindings (nil) local bindings (nil) value const_decl 0x718c21c0 V type enumeral_type 0x719f85e8 X readonly constant used VOID file pr64037.ii line 2 col 3 align 1 context enumeral_type 0x719f85e8 X initial integer_cst 0x719d8d08 2 context translation_unit_decl 0x77ff91e0 D.1 chain type_decl 0x719f5c78 X (gdb) and DECL_ARG_TYPE is (gdb) call debug_tree (type) integer_type 0x718a9690 int public SI size integer_cst 0x718a5e70 type integer_type 0x718a9150 bitsizetype constant 32 unit size integer_cst 0x718a5e88 type integer_type 0x718a90a8 sizetype constant 4 align 32 symtab 0 alias set 1 canonical type 0x718a9690 precision 32 min integer_cst 0x718c60c0 -2147483648 max integer_cst 0x718c60d8 2147483647 pointer_to_this pointer_type 0x718cb930 (gdb) This mismatch makes combine thinks a byte parameter is passed as int in register and turns (insn 9 6 10 2 (set (reg:SI 92 [ b ]) (zero_extend:SI (subreg:QI (reg:SI 91 [ b ]) 0))) pr64037.ii:9 138 {*zero_extendqisi2} (expr_list:REG_DEAD (reg:SI 91 [ b ]) (nil))) (insn 10 9 0 2 (set (mem:SI (reg/v/f:SI 88 [ out ]) [1 *out_4(D)+0 S4 A32]) (reg:SI 92 [ b ])) pr64037.ii:9 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 92 [ b ]) (expr_list:REG_DEAD (reg/v/f:SI 88 [ out ]) (nil into Trying 9 - 10: Successfully matched this instruction: (set (mem:SI (reg/v/f:SI 88 [ out ]) [1 *out_4(D)+0 S4 A32]) (reg:SI 91 [ b ])) 
allowing combination of insns 9 and 10 original costs 6 + 4 = 10 replacement cost 4 deferring deletion of insn with uid = 9. modifying insn i310: [r88:SI]=r91:SI REG_DEAD r91:SI REG_DEAD r88:SI This patch makes setup_incoming_promotions to match assign_parm_setup_reg. Tested on Linux/x86-64 without regressions. OK for trunk and backport? Thanks. H.J. diff --git a/gcc/combine.c b/gcc/combine.c index 1808f97..a0449a2 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -1561,8 +1561,8 @@ setup_incoming_promotions (rtx_insn *first)
Re: PATCH: PR rtl-optimization/64037: Miscompilation with -Os and enum class : char parameter
On Tue, Nov 25, 2014 at 4:01 PM, Richard Biener richard.guent...@gmail.com wrote:

On Tue, Nov 25, 2014 at 1:57 PM, H.J. Lu hongjiu...@intel.com wrote:

[...]

I think the bug is here, not in combine.c. Can you try going back in history for both snippets and see if they matched at some point?

Oh, and note that I think DECL_ARG_TYPE is something dangerous: it's meant to be a source language ABI kind-of-thing. Or rather an optimization hint. For example in C, when integral promotions happen to call arguments, this can be used to optimize sign-/zero-extensions in the callee. Unless something else overrides this (like the target which specifies the real ABI). IIRC.

Richard.
[PATCH, MIPS, COMMITTED] Testsuite fixes for soft-float configurations
Re: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02879.html The new FPXX tests now work correctly for soft-float configurations. Tests should only need to specify one of the 5 floating-point options and any other options are then inferred from that. The FPXX tests were the first tests to really rely on -mfp* options which is why we hadn't seen this issue before. Using a -mfp option implies that the test is hard-float and double float. To create a test that is single-float then only the -msingle-float option should be used without specifying a -mfp option. I have not done anything to improve single-float testsuite support in this patch though. I committed this via a git-svn bridge so fingers crossed it went in correctly! I also used the ChangeLog merging script from the link below which is a marvellous invention if others don't know of it. https://gcc.gnu.org/wiki/GitMirror#git-merge-changelog Thanks, Matthew gcc/testuite/ * gcc.target/mips/mips.exp: Add support for -msoft-float and -mhard-float options. Ensure that explicit -mfp* options imply both -mhard-float and -mdouble-float. * gcc.target/mips/call-clobbered-1.c: Add -mhard-float to the compile options. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@218047 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/testsuite/ChangeLog | 8 gcc/testsuite/gcc.target/mips/call-clobbered-1.c | 2 +- gcc/testsuite/gcc.target/mips/mips.exp | 8 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index b577824..7b9b365 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,11 @@ +2014-11-25 Matthew Fortune matthew.fort...@imgtec.com + + * gcc.target/mips/mips.exp: Add support for -msoft-float and + -mhard-float options. Ensure that explicit -mfp* options imply + both -mhard-float and -mdouble-float. + * gcc.target/mips/call-clobbered-1.c: Add -mhard-float to the + compile options. 
+ 2014-11-25 Paolo Carlini paolo.carl...@oracle.com PR c++/63786 diff --git a/gcc/testsuite/gcc.target/mips/call-clobbered-1.c b/gcc/testsuite/gcc.target/mips/call-clobbered-1.c index ecb994f..77294aa 100644 --- a/gcc/testsuite/gcc.target/mips/call-clobbered-1.c +++ b/gcc/testsuite/gcc.target/mips/call-clobbered-1.c @@ -1,6 +1,6 @@ /* Check that we handle call-clobbered FPRs correctly. */ /* { dg-skip-if code quality test { *-*-* } { -O0 } { } } */ -/* { dg-options isa=2 -mabi=32 -ffixed-f0 -ffixed-f1 -ffixed-f2 -ffixed-f3 -ffixed-f4 -ffixed-f5 -ffixed-f6 -ffixed-f7 -ffixed-f8 -ffixed-f9 -ffixed-f10 -ffixed-f11 -ffixed-f12 -ffixed-f13 -ffixed-f14 -ffixed-f15 -ffixed-f16 -ffixed-f17 -ffixed-f18 -ffixed-f19 } */ +/* { dg-options isa=2 -mabi=32 -mhard-float -ffixed-f0 -ffixed-f1 -ffixed-f2 -ffixed-f3 -ffixed-f4 -ffixed-f5 -ffixed-f6 -ffixed-f7 -ffixed-f8 -ffixed-f9 -ffixed-f10 -ffixed-f11 -ffixed-f12 -ffixed-f13 -ffixed-f14 -ffixed-f15 -ffixed-f16 -ffixed-f17 -ffixed-f18 -ffixed-f19 } */ void bar (void); double a; diff --git a/gcc/testsuite/gcc.target/mips/mips.exp b/gcc/testsuite/gcc.target/mips/mips.exp index a9beb27..6ae71ad 100644 --- a/gcc/testsuite/gcc.target/mips/mips.exp +++ b/gcc/testsuite/gcc.target/mips/mips.exp @@ -234,6 +234,7 @@ set mips_option_groups { dump_pattern -dp endianness -E(L|B)|-me(l|b) float -m(hard|soft)-float +fpu -m(double|single)-float forbid_cpu forbid_cpu=.* fp -mfp(32|xx|64) gp -mgp(32|64) @@ -858,6 +859,8 @@ proc mips-dg-finish {} { #| | # -modd-spreg -mno-odd-spreg #| | +# -mdouble-float -msingle-float +#| | # -mabs=2008/-mabs=legacy no option #| | # -mhard-float-msoft-float @@ -947,7 +950,12 @@ proc mips-dg-options { args } { mips_option_dependency options -mips3d -mpaired-single mips_option_dependency options -mpaired-single -mfp64 mips_option_dependency options -mfp64 -mhard-float +mips_option_dependency options -mfp32 -mhard-float +mips_option_dependency options -mfpxx -mhard-float mips_option_dependency options -mfp64 
-modd-spreg +mips_option_dependency options -mfp64 -mdouble-float +mips_option_dependency options -mfp32 -mdouble-float +mips_option_dependency options -mfpxx -mdouble-float mips_option_dependency options -mabs=2008 -mhard-float mips_option_dependency options -mabs=legacy -mhard-float mips_option_dependency options -mrelax-pic-calls -mno-plt -- 1.9.4
Re: [PATCH 1/2] teach mklog to get name / email from git config when available
On 20/11/2014, 16:51 , Tom de Vries wrote: OK for trunk? This is fine. Thanks. Diego.
Re: [PATCH] Add verify_sese
On Tue, Nov 25, 2014 at 3:59 PM, Tom de Vries tom_devr...@mentor.com wrote:

[...]

The bbs with calls to abort/exit don't have any successor edges. verify_sese doesn't assert anything specific about such bbs.

Ah, indeed. Patch is ok then.

Thanks,
Richard.
Re: PATCH: PR rtl-optimization/64037: Miscompilation with -Os and enum class : char parameter
On Tue, Nov 25, 2014 at 7:01 AM, Richard Biener richard.guent...@gmail.com wrote:

On Tue, Nov 25, 2014 at 1:57 PM, H.J. Lu hongjiu...@intel.com wrote:

[...]

while they are actually passed in register by assign_parm_setup_reg in function.c:

[...]

I think the bug is here, not in combine.c. Can you try going back in history for both snippets and see if they matched at some point?
The bug was introduced by

https://gcc.gnu.org/ml/gcc-cvs/2007-09/msg00613.html

commit 5d93234932c3d8617ce92b77b7013ef6bede9508
Author: shinwell shinwell@138bc75d-0d04-0410-961f-82ee72b054a4
Date: Thu Sep 20 11:01:18 2007 +

    gcc/
    * combine.c: Include cgraph.h.
    (setup_incoming_promotions): Rework to allow more aggressive
    elimination of sign extensions when all call sites of the
    current function are known to lie within the current unit.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@128618 138bc75d-0d04-0410-961f-82ee72b054a4

Before this commit, combine.c had

  enum machine_mode mode = TYPE_MODE (TREE_TYPE (arg));
  int uns = TYPE_UNSIGNED (TREE_TYPE (arg));

  mode = promote_mode (TREE_TYPE (arg), mode, uns, 1);
  if (mode == GET_MODE (reg) && mode != DECL_MODE (arg))
    {
      rtx x;
      x = gen_rtx_CLOBBER (DECL_MODE (arg), const0_rtx);
      x = gen_rtx_fmt_e ((uns ? ZERO_EXTEND : SIGN_EXTEND), mode, x);
      record_value_for_reg (reg, first, x);
    }

which matched function.c:

  /* This is not really promoting for a call. However we need to be
     consistent with assign_parm_find_data_types and expand_expr_real_1. */
  promoted_nominal_mode
    = promote_mode (data->nominal_type, data->nominal_mode, unsignedp, 1);

r128618 changed

  mode = promote_mode (TREE_TYPE (arg), mode, uns, 1);

to

  mode3 = promote_mode (DECL_ARG_TYPE (arg), mode2, uns3, 1);

This breaks non-WORD_REGISTER_OPERATIONS targets.

--
H.J.
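The effect of the mismatch can be modelled in plain C++. This is only a simulation under stated assumptions, not GCC code: the 32-bit "register" is an ordinary integer, and `pass_byte_in_register`, `callee_assumes_promoted`, `callee_zero_extends` and `demonstrate_mismatch` are hypothetical names chosen for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Caller side: a byte load writes only the low 8 bits, so whatever was in
// the upper 24 bits of the register before the load is still there.
std::uint32_t pass_byte_in_register(std::uint8_t value,
                                    std::uint32_t stale_register)
{
  return (stale_register & 0xFFFFFF00u) | value;
}

// Callee that believes the argument was already promoted to int and so
// uses the whole register (the zero_extend has been optimized away).
std::uint32_t callee_assumes_promoted(std::uint32_t reg)
{
  return reg;
}

// Callee that keeps the zero extension of the low byte.
std::uint32_t callee_zero_extends(std::uint32_t reg)
{
  return static_cast<std::uint8_t>(reg);
}

// Shows both callees on a register whose upper bits hold stale data.
void demonstrate_mismatch()
{
  std::uint32_t reg = pass_byte_in_register(2, 0xDEADBEEFu);
  assert(callee_zero_extends(reg) == 2u);      // value recovered
  assert(callee_assumes_promoted(reg) != 2u);  // stale bits leak through
}
```

In this model the first callee corresponds to what combine produced after dropping the zero_extend, and the second to the code before the combination.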
Re: [PATCH][AArch64] Remove crypto extension from default for cortex-a53, cortex-a57
On 25/11/14 01:36, Gerald Pfeifer wrote:

On Tuesday 2014-11-18 09:38, Kyrill Tkachov wrote:

Here's what I propose.

+ <li>The cryptographic extensions to the ARMv8-A architecture are no
+ longer enabled by default when specifying the
+ <code>-mcpu=cortex-a53</code>, <code>-mcpu=cortex-a57</code> or
+ <code>-mcpu=cortex-a57.cortex-a53</code> options. To enable these
+ extensions add the <code>+crypto</code> extension to your given
+ <code>-mcpu</code> or <code>-march</code> options' value.

option's? Or better to the value of your...option(s)?

Ok, I've reworded it and added a small example to demonstrate.

The description talks about -mcpu and mentions -march only once. Isn't this a bit confusing?

The change is to the behaviour of -mcpu, not -march. -march is only mentioned as a way of getting the previous behaviour if the user so wishes.

How about this amendment?

Thanks for looking at it,
Kyrill

Gerald

Index: htdocs/gcc-5/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
retrieving revision 1.41
diff -U 3 -r1.41 changes.html
--- htdocs/gcc-5/changes.html 23 Nov 2014 14:42:28 - 1.41
+++ htdocs/gcc-5/changes.html 25 Nov 2014 16:05:01 -
@@ -376,8 +376,9 @@
 are no longer enabled by default when specifying the
 <code>-mcpu=cortex-a53</code>, <code>-mcpu=cortex-a57</code> or
 <code>-mcpu=cortex-a57.cortex-a53</code> options. To enable these
- extensions add the <code>+crypto</code> extension to your given
- <code>-mcpu</code> or <code>-march</code> options' value.
+ extensions add the <code>+crypto</code> extension to the value of
+ <code>-mcpu</code> or <code>-march</code> e.g.
+ <code>-mcpu=cortex-a53+crypto</code>.
 </li>
 <li>Support for the Cavium ThunderX processor is now available through the
 <code>-mcpu=thunderx</code> and <code>-mtune=thunderx</code> options.
Re: [Patch] Improving jump-thread pass for PR 54742
On 11/24/14 21:55, Jeff Law wrote:

On 11/24/14 18:09, Sebastian Pop wrote:

Sebastian Pop wrote:

I removed the return -1 and started a bootstrap on powerpc64-linux.

Bootstrap passed on top of the 4 previous patches on powerpc64-linux.

I will report the valgrind output.

The output from valgrind looks closer to the output of master with no other patches: still 1M more instructions executed, and 300K more branches

Just ran my suite where we get ~25k more branches, which definitely puts us in the noise. (That's with all 4 patches + fixing the return value.) I'm going to look a little closer at this stuff tomorrow, but I think we've resolved the performance issue. I'll dig deeper into the implementation tomorrow as well.

I was running without your followup patches (must have used the wrong bits from my git stash), so those results are bogus, but in a good way. After fixing that goof, I'm seeing consistent improvements with your set of 4 patches and the fix for the wrong return code. Across the suite, ~140M fewer branches; not huge, but definitely not in the noise.

So, time to dig into the implementation :-)

Jeff

ps. In case you're curious about the noise, it's primarily address hashing.
Re: [PATCH] crtstuff: Add missing semicolon
On 11/24/14 20:44, Segher Boessenkool wrote:

I wonder how this survived so long, I must be building some strange configs (it failed on an avr cross).

Okay for trunk?

Segher

2014-11-24 Segher Boessenkool seg...@kernel.crashing.org

libgcc/
	* crtstuff.c (__do_global_ctors_1): Add missing semicolon.

I think this falls under the obviously OK rule :-)

jeff
[PATCH] pr31397 - implement -Wsuggest-override
From: Trevor Saunders tsaund...@mozilla.com

Hi,

this is a new warning to find places where virtual functions are overridden, but not marked override. The included test passes; I expect comments, so regtest is pending and the ChangeLog is omitted.

Trev

---
 gcc/c-family/c.opt | 5 +
 gcc/cp/class.c | 4
 gcc/doc/invoke.texi | 6 +-
 gcc/testsuite/g++.dg/warn/Wsuggest-override.C | 21 +
 4 files changed, 35 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wsuggest-override.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 85dcb98..259b520 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -574,6 +574,11 @@ Wsuggest-attribute=format
 C ObjC C++ ObjC++ Var(warn_suggest_attribute_format) Warning
 Warn about functions which might be candidates for format attributes

+Wsuggest-override
+C++ ObjC++ Var(warn_override) Warning
+Suggest that the override keyword be used when the declaration of a virtual
+function overrides another.
+
 Wswitch
 C ObjC C++ ObjC++ Var(warn_switch) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
 Warn about enumerated switches, with no default, missing a case
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 16279df..515f33f 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -2777,6 +2777,10 @@ check_for_override (tree decl, tree ctype)
     {
       DECL_VINDEX (decl) = decl;
       overrides_found = true;
+      if (warn_override && DECL_VIRTUAL_P (decl) && !DECL_OVERRIDE_P (decl)
+	  && !DECL_DESTRUCTOR_P (decl))
+	warning_at(DECL_SOURCE_LOCATION (decl), OPT_Wsuggest_override,
+		   "%q+D can be marked override", decl);
     }

   if (DECL_VIRTUAL_P (decl))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 89edddb..8741e8e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -275,7 +275,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wstack-protector -Wstack-usage=@var{len} -Wstrict-aliasing @gol
 -Wstrict-aliasing=n @gol -Wstrict-overflow -Wstrict-overflow=@var{n} @gol
 -Wsuggest-attribute=@r{[}pure@r{|}const@r{|}noreturn@r{|}format@r{]} @gol
--Wsuggest-final-types @gol -Wsuggest-final-methods @gol
+-Wsuggest-final-types @gol -Wsuggest-final-methods @gol -Wsuggest-override @gol
 -Wmissing-format-attribute @gol
 -Wswitch -Wswitch-default -Wswitch-enum -Wswitch-bool -Wsync-nand @gol
 -Wsystem-headers -Wtrampolines -Wtrigraphs -Wtype-limits -Wundef @gol
@@ -4255,6 +4255,10 @@ effective with link time optimization, where the information about the
 class hiearchy graph is more complete. It is recommended to first consider
 suggestins of @option{-Wsuggest-final-types} and then rebuild with new annotations.

+@item -Wsuggest-override
+Warn about overriding virtual functions that are not marked with the override
+keyword.
+
 @item -Warray-bounds
 @opindex Wno-array-bounds
 @opindex Warray-bounds
diff --git a/gcc/testsuite/g++.dg/warn/Wsuggest-override.C b/gcc/testsuite/g++.dg/warn/Wsuggest-override.C
new file mode 100644
index 000..929d365
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wsuggest-override.C
@@ -0,0 +1,21 @@
+// { dg-do compile }
+// { dg-options "-std=c++11 -Wsuggest-override" }
+struct A
+{
+	A();
+	virtual ~A();
+	virtual void f();
+	virtual int bar();
+	operator int();
+	virtual operator float();
+};
+
+struct B : A
+{
+	B();
+	virtual ~B();
+	virtual void f(); // { dg-warning "can be marked override" }
+virtual int bar() override;
+operator int();
+virtual operator float(); // { dg-warning "can be marked override" }
+};
--
2.1.3
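For readers unfamiliar with the keyword the warning suggests, the following C++11 sketch shows the style it nudges towards. `Base`, `Derived`, `size_through_base` and `run_example` are hypothetical names for this example only; whether a given declaration triggers the warning is decided by the check in the patch above, not by this sketch.

```cpp
#include <cassert>

struct Base
{
  virtual ~Base() {}
  virtual int size() const { return 1; }
};

struct Derived : Base
{
  // Marked "override": if the signature ever stops matching Base::size
  // (say, the "const" is dropped), the compiler rejects the declaration
  // instead of silently introducing a new, unrelated virtual function.
  // Written as "virtual int size() const" without "override", this is the
  // kind of declaration the proposed warning is meant to point out.
  int size() const override { return 2; }
};

// Calls size() through the base class so that virtual dispatch is used.
int size_through_base(const Base &b)
{
  return b.size();
}

// Hypothetical driver for the example.
void run_example()
{
  Derived d;
  assert(size_through_base(d) == 2);
}
```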
Re: [PATCH] Fix a bug in bt-load.c
On 11/24/14 20:56, Segher Boessenkool wrote: This caused ICEs on sh64. `min_cost' and `def' here are supposed to refer to the same element; removing it from the heap before asking the heap for the key doesn't work (and at the end of the loop here we will ask for min_key on an empty heap, which then does gcc_unreachable). Bootstrapped and tested on powerpc64-linux, but I doubt it exercises this code at all; only sh64 did ICE, and does not anymore. Okay for trunk? Segher 2014-11-24 Segher Boessenkool seg...@kernel.crashing.org gcc/ * bt-load.c (migrate_btr_defs): Get the key of a heap entry before removing it, not after. OK. Did sh64 ICE during a build, or was it during testing or something else? Trying to figure out if we need a distinct test in the suite or not. jeff
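The ordering problem described in the patch can be shown with a standard container. This sketch uses std::priority_queue as a stand-in, which is an assumption made for illustration: bt-load.c uses GCC's own heap type, and `Entry`, `MinHeap` and `take_cheapest` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// (cost, id) pairs; the entry with the smallest cost is at the top.
using Entry = std::pair<int, int>;
using MinHeap =
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>>;

// Removes the cheapest entry and returns it together with its key.
// The key is read while the entry is still in the heap.  Asking the heap
// for its minimum key after the removal would describe the next entry
// instead, and would be invalid once the heap is empty.
Entry take_cheapest(MinHeap &heap)
{
  assert(!heap.empty());
  Entry cheapest = heap.top();  // Read key and payload first ...
  heap.pop();                   // ... then remove the entry.
  return cheapest;
}
```

Draining a heap with `take_cheapest` returns each entry with its own cost, including the last one, which is the case that ran into gcc_unreachable in the report above.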
Re: [PATCH] mn10300: Fix an ICE
On 11/24/14 20:37, Segher Boessenkool wrote:
> `lcc' is not an insn but just a pattern. This caused a build error
> in libgcc.
>
> Tested with a cross compiler build (which fails without and succeeds
> with the patch). Not tested much more; this compiler really likes
> to ICE, something with ipa-icf.
>
> Is this okay for trunk?
>
> Segher
>
> 2014-11-24  Segher Boessenkool  seg...@kernel.crashing.org
>
> gcc/
> 	* config/mn10300/mn10300.c (mn10300_insert_setlb_lcc): Remove
> 	PATTERN call.

OK. A good example of a case that would have been caught if we get to a point where stuff in the insn chain are not RTX objects, but something else entirely.

jeff
Re: [PING][PATCH] Change contrib/test_installed for testing cross compilers
On 11/24/14 09:51, Alan Lawrence wrote:
> Having just been experimenting with testing of installed compilers -
> yes something like this could be useful, however: to do cross-testing
> I found I also
> (a) had to set my target_list; so either an extra flag for that, or
>     maybe just a generic 'extra_site_flags' parameter?
> (b) I had to set up some boards...so maybe could have got there with
>     the --tmpdir flag, ok;
> (c) lost all the parallelism provided by the Makefile in build/gcc.
>     It should be possible to use the (check-parallel-xxx rules from)
>     Makefile in conjunction with the site.exp from
>     contrib/test_installed, haven't got that far yet...
>
> This does leave me wondering (1) whether a one-step $ test_installed
> is feasible, or a two-stage setup and then run is inevitable;
> (2) whether having all that parallelism expressed in the Makefile is
> the best place for it. Not that I have an alternative proposal at
> this point...

It might be inevitable to have a two stage setup. Red Hat does installed compiler testing and I'm sure increased parallelism for install testing would be appreciated by the team doing that work.

As for the --target, change itself, seems reasonable.

Jeff
[PATCH] Enhance ASAN_CHECK optimization
Hi all, This patch improves current optimization of ASAN_CHECKS performed by sanopt pass. In addition to searching the sanitized pointer in asan_check_map, it also tries to search for definition of this pointer. This allows more checks to be dropped when definition is not a gimple value (e.g. load from struct field) and thus cannot be propagated by forwprop. In my experiments this rarely managed to remove more than 0.5% of ASAN_CHECKs but some files got as much as 1% improvement e.g. * gimple.c: 49 (out of 5293) * varasm.c: 42 (out of 3678) For a total it was able to remove 2657 checks in Asan-bootstrapped GCC (out of ~500K). I've Asan-bootstrapped, bootstrapped and regtested on x64. Is this ok for stage3? Best regards, Yury From 85f65c403f132245e9efcc8a420269f8d631fae6 Mon Sep 17 00:00:00 2001 From: Yury Gribov y.gri...@samsung.com Date: Tue, 25 Nov 2014 11:49:11 +0300 Subject: [PATCH] 2014-11-25 Yury Gribov y.gri...@samsung.com gcc/ * sanopt.c (maybe_get_single_definition): New function. (struct tree_map_traits): New struct. (struct sanopt_ctx): Use custom traits for asan_check_map. (maybe_get_dominating_check): New function. (maybe_optimize_ubsan_null_ifn): Move code to maybe_get_dominating_check. (maybe_optimize_asan_check_ifn): Ditto. Take non-SSA expressions into account when optimizing. (sanopt_optimize_walker): Do not treat recoverable sanitization specially. --- gcc/sanopt.c | 194 +++--- 1 file changed, 116 insertions(+), 78 deletions(-) diff --git a/gcc/sanopt.c b/gcc/sanopt.c index e1d11e0..9fe87de 100644 --- a/gcc/sanopt.c +++ b/gcc/sanopt.c @@ -84,6 +84,35 @@ struct sanopt_info bool visited_p; }; +/* If T has a single definition of form T = T2, return T2. */ + +static tree +maybe_get_single_definition (tree t) +{ + if (TREE_CODE (t) == SSA_NAME) +{ + gimple g = SSA_NAME_DEF_STMT (t); + if (gimple_assign_single_p (g)) + return gimple_assign_rhs1 (g); +} + return NULL_TREE; +} + +/* Traits class for tree hash maps below. 
*/ + +struct tree_map_traits : default_hashmap_traits +{ + static inline hashval_t hash (const_tree ref) +{ + return iterative_hash_expr (ref, 0); +} + + static inline bool equal_keys (const_tree ref1, const_tree ref2) +{ + return operand_equal_p (ref1, ref2, 0); +} +}; + /* This is used to carry various hash maps and variables used in sanopt_optimize_walker. */ @@ -95,7 +124,7 @@ struct sanopt_ctx /* This map maps a pointer (the second argument of ASAN_CHECK) to a vector of ASAN_CHECK call statements that check the access. */ - hash_maptree, auto_vecgimple asan_check_map; + hash_maptree, auto_vecgimple, tree_map_traits asan_check_map; /* Number of IFN_ASAN_CHECK statements. */ int asan_num_accesses; @@ -197,6 +226,24 @@ imm_dom_path_with_freeing_call (basic_block bb, basic_block dom) return false; } +/* Get the first dominating check from the list of stored checks. + Non-dominating checks are silently dropped. */ + +static gimple +maybe_get_dominating_check (auto_vecgimple v) +{ + for (; !v.is_empty (); v.pop ()) +{ + gimple g = v.last (); + sanopt_info *si = (sanopt_info *) gimple_bb (g)-aux; + if (!si-visited_p) + /* At this point we shouldn't have any statements + that aren't dominating the current BB. */ + return g; +} + return NULL; +} + /* Optimize away redundant UBSAN_NULL calls. */ static bool @@ -209,7 +256,8 @@ maybe_optimize_ubsan_null_ifn (struct sanopt_ctx *ctx, gimple stmt) bool remove = false; auto_vecgimple v = ctx-null_check_map.get_or_insert (ptr); - if (v.is_empty ()) + gimple g = maybe_get_dominating_check (v); + if (!g) { /* For this PTR we don't have any UBSAN_NULL stmts recorded, so there's nothing to optimize yet. */ @@ -220,43 +268,30 @@ maybe_optimize_ubsan_null_ifn (struct sanopt_ctx *ctx, gimple stmt) /* We already have recorded a UBSAN_NULL check for this pointer. Perhaps we can drop this one. But only if this check doesn't specify stricter alignment. 
*/ - while (!v.is_empty ()) -{ - gimple g = v.last (); - /* Remove statements for BBs that have been already processed. */ - sanopt_info *si = (sanopt_info *) gimple_bb (g)-aux; - if (si-visited_p) - { - v.pop (); - continue; - } - /* At this point we shouldn't have any statements that aren't dominating - the current BB. */ - tree align = gimple_call_arg (g, 2); - int kind = tree_to_shwi (gimple_call_arg (g, 1)); - /* If this is a NULL pointer check where we had segv anyway, we can - remove it. */ - if (integer_zerop (align) - (kind == UBSAN_LOAD_OF - || kind == UBSAN_STORE_OF - || kind == UBSAN_MEMBER_ACCESS)) - remove = true; - /* Otherwise remove the check in non-recovering mode, or if the - stmts have same location. */ - else if (integer_zerop
Re: [PATCH] mn10300: Fix an ICE
On Tue, Nov 25, 2014 at 09:44:35AM -0700, Jeff Law wrote:
> On 11/24/14 20:37, Segher Boessenkool wrote:
>> `lcc' is not an insn but just a pattern. This caused a build error
>> in libgcc.
>
> A good example of a case that would have been caught if we get to a
> point where stuff in the insn chain are not RTX objects, but something
> else entirely.

Hey, it already did ICE, easy to catch. But you mean wouldn't even compile I guess :-)

Segher
Re: [PATCH] mn10300: Fix an ICE
On 11/25/14 10:14, Segher Boessenkool wrote:
> On Tue, Nov 25, 2014 at 09:44:35AM -0700, Jeff Law wrote:
>> On 11/24/14 20:37, Segher Boessenkool wrote:
>>> `lcc' is not an insn but just a pattern. This caused a build error
>>> in libgcc.
>>
>> A good example of a case that would have been caught if we get to a
>> point where stuff in the insn chain are not RTX objects, but
>> something else entirely.
>
> Hey, it already did ICE, easy to catch. But you mean wouldn't even
> compile I guess :-)

Exactly. This kind of problem is something I want to catch at compile time rather than at runtime.

jeff
Re: [PATCH] Fix a bug in bt-load.c
On Tue, Nov 25, 2014 at 09:41:40AM -0700, Jeff Law wrote:
> On 11/24/14 20:56, Segher Boessenkool wrote:
>> This caused ICEs on sh64.
>>
>> `min_cost' and `def' here are supposed to refer to the same element;
>> removing it from the heap before asking the heap for the key doesn't
>> work (and at the end of the loop here we will ask for min_key on an
>> empty heap, which then does gcc_unreachable).
>
> Did sh64 ICE during a build, or was it during testing or something
> else? Trying to figure out if we need a distinct test in the suite
> or not.

During libgcc build, pretty much all files. The libiberty fibheap code returns 0 for min_key on an empty heap; the new fibonacci_heap code ICEs.

This bt-load code will always fail if there is any work to do, so I don't think any other test is needed :-)

Segher
Re: [PATCH] Fix a bug in bt-load.c
On 11/25/14 10:26, Segher Boessenkool wrote:
> On Tue, Nov 25, 2014 at 09:41:40AM -0700, Jeff Law wrote:
>> On 11/24/14 20:56, Segher Boessenkool wrote:
>>> This caused ICEs on sh64.
>>>
>>> `min_cost' and `def' here are supposed to refer to the same
>>> element; removing it from the heap before asking the heap for the
>>> key doesn't work (and at the end of the loop here we will ask for
>>> min_key on an empty heap, which then does gcc_unreachable).
>>
>> Did sh64 ICE during a build, or was it during testing or something
>> else? Trying to figure out if we need a distinct test in the suite
>> or not.
>
> During libgcc build, pretty much all files. The libiberty fibheap
> code returns 0 for min_key on an empty heap; the new fibonacci_heap
> code ICEs.
>
> This bt-load code will always fail if there is any work to do, so I
> don't think any other test is needed :-)

Ok. Thanks.

jeff
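For readers following the bt-load.c thread above, the ordering problem can be sketched outside of GCC. This is only an analogy using std::priority_queue, not the fibonacci_heap class from the patch; MinHeap and extract_min_key are names invented for the example. The point is the same: the key must be read while the element is still in the heap, because asking for the minimum after removal either returns a different element or, on an empty heap, is an error.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// A min-heap of integer keys (illustrative stand-in for the real heap).
using MinHeap =
    std::priority_queue<int, std::vector<int>, std::greater<int>>;

// Hypothetical helper: removes the minimum element and reports its key.
// Returns false if the heap is empty. The key is fetched BEFORE pop();
// calling top() after pop() would look at the next element, and on an
// empty heap it would be undefined behaviour.
inline bool extract_min_key(MinHeap &heap, int &key) {
  if (heap.empty())
    return false;
  key = heap.top();  // read the key first ...
  heap.pop();        // ... then remove the element
  return true;
}
```

With keys 5, 2 and 9 pushed, successive calls yield 2, 5 and 9, and the fourth call reports an empty heap instead of reading from it.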
Re: [Patch] Improving jump-thread pass for PR 54742
Jeff Law wrote: On 11/24/14 21:55, Jeff Law wrote: On 11/24/14 18:09, Sebastian Pop wrote: Sebastian Pop wrote: I removed the return -1 and started a bootstrap on powerpc64-linux. Bootstrap passed on top of the 4 previous patches on powerpc64-linux. I will report the valgrind output. The output from valgrind looks closer to the output of master with no other patches: still 1M more instructions executed, and 300K more branches Just ran my suite where we get ~25k more branches, which definitely puts us in the noise. (that's with all 4 patches + fixing the return value ). I'm going to look at little closer at this stuff tomorrow, but I think we've resolved the performance issue. I'll dig deeper into the implementation tomorrow as well. I was running without your followup patches (must have used the wrong bits from my git stash), so those results are bogus, but in a good way. After fixing that goof, I'm seeing consistent improvements with your set of 4 patches and the fix for the wrong return code. Across the suite, ~140M fewer branches, not huge, but definitely not in the noise. Thanks for your testing. So, time to dig into the implementation :-) To ease the review, I squashed all the patches in a single one. I will bootstrap and regression test this patch on x86_64-linux and powerpc64-linux. I will also run it on our internal benchmarks, coremark, and the llvm test-suite. I will also include a longer testcase that makes sure we do not regress on coremark. Sebastian From db0f6817768920b497225484fab24a20e5ddf556 Mon Sep 17 00:00:00 2001 From: Sebastian Pop s@samsung.com Date: Fri, 26 Sep 2014 14:54:20 -0500 Subject: [PATCH] extend jump thread for finite state automata PR 54742 Adapted from a patch from James Greenhalgh. * params.def (max-fsm-thread-path-insns, max-fsm-thread-length, max-fsm-thread-paths): New. * doc/invoke.texi (max-fsm-thread-path-insns, max-fsm-thread-length, max-fsm-thread-paths): Documented. * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c: New. 
* tree-cfg.c (split_edge_bb_loc): Export. * tree-cfg.h (split_edge_bb_loc): Declared extern. * tree-ssa-threadedge.c (simplify_control_stmt_condition): Restore the original value of cond when simplification fails. (fsm_find_thread_path): New. (fsm_find_control_statement_thread_paths): New. (fsm_thread_through_normal_block): Call find_control_statement_thread_paths. * tree-ssa-threadupdate.c (dump_jump_thread_path): Pretty print EDGE_START_FSM_THREAD. (duplicate_seme_region): New. (thread_through_all_blocks): Generate code for EDGE_START_FSM_THREAD edges calling gimple_duplicate_sese_region. * tree-ssa-threadupdate.h (jump_thread_edge_type): Add EDGE_START_FSM_THREAD. --- gcc/doc/invoke.texi | 12 ++ gcc/params.def | 15 ++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c | 38 gcc/tree-cfg.c | 2 +- gcc/tree-cfg.h | 1 + gcc/tree-ssa-threadedge.c| 215 ++- gcc/tree-ssa-threadupdate.c | 198 - gcc/tree-ssa-threadupdate.h | 1 + 8 files changed, 479 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 89edddb..074183f 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -10624,6 +10624,18 @@ large and significantly increase compile time at optimization level @option{-O1} and higher. This parameter is a maximum nubmer of statements in a single generated constructor. Default value is 5000. +@item max-fsm-thread-path-insns +Maximum number of instructions to copy when duplicating blocks on a +finite state automaton jump thread path. The default is 100. + +@item max-fsm-thread-length +Maximum number of basic blocks on a finite state automaton jump thread +path. The default is 10. + +@item max-fsm-thread-paths +Maximum number of new jump thread paths to create for a finite state +automaton. The default is 50. 
+ @end table @end table diff --git a/gcc/params.def b/gcc/params.def index 9b21c07..edf3f53 100644 --- a/gcc/params.def +++ b/gcc/params.def @@ -1140,6 +1140,21 @@ DEFPARAM (PARAM_CHKP_MAX_CTOR_SIZE, Maximum number of statements to be included into a single static constructor generated by Pointer Bounds Checker, 5000, 100, 0) + +DEFPARAM (PARAM_MAX_FSM_THREAD_PATH_INSNS, + max-fsm-thread-path-insns, + Maximum number of instructions to copy when duplicating blocks on a finite state automaton jump thread path, + 100, 1, 99) + +DEFPARAM (PARAM_MAX_FSM_THREAD_LENGTH, + max-fsm-thread-length, + Maximum number of basic blocks on a finite state automaton jump thread path, + 10, 1, 99) + +DEFPARAM (PARAM_MAX_FSM_THREAD_PATHS, + max-fsm-thread-paths, + Maximum number of new
[PATCH, libgfortran]: Remove unused variable
Hello!

2014-11-25  Uros Bizjak  ubiz...@gmail.com

	* intrinsics/env.c (getenv): Remove unused variable res_len.

Bootstrapped on x86_64-linux-gnu.

Almost trivial, but ... OK for mainline?

Uros.

Index: intrinsics/env.c
===
--- intrinsics/env.c	(revision 218056)
+++ intrinsics/env.c	(working copy)
@@ -42,7 +42,6 @@ PREFIX(getenv) (char * name, char * value, gfc_cha
 {
   char *name_nt;
   char *res = NULL;
-  int res_len;
 
   if (name == NULL || value == NULL)
     runtime_error ("Both arguments to getenv are mandatory.");
Re: [PATCH, libgfortran]: Remove unused variable
On Tue, Nov 25, 2014 at 07:17:17PM +0100, Uros Bizjak wrote:
> 2014-11-25  Uros Bizjak  ubiz...@gmail.com
>
> 	* intrinsics/env.c (getenv): Remove unused variable res_len.
>
> Bootstrapped on x86_64-linux-gnu.
>
> Almost trivial, but ... OK for mainline?

Yes.

--
Steve
Re: [PATCH 2/5] combine: handle I2 a parallel of two SETs
On 11/14/14 12:19, Segher Boessenkool wrote:
> If I2 is a PARALLEL of two SETs, split it into two instructions, I1
> and I2. If there already was an I1, rename it to I0. If there
> already was an I0, don't do anything.
>
> This surprisingly simple patch is enough to let combine handle such
> PARALLELs properly.

It's clever.

> 2014-11-14  Segher Boessenkool  seg...@kernel.crashing.org
>
> gcc/
> 	* combine.c (try_combine): If I2 is a PARALLEL of two SETs,
> 	split it into two insns.

So you're virtually serializing the PARALLEL to make combine happy if I'm reading this correctly.

The first thing I worry about is preserving the semantics of a PARALLEL. Specifically that all the inputs are evaluated, then all the side effects happen. So I think one of the checks you need is that the destinations of the SETs are not used as source operands in the SETs.

The second thing I worry about is the handling of match_dup operands. But presumably all the resulting insns must match in one way or another or the whole thing gets reset to its prior state. So I suspect those are OK as well.

Related, I was worried about RTL structure sharing, but in the end I think those are a non-concern for the same basic reasons as match_dups aren't a real worry.

> ---
>  gcc/combine.c | 31 +++
>  1 file changed, 31 insertions(+)
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index f7797e7..c4d23e3 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -2780,6 +2780,37 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
>  	  SUBST_LINK (LOG_LINKS (i2), alloc_insn_link (i1, LOG_LINKS (i2)));
>  	}
>      }
> +
> +  /* If I2 is a PARALLEL of two SETs of REGs (and perhaps some CLOBBERs),
> +     make those two SETs separate I1 and I2 insns, and make an I0 that is
> +     the original I1.  */
> +  if (i0 == 0

Test for NULL.

> +      && GET_CODE (PATTERN (i2)) == PARALLEL
> +      && XVECLEN (PATTERN (i2), 0) >= 2
> +      && GET_CODE (XVECEXP (PATTERN (i2), 0, 0)) == SET
> +      && GET_CODE (XVECEXP (PATTERN (i2), 0, 1)) == SET
> +      && REG_P (SET_DEST (XVECEXP (PATTERN (i2), 0, 0)))
> +      && REG_P (SET_DEST (XVECEXP (PATTERN (i2), 0, 1)))
> +      && !reg_used_between_p (SET_DEST (XVECEXP (PATTERN (i2), 0, 0)), i2, i3)
> +      && !reg_used_between_p (SET_DEST (XVECEXP (PATTERN (i2), 0, 1)), i2, i3)
> +      && (XVECLEN (PATTERN (i2), 0) == 2
> +	  || GET_CODE (XVECEXP (PATTERN (i2), 0, 2)) == CLOBBER))

As noted above, I think you need to verify the set/clobbered operands do not conflict with any of the source operands. Otherwise you run the risk of changing the semantics when you rip apart the PARALLEL. Ah, just saw that Bernd made the same observation. Good.

And I think while convention has CLOBBERs at the end of insns, I don't think that's a hard requirement. So I think you need a stronger check for elements 2 and beyond in the vector.

OK with the direction this is going, but I think another iteration is going to be necessary.

Jeff
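The semantic concern raised in this review (all inputs of a PARALLEL are evaluated before any side effect happens) can be shown with a small model. This is a sketch in plain C++, not RTL: the Regs struct and both functions are hypothetical stand-ins for two registers and for a two-SET PARALLEL that swaps them. It shows why the destinations of the SETs must not appear as sources of the other SET if the PARALLEL is to be split into two sequential insns.

```cpp
#include <cassert>

// Two "registers" (illustrative model only).
struct Regs {
  int a;
  int b;
};

// PARALLEL-like semantics for { a = b; b = a }: every source is read
// first, then every destination is written.
inline Regs parallel_sets(Regs r) {
  const int src_for_a = r.b;
  const int src_for_b = r.a;
  r.a = src_for_a;
  r.b = src_for_b;
  return r;
}

// Naive serialization of the same two sets: the second set reads the
// value that the first set has already overwritten.
inline Regs serialized_sets(Regs r) {
  r.a = r.b;
  r.b = r.a;  // sees the new a, not the original one
  return r;
}
```

Starting from a = 1, b = 2, the parallel form yields a = 2, b = 1 while the serialized form yields a = 2, b = 2, which is the change in meaning the review asks the patch to rule out.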
Re: [PATCH 3/5] combine: add regno field to LOG_LINKS
On 11/14/14 12:19, Segher Boessenkool wrote:
> With this new field in place, we can have LOG_LINKS for insns that
> set more than one register and distribute them properly in
> distribute_links. This then allows many more PARALLELs to be
> combined.
>
> Also split off new functions can_combine_{def,use}_p from the
> create_log_links function.
>
> 2014-11-14  Segher Boessenkool  seg...@kernel.crashing.org
>
> gcc/
> 	* combine.c (struct insn_link): New field `regno'.
> 	(alloc_insn_link): New parameter `regno'. Use it.
> 	(find_single_use): Check the new field.
> 	(can_combine_def_p, can_combine_use_p): New functions. Split
> 	off from ...
> 	(create_log_links): ... here. Correct data type of `regno'.
> 	Adjust call to alloc_insn_link.
> 	(adjust_for_new_dest): Find regno, use it in call to
> 	alloc_insn_link.
> 	(try_combine): Adjust call to alloc_insn_link.
> 	(distribute_links): Check the new field.

Didn't you lose the check that avoids duplicated LOG_LINKs? Or is the claim that the check is no longer needed because there are no duplicates now that we include the register associated with the link?

> +
> +  rtx set = single_set (insn);
> +  gcc_assert (set);
> +
> +  rtx reg = SET_DEST (set);
> +
> +  while (GET_CODE (reg) == ZERO_EXTRACT
> +	 || GET_CODE (reg) == STRICT_LOW_PART
> +	 || GET_CODE (reg) == SUBREG)
> +    reg = XEXP (reg, 0);
> +  gcc_assert (REG_P (reg));

Can REG ever be a hard reg here? If so, then the SUBREG case needs to simplify the hard reg rather than just strip off the SUBREG.

Might be OK, depends on answers to questions above -- holding final approval pending those answers.

Jeff
Re: Document __builtin_*_overflow
Hi Jakub, On Wednesday 2014-11-12 14:13, Jakub Jelinek wrote: This patch mentions __builtin_*_overflow in gcc-5/changes.html. Ok for CVS? I've fallen a bit behind with GCC patches, sorry. What do you think about this follow-up patch on top of yours? Gerald Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v retrieving revision 1.41 diff -u -r1.41 changes.html --- changes.html23 Nov 2014 14:42:28 - 1.41 +++ changes.html25 Nov 2014 18:49:02 - @@ -157,14 +157,14 @@ These builtins have two integral arguments (which don't need to have the same type), the arguments are extended to infinite precision signed type, code+/code, code-/code or code*/code - is performed on those and the result is stored into some integer - variable pointed by the last argument. If the stored value is equal - to the infinite precision result, the built-in functions return + is performed on those, and the result is stored in an integer + variable pointed to by the last argument. If the stored value is + equal to the infinite precision result, the built-in functions return codefalse/code, otherwise codetrue/code. The type of the integer variable that will hold the result can be different from - the types of arguments. The following snippet demonstrates how - this can be used in computing the size for the codecalloc/code - function: + the types of the first two arguments. The following snippet + demonstrates how this can be used in computing the size for the + codecalloc/code function: blockquotepre void * calloc (size_t x, size_t y) @@ -177,8 +177,8 @@ return ret; } /pre/blockquote - On e.g. i?86 or x86-64 the above will result in codemul/code - instruction followed by jump on overflow. + On e.g. i?86 or x86-64 the above will result in a codemul/code + instruction followed by a jump on overflow. /li liThe option code-fextended-identifiers/code is now enabled by default for C++, and for C99 and later C versions. Various Gerald
[ping^4] [libgomp] make it possible to use OMP on both sides of a fork
Ping^4 for: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00519.html

On Tue, Nov 18, 2014 at 12:53 AM, Nathaniel Smith n...@pobox.com wrote:
> Hello,
>
> Ping for https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00519.html
>
> Patches posted early enough during Stage 1 and not yet fully reviewed
> may still get in early in Stage 3. Please make sure to ping them
> soon enough.
>
> This patch was initially posted before stage 1 opened... for 4.9. So
> hopefully that qualifies :-). It would be nice to get it in
> someday...
>
> -n
>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
Re: [patch] Restore bootstrap on powerpc*-apple-darwin*
2014-11-24 Rohit rohitarul...@freescale.com PR bootstrap/63703 * config/rs6000/darwin.h (REGISTER_NAMES): Update based on 32 newly added GCC hard register numbers for SPE high registers. IMO, it's obvious, and as you say, doesn't touch any other target. After further confirmations that it restores full bootstrap on powerpc-apple-darwin9, I’ve committed (r218058). I’ll backport to the 4.9 branch shortly. FX
Re: Document __builtin_*_overflow
On Tue, Nov 25, 2014 at 07:50:02PM +0100, Gerald Pfeifer wrote: Hi Jakub, On Wednesday 2014-11-12 14:13, Jakub Jelinek wrote: This patch mentions __builtin_*_overflow in gcc-5/changes.html. Ok for CVS? I've fallen a bit behind with GCC patches, sorry. What do you think about this follow-up patch on top of yours? LGTM, thanks. --- changes.html 23 Nov 2014 14:42:28 - 1.41 +++ changes.html 25 Nov 2014 18:49:02 - @@ -157,14 +157,14 @@ These builtins have two integral arguments (which don't need to have the same type), the arguments are extended to infinite precision signed type, code+/code, code-/code or code*/code - is performed on those and the result is stored into some integer - variable pointed by the last argument. If the stored value is equal - to the infinite precision result, the built-in functions return + is performed on those, and the result is stored in an integer + variable pointed to by the last argument. If the stored value is + equal to the infinite precision result, the built-in functions return codefalse/code, otherwise codetrue/code. The type of the integer variable that will hold the result can be different from - the types of arguments. The following snippet demonstrates how - this can be used in computing the size for the codecalloc/code - function: + the types of the first two arguments. The following snippet + demonstrates how this can be used in computing the size for the + codecalloc/code function: blockquotepre void * calloc (size_t x, size_t y) @@ -177,8 +177,8 @@ return ret; } /pre/blockquote - On e.g. i?86 or x86-64 the above will result in codemul/code - instruction followed by jump on overflow. + On e.g. i?86 or x86-64 the above will result in a codemul/code + instruction followed by a jump on overflow. /li liThe option code-fextended-identifiers/code is now enabled by default for C++, and for C99 and later C versions. Various Gerald Jakub
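To make the behaviour described in the changes.html text concrete, here is a small sketch of the multiplication variant. It assumes a compiler that provides __builtin_mul_overflow (GCC 5 or later, per the text above); the checked_size wrapper is a made-up helper name and is not the calloc snippet from the web page, which is elided in the diff. As the text says, the built-in returns false when the stored result equals the infinite-precision result and true otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: computes x * y for an allocation size.
// Returns true and stores the product in *out when it fits in size_t;
// returns false when the multiplication overflowed.
inline bool checked_size(std::size_t x, std::size_t y, std::size_t *out) {
  // The built-in returns true on overflow, so invert it here.
  return !__builtin_mul_overflow(x, y, out);
}
```

A calloc-style function could call this first and return a null pointer on failure instead of allocating a wrapped-around size.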
[PATCH, libobjc]: Remove ‘...’ is static but used in inline function ‘...’ which is not static
Hello! Recently, gcc bootstrap started to emit following warnings when building libobjc: libobjc/sendmsg.c:338:13: warning: ‘get_implementation’ is static but used in inline function ‘get_imp’ which is not static libobjc/sendmsg.c:335:15: warning: ‘sarray_get_safe’ is static but used in inline function ‘get_imp’ which is not static libobjc/sendmsg.c:143:21: warning: ‘__objc_word_forward’ is static but used in inline function ‘__objc_get_forward_imp’ which is not static libobjc/sendmsg.c:141:21: warning: ‘__objc_double_forward’ is static but used in inline function ‘__objc_get_forward_imp’ which is not static libobjc/sendmsg.c:139:21: warning: ‘__objc_block_forward’ is static but used in inline function ‘__objc_get_forward_imp’ which is not static 2014-11-25 Uros Bizjak ubiz...@gmail.com * sendmsg.c (get_imp): Declare as static inline. (__objc_get_forward_imp): Ditto. Bootstrapped on x86_64-linux-gnu. OK for mainline? Uros. Index: sendmsg.c === --- sendmsg.c (revision 218056) +++ sendmsg.c (working copy) @@ -105,7 +105,7 @@ id nil_method (id, SEL); /* Given a selector, return the proper forwarding implementation. */ -inline +static inline IMP __objc_get_forward_imp (id rcv, SEL sel) { @@ -320,7 +320,7 @@ return res; } -inline +static inline IMP get_imp (Class class, SEL sel) {
Re: [PATCH 4/5] combine: distribute_log_links for PARALLELs of SETs
On 11/14/14 12:19, Segher Boessenkool wrote:
> Now that LOG_LINKS are per regno, we can distribute them on PARALLELs
> just fine. Do so. This makes PARALLELs not lose their LOG_LINKS
> early when e.g. a trivial reg-reg move is combined, so that they can
> be used in more useful combinations as well.
>
> 2014-11-14  Segher Boessenkool  seg...@kernel.crashing.org
>
> gcc/
> 	* combine.c (distribute_links): Handle multiple SETs.

So the code in distribute_links implies that we're not going to see hard register SUBREGs, so ignore my concerns with the prior patch in this series WRT hard register SUBREGs.

This is OK once prereqs are approved. You might consider pushing the two LOG_LINKs related patches forward independently of the patch to rip apart the PARALLELs. Though I think that all of the patches are pretty close to being approved. Your call.

Jeff
Re: [patch c++]: Fix PR/53904
On 11/20/2014 02:48 PM, Kai Tietz wrote:
> this issue fixes a type-overflow issue caused by trying to cast a
> UHWI via tree_to_shwi. As soon as value gets larger than SHWI_MAX,
> we get an error for it. So we need to cast it via tree_to_uhwi, and
> then casting it to the signed variant.

The problem seems to be with zero-length arrays getting -1 from array_type_nelts. Let's use array_type_nelts_top instead so we don't ever see negative values.

Jason
[PATCH v3] gcc/c-family/c-cppbuiltin.c: Let buffer enough to print host wide integer value
The original length of 18 is not enough for printing a HOST_WIDE_INT; 20 is needed instead. Additional bytes are also needed for the related prefix and suffix, and a related check is added.

It passes the testsuite under Fedora 20 x86_64-unknown-linux-gnu.

2014-11-26  Chen Gang  gang.chen.5...@gmail.com

	* c-family/c-cppbuiltin.c (builtin_define_with_int_value): Let
	buffer enough to print host wide integer value.
---
 gcc/c-family/c-cppbuiltin.c | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index c571d1b..b1b96fb 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1366,14 +1366,28 @@ static void
 builtin_define_with_int_value (const char *macro, HOST_WIDE_INT value)
 {
   char *buf;
-  size_t mlen = strlen (macro);
-  size_t vlen = 18;
-  size_t extra = 2; /* space for = and NUL. */
-
-  buf = (char *) alloca (mlen + vlen + extra);
-  memcpy (buf, macro, mlen);
-  buf[mlen] = '=';
-  sprintf (buf + mlen + 1, HOST_WIDE_INT_PRINT_DEC, value);
+  size_t vlen = 20; /* maximize value length: -9223372036854775807 */
+  size_t extra = 6; /* space for =, NUL, (, ), and L L. */
+
+  gcc_assert (wi::fits_to_tree_p (value, char_type_node)
+	      || wi::fits_to_tree_p (value, short_integer_type_node)
+	      || wi::fits_to_tree_p (value, integer_type_node)
+	      || wi::fits_to_tree_p (value, long_integer_type_node)
+	      || wi::fits_to_tree_p (value, long_long_integer_type_node));
+
+  buf = (char *) alloca (strlen (macro) + vlen + extra);
+
+  sprintf (buf, "%s=%s" HOST_WIDE_INT_PRINT_DEC "%s%s",
+	   macro,
+	   value < 0 ? "(" : "",
+	   value,
+	   wi::fits_to_tree_p (value, char_type_node)
+	   || wi::fits_to_tree_p (value, short_integer_type_node)
+	   || wi::fits_to_tree_p (value, integer_type_node)
+	   ? ""
+	   : wi::fits_to_tree_p (value, long_integer_type_node)
+	     ? "L" : "LL",
+	   value < 0 ? ")" : "");
 
   cpp_define (parse_in, buf);
 }
--
1.9.3
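The buffer-size reasoning in this patch can be checked with a short sketch. It assumes a 64-bit long long as a stand-in for HOST_WIDE_INT (an assumption about the host, not something the patch states), and decimal_length is a made-up helper name. It uses snprintf with a null buffer and zero size, which only reports how many characters the formatted value would need, excluding the terminating NUL.

```cpp
#include <cassert>
#include <climits>
#include <cstdio>

// Hypothetical helper: number of characters needed to print v in
// decimal, not counting the terminating NUL. snprintf with a null
// buffer and size 0 writes nothing and returns the required length.
inline int decimal_length(long long v) {
  return std::snprintf(nullptr, 0, "%lld", v);
}
```

Under that 64-bit assumption the most negative value needs 20 characters (a sign plus 19 digits), which matches the new vlen of 20 and shows why 18 was too small.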
Re: [PATCH, libobjc]: Remove ‘...’ is static but used in inline function ‘...’ which is not static
On Tue, Nov 25, 2014 at 11:09 AM, Uros Bizjak ubiz...@gmail.com wrote: Hello! Recently, gcc bootstrap started to emit following warnings when building libobjc: libobjc/sendmsg.c:338:13: warning: ‘get_implementation’ is static but used in inline function ‘get_imp’ which is not static libobjc/sendmsg.c:335:15: warning: ‘sarray_get_safe’ is static but used in inline function ‘get_imp’ which is not static libobjc/sendmsg.c:143:21: warning: ‘__objc_word_forward’ is static but used in inline function ‘__objc_get_forward_imp’ which is not static libobjc/sendmsg.c:141:21: warning: ‘__objc_double_forward’ is static but used in inline function ‘__objc_get_forward_imp’ which is not static libobjc/sendmsg.c:139:21: warning: ‘__objc_block_forward’ is static but used in inline function ‘__objc_get_forward_imp’ which is not static This patch is incorrect as get_imp is exported from libobjc.so. See libobjc.def. I would rather use -std=gnu90 to compile these source files as you are changing the exported symbols. This also fixes bug 63863. Thanks, Andrew Pinski 2014-11-25 Uros Bizjak ubiz...@gmail.com * sendmsg.c (get_imp): Declare as static inline. (__objc_get_forward_imp): Ditto. Bootstrapped on x86_64-linux-gnu. OK for mainline? Uros. Index: sendmsg.c === --- sendmsg.c (revision 218056) +++ sendmsg.c (working copy) @@ -105,7 +105,7 @@ id nil_method (id, SEL); /* Given a selector, return the proper forwarding implementation. */ -inline +static inline IMP __objc_get_forward_imp (id rcv, SEL sel) { @@ -320,7 +320,7 @@ return res; } -inline +static inline IMP get_imp (Class class, SEL sel) {
Re: PATCH: PR rtl-optimization/64037: Miscompilation with -Os and enum class : char parameter
On Tue, Nov 25, 2014 at 7:04 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Nov 25, 2014 at 4:01 PM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Nov 25, 2014 at 1:57 PM, H.J. Lu hongjiu...@intel.com wrote: Hi, The enclosed testcase fails on x86 when compiled with -Os since we pass a byte parameter with a byte load in caller and read it as an int in callee. The reason it only shows up with -Os is x86 backend encodes a byte load with an int load if -O isn't used. When a byte load is used, the upper 24 bits of the register have random value for none WORD_REGISTER_OPERATIONS targets. It happens because setup_incoming_promotions in combine.c has /* The mode and signedness of the argument before any promotions happen (equal to the mode of the pseudo holding it at that stage). */ mode1 = TYPE_MODE (TREE_TYPE (arg)); uns1 = TYPE_UNSIGNED (TREE_TYPE (arg)); /* The mode and signedness of the argument after any source language and TARGET_PROMOTE_PROTOTYPES-driven promotions. */ mode2 = TYPE_MODE (DECL_ARG_TYPE (arg)); uns3 = TYPE_UNSIGNED (DECL_ARG_TYPE (arg)); /* The mode and signedness of the argument as it is actually passed, after any TARGET_PROMOTE_FUNCTION_ARGS-driven ABI promotions. */ mode3 = promote_function_mode (DECL_ARG_TYPE (arg), mode2, uns3, TREE_TYPE (cfun-decl), 0); while they are actually passed in register by assign_parm_setup_reg in function.c: /* Store the parm in a pseudoregister during the function, but we may need to do it in a wider mode. Using 2 here makes the result consistent with promote_decl_mode and thus expand_expr_real_1. */ promoted_nominal_mode = promote_function_mode (data-nominal_type, data-nominal_mode, unsignedp, TREE_TYPE (current_function_decl), 2); where nominal_type and nominal_mode are set up with TREE_TYPE (parm) and TYPE_MODE (nominal_type). TREE_TYPE here is I think the bug is here, not in combine.c. Can you try going back in history for both snippets and see if they matched at some point? 
Oh, and note that I think DECL_ARG_TYPE is sth dangerous - it's meant to be a source language ABI kind-of-thing. Or rather an optimization hint. For example in C when integral promotions happen to call arguments this can be used to optimize sign-/zero-extensions in the callee. Unless something else overrides this (like the target which specifies the real ABI). IIRC. PROMOTE_MODE is a performance hint, not an ABI requirement. i386.h has #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE) \ do {\ if (((MODE) == HImode && TARGET_PROMOTE_HI_REGS) \ || ((MODE) == QImode && TARGET_PROMOTE_QI_REGS)) \ (MODE) = SImode;\ } while (0) We may promote QI/HI to SI, depending on optimization. On the other hand, TARGET_PROMOTE_FUNCTION_MODE is determined by the psABI. I am enclosing the missing ChangeLog entries. -- H.J. --- gcc/ PR rtl-optimization/64037 * combine.c (setup_incoming_promotions): Pass the argument before any promotions happen to promote_function_mode. gcc/testsuite/ PR rtl-optimization/64037 * g++.dg/pr64037.C: New test.
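The caller/callee mismatch described in this thread can be modelled in a few lines of C. This is only an illustrative sketch, not GCC code: `reg32`, `caller_byte_load`, `callee_reads_int` and `callee_reads_byte` are hypothetical names, and the "stale" upper bits are simulated by hand rather than produced by a real byte load.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit argument register.  */
typedef uint32_t reg32;

/* Caller side: a byte load writes only the low 8 bits; the upper 24 bits
   keep whatever the register held before (modelled by `stale`).  */
static reg32 caller_byte_load(reg32 stale, uint8_t value)
{
    return (stale & 0xFFFFFF00u) | value;
}

/* Callee that wrongly assumes the caller already extended the value
   to a full int.  It sees the stale upper bits.  */
static uint32_t callee_reads_int(reg32 r)
{
    return r;
}

/* Callee that trusts only the low byte and zero-extends it itself.  */
static uint32_t callee_reads_byte(reg32 r)
{
    return (uint8_t)r;
}
```

With stale contents 0xDEADBE00 and an argument of 7, the int-reading callee sees 0xDEADBE07 while the byte-reading callee sees 7, which is the kind of disagreement the patch is meant to rule out.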
Re: [PATCH] gcc parallel make check
On Tue, Nov 25, 2014 at 03:27:40PM +0100, Tom de Vries wrote: This patch fixes that by ensuring that we print that unsupported message only once. The resulting test result comparison diff is: 2014-11-25 Tom de Vries t...@codesourcery.com * testsuite/libstdc++-prettyprinters/prettyprinters.exp: Add missing dg-finish. Only print unsupported message once. LGTM. --- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/prettyprinters.exp +++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/prettyprinters.exp @@ -30,7 +30,14 @@ if ![info exists ::env(GUALITY_GDB_NAME)] { } if {! [gdb_version_check]} { +dg-finish +# Only print unsupported message in one instance. +if ![gcc_parallel_test_run_p prettyprinters] { + return +} +gcc_parallel_test_enable 0 unsupported prettyprinters.exp +gcc_parallel_test_enable 1 return } -- 1.9.1 Jakub
patch to fix PR63527
The following patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63527 The patch was tested and bootstrapped on x86/x86-64. Committed as rev. 218509. Index: ira-lives.c === --- ira-lives.c (revision 218058) +++ ira-lives.c (working copy) @@ -1123,8 +1123,10 @@ process_bb_node_lives (ira_loop_tree_nod pessimistic, but it probably doesn't matter much in practice. */ FOR_BB_INSNS_REVERSE (bb, insn) { + int regno; + ira_allocno_t a; df_ref def, use; - bool call_p; + bool call_p, clear_pic_use_conflict_p; if (!NONDEBUG_INSN_P (insn)) continue; @@ -1134,6 +1136,21 @@ process_bb_node_lives (ira_loop_tree_nod INSN_UID (insn), loop_tree_node->parent->loop_num, curr_point); + call_p = CALL_P (insn); + clear_pic_use_conflict_p = false; + /* Processing insn usage in call insn can create conflict + with pic pseudo and pic hard reg and that is wrong. + Check this situation and fix it at the end of the insn + processing. */ + if (call_p && pic_offset_table_rtx != NULL_RTX + && (regno = REGNO (pic_offset_table_rtx)) >= FIRST_PSEUDO_REGISTER + && (a = ira_curr_regno_allocno_map[regno]) != NULL) + clear_pic_use_conflict_p + = (find_regno_fusage (insn, USE, REAL_PIC_OFFSET_TABLE_REGNUM) + && ! TEST_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS + (ALLOCNO_OBJECT (a, 0)), + REAL_PIC_OFFSET_TABLE_REGNUM)); + /* Mark each defined value as live. We need to do this for unused values because they still conflict with quantities that are live at the time of the definition. @@ -1143,7 +1160,6 @@ process_bb_node_lives (ira_loop_tree_nod on a call-clobbered register. Marking the register as live would stop us from allocating it to a call-crossing allocno. 
*/ - call_p = CALL_P (insn); FOR_EACH_INSN_DEF (def, insn) if (!call_p || !DF_REF_FLAGS_IS_SET (def, DF_REF_MAY_CLOBBER)) mark_ref_live (def); @@ -1207,7 +1223,7 @@ process_bb_node_lives (ira_loop_tree_nod EXECUTE_IF_SET_IN_SPARSESET (objects_live, i) { ira_object_t obj = ira_object_id_map[i]; - ira_allocno_t a = OBJECT_ALLOCNO (obj); + a = OBJECT_ALLOCNO (obj); int num = ALLOCNO_NUM (a); HARD_REG_SET this_call_used_reg_set; @@ -1257,7 +1273,7 @@ process_bb_node_lives (ira_loop_tree_nod make_early_clobber_and_input_conflicts (); curr_point++; - + /* Mark each used value as live. */ FOR_EACH_INSN_USE (use, insn) mark_ref_live (use); @@ -1286,6 +1302,17 @@ process_bb_node_lives (ira_loop_tree_nod } } + if (clear_pic_use_conflict_p) + { + regno = REGNO (pic_offset_table_rtx); + a = ira_curr_regno_allocno_map[regno]; + CLEAR_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (ALLOCNO_OBJECT (a, 0)), + REAL_PIC_OFFSET_TABLE_REGNUM); + CLEAR_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS + (ALLOCNO_OBJECT (a, 0)), + REAL_PIC_OFFSET_TABLE_REGNUM); + } + curr_point++; }
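The "check first, undo at the end" shape of this patch can be sketched on a plain bit set. The sketch below is an assumption-laden model, not the IRA code: `hard_reg_set` is a bare 32-bit mask, `PIC_HARD_REG` is an arbitrary register number, and `process_call` collapses all of the per-insn processing into one step that sets the conflict bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: one bit per hard register.  */
typedef uint32_t hard_reg_set;
#define PIC_HARD_REG 3u /* assumed register number, for illustration only */

static bool test_bit(hard_reg_set s, unsigned r) { return (s >> r) & 1u; }
static void set_bit(hard_reg_set *s, unsigned r) { *s |= 1u << r; }
static void clear_bit(hard_reg_set *s, unsigned r) { *s &= ~(1u << r); }

/* Process one call insn for the allocno that holds the PIC pseudo.
   `call_uses_pic_hard_reg` models the find_regno_fusage check.  */
static void process_call(hard_reg_set *pic_pseudo_conflicts,
                         bool call_uses_pic_hard_reg)
{
    /* Decide up front: only undo a conflict that this insn introduces.
       A conflict recorded earlier must survive.  */
    bool clear_p = call_uses_pic_hard_reg
                   && !test_bit(*pic_pseudo_conflicts, PIC_HARD_REG);

    /* Generic use processing: the used hard register becomes live and
       is recorded as conflicting with the live PIC pseudo.  */
    if (call_uses_pic_hard_reg)
        set_bit(pic_pseudo_conflicts, PIC_HARD_REG);

    /* End of insn processing: drop the spurious conflict again.  */
    if (clear_p)
        clear_bit(pic_pseudo_conflicts, PIC_HARD_REG);
}
```

Sampling the bit before the generic processing is what lets the fix distinguish a conflict created by the call's own use from one that was already there for another reason.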
Re: [PATCH] mn10300: Fix an ICE
On Tue, 2014-11-25 at 10:15 -0700, Jeff Law wrote: On 11/25/14 10:14, Segher Boessenkool wrote: On Tue, Nov 25, 2014 at 09:44:35AM -0700, Jeff Law wrote: On 11/24/14 20:37, Segher Boessenkool wrote: `lcc' is not an insn but just a pattern. This caused a build error in libgcc. A good example of a case that would have been caught if we get to a point where stuff in the insn chain are not RTX objects, but something else entirely. Hey, it already did ICE, easy to catch. But you mean wouldn't even compile I guess :-) Exactly. This kind of problem is something I want to catch at compile time rather than at runtime. Right. FWIW I have a set of patches that converts PATTERN() to requiring a const rtx_insn * rather than a const_rtx, but so far they only compile on x86_64. Extending them to cover all archs would have caught this at compile time, I guess, since lcc would have been just an rtx. Presumably something for the next stage1.
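The point about catching this at compile time can be illustrated with two distinct struct types. This is a deliberately simplified sketch: `struct rtx_def`, `struct rtx_insn` and `pattern_of` are hypothetical stand-ins, not the real GCC declarations or the PATTERN macro.

```c
/* Hypothetical, much simplified stand-ins for the two kinds of object.  */
struct rtx_def  { int code; };
struct rtx_insn { struct rtx_def *pattern; int uid; };

/* Accessor that only accepts an insn, in the spirit of requiring a
   const rtx_insn * instead of a generic const_rtx.  */
static const struct rtx_def *pattern_of(const struct rtx_insn *insn)
{
    return insn->pattern;
}

/* Given `struct rtx_def bare;`, the call `pattern_of(&bare)` passes a
   struct rtx_def * where a struct rtx_insn * is required.  The compiler
   diagnoses the incompatible pointer type, so the mistake is visible
   when building rather than as an ICE while compiling libgcc.  */
```

With a single generic pointer type for both kinds of object, the same mistake type-checks and only shows up when the code runs.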
Re: [patch, build] Restore bootstrap in building libcc1 on darwin
On Nov 23, 2014, at 4:06 PM, FX fxcoud...@gmail.com wrote: One question to build maintainers, and one patch submitted to top-level configure.ac So, not sure who wants to review this. From the darwin perspective, Ok.
Re: patch to fix PR63527
On Tue, Nov 25, 2014 at 12:22 PM, Vladimir Makarov vmaka...@redhat.com wrote: The following patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63527 The patch was tested and bootstrapped on x86/x86-64. Committed as rev. 218509. I checked in this testcase. Thanks. -- H.J. --- Index: ChangeLog === --- ChangeLog (revision 218060) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-11-25 H.J. Lu hongjiu...@intel.com + + PR target/63527 + * gcc.target/i386/pr63527.c: New test. + 2014-11-25 Martin Liska mli...@suse.cz PR bootstrap/64050 Index: gcc.target/i386/pr63527.c === --- gcc.target/i386/pr63527.c (revision 0) +++ gcc.target/i386/pr63527.c (working copy) @@ -0,0 +1,25 @@ +/* PR rtl-optimization/pr63527 */ +/* { dg-do compile { target { ia32 && fpic } } } */ +/* { dg-options "-O2 -fPIC" } */ + +struct cache_file +{ + char magic[sizeof "ld.so-1.7.0" - 1]; + unsigned int nlibs; +}; +typedef unsigned int size_t; +size_t cachesize __attribute__ ((visibility ("hidden"))); +struct cache_file *cache __attribute__ ((visibility ("hidden"))); +extern int __munmap (void *__addr, size_t __len); +void +_dl_unload_cache (void) +{ + if (cache != ((void *)0) && cache != (struct cache_file *) -1) +{ + __munmap (cache, cachesize); + cache = ((void *)0) ; +} +} + +/* We shouldn't load EBX again. */ +/* { dg-final { scan-assembler-not "movl\[ \t\]%\[^,\]+, %ebx" } } */
Re: [PATCH] Fix PR ipa/61190, updated
Index: gcc/ipa-pure-const.c === --- gcc/ipa-pure-const.c (revision 215888) +++ gcc/ipa-pure-const.c (working copy) @@ -744,6 +744,8 @@ analyze_function (struct cgraph_node *fn, bool ipa { /* Thunk gets propagated through, so nothing interesting happens. */ gcc_assert (ipa); + if (fn->thunk.virtual_offset_p) + l->pure_const_state = IPA_NEITHER; return l; } Hmm, I looked again at the above if statement, and I think now it should better be if (fn->thunk.thunk_p && fn->thunk.virtual_offset_p), because thunk.virtual_offset_p is probably not well defined if we come here because of fn->alias == true. Yes, that is right. I plan to put the other thunk info off the structure anyway. This makes the lattice to be initialized correctly, but you also need the function_symbol calls that will skip thunks replaced by something like function_or_non_virtual_thunk_symbol. Oh, I see what you mean, thanks. I created a new method function_or_virtual_thunk_symbol() for this. And simplified the algorithm of both function_symbol variants a bit. Attached, you'll find my updated patch for review. Boot-strapped and regression tested on x86_64-linux-gnu. OK for trunk? Thanks Bernd. Can you, please, send the updated patch? Sorry for late review, Honza 2014-11-25 Bernd Edlinger bernd.edlin...@hotmail.de PR ipa/61190 * cgraph.h (symtab_node::call_for_symbol_and_aliases): Fix comment. (cgraph_node::function_or_virtual_thunk_symbol): New function. (cgraph_node::call_for_symbol_and_aliases): Fix comment. (cgraph_node::call_for_symbol_thunks_and_aliases): Adjust comment. Add new optional parameter exclude_virtual_thunks. * cgraph.c (cgraph_node::call_for_symbol_thunks_and_aliases): Add new optional parameter exclude_virtual_thunks. (cgraph_node::set_const_flag): Don't propagate to virtual thunks. (cgraph_node::set_pure_flag): Likewise. (cgraph_node::function_symbol): Simplified. (cgraph_node::function_or_virtual_thunk_symbol): New function. 
* ipa-pure-const.c (analyze_function): For virtual thunks set pure_const_state to IPA_NEITHER. (propagate_pure_const): Use function_or_virtual_thunk_symbol. OK, Honza testsuite/ChangeLog: 2014-11-25 Bernd Edlinger bernd.edlin...@hotmail.de PR ipa/61190 * g++.old-deja/g++.mike/p4736b.C: Use -O2.
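The difference between the two symbol walks discussed above can be sketched as follows. The `struct node` layout and both function bodies are simplified assumptions for illustration; the real cgraph_node methods also deal with availability and other details that are left out here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified call-graph node.  */
struct node {
    bool alias;            /* node is an alias of `target` */
    bool thunk_p;          /* node is a thunk that calls `target` */
    bool virtual_offset_p; /* only meaningful when thunk_p is set */
    struct node *target;
};

/* Walk through aliases and all thunks down to the real function.  */
static struct node *function_symbol(struct node *n)
{
    while (n->target && (n->alias || n->thunk_p))
        n = n->target;
    return n;
}

/* Walk through aliases and non-virtual thunks, but stop at a virtual
   thunk so that it keeps its own (IPA_NEITHER) state.  */
static struct node *function_or_virtual_thunk_symbol(struct node *n)
{
    while (n->target
           && (n->alias || (n->thunk_p && !n->virtual_offset_p)))
        n = n->target;
    return n;
}
```

Note that the second walk tests `thunk_p` before looking at `virtual_offset_p`, matching the remark in the thread that the flag is probably not well defined for a plain alias.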
Re: [patch, build] Restore bootstrap in building libcc1 on darwin
On 25/11/14 20:37, Mike Stump wrote: On Nov 23, 2014, at 4:06 PM, FX fxcoud...@gmail.com wrote: One question to build maintainers, and one patch submitted to top-level configure.ac So, not sure who wants to review this. From the darwin perspective, Ok. I mean from my limited viewpoint it looks fine. As long as the .so is built, that's really our only goal from a GDB point of view. But I am not a maintainer, so I have refrained from commenting on this change, as it seems fairly straightforward. Though I am no expert on GCC build systems. Cheers Phil
Re: [PATCH] pr61324 pr 63649 - fix crash in ipa_comdats
From: Trevor Saunders tsaund...@mozilla.com Hi, the interesting symbol in the test case for pr61324 is __GLOBAL__sub_I_s. It refers to nothing, and is called by nothing, however it is kept (I believe because of -fkeep-inline-functions). That means ipa_comdats never tries to put Aha, that explains why it is around. it in a comdat, and so it never ends up in the hash table. It seems like the simplest solution is to just check if symbol is not in the map before trying to get the comdat it should go in, but another approach might be to keep separate hash maps for comdat functions and functions that can't be in any comdat, and then iterate over only the functions that belong in a comdat. Well, -fkeep-inline-functions promises you that you can call any inline function from the debugger. I suppose in this case you also want to be able to call static functions. Comdat pass may bundle the function into a comdat that is later optimized away by the linker, so I would say we just want to disable the whole comdat pass when -fkeep-inline-functions is used? Patch for that is preapproved. Honza bootstrapped + regtested x86_64-unknown-linux-gnu, ok? Trev gcc/ * ipa-comdats.c (ipa_comdats): Check if map contains symbol before trying to put symbol in a comdat. 
diff --git a/gcc/ipa-comdats.c b/gcc/ipa-comdats.c index af2aef8..8695a7e 100644 --- a/gcc/ipa-comdats.c +++ b/gcc/ipa-comdats.c @@ -327,18 +327,18 @@ ipa_comdats (void) && !symbol->alias && symbol->real_symbol_p ()) { - tree group = *map.get (symbol); + tree *group = map.get (symbol); - if (group == error_mark_node) + if (!group || *group == error_mark_node) continue; if (dump_file) { fprintf (dump_file, "Localizing symbol\n"); symbol->dump (dump_file); - fprintf (dump_file, "To group: %s\n", IDENTIFIER_POINTER (group)); + fprintf (dump_file, "To group: %s\n", IDENTIFIER_POINTER (*group)); } symbol->call_for_symbol_and_aliases (set_comdat_group, - *comdat_head_map.get (group), + *comdat_head_map.get (*group), true); } } diff --git a/gcc/testsuite/g++.dg/pr61324.C b/gcc/testsuite/g++.dg/pr61324.C new file mode 100644 index 000..6102574 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr61324.C @@ -0,0 +1,13 @@ +// { dg-do compile } +// { dg-options "-O -fkeep-inline-functions -fno-use-cxa-atexit" } +void foo (); + +struct S +{ + ~S () + { +foo (); + } +}; + +S s; -- 2.1.3
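The crash and its fix come down to a lookup that returns a pointer which may be null. A minimal sketch of that pattern, assuming a toy linear-search table in place of the real hash map (`struct entry`, `map_get`, `error_mark` and `group_for` are all hypothetical names):

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the symbol -> comdat group map.  */
struct entry { const char *symbol; const char *group; };

/* Sentinel playing the role of error_mark_node.  */
static const char *const error_mark = "<error>";

/* Returns a pointer to the stored group, or NULL when the symbol was
   never entered, which is the case the original code did not expect.  */
static const char **map_get(struct entry *map, size_t n, const char *symbol)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(map[i].symbol, symbol) == 0)
            return &map[i].group;
    return NULL;
}

/* Check the pointer before dereferencing it, then check the sentinel.
   Returns NULL when the symbol should be skipped.  */
static const char *group_for(struct entry *map, size_t n, const char *symbol)
{
    const char **group = map_get(map, n, symbol);
    if (!group || *group == error_mark)
        return NULL;
    return *group;
}
```

Dereferencing the lookup result directly, as the pre-patch code did, is only safe when every symbol that reaches the loop is guaranteed to be in the map.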
Re: patch to fix PR63527
On Tue, Nov 25, 2014 at 12:54 PM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Nov 25, 2014 at 12:22 PM, Vladimir Makarov vmaka...@redhat.com wrote: The following patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63527 The patch was tested and bootstrapped on x86/x86-64. Committed as rev. 218509. I checked in this testcase. Thanks. -- H.J. --- Index: ChangeLog === --- ChangeLog (revision 218060) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-11-25 H.J. Lu hongjiu...@intel.com + + PR target/63527 + * gcc.target/i386/pr63527.c: New test. + 2014-11-25 Martin Liska mli...@suse.cz Added another testcase from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63534 -- H.J. --- Index: ChangeLog === --- ChangeLog (revision 218061) +++ ChangeLog (working copy) @@ -1,5 +1,10 @@ 2014-11-25 H.J. Lu hongjiu...@intel.com + PR target/63534 + * gcc.target/i386/pr63534.c: New test. + +2014-11-25 H.J. Lu hongjiu...@intel.com + PR target/63527 * gcc.target/i386/pr63527.c: New test. Index: gcc.target/i386/pr63534.c === --- gcc.target/i386/pr63534.c (revision 0) +++ gcc.target/i386/pr63534.c (working copy) @@ -0,0 +1,15 @@ +/* PR target/pr63534 */ +/* { dg-do compile { target { ia32 && fpic } } } */ +/* { dg-options "-O2 -fPIC" } */ + +extern void bar (void); + +void +foo (void) +{ + bar (); + bar (); +} + +/* We shouldn't load EBX again. */ +/* { dg-final { scan-assembler-not "movl\[ \t\]%\[^,\]+, %ebx" } } */
Re: [Patch] Improving jump-thread pass for PR 54742
Sebastian Pop wrote: I will bootstrap and regression test this patch on x86_64-linux and powerpc64-linux. I will also run it on our internal benchmarks, coremark, and the llvm test-suite. I will also include a longer testcase that makes sure we do not regress on coremark. Done all the above. Attached is the new patch with a new testcase. I have also added verify_seme inspired by the recent patch adding verify_sese. Sebastian From ca222d5222fb976c7aa258d3e3c04e593f42f7a2 Mon Sep 17 00:00:00 2001 From: Sebastian Pop s@samsung.com Date: Fri, 26 Sep 2014 14:54:20 -0500 Subject: [PATCH] extend jump thread for finite state automata PR 54742 Adapted from a patch from James Greenhalgh. * params.def (max-fsm-thread-path-insns, max-fsm-thread-length, max-fsm-thread-paths): New. * doc/invoke.texi (max-fsm-thread-path-insns, max-fsm-thread-length, max-fsm-thread-paths): Documented. * tree-cfg.c (split_edge_bb_loc): Export. * tree-cfg.h (split_edge_bb_loc): Declared extern. * tree-ssa-threadedge.c (simplify_control_stmt_condition): Restore the original value of cond when simplification fails. (fsm_find_thread_path): New. (fsm_find_control_statement_thread_paths): New. (fsm_thread_through_normal_block): Call find_control_statement_thread_paths. * tree-ssa-threadupdate.c (dump_jump_thread_path): Pretty print EDGE_START_FSM_THREAD. (verify_seme): New. (duplicate_seme_region): New. (thread_through_all_blocks): Generate code for EDGE_START_FSM_THREAD edges calling gimple_duplicate_sese_region. * tree-ssa-threadupdate.h (jump_thread_edge_type): Add EDGE_START_FSM_THREAD. * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c: New. * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c: New. 
--- gcc/doc/invoke.texi | 12 ++ gcc/params.def | 15 ++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c | 43 + gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 127 + gcc/tree-cfg.c |2 +- gcc/tree-cfg.h |1 + gcc/tree-ssa-threadedge.c| 215 +- gcc/tree-ssa-threadupdate.c | 201 +++- gcc/tree-ssa-threadupdate.h |1 + 9 files changed, 614 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 89edddb..074183f 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -10624,6 +10624,18 @@ large and significantly increase compile time at optimization level @option{-O1} and higher. This parameter is a maximum nubmer of statements in a single generated constructor. Default value is 5000. +@item max-fsm-thread-path-insns +Maximum number of instructions to copy when duplicating blocks on a +finite state automaton jump thread path. The default is 100. + +@item max-fsm-thread-length +Maximum number of basic blocks on a finite state automaton jump thread +path. The default is 10. + +@item max-fsm-thread-paths +Maximum number of new jump thread paths to create for a finite state +automaton. The default is 50. 
+ @end table @end table diff --git a/gcc/params.def b/gcc/params.def index 9b21c07..edf3f53 100644 --- a/gcc/params.def +++ b/gcc/params.def @@ -1140,6 +1140,21 @@ DEFPARAM (PARAM_CHKP_MAX_CTOR_SIZE, "Maximum number of statements to be included into a single static constructor generated by Pointer Bounds Checker", 5000, 100, 0) + +DEFPARAM (PARAM_MAX_FSM_THREAD_PATH_INSNS, + "max-fsm-thread-path-insns", + "Maximum number of instructions to copy when duplicating blocks on a finite state automaton jump thread path", + 100, 1, 99) + +DEFPARAM (PARAM_MAX_FSM_THREAD_LENGTH, + "max-fsm-thread-length", + "Maximum number of basic blocks on a finite state automaton jump thread path", + 10, 1, 99) + +DEFPARAM (PARAM_MAX_FSM_THREAD_PATHS, + "max-fsm-thread-paths", + "Maximum number of new jump thread paths to create for a finite state automaton", + 50, 1, 99) /* Local variables: diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c new file mode 100644 index 000..bb34a74 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c @@ -0,0 +1,43 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-dom1-details" } */ +/* { dg-final { scan-tree-dump-times "FSM" 6 "dom1" } } */ +/* { dg-final { cleanup-tree-dump "dom1" } } */ + +int sum0, sum1, sum2, sum3; +int foo (char *s, char **ret) +{ + int state=0; + char c; + + for (; *s && state != 4; s++) +{ + c = *s; + if (c == '*') + { + s++; + break; + } + switch (state) + { + case 0: + if (c == '+') + state = 1; + else if (c != '-') + sum0+=c; + break; + case 1: + if (c == '+') +
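What jump threading buys for a finite state automaton like the one in this testcase can be shown by writing the threaded control flow by hand. The two functions below are an illustrative sketch with made-up names (`count_plain`, `count_threaded`) and a two-state machine; they show the idea behind the transformation, not the code the pass actually emits.

```c
/* Plain form: every iteration goes back through the switch on `state`,
   even when the previous iteration just assigned it a constant.  */
static int count_plain(const char *s)
{
    int state = 0, sum = 0;
    for (; *s; s++) {
        switch (state) {
        case 0:
            if (*s == '+')
                state = 1; /* the next switch target is already known */
            else
                sum += 1;
            break;
        case 1:
            if (*s == '+')
                state = 0;
            else
                sum += 10;
            break;
        }
    }
    return sum;
}

/* Hand-threaded form: each assignment of a constant to `state` becomes
   a direct jump to that state's code, so the switch disappears.  */
static int count_threaded(const char *s)
{
    int sum = 0;
state0:
    if (!*s)
        return sum;
    if (*s++ == '+')
        goto state1;
    sum += 1;
    goto state0;
state1:
    if (!*s)
        return sum;
    if (*s++ == '+')
        goto state0;
    sum += 10;
    goto state1;
}
```

Both functions compute the same result for any input string. Threading duplicates the blocks on each path, which is the code growth the new max-fsm-thread-* parameters are there to bound.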
Re: [Patch, Fortran] convert almost all {warning,error}_now to common diagnostic
(a) those majority which might need buffering (gfc_error, gfc_warning); Is there a plan for those in the longer term? Bootstrapped and regtested on x86-64-gnu-linux. OK for the trunk? OK
Re: [PATCH 3/5] combine: add regno field to LOG_LINKS
On Tue, Nov 25, 2014 at 11:46:52AM -0700, Jeff Law wrote: On 11/14/14 12:19, Segher Boessenkool wrote: With this new field in place, we can have LOG_LINKS for insns that set more than one register and distribute them properly in distribute_links. This then allows many more PARALLELs to be combined. Also split off new functions can_combine_{def,use}_p from the create_log_links function. 2014-11-14 Segher Boessenkool seg...@kernel.crashing.org gcc/ * combine.c (struct insn_link): New field `regno'. (alloc_insn_link): New parameter `regno'. Use it. (find_single_use): Check the new field. (can_combine_def_p, can_combine_use_p): New functions. Split off from ... (create_log_links): ... here. Correct data type of `regno'. Adjust call to alloc_insn_link. (adjust_for_new_dest): Find regno, use it in call to alloc_insn_link. (try_combine): Adjust call to alloc_insn_link. (distribute_links): Check the new field. Didn't you lose the check that avoids duplicated LOG_LINKs? I don't think so; if I did, that's a bug. Or is the claim that the check is no longer needed because there are no duplicates now that we include the register associated with the link? Are you talking about create_log_links? There can be no duplicates there (anymore), that would be multiple defs of the same reg in the same insn, indeed. I did check all the places that look at links, and adjusted everything that needed adjusting. Could have missed something of course... Segher
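The duplicate question can be made concrete with a small linked list in which a link is identified by the pair (insn, regno). This is a sketch under that assumption: the struct below only loosely mirrors combine.c's insn_link, `insn_uid` stands in for the insn pointer, and `add_link` is a hypothetical helper, not a function from the patch.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Simplified link: which insn set the register, and which register.  */
struct insn_link {
    int insn_uid;           /* stand-in for the rtx_insn pointer */
    unsigned regno;         /* the new field added by the patch */
    struct insn_link *next;
};

/* Prepend a link unless an identical (insn, regno) pair is already on
   the list.  Returns true when a new link was added.  */
static bool add_link(struct insn_link **head, int insn_uid, unsigned regno)
{
    for (struct insn_link *l = *head; l; l = l->next)
        if (l->insn_uid == insn_uid && l->regno == regno)
            return false;

    struct insn_link *l = malloc(sizeof *l);
    if (!l)
        return false;
    l->insn_uid = insn_uid;
    l->regno = regno;
    l->next = *head;
    *head = l;
    return true;
}
```

With the register in the key, one insn that sets two registers gets two distinct links, which is what an insn-only key cannot express.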