Re: [wwwdocs, patch] gcc-4.9/changes.html: Add quip about #pragma GCC ivdep and update Fortran section
Gerald Pfeifer wrote: On Thu, 24 Oct 2013, Tobias Burnus wrote: Any comments? Or is the patch OK? thanks for doing this. Thanks for looking at the patch. However, the patch has a link problem. The documentation is at http://gcc.gnu.org/onlinedocs/gcc/Loop_002dSpecific-Pragmas.html That's also the link I use in the changes.html file. However, some script changes the link to: http://gcc.gnu.org/onlinedocs/gcc/Loop-Specific-Pragmas.html which won't work. Try yourself at http://gcc.gnu.org/gcc-4.9/changes.html Actually, a similar issue was reported at http://gcc.gnu.org/ml/gcc-help/2013-10/msg00132.html Does anyone know which script modifies those links? Tobias
Re: Aliasing: look through pointer's def stmt
On Thu, 24 Oct 2013, Jeff Law wrote: On 10/24/13 23:23, Marc Glisse wrote: Hello, I noticed that in some cases we were failing to find aliasing information because we were only looking at an SSA_NAME variable, missing the fact that it was really an ADDR_EXPR. The attached patch passes bootstrap+testsuite, does it make sense? (I am a bit afraid of losing some type information for instance) I didn't investigate the 2 tests where I had to remove dg-bogus, because removing dg-bogus sounds like a bonus... 2013-10-25 Marc Glisse marc.gli...@inria.fr gcc/ * tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Look for an ADDR_EXPR in the defining statement. Shouldn't the ADDR_EXPR have been propagated into the use? Maybe when the address is a constant, but here it comes from malloc. -- Marc Glisse
Re: [RFC] Fix context-sensitiveness of peephole2 pass
On Thu, Oct 24, 2013 at 11:14:42PM -0600, Jeff Law wrote: On 10/24/13 15:10, Eric Botcazou wrote: As discovered by Richard B. under PR rtl-optimization/58831, the peephole2 pass has been context-sensitive for a long time when scratch registers are needed, in the sense that the behaviour of the pass for a given function is dependent on what happened for the previously optimized function. Obvious patch attached, tested on x86_64-suse-linux. Do we want to apply it on mainline only or on all the active branches? 2013-10-24 Eric Botcazou ebotca...@adacore.com * recog.c (search_ofs): New static variable moved from... (peep2_find_free_register): ...here. (peephole2_optimize): Initialize it. OK for trunk. Release branch owners have the final call for their branches. This is ok for the release branches too. Jakub
Re: [Patch, C++] Add C++ FE support for #pragma ivdep
Jason Merrill wrote: On 10/10/2013 04:46 AM, Tobias Burnus wrote: I considered to add the annotation also to C++11's range-based loops, but as those are unlikely to vectorize, I didn't do so. I would think that a range-based loop over an array should vectorize nicely: int ar[8]; for (int i: ar) { ... } They do - but during my experiments they either also did without #pragma GCC ivdep – typically with static arrays like yours – or they didn't with vector. However, I came now up with an example where #pragma GCC ivdep should have an effect: int ar[100]; void foo(int *a) { for (auto i : ar) { i *= *a; } } Therefore, I will send a follow up patch. Tobias
RE: [PATCH, PR 57748] Check for out of bounds access, Part 2
Hi Richard, Did you just propose: --- stor-layout.c.orig 2013-10-22 10:46:49.233261818 +0200 +++ stor-layout.c 2013-10-24 14:54:00.425259062 +0200 @@ -471,27 +471,7 @@ static enum machine_mode mode_for_array (tree elem_type, tree size) { - tree elem_size; - unsigned HOST_WIDE_INT int_size, int_elem_size; - bool limit_p; - - /* One-element arrays get the component type's mode. */ - elem_size = TYPE_SIZE (elem_type); - if (simple_cst_equal (size, elem_size)) - return TYPE_MODE (elem_type); - - limit_p = true; - if (host_integerp (size, 1) host_integerp (elem_size, 1)) - { - int_size = tree_low_cst (size, 1); - int_elem_size = tree_low_cst (elem_size, 1); - if (int_elem_size 0 - int_size % int_elem_size == 0 - targetm.array_mode_supported_p (TYPE_MODE (elem_type), - int_size / int_elem_size)) - limit_p = false; - } - return mode_for_size_tree (size, MODE_INT, limit_p); + return BLKmode; } ??? Yes. Does it work? Richard. No. PASS: gcc.target/i386/pr14289-1.c (test for errors, line 10) FAIL: gcc.target/i386/pr14289-1.c (test for excess errors) Excess errors: /home/ed/gnu/gcc-4.9-20131013/gcc/testsuite/gcc.target/i386/pr14289-1.c:5:14: error: data type of 'a' isn't suitable for a register Does that look like an ABI-Change? the test case uses: register int a[2] asm(ebx); Bernd.
Re: [Patch, C++] Add C++ FE support for #pragma ivdep
On Fri, Oct 25, 2013 at 08:22:22AM +0200, Tobias Burnus wrote: Jason Merrill wrote: On 10/10/2013 04:46 AM, Tobias Burnus wrote: I considered to add the annotation also to C++11's range-based loops, but as those are unlikely to vectorize, I didn't do so. I would think that a range-based loop over an array should vectorize nicely: int ar[8]; for (int i: ar) { ... } They do - but during my experiments they either also did without #pragma GCC ivdep – typically with static arrays like yours – or they didn't with vector. However, I came now up with an example where #pragma GCC ivdep should have an effect: int ar[100]; void foo(int *a) { for (auto i : ar) { i *= *a; } } Therefore, I will send a follow up patch. What about #pragma GCC ivdep while (cond) { ... } and #pragma GCC ivdep do { ... } while (cond); (for both C and C++)? Jakub
Today's MPX trunk patches (Rev. 204046): i386.md:18329:2: error: 'TARGET_MPX' was not declared in this scope
Hi Kirill, with the current trunk (newst git mirror version), bootstrapping fails here with the following error. Did you forget to commit one file? g++ -c -g -DIN_GCC-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -I. -Ibuild -I../../gcc -I../../gcc/build -I../../gcc/../include -I../../gcc/../libcpp/include -I../../gcc/../libdecnumber -I../../gcc/../libdecnumber/bid -I../libdecnumber -I../../gcc/../libbacktrace -DCLOOG_INT_GMP\ -o build/gencondmd.o build/gencondmd.c ../../gcc/config/i386/i386.md:18329:2: error: 'TARGET_MPX' was not declared in this scope TARGET_MPX ^ ../../gcc/config/i386/i386.md:18329:2: error: 'TARGET_MPX' was not declared in this scope TARGET_MPX ^ ../../gcc/config/i386/i386.md:18329:2: error: 'TARGET_MPX' was not declared in this scope TARGET_MPX ^ ../../gcc/config/i386/i386.md:18329:2: error: 'TARGET_MPX' was not declared in this scope TARGET_MPX ^ Tobias
Re: Today's MPX trunk patches (Rev. 204046): i386.md:18329:2: error: 'TARGET_MPX' was not declared in this scope
Adding Ilya. On 25 October 2013 10:48, Tobias Burnus bur...@net-b.de wrote: Hi Kirill, with the current trunk (newst git mirror version), bootstrapping fails here with the following error. Did you forget to commit one file? g++ -c -g -DIN_GCC-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -I. -Ibuild -I../../gcc -I../../gcc/build -I../../gcc/../include -I../../gcc/../libcpp/include -I../../gcc/../libdecnumber -I../../gcc/../libdecnumber/bid -I../libdecnumber -I../../gcc/../libbacktrace -DCLOOG_INT_GMP\ -o build/gencondmd.o build/gencondmd.c ../../gcc/config/i386/i386.md:18329:2: error: 'TARGET_MPX' was not declared in this scope TARGET_MPX ^ ../../gcc/config/i386/i386.md:18329:2: error: 'TARGET_MPX' was not declared in this scope TARGET_MPX ^ ../../gcc/config/i386/i386.md:18329:2: error: 'TARGET_MPX' was not declared in this scope TARGET_MPX ^ ../../gcc/config/i386/i386.md:18329:2: error: 'TARGET_MPX' was not declared in this scope TARGET_MPX ^ Tobias -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation.
[PATCH, i386]: Unbreak bootstrap
Hello! 2013-10-25 Uros Bizjak ubiz...@gmail.com * config/i386/i386.h (TARGET_MPX): New define. (TARGET_MPX_P): Ditto. Tested on x86_64-pc-linux-gnu, committed. Uros. Index: i386.h === --- i386.h (revision 204047) +++ i386.h (working copy) @@ -128,6 +128,8 @@ #define TARGET_XSAVE_P(x) TARGET_ISA_XSAVE_P(x) #define TARGET_XSAVEOPTTARGET_ISA_XSAVEOPT #define TARGET_XSAVEOPT_P(x) TARGET_ISA_XSAVEOPT_P(x) +#define TARGET_MPX TARGET_ISA_MPX +#define TARGET_MPX_P(x)TARGET_ISA_MPX_P(x) #define TARGET_LP64TARGET_ABI_64 #define TARGET_LP64_P(x) TARGET_ABI_64_P(x)
Re: wwwdocs / gomp: Update projects/gomp/index.html
On Fri, Oct 25, 2013 at 03:11:40AM +0200, Gerald Pfeifer wrote: Hi Tobias, On Thu, 24 Oct 2013, Tobias Burnus wrote: the attached patch updates the gomp project patch. (Besides adding an item for the OpenMP 4.0 merge, I removed the 95 from Fortran 95 as Fortran is backward compatible and gfortran is effectively a Fortran 66/77/90/95/2003/2008 compiler [minus bugs and some missing features].) OK? looks good to me, though please give Jakub a bit to chime in in case. One thing to consider is replacing SVN mainline by mainline GCC; that way it's easier to understand and we avoid referring to our revision control system of the day (well, decade ;-). LGTM. Jakub
Re: lto-plugin: mismatch between ld's architecture and GCC's configure --host
Hi! Ping. To sum it up, with these patches applied, there are no changes for a regular build (not using the new configure options). On the other hand, configuring GCC as described, it is possible use the 32-bit x86 linker for/with a x86_64 build, and get the very same GCC test results as when using a x86_64 linker. On Mon, 14 Oct 2013 12:28:11 +0200, I wrote: On Sat, 12 Oct 2013 12:20:19 +0200, I wrote: This is a bit of a weird scenario -- but it is supposed to work fine in my opinion (but doesn't). I have a GNU toolchain as 32-bit x86 GNU/Linux executables, configured to to generate code for 32-bit x86 by default, and using -m64 for x86_64. This toolchain I'm using on a x86_64 system (which can execute 32-bit executables) to build a *native* GCC, that is I'm using the 32-bit toolchain to build a x86_64 GCC (configuring with CC='gcc -m64' CXX='g++ -m64'). I intend to continue using the 32-bit toolchain's linker, which also is a 32-bit executable (GNU ld). That one also defaults to x86 code, but can handle the x86_64 case fine if passed -m elf_x86_64, which GCC does. That the linker is a 32-bit executable is an implementation detail that is not important generally: it's a separate process living in its own address space. However it becomes relevant in the case of linker plugins: the native x86_64 GCC that I'm building also builds a x86_64 lto-plugin, which the 32-bit ld cannot load: $ gcc/xgcc -B[...] [...]/gcc.c-torture/execute/ieee/2320-1.c [...] -flto [...] [...]/ld: [...]/gcc/liblto_plugin.so: error loading plugin: [...]/gcc/liblto_plugin.so: wrong ELF class: ELFCLASS64 collect2: error: ld returned 1 exit status So, aside from building a 64-bit ld (which is the lame alternative), I now need to teach GCC's build system that the lto-plugin may need special configuration: CC='gcc -m32' -- and [...] its own build of libiberty, too [...] Instead of auto-detecting the linker's architecture (and then, what to do with that information?), I intend to make this a manual process (so, some new top-level configure argument(s)). Adding yet another set of {...,CC,...}_FOR_[something] is probably overkill -- I'll try to find something simpler. Any comments on this scenario? Here are the patches. Unless the new option is exercised, there are no effects on a native x86_64 GNU/Linux bootstrap build (the build trees' *.o files are identical, as are the test results). OK to commit? Allow overriding the libiberty used for building the LTO plugin. lto-plugin/ * configure.ac (--with-libiberty): New configure option. * configure: Regenerate. * Makefile.am (libiberty, libiberty_pic): New variables. (liblto_plugin_la_LIBADD, liblto_plugin_la_LDFLAGS) (liblto_plugin_la_DEPENDENCIES): Use them. * Makefile.in: Regenerate. --- lto-plugin/Makefile.am | 20 +++- lto-plugin/Makefile.in | 22 -- lto-plugin/configure| 17 +++-- lto-plugin/configure.ac | 5 + 4 files changed, 43 insertions(+), 21 deletions(-) diff --git lto-plugin/Makefile.am lto-plugin/Makefile.am index b24015e..8b7bb54 100644 --- lto-plugin/Makefile.am +++ lto-plugin/Makefile.am @@ -15,17 +15,19 @@ libexecsub_LTLIBRARIES = liblto_plugin.la gcc_build_dir = ../$(host_subdir)/gcc in_gcc_libs = $(foreach lib, $(libexecsub_LTLIBRARIES), $(gcc_build_dir)/$(lib)) -# Can be removed when libiberty becomes a normal convenience library -Wc=-Wc, - liblto_plugin_la_SOURCES = lto-plugin.c +# Note that we intentionally override the bindir supplied by ACX_LT_HOST_FLAGS. +liblto_plugin_la_LDFLAGS = $(lt_host_flags) -module -bindir $(libexecsubdir) +# Can be simplified when libiberty becomes a normal convenience library. +libiberty=$(with_libiberty)/libiberty.a +libiberty_pic=$(with_libiberty)/pic/libiberty.a +Wc=-Wc, liblto_plugin_la_LIBADD = \ - $(if $(wildcard ../libiberty/pic/libiberty.a),$(Wc)../libiberty/pic/libiberty.a,) -# Note that we intentionally override the bindir supplied by ACX_LT_HOST_FLAGS -liblto_plugin_la_LDFLAGS = $(lt_host_flags) -module -bindir $(libexecsubdir) \ - $(if $(wildcard ../libiberty/pic/libiberty.a),,-Wc,../libiberty/libiberty.a) -liblto_plugin_la_DEPENDENCIES = $(if $(wildcard \ - ../libiberty/pic/libiberty.a),../libiberty/pic/libiberty.a,) + $(if $(wildcard $(libiberty_pic)),$(Wc)$(libiberty_pic),) +liblto_plugin_la_LDFLAGS += \ + $(if $(wildcard $(libiberty_pic)),,-Wc,$(libiberty)) +liblto_plugin_la_DEPENDENCIES = \ + $(if $(wildcard $(libiberty_pic)),$(libiberty_pic),) all-local: $(in_gcc_libs) diff --git lto-plugin/configure.ac lto-plugin/configure.ac index 9a418d2..b73fabb 100644 --- lto-plugin/configure.ac +++ lto-plugin/configure.ac @@ -4,6 +4,11 @@ AC_CANONICAL_SYSTEM GCC_TOPLEV_SUBDIRS AM_INIT_AUTOMAKE([foreign no-dist])
Re: [Patch Preview/RFC] tree-optimization/57994: Constant folding of infinity
On Thu, 24 Oct 2013, Paolo Carlini wrote: Hi, this is just a preview, but I decided to send it out early to understand if I'm on the right track or not. As you can see in the Bug, this started as a spin-off of a library issue with complex pow, which led us to: __builtin_exp(-__builtin_huge_val()) not being folded to a constant at compile-time. The reason is simple: the various do_mpfr_arg*, etc, all check real_isfinite on the arguments. In the audit trail of the bug we came to the conclusion that allowing non-NANs could make sense (the mpfr facilities appear to be quite solid in this case - maybe for NANs too, at least in the non-complex case, but that's another story). However, when today I started fiddling with this kind of change, I noticed that isn't enough for the kind of code we really care about for the original issue, involving logs too, thus something like: long double num = __builtin_logl(0L); long double res = __builtin_expl(num); because the log isn't folded at all for zero: fold_builtin_logarithm calls do_mpfr_arg1 with true as last argument. We can make progress if, when it's safe - is !flag_trapping_math !flag_errno_math enough? - we pass true instead. Then we also have to tweak do_mpfr_ckconv, because it checks mpfr_number_p (and real_isfinite) on the result of the folding, thus ruling out infinities. Then we are almost there: the latter can be actually fully folded if -O1 is used, -O0 is not Ok, because otherwise num isn't const propagated to the evaluation of res. Is this so far by and large Ok? I'm attaching a corresponding patchlet (it of course lacks testcases and testsuite adjustments) Note: in the patchlet I'm not trying to handle NaNs; neither complex numbers; neither bessel, remquo and lgamma (which could probably be added?) Also, I suppose we could discover other cases like fold_builtin_logarithm, where, in the context of folding infinities too, we may have to tweak a bit the way do_mpfr_* functions are called Generally I think it's quite odd that do_mpfr_arg1 restricts operands and results in any way. mpfr should be able to return the correct (exceptional) value for any inputs. We have to restrict ourselves with using the result only in the case we have to set errno (where we can still constant propagate but have to leave the call around for errno setting purposes) and if the function is supposed to trap on exceptional values (same handling as errno). But this logic should be really in the callers. Restricting what arguments we feed into mpfr should never be necessary unless there are bugs in mpfr (which should better be reported and fixed). Not sure if we have to care about SNaN inputs in any way, but we'd have -fsignalling-nans to control this anyway. I have no opinion on the patch itself for now, we have to decide first whether to trust mpfr enough to feed it with exceptional values (where obviously the range test of do_mpfr_arg? doesn't work, but AFAICS it's only to avoid errno cases which should be done separately). For example @@ -8191,7 +8191,9 @@ fold_builtin_logarithm (location_t loc, tree fndec const enum built_in_function fcode = builtin_mathfn_code (arg); /* Calculate the result when the argument is a constant. */ - if ((res = do_mpfr_arg1 (arg, type, func, dconst0, NULL, false))) + if ((res = do_mpfr_arg1 (arg, type, func, dconst0, NULL, + !flag_trapping_math !flag_errno_math + ? true : false))) return res; /* Special case, optimize logN(expN(x)) = x. */ we should be able to constant fold for all inputs, but if arg = 0 and flag_trapping_math || flag_errno_math then we have to fold to ( log (arg), constant-folding-result ) especially for the -ferrno-math case this allows to continue constant folding with the result, removing otherwise dead code. Not sure if this fixes the not constant folding (as the result is a constant but with side-effects that still need to be executed and represented by calls with unused result). Richard.
[committed] Dump OpenMP version number listed for -fopenmp at doc/invoke.texi
Dear all, I have committed the patch below as obvious (Rev. 204049). Seemingly, there are a lot of places which have to be updated for a new OpenMP version ... Tobias Index: gcc/ChangeLog === --- gcc/ChangeLog (Revision 204048) +++ gcc/ChangeLog (Arbeitskopie) @@ -1,3 +1,8 @@ +2013-10-25 Tobias Burnus bur...@net-b.de + + * doc/invoke.texi (fopenmp): Change supported OpenMP version + to 4.0. + 2013-10-25 Uros Bizjak ubiz...@gmail.com * config/i386/i386.h (TARGET_MPX): New define. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (Revision 204048) +++ gcc/doc/invoke.texi (Arbeitskopie) @@ -1825,7 +1825,7 @@ freestanding and hosted environments. Enable handling of OpenMP directives @code{#pragma omp} in C/C++ and @code{!$omp} in Fortran. When @option{-fopenmp} is specified, the compiler generates parallel code according to the OpenMP Application -Program Interface v3.0 @w{@uref{http://www.openmp.org/}}. This option +Program Interface v4.0 @w{@uref{http://www.openmp.org/}}. This option implies @option{-pthread}, and thus is only supported on targets that have support for @option{-pthread}.
Re: [patch] Fix PR rtl-optimization/58831
Actually I think it's jfc's code; he had a major revamp of alias.c back in the early egcs days and getting it integrated was one of the project's early wins. Sadly, jfc hasn't been involved in GCC work for a long long time. OK, thanks for the clarification. Anyway, I think the issue here is that once we set new_reg_base_value[regno] to zero, when we do encounter insn 30, we assume that's the first set of (reg 4) and we use the RHS as a base value. The result is we end up with something in new_reg_base_value[4] that is not correct for the entire life. ie, from function entry to the clobber (reg 4) has a value totally unrelated to what we've stored in new_reg_base_value[4]? RIght? Yes, that's exactly it. Seems reasonable to me, assuming my understandings noted above are correct. Thanks. -- Eric Botcazou
Re: [ARM][PATCH] Fix testsuite testcase neon-vcond-[ltgt,unordered].c
On 24/10/13 20:03, Kugan wrote: Hi Kyrill, It happens for armv5te arm-none-linux-gnueabi. --with-mode=arm --with-arch=armv5te --with-float=soft Ah ok, I can reproduce it now. So, while I agree that we add a scan for vbit and vbif to these testcases, there seems to be something dodgy going on with the register allocation. With -march=armv5te I'm getting the following snippet of code in the ltgt case: .L12: ldr r4, [ip] ldr r5, [ip, #4] ldr r6, [ip, #8] ldr r7, [ip, #12] vmovd20, r4, r5 @ v4sf vmovd21, r6, r7 vcgt.f32q8, q10, q9 vcgt.f32q10, q9, q10 vorrq8, q8, q10 vmovd22, r4, r5 @ v4sf vmovd23, r6, r7 vbitq11, q9, q8 vmovr4, r5, d22 @ v4sf vmovr6, r7, d23 The second vcgt.f32 trashes q10, then recreates it in q11 with: vmovd22, r4, r5 @ v4sf vmovd23, r6, r7 so it can do the vbit. Surely there's something better that can be done? In contrast, with -march=armv7-a we get: .L12: vld1.32 {q9}, [r4]! vcgt.f32q8, q9, q10 vcgt.f32q11, q10, q9 vorrq8, q8, q11 vbslq8, q10, q9 vst1.32 {q8}, [lr]! So, while I agree with the patch, there seems to be some funny business with the register allocation that could be worth looking into. Thanks, Kyrill You can also find the logs here in http://cbuild.validation.linaro.org/build/gcc-linaro-4.8-2013.10/logs/armv7l-precise-cbuild461-calxeda02_21_00_precise_armel-armv5r2/ I changed neon-vcond-gt.c too. Thanks, Kugan 2013-10-23 Kugan Vivekanandarajah kug...@linaro.org * gcc.target/arm/neon-vcond-gt.c: Scan for vbsl or vbit or vbif. * gcc.target/arm/neon-vcond-ltgt.c: Scan for vbsl or vbit or vbif. * gcc.target/arm/neon-vcond-unordered.c: Scan for vbsl or vbit or vbif.
Re: [PATCH] RFA: Remove mudflap
On Thu, Oct 24, 2013 at 10:35 PM, Jeff Law l...@redhat.com wrote: On 10/24/13 07:40, Richard Biener wrote: we were supposed to remove mudflap for 4.9, no? Really? I guess it hasn't been removed yet since the include is still there? who is doing that? Yeah, nobody has done it yet appearantly :/ Well, here we go... Bootstrapped and regression tested on x86_64-unknown-linux-gnu. Obviously the mudflap specific tests were deleted and not run and were manually ignored when comparing testsuite runs. OK for the trunk? Ok with preserving options as dummy, see examples like fargument-alias Common Ignore Does nothing. Preserved for backward compatibility. I don't think we should sorry (), but we can certainly warn that mudflap was replaced by -fsanitize=address(?). Implementing the warning properly isn't a requirement for the patch to go in (but the options should be still handled, but ignored). We can followup with a patch implementing the warning. Thanks, Richard. jeff
RE: Fix scheduler ix86_issue_rate and ix86_adjust_cost for modern x86 chips
OK, so it is about 2%. Did you try if you need lookahead even in the early pass (before reload)? My guess would be so, but if not, it could cut the cost to half. For -Ofast/-O3 it looks resonable to me, but we will need to announce it on the ML. For other settings I think we need to work on more improvements or cut the expenses. Yes, it is required before reload. I have another idea which can be pondered upon. Currently, can we enable lookahead with the value 4 (pre reload) for default? This will exponentially cut the cost of build time. I have done some measurements on the build time of some benchmarks (mentioned below) with lookahead value 4. The 2% increase in build time with value 8 is now almost gone. dfa4 no_lookahead perlbench - 191s 193s bzip2 - 19s 19s gcc - 429s 429s mcf - 3s3s gobmk - 116s 115s hmmer - 60s 60s sjeng - 18s 17s libquantum - 6s6s h264ref - 107s 107s omnetpp - 128s 128s astar - 7s7s bwaves - 5s5s gamess - 1964s 1957s milc- 18s 18s GemsFDTD- 273s 272s Lookahead value 4 also helps because, the modified decoder model in bdver3.md is only two cycles deep (though in hardware it is actually 4 cycles deep). This means that we can look another two levels deep for better schedule. GemsFDTD still retains the performance boost of around 6-7% with value 4. Let me know your thoughts. Regards Ganesh -Original Message- From: Jan Hubicka [mailto:hubi...@ucw.cz] Sent: Thursday, October 24, 2013 6:48 PM To: Gopalasubramanian, Ganesh Cc: Jan Hubicka; gcc-patches@gcc.gnu.org; Uros Bizjak (ubiz...@gmail.com); H.J. Lu (hjl.to...@gmail.com) Subject: Re: Fix scheduler ix86_issue_rate and ix86_adjust_cost for modern x86 chips Hi, Is this with -fschedule-insns? Or only with default settings? Did you test the compile time implications of increasing the lookahead? (value of 8 is very large, we may consider enbling it only for -Ofast, limiting for postreload only or something similar). The improvement is seen with the options -fschedule-insns -fschedule-insns2 -fsched-pressure Below are the build times of some of the SPEC benchmarks dfa8 no_lookahead perlbench - 196s 193s bzip2 - 19s 19s gcc - 439s 429s mcf - 3s3s gobmk - 119s 115s hmmer - 62s 60s sjeng - 18s 17s libquantum - 6s6s h264ref - 110s 107s omnetpp - 132s 128s astar - 7s7s bwaves - 4s5s gamess - 1996s 1957s milc- 18s 18s GemsFDTD- 276s 272s I think we can enable it by default rather than for -Ofast. Please let me know your inputs. OK, so it is about 2%. Did you try if you need lookahead even in the early pass (before reload)? My guess would be so, but if not, it could cut the cost to half. For -Ofast/-O3 it looks resonable to me, but we will need to announce it on the ML. For other settings I think we need to work on more improvmeents or cut the expenses. Honza Regards Ganesh -Original Message- From: Jan Hubicka [mailto:hubi...@ucw.cz] Sent: Thursday, October 24, 2013 2:54 PM To: Gopalasubramanian, Ganesh Cc: gcc-patches@gcc.gnu.org; Uros Bizjak (ubiz...@gmail.com); hubi...@ucw.cz; H.J. Lu (hjl.to...@gmail.com) Subject: Re: Fix scheduler ix86_issue_rate and ix86_adjust_cost for modern x86 chips Attached is the patch which does the following scheduler related changes. * re-models bdver3 decoder. * It enables lookahead with value 8 for all BD architectures. The patch doesn't consider if reloading is completed or not (an area that needs to be worked on). * The issue rate for BD architectures are set to 4. I see the following performance improvements on bdver3 machine. * GemsFDTD improves by 6-7% with lookahead value changed to 8. * Hmmer improves by 9% when issue rate when set to 4 . Is this with -fschedule-insns? Or only with default settings? Did you test the compile time implications of increasing the lookahead? (value of 8 is very large, we may consider enbling it only for -Ofast, limiting for postreload only or something similar). I have considered the following hardware details for the model. * There are four decoders inside a hardware decoder block. * These four independent decoders can execute in parallel. (They can take 8B from four different instructions and decode). * These four decoders are pipelined 4 cycles deep and are non-stalling.
Re: [PATCH] Fix names of various macro parameters in tree.h
On Fri, Oct 25, 2013 at 3:17 AM, David Malcolm dmalc...@redhat.com wrote: I noticed that some of the macros in tree.h that act on trees have parameters named CODE, rather NODE, which is confusing when in the presence of other macros that act on enum tree_code values. The attached patch renames such params for macros that I believe act on trees (mostly because they go ahead and lookup the TREE_CODE() of the param). Successfully bootstrappedregtested on x86_64-unknown-linux. OK for trunk? Ok. Thanks, Richard. (fwiw the macros appear to have been this way since they were introduced in 87675 in 2004-09-17) gcc/ * tree.h (EXCEPTIONAL_CLASS_P): Rename parameter from CODE to NODE, since this works on a tree, not an enum tree_code. (CONSTANT_CLASS_P): Likewise. (TYPE_P): Likewise. (DECL_P): Likewise. (INDIRECT_REF_P): Likewise. (REFERENCE_CLASS_P): Likewise. (COMPARISON_CLASS_P): Likewise. (UNARY_CLASS_P): Likewise. (BINARY_CLASS_P): Likewise. (STATEMENT_CLASS_P): Likewise. (VL_EXP_CLASS_P): Likewise. (EXPRESSION_CLASS_P): Likewise. (IS_TYPE_OR_DECL_P): Likewise.
[PATCH] Fix C++0x memory model for -fno-strict-volatile-bitfields on ARM
Hello, this patch fixes the recently discovered data store race on arm-eabi-gcc with -fno-strict-volatile-bitfields for structures like this: #define test_type unsigned short typedef struct s{ unsigned char Prefix[1]; test_type Type; }__attribute((__packed__,__aligned__(4))) ss; volatile ss v; void __attribute__((noinline)) foo (test_type u) { v.Type = u; } test_type __attribute__((noinline)) bar (void) { return v.Type; } I've manually confirmed the correct code generation using variations of the example above on an ARM cross-compiler for -fno-strict-volatile-bitfields. Note, that this example is still causes ICE's for -fstrict-volatile-bitfields, but I'd like to fix that separately. Boot-strapped and regression-tested on x86_64-linux-gnu. Ok for trunk? Thanks Bernd.2013-10-25 Bernd Edlinger bernd.edlin...@hotmail.de Fix C++0x memory model for unaligned fields in packed, aligned(4) structures with -fno-strict-volatile-bitfields on STRICT_ALIGNMENT targets like arm-none-eabi. * expmed.c (store_bit_field): Handle unaligned fields like bit regions. patch-unaligned-data.diff Description: Binary data
Re: [PATCH] fixing typo in expr.c to allow proper recognition of complex addresses in some arches.
Thanks. Installed on the trunk. Well, no, that will be problematic for some architectures. The history of this piece of code is complicated and it's admittedly lacking a comment, but the purpose of the block is clear enough: op0 = expand_expr (base, NULL_RTX, VOIDmode, EXPAND_SUM); op0 = memory_address_addr_space (address_mode, op0, as); if (!integer_zerop (TREE_OPERAND (exp, 1))) { rtx off = immed_double_int_const (mem_ref_offset (exp), address_mode); op0 = simplify_gen_binary (PLUS, address_mode, op0, off); } op0 = memory_address_addr_space (mode, op0, as); The offset computation is done in address_mode and then, only at the end, converted to mode. -- Eric Botcazou
Re: [PATCH, i386]: Unbreak bootstrap
2013/10/25 Uros Bizjak ubiz...@gmail.com: Hello! 2013-10-25 Uros Bizjak ubiz...@gmail.com * config/i386/i386.h (TARGET_MPX): New define. (TARGET_MPX_P): Ditto. Tested on x86_64-pc-linux-gnu, committed. Thanks for fixing it! Uros.
Re: Aliasing: look through pointer's def stmt
On Fri, Oct 25, 2013 at 8:11 AM, Marc Glisse marc.gli...@inria.fr wrote: On Thu, 24 Oct 2013, Jeff Law wrote: On 10/24/13 23:23, Marc Glisse wrote: Hello, I noticed that in some cases we were failing to find aliasing information because we were only looking at an SSA_NAME variable, missing the fact that it was really an ADDR_EXPR. The attached patch passes bootstrap+testsuite, does it make sense? (I am a bit afraid of losing some type information for instance) I didn't investigate the 2 tests where I had to remove dg-bogus, because removing dg-bogus sounds like a bonus... 2013-10-25 Marc Glisse marc.gli...@inria.fr gcc/ * tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Look for an ADDR_EXPR in the defining statement. Shouldn't the ADDR_EXPR have been propagated into the use? Maybe when the address is a constant, but here it comes from malloc. points-to should have propagated the alias info, so no, looking at def-stmts in random places like this isn't ok. Where does alias info get lost? Thanks, Richard. -- Marc Glisse
[PATCH] Handle __builtin_unreachable () using assertions in VRP
Hi! As discussed on IRC, this patch attempts to preserve VRP computed range info for some simple __builtin_unreachable () using assertions. If there are no immediate uses of some SSA_NAME except for those in a condition guarding __builtin_unreachable () and in ASSERT_EXPR in the following basic block, we can copy the range info from the assert lhs (which is lost during remove_range_assertions) and thus preserve it. Eventually we might then remove the __builtin_unreachable () in that case and still benefit from user annotating the source. In the testcase below in bar we have even without this patch the expected range info on the SSA_NAME set to x + 1, but in foo there was no range info preserved. As we have just one user of get_range_info right now, not creating a testcase. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? int foo (int x) { if (x 26 || x 37) __builtin_unreachable (); return x; } int bar (int x) { if (x 26 || x 37) __builtin_unreachable (); return x + 1; } 2013-10-25 Jakub Jelinek ja...@redhat.com * tree-vrp.c (remove_range_assertions): If ASSERT_EXPR_VAR has no other immediate uses but in the condition and ASSERT_EXPR and the other successor of the predecessor bb is __builtin_unreachable (), set_range_info of the ASSERT_EXPR_VAR to the range info of the ASSERT_EXPR's lhs. --- gcc/tree-vrp.c.jj 2013-10-24 10:19:21.0 +0200 +++ gcc/tree-vrp.c 2013-10-24 14:32:29.065878208 +0200 @@ -6488,12 +6488,16 @@ remove_range_assertions (void) { basic_block bb; gimple_stmt_iterator si; + /* 1 if looking at ASSERT_EXPRs immediately at the beginning of + a basic block preceeded by GIMPLE_COND branching to it and + __builtin_trap, -1 if not yet checked, 0 otherwise. */ + int is_unreachable; /* Note that the BSI iterator bump happens at the bottom of the loop and no bump is necessary if we're removing the statement referenced by the current BSI. */ FOR_EACH_BB (bb) -for (si = gsi_start_bb (bb); !gsi_end_p (si);) +for (si = gsi_after_labels (bb), is_unreachable = -1; !gsi_end_p (si);) { gimple stmt = gsi_stmt (si); gimple use_stmt; @@ -6501,30 +6505,96 @@ remove_range_assertions (void) if (is_gimple_assign (stmt) gimple_assign_rhs_code (stmt) == ASSERT_EXPR) { + tree lhs = gimple_assign_lhs (stmt); tree rhs = gimple_assign_rhs1 (stmt); tree var; tree cond = fold (ASSERT_EXPR_COND (rhs)); - use_operand_p use_p; + use_operand_p use_p, use2_p; imm_use_iterator iter; gcc_assert (cond != boolean_false_node); - /* Propagate the RHS into every use of the LHS. */ var = ASSERT_EXPR_VAR (rhs); - FOR_EACH_IMM_USE_STMT (use_stmt, iter, - gimple_assign_lhs (stmt)) + gcc_assert (TREE_CODE (var) == SSA_NAME); + + if (!POINTER_TYPE_P (TREE_TYPE (lhs)) +SSA_NAME_RANGE_INFO (lhs)) + { + if (is_unreachable == -1) + { + is_unreachable = 0; + if (single_pred_p (bb)) + { + basic_block pred_bb = single_pred (bb); + gimple last = last_stmt (pred_bb); + if (last gimple_code (last) == GIMPLE_COND) + { + basic_block other_bb + = EDGE_SUCC (pred_bb, 0)-dest; + if (other_bb == bb) + other_bb = EDGE_SUCC (pred_bb, 1)-dest; + if (EDGE_COUNT (other_bb-succs) == 0) + { + gimple_stmt_iterator gsi + = gsi_after_labels (other_bb); + if (!gsi_end_p (gsi) +gimple_call_builtin_p + (gsi_stmt (gsi), +BUILT_IN_UNREACHABLE)) + is_unreachable = 1; + } + } + } + } + /* Handle + if (x_7 = 10 x_7 20) +__builtin_unreachable (); + x_8 = ASSERT_EXPR x_7, ...; + if the only uses of x_7 are in the ASSERT_EXPR and + in the condition. In that case, we can copy the + range info from x_8 computed in this pass also + for x_7. */ + if (is_unreachable) + { + bool ok = true; + FOR_EACH_IMM_USE_FAST (use_p, iter, var) + if (USE_STMT (use_p) != stmt) +
Re: [Patch] Refactor regex executors
On 10/25/2013 06:30 AM, Tim Shen wrote: This patch optimizes BFS executor by removing quantifier tracking, but precisely control the assignment order of the variable recording the optimal solution. Ok. Are there measurable performance changes at this point? I'm asking because we should take the chance and add performance testcases when we improve the performance: later, unless somebody files a specific Bug report, we become lazy about those, I know that ;) Paolo.
Re: Debug functions review
Hi, On 10/24/2013 09:22 PM, François Dumont wrote: Ok to commit ? Looks great to me. Thanks! Paolo.
Re: Aliasing: look through pointer's def stmt
On Fri, Oct 25, 2013 at 10:59 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 25, 2013 at 8:11 AM, Marc Glisse marc.gli...@inria.fr wrote: On Thu, 24 Oct 2013, Jeff Law wrote: On 10/24/13 23:23, Marc Glisse wrote: Hello, I noticed that in some cases we were failing to find aliasing information because we were only looking at an SSA_NAME variable, missing the fact that it was really an ADDR_EXPR. The attached patch passes bootstrap+testsuite, does it make sense? (I am a bit afraid of losing some type information for instance) I didn't investigate the 2 tests where I had to remove dg-bogus, because removing dg-bogus sounds like a bonus... 2013-10-25 Marc Glisse marc.gli...@inria.fr gcc/ * tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Look for an ADDR_EXPR in the defining statement. Shouldn't the ADDR_EXPR have been propagated into the use? Maybe when the address is a constant, but here it comes from malloc. points-to should have propagated the alias info, so no, looking at def-stmts in random places like this isn't ok. Where does alias info get lost? It doesn't get lost, but this is the missed optimization that malloc results still alias with global pointers (here a function argument). Taking into account the offset shouldn't change anything. I suppose you are looking for the memcpy to be folded into an assignment? Which ao_ref_from_ptr_and_size is critical for the transform - it doesn't seem to be the one in memory op folding. Richard. Thanks, Richard. -- Marc Glisse
Re: [PATCH, PR 57748] Check for out of bounds access, Part 2
On Fri, Oct 25, 2013 at 8:29 AM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: Hi Richard, Did you just propose: --- stor-layout.c.orig 2013-10-22 10:46:49.233261818 +0200 +++ stor-layout.c 2013-10-24 14:54:00.425259062 +0200 @@ -471,27 +471,7 @@ static enum machine_mode mode_for_array (tree elem_type, tree size) { - tree elem_size; - unsigned HOST_WIDE_INT int_size, int_elem_size; - bool limit_p; - - /* One-element arrays get the component type's mode. */ - elem_size = TYPE_SIZE (elem_type); - if (simple_cst_equal (size, elem_size)) - return TYPE_MODE (elem_type); - - limit_p = true; - if (host_integerp (size, 1) host_integerp (elem_size, 1)) - { - int_size = tree_low_cst (size, 1); - int_elem_size = tree_low_cst (elem_size, 1); - if (int_elem_size 0 - int_size % int_elem_size == 0 - targetm.array_mode_supported_p (TYPE_MODE (elem_type), - int_size / int_elem_size)) - limit_p = false; - } - return mode_for_size_tree (size, MODE_INT, limit_p); + return BLKmode; } ??? Yes. Does it work? Richard. No. PASS: gcc.target/i386/pr14289-1.c (test for errors, line 10) FAIL: gcc.target/i386/pr14289-1.c (test for excess errors) Excess errors: /home/ed/gnu/gcc-4.9-20131013/gcc/testsuite/gcc.target/i386/pr14289-1.c:5:14: error: data type of 'a' isn't suitable for a register Does that look like an ABI-Change? the test case uses: register int a[2] asm(ebx); Eh ... even register struct { int i; int a[0]; } asm (ebx); works. Also with int a[1] but not with a[2]. So just handling trailing arrays makes this case regress as well. Now I'd call this use questionable ... but we've likely supported that for decades and cannot change that now. Back to fixing everything in expand. Richard. Bernd.
[PATCH] Introduce [sg]et_nonzero_bits
Hi! tree-ssa-ccp.c already computes which bits are known to be zero, but we preserve that info only for pointers and not for integers. This patch changes SSA_NAME_RANGE_INFO, so we preserve that info even for integers. The bitmask is also computed from range info. There are no users of this besides ccp itself right now, but I'm working on using that info e.g. in the vectorizer. I had to tweak one testcase, because it relied on one of the two warnings being emitted during VRP1 and one during VRP2, but because of the range info derived nonzero_bits we now manage to optimize away the second test during CCP3, which doesn't have -Wtype-limits warnings. If it would be too important to keep warning even there, I guess it could check get_range_info when folding a condition to 0/1 and warn if the rhs isn't in the lhs' range. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? During the development of the patch I had there code to not only use get_range_info if otherwise returning VARYING, but also if it was already constant, but that failed bootstrap miserably: It was something along the lines of: ++/* If get_nonzero_bits tells us some bits ++ have to be zero and they aren't, something ++ went wrong. Only bits in value where val.mask ++ has zeroes (i.e. valid bit from CCP), mask has ones ++ (don't care about bits outside of precision) and ++ has zeroes in nonzero_bits (guaranteed to be zero) ++ should be verified to be zero. */ ++double_int valv = tree_to_double_int (val.value); ++gcc_assert ((valv ~val.mask ++ ~nonzero_bits mask).is_zero ()); ++if (!(valv ~nonzero_bits mask).is_zero ()) ++ val.value = double_int_to_tree (TREE_TYPE (lhs), ++ valv nonzero_bits); ++if (nonzero_bits.is_zero ()) ++ val.mask = double_int_zero; ++else ++ val.mask = val.mask (nonzero_bits | ~mask); The problem was that during the iteration when only a subset of the edges were executable it sometimes failed the assertion, dunno if that is the result of the nonzero bits info in this case coming from completely different algorithm (from VRP ranges) and that simply until the CCP iterations settle it can be wrong, so just to be safe in that case I'm adding the info only in ccp_finalize (and the new nonzero_bits mask with the old one). From what I could see, the range info on that testcase looked just fine, the nonzero_bits derived from it too, if desirable, I can try to cook up a small testcase showing that (reduce from libgcov.c that exhibited that). 2013-10-24 Jakub Jelinek ja...@redhat.com * gimple-pretty-print.c (dump_ssaname_info): Print newline also in case of VR_VARYING. Print get_nonzero_bits if not all ones. * tree-ssanames.h (struct range_info_def): Add nonzero_bits field. (set_nonzero_bits, get_nonzero_bits): New prototypes. * tree-ssa-ccp.c (get_default_value): Use get_range_info to see if a default def isn't partially constant. (ccp_finalize): If after IPA, set_range_info if integral SSA_NAME is known to be partially zero. (evaluate_stmt): If we'd return otherwise VARYING, use get_range_info to see if a default def isn't partially constant. * tree-ssanames.c (set_range_info): Initialize nonzero_bits upon creation of a range, if VR_RANGE, try to improve nonzero_bits from the range. (set_nonzero_bits, get_nonzero_bits): New functions. * g++.dg/warn/pr33738.C (main): Initialize a2 again to make sure we warn about it already during VRP1 pass. --- gcc/gimple-pretty-print.c.jj2013-10-23 14:43:12.0 +0200 +++ gcc/gimple-pretty-print.c 2013-10-24 17:26:59.650945232 +0200 @@ -1731,7 +1731,7 @@ dump_ssaname_info (pretty_printer *buffe if (!POINTER_TYPE_P (TREE_TYPE (node)) SSA_NAME_RANGE_INFO (node)) { - double_int min, max; + double_int min, max, nonzero_bits; value_range_type range_type = get_range_info (node, min, max); if (range_type == VR_VARYING) @@ -1744,8 +1744,20 @@ dump_ssaname_info (pretty_printer *buffe pp_printf (buffer, , ); pp_double_int (buffer, max, TYPE_UNSIGNED (TREE_TYPE (node))); pp_printf (buffer, ]); - newline_and_indent (buffer, spc); } + nonzero_bits = get_nonzero_bits (node); + if (nonzero_bits != double_int_minus_one + (nonzero_bits + != double_int::mask (TYPE_PRECISION (TREE_TYPE (node) + { + pp_string (buffer, NONZERO ); + sprintf (pp_buffer (buffer)-digit_buffer, + HOST_WIDE_INT_PRINT_DOUBLE_HEX, + (unsigned HOST_WIDE_INT) nonzero_bits.high, + nonzero_bits.low); +
Re: [PATCH] Fix C++0x memory model for -fno-strict-volatile-bitfields on ARM
On Fri, Oct 25, 2013 at 10:40 AM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: Hello, this patch fixes the recently discovered data store race on arm-eabi-gcc with -fno-strict-volatile-bitfields for structures like this: #define test_type unsigned short typedef struct s{ unsigned char Prefix[1]; test_type Type; }__attribute((__packed__,__aligned__(4))) ss; volatile ss v; void __attribute__((noinline)) foo (test_type u) { v.Type = u; } test_type __attribute__((noinline)) bar (void) { return v.Type; } I've manually confirmed the correct code generation using variations of the example above on an ARM cross-compiler for -fno-strict-volatile-bitfields. Note, that this example is still causes ICE's for -fstrict-volatile-bitfields, but I'd like to fix that separately. Boot-strapped and regression-tested on x86_64-linux-gnu. Ok for trunk? Isn't it more appropriate to fix it here: if (TREE_CODE (to) == COMPONENT_REF DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1))) get_bit_range (bitregion_start, bitregion_end, to, bitpos, offset); ? Btw, the C++ standard doesn't cover packed or aligned attributes so we could declare this a non-issue. Any opinion on that? Thanks, Richard. Thanks Bernd.
Re: Aliasing: look through pointer's def stmt
On Fri, 25 Oct 2013, Richard Biener wrote: On Fri, Oct 25, 2013 at 10:59 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 25, 2013 at 8:11 AM, Marc Glisse marc.gli...@inria.fr wrote: On Thu, 24 Oct 2013, Jeff Law wrote: On 10/24/13 23:23, Marc Glisse wrote: Hello, I noticed that in some cases we were failing to find aliasing information because we were only looking at an SSA_NAME variable, missing the fact that it was really an ADDR_EXPR. The attached patch passes bootstrap+testsuite, does it make sense? (I am a bit afraid of losing some type information for instance) I didn't investigate the 2 tests where I had to remove dg-bogus, because removing dg-bogus sounds like a bonus... 2013-10-25 Marc Glisse marc.gli...@inria.fr gcc/ * tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Look for an ADDR_EXPR in the defining statement. Shouldn't the ADDR_EXPR have been propagated into the use? Maybe when the address is a constant, but here it comes from malloc. points-to should have propagated the alias info, so no, looking at def-stmts in random places like this isn't ok. Where does alias info get lost? By points-to, do you mean in pt_solution? That's very rough information, and in particular here it can't tell me that 2 fields of the same struct don't alias, can it? It doesn't get lost, but this is the missed optimization that malloc results still alias with global pointers (here a function argument). Taking into account the offset shouldn't change anything. I think this is a different issue (but if you say improving malloc would help, maybe). I suppose you are looking for the memcpy to be folded into an assignment? Which ao_ref_from_ptr_and_size is critical for the transform - it doesn't seem to be the one in memory op folding. No, I only used memcpy as an example, my goal is for the compiler to notice that the call (memcpy here, possibly other functions with some new attribute later) doesn't clobber the counter (i) and thus the counter value is the same after the call as it was before. If memcpy gets turned to an assignment (which alignment should prevent), my testcase becomes useless. -- Marc Glisse
Re: Aliasing: look through pointer's def stmt
On Fri, Oct 25, 2013 at 11:29 AM, Marc Glisse marc.gli...@inria.fr wrote: On Fri, 25 Oct 2013, Richard Biener wrote: On Fri, Oct 25, 2013 at 10:59 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 25, 2013 at 8:11 AM, Marc Glisse marc.gli...@inria.fr wrote: On Thu, 24 Oct 2013, Jeff Law wrote: On 10/24/13 23:23, Marc Glisse wrote: Hello, I noticed that in some cases we were failing to find aliasing information because we were only looking at an SSA_NAME variable, missing the fact that it was really an ADDR_EXPR. The attached patch passes bootstrap+testsuite, does it make sense? (I am a bit afraid of losing some type information for instance) I didn't investigate the 2 tests where I had to remove dg-bogus, because removing dg-bogus sounds like a bonus... 2013-10-25 Marc Glisse marc.gli...@inria.fr gcc/ * tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Look for an ADDR_EXPR in the defining statement. Shouldn't the ADDR_EXPR have been propagated into the use? Maybe when the address is a constant, but here it comes from malloc. points-to should have propagated the alias info, so no, looking at def-stmts in random places like this isn't ok. Where does alias info get lost? By points-to, do you mean in pt_solution? That's very rough information, and in particular here it can't tell me that 2 fields of the same struct don't alias, can it? No, it cannot. It doesn't get lost, but this is the missed optimization that malloc results still alias with global pointers (here a function argument). Taking into account the offset shouldn't change anything. I think this is a different issue (but if you say improving malloc would help, maybe). I suppose you are looking for the memcpy to be folded into an assignment? Which ao_ref_from_ptr_and_size is critical for the transform - it doesn't seem to be the one in memory op folding. No, I only used memcpy as an example, my goal is for the compiler to notice that the call (memcpy here, possibly other functions with some new attribute later) doesn't clobber the counter (i) and thus the counter value is the same after the call as it was before. Ah, so you are looking at call_may_clobber_ref_p_1 and its pointer handling when special-casing builtins? Note that fields can only be disambiguated if the size of the access is known (not sure what fancy attribute you are going to invent here ...). Generally the simple alias machinery is written to be cheap, walking use-def chains isn't. Users do not have to expect use-def chains to be walked (we can change that rule of course, but for example the RTL alias machinery also shares the core of the gimple alias machinery. But I can see that in this particular case the patch makes sense. Still, + if (TREE_CODE (ptr) == SSA_NAME) +{ + gimple stmt = SSA_NAME_DEF_STMT (ptr); + if (gimple_assign_single_p (stmt) + !gimple_has_volatile_ops (stmt) + gimple_assign_rhs_code (stmt) == ADDR_EXPR) + ptr = gimple_assign_rhs1 (stmt); +} no need to look at gimple_has_volatile_ops (stmt). Also you want to handle p_2 = p_1 + CST; foo (p_2); which has a related canonical form, p_2 = MEM[p_1, CST]; The patch is ok with the volatile check removed, you can followup with handling POINTER_PLUS_EXPR if you like. Thanks, Richard. If memcpy gets turned to an assignment (which alignment should prevent), my testcase becomes useless. -- Marc Glisse
Re: [PATCH, PR 57748] Check for out of bounds access
Doesn't that also apply to arithmetic on symbols when the offset is NULL? This can presumably be retrofitted when the offset is null or constant. But yes, the codegen example posted shows this kind of difference (though it doesn't seem to save anything for that case). I'd have expected a more explicit guarding of this case, like with MEM_P (to_rtx) SYMBOL_REF_P (XEXP (to_rtx, 0)) though, eventually even looking whether the resulting address is legitimate. But ... doesn't forwprop fix this up later anyway? Not clear IMO, it would need to reassociate. The comment before it is also odd and likely only applies to later added restrictions. Just to quote it again: if (offset != 0) { ... /* A constant address in TO_RTX can have VOIDmode, we must not try to call force_reg for that case. Avoid that case. */ if (MEM_P (to_rtx) GET_MODE (to_rtx) == BLKmode GET_MODE (XEXP (to_rtx, 0)) != VOIDmode bitsize 0 (bitpos % bitsize) == 0 (bitsize % GET_MODE_ALIGNMENT (mode1)) == 0 MEM_ALIGN (to_rtx) == GET_MODE_ALIGNMENT (mode1)) { to_rtx = adjust_address (to_rtx, mode1, bitpos / BITS_PER_UNIT); bitpos = 0; } to_rtx = offset_address (to_rtx, offset_rtx, highest_pow2_factor_for_target (to, offset)); } Yes, the comment is out-of-sync and not very informative. -- Eric Botcazou
RE: [PATCH, PR 57748] Check for out of bounds access, Part 2
Hi, Eh ... even register struct { int i; int a[0]; } asm (ebx); works. Also with int a[1] but not with a[2]. So just handling trailing arrays makes this case regress as well. Now I'd call this use questionable ... but we've likely supported that for decades and cannot change that now. Back to fixing everything in expand. Richard. Ok, finally you asked for it. Here is my previous version of that patch again. I have now added a new value EXPAND_REFERENCE to the expand_modifier enumeration. It is almost like EXPAND_MEMORY but it does not interfere with constant values. I have done the same modification to VIEW_CONVERT_EXPR too, because this is a possible inner reference, itself. It is however inherently hard to test around this code. To understand this patch it is good to know what type of object the return value tem of get_inner_reference can be. From the program logic at get_inner_reference it is clear that the return value may *not* be BIT_FIELD_REF, COMPONENT_REF, ARRAY_REF, ARRAY_RANGE_REF, REALPART_EXPR, IMAGPART_EXPR. The result may be VIEW_CONVERT_EXPR only on a STRICT_ALIGNMENT target. This is probably further restricted because exp is gimplified. Usually the result will be a MEM_REF or a SSA_NAME of the memory where the structure is to be found. When you look at where EXPAND_MEMORY is handled you see it is special-cased in TARGET_MEM_REF, MEM_REF, ARRAY_REF, COMPONENT_REF, BIT_FIELD_REF, ARRAY_RANGE_REF. At TARGET_MEM_REF, MEM_REF, VIEW_CONVERT_EXPR, it should be the same if EXPAND_MEMORY, EXPAND_WRITE or EXPAND_REFERENCE is given: If it is an unaligned memory, we just return the unaligned reference. This was missing for VIEW_CONVERT_EXPR, and unfortunately I have no test case, because it is only a problem for STRICT_ALIGNMENT targets, and even there it will certainly be really hard to find test cases that exercise this code. In ARRAY_REF, COMPONENT_REF, BIT_FIELD_REF, ARRAY_RANGE_REF we do not have to touch the handling of the outer modifier. However we pass EXPAND_REFERENCE to the inner object, which should not be a recursive use of any ARRAY_REF, COMPONENT_REF, BIT_FIELD_REF, ARRAY_RANGE_REF. TARGET_MEM_REF, MEM_REF and VIEW_CONVERT_EXPR know how to handle EXPAND_REFERENCE, anything else handles it like EXPAND_NORMAL. Boot-strapped and regression-tested on x86_64-linux-gnu OK for trunk? Thanks Bernd.2013-10-25 Bernd Edlinger bernd.edlin...@hotmail.de PR middle-end/57748 * expr.h (expand_modifier): New enum value EXPAND_REFERENCE. * expr.c (expand_expr_real_1): Use EXAND_REFERENCE on base object. testsuite: 2013-10-25 Bernd Edlinger bernd.edlin...@hotmail.de PR middle-end/57748 * gcc.dg/torture/pr57748-3.c: New test. * gcc.dg/torture/pr57748-4.c: New test. patch-pr57748-2.diff Description: Binary data
Re: PR C++/58708 - string literal operator templates broken
Hi, On 10/18/2013 04:42 AM, Ed Smith-Rowland wrote: --- testsuite/g++.dg/cpp1y/pr58708.C(revision 0) +++ testsuite/g++.dg/cpp1y/pr58708.C(working copy) @@ -0,0 +1,70 @@ +// { dg-options -std=c++1y } + +#include array +#include vector +#include type_traits +#include testsuite_hooks.h are you sure you want these includes in a C++ front-end testcase? Even testsuite_hooks.h instead of cassert? What about something more minimal, not including large headers like vector, for the C++ front-end + a library testcase (first blush, the above would be perfectly fine) Thanks, Paolo.
Re: C++ PATCH to resolve LWG issue 1265
Hi, On 10/23/2013 09:16 PM, Jason Merrill wrote: The second hunk adds %X for printing an exception-specification in diagnostics. In my experience these convenience %? that we have got easily trigger warnings during bootstrap/build, eg: /scratch/Gcc/svn-dirs/trunk/gcc/cp/error.c:3302:53: warning: unknown conversion type character ‘R’ in format [-Wformat] /scratch/Gcc/svn-dirs/trunk/gcc/cp/error.c:3302:53: warning: too many arguments for format [-Wformat-extra-args] I suppose we could do something about that... Paolo.
Re: Aliasing: look through pointer's def stmt
On Fri, 25 Oct 2013, Richard Biener wrote: Ah, so you are looking at call_may_clobber_ref_p_1 and its pointer handling when special-casing builtins? Yes. Note that fields can only be disambiguated if the size of the access is known TBAA could also help sometimes. (not sure what fancy attribute you are going to invent here ...). That will certainly require quite a bit of discussion... Generally the simple alias machinery is written to be cheap, I wouldn't mind an expensive version ;-) walking use-def chains isn't. Peeking at the defining statement shouldn't be very costly, as long as you don't do it recursively. no need to look at gimple_has_volatile_ops (stmt). Ok. Also you want to handle p_2 = p_1 + CST; foo (p_2); which has a related canonical form, p_2 = MEM[p_1, CST]; This testcase seems relevant, I'll see if I can handle it: void f (const char *c, int *i) { *i = 42; __builtin_memcpy (i + 1, c, sizeof (int)); if (*i != 42) __builtin_abort(); } The patch is ok with the volatile check removed, you can followup with handling POINTER_PLUS_EXPR if you like. Thanks. -- Marc Glisse
Re: C++ PATCH to resolve LWG issue 1265
On Fri, Oct 25, 2013 at 12:39:58PM +0200, Paolo Carlini wrote: On 10/23/2013 09:16 PM, Jason Merrill wrote: The second hunk adds %X for printing an exception-specification in diagnostics. In my experience these convenience %? that we have got easily trigger warnings during bootstrap/build, eg: /scratch/Gcc/svn-dirs/trunk/gcc/cp/error.c:3302:53: warning: unknown conversion type character ‘R’ in format [-Wformat] /scratch/Gcc/svn-dirs/trunk/gcc/cp/error.c:3302:53: warning: too many arguments for format [-Wformat-extra-args] I suppose we could do something about that... That is because if you are starting bootstrap with an older compiler that doesn't grok these, you get warnings during stage1, but those should be ignored. When you start bootstrapping with gcc 4.9 or in stage2 and later, they won't show up. That said, we probably need to teach new version of gettext about those. Jakub
Re: [PATCH, PR 57748] Check for out of bounds access, Part 2
On Fri, Oct 25, 2013 at 12:02 PM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: Hi, Eh ... even register struct { int i; int a[0]; } asm (ebx); works. Also with int a[1] but not with a[2]. So just handling trailing arrays makes this case regress as well. Now I'd call this use questionable ... but we've likely supported that for decades and cannot change that now. Back to fixing everything in expand. Richard. Ok, finally you asked for it. Here is my previous version of that patch again. I have now added a new value EXPAND_REFERENCE to the expand_modifier enumeration. It is almost like EXPAND_MEMORY but it does not interfere with constant values. I have done the same modification to VIEW_CONVERT_EXPR too, because this is a possible inner reference, itself. It is however inherently hard to test around this code. To understand this patch it is good to know what type of object the return value tem of get_inner_reference can be. From the program logic at get_inner_reference it is clear that the return value may *not* be BIT_FIELD_REF, COMPONENT_REF, ARRAY_REF, ARRAY_RANGE_REF, REALPART_EXPR, IMAGPART_EXPR. The result may be VIEW_CONVERT_EXPR only on a STRICT_ALIGNMENT target. This is probably further restricted because exp is gimplified. Usually the result will be a MEM_REF or a SSA_NAME of the memory where the structure is to be found. When you look at where EXPAND_MEMORY is handled you see it is special-cased in TARGET_MEM_REF, MEM_REF, ARRAY_REF, COMPONENT_REF, BIT_FIELD_REF, ARRAY_RANGE_REF. At TARGET_MEM_REF, MEM_REF, VIEW_CONVERT_EXPR, it should be the same if EXPAND_MEMORY, EXPAND_WRITE or EXPAND_REFERENCE is given: If it is an unaligned memory, we just return the unaligned reference. This was missing for VIEW_CONVERT_EXPR, and unfortunately I have no test case, because it is only a problem for STRICT_ALIGNMENT targets, and even there it will certainly be really hard to find test cases that exercise this code. In ARRAY_REF, COMPONENT_REF, BIT_FIELD_REF, ARRAY_RANGE_REF we do not have to touch the handling of the outer modifier. However we pass EXPAND_REFERENCE to the inner object, which should not be a recursive use of any ARRAY_REF, COMPONENT_REF, BIT_FIELD_REF, ARRAY_RANGE_REF. TARGET_MEM_REF, MEM_REF and VIEW_CONVERT_EXPR know how to handle EXPAND_REFERENCE, anything else handles it like EXPAND_NORMAL. Boot-strapped and regression-tested on x86_64-linux-gnu OK for trunk? You point to a weak spot in expansion - that it handles creating the base MEM to offset with handled components by recursing into the case that handles bare MEM_REFs. This makes the bare MEM_REF handling code somewhat awkward (it's the one to assign mem-attrs which are later adjusted for example). Maybe a better appropach than adding yet another expand modifier would be to split out the base MEM expansion part out of the bare MEM_REF handling code so we can call that instead of recursing. In this light - instead of a new expand modifier don't you want an actual flag that specifies we are coming from a call that wants to expand a base? That is, allow EXPAND_SUM but with the recursion flag set? Finally I think the recursion into the VIEW_CONVERT_EXPR case is only there because of the keep_aligning flag of get_inner_reference which should be obsolete now that we properly handle its effects in get_object_alignment. So you wouldn't need to adjust this path if we finally can get rid of that. What do others think? Thanks, Richard. Thanks Bernd.
[C++ testcase, committed] PR 54812
Hi, I had a look to the testcases Jason added / tweaked in his recent patch about trivial but non-callable [cd]tors: I think a simple testcase directly from 54812 cannot hurt. Thanks, Paolo. 2013-10-25 Paolo Carlini paolo.carl...@oracle.com PR c++/54812 * g++.dg/cpp0x/defaulted47.C: New. Index: g++.dg/cpp0x/defaulted47.C === --- g++.dg/cpp0x/defaulted47.C (revision 0) +++ g++.dg/cpp0x/defaulted47.C (working copy) @@ -0,0 +1,16 @@ +// PR c++/54812 +// { dg-do compile { target c++11 } } + +class A +{ + A() = default; // { dg-error private } +}; + +A a; // { dg-error context } + +class B +{ + ~B() = default; // { dg-error private } +}; + +B b; // { dg-error context }
[PATCH v4, nds32] Andes nds32 port, machine description 3/3
Hi, all, This is v4 patch for Andes nds32 port on machine description part 3, which includes other rtl patterns and settings. gcc/ 2013-10-25 Chung-Ju Wu jasonw...@gmail.com Shiva Chen shiva0...@gmail.com * config/nds32/constants.md: New file. * config/nds32/constraints.md: New file. * config/nds32/iterators.md: New file. * config/nds32/nds32-doubleword.md: New file. * config/nds32/nds32-intrinsic.md: New file. * config/nds32/nds32_intrinsic.h: New file. * config/nds32/nds32-modes.def: New file. * config/nds32/nds32-multiple.md: New file. * config/nds32/nds32.opt: New file. * config/nds32/nds32-opts.h: New file. * config/nds32/nds32-protos.h: New file. * config/nds32/nds32-peephole2.md: New file. * config/nds32/pipelines.md: New file. * config/nds32/predicates.md: New file. * config/nds32/t-mlibs: New file. Best regards, jasonwucj
RFA: Andes nds32 port v4 patch
Hi, GCC global reviewers, I would like to thank Joseph Myers's preliminary review and Richard Sandiford's help for further technical review in these three months: http://gcc.gnu.org/ml/gcc-patches/2013-07/msg01138.html http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01180.html Their comments clearly enhance nds32 port machine description design. Now we have v4 patch for nds32 port in which all the issues have been addressed: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02153.html http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02154.html http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02156.html Any advice or comment to improve the patch is much appreciated. If there is no further comment, is this nds32 v4 patch OK for the trunk? :) Best regards, jasonwucj
Re: Aliasing: look through pointer's def stmt
Marc Glisse wrote: I noticed that in some cases we were failing to find aliasing information because we were only looking at an SSA_NAME variable, missing the fact that it was really an ADDR_EXPR. The attached patch passes bootstrap+testsuite, does it make sense? (I am a bit afraid of losing some type information for instance) I didn't investigate the 2 tests where I had to remove dg-bogus, because removing dg-bogus sounds like a bonus... I wonder why you are seeing failures with those test cases. I tired your patch (w/o modifying the test cases) and they passed. The idea of those test cases is to ensure that there is no output like note: loop versioned for vectorization because of possible aliasing Because that output would be a hint that #pragma GCC ivdep doesn't work. What kind of output do you see? Seemingly not the one above as you kept the version dg-bogus. Tobias
RE: [PATCH] Fix C++0x memory model for -fno-strict-volatile-bitfields on ARM
Hi, On Fri, 25 Oct 2013 11:26:20, Richard Biener wrote: On Fri, Oct 25, 2013 at 10:40 AM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: Hello, this patch fixes the recently discovered data store race on arm-eabi-gcc with -fno-strict-volatile-bitfields for structures like this: #define test_type unsigned short typedef struct s{ unsigned char Prefix[1]; test_type Type; }__attribute((__packed__,__aligned__(4))) ss; volatile ss v; void __attribute__((noinline)) foo (test_type u) { v.Type = u; } test_type __attribute__((noinline)) bar (void) { return v.Type; } I've manually confirmed the correct code generation using variations of the example above on an ARM cross-compiler for -fno-strict-volatile-bitfields. Note, that this example is still causes ICE's for -fstrict-volatile-bitfields, but I'd like to fix that separately. Boot-strapped and regression-tested on x86_64-linux-gnu. Ok for trunk? Isn't it more appropriate to fix it here: if (TREE_CODE (to) == COMPONENT_REF DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1))) get_bit_range (bitregion_start, bitregion_end, to, bitpos, offset); ? Honestly, I'd call this is a work-around, not a design. Therefore I would not move that workaround to expr.c. Also the block below is only a work-around IMHO. if (MEM_P (str_rtx) bitregion_start 0) { enum machine_mode bestmode; HOST_WIDE_INT offset, size; gcc_assert ((bitregion_start % BITS_PER_UNIT) == 0); offset = bitregion_start / BITS_PER_UNIT; bitnum -= bitregion_start; size = (bitnum + bitsize + BITS_PER_UNIT - 1) / BITS_PER_UNIT; bitregion_end -= bitregion_start; bitregion_start = 0; bestmode = get_best_mode (bitsize, bitnum, bitregion_start, bitregion_end, MEM_ALIGN (str_rtx), VOIDmode, MEM_VOLATILE_P (str_rtx)); str_rtx = adjust_bitfield_address_size (str_rtx, bestmode, offset, size); } Here, if bitregion_start = 8, we have a 4 byte aligned memory context, and whoops, now it is only 1 byte aligned. this example: struct s { char a; int b:24; }; struct s ss; void foo(int b) { ss.b = b; } gets compiled (at -O3) to: foo: @ Function supports interworking. @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. ldr r3, .L2 mov r1, r0, lsr #8 mov r2, r0, lsr #16 strb r1, [r3, #2] strb r0, [r3, #1] strb r2, [r3, #3] bx lr while... struct s { char a; int b:24; }; struct s ss; void foo(int b) { ss.b = b; } gets compiled (at -O3) to foo: @ Function supports interworking. @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. ldr r3, .L2 mov r2, r0, lsr #16 strb r2, [r3, #2] strh r0, [r3] @ movhi bx lr which is more efficient, but only because the memory context is still aligned in this case. Btw, the C++ standard doesn't cover packed or aligned attributes so we could declare this a non-issue. Any opinion on that? Thanks, Richard. Thanks Bernd.
Re: PR C++/58708 - string literal operator templates broken
On 10/25/2013 06:08 AM, Paolo Carlini wrote: Hi, On 10/18/2013 04:42 AM, Ed Smith-Rowland wrote: --- testsuite/g++.dg/cpp1y/pr58708.C (revision 0) +++ testsuite/g++.dg/cpp1y/pr58708.C(working copy) @@ -0,0 +1,70 @@ +// { dg-options -std=c++1y } + +#include array +#include vector +#include type_traits +#include testsuite_hooks.h are you sure you want these includes in a C++ front-end testcase? Even testsuite_hooks.h instead of cassert? What about something more minimal, not including large headers like vector, for the C++ front-end + a library testcase (first blush, the above would be perfectly fine) Thanks, Paolo. You're right. I'll send something slimmer later tonight. The rest of the tests in the cpp1y directory use __builtin_abort. So, you also want a library testcase?
Re: PR C++/58708 - string literal operator templates broken
On 10/25/2013 01:38 PM, Ed Smith-Rowland wrote: So, you also want a library testcase? Up to you: if you feel it would test something that the c++ front-end testcase doesn't, why not. Paolo.
[c++-concepts] Requires expr in non-templte
This patch prevents the use of requires expressions in non-template scopes. This restriction was relaxed in the most recent version of concepts lite, but the implementation requires some thought. For now, I am marking it an error to make it consistent with previous versions of the spec. 2013-10-25 Andrew Sutton andrew.n.sut...@gmail.com * gcc/cp/parsre.c (cp_parser_requires_expression): Gracefully fail when parsing a requires expr outside a template. Andrew Sutton template-requires.patch Description: Binary data
Re: Fix scheduler ix86_issue_rate and ix86_adjust_cost for modern x86 chips
OK, so it is about 2%. Did you try if you need lookahead even in the early pass (before reload)? My guess would be so, but if not, it could cut the cost to half. For -Ofast/-O3 it looks resonable to me, but we will need to announce it on the ML. For other settings I think we need to work on more improvements or cut the expenses. Yes, it is required before reload. I have another idea which can be pondered upon. Currently, can we enable lookahead with the value 4 (pre reload) for default? This will exponentially cut the cost of build time. I have done some measurements on the build time of some benchmarks (mentioned below) with lookahead value 4. The 2% increase in build time with value 8 is now almost gone. dfa4 no_lookahead perlbench - 191s 193s bzip2 - 19s 19s gcc - 429s 429s mcf - 3s3s gobmk - 116s 115s hmmer - 60s 60s sjeng - 18s 17s libquantum - 6s6s h264ref - 107s 107s omnetpp - 128s 128s astar - 7s7s bwaves - 5s5s gamess - 1964s 1957s milc- 18s 18s GemsFDTD- 273s 272s Lookahead value 4 also helps because, the modified decoder model in bdver3.md is only two cycles deep (though in hardware it is actually 4 cycles deep). This means that we can look another two levels deep for better schedule. GemsFDTD still retains the performance boost of around 6-7% with value 4. Let me know your thoughts. This seems resonable. I would go for lookahead of 4 for now and 8 for -Ofast and we can tune things based on the experience with this setting incrementally. Uros, Richard, what do you think? Honza
Re: Fix scheduler ix86_issue_rate and ix86_adjust_cost for modern x86 chips
On Fri, 25 Oct 2013, Jan Hubicka wrote: OK, so it is about 2%. Did you try if you need lookahead even in the early pass (before reload)? My guess would be so, but if not, it could cut the cost to half. For -Ofast/-O3 it looks resonable to me, but we will need to announce it on the ML. For other settings I think we need to work on more improvements or cut the expenses. Yes, it is required before reload. I have another idea which can be pondered upon. Currently, can we enable lookahead with the value 4 (pre reload) for default? This will exponentially cut the cost of build time. I have done some measurements on the build time of some benchmarks (mentioned below) with lookahead value 4. The 2% increase in build time with value 8 is now almost gone. dfa4 no_lookahead perlbench - 191s 193s bzip2 - 19s 19s gcc - 429s 429s mcf - 3s3s gobmk - 116s 115s hmmer - 60s 60s sjeng - 18s 17s libquantum - 6s6s h264ref - 107s 107s omnetpp - 128s 128s astar - 7s7s bwaves - 5s5s gamess - 1964s 1957s milc- 18s 18s GemsFDTD- 273s 272s Lookahead value 4 also helps because, the modified decoder model in bdver3.md is only two cycles deep (though in hardware it is actually 4 cycles deep). This means that we can look another two levels deep for better schedule. GemsFDTD still retains the performance boost of around 6-7% with value 4. Let me know your thoughts. This seems resonable. I would go for lookahead of 4 for now and 8 for -Ofast and we can tune things based on the experience with this setting incrementally. Uros, Richard, what do you think? Well, certainly -O3 not -Ofast. Richard.
Re: Aliasing: look through pointer's def stmt
On Fri, 25 Oct 2013, Tobias Burnus wrote: Marc Glisse wrote: I noticed that in some cases we were failing to find aliasing information because we were only looking at an SSA_NAME variable, missing the fact that it was really an ADDR_EXPR. The attached patch passes bootstrap+testsuite, does it make sense? (I am a bit afraid of losing some type information for instance) I didn't investigate the 2 tests where I had to remove dg-bogus, because removing dg-bogus sounds like a bonus... I wonder why you are seeing failures with those test cases. I tired your patch (w/o modifying the test cases) and they passed. The idea of those test cases is to ensure that there is no output like note: loop versioned for vectorization because of possible aliasing Because that output would be a hint that #pragma GCC ivdep doesn't work. What kind of output do you see? Seemingly not the one above as you kept the version dg-bogus. beats self on the head My source directory includes the word alias in it. So I'll leave those 2 dg-bogus alone, but please tighten the regexp a bit, include at least a space or something... -- Marc Glisse
Re: question about register pairs
On 25 October 2013 05:15, DJ Delorie d...@redhat.com wrote: Yup, my registers are smaller than Pmode. This is what I ended up with... Some notes: I lie to gcc and tell it that $fp (reg 22) is two bytes when it's really one. Well, it's not really a lie if you map hardware registers 22 and 23 to a single register for the purposes of gcc internals. Although it does make some other things more awkward, e.g. when you copy fp, and this gets split so you have an insn that copies the highpart of fp to another register. Index: reload.c === --- reload.c(revision 203733) +++ reload.c(working copy) @@ -723,13 +723,15 @@ find_valid_class_1 (enum machine_mode ou unsigned int best_size = 0; int cost; for (rclass = 1; rclass N_REG_CLASSES; rclass++) { int bad = 0; - for (regno = 0; regno FIRST_PSEUDO_REGISTER !bad; regno++) + for (regno = 0; + regno FIRST_PSEUDO_REGISTER !bad; + regno += HARD_REGNO_NREGS (regno, mode)) { if (in_hard_reg_set_p (reg_class_contents[rclass], mode, regno) !HARD_REGNO_MODE_OK (regno, mode)) bad = 1; } Seeing the patched code in its entirety like this, I notice that we would use HARD_REGNO_NREGS for a regno that's not ok for the mode. That can be avoided if we put a break into the if. And then the !bad term in the loop condition becomes redundant. Although the HARD_REGNO_NREGS definition in tm.texi says that HARD_REGNO_NREGS must never return zero. That wouldn't be technically violated with an assert, but I suppose the spirit of the text is that the macro should return always a non-positive value, so I suppose this use of HARD_REGNO_MODE_OK is not actually a problem.
Re: [PATCH, PR 57748] Check for out of bounds access, Part 2
Hi, On Fri, Oct 25, 2013 at 12:51:13PM +0200, Richard Biener wrote: On Fri, Oct 25, 2013 at 12:02 PM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: Hi, Eh ... even register struct { int i; int a[0]; } asm (ebx); works. Also with int a[1] but not with a[2]. So just handling trailing arrays makes this case regress as well. Now I'd call this use questionable ... but we've likely supported that for decades and cannot change that now. Back to fixing everything in expand. Richard. Ok, finally you asked for it. Here is my previous version of that patch again. I have now added a new value EXPAND_REFERENCE to the expand_modifier enumeration. It is almost like EXPAND_MEMORY but it does not interfere with constant values. I have done the same modification to VIEW_CONVERT_EXPR too, because this is a possible inner reference, itself. It is however inherently hard to test around this code. To understand this patch it is good to know what type of object the return value tem of get_inner_reference can be. From the program logic at get_inner_reference it is clear that the return value may *not* be BIT_FIELD_REF, COMPONENT_REF, ARRAY_REF, ARRAY_RANGE_REF, REALPART_EXPR, IMAGPART_EXPR. The result may be VIEW_CONVERT_EXPR only on a STRICT_ALIGNMENT target. This is probably further restricted because exp is gimplified. Usually the result will be a MEM_REF or a SSA_NAME of the memory where the structure is to be found. When you look at where EXPAND_MEMORY is handled you see it is special-cased in TARGET_MEM_REF, MEM_REF, ARRAY_REF, COMPONENT_REF, BIT_FIELD_REF, ARRAY_RANGE_REF. At TARGET_MEM_REF, MEM_REF, VIEW_CONVERT_EXPR, it should be the same if EXPAND_MEMORY, EXPAND_WRITE or EXPAND_REFERENCE is given: If it is an unaligned memory, we just return the unaligned reference. This was missing for VIEW_CONVERT_EXPR, and unfortunately I have no test case, because it is only a problem for STRICT_ALIGNMENT targets, and even there it will certainly be really hard to find test cases that exercise this code. In ARRAY_REF, COMPONENT_REF, BIT_FIELD_REF, ARRAY_RANGE_REF we do not have to touch the handling of the outer modifier. However we pass EXPAND_REFERENCE to the inner object, which should not be a recursive use of any ARRAY_REF, COMPONENT_REF, BIT_FIELD_REF, ARRAY_RANGE_REF. TARGET_MEM_REF, MEM_REF and VIEW_CONVERT_EXPR know how to handle EXPAND_REFERENCE, anything else handles it like EXPAND_NORMAL. Boot-strapped and regression-tested on x86_64-linux-gnu OK for trunk? You point to a weak spot in expansion - that it handles creating the base MEM to offset with handled components by recursing into the case that handles bare MEM_REFs. This makes the bare MEM_REF handling code somewhat awkward (it's the one to assign mem-attrs which are later adjusted for example). Maybe a better appropach than adding yet another expand modifier would be to split out the base MEM expansion part out of the bare MEM_REF handling code so we can call that instead of recursing. I think that we should seize every easy opportunity to break up expand_expr* functions into smaller ones with nicer names and easier to understand semantics. So I think is a good idea. Thanks, Martin In this light - instead of a new expand modifier don't you want an actual flag that specifies we are coming from a call that wants to expand a base? That is, allow EXPAND_SUM but with the recursion flag set? Finally I think the recursion into the VIEW_CONVERT_EXPR case is only there because of the keep_aligning flag of get_inner_reference which should be obsolete now that we properly handle its effects in get_object_alignment. So you wouldn't need to adjust this path if we finally can get rid of that. What do others think? Thanks, Richard. Thanks Bernd.
Re: [PATCH] fixing typo in expr.c to allow proper recognition of complex addresses in some arches.
But it's nothing to do with construction inside memory_address_addr_space but with validity of an address construct. first thing memory_address_addr_space does is deconstructing address_mode from as parameter. But if one doesn't pass actual mode there's not way to find it and call to legitimate_address_p hook will fail. On Fri, Oct 25, 2013 at 1:39 AM, Eric Botcazou ebotca...@adacore.com wrote: Thanks. Installed on the trunk. Well, no, that will be problematic for some architectures. The history of this piece of code is complicated and it's admittedly lacking a comment, but the purpose of the block is clear enough: op0 = expand_expr (base, NULL_RTX, VOIDmode, EXPAND_SUM); op0 = memory_address_addr_space (address_mode, op0, as); if (!integer_zerop (TREE_OPERAND (exp, 1))) { rtx off = immed_double_int_const (mem_ref_offset (exp), address_mode); op0 = simplify_gen_binary (PLUS, address_mode, op0, off); } op0 = memory_address_addr_space (mode, op0, as); The offset computation is done in address_mode and then, only at the end, converted to mode. -- Eric Botcazou
Re: PR C++/58708 - string literal operator templates broken
On 10/25/2013 07:54 AM, Paolo Carlini wrote: On 10/25/2013 01:38 PM, Ed Smith-Rowland wrote: So, you also want a library testcase? Up to you: if you feel it would test something that the c++ front-end testcase doesn't, why not. Paolo. I think this patch should be sufficient - no containers, it just follows the pattern of the other tests in the cpp1y directory. I builds and tests of x86_64-linux. OK? Later we can think of library tools that could help authors of literals operators. The bits/bits/parse_numbers.h is a start on this. Ed Index: cp/parser.c === --- cp/parser.c (revision 203997) +++ cp/parser.c (working copy) @@ -3793,22 +3793,39 @@ tree charvec; tree argpack = make_node (NONTYPE_ARGUMENT_PACK); const char *str = TREE_STRING_POINTER (value); - int i, len = TREE_STRING_LENGTH (value) - 1; + int sz = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (value; + int len = TREE_STRING_LENGTH (value) / sz - 1; tree argvec = make_tree_vec (2); - tree string_char_type_node = TREE_TYPE (TREE_TYPE (value)); + tree str_char_type_node = TREE_TYPE (TREE_TYPE (value)); + str_char_type_node = TYPE_MAIN_VARIANT (str_char_type_node); /* First template parm is character type. */ - TREE_VEC_ELT (argvec, 0) = string_char_type_node; + TREE_VEC_ELT (argvec, 0) = str_char_type_node; /* Fill in CHARVEC with all of the parameters. */ charvec = make_tree_vec (len); - for (i = 0; i len; ++i) -TREE_VEC_ELT (charvec, i) = build_int_cst (string_char_type_node, str[i]); + if (sz == 1) +{ + for (int i = 0; i len; ++i) + TREE_VEC_ELT (charvec, i) = build_int_cst (str_char_type_node, str[i]); +} + else if (sz == 2) +{ + const uint16_t *num = (const uint16_t *)str; + for (int i = 0; i len; ++i) + TREE_VEC_ELT (charvec, i) = build_int_cst (str_char_type_node, num[i]); +} + else if (sz == 4) +{ + const uint32_t *num = (const uint32_t *)str; + for (int i = 0; i len; ++i) + TREE_VEC_ELT (charvec, i) = build_int_cst (str_char_type_node, num[i]); +} /* Build the argument packs. */ SET_ARGUMENT_PACK_ARGS (argpack, charvec); - TREE_TYPE (argpack) = string_char_type_node; + TREE_TYPE (argpack) = str_char_type_node; TREE_VEC_ELT (argvec, 1) = argpack; Index: testsuite/g++.dg/cpp1y/pr58708.C === --- testsuite/g++.dg/cpp1y/pr58708.C(revision 0) +++ testsuite/g++.dg/cpp1y/pr58708.C(working copy) @@ -0,0 +1,68 @@ +// { dg-options -std=c++1y } +// { dg-do run } + +#include type_traits + +struct Foo +{ + bool type[4]; + bool const_type[4]; + int chars[10]; +}; + +templatetypename CharT, CharT... str +Foo +operator_foo() +{ + CharT arr[]{str...}; + + Foo foo; + + foo.type[0] = std::is_sameCharT, char::value; + foo.type[1] = std::is_sameCharT, wchar_t::value; + foo.type[2] = std::is_sameCharT, char16_t::value; + foo.type[3] = std::is_sameCharT, char32_t::value; + foo.const_type[0] = std::is_sameCharT, const char::value; + foo.const_type[1] = std::is_sameCharT, const wchar_t::value; + foo.const_type[2] = std::is_sameCharT, const char16_t::value; + foo.const_type[3] = std::is_sameCharT, const char32_t::value; + + for(int i; i sizeof(arr)/sizeof(CharT) - 1; ++i) +foo.chars[i] = (int)arr[i]; + + return foo; +} + +int +main() +{ + Foo foo; + + foo = U\x1\x10001\x10002_foo; + if (foo.type[3] != true) __builtin_abort(); + if (sizeof(foo.chars)/sizeof(char32_t)-1 != 3) __builtin_abort(); + if (foo.chars[0] != 65536) __builtin_abort(); + if (foo.chars[1] != 65537) __builtin_abort(); + if (foo.chars[2] != 65538) __builtin_abort(); + + foo = \x61\x62\x63_foo; + if (foo.type[0] != true) __builtin_abort(); + if (sizeof(foo.chars)/sizeof(char)-1 != 3) __builtin_abort(); + if (foo.chars[0] != 97) __builtin_abort(); + if (foo.chars[1] != 98) __builtin_abort(); + if (foo.chars[2] != 99) __builtin_abort(); + + foo = L\x01020304\x05060708_foo; + if (foo.type[1] != true) __builtin_abort(); + if (sizeof(foo.chars)/sizeof(wchar_t)-1 != 2) __builtin_abort(); + if (foo.chars[0] != 16909060) __builtin_abort(); + if (foo.chars[1] != 84281096) __builtin_abort(); + + foo = u\x0102\x0304\x0506\x0708_foo; + if (foo.type[2] != true) __builtin_abort(); + if (sizeof(foo.chars)/sizeof(char16_t)-1 != 4) __builtin_abort(); + if (foo.chars[0] != 258) __builtin_abort(); + if (foo.chars[1] != 772) __builtin_abort(); + if (foo.chars[2] != 1286) __builtin_abort(); + if (foo.chars[3] != 1800) __builtin_abort(); +} gcc/cp: 2013-10-25 Edward Smith-Rowland 3dw...@verizon.net PR c++/58708 * parser.c (make_string_pack): Discover non-const type and size of character and build parm pack with correct type and chars. gcc/testsuite: 2013-10-25 Edward Smith-Rowland 3dw...@verizon.net PR c++/58708
Re: PR C++/58708 - string literal operator templates broken
On 10/25/2013 03:46 PM, Ed Smith-Rowland wrote: On 10/25/2013 07:54 AM, Paolo Carlini wrote: On 10/25/2013 01:38 PM, Ed Smith-Rowland wrote: So, you also want a library testcase? Up to you: if you feel it would test something that the c++ front-end testcase doesn't, why not. Paolo. I think this patch should be sufficient - no containers, it just follows the pattern of the other tests in the cpp1y directory. Note, however, that type_traits isn't small at all, and apparently you are only using std::is_same which is just a one line template + a one line specialization (just grep in testsuite/g++.dg) Paolo.
Re: [Patch] Refactor regex executors
Hi, On 10/25/2013 04:05 PM, Tim Shen wrote: On Fri, Oct 25, 2013 at 5:07 AM, Paolo Carlini paolo.carl...@oracle.com wrote: Ok. Are there measurable performance changes at this point? I'm asking because we should take the chance and add performance testcases when we improve the performance: later, unless somebody files a specific Bug report, we become lazy about those, I know that ;) I do use another test case, but it's from Stackoverflow. I tend not to include that; Howeverm I make a bfs version of the previous split.cc testcase and it indeed reflect the optimization(in my machine, running time from 294u to 194u). Ah good. Can't you just have split_bfs.cc defining the _GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT macro and including the existing split.cc? Paolo.
[PATCH, SH] Add support for inlined builtin_strncmp
Hello, This patch implements the cmpstrnsi pattern to support the strncmp builtin for constant lengths. The cmp/str instructions is used for size = 8 bytes, else fall back to the byte-at-a-time check to favor small strings. I now also handle the cases where align is known for both cmpstr and cmpstrn, so we can avoid the pointer check, and added a schedule improvement to speculate the extu.b r1,r1 instruction into the delay slot, winning an additional instruction (we know that r1 is 0) when the end of string is reached. The byte-at-a-time loop becomes: mov.b @r4+,r1 tst r1,r1 bt/s.L4 mov.b @r3+,r0 cmp/eq r1,r0 bt/s.L9 extu.b r1,r1 .L4: extu.b r0,r0 rts sub r1,r0 Enabled the existing execute/builtins/strncmp-2.c for functional check and added 2 new target specific tests. No regressions for -m2 and -m4 for sh-elf. OK for trunk ? Many thanks, Christian 2013-10-27 Christian Bruel christian.br...@st.com * gcc/config/sh/sh-mem.cc (sh_expand_cmpnstr): Moved here. (sh_expand_cmpstr): Handle known align and schedule improvements. * gcc/config/sh/sh-protos.h (sh_expand_cmpstrn): Declare. * gcc/config/sh/sh.md (cmpstrnsi): New pattern. * gcc.c-torture/execute/builtins/strncmp-2.c: Enable for SH. * gcc.target/sh/cmpstr.c: New test. * gcc.target/sh/cmpstrn.c: New test. Index: config/sh/sh-mem.cc === --- config/sh/sh-mem.cc (revision 204013) +++ config/sh/sh-mem.cc (working copy) @@ -200,22 +200,25 @@ sh_expand_cmpstr (rtx *operands) rtx L_return = gen_label_rtx (); rtx L_loop_byte = gen_label_rtx (); rtx L_end_loop_byte = gen_label_rtx (); - rtx L_loop_long = gen_label_rtx (); - rtx L_end_loop_long = gen_label_rtx (); rtx jump, addr1, addr2; int prob_unlikely = REG_BR_PROB_BASE / 10; int prob_likely = REG_BR_PROB_BASE / 4; - emit_insn (gen_iorsi3 (tmp1, s1_addr, s2_addr)); - emit_move_insn (tmp0, GEN_INT (3)); + rtx L_loop_long = gen_label_rtx (); + rtx L_end_loop_long = gen_label_rtx (); - emit_insn (gen_tstsi_t (tmp0, tmp1)); + int align = INTVAL (operands[3]); emit_move_insn (tmp0, const0_rtx); - jump = emit_jump_insn (gen_branch_false (L_loop_byte)); - add_int_reg_note (jump, REG_BR_PROB, prob_likely); + if (align 4) +{ + emit_insn (gen_iorsi3 (tmp1, s1_addr, s2_addr)); + emit_insn (gen_tstsi_t (GEN_INT (3), tmp1)); + jump = emit_jump_insn (gen_branch_false (L_loop_byte)); + add_int_reg_note (jump, REG_BR_PROB, prob_likely); +} addr1 = adjust_automodify_address (s1, SImode, s1_addr, 0); addr2 = adjust_automodify_address (s2, SImode, s2_addr, 0); @@ -250,7 +253,7 @@ sh_expand_cmpstr (rtx *operands) add_int_reg_note (jump, REG_BR_PROB, prob_likely); /* end loop. */ - /* Fallthu, check if one of the word is greater. */ + /* Fallthu, diff results r. */ if (TARGET_LITTLE_ENDIAN) { rtx low_1 = gen_lowpart (HImode, tmp1); @@ -267,15 +270,15 @@ sh_expand_cmpstr (rtx *operands) jump = emit_jump_insn (gen_jump_compact (L_return)); emit_barrier_after (jump); - /* start byte loop. */ - addr1 = adjust_automodify_address (s1, QImode, s1_addr, 0); - addr2 = adjust_automodify_address (s2, QImode, s2_addr, 0); - emit_label (L_end_loop_long); emit_move_insn (s1_addr, plus_constant (Pmode, s1_addr, -4)); emit_move_insn (s2_addr, plus_constant (Pmode, s2_addr, -4)); + /* start byte loop. */ + addr1 = adjust_automodify_address (s1, QImode, s1_addr, 0); + addr2 = adjust_automodify_address (s2, QImode, s2_addr, 0); + emit_label (L_loop_byte); emit_insn (gen_extendqisi2 (tmp2, addr2)); @@ -289,13 +292,16 @@ sh_expand_cmpstr (rtx *operands) add_int_reg_note (jump, REG_BR_PROB, prob_unlikely); emit_insn (gen_cmpeqsi_t (tmp1, tmp2)); - emit_jump_insn (gen_branch_true (L_loop_byte)); + if (flag_delayed_branch) +emit_insn (gen_zero_extendqisi2 (tmp2, gen_lowpart (QImode, tmp2))); + jump = emit_jump_insn (gen_branch_true (L_loop_byte)); add_int_reg_note (jump, REG_BR_PROB, prob_likely); /* end loop. */ emit_label (L_end_loop_byte); - emit_insn (gen_zero_extendqisi2 (tmp2, gen_lowpart (QImode, tmp2))); + if (! flag_delayed_branch) +emit_insn (gen_zero_extendqisi2 (tmp2, gen_lowpart (QImode, tmp2))); emit_insn (gen_zero_extendqisi2 (tmp1, gen_lowpart (QImode, tmp1))); emit_label (L_return); @@ -305,3 +311,166 @@ sh_expand_cmpstr (rtx *operands) return true; } +/* Emit code to perform a strcmp. + + OPERANDS[0] is the destination. + OPERANDS[1] is the first string. + OPERANDS[2] is the second string. + OPERANDS[3] is the length. + OPERANDS[4] is the align. */ +bool +sh_expand_cmpnstr (rtx *operands) +{ + rtx s1 = copy_rtx (operands[1]); + rtx s2 = copy_rtx (operands[2]); + + rtx s1_addr = copy_addr_to_reg (XEXP (s1, 0)); + rtx s2_addr = copy_addr_to_reg
Re: [PATCH, SH] Add support for inlined builtin_strncmp
In the ChangeLog, the entry * gcc/config/sh/sh-mem.cc (sh_expand_cmpnstr): Moved here. is instead * gcc/config/sh/sh-mem.cc (sh_expand_cmpnstr): New function. Sorry for this, Christian
Re: [PATCH, C++, PR58282] Handle noexcept on transactions with -fno-exceptions
On 07/09/13 18:54, Jason Merrill wrote: OK. I've reproduced the same problem with the 4.7 and 4.8 branch, and checked that applying the patch fixes the problem. Committed to 4.7 and 4.8 branch as well. Thanks, - Tom
Re: [Patch] Refactor regex executors
On 10/25/2013 04:22 PM, Tim Shen wrote: On Fri, Oct 25, 2013 at 10:12 AM, Paolo Carlini paolo.carl...@oracle.com wrote: Ah good. Can't you just have split_bfs.cc defining the _GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT macro and including the existing split.cc? Here :) Hack the __FILE__ macro to let it report correctly. Uhm, cute, didn't consider the naming issue, but I'm not sure we want to do that... What about having the code in a split.h and including it from split.cc and split_bfs.cc instead? Paolo.
Re: [PATCH][buildrobot] libcpp/lex.c: Use enum properly
Andrew == Andrew Pinski pins...@gmail.com writes: enum raw_str_phase phase = RAW_STR_PREFIX; Andrew This is a good work around but please add a comment of why this is Andrew needed (to work around a bug in XLC++). I agree. It's ok with this update. thanks, Tom
Re: PR C++/58708 - string literal operator templates broken
On 10/25/2013 10:01 AM, Paolo Carlini wrote: On 10/25/2013 03:46 PM, Ed Smith-Rowland wrote: On 10/25/2013 07:54 AM, Paolo Carlini wrote: On 10/25/2013 01:38 PM, Ed Smith-Rowland wrote: So, you also want a library testcase? Up to you: if you feel it would test something that the c++ front-end testcase doesn't, why not. Paolo. I think this patch should be sufficient - no containers, it just follows the pattern of the other tests in the cpp1y directory. Note, however, that type_traits isn't small at all, and apparently you are only using std::is_same which is just a one line template + a one line specialization (just grep in testsuite/g++.dg) Paolo. Here is one with necessary tools inlined. Ok? gcc/cp: 2013-10-25 Edward Smith-Rowland 3dw...@verizon.net PR c++/58708 * parser.c (make_string_pack): Discover non-const type and size of character and build parm pack with correct type and chars. gcc/testsuite: 2013-10-25 Edward Smith-Rowland 3dw...@verizon.net PR c++/58708 *g++.dg/cpp1y/pr58708.C : New. Index: cp/parser.c === --- cp/parser.c (revision 203997) +++ cp/parser.c (working copy) @@ -3793,22 +3793,39 @@ tree charvec; tree argpack = make_node (NONTYPE_ARGUMENT_PACK); const char *str = TREE_STRING_POINTER (value); - int i, len = TREE_STRING_LENGTH (value) - 1; + int sz = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (value; + int len = TREE_STRING_LENGTH (value) / sz - 1; tree argvec = make_tree_vec (2); - tree string_char_type_node = TREE_TYPE (TREE_TYPE (value)); + tree str_char_type_node = TREE_TYPE (TREE_TYPE (value)); + str_char_type_node = TYPE_MAIN_VARIANT (str_char_type_node); /* First template parm is character type. */ - TREE_VEC_ELT (argvec, 0) = string_char_type_node; + TREE_VEC_ELT (argvec, 0) = str_char_type_node; /* Fill in CHARVEC with all of the parameters. */ charvec = make_tree_vec (len); - for (i = 0; i len; ++i) -TREE_VEC_ELT (charvec, i) = build_int_cst (string_char_type_node, str[i]); + if (sz == 1) +{ + for (int i = 0; i len; ++i) + TREE_VEC_ELT (charvec, i) = build_int_cst (str_char_type_node, str[i]); +} + else if (sz == 2) +{ + const uint16_t *num = (const uint16_t *)str; + for (int i = 0; i len; ++i) + TREE_VEC_ELT (charvec, i) = build_int_cst (str_char_type_node, num[i]); +} + else if (sz == 4) +{ + const uint32_t *num = (const uint32_t *)str; + for (int i = 0; i len; ++i) + TREE_VEC_ELT (charvec, i) = build_int_cst (str_char_type_node, num[i]); +} /* Build the argument packs. */ SET_ARGUMENT_PACK_ARGS (argpack, charvec); - TREE_TYPE (argpack) = string_char_type_node; + TREE_TYPE (argpack) = str_char_type_node; TREE_VEC_ELT (argvec, 1) = argpack; Index: testsuite/g++.dg/cpp1y/pr58708.C === --- testsuite/g++.dg/cpp1y/pr58708.C(revision 0) +++ testsuite/g++.dg/cpp1y/pr58708.C(working copy) @@ -0,0 +1,91 @@ +// { dg-options -std=c++1y } +// { dg-do run } + +templatetypename _Tp, _Tp __v + struct integral_constant + { +static constexpr _Tp value = __v; +typedef _Tp value_type; +typedef integral_constant_Tp, __v type; +constexpr operator value_type() const { return value; } +constexpr value_type operator()() const { return value; } + }; + +templatetypename _Tp, _Tp __v + constexpr _Tp integral_constant_Tp, __v::value; + +typedef integral_constantbool, true true_type; + +typedef integral_constantbool, falsefalse_type; + +templatetypename, typename + struct is_same + : public false_type { }; + +templatetypename _Tp + struct is_same_Tp, _Tp + : public true_type { }; + +struct Foo +{ + bool type[4]; + bool const_type[4]; + int chars[10]; +}; + +templatetypename CharT, CharT... str +Foo +operator_foo() +{ + CharT arr[]{str...}; + + Foo foo; + + foo.type[0] = is_sameCharT, char::value; + foo.type[1] = is_sameCharT, wchar_t::value; + foo.type[2] = is_sameCharT, char16_t::value; + foo.type[3] = is_sameCharT, char32_t::value; + foo.const_type[0] = is_sameCharT, const char::value; + foo.const_type[1] = is_sameCharT, const wchar_t::value; + foo.const_type[2] = is_sameCharT, const char16_t::value; + foo.const_type[3] = is_sameCharT, const char32_t::value; + + for(int i; i sizeof(arr)/sizeof(CharT) - 1; ++i) +foo.chars[i] = (int)arr[i]; + + return foo; +} + +int +main() +{ + Foo foo; + + foo = U\x1\x10001\x10002_foo; + if (foo.type[3] != true) __builtin_abort(); + if (sizeof(foo.chars)/sizeof(char32_t)-1 != 3) __builtin_abort(); + if (foo.chars[0] != 65536) __builtin_abort(); + if (foo.chars[1] != 65537) __builtin_abort(); + if (foo.chars[2] != 65538) __builtin_abort(); + + foo = \x61\x62\x63_foo; + if (foo.type[0] != true)
Re: PR C++/58708 - string literal operator templates broken
On 10/25/2013 10:01 AM, Paolo Carlini wrote: On 10/25/2013 03:46 PM, Ed Smith-Rowland wrote: On 10/25/2013 07:54 AM, Paolo Carlini wrote: On 10/25/2013 01:38 PM, Ed Smith-Rowland wrote: So, you also want a library testcase? Up to you: if you feel it would test something that the c++ front-end testcase doesn't, why not. Paolo. I think this patch should be sufficient - no containers, it just follows the pattern of the other tests in the cpp1y directory. Note, however, that type_traits isn't small at all, and apparently you are only using std::is_same which is just a one line template + a one line specialization (just grep in testsuite/g++.dg) Paolo. Hold up. something didn't work. I'll be back later. Ed
Re: PR C++/58708 - string literal operator templates broken
On 10/25/2013 04:29 PM, Ed Smith-Rowland wrote: Here is one with necessary tools inlined. You still have a lot of redundant code... I'm not going to insist anyway. Paolo.
Re: PR C++/58708 - string literal operator templates broken
On Fri, Oct 25, 2013 at 10:29:41AM -0400, Ed Smith-Rowland wrote: 2013-10-25 Edward Smith-Rowland 3dw...@verizon.net PR c++/58708 * parser.c (make_string_pack): Discover non-const type and size of character and build parm pack with correct type and chars. gcc/testsuite: 2013-10-25 Edward Smith-Rowland 3dw...@verizon.net PR c++/58708 *g++.dg/cpp1y/pr58708.C : New. --- testsuite/g++.dg/cpp1y/pr58708.C (revision 0) +++ testsuite/g++.dg/cpp1y/pr58708.C (working copy) @@ -0,0 +1,91 @@ +// { dg-options -std=c++1y } +// { dg-do run } + +templatetypename _Tp, _Tp __v + struct integral_constant + { +static constexpr _Tp value = __v; +typedef _Tp value_type; +typedef integral_constant_Tp, __v type; +constexpr operator value_type() const { return value; } +constexpr value_type operator()() const { return value; } + }; + +templatetypename _Tp, _Tp __v + constexpr _Tp integral_constant_Tp, __v::value; + +typedef integral_constantbool, true true_type; + +typedef integral_constantbool, falsefalse_type; + +templatetypename, typename + struct is_same + : public false_type { }; + +templatetypename _Tp + struct is_same_Tp, _Tp + : public true_type { }; Why not just the minimal: template class T, class U struct is_same { static constexpr bool value = false; }; template class T struct is_sameT, T { static constexpr bool value = true; }; other tests are using? Jakub
Re: [PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping
Hi, On Thu, Oct 24, 2013 at 01:02:51AM +0200, Steven Bosscher wrote: On Wed, Oct 23, 2013 at 6:46 PM, Martin Jambor wrote: /* Perform the second half of the transformation started in @@ -4522,7 +4704,15 @@ ira (FILE *f) allocation because of -O0 usage or because the function is too big. */ if (ira_conflicts_p) -find_moveable_pseudos (); +{ + df_analyze (); + calculate_dominance_info (CDI_DOMINATORS); + + find_moveable_pseudos (); + split_live_ranges_for_shrink_wrap (); + + free_dominance_info (CDI_DOMINATORS); +} You probably want to add another df_analyze if split_live_ranges_for_shrink_wrap makes code transformations. AFAIU find_moveable_pseudos doesn't change global liveness but your transformation might. IRA/LRA need up-to-date DF_LR results to compute allocno live ranges. OK, I have changed the patch to fo that (it is below, still bootstraps and passes tests on x86_64 fine). However, I have noticed that the corresponding part in function ira now looks like: /* ... */ if (delete_trivially_dead_insns (get_insns (), max_reg_num ())) df_analyze (); /* It is not worth to do such improvement when we use a simple allocation because of -O0 usage or because the function is too big. */ if (ira_conflicts_p) { df_analyze (); calculate_dominance_info (CDI_DOMINATORS); find_moveable_pseudos (); if (split_live_ranges_for_shrink_wrap ()) df_analyze (); free_dominance_info (CDI_DOMINATORS); } /* ... */ So, that left me wondering whether the first call to df_analyze is actually necessary, or whether perhaps the data are actually already up to date. What do you think? Thanks for all the feedback, Martin 2013-10-23 Martin Jambor mjam...@suse.cz PR rtl-optimization/10474 * ira.c (find_moveable_pseudos): Do not calculate dominance info nor df analysis. (interesting_dest_for_shprep): New function. (split_live_ranges_for_shrink_wrap): Likewise. (ira): Calculate dominance info and df analysis. Call split_live_ranges_for_shrink_wrap. testsuite/ * gcc.dg/pr10474.c: New testcase. * gcc.dg/ira-shrinkwrap-prep-1.c: Likewise. * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise. diff --git a/gcc/ira.c b/gcc/ira.c index 203fbff..532db31 100644 --- a/gcc/ira.c +++ b/gcc/ira.c @@ -3989,9 +3989,6 @@ find_moveable_pseudos (void) pseudo_replaced_reg.release (); pseudo_replaced_reg.safe_grow_cleared (max_regs); - df_analyze (); - calculate_dominance_info (CDI_DOMINATORS); - i = 0; bitmap_initialize (live, 0); bitmap_initialize (used, 0); @@ -4311,7 +4308,196 @@ find_moveable_pseudos (void) regstat_free_ri (); regstat_init_n_sets_and_refs (); regstat_compute_ri (); - free_dominance_info (CDI_DOMINATORS); +} + + +/* If insn is interesting for parameter range-splitting shring-wrapping + preparation, i.e. it is a single set from a hard register to a pseudo, which + is live at CALL_DOM, return the destination. Otherwise return NULL. */ + +static rtx +interesting_dest_for_shprep (rtx insn, basic_block call_dom) +{ + rtx set = single_set (insn); + if (!set) +return NULL; + rtx src = SET_SRC (set); + rtx dest = SET_DEST (set); + if (!REG_P (src) || !HARD_REGISTER_P (src) + || !REG_P (dest) || HARD_REGISTER_P (dest) + || (call_dom !bitmap_bit_p (df_get_live_in (call_dom), REGNO (dest +return NULL; + return dest; +} + +/* Split live ranges of pseudos that are loaded from hard registers in the + first BB in a BB that dominates all non-sibling call if such a BB can be + found and is not in a loop. Return true if the function has made any + changes. */ + +static bool +split_live_ranges_for_shrink_wrap (void) +{ + basic_block bb, call_dom = NULL; + basic_block first = single_succ (ENTRY_BLOCK_PTR); + rtx insn, last_interesting_insn = NULL; + bitmap_head need_new, reachable; + vecbasic_block queue; + + if (!flag_shrink_wrap) +return false; + + bitmap_initialize (need_new, 0); + bitmap_initialize (reachable, 0); + queue.create (n_basic_blocks); + + FOR_EACH_BB (bb) +FOR_BB_INSNS (bb, insn) + if (CALL_P (insn) !SIBLING_CALL_P (insn)) + { + if (bb == first) + { + bitmap_clear (need_new); + bitmap_clear (reachable); + queue.release (); + return false; + } + + bitmap_set_bit (need_new, bb-index); + bitmap_set_bit (reachable, bb-index); + queue.quick_push (bb); + break; + } + + if (queue.is_empty ()) +{ + bitmap_clear (need_new); + bitmap_clear (reachable); + queue.release (); + return false; +} + + while (!queue.is_empty ()) +{ + edge e; + edge_iterator ei; + + bb = queue.pop (); + FOR_EACH_EDGE (e, ei, bb-succs) + if (e-dest !=
[C++ Patch] PR 58878
Hi, here the issue is that we fail to detect shadowing declarations in inline member function templates. The reason is the following check in check_template_shadow: if (decl == olddecl - || TEMPLATE_PARMS_FOR_INLINE (current_template_parms)) + || (DECL_TEMPLATE_PARM_P (decl) + TEMPLATE_PARMS_FOR_INLINE (current_template_parms))) return true; which, to avoid duplicate error messages involving template parameters (see g++.old-deja/g++.benjamin/tem0[34].C) ends up skipping VAR_DECLs etc too. Tested x86_64-linux. Thanks, Paolo. / /cp 2013-10-25 Paolo Carlini paolo.carl...@oracle.com PR c++/58878 * pt.c (check_template_shadow): Don't skip declarations in inline member templates. /testsuite 2013-10-25 Paolo Carlini paolo.carl...@oracle.com PR c++/58878 * g++.dg/template/pr58878.C: New. Index: cp/pt.c === --- cp/pt.c (revision 204056) +++ cp/pt.c (working copy) @@ -3511,7 +3511,8 @@ check_template_shadow (tree decl) name inside a class. We check TPFI to avoid duplicate errors for inline member templates. */ if (decl == olddecl - || TEMPLATE_PARMS_FOR_INLINE (current_template_parms)) + || (DECL_TEMPLATE_PARM_P (decl) + TEMPLATE_PARMS_FOR_INLINE (current_template_parms))) return true; error (declaration of %q+#D, decl); Index: testsuite/g++.dg/template/pr58878.C === --- testsuite/g++.dg/template/pr58878.C (revision 0) +++ testsuite/g++.dg/template/pr58878.C (working copy) @@ -0,0 +1,62 @@ +// PR c++/58878 + +// Template-members of non-template class +struct A +{ +template typename t// { dg-error shadows } +void f() +{ +int t = 1; // { dg-error declaration } +} + +template typename t +void g(); +}; + +template typename t// { dg-error shadows } +void A::g() +{ +int t = 2; // { dg-error declaration } +} + +// (Non-template) Members of template class +template typename t// { dg-error shadows } +struct B +{ +void f() +{ +int t = 3; // { dg-error declaration } +} + +void g(); +}; + +template typename t// { dg-error shadows } +void Bt::g() +{ +int t = 4; // { dg-error declaration } +} + + +// Template members of template class +template typename t// { dg-error shadows } +struct C +{ +template typename s// { dg-error shadows } +void f() +{ +int t = 5; // { dg-error declaration } +int s = 6; // { dg-error declaration } +} + +template typename s +void g(); +}; + +template typename t// { dg-error shadows } +template typename s// { dg-error shadows } +void Ct::g() +{ +int t = 7; // { dg-error declaration } +int s = 8; // { dg-error declaration } +}
Re: [C++ Patch] PR 58878
OK. Jason
Re: [Patch, C++] Add C++ FE support for #pragma GCC ivdep
On Thu, Oct 24, 2013 at 09:17:55PM +0200, Tobias Burnus wrote: 2013-08-24 Tobias Burnus bur...@net-b.de PR other/33426 * g++.dg/parse/ivdep.C: New. * g++.dg/vect/pr33426-ivdep.cc: New. FYI, I'm seeing various new FAILs on i686-linux: +FAIL: gcc.dg/vect/vect-ivdep-1.c (test for warnings, line ) +FAIL: gcc.dg/vect/vect-ivdep-1.c -flto (test for warnings, line ) +FAIL: gfortran.dg/vect/vect-do-concurrent-1.f90 -O (test for warnings, line ) +FAIL: g++.dg/vect/pr33426-ivdep.cc -std=gnu++98 (test for warnings, line ) +FAIL: g++.dg/vect/pr33426-ivdep.cc -std=gnu++11 (test for warnings, line ) --- /dev/null +++ b/gcc/testsuite/g++.dg/vect/pr33426-ivdep.cc @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_float } */ +/* { dg-options -O3 -fopt-info-vec-optimized } */ The problem is likely the same in all cases and same as I've mailed about recently, dg-options should never be used in *.dg/vect/* testcases, instead just use dg-additional-options, because otherwise the required target options to enable vectorization aren't passed in. Jakub
Re: Aliasing: look through pointer's def stmt
On 10/25/13 03:29, Marc Glisse wrote: It doesn't get lost, but this is the missed optimization that malloc results still alias with global pointers (here a function argument). Taking into account the offset shouldn't change anything. I think this is a different issue (but if you say improving malloc would help, maybe). Has the patent on Steensgaard's techniques expired yet? IIRC it would handle this kind of stuff quite well. Jeff
Re: [Patch] Fix gcc.dg/20050922-*.c
On Oct 24, 2013, at 7:33 PM, Hans-Peter Nilsson h...@bitrange.com wrote: On Thu, 24 Oct 2013, Hans-Peter Nilsson wrote: On Mon, 21 Oct 2013, Mike Stump wrote: On Oct 21, 2013, at 3:28 AM, Vidya Praveen vidyaprav...@arm.com wrote: Tests gcc.dg/20050922-1.c and gcc.dg/20050922-2.c includes stdlib.h. This can be a issue especially since they define uint32_t. OK for 4.7, 4.8? For release branches, you'd need to transition from the theoretical to the practical. On which systems (software) does it fail? If none, then no, a back port isn't necessary. If it fails on a system (or software) on which real users use, then I'll approve it once you name the system (software) and let it bake on trunk for a week and see if anyone objects? I too would like to include this change on those branches, as recent generic newlib changes has caused these tests to break (regress) for my autotester for cris-elf testing those branches. Uhm, I'm on the fence, half-way wanting to retract my suggestion. This seems a recent bug in newlib, in which e.g. #include stdlib.h causes uint32_t to be defined, bleeding from newlib-internal include of stdint.h. On the other hand, testing that bleed isn't the purpose of these tests. Ah, that's what I was interested in, recent change in newlib; that makes it even more reasonable to me. Standard headers are supposed to include all headers they need, and if they need uint32_t (or any of the types the header that defines that type has in it) in an interface in any header that that header needs… then it isn't a gratuitous stupidity.
Re: [PATCH] RFA: Remove mudflap
On Fri, 25 Oct 2013, Richard Biener wrote: Ok with preserving options as dummy, see examples like fargument-alias Common Ignore Does nothing. Preserved for backward compatibility. I don't think we should sorry (), but we can certainly warn that mudflap was replaced by -fsanitize=address(?). Implementing the warning properly isn't a requirement for the patch to go in (but the options should be still handled, but ignored). We can followup with a patch implementing the warning. Note that warning is just a matter of putting Warn(switch %qs is no longer supported) alongside Ignore - see various options in c.opt for examples. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Fix C++0x memory model for -fno-strict-volatile-bitfields on ARM
On Fri, 25 Oct 2013, Richard Biener wrote: Btw, the C++ standard doesn't cover packed or aligned attributes so we could declare this a non-issue. Any opinion on that? I think the memory model naturally applies to packed structures (i.e., writes to fields in them should not write to any other fields except as part of a sequence of consecutive non-zero-width bit-fields, unless allow-store-data-races is set). -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH][ubsan] Add VLA bound instrumentation
On Thu, Oct 24, 2013 at 03:57:17PM -0400, Jason Merrill wrote: On 09/25/2013 08:41 AM, Marek Polacek wrote: + /* Do the instrumentation of VLAs if desired. */ + if ((flag_sanitize SANITIZE_VLA) + size !TREE_CONSTANT (size) + /* From C++1y onwards, we throw an exception on a negative length size + of an array. */ + cxx_dialect cxx1y) This code is in a completely different place from the C++1y code in cp_finish_decl; they should be in the same place. I'm also concerned that doing it here will mean adding sanitization code to template definitions, but I think we want to wait to add it until instantiation time. I've tried to implement the instrumentation in cp_finish_decl. However, the problem is with multidimensional arrays, e.g. for int x = -1; int a[1][x]; array_of_runtime_bound_p returns false, thus we don't instrument this at all, nor throw an exception in c++1y mode... I don't know what to do with that. Previous implementation in create_array_type_for_decl handled this fine. + /* Prevent bogus set-but-not-used warnings: we're definitely using + the variable. */ + if (VAR_P (size)) +DECL_READ_P (size) = 1; Use mark_rvalue_use for this. Ah, thanks. This is not needed anymore. Both ubsan testsuite + bootstrap-ubsan pass. 2013-10-25 Marek Polacek pola...@redhat.com Implement -fsanitize=vla-bound. * opts.c (common_handle_option): Handle vla-bound. * sanitizer.def (BUILT_IN_UBSAN_HANDLE_VLA_BOUND_NOT_POSITIVE): Define. * flag-types.h (enum sanitize_code): Add SANITIZE_VLA. * asan.c (initialize_sanitizer_builtins): Build BT_FN_VOID_PTR_PTR. c-family/ * c-ubsan.c: Don't include hash-table.h. (ubsan_instrument_vla): New function. * c-ubsan.h: Declare it. cp/ * decl.c (cp_finish_decl): Add VLA instrumentation. c/ * c-decl.c (grokdeclarator): Add VLA instrumentation. testsuite/ * g++.dg/ubsan/cxx1y-vla.C: New test. * c-c++-common/ubsan/vla-3.c: New test. * c-c++-common/ubsan/vla-2.c: New test. * c-c++-common/ubsan/vla-4.c: New test. * c-c++-common/ubsan/vla-1.c: New test. --- gcc/opts.c.mp 2013-10-25 11:56:57.69722 +0200 +++ gcc/opts.c 2013-10-25 11:57:04.221224139 +0200 @@ -1445,6 +1445,7 @@ common_handle_option (struct gcc_options { undefined, SANITIZE_UNDEFINED, sizeof undefined - 1 }, { unreachable, SANITIZE_UNREACHABLE, sizeof unreachable - 1 }, + { vla-bound, SANITIZE_VLA, sizeof vla-bound - 1 }, { NULL, 0, 0 } }; const char *comma; --- gcc/c-family/c-ubsan.c.mp 2013-10-25 11:56:57.699200012 +0200 +++ gcc/c-family/c-ubsan.c 2013-10-25 11:57:04.223224148 +0200 @@ -25,7 +25,6 @@ along with GCC; see the file COPYING3. #include alloc-pool.h #include cgraph.h #include gimple.h -#include hash-table.h #include output.h #include toplev.h #include ubsan.h @@ -86,8 +85,7 @@ ubsan_instrument_division (location_t lo return t; } -/* Instrument left and right shifts. If not instrumenting, return - NULL_TREE. */ +/* Instrument left and right shifts. */ tree ubsan_instrument_shift (location_t loc, enum tree_code code, @@ -157,4 +155,23 @@ ubsan_instrument_shift (location_t loc, t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node); return t; +} + +/* Instrument variable length array bound. */ + +tree +ubsan_instrument_vla (location_t loc, tree size) +{ + tree type = TREE_TYPE (size); + tree t, tt; + + t = fold_build2 (LE_EXPR, boolean_type_node, size, build_int_cst (type, 0)); + tree data = ubsan_create_data (__ubsan_vla_data, +loc, ubsan_type_descriptor (type), NULL_TREE); + data = build_fold_addr_expr_loc (loc, data); + tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_VLA_BOUND_NOT_POSITIVE); + tt = build_call_expr_loc (loc, tt, 2, data, ubsan_encode_value (size)); + t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node); + + return t; } --- gcc/c-family/c-ubsan.h.mp 2013-10-25 11:56:57.700200016 +0200 +++ gcc/c-family/c-ubsan.h 2013-10-25 11:57:04.223224148 +0200 @@ -23,5 +23,6 @@ along with GCC; see the file COPYING3. extern tree ubsan_instrument_division (location_t, tree, tree); extern tree ubsan_instrument_shift (location_t, enum tree_code, tree, tree); +extern tree ubsan_instrument_vla (location_t, tree); #endif /* GCC_C_UBSAN_H */ --- gcc/sanitizer.def.mp2013-10-25 11:56:57.703200028 +0200 +++ gcc/sanitizer.def 2013-10-25 11:57:04.225224158 +0200 @@ -297,3 +297,7 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HAN __ubsan_handle_builtin_unreachable, BT_FN_VOID_PTR, ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST) +DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_VLA_BOUND_NOT_POSITIVE, +
Re: Aliasing: look through pointer's def stmt
On Fri, 25 Oct 2013, Richard Biener wrote: you can followup with handling POINTER_PLUS_EXPR if you like. Like this? (bootstrap+testsuite on x86_64-unknown-linux-gnu) 2013-10-26 Marc Glisse marc.gli...@inria.fr gcc/ * tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Look for a POINTER_PLUS_EXPR in the defining statement. gcc/testsuite/ * gcc.dg/tree-ssa/alias-24.c: New file. -- Marc GlisseIndex: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c === --- gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-optimized } */ + +void f (const char *c, int *i) +{ + *i = 42; + __builtin_memcpy (i + 1, c, sizeof (int)); + if (*i != 42) __builtin_abort(); +} + +/* { dg-final { scan-tree-dump-not abort optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ + Property changes on: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c ___ Added: svn:keywords ## -0,0 +1 ## +Author Date Id Revision URL \ No newline at end of property Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Index: gcc/tree-ssa-alias.c === --- gcc/tree-ssa-alias.c(revision 204071) +++ gcc/tree-ssa-alias.c(working copy) @@ -567,20 +567,28 @@ void ao_ref_init_from_ptr_and_size (ao_ref *ref, tree ptr, tree size) { HOST_WIDE_INT t1, t2; ref-ref = NULL_TREE; if (TREE_CODE (ptr) == SSA_NAME) { gimple stmt = SSA_NAME_DEF_STMT (ptr); if (gimple_assign_single_p (stmt) gimple_assign_rhs_code (stmt) == ADDR_EXPR) ptr = gimple_assign_rhs1 (stmt); + else if (is_gimple_assign (stmt) + gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR + host_integerp (gimple_assign_rhs2 (stmt), 1)) + { + ao_ref_init_from_ptr_and_size (ref, gimple_assign_rhs1 (stmt), size); + ref-offset += 8 * TREE_INT_CST_LOW (gimple_assign_rhs2 (stmt)); + return; + } } if (TREE_CODE (ptr) == ADDR_EXPR) ref-base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0), ref-offset, t1, t2); else { ref-base = build2 (MEM_REF, char_type_node, ptr, null_pointer_node); ref-offset = 0;
Re: [PATCH 1/2] Handwritten part of conversion of IPA pass hooks to virtual functions
On Thu, 2013-10-10 at 09:56 +0200, Richard Biener wrote: As a general observation I don't like the if (pass-has_foo_member_function) pass-foo_member_function () style. It is totally against the idea of having virtual functions. Why not simply provide stub implementations in the base classes? (sorry about the belated response) The reason for the has_foo_member_function flags is that everywhere we use them, the style is more like this: if (pass_has_foo_member_function) { do_various_stuff (); pass-foo_member_function (); do_more_stuff (); } where the precise stuff varies between the different vfuncs. I don't think there's a portable way to check if a vfunc has been overridden, so having these flags seems the best way of handling this kind of logic. It's very similar to what we now have for the gate and execute hooks for the opt_pass base class. (In the first version of that work that I posted, I had vfuncs for the has_ logic, but simple bool flags let us avoid that virtual call, hence I went with the latter approach). The patches still apply cleanly to trunk; I'm rechecking the bootstrap and regtest. Are they OK for trunk if the testing passes? [For context: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00639.html ] Thanks Dave On Thu, Oct 10, 2013 at 2:41 AM, David Malcolm dmalc...@redhat.com wrote: * passes.c (ipa_opt_pass_d::generate_summary): New. (ipa_opt_pass_d::write_summary): New. (ipa_opt_pass_d::read_summary): New. (ipa_opt_pass_d::write_optimization_summary): New. (ipa_opt_pass_d::read_optimization_summary): New. (ipa_opt_pass_d::stmt_fixup): New. (ipa_opt_pass_d::function_transform): New. (ipa_opt_pass_d::variable_transform): New. (execute_ipa_summary_passes): Use ipa_data_. to determine if pass has a generate_summary implemenation, since we can't check a vtable entry. (execute_one_ipa_transform_pass): Likewise for function_transform, and for the function_transform_todo_flags_start field. (ipa_write_summaries_2): Likewise for write_summary. (ipa_write_optimization_summaries_1): Likewise for write_optimization_summary. (ipa_read_summaries_1): Likewise for read_summary. (ipa_read_optimization_summaries_1): Likewise for read_optimization_summary. (execute_ipa_stmt_fixups): Likewise for stmt_fixup. * tree-pass.h (struct ipa_pass_data): New. (ipa_opt_pass_d::generate_summary): Convert to virtual function. (ipa_opt_pass_d::write_summary): Likewise. (ipa_opt_pass_d::read_summary): Likewise. (ipa_opt_pass_d::write_optimization_summary): Likewise. (ipa_opt_pass_d::read_optimization_summary): Likewise. (ipa_opt_pass_d::stmt_fixup): Likewise. (ipa_opt_pass_d::function_transform_todo_flags_start): Drop; this is now part of ipa_pass_data. (ipa_opt_pass_d::function_transform): Convert to virtual function. (ipa_opt_pass_d::variable_transform): Likewise. (ipa_opt_pass_d::ipa_data_): New. (ipa_opt_pass_d::ipa_opt_pass_d): Drop function pointer fields in favor of virtual functions and an ipa_pass_data. --- gcc/passes.c| 70 ++--- gcc/tree-pass.h | 54 ++-- 2 files changed, 89 insertions(+), 35 deletions(-) diff --git a/gcc/passes.c b/gcc/passes.c index 1b2202e..625f1d3 100644 --- a/gcc/passes.c +++ b/gcc/passes.c @@ -117,7 +117,59 @@ opt_pass::opt_pass (const pass_data data, context *ctxt) { } +/* Default implementations of the various ipa_opt_pass_d hooks, + provided to detect inconsistent pass metadata. + These implementations should never be called: any pass that sets a + has_ flag it its ipa_pass_data should provide a function overriding + the corresponding default implementation. */ +void +ipa_opt_pass_d::generate_summary (void) +{ + internal_error (base generate_summary called for %s, name); +} + +void +ipa_opt_pass_d::write_summary (void) +{ + internal_error (base write_summary called for %s, name); +} + +void +ipa_opt_pass_d::read_summary (void) +{ + internal_error (base read_summary called for %s, name); +} + +void +ipa_opt_pass_d::write_optimization_summary (void) +{ + internal_error (base write_optimization_summary called for %s, name); +} + +void +ipa_opt_pass_d::read_optimization_summary (void) +{ + internal_error (base read_optimization_summary called for %s, name); +} + +void +ipa_opt_pass_d::stmt_fixup (struct cgraph_node *, gimple *) +{ + internal_error (base stmt_fixup called for %s, name); +} + +unsigned int +ipa_opt_pass_d::function_transform (struct cgraph_node *) +{ +
[PATCH] Use get_nonzero_bits to improve vectorization
Hi! The following patch makes use of the computed nonzero_bits preserved in the SSA_NAME_RANGE_INFO. I chose to write a new routine instead of improving current highest_pow2_factor, because that routine didn't care about overflows etc. and by working on ctz numbers instead of powers of two in UHWI we can handle even larger constants etc. highest_pow2_factor could very well overflow to zero etc. So, the patch introduces a new tree_ctz routine and reimplements highest_pow2_factor on top of that, plus uses tree_ctz also in get_object_alignment_2 and in the vectorizer to determine if it can avoid scalar loop for bound (and indirectly also in the analysis whether peeling for alignment is needed). With this patch, e.g. int a[1024]; void foo (int x, int y) { int i; x = -32; y = -32; for (i = x + 32; i y; i++) a[i]++; } can be vectorized without any peeling for alignment or scalar loop afterwards. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-10-25 Jakub Jelinek ja...@redhat.com * tree.c (tree_ctz): New function. * tree.h (tree_ctz): New prototype. * tree-ssanames.h (get_range_info, get_nonzero_bits): Change first argument from tree to const_tree. * tree-ssanames.c (get_range_info, get_nonzero_bits): Likewise. * tree-vectorizer.h (vect_generate_tmps_on_preheader): New prototype. * tree-vect-loop-manip.c (vect_generate_tmps_on_preheader): No longer static. * expr.c (highest_pow2_factor): Reimplemented using tree_ctz. * tree-vect-loop.c (vect_analyze_loop_operations, vect_transform_loop): Don't force scalar loop for bound just because number of iterations is unknown, only do it if it is not known to be a multiple of vectorization_factor. * builtins.c (get_object_alignment_2): Use tree_ctz on offset. --- gcc/tree.c.jj 2013-10-23 14:43:12.0 +0200 +++ gcc/tree.c 2013-10-25 15:00:55.296178794 +0200 @@ -2213,6 +2213,110 @@ tree_floor_log2 (const_tree expr) : floor_log2 (low)); } +/* Return number of known trailing zero bits in EXPR, or, if the value of + EXPR is known to be zero, the precision of it's type. */ + +int +tree_ctz (const_tree expr) +{ + if (!INTEGRAL_TYPE_P (TREE_TYPE (expr)) + !POINTER_TYPE_P (TREE_TYPE (expr))) +return 0; + + int ret1, ret2, prec = TYPE_PRECISION (TREE_TYPE (expr)); + switch (TREE_CODE (expr)) +{ +case INTEGER_CST: + ret1 = tree_to_double_int (expr).trailing_zeros (); + return MIN (ret1, prec); +case SSA_NAME: + ret1 = get_nonzero_bits (expr).trailing_zeros (); + return MIN (ret1, prec); +case PLUS_EXPR: +case MINUS_EXPR: +case BIT_IOR_EXPR: +case BIT_XOR_EXPR: +case MIN_EXPR: +case MAX_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + return MIN (ret1, ret2); +case POINTER_PLUS_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + ret2 = MIN (ret2, prec); + return MIN (ret1, ret2); +case BIT_AND_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + return MAX (ret1, ret2); +case MULT_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + return MIN (ret1 + ret2, prec); +case LSHIFT_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + if (host_integerp (TREE_OPERAND (expr, 1), 1) + ((unsigned HOST_WIDE_INT) tree_low_cst (TREE_OPERAND (expr, 1), 1) + (unsigned HOST_WIDE_INT) prec)) + { + ret2 = tree_low_cst (TREE_OPERAND (expr, 1), 1); + return MIN (ret1 + ret2, prec); + } + return ret1; +case RSHIFT_EXPR: + if (host_integerp (TREE_OPERAND (expr, 1), 1) + ((unsigned HOST_WIDE_INT) tree_low_cst (TREE_OPERAND (expr, 1), 1) + (unsigned HOST_WIDE_INT) prec)) + { + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_low_cst (TREE_OPERAND (expr, 1), 1); + if (ret1 ret2) + return ret1 - ret2; + } + return 0; +case TRUNC_DIV_EXPR: +case CEIL_DIV_EXPR: +case FLOOR_DIV_EXPR: +case ROUND_DIV_EXPR: +case EXACT_DIV_EXPR: + if (TREE_CODE (TREE_OPERAND (expr, 1)) == INTEGER_CST) + { + ret2 = tree_log2 (TREE_OPERAND (expr, 1)); + if (ret2 = 0 tree_int_cst_sgn (TREE_OPERAND (expr, 1)) == 1) + { + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + if (ret1 ret2) + return ret1 - ret2; + } + } + return 0; +CASE_CONVERT: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + if (ret1 ret1 == TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (expr, 0 + ret1 = prec; + return MIN (ret1, prec); +case SAVE_EXPR: + return tree_ctz
Re: C++ PATCH to deal with trivial but non-callable [cd]tors
Late in the C++11 process it was decided that a constructor or destructor can be trivial but not callable; as a result, everywhere that assumed that a call to a trivial function didn't need any processing needed to be updated. This patch does that. This has introduced a problem for the -fdump-ada-spec machinery, which boils down to the TYPE_METHODS field of the following structure: struct _outer { struct _inner { int x; } inner; } outer; Previously it was empty, now it contains the following destructor: function_decl 0x76c6e300 _outer type method_type 0x76c6d2a0 type void_type 0x76b0ebd0 void VOID align 8 symtab 0 alias set -1 canonical type 0x76b0ebd0 pointer_to_this pointer_type 0x76b0ec78 QI size integer_cst 0x76b04280 constant 8 unit size integer_cst 0x76b042a0 constant 1 align 8 symtab 0 alias set -1 canonical type 0x76c6d1f8 method basetype record_type 0x76c5c930 _outer arg-types tree_list 0x76c52fa0 value pointer_type 0x76c5cc78 chain tree_list 0x76c52f78 value integer_type 0x76b0e5e8 int chain tree_list 0x76b00b18 value void_type 0x76b0ebd0 void throws tree_list 0x76c527a8 public abstract external autoinline decl_3 QI file t5.h line 1 col 9 align 16 context record_type 0x76c5c930 _outer arguments parm_decl 0x76c6c480 this [...] full-name _outer::~_outer() throw () not-really-extern chain function_decl 0x76c6e500 __base_dtor The destructor is created as a side effect of the call to type_build_dtor_call added to cxx_maybe_build_cleanup by the patch: @@ -14296,7 +14300,7 @@ cxx_maybe_build_cleanup (tree decl, tsub } /* Handle ordinary C++ destructors. */ type = TREE_TYPE (decl); - if (TYPE_HAS_NONTRIVIAL_DESTRUCTOR (type)) + if (type_build_dtor_call (type)) { int flags = LOOKUP_NORMAL|LOOKUP_DESTRUCTOR; bool has_vbases = (TREE_CODE (type) == RECORD_TYPE #0 implicitly_declare_fn (kind=sfk_destructor, type=0x76c5c930, const_p=false, inherited_ctor=0x0, inherited_parms=0x0) at /home/eric/gnat/gnat-head/src/gcc/cp/method.c:1551 #1 0x00758ef5 in lazily_declare_fn (sfk=sfk_destructor, type=0x76c5c930) at /home/eric/gnat/gnat-head/src/gcc/cp/method.c:1950 #2 0x0075ee64 in lookup_fnfields_1 (type=0x76c5c930, name=0x76b0aec8) at /home/eric/gnat/gnat-head/src/gcc/cp/search.c:1471 #3 0x0075eea5 in lookup_fnfields_slot (type=0x76c5c930, name=0x76b0aec8) at /home/eric/gnat/gnat-head/src/gcc/cp/search.c:1483 #4 0x00687dc7 in type_build_dtor_call (t=0x76c5c930) at /home/eric/gnat/gnat-head/src/gcc/cp/class.c:5193 #5 0x005ef967 in cxx_maybe_build_cleanup (decl=0x76c64390, complain=3) at /home/eric/gnat/gnat-head/src/gcc/cp/decl.c:14303 #6 0x005c99aa in expand_static_init (decl=0x76c64390, init=0x0) at /home/eric/gnat/gnat-head/src/gcc/cp/decl.c:6902 #7 0x005c8d6c in cp_finish_decl (decl=0x76c64390, init=0x0, init_const_expr_p=false, asmspec_tree=0x0, flags=1) at /home/eric/gnat/gnat-head/src/gcc/cp/decl.c:6504 #8 0x006e22b5 in cp_parser_init_declarator (parser=0x76c5b068, decl_specifiers=0x7fffda00, checks=0x0, function_definition_allowed_p=true, member_p=false, declares_class_or_enum=2, function_definition_p=0x7fffda8b, maybe_range_for_decl=0x0) at /home/eric/gnat/gnat-head/src/gcc/cp/parser.c:16640 Is that expected and, consequently, should we adjust the machinery? -- Eric Botcazou
Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins
2013/10/25 Jeff Law l...@redhat.com: On 10/21/13 05:49, Ilya Enkovich wrote: Hi, This patch introduces built-in functions used by Pointers Checker and flag to enable Pointers Checker. Builtins available for user are expanded in expand_builtin. All other builtins are not allowed in expand until generic version of Pointers Cheker is implemented. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2013-10-04 Ilya Enkovich ilya.enkov...@intel.com * builtin-types.def (BT_FN_VOID_CONST_PTR): New. (BT_FN_PTR_CONST_PTR): New. (BT_FN_CONST_PTR_CONST_PTR): New. (BT_FN_PTR_CONST_PTR_SIZE): New. (BT_FN_PTR_CONST_PTR_CONST_PTR): New. (BT_FN_VOID_PTRPTR_CONST_PTR): New. (BT_FN_VOID_CONST_PTR_SIZE): New. (BT_FN_PTR_CONST_PTR_CONST_PTR_SIZE): New. * chkp-builtins.def: New. * builtins.def: include chkp-builtins.def. (DEF_CHKP_BUILTIN): New. * builtins.c (expand_builtin): Support BUILT_IN_CHKP_INIT_PTR_BOUNDS, BUILT_IN_CHKP_NULL_PTR_BOUNDS, BUILT_IN_CHKP_COPY_PTR_BOUNDS, BUILT_IN_CHKP_CHECK_PTR_LBOUNDS, BUILT_IN_CHKP_CHECK_PTR_UBOUNDS, BUILT_IN_CHKP_CHECK_PTR_BOUNDS, BUILT_IN_CHKP_SET_PTR_BOUNDS, BUILT_IN_CHKP_NARROW_PTR_BOUNDS, BUILT_IN_CHKP_STORE_PTR_BOUNDS, BUILT_IN_CHKP_GET_PTR_LBOUND, BUILT_IN_CHKP_GET_PTR_UBOUND, BUILT_IN_CHKP_BNDMK, BUILT_IN_CHKP_BNDSTX, BUILT_IN_CHKP_BNDCL, BUILT_IN_CHKP_BNDCU, BUILT_IN_CHKP_BNDLDX, BUILT_IN_CHKP_BNDRET, BUILT_IN_CHKP_INTERSECT, BUILT_IN_CHKP_ARG_BND, BUILT_IN_CHKP_NARROW, BUILT_IN_CHKP_EXTRACT_LOWER, BUILT_IN_CHKP_EXTRACT_UPPER. * common.opt (fcheck-pointers): New. * toplev.c (process_options): Check Pointers Checker is supported. * doc/extend.texi: Document Pointers Checker built-in functions. Just a few minor comments. --- a/gcc/common.opt +++ b/gcc/common.opt @@ -874,6 +874,11 @@ fbounds-check Common Report Var(flag_bounds_check) Generate code to check bounds before indexing arrays +fcheck-pointers +Common Report Var(flag_check_pointers) +Add pointers checker instrumentation. fchkp-* flags are used to +control instrumentation. Currently available for C, C++ and ObjC. + I'd probably use pointer bounds checking rather than pointers checker. It's a nit, but most folks have heard the term pointer bounds checking, but few probabaly use pointers checker. I think you make several references to pointers checker that are probably best reworded slightly to use pointer bounds checker diff --git a/gcc/toplev.c b/gcc/toplev.c index feba051..285b36d 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -1290,6 +1290,18 @@ process_options (void) if (flag_mudflap flag_lto) sorry (mudflap cannot be used together with link-time optimization); + if (flag_check_pointers) +{ + if (flag_lto) + sorry (Pointers checker is not yet fully supported for link-time optimization); What was the final resolution of this? Like jsm, this seems to me to be papering over a problem elsewhere. I'll pre-approve this patch with the terminology change and the flag_lto hack removed. jeff There are currently two known issues with LTO. The first one is ICE in LTO streamer when it reads instrumented code. The second one is unitialized flag_check_pointers when code is compiled by lto1 (do you know why it may happen BTW?). It also causes ICE beacause instrumented code is met when not expected. Of course, I'll fix these problems anyway, but I was going to allow '-fcheck-pointers -flto' only when checker testsuite has 100% pass rate with lto. Ilya
Re: [PATCH][ubsan] Add VLA bound instrumentation
On 10/25/2013 12:58 PM, Marek Polacek wrote: I've tried to implement the instrumentation in cp_finish_decl. However, the problem is with multidimensional arrays, e.g. for int x = -1; int a[1][x]; array_of_runtime_bound_p returns false, thus we don't instrument this at all, nor throw an exception in c++1y mode... Because the above is not valid under the proposed standard C++ VLA support; only the leftmost bound can be variable. I think the right place to handle both ubsan and c++1y VLA checks is in compute_array_index_type, in the block where we're calling variable_size. Jason
Re: C++ PATCH to deal with trivial but non-callable [cd]tors
On 10/25/2013 01:53 PM, Eric Botcazou wrote: This has introduced a problem for the -fdump-ada-spec machinery, which boils down to the TYPE_METHODS field of the following structure: struct _outer { struct _inner { int x; } inner; } outer; Previously it was empty, now it contains the following destructor: [snip] Is that expected and, consequently, should we adjust the machinery? In C++ all classes have destructors, but we try to defer building the implicit declaration. My patch causes us to build those implicit declarations more often, which is probably a bit of a memory regression, but it would be good for your code to handle the dtor being declared. Jason
Re: [PATCH][ubsan] Add VLA bound instrumentation
On Fri, Oct 25, 2013 at 02:17:48PM -0400, Jason Merrill wrote: On 10/25/2013 12:58 PM, Marek Polacek wrote: I've tried to implement the instrumentation in cp_finish_decl. However, the problem is with multidimensional arrays, e.g. for int x = -1; int a[1][x]; array_of_runtime_bound_p returns false, thus we don't instrument this at all, nor throw an exception in c++1y mode... Because the above is not valid under the proposed standard C++ VLA support; only the leftmost bound can be variable. I see. I think the right place to handle both ubsan and c++1y VLA checks is in compute_array_index_type, in the block where we're calling variable_size. I'm sorry, you want me to move the c++1y VLA check into compute_array_index_type, or just do the ubsan instrumentation in there? Thanks, Marek
Re: [PATCH][ubsan] Add VLA bound instrumentation
On 10/25/2013 03:03 PM, Marek Polacek wrote: On Fri, Oct 25, 2013 at 02:17:48PM -0400, Jason Merrill wrote: I think the right place to handle both ubsan and c++1y VLA checks is in compute_array_index_type, in the block where we're calling variable_size. I'm sorry, you want me to move the c++1y VLA check into compute_array_index_type, or just do the ubsan instrumentation in there? Thanks, Both. Jason
Re: question about register pairs
Some notes: I lie to gcc and tell it that $fp (reg 22) is two bytes when it's really one. Well, it's not really a lie if you map hardware registers 22 and 23 to a single register for the purposes of gcc internals. Yeah, I'm basically making those two registers into a permanent bigger register. Although it does make some other things more awkward, e.g. when you copy fp, and this gets split so you have an insn that copies the highpart of fp to another register. Not a problem, the chip can copy the whole $fp in one insn. Registers are only 8 bits on this chip. Seeing the patched code in its entirety like this, I notice that we would use HARD_REGNO_NREGS for a regno that's not ok for the mode. The core problem is that gcc has no way of dealing with register pairs as real registers - the second halves are still registers, still need to be in the reg class, still need _NREGS() values, etc. There's no way to tell gcc that a register is a set of (for example) N registers starting at 3, 5, 9, 12, or 14. That can be avoided if we put a break into the if. And then the !bad term in the loop condition becomes redundant. The problem I had to solve was to re-synchronize the counter with the even-numbered register. The naive patch counted even registers up to $fp, then checked odd (invalid) registers after that. The only other way I can think to solve this problem is to add a VALID_STARTING_REG(reg,mode) macro, but that would require changes all over the core code.
Re: [Patch, C++] Add C++ FE support for #pragma GCC ivdep
Jakub Jelinek wrote: FYI, I'm seeing various new FAILs on i686-linux: +FAIL: gcc.dg/vect/vect-ivdep-1.c (test for warnings, line ) +FAIL: gcc.dg/vect/vect-ivdep-1.c -flto (test for warnings, line ) +FAIL: gfortran.dg/vect/vect-do-concurrent-1.f90 -O (test for warnings, line ) +FAIL: g++.dg/vect/pr33426-ivdep.cc -std=gnu++98 (test for warnings, line ) +FAIL: g++.dg/vect/pr33426-ivdep.cc -std=gnu++11 (test for warnings, line ) +/* { dg-options -O3 -fopt-info-vec-optimized } */ The problem is likely the same in all cases and same as I've mailed about recently, dg-options should never be used in *.dg/vect/* testcases, instead just use dg-additional-options, because otherwise the required target options to enable vectorization aren't passed in. I have now changed it to dg-additional-options. Additionally, I added a space to the regexp to avoid the issues mentioned at http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02165.html The current version successfully regtested with --target_board 'unix{-m32,}' on my x86_64-linux – and I also confirmed that the dg-bogus still fails when commenting the pragma. Committed as Rev. 204074. Tobias Index: gcc/testsuite/ChangeLog === --- gcc/testsuite/ChangeLog (Revision 204073) +++ gcc/testsuite/ChangeLog (Arbeitskopie) @@ -1,3 +1,9 @@ +2013-10-25 Tobias Burnus bur...@net-b.de + + * g++.dg/vect/pr33426-ivdep.cc: Use dg-options. + * gfortran.dg/vect/vect-do-concurrent-1.f90: Ditto. + * testsuite/gcc.dg/vect/vect-ivdep-1.c: Ditto. + 2013-10-25 Yufeng Zhang yufeng.zh...@arm.com * gcc.dg/wmul-1.c: New test. Index: gcc/testsuite/g++.dg/vect/pr33426-ivdep.cc === --- gcc/testsuite/g++.dg/vect/pr33426-ivdep.cc (Revision 204073) +++ gcc/testsuite/g++.dg/vect/pr33426-ivdep.cc (Arbeitskopie) @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target vect_float } */ -/* { dg-options -O3 -fopt-info-vec-optimized } */ +/* { dg-additional-options -O3 -fopt-info-vec-optimized } */ /* PR other/33426 */ /* Testing whether #pragma ivdep is working. */ @@ -14,6 +14,6 @@ void foo(int n, int *a, int *b, int *c, int *d, in } /* { dg-message loop vectorized { target *-*-* } 0 } */ -/* { dg-bogus version { target *-*-* } 0 } */ -/* { dg-bogus alias { target *-*-* } 0 } */ +/* { dg-bogus version { target *-*-* } 0 } */ +/* { dg-bogus alias { target *-*-* } 0 } */ /* { dg-final { cleanup-tree-dump vect } } */ Index: gcc/testsuite/gcc.dg/vect/vect-ivdep-1.c === --- gcc/testsuite/gcc.dg/vect/vect-ivdep-1.c (Revision 204073) +++ gcc/testsuite/gcc.dg/vect/vect-ivdep-1.c (Arbeitskopie) @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target vect_float } */ -/* { dg-options -O3 -fopt-info-vec-optimized } */ +/* { dg-additional-options -O3 -fopt-info-vec-optimized } */ /* PR other/33426 */ /* Testing whether #pragma ivdep is working. */ @@ -14,6 +14,6 @@ void foo(int n, int *a, int *b, int *c, int *d, in } /* { dg-message loop vectorized { target *-*-* } 0 } */ -/* { dg-bogus version { target *-*-* } 0 } */ -/* { dg-bogus alias { target *-*-* } 0 } */ +/* { dg-bogus version { target *-*-* } 0 } */ +/* { dg-bogus alias { target *-*-* } 0 } */ /* { dg-final { cleanup-tree-dump vect } } */ Index: gcc/testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90 === --- gcc/testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90 (Revision 204073) +++ gcc/testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90 (Arbeitskopie) @@ -1,6 +1,6 @@ ! { dg-do compile } ! { dg-require-effective-target vect_float } -! { dg-options -O3 -fopt-info-vec-optimized } +! { dg-additional-options -O3 -fopt-info-vec-optimized } subroutine test(n, a, b, c) integer, value :: n @@ -12,6 +12,6 @@ subroutine test(n, a, b, c) end subroutine test ! { dg-message loop vectorized { target *-*-* } 0 } -! { dg-bogus version { target *-*-* } 0 } -! { dg-bogus alias { target *-*-* } 0 } +! { dg-bogus version { target *-*-* } 0 } +! { dg-bogus alias { target *-*-* } 0 } ! { dg-final { cleanup-tree-dump vect } }
Re: [PATCH] rewrite stack vectors
On 2013-10-10 14:07 , tsaund...@mozilla.com wrote: This makes the implementation of stack vectors simpler and easier to use. This works by making the size of the on stack storage a template argument, so the size is embedded in the type. This allows you to implicitly convert a stack_vecT, N to a vecT, va_heap *, and it will just work. Because there's no need to support stack vectors in unions we can make them be a more normal c++ class with a constructor and destructor that are nontrivial. Thanks. This looks much simpler, indeed. The patch is fine to commit. Just a couple of observations/questions: @@ -638,7 +638,7 @@ propagate_threaded_block_debug_into (basic_block dest, basic_block src) i++; } - vectree, va_stack fewvars = vNULL; + stack_vectree, alloc_count fewvars; pointer_set_t *vars = NULL; Hm, what will happen now if alloc_count == 0? If I'm following the logic, this is tied to the presence of the 'vars' local, so it seems that we are fine. Otherwise, the quick_push operation will run into trouble. /* If we're already starting with 3/4 of alloc_count, go for a @@ -646,8 +646,6 @@ propagate_threaded_block_debug_into (basic_block dest, basic_block src) VEC. */ if (i * 4 alloc_count * 3) vars = pointer_set_create (); - else if (alloc_count) -vec_stack_alloc (tree, fewvars, alloc_count); /* Now go through the initial debug stmts in DEST again, this time actually inserting in VARS or FEWVARS. Don't bother checking for diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c index 574446a..4a14607 100644 --- a/gcc/tree-vect-loop-manip.c +++ b/gcc/tree-vect-loop-manip.c @@ -107,7 +107,7 @@ typedef struct with a PHI DEF that would soon become non-dominant, and when we got to the suitable one, it wouldn't have anything to substitute any more. */ -static vecadjust_info, va_stack adjust_vec; +static vecadjust_info, va_heap adjust_vec; A file global was declared as a stack vector? Sigh. Diego.
Re: [Patch] Fix gcc.dg/20050922-*.c
On Fri, 25 Oct 2013, Mike Stump wrote: On Oct 24, 2013, at 7:33 PM, Hans-Peter Nilsson h...@bitrange.com wrote: On Thu, 24 Oct 2013, Hans-Peter Nilsson wrote: I too would like to include this change on those branches, as recent generic newlib changes has caused these tests to break (regress) for my autotester for cris-elf testing those branches. Uhm, I'm on the fence, half-way wanting to retract my suggestion. This seems a recent bug in newlib, in which e.g. #include stdlib.h causes uint32_t to be defined, bleeding from newlib-internal include of stdint.h. On the other hand, testing that bleed isn't the purpose of these tests. Ah, that's what I was interested in, recent change in newlib; that makes it even more reasonable to me. Standard headers are supposed to include all headers they need, But only as allowed by the standard. No gratuitous leaking of identifiers allowed. and if they need uint32_t (or any of the types the header that defines that type has in it) in an interface in any header that that header needs? then it isn't a gratuitous stupidity. Yeah, but in this case it was. None of the standard contents in stdlib.h need uint32_t, at face value. A look in the C99 standard make me say http://sourceware.org/ml/newlib/2013/msg00803.html. brgds, H-P
Re: [PATCH] rewrite stack vectors
On Fri, Oct 25, 2013 at 03:30:47PM -0400, Diego Novillo wrote: On 2013-10-10 14:07 , tsaund...@mozilla.com wrote: This makes the implementation of stack vectors simpler and easier to use. This works by making the size of the on stack storage a template argument, so the size is embedded in the type. This allows you to implicitly convert a stack_vecT, N to a vecT, va_heap *, and it will just work. Because there's no need to support stack vectors in unions we can make them be a more normal c++ class with a constructor and destructor that are nontrivial. Thanks. This looks much simpler, indeed. The patch is fine to commit. Just a couple of observations/questions: I don't have commit access, so can someone check it in for me? I bootstrapped and got no changes in regression tests two weeks ago, but haven't checked it since if that helps. @@ -638,7 +638,7 @@ propagate_threaded_block_debug_into (basic_block dest, basic_block src) i++; } - vectree, va_stack fewvars = vNULL; + stack_vectree, alloc_count fewvars; pointer_set_t *vars = NULL; Hm, what will happen now if alloc_count == 0? If I'm following the logic, this is tied to the presence of the 'vars' local, so it seems that we are fine. Otherwise, the quick_push operation will run into trouble. The first quick_push is ok because if we were going to add more than alloc_count vars we would have created the pointer set, and the second is safe because we explicitly check the vectors length is less than alloc_count. /* If we're already starting with 3/4 of alloc_count, go for a @@ -646,8 +646,6 @@ propagate_threaded_block_debug_into (basic_block dest, basic_block src) VEC. */ if (i * 4 alloc_count * 3) vars = pointer_set_create (); - else if (alloc_count) -vec_stack_alloc (tree, fewvars, alloc_count); /* Now go through the initial debug stmts in DEST again, this time actually inserting in VARS or FEWVARS. Don't bother checking for diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c index 574446a..4a14607 100644 --- a/gcc/tree-vect-loop-manip.c +++ b/gcc/tree-vect-loop-manip.c @@ -107,7 +107,7 @@ typedef struct with a PHI DEF that would soon become non-dominant, and when we got to the suitable one, it wouldn't have anything to substitute any more. */ -static vecadjust_info, va_stack adjust_vec; +static vecadjust_info, va_heap adjust_vec; A file global was declared as a stack vector? Sigh. indeed, istr checking and it could be moved to the stack, but involved more refactering than seemed to make sense here. Trev Diego.
wwwdocs: Broken links due to the preprocess script (was: Re: [wwwdocs, patch] gcc-4.9/changes.html: Add quip about #pragma GCC ivdep and update Fortran section)
Tobias Burnus wrote: Thanks for looking at the patch. However, the patch has a link problem. The documentation is at http://gcc.gnu.org/onlinedocs/gcc/Loop_002dSpecific-Pragmas.html That's also the link I use in the changes.html file. However, some script changes the link to: http://gcc.gnu.org/onlinedocs/gcc/Loop-Specific-Pragmas.html which won't work. Try yourself at http://gcc.gnu.org/gcc-4.9/changes.html Actually, a similar issue was reported at http://gcc.gnu.org/ml/gcc-help/2013-10/msg00132.html The reason for the broken links are the following lines in the /www/bin/preprocess script: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/wwwdocs/bin/preprocess.diff?r1=1.38r2=1.39f=h Gerald, do you still know why you added it 9 years ago? The commit comment is Use sed to work around makeinfo 4.7 brokenness. I think makeinfo is still broken, but those pages do not seem to go through the preprocess script, which means that only links to that page will change to a hyphen, breaking the links. Do you think it would be sensible to remove those lines again - or, alternatively, to run a similar script (e.g. perl -i -e 's/_002d/-/g' `find onlinedocs -name \*.html`) on the onlinedocs/. I think the impact of the the former on links is smaller. (One still needs to re-run the script on those files to restore the links.) Tobias
[Patch, C, C++] Accept GCC ivdep for 'do' and 'while', and for C++11's range-based loops
Tobias Burnus wrote: Jason Merrill wrote: I would think that a range-based loop over an array should vectorize nicely: [...] Therefore, I will send a follow up patch. Attached is that patch [C++]. [C/C++:] Additionally, as Jakub proposed and other compilers also support, the patch accepts #pragma GCC ivdep for 'do' and 'while' loops. I did an successful all-language bootstrap followed by successful regstesting on x86-64-gnu-linux. OK for committal? Tobias 2013-10-25 Tobias Burnus bur...@net-b.de gcc/c/ PR other/33426 * c-parser.c (c_parser_while_statement, c_parser_while_statement, c_parser_pragma): Add GCC ivdep support to 'do' and 'while'. (c_parser_statement_after_labels): Update calls. gcc/cp/ PR other/33426 * parser.c (cp_parser_for, cp_parser_range_for, cp_convert_range_for, cp_parser_iteration_statement, cp_parser_pragma): Add GCC ivdep support to 'do' and 'while' and to range-based loops. * cp-tree.h (cp_convert_range_for): Update prototype. * pt.c (tsubst_expr): Update call. gcc/testsuite/ PR other/33426 * gcc.dg/vect/vect-ivdep-2.c: New. * g++.dg/vect/pr33426-ivdep-2.cc: New. * g++.dg/vect/pr33426-ivdep-3.cc: New. diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 4f25078..9ccae3b 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -1157,8 +1157,8 @@ static void c_parser_statement (c_parser *); static void c_parser_statement_after_labels (c_parser *); static void c_parser_if_statement (c_parser *); static void c_parser_switch_statement (c_parser *); -static void c_parser_while_statement (c_parser *); -static void c_parser_do_statement (c_parser *); +static void c_parser_while_statement (c_parser *, bool); +static void c_parser_do_statement (c_parser *, bool); static void c_parser_for_statement (c_parser *, bool); static tree c_parser_asm_statement (c_parser *); static tree c_parser_asm_operands (c_parser *); @@ -4579,10 +4579,10 @@ c_parser_statement_after_labels (c_parser *parser) c_parser_switch_statement (parser); break; case RID_WHILE: - c_parser_while_statement (parser); + c_parser_while_statement (parser, false); break; case RID_DO: - c_parser_do_statement (parser); + c_parser_do_statement (parser, false); break; case RID_FOR: c_parser_for_statement (parser, false); @@ -4912,7 +4912,7 @@ c_parser_switch_statement (c_parser *parser) */ static void -c_parser_while_statement (c_parser *parser) +c_parser_while_statement (c_parser *parser, bool ivdep) { tree block, cond, body, save_break, save_cont; location_t loc; @@ -4927,6 +4927,11 @@ c_parser_while_statement (c_parser *parser) statement); cond = error_mark_node; } + + if (ivdep cond != error_mark_node) +cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, + build_int_cst (integer_type_node, + annot_expr_ivdep_kind)); save_break = c_break_label; c_break_label = NULL_TREE; save_cont = c_cont_label; @@ -4945,7 +4950,7 @@ c_parser_while_statement (c_parser *parser) */ static void -c_parser_do_statement (c_parser *parser) +c_parser_do_statement (c_parser *parser, bool ivdep) { tree block, cond, body, save_break, save_cont, new_break, new_cont; location_t loc; @@ -4974,7 +4979,10 @@ c_parser_do_statement (c_parser *parser) do-while statement); cond = error_mark_node; } - + if (ivdep cond != error_mark_node) +cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, + build_int_cst (integer_type_node, + annot_expr_ivdep_kind)); if (!c_parser_require (parser, CPP_SEMICOLON, expected %;%)) c_parser_skip_to_end_of_block_or_statement (parser); c_finish_loop (loc, cond, NULL, body, new_break, new_cont, false); @@ -9102,12 +9110,19 @@ c_parser_pragma (c_parser *parser, enum pragma_context context) case PRAGMA_IVDEP: c_parser_consume_pragma (parser); c_parser_skip_to_pragma_eol (parser); - if (!c_parser_next_token_is_keyword (parser, RID_FOR)) + if (!c_parser_next_token_is_keyword (parser, RID_FOR) + !c_parser_next_token_is_keyword (parser, RID_WHILE) + !c_parser_next_token_is_keyword (parser, RID_DO)) { - c_parser_error (parser, for statement expected); + c_parser_error (parser, for, while or do statement expected); return false; } - c_parser_for_statement (parser, true); + if (c_parser_next_token_is_keyword (parser, RID_FOR)) + c_parser_for_statement (parser, true); + else if (c_parser_next_token_is_keyword (parser, RID_WHILE)) + c_parser_while_statement (parser, true); + else + c_parser_do_statement (parser, true); return false; case PRAGMA_GCC_PCH_PREPROCESS: diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 507b389..692d3cc 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -4321,7 +4321,7 @@ extern int comparing_specializations; sizeof can be nested. */ extern int cp_unevaluated_operand; -extern tree cp_convert_range_for (tree, tree, tree); +extern tree cp_convert_range_for
patch adding LRA usage for ppc
I've committed the following patch to use LRA for ppc. By default reload pass is used. To use LRA, please add option -mlra. I also removed change in constraints for swap insn as it was requested by David Edelsohn. It was necessary to fix one failure of new gcc tests. I'll work on more safe fix for this later. The patch also fixes LRA crash on 32-bit SPEC2006 gamess reported by Mike Meissner. There is still LRA crash on SPEC2006 dealII in 32-bit mode with options used by Mike. It is an issue with address constraints for TImode. I'll work on this. I guess address legitimize hook will solve the problem. Currently LRA does not use this hook. This hook is also needed to fix a LRA failures on ARM. Some tests for SDmode can fail for LRA as it generates a different code than reload. If all toolchain is built with LRA, the failures will be gone. The patch was bootstrapped on ppc64 with LRA on/off. Committed as rev. 204079. 2013-10-25 Vladimir Makarov vmaka...@redhat.com * config/rs6000/rs6000-protos.h (rs6000_secondary_memory_needed_mode): New prototype. * config/rs6000/rs6000.c: Include ira.h. (TARGET_LRA_P): Redefine. (rs6000_legitimate_offset_address_p): Call legitimate_constant_pool_address_p in strict mode for LRA. (rs6000_legitimate_address_p): Ditto. (legitimate_lo_sum_address_p): Add code for LRA. Use lra_in_progress. (rs6000_emit_move): Add LRA version of code to generate load/store of SDmode values. (rs6000_secondary_memory_needed_mode): New. (rs6000_alloc_sdmode_stack_slot): Do nothing for LRA. (rs6000_secondary_reload_class): Return NO_REGS for LRA for constants, memory, and FP registers. (rs6000_lra_p): New. * config/rs6000/rs6000.h (SECONDARY_MEMORY_NEEDED_MODE): New macro. * config/rs6000/rs6000.opt (mlra): New option. * lra-spills.c (lra_final_code_change): Remove useless move insns. Index: config/rs6000/rs6000-protos.h === --- config/rs6000/rs6000-protos.h (revision 204075) +++ config/rs6000/rs6000-protos.h (working copy) @@ -126,6 +126,8 @@ extern void rs6000_split_multireg_move ( extern void rs6000_emit_le_vsx_move (rtx, rtx, enum machine_mode); extern void rs6000_emit_move (rtx, rtx, enum machine_mode); extern rtx rs6000_secondary_memory_needed_rtx (enum machine_mode); +extern enum machine_mode rs6000_secondary_memory_needed_mode (enum + machine_mode); extern rtx (*rs6000_legitimize_reload_address_ptr) (rtx, enum machine_mode, int, int, int, int *); extern bool rs6000_legitimate_offset_address_p (enum machine_mode, rtx, Index: config/rs6000/rs6000.c === --- config/rs6000/rs6000.c (revision 204075) +++ config/rs6000/rs6000.c (working copy) @@ -55,6 +55,7 @@ #include intl.h #include params.h #include tm-constrs.h +#include ira.h #include opts.h #include tree-vectorizer.h #include dumpfile.h @@ -1554,6 +1555,9 @@ static const struct attribute_spec rs600 #undef TARGET_MODE_DEPENDENT_ADDRESS_P #define TARGET_MODE_DEPENDENT_ADDRESS_P rs6000_mode_dependent_address_p +#undef TARGET_LRA_P +#define TARGET_LRA_P rs6000_lra_p + #undef TARGET_CAN_ELIMINATE #define TARGET_CAN_ELIMINATE rs6000_can_eliminate @@ -6226,7 +6230,7 @@ rs6000_legitimate_offset_address_p (enum return false; if (!reg_offset_addressing_ok_p (mode)) return virtual_stack_registers_memory_p (x); - if (legitimate_constant_pool_address_p (x, mode, strict)) + if (legitimate_constant_pool_address_p (x, mode, strict || lra_in_progress)) return true; if (GET_CODE (XEXP (x, 1)) != CONST_INT) return false; @@ -6366,19 +6370,31 @@ legitimate_lo_sum_address_p (enum machin if (TARGET_ELF || TARGET_MACHO) { + bool large_toc_ok; + if (DEFAULT_ABI != ABI_AIX DEFAULT_ABI != ABI_DARWIN flag_pic) return false; - if (TARGET_TOC) + /* LRA don't use LEGITIMIZE_RELOAD_ADDRESS as it usually calls +push_reload from reload pass code. LEGITIMIZE_RELOAD_ADDRESS +recognizes some LO_SUM addresses as valid although this +function says opposite. In most cases, LRA through different +transformations can generate correct code for address reloads. +It can not manage only some LO_SUM cases. So we need to add +code analogous to one in rs6000_legitimize_reload_address for +LOW_SUM here saying that some addresses are still valid. */ + large_toc_ok = (lra_in_progress TARGET_CMODEL != CMODEL_SMALL + small_toc_ref (x, VOIDmode)); + if (TARGET_TOC ! large_toc_ok) return false; if (GET_MODE_NUNITS (mode) != 1)
Re: [PATCH] RFA: Remove mudflap
On 10/25/13 10:26, Joseph S. Myers wrote: On Fri, 25 Oct 2013, Richard Biener wrote: Ok with preserving options as dummy, see examples like fargument-alias Common Ignore Does nothing. Preserved for backward compatibility. I don't think we should sorry (), but we can certainly warn that mudflap was replaced by -fsanitize=address(?). Implementing the warning properly isn't a requirement for the patch to go in (but the options should be still handled, but ignored). We can followup with a patch implementing the warning. Note that warning is just a matter of putting Warn(switch %qs is no longer supported) alongside Ignore - see various options in c.opt for examples. Thanks. It turns out there's an example just after -fmudflapir :-) jeff
patch to fix PR58759
The following patch fixes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58759 The patch was successfully bootstrapped and tested on x86/x86-64. Committed as rev. 204080. 2013-10-25 Vladimir Makarov vmaka...@redhat.com PR rtl-optimization/58759 * lra-constraints.c (lra_constraints): Remove wrong condition to remove insn setting up an equivalent pseudo. 2013-10-25 Vladimir Makarov vmaka...@redhat.com PR rtl-optimization/58759 * gcc.target/i386/pr58759.c: New. Index: lra-constraints.c === --- lra-constraints.c (revision 204069) +++ lra-constraints.c (working copy) @@ -3975,18 +3975,6 @@ lra_constraints (bool first_p) dest_reg = SUBREG_REG (dest_reg); if ((REG_P (dest_reg) (x = get_equiv_substitution (dest_reg)) != dest_reg - /* Check that this is actually an insn setting up - the equivalence. */ - (in_list_p (curr_insn, - ira_reg_equiv - [REGNO (dest_reg)].init_insns) - /* Init insns may contain not all insns setting - up equivalence as we have live range - splitting. So here we use another condition - to check insn setting up the equivalence - which should be removed, e.g. in case when - the equivalence is a constant. */ - || ! MEM_P (x)) /* Remove insns which set up a pseudo whose value can not be changed. Such insns might be not in init_insns because we don't update equiv data @@ -3999,8 +3987,10 @@ lra_constraints (bool first_p) secondary memory movement. Then the pseudo is spilled and we use the equiv constant. In this case we should remove the additional insn and - this insn is not init_insns list. */ + this insn is not init_insns list. */ (! MEM_P (x) || MEM_READONLY_P (x) + /* Check that this is actually an insn setting + up the equivalence. */ || in_list_p (curr_insn, ira_reg_equiv [REGNO (dest_reg)].init_insns))) Index: testsuite/gcc.target/i386/pr58759.c === --- testsuite/gcc.target/i386/pr58759.c (revision 0) +++ testsuite/gcc.target/i386/pr58759.c (working copy) @@ -0,0 +1,38 @@ +/* { dg-do run } */ +/* { dg-options -O2 } */ + + +int a, b, c, d, e, f, h, l, m, n, k, o; +long long g; + +struct S +{ + int f1; + int f2; + int f3; + int f4; +}; + +static struct S i = {0,0,0,0}, j; + +void +foo () +{ + m = 1 d; + n = b + c; + o = k 1; + f = 0 == e; +} + +int +main () +{ + for (; h 1; h++) +{ + g = 1 | (0 1 - a ? 0 : a); + foo (); + for (l = 0; l 3; l++) + j = i; +} + return 0; +}
[PATCH] make gengtype more robust against user error
This patch addresses various forms of failure described in http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01974.html It adds a default: gcc_unreachable(); to the autogenerated switch() statement in the routines for a base class, converting various kinds of tag errors from leading to silent lack-of-field traversal into giving run-time assertion failures (addressing (F) and (G)) It also issues an error within gengtype itself for a missing desc (failure B), turning this into an error message from gengtype. I found another potential failure mode: (H) If you had: class GTY((desc(%0.kind), tag(0))) foo { public: int kind; tree p; }; class GTY((tag(1))) bar : public foo { public: tree q; }; static GTY(()) foo *dummy_foo; and there are no explicit pointers to bar in the code, gengtype treated it as unused, and silently omitted the case for it from foo's marking routine. I've updated set_gc_type_used so that it propagates usage down into subclasses (which recurses), fixing this issue. To do this efficiently we need to track subclasses in within gengtype, so the patch also adds that, and uses it to eliminate an O(N^2). Note that for error (G), if a class within the hierarchy omits a GTY marker, gengtype doesn't parse it at all, and so doesn't parse the inheritance information - so we can't issue a warning about this. However, the lack of a tag will trigger a run-time assertion failure due to hitting the default: gcc_unreachable(); in the switch. The patch also adds a paragraph to the docs, spelling out the need for evary class in such a hierarchy to have a GTY marker. I believe this addresses all of the silent-lack-of-field-traversal issues, converting them to gengtype errors or runtime assertions. It also adds a handler for (E), turning this from a failure to compile bogus C to a specific error in gengtype. I'm bootstrapping/regtesting now. OK for trunk if that passes? gcc/ * doc/gty.texi (Inheritance and GTY): Make it clear that to use autogenerated markers for a class-hierarchy, every class must have a GTY marker. * gengtype.h (struct type): Add linked list of subclasses to the s member of the union. (add_subclass): New decl. * gengtype-state.c (read_state_struct_type): Set up subclass linked list. * gengtype.c (get_ultimate_base_class): New. (add_subclass): New. (new_structure): Set up subclass linked list. (set_gc_used_type): Propagate usage information to subclasses. (output_mangled_typename): Use get_ultimate_base_class. (walk_subclasses): Use the subclass linked list, avoiding an O(N^2) when writing out all types. (walk_type): Issue an error if the base class is missing a tag, rather than generating bogus C code. Add a gcc_unreachable default case, in case people omit tags from concrete subclasses, or get the values wrong. (write_func_for_structure): Issue an error for subclasses for which the base doesn't have a desc, since otherwise the autogenerated routines for the base would silently fail to visit any subclass fields. (write_root): Use get_ultimate_base_class, tweaking constness of tp to match that function's signature. --- gcc/doc/gty.texi | 7 +++ gcc/gengtype-state.c | 2 + gcc/gengtype.c | 119 ++- gcc/gengtype.h | 10 + 4 files changed, 109 insertions(+), 29 deletions(-) diff --git a/gcc/doc/gty.texi b/gcc/doc/gty.texi index 090f6a6..a64d110 100644 --- a/gcc/doc/gty.texi +++ b/gcc/doc/gty.texi @@ -497,6 +497,13 @@ The base class and its discriminator must be identified using the ``desc'' option. Each concrete subclass must use the ``tag'' option to identify which value of the discriminator it corresponds to. +Every class in the hierarchy must have a @code{GTY(())} marker, as +gengtype will only attempt to parse classes that have such a marker +@footnote{Classes lacking such a marker will not be identified as being +part of the hierarchy, and so the marking routines will not handle them, +leading to a assertion failure within the marking routines due to an +unknown tag value (assuming that assertions are enabled).}. + @smallexample class GTY((desc(%h.kind), tag(0))) example_base @{ diff --git a/gcc/gengtype-state.c b/gcc/gengtype-state.c index 1e9fade..fda473a 100644 --- a/gcc/gengtype-state.c +++ b/gcc/gengtype-state.c @@ -1615,6 +1615,8 @@ read_state_struct_type (type_p type) read_state_lang_bitmap ((type-u.s.bitmap)); read_state_type ((type-u.s.lang_struct)); read_state_type ((type-u.s.base_class)); + if (type-u.s.base_class) + add_subclass (type-u.s.base_class, type); } else { diff --git a/gcc/gengtype.c b/gcc/gengtype.c index 31e0f99..f35952e 100644 --- a/gcc/gengtype.c +++ b/gcc/gengtype.c @@ -137,6 +137,16 @@ xasprintf (const char *format, ...)
Re: [Patch, C, C++] Accept GCC ivdep for 'do' and 'while', and for C++11's range-based loops
On Fri, 25 Oct 2013, Tobias Burnus wrote: Tobias Burnus wrote: Jason Merrill wrote: I would think that a range-based loop over an array should vectorize nicely: [...] Therefore, I will send a follow up patch. Attached is that patch [C++]. [C/C++:] Additionally, as Jakub proposed and other compilers also support, the patch accepts #pragma GCC ivdep for 'do' and 'while' loops. The C front-end changes are OK. -- Joseph S. Myers jos...@codesourcery.com
[PATCH] Compute nonzero_bits from __builtin_unreachable assertions
Hi! And here is a patch that allows vectorization without peeling for alignment and scalar loop for bound even for fn2, fn3 and fn4 in the following testcase, though as with the range __builtin_unreachable () notes, it is quite fragile, because it only works if there are no immediate uses of the tested SSA_NAME before the assertion. Perhaps more reliable way would be to convert those assertions info __builtin_assume_aligned, but that has the disadvantage that it's first argument is a pointer and it returns a pointer, so we'd need to cast integers to pointers and back, or add ASSUME_ALIGNED internal function. Bootstrapped/regtested on x86_64-linux and i686-linux. int a[1024]; void fn1 (int x, int y) { int i; x = -32; y = -32; for (i = x + 32; i y; i++) a[i]++; } void fn2 (int x, int y) { int i; if (x 31) __builtin_unreachable (); if (y 31) __builtin_unreachable (); for (i = x + 32; i x + y; i++) a[i]++; } void fn3 (int x, int y) { int i; if (x % 32) __builtin_unreachable (); if (y % 32) __builtin_unreachable (); for (i = x + 32; i x + y; i++) a[i]++; } void fn3 (int x, int y) { int i; if ((x % 32) != 0) __builtin_unreachable (); if ((y % 32) != 0) __builtin_unreachable (); for (i = x + 32; i x + y; i++) a[i]++; } 2013-10-25 Jakub Jelinek ja...@redhat.com * tree-vrp.c (maybe_set_nonzero_bits): New function. (remove_range_assertions): Call it. --- gcc/tree-vrp.c.jj 2013-10-24 14:32:29.0 +0200 +++ gcc/tree-vrp.c 2013-10-25 21:21:35.183092937 +0200 @@ -6459,6 +6459,60 @@ check_all_array_refs (void) } } +/* Handle + _4 = x_3 31; + if (_4 != 0) + goto bb 6; + else + goto bb 7; + bb 6: + __builtin_unreachable (); + bb 7: + x_5 = ASSERT_EXPR x_3, ...; + If x_3 has no other immediate uses (checked by caller), + var is the x_3 var from ASSERT_EXPR, we can clear low 5 bits + from the non-zero bitmask. */ + +static void +maybe_set_nonzero_bits (basic_block bb, tree var) +{ + edge e = single_pred_edge (bb); + basic_block cond_bb = e-src; + gimple stmt = last_stmt (cond_bb); + tree cst; + + if (stmt == NULL + || gimple_code (stmt) != GIMPLE_COND + || gimple_cond_code (stmt) != ((e-flags EDGE_TRUE_VALUE) +? EQ_EXPR : NE_EXPR) + || TREE_CODE (gimple_cond_lhs (stmt)) != SSA_NAME + || !integer_zerop (gimple_cond_rhs (stmt))) +return; + + stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (stmt)); + if (!is_gimple_assign (stmt) + || gimple_assign_rhs_code (stmt) != BIT_AND_EXPR + || TREE_CODE (gimple_assign_rhs2 (stmt)) != INTEGER_CST) +return; + if (gimple_assign_rhs1 (stmt) != var) +{ + gimple stmt2; + + if (TREE_CODE (gimple_assign_rhs1 (stmt)) != SSA_NAME) + return; + stmt2 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt)); + if (!gimple_assign_cast_p (stmt2) + || gimple_assign_rhs1 (stmt2) != var + || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2)) + || (TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (stmt))) + != TYPE_PRECISION (TREE_TYPE (var + return; +} + cst = gimple_assign_rhs2 (stmt); + set_nonzero_bits (var, (get_nonzero_bits (var) + ~tree_to_double_int (cst))); +} + /* Convert range assertion expressions into the implied copies and copy propagate away the copies. Doing the trivial copy propagation here avoids the need to run the full copy propagation pass after @@ -6576,8 +6630,11 @@ remove_range_assertions (void) } } if (ok) - set_range_info (var, SSA_NAME_RANGE_INFO (lhs)-min, - SSA_NAME_RANGE_INFO (lhs)-max); + { + set_range_info (var, SSA_NAME_RANGE_INFO (lhs)-min, + SSA_NAME_RANGE_INFO (lhs)-max); + maybe_set_nonzero_bits (bb, var); + } } } Jakub
[google gcc-4_8] write ggc_memory to gcda file
Hi, This patch writes out ggc_memory to gcda files. Google branches only. Test is ongoing. OK to check-in after test passes? -Rong 2013-10-25 Rong Xu x...@google.com * contrib/profile_tool (ModuleInfo): write out ggc_memory to gcda file. * gcc/gcov-io.c (gcov_read_module_info): Ditto. * libgcc/dyn-ipa.c (gcov_write_module_info): Ditto. Index: contrib/profile_tool === --- contrib/profile_tool(revision 204080) +++ contrib/profile_tool(working copy) @@ -674,6 +674,7 @@ class ModuleInfo(DataObject): self.module_id = reader.ReadWord() self.is_primary = reader.ReadWord() self.flags = reader.ReadWord() +self.ggc_memory = reader.ReadWord() self.language = reader.ReadWord() self.num_quote_paths = reader.ReadWord() self.num_bracket_paths = reader.ReadWord() @@ -710,6 +711,7 @@ class ModuleInfo(DataObject): writer.WriteWord(self.is_primary) writer.WriteWord(self.flags) writer.WriteWord(self.language) +writer.WriteWord(self.ggc_memory) writer.WriteWord(self.num_quote_paths) writer.WriteWord(self.num_bracket_paths) writer.WriteWord(self.num_system_paths) Index: gcc/gcov-io.c === --- gcc/gcov-io.c (revision 204080) +++ gcc/gcov-io.c (working copy) @@ -591,14 +591,15 @@ gcov_read_module_info (struct gcov_module_info *mo mod_info-ident = gcov_read_unsigned (); mod_info-is_primary = gcov_read_unsigned (); mod_info-flags = gcov_read_unsigned (); - mod_info-lang = gcov_read_unsigned (); + mod_info-lang = gcov_read_unsigned (); + mod_info-ggc_memory = gcov_read_unsigned (); mod_info-num_quote_paths = gcov_read_unsigned (); mod_info-num_bracket_paths = gcov_read_unsigned (); mod_info-num_system_paths = gcov_read_unsigned (); mod_info-num_cpp_defines = gcov_read_unsigned (); mod_info-num_cpp_includes = gcov_read_unsigned (); mod_info-num_cl_args = gcov_read_unsigned (); - len -= 10; + len -= 11; filename_len = gcov_read_unsigned (); mod_info-da_filename = (char *) xmalloc (filename_len * Index: libgcc/dyn-ipa.c === --- libgcc/dyn-ipa.c(revision 204080) +++ libgcc/dyn-ipa.c(working copy) @@ -2095,7 +2095,7 @@ gcov_write_module_info (const struct gcov_info *mo len += 1; /* Each string is lead by a length. */ } - len += 10; /* 9 more fields */ + len += 11; /* 11 more fields */ gcov_write_tag_length (GCOV_TAG_MODULE_INFO, len); gcov_write_unsigned (module_info-ident); @@ -2104,6 +2104,7 @@ gcov_write_module_info (const struct gcov_info *mo SET_MODULE_INCLUDE_ALL_AUX (module_info); gcov_write_unsigned (module_info-flags); gcov_write_unsigned (module_info-lang); + gcov_write_unsigned (module_info-ggc_memory); gcov_write_unsigned (module_info-num_quote_paths); gcov_write_unsigned (module_info-num_bracket_paths); gcov_write_unsigned (module_info-num_system_paths);
[GOOGLE] Fix a bug when profile propagation handles infinite loop.
If the propagation finds an infinite look, if the in-edge count is non-zero, then it will cause compiler go into infinite loop when building with AutoFDO. Bootstrapped and regression test on-going. OK for google-4_8 branch? Thanks, Dehao Index: gcc/auto-profile.c === --- gcc/auto-profile.c (revision 204027) +++ gcc/auto-profile.c (working copy) @@ -1287,6 +1287,7 @@ afdo_propagate (void) { basic_block bb; bool changed = true; + int i = 0; FOR_ALL_BB (bb) { @@ -1295,7 +1296,7 @@ afdo_propagate (void) bb-flags |= BB_ANNOTATED; } - while (changed) + while (changed i++ PARAM_VALUE (PARAM_AUTOFDO_MAX_PROPAGATE_ITERATIONS)) { changed = false; Index: gcc/params.def === --- gcc/params.def (revision 204027) +++ gcc/params.def (working copy) @@ -460,6 +460,14 @@ DEFPARAM(PARAM_MAX_PREDICTED_ITERATIONS, The maximum number of loop iterations we predict statically, 100, 0, 0) +/* This parameter controls the maximum iterations that AutoFDO profile + prpagation algorithm will run for a specific CFG. */ + +DEFPARAM(PARAM_AUTOFDO_MAX_PROPAGATE_ITERATIONS, + max-autofdo-max-propagate-iterations, + The maximum number of AutoFDO profile propagation iterations, + 1000, 0, 0) + /* This parameter controls the probability of builtin_expect. The default value is 90%. This empirical value is obtained through the weighted probability of FDO counters (with the FDO count value as the weight)
Re: [PATCH, SH] Add support for inlined builtin_strncmp
Christian Bruel christian.br...@st.com wrote: No regressions for -m2 and -m4 for sh-elf. OK for trunk ? OK with the change of ChangeLog entry suggested by your another mail. Thanks! Regards, kaz
Re: [google gcc-4_8] write ggc_memory to gcda file
Looks good. David On Fri, Oct 25, 2013 at 4:04 PM, Rong Xu x...@google.com wrote: Hi, This patch writes out ggc_memory to gcda files. Google branches only. Test is ongoing. OK to check-in after test passes? -Rong
Re: [GOOGLE] Fix a bug when profile propagation handles infinite loop.
What is the usual number of iterations? David On Fri, Oct 25, 2013 at 4:10 PM, Dehao Chen de...@google.com wrote: If the propagation finds an infinite look, if the in-edge count is non-zero, then it will cause compiler go into infinite loop when building with AutoFDO. Bootstrapped and regression test on-going. OK for google-4_8 branch? Thanks, Dehao Index: gcc/auto-profile.c === --- gcc/auto-profile.c (revision 204027) +++ gcc/auto-profile.c (working copy) @@ -1287,6 +1287,7 @@ afdo_propagate (void) { basic_block bb; bool changed = true; + int i = 0; FOR_ALL_BB (bb) { @@ -1295,7 +1296,7 @@ afdo_propagate (void) bb-flags |= BB_ANNOTATED; } - while (changed) + while (changed i++ PARAM_VALUE (PARAM_AUTOFDO_MAX_PROPAGATE_ITERATIONS)) { changed = false; Index: gcc/params.def === --- gcc/params.def (revision 204027) +++ gcc/params.def (working copy) @@ -460,6 +460,14 @@ DEFPARAM(PARAM_MAX_PREDICTED_ITERATIONS, The maximum number of loop iterations we predict statically, 100, 0, 0) +/* This parameter controls the maximum iterations that AutoFDO profile + prpagation algorithm will run for a specific CFG. */ + +DEFPARAM(PARAM_AUTOFDO_MAX_PROPAGATE_ITERATIONS, + max-autofdo-max-propagate-iterations, + The maximum number of AutoFDO profile propagation iterations, + 1000, 0, 0) + /* This parameter controls the probability of builtin_expect. The default value is 90%. This empirical value is obtained through the weighted probability of FDO counters (with the FDO count value as the weight)