Re: [RFC PATCH, i386]: Remove special PIC related __cpuid definitions from config/i386/cpuid.h
On Thu, Oct 16, 2014 at 12:25 PM, Uros Bizjak <ubiz...@gmail.com> wrote:

Now that %ebx is also allocatable in PIC modes, we can cleanup config/i386/cpuid considerably. I propose to remove all PIC related specializations of __cpuid and __cpuid_count and protect the compilation with #if __GNUC__ >= 5.

The only drawback would be that non-bootstrapped build with gcc < 5.0 will ignore -march=native, but I think this should be acceptable.

I'm worried about that. Can't you instead keep the current cpuid.h stuff as is, just add __GNUC__ < 5 to that, so it treats GCC 5+ PIC as if __PIC__ wasn't defined? Or, at least use cpuid.h even for older GCC if __PIC__ is not defined (or __x86_64__ is defined and not medium/large PIC model)?

Do we really care that much about non-bootstrapped build? I don't see

At least on Linux, driver-i386.c should not be built with PIC normally, so at least changing #if __GNUC__ >= 5 to #if defined(__GNUC__) && (__GNUC__ >= 5 || !defined(__PIC__)) would limit the -march=native change for non-bootstrapped compilers to Darwin only (or what other targets use PIC by default?).

Yes, this would work for me - the goal is to keep only one universal __cpuid (and __cpuid_count) define, and the above condition fits this goal.

I have committed the attached patch to mainline SVN.

2014-10-17  Uros Bizjak  <ubiz...@gmail.com>

	* config/i386/cpuid.h (__cpuid): Remove definitions that handle %ebx
	register in a special way.
	(__cpuid_count): Ditto.
	* config/i386/driver-i386.h: Protect with
	#if defined(__GNUC__) && (__GNUC__ >= 5 || !defined(__PIC__)).
	(host_detect_local_cpu): Mention that GCC with non-fixed %ebx
	is required to compile the function.

Bootstrapped and regression tested on x86_64-linux-gnu.

Uros.
Index: config/i386/cpuid.h
===
--- config/i386/cpuid.h (revision 216298)
+++ config/i386/cpuid.h (working copy)
@@ -146,56 +146,7 @@
 #define signature_VORTEX_ecx 0x436f5320
 #define signature_VORTEX_edx 0x36387865
 
-#if defined(__i386__) && defined(__PIC__)
-/* %ebx may be the PIC register.  */
-#if __GNUC__ >= 3
 #define __cpuid(level, a, b, c, d) \
-  __asm__ ("xchg{l}\t{%%}ebx, %k1\n\t" \
-           "cpuid\n\t" \
-           "xchg{l}\t{%%}ebx, %k1\n\t" \
-           : "=a" (a), "=&r" (b), "=c" (c), "=d" (d) \
-           : "0" (level))
-
-#define __cpuid_count(level, count, a, b, c, d) \
-  __asm__ ("xchg{l}\t{%%}ebx, %k1\n\t" \
-           "cpuid\n\t" \
-           "xchg{l}\t{%%}ebx, %k1\n\t" \
-           : "=a" (a), "=&r" (b), "=c" (c), "=d" (d) \
-           : "0" (level), "2" (count))
-#else
-/* Host GCCs older than 3.0 weren't supporting Intel asm syntax
-   nor alternatives in i386 code.  */
-#define __cpuid(level, a, b, c, d) \
-  __asm__ ("xchgl\t%%ebx, %k1\n\t" \
-           "cpuid\n\t" \
-           "xchgl\t%%ebx, %k1\n\t" \
-           : "=a" (a), "=&r" (b), "=c" (c), "=d" (d) \
-           : "0" (level))
-
-#define __cpuid_count(level, count, a, b, c, d) \
-  __asm__ ("xchgl\t%%ebx, %k1\n\t" \
-           "cpuid\n\t" \
-           "xchgl\t%%ebx, %k1\n\t" \
-           : "=a" (a), "=&r" (b), "=c" (c), "=d" (d) \
-           : "0" (level), "2" (count))
-#endif
-#elif defined(__x86_64__) && (defined(__code_model_medium__) || defined(__code_model_large__)) && defined(__PIC__)
-/* %rbx may be the PIC register.  */
-#define __cpuid(level, a, b, c, d) \
-  __asm__ ("xchg{q}\t{%%}rbx, %q1\n\t" \
-           "cpuid\n\t" \
-           "xchg{q}\t{%%}rbx, %q1\n\t" \
-           : "=a" (a), "=&r" (b), "=c" (c), "=d" (d) \
-           : "0" (level))
-
-#define __cpuid_count(level, count, a, b, c, d) \
-  __asm__ ("xchg{q}\t{%%}rbx, %q1\n\t" \
-           "cpuid\n\t" \
-           "xchg{q}\t{%%}rbx, %q1\n\t" \
-           : "=a" (a), "=&r" (b), "=c" (c), "=d" (d) \
-           : "0" (level), "2" (count))
-#else
-#define __cpuid(level, a, b, c, d) \
   __asm__ ("cpuid\n\t" \
            : "=a" (a), "=b" (b), "=c" (c), "=d" (d) \
            : "0" (level))
@@ -204,8 +155,8 @@
   __asm__ ("cpuid\n\t" \
            : "=a" (a), "=b" (b), "=c" (c), "=d" (d) \
            : "0" (level), "2" (count))
-#endif
+
 
 /* Return highest supported input value for cpuid instruction.
    ext can be either 0x0 or 0x800 to return highest supported value for
    basic or extended cpuid information.  Function returns 0 if cpuid
Index: config/i386/driver-i386.c
===
---
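For readers who want to see how the agreed guard behaves, here is a minimal sketch. It is not code from the patch: plain_cpuid_usable and HOST_PLAIN_CPUID_USABLE are hypothetical names, and the function only restates the condition "#if defined(__GNUC__) && (__GNUC__ >= 5 || !defined(__PIC__))" so that the four cases can be checked.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: restates the guard
     #if defined(__GNUC__) && (__GNUC__ >= 5 || !defined(__PIC__))
   as a function.  gnuc_major == 0 stands for "not a GNU-compatible
   compiler"; pic says whether __PIC__ would be defined.  */
bool
plain_cpuid_usable (int gnuc_major, bool pic)
{
  return gnuc_major > 0 && (gnuc_major >= 5 || !pic);
}

/* The same decision, evaluated for the compiler building this file.  */
#if defined(__GNUC__) && (__GNUC__ >= 5 || !defined(__PIC__))
# define HOST_PLAIN_CPUID_USABLE 1
#else
# define HOST_PLAIN_CPUID_USABLE 0
#endif
```

Under this condition only a pre-5 host compiler building with PIC loses the detection code, which matches the "Darwin only" estimate in the discussion above.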
Re: [PATCH Fortran] rename gfc_warning_cmdline to gfc_warning_now_2
Manuel López-Ibáñez wrote:

This patch is mostly cleanups, sorry for the churn. The next one will be far more interesting.

Bootregtested on x86_64-linux-gnu.

OK?

Looks good to me. Thanks!

Tobias

gcc/fortran/ChangeLog:

2014-10-16  Manuel López-Ibáñez  <m...@gcc.gnu.org>

	PR fortran/44054
	* gfortran.h (gfc_warning_cmdline): Rename as gfc_warning_now_2.
	(gfc_error_cmdline): Rename as gfc_error_now_2.
	* error.c (gfc_diagnostic_build_locus_prefix): Remove trailing space.
	(gfc_diagnostic_starter): Add space between locus and prefix.
	(gfc_warning_now_2): Renamed from gfc_warning_cmdline.
	(gfc_error_now_2): Renamed from gfc_error_cmdline.
	* scanner.c (add_path_to_list): Use gfc_warning_now_2.
	(load_line): Likewise.
	(load_file): Likewise.
	* options.c (gfc_post_options): Update all renamed functions.
Re: [PATCH diagnostics/fortran] dynamically generate locations from offset + handle %C
Manuel López-Ibáñez wrote:

This patch adds handling of Fortran %C using the common diagnostics machinery. This is achieved by dynamically generating a location given a location and an offset. This only works for non-macro line-maps (for now), but this is OK since Fortran does not have virtual locations (and I'm afraid it won't have them in the foreseeable future).

Dodji, are the linemap_asserts() appropriate? I tried to follow your previous comments whenever possible.

Bootregtested on x86_64-linux-gnu.

OK?

From my side, the patch is OK. Thanks again for your diagnostic work (for this patch set and in general)!

Tobias

libcpp/ChangeLog:

2014-10-16  Manuel López-Ibáñez  <m...@gcc.gnu.org>

	PR fortran/44054
	* include/line-map.h (linemap_position_for_loc_and_offset): Declare.
	* line-map.c (linemap_position_for_loc_and_offset): New.

gcc/fortran/ChangeLog:

2014-10-16  Manuel López-Ibáñez  <m...@gcc.gnu.org>

	PR fortran/44054
	* gfortran.h (warn_use_without_only): Remove.
	(gfc_diagnostics_finish): Declare.
	* error.c: Include tree-diagnostics.h
	(gfc_format_decoder): New.
	(gfc_diagnostics_init): Use gfc_format_decoder. Set default caret char.
	(gfc_diagnostics_finish): Restore tree diagnostics defaults, but keep
	gfc_diagnostics_starter and finalizer. Restore default caret.
	* options.c: Remove all uses of warn_use_without_only.
	* lang.opt (Wuse-without-only): Add Var.
	* module.c (gfc_use_module): Use gfc_warning_now_2.
	* f95-lang.c (gfc_be_parse_file): Call gfc_diagnostics_finish.

gcc/testsuite/ChangeLog:

2014-10-16  Manuel López-Ibáñez  <m...@gcc.gnu.org>

	PR fortran/44054
	* lib/gfortran-dg.exp: Update regexp to match locus and message
	without caret.
	* gfortran.dg/use_without_only_1.f90: Add column numbers.
[libatomic PATCH] Fix libatomic behavior for big endian toolchain
Hi,

I noticed a problem in how libatomic implements the builtin functions when the parameter is smaller than a word. The implementation shifts the parameter value into position within a word and then stores the whole word. However, the shift amount is wrong for big endian.

This patch fixes the libatomic builtin function behavior for big endian toolchains.

Is it OK for trunk?

Shiva

2014-10-17  Shiva Chen  <shiva0...@gmail.com>

	Fix libatomic behavior for big endian toolchain
	* libatomic/cas_n.c: Fix shift amount for big endian toolchain
	* libatomic/config/arm/exch_n.c: Fix shift amount for big endian toolchain
	* libatomic/exch_n.c: Fix shift amount for big endian toolchain
	* libatomic/fop_n.c: Fix shift amount for big endian toolchain

diff --git a/libatomic/cas_n.c b/libatomic/cas_n.c
index 801262d..aea49f0 100644
--- a/libatomic/cas_n.c
+++ b/libatomic/cas_n.c
@@ -60,7 +60,11 @@ SIZE(libat_compare_exchange) (UTYPE *mptr, UTYPE *eptr, UTYPE newval,
   if (N < WORDSIZE)
     {
       wptr = (UWORD *)((uintptr_t)mptr & -WORDSIZE);
-      shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT) ^ SIZE(INVERT_MASK);
+#ifdef __ARMEB__
+      shift = ((WORDSIZE - N - ((uintptr_t)mptr % WORDSIZE)) * CHAR_BIT);
+#else
+      shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT);
+#endif
       mask = SIZE(MASK) << shift;
     }
   else
diff --git a/libatomic/config/arm/exch_n.c b/libatomic/config/arm/exch_n.c
index c90d57f..0d71c5a 100644
--- a/libatomic/config/arm/exch_n.c
+++ b/libatomic/config/arm/exch_n.c
@@ -88,7 +88,11 @@ SIZE(libat_exchange) (UTYPE *mptr, UTYPE newval, int smodel)
   __atomic_thread_fence (__ATOMIC_SEQ_CST);
 
   wptr = (UWORD *)((uintptr_t)mptr & -WORDSIZE);
-  shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT) ^ INVERT_MASK_1;
+#ifdef __ARMEB__
+  shift = ((WORDSIZE - N - ((uintptr_t)mptr % WORDSIZE)) * CHAR_BIT);
+#else
+  shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT);
+#endif
   mask = MASK_1 << shift;
   wnewval = newval << shift;
 
diff --git a/libatomic/exch_n.c b/libatomic/exch_n.c
index 23558b0..e293d0b 100644
--- a/libatomic/exch_n.c
+++ b/libatomic/exch_n.c
@@ -77,7 +77,11 @@ SIZE(libat_exchange) (UTYPE *mptr, UTYPE newval, int smodel)
   if (N < WORDSIZE)
     {
       wptr = (UWORD *)((uintptr_t)mptr & -WORDSIZE);
-      shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT) ^ SIZE(INVERT_MASK);
+#ifdef __ARMEB__
+      shift = ((WORDSIZE - N - ((uintptr_t)mptr % WORDSIZE)) * CHAR_BIT);
+#else
+      shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT);
+#endif
       mask = SIZE(MASK) << shift;
     }
   else
diff --git a/libatomic/fop_n.c b/libatomic/fop_n.c
index 4a18da9..b3184b7 100644
--- a/libatomic/fop_n.c
+++ b/libatomic/fop_n.c
@@ -113,7 +113,11 @@ SIZE(C2(libat_fetch_,NAME)) (UTYPE *mptr, UTYPE opval, int smodel)
   pre_barrier (smodel);
 
   wptr = (UWORD *)mptr;
+#ifdef __ARMEB__
+  shift = (WORDSIZE - N) * CHAR_BIT;
+#else
   shift = 0;
+#endif
   mask = -1;
 
   wopval = (UWORD)opval << shift;
@@ -137,7 +141,11 @@ SIZE(C3(libat_,NAME,_fetch)) (UTYPE *mptr, UTYPE opval, int smodel)
   pre_barrier (smodel);
 
   wptr = (UWORD *)mptr;
+#ifdef __ARMEB__
+  shift = (WORDSIZE - N) * CHAR_BIT;
+#else
   shift = 0;
+#endif
   mask = -1;
 
   wopval = (UWORD)opval << shift;
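The shift arithmetic in the patch can be tried in isolation. The following is only a sketch of the two formulas from the hunks above, not libatomic code: subword_shift is a hypothetical name, and the endianness is passed as a parameter instead of being taken from __ARMEB__. It assumes the object does not straddle a word boundary.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bit position of an n-byte object at address
   addr inside the aligned word of word_size bytes that contains it.
   Little endian: the byte offset counts up from the low end.
   Big endian: the byte offset counts down from the high end.  */
unsigned
subword_shift (uintptr_t addr, size_t n, size_t word_size, int big_endian)
{
  size_t byte_off = addr % word_size;

  /* The object must fit entirely inside one word.  */
  assert (byte_off + n <= word_size);

  if (big_endian)
    return (unsigned) ((word_size - n - byte_off) * CHAR_BIT);
  return (unsigned) (byte_off * CHAR_BIT);
}
```

For a 1-byte object at offset 0 of a 4-byte word this gives 0 on little endian and 24 on big endian; at offset 3 the two results swap.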
Re: [PATCH][0/n] Merge from match-and-simplify
On 16/10/14 21:43, Andrew Pinski wrote:

On Thu, Oct 16, 2014 at 1:38 PM, Sebastian Pop <seb...@gmail.com> wrote:

Richard Biener wrote:

I have posted 5 patches as part of a larger series to merge (parts) from the match-and-simplify branch. While I think there was overall consensus that the idea behind the project is sound there are technical questions left for how the thing should look in the end. I've raised them in 3/n which is the only patch of the series that contains any patterns so far.

To re-iterate here (as I expect most people will only look at [0/n] patches ;)), the question is whether we are fine with making fold-const (thus fold_{unary,binary,ternary}) not handle some cases it handles currently.

I have tested on aarch64 all the code in the match-and-simplify against trunk as of the last merge at r216315:

2014-10-16  Richard Biener  <rguent...@suse.de>

	Merge from trunk r216235 through r216315.

Overall, I see a lot more perf regressions (about 2/3 of the tests) than improvements (1/3 of the tests). I will try to reduce tests.

For instance, saxpy regresses at -O3 on aarch64:

void saxpy(double* x, double* y, double* z)
{
  int i=0;
  for (i = 0 ; i < ARRAY_SIZE; i++)
    {
      z[i] = x[i] + scalar*y[i];
    }
}

This looks like a scheduling issue rather than anything else. The scheduler for a57 is not complete and does not model some things like the fusion of the compares and branch which is most likely what you are seeing.

Huh !! how is that related to the code generation shown by Seb ? See the replacement of subs by cmp and sub. Folding cmp into other flag setting instructions is a very useful optimization on ARM and AArch64 and that's what appears missing in fold-const. That maybe what's causing the slowdown. I've never known that to be caused by any scheduler vagaries !

regards
Ramana

Thanks,
Andrew Pinski

$ diff -u base.s mas.s
--- base.s 2014-10-16 15:30:15.35143 -0500
+++ mas.s 2014-10-16 15:30:16.183035000 -0500
@@ -2,12 +2,14 @@
   add x1, x2, 800
   ldr q0, [x0, x2]
   add x3, x2, 1600
+  cmp x0, 784
   ldr q1, [x0, x1]
+  add x1, x0, 16
   fmla v0.2d, v1.2d, v2.2d
   str q0, [x0, x3]
-  add x0, x0, 16
-  cmp x0, 800
+  mov x0, x1
   bne .L140
 .LBE179:
-  subs w4, w4, #1
+  cmp w4, 1
+  sub w4, w4, #1
   bne .L139

Thanks,
Sebastian
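For anyone who wants to reproduce the comparison, the kernel from the message can be made self-contained roughly as follows. ARRAY_SIZE and scalar are not given in the thread, so the values here are assumptions: 100 elements is only inferred from the "cmp x0, 800" in the listing (800 bytes of doubles), and 2.0 is arbitrary.

```c
/* Assumed values, not taken from the thread's test harness.  */
#define ARRAY_SIZE 100          /* inferred from "cmp x0, 800" */
static const double scalar = 2.0;   /* arbitrary */

/* The loop quoted in the message: z = x + scalar * y, element-wise.  */
void
saxpy (double *x, double *y, double *z)
{
  int i = 0;
  for (i = 0; i < ARRAY_SIZE; i++)
    {
      z[i] = x[i] + scalar * y[i];
    }
}
```

Compiling this with the two compilers at -O3 with -S and diffing the outputs should give something comparable to the base.s/mas.s diff above, though the exact listing depends on the surrounding benchmark code.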
Re: [PATCH 9/17] Initial KAsan support
The patch was slightly updated to take care of missing UBSan work (SANITIZE_FLOAT_DIVIDE, SANITIZE_FLOAT_CAST, SANITIZE_BOUNDS).

--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -868,6 +868,20 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
   /* The -gsplit-dwarf option requires -gpubnames.  */
   if (opts->x_dwarf_split_debug_info)
     opts->x_debug_generate_pub_sections = 1;
+
+  /* Userspace and kernel ASan conflict with each other and with TSan.  */
+
+  if ((flag_sanitize & SANITIZE_USER_ADDRESS)
+      && (flag_sanitize & SANITIZE_KERNEL_ADDRESS))
+    error_at (loc,
+              "-fsanitize=address is incompatible with "
+              "-fsanitize=kernel-address");
+
+  if ((flag_sanitize & SANITIZE_ADDRESS)
+      && (flag_sanitize & SANITIZE_THREAD))
+    error_at (loc,
+              "-fsanitize=address and -fsanitize=kernel-address "
+              "are incompatible with -fsanitize=thread");
 }

Why aren't you using opts->x_ here, like all the code just above?

--
Eric Botcazou
Re: [PATCH][0/n] Merge from match-and-simplify
On Wed, Oct 15, 2014 at 5:29 PM, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote:

On 15/10/14 14:00, Richard Biener wrote:

Any comments and reviews welcome (I don't think that my maintainership covers enough to simply check this in without approval).

Hi Richard,

The match-and-simplify branch bootstrapped successfully on aarch64-none-linux-gnu FWIW.

What about regression tests ?

Thanks,
Kyrill
Re: [PATCH 9/17] Initial KAsan support
On 10/17/2014 11:27 AM, Eric Botcazou wrote:

The patch was slightly updated to take care of missing UBSan work (SANITIZE_FLOAT_DIVIDE, SANITIZE_FLOAT_CAST, SANITIZE_BOUNDS).

--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -868,6 +868,20 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
   /* The -gsplit-dwarf option requires -gpubnames.  */
   if (opts->x_dwarf_split_debug_info)
     opts->x_debug_generate_pub_sections = 1;
+
+  /* Userspace and kernel ASan conflict with each other and with TSan.  */
+
+  if ((flag_sanitize & SANITIZE_USER_ADDRESS)
+      && (flag_sanitize & SANITIZE_KERNEL_ADDRESS))
+    error_at (loc,
+              "-fsanitize=address is incompatible with "
+              "-fsanitize=kernel-address");
+
+  if ((flag_sanitize & SANITIZE_ADDRESS)
+      && (flag_sanitize & SANITIZE_THREAD))
+    error_at (loc,
+              "-fsanitize=address and -fsanitize=kernel-address "
+              "are incompatible with -fsanitize=thread");
 }

Why aren't you using opts->x_ here, like all the code just above?

Well, that's a backport of ancient patch from trunk so all credits go there. And flag_sanitize is indeed handled differently from other compiler flags.

-Y
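The two checks in the hunk are plain bitmask tests, which the sketch below restates outside the compiler. The SAN_* names, their bit values and sanitize_conflict are hypothetical, chosen only so that the "address" bit accompanies either the user or the kernel variant as the error messages imply; GCC's real definitions and the error_at call are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical flag bits for illustration only.  */
enum
{
  SAN_ADDRESS        = 1 << 0,  /* set for both ASan flavours */
  SAN_USER_ADDRESS   = 1 << 1,  /* -fsanitize=address */
  SAN_KERNEL_ADDRESS = 1 << 2,  /* -fsanitize=kernel-address */
  SAN_THREAD         = 1 << 3   /* -fsanitize=thread */
};

/* Return a diagnostic for the first conflicting combination found in
   FLAGS, or NULL when the combination is acceptable.  */
const char *
sanitize_conflict (unsigned flags)
{
  if ((flags & SAN_USER_ADDRESS) && (flags & SAN_KERNEL_ADDRESS))
    return "-fsanitize=address is incompatible with "
           "-fsanitize=kernel-address";

  if ((flags & SAN_ADDRESS) && (flags & SAN_THREAD))
    return "-fsanitize=address and -fsanitize=kernel-address "
           "are incompatible with -fsanitize=thread";

  return NULL;
}
```

The sketch takes the flags as an argument, which is the shape Eric's question points at: the quoted hunk reads the global flag_sanitize rather than a field reached through opts.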
Re: [PATCH 9/17] Initial KAsan support
Well, that's a backport of ancient patch from trunk so all credits go there. And flag_sanitize is indeed handled differently from other compiler flags.

Really curious to know why...

--
Eric Botcazou
Re: [PATCH][4/n] Merge from match-and-simplify, hook into fold-const.c
On Thu, 16 Oct 2014, Sebastian Pop wrote:

Richard Biener wrote:

To give you an example how it looks like, the following code is generated for

/* fold_negate_exprs convert - (~A) to A + 1.  */
(simplify
 (negate (bit_not @0))
 (if (INTEGRAL_TYPE_P (type))
  (plus @0 { build_int_cst (TREE_TYPE (@0), 1); } )))

tree
generic_simplify (enum tree_code code, tree type ATTRIBUTE_UNUSED, tree op0)

I wonder why ATTRIBUTE_UNUSED is generated for used parameters.

I've added them for the initial patch set because without any patterns defined (just 1/n and 2/n) only one of the parameters will be used. Consider them removed again once we have enough patterns to make bootstrap happy after that.

{
  if ((op0 && TREE_SIDE_EFFECTS (op0)))
    return NULL_TREE;
  switch (code)
    {
    ...
    case NEGATE_EXPR:
      {
        switch (TREE_CODE (op0))
          {
          case BIT_NOT_EXPR:
            {
              tree o20 = TREE_OPERAND (op0, 0);
              {
                /* #line 136 "/space/rguenther/src/svn/match-and-simplify/gcc/match.pd" */
                tree captures[2] ATTRIBUTE_UNUSED = {};

Same here. Also, why do we allocate two elements when only captures[0] is used?

Good question - I'll have a look.

Thanks,
Richard.

                captures[0] = o20;
                /* #line 135 "/space/rguenther/src/svn/match-and-simplify/gcc/match.pd" */
                if (INTEGRAL_TYPE_P (type))
                  {
                    if (dump_file && (dump_flags & TDF_DETAILS))
                      fprintf (dump_file, "Applying pattern match.pd:136, %s:%d\n", __FILE__, __LINE__);
                    tree res_op0;
                    res_op0 = captures[0];
                    tree res_op1;
                    res_op1 = build_int_cst (TREE_TYPE (captures[0]), 1);
                    return fold_build2 (PLUS_EXPR, type, res_op0, res_op1);
                  }
              }
              break;
            }
    ...

--
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
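The identity behind this pattern, - (~A) == A + 1, follows from two's complement: ~A is -A - 1, so negating it gives A + 1. A small stand-alone check, using unsigned arithmetic so that wraparound is well defined, might look like this (negate_bit_not and plus_one are illustrative names, not anything from match.pd or the generated code):

```c
#include <stdint.h>

/* Left-hand side of the pattern: negate (bit_not a).  */
uint32_t
negate_bit_not (uint32_t a)
{
  return (uint32_t) (0u - (uint32_t) ~a);
}

/* Right-hand side of the pattern: plus a 1.  */
uint32_t
plus_one (uint32_t a)
{
  return a + 1u;
}
```

Both functions agree for every input, including 0xffffffff, where each wraps around to 0.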
[PATCHv4][Kasan] Allow to override Asan shadow offset from command line
Hi all,

On 09/29/2014 09:21 PM, Yury Gribov wrote:

Kasan developers have asked for an option to override offset of Asan shadow memory region. This should simplify experimenting with memory layouts on 64-bit architectures.

New patch which checks that -fasan-shadow-offset is only enabled for -fsanitize=kernel-address. I (unfortunately) can't make this --param because this can be a 64-bit value.

Bootstrapped and regtested on x64.

New patchset that adds strtoull to libiberty (blind copy-paste of already existing strtoul.c) and uses it to parse -fasan-shadow-offset (to avoid problem with compiling for 64-bit target on a 32-bit host).

Bootstrapped and regtested on x64.

-Y

From 0225b7878bbb5b803814646d089824d016316fef Mon Sep 17 00:00:00 2001
From: Yury Gribov <y.gri...@samsung.com>
Date: Thu, 16 Oct 2014 18:31:10 +0400
Subject: [PATCH 1/2] Add strtoull to libiberty.

2014-10-17  Yury Gribov  <y.gri...@samsung.com>

libiberty/
	* strtoull.c: New file.
---
 libiberty/strtoull.c | 119 ++
 1 file changed, 119 insertions(+)
 create mode 100644 libiberty/strtoull.c

diff --git a/libiberty/strtoull.c b/libiberty/strtoull.c
new file mode 100644
index 000..c92a4a3
--- /dev/null
+++ b/libiberty/strtoull.c
@@ -0,0 +1,119 @@
+/*
+ * Copyright (c) 2014 Regents of the University of California.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. [rescinded 22 July 1999]
+ * 4. Neither the name of the University nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#ifdef HAVE_CONFIG_H
+#include "config.h"
+#endif
+#ifdef HAVE_LIMITS_H
+#include <limits.h>
+#endif
+#ifdef HAVE_SYS_PARAM_H
+#include <sys/param.h>
+#endif
+#include <errno.h>
+#ifdef NEED_DECLARATION_ERRNO
+extern int errno;
+#endif
+#if 0
+#include <stdlib.h>
+#endif
+#include "ansidecl.h"
+#include "safe-ctype.h"
+
+#ifdef HAVE_LONG_LONG
+
+#ifndef ULLONG_MAX
+#define ULLONG_MAX ((unsigned long long)(~0L)) /* 0x */
+#endif
+
+/*
+ * Convert a string to an unsigned long long integer.
+ *
+ * Ignores `locale' stuff.  Assumes that the upper and lower case
+ * alphabets and digits are each contiguous.
+ */
+unsigned long long
+strtoull(const char *nptr, char **endptr, register int base)
+{
+    register const char *s = nptr;
+    register unsigned long long acc;
+    register int c;
+    register unsigned long long cutoff;
+    register int neg = 0, any, cutlim;
+
+    /*
+     * See strtol for comments as to the logic used.
+     */
+    do {
+        c = *s++;
+    } while (ISSPACE(c));
+    if (c == '-') {
+        neg = 1;
+        c = *s++;
+    } else if (c == '+')
+        c = *s++;
+    if ((base == 0 || base == 16) &&
+        c == '0' && (*s == 'x' || *s == 'X')) {
+        c = s[1];
+        s += 2;
+        base = 16;
+    }
+    if (base == 0)
+        base = c == '0' ? 8 : 10;
+    cutoff = (unsigned long long)ULLONG_MAX / (unsigned long long)base;
+    cutlim = (unsigned long long)ULLONG_MAX % (unsigned long long)base;
+    for (acc = 0, any = 0;; c = *s++) {
+        if (ISDIGIT(c))
+            c -= '0';
+        else if (ISALPHA(c))
+            c -= ISUPPER(c) ? 'A' - 10 : 'a' - 10;
+        else
+            break;
+        if (c >= base)
+            break;
+        if (any < 0 || acc > cutoff || (acc == cutoff && c > cutlim))
+            any = -1;
+        else {
+            any = 1;
+            acc *= base;
+            acc += c;
+        }
+    }
+    if (any < 0) {
+        acc = ULLONG_MAX;
+        errno = ERANGE;
+    } else if (neg)
+        acc = -acc;
+    if (endptr != 0)
+        *endptr = (char *) (any ? s - 1 : nptr);
+    return (acc);
+}
+
+#endif /* ifdef HAVE_LONG_LONG */
--
1.7.9.5

From 6c9ad20bdcfc0fbf7ccb8e2700ef7dce52a34c64 Mon Sep 17 00:00:00 2001
From: Yury Gribov <y.gri...@samsung.com>
Date: Fri, 29 Aug 2014 11:58:03 +0400
Subject: [PATCH 2/2]
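Since the second patch is cut off here, the following is only a sketch of how a 64-bit offset could be parsed with strtoull. It is not the option-handling code from the patch: parse_shadow_offset is a hypothetical helper, and it uses the standard C library strtoull rather than the libiberty copy.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse ARG as an unsigned 64-bit offset.
   Base 0 lets strtoull accept decimal, octal (leading 0) and
   hexadecimal (leading 0x) input.  Returns 0 on success and -1 when
   nothing was parsed, trailing characters remain, or the value
   overflows.  */
int
parse_shadow_offset (const char *arg, unsigned long long *out)
{
  char *end;
  unsigned long long val;

  assert (arg != NULL && out != NULL);

  errno = 0;
  val = strtoull (arg, &end, 0);
  if (end == arg || *end != '\0' || errno == ERANGE)
    return -1;

  *out = val;
  return 0;
}
```

One caveat: like strtoul, strtoull accepts a leading minus sign and returns the negated value as an unsigned number, so a caller that wants to reject "-1" has to check for the sign itself.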
Re: [PATCH][0/n] Merge from match-and-simplify
On Thu, 16 Oct 2014, Sebastian Pop wrote:

Richard Biener wrote:

I have posted 5 patches as part of a larger series to merge (parts) from the match-and-simplify branch. While I think there was overall consensus that the idea behind the project is sound there are technical questions left for how the thing should look in the end. I've raised them in 3/n which is the only patch of the series that contains any patterns so far.

To re-iterate here (as I expect most people will only look at [0/n] patches ;)), the question is whether we are fine with making fold-const (thus fold_{unary,binary,ternary}) not handle some cases it handles currently.

I have tested on aarch64 all the code in the match-and-simplify against trunk as of the last merge at r216315:

2014-10-16  Richard Biener  <rguent...@suse.de>

	Merge from trunk r216235 through r216315.

Overall, I see a lot more perf regressions (about 2/3 of the tests) than improvements (1/3 of the tests). I will try to reduce tests.

Note that the branch goes much further in exercising the machinery than I want to merge at this point (that applies mostly to all passes using the SSA propagator such as CCP and VRP and passes exercising value-numbering - FRE and PRE). It may also simply show the effect of now folding all statements from tree-ssa-forwprop.c.

I have yet to investigate the testsuite fallout of [1/n] to [5/n] - testresults have been very noisy lately due to the C11 change and now ICF.

For instance, saxpy regresses at -O3 on aarch64:

void saxpy(double* x, double* y, double* z)
{
  int i=0;
  for (i = 0 ; i < ARRAY_SIZE; i++)
    {
      z[i] = x[i] + scalar*y[i];
    }
}

$ diff -u base.s mas.s
--- base.s 2014-10-16 15:30:15.35143 -0500
+++ mas.s 2014-10-16 15:30:16.183035000 -0500
@@ -2,12 +2,14 @@
   add x1, x2, 800
   ldr q0, [x0, x2]
   add x3, x2, 1600
+  cmp x0, 784
   ldr q1, [x0, x1]
+  add x1, x0, 16
   fmla v0.2d, v1.2d, v2.2d
   str q0, [x0, x3]
-  add x0, x0, 16
-  cmp x0, 800
+  mov x0, x1
   bne .L140
 .LBE179:
-  subs w4, w4, #1
+  cmp w4, 1
+  sub w4, w4, #1
   bne .L139

I don't understand AARCH64 assembly very well but the above looks like RTL issues and/or IVOPTs issues?

Thanks for doing performance measurements.

Richard.
Re: [PATCH][match-and-simplify] More ternary commutative ops, canonicalize operand order before generic_simplify
On Thu, 16 Oct 2014, Jeff Law wrote:

On 10/16/14 05:06, Richard Biener wrote:

This patch (also applicable to trunk) makes us canonicalize operand order for comparisons at the same time we canonicalize other operand order, in particular before dispatching to generic_simplify. It also adds operand canonicalization to ternary ops and adds FMA_EXPR and DOT_PROD_EXPR to the list of ternary commutative ops.

Bootstrap and regtest running on match-and-simplify branch and x86_64-unknown-linux-gnu.

Richard.

2014-10-16  Richard Biener  <rguent...@suse.de>

	* fold-const.c (fold_comparison): Remove redundant constant
	folding and operand swapping.
	(fold_binary_loc): Do comparison operand swapping here, dispatch
	to generic_simplify after operand canonicalization.
	(fold_ternary_loc): Canonicalize operand order for commutative
	ternary operations.
	* tree.c (commutative_ternary_tree_code): Add DOT_PROD_EXPR
	and FMA_EXPR.

Seems like something we'd want for the trunk independent of the match-and-simplify work

Yes, I am going to test and apply it there today.

Thanks,
Richard.
Re: [PATCH] Fix range test optimization (PR tree-optimization/63302)
On Fri, 17 Oct 2014, Jakub Jelinek wrote:

Hi!

This patch fixes PR63302 by using proper predicate to test if an INTEGER_CST is not a power of 2. While the issue has been originally reported for PA, the testcase shows the same issue on x86_64 (just with __int128 instead of long long).

Bootstrapped/regtested on x86_64-linux and i686-linux on both 4.9 branch and trunk, ok for trunk/4.9?

Ok.

Thanks,
Richard.

2014-10-16  Jakub Jelinek  <ja...@redhat.com>

	PR tree-optimization/63302
	* tree-ssa-reassoc.c (optimize_range_tests_xor,
	optimize_range_tests_diff): Use !integer_pow2p () instead of
	tree_log2 () < 0.

	* gcc.c-torture/execute/pr63302.c: New test.

--- gcc/tree-ssa-reassoc.c.jj 2014-04-22 15:05:46.0 +0200
+++ gcc/tree-ssa-reassoc.c 2014-10-15 13:33:12.501190909 +0200
@@ -2198,7 +2198,7 @@ optimize_range_tests_xor (enum tree_code
   lowxor = fold_binary (BIT_XOR_EXPR, type, lowi, lowj);
   if (lowxor == NULL_TREE || TREE_CODE (lowxor) != INTEGER_CST)
     return false;
-  if (tree_log2 (lowxor) < 0)
+  if (!integer_pow2p (lowxor))
     return false;
   highxor = fold_binary (BIT_XOR_EXPR, type, highi, highj);
   if (!tree_int_cst_equal (lowxor, highxor))
@@ -2245,7 +2245,7 @@ optimize_range_tests_diff (enum tree_cod
   tem1 = fold_binary (MINUS_EXPR, type, lowj, lowi);
   if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST)
     return false;
-  if (tree_log2 (tem1) < 0)
+  if (!integer_pow2p (tem1))
     return false;
 
   mask = fold_build1 (BIT_NOT_EXPR, type, tem1);
--- gcc/testsuite/gcc.c-torture/execute/pr63302.c.jj 2014-10-15 13:33:57.075343573 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr63302.c 2014-10-15 13:33:44.0 +0200
@@ -0,0 +1,60 @@
+/* PR tree-optimization/63302 */
+
+#ifdef __SIZEOF_INT128__
+#if __SIZEOF_INT128__ * __CHAR_BIT__ == 128
+#define USE_INT128
+#endif
+#endif
+#if __SIZEOF_LONG_LONG__ * __CHAR_BIT__ == 64
+#define USE_LLONG
+#endif
+
+#ifdef USE_INT128
+__attribute__((noinline, noclone)) int
+foo (__int128 x)
+{
+  __int128 v = x & (((__int128) -1 << 63) | 0x7ff);
+
+  return v == 0 || v == ((__int128) -1 << 63);
+}
+#endif
+
+#ifdef USE_LLONG
+__attribute__((noinline, noclone)) int
+bar (long long x)
+{
+  long long v = x & (((long long) -1 << 31) | 0x7ff);
+
+  return v == 0 || v == ((long long) -1 << 31);
+}
+#endif
+
+int
+main ()
+{
+#ifdef USE_INT128
+  if (foo (0) != 1
+      || foo (1) != 0
+      || foo (0x800) != 1
+      || foo (0x801) != 0
+      || foo ((__int128) 1 << 63) != 0
+      || foo ((__int128) -1 << 63) != 1
+      || foo (((__int128) -1 << 63) | 1) != 0
+      || foo (((__int128) -1 << 63) | 0x800) != 1
+      || foo (((__int128) -1 << 63) | 0x801) != 0)
+    __builtin_abort ();
+#endif
+#ifdef USE_LLONG
+  if (bar (0) != 1
+      || bar (1) != 0
+      || bar (0x800) != 1
+      || bar (0x801) != 0
+      || bar (1LL << 31) != 0
+      || bar (-1LL << 31) != 1
+      || bar ((-1LL << 31) | 1) != 0
+      || bar ((-1LL << 31) | 0x800) != 1
+      || bar ((-1LL << 31) | 0x801) != 0)
+    __builtin_abort ();
+#endif
+  return 0;
+}

	Jakub

--
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
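As background for why the predicate matters, here is a sketch in plain C of an exact power-of-two test and of the kind of rewrite that is only valid when the two constants differ in a single bit. This is an illustration on 64-bit unsigned values with hypothetical names (is_pow2, either_by_xor); it is not the tree-level code from tree-ssa-reassoc.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True only when exactly one bit of V is set.  */
bool
is_pow2 (uint64_t v)
{
  return v != 0 && (v & (v - 1)) == 0;
}

/* Single-comparison form of (x == a || x == b), valid only when
   a ^ b is a power of two: mask out the differing bit and compare
   what is left.  */
bool
either_by_xor (uint64_t x, uint64_t a, uint64_t b)
{
  uint64_t diff = a ^ b;

  assert (is_pow2 (diff));
  return (x & ~diff) == (a & ~diff);
}
```

If the difference had more than one bit set, the masked comparison would also accept values other than a and b, which is why the guard has to be an exact power-of-two check.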
Re: [PATCH] Optimize range tests into bittests (PR tree-optimization/63464)
On Fri, 17 Oct 2014, Jakub Jelinek wrote:

Hi!

This patch optimizes some range tests into bit tests, like we already do for switches in emit_case_bit_tests, but this time for a series of ored equality or anded non-equality comparisons. If at least 3 comparisons (after the contiguous range, xor and diff + xor optimizations are performed) are needed and range of values is at most number of bits in a word, we instead check whether the operand is >= smallest and <= highest number in the range and if it is, test (word) 1 << operand against a bitmask.

Bootstrapped/regtested on x86_64-linux and i686-linux (for i686-linux I had to go back to r216304 because there are multiple ICF related bootstrap issues on i686-linux), ok for trunk?

Ok.

Thanks,
Richard.

2014-10-16  Jakub Jelinek  <ja...@redhat.com>

	PR tree-optimization/63464
	* gimple.h (gimple_seq_discard): New prototype.
	* gimple.c: Include stringpool.h and tree-ssanames.h.
	(gimple_seq_discard): New function.
	* optabs.h (lshift_cheap_p): New prototype.
	* optabs.c (lshift_cheap_p): New function, moved from...
	* tree-switch-conversion.c (lshift_cheap_p): ... here.
	* tree-ssa-reassoc.c: Include gimplify.h and optabs.h.
	(reassoc_branch_fixups): New variable.
	(update_range_test): Add otherrangep and seq arguments.
	Unshare exp.  If otherrange is NULL, use for other ranges
	array of pointers pointed by otherrangep instead.
	Emit seq before gimplified statements for tem.
	(optimize_range_tests_diff): Adjust update_range_test
	caller.
	(optimize_range_tests_xor): Likewise.  Fix up comment.
	(extract_bit_test_mask, optimize_range_tests_to_bit_test): New
	functions.
	(optimize_range_tests): Adjust update_range_test caller.
	Call optimize_range_tests_to_bit_test.
	(branch_fixup): New function.
	(execute_reassoc): Call branch_fixup.

	* gcc.dg/torture/pr63464.c: New test.
	* gcc.dg/tree-ssa/reassoc-37.c: New test.
	* gcc.dg/tree-ssa/reassoc-38.c: New test.

--- gcc/gimple.h.jj 2014-10-15 12:28:06.428498079 +0200
+++ gcc/gimple.h 2014-10-15 13:43:18.967491428 +0200
@@ -1269,9 +1269,10 @@ extern bool gimple_asm_clobbers_memory_p
 extern void dump_decl_set (FILE *, bitmap);
 extern bool nonfreeing_call_p (gimple);
 extern bool infer_nonnull_range (gimple, tree, bool, bool);
-extern void sort_case_labels (vec<tree> );
-extern void preprocess_case_label_vec_for_gimple (vec<tree> , tree, tree *);
-extern void gimple_seq_set_location (gimple_seq , location_t);
+extern void sort_case_labels (vec<tree>);
+extern void preprocess_case_label_vec_for_gimple (vec<tree>, tree, tree *);
+extern void gimple_seq_set_location (gimple_seq, location_t);
+extern void gimple_seq_discard (gimple_seq);
 
 /* Formal (expression) temporary table handling: multiple occurrences of
    the same scalar expression are evaluated into the same temporary.  */
--- gcc/gimple.c.jj 2014-10-15 12:28:19.917235900 +0200
+++ gcc/gimple.c 2014-10-15 13:43:18.970491368 +0200
@@ -47,6 +47,8 @@ along with GCC; see the file COPYING3.
 #include "demangle.h"
 #include "langhooks.h"
 #include "bitmap.h"
+#include "stringpool.h"
+#include "tree-ssanames.h"
 
 
 /* All the tuples have their operand vector (if present) at the very bottom
@@ -2826,3 +2828,19 @@ gimple_seq_set_location (gimple_seq seq,
   for (gimple_stmt_iterator i = gsi_start (seq); !gsi_end_p (i); gsi_next (&i))
     gimple_set_location (gsi_stmt (i), loc);
 }
+
+/* Release SSA_NAMEs in SEQ as well as the GIMPLE statements.  */
+
+void
+gimple_seq_discard (gimple_seq seq)
+{
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start (seq); !gsi_end_p (gsi); )
+    {
+      gimple stmt = gsi_stmt (gsi);
+      gsi_remove (&gsi, true);
+      release_defs (stmt);
+      ggc_free (stmt);
+    }
+}
--- gcc/optabs.h.jj 2014-10-15 12:28:06.479497088 +0200
+++ gcc/optabs.h 2014-10-15 13:43:18.970491368 +0200
@@ -538,5 +538,6 @@ extern void gen_satfractuns_conv_libfunc
                                           enum machine_mode,
                                           enum machine_mode);
 extern void init_tree_optimization_optabs (tree);
+extern bool lshift_cheap_p (bool);
 
 #endif /* GCC_OPTABS_H */
--- gcc/optabs.c.jj 2014-10-15 12:28:06.433497982 +0200
+++ gcc/optabs.c 2014-10-15 13:43:18.969491387 +0200
@@ -8624,4 +8624,31 @@ get_best_mem_extraction_insn (extraction
                                       struct_bits, field_mode);
 }
 
+/* Determine whether "1 << x" is relatively cheap in word_mode.  */
+
+bool
+lshift_cheap_p (bool speed_p)
+{
+  /* FIXME: This should be made target dependent via this "this_target"
+     mechanism, similar to e.g. can_copy_init_p in gcse.c.  */
+  static bool init[2] = { false, false };
+  static bool cheap[2] = { true, true };
+
+  /* If the targer has no lshift in word_mode, the
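The transformation described at the top of this message can be sketched on ordinary integers. This is only an illustration of the idea, with a hypothetical name (in_set_bittest) and the shift expressed relative to the lowest value; it is not the GIMPLE that optimize_range_tests_to_bit_test emits.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Bit-test form of "x == v1 || x == v2 || ...".  MASK has bit
   (v - low) set for every accepted value v, where LOW and HIGH are
   the smallest and largest accepted values.  Requires that
   HIGH - LOW is smaller than the number of bits in a word.  */
bool
in_set_bittest (unsigned long x, unsigned long low, unsigned long high,
                unsigned long mask)
{
  assert (high >= low
          && high - low < sizeof (unsigned long) * CHAR_BIT);

  /* Range check first, so the shift count below stays in range.  */
  if (x < low || x > high)
    return false;
  return ((1UL << (x - low)) & mask) != 0;
}
```

For the set {10, 12, 15, 17} the mask has bits 0, 2, 5 and 7 set, so four equality comparisons collapse into one range check, one shift and one AND.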
Re: [PATCH][0/n] Merge from match-and-simplify
On Fri, 17 Oct 2014, Ramana Radhakrishnan wrote: On Wed, Oct 15, 2014 at 5:29 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: On 15/10/14 14:00, Richard Biener wrote: Any comments and reviews welcome (I don't think that my maintainership covers enough to simply check this in without approval). Hi Richard, The match-and-simplify branch bootstrapped successfully on aarch64-none-linux-gnu FWIW. What about regression tests ? Note the branch isn't regression free on x86_64 either. The branch does more than I want to merge to trunk (and it also retains all folding code I added patterns for). I've gone farther there to explore whether it will end up working in the end and what kind of features the IL and the APIs need. I've pasted testsuite results on x86_64 below for rev. 216324 which is based on trunk rev. 216315 which unfortunately has lots of regressions on its own. This is why I want to restrict the effect of the machinery to fold (), fold_stmt () and tree-ssa-forwprop.c for the moment and merge individual patterns (well, maybe in small groups) separately to allow for easy bi-section. I suppose I should push the most visible change to trunk first, namely tree-ssa-forwprop.c folding all statements via fold_stmt after the merge. I suspect this alone can have some odd effects like the sub + cmp fusing. That would be sth like the patch attached below. Richard. Index: gcc/tree-ssa-forwprop.c === --- gcc/tree-ssa-forwprop.c (revision 216258) +++ gcc/tree-ssa-forwprop.c (working copy) @@ -54,6 +54,8 @@ along with GCC; see the file COPYING3. #include tree-ssa-propagate.h #include tree-ssa-dom.h #include builtins.h +#include tree-cfgcleanup.h +#include tree-into-ssa.h /* This pass propagates the RHS of assignment statements into use sites of the LHS of the assignment. It's basically a specialized @@ -3586,6 +3588,8 @@ simplify_mult (gimple_stmt_iterator *gsi return false; } + + /* Main entry point for the forward propagation and statement combine optimizer. 
*/ @@ -3626,6 +3630,40 @@ pass_forwprop::execute (function *fun) cfg_changed = false; + /* Combine stmts with the stmts defining their operands. Do that + in an order that guarantees visiting SSA defs before SSA uses. */ + int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (fun)); + int postorder_num = inverted_post_order_compute (postorder); + for (int i = 0; i postorder_num; ++i) +{ + bb = BASIC_BLOCK_FOR_FN (fun, postorder[i]); + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); + !gsi_end_p (gsi); gsi_next (gsi)) + { + gimple stmt = gsi_stmt (gsi); + gimple orig_stmt = stmt; + + if (fold_stmt (gsi)) + { + stmt = gsi_stmt (gsi); + if (maybe_clean_or_replace_eh_stmt (orig_stmt, stmt) + gimple_purge_dead_eh_edges (bb)) + cfg_changed = true; + update_stmt (stmt); + } + } +} + free (postorder); + + /* ??? Code below doesn't expect non-renamed VOPs and the above + doesn't keep virtual operand form up-to-date. */ + if (cfg_changed) +{ + cleanup_tree_cfg (); + cfg_changed = false; +} + update_ssa (TODO_update_ssa_only_virtuals); + FOR_EACH_BB_FN (bb, fun) { gimple_stmt_iterator gsi;
[ARM] Fix DWARF unwinding breakage
Hi, some OSes, for example VxWorks 6, still use DWARF unwinding on the ARM, which means that they use __builtin_eh_return (EABI unwinding doesn't). The builtin is implemented by means of {arm|thumb}_set_return_address, which can generate a store if LR has been stored on function entry. The problem is that, if this store is FP-based, it is not seen by the RTL DSE pass as being consumed by the SP-based load at the same address on function exit. That's by design in the RTL DSE pass: FP and SP are never substituted for each other by cselib, see for example this comment: /* The only thing that we are not willing to do (this is requirement of dse and if others potential uses need this function we should add a parm to control it) is that we will not substitute the STACK_POINTER_REGNUM, FRAME_POINTER or the HARD_FRAME_POINTER. These expansions confuses the code that notices that stores into the frame go dead at the end of the function and that the frame is not effected by calls to subroutines. If you allow the STACK_POINTER_REGNUM substitution, then dse will think that parameter pushing also goes dead which is wrong. If you allow the FRAME_POINTER or the HARD_FRAME_POINTER then you lose the opportunity to make the frame assumptions. */ if (regno == STACK_POINTER_REGNUM || regno == FRAME_POINTER_REGNUM || regno == HARD_FRAME_POINTER_REGNUM || regno == cfa_base_preserved_regno) return orig; so a FP-based store and a SP-based load are never seen as a RAW dependency. This nevertheless used to work because the blockage insn emitted by the RTL epilogue was acting as a wild load but this got broken by Richard's patch: 2014-03-11 Richard Sandiford rdsandif...@googlemail.com * builtins.c (expand_builtin_setjmp_receiver): Use and clobber hard_frame_pointer_rtx. * cse.c (cse_insn): Remove volatile check. * cselib.c (cselib_process_insn): Likewise. * dse.c (scan_insn): Likewise. which removed the wild load trick. 
This is visible at -O2 for:

void
foo (void *c1, void *t1, void *ra)
{
  long offset = uw_install_context_1 (c1, t1);
  void *handler = __builtin_frob_return_addr (ra);
  __builtin_unwind_init ();
  __builtin_eh_return (offset, handler);
}

The attached patch fixes the breakage by marking the stores as frame related.

Tested on ARM/VxWorks, OK for mainline and 4.9 branch?

2014-10-17  Eric Botcazou  ebotca...@adacore.com

        * config/arm/arm.c (arm_set_return_address): Mark the store as frame
        related, if any.
        (thumb_set_return_address): Likewise.

-- 
Eric Botcazou

Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c (revision 216252)
+++ config/arm/arm.c (working copy)
@@ -28952,7 +28952,11 @@ arm_set_return_address (rtx source, rtx
           addr = plus_constant (Pmode, addr, delta);
         }
 
-      emit_move_insn (gen_frame_mem (Pmode, addr), source);
+      /* The store needs to be marked as frame related in order to prevent
+         DSE from deleting it as dead if it is based on fp.  */
+      rtx insn = emit_move_insn (gen_frame_mem (Pmode, addr), source);
+      RTX_FRAME_RELATED_P (insn) = 1;
+      add_reg_note (insn, REG_CFA_RESTORE, gen_rtx_REG (Pmode, LR_REGNUM));
     }
 }
 
@@ -29004,7 +29008,11 @@ thumb_set_return_address (rtx source, rt
       else
         addr = plus_constant (Pmode, addr, delta);
 
-      emit_move_insn (gen_frame_mem (Pmode, addr), source);
+      /* The store needs to be marked as frame related in order to prevent
+         DSE from deleting it as dead if it is based on fp.  */
+      rtx insn = emit_move_insn (gen_frame_mem (Pmode, addr), source);
+      RTX_FRAME_RELATED_P (insn) = 1;
+      add_reg_note (insn, REG_CFA_RESTORE, gen_rtx_REG (Pmode, LR_REGNUM));
     }
   else
     emit_move_insn (gen_rtx_REG (Pmode, LR_REGNUM), source);
Pasto in is_old_name: s/new_ssa_names/old_ssa_names/g
Hello,

this trivial fix was pre-approved by Richard, I regtested and committed it.

2014-10-17  Marc Glisse  marc.gli...@inria.fr

        * tree-into-ssa.c (is_old_name): Replace new with old.

--- tree-into-ssa.c (revision 216366)
+++ tree-into-ssa.c (working copy)
@@ -572,23 +572,23 @@ set_livein_block (tree var, basic_block
     info->need_phi_state = NEED_PHI_STATE_MAYBE;
 }
 
 
 /* Return true if NAME is in OLD_SSA_NAMES.  */
 
 static inline bool
 is_old_name (tree name)
 {
   unsigned ver = SSA_NAME_VERSION (name);
-  if (!new_ssa_names)
+  if (!old_ssa_names)
     return false;
-  return (ver < SBITMAP_SIZE (new_ssa_names)
+  return (ver < SBITMAP_SIZE (old_ssa_names)
           && bitmap_bit_p (old_ssa_names, ver));
 }
 
 
 /* Return true if NAME is in NEW_SSA_NAMES.  */
 
 static inline bool
 is_new_name (tree name)
 {
   unsigned ver = SSA_NAME_VERSION (name);

-- 
Marc Glisse
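The pasto is easy to see in miniature: the null check and the range check have to be made against the same set that is then read. The following is a minimal sketch in plain C, assuming a hypothetical `struct bitset` and `bitset_contains` that merely stand in for GCC's sbitmap and `bitmap_bit_p`; none of these names are GCC API.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an sbitmap: a fixed-size bit vector.  */
struct bitset
{
  size_t n_bits;         /* number of valid bit positions */
  unsigned long *words;  /* backing storage holding at least n_bits bits */
};

#define BITS_PER_WORD (sizeof (unsigned long) * CHAR_BIT)

/* Return true if bit VER is set in SET.  A null SET is treated as empty,
   and VER is range-checked against SET itself before SET is read.  */
bool
bitset_contains (const struct bitset *set, size_t ver)
{
  if (!set)
    return false;
  return (ver < set->n_bits
          && ((set->words[ver / BITS_PER_WORD]
               >> (ver % BITS_PER_WORD)) & 1UL) != 0);
}
```

Checking a different set's size, as the old code did, only works as long as both sets happen to be allocated with the same length.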
[Ada] Missing inheritance of pragma Default_Initial_Condition
This patch modifies the inheritance of all attributes related to pragma Default_Initial_Condition to account for a case where the full view of a private type derives from another private type.

-- Source --

--  parent.ads

package Parent is
   type Parent_Typ is private
     with Default_Initial_Condition => False;
private
   type Parent_Typ is null record;
end Parent;

--  derivation.ads

with Parent; use Parent;

package Derivation is
   type Derivation_Typ is private;
private
   type Derivation_Typ is new Parent_Typ;
end Derivation;

--  derivation_check.adb

with Ada.Assertions; use Ada.Assertions;
with Ada.Text_IO;    use Ada.Text_IO;
with Derivation;     use Derivation;

procedure Derivation_Check is
begin
   declare
      Obj : Derivation_Typ;
   begin
      Put_Line ("ERROR: Default_Initial_Condition not triggered");
   end;
exception
   when Assertion_Error =>
      Put_Line ("OK");
   when others =>
      Put_Line ("ERROR: expected Assertion_Error");
end Derivation_Check;

-- Compilation and output --

$ gnatmake -q -gnata derivation_check.adb
$ ./derivation_check
OK

Tested on x86_64-pc-linux-gnu, committed on trunk

2014-10-17  Hristian Kirtchev  kirtc...@adacore.com

        * sem_ch3.adb (Build_Derived_Record_Type): Remove the propagation
        of all attributes related to pragma Default_Initial_Condition.
        (Build_Derived_Type): Propagation of all attributes related
        to pragma Default_Initial_Condition.
        (Process_Full_View): Account for the case where the full view
        derives from another private type and propagate the attributes
        related to pragma Default_Initial_Condition to the private view.
        (Propagate_Default_Init_Cond_Attributes): New routine.
        * sem_util.adb: Alphabetize various routines.
        (Build_Default_Init_Cond_Call): Use an unchecked type conversion
        when calling the default initial condition procedure of a private
        type.
        (Build_Default_Init_Cond_Procedure_Declaration): Prevent
        the generation of multiple default initial condition procedures.
Index: sem_ch3.adb === --- sem_ch3.adb (revision 216367) +++ sem_ch3.adb (working copy) @@ -650,6 +650,17 @@ -- present. If errors are found, error messages are posted, and the -- Real_Range_Specification of Def is reset to Empty. + procedure Propagate_Default_Init_Cond_Attributes + (From_Typ : Entity_Id; + To_Typ : Entity_Id; + Parent_To_Derivation : Boolean := False; + Private_To_Full_View : Boolean := False); + -- Subsidiary to routines Build_Derived_Type and Process_Full_View. Inherit + -- all attributes related to pragma Default_Initial_Condition from From_Typ + -- to To_Typ. Flag Parent_To_Derivation should be set when the context is + -- the creation of a derived type. Flag Private_To_Full_View should be set + -- when processing both views of a private type. + procedure Record_Type_Declaration (T: Entity_Id; N: Node_Id; @@ -8546,23 +8557,6 @@ end if; Check_Function_Writable_Actuals (N); - - -- Propagate the attributes related to pragma Default_Initial_Condition - -- from the parent type to the private extension. A derived type always - -- inherits the default initial condition flag from the parent type. If - -- the derived type carries its own Default_Initial_Condition pragma, - -- the flag is later reset in Analyze_Pragma. Note that both flags are - -- mutually exclusive. - - if Has_Inherited_Default_Init_Cond (Parent_Type) -or else Present (Get_Pragma - (Parent_Type, Pragma_Default_Initial_Condition)) - then - Set_Has_Inherited_Default_Init_Cond (Derived_Type); - - elsif Has_Default_Init_Cond (Parent_Type) then - Set_Has_Default_Init_Cond (Derived_Type); - end if; end Build_Derived_Record_Type; @@ -8680,6 +8674,18 @@ Set_First_Rep_Item (Derived_Type, First_Rep_Item (Parent_Type)); end if; + -- Propagate the attributes related to pragma Default_Initial_Condition + -- from the parent type to the private extension. A derived type always + -- inherits the default initial condition flag from the parent type. 
If + -- the derived type carries its own Default_Initial_Condition pragma, + -- the flag is later reset in Analyze_Pragma. Note that both flags are + -- mutually exclusive. + + Propagate_Default_Init_Cond_Attributes +(From_Typ = Parent_Type, + To_Typ = Derived_Type, + Parent_To_Derivation = True); + -- If the parent type has delayed rep aspects, then mark the derived -- type as possibly inheriting a delayed rep aspect. @@ -10008,6 +10014,401 @@
Re: [fortran,patch] Handle infinities and NaNs in intrinsics code generation
Hi FX,

FX wrote:
After the compile-time simplification, this patch fixes the handling of special values (infinities and NaNs) by the intrinsics EXPONENT, FRACTION, SPACING, RRSPACING and SET_EXPONENT. Bootstrapped and regtested on x86_64-linux. OK to commit?

Looks good to me. Thanks for taking care of F2003's IEEE support.

Tobias

PS: You might want to browse through the current (F2008 + corrigenda + first F2015 additions) draft at
http://j3-fortran.org/doc/year/14/14-007r2.pdf
See especially the list at the beginning under the item "Changes to the intrinsic modules IEEE_ARITHMETIC, IEEE_EXCEPTIONS, and IEEE_FEATURES for conformance with ISO/IEC/IEEE 60559:2011: [...]" and then later in that file. Everything that is in the draft is very likely to be in the final version, but of course it is not guaranteed to be so.
[Ada] Ensure record type equality treated correctly for codepeer
This is an internal change that does not affect the compiler, but fixes a problem in which a record comparison was not properly expanded. The compiler back end handled this, but it blew up codepeer. No further test required. Tested on x86_64-pc-linux-gnu, committed on trunk 2014-10-17 Robert Dewar de...@adacore.com * exp_ch4.adb (Expand_N_Op_Eq): Make sure we deal with the implementation base type. * sinfo.ads: Add a note for N_Op_Eq and N_Op_Ne that record operands are always expanded out into component comparisons. Index: exp_ch4.adb === --- exp_ch4.adb (revision 216367) +++ exp_ch4.adb (working copy) @@ -7152,8 +7152,11 @@ return; end if; - Typl := Base_Type (Typl); + -- Now get the implementation base type (note that plain Base_Type here + -- might lead us back to the private type, which is not what we want!) + Typl := Implementation_Base_Type (Typl); + -- Equality between variant records results in a call to a routine -- that has conditional tests of the discriminant value(s), and hence -- violates the No_Implicit_Conditionals restriction. Index: sinfo.ads === --- sinfo.ads (revision 216367) +++ sinfo.ads (working copy) @@ -4246,6 +4246,11 @@ -- point operands if the Treat_Fixed_As_Integer flag is set and will -- thus treat these nodes in identical manner, ignoring small values. + -- Note on equality/inequality tests for records. In the expanded tree, + -- record comparisons are always expanded to be a series of component + -- comparisons, so the back end will never see an equality or inequality + -- operation with operands of a record type. + -- Note on overflow handling: When the overflow checking mode is set to -- MINIMIZED or ELIMINATED, nodes for signed arithmetic operations may -- be modified to use a larger type for the operands and result. In
[Ada] Make System.Atomic_Counters available to user applications
The system unit System.Atomic_Counters, which provides an atomic counter type along with increment, decrement and test operations, is now available to user programs. Tested on x86_64-pc-linux-gnu, committed on trunk 2014-10-17 Robert Dewar de...@adacore.com * gnat_rm.texi: Document System.Atomic_Counters. * impunit.adb: Add System.Atomic_Counters (s-atocou.ads) to the list of user-accessible units added as children of System. * s-atocou.ads: Update comment. Index: gnat_rm.texi === --- gnat_rm.texi(revision 216367) +++ gnat_rm.texi(working copy) @@ -661,6 +661,7 @@ * Interfaces.VxWorks.IO (i-vxwoio.ads):: * System.Address_Image (s-addima.ads):: * System.Assertions (s-assert.ads):: +* System.Atomic_Counters (s-atocou.ads):: * System.Memory (s-memory.ads):: * System.Multiprocessors (s-multip.ads):: * System.Multiprocessors.Dispatching_Domains (s-mudido.ads):: @@ -19074,6 +19075,7 @@ * Interfaces.VxWorks.IO (i-vxwoio.ads):: * System.Address_Image (s-addima.ads):: * System.Assertions (s-assert.ads):: +* System.Atomic_Counters (s-atocou.ads):: * System.Memory (s-memory.ads):: * System.Multiprocessors (s-multip.ads):: * System.Multiprocessors.Dispatching_Domains (s-mudido.ads):: @@ -20585,6 +20587,18 @@ by an run-time assertion failure, as well as the routine that is used internally to raise this assertion. +@node System.Atomic_Counters (s-atocou.ads) +@section @code{System.Atomic_Counters} (@file{s-atocou.ads}) +@cindex @code{System.Atomic_Counters} (@file{s-atocou.ads}) + +@noindent +This package provides the declaration of an atomic counter type, +together with efficient routines (using hardware +synchronization primitives) for incrementing, decrementing, +and testing of these counters. This package is implemented +on most targets, including all Alpha, ia64, PowerPC, SPARC V9, +x86, and x86_64 platforms.
+ @node System.Memory (s-memory.ads) @section @code{System.Memory} (@file{s-memory.ads}) @cindex @code{System.Memory} (@file{s-memory.ads}) Index: impunit.adb === --- impunit.adb (revision 216367) +++ impunit.adb (working copy) @@ -367,6 +367,7 @@ -- (s-addima, F), -- System.Address_Image +(s-atocou, F), -- System.Atomic_Counters (s-assert, F), -- System.Assertions (s-diflio, F), -- System.Dim.Float_IO (s-diinio, F), -- System.Dim.Integer_IO Index: s-atocou.ads === --- s-atocou.ads(revision 216367) +++ s-atocou.ads(working copy) @@ -6,7 +6,7 @@ -- -- -- S p e c -- -- -- --- Copyright (C) 2011-2013, Free Software Foundation, Inc. -- +-- Copyright (C) 2011-2014, Free Software Foundation, Inc. -- -- -- -- GNAT is free software; you can redistribute it and/or modify it under -- -- terms of the GNU General Public License as published by the Free Soft- -- @@ -37,8 +37,6 @@ --- all x86 platforms --- all x86_64 platforms --- Why isn't this package available to application programs??? - package System.Atomic_Counters is pragma Preelaborate; @@ -59,20 +57,19 @@ function Decrement (Item : in out Atomic_Counter) return Boolean; pragma Inline_Always (Decrement); - -- Decrements value of atomic counter, returns True when value reach zero. + -- Decrements value of atomic counter, returns True when value reach zero function Is_One (Item : Atomic_Counter) return Boolean; pragma Inline_Always (Is_One); - -- Returns True when value of the atomic counter is one. + -- Returns True when value of the atomic counter is one procedure Initialize (Item : out Atomic_Counter); pragma Inline_Always (Initialize); -- Initialize counter by setting its value to one. This subprogram is - -- intended to be used in special cases when counter object can't be + -- intended to be used in special cases when the counter object cannot be -- initialized in standard way. private - type Unsigned_32 is mod 2 ** 32; type Atomic_Counter is limited record
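For readers who do not use Ada, the contract described in the spec comments (initialize to one, increment, decrement returning True when the value reaches zero, and an is-one test) can be sketched with C11 atomics. This is only an analogue under stated assumptions: the type `atomic_counter` and the `counter_*` functions are hypothetical names, and the GNAT implementation may use different primitives on each target.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical C analogue of the counter type; not GNAT code.  */
typedef struct
{
  atomic_uint value;
} atomic_counter;

/* Set the counter to one, mirroring Initialize.  */
void
counter_initialize (atomic_counter *c)
{
  atomic_init (&c->value, 1u);
}

/* Atomically add one to the counter.  */
void
counter_increment (atomic_counter *c)
{
  atomic_fetch_add (&c->value, 1u);
}

/* Atomically subtract one; return true when the value reaches zero.
   atomic_fetch_sub returns the value held before the subtraction.  */
bool
counter_decrement (atomic_counter *c)
{
  return atomic_fetch_sub (&c->value, 1u) == 1u;
}

/* Return true when the counter currently holds one.  */
bool
counter_is_one (atomic_counter *c)
{
  return atomic_load (&c->value) == 1u;
}
```

A typical use is reference counting: the holder whose decrement returns true is the one that frees the shared object.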
Re: [PATCH 9/17] Initial KAsan support
On 10/17/2014 11:43 AM, Eric Botcazou wrote: Well, that's a backport of ancient patch from trunk so all credits go there. And flag_sanitize is indeed handled differently from other compiler flags. Really curious to know why... I bet this was a typo but let's wait for Jakub's comments. -Y
Re: [PATCH][4/n] Merge from match-and-simplify, hook into fold-const.c
On Fri, 17 Oct 2014, Richard Biener wrote: On Thu, 16 Oct 2014, Sebastian Pop wrote: Richard Biener wrote: To give you an example how it looks like, the following code is generated for /* fold_negate_exprs convert - (~A) to A + 1. */ (simplify (negate (bit_not @0)) (if (INTEGRAL_TYPE_P (type)) (plus @0 { build_int_cst (TREE_TYPE (@0), 1); } ))) tree generic_simplify (enum tree_code code, tree type ATTRIBUTE_UNUSED, tree op0) I wonder why ATTRIBUTE_UNUSED is generated for used parameters. I've added them for the initial patch set because without any patterns defined (just 1/n and 2/n) only one of the parameters will be used. Consider them removed again once we have enough patterns to make bootstrap happy after that. { if ((op0 TREE_SIDE_EFFECTS (op0))) return NULL_TREE; switch (code) { ... case NEGATE_EXPR: { switch (TREE_CODE (op0)) { case BIT_NOT_EXPR: { tree o20 = TREE_OPERAND (op0, 0); { /* #line 136 /space/rguenther/src/svn/match-and-simplify/gcc/match.pd */ tree captures[2] ATTRIBUTE_UNUSED = {}; Same here. Also, why do we allocate two elements when only captures[0] is used? Good question - I'll have a look. Fixed by the following - bootstrapped on x86_64-unknown-linux-gnu, applied. Richard. 2014-10-17 Richard Biener rguent...@suse.de * genmatch.c (simplify::simplify): Fix off-by-one error. Index: gcc/genmatch.c === --- gcc/genmatch.c (revision 216316) +++ gcc/genmatch.c (working copy) @@ -495,7 +495,7 @@ struct simplify : match (match_), match_location (match_location_), result (result_), result_location (result_location_), ifexpr_vec (ifexpr_vec_), for_vec (for_vec_), - capture_ids (capture_ids_), capture_max (capture_ids_-size ()) {} + capture_ids (capture_ids_), capture_max (capture_ids_-size () - 1) {} /* The expression that is matched against the GENERIC or GIMPLE IL. */ operand *match;
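The pattern quoted above rewrites - (~A) into A + 1. A minimal sketch of why the two sides agree, written in plain C rather than in match.pd syntax: the function names are hypothetical, and 32-bit unsigned arithmetic is assumed purely so that wraparound is well defined in the illustration.

```c
#include <stdint.h>

/* Left-hand side of the pattern: (negate (bit_not x)).  */
uint32_t
negate_bit_not (uint32_t x)
{
  return (uint32_t) (0u - (uint32_t) ~x);
}

/* Right-hand side of the pattern: (plus x 1).  */
uint32_t
plus_one (uint32_t x)
{
  return x + 1u;
}
```

In two's complement ~x equals -x - 1, so negating it gives x + 1; the two functions should therefore return the same value for every input, including the wraparound case x = 0xFFFFFFFF.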
Re: [PATCH][match-and-simplify] More ternary commutative ops, canonicalize operand order before generic_simplify
On Fri, 17 Oct 2014, Richard Biener wrote: On Thu, 16 Oct 2014, Jeff Law wrote: On 10/16/14 05:06, Richard Biener wrote: This patch (also applicable to trunk) makes us canoncialize operand order for comparisons at the same time we canonicalize other operand order, in particular before dispatching to generic_simplify. It also adds operand canonicalization to ternary ops and adds FMA_EXPR and DOT_PROD_EXPR to the list of ternary commutative ops. Bootstrap and regtest running on match-and-simplify branch and x86_64-unknown-linux-gnu. Richard. 2014-10-16 Richard Biener rguent...@suse.de * fold-const.c (fold_comparison): Remove redundant constant folding and operand swapping. (fold_binary_loc): Do comparison operand swapping here, dispatch to generic_simplify after operand canonicalization. (fold_ternary_loc): Canonicalize operand order for commutative ternary operations. * tree.c (commutative_ternary_tree_code): Add DOT_PROD_EXPR and FMA_EXPR. Seems like something we'd want for the trunk independent of the match-and-simplify work Yes, I am going to test and apply it there today. Like below. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2014-10-17 Richard Biener rguent...@suse.de * fold-const.c (fold_comparison): Remove redundant constant folding and operand swapping. (fold_binary_loc): Do comparison operand swapping here. (fold_ternary_loc): Canonicalize operand order for commutative ternary operations. * tree.c (commutative_ternary_tree_code): Add DOT_PROD_EXPR and FMA_EXPR. Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 216366) +++ gcc/fold-const.c(working copy) @@ -8721,14 +8721,6 @@ fold_comparison (location_t loc, enum tr STRIP_SIGN_NOPS (arg0); STRIP_SIGN_NOPS (arg1); - tem = fold_relational_const (code, type, arg0, arg1); - if (tem != NULL_TREE) -return tem; - - /* If one arg is a real or integer constant, put it last. 
*/ - if (tree_swap_operands_p (arg0, arg1, true)) -return fold_build2_loc (loc, swap_tree_comparison (code), type, op1, op0); - /* Transform comparisons of the form X +- C1 CMP C2 to X CMP C2 -+ C1. */ if ((TREE_CODE (arg0) == PLUS_EXPR || TREE_CODE (arg0) == MINUS_EXPR) (equality_code || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))) @@ -9915,6 +9907,12 @@ fold_binary_loc (location_t loc, tree_swap_operands_p (arg0, arg1, true)) return fold_build2_loc (loc, code, type, op1, op0); + /* Likewise if this is a comparison, and ARG0 is a constant, move it + to ARG1 to reduce the number of tests below. */ + if (kind == tcc_comparison + tree_swap_operands_p (arg0, arg1, true)) +return fold_build2_loc (loc, swap_tree_comparison (code), type, op1, op0); + /* ARG0 is the first operand of EXPR, and ARG1 is the second operand. First check for cases where an arithmetic operation is applied to a @@ -13799,6 +13797,12 @@ fold_ternary_loc (location_t loc, enum t gcc_assert (IS_EXPR_CODE_CLASS (kind) TREE_CODE_LENGTH (code) == 3); + /* If this is a commutative operation, and OP0 is a constant, move it + to OP1 to reduce the number of tests below. */ + if (commutative_ternary_tree_code (code) + tree_swap_operands_p (op0, op1, true)) +return fold_build3_loc (loc, code, type, op1, op0, op2); + /* Strip any conversions that don't change the mode. This is safe for every expression, except for a comparison expression because its signedness is derived from its operands. So, in the latter Index: gcc/tree.c === --- gcc/tree.c (revision 216366) +++ gcc/tree.c (working copy) @@ -7385,6 +7385,8 @@ commutative_ternary_tree_code (enum tree { case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: +case DOT_PROD_EXPR: +case FMA_EXPR: return true; default:
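The comparison case differs from the plain commutative case in one respect: when the operands are exchanged, the comparison code has to be mirrored as well. The following is a minimal sketch of that idea in plain C, assuming hypothetical types and names (`struct comparison`, `swap_cmp_code`, `canonicalize_comparison`); the real tree_swap_operands_p uses more heuristics than the single "constant goes last" rule shown here.

```c
#include <stdbool.h>

enum cmp_code { CMP_LT, CMP_LE, CMP_GT, CMP_GE, CMP_EQ, CMP_NE };

/* Hypothetical operand: a value plus a flag saying whether it is constant.  */
struct operand { bool is_constant; long value; };

struct comparison { enum cmp_code code; struct operand op0, op1; };

/* Return the code that gives the same result with operands exchanged.  */
enum cmp_code
swap_cmp_code (enum cmp_code code)
{
  switch (code)
    {
    case CMP_LT: return CMP_GT;
    case CMP_LE: return CMP_GE;
    case CMP_GT: return CMP_LT;
    case CMP_GE: return CMP_LE;
    case CMP_EQ: return CMP_EQ;
    case CMP_NE: return CMP_NE;
    }
  return code;
}

/* If only the first operand is constant, move it to the second position
   and mirror the comparison code so the meaning is unchanged.  */
struct comparison
canonicalize_comparison (struct comparison c)
{
  if (c.op0.is_constant && !c.op1.is_constant)
    {
      struct operand tmp = c.op0;
      c.op0 = c.op1;
      c.op1 = tmp;
      c.code = swap_cmp_code (c.code);
    }
  return c;
}

/* Evaluate the comparison on the operand values.  */
bool
comparison_holds (struct comparison c)
{
  long a = c.op0.value, b = c.op1.value;
  switch (c.code)
    {
    case CMP_LT: return a < b;
    case CMP_LE: return a <= b;
    case CMP_GT: return a > b;
    case CMP_GE: return a >= b;
    case CMP_EQ: return a == b;
    case CMP_NE: return a != b;
    }
  return false;
}
```

Doing this once up front means later folding code only has to look for constants in the second operand position.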
[Ada] String literal is allowed for pragma Warnings in Ada 83
Documentation change only, no further test required.

Tested on x86_64-pc-linux-gnu, committed on trunk

2014-10-17  Robert Dewar  de...@adacore.com

        * gnat_rm.texi: Document that string literal can be used for
        pragma Warnings when operating in Ada 83 mode.

Index: gnat_rm.texi
===================================================================
--- gnat_rm.texi (revision 216371)
+++ gnat_rm.texi (working copy)
@@ -7829,6 +7829,9 @@
 pragma Warnings (On | Off, static_string_EXPRESSION [,REASON]);
 
 REASON ::= Reason => STRING_LITERAL @{& STRING_LITERAL@}
+
+Note: in Ada 83 mode, a string literal may be used in place of
+a static string expression (which does not exist in Ada 83).
 @end smallexample
 
 @noindent
[Ada] Class-wide type invariants for type extensions in other units.
A class-wide type invariant is inherited by a type extension, and incorporated into the invariant procedure for that type. When the expression for such an invariant (typically a function call) is first analyzed, we must preserve some semantic information in it, because the type extension may be declared in a different unit, where it cannot be resolved by visibility if it refers to local entities. The following must compile quietly: gcc -c -gnata inv2.ads --- package Inv1 is type T_Inv1 is tagged private with Type_Invariant'Class = Invariant (T_Inv1); function Invariant (This : in T_Inv1'Class) return Boolean; type T_Inv2 is new Inv1.T_Inv1 with private; private type T_Inv1 is tagged record Value : Integer := 1234; end record; function Invariant (This : in T_Inv1'Class) return Boolean is (This.Value 1000); type T_Inv2 is new Inv1.T_Inv1 with null record; end Inv1; --- with Inv1; package Inv2 is type T_Inv2 is new Inv1.T_Inv1 with private; private type T_Inv2 is new Inv1.T_Inv1 with null record; end Inv2; Tested on x86_64-pc-linux-gnu, committed on trunk 2014-10-17 Ed Schonberg schonb...@adacore.com * sem_ch13.adb (Add_Invariants): For a class-wide type invariant, preserve semantic information on the invariant expression (typically a function call) because it may be inherited by a type extension in a different unit, and it cannot be resolved by visibility elsewhere because it may refer to local entities. Index: sem_ch13.adb === --- sem_ch13.adb(revision 216367) +++ sem_ch13.adb(working copy) @@ -2947,8 +2947,7 @@ -- evaluation of this aspect should be delayed to the -- freeze point (why???) 
-if No (Expr) - or else Is_True (Static_Boolean (Expr)) +if No (Expr) or else Is_True (Static_Boolean (Expr)) then Set_Uses_Lock_Free (E); end if; @@ -3621,10 +3620,10 @@ if (Attr = Name_Constant_Indexing and then Present (Find_Aspect (Etype (Ent), Aspect_Constant_Indexing))) - - or else (Attr = Name_Variable_Indexing -and then Present - (Find_Aspect (Etype (Ent), Aspect_Variable_Indexing))) + or else + (Attr = Name_Variable_Indexing + and then Present + (Find_Aspect (Etype (Ent), Aspect_Variable_Indexing))) then if Debug_Flag_Dot_XX then null; @@ -4269,11 +4268,7 @@ -- Case of address clause for a (non-controlled) object -elsif - Ekind (U_Ent) = E_Variable -or else - Ekind (U_Ent) = E_Constant -then +elsif Ekind_In (U_Ent, E_Variable, E_Constant) then declare Expr : constant Node_Id := Expression (N); O_Ent : Entity_Id; @@ -4295,7 +4290,7 @@ if Present (O_Ent) and then (Has_Controlled_Component (Etype (O_Ent)) -or else Is_Controlled (Etype (O_Ent))) + or else Is_Controlled (Etype (O_Ent))) then Error_Msg_N (??cannot overlay with controlled object, Expr); @@ -4826,13 +4821,10 @@ -- except from aspect specification. if From_Aspect_Specification (N) then - if not (Is_Protected_Type (U_Ent) -or else Is_Task_Type (U_Ent)) - then + if not Is_Concurrent_Type (U_Ent) then Error_Msg_N -(Interrupt_Priority can only be defined for task - and protected object, - Nam); +(Interrupt_Priority can only be defined for task + and protected object, Nam); elsif Duplicate_Clause then null; @@ -4985,14 +4977,12 @@ -- aspect specification. if From_Aspect_Specification (N) then - if not (Is_Protected_Type (U_Ent) -or else Is_Task_Type (U_Ent) + if not (Is_Concurrent_Type (U_Ent) or else Ekind (U_Ent) = E_Procedure) then Error_Msg_N -(Priority can only be defined for task and protected - object, - Nam); +(Priority can only be defined for task and protected + object, Nam); elsif Duplicate_Clause then
[PATCH,i686]: Temporary fix for PR63566
Hello.

After IRC discussion, IPA ICF will set the local flag to false for both the original node and the node that becomes an alias. That will enforce an equal calling convention to be used. The i686-pc-linux bootstrap is still running; I will commit the fix as soon as it finishes. I consider it pre-approved.

Thank you,
Martin

gcc/ChangeLog:

2014-10-17  Martin Liska  mli...@suse.cz

        * ipa-icf.c (sem_function::merge): Local flags are set to false
        to enforce equal calling convention to be used.
        * opts.c (common_handle_option): Indentation fix.

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index f7510b3..0e6bd9a 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -630,6 +630,11 @@ sem_function::merge (sem_item *alias_item)
       cgraph_node::create_alias (alias_func->decl, decl);
       alias->resolve_alias (original);
 
+      /* Workaround for PR63566 that forces equal calling convention
+         to be used.  */
+      alias->local.local = false;
+      original->local.local = false;
+
       if (dump_file)
         fprintf (dump_file, "Callgraph alias has been created.\n\n");
     }
diff --git a/gcc/opts.c b/gcc/opts.c
index dc8ddf4..3054196 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1982,8 +1982,8 @@ common_handle_option (struct gcc_options *opts,
       break;
 
     case OPT_fipa_icf:
-      opts->x_flag_ipa_icf_functions = value;
-      opts->x_flag_ipa_icf_variables = value;
+      opts->x_flag_ipa_icf_functions = value;
+      opts->x_flag_ipa_icf_variables = value;
       break;
 
     default:
[Ada] Fix obscure case of compiler crash on bad attribute
This fixes an error in the handling of attributes where the prefix raises an exception. This resulted from other errors in the program. No simple test case has been found, but the correction is clearly safe. Tested on x86_64-pc-linux-gnu, committed on trunk 2014-10-17 Robert Dewar de...@adacore.com * sem_attr.adb (Eval_Attribute): Ensure that attribute reference is not marked as being a static expression if the prefix evaluation raises CE. Index: sem_attr.adb === --- sem_attr.adb(revision 216367) +++ sem_attr.adb(working copy) @@ -7553,15 +7553,17 @@ Static := Static and then not Is_Constr_Subt_For_U_Nominal (P_Type); Set_Is_Static_Expression (N, Static); - end if; while Present (Nod) loop if not Is_Static_Subtype (Etype (Nod)) then Static := False; Set_Is_Static_Expression (N, False); + elsif not Is_OK_Static_Subtype (Etype (Nod)) then Set_Raises_Constraint_Error (N); + Static := False; + Set_Is_Static_Expression (N, False); end if; -- If however the index type is generic, or derived from @@ -7591,6 +7593,7 @@ begin E := E1; + while Present (E) loop -- If expression is not static, then the attribute reference @@ -7638,6 +7641,7 @@ end loop; if Raises_Constraint_Error (Prefix (N)) then +Set_Is_Static_Expression (N, False); return; end if; end;
[Ada] Better messages for missing entities in configurable runtime
A new mechanism has been implemented that allows specialization of error messages for missing entities in a configurable run-time. Instead of just outputting the (sometimes obscure) name of the entity involved, a more meaningful message can be issued. This new mechanism is used for a case of rendezvous not being supported and also for packed array operations not being supported.

Also in the case of unsupported array packing, the message is now issued explicitly on the array type entity, as shown in this test program (compiled with -gnatld7 -gnatj55)

     1. pragma No_Run_Time;
     2. procedure BadPack (M : Integer) is
     3.    type R is mod 2 ** 43;
     4.    type A is array (1 .. 10) of R;
           |
           packing of 43-bit components not allowed in no run time mode

     5.    pragma Pack (A);
     6.    AV : A;
     7. begin
     8.    AV (M) := 3;
           |
           construct not allowed in no run time mode
           packed component size of 43 is not supported

     9. end;

Tested on x86_64-pc-linux-gnu, committed on trunk

2014-10-17  Robert Dewar  de...@adacore.com

    * exp_pakd.adb: Move bit packed entity tables to spec.
    * exp_pakd.ads: Move bit packed entity tables here from body.
    * freeze.adb (Freeze_Array_Type): Check that packed array type
    is supported.
    * rtsfind.adb (PRE_Id_Table): New table
    (Entity_Not_Defined): Specialize messages using PRE_Id_Table.
    * uintp.ads, uintp.adb (UI_Image): New functional form.

Index: exp_pakd.adb
===================================================================
--- exp_pakd.adb (revision 216367)
+++ exp_pakd.adb (working copy)
@@ -34,7 +34,6 @@
 with Nlists;   use Nlists;
 with Nmake;    use Nmake;
 with Opt;      use Opt;
-with Rtsfind;  use Rtsfind;
 with Sem;      use Sem;
 with Sem_Aux;  use Sem_Aux;
 with Sem_Ch3;  use Sem_Ch3;
@@ -77,365 +76,6 @@
    --  right rotate into a left rotate, avoiding the subtract, if the machine
    --  architecture provides such an instruction.
 
-   ----------------------------------------------
-   -- Entity Tables for Packed Access Routines --
-   ----------------------------------------------
-
-   --  For the cases of component size = 3,5-7,9-15,17-31,33-63 we call library
-   --  routines. This table provides the entity for the proper routine.
-
-   type E_Array is array (Int range 01 .. 63) of RE_Id;
-
-   --  Array of Bits_nn entities. Note that we do not use library routines
-   --  for the 8-bit and 16-bit cases, but we still fill in the table, using
-   --  entries from System.Unsigned, because we also use this table for
-   --  certain special unchecked conversions in the big-endian case.
-
-   Bits_Id : constant E_Array :=
-     (01 => RE_Bits_1,
-      02 => RE_Bits_2,
-      03 => RE_Bits_03,
-      04 => RE_Bits_4,
-      05 => RE_Bits_05,
-      06 => RE_Bits_06,
-      07 => RE_Bits_07,
-      08 => RE_Unsigned_8,
-      09 => RE_Bits_09,
-      10 => RE_Bits_10,
-      11 => RE_Bits_11,
-      12 => RE_Bits_12,
-      13 => RE_Bits_13,
-      14 => RE_Bits_14,
-      15 => RE_Bits_15,
-      16 => RE_Unsigned_16,
-      17 => RE_Bits_17,
-      18 => RE_Bits_18,
-      19 => RE_Bits_19,
-      20 => RE_Bits_20,
-      21 => RE_Bits_21,
-      22 => RE_Bits_22,
-      23 => RE_Bits_23,
-      24 => RE_Bits_24,
-      25 => RE_Bits_25,
-      26 => RE_Bits_26,
-      27 => RE_Bits_27,
-      28 => RE_Bits_28,
-      29 => RE_Bits_29,
-      30 => RE_Bits_30,
-      31 => RE_Bits_31,
-      32 => RE_Unsigned_32,
-      33 => RE_Bits_33,
-      34 => RE_Bits_34,
-      35 => RE_Bits_35,
-      36 => RE_Bits_36,
-      37 => RE_Bits_37,
-      38 => RE_Bits_38,
-      39 => RE_Bits_39,
-      40 => RE_Bits_40,
-      41 => RE_Bits_41,
-      42 => RE_Bits_42,
-      43 => RE_Bits_43,
-      44 => RE_Bits_44,
-      45 => RE_Bits_45,
-      46 => RE_Bits_46,
-      47 => RE_Bits_47,
-      48 => RE_Bits_48,
-      49 => RE_Bits_49,
-      50 => RE_Bits_50,
-      51 => RE_Bits_51,
-      52 => RE_Bits_52,
-      53 => RE_Bits_53,
-      54 => RE_Bits_54,
-      55 => RE_Bits_55,
-      56 => RE_Bits_56,
-      57 => RE_Bits_57,
-      58 => RE_Bits_58,
-      59 => RE_Bits_59,
-      60 => RE_Bits_60,
-      61 => RE_Bits_61,
-      62 => RE_Bits_62,
-      63 => RE_Bits_63);
-
-   --  Array of Get routine entities. These are used to obtain an element from
-   --  a packed array. The N'th entry is used to obtain elements from a packed
-   --  array whose component size is N. RE_Null is used as a null entry, for
-   --  the cases where a library routine is not used.
-
-   Get_Id : constant E_Array :=
-     (01 => RE_Null,
-      02 => RE_Null,
-      03 => RE_Get_03,
-      04 => RE_Null,
-      05 => RE_Get_05,
-      06 => RE_Get_06,
-      07 => RE_Get_07,
-      08 => RE_Null,
-      09 => RE_Get_09,
-      10 => RE_Get_10,
-      11 => RE_Get_11,
-      12 => RE_Get_12,
-      13 => RE_Get_13,
-      14 => RE_Get_14,
-      15 => RE_Get_15,
-      16 => RE_Null,
-
[Ada] Short_Integer should be considered implementation defined
For the purposes of restriction No_Implementation_Identifiers, Standard.Short_Integer should be considered as being implementation defined and this was not the case.

In addition, this patch fixes a compiler blow up with a compiler built with assertions in the test for implementation-defined identifiers. Note that the latter problem is not documented in the KP entry for this ticket, since it shows up only in compilers built with assertions.

The following should compile as indicated with -gnatld7 -gnatj55

     1. pragma Restriction_Warnings
     2.   (No_Implementation_Identifiers);
     3. package ImplIdent is
     4.    subtype Integer_8 is Standard.Short_Short_Integer;
           |
           warning: violation of restriction No_Implementation_Identifiers at line 1

     5.    subtype Integer_16 is Standard.Short_Integer;
           |
           warning: violation of restriction No_Implementation_Identifiers at line 1

     6. end;

Tested on x86_64-pc-linux-gnu, committed on trunk

2014-10-17  Robert Dewar  de...@adacore.com

    * cstand.adb (Create_Standard): Mark Short_Integer as
    implementation defined.
    * sem_util.adb (Set_Entity_With_Checks): Avoid blow up for
    compiler built with assertions for No_Implementation_Identifiers
    test.

Index: sem_util.adb
===================================================================
--- sem_util.adb (revision 216371)
+++ sem_util.adb (working copy)
@@ -16462,8 +16462,9 @@
          --  the entities within it).
 
          if (Is_Implementation_Defined (Val)
-              or else
-                Is_Implementation_Defined (Scope (Val)))
+               or else
+                 (Present (Scope (Val))
+                   and then Is_Implementation_Defined (Scope (Val))))
            and then not (Ekind_In (Val, E_Package, E_Generic_Package)
                           and then Is_Library_Level_Entity (Val))
          then
Index: cstand.adb
===================================================================
--- cstand.adb (revision 216367)
+++ cstand.adb (working copy)
@@ -735,6 +735,7 @@
 
       Build_Signed_Integer_Type
         (Standard_Short_Integer, Standard_Short_Integer_Size);
+      Set_Is_Implementation_Defined (Standard_Short_Integer);
 
       Build_Signed_Integer_Type
         (Standard_Integer, Standard_Integer_Size);
Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, Here is reduced patch as you requested. All your remarks have been fixed. Could you please look at it ( I have already sent the patch with changes in add_to_predicate_list for review). + if (dump_file (dump_flags TDF_DETAILS)) + fprintf (dump_file, More than two phi node args.\n); + return false; + } + +} Excess vertical space. +/* Assumes that BB has more than 2 predecessors. More than 1 predecessor? + Returns false if at least one successor is not on critical edge + and true otherwise. */ + +static inline bool +all_edges_are_critical (basic_block bb) +{ all_preds_critical_p would be a better name + if (EDGE_COUNT (bb-preds) 2) +{ + if (!flag_force_vectorize) + return false; +} as I said in the last review I don't think we should restrict edge predicates to flag_force_vectorize. At least I can't see how if-conversion is magically more expensive for that case? So please rework the patch so critical edges are always handled correctly. Ok with that and the above suggested changes. Thanks, Richard. Thanks. Yuri. ChangeLog 2014-10-16 Yuri Rumyantsev ysrum...@gmail.com (flag_force_vectorize): New variable. (edge_predicate): New function. (set_edge_predicate): New function. (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list if destination block of edge is not always executed. Set-up predicate for critical edge. (if_convertible_phi_p): Accept phi nodes with more than two args if FLAG_FORCE_VECTORIZE was set-up. (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. (if_convertible_stmt_p): Fix up pre-function comments. (all_edges_are_critical): New function. (if_convertible_bb_p): Allow bb has more than two predecessors if FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical to reject block if-conversion with incoming critical edges only if FLAG_FORCE_VECTORIZE was not set-up. 
(predicate_bbs): Skip loop exit block also.Invoke build2_loc to compute predicate instead of fold_build2_loc. Add zeroing of edge 'aux' field. (find_phi_replacement_condition): Extend function interface: it returns NULL if given phi node must be handled by means of extended phi node predication. If number of predecessors of phi-block is equal 2 and atleast one incoming edge is not critical original algorithm is used. (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false. Nullify 'aux' field of edges for blocks with two successors. 2014-10-15 13:50 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, Here is updated patch (part1) for extended if conversion. Second part of patch will be sent later. Ok, I'm starting to look at this. I'd still like you to split things up more. static inline void add_to_predicate_list (struct loop *loop, basic_block bb, tree nc) { ... + /* We use notion of cd equivalence to get simplier predicate for +join block, e.g. if join block has 2 predecessors with predicates +p1 p2 and p1 !p2, we'd like to get p1 for it instead of +p1 p2 | p1 !p2. */ + if (dom_bb != loop-header + get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb) + { + gcc_assert (flow_bb_inside_loop_p (loop, dom_bb)); + bc = bb_predicate (dom_bb); + gcc_assert (!is_true_predicate (bc)); these changes look worthwhile even for !flag_force_vectorize. So please split the change to add_to_predicate_list out and compute post-dominators unconditionally. Note that you should call free_dominance_info (CDI_POST_DOMINATORS) at the end of if-conversion. + if (!dominated_by_p (CDI_DOMINATORS, loop-latch, e-dest)) +add_to_predicate_list (loop, e-dest, cond); + + /* If edge E is critical save predicate on it. */ + if (EDGE_COUNT (e-dest-preds) = 2) +set_edge_predicate (e, cond); how do we know the edge is critical by this simple check? 
Why not simply always save edge predicates (well, you kind of do but omit the case where e-src dominates e-dest). Btw, you can rely on edge-aux being NULL at the start of the pass but need to clear it at the end (best use clear_aux_for_edges () for that). So stuff like + extract_true_false_edges_from_block (bb, true_edge, false_edge); + if (flag_force_vectorize) + true_edge-aux = false_edge-aux = NULL; shouldn't be necessary. I think the edge predicate handling should also be unconditionally and not depend on flag_force_vectorize. + /* The loop latch and loop exit block are always executed and +have no extra conditions to be processed: skip them. */ + if (bb == loop-latch + || bb_with_exit_edge_p (loop, bb)) I don't think the edge stuff is true - given you
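Richard's question about the `EDGE_COUNT (e->dest->preds) >= 2` check rests on the usual definition of a critical edge: the source block has more than one successor and the destination block has more than one predecessor. Looking at the destination alone is not enough. The following is only a minimal sketch of that definition in C; the `struct block` and `struct cfg_edge` types and both function names are hypothetical stand-ins, not GCC's `basic_block`/`edge` API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified CFG types; these are NOT GCC's data structures. */
struct block {
    size_t n_preds;  /* number of incoming edges */
    size_t n_succs;  /* number of outgoing edges */
};

struct cfg_edge {
    const struct block *src;
    const struct block *dest;
};

/* An edge is critical when its source has several successors AND its
   destination has several predecessors. */
static bool edge_is_critical(const struct cfg_edge *e)
{
    assert(e != NULL && e->src != NULL && e->dest != NULL);
    return e->src->n_succs >= 2 && e->dest->n_preds >= 2;
}

/* True only if every one of the N incoming edges in PREDS is critical. */
static bool all_preds_critical(const struct cfg_edge *const *preds, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!edge_is_critical(preds[i]))
            return false;
    return true;
}
```

With a join block that has two predecessors, an edge coming from a block with a single successor is not critical even though the destination-only check would accept it.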
[Ada] Better error message for illegal iterator expression
This patch improves the error message on an iterator specification whose name is a function call that does not yield a type that implements an iterator interface.

Compiling try_containers.adb must yield:

   try_containers.adb:17:18: expect object that implements iterator interface

--
with Ada.Text_Io; use Ada.Text_Io;
with Ada.Containers.Vectors;
procedure Try_Containers is
   package Integer_Vectors is new
     Ada.Containers.Vectors (Natural, Integer);
   use Integer_Vectors;
   A : Vector := To_Vector (1, 10);
begin
   Loop_1 : for Element of A loop
      Put_Line ("A (i) = " & Integer'Image (Element));
      -- can't do Element := 2;
   end loop Loop_1;

   Loop_2 : for Cursor in First (A) loop  -- oops! should be:
   -- for Cursor in Iterate (A) loop
      Put_Line ("A (I) = " & Integer'Image (Element (Cursor)));
      Replace_Element (A, Cursor, 2);
      Reference (A, Cursor) := 2;
   end loop Loop_2;
end Try_Containers;

Tested on x86_64-pc-linux-gnu, committed on trunk

2014-10-17  Ed Schonberg  schonb...@adacore.com

    * sem_ch5.adb (Analyze_Iterator_Specification): If the domain
    of iteration is given by an expression that is not an array type,
    verify that its type implements an iterator interface.

Index: sem_ch5.adb
===================================================================
--- sem_ch5.adb (revision 216367)
+++ sem_ch5.adb (working copy)
@@ -1838,6 +1838,17 @@
 
          else
             Typ := Etype (Iter_Name);
+
+            --  Verify that the expression produces an iterator.
+
+            if not Of_Present (N) and then not Is_Iterator (Typ)
+              and then not Is_Array_Type (Typ)
+              and then No (Find_Aspect (Typ, Aspect_Iterable))
+            then
+               Error_Msg_N
+                 ("expect object that implements iterator interface",
+                  Iter_Name);
+            end if;
          end if;
 
          --  Protect against malformed iterator
[Ada] Directories are no longer created for abstract projects
Directories such as object directories are no longer created for abstract projects when the builder (gnatmake or gprbuild) is called with -P or with --subdirs=..., even when there is no explicit indication in the abstract project that there are no sources in the project.

Tested on x86_64-pc-linux-gnu, committed on trunk

2014-10-17  Vincent Celier  cel...@adacore.com

    * prj-nmsc.adb (Get_Directories): Do not create directories
    when a project is abstract.

Index: prj-nmsc.adb
===================================================================
--- prj-nmsc.adb (revision 216367)
+++ prj-nmsc.adb (working copy)
@@ -5498,13 +5498,15 @@
       Dir_Exists : Boolean;
       No_Sources : constant Boolean :=
-                     ((not Source_Files.Default
+                     Project.Qualifier = Abstract_Project
+                       or else
+                     (((not Source_Files.Default
                         and then Source_Files.Values = Nil_String)
                       or else
                       (not Source_Dirs.Default
                         and then Source_Dirs.Values = Nil_String)
                       or else
                       (not Languages.Default
                         and then Languages.Values = Nil_String))
-                     and then Project.Extends = No_Project;
+                     and then Project.Extends = No_Project);
 
    --  Start of processing for Get_Directories
[linaro/gcc-4_9-branch] Merge from gcc-4_9-branch and backports
Hi all,

we have merged the gcc-4_9-branch into linaro/gcc-4_9-branch up to revision 216130 as r216256.

We have also backported this set of revisions:

r209643 as 215975 : [AArch64] Define TARGET_FLAGS_REGNUM
r211881 as 215975 : PR target/61565
r213035 as 215846 : [AArch64] libitm: Improve _ITM_beginTransaction
r213090 as 215847 : [AArch64] Fix *extr_insv_lower_reg<mode> pattern
r214824 as 215977 : [AArch64] Use CC_Z and CC_NZ with csinc and similar instructions
r214825 as 216007 : [AArch32][1/2] Implement lceil, lfloor, lround optabs with new ARMv8-A instructions
r214826 as 216007 : [AArch32][2/2] Vectorise lroundf, lfloorf, lceilf using the new ARMv8-A vcvt* instructions
r214886 as 215944 : [AArch64] Improve epilogue unwind info rth
r214940 as 215853 : [AArch64] Add a mode to operand 1 of sibcall_value_insn
r214943 as 215946 : [AArch64] Add a builtin for rbit(q?)_p8; add intrinsics and tests.
r214944 as 215948 : [AArch32/AArch64] Schedule alu_ext for Cortex-A53
r214945 as 215949 : [AArch64] Remove varargs from aarch64_simd_expand_args
r214947 as 215854 : [AArch64] Tidy: remove unused qualifier_const_pointer
r214959 as 215857 : [AArch32/AArch64] Add scheduling info for ARMv8-A FPU new instructions in Cortex-A53
r215050 as 215858 : [AArch32][1/7] Convert FP mnemonics to UAL | mov patterns.
r215051 as 215858 : [AArch32][2/7] Convert FP mnemonics to UAL | add/sub/div/abs patterns
r215052 as 215858 : [AArch32][3/7] Convert FP mnemonics to UAL | mul+add patterns
r215053 as 215858 : [AArch32][4/7] Convert FP mnemonics to UAL | vcvt patterns
r215054 as 215858 : [AArch32][5/7] Convert FP mnemonics to UAL | sqrt and FP compare patterns
r215055 as 215858 : [AArch32][6/7] Convert FP mnemonics to UAL | movcc_vfp (fmstat)
r215056 as 215858 : [AArch32][7/7] Convert FP mnemonics to UAL | f{ld,st}m -> v{ld,st}m
r215067 as 215923 : [AArch32] Enable auto-vectorization for copysignf
r215085 as 216007 : [AArch32][tests] Make input and output arrays 128-bit aligned in vectorisation tests
r215086 as 215928 : [AArch64] Add crtfastmath for AArch64
r215101 as 215929 : PR target/56846 libstdc++
r215136 as 215932 : PR target/63209
r215205 as 215935 : [Ree] Ensure inserted copy don't change the number of hard registers
r215260 as 215937 : [AArch64] Fix force_simd macro in vdup_lane_2
r215321 as 215938 : Disallow -mfpu=neon for unsuitable architectures
r215346 as 215940 : movmisalign<mode>_neon_load
r215385 as 215941 : [AArch64] Add constraint letter for stack_protect_test pattern
r215471 as 216004 : [AArch64] Auto-generate the BUILTIN_ macros

This will be part of our 2014.10 4.9 release.

Thanks,
Yvan
[Ada] Internal clean up (use Is_Directory_Separator)
This is an internal clean up to use an existing abstraction more extensively. No external effect, no test required. Tested on x86_64-pc-linux-gnu, committed on trunk 2014-10-17 Robert Dewar de...@adacore.com * gnatcmd.adb, make.adb, prj-part.adb, gnatlink.adb, prj-nmsc.adb, prj-conf.adb, prj-env.adb: Use Is_Directory_Separator where possible. Index: gnatcmd.adb === --- gnatcmd.adb (revision 216367) +++ gnatcmd.adb (working copy) @@ -883,10 +883,9 @@ if not Is_Absolute_Path (Exec_File_Name) then for Index in Exec_File_Name'Range loop if Exec_File_Name (Index) = Directory_Separator then - Fail (relative executable ( - Exec_File_Name - ) with directory part not allowed - when using project files); + Fail (relative executable ( Exec_File_Name + ) with directory part not allowed + when using project files); end if; end loop; @@ -1398,9 +1397,7 @@ else for K in Switch'Range loop -if Switch (K) = '/' - or else Switch (K) = Directory_Separator -then +if Is_Directory_Separator (Switch (K)) then Test_Existence := True; exit; end if; Index: make.adb === --- make.adb(revision 216367) +++ make.adb(working copy) @@ -4057,8 +4057,7 @@ begin First := Name'Last; while First Name'First -and then Name (First - 1) /= Directory_Separator -and then Name (First - 1) /= '/' +and then not Is_Directory_Separator (Name (First - 1)) loop First := First - 1; end loop; @@ -6805,8 +6804,7 @@ begin First := Name'Last; while First Name'First - and then Name (First - 1) /= Directory_Separator - and then Name (First - 1) /= '/' + and then not Is_Directory_Separator (Name (First - 1)) loop First := First - 1; end loop; Index: prj-part.adb === --- prj-part.adb(revision 216367) +++ prj-part.adb(working copy) @@ -349,8 +349,7 @@ Get_Name_String (Path_Name_Of (Main_Project, In_Tree)); while Name_Len 0 -and then Name_Buffer (Name_Len) /= Directory_Separator -and then Name_Buffer (Name_Len) /= '/' +and then not Is_Directory_Separator (Name_Buffer (Name_Len)) loop Name_Len := Name_Len - 1; end loop; Index: 
gnatlink.adb === --- gnatlink.adb(revision 216367) +++ gnatlink.adb(working copy) @@ -1204,9 +1204,8 @@ if GCC_Index = 0 then GCC_Index := Index (Path (1 .. Path_Last), - Directory_Separator - lib - Directory_Separator); + Directory_Separator lib +Directory_Separator); end if; -- If we have found a lib subdir in Index: prj-nmsc.adb === --- prj-nmsc.adb(revision 216381) +++ prj-nmsc.adb(working copy) @@ -5031,10 +5031,7 @@ if OK then for J in 1 .. Name_Len loop - if Name_Buffer (J) = '/' - or else -Name_Buffer (J) = Directory_Separator - then + if Is_Directory_Separator (Name_Buffer (J)) then OK := False; exit; end if; @@ -5336,9 +5333,7 @@ function Compute_Directory_Last (Dir : String) return Natural is begin if Dir'Length 1 -and then (Dir (Dir'Last - 1) = Directory_Separator -or else - Dir (Dir'Last - 1) = '/') +and then Is_Directory_Separator (Dir (Dir'Last - 1)) then return Dir'Last - 1; else @@ -5858,7 +5853,7 @@ -- Check that there is no directory information for J in 1 .. Last loop - if Line (J) = '/' or else Line (J) = Directory_Separator then + if
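For readers who do not follow Ada, the clean-up replaces repeated two-way comparisons (`'/'` or the host `Directory_Separator`) with a single predicate. The sketch below mirrors that idea in C, including the backward scan from make.adb. It is an illustration only: `HOST_DIR_SEPARATOR`, `is_directory_separator` and `simple_file_name` are hypothetical names, and the backslash separator is an assumption modelling a Windows-like host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the host separator; '\\' models a Windows-like host. */
#define HOST_DIR_SEPARATOR '\\'

/* One predicate instead of two comparisons at every call site. */
static bool is_directory_separator(char c)
{
    return c == '/' || c == HOST_DIR_SEPARATOR;
}

/* Returns a pointer to the part of NAME that follows the last separator,
   scanning backwards like the loops in the patch. */
static const char *simple_file_name(const char *name)
{
    assert(name != NULL);
    size_t first = strlen(name);
    while (first > 0 && !is_directory_separator(name[first - 1]))
        first--;
    return name + first;
}
```

Keeping the comparison in one place means a host with an unusual separator only needs a change to the predicate, not to every scanning loop.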
[patch] LWG 2019 - std::isblank<C>(C, const std::locale&)
http://cplusplus.github.io/LWG/lwg-defects.html#2019

I've checked the relevant _ISblank/_ISBLANK/_CTYPE_B constant on all targets except VxWorks where I chose something that looks reasonable.

Not all targets reserve a bit for isblank, but this way ctype_base::blank is always defined, but on some platforms with the same value as ctype_base::space. That means that on those targets isblank(c, loc) is equivalent to isspace(c, loc) which is not correct, but isn't completely crazy either.

Some systems (bionic, newlib, netbsd, openbsd) do define a _B (or _CTYPE_B) constant, but as it says on netbsd:

/*
 * isblank() is implemented as C function, due to insufficient bitwidth in
 * _ctype_. Note that _B does not mean isblank - it means "isprint && !isgraph".
 */

On those targets there is no bitmask corresponding to the isblank set. I don't know how to solve that without changing ctype_base::mask to a wider type, which I'm not planning on doing.

N.B. on other BSDs (freebsd, darwin, dragonfly) _CTYPE_B *does* correspond to isblank. Portability is fun.

Some implementations of ctype<char>::is(mask, char) and/or ctype<wchar_t>::do_is defined inline in config/os/*/ctype_inline.h need the ctype_base::blank mask, but those files get included by C++98 code, so for some targets ctype_base::blank is always defined even in C++98 mode. Solving that is too difficult.

Tested x86_64-linux, with --enable-clocale={gnu,generic} and also by hacking configure.host to use config/os/generic, and also tested on x86_64-netbsd5.1 and x86_64-dragonfly3.6. Something will probably break on a target I didn't test, but should be easy to fix.

I plan to commit this later today.

commit 1c95ff5159b6a41e4f5f4d4919b8e394905bb9c4
Author: Jonathan Wakely jwak...@redhat.com
Date:   Thu Oct 16 15:21:02 2014 +0100

    * src/c++98/Makefile.am: Move ctype.cc, ctype_configure_char.cc and
    ctype_members.cc to ...
    * src/c++11/Makefile.am: Here.
    * src/c++98/Makefile.in: Regenerate.
    * src/c++11/Makefile.in: Regenerate.
    * src/c++98/ctype.cc: Move file to ...
    * src/c++11/ctype.cc: Here, define ctype_base::blank.
    * config/abi/pre/gnu.ver: Export ctype_base::blank.
    * config/locale/generic/ctype_members.cc
    (ctype<wchar_t>::_M_convert_to_wmask): Handle blank. Update comments.
    * config/locale/gnu/ctype_members.cc
    (ctype<wchar_t>::_M_convert_to_wmask): Likewise.
    * config/os/aix/ctype_base.h (ctype_base::blank): Declare.
    * config/os/bionic/ctype_base.h (ctype_base::blank): Likewise.
    * config/os/bsd/darwin/ctype_base.h (ctype_base::blank): Declare.
    * config/os/bsd/darwin/ctype_inline.h (ctype<char>::is): Use blank.
    (ctype<wchar_t>::do_is): Likewise.
    * config/os/bsd/dragonfly/ctype_base.h (ctype_base::blank): Declare.
    * config/os/bsd/dragonfly/ctype_inline.h (ctype<char>::is): Use blank.
    (ctype<wchar_t>::do_is): Likewise.
    * config/os/bsd/freebsd/ctype_base.h (ctype_base::blank): Declare.
    * config/os/bsd/freebsd/ctype_inline.h (ctype<char>::is): Use blank.
    (ctype<wchar_t>::do_is): Likewise.
    * config/os/bsd/netbsd/ctype_base.h (ctype_base::blank): Declare.
    * config/os/bsd/openbsd/ctype_base.h (ctype_base::blank): Likewise.
    * config/os/djgpp/ctype_base.h (ctype_base::blank): Likewise.
    * config/os/generic/ctype_base.h (ctype_base::blank): Declare.
    * config/os/generic/ctype_inline.h (ctype<char>::is): Use blank.
    * config/os/gnu-linux/ctype_base.h (ctype_base::blank): Declare.
    * config/os/hpux/ctype_base.h (ctype_base::blank): Likewise.
    * config/os/mingw32-w64/ctype_base.h (ctype_base::blank): Declare.
    * config/os/mingw32-w64/ctype_configure_char.cc
    (ctype<char>::classic_table()): Set blank bit for space and tab.
    * config/os/mingw32/ctype_base.h (ctype_base::blank): Declare.
    * config/os/mingw32/ctype_configure_char.cc
    (ctype<char>::classic_table()): Set blank bit for space and tab.
    * config/os/newlib/ctype_base.h (ctype_base::blank): Declare.
    * config/os/qnx/qnx6.1/ctype_base.h (ctype_base::blank): Likewise.
    * config/os/solaris/solaris2.10/ctype_base.h (ctype_base::blank):
    Likewise.
    * config/os/tpf/ctype_base.h (ctype_base::blank): Likewise.
    * config/os/uclibc/ctype_base.h (ctype_base::blank): Likewise.
    * config/os/vxworks/ctype_base.h (ctype_base::blank): Likewise.
    * include/bits/locale_facets.h (isblank): Define.
    * include/bits/localefwd.h (isblank): Declare.
    * testsuite/22_locale/classification/isblank.cc: New.
    * testsuite/22_locale/ctype_base/blank.cc: New.

diff --git a/libstdc++-v3/src/c++11/Makefile.am b/libstdc++-v3/src/c++11/Makefile.am
index 39425d4..c8507ce 100644
--- a/libstdc++-v3/src/c++11/Makefile.am
+++ b/libstdc++-v3/src/c++11/Makefile.am
@@ -27,9 +27,22 @@ noinst_LTLIBRARIES = libc++11convenience.la
 
 headers =
 
+# Source files linked in via configuration/make substitution for a
+# particular host.
+host_sources = \
+	ctype_configure_char.cc \
+
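To make the blank/space distinction discussed in this message concrete, here is a small sketch using the C99 `<ctype.h>` functions in the default "C" locale as a stand-in for the C++ `std::isblank(c, loc)` and `std::isspace(c, loc)`. It is not the libstdc++ implementation, and the wrapper names `is_blank_char` and `is_space_char` are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

/* Casting to unsigned char avoids undefined behaviour for negative
   char values passed to the <ctype.h> functions. */
static bool is_blank_char(char c)
{
    return isblank((unsigned char)c) != 0;
}

static bool is_space_char(char c)
{
    return isspace((unsigned char)c) != 0;
}
```

In the "C" locale only space and horizontal tab are blank, while a newline is whitespace but not blank. On a target where `ctype_base::blank` has the same value as `ctype_base::space`, a newline would be reported as blank, which is the inaccuracy the message accepts.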
[RS6000] Fix -mlongcall with nested functions on AIX
Hi,

-mlongcall miscompiles nested functions with the AIX ABI. The problem is that, when -mlongcall is in effect, rs6000_call_aix redirects all calls to:

  /* Handle indirect calls.  */
  if (GET_CODE (func_desc) != SYMBOL_REF
      || (DEFAULT_ABI == ABI_AIX && !SYMBOL_REF_FUNCTION_P (func_desc)))

and then

  /* A function pointer under AIX is a pointer to a data area whose
     first word contains the actual address of the function, whose
     second word contains a pointer to its TOC, and whose third word
     contains a value to place in the static chain register (r11).
     Note that if we load the static chain, our trampoline need
     not have any executable code.  */

But, if the call was originally direct, no trampoline has been built, which means that the value loaded into the static chain register is garbage. That's sort of OK, except when the called function is nested because the static chain register has already been loaded with the proper static chain value by the generic code, so overwriting it with garbage breaks the program.

Tested on PowerPC/AIX, OK for the mainline?

2014-10-17  Eric Botcazou  ebotca...@adacore.com

    * config/rs6000/rs6000.c (rs6000_call_aix): For the AIX ABI, do not
    load the static chain if the call was originally direct.

2014-10-17  Eric Botcazou  ebotca...@adacore.com

    * gcc.target/powerpc/longcall-2.c: New test.

-- 
Eric Botcazou

Index: config/rs6000/rs6000.c
===================================================================
--- config/rs6000/rs6000.c (revision 216252)
+++ config/rs6000/rs6000.c (working copy)
@@ -32568,6 +32568,8 @@ rs6000_legitimate_constant_p (enum machi
 void
 rs6000_call_aix (rtx value, rtx func_desc, rtx flag, rtx cookie)
 {
+  const bool direct_call_p
+    = GET_CODE (func_desc) == SYMBOL_REF && SYMBOL_REF_FUNCTION_P (func_desc);
   rtx toc_reg = gen_rtx_REG (Pmode, TOC_REGNUM);
   rtx toc_load = NULL_RTX;
   rtx toc_restore = NULL_RTX;
@@ -32636,8 +32638,11 @@ rs6000_call_aix (rtx value, rtx func_des
                                                              func_toc_offset));
           toc_load = gen_rtx_USE (VOIDmode, func_toc_mem);
 
-          /* If we have a static chain, load it up.  */
-          if (TARGET_POINTERS_TO_NESTED_FUNCTIONS)
+          /* If we have a static chain, load it up.  But, if the call was
+             originally direct, the 3rd word has not been written since no
+             trampoline has been built, so we ought not to load it, lest we
+             override a static chain value.  */
+          if (!direct_call_p && TARGET_POINTERS_TO_NESTED_FUNCTIONS)
             {
               rtx sc_reg = gen_rtx_REG (Pmode, STATIC_CHAIN_REGNUM);
               rtx func_sc_offset = GEN_INT (2 * GET_MODE_SIZE (Pmode));

/* { dg-do run } */
/* { dg-options "-mlongcall" } */

extern void abort (void);

#define VAL 12345678

int j = VAL;

void
bar (void)
{
  if (j != VAL)
    abort ();
}

int
main (void)
{
  int i = VAL;

  int foo (void)
  {
    if (i != VAL)
      abort ();
  }

  foo ();
  bar ();

  return 0;
}
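The decision the patch makes can be modelled in a few lines of plain C. This is only an illustrative sketch of the three-word descriptor described above: the struct layout follows the comment quoted in the message, but the field names and the `chain_after_call_setup` function are assumptions and are not taken from rs6000.c. The target flag TARGET_POINTERS_TO_NESTED_FUNCTIONS is left out of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of an AIX function descriptor; field names are
   assumptions made for this sketch. */
struct func_desc {
    uintptr_t entry;         /* word 0: address of the function's code */
    uintptr_t toc;           /* word 1: the function's TOC pointer */
    uintptr_t static_chain;  /* word 2: only valid if a trampoline wrote it */
};

/* Returns the static chain value in effect once the call has been set up.
   For an originally direct call the third word was never written, so the
   value already loaded by generic code must be kept. */
static uintptr_t chain_after_call_setup(const struct func_desc *fd,
                                        uintptr_t current_chain,
                                        bool direct_call_p)
{
    assert(fd != NULL);
    if (direct_call_p)
        return current_chain;
    return fd->static_chain;
}
```

The sketch shows why the unconditional load was harmful: with a direct call to a nested function, `current_chain` is the correct value and the descriptor's third word is garbage.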
[patch, rfc] fix warning building libssp in C11 mode
Building libssp in C11 mode shows a warning for 64bit configurations,

../../../src/libssp/gets-chk.c:62:12: warning: return makes pointer from integer without a cast [-Wint-conversion]

Currently working around by adding a prototype in gets-chk.c, conditionally defined by the inverted condition found in glibc's stdio.h. Is there a better approach?

  Matthias

# DP: Declare prototype for gets in C11 mode

--- libssp/gets-chk.c
+++ libssp/gets-chk.c
@@ -51,6 +51,11 @@
 # include <string.h>
 #endif
 
+#if !(!defined __USE_ISOC11 \
+      || (defined __cplusplus && __cplusplus <= 201103L))
+extern char *gets (char *);
+#endif
+
 extern void __chk_fail (void) __attribute__((__noreturn__));
 
 char *
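The warning matters on 64-bit configurations because, without a visible prototype, the call is treated as returning `int`, and a 64-bit address does not survive a trip through a 32-bit integer. The sketch below models only that loss of the upper bits. It uses unsigned arithmetic so the behaviour stays well defined, whereas a real implicit-int return would go through a signed `int`. The function name `through_32bit_int` is hypothetical.

```c
#include <stdint.h>

/* Models a 64-bit address being squeezed through a 32-bit return value. */
static uint64_t through_32bit_int(uint64_t address)
{
    uint32_t low = (uint32_t)address;  /* the upper 32 bits are dropped here */
    return (uint64_t)low;
}
```

Small addresses pass through unchanged, which is why this kind of bug can stay hidden until a buffer happens to live above the 4 GiB boundary.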
Re: -fuse-caller-save - Collect register usage information
On 16-10-14 23:46, Eric Botcazou wrote: Having said that, in my mind, what is confusing about the name -fuse-caller-save, is that in fact the caller-save registers are already used in register allocation. It's just that they're used across calls without the need to save them, but -fuse-caller-save-across-calls-without-saving-if-possible is not such a good option name. Agreed. Another thing that - in my mind - is confusing is that there's an option fcaller-saves which controls behaviour for caller-save registers: - for -fno-caller-saves, caller-save registers are not used across calls - for -fcaller-saves, caller-save registers are used across calls The name is similar to -fuse-caller-save, and it won't be clear from just the names what the difference is. OK, so the existing -fcaller-saves is in fact -fuse-caller-saves, Right, in the sense that a caller-save is the save of caller-save register, as opposed to short for a caller-save register, which is how it's used in -fuse-caller-save. which means that we should really find a better name for yours. :-) Agreed :) I've pondered the name -fipa-ira, but I rejected that earlier because that might suggest actual register allocation at the interprocedural scope, while this is only register allocation at the scope of a single procedure, taking some interprocedural information into account. Furthermore, it's not only ira that uses the interprocedural information. So, let's a generate a list of option names. -fuse-caller-save -fuse-call-clobbered -fprecise-call-clobbers -foptimize-call-clobbers -fprune-call-clobbers -freduce-call-clobbers -fcall-clobbers-ipa Any preferences, alternatives? Given the existing -fcaller-saves, I'd keep caller-saves in the name, so something along the lines of -foptimize-caller-saves or -fipa-caller-saves. Let's look at the effect of the option (after the recent fix for PR61605) on gcc.target/i386/fuse-calller-save.c: ... 
 foo:
 .LFB1:
 	.cfi_startproc
-	pushq	%rbx
-	.cfi_def_cfa_offset 16
-	.cfi_offset 3, -16
-	movl	%edi, %ebx
 	call	bar
-	addl	%ebx, %eax
-	popq	%rbx
-	.cfi_def_cfa_offset 8
+	addl	%edi, %eax
 	ret
 	.cfi_endproc
 .LFE1:
...

So, the effect is: instead of using a callee-save register, we use a caller-save register to store a value that's live over a call, without needing to add a caller-save, as would be normally the case.

If I see an option -foptimize-caller-saves, I'd expect the effect to be that without, there are some caller-saves and with, there are fewer. This is not the case in the diff above. Nevertheless, if we'd have a case where we already have caller-saves, that would be indeed the observed effect. I'm just trying to point out that the optimization does more than just removing caller-saves.

The optimization, at its core, can be regarded as removing superfluous clobbers from calls, and everything else is derived from that:
- if a caller-save register is not clobbered by a call, then there's no need for a caller-save before that call, so it's cheaper to use across that call than a callee-save register. (which explains what we see in the diff)
- if a caller-save register is live across a call, and is not clobbered by a call, then there's no need for a caller-save, and it can be removed. (which explains what we see in case we have an example where there are actual caller-saves without the optimization, and less so with the optimization)

I'm starting to lean towards -foptimize-call-clobbers or similar.

Thanks,
- Tom
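For context, source of roughly the following shape produces an assembly diff like the one above: a value that is live across a call to a function whose register usage the compiler can inspect. This is a sketch written for illustration and is not claimed to be the exact contents of the testsuite file. The `noinline` attribute is a GCC extension used here to keep the call from being inlined away.

```c
/* bar is static, so the compiler can see which registers it actually
   clobbers when compiling its callers. */
static int __attribute__((noinline)) bar(int x)
{
    return x + 3;
}

/* y is live across the call to bar: it is needed again for the addition
   after bar returns. */
int __attribute__((noinline)) foo(int y)
{
    return y + bar(y);
}
```

If bar leaves the register holding `y` untouched, foo can keep `y` there across the call and needs neither a callee-saved register nor a save/restore pair.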
Re: -fuse-caller-save - Collect register usage information
On Fri, Oct 17, 2014 at 12:47 PM, Tom de Vries tom_devr...@mentor.com wrote: On 16-10-14 23:46, Eric Botcazou wrote: Having said that, in my mind, what is confusing about the name -fuse-caller-save, is that in fact the caller-save registers are already used in register allocation. It's just that they're used across calls without the need to save them, but -fuse-caller-save-across-calls-without-saving-if-possible is not such a good option name. Agreed. Another thing that - in my mind - is confusing is that there's an option fcaller-saves which controls behaviour for caller-save registers: - for -fno-caller-saves, caller-save registers are not used across calls - for -fcaller-saves, caller-save registers are used across calls The name is similar to -fuse-caller-save, and it won't be clear from just the names what the difference is. OK, so the existing -fcaller-saves is in fact -fuse-caller-saves, Right, in the sense that a caller-save is the save of caller-save register, as opposed to short for a caller-save register, which is how it's used in -fuse-caller-save. which means that we should really find a better name for yours. :-) Agreed :) I've pondered the name -fipa-ira, but I rejected that earlier because that might suggest actual register allocation at the interprocedural scope, while this is only register allocation at the scope of a single procedure, taking some interprocedural information into account. Furthermore, it's not only ira that uses the interprocedural information. So, let's a generate a list of option names. -fuse-caller-save -fuse-call-clobbered -fprecise-call-clobbers -foptimize-call-clobbers -fprune-call-clobbers -freduce-call-clobbers -fcall-clobbers-ipa Any preferences, alternatives? Given the existing -fcaller-saves, I'd keep caller-saves in the name, so something along the lines of -foptimize-caller-saves or -fipa-caller-saves. Let's look at the effect of the option (after the recent fix for PR61605) on gcc.target/i386/fuse-calller-save.c: ... 
 foo:
 .LFB1:
 	.cfi_startproc
-	pushq	%rbx
-	.cfi_def_cfa_offset 16
-	.cfi_offset 3, -16
-	movl	%edi, %ebx
 	call	bar
-	addl	%ebx, %eax
-	popq	%rbx
-	.cfi_def_cfa_offset 8
+	addl	%edi, %eax
 	ret
 	.cfi_endproc
 .LFE1:
...

So, the effect is: instead of using a callee-save register, we use a caller-save register to store a value that's live over a call, without needing to add a caller-save, as would be normally the case.

If I see an option -foptimize-caller-saves, I'd expect the effect to be that without, there are some caller-saves and with, there are fewer. This is not the case in the diff above. Nevertheless, if we'd have a case where we already have caller-saves, that would be indeed the observed effect. I'm just trying to point out that the optimization does more than just removing caller-saves.

The optimization, at its core, can be regarded as removing superfluous clobbers from calls, and everything else is derived from that:
- if a caller-save register is not clobbered by a call, then there's no need for a caller-save before that call, so it's cheaper to use across that call than a callee-save register. (which explains what we see in the diff)
- if a caller-save register is live across a call, and is not clobbered by a call, then there's no need for a caller-save, and it can be removed. (which explains what we see in case we have an example where there are actual caller-saves without the optimization, and less so with the optimization)

I'm starting to lean towards -foptimize-call-clobbers or similar.

Well, it is really some form of IPA driven register allocation. Whether you want to call it -fipa-ra or not is another question - but if we had such an option then enabling it with that option would be fine.

Also users may have no idea what call vs callee clobbers are, but IPA RA may be a term that is more widely known (or at least google can come up with something for you).

So - I like -fipa-ra more.
I can't see the obvious difference between -foptimize-caller-saves and -foptimize-call-clobbers (for the latter -fipa-call-clobbers would be more to the point?) Richard. Thanks, - Tom
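For anyone who wants to reproduce the diff quoted above, here is a minimal sketch in the spirit of the i386 test. Only the names foo and bar come from the quoted assembly; the function bodies are an assumption, since the thread does not quote the test source. Compiling it at -O2 with and without the option under discussion and comparing the assembly for foo should show whether %ebx is still needed to keep the argument live across the call.

```c
/* Hypothetical reduction; the bodies are assumed, not copied from the
   testsuite file.  noinline keeps the call in place so that register
   use across it stays visible in the assembly.  */
static int __attribute__((noinline))
bar (int x)
{
  return x + 3;
}

int __attribute__((noinline))
foo (int y)
{
  /* y is live across the call to bar.  If the compiler knows that bar
     does not clobber the register holding y, no save is needed.  */
  return y + bar (y);
}
```

The interesting part is the generated code, not the result, so this is best inspected with -S.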
Re: [PATCHv4][Kasan] Allow to override Asan shadow offset from command line
On Fri, Oct 17, 2014 at 11:53:01AM +0400, Yury Gribov wrote: From 0225b7878bbb5b803814646d089824d016316fef Mon Sep 17 00:00:00 2001 From: Yury Gribov y.gri...@samsung.com Date: Thu, 16 Oct 2014 18:31:10 +0400 Subject: [PATCH 1/2] Add strtoull to libiberty. 2014-10-17 Yury Gribov y.gri...@samsung.com libiberty/ * strtoull.c: New file. Just putting a file in there won't magically make it be part of libiberty. Please read libiberty/README on how to add an optional file. strtoul is also optional as strtoull should be, so you can also just grep for strtoul in libiberty/* and add on similar spots. Not sure if there aren't extra steps to make strtoull prototype available in system.h, libiberty.h etc. for systems that don't have strtoull in their headers. CCing Ian as libiberty maintainer. Jakub
[C PATCH] Enable initializing statics with COMPOUND_LITERAL_EXPR in C99 (PR c/63567)
Building Linux kernel failed with 'error: initializer element is not constant', because they're initializing objects with static storage duration with (T){ ...} - and that isn't permitted in gnu99/gnu11. I think the Right Thing is to allow some latitude here and enable it even in gnu99/gnu11 unless -pedantic. In gnu89, this will work as before even with -pedantic. Bootstrapped/regtested on x86_64-linux, ok for trunk?

2014-10-17 Marek Polacek pola...@redhat.com
PR c/63567
* c-typeck.c (digest_init): Allow initializing objects with static storage duration with compound literals in non-pedantic mode.
* gcc.dg/pr63567.c: New test.

diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 5c0697a..8ddf368 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -6676,7 +6676,7 @@ digest_init (location_t init_loc, tree type, tree init, tree origtype,
     inside_init = convert (type, inside_init);
 
   if (require_constant
-      && (code == VECTOR_TYPE || !flag_isoc99)
+      && (code == VECTOR_TYPE || !pedantic || !flag_isoc99)
       && TREE_CODE (inside_init) == COMPOUND_LITERAL_EXPR)
     {
       /* As an extension, allow initializing objects with static storage
diff --git gcc/testsuite/gcc.dg/pr63567.c gcc/testsuite/gcc.dg/pr63567.c
index e69de29..cf942ef 100644
--- gcc/testsuite/gcc.dg/pr63567.c
+++ gcc/testsuite/gcc.dg/pr63567.c
@@ -0,0 +1,11 @@
+/* PR c/63567 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+/* Allow initializing objects with static storage duration with
+   compound literals even in non-pedantic gnu99/gnu11.  This is
+   being used in Linux kernel.  */
+
+struct T { int i; };
+struct S { struct T t; };
+static struct S s = (struct S) { .t = { 42 } };

Marek
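As a small illustration of the construct in question, the sketch below mirrors the new test and adds an accessor so the value can be observed. The accessor name get_s_i is made up for this example. The assumption is that the file is compiled in a GNU mode without -pedantic, which is the case the patch is meant to accept.

```c
struct T { int i; };
struct S { struct T t; };

/* An object with static storage duration initialized from a compound
   literal.  This relies on the GNU extension discussed in the patch.  */
static struct S s = (struct S) { .t = { 42 } };

/* Hypothetical accessor, only here so the value can be checked.  */
int
get_s_i (void)
{
  return s.t.i;
}
```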
[PATCH] Don't expand string/memory builtins if ASan is enabled.
Hi, this patch disables inline expansion of string/memory builtin functions if ASan is enabled. As described in my previous post (https://gcc.gnu.org/ml/gcc/2014-09/msg00020.html), this allows us to be sure that some dangerous builtins (strcpy, stpcpy, etc) will be handled correctly. Also, some redundant checks will be removed for builtin functions that are instrumented but later not inlined for some reason. The patch also changes the logic for updating asan_mem_ref_hash. I eliminated the memory ref access size from the hash computation, so all accesses to the same memory reference have the same hash. asan_mem_ref_hash is updated only if the new access size is greater than the saved one. I've done some performance testing (spec2006 v1.1) on x86_64-unknown-linux-gnu and attached results in test.res (sorry for this, I couldn't make my Thunderbird make a pretty table). Regtested / bootstrapped on x86_64-unknown-linux-gnu. Does this patch look sane? -Maxim $ ~/install/master-x86_64/bin/gcc -v Using built-in specs. COLLECT_GCC=/home/max/install/master-x86_64/bin/gcc COLLECT_LTO_WRAPPER=/home/max/install/master-x86_64/libexec/gcc/x86_64-unknown-linux-gnu/5.0.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: /home/max/workspace/downloads/gcc/configure --enable-multilib --enable-checking --target=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu --build=x86_64-unknown-linux-gnu --prefix=/home/max/install/master-x86_64 --disable-bootstrap --enable-languages=c,c++ Thread model: posix gcc version 5.0.0 20141014 (experimental) (GCC) Compile options: -O3 -fsanitize=address -static-libasan.
test            master  nobuiltin  % of slowdown
400.perlbench   1044    1050        0,5747
401.bzip2        682     676       -0,8798
403.gcc          497     495       -0,4024
429.mcf          488     489        0,2049
445.gobmk        723     724        0,1383
456.hmmer        783     750       -4,2146
458.sjeng        887     880       -0,7892
462.libquantum   330     323       -2,1212
464.h264ref     1108    1154        4,1516
471.omnetpp      545     559        2,5688
473.astar        490     480       -2,0408
483.xalancbmk    411     400       -2,6764
433.milc         517     509       -1,5474
444.namd         419     419        0,0000
450.soplex       310     299       -3,5484
453.povray       287     276       -3,8328
470.lbm          299     306        2,3411
482.sphinx3      777     804        3,4749

Geomean:        master  nobuiltin  % of increase
                 540     538       -0,50

gcc/ChangeLog:
2014-10-17 Max Ostapenko m.ostape...@partner.samsung.com
* asan.c (asan_mem_ref_hasher::hash): Remove MEM_REF access size from hash value construction. Call iterative_hash_expr instead of explicit hash building.
(asan_mem_ref_hasher::equal): Change condition.
(has_mem_ref_been_instrumented): Likewise.
(update_mem_ref_hash_table): Likewise.
(maybe_update_mem_ref_hash_table): New function.
(instrument_strlen_call): Removed.
(instrument_mem_region_access): Likewise.
(instrument_builtin_call): Call maybe_update_mem_ref_hash_table instead of instrument_mem_region_access.
* builtins.c (is_memory_builtin): New function.
(expand_builtin): Don't expand string/memory builtin functions if ASan is enabled.
* builtins.def: Add comment.

gcc/testsuite/ChangeLog:
2014-10-17 Max Ostapenko m.ostape...@partner.samsung.com
* c-c++-common/asan/no-redundant-instrumentation-1.c: Updated test.
* c-c++-common/asan/no-redundant-instrumentation-4.c: Likewise.
* c-c++-common/asan/no-redundant-instrumentation-5.c: Likewise.
* c-c++-common/asan/no-redundant-instrumentation-6.c: Likewise.
* c-c++-common/asan/no-redundant-instrumentation-7.c: Likewise.
* c-c++-common/asan/no-redundant-instrumentation-8.c: Likewise.
* c-c++-common/asan/no-redundant-instrumentation-2.c: Removed.
* c-c++-common/asan/no-redundant-instrumentation-9.c: Likewise.
* c-c++-common/asan/no-redundant-instrumentation-10.c: New test.
* c-c++-common/asan/no-redundant-instrumentation-11.c: Likewise. diff --git a/gcc/asan.c b/gcc/asan.c index 2a61a82..391f693 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -352,10 +352,7 @@ struct asan_mem_ref_hasher inline hashval_t asan_mem_ref_hasher::hash (const asan_mem_ref *mem_ref) { - inchash::hash hstate; - inchash::add_expr (mem_ref-start, hstate); - hstate.add_wide_int (mem_ref-access_size); - return hstate.end (); + return iterative_hash_expr (mem_ref-start, 0); } /* Compare two memory references. We accept the length of either @@ -365,8 +362,7 @@ inline bool asan_mem_ref_hasher::equal (const asan_mem_ref *m1, const asan_mem_ref *m2) { - return (m1-access_size == m2-access_size - operand_equal_p (m1-start, m2-start, 0)); + return operand_equal_p (m1-start, m2-start, 0); } static hash_tableasan_mem_ref_hasher *asan_mem_ref_ht; @@ -417,7 +413,8 @@ has_mem_ref_been_instrumented (tree ref, HOST_WIDE_INT access_size) asan_mem_ref r;
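The bookkeeping change described in the cover letter (key the table on the start address only, and grow the stored size when a larger access is seen) can be sketched outside of GCC roughly as follows. This is not the asan.c code: the fixed-size array, the linear search and the names record_ref and ref_is_covered are all stand-ins invented for this example, and the lookup rule (an access counts as covered when the saved size is at least as large) is my assumption about the intended condition.

```c
#include <stdbool.h>
#include <stddef.h>

/* One entry per start address; access_size is the largest size seen.  */
struct mem_ref { const void *start; long access_size; };

#define MAX_REFS 64
static struct mem_ref refs[MAX_REFS];
static size_t n_refs;

/* Look up by start address only; the size is not part of the key.  */
static struct mem_ref *
find_ref (const void *start)
{
  for (size_t i = 0; i < n_refs; i++)
    if (refs[i].start == start)
      return &refs[i];
  return NULL;
}

/* Assumed rule: covered if the saved size is at least as large.  */
bool
ref_is_covered (const void *start, long access_size)
{
  const struct mem_ref *r = find_ref (start);
  return r != NULL && r->access_size >= access_size;
}

/* Insert a new entry, or update only when the new size is greater.  */
void
record_ref (const void *start, long access_size)
{
  struct mem_ref *r = find_ref (start);
  if (r == NULL)
    {
      if (n_refs < MAX_REFS)
        refs[n_refs++] = (struct mem_ref) { start, access_size };
    }
  else if (access_size > r->access_size)
    r->access_size = access_size;
}
```

With the size left out of the key, a 4-byte and an 8-byte access to the same address share one entry instead of occupying two.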
[C PATCH] Make -Wno-implicit-int work in C99 mode
C99 mode warns about defaulting to int by default, but without the possibility to suppress the warning with -Wno-implicit-int. This is likely to arouse the ire of the users, especially with the new default. Therefore the following patch tweaks warn_implicit_int in such a way that -Wimplicit and -Wimplicit-int should work as intended (following the rule that more specific option takes precedence over the less specific). There should be no changes in GNU89 mode. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2014-10-17 Marek Polacek pola...@redhat.com c-family/ * c-opts.c (c_common_post_options): Set warn_implicit_int. * c.opt (Wimplicit-int): Initialize to -1. c/ * c-decl.c (grokdeclarator): Use OPT_Wimplicit_int unconditionally. (start_function): Use OPT_Wimplicit_int instead of 0. (store_parm_decls_oldstyle): Likewise. testsuite/ * gcc.dg/Wimplicit-int-1.c: New test. * gcc.dg/Wimplicit-int-2.c: New test. * gcc.dg/Wimplicit-int-3.c: New test. * gcc.dg/Wimplicit-int-4.c: New test. diff --git gcc/c-family/c-opts.c gcc/c-family/c-opts.c index eb078e3..448eb3e 100644 --- gcc/c-family/c-opts.c +++ gcc/c-family/c-opts.c @@ -864,6 +864,10 @@ c_common_post_options (const char **pfilename) if (warn_implicit_function_declaration == -1) warn_implicit_function_declaration = flag_isoc99; + /* -Wimplicit-int is enabled by default for C99. */ + if (warn_implicit_int == -1) +warn_implicit_int = flag_isoc99; + /* Declone C++ 'structors if -Os. 
*/ if (flag_declone_ctor_dtor == -1) flag_declone_ctor_dtor = optimize_size; diff --git gcc/c-family/c.opt gcc/c-family/c.opt index 72ac2ed..4f96cf8 100644 --- gcc/c-family/c.opt +++ gcc/c-family/c.opt @@ -488,7 +488,7 @@ C ObjC Var(warn_implicit_function_declaration) Init(-1) Warning LangEnabledBy(C Warn about implicit function declarations Wimplicit-int -C ObjC Var(warn_implicit_int) Warning LangEnabledBy(C ObjC,Wimplicit) +C ObjC Var(warn_implicit_int) Init(-1) Warning LangEnabledBy(C ObjC,Wimplicit) Warn when a declaration does not specify a type Wimport diff --git gcc/c/c-decl.c gcc/c/c-decl.c index 839c67b..b18da48 100644 --- gcc/c/c-decl.c +++ gcc/c/c-decl.c @@ -5330,11 +5330,11 @@ grokdeclarator (const struct c_declarator *declarator, else { if (name) - warn_defaults_to (loc, flag_isoc99 ? 0 : OPT_Wimplicit_int, + warn_defaults_to (loc, OPT_Wimplicit_int, type defaults to %int% in declaration of %qE, name); else - warn_defaults_to (loc, flag_isoc99 ? 0 : OPT_Wimplicit_int, + warn_defaults_to (loc, OPT_Wimplicit_int, type defaults to %int% in type name); } } @@ -8120,7 +8120,7 @@ start_function (struct c_declspecs *declspecs, struct c_declarator *declarator, } if (warn_about_return_type) -warn_defaults_to (loc, flag_isoc99 ? 0 +warn_defaults_to (loc, flag_isoc99 ? OPT_Wimplicit_int : (warn_return_type ? 
OPT_Wreturn_type : OPT_Wimplicit_int), return type defaults to %int%); @@ -8429,7 +8429,8 @@ store_parm_decls_oldstyle (tree fndecl, const struct c_arg_info *arg_info) if (flag_isoc99) pedwarn (DECL_SOURCE_LOCATION (decl), -0, type of %qD defaults to %int%, decl); +OPT_Wimplicit_int, type of %qD defaults to %int%, +decl); else warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wmissing_parameter_type, diff --git gcc/testsuite/gcc.dg/Wimplicit-int-1.c gcc/testsuite/gcc.dg/Wimplicit-int-1.c index e69de29..0c89caf 100644 --- gcc/testsuite/gcc.dg/Wimplicit-int-1.c +++ gcc/testsuite/gcc.dg/Wimplicit-int-1.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options } */ + +static l; /* { dg-warning type defaults to } */ + +foo (a) /* { dg-warning return type defaults to } */ +/* { dg-warning type of .a. defaults to .int. type { target *-*-* } 6 } */ +{ + auto p; /* { dg-warning type defaults to } */ + typedef bar; /* { dg-warning type defaults to } */ +} diff --git gcc/testsuite/gcc.dg/Wimplicit-int-2.c gcc/testsuite/gcc.dg/Wimplicit-int-2.c index e69de29..158b61c 100644 --- gcc/testsuite/gcc.dg/Wimplicit-int-2.c +++ gcc/testsuite/gcc.dg/Wimplicit-int-2.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options -pedantic-errors } */ + +static l; /* { dg-error type defaults to } */ + +foo (a) /* { dg-error return type defaults to } */ +/* { dg-error type of .a. defaults to .int. type { target *-*-* } 6 } */ +{ + auto p; /* { dg-error type defaults to } */ + typedef bar; /* { dg-error type defaults to } */ +} diff --git gcc/testsuite/gcc.dg/Wimplicit-int-3.c gcc/testsuite/gcc.dg/Wimplicit-int-3.c index e69de29..654ce73 100644 ---
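The Init(-1) idiom used in the patch boils down to a three-state flag: -1 means the user said nothing, 0 or 1 means an explicit choice. A minimal sketch of the resolution step, written as a free function rather than as the actual c_common_post_options code (the function name and the parameter passing are assumptions made for this example):

```c
/* warn_implicit_int: -1 if not given on the command line, otherwise
   0 (-Wno-implicit-int) or 1 (-Wimplicit-int).
   flag_isoc99: nonzero when compiling in C99 mode or later.  */
int
resolve_warn_implicit_int (int warn_implicit_int, int flag_isoc99)
{
  /* Only fall back to the language default when the user was silent.  */
  if (warn_implicit_int == -1)
    return flag_isoc99 ? 1 : 0;
  return warn_implicit_int;
}
```

An explicit -Wno-implicit-int therefore wins over the C99 default, which is the behaviour the cover letter asks for.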
[PATCH][match-and-simplify] Merge from trunk
Lightly tested, committed. Richard. 2014-10-17 Richard Biener rguent...@suse.de Merge from trunk r216316 through r216394.
Re: [PATCH][0/n] Merge from match-and-simplify
On Fri, 17 Oct 2014, Richard Biener wrote: On Fri, 17 Oct 2014, Ramana Radhakrishnan wrote: On Wed, Oct 15, 2014 at 5:29 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: On 15/10/14 14:00, Richard Biener wrote: Any comments and reviews welcome (I don't think that my maintainership covers enough to simply check this in without approval). Hi Richard, The match-and-simplify branch bootstrapped successfully on aarch64-none-linux-gnu FWIW. What about regression tests ? Note the branch isn't regression free on x86_64 either. The branch does more than I want to merge to trunk (and it also retains all folding code I added patterns for). I've gone farther there to explore whether it will end up working in the end and what kind of features the IL and the APIs need. I've pasted testsuite results on x86_64 below for rev. 216324 which is based on trunk rev. 216315 which unfortunately has lots of regressions on its own. This is why I want to restrict the effect of the machinery to fold (), fold_stmt () and tree-ssa-forwprop.c for the moment and merge individual patterns (well, maybe in small groups) separately to allow for easy bi-section. I suppose I should push the most visible change to trunk first, namely tree-ssa-forwprop.c folding all statements via fold_stmt after the merge. I suspect this alone can have some odd effects like the sub + cmp fusing. That would be sth like the patch attached below. 
Just finished testing this (with -m32 on x86_64), showing regressions in the testsuite like FAIL: gcc.dg/tree-ssa/slsr-19.c scan-tree-dump-times optimized * y 1 FAIL: gcc.dg/vect/bb-slp-27.c -flto -ffat-lto-objects scan-tree-dump-times slp2 basic block vectorized 1 FAIL: gcc.dg/vect/bb-slp-27.c scan-tree-dump-times slp2 basic block vectorized 1 FAIL: gcc.dg/vect/bb-slp-8b.c -flto -ffat-lto-objects scan-tree-dump-times slp2 basic block vectorized 1 FAIL: gcc.dg/vect/bb-slp-8b.c scan-tree-dump-times slp2 basic block vectorized 1 FAIL: gcc.dg/vect/slp-cond-3.c -flto -ffat-lto-objects scan-tree-dump-times vec t vectorizing stmts using SLP 1 FAIL: gcc.dg/vect/slp-cond-3.c scan-tree-dump-times vect vectorizing stmts usin g SLP 1 Bah. I suppose I need to investigate this (simply folding a stmt shouldn't cause any of the above... - with SLP it is probably operand canonicalization, but not sure). Richard. Richard. Index: gcc/tree-ssa-forwprop.c === --- gcc/tree-ssa-forwprop.c (revision 216258) +++ gcc/tree-ssa-forwprop.c (working copy) @@ -54,6 +54,8 @@ along with GCC; see the file COPYING3. #include tree-ssa-propagate.h #include tree-ssa-dom.h #include builtins.h +#include tree-cfgcleanup.h +#include tree-into-ssa.h /* This pass propagates the RHS of assignment statements into use sites of the LHS of the assignment. It's basically a specialized @@ -3586,6 +3588,8 @@ simplify_mult (gimple_stmt_iterator *gsi return false; } + + /* Main entry point for the forward propagation and statement combine optimizer. */ @@ -3626,6 +3630,40 @@ pass_forwprop::execute (function *fun) cfg_changed = false; + /* Combine stmts with the stmts defining their operands. Do that + in an order that guarantees visiting SSA defs before SSA uses. 
*/ + int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (fun)); + int postorder_num = inverted_post_order_compute (postorder); + for (int i = 0; i postorder_num; ++i) +{ + bb = BASIC_BLOCK_FOR_FN (fun, postorder[i]); + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); +!gsi_end_p (gsi); gsi_next (gsi)) + { + gimple stmt = gsi_stmt (gsi); + gimple orig_stmt = stmt; + + if (fold_stmt (gsi)) + { + stmt = gsi_stmt (gsi); + if (maybe_clean_or_replace_eh_stmt (orig_stmt, stmt) +gimple_purge_dead_eh_edges (bb)) + cfg_changed = true; + update_stmt (stmt); + } + } +} + free (postorder); + + /* ??? Code below doesn't expect non-renamed VOPs and the above + doesn't keep virtual operand form up-to-date. */ + if (cfg_changed) +{ + cleanup_tree_cfg (); + cfg_changed = false; +} + update_ssa (TODO_update_ssa_only_virtuals); + FOR_EACH_BB_FN (bb, fun) { gimple_stmt_iterator gsi; -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
Re: [PATCH 9/17] Initial KAsan support
On Fri, Oct 17, 2014 at 12:44:26PM +0400, Yury Gribov wrote: On 10/17/2014 11:43 AM, Eric Botcazou wrote: Well, that's a backport of ancient patch from trunk so all credits go there. And flag_sanitize is indeed handled differently from other compiler flags. Really curious to know why... I bet this was a typo but let's wait for Jakub's comments. The obvious change to replace the direct flag with opts->x_ preapproved. Jakub
[PATCH] PR63442 ICE with ubsan/overflow-int128.c test on AArch64
The cause appears to be a minor bug in prepare_cmp_insn. The last parameter, pmode, should match the mode of the first parameter, x. During the recursive call of prepare_cmp_insn, however, x has the mode returned by targetm.libgcc_cmp_return_mode (), while pmode is assigned word_mode. Generally this is OK, because the default libgcc_cmp_return_mode hook always returns word_mode, but AArch64 has a target-specific implementation which always returns SImode, so there is a mismatch which causes an ICE later. This minor issue has stayed hidden because nearly all other targets use the default hook, and this comparison path is rarely invoked. Thanks gcc/ PR target/63442 * optabs.c (prepare_cmp_insn): Use target hook libgcc_cmp_return_mode instead of word_mode. diff --git a/gcc/optabs.c b/gcc/optabs.c index d55a6bb..3073816 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -4264,7 +4264,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size, y = const0_rtx; } - *pmode = word_mode; + *pmode = targetm.libgcc_cmp_return_mode (); prepare_cmp_insn (x, y, comparison, NULL_RTX, unsignedp, methods, ptest, pmode); }
Re: [PATCH 0/17] KASan 4.9 backport
On Thu, Oct 16, 2014 at 12:34:35PM +0400, Yury Gribov wrote: Hi all, As discussed in https://gcc.gnu.org/ml/gcc/2014-09/msg00234.html , this patchset backports mainline patches necessary for Kernel ASan in GCC 4.9 (gcc-4_9-branch). The patchset consists of * Asan headers installation (1 patch) * __asan_loadN/__asan_storeN support (3 patches) * instrumentation with calls support (1 patch) * optimization of strlen instrumentation (1 patch) * Kasan support (3 patches) * move inlining to sanopt (1 patch) * bugfixes (7 patches) To my knowledge it does not contain any changes that would influence ABI of generated code. The code was bootstrapped and regtested on x64 (I only tested the net result, not each patch in isolation). I had a brief look at what ended up on the branch in the end, and from what I understand, the 4.9 libasan.so has __asan_report_store_n and __asan_report_load_n entry points, but does not have any __asan_loadN/__asan_reportN entrypoints (neither 1/2/4/8/16, nor variable). So, what the branch does seems to not match what the library provides. E.g. trying: struct S { long long a; long long b; char c; }; void foo (struct S *p, struct S *q) { *p = *q; } int bar (struct S *p) { return p->a; } on x86_64-linux, with -fsanitize=kernel-address -O2 I get expected __asan_storeN/__asan_loadN/__asan_load8 calls. With -fsanitize=address -O2, foo unexpectedly is not instrumented (IMHO it should be, it can use __asan_report_{store,load}_n) and bar uses (expectedly) __asan_report_load8. With -fsanitize=address -O2 --param asan-instrumentation-with-call-threshold=0 foo is again unexpectedly not instrumented, and bar is instrumented with __asan_load8, which looks wrong to me, because the library does not provide any such entry point.
Thus, IMHO the: if ((flag_sanitize & SANITIZE_USER_ADDRESS) != 0 && ((size_in_bytes & (size_in_bytes - 1)) != 0 || (unsigned HOST_WIDE_INT) size_in_bytes - 1 >= 16)) return; should be nuked from 4.9, we can do unaligned/non-{1,2,4,8,16} accesses fine. But, in execute_sanopt force !use_calls for (flag_sanitize & SANITIZE_USER_ADDRESS). Or were there any bugfixes needed for __asan_report_{store,load}_n on the library side? Jakub
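To make the size test in the quoted condition easier to read, here is a standalone sketch of the same arithmetic, inverted so that it returns true for the sizes that have a dedicated fixed-size check (1, 2, 4, 8, 16). The function name is invented for this example and plain long stands in for HOST_WIDE_INT; this is not the asan.c code itself.

```c
#include <stdbool.h>

/* True when size_in_bytes is one of 1, 2, 4, 8 or 16.  */
bool
has_fixed_size_check (long size_in_bytes)
{
  /* A power of two has a single bit set, so n & (n - 1) is zero.
     Zero also passes this test and is rejected by the range check.  */
  bool power_of_two = (size_in_bytes & (size_in_bytes - 1)) == 0;

  /* The unsigned subtraction wraps for 0 and for negative values, so
     a single comparison rejects them together with sizes above 16.  */
  bool in_range = (unsigned long) size_in_bytes - 1 < 16;

  return power_of_two && in_range;
}
```

Every size for which this returns false is what the mail calls an unaligned/non-{1,2,4,8,16} access.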
[v3 patch] partially fix testsuite/27_io/headers/cstdio/types_std.cc
testsuite/27_io/headers/cstdio/types_std.cc FAILs on dragonflybsd: /mnt/gcc-src/libstdc++-v3/testsuite/27_io/headers/cstdio/types_std.cc:25:13: error: aggregate 'FILE gnu::f' has incomplete type and cannot be defined /mnt/gcc-src/libstdc++-v3/testsuite/27_io/headers/cstdio/types_std.cc:26:13: error: aggregate 'FILE gnu::fpos_t' has incomplete type and cannot be defined These errors look correct to me, the C standard says that stdio.h declares FILE as an object type, but it doesn't say complete object type, so I think that's a bug in the test. I think there's another bug: #include cstdio namespace gnu { std::size_t s; std::FILE f; std::FILE fpos_t; } Surely that third declaration should be testing that fpos_t is a valid type, rather than declaring a variable of that name, so I'm committing the attached patch, which also fixes another fail on dragonflybsd. Tested x86_64-linux, committed to trunk. commit c388b3dc00c256c9cc5d8ae7b5bc37386bd14c58 Author: Jonathan Wakely jwak...@redhat.com Date: Fri Oct 17 12:57:31 2014 +0100 * testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc: Add dg-require-string-conversions. * testsuite/27_io/headers/cstdio/types_std.cc: Test for fpos_t. 
diff --git a/libstdc++-v3/testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc b/libstdc++-v3/testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc index 485a485..7fe6ff8 100644 --- a/libstdc++-v3/testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc +++ b/libstdc++-v3/testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc @@ -1,4 +1,5 @@ // { dg-options "-std=gnu++11" } +// { dg-require-string-conversions "" } // 2014-03-27 Rüdiger Sonderfeld // test the hexadecimal floating point inserters (facet num_put) diff --git a/libstdc++-v3/testsuite/27_io/headers/cstdio/types_std.cc b/libstdc++-v3/testsuite/27_io/headers/cstdio/types_std.cc index a359b87..a34663f 100644 --- a/libstdc++-v3/testsuite/27_io/headers/cstdio/types_std.cc +++ b/libstdc++-v3/testsuite/27_io/headers/cstdio/types_std.cc @@ -23,5 +23,5 @@ namespace gnu { std::size_t s; std::FILE f; - std::FILE fpos_t; + std::fpos_t p; }
Re: [libstdc++ PATCH] Implement the Library Fundamentals v1 variable templates for type traits
On 16/10/14 23:04 +0300, Ville Voutilainen wrote: Argh, needed to do uglification, and formatting fixes. Also renamed the Dummy types in tests to something a bit more descriptive. I'm not using the tr2 test types because the test types in these tests are amalgamations of multiple properties to keep the tests simple(r). Thanks! Tested and committed to trunk.
Re: [PATCH] Don't expand string/memory builtins if ASan is enabled.
On Fri, Oct 17, 2014 at 03:45:52PM +0400, Maxim Ostapenko wrote: Patch also changes logic in asan_mem_ref_hash updating. I eliminated memory ref access size from hash computing, so all accesses for same memory reference have the same hash. Updating of asan_mem_ref_hash occurs only if new access size is greater then saved one. I guess that is reasonable. -/* Instrument an access to a contiguous memory region that starts at - the address pointed to by BASE, over a length of LEN (expressed in - the sizeof (*BASE) bytes). ITER points to the instruction before - which the instrumentation instructions must be inserted. LOCATION - is the source location that the instrumentation instructions must - have. If IS_STORE is true, then the memory access is a store; - otherwise, it's a load. */ +/* Insert a memory reference into the hash table if access length +can be determined in compile time. */ ... If you don't expand the memops builtins inline, I'd expect you start with get_mem_refs_of_builtin_call and remove all the builtins you stop expanding specially (i.e. emit a libcall instead unconditionally) that are handled by libsanitizer (only a subset of them are apparently, perhaps something to fix) from there. There are builtins that must be kept instrumented (e.g. all the sync/atomic builtins). There are builtins which might need first additions to libsanitizer (e.g. I see no __*_chk functions in libsanitizer). +/* Returns TRUE if given FCODE corresponds to string or memory builtin function. + */ + +static inline bool +is_memory_builtin (enum built_in_function fcode) +{ + return fcode = BUILT_IN_STRSTR fcode = BUILT_IN_BCMP; This is too fragile and ugly. IMHO you should list (supposedly not in a special inline, but directly where you use it) in a switch all the builtins you don't want to expand. Jakub
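The suggestion to enumerate the affected builtins in a switch, rather than rely on a range of enum values, could look roughly like the sketch below. The enum here is a made-up stand-in for built_in_function, and which builtins belong in the list is exactly the open question in the review, so the cases shown are placeholders rather than a proposal.

```c
#include <stdbool.h>

/* Hypothetical stand-in for enum built_in_function.  */
enum builtin_code
{
  BI_MEMCPY,
  BI_MEMSET,
  BI_STRCPY,
  BI_STRLEN,
  BI_ATOMIC_LOAD,
  BI_MEMCPY_CHK
};

/* True for builtins that should be emitted as library calls instead
   of being expanded inline.  Every entry is listed by name, so adding
   or reordering enumerators cannot silently change the answer.  */
bool
skip_inline_expansion (enum builtin_code fcode)
{
  switch (fcode)
    {
    case BI_MEMCPY:
    case BI_MEMSET:
    case BI_STRCPY:
    case BI_STRLEN:
      return true;
    default:
      /* Atomics, __*_chk variants and everything else keep their
         current handling.  */
      return false;
    }
}
```

A range comparison depends on the order of entries in the definition file; the explicit list does not, which is presumably the fragility being pointed out.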
Re: [PATCH i386 AVX512] [75/n] Update vec_init.
Hello Jakub, On 15 Oct 18:23, Jakub Jelinek wrote: On Thu, Oct 09, 2014 at 04:13:25PM +0400, Kirill Yukhin wrote: --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -39821,6 +39823,9 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, enum machine_mode mode, goto widen; case V8HImode: + if (TARGET_AVX512VL) +return ix86_vector_duplicate_value (mode, target, val); + Shouldn't that be TARGET_AVX512VL TARGET_AVX512BW ? Nice catch! Fixed. if (TARGET_SSE2) { struct expand_vec_perm_d dperm; @@ -39851,6 +39856,9 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, enum machine_mode mode, goto widen; case V16QImode: + if (TARGET_AVX512VL) +return ix86_vector_duplicate_value (mode, target, val); + Ditto. Ditto. if (TARGET_SSE2) goto permute; goto widen; @@ -39880,16 +39888,19 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, enum machine_mode mode, case V16HImode: case V32QImode: - { - enum machine_mode hvmode = (mode == V16HImode ? V8HImode : V16QImode); - rtx x = gen_reg_rtx (hvmode); + if (TARGET_AVX512VL) +return ix86_vector_duplicate_value (mode, target, val); Ditto. Ditto. @@ -40503,6 +40515,42 @@ half: gen_rtx_VEC_CONCAT (mode, op0, op1))); return; +case V64QImode: + quarter_mode = V16QImode; + half_mode = V32QImode; + goto quarter; + +case V32HImode: + quarter_mode = V8HImode; + half_mode = V16HImode; + goto quarter; I wonder whether for these modes it can ever be beneficial to build them through interleaves/concatenations etc., if it wouldn't be better to build them by storing all values into memory and just reading it back. 
I've tried this example: #include immintrin.h unsigned char a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18, a19, a20, a21, a22, a23, a24, a25, a26, a27, a28, a29, a30, a31, a32, a33, a34, a35, a36, a37, a38, a39, a40, a41, a42, a43, a44, a45, a46, a47, a48, a49, a50, a51, a52, a53, a54, a55, a56, a57, a58, a59, a60, a61, a62, a63; __m512i foo () { return __extension__ (__m512i)(__v64qi){ a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18, a19, a20, a21, a22, a23, a24, a25, a26, a27, a28, a29, a30, a31, a32, a33, a34, a35, a36, a37, a38, a39, a40, a41, a42, a43, a44, a45, a46, a47, a48, a49, a50, a51, a52, a53, a54, a55, a56, a57, a58, a59, a60, a61, a62, a63 }; } w/ and w/o -mavx512bw (and always -mavx512f). When, this code works, we've got 127 lines of assembly to do this init. W/o AVX-512BW we've got 300 lines of code (mostly on GPRs, using sal, and etc.) Then I've looked into actual assembly w/ -mavx512bw and it turns out that no AVX-512BW insn were generated, only AVX-512F (and below). Fixed iterator. -(define_mode_iterator VI48F_512 [V16SI V16SF V8DI V8DF]) +(define_mode_iterator VI48F_I12_AVX512BW + [V16SI V16SF V8DI V8DF + (V32HI TARGET_AVX512BW) (V64QI TARGET_AVX512BW)]) What does the I12 stand for? Wasn't it meant to be VI48F_512_AVX512BW or I512? Actually, I am not awere of any name convention for iterators. As far as I understand, name [more or less] for vector mode should reflect: - Type family of the unit: float or int - Size of the unit: 1, 2, 4 etc. bytes - If possible, target predicates to enable certain modes in given iterator. The name is: - Vector (V) - I48F - contains both ints and floats of size 4 and 8 - I12 - contains ints of size 1 and 2 - AVX512BW - affected by the target (according to previous note - to be removed) Maybe it'll be better to name it: VF48_I1248? 
-- Thanks, K diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index baf0d3d..c3202c4 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -39760,6 +39760,8 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, enum machine_mode mode, case V8SFmode: case V8SImode: case V2DFmode: +case V64QImode: +case V32HImode: case V2DImode: case V4SFmode: case V4SImode: @@ -39790,6 +39792,9 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, enum machine_mode mode, goto widen; case V8HImode: + if (TARGET_AVX512VL TARGET_AVX512BW) +return ix86_vector_duplicate_value (mode, target, val); + if (TARGET_SSE2) { struct expand_vec_perm_d dperm; @@ -39820,6 +39825,9 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, enum machine_mode mode, goto widen; case V16QImode: + if (TARGET_AVX512VL TARGET_AVX512BW) +return ix86_vector_duplicate_value (mode, target, val); + if (TARGET_SSE2) goto permute; goto widen; @@ -39849,16 +39857,19 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, enum
Re: [PATCH][1/n] Merge from match-and-simplify, public API
On Wed, Oct 15, 2014 at 01:40:07PM +0200, Richard Biener wrote: 2014-10-15 Richard Biener rguent...@suse.de * gimple-fold.h (gimple_build): Declare various overloads. (gimple_simplify): Likewise. (gimple_convert): Re-implement in terms of gimple_build. * gimple-fold.c (gimple_convert): Remove. (gimple_build): New functions. --- 45,141 extern bool arith_code_with_undefined_signed_overflow (tree_code); extern gimple_seq rewrite_to_defined_overflow (gimple); ! /* gimple_build, functionally matching fold_buildN, outputs stmts !int the provided sequence, matching and simplifying them on-the-fly. !Supposed to replace force_gimple_operand (fold_buildN (...), ...). */ ! tree gimple_build (gimple_seq *, location_t, !enum tree_code, tree, tree, !tree (*valueize) (tree) = NULL); I find mixing prototypes with and without extern keyword weird, most of the prototypes in headers use extern, I think it would be cleaner to use it everywhere. *** gcc/gimple-fold.c.orig2014-10-14 15:49:30.634356179 +0200 --- gcc/gimple-fold.c 2014-10-15 13:02:08.158099055 +0200 *** along with GCC; see the file COPYING3. *** 56,61 --- 56,62 #include builtins.h #include output.h + /* Return true when DECL can be referenced from current unit. FROM_DECL (if non-null) specify constructor of variable DECL was taken from. We can get declarations that are not possible to reference for various Why the whitespace change? tree ! gimple_convert (gimple_seq *seq, location_t loc, tree type, tree op) { ! if (useless_type_conversion_p (type, TREE_TYPE (op))) ! return op; ! op = fold_convert_loc (loc, type, op); ! gimple_seq stmts = NULL; ! op = force_gimple_operand (op, stmts, true, NULL_TREE); ! gimple_seq_add_seq_without_update (seq, stmts); ! return op; } --- 5297,5487 return stmts; } ! ! 3 lines of vertical space too much? Otherwise, LGTM. Jakub
[committed] Fix ChangeLog entry
Hi all, I've committed the attached as obvious to fix up a whitespace issue in a patch I committed recently. This is r216399. Cheers, Kyrill
diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 9e35d69b..5a09e3e 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -136,8 +136,8 @@ * doc/install.texi (aarch64*-*-*): Document new --enable-fix-cortex-a53-835769 option. -2014-10-10 Kyrylo Tkachovkyrylo.tkac...@arm.com -Ramana Radhakrishnanramana.radhakrish...@arm.com +2014-10-10 Kyrylo Tkachov kyrylo.tkac...@arm.com +Ramana Radhakrishnan ramana.radhakrish...@arm.com * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define. (ADJUST_INSN_LENGTH): Define.
Re: [PATCH i386 AVX512] [80/n] Extend expand_sse2_mulvxdi3.
Hello Uroš, On 16 Oct 14:29, Uros Bizjak wrote: + if (mode == V4DImode) + emit_insn (gen_avx512dq_mulv4di3 (op0, op1, op2)); + else if (mode == V2DImode) + emit_insn (gen_avx512dq_mulv4di3 (op0, op1, op2)); Should this be v2di ? Right, copy-and-paste :( + } +} + else if (TARGET_XOP && mode == V2DImode) { /* op1: A,B,C,D, op2: E,F,G,H */ op1 = gen_lowpart (V4SImode, op1); Please use function pointers in the added part. Done. Updated patch at the bottom. Is it ok? -- Thanks, K diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index c3202c4..415e330 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -45667,7 +45667,22 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2) enum machine_mode mode = GET_MODE (op0); rtx t1, t2, t3, t4, t5, t6; - if (TARGET_XOP && mode == V2DImode) + if (TARGET_AVX512DQ) +{ + rtx (*gen) (rtx, rtx, rtx); + + if (mode == V8DImode) + gen = gen_avx512dq_mulv8di3; + else if (TARGET_AVX512VL) + { + if (mode == V4DImode) + gen = gen_avx512dq_mulv4di3; + else if (mode == V2DImode) + gen = gen_avx512dq_mulv2di3; + } + emit_insn (gen (op0, op1, op2)); +} + else if (TARGET_XOP && mode == V2DImode) { /* op1: A,B,C,D, op2: E,F,G,H */ op1 = gen_lowpart (V4SImode, op1);
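The function-pointer style requested in the review (choose a generator first, emit once afterwards) can be shown in isolation. Everything below is a hypothetical stand-in: the enum, the three generator stubs and the have_vl parameter only mimic the shape of the patch and are not GCC interfaces. The sketch starts the pointer at NULL so a caller can detect that no generator matched, which is one possible design choice and not something the patch itself does.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the vector modes used in the patch.  */
enum vec_mode { MODE_V2DI, MODE_V4DI, MODE_V8DI };

typedef int (*gen_fn) (void);

/* Stub generators; each returns its element count so that the choice
   can be observed.  */
static int gen_mul_v8di (void) { return 8; }
static int gen_mul_v4di (void) { return 4; }
static int gen_mul_v2di (void) { return 2; }

/* Pick a generator for MODE.  have_vl models an extra ISA feature
   that the narrower modes require.  Returns NULL if nothing fits.  */
gen_fn
select_mul_gen (enum vec_mode mode, int have_vl)
{
  gen_fn gen = NULL;

  if (mode == MODE_V8DI)
    gen = gen_mul_v8di;
  else if (have_vl)
    {
      if (mode == MODE_V4DI)
        gen = gen_mul_v4di;
      else if (mode == MODE_V2DI)
        gen = gen_mul_v2di;
    }
  return gen;
}
```

A caller would test the result against NULL and fall through to the next expansion strategy when it is not set.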
Re: [PATCHv4][Kasan] Allow to override Asan shadow offset from command line
Jakub Jelinek ja...@redhat.com writes:

> Not sure if there aren't extra steps to make strtoull prototype available in system.h, libiberty.h etc. for systems that don't have strtoull in their headers.

See the #if defined(HAVE_DECL_XXX) && !HAVE_DECL_XXX lines in include/libiberty.h. Although strtol is missing there as well.

Ian
[wwwdocs] Add recent C++ changes to gcc-5/changes.html
Committed to CVS. ? htdocs/gcc-5/.changes.html.swp Index: htdocs/gcc-5/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v retrieving revision 1.18 diff -u -r1.18 changes.html --- htdocs/gcc-5/changes.html 15 Oct 2014 11:21:52 - 1.18 +++ htdocs/gcc-5/changes.html 17 Oct 2014 12:28:16 - @@ -60,6 +60,31 @@ liFull support for a href=https://www.cilkplus.org/;Cilk Plus/a has been added to the GCC compiler. Cilk Plus is an extension to the C and C++ languages to support data and task parallelism./li +liNew preprocessor constructs, code__has_include/code +and code__has_include_next/code, to test the availability of headers +have been added.br/ +This demonstrates a way to include the header codelt;optionalgt;/code +only if it is available:br/ +blockquotepre +#ifdef __has_include +# if __has_include(lt;optionalgt;) +#include lt;optionalgt; +#define have_optional 1 +# elif __has_include(lt;experimental/optionalgt;) +#include lt;experimental/optionalgt; +#define have_optional 1 +#define experimental_optional +# else +#define have_optional 0 +# endif +#endif +/pre/blockquote +The header search paths for code__has_include_next/code +and code__has_include_next/code are equivalent to those +of the standard directive code#include/code +and the extension code#include_next/code respectively. +/li + /ul h3 id=cC/h3 @@ -93,6 +118,10 @@ liG++ and libstdc++ now implement the feature-testing macros from a href=http://isocpp.org/std/standing-documents/sd-6-sg10-feature-test-recommendations;Feature-testing recommendations for C++/a./li + liG++ now allows codetypename/code in a template template parameter. 
+blockquotepre + templatelt;templatelt;typenamegt; btypename/b Xgt; struct D; // OK +/pre/blockquote/li /ul h4 id=libstdcxxRuntime Library (libstdc++)/h4 @@ -100,11 +129,18 @@ lia href=https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2011; Improved support for C++11/a, including: ul +li A new implementation of codestd::list/code is enabled by + default, with an O(1) codesize()/code function; /li li codestd::deque/code meets the allocator-aware container requirements;/li li movable and swappable iostream classes;/li -li support for codestd::aligned_union/code;/li -li I/O manipulators codestd::hexfloat/code and -codestd::defaultfloat/code; +li support for codestd::align/code and + codestd::aligned_union/code;/li +li Type traits codestd::is_trivially_copyable/code, + codestd::is_trivially_constructible/code, + codestd::is_trivially_assignable/code etc.; +/li +li I/O manipulators codestd::put_time/code, + codestd::hexfloat/code and codestd::defaultfloat/code; /li /ul /li @@ -128,12 +164,13 @@ ul li Class codestd::experimental::any/code; /li li Function template codestd::experimental::apply/code; /li +li Variable templates for type traits; /li /ul /li liNew random number distributions codelogistic_distribution/code and codeuniform_on_sphere_distribution/code as extensions./li lia href=https://sourceware.org/gdb/current/onlinedocs/gdb/Xmethods-In-Python.html;GDB - Xmethods/a for codestd::vector/code and codestd::unique_ptr/code;/li + Xmethods/a for Sequence Containers and codestd::unique_ptr/code;/li /ul h3 id=fortranFortran/h3
Re: [PATCH i386 AVX512] [80/n] Extend expand_sse2_mulvxdi3.
On Fri, Oct 17, 2014 at 2:32 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
> Hello Uroš,
> On 16 Oct 14:29, Uros Bizjak wrote:
>>> + if (mode == V4DImode)
>>> +   emit_insn (gen_avx512dq_mulv4di3 (op0, op1, op2));
>>> + else if (mode == V2DImode)
>>> +   emit_insn (gen_avx512dq_mulv4di3 (op0, op1, op2));
>> Should this be v2di ?
> Right, copy-and-paste :(
>>> + }
>>> +}
>>> + else if (TARGET_XOP && mode == V2DImode)
>>> {
>>> /* op1: A,B,C,D, op2: E,F,G,H */
>>> op1 = gen_lowpart (V4SImode, op1);
>> Please use function pointers in the added part.
> Done. Updated patch in the bottom. Is it ok?

OK.

Thanks,
Uros.
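The "use function pointers" request in this thread amounts to choosing a generator once and calling it from a single place. A minimal, self-contained sketch of that shape follows. The enum, the `mul_*` functions and `pick_mul` are hypothetical stand-ins and not GCC's `gen_avx512dq_mulv*` generators; returning NULL for unhandled combinations is an assumption of this sketch, not something the patch does.

```c
#include <stddef.h>

/* Hypothetical stand-ins for vector modes; not GCC's enum machine_mode.  */
enum vec_mode { MODE_V2DI, MODE_V4DI, MODE_V8DI, MODE_OTHER };

/* Hypothetical "generators".  Each returns the number of 64-bit lanes
   it would handle, so a caller can see which one was selected.  */
typedef int (*mul_fn) (void);

static int mul_v2di (void) { return 2; }
static int mul_v4di (void) { return 4; }
static int mul_v8di (void) { return 8; }

/* Pick a generator for MODE, or NULL when none applies.  HAVE_VL models
   a target flag that gates the 128-bit and 256-bit variants.  */
mul_fn
pick_mul (enum vec_mode mode, int have_vl)
{
  if (mode == MODE_V8DI)
    return mul_v8di;
  if (have_vl)
    {
      if (mode == MODE_V4DI)
        return mul_v4di;
      if (mode == MODE_V2DI)
        return mul_v2di;
    }
  /* No match: the caller is expected to fall back to another path.  */
  return NULL;
}
```

A caller would test the result against NULL before invoking it, which keeps the single call site while leaving room for other expansion strategies.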
Re: [PATCH][1/n] Merge from match-and-simplify, infrastructure
On Wed, Oct 15, 2014 at 01:39:33PM +0200, Richard Biener wrote:
> 2014-10-15 Richard Biener rguent...@suse.de

Shouldn't Prathamesh be listed as co-author of the patch?

> + fprintf (f, "case SSA_NAME:\n");
> + fprintf (f, "{\n");
> + fprintf (f, "gimple def_stmt = SSA_NAME_DEF_STMT (%s);\n", kid_opname);

Etc.; so no attempt to indent the generated code, by tracking number of current indentation columns and translating that into a series of spaces or tabs or tabs+spaces? Other generated sources like insn-*.c usually are indented, at least to some extent.

> + char dest[32];
> + snprintf (dest, 32, "res_ops[%d]", j);
> + const char *optype
> + = get_operand_type (e->operation,

This seems to be indented too much.

> + type, e->expr_type,
> + j == 0
> + ? NULL : TREE_TYPE (res_ops[0]));

> + /* The genmatch generator progam. It reads from a pattern description
> +and outputs GIMPLE or GENERIC IL matching and simplification routines. */
> +
> + int
> + main(int argc, char **argv)

Formatting ;)

> + return 1;
> +
> + bool gimple = true;
> + bool verbose = false;
> + char *input = argv[argc-1];
> + for (int i = 1; i < argc - 1; ++i)
> + {
> + if (strcmp (argv[i], "-gimple") == 0)
> + gimple = true;
> + else if (strcmp (argv[i], "-generic") == 0)
> + gimple = false;
> + else if (strcmp (argv[i], "-v") == 0)
> + verbose = true;
> + else
> + {
> + fprintf (stderr, "Usage: genmatch [-gimple] [-generic] [-v] input\n");
> + return 1;
> + }
> + }

Wouldn't --gimple and --generic be nicer?

Otherwise, LGTM.

Jakub
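The indentation question raised in this review can be handled by passing a column count to every emitter. The sketch below shows one way to do that with `fprintf`'s `%*s` padding; `emit_line` and `emit_case` are hypothetical helpers written for illustration and are not taken from genmatch.

```c
#include <stdio.h>

/* Write one line of generated code to F, preceded by INDENT spaces.
   "%*s" applied to an empty string pads the output to INDENT columns.  */
void
emit_line (FILE *f, int indent, const char *text)
{
  fprintf (f, "%*s%s\n", indent, "", text);
}

/* Emit a small nested fragment for operand OPNAME, adding two columns
   for each nesting level relative to INDENT.  */
void
emit_case (FILE *f, int indent, const char *opname)
{
  emit_line (f, indent, "case SSA_NAME:");
  emit_line (f, indent + 2, "{");
  fprintf (f, "%*sgimple def_stmt = SSA_NAME_DEF_STMT (%s);\n",
           indent + 4, "", opname);
  emit_line (f, indent + 2, "}");
}
```

This sketch only emits spaces; turning each run of eight spaces into a tab would be a separate step.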
Re: [PATCH][3/n] Merge from match-and-simplify, first patterns and questions
On Wed, Oct 15, 2014 at 01:40:49PM +0200, Richard Biener wrote:
> This adds a bunch of simplifications with constant operands or ones that simplify to constants, such as a + 0, x * 1. It's a patch mainly to get a few questions answered for further pattern merges:
>
> - The branch uses multiple .pd files and includes them from match.pd trying to group related stuff together. It has become somewhat difficult to do that grouping in some sensible manner so I am not sure this is the best approach. Any opinion? We can simply put everything into match.pd and group visually by overall comments.

That would be probably my preference, unless match.pd grows too big.

> - Each pattern I will add will either be already implemented in some form in fold-const.c or tree-ssa-forwprop.c. Once the machinery is exercised from fold-const.c and tree-ssa-forwprop.c I can remove the duplicates at the same time I add a pattern. Should I do that?

I guess it depends, if the new pattern covers the old one well, sure, the STRIP_{,SIGN_}NOPS issues might be more important, TREE_SIDE_EFFECTS probably less important (those shouldn't be really constant expressions and thus there should be fewer users expecting stuff to be folded). In any case, we need to be prepared to cure some folding regressions if people report them and we find them desirable to be restored. Hopefully there won't be hundreds of such reports.

Jakub
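For readers unfamiliar with this kind of pattern, the identities mentioned in the thread (a + 0, x * 1) can be illustrated on a toy expression tree. The sketch below is not match.pd syntax and not GCC code; `struct expr`, `simplify` and the operator codes are hypothetical, and it assumes integer operands without side effects.

```c
#include <stddef.h>

/* Toy expression tree; not GCC's tree representation.  */
enum op { OP_CONST, OP_VAR, OP_PLUS, OP_MULT };

struct expr
{
  enum op code;
  long value;              /* meaningful when code == OP_CONST */
  struct expr *lhs, *rhs;  /* meaningful for OP_PLUS and OP_MULT */
};

/* Return nonzero if E is the integer constant V.  */
static int
is_const (const struct expr *e, long v)
{
  return e != NULL && e->code == OP_CONST && e->value == v;
}

/* Apply a few identities to the top node of E and return the simpler
   operand, or E itself when nothing matches.  Does not recurse.  */
struct expr *
simplify (struct expr *e)
{
  if (e->code == OP_PLUS && is_const (e->rhs, 0))
    return e->lhs;   /* a + 0 -> a */
  if (e->code == OP_MULT && is_const (e->rhs, 1))
    return e->lhs;   /* x * 1 -> x */
  if (e->code == OP_MULT && is_const (e->rhs, 0))
    return e->rhs;   /* x * 0 -> 0, valid only without side effects */
  return e;
}
```

The last rule is where a side-effects check would matter in a real folder, which is the kind of detail the reply above refers to.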
Re: [PATCH i386 AVX512] [75/n] Update vec_init.
On Fri, Oct 17, 2014 at 04:28:12PM +0400, Kirill Yukhin wrote:
>> I wonder whether for these modes it can ever be beneficial to build them through interleaves/concatenations etc., if it wouldn't be better to build them by storing all values into memory and just reading it back.
> I've tried this example:
> #include <immintrin.h>
> unsigned char a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18, a19, a20, a21, a22, a23, a24, a25, a26, a27, a28, a29, a30, a31, a32, a33, a34, a35, a36, a37, a38, a39, a40, a41, a42, a43, a44, a45, a46, a47, a48, a49, a50, a51, a52, a53, a54, a55, a56, a57, a58, a59, a60, a61, a62, a63;
> __m512i foo () { return __extension__ (__m512i)(__v64qi){ a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18, a19, a20, a21, a22, a23, a24, a25, a26, a27, a28, a29, a30, a31, a32, a33, a34, a35, a36, a37, a38, a39, a40, a41, a42, a43, a44, a45, a46, a47, a48, a49, a50, a51, a52, a53, a54, a55, a56, a57, a58, a59, a60, a61, a62, a63 }; }
> w/ and w/o -mavx512bw (and always -mavx512f). When this code works, we've got 127 lines of assembly to do this init. W/o AVX-512BW we've got 300 lines of code (mostly on GPRs, using sal, and etc.)
> Then I've looked into actual assembly w/ -mavx512bw and it turns out that no AVX-512BW insn were generated, only AVX-512F (and below). Fixed iterator.

Ok, if it is shorter than copying all those into memory and reading from memory, so be it.

>>> -(define_mode_iterator VI48F_512 [V16SI V16SF V8DI V8DF])
>>> +(define_mode_iterator VI48F_I12_AVX512BW
>>> + [V16SI V16SF V8DI V8DF
>>> + (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
>> What does the I12 stand for? Wasn't it meant to be VI48F_512_AVX512BW or I512?
> Actually, I am not aware of any name convention for iterators. As far as I understand, name [more or less] for vector mode should reflect:
> - Type family of the unit: float or int
> - Size of the unit: 1, 2, 4 etc. bytes
> - If possible, target predicates to enable certain modes in given iterator.
> The name is:
> - Vector (V)
> - I48F - contains both ints and floats of size 4 and 8
> - I12 - contains ints of size 1 and 2
> - AVX512BW - affected by the target (according to previous note - to be removed)
> Maybe it'll be better to name it: VF48_I1248?

I'll leave that to Uros, the patch is ok by me.

Jakub
Re: [PATCH i386 AVX512] [75/n] Update vec_init.
On Fri, Oct 17, 2014 at 2:57 PM, Jakub Jelinek ja...@redhat.com wrote:
> On Fri, Oct 17, 2014 at 04:28:12PM +0400, Kirill Yukhin wrote:
>>> I wonder whether for these modes it can ever be beneficial to build them through interleaves/concatenations etc., if it wouldn't be better to build them by storing all values into memory and just reading it back.
>> I've tried this example:
>> #include <immintrin.h>
>> unsigned char a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18, a19, a20, a21, a22, a23, a24, a25, a26, a27, a28, a29, a30, a31, a32, a33, a34, a35, a36, a37, a38, a39, a40, a41, a42, a43, a44, a45, a46, a47, a48, a49, a50, a51, a52, a53, a54, a55, a56, a57, a58, a59, a60, a61, a62, a63;
>> __m512i foo () { return __extension__ (__m512i)(__v64qi){ a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18, a19, a20, a21, a22, a23, a24, a25, a26, a27, a28, a29, a30, a31, a32, a33, a34, a35, a36, a37, a38, a39, a40, a41, a42, a43, a44, a45, a46, a47, a48, a49, a50, a51, a52, a53, a54, a55, a56, a57, a58, a59, a60, a61, a62, a63 }; }
>> w/ and w/o -mavx512bw (and always -mavx512f). When this code works, we've got 127 lines of assembly to do this init. W/o AVX-512BW we've got 300 lines of code (mostly on GPRs, using sal, and etc.)
>> Then I've looked into actual assembly w/ -mavx512bw and it turns out that no AVX-512BW insn were generated, only AVX-512F (and below). Fixed iterator.
> Ok, if it is shorter than copying all those into memory and reading from memory, so be it.
>>>> -(define_mode_iterator VI48F_512 [V16SI V16SF V8DI V8DF])
>>>> +(define_mode_iterator VI48F_I12_AVX512BW
>>>> + [V16SI V16SF V8DI V8DF
>>>> + (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
>>> What does the I12 stand for? Wasn't it meant to be VI48F_512_AVX512BW or I512?
>> Actually, I am not aware of any name convention for iterators. As far as I understand, name [more or less] for vector mode should reflect:
>> - Type family of the unit: float or int
>> - Size of the unit: 1, 2, 4 etc. bytes
>> - If possible, target predicates to enable certain modes in given iterator.
>> The name is:
>> - Vector (V)
>> - I48F - contains both ints and floats of size 4 and 8
>> - I12 - contains ints of size 1 and 2
>> - AVX512BW - affected by the target (according to previous note - to be removed)
>> Maybe it'll be better to name it: VF48_I1248?
> I'll leave that to Uros, the patch is ok by me.

Don't want to bikeshed, but VF48_I1248 looks somehow better to me. Anyway, the patch is OK even without this change.

Thanks,
Uros.
Re: [PATCH] Don't expand string/memory builtins if ASan is enabled.
On 10/17/2014 04:24 PM, Jakub Jelinek wrote:
>> +/* Returns TRUE if given FCODE corresponds to string or memory builtin function.
>> + */
>> +
>> +static inline bool
>> +is_memory_builtin (enum built_in_function fcode)
>> +{
>> + return fcode <= BUILT_IN_STRSTR && fcode >= BUILT_IN_BCMP;
> This is too fragile and ugly. IMHO you should list (supposedly not in a special inline, but directly where you use it) in a switch all the builtins you don't want to expand.

We already do this for BUILT_IN_ASAN_REPORT_LOAD1 ... BUILT_IN_ASAN_STOREN but I agree that this one is more ugly.

-Y
Re: [PATCH] Don't expand string/memory builtins if ASan is enabled.
On Fri, Oct 17, 2014 at 05:01:33PM +0400, Yury Gribov wrote:
> On 10/17/2014 04:24 PM, Jakub Jelinek wrote:
>>> +/* Returns TRUE if given FCODE corresponds to string or memory builtin function.
>>> + */
>>> +
>>> +static inline bool
>>> +is_memory_builtin (enum built_in_function fcode)
>>> +{
>>> + return fcode <= BUILT_IN_STRSTR && fcode >= BUILT_IN_BCMP;
>> This is too fragile and ugly. IMHO you should list (supposedly not in a special inline, but directly where you use it) in a switch all the builtins you don't want to expand.
> We already do this for BUILT_IN_ASAN_REPORT_LOAD1 ... BUILT_IN_ASAN_STOREN

I know, but it is still a coherent set of builtins for very similar purposes, many of them sorted by increasing size number.

> but I agree that this one is more ugly.

The memops builtins are just random bag of them, it is expected many people will add builtins into that range and outside of that range.

Jakub
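The fragility objection in this thread is easy to reproduce on a toy enum. In the sketch below every name is hypothetical (these are not GCC's `BUILT_IN_*` codes); the point is only that a range test silently includes anything later added between its endpoints, while an explicit switch does not.

```c
#include <stdbool.h>

/* Hypothetical builtin codes.  BI_PRINTF stands for an unrelated entry
   that ended up between the two range endpoints.  */
enum builtin_code
{
  BI_BCMP,
  BI_MEMCPY,
  BI_PRINTF,
  BI_MEMSET,
  BI_STRSTR,
  BI_ABS
};

/* Range test: true for every code between the endpoints, including
   the unrelated BI_PRINTF.  */
bool
in_memory_range (enum builtin_code c)
{
  return c >= BI_BCMP && c <= BI_STRSTR;
}

/* Explicit list: only the codes named here are accepted, so new enum
   entries never change the answer by accident.  */
bool
is_memory_builtin (enum builtin_code c)
{
  switch (c)
    {
    case BI_BCMP:
    case BI_MEMCPY:
    case BI_MEMSET:
    case BI_STRSTR:
      return true;
    default:
      return false;
    }
}
```

The explicit form is longer, but its behaviour no longer depends on the order of the enum.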
Re: [PATCH] Simple improvement for predicate computation in if-convert phase.
Jeff,

I prepared another patch that includes test-case as you requested. Below are answers on your questions.

> First, for the benefit of anyone trying to understand what you're doing, defining what cd equivalent means would be helpful.

I added the following comment to function: We call basic blocks bb1 and bb2 cd-equivalent if they are executed under the same condition. Is it sufficient?

> So, do you have a case where the dominated_by_p test above is true and is_predicated(bb) returns true as well? I think this part of the change is largely responsible for the hack you're doing with having the function scoped static variable join_bb.

I don't have such test-case and I assume that if bb is always executed, it is not predicated. I also deleted join_bb in my changes.

Is it OK for trunk now?

Thanks.
Yuri.

2014-10-17 Yuri Rumyantsev ysrum...@gmail.com

gcc/ChangeLog
	* tree-if-conv.c (add_to_predicate_list): Check unconditionally that bb is always executed to early exit. Use predicate of cd-equivalent block for join blocks if it exists.
	(if_convertible_loop_p_1): Recompute POST_DOMINATOR tree.
	(tree_if_conversion): Free post-dominance information.

gcc/testsuite/ChangeLog
	* gcc/dg/tree-ssa/ifc-cd.c: New test.

2014-10-17 1:16 GMT+04:00 Jeff Law l...@redhat.com:
> On 10/16/14 05:52, Yuri Rumyantsev wrote:
>> Hi All,
>> Here is a simple enhancement for predicate computation in if-convert phase: We use notion of cd equivalence to get simpler predicate for join block, e.g. if join block has 2 predecessors with predicates p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of p1 & p2 | p1 & !p2.
>> Bootstrap and regression testing did not show any new failures. Is it OK for trunk?
>> gcc/ChangeLog
>> 2014-10-16 Yuri Rumyantsev ysrum...@gmail.com
>> * tree-if-conv.c (add_to_predicate_list): Check unconditionally that bb is always executed to early exit. Use predicate of cd-equivalent block for join blocks if it exists.
>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree.
>> (tree_if_conversion): Free post-dominance information.
> First, for the benefit of anyone trying to understand what you're doing, defining what cd equivalent means would be helpful.
>> if-conv.patch
>> Index: tree-if-conv.c
>> ===
>> --- tree-if-conv.c (revision 216217)
>> +++ tree-if-conv.c (working copy)
>> @@ -396,25 +396,51 @@
>> }
>>
>> /* Add condition NC to the predicate list of basic block BB. LOOP is
>> - the loop to be if-converted. */
>> + the loop to be if-converted. Use predicate of cd-equivalent block
>> + for join bb if it exists. */
>>
>> static inline void
>> add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>> {
>> tree bc, *tp;
>> + basic_block dom_bb;
>> + static basic_block join_bb = NULL;
>>
>> if (is_true_predicate (nc))
>> return;
>>
>> - if (!is_predicated (bb))
>> + /* If dominance tells us this basic block is always executed,
>> + don't record any predicates for it. */
>> + if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>> +return;
> So, do you have a case where the dominated_by_p test above is true and is_predicated(bb) returns true as well? I think this part of the change is largely responsible for the hack you're doing with having the function scoped static variable join_bb.
>> +
>> + /* If predicate has been already set up for given bb using cd-equivalent
>> + block predicate, simply escape. */
>> + if (join_bb == bb)
>> +return;
> I *really* dislike the state you're carrying around via join_bb. ISTM that if you compute that there's an equivalence, then you just set the predicate for the equivalent block and the right things would have happened if you had not changed the test above.
> You also need a testcase. It doesn't have to be extensive, but at least some basic smoke test to verify basic operation of this code. It's perfectly fine to scan the debugging dumps for debug output.
> jeff

if-conv.patch.new
Description: Binary data
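The simplification this patch is after rests on a plain Boolean identity: when a join block's two incoming predicates are p1 AND p2 and p1 AND NOT p2, their disjunction is just p1. The sketch below checks that identity over all inputs; `join_predicate` and `join_equals_p1` are hypothetical names for illustration and have no counterpart in tree-if-conv.c.

```c
#include <stdbool.h>

/* Predicate of a join block, formed as the OR of the predicates on its
   two incoming edges.  */
bool
join_predicate (bool p1, bool p2)
{
  return (p1 && p2) || (p1 && !p2);
}

/* Return true when the join predicate equals P1 for every combination
   of inputs, i.e. when the join block can reuse the predicate of the
   block that tested p1.  */
bool
join_equals_p1 (void)
{
  for (int p1 = 0; p1 <= 1; p1++)
    for (int p2 = 0; p2 <= 1; p2++)
      if (join_predicate (p1 != 0, p2 != 0) != (p1 != 0))
        return false;
  return true;
}
```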
Re: [PATCH] support ggc hash_map and hash_set
Sorry, somehow I missed this email. Yes, that appears to have fixed it! Thank you very much, Alan Trevor Saunders wrote: On Tue, Sep 09, 2014 at 03:37:26PM +0100, Alan Lawrence wrote: Following this, we're seeing ICEs in tests in gcc.dg/pch.exp and g++.dg/pch.exp, with cross-builds (hosted on x86_64) targetting bare metal AArch64 and ARM (aarch64-none-elf, aarch64_be-none-elf and arm-none-eabi; I haven't tested armeb-none-eabi; builds targeting linux are OK), for *release builds only*. Could you test the below patch? it seems to work for me, but I'm not familiar with testing cross compilers. diff --git a/gcc/hash-table.h b/gcc/hash-table.h index c2a68fd..028b7de 100644 --- a/gcc/hash-table.h +++ b/gcc/hash-table.h @@ -1598,8 +1598,9 @@ templatetypename D static void gt_pch_nx (hash_tableD *h) { - gcc_checking_assert (gt_pch_note_object (h-m_entries, h, - hashtab_entry_note_pointersD)); + bool success ATTRIBUTE_UNUSED += gt_pch_note_object (h-m_entries, h, hashtab_entry_note_pointersD); + gcc_checking_assert (success); for (size_t i = 0; i h-m_size; i++) { if (hash_tableD::is_empty (h-m_entries[i]) Trev Affected tests: gcc.dg/pch.exp: all variants of ./except-1.h ./inline-3.h gcc.dg/pch/except-1.c gcc.dg/pch/inline-3.c g++.dg: all variants of ./array-1.H ./empty.H ./externc-1.H ./local-1.H ./pch.H ./static-1.H ./system-1.H ./system-2.H ./template-1.H ./uninst.H ./wchar-1.H (These then lead to failures of g++.dg/pch/{array-1,...}.C and corresponding assembly comparisons). 
Sample log: Executing on host: build/obj/gcc2/gcc/testsuite/g++/../../xg++ -Bbuild/obj/gcc2/gcc/testsuite/g++/../../ ./template-1.H -fno-diagnostics-show-caret -fdiagnostics-color=never -nostdinc++ -Ibuild/obj/gcc2/aarch64-none-elf/ilp32/libstdc++-v3/include/aarch64-none-elf -Ibuild/obj/gcc2/aarch64-none-elf/ilp32/libstdc++-v3/include -Isrc/gcc/libstdc++-v3/libsupc++ -Isrc/gcc/libstdc++-v3/include/backward -Isrc/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -O2 -g -specs=aem-ve.specs-mabi=ilp32 -mcmodel=small -o template-1.H.gch (timeout = 300) spawn build/obj/gcc2/gcc/testsuite/g++/../../xg++ -Bbuild/obj/gcc2/gcc/testsuite/g++/../../ ./template-1.H -fno-diagnostics-show-caret -fdiagnostics-color=never -nostdinc++ -Ibuild/obj/gcc2/aarch64-none-elf/ilp32/libstdc++-v3/include/aarch64-none-elf -Ibuild/obj/gcc2/aarch64-none-elf/ilp32/libstdc++-v3/include -Isrc/gcc/libstdc++-v3/libsupc++ -Isrc/gcc/libstdc++-v3/include/backward -Isrc/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -O2 -g -specs=aem-ve.specs -mabi=ilp32 -mcmodel=small -o template-1.H.gch ./template-1.H:5:2: internal compiler error: in relocate_ptrs, at ggc-common.c:435 Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. compiler exited with status 1 output is: ./array-1.H:4:2: internal compiler error: in relocate_ptrs, at ggc-common.c:435 Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. FAIL: ./array-1.H -g (internal compiler error) FAIL: ./array-1.H -g (test for excess errors) Excess errors: ./array-1.H:4:2: internal compiler error: in relocate_ptrs, at ggc-common.c:435 --Alan tsaund...@mozilla.com wrote: From: Trevor Saunders tsaund...@mozilla.com Hi, There are still some issues to make this work really nicely, but this part is probably good enough its worth reviewing. 
For one thing you can't use ggc hash_map or set in front ends with some types or gengtype will decide to put the overloads of the marking routines it provides in a front end file instead of the one it choose before breaking other front ends. However that seems to be an unrelated issue you can trigger it without using hash_map/set, so we might as well solve it separetly. I had to have the entry marking functions for set deligate to the traits class because gcc 4.9.1 issues clearly bogus errors if you inline the code from the traits implementation. We may well want to make map work the same way at some point to enable some of the special GTY attributes like if_marked, but it doesn't seem to be necessary right now. bootstrapped + regtested without regressions on x86_64-unknown-linux-gnu, ok? Trev gcc/ChangeLog: 2014-09-01 Trevor Saunders tsaund...@mozilla.com * alloc-pool.c: Include coretypes.h. * cgraph.h, dbxout.c, dwarf2out.c, except.c, except.h, function.c, function.h, symtab.c, tree-cfg.c, tree-eh.c: Use hash_map and hash_set instead of htab. * ggc-page.c (in_gc): New variable. (ggc_free): Do nothing if a collection is taking place. (ggc_collect): Set in_gc appropriately. * ggc.h (gt_ggc_mx(const char *)): New function. (gt_pch_nx(const char *)): Likewise. (gt_ggc_mx(int)): Likewise. (gt_pch_nx(int)): Likewise. * hash-map.h (hash_map::hash_entry::ggc_mx):
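The release-build failure discussed in this thread is the classic problem of putting a call with side effects inside an assertion that is compiled out. The sketch below models that situation in isolation. `CHECKING_ASSERT`, `note_object` and the `walk_*` functions are hypothetical, and the macro only imitates a checking assert in a build where checking is disabled.

```c
#include <stdbool.h>

/* Model of a checking assert in a build with checking disabled: the
   expression is type-checked but never evaluated.  */
#define CHECKING_ASSERT(expr) ((void) (0 && (expr)))

/* Counts how many objects have been "noted".  */
static int notes;

/* Stand-in for a registration call that must always run.  */
bool
note_object (void)
{
  notes++;
  return true;
}

/* Buggy shape: the call lives inside the assert, so it disappears
   together with the assert.  */
void
walk_buggy (void)
{
  CHECKING_ASSERT (note_object ());
}

/* Fixed shape: run the call unconditionally, assert on its result.  */
void
walk_fixed (void)
{
  bool success = note_object ();
  CHECKING_ASSERT (success);
}

/* Accessor so callers can observe the counter.  */
int
notes_so_far (void)
{
  return notes;
}
```

With the macro as defined here, `walk_buggy` never increments the counter while `walk_fixed` does, which mirrors why only release builds were affected.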
[PATCH] Fix for PR63569
Hello. Following patch fixes PR63569. Bootstrap executed on ppc64-linux and no regression seen on x86_64-pc-linux. Ready for trunk? Thank you, Martin gcc/testsuite/ChangeLog: 2014-10-17 Martin Liska mli...@suse.cz * gcc.dg/ipa/ipa-icf-31.c: New test. gcc/ChangeLog: 2014-10-17 Martin Liska mli...@suse.cz * ipa-icf-gimple.c (func_checker::compare_volatility): New function. (func_checker::compare_gimple_call): Volatility check added. (func_checker::compare_gimple_assign): Likewise. * ipa-icf-gimple.h: New function. diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c index 792a3e4..1b9ee85 100644 --- a/gcc/ipa-icf-gimple.c +++ b/gcc/ipa-icf-gimple.c @@ -452,6 +452,17 @@ func_checker::compare_tree_list_operand (tree t1, tree t2) return true; } +/* Compares if both trees T1 and T2 have equal volatility. */ + +bool +func_checker::compare_volatility (tree t1, tree t2) +{ + if (t1 t2) +return TREE_THIS_VOLATILE (t1) == TREE_THIS_VOLATILE (t2); + + return !(t1 || t2); +} + /* Verifies that trees T1 and T2, representing function declarations are equivalent from perspective of ICF. */ @@ -663,6 +674,9 @@ func_checker::compare_gimple_call (gimple s1, gimple s2) t1 = gimple_get_lhs (s1); t2 = gimple_get_lhs (s2); + if (!compare_volatility (t1, t2)) +return return_false_with_msg (different volatility for call statement); + return compare_operand (t1, t2); } @@ -696,8 +710,11 @@ func_checker::compare_gimple_assign (gimple s1, gimple s2) if (!compare_operand (arg1, arg2)) return false; -} + if (!compare_volatility (arg1, arg2)) + return return_false_with_msg (different volatility for assignment + statement); +} return true; } diff --git a/gcc/ipa-icf-gimple.h b/gcc/ipa-icf-gimple.h index 8487a2a..b791c21 100644 --- a/gcc/ipa-icf-gimple.h +++ b/gcc/ipa-icf-gimple.h @@ -209,6 +209,10 @@ public: two trees are semantically equivalent. 
*/ bool compare_tree_list_operand (tree t1, tree t2); + /* Compares two tree list operands T1 and T2 and returns true if these + two trees are semantically equivalent. */ + bool compare_volatility (tree t1, tree t2); + /* Verifies that trees T1 and T2, representing function declarations are equivalent from perspective of ICF. */ bool compare_function_decl (tree t1, tree t2); diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-icf-31.c b/gcc/testsuite/gcc.dg/ipa/ipa-icf-31.c new file mode 100644 index 000..e70d72d --- /dev/null +++ b/gcc/testsuite/gcc.dg/ipa/ipa-icf-31.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options -fipa-icf -fdump-ipa-icf-details } */ + + +static int f(int t, int *a) __attribute__((noinline)); + +static int g(int t, volatile int *a) __attribute__((noinline)); +static int g(int t, volatile int *a) +{ + int i; + int tt = 0; + for(i=0;it;i++) +tt += *a; + return tt; +} +static int f(int t, int *a) +{ + int i; + int tt = 0; + for(i=0;it;i++) +tt += *a; + return tt; +} + + +int main() +{ + return 0; +} + +/* { dg-final { scan-ipa-dump Equal symbols: 0 icf } } */ +/* { dg-final { scan-ipa-dump different volatility for assignment statement icf } } */ +/* { dg-final { cleanup-ipa-dump icf } } */
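The logic of the new volatility comparison can be read in isolation from GCC's tree type. The following sketch restates it on a hypothetical `struct node` with a single flag: two present operands match when their flags agree, and otherwise they match only if both are absent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operand with a volatility flag; not GCC's tree.  */
struct node
{
  bool is_volatile;
};

/* Return true if T1 and T2 agree on volatility.  When only one of them
   is present the operands are treated as different.  */
bool
same_volatility (const struct node *t1, const struct node *t2)
{
  if (t1 && t2)
    return t1->is_volatile == t2->is_volatile;

  /* True only when both are NULL.  */
  return !(t1 || t2);
}
```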
Re: [PATCH 0/17] KASan 4.9 backport
On 10/17/2014 04:12 PM, Jakub Jelinek wrote: I had a brief look at what ended up on the branch in the end, and from what I understand, the 4.9 libasan.so has __asan_report_store_n and __asan_report_load_n entry points, but does not have any __asan_loadN/__asan_reportN entrypoints (neither 1/2/4/8/16, nor variable). So, what the branch does seems to not match what the library provides. I agree, __asan_report_loadN is indeed there and misalign tests seem to pass fine. Probably I should have examined 4.9 libasan closer. With -fsanitize=address -O2 --param asan-instrumentation-with-call-threshold=0 foo is again unexpectedly not instrumented, and bar is instrumented with __asan_load8, which looks wrong to me, because the library does not provide any such entry point. By default asan-instrumentation-with-call-threshold is INT_MAX which means that compiler will never generate __asan_load*/__asan_store* calls unless forced by the user (e.g. for Kasan). But, in execute_sanopt force !use_calls for (flag_sanitize & SANITIZE_USER_ADDRESS). Do you think above limitation is not enough? Thus, IMHO the: if ((flag_sanitize & SANITIZE_USER_ADDRESS) != 0 && ((size_in_bytes & (size_in_bytes - 1)) != 0 || (unsigned HOST_WIDE_INT) size_in_bytes - 1 >= 16)) return; should be nuked from 4.9, we can do unaligned/non-{1,2,4,8,16} accesses fine. Right. I'd also import misalign tests. Or were there any bugfixes needed for __asan_report_{store,load}_n on the library side? I don't think so. -Y
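The size test discussed in this thread rejects any access whose size is not one of 1, 2, 4, 8 or 16 bytes. A standalone sketch of the same arithmetic is shown below; `is_simple_access_size` is a hypothetical helper written for illustration and uses `long` where the GCC code uses `HOST_WIDE_INT`.

```c
#include <stdbool.h>

/* Return true when SIZE is one of 1, 2, 4, 8 or 16.  A positive value
   is a power of two exactly when it shares no bits with SIZE - 1, and
   the unsigned comparison bounds it by 16.  */
bool
is_simple_access_size (long size)
{
  if (size <= 0)
    return false;

  bool power_of_two = (size & (size - 1)) == 0;
  bool at_most_16 = (unsigned long) size - 1 < 16;

  return power_of_two && at_most_16;
}
```

The condition quoted in the thread is the negation of this result: it returns early for every size this helper rejects.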
Re: [PATCH 0/17] KASan 4.9 backport
On Fri, Oct 17, 2014 at 05:45:17PM +0400, Yury Gribov wrote: On 10/17/2014 04:12 PM, Jakub Jelinek wrote: I had a brief look at what ended up on the branch in the end, and from what I understand, the 4.9 libasan.so has __asan_report_store_n and __asan_report_load_n entry points, but does not have any __asan_loadN/__asan_reportN entrypoints (neither 1/2/4/8/16, nor variable). So, what the branch does seems to not match what the library provides. I agree, __asan_report_loadN is indeed there and misalign tests seem to pass fine. Probably I should have examined 4.9 libasan closer. With -fsanitize=address -O2 --param asan-instrumentation-with-call-threshold=0 foo is again unexpectedly not instrumented, and bar is instrumented with __asan_load8, which looks wrong to me, because the library does not provide any such entry point. By default asan-instrumentation-with-call-threshold is INT_MAX which means that compiler will never generate __asan_load*/__asan_store* calls unless forced by the user (e.g. for Kasan). But, in execute_sanopt force !use_calls for (flag_sanitize SANITIZE_USER_ADDRESS). Do you think above limitation is not enough? Yeah, even if the default is that it doesn't make the non-existing calls, anyone who uses the parameter will get code that doesn't link. Thus, IMHO the: if ((flag_sanitize SANITIZE_USER_ADDRESS) != 0 ((size_in_bytes (size_in_bytes - 1)) != 0 || (unsigned HOST_WIDE_INT) size_in_bytes - 1 = 16)) return; should be nuked from 4.9, we can do unaligned/non-{1,2,4,8,16} accesses fine. Right. I'd also import misalign tests. Or were there any bugfixes needed for __asan_report_{store,load}_n on the library side? I don't think so. So, what about this? Just checked that with make -k check-g{cc,++} RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} asan.exp tsan.exp ubsan.exp' so far. Plus if you add misalign tests... 
2014-10-17 Jakub Jelinek ja...@redhat.com * asan.c (instrument_derefs): Allow instrumentation of odd-sized accesses even for -fsanitize=address. (execute_sanopt): Only allow use_calls for -fsanitize=kernel-address. * c-c++-common/asan/instrument-with-calls-1.c: Add -fno-sanitize=address -fsanitize=kernel-address to dg-options. * c-c++-common/asan/instrument-with-calls-2.c: Likewise. --- gcc/asan.c.jj 2014-10-17 12:51:27.0 +0200 +++ gcc/asan.c 2014-10-17 15:21:29.921495259 +0200 @@ -1707,10 +1707,6 @@ instrument_derefs (gimple_stmt_iterator size_in_bytes = int_size_in_bytes (type); if (size_in_bytes = 0) return; - if ((flag_sanitize SANITIZE_USER_ADDRESS) != 0 - ((size_in_bytes (size_in_bytes - 1)) != 0 - || (unsigned HOST_WIDE_INT) size_in_bytes - 1 = 16)) -return; HOST_WIDE_INT bitsize, bitpos; tree offset; @@ -2780,8 +2776,10 @@ execute_sanopt (void) } } - bool use_calls = ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD INT_MAX - asan_num_accesses = ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD; + bool use_calls += ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD INT_MAX + (flag_sanitize SANITIZE_KERNEL_ADDRESS) + asan_num_accesses = ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD; FOR_EACH_BB_FN (bb, cfun) { --- gcc/testsuite/c-c++-common/asan/instrument-with-calls-1.c.jj 2014-10-17 12:51:27.0 +0200 +++ gcc/testsuite/c-c++-common/asan/instrument-with-calls-1.c 2014-10-17 15:34:06.679627168 +0200 @@ -1,5 +1,5 @@ /* { dg-do assemble } */ -/* { dg-options --param asan-instrumentation-with-call-threshold=0 -save-temps } */ +/* { dg-options -fno-sanitize=address -fsanitize=kernel-address --param asan-instrumentation-with-call-threshold=0 -save-temps } */ void f(char *a, int *b) { *b = *a; --- gcc/testsuite/c-c++-common/asan/instrument-with-calls-2.c.jj 2014-10-17 12:51:27.0 +0200 +++ gcc/testsuite/c-c++-common/asan/instrument-with-calls-2.c 2014-10-17 15:34:15.569472032 +0200 @@ -1,5 +1,5 @@ /* { dg-do assemble } */ -/* { dg-options --param asan-instrumentation-with-call-threshold=1 
-save-temps } */ +/* { dg-options -fno-sanitize=address -fsanitize=kernel-address --param asan-instrumentation-with-call-threshold=1 -save-temps } */ int x; Jakub
Re: [PATCH 7/n] OpenMP 4.0 offloading infrastructure: testsuite
On 15 Oct 17:35, Jakub Jelinek wrote: But we do want to test them with host fallback, which those lines preclude. Just a single dg-require-effective-target offload_device guarded test (which there necessarily is, e.g. the 57.* ones) should be sufficient for your purposes (if you want to diff UNSUPPORTED vs. PASS tests between runs). Right now the result of that test turns all tests in the directory into UNSUPPORTED, with the removals you'd just turn a single one or a dozen or how many would really need it. The fact that the tcl offload_device check succeeded doesn't mean that all tests don't use host fallback anyway. Additionally to a handful of dg-require-effective-target offload_device you could have one which just prints something on stdout depending on if it is offloaded or not, you can grep for the output of that in your libgomp.log. Jakub Agreed. Patch is fixed and retested. -- Ilya --- diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp index 094e5ed..071e22f 100644 --- a/libgomp/testsuite/lib/libgomp.exp +++ b/libgomp/testsuite/lib/libgomp.exp @@ -239,3 +239,17 @@ proc libgomp_option_proc { option } { return 0 } } + +# Return 1 if offload device is available. +proc check_effective_target_offload_device { } { +return [check_runtime_nocache offload_device_available_ { + #include omp.h + int main () + { + int a; + #pragma omp target map(from: a) + a = omp_is_initial_device (); + return a; + } +} ] +} diff --git a/libgomp/testsuite/libgomp.c++/c++.exp b/libgomp/testsuite/libgomp.c++/c++.exp index a9cf41a..da42e62 100644 --- a/libgomp/testsuite/libgomp.c++/c++.exp +++ b/libgomp/testsuite/libgomp.c++/c++.exp @@ -42,7 +42,7 @@ if { $blddir != } { if { $lang_test_file_found } { # Gather a list of all tests. 
-set tests [lsort [glob -nocomplain $srcdir/$subdir/*.C]] +set tests [lsort [find $srcdir/$subdir *.C]] if { $blddir != } { set ld_library_path $always_ld_library_path:${blddir}/${lang_library_path} diff --git a/libgomp/testsuite/libgomp.c++/examples-4/e.51.5.C b/libgomp/testsuite/libgomp.c++/examples-4/e.51.5.C new file mode 100644 index 000..4298e23 --- /dev/null +++ b/libgomp/testsuite/libgomp.c++/examples-4/e.51.5.C @@ -0,0 +1,62 @@ +// { dg-do run } + +#include omp.h + +#define EPS 0.01 +#define N 1000 + +extern C void abort (void); + +void init (float *a1, float *a2, int n) +{ + int s = -1; + for (int i = 0; i n; i++) +{ + a1[i] = s * 0.01; + a2[i] = i; + s = -s; +} +} + +void check (float *a, float *b, int n) +{ + for (int i = 0; i n; i++) +if (a[i] - b[i] EPS || b[i] - a[i] EPS) + abort (); +} + +void vec_mult_ref (float *p, float *v1, float *v2, int n) +{ + for (int i = 0; i n; i++) +p[i] = v1[i] * v2[i]; +} + +void vec_mult (float *p, float *v1, float *v2, int n) +{ + #pragma omp target map(to: v1[0:n], v2[:n]) map(from: p[0:n]) +#pragma omp parallel for + for (int i = 0; i n; i++) + p[i] = v1[i] * v2[i]; +} + +int main () +{ + float *p = new float [N]; + float *p1 = new float [N]; + float *v1 = new float [N]; + float *v2 = new float [N]; + + init (v1, v2, N); + + vec_mult_ref (p, v1, v2, N); + vec_mult (p1, v1, v2, N); + + check (p, p1, N); + + delete [] p; + delete [] p1; + delete [] v1; + delete [] v2; + + return 0; +} diff --git a/libgomp/testsuite/libgomp.c++/examples-4/e.53.2.C b/libgomp/testsuite/libgomp.c++/examples-4/e.53.2.C new file mode 100644 index 000..75276e7 --- /dev/null +++ b/libgomp/testsuite/libgomp.c++/examples-4/e.53.2.C @@ -0,0 +1,43 @@ +// { dg-do run } +// { dg-require-effective-target offload_device } + +#include stdlib.h + +struct typeX +{ + int a; +}; + +class typeY +{ +public: + int foo () { return a^0x01; } + int a; +}; + +#pragma omp declare target +struct typeX varX; +class typeY varY; +#pragma omp end declare target + +int 
main () +{ + varX.a = 0; + varY.a = 0; + + #pragma omp target +{ + varX.a = 100; + varY.a = 100; +} + + if (varX.a != 0 || varY.a != 0) +abort (); + + #pragma omp target update from(varX, varY) + + if (varX.a != 100 || varY.a != 100) +abort (); + + return 0; +} diff --git a/libgomp/testsuite/libgomp.c/examples-4/e.50.1.c b/libgomp/testsuite/libgomp.c/examples-4/e.50.1.c new file mode 100644 index 000..45adbe0 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/examples-4/e.50.1.c @@ -0,0 +1,63 @@ +/* { dg-do run } */ + +#include stdlib.h + +#define N 10 + +void init (int *a1, int *a2) +{ + int i, s = -1; + for (i = 0; i N; i++) +{ + a1[i] = s; + a2[i] = i; + s = -s; +} +} + +void check (int *a, int *b) +{ + int i; + for (i = 0; i N; i++) +if (a[i] != b[i]) + abort (); +} + +void vec_mult_ref (int *p) +{ + int i; + int v1[N], v2[N]; + + init (v1, v2); + + for (i = 0; i N; i++) +p[i] = v1[i] *
[PATCH 1/2, x86, PR63534] Fix darwin bootstrap
Hi,

The patch fixes the first failure in the darwin bootstrap. When the PIC register is a pseudo, we don't need to initialize it after setjmp or a non-local goto.

Is it ok?

ChangeLog:

2014-10-17  Evgeny Stupachenko  evstu...@gmail.com

        PR target/63534
        * config/i386/i386.c (builtin_setjmp_receiver): Delete.
        (nonlocal_goto_receiver): Ditto.

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 624a1c1..fc3776f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16927,57 +16927,6 @@
   "* return output_probe_stack_range (operands[0], operands[2]);"
   [(set_attr "type" "multi")])

-(define_expand "builtin_setjmp_receiver"
-  [(label_ref (match_operand 0))]
-  "!TARGET_64BIT && flag_pic"
-{
-#if TARGET_MACHO
-  if (TARGET_MACHO)
-    {
-      rtx xops[3];
-      rtx picreg = gen_rtx_REG (Pmode, PIC_OFFSET_TABLE_REGNUM);
-      rtx_code_label *label_rtx = gen_label_rtx ();
-      emit_insn (gen_set_got_labelled (pic_offset_table_rtx, label_rtx));
-      xops[0] = xops[1] = picreg;
-      xops[2] = machopic_gen_offset (gen_rtx_LABEL_REF (SImode, label_rtx));
-      ix86_expand_binary_operator (MINUS, SImode, xops);
-    }
-  else
-#endif
-    emit_insn (gen_set_got (pic_offset_table_rtx));
-  DONE;
-})
-
-(define_insn_and_split "nonlocal_goto_receiver"
-  [(unspec_volatile [(const_int 0)] UNSPECV_NLGR)]
-  "TARGET_MACHO && !TARGET_64BIT && flag_pic"
-  "#"
-  "reload_completed"
-  [(const_int 0)]
-{
-  if (crtl->uses_pic_offset_table)
-    {
-      rtx xops[3];
-      rtx label_rtx = gen_label_rtx ();
-      rtx tmp;
-
-      /* Get a new pic base.  */
-      emit_insn (gen_set_got_labelled (pic_offset_table_rtx, label_rtx));
-      /* Correct this with the offset from the new to the old.  */
-      xops[0] = xops[1] = pic_offset_table_rtx;
-      label_rtx = gen_rtx_LABEL_REF (SImode, label_rtx);
-      tmp = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, label_rtx),
-                            UNSPEC_MACHOPIC_OFFSET);
-      xops[2] = gen_rtx_CONST (Pmode, tmp);
-      ix86_expand_binary_operator (MINUS, SImode, xops);
-    }
-  else
-    /* No pic reg restore needed.  */
-    emit_note (NOTE_INSN_DELETED);
-
-  DONE;
-})
-
 ;; Avoid redundant prefixes by splitting HImode arithmetic to SImode.
 ;; Do not split instructions with mask registers.
 (define_split
Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
Richard, I reworked the patch as you proposed, but I didn't understand what did you mean by: So please rework the patch so critical edges are always handled correctly. In current patch flag_force_vectorize is used (1) to reject phi nodes with more than 2 arguments; (2) to reject basic blocks with only critical incoming edges since support for extended predication of phi nodes will be in next patch. Could you please clarify your statement. I attached modified patch. ChangeLog: 2014-10-17 Yuri Rumyantsev ysrum...@gmail.com (flag_force_vectorize): New variable. (edge_predicate): New function. (set_edge_predicate): New function. (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list if destination block of edge is not always executed. Set-up predicate for critical edge. (if_convertible_phi_p): Accept phi nodes with more than two args if FLAG_FORCE_VECTORIZE was set-up. (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. (if_convertible_stmt_p): Fix up pre-function comments. (all_edges_are_critical): New function. (if_convertible_bb_p): Use call of all_preds_critical_p to reject block if-conversion with incoming critical edges only if FLAG_FORCE_VECTORIZE was not set-up. (predicate_bbs): Skip loop exit block also.Invoke build2_loc to compute predicate instead of fold_build2_loc. Add zeroing of edge 'aux' field. (find_phi_replacement_condition): Extend function interface: it returns NULL if given phi node must be handled by means of extended phi node predication. If number of predecessors of phi-block is equal 2 and atleast one incoming edge is not critical original algorithm is used. (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false. Nullify 'aux' field of edges for blocks with two successors. 2014-10-17 13:09 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, Here is reduced patch as you requested. All your remarks have been fixed. 
Could you please look at it ( I have already sent the patch with changes in add_to_predicate_list for review). + if (dump_file (dump_flags TDF_DETAILS)) + fprintf (dump_file, More than two phi node args.\n); + return false; + } + +} Excess vertical space. +/* Assumes that BB has more than 2 predecessors. More than 1 predecessor? + Returns false if at least one successor is not on critical edge + and true otherwise. */ + +static inline bool +all_edges_are_critical (basic_block bb) +{ all_preds_critical_p would be a better name + if (EDGE_COUNT (bb-preds) 2) +{ + if (!flag_force_vectorize) + return false; +} as I said in the last review I don't think we should restrict edge predicates to flag_force_vectorize. At least I can't see how if-conversion is magically more expensive for that case? So please rework the patch so critical edges are always handled correctly. Ok with that and the above suggested changes. Thanks, Richard. Thanks. Yuri. ChangeLog 2014-10-16 Yuri Rumyantsev ysrum...@gmail.com (flag_force_vectorize): New variable. (edge_predicate): New function. (set_edge_predicate): New function. (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list if destination block of edge is not always executed. Set-up predicate for critical edge. (if_convertible_phi_p): Accept phi nodes with more than two args if FLAG_FORCE_VECTORIZE was set-up. (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. (if_convertible_stmt_p): Fix up pre-function comments. (all_edges_are_critical): New function. (if_convertible_bb_p): Allow bb has more than two predecessors if FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical to reject block if-conversion with incoming critical edges only if FLAG_FORCE_VECTORIZE was not set-up. (predicate_bbs): Skip loop exit block also.Invoke build2_loc to compute predicate instead of fold_build2_loc. Add zeroing of edge 'aux' field. 
(find_phi_replacement_condition): Extend function interface: it returns NULL if given phi node must be handled by means of extended phi node predication. If number of predecessors of phi-block is equal 2 and atleast one incoming edge is not critical original algorithm is used. (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false. Nullify 'aux' field of edges for blocks with two successors. 2014-10-15 13:50 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, Here is updated patch (part1) for extended if conversion. Second part of patch will be sent later. Ok, I'm starting to look at this. I'd still like you to split things up more. static inline void add_to_predicate_list (struct loop *loop, basic_block bb, tree nc) { ... + /* We use notion of cd equivalence to get simplier
Re: [PATCH 0/17] KASan 4.9 backport
On 10/17/2014 05:49 PM, Jakub Jelinek wrote:
> Plus if you add misalign tests...

Sure, can do this on Monday.

> -  bool use_calls = ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD < INT_MAX
> -    && asan_num_accesses >= ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD;
> +  bool use_calls
> +    = ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD < INT_MAX
> +      && (flag_sanitize & SANITIZE_KERNEL_ADDRESS)
> +      && asan_num_accesses >= ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD;

I agree that original code didn't quite match GNU conventions but can we avoid reformatting it to make future backports easier? So e.g.

   bool use_calls = ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD < INT_MAX
+    && (flag_sanitize & SANITIZE_KERNEL_ADDRESS)
     && asan_num_accesses >= ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD;

-Y
[PATCH 2/2, x86, PR63534] Fix darwin bootstrap
Hi,

Some instructions (like the one in PR63534) could have a hidden use of the PIC register. Therefore we need to keep SET_GOT from being deleted until reload has completed. The patch prevents SET_GOT from being deleted while the PIC register is a pseudo.

Is it ok?

ChangeLog:

2014-10-17  Evgeny Stupachenko  evstu...@gmail.com

        PR target/63534
        * cse.c (delete_trivially_dead_insns): Consider PIC register is used
        while it is pseudo.
        * dse.c (deletable_insn_p): Likewise.

diff --git a/gcc/cse.c b/gcc/cse.c
index be2f31b..062ba45 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -6953,6 +6953,11 @@ delete_trivially_dead_insns (rtx_insn *insns, int nreg)
       /* If no debug insns can be present, COUNTS is just an array
          which counts how many times each pseudo is used.  */
     }
+  /* Pseudo PIC register should be considered as used due to possible
+     new usages generated.  */
+  if (pic_offset_table_rtx
+      && REGNO (pic_offset_table_rtx) >= FIRST_PSEUDO_REGISTER)
+    counts[REGNO (pic_offset_table_rtx)]++;
   /* Go from the last insn to the first and delete insns that only set unused
      registers or copy a register to itself.  As we delete an insn, remove
      usage counts for registers it uses.
diff --git a/gcc/dce.c b/gcc/dce.c
index 5b7d36e..a52a59c 100644
--- a/gcc/dce.c
+++ b/gcc/dce.c
@@ -127,6 +127,10 @@ deletable_insn_p (rtx_insn *insn, bool fast, bitmap arg_stores)
     if (HARD_REGISTER_NUM_P (DF_REF_REGNO (def))
         && global_regs[DF_REF_REGNO (def)])
       return false;
+    /* Initialization of pseudo PIC register should never be removed.  */
+    else if (DF_REF_REG (def) == pic_offset_table_rtx
+             && REGNO (pic_offset_table_rtx) >= FIRST_PSEUDO_REGISTER)
+      return false;

   body = PATTERN (insn);
   switch (GET_CODE (body))
Re: [PATCH 0/17] KASan 4.9 backport
On Fri, Oct 17, 2014 at 06:15:11PM +0400, Yury Gribov wrote:
> On 10/17/2014 05:49 PM, Jakub Jelinek wrote:
>> Plus if you add misalign tests...
>
> Sure, can do this on Monday.

Ok, thanks.

>> -  bool use_calls = ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD < INT_MAX
>> -    && asan_num_accesses >= ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD;
>> +  bool use_calls
>> +    = ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD < INT_MAX
>> +      && (flag_sanitize & SANITIZE_KERNEL_ADDRESS)
>> +      && asan_num_accesses >= ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD;
>
> I agree that original code didn't quite match GNU conventions but can we
> avoid reformatting it to make future backports easier? So e.g.
>
>    bool use_calls = ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD < INT_MAX
> +    && (flag_sanitize & SANITIZE_KERNEL_ADDRESS)
>      && asan_num_accesses >= ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD;

I can live with that. So here is updated patch:

2014-10-17  Jakub Jelinek  ja...@redhat.com

        * asan.c (instrument_derefs): Allow instrumentation of odd-sized
        accesses even for -fsanitize=address.
        (execute_sanopt): Only allow use_calls for -fsanitize=kernel-address.

        * c-c++-common/asan/instrument-with-calls-1.c: Add
        -fno-sanitize=address -fsanitize=kernel-address to dg-options.
        * c-c++-common/asan/instrument-with-calls-2.c: Likewise.

--- gcc/asan.c.jj 2014-10-17 12:51:27.0 +0200
+++ gcc/asan.c 2014-10-17 15:21:29.921495259 +0200
@@ -1707,10 +1707,6 @@ instrument_derefs (gimple_stmt_iterator
   size_in_bytes = int_size_in_bytes (type);
   if (size_in_bytes <= 0)
     return;
-  if ((flag_sanitize & SANITIZE_USER_ADDRESS) != 0
-      && ((size_in_bytes & (size_in_bytes - 1)) != 0
-          || (unsigned HOST_WIDE_INT) size_in_bytes - 1 >= 16))
-    return;

   HOST_WIDE_INT bitsize, bitpos;
   tree offset;
@@ -2781,6 +2777,7 @@ execute_sanopt (void)
     }

   bool use_calls = ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD < INT_MAX
+    && (flag_sanitize & SANITIZE_KERNEL_ADDRESS)
     && asan_num_accesses >= ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD;

   FOR_EACH_BB_FN (bb, cfun)
--- gcc/testsuite/c-c++-common/asan/instrument-with-calls-1.c.jj 2014-10-17 12:51:27.0 +0200
+++ gcc/testsuite/c-c++-common/asan/instrument-with-calls-1.c 2014-10-17 15:34:06.679627168 +0200
@@ -1,5 +1,5 @@
 /* { dg-do assemble } */
-/* { dg-options "--param asan-instrumentation-with-call-threshold=0 -save-temps" } */
+/* { dg-options "-fno-sanitize=address -fsanitize=kernel-address --param asan-instrumentation-with-call-threshold=0 -save-temps" } */

 void f(char *a, int *b) {
   *b = *a;
--- gcc/testsuite/c-c++-common/asan/instrument-with-calls-2.c.jj 2014-10-17 12:51:27.0 +0200
+++ gcc/testsuite/c-c++-common/asan/instrument-with-calls-2.c 2014-10-17 15:34:15.569472032 +0200
@@ -1,5 +1,5 @@
 /* { dg-do assemble } */
-/* { dg-options "--param asan-instrumentation-with-call-threshold=1 -save-temps" } */
+/* { dg-options "-fno-sanitize=address -fsanitize=kernel-address --param asan-instrumentation-with-call-threshold=1 -save-temps" } */

 int x;

        Jakub
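Editorial note on the asan.c hunk above: the condition removed from instrument_derefs made the function return early for any access whose size was not a power of two or was larger than 16 bytes. The following is only a minimal standalone sketch of that size test, assuming the operators lost in this archive copy were `&`, `||` and `>=`. The helper name is_odd_sized_access and the use of long in place of HOST_WIDE_INT are illustrative and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not in GCC): returns true when an access of
   SIZE_IN_BYTES would have been skipped by the removed check, i.e. the
   size is not a power of two or is larger than 16 bytes.  The caller
   is assumed to have rejected sizes <= 0 already, as instrument_derefs
   does just before this point.  */
static bool
is_odd_sized_access (long size_in_bytes)
{
  assert (size_in_bytes > 0);

  /* A power of two has exactly one bit set, so x & (x - 1) is zero.  */
  bool not_power_of_two = (size_in_bytes & (size_in_bytes - 1)) != 0;

  /* Sizes 1..16 map to 0..15 after the subtraction; anything above 16
     fails the comparison.  */
  bool too_large = (unsigned long) size_in_bytes - 1 >= 16;

  return not_power_of_two || too_large;
}
```

With the check gone, sizes such as 3 or 12 bytes, which this helper flags, are instrumented under -fsanitize=address as well.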
Re: [PATCH 7/n] OpenMP 4.0 offloading infrastructure: testsuite
On Fri, Oct 17, 2014 at 06:02:11PM +0400, Ilya Verbin wrote: --- /dev/null +++ b/libgomp/testsuite/libgomp.c++/examples-4/e.53.2.C @@ -0,0 +1,43 @@ +// { dg-do run } +// { dg-require-effective-target offload_device } Well, this test actually relies not only offload_device, but also on non-shared address space (so, if we ever have HSA backend, it would fail there). So, perhaps not immediately, but eventually we'll want an effective target whether address space is shared or not between offloading device and host. --- a/libgomp/testsuite/libgomp.c/target-7.c +++ b/libgomp/testsuite/libgomp.c/target-7.c @@ -1,7 +1,9 @@ +// { dg-require-effective-target offload_device } + Why? The test was specially written such that it tests host fallback (if f is true) too. #include omp.h #include stdlib.h -volatile int v; +volatile int v = 0; Why? void foo (int f) @@ -18,7 +20,7 @@ foo (int f) if (omp_get_level () != 0 || !omp_is_initial_device ()) abort (); #pragma omp target if (v = 1) - if (omp_get_level () != 0 || (f !omp_is_initial_device ())) + if (omp_get_level () != 0 || omp_is_initial_device ()) abort (); #pragma omp target device (d) if (v = 1) if (omp_get_level () != 0 || (f !omp_is_initial_device ())) @@ -30,7 +32,7 @@ foo (int f) if (omp_get_level () != 0 || !omp_is_initial_device ()) abort (); #pragma omp target if (1) - if (omp_get_level () != 0 || (f !omp_is_initial_device ())) + if (omp_get_level () != 0 || omp_is_initial_device ()) abort (); #pragma omp target device (d) if (1) if (omp_get_level () != 0 || (f !omp_is_initial_device ())) @@ -59,7 +61,7 @@ foo (int f) #pragma omp target data if (v = 1) map (to: h) { #pragma omp target if (v = 1) -if (omp_get_level () != 0 || (f !omp_is_initial_device ()) || h++ != 8) +if (omp_get_level () != 0 || omp_is_initial_device () || h++ != 8) abort (); #pragma omp target update if (v = 1) from (h) } @@ -87,7 +89,7 @@ foo (int f) #pragma omp target data if (1) map (to: h) { #pragma omp target if (1) -if (omp_get_level () != 0 
|| (f !omp_is_initial_device ()) || h++ != 12) +if (omp_get_level () != 0 || omp_is_initial_device () || h++ != 12) abort (); #pragma omp target update if (1) from (h) } I don't understand any of these changes. Otherwise it LGTM. Jakub
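Editorial note: the review above turns on host fallback, where a target region whose `if` clause is false runs on the host and omp_is_initial_device () then returns nonzero. A minimal sketch of that behaviour follows; the function name runs_on_host is hypothetical, and the stub for builds without OpenMP is an assumption added only so the fragment compiles everywhere.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed stand-in for builds without -fopenmp: there is no offload
   device, so everything runs on the initial (host) device.  */
static int
omp_is_initial_device (void)
{
  return 1;
}
#endif

/* Hypothetical helper: returns nonzero if the target region below was
   executed on the host.  When ALLOW_OFFLOAD is zero the if clause
   forces host fallback; otherwise the result depends on whether an
   offload device is actually available.  */
static int
runs_on_host (int allow_offload)
{
  int on_host = 1;

  #pragma omp target if (allow_offload) map(from: on_host)
  on_host = omp_is_initial_device ();

  return on_host;
}
```

A test that must pass both with and without a device can only assert on the allow_offload == 0 case; the other case is the one that needs an effective-target guard such as offload_device.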
Re: [PATCH i386 AVX512 Bootstrap] [80/n] Extend expand_sse2_mulvxdi3.
Hello,

This is a fix for the bootstrap failure. Is it OK?

gcc/
        * config/i386/i386.c (ix86_expand_sse2_mulvxdi3): Refactor
        conditions to fix bootstrap.

--
Thanks, K

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7040200..3ddaf3d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -45671,21 +45671,12 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2)
   enum machine_mode mode = GET_MODE (op0);
   rtx t1, t2, t3, t4, t5, t6;

-  if (TARGET_AVX512DQ)
-    {
-      rtx (*gen) (rtx, rtx, rtx);
-
-      if (mode == V8DImode)
-        gen = gen_avx512dq_mulv8di3;
-      else if (TARGET_AVX512VL)
-        {
-          if (mode == V4DImode)
-            gen = gen_avx512dq_mulv4di3;
-          else if (mode == V2DImode)
-            gen = gen_avx512dq_mulv2di3;
-        }
-      emit_insn (gen (op0, op1, op2));
-    }
+  if (TARGET_AVX512DQ && mode == V8DImode)
+    emit_insn (gen_avx512dq_mulv8di3 (op0, op1, op2));
+  else if (TARGET_AVX512DQ && TARGET_AVX512VL && mode == V4DImode)
+    emit_insn (gen_avx512dq_mulv4di3 (op0, op1, op2));
+  else if (TARGET_AVX512DQ && TARGET_AVX512VL && mode == V2DImode)
+    emit_insn (gen_avx512dq_mulv2di3 (op0, op1, op2));
   else if (TARGET_XOP && mode == V2DImode)
     {
       /* op1: A,B,C,D, op2: E,F,G,H */
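Editorial note: the message does not say what broke the bootstrap. One plausible reading, offered here only as an assumption, is that in the removed code the function pointer `gen` could reach the call without ever being assigned (for example AVX512DQ without AVX512VL and a mode other than V8DImode), which a warnings-as-errors build would reject. The sketch below shows the shape of the refactored condition chain, where every branch acts on its own and unmatched cases fall through. The enum and the function select_mul_width are hypothetical stand-ins, not GCC code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the vector modes named in the patch.  */
enum vec_mode { MODE_V2DI, MODE_V4DI, MODE_V8DI };

/* Returns the number of 64-bit lanes handled by the dedicated multiply
   that would be emitted, or 0 when none applies and the caller should
   fall through to the generic expansion.  Each branch is complete in
   itself, so no variable is left unset on any path.  */
static int
select_mul_width (enum vec_mode mode, bool have_avx512dq, bool have_avx512vl)
{
  if (have_avx512dq && mode == MODE_V8DI)
    return 8;
  else if (have_avx512dq && have_avx512vl && mode == MODE_V4DI)
    return 4;
  else if (have_avx512dq && have_avx512vl && mode == MODE_V2DI)
    return 2;
  else
    return 0;
}
```

Note the behavioural difference this makes visible: with AVX512DQ but without AVX512VL, a V4DI or V2DI multiply now falls through to the later alternatives instead of entering the AVX512DQ block.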
Re: [PATCH i386 AVX512 Bootstrap] [80/n] Extend expand_sse2_mulvxdi3.
On Fri, Oct 17, 2014 at 4:25 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This is fix for bootstrap failure. Is it OK? gcc/ * config/i386/i386.c (ix86_expand_sse2_mulvxdi3): Refactor conditions to fix bootstrap. Well, OK. Uros.
Re: [PATCH 2/2, x86, PR63534] Fix darwin bootstrap
On Fri, Oct 17, 2014 at 06:16:41PM +0400, Evgeny Stupachenko wrote:
> Hi, Some instructions (like one in PR63534) could have hidden use of PIC
> register. Therefore we need to leave SET_GOT not deleted till reload
> completed. The patch prevents SET_GOT from deleting while PIC register is
> pseudo.

Just curious, do you emit the init_pic_reg unconditionally at the start of the function in -fpic mode? What does IRA do in that case, if it sees a dead setter of something that doesn't seem to be used at that point? Doesn't it penalize generated code, even if we don't end up with any PIC references during/after reload?

        Jakub
[PATCH, x86, 63534] Fix '-p' profile for 32 bit PIC mode
Hi,

The patch fixes profiling in 32-bit PIC mode (only the -p option is affected).

x86 bootstrap and make check passed.

spec2000 -O2 -p, train data, on Core i7:
CINT -5%
CFP +1.5
compared to a compiler before enabling ebx.

There is a potential performance improvement after the patch is applied, suggested by Jakub:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63534#c8
There is an open bug on this: PR63527. However, the fix for that bug is more complicated.

Is it ok?

ChangeLog

2014-10-16  Evgeny Stupachenko  evstu...@gmail.com

        PR target/63534
        * config/i386/i386.c (x86_function_profiler): Add GOT register init
        for mcount call.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a3ca2ed..5117572 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -39119,11 +39126,15 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
     }
   else if (flag_pic)
     {
+      fprintf (file,"\tpush\t%%ebx\n");
+      fprintf (file,"\tcall\t__x86.get_pc_thunk.bx\n");
+      fprintf (file,"\taddl\t$_GLOBAL_OFFSET_TABLE_, %%ebx\n");
 #ifndef NO_PROFILE_COUNTERS
       fprintf (file, "\tleal\t%sP%d@GOTOFF(%%ebx),%%" PROFILE_COUNT_REGISTER "\n",
                LPREFIX, labelno);
 #endif
       fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
+      fprintf (file,"\tpop\t%%ebx\n");
     }
   else
     {
Re: [PATCH 0/17] KASan 4.9 backport
On 10/17/2014 06:18 PM, Jakub Jelinek wrote:
> On Fri, Oct 17, 2014 at 06:15:11PM +0400, Yury Gribov wrote:
>> On 10/17/2014 05:49 PM, Jakub Jelinek wrote:
>>> Plus if you add misalign tests...
>>
>> Sure, can do this on Monday.
>
> Ok, thanks.
>
>>> -  bool use_calls = ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD < INT_MAX
>>> -    && asan_num_accesses >= ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD;
>>> +  bool use_calls
>>> +    = ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD < INT_MAX
>>> +      && (flag_sanitize & SANITIZE_KERNEL_ADDRESS)
>>> +      && asan_num_accesses >= ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD;
>>
>> I agree that original code didn't quite match GNU conventions but can we
>> avoid reformatting it to make future backports easier? So e.g.
>>
>>    bool use_calls = ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD < INT_MAX
>> +    && (flag_sanitize & SANITIZE_KERNEL_ADDRESS)
>>      && asan_num_accesses >= ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD;
>
> I can live with that. So here is updated patch:

Thanks, LGTM.

-Y
Re: [PATCH, x86, 63534] Fix '-p' profile for 32 bit PIC mode
On Fri, Oct 17, 2014 at 06:30:42PM +0400, Evgeny Stupachenko wrote:
> Hi, The patch fixes profile in 32bits PIC mode (only -p option affected).
> x86 bootstrap, make check passed
> spec2000 o2 -p train data on Corei7: CINT -5% CFP +1,5 compared to a
> compiler before enabling ebx.
> There is a potential performance improve after the patch applied
> suggested by Jakub:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63534#c8
> There is opened bug on this: PR63527. However the fix of the bug is
> more complicated. Is it ok?

Unfortunately I don't think it is ok.
1) you don't set the appropriate bit in pic_labels_used (for ebx)
2) more importantly, it causes the stack to be misaligned (i.e. violating the ABI) for the _mcount call, and breaks unwind info.

> 2014-10-16  Evgeny Stupachenko  evstu...@gmail.com
>
>         PR target/63534
>         * config/i386/i386.c (x86_function_profiler): Add GOT register init
>         for mcount call.
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index a3ca2ed..5117572 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -39119,11 +39126,15 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
>      }
>    else if (flag_pic)
>      {
> +      fprintf (file,"\tpush\t%%ebx\n");
> +      fprintf (file,"\tcall\t__x86.get_pc_thunk.bx\n");
> +      fprintf (file,"\taddl\t$_GLOBAL_OFFSET_TABLE_, %%ebx\n");
>  #ifndef NO_PROFILE_COUNTERS
>        fprintf (file, "\tleal\t%sP%d@GOTOFF(%%ebx),%%" PROFILE_COUNT_REGISTER "\n",
>                 LPREFIX, labelno);
>  #endif
>        fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
> +      fprintf (file,"\tpop\t%%ebx\n");
>      }
>    else
>      {

        Jakub
Re: [PATCH 2/2, x86, PR63534] Fix darwin bootstrap
Yes, unconditionally. If pic_reg is unused, RA will allocate a hard register for it and treat it as free; DCE after reload will then delete SET_GOT.

On Fri, Oct 17, 2014 at 6:20 PM, Jakub Jelinek ja...@redhat.com wrote:
> On Fri, Oct 17, 2014 at 06:16:41PM +0400, Evgeny Stupachenko wrote:
>> Hi, Some instructions (like one in PR63534) could have hidden use of PIC
>> register. Therefore we need to leave SET_GOT not deleted till reload
>> completed. The patch prevents SET_GOT from deleting while PIC register is
>> pseudo.
>
> Just curious, do you emit the init_pic_reg unconditionally at the start of
> the function in -fpic mode? What does IRA do in that case, if it sees a
> dead setter of something that doesn't seem to be used at that point?
> Doesn't it penalize generated code, even if we don't end up with any PIC
> references during/after reload?
>
>         Jakub
[PATCH 0/5] Add preferred_for_{size,speed} attributes
This patch implements the approach I suggested in:

    https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00371.html

for fixing PR61360. To recap, the problem is with the use of "enabled" in the i386.md pattern:

(define_insn "*float<SWI48:mode><MODEF:mode>2_sse"
  [(set (match_operand:MODEF 0 "register_operand" "=f,x,x")
        (float:MODEF
          (match_operand:SWI48 1 "nonimmediate_operand" "m,r,m")))]
  "SSE_FLOAT_MODE_P (<MODEF:MODE>mode) && TARGET_SSE_MATH"
  "@
   fild%Z1\t%1
   %vcvtsi2<MODEF:ssemodesuffix><SWI48:rex64suffix>\t{%1, %d0|%d0, %1}
   %vcvtsi2<MODEF:ssemodesuffix><SWI48:rex64suffix>\t{%1, %d0|%d0, %1}"
  [(set_attr "type" "fmov,sseicvt,sseicvt")
   (set_attr "prefix" "orig,maybe_vex,maybe_vex")
   (set_attr "mode" "<MODEF:MODE>")
   (set (attr "prefix_rex")
     (if_then_else
       (and (eq_attr "prefix" "maybe_vex")
            (match_test "<SWI48:MODE>mode == DImode"))
       (const_string "1")
       (const_string "*")))
   (set_attr "unit" "i387,*,*")
   (set_attr "athlon_decode" "*,double,direct")
   (set_attr "amdfam10_decode" "*,vector,double")
   (set_attr "bdver1_decode" "*,double,direct")
   (set_attr "fp_int_src" "true")
   (set (attr "enabled")
     (cond [(eq_attr "alternative" "0")
              (symbol_ref "TARGET_MIX_SSE_I387
                           && X87_ENABLE_FLOAT (<MODEF:MODE>mode,
                                                <SWI48:MODE>mode)")
            (eq_attr "alternative" "1")
              /* ??? For sched1 we need constrain_operands to be able to
                 select an alternative.  Leave this enabled before RA.  */
              (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
                           || optimize_function_for_size_p (cfun)
                           || !(reload_completed
                                || reload_in_progress
                                || lra_in_progress)")
           ]
           (symbol_ref "true")))
   ])

The attribute was only really supposed to test properties of the currently-selected target. It wasn't supposed to test function-specific size/speed properties like the above pattern does.

So the idea was instead to add two new attributes that say whether an alternative should be used when optimising for speed or size. These attributes would just be strong optimisation hints; they wouldn't be correctness properties in the way that "enabled" is. There are some cases where we could end up with a size-only alternative in code optimised for speed, or vice versa, but the idea would be to reduce them as far as possible.

The main advantage of this approach is that we can take block-level size/speed choices into account, rather than just looking at the function-level choice.

Series tested on x86_64-linux-gnu.

Thanks,
Richard
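Editorial note: the split described above can be pictured as three per-instruction bitmasks, one bit per alternative. The sketch below is a simplified model under stated assumptions, not the code in the series: the struct insn_masks and both functions are hypothetical, and combining the hint with the "enabled" mask by a plain AND is an assumption about how the hints narrow the set of legal alternatives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per constraint alternative of an instruction pattern.  */
typedef uint64_t alternative_mask;
#define ALTERNATIVE_BIT(n) ((alternative_mask) 1 << (n))

/* Hypothetical per-insn record of the three boolean attributes.  */
struct insn_masks
{
  alternative_mask enabled;              /* correctness: legal on this target */
  alternative_mask preferred_for_size;   /* hint: fine when optimising for size */
  alternative_mask preferred_for_speed;  /* hint: fine when optimising for speed */
};

/* Returns the alternatives worth considering for an insn in a block
   that is (or is not) optimised for speed.  Disabled alternatives are
   never returned, whatever the hints say.  */
static alternative_mask
preferred_alternatives (const struct insn_masks *m, bool block_for_speed)
{
  alternative_mask hint
    = block_for_speed ? m->preferred_for_speed : m->preferred_for_size;
  return m->enabled & hint;
}

/* Returns true if alternative N is present in MASK.  */
static bool
alternative_allowed (alternative_mask mask, int n)
{
  assert (n >= 0 && n < 64);
  return (mask & ALTERNATIVE_BIT (n)) != 0;
}
```

Applied to the pattern quoted above on a target without inter-unit conversions, alternative 1 would stay enabled but drop out of the speed mask, so a hot block would avoid it while a cold block in the same function could still use it.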
[PATCH 1/5] Add recog_constrain_insn
This patch just adds a new utility function called recog_constrain_insn, to go alongside the existing recog_constrain_insn_cached. Note that the extract_insn in lra.c wasn't used when checking is disabled. The function just moved on to the next instruction straight away. Richard gcc/ * recog.h (extract_constrain_insn): Declare. * recog.c (extract_constrain_insn): New function. * lra.c (check_rtl): Use it. * postreload.c (reload_cse_simplify_operands): Likewise. * reg-stack.c (check_asm_stack_operands): Likewise. (subst_asm_stack_regs): Likewise. * regcprop.c (copyprop_hardreg_forward_1): Likewise. * regrename.c (build_def_use): Likewise. * sel-sched.c (get_reg_class): Likewise. * config/arm/arm.c (note_invalid_constants): Likewise. * config/s390/predicates.md (execute_operation): Likewise. Index: gcc/recog.h === --- gcc/recog.h 2014-09-18 11:40:31.223690858 +0100 +++ gcc/recog.h 2014-10-17 15:44:50.219398486 +0100 @@ -134,6 +134,7 @@ extern void add_clobbers (rtx, int); extern int added_clobbers_hard_reg_p (int); extern void insn_extract (rtx_insn *); extern void extract_insn (rtx_insn *); +extern void extract_constrain_insn (rtx_insn *insn); extern void extract_constrain_insn_cached (rtx_insn *); extern void extract_insn_cached (rtx_insn *); extern void preprocess_constraints (int, int, const char **, Index: gcc/recog.c === --- gcc/recog.c 2014-09-22 08:36:23.889794255 +0100 +++ gcc/recog.c 2014-10-17 15:44:50.219398486 +0100 @@ -2110,6 +2110,17 @@ extract_insn_cached (rtx_insn *insn) recog_data.insn = insn; } +/* Do uncached extract_insn, constrain_operands and complain about failures. + This should be used when extracting a pre-existing constrained instruction + if the caller wants to know which alternative was chosen. */ +void +extract_constrain_insn (rtx_insn *insn) +{ + extract_insn (insn); + if (!constrain_operands (reload_completed)) +fatal_insn_not_found (insn); +} + /* Do cached extract_insn, constrain_operands and complain about failures. 
Used by insn_attrtab. */ void Index: gcc/lra.c === --- gcc/lra.c 2014-09-26 16:05:57.868394574 +0100 +++ gcc/lra.c 2014-10-17 15:44:50.219398486 +0100 @@ -1919,8 +1919,9 @@ check_rtl (bool final_p) { if (final_p) { - extract_insn (insn); - lra_assert (constrain_operands (1)); +#ifdef ENABLED_CHECKING + extract_constrain_insn (insn); +#endif continue; } /* LRA code is based on assumption that all addresses can be Index: gcc/postreload.c === --- gcc/postreload.c2014-08-26 12:09:02.182959856 +0100 +++ gcc/postreload.c2014-10-17 15:44:50.219398486 +0100 @@ -401,15 +401,11 @@ reload_cse_simplify_operands (rtx_insn * /* Array of alternatives, sorted in order of decreasing desirability. */ int *alternative_order; - extract_insn (insn); + extract_constrain_insn (insn); if (recog_data.n_alternatives == 0 || recog_data.n_operands == 0) return 0; - /* Figure out which alternative currently matches. */ - if (! constrain_operands (1)) -fatal_insn_not_found (insn); - alternative_reject = XALLOCAVEC (int, recog_data.n_alternatives); alternative_nregs = XALLOCAVEC (int, recog_data.n_alternatives); alternative_order = XALLOCAVEC (int, recog_data.n_alternatives); Index: gcc/reg-stack.c === --- gcc/reg-stack.c 2014-09-18 11:40:31.307689884 +0100 +++ gcc/reg-stack.c 2014-10-17 15:44:50.219398486 +0100 @@ -469,8 +469,7 @@ check_asm_stack_operands (rtx_insn *insn /* Find out what the constraints require. If no constraint alternative matches, this asm is malformed. */ - extract_insn (insn); - constrain_operands (1); + extract_constrain_insn (insn); preprocess_constraints (insn); @@ -2016,8 +2015,7 @@ subst_asm_stack_regs (rtx_insn *insn, st /* Find out what the constraints required. If no constraint alternative matches, that is a compiler bug: we should have caught such an insn in check_asm_stack_operands. 
*/ - extract_insn (insn); - constrain_operands (1); + extract_constrain_insn (insn); preprocess_constraints (insn); const operand_alternative *op_alt = which_op_alt (); Index: gcc/regcprop.c === --- gcc/regcprop.c 2014-10-13 08:02:41.225135081 +0100 +++ gcc/regcprop.c 2014-10-17 15:44:50.227398391 +0100 @@ -762,9 +762,7 @@ copyprop_hardreg_forward_1 (basic_block } set = single_set (insn); - extract_insn (insn); - if (! constrain_operands (1)) -
[PATCH 2/5] Add preferred_for_{size,speed} attributes
This is the main patch, to add new preferred_for_size and preferred_for_speed attributes that can be used to selectively disable alternatives when optimising for size or speed. As explained in the docs, the new attributes are just optimisation hints and it is possible that size-only alternatives will sometimes end up in a block that's optimised for speed, or vice versa. The patch deals with code that directly accesses the enabled_attributes mask and that ought to take size/speed choices into account. The next patch deals with indirect uses. Note that I'm not making reload support these attributes for hopefully obvious reasons :-) Richard gcc/ * doc/md.texi: Document preferred_for_size and preferred_for_speed attributes. * genattr.c (main): Handle preferred_for_size and preferred_for_speed in the same way as enabled. * recog.h (bool_attr): New enum. (target_recog): Replace x_enabled_alternatives with x_bool_attr_masks. (get_preferred_alternatives, check_bool_attrs): Declare. * recog.c (have_bool_attr, get_bool_attr, get_bool_attr_mask_uncached) (get_bool_attr_mask, get_preferred_alternatives, check_bool_attrs): New functions. (get_enabled_alternatives): Use get_bool_attr_mask. * ira-costs.c (record_reg_classes): Use get_preferred_alternatives instead of recog_data.enabled_alternatives. * ira.c (ira_setup_alts): Likewise. * postreload.c (reload_cse_simplify_operands): Likewise. * config/i386/i386.c (ix86_legitimate_combined_insn): Likewise. * ira-lives.c (preferred_alternatives): New variable. (process_bb_node_lives): Set it. (check_and_make_def_conflict, make_early_clobber_and_input_conflicts) (single_reg_class, ira_implicitly_set_insn_hard_regs): Use it instead of recog_data.enabled_alternatives. * lra-int.h (lra_insn_recog_data): Replace enabled_alternatives to preferred_alternatives. * lra-constraints.c (process_alt_operands): Update accordingly. * lra.c (lra_set_insn_recog_data): Likewise. (lra_update_insn_recog_data): Assert check_bool_attrs. 
Index: gcc/doc/md.texi === --- gcc/doc/md.texi 2014-10-07 13:12:12.227445290 +0100 +++ gcc/doc/md.texi 2014-10-17 15:47:34.349453560 +0100 @@ -1080,7 +1080,7 @@ the addressing register. * Class Preferences:: Constraints guide which hard register to put things in. * Modifiers:: More precise control over effects of constraints. * Machine Constraints:: Existing constraints for some particular machines. -* Disable Insn Alternatives:: Disable insn alternatives using the @code{enabled} attribute. +* Disable Insn Alternatives:: Disable insn alternatives using attributes. * Define Constraints:: How to define machine-specific constraints. * C Constraint Interface:: How to test constraints from C code. @end menu @@ -4006,42 +4006,49 @@ Unsigned constant valid for BccUI instru @subsection Disable insn alternatives using the @code{enabled} attribute @cindex enabled -The @code{enabled} insn attribute may be used to disable insn -alternatives that are not available for the current subtarget. -This is useful when adding new instructions to an existing pattern -which are only available for certain cpu architecture levels as -specified with the @code{-march=} option. - -If an insn alternative is disabled, then it will never be used. The -compiler treats the constraints for the disabled alternative as -unsatisfiable. +There are three insn attributes that may be used to selectively disable +instruction alternatives: -In order to make use of the @code{enabled} attribute a back end has to add -in the machine description files: +@table @code +@item enabled +Says whether an alternative is available on the current subtarget. -@enumerate -@item -A definition of the @code{enabled} insn attribute. The attribute is -defined as usual using the @code{define_attr} command. This -definition should be based on other insn attributes and/or target flags. 
-The attribute must be a static property of the subtarget; that is, it -must not depend on the current operands or any other dynamic context -(for example, the location of the insn within the body of a loop). - -The @code{enabled} attribute is a numeric attribute and should evaluate to -@code{(const_int 1)} for an enabled alternative and to -@code{(const_int 0)} otherwise. -@item -A definition of another insn attribute used to describe for what -reason an insn alternative might be available or -not. E.g. @code{cpu_facility} as in the example below. -@item -An assignment for the second attribute to each insn definition -combining instructions which are not all available under the same -circumstances. (Note: It obviously only makes sense for definitions -with more than one alternative. Otherwise the insn pattern should be -disabled
[PATCH 3/5] Pass an alternative_mask to constrain_operands
After the previous patch there are cases where we want to constrain
operands to any enabled alternative and cases where we want to also
take size/speed preferences into account.  The former applies when
constraining an existing instruction (which might originally have been
in a block with a different size/speed choice) or when making global
decisions.  The latter applies when evaluating a potential optimisation.

This patch therefore passes the mask of allowable alternatives as
a parameter to constrain_operands.

Richard


gcc/
	* recog.h (constrain_operands): Add an alternative_mask parameter.
	(constrain_operands_cached): Likewise.
	(get_preferred_alternatives): Declare new form.
	* recog.c (get_preferred_alternatives): New bb-taking instance.
	(constrain_operands): Take the set of available alternatives as
	a parameter.
	(check_asm_operands, insn_invalid_p, extract_constrain_insn)
	(extract_constrain_insn_cached): Update calls to constrain_operands.
	* caller-save.c (reg_save_code): Likewise.
	* ira.c (setup_prohibited_mode_move_regs): Likewise.
	* postreload-gcse.c (eliminate_partially_redundant_load): Likewise.
	* ree.c (combine_reaching_defs): Likewise.
	* reload.c (can_reload_into): Likewise.
	* reload1.c (reload, reload_as_needed, inc_for_reload): Likewise.
	(gen_reload_chain_without_interm_reg_p, emit_input_reload_insns)
	(emit_insn_if_valid_for_reload): Likewise.
	* reorg.c (fill_slots_from_thread): Likewise.
	* config/i386/i386.c (ix86_attr_length_address_default): Likewise.
	* config/pa/pa.c (pa_can_combine_p): Likewise.
	* config/rl78/rl78.c (insn_ok_now): Likewise.
	* config/sh/sh.md (define_peephole2): Likewise.
	* final.c (final_scan_insn): Update call to constrain_operands_cached.

Index: gcc/recog.h
===================================================================
--- gcc/recog.h	2014-10-17 15:50:02.0 +0100
+++ gcc/recog.h	2014-10-17 15:50:02.627695847 +0100
@@ -95,8 +95,8 @@ extern void confirm_change_group (void);
 extern int apply_change_group (void);
 extern int num_validated_changes (void);
 extern void cancel_changes (int);
-extern int constrain_operands (int);
-extern int constrain_operands_cached (int);
+extern int constrain_operands (int, alternative_mask);
+extern int constrain_operands_cached (rtx_insn *, int);
 extern int memory_address_addr_space_p (enum machine_mode, rtx, addr_space_t);
 #define memory_address_p(mode,addr) \
 	memory_address_addr_space_p ((mode), (addr), ADDR_SPACE_GENERIC)
@@ -414,6 +414,7 @@ #define this_target_recog (default_targ
 
 alternative_mask get_enabled_alternatives (rtx_insn *);
 alternative_mask get_preferred_alternatives (rtx_insn *);
+alternative_mask get_preferred_alternatives (rtx_insn *, basic_block);
 bool check_bool_attrs (rtx_insn *);
 
 void recog_init ();
Index: gcc/recog.c
===================================================================
--- gcc/recog.c	2014-10-17 15:50:02.0 +0100
+++ gcc/recog.c	2014-10-17 15:50:02.627695847 +0100
@@ -155,8 +155,9 @@ check_asm_operands (rtx x)
   if (reload_completed)
     {
       /* ??? Doh!  We've not got the wrapping insn.  Cook one up.  */
-      extract_insn (make_insn_raw (x));
-      constrain_operands (1);
+      rtx_insn *insn = make_insn_raw (x);
+      extract_insn (insn);
+      constrain_operands (1, get_enabled_alternatives (insn));
       return which_alternative >= 0;
     }
 
@@ -360,7 +361,7 @@ insn_invalid_p (rtx_insn *insn, bool in_
     {
       extract_insn (insn);
 
-      if (! constrain_operands (1))
+      if (! constrain_operands (1, get_preferred_alternatives (insn)))
 	return 1;
     }
 
@@ -2159,6 +2160,21 @@ get_preferred_alternatives (rtx_insn *in
     return get_bool_attr_mask (insn, BA_PREFERRED_FOR_SIZE);
 }
 
+/* Return the set of alternatives of INSN that are allowed by the current
+   target and are preferred for the size/speed optimization choice
+   associated with BB.  Passing a separate BB is useful if INSN has not
+   been emitted yet or if we are considering moving it to a different
+   block.  */
+
+alternative_mask
+get_preferred_alternatives (rtx_insn *insn, basic_block bb)
+{
+  if (optimize_bb_for_speed_p (bb))
+    return get_bool_attr_mask (insn, BA_PREFERRED_FOR_SPEED);
+  else
+    return get_bool_attr_mask (insn, BA_PREFERRED_FOR_SIZE);
+}
+
 /* Assert that the cached boolean attributes for INSN are still accurate.
    The backend is required to define these attributes in a way that only
    depends on the current target (rather than operands, compiler phase,
@@ -2199,7 +2215,7 @@ extract_insn_cached (rtx_insn *insn)
 extract_constrain_insn (rtx_insn *insn)
 {
   extract_insn (insn);
-  if (!constrain_operands (reload_completed))
+  if (!constrain_operands (reload_completed, get_enabled_alternatives (insn)))
     fatal_insn_not_found (insn);
 }
 
@@ -2210,16
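
The split that patch 3/5 describes (any enabled alternative for existing
insns and global decisions, size/speed-preferred alternatives when
evaluating an optimisation) can be modelled outside GCC roughly as
follows.  This is only a sketch: every sketch_* name is hypothetical,
the struct stands in for GCC's cached attribute masks, and the
assumption that the preferred set is always a subset of the enabled set
is an illustration of the intent rather than a copy of recog.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for GCC's alternative_mask: bit N set means
   alternative N may be used.  */
typedef uint64_t sketch_alt_mask;

/* Hypothetical per-insn cache of the three boolean attributes.  */
struct sketch_insn_attrs
{
  sketch_alt_mask enabled;             /* available on this subtarget */
  sketch_alt_mask preferred_for_size;  /* hint when optimising for size */
  sketch_alt_mask preferred_for_speed; /* hint when optimising for speed */
};

/* For an existing insn or a global decision: anything enabled will do.  */
static sketch_alt_mask
sketch_enabled_alternatives (const struct sketch_insn_attrs *a)
{
  return a->enabled;
}

/* For evaluating a potential optimisation: also apply the size/speed
   hint of the block the insn would live in (assumed here to narrow the
   enabled set, never widen it).  */
static sketch_alt_mask
sketch_preferred_alternatives (const struct sketch_insn_attrs *a,
                               bool for_speed)
{
  sketch_alt_mask hint
    = for_speed ? a->preferred_for_speed : a->preferred_for_size;
  return a->enabled & hint;
}

/* Model of the check a constrain_operands-style routine makes: is the
   matched alternative inside the caller-supplied mask?  */
static bool
sketch_alternative_allowed (int alternative, sketch_alt_mask allowed)
{
  return ((allowed >> alternative) & 1) != 0;
}
```

With masks like these, an alternative that is enabled but not preferred
for size still validates an insn that already exists, while a pass
weighing a new transformation in a size-optimised block would reject it.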
[PATCH 4/5] Remove recog_data.enabled_alternatives
After the previous patches, this one gets rid of
recog_data.enabled_alternatives and its one remaining use.

Richard


gcc/
	* recog.h (recog_data_d): Remove enabled_alternatives.
	* recog.c (extract_insn): Don't set it.
	* reload.c (find_reloads): Call get_enabled_alternatives.

Index: gcc/recog.h
===================================================================
--- gcc/recog.h	2014-10-17 15:50:02.627695847 +0100
+++ gcc/recog.h	2014-10-17 15:51:59.662308095 +0100
@@ -250,12 +250,6 @@ struct recog_data_d
   /* True if insn is ASM_OPERANDS.  */
   bool is_asm;
 
-  /* Specifies whether an insn alternative is enabled using the `enabled'
-     attribute in the insn pattern definition.  For back ends not using
-     the `enabled' attribute the bits are always set to 1 in expand_insn.
-     Bits beyond the last alternative are also set to 1.  */
-  alternative_mask enabled_alternatives;
-
   /* In case we are caching, hold insn data was generated for.  */
   rtx insn;
 };
Index: gcc/recog.c
===================================================================
--- gcc/recog.c	2014-10-17 15:50:02.627695847 +0100
+++ gcc/recog.c	2014-10-17 15:51:59.662308095 +0100
@@ -2339,8 +2339,6 @@ extract_insn (rtx_insn *insn)
 
   gcc_assert (recog_data.n_alternatives <= MAX_RECOG_ALTERNATIVES);
 
-  recog_data.enabled_alternatives = get_enabled_alternatives (insn);
-
   recog_data.insn = NULL;
   which_alternative = -1;
 }
Index: gcc/reload.c
===================================================================
--- gcc/reload.c	2014-10-17 15:50:02.627695847 +0100
+++ gcc/reload.c	2014-10-17 15:51:59.666308048 +0100
@@ -2997,13 +2997,14 @@ find_reloads (rtx_insn *insn, int replac
 
      First loop over alternatives.  */
 
+  alternative_mask enabled = get_enabled_alternatives (insn);
   for (this_alternative_number = 0;
        this_alternative_number < n_alternatives;
        this_alternative_number++)
     {
       int swapped;
 
-      if (!TEST_BIT (recog_data.enabled_alternatives, this_alternative_number))
+      if (!TEST_BIT (enabled, this_alternative_number))
 	{
 	  int i;
 
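
The shape of the find_reloads change (fetch the mask once, then test one
bit per alternative inside the loop) can be sketched in isolation like
this.  The loop_alt_mask type, the LOOP_TEST_BIT macro and
loop_first_enabled are hypothetical illustrations, not GCC's TEST_BIT or
any function in reload.c.

```c
#include <stdint.h>

/* Hypothetical stand-in for alternative_mask.  */
typedef uint64_t loop_alt_mask;

/* Hypothetical equivalent of a TEST_BIT-style macro.  */
#define LOOP_TEST_BIT(mask, n) ((((mask) >> (n)) & 1) != 0)

/* Return the first alternative below N_ALTERNATIVES whose bit is set in
   ENABLED, or -1 if every alternative is disabled.  ENABLED is computed
   once by the caller rather than read from shared state on each
   iteration.  */
static int
loop_first_enabled (loop_alt_mask enabled, int n_alternatives)
{
  for (int alt = 0; alt < n_alternatives; alt++)
    {
      if (!LOOP_TEST_BIT (enabled, alt))
        continue;               /* Disabled alternative: skip it.  */
      return alt;
    }
  return -1;
}
```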
[PATCH 5/5] Use preferred_for_speed in i386.md
Undo the original fix for 61630 and use preferred_for_speed in the problematic pattern. I've not written many gcc.target/i386 tests so the markup might need some work. Richard gcc/ * lra.c (lra): Remove call to recog_init. * config/i386/i386.md (preferred_for_speed): New attribute (*floatSWI48:modeMODEF:mode2_sse): Override it instead of enabled. gcc/testsuite/ * gcc.target/i386/conversion-2.c: New test. Index: gcc/lra.c === --- gcc/lra.c 2014-10-17 15:47:34.357453465 +0100 +++ gcc/lra.c 2014-10-17 15:53:10.889463339 +0100 @@ -2116,11 +2116,6 @@ lra (FILE *f) lra_in_progress = 1; - /* The enable attributes can change their values as LRA starts - although it is a bad practice. To prevent reuse of the outdated - values, clear them. */ - recog_init (); - lra_live_range_iter = lra_coalesce_iter = 0; lra_constraint_iter = lra_constraint_iter_after_spill = 0; lra_inheritance_iter = lra_undo_inheritance_iter = 0; Index: gcc/config/i386/i386.md === --- gcc/config/i386/i386.md 2014-10-01 10:48:51.079918153 +0100 +++ gcc/config/i386/i386.md 2014-10-17 15:53:10.889463339 +0100 @@ -779,6 +779,8 @@ (define_attr enabled ] (const_int 1))) +(define_attr preferred_for_speed (const_int 1)) + ;; Describe a user's asm statement. (define_asm_attributes [(set_attr length 128) @@ -4794,16 +4796,12 @@ (define_insn *floatSWI48:modeMODEF:m (symbol_ref TARGET_MIX_SSE_I387 X87_ENABLE_FLOAT (MODEF:MODEmode, SWI48:MODEmode)) -(eq_attr alternative 1) - /* ??? For sched1 we need constrain_operands to be able to - select an alternative. Leave this enabled before RA. 
*/ - (symbol_ref TARGET_INTER_UNIT_CONVERSIONS - || optimize_function_for_size_p (cfun) - || !(reload_completed -|| reload_in_progress -|| lra_in_progress)) ] (symbol_ref true))) + (set (attr preferred_for_speed) + (cond [(eq_attr alternative 1) + (symbol_ref TARGET_INTER_UNIT_CONVERSIONS)] + (symbol_ref true))) ]) (define_insn *floatSWI48x:modeMODEF:mode2_i387 Index: gcc/testsuite/gcc.target/i386/conversion-2.c === --- /dev/null 2014-10-06 08:13:11.214126005 +0100 +++ gcc/testsuite/gcc.target/i386/conversion-2.c2014-10-17 15:53:10.893463291 +0100 @@ -0,0 +1,35 @@ +/* { dg-options -O2 -fno-toplevel-reorder -mfpmath=sse } */ +/* { dg-require-effective-target lp64 } */ + +void __attribute__ ((hot, target (tune=bdver2))) +f1 (int x) +{ + register float f asm (%xmm0) = x; + asm volatile (#f :: x (f)); +} + +void __attribute__ ((cold, target (tune=bdver2))) +f2 (int x) +{ + register float f asm (%xmm1) = x; + asm volatile (#f :: x (f)); +} + +void __attribute__ ((hot, target (tune=bdver2))) +f3 (int x) +{ + register float f asm (%xmm2) = x; + asm volatile (#f :: x (f)); +} + +void __attribute__ ((cold, target (tune=bdver2))) +f4 (int x) +{ + register float f asm (%xmm3) = x; + asm volatile (#f :: x (f)); +} + +/* { dg-final { scan-assembler sp\\\), %xmm0 } } */ +/* { dg-final { scan-assembler di, %xmm1 } } */ +/* { dg-final { scan-assembler sp\\\), %xmm2 } } */ +/* { dg-final { scan-assembler di, %xmm3 } } */
Re: [PATCH 7/n] OpenMP 4.0 offloading infrastructure: testsuite
On 17 Oct 16:14, Jakub Jelinek wrote:
> > -volatile int v;
> > +volatile int v = 0;
>
> Why?

Ok, I'll revert it back.

> > --- a/libgomp/testsuite/libgomp.c/target-7.c
> > +++ b/libgomp/testsuite/libgomp.c/target-7.c
> > @@ -1,7 +1,9 @@
> > +// { dg-require-effective-target offload_device }
> > +
>
> Why?  The test was specially written such that it tests host fallback
> (if f is true) too.
>
> >  void
> >  foo (int f)
> > @@ -18,7 +20,7 @@ foo (int f)
> >    if (omp_get_level () != 0 || !omp_is_initial_device ())
> >      abort ();
> >    #pragma omp target if (v <= 1)
> > -  if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
> > +  if (omp_get_level () != 0 || omp_is_initial_device ())
> >      abort ();
> >    #pragma omp target device (d) if (v <= 1)
> >    if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
> > @@ -30,7 +32,7 @@ foo (int f)
> >    if (omp_get_level () != 0 || !omp_is_initial_device ())
> >      abort ();
> >    #pragma omp target if (1)
> > -  if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
> > +  if (omp_get_level () != 0 || omp_is_initial_device ())
> >      abort ();
> >    #pragma omp target device (d) if (1)
> >    if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
> > @@ -59,7 +61,7 @@ foo (int f)
> >    #pragma omp target data if (v <= 1) map (to: h)
> >    {
> >      #pragma omp target if (v <= 1)
> > -    if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 8)
> > +    if (omp_get_level () != 0 || omp_is_initial_device () || h++ != 8)
> >        abort ();
> >      #pragma omp target update if (v <= 1) from (h)
> >    }
> > @@ -87,7 +89,7 @@ foo (int f)
> >    #pragma omp target data if (1) map (to: h)
> >    {
> >      #pragma omp target if (1)
> > -    if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 12)
> > +    if (omp_get_level () != 0 || omp_is_initial_device () || h++ != 12)
> >        abort ();
> >      #pragma omp target update if (1) from (h)
> >    }
>
> I don't understand any of these changes.

Here in the original test you have:

  #pragma omp target if (v <= 1)
  if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
    abort ();
  #pragma omp target device (d) if (v <= 1)
  if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
    abort ();

There are 2 same if-statements, but target pragmas have different
clauses.  The second depends on device (d), and
(f && !omp_is_initial_device ()) works fine.  But the first one doesn't
depend on 'f', and if we have offload device, this check will fail.
So, to have this test working both with offloading and fallback, we
need to remove all pragmas without device-clause.

  -- Ilya
Re: [PATCH 7/n] OpenMP 4.0 offloading infrastructure: testsuite
On Fri, Oct 17, 2014 at 06:58:17PM +0400, Ilya Verbin wrote:
> Here in the original test you have:
>
>   #pragma omp target if (v <= 1)
>   if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
>     abort ();
>   #pragma omp target device (d) if (v <= 1)
>   if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
>     abort ();
>
> There are 2 same if-statements, but target pragmas have different
> clauses.  The second depends on device (d), and
> (f && !omp_is_initial_device ()) works fine.  But the first one doesn't
> depend on 'f', and if we have offload device, this check will fail.
> So, to have this test working both with offloading and fallback, we
> need to remove all pragmas without device-clause.

Well, there is no need to remove them, just the
|| (f && !omp_is_initial_device ())
should be dropped from target regions without device (d) on them.
Where there is no f guard, the condition should stay.  Do you agree?

	Jakub
Re: [PATCH 7/n] OpenMP 4.0 offloading infrastructure: testsuite
On 17 Oct 17:10, Jakub Jelinek wrote:
> On Fri, Oct 17, 2014 at 06:58:17PM +0400, Ilya Verbin wrote:
> > Here in the original test you have:
> >
> >   #pragma omp target if (v <= 1)
> >   if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
> >     abort ();
> >   #pragma omp target device (d) if (v <= 1)
> >   if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
> >     abort ();
> >
> > There are 2 same if-statements, but target pragmas have different
> > clauses.  The second depends on device (d), and
> > (f && !omp_is_initial_device ()) works fine.  But the first one
> > doesn't depend on 'f', and if we have offload device, this check
> > will fail.  So, to have this test working both with offloading and
> > fallback, we need to remove all pragmas without device-clause.
>
> Well, there is no need to remove them, just the
> || (f && !omp_is_initial_device ())
> should be dropped from target regions without device (d) on them.
> Where there is no f guard, the condition should stay.  Do you agree?

Yes, should I re-post the patch?

  -- Ilya
Re: [PATCH 7/n] OpenMP 4.0 offloading infrastructure: testsuite
On Fri, Oct 17, 2014 at 07:17:31PM +0400, Ilya Verbin wrote:
> On 17 Oct 17:10, Jakub Jelinek wrote:
> > On Fri, Oct 17, 2014 at 06:58:17PM +0400, Ilya Verbin wrote:
> > > Here in the original test you have:
> > >
> > >   #pragma omp target if (v <= 1)
> > >   if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
> > >     abort ();
> > >   #pragma omp target device (d) if (v <= 1)
> > >   if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
> > >     abort ();
> > >
> > > There are 2 same if-statements, but target pragmas have different
> > > clauses.  The second depends on device (d), and
> > > (f && !omp_is_initial_device ()) works fine.  But the first one
> > > doesn't depend on 'f', and if we have offload device, this check
> > > will fail.  So, to have this test working both with offloading and
> > > fallback, we need to remove all pragmas without device-clause.
> >
> > Well, there is no need to remove them, just the
> > || (f && !omp_is_initial_device ())
> > should be dropped from target regions without device (d) on them.
> > Where there is no f guard, the condition should stay.  Do you agree?
>
> Yes, should I re-post the patch?

Guess just the target-7.c patch is enough, to make sure we agree on the
same thing.

	Jakub
Re: [PATCH 7/n] OpenMP 4.0 offloading infrastructure: testsuite
On 17 Oct 17:18, Jakub Jelinek wrote:
> On Fri, Oct 17, 2014 at 07:17:31PM +0400, Ilya Verbin wrote:
> > On 17 Oct 17:10, Jakub Jelinek wrote:
> > > Well, there is no need to remove them, just the
> > > || (f && !omp_is_initial_device ())
> > > should be dropped from target regions without device (d) on them.
> > > Where there is no f guard, the condition should stay.  Do you agree?
> >
> > Yes, should I re-post the patch?
>
> Guess just the target-7.c patch is enough, to make sure we agree on the
> same thing.

Here it is:

diff --git a/libgomp/testsuite/libgomp.c/target-7.c b/libgomp/testsuite/libgomp.c/target-7.c
index 90de6c5..0fe6150 100644
--- a/libgomp/testsuite/libgomp.c/target-7.c
+++ b/libgomp/testsuite/libgomp.c/target-7.c
@@ -18,7 +18,7 @@ foo (int f)
   if (omp_get_level () != 0 || !omp_is_initial_device ())
     abort ();
   #pragma omp target if (v <= 1)
-  if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
+  if (omp_get_level () != 0)
     abort ();
   #pragma omp target device (d) if (v <= 1)
   if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
@@ -30,7 +30,7 @@ foo (int f)
   if (omp_get_level () != 0 || !omp_is_initial_device ())
     abort ();
   #pragma omp target if (1)
-  if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
+  if (omp_get_level () != 0)
     abort ();
   #pragma omp target device (d) if (1)
   if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
@@ -59,7 +59,7 @@ foo (int f)
   #pragma omp target data if (v <= 1) map (to: h)
   {
     #pragma omp target if (v <= 1)
-    if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 8)
+    if (omp_get_level () != 0 || h++ != 8)
       abort ();
     #pragma omp target update if (v <= 1) from (h)
   }
@@ -87,7 +87,7 @@ foo (int f)
   #pragma omp target data if (1) map (to: h)
   {
     #pragma omp target if (1)
-    if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 12)
+    if (omp_get_level () != 0 || h++ != 12)
       abort ();
     #pragma omp target update if (1) from (h)
   }

  -- Ilya
Re: [PATCH 7/n] OpenMP 4.0 offloading infrastructure: testsuite
On Fri, Oct 17, 2014 at 07:29:26PM +0400, Ilya Verbin wrote:
> On 17 Oct 17:18, Jakub Jelinek wrote:
> > On Fri, Oct 17, 2014 at 07:17:31PM +0400, Ilya Verbin wrote:
> > > On 17 Oct 17:10, Jakub Jelinek wrote:
> > > > Well, there is no need to remove them, just the
> > > > || (f && !omp_is_initial_device ())
> > > > should be dropped from target regions without device (d) on them.
> > > > Where there is no f guard, the condition should stay.  Do you agree?
> > >
> > > Yes, should I re-post the patch?
> >
> > Guess just the target-7.c patch is enough, to make sure we agree on
> > the same thing.
>
> Here it is:

LGTM, thanks.

> diff --git a/libgomp/testsuite/libgomp.c/target-7.c b/libgomp/testsuite/libgomp.c/target-7.c
> index 90de6c5..0fe6150 100644
> --- a/libgomp/testsuite/libgomp.c/target-7.c
> +++ b/libgomp/testsuite/libgomp.c/target-7.c
> @@ -18,7 +18,7 @@ foo (int f)
>    if (omp_get_level () != 0 || !omp_is_initial_device ())
>      abort ();
>    #pragma omp target if (v <= 1)
> -  if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
> +  if (omp_get_level () != 0)
>      abort ();
>    #pragma omp target device (d) if (v <= 1)
>    if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
> @@ -30,7 +30,7 @@ foo (int f)
>    if (omp_get_level () != 0 || !omp_is_initial_device ())
>      abort ();
>    #pragma omp target if (1)
> -  if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
> +  if (omp_get_level () != 0)
>      abort ();
>    #pragma omp target device (d) if (1)
>    if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
> @@ -59,7 +59,7 @@ foo (int f)
>    #pragma omp target data if (v <= 1) map (to: h)
>    {
>      #pragma omp target if (v <= 1)
> -    if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 8)
> +    if (omp_get_level () != 0 || h++ != 8)
>        abort ();
>      #pragma omp target update if (v <= 1) from (h)
>    }
> @@ -87,7 +87,7 @@ foo (int f)
>    #pragma omp target data if (1) map (to: h)
>    {
>      #pragma omp target if (1)
> -    if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 12)
> +    if (omp_get_level () != 0 || h++ != 12)
>        abort ();
>      #pragma omp target update if (1) from (h)
>    }

	Jakub
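
The rule the thread converges on can be written down as a small pure-C
predicate, which avoids needing an OpenMP runtime to follow it.  This is
a sketch under assumptions: sketch_region_must_abort is a hypothetical
helper, not part of target-7.c, and it assumes (as Jakub's remark about
host fallback suggests) that a true f means the device (d) clause is
expected to select host fallback.  It covers only target regions whose
if clause is true.

```c
#include <stdbool.h>

/* Hypothetical model of the checks in target-7.c for a target region
   whose if clause evaluates to true.

   LEVEL             - value of omp_get_level () inside the region
   HAS_DEVICE_CLAUSE - the region carries device (d)
   F                 - the argument of foo; assumed to request host
                       fallback through d when true
   ON_INITIAL_DEVICE - value of omp_is_initial_device () in the region

   Returns true when the test should call abort ().  */
static bool
sketch_region_must_abort (int level, bool has_device_clause, bool f,
                          bool on_initial_device)
{
  if (level != 0)
    return true;

  /* Only regions with device (d) are tied to F, so only they may insist
     on running on the host when F is true.  */
  if (has_device_clause)
    return f && !on_initial_device;

  /* Without device (d) the region may legitimately be offloaded even
     when F is true, so nothing is checked beyond the level.  */
  return false;
}
```

Under this model a region without device (d) that runs on an offload
device while f is true passes, which is the case the original check
rejected.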
Re: [libatomic PATCH] Fix libatomic behavior for big endian toolchain
Changes to architecture-independent files must use architecture-independent conditionals, so __BYTE_ORDER__ not __ARMEB__. -- Joseph S. Myers jos...@codesourcery.com
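
A minimal sketch of an architecture-independent conditional of the kind
requested, using the __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__ macros that
GCC predefines.  The SKETCH_BIG_ENDIAN macro and sketch_subword_shift
helper are hypothetical and are not taken from libatomic; they only show
a byte-order-dependent computation that does not mention __ARMEB__.

```c
/* Hypothetical flag derived only from compiler-provided, target-neutral
   macros.  */
#if defined (__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
# define SKETCH_BIG_ENDIAN 1
#else
# define SKETCH_BIG_ENDIAN 0
#endif

/* Bit shift needed to reach a SIZE-byte value stored at BYTE_OFFSET
   inside a WORD_SIZE-byte word, for the byte order being compiled
   for.  */
static unsigned
sketch_subword_shift (unsigned byte_offset, unsigned size,
                      unsigned word_size)
{
  if (SKETCH_BIG_ENDIAN)
    return (word_size - size - byte_offset) * 8;
  return byte_offset * 8;
}
```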
[gomp4] Use GOMP_PLUGIN_ not gomp_plugin_ for libgomp plugin API
Hi,

As the title says, this patch makes the libgomp plugin API use the
GOMP_PLUGIN_ prefix rather than gomp_plugin_.  This is purely a
mechanical change.

OK for the gomp4 branch?

Thanks,

Julian

ChangeLog

    libgomp/
    * libgomp-plugin.c (gomp_plugin_*): Rename to...
    (GOMP_PLUGIN_*): This.
    * libgomp-plugin.h: Likewise.
    * libgomp.map: Likewise.
    * oacc-host.c (GOMP): Use GOMP_PLUGIN_ in macro expansion.
    * oacc-plugin.c (gomp_plugin_*): Rename to...
    (GOMP_PLUGIN_*): This.
    * plugin-nvptx.c: Likewise.

commit cce63ddb8895d3b51a176d68045b7920affc05e5
Author: Julian Brown jul...@codesourcery.com
Date:   Wed Oct 15 02:05:08 2014 -0700

    Use GOMP_PLUGIN_ not gomp_plugin_ for libgomp plugin API.

diff --git a/libgomp/libgomp-plugin.c b/libgomp/libgomp-plugin.c
index 46dd7b0..0f72bb9 100644
--- a/libgomp/libgomp-plugin.c
+++ b/libgomp/libgomp-plugin.c
@@ -31,25 +31,25 @@
 #include "target.h"
 
 void *
-gomp_plugin_malloc (size_t size)
+GOMP_PLUGIN_malloc (size_t size)
 {
   return gomp_malloc (size);
 }
 
 void *
-gomp_plugin_malloc_cleared (size_t size)
+GOMP_PLUGIN_malloc_cleared (size_t size)
 {
   return gomp_malloc_cleared (size);
 }
 
 void *
-gomp_plugin_realloc (void *ptr, size_t size)
+GOMP_PLUGIN_realloc (void *ptr, size_t size)
 {
   return gomp_realloc (ptr, size);
 }
 
 void
-gomp_plugin_error (const char *msg, ...)
+GOMP_PLUGIN_error (const char *msg, ...)
 {
   va_list ap;
 
@@ -59,7 +59,7 @@ gomp_plugin_error (const char *msg, ...)
 }
 
 void
-gomp_plugin_notify (const char *msg, ...)
+GOMP_PLUGIN_notify (const char *msg, ...)
 {
   va_list ap;
 
@@ -69,7 +69,7 @@ gomp_plugin_notify (const char *msg, ...)
 }
 
 void
-gomp_plugin_fatal (const char *msg, ...)
+GOMP_PLUGIN_fatal (const char *msg, ...)
 {
   va_list ap;
 
@@ -82,25 +82,25 @@ gomp_plugin_fatal (const char *msg, ...)
 }
 
 void
-gomp_plugin_mutex_init (gomp_mutex_t *mutex)
+GOMP_PLUGIN_mutex_init (gomp_mutex_t *mutex)
 {
   gomp_mutex_init (mutex);
 }
 
 void
-gomp_plugin_mutex_destroy (gomp_mutex_t *mutex)
+GOMP_PLUGIN_mutex_destroy (gomp_mutex_t *mutex)
 {
   gomp_mutex_destroy (mutex);
 }
 
 void
-gomp_plugin_mutex_lock (gomp_mutex_t *mutex)
+GOMP_PLUGIN_mutex_lock (gomp_mutex_t *mutex)
 {
   gomp_mutex_lock (mutex);
 }
 
 void
-gomp_plugin_mutex_unlock (gomp_mutex_t *mutex)
+GOMP_PLUGIN_mutex_unlock (gomp_mutex_t *mutex)
 {
   gomp_mutex_unlock (mutex);
 }
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 0ecb407..e31573c 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -31,27 +31,27 @@
 
 /* alloc.c */
 
-extern void *gomp_plugin_malloc (size_t) __attribute__((malloc));
-extern void *gomp_plugin_malloc_cleared (size_t) __attribute__((malloc));
-extern void *gomp_plugin_realloc (void *, size_t);
+extern void *GOMP_PLUGIN_malloc (size_t) __attribute__((malloc));
+extern void *GOMP_PLUGIN_malloc_cleared (size_t) __attribute__((malloc));
+extern void *GOMP_PLUGIN_realloc (void *, size_t);
 
 /* error.c */
 
-extern void gomp_plugin_notify(const char *msg, ...);
-extern void gomp_plugin_error (const char *, ...)
+extern void GOMP_PLUGIN_notify(const char *msg, ...);
+extern void GOMP_PLUGIN_error (const char *, ...)
 	__attribute__((format (printf, 1, 2)));
-extern void gomp_plugin_fatal (const char *, ...)
+extern void GOMP_PLUGIN_fatal (const char *, ...)
 	__attribute__((noreturn, format (printf, 1, 2)));
 
 /* mutex.c */
 
-extern void gomp_plugin_mutex_init (gomp_mutex_t *mutex);
-extern void gomp_plugin_mutex_destroy (gomp_mutex_t *mutex);
-extern void gomp_plugin_mutex_lock (gomp_mutex_t *mutex);
-extern void gomp_plugin_mutex_unlock (gomp_mutex_t *mutex);
+extern void GOMP_PLUGIN_mutex_init (gomp_mutex_t *mutex);
+extern void GOMP_PLUGIN_mutex_destroy (gomp_mutex_t *mutex);
+extern void GOMP_PLUGIN_mutex_lock (gomp_mutex_t *mutex);
+extern void GOMP_PLUGIN_mutex_unlock (gomp_mutex_t *mutex);
 
 /* target.c */
 
-extern void gomp_plugin_async_unmap_vars (void *ptr);
+extern void GOMP_PLUGIN_async_unmap_vars (void *ptr);
 
 #endif
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index e1e87d9..538aabb 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -326,15 +326,15 @@ GOACC_2.0 {
 # FIXME: Hygiene/grouping/naming?
 PLUGIN_1.0 {
   global:
-	gomp_plugin_malloc;
-	gomp_plugin_malloc_cleared;
-	gomp_plugin_realloc;
-	gomp_plugin_error;
-	gomp_plugin_notify;
-	gomp_plugin_fatal;
-	gomp_plugin_mutex_init;
-	gomp_plugin_mutex_destroy;
-	gomp_plugin_mutex_lock;
-	gomp_plugin_mutex_unlock;
-	gomp_plugin_async_unmap_vars;
+	GOMP_PLUGIN_malloc;
+	GOMP_PLUGIN_malloc_cleared;
+	GOMP_PLUGIN_realloc;
+	GOMP_PLUGIN_error;
+	GOMP_PLUGIN_notify;
+	GOMP_PLUGIN_fatal;
+	GOMP_PLUGIN_mutex_init;
+	GOMP_PLUGIN_mutex_destroy;
+	GOMP_PLUGIN_mutex_lock;
+	GOMP_PLUGIN_mutex_unlock;
+	GOMP_PLUGIN_async_unmap_vars;
 };
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 7a50d65..a47617a 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@
Re: -fuse-caller-save - Collect register usage information
On 10/17/14 05:00, Richard Biener wrote:
> > I'm starting to lean towards -foptimize-call-clobbers or similar.
>
> Well, it is really some form of IPA driven register allocation.
> Whether you want to call it -fipa-ra or not is another question
> - but if we had such option then enabling it with that option would
> be fine.
>
> Also users may have no idea what call vs callee clobbers are, but
> IPA RA may be a term that is more widely known (or at least google
> can come up with something for you).
>
> So - I like -fipa-ra more.

Similarly.  At the heart of the matter is we're utilizing information
about the callee's behaviour to improve the code we generate in the
caller.  That's clearly in IPA's domain IMHO.

Jeff