Broken link
In this message: http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01096.html Gerald indicated a stale link for 'Guide for the use of the Ada programming language in high integrity systems'. The document was, in fact, not removed, just moved. Its new location is: http://www.dit.upm.es/~str/ork/documents/adahis.pdf Thank you, Leighton Dawson
Re: [PATCH] Try to coalesce for unary and binary ops
On 04/17/14 05:38, Richard Biener wrote: The patch below increases the number of coalesces we attempt to also cover unary and binary operations. This improves initial code generation for code like [ ... ] Expansion has special code to improve coalescing of op1 with the target, thus this is what we try to match here. Overall there are positive and negative size effects during a bootstrap on x86_64, but overall it seems to be a loss - stage3 cc1 text size is 18261647 bytes without the patch compared to 18265751 bytes with the patch. Now the question is, what does this tell us? That not re-using the same pseudo as op and target is always better? Coalescing has always been, and probably always will be, a transformation that can help or hurt code. At the level where you're applying it I suspect there aren't any really good heuristics to determine when it's likely profitable vs. not profitable. I'd tend to lean towards coalescing being good here, but would want to investigate the regressions a bit to see if there's something systematic going on that causes us to generate poor code in some small number of situations that we might be able to fix. You might even initially restrict this to coalescing on unary operators until we have better heuristics for binary operators. Also note that all of this may end up hurting RISC targets. Jeff
Re: [PATCH] Change HONOR_REG_ALLOC_ORDER to a macro for C expression
On 04/16/14 06:14, Kito Cheng wrote: Hi Vladimir: Thanks for your reply and approval; however, I don't have commit rights yet, can you help to commit it? Thanks! I fixed up some minor whitespace issues and committed your patch. Jeff
Re: [PATCH] Remove keep_aligning from get_inner_reference
On 04/22/14 03:25, Eric Botcazou wrote: Sure, and thanks again for your help. Thanks! I was not able to find any difference on the generated code with or without that patch. Yes, my gut feeling is that TYPE_ALIGN_OK is really obsolete now. It is set in a single place in the compiler (gcc-interface/decl.c:gnat_to_gnu_entity): /* Tell the middle-end that objects of tagged types are guaranteed to be properly aligned. This is necessary because conversions to the class-wide type are translated into conversions to the root type, which can be less aligned than some of its derived types. */ if (Is_Tagged_Type (gnat_entity) || Is_Class_Wide_Equivalent_Type (gnat_entity)) TYPE_ALIGN_OK (gnu_type) = 1; but we changed the way these conversions are done some time ago. So does this remove the last concern around Bernd's patch? Jeff
Re: [PATCH] Change HONOR_REG_ALLOC_ORDER to a macro for C expression
Hi Jeff: I fixed up some minor whitespace issues and committed your patch. Thanks for your help :)
Re: [C PATCH] Fix up diagnostics (PR c/25801)
On Thu, May 01, 2014 at 10:54:18PM +, Joseph S. Myers wrote: I think the comment on this function should be updated to explain the interface contract when an incomplete (or function) type is passed (i.e. return size_one_node, caller is responsible for any diagnostics). Done. The test should I think also cover incomplete union types, and pointer subtraction for all the different cases of incomplete types, and the += and -= operators (i.e. cover invalid arithmetic on pointers to incomplete types more thoroughly, even if not actually affected by this patch). Done. Another change is that I used COMPLETE_TYPE_P instead of COMPLETE_OR_VOID_TYPE_P; the latter would be a bug and didn't make much sense before either, since VOID_TYPE is already checked just above. Tested x86_64-linux, ok now? 2014-05-02 Marek Polacek pola...@redhat.com PR c/25801 * c-typeck.c (c_size_in_bytes): Update comment. Don't call error. Return size_one_node when the type is not complete. (pointer_diff): Remove comment. (build_unary_op): Improve error messages. * gcc.dg/pr25801.c: New test. diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c index 98567b4..41770bc 100644 --- gcc/c/c-typeck.c +++ gcc/c/c-typeck.c @@ -1754,22 +1754,20 @@ type_lists_compatible_p (const_tree args1, const_tree args2, } } -/* Compute the size to increment a pointer by. */ +/* Compute the size to increment a pointer by. When a function type or void + type or incomplete type is passed, size_one_node is returned. + This function does not emit any diagnostics; the caller is responsible + for that. */ static tree c_size_in_bytes (const_tree type) { enum tree_code code = TREE_CODE (type); - if (code == FUNCTION_TYPE || code == VOID_TYPE || code == ERROR_MARK) + if (code == FUNCTION_TYPE || code == VOID_TYPE || code == ERROR_MARK + || !COMPLETE_TYPE_P (type)) return size_one_node; - if (!COMPLETE_OR_VOID_TYPE_P (type)) -{ - error ("arithmetic on pointer to an incomplete type"); - return size_one_node; -} - /* Convert in case a char is more than one unit. */ return size_binop_loc (input_location, CEIL_DIV_EXPR, TYPE_SIZE_UNIT (type), size_int (TYPE_PRECISION (char_type_node) @@ -3530,7 +3528,6 @@ pointer_diff (location_t loc, tree op0, tree op1) if (!COMPLETE_OR_VOID_TYPE_P (TREE_TYPE (TREE_TYPE (orig_op1)))) error_at (loc, "arithmetic on pointer to an incomplete type"); - /* This generates an error if op0 is pointer to incomplete type. */ op1 = c_size_in_bytes (target_type); if (pointer_to_zero_sized_aggr_p (TREE_TYPE (orig_op1))) @@ -4004,16 +4001,18 @@ build_unary_op (location_t location, if (typecode == POINTER_TYPE) { - /* If pointer target is an undefined struct, + /* If pointer target is an incomplete type, we just cannot know how to do the arithmetic. */ if (!COMPLETE_OR_VOID_TYPE_P (TREE_TYPE (argtype))) { if (code == PREINCREMENT_EXPR || code == POSTINCREMENT_EXPR) error_at (location, - "increment of pointer to unknown structure"); + "increment of pointer to an incomplete type %qT", + TREE_TYPE (argtype)); else error_at (location, - "decrement of pointer to unknown structure"); + "decrement of pointer to an incomplete type %qT", + TREE_TYPE (argtype)); } else if (TREE_CODE (TREE_TYPE (argtype)) == FUNCTION_TYPE || TREE_CODE (TREE_TYPE (argtype)) == VOID_TYPE) diff --git gcc/testsuite/gcc.dg/pr25801.c gcc/testsuite/gcc.dg/pr25801.c index e69de29..10b53d3 100644 --- gcc/testsuite/gcc.dg/pr25801.c +++ gcc/testsuite/gcc.dg/pr25801.c @@ -0,0 +1,44 @@ +/* PR c/25801 */ +/* { dg-do compile } */ +/* { dg-options "-std=c99" } */ + +int (*a)[]; +struct S *s; +union U *u; +enum E *e; + +void +f (void) +{ + a++; /* { dg-error "increment of pointer to an incomplete type" } */ + ++a; /* { dg-error "increment of pointer to an incomplete type" } */ + a--; /* { dg-error "decrement of pointer to an incomplete type" } */ + --a; /* { dg-error "decrement of pointer to an incomplete type" } */ + a += 1; /* { dg-error "invalid use of array with unspecified bounds" } */ + a -= 1; /* { dg-error "invalid use of array with unspecified bounds" } */ + a - a; /* { dg-error "arithmetic on pointer to an incomplete type" } */ + + s++; /* { dg-error "increment of pointer to an incomplete type" } */ + ++s; /* { dg-error "increment of pointer to an incomplete type" } */ + s--; /* { dg-error "decrement of pointer to an incomplete type" } */ + --s; /* { dg-error "decrement of pointer to an incomplete type" } */ + s += 1; /* { dg-error "invalid use of undefined type" } */ + s -= 1; /* { dg-error "invalid use of undefined type
Re: [PING][ARM] Handle simple SImode PLUS and MINUS operations in rtx costs
Ping again. Kyrill On 24/04/14 09:20, Kyrill Tkachov wrote: Hi all, Pinging this: http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01276.html Thanks, Kyrill
Re: GCC's -fsplit-stack disturbing Mach's vm_allocate
On Fri, 2014-05-02 at 00:45 +0200, Samuel Thibault wrote: Hello, Svante Signell, on Thu 24 Apr 2014 10:39:10 +0200, wrote: - Without split stack enabled, around 70 libgo tests pass and 50 fail, most of them with a segfault. - Enabling split stack and using the libc Samuel built, all 122 libgo tests fail with a segfault. Please provide a segfault backtrace, rpctrace, etc. Otherwise we won't be able to help you. You already have one in this thread, analysed by Justus: http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01736.html The full rpctrace and gdb backtrace for that case were published on paste.debian.net, but are gone now. I can upload them again if you need them, or do you want them attached to a mail?
[DOC PATCH] Describe -fsanitize=float-divide-by-zero
When I submitted -fsanitize=float-divide-by-zero stuff, I forgot to document the option in the table of ubsan options. Ok? 2014-05-02 Marek Polacek pola...@redhat.com * doc/invoke.texi: Describe -fsanitize=float-divide-by-zero. diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi index 0eba1e0..3fe9d5f 100644 --- gcc/doc/invoke.texi +++ gcc/doc/invoke.texi @@ -5393,6 +5393,14 @@ While @option{-ftrapv} causes traps for signed overflows to be emitted, @option{-fsanitize=undefined} gives a diagnostic message. This currently works only for the C family of languages. +@item -fsanitize=float-divide-by-zero +@opindex fsanitize=float-divide-by-zero + +Detect floating-point division by zero. Unlike other similar options, +@option{-fsanitize=float-divide-by-zero} is not enabled by +@option{-fsanitize=undefined}, since floating-point division by zero can +be a legitimate way of obtaining infinities and NaNs. + @item -fsanitize-recover @opindex fsanitize-recover By default @option{-fsanitize=undefined} sanitization (and its suboptions Marek
Re: [DOC PATCH] Describe -fsanitize=float-divide-by-zero
On Fri, May 02, 2014 at 10:11:05AM +0200, Marek Polacek wrote: When I submitted -fsanitize=float-divide-by-zero stuff, I forgot to document the option in the table of ubsan options. Ok? 2014-05-02 Marek Polacek pola...@redhat.com * doc/invoke.texi: Describe -fsanitize=float-divide-by-zero. Ok. --- gcc/doc/invoke.texi +++ gcc/doc/invoke.texi @@ -5393,6 +5393,14 @@ While @option{-ftrapv} causes traps for signed overflows to be emitted, @option{-fsanitize=undefined} gives a diagnostic message. This currently works only for the C family of languages. +@item -fsanitize=float-divide-by-zero +@opindex fsanitize=float-divide-by-zero + +Detect floating-point division by zero. Unlike other similar options, +@option{-fsanitize=float-divide-by-zero} is not enabled by +@option{-fsanitize=undefined}, since floating-point division by zero can +be a legitimate way of obtaining infinities and NaNs. + @item -fsanitize-recover @opindex fsanitize-recover By default @option{-fsanitize=undefined} sanitization (and its suboptions Marek Jakub
[PATCH] RTEMS: Adjust SPARC multilibs
A recent fix for the LEON3 support added a new option -muser-mode to select the ASI of the CASA instruction used for atomic operations. Use the user-mode CASA instruction since this also works in supervisor mode. This patch should be applied to GCC 4.8, 4.9 and mainline. I do not have write access, so in case this gets approved, please commit it for me. gcc/ChangeLog 2014-05-02 Sebastian Huber sebastian.hu...@embedded-brains.de * config/sparc/t-rtems: Use user-mode for leon3 multilibs. --- gcc/config/sparc/t-rtems | 18 -- 1 files changed, 16 insertions(+), 2 deletions(-) diff --git a/gcc/config/sparc/t-rtems b/gcc/config/sparc/t-rtems index f1a3d84..91de612 100644 --- a/gcc/config/sparc/t-rtems +++ b/gcc/config/sparc/t-rtems @@ -17,6 +17,20 @@ # http://www.gnu.org/licenses/. # -MULTILIB_OPTIONS = msoft-float mcpu=v8/mcpu=leon3 -MULTILIB_DIRNAMES = soft v8 leon3 +MULTILIB_OPTIONS = msoft-float mcpu=v8/mcpu=leon3 muser-mode +MULTILIB_DIRNAMES = soft v8 leon3 user-mode MULTILIB_MATCHES = msoft-float=mno-fpu + +# Enumeration of multilibs + +MULTILIB_EXCEPTIONS += msoft-float/mcpu=v8/muser-mode +# MULTILIB_EXCEPTIONS += msoft-float/mcpu=v8 +# MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3/muser-mode +MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3 +MULTILIB_EXCEPTIONS += msoft-float/muser-mode +# MULTILIB_EXCEPTIONS += msoft-float +MULTILIB_EXCEPTIONS += mcpu=v8/muser-mode +# MULTILIB_EXCEPTIONS += mcpu=v8 +# MULTILIB_EXCEPTIONS += mcpu=leon3/muser-mode +MULTILIB_EXCEPTIONS += mcpu=leon3 +MULTILIB_EXCEPTIONS += muser-mode -- 1.7.7
Re: [PATCH] RTEMS thread model configuration
Ping. On 2014-04-18 12:11, Sebastian Huber wrote: From: Sebastian Huber sebastian-hu...@web.de The command line to build a GCC for RTEMS virtually always contained '--enable-threads'. This patch helps to avoid this extra configuration command line parameter and makes the GCC build a bit more user friendly for RTEMS. This patch should be applied to the GCC 4.9 branch and master. 2014-04-18 Sebastian Huber sebastian.hu...@embedded-brains.de * config.gcc (*-*-rtems*): Default to 'rtems' thread model. Enable selection of 'posix' or no thread model. --- gcc/config.gcc | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/gcc/config.gcc b/gcc/config.gcc index 3c55c88..93d5994 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -791,7 +791,13 @@ case ${target} in ;; *-*-rtems*) case ${enable_threads} in -yes) thread_file='rtems' ;; +"" | yes | rtems) thread_file='rtems' ;; +posix) thread_file='posix' ;; +no) ;; +*) + echo 'Unknown thread configuration for RTEMS' + exit 1 + ;; esac tmake_file="${tmake_file} t-rtems" extra_options="${extra_options} rtems.opt" -- Sebastian Huber, embedded brains GmbH Address : Dornierstr. 4, D-82178 Puchheim, Germany Phone : +49 89 189 47 41-16 Fax : +49 89 189 47 41-09 E-Mail : sebastian.hu...@embedded-brains.de PGP : Public key available on request. This message is not a business communication within the meaning of the EHUG.
[AArch64] Fix integer vabs intrinsics
Hi, Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as undefined/impossible, the Neon vabs intrinsics should behave as the hardware does. That is to say, the pseudo-code sequence: a = vabs_s8 (vdup_n_s8 (-128)); assert (a >= 0); does not hold, as in hardware abs (-128) == -128. Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid it. In fact, we have to be even more careful than that, and keep the integer vabs intrinsics as an unspec in the back end. We keep the standard pattern name around for the benefit of auto-vectorization. Tested on aarch64-none-elf with no issues. This will also be a bug on 4.9 (ugh), OK for trunk and gcc-4_9-branch? Thanks, James --- 2014-05-02 James Greenhalgh james.greenha...@arm.com * config/aarch64/aarch64-builtins.c (aarch64_fold_builtin): Don't fold integer abs builtins. * config/aarch64/aarch64-simd-builtins.def (abs): Split by integer and floating point variants. * config/aarch64/aarch64-simd.md (aarch64_abs<mode>): New. * config/aarch64/iterators.md (unspec): Add UNSPEC_ABS.
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index a301982..6d47c0b 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -1153,7 +1153,7 @@ aarch64_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *args, switch (fcode) { - BUILTIN_VALLDI (UNOP, abs, 2) + BUILTIN_VDQF (UNOP, abs, 2) return fold_build1 (ABS_EXPR, type, args[0]); break; BUILTIN_VALLDI (BINOP, cmge, 0) diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 339e8f8..e2d1078 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -365,7 +365,8 @@ BUILTIN_VDQF (UNOP, frecpe, 0) BUILTIN_VDQF (BINOP, frecps, 0) - BUILTIN_VALLDI (UNOP, abs, 2) + BUILTIN_VDQ (UNOP, abs, 0) + BUILTIN_VDQF (UNOP, abs, 2) VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf) VAR1 (BINOP, float_truncate_hi_, 0, v4sf) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 108bc8d88931e67e6c7eeb4a01bb391a1ced..acb75f5bd0c732d8e11d4a7b6b61f8b1e81d1960 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -390,6 +390,18 @@ (define_insn "aba<mode>_3" [(set_attr "type" "neon_arith_acc<q>")] ) +;; To mirror the behaviour of hardware, as required for arm_neon.h, we must +;; show an abundance of caution around the abs instruction.
+ +(define_insn "aarch64_abs<mode>" + [(set (match_operand:VDQ 0 "register_operand" "=w") +(unspec:VDQ [(match_operand:VDQ 1 "register_operand" "w")] + UNSPEC_ABS))] + "TARGET_SIMD" + "abs\t%0.<Vtype>, %1.<Vtype>" + [(set_attr "type" "neon_abs<q>")] +) + (define_insn "fabd<mode>_3" [(set (match_operand:VDQF 0 "register_operand" "=w") (abs:VDQF (minus:VDQF diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index c537c3780eea95fa315c82bb36ac7f91f0f920fd..e45a1a11991a71ad37a8d5bb7c4ff81627671384 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -197,6 +197,7 @@ (define_c_enum "unspec" [ UNSPEC_ASHIFT_SIGNED ; Used in aarch64-simd.md. UNSPEC_ASHIFT_UNSIGNED ; Used in aarch64-simd.md. +UNSPEC_ABS ; Used in aarch64-simd.md. UNSPEC_FMAX ; Used in aarch64-simd.md. UNSPEC_FMAXNMV ; Used in aarch64-simd.md. UNSPEC_FMAXV ; Used in aarch64-simd.md.
Re: GCC's -fsplit-stack disturbing Mach's vm_allocate
Svante Signell, on Fri 02 May 2014 10:18:12 +0200, wrote: Thread 4 (Thread 1205.4): #0 0x019977b7 in _hurd_intr_rpc_msg_in_trap () at intr-msg.c:132 err = <optimized out> err = <optimized out> user_option = 3 user_timeout = 48 m = 0x532370 msgh_bits = 0 remote_port = 268509186 msgid = 21118 save_data = <optimized out> __PRETTY_FUNCTION__ = "_hurd_intr_rpc_mach_msg" Ok, reading this again, this looks like what Justus analyzed. Samuel
Re: [AArch64] Fix integer vabs intrinsics
On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh james.greenha...@arm.com wrote: Hi, Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as undefined/impossible, the Neon vabs intrinsics should behave as the hardware does. That is to say, the pseudo-code sequence: Only for signed integer types. You should be able to use an unsigned integer type here instead. a = vabs_s8 (vdup_n_s8 (-128)); assert (a >= 0); does not hold. As in hardware abs (-128) == -128 Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid it. In fact, we have to be even more careful than that, and keep the integer vabs intrinsics as an unspec in the back end. No it is not. The mistake is to use signed integer types here. Just add a conversion to an unsigned integer vector and it will work correctly. In fact the ABS rtl code is not undefined for the overflow. Thanks, Andrew Pinski We keep the standard pattern name around for the benefit of auto-vectorization. Tested on aarch64-none-elf with no issues. This will also be a bug on 4.9 (ugh), OK for trunk and gcc-4_9-branch? Thanks, James --- 2014-05-02 James Greenhalgh james.greenha...@arm.com * config/aarch64/aarch64-builtins.c (aarch64_fold_builtin): Don't fold integer abs builtins. * config/aarch64/aarch64-simd-builtins.def (abs): Split by integer and floating point variants. * config/aarch64/aarch64-simd.md (aarch64_abs<mode>): New. * config/aarch64/iterators.md (unspec): Add UNSPEC_ABS.
Re: [RFC][AARCH64] TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook
+2014-04-29 Kugan Vivekanandarajah kug...@linaro.org + + * config/aarch64/aarch64.c (TARGET_ATOMIC_ASSIGN_EXPAND_FENV): New + define. + * config/aarch64/aarch64-protos.h (aarch64_atomic_assign_expand_fenv): + New function declaration. + * config/aarch64/aarch64-builtins.c (aarch64_builtins) : Add + AARCH64_BUILTIN_GET_FPCR, AARCH64_BUILTIN_SET_FPCR. + AARCH64_BUILTIN_GET_FPSR and AARCH64_BUILTIN_SET_FPSR. + (aarch64_init_builtins) : Initialize builtins + __builtins_aarch64_set_fpcr, __builtins_aarch64_get_fpcr. + __builtins_aarch64_set_fpsr and __builtins_aarch64_get_fpsr. + (aarch64_expand_builtin) : Expand builtins __builtins_aarch64_set_fpcr + __builtins_aarch64_get_fpcr, __builtins_aarch64_get_fpsr, + and __builtins_aarch64_set_fpsr. + (aarch64_atomic_assign_expand_fenv): New function. + * config/aarch64/aarch64.md (set_fpcr): New pattern. + (get_fpcr) : Likewise. + (set_fpsr) : Likewise. + (get_fpsr) : Likewise. + (unspecv): Add UNSPECV_GET_FPCR and UNSPECV_SET_FPCR, UNSPECV_GET_FPSR + and UNSPECV_SET_FPSR. + * doc/extend.texi (AARCH64 Built-in Functions) : Document + __builtins_aarch64_set_fpcr, __builtins_aarch64_get_fpcr. + __builtins_aarch64_set_fpsr and __builtins_aarch64_get_fpsr. The updated patch is based on the review at http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00041.html. FE_* values are now changed to AARCH64_FE_*.
Thanks, Kugan diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 55cfe0a..40d53b1 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -371,6 +371,12 @@ static aarch64_simd_builtin_datum aarch64_simd_builtin_data[] = { enum aarch64_builtins { AARCH64_BUILTIN_MIN, + + AARCH64_BUILTIN_GET_FPCR, + AARCH64_BUILTIN_SET_FPCR, + AARCH64_BUILTIN_GET_FPSR, + AARCH64_BUILTIN_SET_FPSR, + AARCH64_SIMD_BUILTIN_BASE, #include aarch64-simd-builtins.def AARCH64_SIMD_BUILTIN_MAX = AARCH64_SIMD_BUILTIN_BASE @@ -752,6 +758,24 @@ aarch64_init_simd_builtins (void) void aarch64_init_builtins (void) { + tree ftype_set_fpr += build_function_type_list (void_type_node, unsigned_type_node, NULL); + tree ftype_get_fpr += build_function_type_list (unsigned_type_node, NULL); + + aarch64_builtin_decls[AARCH64_BUILTIN_GET_FPCR] += add_builtin_function (__builtin_aarch64_get_fpcr, ftype_get_fpr, + AARCH64_BUILTIN_GET_FPCR, BUILT_IN_MD, NULL, NULL_TREE); + aarch64_builtin_decls[AARCH64_BUILTIN_SET_FPCR] += add_builtin_function (__builtin_aarch64_set_fpcr, ftype_set_fpr, + AARCH64_BUILTIN_SET_FPCR, BUILT_IN_MD, NULL, NULL_TREE); + aarch64_builtin_decls[AARCH64_BUILTIN_GET_FPSR] += add_builtin_function (__builtin_aarch64_get_fpsr, ftype_get_fpr, + AARCH64_BUILTIN_GET_FPSR, BUILT_IN_MD, NULL, NULL_TREE); + aarch64_builtin_decls[AARCH64_BUILTIN_SET_FPSR] += add_builtin_function (__builtin_aarch64_set_fpsr, ftype_set_fpr, + AARCH64_BUILTIN_SET_FPSR, BUILT_IN_MD, NULL, NULL_TREE); + if (TARGET_SIMD) aarch64_init_simd_builtins (); } @@ -964,6 +988,36 @@ aarch64_expand_builtin (tree exp, { tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0); int fcode = DECL_FUNCTION_CODE (fndecl); + int icode; + rtx pat, op0; + tree arg0; + + switch (fcode) +{ +case AARCH64_BUILTIN_GET_FPCR: +case AARCH64_BUILTIN_SET_FPCR: +case AARCH64_BUILTIN_GET_FPSR: +case AARCH64_BUILTIN_SET_FPSR: + if ((fcode == AARCH64_BUILTIN_GET_FPCR) + || 
(fcode == AARCH64_BUILTIN_GET_FPSR)) + { + icode = (fcode == AARCH64_BUILTIN_GET_FPSR) ? + CODE_FOR_get_fpsr : CODE_FOR_get_fpcr; + target = gen_reg_rtx (SImode); + pat = GEN_FCN (icode) (target); + } + else + { + target = NULL_RTX; + icode = (fcode == AARCH64_BUILTIN_SET_FPSR) ? + CODE_FOR_set_fpsr : CODE_FOR_set_fpcr; + arg0 = CALL_EXPR_ARG (exp, 0); + op0 = expand_normal (arg0); + pat = GEN_FCN (icode) (op0); + } + emit_insn (pat); + return target; +} if (fcode >= AARCH64_SIMD_BUILTIN_BASE) return aarch64_simd_expand_builtin (fcode, exp, target); @@ -1196,6 +1250,106 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi) return changed; } +void +aarch64_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update) +{ + const unsigned AARCH64_FE_INVALID = 1; + const unsigned AARCH64_FE_DIVBYZERO = 2; + const unsigned AARCH64_FE_OVERFLOW = 4; + const unsigned AARCH64_FE_UNDERFLOW = 8; + const unsigned AARCH64_FE_INEXACT = 16; + const unsigned HOST_WIDE_INT AARCH64_FE_ALL_EXCEPT = (AARCH64_FE_INVALID + | AARCH64_FE_DIVBYZERO + |
Re: [PATCH 00/89] Compile-time gimple-checking
On April 30, 2014 11:26:35 PM CEST, Jeff Law l...@redhat.com wrote: On 04/23/14 08:32, Richard Biener wrote: Also I see you introduce a const_FOO class with every FOO one. I wonder whether, now that we have C++, can address const-correctness in a less awkward way than with a typedef. Can you try to go back in time and see why we did with that in the first place? ISTR that it was oh, if we were only using C++ we wouldn't need to jump through that hoop. To followup myself here, it's because 'tree' is a typedef to a pointer and thus 'const tree' is different from 'const tree_node *'. Right. Not sure why we re-introduced the 'mistake' of making 'tree' a pointer when we introduced 'gimple'. If we were to make 'gimple' the class type itself we can use gimple *, const gimple * and also const gimple (when a NULL pointer is not expected). It wasn't ever really discussed to the best of my recollection. Anyway, gazillion new typedefs are ugly :/ (typedefs are ugly) Yea, can't argue with that. However, do we want to ask David to fix up the gimple vs gimple * vs const gimple * vs const gimple as a prerequisite for this patchset or are you comfortable going forward with the patchset, then researching if there's a cleaner way to handle the const/typedef issues? Well, I'd like to see both and one affects the other. Doing the const correctness thing first seems more natural to me. Of course both need to wait for 4.9.1. Thanks, Richard. jeff
Re: [AArch64] Fix integer vabs intrinsics
On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote: On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh james.greenha...@arm.com wrote: Hi, Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as undefined/impossible, the Neon vabs intrinsics should behave as the hardware does. That is to say, the pseudo-code sequence: Only for signed integer types. You should be able to use an unsigned integer type here instead. If anything, I think that puts us in a worse position. The issue that inspires this patch is that GCC will happily fold: t1 = ABS_EXPR (x) t2 = GE_EXPR (t1, 0) to t2 = TRUE Surely an unsigned integer type is going to suffer the same fate? Certainly I can imagine somewhere in the compiler there being a fold path for: (unsigned >= 0) == TRUE a = vabs_s8 (vdup_n_s8 (-128)); assert (a >= 0); does not hold. As in hardware abs (-128) == -128 Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid it. In fact, we have to be even more careful than that, and keep the integer vabs intrinsics as an unspec in the back end. No it is not. The mistake is to use signed integer types here. Just add a conversion to an unsigned integer vector and it will work correctly. In fact the ABS rtl code is not undefined for the overflow. Here we are covering ourselves against a separate issue. For auto-vectorized code we want the SABD combine patterns to kick in whenever sensible. For intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow: vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y) So in this case, the combine would be erroneous. Likewise SABA. Thanks, James
Re: [AArch64] Fix integer vabs intrinsics
On May 2, 2014, at 2:21 AM, James Greenhalgh james.greenha...@arm.com wrote: On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote: On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh james.greenha...@arm.com wrote: Hi, Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as undefined/impossible, the Neon vabs intrinsics should behave as the hardware does. That is to say, the pseudo-code sequence: Only for signed integer types. You should be able to use an unsigned integer type here instead. If anything, I think that puts us in a worse position. Not if you cast it back. The issue that inspires this patch is that GCC will happily fold: t1 = ABS_EXPR (x) t2 = GE_EXPR (t1, 0) to t2 = TRUE Surely an unsigned integer type is going to suffer the same fate? Certainly I can imagine somewhere in the compiler there being a fold path for: Yes, but if we add a cast from the unsigned type to the signed type, GCC does not optimize that. If it does, it is a bug, since the overflow is defined there. (unsigned >= 0) == TRUE a = vabs_s8 (vdup_n_s8 (-128)); assert (a >= 0); does not hold. As in hardware abs (-128) == -128 Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid it. In fact, we have to be even more careful than that, and keep the integer vabs intrinsics as an unspec in the back end. No it is not. The mistake is to use signed integer types here. Just add a conversion to an unsigned integer vector and it will work correctly. In fact the ABS rtl code is not undefined for the overflow. Here we are covering ourselves against a separate issue. For auto-vectorized code we want the SABD combine patterns to kick in whenever sensible. For intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow: vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y) So in this case, the combine would be erroneous. Likewise SABA. This sounds like it would be problematic for unsigned types and not just for vabs_s8 with vsub_s8.
So I think you should be using unspec for vabd_s8 instead, since in RTL overflow and underflow are defined to be wrapping. Thanks, Andrew Pinski Thanks, James
Re: [C PATCH] Make attributes accept enum values (PR c/50459)
On Thu, May 01, 2014 at 11:20:25PM +, Joseph S. Myers wrote: On Wed, 23 Apr 2014, Marek Polacek wrote: diff --git gcc/testsuite/c-c++-common/attributes-1.c gcc/testsuite/c-c++-common/attributes-1.c index af4dd12..8458e47 100644 --- gcc/testsuite/c-c++-common/attributes-1.c +++ gcc/testsuite/c-c++-common/attributes-1.c @@ -9,7 +9,7 @@ typedef char vec __attribute__((vector_size(bar))); /* { dg-warning ignored } void f1(char*) __attribute__((nonnull(bar))); /* { dg-error invalid operand } */ void f2(char*) __attribute__((nonnull(1,bar))); /* { dg-error invalid operand } */ -void g() __attribute__((aligned(bar))); /* { dg-error invalid value|not an integer } */ +void g() __attribute__((aligned(bar))); I don't think it's appropriate to remove any test assertion that this invalid code gets diagnosed. If the only diagnostic is now one swallowed by the dg-prune-output in this test, either that dg-prune-output needs to be removed (and corresponding more detailed error expectations added), or a separate test needs adding for this erroneous use of this attribute (that separate test not using dg-prune-output). Yeah, that was a weird thing to do. I yanked that particular test to a new testcase. Otherwise no changes. Tested again x86_64-linux, ok now? 2014-05-02 Marek Polacek pola...@redhat.com PR c/50459 c-family/ * c-common.c (check_user_alignment): Return -1 if alignment is error node. (handle_aligned_attribute): Don't call default_conversion on FUNCTION_DECLs. (handle_vector_size_attribute): Likewise. (handle_tm_wrap_attribute): Handle case when wrap_decl is error node. (handle_sentinel_attribute): Call default_conversion and allow even integral types as an argument. c/ * c-parser.c (c_parser_attributes): Parse the arguments as an expression-list if the attribute takes identifier. testsuite/ * c-c++-common/attributes-1.c: Move test line to a new test. * c-c++-common/attributes-2.c: New test. * c-c++-common/pr50459.c: New test. 
* c-c++-common/pr59280.c: Add undeclared to dg-error. * gcc.dg/nonnull-2.c: Likewise. * gcc.dg/pr55570.c: Modify dg-error. * gcc.dg/tm/wrap-2.c: Likewise. diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c index 0ad955d..3ebd960 100644 --- gcc/c-family/c-common.c +++ gcc/c-family/c-common.c @@ -7438,6 +7438,8 @@ check_user_alignment (const_tree align, bool allow_zero) { int i; + if (error_operand_p (align)) +return -1; if (TREE_CODE (align) != INTEGER_CST || !INTEGRAL_TYPE_P (TREE_TYPE (align))) { @@ -7559,7 +7561,8 @@ handle_aligned_attribute (tree *node, tree ARG_UNUSED (name), tree args, if (args) { align_expr = TREE_VALUE (args); - if (align_expr TREE_CODE (align_expr) != IDENTIFIER_NODE) + if (align_expr TREE_CODE (align_expr) != IDENTIFIER_NODE + TREE_CODE (align_expr) != FUNCTION_DECL) align_expr = default_conversion (align_expr); } else @@ -8424,9 +8427,11 @@ handle_tm_wrap_attribute (tree *node, tree name, tree args, else { tree wrap_decl = TREE_VALUE (args); - if (TREE_CODE (wrap_decl) != IDENTIFIER_NODE - TREE_CODE (wrap_decl) != VAR_DECL - TREE_CODE (wrap_decl) != FUNCTION_DECL) + if (error_operand_p (wrap_decl)) +; + else if (TREE_CODE (wrap_decl) != IDENTIFIER_NODE + TREE_CODE (wrap_decl) != VAR_DECL + TREE_CODE (wrap_decl) != FUNCTION_DECL) error (%qE argument not an identifier, name); else { @@ -8553,7 +8558,8 @@ handle_vector_size_attribute (tree *node, tree name, tree args, *no_add_attrs = true; size = TREE_VALUE (args); - if (size TREE_CODE (size) != IDENTIFIER_NODE) + if (size TREE_CODE (size) != IDENTIFIER_NODE + TREE_CODE (size) != FUNCTION_DECL) size = default_conversion (size); if (!tree_fits_uhwi_p (size)) @@ -8964,8 +8970,12 @@ handle_sentinel_attribute (tree *node, tree name, tree args, if (args) { tree position = TREE_VALUE (args); + if (position TREE_CODE (position) != IDENTIFIER_NODE + TREE_CODE (position) != FUNCTION_DECL) + position = default_conversion (position); - if (TREE_CODE (position) != INTEGER_CST) + if (TREE_CODE 
(position) != INTEGER_CST + || !INTEGRAL_TYPE_P (TREE_TYPE (position))) { warning (OPT_Wattributes, "requested position is not an integer constant"); diff --git gcc/c/c-parser.c gcc/c/c-parser.c index 7947355..48f8d2f 100644 --- gcc/c/c-parser.c +++ gcc/c/c-parser.c @@ -3955,11 +3955,16 @@ c_parser_attributes (c_parser *parser) In objective-c the identifier may be a classname. */ if (c_parser_next_token_is (parser, CPP_NAME) && (c_parser_peek_token (parser)->id_kind == C_ID_ID -
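For contrast with the erroneous-argument cases the patch above hardens, here is a minimal sketch of well-formed uses of the same attributes (GCC's `__attribute__` extension is assumed; the helper name is illustrative, not from the patch):

```c
#include <stddef.h>

/* A well-formed use of the attributes whose error handling the patch
   hardens; the parser change only affects how bad arguments are diagnosed. */
struct aligned16 { char c; } __attribute__((aligned(16)));

extern void f1(char *p) __attribute__((nonnull(1)));

size_t aligned16_alignment(void)
{
  return _Alignof(struct aligned16);
}
```

With a valid constant argument, `aligned(16)` both aligns and pads the type to 16 bytes.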
Re: [PATCH][ARM] Handle simple SImode PLUS and MINUS operations in rtx costs
On 24/03/14 17:21, Kyrill Tkachov wrote: Hi all, I noticed that we don't handle simple reg-to-reg arithmetic operations in the arm rtx cost functions. We should be adding the cost of alu.arith to the costs of the operands. This patch does that. Since we don't have any cost tables yet that have a non-zero value for that field it shouldn't affect code-gen for any current cores. Bootstrapped and tested on arm-none-linux-gnueabihf. Ok for next stage1? Thanks, Kyrill 2014-03-24 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm.c (arm_new_rtx_costs): Handle reg-to-reg PLUS and MINUS RTXs. I think the convention for big switch statements like this is to write * config/arm/arm.c (arm_new_rtx_costs, case PLUS, MINUS): However... rtx-costs-plus-minus.patch diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index bf5d1b2..3102b67 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -9428,6 +9428,14 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code, *cost += rtx_cost (XEXP (x, 1), code, 1, speed_p); return true; } + else if (REG_P (XEXP (x, 0))) + { + *cost += rtx_cost (XEXP (x, 0), code, 0, speed_p) ++ rtx_cost (XEXP (x, 1), code, 1, speed_p); + if (speed_p) + *cost += extra_cost-alu.arith; + return true; + } This still doesn't handle the case where the first operand is not a register. Furthermore, the only difference between this and what we had before is that we now add alu.arith to the overall cost. But we want to do that anyway. I think you really just need else if (speed_p) *cost += extra_cost.alu_arith; and then let recursion (return false) handle the cost of non-register operands (register will always cost as zero). 
return false; } @@ -9663,6 +9671,14 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code, *cost += rtx_cost (XEXP (x, 0), PLUS, 0, speed_p); return true; } + else if (REG_P (XEXP (x, 1))) + { + *cost += rtx_cost (XEXP (x, 0), PLUS, 0, speed_p) ++ rtx_cost (XEXP (x, 1), PLUS, 1, speed_p); + if (speed_p) + *cost += extra_cost->alu.arith; + return true; + } The same here. return false; } R.
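The reviewer's point — add only the operation's own `alu.arith` cost and let the generic recursion price the operands — can be modeled in miniature. The names below are illustrative, not GCC's rtx-cost API:

```c
#include <stdbool.h>

/* Hypothetical miniature of the review comment: charge the operation's own
   ALU cost once, and let recursion price each operand. A register operand
   costs zero, so the reg-to-reg case needs no special branch at all. */
enum operand_kind { OP_REG, OP_EXPR };

struct operand { enum operand_kind kind; int expr_cost; };

static const int alu_arith_cost = 2;   /* assumed per-core cost-table entry */

static int operand_cost(struct operand op)
{
  return op.kind == OP_REG ? 0 : op.expr_cost;
}

/* cost of PLUS (a, b): the equivalent of
   "*cost += extra_cost->alu.arith; return false;" plus operand recursion */
int plus_rtx_cost(struct operand a, struct operand b)
{
  return alu_arith_cost + operand_cost(a) + operand_cost(b);
}
```

This is why `return false` suffices: non-register operands get priced by the caller's recursion, and registers contribute nothing.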
Re: GCC's -fsplit-stack disturbing Mach's vm_allocate
Svante Signell, on Fri 02 May 2014 10:03:23 +0200, wrote: On Fri, 2014-05-02 at 00:45 +0200, Samuel Thibault wrote: Hello, Svante Signell, on Thu 24 Apr 2014 10:39:10 +0200, wrote: - Without split stack enabled, around 70 libgo tests pass and 50 fail, most of them with a segfault. - Enabling split stack and using the libc Samuel built, all 122 libgo tests fail with a segfault. Please provide a segfault backtrace, rpctrace, etc. Otherwise we won't be able to help you. You already have one in this thread, analysed by Justus: http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01736.html Yes, but that backtrace was taken without the split-stack fix. The backtrace you have just attached seems different. Samuel
Re: GCC's -fsplit-stack disturbing Mach's vm_allocate
Justus Winter, on Sat 26 Apr 2014 08:53:08 +0200, wrote: task130(pid1182)->vm_map (0 49880 0 1133--160(pid1182) 0 1 5 7 1) = 0 2453504 We map that somewhere. task130(pid1182)->mach_port_deallocate (pn{ 25}) = 0 Deallocate the port. Again, for some strange reason 133 == pn{ 25}. 158--157(pid1182)->io_map_request () = 0133--162(pid1182) (null) Some more io_map. task130(pid1182)->vm_map (2498560 8192 0 0133--162(pid1182) 40960 1 3 7 1) = 0x3 ((os/kern) no space available) task130(pid1182)->vm_deallocate (2498560 8192) = 0 Hum? task130(pid1182)->vm_map (2498560 8192 0 0133--162(pid1182) 40960 1 3 7 1) = 0 2498560 task130(pid1182)->mach_port_deallocate (pn{ 25}) = 0 Success! See the logic in elf/dl-load.c's _dl_map_object_from_fd and sysdeps/mach/hurd/mmap.c. _dl_map_object_from_fd first gets somewhere to map the SO (first __mmap call), and then, if bss is big, it maps anonymous pages for it with MAP_FIXED, replacing the bss part of the SO: 2453504 + 49880 = 0x2632d8 2498560 + 8192 = 0x264000 So this seems like just normal behavior. Samuel
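The address arithmetic at the end of the analysis checks out mechanically — the retried `vm_map` is indeed a page-aligned anonymous mapping covering the tail of the shared object's file mapping, consistent with the MAP_FIXED bss replacement. A quick check (constants taken from the trace):

```c
/* Checking the addresses from the rpctrace above: the failing-then-retried
   vm_map is the anonymous MAP_FIXED mapping for bss, and it overlaps the
   final page of the shared object's file mapping. */
static const unsigned long so_base  = 2453504;  /* first vm_map result   */
static const unsigned long so_len   = 49880;
static const unsigned long bss_base = 2498560;  /* MAP_FIXED target      */
static const unsigned long bss_len  = 8192;

unsigned long so_end(void)  { return so_base + so_len; }
unsigned long bss_end(void) { return bss_base + bss_len; }

int bss_overlaps_so_tail(void)
{
  return (bss_base & 0xfff) == 0      /* page aligned                    */
         && bss_base < so_end()       /* starts inside the SO mapping    */
         && so_end() < bss_end();     /* and covers its tail             */
}
```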
Re: GCC's -fsplit-stack disturbing Mach's vm_allocate
Svante Signell, on Fri 02 May 2014 10:18:12 +0200, wrote: task130(pid1182)->vm_allocate (33562796 8364 0) = 0x3 ((os/kern) no space available) task130(pid1182)->vm_allocate (33571160 8364 0) = 0 33570816 While inspecting this, I realized this is from __pthread_stack_alloc, the only caller of vm_allocate with anywhere set to 0 which would have such behavior. 8364 is really small for a stack (but that's expected from -fsplit-stack), and the thing is: we have a bogus libpthread which includes the guard size in the stack size. I guess this is what happens: gcc believes there is 8K of stack, but our libpthread actually removes 4K from it for the guard, so the process will crash as soon as 4K are used on the stack. So we just need to fix guardsize in our libpthread. Samuel
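The accounting bug described here can be sketched in a couple of lines. POSIX specifies that the guard area is allocated *in addition to* the requested stack size; a libpthread that carves the guard out of the allocation leaves less usable stack than `-fsplit-stack` asked for (the helper name is illustrative):

```c
#include <stddef.h>

/* Sketch of the guard-size accounting bug: with guard_is_extra == 0 the
   guard page is subtracted from the caller's request, so a thread asking
   for ~8K crashes after using ~4K of stack. */
size_t usable_stack(size_t requested, size_t guard, int guard_is_extra)
{
  return guard_is_extra ? requested : requested - guard;
}
```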
Re: GCC's -fsplit-stack disturbing Mach's vm_allocate
Samuel Thibault, on Fri 02 May 2014 11:57:53 +0200, wrote: So we just need to fix guardsize in our libpthread. (And I'll have a look at it). Samuel
Re: [RFC][AARCH64] TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook
On 29 April 2014 03:37, Kugan kugan.vivekanandara...@linaro.org wrote: On 28/04/14 21:01, Ramana Radhakrishnan wrote: On 04/26/14 11:57, Kugan wrote: Attached patch implements TARGET_ATOMIC_ASSIGN_EXPAND_FENV for AARCH64. With this, atomic test-case gcc.dg/atomic/c11-atomic-exec-5.c now PASS. This implementation is based on SPARC and i386 implementations. Regression tested on qemu-aarch64 for aarch64-none-linux-gnu with no new regression. Is this OK for trunk? Again like A32 please test on hardware to make sure this behaves correctly with c11-atomic-exec-5.c . If you don't have access to hardware, let us know : we'll take it for a spin once you update the patch according to Marcus's comments. Thanks for the review. I have updated the patch. I also have updated hold, clear and update to be exactly as in feholdexcpt.c, fclrexcpt.c and feupdateenv.c of glibc/ports/sysdeps/aarch64/fpu. Kugan, I've not looked at the respin in detail yet, but it has just occurred to me that the sequence used here to set FPCR is insufficient. The architecture reference manual requires that any write to FPCR must be syncrhronized by a context synchronization operation so we need to plant an ISB after the write. Both the write and ISB are likely to be expensive on some implementations so it would be good to ensure that both the write and the isb are scheduled independently. IIRC there si I have limited real hardware access and just did a bootstrap and tested c11-atomic-exec-5.c alone to make sure that it PASS. I have also regression tested again on qemu-aarch64 for aarch64-none-linux-gnu with no new regressions. I will appreciate if you could do the regression testing on real hw. Once the ISB issue is resolved I'll give the patch a spin on HW here. /Marcus
Re: [PATCH], RFC, add support for __float128/__ibm128 types on PowerPC
Hi! On Tue, Apr 29, 2014 at 06:30:32PM -0400, Michael Meissner wrote: This patch adds support for a new type (__float128) on the PowerPC to allow people to use the 128-bit IEEE floating point format instead of the traditional IBM double-double that has been used in the Linux compilers. At this time, long double still will remain using the IBM double-double format. There has been an undocumented option to switch long double to to IEEE 128-bit, but right now, there are bugs I haven't ironed out on VSX systems. In addition, I added another type (__ibm128) so that when the transition is eventually made, people can use this type to get the old long double type. I was wondering if people had any comments on the code so far, and things I should different. Note, I will be out on vacation May 6th - 14th, so I don't expect to submit the patches until I get back. For mangling, if you are going to mangle it the same as the -mlong-double-64 long double, is __float128 going to be supported solely for ELFv2 ABI and are you sure nobody has ever used -mlong-double-64 or --without-long-double-128 configured compiler for it? What is the plan for glibc (and for libstdc++)? Looking at current ppc64le glibc, it seems it mistakenly still supports the -mlong-double-64 stuff (e.g. printf calls are usually redirected to __nldbl_printf (and tons of other calls). So, is the plan to use yet another set of symbols? For __nldbl_* it is about 113 entry points in libc.so and 1 in libm.so, but if you are going to support all of -mlong-double-64, -mlong-double-128 as well as __float128, that would be far more, because the compat -mlong-double-64 support mostly works by redirecting, either in headers or through a special *.a library, to corresponding double entry points whenever possible. So, if you call logl in -mlong-double-64 code, it will be redirected to log, because it has the same ABI. But if you call *printf or nexttowardf etc. 
where there is no ABI-compatible double entry point, it needs to be a new symbol. But with __float128 vs. __ibm128, and long double being either of those, you need a different logl. Which is why it is such a huge problem that this wasn't resolved initially as part of the ELFv2 changes. Jakub
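The core of the ABI problem is that two distinct 128-bit storage formats cannot share one `logl` entry point. The types can be illustrated with x86-64 GCC, which already exposes `__float128` (assumption: a GCC/glibc x86-64 host; on that target `long double` is 80-bit extended padded to 16 bytes, not binary128):

```c
/* Three distinct floating formats in play on a typical x86-64 GCC host:
   double (binary64), long double (x87 extended, padded), and __float128
   (IEEE binary128). Same storage size does not mean same format or ABI. */
unsigned float128_bytes(void)   { return sizeof(__float128);  }
unsigned double_bytes(void)     { return sizeof(double);      }
unsigned longdouble_bytes(void) { return sizeof(long double); }
```

The PowerPC situation is the same shape: `__float128` and `__ibm128` are both 16 bytes but encode values differently, so every `long double` libm/libc entry point must exist per-format.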
Re: [RFC][AARCH64] TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook
On 2 May 2014 11:06, Marcus Shawcroft marcus.shawcr...@gmail.com wrote: Kugan, I've not looked at the respin in detail yet, but it has just occurred to me that the sequence used here to set FPCR is insufficient. The architecture reference manual requires that any write to FPCR must be synchronized by a context synchronization operation, so we need to plant an ISB after the write. Both the write and the ISB are likely to be expensive on some implementations, so it would be good to ensure that the write and the ISB are scheduled independently. IIRC there si Sorry, incomplete sentence. I had started to write that IIRC the same issue did not apply to FPSCR in the ARM patch. I have double-checked, and the FPSCR does not have the issue, so the ARM patch is fine in this respect. /Marcus
[RS6000] PR60737, expand_block_clear uses word stores
In cases where the compiler has no alignment info, powerpc64le-linux gcc generates byte-at-a-time copies for -mstrict-align (which is on for little-endian power7). That's awful code, a problem shared by other strict-align targets, see pr50417. However, we also have a case when -mno-strict-align generates less than ideal code, which I believe stems from using alignment as a proxy for testing an address offset. See http://gcc.gnu.org/ml/gcc-patches/1999-09n/msg01072.html. So my first attempt at fixing this problem looked at address offsets directly. That worked fine too, but on thinking some more, I believe we no longer have the movdi restriction. Nowadays we'll reload the address if we have an offset that doesn't satisfy the Y constraint (ie. a multiple of 4 offset). Which led to this simpler patch. Bootstrapped and regression tested powerpc64le-linux, powerpc64-linux and powerpc-linux. OK to apply? PR target/60737 * config/rs6000/rs6000.c (expand_block_move): Allow 64-bit loads and stores when -mno-strict-align at any alignment. (expand_block_clear): Similarly. Also correct calculation of instruction count. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 209926) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -15438,7 +15422,7 @@ expand_block_clear (rtx operands[]) load zero and three to do clearing. */ if (TARGET_ALTIVEC && align >= 128) clear_step = 16; - else if (TARGET_POWERPC64 && align >= 32) + else if (TARGET_POWERPC64 && (align >= 64 || !STRICT_ALIGNMENT)) clear_step = 8; else if (TARGET_SPE && align >= 64) clear_step = 8; @@ -15466,9 +15450,7 @@ expand_block_clear (rtx operands[]) mode = V2SImode; } else if (bytes >= 8 && TARGET_POWERPC64 - /* 64-bit loads and stores require word-aligned - displacements.
*/ - && (align >= 64 || (!STRICT_ALIGNMENT && align >= 32))) + && (align >= 64 || !STRICT_ALIGNMENT)) { clear_bytes = 8; mode = DImode; } @@ -15599,9 +15581,7 @@ expand_block_move (rtx operands[]) gen_func.movmemsi = gen_movmemsi_4reg; } else if (bytes >= 8 && TARGET_POWERPC64 - /* 64-bit loads and stores require word-aligned - displacements. */ - && (align >= 64 || (!STRICT_ALIGNMENT && align >= 32))) + && (align >= 64 || !STRICT_ALIGNMENT)) { move_bytes = 8; mode = DImode; -- Alan Modra Australia Development Lab, IBM
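The store-width selection the patch changes can be sketched in plain C (this is a simplification, not GCC's actual code — the real function also has AltiVec and SPE cases):

```c
#include <stdbool.h>

/* Sketch of how the patch changes the store width expand_block_clear picks:
   with -mno-strict-align, 64-bit stores are now allowed at any alignment,
   since the address gets reloaded when its offset fails the 'Y' constraint
   (a multiple-of-4 displacement). */
int clear_step_bytes(int align_bits, bool powerpc64, bool strict_align)
{
  if (powerpc64 && (align_bits >= 64 || !strict_align))
    return 8;      /* DImode stores (previously needed align >= 32) */
  if (align_bits >= 32)
    return 4;      /* SImode stores */
  return 1;        /* strict-align fallback: byte at a time */
}
```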
Re: [AArch64] Fix integer vabs intrinsics
On Fri, May 02, 2014 at 10:29:06AM +0100, pins...@gmail.com wrote: On May 2, 2014, at 2:21 AM, James Greenhalgh james.greenha...@arm.com wrote: On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote: On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh james.greenha...@arm.com wrote: Hi, Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as undefined/impossible, the neon intrinsics vabs intrinsics should behave as the hardware. That is to say, the pseudo-code sequence: Only for signed integer types. You should be able to use an unsigned integer type here instead. If anything, I think that puts us in a worse position. Not if you cast it back. The issue that inspires this patch is that GCC will happily fold: t1 = ABS_EXPR (x) t2 = GE_EXPR (t1, 0) to t2 = TRUE Surely an unsigned integer type is going to suffer the same fate? Certainly I can imagine somewhere in the compiler there being a fold path for: Yes but if add a cast from the unsigned type to the signed type gcc does not optimize that. If it does it is a bug since the overflow is defined there. I'm not sure I understand, are you saying I want to fold to: t1 = VIEW_CONVERT_EXPR (x, unsigned) t2 = ABS_EXPR (t1) t3 = VIEW_CONVERT_EXPR (t2, signed) Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each other out leading to an overall NOP? It might just be Friday morning and a lack of coffee talking, but I think I need you to spell this one out to me in big letters! (unsigned = 0) == TRUE a = vabs_s8 (vdup_n_s8 (-128)); assert (a = 0); does not hold. As in hardware abs (-128) == -128 Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid it. In fact, we have to be even more careful than that, and keep the integer vabs intrinsics as an unspec in the back end. No it is not. The mistake is to use signed integer types here. Just add a conversion to an unsigned integer vector and it will work correctly. 
In fact the ABS rtl code is not undefined for the overflow. Here we are covering ourselves against a seperate issue. For auto-vectorized code we want the SABD combine patterns to kick in whenever sensible. For intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow: vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y) So in this case, the combine would be erroneous. Likewise SABA. This sounds like it would problematic for unsigned types and not just for vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8 instead. Since in rtl overflow and underflow is defined to be wrapping. There are no vabs_u8/vabd_u8 so I don't see how we can reach this point with unsigned types. Further, I have never thought of RTL having signed and unsigned types, just a bag of bits. We'll want to use unspec for the intrinsic version of vabd_s8 - but we'll want to specify the (abs (minus (reg) (reg))) behaviour so that auto-vectorized code can pick it up. So in the end we'll have these patterns: (abs (abs (reg))) (intrinsic_abs (unspec [(reg)] UNSPEC_ABS)) (abd (abs (minus (reg) (reg (intrinsic_abd (unspec [(reg) (reg)] UNSPEC_ABD)) (aba (plus (abs (minus (reg) (reg))) (reg))) (intrinsic_aba (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg))) which should give us reasonable auto-vectorized code without triggering any of the issues mapping the semantics of the instructions to intrinsics. Thanks, James Thanks, Andrew Pinski Thanks, James
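The hardware semantics under discussion — `vabs_s8` of -128 yielding -128 — can be modeled in scalar form via Andrew's unsigned-arithmetic suggestion. Note the final cast back to `int8_t` is implementation-defined in ISO C, though GCC defines it to wrap:

```c
#include <stdint.h>

/* Scalar model of wrapping abs, the behaviour the vabs instruction has:
   abs(INT8_MIN) == INT8_MIN. Computing through unsigned arithmetic keeps
   the operation defined; the narrowing cast wraps on GCC. */
int8_t wrap_abs_s8(int8_t x)
{
  uint8_t u = (uint8_t) x;
  uint8_t a = (x < 0) ? (uint8_t)(0u - u) : u;
  return (int8_t) a;
}
```

This is precisely the value an `ABS_EXPR` fold would get wrong: `abs(x) >= 0` does not hold for this function.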
Re: GCC's -fsplit-stack disturbing Mach's vm_allocate
Samuel Thibault, on Fri 02 May 2014 11:57:53 +0200, wrote: So we just need to fix guardsize in our libpthread. It was actually not so difficult. Svante, could you try this libpthread: http://people.debian.org/~sthibault/tmp/libpthread.so.0.3 Thanks, Samuel
[C++ Patch/RFC] PR 58582
Hi, this relatively old bug report is about an ICE during error recovery, and some time ago I figured out that the simple tweak to grokfndecl below worked for it (we notice that duplicate_decls returned error_mark_node). Today, however, while formatting the original testcases, I noticed something slightly more interesting: at variance with both current clang and icc, we reject the below deleted6.C; thus we reject explicit instantiations of deleted template functions (without explaining that the function is deleted in the error message, and still accepting the code with -fpermissive (?)). Thus the below tweak to instantiate_decl. Without it, 58582 is still fixed of course, only the testcases need a little adjusting. Tested on x86_64-linux. Thanks, Paolo. // Index: cp/decl.c === --- cp/decl.c (revision 210004) +++ cp/decl.c (working copy) @@ -7822,6 +7822,8 @@ grokfndecl (tree ctype, decl, ctype); return NULL_TREE; } + if (ok == error_mark_node) + return NULL_TREE; return old_decl; } } Index: cp/pt.c === --- cp/pt.c (revision 210004) +++ cp/pt.c (working copy) @@ -19698,7 +19699,8 @@ instantiate_decl (tree d, int defer_ok, if (at_eof && !pattern_defined && DECL_EXPLICIT_INSTANTIATION (d) - && DECL_NOT_REALLY_EXTERN (d)) + && DECL_NOT_REALLY_EXTERN (d) + && (TREE_CODE (d) != FUNCTION_DECL || !DECL_DELETED_FN (d))) /* [temp.explicit] The definition of a non-exported function template, a Index: testsuite/g++.dg/cpp0x/deleted4.C === --- testsuite/g++.dg/cpp0x/deleted4.C (revision 0) +++ testsuite/g++.dg/cpp0x/deleted4.C (working copy) @@ -0,0 +1,11 @@ +// PR c++/58582 +// { dg-do compile { target c++11 } } + +struct A +{ + template<int> void foo() = delete; +}; + +template<int> void A::foo() { int i; } // { dg-error "redefinition" } + +template void A::foo<0>(); Index: testsuite/g++.dg/cpp0x/deleted5.C === --- testsuite/g++.dg/cpp0x/deleted5.C (revision 0) +++ testsuite/g++.dg/cpp0x/deleted5.C (working copy) @@ -0,0 +1,11 @@ +// PR c++/58582 +// { dg-do compile { target c++11 } } + +struct A +{ + 
template<int> void foo() = delete; +}; + +template<int> void A::foo() {} // { dg-error "redefinition" } + +template void A::foo<0>(); Index: testsuite/g++.dg/cpp0x/deleted6.C === --- testsuite/g++.dg/cpp0x/deleted6.C (revision 0) +++ testsuite/g++.dg/cpp0x/deleted6.C (working copy) @@ -0,0 +1,9 @@ +// PR c++/58582 +// { dg-do compile { target c++11 } } + +struct A +{ + template<int> void foo() = delete; +}; + +template void A::foo<0>();
Re: [AArch64] Fix integer vabs intrinsics
On 02/05/14 11:28, James Greenhalgh wrote: On Fri, May 02, 2014 at 10:29:06AM +0100, pins...@gmail.com wrote: On May 2, 2014, at 2:21 AM, James Greenhalgh james.greenha...@arm.com wrote: On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote: On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh james.greenha...@arm.com wrote: Hi, Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as undefined/impossible, the neon intrinsics vabs intrinsics should behave as the hardware. That is to say, the pseudo-code sequence: Only for signed integer types. You should be able to use an unsigned integer type here instead. If anything, I think that puts us in a worse position. Not if you cast it back. The issue that inspires this patch is that GCC will happily fold: t1 = ABS_EXPR (x) t2 = GE_EXPR (t1, 0) to t2 = TRUE Surely an unsigned integer type is going to suffer the same fate? Certainly I can imagine somewhere in the compiler there being a fold path for: Yes but if add a cast from the unsigned type to the signed type gcc does not optimize that. If it does it is a bug since the overflow is defined there. I'm not sure I understand, are you saying I want to fold to: t1 = VIEW_CONVERT_EXPR (x, unsigned) t2 = ABS_EXPR (t1) t3 = VIEW_CONVERT_EXPR (t2, signed) Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each other out leading to an overall NOP? It might just be Friday morning and a lack of coffee talking, but I think I need you to spell this one out to me in big letters! I agree. I think what you need is a type widening so that you get t1 = VEC_WIDEN (x) t2 = ABS_EXPR (t1) t3 = VEC_NARROW (t2) This then guarantees that the ABS expression cannot be undefined. I'm less sure, however about the narrow causing a change in 'sign'. Has it just punted the problem? Maybe you need t1 = VEC_WIDEN (x) t2 = ABS_EXPR (t1) t3 = VIEW_CONVERT_EXPR (x, unsigned) t4 = VEC_NARROW (t3) t5 = VIEW_CONVERT_EXPR (t4, signed) !!! 
How you capture this into RTL during expand, though, is another thing. R. (unsigned = 0) == TRUE a = vabs_s8 (vdup_n_s8 (-128)); assert (a = 0); does not hold. As in hardware abs (-128) == -128 Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid it. In fact, we have to be even more careful than that, and keep the integer vabs intrinsics as an unspec in the back end. No it is not. The mistake is to use signed integer types here. Just add a conversion to an unsigned integer vector and it will work correctly. In fact the ABS rtl code is not undefined for the overflow. Here we are covering ourselves against a seperate issue. For auto-vectorized code we want the SABD combine patterns to kick in whenever sensible. For intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow: vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y) So in this case, the combine would be erroneous. Likewise SABA. This sounds like it would problematic for unsigned types and not just for vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8 instead. Since in rtl overflow and underflow is defined to be wrapping. There are no vabs_u8/vabd_u8 so I don't see how we can reach this point with unsigned types. Further, I have never thought of RTL having signed and unsigned types, just a bag of bits. We'll want to use unspec for the intrinsic version of vabd_s8 - but we'll want to specify the (abs (minus (reg) (reg))) behaviour so that auto-vectorized code can pick it up. So in the end we'll have these patterns: (abs (abs (reg))) (intrinsic_abs (unspec [(reg)] UNSPEC_ABS)) (abd (abs (minus (reg) (reg (intrinsic_abd (unspec [(reg) (reg)] UNSPEC_ABD)) (aba (plus (abs (minus (reg) (reg))) (reg))) (intrinsic_aba (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg))) which should give us reasonable auto-vectorized code without triggering any of the issues mapping the semantics of the instructions to intrinsics. Thanks, James Thanks, Andrew Pinski Thanks, James
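Richard's widening idea can be checked in scalar form: widen so the absolute value cannot overflow, take it, then narrow modulo 2^8, which reproduces the wrap-around the instruction performs for INT8_MIN (the narrowing cast is implementation-defined in ISO C but wraps on GCC):

```c
#include <stdint.h>

/* Scalar model of the widen / abs / narrow sequence proposed above. */
int8_t abs_via_widen(int8_t x)
{
  int16_t w = x;                        /* widen: abs is well-defined   */
  int16_t a = w < 0 ? (int16_t) -w : w; /* abs cannot overflow in 16 bits */
  return (int8_t)(uint8_t) a;           /* narrow mod 256 (wraps on GCC) */
}
```

The value semantics come out right; as Richard notes, the open question is how expand would capture this sequence in RTL without reintroducing the ABS fold.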
Re: GCC's -fsplit-stack disturbing Mach's vm_allocate
On Fri, 2014-05-02 at 12:00 +0200, Samuel Thibault wrote: Samuel Thibault, on Fri 02 May 2014 11:57:53 +0200, wrote: So we just need to fix guardsize in our libpthread. (And I'll have a look at it). Maybe this can also fix the roughly 40 segfaults (out of 50 failures) seen when split stack is disabled? The segfaults are always around the same place in libpthread.
Re: GCC's -fsplit-stack disturbing Mach's vm_allocate
Svante Signell, on Fri 02 May 2014 12:45:56 +0200, wrote: On Fri, 2014-05-02 at 12:00 +0200, Samuel Thibault wrote: Samuel Thibault, on Fri 02 May 2014 11:57:53 +0200, wrote: So we just need to fix guardsize in our libpthread. (And I'll have a look at it). Maybe this can also fix the roughly 40 segfaults (out of 50 failures) seen when split stack is disabled? The segfaults are always around the same place in libpthread. Possibly, if libgo uses small stacks already. Samuel
Re: [RFC][AARCH64] TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook
On 05/02/14 10:08, Kugan wrote: diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 347a94a..8bd13f3 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -9107,6 +9107,7 @@ to those machines. Generally these generate calls to specific machine instructions, but allow the compiler to schedule those calls. @menu +* AARCH64 Built-in Functions:: * Alpha Built-in Functions:: * Altera Nios II Built-in Functions:: * ARC Built-in Functions:: @@ -9139,6 +9140,18 @@ instructions, but allow the compiler to schedule those calls. * TILEPro Built-in Functions:: @end menu +@node AARCH64 Built-in Functions +@subsection AARCH64 Built-in Functions + +These built-in functions are available for the AARCH64 family of +processors. +@smallexample +unsigned int __builtin_aarch64_get_fpcr () +void __builtin_aarch64_set_fpcr (unsigned int) +unsigned int __builtin_aarch64_get_fpsr () +void __builtin_aarch64_set_fpsr (unsigned int) +@end smallexample + @node Alpha Built-in Functions @subsection Alpha Built-in Functions Please s/AARCH64/AArch64 to stay consistent with the existing usage, e.g. those in invoke.texi. Thanks, Yufeng
Re: [Patch, Fortran] Fix an issue with CLASS and -fcoarray=lib on the trunk
Dominique Dhumieres wrote: This causes several failures with -m32 (see http://gcc.gnu.org/ml/gcc-regression/2014-05/msg3.html): Sorry for the breakage - obviously, I forgot to test with -m32. Most of the failures are fixed by replacing 'kind=8' with 'kind=.' or 'kind=\[48\]'. Either change is fine with me. The remaining failures in gfortran.dg/coarray_lib_this_image_*.f90 are fixed by the following patches The patch is okay with a proper ChangeLog; please also list PR fortran/61025. Thanks, Tobias
Re: [RFC][AARCH64] TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook
On 02/05/14 20:06, Marcus Shawcroft wrote: On 29 April 2014 03:37, Kugan kugan.vivekanandara...@linaro.org wrote: On 28/04/14 21:01, Ramana Radhakrishnan wrote: On 04/26/14 11:57, Kugan wrote: Attached patch implements TARGET_ATOMIC_ASSIGN_EXPAND_FENV for AARCH64. With this, atomic test-case gcc.dg/atomic/c11-atomic-exec-5.c now PASS. This implementation is based on SPARC and i386 implementations. Regression tested on qemu-aarch64 for aarch64-none-linux-gnu with no new regression. Is this OK for trunk? Again like A32 please test on hardware to make sure this behaves correctly with c11-atomic-exec-5.c . If you don't have access to hardware, let us know : we'll take it for a spin once you update the patch according to Marcus's comments. Thanks for the review. I have updated the patch. I also have updated hold, clear and update to be exactly as in feholdexcpt.c, fclrexcpt.c and feupdateenv.c of glibc/ports/sysdeps/aarch64/fpu. Kugan, I've not looked at the respin in detail yet, but it has just occurred to me that the sequence used here to set FPCR is insufficient. The architecture reference manual requires that any write to FPCR must be syncrhronized by a context synchronization operation so we need to plant an ISB after the write. Both the write and ISB are likely to be expensive on some implementations so it would be good to ensure that both the write and the isb are scheduled independently. IIRC there si I have limited real hardware access and just did a bootstrap and tested c11-atomic-exec-5.c alone to make sure that it PASS. I have also regression tested again on qemu-aarch64 for aarch64-none-linux-gnu with no new regressions. I will appreciate if you could do the regression testing on real hw. Once the ISB issue is resolved I'll give the patch a spin on HW here. Here is the modified patch which also includes changes Yufeng has suggested. Regression tested on qemu-aarch64 for aarch64-none-linux-gnu with no new regressions. 
Thanks, Kugan gcc/ +2014-05-02 Kugan Vivekanandarajah kug...@linaro.org + + * config/aarch64/aarch64.c (TARGET_ATOMIC_ASSIGN_EXPAND_FENV): New + define. + * config/aarch64/aarch64-protos.h (aarch64_atomic_assign_expand_fenv): + New function declaration. + * config/aarch64/aarch64-builtins.c (aarch64_builtins) : Add + AARCH64_BUILTIN_GET_FPCR, AARCH64_BUILTIN_SET_FPCR. + AARCH64_BUILTIN_GET_FPSR and AARCH64_BUILTIN_SET_FPSR. + (aarch64_init_builtins) : Initialize builtins + __builtins_aarch64_set_fpcr, __builtins_aarch64_get_fpcr. + __builtins_aarch64_set_fpsr and __builtins_aarch64_get_fpsr. + (aarch64_expand_builtin) : Expand builtins __builtins_aarch64_set_fpcr + __builtins_aarch64_get_fpcr, __builtins_aarch64_get_fpsr, + and __builtins_aarch64_set_fpsr. + (aarch64_atomic_assign_expand_fenv): New function. + * config/aarch64/aarch64.md (set_fpcr): New pattern. + (get_fpcr) : Likewise. + (set_fpsr) : Likewise. + (get_fpsr) : Likewise. + (unspecv): Add UNSPECV_GET_FPCR and UNSPECV_SET_FPCR, UNSPECV_GET_FPSR +and UNSPECV_SET_FPSR. + * doc/extend.texi (AARCH64 Built-in Functions) : Document + __builtins_aarch64_set_fpcr, __builtins_aarch64_get_fpcr. + __builtins_aarch64_set_fpsr and __builtins_aarch64_get_fpsr. 
+ diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 55cfe0a..a5af874 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -371,6 +371,12 @@ static aarch64_simd_builtin_datum aarch64_simd_builtin_data[] = { enum aarch64_builtins { AARCH64_BUILTIN_MIN, + + AARCH64_BUILTIN_GET_FPCR, + AARCH64_BUILTIN_SET_FPCR, + AARCH64_BUILTIN_GET_FPSR, + AARCH64_BUILTIN_SET_FPSR, + AARCH64_SIMD_BUILTIN_BASE, #include aarch64-simd-builtins.def AARCH64_SIMD_BUILTIN_MAX = AARCH64_SIMD_BUILTIN_BASE @@ -752,6 +758,24 @@ aarch64_init_simd_builtins (void) void aarch64_init_builtins (void) { + tree ftype_set_fpr += build_function_type_list (void_type_node, unsigned_type_node, NULL); + tree ftype_get_fpr += build_function_type_list (unsigned_type_node, NULL); + + aarch64_builtin_decls[AARCH64_BUILTIN_GET_FPCR] += add_builtin_function (__builtin_aarch64_get_fpcr, ftype_get_fpr, + AARCH64_BUILTIN_GET_FPCR, BUILT_IN_MD, NULL, NULL_TREE); + aarch64_builtin_decls[AARCH64_BUILTIN_SET_FPCR] += add_builtin_function (__builtin_aarch64_set_fpcr, ftype_set_fpr, + AARCH64_BUILTIN_SET_FPCR, BUILT_IN_MD, NULL, NULL_TREE); + aarch64_builtin_decls[AARCH64_BUILTIN_GET_FPSR] += add_builtin_function (__builtin_aarch64_get_fpsr, ftype_get_fpr, + AARCH64_BUILTIN_GET_FPSR, BUILT_IN_MD, NULL, NULL_TREE); + aarch64_builtin_decls[AARCH64_BUILTIN_SET_FPSR] += add_builtin_function (__builtin_aarch64_set_fpsr,
Re: [PATCH], RFC, add support for __float128/__ibm128 types on PowerPC
On Fri, 2014-05-02 at 12:13 +0200, Jakub Jelinek wrote: Hi! On Tue, Apr 29, 2014 at 06:30:32PM -0400, Michael Meissner wrote: This patch adds support for a new type (__float128) on the PowerPC to allow people to use the 128-bit IEEE floating point format instead of the traditional IBM double-double that has been used in the Linux compilers. At this time, long double still will remain using the IBM double-double format. There has been an undocumented option to switch long double to to IEEE 128-bit, but right now, there are bugs I haven't ironed out on VSX systems. In addition, I added another type (__ibm128) so that when the transition is eventually made, people can use this type to get the old long double type. I was wondering if people had any comments on the code so far, and things I should different. Note, I will be out on vacation May 6th - 14th, so I don't expect to submit the patches until I get back. For mangling, if you are going to mangle it the same as the -mlong-double-64 long double, is __float128 going to be supported solely for ELFv2 ABI and are you sure nobody has ever used -mlong-double-64 or --without-long-double-128 configured compiler for it? What is the plan for glibc (and for libstdc++)? Looking at current ppc64le glibc, it seems it mistakenly still supports the -mlong-double-64 stuff (e.g. printf calls are usually redirected to __nldbl_printf (and tons of other calls). So, is the plan to use yet another set of symbols? For __nldbl_* it is about 113 entry points in libc.so and 1 in libm.so, but if you are going to support all of -mlong-double-64, -mlong-double-128 as well as __float128, that would be far more, because the compat -mlong-double-64 support mostly works by redirecting, either in headers or through a special *.a library, to corresponding double entry points whenever possible. So, if you call logl in -mlong-double-64 code, it will be redirected to log, because it has the same ABI. But if you call *printf or nexttowardf etc. 
where there is no ABI-compatible double entry point, it needs to be a new symbol. But with __float128 vs. __ibm128 and long double being either of those, you need a different logl. Yes, and we will work on a plan to do this. But at this time and in the near future there is no performance advantage to __float128 over IBM long double. Which is why it is such a huge problem that this hasn't been resolved initially as part of the ELFv2 changes. Because it was a huge problem and there was no way for the required GCC support to be available in time for GLIBC-2.19. So we will develop an orderly, step-by-step transition plan. This will take some time.
[PATCH] Describe Wsizeof-pointer-memaccess in c.opt
The Wsizeof-pointer-memaccess option is missing a description in the c.opt file. This patch adds something in there. Ok? 2014-05-02 Marek Polacek pola...@redhat.com * c.opt (Wsizeof-pointer-memaccess): Describe option. diff --git gcc/c-family/c.opt gcc/c-family/c.opt index 1bbb81c..7aa9e23 100644 --- gcc/c-family/c.opt +++ gcc/c-family/c.opt @@ -516,6 +516,7 @@ Warn about missing fields in struct initializers Wsizeof-pointer-memaccess C ObjC C++ ObjC++ Var(warn_sizeof_pointer_memaccess) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall) +Warn about suspicious length parameters to certain string functions if the argument uses sizeof Wsuggest-attribute=format C ObjC C++ ObjC++ Var(warn_suggest_attribute_format) Warning Marek
Re: [C PATCH] Another locus improvements (PR c/60257)
On Fri, May 02, 2014 at 12:11:02AM +, Joseph S. Myers wrote: On Sat, 26 Apr 2014, Marek Polacek wrote: What is worth mentioning is that the (near initialization for X) message seems next to useless to me now with caret diagnostics (?). The caret diagnostics point to the initializer. near initialization says (or should say) what field GCC thinks is being initialized, which may not be obvious from looking at the initializer (consider a 100-element structure where the initializer for one element is missing, resulting in errors for subsequent elements having the wrong type; being told which element GCC thinks matches the initializer with a bad type may help you find where the missing initializer is). I see, thanks. It would be nice if we could show the previous initialization in some note message... Marek
Re: [PATCH ARM] Improve ARM memset inlining
On 30/04/14 03:52, bin.cheng wrote: Hi, This patch expands small memset calls into direct memory set instructions by introducing a setmemsi pattern. For processors without NEON support, it expands memset using general store instructions. For example, strd for 4-byte aligned addresses. For processors with NEON support, it expands memset using NEON instructions like vstr and miscellaneous vst1.* instructions for both aligned and unaligned cases. This patch depends on http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01923.html otherwise vst1.64 will be generated for 32-bit aligned memory units. There is also one piece of leftover work from this patch: Since vst1.* instructions only support post-increment addressing mode, the inlined memset for unaligned neon cases should be like: vmov.i32 q8, #... vst1.8 {q8}, [r3]! vst1.8 {q8}, [r3]! vst1.8 {q8}, [r3]! vst1.8 {q8}, [r3] Other than for zero, I'd expect the vmov to be vmov.i8 to move an arbitrary byte value into all lanes in a vector. After that, if the alignment is known to be more than 8-bit, I'd expect the vst1 instructions (with the exception of the last store if the length is not a multiple of the alignment) to use vst1.align {reg}, [addr-reg :align]! Hence, for 16-bit aligned data, we want vst1.16 {q8}, [r3:16]! But for now, gcc can't do this and the code below is generated: vmov.i32 q8, #... vst1.8 {q8}, [r3] add r2, r3, #16 add r3, r2, #16 vst1.8 {q8}, [r2] vst1.8 {q8}, [r3] add r2, r3, #16 vst1.8 {q8}, [r2] I investigated this issue. The root cause lies in the rtx cost returned by the ARM backend. Anyway, I think this is another issue and should be fixed in a separate patch. Bootstrap and reg-test on cortex-a15, with or without NEON support. Is it OK? Some more comments inline. Thanks, bin 2014-04-29 Bin Cheng bin.ch...@arm.com PR target/55701 * config/arm/arm.md (setmem): New pattern. * config/arm/arm-protos.h (struct tune_params): New field. (arm_gen_setmem): New prototype. * config/arm/arm.c (arm_slowmul_tune): Initialize new field. 
(arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto. (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto. (arm_cortex_a8_tune, arm_cortex_a7_tune): Ditto. (arm_cortex_a15_tune, arm_cortex_a53_tune): Ditto. (arm_cortex_a57_tune, arm_cortex_a5_tune): Ditto. (arm_cortex_a9_tune, arm_cortex_a12_tune): Ditto. (arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune): Ditto. (arm_const_inline_cost): New function. (arm_block_set_max_insns): New function. (arm_block_set_straight_profit_p): New function. (arm_block_set_vect_profit_p): New function. (arm_block_set_unaligned_vect): New function. (arm_block_set_aligned_vect): New function. (arm_block_set_unaligned_straight): New function. (arm_block_set_aligned_straight): New function. (arm_block_set_vect, arm_gen_setmem): New functions. gcc/testsuite/ChangeLog 2014-04-29 Bin Cheng bin.ch...@arm.com PR target/55701 * gcc.target/arm/memset-inline-1.c: New test. * gcc.target/arm/memset-inline-2.c: New test. * gcc.target/arm/memset-inline-3.c: New test. * gcc.target/arm/memset-inline-4.c: New test. * gcc.target/arm/memset-inline-5.c: New test. * gcc.target/arm/memset-inline-6.c: New test. * gcc.target/arm/memset-inline-7.c: New test. * gcc.target/arm/memset-inline-8.c: New test. * gcc.target/arm/memset-inline-9.c: New test. j1328-20140429.txt Index: gcc/config/arm/arm.c === --- gcc/config/arm/arm.c (revision 209852) +++ gcc/config/arm/arm.c (working copy) @@ -1585,10 +1585,11 @@ const struct tune_params arm_slowmul_tune = true, /* Prefer constant pool. */ arm_default_branch_cost, false, /* Prefer LDRD/STRD. */ - {true, true}, /* Prefer non short circuit. */ - arm_default_vec_cost,/* Vectorizer costs. */ - false,/* Prefer Neon for 64-bits bitops. */ - false, false /* Prefer 32-bit encodings. */ + {true, true}, /* Prefer non short circuit. */ + arm_default_vec_cost,/* Vectorizer costs. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false, /* Prefer 32-bit encodings. */ + false /* Prefer Neon for stringops. 
*/ }; Please make sure that all the white space before the comments is using TAB, not spaces. Similarly for the other
Re: [PATCH AARCH64] One-line tidy of bit-twiddle expression in aarch64.c
Whilst I agree with Richard H that it is obvious, my feeling is that the assertion does no harm, so have committed rev 210005 with Richard E's changes. --Alan Richard Henderson wrote: On 04/29/2014 05:42 AM, Richard Earnshaw wrote: On 23/04/14 16:20, Alan Lawrence wrote: This patch is a small tidy of a more-complicated expression that just flips a single bit and can thus be a simple XOR. No regressions on aarch64-none-elf or aarch64_be-none-elf. (I've verified code is indeed exercised by dg-torture.exp vshuf-v*.c). Also ok after applying TBL and testsuite patches in http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01309.html and http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00579.html. gcc/ChangeLog: 2014-04-23 Alan Lawrence alan.lawre...@arm.com * config/aarch64/aarch64.c (aarch64_expand_vec_perm_1): tidy bit-flip expression. s/tidy/Tidy/ It's not obvious from your description (or from the code, for that matter) that for this to be valid nelt must be a power of 2. I suggest that, above the loop, you put gcc_assert (nelt == (nelt & -nelt)); OK with those changes. I think it's sort of obvious from context that we're working with a vector. And it also seems obvious that we won't have a vector without a power-of-two number of elements. r~
[wide-int] out-of-range set_bit in java
I locally tried adding an assertion to the wide-int version of set_bit to make sure that the bit number was in range. It triggers for this code in boehm.c:mark_reference_fields (quoting trunk version): /* First word in object corresponds to most significant byte of bitmap. In the case of a multiple-word record, we set pointer bits for all words in the record. This is conservative, but the size_words != 1 case is impossible in regular java code. */ for (i = 0; i < size_words; ++i) *mask = (*mask).set_bit (ubit - count - i - 1); if (count >= ubit - 2) *pointer_after_end = 1; if count + i + 1 > ubit. AIUI the lower 2 bits are used for something else: /* Bottom two bits for bitmap mark type are 01. */ mask = mask.set_bit (0); value = double_int_to_tree (value_type, mask); which is why the pointer_after_end condition checks for count >= ubit - 2. We never actually use the mask if pointer_after_end is true, so this patch puts the set_bit in an else branch. On face value it looks like the condition should be: count + size_words > ubit - 2 instead, but it'd go without saying that I don't really understand this code. Tested on x86_64-linux-gnu and powerpc64-linux-gnu for wide-int. OK to install? Thanks, Richard gcc/java/ * boehm.c (mark_reference_fields): Don't update the mask when setting pointer_after_end. Index: gcc/java/boehm.c === --- gcc/java/boehm.c2014-01-13 15:05:22.543887284 + +++ gcc/java/boehm.c2014-05-02 16:08:25.500760537 +0100 @@ -101,17 +101,17 @@ mark_reference_fields (tree field, *last_set_index = count; - /* First word in object corresponds to most significant byte of -bitmap. - -In the case of a multiple-word record, we set pointer -bits for all words in the record. This is conservative, but the -size_words != 1 case is impossible in regular java code. 
*/ - for (i = 0; i < size_words; ++i) - *mask = wi::set_bit (*mask, ubit - count - i - 1); - if (count >= ubit - 2) *pointer_after_end = 1; + else + /* First word in object corresponds to most significant byte of + bitmap. + + In the case of a multiple-word record, we set pointer + bits for all words in the record. This is conservative, but the + size_words != 1 case is impossible in regular java code. */ + for (i = 0; i < size_words; ++i) + *mask = wi::set_bit (*mask, ubit - count - i - 1); /* If we saw a non-reference field earlier, then we can't use the count representation. We keep track of that in
Re: [RS6000] PR60737, expand_block_clear uses word stores
On Fri, May 2, 2014 at 6:20 AM, Alan Modra amo...@gmail.com wrote: In cases where the compiler has no alignment info, powerpc64le-linux gcc generates byte-at-a-time copies for -mstrict-align (which is on for little-endian power7). That's awful code, a problem shared by other strict-align targets, see pr50417. However, we also have a case when -mno-strict-align generates less than ideal code, which I believe stems from using alignment as a proxy for testing an address offset. See http://gcc.gnu.org/ml/gcc-patches/1999-09n/msg01072.html. So my first attempt at fixing this problem looked at address offsets directly. That worked fine too, but on thinking some more, I believe we no longer have the movdi restriction. Nowadays we'll reload the address if we have an offset that doesn't satisfy the Y constraint (ie. a multiple of 4 offset). Which led to this simpler patch. Bootstrapped and regression tested powerpc64le-linux, powerpc64-linux and powerpc-linux. OK to apply? Hi, Alan Thanks for finding and addressing this. As you mention, recent server-class processors, at least POWER8, do not have the performance degradation for common, mis-aligned loads and stores of wider modes. But the patch should not impose this default on the large, installed base of processors, where mis-aligned loads can be a severe performance penalty. This heuristic has become processor-dependent and should not be hard-coded in the block_move and block_clear algorithms. PROCESSOR_DEFAULT is POWER8 for ELFv2 (and should be updated as the default for PowerLinux in general). Please update the patch to test rs6000_cpu, probably another boolean flag set in rs6000_option_override_internal(). Because of the processor defaults, the preferred instruction sequence will be the default without encoding an assumption about the heuristics in the algorithm itself. Thanks, David
[committed] Fix signed overflow for zext_hwi (..., 63)
...well, still HOST_BITS_PER_WIDE_INT-1 officially, until Richard's patches. Caught by a bootstrap-ubsan on wide-int. Tested on x86_64-linux-gnu and committed as obvious. Richard gcc/ * hwint.h (zext_hwi): Fix signed overflow for prec == 63. Index: gcc/hwint.h === --- gcc/hwint.h 2014-01-03 15:06:18.841058787 + +++ gcc/hwint.h 2014-05-02 16:31:44.987263123 +0100 @@ -344,7 +344,7 @@ zext_hwi (unsigned HOST_WIDE_INT src, un else { gcc_checking_assert (prec < HOST_BITS_PER_WIDE_INT); - return src & (((HOST_WIDE_INT) 1 << prec) - 1); + return src & (((unsigned HOST_WIDE_INT) 1 << prec) - 1); } }
[PATCH] Change PPC64 Linux default to POWER8.
This patch updates the default processor tuning to POWER8 for any PPC64 Linux target, not only little endian (ELFv2). Any objection? Thanks, David * config/rs6000/linux64.h (PROCESSOR_DEFAULT): Change to PROCESSOR_POWER8. (PROCESSOR_DEFAULT64): Same. Index: linux64.h === --- linux64.h (revision 209981) +++ linux64.h (working copy) @@ -69,13 +69,9 @@ #endif #undef PROCESSOR_DEFAULT -#define PROCESSOR_DEFAULT PROCESSOR_POWER7 +#define PROCESSOR_DEFAULT PROCESSOR_POWER8 #undef PROCESSOR_DEFAULT64 -#ifdef LINUX64_DEFAULT_ABI_ELFv2 #define PROCESSOR_DEFAULT64 PROCESSOR_POWER8 -#else -#define PROCESSOR_DEFAULT64 PROCESSOR_POWER7 -#endif /* We don't need to generate entries in .fixup, except when -mrelocatable or -mrelocatable-lib is given. */
Re: [PATCH] Implement -fsanitize=float-divide-by-zero
On Tue, 29 Apr 2014, Jakub Jelinek wrote: To my surprise, the wording in C99 or C++11 makes even floating point division by zero undefined behavior, but I think generally at least for IEEE floating point semantics it is well defined, thus I think we shouldn't include it in -fsanitize=undefined. Making Annex F explicitly take precedence over the generic text is DR#442. (This precedence issue applies to out-of-range conversions from floating point to integer as well as such things as floating-point division by zero: the generic text makes such out-of-range conversions undefined, but Annex F says they raise invalid and return an unspecified value.) -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Change PPC64 Linux default to POWER8.
On Fri, 2014-05-02 at 11:50 -0400, David Edelsohn wrote: This patch updates the default processor tuning to POWER8 for any PPC64 Linux target, not only little endian (ELFv2). Any objection? Thanks, David * config/rs6000/linux64.h (PROCESSOR_DEFAULT): Change to PROCESSOR_POWER8. (PROCESSOR_DEFAULT64): Same. That looks good to me. Thanks. Peter
Re: [PING] Re: Add const char* constructors for exception classes in stdexcept
On 1 May 2014 17:18, Oleg Endo wrote: Jonathan, now that we're in stage 1 again, I'd like to revive the issue below. Do you have any particular plans? How should we proceed? Hi Oleg, sorry for letting the thread die in January. We will definitely want to deal with the missing constructors in stdexcept during stage 1, but I'm not sure exactly when :-) I'll come back to this next week - ping me if you don't hear from me again!
[patch] fix libstdc++/59476 - pretty printers
Update the python pretty printers to understand how _Rb_tree_node is defined in C++11 mode now. Tested x86_64-linux, committed to trunk and the 4.9 branch. commit a498be0a723e8c6b60185d5c15081be4650b2644 Author: Jonathan Wakely jwak...@redhat.com Date: Fri May 2 14:02:51 2014 +0100 PR libstdc++/59476 * python/libstdcxx/v6/printers.py (get_value_from_Rb_tree_node): New function to handle both C++03 and C++11 _Rb_tree_node implementations. (StdRbtreeIteratorPrinter, StdMapPrinter, StdSetPrinter): Use it. * testsuite/libstdc++-prettyprinters/simple.cc: Update comment to refer to... * testsuite/libstdc++-prettyprinters/simple11.cc: New. diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py index 05da17b..1f1f860 100644 --- a/libstdc++-v3/python/libstdcxx/v6/printers.py +++ b/libstdc++-v3/python/libstdcxx/v6/printers.py @@ -375,6 +375,22 @@ class RbtreeIterator: self.node = node return result +def get_value_from_Rb_tree_node(node): +"""Returns the value held in an _Rb_tree_node<_Val>""" +try: +member = node.type.fields()[1].name +if member == '_M_value_field': +# C++03 implementation, node contains the value as a member +return node['_M_value_field'] +elif member == '_M_storage': +# C++11 implementation, node stores value in __aligned_buffer +p = node['_M_storage']['_M_storage'].address +p = p.cast(node.type.template_argument(0).pointer()) +return p.dereference() +except: +pass +raise ValueError, "Unsupported implementation for %s" % str(node.type) + # This is a pretty printer for std::_Rb_tree_iterator (which is # std::map::iterator), and has nothing to do with the RbtreeIterator # class above. 
@@ -387,7 +403,8 @@ class StdRbtreeIteratorPrinter: def to_string (self): typename = str(self.val.type.strip_typedefs()) + '::_Link_type' nodetype = gdb.lookup_type(typename).strip_typedefs() -return self.val.cast(nodetype).dereference()['_M_value_field'] +node = self.val.cast(nodetype).dereference() +return get_value_from_Rb_tree_node(node) class StdDebugIteratorPrinter: Print a debug enabled version of an iterator @@ -417,7 +434,8 @@ class StdMapPrinter: def next(self): if self.count % 2 == 0: n = self.rbiter.next() -n = n.cast(self.type).dereference()['_M_value_field'] +n = n.cast(self.type).dereference() +n = get_value_from_Rb_tree_node(n) self.pair = n item = n['first'] else: @@ -458,7 +476,8 @@ class StdSetPrinter: def next(self): item = self.rbiter.next() -item = item.cast(self.type).dereference()['_M_value_field'] +item = item.cast(self.type).dereference() +item = get_value_from_Rb_tree_node(item) # FIXME: this is weird ... what to do? # Maybe a 'set' display hint? result = ('[%d]' % self.count, item) diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc index 66ae8f7..030207a 100644 --- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc +++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc @@ -1,4 +1,4 @@ -// If you modify this, please update debug.cc as well. +// If you modify this, please update simple11.cc and debug.cc as well. // { dg-do run } // { dg-options -g -O0 } diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc new file mode 100644 index 000..e94bea6 --- /dev/null +++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc @@ -0,0 +1,92 @@ +// If you modify this, please update simple.cc and debug.cc as well. + +// { dg-do run } +// { dg-options -g -O0 -std=gnu++11 } + +// Copyright (C) 2011-2014 Free Software Foundation, Inc. 
+// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License along +// with this library; see the file COPYING3. If not see +// <http://www.gnu.org/licenses/>. + +#include <string> +#include <deque> +#include <bitset> +#include <iostream> +#include <list> +#include <map> +#include <set> +#include <ext/slist> + +int +main() +{ + std::string tem; +
Re: [PATCH 00/89] Compile-time gimple-checking
On 05/02/14 03:09, Richard Biener wrote: Well, I'd like to see both and one affects the other. Doing the const correctness thing first seems more natural to me. Of course both need to wait for 4.9.1. Well, it looks like David is already on that path to some extent with the proposed gengtype changes. I guess I'm just trying to figure out how to stage this stuff in. ie, is it easier to go with the #89 patchkit, then followup with fixing the const stuff, or is it easier to first fix the const stuff, then adjust the #89 kit. You're recommending the latter, which is fine with me, but I'd like David to chime in as well since he's doing the work :-) jeff
Re: [committed] Fix signed overflow for zext_hwi (..., 63)
On May 2, 2014, at 8:36 AM, Richard Sandiford rsand...@linux.vnet.ibm.com wrote: ...well, still HOST_BITS_PER_WIDE_INT-1 officially, until Richard's patches. Caught by a bootstrap-ubsan on wide-int. Tested on x86_64-linux-gnu and committed as obvious. Thanks.
Re: [PATCH 00/89] Compile-time gimple-checking
On Fri, 2014-05-02 at 10:02 -0600, Jeff Law wrote: On 05/02/14 03:09, Richard Biener wrote: Well, I'd like to see both and one affects the other. Doing the const correctness thing first seems more natural to me. Of course both need to wait for 4.9.1. Well, it looks like David is already on that path to some extent with the proposed gengtype changes. I guess I'm just trying to figure out how to stage this stuff in. ie, is it easier to go with the #89 patchkit, then followup with fixing the const stuff, or is it easier to first fix the const stuff, then adjust the #89 kit. You're recommending the latter, which is fine with me, but I'd like David to chime in as well since he's doing the work :-) I'm about 4 or 5 hours from having patches I can post :) [my automated typedef-removal script seems to work, but I need to write some ChangeLogs and port one of the patches from the #89 kit, then rebootstrap & regtest]
Re: [PATCH] gengtype: Support explicit pointers in template arguments
David Malcolm dmalc...@redhat.com writes: Currently, gengtype does not support template arguments that are explicitly pointers, such as: static GTY(()) vec<gimple_statement_base *> *test_gimple; giving this error: ../../src/gcc/gimple-expr.c:902: parse error: expected a string constant or ',', have '*' instead, requiring them to be typedefs. This patch removes this restriction, supporting up to a single pointer in each template argument. Sorry to ask, but would you mind holding on until after the wide-int merge (hopefully the next few days, this time for real)? It looks like it might clash with the support for nested templates. Thanks, Richard
Re: [C PATCH] Improve warn msg (PR c/43395)
On Tue, 29 Apr 2014, Marek Polacek wrote: It's correct to warn about returning an address of a local label, but it's clumsy to say it's local variable. We can easily distinguish between a label and a variable. Regtested/bootstrapped on x86_64-linux, ok for trunk? You always need to have complete sentences in diagnostics for the sake of translation. Thus, you need two separate warning_at calls, one with each version of the message. (Using ? : for the whole format string isn't sufficient; I think xgettext only extracts one of the two alternatives for translation if you do that.) -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] gengtype: Support explicit pointers in template arguments
On Fri, 2014-05-02 at 17:43 +0100, Richard Sandiford wrote: David Malcolm dmalc...@redhat.com writes: Currently, gengtype does not support template arguments that are explicitly pointers, such as: static GTY(()) vec<gimple_statement_base *> *test_gimple; giving this error: ../../src/gcc/gimple-expr.c:902: parse error: expected a string constant or ',', have '*' instead, requiring them to be typedefs. This patch removes this restriction, supporting up to a single pointer in each template argument. Sorry to ask, but would you mind holding on until after the wide-int merge (hopefully the next few days, this time for real)? It looks like it might clash with the support for nested templates. Ack. [This is enabling work for something I've already been asked to wait post-4.9.1 for, so sure. I hope it won't be too hard to redo on top of your changes.] Good luck with the wide-int merge
Re: [PATCH], RFC, add support for __float128/__ibm128 types on PowerPC
On Tue, 29 Apr 2014, Michael Meissner wrote: * soft-fp/quad.h (TFmode): If TFmode is defined, don't use the normal TF mode definition. That does of course need to go to glibc first (and I think it would be best to do something consistent for all the floating-point formats in soft-fp - allow overriding the definitions of all of them in terms of machine modes to keep things consistent, even without any use case for overriding definitions other than for TFmode). I don't see anything in this patch to give appropriate symbol versions to the new libgcc functions. libgcc/config/rs6000/sfp-machine.h does not currently implement integration with hardware exceptions and rounding modes. Such integration should be added as on other architectures such as x86 and AArch64 (conditional on FPRs being present, as opposed to soft-fp being used in libgcc for soft-float or e500). -- Joseph S. Myers jos...@codesourcery.com
Re: [C PATCH] Improve warn msg (PR c/43395)
On Fri, May 02, 2014 at 05:01:18PM +, Joseph S. Myers wrote: On Tue, 29 Apr 2014, Marek Polacek wrote: It's correct to warn about returning an address of a local label, but it's clumsy to say it's a local variable. We can easily distinguish between a label and a variable. Regtested/bootstrapped on x86_64-linux, ok for trunk? You always need to have complete sentences in diagnostics for the sake of translation. Thus, you need two separate warning_at calls, one with each version of the message. (Using ? : for the whole format string isn't sufficient; I think xgettext only extracts one of the two alternatives for translation if you do that.) Ooops. Is it ok to fix it up with this patch then? 2014-05-02 Marek Polacek pola...@redhat.com c/ * c-typeck.c (c_finish_return): Separate warning_at calls. cp/ * typeck.c (maybe_warn_about_returning_address_of_local): Separate warning_at calls. diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c index b95fd89..f7ad91e 100644 --- gcc/c/c-typeck.c +++ gcc/c/c-typeck.c @@ -9273,10 +9273,14 @@ c_finish_return (location_t loc, tree retval, tree origtype) && !DECL_EXTERNAL (inner) && !TREE_STATIC (inner) && DECL_CONTEXT (inner) == current_function_decl) - warning_at (loc, - OPT_Wreturn_local_addr, "function returns address - of %s", TREE_CODE (inner) == LABEL_DECL -? "label" : "local variable"); + { + if (TREE_CODE (inner) == LABEL_DECL) + warning_at (loc, OPT_Wreturn_local_addr, + "function returns address of label"); + else + warning_at (loc, OPT_Wreturn_local_addr, + "function returns address of local variable"); + } break; default: diff --git gcc/cp/typeck.c gcc/cp/typeck.c index 8b7cb8d..7b28a9a 100644 --- gcc/cp/typeck.c +++ gcc/cp/typeck.c @@ -8309,10 +8309,12 @@ maybe_warn_about_returning_address_of_local (tree retval) if (TREE_CODE (valtype) == REFERENCE_TYPE) warning (OPT_Wreturn_local_addr, "reference to local variable %q+D returned", whats_returned); + else if (TREE_CODE (whats_returned) == LABEL_DECL) + warning (OPT_Wreturn_local_addr, "address of label %q+D returned", +whats_returned); else - warning (OPT_Wreturn_local_addr, "address of %s %q+D returned", -TREE_CODE (whats_returned) == LABEL_DECL -? "label" : "local variable", whats_returned); + warning (OPT_Wreturn_local_addr, "address of local variable %q+D +returned", whats_returned); return; } } Marek
[Patch, testsuite, mips] Fix two failing MIPS tests.
Two MIPS-specific tests started failing a few days ago; this patch fixes them. The tests are trying to check that a performance optimization is done and that one constant is derived from another and not simply loaded as a completely new constant. Post-4.9 changes have affected which constant gets loaded and which one gets derived (and how it is derived) from it. The new code looks as good as the old code: Old: li $5,305397760 # 0x12340000 addiu $4,$5,1 addiu $5,$5,-1 New: li $5,305332224 # 0x12330000 ori $5,$5,0xffff addiu $4,$5,2 My proposed fix is to change the asm scan to verify that there is only 1 'li' instruction and to not worry about what value is being loaded or what instructions are used to generate the second constant. Tested on MIPS32 and MIPS64 (n32 and 64 ABIs). OK to checkin? Steve Ellcey sell...@mips.com 2014-05-02 Steve Ellcey sell...@mips.com * gcc.target/mips/const-anchor-1.c: Modify asm scan. * gcc.target/mips/const-anchor-2.c: Ditto. diff --git a/gcc/testsuite/gcc.target/mips/const-anchor-1.c b/gcc/testsuite/gcc.target/mips/const-anchor-1.c index a5f01e4..debab12 100644 --- a/gcc/testsuite/gcc.target/mips/const-anchor-1.c +++ b/gcc/testsuite/gcc.target/mips/const-anchor-1.c @@ -1,8 +1,8 @@ -/* Derive a constant (0x1233) from an intermediate value - (0x12340000) used to build another constant. */ +/* Derive one constant from an intermediate value used to build another + constant. Verify that there is only one load immediate instruction + generated. 
*/ /* { dg-skip-if code quality test { *-*-* } { -O0 } { } } */ -/* { dg-final { scan-assembler-not 0x1233|305332224 } } */ -/* { dg-final { scan-assembler \td?addiu\t\\\$5,\\\$\[0-9\]*,-1 } } */ +/* { dg-final { scan-assembler-times \tli\t 1 } } */ NOMIPS16 void f () { diff --git a/gcc/testsuite/gcc.target/mips/const-anchor-2.c b/gcc/testsuite/gcc.target/mips/const-anchor-2.c index 8dad5a7..2edd1ef 100644 --- a/gcc/testsuite/gcc.target/mips/const-anchor-2.c +++ b/gcc/testsuite/gcc.target/mips/const-anchor-2.c @@ -1,7 +1,8 @@ -/* Derive a constant (0x30001) from another constant. */ +/* Derive one constant from an intermediate value used to build another + constant. Verify that there is only one load immediate instruction + generated. */ /* { dg-skip-if code quality test { *-*-* } { -O0 } { } } */ -/* { dg-final { scan-assembler-not 0x30|196608 } } */ -/* { dg-final { scan-assembler \td?addiu\t\\\$5,\\\$\[0-9\]*,32763 } } */ +/* { dg-final { scan-assembler-times \tli\t 1 } } */ NOMIPS16 void f () {
Re: [patch] change specific int128 - generic intN
On Thu, 1 May 2014, DJ Delorie wrote: Do we have a routine to do that already? I don't know. -- Joseph S. Myers jos...@codesourcery.com
Re: Contributing new gcc targets: i386-*-dragonfly and x86-64-*-dragonfly
On Fri, 2 May 2014, John Marino wrote: So given the track record (building itself, building base, building 21,000 software ports) over a couple of years I think any issues this could have caused would have been seen and identified by now. These issues aren't generally obvious (given that the ISO C conformance modes aren't used that often, and when they are that doesn't mean the application is using POSIX types for something else). (I don't know what the FreeBSD sys/_types.h defines, but it at least seems possible from the name that it is only defining things in the implementation namespace, with the public sys/types.h being what then includes sys/_types.h and does typedef __foo_t foo_t; or similar to provide the public POSIX types that aren't in ISO C.) Here are the headers in question: http://grok.dragonflybsd.org/xref/freebsd/sys/sys/_types.h That seems OK for stddef.h inclusion, as long as the machine/_types.h is OK. http://grok.dragonflybsd.org/xref/dragonfly/sys/sys/types.h That's definitely not correct to include in stddef.h; it defines lots of types outside the ISO C namespace. -- Joseph S. Myers jos...@codesourcery.com
Re: [Patch] Fix a typo in Chinese msg
On Thu, 1 May 2014, Jeff Law wrote: On 05/01/14 17:59, Joseph S. Myers wrote: On Thu, 24 Apr 2014, Tobias Burnus wrote: Jeff, are you sure that this will not be overridden? I think the translations are always synced from http://translationproject.org/team/zh_CN.html ? Yes. This should be reverted; we shouldn't make any local changes to these files. When there's a new zh_CN.po from the TP (whether resulting from changes by translators, or an automatic merge with a new .pot file), that will be imported verbatim (to both trunk and 4.9 branch), whether or not it includes this change. OK. Sorry about that. Reverted. What's the procedure to contact the translators so they can fix this? Contact the address for the translation team given in the .po file. Unfortunately I got a bounce message in Chinese; maybe it indicates it's a subscriber-posting-only mailing list, which seems unfortunate. -- Joseph S. Myers jos...@codesourcery.com
Re: [C PATCH] Fix up diagnostics (PR c/25801)
On Fri, 2 May 2014, Marek Polacek wrote: On Thu, May 01, 2014 at 10:54:18PM +, Joseph S. Myers wrote: I think the comment on this function should be updated to explain the interface contract when an incomplete (or function) type is passed (i.e. return size_one_node, caller is responsible for any diagnostics). Done. The test should I think also cover incomplete union types, and pointer subtraction for all the different cases of incomplete types, and the += and -= operators (i.e. cover invalid arithmetic on pointers to incomplete types more thoroughly, even if not actually affected by this patch). Done. Another change is that I used COMPLETE_TYPE_P instead of COMPLETE_OR_VOID_TYPE_P, that would be a bug and before it didn't make much sense: we checked VOID_TYPE before. Tested x86_64-linux, ok now? OK. -- Joseph S. Myers jos...@codesourcery.com
[committed] Fix two omp loop issues
Hi! While starting to work on OpenMP 4.0 Fortran stuff, I've noticed some issues in C/C++ too. One problem is that when some variable other than the iteration var is linear on a combined for simd or parallel for simd loop, we weren't adding a barrier between the read of the initial value and the store of the final value. Fixed by making such vars firstprivate+lastprivate on the omp for loop. The second issue is that if there are lastprivate vars on a combined for simd loop, the outer assignment on the last iteration was keyed on some fd->loop.v iterator value which hadn't ever been assigned to. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk/4.9.
2014-05-02  Jakub Jelinek  ja...@redhat.com
	* gimplify.c (gimplify_adjust_omp_clauses_1): Handle
	GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE.
	(gimplify_adjust_omp_clauses): Simd region is never directly nested
	in combined parallel.  Instead, for linear with copyin/copyout, if
	in combined for simd loop, make decl firstprivate/lastprivate on
	OMP_FOR.
	* omp-low.c (expand_omp_for_generic, expand_omp_for_static_nochunk,
	expand_omp_for_static_chunk): When setting endvar, also set
	fd->loop.v to the same value.
libgomp/
	* testsuite/libgomp.c/simd-10.c: New test.
	* testsuite/libgomp.c/simd-11.c: New test.
	* testsuite/libgomp.c/simd-12.c: New test.
	* testsuite/libgomp.c/simd-13.c: New test.
--- gcc/gimplify.c.jj	2014-04-30 19:06:11.000000000 +0200
+++ gcc/gimplify.c	2014-05-02 11:38:07.908463543 +0200
@@ -6294,9 +6294,17 @@ gimplify_adjust_omp_clauses_1 (splay_tre
 	  OMP_CLAUSE_CHAIN (clause) = nc;
 	}
     }
+  if (code == OMP_CLAUSE_FIRSTPRIVATE && (flags & GOVD_LASTPRIVATE) != 0)
+    {
+      tree nc = build_omp_clause (input_location, OMP_CLAUSE_LASTPRIVATE);
+      OMP_CLAUSE_DECL (nc) = decl;
+      OMP_CLAUSE_LASTPRIVATE_FIRSTPRIVATE (nc) = 1;
+      OMP_CLAUSE_CHAIN (nc) = *list_p;
+      OMP_CLAUSE_CHAIN (clause) = nc;
+      lang_hooks.decls.omp_finish_clause (nc);
+    }
   *list_p = clause;
   lang_hooks.decls.omp_finish_clause (clause);
-
   return 0;
 }
@@ -6335,18 +6343,17 @@ gimplify_adjust_omp_clauses (tree *list_
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LINEAR
 	      && ctx->outer_context
 	      && !(OMP_CLAUSE_LINEAR_NO_COPYIN (c)
-		   && OMP_CLAUSE_LINEAR_NO_COPYOUT (c))
-	      && !is_global_var (decl))
+		   && OMP_CLAUSE_LINEAR_NO_COPYOUT (c)))
 	    {
-	      if (ctx->outer_context->region_type == ORT_COMBINED_PARALLEL)
+	      if (ctx->outer_context->combined_loop
+		  && !OMP_CLAUSE_LINEAR_NO_COPYIN (c))
 		{
 		  n = splay_tree_lookup (ctx->outer_context->variables,
 					 (splay_tree_key) decl);
 		  if (n == NULL
 		      || (n->value & GOVD_DATA_SHARE_CLASS) == 0)
 		    {
-		      int flags = OMP_CLAUSE_LINEAR_NO_COPYIN (c)
-				  ? GOVD_LASTPRIVATE : GOVD_SHARED;
+		      int flags = GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE;
 		      if (n == NULL)
 			omp_add_variable (ctx->outer_context, decl,
 					  flags | GOVD_SEEN);
@@ -6354,7 +6361,7 @@ gimplify_adjust_omp_clauses (tree *list_
 			n->value |= flags | GOVD_SEEN;
 		    }
 		}
-	      else
+	      else if (!is_global_var (decl))
 		omp_notice_variable (ctx->outer_context, decl, true);
 	    }
 	}
--- gcc/omp-low.c.jj	2014-04-30 09:13:22.000000000 +0200
+++ gcc/omp-low.c	2014-05-01 15:18:05.368771615 +0200
@@ -5584,6 +5584,12 @@ expand_omp_for_generic (struct omp_regio
     {
       stmt = gimple_build_assign (endvar, iend);
       gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+      if (useless_type_conversion_p (TREE_TYPE (fd->loop.v), TREE_TYPE (iend)))
+	stmt = gimple_build_assign (fd->loop.v, iend);
+      else
+	stmt = gimple_build_assign_with_ops (NOP_EXPR, fd->loop.v, iend,
+					     NULL_TREE);
+      gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
     }
   if (fd->collapse > 1)
     expand_omp_for_init_vars (fd, &gsi, counts, inner_stmt, startvar);
@@ -6000,6 +6006,12 @@ expand_omp_for_static_nochunk (struct om
     {
       stmt = gimple_build_assign (endvar, e);
       gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+      if (useless_type_conversion_p (TREE_TYPE (fd->loop.v), TREE_TYPE (e)))
+	stmt = gimple_build_assign (fd->loop.v, e);
+      else
+	stmt = gimple_build_assign_with_ops (NOP_EXPR, fd->loop.v, e,
+					     NULL_TREE);
+
Re: [C PATCH] Improve warn msg (PR c/43395)
On Fri, 2 May 2014, Marek Polacek wrote: Ooops. Is it ok to fix it up with this patch then? 2014-05-02 Marek Polacek pola...@redhat.com c/ * c-typeck.c (c_finish_return): Separate warning_at calls. cp/ * typeck.c (maybe_warn_about_returning_address_of_local): Separate warning_at calls. OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [C++ Patch/RFC] PR 58582
I'd tweak instantiate_decl differently; a deleted function is defined, so pattern_defined should be true. Jason
Re: [PATCH] Describe Wsizeof-pointer-memaccess in c.opt
On Fri, 2 May 2014, Marek Polacek wrote: Wsizeof-pointer-memaccess option is missing a description in the c.opt file. This patch adds something in there. Ok? OK. -- Joseph S. Myers jos...@codesourcery.com
Re: Contributing new gcc targets: i386-*-dragonfly and x86-64-*-dragonfly
On 5/2/2014 19:49, Joseph S. Myers wrote: On Fri, 2 May 2014, John Marino wrote: http://grok.dragonflybsd.org/xref/dragonfly/sys/sys/types.h That's definitely not correct to include in stddef.h; it defines lots of types outside the ISO C namespace. Ok. So I guess there are two problems. 1) I don't know which type definitions are missing (in other words, the important ones from sys/types.h that are required to build gcc) 2) Mainly because of 1) I don't know what to pull in instead. DragonFly doesn't have this _types.h split like FreeBSD. Do you have a suggestion for me on how to proceed? Thanks, John
[patch] libstdc++/61036 - fix dumb typo in shared_ptr
This was a really dumb regression from me :-( Tested x86_64-linux, committed to trunk and 4.9 branch.

commit 892be9bd861871e39049c78c75807f5680fffb94
Author: Jonathan Wakely jwak...@redhat.com
Date:   Fri May 2 18:35:41 2014 +0100

    PR libstdc++/61036
    * include/bits/shared_ptr_base.h (__shared_ptr::__shared_ptr(_Tp1*)):
    Check the correct type in the static assertion.
    * testsuite/20_util/shared_ptr/cons/61036.cc: New.

diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
index 57398af..c25157f 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -871,7 +871,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	: _M_ptr(__p), _M_refcount(__p)
 	{
 	  __glibcxx_function_requires(_ConvertibleConcept<_Tp1*, _Tp*>)
-	  static_assert( !is_void<_Tp>::value, "incomplete type" );
+	  static_assert( !is_void<_Tp1>::value, "incomplete type" );
 	  static_assert( sizeof(_Tp1) > 0, "incomplete type" );
 	  __enable_shared_from_this_helper(_M_refcount, __p, __p);
 	}
diff --git a/libstdc++-v3/testsuite/20_util/shared_ptr/cons/61036.cc b/libstdc++-v3/testsuite/20_util/shared_ptr/cons/61036.cc
new file mode 100644
index 000..9cade66
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/shared_ptr/cons/61036.cc
@@ -0,0 +1,28 @@
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// 20.8.2.2 Template class shared_ptr [util.smartptr.shared]
+
+#include <memory>
+
+void test01()
+{
+  std::shared_ptr<void> p(new int);
+}
[wide-int] Handle zero-precision INTEGER_CSTs again
I'd hoped the days of zero-precision INTEGER_CSTs were behind us after Richard's patch to remove min and max values from zero-width bitfields, but a bootstrap-ubsan showed otherwise. One source is in: null_pointer_node = build_int_cst (build_pointer_type (void_type_node), 0); if no_target, since the precision of the type comes from ptr_mode, which remains the default VOIDmode. Maybe that's a bug, but setting it to an arbitrary nonzero value would also be dangerous. The other use is in the C/C++ void_zero_node, which is specifically defined as a VOID_TYPE, zero-precision INTEGER_CST. This is used by the ubsan code in some ?: tests, as well as by other places in the frontend proper. At least the ubsan use persists until gimple. This patch therefore restores the wide-int handling for zero precision, for which the length must be 1 and the single HWI must be zero. I've tried to wrap up most of the dependencies in two new functions, wi::blocks_needed (a function version of the .cc BLOCKS_NEEDED, now also used in the header) and wi::excess_bits, so that it'll be easier to remove the handling again if zero precision ever goes away. There are some remaining, easily-greppable cases that check directly for a precision of 0 though. The combination of this and the other patches allows bootstrap-ubsan to complete. There are a lot of extra testsuite failures compared to a normal bootstrap, but they don't look related to wide-int. Tested on x86_64-linux-gnu and powerpc64-linux-gnu. OK to install? Thanks, Richard
Index: gcc/wide-int.cc
===================================================================
--- gcc/wide-int.cc	2014-05-02 16:28:07.657847935 +0100
+++ gcc/wide-int.cc	2014-05-02 16:28:09.560842845 +0100
@@ -48,8 +48,6 @@ static const HOST_WIDE_INT zeros[WIDE_IN
 #define HALF_INT_MASK (((HOST_WIDE_INT) 1 << HOST_BITS_PER_HALF_WIDE_INT) - 1)
 
 #define BLOCK_OF(TARGET) ((TARGET) / HOST_BITS_PER_WIDE_INT)
-#define BLOCKS_NEEDED(PREC) \
-  (PREC ? (((PREC) + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT) : 1)
 #define SIGN_MASK(X) ((HOST_WIDE_INT) (X) < 0 ? -1 : 0)
 
 /* Return the value a VAL[I] if I < LEN, otherwise, return 0 or -1
@@ -69,20 +67,20 @@ safe_uhwi (const HOST_WIDE_INT *val, uns
 static unsigned int
 canonize (HOST_WIDE_INT *val, unsigned int len, unsigned int precision)
 {
-  unsigned int blocks_needed = BLOCKS_NEEDED (precision);
+  unsigned int blocks_needed = wi::blocks_needed (precision);
   HOST_WIDE_INT top;
   int i;
 
   if (len > blocks_needed)
     len = blocks_needed;
 
+  if (wi::excess_bits (len, precision) > 0)
+    val[len - 1] = sext_hwi (val[len - 1], precision % HOST_BITS_PER_WIDE_INT);
   if (len == 1)
     return len;
 
   top = val[len - 1];
-  if (len * HOST_BITS_PER_WIDE_INT > precision)
-    val[len - 1] = top = sext_hwi (top, precision % HOST_BITS_PER_WIDE_INT);
-  if (top != 0 && top != (HOST_WIDE_INT)-1)
+  if (top != 0 && top != (HOST_WIDE_INT) -1)
     return len;
 
   /* At this point we know that the top is either 0 or -1.  Find the
@@ -134,7 +132,7 @@ wi::from_buffer (const unsigned char *bu
 
   /* We have to clear all the bits ourself, as we merely or in values
      below.  */
-  unsigned int len = BLOCKS_NEEDED (precision);
+  unsigned int len = wi::blocks_needed (precision);
   HOST_WIDE_INT *val = result.write_val ();
   for (unsigned int i = 0; i < len; ++i)
     val[i] = 0;
@@ -180,7 +178,7 @@ wi::to_mpz (const wide_int_ref &x, mpz_t
 {
   int len = x.get_len ();
   const HOST_WIDE_INT *v = x.get_val ();
-  int excess = len * HOST_BITS_PER_WIDE_INT - x.get_precision ();
+  unsigned int excess = wi::excess_bits (len, x.get_precision ());
 
   if (wi::neg_p (x, sgn))
     {
@@ -306,7 +304,8 @@ wi::force_to_size (HOST_WIDE_INT *val, c
 		   unsigned int xlen, unsigned int xprecision,
 		   unsigned int precision, signop sgn)
 {
-  unsigned int blocks_needed = BLOCKS_NEEDED (precision);
+  unsigned int blocks_needed = wi::blocks_needed (precision);
+  unsigned int xblocks_needed = wi::blocks_needed (xprecision);
   unsigned int len = blocks_needed < xlen ? blocks_needed : xlen;
   for (unsigned i = 0; i < len; i++)
     val[i] = xval[i];
@@ -318,11 +317,11 @@ wi::force_to_size (HOST_WIDE_INT *val, c
       /* Expanding.  */
       if (sgn == UNSIGNED)
 	{
-	  if (small_xprecision && len == BLOCKS_NEEDED (xprecision))
+	  if (small_xprecision && len == xblocks_needed)
 	    val[len - 1] = zext_hwi (val[len - 1], small_xprecision);
 	  else if (val[len - 1] < 0)
 	    {
-	      while (len < BLOCKS_NEEDED (xprecision))
+	      while (len < xblocks_needed)
 		val[len++] = -1;
 	      if (small_xprecision)
 		val[len - 1] = zext_hwi (val[len - 1], small_xprecision);
@@ -332,7 +331,7 @@ wi::force_to_size (HOST_WIDE_INT *val, c
 	}
       else
 	{
-	  if (small_xprecision && len == BLOCKS_NEEDED (xprecision))
+	  if (small_xprecision && len
[wide-int] Fix some division cases
divmod_internal didn't handle unsigned division in which the inputs have implicit all-one upper bits. There were two problems:
- wi_unpack should extend implicit 1s to index blocks_needed - 1 (possibly with a zext_hwi on the last block for small_prec) regardless of signedness
- when calculating m and n, divmod_internal only used the *_blocks_needed value if the division was signed. Neither value is really signed in this context, since we've already calculated absolute values if sgn == SIGNED, so the neg_p (and its pre-wi:: incarnation) would only have held for the minimum integer. Using a simple while loop to find the top half-HWI seems more direct.
Also, divmod_internal_2 took n and m as unsigned values but used HOST_WIDE_INT iterators on m - n. m can be less than n for things like 0 / big, and the negative m - n was being treated as an unsigned int and zero-extended to HOST_WIDE_INT. The patch fixes both the types of n and m and the types of the iterators. Tested on x86_64-linux-gnu and powerpc64-linux-gnu. OK to install? Thanks, Richard
Index: gcc/wide-int.cc
===================================================================
--- gcc/wide-int.cc	2014-05-02 16:28:09.560842845 +0100
+++ gcc/wide-int.cc	2014-05-02 16:40:40.002916374 +0100
@@ -1126,8 +1126,7 @@ wi::add_large (HOST_WIDE_INT *val, const
    HOST_HALF_WIDE_INTs of RESULT.  The rest of RESULT is filled by
    uncompressing the top bit of INPUT[IN_LEN - 1].
*/ static void -wi_unpack (unsigned HOST_HALF_WIDE_INT *result, - const unsigned HOST_WIDE_INT *input, +wi_unpack (unsigned HOST_HALF_WIDE_INT *result, const HOST_WIDE_INT *input, unsigned int in_len, unsigned int out_len, unsigned int prec, signop sgn) { @@ -1145,20 +1144,24 @@ wi_unpack (unsigned HOST_HALF_WIDE_INT * else mask = 0; - for (i = 0; i in_len; i++) + for (i = 0; i blocks_needed - 1; i++) { - HOST_WIDE_INT x = input[i]; - if (i == blocks_needed - 1 small_prec) - { - if (sgn == SIGNED) - x = sext_hwi (x, small_prec); - else - x = zext_hwi (x, small_prec); - } + HOST_WIDE_INT x = safe_uhwi (input, in_len, i); result[j++] = x; result[j++] = x HOST_BITS_PER_HALF_WIDE_INT; } + HOST_WIDE_INT x = safe_uhwi (input, in_len, i); + if (small_prec) +{ + if (sgn == SIGNED) + x = sext_hwi (x, small_prec); + else + x = zext_hwi (x, small_prec); +} + result[j++] = x; + result[j++] = x HOST_BITS_PER_HALF_WIDE_INT; + /* Smear the sign bit. */ while (j out_len) result[j++] = mask; @@ -1317,10 +1320,8 @@ wi::mul_internal (HOST_WIDE_INT *val, co } /* We do unsigned mul and then correct it. */ - wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len, -half_blocks_needed, prec, SIGNED); - wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len, -half_blocks_needed, prec, SIGNED); + wi_unpack (u, op1val, op1len, half_blocks_needed, prec, SIGNED); + wi_unpack (v, op2val, op2len, half_blocks_needed, prec, SIGNED); /* The 2 is for a full mult. */ memset (r, 0, half_blocks_needed * 2 @@ -1519,7 +1520,7 @@ divmod_internal_2 (unsigned HOST_HALF_WI unsigned HOST_HALF_WIDE_INT *b_remainder, unsigned HOST_HALF_WIDE_INT *b_dividend, unsigned HOST_HALF_WIDE_INT *b_divisor, - unsigned int m, unsigned int n) + int m, int n) { /* The digits are a HOST_HALF_WIDE_INT which the size of half of a HOST_WIDE_INT and stored in the lower bits of each word. This @@ -1530,7 +1531,8 @@ divmod_internal_2 (unsigned HOST_HALF_WI unsigned HOST_WIDE_INT qhat; /* Estimate of quotient digit. 
*/ unsigned HOST_WIDE_INT rhat; /* A remainder. */ unsigned HOST_WIDE_INT p; /* Product of two digits. */ - HOST_WIDE_INT s, i, j, t, k; + HOST_WIDE_INT t, k; + int i, j, s; /* Single digit divisor. */ if (n == 1) @@ -1757,26 +1759,17 @@ wi::divmod_internal (HOST_WIDE_INT *quot } } - wi_unpack (b_dividend, (const unsigned HOST_WIDE_INT *) dividend.get_val (), -dividend.get_len (), dividend_blocks_needed, dividend_prec, sgn); - wi_unpack (b_divisor, (const unsigned HOST_WIDE_INT *) divisor.get_val (), -divisor.get_len (), divisor_blocks_needed, divisor_prec, sgn); - - if (wi::neg_p (dividend, sgn)) -m = dividend_blocks_needed; - else -m = 2 * dividend.get_len (); + wi_unpack (b_dividend, dividend.get_val (), dividend.get_len (), +dividend_blocks_needed, dividend_prec, sgn); + wi_unpack (b_divisor, divisor.get_val (), divisor.get_len (), +divisor_blocks_needed, divisor_prec, sgn); + + m = dividend_blocks_needed; + while (m 1 b_dividend[m - 1] == 0) +m--; - if (wi::neg_p (divisor, sgn)) -n = divisor_blocks_needed; - else -n = 2 *
Re: [wide-int 3/8] Add and use udiv_ceil
Richard Sandiford rdsandif...@googlemail.com writes: Just a minor tweak to avoid several calculations when one would do. Since we have a function for rounded-up division, we might as well use it instead of the (X + Y - 1) / Y idiom. Tested on x86_64-linux-gnu. OK to install? I ended up reverting this. I'd assumed that T would be nonnegative, but it turns out that there are cases where T itself is negative but T + ALIGN - 1 is nonnegative. In those cases the effect of this patch was to do an unsigned division on the negative (i.e. very large positive) number, add 1 to the large result, and then multiply the large result by ALIGN to get zero. Using a ubsan bootstrap after this change showed up some bugs in the divmod code; the symptoms were non-fatal overflow warnings during the bootstrap itself. I sent a separate patch for that and made sure that the failures were gone. With that fixed, the udiv_ceil version gave the same result as the original, but it's a non-optimisation after all. Thanks, Richard
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	2014-04-22 20:31:25.187000808 +0100
+++ gcc/dwarf2out.c	2014-04-22 20:31:26.374009366 +0100
@@ -14824,7 +14824,7 @@ simple_decl_align_in_bits (const_tree de
 static inline offset_int
 round_up_to_align (const offset_int &t, unsigned int align)
 {
-  return wi::udiv_trunc (t + align - 1, align) * align;
+  return wi::udiv_ceil (t, align) * align;
 }
 
 /* Given a pointer to a FIELD_DECL, compute and return the byte offset of the
Index: gcc/wide-int.h
===================================================================
--- gcc/wide-int.h	2014-04-22 20:31:25.842005530 +0100
+++ gcc/wide-int.h	2014-04-22 20:31:26.375009373 +0100
@@ -521,6 +521,7 @@ #define SHIFT_FUNCTION \
   BINARY_FUNCTION udiv_floor (const T1 &, const T2 &);
   BINARY_FUNCTION sdiv_floor (const T1 &, const T2 &);
   BINARY_FUNCTION div_ceil (const T1 &, const T2 &, signop, bool * = 0);
+  BINARY_FUNCTION udiv_ceil (const T1 &, const T2 &);
   BINARY_FUNCTION div_round (const T1 &, const T2 &, signop, bool * = 0);
   BINARY_FUNCTION divmod_trunc (const T1 &, const T2 &, signop,
				 WI_BINARY_RESULT (T1, T2) *);
@@ -2566,6 +2567,13 @@ wi::div_ceil (const T1 &x, const T2 &y,
   return quotient;
 }
 
+template <typename T1, typename T2>
+inline WI_BINARY_RESULT (T1, T2)
+wi::udiv_ceil (const T1 &x, const T2 &y)
+{
+  return div_ceil (x, y, UNSIGNED);
+}
+
 /* Return X / Y, rouding towards nearest with ties away from zero. Treat X and Y as having the signedness given by SGN. Indicate in *OVERFLOW if the result overflows. */
C++ PATCH for c++/60992 (lambda and constant variable)
My change to avoid building a dummy variable in a nested function turns out to have been too aggressive; in this case, we have a reference to a local const variable in the type of an array that we're capturing. It seems safe enough to make dummy copies of const and static variables when instantiating a lambda operator, so let's do that. Tested x86_64-pc-linux-gnu, applying to trunk and 4.9. commit 7638a1fc9626c71c8665263dba6703b007293f92 Author: Jason Merrill ja...@redhat.com Date: Wed Apr 30 13:58:09 2014 -0400 PR c++/60992 * lambda.c (lambda_capture_field_type): Wrap anything dependent other than 'this'. (add_capture): Check for VLA before calling it. * semantics.c (is_this_parameter): Accept any 'this' parameter, not just the current one. Make non-static. * cp-tree.h: Declare it. * pt.c (tsubst_copy) [VAR_DECL]: Also build a new VAR_DECL if the operand was static or constant. diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 55ecc4e..34d3d20 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -5773,6 +5773,7 @@ extern bool is_sub_constant_expr (tree); extern bool reduced_constant_expression_p (tree); extern void explain_invalid_constexpr_fn (tree); extern vectree cx_error_context (void); +extern bool is_this_parameter (tree); enum { BCS_NO_SCOPE = 1, diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c index 0b8b46a..5ba6f14 100644 --- a/gcc/cp/lambda.c +++ b/gcc/cp/lambda.c @@ -216,8 +216,8 @@ lambda_capture_field_type (tree expr, bool explicit_init_p) } else type = non_reference (unlowered_expr_type (expr)); - if (!type || WILDCARD_TYPE_P (type) || type_uses_auto (type) - || DECL_PACK_P (expr)) + if (type_dependent_expression_p (expr) + !is_this_parameter (tree_strip_nop_conversions (expr))) { type = cxx_make_type (DECLTYPE_TYPE); DECLTYPE_TYPE_EXPR (type) = expr; @@ -455,7 +455,7 @@ add_capture (tree lambda, tree id, tree orig_init, bool by_reference_p, if (TREE_CODE (initializer) == TREE_LIST) initializer = build_x_compound_expr_from_list (initializer, 
ELK_INIT, tf_warning_or_error); - type = lambda_capture_field_type (initializer, explicit_init_p); + type = TREE_TYPE (initializer); if (array_of_runtime_bound_p (type)) { vla = true; @@ -482,15 +482,19 @@ add_capture (tree lambda, tree id, tree orig_init, bool by_reference_p, variable size, TREE_TYPE (type)); type = error_mark_node; } - else if (by_reference_p) + else { - type = build_reference_type (type); - if (!real_lvalue_p (initializer)) - error (cannot capture %qE by reference, initializer); + type = lambda_capture_field_type (initializer, explicit_init_p); + if (by_reference_p) + { + type = build_reference_type (type); + if (!real_lvalue_p (initializer)) + error (cannot capture %qE by reference, initializer); + } + else + /* Capture by copy requires a complete type. */ + type = complete_type (type); } - else -/* Capture by copy requires a complete type. */ -type = complete_type (type); /* Add __ to the beginning of the field name so that user code won't find the field with name lookup. We can't just leave the name diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 48cc2a9..1584eb9 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -12629,13 +12629,17 @@ tsubst_copy (tree t, tree args, tsubst_flags_t complain, tree in_decl) } else { - /* This can happen for a variable used in a late-specified - return type of a local lambda. Just make a dummy decl - since it's only used for its type. */ - if (cp_unevaluated_operand) - return tsubst_decl (t, args, complain); - gcc_assert (errorcount || sorrycount); - return error_mark_node; + /* This can happen for a variable used in a + late-specified return type of a local lambda, or for a + local static or constant. Building a new VAR_DECL + should be OK in all those cases. */ + r = tsubst_decl (t, args, complain); + if (decl_constant_var_p (r)) + /* A use of a local constant must decay to its value. 
*/ + return integral_constant_value (r); + gcc_assert (cp_unevaluated_operand || TREE_STATIC (r) + || errorcount || sorrycount); + return r; } } } diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 3f8ca44..4afb821 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -8155,10 +8155,11 @@ maybe_initialize_constexpr_call_table (void) /* Return true if T designates the implied `this' parameter. */ -static inline bool +bool is_this_parameter (tree t) { - return t == current_class_ptr; + return (TREE_CODE (t) == PARM_DECL + DECL_NAME (t) == this_identifier); } /* We have an expression tree T that represents a call, either CALL_EXPR diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-const3.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-const3.C new file mode
Re: Contributing new gcc targets: i386-*-dragonfly and x86-64-*-dragonfly
On Fri, 2 May 2014, John Marino wrote: 1) I don't know which type definitions are missing (iow, the important ones from sys/type.h that are required to build gcc) The default presumption should be: * stddef.h from GCC provides what it needs to provide; nothing extra is needed and such a #include should not be needed at all. * Special measures to avoid duplicate typedefs (where some other header also defines one of the typedefs defined in stddef.h) aren't in fact needed, because GCC allows duplicate typedefs in system headers (even outside C11 mode - in C11 mode it's a standard feature). So try removing that #include. If that causes problems, investigate based on the actual problems seen. -- Joseph S. Myers jos...@codesourcery.com
Re: Contributing new gcc targets: i386-*-dragonfly and x86-64-*-dragonfly
On 5/2/2014 22:15, Joseph S. Myers wrote: On Fri, 2 May 2014, John Marino wrote: 1) I don't know which type definitions are missing (iow, the important ones from sys/type.h that are required to build gcc) The default presumption should be: * stddef.h from GCC provides what it needs to provide; nothing extra is needed and such a #include should not be needed at all. * Special measures to avoid duplicate typedefs (where some other header also defines one of the typedefs defined in stddef.h) aren't in fact needed, because GCC allows duplicate typedefs in system headers (even outside C11 mode - in C11 mode it's a standard feature). So try removing that #include. If that causes problems, investigate based on the actual problems seen. Okay, will do. Thanks!
Re: debug container patch
Hi Jonathan, I just wanted to make sure that you are aware that I preferred to wait for another validation of the small modification I have done. François On 28/04/2014 23:07, François Dumont wrote: On 27/04/2014 15:39, Jonathan Wakely wrote: On 17/04/14 22:43 +0200, François Dumont wrote: Hi Here is a patch to globally enhance the debug containers implementation. François, sorry for the delay; this is a large patch and I wanted to give it the time it deserves to review properly. No problem, I see that there are a lot of proposals lately. I understand why this is needed, but it changes the layout of the classes in very significant ways, meaning the debug containers will not be compatible across GCC releases. I'm OK with that now, but from the next major GCC release I'd like to avoid that in future. I remember Paolo saying that there is no ABI guarantee for debug mode; this is why I didn't hesitate to make this proposal. Will there be one in the future? I also plan breaking changes for profile mode to fix its very bad performance. I also noticed that in std/c++11/debug.cc we have some methods qualified with noexcept while in C++03 user code those methods will have a throw() qualification. Is that fine? As I said in my last mail, yes, those specifications are compatible. But I don't think your changes are doing what you think they are doing in all cases. Using _GLIBCXX_NOEXCEPT does not expand to throw() in C++03 mode; it expands to nothing. Yes, I discovered this difference in one of your recent mails. * include/debug/safe_unordered_container.tcc N.B. This file has no changes listed in the changelog entry. I reviewed the ChangeLog and limited modifications like in this file. Note however that the patch has been generated with the '-x -b' option to hide whitespace modifications. I cleaned up the use of whitespace in the impacted files, replaced some spaces with tabs and removed useless whitespace.
@@ -69,8 +75,26 @@ // 23.2.1.1 construct/copy/destroy: - deque() : _Base() { } +#if __cplusplus 201103L + deque() + : _Base() { } + deque(const deque __x) + : _Base(__x) { } + + ~deque() _GLIBCXX_NOEXCEPT { } In C++03 mode the _GLIBCXX_NOEXCEPT macro expands to an empty string, so it is useless in this chunk of code, which is only compiled for C++03 mode. It should probably just be removed here (and in all the other debug containers which use it in C++03-only code). Ok, I cleaned those. Did you mean removing the whole explicit destructor ? Is it a coding Standard to always explicitly implement the destructor or just a way to have Doxygen generate ? + * before-begin ownership.*/ + templatetypename _SafeSequence +void +_Safe_forward_list_SafeSequence:: +_M_swap(_Safe_sequence_base __other) noexcept +{ + __gnu_cxx::__scoped_lock sentry(_M_this()._M_get_mutex()); Shouldn't we be locking both containers' mutexes here? As we do in src/c++11/debug.cc Good point, not a regression but nice to fix in this patch. + forward_list(forward_list __list, const allocator_type __al) +: _Safe(std::move(__list), __al), + _Base(std::move(__list), __al) + { } This makes me feel uneasy, seeing a moved-from object being used again, but I don't think changing it to use static_casts to the two base classes would look better, so let's leave it like that. That indeed looks scary, we replaced with: forward_list(forward_list __list, const allocator_type __al) : _Safe(std::move(__list._M_safe()), __al), _Base(std::move(__list._M_base()), __al) { } it makes clearer the fact that we move each part. 
Index: include/debug/safe_base.h === --- include/debug/safe_base.h(revision 209446) +++ include/debug/safe_base.h(working copy) @@ -188,22 +188,18 @@ protected: // Initialize with a version number of 1 and no iterators -_Safe_sequence_base() +_Safe_sequence_base() _GLIBCXX_NOEXCEPT This use of _GLIBCXX_NOEXCEPT are correct, if the intention is to be noexcept in C++11 and have no exception specification in C++98/C++03. Yes, I preferred to use default implementation for special function in C++11 so I qualified as many things as possible noexcept so that resulting noexcept qualification depends only on the normal mode noexcept qualification. : _M_iterators(0), _M_const_iterators(0), _M_version(1) { } #if __cplusplus = 201103L _Safe_sequence_base(const _Safe_sequence_base) noexcept : _Safe_sequence_base() { } - -_Safe_sequence_base(_Safe_sequence_base __x) noexcept - : _Safe_sequence_base() -{ _M_swap(__x); } #endif /** Notify all iterators that reference this sequence that the sequence is being destroyed. */ -~_Safe_sequence_base() +~_Safe_sequence_base() _GLIBCXX_NOEXCEPT This is redundant. In C++03 the macro expands to nothing, and in
Re: [C++ Patch/RFC] PR 58582
Hi, On 05/02/2014 08:05 PM, Jason Merrill wrote: I'd tweak instantiate_decl differently; a deleted function is defined, so pattern_defined should be true. I see, thanks. Simply setting pattern_defined and nothing else appears to work very well. The only minor annoyance is that DECL_STRUCT_FUNCTION can be null and we have to check for that before using it (in fact, we do that in 2 other places too). Tested x86_64-linux. Thanks again, Paolo. /// Index: cp/decl.c === --- cp/decl.c (revision 210004) +++ cp/decl.c (working copy) @@ -7822,6 +7822,8 @@ grokfndecl (tree ctype, decl, ctype); return NULL_TREE; } + if (ok == error_mark_node) + return NULL_TREE; return old_decl; } } Index: cp/pt.c === --- cp/pt.c (revision 210004) +++ cp/pt.c (working copy) @@ -19616,7 +19617,8 @@ instantiate_decl (tree d, int defer_ok, if (TREE_CODE (d) == FUNCTION_DECL) pattern_defined = (DECL_SAVED_TREE (code_pattern) != NULL_TREE - || DECL_DEFAULTED_OUTSIDE_CLASS_P (code_pattern)); + || DECL_DEFAULTED_OUTSIDE_CLASS_P (code_pattern) + || DECL_DELETED_FN (code_pattern)); else pattern_defined = ! DECL_IN_AGGR_P (code_pattern); @@ -19858,14 +19860,17 @@ instantiate_decl (tree d, int defer_ok, tf_warning_or_error, tmpl, /*integral_constant_expression_p=*/false); - /* Set the current input_location to the end of the function -so that finish_function knows where we are. */ - input_location - = DECL_STRUCT_FUNCTION (code_pattern)->function_end_locus; + if (DECL_STRUCT_FUNCTION (code_pattern)) + { + /* Set the current input_location to the end of the function +so that finish_function knows where we are. */ + input_location + = DECL_STRUCT_FUNCTION (code_pattern)->function_end_locus; - /* Remember if we saw an infinite loop in the template. */ - current_function_infinite_loop - = DECL_STRUCT_FUNCTION (code_pattern)->language->infinite_loop; + /* Remember if we saw an infinite loop in the template. 
*/ + current_function_infinite_loop + = DECL_STRUCT_FUNCTION (code_pattern)->language->infinite_loop; + } } /* We don't need the local specializations any more. */ Index: testsuite/g++.dg/cpp0x/deleted4.C === --- testsuite/g++.dg/cpp0x/deleted4.C (revision 0) +++ testsuite/g++.dg/cpp0x/deleted4.C (working copy) @@ -0,0 +1,11 @@ +// PR c++/58582 +// { dg-do compile { target c++11 } } + +struct A +{ + template<int> void foo() = delete; +}; + +template<int> void A::foo() { int i; } // { dg-error "redefinition" } + +template void A::foo<0>(); Index: testsuite/g++.dg/cpp0x/deleted5.C === --- testsuite/g++.dg/cpp0x/deleted5.C (revision 0) +++ testsuite/g++.dg/cpp0x/deleted5.C (working copy) @@ -0,0 +1,11 @@ +// PR c++/58582 +// { dg-do compile { target c++11 } } + +struct A +{ + template<int> void foo() = delete; +}; + +template<int> void A::foo() {} // { dg-error "redefinition" } + +template void A::foo<0>(); Index: testsuite/g++.dg/cpp0x/deleted6.C === --- testsuite/g++.dg/cpp0x/deleted6.C (revision 0) +++ testsuite/g++.dg/cpp0x/deleted6.C (working copy) @@ -0,0 +1,9 @@ +// PR c++/58582 +// { dg-do compile { target c++11 } } + +struct A +{ + template<int> void foo() = delete; +}; + +template void A::foo<0>();
[PATCH 0/3] Compile-time gimple checking, without typedefs
This patch series demonstrates a way of reimplementing the 89-patch series: [PATCH 00/89] Compile-time gimple-checking http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01148.html whilst avoiding introducing a pair of gimple_foo/const_gimple_foo typedefs for each subclass. It eliminates the gimple and const_gimple typedefs, renaming gimple_statement_base to gimple_stmt, giving types: gimple_stmt * and const gimple_stmt * throughout the middle-end. The rest of the gimple statement classes are renamed, converting the various gimple_statement_with_FOO to: gimple_stmt_with_FOO and the remainder: gimple_statement_SOME_SUBCLASS to just: gimple_SOME_SUBCLASS The idea is then to reimplement the earlier patch series, porting many of these: gimple_stmt *something to point to some more concrete subclass; I've done this for GIMPLE_SWITCH. It requires two patches that I've already posted separately: (A): [PATCH] gengtype: Support explicit pointers in template arguments: http://gcc.gnu.org/ml/gcc-patches/2014-05/msg3.html (which apparently will need reworking after wide-int is merged; oh well). (B): [PATCH 19/89] Const-correctness of gimple_call_builtin_p: http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01194.html (I have a separate bootstrap & regrtest in progress for just this one, which appears to be pre-approved per Jeff's earlier comments). Of the 3 patches in the series itself: Patch 1: This one is handwritten: it renames the gimple statement classes in their declarations, in the docs, in the selftests and renames explicit uses of *subclasses*. Patch 2: This one is autogenerated: it does the mass conversion of gimple to gimple_stmt * and const_gimple to const gimple_stmt * throughout the code. The conversion script is at: https://github.com/davidmalcolm/gcc-refactoring-scripts/blob/master/rename_gimple.py It's not a real C++ parser, but it has some smarts for handling awkward cases like: gimple def0, def2; which must become: gimple_stmt *def0, *def2; ^ note the second '*' character. 
You can see the selftests for the conversion script at: https://github.com/davidmalcolm/gcc-refactoring-scripts/blob/master/test_rename_gimple.py Patch 3: This one is a port of the previously-posted: [PATCH 02/89] Introduce gimple_switch and use it in various places http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01154.html to the new approach. As per an earlier revision: http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01549.html it *eliminates* the stmt->as_a_gimple_switch () and stmt->dyn_cast_gimple_switch () casting methods from the base class in favor of direct usage of is-a.h: as_a <gimple_switch *> (stmt) and dyn_cast <gimple_switch *> (stmt) This version of the patch also eliminates the gimple_switch / const_gimple_switch typedefs from the initial patch series in favor of gimple_switch being the name of the subclass. Hopefully porting this first patch in the series gives a sense of what the end-result would look like. There are a few possible variations on the names we could pick in this approach. I deliberately renamed gimple to gimple_stmt since IIRC Andrew MacLeod had suggested something like this, for the potential of making gimple be a namespace. That said, using namespaces may be controversial, and would need further gengtype support, which may be tricky (i.e. fully teach gengtype about C++ namespaces, which may be a gengtype hack too far). [I'd much prefer to avoid C++ namespaces in the core code, mostly because of gengtype]. Also, AIUI, Andrew is looking at introducing concepts of gimple types and gimple expressions, so gimple may no longer imply a *statement*. Alternatively, we could make the base class be just gimple (which would be more consistent with the names of the accessor functions). There's also the bargain basement namespaces approach, where we don't have an implicit gimple namespace, but just *pretend* we do, and rename the base type to stmt, with e.g. gimple_statement_phi becoming just phi. 
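The is-a.h usage being argued for can be sketched with a toy hierarchy. The class names and the helper below are illustrative stand-ins, not GCC's actual is-a.h machinery:

```cpp
#include <cassert>
#include <cstddef>

enum toy_code { TOY_SWITCH, TOY_OTHER };

struct toy_stmt { toy_code code; };
struct toy_switch : toy_stmt { unsigned num_labels; };

// Minimal stand-in for gcc's is_a_helper<T>::test specializations:
// a per-subclass predicate keyed on the statement code.
template <typename T> bool toy_is_a_test (const toy_stmt *);

template <> inline bool
toy_is_a_test<toy_switch *> (const toy_stmt *gs)
{
  return gs->code == TOY_SWITCH;
}

// as_a: checked downcast; here it asserts on a wrong cast, the way
// GCC's checked build aborts.
template <typename T> T as_a (toy_stmt *gs)
{
  assert (toy_is_a_test<T> (gs));
  return static_cast<T> (gs);
}

// dyn_cast: returns a null pointer instead of aborting on failure.
template <typename T> T dyn_cast (toy_stmt *gs)
{
  return toy_is_a_test<T> (gs) ? static_cast<T> (gs) : NULL;
}
```

The design point is that the casting logic lives in standalone templates rather than as member functions on the base class, so the base class need not know about its subclasses.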
[gimple_switch would need to become switch_, say, to avoid the reserved word]. Successfully bootstrapped and regrtested on x86_64-unknown-linux-gnu (Fedora 20) (on top of the 2 dependent patches mentioned above; equal results compared to a control build of r209953). How does this look? (for trunk; after 4.9.1 is released). David Malcolm (3): Handwritten part of conversion of gimple to gimple_stmt * Autogenerated part of conversion of gimple to gimple_stmt * Introduce gimple_switch and use it in various places gcc/asan.c | 36 +- gcc/builtins.c | 10 +- gcc/builtins.h |2 +- gcc/c-family/c-gimplify.c|4 +- gcc/calls.c
[PATCH 3/3] Introduce gimple_switch and use it in various places
gcc/ * gimple.h (gimple_switch): New subclass of gimple_stmt_with_ops, adding the invariant that stmt->code == GIMPLE_SWITCH. (is_a_helper <gimple_switch *>::test): New. (is_a_helper <const gimple_switch *>::test): New. (gimple_try): Likewise. * coretypes.h (gimple_switch): Add forward declaration here. * gdbhooks.py (build_pretty_printer): Add gimple_switch, using the gimple printer. * gimple.c (gimple_build_switch_nlabels): Return a gimple_switch rather than just a gimple. (gimple_build_switch): Likewise. * gimple.h (gimple_build_switch_nlabels): Likewise. (gimple_build_switch): Likewise. * gimple.h (gimple_switch_num_labels): Update type-signature to require a gimple_switch rather than just a gimple. (gimple_switch_set_num_labels): Likewise. (gimple_switch_set_index): Likewise. (gimple_switch_label): Likewise. (gimple_switch_set_label): Likewise. (gimple_switch_default_label): Likewise. (gimple_switch_set_default_label): Likewise. * expr.h (expand_case): Likewise. * gimple-pretty-print.c (dump_gimple_call): Likewise. * stmt.c (compute_cases_per_edge): Likewise. (expand_case): Likewise. * tree-cfg.h (group_case_labels_stmt): Likewise. * tree-cfg.c (make_gimple_switch_edges): Likewise. (find_taken_edge_switch_expr): Likewise. (find_case_label_for_value): Likewise. (get_cases_for_edge): Likewise. (group_case_labels_stmt): Likewise. (verify_gimple_switch): Likewise. * tree-eh.c (verify_norecord_switch_expr): Likewise. * tree-eh.c (lower_eh_constructs_2): Likewise. * tree-loop-distribution.c (generate_loops_for_partition): Likewise. * tree-ssa-dom.c (record_edge_info): Likewise. * tree-ssa-forwprop.c (simplify_gimple_switch_label_vec): Likewise. (simplify_gimple_switch): Likewise. * tree-switch-conversion.c (emit_case_bit_tests): Likewise. (collect_switch_conv_info): Likewise. (build_constructors): Likewise. (array_value_type): Likewise. (build_one_array): Likewise. (build_arrays): Likewise. (gen_inbound_check): Likewise. * tree-vrp.c (find_switch_asserts): Likewise. 
(find_case_label_range): Likewise. (find_case_label_ranges): Likewise. (vrp_visit_switch_stmt): Likewise. (simplify_switch_using_ranges): Likewise. * tree-vrp.c (switch_update): Strengthen field stmt from being merely a gimple_stmt * to being a gimple_switch *. * cfgexpand.c (expand_gimple_stmt_1): Add checked cast to gimple_switch in regions where the stmt code has been tested as GIMPLE_SWITCH. * gimple-pretty-print.c (pp_gimple_stmt_1): Likewise. * tree-cfg.c (make_edges): Likewise. (end_recording_case_labels): Likewise. (cleanup_dead_labels): Likewise. (cleanup_dead_labels): Likewise. (group_case_labels): Likewise. (find_taken_edge): Likewise. (find_case_label_for_value): Likewise. (verify_gimple_stmt): Likewise. (gimple_verify_flow_info): Likewise. (gimple_redirect_edge_and_branch): Likewise. * tree-inline.c (estimate_num_insns): Likewise. * tree-ssa-forwprop.c (ssa_forward_propagate_and_combine): Likewise. * tree-ssa-uncprop.c (associate_equivalences_with_edges): Likewise. * tree-switch-conversion.c (do_switchconv): Likewise. * tree-vrp.c (find_assert_locations_1): Likewise. (vrp_visit_stmt): Likewise. (simplify_stmt_using_ranges): Likewise. * ipa-inline-analysis.c (set_switch_stmt_execution_predicate): Introduce local lastg as a generic gimple, so that local last can be of type gimple_switch once lastg's code has been verified. * omp-low.c (diagnose_sb_2): Introduce switch_stmt local to handle the GIMPLE_SWITCH case. * tree-cfg.c (find_taken_edge_switch_expr): Add gimple_switch argument, since the caller (find_taken_edge) has checked that last_stmt is a switch. * doc/gimple.texi (Class hierarchy of GIMPLE statements): Add gimple_switch class. (GIMPLE_SWITCH): Update signatures of accessor functions to reflect above gimple_stmt to gimple_switch changes. 
--- gcc/cfgexpand.c | 2 +- gcc/coretypes.h | 5 gcc/doc/gimple.texi | 34 ++--- gcc/expr.h | 2 +- gcc/gdbhooks.py | 4 ++- gcc/gimple-pretty-print.c| 4 +-- gcc/gimple.c | 11 + gcc/gimple.h | 44 ++--- gcc/ipa-inline-analysis.c| 7 +++--- gcc/omp-low.c| 5 ++-- gcc/stmt.c | 4 +-- gcc/tree-cfg.c |
[PATCH, RS6000] Fix __builtin_{pack,unpack}_longdouble and __builtin_{pack,unpack}_dec128 builtins
Currently, the gcc.target/powerpc/pack0[23].c test cases are failing on trunk and the 4.9 and 4.8 branches. The problem is we either fail to register the builtins, leading to undefined reference errors to the __builtin_* symbol, or we ICE due to having registered the builtin, but due to compiler options, rs6000_expand_builtin doesn't think we should be expanding the builtin. I fixed the new pack/unpack long double builtins by enabling them for TARGET_HARD_FLOAT and easing the gcc_assert in rs6000_expand_builtin(). The pack/unpack DFP builtins just required passing -mhard-dfp as an option to the test case. I also reduced the dg-require-effective-target, since the test case works just fine on non-vsx POWER6 hardware. In the process, I noticed we didn't disallow -msoft-float -mhard-dfp being used together, so we enforce that now. I'll note this is a precursor patch to using the __builtin_pack_longdouble builtin in libgcc/config/rs6000/ibm-ldouble.c, which currently uses code that copies data through a union. The code it produces is horrific, as it stores the two double fprs to memory, loads them into gprs, then stores them back to memory before loading them back into fprs again. The builtin will also allow us to get a simple fmr(s). This passed bootstrap and regtesting on powerpc64-linux with no regressions and the pack0[23].c tests now pass. Ok for trunk? Is this also ok for the 4.8/4.9 branches once my testing on those is complete? Peter gcc/ * config/rs6000/rs6000.h (RS6000_BTM_HARD_FLOAT): New define. (RS6000_BTM_COMMON): Add RS6000_BTM_HARD_FLOAT. (TARGET_EXTRA_BUILTINS): Add TARGET_HARD_FLOAT. * config/rs6000/rs6000-builtin.def (BU_MISC_1): Use RS6000_BTM_HARD_FLOAT. (BU_MISC_2): Likewise. * config/rs6000/rs6000.c (rs6000_builtin_mask_calculate): Handle RS6000_BTM_HARD_FLOAT. (rs6000_option_override_internal): Enforce -mhard-float if -mhard-dfp is explicitly used. (rs6000_invalid_builtin): Add hard floating builtin support. 
(rs6000_expand_builtin): Relax the gcc_assert to allow the new hard float builtins. (rs6000_builtin_mask_names): Add RS6000_BTM_HARD_FLOAT. gcc/testsuite/ * gcc.target/powerpc/pack02.c (dg-options): Add -mhard-float. (dg-require-effective-target): Change target to powerpc_fprs. * gcc.target/powerpc/pack03.c (dg-options): Add -mhard-dfp. (dg-require-effective-target): Change target to dfprt. Index: gcc/config/rs6000/rs6000.h === --- gcc/config/rs6000/rs6000.h (revision 209990) +++ gcc/config/rs6000/rs6000.h (working copy) @@ -624,7 +624,8 @@ extern int rs6000_vector_align[]; || TARGET_CMPB /* ISA 2.05 */ \ || TARGET_POPCNTD /* ISA 2.06 */ \ || TARGET_ALTIVEC \ - || TARGET_VSX))) + || TARGET_VSX \ + || TARGET_HARD_FLOAT))) /* E500 cores only support plain sync, not lwsync. */ #define TARGET_NO_LWSYNC (rs6000_cpu == PROCESSOR_PPC8540 \ @@ -2517,6 +2518,7 @@ extern int frame_pointer_needed; #define RS6000_BTM_POPCNTD MASK_POPCNTD /* Target supports ISA 2.06. */ #define RS6000_BTM_CELL MASK_FPRND /* Target is cell powerpc. */ #define RS6000_BTM_DFP MASK_DFP /* Decimal floating point. */ +#define RS6000_BTM_HARD_FLOAT MASK_SOFT_FLOAT /* Hardware floating point. */ #define RS6000_BTM_COMMON (RS6000_BTM_ALTIVEC \ | RS6000_BTM_VSX \ @@ -2529,7 +2531,8 @@ extern int frame_pointer_needed; | RS6000_BTM_HTM \ | RS6000_BTM_POPCNTD \ | RS6000_BTM_CELL \ -| RS6000_BTM_DFP) +| RS6000_BTM_DFP \ +| RS6000_BTM_HARD_FLOAT) /* Define builtin enum index. */ Index: gcc/config/rs6000/rs6000-builtin.def === --- gcc/config/rs6000/rs6000-builtin.def (revision 209990) +++ gcc/config/rs6000/rs6000-builtin.def (working copy) @@ -626,7 +626,7 @@ #define BU_MISC_1(ENUM, NAME, ATTR, ICODE) \ RS6000_BUILTIN_2 (MISC_BUILTIN_ ## ENUM, /* ENUM */ \ "__builtin_" NAME, /* NAME */ \ - RS6000_BTM_ALWAYS, /* MASK */ \ + RS6000_BTM_HARD_FLOAT, /* MASK */ \ (RS6000_BTC_ ## ATTR
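The union round-trip criticized above (data travelling through memory instead of a register-to-register fmr) can be sketched generically. This is an assumed shape for illustration only, not the actual libgcc ibm-ldouble.c source, and pack_via_memory/unpack_via_memory are invented names:

```cpp
#include <cassert>
#include <cstring>

// An IBM double-double value is just two doubles: a high part with
// most of the significance and a low part holding the residual.
struct dd_parts { double hi, lo; };

// Packing through memory: both halves are stored and then reloaded,
// which is the store/load shuffle the message complains about; the
// __builtin_pack_longdouble builtin would keep them in FP registers.
static void
pack_via_memory (double hi, double lo, unsigned char out[sizeof (dd_parts)])
{
  dd_parts p;
  p.hi = hi;
  p.lo = lo;
  std::memcpy (out, &p, sizeof p);   // forced trip through memory
}

static dd_parts
unpack_via_memory (const unsigned char in[sizeof (dd_parts)])
{
  dd_parts p;
  std::memcpy (&p, in, sizeof p);
  return p;
}
```

The values survive the round trip, but every pack/unpack pair costs stores and loads that a single register move would avoid.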
Re: [PATCH] Detect a pack-unpack pattern in GCC vectorizer and optimize it.
On Mon, Apr 28, 2014 at 4:04 AM, Richard Biener rguent...@suse.de wrote: On Thu, 24 Apr 2014, Cong Hou wrote: Given the following loop: int a[N]; short b[N*2]; for (int i = 0; i < N; ++i) a[i] = b[i*2]; After being vectorized, the access to b[i*2] will be compiled into several packing statements, while the type promotion from short to int will be compiled into several unpacking statements. With this patch, each pair of pack/unpack statements will be replaced by less expensive statements (with shift or bit-and operations). On x86_64, the loop above will be compiled into the following assembly (with -O2 -ftree-vectorize): movdqu 0x10(%rcx),%xmm3 movdqu -0x20(%rcx),%xmm0 movdqa %xmm0,%xmm2 punpcklwd %xmm3,%xmm0 punpckhwd %xmm3,%xmm2 movdqa %xmm0,%xmm3 punpcklwd %xmm2,%xmm0 punpckhwd %xmm2,%xmm3 movdqa %xmm1,%xmm2 punpcklwd %xmm3,%xmm0 pcmpgtw %xmm0,%xmm2 movdqa %xmm0,%xmm3 punpckhwd %xmm2,%xmm0 punpcklwd %xmm2,%xmm3 movups %xmm0,-0x10(%rdx) movups %xmm3,-0x20(%rdx) With this patch, the generated assembly is shown below: movdqu 0x10(%rcx),%xmm0 movdqu -0x20(%rcx),%xmm1 pslld $0x10,%xmm0 psrad $0x10,%xmm0 pslld $0x10,%xmm1 movups %xmm0,-0x10(%rdx) psrad $0x10,%xmm1 movups %xmm1,-0x20(%rdx) Bootstrapped and tested on x86-64. OK for trunk? This is an odd place to implement such a transform. Also whether it is faster or not depends on the exact ISA you target - for example ppc has constraints on the maximum number of shifts carried out in parallel and the above has 4 in very short succession. Esp. for the sign-extend path. Thank you for the information about ppc. If this is an issue, I think we can do it in a target dependent way. So this looks more like an opportunity for a post-vectorizer transform on RTL or for the vectorizer special-casing widening loads with a vectorizer pattern. I am not sure if the RTL transform is more difficult to implement. I prefer the widening loads method, which can be detected in a pattern recognizer. 
The target-related issue will be resolved by only expanding the widening load on those targets where this pattern is beneficial. But this requires new tree operations to be defined. What is your suggestion? I apologize for the delayed reply. thanks, Cong Richard.
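The shift-based replacement in the generated assembly can be seen in scalar form: after the vector load, each 32-bit lane holds b[2*i] in its low 16 bits and b[2*i+1] in its high 16 bits, and the pslld/psrad pair by 16 leaves sign_extend(b[2*i]). A hedged scalar sketch (the helper name is invented for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Scalar analogue of "pslld $0x10 ; psrad $0x10" on one vector lane:
// shift the low 16-bit element to the top, then arithmetic-shift it
// back down, sign-extending it on the way.  (Right-shifting a negative
// signed value is implementation-defined before C++20, but every
// mainstream compiler implements it as an arithmetic shift.)
static int32_t
extract_even_sext (uint32_t lane)
{
  return (int32_t) (lane << 16) >> 16;
}
```

Two shifts per lane replace the punpcklwd/punpckhwd interleave and pcmpgtw sign-mask dance of the unpatched code.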
Re: [wide-int] Add more assertions
These are fine. On 05/02/2014 03:20 PM, Richard Sandiford wrote: This patch adds some assertions against sext (..., 0) and zext (..., 0). The former is undefined at the sext_hwi level and the latter is disallowed for consistency with the former. Also, set_bit (rightly IMO) can't handle bit >= precision. For precision <= HOST_BITS_PER_WIDE_INT it would invoke undefined behaviour while for other precisions I think it would crash. A case with precision <= HOST_BITS_PER_WIDE_INT showed up in java (fix posted separately). Tested on x86_64-linux-gnu and powerpc64-linux-gnu. OK to install? Thanks, Richard Index: gcc/wide-int.h === --- gcc/wide-int.h 2014-05-02 16:28:09.561842842 +0100 +++ gcc/wide-int.h 2014-05-02 16:44:04.015463718 +0100 @@ -2046,6 +2048 @@ wi::sext (const T &x, unsigned int offse unsigned int precision = get_precision (result); WIDE_INT_REF_FOR (T) xi (x, precision); + gcc_checking_assert (offset != 0); + if (offset <= HOST_BITS_PER_WIDE_INT) { val[0] = sext_hwi (xi.ulow (), offset); @@ -2065,6 +2067,8 @@ wi::zext (const T &x, unsigned int offse unsigned int precision = get_precision (result); WIDE_INT_REF_FOR (T) xi (x, precision); + gcc_checking_assert (offset != 0); + /* This is not just an optimization, it is actually required to maintain canonization. */ if (offset >= precision) @@ -2102,6 +2106,9 @@ wi::set_bit (const T &x, unsigned int bi WI_UNARY_RESULT_VAR (result, val, T, x); unsigned int precision = get_precision (result); WIDE_INT_REF_FOR (T) xi (x, precision); + + gcc_checking_assert (bit < precision); + if (precision <= HOST_BITS_PER_WIDE_INT) { val[0] = xi.ulow () | ((unsigned HOST_WIDE_INT) 1 << bit);
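A scalar sketch of what sign/zero extension from the low OFFSET bits does, and why offset == 0 has to be asserted away. my_sext_hwi/my_zext_hwi are illustrative stand-ins written here, not the real wide-int internals:

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend X from its low OFFSET bits.  With offset == 0 the shift
// count would equal the full 64-bit width, which is undefined
// behaviour -- the motivation for the new gcc_checking_asserts.
static int64_t
my_sext_hwi (int64_t x, unsigned int offset)
{
  assert (offset != 0 && offset <= 64);
  unsigned int shift = 64 - offset;
  // Shift the wanted bits to the top, then arithmetic-shift back down.
  return (int64_t) ((uint64_t) x << shift) >> shift;
}

// Zero-extend X from its low OFFSET bits.
static uint64_t
my_zext_hwi (int64_t x, unsigned int offset)
{
  assert (offset != 0 && offset <= 64);
  if (offset == 64)
    return (uint64_t) x;
  return (uint64_t) x & (((uint64_t) 1 << offset) - 1);
}
```

With offset == 0 the mask construction in the zext path would also shift by zero bits and produce a mask of 0 rather than "keep nothing defined", so disallowing it keeps the two functions consistent.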
Re: [wide-int] Fix some division cases
This is fine. On 05/02/2014 03:22 PM, Richard Sandiford wrote: divmod_internal didn't handle unsigned division in which the inputs have implicit all-one upper bits. There were two problems: - wi_unpack should extend implicit 1s to index blocks_needed - 1 (possibly with a zext_hwi on the last block for small_prec) regardless of signedness - when calculating m and n, divmod_internal only used the *_blocks_needed value if the division was signed. Neither value is really signed in this context, since we've already calculated absolute values if sgn == SIGNED, so the neg_p (and its pre-wi:: incarnation) would only have held for the minimum integer. Using a simple while loop to find the top half-HWI seems more direct. Also, divmod_internal_2 took n and m as unsigned values but used HOST_WIDE_INT iterators on m - n. m can be less than n for things like 0 / big, and the negative m - n was being treated as an unsigned int and zero-extended to HOST_WIDE_INT. The patch fixes both the types of n and m and the types of the iterators. Tested on x86_64-linux-gnu and powerpc64-linux-gnu. OK to install? Thanks, Richard Index: gcc/wide-int.cc === --- gcc/wide-int.cc 2014-05-02 16:28:09.560842845 +0100 +++ gcc/wide-int.cc 2014-05-02 16:40:40.002916374 +0100 @@ -1126,8 +1126,7 @@ wi::add_large (HOST_WIDE_INT *val, const HOST_HALF_WIDE_INTs of RESULT. The rest of RESULT is filled by uncompressing the top bit of INPUT[IN_LEN - 1]. 
*/ static void -wi_unpack (unsigned HOST_HALF_WIDE_INT *result, - const unsigned HOST_WIDE_INT *input, +wi_unpack (unsigned HOST_HALF_WIDE_INT *result, const HOST_WIDE_INT *input, unsigned int in_len, unsigned int out_len, unsigned int prec, signop sgn) { @@ -1145,20 +1144,24 @@ wi_unpack (unsigned HOST_HALF_WIDE_INT * else mask = 0; - for (i = 0; i < in_len; i++) + for (i = 0; i < blocks_needed - 1; i++) { - HOST_WIDE_INT x = input[i]; - if (i == blocks_needed - 1 && small_prec) - { - if (sgn == SIGNED) - x = sext_hwi (x, small_prec); - else - x = zext_hwi (x, small_prec); - } + HOST_WIDE_INT x = safe_uhwi (input, in_len, i); result[j++] = x; result[j++] = x >> HOST_BITS_PER_HALF_WIDE_INT; } + HOST_WIDE_INT x = safe_uhwi (input, in_len, i); + if (small_prec) +{ + if (sgn == SIGNED) + x = sext_hwi (x, small_prec); + else + x = zext_hwi (x, small_prec); +} + result[j++] = x; + result[j++] = x >> HOST_BITS_PER_HALF_WIDE_INT; + /* Smear the sign bit. */ while (j < out_len) result[j++] = mask; @@ -1317,10 +1320,8 @@ wi::mul_internal (HOST_WIDE_INT *val, co } /* We do unsigned mul and then correct it. */ - wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len, -half_blocks_needed, prec, SIGNED); - wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len, -half_blocks_needed, prec, SIGNED); + wi_unpack (u, op1val, op1len, half_blocks_needed, prec, SIGNED); + wi_unpack (v, op2val, op2len, half_blocks_needed, prec, SIGNED); /* The 2 is for a full mult. */ memset (r, 0, half_blocks_needed * 2 @@ -1519,7 +1520,7 @@ divmod_internal_2 (unsigned HOST_HALF_WI unsigned HOST_HALF_WIDE_INT *b_remainder, unsigned HOST_HALF_WIDE_INT *b_dividend, unsigned HOST_HALF_WIDE_INT *b_divisor, - unsigned int m, unsigned int n) + int m, int n) { /* The digits are a HOST_HALF_WIDE_INT which the size of half of a HOST_WIDE_INT and stored in the lower bits of each word. This @@ -1530,7 +1531,8 @@ divmod_internal_2 (unsigned HOST_HALF_WI unsigned HOST_WIDE_INT qhat; /* Estimate of quotient digit. 
*/ unsigned HOST_WIDE_INT rhat; /* A remainder. */ unsigned HOST_WIDE_INT p; /* Product of two digits. */ - HOST_WIDE_INT s, i, j, t, k; + HOST_WIDE_INT t, k; + int i, j, s; /* Single digit divisor. */ if (n == 1) @@ -1757,26 +1759,17 @@ wi::divmod_internal (HOST_WIDE_INT *quot } } - wi_unpack (b_dividend, (const unsigned HOST_WIDE_INT *) dividend.get_val (), -dividend.get_len (), dividend_blocks_needed, dividend_prec, sgn); - wi_unpack (b_divisor, (const unsigned HOST_WIDE_INT *) divisor.get_val (), -divisor.get_len (), divisor_blocks_needed, divisor_prec, sgn); - - if (wi::neg_p (dividend, sgn)) -m = dividend_blocks_needed; - else -m = 2 * dividend.get_len (); + wi_unpack (b_dividend, dividend.get_val (), dividend.get_len (), +dividend_blocks_needed, dividend_prec, sgn); + wi_unpack (b_divisor, divisor.get_val (), divisor.get_len (), +divisor_blocks_needed, divisor_prec, sgn); + + m = dividend_blocks_needed; + while (m > 1 && b_dividend[m - 1] ==
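The half-word unpacking at the heart of wi_unpack can be sketched in isolation. Types and the name are simplified stand-ins; the real code additionally handles the sign smearing and small_prec masking shown in the patch:

```cpp
#include <cassert>
#include <cstdint>

// Split each 64-bit input block into two 32-bit halves, low half
// first -- the digit layout that divmod_internal_2 operates on.
static void
unpack_halves (uint32_t *result, const uint64_t *input, unsigned int in_len)
{
  unsigned int j = 0;
  for (unsigned int i = 0; i < in_len; i++)
    {
      result[j++] = (uint32_t) input[i];           // low half
      result[j++] = (uint32_t) (input[i] >> 32);   // high half
    }
}
```

Working in half-wide digits lets the schoolbook division compute a full-width product of two digits without overflowing a HOST_WIDE_INT.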