Re: [PATCH, S390] Avoid LA with base and index on z13
Robin Dapp wrote: > * config/s390/s390.c (preferred_la_operand_p): Do not use > LA with base and index on z13 or later. The code just before your change reads: /* Avoid LA instructions with index register on z196; it is preferable to use regular add instructions when possible. Starting with zEC12 the la with index register is "uncracked" again. */ if (addr.indx && s390_tune == PROCESSOR_2817_Z196) return false; But on zEC12 LA works pretty much the same as on z13/z14, it is indeed not cracked, but still a 2-cycle instruction when using an index register. So I guess the change really should apply to zEC12 as well, and this could be as simple as changing the above line to: if (addr.indx && s390_tune >= PROCESSOR_2817_Z196) (Note that "addr.base && addr.indx" is the same as just checking for addr.indx, since s390_decompose_address will never fill in *just* an index.) Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Build fail on gthr-simple.h targets (Re: AsyncI/O patch committed)
Nicholas Koenig wrote: > Hello everyone, > > I have committed the async I/O patch as r262978. > > The test cases are in libgomp.fortran for now, maybe that can be changed > later. It looks like this broke building libgfortran on spu, and presumably any platform that uses gthr-simple instead of gthr-posix. The problem is that io/asynch.h unconditionally uses a couple of features that are not provided by gthr-simplex, in particular __gthread_cond_t and __gthread_equal / __gthread_self According to the documentation in gthr.h, the former is only available if __GTHREAD_HAS_COND is defined, and the latter are only available if __GTHREADS_CXX0X is defined. Neither is true for gthr-simple.h. To fix the build error, either libgfortran should only use those features conditionally on those defines, or else the gthr.h logic needs to be changed and (stubs for) those features provided in gthr-simple.h as well. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: Build fail on gthr-single.h targets (Re: AsyncI/O patch committed)
I wrote: > Nicholas Koenig wrote: > > > Hello everyone, > > > > I have committed the async I/O patch as r262978. > > > > The test cases are in libgomp.fortran for now, maybe that can be changed > > later. > > It looks like this broke building libgfortran on spu, and presumably > any platform that uses gthr-simple instead of gthr-posix. The file is called gthr-single.h, not gthr-simple.h ... sorry for the typo. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[PATCH] Fix -ffast-math flags handling inconsistencies
Hello, we've noticed some inconsistencies in how the component flags of -ffast-math are handled, see the discussion on the GCC list starting here: https://gcc.gnu.org/ml/gcc/2020-01/msg00365.html This patch fixes those inconsistencies. Specifically, there are the following changes: 1. Some component flags for -ffast-math are *set* with -ffast-math (changing the default), but are not reset with -fno-fast-math, causing the latter to have surprising results. (Note that since "-ffast-math -fno-fast-math" is short-cut by the driver, you'll only see the surprising results with "-Ofast -fno-fast-math".) This is fixed here by both setting and resetting the flags. This affects the following flags -fcx-limited-range -fexcess-precision= 2. Some component flags for -ffast-math are changed from their default, but are *not* included in the fast_math_flags_set_p test, causing __FAST_MATH__ to remain predefined even when the full set of fast math options is not actually in effect. This is fixed here by adding those flags into the fast_math_flags_set_p test. This affects the following flags: -fcx-limited-range -fassociative-math -freciprocal-math 3. For some math flags, set_fast_math_flags has code that sets their values only to what is already their default. The overall effect of this code is a complete no-op. This patch removes that dead code. This affects the following flags: -frounding-math -fsignaling-nans The overall effect of this patch is that now all component flags of -ffast-math are treated exactly equivalently: * they are set (changed from their default) with -ffast-math * they are reset to their default with -fno-fast-math * __FAST_MATH__ is only defined if the value of the flag matches what -ffast-math would have set it to Tested on s390x-ibm-linux. OK for mainline? Bye, Ulrich ChangeLog: * opts.c (set_fast_math_flags): In the !set case, also reset x_flag_cx_limited_range and x_flag_excess_precision. Remove dead code resetting x_flag_signaling_nans and x_flag_rounding_math. (fast_math_flags_set_p): Also test x_flag_cx_limited_range, x_flag_associative_math, and x_flag_reciprocal_math. diff --git a/gcc/opts.c b/gcc/opts.c index 7affeb4..4452793 100644 --- a/gcc/opts.c +++ b/gcc/opts.c @@ -2850,18 +2850,14 @@ set_fast_math_flags (struct gcc_options *opts, int set) opts->x_flag_finite_math_only = set; if (!opts->frontend_set_flag_errno_math) opts->x_flag_errno_math = !set; - if (set) -{ - if (opts->frontend_set_flag_excess_precision == EXCESS_PRECISION_DEFAULT) - opts->x_flag_excess_precision - = set ? EXCESS_PRECISION_FAST : EXCESS_PRECISION_DEFAULT; - if (!opts->frontend_set_flag_signaling_nans) - opts->x_flag_signaling_nans = 0; - if (!opts->frontend_set_flag_rounding_math) - opts->x_flag_rounding_math = 0; - if (!opts->frontend_set_flag_cx_limited_range) - opts->x_flag_cx_limited_range = 1; -} + if (!opts->frontend_set_flag_cx_limited_range) +opts->x_flag_cx_limited_range = set; + if (!opts->frontend_set_flag_excess_precision) +opts->x_flag_excess_precision + = set ? EXCESS_PRECISION_FAST : EXCESS_PRECISION_DEFAULT; + + // -ffast-math should also reset -fsignaling-nans and -frounding-math, + // but since those are off by default, there's nothing to do for now. } /* When -funsafe-math-optimizations is set the following @@ -2884,10 +2880,13 @@ bool fast_math_flags_set_p (const struct gcc_options *opts) { return (!opts->x_flag_trapping_math + && !opts->x_flag_signed_zeros + && opts->x_flag_associative_math + && opts->x_flag_reciprocal_math && opts->x_flag_unsafe_math_optimizations && opts->x_flag_finite_math_only - && !opts->x_flag_signed_zeros && !opts->x_flag_errno_math + && opts->x_flag_cx_limited_range && opts->x_flag_excess_precision == EXCESS_PRECISION_FAST); } -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] Fix -ffast-math flags handling inconsistencies
Richard Biener wrote: > On Fri, Jan 31, 2020 at 6:01 PM Ulrich Weigand wrote: > > The overall effect of this patch is that now all component flags of > > -ffast-math are treated exactly equivalently: > > * they are set (changed from their default) with -ffast-math > > * they are reset to their default with -fno-fast-math > > * __FAST_MATH__ is only defined if the value of the flag matches > > what -ffast-math would have set it to > > The last part is not obviously correct to me since it doesn't match > documentation which says > > @item -ffast-math > @opindex ffast-math > Sets the options @option{-fno-math-errno}, > @option{-funsafe-math-optimizations}, > @option{-ffinite-math-only}, @option{-fno-rounding-math}, > @option{-fno-signaling-nans}, @option{-fcx-limited-range} and > @option{-fexcess-precision=fast}. > > This option causes the preprocessor macro @code{__FAST_MATH__} to be defined. > > to me this reads as -ffast-math -fexcess-precision=standard defines > __FAST_MATH__. > The only relevant part to define __FAST_MATH__ is specifying -ffast-math, > other > options are not relevant (which of course is contradicted by > implementation - where > I didn't actually follow its history in that respect). So can you > adjust documentation > as to when exactly __FAST_MATH__ is defined? Agreed. This should probably be worded something along the lines of: "Whenever all of those options are set to these values as listed above, the preprocessor macro @code{__FAST_MATH__} will be defined." > > - if (!opts->frontend_set_flag_signaling_nans) > > - opts->x_flag_signaling_nans = 0; > > - if (!opts->frontend_set_flag_rounding_math) > > - opts->x_flag_rounding_math = 0; [...] > > + // -ffast-math should also reset -fsignaling-nans and -frounding-math, > > + // but since those are off by default, there's nothing to do for now. > ... > > but what happens to -fsignalling-nans -ffast-math then? Better leave those > in I'd say. Ah, it seems I was confused about the intended semantics here. I thought that a *more specific* option like -fsignalling-nans was always intended to override a more generic option like -ffast-math, no matter whether it comes before or after it on the command line. Is it instead the case that the intended behavior is the generic option overrides the specific option if the generic option comes later on the command line? In that case, I agree that those should be left in. However, then I think it would be good to add tests for !flag_signaling_nans / !flag_rounding_math to fast_math_flags_set_p, because with these options on, we aren't really in fast-math territory any more ... > Note frontends come into play with what is considered -ffast-math > and -fno-fast-math but below flags are tested irrespectively of that > interpretation. > > Note there's -fcx-fortran-rules similar to -fcx-limited-range but not tested > above. The canonical middle-end "flag" to look at is flag_complex_method. > Somehow -fcx-fortran-rules doesn't come into play at all above but it > affects -fcx-limited-range in another inconsistent way in that > -fcx-limited-range -fcx-fortran-rules and -fcx-fortran-rules > -fcx-limited-range > behave the same (-fcx-fortran-rules takes precedence...). I guess > -fcomplex-method=ENUM should be exposed and -fcx-* made > appropriate aliases here. I agree it would probably be the best if to use a -fcomplex-method=... flag, handled analogously to -fexcess-precision in that it defaults to an explicit DEFAULT value; unless this is overridden by an explicit command line flag, the language front-end can then choose what DEFAULT means for this particular language. However, I'd prefer to not include that change into this patch as well :-) This patch only changes the behavior related to -fcx-limited-range in one very special case (-Ofast -fno-fast-math), where it makes it strictly better. It should not affect the (existing) interaction with -fcx-fortran-rules at all -- so anything to improve this can be done in a separate patch, I think. > You're tapping into a mine-field ;) That's for sure :-) Thanks, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] Fix -ffast-math flags handling inconsistencies
Joseph Myers wrote: > On Fri, 7 Feb 2020, Ulrich Weigand wrote: > > > I thought that a *more specific* option like -fsignalling-nans was always > > intended to override a more generic option like -ffast-math, no matter > > whether it comes before or after it on the command line. > > Yes, that's correct. (There are cases where it's ambiguous which option > is more specific - see what I said in > <https://gcc.gnu.org/ml/gcc/2016-03/msg00245.html> about -Werror= - but I > don't think -ffast-math involves such cases.) I see. However, this does not appear to be implemented in current code. GCC (without my patch) behaves like this (using -fno-finite-math-only as an example instead of -fsignaling-nans, given the separate point about the latter option raised by Segher): gcc -dD -E -x c /dev/null | grep FINITE #define __FINITE_MATH_ONLY__ 0 gcc -dD -E -x c /dev/null -ffast-math | grep FINITE #define __FINITE_MATH_ONLY__ 1 gcc -dD -E -x c /dev/null -ffast-math -fno-finite-math-only | grep FINITE #define __FINITE_MATH_ONLY__ 0 gcc -dD -E -x c /dev/null -fno-finite-math-only -ffast-math | grep FINITE #define __FINITE_MATH_ONLY__ 1 Given the above rule, in the last case __FINITE_MATH_ONLY__ should also be 0, right? I'm not sure how this can be implemented in the current option handling framework. The -ffast-math handling case would have to check whether or not there is any explicit use of -f(no-)finite-math-only; is there any infrastructure to readily perform such a test? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] Fix -ffast-math flags handling inconsistencies
Segher Boessenkool wrote: > On Fri, Feb 07, 2020 at 05:47:32PM +0100, Ulrich Weigand wrote: > > > but what happens to -fsignalling-nans -ffast-math then? Better leave > > > those > > > in I'd say. > > > > Ah, it seems I was confused about the intended semantics here. > > > > I thought that a *more specific* option like -fsignalling-nans was always > > intended to override a more generic option like -ffast-math, no matter > > whether it comes before or after it on the command line. > > -fsignaling-nans is an independent option. Signaling NaNs don't do much > at all when you also have -ffinite-math-only, of course, which says you > don't have *any* NaNs. That's a good point, thanks for pointing this out! I agree that -fsignaling-nans is not a good example then; it really should be independent of -ffast-math. However, there's still one other option that has a similar issue: -frounding-math. This *is* properly related to -ffast-math, in that -ffast-math really ought to imply -fno-rounding-math, but the latter is (currently) also the default in GCC. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] Fix -ffast-math flags handling inconsistencies
Joseph Myers wrote: > On Mon, 10 Feb 2020, Ulrich Weigand wrote: > > I'm not sure how this can be implemented in the current option handling > > framework. The -ffast-math handling case would have to check whether > > or not there is any explicit use of -f(no-)finite-math-only; is there > > any infrastructure to readily perform such a test? > > Pass opts_set as well as opts to set_fast_math_flags, and use it for > checking whether an option was explicitly specified (whether enabled or > disabled) on the command line. Ah, I see. I'll update the patch accordingly. Thanks! I guess I still need to check for the opts->frontend_set_... flags as well -- those are not reflected in opts_set, right? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[PATCH v2] Fix -ffast-math flags handling inconsistencies
: - set_unsafe_math_optimizations_flags (opts, value); + set_unsafe_math_optimizations_flags (opts, opts_set, value); break; case OPT_ffixed_: @@ -2839,44 +2841,44 @@ set_Wstrict_aliasing (struct gcc_options *opts, int onoff) /* The following routines are useful in setting all the flags that -ffast-math and -fno-fast-math imply. */ static void -set_fast_math_flags (struct gcc_options *opts, int set) +set_fast_math_flags (struct gcc_options *opts, +struct gcc_options *opts_set, int set) { - if (!opts->frontend_set_flag_unsafe_math_optimizations) + if (!opts->frontend_set_flag_unsafe_math_optimizations + && !opts_set->x_flag_unsafe_math_optimizations) { opts->x_flag_unsafe_math_optimizations = set; - set_unsafe_math_optimizations_flags (opts, set); + set_unsafe_math_optimizations_flags (opts, opts_set, set); } if (!opts->frontend_set_flag_finite_math_only) -opts->x_flag_finite_math_only = set; +SET_OPTION_IF_UNSET (opts, opts_set, flag_finite_math_only, set); if (!opts->frontend_set_flag_errno_math) -opts->x_flag_errno_math = !set; - if (set) -{ - if (opts->frontend_set_flag_excess_precision == EXCESS_PRECISION_DEFAULT) - opts->x_flag_excess_precision - = set ? EXCESS_PRECISION_FAST : EXCESS_PRECISION_DEFAULT; - if (!opts->frontend_set_flag_signaling_nans) - opts->x_flag_signaling_nans = 0; - if (!opts->frontend_set_flag_rounding_math) - opts->x_flag_rounding_math = 0; - if (!opts->frontend_set_flag_cx_limited_range) - opts->x_flag_cx_limited_range = 1; -} +SET_OPTION_IF_UNSET (opts, opts_set, flag_errno_math, !set); + if (!opts->frontend_set_flag_cx_limited_range) +SET_OPTION_IF_UNSET (opts, opts_set, flag_cx_limited_range, set); + if (!opts->frontend_set_flag_excess_precision) +SET_OPTION_IF_UNSET (opts, opts_set, flag_excess_precision, +set ? EXCESS_PRECISION_FAST +: EXCESS_PRECISION_DEFAULT); + + // -ffast-math should also reset -frounding-math, but since this + // is off by default, there's nothing to do for now. } /* When -funsafe-math-optimizations is set the following flags are set as well. */ static void -set_unsafe_math_optimizations_flags (struct gcc_options *opts, int set) +set_unsafe_math_optimizations_flags (struct gcc_options *opts, +struct gcc_options *opts_set, int set) { if (!opts->frontend_set_flag_trapping_math) -opts->x_flag_trapping_math = !set; +SET_OPTION_IF_UNSET (opts, opts_set, flag_trapping_math, !set); if (!opts->frontend_set_flag_signed_zeros) -opts->x_flag_signed_zeros = !set; +SET_OPTION_IF_UNSET (opts, opts_set, flag_signed_zeros, !set); if (!opts->frontend_set_flag_associative_math) -opts->x_flag_associative_math = set; +SET_OPTION_IF_UNSET (opts, opts_set, flag_associative_math, set); if (!opts->frontend_set_flag_reciprocal_math) -opts->x_flag_reciprocal_math = set; +SET_OPTION_IF_UNSET (opts, opts_set, flag_reciprocal_math, set); } /* Return true iff flags in OPTS are set as if -ffast-math. */ @@ -2884,10 +2886,14 @@ bool fast_math_flags_set_p (const struct gcc_options *opts) { return (!opts->x_flag_trapping_math + && !opts->x_flag_signed_zeros + && opts->x_flag_associative_math + && opts->x_flag_reciprocal_math && opts->x_flag_unsafe_math_optimizations && opts->x_flag_finite_math_only - && !opts->x_flag_signed_zeros && !opts->x_flag_errno_math + && !opts->x_flag_rounding_math + && opts->x_flag_cx_limited_range && opts->x_flag_excess_precision == EXCESS_PRECISION_FAST); } -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: RFA: fix find_valid_class to accept classes w/out last hard reg
Joern Rennecke wrote: > 2013-05-02 Joern Rennecke > > * reload.c (find_valid_class): Allow classes that do not include > FIRST_PSEUDO_REGISTER - 1. This is OK. Thanks, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] Do not mark pseudo-copies decomposable during first lower-subreg pass
> * gcc.dg/lower-subreg-1.c: Disable on arm-*-* targets. I just noticed that the triple is incomplete; we're supposed to use arm*-*-* instead of just arm-*-*. Checked in the the following fix as obvious. Bye, Ulrich 2012-10-01 Ulrich Weigand * gcc.dg/lower-subreg-1.c: Disable on arm*-*-* targets. Index: gcc/testsuite/gcc.dg/lower-subreg-1.c === *** gcc/testsuite/gcc.dg/lower-subreg-1.c (revision 191805) --- gcc/testsuite/gcc.dg/lower-subreg-1.c (working copy) *** *** 1,4 ! /* { dg-do compile { target { ! { mips64 || { arm-*-* ia64-*-* spu-*-* tilegx-*-* } } } } } */ /* { dg-options "-O -fdump-rtl-subreg1" } */ /* { dg-skip-if "" { { i?86-*-* x86_64-*-* } && x32 } { "*" } { "" } } */ /* { dg-require-effective-target ilp32 } */ --- 1,4 ! /* { dg-do compile { target { ! { mips64 || { arm*-*-* ia64-*-* spu-*-* tilegx-*-* } } } } } */ /* { dg-options "-O -fdump-rtl-subreg1" } */ /* { dg-skip-if "" { { i?86-*-* x86_64-*-* } && x32 } { "*" } { "" } } */ /* { dg-require-effective-target ilp32 } */ -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
[RFC] find_reloads_subreg_address rework triggers i386 back-end issue
Hello, I was running a couple of tests on various platforms in preparation of getting the find_reload_subreg_address patch needed by aarch64 upstream: http://gcc.gnu.org/ml/gcc-patches/2012-07/msg01421.html This unfortunately uncovered a regression in vect-98-big-array.c on i386. It seems to me that this is due to a pre-existing problem in the back-end. The insn being reloaded is: (insn 17 15 19 3 (set (subreg:V16QI (reg:V4SI 182) 0) (unspec:V16QI [ (mem:V16QI (plus:SI (reg:SI 64 [ ivtmp.97 ]) (const_int 16 [0x10])) [2 MEM[base: _164, offset: 16B]+0 S16 A32]) ] UNSPEC_MOVU)) /home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c:23 921 {sse2_movdqu} (nil)) where reg:V4SI 182 was spilled to the stack. With the current compiler, this gets reloaded via a straightforward output reload: (insn 17 15 195 3 (set (reg:V16QI 21 xmm0) (unspec:V16QI [ (mem:V16QI (plus:SI (reg:SI 0 ax [orig:64 ivtmp.97 ] [64]) (const_int 16 [0x10])) [2 MEM[base: _164, offset: 16B]+0 S16 A32]) ] UNSPEC_MOVU)) /home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c:23 921 {sse2_movdqu} (nil)) (insn 195 17 19 3 (set (mem/c:V16QI (plus:SI (reg/f:SI 7 sp) (const_int 112 [0x70])) [4 %sfp+-1232 S16 A128]) (reg:V16QI 21 xmm0)) /home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c:23 901 {*movv16qi_internal} (nil)) But with the above patch applied, we get an input reload instead: (insn 196 15 17 3 (set (reg:V16QI 21 xmm0) (mem:V16QI (plus:SI (reg:SI 0 ax [orig:64 ivtmp.97 ] [64]) (const_int 16 [0x10])) [2 MEM[base: _164, offset: 16B]+0 S16 A32])) /home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c:23 901 {*movv16qi_internal} (nil)) (insn 17 196 19 3 (set (mem/c:V16QI (plus:SI (reg/f:SI 7 sp) (const_int 112 [0x70])) [4 %sfp+-1232 S16 A128]) (unspec:V16QI [ (reg:V16QI 21 xmm0) ] UNSPEC_MOVU)) /home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c:23 921 {sse2_movdqu} (nil)) Now this is clearly broken, since now the load from memory is no longer marked as potentially unaligned. And indeed execution of this instruction traps due to a misaligned access. However, looking at the underlying pattern: (define_insn "_movdqu" [(set (match_operand:VI1 0 "nonimmediate_operand" "=x,m") (unspec:VI1 [(match_operand:VI1 1 "nonimmediate_operand" "xm,x")] UNSPEC_MOVU))] "TARGET_SSE2 && !(MEM_P (operands[0]) && MEM_P (operands[1]))" the back-end actually *allows* this transformation just fine. So reload has always had both options. Since both require just one reload, it's a matter of additional heuristics which will get chosen, and my patch as a side effect slightly changes one of those heuristics ... But in principle the problem is latent even without the patch. The back-end should not permit reload to make changes to the pattern that fundamentally change the semantics of the resulting instruction ... I was wondering if the i386 port maintainers could have a look at this pattern. Shouldn't we really have two patterns, one to *load* an unaligned value and one to *store* and unaligned value, and not permit that memory access to get reloaded? [I guess a more fundamental change might also be to not have an unspec in the first place, but only plain moves, and check MEM_ALIGN in the move insn emitter to see which variant of the instruction is required ...] Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [RFC] find_reloads_subreg_address rework triggers i386 back-end issue
Uros Bizjak wrote: > On Fri, Oct 12, 2012 at 7:57 PM, Ulrich Weigand wrote: > > I was wondering if the i386 port maintainers could have a look at this > > pattern. Shouldn't we really have two patterns, one to *load* an unaligned > > value and one to *store* and unaligned value, and not permit that memory > > access to get reloaded? > > Please find attached a fairly mechanical patch that splits > move_unaligned pattern into load_unaligned and store_unaligned > patterns. We've had some problems with this pattern, and finally we > found the reason to make unaligned moves more robust. > > I will wait for the confirmation that attached patch avoids the > failure you are seeing with your reload patch. Yes, this patch does in fact fix the failure I was seeing with the reload patch. (A full regression test shows a couple of extra fails: FAIL: gcc.target/i386/avx256-unaligned-load-1.c scan-assembler sse_movups/1 FAIL: gcc.target/i386/avx256-unaligned-load-3.c scan-assembler sse2_movupd/1 FAIL: gcc.target/i386/avx256-unaligned-load-4.c scan-assembler avx_movups256/1 FAIL: gcc.target/i386/avx256-unaligned-store-4.c scan-assembler avx_movups256/2 But I guess these tests simply need to be updated for the new pattern names.) Thanks, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
[PATCH v2, ARM] 64-bit shifts in NEON
Richard Earnshaw wrote: > On 20/09/12 16:49, Ulrich Weigand wrote: > > Richard Earnshaw wrote: > > > >> Hmm, this is going to cause bottlenecks on Cortex-A15: writing a Neon > >> single-precision register and then reading it back as a double-precision > >> value will cause scheduling problems. [snip] > >> A solution to this is to have the set of the shifter register done as a > >> lane-set operation rather than as a set of the lower register, but it > >> probably needs some thought as to how to achieve this without creating > >> other overheads. > > > > What instruction are you refering to here? Loads from memory? > > Yes, if that's the source, or if from another register, something like > > vmov.32 Dd[0], Rt > > (it doesn't matter that the other lane remains unintialized). This has > the advantage that it doesn't clobber the other half of the register. > > If the operand is already known to be in an S-register, then > vdup(scalar) can be used, but of course that needs a full 64-bit target > register. Here's an updated version of the patch that allocated double registers and uses lane-set or load from memory instructions to fill in the shift count operand. Re-tested on arm-linux-gnueabi with no regression. Re-benchmarked (the Linaro backport) with results comparable to the original patch. Would this version be OK? Bye, Ulrich 2012-10-16 Andrew Stubbs Ulrich Weigand * config/arm/arm.c (arm_emit_coreregs_64bit_shift): Fix comment. * config/arm/arm.md (opt, opt_enabled): New attributes. (enabled): Use opt_enabled. (ashldi3, ashrdi3, lshrdi3): Add TARGET_NEON case. (ashldi3): Allow general operands for TARGET_NEON case. * config/arm/iterators.md (rshifts): New code iterator. (shift, shifttype): New code attributes. * config/arm/neon.md (UNSPEC_LOAD_COUNT): New unspec type. (neon_load_count, ashldi3_neon_noclobber, ashldi3_neon, signed_shift_di3_neon, unsigned_shift_di3_neon, ashrdi3_neon_imm_noclobber, lshrdi3_neon_imm_noclobber, di3_neon): New patterns. Index: gcc/config/arm/arm.c === *** gcc/config/arm/arm.c(revision 191400) --- gcc/config/arm/arm.c(working copy) *** arm_autoinc_modes_ok_p (enum machine_mod *** 26293,26300 Input requirements: - It is safe for the input and output to be the same register, but early-clobber rules apply for the shift amount and scratch registers. ! - Shift by register requires both scratch registers. Shift by a constant ! less than 32 in Thumb2 mode requires SCRATCH1 only. In all other cases the scratch registers may be NULL. - Ashiftrt by a register also clobbers the CC register. */ void --- 26293,26299 Input requirements: - It is safe for the input and output to be the same register, but early-clobber rules apply for the shift amount and scratch registers. ! - Shift by register requires both scratch registers. In all other cases the scratch registers may be NULL. - Ashiftrt by a register also clobbers the CC register. */ void Index: gcc/config/arm/neon.md === *** gcc/config/arm/neon.md (revision 191400) --- gcc/config/arm/neon.md (working copy) *** *** 23,28 --- 23,29 (define_c_enum "unspec" [ UNSPEC_ASHIFT_SIGNED UNSPEC_ASHIFT_UNSIGNED + UNSPEC_LOAD_COUNT UNSPEC_VABD UNSPEC_VABDL UNSPEC_VADD *** *** 1170,1175 --- 1171,1359 DONE; }) + ;; 64-bit shifts + + ;; This pattern loads a 32-bit shift count into a 64-bit NEON register, + ;; leaving the upper half uninitalized. This is OK since the shift + ;; instruction only looks at the low 8 bits anyway. To avoid confusing + ;; data flow analysis however, we pretend the full register is set + ;; using an unspec. + (define_insn "neon_load_count" + [(set (match_operand:DI 0 "s_register_operand" "=w,w") + (unspec:DI [(match_operand:SI 1 "nonimmediate_operand" "Um,r")] +UNSPEC_LOAD_COUNT))] + "TARGET_NEON" + "@ +vld1.32\t{%P0[0]}, %A1 +vmov.32\t%P0[0], %1" + [(set_attr "neon_type" "neon_vld1_vld2_lane,neon_mcr")] + ) + + (define_insn "ashldi3_neon_noclobber" + [(set (match_operand:DI 0 "s_register_operand" "=w,w") + (ashift:DI (match_operand:DI 1 "s_register_operand" " w,w") + (match_operand:DI 2 "reg_or_int_operand" " i,w")))] + "TARGET_NEON && reload_completed +
[commit] Re-work find_reloads_subreg_address
Tejas Belagod wrote: > > Ulrich Weigand wrote: > >> The following patch implements this idea; it passes a basic regression > >> test on arm-linux-gnueabi. (Obviously this would need a lot more > >> testing on various platforms before getting into mainline ...) > >> > >> Can you have a look whether this fixes the problem you're seeing? [snip] > Sorry for the delay in replying. These new issues that I was seeing were > bugs specific to my code that I've fixed. Your patch seems to work fine. > Thanks a lot for the patch. I've now checked the patch in to mainline. In addition to your test on aarch64, I've completed tests without regression on ppc(64)-linux, s390(x)-linux, and i386-linux (with Uros' patch). Note that Uros' patch here: http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01463.html is a pre-requisite for the reload patch to avoid regressions on i386. Current version (nearly unchanged) of the patch appended below. Bye, Ulrich ChangeLog: * reload.c (find_reloads_subreg_address): Remove FORCE_REPLACE parameter. Always replace normal subreg with memory reference whenever possible. Return NULL otherwise. (find_reloads_toplev): Always call find_reloads_subreg_address for subregs of registers equivalent to a memory location. Only recurse further if find_reloads_subreg_address fails. (find_reloads_address_1): Only call find_reloads_subreg_address for subregs of registers equivalent to a memory location. Properly handle failure of find_reloads_subreg_address. Index: gcc/reload.c === *** gcc/reload.c(revision 191665) --- gcc/reload.c(working copy) *** static int find_reloads_address_1 (enum *** 282,288 static void find_reloads_address_part (rtx, rtx *, enum reg_class, enum machine_mode, int, enum reload_type, int); ! static rtx find_reloads_subreg_address (rtx, int, int, enum reload_type, int, rtx, int *); static void copy_replacements_1 (rtx *, rtx *, int); static int find_inc_amount (rtx, rtx); --- 282,288 static void find_reloads_address_part (rtx, rtx *, enum reg_class, enum machine_mode, int, enum reload_type, int); ! static rtx find_reloads_subreg_address (rtx, int, enum reload_type, int, rtx, int *); static void copy_replacements_1 (rtx *, rtx *, int); static int find_inc_amount (rtx, rtx); *** find_reloads_toplev (rtx x, int opnum, e *** 4810,4840 } /* If the subreg contains a reg that will be converted to a mem, !convert the subreg to a narrower memref now. !Otherwise, we would get (subreg (mem ...) ...), !which would force reload of the mem. ! !We also need to do this if there is an equivalent MEM that is !not offsettable. In that case, alter_subreg would produce an !invalid address on big-endian machines. ! !For machines that extend byte loads, we must not reload using !a wider mode if we have a paradoxical SUBREG. find_reloads will !force a reload in that case. So we should not do anything here. */ if (regno >= FIRST_PSEUDO_REGISTER ! #ifdef LOAD_EXTEND_OP ! && !paradoxical_subreg_p (x) ! #endif ! && (reg_equiv_address (regno) != 0 ! || (reg_equiv_mem (regno) != 0 ! && (! strict_memory_address_addr_space_p ! (GET_MODE (x), XEXP (reg_equiv_mem (regno), 0), ! MEM_ADDR_SPACE (reg_equiv_mem (regno))) ! || ! offsettable_memref_p (reg_equiv_mem (regno)) ! || num_not_at_initial_offset ! x = find_reloads_subreg_address (x, 1, opnum, type, ind_levels, ! insn, address_reloaded); } for (copied = 0, i = GET_RTX_LENGTH (code) - 1; i >= 0; i--) --- 4810,4828 } /* If the subreg contains a reg that will be converted to a mem, !attempt to convert the whole subreg to a (narrower or wider) !memory reference instead. If this succeeds, we're done -- !otherwise fall through to check whether the inner reg still !needs address reloads anyway. */ if (regno >= FIRST_PSEUDO_REGISTER ! && reg_equiv_memory_loc (regno) != 0) ! { ! tem = find_reloads_subreg_address (x, opnum, type, ind_levels, !insn, address_reloaded); ! if (tem) ! return tem; ! } } for (copied = 0, i = GET_RTX_LENGTH (code)
Re: [PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.
Andreas Krebbel wrote: > +; Accept single register and immediate operands usable as shift > +; counts. > +(define_predicate "addrreg_or_constint_operand" > + (match_code "reg, subreg, const_int") I'm wondering whether this is even necessary. > +{ > + if (GET_MODE (op) != VOIDmode > + && GET_MODE_CLASS (GET_MODE (op)) != MODE_INT) > +return false; The predicate always seems to be used with a mode, so any extra mode checks here seem redundant. > + while (op && GET_CODE (op) == SUBREG) > +op = SUBREG_REG (op); > + > + if (REG_P (op) > + && (REGNO (op) >= FIRST_PSEUDO_REGISTER || ADDR_REG_P (op))) > +return true; > + > + if (CONST_INT_P (op)) > +return true; This looks somewhat copied from shift_count_or_setmem_operand, where we have a comment: /* Don't allow any non-base hard registers. Doing so without confusing reload and/or regrename would be tricky, and doesn't buy us much anyway. */ With the new set up, I don't believe there is any issue with confusing reload -- it's no more complex than any other pattern now. So it should be fine to just accept any register. In fact, I'm starting to wonder whether it wouldn't be just fine to simply using nonmemory_operand instead of this special predicate (the only issue might be that we could get CONST_WIDE_INT, but it should be simple to handle those). > +(define_expand "rotl3" > + [(set (match_operand:GPR 0 "register_operand" "") > +(rotate:GPR (match_operand:GPR 1 "register_operand" "") > + (match_operand:SI 2 "shift_count_or_setmem_operand" "")))] Similarly, I think the expander no longer needs to use the special predicate shift_count_or_setmem_operandm but could just use nonmemory_operand. [ This will reject the PLUS that shift_count_or_setmem_operand accepted, but I don't believe you ever *could* get the PLUS in the expander anyway. It will only be moved in by combine, so as long as the insn accepts it, everything should be good. ] > +(define_insn "*rotl3" > + [(set (match_operand:GPR 0 "register_operand" "=d,d") > + (rotate:GPR (match_operand:GPR 1 "register_operand""d,d") > + (match_operand:SI 2 "addrreg_or_constint_operand" "a,n")))] > + "TARGET_CPU_ZARCH" > + "@ > + rll\t%0,%1,(%2) > + rll\t%0,%1,%Y2" So it looks like %Y is now always used for just a constant. Maybe the implementation of %Y should be adjusted (once the patch series is complete)? Also, if we use just nonmemory_operand, %Y should be updated to accept CONST_WIDE_INT just like e.g. %x does. > +; This expands an register/immediate operand to a register+immediate > +; operand to draw advantage of the address style operand format > +; providing a addition for free. > +(define_subst "addr_style_op_subst" > + [(set (match_operand:DSI 0 "" "") > +(SUBST:DSI (match_operand:DSI 1 "" "") > +(match_operand:SI 2 "" "")))] > + "REG_P (operands[2])" > + [(set (match_dup 0) > +(SUBST:DSI (match_dup 1) > +(plus:SI (match_dup 2) > + (match_operand 3 "const_int_operand" "n"]) This seems to add just a single alternative. Is that OK if it gets substituted into a pattern that used multiple alternatives otherwise? > +; In the subst pattern we have to disable the alternative where op2 is > +; already a constant. This attribute is supposed to be used in the > +; "disabled" insn attribute to achieve this. > +(define_subst_attr "addr_style_op_disable" "addr_style_op_subst" "0" "1") Is this necessary given that the subst adds a REG_P (operands[2]) condition? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 5/9] S/390: Get rid of Y constraint in arithmetic right shift patterns.
Andreas Krebbel wrote: > +; This is like the addr_style_op substitution above but with a CC clobber. > +(define_subst "addr_style_op_cc_subst" > +; This is like the masked_op substitution but with a CC clobber. > +(define_subst "masked_op_cc_subst" A bit unfortunate that these need to be duplicated. Does the subst always have to match the full pattern, or can it match and operate on just one element of a PARALLEL? > +; This adds an explicit CC reg set to an operation while keeping the > +; set for the operation result as well. > +(define_subst "setcc_subst" > +; This adds an explicit CC reg set to an operation while dropping the > +; result of the operation. > +(define_subst "cconly_subst" These are nice! It would seem they could be applied to simplify many of the non-shift patterns too, right? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 6/9] S/390: Get rid of Y constraint in tabort.
Andreas Krebbel wrote: > (define_insn "*tabort_1" > - [(unspec_volatile [(match_operand:SI 0 "shift_count_or_setmem_operand" > "Y")] > + [(unspec_volatile [(match_operand:SI 0 "addrreg_or_constint_operand" > "a,n")] > UNSPECV_TABORT)] >"TARGET_HTM && operands != NULL" > - "tabort\t%Y0" > + "@ > + tabort\t0(%0) > + tabort\t%0" > + [(set_attr "op_type" "S")]) > + > +(define_insn "*tabort_1_plus" > + [(unspec_volatile [(plus:SI (match_operand:SI 0 "register_operand" "a") > + (match_operand:SI 1 "const_int_operand" "J"))] > + UNSPECV_TABORT)] > + "TARGET_HTM && operands != NULL" > + "tabort\t%1(%0)" This seems dangerous: const_int_operand may match a constant that does not fit into the "J" constraint, which would lead to an abort in reload. What is the semantics for the abort code anyway? It is supposed to be automatically truncated or not? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 7/9] S/390: Get rid of Y constraint in vector.md.
Andreas Krebbel wrote: > +(define_insn "*vec_extract_plus" > + [(set (match_operand: 0 > "nonimmediate_operand" "=d,QR") > + (unspec: [(match_operand:V 1 "register_operand" > "v, v") > +(plus:SI (match_operand:SI 2 "register_operand" > "a, I") The second alternative can never match here, can it? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 8/9] S/390: Use define_subst for the setmem patterns.
Andreas Krebel: > While trying to get rid of the Y constraint in the setmem patterns I > noticed that for these patterns it isn't even a problem since these > always only use the constraint with a Pmode match_operand. But while > being at it I've tried to fold some of the patterns a bit. Hmm, I'm wondering whether it's worth it to use a subst pattern that is only ever used in one single instance? (Specifically I'm wondering about the 31z subst. The and subst is used multiple times.) > +; Turn the DImode in the 31 bit pattern into TImode to enforce > +; register pair usage even with -mzarch. > +; The subreg offset is adjusted accordingly. > +(define_subst "setmem_31z_subst" > + [(clobber (match_operand:DI 0 "register_operand" "")) > + (set (mem:BLK (subreg:SI (match_operand:DI 3 "register_operand" "") > 0)) > +(unspec:BLK [(match_operand:SI 2 > "shift_count_or_setmem_operand" "") > + (subreg:SI (match_operand:DI 4 "register_operand" "") > + 0)] Is this right? Shouldn't it be subreg offset 4 originally? > + UNSPEC_REPLICATE_BYTE)) > + (use (match_operand:DI 1 "register_operand" "")) > + (clobber (reg:CC CC_REGNUM))] > +"" > + [(clobber (match_operand:TI 0 "register_operand" "")) > + (set (mem:BLK (subreg:SI (match_operand:TI 3 "register_operand" "") > 4)) > +(unspec:BLK [(match_operand:SI 2 > "shift_count_or_setmem_operand" "") > + (subreg:SI (match_operand:TI 4 "register_operand" "") > + 12)] > + UNSPEC_REPLICATE_BYTE)) > + (use (match_operand:TI 1 "register_operand" "")) > + (clobber (reg:CC CC_REGNUM))]) Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 2/9] S/390: Add disabled insn attribute
Andreas Krebbel wrote: > With this patch a `disabled' attribute is defined which can be used to > explicitly disable an alternative. This comes handy when defining the > substitutions later and while adding it anyway I've used it for the > existing cases as well. > +; Insn attribute with boolean value > +; 1 to disable an insn alternative > +; 0 or * for an active insn alternative > +; an active alternative can still be disabled due to lacking cpu > +; facility support > +(define_attr "disabled" "" (const_int 0)) > + > (define_attr "enabled" "" > - (cond [(eq_attr "cpu_facility" "standard") > + (cond [(eq_attr "disabled" "1") > + (const_int 0) > + > + (eq_attr "cpu_facility" "standard") >(const_int 1) > > (and (eq_attr "cpu_facility" "ieee") So I'm wondering what the difference is between this and simply overriding the default implementation of "enabled" per-insn? So instead of adding (set_attr "disabled" "0,1")]) to an insn, you might simply add instead: (set_attr "enabled" "*,0")]) Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 9/9] S/390: Disallow SImode in s390_decompose_address
Andreas Krebbel wrote: > * config/s390/s390.c (s390_decompose_address): Don't accept SImode > anymore. Great! Very nice to finally get rid of this. Thanks, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.
Andreas Krebbel wrote: > On 02/01/2016 02:30 PM, Ulrich Weigand wrote: > > This seems to add just a single alternative. Is that OK if it gets > > substituted into a pattern that used multiple alternatives otherwise? > Yes. This is supposed to work. The new constraint will be duplicated until > it matches the number > of alternatives in the original insn. OK, that makes sense. But now I'm wondering about this bit of the ashiftrt patch: +; FIXME: The number of alternatives is doubled here to match the fix +; number of 4 in the subst pattern for the (clobber (match_scratch... +; The right fix should be to support match_scratch in the output +; pattern of a define_subst. +(define_subst "cconly_subst" + [(set (match_operand:DSI 0 "" "") +(match_operand:DSI 1 "" "")) + (clobber (reg:CC CC_REGNUM))] + "s390_match_ccmode(insn, CCSmode)" + [(set (reg CC_REGNUM) + (compare (match_dup 1) (const_int 0))) + (clobber (match_scratch:DSI 0 "=d,d,d,d"))]) I guess this is where the number 4 comes from? But if the alternatives are duplicated, shouldn't it then work to simply use: (clobber (match_scratch:DSI 0 "=d"))]) > >> +; In the subst pattern we have to disable the alternative where op2 is > >> +; already a constant. This attribute is supposed to be used in the > >> +; "disabled" insn attribute to achieve this. > >> +(define_subst_attr "addr_style_op_disable" "addr_style_op_subst" "0" "1") > > > > Is this necessary given that the subst adds a REG_P (operands[2]) > > condition? > I wasn't sure about this. How does reload/lra behave when the constraints > allow more than the insn > condition? I expected that reload might remember that one of the alternatives > might take a constant > as well and then tries as part of reload inheritance and such things to > actually exploit this?! Just > disabling the alternative in order to prevent reload from recording stuff for > that alternative > appeared safer to me. > > On the other hand it is not ideal to have an operands[x] check in the insn > condition in the first > place. Especially when using define_substs where the operands get renumbered > this might add hard to > find bugs. Perhaps I should try replacing the insn predicate with a > subst_attr instead? Is having a > constraint which allows more than a predicate any better than having the same > with the insn condition? Hmm. In theory reload should always respect both constraints and predicates. But I guess it can't hurt to disable the alternative just to be safe; it is indeed an uncommon case to have the constraint accept more than the predicate. (Also a PLUS with two CONST_INT operands is not canonical RTL anyway.) Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.
Andreas Krebbel wrote: > > Hmm. In theory reload should always respect both constraints and > > predicates. > > But I guess it can't hurt to disable the alternative just to be safe; it is > > indeed an uncommon case to have the constraint accept more than the > > predicate. > > (Also a PLUS with two CONST_INT operands is not canonical RTL anyway.) > > Does this mean you would prefer to convert the insn condition into a > predicate and change it with > subst_attr or should I leave it in the insn condition? If we can transform the operand predicate, that would certainly be the best. I.e. where the original pattern has: (match_operand:SI 2 "nonmemory_operand" "a,n") have the replaced pattern use instead: (plus:SI (match_operand:SI 2 "register_operand" "a") (match_operand 3 "const_int_operand" "n")) if that is possible via the subst mechanism ... Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 2/9] S/390: Add disabled insn attribute
Andreas Krebbel wrote: > On 02/01/2016 02:45 PM, Ulrich Weigand wrote: > > So I'm wondering what the difference is between this and simply > > overriding the default implementation of "enabled" per-insn? > > > > So instead of adding > > (set_attr "disabled" "0,1")]) > > to an insn, you might simply add instead: > > (set_attr "enabled" "*,0")]) > Not sure but wouldn't this mean that the value of the enabled attribute would > then depend on the > order of the set_attr for "enabled" and "cpu_facility" since one is defined > on the value of the other?! I don't think the order matches; genattrtab seems to first read in all .md files and only then optimize and then emit definitions of all the get_attr_... functions. The eq_attr uses in the "enabled" definition would only be evaluated at that point. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] s390: Add -fsplit-stack support
Marcin Kościelnicki wrote: > Here we go. I've also removed the "see below", since I don't really > see anything below... The "see below" refers to this code (which I agree isn't really obvious): if (TARGET_TPF_PROFILING) { /* Generate a BAS instruction to serve as a function entry intercept to facilitate the use of tracing algorithms located at the branch target. */ emit_insn (gen_prologue_tpf ()); What is not explicitly called out here is that this tracing function actually refers to some hard registers, in particular r14, and assumes they still have the original contents as at function entry. That is why the prolog code avoid using r14 as temporary if the TPF tracing mechanism is in use. Now I think this doesn't apply to r12, so this part of your patch should still be fine. (In addition, TPF is not going to support split stacks --or indeed the Go language-- anyway, so it doesn't really matter all that much.) I do have two other issues; sorry for bringing those up again although they've been discussed up in the past, but I still think we can find some improvements here ... The first is the question Andreas brought up, why we need the extra set of insns introduced by s390_reorg. I think this may really have been necessary for the ESA case where data elements had to be intermixed into code at a specific location. But since we no longer support ESA, we now just have a data block that can be placed anywhere. For example, we could just have an insn (at any point in the prolog stream) that simply emits the full data block during final output, along the lines of (note: needs to be updated for SImode vs. DImode.): (define_insn "split_stack_data" [(unspec_volatile [(match_operand 0 "bras_sym_operand" "X") (match_operand 1 "bras_sym_operand" "X") (match_operand 2 "consttable_operand" "X") (match_operand 3 "consttable_operand" "X")] UNSPECV_SPLIT_STACK_DATA)] "" { switch_to_section (targetm.asm_out.function_rodata_section (current_function_decl)); output_asm_insn (\".align 3", operands); (*targetm.asm_out.internal_label) (asm_out_file, \"L\", CODE_LABEL_NUMBER (operands[0])); output_asm_insn (\".quad %2\", operands); output_asm_insn (\".quad %3\", operands); output_asm_insn (\".quad %1-%0\", operands); switch_to_section (current_function_section ()); return ""; } [(set_attr "length" "0")]) Or possibly even cleaner, we can simply define the data block at the tree level as if it were an initialized global variable of a certain struct type, and just leave it to common code to emit it as usual. Then we just have the code bits, but I don't really see much difference between the split_stack_call and split_stack_sibcall patterns (apart from the data block), so if code flow is OK with the former insns, it should be OK with the latter too .. [ Or else, if there *are* code flow issues, the other alternative would be to emit the full call sequence, code and data, from a single insn pattern during final output. This might have the extra benefit that the assembler sequence is fully fixed, and thus easier to detect in the linker. ] Getting rid of the extra transformation in s390_reorg would not just remove a bunch of code from the back-end (always good!), it would also speed up compile time a bit. The second issue I'm still not sure about is the magic nop marker for frameless functions. In an earlier mail you wrote: > Both currently supported > architectures always emit split-stack code on every function. At least for rs6000 this doesn't appear to be true; in rs6000_expand_split_stack_prologue we have: if (!info->push_p) return; so it does nothing for frameless routines. Now on i386 we do indeed generate code for frameless routines; in fact, the *same* full stack check is generated as for any other routine. Now I'm wondering: is there are reason why this check would be necessary (and there's simply a bug in the rs6000 implementation)? Then we obviously should do the same on s390. On the other hand, if rs6000 works fine *without* any code in frameless routines, why wouldn't that work for s390 too? Emitting a nop (that is always executed) still looks weird to me. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] s390: Add -fsplit-stack support
Marcin Kościelnicki wrote: > Comment fixed, split_stack_marker gone, reorg gone. Generated code seems > sane, > but testsuite still running. > > I will need to modify the gold patch to handle the "leaf function taking > non-split > stack function address" issue - this will likely require messing with the > target > independent plumbing, the hook for that doesn't seem to get enough params. Thanks for making those changes; the patch is looking a lot nicer (and shorter :-)) now! Just to clarify, your original patch series had two common-code prerequisite patches (3/5 and 4/5) -- it looks like those may still be needed? If so, we'll have to get approval from the appropriate middle-end maintainers before this patch can go it as well. As to the back-end patch, I've now only got some cosmetical issues: > + insn = emit_insn (gen_main_base_64 (r1, parm_base)); Now that we aren't using the literal pool infrastructure for the block any more, I guess we shouldn't be using it to load the address either. Just something like: insn = emit_move_insn (r1, gen_rtx_LABEL_REF (VOIDmode, parm_base)); should do it. > +(define_insn "split_stack_data" > + [(unspec_volatile [(match_operand 0 "" "X") > + (match_operand 1 "" "X") > + (match_operand 2 "consttable_operand" "X") > + (match_operand 3 "consttable_operand" "X")] And similarly here, just use const_int_operand. Otherwise, this all looks very good to me. Thanks, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] s390: Add -fsplit-stack support
Marcin Kościelnicki wrote: > libgcc/ChangeLog: > > * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. > * config/s390/morestack.S: New file. > * config/s390/t-stack-s390: New file. > * generic-morestack.c (__splitstack_find): Add s390-specific code. > > gcc/ChangeLog: > > * common/config/s390/s390-common.c (s390_supports_split_stack): > New function. > (TARGET_SUPPORTS_SPLIT_STACK): New macro. > * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue. > * config/s390/s390.c (struct machine_function): New field > split_stack_varargs_pointer. > (s390_register_info): Mark r12 as clobbered if it'll be used as temp > in s390_emit_prologue. > (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack > vararg pointer. > (morestack_ref): New global. > (SPLIT_STACK_AVAILABLE): New macro. > (s390_expand_split_stack_prologue): New function. > (s390_live_on_entry): New function. > (s390_va_start): Use split-stack vararg pointer if appropriate. > (s390_asm_file_end): Emit the split-stack note sections. > (TARGET_EXTRA_LIVE_ON_ENTRY): New macro. > * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec. > (UNSPECV_SPLIT_STACK_CALL): New unspec. > (UNSPECV_SPLIT_STACK_DATA): New unspec. > (split_stack_prologue): New expand. > (split_stack_space_check): New expand. > (split_stack_data): New insn. > (split_stack_call): New expand. > (split_stack_call_*): New insn. > (split_stack_cond_call): New expand. > (split_stack_cond_call_*): New insn. > --- > Changes applied. Testsuite still running, still works on my simple tests. > > As for common code prerequisites: #3 is no longer needed, and very likely > so is #4 (it fixes problems that I've only seen with ESA mode, and testsuite > runs just fine without it now). OK, I see. The patch is OK for mainline then, assuming testing passes. Thanks again, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] s390: Add -fsplit-stack support
Marcin Kościelnicki wrote: > Ugh. I take that back. For -fPIC, the load-address sequence is: > > larl%r1,f@GOTENT > lg %r2,0(%r1) > br %r14 This is correct. > And (sibling) call sequence is: > > larl%r1,f@GOTENT > lg %r1,0(%r1) > br %r1 Oops. That is actually a GCC bug. The sibcall sequence really must be: jg f@PLT This is a real bug since it forces non-lazy symbol resolution for f just because the compiler chose those use a sibcall optimization; that's not supposed to happen. It seems this bug was accidentally introduced here: 2010-04-20 Andreas Krebbel PR target/43635 * config/s390/s390.c (s390_emit_call): Turn direct into indirect calls for -fpic -m31 if they have been sibcall optimized. since the patch doesn't check for TARGET_64BIT ... Andreas, can you have a look? > It seems there's no proper way to recognize a call vs a load address - > so we can either go with emitting the marker, or have the same problem > as on ppc. > > So - how much should we care? I think we should fix that bug. That won't help for existing objects, but those don't use split stack either, so that shouldn't matter. If we fix that bug before (or at the same time as) adding split-stack support, the linker will still be able to distigunish function pointer loads from calls (including sibcalls) on all objects using split stack. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] s390: Add -fsplit-stack support
Marcin Kościelnicki wrote: > Fair enough. Here's what I'm going to implement in gold: > > - any PLT relocation: call > - PC32DBL on a larl: non-call > - PC32DBL otherwise: call > - any other relocation: non-call > > Does that sound right? Hmm, I'm wondering about the PC32DBL choices. There are now a large number of other non-call instructions that use PC32DBL, including lrl, strl, crl, cgrl, cgfrl, ... However, these all access *data* at the pointed-to location, so it is quite unlikely they would ever be used with a function symbol. So, assuming that you also check that the target of the relocation is a function symbol, treating only larl as non-call might be OK. Maybe a more conservative approach might be to make the decision the other way round: for PC32DBL check for *branch* instructions, and treat only those are calls. There's just a few branch instruction using PC32DBL: brasl (call) brcl (conditional or unconditional sibcall) brcth (???) where the last one is extremely unlikely (but theorically possible) to be used as conditional sibcall combined with a register decrement; I don't think this can ever happen with current compilers however. For full completeness, there are also PC16DBL relocations that *could* target called functions, but only when compiling with the -msmall-exec flag to assume total executable size is less than 64 KB. These are used by the following instructions: bras brc brct brctg brxh brxhg brxle brxlg crj cgrj clrj clgrj cij cgij clij clgij Note that those are *all* branch instructions, so it might make sense to add any PC16DBL targetting a function symbol to the list of calls, just in case. (But since basically nobody ever uses -msmall-exec, it doesn't really matter much either.) Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] s390: Add -fsplit-stack support
Marcin Kościelnicki wrote: > I'll stay with checking for larl - while I can imagine someone adding a > new conditional branch instruction, I don't see a need for another > larl-like instruction. Besides, this way the failure mode for an > unknown instruction would be producing an error, instead of silently > emitting code with unfixed prologue. OK, fine with me. B.t.w. Andreas has checked in the sibcall fix, so you no longer should be seeing larl used for sibcalls. > I've updated and resubmitted the gold patch. Thanks! Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH, rs6000] Add -maltivec=be semantics in LE mode for vec_ld and vec_st
Hi Bill, > 2014-02-20 Bill Schmidt > > * config/rs6000/altivec.md (altivec_lvxl): Rename as > *altivec_lvxl__internal and use VM2 iterator instead of > V4SI. > (altivec_lvxl_): New define_expand incorporating > -maltivec=be semantics where needed. I just noticed that this: > -(define_insn "altivec_lvxl" > +(define_expand "altivec_lvxl_" >[(parallel > -[(set (match_operand:V4SI 0 "register_operand" "=v") > - (match_operand:V4SI 1 "memory_operand" "Z")) > +[(set (match_operand:VM2 0 "register_operand" "=v") > + (match_operand:VM2 1 "memory_operand" "Z")) > (unspec [(const_int 0)] UNSPEC_SET_VSCR)])] >"TARGET_ALTIVEC" > - "lvxl %0,%y1" > +{ > + if (!BYTES_BIG_ENDIAN && VECTOR_ELT_ORDER_BIG) > +{ > + altivec_expand_lvx_be (operands[0], operands[1], mode, > UNSPEC_SET_VSCR); > + DONE; > +} > +}) > + > +(define_insn "*altivec_lvxl__internal" > + [(parallel > +[(set (match_operand:VM2 0 "register_operand" "=v") > + (match_operand:VM2 1 "memory_operand" "Z")) > + (unspec [(const_int 0)] UNSPEC_SET_VSCR)])] > + "TARGET_ALTIVEC" > + "lvx %0,%y1" >[(set_attr "type" "vecload")]) now causes vec_ldl to emit the lvx instead of the lvxl instruction. I assume this was not actually intended? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [RS6000] reload_vsx_from_gprsf splitter
David Edelsohn wrote: > On Thu, Feb 11, 2016 at 6:04 AM, Alan Modra wrote: > > This is PR68973 part 2, the failure of a boost test, caused by a > > splitter emitting an invalid move in reload_vsx_from_gprsf: > > emit_move_insn (op0_di, op2); > > > > op0 can be any vsx reg, but the mtvsrd destination constraint in > > movdi_internal64 is "wj", which only allows fp regs. I'm not sure why > > we have that constraint so left movdi_internal64 alone and used a > > special p8_mtvsrd instead, similar to p8_mtvsrd_1 and p8_mtvsrd_2 used > > by reload_vsx_from_gpr. When looking at those, I noticed we're > > restricted to fprs there too so fixed that as well. (We can't use %L > > in asm operands that must accept vsx.) > > Michael, please review the "wj" constraint in these patterns. > > Alan, the explanation says that it uses a special p8_mtvsrd similar to > p8_mtvsrd_[12], but does not explain why the two similar patterns are > removed. The incorrect use of %L implies those patterns, but the > change is to p8_xxpermdi_ that is not mentioned in the > ChangeLog. > > I also would appreciate Uli's comments on this direction because of > his reload expertise. For the most part, this patch doesn't really change anything in the interaction with reload as far as I can see. The changes introduced by the patch do make sense to me. In particular, replacing the two patterns p8_mtvsrd_1 and p8_mtvsrd_2 used to fill high and low parts of a TFmode register pair with a single pattern p8_mtvsrd that just works on any DFmode register (leaving the split into high/low to the caller if necessary) seems to simplify things. > > + /* You might think that we could use op0 as one temp and a DF clobber > > + as the other, but you'd be wrong. These secondary_reload patterns > > + don't check that the clobber is different to the destination, which > > + is probably a reload bug. */ It's not a bug, it's deliberate behavior. The reload registers allocated for secondary reload clobbers may overlap the destination, since in many cases you simply move the input to the reload register, and then the reload register to the destination (where the latter move can be a no-op if it is possible to allocate the reload register and the destination into the same physical register). If you need two separate intermediate values, you need to allocate a secondary reload register of a larger mode (as is already done in the pattern). > >/* Also use the destination register to hold the unconverted DImode > > value. > > This is conceptually a separate value from OP0, so we use gen_rtx_REG > > rather than simplify_gen_subreg. */ > > - rtx op0_di = gen_rtx_REG (DImode, REGNO (op0)); > > + rtx op0_df = gen_rtx_REG (DFmode, REGNO (op0)); > > + rtx op0_v4sf = gen_rtx_REG (V4SFmode, REGNO (op0)); While this was not introduced by this patch, I'm a little bit concerned about the hard-coded use of REGNO here, which will crash if op0 at this point happens to be a SUBREG instead of a REG. This is extremely unlikely at this point in reload, but not 100% impossible, e.g. if op0 for some reason is one of the "magic" registers like the stack pointer. That's why it is in general better to use simplify_gen_subreg or one of gen_highpart/gen_lowpart, which will handle SUBREG correctly as well. I'm not sure why it matters whether this is "conceptually a separate value" as the comment argues ... > >/* Move SF value to upper 32-bits for xscvspdpn. */ > >emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32))); > > - emit_move_insn (op0_di, op2); > > - emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0_di)); > > + emit_insn (gen_p8_mtvsrd (op0_df, op2)); > > + emit_insn (gen_vsx_xscvspdpn (op0_df, op0_v4sf)); The sequence of modes used for op0 here is a bit weird. First, op0 is loaded as DFmode by mtvsrd, then it is silently re- interpreted as V4SFmode when used as input to xscvspdpn, which gets a DFmode output that is again silently re-interpreted as SFmode. This isn't really wrong as such, just maybe a bit confusing. Maybe instead have p8_mtvsrd use DImode as output (instead of DFmode), which shouldn't be any harder to use in the reload_vsx_from_gpr splitter, and keep using the vsx_xscvspdpn_directmove pattern? [ This of course reinforces the question why we have p8_mtvsrd in the first place, instead of just allowing this use directly in movdi_internal64 itself. ] Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [RS6000] reload_vsx_from_gprsf splitter
(Replying to multiple messages at once ...) Michael Meissner wrote: > At the present time, we do not allow DImode to go into Altivec > registers. Mostly it was due to reload complications in the power8 time frame, > where we didn't have d-form addressing for the Altivec registers and > interference with the other DImode reload patterns, but also I felt I would > need to go through the main rs6000.md to look for the various DImode patterns > to see if they needed to be updated. At some I hoped to extend this, so I > added the "wi" and "wj" constraints to be used whenever the mode is DImode. > "wi" is the constraint for the VSX register class DImode can go in > (i.e. currently FLOAT=5FREGS), and "wj" is the VSX register class DImode can > go > in if we have 64-bit direct move. I see. I hadn't noticed the restriction on DImode, sorry. I think that in general, it would be a good idea to allow DImode wherever DFmode is allowed, since the same instructions should be able to handle both. But given the unexpected problems that seem to turn up whenever we want to allow more modes in more registers, this is probably better for stage 1 than right now ... > The TFmode thing was a hack. It was the only way I could see getting two > registers without a lot of hair. Since TFmode is also restricted to FPR_REGS, > you could use %L on it. When I was writing it, I really wished we had a > reload > pattern to get more than one temporary, but we didn't, so I went for TFmode to > get 2 FPR registers. Given the replacement pattern no longer uses a TFmode > temporary, it can go in any appropriate register. Again, it would probably make sense to allow TFmode (when it is double double) in any pair of registers where DFmode is allowed, since the type *is* just a pair of DFmode values. Again, really more a stage 1 matter ... > On Thu, Feb 11, 2016 at 07:38:15PM +0100, Ulrich Weigand wrote: > > It's not a bug, it's deliberate behavior. The reload registers allocated > > for secondary reload clobbers may overlap the destination, since in many > > cases you simply move the input to the reload register, and then the > > reload register to the destination (where the latter move can be a no-op > > if it is possible to allocate the reload register and the destination > > into the same physical register). If you need two separate intermediate > > values, you need to allocate a secondary reload register of a larger > > mode (as is already done in the pattern). > > This is one of the cases I wished the reload support had the ability to > allocate 2 scratch temporaries instead of 1. As I said in my other message, > TFmode was a hack to get two registers to use. Yes, it would be nice if we could specify multiple scratch registers. That's a long-standing FIXME in the reload code: /* ??? It would be useful to be able to handle only two, or more than three, operands, but for now we can only handle the case of having exactly three: output, input and one temp/scratch. */ Unfortunately, given the structure of reload, that is difficult to fix. > On Fri, Feb 12, 2016 at 08:54:19AM +1030, Alan Modra wrote: > > Another concern I had about this, besides using %L in asm output (what > > forces TFmode to use just fprs?), is what happens when we're using > > IEEE 128-bit floats? In that case it looks like we'd get just one reg. > > Good point that it breaks if the default long double (TFmode) type is IEEE > 128-bit floating point. We would need to have two patterns, one that uses > TFmode and one that uses IFmode. I wrote the power8 direct move stuff before > going down the road of IEEE 128-bit floating point. Right. It's a bit unfortunate that we can't just use IFmode unconditionally, but it seems rs6000_scalar_mode_supported_p (IFmode) may return false, and then we probably shouldn't be using it. Another option might be to use TDmode to allocate a scratch register pair. David Edelsohn wrote: > Good question: is p8_mtvsrd really necessary if the movdi_internal64 > is updated to use the correct constraints? > > The patch definitely is going in the right direction. Can we remove > more unnecessary bits? Given Mike's reply above, I don't think it is simply a matter of updating constraints in the one pattern. Rather, enabling DImode in all vector registers would require a change to rs6000_hard_regno_mode_ok and then reviewing all DImode patterns to make sure they're still compatible. As I said above, that's probably more of a stage 1 effort. As long as that is not done, my original suggestion here: > Maybe instead have p8_mtvsrd use DImode as output (instead of > DFmode), which shouldn't be any ha
Re: [RS6000] reload_vsx_from_gprsf splitter
Alan Modra wrote: > On Fri, Feb 12, 2016 at 02:57:22PM +0100, Ulrich Weigand wrote: > > Right. It's a bit unfortunate that we can't just use IFmode > > unconditionally, > > but it seems rs6000_scalar_mode_supported_p (IFmode) may return false, and > > then we probably shouldn't be using it. > > Actually, we can use IFmode unconditionally. scalar_mode_supported_p > is relevant only up to and including expand. Nothing prevents the > backend from using IFmode. Hmm, OK. That does make things simpler. > > Another option might be to use TDmode to allocate a scratch register pair. > > That won't work, at least if we want to extract the two component regs > with simplify_gen_subreg, due to rs6000_cannot_change_mode_class. In > my original patch I just extracted the regs by using gen_rtx_REG but I > changed that, based on your criticism of using gen_rtx_REG in > reload_vsx_from_gprsf, and because rs6000.md avoids gen_rtx_REG using > operand regnos in other places. That particular change is of course > entirely cosmetic. I also changed reload_vsx_from_gprsf to avoid mode > punning regs, instead duplicating insn patterns as done elsewhere in > the vsx support. I don't believe we will see subregs of vsx or fp > regs after reload, but I'm quite willing to concede the point for a > stage4 fix. I was thinking here that in the special case of the *reload scratch register*, which reload allocates for us, we will always get a full register. This is different from some other operand that may originate from pre-existing RTX that may require a SUBREG even after reload. But I certainly agree that your current patch looks like a good choice for a stage4 bugfix change. Further cleanup can always happen later. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH, rs6000] PR 66337: Improve Compliance with Power ABI
Kevin Nilsen wrote: > This patch has bootstrapped and tested on powerpc64le-unknown-linux-gnu and= > powerpc64be-unknown-linux-gnu (both 32-bit and 64-bit) and=20 > powerpc64-unknown-freebsd11.0 (big endian) with no regressions. Is it ok t= > o fix this on the trunk? > > The problem described in PR66337 (https://gcc.gnu.org/bugzilla/show_bug.cgi= > ?id=3D3D66337) is that compiling for PowerPC targets with the -malign-power= > command-line option results in invalid field offsets for certain structure= > members. As identified in the problem report, this results from a macro = > definition present in both config/rs6000/{freebsd64,linux64}.h, which in bo= > th cases is introduced by the comment: > > /* PowerPC64 Linux word-aligns FP doubles when -malign-power is given. */ > > I have consulted various ABI documents, including "64-bit PowerPC ELF Appli= > cation Binary Interface Supplement 1.9" (http://refspecs.linuxfoundation.or= > g/ELF/ppc64/PPC-elf64abi.html), "Power Architecture 64-Bit ELF V2 ABI Speci= > fication" (https://members.openpowerfoundation.org/document/dl/576), and "P= > ower > Architecture(R) 32-bit Application Binary Interface Supplement 1.0 - Linux(= > R) & Embedded" (https://www.power.org/documentation/power-architecture-32-b= > it-abi-supplement-1-0-embeddedlinuxunified/). I have not been able to find= > any basis for this comment and thus am concluding that the comment and exi= > sting implementation are incorrect. > > The implemented patch removes the comment and changes the macro definition = > so that field alignment calculations on 64-bit architectures ignore the -ma= > lign-power command-line option. With this fix, the test case identified in= > the PR behaves as was expected by the submitter. There seems to be some confusion here. First of all, on Linux and FreeBSD, the *default* behavior is -malign-natural, which matches what the Linux ABI specifies. Using -malign-power on *Linux* is an explicit instruction to the compiler to *deviate* from the documented ABI. The only effect that the deviation has on Linux is to change the alignment requirements for certain structure elements. Your patch removes this change, making -malign-power fully a no-op on Linux and FreeBSD. This doesn't seem to be particularly useful ... If you don't want the effect, you can simply not use that switch. To my understanding, the intent of providing that switch was to allow creating code that is compatible code produced by some other compilers that do not adhere to the Linux ABI, but some other ABI. In particular, my understanding is that the *AIX* ABI has these alignment requirements. And in fact, GCC on *AIX* defaults to -malign-power. Looking at PR 66337, the submitter actually refers to the behaviour of GCC on AIX, so I'm not sure how Linux is even relevant here. (Maybe there is something wrong in how GCC implements the AIX ABI. But I'm not really familar with AIX, so I can't help much with that.) Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.
Andreas Krebbel wrote: > * config/s390/predicates.md (const_int_6bitset_operand): New > predicates. > * config/s390/s390.md: Include subst.md. > ("rotl3"): New expander. > ("rotl3", "*rotl3_and"): Merge insn definitions into > ... > ("*rotl3"): New insn definition. > * config/s390/subst.md: New file. This already looks much nicer (and shorter) :-) In fact, I'm now wondering whether it couldn't be even shorter. Would it be possible to use instead of: (define_insn "*rotl3" [(set (match_operand:GPR 0 "register_operand" "=d,d") (rotate:GPR (match_operand:GPR 1 "register_operand" "d,d") (match_operand:SI 2 "nonmemory_operand" "a,n")))] "TARGET_CPU_ZARCH" "@ rll\t%0,%1,(%2) rll\t%0,%1,%Y2" [(set_attr "op_type" "RSE") (set_attr "atype""reg") (set_attr "enabled" "*,") (set_attr "z10prop" "z10_super_E1")]) simply something like: (define_insn "*rotl3" [(set (match_operand:GPR 0 "register_operand" "=d") (rotate:GPR (match_operand:GPR 1 "register_operand" "d") (match_operand:SI 2 "nonmemory_operand" "an")))] "TARGET_CPU_ZARCH" "rll\t%0,%1, [(set_attr "op_type" "RSE") (set_attr "atype""reg") (set_attr "z10prop" "z10_super_E1")]) where addr_style_op_ops is defined like: (define_subst_attr "addr_style_op_ops" "addr_style_op_subst" "%Y2" "%Y3(%2)") and we don't need addr_style_op_enabled any more? %Y would continue to emit both simple constants and register operands (and full address operands e.g. for setmem) as before. > +(define_subst "masked_op_subst" > + [(set (match_operand:DSI 0 "" "") > +(SUBST:DSI (match_operand:DSI 1 "" "") > +(match_operand:SI 2 "" "")))] > + "" > + [(set (match_dup 0) > +(SUBST:DSI (match_dup 1) > +(and:SI (match_dup 2) > +(match_operand:SI 3 "const_int_6bitset_operand" > ""]) Do we need a constraint letter here? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 7/9] S/390: Get rid of Y constraint in vector.md.
Andreas Krebbel wrote: > +; vec_set is supposed to *modify* an existing vector so operand 0 is > +; duplicated as input operand. > +(define_expand "vec_set" > + [(set (match_operand:V0 "register_operand" > "") > + (unspec:V [(match_operand: 1 "general_operand" > "") > +(match_operand:SI2 "shift_count_or_setmem_operand" > "") This is probably only cosmetic, but should we use nonmemory_operand here instead of shift_count_or_setmem_operand (just like everywhere else now)? > +(define_expand "vec_extract" > + [(set (match_operand: 0 "nonimmediate_operand" "") > + (unspec: [(match_operand:V 1 "register_operand" "") > +(match_operand:SI 2 "shift_count_or_setmem_operand" > "")] Likewise. > +(define_insn "*vec_set_plus" > + [(set (match_operand:V 0 "register_operand" "=v") > + (unspec:V [(match_operand: 1 "general_operand" "d") > +(plus:SI (match_operand:SI 2 "register_operand" "a") > + (match_operand:SI 4 "const_int_operand" "n")) > +(match_operand:V 3 "register_operand" "0")] > + UNSPEC_VEC_SET))] > + "TARGET_VX" > + "vlvg\t%v0,%1,%4(%2)" > + [(set_attr "op_type" "VRS")]) Wouldn't it be better to use %Y4 instead of %4 here? Or does the middle-end guarantee that this is never out of range? > +(define_insn "*vec_extract_plus" > + [(set (match_operand: 0 > "nonimmediate_operand" "=d,QR") > + (unspec: [(match_operand:V 1 "register_operand" > "v, v") > +(plus:SI (match_operand:SI 2 "nonmemory_operand" > "a, I") > + (match_operand:SI 3 "const_int_operand" > "n, I"))] > +UNSPEC_VEC_EXTRACT))] > + "TARGET_VX" > + "@ > + vlgv\t%0,%v1,%3(%2) > + vste\t%v1,%0,%2" > + [(set_attr "op_type" "VRS,VRX")]) Likewise for %3. Also, the second alternative seems odd, it matches solely a PLUS of two CONST_INTs, which is not canonical RTL ... Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 0/9] S/390 rework shift count handling - v3
Andreas Krebbel wrote: > S/390: Use enabled attribute overrides to disable alternatives. > S/390: Get rid of Y constraint in rotate patterns. > S/390: Get rid of Y constraint in left and logical right shift > patterns. > S/390: Get rid of Y constraint in arithmetic right shift patterns. > S/390: Get rid of Y constraint in tabort. > S/390: Get rid of Y constraint in vector.md. > S/390: Use define_subst for the setmem patterns. > S/390: Disallow SImode in s390_decompose_address Except for a few minor comments for the vector.md patch (separate mail), this all looks now very good to me. Thanks, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 7/9] S/390: Get rid of Y constraint in vector.md.
I wrote: > Andreas Krebbel wrote: > > > +; vec_set is supposed to *modify* an existing vector so operand 0 is > > +; duplicated as input operand. > > +(define_expand "vec_set" > > + [(set (match_operand:V0 "register_operand" > >"") > > + (unspec:V [(match_operand: 1 "general_operand" > > "") > > + (match_operand:SI2 "shift_count_or_setmem_operand" > > "") > > This is probably only cosmetic, but should we use nonmemory_operand here > instead of shift_count_or_setmem_operand (just like everywhere else now)? > > > +(define_expand "vec_extract" > > + [(set (match_operand: 0 "nonimmediate_operand" "") > > + (unspec: [(match_operand:V 1 "register_operand" "") > > + (match_operand:SI 2 "shift_count_or_setmem_operand" > > "")] > > Likewise. I just noticed that there are two more UNSPEC_VEC_SET expanders in vx-builtins.md. I guess those should be likewise changed: (define_expand "vec_insert" [(set (match_operand:V_HW0 "register_operand" "") (unspec:V_HW [(match_operand: 2 "register_operand" "") (match_operand:SI3 "shift_count_or_setmem_operand" "") (match_operand:V_HW 1 "register_operand" "")] UNSPEC_VEC_SET))] "TARGET_VX" "") (define_expand "vec_promote" [(set (match_operand:V_HW0 "register_operand" "") (unspec:V_HW [(match_operand: 1 "register_operand" "") (match_operand:SI2 "shift_count_or_setmem_operand" "") (match_dup 0)] UNSPEC_VEC_SET))] "TARGET_VX" "") Then the only remaining users of shift_count_or_setmem_operand are the actual setmem patterns (so maybe the predicate can be renamed to "setmem_operand") :-) Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class
H.J. Lu wrote: > + if (type && type_is_empty_type_p (type)) > +{ > + if (warn_empty_type_p) > + warn_empty_type (); > + return NULL; > +} If I understand the problem correctly, for C code empty types already were handled correctly, the problem occured just for C++ code (since the size of an empty size is != 0 there). However, it seems this warning check would now appear also for C code, even though there really shouldn't be any ABI changes there. Would it not be better to only generate the warning for C++ (maybe check whether the type size is nonzero)? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Still crashes due to aliasing violation (Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator)
On 07/09/2015 11:43 PM, Martin Liška wrote: > This final version which I agreed with Richard Sandiford. > Hope this can be finally installed to trunk? > > Patch can bootstrap and survive regression tests on x86_64-linux-gnu. Unfortunately, this still crashes on my SPU toolchain build machine, for pretty much the same reason outlined here: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00868.html However, the host compiler no longer miscompiles these lines: empty_shared_hash = new shared_hash_def; empty_shared_hash->refcount = 1; But that is simply because that "new" now goes to the default heap-based allocator from the standard library. There is a "shared_hash_def_pool", but that's apparently not being used for anything -- this probably was not intended? But now the following lines are miscompiled: elt_list *el = elt_list_pool.allocate (); el->next = next; el->elt = elt; from new_elt_list in cselib.c. Again, the "allocate" call ends simply with a cast: header = m_returned_free_list; m_returned_free_list = header->next; return (void *)(header); and type-based aliasing now states the access to "header->next" in allocate must not alias the access to "el->next" in new_elt_list, but clearly it does. (Since there is no C++ operator new involved at all anymore, this clearly violates even the C aliasing rules ...) I really think the allocate routine needs to be more careful to avoid violating aliasing, e.g. by using memcpy or union-based type-punning to access its free list info. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: Still crashes due to aliasing violation (Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator)
Richard Biener wrote: > On July 17, 2015 3:11:51 PM GMT+02:00, Ulrich Weigand > wrote: > >(Since there is no C++ operator new involved at all anymore, > >this clearly violates even the C aliasing rules ...) > > > >I really think the allocate routine needs to be more careful to > >avoid violating aliasing, e.g. by using memcpy or union-based > >type-punning to access its free list info. > > As far as I understand the object allocator delegates construction to callers > and thus in the above case cselib > Would be responsible for calling placement new on the return value from > Allocate. Ah, it looks like I was wrong above: the code uses the *object* allocator, so it should go through a placement new here: inline T * allocate () ATTRIBUTE_MALLOC { return ::new (m_allocator.allocate ()) T (); } It's still being miscompiled at least by my GCC 4.1 host compiler ... Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: Still crashes due to aliasing violation (Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator)
Richard Biener wrote: > On July 17, 2015 3:50:19 PM GMT+02:00, "Martin Liška" wrote: > >Question is why aliasing oracle still wrongly aliases these pointers? > >Another option (suggested by Martin Jambor) would be to place > >::allocate implementation > >to alloc-pool.c file. > > Note that all compilers up to 4.4 have aliasing issues with placement new. > A fix is to move the placement new out-of-line. Yes, that's what I just noticed as well. In fact, my particular problem already disappears with 4.3, presumably due to the fix for PR 29286. So do we now consider host compilers < 4.3 (4?) unsupported for building mainline GCC, or should we try to work around the issue (e.g. by moving the allocator out-of-line or using some other aliasing barrier)? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: Still crashes due to aliasing violation (Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator)
On July 17, 2015 6:54:32 PM GMT+02:00, Ulrich Weigand wrote: > >So do we now consider host compilers < 4.3 (4?) unsupported for > >building > >mainline GCC, or should we try to work around the issue (e.g. by moving > >the allocator out-of-line or using some other aliasing barrier)? > > Why is this an issue for stage1 which runs w/o optimization? Well, this is the SPU compiler on a Cell system, which is technically a cross compiler from PowerPC (even though the resulting binaries run natively on the machine). > For cross compiling we already suggest using known good compilers. The documentation says: To build a cross compiler, we recommend first building and installing a native compiler. You can then use the native GCC compiler to build the cross compiler. The installed native compiler needs to be GCC version 2.95 or later. So building with a native GCC 4.1 seems to have been officially supported until now as far as I can tell (unless you're building Ada). Now, I could certainly live with a statement that cross compilers can only be build with a native GCC 4.3 or newer; but that should be IMO a deliberate decision and be widely announced (maybe even verified by a configure check?), so that others don't run into the problem; the nature of its symptoms make the problem difficult to diagnose. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 3/4] S390 -march=native related fixes
Dominik Vogt wrote: > + opt_esa_zarch = (has_highgprs) ? " -mzarch" : " -mesa"; This will force -mesa on old machines *even in -m64 mode*, which is wrong and will cause compilation to fail. > -/* Defaulting rules. */ > #ifdef DEFAULT_TARGET_64BIT > -#define DRIVER_SELF_SPECS\ This completely removes use of DRIVER_SELF_SPECS for defaulting, which I introduced back here: https://gcc.gnu.org/ml/gcc-patches/2003-06/msg03369.html The reason for using DRIVER_SELF_SPECS as described there is to make sure we use compatible flags for compiler, assembler and linker in all cases, even if some of those flags result from defaulting rules. If we don't do that, we rely on those components agreeing exactly in how to default for unspecified options; for example, we want to give the correct -march flag to the assembler as an additional verification to detect compiler bugs where the compiler erroneously generates an incorrect instruction for that architecture. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 3/4] S390 -march=native related fixes
Dominik Vogt wrote: > * config/s390/driver-native.c (s390_host_detect_local_cpu): Handle > processor capabilities with -march=native. > * config/s390/s390.h (MARCH_MTUNE_NATIVE_SPECS): Likewise. > (DRIVER_SELF_SPECS): Likewise. Join specs for 31 and 64 bit. > * (S390_TARGET_BITS_STRING): Macro to simplify specs. This version is looking good, except for one problem: > - "%{!m31:%{!m64:-m64}}",\ > - "%{!mesa:%{!mzarch:%{m31:-mesa}%{m64:-mzarch}}}", \ > - "%{!march=*:%{mesa:-march=g5}%{mzarch:-march=z900}}", \ There's a reason for this particular sequence. The first line ensures that one of -m64 or -m31 is present. The second line ensures that one of -mesa or -mzarch is present, but this works only if already one of -m64 or -m31 is present, so it needs to come *after* the first line. The third line ensures that some -march= switch is present, but this works only if already one of -mesa or -mzarch is present, so it needs to comer *after* the second line. > +#define DRIVER_SELF_SPECS\ > + "%{!m31:%{!m64:-m" S390_TARGET_BITS_STRING "}} " \ > + "%{!march=*:%{mesa:-march=g5}%{mzarch:-march=z900}} " \ > + MARCH_MTUNE_NATIVE_SPECS \ > + "%{!mesa:%{!mzarch:%{m31:-mesa}%{m64:-mzarch}}} " This inverts the order of the second and third lines, so it is now no longer guaranteed that at least one -march= switch is present. I understand that you need to move MARCH_MTUNE_NATIVE_SPECS ahead of the -mesa/-mzarch defaulting rule, but it should be possible to do that without changing the sequence of the three existing rules. Why not just move MARCH_MTUNE_NATIVE_SPECS first? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 3/4] S390 -march=native related fixes
Dominik Vogt wrote: > * config/s390/driver-native.c (s390_host_detect_local_cpu): Handle > processor capabilities with -march=native. > * config/s390/s390.h (MARCH_MTUNE_NATIVE_SPECS): Likewise. > (DRIVER_SELF_SPECS): Likewise. Join specs for 31 and 64 bit. > * (S390_TARGET_BITS_STRING): Macro to simplify specs. (That last "*" is superfluous.) This looks correct to me now, just a cosmetic comment: > +/* Defaulting rules. */ > +#define DRIVER_SELF_SPECS\ > + "%{!m31:%{!m64:-m" S390_TARGET_BITS_STRING "}} ", \ > + MARCH_MTUNE_NATIVE_SPECS, \ > + "%{!mesa:%{!mzarch:%{m31:-mesa}%{m64:-mzarch}}} ", \ > + "%{!march=*:%{mesa:-march=g5}%{mzarch:-march=z900}} " There's no need to add those spaces at the end -- the self specs are all independent string, they don't need to end in a space. Also, I had thought to put MARCH_MTUNE_NATIVE_SPECS right at the top of list, like so: #define DRIVER_SELF_SPECS \ MARCH_MTUNE_NATIVE_SPECS, \ "%{!m31:%{!m64:-m" S390_TARGET_BITS_STRING "}}", \ "%{!mesa:%{!mzarch:%{m31:-mesa}%{m64:-mzarch}}}", \ "%{!march=*:%{mesa:-march=g5}%{mzarch:-march=z900}}" But there should not be any functional difference between the two, it just looks a bit nicer maybe. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: s390-linux fails to build
Jakub Jelinek wrote: > On Thu, Jul 23, 2015 at 01:03:19PM +0100, Nick Clifton wrote: > > Hi Helmut, Hi Ulrich, Hi Andreas, > > > > A toolchain configured as --target=s390-linux currently fails to build > > gcc because of an undefined function: > > > > undefined reference to `s390_host_detect_local_cpu(int, char const**)' > > Makefile:1858: recipe for target 'xgcc' failed > > > > The patch below fixes the problem for me by adding a stub function in > > s390-common.c, but I am not sure if it is the correct solution. > > Please can you advise ? > > Isn't it better to just follow what other arches do? > E.g. on i?86/x86_64, the EXTRA_SPEC_FUNCTIONS definition is guarded > with > #if defined(__i386__) || defined(__x86_64__) > and similarly on mips: > #if defined(__mips__) > and thus I'd expect s390{,x} should guard it with > #if defined(__s390__) || defined(__s390x__) > or so. This is supposed to be fixed by this pending patch: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01546.html (which hasn't been applied yet, but probably should be soon ...) > The config.host change also looks wrong, e.g. i?86 or mips have: > i[34567]86-*-* \ > | x86_64-*-* ) > case ${target} in > i[34567]86-*-* \ > | x86_64-*-* ) > host_extra_gcc_objs="driver-i386.o" > host_xmake_file="${host_xmake_file} i386/x-i386" > ;; > esac > ;; > mips*-*-linux*) > case ${target} in > mips*-*-linux*) > host_extra_gcc_objs="driver-native.o" > host_xmake_file="${host_xmake_file} mips/x-native" > ;; > esac > ;; > while s390 has: > s390-*-* | s390x-*-*) > host_extra_gcc_objs="driver-native.o" > host_xmake_file="${host_xmake_file} s390/x-native" > ;; > I bet that is gone break also cross-compilers from s390* to other targets. I think this should be fine on s390. The problem with i386 is that the driver-native.c file uses data types only defined by the i386 target files (e.g. enum processor_type). But on s390, the file does not any target-specific types and should be fully portable. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 3/4] S390 -march=native related fixes
Dominik Vogt wrote: > > gcc/ChangeLog > > * config/s390/driver-native.c (s390_host_detect_local_cpu): Handle > processor capabilities with -march=native. > * config/s390/s390.h (MARCH_MTUNE_NATIVE_SPECS): Likewise. > (DRIVER_SELF_SPECS): Likewise. Join specs for 31 and 64 bit. > (S390_TARGET_BITS_STRING): Macro to simplify specs. This is OK. Thanks, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: s390-linux fails to build
Jakub Jelinek wrote: > On Thu, Jul 23, 2015 at 02:46:43PM +0200, Ulrich Weigand wrote: > > > I bet that is gone break also cross-compilers from s390* to other targets. > > > > I think this should be fine on s390. The problem with i386 is that > > the driver-native.c file uses data types only defined by the i386 > > target files (e.g. enum processor_type). But on s390, the file does > > not any target-specific types and should be fully portable. > > That hunk means that driver-native.o is added to EXTRA_GCC_OBJS > even say for s390x-*-* -> x86_64-*-* compiler. While it might compile > there, nothing will use it, so what is it good for? > i?86/x86_64 backend will certainly not reference s390_host_detect_local_cpu > anywhere. Oh, I agree this will not be *used*. I just wanted to point out that it will not *break* cross-compilers as is. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] S390: Clean up cross-compile for S390.
Dominik Vogt wrote: > gcc/ChangeLog > > * config.hosto_plugin_soname): Include driver-native.c only when > building with s390* as host and target. (There seems to be a typo in the log entry ...) This is OK. Thanks, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[PATCH] Work around host compiler placement new aliasing bug
er than GCC 4.3. + CXXFLAGS="$CXXFLAGS -fno-strict-aliasing" + AC_MSG_CHECKING([whether $CXX is affected by placement new aliasing bug]) + AC_COMPILE_IFELSE([ +#if (__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3) +#error compiler not affected by placement new aliasing bug +#endif +], +[AC_MSG_RESULT([yes]); aliasing_flags='-fno-strict-aliasing'], +[AC_MSG_RESULT([no])]) + + CXXFLAGS="$saved_CXXFLAGS" +fi +AC_SUBST(aliasing_flags) Index: gcc/Makefile.in === --- gcc/Makefile.in (revision 226312) +++ gcc/Makefile.in (working copy) @@ -180,6 +180,8 @@ NOCOMMON_FLAG = @nocommon_flag@ NOEXCEPTION_FLAGS = @noexception_flags@ +ALIASING_FLAGS = @aliasing_flags@ + # This is set by --disable-maintainer-mode (default) to "#" # FIXME: 'MAINT' will always be set to an empty string, no matter if # --disable-maintainer-mode is used or not. This is because the @@ -986,7 +988,8 @@ ALL_CFLAGS = $(T_CFLAGS) $(CFLAGS-$@) \ # The C++ version. ALL_CXXFLAGS = $(T_CFLAGS) $(CFLAGS-$@) $(CXXFLAGS) $(INTERNAL_CFLAGS) \ - $(COVERAGE_FLAGS) $(NOEXCEPTION_FLAGS) $(WARN_CXXFLAGS) @DEFS@ + $(COVERAGE_FLAGS) $(ALIASING_FLAGS) $(NOEXCEPTION_FLAGS) \ + $(WARN_CXXFLAGS) @DEFS@ # Likewise. Put INCLUDES at the beginning: this way, if some autoconf macro # puts -I options in CPPFLAGS, our include files in the srcdir will always -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] Work around host compiler placement new aliasing bug
Richard Biener wrote: > On Wed, Jul 29, 2015 at 3:57 PM, Ulrich Weigand wrote: > > Hello, > > > > this patch is a workaround for the problem discussed here: > > https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01597.html > > > > The problem is that the new pool allocator code relies on C++ aliasing > > rules related to placement new (basically, that placement new changes > > the dynamic type of the referenced memory). GCC compilers prior to > > version 4.3 did not implement this rule correctly (PR 29286). > > > > When building current GCC with a host compiler that is affected by this > > bug, and we build with optimization enabled (this typically only happens > > when building a cross-compiler), the resulting compiler binary may be > > miscompiled. > > > > The patch below attempts to detect this situation by checking whether > > the host compiler is a version of GCC prior to 4.3 (but stil accepts > > the -fno-strict-aliasing flag). If so, -fno-strict-aliasing is added > > to the flags when building the compiler binary. > > > > Tested on i686-linux, and when building an SPU cross-compiler using > > a gcc 4.1 powerpc64-linux host compiler. > > > > OK for mainline? > > Ok if nobody objects. I've checked this in now. Thanks, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: debug mode symbols cleanup
Francois Dumont wrote: > * include/debug/formatter.h > (_Error_formatter::_Parameter::_M_print_field): Delete. > (_Error_formatter::_Parameter::_M_print_description): Likewise. > (_Error_formatter::_M_format_word): Likewise. > (_Error_formatter::_M_print_word): Likewise. > (_Error_formatter::_M_print_string): Likewise. > (_Error_formatter::_M_get_max_length): Likewise. > (_Error_formatter::_M_max_length): Likewise. > (_Error_formatter::_M_indent): Likewise. > (_Error_formatter::_M_column): Likewise. > (_Error_formatter::_M_first_line): Likewise. > (_Error_formatter::_M_wordwrap): Likewise. > * src/c++11/debug.cc: Adapt. This seems to break building an spu-elf cross-compiler: /home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc: In function 'void {anonymous}::print_word({anonymous}::PrintContext&, const char*, std::ptrdiff_t)': /home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:573:10: error: 'stderr' was not declared in this scope fprintf(stderr, "\n"); ^ /home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:573:22: error: 'fprintf' was not declared in this scope fprintf(stderr, "\n"); ^ Bye, Ulrich
Re: debug mode symbols cleanup
Jonathan Wakely wrote: > On 18/09/15 13:00 +0200, Ulrich Weigand wrote: > >/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc: > > In function 'void {anonymous}::print_word({anonymous}::PrintContext&, const > >char*, std::ptrdiff_t)': > >/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:573:10: > > error: 'stderr' was not declared in this scope > > fprintf(stderr, "\n"); > > ^ > >/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:573:22: > > error: 'fprintf' was not declared in this scope > > fprintf(stderr, "\n"); > > ^ > > Catherine fixed this with r227888. Ah, OK. Thanks! Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[commit, spu] Re: [BUILDROBOT] spu: left shift of negative value
Jan-Benedict Glaw wrote: > I just noticed that (for config_list.mk builds), current GCC errors > out at spu.c, see eg. build > http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=3D469639 : > > g++ -fno-PIE -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-excep= > tions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrit= > e-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -peda= > ntic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -f= > no-common -DHAVE_CONFIG_H -I. -I. -I../../../gcc/gcc -I../../../gcc/gcc/. = > -I../../../gcc/gcc/../include -I../../../gcc/gcc/../libcpp/include -I/opt/c= > farm/mpc/include -I../../../gcc/gcc/../libdecnumber -I../../../gcc/gcc/../= > libdecnumber/dpd -I../libdecnumber -I../../../gcc/gcc/../libbacktrace -o = > spu.o -MT spu.o -MMD -MP -MF ./.deps/spu.TPo ../../../gcc/gcc/config/spu/sp= > u.c > ../../../gcc/gcc/config/spu/spu.c: In function =E2=80=98void spu_expand_ins= > v(rtx_def**)=E2=80=99: > ../../../gcc/gcc/config/spu/spu.c:530:46: error: left shift of negative val= > ue [-Werror=3Dshift-negative-value] >maskbits =3D (-1ll << (32 - width - start)); > ^ > ../../../gcc/gcc/config/spu/spu.c:536:46: error: left shift of negative val= > ue [-Werror=3Dshift-negative-value] >maskbits =3D (-1ll << (64 - width - start)); > ^ > cc1plus: all warnings being treated as errors > Makefile:2092: recipe for target 'spu.o' failed > make[2]: *** [spu.o] Error 1 > make[2]: Leaving directory '/home/jbglaw/build-configlist_mk/spu-elf/build-= > gcc/mk/spu-elf/gcc' I've now checked in the following fix. Thanks, Ulrich ChangeLog: * config/spu/spu.c (spu_expand_insv): Avoid undefined behavior. Index: gcc/config/spu/spu.c === *** gcc/config/spu/spu.c(revision 227968) --- gcc/config/spu/spu.c(working copy) *** spu_expand_insv (rtx ops[]) *** 472,478 { HOST_WIDE_INT width = INTVAL (ops[1]); HOST_WIDE_INT start = INTVAL (ops[2]); ! HOST_WIDE_INT maskbits; machine_mode dst_mode; rtx dst = ops[0], src = ops[3]; int dst_size; --- 472,478 { HOST_WIDE_INT width = INTVAL (ops[1]); HOST_WIDE_INT start = INTVAL (ops[2]); ! unsigned HOST_WIDE_INT maskbits; machine_mode dst_mode; rtx dst = ops[0], src = ops[3]; int dst_size; *** spu_expand_insv (rtx ops[]) *** 527,541 switch (dst_size) { case 32: ! maskbits = (-1ll << (32 - width - start)); if (start) ! maskbits += (1ll << (32 - start)); emit_move_insn (mask, GEN_INT (maskbits)); break; case 64: ! maskbits = (-1ll << (64 - width - start)); if (start) ! maskbits += (1ll << (64 - start)); emit_move_insn (mask, GEN_INT (maskbits)); break; case 128: --- 527,541 switch (dst_size) { case 32: ! maskbits = (~(unsigned HOST_WIDE_INT)0 << (32 - width - start)); if (start) ! maskbits += ((unsigned HOST_WIDE_INT)1 << (32 - start)); emit_move_insn (mask, GEN_INT (maskbits)); break; case 64: ! maskbits = (~(unsigned HOST_WIDE_INT)0 << (64 - width - start)); if (start) ! maskbits += ((unsigned HOST_WIDE_INT)1 << (64 - start)); emit_move_insn (mask, GEN_INT (maskbits)); break; case 128: -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[commit][spu] Support atomic builtins
"atomic_exchange" + [(match_operand:AINT 0 "spu_reg_operand" "");; output +(match_operand:AINT 1 "memory_operand" "") ;; memory +(match_operand:AINT 2 "spu_nonmem_operand" "") ;; input +(match_operand:SI 3 "const_int_operand" "")] ;; model + "" + { + rtx retval; + + if (MEM_ADDR_SPACE (operands[1])) + FAIL; + + retval = gen_reg_rtx (mode); + + emit_move_insn (retval, operands[1]); + emit_move_insn (operands[1], operands[2]); + emit_move_insn (operands[0], retval); + DONE; + }) + + (define_expand "atomic_" + [(ATOMIC:AINT + (match_operand:AINT 0 "memory_operand" "") ;; memory + (match_operand:AINT 1 "" "")) ;; operand +(match_operand:SI 2 "const_int_operand" "")] ;; model + "" + { + if (MEM_ADDR_SPACE (operands[0])) + FAIL; + + spu_expand_atomic_op (, operands[0], operands[1], + NULL_RTX, NULL_RTX); + DONE; + }) + + (define_expand "atomic_fetch_" + [(match_operand:AINT 0 "spu_reg_operand" "");; output +(ATOMIC:AINT + (match_operand:AINT 1 "memory_operand" "") ;; memory + (match_operand:AINT 2 "" "")) ;; operand +(match_operand:SI 3 "const_int_operand" "")] ;; model + "" + { + if (MEM_ADDR_SPACE (operands[1])) + FAIL; + + spu_expand_atomic_op (, operands[1], operands[2], + operands[0], NULL_RTX); + DONE; + }) + + (define_expand "atomic__fetch" + [(match_operand:AINT 0 "spu_reg_operand" "");; output +(ATOMIC:AINT + (match_operand:AINT 1 "memory_operand" "") ;; memory + (match_operand:AINT 2 "" "")) ;; operand +(match_operand:SI 3 "const_int_operand" "")] ;; model + "" + { + if (MEM_ADDR_SPACE (operands[1])) + FAIL; + + spu_expand_atomic_op (, operands[1], operands[2], + NULL_RTX, operands[0]); + DONE; + }) + -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc
Hans-Peter Nilsson wrote: > The directory at $target_header_dir is already inspected in > gcc/configure, for e.g. glibc version and stack protector > support, but not for setting inhibit_libc. This is just > inconsistent and the obvious resolution to me is to inhibit > inhibit_libc when a target *does* "have its own set of headers", > to quote the comment above the inhibit_libc setting. There is > nothing in the build log for "make all-gcc" that shows a > difference with/without --with-headers, if headers are actually > present anyway! > gcc: > * configure.ac (target_header_dir): Move block defining > this to before the block setting inhibit_libc. > (inhibit_libc): When considering $with_headers, just > check it it's explicitly "no". If not, also check if > $target_header_dir/stdio.h is present. If not, set > inhibit_libc=true. > * configure: Regenerate. This had the effect of breaking all gcov-related tests in my SPU builder, since libgcov was stubbed out due to inhibit_libc now being set. I'm using the build procedure: build initial GCC (--without-headers), use it to build newlib, install newlib into prefix, build final GCC (--with-headers). Using this procedure, inihibit_libc used to *not* be set in the final GCC build, but now it is. I'm seeing two issues here: > +if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x; then > + if test "x$with_headers" != x; then > +target_header_dir=$with_headers In the common case of just using --with-headers, this now sets target_header_dir to "yes", which is not particularly useful. > + elif test "x$with_sysroot" = x; then > + > target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include" If I change my build procedure to omit --with-headers, it still won't find the headers since newlib install placed them into .../include, not .../sys-include. The whole sys-include thing has always confused me anyway :-) I thought that the point of sys-include is that if you specify --with-headers=, then the contents of will be copied into your prefix under .../sys-include; the compiler will use both .../sys-include *and* .../include, using the former to find headers specified using --with-headers, and the latter to find headers already installed in the prefix. While the compiler will search both locations, target_header_dir can only be set to one of them. However, in this case, it seems it would be more useful to set it to .../include: since .../sys-include should only ever contain anything when using --with-headers=, and in *that* case, target_header_dir is being set directly to anyway. Now I understand that you didn't introduce those lines, and they were already wrong before your patch; but after the patch the problem now actually matters. Before the patch, I could always use --with-header to say: just assume headers are present in the prefix, and everything worked. This is not a critical problem since I have a work-around: remove --with-headers and also manually create a symlink from sys-include to include in the prefix. But it would still be nice to avoid having to do the symlink ... Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc
Hans-Peter Nilsson wrote: > > From: Ulrich Weigand > > Date: Tue, 6 Oct 2015 16:04:35 +0200 > > > I'm using the build procedure: build initial GCC (--without-headers), > > use it to build newlib, install newlib into prefix, build final GCC > > (--with-headers). Using this procedure, inihibit_libc used to *not* > > be set in the final GCC build, but now it is. > > And not using --with-newlib I think. Somewhat of a borderline > case, FWIW. Well, --with-newlib doesn't really matter here, since the only use in GCC itself is for this check: if { { test x$host != x$target && test "x$with_sysroot" = x ; } || test x$with_newlib = xyes ; } && { test "x$with_headers" = xno || test ! -f "$target_header_dir/stdio.h"; } ; then inhibit_libc=true fi and since for me host != target and I don't use a sysroot, the first condition in the || is already true. (I don't like using --with-newlib because it causes the configure scripts in the various target libraries to stop doing cross-compile checks and default to hard-coded assumptions on which functions are and are not present. Those hard-coded checks tend to be outdated and/or wrong for SPU; while the ususal cross-compile checks work just fine if newlib has been installed into the prefix before.) > > > +if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x; then > > > + if test "x$with_headers" != x; then > > > +target_header_dir=$with_headers > > > > In the common case of just using --with-headers, this now sets > > target_header_dir to "yes", which is not particularly useful. > > --with-headers without a path to an argument? > Odd but that *should* work. I see old lines here and there > including *toplevel* configure.ac that refers to that. Yes, pretty much the only use for --with-headers without argument was to short-circuit the inhibit_libc test. > > Now I understand that you didn't introduce those lines, and they were > > already > > wrong before your patch; but after the patch the problem now actually > > matters. > > Before the patch, I could always use --with-header to say: just assume > > headers > > are present in the prefix, and everything worked. > > At a quick glance and my initial guess there's a missing two or > four lines checking for with_headers=yes. > > > This is not a critical problem since I have a work-around: remove > > --with-headers > > and also manually create a symlink from sys-include to include in the > > prefix. > > But it would still be nice to avoid having to do the symlink ... > > I'd recommend writing a patch handling that "yes". > I know I'm the one "exposing a latent bug" but you're in a much > better position to test it. So the question is what this should do then? Should I simply add back the behavior that when using --with-headers, we never get inhibit_libc set? Or should we simply ignore --with-headers and check for the presence of headers installed in the prefix? This would work too, once we solve the sys-include vs. include problem. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc
Hans-Peter Nilsson wrote: > Sanity-check: you do have a $target_header_dir/stdio.h right? Well, no. That was the point of my original mail :-) With my initial configure line, using --with-headers, $target_header_dir is "yes" at this point, and I don't have "yes/stdio.h". Omitting --with-headers, $target_header_dir is something along the lines of "/spu/sys-include", which *does not exist* on my system. However, all newlib headers including stdio.h are actually present, just in "/spu/include". > Maybe make with_headers=yes (i.e. not a path) have the effect of > setting target_header_dir to include instead of sys-include? That would solve my problem. However, if with_headers is not set at all, shouldn't we then *also* set it to include? I don't really particularly want to use --with-headers, it's just that this is what worked in the past. Ideally, it would be best if everything just worked as expected without any option if there are indeed already headers installed in the prefix. > Anyway, with_headers=yes/no should simply short-circuit an > inspection of $target_header_dir regarding setting inhibit_libc. > I guess. But then again, target_header_dir really needs to be > set usefully for inspection, or else every use of it needs to be > dominated by a with_headers=yes test or something. Well, every other use of $target_header_dir does not in fact matter on the SPU (because they check for features that aren't present in newlib and/or on the SPU anyway). That's why the weird behaviour did not really matter in the past ... So short-circuting just for inhibit_libc would get everything back to where we were before your patch. But I agree this is not really the right solution. My preferred solution would be to set $target_header_dir to include instead of sys-include, always. I'm just a bit worried if this may break other users that have a workflow that somehow works with the current setup using sys-include. I guess my underlying problem is that I don't quite understand how the whole sys-include thing was intended to work in the first place ... [ Note there's also CROSS_SYSTEM_HEADER_DIR, which is also sometimes set to sys-include ... ] Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc
Hans-Peter Nilsson wrote: > > > From: Ulrich Weigand > > Date: Tue, 6 Oct 2015 18:55:53 +0200 > > > > Maybe make with_headers=yes (i.e. not a path) have the effect of > > > setting target_header_dir to include instead of sys-include? > > (...and inspect both, use the first one that works?) So what does "works" mean, here? Do you expect that all headers are present either in include or all in sys-include? Or is there a scenario where some headers may be in either directory, and you'd have to check separately for any header you want to access? > > My preferred solution would be to set $target_header_dir to include > > instead of sys-include, always. I'm just a bit worried if this may > > break other users that have a workflow that somehow works with the > > current setup using sys-include. > > (T would: proof of existence here.) I'm more than worried! > Please don't suggest to change a toolchain configuration item > only because you used something that half-way worked, then > broke, like this. Not the least when also saying the below: > > > I guess my underlying problem is > > that I don't quite understand how the whole sys-include thing was > > intended to work in the first place ... > > I don't know when "include" will work and not, in particular > compared to "sys-include". So I wondering: what is your current setup? What headers do you have in sys-include, and how did they get there? I'm aware of the copying done when using --with-headers=, but this case should still work, since $target_header_dir is set directly to in this case anyway. Is there some *other* case, where you do not use --with-headers=, but still have a pre-existing sys-include directory in the prefix? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc
ude instead of > sys-include would be the least evil, least unexpected change, > that would work for most, including you. > > I wouldn't understand to instead change things around and make > "include" be inspected by default. It's only the --with-headers > option that's broken. So this would be: --with-headers= Copy headers from to sys-include --with-headers Use headers from prefix include directory Use existing sys-include directory --without-headers Do not use any headers I agree that this is the smallest change to current behavior; on the other hand, it seems quite odd overall (i.e. hardest to explain to someone unfamiliar with current behavior). At the very least, the docs would have to be adapted. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc
Hans-Peter Nilsson wrote: > > > So, ISTM we should change --with-headers (=yes) to either look > > > in sys-include or in include. Setting it to sys-include > > > wouldn't help you or anyone else as it's already the default... > > > > On the other hand, the current docs appear to imply that the > > intent was for --with-headers (=yes) to look into a pre-existing > > sys-include directory for headers. > > Right. So, if you'd prefer to align the implementation with > that, I don't mind. But, these are odd cases as-is, so current > use and users matter when aligning the documentation and > implementation and I wouldn't be surprised if the entire > usage-space is between ours... So overall, I now this is probably the best way forward: 1) Fix --with-headers to work just like no argument does now, i.e. look in sys-include. This is a simple bugfix that brings behavior in line with documentation. It does imply that everybody has to use sys-include, but that seems to be accepted practice anyway. (For me, it just means adding a symlink.) If at some point we do want to make things work without sys-include, I see two options: 2a) Change to not look into sys-include, but include. This would be a change in existing behavior that would affect some users. (They could get the old behavior back by simply adding --with-headers to their configure line, however.) --or-- 2b) Change target_header_dir from a single directory to a list of directories, and check all of these for header files. This list would typically include both sys-include and include. This should not change behavior for any existing user, and would bring the header search at configure time in line with the actual search order used by the compiler at run time, which will probably be the least surprise to users anyway ... For 1), something like the following should probably suffice: Index: gcc/configure.ac === --- gcc/configure.ac(revision 228530) +++ gcc/configure.ac(working copy) @@ -1993,7 +1993,7 @@ elif test "x$TARGET_SYSTEM_ROOT" != x; t fi if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x; then - if test "x$with_headers" != x; then + if test "x$with_headers" != x && test "x$with_headers" != xyes; then target_header_dir=$with_headers elif test "x$with_sysroot" = x; then target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include" I'll probably not spend any more time right now to try to implement either of the 2) variants; I can live with using sys-include for now. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH v2 13/13] Add hook for modifying debug info for address spaces
Richard Henderson wrote: > +@deftypefn {Target Hook} int TARGET_ADDR_SPACE_DEBUG (addr_space_t @var{as}) > +Define this to define how the address space is encoded in dwarf. > +The result, @var{r}, should be positive to indicate > +@code{DW_AT_address_class @var{r}} should be emitted; or negative > +to indicate @code{DW_AT_segment} with location in register @code{~@var{r}}. > +@end deftypefn >From my reading of the DWARF specs, DW_AT_segment and DW_AT_address_class are intended to be used for different purposes: Any debugging information entry that contains a description of the location of an object or subroutine may have a DW_AT_segment attribute, whose value is a location description. The description evaluates to the segment selector of the item being described. vs. Any debugging information entry representing a pointer or reference type or a subroutine or subroutine type may have a DW_AT_address_class attribute, whose value is an integer constant. The set of permissible values is specific to each target architecture. Back when we implemented address space support for Cell/B.E. I interpreted this to mean that DW_AT_segment was intended to be used for DIEs *defining* an object, while DW_AT_address_class was intended to be used for pointer types. (Since the former wasn't possible on Cell/B.E., we never actually implemented DW_AT_segment back then.) B.t.w. Appendix A of the DWARF4 standard also does *not* list DW_AT_segment as a valid attribute for a DW_TAG_pointer_type or DW_TAG_reference_type DIE. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH v2 13/13] Add hook for modifying debug info for address spaces
Richard Henderson wrote: > So you would recommend continuing to use address_class for these x86 segments? Yes, I think so. As far as I can see, this would simply require to define two address_class values to correspond to __seg_fs and __seg_gs. (These choices should best be documented in the platform ABI.) GDB infrastructure to decode and use address_class should already be in place, you simply need to implement the relevant gdbarch callbacks: address_class_type_flags Convert from DWARF address class to GDB internal encoding address_class_type_flags_to_name address_class_name_to_type_flags Convert between GDB internal encoding and printable name (e.g. "__seg_fs") address_to_pointer pointer_to_address Convert between a GDB flat address (CORE_ADDR) and a target-specific pointer encoding, based on the pointer's address class flags Of course, to do the latter, you still need access to the segment base values, as discussed on the GDB mailing list. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[ping] Re: Fix PR debug/66728
air (rtl, mode)); > + return true; > + } > + return false; > =20 > case CONST_DOUBLE: >/* Note that a CONST_DOUBLE rtx could represent either an integer or= > a > @@ -15671,7 +15678,7 @@ add_const_value_attribute (dw_die_ref die, rtx rtl) > =20 > case CONST: >if (CONSTANT_P (XEXP (rtl, 0))) > - return add_const_value_attribute (die, XEXP (rtl, 0)); > + return add_const_value_attribute (die, mode, XEXP (rtl, 0)); >/* FALLTHROUGH */ > case SYMBOL_REF: >if (!const_ok_for_output (rtl)) > @@ -16171,7 +16178,7 @@ add_location_or_const_value_attribute (dw_die_ref d= > ie, tree decl, bool cache_p, > =20 >rtl =3D rtl_for_decl_location (decl); >if (rtl && (CONSTANT_P (rtl) || GET_CODE (rtl) =3D=3D CONST_STRING) > - && add_const_value_attribute (die, rtl)) > + && add_const_value_attribute (die, DECL_MODE (decl), rtl)) > return true; > =20 >/* See if we have single element location list that is equivalent to > @@ -16192,7 +16199,7 @@ add_location_or_const_value_attribute (dw_die_ref d= > ie, tree decl, bool cache_p, >if (GET_CODE (rtl) =3D=3D EXPR_LIST) > rtl =3D XEXP (rtl, 0); >if ((CONSTANT_P (rtl) || GET_CODE (rtl) =3D=3D CONST_STRING) > - && add_const_value_attribute (die, rtl)) > + && add_const_value_attribute (die, DECL_MODE (decl), rtl)) >return true; > } >/* If this decl is from BLOCK_NONLOCALIZED_VARS, we might need its > @@ -16395,7 +16402,7 @@ tree_add_const_value_attribute (dw_die_ref die, tre= > e t) > =20 >rtl =3D rtl_for_decl_init (init, type); >if (rtl) > -return add_const_value_attribute (die, rtl); > +return add_const_value_attribute (die, TYPE_MODE (type), rtl); >/* If the host and target are sane, try harder. */ >else if (CHAR_BIT =3D=3D 8 && BITS_PER_UNIT =3D=3D 8 > && initializer_constant_valid_p (init, type)) > diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr66728.c b/gcc/testsuite/gc= > c.dg/debug/dwarf2/pr66728.c > new file mode 100644 > index 000..ba41e97 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr66728.c > @@ -0,0 +1,14 @@ > +/* { dg-do compile { target { x86_64-*-* && lp64 } } } */ > +/* { dg-options "-O -gdwarf -dA" } */ > + > +__uint128_t > +test (void) > +{ > + static const __uint128_t foo =3D __uint128_t) 0x22334455) << 96) > + | 0x99aabb); > + > + return foo; > +} > + > +/* { dg-final { scan-assembler {\.quad\t0x99aabb\t# DW_AT_const_value} } }= > */ > +/* { dg-final { scan-assembler {\.quad\t0x22334455\t} } } */ > -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [ping] Re: Fix PR debug/66728
Bernd Schmidt wrote: > This is all a bit html mangled, but this and the other change in > loc_descriptor seem to have the effect of doing nothing when called with > BLKmode. What is the desired effect of this on the debug information, > and how is it achieved? Sorry for the mangling caused by my forwarding of the patch. Richard's original patch is here: https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01312.html (I'll leave it to Richard to answer your other questions ...) Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] S/390: Fix warning in "*movstr" pattern.
Dominik Vogt wrote: > @@ -2936,7 +2936,7 @@ > (set (mem:BLK (match_operand:P 1 "register_operand" "0")) > (mem:BLK (match_operand:P 3 "register_operand" "2"))) > (set (match_operand:P 0 "register_operand" "=d") > - (unspec [(mem:BLK (match_dup 1)) > + (unspec:P [(mem:BLK (match_dup 1)) >(mem:BLK (match_dup 3)) >(reg:SI 0)] UNSPEC_MVST)) > (clobber (reg:CC CC_REGNUM))] Don't you have to change the expander too? Otherwise the pattern will no longer match ... Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: S/390: Fix warnings in "*setmem_long..." patterns.
Andreas Krebbel wrote: > On 11/30/2015 04:11 PM, Dominik Vogt wrote: > > The attached patch fixes some warnings generated by the setmem... > > patterns in s390.md during build and add test cases for the > > patterns. The patch is to be added on to p of the movstr patch: > > https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03485.html > > > > The test cases validate that the patterns are actually used, but > > at the moment the setmem_long_and pattern is never actually used > > and thus the test case would fail. So I've split the patch in two > > (both attached to this message) to activate this part of the test > > once we've fixed that. > > > > The patch has passed the SPEC2006 testsuite without any measurable > > changes in performance. > > Shouldn't we instead describe the whole setmem operation as unspec including > the other operands as > well? The semantics of the introduced UNSPEC_P_TO_BLK operation is not clear > to me. It suggests to > be some kind of "cast" which it isn't. In fact it is not able to do its job > without the length which > is specified as use outside the unspec. Well, I guess I suggested to Dominik to leave the basic [parallel (set (dst:BLK) (src:BLK)) (use (length)] structure in place; my understanding is that the middle-end recognizes this as a block move. As "source" in this case we'd use a BLKmode operand that consist iof the same byte replicated a number of times. If we were to use just a single UNSPEC, how would we indicate to the middle-end that a block of memory is modified, without using too coarse- grained clobbers? However, I agree that UNSPEC_P_TO_BLK really should also get the length as input, to make it have precisely defined semantics. Also, I'd rather use a more descriptive name, like UNSPEC_REPLICATE_BYTE or the like. What would you think about something like the following? (define_insn "*setmem_long" [(clobber (match_operand: 0 "register_operand" "=d")) (set (mem:BLK (subreg:P (match_operand: 3 "register_operand" "0") 0)) (unspec:BLK [(match_operand:P 2 "shift_count_or_setmem_operand" "Y") (subreg:P (match_dup 3) 1)] UNSPEC_REPLICATE_BYTE)) (use (match_operand: 1 "register_operand" "d")) (clobber (reg:CC CC_REGNUM))] [ Not sure if we'd need an extra (use (match_dup 3)) any more. ] B.t.w. this is certainly wrong and cannot be generated by common code: (and:BLK (unspec:BLK [(match_operand:P 2 "shift_count_or_setmem_operand" "Y")] UNSPEC_P_TO_BLK) (match_operand 4 "const_int_operand" "n")) (This explains why the pattern would never match.) The AND should be on the filler byte instead: (unspec:BLK [(and:P (match_operand:P 2 "shift_count_or_setmem_operand" "Y") (match_operand:P 4 "const_int_operand" "n")) (subreg:P (match_dup 3) 1)] UNSPEC_REPLICATE_BYTE)) Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: S/390: Fix warnings in "*setmem_long..." patterns.
Andreas Krebbel wrote: > On 12/02/2015 11:12 AM, Dominik Vogt wrote: > > Hopefully, this is correct now; it does pass the functional test case > > that's part of the patch. Unfortunately the define_insn patters > > had to be duplicated because of the new subreg offsets. > > The number of patterns could possibly be reduced using the define_subst > machinery. I'm looking into > this for some other changes. No need to do this right now. We can do this > later on-top. For this particular issue, shouldn't a simple mode_attr be OK? I see that the sh port uses this: (define_mode_attr lowpart_be [(QI "3") (HI "2")]) [(set (reg:SI T_REG) (eq:SI (subreg:QIHI (and:SI (match_operand:SI 0 "arith_reg_operand") (match_operand:SI 1 "arith_reg_operand")) ) (const_int 0)))] Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [RFA][1/3] Remove Cell Broadband Engine SPU targets
Thomas Schwinge wrote: > In r276692, I committed the attached to "Add back Trevor Smigiel; move > into Write After Approval section", assuming that it was just an > oversight that he got dropped from the file, as he -- in contrast to > David and Ulrich -- doesn't remain listed with other roles. Ah, you're right -- thanks for catching this! Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[spu, commit] Define TARGET_HAVE_SPECULATION_SAFE_VALUE
Hello, the SPU processor is not affected by speculation, so this macro can safely be defined as speculation_safe_value_not_needed. Committed to mainline. Bye, Ulrich gcc/ChangeLog: PR target/86807 * config/spu/spu.c (TARGET_HAVE_SPECULATION_SAFE_VALUE): Define to speculation_safe_value_not_needed. Index: config/spu/spu.c === --- config/spu/spu.c(revision 263334) +++ config/spu/spu.c(working copy) @@ -7463,6 +7463,9 @@ static const struct attribute_spec spu_a #undef TARGET_CONSTANT_ALIGNMENT #define TARGET_CONSTANT_ALIGNMENT spu_constant_alignment +#undef TARGET_HAVE_SPECULATION_SAFE_VALUE +#define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-spu.h" -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH 2/2] S/390: Remove TARGET_CPU_ZARCH
Ilya Leoshkevich wrote: > TARGET_CPU_ZARCH allowed to distinguish between g5/g6 and newer > machines. Since the former are now gone, we can assume that > TARGET_CPU_ZARCH is always true. As a side-effect, branch splitting > is now completely gone. Some parts of literal pool splitting are > also gone, but it's still there: we need to support it because > floating point and vector instructions still cannot use relative > addressing. Great to see this finally go! B.t.w. the literal pool handling can be simplified further now. The current code was made complicated due to the potential of interaction between branch splitting and literal pool splitting, see the long comment: However, the two problems are interdependent: splitting the literal pool can move a branch further away from its target, causing the 64K limit to overflow, and on the other hand, replacing a PC-relative branch by an absolute branch means we need to put the branch target address into the literal pool, possibly causing it to overflow. So, we loop trying to fix up both problems until we manage to satisfy both conditions at the same time. [...] Since branch splitting is gone, this whole logic is superfluous; the loop will never execute more than once. (It may not be necessary any longer to split the logic into the various chunkify_start/finish routines either ...) Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] libstdc++-v3: Have aligned_alloc() on Newlib
Jonathan Wakely wrote: > On 08/08/18 10:52 +0200, Sebastian Huber wrote: > >While building for Newlib, some configure checks must be hard coded. > >The aligned_alloc() is supported since 2015 in Newlib. > > > >libstdc++-v3 > > > > PR target/85904 > > * configure.ac): Define HAVE_ALIGNED_ALLOC if building for > > There's a stray closing parenthesis here. > > > Newlib. > > * configure: Regnerate. > > Typo "Regnerate". > > But the patch itself is fine - OK for trunk. > > I'm ambivalent about this being backported to gcc-7 and gcc-8 branches > (gcc-6 is unaffected as it doesn't use aligned_alloc). > > It's strictly speaking an ABI change, because HAVE_ALIGNED_ALLOC > affects the memory layout for allocations from operator new(size_t, > align_val_t) (in new_opa.cc) which needs to agree with the > corresponding operator delete (in del_opa.cc). Using static linking it > might be possible to create a binary that has operator new using > aligned_alloc, but operator delete expecting to do ((void**)ptr)[-1], > which would be bad. > > Those operators are C++17, so "experimental", but maybe we shouldn't > make the change on release branches. The way it is now I'm getting build failures on new SPU target (which is newlib based): /home/uweigand/dailybuild/spu-tc-2018-08-07/gcc-head/src/libstdc++-v3/libsupc++/new_opa.cc:58:1: error: 'void* aligned_alloc(std::size_t, std::size_t)' was declared 'extern' and later 'static' [-fpermissive] aligned_alloc (std::size_t al, std::size_t sz) ^ In file included from /home/uweigand/dailybuild/spu-tc-2018-08-07/gcc-build/spu/libstdc++-v3/include/cstdlib:75:62, from /home/uweigand/dailybuild/spu-tc-2018-08-07/gcc-build/spu/libstdc++-v3/include/stdlib.h:36, from /home/uweigand/dailybuild/spu-tc-2018-08-07/gcc-head/src/libstdc++-v3/libsupc++/new_opa.cc:27: /home/uweigand/dailybuild/spu-tc-2018-08-07/spu-toolchain/spu/include/stdlib.h:328:8: note: previous declaration of 'void* aligned_alloc(size_t, size_t)' void * aligned_alloc(size_t, size_t) __malloc_like __alloc_align(1) ^ This seems to be because configure is hard-coded to assume aligned_alloc is not available, but then the new_opc.cc file tries to define its own version, conflicting with the newlib prototype that is actually there. So one way or the other this needs to be fixed ... Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] libstdc++-v3: Have aligned_alloc() on Newlib
Jonathan Wakely wrote: > Aha, so newlib was using memalign previously: > > @@ -53,20 +54,24 @@ aligned_alloc (std::size_t al, std::size_t sz) > #else > extern "C" void *memalign(std::size_t boundary, std::size_t size); > #endif > -#define aligned_alloc memalign Yes, exactly ... this commit introduced the regression. > OK, I've regressed the branches then - I'll fix that. Thanks, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[spu, commit] Fix PR target/82960
Hello, the ICE with --enable-checking=rtl reported in PR 82960 was caused by spu.c:pad_bb using INSN_CODE on RTX with INSN_P false (specifically, on jump_table_data). Add checks to handle this case. Tested on spu-elf, committed to mainline. Bye, Ulrich ChangeLog: PR target/82960 * config/spu/spu.c (pad_bb): Only check INSN_CODE when INSN_P is true. Index: gcc/config/spu/spu.c === *** gcc/config/spu/spu.c(revision 255408) --- gcc/config/spu/spu.c(working copy) *** pad_bb(void) *** 2029,2036 for (; insn; insn = next_insn) { next_insn = next_active_insn (insn); ! if (INSN_CODE (insn) == CODE_FOR_iprefetch ! || INSN_CODE (insn) == CODE_FOR_hbr) { if (hbr_insn) { --- 2029,2037 for (; insn; insn = next_insn) { next_insn = next_active_insn (insn); ! if (INSN_P (insn) ! && (INSN_CODE (insn) == CODE_FOR_iprefetch ! || INSN_CODE (insn) == CODE_FOR_hbr)) { if (hbr_insn) { *** pad_bb(void) *** 2048,2054 } hbr_insn = insn; } ! if (INSN_CODE (insn) == CODE_FOR_blockage && next_insn) { if (GET_MODE (insn) == TImode) PUT_MODE (next_insn, TImode); --- 2049,2055 } hbr_insn = insn; } ! if (INSN_P (insn) && INSN_CODE (insn) == CODE_FOR_blockage && next_insn) { if (GET_MODE (insn) == TImode) PUT_MODE (next_insn, TImode); -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH v2] S/390: Fix conditional returns on z196+
Ilya Leoshkevich wrote: > 2018-09-19 Ilya Leoshkevich > > PR target/80080 > * config/s390/s390.md: Do not use PARALLEL RETURN+USE when > returning via %r14. This makes sense to me. I'm just wondering if it wouldn't be simpler to do the check for r14 right in s390_emit_epilogue and then just call gen_return instead of gen_return_use directly there? > +++ b/gcc/config/s390/s390.md > @@ -10831,6 +10831,13 @@ > (use (match_operand 0 "register_operand" "a"))])] >"" > { > + if (REGNO (operands[0]) == RETURN_REGNUM && s390_can_use_return_insn ()) This probably cannot happen in practice in this specific case, but in general a register_operand may also be e.g. a SUBREG, so you shouldn't use REGNO without first verifying that this is a REG. (If you move the check to s390_emit_epilogue as above, this point is moot; you don't even need to generate a REG RTX if it's for r14 then.) Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH v3] S/390: Fix conditional returns on z196+
Ilya Leoshkevich wrote: > gcc/ChangeLog: > > 2018-09-19 Ilya Leoshkevich > > PR target/80080 > * config/s390/s390.c (s390_emit_epilogue): Do not use PARALLEL > RETURN+USE when returning via %r14. > > gcc/testsuite/ChangeLog: > > 2018-09-19 Ilya Leoshkevich > > PR target/80080 > * gcc.target/s390/risbg-ll-3.c: Expect conditional returns. > * gcc.target/s390/zvector/vec-cmp-2.c: Likewise. This is OK. Thanks, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH][IRA] Analysis of register usage of functions for usage by IRA.
Tom de Vries wrote: > * ira-costs.c (ira_tune_allocno_costs): Use > ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS to adjust costs. In debugging PR 53864 on s390x-linux, I ran into a weird change in behavior that occurs when the following part of this patch was checked in: > - if (ira_hard_reg_set_intersection_p (regno, mode, > call_used_reg_set) > - || HARD_REGNO_CALL_PART_CLOBBERED (regno, mode)) > - cost += (ALLOCNO_CALL_FREQ (a) > - * (ira_memory_move_cost[mode][rclass][0] > - + ira_memory_move_cost[mode][rclass][1])); > + crossed_calls_clobber_regs > + = &(ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a)); > + if (ira_hard_reg_set_intersection_p (regno, mode, > +*crossed_calls_clobber_regs)) > + { > + if (ira_hard_reg_set_intersection_p (regno, mode, > +call_used_reg_set) > + || HARD_REGNO_CALL_PART_CLOBBERED (regno, mode)) > + cost += (ALLOCNO_CALL_FREQ (a) > + * (ira_memory_move_cost[mode][rclass][0] > + + ira_memory_move_cost[mode][rclass][1])); > #ifdef IRA_HARD_REGNO_ADD_COST_MULTIPLIER > - cost += ((ira_memory_move_cost[mode][rclass][0] > - + ira_memory_move_cost[mode][rclass][1]) > -* ALLOCNO_FREQ (a) > -* IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2); > + cost += ((ira_memory_move_cost[mode][rclass][0] > + + ira_memory_move_cost[mode][rclass][1]) > +* ALLOCNO_FREQ (a) > +* IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2); > #endif > + } Before that patch, this code would penalize all call-clobbered registers (if the alloca is used across a call), and it would penalize *all* registers in a target-dependent way if IRA_HARD_REGNO_ADD_COST_MULTIPLIER is defined; the latter is completely independent of the presence of any calls. However, after that patch, the IRA_HARD_REGNO_ADD_COST_MULTIPLIER penalty is only applied for registers clobbered by calls in this function. This seems a completely unrelated change, and looks just wrong to me ... Was this done intentionally or is this just an oversight? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [IRA] some code improvement and s390 support
Vladmir Makarov wrote: > * config/s390/s390.h (IRA_COVER_CLASSES, > IRA_HARD_REGNO_ADD_COST_MULTIPLIER(regno)): Define. In debugging PR 53854 I noticed a strange behavior in IRA costs that seems to trace back to the very first definition of the IRA_HARD_REGNO_ADD_COST_MULTIPLIER on s390: > +/* In some case register allocation order is not enough for IRA to > + generate a good code. The following macro (if defined) increases > + cost of REGNO for a pseudo approximately by pseudo usage frequency > + multiplied by the macro value. > + > + We avoid usage of BASE_REGNUM by nonzero macro value because the > + reload can decide not to use the hard register because some > + constant was forced to be in memory. */ > +#define IRA_HARD_REGNO_ADD_COST_MULTIPLIER(regno)\ > + (regno == BASE_REGNUM ? 0.0 : 0.5) (which is still unchanged in current sources.) Now, the comment says BASE_REGNUM should be avoided, but the actual implementation of the macro seems to avoid *all* registers *but* BASE_REGNUM ... Am I misreading this, or is there indeed a logic error here? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH PR62151]Fix uninitialized register issue caused by distribute_notes in combine pass
Segher Boessenkool wreote: > On Mon, Sep 01, 2014 at 10:39:10AM -0600, Jeff Law wrote: > > On 09/01/14 05:38, Segher Boessenkool wrote: > > >On Mon, Sep 01, 2014 at 11:36:07AM +0800, Bin.Cheng wrote: > > >>>In the testcase (and comment in the proposed patch), why is combine > > >>>combining four insns at all? That means it rejected combining just the > > >>>first three. Why did it do that? > > >>It is explicitly reject by below code in can_combine_p. > > >> > > >> if (GET_CODE (PATTERN (i3)) == PARALLEL) > > >> for (i = XVECLEN (PATTERN (i3), 0) - 1; i >= 0; i--) > > >> if (GET_CODE (XVECEXP (PATTERN (i3), 0, i)) == CLOBBER) > > >> { > > >> /* Don't substitute for a register intended as a clobberable > > >> operand. */ > > >> rtx reg = XEXP (XVECEXP (PATTERN (i3), 0, i), 0); > > >> if (rtx_equal_p (reg, dest)) > > >> return 0; > > >> > > >>Since insn i2 in the list of i0/i1/i2 as below contains parallel > > >>clobber of dest_of_insn76/use_of_insn77. > > >>32: r84:SI=0 > > >>76: flags:CC=cmp(r84:SI,0x1) > > >> REG_DEAD r84:SI > > >>77: {r84:SI=-ltu(flags:CC,0);clobber flags:CC;} > > >> REG_DEAD flags:CC > > >> REG_UNUSED flags:CC > > > > > >Archaeology suggests this check is because the clobber might be an > > >earlyclobber. Which seems silly: how can it be a valid insn at all > > >in that case? It seems to me the check can just be removed. That > > >will hide your issue, maybe even solve it (but I doubt it). > > Silly for other reasons, namely that earlyclobber doesn't come into play > > until after combine (register allocation and later). > > The last change to this code was by Ulrich (cc:ed); in that thread (June > 2004, mostly not threaded in the mail archive, broken MUAs :-( ) it was > said that any clobber should be considered an earlyclobber (an RTL insn > can expand to multiple machine instructions, for example). But I don't > see how that can matter for "dest" here (the dest of "insn", that's 76 > in the example), only for "src". > > The version of "flags" set in 76 obviously dies in 77 (it clobbers the > reg after all), but there is no way it could clobber it before it uses > it, that just makes no sense. And in the combined insn that version of > flags does not exist at all. This seems the time period where the email archive is not fully complete; some of the mails of that 2004 thread apparently were not linked into the monthly thread list. This archive seems to have them all: http://marc.info/?t=108747834900012&r=1&w=2 In any case, this test in can_combine_p rejects a combination for *two* different issues. One is the earlyclobber problem, which is what that 2004 thread was about, and which my patch back then relaxed for fixed hard register. However, this doesn't seem to apply to the example above; that is really about the second problem: don't substitute into a clobber. I understand the reason why this particular substitution is rejected is simply that if it weren't, we'd be substituting flags:CC=cmp(r84:SI,0x1) into clobber flags:CC, resulting in clobber cmp(r84:SI,0x1), which is invalid RTL. Now I guess this check could be relaxed if somewhere else in combine we'd recognize the substitution into a clobber and simply omit it in that case. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH][IRA] Analysis of register usage of functions for usage by IRA.
Tom de Vries wrote: > thanks for noticing this. I agree, this looks wrong, and is probably an > oversight. [ It seems that s390 is the only target defining > IRA_HARD_REGNO_ADD_COST_MULTIPLIER, so this problem didn't show up on any > other > target. ] > > I think attached patch fixes it. > > I've build the patch and ran the fuse-caller-save tests, and I'm currently > bootstrapping and reg-testing it on x86_64. Thanks! > Can you check whether this patches fixes the issue for s390 ? Yes, this (which is equivalent to a patch I had been using) does fix the s390 issue again. Just for my curiosity, why is the second condition (after &&) needed in this clause in the first place? > if (ira_hard_reg_set_intersection_p (regno, mode, > +*crossed_calls_clobber_regs) > + && (ira_hard_reg_set_intersection_p (regno, mode, > call_used_reg_set) > - || HARD_REGNO_CALL_PART_CLOBBERED (regno, mode)) If a register is in crossed_calls_clobber_regs, can it ever *not* be a call-clobbered register? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[PATCH] Fix failing assertion in calls.c:store_unaligned_arguments_into_pseudos
Hello, when implementing the new ABI for powerpc64le-linux, I ran into an assertion failure in store_unaligned_arguments_into_pseudos: gcc_assert (args[i].partial % UNITS_PER_WORD == 0); This can happen in the new ABI since we pass "homogeneous structures" consisting of soleley floating point elements of the same type in floating-point registers until they run out, and the rest of the structure in memory. If the structure member type is a 4-byte float, and we only have an odd number of floating point registers available, then args[i].partial can legitimately end up not being a multiple of UNITS_PER_WORD (i.e. 8 on the platform). Now, there are a number of similar checks that args[i].partial is aligned elsewhere in calls.c and functions.c. But for all of those the logic is: if args[i].reg is a REG, then args[i].partial must be a multiple of the word size; but if args[i].regs is a PARALLEL, then args[i].partial can be any arbitrary value. In the powerpc64le-linux use case, the back-end always generates PARALLEL in this situation, so it seemed the middle-end ought to support this -- and it does, except for this one case. However, looking more closely, it seems store_unaligned_arguments_into_pseudos is not really useful for PARALLEL arguments in the first place. What this routine does is load arguments into args[i].aligned_regs. But if we have an argument where args[i].reg is a PARALLEL, args[i].aligned_regs will in fact never be used later on at all! Instead, PARALLEL will always be handled directly via emit_group_move (in load_register_parameters), so the code generated by store_unaligned_arguments_into_pseudos for such cases is simply dead anyway. Thus, I'd suggest to simply have store_unaligned_arguments_into_pseudos skip PARALLEL arguments. Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand * calls.c (store_unaligned_arguments_into_pseudos): Skip PARALLEL arguments. Index: gcc/gcc/calls.c === --- gcc.orig/gcc/calls.c +++ gcc/gcc/calls.c @@ -981,6 +981,7 @@ store_unaligned_arguments_into_pseudos ( for (i = 0; i < num_actuals; i++) if (args[i].reg != 0 && ! args[i].pass_on_stack + && GET_CODE (args[i].reg) != PARALLEL && args[i].mode == BLKmode && MEM_P (args[i].value) && (MEM_ALIGN (args[i].value) -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[PATCH] Avoid duplicate calls to REG_PARM_STACK_SPACE
Hello, this is another tweak to the middle-end to help support the new powerpc64le-linux ABI. In the new ABI, we make a distinction between functions that pass all arguments solely in registers, and those that do not. Only when calling one the latter type (or a varags routine) does the caller need to provide REG_PARM_STACK_SPACE; in the former case, this can be omitted to save stack space. This leads to a definition of REG_PARM_STACK_SPACE that is a lot more complex than usual, which means it would be helpful if the middle-end were to evaluate it sparingly (e.g. once per function definition / call). The middle-end already does cache REG_PARM_STACK_SPACE results, however, this cache it not consistently used. In particular, the locate_and_pad_parm routine will re-evaluate the macro every time it is called, even though all callers of the routine already have a cached copy. This patch adds a new argument to locate_and_pad_parm which is used instead of evaluating REG_PARM_STACK_SPACE, and updates the callers to pass in the correct value. There should be no change in generated code on any platform. Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich 2013-11-11 Ulrich Weigand Alan Modra ChangeLog: * function.c (assign_parms): Use all.reg_parm_stack_space instead of re-evaluating REG_PARM_STACK_SPACE target macro. (locate_and_pad_parm): New parameter REG_PARM_STACK_SPACE. Use it instead of evaluating target macro REG_PARM_STACK_SPACE every time. (assign_parm_find_entry_rtl): Update call. * calls.c (initialize_argument_information): Update call. (emit_library_call_value_1): Likewise. * expr.h (locate_and_pad_parm): Update prototype. Index: gcc/gcc/expr.h === --- gcc.orig/gcc/expr.h +++ gcc/gcc/expr.h @@ -521,8 +521,8 @@ extern rtx expand_divmod (int, enum tree rtx, int); #endif -extern void locate_and_pad_parm (enum machine_mode, tree, int, int, tree, -struct args_size *, +extern void locate_and_pad_parm (enum machine_mode, tree, int, int, int, +tree, struct args_size *, struct locate_and_pad_arg_data *); /* Return the CODE_LABEL rtx for a LABEL_DECL, creating it if necessary. */ Index: gcc/gcc/calls.c === --- gcc.orig/gcc/calls.c +++ gcc/gcc/calls.c @@ -1326,6 +1326,7 @@ initialize_argument_information (int num #else args[i].reg != 0, #endif +reg_parm_stack_space, args[i].pass_on_stack ? 0 : args[i].partial, fndecl, args_size, &args[i].locate); #ifdef BLOCK_REG_PADDING @@ -3736,7 +3737,8 @@ emit_library_call_value_1 (int retval, r #else argvec[count].reg != 0, #endif - 0, NULL_TREE, &args_size, &argvec[count].locate); + reg_parm_stack_space, 0, + NULL_TREE, &args_size, &argvec[count].locate); if (argvec[count].reg == 0 || argvec[count].partial != 0 || reg_parm_stack_space > 0) @@ -3823,7 +3825,7 @@ emit_library_call_value_1 (int retval, r #else argvec[count].reg != 0, #endif - argvec[count].partial, + reg_parm_stack_space, argvec[count].partial, NULL_TREE, &args_size, &argvec[count].locate); args_size.constant += argvec[count].locate.size.constant; gcc_assert (!argvec[count].locate.size.var); Index: gcc/gcc/function.c === --- gcc.orig/gcc/function.c +++ gcc/gcc/function.c @@ -2523,6 +2523,7 @@ assign_parm_find_entry_rtl (struct assig } locate_and_pad_parm (data->promoted_mode, data->passed_type, in_regs, + all->reg_parm_stack_space, entry_parm ? data->partial : 0, current_function_decl, &all->stack_args_size, &data->locate); @@ -3511,11 +3512,7 @@ assign_parms (tree fndecl) /* Adjust function incoming argument size for alignment and minimum length. */ -#ifdef REG_PARM_STACK_SPACE - crtl->args.size = MAX (crtl->args.size, - REG_PARM_STACK_SPACE (fndecl)); -#endif - + crtl->args.size = MAX (crtl->args.size, all.reg_parm_stack_space); crtl->args.size = CEIL_ROUND (crtl->args.size, PARM_BOUNDARY / BITS_PER_UNIT); @@ -3719,6 +3716,9 @@ gimplify_parameters (void) IN_REGS is nonzero if the argument will be passed in registers. It will never be set if RE
[PATCH, rs6000] Fix little-endian bug in linux-unwind.h
Hello, this fixes another issue related to CR unwinding, this time only on 64-bit little-endian systems. The problem is linux-unwind.h:ppc_fallback_frame_state, which notes that the kernel places the CR value in the low bits of a 64-bit field in the signal handler frame; but the offset of that field is different in little-endian. Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich libgcc/ChangeLog: 2013-11-11 Ulrich Weigand Alan Modra * config/rs6000/linux-unwind.h (ppc_fallback_frame_state): Correct location of CR save area for 64-bit little-endian systems. Index: gcc/libgcc/config/rs6000/linux-unwind.h === --- gcc.orig/libgcc/config/rs6000/linux-unwind.h +++ gcc/libgcc/config/rs6000/linux-unwind.h @@ -185,6 +185,7 @@ ppc_fallback_frame_state (struct _Unwind { struct gcc_regs *regs = get_regs (context); struct gcc_vregs *vregs; + long cr_offset; long new_cfa; int i; @@ -206,11 +207,13 @@ ppc_fallback_frame_state (struct _Unwind fs->regs.reg[i].loc.offset = (long) ®s->gpr[i] - new_cfa; } + /* The CR is saved in the low 32 bits of regs->ccr. */ + cr_offset = (long) ®s->ccr - new_cfa; +#ifndef __LITTLE_ENDIAN__ + cr_offset += sizeof (long) - 4; +#endif fs->regs.reg[R_CR2].how = REG_SAVED_OFFSET; - /* CR? regs are always 32-bit and PPC is big-endian, so in 64-bit - libgcc loc.offset needs to point to the low 32 bits of regs->ccr. */ - fs->regs.reg[R_CR2].loc.offset = (long) ®s->ccr - new_cfa - + sizeof (long) - 4; + fs->regs.reg[R_CR2].loc.offset = cr_offset; fs->regs.reg[R_LR].how = REG_SAVED_OFFSET; fs->regs.reg[R_LR].loc.offset = (long) ®s->link - new_cfa; -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[PATCH, rs6000] Fix corner case in unwinding CR values
Hello, this patch fixes a corner case bug in unwinding CR values. The underlying problem is as follows: In order to save the CR in the prologue, two insns are necessary. The first is a mfcr that loads the CR value into a GPR, and the second is a store of the GPR to the stack. The back-end currently tags the first insn with a REG_FRAME_RELATED_EXPR, and marks both as RTX_FRAME_RELATED_P. This causes the middle-end to emit two CFI records, showing the CR saved first in the GPR, and later in its stack slot. (The first record is omitted if there is no possible exception in between.) Now, assume we use -fnon-call-exceptions, and get a signal on an instruction in between those two points, and the signal handler attempts to throw. The unwind logic will read CFI records and determine that in this frame, CR was saved into a GPR; and in the frame below (which must be a signal frame), that GPR was saved onto the stack. What the unwind logic then decides to do is to restore CR from the save slot of that latter GPR; i.e. the unwinder will copy that GPR save slot over the CR save slot of the unwind routine that calls __builtin_eh_return. Unfortunately, this is a problem on a 64-bit big-endian platform, since that GPR save slot is 8 bytes, while the CR save slot is just 4 bytes. The unwinder therefore copies the wrong half of the GPR save slot over the CR save slot ... As second, related, problem will come up with the new ABI. Here, we want to track each CR field in a separate CFI record, even though the values end up sharing the same stack slot. This is not a problem for a CFI record pointing to memory. However, the current dwarf2out.c middle-end logic is unable to handle the situation where multiple registers are saved in the same *register* at the same time ... Both problems can be fixed in the same way, by never emitting a CFI record showing CR saved into a GPR. This is easily done by simply marking only the store to memory as RTX_FRAME_RELATED_P. However, to avoid generating incorrect code we must then prevent and exception point to occur in between the two insns to save CR. The easiest way to do so is to mark the store to the stack slot as an additional user of all CR fields that need to be saved by the current function. This does restrict scheduling other instructions in between the prologue a bit more than otherwise. But since this affects only very special cases (e.g. an instruction that may triggers a floating- point exception, while -fnon-call-exceptions is active), this seems not too bad ... Note that in the epilogue, this is not really a problem, even though we also need two instructions to restore CR. We can (and do) simply leave the CFI record point to the stack slot all the time, which remains valid even after the CR value was read from there (and even after the stack pointer itself was modified). The patch below fixes this problem as described above, and adds a test case that fails without the patch. Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/rs6000.c (rs6000_emit_prologue): Do not place a RTX_FRAME_RELATED_P marker on the UNSPEC_MOVESI_FROM_CR insn. Instead, add USEs of all modified call-saved CR fields to the insn storing the result to the stack slot, and provide an appropriate REG_FRAME_RELATED_EXPR for that insn. * config/rs6000/rs6000.md ("*crsave"): New insn pattern. * config/rs6000/predicates.md ("crsave_operation"): New predicate. testsuite/ChangeLog: 2013-11-11 Ulrich Weigand * g++.dg/eh/ppc64-sighandle-cr.C: New test. Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -21761,21 +21761,9 @@ rs6000_emit_prologue (void) && REGNO (frame_reg_rtx) != cr_save_regno && !(using_static_chain_p && cr_save_regno == 11)) { - rtx set; - cr_save_rtx = gen_rtx_REG (SImode, cr_save_regno); START_USE (cr_save_regno); - insn = emit_insn (gen_movesi_from_cr (cr_save_rtx)); - RTX_FRAME_RELATED_P (insn) = 1; - /* Now, there's no way that dwarf2out_frame_debug_expr is going -to understand '(unspec:SI [(reg:CC 68) ...] UNSPEC_MOVESI_FROM_CR)'. -But that's OK. All we have to do is specify that _one_ condition -code register is saved in this stack slot. The thrower's epilogue -will then restore all the call-saved registers. -We use CR2_REGNO (70) to be compatible with gcc-2.95 on Linux. */ - set = gen_rtx_SET (VOIDmode, cr_save_rtx, -gen_rtx_REG (SImode, CR2_REGNO)); - add_reg_note (insn, REG_FRAME_RELATED_EXPR, set); + emit_insn (gen_movesi_from_cr (cr_save_rtx)); } /* Do any r
[PATCH, rs6000] ELFv2 ABI preparation: Refactor call expanders
Hello, this patch refactors code to expand calls in preparation for adding support for the new little-endian ABI. Currently, the call expanders always generate the same RTX pattern, which is matched by different insns patterns depending on circumstances (ABI, local vs. global calls, ...). This makes it difficult if in some of these circumstances, we'd really need a different RTX pattern, e.g. becuase the call uses certain special registers, or because the call must perform extra actions like saving or restoring the TOC pointer. There is one exception to this logic already in place: when expanding an indirect call on ABI_AIX, the expander calls a routine in rs6000.c. This patch expands on that by handling *all* calls (including sibling calls) on ABI_AIX via a routine in rs6000.c. This should be fully transparent to generated code, although generated RTL is a bit different: we use CALL_INSN_FUNCTION_USAGE to track uses of ABI-defined registers by the call, which has a couple of other advantages: - we can remove some patterns that were duplicated solely to track whether or not a call uses r11; - we can simply the sibcall patterns so they no longer explicitly refer to LR_REGNO; - we can precisely track which calls use r2 to ensure we have correct data flow for the register (this doesn't actually matter for the current code, but will become necessary with the new ABI). Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/rs6000.c (rs6000_call_indirect_aix): Rename to ... (rs6000_call_aix): ... this. Handle both direct and indirect calls. Create call insn directly instead of via various gen_... routines. Mention special registers used by the call in CALL_INSN_FUNCTION_USAGE. (rs6000_sibcall_aix): New function. * config/rs6000/rs6000.md (TOC_SAVE_OFFSET_32BIT): Remove. (TOC_SAVE_OFFSET_64BIT): Likewise. (AIX_FUNC_DESC_TOC_32BIT): Likewise. (AIX_FUNC_DESC_TOC_64BIT): Likewise. (AIX_FUNC_DESC_SC_32BIT): Likewise. (AIX_FUNC_DESC_SC_64BIT): Likewise. ("call" expander): Call rs6000_call_aix. ("call_value" expander): Likewise. ("call_indirect_aix"): Replace this pattern ... ("call_indirect_aix_nor11"): ... and this pattern ... ("*call_indirect_aix"): ... by this insn pattern. ("call_value_indirect_aix"): Replace this pattern ... ("call_value_indirect_aix_nor11"): ... and this pattern ... ("*call_value_indirect_aix"): ... by this insn pattern. ("*call_nonlocal_aix32", "*call_nonlocal_aix64"): Replace by ... ("*call_nonlocal_aix"): ... this pattern. ("*call_value_nonlocal_aix32", "*call_value_nonlocal_aix64"): Replace ("*call_value_nonlocal_aix"): ... by this pattern. ("*call_local_aix"): New insn pattern. ("*call_value_local_aix"): Likewise. ("sibcall" expander): Call rs6000_sibcall_aix. ("sibcall_value" expander): Likewise. Move earlier in file. ("*sibcall_nonlocal_aix"): Replace by ... ("*sibcall_aix"): ... this pattern. ("*sibcall_value_nonlocal_aix"): Replace by ... ("*sibcall_value_aix"): ... this pattern. * config/rs6000/rs6000-protos.h (rs6000_call_indirect_aix): Remove. (rs6000_call_aix): Add prototype. (rs6000_sibcall_aix): Likewise. Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -30599,118 +30599,138 @@ rs6000_legitimate_constant_p (enum machi } -/* A function pointer under AIX is a pointer to a data area whose first word - contains the actual address of the function, whose second word contains a - pointer to its TOC, and whose third word contains a value to place in the - static chain register (r11). Note that if we load the static chain, our - "trampoline" need not have any executable code. */ + +/* Expand code to perform a call under the AIX ABI. */ void -rs6000_call_indirect_aix (rtx value, rtx func_desc, rtx flag) +rs6000_call_aix (rtx value, rtx func_desc, rtx flag, rtx cookie) { + rtx toc_reg = gen_rtx_REG (Pmode, TOC_REGNUM); + rtx toc_load = NULL_RTX; + rtx toc_restore = NULL_RTX; rtx func_addr; - rtx toc_reg; - rtx sc_reg; - rtx stack_ptr; - rtx stack_toc_offset; - rtx stack_toc_mem; - rtx func_toc_offset; - rtx func_toc_mem; - rtx func_sc_offset; - rtx func_sc_mem; + rtx abi_reg = NULL_RTX; + rtx call[4]; + int n_call; rtx insn; - rtx (*call_func) (rtx, rtx, rtx, rtx); - rtx (*call_value_func) (r
[PATCH, rs6000] ELFv2 ABI preparation: Refactor rs6000_function_arg
Hello, another patch in preparation for the new ABI. When calling a function in the absence of a prototype, we need to pass arguments that are normally passed in floating-point or vector registers also on the stack and/or in GPRs. The code currently handling this in rs6000_function_arg is implemented in two separate places, one that handles floating-point arguments and one that handles vector arguments. This is probably reasonable with the current ABI, since in the vector case, everything is much simpler than in the floating-point case. However, with the new ABI, we may now pass more complex arguments in vector registers, in which case we might need to duplicate more of the logic currently used only for floating-point arguments. To avoid that duplication, this patch moves the logic to handle argument duplication on stack or in GPRs out to a pair of new helper routines, and uses them for both floating-point and vector arguments. For the current ABI, the result should be exactly the same. No change in generated code intented. Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/rs6000.c (rs6000_psave_function_arg): New function. (rs6000_finish_function_arg): Likewise. (rs6000_function_arg): Use rs6000_psave_function_arg and rs6000_finish_function_arg to handle both vector and floating point arguments that are also passed in GPRs / the stack. Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -9511,6 +9511,83 @@ rs6000_mixed_function_arg (enum machine_ return gen_rtx_PARALLEL (mode, gen_rtvec_v (k, rvec)); } +/* We have an argument of MODE and TYPE that goes into FPRs or VRs, + but must also be copied into the parameter save area starting at + offset ALIGN_WORDS. Fill in RVEC with the elements corresponding + to the GPRs and/or memory. Return the number of elements used. */ + +static int +rs6000_psave_function_arg (enum machine_mode mode, const_tree type, + int align_words, rtx *rvec) +{ + int k = 0; + + if (align_words < GP_ARG_NUM_REG) +{ + int n_words = rs6000_arg_size (mode, type); + + if (align_words + n_words > GP_ARG_NUM_REG + || (TARGET_32BIT && TARGET_POWERPC64)) + { + /* If this is partially on the stack, then we only +include the portion actually in registers here. */ + enum machine_mode rmode = TARGET_32BIT ? SImode : DImode; + int i = 0; + + if (align_words + n_words > GP_ARG_NUM_REG) + { + /* Not all of the arg fits in gprs. Say that it goes in memory +too, using a magic NULL_RTX component. Also see comment in +rs6000_mixed_function_arg for why the normal +function_arg_partial_nregs scheme doesn't work in this case. */ + rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, NULL_RTX, const0_rtx); + } + + do + { + rtx r = gen_rtx_REG (rmode, GP_ARG_MIN_REG + align_words); + rtx off = GEN_INT (i++ * GET_MODE_SIZE (rmode)); + rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, off); + } + while (++align_words < GP_ARG_NUM_REG && --n_words != 0); + } + else + { + /* The whole arg fits in gprs. */ + rtx r = gen_rtx_REG (mode, GP_ARG_MIN_REG + align_words); + rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, const0_rtx); + } +} + else +{ + /* It's entirely in memory. */ + rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, NULL_RTX, const0_rtx); +} + + return k; +} + +/* RVEC is a vector of K components of an argument of mode MODE. + Construct the final function_arg return value from it. */ + +static rtx +rs6000_finish_function_arg (enum machine_mode mode, rtx *rvec, int k) +{ + gcc_assert (k >= 1); + + /* Avoid returning a PARALLEL in the trivial cases. */ + if (k == 1) +{ + if (XEXP (rvec[0], 0) == NULL_RTX) + return NULL_RTX; + + if (GET_MODE (XEXP (rvec[0], 0)) == mode) + return XEXP (rvec[0], 0); +} + + return gen_rtx_PARALLEL (mode, gen_rtvec_v (k, rvec)); +} + /* Determine where to put an argument to a function. Value is zero to push the argument on the stack, or a hard register in which to store the argument. @@ -9580,32 +9657,25 @@ rs6000_function_arg (cumulative_args_t c } if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named)) -if (TARGET_64BIT && ! cum->prototype) - { - /* Vector parameters get passed in vector register - and also in GPRs or memory, in absence of prototype. */ - int align_words; - rtx slot; - align_words = (cum->words + 1) & ~1; +{ + r
[PATCH, rs0000] ELFv2 ABI preparation: Refactor some uses of ABI_AIX
Hello, this is another patch to prepare for the new ABI. Since this will introduce a new setting of DEFAULT_ABI, which will have to be treated like ABI_AIX in many cases, it would be better if there were fewer explicit references to ABI_AIX in the back-end. This patch eliminates a number of them: - DEFAULT_ABI is either ABI_V4, ABI_DARWIN, or ABI_AIX, so any test for two of these can be replaced by a test for the third - there are some code paths that are never used on ABI_DARWIN. Here, any test for ABI_AIX can be replaced by one for ABI_V4. (This is particularly obvious for the load_toc_v4_... patterns that even have v4 in their name ...) - rs6000_elf_declare_function_name had a duplicated test for ABI_AIX in a code block that already tests for ABI_AIX None of this is intended to change generated code on any platform. Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/rs6000.c (rs6000_option_override_internal): Replace "DEFAULT_ABI != ABI_AIX" test by testing for ABI_V4 or ABI_DARWIN. (rs6000_savres_strategy): Likewise. (rs6000_return_addr): Likewise. (rs6000_emit_load_toc_table): Replace "DEFAULT_ABI != ABI_AIX" by testing for ABI_V4 (since ABI_DARWIN is impossible here). (rs6000_emit_prologue): Likewise. (legitimate_lo_sum_address_p): Simplify DEFAULT_ABI test. (rs6000_elf_declare_function_name): Remove duplicated test. * config/rs6000/rs6000.md ("load_toc_v4_PIC_1"): Explicitly test for ABI_V4 (instead of "DEFAULT_ABI != ABI_AIX" test). ("load_toc_v4_PIC_1_normal"): Likewise. ("load_toc_v4_PIC_1_476"): Likewise. ("load_toc_v4_PIC_1b"): Likewise. ("load_toc_v4_PIC_1b_normal"): Likewise. ("load_toc_v4_PIC_1b_476"): Likewise. ("load_toc_v4_PIC_2"): Likewise. ("load_toc_v4_PIC_3b"): Likewise. ("load_toc_v4_PIC_3c"): Likewise. * config/rs6000/rs6000.h (RS6000_REG_SAVE): Simplify DEFAULT_ABI test. (RS6000_SAVE_AREA): Likewise. (FP_ARG_MAX_REG): Likewise. (RETURN_ADDRESS_OFFSET): Likewise. * config/rs6000/sysv.h (TARGET_TOC): Test for ABI_V4 instead of ABI_AIX. (SUBTARGET_OVERRIDE_OPTIONS): Likewise. (MINIMAL_TOC_SECTION_ASM_OP): Likewise. Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -3669,7 +3669,7 @@ rs6000_option_override_internal (bool gl /* We should always be splitting complex arguments, but we can't break Linux and Darwin ABIs at the moment. For now, only AIX is fixed. */ - if (DEFAULT_ABI != ABI_AIX) + if (DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN) targetm.calls.split_complex_arg = NULL; } @@ -6384,7 +6384,7 @@ legitimate_lo_sum_address_p (enum machin { bool large_toc_ok; - if (DEFAULT_ABI != ABI_AIX && DEFAULT_ABI != ABI_DARWIN && flag_pic) + if (DEFAULT_ABI == ABI_V4 && flag_pic) return false; /* LRA don't use LEGITIMIZE_RELOAD_ADDRESS as it usually calls push_reload from reload pass code. LEGITIMIZE_RELOAD_ADDRESS @@ -19605,7 +19605,8 @@ rs6000_savres_strategy (rs6000_stack_t * by the static chain. It would require too much fiddling and the static chain is rarely used anyway. FPRs are saved w.r.t the stack pointer on Darwin, and AIX uses r1 or r12. */ - if (using_static_chain_p && DEFAULT_ABI != ABI_AIX) + if (using_static_chain_p + && (DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN)) strategy |= ((DEFAULT_ABI == ABI_DARWIN ? 0 : SAVE_INLINE_FPRS) | SAVE_INLINE_GPRS | SAVE_INLINE_VRS | REST_INLINE_VRS); @@ -20301,7 +20302,8 @@ rs6000_return_addr (int count, rtx frame /* Currently we don't optimize very well between prolog and body code and for PIC code the code can be actually quite bad, so don't try to be too clever here. */ - if (count != 0 || (DEFAULT_ABI != ABI_AIX && flag_pic)) + if (count != 0 + || ((DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN) && flag_pic)) { cfun->machine->ra_needs_full_frame = 1; @@ -20449,7 +20451,7 @@ rs6000_emit_load_toc_table (int fromprol rtx dest; dest = gen_rtx_REG (Pmode, RS6000_PIC_OFFSET_TABLE_REGNUM); - if (TARGET_ELF && TARGET_SECURE_PLT && DEFAULT_ABI != ABI_AIX && flag_pic) + if (TARGET_ELF && TARGET_SECURE_PLT && DEFAULT_ABI == ABI_V4 && flag_pic) { char buf[30]; rtx lab, tmp1, tmp2, got; @@ -20477,7 +20479
[PATCH, rs6000] ELFv2 ABI preparation: Remove USE_FP/ALTIVEC_FOR_ARG_P type arg
Hello, another patch in preparation for the new ABI. USE_FP_FOR_ARG_P and USE_ALTIVEC_FOR_ARG_P take a TYPE argument which they never use. Since passing a correct TYPE for the homogeneous struct case would be a bit problematic, it seems cleaner to just remove the unused argument. No change in generated code intented. Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/rs6000.c (USE_FP_FOR_ARG_P): Remove TYPE argument. (USE_ALTIVEC_FOR_ARG_P): Likewise. (rs6000_darwin64_record_arg_advance_recurse): Update uses. (rs6000_function_arg_advance_1):Likewise. (rs6000_darwin64_record_arg_recurse): Likewise. (rs6000_function_arg): Likewise. (rs6000_arg_partial_bytes): Likewise. Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -8434,13 +8434,13 @@ rs6000_member_type_forces_blk (const_tre } /* Nonzero if we can use a floating-point register to pass this arg. */ -#define USE_FP_FOR_ARG_P(CUM,MODE,TYPE)\ +#define USE_FP_FOR_ARG_P(CUM,MODE) \ (SCALAR_FLOAT_MODE_P (MODE) \ && (CUM)->fregno <= FP_ARG_MAX_REG \ && TARGET_HARD_FLOAT && TARGET_FPRS) /* Nonzero if we can use an AltiVec register to pass this arg. */ -#define USE_ALTIVEC_FOR_ARG_P(CUM,MODE,TYPE,NAMED) \ +#define USE_ALTIVEC_FOR_ARG_P(CUM,MODE,NAMED) \ (ALTIVEC_OR_VSX_VECTOR_MODE (MODE) \ && (CUM)->vregno <= ALTIVEC_ARG_MAX_REG \ && TARGET_ALTIVEC_ABI \ @@ -8889,7 +8889,7 @@ rs6000_darwin64_record_arg_advance_recur if (TREE_CODE (ftype) == RECORD_TYPE) rs6000_darwin64_record_arg_advance_recurse (cum, ftype, bitpos); - else if (USE_FP_FOR_ARG_P (cum, mode, ftype)) + else if (USE_FP_FOR_ARG_P (cum, mode)) { unsigned n_fpregs = (GET_MODE_SIZE (mode) + 7) >> 3; rs6000_darwin64_record_arg_advance_flush (cum, bitpos, 0); @@ -8930,7 +8930,7 @@ rs6000_darwin64_record_arg_advance_recur else cum->words += n_fpregs; } - else if (USE_ALTIVEC_FOR_ARG_P (cum, mode, type, 1)) + else if (USE_ALTIVEC_FOR_ARG_P (cum, mode, 1)) { rs6000_darwin64_record_arg_advance_flush (cum, bitpos, 0); cum->vregno++; @@ -8993,7 +8993,7 @@ rs6000_function_arg_advance_1 (CUMULATIV { bool stack = false; - if (USE_ALTIVEC_FOR_ARG_P (cum, mode, type, named)) + if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named)) { cum->vregno++; if (!TARGET_ALTIVEC) @@ -9366,7 +9366,7 @@ rs6000_darwin64_record_arg_recurse (CUMU if (TREE_CODE (ftype) == RECORD_TYPE) rs6000_darwin64_record_arg_recurse (cum, ftype, bitpos, rvec, k); - else if (cum->named && USE_FP_FOR_ARG_P (cum, mode, ftype)) + else if (cum->named && USE_FP_FOR_ARG_P (cum, mode)) { unsigned n_fpreg = (GET_MODE_SIZE (mode) + 7) >> 3; #if 0 @@ -9394,7 +9394,7 @@ rs6000_darwin64_record_arg_recurse (CUMU if (mode == TFmode || mode == TDmode) cum->fregno++; } - else if (cum->named && USE_ALTIVEC_FOR_ARG_P (cum, mode, ftype, 1)) + else if (cum->named && USE_ALTIVEC_FOR_ARG_P (cum, mode, 1)) { rs6000_darwin64_record_arg_flush (cum, bitpos, rvec, k); rvec[(*k)++] @@ -9579,7 +9579,7 @@ rs6000_function_arg (cumulative_args_t c /* Else fall through to usual handling. */ } - if (USE_ALTIVEC_FOR_ARG_P (cum, mode, type, named)) + if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named)) if (TARGET_64BIT && ! cum->prototype) { /* Vector parameters get passed in vector register @@ -9707,7 +9707,7 @@ rs6000_function_arg (cumulative_args_t c if (mode == TDmode && (cum->fregno % 2) == 1) cum->fregno++; - if (USE_FP_FOR_ARG_P (cum, mode, type)) + if (USE_FP_FOR_ARG_P (cum, mode)) { rtx rvec[GP_ARG_NUM_REG + 1]; rtx r; @@ -9823,7 +9823,7 @@ rs6000_arg_partial_bytes (cumulative_arg if (DEFAULT_ABI == ABI_V4) return 0; - if (USE_ALTIVEC_FOR_ARG_P (cum, mode, type, named) + if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named) && cum->nargs_prototype >= 0) return 0; @@ -9833,7 +9833,7 @@ rs6000_arg_partial_bytes (cumulative_arg align_words = rs6000_parm_start (mode, type, cum->words); - if (USE_FP_FOR_ARG_P (cum, mode, type)) + if (USE_FP_FOR_ARG_P (cum, mode)) { /* If we are passing this arg
[PATCH, rs6000] ELFv2 ABI preparation: Refactor rs6000_arg_partial_bytes
Hello, this is the final patch preparing for the new ABI. The logic in rs6000_arg_partial_bytes is a bit complex, since it apparently still contains remnants from the time where this routine was used even for arguments that are returned both in GPRs and FPRs/VRs (*and* memory). These days, all such cases are handled completely in rs6000_function_arg so rs6000_arg_partial_bytes should always return 0; which it *does*, even though in a somewhat complex way. This complex logic would make handling homogeneous structs more difficult than it needs to be. Therefore, this patch simplifies the logic by making explicit the fact that rs6000_arg_partial_bytes does not actually need to handle the cases described above. No change in generated code expected. Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/rs6000.c (rs6000_arg_partial_bytes): Simplify logic by making use of the fact that for vector / floating point arguments passed both in VRs/FPRs and in the fixed parameter area, the partial bytes mechanism is in fact not used. Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -9835,15 +9835,25 @@ rs6000_arg_partial_bytes (cumulative_arg tree type, bool named) { CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v); + bool passed_in_gprs = true; int ret = 0; int align_words; if (DEFAULT_ABI == ABI_V4) return 0; - if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named) - && cum->nargs_prototype >= 0) -return 0; + if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named)) +{ + /* If we are passing this arg in the fixed parameter save area + (gprs or memory) as well as VRs, we do not use the partial +bytes mechanism; instead, rs6000_function_arg wil return a +PARALLEL including a memory element as necessary. */ + if (TARGET_64BIT && ! cum->prototype) + return 0; + + /* Otherwise, we pass in VRs only. No partial copy possible. */ + passed_in_gprs = false; +} /* In this complicated case we just disable the partial_nregs code. */ if (TARGET_MACHO && rs6000_darwin64_struct_check_p (mode, type)) @@ -9853,24 +9863,27 @@ rs6000_arg_partial_bytes (cumulative_arg if (USE_FP_FOR_ARG_P (cum, mode)) { + unsigned long n_fpreg = (GET_MODE_SIZE (mode) + 7) >> 3; + /* If we are passing this arg in the fixed parameter save area -(gprs or memory) as well as fprs, then this function should -return the number of partial bytes passed in the parameter -save area rather than partial bytes passed in fprs. */ + (gprs or memory) as well as FPRs, we do not use the partial +bytes mechanism; instead, rs6000_function_arg wil return a +PARALLEL including a memory element as necessary. */ if (type && (cum->nargs_prototype <= 0 || (DEFAULT_ABI == ABI_AIX && TARGET_XL_COMPAT && align_words >= GP_ARG_NUM_REG))) return 0; - else if (cum->fregno + ((GET_MODE_SIZE (mode) + 7) >> 3) - > FP_ARG_MAX_REG + 1) + + /* Otherwise, we pass in FPRs only. Check for partial copies. */ + passed_in_gprs = false; + if (cum->fregno + n_fpreg > FP_ARG_MAX_REG + 1) ret = (FP_ARG_MAX_REG + 1 - cum->fregno) * 8; - else if (cum->nargs_prototype >= 0) - return 0; } - if (align_words < GP_ARG_NUM_REG + if (passed_in_gprs + && align_words < GP_ARG_NUM_REG && GP_ARG_NUM_REG < align_words + rs6000_arg_size (mode, type)) ret = (GP_ARG_NUM_REG - align_words) * (TARGET_32BIT ? 4 : 8); -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[PATCH, rs6000] ELFv2 ABI 3/8: Track single CR fields in DWARF CFI
Hello, this patch adds another ELFv2 ABI feature: explicit tracking of CR fields in DWARF CFI. In the current ABI, DWARF CFI contains only a single record describing the save location of the whole CR field. It is implicit that all (or at least all call-clobbered) fields are present at that location. Now, if you use the instructions that save and restore the whole CR at once, this approach might seem reasonable. Unfortunately, with current POWER processors, those instructions tend to be significantly slower that those that access only single CR fields. In particular in routines where only one or two CR fields are actually clobbered and need to be saved, we could improve performance of prolog and epilog code by saving/restoring only selected CR fields. However, this is not possible in the current ABI since there is no way to describe this fact in the CFI. With the ELFv2 ABI, every CR field gets its own CFI record (using the register numbers 68 .. 75 to stand for CR0 .. CR7). Now, those fields will still usually be saved in the same 4-byte field on the stack. The semantics of a CFI record for field CRx is that the memory location holds 4 bytes, and the 4-bit nibble corresponding to CRx within those 4 bytes hold the CRx value to be restored. The one problem with this scheme is the way uw_install_context tries to modify saved valued when unwinding the stack: it will simply copy over the whole field into the save slot of the unwinder routine (that calls __builtin_eh_return). This clearly does not work if multiple CR fields need to be restored independently. To fix this, the prolog/epilog code for unwinder routines will use *multiple* stack slots, one for each call-saved CR fields, and save and restore those fields to and from their own slot. This will allow uw_install_context to install values for multiple fields. (Note that there is already precedent for unwinder routines being treated specially in the rs6000.c prologue/epilogue code ...) Bye, Ulrich gcc/ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/rs6000.c (struct rs6000_stack): New member ehcr_offset. (rs6000_stack_info): For ABI_ELFv2, allocate space for separate CR field save areas if the function calls __builtin_eh_return. (rs6000_emit_move_from_cr): New function. (rs6000_emit_prologue): Use it. For ABI_ELFv2, generate separate CFI records for each saved CR field. For functions that call __builtin_eh_return, save all CR fields into separate slots. (restore_saved_cr): For ABI_ELFv2, generate separate CFA_RESTORE entries for each saved CR field. (add_crlr_cfa_restore): Likewise. (rs6000_emit_epilogue): For ABI_ELFv2, if the function calls __builtin_eh_return, restore each CR field from its own slot. libgcc/ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/linux-unwind.h (R_CR3, R_CR4): New macros. (ppc_fallback_frame_state) [_CALL_ELF == 2]: Create CFI entry for CR3 and CR4. Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -97,6 +97,7 @@ typedef struct rs6000_stack { int spe_gp_save_offset; /* offset to save spe 64-bit gprs */ int varargs_save_offset; /* offset to save the varargs registers */ int ehrd_offset; /* offset to EH return data */ + int ehcr_offset; /* offset to EH CR field data */ int reg_size;/* register size (4 or 8) */ HOST_WIDE_INT vars_size; /* variable save area size */ int parm_size; /* outgoing parameter size */ @@ -19847,6 +19848,7 @@ rs6000_stack_info (void) rs6000_stack_t *info_ptr = &stack_info; int reg_size = TARGET_32BIT ? 4 : 8; int ehrd_size; + int ehcr_size; int save_align; int first_gp; HOST_WIDE_INT non_fixed_size; @@ -19940,6 +19942,18 @@ rs6000_stack_info (void) else ehrd_size = 0; + /* In the ELFv2 ABI, we also need to allocate space for separate + CR field save areas if the function calls __builtin_eh_return. */ + if (DEFAULT_ABI == ABI_ELFv2 && crtl->calls_eh_return) +{ + /* This hard-codes that we have three call-saved CR fields. */ + ehcr_size = 3 * reg_size; + /* We do *not* use the regular CR save mechanism. */ + info_ptr->cr_save_p = 0; +} + else +ehcr_size = 0; + /* Determine various sizes. */ info_ptr->reg_size = reg_size; info_ptr->fixed_size = RS6000_SAVE_AREA; @@ -20009,6 +20023,8 @@ rs6000_stack_info (void) } else info_ptr->ehrd_offset = info_ptr->gp_save_offset - ehrd_size; + + info_ptr->ehcr_offset = info_ptr->ehrd_offset - ehcr_size; info_ptr->cr_save_offset = reg_size; /* first word when 64-bit. */ info_ptr->lr_save_offset = 2*re
[PATCH, rs6000] ELFv2 ABI 2/8: Remove function descriptors
Hello, this patch adds the first major feature of the ELFv2 ABI: removal of function descriptors. In the current ABI, it is the responsibility of a caller to set up the TOC pointer needed by the callee by loading the correct value into r2. This typically happens in one of these ways: - For a direct function call within the same module, caller and callee share the same TOC, so r2 already contains the correct value. - For a direct function call to a function in another module, the linker will interpose a PLT stub which reloads r2 to be appropriate for the callee, with help from ld.so. - For an indirect function call, the "function pointer" points to function descriptor that holds the target function address as well as the appropriate TOC value (and a static chain value for nested functions). The caller must load those values up before transfering control to the callee. With the new ABI, it is the responsibility of the callee to set up its own TOC pointer. This has a number of advantages, in particular for a callee that doesn't actually need a TOC, and makes the function pointer call sequence more effective by avoiding pipeline hazards. However, it remains true that with the PowerPC ISA, it is difficult to actually establish addressability to the TOC without any outside help. To avoid introducing new performance regressions, the new ABI instead provides a dual entry point scheme. Any function that needs a TOC may provide two entry points: - The first ("global") entry point is intended to called via indirect calls. It expects r12 to be set up to point to the entry point address itself. (Since in order to perform an indirect call, the address must be loaded into some GPR anyway, this does not impose a substantial impact on the caller ...) Code at the global entry point is expected to load up r2 with the appropriate TOC value for this function (and it may use the r12 value to compute that), and then fall through to the local entry point. - The second ("local") entry point expects r2 to be set up to the current TOC, much like an entry point in the current ABI does. However, it must not expect r12 to hold any particular value. This means that a local direct function call within the same module can be done via a simple "bl" instruction just as today. Note that the linker will automatically direct a "bl func" to the *local* entry point associated with the symbol "func"; not the global one. If local and global entry points coincide, the function may expect no particular value in either r12 or r2; this can be used by functions that do not need a TOC. There exists just one single ELF symbol for both local and global entry points; its symbol value refers to the global entry point, and flag bits in ELF st_other encode the offset from there to the local entry point. The compiler emits a .localentry directive to announce this location to the assembler, which will encode it as appropriate. This patch implements all aspects needed for this scheme: - Emit ".abiversion 2" assembler directive to help the assembler and linker understand the ABI implemented in this file. - Remove all handling of function descriptors / .opd section / dot symbols for the ELFv2 ABI. - Output a global entry point prologue and local entry point marker for functions that use the TOC. - Implement the ELFv2 function pointer call sequence. - Use trampolines to handle nested functions, just like on 32-bit (including marking the stack as executable). - No longer scan for the old ABI indirect call sequence in linux-unwind.h. - Update assembler code to the new ABI: - gcc/config/rs6000/ppc-asm.h (used by libgcc and elsewhere) - lititm/config/powerpc/sjlj.S - gcc/testsuite/gcc.target/powerpc/ppc64-abi-dfp-1.c In addition, some related changes are necessary: - Move the code generating the call to _mcount for -mprofile-kernel from output_function_profiler to rs6000_output_function_prologue, since this call must happen at the local entry point, not the global entry point. - Disable (now useless) tests for -mpointers-to-nested-functions. - Fix the libstc++ extract_symvers.in script to ignore markers emitted by readelf --symbols that indicate local entry points. - Fix the "gotest" script to no longer expect function symbols to reside in the data segment. (I understand this last bit needs to go into the Go repository first ...) Bye, Ulrich gcc/ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/rs6000.c (machine_function): New member r2_setup_needed. (rs6000_emit_prologue): Set r2_setup_needed if necessary. (rs6000_output_mi_thunk): Set r2_setup_needed. (rs6000_output_function_prologue): Output global entry point prologue and local entry point marker if needed for ABI_ELFv2. Output -mprofile-kernel code here.
[PATCH, rs6000] ELFv2 ABI 1/8: Add options and infrastructure
Hello, this is the first patch in the series to add support for the ELFv2 ABI. The ELFv2 ABI is the intended ABI for the new powerpc64le-linux port. However, it is not inherently tied to the byte order; it it possible in principle to use the ELFv2 ABI in big-endian mode too. Therefore, it is introduces via a new pair of options -mabi=elfv1 / -mabi=elfv2 where -mabi=elfv1 select the current Linux ABI, and -mabi=elfv2 selects the new one. If neither option is given, the compiler defaults to whatever ABI was specified at configure time using the --with-abi=elfv1 / --with-abi=elfv2 configure option. If this was not specified either, every subtarget can provide a master default by defining (or not) the LINUX64_DEFAULT_ABI_ELFv2 macro. The patch series is structured as follows: - This first patch adds the new enum rs6000_abi value ABI_ELFv2 and all the configure logic needed to set it. It is treated just like ABI_AIX (on a modern system) at this point; no change in generated code is expected yet. - A series of follow on patches will add the various features defining the ELFv2 ABI. Note that they will still not be active by default anywhere. - The final patch will simply turn the switch to make ELFv2 the default ABI on powerpc64le-linux. The whole series was tested on: - powerpc64-linux with no regression, including no regression in an ALT_CC_UNDER_TEST/ALT_CXX_UNDER_TEST run of compat.exp and struct-layout-1.exp against a current compiler. - powerpc64-linux --with-abi=elfv2 in mock-up big-endian ELFv2 environment running with some hacks on an existing system. - powerpc64le-linux, using the new default ELFv2 in a rebuilt small OS image using the new ABI everywhere. (Of course, patches to binutils, glibc, and libffi to support the new ELFv2 ABI are also required.) The tests were using all languages (including Java, Go, and Objective-C++; on powerpc64le-linux also Ada), and all target libraries except libsanitizer (which currently seems broken on mainline). There is a small number of regressions on powerpc64le-linux which seem to be due to unrelated issues: - FAIL: g++.dg/cpp1y/vla-initlist1.C execution test This is an invalid assumption in the test case - FAIL: go.test/test/fixedbugs/bug296.go execution, -O2 -g The ELFv2 ABI seems to be exposing a Go frontend bug here - FAIL: runtime/pprof (in libgo) This is due to a glibc problem with unwinding through a context created via makecontext I'll address those separately. (Of course, there are also other pre-existing regressions on powerpc64le-linux simply due to little-endian issues. Those are unaffected by ELFv2.) I'll add more detailed descriptions of the actual ABI features with the various follow-on patches that implement those. This patch specifically only adds the options as described above. ABI_ELFv2 is implemented to be equivalent to ABI_AIX on a modern system; since the ABI is new, there is no need to continue to support legacy features. This means specifically: - DOT_SYMBOLS is assumed to be false - rs6000_compat_align_parm is assumed to be false - code in linux-unwind.h that handles old kernels/linkers is removed The patch also adds two new pre-defined macros: - _CALL_ELF is set to 1 or 2 to denote the ELFv1 / ELFv2 ABI - __STRUCT_PARM_ALIGN__ is set to 16 if aggregates passed by value are aligned to 16 if their native alignment requires that (this is necessary to implement libffi, for example) In addition, the patch adds a testsuite effective target check for the ELFv2 ABI, and disables the pr57949 tests (these verify the operation of the -mcompat-align-parm option, which is no longer useful in the ELFv2 ABI). To avoid having a partial ABI implementation in tree, it seems best to commit this whole patch series as a single commit. Is the series OK for mainline? Bye, Ulrich gcc/ChangeLog: 2013-11-11 Ulrich Weigand * config.gcc [powerpc*-*-* | rs6000-*-*]: Support --with-abi=elfv1 and --with-abi=elfv2. * config/rs6000/option-defaults.h (OPTION_DEFAULT_SPECS): Add "abi". * config/rs6000/rs6000.opt (mabi=elfv1): New option. (mabi=elfv2): Likewise. * config/rs6000/rs6000-opts.h (enum rs6000_abi): Add ABI_ELFv2. * config/rs6000/linux64.h (DEFAULT_ABI): Do not hard-code to AIX_ABI if !RS6000_BI_ARCH. (ELFv2_ABI_CHECK): New macro. (SUBSUBTARGET_OVERRIDE_OPTIONS): Use it to decide whether to set rs6000_current_abi to ABI_AIX or ABI_ELFv2. (GLIBC_DYNAMIC_LINKER64): Support ELFv2 ld.so version. * config/rs6000/rs6000-c.c (rs6000_cpu_cpp_builtins): Predefine _CALL_ELF and __STRUCT_PARM_ALIGN__ if appropriate. * config/rs6000/rs6000.c (rs6000_debug_reg_global): Handle ABI_ELFv2. (debug_stack_info): Likewise. (rs6000_file_start): Treat ABI_ELFv2 the same as ABI_AIX. (rs6000_legitimize_tl
[PATCH, rs6000] ELFv2 ABI 4/8: Struct passing calling convention changes
Hello, this patch implements the new struct calling convention for ELFv2. With the new ABI, so-called "homogeneous" aggregates, i.e. struct, arrays, or unions that (recursively) contain only elements of the same floating- point or vector type are passed as if those elements were passed as separate arguments. (This is done for up to 8 such elements.) After the refactoring of the rs6000_function_arg / rs6000_arg_partial_bytes logic that was done in a prior patch, implementing support for homogenous aggregates is now mostly straightforward. Note that rs6000_psave_function_arg can now get called for BLKmode arguments, and has to handle them using a PARALLEL. Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand Michael Gschwind * config/rs6000/rs6000.h (AGGR_ARG_NUM_REG): Define. * config/rs6000/rs6000.c (rs6000_aggregate_candidate): New function. (rs6000_discover_homogeneous_aggregate): Likewise. (rs6000_function_arg_boundary): Handle homogeneous aggregates. (rs6000_function_arg_advance_1): Likewise. (rs6000_function_arg): Likewise. (rs6000_arg_partial_bytes): Likewise. (rs6000_psave_function_arg): Handle BLKmode arguments. Index: gcc/gcc/config/rs6000/rs6000.h === --- gcc.orig/gcc/config/rs6000/rs6000.h +++ gcc/gcc/config/rs6000/rs6000.h @@ -1641,6 +1641,9 @@ extern enum reg_class rs6000_constraints #define ALTIVEC_ARG_MAX_REG (ALTIVEC_ARG_MIN_REG + 11) #define ALTIVEC_ARG_NUM_REG (ALTIVEC_ARG_MAX_REG - ALTIVEC_ARG_MIN_REG + 1) +/* Maximum number of registers per ELFv2 homogeneous aggregate argument. */ +#define AGGR_ARG_NUM_REG 8 + /* Return registers */ #define GP_ARG_RETURN GP_ARG_MIN_REG #define FP_ARG_RETURN FP_ARG_MIN_REG Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -8460,6 +8460,219 @@ rs6000_member_type_forces_blk (const_tre && TARGET_ALTIVEC_ABI \ && (NAMED)) +/* Walk down the type tree of TYPE counting consecutive base elements. + If *MODEP is VOIDmode, then set it to the first valid floating point + or vector type. If a non-floating point or vector type is found, or + if a floating point or vector type that doesn't match a non-VOIDmode + *MODEP is found, then return -1, otherwise return the count in the + sub-tree. */ + +static int +rs6000_aggregate_candidate (const_tree type, enum machine_mode *modep) +{ + enum machine_mode mode; + HOST_WIDE_INT size; + + switch (TREE_CODE (type)) +{ +case REAL_TYPE: + mode = TYPE_MODE (type); + if (!SCALAR_FLOAT_MODE_P (mode)) + return -1; + + if (*modep == VOIDmode) + *modep = mode; + + if (*modep == mode) + return 1; + + break; + +case COMPLEX_TYPE: + mode = TYPE_MODE (TREE_TYPE (type)); + if (!SCALAR_FLOAT_MODE_P (mode)) + return -1; + + if (*modep == VOIDmode) + *modep = mode; + + if (*modep == mode) + return 2; + + break; + +case VECTOR_TYPE: + if (!TARGET_ALTIVEC_ABI || !TARGET_ALTIVEC) + return -1; + + /* Use V4SImode as representative of all 128-bit vector types. */ + size = int_size_in_bytes (type); + switch (size) + { + case 16: + mode = V4SImode; + break; + default: + return -1; + } + + if (*modep == VOIDmode) + *modep = mode; + + /* Vector modes are considered to be opaque: two vectors are +equivalent for the purposes of being homogeneous aggregates +if they are the same size. */ + if (*modep == mode) + return 1; + + break; + +case ARRAY_TYPE: + { + int count; + tree index = TYPE_DOMAIN (type); + + /* Can't handle incomplete types. */ + if (!COMPLETE_TYPE_P (type)) + return -1; + + count = rs6000_aggregate_candidate (TREE_TYPE (type), modep); + if (count == -1 + || !index + || !TYPE_MAX_VALUE (index) + || !host_integerp (TYPE_MAX_VALUE (index), 1) + || !TYPE_MIN_VALUE (index) + || !host_integerp (TYPE_MIN_VALUE (index), 1) + || count < 0) + return -1; + + count *= (1 + tree_low_cst (TYPE_MAX_VALUE (index), 1) + - tree_low_cst (TYPE_MIN_VALUE (index), 1)); + + /* There must be no padding. */ + if (!host_integerp (TYPE_SIZE (type), 1) + || (tree_low_cst (TYPE_SIZE (type), 1) + != count * GET_MODE_BITSIZE (*modep))) + return -1; + + return count; + } + +case RECORD_TYPE: + { + int count = 0; + int sub_count; + tree field; + + /* Can't handle incomplete types. */ + if (!COMPLETE_TYPE_P (type
[PATCH, rs6000] ELFv2 ABI 7/8: Eliminate some stack frame fields
Hello, this is the second part of reducing stack space consumption for the ELFv2 ABI: the old ABI reserved six words in every stack frame for various purposes, and two of those were basically unused: one word for "compiler use" and one word for "linker use". Since neither the compiler nor the linker actually ever made any nontrivial use of these fields, they're now eliminated in the new ABI. This patch implements this change, which is mostly straightforward except for the fact that due to the change, the stack offset of the TOC save area changes, which was hard-coded in a number of places ... The patch also updates a number of test cases that hardcoded the stack layout. Bye, Ulrich gcc/ChangeLog: 2013-11-11 Ulrich Weigand Alan Modra * config/rs6000/rs6000.h (RS6000_SAVE_AREA): Handle ABI_ELFv2. (RS6000_SAVE_TOC): Remove. (RS6000_TOC_SAVE_SLOT): New macro. * config/rs6000/rs6000.c (rs6000_parm_offset): New function. (rs6000_parm_start): Use it. (rs6000_function_arg_advance_1): Likewise. (rs6000_emit_prologue): Use RS6000_TOC_SAVE_SLOT. (rs6000_emit_epilogue): Likewise. (rs6000_call_aix): Likewise. (rs6000_output_function_prologue): Do not save/restore r11 around calling _mcount for ABI_ELFv2. libgcc/ChangeLog: 2013-11-11 Ulrich Weigand Alan Modra * config/rs6000/linux-unwind.h (TOC_SAVE_SLOT): Define. (frob_update_context): Use it. gcc/testsuite/ChangeLog: 2013-11-11 Ulrich Weigand * gcc.target/powerpc/ppc64-abi-1.c (stack_frame_t): Remove compiler and linker field if _CALL_ELF == 2. * gcc.target/powerpc/ppc64-abi-2.c (stack_frame_t): Likewise. * gcc.target/powerpc/ppc64-abi-dfp-1.c (stack_frame_t): Likewise. * gcc.dg/stack-usage-1.c (SIZE): Update value for _CALL_ELF == 2. Index: gcc/gcc/config/rs6000/rs6000.h === --- gcc.orig/gcc/config/rs6000/rs6000.h +++ gcc/gcc/config/rs6000/rs6000.h @@ -1529,12 +1529,12 @@ extern enum reg_class rs6000_constraints /* Size of the fixed area on the stack */ #define RS6000_SAVE_AREA \ - ((DEFAULT_ABI == ABI_V4 ? 8 : 24) << (TARGET_64BIT ? 1 : 0)) + ((DEFAULT_ABI == ABI_V4 ? 8 : DEFAULT_ABI == ABI_ELFv2 ? 16 : 24)\ + << (TARGET_64BIT ? 1 : 0)) -/* MEM representing address to save the TOC register */ -#define RS6000_SAVE_TOC gen_rtx_MEM (Pmode, \ -plus_constant (Pmode, stack_pointer_rtx, \ - (TARGET_32BIT ? 20 : 40))) +/* Stack offset for toc save slot. */ +#define RS6000_TOC_SAVE_SLOT \ + ((DEFAULT_ABI == ABI_ELFv2 ? 12 : 20) << (TARGET_64BIT ? 1 : 0)) /* Align an address */ #define RS6000_ALIGN(n,a) (((n) + (a) - 1) & ~((a) - 1)) Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -9029,6 +9029,16 @@ rs6000_function_arg_boundary (enum machi return PARM_BOUNDARY; } +/* The offset in words to the start of the parameter save area. */ + +static unsigned int +rs6000_parm_offset (void) +{ + return (DEFAULT_ABI == ABI_V4 ? 2 + : DEFAULT_ABI == ABI_ELFv2 ? 4 + : 6); +} + /* For a function parm of MODE and TYPE, return the starting word in the parameter area. NWORDS of the parameter area are already used. */ @@ -9037,11 +9047,9 @@ rs6000_parm_start (enum machine_mode mod unsigned int nwords) { unsigned int align; - unsigned int parm_offset; align = rs6000_function_arg_boundary (mode, type) / PARM_BOUNDARY - 1; - parm_offset = DEFAULT_ABI == ABI_V4 ? 2 : 6; - return nwords + (-(parm_offset + nwords) & align); + return nwords + (-(rs6000_parm_offset () + nwords) & align); } /* Compute the size (in words) of a function argument. */ @@ -9281,15 +9289,13 @@ rs6000_function_arg_advance_1 (CUMULATIV { int align; - /* Vector parameters must be 16-byte aligned. This places -them at 2 mod 4 in terms of words in 32-bit mode, since -the parameter save area starts at offset 24 from the -stack. In 64-bit mode, they just have to start on an -even word, since the parameter save area is 16-byte -aligned. Space for GPRs is reserved even if the argument -will be passed in memory. */ + /* Vector parameters must be 16-byte aligned. In 32-bit +mode this means we need to take into account the offset +to the parameter save area. In 64-bit mode, they just +have to start on an even word, since the parameter save +area is 16-byte aligned. */ if (TARGET_32BIT) - align = (2 - cum->words) & 3;
[PATCH, rs6000] ELFv2 ABI 5/8: Struct return calling convention changes
Hello, analogously to the previous patch handling homogeneous aggregates as arguments, this patch handles such aggregates as return values. In addition, the ELFv2 ABI returns any aggregate of up to 16 bytes in size (that is not otherwise handled) in up to two GPRs. In either case, the return value is returned formatted exactly as if it were being passed as first argument. In order to get this correct in the big-endian ELFv2 case, we need to provide a non-default implementation of TARGET_RETURN_IN_MSB. Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand Michael Gschwind * config/rs6000/rs6000.h (FP_ARG_MAX_RETURN): New macro. (ALTIVEC_ARG_MAX_RETURN): Likewise. (FUNCTION_VALUE_REGNO_P): Use them. * config/rs6000/rs6000.c (TARGET_RETURN_IN_MSB): Define. (rs6000_return_in_msb): New function. (rs6000_return_in_memory): Handle ELFv2 homogeneous aggregates. Handle aggregates of up to 16 bytes for ELFv2. (rs6000_function_value): Handle ELFv2 homogeneous aggregates. Index: gcc/gcc/config/rs6000/rs6000.h === --- gcc.orig/gcc/config/rs6000/rs6000.h +++ gcc/gcc/config/rs6000/rs6000.h @@ -1648,6 +1648,10 @@ extern enum reg_class rs6000_constraints #define GP_ARG_RETURN GP_ARG_MIN_REG #define FP_ARG_RETURN FP_ARG_MIN_REG #define ALTIVEC_ARG_RETURN (FIRST_ALTIVEC_REGNO + 2) +#define FP_ARG_MAX_RETURN (DEFAULT_ABI != ABI_ELFv2 ? FP_ARG_RETURN\ + : (FP_ARG_RETURN + AGGR_ARG_NUM_REG - 1)) +#define ALTIVEC_ARG_MAX_RETURN (DEFAULT_ABI != ABI_ELFv2 ? ALTIVEC_ARG_RETURN \ + : (ALTIVEC_ARG_RETURN + AGGR_ARG_NUM_REG - 1)) /* Flags for the call/call_value rtl operations set up by function_arg */ #define CALL_NORMAL0x /* no special processing */ @@ -1667,8 +1671,10 @@ extern enum reg_class rs6000_constraints On RS/6000, this is r3, fp1, and v2 (for AltiVec). */ #define FUNCTION_VALUE_REGNO_P(N) \ ((N) == GP_ARG_RETURN \ - || ((N) == FP_ARG_RETURN && TARGET_HARD_FLOAT && TARGET_FPRS) \ - || ((N) == ALTIVEC_ARG_RETURN && TARGET_ALTIVEC && TARGET_ALTIVEC_ABI)) + || ((N) >= FP_ARG_RETURN && (N) <= FP_ARG_MAX_RETURN \ + && TARGET_HARD_FLOAT && TARGET_FPRS)\ + || ((N) >= ALTIVEC_ARG_RETURN && (N) <= ALTIVEC_ARG_MAX_RETURN \ + && TARGET_ALTIVEC && TARGET_ALTIVEC_ABI)) /* 1 if N is a possible register number for function argument passing. On RS/6000, these are r3-r10 and fp1-fp13. Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -1449,6 +1449,9 @@ static const struct attribute_spec rs600 #undef TARGET_RETURN_IN_MEMORY #define TARGET_RETURN_IN_MEMORY rs6000_return_in_memory +#undef TARGET_RETURN_IN_MSB +#define TARGET_RETURN_IN_MSB rs6000_return_in_msb + #undef TARGET_SETUP_INCOMING_VARARGS #define TARGET_SETUP_INCOMING_VARARGS setup_incoming_varargs @@ -8725,6 +8728,16 @@ rs6000_return_in_memory (const_tree type /* Otherwise fall through to more conventional ABI rules. */ } + /* The ELFv2 ABI returns homogeneous VFP aggregates in registers */ + if (rs6000_discover_homogeneous_aggregate (TYPE_MODE (type), type, +NULL, NULL)) +return false; + + /* The ELFv2 ABI returns aggregates up to 16B in registers */ + if (DEFAULT_ABI == ABI_ELFv2 && AGGREGATE_TYPE_P (type) + && (unsigned HOST_WIDE_INT) int_size_in_bytes (type) <= 16) +return false; + if (AGGREGATE_TYPE_P (type) && (aix_struct_return || (unsigned HOST_WIDE_INT) int_size_in_bytes (type) > 8)) @@ -8756,6 +8769,19 @@ rs6000_return_in_memory (const_tree type return false; } +/* Specify whether values returned in registers should be at the most + significant end of a register. We want aggregates returned by + value to match the way aggregates are passed to functions. */ + +static bool +rs6000_return_in_msb (const_tree valtype) +{ + return (DEFAULT_ABI == ABI_ELFv2 + && BYTES_BIG_ENDIAN + && AGGREGATE_TYPE_P (valtype) + && FUNCTION_ARG_PADDING (TYPE_MODE (valtype), valtype) == upward); +} + #ifdef HAVE_AS_GNU_ATTRIBUTE /* Return TRUE if a call to function FNDECL may be one that potentially affects the function calling ABI of the object file. */ @@ -29897,6 +29923,8 @@ rs6000_function_value (const_tree valtyp { enum machine_mode mode; unsigned int regno; + enum machine_mode elt_mode; + int n_elts; /* Special handling for s
[PATCH, rs6000] ELFv2 ABI 6/8: Eliminate register save area in some cases
Hello, the final area of changes implemented by the ELFv2 ABI is stack space consumption. Reducing the miminum stack size requirements is important for environments where stack space is restricted, e.g. Linux kernel code, or where we have a large number of stacks (e.g. heavily multi-threaded applications). One reason for the large stack space requirement of the current ABI is that every caller must provide an argument save area of 64 bytes to its caller. This is actually useful in one specific case: if the called function uses variable arguments and constructs a va_list. Because of the way the ABI is designed, the callee can in this case simply store the 8 GPRs (potentially) carrying arguments into that save area, and is then guaranteed to have all function arguments in a linear arrangement on the stack, which makes a trivial va_arg implementation possible. If the callee is not vararg, this wouldn't really be necessary. However, if we were to skip that area then, vararg and non-vararg routines would have incompatible ABIs, which would make it impossible to generate code for a function call if the caller does not have a prototype of the called function. Therefore the old ABI requires the save area to be present always. With the new ABI, we wanted to retain the possibility of using a trivial va_arg implementation, and also the possibility of calling non-prototyped routines safely. However, even so, there is a set of cases where it is still not necessary to provide a save area: when calling a function we have a prototype for and know that it is neither vararg nor uses on-stack arguments. This patch implement this change by having REG_PARM_STACK_SPACE return a different value depending on whether the called routine requires on-stack arguments (and we have a prototype). There was one problem exposed by this change: rs6000_function_arg contained this piece of code: if (mode == BLKmode) mode = Pmode; which marked the mode of a GPR holding an incoming struct argument as Pmode. This is now causing a problem. When REG_PARM_STACK_SPACE returns 0, function.c must allocate a stack slot within the callee's frame to serve as DECL_RTL for the parameter. This is done in assign_parm_setup_block. This routine attempts to use the mode of the incoming register as the mode of that stack slot, if the sizes match. This means that the DECL_RTL for an argument of struct type may now have mode Pmode, even if that type cannot reliably be operated on in Pmode, which causes ICEs in several test cases for me. Note that when REG_PARM_STACK_SPACE is non-zero, this problem does not occur, because in this case, function.c allocates the stack slot for the parameter's DECL_RTL in the register save area, using a different function assign_parm_find_stack_rtl. *That* function will keep the mode as BLKmode even if the incoming register is in another mode. Removing the above two lines from rs6000_function_arg fixes this problem, and does not appear to introduce any problem for the ELFv1 ABI case either (full bootstrap / regtest still passes). Looking back at the ChangeLog, those two lines where added by Alan Modra back in 2004: http://gcc.gnu.org/ml/gcc/2004-11/msg00617.html to fix ICEs after a change by Richard Henderson: http://gcc.gnu.org/ml/gcc/2004-11/msg00569.html but Richard shortly afterwards reverted that change again since it caused problems elsewhere too: http://gcc.gnu.org/ml/gcc/2004-11/msg00751.html So all in all, it seems this change by Alan was simply not necessary and can be reverted. Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand Alan Modra * config/rs6000/rs6000-protos.h (rs6000_reg_parm_stack_space): Add prototype. * config/rs6000/rs6000.h (RS6000_REG_SAVE): Remove. (REG_PARM_STACK_SPACE): Call rs6000_reg_parm_stack_space. * config/rs6000/rs6000.c (rs6000_parm_needs_stack): New function. (rs6000_function_parms_need_stack): Likewise. (rs6000_reg_parm_stack_space): Likewise. (rs6000_function_arg): Do not replace BLKmode by Pmode when returning a register argument. Index: gcc/gcc/config/rs6000/rs6000-protos.h === --- gcc.orig/gcc/config/rs6000/rs6000-protos.h +++ gcc/gcc/config/rs6000/rs6000-protos.h @@ -158,6 +158,7 @@ extern tree altivec_resolve_overloaded_b extern rtx rs6000_libcall_value (enum machine_mode); extern rtx rs6000_va_arg (tree, tree); extern int function_ok_for_sibcall (tree); +extern int rs6000_reg_parm_stack_space (tree); extern void rs6000_elf_declare_function_name (FILE *, const char *, tree); extern bool rs6000_elf_in_small_data_p (const_tree); #ifdef ARGS_SIZE_RTX Index: gcc/gcc/config/rs6000/rs6000.h === --- gcc.orig/gcc/config/rs6000/rs6000.h +++ gcc/gcc/config/rs6000/rs6000.h @@ -1527,10 +1527,6 @@ extern
[PATCH, rs6000] ELFv2 ABI 8/8: Enable by default on little-endian
Hello, this patch finally throws the switch and enables the ELFv2 ABI by default on powerpc64le-linux. Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/sysv4le.h (LINUX64_DEFAULT_ABI_ELFv2): Define. Index: gcc/gcc/config/rs6000/sysv4le.h === --- gcc.orig/gcc/config/rs6000/sysv4le.h +++ gcc/gcc/config/rs6000/sysv4le.h @@ -34,3 +34,7 @@ #undef MULTILIB_DEFAULTS #defineMULTILIB_DEFAULTS { "mlittle", "mcall-sysv" } + +/* Little-endian PowerPC64 Linux uses the ELF v2 ABI by default. */ +#define LINUX64_DEFAULT_ABI_ELFv2 + -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com