from:"Ulrich Weigand"

Re: [PATCH, S390] Avoid LA with base and index on z13

2018-07-05 Thread Ulrich Weigand

Robin Dapp wrote:

> * config/s390/s390.c (preferred_la_operand_p): Do not use
>   LA with base and index on z13 or later.

The code just before your change reads:

  /* Avoid LA instructions with index register on z196; it is
 preferable to use regular add instructions when possible.
 Starting with zEC12 the la with index register is "uncracked"
 again.  */
  if (addr.indx && s390_tune == PROCESSOR_2817_Z196)
return false;

But on zEC12 LA works pretty much the same as on z13/z14, it is
indeed not cracked, but still a 2-cycle instruction when using
an index register.  So I guess the change really should apply
to zEC12 as well, and this could be as simple as changing the
above line to:

  if (addr.indx && s390_tune >= PROCESSOR_2817_Z196)

(Note that "addr.base && addr.indx" is the same as just checking
for addr.indx, since s390_decompose_address will never fill in
*just* an index.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Build fail on gthr-simple.h targets (Re: AsyncI/O patch committed)

2018-07-26 Thread Ulrich Weigand

Nicholas Koenig wrote:

> Hello everyone,
> 
> I have committed the async I/O patch as r262978.
> 
> The test cases are in libgomp.fortran for now, maybe that can be changed 
> later.

It looks like this broke building libgfortran on spu, and presumably
any platform that uses gthr-simple instead of gthr-posix.

The problem is that io/asynch.h unconditionally uses a couple of
features that are not provided by gthr-simplex, in particular
  __gthread_cond_t
and
  __gthread_equal / __gthread_self

According to the documentation in gthr.h, the former is only available
if __GTHREAD_HAS_COND is defined, and the latter are only available if
__GTHREADS_CXX0X is defined.  Neither is true for gthr-simple.h.

To fix the build error, either libgfortran should only use those features
conditionally on those defines, or else the gthr.h logic needs to be
changed and (stubs for) those features provided in gthr-simple.h as well.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: Build fail on gthr-single.h targets (Re: AsyncI/O patch committed)

2018-07-26 Thread Ulrich Weigand

I wrote:
> Nicholas Koenig wrote:
> 
> > Hello everyone,
> > 
> > I have committed the async I/O patch as r262978.
> > 
> > The test cases are in libgomp.fortran for now, maybe that can be changed 
> > later.
> 
> It looks like this broke building libgfortran on spu, and presumably
> any platform that uses gthr-simple instead of gthr-posix.

The file is called gthr-single.h, not gthr-simple.h ... sorry for the typo.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

[PATCH] Fix -ffast-math flags handling inconsistencies

2020-01-31 Thread Ulrich Weigand

Hello,

we've noticed some inconsistencies in how the component flags of -ffast-math
are handled, see the discussion on the GCC list starting here:
https://gcc.gnu.org/ml/gcc/2020-01/msg00365.html

This patch fixes those inconsistencies.  Specifically, there are the
following changes:

1. Some component flags for -ffast-math are *set* with -ffast-math
   (changing the default), but are not reset with -fno-fast-math,
   causing the latter to have surprising results.  (Note that since
   "-ffast-math -fno-fast-math" is short-cut by the driver, you'll
   only see the surprising results with "-Ofast -fno-fast-math".)
   This is fixed here by both setting and resetting the flags.

   This affects the following flags
  -fcx-limited-range
  -fexcess-precision=

2. Some component flags for -ffast-math are changed from their default,
   but are *not* included in the fast_math_flags_set_p test, causing
   __FAST_MATH__ to remain predefined even when the full set of fast
   math options is not actually in effect.  This is fixed here by
   adding those flags into the fast_math_flags_set_p test.

   This affects the following flags:
 -fcx-limited-range
 -fassociative-math
 -freciprocal-math

3. For some math flags, set_fast_math_flags has code that sets their
   values only to what is already their default.  The overall effect
   of this code is a complete no-op.  This patch removes that dead code.

   This affects the following flags:
 -frounding-math
 -fsignaling-nans


The overall effect of this patch is that now all component flags of
-ffast-math are treated exactly equivalently:
  * they are set (changed from their default) with -ffast-math
  * they are reset to their default with -fno-fast-math
  * __FAST_MATH__ is only defined if the value of the flag matches
what -ffast-math would have set it to

Tested on s390x-ibm-linux.

OK for mainline?

Bye,
Ulrich

ChangeLog:

* opts.c (set_fast_math_flags): In the !set case, also reset
x_flag_cx_limited_range and x_flag_excess_precision.  Remove dead
code resetting x_flag_signaling_nans and x_flag_rounding_math.
(fast_math_flags_set_p): Also test x_flag_cx_limited_range,
x_flag_associative_math, and x_flag_reciprocal_math.

diff --git a/gcc/opts.c b/gcc/opts.c
index 7affeb4..4452793 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2850,18 +2850,14 @@ set_fast_math_flags (struct gcc_options *opts, int set)
 opts->x_flag_finite_math_only = set;
   if (!opts->frontend_set_flag_errno_math)
 opts->x_flag_errno_math = !set;
-  if (set)
-{
-  if (opts->frontend_set_flag_excess_precision == EXCESS_PRECISION_DEFAULT)
-   opts->x_flag_excess_precision
- = set ? EXCESS_PRECISION_FAST : EXCESS_PRECISION_DEFAULT;
-  if (!opts->frontend_set_flag_signaling_nans)
-   opts->x_flag_signaling_nans = 0;
-  if (!opts->frontend_set_flag_rounding_math)
-   opts->x_flag_rounding_math = 0;
-  if (!opts->frontend_set_flag_cx_limited_range)
-   opts->x_flag_cx_limited_range = 1;
-}
+  if (!opts->frontend_set_flag_cx_limited_range)
+opts->x_flag_cx_limited_range = set;
+  if (!opts->frontend_set_flag_excess_precision)
+opts->x_flag_excess_precision
+  = set ? EXCESS_PRECISION_FAST : EXCESS_PRECISION_DEFAULT;
+
+  // -ffast-math should also reset -fsignaling-nans and -frounding-math,
+  // but since those are off by default, there's nothing to do for now.
 }
 
 /* When -funsafe-math-optimizations is set the following
@@ -2884,10 +2880,13 @@ bool
 fast_math_flags_set_p (const struct gcc_options *opts)
 {
   return (!opts->x_flag_trapping_math
+ && !opts->x_flag_signed_zeros
+ && opts->x_flag_associative_math
+ && opts->x_flag_reciprocal_math
  && opts->x_flag_unsafe_math_optimizations
  && opts->x_flag_finite_math_only
- && !opts->x_flag_signed_zeros
  && !opts->x_flag_errno_math
+ && opts->x_flag_cx_limited_range
  && opts->x_flag_excess_precision == EXCESS_PRECISION_FAST);
 }
 
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH] Fix -ffast-math flags handling inconsistencies

2020-02-07 Thread Ulrich Weigand

Richard Biener wrote:
> On Fri, Jan 31, 2020 at 6:01 PM Ulrich Weigand  wrote:
> > The overall effect of this patch is that now all component flags of
> > -ffast-math are treated exactly equivalently:
> >   * they are set (changed from their default) with -ffast-math
> >   * they are reset to their default with -fno-fast-math
> >   * __FAST_MATH__ is only defined if the value of the flag matches
> > what -ffast-math would have set it to
> 
> The last part is not obviously correct to me since it doesn't match
> documentation which says
> 
> @item -ffast-math
> @opindex ffast-math
> Sets the options @option{-fno-math-errno}, 
> @option{-funsafe-math-optimizations},
> @option{-ffinite-math-only}, @option{-fno-rounding-math},
> @option{-fno-signaling-nans}, @option{-fcx-limited-range} and
> @option{-fexcess-precision=fast}.
> 
> This option causes the preprocessor macro @code{__FAST_MATH__} to be defined.
> 
> to me this reads as -ffast-math -fexcess-precision=standard defines
> __FAST_MATH__.
> The only relevant part to define __FAST_MATH__ is specifying -ffast-math, 
> other
> options are not relevant (which of course is contradicted by
> implementation - where
> I didn't actually follow its history in that respect).  So can you
> adjust documentation
> as to when exactly __FAST_MATH__ is defined?

Agreed.  This should probably be worded something along the lines of:
"Whenever all of those options are set to these values as listed above,
the preprocessor macro @code{__FAST_MATH__} will be defined."

> > -  if (!opts->frontend_set_flag_signaling_nans)
> > -   opts->x_flag_signaling_nans = 0;
> > -  if (!opts->frontend_set_flag_rounding_math)
> > -   opts->x_flag_rounding_math = 0;
[...]
> > +  // -ffast-math should also reset -fsignaling-nans and -frounding-math,
> > +  // but since those are off by default, there's nothing to do for now.
> ...
> 
> but what happens to -fsignalling-nans -ffast-math then?  Better leave those
> in I'd say.

Ah, it seems I was confused about the intended semantics here.

I thought that a *more specific* option like -fsignalling-nans was always
intended to override a more generic option like -ffast-math, no matter
whether it comes before or after it on the command line.

Is it instead the case that the intended behavior is the generic option
overrides the specific option if the generic option comes later on the
command line?

In that case, I agree that those should be left in.

However, then I think it would be good to add tests for 
!flag_signaling_nans / !flag_rounding_math to fast_math_flags_set_p,
because with these options on, we aren't really in fast-math territory
any more ...

> Note frontends come into play with what is considered -ffast-math
> and -fno-fast-math but below flags are tested irrespectively of that
> interpretation.
> 
> Note there's -fcx-fortran-rules similar to -fcx-limited-range but not tested
> above.  The canonical middle-end "flag" to look at is flag_complex_method.
> Somehow -fcx-fortran-rules doesn't come into play at all above but it
> affects -fcx-limited-range in another inconsistent way in that
> -fcx-limited-range -fcx-fortran-rules and -fcx-fortran-rules 
> -fcx-limited-range
> behave the same (-fcx-fortran-rules takes precedence...).  I guess
> -fcomplex-method=ENUM should be exposed and -fcx-* made
> appropriate aliases here.

I agree it would probably be the best if to use a -fcomplex-method=...
flag, handled analogously to -fexcess-precision in that it defaults to
an explicit DEFAULT value; unless this is overridden by an explicit
command line flag, the language front-end can then choose what DEFAULT
means for this particular language.

However, I'd prefer to not include that change into this patch as well :-)
This patch only changes the behavior related to -fcx-limited-range in one
very special case (-Ofast -fno-fast-math), where it makes it strictly
better.  It should not affect the (existing) interaction with
-fcx-fortran-rules at all -- so anything to improve this can be done
in a separate patch, I think.

> You're tapping into a mine-field ;)

That's for sure :-)

Thanks,
Ulrich


-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH] Fix -ffast-math flags handling inconsistencies

2020-02-10 Thread Ulrich Weigand

Joseph Myers wrote:
> On Fri, 7 Feb 2020, Ulrich Weigand wrote:
> 
> > I thought that a *more specific* option like -fsignalling-nans was always
> > intended to override a more generic option like -ffast-math, no matter
> > whether it comes before or after it on the command line.
> 
> Yes, that's correct.  (There are cases where it's ambiguous which option 
> is more specific - see what I said in 
> <https://gcc.gnu.org/ml/gcc/2016-03/msg00245.html> about -Werror= - but I 
> don't think -ffast-math involves such cases.)

I see.  However, this does not appear to be implemented in current code.
GCC (without my patch) behaves like this (using -fno-finite-math-only
as an example instead of -fsignaling-nans, given the separate point
about the latter option raised by Segher):

gcc -dD -E -x c /dev/null | grep FINITE
#define __FINITE_MATH_ONLY__ 0
gcc -dD -E -x c /dev/null -ffast-math | grep FINITE
#define __FINITE_MATH_ONLY__ 1
gcc -dD -E -x c /dev/null -ffast-math -fno-finite-math-only | grep FINITE
#define __FINITE_MATH_ONLY__ 0
gcc -dD -E -x c /dev/null -fno-finite-math-only -ffast-math | grep FINITE
#define __FINITE_MATH_ONLY__ 1

Given the above rule, in the last case __FINITE_MATH_ONLY__ should
also be 0, right?

I'm not sure how this can be implemented in the current option handling
framework.  The -ffast-math handling case would have to check whether
or not there is any explicit use of -f(no-)finite-math-only; is there
any infrastructure to readily perform such a test?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH] Fix -ffast-math flags handling inconsistencies

2020-02-10 Thread Ulrich Weigand

Segher Boessenkool wrote:
> On Fri, Feb 07, 2020 at 05:47:32PM +0100, Ulrich Weigand wrote:
> > > but what happens to -fsignalling-nans -ffast-math then?  Better leave 
> > > those
> > > in I'd say.
> > 
> > Ah, it seems I was confused about the intended semantics here.
> > 
> > I thought that a *more specific* option like -fsignalling-nans was always
> > intended to override a more generic option like -ffast-math, no matter
> > whether it comes before or after it on the command line.
> 
> -fsignaling-nans is an independent option.  Signaling NaNs don't do much
> at all when you also have -ffinite-math-only, of course, which says you
> don't have *any* NaNs.

That's a good point, thanks for pointing this out!  I agree that
-fsignaling-nans is not a good example then; it really should be
independent of -ffast-math.

However, there's still one other option that has a similar issue:
-frounding-math.  This *is* properly related to -ffast-math, in
that -ffast-math really ought to imply -fno-rounding-math, but the
latter is (currently) also the default in GCC.

Bye,
Ulrich


-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH] Fix -ffast-math flags handling inconsistencies

2020-02-11 Thread Ulrich Weigand

Joseph Myers wrote:
> On Mon, 10 Feb 2020, Ulrich Weigand wrote:
> > I'm not sure how this can be implemented in the current option handling
> > framework.  The -ffast-math handling case would have to check whether
> > or not there is any explicit use of -f(no-)finite-math-only; is there
> > any infrastructure to readily perform such a test?
> 
> Pass opts_set as well as opts to set_fast_math_flags, and use it for 
> checking whether an option was explicitly specified (whether enabled or 
> disabled) on the command line.

Ah, I see.  I'll update the patch accordingly.  Thanks!

I guess I still need to check for the opts->frontend_set_... flags
as well -- those are not reflected in opts_set, right?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

[PATCH v2] Fix -ffast-math flags handling inconsistencies

2020-02-11 Thread Ulrich Weigand

:
-  set_unsafe_math_optimizations_flags (opts, value);
+  set_unsafe_math_optimizations_flags (opts, opts_set, value);
   break;
 
 case OPT_ffixed_:
@@ -2839,44 +2841,44 @@ set_Wstrict_aliasing (struct gcc_options *opts, int 
onoff)
 /* The following routines are useful in setting all the flags that
-ffast-math and -fno-fast-math imply.  */
 static void
-set_fast_math_flags (struct gcc_options *opts, int set)
+set_fast_math_flags (struct gcc_options *opts,
+struct gcc_options *opts_set, int set)
 {
-  if (!opts->frontend_set_flag_unsafe_math_optimizations)
+  if (!opts->frontend_set_flag_unsafe_math_optimizations
+  && !opts_set->x_flag_unsafe_math_optimizations)
 {
   opts->x_flag_unsafe_math_optimizations = set;
-  set_unsafe_math_optimizations_flags (opts, set);
+  set_unsafe_math_optimizations_flags (opts, opts_set, set);
 }
   if (!opts->frontend_set_flag_finite_math_only)
-opts->x_flag_finite_math_only = set;
+SET_OPTION_IF_UNSET (opts, opts_set, flag_finite_math_only, set);
   if (!opts->frontend_set_flag_errno_math)
-opts->x_flag_errno_math = !set;
-  if (set)
-{
-  if (opts->frontend_set_flag_excess_precision == EXCESS_PRECISION_DEFAULT)
-   opts->x_flag_excess_precision
- = set ? EXCESS_PRECISION_FAST : EXCESS_PRECISION_DEFAULT;
-  if (!opts->frontend_set_flag_signaling_nans)
-   opts->x_flag_signaling_nans = 0;
-  if (!opts->frontend_set_flag_rounding_math)
-   opts->x_flag_rounding_math = 0;
-  if (!opts->frontend_set_flag_cx_limited_range)
-   opts->x_flag_cx_limited_range = 1;
-}
+SET_OPTION_IF_UNSET (opts, opts_set, flag_errno_math, !set);
+  if (!opts->frontend_set_flag_cx_limited_range)
+SET_OPTION_IF_UNSET (opts, opts_set, flag_cx_limited_range, set);
+  if (!opts->frontend_set_flag_excess_precision)
+SET_OPTION_IF_UNSET (opts, opts_set, flag_excess_precision,
+set ? EXCESS_PRECISION_FAST
+: EXCESS_PRECISION_DEFAULT);
+
+  // -ffast-math should also reset -frounding-math, but since this
+  // is off by default, there's nothing to do for now.
 }
 
 /* When -funsafe-math-optimizations is set the following
flags are set as well.  */
 static void
-set_unsafe_math_optimizations_flags (struct gcc_options *opts, int set)
+set_unsafe_math_optimizations_flags (struct gcc_options *opts,
+struct gcc_options *opts_set, int set)
 {
   if (!opts->frontend_set_flag_trapping_math)
-opts->x_flag_trapping_math = !set;
+SET_OPTION_IF_UNSET (opts, opts_set, flag_trapping_math, !set);
   if (!opts->frontend_set_flag_signed_zeros)
-opts->x_flag_signed_zeros = !set;
+SET_OPTION_IF_UNSET (opts, opts_set, flag_signed_zeros, !set);
   if (!opts->frontend_set_flag_associative_math)
-opts->x_flag_associative_math = set;
+SET_OPTION_IF_UNSET (opts, opts_set, flag_associative_math, set);
   if (!opts->frontend_set_flag_reciprocal_math)
-opts->x_flag_reciprocal_math = set;
+SET_OPTION_IF_UNSET (opts, opts_set, flag_reciprocal_math, set);
 }
 
 /* Return true iff flags in OPTS are set as if -ffast-math.  */
@@ -2884,10 +2886,14 @@ bool
 fast_math_flags_set_p (const struct gcc_options *opts)
 {
   return (!opts->x_flag_trapping_math
+ && !opts->x_flag_signed_zeros
+ && opts->x_flag_associative_math
+ && opts->x_flag_reciprocal_math
  && opts->x_flag_unsafe_math_optimizations
  && opts->x_flag_finite_math_only
- && !opts->x_flag_signed_zeros
  && !opts->x_flag_errno_math
+     && !opts->x_flag_rounding_math
+ && opts->x_flag_cx_limited_range
  && opts->x_flag_excess_precision == EXCESS_PRECISION_FAST);
 }
 
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: RFA: fix find_valid_class to accept classes w/out last hard reg

2013-08-26 Thread Ulrich Weigand

Joern Rennecke wrote:

> 2013-05-02  Joern Rennecke 
> 
>   * reload.c (find_valid_class): Allow classes that do not include
>   FIRST_PSEUDO_REGISTER - 1.

This is OK.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH] Do not mark pseudo-copies decomposable during first lower-subreg pass

2012-10-01 Thread Ulrich Weigand


>   * gcc.dg/lower-subreg-1.c: Disable on arm-*-* targets.

I just noticed that the triple is incomplete; we're supposed to use
arm*-*-* instead of just arm-*-*.

Checked in the the following fix as obvious.

Bye,
Ulrich


2012-10-01  Ulrich Weigand  

* gcc.dg/lower-subreg-1.c: Disable on arm*-*-* targets.

Index: gcc/testsuite/gcc.dg/lower-subreg-1.c
===
*** gcc/testsuite/gcc.dg/lower-subreg-1.c   (revision 191805)
--- gcc/testsuite/gcc.dg/lower-subreg-1.c   (working copy)
***
*** 1,4 
! /* { dg-do compile { target { ! { mips64 || { arm-*-* ia64-*-* spu-*-* 
tilegx-*-* } } } } } */
  /* { dg-options "-O -fdump-rtl-subreg1" } */
  /* { dg-skip-if "" { { i?86-*-* x86_64-*-* } && x32 } { "*" } { "" } } */
  /* { dg-require-effective-target ilp32 } */
--- 1,4 
! /* { dg-do compile { target { ! { mips64 || { arm*-*-* ia64-*-* spu-*-* 
tilegx-*-* } } } } } */
  /* { dg-options "-O -fdump-rtl-subreg1" } */
  /* { dg-skip-if "" { { i?86-*-* x86_64-*-* } && x32 } { "*" } { "" } } */
  /* { dg-require-effective-target ilp32 } */



-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com

[RFC] find_reloads_subreg_address rework triggers i386 back-end issue

2012-10-12 Thread Ulrich Weigand

Hello,

I was running a couple of tests on various platforms in preparation
of getting the find_reload_subreg_address patch needed by aarch64 upstream:
http://gcc.gnu.org/ml/gcc-patches/2012-07/msg01421.html

This unfortunately uncovered a regression in vect-98-big-array.c on i386.
It seems to me that this is due to a pre-existing problem in the back-end.

The insn being reloaded is:

(insn 17 15 19 3 (set (subreg:V16QI (reg:V4SI 182) 0)
(unspec:V16QI [
(mem:V16QI (plus:SI (reg:SI 64 [ ivtmp.97 ])
(const_int 16 [0x10])) [2 MEM[base: _164, offset: 
16B]+0 S16 A32])
] UNSPEC_MOVU)) 
/home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c:23 
921 {sse2_movdqu}
 (nil))

where reg:V4SI 182 was spilled to the stack.

With the current compiler, this gets reloaded via a straightforward
output reload:

(insn 17 15 195 3 (set (reg:V16QI 21 xmm0)
(unspec:V16QI [
(mem:V16QI (plus:SI (reg:SI 0 ax [orig:64 ivtmp.97 ] [64])
(const_int 16 [0x10])) [2 MEM[base: _164, offset: 
16B]+0 S16 A32])
] UNSPEC_MOVU)) 
/home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c:23 
921 {sse2_movdqu}
 (nil))
(insn 195 17 19 3 (set (mem/c:V16QI (plus:SI (reg/f:SI 7 sp)
(const_int 112 [0x70])) [4 %sfp+-1232 S16 A128])
(reg:V16QI 21 xmm0)) 
/home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c:23 
901 {*movv16qi_internal}
 (nil))


But with the above patch applied, we get an input reload instead:

(insn 196 15 17 3 (set (reg:V16QI 21 xmm0)
(mem:V16QI (plus:SI (reg:SI 0 ax [orig:64 ivtmp.97 ] [64])
(const_int 16 [0x10])) [2 MEM[base: _164, offset: 16B]+0 S16 
A32])) 
/home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c:23 
901 {*movv16qi_internal}
 (nil))
(insn 17 196 19 3 (set (mem/c:V16QI (plus:SI (reg/f:SI 7 sp)
(const_int 112 [0x70])) [4 %sfp+-1232 S16 A128])
(unspec:V16QI [
(reg:V16QI 21 xmm0)
] UNSPEC_MOVU)) 
/home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c:23 
921 {sse2_movdqu}
 (nil))

Now this is clearly broken, since now the load from memory is no longer
marked as potentially unaligned.  And indeed execution of this instruction
traps due to a misaligned access.


However, looking at the underlying pattern:

(define_insn "_movdqu"
  [(set (match_operand:VI1 0 "nonimmediate_operand" "=x,m")
(unspec:VI1 [(match_operand:VI1 1 "nonimmediate_operand" "xm,x")]
UNSPEC_MOVU))]
  "TARGET_SSE2 && !(MEM_P (operands[0]) && MEM_P (operands[1]))"

the back-end actually *allows* this transformation just fine.  So reload has
always had both options.  Since both require just one reload, it's a matter
of additional heuristics which will get chosen, and my patch as a side effect
slightly changes one of those heuristics ...

But in principle the problem is latent even without the patch.  The back-end
should not permit reload to make changes to the pattern that fundamentally
change the semantics of the resulting instruction ...


I was wondering if the i386 port maintainers could have a look at this
pattern.  Shouldn't we really have two patterns, one to *load* an unaligned
value and one to *store* and unaligned value, and not permit that memory
access to get reloaded?

[I guess a more fundamental change might also be to not have an unspec
in the first place, but only plain moves, and check MEM_ALIGN in the
move insn emitter to see which variant of the instruction is required ...]

Bye,
Ulrich
-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com

Re: [RFC] find_reloads_subreg_address rework triggers i386 back-end issue

2012-10-15 Thread Ulrich Weigand

Uros Bizjak wrote:
> On Fri, Oct 12, 2012 at 7:57 PM, Ulrich Weigand  wrote:
> > I was wondering if the i386 port maintainers could have a look at this
> > pattern.  Shouldn't we really have two patterns, one to *load* an unaligned
> > value and one to *store* and unaligned value, and not permit that memory
> > access to get reloaded?
> 
> Please find attached a fairly mechanical patch that splits
> move_unaligned pattern into load_unaligned and store_unaligned
> patterns. We've had some problems with this pattern, and finally we
> found the reason to make unaligned moves more robust.
> 
> I will wait for the confirmation that attached patch avoids the
> failure you are seeing with your reload patch.

Yes, this patch does in fact fix the failure I was seeing with the
reload patch.  (A full regression test shows a couple of extra fails:
FAIL: gcc.target/i386/avx256-unaligned-load-1.c scan-assembler sse_movups/1
FAIL: gcc.target/i386/avx256-unaligned-load-3.c scan-assembler sse2_movupd/1
FAIL: gcc.target/i386/avx256-unaligned-load-4.c scan-assembler avx_movups256/1
FAIL: gcc.target/i386/avx256-unaligned-store-4.c scan-assembler avx_movups256/2
But I guess these tests simply need to be updated for the new pattern names.)

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com

[PATCH v2, ARM] 64-bit shifts in NEON

2012-10-16 Thread Ulrich Weigand

Richard Earnshaw wrote:
> On 20/09/12 16:49, Ulrich Weigand wrote:
> > Richard Earnshaw wrote:
> >
> >> Hmm, this is going to cause bottlenecks on Cortex-A15: writing a Neon
> >> single-precision register and then reading it back as a double-precision
> >> value will cause scheduling problems.
[snip]
> >> A solution to this is to have the set of the shifter register done as a
> >> lane-set operation rather than as a set of the lower register, but it
> >> probably needs some thought as to how to achieve this without creating
> >> other overheads.
> >
> > What instruction are you refering to here?  Loads from memory?
> 
> Yes, if that's the source, or if from another register, something like
> 
>   vmov.32 Dd[0], Rt
> 
> (it doesn't matter that the other lane remains unintialized).  This has
> the advantage that it doesn't clobber the other half of the register.
> 
> If the operand is already known to be in an S-register, then
> vdup(scalar) can be used, but of course that needs a full 64-bit target
> register.

Here's an updated version of the patch that allocated double registers
and uses lane-set or load from memory instructions to fill in the
shift count operand.

Re-tested on arm-linux-gnueabi with no regression.  Re-benchmarked
(the Linaro backport) with results comparable to the original patch.

Would this version be OK?

Bye,
Ulrich


2012-10-16  Andrew Stubbs  
Ulrich Weigand  

* config/arm/arm.c (arm_emit_coreregs_64bit_shift): Fix comment.
* config/arm/arm.md (opt, opt_enabled): New attributes.
(enabled): Use opt_enabled.
(ashldi3, ashrdi3, lshrdi3): Add TARGET_NEON case.
(ashldi3): Allow general operands for TARGET_NEON case.
* config/arm/iterators.md (rshifts): New code iterator.
(shift, shifttype): New code attributes.
* config/arm/neon.md (UNSPEC_LOAD_COUNT): New unspec type.
(neon_load_count, ashldi3_neon_noclobber, ashldi3_neon,
signed_shift_di3_neon, unsigned_shift_di3_neon,
ashrdi3_neon_imm_noclobber, lshrdi3_neon_imm_noclobber,
di3_neon): New patterns.


Index: gcc/config/arm/arm.c
===
*** gcc/config/arm/arm.c(revision 191400)
--- gcc/config/arm/arm.c(working copy)
*** arm_autoinc_modes_ok_p (enum machine_mod
*** 26293,26300 
 Input requirements:
  - It is safe for the input and output to be the same register, but
early-clobber rules apply for the shift amount and scratch registers.
! - Shift by register requires both scratch registers.  Shift by a constant
!   less than 32 in Thumb2 mode requires SCRATCH1 only.  In all other cases
the scratch registers may be NULL.
  - Ashiftrt by a register also clobbers the CC register.  */
  void
--- 26293,26299 
 Input requirements:
  - It is safe for the input and output to be the same register, but
early-clobber rules apply for the shift amount and scratch registers.
! - Shift by register requires both scratch registers.  In all other cases
the scratch registers may be NULL.
  - Ashiftrt by a register also clobbers the CC register.  */
  void
Index: gcc/config/arm/neon.md
===
*** gcc/config/arm/neon.md  (revision 191400)
--- gcc/config/arm/neon.md  (working copy)
***
*** 23,28 
--- 23,29 
  (define_c_enum "unspec" [
UNSPEC_ASHIFT_SIGNED
UNSPEC_ASHIFT_UNSIGNED
+   UNSPEC_LOAD_COUNT
UNSPEC_VABD
UNSPEC_VABDL
UNSPEC_VADD
***
*** 1170,1175 
--- 1171,1359 
DONE;
  })
  
+ ;; 64-bit shifts
+ 
+ ;; This pattern loads a 32-bit shift count into a 64-bit NEON register,
+ ;; leaving the upper half uninitalized.  This is OK since the shift
+ ;; instruction only looks at the low 8 bits anyway.  To avoid confusing
+ ;; data flow analysis however, we pretend the full register is set
+ ;; using an unspec.
+ (define_insn "neon_load_count"
+   [(set (match_operand:DI 0 "s_register_operand" "=w,w")
+ (unspec:DI [(match_operand:SI 1 "nonimmediate_operand" "Um,r")]
+UNSPEC_LOAD_COUNT))]
+   "TARGET_NEON"
+   "@
+vld1.32\t{%P0[0]}, %A1
+vmov.32\t%P0[0], %1"
+   [(set_attr "neon_type" "neon_vld1_vld2_lane,neon_mcr")]
+ )
+ 
+ (define_insn "ashldi3_neon_noclobber"
+   [(set (match_operand:DI 0 "s_register_operand"  "=w,w")
+   (ashift:DI (match_operand:DI 1 "s_register_operand" " w,w")
+  (match_operand:DI 2 "reg_or_int_operand" " i,w")))]
+   "TARGET_NEON && reload_completed
+

[commit] Re-work find_reloads_subreg_address

2012-10-16 Thread Ulrich Weigand

Tejas Belagod wrote:
> > Ulrich Weigand wrote:
> >> The following patch implements this idea; it passes a basic regression
> >> test on arm-linux-gnueabi.  (Obviously this would need a lot more
> >> testing on various platforms before getting into mainline ...)
> >>
> >> Can you have a look whether this fixes the problem you're seeing?
[snip]

> Sorry for the delay in replying. These new issues that I was seeing were
> bugs specific to my code that I've fixed. Your patch seems to work fine.
> Thanks a lot for the patch.

I've now checked the patch in to mainline.   In addition to your test on
aarch64, I've completed tests without regression on ppc(64)-linux,
s390(x)-linux, and i386-linux (with Uros' patch).

Note that Uros' patch here:
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01463.html
is a pre-requisite for the reload patch to avoid regressions on i386.

Current version (nearly unchanged) of the patch appended below.

Bye,
Ulrich

ChangeLog:

* reload.c (find_reloads_subreg_address): Remove FORCE_REPLACE
parameter.  Always replace normal subreg with memory reference
whenever possible.  Return NULL otherwise.
(find_reloads_toplev): Always call find_reloads_subreg_address
for subregs of registers equivalent to a memory location.
Only recurse further if find_reloads_subreg_address fails.
(find_reloads_address_1): Only call find_reloads_subreg_address
for subregs of registers equivalent to a memory location.
Properly handle failure of find_reloads_subreg_address.

Index: gcc/reload.c
===
*** gcc/reload.c(revision 191665)
--- gcc/reload.c(working copy)
*** static int find_reloads_address_1 (enum 
*** 282,288 
  static void find_reloads_address_part (rtx, rtx *, enum reg_class,
   enum machine_mode, int,
   enum reload_type, int);
! static rtx find_reloads_subreg_address (rtx, int, int, enum reload_type,
int, rtx, int *);
  static void copy_replacements_1 (rtx *, rtx *, int);
  static int find_inc_amount (rtx, rtx);
--- 282,288 
  static void find_reloads_address_part (rtx, rtx *, enum reg_class,
   enum machine_mode, int,
   enum reload_type, int);
! static rtx find_reloads_subreg_address (rtx, int, enum reload_type,
int, rtx, int *);
  static void copy_replacements_1 (rtx *, rtx *, int);
  static int find_inc_amount (rtx, rtx);
*** find_reloads_toplev (rtx x, int opnum, e
*** 4810,4840 
}

/* If the subreg contains a reg that will be converted to a mem,
!convert the subreg to a narrower memref now.
!Otherwise, we would get (subreg (mem ...) ...),
!which would force reload of the mem.
! 
!We also need to do this if there is an equivalent MEM that is
!not offsettable.  In that case, alter_subreg would produce an
!invalid address on big-endian machines.
! 
!For machines that extend byte loads, we must not reload using
!a wider mode if we have a paradoxical SUBREG.  find_reloads will
!force a reload in that case.  So we should not do anything here.  */

if (regno >= FIRST_PSEUDO_REGISTER
! #ifdef LOAD_EXTEND_OP
! && !paradoxical_subreg_p (x)
! #endif
! && (reg_equiv_address (regno) != 0
! || (reg_equiv_mem (regno) != 0
! && (! strict_memory_address_addr_space_p
! (GET_MODE (x), XEXP (reg_equiv_mem (regno), 0),
!  MEM_ADDR_SPACE (reg_equiv_mem (regno)))
! || ! offsettable_memref_p (reg_equiv_mem (regno))
! || num_not_at_initial_offset
!   x = find_reloads_subreg_address (x, 1, opnum, type, ind_levels,
!  insn, address_reloaded);
  }

for (copied = 0, i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
--- 4810,4828 
}

/* If the subreg contains a reg that will be converted to a mem,
!attempt to convert the whole subreg to a (narrower or wider)
!memory reference instead.  If this succeeds, we're done --
!otherwise fall through to check whether the inner reg still
!needs address reloads anyway.  */

if (regno >= FIRST_PSEUDO_REGISTER
! && reg_equiv_memory_loc (regno) != 0)
!   {
! tem = find_reloads_subreg_address (x, opnum, type, ind_levels,
!insn, address_reloaded);
! if (tem)
!   return tem;
!   }
  }

for (copied = 0, i = GET_RTX_LENGTH (code)

Re: [PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.

2016-02-01 Thread Ulrich Weigand

Andreas Krebbel wrote:

> +; Accept single register and immediate operands usable as shift
> +; counts.
> +(define_predicate "addrreg_or_constint_operand"
> +  (match_code "reg, subreg, const_int")

I'm wondering whether this is even necessary.

> +{
> +  if (GET_MODE (op) != VOIDmode
> +  && GET_MODE_CLASS (GET_MODE (op)) != MODE_INT)
> +return false;

The predicate always seems to be used with a mode, so any extra
mode checks here seem redundant.

> +  while (op && GET_CODE (op) == SUBREG)
> +op = SUBREG_REG (op);
> +
> +  if (REG_P (op)
> +  && (REGNO (op) >= FIRST_PSEUDO_REGISTER || ADDR_REG_P (op)))
> +return true;
> +
> +  if (CONST_INT_P (op))
> +return true;

This looks somewhat copied from shift_count_or_setmem_operand, where
we have a comment:
  /* Don't allow any non-base hard registers.  Doing so without
 confusing reload and/or regrename would be tricky, and doesn't
 buy us much anyway.  */

With the new set up, I don't believe there is any issue with confusing
reload -- it's no more complex than any other pattern now.  So it
should be fine to just accept any register.

In fact, I'm starting to wonder whether it wouldn't be just fine to
simply using nonmemory_operand instead of this special predicate
(the only issue might be that we could get CONST_WIDE_INT, but it
should be simple to handle those).

> +(define_expand "rotl3"
> +  [(set (match_operand:GPR 0 "register_operand" "")
> +(rotate:GPR (match_operand:GPR 1 "register_operand" "")
> + (match_operand:SI 2 "shift_count_or_setmem_operand" "")))]

Similarly, I think the expander no longer needs to use the special predicate
shift_count_or_setmem_operandm but could just use nonmemory_operand.
[ This will reject the PLUS that shift_count_or_setmem_operand accepted,
but I don't believe you ever *could* get the PLUS in the expander anyway.
It will only be moved in by combine, so as long as the insn accepts it,
everything should be good.  ]

> +(define_insn "*rotl3"
> +  [(set (match_operand:GPR 0 "register_operand"   "=d,d")
> + (rotate:GPR (match_operand:GPR 1 "register_operand""d,d")
> + (match_operand:SI  2 "addrreg_or_constint_operand" "a,n")))]
> +  "TARGET_CPU_ZARCH"
> +  "@
> +   rll\t%0,%1,(%2)
> +   rll\t%0,%1,%Y2"

So it looks like %Y is now always used for just a constant.  Maybe the
implementation of %Y should be adjusted (once the patch series is complete)?
Also, if we use just nonmemory_operand, %Y should be updated to accept
CONST_WIDE_INT just like e.g. %x does.

> +; This expands an register/immediate operand to a register+immediate
> +; operand to draw advantage of the address style operand format
> +; providing a addition for free.
> +(define_subst "addr_style_op_subst"
> +  [(set (match_operand:DSI 0 "" "")
> +(SUBST:DSI (match_operand:DSI 1 "" "")
> +(match_operand:SI 2 "" "")))]
> +  "REG_P (operands[2])"
> +  [(set (match_dup 0)
> +(SUBST:DSI (match_dup 1)
> +(plus:SI (match_dup 2)
> + (match_operand 3 "const_int_operand" "n"])

This seems to add just a single alternative.  Is that OK if it gets
substituted into a pattern that used multiple alternatives otherwise?

> +; In the subst pattern we have to disable the alternative where op2 is
> +; already a constant.  This attribute is supposed to be used in the
> +; "disabled" insn attribute to achieve this.
> +(define_subst_attr "addr_style_op_disable" "addr_style_op_subst" "0" "1")

Is this necessary given that the subst adds a REG_P (operands[2])
condition?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 5/9] S/390: Get rid of Y constraint in arithmetic right shift patterns.

2016-02-01 Thread Ulrich Weigand

Andreas Krebbel wrote:

> +; This is like the addr_style_op substitution above but with a CC clobber.
> +(define_subst "addr_style_op_cc_subst"

> +; This is like the masked_op substitution but with a CC clobber.
> +(define_subst "masked_op_cc_subst"

A bit unfortunate that these need to be duplicated.  Does the subst always
have to match the full pattern, or can it match and operate on just one
element of a PARALLEL?

> +; This adds an explicit CC reg set to an operation while keeping the
> +; set for the operation result as well.
> +(define_subst "setcc_subst"

> +; This adds an explicit CC reg set to an operation while dropping the
> +; result of the operation.
> +(define_subst "cconly_subst"

These are nice!  It would seem they could be applied to simplify many
of the non-shift patterns too, right?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 6/9] S/390: Get rid of Y constraint in tabort.

2016-02-01 Thread Ulrich Weigand

Andreas Krebbel wrote:

>  (define_insn "*tabort_1"
> -  [(unspec_volatile [(match_operand:SI 0 "shift_count_or_setmem_operand" 
> "Y")]
> +  [(unspec_volatile [(match_operand:SI 0 "addrreg_or_constint_operand" 
> "a,n")]
>   UNSPECV_TABORT)]
>"TARGET_HTM && operands != NULL"
> -  "tabort\t%Y0"
> +  "@
> +   tabort\t0(%0)
> +   tabort\t%0"
> +  [(set_attr "op_type" "S")])
> +
> +(define_insn "*tabort_1_plus"
> +  [(unspec_volatile [(plus:SI (match_operand:SI 0 "register_operand"  "a")
> +   (match_operand:SI 1 "const_int_operand" "J"))]
> + UNSPECV_TABORT)]
> +  "TARGET_HTM && operands != NULL"
> +  "tabort\t%1(%0)"

This seems dangerous: const_int_operand may match a constant that does
not fit into the "J" constraint, which would lead to an abort in reload.

What is the semantics for the abort code anyway?  It is supposed to be
automatically truncated or not?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 7/9] S/390: Get rid of Y constraint in vector.md.

2016-02-01 Thread Ulrich Weigand

Andreas Krebbel wrote:

> +(define_insn "*vec_extract_plus"
> +  [(set (match_operand:  0 
> "nonimmediate_operand" "=d,QR")
> + (unspec: [(match_operand:V   1 "register_operand"  
> "v, v")
> +(plus:SI (match_operand:SI 2 "register_operand"  
> "a, I")

The second alternative can never match here, can it?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 8/9] S/390: Use define_subst for the setmem patterns.

2016-02-01 Thread Ulrich Weigand

Andreas Krebel:

> While trying to get rid of the Y constraint in the setmem patterns I
> noticed that for these patterns it isn't even a problem since these
> always only use the constraint with a Pmode match_operand.  But while
> being at it I've tried to fold some of the patterns a bit.

Hmm, I'm wondering whether it's worth it to use a subst pattern that
is only ever used in one single instance?  (Specifically I'm wondering
about the 31z subst.  The and subst is used multiple times.)


> +; Turn the DImode in the 31 bit pattern into TImode to enforce
> +; register pair usage even with -mzarch.
> +; The subreg offset is adjusted accordingly.
> +(define_subst "setmem_31z_subst"
> +  [(clobber (match_operand:DI  0 "register_operand" ""))
> +   (set (mem:BLK (subreg:SI (match_operand:DI  3 "register_operand" "") 
> 0))
> +(unspec:BLK [(match_operand:SI 2 
> "shift_count_or_setmem_operand" "")
> +  (subreg:SI (match_operand:DI  4 "register_operand" "")
> + 0)]

Is this right?  Shouldn't it be subreg offset 4 originally?

> +  UNSPEC_REPLICATE_BYTE))
> +   (use (match_operand:DI  1 "register_operand" ""))
> +   (clobber (reg:CC CC_REGNUM))]
> +""
> +  [(clobber (match_operand:TI  0 "register_operand" ""))
> +   (set (mem:BLK (subreg:SI (match_operand:TI  3 "register_operand" "") 
> 4))
> +(unspec:BLK [(match_operand:SI 2 
> "shift_count_or_setmem_operand" "")
> +      (subreg:SI (match_operand:TI  4 "register_operand" "")
> + 12)]
> + UNSPEC_REPLICATE_BYTE))
> +   (use (match_operand:TI  1 "register_operand" ""))
> +   (clobber (reg:CC CC_REGNUM))])

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 2/9] S/390: Add disabled insn attribute

2016-02-01 Thread Ulrich Weigand

Andreas Krebbel wrote:

> With this patch a `disabled' attribute is defined which can be used to
> explicitly disable an alternative.  This comes handy when defining the
> substitutions later and while adding it anyway I've used it for the
> existing cases as well.

> +; Insn attribute with boolean value
> +;  1 to disable an insn alternative
> +;  0 or * for an active insn alternative
> +;  an active alternative can still be disabled due to lacking cpu
> +;  facility support
> +(define_attr "disabled" "" (const_int 0))
> +
>  (define_attr "enabled" ""
> -  (cond [(eq_attr "cpu_facility" "standard")
> +  (cond [(eq_attr "disabled" "1")
> +  (const_int 0)
> +
> +  (eq_attr "cpu_facility" "standard")
>(const_int 1)
>  
>   (and (eq_attr "cpu_facility" "ieee")

So I'm wondering what the difference is between this and simply
overriding the default implementation of "enabled" per-insn?

So instead of adding
(set_attr "disabled" "0,1")])
to an insn, you might simply add instead:
(set_attr "enabled" "*,0")])

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 9/9] S/390: Disallow SImode in s390_decompose_address

2016-02-01 Thread Ulrich Weigand

Andreas Krebbel wrote:

>   * config/s390/s390.c (s390_decompose_address): Don't accept SImode
>   anymore.

Great!  Very nice to finally get rid of this.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.

2016-02-01 Thread Ulrich Weigand

Andreas Krebbel wrote:
> On 02/01/2016 02:30 PM, Ulrich Weigand wrote:
> > This seems to add just a single alternative.  Is that OK if it gets
> > substituted into a pattern that used multiple alternatives otherwise?
> Yes.  This is supposed to work.  The new constraint will be duplicated until 
> it matches the number
> of alternatives in the original insn.

OK, that makes sense.   But now I'm wondering about this bit of
the ashiftrt patch:

+; FIXME: The number of alternatives is doubled here to match the fix
+; number of 4 in the subst pattern for the (clobber (match_scratch...
+; The right fix should be to support match_scratch in the output
+; pattern of a define_subst.

+(define_subst "cconly_subst"
+  [(set (match_operand:DSI 0 ""   "")
+(match_operand:DSI 1 "" ""))
+   (clobber (reg:CC CC_REGNUM))]
+  "s390_match_ccmode(insn, CCSmode)"
+  [(set (reg CC_REGNUM)
+   (compare (match_dup 1) (const_int 0)))
+   (clobber (match_scratch:DSI 0 "=d,d,d,d"))])

I guess this is where the number 4 comes from?  But if the alternatives
are duplicated, shouldn't it then work to simply use:
   (clobber (match_scratch:DSI 0 "=d"))])


> >> +; In the subst pattern we have to disable the alternative where op2 is
> >> +; already a constant.  This attribute is supposed to be used in the
> >> +; "disabled" insn attribute to achieve this.
> >> +(define_subst_attr "addr_style_op_disable" "addr_style_op_subst" "0" "1")
> > 
> > Is this necessary given that the subst adds a REG_P (operands[2])
> > condition?
> I wasn't sure about this.  How does reload/lra behave when the constraints 
> allow more than the insn
> condition? I expected that reload might remember that one of the alternatives 
> might take a constant
> as well and then tries as part of reload inheritance and such things to 
> actually exploit this?! Just
> disabling the alternative in order to prevent reload from recording stuff for 
> that alternative
> appeared safer to me.
> 
> On the other hand it is not ideal to have an operands[x] check in the insn 
> condition in the first
> place. Especially when using define_substs where the operands get renumbered 
> this might add hard to
> find bugs. Perhaps I should try replacing the insn predicate with a 
> subst_attr instead? Is having a
> constraint which allows more than a predicate any better than having the same 
> with the insn condition?

Hmm.  In theory reload should always respect both constraints and predicates.
But I guess it can't hurt to disable the alternative just to be safe; it is
indeed an uncommon case to have the constraint accept more than the predicate.
(Also a PLUS with two CONST_INT operands is not canonical RTL anyway.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.

2016-02-01 Thread Ulrich Weigand

Andreas Krebbel wrote:
> > Hmm.  In theory reload should always respect both constraints and 
> > predicates.
> > But I guess it can't hurt to disable the alternative just to be safe; it is
> > indeed an uncommon case to have the constraint accept more than the 
> > predicate.
> > (Also a PLUS with two CONST_INT operands is not canonical RTL anyway.)
> 
> Does this mean you would prefer to convert the insn condition into a 
> predicate and change it with
> subst_attr or should I leave it in the insn condition?

If we can transform the operand predicate, that would certainly be the best.
I.e. where the original pattern has:
  (match_operand:SI 2 "nonmemory_operand" "a,n")
have the replaced pattern use instead:
  (plus:SI (match_operand:SI 2 "register_operand" "a")
   (match_operand 3 "const_int_operand" "n"))

if that is possible via the subst mechanism ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 2/9] S/390: Add disabled insn attribute

2016-02-01 Thread Ulrich Weigand

Andreas Krebbel wrote:
> On 02/01/2016 02:45 PM, Ulrich Weigand wrote:
> > So I'm wondering what the difference is between this and simply
> > overriding the default implementation of "enabled" per-insn?
> > 
> > So instead of adding
> > (set_attr "disabled" "0,1")])
> > to an insn, you might simply add instead:
> > (set_attr "enabled" "*,0")])
> Not sure but wouldn't this mean that the value of the enabled attribute would 
> then depend on the
> order of the set_attr for "enabled" and "cpu_facility" since one is defined 
> on the value of the other?!

I don't think the order matches; genattrtab seems to first read in all .md
files and only then optimize and then emit definitions of all the get_attr_...
functions.  The eq_attr uses in the "enabled" definition would only be
evaluated at that point.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH] s390: Add -fsplit-stack support

2016-02-02 Thread Ulrich Weigand

Marcin KoÅcielnicki wrote:

> Here we go.  I've also removed the "see below", since I don't really
> see anything below...

The "see below" refers to this code (which I agree isn't really obvious):

  if (TARGET_TPF_PROFILING)
{
  /* Generate a BAS instruction to serve as a function
 entry intercept to facilitate the use of tracing
 algorithms located at the branch target.  */
  emit_insn (gen_prologue_tpf ());

What is not explicitly called out here is that this tracing function
actually refers to some hard registers, in particular r14, and assumes
they still have the original contents as at function entry.

That is why the prolog code avoid using r14 as temporary if the TPF
tracing mechanism is in use.  Now I think this doesn't apply to r12,
so this part of your patch should still be fine.  (In addition, TPF
is not going to support split stacks --or indeed the Go language--
anyway, so it doesn't really matter all that much.)

I do have two other issues; sorry for bringing those up again although
they've been discussed up in the past, but I still think we can find
some improvements here ...

The first is the question Andreas brought up, why we need the extra
set of insns introduced by s390_reorg.  I think this may really have
been necessary for the ESA case where data elements had to be intermixed
into code at a specific location.  But since we no longer support ESA,
we now just have a data block that can be placed anywhere.  For example,
we could just have an insn (at any point in the prolog stream) that
simply emits the full data block during final output, along the lines of
(note: needs to be updated for SImode vs. DImode.):

(define_insn "split_stack_data"
  [(unspec_volatile [(match_operand 0 "bras_sym_operand" "X")
 (match_operand 1 "bras_sym_operand" "X")
 (match_operand 2 "consttable_operand" "X")
 (match_operand 3 "consttable_operand" "X")]
UNSPECV_SPLIT_STACK_DATA)]
  ""
{
  switch_to_section (targetm.asm_out.function_rodata_section
  (current_function_decl));

  output_asm_insn (\".align 3", operands);
  (*targetm.asm_out.internal_label) (asm_out_file, \"L\",
 CODE_LABEL_NUMBER (operands[0]));
  output_asm_insn (\".quad %2\", operands);
  output_asm_insn (\".quad %3\", operands);
  output_asm_insn (\".quad %1-%0\", operands);

  switch_to_section (current_function_section ());
  return "";
}
  [(set_attr "length" "0")])

Or possibly even cleaner, we can simply define the data block at the
tree level as if it were an initialized global variable of a certain
struct type, and just leave it to common code to emit it as usual.

Then we just have the code bits, but I don't really see much
difference between the split_stack_call and split_stack_sibcall
patterns (apart from the data block), so if code flow is OK with
the former insns, it should be OK with the latter too ..

[ Or else, if there *are* code flow issues, the other alternative
would be to emit the full call sequence, code and data, from a
single insn pattern during final output.  This might have the extra
benefit that the assembler sequence is fully fixed, and thus easier
to detect in the linker.  ]

Getting rid of the extra transformation in s390_reorg would not
just remove a bunch of code from the back-end (always good!),
it would also speed up compile time a bit.

The second issue I'm still not sure about is the magic nop marker
for frameless functions.  In an earlier mail you wrote:

> Both currently supported 
> architectures always emit split-stack code on every function.

At least for rs6000 this doesn't appear to be true; in
rs6000_expand_split_stack_prologue we have:

  if (!info->push_p)
return;

so it does nothing for frameless routines.

Now on i386 we do indeed generate code for frameless routines;
in fact, the *same* full stack check is generated as for any
other routine.  Now I'm wondering: is there are reason why
this check would be necessary (and there's simply a bug in
the rs6000 implementation)?  Then we obviously should do the
same on s390.

On the other hand, if rs6000 works fine *without* any code
in frameless routines, why wouldn't that work for s390 too?

Emitting a nop (that is always executed) still looks weird to me.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH] s390: Add -fsplit-stack support

2016-02-03 Thread Ulrich Weigand

Marcin KoÅcielnicki wrote:

> Comment fixed, split_stack_marker gone, reorg gone.  Generated code seems 
> sane,
> but testsuite still running.
> 
> I will need to modify the gold patch to handle the "leaf function taking 
> non-split
> stack function address" issue - this will likely require messing with the 
> target
> independent plumbing, the hook for that doesn't seem to get enough params.

Thanks for making those changes; the patch is looking a lot nicer (and shorter 
:-))
now!  Just to clarify, your original patch series had two common-code 
prerequisite
patches (3/5 and 4/5) -- it looks like those may still be needed?  If so, we'll
have to get approval from the appropriate middle-end maintainers before this
patch can go it as well.

As to the back-end patch, I've now only got some cosmetical issues:

> +  insn = emit_insn (gen_main_base_64 (r1, parm_base));

Now that we aren't using the literal pool infrastructure for the block any more,
I guess we shouldn't be using it to load the address either.  Just something
like:
  insn = emit_move_insn (r1, gen_rtx_LABEL_REF (VOIDmode, parm_base));
should do it.

> +(define_insn "split_stack_data"
> +  [(unspec_volatile [(match_operand 0 "" "X")
> +  (match_operand 1 "" "X")
> +  (match_operand 2 "consttable_operand" "X")
> +  (match_operand 3 "consttable_operand" "X")]

And similarly here, just use const_int_operand.

Otherwise, this all looks very good to me.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH] s390: Add -fsplit-stack support

2016-02-03 Thread Ulrich Weigand

Marcin KoÅcielnicki wrote:

> libgcc/ChangeLog:
> 
>   * config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
>   * config/s390/morestack.S: New file.
>   * config/s390/t-stack-s390: New file.
>   * generic-morestack.c (__splitstack_find): Add s390-specific code.
> 
> gcc/ChangeLog:
> 
>   * common/config/s390/s390-common.c (s390_supports_split_stack):
>   New function.
>   (TARGET_SUPPORTS_SPLIT_STACK): New macro.
>   * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
>   * config/s390/s390.c (struct machine_function): New field
>   split_stack_varargs_pointer.
>   (s390_register_info): Mark r12 as clobbered if it'll be used as temp
>   in s390_emit_prologue.
>   (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
>   vararg pointer.
>   (morestack_ref): New global.
>   (SPLIT_STACK_AVAILABLE): New macro.
>   (s390_expand_split_stack_prologue): New function.
>   (s390_live_on_entry): New function.
>   (s390_va_start): Use split-stack vararg pointer if appropriate.
>   (s390_asm_file_end): Emit the split-stack note sections.
>   (TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
>   * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
>   (UNSPECV_SPLIT_STACK_CALL): New unspec.
>   (UNSPECV_SPLIT_STACK_DATA): New unspec.
>   (split_stack_prologue): New expand.
>   (split_stack_space_check): New expand.
>   (split_stack_data): New insn.
>   (split_stack_call): New expand.
>   (split_stack_call_*): New insn.
>   (split_stack_cond_call): New expand.
>   (split_stack_cond_call_*): New insn.
> ---
> Changes applied.  Testsuite still running, still works on my simple tests.
> 
> As for common code prerequisites: #3 is no longer needed, and very likely
> so is #4 (it fixes problems that I've only seen with ESA mode, and testsuite
> runs just fine without it now).

OK, I see.  The patch is OK for mainline then, assuming testing passes.

Thanks again,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH] s390: Add -fsplit-stack support

2016-02-04 Thread Ulrich Weigand

Marcin KoÅcielnicki wrote:

> Ugh. I take that back.  For -fPIC, the load-address sequence is:
> 
>  larl%r1,f@GOTENT
>  lg  %r2,0(%r1)
>  br  %r14

This is correct.

> And (sibling) call sequence is:
> 
>  larl%r1,f@GOTENT
>  lg  %r1,0(%r1)
>  br  %r1

Oops.  That is actually a GCC bug.  The sibcall sequence really must be:

jg f@PLT

This is a real bug since it forces non-lazy symbol resolution for f
just because the compiler chose those use a sibcall optimization;
that's not supposed to happen.

It seems this bug was accidentally introduced here:

2010-04-20  Andreas Krebbel  

PR target/43635
* config/s390/s390.c (s390_emit_call): Turn direct into indirect
calls for -fpic -m31 if they have been sibcall optimized.

since the patch doesn't check for TARGET_64BIT ...

Andreas, can you have a look?

> It seems there's no proper way to recognize a call vs a load address - 
> so we can either go with emitting the marker, or have the same problem 
> as on ppc.
> 
> So - how much should we care?

I think we should fix that bug.  That won't help for existing objects,
but those don't use split stack either, so that shouldn't matter.

If we fix that bug before (or at the same time as) adding split-stack
support, the linker will still be able to distigunish function pointer
loads from calls (including sibcalls) on all objects using split stack.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH] s390: Add -fsplit-stack support

2016-02-04 Thread Ulrich Weigand

Marcin KoÅcielnicki wrote:

> Fair enough.  Here's what I'm going to implement in gold:
> 
> - any PLT relocation: call
> - PC32DBL on a larl: non-call
> - PC32DBL otherwise: call
> - any other relocation: non-call
> 
> Does that sound right?

Hmm, I'm wondering about the PC32DBL choices.  There are now
a large number of other non-call instructions that use PC32DBL,
including lrl, strl, crl, cgrl, cgfrl, ...

However, these all access *data* at the pointed-to location,
so it is quite unlikely they would ever be used with a
function symbol.  So, assuming that you also check that the
target of the relocation is a function symbol, treating only
larl as non-call might be OK.

Maybe a more conservative approach might be to  make the decision
the other way round: for PC32DBL check for *branch* instructions,
and treat only those are calls.  There's just a few branch
instruction using PC32DBL:

brasl  (call)
brcl   (conditional or unconditional sibcall)
brcth  (???)

where the last one is extremely unlikely (but theorically
possible) to be used as conditional sibcall combined with
a register decrement; I don't think this can ever happen
with current compilers however.

For full completeness, there are also PC16DBL relocations that
*could* target called functions, but only when compiling with
the -msmall-exec flag to assume total executable size is less
than 64 KB.  These are used by the following instructions:

bras
brc
brct
brctg
brxh
brxhg
brxle
brxlg
crj
cgrj
clrj
clgrj
cij
cgij
clij
clgij

Note that those are *all* branch instructions, so it might
make sense to add any PC16DBL targetting a function symbol
to the list of calls, just in case.  (But since basically
nobody ever uses -msmall-exec, it doesn't really matter
much either.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH] s390: Add -fsplit-stack support

2016-02-05 Thread Ulrich Weigand

Marcin KoÅcielnicki wrote:

> I'll stay with checking for larl - while I can imagine someone adding a 
> new conditional branch instruction, I don't see a need for another 
> larl-like instruction.  Besides, this way the failure mode for an 
> unknown instruction would be producing an error, instead of silently 
> emitting code with unfixed prologue.

OK, fine with me.  B.t.w. Andreas has checked in the sibcall fix,
so you no longer should be seeing larl used for sibcalls.

> I've updated and resubmitted the gold patch.

Thanks!

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH, rs6000] Add -maltivec=be semantics in LE mode for vec_ld and vec_st

2016-02-09 Thread Ulrich Weigand

Hi Bill,

> 2014-02-20  Bill Schmidt  
> 
>   * config/rs6000/altivec.md (altivec_lvxl): Rename as
>   *altivec_lvxl__internal and use VM2 iterator instead of
>   V4SI.
>   (altivec_lvxl_): New define_expand incorporating
>   -maltivec=be semantics where needed.

I just noticed that this:

> -(define_insn "altivec_lvxl"
> +(define_expand "altivec_lvxl_"
>[(parallel
> -[(set (match_operand:V4SI 0 "register_operand" "=v")
> -   (match_operand:V4SI 1 "memory_operand" "Z"))
> +[(set (match_operand:VM2 0 "register_operand" "=v")
> +   (match_operand:VM2 1 "memory_operand" "Z"))
>   (unspec [(const_int 0)] UNSPEC_SET_VSCR)])]
>"TARGET_ALTIVEC"
> -  "lvxl %0,%y1"
> +{
> +  if (!BYTES_BIG_ENDIAN && VECTOR_ELT_ORDER_BIG)
> +{
> +  altivec_expand_lvx_be (operands[0], operands[1], mode, 
> UNSPEC_SET_VSCR);
> +  DONE;
> +}
> +})
> +
> +(define_insn "*altivec_lvxl__internal"
> +  [(parallel
> +[(set (match_operand:VM2 0 "register_operand" "=v")
> +   (match_operand:VM2 1 "memory_operand" "Z"))
> +     (unspec [(const_int 0)] UNSPEC_SET_VSCR)])]
> +  "TARGET_ALTIVEC"
> +  "lvx %0,%y1"
>[(set_attr "type" "vecload")])

now causes vec_ldl to emit the lvx instead of the lvxl instruction.
I assume this was not actually intended?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [RS6000] reload_vsx_from_gprsf splitter

2016-02-11 Thread Ulrich Weigand

David Edelsohn wrote:
> On Thu, Feb 11, 2016 at 6:04 AM, Alan Modra  wrote:
> > This is PR68973 part 2, the failure of a boost test, caused by a
> > splitter emitting an invalid move in reload_vsx_from_gprsf:
> >   emit_move_insn (op0_di, op2);
> >
> > op0 can be any vsx reg, but the mtvsrd destination constraint in
> > movdi_internal64 is "wj", which only allows fp regs.  I'm not sure why
> > we have that constraint so left movdi_internal64 alone and used a
> > special p8_mtvsrd instead, similar to p8_mtvsrd_1 and p8_mtvsrd_2 used
> > by reload_vsx_from_gpr.  When looking at those, I noticed we're
> > restricted to fprs there too so fixed that as well.  (We can't use %L
> > in asm operands that must accept vsx.)
> 
> Michael, please review the "wj" constraint in these patterns.
> 
> Alan, the explanation says that it uses a special p8_mtvsrd similar to
> p8_mtvsrd_[12], but does not explain why the two similar patterns are
> removed.  The incorrect use of %L implies those patterns, but the
> change is to p8_xxpermdi_ that is not mentioned in the
> ChangeLog.
> 
> I also would appreciate Uli's comments on this direction because of
> his reload expertise.

For the most part, this patch doesn't really change anything in the
interaction with reload as far as I can see.  The changes introduced
by the patch do make sense to me.  In particular, replacing the two
patterns p8_mtvsrd_1 and p8_mtvsrd_2 used to fill high and low parts
of a TFmode register pair with a single pattern p8_mtvsrd that just
works on any DFmode register (leaving the split into high/low to the
caller if necessary) seems to simplify things.

> > +  /* You might think that we could use op0 as one temp and a DF clobber
> > + as the other, but you'd be wrong.  These secondary_reload patterns
> > + don't check that the clobber is different to the destination, which
> > + is probably a reload bug.  */

It's not a bug, it's deliberate behavior.  The reload registers allocated
for secondary reload clobbers may overlap the destination, since in many
cases you simply move the input to the reload register, and then the
reload register to the destination (where the latter move can be a no-op
if it is possible to allocate the reload register and the destination
into the same physical register).  If you need two separate intermediate
values, you need to allocate a secondary reload register of a larger
mode (as is already done in the pattern).

> >/* Also use the destination register to hold the unconverted DImode 
> > value.
> >   This is conceptually a separate value from OP0, so we use gen_rtx_REG
> >   rather than simplify_gen_subreg.  */
> > -  rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
> > +  rtx op0_df = gen_rtx_REG (DFmode, REGNO (op0));
> > +  rtx op0_v4sf = gen_rtx_REG (V4SFmode, REGNO (op0));

While this was not introduced by this patch, I'm a little bit concerned
about the hard-coded use of REGNO here, which will crash if op0 at this
point happens to be a SUBREG instead of a REG.  This is extremely unlikely
at this point in reload, but not 100% impossible, e.g. if op0 for some
reason is one of the "magic" registers like the stack pointer.

That's why it is in general better to use simplify_gen_subreg or one of
gen_highpart/gen_lowpart, which will handle SUBREG correctly as well.
I'm not sure why it matters whether this is "conceptually a separate
value" as the comment argues ...

> >/* Move SF value to upper 32-bits for xscvspdpn.  */
> >emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
> > -  emit_move_insn (op0_di, op2);
> > -  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0_di));
> > +  emit_insn (gen_p8_mtvsrd (op0_df, op2));
> > +  emit_insn (gen_vsx_xscvspdpn (op0_df, op0_v4sf));

The sequence of modes used for op0 here is a bit weird.  First,
op0 is loaded as DFmode by mtvsrd, then it is silently re-
interpreted as V4SFmode when used as input to xscvspdpn, which
gets a DFmode output that is again silently re-interpreted as
SFmode.

This isn't really wrong as such, just maybe a bit confusing.
Maybe instead have p8_mtvsrd use DImode as output (instead of
DFmode), which shouldn't be any harder to use in the
reload_vsx_from_gpr splitter, and keep using the
vsx_xscvspdpn_directmove pattern?

[ This of course reinforces the question why we have p8_mtvsrd
in the first place, instead of just allowing this use directly
in movdi_internal64 itself.  ]

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [RS6000] reload_vsx_from_gprsf splitter

2016-02-12 Thread Ulrich Weigand

(Replying to multiple messages at once ...)

Michael Meissner wrote:

> At the present time, we do not allow DImode to go into Altivec
> registers. Mostly it was due to reload complications in the power8 time frame,
> where we didn't have d-form addressing for the Altivec registers and
> interference with the other DImode reload patterns, but also I felt I would
> need to go through the main rs6000.md to look for the various DImode patterns
> to see if they needed to be updated.  At some I hoped to extend this, so I
> added the "wi" and "wj" constraints to be used whenever the mode is DImode.
> "wi" is the constraint for the VSX register class DImode can go in
> (i.e. currently FLOAT=5FREGS), and "wj" is the VSX register class DImode can 
> go
> in if we have 64-bit direct move.

I see.  I hadn't noticed the restriction on DImode, sorry.  I think that in
general, it would be a good idea to allow DImode wherever DFmode is allowed,
since the same instructions should be able to handle both.  But given the
unexpected problems that seem to turn up whenever we want to allow more modes
in more registers, this is probably better for stage 1 than right now ...

> The TFmode thing was a hack.  It was the only way I could see getting two
> registers without a lot of hair.  Since TFmode is also restricted to FPR_REGS,
> you could use %L on it.  When I was writing it, I really wished we had a 
> reload
> pattern to get more than one temporary, but we didn't, so I went for TFmode to
> get 2 FPR registers.  Given the replacement pattern no longer uses a TFmode
> temporary, it can go in any appropriate register.

Again, it would probably make sense to allow TFmode (when it is double double)
in any pair of registers where DFmode is allowed, since the type *is* just a
pair of DFmode values.  Again, really more a stage 1 matter ...

> On Thu, Feb 11, 2016 at 07:38:15PM +0100, Ulrich Weigand wrote:
> > It's not a bug, it's deliberate behavior.  The reload registers allocated
> > for secondary reload clobbers may overlap the destination, since in many
> > cases you simply move the input to the reload register, and then the
> > reload register to the destination (where the latter move can be a no-op
> > if it is possible to allocate the reload register and the destination
> > into the same physical register).  If you need two separate intermediate
> > values, you need to allocate a secondary reload register of a larger
> > mode (as is already done in the pattern).
> 
> This is one of the cases I wished the reload support had the ability to
> allocate 2 scratch temporaries instead of 1.  As I said in my other message,
> TFmode was a hack to get two registers to use.

Yes, it would be nice if we could specify multiple scratch registers.  That's
a long-standing FIXME in the reload code:

  /* ??? It would be useful to be able to handle only two, or more than
 three, operands, but for now we can only handle the case of having
 exactly three: output, input and one temp/scratch.  */

Unfortunately, given the structure of reload, that is difficult to fix.

> On Fri, Feb 12, 2016 at 08:54:19AM +1030, Alan Modra wrote:
> > Another concern I had about this, besides using %L in asm output (what
> > forces TFmode to use just fprs?), is what happens when we're using
> > IEEE 128-bit floats?  In that case it looks like we'd get just one reg.
> 
> Good point that it breaks if the default long double (TFmode) type is IEEE
> 128-bit floating point.  We would need to have two patterns, one that uses
> TFmode and one that uses IFmode.  I wrote the power8 direct move stuff before
> going down the road of IEEE 128-bit floating point.

Right.  It's a bit unfortunate that we can't just use IFmode unconditionally,
but it seems rs6000_scalar_mode_supported_p (IFmode) may return false, and
then we probably shouldn't be using it.

Another option might be to use TDmode to allocate a scratch register pair.

David Edelsohn wrote:

> Good question: is p8_mtvsrd really necessary if the movdi_internal64
> is updated to use the correct constraints?
> 
> The patch definitely is going in the right direction.  Can we remove
> more unnecessary bits?

Given Mike's reply above, I don't think it is simply a matter of updating
constraints in the one pattern.  Rather, enabling DImode in all vector
registers would require a change to rs6000_hard_regno_mode_ok and then
reviewing all DImode patterns to make sure they're still compatible.

As I said above, that's probably more of a stage 1 effort.

As long as that is not done, my original suggestion here:

> Maybe instead have p8_mtvsrd use DImode as output (instead of
> DFmode), which shouldn't be any ha

Re: [RS6000] reload_vsx_from_gprsf splitter

2016-02-15 Thread Ulrich Weigand

Alan Modra wrote:
> On Fri, Feb 12, 2016 at 02:57:22PM +0100, Ulrich Weigand wrote:
> > Right.  It's a bit unfortunate that we can't just use IFmode 
> > unconditionally,
> > but it seems rs6000_scalar_mode_supported_p (IFmode) may return false, and
> > then we probably shouldn't be using it.
> 
> Actually, we can use IFmode unconditionally.  scalar_mode_supported_p
> is relevant only up to and including expand.  Nothing prevents the
> backend from using IFmode.

Hmm, OK.  That does make things simpler.

> > Another option might be to use TDmode to allocate a scratch register pair.
> 
> That won't work, at least if we want to extract the two component regs
> with simplify_gen_subreg, due to rs6000_cannot_change_mode_class.  In
> my original patch I just extracted the regs by using gen_rtx_REG but I
> changed that, based on your criticism of using gen_rtx_REG in
> reload_vsx_from_gprsf, and because rs6000.md avoids gen_rtx_REG using
> operand regnos in other places.  That particular change is of course
> entirely cosmetic.  I also changed reload_vsx_from_gprsf to avoid mode
> punning regs, instead duplicating insn patterns as done elsewhere in
> the vsx support.  I don't believe we will see subregs of vsx or fp
> regs after reload, but I'm quite willing to concede the point for a
> stage4 fix.

I was thinking here that in the special case of the *reload scratch
register*, which reload allocates for us, we will always get a full
register.  This is different from some other operand that may originate
from pre-existing RTX that may require a SUBREG even after reload.

But I certainly agree that your current patch looks like a good choice
for a stage4 bugfix change.  Further cleanup can always happen later.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH, rs6000] PR 66337: Improve Compliance with Power ABI

2016-02-18 Thread Ulrich Weigand

Kevin Nilsen wrote:

> This patch has bootstrapped and tested on powerpc64le-unknown-linux-gnu and=
>  powerpc64be-unknown-linux-gnu (both 32-bit and 64-bit) and=20
> powerpc64-unknown-freebsd11.0 (big endian) with no regressions.  Is it ok t=
> o fix this on the trunk?
> 
> The problem described in PR66337 (https://gcc.gnu.org/bugzilla/show_bug.cgi=
> ?id=3D3D66337) is that compiling for PowerPC targets with the -malign-power=
>  command-line option results in invalid field offsets for certain structure=
>  members.   As identified in the problem report, this results from a macro =
> definition present in both config/rs6000/{freebsd64,linux64}.h, which in bo=
> th cases is introduced by the comment:
> 
> /* PowerPC64 Linux word-aligns FP doubles when -malign-power is given. */
> 
> I have consulted various ABI documents, including "64-bit PowerPC ELF Appli=
> cation Binary Interface Supplement 1.9" (http://refspecs.linuxfoundation.or=
> g/ELF/ppc64/PPC-elf64abi.html), "Power Architecture 64-Bit ELF V2 ABI Speci=
> fication" (https://members.openpowerfoundation.org/document/dl/576), and "P=
> ower
> Architecture(R) 32-bit Application Binary Interface Supplement 1.0 - Linux(=
> R) & Embedded" (https://www.power.org/documentation/power-architecture-32-b=
> it-abi-supplement-1-0-embeddedlinuxunified/).  I have not been able to find=
>  any basis for this comment and thus am concluding that the comment and exi=
> sting implementation are incorrect.
> 
> The implemented patch removes the comment and changes the macro definition =
> so that field alignment calculations on 64-bit architectures ignore the -ma=
> lign-power command-line option.  With this fix, the test case identified in=
>  the PR behaves as was expected by the submitter.

There seems to be some confusion here.  First of all, on Linux and FreeBSD,
the *default* behavior is -malign-natural, which matches what the Linux ABI
specifies.  Using -malign-power on *Linux* is an explicit instruction to
the compiler to *deviate* from the documented ABI.

The only effect that the deviation has on Linux is to change the alignment
requirements for certain structure elements.  Your patch removes this change,
making -malign-power fully a no-op on Linux and FreeBSD.  This doesn't seem
to be particularly useful ...  If you don't want the effect, you can simply
not use that switch.

To my understanding, the intent of providing that switch was to allow
creating code that is compatible code produced by some other compilers
that do not adhere to the Linux ABI, but some other ABI.  In particular,
my understanding is that the *AIX* ABI has these alignment requirements.
And in fact, GCC on *AIX* defaults to -malign-power.

Looking at PR 66337, the submitter actually refers to the behaviour of
GCC on AIX, so I'm not sure how Linux is even relevant here.  (Maybe
there is something wrong in how GCC implements the AIX ABI.  But I'm
not really familar with AIX, so I can't help much with that.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.

2016-02-25 Thread Ulrich Weigand

Andreas Krebbel wrote:

>   * config/s390/predicates.md (const_int_6bitset_operand): New
> predicates.
>   * config/s390/s390.md: Include subst.md.
>   ("rotl3"): New expander.
>   ("rotl3", "*rotl3_and"): Merge insn definitions into
>   ...
>   ("*rotl3"): New insn definition.
>   * config/s390/subst.md: New file.

This already looks much nicer (and shorter) :-)

In fact, I'm now wondering whether it couldn't be even shorter.  Would it
be possible to use instead of:

(define_insn "*rotl3"
  [(set (match_operand:GPR 0 "register_operand" "=d,d")
(rotate:GPR (match_operand:GPR 1 "register_operand"  "d,d")
(match_operand:SI  2 "nonmemory_operand" "a,n")))]
  "TARGET_CPU_ZARCH"
  "@
   rll\t%0,%1,(%2)
   rll\t%0,%1,%Y2"
  [(set_attr "op_type"  "RSE")
   (set_attr "atype""reg")
   (set_attr "enabled"  "*,")
   (set_attr "z10prop"  "z10_super_E1")])

simply something like:

(define_insn "*rotl3"
  [(set (match_operand:GPR 0 "register_operand" "=d")
(rotate:GPR (match_operand:GPR 1 "register_operand"  "d")
(match_operand:SI  2 "nonmemory_operand" "an")))]
  "TARGET_CPU_ZARCH"
  "rll\t%0,%1,
  [(set_attr "op_type"  "RSE")
   (set_attr "atype""reg")
   (set_attr "z10prop"  "z10_super_E1")])


where addr_style_op_ops is defined like:

(define_subst_attr "addr_style_op_ops" "addr_style_op_subst" "%Y2" "%Y3(%2)")

and we don't need addr_style_op_enabled any more?   %Y would continue
to emit both simple constants and register operands (and full address
operands e.g. for setmem) as before.


> +(define_subst "masked_op_subst"
> +  [(set (match_operand:DSI 0 ""   "")
> +(SUBST:DSI (match_operand:DSI 1 "" "")
> +(match_operand:SI  2 "" "")))]
> +  ""
> +  [(set (match_dup 0)
> +(SUBST:DSI (match_dup 1)
> +(and:SI (match_dup 2)
> +(match_operand:SI 3 "const_int_6bitset_operand" 
> ""])

Do we need a constraint letter here?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 7/9] S/390: Get rid of Y constraint in vector.md.

2016-02-29 Thread Ulrich Weigand

Andreas Krebbel wrote:


> +; vec_set is supposed to *modify* an existing vector so operand 0 is
> +; duplicated as input operand.
> +(define_expand "vec_set"
> +  [(set (match_operand:V0 "register_operand" 
>  "")
> + (unspec:V [(match_operand: 1 "general_operand"   
> "")
> +(match_operand:SI2 "shift_count_or_setmem_operand" 
> "")

This is probably only cosmetic, but should we use nonmemory_operand here
instead of shift_count_or_setmem_operand (just like everywhere else now)?

> +(define_expand "vec_extract"
> +  [(set (match_operand: 0 "nonimmediate_operand" "")
> + (unspec: [(match_operand:V  1 "register_operand" "")
> +(match_operand:SI 2 "shift_count_or_setmem_operand" 
> "")]

Likewise.


> +(define_insn "*vec_set_plus"
> +  [(set (match_operand:V  0 "register_operand" "=v")
> + (unspec:V [(match_operand:   1 "general_operand"   "d")
> +(plus:SI (match_operand:SI 2 "register_operand"  "a")
> + (match_operand:SI 4 "const_int_operand" "n"))
> +(match_operand:V   3 "register_operand"  "0")]
> +   UNSPEC_VEC_SET))]
> +  "TARGET_VX"
> +  "vlvg\t%v0,%1,%4(%2)"
> +  [(set_attr "op_type" "VRS")])

Wouldn't it be better to use %Y4 instead of %4 here?  Or does the middle-end
guarantee that this is never out of range?

> +(define_insn "*vec_extract_plus"
> +  [(set (match_operand:  0 
> "nonimmediate_operand" "=d,QR")
> + (unspec: [(match_operand:V   1 "register_operand"  
> "v, v")
> +(plus:SI (match_operand:SI 2 "nonmemory_operand" 
> "a, I")
> + (match_operand:SI 3 "const_int_operand" 
> "n, I"))]
> +UNSPEC_VEC_EXTRACT))]
> +  "TARGET_VX"
> +  "@
> +   vlgv\t%0,%v1,%3(%2)
> +   vste\t%v1,%0,%2"
> +  [(set_attr "op_type" "VRS,VRX")])

Likewise for %3.  Also, the second alternative seems odd, it matches solely a
PLUS of two CONST_INTs, which is not canonical RTL ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 0/9] S/390 rework shift count handling - v3

2016-02-29 Thread Ulrich Weigand

Andreas Krebbel wrote:

>   S/390: Use enabled attribute overrides to disable alternatives.
>   S/390: Get rid of Y constraint in rotate patterns.
>   S/390: Get rid of Y constraint in left and logical right shift
> patterns.
>   S/390: Get rid of Y constraint in arithmetic right shift patterns.
>   S/390: Get rid of Y constraint in tabort.
>   S/390: Get rid of Y constraint in vector.md.
>   S/390: Use define_subst for the setmem patterns.
>   S/390: Disallow SImode in s390_decompose_address

Except for a few minor comments for the vector.md patch (separate mail),
this all looks now very good to me.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 7/9] S/390: Get rid of Y constraint in vector.md.

2016-03-01 Thread Ulrich Weigand

I wrote:
> Andreas Krebbel wrote:
> 
> > +; vec_set is supposed to *modify* an existing vector so operand 0 is
> > +; duplicated as input operand.
> > +(define_expand "vec_set"
> > +  [(set (match_operand:V0 "register_operand"   
> >"")
> > +   (unspec:V [(match_operand: 1 "general_operand"   
> > "")
> > +  (match_operand:SI2 "shift_count_or_setmem_operand" 
> > "")
> 
> This is probably only cosmetic, but should we use nonmemory_operand here
> instead of shift_count_or_setmem_operand (just like everywhere else now)?
> 
> > +(define_expand "vec_extract"
> > +  [(set (match_operand: 0 "nonimmediate_operand" "")
> > +   (unspec: [(match_operand:V  1 "register_operand" "")
> > +  (match_operand:SI 2 "shift_count_or_setmem_operand" 
> > "")]
> 
> Likewise.

I just noticed that there are two more UNSPEC_VEC_SET expanders
in vx-builtins.md.  I guess those should be likewise changed:

(define_expand "vec_insert"
  [(set (match_operand:V_HW0 "register_operand" "")
(unspec:V_HW [(match_operand: 2 "register_operand" "")
  (match_operand:SI3 
"shift_count_or_setmem_operand" "")
  (match_operand:V_HW  1 "register_operand" "")]
 UNSPEC_VEC_SET))]
  "TARGET_VX"
  "")

(define_expand "vec_promote"
  [(set (match_operand:V_HW0 "register_operand" "")
(unspec:V_HW [(match_operand: 1 "register_operand" "")
  (match_operand:SI2 
"shift_count_or_setmem_operand" "")
  (match_dup 0)]
 UNSPEC_VEC_SET))]
  "TARGET_VX"
  "")

Then the only remaining users of shift_count_or_setmem_operand are the
actual setmem patterns (so maybe the predicate can be renamed to
"setmem_operand") :-)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class

2016-03-02 Thread Ulrich Weigand

H.J. Lu wrote:

> +  if (type && type_is_empty_type_p (type))
> +{
> +  if (warn_empty_type_p)
> + warn_empty_type ();
> +  return NULL;
> +}

If I understand the problem correctly, for C code empty types
already were handled correctly, the problem occured just for
C++ code (since the size of an empty size is != 0 there).

However, it seems this warning check would now appear also
for C code, even though there really shouldn't be any ABI
changes there.  Would it not be better to only generate the
warning for C++ (maybe check whether the type size is nonzero)?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Still crashes due to aliasing violation (Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator)

2015-07-17 Thread Ulrich Weigand

On 07/09/2015 11:43 PM, Martin LiÅ¡ka wrote:

> This final version which I agreed with Richard Sandiford.
> Hope this can be finally installed to trunk?
> 
> Patch can bootstrap and survive regression tests on x86_64-linux-gnu.

Unfortunately, this still crashes on my SPU toolchain build machine,
for pretty much the same reason outlined here:
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00868.html

However, the host compiler no longer miscompiles these lines:

  empty_shared_hash = new shared_hash_def;
  empty_shared_hash->refcount = 1;

But that is simply because that "new" now goes to the default
heap-based allocator from the standard library.  There is a
"shared_hash_def_pool", but that's apparently not being used
for anything -- this probably was not intended?

But now the following lines are miscompiled:

  elt_list *el = elt_list_pool.allocate ();
  el->next = next;
  el->elt = elt;

from new_elt_list in cselib.c.  Again, the "allocate" call ends
simply with a cast:

  header = m_returned_free_list;
  m_returned_free_list = header->next;

  return (void *)(header);

and type-based aliasing now states the access to "header->next"
in allocate must not alias the access to "el->next" in new_elt_list,
but clearly it does.

(Since there is no C++ operator new involved at all anymore,
this clearly violates even the C aliasing rules ...)

I really think the allocate routine needs to be more careful to
avoid violating aliasing, e.g. by using memcpy or union-based
type-punning to access its free list info.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: Still crashes due to aliasing violation (Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator)

2015-07-17 Thread Ulrich Weigand

Richard Biener wrote:
> On July 17, 2015 3:11:51 PM GMT+02:00, Ulrich Weigand  
> wrote:
> >(Since there is no C++ operator new involved at all anymore,
> >this clearly violates even the C aliasing rules ...)
> >
> >I really think the allocate routine needs to be more careful to
> >avoid violating aliasing, e.g. by using memcpy or union-based
> >type-punning to access its free list info.
> 
> As far as I understand the object allocator delegates construction to callers 
> and thus in the above case cselib
> Would be responsible for calling placement new on the return value from
> Allocate.

Ah, it looks like I was wrong above: the code uses the *object*
allocator, so it should go through a placement new here:
  inline T *
  allocate () ATTRIBUTE_MALLOC
  {
return ::new (m_allocator.allocate ()) T ();
  }

It's still being miscompiled at least by my GCC 4.1 host compiler ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: Still crashes due to aliasing violation (Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator)

2015-07-17 Thread Ulrich Weigand

Richard Biener wrote:
> On July 17, 2015 3:50:19 PM GMT+02:00, "Martin LiÅ¡ka"  wrote:
> >Question is why aliasing oracle still wrongly aliases these pointers?
> >Another option (suggested by Martin Jambor) would be to place
> >::allocate implementation
> >to alloc-pool.c file.
> 
> Note that all compilers up to 4.4 have aliasing issues with placement new.
> A fix is to move the placement new out-of-line.

Yes, that's what I just noticed as well.  In fact, my particular problem
already disappears with 4.3, presumably due to the fix for PR 29286.

So do we now consider host compilers < 4.3 (4?) unsupported for building
mainline GCC, or should we try to work around the issue (e.g. by moving
the allocator out-of-line or using some other aliasing barrier)?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: Still crashes due to aliasing violation (Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator)

2015-07-17 Thread Ulrich Weigand

On July 17, 2015 6:54:32 PM GMT+02:00, Ulrich Weigand  
wrote:
> >So do we now consider host compilers < 4.3 (4?) unsupported for
> >building
> >mainline GCC, or should we try to work around the issue (e.g. by moving
> >the allocator out-of-line or using some other aliasing barrier)?
> 
> Why is this an issue for stage1 which runs w/o optimization?

Well, this is the SPU compiler on a Cell system, which is technically
a cross compiler from PowerPC (even though the resulting binaries run
natively on the machine).

> For cross compiling we already suggest using known good compilers.

The documentation says:

  To build a cross compiler, we recommend first building and installing
  a native compiler. You can then use the native GCC compiler to build
  the cross compiler. The installed native compiler needs to be GCC
  version 2.95 or later. 

So building with a native GCC 4.1 seems to have been officially
supported until now as far as I can tell (unless you're building Ada).

Now, I could certainly live with a statement that cross compilers can
only be build with a native GCC 4.3 or newer; but that should be IMO
a deliberate decision and be widely announced (maybe even verified
by a configure check?), so that others don't run into the problem;
the nature of its symptoms make the problem difficult to diagnose.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 3/4] S390 -march=native related fixes

2015-07-17 Thread Ulrich Weigand

Dominik Vogt wrote:

> +  opt_esa_zarch = (has_highgprs) ? " -mzarch" : " -mesa";

This will force -mesa on old machines *even in -m64 mode*,
which is wrong and will cause compilation to fail.

> -/* Defaulting rules.  */
>  #ifdef DEFAULT_TARGET_64BIT
> -#define DRIVER_SELF_SPECS\

This completely removes use of DRIVER_SELF_SPECS for defaulting,
which I introduced back here:
https://gcc.gnu.org/ml/gcc-patches/2003-06/msg03369.html

The reason for using DRIVER_SELF_SPECS as described there is to
make sure we use compatible flags for compiler, assembler and
linker in all cases, even if some of those flags result from
defaulting rules.

If we don't do that, we rely on those components agreeing exactly
in how to default for unspecified options; for example, we want
to give the correct -march flag to the assembler as an additional
verification to detect compiler bugs where the compiler erroneously
generates an incorrect instruction for that architecture.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 3/4] S390 -march=native related fixes

2015-07-21 Thread Ulrich Weigand

Dominik Vogt wrote:

>   * config/s390/driver-native.c (s390_host_detect_local_cpu): Handle
>   processor capabilities with -march=native.
>   * config/s390/s390.h (MARCH_MTUNE_NATIVE_SPECS): Likewise.
>   (DRIVER_SELF_SPECS): Likewise.  Join specs for 31 and 64 bit.
>   * (S390_TARGET_BITS_STRING): Macro to simplify specs.

This version is looking good, except for one problem:

> -  "%{!m31:%{!m64:-m64}}",\
> -  "%{!mesa:%{!mzarch:%{m31:-mesa}%{m64:-mzarch}}}",  \
> -  "%{!march=*:%{mesa:-march=g5}%{mzarch:-march=z900}}",  \

There's a reason for this particular sequence.  The first line ensures
that one of -m64 or -m31 is present.  The second line ensures that one
of -mesa or -mzarch is present, but this works only if already one of
-m64 or -m31 is present, so it needs to come *after* the first line.
The third line ensures that some -march= switch is present, but this
works only if already one of -mesa or -mzarch is present, so it needs
to comer *after* the second line.

> +#define DRIVER_SELF_SPECS\
> +  "%{!m31:%{!m64:-m" S390_TARGET_BITS_STRING "}} "   \
> +  "%{!march=*:%{mesa:-march=g5}%{mzarch:-march=z900}} "  \
> +  MARCH_MTUNE_NATIVE_SPECS   \
> +  "%{!mesa:%{!mzarch:%{m31:-mesa}%{m64:-mzarch}}} "

This inverts the order of the second and third lines, so it is now
no longer guaranteed that at least one -march= switch is present.

I understand that you need to move MARCH_MTUNE_NATIVE_SPECS ahead
of the -mesa/-mzarch defaulting rule, but it should be possible
to do that without changing the sequence of the three existing
rules.  Why not just move MARCH_MTUNE_NATIVE_SPECS first?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 3/4] S390 -march=native related fixes

2015-07-21 Thread Ulrich Weigand

Dominik Vogt wrote:

>   * config/s390/driver-native.c (s390_host_detect_local_cpu): Handle
>   processor capabilities with -march=native.
>   * config/s390/s390.h (MARCH_MTUNE_NATIVE_SPECS): Likewise.
>   (DRIVER_SELF_SPECS): Likewise.  Join specs for 31 and 64 bit.
>   * (S390_TARGET_BITS_STRING): Macro to simplify specs.
(That last "*" is superfluous.)

This looks correct to me now, just a cosmetic comment:

> +/* Defaulting rules.  */
> +#define DRIVER_SELF_SPECS\
> +  "%{!m31:%{!m64:-m" S390_TARGET_BITS_STRING "}} ",  \
> +  MARCH_MTUNE_NATIVE_SPECS,  \
> +  "%{!mesa:%{!mzarch:%{m31:-mesa}%{m64:-mzarch}}} ", \
> +  "%{!march=*:%{mesa:-march=g5}%{mzarch:-march=z900}} "

There's no need to add those spaces at the end -- the self specs
are all independent string, they don't need to end in a space.

Also, I had thought to put MARCH_MTUNE_NATIVE_SPECS right at the
top of list, like so:

#define DRIVER_SELF_SPECS   \
  MARCH_MTUNE_NATIVE_SPECS, \
  "%{!m31:%{!m64:-m" S390_TARGET_BITS_STRING "}}",  \
  "%{!mesa:%{!mzarch:%{m31:-mesa}%{m64:-mzarch}}}", \
  "%{!march=*:%{mesa:-march=g5}%{mzarch:-march=z900}}"

But there should not be any functional difference between the two,
it just looks a bit nicer maybe.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: s390-linux fails to build

2015-07-23 Thread Ulrich Weigand

Jakub Jelinek wrote:
> On Thu, Jul 23, 2015 at 01:03:19PM +0100, Nick Clifton wrote:
> > Hi Helmut, Hi Ulrich, Hi Andreas,
> > 
> >   A toolchain configured as --target=s390-linux currently fails to build
> >   gcc because of an undefined function:
> > 
> > undefined reference to `s390_host_detect_local_cpu(int, char const**)'
> > Makefile:1858: recipe for target 'xgcc' failed
> > 
> >   The patch below fixes the problem for me by adding a stub function in
> >   s390-common.c, but I am not sure if it is the correct solution.
> >   Please can you advise ?
> 
> Isn't it better to just follow what other arches do?
> E.g. on i?86/x86_64, the EXTRA_SPEC_FUNCTIONS definition is guarded
> with
> #if defined(__i386__) || defined(__x86_64__)
> and similarly on mips:
> #if defined(__mips__)
> and thus I'd expect s390{,x} should guard it with
> #if defined(__s390__) || defined(__s390x__)
> or so.

This is supposed to be fixed by this pending patch:
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01546.html

(which hasn't been applied yet, but probably should be soon ...)

> The config.host change also looks wrong, e.g. i?86 or mips have:
>   i[34567]86-*-* \
>   | x86_64-*-* )
> case ${target} in
>   i[34567]86-*-* \
>   | x86_64-*-* )
> host_extra_gcc_objs="driver-i386.o"
> host_xmake_file="${host_xmake_file} i386/x-i386"
> ;;
> esac
> ;;
>   mips*-*-linux*)
> case ${target} in
>   mips*-*-linux*)
> host_extra_gcc_objs="driver-native.o"
> host_xmake_file="${host_xmake_file} mips/x-native"
>   ;;
> esac
> ;;
> while s390 has:
>   s390-*-* | s390x-*-*)
> host_extra_gcc_objs="driver-native.o"
> host_xmake_file="${host_xmake_file} s390/x-native"
> ;;
> I bet that is gone break also cross-compilers from s390* to other targets.

I think this should be fine on s390.  The problem with i386 is that
the driver-native.c file uses data types only defined by the i386
target files (e.g. enum processor_type).  But on s390, the file does
not any target-specific types and should be fully portable.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 3/4] S390 -march=native related fixes

2015-07-23 Thread Ulrich Weigand

Dominik Vogt wrote:
> 
> gcc/ChangeLog
>  
>   * config/s390/driver-native.c (s390_host_detect_local_cpu): Handle
>   processor capabilities with -march=native.
>   * config/s390/s390.h (MARCH_MTUNE_NATIVE_SPECS): Likewise.
>   (DRIVER_SELF_SPECS): Likewise.  Join specs for 31 and 64 bit.
>   (S390_TARGET_BITS_STRING): Macro to simplify specs.

This is OK.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: s390-linux fails to build

2015-07-23 Thread Ulrich Weigand

Jakub Jelinek wrote:
> On Thu, Jul 23, 2015 at 02:46:43PM +0200, Ulrich Weigand wrote:
> > > I bet that is gone break also cross-compilers from s390* to other targets.
> > 
> > I think this should be fine on s390.  The problem with i386 is that
> > the driver-native.c file uses data types only defined by the i386
> > target files (e.g. enum processor_type).  But on s390, the file does
> > not any target-specific types and should be fully portable.
> 
> That hunk means that driver-native.o is added to EXTRA_GCC_OBJS
> even say for s390x-*-* -> x86_64-*-* compiler.  While it might compile
> there, nothing will use it, so what is it good for?
> i?86/x86_64 backend will certainly not reference s390_host_detect_local_cpu
> anywhere.

Oh, I agree this will not be *used*.  I just wanted to point out that it
will not *break* cross-compilers as is.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH] S390: Clean up cross-compile for S390.

2015-07-24 Thread Ulrich Weigand

Dominik Vogt wrote:

> gcc/ChangeLog
> 
>   * config.hosto_plugin_soname): Include driver-native.c only when
>   building with s390* as host and target.
(There seems to be a typo in the log entry ...)

This is OK.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

[PATCH] Work around host compiler placement new aliasing bug

2015-07-29 Thread Ulrich Weigand

er than GCC 4.3.
+  CXXFLAGS="$CXXFLAGS -fno-strict-aliasing"
+  AC_MSG_CHECKING([whether $CXX is affected by placement new aliasing bug])
+  AC_COMPILE_IFELSE([
+#if (__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)
+#error compiler not affected by placement new aliasing bug
+#endif
+],
+[AC_MSG_RESULT([yes]); aliasing_flags='-fno-strict-aliasing'],
+[AC_MSG_RESULT([no])])
+
+  CXXFLAGS="$saved_CXXFLAGS"
+fi
+AC_SUBST(aliasing_flags)
 
 
 
Index: gcc/Makefile.in
===
--- gcc/Makefile.in (revision 226312)
+++ gcc/Makefile.in (working copy)
@@ -180,6 +180,8 @@ NOCOMMON_FLAG = @nocommon_flag@
 
 NOEXCEPTION_FLAGS = @noexception_flags@
 
+ALIASING_FLAGS = @aliasing_flags@
+
 # This is set by --disable-maintainer-mode (default) to "#"
 # FIXME: 'MAINT' will always be set to an empty string, no matter if
 # --disable-maintainer-mode is used or not.  This is because the
@@ -986,7 +988,8 @@ ALL_CFLAGS = $(T_CFLAGS) $(CFLAGS-$@) \
 
 # The C++ version.
 ALL_CXXFLAGS = $(T_CFLAGS) $(CFLAGS-$@) $(CXXFLAGS) $(INTERNAL_CFLAGS) \
-  $(COVERAGE_FLAGS) $(NOEXCEPTION_FLAGS) $(WARN_CXXFLAGS) @DEFS@
+  $(COVERAGE_FLAGS) $(ALIASING_FLAGS) $(NOEXCEPTION_FLAGS) \
+  $(WARN_CXXFLAGS) @DEFS@
 
 # Likewise.  Put INCLUDES at the beginning: this way, if some autoconf macro
 # puts -I options in CPPFLAGS, our include files in the srcdir will always
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH] Work around host compiler placement new aliasing bug

2015-08-03 Thread Ulrich Weigand

Richard Biener wrote:
> On Wed, Jul 29, 2015 at 3:57 PM, Ulrich Weigand  wrote:
> > Hello,
> >
> > this patch is a workaround for the problem discussed here:
> > https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01597.html
> >
> > The problem is that the new pool allocator code relies on C++ aliasing
> > rules related to placement new (basically, that placement new changes
> > the dynamic type of the referenced memory).  GCC compilers prior to
> > version 4.3 did not implement this rule correctly (PR 29286).
> >
> > When building current GCC with a host compiler that is affected by this
> > bug, and we build with optimization enabled (this typically only happens
> > when building a cross-compiler), the resulting compiler binary may be
> > miscompiled.
> >
> > The patch below attempts to detect this situation by checking whether
> > the host compiler is a version of GCC prior to 4.3 (but stil accepts
> > the -fno-strict-aliasing flag).  If so, -fno-strict-aliasing is added
> > to the flags when building the compiler binary.
> >
> > Tested on i686-linux, and when building an SPU cross-compiler using
> > a gcc 4.1 powerpc64-linux host compiler.
> >
> > OK for mainline?
> 
> Ok if nobody objects.

I've checked this in now.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: debug mode symbols cleanup

2015-09-18 Thread Ulrich Weigand

Francois Dumont wrote:

> * include/debug/formatter.h
> (_Error_formatter::_Parameter::_M_print_field): Delete.
> (_Error_formatter::_Parameter::_M_print_description): Likewise.
> (_Error_formatter::_M_format_word): Likewise.
> (_Error_formatter::_M_print_word): Likewise.
> (_Error_formatter::_M_print_string): Likewise.
> (_Error_formatter::_M_get_max_length): Likewise.
> (_Error_formatter::_M_max_length): Likewise.
> (_Error_formatter::_M_indent): Likewise.
> (_Error_formatter::_M_column): Likewise.
> (_Error_formatter::_M_first_line): Likewise.
> (_Error_formatter::_M_wordwrap): Likewise.
> * src/c++11/debug.cc: Adapt.

This seems to break building an spu-elf cross-compiler:

/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:
 In function 'void {anonymous}::print_word({anonymous}::PrintContext&, const 
char*, std::ptrdiff_t)':
/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:573:10:
 error: 'stderr' was not declared in this scope
  fprintf(stderr, "\n");
  ^
/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:573:22:
 error: 'fprintf' was not declared in this scope
  fprintf(stderr, "\n");
  ^

Bye,
Ulrich

Re: debug mode symbols cleanup

2015-09-18 Thread Ulrich Weigand

Jonathan Wakely wrote:
> On 18/09/15 13:00 +0200, Ulrich Weigand wrote:
> >/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:
> > In function 'void {anonymous}::print_word({anonymous}::PrintContext&, const 
> >char*, std::ptrdiff_t)':
> >/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:573:10:
> > error: 'stderr' was not declared in this scope
> >  fprintf(stderr, "\n");
> >  ^
> >/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:573:22:
> > error: 'fprintf' was not declared in this scope
> >  fprintf(stderr, "\n");
> >  ^
> 
> Catherine fixed this with r227888.

Ah, OK.  Thanks!

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

[commit, spu] Re: [BUILDROBOT] spu: left shift of negative value

2015-09-21 Thread Ulrich Weigand

Jan-Benedict Glaw wrote:

> I just noticed that (for config_list.mk builds), current GCC errors
> out at spu.c, see eg. build
> http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=3D469639 :
> 
> g++ -fno-PIE -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE   -fno-excep=
> tions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrit=
> e-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -peda=
> ntic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -f=
> no-common  -DHAVE_CONFIG_H -I. -I. -I../../../gcc/gcc -I../../../gcc/gcc/. =
> -I../../../gcc/gcc/../include -I../../../gcc/gcc/../libcpp/include -I/opt/c=
> farm/mpc/include  -I../../../gcc/gcc/../libdecnumber -I../../../gcc/gcc/../=
> libdecnumber/dpd -I../libdecnumber -I../../../gcc/gcc/../libbacktrace   -o =
> spu.o -MT spu.o -MMD -MP -MF ./.deps/spu.TPo ../../../gcc/gcc/config/spu/sp=
> u.c
> ../../../gcc/gcc/config/spu/spu.c: In function =E2=80=98void spu_expand_ins=
> v(rtx_def**)=E2=80=99:
> ../../../gcc/gcc/config/spu/spu.c:530:46: error: left shift of negative val=
> ue [-Werror=3Dshift-negative-value]
>maskbits =3D (-1ll << (32 - width - start));
>   ^
> ../../../gcc/gcc/config/spu/spu.c:536:46: error: left shift of negative val=
> ue [-Werror=3Dshift-negative-value]
>maskbits =3D (-1ll << (64 - width - start));
>   ^
> cc1plus: all warnings being treated as errors
> Makefile:2092: recipe for target 'spu.o' failed
> make[2]: *** [spu.o] Error 1
> make[2]: Leaving directory '/home/jbglaw/build-configlist_mk/spu-elf/build-=
> gcc/mk/spu-elf/gcc'

I've now checked in the following fix.

Thanks,
Ulrich


ChangeLog:

* config/spu/spu.c (spu_expand_insv): Avoid undefined behavior.

Index: gcc/config/spu/spu.c
===
*** gcc/config/spu/spu.c(revision 227968)
--- gcc/config/spu/spu.c(working copy)
*** spu_expand_insv (rtx ops[])
*** 472,478 
  {
HOST_WIDE_INT width = INTVAL (ops[1]);
HOST_WIDE_INT start = INTVAL (ops[2]);
!   HOST_WIDE_INT maskbits;
machine_mode dst_mode;
rtx dst = ops[0], src = ops[3];
int dst_size;
--- 472,478 
  {
HOST_WIDE_INT width = INTVAL (ops[1]);
HOST_WIDE_INT start = INTVAL (ops[2]);
!   unsigned HOST_WIDE_INT maskbits;
machine_mode dst_mode;
rtx dst = ops[0], src = ops[3];
int dst_size;
*** spu_expand_insv (rtx ops[])
*** 527,541 
switch (dst_size)
  {
  case 32:
!   maskbits = (-1ll << (32 - width - start));
if (start)
!   maskbits += (1ll << (32 - start));
emit_move_insn (mask, GEN_INT (maskbits));
break;
  case 64:
!   maskbits = (-1ll << (64 - width - start));
if (start)
!   maskbits += (1ll << (64 - start));
emit_move_insn (mask, GEN_INT (maskbits));
break;
  case 128:
--- 527,541 
switch (dst_size)
  {
  case 32:
!   maskbits = (~(unsigned HOST_WIDE_INT)0 << (32 - width - start));
if (start)
!   maskbits += ((unsigned HOST_WIDE_INT)1 << (32 - start));
emit_move_insn (mask, GEN_INT (maskbits));
break;
  case 64:
!   maskbits = (~(unsigned HOST_WIDE_INT)0 << (64 - width - start));
if (start)
!   maskbits += ((unsigned HOST_WIDE_INT)1 << (64 - start));
emit_move_insn (mask, GEN_INT (maskbits));
break;
  case 128:


-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

[commit][spu] Support atomic builtins

2015-09-30 Thread Ulrich Weigand

"atomic_exchange"
+   [(match_operand:AINT 0 "spu_reg_operand" "");; output
+(match_operand:AINT 1 "memory_operand" "") ;; memory
+(match_operand:AINT 2 "spu_nonmem_operand" "") ;; input
+(match_operand:SI 3 "const_int_operand" "")]   ;; model
+   ""
+ {
+   rtx retval;
+ 
+   if (MEM_ADDR_SPACE (operands[1]))
+ FAIL;
+ 
+   retval = gen_reg_rtx (mode);
+ 
+   emit_move_insn (retval, operands[1]);
+   emit_move_insn (operands[1], operands[2]);
+   emit_move_insn (operands[0], retval);
+   DONE;
+ })
+ 
+ (define_expand "atomic_"
+   [(ATOMIC:AINT
+  (match_operand:AINT 0 "memory_operand" "")   ;; memory
+  (match_operand:AINT 1 "" ""))   ;; operand
+(match_operand:SI 2 "const_int_operand" "")]   ;; model
+   ""
+ {
+   if (MEM_ADDR_SPACE (operands[0]))
+ FAIL;
+ 
+   spu_expand_atomic_op (, operands[0], operands[1],
+   NULL_RTX, NULL_RTX);
+   DONE;
+ })
+ 
+ (define_expand "atomic_fetch_"
+   [(match_operand:AINT 0 "spu_reg_operand" "");; output
+(ATOMIC:AINT
+  (match_operand:AINT 1 "memory_operand" "")       ;; memory
+  (match_operand:AINT 2 "" ""))   ;; operand
+(match_operand:SI 3 "const_int_operand" "")]   ;; model
+   ""
+ { 
+   if (MEM_ADDR_SPACE (operands[1]))
+ FAIL;
+ 
+   spu_expand_atomic_op (, operands[1], operands[2],
+   operands[0], NULL_RTX);
+   DONE;
+ })
+ 
+ (define_expand "atomic__fetch"
+   [(match_operand:AINT 0 "spu_reg_operand" "");; output
+(ATOMIC:AINT
+  (match_operand:AINT 1 "memory_operand" "")   ;; memory
+  (match_operand:AINT 2 "" ""))   ;; operand
+(match_operand:SI 3 "const_int_operand" "")]   ;; model
+   ""
+ {
+   if (MEM_ADDR_SPACE (operands[1]))
+ FAIL;
+ 
+   spu_expand_atomic_op (, operands[1], operands[2],
+   NULL_RTX, operands[0]);
+   DONE;
+ })
+ 
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc

2015-10-06 Thread Ulrich Weigand

Hans-Peter Nilsson wrote:

> The directory at $target_header_dir is already inspected in
> gcc/configure, for e.g. glibc version and stack protector
> support, but not for setting inhibit_libc.  This is just
> inconsistent and the obvious resolution to me is to inhibit
> inhibit_libc when a target *does* "have its own set of headers",
> to quote the comment above the inhibit_libc setting.  There is
> nothing in the build log for "make all-gcc" that shows a
> difference with/without --with-headers, if headers are actually
> present anyway!

> gcc:
>   * configure.ac (target_header_dir): Move block defining
>   this to before the block setting inhibit_libc.
>   (inhibit_libc): When considering $with_headers, just
>   check it it's explicitly "no".  If not, also check if
>   $target_header_dir/stdio.h is present.  If not, set
>   inhibit_libc=true.
>   * configure: Regenerate.

This had the effect of breaking all gcov-related tests in my SPU
builder, since libgcov was stubbed out due to inhibit_libc now
being set.

I'm using the build procedure: build initial GCC (--without-headers),
use it to build newlib, install newlib into prefix, build final GCC
(--with-headers).  Using this procedure, inihibit_libc used to *not*
be set in the final GCC build, but now it is.

I'm seeing two issues here:

> +if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x; then
> +  if test "x$with_headers" != x; then
> +target_header_dir=$with_headers

In the common case of just using --with-headers, this now sets
target_header_dir to "yes", which is not particularly useful.

> +  elif test "x$with_sysroot" = x; then
> +
> target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include"

If I change my build procedure to omit --with-headers, it still won't find the
headers since newlib install placed them into .../include, not .../sys-include.

The whole sys-include thing has always confused me anyway :-)  I thought that
the point of sys-include is that if you specify --with-headers=, then
the contents of  will be copied into your prefix under .../sys-include;
the compiler will use both .../sys-include *and* .../include, using the
former to find headers specified using --with-headers, and the latter to
find headers already installed in the prefix.

While the compiler will search both locations, target_header_dir can only be
set to one of them.  However, in this case, it seems it would be more useful
to set it to .../include: since .../sys-include should only ever contain
anything when using --with-headers=, and in *that* case, target_header_dir
is being set directly to  anyway.

Now I understand that you didn't introduce those lines, and they were already
wrong before your patch; but after the patch the problem now actually matters.
Before the patch, I could always use --with-header to say: just assume headers
are present in the prefix, and everything worked.

This is not a critical problem since I have a work-around: remove --with-headers
and also manually create a symlink from sys-include to include in the prefix.
But it would still be nice to avoid having to do the symlink ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc

2015-10-06 Thread Ulrich Weigand

Hans-Peter Nilsson wrote:
> > From: Ulrich Weigand 
> > Date: Tue, 6 Oct 2015 16:04:35 +0200
> 
> > I'm using the build procedure: build initial GCC (--without-headers),
> > use it to build newlib, install newlib into prefix, build final GCC
> > (--with-headers).  Using this procedure, inihibit_libc used to *not*
> > be set in the final GCC build, but now it is.
> 
> And not using --with-newlib I think.  Somewhat of a borderline
> case, FWIW.

Well, --with-newlib doesn't really matter here, since the only use in GCC
itself is for this check:

if { { test x$host != x$target && test "x$with_sysroot" = x ; } ||
   test x$with_newlib = xyes ; } &&
 { test "x$with_headers" = xno || test ! -f "$target_header_dir/stdio.h"; } 
; then
   inhibit_libc=true
fi

and since for me host != target and I don't use a sysroot, the first
condition in the || is already true.

(I don't like using --with-newlib because it causes the configure scripts
in the various target libraries to stop doing cross-compile checks and
default to hard-coded assumptions on which functions are and are not
present.  Those hard-coded checks tend to be outdated and/or wrong for
SPU; while the ususal cross-compile checks work just fine if newlib has
been installed into the prefix before.)

> > > +if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x; then
> > > +  if test "x$with_headers" != x; then
> > > +target_header_dir=$with_headers
> > 
> > In the common case of just using --with-headers, this now sets
> > target_header_dir to "yes", which is not particularly useful.
> 
> --with-headers without a path to an argument?
> Odd but that *should* work.  I see old lines here and there
> including *toplevel* configure.ac that refers to that.

Yes, pretty much the only use for --with-headers without argument was to
short-circuit the inhibit_libc test.

> > Now I understand that you didn't introduce those lines, and they were 
> > already
> > wrong before your patch; but after the patch the problem now actually 
> > matters.
> > Before the patch, I could always use --with-header to say: just assume 
> > headers
> > are present in the prefix, and everything worked.
> 
> At a quick glance and my initial guess there's a missing two or
> four lines checking for with_headers=yes.
> 
> > This is not a critical problem since I have a work-around: remove 
> > --with-headers
> > and also manually create a symlink from sys-include to include in the 
> > prefix.
> > But it would still be nice to avoid having to do the symlink ...
> 
> I'd recommend writing a patch handling that "yes".
> I know I'm the one "exposing a latent bug" but you're in a much
> better position to test it.

So the question is what this should do then?  Should I simply add back the
behavior that when using --with-headers, we never get inhibit_libc set?

Or should we simply ignore --with-headers and check for the presence of
headers installed in the prefix?  This would work too, once we solve the
sys-include vs. include problem.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc

2015-10-06 Thread Ulrich Weigand

Hans-Peter Nilsson wrote:

> Sanity-check: you do have a $target_header_dir/stdio.h right?

Well, no.  That was the point of my original mail :-)

With my initial configure line, using --with-headers, $target_header_dir
is "yes" at this point, and I don't have "yes/stdio.h".

Omitting --with-headers, $target_header_dir is something along the
lines of "/spu/sys-include", which *does not exist* on my
system.  However, all newlib headers including stdio.h are actually
present, just in "/spu/include".

> Maybe make with_headers=yes (i.e. not a path) have the effect of
> setting target_header_dir to include instead of sys-include?

That would solve my problem.  However, if with_headers is not set at all,
shouldn't we then *also* set it to include?

I don't really particularly want to use --with-headers, it's just that
this is what worked in the past.  Ideally, it would be best if everything
just worked as expected without any option if there are indeed already
headers installed in the prefix.

> Anyway, with_headers=yes/no should simply short-circuit an
> inspection of $target_header_dir regarding setting inhibit_libc.
> I guess.  But then again, target_header_dir really needs to be
> set usefully for inspection, or else every use of it needs to be
> dominated by a with_headers=yes test or something.

Well, every other use of $target_header_dir does not in fact matter
on the SPU (because they check for features that aren't present in
newlib and/or on the SPU anyway).  That's why the weird behaviour
did not really matter in the past ...

So short-circuting just for inhibit_libc would get everything back
to where we were before your patch.  But I agree this is not really
the right solution.

My preferred solution would be to set $target_header_dir to include
instead of sys-include, always.  I'm just a bit worried if this may
break other users that have a workflow that somehow works with the
current setup using sys-include.  I guess my underlying problem is
that I don't quite understand how the whole sys-include thing was
intended to work in the first place ... 

[ Note there's also CROSS_SYSTEM_HEADER_DIR, which is also sometimes
set to sys-include ... ]

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc

2015-10-07 Thread Ulrich Weigand

Hans-Peter Nilsson wrote:
> 
> > From: Ulrich Weigand 
> > Date: Tue, 6 Oct 2015 18:55:53 +0200
> 
> > > Maybe make with_headers=yes (i.e. not a path) have the effect of
> > > setting target_header_dir to include instead of sys-include?
> 
> (...and inspect both, use the first one that works?)

So what does "works" mean, here?  Do you expect that all headers are
present either in include or all in sys-include?  Or is there a
scenario where some headers may be in either directory, and you'd
have to check separately for any header you want to access?

> > My preferred solution would be to set $target_header_dir to include
> > instead of sys-include, always.  I'm just a bit worried if this may
> > break other users that have a workflow that somehow works with the
> > current setup using sys-include.
> 
> (T would: proof of existence here.)  I'm more than worried!
> Please don't suggest to change a toolchain configuration item
> only because you used something that half-way worked, then
> broke, like this.  Not the least when also saying the below:
> 
> >  I guess my underlying problem is
> > that I don't quite understand how the whole sys-include thing was
> > intended to work in the first place ... 
> 
> I don't know when "include" will work and not, in particular
> compared to "sys-include".

So I wondering: what is your current setup?  What headers do you have
in sys-include, and how did they get there?  I'm aware of the copying
done when using --with-headers=, but this case should still work,
since $target_header_dir is set directly to  in this case anyway.
Is there some *other* case, where you do not use --with-headers=,
but still have a pre-existing sys-include directory in the prefix?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc

2015-10-08 Thread Ulrich Weigand

ude instead of
> sys-include would be the least evil, least unexpected change,
> that would work for most, including you.
> 
> I wouldn't understand to instead change things around and make
> "include" be inspected by default.  It's only the --with-headers
> option that's broken.

So this would be:

 --with-headers=   Copy headers from  to sys-include
 --with-headers Use headers from prefix include directory
   Use existing sys-include directory
 --without-headers      Do not use any headers

I agree that this is the smallest change to current behavior;
on the other hand, it seems quite odd overall (i.e. hardest to
explain to someone unfamiliar with current behavior).  At the
very least, the docs would have to be adapted.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc

2015-10-12 Thread Ulrich Weigand

Hans-Peter Nilsson wrote:
> > > So, ISTM we should change --with-headers (=yes) to either look
> > > in sys-include or in include.  Setting it to sys-include
> > > wouldn't help you or anyone else as it's already the default...
> > 
> > On the other hand, the current docs appear to imply that the
> > intent was for --with-headers (=yes) to look into a pre-existing
> > sys-include directory for headers.
> 
> Right.  So, if you'd prefer to align the implementation with
> that, I don't mind.  But, these are odd cases as-is, so current
> use and users matter when aligning the documentation and
> implementation and I wouldn't be surprised if the entire
> usage-space is between ours...

So overall, I now this is probably the best way forward:

1) Fix --with-headers to work just like no argument does now,
   i.e. look in sys-include.

This is a simple bugfix that brings behavior in line with
documentation.  It does imply that everybody has to use
sys-include, but that seems to be accepted practice anyway.
(For me, it just means adding a symlink.)

If at some point we do want to make things work without sys-include,
I see two options:

2a) Change  to not look into sys-include, but include.

This would be a change in existing behavior that would affect some
users.  (They could get the old behavior back by simply adding
--with-headers to their configure line, however.)

--or--

2b) Change target_header_dir from a single directory to a list of
directories, and check all of these for header files.  This list
would typically include both sys-include and include.

This should not change behavior for any existing user, and would
bring the header search at configure time in line with the actual
search order used by the compiler at run time, which will probably
be the least surprise to users anyway ...

For 1), something like the following should probably suffice:

Index: gcc/configure.ac
===
--- gcc/configure.ac(revision 228530)
+++ gcc/configure.ac(working copy)
@@ -1993,7 +1993,7 @@ elif test "x$TARGET_SYSTEM_ROOT" != x; t
 fi

 if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x; then
-  if test "x$with_headers" != x; then
+  if test "x$with_headers" != x && test "x$with_headers" != xyes; then
 target_header_dir=$with_headers
   elif test "x$with_sysroot" = x; then
 target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include"

I'll probably not spend any more time right now to try to implement
either of the 2) variants; I can live with using sys-include for now.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH v2 13/13] Add hook for modifying debug info for address spaces

2015-10-21 Thread Ulrich Weigand

Richard Henderson wrote:

> +@deftypefn {Target Hook} int TARGET_ADDR_SPACE_DEBUG (addr_space_t @var{as})
> +Define this to define how the address space is encoded in dwarf.
> +The result, @var{r}, should be positive to indicate
> +@code{DW_AT_address_class @var{r}} should be emitted; or negative
> +to indicate @code{DW_AT_segment} with location in register @code{~@var{r}}.
> +@end deftypefn

>From my reading of the DWARF specs, DW_AT_segment and DW_AT_address_class
are intended to be used for different purposes:

   Any debugging information entry that contains a description of the location
   of an object or subroutine may have a DW_AT_segment attribute, whose value
   is a location description. The description evaluates to the segment selector
   of the item being described.

vs.

   Any debugging information entry representing a pointer or reference type or
   a subroutine or subroutine type may have a DW_AT_address_class attribute,
   whose value is an integer constant. The set of permissible values is specific
   to each target architecture.

Back when we implemented address space support for Cell/B.E. I interpreted
this to mean that DW_AT_segment was intended to be used for DIEs *defining*
an object, while DW_AT_address_class was intended to be used for pointer
types.  (Since the former wasn't possible on Cell/B.E., we never actually
implemented DW_AT_segment back then.)

B.t.w. Appendix A of the DWARF4 standard also does *not* list DW_AT_segment
as a valid attribute for a DW_TAG_pointer_type or DW_TAG_reference_type DIE.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH v2 13/13] Add hook for modifying debug info for address spaces

2015-10-22 Thread Ulrich Weigand

Richard Henderson wrote:

> So you would recommend continuing to use address_class for these x86 segments?

Yes, I think so.  As far as I can see, this would simply require to define two
address_class values to correspond to __seg_fs and __seg_gs.  (These choices
should best be documented in the platform ABI.)

GDB infrastructure to decode and use address_class should already be in place,
you simply need to implement the relevant gdbarch callbacks:

  address_class_type_flags
Convert from DWARF address class to GDB internal encoding

  address_class_type_flags_to_name
  address_class_name_to_type_flags
Convert between GDB internal encoding and printable name (e.g. "__seg_fs")

  address_to_pointer
  pointer_to_address
Convert between a GDB flat address (CORE_ADDR) and a target-specific
pointer encoding, based on the pointer's address class flags

Of course, to do the latter, you still need access to the segment base values,
as discussed on the GDB mailing list.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

[ping] Re: Fix PR debug/66728

2015-10-28 Thread Ulrich Weigand

air (rtl, mode));
> +   return true;
> + }
> +  return false;
> =20
>  case CONST_DOUBLE:
>/* Note that a CONST_DOUBLE rtx could represent either an integer or=
>  a
> @@ -15671,7 +15678,7 @@ add_const_value_attribute (dw_die_ref die, rtx rtl)
> =20
>  case CONST:
>if (CONSTANT_P (XEXP (rtl, 0)))
> - return add_const_value_attribute (die, XEXP (rtl, 0));
> + return add_const_value_attribute (die, mode, XEXP (rtl, 0));
>/* FALLTHROUGH */
>  case SYMBOL_REF:
>if (!const_ok_for_output (rtl))
> @@ -16171,7 +16178,7 @@ add_location_or_const_value_attribute (dw_die_ref d=
> ie, tree decl, bool cache_p,
> =20
>rtl =3D rtl_for_decl_location (decl);
>if (rtl && (CONSTANT_P (rtl) || GET_CODE (rtl) =3D=3D CONST_STRING)
> -  && add_const_value_attribute (die, rtl))
> +  && add_const_value_attribute (die, DECL_MODE (decl), rtl))
>  return true;
> =20
>/* See if we have single element location list that is equivalent to
> @@ -16192,7 +16199,7 @@ add_location_or_const_value_attribute (dw_die_ref d=
> ie, tree decl, bool cache_p,
>if (GET_CODE (rtl) =3D=3D EXPR_LIST)
>   rtl =3D XEXP (rtl, 0);
>if ((CONSTANT_P (rtl) || GET_CODE (rtl) =3D=3D CONST_STRING)
> -   && add_const_value_attribute (die, rtl))
> +   && add_const_value_attribute (die, DECL_MODE (decl), rtl))
>return true;
>  }
>/* If this decl is from BLOCK_NONLOCALIZED_VARS, we might need its
> @@ -16395,7 +16402,7 @@ tree_add_const_value_attribute (dw_die_ref die, tre=
> e t)
> =20
>rtl =3D rtl_for_decl_init (init, type);
>if (rtl)
> -return add_const_value_attribute (die, rtl);
> +return add_const_value_attribute (die, TYPE_MODE (type), rtl);
>/* If the host and target are sane, try harder.  */
>else if (CHAR_BIT =3D=3D 8 && BITS_PER_UNIT =3D=3D 8
>  && initializer_constant_valid_p (init, type))
> diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr66728.c b/gcc/testsuite/gc=
> c.dg/debug/dwarf2/pr66728.c
> new file mode 100644
> index 000..ba41e97
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr66728.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target { x86_64-*-* && lp64 } } } */
> +/* { dg-options "-O -gdwarf -dA" } */
> +
> +__uint128_t
> +test (void)
> +{
> +  static const __uint128_t foo =3D __uint128_t) 0x22334455) << 96)
> +   | 0x99aabb);
> +
> +  return foo;
> +}
> +
> +/* { dg-final { scan-assembler {\.quad\t0x99aabb\t# DW_AT_const_value} } }=
>  */
> +/* { dg-final { scan-assembler {\.quad\t0x22334455\t} } } */
> 


-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [ping] Re: Fix PR debug/66728

2015-10-28 Thread Ulrich Weigand

Bernd Schmidt wrote:

> This is all a bit html mangled, but this and the other change in 
> loc_descriptor seem to have the effect of doing nothing when called with 
> BLKmode. What is the desired effect of this on the debug information, 
> and how is it achieved?

Sorry for the mangling caused by my forwarding of the patch.
Richard's original patch is here:
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01312.html

(I'll leave it to Richard to answer your other questions ...)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH] S/390: Fix warning in "*movstr" pattern.

2015-11-03 Thread Ulrich Weigand

Dominik Vogt wrote:

> @@ -2936,7 +2936,7 @@
> (set (mem:BLK (match_operand:P 1 "register_operand" "0"))
>   (mem:BLK (match_operand:P 3 "register_operand" "2")))
> (set (match_operand:P 0 "register_operand" "=d")
> - (unspec [(mem:BLK (match_dup 1))
> + (unspec:P [(mem:BLK (match_dup 1))
>(mem:BLK (match_dup 3))
>(reg:SI 0)] UNSPEC_MVST))
> (clobber (reg:CC CC_REGNUM))]

Don't you have to change the expander too?  Otherwise the
pattern will no longer match ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: S/390: Fix warnings in "*setmem_long..." patterns.

2015-11-30 Thread Ulrich Weigand

Andreas Krebbel wrote:
> On 11/30/2015 04:11 PM, Dominik Vogt wrote:
> > The attached patch fixes some warnings generated by the setmem...
> > patterns in s390.md during build and add test cases for the
> > patterns.  The patch is to be added on to p of the movstr patch:
> > https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03485.html
> > 
> > The test cases validate that the patterns are actually used, but
> > at the moment the setmem_long_and pattern is never actually used
> > and thus the test case would fail.  So I've split the patch in two
> > (both attached to this message) to activate this part of the test
> > once we've fixed that.
> > 
> > The patch has passed the SPEC2006 testsuite without any measurable
> > changes in performance.
> 
> Shouldn't we instead describe the whole setmem operation as unspec including 
> the other operands as
> well? The semantics of the introduced UNSPEC_P_TO_BLK operation is not clear 
> to me.  It suggests to
> be some kind of "cast" which it isn't. In fact it is not able to do its job 
> without the length which
> is specified as use outside the unspec.

Well, I guess I suggested to Dominik to leave the basic
[parallel
  (set (dst:BLK) (src:BLK))
  (use (length)]
structure in place; my understanding is that the middle-end recognizes
this as a block move.  As "source" in this case we'd use a BLKmode
operand that consist iof the same byte replicated a number of times.

If we were to use just a single UNSPEC, how would we indicate to the
middle-end that a block of memory is modified, without using too coarse-
grained clobbers?

However, I agree that UNSPEC_P_TO_BLK really should also get the length
as input, to make it have precisely defined semantics.  Also, I'd rather
use a more descriptive name, like UNSPEC_REPLICATE_BYTE or the like.

What would you think about something like the following?

(define_insn "*setmem_long"
  [(clobber (match_operand: 0 "register_operand" "=d"))
   (set (mem:BLK (subreg:P (match_operand: 3 "register_operand" "0") 0))
(unspec:BLK [(match_operand:P 2 "shift_count_or_setmem_operand" "Y")
 (subreg:P (match_dup 3) 1)] UNSPEC_REPLICATE_BYTE))
   (use (match_operand: 1 "register_operand" "d"))
   (clobber (reg:CC CC_REGNUM))]

[ Not sure if we'd need an extra (use (match_dup 3)) any more. ]

B.t.w. this is certainly wrong and cannot be generated by common code:
(and:BLK (unspec:BLK
  [(match_operand:P 2 "shift_count_or_setmem_operand" "Y")]
  UNSPEC_P_TO_BLK)
 (match_operand 4 "const_int_operand" "n"))
(This explains why the pattern would never match.)

The AND should be on the filler byte instead:
(unspec:BLK [(and:P (match_operand:P 2 "shift_count_or_setmem_operand" 
"Y")
(match_operand:P 4 "const_int_operand" 
"n"))
 (subreg:P (match_dup 3) 1)] UNSPEC_REPLICATE_BYTE))

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: S/390: Fix warnings in "*setmem_long..." patterns.

2015-12-02 Thread Ulrich Weigand

Andreas Krebbel wrote:
> On 12/02/2015 11:12 AM, Dominik Vogt wrote:
> > Hopefully, this is correct now; it does pass the functional test case
> > that's part of the patch.  Unfortunately the define_insn patters
> > had to be duplicated because of the new subreg offsets.  
> 
> The number of patterns could possibly be reduced using the define_subst 
> machinery.  I'm looking into
> this for some other changes. No need to do this right now. We can do this 
> later on-top.

For this particular issue, shouldn't a simple mode_attr be OK?
I see that the sh port uses this:

(define_mode_attr lowpart_be [(QI "3") (HI "2")])

  [(set (reg:SI T_REG)
(eq:SI
  (subreg:QIHI
(and:SI (match_operand:SI 0 "arith_reg_operand")
(match_operand:SI 1 "arith_reg_operand")) )
  (const_int 0)))]

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [RFA][1/3] Remove Cell Broadband Engine SPU targets

2019-10-08 Thread Ulrich Weigand

Thomas Schwinge wrote:

> In r276692, I committed the attached to "Add back Trevor Smigiel; move
> into Write After Approval section", assuming that it was just an
> oversight that he got dropped from the file, as he -- in contrast to
> David and Ulrich -- doesn't remain listed with other roles.

Ah, you're right -- thanks for catching this!

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

[spu, commit] Define TARGET_HAVE_SPECULATION_SAFE_VALUE

2018-08-06 Thread Ulrich Weigand

Hello,

the SPU processor is not affected by speculation, so this macro can
safely be defined as speculation_safe_value_not_needed.

Committed to mainline.

Bye,
Ulrich


gcc/ChangeLog:

PR target/86807
* config/spu/spu.c (TARGET_HAVE_SPECULATION_SAFE_VALUE):
Define to speculation_safe_value_not_needed.

Index: config/spu/spu.c
===
--- config/spu/spu.c(revision 263334)
+++ config/spu/spu.c(working copy)
@@ -7463,6 +7463,9 @@ static const struct attribute_spec spu_a
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT spu_constant_alignment
 
+#undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
+#define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-spu.h"
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 2/2] S/390: Remove TARGET_CPU_ZARCH

2018-08-08 Thread Ulrich Weigand

Ilya Leoshkevich wrote:

> TARGET_CPU_ZARCH allowed to distinguish between g5/g6 and newer
> machines.  Since the former are now gone, we can assume that
> TARGET_CPU_ZARCH is always true.  As a side-effect, branch splitting
> is now completely gone.  Some parts of literal pool splitting are
> also gone, but it's still there: we need to support it because
> floating point and vector instructions still cannot use relative
> addressing.

Great to see this finally go!

B.t.w. the literal pool handling can be simplified further now.
The current code was made complicated due to the potential of
interaction between branch splitting and literal pool splitting,
see the long comment:

 However, the two problems are interdependent: splitting the
 literal pool can move a branch further away from its target,
 causing the 64K limit to overflow, and on the other hand,
 replacing a PC-relative branch by an absolute branch means
 we need to put the branch target address into the literal
 pool, possibly causing it to overflow.

 So, we loop trying to fix up both problems until we manage
 to satisfy both conditions at the same time.  [...]

Since branch splitting is gone, this whole logic is superfluous;
the loop will never execute more than once.  (It may not be
necessary any longer to split the logic into the various
chunkify_start/finish routines either ...)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH] libstdc++-v3: Have aligned_alloc() on Newlib

2018-08-08 Thread Ulrich Weigand

Jonathan Wakely wrote:
> On 08/08/18 10:52 +0200, Sebastian Huber wrote:
> >While building for Newlib, some configure checks must be hard coded.
> >The aligned_alloc() is supported since 2015 in Newlib.
> >
> >libstdc++-v3
> >
> > PR target/85904
> > * configure.ac): Define HAVE_ALIGNED_ALLOC if building for
> 
> There's a stray closing parenthesis here.
> 
> > Newlib.
> > * configure: Regnerate.
> 
> Typo "Regnerate".
> 
> But the patch itself is fine - OK for trunk.
> 
> I'm ambivalent about this being backported to gcc-7 and gcc-8 branches
> (gcc-6 is unaffected as it doesn't use aligned_alloc).
> 
> It's strictly speaking an ABI change, because HAVE_ALIGNED_ALLOC
> affects the memory layout for allocations from operator new(size_t,
> align_val_t) (in new_opa.cc) which needs to agree with the
> corresponding operator delete (in del_opa.cc). Using static linking it
> might be possible to create a binary that has operator new using
> aligned_alloc, but operator delete expecting to do ((void**)ptr)[-1],
> which would be bad.
> 
> Those operators are C++17, so "experimental", but maybe we shouldn't
> make the change on release branches.

The way it is now I'm getting build failures on new SPU target
(which is newlib based):

/home/uweigand/dailybuild/spu-tc-2018-08-07/gcc-head/src/libstdc++-v3/libsupc++/new_opa.cc:58:1:
 error: 'void* aligned_alloc(std::size_t, std::size_t)' was declared 'extern' 
and later 'static' [-fpermissive]
 aligned_alloc (std::size_t al, std::size_t sz)
 ^
In file included from 
/home/uweigand/dailybuild/spu-tc-2018-08-07/gcc-build/spu/libstdc++-v3/include/cstdlib:75:62,
 from 
/home/uweigand/dailybuild/spu-tc-2018-08-07/gcc-build/spu/libstdc++-v3/include/stdlib.h:36,
 from 
/home/uweigand/dailybuild/spu-tc-2018-08-07/gcc-head/src/libstdc++-v3/libsupc++/new_opa.cc:27:
/home/uweigand/dailybuild/spu-tc-2018-08-07/spu-toolchain/spu/include/stdlib.h:328:8:
 note: previous declaration of 'void* aligned_alloc(size_t, size_t)'
 void * aligned_alloc(size_t, size_t) __malloc_like __alloc_align(1)
^

This seems to be because configure is hard-coded to assume aligned_alloc
is not available, but then the new_opc.cc file tries to define its own
version, conflicting with the newlib prototype that is actually there.

So one way or the other this needs to be fixed ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH] libstdc++-v3: Have aligned_alloc() on Newlib

2018-08-08 Thread Ulrich Weigand

Jonathan Wakely wrote:

> Aha, so newlib was using memalign previously:
> 
> @@ -53,20 +54,24 @@ aligned_alloc (std::size_t al, std::size_t sz)
>  #else
>  extern "C" void *memalign(std::size_t boundary, std::size_t size);
>  #endif
> -#define aligned_alloc memalign

Yes, exactly ... this commit introduced the regression.

> OK, I've regressed the branches then - I'll fix that.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

[spu, commit] Fix PR target/82960

2017-12-08 Thread Ulrich Weigand

Hello,

the ICE with --enable-checking=rtl reported in PR 82960 was caused by
spu.c:pad_bb using INSN_CODE on RTX with INSN_P false (specifically,
on jump_table_data).  Add checks to handle this case.

Tested on spu-elf, committed to mainline.

Bye,
Ulrich

ChangeLog:

PR target/82960
* config/spu/spu.c (pad_bb): Only check INSN_CODE when INSN_P is true.

Index: gcc/config/spu/spu.c
===
*** gcc/config/spu/spu.c(revision 255408)
--- gcc/config/spu/spu.c(working copy)
*** pad_bb(void)
*** 2029,2036 
for (; insn; insn = next_insn)
  {
next_insn = next_active_insn (insn);
!   if (INSN_CODE (insn) == CODE_FOR_iprefetch
! || INSN_CODE (insn) == CODE_FOR_hbr)
{
  if (hbr_insn)
{
--- 2029,2037 
for (; insn; insn = next_insn)
  {
next_insn = next_active_insn (insn);
!   if (INSN_P (insn)
!   && (INSN_CODE (insn) == CODE_FOR_iprefetch
! || INSN_CODE (insn) == CODE_FOR_hbr))
{
  if (hbr_insn)
{
*** pad_bb(void)
*** 2048,2054 
}
  hbr_insn = insn;
}
!   if (INSN_CODE (insn) == CODE_FOR_blockage && next_insn)
{
  if (GET_MODE (insn) == TImode)
PUT_MODE (next_insn, TImode);
--- 2049,2055 
}
  hbr_insn = insn;
}
!   if (INSN_P (insn) && INSN_CODE (insn) == CODE_FOR_blockage && next_insn)
{
  if (GET_MODE (insn) == TImode)
PUT_MODE (next_insn, TImode);
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH v2] S/390: Fix conditional returns on z196+

2018-09-21 Thread Ulrich Weigand

Ilya Leoshkevich wrote:

> 2018-09-19  Ilya Leoshkevich  
> 
>   PR target/80080
>   * config/s390/s390.md: Do not use PARALLEL RETURN+USE when
>   returning via %r14.

This makes sense to me.  I'm just wondering if it wouldn't be
simpler to do the check for r14 right in s390_emit_epilogue
and then just call gen_return instead of gen_return_use
directly there?

> +++ b/gcc/config/s390/s390.md
> @@ -10831,6 +10831,13 @@
>   (use (match_operand 0 "register_operand" "a"))])]
>""
>  {
> +  if (REGNO (operands[0]) == RETURN_REGNUM && s390_can_use_return_insn ())

This probably cannot happen in practice in this specific case,
but in general a register_operand may also be e.g. a SUBREG,
so you shouldn't use REGNO without first verifying that this
is a REG.  (If you move the check to s390_emit_epilogue as above,
this point is moot; you don't even need to generate a REG RTX
if it's for r14 then.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH v3] S/390: Fix conditional returns on z196+

2018-09-24 Thread Ulrich Weigand

Ilya Leoshkevich wrote:

> gcc/ChangeLog:
> 
> 2018-09-19  Ilya Leoshkevich  
> 
>   PR target/80080
>   * config/s390/s390.c (s390_emit_epilogue): Do not use PARALLEL
>   RETURN+USE when returning via %r14.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-09-19  Ilya Leoshkevich  
> 
>   PR target/80080
>   * gcc.target/s390/risbg-ll-3.c: Expect conditional returns.
>   * gcc.target/s390/zvector/vec-cmp-2.c: Likewise.

This is OK.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH][IRA] Analysis of register usage of functions for usage by IRA.

2014-09-01 Thread Ulrich Weigand

Tom de Vries wrote:

>   * ira-costs.c (ira_tune_allocno_costs): Use
>   ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS to adjust costs.

In debugging PR 53864 on s390x-linux, I ran into a weird change in behavior
that occurs when the following part of this patch was checked in:

> -   if (ira_hard_reg_set_intersection_p (regno, mode, 
> call_used_reg_set)
> -   || HARD_REGNO_CALL_PART_CLOBBERED (regno, mode))
> - cost += (ALLOCNO_CALL_FREQ (a)
> -  * (ira_memory_move_cost[mode][rclass][0]
> - + ira_memory_move_cost[mode][rclass][1]));
> +   crossed_calls_clobber_regs
> + = &(ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a));
> +   if (ira_hard_reg_set_intersection_p (regno, mode,
> +*crossed_calls_clobber_regs))
> + {
> +   if (ira_hard_reg_set_intersection_p (regno, mode,
> +call_used_reg_set)
> +   || HARD_REGNO_CALL_PART_CLOBBERED (regno, mode))
> + cost += (ALLOCNO_CALL_FREQ (a)
> +  * (ira_memory_move_cost[mode][rclass][0]
> + + ira_memory_move_cost[mode][rclass][1]));
>  #ifdef IRA_HARD_REGNO_ADD_COST_MULTIPLIER
> -   cost += ((ira_memory_move_cost[mode][rclass][0]
> - + ira_memory_move_cost[mode][rclass][1])
> -* ALLOCNO_FREQ (a)
> -* IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2);
> +   cost += ((ira_memory_move_cost[mode][rclass][0]
> + + ira_memory_move_cost[mode][rclass][1])
> +* ALLOCNO_FREQ (a)
> +* IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2);
>  #endif
> + }

Before that patch, this code would penalize all call-clobbered registers
(if the alloca is used across a call), and it would penalize *all* registers
in a target-dependent way if IRA_HARD_REGNO_ADD_COST_MULTIPLIER is defined;
the latter is completely independent of the presence of any calls.

However, after that patch, the IRA_HARD_REGNO_ADD_COST_MULTIPLIER penalty
is only applied for registers clobbered by calls in this function.  This
seems a completely unrelated change, and looks just wrong to me ...

Was this done intentionally or is this just an oversight?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [IRA] some code improvement and s390 support

2014-09-01 Thread Ulrich Weigand

Vladmir Makarov wrote:

>   * config/s390/s390.h (IRA_COVER_CLASSES,
>   IRA_HARD_REGNO_ADD_COST_MULTIPLIER(regno)): Define.

In debugging PR 53854 I noticed a strange behavior in IRA costs
that seems to trace back to the very first definition of the
IRA_HARD_REGNO_ADD_COST_MULTIPLIER on s390:

> +/* In some case register allocation order is not enough for IRA to
> +   generate a good code.  The following macro (if defined) increases
> +   cost of REGNO for a pseudo approximately by pseudo usage frequency
> +   multiplied by the macro value.
> +
> +   We avoid usage of BASE_REGNUM by nonzero macro value because the
> +   reload can decide not to use the hard register because some
> +   constant was forced to be in memory.  */
> +#define IRA_HARD_REGNO_ADD_COST_MULTIPLIER(regno)\
> +  (regno == BASE_REGNUM ? 0.0 : 0.5)

(which is still unchanged in current sources.)

Now, the comment says BASE_REGNUM should be avoided, but the actual
implementation of the macro seems to avoid *all* registers *but*
BASE_REGNUM ...

Am I misreading this, or is there indeed a logic error here?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH PR62151]Fix uninitialized register issue caused by distribute_notes in combine pass

2014-09-02 Thread Ulrich Weigand

Segher Boessenkool wreote:
> On Mon, Sep 01, 2014 at 10:39:10AM -0600, Jeff Law wrote:
> > On 09/01/14 05:38, Segher Boessenkool wrote:
> > >On Mon, Sep 01, 2014 at 11:36:07AM +0800, Bin.Cheng wrote:
> > >>>In the testcase (and comment in the proposed patch), why is combine
> > >>>combining four insns at all?  That means it rejected combining just the
> > >>>first three.  Why did it do that?
> > >>It is explicitly reject by below code in can_combine_p.
> > >>
> > >>   if (GET_CODE (PATTERN (i3)) == PARALLEL)
> > >> for (i = XVECLEN (PATTERN (i3), 0) - 1; i >= 0; i--)
> > >>   if (GET_CODE (XVECEXP (PATTERN (i3), 0, i)) == CLOBBER)
> > >> {
> > >>   /* Don't substitute for a register intended as a clobberable
> > >>  operand.  */
> > >>   rtx reg = XEXP (XVECEXP (PATTERN (i3), 0, i), 0);
> > >>   if (rtx_equal_p (reg, dest))
> > >> return 0;
> > >>
> > >>Since insn i2 in the list of i0/i1/i2 as below contains parallel
> > >>clobber of dest_of_insn76/use_of_insn77.
> > >>32: r84:SI=0
> > >>76: flags:CC=cmp(r84:SI,0x1)
> > >>   REG_DEAD r84:SI
> > >>77: {r84:SI=-ltu(flags:CC,0);clobber flags:CC;}
> > >>   REG_DEAD flags:CC
> > >>   REG_UNUSED flags:CC
> > >
> > >Archaeology suggests this check is because the clobber might be an
> > >earlyclobber.  Which seems silly: how can it be a valid insn at all
> > >in that case?  It seems to me the check can just be removed.  That
> > >will hide your issue, maybe even solve it (but I doubt it).
> > Silly for other reasons, namely that earlyclobber doesn't come into play 
> > until after combine (register allocation and later).
> 
> The last change to this code was by Ulrich (cc:ed); in that thread (June
> 2004, mostly not threaded in the mail archive, broken MUAs :-( ) it was
> said that any clobber should be considered an earlyclobber (an RTL insn
> can expand to multiple machine instructions, for example).  But I don't
> see how that can matter for "dest" here (the dest of "insn", that's 76
> in the example), only for "src".
> 
> The version of "flags" set in 76 obviously dies in 77 (it clobbers the
> reg after all), but there is no way it could clobber it before it uses
> it, that just makes no sense.  And in the combined insn that version of
> flags does not exist at all.

This seems the time period where the email archive is not fully complete;
some of the mails of that 2004 thread apparently were not linked into the
monthly thread list.  This archive seems to have them all:
http://marc.info/?t=108747834900012&r=1&w=2

In any case, this test in can_combine_p rejects a combination for *two*
different issues.  One is the earlyclobber problem, which is what that
2004 thread was about, and which my patch back then relaxed for fixed
hard register.

However, this doesn't seem to apply to the example above; that is really
about the second problem: don't substitute into a clobber.

I understand the reason why this particular substitution is rejected is
simply that if it weren't, we'd be substituting flags:CC=cmp(r84:SI,0x1)
into clobber flags:CC, resulting in clobber cmp(r84:SI,0x1), which is
invalid RTL.

Now I guess this check could be relaxed if somewhere else in combine we'd
recognize the substitution into a clobber and simply omit it in that case.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH][IRA] Analysis of register usage of functions for usage by IRA.

2014-09-03 Thread Ulrich Weigand

Tom de Vries wrote:

> thanks for noticing this. I agree, this looks wrong, and is probably an 
> oversight. [ It seems that s390 is the only target defining 
> IRA_HARD_REGNO_ADD_COST_MULTIPLIER, so this problem didn't show up on any 
> other 
> target. ]
> 
> I think attached patch fixes it.
> 
> I've build the patch and ran the fuse-caller-save tests, and I'm currently 
> bootstrapping and reg-testing it on x86_64.

Thanks!

> Can you check whether this patches fixes the issue for s390 ?

Yes, this (which is equivalent to a patch I had been using) does fix
the s390 issue again.


Just for my curiosity, why is the second condition (after &&)
needed in this clause in the first place?

> if (ira_hard_reg_set_intersection_p (regno, mode,
> +*crossed_calls_clobber_regs)
> +   && (ira_hard_reg_set_intersection_p (regno, mode,
>  call_used_reg_set)
> -   || HARD_REGNO_CALL_PART_CLOBBERED (regno, mode))

If a register is in crossed_calls_clobber_regs, can it ever *not*
be a call-clobbered register?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

[PATCH] Fix failing assertion in calls.c:store_unaligned_arguments_into_pseudos

2013-11-11 Thread Ulrich Weigand

Hello,

when implementing the new ABI for powerpc64le-linux, I ran into an assertion
failure in store_unaligned_arguments_into_pseudos:
gcc_assert (args[i].partial % UNITS_PER_WORD == 0);

This can happen in the new ABI since we pass "homogeneous structures"
consisting of soleley floating point elements of the same type in
floating-point registers until they run out, and the rest of the
structure in memory.  If the structure member type is a 4-byte float,
and we only have an odd number of floating point registers available,
then args[i].partial can legitimately end up not being a multiple
of UNITS_PER_WORD (i.e. 8 on the platform).

Now, there are a number of similar checks that args[i].partial is
aligned elsewhere in calls.c and functions.c.  But for all of those
the logic is: if args[i].reg is a REG, then args[i].partial must be
a multiple of the word size; but if args[i].regs is a PARALLEL, then
args[i].partial can be any arbitrary value.  In the powerpc64le-linux
use case, the back-end always generates PARALLEL in this situation,
so it seemed the middle-end ought to support this -- and it does,
except for this one case.

However, looking more closely, it seems store_unaligned_arguments_into_pseudos
is not really useful for PARALLEL arguments in the first place.  What this
routine does is load arguments into args[i].aligned_regs.  But if we have
an argument where args[i].reg is a PARALLEL, args[i].aligned_regs will in
fact never be used later on at all!   Instead, PARALLEL will always be
handled directly via emit_group_move (in load_register_parameters), so
the code generated by store_unaligned_arguments_into_pseudos for such cases
is simply dead anyway.

Thus, I'd suggest to simply have store_unaligned_arguments_into_pseudos
skip PARALLEL arguments.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich



ChangeLog:

2013-11-11  Ulrich Weigand  

* calls.c (store_unaligned_arguments_into_pseudos): Skip PARALLEL
arguments.

Index: gcc/gcc/calls.c
===
--- gcc.orig/gcc/calls.c
+++ gcc/gcc/calls.c
@@ -981,6 +981,7 @@ store_unaligned_arguments_into_pseudos (
 
   for (i = 0; i < num_actuals; i++)
 if (args[i].reg != 0 && ! args[i].pass_on_stack
+   && GET_CODE (args[i].reg) != PARALLEL
&& args[i].mode == BLKmode
&& MEM_P (args[i].value)
    && (MEM_ALIGN (args[i].value)
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

[PATCH] Avoid duplicate calls to REG_PARM_STACK_SPACE

2013-11-11 Thread Ulrich Weigand

Hello,

this is another tweak to the middle-end to help support the new
powerpc64le-linux ABI.

In the new ABI, we make a distinction between functions that pass
all arguments solely in registers, and those that do not.  Only when
calling one the latter type (or a varags routine) does the caller
need to provide REG_PARM_STACK_SPACE; in the former case, this can
be omitted to save stack space.

This leads to a definition of REG_PARM_STACK_SPACE that is a lot
more complex than usual, which means it would be helpful if the
middle-end were to evaluate it sparingly (e.g. once per function
definition / call).

The middle-end already does cache REG_PARM_STACK_SPACE results,
however, this cache it not consistently used.  In particular,
the locate_and_pad_parm routine will re-evaluate the macro
every time it is called, even though all callers of the routine
already have a cached copy.

This patch adds a new argument to locate_and_pad_parm which is
used instead of evaluating REG_PARM_STACK_SPACE, and updates
the callers to pass in the correct value.

There should be no change in generated code on any platform.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich

2013-11-11  Ulrich Weigand  
Alan Modra  

ChangeLog:

* function.c (assign_parms): Use all.reg_parm_stack_space instead
of re-evaluating REG_PARM_STACK_SPACE target macro.
(locate_and_pad_parm): New parameter REG_PARM_STACK_SPACE.  Use it
instead of evaluating target macro REG_PARM_STACK_SPACE every time.
(assign_parm_find_entry_rtl): Update call.
* calls.c (initialize_argument_information): Update call.
(emit_library_call_value_1): Likewise.
* expr.h (locate_and_pad_parm): Update prototype.

Index: gcc/gcc/expr.h
===
--- gcc.orig/gcc/expr.h
+++ gcc/gcc/expr.h
@@ -521,8 +521,8 @@ extern rtx expand_divmod (int, enum tree
  rtx, int);
 #endif
 
-extern void locate_and_pad_parm (enum machine_mode, tree, int, int, tree,
-struct args_size *,
+extern void locate_and_pad_parm (enum machine_mode, tree, int, int, int,
+tree, struct args_size *,
 struct locate_and_pad_arg_data *);
 
 /* Return the CODE_LABEL rtx for a LABEL_DECL, creating it if necessary.  */
Index: gcc/gcc/calls.c
===
--- gcc.orig/gcc/calls.c
+++ gcc/gcc/calls.c
@@ -1326,6 +1326,7 @@ initialize_argument_information (int num
 #else
 args[i].reg != 0,
 #endif
+reg_parm_stack_space,
 args[i].pass_on_stack ? 0 : args[i].partial,
 fndecl, args_size, &args[i].locate);
 #ifdef BLOCK_REG_PADDING
@@ -3736,7 +3737,8 @@ emit_library_call_value_1 (int retval, r
 #else
   argvec[count].reg != 0,
 #endif
-  0, NULL_TREE, &args_size, &argvec[count].locate);
+  reg_parm_stack_space, 0,
+  NULL_TREE, &args_size, &argvec[count].locate);
 
   if (argvec[count].reg == 0 || argvec[count].partial != 0
  || reg_parm_stack_space > 0)
@@ -3823,7 +3825,7 @@ emit_library_call_value_1 (int retval, r
 #else
   argvec[count].reg != 0,
 #endif
-  argvec[count].partial,
+  reg_parm_stack_space, argvec[count].partial,
   NULL_TREE, &args_size, &argvec[count].locate);
  args_size.constant += argvec[count].locate.size.constant;
  gcc_assert (!argvec[count].locate.size.var);
Index: gcc/gcc/function.c
===
--- gcc.orig/gcc/function.c
+++ gcc/gcc/function.c
@@ -2523,6 +2523,7 @@ assign_parm_find_entry_rtl (struct assig
 }
 
   locate_and_pad_parm (data->promoted_mode, data->passed_type, in_regs,
+  all->reg_parm_stack_space,
   entry_parm ? data->partial : 0, current_function_decl,
   &all->stack_args_size, &data->locate);
 
@@ -3511,11 +3512,7 @@ assign_parms (tree fndecl)
   /* Adjust function incoming argument size for alignment and
  minimum length.  */
 
-#ifdef REG_PARM_STACK_SPACE
-  crtl->args.size = MAX (crtl->args.size,
-   REG_PARM_STACK_SPACE (fndecl));
-#endif
-
+  crtl->args.size = MAX (crtl->args.size, all.reg_parm_stack_space);
   crtl->args.size = CEIL_ROUND (crtl->args.size,
   PARM_BOUNDARY / BITS_PER_UNIT);
 
@@ -3719,6 +3716,9 @@ gimplify_parameters (void)
IN_REGS is nonzero if the argument will be passed in registers.  It will
never be set if RE

[PATCH, rs6000] Fix little-endian bug in linux-unwind.h

2013-11-11 Thread Ulrich Weigand

Hello,

this fixes another issue related to CR unwinding, this time only on
64-bit little-endian systems.

The problem is linux-unwind.h:ppc_fallback_frame_state, which notes
that the kernel places the CR value in the low bits of a 64-bit
field in the signal handler frame; but the offset of that field
is different in little-endian.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich


libgcc/ChangeLog:

2013-11-11  Ulrich Weigand  
Alan Modra  

* config/rs6000/linux-unwind.h (ppc_fallback_frame_state): Correct
location of CR save area for 64-bit little-endian systems.

Index: gcc/libgcc/config/rs6000/linux-unwind.h
===
--- gcc.orig/libgcc/config/rs6000/linux-unwind.h
+++ gcc/libgcc/config/rs6000/linux-unwind.h
@@ -185,6 +185,7 @@ ppc_fallback_frame_state (struct _Unwind
 {
   struct gcc_regs *regs = get_regs (context);
   struct gcc_vregs *vregs;
+  long cr_offset;
   long new_cfa;
   int i;
 
@@ -206,11 +207,13 @@ ppc_fallback_frame_state (struct _Unwind
   fs->regs.reg[i].loc.offset = (long) ®s->gpr[i] - new_cfa;
 }
 
+  /* The CR is saved in the low 32 bits of regs->ccr.  */
+  cr_offset = (long) ®s->ccr - new_cfa;
+#ifndef __LITTLE_ENDIAN__
+  cr_offset += sizeof (long) - 4;
+#endif
   fs->regs.reg[R_CR2].how = REG_SAVED_OFFSET;
-  /* CR? regs are always 32-bit and PPC is big-endian, so in 64-bit
- libgcc loc.offset needs to point to the low 32 bits of regs->ccr.  */
-  fs->regs.reg[R_CR2].loc.offset = (long) ®s->ccr - new_cfa
-  + sizeof (long) - 4;
+  fs->regs.reg[R_CR2].loc.offset = cr_offset;
 
   fs->regs.reg[R_LR].how = REG_SAVED_OFFSET;
   fs->regs.reg[R_LR].loc.offset = (long) ®s->link - new_cfa;
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

[PATCH, rs6000] Fix corner case in unwinding CR values

2013-11-11 Thread Ulrich Weigand

Hello,

this patch fixes a corner case bug in unwinding CR values.  The
underlying problem is as follows:

In order to save the CR in the prologue, two insns are necessary.
The first is a mfcr that loads the CR value into a GPR, and the
second is a store of the GPR to the stack.  The back-end currently
tags the first insn with a REG_FRAME_RELATED_EXPR, and marks both
as RTX_FRAME_RELATED_P.  This causes the middle-end to emit two
CFI records, showing the CR saved first in the GPR, and later
in its stack slot.  (The first record is omitted if there is no
possible exception in between.)

Now, assume we use -fnon-call-exceptions, and get a signal on
an instruction in between those two points, and the signal handler
attempts to throw.  The unwind logic will read CFI records and
determine that in this frame, CR was saved into a GPR; and in
the frame below (which must be a signal frame), that GPR was
saved onto the stack.

What the unwind logic then decides to do is to restore CR from
the save slot of that latter GPR; i.e. the unwinder will copy
that GPR save slot over the CR save slot of the unwind routine
that calls __builtin_eh_return.

Unfortunately, this is a problem on a 64-bit big-endian platform,
since that GPR save slot is 8 bytes, while the CR save slot is
just 4 bytes.  The unwinder therefore copies the wrong half of
the GPR save slot over the CR save slot ...


As second, related, problem will come up with the new ABI.  Here,
we want to track each CR field in a separate CFI record, even
though the values end up sharing the same stack slot.  This is
not a problem for a CFI record pointing to memory.  However, the
current dwarf2out.c middle-end logic is unable to handle the
situation where multiple registers are saved in the same *register*
at the same time ...


Both problems can be fixed in the same way, by never emitting
a CFI record showing CR saved into a GPR.  This is easily done
by simply marking only the store to memory as RTX_FRAME_RELATED_P.
However, to avoid generating incorrect code we must then prevent
and exception point to occur in between the two insns to save CR.
The easiest way to do so is to mark the store to the stack slot
as an additional user of all CR fields that need to be saved by
the current function.

This does restrict scheduling other instructions in between the
prologue a bit more than otherwise.  But since this affects only
very special cases (e.g. an instruction that may triggers a floating-
point exception, while -fnon-call-exceptions is active), this
seems not too bad ...


Note that in the epilogue, this is not really a problem, even though
we also need two instructions to restore CR.  We can (and do) simply
leave the CFI record point to the stack slot all the time, which
remains valid even after the CR value was read from there (and even
after the stack pointer itself was modified).


The patch below fixes this problem as described above, and adds
a test case that fails without the patch.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich



ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/rs6000.c (rs6000_emit_prologue): Do not place a
RTX_FRAME_RELATED_P marker on the UNSPEC_MOVESI_FROM_CR insn.
Instead, add USEs of all modified call-saved CR fields to the
insn storing the result to the stack slot, and provide an
appropriate REG_FRAME_RELATED_EXPR for that insn.
* config/rs6000/rs6000.md ("*crsave"): New insn pattern.
* config/rs6000/predicates.md ("crsave_operation"): New predicate.

testsuite/ChangeLog:

2013-11-11  Ulrich Weigand  

* g++.dg/eh/ppc64-sighandle-cr.C: New test.


Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -21761,21 +21761,9 @@ rs6000_emit_prologue (void)
   && REGNO (frame_reg_rtx) != cr_save_regno
   && !(using_static_chain_p && cr_save_regno == 11))
 {
-  rtx set;
-
   cr_save_rtx = gen_rtx_REG (SImode, cr_save_regno);
   START_USE (cr_save_regno);
-  insn = emit_insn (gen_movesi_from_cr (cr_save_rtx));
-  RTX_FRAME_RELATED_P (insn) = 1;
-  /* Now, there's no way that dwarf2out_frame_debug_expr is going
-to understand '(unspec:SI [(reg:CC 68) ...] UNSPEC_MOVESI_FROM_CR)'.
-But that's OK.  All we have to do is specify that _one_ condition
-code register is saved in this stack slot.  The thrower's epilogue
-will then restore all the call-saved registers.
-We use CR2_REGNO (70) to be compatible with gcc-2.95 on Linux.  */
-  set = gen_rtx_SET (VOIDmode, cr_save_rtx,
-gen_rtx_REG (SImode, CR2_REGNO));
-  add_reg_note (insn, REG_FRAME_RELATED_EXPR, set);
+  emit_insn (gen_movesi_from_cr (cr_save_rtx));
 }
 
   /* Do any r

[PATCH, rs6000] ELFv2 ABI preparation: Refactor call expanders

2013-11-11 Thread Ulrich Weigand

Hello,

this patch refactors code to expand calls in preparation for adding support
for the new little-endian ABI.

Currently, the call expanders always generate the same RTX pattern, which
is matched by different insns patterns depending on circumstances (ABI,
local vs. global calls, ...).   This makes it difficult if in some of
these circumstances, we'd really need a different RTX pattern, e.g.
becuase the call uses certain special registers, or because the call
must perform extra actions like saving or restoring the TOC pointer.

There is one exception to this logic already in place: when expanding
an indirect call on ABI_AIX, the expander calls a routine in rs6000.c.

This patch expands on that by handling *all* calls (including sibling
calls) on ABI_AIX via a routine in rs6000.c.   This should be fully
transparent to generated code, although generated RTL is a bit different:
we use CALL_INSN_FUNCTION_USAGE to track uses of ABI-defined registers
by the call, which has a couple of other advantages:

- we can remove some patterns that were duplicated solely to track
  whether or not a call uses r11;

- we can simply the sibcall patterns so they no longer explicitly
  refer to LR_REGNO;

- we can precisely track which calls use r2 to ensure we have correct
  data flow for the register (this doesn't actually matter for the
  current code, but will become necessary with the new ABI).


Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich



ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/rs6000.c (rs6000_call_indirect_aix): Rename to ...
(rs6000_call_aix): ... this.  Handle both direct and indirect calls.
Create call insn directly instead of via various gen_... routines.
Mention special registers used by the call in CALL_INSN_FUNCTION_USAGE.
(rs6000_sibcall_aix): New function.
* config/rs6000/rs6000.md (TOC_SAVE_OFFSET_32BIT): Remove.
(TOC_SAVE_OFFSET_64BIT): Likewise.
(AIX_FUNC_DESC_TOC_32BIT): Likewise.
(AIX_FUNC_DESC_TOC_64BIT): Likewise.
(AIX_FUNC_DESC_SC_32BIT): Likewise.
(AIX_FUNC_DESC_SC_64BIT): Likewise.
("call" expander): Call rs6000_call_aix.
("call_value" expander): Likewise.
("call_indirect_aix"): Replace this pattern ...
("call_indirect_aix_nor11"): ... and this pattern ...
("*call_indirect_aix"): ... by this insn pattern.
("call_value_indirect_aix"): Replace this pattern ...
("call_value_indirect_aix_nor11"): ... and this pattern ...
("*call_value_indirect_aix"): ... by this insn pattern.
("*call_nonlocal_aix32", "*call_nonlocal_aix64"): Replace by ...
("*call_nonlocal_aix"): ... this pattern.
("*call_value_nonlocal_aix32", "*call_value_nonlocal_aix64"): Replace
("*call_value_nonlocal_aix"): ... by this pattern.
("*call_local_aix"): New insn pattern.
("*call_value_local_aix"): Likewise.
("sibcall" expander): Call rs6000_sibcall_aix.
("sibcall_value" expander): Likewise.  Move earlier in file.
("*sibcall_nonlocal_aix"): Replace by ...
("*sibcall_aix"): ... this pattern.
("*sibcall_value_nonlocal_aix"): Replace by ...
("*sibcall_value_aix"): ... this pattern.
* config/rs6000/rs6000-protos.h (rs6000_call_indirect_aix): Remove.
(rs6000_call_aix): Add prototype.
(rs6000_sibcall_aix): Likewise.

Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -30599,118 +30599,138 @@ rs6000_legitimate_constant_p (enum machi
 }
 
 
-/* A function pointer under AIX is a pointer to a data area whose first word
-   contains the actual address of the function, whose second word contains a
-   pointer to its TOC, and whose third word contains a value to place in the
-   static chain register (r11).  Note that if we load the static chain, our
-   "trampoline" need not have any executable code.  */
+
+/* Expand code to perform a call under the AIX ABI.  */
 
 void
-rs6000_call_indirect_aix (rtx value, rtx func_desc, rtx flag)
+rs6000_call_aix (rtx value, rtx func_desc, rtx flag, rtx cookie)
 {
+  rtx toc_reg = gen_rtx_REG (Pmode, TOC_REGNUM);
+  rtx toc_load = NULL_RTX;
+  rtx toc_restore = NULL_RTX;
   rtx func_addr;
-  rtx toc_reg;
-  rtx sc_reg;
-  rtx stack_ptr;
-  rtx stack_toc_offset;
-  rtx stack_toc_mem;
-  rtx func_toc_offset;
-  rtx func_toc_mem;
-  rtx func_sc_offset;
-  rtx func_sc_mem;
+  rtx abi_reg = NULL_RTX;
+  rtx call[4];
+  int n_call;
   rtx insn;
-  rtx (*call_func) (rtx, rtx, rtx, rtx);
-  rtx (*call_value_func) (r

[PATCH, rs6000] ELFv2 ABI preparation: Refactor rs6000_function_arg

2013-11-11 Thread Ulrich Weigand

Hello,

another patch in preparation for the new ABI.

When calling a function in the absence of a prototype, we need to pass
arguments that are normally passed in floating-point or vector registers
also on the stack and/or in GPRs.

The code currently handling this in rs6000_function_arg is implemented
in two separate places, one that handles floating-point arguments and
one that handles vector arguments.  This is probably reasonable with
the current ABI, since in the vector case, everything is much simpler
than in the floating-point case.

However, with the new ABI, we may now pass more complex arguments in
vector registers, in which case we might need to duplicate more of
the logic currently used only for floating-point arguments.

To avoid that duplication, this patch moves the logic to handle
argument duplication on stack or in GPRs out to a pair of new
helper routines, and uses them for both floating-point and vector
arguments.  For the current ABI, the result should be exactly
the same.

No change in generated code intented.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich


ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/rs6000.c (rs6000_psave_function_arg): New function.
(rs6000_finish_function_arg): Likewise.
(rs6000_function_arg): Use rs6000_psave_function_arg and
rs6000_finish_function_arg to handle both vector and floating
point arguments that are also passed in GPRs / the stack.

Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -9511,6 +9511,83 @@ rs6000_mixed_function_arg (enum machine_
   return gen_rtx_PARALLEL (mode, gen_rtvec_v (k, rvec));
 }
 
+/* We have an argument of MODE and TYPE that goes into FPRs or VRs,
+   but must also be copied into the parameter save area starting at
+   offset ALIGN_WORDS.  Fill in RVEC with the elements corresponding
+   to the GPRs and/or memory.  Return the number of elements used.  */
+
+static int
+rs6000_psave_function_arg (enum machine_mode mode, const_tree type,
+  int align_words, rtx *rvec)
+{
+  int k = 0;
+
+  if (align_words < GP_ARG_NUM_REG)
+{
+  int n_words = rs6000_arg_size (mode, type);
+
+  if (align_words + n_words > GP_ARG_NUM_REG
+ || (TARGET_32BIT && TARGET_POWERPC64))
+   {
+ /* If this is partially on the stack, then we only
+include the portion actually in registers here.  */
+ enum machine_mode rmode = TARGET_32BIT ? SImode : DImode;
+ int i = 0;
+
+ if (align_words + n_words > GP_ARG_NUM_REG)
+   {
+ /* Not all of the arg fits in gprs.  Say that it goes in memory
+too, using a magic NULL_RTX component.  Also see comment in
+rs6000_mixed_function_arg for why the normal
+function_arg_partial_nregs scheme doesn't work in this case. */
+ rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, NULL_RTX, const0_rtx);
+   }
+
+ do
+   {
+ rtx r = gen_rtx_REG (rmode, GP_ARG_MIN_REG + align_words);
+ rtx off = GEN_INT (i++ * GET_MODE_SIZE (rmode));
+ rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, off);
+   }
+ while (++align_words < GP_ARG_NUM_REG && --n_words != 0);
+   }
+  else
+   {
+ /* The whole arg fits in gprs.  */
+ rtx r = gen_rtx_REG (mode, GP_ARG_MIN_REG + align_words);
+ rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, const0_rtx);
+   }
+}
+  else
+{
+  /* It's entirely in memory.  */
+  rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, NULL_RTX, const0_rtx);
+}
+
+  return k;
+}
+
+/* RVEC is a vector of K components of an argument of mode MODE.
+   Construct the final function_arg return value from it.  */
+
+static rtx
+rs6000_finish_function_arg (enum machine_mode mode, rtx *rvec, int k)
+{
+  gcc_assert (k >= 1);
+
+  /* Avoid returning a PARALLEL in the trivial cases.  */
+  if (k == 1)
+{
+  if (XEXP (rvec[0], 0) == NULL_RTX)
+   return NULL_RTX;
+
+  if (GET_MODE (XEXP (rvec[0], 0)) == mode)
+   return XEXP (rvec[0], 0);
+}
+
+  return gen_rtx_PARALLEL (mode, gen_rtvec_v (k, rvec));
+}
+
 /* Determine where to put an argument to a function.
Value is zero to push the argument on the stack,
or a hard register in which to store the argument.
@@ -9580,32 +9657,25 @@ rs6000_function_arg (cumulative_args_t c
 }
 
   if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named))
-if (TARGET_64BIT && ! cum->prototype)
-  {
-   /* Vector parameters get passed in vector register
-  and also in GPRs or memory, in absence of prototype.  */
-   int align_words;
-   rtx slot;
-   align_words = (cum->words + 1) & ~1;
+{
+  r

[PATCH, rs0000] ELFv2 ABI preparation: Refactor some uses of ABI_AIX

2013-11-11 Thread Ulrich Weigand

Hello,

this is another patch to prepare for the new ABI.  Since this will introduce
a new setting of DEFAULT_ABI, which will have to be treated like ABI_AIX in
many cases, it would be better if there were fewer explicit references to
ABI_AIX in the back-end.  This patch eliminates a number of them:

- DEFAULT_ABI is either ABI_V4, ABI_DARWIN, or ABI_AIX, so any test for
  two of these can be replaced by a test for the third

- there are some code paths that are never used on ABI_DARWIN.  Here,
  any test for ABI_AIX can be replaced by one for ABI_V4.  (This is
  particularly obvious for the load_toc_v4_... patterns that even
  have v4 in their name ...)

- rs6000_elf_declare_function_name had a duplicated test for ABI_AIX in
  a code block that already tests for ABI_AIX

None of this is intended to change generated code on any platform.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich


ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/rs6000.c (rs6000_option_override_internal): Replace
"DEFAULT_ABI != ABI_AIX" test by testing for ABI_V4 or ABI_DARWIN.
(rs6000_savres_strategy): Likewise.
(rs6000_return_addr): Likewise.
(rs6000_emit_load_toc_table): Replace "DEFAULT_ABI != ABI_AIX" by
testing for ABI_V4 (since ABI_DARWIN is impossible here).
(rs6000_emit_prologue): Likewise.
(legitimate_lo_sum_address_p): Simplify DEFAULT_ABI test.
(rs6000_elf_declare_function_name): Remove duplicated test.
* config/rs6000/rs6000.md ("load_toc_v4_PIC_1"): Explicitly test
for ABI_V4 (instead of "DEFAULT_ABI != ABI_AIX" test).
("load_toc_v4_PIC_1_normal"): Likewise.
("load_toc_v4_PIC_1_476"): Likewise.
("load_toc_v4_PIC_1b"): Likewise.
("load_toc_v4_PIC_1b_normal"): Likewise.
("load_toc_v4_PIC_1b_476"): Likewise.
("load_toc_v4_PIC_2"): Likewise.
("load_toc_v4_PIC_3b"): Likewise.
("load_toc_v4_PIC_3c"): Likewise.
* config/rs6000/rs6000.h (RS6000_REG_SAVE): Simplify DEFAULT_ABI test.
(RS6000_SAVE_AREA): Likewise.
(FP_ARG_MAX_REG): Likewise.
(RETURN_ADDRESS_OFFSET): Likewise.
* config/rs6000/sysv.h (TARGET_TOC): Test for ABI_V4 instead
of ABI_AIX.
(SUBTARGET_OVERRIDE_OPTIONS): Likewise.
(MINIMAL_TOC_SECTION_ASM_OP): Likewise.

Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -3669,7 +3669,7 @@ rs6000_option_override_internal (bool gl
 
   /* We should always be splitting complex arguments, but we can't break
 Linux and Darwin ABIs at the moment.  For now, only AIX is fixed.  */
-  if (DEFAULT_ABI != ABI_AIX)
+  if (DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN)
targetm.calls.split_complex_arg = NULL;
 }
 
@@ -6384,7 +6384,7 @@ legitimate_lo_sum_address_p (enum machin
 {
   bool large_toc_ok;
 
-  if (DEFAULT_ABI != ABI_AIX && DEFAULT_ABI != ABI_DARWIN && flag_pic)
+  if (DEFAULT_ABI == ABI_V4 && flag_pic)
return false;
   /* LRA don't use LEGITIMIZE_RELOAD_ADDRESS as it usually calls
 push_reload from reload pass code.  LEGITIMIZE_RELOAD_ADDRESS
@@ -19605,7 +19605,8 @@ rs6000_savres_strategy (rs6000_stack_t *
  by the static chain.  It would require too much fiddling and the
  static chain is rarely used anyway.  FPRs are saved w.r.t the stack
  pointer on Darwin, and AIX uses r1 or r12.  */
-  if (using_static_chain_p && DEFAULT_ABI != ABI_AIX)
+  if (using_static_chain_p
+  && (DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN))
 strategy |= ((DEFAULT_ABI == ABI_DARWIN ? 0 : SAVE_INLINE_FPRS)
 | SAVE_INLINE_GPRS
 | SAVE_INLINE_VRS | REST_INLINE_VRS);
@@ -20301,7 +20302,8 @@ rs6000_return_addr (int count, rtx frame
   /* Currently we don't optimize very well between prolog and body
  code and for PIC code the code can be actually quite bad, so
  don't try to be too clever here.  */
-  if (count != 0 || (DEFAULT_ABI != ABI_AIX && flag_pic))
+  if (count != 0
+  || ((DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN) && flag_pic))
 {
   cfun->machine->ra_needs_full_frame = 1;
 
@@ -20449,7 +20451,7 @@ rs6000_emit_load_toc_table (int fromprol
   rtx dest;
   dest = gen_rtx_REG (Pmode, RS6000_PIC_OFFSET_TABLE_REGNUM);
 
-  if (TARGET_ELF && TARGET_SECURE_PLT && DEFAULT_ABI != ABI_AIX && flag_pic)
+  if (TARGET_ELF && TARGET_SECURE_PLT && DEFAULT_ABI == ABI_V4 && flag_pic)
 {
   char buf[30];
   rtx lab, tmp1, tmp2, got;
@@ -20477,7 +20479

[PATCH, rs6000] ELFv2 ABI preparation: Remove USE_FP/ALTIVEC_FOR_ARG_P type arg

2013-11-11 Thread Ulrich Weigand

Hello,

another patch in preparation for the new ABI.

USE_FP_FOR_ARG_P and USE_ALTIVEC_FOR_ARG_P take a TYPE argument
which they never use.  Since passing a correct TYPE for the
homogeneous struct case would be a bit problematic, it seems
cleaner to just remove the unused argument.

No change in generated code intented.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich


ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/rs6000.c (USE_FP_FOR_ARG_P): Remove TYPE argument.
(USE_ALTIVEC_FOR_ARG_P): Likewise.
(rs6000_darwin64_record_arg_advance_recurse): Update uses.
(rs6000_function_arg_advance_1):Likewise.
(rs6000_darwin64_record_arg_recurse): Likewise.
(rs6000_function_arg): Likewise.
(rs6000_arg_partial_bytes): Likewise.

Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -8434,13 +8434,13 @@ rs6000_member_type_forces_blk (const_tre
 }
 
 /* Nonzero if we can use a floating-point register to pass this arg.  */
-#define USE_FP_FOR_ARG_P(CUM,MODE,TYPE)\
+#define USE_FP_FOR_ARG_P(CUM,MODE) \
   (SCALAR_FLOAT_MODE_P (MODE)  \
&& (CUM)->fregno <= FP_ARG_MAX_REG  \
&& TARGET_HARD_FLOAT && TARGET_FPRS)
 
 /* Nonzero if we can use an AltiVec register to pass this arg.  */
-#define USE_ALTIVEC_FOR_ARG_P(CUM,MODE,TYPE,NAMED) \
+#define USE_ALTIVEC_FOR_ARG_P(CUM,MODE,NAMED)  \
   (ALTIVEC_OR_VSX_VECTOR_MODE (MODE)   \
&& (CUM)->vregno <= ALTIVEC_ARG_MAX_REG \
&& TARGET_ALTIVEC_ABI   \
@@ -8889,7 +8889,7 @@ rs6000_darwin64_record_arg_advance_recur
 
if (TREE_CODE (ftype) == RECORD_TYPE)
  rs6000_darwin64_record_arg_advance_recurse (cum, ftype, bitpos);
-   else if (USE_FP_FOR_ARG_P (cum, mode, ftype))
+   else if (USE_FP_FOR_ARG_P (cum, mode))
  {
unsigned n_fpregs = (GET_MODE_SIZE (mode) + 7) >> 3;
rs6000_darwin64_record_arg_advance_flush (cum, bitpos, 0);
@@ -8930,7 +8930,7 @@ rs6000_darwin64_record_arg_advance_recur
else
  cum->words += n_fpregs;
  }
-   else if (USE_ALTIVEC_FOR_ARG_P (cum, mode, type, 1))
+   else if (USE_ALTIVEC_FOR_ARG_P (cum, mode, 1))
  {
rs6000_darwin64_record_arg_advance_flush (cum, bitpos, 0);
cum->vregno++;
@@ -8993,7 +8993,7 @@ rs6000_function_arg_advance_1 (CUMULATIV
 {
   bool stack = false;
 
-  if (USE_ALTIVEC_FOR_ARG_P (cum, mode, type, named))
+  if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named))
{
  cum->vregno++;
  if (!TARGET_ALTIVEC)
@@ -9366,7 +9366,7 @@ rs6000_darwin64_record_arg_recurse (CUMU
 
if (TREE_CODE (ftype) == RECORD_TYPE)
  rs6000_darwin64_record_arg_recurse (cum, ftype, bitpos, rvec, k);
-   else if (cum->named && USE_FP_FOR_ARG_P (cum, mode, ftype))
+   else if (cum->named && USE_FP_FOR_ARG_P (cum, mode))
  {
unsigned n_fpreg = (GET_MODE_SIZE (mode) + 7) >> 3;
 #if 0
@@ -9394,7 +9394,7 @@ rs6000_darwin64_record_arg_recurse (CUMU
if (mode == TFmode || mode == TDmode)
  cum->fregno++;
  }
-   else if (cum->named && USE_ALTIVEC_FOR_ARG_P (cum, mode, ftype, 1))
+   else if (cum->named && USE_ALTIVEC_FOR_ARG_P (cum, mode, 1))
  {
rs6000_darwin64_record_arg_flush (cum, bitpos, rvec, k);
rvec[(*k)++]
@@ -9579,7 +9579,7 @@ rs6000_function_arg (cumulative_args_t c
   /* Else fall through to usual handling.  */
 }
 
-  if (USE_ALTIVEC_FOR_ARG_P (cum, mode, type, named))
+  if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named))
 if (TARGET_64BIT && ! cum->prototype)
   {
/* Vector parameters get passed in vector register
@@ -9707,7 +9707,7 @@ rs6000_function_arg (cumulative_args_t c
   if (mode == TDmode && (cum->fregno % 2) == 1)
cum->fregno++;
 
-  if (USE_FP_FOR_ARG_P (cum, mode, type))
+  if (USE_FP_FOR_ARG_P (cum, mode))
{
  rtx rvec[GP_ARG_NUM_REG + 1];
  rtx r;
@@ -9823,7 +9823,7 @@ rs6000_arg_partial_bytes (cumulative_arg
   if (DEFAULT_ABI == ABI_V4)
 return 0;
 
-  if (USE_ALTIVEC_FOR_ARG_P (cum, mode, type, named)
+  if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named)
   && cum->nargs_prototype >= 0)
 return 0;
 
@@ -9833,7 +9833,7 @@ rs6000_arg_partial_bytes (cumulative_arg
 
   align_words = rs6000_parm_start (mode, type, cum->words);
 
-  if (USE_FP_FOR_ARG_P (cum, mode, type))
+  if (USE_FP_FOR_ARG_P (cum, mode))
 {
   /* If we are passing this arg

[PATCH, rs6000] ELFv2 ABI preparation: Refactor rs6000_arg_partial_bytes

2013-11-11 Thread Ulrich Weigand

Hello,

this is the final patch preparing for the new ABI.

The logic in rs6000_arg_partial_bytes is a bit complex, since it apparently
still contains remnants from the time where this routine was used even for
arguments that are returned both in GPRs and FPRs/VRs (*and* memory).

These days, all such cases are handled completely in rs6000_function_arg
so rs6000_arg_partial_bytes should always return 0; which it *does*, even
though in a somewhat complex way.

This complex logic would make handling homogeneous structs more difficult
than it needs to be.  Therefore, this patch simplifies the logic by making
explicit the fact that rs6000_arg_partial_bytes does not actually need to
handle the cases described above.

No change in generated code expected.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich


ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/rs6000.c (rs6000_arg_partial_bytes): Simplify logic
by making use of the fact that for vector / floating point arguments
passed both in VRs/FPRs and in the fixed parameter area, the partial
bytes mechanism is in fact not used.

Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -9835,15 +9835,25 @@ rs6000_arg_partial_bytes (cumulative_arg
  tree type, bool named)
 {
   CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+  bool passed_in_gprs = true;
   int ret = 0;
   int align_words;
 
   if (DEFAULT_ABI == ABI_V4)
 return 0;
 
-  if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named)
-  && cum->nargs_prototype >= 0)
-return 0;
+  if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named))
+{
+  /* If we are passing this arg in the fixed parameter save area
+ (gprs or memory) as well as VRs, we do not use the partial
+bytes mechanism; instead, rs6000_function_arg wil return a
+PARALLEL including a memory element as necessary.  */
+  if (TARGET_64BIT && ! cum->prototype)
+   return 0;
+
+  /* Otherwise, we pass in VRs only.  No partial copy possible.  */
+  passed_in_gprs = false;
+}
 
   /* In this complicated case we just disable the partial_nregs code.  */
   if (TARGET_MACHO && rs6000_darwin64_struct_check_p (mode, type))
@@ -9853,24 +9863,27 @@ rs6000_arg_partial_bytes (cumulative_arg
 
   if (USE_FP_FOR_ARG_P (cum, mode))
 {
+  unsigned long n_fpreg = (GET_MODE_SIZE (mode) + 7) >> 3;
+
   /* If we are passing this arg in the fixed parameter save area
-(gprs or memory) as well as fprs, then this function should
-return the number of partial bytes passed in the parameter
-save area rather than partial bytes passed in fprs.  */
+ (gprs or memory) as well as FPRs, we do not use the partial
+bytes mechanism; instead, rs6000_function_arg wil return a
+PARALLEL including a memory element as necessary.  */
   if (type
  && (cum->nargs_prototype <= 0
  || (DEFAULT_ABI == ABI_AIX
  && TARGET_XL_COMPAT
  && align_words >= GP_ARG_NUM_REG)))
return 0;
-  else if (cum->fregno + ((GET_MODE_SIZE (mode) + 7) >> 3)
-  > FP_ARG_MAX_REG + 1)
+
+  /* Otherwise, we pass in FPRs only.  Check for partial copies.  */
+  passed_in_gprs = false;
+  if (cum->fregno + n_fpreg > FP_ARG_MAX_REG + 1)
ret = (FP_ARG_MAX_REG + 1 - cum->fregno) * 8;
-  else if (cum->nargs_prototype >= 0)
-   return 0;
 }
 
-  if (align_words < GP_ARG_NUM_REG
+  if (passed_in_gprs
+  && align_words < GP_ARG_NUM_REG
   && GP_ARG_NUM_REG < align_words + rs6000_arg_size (mode, type))
 ret = (GP_ARG_NUM_REG - align_words) * (TARGET_32BIT ? 4 : 8);
 
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

[PATCH, rs6000] ELFv2 ABI 3/8: Track single CR fields in DWARF CFI

2013-11-11 Thread Ulrich Weigand

Hello,

this patch adds another ELFv2 ABI feature: explicit tracking of CR fields
in DWARF CFI.

In the current ABI, DWARF CFI contains only a single record describing
the save location of the whole CR field.  It is implicit that all (or
at least all call-clobbered) fields are present at that location.

Now, if you use the instructions that save and restore the whole CR
at once, this approach might seem reasonable.  Unfortunately, with
current POWER processors, those instructions tend to be significantly
slower that those that access only single CR fields.

In particular in routines where only one or two CR fields are actually
clobbered and need to be saved, we could improve performance of prolog
and epilog code by saving/restoring only selected CR fields.  However,
this is not possible in the current ABI since there is no way to
describe this fact in the CFI.

With the ELFv2 ABI, every CR field gets its own CFI record (using the
register numbers 68 .. 75 to stand for CR0 .. CR7).  Now, those fields
will still usually be saved in the same 4-byte field on the stack.
The semantics of a CFI record for field CRx is that the memory location
holds 4 bytes, and the 4-bit nibble corresponding to CRx within those
4 bytes hold the CRx value to be restored.

The one problem with this scheme is the way uw_install_context tries
to modify saved valued when unwinding the stack: it will simply copy
over the whole field into the save slot of the unwinder routine
(that calls __builtin_eh_return).  This clearly does not work if
multiple CR fields need to be restored independently.

To fix this, the prolog/epilog code for unwinder routines will use
*multiple* stack slots, one for each call-saved CR fields, and save
and restore those fields to and from their own slot.  This will
allow uw_install_context to install values for multiple fields.
(Note that there is already precedent for unwinder routines being
treated specially in the rs6000.c prologue/epilogue code ...)

Bye,
Ulrich


gcc/ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/rs6000.c (struct rs6000_stack): New member ehcr_offset.
(rs6000_stack_info): For ABI_ELFv2, allocate space for separate CR
field save areas if the function calls __builtin_eh_return.
(rs6000_emit_move_from_cr): New function.
(rs6000_emit_prologue): Use it.  For ABI_ELFv2, generate separate
CFI records for each saved CR field.  For functions that call
__builtin_eh_return, save all CR fields into separate slots.
(restore_saved_cr): For ABI_ELFv2, generate separate CFA_RESTORE
entries for each saved CR field.
(add_crlr_cfa_restore): Likewise.
(rs6000_emit_epilogue): For ABI_ELFv2, if the function calls
__builtin_eh_return, restore each CR field from its own slot.

libgcc/ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/linux-unwind.h (R_CR3, R_CR4): New macros.
(ppc_fallback_frame_state) [_CALL_ELF == 2]: Create CFI entry
for CR3 and CR4.


Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -97,6 +97,7 @@ typedef struct rs6000_stack {
   int spe_gp_save_offset;  /* offset to save spe 64-bit gprs  */
   int varargs_save_offset; /* offset to save the varargs registers */
   int ehrd_offset; /* offset to EH return data */
+  int ehcr_offset; /* offset to EH CR field data */
   int reg_size;/* register size (4 or 8) */
   HOST_WIDE_INT vars_size; /* variable save area size */
   int parm_size;   /* outgoing parameter size */
@@ -19847,6 +19848,7 @@ rs6000_stack_info (void)
   rs6000_stack_t *info_ptr = &stack_info;
   int reg_size = TARGET_32BIT ? 4 : 8;
   int ehrd_size;
+  int ehcr_size;
   int save_align;
   int first_gp;
   HOST_WIDE_INT non_fixed_size;
@@ -19940,6 +19942,18 @@ rs6000_stack_info (void)
   else
 ehrd_size = 0;
 
+  /* In the ELFv2 ABI, we also need to allocate space for separate
+ CR field save areas if the function calls __builtin_eh_return.  */
+  if (DEFAULT_ABI == ABI_ELFv2 && crtl->calls_eh_return)
+{
+  /* This hard-codes that we have three call-saved CR fields.  */
+  ehcr_size = 3 * reg_size;
+  /* We do *not* use the regular CR save mechanism.  */
+  info_ptr->cr_save_p = 0;
+}
+  else
+ehcr_size = 0;
+
   /* Determine various sizes.  */
   info_ptr->reg_size = reg_size;
   info_ptr->fixed_size   = RS6000_SAVE_AREA;
@@ -20009,6 +20023,8 @@ rs6000_stack_info (void)
}
   else
info_ptr->ehrd_offset  = info_ptr->gp_save_offset - ehrd_size;
+
+  info_ptr->ehcr_offset  = info_ptr->ehrd_offset - ehcr_size;
   info_ptr->cr_save_offset   = reg_size; /* first word when 64-bit.  */
   info_ptr->lr_save_offset   = 2*re

[PATCH, rs6000] ELFv2 ABI 2/8: Remove function descriptors

2013-11-11 Thread Ulrich Weigand

Hello,

this patch adds the first major feature of the ELFv2 ABI: removal
of function descriptors.


In the current ABI, it is the responsibility of a caller to set
up the TOC pointer needed by the callee by loading the correct
value into r2.  This typically happens in one of these ways:

- For a direct function call within the same module, caller and
  callee share the same TOC, so r2 already contains the correct
  value.

- For a direct function call to a function in another module,
  the linker will interpose a PLT stub which reloads r2 to be
  appropriate for the callee, with help from ld.so.

- For an indirect function call, the "function pointer" points
  to function descriptor that holds the target function address
  as well as the appropriate TOC value (and a static chain
  value for nested functions).  The caller must load those
  values up before transfering control to the callee.


With the new ABI, it is the responsibility of the callee to
set up its own TOC pointer.  This has a number of advantages,
in particular for a callee that doesn't actually need a TOC,
and makes the function pointer call sequence more effective
by avoiding pipeline hazards.

However, it remains true that with the PowerPC ISA, it is
difficult to actually establish addressability to the TOC
without any outside help.  To avoid introducing new performance
regressions, the new ABI instead provides a dual entry point
scheme.

Any function that needs a TOC may provide two entry points:

- The first ("global") entry point is intended to called via
  indirect calls.  It expects r12 to be set up to point to
  the entry point address itself.  (Since in order to perform
  an indirect call, the address must be loaded into some GPR
  anyway, this does not impose a substantial impact on the
  caller ...)  Code at the global entry point is expected
  to load up r2 with the appropriate TOC value for this
  function (and it may use the r12 value to compute that),
  and then fall through to the local entry point.

- The second ("local") entry point expects r2 to be set up
  to the current TOC, much like an entry point in the current
  ABI does.  However, it must not expect r12 to hold any
  particular value.  This means that a local direct function
  call within the same module can be done via a simple "bl"
  instruction just as today.  Note that the linker will
  automatically direct a "bl func" to the *local* entry
  point associated with the symbol "func"; not the global one.

If local and global entry points coincide, the function may
expect no particular value in either r12 or r2; this can be
used by functions that do not need a TOC.

There exists just one single ELF symbol for both local and
global entry points; its symbol value refers to the global
entry point, and flag bits in ELF st_other encode the
offset from there to the local entry point.  The compiler
emits a .localentry directive to announce this location
to the assembler, which will encode it as appropriate.


This patch implements all aspects needed for this scheme:

- Emit ".abiversion 2" assembler directive to help the assembler
  and linker understand the ABI implemented in this file.
- Remove all handling of function descriptors / .opd section /
  dot symbols for the ELFv2 ABI.
- Output a global entry point prologue and local entry point
  marker for functions that use the TOC.
- Implement the ELFv2 function pointer call sequence.
- Use trampolines to handle nested functions, just like on
  32-bit (including marking the stack as executable).
- No longer scan for the old ABI indirect call sequence in
  linux-unwind.h.
- Update assembler code to the new ABI:
   - gcc/config/rs6000/ppc-asm.h (used by libgcc and elsewhere)
   - lititm/config/powerpc/sjlj.S
   - gcc/testsuite/gcc.target/powerpc/ppc64-abi-dfp-1.c

In addition, some related changes are necessary:

- Move the code generating the call to _mcount for -mprofile-kernel
  from output_function_profiler to rs6000_output_function_prologue,
  since this call must happen at the local entry point, not the
  global entry point.
- Disable (now useless) tests for -mpointers-to-nested-functions.
- Fix the libstc++ extract_symvers.in script to ignore markers
  emitted by readelf --symbols that indicate local entry points.
- Fix the "gotest" script to no longer expect function symbols
  to reside in the data segment.  (I understand this last bit
  needs to go into the Go repository first ...)

Bye,
Ulrich


gcc/ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/rs6000.c (machine_function): New member
r2_setup_needed.
(rs6000_emit_prologue): Set r2_setup_needed if necessary.
(rs6000_output_mi_thunk): Set r2_setup_needed.
(rs6000_output_function_prologue): Output global entry point
prologue and local entry point marker if needed for ABI_ELFv2.
Output -mprofile-kernel code here.

[PATCH, rs6000] ELFv2 ABI 1/8: Add options and infrastructure

2013-11-11 Thread Ulrich Weigand

Hello,

this is the first patch in the series to add support for the ELFv2 ABI.

The ELFv2 ABI is the intended ABI for the new powerpc64le-linux port.
However, it is not inherently tied to the byte order; it it possible
in principle to use the ELFv2 ABI in big-endian mode too.

Therefore, it is introduces via a new pair of options
   -mabi=elfv1 / -mabi=elfv2
where -mabi=elfv1 select the current Linux ABI, and -mabi=elfv2
selects the new one.

If neither option is given, the compiler defaults to whatever ABI
was specified at configure time using the
   --with-abi=elfv1 / --with-abi=elfv2
configure option.

If this was not specified either, every subtarget can provide a
master default by defining (or not) the LINUX64_DEFAULT_ABI_ELFv2
macro.

The patch series is structured as follows:

- This first patch adds the new enum rs6000_abi value ABI_ELFv2
  and all the configure logic needed to set it.  It is treated
  just like ABI_AIX (on a modern system) at this point; no change
  in generated code is expected yet.

- A series of follow on patches will add the various features
  defining the ELFv2 ABI.  Note that they will still not be
  active by default anywhere.

- The final patch will simply turn the switch to make ELFv2
  the default ABI on powerpc64le-linux.

The whole series was tested on:

  - powerpc64-linux with no regression, including no regression
in an ALT_CC_UNDER_TEST/ALT_CXX_UNDER_TEST run of compat.exp
and struct-layout-1.exp against a current compiler.

  - powerpc64-linux --with-abi=elfv2 in mock-up big-endian ELFv2
environment running with some hacks on an existing system.

  - powerpc64le-linux, using the new default ELFv2 in a rebuilt
small OS image using the new ABI everywhere.
 
(Of course, patches to binutils, glibc, and libffi to support the
new ELFv2 ABI are also required.)

The tests were using all languages (including Java, Go, and
Objective-C++; on powerpc64le-linux also Ada), and all target
libraries except libsanitizer (which currently seems broken
on mainline).

There is a small number of regressions on powerpc64le-linux
which seem to be due to unrelated issues:

- FAIL: g++.dg/cpp1y/vla-initlist1.C execution test
  This is an invalid assumption in the test case

- FAIL: go.test/test/fixedbugs/bug296.go execution,  -O2 -g
  The ELFv2 ABI  seems to be exposing a Go frontend bug here

- FAIL: runtime/pprof   (in libgo)
  This is due to a glibc problem with unwinding through a
  context created via makecontext

I'll address those separately.  (Of course, there are also
other pre-existing regressions on powerpc64le-linux simply
due to little-endian issues.  Those are unaffected by ELFv2.)


I'll add more detailed descriptions of the actual ABI features
with the various follow-on patches that implement those.

This patch specifically only adds the options as described above.
ABI_ELFv2 is implemented to be equivalent to ABI_AIX on a modern
system; since the ABI is new, there is no need to continue to
support legacy features.  This means specifically:
- DOT_SYMBOLS is assumed to be false
- rs6000_compat_align_parm is assumed to be false
- code in linux-unwind.h that handles old kernels/linkers is removed

The patch also adds two new pre-defined macros:
- _CALL_ELF is set to 1 or 2 to denote the ELFv1 / ELFv2 ABI
- __STRUCT_PARM_ALIGN__ is set to 16 if aggregates passed by value
  are aligned to 16 if their native alignment requires that
  (this is necessary to implement libffi, for example)

In addition, the patch adds a testsuite effective target check
for the ELFv2 ABI, and disables the pr57949 tests (these verify
the operation of the -mcompat-align-parm option, which is no
longer useful in the ELFv2 ABI).


To avoid having a partial ABI implementation in tree, it seems best
to commit this whole patch series as a single commit.

Is the series OK for mainline?

Bye,
Ulrich




gcc/ChangeLog:

2013-11-11  Ulrich Weigand  

* config.gcc [powerpc*-*-* | rs6000-*-*]: Support --with-abi=elfv1
and --with-abi=elfv2.
* config/rs6000/option-defaults.h (OPTION_DEFAULT_SPECS): Add "abi".
* config/rs6000/rs6000.opt (mabi=elfv1): New option.
(mabi=elfv2): Likewise.
* config/rs6000/rs6000-opts.h (enum rs6000_abi): Add ABI_ELFv2.
* config/rs6000/linux64.h (DEFAULT_ABI): Do not hard-code to AIX_ABI
if !RS6000_BI_ARCH.
(ELFv2_ABI_CHECK): New macro.
(SUBSUBTARGET_OVERRIDE_OPTIONS): Use it to decide whether to set
rs6000_current_abi to ABI_AIX or ABI_ELFv2.
(GLIBC_DYNAMIC_LINKER64): Support ELFv2 ld.so version.
* config/rs6000/rs6000-c.c (rs6000_cpu_cpp_builtins): Predefine
_CALL_ELF and __STRUCT_PARM_ALIGN__ if appropriate.

* config/rs6000/rs6000.c (rs6000_debug_reg_global): Handle ABI_ELFv2.
(debug_stack_info): Likewise.
(rs6000_file_start): Treat ABI_ELFv2 the same as ABI_AIX.
(rs6000_legitimize_tl

[PATCH, rs6000] ELFv2 ABI 4/8: Struct passing calling convention changes

2013-11-11 Thread Ulrich Weigand

Hello,

this patch implements the new struct calling convention for ELFv2.

With the new ABI, so-called "homogeneous" aggregates, i.e. struct, arrays,
or unions that (recursively) contain only elements of the same floating-
point or vector type are passed as if those elements were passed as
separate arguments.  (This is done for up to 8 such elements.)

After the refactoring of the rs6000_function_arg / rs6000_arg_partial_bytes
logic that was done in a prior patch, implementing support for
homogenous aggregates is now mostly straightforward.  Note that
rs6000_psave_function_arg can now get called for BLKmode arguments,
and has to handle them using a PARALLEL.

Bye,
Ulrich


ChangeLog:

2013-11-11  Ulrich Weigand  
Michael Gschwind  

* config/rs6000/rs6000.h (AGGR_ARG_NUM_REG): Define.
* config/rs6000/rs6000.c (rs6000_aggregate_candidate): New function.
(rs6000_discover_homogeneous_aggregate): Likewise.
(rs6000_function_arg_boundary): Handle homogeneous aggregates.
(rs6000_function_arg_advance_1): Likewise.
(rs6000_function_arg): Likewise.
(rs6000_arg_partial_bytes): Likewise.
(rs6000_psave_function_arg): Handle BLKmode arguments.

Index: gcc/gcc/config/rs6000/rs6000.h
===
--- gcc.orig/gcc/config/rs6000/rs6000.h
+++ gcc/gcc/config/rs6000/rs6000.h
@@ -1641,6 +1641,9 @@ extern enum reg_class rs6000_constraints
 #define ALTIVEC_ARG_MAX_REG (ALTIVEC_ARG_MIN_REG + 11)
 #define ALTIVEC_ARG_NUM_REG (ALTIVEC_ARG_MAX_REG - ALTIVEC_ARG_MIN_REG + 1)
 
+/* Maximum number of registers per ELFv2 homogeneous aggregate argument.  */
+#define AGGR_ARG_NUM_REG 8
+
 /* Return registers */
 #define GP_ARG_RETURN GP_ARG_MIN_REG
 #define FP_ARG_RETURN FP_ARG_MIN_REG
Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -8460,6 +8460,219 @@ rs6000_member_type_forces_blk (const_tre
&& TARGET_ALTIVEC_ABI   \
&& (NAMED))
 
+/* Walk down the type tree of TYPE counting consecutive base elements.
+   If *MODEP is VOIDmode, then set it to the first valid floating point
+   or vector type.  If a non-floating point or vector type is found, or
+   if a floating point or vector type that doesn't match a non-VOIDmode
+   *MODEP is found, then return -1, otherwise return the count in the
+   sub-tree.  */
+
+static int
+rs6000_aggregate_candidate (const_tree type, enum machine_mode *modep)
+{
+  enum machine_mode mode;
+  HOST_WIDE_INT size;
+
+  switch (TREE_CODE (type))
+{
+case REAL_TYPE:
+  mode = TYPE_MODE (type);
+  if (!SCALAR_FLOAT_MODE_P (mode))
+   return -1;
+
+  if (*modep == VOIDmode)
+   *modep = mode;
+
+  if (*modep == mode)
+   return 1;
+
+  break;
+
+case COMPLEX_TYPE:
+  mode = TYPE_MODE (TREE_TYPE (type));
+  if (!SCALAR_FLOAT_MODE_P (mode))
+   return -1;
+
+  if (*modep == VOIDmode)
+   *modep = mode;
+
+  if (*modep == mode)
+   return 2;
+
+  break;
+
+case VECTOR_TYPE:
+  if (!TARGET_ALTIVEC_ABI || !TARGET_ALTIVEC)
+   return -1;
+
+  /* Use V4SImode as representative of all 128-bit vector types.  */
+  size = int_size_in_bytes (type);
+  switch (size)
+   {
+   case 16:
+ mode = V4SImode;
+ break;
+   default:
+ return -1;
+   }
+
+  if (*modep == VOIDmode)
+   *modep = mode;
+
+  /* Vector modes are considered to be opaque: two vectors are
+equivalent for the purposes of being homogeneous aggregates
+if they are the same size.  */
+  if (*modep == mode)
+   return 1;
+
+  break;
+
+case ARRAY_TYPE:
+  {
+   int count;
+   tree index = TYPE_DOMAIN (type);
+
+   /* Can't handle incomplete types.  */
+   if (!COMPLETE_TYPE_P (type))
+ return -1;
+
+   count = rs6000_aggregate_candidate (TREE_TYPE (type), modep);
+   if (count == -1
+   || !index
+   || !TYPE_MAX_VALUE (index)
+   || !host_integerp (TYPE_MAX_VALUE (index), 1)
+   || !TYPE_MIN_VALUE (index)
+   || !host_integerp (TYPE_MIN_VALUE (index), 1)
+   || count < 0)
+ return -1;
+
+   count *= (1 + tree_low_cst (TYPE_MAX_VALUE (index), 1)
+ - tree_low_cst (TYPE_MIN_VALUE (index), 1));
+
+   /* There must be no padding.  */
+   if (!host_integerp (TYPE_SIZE (type), 1)
+   || (tree_low_cst (TYPE_SIZE (type), 1)
+   != count * GET_MODE_BITSIZE (*modep)))
+ return -1;
+
+   return count;
+  }
+
+case RECORD_TYPE:
+  {
+   int count = 0;
+   int sub_count;
+   tree field;
+
+   /* Can't handle incomplete types.  */
+   if (!COMPLETE_TYPE_P (type

[PATCH, rs6000] ELFv2 ABI 7/8: Eliminate some stack frame fields

2013-11-11 Thread Ulrich Weigand

Hello,

this is the second part of reducing stack space consumption for the
ELFv2 ABI: the old ABI reserved six words in every stack frame for
various purposes, and two of those were basically unused: one word
for "compiler use" and one word for "linker use".

Since neither the compiler nor the linker actually ever made any
nontrivial use of these fields, they're now eliminated in the new
ABI.

This patch implements this change, which is mostly straightforward
except for the fact that due to the change, the stack offset of
the TOC save area changes, which was hard-coded in a number of
places ...

The patch also updates a number of test cases that hardcoded the
stack layout.

Bye,
Ulrich


gcc/ChangeLog:

2013-11-11  Ulrich Weigand  
Alan Modra  

* config/rs6000/rs6000.h (RS6000_SAVE_AREA): Handle ABI_ELFv2.
(RS6000_SAVE_TOC): Remove.
(RS6000_TOC_SAVE_SLOT): New macro.
* config/rs6000/rs6000.c (rs6000_parm_offset): New function.
(rs6000_parm_start): Use it.
(rs6000_function_arg_advance_1): Likewise.
(rs6000_emit_prologue): Use RS6000_TOC_SAVE_SLOT.
(rs6000_emit_epilogue): Likewise.
(rs6000_call_aix): Likewise.
(rs6000_output_function_prologue): Do not save/restore r11
around calling _mcount for ABI_ELFv2.

libgcc/ChangeLog:

2013-11-11  Ulrich Weigand  
Alan Modra  

* config/rs6000/linux-unwind.h (TOC_SAVE_SLOT): Define.
(frob_update_context): Use it.

gcc/testsuite/ChangeLog:

2013-11-11  Ulrich Weigand  

* gcc.target/powerpc/ppc64-abi-1.c (stack_frame_t): Remove
compiler and linker field if _CALL_ELF == 2.
* gcc.target/powerpc/ppc64-abi-2.c (stack_frame_t): Likewise.
* gcc.target/powerpc/ppc64-abi-dfp-1.c (stack_frame_t): Likewise.
* gcc.dg/stack-usage-1.c (SIZE): Update value for _CALL_ELF == 2.


Index: gcc/gcc/config/rs6000/rs6000.h
===
--- gcc.orig/gcc/config/rs6000/rs6000.h
+++ gcc/gcc/config/rs6000/rs6000.h
@@ -1529,12 +1529,12 @@ extern enum reg_class rs6000_constraints
 
 /* Size of the fixed area on the stack */
 #define RS6000_SAVE_AREA \
-  ((DEFAULT_ABI == ABI_V4 ? 8 : 24) << (TARGET_64BIT ? 1 : 0))
+  ((DEFAULT_ABI == ABI_V4 ? 8 : DEFAULT_ABI == ABI_ELFv2 ? 16 : 24)\
+   << (TARGET_64BIT ? 1 : 0))
 
-/* MEM representing address to save the TOC register */
-#define RS6000_SAVE_TOC gen_rtx_MEM (Pmode, \
-plus_constant (Pmode, stack_pointer_rtx, \
-   (TARGET_32BIT ? 20 : 40)))
+/* Stack offset for toc save slot.  */
+#define RS6000_TOC_SAVE_SLOT \
+  ((DEFAULT_ABI == ABI_ELFv2 ? 12 : 20) << (TARGET_64BIT ? 1 : 0))
 
 /* Align an address */
 #define RS6000_ALIGN(n,a) (((n) + (a) - 1) & ~((a) - 1))
Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -9029,6 +9029,16 @@ rs6000_function_arg_boundary (enum machi
 return PARM_BOUNDARY;
 }
 
+/* The offset in words to the start of the parameter save area.  */
+
+static unsigned int
+rs6000_parm_offset (void)
+{
+  return (DEFAULT_ABI == ABI_V4 ? 2
+ : DEFAULT_ABI == ABI_ELFv2 ? 4
+ : 6);
+}
+
 /* For a function parm of MODE and TYPE, return the starting word in
the parameter area.  NWORDS of the parameter area are already used.  */
 
@@ -9037,11 +9047,9 @@ rs6000_parm_start (enum machine_mode mod
   unsigned int nwords)
 {
   unsigned int align;
-  unsigned int parm_offset;
 
   align = rs6000_function_arg_boundary (mode, type) / PARM_BOUNDARY - 1;
-  parm_offset = DEFAULT_ABI == ABI_V4 ? 2 : 6;
-  return nwords + (-(parm_offset + nwords) & align);
+  return nwords + (-(rs6000_parm_offset () + nwords) & align);
 }
 
 /* Compute the size (in words) of a function argument.  */
@@ -9281,15 +9289,13 @@ rs6000_function_arg_advance_1 (CUMULATIV
{
  int align;
 
- /* Vector parameters must be 16-byte aligned.  This places
-them at 2 mod 4 in terms of words in 32-bit mode, since
-the parameter save area starts at offset 24 from the
-stack.  In 64-bit mode, they just have to start on an
-even word, since the parameter save area is 16-byte
-aligned.  Space for GPRs is reserved even if the argument
-will be passed in memory.  */
+ /* Vector parameters must be 16-byte aligned.  In 32-bit
+mode this means we need to take into account the offset
+to the parameter save area.  In 64-bit mode, they just
+have to start on an even word, since the parameter save
+area is 16-byte aligned.  */
  if (TARGET_32BIT)
-   align = (2 - cum->words) & 3;

[PATCH, rs6000] ELFv2 ABI 5/8: Struct return calling convention changes

2013-11-11 Thread Ulrich Weigand

Hello,

analogously to the previous patch handling homogeneous aggregates
as arguments, this patch handles such aggregates as return values.

In addition, the ELFv2 ABI returns any aggregate of up to 16 bytes
in size (that is not otherwise handled) in up to two GPRs.

In either case, the return value is returned formatted exactly as if
it were being passed as first argument.  In order to get this correct
in the big-endian ELFv2 case, we need to provide a non-default
implementation of TARGET_RETURN_IN_MSB.

Bye,
Ulrich


ChangeLog:

2013-11-11  Ulrich Weigand  
Michael Gschwind  

* config/rs6000/rs6000.h (FP_ARG_MAX_RETURN): New macro.
(ALTIVEC_ARG_MAX_RETURN): Likewise.
(FUNCTION_VALUE_REGNO_P): Use them.
* config/rs6000/rs6000.c (TARGET_RETURN_IN_MSB): Define.
(rs6000_return_in_msb): New function.
(rs6000_return_in_memory): Handle ELFv2 homogeneous aggregates.
Handle aggregates of up to 16 bytes for ELFv2.
(rs6000_function_value): Handle ELFv2 homogeneous aggregates.


Index: gcc/gcc/config/rs6000/rs6000.h
===
--- gcc.orig/gcc/config/rs6000/rs6000.h
+++ gcc/gcc/config/rs6000/rs6000.h
@@ -1648,6 +1648,10 @@ extern enum reg_class rs6000_constraints
 #define GP_ARG_RETURN GP_ARG_MIN_REG
 #define FP_ARG_RETURN FP_ARG_MIN_REG
 #define ALTIVEC_ARG_RETURN (FIRST_ALTIVEC_REGNO + 2)
+#define FP_ARG_MAX_RETURN (DEFAULT_ABI != ABI_ELFv2 ? FP_ARG_RETURN\
+  : (FP_ARG_RETURN + AGGR_ARG_NUM_REG - 1))
+#define ALTIVEC_ARG_MAX_RETURN (DEFAULT_ABI != ABI_ELFv2 ? ALTIVEC_ARG_RETURN \
+   : (ALTIVEC_ARG_RETURN + AGGR_ARG_NUM_REG - 1))
 
 /* Flags for the call/call_value rtl operations set up by function_arg */
 #define CALL_NORMAL0x  /* no special processing */
@@ -1667,8 +1671,10 @@ extern enum reg_class rs6000_constraints
On RS/6000, this is r3, fp1, and v2 (for AltiVec).  */
 #define FUNCTION_VALUE_REGNO_P(N)  \
   ((N) == GP_ARG_RETURN
\
-   || ((N) == FP_ARG_RETURN && TARGET_HARD_FLOAT && TARGET_FPRS)   \
-   || ((N) == ALTIVEC_ARG_RETURN && TARGET_ALTIVEC && TARGET_ALTIVEC_ABI))
+   || ((N) >= FP_ARG_RETURN && (N) <= FP_ARG_MAX_RETURN
\
+   && TARGET_HARD_FLOAT && TARGET_FPRS)\
+   || ((N) >= ALTIVEC_ARG_RETURN && (N) <= ALTIVEC_ARG_MAX_RETURN  \
+   && TARGET_ALTIVEC && TARGET_ALTIVEC_ABI))
 
 /* 1 if N is a possible register number for function argument passing.
On RS/6000, these are r3-r10 and fp1-fp13.
Index: gcc/gcc/config/rs6000/rs6000.c
===
--- gcc.orig/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -1449,6 +1449,9 @@ static const struct attribute_spec rs600
 #undef TARGET_RETURN_IN_MEMORY
 #define TARGET_RETURN_IN_MEMORY rs6000_return_in_memory
 
+#undef TARGET_RETURN_IN_MSB
+#define TARGET_RETURN_IN_MSB rs6000_return_in_msb
+
 #undef TARGET_SETUP_INCOMING_VARARGS
 #define TARGET_SETUP_INCOMING_VARARGS setup_incoming_varargs
 
@@ -8725,6 +8728,16 @@ rs6000_return_in_memory (const_tree type
   /* Otherwise fall through to more conventional ABI rules.  */
 }
 
+  /* The ELFv2 ABI returns homogeneous VFP aggregates in registers */
+  if (rs6000_discover_homogeneous_aggregate (TYPE_MODE (type), type,
+NULL, NULL))
+return false;
+
+  /* The ELFv2 ABI returns aggregates up to 16B in registers */
+  if (DEFAULT_ABI == ABI_ELFv2 && AGGREGATE_TYPE_P (type)
+  && (unsigned HOST_WIDE_INT) int_size_in_bytes (type) <= 16)
+return false;
+
   if (AGGREGATE_TYPE_P (type)
   && (aix_struct_return
  || (unsigned HOST_WIDE_INT) int_size_in_bytes (type) > 8))
@@ -8756,6 +8769,19 @@ rs6000_return_in_memory (const_tree type
   return false;
 }
 
+/* Specify whether values returned in registers should be at the most
+   significant end of a register.  We want aggregates returned by
+   value to match the way aggregates are passed to functions.  */
+
+static bool
+rs6000_return_in_msb (const_tree valtype)
+{
+  return (DEFAULT_ABI == ABI_ELFv2
+ && BYTES_BIG_ENDIAN
+ && AGGREGATE_TYPE_P (valtype)
+ && FUNCTION_ARG_PADDING (TYPE_MODE (valtype), valtype) == upward);
+}
+
 #ifdef HAVE_AS_GNU_ATTRIBUTE
 /* Return TRUE if a call to function FNDECL may be one that
potentially affects the function calling ABI of the object file.  */
@@ -29897,6 +29923,8 @@ rs6000_function_value (const_tree valtyp
 {
   enum machine_mode mode;
   unsigned int regno;
+  enum machine_mode elt_mode;
+  int n_elts;
 
   /* Special handling for s

[PATCH, rs6000] ELFv2 ABI 6/8: Eliminate register save area in some cases

2013-11-11 Thread Ulrich Weigand

Hello,

the final area of changes implemented by the ELFv2 ABI is stack space
consumption.  Reducing the miminum stack size requirements is important
for environments where stack space is restricted, e.g. Linux kernel code,
or where we have a large number of stacks (e.g. heavily multi-threaded
applications).

One reason for the large stack space requirement of the current ABI is
that every caller must provide an argument save area of 64 bytes to its
caller.

This is actually useful in one specific case: if the called function
uses variable arguments and constructs a va_list.  Because of the way
the ABI is designed, the callee can in this case simply store the 8
GPRs (potentially) carrying arguments into that save area, and is
then guaranteed to have all function arguments in a linear arrangement
on the stack, which makes a trivial va_arg implementation possible.

If the callee is not vararg, this wouldn't really be necessary.  However,
if we were to skip that area then, vararg and non-vararg routines would
have incompatible ABIs, which would make it impossible to generate code
for a function call if the caller does not have a prototype of the called
function.  Therefore the old ABI requires the save area to be present
always.

With the new ABI, we wanted to retain the possibility of using a
trivial va_arg implementation, and also the possibility of calling
non-prototyped routines safely.  However, even so, there is a set
of cases where it is still not necessary to provide a save area:
when calling a function we have a prototype for and know that it
is neither vararg nor uses on-stack arguments.

This patch implement this change by having REG_PARM_STACK_SPACE
return a different value depending on whether the called routine
requires on-stack arguments (and we have a prototype).


There was one problem exposed by this change: rs6000_function_arg
contained this piece of code:

  if (mode == BLKmode)
mode = Pmode;

which marked the mode of a GPR holding an incoming struct argument
as Pmode.  This is now causing a problem.  When REG_PARM_STACK_SPACE
returns 0, function.c must allocate a stack slot within the callee's
frame to serve as DECL_RTL for the parameter.  This is done in
assign_parm_setup_block.  This routine attempts to use the mode
of the incoming register as the mode of that stack slot, if the
sizes match.

This means that the DECL_RTL for an argument of struct type may
now have mode Pmode, even if that type cannot reliably be operated
on in Pmode, which causes ICEs in several test cases for me.

Note that when REG_PARM_STACK_SPACE is non-zero, this problem does
not occur, because in this case, function.c allocates the stack slot
for the parameter's DECL_RTL in the register save area, using a
different function assign_parm_find_stack_rtl.   *That* function
will keep the mode as BLKmode even if the incoming register is
in another mode.

Removing the above two lines from rs6000_function_arg fixes this
problem, and does not appear to introduce any problem for the
ELFv1 ABI case either (full bootstrap / regtest still passes).

Looking back at the ChangeLog, those two lines where added by
Alan Modra back in 2004:
http://gcc.gnu.org/ml/gcc/2004-11/msg00617.html
to fix ICEs after a change by Richard Henderson:
http://gcc.gnu.org/ml/gcc/2004-11/msg00569.html
but Richard shortly afterwards reverted that change again
since it caused problems elsewhere too:
http://gcc.gnu.org/ml/gcc/2004-11/msg00751.html

So all in all, it seems this change by Alan was simply not
necessary and can be reverted.


Bye,
Ulrich


ChangeLog:

2013-11-11  Ulrich Weigand  
Alan Modra  

* config/rs6000/rs6000-protos.h (rs6000_reg_parm_stack_space):
Add prototype.
* config/rs6000/rs6000.h (RS6000_REG_SAVE): Remove.
(REG_PARM_STACK_SPACE): Call rs6000_reg_parm_stack_space.
* config/rs6000/rs6000.c (rs6000_parm_needs_stack): New function.
(rs6000_function_parms_need_stack): Likewise.
(rs6000_reg_parm_stack_space): Likewise.
(rs6000_function_arg): Do not replace BLKmode by Pmode when
returning a register argument.

Index: gcc/gcc/config/rs6000/rs6000-protos.h
===
--- gcc.orig/gcc/config/rs6000/rs6000-protos.h
+++ gcc/gcc/config/rs6000/rs6000-protos.h
@@ -158,6 +158,7 @@ extern tree altivec_resolve_overloaded_b
 extern rtx rs6000_libcall_value (enum machine_mode);
 extern rtx rs6000_va_arg (tree, tree);
 extern int function_ok_for_sibcall (tree);
+extern int rs6000_reg_parm_stack_space (tree);
 extern void rs6000_elf_declare_function_name (FILE *, const char *, tree);
 extern bool rs6000_elf_in_small_data_p (const_tree);
 #ifdef ARGS_SIZE_RTX
Index: gcc/gcc/config/rs6000/rs6000.h
===
--- gcc.orig/gcc/config/rs6000/rs6000.h
+++ gcc/gcc/config/rs6000/rs6000.h
@@ -1527,10 +1527,6 @@ extern

[PATCH, rs6000] ELFv2 ABI 8/8: Enable by default on little-endian

2013-11-11 Thread Ulrich Weigand

Hello,

this patch finally throws the switch and enables the ELFv2 ABI
by default on powerpc64le-linux.

Bye,
Ulrich


ChangeLog:

2013-11-11  Ulrich Weigand  

* config/rs6000/sysv4le.h (LINUX64_DEFAULT_ABI_ELFv2): Define.


Index: gcc/gcc/config/rs6000/sysv4le.h
===
--- gcc.orig/gcc/config/rs6000/sysv4le.h
+++ gcc/gcc/config/rs6000/sysv4le.h
@@ -34,3 +34,7 @@
 
 #undef MULTILIB_DEFAULTS
 #defineMULTILIB_DEFAULTS { "mlittle", "mcall-sysv" }
+
+/* Little-endian PowerPC64 Linux uses the ELF v2 ABI by default.  */
+#define LINUX64_DEFAULT_ABI_ELFv2
+
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

1 2 3 4 5 >

1 - 100 of 480 matches

Mail list logo