Re: [PATCH] v2 PR102024 - IBM Z: Add psabi diagnostics

2022-04-26 Thread Ulrich Weigand via Gcc-patches
Andreas Krebbel  wrote:

>gcc/ChangeLog:
>PR target/102024
>* config/s390/s390-protos.h (s390_function_arg_vector): Remove
>prototype.
>* config/s390/s390.cc (s390_single_field_struct_p): New
function.
>(s390_function_arg_vector): Invoke s390_single_field_struct_p.
>(s390_function_arg_float): Likewise.

This looks good to me.

Bye,
Ulrich



Re: [PATCH] gdb-power10-single-step

2021-03-26 Thread Ulrich Weigand via Gcc-patches
On Thu, Mar 25, 2021 at 12:21:42PM -0500, will schmidt wrote:
> On Wed, 2021-03-10 at 18:50 +0100, Ulrich Weigand wrote:
> > Will Schmidt wrote:
> > 
> > >   This is a patch written by Alan Modra.  With his permission
> > > I'm submitting this for review and helping get this upstream.
> > > 
> > > Powerpc / Power10 ISA 3.1 adds prefixed instructions, which
> > > are 8 bytes in length.  This is in contrast to powerpc previously
> > > always having 4 byte instruction length.  This patch implements
> > > changes to allow GDB to better detect prefixed instructions, and
> > > handle single stepping across the 8 byte instructions.
> > 
> > There's a few issues I see here:
> 
> I've dug in a bit more,.. have a few questions related to the patch and
> the comments here.  I've got a refreshed version of this patch in my
> queue, with a nit or two that i'm  still trying to understand and
> squash before I post it.
> 
> > 
> > - The patch now *forces* software single-stepping for all 8-byte
> >   instructions.  I'm not sure why this is necessary; I thought
> >   that hardware single-stepping was supported for 8-byte instructions
> >   as well?  That would certainly be preferable.
> 
> 
> Does software single-stepping go hand-in-hand with executing the
> instructions from a displaced location?

Yes.  Hardware single-step executes the instruction where it is.
Software single-step needs to replace the subsequent instruction
with a breakpoint, and in order to be able to do that without
unduly affecting simultaneous execution of that code in other
threads, this is not done in place, but in a copy in a displaced
location.
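The scheme described above can be sketched as a toy model (illustrative only — GDB's real implementation lives in its displaced-stepping support, and the trap encoding and buffer layout here are invented):

```c
#include <stdint.h>
#include <string.h>

#define TRAP 0x7fe00008u  /* stand-in breakpoint encoding, not GDB's actual choice */

/* Scratch area standing in for the displaced-step buffer.  */
static uint32_t scratch[3];

/* Software single-step over a displaced copy: the original code is left
   untouched, so other threads racing through it are unaffected; the
   instruction (1 word, or 2 words for an 8-byte prefixed insn) is copied
   to the scratch buffer and a breakpoint is planted right after it.  */
static uint32_t *displace_insn(const uint32_t *code, int words)
{
  memcpy(scratch, code, (size_t)words * 4);
  scratch[words] = TRAP;  /* breakpoint after the copy, not in place */
  return scratch;
}
```

Hardware single-step, by contrast, needs none of this bookkeeping, which is why it is preferable where the hardware supports 8-byte instructions.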

> I only see one clear return-if-prefixed change in the patch, so I am
> assuming the above refers to the patch chunk seen as :
> > @@ -1081,6 +1090,10 @@ ppc_deal_with_atomic_sequence (struct regcache *regcache)
> >   const int atomic_sequence_length = 16; /* Instruction sequence length.  */
> >   int bc_insn_count = 0; /* Conditional branch instruction count.  */
> >int bc_insn_count = 0; /* Conditional branch instruction count.  */
> > 
> > +  /* Power10 prefix instructions are two words in length.  */
> > +  if ((insn & OP_MASK) == 1 << 26)
> > +return { pc + 8 };

Yes, this is what I was referring to.  By returning a PC value here,
common code is instructed to always perform a software single-step.
This should not be necessary.

> I've got a local change to eliminate that return.   Per the poking I've
> done so far, none of the prefix instructions I've run against so far
> allow us past the is_load_and_reserve instruction check. 
> >   if (!IS_LOAD_AND_RESERVE_INSN (insn))
> > return {};
> 
> statement, so not a significant code flow behavior change.

Yes, if you just remove the three lines above, code will fall
through to here and return an empty sequence, causing the
common code to use hardware single-step.

> > - However, the inner loop of ppc_deal_with_atomic_sequence should
> >   probably be updated to correctly skip 8-byte instructions; e.g.
> >   to avoid mistakenly recognizing the second word of an 8-byte
> >   instructions for a branch or store conditional.  (Also, the
> >   count of up to "16 instructions" is wrong if 8-byte instructions
> >   are not handled specifically.)
> 
> I've got a local change to inspect the instruction and determine if it
> is prefixed, so I think i've got this handled.  I'm generally assuming
> we will never start halfway through an 8-byte prefixed instruction.

Yes, you can assume the incoming PC value is a valid PC at the start
of the current instruction.
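The opcode check quoted above, `(insn & OP_MASK) == 1 << 26`, boils down to testing for primary opcode 1, which marks a prefixed instruction.  A hedged sketch (OP_MASK is assumed to be the usual 6-bit primary-opcode mask; the helper name is invented):

```c
#include <stdint.h>

#define OP_MASK (0x3fu << 26)  /* primary opcode field of a Power insn word */

/* Length in bytes of the instruction whose FIRST word is insn: prefixed
   instructions (primary opcode 1, ISA 3.1) are 8 bytes, all others 4.
   Per the thread, the incoming PC is assumed to point at the start of an
   instruction, never at the suffix word of a prefixed one.  */
static int ppc_insn_length(uint32_t insn)
{
  return (insn & OP_MASK) == (1u << 26) ? 8 : 4;
}
```

A loop such as the one in ppc_deal_with_atomic_sequence can then advance the PC by `ppc_insn_length (insn)` rather than a fixed 4, so it never misreads the suffix word of an 8-byte instruction as a branch or store-conditional.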

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com


Re: [PATCH, rs6000][PR gdb/27525] displaced stepping across addpcis/lnia.

2021-03-26 Thread Ulrich Weigand via Gcc-patches
On Tue, Mar 16, 2021 at 05:31:03PM -0500, will schmidt wrote:

>   This addresses PR gdb/27525. The lnia and other variations
> of the addpcis instruction write the value of the NIA into a target register.
> If we are single-stepping across a breakpoint, the instruction is executed
> from a displaced location, and thusly the written value of the PC/NIA
> will be incorrect.   The changes here will measure the displacement
> offset, and adjust the target register value to compensate.
> 
> This is written in a way that I believe will make it easier to
> update to handle prefixed (8 byte) instructions in a future patch.


This looks good to me functionally, but I'm not sure it really makes
much sense to extract code into those new routines -- *all* of the
ppc_displaced_step_fixup routine is about handling instructions that
read the PC, like the branches do.

I'd prefer if the new instructions were simply added to the existing
switch alongside the branches.

> +  displaced_offset = from - to ;  /* FIXME - By inspection, it appears
> + the displaced instruction is at a lower address.  Is this always true?  */

No, it could be either way.  But it shouldn't really matter since
you just need to apply the same displaced offset to the target,
whether the offset is positive or negative.  Again, you should
just do it the same way it is already done by existing code
that handles branches.
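The sign-agnostic fixup suggested here amounts to the following (the names `from`/`to` follow the quoted patch; the helper itself is hypothetical):

```c
#include <stdint.h>

/* An addpcis/lnia executed from the displaced address `to` produced a
   value based on `to`; adding the signed displacement from - to yields
   the value it would have produced at its original address `from`.
   Unsigned wraparound makes this correct whether from < to or from > to,
   so no assumption about which address is lower is needed.  */
static uint64_t fixup_displaced_pc_result(uint64_t result, uint64_t from, uint64_t to)
{
  return result + (from - to);
}
```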

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com


Re: [PATCH v2 01/31] PR target/58901: reload: Handle SUBREG of MEM with a mode-dependent address

2020-11-27 Thread Ulrich Weigand via Gcc-patches
On Tue, Nov 24, 2020 at 06:19:41AM +, Maciej W. Rozycki wrote: 

>  NB I find the reindentation resulting in `push_reload' awful, just as I 
> do either version of the massive logical expression involved.  Perhaps we 
> could factor these out into `static inline' functions sometime, and then 
> have them split into individual returns within?

I'm generally hesitant to do a lot of changes to the reload code base
at this stage.  The strategy rather is to move all remaining targets
over to LRA and then simply delete reload :-)

Given that you're modernizing the vax target, I'm wondering if you
shouldn't rather go all the way and move it over to LRA as well,
then you wouldn't be affected by any remaining reload deficiencies.
(The major hurdle so far was that LRA doesn't support CC0, but it
looks like you're removing that anyway ...)

> +  && (strict_low
> +   || (subreg_lowpart_p (in)
> +   && (CONSTANT_P (SUBREG_REG (in))
> +   || GET_CODE (SUBREG_REG (in)) == PLUS
> +   || strict_low

If we do this, then that "|| strict_low" clause is now redundant. 

> +   && REGNO (SUBREG_REG (in)) >= FIRST_PSEUDO_REGISTER
> +   && reg_equiv_mem (REGNO (SUBREG_REG (in)))
> +   && (mode_dependent_address_p
> +   (XEXP (reg_equiv_mem (REGNO (SUBREG_REG (in))), 0),
> +MEM_ADDR_SPACE (reg_equiv_mem (REGNO (SUBREG_REG (in)

I guess this is not incorrect, but I'm wondering if it would be
*complete* -- there are other cases where reload replaces a pseudo
with a MEM even where reg_equiv_mem is null.

That said, if it fixes the test suite errors you're seeing, it would
probably be OK to go with just this minimal change -- unless we can
just move to LRA as mentioned above.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com


Re: [PATCH v2] Fix -ffast-math flags handling inconsistencies

2020-11-24 Thread Ulrich Weigand via Gcc-patches
On Fri, Nov 20, 2020 at 09:33:48PM -0700, Jeff Law wrote:
> > * doc/invoke.texi (-ffast-math): Remove mention of -fno-signaling-nans.
> > Clarify conditions when __FAST_MATH__ preprocessor macro is defined.
> >
> > * opts.c (common_handle_option): Pass OPTS_SET to set_fast_math_flags
> > and set_unsafe_math_optimizations_flags.
> > (set_fast_math_flags): Add OPTS_SET argument, and use it to avoid
> > setting flags already explicitly set on the command line.  In the !set
> > case, also reset x_flag_cx_limited_range and x_flag_excess_precision.
> > Never reset x_flag_signaling_nans or x_flag_rounding_math.
> > (set_unsafe_math_optimizations_flags): Add OPTS_SET argument, and use
> > it to avoid setting flags already explicitly set on the command line.
> > (fast_math_flags_set_p): Also test x_flag_cx_limited_range,
> > x_flag_associative_math, x_flag_reciprocal_math, and
> > x_flag_rounding_math.
> It appears this was dropped on the floor. It looks reasonable to me.
> Please retest and commit. Thanks!

This did handle flag_excess_precision incorrectly, causing in particular
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97970

I've reverted for now and will post a modified patch later.
Sorry for the breakage!

At first glance, it appears the problem is that -fexcess-precision=fast
is already the default, so it does not make sense for -fno-fast-math to
"reset" this back to default.  Probably the best is for fast-math to
simply not touch excess precision at all:
- if it is set explicitly on the command line, fast-math should not
  affect it in any case;
- if it is not set explicitly on the command line, the default either
  with fast-math or no-fast-math is excess-precision=fast, so again
  it should not be touched.

I'll still need to look into the language-specific handling of the
excess precision setting to make sure this works for all languages.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com


[wwwdocs] Re: [PATCH v2] Fix -ffast-math flags handling inconsistencies

2020-11-24 Thread Ulrich Weigand via Gcc-patches
On Sat, Nov 21, 2020 at 01:57:32PM -0600, Segher Boessenkool wrote:
> On Fri, Nov 20, 2020 at 09:33:48PM -0700, Jeff Law wrote:
> > On 2/11/20 11:43 AM, Ulrich Weigand wrote:
> > > 1. If a component flag of -ffast-math (or -funsafe-math-optimizations)
> > >is explicitly set (or reset) on the command line, this should override
> > >any implicit change due to -f(no-)fast-math, no matter in which order
> > >the flags come on the command line.  This change affects all flags.
> > >
> > > 2. Any component flag modified from its default by -ffast-math should
> > >be reset to the default by -fno-fast-math.  This was previously
> > >not done for the following flags:
> > >   -fcx-limited-range
> > >   -fexcess-precision=
> > >
> > > 3. Once -ffinite-math-only is true, the -f(no-)signaling-nans flag has
> > >no meaning (if we have no NaNs at all, it does not matter whether
> > >there is a difference between quiet and signaling NaNs).  Therefore,
> > >it does not make sense for -ffast-math to imply -fno-signaling-nans.
> > >This is also a documentation change.
> > >
> > > 4. -ffast-math is documented to imply -fno-rounding-math, however the
> > >latter setting is the default anyway; therefore it does not make
> > >sense to try to modify it from its default setting.
> > >
> > > 5. The __FAST_MATH__ preprocessor macro should be defined if and only
> > >if all the component flags of -ffast-math are set to the value that
> > >is documented as the effect of -ffast-math.  The following flags
> > >were currently *not* so tested:
> > >  -fcx-limited-range
> > >  -fassociative-math
> > >  -freciprocal-math
> > >  -frounding-math
> > >(Note that we should still *test* for -fno-rounding-math here even
> > >though it is not set as per 4.  -ffast-math -frounding-math should
> > >not set the __FAST_MATH__ macro.)
> > >This is also a documentation change.
> 
> > It appears this was dropped on the floor.  It looks reasonable to me.
> > Please retest and commit.  Thanks!

I've now retested on s390x-ibm-linux and committed to mainline.
Thanks for the review!

> It all makes sense, and is a nice improvement :-)  But please mention it
> in the release notes?  No doubt people did use non-sensical flag
> combinations, and they will be affected.  Thanks!

Here's a proposed patch to update the gcc-11 changes.html.

OK to commit?

Bye,
Ulrich

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index 46a6a37..c0f896a 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -58,6 +58,29 @@ a work-in-progress.
   is deprecated and will be removed in a future release. It should be
   possible to use --enable-cheaders=c_global (the default)
   with no change in behaviour. 
+
+  Some inconsistencies in handling combinations of -ffast-math,
+  -fno-fast-math, -funsafe-math-optimizations,
+  -fno-unsafe-math-optimizations, and their component flags
+  have been fixed.  This might change the behavior of the compiler when
+  invoked with certain combinations of such command line options.
+  The behavior is now consistently:
+  
+  If a component flag of -ffast-math or
+  -funsafe-math-optimizations is explicitly set or reset
+  on the command line, this will override any implicit change, no matter
+  in which order the flags come on the command line.
+  Any component flag (which is not explicitly set or reset on the command
+  line) that was modified from its default by -ffast-math or
+  -funsafe-math-optimizations is always reset to its default
+  by a subsequent -fno-fast-math or
+  -fno-unsafe-math-optimizations.
+  -ffast-math no longer implicitly changes
+  -fsignaling-nans.
+  The __FAST_MATH__ preprocessor macro is set if and
+  only if all component flags of -ffast-math are set
+  to the value documented as the effect of -ffast-math.
+  
 
 
 



-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com


Re: [PATCH] s390: Fix up -Wpsabi diagnostcs + analysis [PR94704]

2020-05-19 Thread Ulrich Weigand via Gcc-patches
On Tue, Apr 28, 2020 at 02:13:02PM +0200, Jakub Jelinek wrote:

> struct X { };
> struct Y { int : 0; };
> struct Z { int : 0; Y y; };
> struct U : public X { X q; };
> struct A { double a; };
> struct B : public X { double a; };
> struct C : public Y { double a; };
> struct D : public Z { double a; };
> struct E : public U { double a; };

This is only testing the [[no_unique_address]] attribute in the
most-derived class, but I believe the attribute also affects
members in the base class.  Is this understanding correct?

Specifically, if we were to add two more tests to the above:
struct X2 { X x; };
struct X3 { [[no_unique_address]] X x; };
struct B2 : public X2 { double a; };
struct B3 : public X3 { double a; };
Then we should see that B2 does *not* count as single-element
struct, but B3 *does*.  (That's also what GCC currently does.)

Just trying to get clarity here as I'm about to work on this
for clang ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com


[PATCH v2] Fix -ffast-math flags handling inconsistencies

2020-02-11 Thread Ulrich Weigand
:
-  set_unsafe_math_optimizations_flags (opts, value);
+  set_unsafe_math_optimizations_flags (opts, opts_set, value);
   break;
 
 case OPT_ffixed_:
@@ -2839,44 +2841,44 @@ set_Wstrict_aliasing (struct gcc_options *opts, int onoff)
 /* The following routines are useful in setting all the flags that
-ffast-math and -fno-fast-math imply.  */
 static void
-set_fast_math_flags (struct gcc_options *opts, int set)
+set_fast_math_flags (struct gcc_options *opts,
+struct gcc_options *opts_set, int set)
 {
-  if (!opts->frontend_set_flag_unsafe_math_optimizations)
+  if (!opts->frontend_set_flag_unsafe_math_optimizations
+  && !opts_set->x_flag_unsafe_math_optimizations)
 {
   opts->x_flag_unsafe_math_optimizations = set;
-  set_unsafe_math_optimizations_flags (opts, set);
+  set_unsafe_math_optimizations_flags (opts, opts_set, set);
 }
   if (!opts->frontend_set_flag_finite_math_only)
-opts->x_flag_finite_math_only = set;
+SET_OPTION_IF_UNSET (opts, opts_set, flag_finite_math_only, set);
   if (!opts->frontend_set_flag_errno_math)
-opts->x_flag_errno_math = !set;
-  if (set)
-{
-  if (opts->frontend_set_flag_excess_precision == EXCESS_PRECISION_DEFAULT)
-   opts->x_flag_excess_precision
- = set ? EXCESS_PRECISION_FAST : EXCESS_PRECISION_DEFAULT;
-  if (!opts->frontend_set_flag_signaling_nans)
-   opts->x_flag_signaling_nans = 0;
-  if (!opts->frontend_set_flag_rounding_math)
-   opts->x_flag_rounding_math = 0;
-  if (!opts->frontend_set_flag_cx_limited_range)
-   opts->x_flag_cx_limited_range = 1;
-}
+SET_OPTION_IF_UNSET (opts, opts_set, flag_errno_math, !set);
+  if (!opts->frontend_set_flag_cx_limited_range)
+SET_OPTION_IF_UNSET (opts, opts_set, flag_cx_limited_range, set);
+  if (!opts->frontend_set_flag_excess_precision)
+SET_OPTION_IF_UNSET (opts, opts_set, flag_excess_precision,
+set ? EXCESS_PRECISION_FAST
+: EXCESS_PRECISION_DEFAULT);
+
+  // -ffast-math should also reset -frounding-math, but since this
+  // is off by default, there's nothing to do for now.
 }
 
 /* When -funsafe-math-optimizations is set the following
flags are set as well.  */
 static void
-set_unsafe_math_optimizations_flags (struct gcc_options *opts, int set)
+set_unsafe_math_optimizations_flags (struct gcc_options *opts,
+struct gcc_options *opts_set, int set)
 {
   if (!opts->frontend_set_flag_trapping_math)
-opts->x_flag_trapping_math = !set;
+SET_OPTION_IF_UNSET (opts, opts_set, flag_trapping_math, !set);
   if (!opts->frontend_set_flag_signed_zeros)
-opts->x_flag_signed_zeros = !set;
+SET_OPTION_IF_UNSET (opts, opts_set, flag_signed_zeros, !set);
   if (!opts->frontend_set_flag_associative_math)
-opts->x_flag_associative_math = set;
+SET_OPTION_IF_UNSET (opts, opts_set, flag_associative_math, set);
   if (!opts->frontend_set_flag_reciprocal_math)
-opts->x_flag_reciprocal_math = set;
+SET_OPTION_IF_UNSET (opts, opts_set, flag_reciprocal_math, set);
 }
 
 /* Return true iff flags in OPTS are set as if -ffast-math.  */
@@ -2884,10 +2886,14 @@ bool
 fast_math_flags_set_p (const struct gcc_options *opts)
 {
   return (!opts->x_flag_trapping_math
+ && !opts->x_flag_signed_zeros
+ && opts->x_flag_associative_math
+ && opts->x_flag_reciprocal_math
  && opts->x_flag_unsafe_math_optimizations
  && opts->x_flag_finite_math_only
- && !opts->x_flag_signed_zeros
  && !opts->x_flag_errno_math
+     && !opts->x_flag_rounding_math
+ && opts->x_flag_cx_limited_range
  && opts->x_flag_excess_precision == EXCESS_PRECISION_FAST);
 }
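The SET_OPTION_IF_UNSET pattern the v2 patch relies on can be modeled in miniature (a simplified stand-in for GCC's real macro in opts.h, trimmed to a single flag):

```c
#include <stdbool.h>

struct gcc_opts     { bool flag_associative_math; };
/* In GCC, a parallel gcc_options structure (opts_set) records which
   flags the user set explicitly on the command line.  */
struct gcc_opts_set { bool flag_associative_math; };

/* Only change a flag the user has not set explicitly.  */
#define SET_OPTION_IF_UNSET(opts, opts_set, field, value) \
  do { if (!(opts_set)->field) (opts)->field = (value); } while (0)

/* Sketch of the -ffast-math/-fno-fast-math handler: explicit component
   flags always win, regardless of option order.  */
static void set_fast_math(struct gcc_opts *o, struct gcc_opts_set *s, int set)
{
  SET_OPTION_IF_UNSET(o, s, flag_associative_math, set);
}
```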
 
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] Fix -ffast-math flags handling inconsistencies

2020-02-11 Thread Ulrich Weigand
Joseph Myers wrote:
> On Mon, 10 Feb 2020, Ulrich Weigand wrote:
> > I'm not sure how this can be implemented in the current option handling
> > framework.  The -ffast-math handling case would have to check whether
> > or not there is any explicit use of -f(no-)finite-math-only; is there
> > any infrastructure to readily perform such a test?
> 
> Pass opts_set as well as opts to set_fast_math_flags, and use it for 
> checking whether an option was explicitly specified (whether enabled or 
> disabled) on the command line.

Ah, I see.  I'll update the patch accordingly.  Thanks!

I guess I still need to check for the opts->frontend_set_... flags
as well -- those are not reflected in opts_set, right?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] Fix -ffast-math flags handling inconsistencies

2020-02-10 Thread Ulrich Weigand
Segher Boessenkool wrote:
> On Fri, Feb 07, 2020 at 05:47:32PM +0100, Ulrich Weigand wrote:
> > > but what happens to -fsignalling-nans -ffast-math then?  Better leave 
> > > those
> > > in I'd say.
> > 
> > Ah, it seems I was confused about the intended semantics here.
> > 
> > I thought that a *more specific* option like -fsignalling-nans was always
> > intended to override a more generic option like -ffast-math, no matter
> > whether it comes before or after it on the command line.
> 
> -fsignaling-nans is an independent option.  Signaling NaNs don't do much
> at all when you also have -ffinite-math-only, of course, which says you
> don't have *any* NaNs.

That's a good point, thanks for pointing this out!  I agree that
-fsignaling-nans is not a good example then; it really should be
independent of -ffast-math.

However, there's still one other option that has a similar issue:
-frounding-math.  This *is* properly related to -ffast-math, in
that -ffast-math really ought to imply -fno-rounding-math, but the
latter is (currently) also the default in GCC.

Bye,
Ulrich


-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] Fix -ffast-math flags handling inconsistencies

2020-02-10 Thread Ulrich Weigand
Joseph Myers wrote:
> On Fri, 7 Feb 2020, Ulrich Weigand wrote:
> 
> > I thought that a *more specific* option like -fsignalling-nans was always
> > intended to override a more generic option like -ffast-math, no matter
> > whether it comes before or after it on the command line.
> 
> Yes, that's correct.  (There are cases where it's ambiguous which option 
> is more specific - see what I said in 
> <https://gcc.gnu.org/ml/gcc/2016-03/msg00245.html> about -Werror= - but I 
> don't think -ffast-math involves such cases.)

I see.  However, this does not appear to be implemented in current code.
GCC (without my patch) behaves like this (using -fno-finite-math-only
as an example instead of -fsignaling-nans, given the separate point
about the latter option raised by Segher):

gcc -dD -E -x c /dev/null | grep FINITE
#define __FINITE_MATH_ONLY__ 0
gcc -dD -E -x c /dev/null -ffast-math | grep FINITE
#define __FINITE_MATH_ONLY__ 1
gcc -dD -E -x c /dev/null -ffast-math -fno-finite-math-only | grep FINITE
#define __FINITE_MATH_ONLY__ 0
gcc -dD -E -x c /dev/null -fno-finite-math-only -ffast-math | grep FINITE
#define __FINITE_MATH_ONLY__ 1

Given the above rule, in the last case __FINITE_MATH_ONLY__ should
also be 0, right?

I'm not sure how this can be implemented in the current option handling
framework.  The -ffast-math handling case would have to check whether
or not there is any explicit use of -f(no-)finite-math-only; is there
any infrastructure to readily perform such a test?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] Fix -ffast-math flags handling inconsistencies

2020-02-07 Thread Ulrich Weigand
Richard Biener wrote:
> On Fri, Jan 31, 2020 at 6:01 PM Ulrich Weigand  wrote:
> > The overall effect of this patch is that now all component flags of
> > -ffast-math are treated exactly equivalently:
> >   * they are set (changed from their default) with -ffast-math
> >   * they are reset to their default with -fno-fast-math
> >   * __FAST_MATH__ is only defined if the value of the flag matches
> > what -ffast-math would have set it to
> 
> The last part is not obviously correct to me since it doesn't match
> documentation which says
> 
> @item -ffast-math
> @opindex ffast-math
> Sets the options @option{-fno-math-errno}, 
> @option{-funsafe-math-optimizations},
> @option{-ffinite-math-only}, @option{-fno-rounding-math},
> @option{-fno-signaling-nans}, @option{-fcx-limited-range} and
> @option{-fexcess-precision=fast}.
> 
> This option causes the preprocessor macro @code{__FAST_MATH__} to be defined.
> 
> to me this reads as -ffast-math -fexcess-precision=standard defines
> __FAST_MATH__.
> The only relevant part to define __FAST_MATH__ is specifying -ffast-math, 
> other
> options are not relevant (which of course is contradicted by
> implementation - where
> I didn't actually follow its history in that respect).  So can you
> adjust documentation
> as to when exactly __FAST_MATH__ is defined?

Agreed.  This should probably be worded something along the lines of:
"Whenever all of those options are set to these values as listed above,
the preprocessor macro @code{__FAST_MATH__} will be defined."

> > -  if (!opts->frontend_set_flag_signaling_nans)
> > -   opts->x_flag_signaling_nans = 0;
> > -  if (!opts->frontend_set_flag_rounding_math)
> > -   opts->x_flag_rounding_math = 0;
[...]
> > +  // -ffast-math should also reset -fsignaling-nans and -frounding-math,
> > +  // but since those are off by default, there's nothing to do for now.
> ...
> 
> but what happens to -fsignalling-nans -ffast-math then?  Better leave those
> in I'd say.

Ah, it seems I was confused about the intended semantics here.

I thought that a *more specific* option like -fsignalling-nans was always
intended to override a more generic option like -ffast-math, no matter
whether it comes before or after it on the command line.

Is it instead the case that the intended behavior is the generic option
overrides the specific option if the generic option comes later on the
command line?

In that case, I agree that those should be left in.

However, then I think it would be good to add tests for 
!flag_signaling_nans / !flag_rounding_math to fast_math_flags_set_p,
because with these options on, we aren't really in fast-math territory
any more ...

> Note frontends come into play with what is considered -ffast-math
> and -fno-fast-math but below flags are tested irrespectively of that
> interpretation.
> 
> Note there's -fcx-fortran-rules similar to -fcx-limited-range but not tested
> above.  The canonical middle-end "flag" to look at is flag_complex_method.
> Somehow -fcx-fortran-rules doesn't come into play at all above but it
> affects -fcx-limited-range in another inconsistent way in that
> -fcx-limited-range -fcx-fortran-rules and -fcx-fortran-rules 
> -fcx-limited-range
> behave the same (-fcx-fortran-rules takes precedence...).  I guess
> -fcomplex-method=ENUM should be exposed and -fcx-* made
> appropriate aliases here.

I agree it would probably be best to use a -fcomplex-method=...
flag, handled analogously to -fexcess-precision in that it defaults to
an explicit DEFAULT value; unless this is overridden by an explicit
command line flag, the language front-end can then choose what DEFAULT
means for this particular language.

However, I'd prefer to not include that change into this patch as well :-)
This patch only changes the behavior related to -fcx-limited-range in one
very special case (-Ofast -fno-fast-math), where it makes it strictly
better.  It should not affect the (existing) interaction with
-fcx-fortran-rules at all -- so anything to improve this can be done
in a separate patch, I think.

> You're tapping into a mine-field ;)

That's for sure :-)

Thanks,
Ulrich


-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[PATCH] Fix -ffast-math flags handling inconsistencies

2020-01-31 Thread Ulrich Weigand
Hello,

we've noticed some inconsistencies in how the component flags of -ffast-math
are handled, see the discussion on the GCC list starting here:
https://gcc.gnu.org/ml/gcc/2020-01/msg00365.html

This patch fixes those inconsistencies.  Specifically, there are the
following changes:

1. Some component flags for -ffast-math are *set* with -ffast-math
   (changing the default), but are not reset with -fno-fast-math,
   causing the latter to have surprising results.  (Note that since
   "-ffast-math -fno-fast-math" is short-cut by the driver, you'll
   only see the surprising results with "-Ofast -fno-fast-math".)
   This is fixed here by both setting and resetting the flags.

   This affects the following flags
  -fcx-limited-range
  -fexcess-precision=

2. Some component flags for -ffast-math are changed from their default,
   but are *not* included in the fast_math_flags_set_p test, causing
   __FAST_MATH__ to remain predefined even when the full set of fast
   math options is not actually in effect.  This is fixed here by
   adding those flags into the fast_math_flags_set_p test.

   This affects the following flags:
 -fcx-limited-range
 -fassociative-math
 -freciprocal-math

3. For some math flags, set_fast_math_flags has code that sets their
   values only to what is already their default.  The overall effect
   of this code is a complete no-op.  This patch removes that dead code.

   This affects the following flags:
 -frounding-math
 -fsignaling-nans


The overall effect of this patch is that now all component flags of
-ffast-math are treated exactly equivalently:
  * they are set (changed from their default) with -ffast-math
  * they are reset to their default with -fno-fast-math
  * __FAST_MATH__ is only defined if the value of the flag matches
what -ffast-math would have set it to

Tested on s390x-ibm-linux.

OK for mainline?

Bye,
Ulrich

ChangeLog:

* opts.c (set_fast_math_flags): In the !set case, also reset
x_flag_cx_limited_range and x_flag_excess_precision.  Remove dead
code resetting x_flag_signaling_nans and x_flag_rounding_math.
(fast_math_flags_set_p): Also test x_flag_cx_limited_range,
x_flag_associative_math, and x_flag_reciprocal_math.

diff --git a/gcc/opts.c b/gcc/opts.c
index 7affeb4..4452793 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2850,18 +2850,14 @@ set_fast_math_flags (struct gcc_options *opts, int set)
 opts->x_flag_finite_math_only = set;
   if (!opts->frontend_set_flag_errno_math)
 opts->x_flag_errno_math = !set;
-  if (set)
-{
-  if (opts->frontend_set_flag_excess_precision == EXCESS_PRECISION_DEFAULT)
-   opts->x_flag_excess_precision
- = set ? EXCESS_PRECISION_FAST : EXCESS_PRECISION_DEFAULT;
-  if (!opts->frontend_set_flag_signaling_nans)
-   opts->x_flag_signaling_nans = 0;
-  if (!opts->frontend_set_flag_rounding_math)
-   opts->x_flag_rounding_math = 0;
-  if (!opts->frontend_set_flag_cx_limited_range)
-   opts->x_flag_cx_limited_range = 1;
-}
+  if (!opts->frontend_set_flag_cx_limited_range)
+opts->x_flag_cx_limited_range = set;
+  if (!opts->frontend_set_flag_excess_precision)
+opts->x_flag_excess_precision
+  = set ? EXCESS_PRECISION_FAST : EXCESS_PRECISION_DEFAULT;
+
+  // -ffast-math should also reset -fsignaling-nans and -frounding-math,
+  // but since those are off by default, there's nothing to do for now.
 }
 
 /* When -funsafe-math-optimizations is set the following
@@ -2884,10 +2880,13 @@ bool
 fast_math_flags_set_p (const struct gcc_options *opts)
 {
   return (!opts->x_flag_trapping_math
+ && !opts->x_flag_signed_zeros
+ && opts->x_flag_associative_math
+ && opts->x_flag_reciprocal_math
  && opts->x_flag_unsafe_math_optimizations
  && opts->x_flag_finite_math_only
- && !opts->x_flag_signed_zeros
  && !opts->x_flag_errno_math
+ && opts->x_flag_cx_limited_range
  && opts->x_flag_excess_precision == EXCESS_PRECISION_FAST);
 }
 
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [RFA][1/3] Remove Cell Broadband Engine SPU targets

2019-10-08 Thread Ulrich Weigand
Thomas Schwinge wrote:

> In r276692, I committed the attached to "Add back Trevor Smigiel; move
> into Write After Approval section", assuming that it was just an
> oversight that he got dropped from the file, as he -- in contrast to
> David and Ulrich -- doesn't remain listed with other roles.

Ah, you're right -- thanks for catching this!

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] Use type alignment in get_builtin_sync_mem

2019-09-09 Thread Ulrich Weigand
Richard Biener wrote:
> On Fri, Sep 6, 2019 at 3:00 PM Ulrich Weigand  wrote:
> > But as far as I can see, for *atomic* operations at least, we do make
> > that assumption.  The x86 back-end for example just assumes that any
> > "int" or "long" object that is the target of an atomic operation is
> > naturally aligned, or else the generated code would just crash.   So
> > if you used your example with a packed struct and passed that pointer
> > to an atomic, the back-end would still assume the alignment and the
> > code would crash.  But I'd still consider this a perfectly reasonable
> > and expected behavior in this case ...
> 
> Would it crash?  I think it would be not atomic if it crossed a cache-line
> boundary.

Sorry, I misremembered the x86 operations, it does indeed work for
unaligned 4- or 8-byte accesses.  However, for 16-byte accesses,
CMPXCHG16B does require aligned memory, the manual says:

  Note that CMPXCHG16B requires that the destination (memory)
  operand be 16-byte aligned.
[...]
64-Bit Mode Exceptions
[...]
#GP(0) If the memory address is in a non-canonical form.
   If memory operand for CMPXCHG16B is not aligned on a
   16-byte boundary.
[...]

So this is basically the same situation as on s390, except that on x86
the default TImode alignment is already 16 bytes.

> > Of course if some part of the middle end get the alignment wrong, we
> > have a problem.  But I'm not sure how this could happen here.  I agree
> > that it might be the case that a user-specified *under*-alignment might
> > get lost somewhere (e.g. you have a packed int and somewhere in the
> > optimizers this gets accidentally cast to a normal int with the default
> > alignment).  But in *this* case, what would have to happen is that the
> > middle-end accidentally casts something to a pointer with *higher*
> > than the default alignment for the type, even though no such pointer
> > cast is present in the source.  Can this really happen?
> 
> If the cast to the lower-aligned type is lost and there is an earlier cast
> to the aligned type.

My point is that this "cast to the aligned type" must have come from the
user in this case (since the aligned type is *more* aligned than any
standard version of the type), and if the user casts the value to the aligned
type, it is undefined behavior anyway if the value was in fact not aligned.

> > This would actually
> > be wrong on s390.  The problem is that all atomic operations on any
> > one single object need to be consistent: they either all use the
> > 16-byte atomic instruction, or else they all protect the access with
> > a lock.  If you have parts of the code use the lock and other parts
> > use the instruction, they're not actually protected against each other.
> 
> But then the user has to be consistent in accessing the object
> atomically.  If he accesses it once as (aligned_int128_t *)
> and once as (int128_t *) it's his fault, no?

I'm not sure why this should be a requirement.  E.g. if we have a
set of subroutines that operates (correctly) on any int128_t *,
aligned or not, and have one user of those routines that actually
locally has an aligned_int128_t, then that user could no longer
safely pass a pointer to its aligned variable to that
subsystem if it also does atomic operations locally ...

> If we'd document that the user invokes undefined behavior
> when performing __builtin_atomic () on objects that are not
> sufficiently aligned according to target specific needs then
> we are of course fine and should simply assume the memory
> is aligned accordingly (similar to your patch but probably with
> some target hook specifying the atomic alignment requirement
> when it differs from mode alignment).  But I don't read the
> documentation of our atomic builtins that way.
> 
> Does _Atomic __int128_t work properly on s390?

Yes, it currently does work properly in all cases (just not in
all cases as efficiently as it could be).

The rule to perform atomic operations on __int128_t on s390 is:
 - If the object is *actually* 16-byte aligned at runtime, then
   every atomic access must be performed using one of the atomic
   instructions (CDSG, LPQ, STPQ).
 - If the object is actually *not* 16-byte aligned, then every
   atomic access must be performed under protection of an
   out-of-line lock.

This rule is correctly implemented by:
 - The __builtin_*_16 family of libatomic library routines;
   these all perform a run-time alignment check and use either
   the instruction or the lock, as appropriate; and
 - Compiler-generated inline code;
   this will always use the instruction, but the compiler will
   generate it only if it can prove at compile-time that the
   object *must* be aligned at run-time.

[ However, this rule is *not*

Re: [PATCH] Use type alignment in get_builtin_sync_mem

2019-09-06 Thread Ulrich Weigand
Richard Biener wrote:
> On Tue, Sep 3, 2019 at 3:09 PM Ulrich Weigand  wrote:
> > > If you read the C standards fine-print then yes.  But people (or
> > > even the C frontend!) hardly get that correct since for example
> > > for
> > >
> > > struct __attribute__((packed)) { int i; } s;
> > >
> > > void foo ()
> > > {
> > >   __builtin_printf ("%p", &s.i);
> > > }
> > >
> > > the C frontend actually creates an int * typed pointer for the ADDR_EXPR
> > > (and not an int * variant with 1-byte alignment).  And people use
> > > int * to pass such pointers everywhere.
> >
> > Well ... I'd say if you cast to int * and then use an atomic operation,
> > it's your own fault that it fails :-/   If the frontend itself uses
> > the wrong type, that is of course a problem.
> 
> Yes.  The C standard says that if you cast something to a pointer type
> the pointer has to be aligned according to the pointed-to type, otherwise
> undefined.  But we have no chance to use this because of this kind of
> issues (and of course developer laziness...).

But as far as I can see, for *atomic* operations at least, we do make
that assumption.  The x86 back-end for example just assumes that any
"int" or "long" object that is the target of an atomic operation is
naturally aligned, or else the generated code would just crash.   So
if you used your example with a packed struct and passed that pointer
to an atomic, the back-end would still assume the alignment and the
code would crash.  But I'd still consider this a perfectly reasonable
and expected behavior in this case ...

The only thing that is special on s390 is that for the 16-byte integer
type, the "natural" (mode) alignment is only 8 bytes, but for atomics
we require 16 bytes.  But if you explicitly use a 16-byte aligned
pointer type to assert to the compiler that this object *is* aligned,
the compiler should be able to rely on that.

Of course if some part of the middle end get the alignment wrong, we
have a problem.  But I'm not sure how this could happen here.  I agree
that it might be the case that a user-specified *under*-alignment might
get lost somewhere (e.g. you have a packed int and somewhere in the
optimizers this gets accidentally cast to a normal int with the default
alignment).  But in *this* case, what would have to happen is that the
middle-end accidentally casts something to a pointer with *higher*
than the default alignment for the type, even though no such pointer
cast is present in the source.  Can this really happen?

> I'm not sure how it is done now but IIRC the users
> use __atomic_load (ptr) and then the frontend changes that
> to one of BUILT_IN_ATOMIC_LOAD_{N,1,2,4,8,16} based on
> some criteria (size mainly).  I'm saying we should factor in
> alignment here, _not_ using say BUILT_IN_ATOMIC_LOAD_16
> if according to the C standard the data isn't aligned.  Maybe we can
> ask the target whether the alignment according to C matches the
> requirement for _16 expansion.  And if things are not fine
> the FE should instead use BUILT_IN_ATOMIC_LOAD_N
> which we later if the compiler can prove bigger alignment and N
> is constant could expand as one of the others.
> 
> But safety first.

The problem with using the _N variant is that we then get a call
to the _n version of the library routine, right?  This would actually
be wrong on s390.  The problem is that all atomic operations on any
one single object need to be consistent: they either all use the
16-byte atomic instruction, or else they all protect the access with
a lock.  If you have parts of the code use the lock and other parts
use the instruction, they're not actually protected against each other.

This is why the _16 version of the library routine does the runtime
alignment check, so that all accesses to actually 16-byte aligned
objects use the instruction, both in the library and in compiler-
generated code.   The _n version doesn't do that.

So I guess we'd have to convert the _n version back to the _16
library call if the size is constant 16, or something?

> So yes, I'd say try tackling the issue in the frontend which is the
> last to know the alignment in the most optimistic way (according
> to the C standard).  At RTL expansion time we have to be (very)
> conservative.
> 
> Btw, the alternative is to add an extra alignment parameter
> to all of the atomic builtins where the FE could stick the alignment
> for us to use during RTL expansion.  But given we have the _{1,2,4,8,16,N}
> variants it sounds easier to use those...
> 
> Note we only document __atomic_load and __atomic_load_n so
> the _{1,2,4,8,16} variants seem to be implementation detail.

Adding the alignment parameter would work, I think.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] Use type alignment in get_builtin_sync_mem

2019-09-03 Thread Ulrich Weigand
Richard Biener wrote:
> On Tue, Sep 3, 2019 at 1:56 PM Ulrich Weigand  wrote:
> > combined with the fact that get_object_alignment_2 actually itself
> > uses type alignment if we have an actual memory object:
> >   /* When EXP is an actual memory reference then we can use
> >  TYPE_ALIGN of a pointer indirection to derive alignment.
> >  Do so only if get_pointer_alignment_1 did not reveal absolute
> >  alignment knowledge and if using that alignment would
> >  improve the situation.  */
> >
> > Is this not correct either, or is the situation there somehow different?
> 
> We're looking at an address, the above applies to places where we actually
> dereference it and there we take the type from the type of the access
> and not from the type of the pointer (or the pointed-to type) since that would
> be equally wrong.

I guess I'm confused again, sorry :-)   What is "the type of the access",
and where do we get it from?  In your example below, if somebody were
to access s.i here, what would the "type of the access" be?

> >  But in that case I'd assume that
> > every possible value of "ptr" at runtime must be 16 byte aligned, or
> > else it wouldn't be a valid value of a variable of that type, right?
> > So in that case assuming a 16-byte alignment would be correct?
> 
> If you read the C standards fine-print then yes.  But people (or
> even the C frontend!) hardly get that correct since for example
> for
> 
> struct __attribute__((packed)) { int i; } s;
> 
> void foo ()
> {
>   __builtin_printf ("%p", &s.i);
> }
> 
> the C frontend actually creates an int * typed pointer for the ADDR_EXPR
> (and not an int * variant with 1-byte alignment).  And people use
> int * to pass such pointers everywhere.

Well ... I'd say if you cast to int * and then use an atomic operation,
it's your own fault that it fails :-/   If the frontend itself uses
the wrong type, that is of course a problem.

> > > The proper way to carry the information from source to RTL expansion
> > > (or earlier GIMPLE) is to add another argument to the builtins
> > > specifying alignment which the FEs can derive from the argument type
> > > given appropriate semantics of the builtin.
> >
> > I guess I can have a look to do it this way, if necessary.
> > Is there already some example of a builtin that does that?
> 
> I think the atomic_* builtins avoid (or should avoid) the issue by
> only "expanding"
> to the fixed size vairants if the memory is appropriately aligned
> and otherwise fall back to _n which doesn't assume any alignment(?)

Well, this is exactly what we're trying to do!  If the compiler can
prove correct alignment, we want to emit the atomic instruction.
Otherwise, we want to emit the library call.  (The library function
will do a run-time alignment check, and if the object is aligned
after all, it will also use the atomic instruction; otherwise it
will use a lock.)

The problem is that the decision whether to emit the instruction or
the library call is currently done by the back-end: if the insn
expander fails, the middle-end will fall back to the libcall.

So the back-end needs to base this decision on the alignment, but
the only input it has is the MEM (and its alignment setting).  But
that MEM is **generated** by get_builtin_sync_mem using the alignment
computed there, which is exactly the place I'm trying to fix ...

Are you saying that this very decision ought to be made earlier
in the front-end?  That would of course also be OK with me.

> > We definitely must handle the misaligned case (since the **default**
> > alignment of __int128 on s390x is misaligned for atomic access).
> 
> But is it fast (and atomic)?  Would going the _n way "work" or do
> you need a special 8-byte-aligned __int128 path?

See above.  We want the libcall fallback in the unaligned case.  It
is sort-of fast (except for the function call and run-time check
overhead) if the object actually is aligned, less fast if it isn't.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] Use type alignment in get_builtin_sync_mem

2019-09-03 Thread Ulrich Weigand
Richard Biener wrote:
> On Mon, Sep 2, 2019 at 10:35 PM Ulrich Weigand  wrote:
> > Now one question might be, why does get_pointer_alignment not take
> > type alignment into account by itself?  This appears to be deliberate
> > to avoid considering numeric pointer values to be aligned when they
> > are not for target-specific reasons (e.g. the low bit that indicates
> > Thumb on ARM).
> 
> It's simply because we throw away type casting on GIMPLE so
> when you see an aligned __int128 * as argument to
> __builtin_sync* then there's no guarantee it's actually the type
> the user used here.   The user might have written
> 
> __builtin_sync_* ((__int128 *) ptr /* cast away bogus over-alignedness */, 
> ...);
> 
> and GIMPLE happily throws away the cast.

Huh, I was simply going after the comment in front of get_object_alignment_2:
   Note that the address (and thus the alignment) computed here is based
   on the address to which a symbol resolves, whereas DECL_ALIGN is based
   on the address at which an object is actually located.  These two
   addresses are not always the same.  For example, on ARM targets,
   the address  of a Thumb function foo() has the lowest bit set,
   whereas foo() itself starts on an even address.

combined with the fact that get_object_alignment_2 actually itself
uses type alignment if we have an actual memory object:
  /* When EXP is an actual memory reference then we can use
 TYPE_ALIGN of a pointer indirection to derive alignment.
 Do so only if get_pointer_alignment_1 did not reveal absolute
 alignment knowledge and if using that alignment would
 improve the situation.  */

Is this not correct either, or is the situation there somehow different?

In any case, I'm not sure I understand your example.  Is this supposed
to cover the case where "ptr" is a type with 16 byte alignment, while
__int128 only has 8 byte alignment?  But in that case I'd assume that
every possible value of "ptr" at runtime must be 16 byte aligned, or
else it wouldn't be a valid value of a variable of that type, right?
So in that case assuming a 16-byte alignment would be correct?

> The proper way to carry the information from source to RTL expansion
> (or earlier GIMPLE) is to add another argument to the builtins
> specifying alignment which the FEs can derive from the argument type
> given appropriate semantics of the builtin.

I guess I can have a look to do it this way, if necessary.
Is there already some example of a builtin that does that?

> Alternatively you could make it simply undefined behavior passing
> not appropriately aligned memory to those builtins.  But I guess
> we support misaligned cases so that's not an option.

We definitely must handle the misaligned case (since the **default**
alignment of __int128 on s390x is misaligned for atomic access).

> also looks fishy.  So maybe it _is_ an option to simply require
> larger alignment?  Maybe add targetm.mode_alignment_for_atomics ()?

That's why this wouldn't work either -- a default __int128 is not
correctly aligned, we can only assume alignment if it was specifically
specified by the user via an attribute or the like.  (But since the
latter is a very common case, it is also important to handle.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[PATCH] Use type alignment in get_builtin_sync_mem

2019-09-02 Thread Ulrich Weigand
Hello,

on s390x the 128-bit integer type is only aligned to 8 bytes by default,
but lock-free atomic operations can only be performed on objects
aligned to 16 bytes.  However, we've noticed that GCC sometimes falls
back to library calls *even if* the object is actually 16 byte aligned,
and GCC could easily know this from type alignment information.

However, it turns out that get_builtin_sync_mem *ignores* type alignment
data, and only looks at *mode* alignment and pointer alignment data.
This is a problem since mode alignment doesn't know about user-provided
alignment attributes, and while pointer alignment does sometimes know
about those, this usually only happens with optimization on and also
if the optimizers can trace pointer assignments to the final object.

One important use case where the latter does not happen is with the
C++ atomic classes, because here the object is accessed via the "this"
pointer.  Pointer alignment tracking at this point never knows the
final object, and thus we always get a library call *even though*
libstdc++ has actually marked the class type with the correct alignment
attribute.

Now one question might be, why does get_pointer_alignment not take
type alignment into account by itself?  This appears to be deliberate
to avoid considering numeric pointer values to be aligned when they
are not for target-specific reasons (e.g. the low bit that indicates
Thumb on ARM).

However, this is not an issue in get_builtin_sync_mem, where we are
actually interested in the alignment of the MEM we're just about to
generate, so it should be fine to check type alignment here.

This patch does just that, fixing the issue we've been seeing.

Tested on s390x-ibm-linux.

OK for mainline?

Bye,
Ulrich


ChangeLog:

* builtins.c (get_builtin_sync_mem): Respect type alignment.

testsuite/ChangeLog:

* gcc.target/s390/md/atomic_exchange-1.c: Do not use -latomic.
(aligned_int128): New data type.
(ae_128_0): Use it instead of __int128 to ensure proper alignment
for atomic accesses.
(ae_128_1): Likewise.
(g128): Likewise.
(main): Likewise.

Index: gcc/builtins.c
===
--- gcc/builtins.c  (revision 274142)
+++ gcc/builtins.c  (working copy)
@@ -6001,9 +6001,16 @@
 
   mem = validize_mem (mem);
 
-  /* The alignment needs to be at least according to that of the mode.  */
-  set_mem_align (mem, MAX (GET_MODE_ALIGNMENT (mode),
-  get_pointer_alignment (loc)));
+  /* The alignment needs to be at least according to that of the mode.
+ Also respect alignment requirements of the type, and alignment
+ info that may be deduced from the expression itself.  */
+  unsigned int align = GET_MODE_ALIGNMENT (mode);
+  if (POINTER_TYPE_P (TREE_TYPE (loc)))
+{
+  unsigned int talign = min_align_of_type (TREE_TYPE (TREE_TYPE (loc)));
+  align = MAX (align, talign * BITS_PER_UNIT);
+}
+  set_mem_align (mem, MAX (align, get_pointer_alignment (loc)));
   set_mem_alias_set (mem, ALIAS_SET_MEMORY_BARRIER);
   MEM_VOLATILE_P (mem) = 1;
 
Index: gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
===
--- gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c(revision 
274142)
+++ gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c(working copy)
@@ -1,7 +1,7 @@
 /* Machine description pattern tests.  */
 
 /* { dg-do compile } */
-/* { dg-options "-lpthread -latomic" } */
+/* { dg-options "-lpthread" } */
 /* { dg-do run { target { s390_useable_hw } } } */
 
 /**/
@@ -119,19 +119,21 @@
 /**/
 
 #ifdef __s390x__
+typedef __int128 __attribute__((aligned(16))) aligned_int128;
+
 __int128
-ae_128_0 (__int128 *lock)
+ae_128_0 (aligned_int128 *lock)
 {
   return __atomic_exchange_n (lock, 0, 2);
 }
 
 __int128
-ae_128_1 (__int128 *lock)
+ae_128_1 (aligned_int128 *lock)
 {
   return __atomic_exchange_n (lock, 1, 2);
 }
 
-__int128 g128;
+aligned_int128 g128;
 
 __int128
 ae_128_g_0 (void)
@@ -274,7 +276,7 @@
 
 #ifdef __s390x__
   {
-   __int128 lock;
+   aligned_int128 lock;
__int128 rval;
 
    lock = oval;
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[RFA][3/3] Remove Cell Broadband Engine SPU targets: libstdc++

2019-09-02 Thread Ulrich Weigand
[RFA][3/3] Remove Cell Broadband Engine SPU targets: libstdc++

Remove all references to spu from the libstdc++ directory.

Note that libstdc++ currently consideres "__ea" a reserved word
(because it was for the SPU target), and therefore specifically
avoids using it in include/tr1/ell_integral.tcc.  This patch
removes this workaround (which is not strictly necessary, but
seems the way to go ...).

Tested on s390x-ibm-linux.

OK for mainline?

Bye,
Ulrich


libstdc++-v3/ChangeLog:

* crossconfig.m4: Remove references to spu.
* configure: Regenerate.
* doc/xml/manual/appendix_contributing.xml: Remove references
to __ea as "badword" for spu.
* doc/html/manual/source_code_style.html: Regenerate.
* include/tr1/ell_integral.tcc (__ellint_rd): Do not attempt
to avoid __ea (as "badword" for spu).
(__ellint_rj): Likewise.

Index: libstdc++-v3/crossconfig.m4
===
--- libstdc++-v3/crossconfig.m4 (revision 275321)
+++ libstdc++-v3/crossconfig.m4 (working copy)
@@ -54,14 +54,6 @@
 AC_DEFINE(HAVE_SQRTF)
 ;;
 
-  spu-*-elf*)
-GLIBCXX_CHECK_COMPILER_FEATURES
-GLIBCXX_CHECK_LINKER_FEATURES
-GLIBCXX_CHECK_MATH_SUPPORT
-GLIBCXX_CHECK_STDLIB_SUPPORT
-AM_ICONV
-;;
-
   *-aix*)
 GLIBCXX_CHECK_LINKER_FEATURES
 GLIBCXX_CHECK_MATH_SUPPORT
Index: libstdc++-v3/doc/html/manual/source_code_style.html
===
--- libstdc++-v3/doc/html/manual/source_code_style.html (revision 275321)
+++ libstdc++-v3/doc/html/manual/source_code_style.html (working copy)
@@ -48,9 +48,6 @@
       _res_ext
       __tg_*
 
-      SPU adds:
-      __ea
-
       For GCC:
 
       [Note that this list is out of date. It applies to the 
old
Index: libstdc++-v3/doc/xml/manual/appendix_contributing.xml
===
--- libstdc++-v3/doc/xml/manual/appendix_contributing.xml   (revision 
275321)
+++ libstdc++-v3/doc/xml/manual/appendix_contributing.xml   (working copy)
@@ -463,9 +463,6 @@
   _res_ext
   __tg_*
 
-  SPU adds:
-  __ea
-
   For GCC:
 
   [Note that this list is out of date. It applies to the old
Index: libstdc++-v3/include/tr1/ell_integral.tcc
===
--- libstdc++-v3/include/tr1/ell_integral.tcc   (revision 275321)
+++ libstdc++-v3/include/tr1/ell_integral.tcc   (working copy)
@@ -370,11 +370,10 @@
   __zn = __c0 * (__zn + __lambda);
 }
 
- // Note: __ea is an SPU badname.
-  _Tp __eaa = __xndev * __yndev;
+  _Tp __ea = __xndev * __yndev;
   _Tp __eb = __zndev * __zndev;
-  _Tp __ec = __eaa - __eb;
-  _Tp __ed = __eaa - _Tp(6) * __eb;
+  _Tp __ec = __ea - __eb;
+  _Tp __ed = __ea - _Tp(6) * __eb;
   _Tp __ef = __ed + __ec + __ec;
   _Tp __s1 = __ed * (-__c1 + __c3 * __ed
/ _Tp(3) - _Tp(3) * __c4 * __zndev * __ef
@@ -381,7 +380,7 @@
/ _Tp(2));
   _Tp __s2 = __zndev
* (__c2 * __ef
-+ __zndev * (-__c3 * __ec - __zndev * __c4 - __eaa));
++ __zndev * (-__c3 * __ec - __zndev * __c4 - __ea));
 
   return _Tp(3) * __sigma + __power4 * (_Tp(1) + __s1 + __s2)
 / (__mu * std::sqrt(__mu));
@@ -634,17 +633,16 @@
   __pn = __c0 * (__pn + __lambda);
 }
 
- // Note: __ea is an SPU badname.
-  _Tp __eaa = __xndev * (__yndev + __zndev) + __yndev * __zndev;
+  _Tp __ea = __xndev * (__yndev + __zndev) + __yndev * __zndev;
   _Tp __eb = __xndev * __yndev * __zndev;
   _Tp __ec = __pndev * __pndev;
-  _Tp __e2 = __eaa - _Tp(3) * __ec;
-  _Tp __e3 = __eb + _Tp(2) * __pndev * (__eaa - __ec);
+  _Tp __e2 = __ea - _Tp(3) * __ec;
+  _Tp __e3 = __eb + _Tp(2) * __pndev * (__ea - __ec);
   _Tp __s1 = _Tp(1) + __e2 * (-__c1 + _Tp(3) * __c3 * __e2 / _Tp(4)
 - _Tp(3) * __c4 * __e3 / _Tp(2));
   _Tp __s2 = __eb * (__c2 / _Tp(2)
+ __pndev * (-__c3 - __c3 + __pndev * __c4));
-  _Tp __s3 = __pndev * __eaa * (__c2 - __pndev * __c3)
+  _Tp __s3 = __pndev * __ea * (__c2 - __pndev * __c3)
- __c2 * __pndev * __ec;
 
   return _Tp(3) * __sigma + __power4 * (__s1 + __s2 + __s3)
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[RFA][2/3] Remove Cell Broadband Engine SPU targets: testsuite

2019-09-02 Thread Ulrich Weigand
[RFA][2/3] Remove Cell Broadband Engine SPU targets: testsuite

Remove all references to spu from the testsuite directory.

Tested on s390x-ibm-linux.

OK for mainline?

(Deleted directories omitted from patch.)

Bye,
Ulrich



gcc/testsuite/ChangeLog:

* lib/compat.exp: Remove references to spu.
* lib/fortran-torture.exp: Likewise.
* lib/gcc-dg.exp: Likewise.
* lib/gfortran.exp: Likewise.
* lib/target-supports.exp: Likewise.
* lib/target-utils.exp: Likewise.

* c-c++-common/torture/complex-sign-add.c: Remove references to spu.
* c-c++-common/torture/complex-sign-mixed-add.c: Likewise.
* c-c++-common/torture/complex-sign-mixed-div.c: Likewise.
* c-c++-common/torture/complex-sign-mixed-mul.c: Likewise.
* c-c++-common/torture/complex-sign-mixed-sub.c: Likewise.
* c-c++-common/torture/complex-sign-mul-minus-one.c: Likewise.
* c-c++-common/torture/complex-sign-mul-one.c: Likewise.
* c-c++-common/torture/complex-sign-mul.c: Likewise.
* c-c++-common/torture/complex-sign-sub.c: Likewise.

* g++.dg/opt/temp1.C: Remove references to spu.
* g++.dg/opt/vt1.C: Likewise.
* g++.dg/torture/type-generic-1.C: Likewise.
* g++.dg/warn/pr30551-2.C: Likewise.
* g++.dg/warn/pr30551.C: Likewise.
* g++.old-deja/g++.jason/thunk2.C: Likewise.
* g++.old-deja/g++.other/comdat5.C: Likewise.
* g++.old-deja/g++.other/local-alloc1.C: Likewise.

* gcc.c-torture/compile/20001226-1.c: Remove references to spu.
* gcc.c-torture/execute/20030222-1.c: Likewise.
* gcc.c-torture/execute/20031003-1.c: Likewise.
* gcc.c-torture/execute/20101011-1.c: Likewise.
* gcc.c-torture/execute/conversion.c: Likewise.
* gcc.c-torture/execute/ieee/compare-fp-4.x: Likewise.
* gcc.c-torture/execute/ieee/fp-cmp-2.x: Likewise.
* gcc.c-torture/execute/ieee/inf-1.c: Likewise.
* gcc.c-torture/execute/ieee/inf-2.c: Likewise.
* gcc.c-torture/execute/ieee/mul-subnormal-single-1.x: Likewise.
* gcc.c-torture/execute/ieee/rbug.c: Likewise.
* gcc.c-torture/execute/pr39228.c: Likewise.
* gcc.c-torture/execute/ieee/20010114-2.x: Remove file.
* gcc.c-torture/execute/ieee/20030331-1.x: Remove file.
* gcc.c-torture/execute/ieee/920518-1.x: Remove file.
* gcc.c-torture/execute/ieee/compare-fp-1.x: Remove file.
* gcc.c-torture/execute/ieee/fp-cmp-4f.x: Remove file.
* gcc.c-torture/execute/ieee/fp-cmp-8f.x: Remove file.

* gcc.dg/20020312-2.c: Remove references to spu.
* gcc.dg/20030702-1.c: Likewise.
* gcc.dg/and-1.c: Likewise.
* gcc.dg/builtin-inf-1.c: Likewise.
* gcc.dg/builtins-1.c: Likewise.
* gcc.dg/builtins-43.c: Likewise.
* gcc.dg/builtins-44.c: Likewise.
* gcc.dg/builtins-45.c: Likewise.
* gcc.dg/float-range-1.c: Likewise.
* gcc.dg/float-range-3.c: Likewise.
* gcc.dg/float-range-4.c: Likewise.
* gcc.dg/float-range-5.c: Likewise.
* gcc.dg/fold-overflow-1.c: Likewise.
* gcc.dg/format/ms_unnamed-1.c: Likewise.
* gcc.dg/format/unnamed-1.c: Likewise.
* gcc.dg/hex-round-1.c: Likewise.
* gcc.dg/hex-round-2.c: Likewise.
* gcc.dg/lower-subreg-1.c: Likewise.
* gcc.dg/nrv3.c: Likewise.
* gcc.dg/pr15784-3.c: Likewise.
* gcc.dg/pr27095.c: Likewise.
* gcc.dg/pr28243.c: Likewise.
* gcc.dg/pr28796-2.c: Likewise.
* gcc.dg/pr30551-3.c: Likewise.
* gcc.dg/pr30551-6.c: Likewise.
* gcc.dg/pr30551.c: Likewise.
* gcc.dg/pr70317.c: Likewise.
* gcc.dg/sms-1.c: Likewise.
* gcc.dg/sms-2.c: Likewise.
* gcc.dg/sms-3.c: Likewise.
* gcc.dg/sms-4.c: Likewise.
* gcc.dg/sms-5.c: Likewise.
* gcc.dg/sms-6.c: Likewise.
* gcc.dg/sms-7.c: Likewise.
* gcc.dg/stack-usage-1.c: Likewise.
* gcc.dg/strlenopt-73.c: Likewise.
* gcc.dg/titype-1.c: Likewise.
* gcc.dg/tls/thr-cse-1.c: Likewise.
* gcc.dg/torture/builtin-attr-1.c: Likewise.
* gcc.dg/torture/builtin-complex-1.c: Likewise.
* gcc.dg/torture/builtin-cproj-1.c: Likewise.
* gcc.dg/torture/builtin-frexp-1.c: Likewise.
* gcc.dg/torture/builtin-ldexp-1.c: Likewise.
* gcc.dg/torture/builtin-logb-1.c: Likewise.
* gcc.dg/torture/builtin-math-2.c: Likewise.
* gcc.dg/torture/builtin-math-5.c: Likewise.
* gcc.dg/torture/builtin-modf-1.c: Likewise.
* gcc.dg/torture/fp-int-convert.h: Likewise.
* gcc.dg/torture/pr25947-1.c: Likewise.
* gcc.dg/torture/type-generic-1.c: Likewise.
* gcc.dg/tree-ssa/20040204-1.c: Likewise.
* gcc.dg/tree-ssa/ivopts-1.c: Likewise.
* gcc.dg/tree-ssa/ssa-fre-3.c: Likewise.
* gcc.dg/tree-ssa/vector-6.c: 

[RFA][1/3] Remove Cell Broadband Engine SPU targets

2019-09-02 Thread Ulrich Weigand
Hello,

as announced here: https://gcc.gnu.org/ml/gcc/2019-04/msg00023.html
we have declared the spu-elf target obsolete in GCC 9 with the goal
of removing support in GCC 10.  Nobody has stepped up to take over
maintainership of the target.

This patch set therefore removes this target and all references
to it from the GCC source tree.  (libstdc++ and testsuite are
done as separate patches, this patch handles everything else.)

Tested on s390x-ibm-linux.

OK for mainline?

(Deleted directories omitted from patch.)

Bye,
Ulrich


ChangeLog:

* MAINTAINERS: Remove spu port maintainers.

contrib/ChangeLog:

* compare-all-tests (all_targets): Remove references to spu.
* config-list.mk (LIST): Likewise.

contrib/header-tools/ChangeLog:

* README: Remove references to spu.
* reduce-headers: Likewise.

libbacktrace/ChangeLog:

* configure.ac: Remove references to spu.
* configure: Regenerate.

libcpp/ChangeLog:

* directives.c: Remove references to spu from comments.
* expr.c: Likewise.

libgcc/ChangeLog:

* config.host: Remove references to spu.
* config/spu/: Remove directory.

gcc/ChangeLog:

* config.gcc: Obsolete spu target.  Remove references to spu.
* configure.ac: Remove references to spu.
* configure: Regenerate.
* config/spu/: Remove directory.
* common/config/spu/: Remove directory.

* doc/extend.texi: Remove references to spu.
* doc/invoke.texi: Likewise.
* doc/md.texi: Likewise.
* doc/sourcebuild.texi: Likewise.

Index: MAINTAINERS
===
--- MAINTAINERS (revision 275321)
+++ MAINTAINERS (working copy)
@@ -109,9 +109,6 @@
 sh portOleg Endo   
 sparc port David S. Miller 
 sparc port Eric Botcazou   
-spu port   Trevor Smigiel  

-spu port   David Edelsohn  
-spu port   Ulrich Weigand  
 tilegx portWalter Lee  
 tilepro port   Walter Lee  
 v850 port  Nick Clifton
Index: contrib/compare-all-tests
===
--- contrib/compare-all-tests   (revision 275321)
+++ contrib/compare-all-tests   (working copy)
@@ -34,7 +34,7 @@
 sh_opts='-m3 -m3e -m4 -m4a -m4al -m4/-mieee -m1 -m1/-mno-cbranchdi -m2a 
-m2a/-mieee -m2e -m2e/-mieee'
 sparc_opts='-mcpu=v8/-m32 -mcpu=v9/-m32 -m64'
 
-all_targets='alpha arm avr bfin cris fr30 frv h8300 ia64 iq2000 m32c m32r m68k 
mcore mips mmix mn10300 pa pdp11 ppc sh sparc spu v850 vax xstormy16 xtensa' # 
e500 
+all_targets='alpha arm avr bfin cris fr30 frv h8300 ia64 iq2000 m32c m32r m68k 
mcore mips mmix mn10300 pa pdp11 ppc sh sparc v850 vax xstormy16 xtensa' # e500 
 
 test_one_file ()
 {
Index: contrib/config-list.mk
===
--- contrib/config-list.mk  (revision 275321)
+++ contrib/config-list.mk  (working copy)
@@ -90,7 +90,7 @@
   sparc-leon3-linux-gnuOPT-enable-target=all sparc-netbsdelf \
   
sparc64-sun-solaris2.11OPT-with-gnu-ldOPT-with-gnu-asOPT-enable-threads=posix \
   sparc-wrs-vxworks sparc64-elf sparc64-rtems sparc64-linux sparc64-freebsd6 \
-  sparc64-netbsd sparc64-openbsd spu-elf \
+  sparc64-netbsd sparc64-openbsd \
   tilegx-linux-gnu tilegxbe-linux-gnu tilepro-linux-gnu \
   v850e-elf v850-elf v850-rtems vax-linux-gnu \
   vax-netbsdelf vax-openbsd visium-elf x86_64-apple-darwin \
Index: contrib/header-tools/README
===
--- contrib/header-tools/README (revision 275321)
+++ contrib/header-tools/README (working copy)
@@ -203,7 +203,7 @@
   these targets.  They are also known to the tool.  When building targets it
   will check those targets before the rest.  
   This coverage can be achieved by building config-list.mk with :
-  LIST="aarch64-linux-gnu arm-netbsdelf c6x-elf epiphany-elf hppa2.0-hpux10.1 i686-mingw32crt i686-pc-msdosdjgpp mipsel-elf powerpc-eabisimaltivec rs6000-ibm-aix5.1.0 sh-superh-elf sparc64-elf spu-elf"
+  LIST="aarch64-linux-gnu arm-netbsdelf c6x-elf epiphany-elf hppa2.0-hpux10.1 i686-mingw32crt i686-pc-msdosdjgpp mipsel-elf powerpc-eabisimaltivec rs6000-ibm-aix5.1.0 sh-superh-elf sparc64-elf"
 
   -b specifies the native bootstrapped build root directory
   -t specifies a target build root directory that config-list.mk was run from
Index: contrib/header-tools/reduce-headers
===
--- contrib/header-tools/reduce-headers (revision 275321)
+++ contrib/header-tools/reduce-headers (working copy)
@@ -32,8 +32,7 @@
 "powerpc-eabisimaltivec",
 "rs6000-ibm-aix5.1.0",
 "sh-superh-elf",
-"sparc64-elf",

Re: [RFC/RFA] Obsolete Cell Broadband Engine SPU targets

2019-04-02 Thread Ulrich Weigand
Eric Gallagher wrote:
> On 4/2/19, Ulrich Weigand  wrote:
> > Hello,
> >
> > the spu-elf target in GCC supports generating code for the SPU processors
> > of the Cell Broadband Engine; it has been part of upstream GCC since 2008.
> >
> > However, at this point I believe this target is no longer in use:
> > - There is no supported Cell/B.E. hardware any more.
> 
> Wait, SPU includes the Playstation 3, right? I'm pretty sure there are
> still plenty of PS3s in use out there... AFAIK my university (GWU) is
> still running its PS3 supercomputer cluster... (I'd have to check to
> make sure...)

GCC has only ever supported Linux targets on Cell/B.E.  I understand
Sony has removed Linux support for PS3 a long time ago.  (And in fact,
I believe support for PS3 in general has run out this year.)

Of course, there may be some people out there still running PS3 with
old firmware that runs Linux.  But they will then have to also use
older GCC versions, sorry.

(Unless, of course, somebody steps up and volunteers to take over
the maintainance of the target.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [RFC/RFA] Obsolete Cell Broadband Engine SPU targets

2019-04-02 Thread Ulrich Weigand
Thanks, Richi and Jeff!

>   * config.gcc: Mark spu* targets as deprecated/obsolete.

I've now checked this in.

I've also checked in the following patch to announce the change
in htdocs.

Bye,
Ulrich

Index: htdocs/gcc-9/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
retrieving revision 1.56
diff -u -r1.56 changes.html
--- htdocs/gcc-9/changes.html   1 Apr 2019 14:55:53 -   1.56
+++ htdocs/gcc-9/changes.html   2 Apr 2019 13:45:27 -
@@ -54,6 +54,11 @@
  in the <a href="https://gcc.gnu.org/ml/gcc/2018-10/msg00139.html">
  announcement</a>.
 
+
+  Cell Broadband Engine SPU (spu*-*-*).  Details can be found
+  in the <a href="https://gcc.gnu.org/ml/gcc/2019-04/msg00023.html">
+  announcement</a>.
+
   
 
 

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[RFC/RFA] Obsolete Cell Broadband Engine SPU targets

2019-04-02 Thread Ulrich Weigand
Hello,

the spu-elf target in GCC supports generating code for the SPU processors
of the Cell Broadband Engine; it has been part of upstream GCC since 2008.

However, at this point I believe this target is no longer in use:
- There is no supported Cell/B.E. hardware any more.
- There is no supported operating system supporting Cell/B.E. any more.

I've still been running daily regression tests until now, but I'll be
unable to continue to do so much longer since the systems I've been
using for this will go away.

Rather than leave SPU support untested/maintained, I'd therefore
propose to declare all SPU targets obsolete in GCC 9 and remove
the code with GCC 10.

Any objections to this approach?

Bye,
Ulrich


gcc/ChangeLog:

* config.gcc: Mark spu* targets as deprecated/obsolete.

Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 270076)
+++ gcc/config.gcc  (working copy)
@@ -248,6 +248,7 @@ md_file=
 # Obsolete configurations.
 case ${target} in
   *-*-solaris2.10* \
+  | spu*-*-*   \
   | tile*-*-*  \
  )
 if test "x$enable_obsolete" != xyes; then
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Reject invalid Q/R/S/T addresses after LRA

2019-02-12 Thread Ulrich Weigand
Ilya Leoshkevich wrote:

> gcc/ChangeLog:
> 
> 2019-02-11  Ilya Leoshkevich  
> 
>   PR target/89233
>   * config/s390/s390.c (s390_decompose_address): Update comment.
>   (s390_check_qrst_address): Reject invalid address forms after
>   LRA.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-02-11  Ilya Leoshkevich  
> 
>   PR target/89233
>   * gcc.target/s390/pr89233.c: New test.

This is OK.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Make legitimate_address_p accept UNSPEC_LTREF

2019-02-11 Thread Ulrich Weigand
Ilya Leoshkevich wrote:

> After r265371 (S/390: Make "b" constraint match literal pool references),
> satisfies_constraint_b () started accepting memory references, whose
> addresses do not pass legitimate_address_p ().  Specifically, literal
> pool accesses past the end of the entry they are based on are explicitly
> rejected by s390_decompose_address ().  This leads to ICE in early_mach
> pass when trying to perform UNSPEC_LTREF substitution on such addresses.
> 
> s390_decompose_address () check does not apply to relative addresses.
> The reason it is even made is that larl_operand () does not accept
> literal pool references transformed into UNSPEC_LTREF.  This patch
> makes larl_operand () treat plain and transformed literal pool
> references identically.

I don't think this change is correct.  Literal pool references that
match a larl_operand ("b" constraint) should not have been transformed
into UNSPEC_LTREF in the first place; see this comment and code:

/* Annotate every literal pool reference in INSN by an UNSPEC_LTREF expression.
   Fix up MEMs as required.
   Skip insns which support relative addressing, because they do not use a base
   register.  */

static void
annotate_constant_pool_refs (rtx_insn *insn)
{
  if (s390_safe_relative_long_p (insn))
return;

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: Add new --param knobs for inliner

2019-01-07 Thread Ulrich Weigand
Jan Hubicka wrote:

> uinlined-* should be useful for architecutures with greater function
> call overhead than modern x86 chips (which is good portion of them,
> especially s390 as I learnt on Cauldron). It would be nice to benchmark
> effect of those and tune default in config/* files. I think this is a
> reasonable way to deal with architecutral differences without making
> inliner hard to tune in long term.

Thanks for the heads-up!  This looks interesting, we'll have a look.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH, rs6000] Clarify when typedef names can be used with AltiVec vector types

2018-12-18 Thread Ulrich Weigand
Bill Schmidt wrote:

> +@item
> +When using @code{vector} in keyword-and-predefine mode; for example,
> +
> +@smallexample
> +typedef signed short int16;
> +vector int16 data;
> +@end smallexample
> +
> +Note that keyword-and-predefine mode is enabled by disabling GNU
> +extensions (e.g., by using @code{-std=c11}) and including
> +@code{<altivec.h>}.
> +@end itemize

This looks correct to me, and I've just verified that the example
does indeed build with -std=c11 and #include <altivec.h> and fails
to build without either of these.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH, rs6000] Clarify when typedef names can be used with AltiVec vector types

2018-12-18 Thread Ulrich Weigand
Bill Schmidt wrote:

> +@item
> +When using vector in keyword-and-predefine mode; for example,
> +
> +@smallexample
> +/* With -maltivec only: */

This is a bit confusing (at least to me).  What does "with -maltivec only"
mean here?  Just adding -maltivec will *not* switch to keyword-and-
predefine mode, as far as I can tell.  Rather, to switch to that mode
you'll have to disable GNU extensions, e.g. via -std=c11, and then
include <altivec.h> to get the predefine.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [wwwdocs] readings.html -- two broken z/Architecture links

2018-12-03 Thread Ulrich Weigand
Hi Gerald,

> I've been getting
> 
>   "A possible equivalent for dz9zr002 was not found." (or similar)
> 
> for these two pages for a while.
> 
> If you have suitable links to cover these, please go ahead and add
> them; for the time being I committed the patch below.

I've committed the following patch to update those links.

Bye,
Ulrich

Index: readings.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v
retrieving revision 1.308
diff -u -r1.308 readings.html
--- readings.html   2 Dec 2018 10:55:41 -   1.308
+++ readings.html   3 Dec 2018 12:32:00 -
@@ -309,7 +309,10 @@
 
  z/Architecture (S/390)
   Manufacturer: IBM
+  <a href="http://publibfp.dhe.ibm.com/epubs/pdf/dz9zr011.pdf">z/Architecture Principles of Operation</a>
+  <a href="http://publibfp.dhe.ibm.com/epubs/pdf/dz9ar008.pdf">ESA/390 Principles of Operation</a>
   <a href="http://refspecs.linuxbase.org/ELF/zSeries/lzsabi0_zSeries.html">Linux for z Systems ABI</a>
+  <a href="http://refspecs.linuxbase.org/ELF/zSeries/lzsabi0_s390.html">Linux for S/390 ABI</a>
  
 
 


-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH v5] S/390: Allow relative addressing of literal pool entries

2018-11-09 Thread Ulrich Weigand
Ilya Leoshkevich wrote:
 
>   PR target/87762
>   * config/s390/s390.c (s390_safe_relative_long_p): New function.
>   (annotate_constant_pool_refs): Skip insns which support
>   relative addressing.
>   (annotate_constant_pool_refs_1): New helper function.
>   (find_constant_pool_ref): Skip insns which support relative
>   addressing.
>   (find_constant_pool_ref_1): New helper function.
>   (replace_constant_pool_ref): Skip insns which support
>   relative addressing.
>   (replace_constant_pool_ref_1): New helper function.
>   (s390_mainpool_start): Adapt to the new signature.
>   (s390_mainpool_finish): Likewise.
>   (s390_chunkify_start): Likewise.
>   (s390_chunkify_finish): Likewise.
>   (pass_s390_early_mach::execute): Likewise.
>   (s390_prologue_plus_offset): Likewise.
>   (s390_emit_prologue): Likewise.
>   (s390_emit_epilogue): Likewise.

This version is OK.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH v4] S/390: Allow relative addressing of literal pool entries

2018-11-09 Thread Ulrich Weigand
Ilya Leoshkevich wrote:

>gcc_assert (GET_CODE (x) != SYMBOL_REF
> -   || !CONSTANT_POOL_ADDRESS_P (x));
> +   || !CONSTANT_POOL_ADDRESS_P (x)
> +   || s390_symbol_relative_long_p (x));

Hmm, it's a bit weird that this routine now uses a different check
than the other two.  It would look more straightforward for
find_constant_pool_ref to use the same s390_safe_relative_long_p
check as the others.

(This would then also make s390_symbol_relative_long_p redundant.)

If we do that, it might even make sense to pull the 
s390_safe_relative_long_p check up into the callers:

E.g. in s390_mainpool_finish, replace

  /* Replace all literal pool references.  */

  for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
{
  if (NONJUMP_INSN_P (insn) || CALL_P (insn))
{

with

  if ((NONJUMP_INSN_P (insn) || CALL_P (insn))
  && !s390_safe_relative_long_p (insn))

(This last change is just a suggestion, only if it makes the
overall code simpler and more readable.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH v3] S/390: Allow relative addressing of literal pool entries

2018-11-09 Thread Ulrich Weigand
Ilya Leoshkevich wrote:

> +  /* Return unannotated constant pool references, so that the corresponding
> + entries are added to the back-end-managed pool.  Not doing so would result
> + in those entries being placed in the middle-end-managed pool, which would
> + in turn prevent merging them with identical ones in the back-end-managed
> + pool.  */

I'm still not convinced this is necessary; how frequently does the case occur
that a constant is duplicated, and is it worth the extra code complexity?
(Also, note that the middle-end pool is shared across the whole translation
unit, while the back-end pool is duplicated in every function ... so if the
same constant is used by many functions, forcing it into the back-end pool
may *increase* overall size.)

In any case, does this even happen with your current code?  You find the
unannotated reference here, and therefore copy the constant into the local
pool, but then replace_constant_pool_ref will still ignore the insn, and
thus the insn will actually continue to refer to the middle-end pool
constant, *not* the copy created in the local pool.  Right?

>if (DISP_IN_RANGE (INTVAL (frame_off)))
>   {
> -   insn = gen_rtx_SET (frame_pointer,
> -   gen_rtx_PLUS (Pmode, frame_pointer, frame_off));
> -   insn = emit_insn (insn);
> +   insn = emit_insn (copy_rtx (cfa));
>   }

This seems unrelated?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH v2] S/390: Allow LARL of literal pool entries

2018-10-31 Thread Ulrich Weigand
Ilya Leoshkevich wrote:
> On 30.10.2018 at 18:22, Ulrich Weigand wrote:
> > This definitely looks wrong.  If we haven't annotated the address,
> > it should *not* be found by find_constant_pool_ref, since we are
> > not going to replace it!  That was the whole point of not annotating
> > it in the first place ...
> 
> There are two use cases for find_constant_pool_ref ().  One is indeed
> replacing annotated references.  The other (in s390_mainpool_start ()
> and s390_chunkify_start ()) is creating pool entries.  So I've decided
> to let it find unannotated references for the second use case.

OK, but if we access the constant via relative address, we don't need
to copy it into the back-end managed pool either; the relative address
can just refer the constant in the default pool maintained by the
middle end.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH v2] S/390: Allow LARL of literal pool entries

2018-10-30 Thread Ulrich Weigand
Ilya Leoshkevich wrote:

> @@ -8223,6 +8237,18 @@ find_constant_pool_ref (rtx x, rtx *ref)
>&& XINT (x, 1) == UNSPECV_POOL_ENTRY)
>  return;
>  
> +  if (SYMBOL_REF_P (x)
> +  && CONSTANT_POOL_ADDRESS_P (x)
> +  && s390_symbol_larl_p (x))
> +{
> +  if (*ref == NULL_RTX)
> + *ref = x;
> +  else
> + gcc_assert (*ref == x);
> +
> +  return;
> +}

This definitely looks wrong.  If we haven't annotated the address,
it should *not* be found by find_constant_pool_ref, since we are
not going to replace it!  That was the whole point of not annotating
it in the first place ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Allow LARL of literal pool entries

2018-10-30 Thread Ulrich Weigand
Ilya Leoshkevich wrote:
> On 30.10.2018 at 16:20, Ulrich Weigand wrote:
> > Not sure that this is fully correct either.  *Some* instructions, like
> > e.g. floating-point loads, do not accept relative operands.  And even
> > for the relative loads that exist, there may be slightly different
> > restrictions on what addresses are allowed (e.g. LGRL only accepts
> > 8-byte aligned addresses, while LARL accepts 2-byte aligned addresses).
> 
> In such instructions SYMBOL_REFs appear only inside MEMs.  There is the
> special case for that in annotate_constant_pool_refs (), which ensures
> that UNSPEC_LTREFs are generated and recursive calls are not made.  So,
> the newly introduced check should have no effect on such instructions.

The point is, they should *not* get annotated either, since the
instructions actually can handle relative accesses to the literal
pool, so we don't need to use a base register.

But I think that still would work; alignment shouldn't be an issue after
all since literal pool contents are always naturally aligned.  So we only
need to recognize whether the insn alternative accepts relative
operands at all, which we could do as suggested below.

> > It seems the underlying problem is that we have predicates/constraints
> > that accept literal pool differences for two distinct reasons now:
> > either because they can be naturally handled via relative addressing,
> > or because they are supposed to be transformed to base-register addressing
> > later on.  We really need to distinguish the two cases in some way.
> >
> > Maybe it would make sense to check which alternative/constraint matched
> > the insn, and decide based on that whether we need to rewrite to base-
> > register addressing or not?
> 
> We currently have "type" attribute, which has the value "larl" for most
> relative addressing alternatives.  In a few cases it's "load" / "store",
> but that looks like an omission to me: e.g. for "lhrl" it's "larl", but
> for "lrl" it's "load".  We could query it from the back-end with
> get_attr_type () and compare the result with TYPE_LARL.

Good point, that looks like it should work.  For those cases where you do
have to change the type attribute, we need to verify that scheduler
properties do not change; this should really only be an issue with z10
(i.e. the 2097.md scheduler description).

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Allow LARL of literal pool entries

2018-10-30 Thread Ulrich Weigand
Ilya Leoshkevich wrote:
> On 29.10.2018 at 19:45, Ulrich Weigand wrote:
>
> > This is true.  But something else must still be going on here.  Note that
> > many other instruction patterns might contain constant pool addresses,
> > since they are accepted e.g. by the 'b' constraint.  In all of those
> > cases, we shouldn't add the UNSPEC_LTREF.  So just checking for the
> > specific LARL instruction pattern in annotate_constant_pool_refs does
> > not feel like a correct fix here.
>
> I have changed the patch to skip all larl_operands, regardless of which
> context they appear in.  Regtest is running.

Not sure that this is fully correct either.  *Some* instructions, like
e.g. floating-point loads, do not accept relative operands.  And even
for the relative loads that exist, there may be slightly different
restrictions on what addresses are allowed (e.g. LGRL only accepts
8-byte aligned addresses, while LARL accepts 2-byte aligned addresses).

It seems the underlying problem is that we have predicates/constraints
that accept literal pool differences for two distinct reasons now:
either because they can be naturally handled via relative addressing,
or because they are supposed to be transformed to base-register addressing
later on.  We really need to distinguish the two cases in some way.

Maybe it would make sense to check which alternative/constraint matched
the insn, and decide based on that whether we need to rewrite to base-
register addressing or not?

> > In fact, before r265490, the pattern for movdi_larl could also contain a
> > constant pool address, so why didn't the problem occur then?  What's the
> > difference whether this is part of movdi_larl or just movdi?
> 
> The difference is usage of "X" constraint.  Before, when we initially
> chose movdi_larl, we could still put UNSPEC_LTREF inside it without
> consequences, because during UNSPEC_LTREF lifetime only constraints are
> checked.

Huh, I see.  That was certainly unintentional.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Allow LARL of literal pool entries

2018-10-29 Thread Ulrich Weigand
Ilya Leoshkevich wrote:

> r265490 allowed the compiler to choose in a more flexible way whether to
> use load or load-address-relative-long (LARL) instruction.  When it
> chose LARL for literal pool references, the latter ones were rewritten
> by pass_s390_early_mach to use UNSPEC_LTREF, which assumes base register
> usage, which in turn is not compatible with LARL.  The end result was an
> ICE because of unrecognizable insn.
> 
> UNSPEC_LTREF and friends are necessary in order to communicate the
> dependency on the base register to pass_sched2.  When LARL is used, no
> base register is necessary, so in such cases the rewrite must be
> avoided.

This is true.  But something else must still be going on here.  Note that
many other instruction patterns might contain constant pool addresses,
since they are accepted e.g. by the 'b' constraint.  In all of those
cases, we shouldn't add the UNSPEC_LTREF.  So just checking for the
specific LARL instruction pattern in annotate_constant_pool_refs does
not feel like a correct fix here.

In fact, before r265490, the pattern for movdi_larl could also contain a
constant pool address, so why didn't the problem occur then?  What's the
difference whether this is part of movdi_larl or just movdi?

> @@ -8184,7 +8200,8 @@ annotate_constant_pool_refs (rtx *x)
> rtx addr = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, sym, base),
>UNSPEC_LTREF);
>  
> -   SET_SRC (*x) = plus_constant (Pmode, addr, off);
> +   SET_SRC (*x) = gen_rtx_CONST (Pmode,
> + plus_constant (Pmode, addr, off));

This looks like an unrelated change ... it seems incorrect to me, given
the UNSPEC_LTREF actually contains a register reference, so it shouldn't
really be CONST.  (And if it were, why make the change just here and not
everywhere a UNSPEC_LTREF is generated?)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH v3] S/390: Fix conditional returns on z196+

2018-09-24 Thread Ulrich Weigand
Ilya Leoshkevich wrote:

> gcc/ChangeLog:
> 
> 2018-09-19  Ilya Leoshkevich  
> 
>   PR target/80080
>   * config/s390/s390.c (s390_emit_epilogue): Do not use PARALLEL
>   RETURN+USE when returning via %r14.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-09-19  Ilya Leoshkevich  
> 
>   PR target/80080
>   * gcc.target/s390/risbg-ll-3.c: Expect conditional returns.
>   * gcc.target/s390/zvector/vec-cmp-2.c: Likewise.

This is OK.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH v2] S/390: Fix conditional returns on z196+

2018-09-21 Thread Ulrich Weigand
Ilya Leoshkevich wrote:

> 2018-09-19  Ilya Leoshkevich  
> 
>   PR target/80080
>   * config/s390/s390.md: Do not use PARALLEL RETURN+USE when
>   returning via %r14.

This makes sense to me.  I'm just wondering if it wouldn't be
simpler to do the check for r14 right in s390_emit_epilogue
and then just call gen_return instead of gen_return_use
directly there?

> +++ b/gcc/config/s390/s390.md
> @@ -10831,6 +10831,13 @@
>   (use (match_operand 0 "register_operand" "a"))])]
>""
>  {
> +  if (REGNO (operands[0]) == RETURN_REGNUM && s390_can_use_return_insn ())

This probably cannot happen in practice in this specific case,
but in general a register_operand may also be e.g. a SUBREG,
so you shouldn't use REGNO without first verifying that this
is a REG.  (If you move the check to s390_emit_epilogue as above,
this point is moot; you don't even need to generate a REG RTX
if it's for r14 then.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] libstdc++-v3: Have aligned_alloc() on Newlib

2018-08-08 Thread Ulrich Weigand
Jonathan Wakely wrote:

> Aha, so newlib was using memalign previously:
> 
> @@ -53,20 +54,24 @@ aligned_alloc (std::size_t al, std::size_t sz)
>  #else
>  extern "C" void *memalign(std::size_t boundary, std::size_t size);
>  #endif
> -#define aligned_alloc memalign

Yes, exactly ... this commit introduced the regression.

> OK, I've regressed the branches then - I'll fix that.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] libstdc++-v3: Have aligned_alloc() on Newlib

2018-08-08 Thread Ulrich Weigand
Jonathan Wakely wrote:
> On 08/08/18 10:52 +0200, Sebastian Huber wrote:
> >While building for Newlib, some configure checks must be hard coded.
> >The aligned_alloc() is supported since 2015 in Newlib.
> >
> >libstdc++-v3
> >
> > PR target/85904
> > * configure.ac): Define HAVE_ALIGNED_ALLOC if building for
> 
> There's a stray closing parenthesis here.
> 
> > Newlib.
> > * configure: Regnerate.
> 
> Typo "Regnerate".
> 
> But the patch itself is fine - OK for trunk.
> 
> I'm ambivalent about this being backported to gcc-7 and gcc-8 branches
> (gcc-6 is unaffected as it doesn't use aligned_alloc).
> 
> It's strictly speaking an ABI change, because HAVE_ALIGNED_ALLOC
> affects the memory layout for allocations from operator new(size_t,
> align_val_t) (in new_opa.cc) which needs to agree with the
> corresponding operator delete (in del_opa.cc). Using static linking it
> might be possible to create a binary that has operator new using
> aligned_alloc, but operator delete expecting to do ((void**)ptr)[-1],
> which would be bad.
> 
> Those operators are C++17, so "experimental", but maybe we shouldn't
> make the change on release branches.

The way it is now I'm getting build failures on the SPU target
(which is newlib based):

/home/uweigand/dailybuild/spu-tc-2018-08-07/gcc-head/src/libstdc++-v3/libsupc++/new_opa.cc:58:1: error: 'void* aligned_alloc(std::size_t, std::size_t)' was declared 'extern' and later 'static' [-fpermissive]
 aligned_alloc (std::size_t al, std::size_t sz)
 ^
In file included from /home/uweigand/dailybuild/spu-tc-2018-08-07/gcc-build/spu/libstdc++-v3/include/cstdlib:75:62,
                 from /home/uweigand/dailybuild/spu-tc-2018-08-07/gcc-build/spu/libstdc++-v3/include/stdlib.h:36,
                 from /home/uweigand/dailybuild/spu-tc-2018-08-07/gcc-head/src/libstdc++-v3/libsupc++/new_opa.cc:27:
/home/uweigand/dailybuild/spu-tc-2018-08-07/spu-toolchain/spu/include/stdlib.h:328:8: note: previous declaration of 'void* aligned_alloc(size_t, size_t)'
 void * aligned_alloc(size_t, size_t) __malloc_like __alloc_align(1)
^

This seems to be because configure is hard-coded to assume aligned_alloc
is not available, but then the new_opc.cc file tries to define its own
version, conflicting with the newlib prototype that is actually there.

So one way or the other this needs to be fixed ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 2/2] S/390: Remove TARGET_CPU_ZARCH

2018-08-08 Thread Ulrich Weigand
Ilya Leoshkevich wrote:

> TARGET_CPU_ZARCH allowed to distinguish between g5/g6 and newer
> machines.  Since the former are now gone, we can assume that
> TARGET_CPU_ZARCH is always true.  As a side-effect, branch splitting
> is now completely gone.  Some parts of literal pool splitting are
> also gone, but it's still there: we need to support it because
> floating point and vector instructions still cannot use relative
> addressing.

Great to see this finally go!

B.t.w. the literal pool handling can be simplified further now.
The current code was made complicated due to the potential of
interaction between branch splitting and literal pool splitting,
see the long comment:

 However, the two problems are interdependent: splitting the
 literal pool can move a branch further away from its target,
 causing the 64K limit to overflow, and on the other hand,
 replacing a PC-relative branch by an absolute branch means
 we need to put the branch target address into the literal
 pool, possibly causing it to overflow.

 So, we loop trying to fix up both problems until we manage
 to satisfy both conditions at the same time.  [...]

Since branch splitting is gone, this whole logic is superfluous;
the loop will never execute more than once.  (It may not be
necessary any longer to split the logic into the various
chunkify_start/finish routines either ...)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[spu, commit] Define TARGET_HAVE_SPECULATION_SAFE_VALUE

2018-08-06 Thread Ulrich Weigand
Hello,

the SPU processor is not affected by speculation, so this macro can
safely be defined as speculation_safe_value_not_needed.

Committed to mainline.

Bye,
Ulrich


gcc/ChangeLog:

PR target/86807
* config/spu/spu.c (TARGET_HAVE_SPECULATION_SAFE_VALUE):
Define to speculation_safe_value_not_needed.

Index: config/spu/spu.c
===
--- config/spu/spu.c(revision 263334)
+++ config/spu/spu.c(working copy)
@@ -7463,6 +7463,9 @@ static const struct attribute_spec spu_a
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT spu_constant_alignment
 
+#undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
+#define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-spu.h"
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: Build fail on gthr-single.h targets (Re: AsyncI/O patch committed)

2018-07-26 Thread Ulrich Weigand
I wrote:
> Nicholas Koenig wrote:
> 
> > Hello everyone,
> > 
> > I have committed the async I/O patch as r262978.
> > 
> > The test cases are in libgomp.fortran for now, maybe that can be changed 
> > later.
> 
> It looks like this broke building libgfortran on spu, and presumably
> any platform that uses gthr-simple instead of gthr-posix.

The file is called gthr-single.h, not gthr-simple.h ... sorry for the typo.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Build fail on gthr-simple.h targets (Re: AsyncI/O patch committed)

2018-07-26 Thread Ulrich Weigand
Nicholas Koenig wrote:

> Hello everyone,
> 
> I have committed the async I/O patch as r262978.
> 
> The test cases are in libgomp.fortran for now, maybe that can be changed 
> later.

It looks like this broke building libgfortran on spu, and presumably
any platform that uses gthr-simple instead of gthr-posix.

The problem is that io/asynch.h unconditionally uses a couple of
features that are not provided by gthr-simplex, in particular
  __gthread_cond_t
and
  __gthread_equal / __gthread_self

According to the documentation in gthr.h, the former is only available
if __GTHREAD_HAS_COND is defined, and the latter are only available if
__GTHREADS_CXX0X is defined.  Neither is true for gthr-simple.h.

To fix the build error, either libgfortran should only use those features
conditionally on those defines, or else the gthr.h logic needs to be
changed and (stubs for) those features provided in gthr-simple.h as well.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH, S390] Avoid LA with base and index on z13

2018-07-05 Thread Ulrich Weigand
Robin Dapp wrote:

> * config/s390/s390.c (preferred_la_operand_p): Do not use
>   LA with base and index on z13 or later.

The code just before your change reads:

  /* Avoid LA instructions with index register on z196; it is
 preferable to use regular add instructions when possible.
 Starting with zEC12 the la with index register is "uncracked"
 again.  */
  if (addr.indx && s390_tune == PROCESSOR_2817_Z196)
return false;

But on zEC12 LA works pretty much the same as on z13/z14, it is
indeed not cracked, but still a 2-cycle instruction when using
an index register.  So I guess the change really should apply
to zEC12 as well, and this could be as simple as changing the
above line to:

  if (addr.indx && s390_tune >= PROCESSOR_2817_Z196)

(Note that "addr.base && addr.indx" is the same as just checking
for addr.indx, since s390_decompose_address will never fill in
*just* an index.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[spu, commit] Fix PR target/82960

2017-12-08 Thread Ulrich Weigand
Hello,

the ICE with --enable-checking=rtl reported in PR 82960 was caused by
spu.c:pad_bb using INSN_CODE on RTX with INSN_P false (specifically,
on jump_table_data).  Add checks to handle this case.

Tested on spu-elf, committed to mainline.

Bye,
Ulrich

ChangeLog:

PR target/82960
* config/spu/spu.c (pad_bb): Only check INSN_CODE when INSN_P is true.

Index: gcc/config/spu/spu.c
===
*** gcc/config/spu/spu.c(revision 255408)
--- gcc/config/spu/spu.c(working copy)
*** pad_bb(void)
*** 2029,2036 
for (; insn; insn = next_insn)
  {
next_insn = next_active_insn (insn);
!   if (INSN_CODE (insn) == CODE_FOR_iprefetch
! || INSN_CODE (insn) == CODE_FOR_hbr)
{
  if (hbr_insn)
{
--- 2029,2037 
for (; insn; insn = next_insn)
  {
next_insn = next_active_insn (insn);
!   if (INSN_P (insn)
!   && (INSN_CODE (insn) == CODE_FOR_iprefetch
! || INSN_CODE (insn) == CODE_FOR_hbr))
{
  if (hbr_insn)
{
*** pad_bb(void)
*** 2048,2054 
}
  hbr_insn = insn;
}
!   if (INSN_CODE (insn) == CODE_FOR_blockage && next_insn)
{
  if (GET_MODE (insn) == TImode)
PUT_MODE (next_insn, TImode);
--- 2049,2055 
}
  hbr_insn = insn;
}
!   if (INSN_P (insn) && INSN_CODE (insn) == CODE_FOR_blockage && next_insn)
{
  if (GET_MODE (insn) == TImode)
PUT_MODE (next_insn, TImode);
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH v5] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-24 Thread Ulrich Weigand
Dominik Vogt wrote:
> On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote:
> > The attached patch optimizes the atomic_exchange and
> > atomic_compare patterns on s390 and s390x (mostly limited to
> > SImode and DImode).  Among general optimization, the changes fix
> > most of the problems reported in PR 80080:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080
> > 
> > Bootstrapped and regression tested on a zEC12 with s390 and s390x
> > biarch.
> 
> v5:
>   * Generate LT pattern directly for const 0 value.
>   * Split into three patches.
> 
> Bootstrapped and regression tested on a zEC12 with s390 and s390x
> biarch.

> gcc/ChangeLog-dv-atomic-gcc7-1
> 
>   * config/s390/s390.md ("cstorecc4"): Use load-on-condition and deal
>   with CCZmode for TARGET_Z196.

> gcc/ChangeLog-dv-atomic-gcc7-2
> 
>   * config/s390/s390.md (define_peephole2): New peephole to help
>   combining the load-and-test pattern with volatile memory.

> gcc/ChangeLog-dv-atomic-gcc7-3
> 
>   * s390-protos.h (s390_expand_cs_hqi): Removed.
>   (s390_expand_cs, s390_expand_atomic_exchange_tdsi): New prototypes.
>   * config/s390/s390.c (s390_emit_compare_and_swap): Handle all integer
>   modes as well as CCZ1mode and CCZmode.
>   (s390_expand_atomic_exchange_tdsi, s390_expand_atomic): Adapt to new
>   signature of s390_emit_compare_and_swap.
>   (s390_expand_cs_hqi): Likewise, make static.
>   (s390_expand_cs_tdsi): Generate an explicit compare before trying
>   compare-and-swap, in some cases.
>   (s390_expand_cs): Wrapper function.
>   (s390_expand_atomic_exchange_tdsi): New backend specific expander for
>   atomic_exchange.
>   (s390_match_ccmode_set): Allow CCZmode <-> CCZ1 mode.
>   * config/s390/s390.md ("atomic_compare_and_swap"): Merge the
>   patterns for small and large integers.  Forbid symref memory operands.
>   Move expander to s390.c.  Require cc register.
>   ("atomic_compare_and_swap_internal")
>   ("*atomic_compare_and_swap_1")
>   ("*atomic_compare_and_swapdi_2")
>   ("*atomic_compare_and_swapsi_3"): Use s_operand to forbid
>   symref memory operands.  Remove CC mode and call s390_match_ccmode
>   instead.
>   ("atomic_exchange"): Allow and implement all integer modes.
>
> gcc/testsuite/ChangeLog-dv-atomic-gcc7
> 
>   * gcc.target/s390/md/atomic_compare_exchange-1.c: New test.
>   * gcc.target/s390/md/atomic_compare_exchange-1.inc: New test.
>   * gcc.target/s390/md/atomic_exchange-1.inc: New test.


These all look good to me now.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-10 Thread Ulrich Weigand
Dominik Vogt wrote:

> So, we could add a special case for const0_rtx that generates the
> LT pattern and does not rely on Combine, and get rid of the
peephole.  I'm not sure this is worthwhile though, because the
> peephole has other beneficial effects (as discussed), and until
> we've solved the problems preventing Combine from merging L+LTR in
> some cases, this is the best we have.  What do you think?

If we removed the peephole (for now), the patch now only touches
parts of the backend used to emit atomic instructions, so code
generation for any code that doesn't use those is guaranteed to
be unchanged.  Given that we're quite late in the cycle, this
might be a good idea at this point ...

But I don't see anything actually incorrect in the peephole, and
it might indeed be a good thing in general -- just maybe more
appropriate for the next stage1. 

Andreas, do you have an opinion on this?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-07 Thread Ulrich Weigand
Dominik Vogt wrote:
> On Fri, Apr 07, 2017 at 04:34:44PM +0200, Ulrich Weigand wrote:
> > > +; Peephole to combine a load-and-test from volatile memory which combine 
> > > does
> > > +; not do.
> > > +(define_peephole2
> > > +  [(set (match_operand:GPR 0 "register_operand")
> > > + (match_operand:GPR 2 "memory_operand"))
> > > +   (set (reg CC_REGNUM)
> > > + (compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
> > > +  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
> > > +   && GENERAL_REG_P (operands[0])
> > > +   && satisfies_constraint_T (operands[2])"
> > > +  [(parallel
> > > +[(set (reg:CCS CC_REGNUM)
> > > +   (compare:CCS (match_dup 2) (match_dup 1)))
> > > + (set (match_dup 0) (match_dup 2))])])
> > 
> > Still wondering why this is necessary.
> 
> It's necessary because Combine refuses to match anything that
> contains a volatile memory reference, using a global flag for
> Recog.

So is this specifically to match the pre-test load emitted here?

+  emit_move_insn (output, mem);
+  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp)));

If so, since you already know that this should always map to a
LOAD AND TEST, could simply just emit the LT pattern here,
instead of relying on combine to do it ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-07 Thread Ulrich Weigand
Dominik Vogt wrote:

> v4:
> 
>   * Remove CCZZ1 iterator. 
>   * Remove duplicates of CS patterns. 
>   * Move the skip_cs_label so that output is moved to vtarget even 
> if the CS instruction was not used. 
>   * Removed leftover from "sne" (from an earlier version of the patch). 

Thanks, this looks quite good to me now.  I do still have two questions:

> +; Peephole to combine a load-and-test from volatile memory which combine does
> +; not do.
> +(define_peephole2
> +  [(set (match_operand:GPR 0 "register_operand")
> + (match_operand:GPR 2 "memory_operand"))
> +   (set (reg CC_REGNUM)
> + (compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
> +  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
> +   && GENERAL_REG_P (operands[0])
> +   && satisfies_constraint_T (operands[2])"
> +  [(parallel
> +[(set (reg:CCS CC_REGNUM)
> +   (compare:CCS (match_dup 2) (match_dup 1)))
> + (set (match_dup 0) (match_dup 2))])])

Still wondering why this is necessary.  On the other hand, I guess it
cannot hurt to have the peephole either ...

> @@ -6518,13 +6533,30 @@
>[(parallel
>  [(set (match_operand:SI 0 "register_operand" "")
> (match_operator:SI 1 "s390_eqne_operator"
> -   [(match_operand:CCZ1 2 "register_operand")
> +   [(match_operand 2 "cc_reg_operand")
>   (match_operand 3 "const0_operand")]))
>   (clobber (reg:CC CC_REGNUM))])]
>""
> -  "emit_insn (gen_sne (operands[0], operands[2]));
> -   if (GET_CODE (operands[1]) == EQ)
> - emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
> +  "machine_mode mode = GET_MODE (operands[2]);
> +   if (TARGET_Z196)
> + {
> +   rtx cond, ite;
> +
> +   if (GET_CODE (operands[1]) == NE)
> +  cond = gen_rtx_NE (VOIDmode, operands[2], const0_rtx);
> +   else
> +  cond = gen_rtx_EQ (VOIDmode, operands[2], const0_rtx);
> +   ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, const0_rtx);
> +   emit_insn (gen_rtx_SET (operands[0], ite));
> + }
> +   else
> + {
> +   if (mode != CCZ1mode)
> +  FAIL;
> +   emit_insn (gen_sne (operands[0], operands[2]));
> +   if (GET_CODE (operands[1]) == EQ)
> +  emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
> + }
> DONE;")

From what I can see in the rest of the patch, none of the CS changes now
actually *rely* on this change to cstorecc4 ... s390_expand_cs_tdsi only
calls cstorecc4 on !TARGET_Z196, where the above change is a no-op, and
in the TARGET_Z196 case it deliberately does *not* use cstorecc4.

Now, in general this improvement to cstorecc4 is of course valuable
in itself.  But I think at this point it might be better to separate
this out into an independent patch (and measure its effect separately).

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-06 Thread Ulrich Weigand
I wrote (incorrectly):

> >[(parallel
> >  [(set (match_operand:SI 0 "register_operand" "")
> >   (match_operator:SI 1 "s390_eqne_operator"
> > -   [(match_operand:CCZ1 2 "register_operand")
> > +   [(match_operand 2 "cc_reg_operand")
> > (match_operand 3 "const0_operand")]))
> >   (clobber (reg:CC CC_REGNUM))])]
> >""
> > -  "emit_insn (gen_sne (operands[0], operands[2]));
> > -   if (GET_CODE (operands[1]) == EQ)
> > - emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
> > +  "machine_mode mode = GET_MODE (operands[2]);
> > +   if (TARGET_Z196)
> > + {
> > +   rtx cond, ite;
> > +
> > +   if (GET_CODE (operands[1]) == NE)
> > +cond = gen_rtx_NE (VOIDmode, operands[2], const0_rtx);
> > +   else
> > +cond = gen_rtx_EQ (VOIDmode, operands[2], const0_rtx);
> 
> I guess as a result cond is now always the same as operands[1] and
> could be just taken from there?

This is wrong -- I didn't notice the mode changes: in the cstore
pattern, the mode on the operator is SImode, but on the if_then_else
we want VOIDmode.

> > +   ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, const0_rtx);
> > +   emit_insn (gen_rtx_SET (operands[0], ite));
> 
> Also, since you're just emitting RTL directly, maybe you could simply use
> the expander pattern above to do so (and not use emit_insn followed
> by DONE in this case?)

Therefore this doesn't work either.

Sorry for the confusion.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-06 Thread Ulrich Weigand
Dominik Vogt wrote:

> > v3:
> > 
> >   * Remove sne* patterns.
> >   * Move alignment check from s390_expand_cs to s390.md.
> >   * Use s_operand instead of memory_nosymref_operand.
> >   * Remove memory_nosymref_operand.
> >   * Allow any CC-mode in cstorecc4 for TARGET_Z196.
> >   * Fix EQ with TARGET_Z196 in cstorecc4.
> >   * Duplicate CS patterns for CCZmode.
> > 
> > Bootstrapped and regression tested on a zEC12 with s390 and s390x
> > biarch.

>  s390_emit_compare_and_swap (enum rtx_code code, rtx old, rtx mem,
> - rtx cmp, rtx new_rtx)
> + rtx cmp, rtx new_rtx, machine_mode ccmode)
>  {
> -  emit_insn (gen_atomic_compare_and_swapsi_internal (old, mem, cmp, 
> new_rtx));
> -  return s390_emit_compare (code, gen_rtx_REG (CCZ1mode, CC_REGNUM),
> - const0_rtx);
> +  switch (GET_MODE (mem))
> +{
> +case SImode:
> +  if (ccmode == CCZ1mode)
> + emit_insn (gen_atomic_compare_and_swapsiccz1_internal (old, mem, cmp,
> +new_rtx));
> +  else
> + emit_insn (gen_atomic_compare_and_swapsiccz_internal (old, mem, cmp,
> +new_rtx));
> +  break;
> +case DImode:
> +  if (ccmode == CCZ1mode)
> + emit_insn (gen_atomic_compare_and_swapdiccz1_internal (old, mem, cmp,
> +new_rtx));
> +  else
> + emit_insn (gen_atomic_compare_and_swapdiccz_internal (old, mem, cmp,
> +   new_rtx));
> +  break;
> +case TImode:
> +  if (ccmode == CCZ1mode)
> + emit_insn (gen_atomic_compare_and_swapticcz1_internal (old, mem, cmp,
> +new_rtx));
> +  else
> + emit_insn (gen_atomic_compare_and_swapticcz_internal (old, mem, cmp,
> +   new_rtx));
> +  break;

These expanders don't really do anything different depending on the
mode of the accessed word (SI/DI/TImode), so this seems like a bit of
unnecessary duplication.  The original code was correct in always
calling the SImode variant, even if this looks a bit odd.  Maybe a
better fix is to just remove the mode from this expander.

> +  if (TARGET_Z196
> +  && (mode == SImode || mode == DImode))
> +do_const_opt = (is_weak && CONST_INT_P (cmp));
> +
> +  if (do_const_opt)
> +{
> +  const int very_unlikely = REG_BR_PROB_BASE / 100 - 1;
> +  rtx cc = gen_rtx_REG (CCZmode, CC_REGNUM);
> +
> +  skip_cs_label = gen_label_rtx ();
> +  emit_move_insn (output, mem);
> +  emit_move_insn (btarget, const0_rtx);
> +  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp)));
> +  s390_emit_jump (skip_cs_label, gen_rtx_NE (VOIDmode, cc, const0_rtx));
> +  add_int_reg_note (get_last_insn (), REG_BR_PROB, very_unlikely);
> +  /* If the jump is not taken, OUTPUT is the expected value.  */
> +  cmp = output;
> +  /* Reload newval to a register manually, *after* the compare and jump
> +  above.  Otherwise Reload might place it before the jump.  */
> +}
> +  else
> +cmp = force_reg (mode, cmp);
> +  new_rtx = force_reg (mode, new_rtx);
> +  s390_emit_compare_and_swap (EQ, output, mem, cmp, new_rtx,
> +   (do_const_opt) ? CCZmode : CCZ1mode);
> +
> +  /* We deliberately accept non-register operands in the predicate
> + to ensure the write back to the output operand happens *before*
> + the store-flags code below.  This makes it easier for combine
> + to merge the store-flags code with a potential test-and-branch
> + pattern following (immediately!) afterwards.  */
> +  if (output != vtarget)
> +emit_move_insn (vtarget, output);
> +
> +  if (skip_cs_label != NULL)
> +  emit_label (skip_cs_label);

So if do_const_opt is true, but output != vtarget, the code above will
write to output, but this is then never moved to vtarget.  This looks
incorrect.
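For reference, the intended shape of that fast path can be written as a plain-C model; the __atomic builtins stand in for the s390 CS instruction, and the function name and the fixed expected value 0 are illustrative only.  Note the model writes the observed value to its single output on both paths -- the behaviour the write-back to vtarget is supposed to guarantee.

```c
/* Plain-C model of the constant-expected fast path: load once,
   pre-test against the expected constant (0 here), and only attempt
   the compare-and-swap when the pre-test succeeds.  __atomic builtins
   stand in for the s390 CS instruction; names are illustrative.  */
#include <stdbool.h>

bool
cas_expect_zero (long *mem, long newval, long *output)
{
  long cur = __atomic_load_n (mem, __ATOMIC_RELAXED);
  *output = cur;           /* observed value is written on both paths */
  if (cur != 0)
    return false;          /* "skip_cs_label": CS is never issued */

  long expected = 0;
  bool ok = __atomic_compare_exchange_n (mem, &expected, newval,
                                         /*weak=*/false,
                                         __ATOMIC_SEQ_CST,
                                         __ATOMIC_SEQ_CST);
  *output = expected;      /* value the CS itself observed */
  return ok;
}
```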

> +  if (TARGET_Z196 && do_const_opt)

do_const_opt seems to always imply TARGET_Z196.

> +; Peephole to combine a load-and-test from volatile memory which combine does
> +; not do.
> +(define_peephole2
> +  [(set (match_operand:GPR 0 "register_operand")
> + (match_operand:GPR 2 "memory_operand"))
> +   (set (reg CC_REGNUM)
> + (compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
> +  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
> +   && GENERAL_REG_P (operands[0])
> +   && satisfies_constraint_T (operands[2])"
> +  [(parallel
> +[(set (reg:CCS CC_REGNUM)
> +   (compare:CCS (match_dup 2) (match_dup 1)))
> + (set (match_dup 0) (match_dup 2))])])

We should really try to understand why this isn't done earlier and
fix the problem there ...

>[(parallel
>  [(set (match_operand:SI 0 "register_operand" "")
> 

Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-05 Thread Ulrich Weigand
Dominik Vogt wrote:
> On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote:
> > The attached patch optimizes the atomic_exchange and
> > atomic_compare patterns on s390 and s390x (mostly limited to
> > SImode and DImode).  Among general optimization, the changes fix
> > most of the problems reported in PR 80080:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080
> > 
> > Bootstrapped and regression tested on a zEC12 with s390 and s390x
> > biarch.
> 
> New version attached.

No, it isn't :-)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH, rs6000][GCC6] Fix PR78543, ICE in push_reload on powerpc64le-linux-gnu

2017-03-07 Thread Ulrich Weigand
Peter Bergner wrote:

> If we look at rs6000_mode_dependent_address(), it accepts some addresses
> as not being mode dependent:
> 
> case PLUS:
>   /* Any offset from virtual_stack_vars_rtx and arg_pointer_rtx
>  is considered a legitimate address before reload, so there
>  are no offset restrictions in that case.  Note that this
>  condition is safe in strict mode because any address involving
>  virtual_stack_vars_rtx or arg_pointer_rtx would already have
>  been rejected as illegitimate.  */
>   if (XEXP (addr, 0) != virtual_stack_vars_rtx
>   && XEXP (addr, 0) != arg_pointer_rtx
>   && GET_CODE (XEXP (addr, 1)) == CONST_INT)
> {
>   unsigned HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
>   return val + 0x8000 >= 0x1 - (TARGET_POWERPC64 ? 8 : 12);
> }
>   break;
> 
> It seems wrong that we accept addresses based off virtual_stack_vars_rtx
> and arg_pointer_rtx, but not stack_pointer_rtx (ie, r1) like we have in
> these test cases.

This check is usually done in legitimate_address_p, and there this
distinction is in fact correct: the problem is that when rewriting
an address based on a virtual register into one based on the stack
(or frame) pointer, the offset will change, and may change significantly.
(E.g. you may end up with "frame size - original offset".)

This means that checking the offset relative to the virtual register
is completely pointless:
(A) If the offset relative to the virtual register is in range, the
resulting offset relative to the stack pointer may be out of range.
(B) If the offset relative to the virtual register is out of range,
the resulting offset may still end up *in* range.

Because of (A), reload will re-check offsets after eliminating
registers, and fix up offsets that are now out of range.  But
because if (B), if we were to reject out-of-range offsets relative
to a virtual register already in legitimate_address_p, we'd already
commit to generating more complex code (e.g. precomputing the address)
and even if the final offset would be in range, nobody would then
clean up that unnecessary code any more.

That is why the legitimate_address_p deliberately and correctly
accepts *any* offset for virtual (and eliminable) registers,
thereby deferring the offset validity check until after the
register has been replaced by a real register.  But for any
actually real register (like the stack pointer), of course we
*do* have to perform the precise check -- nobody would do that
check otherwise, and we'd end up with invalid instructions.


Now, here we are in mode_dependent_address, which in general
should return true if legitimate_address_p on this address might
return different results depending on the mode of the address.

And in fact, for stack-pointer based addresses, legitimate_address_p
*can* return different results depending on the mode, as the comment
in front of mode_dependent_address explains:

/* Go to LABEL if ADDR (a legitimate address expression)
   has an effect that depends on the machine mode it is used for.

   On the RS/6000 this is true of all integral offsets (since AltiVec
   and VSX modes don't allow them) or is a pre-increment or decrement.


On the other hand, for addresses based on a virtual register,
legitimate_address_p does not depend on the mode since those
are special-cased to be always accepted (see the discussion above).

So I'm not sure that the proposed change is in fact correct ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [RFA] Patch to allow SPU port to build with current trunk

2016-10-26 Thread Ulrich Weigand
Jeff Law wrote:

> First, there's a missing fallthru comment in spu_sched_reorder for 
> TYPE_LOAD/TYPE_STORE cases.  If I'm reading the SPU docs properly a 
> load/store insn is handled by pipe_1 and we're also trying to model some 
> aspects of the load-store unit.  So we should be setting pipe_ls and pipe_1:
> 
>   case TYPE_LOAD:
>case TYPE_STORE:
>  pipe_ls = i;
>case TYPE_LNOP:
>case TYPE_SHUF:
>case TYPE_BR:
>case TYPE_MULTI1:
>case TYPE_HBR:
>  pipe_1 = i;
>  break;
> 
> This looks like intentional fallthru and should just have an appropriate 
> comment to silence the warning.

Agreed.

> spu_legitimate_address looks far more interesting and I think it's buggy 
> as written:
> 
> 
>  case SUBREG:
>x = XEXP (x, 0);
>if (REG_P (x))
>  return 0;
> 
>  case REG:
>return INT_REG_OK_FOR_BASE_P (x, reg_ok_strict);
> 
> I think the test is inverted.  We want to consider (subreg (reg)) a 
> valid memory address and reject all other (subreg (...)) expressions. 
> But this code does the opposite.

Oops, it looks like this has been broken since this commit:
https://gcc.gnu.org/ml/gcc-patches/2009-05/msg01505.html

>   * config/spu/spu.c (spu_sched_reorder): Add missing fallthru comment.
>   (spu_legitimate_address_p): Fix logic error and add missing fallthru
>   comment.

This is OK.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



MAINTAINERS: Update Hartmut Penner's email address

2016-10-21 Thread Ulrich Weigand
Hello,

as requested by Hartmut, I've updated the MAINTAINERS file to show
his new email address since the old one no longer works.

Bye,
Ulrich

Index: MAINTAINERS
===
--- MAINTAINERS (revision 241398)
+++ MAINTAINERS (working copy)
@@ -94,7 +94,7 @@ rs6000/powerpc port   David Edelsohn  
 rs6000 vector extnsAldy Hernandez  <al...@redhat.com>
 rx portNick Clifton<ni...@redhat.com>
-s390 port  Hartmut Penner  <hpen...@de.ibm.com>
+s390 port  Hartmut Penner  <hepen...@us.ibm.com>
 s390 port  Ulrich Weigand  <uweig...@de.ibm.com>
 s390 port  Andreas Krebbel <andreas.kreb...@de.ibm.com>
 score port Chen Liqin  <liqin....@gmail.com>
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH, RELOAD] Don't assume subreg mem address is ok

2016-08-17 Thread Ulrich Weigand
Alan Modra wrote:

>   PR rtl-optimization/72771
>   * reload.c (find_reloads): Don't assume that a subreg mem is OK
>   when find_reloads_toplev returns address_reloaded==-1.
>   (alternative_allows_const_pool_ref): Update comment.
> testsuite/
>   * gcc.c-torture/compile/pr72771.c: New.

Yes, this makes sense.  address_reloaded == -1 means that the address
may or may not have been fully reloaded, so we always need to take the
conservative action in that case.

The patch is OK.

Bye,
Ulrich
 
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH, rs6000] Fix PR target/71733, ICE with -mcpu=power9 -mno-vsx

2016-07-12 Thread Ulrich Weigand
Alan Modra wrote:
> On Tue, Jul 12, 2016 at 02:02:43PM +0200, Ulrich Weigand wrote:
> > The second time around, get_secondary_mem should reuse the
> > same stack slot it already allocated, and the elimination
> > offsets should already be set to accommodate that stack slot,
> > which means the second time around, the correct RTX should be
> > generated for the memory access.
> > 
> > Is this not happening somehow?
> 
> Duh, yes, of course.  Second time around the mem is
> (mem/c:V16QI (plus:DI (reg/f:DI 1 1)
> (const_int -16 [0xfff0])) [0  S16 A128])
> so we're checking the correct offset.
> 
> The problem now is that this passes rs6000_legitimate_address_p due to
> mode_supports_vsx_dform_quad and quad_address_p being true.  That
> doesn't seem correct for -mno-vsx.

Hmm, I see in rs6000_option_override_internal:

  /* ISA 3.0 D-form instructions require p9-vector and upper-regs.  */
  if ((TARGET_P9_DFORM_SCALAR || TARGET_P9_DFORM_VECTOR) && !TARGET_P9_VECTOR)
{
  if (rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR)
error ("-mpower9-dform requires -mpower9-vector");
  rs6000_isa_flags &= ~(OPTION_MASK_P9_DFORM_SCALAR
| OPTION_MASK_P9_DFORM_VECTOR);
}


and *later*:

  /* ISA 3.0 vector instructions include ISA 2.07.  */
  if (TARGET_P9_VECTOR && !TARGET_P8_VECTOR)
{
  if (rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR)
error ("-mpower9-vector requires -mpower8-vector");
      rs6000_isa_flags &= ~OPTION_MASK_P9_VECTOR;
}

That seems the wrong way around ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH, rs6000] Fix PR target/71733, ICE with -mcpu=power9 -mno-vsx

2016-07-12 Thread Ulrich Weigand
Alan Modra wrote:
> On Mon, Jul 11, 2016 at 03:26:46PM +0200, Ulrich Weigand wrote:
> > However, there still seems to be another underlying problem: reload
> > should never generate invalid instructions.  (Playing with '?' to
> > fix *inefficient* instructions is fine, but we shouldn't rely on '?'
> > to fix *invalid* instructions.)
> > 
> > If reload -for whatever reason- wants to emit a vr->gpr move, then it
> > should generate a valid instruction sequence to do so, via secondary
> > reloads if necessary.  I think it would be a good idea to investigate
> > why that isn't happening in this test case.  [ There is a chance that
> > the underlying bug will reappear in a different context, even after
> > the '?' change is applied. ]
> 
> The vr->gpr move uses secondary memory.  The memory allocated by
> assign_stack_local, called from get_secondary_memory is
> 
> (mem/c:V16QI (plus:DI (reg/f:DI 113 sfp)
> (const_int 32 [0x20])) [0  S16 A128])
> 
> and that is incorrectly simplified by eliminate_regs to
> 
> (mem/c:V16QI (reg/f:DI 1 1))
> 
> for use by the strict_memory_address_space_p check, which of course
> then passes.
> 
> eliminate_regs sees this entry in the elimination table:
> (gdb) p *ep
> $7 = {from = 113, to = 1, initial_offset = -32, can_eliminate = 1, 
> can_eliminate_previous = 1, offset = -32, previous_offset = -32, 
> ref_outside_mem = 0, from_rtx = 0x7688a300, to_rtx = 0x7688a2e8}
> 
> The offset in the elimination table was correct before
> get_secondary_memory called assign_stack_local, but not after
> frame_offset was changed when allocating a new stack slot.

OK, I see.  Yes, the first time around that will be wrong.
However, if get_secondary_mem called assign_stack_local, the
stack size will be changing, which is why the main loop
around calculate_needs_all_insns() in reload() should set
the "something_changed" flag and restart the whole process.
(Which means the reloads for the first pass will be wrong,
but that doesn't matter since they will be ignored.)

The second time around, get_secondary_mem should reuse the
same stack slot it already allocated, and the elimination
offsets should already be set to accommodate that stack slot,
which means the second time around, the correct RTX should be
generated for the memory access.

Is this not happening somehow?
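The restart behaviour described above can be modelled as a small fixed-point loop; all names and sizes here are invented for illustration and are not reload's actual code:

```c
/* Toy fixed-point loop mirroring reload's restart: the first pass
   allocates the secondary-memory slot and grows the frame
   ("something_changed"), the second pass reuses the slot and the
   frame layout is stable.  */
static long frame_size = 0;
static long secondary_slot = -1;   /* -1: not yet allocated */

long
get_secondary_mem_slot (void)
{
  if (secondary_slot < 0)
    {                              /* first call: grow the frame */
      secondary_slot = frame_size;
      frame_size += 16;
    }
  return secondary_slot;           /* later calls: reuse the slot */
}

int
reload_passes (void)
{
  int passes = 0;
  int something_changed;
  do
    {
      long before = frame_size;
      (void) get_secondary_mem_slot ();
      something_changed = (frame_size != before);
      passes++;
    }
  while (something_changed);
  return passes;
}
```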

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH, rs6000] Fix PR target/71733, ICE with -mcpu=power9 -mno-vsx

2016-07-11 Thread Ulrich Weigand
Alan Modra wrote:
> The reason this fails is that no alternative in altivec_mov
> exactly matches.  Ideally reload would choose the Z,v alternative
> (cost 9) and you'd get an address reload for the mem to make it match
> the "Z" constraint.  Instead reload chooses the Y,r alternative (cost
> 8) as best.
> 
> That's a bad choice.  "Y" matches the r3+16 mem so no reloads needed
> for the mem, however the altivec reg needs to be reloaded to gprs.
> Without direct moves from altivec to gpr the reload needs to go via
> mem, so you get an altivec store to stack plus gpr load from stack.
> The store to stack has the same form as the original store.  Oops, no
> progress.
> 
> Even if direct moves were available from altivec regs to gprs, the Y,r
> alternative would still be a bad choice.
> 
> I believe all the r alternatives in altivec_mov should be
> disparaged with "?".

I agree that this makes sense just from a code quality perspective.

However, there still seems to be another underlying problem: reload
should never generate invalid instructions.  (Playing with '?' to
fix *inefficient* instructions is fine, but we shouldn't rely on '?'
to fix *invalid* instructions.)

If reload -for whatever reason- wants to emit a vr->gpr move, then it
should generate a valid instruction sequence to do so, via secondary
reloads if necessary.  I think it would be a good idea to investigate
why that isn't happening in this test case.  [ There is a chance that
the underlying bug will reappear in a different context, even after
the '?' change is applied. ]

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] Add ggc-tests.c

2016-06-14 Thread Ulrich Weigand
David Malcolm wrote:
> On Mon, 2016-06-13 at 13:36 +0200, Ulrich Weigand wrote:
> > Gerald Pfeifer wrote:
> > 
> > > The source code of need_finalization_p in ggc.h reads
> > > 
> > >template<typename T>
> > >static inline bool
> > >need_finalization_p ()
> > >{
> > >#if GCC_VERSION >= 4003
> > >  return !__has_trivial_destructor (T);
> > >#else
> > >  return true;
> > >#endif
> > >}
> > > 
> > > which means your self test is broken by design for any compiler
> > > that is not GCC in at least version 4.3, isn't it?
> > 
> > Just to confirm that I'm seeing the same failure on my SPU
> > daily build machine, which is running RHEL 5 with a host
> > compiler of GCC 4.1.2.
> 
> Sorry about this.
> 
> Looks like Uros fixed this in r237381.

Yes, now it works again for me.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] Add ggc-tests.c

2016-06-13 Thread Ulrich Weigand
Gerald Pfeifer wrote:

> The source code of need_finalization_p in ggc.h reads
> 
>template<typename T>
>static inline bool
>need_finalization_p ()
>{
>#if GCC_VERSION >= 4003
>  return !__has_trivial_destructor (T);
>#else
>  return true;
>#endif
>}
> 
> which means your self test is broken by design for any compiler
> that is not GCC in at least version 4.3, isn't it?

Just to confirm that I'm seeing the same failure on my SPU
daily build machine, which is running RHEL 5 with a host
compiler of GCC 4.1.2.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc

2016-05-27 Thread Ulrich Weigand
Andre Vieira (lists) wrote:
> On 07/04/16 10:30, Andre Vieira (lists) wrote:
> > On 17/03/16 16:33, Andre Vieira (lists) wrote:
> >> On 23/10/15 12:31, Bernd Schmidt wrote:
> >>> On 10/12/2015 11:58 AM, Ulrich Weigand wrote:
> >>>>
> >>>> Index: gcc/configure.ac
> >>>> ===
> >>>> --- gcc/configure.ac(revision 228530)
> >>>> +++ gcc/configure.ac(working copy)
> >>>> @@ -1993,7 +1993,7 @@ elif test "x$TARGET_SYSTEM_ROOT" != x; t
> >>>>   fi
> >>>>
> >>>>   if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x; then
> >>>> -  if test "x$with_headers" != x; then
> >>>> +  if test "x$with_headers" != x && test "x$with_headers" != xyes; then
> >>>>   target_header_dir=$with_headers
> >>>> elif test "x$with_sysroot" = x; then
> >>>>  
> >>>> target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include"
> >>>>
> >>>
> >>> I'm missing the beginning of this conversation, but this looks like a
> >>> reasonable change (avoiding target_header_dir=yes for --with-headers).
> >>> So, approved.
> >>>
> >>>
> >>> Bernd
> >>>
> >> Hi there,
> >>
> >> I was wondering why this never made it to trunk. I am currently running
> >> into an issue that this patch would fix.

Seems I never actually checked this in, even though it was approved.
Thanks for the reminder, I've now checked the patch in.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[PATCH, rs6000] Fix PR target/70168

2016-03-10 Thread Ulrich Weigand
Hello,

this patch fixes PR target/70168, a wrong code generation problem
caused by rs6000_expand_atomic_compare_and_swap not properly handling
the case where changing retval clobbers newval due to a register overlap.

Tested with no regressions on powerpc64le-linux on mainline
and gcc-5-branch.

OK for both?

Bye,
Ulrich


ChangeLog:

PR target/70168
* config/rs6000/rs6000.c (rs6000_expand_atomic_compare_and_swap):
Handle overlapping retval and newval.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 234119)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -20740,6 +20740,9 @@
   if (mode != TImode && !reg_or_short_operand (oldval, mode))
 oldval = copy_to_mode_reg (mode, oldval);
 
+  if (reg_overlap_mentioned_p (retval, newval))
+    newval = copy_to_reg (newval);
+
   mem = rs6000_pre_atomic_barrier (mem, mod_s);
 
   label1 = NULL_RTX;
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class

2016-03-02 Thread Ulrich Weigand
H.J. Lu wrote:

> +  if (type && type_is_empty_type_p (type))
> +{
> +  if (warn_empty_type_p)
> + warn_empty_type ();
> +  return NULL;
> +}

If I understand the problem correctly, empty types were already
handled correctly for C code; the problem occurred just for
C++ code (since the size of an empty class is != 0 there).

However, it seems this warning check would now appear also
for C code, even though there really shouldn't be any ABI
changes there.  Would it not be better to only generate the
warning for C++ (maybe check whether the type size is nonzero)?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 7/9] S/390: Get rid of Y constraint in vector.md.

2016-03-01 Thread Ulrich Weigand
I wrote:
> Andreas Krebbel wrote:
> 
> > +; vec_set is supposed to *modify* an existing vector so operand 0 is
> > +; duplicated as input operand.
> > +(define_expand "vec_set"
> > +  [(set (match_operand:V0 "register_operand"   
> >"")
> > +   (unspec:V [(match_operand: 1 "general_operand"   
> > "")
> > +  (match_operand:SI2 "shift_count_or_setmem_operand" 
> > "")
> 
> This is probably only cosmetic, but should we use nonmemory_operand here
> instead of shift_count_or_setmem_operand (just like everywhere else now)?
> 
> > +(define_expand "vec_extract"
> > +  [(set (match_operand: 0 "nonimmediate_operand" "")
> > +   (unspec: [(match_operand:V  1 "register_operand" "")
> > +  (match_operand:SI 2 "shift_count_or_setmem_operand" 
> > "")]
> 
> Likewise.

I just noticed that there are two more UNSPEC_VEC_SET expanders
in vx-builtins.md.  I guess those should be likewise changed:

(define_expand "vec_insert"
  [(set (match_operand:V_HW0 "register_operand" "")
(unspec:V_HW [(match_operand: 2 "register_operand" "")
  (match_operand:SI3 
"shift_count_or_setmem_operand" "")
  (match_operand:V_HW  1 "register_operand" "")]
 UNSPEC_VEC_SET))]
  "TARGET_VX"
  "")

(define_expand "vec_promote"
  [(set (match_operand:V_HW0 "register_operand" "")
(unspec:V_HW [(match_operand: 1 "register_operand" "")
  (match_operand:SI2 
"shift_count_or_setmem_operand" "")
  (match_dup 0)]
 UNSPEC_VEC_SET))]
  "TARGET_VX"
  "")

Then the only remaining users of shift_count_or_setmem_operand are the
actual setmem patterns (so maybe the predicate can be renamed to
"setmem_operand") :-)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 0/9] S/390 rework shift count handling - v3

2016-02-29 Thread Ulrich Weigand
Andreas Krebbel wrote:

>   S/390: Use enabled attribute overrides to disable alternatives.
>   S/390: Get rid of Y constraint in rotate patterns.
>   S/390: Get rid of Y constraint in left and logical right shift
> patterns.
>   S/390: Get rid of Y constraint in arithmetic right shift patterns.
>   S/390: Get rid of Y constraint in tabort.
>   S/390: Get rid of Y constraint in vector.md.
>   S/390: Use define_subst for the setmem patterns.
>   S/390: Disallow SImode in s390_decompose_address

Except for a few minor comments for the vector.md patch (separate mail),
this all looks now very good to me.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 7/9] S/390: Get rid of Y constraint in vector.md.

2016-02-29 Thread Ulrich Weigand
Andreas Krebbel wrote:


> +; vec_set is supposed to *modify* an existing vector so operand 0 is
> +; duplicated as input operand.
> +(define_expand "vec_set"
> +  [(set (match_operand:V0 "register_operand" 
>  "")
> + (unspec:V [(match_operand: 1 "general_operand"   
> "")
> +(match_operand:SI2 "shift_count_or_setmem_operand" 
> "")

This is probably only cosmetic, but should we use nonmemory_operand here
instead of shift_count_or_setmem_operand (just like everywhere else now)?

> +(define_expand "vec_extract"
> +  [(set (match_operand: 0 "nonimmediate_operand" "")
> + (unspec: [(match_operand:V  1 "register_operand" "")
> +(match_operand:SI 2 "shift_count_or_setmem_operand" 
> "")]

Likewise.


> +(define_insn "*vec_set_plus"
> +  [(set (match_operand:V  0 "register_operand" "=v")
> + (unspec:V [(match_operand:   1 "general_operand"   "d")
> +(plus:SI (match_operand:SI 2 "register_operand"  "a")
> + (match_operand:SI 4 "const_int_operand" "n"))
> +(match_operand:V   3 "register_operand"  "0")]
> +   UNSPEC_VEC_SET))]
> +  "TARGET_VX"
> +  "vlvg\t%v0,%1,%4(%2)"
> +  [(set_attr "op_type" "VRS")])

Wouldn't it be better to use %Y4 instead of %4 here?  Or does the middle-end
guarantee that this is never out of range?

> +(define_insn "*vec_extract_plus"
> +  [(set (match_operand:  0 
> "nonimmediate_operand" "=d,QR")
> + (unspec: [(match_operand:V   1 "register_operand"  
> "v, v")
> +(plus:SI (match_operand:SI 2 "nonmemory_operand"     
> "a, I")
> + (match_operand:SI 3 "const_int_operand" 
> "n, I"))]
> +UNSPEC_VEC_EXTRACT))]
> +  "TARGET_VX"
> +  "@
> +   vlgv\t%0,%v1,%3(%2)
> +   vste\t%v1,%0,%2"
> +  [(set_attr "op_type" "VRS,VRX")])

Likewise for %3.  Also, the second alternative seems odd; it matches solely a
PLUS of two CONST_INTs, which is not canonical RTL ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.

2016-02-25 Thread Ulrich Weigand
Andreas Krebbel wrote:

>   * config/s390/predicates.md (const_int_6bitset_operand): New
> predicates.
>   * config/s390/s390.md: Include subst.md.
>   ("rotl3"): New expander.
>   ("rotl3", "*rotl3_and"): Merge insn definitions into
>   ...
>   ("*rotl3"): New insn definition.
>   * config/s390/subst.md: New file.

This already looks much nicer (and shorter) :-)

In fact, I'm now wondering whether it couldn't be even shorter.  Would it
be possible, instead of:

(define_insn "*rotl3"
  [(set (match_operand:GPR 0 "register_operand" "=d,d")
(rotate:GPR (match_operand:GPR 1 "register_operand"  "d,d")
(match_operand:SI  2 "nonmemory_operand" "a,n")))]
  "TARGET_CPU_ZARCH"
  "@
   rll\t%0,%1,(%2)
   rll\t%0,%1,%Y2"
  [(set_attr "op_type"  "RSE")
   (set_attr "atype""reg")
   (set_attr "enabled"  "*,")
   (set_attr "z10prop"  "z10_super_E1")])

simply something like:

(define_insn "*rotl3"
  [(set (match_operand:GPR 0 "register_operand" "=d")
(rotate:GPR (match_operand:GPR 1 "register_operand"  "d")
(match_operand:SI  2 "nonmemory_operand" "an")))]
  "TARGET_CPU_ZARCH"
  "rll\t%0,%1,
  [(set_attr "op_type"  "RSE")
   (set_attr "atype""reg")
   (set_attr "z10prop"  "z10_super_E1")])


where addr_style_op_ops is defined like:

(define_subst_attr "addr_style_op_ops" "addr_style_op_subst" "%Y2" "%Y3(%2)")

and we don't need addr_style_op_enabled any more?   %Y would continue
to emit both simple constants and register operands (and full address
operands e.g. for setmem) as before.


> +(define_subst "masked_op_subst"
> +  [(set (match_operand:DSI 0 ""   "")
> +(SUBST:DSI (match_operand:DSI 1 "" "")
> +(match_operand:SI  2 "" "")))]
> +  ""
> +  [(set (match_dup 0)
> +(SUBST:DSI (match_dup 1)
> +(and:SI (match_dup 2)
> +(match_operand:SI 3 "const_int_6bitset_operand" 
> ""])

Do we need a constraint letter here?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH, rs6000] PR 66337: Improve Compliance with Power ABI

2016-02-18 Thread Ulrich Weigand
Kevin Nilsen wrote:

> This patch has bootstrapped and tested on powerpc64le-unknown-linux-gnu
> and powerpc64be-unknown-linux-gnu (both 32-bit and 64-bit) and
> powerpc64-unknown-freebsd11.0 (big endian) with no regressions.  Is it
> ok to fix this on the trunk?
> 
> The problem described in PR66337
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66337) is that compiling
> for PowerPC targets with the -malign-power command-line option results
> in invalid field offsets for certain structure members.  As identified
> in the problem report, this results from a macro definition present in
> both config/rs6000/{freebsd64,linux64}.h, which in both cases is
> introduced by the comment:
> 
> /* PowerPC64 Linux word-aligns FP doubles when -malign-power is given. */
> 
> I have consulted various ABI documents, including "64-bit PowerPC ELF
> Application Binary Interface Supplement 1.9"
> (http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html),
> "Power Architecture 64-Bit ELF V2 ABI Specification"
> (https://members.openpowerfoundation.org/document/dl/576), and "Power
> Architecture(R) 32-bit Application Binary Interface Supplement 1.0 -
> Linux(R) & Embedded"
> (https://www.power.org/documentation/power-architecture-32-bit-abi-supplement-1-0-embeddedlinuxunified/).
> I have not been able to find any basis for this comment and thus am
> concluding that the comment and existing implementation are incorrect.
> 
> The implemented patch removes the comment and changes the macro
> definition so that field alignment calculations on 64-bit architectures
> ignore the -malign-power command-line option.  With this fix, the test
> case identified in the PR behaves as was expected by the submitter.

There seems to be some confusion here.  First of all, on Linux and FreeBSD,
the *default* behavior is -malign-natural, which matches what the Linux ABI
specifies.  Using -malign-power on *Linux* is an explicit instruction to
the compiler to *deviate* from the documented ABI.

The only effect that the deviation has on Linux is to change the alignment
requirements for certain structure elements.  Your patch removes this change,
making -malign-power fully a no-op on Linux and FreeBSD.  This doesn't seem
to be particularly useful ...  If you don't want the effect, you can simply
not use that switch.


To my understanding, the intent of providing that switch was to allow
creating code that is compatible code produced by some other compilers
that do not adhere to the Linux ABI, but some other ABI.  In particular,
my understanding is that the *AIX* ABI has these alignment requirements.
And in fact, GCC on *AIX* defaults to -malign-power.

Looking at PR 66337, the submitter actually refers to the behaviour of
GCC on AIX, so I'm not sure how Linux is even relevant here.  (Maybe
there is something wrong in how GCC implements the AIX ABI.  But I'm
not really familiar with AIX, so I can't help much with that.)


Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [RS6000] reload_vsx_from_gprsf splitter

2016-02-15 Thread Ulrich Weigand
Alan Modra wrote:
> On Fri, Feb 12, 2016 at 02:57:22PM +0100, Ulrich Weigand wrote:
> > Right.  It's a bit unfortunate that we can't just use IFmode 
> > unconditionally,
> > but it seems rs6000_scalar_mode_supported_p (IFmode) may return false, and
> > then we probably shouldn't be using it.
> 
> Actually, we can use IFmode unconditionally.  scalar_mode_supported_p
> is relevant only up to and including expand.  Nothing prevents the
> backend from using IFmode.

Hmm, OK.  That does make things simpler.

> > Another option might be to use TDmode to allocate a scratch register pair.
> 
> That won't work, at least if we want to extract the two component regs
> with simplify_gen_subreg, due to rs6000_cannot_change_mode_class.  In
> my original patch I just extracted the regs by using gen_rtx_REG but I
> changed that, based on your criticism of using gen_rtx_REG in
> reload_vsx_from_gprsf, and because rs6000.md avoids gen_rtx_REG using
> operand regnos in other places.  That particular change is of course
> entirely cosmetic.  I also changed reload_vsx_from_gprsf to avoid mode
> punning regs, instead duplicating insn patterns as done elsewhere in
> the vsx support.  I don't believe we will see subregs of vsx or fp
> regs after reload, but I'm quite willing to concede the point for a
> stage4 fix.

I was thinking here that in the special case of the *reload scratch
register*, which reload allocates for us, we will always get a full
register.  This is different from some other operand that may originate
from pre-existing RTX that may require a SUBREG even after reload.

But I certainly agree that your current patch looks like a good choice
for a stage4 bugfix change.  Further cleanup can always happen later.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [RS6000] reload_vsx_from_gprsf splitter

2016-02-12 Thread Ulrich Weigand
(Replying to multiple messages at once ...)

Michael Meissner wrote:

> At the present time, we do not allow DImode to go into Altivec
> registers. Mostly it was due to reload complications in the power8 time frame,
> where we didn't have d-form addressing for the Altivec registers and
> interference with the other DImode reload patterns, but also I felt I would
> need to go through the main rs6000.md to look for the various DImode patterns
> to see if they needed to be updated.  At some point I hoped to extend this, so I
> added the "wi" and "wj" constraints to be used whenever the mode is DImode.
> "wi" is the constraint for the VSX register class DImode can go in
> (i.e. currently FLOAT_REGS), and "wj" is the VSX register class DImode
> can go in if we have 64-bit direct move.

I see.  I hadn't noticed the restriction on DImode, sorry.  I think that in
general, it would be a good idea to allow DImode wherever DFmode is allowed,
since the same instructions should be able to handle both.  But given the
unexpected problems that seem to turn up whenever we want to allow more modes
in more registers, this is probably better for stage 1 than right now ...
 
> The TFmode thing was a hack.  It was the only way I could see getting two
> registers without a lot of hair.  Since TFmode is also restricted to FPR_REGS,
> you could use %L on it.  When I was writing it, I really wished we had
> a reload pattern to get more than one temporary, but we didn't, so I
> went for TFmode to get 2 FPR registers.  Given the replacement pattern
> no longer uses a TFmode temporary, it can go in any appropriate
> register.

Again, it would probably make sense to allow TFmode (when it is double double)
in any pair of registers where DFmode is allowed, since the type *is* just a
pair of DFmode values.  Again, really more a stage 1 matter ...

> On Thu, Feb 11, 2016 at 07:38:15PM +0100, Ulrich Weigand wrote:
> > It's not a bug, it's deliberate behavior.  The reload registers allocated
> > for secondary reload clobbers may overlap the destination, since in many
> > cases you simply move the input to the reload register, and then the
> > reload register to the destination (where the latter move can be a no-op
> > if it is possible to allocate the reload register and the destination
> > into the same physical register).  If you need two separate intermediate
> > values, you need to allocate a secondary reload register of a larger
> > mode (as is already done in the pattern).
> 
> This is one of the cases I wished the reload support had the ability to
> allocate 2 scratch temporaries instead of 1.  As I said in my other message,
> TFmode was a hack to get two registers to use.

Yes, it would be nice if we could specify multiple scratch registers.  That's
a long-standing FIXME in the reload code:

  /* ??? It would be useful to be able to handle only two, or more than
 three, operands, but for now we can only handle the case of having
 exactly three: output, input and one temp/scratch.  */

Unfortunately, given the structure of reload, that is difficult to fix.

> On Fri, Feb 12, 2016 at 08:54:19AM +1030, Alan Modra wrote:
> > Another concern I had about this, besides using %L in asm output (what
> > forces TFmode to use just fprs?), is what happens when we're using
> > IEEE 128-bit floats?  In that case it looks like we'd get just one reg.
> 
> Good point that it breaks if the default long double (TFmode) type is IEEE
> 128-bit floating point.  We would need to have two patterns, one that uses
> TFmode and one that uses IFmode.  I wrote the power8 direct move stuff before
> going down the road of IEEE 128-bit floating point.

Right.  It's a bit unfortunate that we can't just use IFmode unconditionally,
but it seems rs6000_scalar_mode_supported_p (IFmode) may return false, and
then we probably shouldn't be using it.

Another option might be to use TDmode to allocate a scratch register pair.


David Edelsohn wrote:

> Good question: is p8_mtvsrd really necessary if the movdi_internal64
> is updated to use the correct constraints?
> 
> The patch definitely is going in the right direction.  Can we remove
> more unnecessary bits?

Given Mike's reply above, I don't think it is simply a matter of updating
constraints in the one pattern.  Rather, enabling DImode in all vector
registers would require a change to rs6000_hard_regno_mode_ok and then
reviewing all DImode patterns to make sure they're still compatible.

As I said above, that's probably more of a stage 1 effort.

As long as that is not done, my original suggestion here:

> Maybe instead have p8_mtvsrd use DImode as output (instead of
> DFmode), which shouldn't be any harder to use in the
> reload_vsx_from_gpr splitter, and keep using the
> vs

Re: [RS6000] reload_vsx_from_gprsf splitter

2016-02-11 Thread Ulrich Weigand
David Edelsohn wrote:
> On Thu, Feb 11, 2016 at 6:04 AM, Alan Modra <amo...@gmail.com> wrote:
> > This is PR68973 part 2, the failure of a boost test, caused by a
> > splitter emitting an invalid move in reload_vsx_from_gprsf:
> >   emit_move_insn (op0_di, op2);
> >
> > op0 can be any vsx reg, but the mtvsrd destination constraint in
> > movdi_internal64 is "wj", which only allows fp regs.  I'm not sure why
> > we have that constraint so left movdi_internal64 alone and used a
> > special p8_mtvsrd instead, similar to p8_mtvsrd_1 and p8_mtvsrd_2 used
> > by reload_vsx_from_gpr.  When looking at those, I noticed we're
> > restricted to fprs there too so fixed that as well.  (We can't use %L
> > in asm operands that must accept vsx.)
> 
> Michael, please review the "wj" constraint in these patterns.
> 
> Alan, the explanation says that it uses a special p8_mtvsrd similar to
> p8_mtvsrd_[12], but does not explain why the two similar patterns are
> removed.  The incorrect use of %L implies those patterns, but the
> change is to p8_xxpermdi_ that is not mentioned in the
> ChangeLog.
> 
> I also would appreciate Uli's comments on this direction because of
> his reload expertise.

For the most part, this patch doesn't really change anything in the
interaction with reload as far as I can see.  The changes introduced
by the patch do make sense to me.  In particular, replacing the two
patterns p8_mtvsrd_1 and p8_mtvsrd_2 used to fill high and low parts
of a TFmode register pair with a single pattern p8_mtvsrd that just
works on any DFmode register (leaving the split into high/low to the
caller if necessary) seems to simplify things.

> > +  /* You might think that we could use op0 as one temp and a DF clobber
> > + as the other, but you'd be wrong.  These secondary_reload patterns
> > + don't check that the clobber is different to the destination, which
> > + is probably a reload bug.  */

It's not a bug, it's deliberate behavior.  The reload registers allocated
for secondary reload clobbers may overlap the destination, since in many
cases you simply move the input to the reload register, and then the
reload register to the destination (where the latter move can be a no-op
if it is possible to allocate the reload register and the destination
into the same physical register).  If you need two separate intermediate
values, you need to allocate a secondary reload register of a larger
mode (as is already done in the pattern).

> >/* Also use the destination register to hold the unconverted DImode 
> > value.
> >   This is conceptually a separate value from OP0, so we use gen_rtx_REG
> >   rather than simplify_gen_subreg.  */
> > -  rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
> > +  rtx op0_df = gen_rtx_REG (DFmode, REGNO (op0));
> > +  rtx op0_v4sf = gen_rtx_REG (V4SFmode, REGNO (op0));

While this was not introduced by this patch, I'm a little bit concerned
about the hard-coded use of REGNO here, which will crash if op0 at this
point happens to be a SUBREG instead of a REG.  This is extremely unlikely
at this point in reload, but not 100% impossible, e.g. if op0 for some
reason is one of the "magic" registers like the stack pointer.

That's why it is in general better to use simplify_gen_subreg or one of
gen_highpart/gen_lowpart, which will handle SUBREG correctly as well.
I'm not sure why it matters whether this is "conceptually a separate
value" as the comment argues ...

> >/* Move SF value to upper 32-bits for xscvspdpn.  */
> >emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
> > -  emit_move_insn (op0_di, op2);
> > -  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0_di));
> > +  emit_insn (gen_p8_mtvsrd (op0_df, op2));
> > +  emit_insn (gen_vsx_xscvspdpn (op0_df, op0_v4sf));

The sequence of modes used for op0 here is a bit weird.  First,
op0 is loaded as DFmode by mtvsrd, then it is silently re-
interpreted as V4SFmode when used as input to xscvspdpn, which
gets a DFmode output that is again silently re-interpreted as
SFmode.

This isn't really wrong as such, just maybe a bit confusing.
Maybe instead have p8_mtvsrd use DImode as output (instead of
DFmode), which shouldn't be any harder to use in the
reload_vsx_from_gpr splitter, and keep using the
vsx_xscvspdpn_directmove pattern?

[ This of course reinforces the question why we have p8_mtvsrd
in the first place, instead of just allowing this use directly
in movdi_internal64 itself.  ]

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH, rs6000] Add -maltivec=be semantics in LE mode for vec_ld and vec_st

2016-02-09 Thread Ulrich Weigand
Hi Bill,

> 2014-02-20  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
> 
>   * config/rs6000/altivec.md (altivec_lvxl): Rename as
>   *altivec_lvxl__internal and use VM2 iterator instead of
>   V4SI.
>   (altivec_lvxl_): New define_expand incorporating
>   -maltivec=be semantics where needed.

I just noticed that this:

> -(define_insn "altivec_lvxl"
> +(define_expand "altivec_lvxl_"
>[(parallel
> -[(set (match_operand:V4SI 0 "register_operand" "=v")
> -   (match_operand:V4SI 1 "memory_operand" "Z"))
> +[(set (match_operand:VM2 0 "register_operand" "=v")
> +   (match_operand:VM2 1 "memory_operand" "Z"))
>   (unspec [(const_int 0)] UNSPEC_SET_VSCR)])]
>"TARGET_ALTIVEC"
> -  "lvxl %0,%y1"
> +{
> +  if (!BYTES_BIG_ENDIAN && VECTOR_ELT_ORDER_BIG)
> +{
> +  altivec_expand_lvx_be (operands[0], operands[1], mode, 
> UNSPEC_SET_VSCR);
> +  DONE;
> +}
> +})
> +
> +(define_insn "*altivec_lvxl__internal"
> +  [(parallel
> +[(set (match_operand:VM2 0 "register_operand" "=v")
> +   (match_operand:VM2 1 "memory_operand" "Z"))
> + (unspec [(const_int 0)] UNSPEC_SET_VSCR)])]
> +  "TARGET_ALTIVEC"
> +  "lvx %0,%y1"
>[(set_attr "type" "vecload")])

now causes vec_ldl to emit the lvx instead of the lvxl instruction.
I assume this was not actually intended?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] s390: Add -fsplit-stack support

2016-02-05 Thread Ulrich Weigand
Marcin Kościelnicki wrote:

> I'll stay with checking for larl - while I can imagine someone adding a 
> new conditional branch instruction, I don't see a need for another 
> larl-like instruction.  Besides, this way the failure mode for an 
> unknown instruction would be producing an error, instead of silently 
> emitting code with unfixed prologue.

OK, fine with me.  B.t.w. Andreas has checked in the sibcall fix,
so you no longer should be seeing larl used for sibcalls.

> I've updated and resubmitted the gold patch.

Thanks!

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] s390: Add -fsplit-stack support

2016-02-04 Thread Ulrich Weigand
Marcin Kościelnicki wrote:

> Ugh. I take that back.  For -fPIC, the load-address sequence is:
> 
>  larl%r1,f@GOTENT
>  lg  %r2,0(%r1)
>  br  %r14

This is correct.

> And (sibling) call sequence is:
> 
>  larl%r1,f@GOTENT
>  lg  %r1,0(%r1)
>  br  %r1

Oops.  That is actually a GCC bug.  The sibcall sequence really must be:

jg f@PLT

This is a real bug since it forces non-lazy symbol resolution for f
just because the compiler chose those use a sibcall optimization;
that's not supposed to happen.

It seems this bug was accidentally introduced here:

2010-04-20  Andreas Krebbel  <andreas.kreb...@de.ibm.com>

PR target/43635
* config/s390/s390.c (s390_emit_call): Turn direct into indirect
calls for -fpic -m31 if they have been sibcall optimized.

since the patch doesn't check for TARGET_64BIT ...

Andreas, can you have a look?

> It seems there's no proper way to recognize a call vs a load address - 
> so we can either go with emitting the marker, or have the same problem 
> as on ppc.
> 
> So - how much should we care?

I think we should fix that bug.  That won't help for existing objects,
but those don't use split stack either, so that shouldn't matter.

If we fix that bug before (or at the same time as) adding split-stack
support, the linker will still be able to distigunish function pointer
loads from calls (including sibcalls) on all objects using split stack.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] s390: Add -fsplit-stack support

2016-02-04 Thread Ulrich Weigand
Marcin Kościelnicki wrote:

> Fair enough.  Here's what I'm going to implement in gold:
> 
> - any PLT relocation: call
> - PC32DBL on a larl: non-call
> - PC32DBL otherwise: call
> - any other relocation: non-call
> 
> Does that sound right?

Hmm, I'm wondering about the PC32DBL choices.  There are now
a large number of other non-call instructions that use PC32DBL,
including lrl, strl, crl, cgrl, cgfrl, ...

However, these all access *data* at the pointed-to location,
so it is quite unlikely they would ever be used with a
function symbol.  So, assuming that you also check that the
target of the relocation is a function symbol, treating only
larl as non-call might be OK.

Maybe a more conservative approach would be to make the decision
the other way round: for PC32DBL check for *branch* instructions,
and treat only those are calls.  There's just a few branch
instruction using PC32DBL:

brasl  (call)
brcl   (conditional or unconditional sibcall)
brcth  (???)

where the last one is extremely unlikely (but theoretically
possible) to be used as conditional sibcall combined with
a register decrement; I don't think this can ever happen
with current compilers however.

For full completeness, there are also PC16DBL relocations that
*could* target called functions, but only when compiling with
the -msmall-exec flag to assume total executable size is less
than 64 KB.  These are used by the following instructions:

bras
brc
brct
brctg
brxh
brxhg
brxle
brxlg
crj
cgrj
clrj
clgrj
cij
cgij
clij
clgij

Note that those are *all* branch instructions, so it might
make sense to add any PC16DBL targetting a function symbol
to the list of calls, just in case.  (But since basically
nobody ever uses -msmall-exec, it doesn't really matter
much either.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] s390: Add -fsplit-stack support

2016-02-03 Thread Ulrich Weigand
Marcin Kościelnicki wrote:

> Comment fixed, split_stack_marker gone, reorg gone.  Generated code
> seems sane, but testsuite still running.
> 
> I will need to modify the gold patch to handle the "leaf function
> taking non-split stack function address" issue - this will likely
> require messing with the target-independent plumbing; the hook for that
> doesn't seem to get enough params.

Thanks for making those changes; the patch is looking a lot nicer (and
shorter :-)) now!  Just to clarify, your original patch series had two
common-code prerequisite patches (3/5 and 4/5) -- it looks like those may
still be needed?  If so, we'll have to get approval from the appropriate
middle-end maintainers before this patch can go in as well.

As to the back-end patch, I've now only got some cosmetical issues:

> +  insn = emit_insn (gen_main_base_64 (r1, parm_base));

Now that we aren't using the literal pool infrastructure for the block any more,
I guess we shouldn't be using it to load the address either.  Just something
like:
  insn = emit_move_insn (r1, gen_rtx_LABEL_REF (VOIDmode, parm_base));
should do it.

> +(define_insn "split_stack_data"
> +  [(unspec_volatile [(match_operand 0 "" "X")
> +  (match_operand 1 "" "X")
> +  (match_operand 2 "consttable_operand" "X")
> +  (match_operand 3 "consttable_operand" "X")]

And similarly here, just use const_int_operand.

Otherwise, this all looks very good to me.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] s390: Add -fsplit-stack support

2016-02-03 Thread Ulrich Weigand
Marcin Kościelnicki wrote:

> libgcc/ChangeLog:
> 
>   * config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
>   * config/s390/morestack.S: New file.
>   * config/s390/t-stack-s390: New file.
>   * generic-morestack.c (__splitstack_find): Add s390-specific code.
> 
> gcc/ChangeLog:
> 
>   * common/config/s390/s390-common.c (s390_supports_split_stack):
>   New function.
>   (TARGET_SUPPORTS_SPLIT_STACK): New macro.
>   * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
>   * config/s390/s390.c (struct machine_function): New field
>   split_stack_varargs_pointer.
>   (s390_register_info): Mark r12 as clobbered if it'll be used as temp
>   in s390_emit_prologue.
>   (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
>   vararg pointer.
>   (morestack_ref): New global.
>   (SPLIT_STACK_AVAILABLE): New macro.
>   (s390_expand_split_stack_prologue): New function.
>   (s390_live_on_entry): New function.
>   (s390_va_start): Use split-stack vararg pointer if appropriate.
>   (s390_asm_file_end): Emit the split-stack note sections.
>   (TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
>   * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
>   (UNSPECV_SPLIT_STACK_CALL): New unspec.
>   (UNSPECV_SPLIT_STACK_DATA): New unspec.
>   (split_stack_prologue): New expand.
>   (split_stack_space_check): New expand.
>   (split_stack_data): New insn.
>   (split_stack_call): New expand.
>   (split_stack_call_*): New insn.
>   (split_stack_cond_call): New expand.
>   (split_stack_cond_call_*): New insn.
> ---
> Changes applied.  Testsuite still running, still works on my simple tests.
> 
> As for common code prerequisites: #3 is no longer needed, and very likely
> so is #4 (it fixes problems that I've only seen with ESA mode, and testsuite
> runs just fine without it now).

OK, I see.  The patch is OK for mainline then, assuming testing passes.

Thanks again,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] s390: Add -fsplit-stack support

2016-02-02 Thread Ulrich Weigand
Marcin Kościelnicki wrote:

> Here we go.  I've also removed the "see below", since I don't really
> see anything below...

The "see below" refers to this code (which I agree isn't really obvious):

  if (TARGET_TPF_PROFILING)
{
  /* Generate a BAS instruction to serve as a function
 entry intercept to facilitate the use of tracing
 algorithms located at the branch target.  */
  emit_insn (gen_prologue_tpf ());

What is not explicitly called out here is that this tracing function
actually refers to some hard registers, in particular r14, and assumes
they still have the original contents as at function entry.

That is why the prolog code avoid using r14 as temporary if the TPF
tracing mechanism is in use.  Now I think this doesn't apply to r12,
so this part of your patch should still be fine.  (In addition, TPF
is not going to support split stacks --or indeed the Go language--
anyway, so it doesn't really matter all that much.)


I do have two other issues; sorry for bringing those up again although
they've been discussed up in the past, but I still think we can find
some improvements here ...

The first is the question Andreas brought up, why we need the extra
set of insns introduced by s390_reorg.  I think this may really have
been necessary for the ESA case where data elements had to be intermixed
into code at a specific location.  But since we no longer support ESA,
we now just have a data block that can be placed anywhere.  For example,
we could just have an insn (at any point in the prolog stream) that
simply emits the full data block during final output, along the lines of
(note: needs to be updated for SImode vs. DImode.):

(define_insn "split_stack_data"
  [(unspec_volatile [(match_operand 0 "bras_sym_operand" "X")
 (match_operand 1 "bras_sym_operand" "X")
 (match_operand 2 "consttable_operand" "X")
 (match_operand 3 "consttable_operand" "X")]
UNSPECV_SPLIT_STACK_DATA)]
  ""
{
  switch_to_section (targetm.asm_out.function_rodata_section
  (current_function_decl));

  output_asm_insn (\".align 3\", operands);
  (*targetm.asm_out.internal_label) (asm_out_file, \"L\",
 CODE_LABEL_NUMBER (operands[0]));
  output_asm_insn (\".quad %2\", operands);
  output_asm_insn (\".quad %3\", operands);
  output_asm_insn (\".quad %1-%0\", operands);

  switch_to_section (current_function_section ());
  return "";
}
  [(set_attr "length" "0")])

Or possibly even cleaner, we can simply define the data block at the
tree level as if it were an initialized global variable of a certain
struct type, and just leave it to common code to emit it as usual.

Then we just have the code bits, but I don't really see much
difference between the split_stack_call and split_stack_sibcall
patterns (apart from the data block), so if code flow is OK with
the former insns, it should be OK with the latter too.

[ Or else, if there *are* code flow issues, the other alternative
would be to emit the full call sequence, code and data, from a
single insn pattern during final output.  This might have the extra
benefit that the assembler sequence is fully fixed, and thus easier
to detect in the linker.  ]

Getting rid of the extra transformation in s390_reorg would not
just remove a bunch of code from the back-end (always good!),
it would also speed up compile time a bit.


The second issue I'm still not sure about is the magic nop marker
for frameless functions.  In an earlier mail you wrote:

> Both currently supported 
> architectures always emit split-stack code on every function.

At least for rs6000 this doesn't appear to be true; in
rs6000_expand_split_stack_prologue we have:

  if (!info->push_p)
return;

so it does nothing for frameless routines.

Now on i386 we do indeed generate code for frameless routines;
in fact, the *same* full stack check is generated as for any
other routine.  Now I'm wondering: is there are reason why
this check would be necessary (and there's simply a bug in
the rs6000 implementation)?  Then we obviously should do the
same on s390.

On the other hand, if rs6000 works fine *without* any code
in frameless routines, why wouldn't that work for s390 too?

Emitting a nop (that is always executed) still looks weird to me.


Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 6/9] S/390: Get rid of Y constraint in tabort.

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:

>  (define_insn "*tabort_1"
> -  [(unspec_volatile [(match_operand:SI 0 "shift_count_or_setmem_operand" "Y")]
> +  [(unspec_volatile [(match_operand:SI 0 "addrreg_or_constint_operand" "a,n")]
>   UNSPECV_TABORT)]
>"TARGET_HTM && operands != NULL"
> -  "tabort\t%Y0"
> +  "@
> +   tabort\t0(%0)
> +   tabort\t%0"
> +  [(set_attr "op_type" "S")])
> +
> +(define_insn "*tabort_1_plus"
> +  [(unspec_volatile [(plus:SI (match_operand:SI 0 "register_operand"  "a")
> +   (match_operand:SI 1 "const_int_operand" "J"))]
> + UNSPECV_TABORT)]
> +  "TARGET_HTM && operands != NULL"
> +  "tabort\t%1(%0)"

This seems dangerous: const_int_operand may match a constant that does
not fit into the "J" constraint, which would lead to an abort in reload.

What is the semantics for the abort code anyway?  It is supposed to be
automatically truncated or not?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 7/9] S/390: Get rid of Y constraint in vector.md.

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:

> +(define_insn "*vec_extract_plus"
> +  [(set (match_operand:  0 "nonimmediate_operand" "=d,QR")
> +     (unspec: [(match_operand:V 1 "register_operand" "v, v")
> +               (plus:SI (match_operand:SI 2 "register_operand" "a, I")

The second alternative can never match here, can it?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:

> +; Accept single register and immediate operands usable as shift
> +; counts.
> +(define_predicate "addrreg_or_constint_operand"
> +  (match_code "reg, subreg, const_int")

I'm wondering whether this is even necessary.

> +{
> +  if (GET_MODE (op) != VOIDmode
> +  && GET_MODE_CLASS (GET_MODE (op)) != MODE_INT)
> +return false;

The predicate always seems to be used with a mode, so any extra
mode checks here seem redundant.

> +  while (op && GET_CODE (op) == SUBREG)
> +op = SUBREG_REG (op);
> +
> +  if (REG_P (op)
> +  && (REGNO (op) >= FIRST_PSEUDO_REGISTER || ADDR_REG_P (op)))
> +return true;
> +
> +  if (CONST_INT_P (op))
> +return true;

This looks somewhat copied from shift_count_or_setmem_operand, where
we have a comment:
  /* Don't allow any non-base hard registers.  Doing so without
 confusing reload and/or regrename would be tricky, and doesn't
 buy us much anyway.  */

With the new set up, I don't believe there is any issue with confusing
reload -- it's no more complex than any other pattern now.  So it
should be fine to just accept any register.

In fact, I'm starting to wonder whether it wouldn't be just fine to
simply using nonmemory_operand instead of this special predicate
(the only issue might be that we could get CONST_WIDE_INT, but it
should be simple to handle those).

> +(define_expand "rotl3"
> +  [(set (match_operand:GPR 0 "register_operand" "")
> +(rotate:GPR (match_operand:GPR 1 "register_operand" "")
> + (match_operand:SI 2 "shift_count_or_setmem_operand" "")))]

Similarly, I think the expander no longer needs to use the special predicate
shift_count_or_setmem_operandm but could just use nonmemory_operand.
[ This will reject the PLUS that shift_count_or_setmem_operand accepted,
but I don't believe you ever *could* get the PLUS in the expander anyway.
It will only be moved in by combine, so as long as the insn accepts it,
everything should be good.  ]

> +(define_insn "*rotl3"
> +  [(set (match_operand:GPR 0 "register_operand"   "=d,d")
> + (rotate:GPR (match_operand:GPR 1 "register_operand""d,d")
> + (match_operand:SI  2 "addrreg_or_constint_operand" "a,n")))]
> +  "TARGET_CPU_ZARCH"
> +  "@
> +   rll\t%0,%1,(%2)
> +   rll\t%0,%1,%Y2"

So it looks like %Y is now always used for just a constant.  Maybe the
implementation of %Y should be adjusted (once the patch series is complete)?
Also, if we use just nonmemory_operand, %Y should be updated to accept
CONST_WIDE_INT just like e.g. %x does.

> +; This expands a register/immediate operand to a register+immediate
> +; operand to take advantage of the address style operand format
> +; providing an addition for free.
> +(define_subst "addr_style_op_subst"
> +  [(set (match_operand:DSI 0 "" "")
> +(SUBST:DSI (match_operand:DSI 1 "" "")
> +(match_operand:SI 2 "" "")))]
> +  "REG_P (operands[2])"
> +  [(set (match_dup 0)
> +(SUBST:DSI (match_dup 1)
> +(plus:SI (match_dup 2)
> + (match_operand 3 "const_int_operand" "n"))))])

This seems to add just a single alternative.  Is that OK if it gets
substituted into a pattern that used multiple alternatives otherwise?

> +; In the subst pattern we have to disable the alternative where op2 is
> +; already a constant.  This attribute is supposed to be used in the
> +; "disabled" insn attribute to achieve this.
> +(define_subst_attr "addr_style_op_disable" "addr_style_op_subst" "0" "1")

Is this necessary given that the subst adds a REG_P (operands[2])
condition?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 8/9] S/390: Use define_subst for the setmem patterns.

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:

> While trying to get rid of the Y constraint in the setmem patterns I
> noticed that for these patterns it isn't even a problem since these
> always only use the constraint with a Pmode match_operand.  But while
> being at it I've tried to fold some of the patterns a bit.

Hmm, I'm wondering whether it's worth it to use a subst pattern that
is only ever used in one single instance?  (Specifically I'm wondering
about the 31z subst.  The and subst is used multiple times.)


> +; Turn the DImode in the 31 bit pattern into TImode to enforce
> +; register pair usage even with -mzarch.
> +; The subreg offset is adjusted accordingly.
> +(define_subst "setmem_31z_subst"
> +  [(clobber (match_operand:DI  0 "register_operand" ""))
> +   (set (mem:BLK (subreg:SI (match_operand:DI  3 "register_operand" "") 0))
> +        (unspec:BLK [(match_operand:SI 2 "shift_count_or_setmem_operand" "")
> +                     (subreg:SI (match_operand:DI  4 "register_operand" "") 0)]

Is this right?  Shouldn't it be subreg offset 4 originally?

> +  UNSPEC_REPLICATE_BYTE))
> +   (use (match_operand:DI  1 "register_operand" ""))
> +   (clobber (reg:CC CC_REGNUM))]
> +""
> +  [(clobber (match_operand:TI  0 "register_operand" ""))
> +   (set (mem:BLK (subreg:SI (match_operand:TI  3 "register_operand" "") 4))
> +        (unspec:BLK [(match_operand:SI 2 "shift_count_or_setmem_operand" "")
> +                     (subreg:SI (match_operand:TI  4 "register_operand" "") 12)]
> + UNSPEC_REPLICATE_BYTE))
> +   (use (match_operand:TI  1 "register_operand" ""))
> +   (clobber (reg:CC CC_REGNUM))])

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 2/9] S/390: Add disabled insn attribute

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:

> With this patch a `disabled' attribute is defined which can be used to
> explicitly disable an alternative.  This comes handy when defining the
> substitutions later and while adding it anyway I've used it for the
> existing cases as well.

> +; Insn attribute with boolean value
> +;  1 to disable an insn alternative
> +;  0 or * for an active insn alternative
> +;  an active alternative can still be disabled due to lacking cpu
> +;  facility support
> +(define_attr "disabled" "" (const_int 0))
> +
>  (define_attr "enabled" ""
> -  (cond [(eq_attr "cpu_facility" "standard")
> +  (cond [(eq_attr "disabled" "1")
> +  (const_int 0)
> +
> +  (eq_attr "cpu_facility" "standard")
>(const_int 1)
>  
>   (and (eq_attr "cpu_facility" "ieee")

So I'm wondering what the difference is between this and simply
overriding the default implementation of "enabled" per-insn?

So instead of adding
(set_attr "disabled" "0,1")])
to an insn, you might simply add instead:
(set_attr "enabled" "*,0")])

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 5/9] S/390: Get rid of Y constraint in arithmetic right shift patterns.

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:

> +; This is like the addr_style_op substitution above but with a CC clobber.
> +(define_subst "addr_style_op_cc_subst"

> +; This is like the masked_op substitution but with a CC clobber.
> +(define_subst "masked_op_cc_subst"

A bit unfortunate that these need to be duplicated.  Does the subst always
have to match the full pattern, or can it match and operate on just one
element of a PARALLEL?

> +; This adds an explicit CC reg set to an operation while keeping the
> +; set for the operation result as well.
> +(define_subst "setcc_subst"

> +; This adds an explicit CC reg set to an operation while dropping the
> +; result of the operation.
> +(define_subst "cconly_subst"

These are nice!  It would seem they could be applied to simplify many
of the non-shift patterns too, right?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 9/9] S/390: Disallow SImode in s390_decompose_address

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:

>   * config/s390/s390.c (s390_decompose_address): Don't accept SImode
>   anymore.

Great!  Very nice to finally get rid of this.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:
> > Hmm.  In theory reload should always respect both constraints and 
> > predicates.
> > But I guess it can't hurt to disable the alternative just to be safe; it is
> > indeed an uncommon case to have the constraint accept more than the 
> > predicate.
> > (Also a PLUS with two CONST_INT operands is not canonical RTL anyway.)
> 
> Does this mean you would prefer to convert the insn condition into a 
> predicate and change it with
> subst_attr or should I leave it in the insn condition?

If we can transform the operand predicate, that would certainly be the best.
I.e. where the original pattern has:
  (match_operand:SI 2 "nonmemory_operand" "a,n")
have the replaced pattern use instead:
  (plus:SI (match_operand:SI 2 "register_operand" "a")
   (match_operand 3 "const_int_operand" "n"))

if that is possible via the subst mechanism ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:
> On 02/01/2016 02:30 PM, Ulrich Weigand wrote:
> > This seems to add just a single alternative.  Is that OK if it gets
> > substituted into a pattern that used multiple alternatives otherwise?
> Yes.  This is supposed to work.  The new constraint will be duplicated until 
> it matches the number
> of alternatives in the original insn.

OK, that makes sense.   But now I'm wondering about this bit of
the ashiftrt patch:

+; FIXME: The number of alternatives is doubled here to match the fixed
+; number of 4 in the subst pattern for the (clobber (match_scratch...
+; The right fix should be to support match_scratch in the output
+; pattern of a define_subst.

+(define_subst "cconly_subst"
+  [(set (match_operand:DSI 0 ""   "")
+(match_operand:DSI 1 "" ""))
+   (clobber (reg:CC CC_REGNUM))]
+  "s390_match_ccmode(insn, CCSmode)"
+  [(set (reg CC_REGNUM)
+   (compare (match_dup 1) (const_int 0)))
+   (clobber (match_scratch:DSI 0 "=d,d,d,d"))])

I guess this is where the number 4 comes from?  But if the alternatives
are duplicated, shouldn't it then work to simply use:
   (clobber (match_scratch:DSI 0 "=d"))])


> >> +; In the subst pattern we have to disable the alternative where op2 is
> >> +; already a constant.  This attribute is supposed to be used in the
> >> +; "disabled" insn attribute to achieve this.
> >> +(define_subst_attr "addr_style_op_disable" "addr_style_op_subst" "0" "1")
> > 
> > Is this necessary given that the subst adds a REG_P (operands[2])
> > condition?
> I wasn't sure about this.  How does reload/lra behave when the constraints 
> allow more than the insn
> condition? I expected that reload might remember that one of the alternatives 
> might take a constant
> as well and then tries as part of reload inheritance and such things to 
> actually exploit this?! Just
> disabling the alternative in order to prevent reload from recording stuff for 
> that alternative
> appeared safer to me.
> 
> On the other hand it is not ideal to have an operands[x] check in the insn 
> condition in the first
> place. Especially when using define_substs where the operands get renumbered 
> this might add hard to
> find bugs. Perhaps I should try replacing the insn predicate with a 
> subst_attr instead? Is having a
> constraint which allows more than a predicate any better than having the same 
> with the insn condition?

Hmm.  In theory reload should always respect both constraints and predicates.
But I guess it can't hurt to disable the alternative just to be safe; it is
indeed an uncommon case to have the constraint accept more than the predicate.
(Also a PLUS with two CONST_INT operands is not canonical RTL anyway.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 2/9] S/390: Add disabled insn attribute

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:
> On 02/01/2016 02:45 PM, Ulrich Weigand wrote:
> > So I'm wondering what the difference is between this and simply
> > overriding the default implementation of "enabled" per-insn?
> > 
> > So instead of adding
> > (set_attr "disabled" "0,1")])
> > to an insn, you might simply add instead:
> > (set_attr "enabled" "*,0")])
> Not sure but wouldn't this mean that the value of the enabled attribute would 
> then depend on the
> order of the set_attr for "enabled" and "cpu_facility" since one is defined 
> on the value of the other?!

I don't think the order matters; genattrtab seems to first read in all .md
files and only then optimize and then emit definitions of all the get_attr_...
functions.  The eq_attr uses in the "enabled" definition would only be
evaluated at that point.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Allow to use r1 to r4 as literal pool base.

2015-12-16 Thread Ulrich Weigand
Dominik Vogt wrote:
> On Mon, Dec 14, 2015 at 04:08:32PM +0100, Ulrich Weigand wrote:
> > I don't think that r1 is actually safe here.  Note that it may be used
> > (unconditionally) as temp register in s390_emit_prologue in certain cases;
> > the upcoming split-stack code will also need to use r1 in some cases.
> 
> How about the attached patch?  It also allows to use r0 as the
> temp register if possible (needs more testing).

This doesn't look safe either.  In particular:

- you use cfun_save_high_fprs_p at a place where its value might not yet
  have been determined (when calling s390_get_prologue_temp_regno from
  s390_init_frame_layout *before* the s390_register_info/s390_frame_info
  calls)

- r0 might hold the incoming static chain value, in which case it cannot
  be used as temp register

> If that's too
> much effort, I'm fine with limiting the original patch to r4 to
> r2.

That seems preferable to me.

> > r2 through r4 should be fine.  [ Not sure if there will be many (any?) cases
> > where one of those is unused but r5 isn't, however. ]
> 
> This can happen if the function only uses register pairs
> (__int128).  Actually I'm not sure whether r2 and r4 are valid
> candidates.

Huh?  Why not?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Allow to use r1 to r4 as literal pool base.

2015-12-16 Thread Ulrich Weigand
Dominik Vogt wrote:
> On Wed, Dec 16, 2015 at 01:51:45PM +0100, Ulrich Weigand wrote:
> > Dominik Vogt wrote:
> > > > r2 through r4 should be fine.  [ Not sure if there will be many (any?) 
> > > > cases
> > > > where one of those is unused but r5 isn't, however. ]
> > > 
> > > This can happen if the function only uses register pairs
> > > (__int128).  Actually I'm not sure whether r2 and r4 are valid
> > > candidates.
> > 
> > Huh?  Why not?
> 
> Because I'm not sure it is possible to write code where r2 (r4) is
> free but r3 (r5) is not - at least when s390_emit_prologue is
> called.  Writing code that uses r4 and r5 but not r3 was difficult
> enough:

Ah, OK.  I agree that it will rarely happen (it could in more
complex cases where something initially uses r2 but a very late
optimization pass manages to eliminate that use).

However, when it is *is* free, it is a valid candidate in the
sense that it would be safe and correct to use it.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Wide int support.

2015-12-15 Thread Ulrich Weigand
Richard Sandiford wrote:
> "Ulrich Weigand" <uweig...@de.ibm.com> writes:
> > The problem is not DImode LABEL_REFs, but rather VOIDmode LABEL_REFs when
> > matched against a match_operand:DI.
> 
> It'd be good to fix this in a more direct way though, rather than
> hack around it.  It's possible that the trick will stop working
> if genrecog.c gets smarter.
> 
> When do label_refs have VOIDmode?  Is this an m31-ism?

No, this seems to be a cross-platform issue.  For one, RTX in .md files
pretty much consistently uses (label_ref ...) without a mode.  This means
that any LABEL_REFs generated from .md file expanders or splitters will
use VOIDmode.

For LABEL_REFs generated via explicit gen_rtx_LABEL_REF, usage seems to
be mixed between using VOIDmode and Pmode in target C++ files.  Common
code does seem to be using always Pmode, as far as I can see.

Are LABEL_REFs in fact supposed to always have Pmode?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Allow to use r1 to r4 as literal pool base.

2015-12-14 Thread Ulrich Weigand
Dominik Vogt wrote:

> The attached patch enables using r1 to r4 as the literal pool base pointer if
> one of them is unused in a leaf function.  The unpatched code supports only r5
> and r13.

I don't think that r1 is actually safe here.  Note that it may be used
(unconditionally) as temp register in s390_emit_prologue in certain cases;
the upcoming split-stack code will also need to use r1 in some cases.

r2 through r4 should be fine.  [ Not sure if there will be many (any?) cases
where one of those is unused but r5 isn't, however. ]

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Wide int support.

2015-12-11 Thread Ulrich Weigand
Dominik Vogt wrote:

> +; Note: Although CONST_INT and CONST_DOUBLE are not handled in this 
> predicate,
> +; at least one of them needs to appear or otherwise safe_predicate_mode will
> +; assume that a DImode LABEL_REF is not accepted either (see genrecog.c).

The problem is not DImode LABEL_REFs, but rather VOIDmode LABEL_REFs when
matched against a match_operand:DI.

Otherwise, this patch is OK.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: S/390: Fix warnings in "*setmem_long..." patterns.

2015-12-02 Thread Ulrich Weigand
Andreas Krebbel wrote:
> On 12/02/2015 11:12 AM, Dominik Vogt wrote:
> > Hopefully, this is correct now; it does pass the functional test case
> > that's part of the patch.  Unfortunately the define_insn patterns
> > had to be duplicated because of the new subreg offsets.  
> 
> The number of patterns could possibly be reduced using the define_subst 
> machinery.  I'm looking into
> this for some other changes. No need to do this right now. We can do this 
> later on-top.

For this particular issue, shouldn't a simple mode_attr be OK?
I see that the sh port uses this:

(define_mode_attr lowpart_be [(QI "3") (HI "2")])

  [(set (reg:SI T_REG)
(eq:SI
  (subreg:QIHI
    (and:SI (match_operand:SI 0 "arith_reg_operand")
            (match_operand:SI 1 "arith_reg_operand")) <lowpart_be>)
  (const_int 0)))]

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: S/390: Fix warnings in "*setmem_long..." patterns.

2015-11-30 Thread Ulrich Weigand
Andreas Krebbel wrote:
> On 11/30/2015 04:11 PM, Dominik Vogt wrote:
> > The attached patch fixes some warnings generated by the setmem...
> > patterns in s390.md during build and add test cases for the
> > patterns.  The patch is to be added on to p of the movstr patch:
> > https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03485.html
> > 
> > The test cases validate that the patterns are actually used, but
> > at the moment the setmem_long_and pattern is never actually used
> > and thus the test case would fail.  So I've split the patch in two
> > (both attached to this message) to activate this part of the test
> > once we've fixed that.
> > 
> > The patch has passed the SPEC2006 testsuite without any measurable
> > changes in performance.
> 
> Shouldn't we instead describe the whole setmem operation as unspec including 
> the other operands as
> well? The semantics of the introduced UNSPEC_P_TO_BLK operation is not clear 
> to me.  It suggests to
> be some kind of "cast" which it isn't. In fact it is not able to do its job 
> without the length which
> is specified as use outside the unspec.

Well, I guess I suggested to Dominik to leave the basic
[parallel
  (set (dst:BLK) (src:BLK))
  (use (length)]
structure in place; my understanding is that the middle-end recognizes
this as a block move.  As "source" in this case we'd use a BLKmode
operand that consist iof the same byte replicated a number of times.

If we were to use just a single UNSPEC, how would we indicate to the
middle-end that a block of memory is modified, without using too coarse-
grained clobbers?

However, I agree that UNSPEC_P_TO_BLK really should also get the length
as input, to make it have precisely defined semantics.  Also, I'd rather
use a more descriptive name, like UNSPEC_REPLICATE_BYTE or the like.

What would you think about something like the following?

(define_insn "*setmem_long"
  [(clobber (match_operand: 0 "register_operand" "=d"))
   (set (mem:BLK (subreg:P (match_operand: 3 "register_operand" "0") 0))
(unspec:BLK [(match_operand:P 2 "shift_count_or_setmem_operand" "Y")
 (subreg:P (match_dup 3) 1)] UNSPEC_REPLICATE_BYTE))
   (use (match_operand: 1 "register_operand" "d"))
   (clobber (reg:CC CC_REGNUM))]

[ Not sure if we'd need an extra (use (match_dup 3)) any more. ]

B.t.w. this is certainly wrong and cannot be generated by common code:
(and:BLK (unspec:BLK
  [(match_operand:P 2 "shift_count_or_setmem_operand" "Y")]
  UNSPEC_P_TO_BLK)
 (match_operand 4 "const_int_operand" "n"))
(This explains why the pattern would never match.)

The AND should be on the filler byte instead:
(unspec:BLK [(and:P (match_operand:P 2 "shift_count_or_setmem_operand" "Y")
                    (match_operand:P 4 "const_int_operand" "n"))
             (subreg:P (match_dup 3) 1)] UNSPEC_REPLICATE_BYTE))

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com


