Re: [PATCH 11/29] [arm] Reduce cost of insns that are simple reg-reg moves.

2019-10-21 Thread Segher Boessenkool
On Mon, Oct 21, 2019 at 04:46:47PM +0100, Richard Earnshaw (lists) wrote:
> On 19/10/2019 17:17, Segher Boessenkool wrote:
> >Maybe we should simply disallow pseudo-to-pseudo (or even all) copies when
> >combining more than two insns, always?  I'll experiment.

> For the 2-insn case we don't try a split and simply give up; but when we 
> have a three-insn sequence we then combine tries harder and the generic 
> splitter does rewrite that in 2 insns.

When doing a 2-2 combination, we don't try to split currently.  We could,
if that ever is useful, but that needs more work (to make sure we won't
loop, for one thing).

> Perhaps combine should never try a 3, or 4 insn combination where i0 or 
> i1 is a simple reg-reg move and only feeds one subsequent insn in the 
> sequence on the basis that if that was really going to help then it 
> would have been caught by the two-insn sequence.  If it feeds both i2 
> and i3 then that's a different matter, of course.

Yeah...  Maybe a simple move as producer where the reg dies in the consumer
should not be tried with 3 or more insns?  (Never doing combinations with
trivial copies results in a few in a thousand extra insns, pretty bad).


Segher


Re: [PATCH 09/29] [arm] Correctly cost addition with a carry-in

2019-10-21 Thread Segher Boessenkool
On Mon, Oct 21, 2019 at 05:06:20PM +0100, Richard Earnshaw (lists) wrote:
> On 21/10/2019 16:46, Segher Boessenkool wrote:
> >>>There also is the insn_cost hook, which especially for RISC-like targets
> >>>is a lot easier to define.
> >>
> >>Easier, but not a complete replacement for rtx_costs, so not necessarily
> >>easier in the end...
> >
> >It isn't a full replacement *yet*, still chipping away at it.  If your
> >port has an rtx_cost already, adding ai reasonable insn_cost will only
> >improve it, not regress anything.
> >
> >But there are some places that still need rtx_costs, yes.
> >
> >Do you have anything in particular in mind?  PRs welcome!
> 
> Nothing specific.  I have vague recollections of a few places that 
> generated quite complex RTL expressions and then expected rtx_costs to 
> do the decomposition into the cost of the sequence for computing that. 
> Perhaps the mul synthesis code was like that, but I'm probably 
> misremembering it.

Yeah, and there are a few other (less important) places that do not have
anything like an insn.

> Certainly an API where you *know* the RTL is a legal insn and not 
> something made up by the mid-end would be a real step forward -

That is exactly what insn_cost is.  Well, it takes an rtx_insn *, even.

> I think 
> there are still places in the mid-end that generate (PLUS:QI ...), for 
> example, and simple expect rtx_cost to return a useful answer even on a 
> RISC target with no such instruction.

Yup.

CSE still uses rtx_cost directly.  avoid_expensive_constant does, too, and
some other optabs stuff.

Less important are the uses in dojump, expr, expmed (there is your "mult"),

But the really big ones left are set_src_cost and set_rtx_cost I think.


Segher


Re: gcc-wwwdocs branch master updated. cdc7bf90357701877546f8bac160d0fb9e20b334

2019-10-21 Thread Segher Boessenkool
On Mon, Oct 21, 2019 at 08:20:33AM -0700, Mike Stump wrote:
> On Oct 21, 2019, at 3:30 AM, Segher Boessenkool  
> wrote:
> > 
> > On Sun, Oct 20, 2019 at 06:06:30PM +0200, Gerald Pfeifer wrote:
> >> On Wed, 9 Oct 2019, js...@gcc.gnu.org wrote:
> >>> +Use "git commit" and "git push origin
> >>> +master" to check in the patch.
> >> 
> >> I will admit I made a couple of first commits without reading those
> >> details and just used a plain "git push".  
> >> 
> >> Is there any problem with that, any drawback?
> >> 
> >> Or could we simplify those instructions?
> > 
> > Neither command works in all cases
> 
> This isn't helpful.

I was answering Gerald's question.  I hope the answer was useful.

I was not recommending either "git push" nor "git push origin master".
Each has its drawbacks.  That was my *point*.

Either is fine in this doc, in my opinion.


Segher


Re: [PATCH 09/29] [arm] Correctly cost addition with a carry-in

2019-10-21 Thread Segher Boessenkool
On Mon, Oct 21, 2019 at 03:46:53PM +0100, Richard Earnshaw (lists) wrote:
> On 19/10/2019 14:00, Segher Boessenkool wrote:
> >On Fri, Oct 18, 2019 at 08:48:40PM +0100, Richard Earnshaw wrote:
> >>
> >>The cost routine for Arm and Thumb2 was not recognising the idioms that
> >>describe the addition with carry, this results in the instructions
> >>appearing more expensive than they really are, which occasionally can lead
> >>to poor choices by combine.  Recognising all the possible variants is
> >>a little trickier than normal because the expressions can become complex
> >>enough that this is no single canonical from.
> >
> >There also is the insn_cost hook, which especially for RISC-like targets
> >is a lot easier to define.
> 
> Easier, but not a complete replacement for rtx_costs, so not necessarily 
> easier in the end...

It isn't a full replacement *yet*, still chipping away at it.  If your
port has an rtx_cost already, adding ai reasonable insn_cost will only
improve it, not regress anything.

But there are some places that still need rtx_costs, yes.

Do you have anything in particular in mind?  PRs welcome!


Segher


Re: [PATCH] Fix (hypothetical) problem with pre-reload splitters (PR target/92140)

2019-10-21 Thread Segher Boessenkool
On Mon, Oct 21, 2019 at 01:27:23PM +0200, Jakub Jelinek wrote:
> On Mon, Oct 21, 2019 at 05:45:16AM -0500, Segher Boessenkool wrote:
> > On Sun, Oct 20, 2019 at 09:51:22PM +0200, Uros Bizjak wrote:
> > > On Sun, Oct 20, 2019 at 1:24 PM Jakub Jelinek  wrote:
> > > > As mentioned in the PR, the x86 backend has various 
> > > > define_insn_and_split
> > > > patterns that are meant to match usually during combine, are then
> > > > unconditionally split during split1 pass and as they have && 
> > > > can_create_pseudo_p ()
> > > > in their define_insn condition, if they get matched after split1, 
> > > > nothing
> > > > would split them anymore and they wouldn't match after reload.
> > > >
> > > > The split1 pass already sets a property that can be used.
> > > >
> > > > I've first tried to remove some constraints and associated attributes, 
> > > > but
> > > > it seems from further discussions in the PR I don't know much about the
> > > > reasons why they were added and if they are still needed or not, so this
> > > > version of the patch just replaces the can_create_pseudo_p () conditions
> > > > with a new predicate that stops matching already after the split1 pass.
> > > 
> > > As explained by Segher in the PR, there is no 100% guarantee that
> > > combine won't produce a pattern with a wrong hard register as a e.g.
> > > count reg of a shift insn. RA will die on this kind of pattern with
> > > reload failure, so these constraints are used together with
> > > ix86_legitimate_combined_insn target hook to reject invalid
> > > combinations involving hard registers.
> > 
> > And even that isn't completely safe, there is nothing that stops other
> > passes from creating the offending insns.
> 
> Sure, but with the patch either they are created before split1, then they
> will be split there and all is fine,

No, they can end up as needing reloads for hard registers in exactly the
same way as they can during combine.  The combine hook is only used by
combine, after all.

Other passes generally try only very conservative instruction modifications
(comparatively), I don't expect problems in reality, certainly not problems
that stay hidden for years.

It's just not really safe to use (non-fixed) hard registers early.


Segher


Re: [PATCH] Fix (hypothetical) problem with pre-reload splitters (PR target/92140)

2019-10-21 Thread Segher Boessenkool
On Sun, Oct 20, 2019 at 09:51:22PM +0200, Uros Bizjak wrote:
> On Sun, Oct 20, 2019 at 1:24 PM Jakub Jelinek  wrote:
> > As mentioned in the PR, the x86 backend has various define_insn_and_split
> > patterns that are meant to match usually during combine, are then
> > unconditionally split during split1 pass and as they have && 
> > can_create_pseudo_p ()
> > in their define_insn condition, if they get matched after split1, nothing
> > would split them anymore and they wouldn't match after reload.
> >
> > The split1 pass already sets a property that can be used.
> >
> > I've first tried to remove some constraints and associated attributes, but
> > it seems from further discussions in the PR I don't know much about the
> > reasons why they were added and if they are still needed or not, so this
> > version of the patch just replaces the can_create_pseudo_p () conditions
> > with a new predicate that stops matching already after the split1 pass.
> 
> As explained by Segher in the PR, there is no 100% guarantee that
> combine won't produce a pattern with a wrong hard register as a e.g.
> count reg of a shift insn. RA will die on this kind of pattern with
> reload failure, so these constraints are used together with
> ix86_legitimate_combined_insn target hook to reject invalid
> combinations involving hard registers.

And even that isn't completely safe, there is nothing that stops other
passes from creating the offending insns.

i386 isn't the only target doing nasty things here, btw., the sky hasn't
come crashing down for many years, we'll all be okay.  Fingers crossed.


Segher


Re: gcc-wwwdocs branch master updated. cdc7bf90357701877546f8bac160d0fb9e20b334

2019-10-21 Thread Segher Boessenkool
On Sun, Oct 20, 2019 at 06:06:30PM +0200, Gerald Pfeifer wrote:
> On Wed, 9 Oct 2019, js...@gcc.gnu.org wrote:
> > +Use "git commit" and "git push origin
> > +master" to check in the patch.
> 
> I will admit I made a couple of first commits without reading those
> details and just used a plain "git push".  
> 
> Is there any problem with that, any drawback?
> 
> Or could we simplify those instructions?

Neither command works in all cases, and both work in the simpler cases.

"git push" will push what you have configured to push for the current branch
you are on.  If you are on your local "master", and you have followed the
simple instructions above, that will probably push your local "master" to
the "master" on your remote "origin" (that is, "git push origin master:master").

"git push origin master" makes it explicit what remote you want to push to,
and what local branch you want to push, but not the remote branch you want
to push to.

"git push" is slightly more dangerous, but git will prevent you from doing
most dangerous things, so that's alright.  The instructions as-is are a good
habit to get into, and for more advanced things you *do* need to name your
remote and branches; you'll have to learn them some time, why not now.

Both commands are fine, if you have your changes on local master.


However.  *Never ever* use "git push -f".  Don't drive without safety belts.

(Use "+" when you want to force push a branch, don't force push the world).


Segher


Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-21 Thread Segher Boessenkool
On Sun, Oct 20, 2019 at 12:21:21PM +0100, Richard Earnshaw (lists) wrote:
> On 19/10/2019 17:31, Segher Boessenkool wrote:
> > I have a bunch of testcases from when I did something similar for PowerPC
> > that I wanted to test...  But I cannot get your series to apply.  Do you
> > have a git repo I can pull from?
> 
> Perhaps because it's already committed to trunk?

Oh probably.  Duh.  Thanks :-)

> > u64 addH(u64 a) { return a + 0x12345678ULL; }
> > u64 addH0(u64 a) { return a + 0x1234ULL; }

(If you change those to 0x340078ULL etc. it'll test as meant on arm
as well: to see if it uses immediates in the insn where it can.  It looks
like it'll work fine fwiw).

> We do pretty well on this.  Only addSHm1 needs three insns (except where
> the constant isn't valid for arm), and I think that's the minimum for
> this case anyway.  Several of the tests only need one insn.

Yeah, very nice :-)


Segher


Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-19 Thread Segher Boessenkool
Hi Richard,

On Fri, Oct 18, 2019 at 08:48:31PM +0100, Richard Earnshaw wrote:
> 
> This series of patches rewrites all the DImode arithmetic patterns for
> the Arm backend when compiling for Arm or Thumb2 to split the
> operations during expand (the thumb1 code is unchanged and cannot
> benefit from early splitting as we are unable to expose the carry
> flag).

Very nice :-)

I have a bunch of testcases from when I did something similar for PowerPC
that I wanted to test...  But I cannot get your series to apply.  Do you
have a git repo I can pull from?

Here is one test case (it's a bit geared towards what our ISA can do):

===
typedef unsigned int u32;
typedef unsigned long long u64;

u64 add(u64 a, u64 b) { return a + b; }
u64 add1(u64 a) { return a + 1; }
u64 add42(u64 a) { return a + 42; }
u64 addm1(u64 a) { return a - 1; }
u64 addff(u64 a) { return a + 0xULL; }
u64 addH(u64 a) { return a + 0x12345678ULL; }
u64 addH0(u64 a) { return a + 0x1234ULL; }
u64 addS(u64 a, u32 b) { return a + b; }
u64 addSH(u64 a, u32 b) { return a + ((u64)b << 32); }
u64 addB1(u64 a) { return a + 0x1ULL; }
u64 addB8(u64 a) { return a + 0x8ULL; }

u64 addSH42(u64 a, u32 b) { return a + ((u64)b << 32) + 42; }
u64 addSHm1(u64 a, u32 b) { return a + ((u64)b << 32) - 1; }
u64 addSHff(u64 a, u32 b) { return a + ((u64)b << 32) + 0xULL; }
===

rs6000 -m32 currently has non-optimal code for addm1, addSHm1; trunk arm
has non-optimal code for addH0, addSH, addB1, addB8, addSH42, addSHm1, and
addSHff if I understand well enough.  So I'd love to see what it does with
your series applied :-)


Segher


Re: [PATCH 11/29] [arm] Reduce cost of insns that are simple reg-reg moves.

2019-10-19 Thread Segher Boessenkool
On Fri, Oct 18, 2019 at 08:48:42PM +0100, Richard Earnshaw wrote:
> 
> Consider this sequence during combine:
> 
> Trying 18, 7 -> 22:
>18: r118:SI=r122:SI
>   REG_DEAD r122:SI
> 7: r114:SI=0x1-r118:SI-ltu(cc:CC_RSB,0)
>   REG_DEAD r118:SI
>   REG_DEAD cc:CC_RSB
>22: r1:SI=r114:SI
>   REG_DEAD r114:SI

r122 is dead here.  combine will have tried just 7+22 before, and that
failed; it now only succeeds because the register copy is removed.

All like you say and what your patch is about.  But, this is a generic
problem, that should be solved in combine itself (whether or not you also
want to reduce the insn cost here).

Maybe we should simply disallow pseudo-to-pseudo (or even all) copies when
combining more than two insns, always?  I'll experiment.

> We don't want to prevent combine from eliminating such moves, as this
> can expose more combine opportunities, but we shouldn't rate them as
> profitable in themselves.  We can do this be adjusting the costs
> slightly so that the benefit of eliminating such a simple insn is
> reduced.

Yes, but combine should have removed the move in a 2->1 combination
already, if it is beneficial: both 18->7 and 7->22 should have combined
just fine.  This also points to a potential target problem: why are those
two combinations not allowed?


Segher


Re: [PATCH 09/29] [arm] Correctly cost addition with a carry-in

2019-10-19 Thread Segher Boessenkool
On Fri, Oct 18, 2019 at 08:48:40PM +0100, Richard Earnshaw wrote:
> 
> The cost routine for Arm and Thumb2 was not recognising the idioms that
> describe the addition with carry, this results in the instructions
> appearing more expensive than they really are, which occasionally can lead
> to poor choices by combine.  Recognising all the possible variants is
> a little trickier than normal because the expressions can become complex
> enough that this is no single canonical from.

There also is the insn_cost hook, which especially for RISC-like targets
is a lot easier to define.


Segher


Re: [PATCH] Move jump threading before reload

2019-10-18 Thread Segher Boessenkool
On Fri, Oct 18, 2019 at 11:06:45AM +0200, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on x86_64-redhat-linux, s390x-redhat-linux
> and ppc64le-redhat-linux.  The offending patch is in gcc-9_1_0-release
> and gcc-9_2_0-release - do I need to backport this fix to gcc-9-branch?

It is a regression on 9 (or so I assume), so yes please.

>   PR rtl-optimization/92007
>   * cfgcleanup.c (thread_jump): Add an assertion that we don't
>   call it after reload if hot/cold partitioning has been done.
>   (class pass_postreload_jump): Rename to
>   pass_jump_after_combine.

This fits on one line just fine.

>   (make_pass_postreload_jump): Rename to
>   make_pass_jump_after_combine.
>   * passes.def(pass_postreload_jump): Move before reload, rename

Space before (.

> --- a/gcc/cfgcleanup.c
> +++ b/gcc/cfgcleanup.c
> @@ -259,6 +259,10 @@ thread_jump (edge e, basic_block b)
>bool failed = false;
>reg_set_iterator rsi;
>  
> +  /* Jump threading may cause fixup_partitions to introduce new crossing 
> edges,
> + which is not allowed after reload.  */
> +  gcc_checking_assert (!reload_completed || !crtl->has_bb_partition);

Thanks for the assert, that will help prevent people from running into
this again.

The patch looks fine to me, but I'm not a global reviewer :-)


Segher


Re: {PATCH v3, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9

2019-10-17 Thread Segher Boessenkool
Hi Kelvin,

On Wed, Oct 09, 2019 at 03:28:45PM -0500, Kelvin Nilsen wrote:
> This new pass scans existing rtl expressions and replaces them with rtl 
> expressions that favor selection of the D-form instructions in contexts for 
> which the D-form instructions are preferred.  The new pass runs after the RTL 
> loop optimizations since loop unrolling often introduces opportunities for 
> beneficial replacements of X-form addressing instructions.
> 
> For each of the new tests, multiple X-form instructions are replaced with 
> D-form instructions, some addi instructions are replaced with add 
> instructions, and some addi instructions are eliminated.  The typical 
> improvement for the included tests is a decrease of 4.28% to 12.12% in the 
> number of instructions executed on each iteration of the loop.  The 
> optimization has not shown measurable improvement on specmark tests, 
> presumably because the typical loops that are benefited by this optimization 
> are memory bounded and this optimization does not eliminate memory loads or 
> stores.  However, it is anticipated that multi-threaded workloads and 
> measurements of total power and cooling costs for heavy server workloads 
> would benefit.

My first question is, why did ivopts choose the suboptimal solution?
_Did_ it, or did something later mess things up?

This new pass can help us investigate that.  It certainly sounds like we
could do better earlier already.

I think it is a good design to make fixes late in the pass pipeline, *but*
we should try to make good choices earlier, too -- the "late tweaks" should
be just that, tweaks; 4%-12% is a bit much.

(It's not that super late here; but still, why does it help so much?)

> 2. Improved comments and added discussion of computational complexity.

It's really not good at all to have anything that is quadratic in the size
of the program, or the size of a function, or the size of a basic block:
there always show up real programs which then take approximately infinitely
long to compile.

If there are good arguments why some parameter can not be bigger than 100
in reality, or maybe even 1000, it is different of course; but things like
number of function, number of basic blocks, or number of instructions (per
function or bb or loop) are not naturally limited.

> 5. Refactored the code to divide into smaller functions and provide more 
> descriptive commentary.

Many thanks for this :-)

> +   This pass replaces the above-matched sequences with:
> +
> +   Ai: derived_pointer = array_base + offset
> +   *(derived_pointer)
> +
> +   Aij: leave these alone.  expect that subsequent optimization deletes
> +this code as it may become dead (since we don't use the
> +indexing expression following our code transformations.)
> +
> +   Ai:
> +   *(derived_pointer + constant_i)
> + (where constant_i equals sum of constant (n,j) for all n from 1
> +  to i paired with all j from 1 to Kn,

So if I understand this correctly, if the code is

  x0 = [base+8]
  x1 = [base]
  x2 = [base+16]

this pass will change it to

  p = base+8
  x0 = [p]
  x1 = [p-8]
  x2 = [p+8]

Should it always pick the first access as the new base pointer?  Should it
use the lowest offset instead?

(Maybe the code does something more advanced than picking the first; not
clear from this comment though).

> +class indexing_web_entry: public web_entry_base
> +{
> + public:
> +  rtx_insn *insn;/* Pointer to the insn */
> +  basic_block bb;/* Pointer to the enclosing basic block */

The rest of the fields have the comment before the declaration.  I would
just lose the comments here though: if something called "insn" is not an
insn, or something called "bb" is not a block, ... :-)

> +  /* A unique sequence number is assigned to each instruction for the
> + purpose of simplifying domination tests.  Within each basic
> + block, sequence numbers areassigned in strictly increasing order.
> + Thus, for any two instructions known to reside in the same basic
> + block, the instruction with a lower insn_sequence_no is kknown
> + to dominate the instruction with a higher insn_sequence_no.  */
> +  unsigned int insn_sequence_no;

Many existing passes call this "luid" (for "local unique id").

(Typos: "are assigned", "known").

> +  /* If this insn is relevant, it is a load or store with a memory
> + address that is comprised of a base pointer (e.g. the address of
> + an array or array slice) and an index expression (e.g. an index
> + within the array).  The original_base_use and original_index_use
> + fields represent the numbers of the instructions that define the
> + base and index values which are summed together with a constant
> + value to determine the value of this instruction's memory
> + address.  */
> +  unsigned int original_base_use;
> +  unsigned int original_index_use;

I wonder how you determine what is base and what is index?

(I'll review the rest 

Re: [PATCH] LRA: side_effects_p stmts' output is not invariant (PR89721)

2019-10-17 Thread Segher Boessenkool
On Fri, Mar 15, 2019 at 05:14:48PM -0500, Segher Boessenkool wrote:
> On Fri, Mar 15, 2019 at 04:25:01PM -0400, Vladimir Makarov wrote:
> > On 2019-03-15 2:30 p.m., Segher Boessenkool wrote:
> > >PR89721 shows LRA treating an unspec_volatile's result as invariant,
> > >which of course isn't correct.  This patch fixes it.
> > Segher, thank you for fixing this.  The patch is ok to commit.
> 
> Thanks, done.  Is this okay for backports to 8 and 7 as well?  After a
> while of course.

I lost track of this.  Is it okay to backport to 8 and 7?


Segher


Re: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-17 Thread Segher Boessenkool
On Wed, Oct 16, 2019 at 11:44:37PM +0100, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> >> If someone wants to add a new canonical form then the ports should of
> >> course adapt, but until then I think the patch is doing the right thing.
> >
> > We used to generate this, until GCC 5.  There aren't many ports that have
> > adapted yet.
> 
> The patch has testcases, so this won't be a silent failure for SVE2
> if things change again in future.

Sure.  But I am saying the current behaviour should *not* be canonical
(and it never was); it would make more sense to have the more sensible
behaviour we had for the many years (or decades?) before as canonical.

> >> > If the mask is not a constant, we really shouldn't generate a totally
> >> > different form.  The xor-and-xor form is very hard to handle, too.
> >> >
> >> > Expand currently generates this, because gimple thinks this is simpler.
> >> > I think this should be fixed.
> >> 
> >> But the constant form is effectively folding away the NOT.
> >> Without it the equivalent rtl uses 4 operations rather than 3:
> >> 
> >>   (ior (and A C) (and B (not C)))
> >
> > RTL canonicalisation rules are not based around number of ops.
> 
> Not to the exclusion of all else, sure.

No, not *at all*.  As a side-effect this is sometimes the case, of course,
perhaps more often than not; but it is not a rule.

> But my point was that there
> are reasons why forcing the (ior ...) form for non-constants might not
> be a strict improvement.

Sure.  But we either have canonical forms, which might be a bit awkward
sometimes, or we have to handle two (or three, or sometimes many more)
different forms everywhere.  This hurts especially in the MDs.  Like,
aarch now has this xor-and-xor pattern, but it is not canonical; some
parts in GCC might generate the and/ior thing for example, or instead
use some zero_extract thing (which in combine will then also try the
and/ior thing).

> > For example, we do (and (not A) (not B)) rather than (not (ior (A B)) .
> 
> Right, hence my complaint about this the other day on IRC. :-)
> I hadn't noticed until then that gimple had a different rule.

I think I missed that, sorry.

> > Instead, there are other rules (like here: push "not"s inward,
> > which can be applied locally with the wanted result).
> 
> Sure.  But I think it's common ground that there's no existing
> rtl rule that applies naturally to (xor (and (xor A B) C) B),
> where there's no (not ...) to push down.

Yes.  The documentation says
  @cindex @code{xor}, canonicalization of
  @item
  The only possible RTL expressions involving both bitwise exclusive-or
  and bitwise negation are @code{(xor:@var{m} @var{x} @var{y})}
  and @code{(not:@var{m} (xor:@var{m} @var{x} @var{y}))}.
and that is all it says about xor (and it is under-defined, anyway; surely
  (set (reg)
   (xor (mult (not (reg))
  (reg))
(reg)))
(it's hard to come up with a less silly example) is valid RTL as well!)

Because in the
  ((A^B) & C) ^ B
we have B twice it can be expressed in quite different ways.  I prefer
  (A) | (B&~C)
(which is the disjunctive normal form for this) (using ior instead of xor
is common, but that is not a documented canonicalisation either); this is
nice because C has a different role (different than A and B) here, and it
is the same shape as you get for C a constant, and the expression is
nicely symmetrical too, and the expression tree is less deep (ignoring
the inversion ;-) )

In general, the RTL code does not handle xor very well, even aside from
all of the canonicalisation issues.  So maybe we should just not use xor
much?

> >> As you say, it's no accident that we get this form, it's something
> >> that match.pd specifically chose.  And I think there should be a
> >> strong justification for having an RTL canonical form that reverses
> >> a gimple decision.  RTL isn't as powerful as gimple and so isn't going
> >> to be able to undo the gimple transforms in all cases.
> >
> > Canonical RTL is different in many ways, already.
> 
> Sure, wasn't claiming otherwise.  But most of the rtl canonicalisation
> rules predate gimple by some distance, so while the individual choices
> are certainly deliberate, the differences weren't necessarily planned
> as differences.

The RTL and Gimple rules have very different goals.

> Whereas here we're talking about adding a new rtl rule
> with the full knowledge that it's the ooposite of the equivalent gimple
> rule.

And also with the knowledge that it is what existing target code still
expects!  (The aarch64 code is the first using the "gimple" form for this
as far as I know).

> If we're g

Re: [RFC, Darwin, PPC] Fix PR 65342.

2019-10-17 Thread Segher Boessenkool
On Thu, Oct 17, 2019 at 10:37:33AM +0100, Iain Sandoe wrote:
> Segher Boessenkool  wrote:
> 
> > Okay for trunk.  For backports maybe wait a bit longer than usual?  So ask
> > again in two weeks, maybe?  I know it's important for the darwin port, but
> > the generic part is a little scary.
> 
> No problem (I would like to get this in to the final issue of 7, if possible).
> 
> FWIW, this implementation is completely guarded on TARGET_MACHO.

Yeah, I misread that, somehow I thought the big change was inside
mem_operand_gpr.  It isn't, and it is perfectly safe elsewhere, so no
special care is needed here at all, for backports.

> >> --- a/gcc/config/rs6000/rs6000.md
> >> +++ b/gcc/config/rs6000/rs6000.md
> >> @@ -6894,13 +6894,6 @@
> >> ;; do the load 16-bits at a time.  We could do this by loading from memory,
> >> ;; and this is even supposed to be faster, but it is simpler not to get
> >> ;; integers in the TOC.
> >> -(define_insn "movsi_low"
> > 
> > Should the preceding comment be moved elsewhere / changed / deleted?
> 
> It seemed to be a comment about the following code - or something that should
> have been deleted long ago - it mentions the TOC, which Darwin does not use
> so not a Darwin-related thing.
> 
> Happy to do a separate patch to delete it that’s desired.

I think the comment is more confusing than helpful, currently.  So sure,
removing it is pre-approved.


Segher


Re: [RFC, Darwin, PPC] Fix PR 65342.

2019-10-17 Thread Segher Boessenkool
On Thu, Oct 17, 2019 at 01:46:41PM +1030, Alan Modra wrote:
> On Sat, Oct 12, 2019 at 05:39:51PM -0500, Segher Boessenkool wrote:
> > On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote:
> > > For 32bit cases this isn't a problem since we can load/store to unaligned
> > > addresses using D-mode insns.
> > 
> > Can you?  -m32 -mpowerpc64?  We did have a bug with this before, maybe
> > six years ago or so...  Alan, do you remember?  It required some assembler
> > work IIRC.
> 
> Yes, the ppc32 ABI doesn't have the relocs to support DS fields.
> Rather than defining a whole series of _DS (and _DQ!) relocs, the
> linker inspects the instruction being relocated and complains if the
> relocation would modify opcode bits.  See is_insn_ds_form in
> bfd/elf32-ppc.c.  We do the same on ppc64 for DQ field insns.

Ah right, that was it.  So it uses the D reloc but with DS or DQ
restrictions.  Gotcha.  For the compiler this is just as if those DS and
DQ relocs *do* exist.

> > I'll have another looke through this (esp. the generic part) when I'm fresh
> > awake (but not before coffee!).  Alan, can you have a look as well please?
> 
> It looks reasonable to me.

Thanks Alan!


Segher


Re: [RFC, Darwin, PPC] Fix PR 65342.

2019-10-17 Thread Segher Boessenkool
Okay for trunk.  For backports maybe wait a bit longer than usual?  So ask
again in two weeks, maybe?  I know it's important for the darwin port, but
the generic part is a little scary.

On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote:
> 2) To resolve this we need to extend the handling of the  mem_operand_gpr to
> allow looking through Mach-O PIC UNSPECs in the lo_sum cases.
> 
>  - note, that rs6000_offsettable_memref_p () will not handle these so that
>would return early, producing the issue with unsatisfiable constraints.
> 
>   - I do wonder if that's also the case for some non-Darwin lo_sum cases.
> 
> (some things might be hard to detect, since the code will generally fall
>  back to doing " la  Rx xxx@l ; ld Ry 0(Rx)" so it won't fail - just be
>  less efficient than it could be).

I'm putting this on the Big List of things I may some day have time to
look at ;-)

>   * config/rs6000/darwin.md (movdi_low, movsi_low_st): Delete

Full stop.

> +  /* We only care if the access(es) would cause a change to the high part.  
> */
> +  offset = ((offset & 0x) ^ 0x8000) - 0x8000;
> +  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);

So this works because the "extra" part only is relevant for positive
offsets.  Okay.  Tricky.

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6894,13 +6894,6 @@
>  ;; do the load 16-bits at a time.  We could do this by loading from memory,
>  ;; and this is even supposed to be faster, but it is simpler not to get
>  ;; integers in the TOC.
> -(define_insn "movsi_low"

Should the preceding comment be moved elsewhere / changed / deleted?


Segher


Re: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-16 Thread Segher Boessenkool
On Wed, Oct 16, 2019 at 09:04:18PM +0100, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > This isn't canonical RTL.  Does combine not simplify this?
> >
> > Or, rather, it should not be what we canonicalise to: nothing is defined
> > here.
> 
> But when nothing is defined, let's match what we get :-)

Of course.

> If someone wants to add a new canonical form then the ports should of
> course adapt, but until then I think the patch is doing the right thing.

We used to generate this, until GCC 5.  There aren't many ports that have
adapted yet.

> > If the mask is not a constant, we really shouldn't generate a totally
> > different form.  The xor-and-xor form is very hard to handle, too.
> >
> > Expand currently generates this, because gimple thinks this is simpler.
> > I think this should be fixed.
> 
> But the constant form is effectively folding away the NOT.
> Without it the equivalent rtl uses 4 operations rather than 3:
> 
>   (ior (and A C) (and B (not C)))

RTL canonicalisation rules are not based around number of ops.  For example,
we do  (and (not A) (not B))  rather than  (not (ior (A B))  .  Instead,
there are other rules (like here: push "not"s inward, which can be applied
locally with the wanted result).

> And folding 4 operations gets us into 4-insn combinations, which are
> obviously more limited (for good reason).

But on most machines it doesn't need to combine more than two or three
insns to get here.  Reducing the depth of the tree is more useful...  That
is 3 in both cases here, but "andc" is common on many machines, so that
makes it only two deep.

> As you say, it's no accident that we get this form, it's something
> that match.pd specifically chose.  And I think there should be a
> strong justification for having an RTL canonical form that reverses
> a gimple decision.  RTL isn't as powerful as gimple and so isn't going
> to be able to undo the gimple transforms in all cases.

Canonical RTL is different in many ways, already.

"Not as powerful", I have no idea what you mean, btw.  RTL is much closer
to the real machine, so is a lot *more* powerful than Gimple for modelling
machine instructions (where Gimple is much nicer for higher-level
optimisations).  We need both.


Segher


Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Segher Boessenkool
On Wed, Oct 16, 2019 at 10:03:51AM -0600, Martin Sebor wrote:
> PS The counterexample nicely illustrates why -Wself-init should
> be in -Wall like in Clang or MSVC, or at least in -Wextra like in
> ICC.  Let me take it as a reminder to submit a patch for GCC 10.

c-family/c-gimplify.c says:

  /* This is handled mostly by gimplify.c, but we have to deal with
 not warning about int x = x; as it is a GCC extension to turn off
 this warning but only if warn_init_self is zero.  */

A lot of code will start to warn if you turn on -Winit-self by default
(in -Wall or -W), since forever we have had this GCC extension, and
people expect it.  (Or so I fear, feel free to prove me wrong :-) )


Segher


Re: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-16 Thread Segher Boessenkool
Hi,

[ Please don't use application/octet-stream attachments.  Thanks! ]

On Wed, Oct 16, 2019 at 04:24:29PM +, Yuliang Wang wrote:
> +;; Unpredicated bitwise select.
> +(define_insn "*aarch64_sve2_bsl"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
> + (xor:SVE_I
> +   (and:SVE_I
> + (xor:SVE_I
> +   (match_operand:SVE_I 1 "register_operand" ", w")
> +   (match_operand:SVE_I 2 "register_operand" ", w"))
> + (match_operand:SVE_I 3 "register_operand" "w, w"))
> +   (match_dup BSL_3RD)))]

This isn't canonical RTL.  Does combine not simplify this?

Or, rather, it should not be what we canonicalise to: nothing is defined
here.

We normally get something like

Trying 7, 8 -> 9:
7: r127:SI=r130:DI#4^r125:DI#4
  REG_DEAD r130:DI
8: r128:SI=r127:SI&0x2000
  REG_DEAD r127:SI
9: r126:SI=r128:SI^r125:DI#4
  REG_DEAD r128:SI
  REG_DEAD r125:DI
Successfully matched this instruction:
(set (reg:SI 126)
(ior:SI (and:SI (subreg:SI (reg:DI 130) 4)
(const_int 536870912 [0x2000]))
(and:SI (subreg:SI (reg/v:DI 125 [ yD.2902+-4 ]) 4)
(const_int -536870913 [0xdfff]

If the mask is not a constant, we really shouldn't generate a totally
different form.  The xor-and-xor form is very hard to handle, too.

Expand currently generates this, because gimple thinks this is simpler.
I think this should be fixed.


Segher


Re: [PATCH V2] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-16 Thread Segher Boessenkool
Hi!

On Wed, Oct 16, 2019 at 04:50:21PM +0800, Jiufu Guo wrote:
> In PR70010, a function is marked with target(no-vsx) to disable VSX code
> generation.  To avoid VSX code generation, this function should not be
> inlined into VSX function.  
> 
> In previous implementation, target of non-vsx is treated as subset target
> with vsx, even user set no-vsx attribute. 
> 
> As discussed in previous mails, to fix the bug, in the current logic when
> checking whether the caller's ISA flags supports the callee's ISA flags, we
> just need to add a test that enforces that the caller's ISA flags match
> exactly the callee's flags, for those flags that were explicitly set in the
> callee.  If caller without target attribute then using options from command
> line.  Here is a patch which is updated with comments from previous mails.  

> gcc/
> 2019-10-12  Peter Bergner 
>   Jiufu Guo  
> 
>   PR target/70010
>   * config/rs6000/rs6000.c (rs6000_can_inline_p): Prohibit inlining if
>   the callee explicitly disables some isa_flags the caller is using.  

No trailing spaces please.

> +  /* If the caller has option attributes, then use them.
> +  Otherwise, use the command line options.  */
> +  if (caller_tree)
> + caller_isa = TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags;
> +  else
> + caller_isa = rs6000_isa_flags;

Okay, it's very clear with that comment -- or it *seems* easy, anyway :-)

> +  /* The callee's options must be a subset of the caller's options, i.e.
> +  a vsx function may inline an altivec function, but a non-vsx function
> +  must not inline a vsx function.  However, for those options that the

no-vsx instead of non-vsx?  Or make it clear even more explicitly that this is
about having something explicitly disabled?  (The code is clear though).

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr70010-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -flto -mvsx" } */
> +
> +vector int c, a, b;
> +
> +static inline void __attribute__ ((__always_inline__, target ("no-vsx")))
> +foo () /* { dg-error "inlining failed in call to .* target specific option 
> mismatch" } */

.* can match across lines.  Not a huge deal here of course -- but maybe
adding  (?n)  to the regexp works?  (Just at the very start of it).

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr70010-2.c

> diff --git a/gcc/testsuite/gcc.target/powerpc/pr70010-3.c 
> b/gcc/testsuite/gcc.target/powerpc/pr70010-3.c
> new file mode 100644

Do you want to test anything in those two new testcases?  Other than "it
compiles"?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr70010-4.c

> +foo () /* { dg-error "inlining failed in call to .* target specific option 
> mismatch" } */

(.* again)

> diff --git a/gcc/testsuite/gcc.target/powerpc/pr70010.c 
> b/gcc/testsuite/gcc.target/powerpc/pr70010.c
> new file mode 100644
> index 000..affd0c5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr70010.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */

This line isn't necessary anymore I think?  Or if it is, it is needed in
all these new testcases.

> +/* { dg-options "-O2 -finline-functions" } */
> +/* { dg-final { scan-assembler {\mbl vadd_no_vsx\M} } } */

Okay for trunk.  Thanks to both of you!

Also okay for 9 and 8, after waiting a week to see if there is fallout.


Segher


Re: [PATCH 0/2][MSP430] Optimize zero_extend insns and PSImode pointer manipulation

2019-10-15 Thread Segher Boessenkool
On Tue, Oct 15, 2019 at 12:49:18PM -0600, Jeff Law wrote:
> There aren't many that use PSImode.  In general we don't handle partial
> modes well in the optimizers -- largely because they're just not that
> common and the exact size is unspecified.  PSImode for example can be
> anywhere between 16 and 32 bits.  We can't track the sign bit state,
> simplify extensions, etc etc.

For GCC, it has exactly 32 bits, just not all are necessarily significant:

  A @code{MODE_PARTIAL_INT} mode behaves as if it were as wide as the
  corresponding @code{MODE_INT} mode, except that it has an unknown
  number of undefined bits.

So sure, you could use only the low 16 bits of it, for example, if that
works out well for your port.

Such modes cannot (in the portable code) be used for any arithmetic, only
for data movement and the like.

> One of the problems I distinctly remember is the promotion of integer
> loop induction variables to ptrdiff_t then using them in pointer
> arithmetic generated *horrible* code.

Heh.


Segher


[PATCH] genattrtab: Parenthesize expressions correctly (PR92107)

2019-10-15 Thread Segher Boessenkool
As PR92107 shows, genattrtab doesn't parenthesize expressions correctly
(or at all, even).  This fixes it.

I'll commit it as trivial and obvious if my bootstrap with it shows no
problems (or someone tells me not to, of course).


Segher


2019-10-15  Segher Boessenkool  

PR rtl-optimization/92107
* genattrtab.c (write_attr_value) : Parenthesize the
expression written.

---
 gcc/genattrtab.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/genattrtab.c b/gcc/genattrtab.c
index cdf0b5c..2fd8593 100644
--- a/gcc/genattrtab.c
+++ b/gcc/genattrtab.c
@@ -4425,11 +4425,11 @@ write_attr_value (FILE *outf, class attr_desc *attr, 
rtx value)
   goto do_operator;
 
 do_operator:
+  fprintf (outf, "(");
   write_attr_value (outf, attr, XEXP (value, 0));
-  fputc (' ', outf);
-  fputc (op,  outf);
-  fputc (' ', outf);
+  fprintf (outf, " %c ", op);
   write_attr_value (outf, attr, XEXP (value, 1));
+  fprintf (outf, ")");
   break;
 
 case IF_THEN_ELSE:
-- 
1.8.3.1



Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Segher Boessenkool
On Tue, Oct 15, 2019 at 12:47:02PM -0500, Peter Bergner wrote:
> On 10/15/19 4:56 AM, Segher Boessenkool wrote:
> > On Tue, Oct 15, 2019 at 05:15:07PM +0800, Jiufu Guo wrote:
> >> And another issue: Behavior is still inconsistent between "-mno-vsx
> >> -flto" and "-mno-vsx" for user code. Previous patch makes it consistent
> >> between "-mvsx -flto" and "-mvsx". 
> > 
> >>  $GCC_BUILD/gcc/xgcc -B$GCC_BUILD/gcc novsx.c -O2 -mno-vsx
> >> /home/guojiufu/temp/novsx.c: In function 'main':
> >> /home/guojiufu/temp/novsx.c:6:1: error: inlining failed in call to 
> >> 'always_inline' 'foo': target specific option mismatch
> > 
> > So what should we do about this?  There are arguments for *both*
> > behaviours, and apparently with LTO we do not know which flags are
> > explicit?
> 
> I'd say this is user error, telling the compiler it has to inline the callee
> function, but then using incompatible options on the caller and the callee,
> so that it cannot.  I think the error message is the correct thing here.

Everything here is -mno-vsx though?  The always_inline "foo" has it as
target attribute, but everyone (also) has it from the command line arg?


Segher


Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Segher Boessenkool
On Tue, Oct 15, 2019 at 01:19:51PM +0200, Richard Biener wrote:
> On Tue, Oct 15, 2019 at 12:07 PM Segher Boessenkool
>  wrote:
> > On Tue, Oct 15, 2019 at 11:32:27AM +0200, Richard Biener wrote:
> > > > I think we just need to fix the bug in the current logic when checking
> > > > whether the caller's ISA flags supports the callee's ISA flags. ...and
> > > > for that, I think we just need to add a test that enforces that the
> > > > caller's ISA flags match exactly the callee's flags, for those flags
> > > > that were explicitly set in the callee.  The patch below seems to fix
> > > > the issue (regtesting now).  Does this look like what we want?
> > >
> > > I believe this is going to bite you exactly in the case you want the
> > > opposite behavior.  If you have CUs compiled with defaults and
> > > a specialized one with VSX that calls into generic compiled functions
> > > you _do_ want to allow inlining into the VSX enabled routines.
> >
> > Yes, but *not* inlining is relatively harmless, while inlining can be
> > fatal.  I don't see how we can handle both scenarios optimally.
> 
> How can it be fatal to inline a non-VSX function into a VSX one?

Oh I misread, I thought it was the other way around.

> > > Just
> > > think of LTO, C++ and comdats - you'll get a random comdat entity
> > > at link time for inlining - either from the VSX CU or the non-VSX one.
> >
> > This would make LTO totally unusable, with or without this patch?  Something
> > else must be going on?
> 
> It's the same without LTO - the linker will simply choose (randomly)
> one of the comdats from one of the CUs providing it, not caring about
> some built with and some without VSX.

Hrm, so how does that ever work?


Segher


Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Segher Boessenkool
On Tue, Oct 15, 2019 at 11:32:27AM +0200, Richard Biener wrote:
> > I think we just need to fix the bug in the current logic when checking
> > whether the caller's ISA flags supports the callee's ISA flags. ...and
> > for that, I think we just need to add a test that enforces that the
> > caller's ISA flags match exactly the callee's flags, for those flags
> > that were explicitly set in the callee.  The patch below seems to fix
> > the issue (regtesting now).  Does this look like what we want?
> 
> I believe this is going to bite you exactly in the case you want the
> opposite behavior.  If you have CUs compiled with defaults and
> a specialized one with VSX that calls into generic compiled functions
> you _do_ want to allow inlining into the VSX enabled routines.

Yes, but *not* inlining is relatively harmless, while inlining can be
fatal.  I don't see how we can handle both scenarios optimally.

> Just
> think of LTO, C++ and comdats - you'll get a random comdat entity
> at link time for inlining - either from the VSX CU or the non-VSX one.

This would make LTO totally unusable, with or without this patch?  Something
else must be going on?


Segher


Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Segher Boessenkool
On Tue, Oct 15, 2019 at 11:52:26AM +0200, Andreas Schwab wrote:
> On Okt 15 2019, Segher Boessenkool  wrote:
> 
> > Please use
> >   /* { dg-final { scan-assembler "\mbl vadd_no_vsx\M" } } */
> 
> ITYM
> 
> /* { dg-final { scan-assembler {\mbl vadd_no_vsx\M} } } */

Ha yes, thanks :-)  That's the problem with cut-and-paste (which is also
the cause of all these strange dg lines, I think).


Segher


Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Segher Boessenkool
Hi!

On Tue, Oct 15, 2019 at 05:15:07PM +0800, Jiufu Guo wrote:
> And another issue: Behavior is still inconsistent between "-mno-vsx
> -flto" and "-mno-vsx" for user code. Previous patch makes it consistent
> between "-mvsx -flto" and "-mvsx". 

>  $GCC_BUILD/gcc/xgcc -B$GCC_BUILD/gcc novsx.c -O2 -mno-vsx
> /home/guojiufu/temp/novsx.c: In function 'main':
> /home/guojiufu/temp/novsx.c:6:1: error: inlining failed in call to 
> 'always_inline' 'foo': target specific option mismatch

So what should we do about this?  There are arguments for *both*
behaviours, and apparently with LTO we do not know which flags are
explicit?

> +  /* Propogate global flags to caller.  */
> +  HOST_WIDE_INT caller_isa = rs6000_isa_flags;

I don't think that is right, or, I don't see why, anyway?

> +
> +  if (((caller_isa & callee_isa) == callee_isa)
> + && (caller_isa & explicit_isa) == (callee_isa & explicit_isa))
> +   ret = true;
> +}


Segher


Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Segher Boessenkool
Hi!

On Mon, Oct 14, 2019 at 07:18:11PM -0500, Peter Bergner wrote:
> On 10/14/19 2:57 PM, Segher Boessenkool wrote:
> > On Mon, Oct 14, 2019 at 06:35:06PM +0200, Richard Biener wrote:
> >> The general case should be that if the caller ISA supports the callee one
> >> then inlining is OK. If this is not wanted in some cases then there are
> >> options like using a noinline attribute. 
> > 
> > I agree, and that is what we already do afaik.
> 
> I agree on the making sure the caller's ISA supports the callee's ISA
> before allowing inlining...and I think that's what our code it "trying"
> to do.  It just has some bugs.

It says that it wants the callee ISA to be a subset of the caller ISA,
and that is what it does.  Which is exactly what we want.  But as the
PR points out it is *also* useful to not inline callees that explicitly
disabled some ISA feature the caller has.  Which we never promissed, but
perhaps we should.

> I don't think sprinkling noinline's around will work given LTO can
> inline random functions from one object file into a function in another
> object file and we have no idea what the options were used for both files.
> I think that would just force us to end up putting nolines on all fuctions
> that might be compiled with LTO.

I think LTO should handle noinline attributes just fine, it would be a
much bigger problem than this PR if it doesn't!


Missing date+name+address here; and missing PR ref.

> gcc/
>   * config/rs6000/rs6000.c (rs6000_can_inline_p): Handle explicit
>   options.

"Prohibit inlining if the callee explicitly disables some isa_flags the
caller has", or something like that?  A few more words please.

> --- gcc/testsuite/gcc.target/powerpc/pr70010.c(nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr70010.c(working copy)
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */

Just
  /* { dg-do compile } */
please: everything in gcc.target checks for powerpc*-*-* always (and
compile is the default, but it's nice documentation).

> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */

You don't need this I think.  If this test cannot work on Darwin for
some reason, the Darwin maintainers will take care of it.  Unless you
*know* they want (or need) the skip here?

> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -finline" } */

-finline does nothing; it isn't even documented, only -fno-inline is.

> +/* { dg-final { scan-assembler "bl vadd_no_vsx" } } */

Please use
  /* { dg-final { scan-assembler "\mbl vadd_no_vsx\M" } } */
so that it won't match "rldibl vadd_no_vsx", etc.  Doesn't matter much
for this testcase, but it's a good habit (and using it makes me not
comment about it, also useful ;-) )

Okay for trunk with those tweaks.  Thanks!


Segher


Re: [PATCH] V5, #4 of 15: Update predicates

2019-10-14 Thread Segher Boessenkool
On Mon, Oct 14, 2019 at 05:16:03PM -0400, Michael Meissner wrote:
> On Fri, Oct 11, 2019 at 04:17:02PM -0500, Segher Boessenkool wrote:
> > >   * config/rs6000/predicates.md (lwa_operand): Allow using PLWA to
> > >   generate sign extend with odd offsets.
> > 
> > I don't understand what this means.  "Odd offsets" isn't correct, in any
> > case?
> 
> Non-zero in the bottom 2/4 bits.  I'll try to come up with different words.

Ah.  "If prefixed is allowed, allow addresses not a multiple of 4."?

> > > +  /* The LWA instruction uses the DS-form format where the bottom two 
> > > bits of
> > > + the offset must be 0.  The prefixed PLWA does not have this
> > > + restriction.  */
> > > +  if (address_is_prefixed (addr, DImode, NON_PREFIXED_DS))
> > > +return true;
> > 
> > DImode?
> 
> Yes, because LWA converts SImode to DImode.

Well, the access is to a SImode datum.  But you want DImode here so it uses
the code for ld (the DS-mode code) also for lwa.  Hrm, address_to_insn_form
needs a comment for that then, and so do callers like this one.  Should be
fine with that, it's just an irregularity in the ISA.

> > > +;; Return 1 if op is either a register operand or a memory operand that 
> > > does
> > > +;; not use a prefixed address.
> > > +(define_predicate "reg_or_non_prefixed_memory"
> > > +  (match_code "reg,subreg,mem")
> > > +{
> > > +  return (gpc_reg_operand (op, mode) || non_prefixed_memory (op, mode));
> > > +})
> > 
> > This never allows subreg.
> 
> Gpc_reg_operand allows subreg, assuming that register_operand allows subreg.

It does, and I have no idea what I thought here now.  Sorry.


Segher


Re: [PATCH] V5, #2 of 15: Fix prefixed instruction length for some 128-bit types

2019-10-14 Thread Segher Boessenkool
On Mon, Oct 14, 2019 at 05:12:56PM -0400, Michael Meissner wrote:
> On Thu, Oct 10, 2019 at 05:02:09PM -0500, Segher Boessenkool wrote:
> > > @@ -7786,8 +7790,12 @@ (define_insn_and_split "*movtd_64bit_nod
> > >"#"
> > >"&& reload_completed"
> > >[(pc)]
> > > -{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
> > > -  [(set_attr "length" "8,8,8,12,12,8")])
> > > +{
> > > +  rs6000_split_multireg_move (operands[0], operands[1]);
> > > +  DONE;
> > > +}
> > > +  [(set_attr "non_prefixed_length" "8")
> > > +   (set_attr "prefixed_length" "20")])
> > 
> > It used to be 8,8,8,12,12,8 before.  Was that in error?  Please explain.
> 
> We've had this discussion before, but I didn't update the ChangeLog.
> 
> The two cases for 12 bytes (i.e. 3 insns) are r->Y and Y->r constaints.  Y
> matches DS offsets (i.e. bottom 2 bits non-zero) for traditional instructions.
> In looking at rs6000_split_multireg_move, the only way I can see that 3
> instructions would be generated would be if pre_modify/pre_update/etc. were
> generated.  But we don't allow pre_modify on larger types.

So do this as a separate change, first, please?

If it cannot happen, please also add an assert (somewhere early, before
doing anything else with the automodifies), an remove any now-dead code
(if there is any).

> But if you think there might be a case where it does generate 3 instructions, 
> I
> can modify it to use 8,8,8,12,12,8 for the non-prefixed case, and
> 20,20,20,24,24,20 for the prefixed case.

I think you are right, but it's not obvious, and it warrants a separate
patch.  That way, we can easily bisect it if some problem manifests, etc.
Thanks,


Segher


Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-14 Thread Segher Boessenkool
On Mon, Oct 14, 2019 at 06:35:06PM +0200, Richard Biener wrote:
> The general case should be that if the caller ISA supports the callee one 
> then inlining is OK. If this is not wanted in some cases then there are 
> options like using a noinline attribute. 

I agree, and that is what we already do afaik.

But in this case, the callee explicitly disables something (-mno-vsx),
while the caller has it enabled (all modern cpus have VSX).  If it ends
up being inlined, it will get VSX insns generated for it.  Which is what
Jiu Fu's patch aims to prevent.

So you are saying the GCC policy is that you should use noinline on the
callee in such cases?


Segher


Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-14 Thread Segher Boessenkool
On Mon, Oct 14, 2019 at 10:31:58AM -0500, Peter Bergner wrote:
> I agree with your other comment that we should be looking at explicit option
> usage versus default options.  However, the way we now implement default CPU,
> the gcc driver always passes a -mcpu= option to cc1 no matter if the user
> used -mcpu= or not, so -mcpu= will always looks like an explicit option.
> So when -mcpu=power[789] is passed to cc1 (via explicit user usage or default
> cpu), does that look like -mvsx was also explicitly used?  I'm guessing not.

-mdebug=reg tells you (answer: explicit means explicit, for everything I
tried anyway).


Segher


Re: [RFC, Darwin, PPC] Fix PR 65342.

2019-10-12 Thread Segher Boessenkool
Hi!

On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote:
> (this is a bug reported against Fortran, but actually is a generic insn
>  selection problem for m64 powerpc-darwin.  In fact, I’d say it renders
>  the 64b multilib unusable).
> 
> The solution proposed is Darwin-local (it's a long-standing bug and it
> would be intended to back-port it).  However, I'd welcome views on it.
> 
> The current Darwin load/store lo_sum patterns have neither predicate nor
> constraint.  This means that most parts of the backend, which rely on
> recog() to validate the rtx, can produce invalid combinations/selections.
> 
> For 32bit cases this isn't a problem since we can load/store to unaligned
> addresses using D-mode insns.

Can you?  -m32 -mpowerpc64?  We did have a bug with this before, maybe
six years ago or so...  Alan, do you remember?  It required some assembler
work IIRC.

> Conversely, for 64bit instructions that use DS mode, this can manifest as
> assemble errors (for an assembler that checks the LO14 relocations), or as
> crashes caused by wrong offsets (or worse, wrong content for the two LSBs).

Resulting in a different insn than intended.  Yeah.

> Trying to find the right place to fix this has been tricky, there's
> discussion in the PR trail.  There doesn't seem to be any way to deal with
> this in legitimize_address (since most of the damage is done by the wide open
> patterns that are in effect after the legitimizer has run).  Likewise,
> extending the checks in legitimate_address_p() and mode_dependent_address
> doesn't help if the constraint is not accurate in the end.
> 
> What we want to check for "Y" on Darwin is:
>   - that the alignment of the Symbols' target is sufficient for DS mode
>   - that the offset is suitable for DS mode.
> (while looking through the Mach-O PIC unspecs).
> 
> Proposed:
> 
> 1) Drop the Darwin-specific lo_sum patterns.  Actually, this produces an
> immediate progression in quite a number of tests, we begin using the
> movdi_internal64 patterns.  However there are also regressions caused by
> instructions unable to satisfy their constraints.

And code quality regressions?  Or does it even improve?  (One can dream...)

> 2) To resolve this we need to extend the handling of the  mem_operand_gpr to
> allow looking through Mach-O PIC UNSPECs in the lo_sum cases.
> 
>  - note, that rs6000_offsettable_memref_p () will not handle these so that
>would return early, producing the issue with unsatisfiable constraints.
> 
>   - I do wonder if that's also the case for some non-Darwin lo_sum cases.

Not sure what exactly you mean here?  Non-Darwin doesn't have issues with
not looking through Darwin-specific unspecs, anywhere, so you mean
something more general no doubt -- but what?

> (some things might be hard to detect, since the code will generally fall
>  back to doing " la  Rx xxx@l ; ld Ry 0(Rx)" so it won't fail - just be
>  less efficient than it could be).

Oh, you are saying mem_operand_gpr does not allow lo_sum *at all*?


> +  /* If we don't know the alignment of the thing to which the symbol refers,
> + we assume optimistically it is "enough".
> + ??? maybe we should be pessimistic instead.  */
> +  unsigned align = 0;

If you guess it is aligned enough but it isn't, the compile will fail.  Not
good.  OTOH, when don't we know the alignment?  Only for globals?  Does the
ABI guarantee alignment in that case?


I'll have another looke through this (esp. the generic part) when I'm fresh
awake (but not before coffee!).  Alan, can you have a look as well please?


Segher


Re: [PATCH 2/2] Documentation hyperlinks for [-Wname-of-option] (PR 87488)

2019-10-12 Thread Segher Boessenkool
On Sat, Oct 12, 2019 at 10:21:56AM -0400, David Malcolm wrote:
> It seems like instead it's displaying this:
> 
> https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wshadow=local-Wshadow=local%1B]8

That's what it does for me (well, with weird escapes at both ends).  And
if run inside a tmux it just shows what you wanted to show, but there is
no way to click it: the URL is not displayed at all.

> What happens if you try this at the terminal:
> 
>   printf '\e]8;;http://example.com\e\\This is a link\e]8;;\e\\\n'

In gnome term it shows a bunch of funny escapes.  In xterm it just
displays "This is a link", without any way to click it, like in the
tmux scenario.

The links are wrong btw, they should be version-specific?

> I don't yet know if there's a way to query the terminal to see if the
> escape is supported.  My hope was that they would be silently
> discarded.

Just display the URL if you want this?  That works, and has always worked,
and many terminals have convenient ways to use that (have had that since
forever, too).

Can we have this *off* by default, please?  The warnings are getting
way too verbose.  Often one warning does not fit on one screen already!

> > Do you see a way to find out if the escape sequences are supported,
> > or how to disable this function per configure option?
> 
> Do you mean a GCC configure-time option to change the default?  We have
> that for colors;

Do we?  I've had GCC_COLORS= since forever; is this finally not needed
anymore?  Neat :-)  Now just ASAN_OPTIONS=color=never TSAN_OPTIONS=color=never
by default and there is a bit of sanity again.

> Is this working/broken for other people?

I've only tried your example so far, and that does not work :-(


Segher


Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-12 Thread Segher Boessenkool
Hi!

On Sat, Oct 12, 2019 at 11:22:15AM +0800, Jiufu Guo wrote:
> As expected in the PR, when a function is marked with no-vsx, we would
> assume user has good reason to disable VSX code generation for the function.  
> To avoid VSX code generation, this function should not be inlined into VSX
> function. 

But your patch also disables inlining if the callee merely defaulted to
no-vsx.  Can you see if the option is explicitly disabled here?

>   PR target/70010
>   * config/rs6000/rs6000.c (rs6000_can_inline_p): Check 'vsx' feature
>   for caller and callee

Each sentence should end with a period, also in changelogs.

>   PR target/70010
>   * gcc.target/powerpc/inline_error.c: New test

Ditto.

> -  /* Callee's options should a subset of the caller's, i.e. a vsx 
> function
> -  can inline an altivec function but a non-vsx function can't inline a
> -  vsx function.  */
> -  if ((caller_opts->x_rs6000_isa_flags & callee_opts->x_rs6000_isa_flags)
> -   == callee_opts->x_rs6000_isa_flags)
> +  /* Callee's options should a subset of the caller's. In addition,
> +  vsx function should not inline function with non-vsx by which
> +  programmer may intend to disable VSX code generation.  */

Two spaces after a period.  How about something like

  /* Callee's options should be a subset of the caller's.  Also, a function
 without VSX enabled should not be inlined into one with VSX enabled,
 because it may be important it is disabled there; see PR70010.  */

It's not clear to me why this is important, and what makes -mvsx different
from all other similar options?


Segher


Re: [PATCH] V5, #6 of 15: Make vector load/store instruction length correct with prefixed addresses

2019-10-11 Thread Segher Boessenkool
Hi!

On Wed, Oct 09, 2019 at 04:26:20PM -0400, Michael Meissner wrote:
> --- gcc/config/rs6000/vsx.md  (revision 276713)
> +++ gcc/config/rs6000/vsx.md  (working copy)
> @@ -1149,10 +1149,14 @@ (define_insn "vsx_mov_64bit"
> "vecstore,  vecload,   vecsimple, mffgpr,mftgpr,load,
>  store, load,  store, *, vecsimple, 
> vecsimple,
>  vecsimple, *, *, vecstore,  vecload")
> -   (set_attr "length"
> +   (set_attr "non_prefixed_length"
> "*, *, *, 8, *, 8,
>  8, 8, 8, 8, *, *,
>  *, 20,8, *, *")
> +   (set_attr "prefixed_length"
> +   "*, *, *, 8, *, 20,
> +20,20,20,8, *, *,
> +*, 20,8, *, *")

Alternative 13 has non_prefixed_length 20, I wonder what insns that
generates?

Other than that, looks good afaics.


Segher


Re: [PATCH] V5, #5 of 15: Support -fstack-protect and large stack frames

2019-10-11 Thread Segher Boessenkool
On Wed, Oct 09, 2019 at 04:22:06PM -0400, Michael Meissner wrote:
> This patch allows -fstack-protect to work with large stack frames.

I don't understand; please explain.

What I see is a workaround for not properly handling prefixed addresses
in the stack protect code (by forcing the addresses to not be prefixed at
expand time).

> +rtx
> +make_memory_non_prefixed (rtx mem)
> +{
> +  gcc_assert (MEM_P (mem));
> +
> +  rtx old_addr = XEXP (mem, 0);
> +  if (address_is_prefixed (old_addr, GET_MODE (mem), NON_PREFIXED_DEFAULT))
> +{
> +  rtx new_addr;
> +
> +  if (GET_CODE (old_addr) == PLUS
> +   && (REG_P (XEXP (old_addr, 0)) || SUBREG_P (XEXP (old_addr, 0)))

How can you ever have a subreg in an address?  One in Pmode even?

> +   && CONST_INT_P (XEXP (old_addr, 1)))
> + {
> +   rtx tmp_reg = force_reg (Pmode, XEXP (old_addr, 1));
> +   new_addr = gen_rtx_PLUS (Pmode, XEXP (old_addr, 0), tmp_reg);
> + }
> +  else
> + new_addr = force_reg (Pmode, old_addr);
> +
> +  mem = change_address (mem, VOIDmode, new_addr);

replace_equiv_address ?

> +(define_expand "stack_protect_setdi"
> +  [(parallel [(set (match_operand:DI 0 "memory_operand")
> +(unspec:DI [(match_operand:DI 1 "memory_operand")]
> +UNSPEC_SP_SET))
> +   (set (match_scratch:DI 2)
> +(const_int 0))])]
> +  "TARGET_64BIT"
> +{
> +  if (TARGET_PREFIXED_ADDR)
> +{
> +  operands[0] = make_memory_non_prefixed (operands[0]);
> +  operands[1] = make_memory_non_prefixed (operands[1]);
> +}
> +})

It shouldn't be terribly hard to make the define_insns just *work* with
prefixed insns, instead?  Is there any reason we are sure these memory
references will not turn into something that needs prefixed insns, after
expand?  It seems fragile like this.

> +@item em
> +A memory operand that does not contain a prefixed address.

"A memory operand that does not require a prefixed instruction"?


Segher


Re: [PATCH] V5, #4 of 15: Update predicates

2019-10-11 Thread Segher Boessenkool
On Wed, Oct 09, 2019 at 04:07:50PM -0400, Michael Meissner wrote:
> It also adds two new predicates that disallow prefixed memory that will used 
> in
> patches #5 and #7.

Then you should have really introduced them in *those* patches.

> 2019-10-08  Michael Meissner  
> 
>   * config/rs6000/predicates.md (lwa_operand): Allow using PLWA to
>   generate sign extend with odd offsets.

I don't understand what this means.  "Odd offsets" isn't correct, in any
case?

> +  /* The LWA instruction uses the DS-form format where the bottom two bits of
> + the offset must be 0.  The prefixed PLWA does not have this
> + restriction.  */
> +  if (address_is_prefixed (addr, DImode, NON_PREFIXED_DS))
> +return true;

DImode?

> +;; Return 1 if op is a memory operand that is not prefixed.
> +(define_predicate "non_prefixed_memory"
> +  (match_code "mem")
> +{
> +  if (!memory_operand (op, mode))
> +return false;
> +
> +  return !address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> +})

This one is fine.

> +;; Return 1 if op is either a register operand or a memory operand that does
> +;; not use a prefixed address.
> +(define_predicate "reg_or_non_prefixed_memory"
> +  (match_code "reg,subreg,mem")
> +{
> +  return (gpc_reg_operand (op, mode) || non_prefixed_memory (op, mode));
> +})

This never allows subreg.


Segher


Re: [PATCH] V5, #3 of 15: Deal with prefixed instructions in rs6000_insn_cost

2019-10-11 Thread Segher Boessenkool
Hi!

On Wed, Oct 09, 2019 at 04:03:16PM -0400, Michael Meissner wrote:
> The basic problem is if there is no explicit cost predicate, rs6000_insn_cost
> uses the instruction size to figure out how many instructions are present, and
> make the cost a fact on that.  Since prefixed instructions are 12 bytes within
> GCC (to deal with the implicit NOP), if we did not do this change, the
> optimizers would try to save registers from prefixed loads because they 
> thought
> the load was more expensive.

Maybe we should just have an attribute that says how many insns this is?
You can get rid of many prefixed_length and non_prefixed_length attributes
that way, too.

> +  int cost;
> +  int length = get_attr_length (insn);
> +  int n = length / 4;
> +
> +  /* How many real instructions are generated for this insn?  This is 
> slightly

What is a "real" instruction?  Machine instruction?

> + different from the length attribute, in that the length attribute counts
> + the number of bytes.  With prefixed instructions, we don't want to 
> count a
> + prefixed instruction (length 12 bytes including possible NOP) as taking 
> 3
> + instructions, but just one.  */
> +  if (length >= 12 && get_attr_prefixed (insn) == PREFIXED_YES)
> +{
> +  /* Single prefixed instruction.  */
> +  if (length == 12)
> + n = 1;
> +
> +  /* A normal instruction and a prefixed instruction (16) or two back
> +  to back prefixed instructions (20).  */
> +  else if (length == 16 || length == 20)
> + n = 2;
> +
> +  /* Guess for larger instruction sizes.  */
> +  else
> + n = 2 + (length - 20) / 4;

That's a pretty bad estimate.

Can you look at non_prefixed_size, will that help?


Segher


Re: [PATCH, v2] Extend subst to simplify CONST_INT inside SIGN_EXTEND.

2019-10-11 Thread Segher Boessenkool
Hi!

On Thu, Oct 10, 2019 at 03:16:36PM -0700, Jim Wilson wrote:
> This addresses PR 91860 which has four testcases triggering internal errors.
> The problem here is that in combine when handling debug insns, we are trying
> to substitute
> (sign_extend:DI (const_int 8160 [0x1fe0]))
> as the value for
> (reg:DI 78 [ _9 ])
> in the debug insn
> (debug_insn 29 28 30 2 (var_location:QI d (subreg:QI (reg:DI 78 [ _9 ]) 0)) 
> "tmp4.c":11:5 -1
>  (nil))
> This eventually triggers an abort because 8160 is not a sign-extended
> QImode value.
> 
> We should avoid creating the invalid RTL in the first place.

It is *normal* for combine to create invalid RTL.  It first creates it,
and it checks if it is valid later.

However:

> In subst there
> is already code to avoid putting a CONST_INT inside a ZERO_EXTEND.  This
> needs to be extended to also handle a SIGN_EXTEND the same way.

That works, sure.  Approved for trunk.  Thanks!


Segher


Re: [PATCH] V5, #2 of 15: Fix prefixed instruction length for some 128-bit types

2019-10-10 Thread Segher Boessenkool
On Wed, Oct 09, 2019 at 03:56:01PM -0400, Michael Meissner wrote:
> This patch fixes the prefixed and non-prefixed instruction sizes for the
> 128-bit types that aren't loaded into 128-bit vectors (TDmode, TFmode, IFmode,
> PTImode).

> 2019-10-08  Michael Meissner  
> 
>   * config/rs6000/rs6000.md (mov_64bit_dm): Set prefixed and
>   non-prefixed length.
>   (movtd_64bit_nodm): Set prefixed and non-prefixed length.
>   (mov_ppc64): Set prefixed and non-prefixed length.

Please also note the patterns you reformatted.  (Just "Reformat." is
enough of course).

>  (define_insn_and_split "*movtd_64bit_nodm"
>[(set (match_operand:TD 0 "nonimmediate_operand" "=m,d,d,Y,r,r")
> @@ -7786,8 +7790,12 @@ (define_insn_and_split "*movtd_64bit_nod
>"#"
>"&& reload_completed"
>[(pc)]
> -{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
> -  [(set_attr "length" "8,8,8,12,12,8")])
> +{
> +  rs6000_split_multireg_move (operands[0], operands[1]);
> +  DONE;
> +}
> +  [(set_attr "non_prefixed_length" "8")
> +   (set_attr "prefixed_length" "20")])

It used to be 8,8,8,12,12,8 before.  Was that in error?  Please explain.


Segher


Re: [PATCH] V5, #1 of 15: Support 34-bit offsets for prefixed instructions

2019-10-10 Thread Segher Boessenkool
On Wed, Oct 09, 2019 at 03:52:54PM -0400, Michael Meissner wrote:
> This patch adds support in the various functions that check memory offsets for
> the 34-bit offset with prefixed instructions on the 'future' machine.

> 2019-10-08  Michael Meissner  
> 
>   * config/rs6000/rs6000.c (quad_address_p): Add check for prefixed
>   addresses.
>   (mem_operand_gpr): Add check for prefixed addresses.
>   (mem_operand_ds_form): Add check for prefixed addresses.
>   (rs6000_legitimate_offset_address_p): If we support prefixed
>   addresses, check for a 34-bit offset instead of 16-bit.
>   (rs6000_legitimate_address_p): Add check for prefixed addresses.
>   Do not allow load/store with update if the address is prefixed.
>   (rs6000_mode_dependent_address):  If we support prefixed
>   addresses, check for a 34-bit offset instead of 16-bit.

s/  If/ If/

Okay for trunk.  Thanks!


Segher


Re: [PATCH], V4, patch #9 [part of patch #4.2], Add prefixed address offset checks

2019-10-10 Thread Segher Boessenkool
On Wed, Oct 09, 2019 at 07:40:23PM -0400, Michael Meissner wrote:
> On Wed, Oct 09, 2019 at 04:56:48PM -0500, Segher Boessenkool wrote:
> > On Fri, Oct 04, 2019 at 08:29:11AM -0400, Michael Meissner wrote:
> > > @@ -8651,6 +8675,11 @@ rs6000_legitimate_address_p (machine_mod
> > >&& mode_supports_pre_incdec_p (mode)
> > >&& legitimate_indirect_address_p (XEXP (x, 0), reg_ok_strict))
> > >  return 1;
> > > +
> > > +  /* Handle prefixed addresses (PC-relative or 34-bit offset).  */
> > > +  if (address_is_prefixed (x, mode, NON_PREFIXED_DEFAULT))
> > > +return 1;
> > 
> > Is this correct?  Are addresses with a larger offset always legitimate?
> > I don't see why that would be the case.

> The function address_to_insn_form, which is called by address_is_prefixed,
> checks if the offset is 34-bits or less

Ah, right.

And "address_is_prefixed" is a long enough name, 
"address_is_a_valid_prefixed_address"
isn't better ;-)


Segher


Re: [PATCH], V4, patch #9 [part of patch #4.2], Add prefixed address offset checks

2019-10-09 Thread Segher Boessenkool
Hi!

On Fri, Oct 04, 2019 at 08:29:11AM -0400, Michael Meissner wrote:
> @@ -8651,6 +8675,11 @@ rs6000_legitimate_address_p (machine_mod
>&& mode_supports_pre_incdec_p (mode)
>&& legitimate_indirect_address_p (XEXP (x, 0), reg_ok_strict))
>  return 1;
> +
> +  /* Handle prefixed addresses (PC-relative or 34-bit offset).  */
> +  if (address_is_prefixed (x, mode, NON_PREFIXED_DEFAULT))
> +return 1;

Is this correct?  Are addresses with a larger offset always legitimate?
I don't see why that would be the case.

The rest of the patch looks good, thanks.


Segher


Re: [PATCH, rs6000] Lower vec_promote_demote vectorization cost for P8/P9

2019-10-09 Thread Segher Boessenkool
Hi Kewen,

On Wed, Oct 09, 2019 at 02:43:02PM +0800, Kewen.Lin wrote:
> This patch is to lower vec_promote_demote vectorization cost in
> rs6000_builtin_vectorization_cost.  It's similar to what we committed
> for vec_perm, the current cost for vec_promote_demote is also
> overpriced for Power8 and Power9 since Power8 and Power9 has
> supported more units for permute/unpack/pack rather than single one
> on Power7.
> 
> The performance evaluation on SPEC2017 Power9 shows +2.88% gain on
> 525.x264_r, degraded -1.70% on 526.blender_r but which has been
> identified as just exposing some other issues and actually unrelated,
> while SPEC2017 Power8 evaluation shows +4.63% gain on 525.x264_r 
> without any significant degradations, SPEC2006 Power8 evaluation 
> shows 1.99% gain on 453.povray.  The geomean gain for SPEC2017
> on both Power8 and Power9 is +0.06%, and it's unchanged for SPEC2006
> Power8.

Small steps :-)

The patch is okay for trunk.  Thank you!


Segher


>   * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Lower
>   vec_promote_demote cost to 1 for non-Power7 VSX architectures.


Re: Type representation in CTF and DWARF

2019-10-09 Thread Segher Boessenkool
On Tue, Oct 08, 2019 at 10:26:13PM -0700, Indu Bhagat wrote:
> The justification for CTF is and will remain - a compact, faster debug 
> format
> for type information and support some online debugging use-cases (like
> backtraces) in future.

Approximate backtraces, sure.  (It cannot know if another frame has been
stacked by the current function already, or not).


Segher


Re: [PATCH 2/2][MSP430] Optimize zero_extend insns and PSImode pointer manipulation

2019-10-09 Thread Segher Boessenkool
Hi Jozef,

On Tue, Oct 08, 2019 at 11:39:57AM +0100, Jozef Lawrynowicz wrote:
> +;; 
> +;; ZERO EXTEND INSTRUCTIONS
> +;; Byte-writes to registers clear bits 19:8
> +;;   * Byte-writes to memory do not affect bits 15:8
> +;; Word-writes to registers clear bits 19:16
> +;; PSImode writes to memory clear bits 16:4 of the second memory word

Should that be 15:4 instead?

> +; FIXME many of these should be unnnecessary once combine deals with
> +; (sign_extend (zero_extend)) or (sign_extend (subreg)) BZ 91865.

I have looked at it some time ago, and it is quite hard :-/  It's not
forgotten though!


Segher


Re: [PATCHv2] Change the library search path when using --with-advance-toolchain

2019-10-08 Thread Segher Boessenkool
On Fri, Oct 04, 2019 at 06:31:34PM -0300, Tulio Magno Quites Machado Filho 
wrote:
> Remove all -L directories from LINK_OS_EXTRA_SPEC32 and
> LINK_OS_EXTRA_SPEC64 so that user directories specified at
> build time have higher preference over the advance toolchain libraries.
> 
> Set MD_STARTFILE_PREFIX to $prefix/lib/ and MD_STARTFILE_PREFIX_1 to
> $at/lib/ so that a compiler library has preference over the Advance
> Toolchain libraries.
> 
> 2019-10-04  Tulio Magno Quites Machado Filho  
> 
>   * config.gcc: Move -L usage from LINK_OS_EXTRA_SPEC32 and
>   LINK_OS_EXTRA_SPEC64 to MD_STARTFILE_PREFIX and
>   MD_STARTFILE_PREFIX_1 when using --with-advance-toolchain.

Committed ( https://gcc.gnu.org/r276702 ).  Thanks!


Segher


Re: [PATCH] Fix -Wshadow=local warnings in passes.c

2019-10-07 Thread Segher Boessenkool
On Mon, Oct 07, 2019 at 05:11:27PM +, Bernd Edlinger wrote:
> On 10/7/19 9:20 AM, Eric Botcazou wrote:
> > No, please, the cure would be much worse than the disease.
> 
> Ack.
> 
> I think the least worse thing would be a pragma in the macro where the 
> shadowing
> variable is declared...

The best thing would be to write this in a way that nothing *can*
accidentally shadow something.


Segher


Re: [PATCH, OBVIOUS] Fix -Wshadow=local warnings in gcc/[a-c]*.c

2019-10-07 Thread Segher Boessenkool
On Mon, Oct 07, 2019 at 08:56:44AM +0200, Richard Biener wrote:
> But I agree that mechanically "fixing" the current code-base, while ending up
> with no new introductions of local shadowing, is worse if it makes the current
> code-base worse.

Obfucating the names is not often a good fix for the "this code is too big
and complex" problem :-(

Good fixes for this cannot be done mechanically.

> This probably means fixing the header file issues first though.

Yeah, and those are some of the nastier issues anyway.  We need to try to
get rid of some more big macros.  Not the most exciting work, but useful.


Segher


Re: [PATCH, OBVIOUS] Fix -Wshadow=local warnings in gcc/[a-c]*.c

2019-10-05 Thread Segher Boessenkool
Hi Bernd,

On Sat, Oct 05, 2019 at 09:12:19AM +, Bernd Edlinger wrote:
> On 10/5/19 10:08 AM, Segher Boessenkool wrote:
> > I'll just review the combine part.
> > 
> > On Sat, Oct 05, 2019 at 06:36:34AM +, Bernd Edlinger wrote:
> >> --- gcc/combine.c  (revision 276598)
> >> +++ gcc/combine.c  (working copy)
> >> @@ -1219,7 +1219,7 @@ combine_instructions (rtx_insn *f, unsigned int nr
> >>FOR_BB_INSNS (this_basic_block, insn)
> >>  if (INSN_P (insn) && BLOCK_FOR_INSN (insn))
> >>  {
> >> -rtx links;
> >> +rtx link;
> > 
> > Where/what does this conflict with?  Well, nothing; I meant "shadow", it
> > shadows some other name.  This whole (nested) loop should just be factored
> > out.
> > 
> > It's not a good name either way: it is *not* what we call "links" in
> > combine, it is a pointer to a register note instead.  So name it
> > something like "note"?
> > 
> > Renaming code without considering its semantics is called "obfuscation".
> 
> All I can consider is if the variable called links is of the same type,
> and if it is live or dead at this point.
> I have no idea how a good name should be, unfortunately.
> I'd be happy with using note, obviously.

You can change *this* code to non-sensical or at least worse names, or you
can fix the actual problems (and the latter cannot be done mechanically).

The former is arguably wrong, and because of that, certainly not an
obvious fix.  That is true about everything here, not just combine.

> >> @@ -1542,10 +1542,10 @@ retry:
> >>reg_stat.release ();
> >>  
> >>{
> >> -struct undo *undo, *next;
> >> -for (undo = undobuf.frees; undo; undo = next)
> >> +struct undo *undo, *next1;
> >> +for (undo = undobuf.frees; undo; undo = next1)
> > 
> > No.  I object.
> > 
> > If there is anything called "next" in a bigger scope as well, and for some
> > reason we feel that this is bad, then *that* one should change, and change
> > to a name that is meaningful in all of its large scope.
> 
> I would take your suggestion, if any, but I would be happy
> if you want to fix that, it will obviously way better when
> you do it, and we are of course not in a hurry.

You need to factor this code (not "refactor", that is something else),
or change the name in the *outer* scope.  Or, you can say this isn't a
big problem at all, also a reasonable stance if you ask me, so no "fix"
is needed.

> The disadvantage of shadowing, is when you copy code from one block to an
> outer block and you use the same name,

Stop right there.  Your problem is your code has too big blocks, so *that*
is what needs fixing there.

> >> -int i = XVECLEN (i3pat, 0) - 1;
> >> +int j = XVECLEN (i3pat, 0) - 1;
> > 
> > Why?  No.  No no no.
> 
> this is in this loop:
> 
>   for (i = 0; i < XVECLEN (PATTERN (insn), 0); i++)
> {
> 
> so using i in the inner block is not okay, because a human
> reader may thing that i of the outer loop is assigned a new
> value here.

Using "i" in the inner loop is just dandy.  If it then conflicts with some
name in some larger scope, you need to fix *that*: change *that* name, or
don't have that scope accessible at all (i.e., factor your code).

> >> @@ -2546,8 +2546,6 @@ update_cfg_for_uncondjump (rtx_insn *insn)
> >>delete_insn (insn);
> >>if (EDGE_COUNT (bb->succs) == 1)
> >>  {
> >> -  rtx_insn *insn;
> >> -
> >>single_succ_edge (bb)->flags |= EDGE_FALLTHRU;
> >>  
> >>/* Remove barriers from the footer if there are any.  */
> > 
> > This is a separate change.  Although, if this would be an unused variable
> > someone would have noticed by now.  So what is this about?
> > 
> > Are you suggesting abusing a variable in a larger scope for some local
> > use?  That doesn't fly.  Rename the var in the larger scope, instead.  To
> > a *useful* *descriptive* name.
> 
> We use scratch values, like 
>   rtx_insn *insn;
> everywhere, so i did not see
> what is wrong with just using it.

The outer "insn" is a function argument here!  Reusing that var for
something else is a real problem, much worse than having the same name
so someone might just *think* you do this.

> Obviously I can rename this insn to (insn1 ?) if you want that.
> Please say which name you want this insn to have.

I want it to be called "insn".  "insn" is a good name for 

Re: [PATCHv2] Change the library search path when using --with-advance-toolchain

2019-10-05 Thread Segher Boessenkool
On Fri, Oct 04, 2019 at 06:31:34PM -0300, Tulio Magno Quites Machado Filho 
wrote:
> Remove all -L directories from LINK_OS_EXTRA_SPEC32 and
> LINK_OS_EXTRA_SPEC64 so that user directories specified at
> build time have higher preference over the advance toolchain libraries.
> 
> Set MD_STARTFILE_PREFIX to $prefix/lib/ and MD_STARTFILE_PREFIX_1 to
> $at/lib/ so that a compiler library has preference over the Advance
> Toolchain libraries.

This is fine, approved for all branches.  Thank you!  And thanks to Mike
for the testing.


Segher


> 2019-10-04  Tulio Magno Quites Machado Filho  
> 
>   * config.gcc: Move -L usage from LINK_OS_EXTRA_SPEC32 and
>   LINK_OS_EXTRA_SPEC64 to MD_STARTFILE_PREFIX and
>   MD_STARTFILE_PREFIX_1 when using --with-advance-toolchain.


Re: [PATCH] Fix -Wshadow=local warnings in elfos.h

2019-10-05 Thread Segher Boessenkool
On Sat, Oct 05, 2019 at 08:28:26AM +, Bernd Edlinger wrote:
> On 10/5/19 9:26 AM, Segher Boessenkool wrote:
> > On Fri, Oct 04, 2019 at 02:26:26PM -0600, Jeff Law wrote:
> > To get 95% of the benefit for only 2% of the pain, you can make an inline
> > function where there was a macro before, and define that same macro to
> > just call the function.

> Hmm, I tried this but it does not work:

[ snip ]

> In file included from ./tm.h:30:0,
>  from ../../gcc-trunk/gcc/gencheck.c:23:
> ../../gcc-trunk/gcc/config/elfos.h: In function ‘void 
> elfos_asm_declare_object_name(FILE*, const char*, tree)’:
> ../../gcc-trunk/gcc/config/elfos.h:324:51: error: ‘DECL_ONE_ONLY’ was not 
> declared in this scope

So not everything including elfos.h (via tm.h perhaps) includes tree.h
first, but this macro / function requires that, and as function that is
a compilation error.

So maybe this macro / function should move to some header that isn't
included everywhere, if it accesses stuff that isn't defined in all
places?

Or maybe it is best to make this a real hook, move the implementation to
a .c that actually includes the headers it needs ;-)

(elfos.h is kind of special, it is included via tm.h, this is set up in
config.gcc).

> Now I don't know how to fix that easily.

Yeah me neither.  Pity.


Segher


Re: [PATCH, OBVIOUS] Fix -Wshadow=local warnings in gcc/[a-c]*.c

2019-10-05 Thread Segher Boessenkool
Hi Bernd,

I'll just review the combine part.

On Sat, Oct 05, 2019 at 06:36:34AM +, Bernd Edlinger wrote:
> --- gcc/combine.c (revision 276598)
> +++ gcc/combine.c (working copy)
> @@ -1219,7 +1219,7 @@ combine_instructions (rtx_insn *f, unsigned int nr
>FOR_BB_INSNS (this_basic_block, insn)
>  if (INSN_P (insn) && BLOCK_FOR_INSN (insn))
> {
> -rtx links;
> +rtx link;

Where/what does this conflict with?  Well, nothing; I meant "shadow", it
shadows some other name.  This whole (nested) loop should just be factored
out.

It's not a good name either way: it is *not* what we call "links" in
combine, it is a pointer to a register note instead.  So name it
something like "note"?

Renaming code without considering its semantics is called "obfuscation".


> @@ -1542,10 +1542,10 @@ retry:
>reg_stat.release ();
>  
>{
> -struct undo *undo, *next;
> -for (undo = undobuf.frees; undo; undo = next)
> +struct undo *undo, *next1;
> +for (undo = undobuf.frees; undo; undo = next1)

No.  I object.

If there is anything called "next" in a bigger scope as well, and for some
reason we feel that this is bad, then *that* one should change, and change
to a name that is meaningful in all of its large scope.

Here, that outer block (around all of seven lines of code) is there just
to make a new scope, so that we *can* have a new local variable.  It is
local.  There is no "shadowing".  It has the same name as some other
variables, but so what?

> -   int i = XVECLEN (i3pat, 0) - 1;
> +   int j = XVECLEN (i3pat, 0) - 1;

Why?  No.  No no no.

> @@ -2546,8 +2546,6 @@ update_cfg_for_uncondjump (rtx_insn *insn)
>delete_insn (insn);
>if (EDGE_COUNT (bb->succs) == 1)
>  {
> -  rtx_insn *insn;
> -
>single_succ_edge (bb)->flags |= EDGE_FALLTHRU;
>  
>/* Remove barriers from the footer if there are any.  */

This is a separate change.  Although, if this would be an unused variable
someone would have noticed by now.  So what is this about?

Are you suggesting abusing a variable in a larger scope for some local
use?  That doesn't fly.  Rename the var in the larger scope, instead.  To
a *useful* *descriptive* name.

> @@ -2714,7 +2712,6 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
>int maxreg;
>rtx_insn *temp_insn;
>rtx temp_expr;
> -  struct insn_link *link;

Please don't mix moving scope (like this) with deletions (like the
previous), and all the renamings, and some re-indenting.

There is no description for this patch at all either, so it takes a long
time to read it.

> -   HOST_WIDE_INT pos = INTVAL (XEXP (SET_DEST (x), 2));
> -   unsigned HOST_WIDE_INT len = INTVAL (XEXP (SET_DEST (x), 1));
> +   HOST_WIDE_INT pos1 = INTVAL (XEXP (SET_DEST (x), 2));
> +   unsigned HOST_WIDE_INT len1 = INTVAL (XEXP (SET_DEST (x), 1));

No, and no.  No "1" please.  If the shadowing is an actual problem,
*improve* the code (like factor it a bit perhaps, much in combine is way
too big functions), instead of making it much more terrible (if I see a
"len1" I go look for a "len" or "len0" or "len2", and such are not to be
found here; the var should just be called "len").


So.  You say that shadowing is a problem.  A problem for what?  Most of
the time it is completely harmless.  If you write in non-ancient style
(so write shortish functions and blocks, and declare every var at its
first use) you don't see actual shadowing problems much at all.


In most cases here the warning indicates that the code it too big (and
too old), it could use a bit of factoring and rewriting.

But all this patch does is shut up the warnings, without fixing the causes.
This is not an improvement.  Hiding problems is bad.


Segher


Re: [PATCH] Fix -Wshadow=local warnings in elfos.h

2019-10-05 Thread Segher Boessenkool
On Fri, Oct 04, 2019 at 02:26:26PM -0600, Jeff Law wrote:
> Same objections as before.  As long as we're using macros like this,
> we're going to have increased potential for shadowing problems and
> macros which touch implementation details that just happen to be
> available in the context where the macro is used.
> 
> Convert to real functions.  It avoids the shadowing problem and avoids
> macros touching/referencing things they shouldn't.  Code in macros may
> have been reasonable in the 80s/90s, but we should know better by now.
> 
> I'm not ranting against you Bernd, it's more a rant against the original
> coding style for GCC.  Your changes just highlight how bad of an idea
> this kind of macro usage really is.  We should take the opportunity to
> fix this stuff for real.

To get 95% of the benefit for only 2% of the pain, you can make an inline
function where there was a macro before, and define that same macro to
just call the function.

Like

#define DEFAULT_SOME_MACRO(PARMS) { lots of code }

becomes

#define DEFAULT_SOME_MACRO(PARMS) default_some_macro(PARMS)

static inline int
default_some_macro (int parm, long another)
{
  lots of code;
}

The point is that all this is completely *local* changes.


Segher


Re: [PATCH] Fix -Wshadow=local warnings in rtl.h

2019-10-04 Thread Segher Boessenkool
On Thu, Oct 03, 2019 at 05:25:55PM +0200, Jakub Jelinek wrote:
> On Thu, Oct 03, 2019 at 03:17:47PM +, Bernd Edlinger wrote:
> > this fixes -Wshadow=local warnings in the RTL_FLAG_CHECKx macros,
> > which happen when this macro is used recursively in a macro
> > argument.  The __typeof (RTX) const _rtx in the inner macro
> > expansions shadows the outer macro expansions.
> > 
> > So reworked the macro to not use statement expressions but
> > use templates instead.  Since the 7-argument overload is not
> > used anywhere removed RTL_FLAG_CHECK7 for now.
> 
> What effect does this have on the cc1/cc1plus .text sizes?
> Does this affect debuggability of --enable-checking=yes,rtl compilers?
> I mean, often when we replace some macros with inlines step in GDB
> becomes a bigger nightmare, having to go through tons of inline frames.
> gdbinit.in has a lengthy list of inlines to skip in rtl.h, shouldn't this be
> added to that list?  Not 100% sure how well it will work on rtl checking
> vs. non-rtl checking builds.

Also, how much slower does it make RTL checking builds?


Segher


Re: Fix ALL_REGS thinko in initialisation of function_used_regs

2019-10-03 Thread Segher Boessenkool
On Thu, Oct 03, 2019 at 04:26:55PM +0100, Richard Earnshaw (lists) wrote:
> Well if that's how tightly ALL_REGS is defined, then really no back-end 
> should be defining it at all.  Instead the build system (or the run-time 
> initialization code) should be constructing it on the fly.  Similarly 
> for NO_REGS.

It's all very old.  Every target should define NO_REGS and ALL_REGS as
enumerators.  All of this used to be completely manual (and still
largely is).  (I'm not sure how we could do much better, except for the
trivial cases, and who cares for that?)


Like this from the first revision of rs6000.h, from 1992.  It said:

+/* Define the classes of registers for register constraints in the
+   machine description.  Also define ranges of constants.
+
+   One of the classes must always be named ALL_REGS and include all hard regs.
+   If there is more than one class, another class must be named NO_REGS
+   and contain no registers.
+
+   The name GENERAL_REGS must be the name of a class (or an alias for
+   another name such as ALL_REGS).  This is the class of registers
+   that is allowed by "g" or "r" in a register constraint.
+   Also, registers outside this class are allocated only when
+   instructions express preferences for them.
+
+   The classes must be numbered in nondecreasing order; that is,
+   a larger-numbered class must never be contained completely
+   in a smaller-numbered class.
+
+   For any two classes, it is very desirable that there be another
+   class that represents their union.  */

and

+/* The RS/6000 has three types of registers, fixed-point, floating-point,
+   and condition registers, plus three special registers, MQ, CTR, and the
+   link register.
+
+   However, r0 is special in that it cannot be used as a base register.
+   So make a class for registers valid as base registers.
+
+   Also, cr0 is the only condition code register that can be used in
+   arithmetic insns, so make a separate class for it. */
+
+enum reg_class { NO_REGS, BASE_REGS, GENERAL_REGS, FLOAT_REGS,
+  NON_SPECIAL_REGS, MQ_REGS, LINK_REGS, CTR_REGS, LINK_OR_CTR_REGS,
+  SPECIAL_REGS, CR0_REGS, CR_REGS, ALL_REGS, LIM_REG_CLASSES };

and

+/* Define which registers fit in which classes.
+   This is an initializer for a vector of HARD_REG_SET
+   of length N_REG_CLASSES.  */
+
+#define REG_CLASS_CONTENTS \
+  { {0, 0, 0}, {0xfffe, 0, 8}, {~0, 0, 8}, \
+{0, ~0, 0}, {~0, ~0, 0}, {0, 0, 1}, {0, 0, 2}, \
+{0, 0, 4}, {0, 0, 6}, {0, 0, 7}, {0, 0, 16},   \
+{0, 0, 0xff0}, {~0, ~0, 0xfff5} }


Yes, I'm sure we could make a somewhat nicer interface now.  As you see
we *do* have a little bit nicer things than this, already.  But it is so
fundamental that changing anything means changing quite some passes, and
changing all backends, and none of this is trivial.


Segher


Re: Fix ALL_REGS thinko in initialisation of function_used_regs

2019-10-03 Thread Segher Boessenkool
On Thu, Oct 03, 2019 at 09:26:45AM -0400, Paul Koning wrote:
> > On Oct 3, 2019, at 9:12 AM, Richard Earnshaw (lists) 
> >  wrote:
> > 
> > On 03/10/2019 10:48, Segher Boessenkool wrote:
> >>> ...
> >> But ALL_REGS should contain *all* registers.  The union of any two register
> >> classes has to be contained in some register class, so every register class
> >> has to be contained in ALL_REGS.
> > 
> > Why would anyone want a to form a union of one class with a class that 
> > can't be used for allocation.  That's just silly.

Precomputations over all register classes, for example.

> Yes, if that's the only purpose of register classes.
> 
> If the documentation doesn't match what the real intent is, and some 
> developers apparently use the intent while others use the documentation.  The 
> result is bugs like the one we're talking about here.  
> 
> It may well make sense to say that ALL_REGS should be the set of all 
> *allocatable* registers.  If that's what we want, the documentation should 
> say so, and then the code has to understand that some registers (like CC_REG) 
> may be outside ALL_REGS.  Similarly, NO_REGS is currently documented as "no 
> registers" (empty set).  If it is intended that it might contain 
> non-allocatable registers, again the documentation should say so.

I doubt that works.

And ALL_REGS as *all* regs works.  This is what is documented, and barring
any other evidence, also what was intended.  Some other things might also
work, but even if the code works it can still be buggy.


Segher


Re: [PATCH target/86811] Mark VAX as not needing speculation barriers

2019-10-03 Thread Segher Boessenkool
On Thu, Oct 03, 2019 at 12:16:46PM +0100, Maciej W. Rozycki wrote:
> On Thu, 3 Oct 2019, Segher Boessenkool wrote:
> > When a (conditional) branch executes, it is detected if the branch was
> > mispredicted, and if so it (and everything after it) is flushed.  This is
> > similar to how it was done on the PowerPC 6xx CPUs, or on the original
> > Pentium, etc.
> 
>  OK, I stand corrected.  For the sake of my enlightenment, do you 
> specifically refer to this statement:
> 
> "If the I-box predicts incorrectly, the E-box invokes a trap mechanism to 
> drain the pipeline and restart the I-box at the alternate PC."
> 
> (page 5, section "The I-box") or something else?

Yes, that is it.


Segher


Re: Fix ALL_REGS thinko in initialisation of function_used_regs

2019-10-03 Thread Segher Boessenkool
On Thu, Oct 03, 2019 at 10:52:38AM +0100, Richard Sandiford wrote:
>   /* Certain fixed registers might be of the class NO_REGS.  This means
>that not only can they not be allocated by the compiler, but
>they cannot be used in substitutions or canonicalizations
>either.  */
> 
> (from cse.c).  This isn't actually documented AFAICT.  But it is
> useful for registers that are only allowed to appear in special
> instructions that the port generates itself.  (Or at least it was useful.
> operand_reg_set was supposed to be a more flexible way of doing this.)

You can also put such regs in ALL_REGS but in no other register class.
And it works fine if you do not handle non-allocatable registers at all
in REGNO_REG_CLASS, btw., but it should really be ALL_REGS then.

> There's also the problem that if you don't actually need a register
> to be in a class for its own sake, but only because the documentation
> says so, then it's less clear how individual registers should be
> grouped into classes.  E.g. is having separate SFP_REG and AFP_REG
> classes right (as above), or should they really be members of a
> single class?  In practice it doesn't matter because nothing ever
> uses these classes [ ... ]

Yeah exactly.  Register classes only matter for allocatable registers.

Thanks for the exposition,


Segher


Re: [PATCH target/86811] Mark VAX as not needing speculation barriers

2019-10-03 Thread Segher Boessenkool
On Thu, Oct 03, 2019 at 03:12:34AM +0100, Maciej W. Rozycki wrote:
> On Fri, 20 Sep 2019, Jeff Law wrote:
> 
> > > According to Bob Supnik (who is a very authoritative source on VAX),
> > > 
> > >> Funny you should ask. The short answer is no. No VAX ever did
> > >> speculative or out of order execution.
> > > 
> > > As such, marking VAX as not needing speculation barriers.
> > > 
> > > 
> > >   PR target/86811
> > >   * config/vax/vax.c (TARGET_HAVE_SPECULATION_SAFE_VALUE):
> > >   Define to speculation_safe_value_not_needed.
> > Installed on the trunk.
> 
>  I don't think this is right.  As I have just mentioned in a related 
> discussion elsewhere, the NVAX and NVAX+ implementations include a branch 
> predictor in their microarchitecture[1], so obviously they do execute 
> speculatively.  I think this change would best be reverted and the issue 
> further investigated.
> 
> References:
> 
> [1] G. Michael Uhler et al, "The NVAX and NVAX+ High-performance VAX
> Microprocessors", Digital Technical Journal Vol. 4 No. 3 Summer 1992
> 
> 

As that article explains, the NVAX does *not* execute speculatively.  You
can have branch prediction without speculative execution just fine.  It
is beneficial, too, because this hides the fetch latency when a branch is
predicted correctly.

When a (conditional) branch executes, it is detected if the branch was
mispredicted, and if so it (and everything after it) is flushed.  This is
similar to how it was done on the PowerPC 6xx CPUs, or on the original
Pentium, etc.


Segher


Re: Fix ALL_REGS thinko in initialisation of function_used_regs

2019-10-03 Thread Segher Boessenkool
On Thu, Oct 03, 2019 at 10:35:15AM +0100, Richard Earnshaw (lists) wrote:
> On 03/10/2019 00:01, Segher Boessenkool wrote:
> >The documentation says
> >
> >@findex ALL_REGS
> >@findex NO_REGS
> >In general, each register will belong to several classes.  In fact, one
> >class must be named @code{ALL_REGS} and contain all the registers.  Another
> >class must be named @code{NO_REGS} and contain no registers.  Often the
> >union of two classes will be another class; however, this is not required.
> >
> >so is the arm port wrong, or is the documentation wrong?  I think the arm
> >port simply forgets to include CC_REG, VFPCC_REG, SFP_REG, AFP_REG:
> >
> >#define REG_CLASS_CONTENTS  \
> >...
> >   { 0x, 0x, 0x, 0x0010 }, /* CC_REG */  \
> >   { 0x, 0x, 0x, 0x0020 }, /* VFPCC_REG */   \
> >   { 0x, 0x, 0x, 0x0040 }, /* SFP_REG */ \
> >   { 0x, 0x, 0x, 0x0080 }, /* AFP_REG */ \
> >   { 0x7FFF, 0x, 0x, 0x000F }  /* ALL_REGS */\
> >
> >and that last number should be 0x00ff instead?
> 
> Well CC_REG is completely fixed and can't be used for anything other 
> than condition flags.  The register allocator should never be 
> considering it for allocation.  So at that level, it really shouldn't 
> matter and adding it would just make more useless work for the register 
> allocator.
> 
> The omission of SFP_REG and AFP_REG is probably also deliberate.  These 
> are essentially special pseudo regs that MUST be eliminated during frame 
> layout.  They should certainly never be considered for general allocation.

But ALL_REGS should contain *all* registers.  The union of any two register
classes has to be contained in some register class, so every register class
has to be contained in ALL_REGS.


Segher


Re: Fix ALL_REGS thinko in initialisation of function_used_regs

2019-10-02 Thread Segher Boessenkool
On Wed, Oct 02, 2019 at 10:22:22PM +0100, Richard Sandiford wrote:
> My change to the -fipa-ra bookkeeping used ALL_REGS as the supposedly
> safe default assumption, but ALL_REGS isn't literally all registers,
> just a close approximation.
> 
> This caused a bootstrap failure on arm-linux-gnu, where the condition
> code register isn't in ALL_REGS and so was being masked out of some
> call-clobbered sets.

The documentation says

@findex ALL_REGS
@findex NO_REGS
In general, each register will belong to several classes.  In fact, one
class must be named @code{ALL_REGS} and contain all the registers.  Another
class must be named @code{NO_REGS} and contain no registers.  Often the
union of two classes will be another class; however, this is not required.

so is the arm port wrong, or is the documentation wrong?  I think the arm
port simply forgets to include CC_REG, VFPCC_REG, SFP_REG, AFP_REG:

#define REG_CLASS_CONTENTS  \
...
  { 0x, 0x, 0x, 0x0010 }, /* CC_REG */  \
  { 0x, 0x, 0x, 0x0020 }, /* VFPCC_REG */   \
  { 0x, 0x, 0x, 0x0040 }, /* SFP_REG */ \
  { 0x, 0x, 0x, 0x0080 }, /* AFP_REG */ \
  { 0x7FFF, 0x, 0x, 0x000F }  /* ALL_REGS */\

and that last number should be 0x00ff instead?



Segher


Re: [PATCH], V4, patch #4.1: Enable prefixed/pc-rel addressing (revised)

2019-10-02 Thread Segher Boessenkool
On Wed, Oct 02, 2019 at 03:04:35PM -0400, Michael Meissner wrote:
> On Tue, Oct 01, 2019 at 06:56:01PM -0500, Segher Boessenkool wrote:
> > On Mon, Sep 30, 2019 at 10:12:54AM -0400, Michael Meissner wrote:
> > > I needed
> > > to add a second memory constraint ('eM') that prevents using a prefixed
> > > address.
> > 
> > Do we need both em and eM?  Do we ever want to allow prefixed insns but not
> > pcrel?  Or, alternatively, this only uses eM for some insns where the "p"
> > prefix trick won't work (because there are multiple insns in the template);
> > we could solve that some other way (by inserting the p's manually for
> > example).
> 
> No right now we need one (no prefix) and the other (no pcrel) is desirable, 
> but
> if we can only have one, it will mean extra instructions being generated.

We can have both if we need to, but it should have less confusing names at
a minimum.

> In the case of no prefix, we need this for stack_protect_testdi and
> stack_protect_setdi.  There if we have a large stack frame and enable stack
> protection, we don't want the register allocator from re-combining the insns 
> to
> an insn with a large offset, which it will do, even though the predicate does
> not allow this case.  This was discovered when we tried to build glibc, and 
> one
> module (vfwprintf-internal.c) has a large stack frame and is built with
> -fstack-protector-all.

Yes, but why does it not allow prefixed insns anyway?  It does not currently
*handle* prefixed insns properly, but that can be fixed.  It won't be
pretty, but it won't be horrible either.

Anyway, we need to be able to handle non-prefixed anyway (for asm, as I
mentioned later), so yes we want a constraint for that, and have "m" in
inline asm mean that (just like right now it actually means "m but not
update form").

> In the case of no pc-rel, this occurs in optimizing vector extracts where the
> vector is in memory.  In this code, we only have one temporary base register.

Why?

> In the case where the address is PC-relative, and the element number being
> extracted is variable, we would need two temporary base registers, one to hold
> the PC-relative address, and the other to hold the offset from the start of 
> the
> vector.  So here, we disallow PC-relative addresses, but numeric addresses 
> that
> result in a prefixed instruction are fine, since the code calculates the
> offset, adds in the offset, and then does the memory operation.

So it should have more scratch registers here?

Or, alternatively, we can just disallow all prefixed addressing here?  Will
that really degrade anything?

> If we only had a no prefixed constraint, the code would not combine the vector
> extract from memory, and instead, it would load the whole vector into a
> register, and then do the appropriate shifts, VSLO, etc. to extract the
> element.  I imagine the case shows up when you have large stack frames (or 
> very
> large structures).

I don't understand.

> > But what should inline asm users do wrt prefixed/pcrel memory?  Should
> > they not use prefixed memory at all?  That means for asm we should always
> > disallow prefixed for "m".
> 
> Yes, I've been thinking that for some time.  But I'm not going to worry about
> that until the patches are in.

Please do worry about it.

> > > 4) In the previous patch, I missed setting the prefixed size and non 
> > > prefixed
> > > size for mov_ppc64 in the insn.  This pattern is used for moving 
> > > PTImode
> > > in GPR registers (on non-VSX systems, it would move TImode also).  By the 
> > > time
> > > it gets to final, it will have been split, but it is still useful to get 
> > > the
> > > sizes correct before the mode is split.
> > 
> > So that is a separate patch?  Please send it as one, then?
> 
> No, it needs to be part of this patch.  It was just missing from the patch I
> sent out.

This patch does a whole lot of separate things.  It needs to be split up,
it took ages to review it like this.

> > > +  /* The LWA instruction uses the DS-form format where the bottom two 
> > > bits of
> > > + the offset must be 0.  The prefixed PLWA does not have this
> > > + restriction.  */
> > > +  if (TARGET_PREFIXED_ADDR
> > > +  && address_is_prefixed (addr, DImode, NON_PREFIXED_DS))
> > > +return true;
> > 
> > Should TARGET_PREFIXED_ADDR be part of address_is_prefixed, instead of
> > part of all its callers?
> 
> Just trying to do the test before the call, so the default case (not 'future')
> won't have the call/return.

Don't micro-optimise like this please, the compiler

Re: [PATCH 4/4] Modifications to the testsuite

2019-10-02 Thread Segher Boessenkool
On Wed, Oct 02, 2019 at 09:29:22PM +0200, Andreas Schwab wrote:
> FAIL: gcc.dg/ipa/ipa-sra-19.c (test for excess errors)
> Excess errors:
> /daten/gcc/gcc-20191001/gcc/testsuite/gcc.dg/ipa/ipa-sra-19.c:19:3: error: 
> AltiVec argument passed to unprototyped function
> /daten/gcc/gcc-20191001/gcc/testsuite/gcc.dg/ipa/ipa-sra-19.c:17:12: warning: 
> GCC vector returned by reference: non-standard ABI extension with no 
> compatibility guarantee [-Wpsabi]

Yeah.  Many tests add -Wno-psabi to shut these up; is that good here?


Segher


Re: syncing the GCC vax port, atomic issue

2019-10-02 Thread Segher Boessenkool
On Wed, Oct 02, 2019 at 10:39:23AM +0100, Richard Earnshaw (lists) wrote:
> There are some target hooks in combine that might help here. 
> targetm.cannot_copy_insn_p and targetm.legitimate_combined_insn.  The 
> former is used more widely than just combine, so you might need to be 
> careful that it doesn't adversely affect other optimizations.  The 
> latter relies on spotting that the combined insn is doing something 
> stupid even though it matches a pattern in the back-end.  So it may be 
> that neither really solves your problem in this case.

Well, if what you need to prevent here is two branches from being combined
into one, you need to prevent many more passes than just combine from doing
the same, so this hook won't really help you.

I don't quite see why the generic optimisers make something that ICEs
later here, but yeah UNSPECs can side-step this whole issue.


Segher


Re: RTL alternative selection question

2019-10-01 Thread Segher Boessenkool
On Tue, Oct 01, 2019 at 01:12:06PM +0100, Andrew Stubbs wrote:
> On 23/09/2019 15:39, Andrew Stubbs wrote:
> >On 23/09/2019 15:15, Segher Boessenkool wrote:
> >>On Mon, Sep 23, 2019 at 11:56:27AM +0100, Andrew Stubbs wrote:
> >>>   [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
> >>> (ashift:DI
> >>>   (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
> >>>   (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
> >>>    (clobber (match_scratch:BI 3 
> >>>"=cs,X"))]
> >>
> >>>Unfortunately, the compiler (almost?) exclusively selects the second
> >>>alternative, even when this means moving the values from one register
> >>>file to the other, and then back again.
> >>>
> >>>The problem is that the scalar instruction clobbers the CC register,
> >>>which results in a "reject++" for that alternative in the LRA dump.
> >>
> >>What kind of reject?  It prints a reason, too.
> >
> >  0 Non input pseudo reload: reject++
> 
> Apparently I was confused by operand "0" versus alternative "0". That 
> message did occur, but it wasn't the only one. Here's all of it:
> 
>  0 Non input pseudo reload: reject++
>  3 Scratch win: reject+=2
>alt=0,overall=9,losers=1,rld_nregs=2
>alt=1,overall=6,losers=1,rld_nregs=2
> 
> I don't understand why the "reject++" occurs, but presumably has to do 
> with the "Sg" register availability somehow?

operands[0] (the output) needed a reload, apparently.

[ snip ]

> Unfortunately, removing the "Scratch win" penalty alone is not enough 
> for LRA to select the first alternative -- at least, no in my testcase 
> -- so I need to understand the "non input pseudo reload" issue as well. 
> I can see why it fires for alt0, but not why it does not fire for alt1.

The reload it decided to do there was for operands[1] or operands[2]?


Segher


Re: [PATCH], V4, patch #4.1: Enable prefixed/pc-rel addressing (revised)

2019-10-01 Thread Segher Boessenkool
Hi Mike,

On Mon, Sep 30, 2019 at 10:12:54AM -0400, Michael Meissner wrote:
> I needed
> to add a second memory constraint ('eM') that prevents using a prefixed
> address.

Do we need both em and eM?  Do we ever want to allow prefixed insns but not
pcrel?  Or, alternatively, this only uses eM for some insns where the "p"
prefix trick won't work (because there are multiple insns in the template);
we could solve that some other way (by inserting the p's manually for
example).

But what should inline asm users do wrt prefixed/pcrel memory?  Should
they not use prefixed memory at all?  That means for asm we should always
disallow prefixed for "m".

Having both em and eM names is a bit confusing (which is what?)  The eM
one should not be documented in the user manual, probably.

Maybe just using wY here will work just as well?  That is also not ideal
of course, but we already have that one anyway.

> 4) In the previous patch, I missed setting the prefixed size and non prefixed
> size for mov_ppc64 in the insn.  This pattern is used for moving PTImode
> in GPR registers (on non-VSX systems, it would move TImode also).  By the time
> it gets to final, it will have been split, but it is still useful to get the
> sizes correct before the mode is split.

So that is a separate patch?  Please send it as one, then?

> +  /* The LWA instruction uses the DS-form format where the bottom two bits of
> + the offset must be 0.  The prefixed PLWA does not have this
> + restriction.  */
> +  if (TARGET_PREFIXED_ADDR
> +  && address_is_prefixed (addr, DImode, NON_PREFIXED_DS))
> +return true;

Should TARGET_PREFIXED_ADDR be part of address_is_prefixed, instead of
part of all its callers?

> +;; Return 1 if op is a memory operand that is not prefixed.
> +(define_predicate "non_prefixed_memory"
> +  (match_code "mem")
> +{
> +  if (!memory_operand (op, mode))
> +return false;
> +
> +  enum insn_form iform
> += address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> +
> +  return (iform != INSN_FORM_PREFIXED_NUMERIC
> +  && iform != INSN_FORM_PCREL_LOCAL
> +  && iform != INSN_FORM_BAD);
> +})
> +
> +(define_predicate "non_pcrel_memory"
> +  (match_code "mem")
> +{
> +  if (!memory_operand (op, mode))
> +return false;
> +
> +  enum insn_form iform
> += address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> +
> +  return (iform != INSN_FORM_PCREL_EXTERNAL
> +  && iform != INSN_FORM_PCREL_LOCAL
> +  && iform != INSN_FORM_BAD);
> +})

Why does non_prefixed_memory not check INSN_FORM_PCREL_EXTERNAL?  Why does
non_prefixed_memory not use non_pcrel_memory, instead of open-coding it?

What is INSN_FORM_BAD about, in both functions?

> +;; Return 1 if op is either a register operand or a memory operand that does
> +;; not use a PC-relative address.
> +(define_predicate "reg_or_non_pcrel_memory"
> +  (match_code "reg,subreg,mem")
> +{
> +  if (REG_P (op) || SUBREG_P (op))
> +return register_operand (op, mode);
> +
> +  return non_pcrel_memory (op, mode);
> +})

Why do we need this predicate?  Should it use register_operand like this,
or should it use gpc_reg_operand?

> +  bool pcrel_p = TARGET_PCREL && pcrel_local_address (addr, Pmode);

Similar as above: should TARGET_PCREL be part of pcrel_local_address?

> @@ -6860,6 +6904,12 @@ rs6000_split_vec_extract_var (rtx dest,
>   systems.  */
>if (MEM_P (src))
>  {
> +  /* If this is a PC-relative address, we would need another register to
> +  hold the address of the vector along with the variable offset.  The
> +  callers should use reg_or_non_pcrel_memory to make sure we don't
> +  get a PC-relative address here.  */

I don't understand this comment, nor the problem.  Please expand?

> +  /* Allow prefixed instructions if supported.  If the bottom two bits of the
> + offset are non-zero, we could use a prefixed instruction (which does not
> + have the DS-form constraint that the traditional instruction had) 
> instead
> + of forcing the unaligned offset to a GPR.  */
> +  if (address_is_prefixed (addr, mode, NON_PREFIXED_DS))
> +return true;

Here (and for DQ) you aren't testing TARGET_PREFIXED?

> @@ -7371,7 +7435,7 @@ mem_operand_gpr (rtx op, machine_mode mo
> causes a wrap, so test only the low 16 bits.  */
>  offset = ((offset & 0x) ^ 0x8000) - 0x8000;
>  
> -  return offset + 0x8000 < 0x1u - extra;
> +  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);

This is a separate patch (and pre-approved).

> @@ -7404,7 +7475,7 @@ mem_operand_ds_form (rtx op, machine_mod
> causes a wrap, so test only the low 16 bits.  */
>  offset = ((offset & 0x) ^ 0x8000) - 0x8000;
>  
> -  return offset + 0x8000 < 0x1u - extra;
> +  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);

Together with this.

> -  offset += 0x8000;
> -  return offset < 0x1 - extra;
> +  if (TARGET_PREFIXED_ADDR)
> +return 

Re: [PATCH] regrename: Use PC instead of CC0 to hide operands

2019-10-01 Thread Segher Boessenkool
On Tue, Oct 01, 2019 at 10:18:37AM +0100, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> It's mentioned by name later too:
> 
> /* Step 2: Mark chains for which we have reads outside operands
>as unrenamable.
>We do this by munging all operands into CC0, and closing
>everything remaining.  */

Ah yes, I had that in a later patch.  I have patches that get rid of CC0
everywhere, except in the targets that actually use CC0 still, and there
are some examples in the manual that need to be rewritten.  If anyone
wants these patches, just ask please.

> OK with the same change there.

Done, committed.  Thanks!


Segher


[PATCH] regrename: Use PC instead of CC0 to hide operands

2019-10-01 Thread Segher Boessenkool
The regrename pass temporarily changes some operand RTL to CC0 so that
note_stores and scan_rtx don't see those operands.  CC0 is deprecated
and we want to remove it, so we need to use something else here.
PC fits the bill fine.

Tested on powerpc64-linux {-m32,-m64}.  Is this okay for trunk?


Segher


2019-10-01  Segher Boessenkool  

* regrename.c (hide_operands): Use pc_rtx instead of cc0_rtx.

---
 gcc/regrename.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/regrename.c b/gcc/regrename.c
index 8c3bae8..ed1dcde 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -1517,7 +1517,7 @@ scan_rtx (rtx_insn *insn, rtx *loc, enum reg_class cl, 
enum scan_actions action,
 }
 
 /* Hide operands of the current insn (of which there are N_OPS) by
-   substituting cc0 for them.
+   substituting pc for them.
Previous values are stored in the OLD_OPERANDS and OLD_DUPS.
For every bit set in DO_NOT_HIDE, we leave the operand alone.
If INOUT_AND_EC_ONLY is set, we only do this for OP_INOUT type operands
@@ -1541,7 +1541,7 @@ hide_operands (int n_ops, rtx *old_operands, rtx 
*old_dups,
continue;
   if (!inout_and_ec_only || recog_data.operand_type[i] == OP_INOUT
  || op_alt[i].earlyclobber)
-   *recog_data.operand_loc[i] = cc0_rtx;
+   *recog_data.operand_loc[i] = pc_rtx;
 }
   for (i = 0; i < recog_data.n_dups; i++)
 {
@@ -1551,7 +1551,7 @@ hide_operands (int n_ops, rtx *old_operands, rtx 
*old_dups,
continue;
   if (!inout_and_ec_only || recog_data.operand_type[opn] == OP_INOUT
  || op_alt[opn].earlyclobber)
-   *recog_data.dup_loc[i] = cc0_rtx;
+   *recog_data.dup_loc[i] = pc_rtx;
 }
 }
 
-- 
1.8.3.1



[PATCH] doc/md.texi: Fix some typos

2019-10-01 Thread Segher Boessenkool
It says " size N/2" in a few places where "size S/2" is meant.

Committing as trivial and obvious.


Segher


2019-10-01  Segher Boessenkool  

* doc/md.texi (vec_pack_trunc_@var{m}): Fix typo.
(vec_pack_sfix_trunc_@var{m}, vec_pack_ufix_trunc_@var{m}): Ditto.
(vec_packs_float_@var{m}, vec_packu_float_@var{m}): Ditto.

---
 gcc/doc/md.texi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 868016a..859ebed 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5454,7 +5454,7 @@ The output and input vectors should have the same modes.
 Narrow (demote) and merge the elements of two vectors. Operands 1 and 2
 are vectors of the same mode having N integral or floating point elements
 of size S@.  Operand 0 is the resulting vector in which 2*N elements of
-size N/2 are concatenated after narrowing them down using truncation.
+size S/2 are concatenated after narrowing them down using truncation.
 
 @cindex @code{vec_pack_sbool_trunc_@var{m}} instruction pattern
 @item @samp{vec_pack_sbool_trunc_@var{m}}
@@ -5481,7 +5481,7 @@ saturating arithmetic.
 Narrow, convert to signed/unsigned integral type and merge the elements
 of two vectors.  Operands 1 and 2 are vectors of the same mode having N
 floating point elements of size S@.  Operand 0 is the resulting vector
-in which 2*N elements of size N/2 are concatenated.
+in which 2*N elements of size S/2 are concatenated.
 
 @cindex @code{vec_packs_float_@var{m}} instruction pattern
 @cindex @code{vec_packu_float_@var{m}} instruction pattern
@@ -5489,7 +5489,7 @@ in which 2*N elements of size N/2 are concatenated.
 Narrow, convert to floating point type and merge the elements
 of two vectors.  Operands 1 and 2 are vectors of the same mode having N
 signed/unsigned integral elements of size S@.  Operand 0 is the resulting 
vector
-in which 2*N elements of size N/2 are concatenated.
+in which 2*N elements of size S/2 are concatenated.
 
 @cindex @code{vec_unpacks_hi_@var{m}} instruction pattern
 @cindex @code{vec_unpacks_lo_@var{m}} instruction pattern
-- 
1.8.3.1



Re: [RFC] Come up with ipa passes introduction in gccint documentation

2019-09-30 Thread Segher Boessenkool
Hi!

On Mon, Sep 30, 2019 at 09:47:13AM +0800, luoxhu wrote:
> On 2019/9/30 00:17, Segher Boessenkool wrote:
> The updated output will be as below in gccint.pdf, references are valid to 
> jump over
> (suppose should be the same in info?):

> > Did you test this with both "make info" and "make pdf" (and checked the
> > result of those of course :-) )?

The reason I ask is, there often are little mistakes (in markup for example)
that are obvious in either rendered output or in the info reader (but not
(always) in both).  Not just the content, also the way things are laid out,
and what makes a link to what, etc.

> To simplify development, the GCC pass manager differentiates between normal 
> interprocedural
> passes see Section 9.4.2 [All regular IPA passes], page 127, small 
> inter-procedural
> passes see Section 9.4.1 [All small IPA passes], page 127 and late 
> inter-procedural passes see
> Section 9.4.3 [All late IPA passes], page 128. A small inter-procedural pass 
> (SIMPLE_IPA_

It is weird to have a different order here than the order of the actual
sections, for example.

> The reason for "small" is in passes.def, it is used as 
> INSERT_PASSES_AFTER (all_small_ipa_passes).  And in cgraphunit.c:
> ipa_passes (void)
> {
> ... 
>   execute_ipa_pass_list (passes->all_small_ipa_passes);
> ...
> }
> 
> So is it OK to use small here?

I don't mind if it is "small" or "simple", but we probably should use the
same name everywhere in the manual (or, if we use both names, explain that
they are the same thing).

There are many cases where the internal name (in the GCC source code, in
pass names, etc.) is different from the external name (in the manuals, in
option names, etc.)  It is hard to decide what to use in the internals
manual then :-)  Maybe the internal name should just be changed?

> One more thing to worry about is my poor English

It's perfectly good enough that we understand what you are saying :-)

> and I am not familiar with all the
> 30+ IPA passes so not easy to extract exact explanations for them. 

Yeah...  It's a good start to just have stubs for them, I guess?  Makes
it obvious things need to be filled out, so hopefully people will :-)


Segher


Re: [PATCH] rs6000: Fix PR91275

2019-09-30 Thread Segher Boessenkool
Hi Bill,

On Mon, Sep 30, 2019 at 08:24:09PM -0500, Bill Schmidt wrote:
> PR91275 observes that __builtin_crypto_vpmsumd fails to work properly
> with -O1 or higher with -mcpu=power8.  That combination spells swap
> optimization.  Indeed, all vpmsum* instructions were being accepted
> as swappable operations.  This is correct for all of them but vpmsumd,
> which creates a 128-bit result.

Yeah, that is obvious once you see it spelled out :-)

> The -std=gnu11 in the testcase is there to avoid errors about long long
> not being accepted with pure ANSI.  The "11" part is arbitrary.  The
> testcase is modified from the original bug report.

That is fine.

> This patch disallows swap optimization in the presence of vpmsumd.
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu.  Is this okay
> for trunk, and for backport to all active branches after an appropriate
> waiting period?

Yes please.  Thanks,


Segher


> 2019-09-30  Bill Schmidt  
> 
>   * config/rs6000/rs6000-p8swap.c (rtx_is_swappable_p): Don't swap
>   vpmsumd.
> 
> [gcc/testsuite]
> 
> 2019-09-30  Bill Schmidt  
> 
>   * gcc.target/powerpc/pr91275.c: New.


Re: [PATCH, RS6000] Add movmemsi pattern for inline expansion of memmove()

2019-09-30 Thread Segher Boessenkool
Hi!

On Mon, Sep 30, 2019 at 11:36:31AM -0500, Aaron Sawdey wrote:
> This patch uses the support added in the patch I posted last week for 
> actually doing
> inline expansion of memmove().

> I've also removed the code from expand_block_move() for dealing with
> mode==BLKmode because I don't believe that can happen. The big if construct 
> that
> figures out which size we are going to use has a plain else on it, and every
> clause in it sets mode to something other than BLKmode. So I removed that code
> to simplify things and just left a gcc_assert(mode != BLKmode).

That looks fine, with that assert.

> 2019-09-30  Aaron Sawdey 
> 
>   * config/rs6000/rs6000-protos.h (expand_block_move): Change prototype.
>   * config/rs6000/rs6000-string.c (expand_block_move): Add might_overlap 
> parm.

This line is too long...  80 columns max.

>   * config/rs6000/rs6000.md (movmemsi): Add new pattern.
>   (cpymemsi): Add might_overlap parm to expand_block_move() call.


Okay for trunk.  Thanks!


Segher


Re: [PATCH v3 4/9] S/390: Do not use signaling vector comparisons on z13

2019-09-30 Thread Segher Boessenkool
On Mon, Sep 30, 2019 at 03:36:41PM +0200, Ilya Leoshkevich wrote:
> > Am 06.09.2019 um 12:34 schrieb Segher Boessenkool 
> > :
> > Should you handle -fsignaling-nans here as well?
> 
> Do you mean disabling vectorisation of LE/LT/GE/GT/LTGT when
> -fsignaling-nans is in effect?  This makes sense to me.

I meant it sounds like you need to do *something* more than the generic
code does, here.

> I could do that
> here, but wouldn't common code (e.g. expand_vec_cond_expr_p) be a better
> place?

If it can be done there, all the better, sure.

Anyway, this is a separate issue, don't let me distract you :-)


Segher


Re: [PATCH] C testsuite, silence a FreeBSD libc warning

2019-09-30 Thread Segher Boessenkool
On Mon, Sep 30, 2019 at 07:47:54PM +0200, Jakub Jelinek wrote:
> So, can you try just
>{ dg-prune-output "warning: warning: \[^\n\r\]* possibly used unsafely; 
> consider using" } */
> or if that doesn't work, with .* at start end end?

Or even just

{ dg-prune-output {(?n)warning: warning: .* possibly used unsafely; consider 
using} } */

(?n) means to match single lines only; . and bracket expressions using ^
will not match newlines if you use it.

("warning: " twice, btw?)


Segher


Re: [10/32] Remove global call sets: combine.c

2019-09-29 Thread Segher Boessenkool
On Sun, Sep 29, 2019 at 04:32:13PM -0600, Jeff Law wrote:
> On 9/25/19 9:52 AM, Richard Sandiford wrote:
> > gcc/
> > * combine.c: Include function-abi.h.
> > (record_dead_and_set_regs): Use insn_callee_abi to get the ABI
> > of the target of call insns.  Invalidate partially-clobbered
> > registers as well as fully-clobbered ones.
> OK if Segher doesn't object.

https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01462.html

"That looks great, thanks!"

:-)


Segher


Re: [RFC] Come up with ipa passes introduction in gccint documentation

2019-09-29 Thread Segher Boessenkool
Hi!

Just some editorial comments...  The idea of the patch is fine IMHO.
(I am not maintainer of this, take all my comments for what they are).

On Sun, Sep 29, 2019 at 02:56:37AM -0500, Xiong Hu Luo wrote:
>  To simplify development, the GCC pass manager differentiates
> -between normal inter-procedural passes and small inter-procedural
> -passes.  A @emph{small inter-procedural pass}
> +between normal inter-procedural passes @pxref{All regular ipa passes}
> +and small inter-procedural passes@pxref{All small ipa passes}
> +& @pxref{All late ipa passes}. A @emph{small inter-procedural pass}
>  (@code{SIMPLE_IPA_PASS}) is a pass that does

To simplify development, the GCC pass manager differentiates
between normal inter-procedural passes @pxref{All regular IPA passes},
simple inter-procedural passes@pxref{All simple IPA passes},
and late inter-procedural passes @pxref{All late IPA passes}.
A @emph{simple inter-procedural pass}
(@code{SIMPLE_IPA_PASS}) is a pass that does

Don't use "&", write out the word.  IPA is an abbreviation, not a word,
so it should be written in all caps.  Enumerations are written like this,
that, and something else.  Is there a reason you used "small" instead of
"simple"?

> diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
> index 6edb9a0bfb7..0b6cf73469c 100644
> --- a/gcc/doc/passes.texi
> +++ b/gcc/doc/passes.texi
> @@ -21,6 +21,7 @@ where near complete.
>  * Gimplification pass::  The bits are turned into something we can optimize.
>  * Pass manager:: Sequencing the optimization passes.
>  * Tree SSA passes::  Optimizations on a high-level representation.
> +* IPA passes::   Optimizations on scope of intra-procedual.
>  * RTL passes::   Optimizations on a low-level representation.

I'd just say "Intra-procedural optimizations."  And maybe order this
differntly?  IPA, SSA, RTL, the same order as they run, and this also
keeps the "high/low-level representation" together, which reads a bit
better.

The rest should be worked out a bit more, but looks promising.

Did you test this with both "make info" and "make pdf" (and checked the
result of those of course :-) )?

Thanks,


Segher


Re: [PATCH, rs6000] Lower vec_perm vectorization cost for P8/P9

2019-09-29 Thread Segher Boessenkool
Hi!

On Sun, Sep 29, 2019 at 01:38:31PM +0800, Kewen.Lin wrote:
> Recently we are revisiting vectorization cost setting in 
> rs6000_builtin_vectorization_cost, and found the current cost of
> vec_perm on VSX looks overpriced for Power8 and Power9.

Yeah it does.

> The high
> cost was set for Power7 single VSU pipe, but Power8 and Power9
> have supported more VSX units,

Power7 has two VSU pipes, but only one permute (and only one of the other
VMX units, and only one store).  It can do two VSX or two classic FP, and
various combinations.

> the performance evaluation on
> SPEC2017 Power9 shows 2%+ gain on 538.imagick_r, while SPEC2006/
> SPEC2017 Power8 evaluations show ~2% gains on 453.povray/
> 511.povray_r, all don't have any remarkable degradations.

Nice :-)  What does it do to geomean, is that in the right direction
as well?

As a follow-up, vec_promote_demote is costed similarly higher for VSX
than for other things, it might help to change that one as well?

> This patch is to lower vec_perm vectorization cost for all
> non Power7 VSX architecture (currently Power8 and Power9).

Okay for trunk with the comment fixed (just say p7 has only one permute
unit?), and assuming geomean is fine too.

Thanks!


Segher


Re: [PATCH], V4, patch #3: Fix up mov_64bit_dm

2019-09-27 Thread Segher Boessenkool
On Wed, Sep 18, 2019 at 07:58:46PM -0400, Michael Meissner wrote:
> In doing the patches, I noticed that mov_64bit_dm had two alternatives
> combined together.  This patch fixes the problem, before the next patch that
> will need to modify mov_64bit_dm for prefixed addressing.

This is okay for trunk.  Thanks!


Segher


> 2019-09-18  Michael Meissner  
> 
>   * config/rs6000/rs6000.md (mov_64bit_dm): Split the
>   alternatives for loading 0.0 to a GPR and loading a 128-bit
>   floating point type to a GPR.


Re: [PATCH], V4.1, patch #2: Add prefixed insn attribute (revised)

2019-09-27 Thread Segher Boessenkool
Hi!

On Tue, Sep 24, 2019 at 02:10:10AM -0400, Michael Meissner wrote:
> +/* Return true if the address is a prefixed instruction that can be directly
> +   used in a memory instruction (i.e. using numeric offset or a PC-relative
> +   reference to a local symbol).

This could use a bit of a rewrite...  "Return whether the address is valid
for a prefixed memory instruction [...]"?

> +  /* Use the default pattern for loading up PC-relative addresses.  */
> +  if (TARGET_PCREL && mode == Pmode
> +   && (SYMBOL_REF_P (operands[1]) || LABEL_REF_P (operands[1])
> +   || GET_CODE (operands[1]) == CONST))

Maybe this can use some predicate function?  That will make the CONST
stand out more as being the special case here, too?

> +  unsigned int r = reg_or_subregno (reg);
> +
> +  /* If we have a pseudo, use the default instruction format.  */
> +  if (r >= FIRST_PSEUDO_REGISTER)
> +return NON_PREFIXED_DEFAULT;

Please use

  if (!HARD_REGISTER_NUM_P (r))

> +  (eq_attr "type" "load,fpload,vecload")
> +  (if_then_else (and (eq_attr "indexed" "no")
> + (eq_attr "update" "no")
> + (match_test "prefixed_load_p (insn)"))
> +(const_string "yes")
> +(const_string "no"))

It looks like prefixed_load_p and prefixed_store_p should test for
"indexed" "no" and "update" "no" themselves?  The code here simplifies
a bit then.

(blank line before the default case please).

> + (const_string "no")))


Okay for trunk with those things fixed.  Thanks!


Segher


Re: [PATCH] V4.1, patch #1: Rework prefixed/pc-relative lookup (revised)

2019-09-27 Thread Segher Boessenkool
Hi!

On Tue, Sep 24, 2019 at 01:59:07AM -0400, Michael Meissner wrote:
> +;; Return true if the operand is a PC-relative address to a local symbol or a
> +;; label that can be used directly in a memory operation.

"address of", not "address to"?

> +/* Different PowerPC instruction formats that are used by GCC.  There are
> +   various other instruction formats used by the PowerPC hardware, but the
> +   these formats are not currently used by GCC.  */

"the these".

> +enum non_prefixed {
> +  NON_PREFIXED_DEFAULT,  /* Use the default.  */
> +  NON_PREFIXED_D,/* All 16-bits are valid.  */
> +  NON_PREFIXED_DS,   /* Bottom 2 bits must be 0.  */
> +  NON_PREFIXED_DQ,   /* Bottom 4 bits must be 0.  */
> +  NON_PREFIXED_X /* No offset memory form exists.  */
> +};

Please call the enum non_prefixed_form.

With those nits fixed, this is fine for trunk.  Thank you!


Segher


Re: [PATCH, rs6000] Support float from/to long conversion vectorization

2019-09-27 Thread Segher Boessenkool
On Fri, Sep 27, 2019 at 04:52:30PM +0800, Kewen.Lin wrote:
> > (Maybe one of the gen* tools complains any_fix needs a mode? :QI will do
> > if so, or :P if you like that better).
> 
> I didn't encounter any errors, it sounds it's allowable now?

Ah, that particular warning (from genrecog.c) only happens for a SET,
so you won't get it here.  It would be a warning, not an error, btw.


Segher


Re: Problem exposed by recent ARM multiply changes

2019-09-27 Thread Segher Boessenkool
On Fri, Sep 27, 2019 at 10:08:46AM +0200, Jakub Jelinek wrote:
> On Thu, Sep 26, 2019 at 05:34:25PM -0500, Segher Boessenkool wrote:
> > > Are you sure?
> > > "The UMLAL instruction interprets the values from Rn and Rm as unsigned
> > > integers.  It multiplies these integers, and adds the 64-bit result to the
> > > 64-bit unsigned integer contained in RdHi and RdLo."
> > 
> > Yes.  It adds two 64-bit numbers.  That is not the same as adding the
> > top and bottom 32-bit halfs of that separately.
> 
> Sure.

> But because 131 is shifted <<32 and the result then >>32, we can simplify
> that further to what the pattern uses:
> (set (reg:SI 133 [+4 ])
> (plus:SI (truncate:SI (lshiftrt:DI (plus:DI (mult:DI (zero_extend:DI 
> (reg/v:SI 115 [ sec ]))
> (zero_extend:DI (reg:SI 124)))
> (zero_extend:DI (reg:SI 130)))
> (const_int 32 [0x20])))
> (reg:SI 131 [+4 ])))

I completely missed the (zero_extend (reg:SI 130)) in there.  My excuse
is the messed up formatting in the insn dump.  Sorry for the noise, and
thanks for your patience.

BTW, did you check if GCC optimises to this same formulation?  Say in
combine, can it do combinations of this insn with anything else?


Segher


Re: [PATCH, rs6000] Support float from/to long conversion vectorization

2019-09-27 Thread Segher Boessenkool
Hi Kewen,

On Fri, Sep 27, 2019 at 10:33:01AM +0800, Kewen.Lin wrote:
> This patch is to add the support for float from/to long conversion
> vectorization.  ISA 2.06 supports the vector version instructions
> for conversion between float and long long (both signed and unsigned),
> but vectorizer can't exploit them since the optab check fails.

Nice :-)

> +;; Support signed/unsigned long long to float conversion vectorization.
> +(define_expand "vec_pack_float_v2di"
> +  [(match_operand:V4SF 0 "vfloat_operand")
> +   (any_float:V4SF (parallel [(match_operand:V2DI 1 "vint_operand")
> + (match_operand:V2DI 2 "vint_operand")]))]

To concatenate two vectors, the syntax is vec_concat.  So

  [(set (match_operand:V4SF 0 "vfloat_operand")
(any_float:V4SF
  (vec_concat:V4DI (match_operand:V2DI 1 "vint_operand")
   (match_operand:V2DI 2 "vint_operand"]

It is of course a define_expand here, and it always calls DONE, so the
only thing the RTL template is used for is the match_operands; but also
important here is that you use an iterator (any_float), so you need to
work that into the template some way.

Your code would work, but it is a bit misleading, an unsuspecting reader
(*cough* me *cough*) might think this is the actual insn this expander
will create.

> +;; Support float to signed/unsigned long long conversion vectorization.
> +(define_expand "vec_unpack_fix_trunc_hi_v4sf"
> +  [(match_operand:V2DI 0 "vint_operand")
> +   (any_fix:V2DI (match_operand:V4SF 1 "vfloat_operand"))]

Similarly here: the pattern as you wrote it isn't valid RTL.

  [(set (match_operand:V2DI 0 "vint_operand")
(any_fix:V2DI (vec_select:V2SF ...
uh-oh, we do not have a mode V2SF.
Let's go with what you have then, add a comment that the template isn't
valid RTL, but you need it for the iterator?

Or can you think of a different way of putting an iterator like this in
the template?  Maybe something like

(define_expand "vec_unpack_fix_trunc_hi_v4sf"
  [(match_operand:V2DI 0 "vint_operand")
   (match_operand:V4SF 1 "vfloat_operand")
   (any_fix (pc))]

works?  If it does, please do that; if you cannot find a reasonably clear
syntax, go with what you had, but please add a comment saying the template
won't ever be inserted as instruction.

(Maybe one of the gen* tools complains any_fix needs a mode? :QI will do
if so, or :P if you like that better).

With either way, approved for trunk (after testing, of course).  Thanks!


Segher


Re: [PATCH v2, rs6000] Replace X-form addressing with D-form addressing in new pass for Power 9

2019-09-26 Thread Segher Boessenkool
Hi Kelvin,

Sorry for the slow review.

On Tue, Sep 03, 2019 at 03:10:09PM -0500, Kelvin Nilsen wrote:
> This new pass scans existing rtl expressions and replaces them with rtl 
> expressions that favor selection of the D-form instructions in contexts for 
> which the D-form instructions are preferred.  The new pass runs after the RTL 
> loop optimizations since loop unrolling often introduces opportunities for 
> beneficial replacements of X-form addressing instructions.

And that is before the "big" RTL passes (cprop, CSE, combine, and related).
Okay.

> --- gcc/config/rs6000/rs6000-p9dform.c(nonexistent)
> +++ gcc/config/rs6000/rs6000-p9dform.c(working copy)
> @@ -0,0 +1,1487 @@
> +/* Subroutines used to transform array subscripting expressions into
> +   forms that are more amenable to d-form instruction selection for p9
> +   little-endian VSX code.
> +   Copyright (C) 1991-2018 Free Software Foundation, Inc.

(Year needs updating here).

> +   This pass runs immediately after pass_loops2.  Loops have already

(typo, pass_loop2)

> +/* This is based on the union-find logic in web.c.  web_entry_base is
> +   defined in df.h.  */
> +class indexing_web_entry : public web_entry_base
> +{
> + public:

Most places have no extra leading space here.  Not sure what the coding
standards say?

> +static int count_links (struct df_link *def_link)
> +{
> +  int result;
> +  for (result = 0; def_link != NULL; result++)
> +def_link = def_link->next;
> +  return result;
> +}

Maybe this belongs in more generic code?  Or as inline in df.h even?

> +
> +  int result = 0;
> +  for (i = 0; i < count; i++)
> +{
> +  result = (result << 6) | ((result & 0xf00) >> 28);

Do you want an extra zero in the constant there?  The expression  is 0
as written.

Shifting a signed (possibly negative!) number to the right isn't a great
idea in any case.

Maybe you want to use some standard hash, instead?

> +   /* if there is no def, or the def is artificial,
> +  or there are multiple defs, this is an originating
> +  use.  */

(Sentence should start with a capital).

> +   unsigned int uid2 =
> + insn_entry[uid].original_base_use;

The = goes on the second line, instead.

> +  rtx mem = NULL;
> +  if (GET_CODE (body) == SET)
> +{
> +  if (MEM_P (SET_SRC (body)))
> + mem = XEXP (SET_SRC (body), 0);
> +  else if (MEM_P (SET_DEST (body)))
> + mem = XEXP (SET_DEST (body), 0);
> +}
> +  /* Otherwise, this is neither load or store, so leave mem as NULL.  */
> +
> +  if (mem != NULL)
> +{

You could instead say

  if (!mem)
return;

and then you don't need the comment or the "if" after it.  Or even:

  if (GET_CODE (body) != SET)
return;

  rtx mem;
  if (MEM_P (SET_SRC (body)))
mem = XEXP (SET_SRC (body), 0);
  else if (MEM_P (SET_DEST (body)))
mem = XEXP (SET_DEST (body), 0);
  else
return;

> +{
> +  rtx def_insn = DF_REF_INSN (def_link->ref);
> +  /* unsigned uid = INSN_UID (def_insn); not needed? */

Delete this line?  Or is it useful still :-)

> +  if (GET_CODE (body) == CONST_INT)
> + return true;

if (CONST_INT_P (body)) ...

> +static bool
> +in_use_list (struct df_link *list, struct df_link *element)
> +{
> +  while (list != NULL)
> +{
> +  if (element->ref == list->ref)
> + return true;
> +  list = list->next;
> +}
> +  /* Got to end of list without finding element.  */
> +  return false;
> +}

This kind of function makes me scared of quadratic complexity somewhere.
I didn't check, but it seems likely :-/

> +  if ((insn_entry[uid_1].base_equivalence_hash !=
> +   insn_entry[uid_2].base_equivalence_hash) ||
> +  (insn_entry[uid_1].index_equivalence_hash !=
> +   insn_entry[uid_2].index_equivalence_hash))
> +return false;
> +  else   /* Hash codes match.  Check details.  */

After a return you don't need an else.  Reduce indentation, instead :-)

> +/* Return true iff instruction I2 dominates instruction I1.  */
> +static bool
> +insn_dominated_by_p (rtx_insn *i1, rtx_insn *i2)
> +{
> +  basic_block bb1 = BLOCK_FOR_INSN (i1);
> +  basic_block bb2 = BLOCK_FOR_INSN (i2);
> +
> +  if (bb1 == bb2)
> +{
> +  for (rtx_insn *i = i2;
> +(i != NULL) && (BLOCK_FOR_INSN (i) == bb1); i = NEXT_INSN (i))

How can it ever be NULL?

> + if (i == i1)
> +   return true;
> +  return false;
> +}
> +  else
> +return dominated_by_p (CDI_DOMINATORS, bb1, bb2);
> +}

IRA already has an insn_dominated_by_p function, which is constant time
(it does require some extra init for uid_luid[]).

> +   int hash = ((insn_entry [uid].base_equivalence_hash +

No + at the end of line, either.  The check_GNU_style script should
check for this (not sure if it actually does, YMMV).

> +  /* Since this pass creates new instructions, get_max_uid () may
> + return different values at different times during this pass.  The
> +

Re: Problem exposed by recent ARM multiply changes

2019-09-26 Thread Segher Boessenkool
On Fri, Sep 27, 2019 at 12:30:50AM +0200, Jakub Jelinek wrote:
> On Thu, Sep 26, 2019 at 05:17:40PM -0500, Segher Boessenkool wrote:
> > On Wed, Sep 25, 2019 at 10:06:13PM -0600, Jeff Law wrote:
> > > (insn 14 13 16 2 (parallel [
> > > (set (reg:SI 132)
> > > (plus:SI (mult:SI (zero_extend:DI (reg/v:SI 115 [ sec ]))
> > > (zero_extend:DI (reg:SI 124)))
> > > (reg:SI 130)))
> > > (set (reg:SI 133 [+4 ])
> > > (plus:SI (truncate:SI (lshiftrt:DI (plus:DI (mult:DI 
> > > (zero_extend:DI (reg/v:SI 115 [ sec ]))
> > > (zero_extend:DI (reg:SI 124)))
> > > (zero_extend:DI (reg:SI 130)))
> > > (const_int 32 [0x20])))
> > > (reg:SI 131 [+4 ])))
> > > ]) "j.c":10:54 60 {umlal}
> > >  (expr_list:REG_DEAD (reg:SI 131 [+4 ])
> > > (expr_list:REG_DEAD (reg:SI 130)
> > > (expr_list:REG_DEAD (reg:SI 124)
> > > (expr_list:REG_DEAD (reg/v:SI 115 [ sec ])
> > > (nil))
> > 
> > This is not a correct pattern for umlal (it should have a carry from the
> > low half of the addition to the high half).
> 
> Are you sure?
> "The UMLAL instruction interprets the values from Rn and Rm as unsigned
> integers.  It multiplies these integers, and adds the 64-bit result to the
> 64-bit unsigned integer contained in RdHi and RdLo."

Yes.  It adds two 64-bit numbers.  That is not the same as adding the
top and bottom 32-bit halfs of that separately.

> Appart from the bug I've provided untested fix in the PR, the above pattern
> seems to do what the documentation says, the low 32 bits are
> (unsigned int) (Rn*Rm+RdLo), and the whole result for unsigned int
> Rn, Rm, RdHi and RdLo is
> (unsigned long long)Rn*Rm+((unsigned long long)RdHi<<32)+RdLo
> and the upper 32 bits of that are IMHO equivalent to
> (unsigned) (((unsigned long long)Rn*Rm+RdLo)>>32)+RdHi

The upper half of the result should be one bigger if the addition of the
low half carries.


Segher


Re: Problem exposed by recent ARM multiply changes

2019-09-26 Thread Segher Boessenkool
On Wed, Sep 25, 2019 at 10:06:13PM -0600, Jeff Law wrote:
> (insn 14 13 16 2 (parallel [
> (set (reg:SI 132)
> (plus:SI (mult:SI (zero_extend:DI (reg/v:SI 115 [ sec ]))
> (zero_extend:DI (reg:SI 124)))
> (reg:SI 130)))
> (set (reg:SI 133 [+4 ])
> (plus:SI (truncate:SI (lshiftrt:DI (plus:DI (mult:DI
> (zero_extend:DI (reg/v:SI 115 [ sec ]))
> (zero_extend:DI (reg:SI 124)))
> (zero_extend:DI (reg:SI 130)))
> (const_int 32 [0x20])))
> (reg:SI 131 [+4 ])))
> ]) "j.c":10:54 60 {umlal}
>  (expr_list:REG_DEAD (reg:SI 131 [+4 ])
> (expr_list:REG_DEAD (reg:SI 130)
> (expr_list:REG_DEAD (reg:SI 124)
> (expr_list:REG_DEAD (reg/v:SI 115 [ sec ])
> (nil))

This is not a correct pattern for umlal (it should have a carry from the
low half of the addition to the high half).


Segher


Re: [PATCH, rs6000] Update powerpc vector load builtins with PURE attribute

2019-09-26 Thread Segher Boessenkool
On Thu, Sep 26, 2019 at 12:06:03PM -0500, Bill Schmidt wrote:
> On 9/26/19 12:00 PM, Segher Boessenkool wrote:
> >On Thu, Sep 26, 2019 at 10:40:29AM -0500, will schmidt wrote:
> >>Update our (rs6000) vector load built-ins with the PURE attribute.  These
> >>were previously given the MEM attribute, which meant that redundant loads
> >>surrounding the built-in calls could not be eliminated in earlier passes
> >>since they were defined as having the potential to touch memory.
> >>2019-09-26  Will Schmidt 
> >>* config/rs6000/rs6000-builtin.def: ( LVSL LVSR LVEBX LVEHX
> >  ^--- stray space
> >
> >Please put commas between the items, too?
> >
> >The patch is okay for trunk.  Thank you!
> >
> I wonder whether we should also consider a backport to 9, when we 
> started expanding these earlier.  Thoughts?

Good idea, yeah.  After waiting for a week or so to see if problems turn
up on trunk, as usual.


Segher


Re: [PATCH, rs6000] Update powerpc vector load builtins with PURE attribute

2019-09-26 Thread Segher Boessenkool
Hi Will,

On Thu, Sep 26, 2019 at 10:40:29AM -0500, will schmidt wrote:
> Update our (rs6000) vector load built-ins with the PURE attribute.  These
> were previously given the MEM attribute, which meant that redundant loads
> surrounding the built-in calls could not be eliminated in earlier passes
> since they were defined as having the potential to touch memory.

> 2019-09-26  Will Schmidt 
>   * config/rs6000/rs6000-builtin.def: ( LVSL LVSR LVEBX LVEHX
 ^--- stray space

Please put commas between the items, too?

The patch is okay for trunk.  Thank you!


Segher


Re: [PATCH] FreeBSD PowerPC use secure-plt

2019-09-25 Thread Segher Boessenkool
Hi!

On Wed, Sep 25, 2019 at 10:46:57PM +0200, Andreas Tobler wrote:
> --- gcc/config/rs6000/t-freebsd64 (revision 276090)
> +++ gcc/config/rs6000/t-freebsd64 (working copy)
> @@ -27,3 +27,6 @@
>  MULTILIB_EXCEPTIONS =
>  MULTILIB_OSDIRNAMES  = ../lib32
>  
> +SECURE_PLT = $(if $(findstring TARGET_FREEBSD32_SECURE_PLT=1, 
> $(tm_defines)),msecure-plt)
> +
> +MULTILIB_EXTRA_OPTS += $(SECURE_PLT)

$(findstring) isn't super great, it looks for substrings, so it would
also match "TARGET_FREEBSD32_SECURE_PLT=123"; you can use $(filter) instead?

Looks fine to me either way.


Segher


Re: git conversion of GCC wwwdocs repository

2019-09-25 Thread Segher Boessenkool
On Wed, Sep 25, 2019 at 04:49:58PM +, Joseph Myers wrote:
> A version of the conversion using @gcc.gnu.org addresses is now available:
> 
> git clone git+ssh://gcc.gnu.org/home/gccadmin/wwwdocs-test2.git

This looks great, thanks!

Can we test committing to this repo as well?  Are hooks set up already?
Am I too impatient?


Segher


Re: git conversion of GCC wwwdocs repository

2019-09-25 Thread Segher Boessenkool
On Wed, Sep 25, 2019 at 12:02:42PM +0100, Jonathan Wakely wrote:
> I mean that there's not much value in having my past commits listed as
> coming from various "different" authors:
> 
> Jonathan Wakely 
> Jonathan Wakely 
> Jonathan Wakely 
> Jonathan Wakely 
> Jonathan Wakely 
> 
> All of those "identities" are the same person, so making all my
> commits use the same Author is arguably correct, even if I was using
> different addresses at different times.
> 
> Using the @gcc.gnu.org address for all my past commits might be the
> least worst option, because it's "me" and that address was valid in
> 2004 and is still valid now, unlike the other options.

And using a different address is *incorrect*, and it is quite nasty if
you would be associating a commit with the wrong employer.  @gcc.gnu.org
is fine of course.


Segher


Re: [10/32] Remove global call sets: combine.c

2019-09-25 Thread Segher Boessenkool
On Wed, Sep 25, 2019 at 04:52:14PM +0100, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > On Thu, Sep 12, 2019 at 08:51:59AM +0100, Richard Sandiford wrote:
> >> Segher Boessenkool  writes:
> >> > It is not such a great name like that.  Since its children are
> >> > very_long_names, it doesn't need to be only three chars itself,
> >> > either?
> >> 
> >> OK, what name would you prefer?
> >
> > Maybe call_abi is a good name?  It's difficult to capture the subtleties
> > in a short enough name.  As always :-)
> 
> The formatting ended up being a bit weird with a longer name,
> so how about the attached instead?

That looks great, thanks!

> +   /* ??? We could try to preserve some information from the last
> +  set of register I if the call doesn't actually clobber
> +  (reg:last_set_mode I), which might be true for ABIs with
> +  partial clobbers.  However, it would be difficult to
> +  update last_set_nonzero_bits and last_sign_bit_copies
> +  to account for the part of I that actually was clobbered.
> +  It wouldn't help much anyway, since we rarely see this
> +  situation before RA.  */

I would like to completely get rid of reg_stat, and have known bits
dealt with by some DF thing instead...  It would work much better and
be much easier to use at the same time.  Also, other passes could use
it as well.

If I ever will find time to do this, I don't know :-/


Segher


Re: [Darwin, PPC, Mode Iterators 1/n, committed] Use mode iterators in picbase patterns.

2019-09-24 Thread Segher Boessenkool
Hi Iain,

On Tue, Sep 24, 2019 at 08:31:16PM +0100, Iain Sandoe wrote:
> This switches the picbase load and reload patterns to use the 'P' mode
> iterator instead of writing an SI and DI pattern for each (and deletes the
> old patterns).  No functional change intended.

>  (define_expand "load_macho_picbase"
> -  [(set (reg:SI LR_REGNO)
> +  [(set (reg LR_REGNO)

This changes it to VOIDmode instead?  It should have been reg:P LR_REGNO?

>  (define_expand "reload_macho_picbase"
> -  [(set (reg:SI LR_REGNO)
> +  [(set (reg LR_REGNO)

Same here.


Segher


Re: [PATCH] Remove unused #include "vec.h" from hash-table.h

2019-09-24 Thread Segher Boessenkool
On Tue, Sep 24, 2019 at 07:44:10AM +0200, Bernhard Reutner-Fischer wrote:
> On Mon, 23 Sep 2019 14:52:19 -0500
> "Christian Biesinger via gcc-patches"  wrote:
> > From: Christian Biesinger 
> > Removes an unused include as a cleanup. Requires updating
> > lots of files who previously relied on this transitive include.
> 
> Note that we have a tool to help prune unused includes, somewhere.

contrib/header-tools/reduce-headers?  And see gcc-order-headers.


Segher


Re: peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer?

2019-09-23 Thread Segher Boessenkool
On Mon, Sep 23, 2019 at 06:56:12PM +0100, Jozef Lawrynowicz wrote:
> > It could have just done that as
> > 
> > (set (reg:PSI 28)
> >  (zero_extend:PSI (reg:QI 12)))
> > 
> > as far as I can see?  Do you already have a machine pattern that matches
> > that?
> 
> Yes that combination is what I was expecting to see. I guess since char is
> unsigned, a zero extend is needed to widen it, and then for the following 
> insn a
> sign extend is added to widen the HImode integer.

Yeah, but a sign_extend:M1 of a zero_extend:M2 of an M3 (with M2>M3) is
just a zero_extend:M1 of that M3.  Somehow combine (or simplify-rtx)
missed that; or come to think of it, it probably does that for MODE_INT
things, but this is a MODE_PARTIAL_INT.  Hrm, would it be correct then.
I think it is?  M2 is a normal MODE_INT, all bits in M2 are defined, the
M2 value always is non-negative, the sign_extend:M1 always is the same
as the zero_extend:M1 would be.

> There isn't currently a pattern which matches that, but adding it
> doesn't fix the issue which is why I thought it might need to be fixed earlier
> in RTL generation.
> Here is the pattern that is missing:
> 
> +(define_insn "movqipsi"
> +  [(set (match_operand:PSI 0 "register_operand" "=r,r")
> +   (zero_extend:PSI (match_operand:QI 1 "general_operand" "rYs,m")))]
> +  "msp430x"
> +  "@
> +   MOV.B\t%1, %0
> +   MOV%X1.B\t%1, %0"
> +)
> +
> 
> So will combine potentially be able to do away with (sign_extend 
> (zero_extend))
> in certain situations?

Yes.

> I filed PR/91865 which has all the relevant details from this thread.

Thanks!

> I can add the following nameless insn and combine will catch this case and
> generate the better, smaller code:
> 
> +(define_insn "*movqihipsi"
> +  [(set (match_operand:PSI 0 "register_operand" "=r,r")
> +   (sign_extend:PSI (zero_extend:HI (match_operand:QI 1 "general_operand"
> "rYs,m"]
> +  "msp430x"
> +  "@
> +   MOV.B\t%1, %0
> +   MOV%X1.B\t%1, %0"
> +)

Good to hear that already works!  combine should come up with the simpler
formulation though, let me see what I can do.


Segher


Re: [PATCH RS6000], add xxswapd support

2019-09-23 Thread Segher Boessenkool
Hi Carl,

On Mon, Sep 23, 2019 at 08:07:32AM -0700, Carl Love wrote:
>   * config/rs6000/vsx.md (xxswapd_v4si, xxswapd_v8hi, xxswapd_v16qi):
>   New define_insn.
>   (vsx_xxpermdi4_le_, vsx_xxpermdi8_le_V8HI,
>   vsx_xxpermdi16_le_V16QI): Removed define_insn.

If you use "" in a changelog, it usually is handy to say what mode
it is here.  Like in this case, (vsx_xxpermdi4_le_ for VSX_W).

> +(define_insn "xxswapd_v16qi"
> +  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
> + (vec_select:V16QI
> +   (match_operand:V16QI 1 "vsx_register_operand" "wa")
> +   (parallel [(const_int 8) (const_int 9)
> +  (const_int 10) (const_int 11)
> +  (const_int 12) (const_int 13)
> +  (const_int 14) (const_int 15)
> +  (const_int 0) (const_int 1)
> +  (const_int 2) (const_int 3)
> +  (const_int 4) (const_int 5)
> +  (const_int 6) (const_int 7)])))]
> +   "TARGET_VSX"
> +;; AIX does not support the extended mnemonic xxswapd.  Use the basic
> +;; mnemonic xxpermdi instead.
> +   "xxpermdi %x0,%x1,%x1,2"
>    [(set_attr "type" "vecperm")])

Both of the strings ("TARGET_VSX" and "xxpermdi*") should be indented by
only two spacesC, just like the things above and below them are.

> +;; AIX does not support the extended mnemonic xxswapd.  Use the basic
> +;; mnemonic xxpermdi instead.
> +   "xxpermdi %x0,%x1,%x1,2"
> +   [(set_attr "type" "vecperm")])

For the other two, the [(set_attr... as well: only two spaces.

Okay for trunk with that fixed up.  Thanks!


Segher


  1   2   3   4   5   6   7   8   9   10   >