Re: [PATCH] Fix PR91522

2020-05-09 Thread H.J. Lu via Gcc-patches
On Fri, Aug 23, 2019 at 3:57 AM Richard Biener  wrote:
>
> On Fri, 23 Aug 2019, Uros Bizjak wrote:
>
> > On Thu, Aug 22, 2019 at 3:35 PM Richard Biener  wrote:
> > >
> > >
> > > This fixes quadraticness in STV and makes
> > >
> > >  machine dep reorg  :  89.07 ( 95%)   0.02 ( 18%)  89.10 (
> > > 95%)  54 kB (  0%)
> > >
> > > drop to zero.  Anybody remembers why it is the way it is now?
> > >
> > > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> > >
> > > OK?
> >
> > Looking at the PR, comment #3 [1], I assume this patch is obsoltete
> > and will be updated?
> >
> > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3
>
> Yes.  I'm still learning how STV operates (and learing DF and RTL...).
> The following is a rewrite of the non-TImode chain conversion
> according to I think how it should operate als allowing the hunk
> that fixes the compile-time and fixing PR91527 on the way
> (which I ran into during more extensive testing of the patch myself).
>
> So compared to the state before which I still do not 100% understand
> we now do the following.  Chain detection works as before including
> recording of all defs (both defined by the insns in the chain and
> insns outside) that need copy-in or copy-out operations.
>
> But then the patch changes things as to guarantee that
> after the conversion all uses/defs of a pseudo are
> of the (subreg:Vmode ..) form or of the original scalar form.
> In particular it avoids the need to change any insns that
> are not part of the chain (besides emitting the extra copy
> instructions).  For this all defs that were marked as needing
> copies (thus they have uses/defs both in the chain and outside)
> the chain will use a new pseudo that we copy to from scalar sources
> and that we copy from for scalar uses.  There's the new defs_map
> which records the mapping of old to new reg.  pseudos that are
> only used in the chain already are not remapped.
>
> The conversion itself then happens in two stages, first,
> in make_vector_copies, we emit the copy-in insns and
> allocate all pseudos we need.  Then the rest of the
> conversion happens fully inside of convert_insn where
> we generate the copy-outs of the insns def, replace
> defs and uses according to the mapping and replace uses
> and defs with the (subreg:Vmode ..) style.
>
> For PR91527 IRA doesn't like the REG_EQUIV note in
>
> (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0)
> (subreg:V4SI (reg:SI 100) 0))
> "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4
> 1248 {movv4si_internal}
>  (expr_list:REG_DEAD (reg:SI 100)
> (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp)
> (const_int 16 [0x10])) [1 c+0 S4 A64])
> (nil
>
> because the SET_DEST is not a REG_P.  I'm not sure if this
> is invalid RTL, docs say SET_DEST might be a strict_low_part
> or a zero_extract but doesn't mention a subreg.  So I opted
> to simply remove equal/equiv notes on insns we convert
> and since the above has a REG_DEAD note I took the liberty
> to update that according to the mapping (so that would have
> been not needed before this patch) rather than dropping it.
>
> Bootstrapped with and without --with-march=westmere (to get
> some STV coverage, this included all languages) on
> x88_64-unknown-linux-gnu, testing in progress.
>
> OK if testing succeeds?
>
> It still solves the compile-time issue (which is a latent issue,
> btw, and with a carefully crafted testcase can be triggered
> since STV exists for DImode chains with !TARGET_64BIT).
>
> Thanks,
> Richard.
>
> 2019-08-22  Richard Biener  
>
> PR target/91522
> PR target/91527
> * config/i386/i386-features.h (general_scalar_chain::defs_map):
> New member.
> (general_scalar_chain::replace_with_subreg): Remove.
> (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
> (general_scalar_chain::convert_reg): Adjust signature.
> * config/i386/i386-features.c (scalar_chain::add_insn): Do not
> iterate over all defs of a reg.
> (general_scalar_chain::replace_with_subreg): Remove.
> (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
> (general_scalar_chain::make_vector_copies): Populate defs_map,
> place copy only after defs that are used as vectors in the chain.
> (general_scalar_chain::convert_reg): Emit a copy for a specific
> def in a specific instruction.
> (general_scalar_chain::convert_op): All reg uses are converted here.
> (general_scalar_chain::convert_insn): Emit copies for scalar
> uses of defs here.  Replace uses with the copies we created.
> Replace and convert the def.  Adjust REG_DEAD notes, remove
> REG_EQUIV/EQUAL notes.
> (general_scalar_chain::convert_registers): Only handle copies
> into the chain here.
>

This caused:


Re: [PATCH] Fix PR91522

2019-08-28 Thread Uros Bizjak
On Wed, Aug 28, 2019 at 3:54 PM Richard Biener  wrote:
> > >>> 2019-08-23  Richard Biener  
> > >>>
> > >>>  PR target/91522
> > >>>  PR target/91527
> > >>>  * config/i386/i386-features.h (general_scalar_chain::defs_map):
> > >>>  New member.
> > >>>  (general_scalar_chain::replace_with_subreg): Remove.
> > >>>  (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
> > >>>  (general_scalar_chain::convert_reg): Adjust signature.
> > >>>  * config/i386/i386-features.c (scalar_chain::add_insn): Do not
> > >>>  iterate over all defs of a reg.
> > >>>  (general_scalar_chain::replace_with_subreg): Remove.
> > >>>  (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
> > >>>  (general_scalar_chain::make_vector_copies): Populate defs_map,
> > >>>  place copy only after defs that are used as vectors in the chain.
> > >>>  (general_scalar_chain::convert_reg): Emit a copy for a specific
> > >>>  def in a specific instruction.
> > >>>  (general_scalar_chain::convert_op): All reg uses are converted 
> > >>> here.
> > >>>  (general_scalar_chain::convert_insn): Emit copies for scalar
> > >>>  uses of defs here.  Replace uses with the copies we created.
> > >>>  Replace and convert the def.  Adjust REG_DEAD notes, remove
> > >>>  REG_EQUIV/EQUAL notes.
> > >>>  (general_scalar_chain::convert_registers): Only handle copies
> > >>>  into the chain here.
> > >
> > > Rubberstamped with LGTM. It looks you are the master of this domain now ;)
> >
> > This  breaks bootstrap for i686-darwin (and most likely is the cause of 
> > bootstrap fail on
> > i686-linux i686-linux-gnu at 
> > https://gcc.gnu.org/ml/gcc-regression/2019-08/msg00398.html
> > et al)
> > It gives a bunch of compare errors spread around the tree - so no specific 
> > pointer there.
> >
> > There’s a secondary fail overlaying it between 274933-or so and 274980 
> > which confused
> > my initial search.
>
> Please file a bugreport.  Don't have time today to dig into but will do
> tomorrow (but now at least try to reproduce).

Looking at HJ's testreports, it looks configuring  for i686-linux
--with-fpmath=sse should be enough to trigger comparison failure.

Uros.


Re: [PATCH] Fix PR91522

2019-08-28 Thread Richard Biener
On Wed, 28 Aug 2019, Iain Sandoe wrote:

> 
> 
> 
> > On 26 Aug 2019, at 11:06, Uros Bizjak  wrote:
> > 
> > On Mon, Aug 26, 2019 at 10:40 AM Richard Biener  wrote:
> >> 
> >> On Fri, 23 Aug 2019, Richard Biener wrote:
> >> 
> >>> On Fri, 23 Aug 2019, Richard Biener wrote:
> >>> 
>  On Fri, 23 Aug 2019, Uros Bizjak wrote:
>  
> > On Thu, Aug 22, 2019 at 3:35 PM Richard Biener  
> > wrote:
> >> 
> >> 
> >> This fixes quadraticness in STV and makes
> >> 
> >> machine dep reorg  :  89.07 ( 95%)   0.02 ( 18%)  
> >> 89.10 (
> >> 95%)  54 kB (  0%)
> >> 
> >> drop to zero.  Anybody remembers why it is the way it is now?
> >> 
> >> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >> 
> >> OK?
> > 
> > Looking at the PR, comment #3 [1], I assume this patch is obsoltete
> > and will be updated?
> > 
> > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3
>  
>  Yes.  I'm still learning how STV operates (and learing DF and RTL...).
>  The following is a rewrite of the non-TImode chain conversion
>  according to I think how it should operate als allowing the hunk
>  that fixes the compile-time and fixing PR91527 on the way
>  (which I ran into during more extensive testing of the patch myself).
>  
>  So compared to the state before which I still do not 100% understand
>  we now do the following.  Chain detection works as before including
>  recording of all defs (both defined by the insns in the chain and
>  insns outside) that need copy-in or copy-out operations.
>  
>  But then the patch changes things as to guarantee that
>  after the conversion all uses/defs of a pseudo are
>  of the (subreg:Vmode ..) form or of the original scalar form.
>  In particular it avoids the need to change any insns that
>  are not part of the chain (besides emitting the extra copy
>  instructions).  For this all defs that were marked as needing
>  copies (thus they have uses/defs both in the chain and outside)
>  the chain will use a new pseudo that we copy to from scalar sources
>  and that we copy from for scalar uses.  There's the new defs_map
>  which records the mapping of old to new reg.  pseudos that are
>  only used in the chain already are not remapped.
>  
>  The conversion itself then happens in two stages, first,
>  in make_vector_copies, we emit the copy-in insns and
>  allocate all pseudos we need.  Then the rest of the
>  conversion happens fully inside of convert_insn where
>  we generate the copy-outs of the insns def, replace
>  defs and uses according to the mapping and replace uses
>  and defs with the (subreg:Vmode ..) style.
>  
>  For PR91527 IRA doesn't like the REG_EQUIV note in
>  
>  (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0)
> (subreg:V4SI (reg:SI 100) 0))
>  "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4
>  1248 {movv4si_internal}
>  (expr_list:REG_DEAD (reg:SI 100)
> (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp)
> (const_int 16 [0x10])) [1 c+0 S4 A64])
> (nil
>  
>  because the SET_DEST is not a REG_P.  I'm not sure if this
>  is invalid RTL, docs say SET_DEST might be a strict_low_part
>  or a zero_extract but doesn't mention a subreg.  So I opted
>  to simply remove equal/equiv notes on insns we convert
>  and since the above has a REG_DEAD note I took the liberty
>  to update that according to the mapping (so that would have
>  been not needed before this patch) rather than dropping it.
>  
>  Bootstrapped with and without --with-march=westmere (to get
>  some STV coverage, this included all languages) on
>  x88_64-unknown-linux-gnu, testing in progress.
>  
>  OK if testing succeeds?
> >>> 
> >>> Testing revealed I made an error in general_scalar_chain::convert_insn
> >>> failing to move down SET_SRC extraction below replacing with
> >>> the defs map.  This showed in 4 execute FAILs in 32bit fortran
> >>> testing (only).  Fixed by moving down the whole def_set/src/dst
> >>> extraction.
> >>> 
> >>> Re-testing on x86_64-unknown-linux-gnu.
> >> 
> >> Bootstrapped / tested on x86_64-unknown-linux-gnu.  The "no-costmodel"
> >> run runs into the latent PR91528 building 32bit libada in stage3
> >> for a few sources, I've manually built those with -mno-stv and
> >> bootstrap finishes successfully.  I hope HJ can help with this
> >> dynamic stack-alignment issue.
> >> 
> >> So - OK for trunk?
> >> 
> >> As followup we can now remove general_remove_non_convertible_regs
> >> because we can handle defs that cannot be converted just fine
> >> with the patch since we're splitting live-ranges of all defs at
> >> the chain boundary.
> >> 
> >> Thanks,
> >> 

Re: [PATCH] Fix PR91522

2019-08-28 Thread Iain Sandoe




> On 26 Aug 2019, at 11:06, Uros Bizjak  wrote:
> 
> On Mon, Aug 26, 2019 at 10:40 AM Richard Biener  wrote:
>> 
>> On Fri, 23 Aug 2019, Richard Biener wrote:
>> 
>>> On Fri, 23 Aug 2019, Richard Biener wrote:
>>> 
 On Fri, 23 Aug 2019, Uros Bizjak wrote:
 
> On Thu, Aug 22, 2019 at 3:35 PM Richard Biener  wrote:
>> 
>> 
>> This fixes quadraticness in STV and makes
>> 
>> machine dep reorg  :  89.07 ( 95%)   0.02 ( 18%)  89.10 (
>> 95%)  54 kB (  0%)
>> 
>> drop to zero.  Anybody remembers why it is the way it is now?
>> 
>> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>> 
>> OK?
> 
> Looking at the PR, comment #3 [1], I assume this patch is obsoltete
> and will be updated?
> 
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3
 
 Yes.  I'm still learning how STV operates (and learing DF and RTL...).
 The following is a rewrite of the non-TImode chain conversion
 according to I think how it should operate als allowing the hunk
 that fixes the compile-time and fixing PR91527 on the way
 (which I ran into during more extensive testing of the patch myself).
 
 So compared to the state before which I still do not 100% understand
 we now do the following.  Chain detection works as before including
 recording of all defs (both defined by the insns in the chain and
 insns outside) that need copy-in or copy-out operations.
 
 But then the patch changes things as to guarantee that
 after the conversion all uses/defs of a pseudo are
 of the (subreg:Vmode ..) form or of the original scalar form.
 In particular it avoids the need to change any insns that
 are not part of the chain (besides emitting the extra copy
 instructions).  For this all defs that were marked as needing
 copies (thus they have uses/defs both in the chain and outside)
 the chain will use a new pseudo that we copy to from scalar sources
 and that we copy from for scalar uses.  There's the new defs_map
 which records the mapping of old to new reg.  pseudos that are
 only used in the chain already are not remapped.
 
 The conversion itself then happens in two stages, first,
 in make_vector_copies, we emit the copy-in insns and
 allocate all pseudos we need.  Then the rest of the
 conversion happens fully inside of convert_insn where
 we generate the copy-outs of the insns def, replace
 defs and uses according to the mapping and replace uses
 and defs with the (subreg:Vmode ..) style.
 
 For PR91527 IRA doesn't like the REG_EQUIV note in
 
 (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0)
(subreg:V4SI (reg:SI 100) 0))
 "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4
 1248 {movv4si_internal}
 (expr_list:REG_DEAD (reg:SI 100)
(expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp)
(const_int 16 [0x10])) [1 c+0 S4 A64])
(nil
 
 because the SET_DEST is not a REG_P.  I'm not sure if this
 is invalid RTL, docs say SET_DEST might be a strict_low_part
 or a zero_extract but doesn't mention a subreg.  So I opted
 to simply remove equal/equiv notes on insns we convert
 and since the above has a REG_DEAD note I took the liberty
 to update that according to the mapping (so that would have
 been not needed before this patch) rather than dropping it.
 
 Bootstrapped with and without --with-march=westmere (to get
 some STV coverage, this included all languages) on
 x88_64-unknown-linux-gnu, testing in progress.
 
 OK if testing succeeds?
>>> 
>>> Testing revealed I made an error in general_scalar_chain::convert_insn
>>> failing to move down SET_SRC extraction below replacing with
>>> the defs map.  This showed in 4 execute FAILs in 32bit fortran
>>> testing (only).  Fixed by moving down the whole def_set/src/dst
>>> extraction.
>>> 
>>> Re-testing on x86_64-unknown-linux-gnu.
>> 
>> Bootstrapped / tested on x86_64-unknown-linux-gnu.  The "no-costmodel"
>> run runs into the latent PR91528 building 32bit libada in stage3
>> for a few sources, I've manually built those with -mno-stv and
>> bootstrap finishes successfully.  I hope HJ can help with this
>> dynamic stack-alignment issue.
>> 
>> So - OK for trunk?
>> 
>> As followup we can now remove general_remove_non_convertible_regs
>> because we can handle defs that cannot be converted just fine
>> with the patch since we're splitting live-ranges of all defs at
>> the chain boundary.
>> 
>> Thanks,
>> Richard.
>> 
>>> Updated patch below.  I'm feeling adventurous and will run
>>> the "westmere" bootstrap with costing disabled (aka always
>>> convert detected chains...).
>>> 
>>> Richard.
>>> 
>>> 2019-08-23  Richard Biener  
>>> 
>>>  PR target/91522
>>>  PR target/91527

Re: [PATCH] Fix PR91522

2019-08-26 Thread Uros Bizjak
On Mon, Aug 26, 2019 at 10:40 AM Richard Biener  wrote:
>
> On Fri, 23 Aug 2019, Richard Biener wrote:
>
> > On Fri, 23 Aug 2019, Richard Biener wrote:
> >
> > > On Fri, 23 Aug 2019, Uros Bizjak wrote:
> > >
> > > > On Thu, Aug 22, 2019 at 3:35 PM Richard Biener  
> > > > wrote:
> > > > >
> > > > >
> > > > > This fixes quadraticness in STV and makes
> > > > >
> > > > >  machine dep reorg  :  89.07 ( 95%)   0.02 ( 18%)  
> > > > > 89.10 (
> > > > > 95%)  54 kB (  0%)
> > > > >
> > > > > drop to zero.  Anybody remembers why it is the way it is now?
> > > > >
> > > > > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> > > > >
> > > > > OK?
> > > >
> > > > Looking at the PR, comment #3 [1], I assume this patch is obsoltete
> > > > and will be updated?
> > > >
> > > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3
> > >
> > > Yes.  I'm still learning how STV operates (and learing DF and RTL...).
> > > The following is a rewrite of the non-TImode chain conversion
> > > according to I think how it should operate als allowing the hunk
> > > that fixes the compile-time and fixing PR91527 on the way
> > > (which I ran into during more extensive testing of the patch myself).
> > >
> > > So compared to the state before which I still do not 100% understand
> > > we now do the following.  Chain detection works as before including
> > > recording of all defs (both defined by the insns in the chain and
> > > insns outside) that need copy-in or copy-out operations.
> > >
> > > But then the patch changes things as to guarantee that
> > > after the conversion all uses/defs of a pseudo are
> > > of the (subreg:Vmode ..) form or of the original scalar form.
> > > In particular it avoids the need to change any insns that
> > > are not part of the chain (besides emitting the extra copy
> > > instructions).  For this all defs that were marked as needing
> > > copies (thus they have uses/defs both in the chain and outside)
> > > the chain will use a new pseudo that we copy to from scalar sources
> > > and that we copy from for scalar uses.  There's the new defs_map
> > > which records the mapping of old to new reg.  pseudos that are
> > > only used in the chain already are not remapped.
> > >
> > > The conversion itself then happens in two stages, first,
> > > in make_vector_copies, we emit the copy-in insns and
> > > allocate all pseudos we need.  Then the rest of the
> > > conversion happens fully inside of convert_insn where
> > > we generate the copy-outs of the insns def, replace
> > > defs and uses according to the mapping and replace uses
> > > and defs with the (subreg:Vmode ..) style.
> > >
> > > For PR91527 IRA doesn't like the REG_EQUIV note in
> > >
> > > (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0)
> > > (subreg:V4SI (reg:SI 100) 0))
> > > "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4
> > > 1248 {movv4si_internal}
> > >  (expr_list:REG_DEAD (reg:SI 100)
> > > (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp)
> > > (const_int 16 [0x10])) [1 c+0 S4 A64])
> > > (nil
> > >
> > > because the SET_DEST is not a REG_P.  I'm not sure if this
> > > is invalid RTL, docs say SET_DEST might be a strict_low_part
> > > or a zero_extract but doesn't mention a subreg.  So I opted
> > > to simply remove equal/equiv notes on insns we convert
> > > and since the above has a REG_DEAD note I took the liberty
> > > to update that according to the mapping (so that would have
> > > been not needed before this patch) rather than dropping it.
> > >
> > > Bootstrapped with and without --with-march=westmere (to get
> > > some STV coverage, this included all languages) on
> > > x88_64-unknown-linux-gnu, testing in progress.
> > >
> > > OK if testing succeeds?
> >
> > Testing revealed I made an error in general_scalar_chain::convert_insn
> > failing to move down SET_SRC extraction below replacing with
> > the defs map.  This showed in 4 execute FAILs in 32bit fortran
> > testing (only).  Fixed by moving down the whole def_set/src/dst
> > extraction.
> >
> > Re-testing on x86_64-unknown-linux-gnu.
>
> Bootstrapped / tested on x86_64-unknown-linux-gnu.  The "no-costmodel"
> run runs into the latent PR91528 building 32bit libada in stage3
> for a few sources, I've manually built those with -mno-stv and
> bootstrap finishes successfully.  I hope HJ can help with this
> dynamic stack-alignment issue.
>
> So - OK for trunk?
>
> As followup we can now remove general_remove_non_convertible_regs
> because we can handle defs that cannot be converted just fine
> with the patch since we're splitting live-ranges of all defs at
> the chain boundary.
>
> Thanks,
> Richard.
>
> > Updated patch below.  I'm feeling adventurous and will run
> > the "westmere" bootstrap with costing disabled (aka always
> > convert detected chains...).
> >
> > Richard.
> >
> > 2019-08-23  Richard Biener  
> >
> >   PR target/91522
> 

Re: [PATCH] Fix PR91522

2019-08-26 Thread Richard Biener
On Fri, 23 Aug 2019, Richard Biener wrote:

> On Fri, 23 Aug 2019, Richard Biener wrote:
> 
> > On Fri, 23 Aug 2019, Uros Bizjak wrote:
> > 
> > > On Thu, Aug 22, 2019 at 3:35 PM Richard Biener  wrote:
> > > >
> > > >
> > > > This fixes quadraticness in STV and makes
> > > >
> > > >  machine dep reorg  :  89.07 ( 95%)   0.02 ( 18%)  
> > > > 89.10 (
> > > > 95%)  54 kB (  0%)
> > > >
> > > > drop to zero.  Anybody remembers why it is the way it is now?
> > > >
> > > > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> > > >
> > > > OK?
> > > 
> > > Looking at the PR, comment #3 [1], I assume this patch is obsoltete
> > > and will be updated?
> > > 
> > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3
> > 
> > Yes.  I'm still learning how STV operates (and learing DF and RTL...).
> > The following is a rewrite of the non-TImode chain conversion
> > according to I think how it should operate als allowing the hunk
> > that fixes the compile-time and fixing PR91527 on the way
> > (which I ran into during more extensive testing of the patch myself).
> > 
> > So compared to the state before which I still do not 100% understand
> > we now do the following.  Chain detection works as before including
> > recording of all defs (both defined by the insns in the chain and
> > insns outside) that need copy-in or copy-out operations.
> > 
> > But then the patch changes things as to guarantee that
> > after the conversion all uses/defs of a pseudo are
> > of the (subreg:Vmode ..) form or of the original scalar form.
> > In particular it avoids the need to change any insns that
> > are not part of the chain (besides emitting the extra copy
> > instructions).  For this all defs that were marked as needing
> > copies (thus they have uses/defs both in the chain and outside)
> > the chain will use a new pseudo that we copy to from scalar sources
> > and that we copy from for scalar uses.  There's the new defs_map
> > which records the mapping of old to new reg.  pseudos that are
> > only used in the chain already are not remapped.
> > 
> > The conversion itself then happens in two stages, first,
> > in make_vector_copies, we emit the copy-in insns and
> > allocate all pseudos we need.  Then the rest of the
> > conversion happens fully inside of convert_insn where
> > we generate the copy-outs of the insns def, replace
> > defs and uses according to the mapping and replace uses
> > and defs with the (subreg:Vmode ..) style.
> > 
> > For PR91527 IRA doesn't like the REG_EQUIV note in
> > 
> > (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0)
> > (subreg:V4SI (reg:SI 100) 0)) 
> > "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4
> >  
> > 1248 {movv4si_internal}
> >  (expr_list:REG_DEAD (reg:SI 100)
> > (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp)
> > (const_int 16 [0x10])) [1 c+0 S4 A64])
> > (nil
> > 
> > because the SET_DEST is not a REG_P.  I'm not sure if this
> > is invalid RTL, docs say SET_DEST might be a strict_low_part
> > or a zero_extract but doesn't mention a subreg.  So I opted
> > to simply remove equal/equiv notes on insns we convert
> > and since the above has a REG_DEAD note I took the liberty
> > to update that according to the mapping (so that would have
> > been not needed before this patch) rather than dropping it.
> > 
> > Bootstrapped with and without --with-march=westmere (to get
> > some STV coverage, this included all languages) on 
> > x88_64-unknown-linux-gnu, testing in progress.
> > 
> > OK if testing succeeds?
> 
> Testing revealed I made an error in general_scalar_chain::convert_insn
> failing to move down SET_SRC extraction below replacing with
> the defs map.  This showed in 4 execute FAILs in 32bit fortran
> testing (only).  Fixed by moving down the whole def_set/src/dst
> extraction.
> 
> Re-testing on x86_64-unknown-linux-gnu.

Bootstrapped / tested on x86_64-unknown-linux-gnu.  The "no-costmodel"
run runs into the latent PR91528 building 32bit libada in stage3
for a few sources, I've manually built those with -mno-stv and
bootstrap finishes successfully.  I hope HJ can help with this
dynamic stack-alignment issue.

So - OK for trunk?

As followup we can now remove general_remove_non_convertible_regs
because we can handle defs that cannot be converted just fine
with the patch since we're splitting live-ranges of all defs at
the chain boundary.

Thanks,
Richard.

> Updated patch below.  I'm feeling adventurous and will run
> the "westmere" bootstrap with costing disabled (aka always
> convert detected chains...).
> 
> Richard.
> 
> 2019-08-23  Richard Biener  
> 
>   PR target/91522
>   PR target/91527
>   * config/i386/i386-features.h (general_scalar_chain::defs_map):
>   New member.
>   (general_scalar_chain::replace_with_subreg): Remove.
>   (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
>   

Re: [PATCH] Fix PR91522

2019-08-23 Thread Richard Biener
On Fri, 23 Aug 2019, Richard Biener wrote:

> On Fri, 23 Aug 2019, Uros Bizjak wrote:
> 
> > On Thu, Aug 22, 2019 at 3:35 PM Richard Biener  wrote:
> > >
> > >
> > > This fixes quadraticness in STV and makes
> > >
> > >  machine dep reorg  :  89.07 ( 95%)   0.02 ( 18%)  89.10 (
> > > 95%)  54 kB (  0%)
> > >
> > > drop to zero.  Anybody remembers why it is the way it is now?
> > >
> > > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> > >
> > > OK?
> > 
> > Looking at the PR, comment #3 [1], I assume this patch is obsoltete
> > and will be updated?
> > 
> > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3
> 
> Yes.  I'm still learning how STV operates (and learing DF and RTL...).
> The following is a rewrite of the non-TImode chain conversion
> according to I think how it should operate als allowing the hunk
> that fixes the compile-time and fixing PR91527 on the way
> (which I ran into during more extensive testing of the patch myself).
> 
> So compared to the state before which I still do not 100% understand
> we now do the following.  Chain detection works as before including
> recording of all defs (both defined by the insns in the chain and
> insns outside) that need copy-in or copy-out operations.
> 
> But then the patch changes things as to guarantee that
> after the conversion all uses/defs of a pseudo are
> of the (subreg:Vmode ..) form or of the original scalar form.
> In particular it avoids the need to change any insns that
> are not part of the chain (besides emitting the extra copy
> instructions).  For this all defs that were marked as needing
> copies (thus they have uses/defs both in the chain and outside)
> the chain will use a new pseudo that we copy to from scalar sources
> and that we copy from for scalar uses.  There's the new defs_map
> which records the mapping of old to new reg.  pseudos that are
> only used in the chain already are not remapped.
> 
> The conversion itself then happens in two stages, first,
> in make_vector_copies, we emit the copy-in insns and
> allocate all pseudos we need.  Then the rest of the
> conversion happens fully inside of convert_insn where
> we generate the copy-outs of the insns def, replace
> defs and uses according to the mapping and replace uses
> and defs with the (subreg:Vmode ..) style.
> 
> For PR91527 IRA doesn't like the REG_EQUIV note in
> 
> (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0)
> (subreg:V4SI (reg:SI 100) 0)) 
> "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4
>  
> 1248 {movv4si_internal}
>  (expr_list:REG_DEAD (reg:SI 100)
> (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp)
> (const_int 16 [0x10])) [1 c+0 S4 A64])
> (nil
> 
> because the SET_DEST is not a REG_P.  I'm not sure if this
> is invalid RTL, docs say SET_DEST might be a strict_low_part
> or a zero_extract but doesn't mention a subreg.  So I opted
> to simply remove equal/equiv notes on insns we convert
> and since the above has a REG_DEAD note I took the liberty
> to update that according to the mapping (so that would have
> been not needed before this patch) rather than dropping it.
> 
> Bootstrapped with and without --with-march=westmere (to get
> some STV coverage, this included all languages) on 
> x88_64-unknown-linux-gnu, testing in progress.
> 
> OK if testing succeeds?

Testing revealed I made an error in general_scalar_chain::convert_insn
failing to move down SET_SRC extraction below replacing with
the defs map.  This showed in 4 execute FAILs in 32bit fortran
testing (only).  Fixed by moving down the whole def_set/src/dst
extraction.

Re-testing on x86_64-unknown-linux-gnu.

Updated patch below.  I'm feeling adventurous and will run
the "westmere" bootstrap with costing disabled (aka always
convert detected chains...).

Richard.

2019-08-23  Richard Biener  

PR target/91522
PR target/91527
* config/i386/i386-features.h (general_scalar_chain::defs_map):
New member.
(general_scalar_chain::replace_with_subreg): Remove.
(general_scalar_chain::replace_with_subreg_in_insn): Likewise.
(general_scalar_chain::convert_reg): Adjust signature.
* config/i386/i386-features.c (scalar_chain::add_insn): Do not
iterate over all defs of a reg.
(general_scalar_chain::replace_with_subreg): Remove.
(general_scalar_chain::replace_with_subreg_in_insn): Likewise.
(general_scalar_chain::make_vector_copies): Populate defs_map,
place copy only after defs that are used as vectors in the chain.
(general_scalar_chain::convert_reg): Emit a copy for a specific
def in a specific instruction.
(general_scalar_chain::convert_op): All reg uses are converted here.
(general_scalar_chain::convert_insn): Emit copies for scalar
uses of defs here.  Replace uses with the copies we created.
Replace and convert 

Re: [PATCH] Fix PR91522

2019-08-23 Thread Richard Biener
On Fri, 23 Aug 2019, Uros Bizjak wrote:

> On Thu, Aug 22, 2019 at 3:35 PM Richard Biener  wrote:
> >
> >
> > This fixes quadraticness in STV and makes
> >
> >  machine dep reorg  :  89.07 ( 95%)   0.02 ( 18%)  89.10 (
> > 95%)  54 kB (  0%)
> >
> > drop to zero.  Anybody remembers why it is the way it is now?
> >
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >
> > OK?
> 
> Looking at the PR, comment #3 [1], I assume this patch is obsoltete
> and will be updated?
> 
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3

Yes.  I'm still learning how STV operates (and learing DF and RTL...).
The following is a rewrite of the non-TImode chain conversion
according to I think how it should operate als allowing the hunk
that fixes the compile-time and fixing PR91527 on the way
(which I ran into during more extensive testing of the patch myself).

So compared to the state before which I still do not 100% understand
we now do the following.  Chain detection works as before including
recording of all defs (both defined by the insns in the chain and
insns outside) that need copy-in or copy-out operations.

But then the patch changes things as to guarantee that
after the conversion all uses/defs of a pseudo are
of the (subreg:Vmode ..) form or of the original scalar form.
In particular it avoids the need to change any insns that
are not part of the chain (besides emitting the extra copy
instructions).  For this all defs that were marked as needing
copies (thus they have uses/defs both in the chain and outside)
the chain will use a new pseudo that we copy to from scalar sources
and that we copy from for scalar uses.  There's the new defs_map
which records the mapping of old to new reg.  pseudos that are
only used in the chain already are not remapped.

The conversion itself then happens in two stages, first,
in make_vector_copies, we emit the copy-in insns and
allocate all pseudos we need.  Then the rest of the
conversion happens fully inside of convert_insn where
we generate the copy-outs of the insns def, replace
defs and uses according to the mapping and replace uses
and defs with the (subreg:Vmode ..) style.

For PR91527 IRA doesn't like the REG_EQUIV note in

(insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0)
(subreg:V4SI (reg:SI 100) 0)) 
"/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4 
1248 {movv4si_internal}
 (expr_list:REG_DEAD (reg:SI 100)
(expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp)
(const_int 16 [0x10])) [1 c+0 S4 A64])
(nil

because the SET_DEST is not a REG_P.  I'm not sure if this
is invalid RTL, docs say SET_DEST might be a strict_low_part
or a zero_extract but doesn't mention a subreg.  So I opted
to simply remove equal/equiv notes on insns we convert
and since the above has a REG_DEAD note I took the liberty
to update that according to the mapping (so that would have
been not needed before this patch) rather than dropping it.

Bootstrapped with and without --with-march=westmere (to get
some STV coverage, this included all languages) on 
x88_64-unknown-linux-gnu, testing in progress.

OK if testing succeeds?

It still solves the compile-time issue (which is a latent issue,
btw, and with a carefully crafted testcase can be triggered
since STV exists for DImode chains with !TARGET_64BIT).

Thanks,
Richard.

2019-08-22  Richard Biener  

PR target/91522
PR target/91527
* config/i386/i386-features.h (general_scalar_chain::defs_map):
New member.
(general_scalar_chain::replace_with_subreg): Remove.
(general_scalar_chain::replace_with_subreg_in_insn): Likewise.
(general_scalar_chain::convert_reg): Adjust signature.
* config/i386/i386-features.c (scalar_chain::add_insn): Do not
iterate over all defs of a reg.
(general_scalar_chain::replace_with_subreg): Remove.
(general_scalar_chain::replace_with_subreg_in_insn): Likewise.
(general_scalar_chain::make_vector_copies): Populate defs_map,
place copy only after defs that are used as vectors in the chain.
(general_scalar_chain::convert_reg): Emit a copy for a specific
def in a specific instruction.
(general_scalar_chain::convert_op): All reg uses are converted here.
(general_scalar_chain::convert_insn): Emit copies for scalar
uses of defs here.  Replace uses with the copies we created.
Replace and convert the def.  Adjust REG_DEAD notes, remove
REG_EQUIV/EQUAL notes.
(general_scalar_chain::convert_registers): Only handle copies
into the chain here.

Index: gcc/config/i386/i386-features.c
===
--- gcc/config/i386/i386-features.c (revision 274843)
+++ gcc/config/i386/i386-features.c (working copy)
@@ -416,13 +416,9 @@ scalar_chain::add_insn (bitmap candidate
  iterates over 

Re: [PATCH] Fix PR91522

2019-08-23 Thread Uros Bizjak
On Thu, Aug 22, 2019 at 3:35 PM Richard Biener  wrote:
>
>
> This fixes quadraticness in STV and makes
>
>  machine dep reorg  :  89.07 ( 95%)   0.02 ( 18%)  89.10 (
> 95%)  54 kB (  0%)
>
> drop to zero.  Anybody remembers why it is the way it is now?
>
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>
> OK?

Looking at the PR, comment #3 [1], I assume this patch is obsoltete
and will be updated?

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3

Uros.

> Thanks,
> Richard.
>
> 2019-08-22  Richard Biener  
>
> PR target/91522
> * config/i386/i386-features.c (scalar_chain::add_insn): Do not
> iterate over all defs of a reg.
>
> Index: gcc/config/i386/i386-features.c
> ===
> --- gcc/config/i386/i386-features.c (revision 274764)
> +++ gcc/config/i386/i386-features.c (working copy)
> @@ -419,10 +419,7 @@ scalar_chain::add_insn (bitmap candidate
>df_ref def;
>for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
>  if (!HARD_REGISTER_P (DF_REF_REG (ref)))
> -  for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref));
> -  def;
> -  def = DF_REF_NEXT_REG (def))
> -   analyze_register_chain (candidates, def);
> +  analyze_register_chain (candidates, ref);
>for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
>  if (!DF_REF_REG_MEM_P (ref))
>analyze_register_chain (candidates, ref);
>


[PATCH] Fix PR91522

2019-08-22 Thread Richard Biener


This fixes quadraticness in STV and makes

 machine dep reorg  :  89.07 ( 95%)   0.02 ( 18%)  89.10 ( 
95%)  54 kB (  0%)

drop to zero.  Anybody remembers why it is the way it is now?

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

2019-08-22  Richard Biener  

PR target/91522
* config/i386/i386-features.c (scalar_chain::add_insn): Do not
iterate over all defs of a reg.

Index: gcc/config/i386/i386-features.c
===
--- gcc/config/i386/i386-features.c (revision 274764)
+++ gcc/config/i386/i386-features.c (working copy)
@@ -419,10 +419,7 @@ scalar_chain::add_insn (bitmap candidate
   df_ref def;
   for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
 if (!HARD_REGISTER_P (DF_REF_REG (ref)))
-  for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref));
-  def;
-  def = DF_REF_NEXT_REG (def))
-   analyze_register_chain (candidates, def);
+  analyze_register_chain (candidates, ref);
   for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
 if (!DF_REF_REG_MEM_P (ref))
   analyze_register_chain (candidates, ref);