[Bug target/70322] STV doesn't optimize andn

2022-03-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70322

--- Comment #11 from CVS Commits  ---
The master branch has been updated by Roger Sayle :

https://gcc.gnu.org/g:8ea4a34bd0b0a46277b5e077c89cbd86dfb09c48

commit r12-7502-g8ea4a34bd0b0a46277b5e077c89cbd86dfb09c48
Author: Roger Sayle 
Date:   Sat Mar 5 08:50:45 2022 +

PR 104732: Simplify/fix DI mode logic expansion/splitting on -m32.

This clean-up patch resolves PR testsuite/104732, the failure of the recent
test gcc.target/i386/pr100711-1.c on 32-bit Solaris/x86.  Rather than just
tweak the testcase, the proposed approach is to fix the underlying problem
by removing the "TARGET_STV && TARGET_SSE2" conditionals from the DI mode
logical operation expanders and pre-reload splitters in i386.md, which as
I'll show generate inferior code (even a GCC 12 regression) on !TARGET_64BIT
whenever -mno-stv (such as on Solaris) or -msse (but not -msse2) is in effect.

First a little bit of history.  In the beginning, DImode operations on
i386 weren't defined by the machine description, and were lowered during RTL
expansion to SI mode operations.  Then with PR 65105 in 2015, -mstv was
added, together with a SWIM1248x mode iterator (later renamed to SWIM1248x)
together with several *<code>di3_doubleword post-reload splitters that
made use of register allocation to perform some double word operations
in 64-bit XMM registers.  A short while later in 2016, PR 70322 added
similar support for one_cmpldi2.  All of this logic was dependent upon
"!TARGET_64BIT && TARGET_STV && TARGET_SSE2".  With the passing of time,
these conditions became irrelevant when in 2019, it was decided to split
these double-word patterns before reload.
https://gcc.gnu.org/pipermail/gcc-patches/2019-June/523877.html
https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532236.html
Hence the current situation, where on most modern CPU architectures
(where "TARGET_STV && TARGET_SSE2" is true), RTL is expanded with DI
mode operations, which are then split into two SI mode instructions
before reload, except on Solaris and other odd cases, where the splitting
to two SI mode instructions is done during RTL expansion.  By the
time compilation reaches register allocation both paths in theory
produce identical or similar code, so the vestigial legacy/logic would
appear to be harmless.
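
As a rough illustration of the doubleword lowering described above, the
following C sketch models a DI mode AND as two independent SI mode ANDs.
The type and function names (di_halves, split_di, join_di,
and_di_as_two_si) are hypothetical and do not appear in GCC; the sketch
only mirrors the arithmetic, not the RTL machinery.

```c
#include <stdint.h>

/* Hypothetical model: a 64-bit (DImode) value viewed as two 32-bit
   (SImode) halves.  These names are illustrative, not GCC's.  */
typedef struct { uint32_t lo, hi; } di_halves;

di_halves split_di (uint64_t v)
{
  di_halves h = { (uint32_t) v, (uint32_t) (v >> 32) };
  return h;
}

uint64_t join_di (di_halves h)
{
  return ((uint64_t) h.hi << 32) | h.lo;
}

/* A 64-bit AND carried out as two independent 32-bit ANDs, which is
   what a doubleword split amounts to for a bitwise operation.  */
uint64_t and_di_as_two_si (uint64_t a, uint64_t b)
{
  di_halves x = split_di (a);
  di_halves y = split_di (b);
  di_halves r;
  r.lo = x.lo & y.lo;
  r.hi = x.hi & y.hi;
  return join_di (r);
}
```

Because bitwise operations have no carries between the halves, the
result is the same whether the split happens at expansion time or
later; only the point at which the middle-end sees DI mode differs.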

Unfortunately, there is one place where this arbitrary choice of how
to lower DI mode doubleword operations is visible to the middle-end:
it controls whether the backend appears to have a suitable optab, and
the presence (or not) of DImode optabs can influence vectorization
cost models and veclower decisions.

The issue (and code quality regression) can be seen in this test case:

typedef long long v2di __attribute__((vector_size (16)));
v2di x;
void foo (long long a)
{
v2di t = {a, a};
x = ~t;
}

which when compiled with "-O2 -m32 -msse -march=pentiumpro" produces:

foo:    subl    $28, %esp
        movl    %ebx, 16(%esp)
        movl    32(%esp), %eax
        movl    %esi, 20(%esp)
        movl    36(%esp), %edx
        movl    %edi, 24(%esp)
        movl    %eax, %esi
        movl    %eax, %edi
        movl    %edx, %ebx
        movl    %edx, %ecx
        notl    %esi
        notl    %ebx
        movl    %esi, (%esp)
        notl    %edi
        notl    %ecx
        movl    %ebx, 4(%esp)
        movl    20(%esp), %esi
        movl    %edi, 8(%esp)
        movl    16(%esp), %ebx
        movl    %ecx, 12(%esp)
        movl    24(%esp), %edi
        movss   8(%esp), %xmm1
        movss   12(%esp), %xmm2
        movss   (%esp), %xmm0
        movss   4(%esp), %xmm3
        unpcklps %xmm2, %xmm1
        unpcklps %xmm3, %xmm0
        movlhps %xmm1, %xmm0
        movaps  %xmm0, x
        addl    $28, %esp
        ret

Importantly, notice the four "notl" instructions.  With this patch:

foo:    subl    $28, %esp
        movl    32(%esp), %edx
        movl    36(%esp), %eax
        notl    %edx
        movl    %edx, (%esp)
        notl    %eax
        movl    %eax, 4(%esp)
        movl    %edx, 8(%esp)
        movl    %eax, 12(%esp)
        movaps  (%esp), %xmm1
        movaps  %xmm1, x
        addl    $28, %esp
        ret

Notice only two "notl" instructions.  Checking with godbolt.org, GCC
generated 4 NOTs in GCC 4.x and 5.x, 2 NOTs between GCC 6.x and 9.x,
and regressed to 4 NOTs since GCC 10.x [which hopefully qualifies
this clean-up as suitable for stage 4].
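
To see why two "notl" instructions are enough, note that both lanes of
the vector hold the same value, so complementing once and then
duplicating gives the same result as duplicating and then complementing
each lane.  The C sketch below checks that equivalence on a plain model
of a two-lane vector; v2di_model and the function names are
hypothetical and are not part of the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for a V2DI value: two 64-bit lanes.  */
typedef struct { uint64_t lane[2]; } v2di_model;

/* Duplicate first, then complement each lane: two 64-bit NOTs,
   which on a 32-bit target means four 32-bit NOTs.  */
v2di_model not_after_splat (uint64_t a)
{
  v2di_model t = { { a, a } };
  t.lane[0] = ~t.lane[0];
  t.lane[1] = ~t.lane[1];
  return t;
}

/* Complement once, then duplicate: one 64-bit NOT, which on a
   32-bit target means two 32-bit NOTs.  */
v2di_model splat_after_not (uint64_t a)
{
  uint64_t n = ~a;
  v2di_model t = { { n, n } };
  return t;
}

/* Returns 1 when both orders produce identical lanes.  */
int same_result (uint64_t a)
{
  v2di_model p = not_after_splat (a);
  v2di_model q = splat_after_not (a);
  return p.lane[0] == q.lane[0] && p.lane[1] == q.lane[1];
}
```

Here same_result returns 1 for every input, since the two lanes are
computed from the same scalar.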

Most significantly, this patch allows pr100711-1.c to pass with
-mno-stv, allowing pandn to be used with V2DImode on Solaris/x86.
Fingers-crossed this should reduce the number of discrepancies

[Bug target/70322] STV doesn't optimize andn

2016-12-04 Thread uros at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70322

--- Comment #10 from uros at gcc dot gnu.org ---
Author: uros
Date: Sun Dec  4 14:38:05 2016
New Revision: 243228

URL: https://gcc.gnu.org/viewcvs?rev=243228&root=gcc&view=rev
Log:
PR target/70322
* config/i386/i386.c (dimode_scalar_to_vector_candidate_p): Handle NEG.
(dimode_scalar_chain::compute_convert_gain): Ditto.
(dimode_scalar_chain::convert_insn): Ditto.

testsuite/ChangeLog:

PR target/70322
* gcc.target/i386/pr70322-4.c: New test.


Added:
trunk/gcc/testsuite/gcc.target/i386/pr70322-4.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/testsuite/ChangeLog

[Bug target/70322] STV doesn't optimize andn

2016-12-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70322

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Jakub Jelinek  ---
Fixed.

[Bug target/70322] STV doesn't optimize andn

2016-12-02 Thread uros at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70322

--- Comment #8 from uros at gcc dot gnu.org ---
Author: uros
Date: Fri Dec  2 18:48:35 2016
New Revision: 243202

URL: https://gcc.gnu.org/viewcvs?rev=243202&root=gcc&view=rev
Log:
PR target/70322
* config/i386/i386.md (*andndi3_doubleword): Add non-BMI alternative
and corresponding post-reload splitter.

testsuite/ChangeLog:

PR target/70322
* gcc.target/i386/pr70322-2.c (dg-final): Remove xfail.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.md
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/i386/pr70322-2.c

[Bug target/70322] STV doesn't optimize andn

2016-12-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70322

--- Comment #7 from Jakub Jelinek  ---
Author: jakub
Date: Fri Dec  2 16:28:41 2016
New Revision: 243195

URL: https://gcc.gnu.org/viewcvs?rev=243195&root=gcc&view=rev
Log:
PR target/70322
* config/i386/i386.c (dimode_scalar_to_vector_candidate_p): Handle
NOT.
(dimode_scalar_chain::compute_convert_gain): Likewise.
(dimode_scalar_chain::convert_insn): Likewise.
* config/i386/i386.md (*one_cmpldi2_doubleword): New
define_insn_and_split.
(one_cmpl<mode>2): Use SWIM1248x iterator instead of SWIM.

* gcc.target/i386/pr70322-1.c: New test.
* gcc.target/i386/pr70322-2.c: New test.
* gcc.target/i386/pr70322-3.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr70322-1.c
trunk/gcc/testsuite/gcc.target/i386/pr70322-2.c
trunk/gcc/testsuite/gcc.target/i386/pr70322-3.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/config/i386/i386.md
trunk/gcc/testsuite/ChangeLog

[Bug target/70322] STV doesn't optimize andn

2016-12-01 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70322

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
Created attachment 40214
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40214&action=edit
gcc7-pr70322.patch

Untested patch I'm playing with.

[Bug target/70322] STV doesn't optimize andn

2016-03-21 Thread ienkovich at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70322

--- Comment #5 from Ilya Enkovich  ---
STV is a scalar to vector converter.  It doesn't combine two instructions into
a single ANDN; it searches for existing ANDN patterns and converts them into
vector mode.  Combine is responsible for producing ANDN out of two
instructions.  With a proper one_cmpl pattern, combine should be able to handle
it.
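
For illustration only, here is a small C sketch of the kind of source
this is about: a 64-bit complement followed by an AND, next to the
single-expression form.  The function names are hypothetical and this
is not one of the pr70322 test cases.  The assumption is that on a
32-bit target each statement in the first function starts out as its
own DImode operation, and that merging them into one and-not pattern is
the job of combine rather than of STV.

```c
#include <stdint.h>

/* Complement and AND written as two separate steps.  Assumed to
   reach RTL as a one's complement followed by an AND.  */
uint64_t andn_two_steps (uint64_t x, uint64_t y)
{
  uint64_t nx = ~x;   /* one's complement */
  return nx & y;      /* and */
}

/* The same computation as a single "~x & y" expression.  */
uint64_t andn_single_expr (uint64_t x, uint64_t y)
{
  return ~x & y;
}
```

Both functions compute the same value; what the thread discusses is
only which instructions are selected for them when compiling with
options such as -O2 -m32 -msse2.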

TARGET_BMI is required for the scalar version.  The vector version doesn't
require BMI, but if the instruction is not converted into a vector one then we
split it into BMI instructions.

I had ANDN support for non-BMI targets in my plans.  But you can't just remove
TARGET_BMI from the existing pattern.  An additional split is needed, as
mentioned in [1].

BTW I expect ANDN support for non-BMI targets should improve 462.libquantum
performance on Silvermont.  It should be used to test a fix.

[1] https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01229.html

[Bug target/70322] STV doesn't optimize andn

2016-03-21 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70322

--- Comment #4 from H.J. Lu  ---
(In reply to Ilya Enkovich from comment #3)
> (In reply to H.J. Lu from comment #0)
> > i386.md has
> > 
> > (define_insn_and_split "*andndi3_doubleword"
> >   [(set (match_operand:DI 0 "register_operand" "=r,r")
> > (and:DI
> >   (not:DI (match_operand:DI 1 "register_operand" "r,r"))
> >   (match_operand:DI 2 "nonimmediate_operand" "r,m")))
> >(clobber (reg:CC FLAGS_REG))]
> >   "TARGET_BMI && !TARGET_64BIT && TARGET_STV && TARGET_SSE"
> >   "#"
> > 
> > But it is never used:
> 
> gcc.target/i386/pr65105-5.c checks it actually works

STV converts

(insn 24 23 25 4 (parallel [
(set (reg:DI 103)
(and:DI (reg:DI 89 [ _10 ])
(reg/v:DI 98 [ p2 ])))
(clobber (reg:CC 17 flags))
])
/export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.target/i386/pr65105-5.c:20
394 {*anddi3_doubleword}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(insn 25 24 26 4 (parallel [
(set (reg:DI 104)
(xor:DI (reg/v:DI 98 [ p2 ])
(reg:DI 103)))
(clobber (reg:CC 17 flags))
])
/export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.target/i386/pr65105-5.c:20
421 {*xordi3_doubleword}

to

(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (set (subreg:V2DI (reg:DI 104) 0)
(and:V2DI (not:V2DI (subreg:V2DI (reg:DI 89 [ _10 ]) 0)) 
(subreg:V2DI (reg/v:DI 98 [ p2 ]) 0)))
/export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.target/i386/pr65105-5.c:20
3479 {*andnotv2di3}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))

It shouldn't require BMI and it doesn't handle "~x & y".

[Bug target/70322] STV doesn't optimize andn

2016-03-21 Thread ienkovich at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70322

Ilya Enkovich  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ienkovich at gcc dot gnu.org

--- Comment #3 from Ilya Enkovich  ---
(In reply to H.J. Lu from comment #0)
> i386.md has
> 
> (define_insn_and_split "*andndi3_doubleword"
>   [(set (match_operand:DI 0 "register_operand" "=r,r")
> (and:DI
>   (not:DI (match_operand:DI 1 "register_operand" "r,r"))
>   (match_operand:DI 2 "nonimmediate_operand" "r,m")))
>(clobber (reg:CC FLAGS_REG))]
>   "TARGET_BMI && !TARGET_64BIT && TARGET_STV && TARGET_SSE"
>   "#"
> 
> But it is never used:

gcc.target/i386/pr65105-5.c checks it actually works

[Bug target/70322] STV doesn't optimize andn

2016-03-20 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70322

--- Comment #2 from H.J. Lu  ---
TARGET_BMI isn't needed in *andndi3_doubleword since combine won't
generate BMI andn patterns unless BMI is enabled.

[Bug target/70322] STV doesn't optimize andn

2016-03-20 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70322

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-03-20
     Ever confirmed|0                           |1

--- Comment #1 from H.J. Lu  ---
one_cmpldi2 pattern is missing for STV.