Re: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Jan Beulich via Gcc-patches
On 10.08.2023 15:12, Phoebe Wang wrote: >> The psABI should have some simple rule covering all of the above I think. > > psABI has a rule for the case doesn't mean the rule is a well defined ABI > in practice. A well defined ABI should guarantee 1) interlinkable across > different compile

Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Jan Beulich via Gcc-patches
On 09.08.2023 09:38, Hongtao Liu wrote: > On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich wrote: >> >> On 09.08.2023 04:14, Hongtao Liu wrote: >>> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu wrote: On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers wrote: > > Do you have any comments

Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Jan Beulich via Gcc-patches
On 09.08.2023 04:14, Hongtao Liu wrote: > On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu wrote: >> >> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers wrote: >>> >>> Do you have any comments on the interaction of AVX10 with the >>> micro-architecture levels defined in the ABI (and supported with >>>

[PATCH 10/10] x86: drop redundant "prefix_data16" attributes

2023-08-03 Thread Jan Beulich via Gcc-patches
The attribute defaults to 1 for TI-mode insns of type sselog, sselog1, sseiadd, sseimul, and sseishft. In *v8hi3 [smaxmin] and *v16qi3 [umaxmin] also drop the similarly stray "prefix_extra" at this occasion. These two max/min flavors are encoded in 0f space. gcc/ * config/i386/mmx.md

[PATCH 08/10] x86: add missing "prefix" attribute to VF{,C}MULC

2023-08-03 Thread Jan Beulich via Gcc-patches
gcc/ * config/i386/sse.md (__): Add "prefix" attribute. (avx512fp16_sh_v8hf): Likewise. --- Talking of "prefix": Shouldn't at least V32HF and V32BF have it also default to "evex"? (It won't matter right here, but it may matter elsewhere.) ---

[PATCH 06/10] x86: drop stray "prefix_extra"

2023-08-03 Thread Jan Beulich via Gcc-patches
While the attribute is relevant for legacy- and VEX-encoded insns, it is of no relevance for EVEX-encoded ones. While there in avx512dq_broadcast_1 add the missing "length_immediate". gcc/ * config/i386/sse.md (*_eq3_1): Drop "prefix_extra".

[PATCH 05/10] x86: replace/correct bogus "prefix_extra"

2023-08-03 Thread Jan Beulich via Gcc-patches
In the rdrand and rdseed cases "prefix_0f" is meant instead. For mmx_floatv2siv2sf2 1 is correct only for the first alternative. For the integer min/max cases 1 uniformly applies to legacy and VEX encodings (the UB and SW variants are dealt with separately anyway). Same for {,V}MOVNTDQA. Unlike

[PATCH 09/10] x86: correct "length_immediate" in a few cases

2023-08-03 Thread Jan Beulich via Gcc-patches
When first added explicitly in 3ddffba914b2 ("i386.md (sse4_1_round2): Add avx512f alternative"), "*" should not have been used for the pre-existing alternative. The attribute was plain missing. Subsequent changes adding more alternatives then generously extended the bogus pattern. Apparently

[PATCH 07/10] x86: add (adjust) XOP insn attributes

2023-08-03 Thread Jan Beulich via Gcc-patches
Many were lacking "prefix" and "prefix_extra", some had a bogus value of 2 for "prefix_extra" (presumably inherited from their SSE5 counterparts, which are long gone) and a meaningless "prefix_data16" one. Where missing, "mode" attributes are also added. (Note that "sse4arg" and "ssemuladd" ones

[PATCH 03/10] x86: "ssemuladd" adjustments

2023-08-03 Thread Jan Beulich via Gcc-patches
They're all VEX3- (also covering XOP) or EVEX-encoded. Express that in the default calculation of "prefix". FMA4 insns also all have a 1-byte immediate operand. Where the default calculation is not sufficient / applicable, add explicit "prefix" attributes. While there also add a "mode" attribute

[PATCH 04/10] x86: "prefix_extra" can't really be "2"

2023-08-03 Thread Jan Beulich via Gcc-patches
In the three remaining instances separate "prefix_0f" and "prefix_rep" are what is wanted instead. gcc/ * config/i386/i386.md (rdbase): Add "prefix_0f" and "prefix_rep". Drop "prefix_extra". (wrbase): Likewise. (ptwrite): Likewise. --- a/gcc/config/i386/i386.md

[PATCH 02/10] x86: "sse4arg" adjustments

2023-08-03 Thread Jan Beulich via Gcc-patches
Record common properties in other attributes' default calculations: There's always a 1-byte immediate, and they're always encoded in a VEX3- like manner (note that "prefix_extra" already evaluates to 1 in this case). The drop now (or already previously) redundant explicit attributes, adding "mode"

[PATCH 01/10] x86: "prefix_extra" tidying

2023-08-03 Thread Jan Beulich via Gcc-patches
Drop SSE5 leftovers from both its comment and its default calculation. A value of 2 simply cannot occur anymore. Instead extend the comment to mention the use of the attribute in "length_vex", clarifying why "prefix_extra" can actually be meaningful on VEX-encoded insns despite those not having

[PATCH 00/10] x86: (mainly) "prefix_extra" adjustments

2023-08-03 Thread Jan Beulich via Gcc-patches
Having noticed various bogus uses, I thought I'd go through and audit them all. This is the result, with some other attributes also adjusted as noticed in the process. (I think this tidying also is a good thing to have ahead of APX further complicating insn length calculations.) 01:

[PATCH] MAINTAINERS: correct my email address

2023-08-01 Thread Jan Beulich via Gcc-patches
The @novell.com one has been out of use for quite some time. ChangeLog: * MAINTAINERS: Correct my email address. --- a/MAINTAINERS +++ b/MAINTAINERS @@ -344,7 +344,7 @@ Andrew Bennett Daniel Berlin Pat Bernardi

[PATCH RESEND] libatomic: drop redundant all-multi command

2023-07-31 Thread Jan Beulich via Gcc-patches
./multilib.am already specifies this same command, and make warns about the earlier one being ignored when seeing the later one. All that needs retaining to still satisfy the preceding comment is the extra dependency. libatomic/ * Makefile.am (all-multi): Drop commands. *

[PATCH] x86: fold two of vec_dupv2df's alternatives

2023-07-31 Thread Jan Beulich via Gcc-patches
By using Yvm in the source, both can be expressed in one. gcc/ * sse.md (vec_dupv2df): Fold the middle two of the alternatives. --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -13784,21 +13784,20 @@ (set_attr "mode" "DF,DF,V1DF,V1DF,V1DF,V2DF,V1DF,V1DF,V1DF")])

Re: [PATCH] x86: slightly enhance "vec_dupv2df"

2023-07-17 Thread Jan Beulich via Gcc-patches
On 17.07.2023 08:09, Hongtao Liu wrote: > On Fri, Jul 14, 2023 at 5:40 PM Jan Beulich via Gcc-patches > wrote: >> >> Introduce a new alternative permitting all 32 registers to be used as >> source without AVX512VL, by broadcasting to the full 512 bits in that >> cas

Re: [PATCH] x86: replace "extendhfdf2" expander

2023-07-14 Thread Jan Beulich via Gcc-patches
On 14.07.2023 12:10, Uros Bizjak wrote: > On Fri, Jul 14, 2023 at 11:44 AM Jan Beulich wrote: >> >> The corresponding insn serves this purpose quite fine, and leads to >> slightly less (generated) code. All we need is the insn to not have a >> leading * in its name, while retaining that * for

[PATCH] x86: replace "extendhfdf2" expander

2023-07-14 Thread Jan Beulich via Gcc-patches
The corresponding insn serves this purpose quite fine, and leads to slightly less (generated) code. All we need is the insn to not have a leading * in its name, while retaining that * for "extendhfsf2". Introduce a mode attribute in exchange to achieve that. gcc/ * config/i386/i386.md

[PATCH] x86: avoid maybe_gen_...()

2023-07-14 Thread Jan Beulich via Gcc-patches
In the (however unlikely) event that no insn can be found for the requested mode, using maybe_gen_...() without (really) checking its result for being a null rtx would lead to silent bad code generation. gcc/ * config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate): Use

[PATCH] x86: slightly enhance "vec_dupv2df"

2023-07-14 Thread Jan Beulich via Gcc-patches
Introduce a new alternative permitting all 32 registers to be used as source without AVX512VL, by broadcasting to the full 512 bits in that case. (The insn would also permit all registers to be used as destination, but V2DFmode doesn't.) gcc/ * config/i386/sse.md (vec_dupv2df): Add new

Re: [PATCH] x86: improve fast bfloat->float conversion

2023-07-11 Thread Jan Beulich via Gcc-patches
On 11.07.2023 08:45, Liu, Hongtao wrote: >> -Original Message- >> From: Jan Beulich >> Sent: Tuesday, July 11, 2023 2:08 PM >> >> There's nothing AVX512BW-ish in here, so no reason to use Yw as the >> constraints for the AVX alternative. Furthermore by using the 512-bit form of >> VPSSLD

[PATCH] x86: improve fast bfloat->float conversion

2023-07-11 Thread Jan Beulich via Gcc-patches
There's nothing AVX512BW-ish in here, so no reason to use Yw as the constraints for the AVX alternative. Furthermore by using the 512-bit form of VPSSLD (in a new alternative) all 32 registers can be used directly by the insn without AVX512VL needing to be enabled. Also adjust the originally last

[PATCH v3] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-07-11 Thread Jan Beulich via Gcc-patches
... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are never longer (yet sometimes shorter) than the corresponding VSHUFPS / VPSHUFD, due to the immediate operand of the shuffle insns balancing the (uniform) need for VEX3 in the broadcast ones. When EVEX encoding is respective the

Re: [r14-2314 Regression] FAIL: gcc.target/i386/pr100711-2.c scan-assembler-times vpandn 8 on Linux/x86_64

2023-07-07 Thread Jan Beulich via Gcc-patches
On 07.07.2023 09:46, Hongtao Liu wrote: > On Fri, Jul 7, 2023 at 3:18 PM Jan Beulich via Gcc-regression > wrote: >> >> On 06.07.2023 13:57, haochen.jiang wrote: >>> On Linux/x86_64, >>> >>> e007369c8b67bcabd57c4fed8cff2a6db82e78e6 is the first bad commit >>> commit

Re: [r14-2310 Regression] FAIL: gcc.target/i386/pr53652-1.c scan-assembler-times pandn[ \\t] 2 on Linux/x86_64

2023-07-07 Thread Jan Beulich via Gcc-patches
On 07.07.2023 09:30, Hongtao Liu wrote: > On Fri, Jul 7, 2023 at 3:13 PM Jan Beulich via Gcc-regression > wrote: >> >> On 06.07.2023 13:57, haochen.jiang wrote: >>> On Linux/x86_64, >>> >>> 2d11c99dfca3cc603dbbfafb3afc41689a68e40f is the first bad commit >>> commit

Re: [r14-2314 Regression] FAIL: gcc.target/i386/pr100711-2.c scan-assembler-times vpandn 8 on Linux/x86_64

2023-07-07 Thread Jan Beulich via Gcc-patches
On 06.07.2023 13:57, haochen.jiang wrote: > On Linux/x86_64, > > e007369c8b67bcabd57c4fed8cff2a6db82e78e6 is the first bad commit > commit e007369c8b67bcabd57c4fed8cff2a6db82e78e6 > Author: Jan Beulich > Date: Wed Jul 5 09:49:16 2023 +0200 > > x86: yet more PR target/100711-like splitting

Re: [r14-2310 Regression] FAIL: gcc.target/i386/pr53652-1.c scan-assembler-times pandn[ \\t] 2 on Linux/x86_64

2023-07-07 Thread Jan Beulich via Gcc-patches
On 06.07.2023 13:57, haochen.jiang wrote: > On Linux/x86_64, > > 2d11c99dfca3cc603dbbfafb3afc41689a68e40f is the first bad commit > commit 2d11c99dfca3cc603dbbfafb3afc41689a68e40f > Author: Jan Beulich > Date: Wed Jul 5 09:41:09 2023 +0200 > > x86: use VPTERNLOG also for certain andnot

Re: [PATCH 2/2] x86: slightly correct / simplify *vec_extractv2ti

2023-07-05 Thread Jan Beulich via Gcc-patches
On 05.07.2023 10:47, Hongtao Liu wrote: > On Wed, Jul 5, 2023 at 4:01 PM Jan Beulich via Gcc-patches > wrote: >> >> V2TImode values cannot appear in the upper 16 YMM registers without >> AVX512VL being enabled. Therefore forcing 512-bit mode (also not >> ref

Re: [PATCH 1/2] x86: correct / simplify @vec_extract_hi_ and vec_extract_hi_v32qi

2023-07-05 Thread Jan Beulich via Gcc-patches
On 05.07.2023 10:40, Hongtao Liu wrote: > On Wed, Jul 5, 2023 at 4:00 PM Jan Beulich via Gcc-patches > wrote: >> >> The middle alternative each was unusable without enabling AVX512DQ (in >> addition to AVX512VL), which is entirely unrelated here. The last >> alter

[PATCH 2/2] x86: slightly correct / simplify *vec_extractv2ti

2023-07-05 Thread Jan Beulich via Gcc-patches
V2TImode values cannot appear in the upper 16 YMM registers without AVX512VL being enabled. Therefore forcing 512-bit mode (also not reflected in the "mode" attribute) is pointless. gcc/ * config/i386/sse.md (*vec_extractv2ti): Drop g modifiers. --- a/gcc/config/i386/sse.md +++

[PATCH 1/2] x86: correct / simplify @vec_extract_hi_ and vec_extract_hi_v32qi

2023-07-05 Thread Jan Beulich via Gcc-patches
The middle alternative each was unusable without enabling AVX512DQ (in addition to AVX512VL), which is entirely unrelated here. The last alternative is usable with AVX512VL only (due to type restrictions on what may be put in the upper 16 YMM registers), and hence is pointlessly forcing 512-bit

[PATCH 0/2] x86: vec_extract_* adjustments

2023-07-05 Thread Jan Beulich via Gcc-patches
1: correct / simplify @vec_extract_hi_ and vec_extract_hi_v32qi 2: slightly correct / simplify *vec_extractv2ti Jan

[PATCH] x86: suppress avx512f-copysign.c testcase for 32-bit

2023-07-05 Thread Jan Beulich via Gcc-patches
The test installed by "x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F" won't succeed on 32-bit, for floating point operations being done there (by default) without using SIMD insns. gcc/testsuite/ * gcc.target/i386/avx512f-copysign.c: Suppress for 32-bit. ---

Re: [PATCH v3] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F

2023-07-04 Thread Jan Beulich via Gcc-patches
On 27.06.2023 07:11, Hongtao Liu wrote: > On Tue, Jun 20, 2023 at 5:34 PM Hongtao Liu wrote: >> >> On Tue, Jun 20, 2023 at 5:03 PM Jan Beulich wrote: >>> >>> On 20.06.2023 10:33, Hongtao Liu wrote: >>>> On Tue, Jun 20, 2023 at 3:07 PM Jan Beulich via

Re: [PATCH 1/5] x86: use VPTERNLOG for further bitwise two-vector operations

2023-06-25 Thread Jan Beulich via Gcc-patches
42, Hongtao Liu wrote: >>>>> On Wed, Jun 21, 2023 at 2:26 PM Jan Beulich via Gcc-patches >>>>> wrote: >>>>>> >>>>>> +(define_code_iterator andor [and ior]) >>>>>> +(define_code_attr nlogic [(and "nor") (ior

Re: [PATCH 5/5] x86: yet more PR target/100711-like splitting

2023-06-25 Thread Jan Beulich via Gcc-patches
On 25.06.2023 07:12, Hongtao Liu wrote: > On Wed, Jun 21, 2023 at 2:29 PM Jan Beulich via Gcc-patches > wrote: >> >> --- >> For the purpose here (and elsewhere) bcst_vector_operand() (really: >> bcst_mem_operand()) isn't permissive enough: We'd want it to allo

Re: [PATCH 4/5] x86: further PR target/100711-like splitting

2023-06-25 Thread Jan Beulich via Gcc-patches
On 25.06.2023 07:06, Hongtao Liu wrote: > On Wed, Jun 21, 2023 at 2:28 PM Jan Beulich via Gcc-patches > wrote: >> >> With respective two-operand bitwise operations now expressable by a >> single VPTERNLOG, add splitters to also deal with ior and xor >> counterparts

Re: [PATCH 1/5] x86: use VPTERNLOG for further bitwise two-vector operations

2023-06-24 Thread Jan Beulich via Gcc-patches
On 25.06.2023 06:42, Hongtao Liu wrote: > On Wed, Jun 21, 2023 at 2:26 PM Jan Beulich via Gcc-patches > wrote: >> >> +(define_code_iterator andor [and ior]) >> +(define_code_attr nlogic [(and "nor") (ior "nand")]) >> +(define

Re: [PATCH v2] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-21 Thread Jan Beulich via Gcc-patches
On 21.06.2023 09:44, Jan Beulich wrote: > On 21.06.2023 09:37, Hongtao Liu wrote: >> On Wed, Jun 21, 2023 at 2:06 PM Jan Beulich via Gcc-patches >> wrote: >>> >>> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss >> According to c

Re: [PATCH v2] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-21 Thread Jan Beulich via Gcc-patches
On 21.06.2023 09:37, Hongtao Liu wrote: > On Wed, Jun 21, 2023 at 2:06 PM Jan Beulich via Gcc-patches > wrote: >> >> Is there a reason why vec_dupv4sf uses sseshuf1 for its shuffle >> alternatives, but *vec_dupv4si uses sselog1? I'd be happy to correct >> this i

[PATCH 5/5] x86: yet more PR target/100711-like splitting

2023-06-21 Thread Jan Beulich via Gcc-patches
Following two-operand bitwise operations, add another splitter to also deal with not followed by broadcast all on its own, which can be expressed as simple embedded broadcast instead once a broadcast operand is actually permitted in the respective insn. While there also permit a broadcast operand

[PATCH 4/5] x86: further PR target/100711-like splitting

2023-06-21 Thread Jan Beulich via Gcc-patches
With respective two-operand bitwise operations now expressable by a single VPTERNLOG, add splitters to also deal with ior and xor counterparts of the original and-only case. Note that the splitters need to be separate, as the placement of "not" differs in the final insns (*iornot3, *xnor3) which

[PATCH 3/5] x86: allow memory operand for AVX2 splitter for PR target/100711

2023-06-21 Thread Jan Beulich via Gcc-patches
The intended broadcast (with AVX512) can very well be done right from memory. gcc/ * config/i386/sse.md: Permit non-immediate operand 1 in AVX2 form of splitter for PR target/100711. --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -17356,7 +17356,7 @@

[PATCH 2/5] x86: use VPTERNLOG also for certain andnot forms

2023-06-21 Thread Jan Beulich via Gcc-patches
When it's the memory operand which is to be inverted, using VPANDN* requires a further load instruction. The same can be achieved by a single VPTERNLOG*. Add two new alternatives (for plain memory and embedded broadcast), adjusting the predicate for the first operand accordingly. Two pre-existing

[PATCH 1/5] x86: use VPTERNLOG for further bitwise two-vector operations

2023-06-21 Thread Jan Beulich via Gcc-patches
All combinations of and, ior, xor, and not involving two operands can be expressed that way in a single insn. gcc/ PR target/93768 * config/i386/i386.cc (ix86_rtx_costs): Further special-case bitwise vector operations. * config/i386/sse.md (*iornot3): New insn.

[PATCH 0/5] x86: make better use of VPTERNLOG{D,Q}

2023-06-21 Thread Jan Beulich via Gcc-patches
While there are some quite sophisticated 4-operand expanders, 2-operand binary logic which can't be expressed by just VPAND, VPANDN, VPOR, or VPXOR doesn't utilize this insn to carry out such operations in a single insn. Therefore the first two patches address one of the sub-aspects of PR

[PATCH v2] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-21 Thread Jan Beulich via Gcc-patches
... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are never longer (yet sometimes shorter) than the corresponding VSHUFPS / VPSHUFD, due to the immediate operand of the shuffle insns balancing the possible need for VEX3 in the broadcast ones. When EVEX encoding is required the

[PATCH] x86: add -mprefer-vector-width=512 to new avx512f-dupv2di.c testcase

2023-06-21 Thread Jan Beulich via Gcc-patches
This is to cover testing also being done with -march=cascadelake. --- Committing as obvious. --- a/gcc/testsuite/gcc.target/i386/avx512f-dupv2di.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-dupv2di.c @@ -1,5 +1,5 @@ /* { dg-do compile { target { ! ia32 } } } */ -/* { dg-options "-mavx512f

Re: [PATCH v3] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F

2023-06-20 Thread Jan Beulich via Gcc-patches
On 20.06.2023 10:33, Hongtao Liu wrote: > On Tue, Jun 20, 2023 at 3:07 PM Jan Beulich via Gcc-patches > wrote: >> >> I guess the underlying pattern, going along the lines of what >> one_cmpl2 uses, can be applied elsewhere >> as well. > That should be guarded

[PATCH v3] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F

2023-06-20 Thread Jan Beulich via Gcc-patches
There's no reason to constrain this to AVX512VL, unless instructed so by -mprefer-vector-width=, as the wider operation is unusable for more narrow operands only when the possible memory source is a non-broadcast one. This way even the scalar copysign3 can benefit from the operation being a

Re: [PATCH v2] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F

2023-06-19 Thread Jan Beulich via Gcc-patches
On 19.06.2023 04:07, Liu, Hongtao wrote: >> -Original Message- >> From: Jan Beulich >> Sent: Friday, June 16, 2023 2:22 PM >> >> --- a/gcc/config/i386/sse.md >> +++ b/gcc/config/i386/sse.md >> @@ -12597,11 +12597,11 @@ >> (set_attr "mode" "")]) >> >> (define_insn "*_vternlog_all" >>

[PATCH v2] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F

2023-06-16 Thread Jan Beulich via Gcc-patches
There's no reason to constrain this to AVX512VL, unless instructed so by -mprefer-vector-width=, as the wider operation is unusable for more narrow operands only when the possible memory source is a non-broadcast one. This way even the scalar copysign3 can benefit from the operation being a

[PATCH v2] x86: correct and improve "*vec_dupv2di"

2023-06-16 Thread Jan Beulich via Gcc-patches
The input constraint for the %vmovddup alternative was wrong, as the upper 16 XMM registers require AVX512VL to be used with this insn. To compensate, introduce a new alternative permitting all 32 registers, by broadcasting to the full 512 bits in that case if AVX512VL is not available. gcc/

Re: [PATCH] x86: correct and improve "*vec_dupv2di"

2023-06-15 Thread Jan Beulich via Gcc-patches
On 15.06.2023 09:45, Hongtao Liu wrote: > On Thu, Jun 15, 2023 at 3:07 PM Uros Bizjak via Gcc-patches > wrote: >> On Thu, Jun 15, 2023 at 8:03 AM Jan Beulich via Gcc-patches >> wrote: >>> +case 3: >>> + return "%vmovddup\t{%1, %0|%0, %1}"; &

Re: [PATCH] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-15 Thread Jan Beulich via Gcc-patches
On 15.06.2023 07:23, Hongtao Liu wrote: > On Wed, Jun 14, 2023 at 5:03 PM Jan Beulich wrote: >> >> On 14.06.2023 09:41, Hongtao Liu wrote: >>> On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches >>> wrote: >>>> >>>> ... in v

[PATCH] x86: correct and improve "*vec_dupv2di"

2023-06-15 Thread Jan Beulich via Gcc-patches
The input constraint for the %vmovddup alternative was wrong, as the upper 16 XMM registers require AVX512VL to be used with this insn. To compensate, introduce a new alternative permitting all 32 registers, by broadcasting to the full 512 bits in that case if AVX512VL is not available. gcc/

Re: [PATCH] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F

2023-06-14 Thread Jan Beulich via Gcc-patches
On 14.06.2023 10:10, Hongtao Liu wrote: > On Wed, Jun 14, 2023 at 1:59 PM Jan Beulich via Gcc-patches > wrote: >> >> There's no reason to constrain this to AVX512VL, as the wider operation >> is not usable for more narrow operands only when the possible memory >

Re: [PATCH] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-14 Thread Jan Beulich via Gcc-patches
On 14.06.2023 09:41, Hongtao Liu wrote: > On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches > wrote: >> >> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are >> never longer (yet sometimes shorter) than the corresponding VSHUFPS / >>

[PATCH] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F

2023-06-13 Thread Jan Beulich via Gcc-patches
There's no reason to constrain this to AVX512VL, as the wider operation is not usable for more narrow operands only when the possible memory source is a non-broadcast one. This way even the scalar copysign3 can benefit from the operation being a single-insn one (leaving aside moves which the

[PATCH] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-13 Thread Jan Beulich via Gcc-patches
... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are never longer (yet sometimes shorter) than the corresponding VSHUFPS / VPSHUFD, due to the immediate operand of the shuffle insns balancing the need for VEX3 in the broadcast ones. When EVEX encoding is required the broadcast

[PATCH] x86: add Bk and Br to comment list B's sub-chars

2023-06-13 Thread Jan Beulich via Gcc-patches
gcc/ * config/i386/constraints.md: Mention k and r for B. --- a/gcc/config/i386/constraints.md +++ b/gcc/config/i386/constraints.md @@ -162,7 +162,9 @@ ;; g GOT memory operand. ;; m Vector memory operand ;; c Constant memory operand +;; k TLS address that allows insn using

[PATCH] x86/AVX512: use VMOVDDUP for broadcast to V2DF

2023-06-13 Thread Jan Beulich via Gcc-patches
Like is already the case for the AVX/AVX2 form, VMOVDDUP - acting on double precision floating values - is more appropriate to use here, and it can also result in shorter insn encodings when source is memory or %xmm0...%xmm7, and no masking is applied (in allowing a 2-byte VEX prefix then instead

Re: [PATCH v3] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-06-13 Thread Jan Beulich via Gcc-patches
On 13.06.2023 05:28, Fangrui Song wrote: > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/large-data.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target lp64 } */ > +/* { dg-options "-O2 -mcmodel=large -mlarge-data-threshold=4" } */ > +/* { dg-final {

Re: [PATCH v2] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-05-26 Thread Jan Beulich via Gcc-patches
On 25.05.2023 18:11, Fangrui Song wrote: > On 2023-05-25, Jan Beulich wrote: >> On 25.05.2023 17:16, Fangrui Song wrote: >>> --- a/gcc/doc/invoke.texi >>> +++ b/gcc/doc/invoke.texi >>> @@ -32942,9 +32942,10 @@ the cache line size. @samp{compat} is the default. >>> >>> @opindex

Re: [PATCH v2] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-05-25 Thread Jan Beulich via Gcc-patches
On 25.05.2023 17:16, Fangrui Song wrote: > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -32942,9 +32942,10 @@ the cache line size. @samp{compat} is the default. > > @opindex mlarge-data-threshold > @item -mlarge-data-threshold=@var{threshold} > -When @option{-mcmodel=medium} is

Re: Ping: [PATCH] testsuite/C++: suppress filename canonicalization in module tests

2023-04-28 Thread Jan Beulich via Gcc-patches
On 28.04.2023 00:24, Nathan Sidwell wrote: > On 4/25/23 11:04, Jan Beulich wrote: >> On 28.06.2022 16:06, Jan Beulich wrote: >>> The pathname underneath gcm.cache/ is determined from the effective name >>> used for the main input file of a particular module. When modules are >>> built, no

Re: [PATCH] testsuite: adjust NOP expectations for RISC-V

2023-04-27 Thread Jan Beulich via Gcc-patches
On 26.04.2023 17:45, Palmer Dabbelt wrote: > On Wed, 26 Apr 2023 08:26:26 PDT (-0700), gcc-patches@gcc.gnu.org wrote: >> >> >> On 4/25/23 08:50, Jan Beulich via Gcc-patches wrote: >>> RISC-V will emit ".option nopic" when -fno-pie is in effect, which >>

Ping: [PATCH] testsuite/C++: suppress filename canonicalization in module tests

2023-04-25 Thread Jan Beulich via Gcc-patches
On 28.06.2022 16:06, Jan Beulich wrote: > The pathname underneath gcm.cache/ is determined from the effective name > used for the main input file of a particular module. When modules are > built, no canonicalization occurs for the main input file. Hence the > module file wouldn't be found if a

[PATCH v2] testsuite/C++: cope with IPv6 being unavailable

2023-04-25 Thread Jan Beulich via Gcc-patches
When IPv6 is disabled in the kernel, the error message coming back from Cody::OpenInet6() is different from the sole so far expected one. --- v2: Re-base. --- a/gcc/testsuite/g++.dg/modules/bad-mapper-3.C +++ b/gcc/testsuite/g++.dg/modules/bad-mapper-3.C @@ -1,6 +1,6 @@ // {

[PATCH] testsuite: adjust NOP expectations for RISC-V

2023-04-25 Thread Jan Beulich via Gcc-patches
RISC-V will emit ".option nopic" when -fno-pie is in effect, which matches the generic pattern. Just like done for Alpha, special-case RISC-V. --- A couple more targets look to be affected as well, simply because their "no-operation" insn doesn't match the expectation. With the apparently

[PATCH] testsuite/C++: cope with IPv6 being unavailable

2022-06-28 Thread Jan Beulich via Gcc-patches
When IPv6 is disabled in the kernel, the error message coming back from Cody::OpenInet6() is different from the sole so far expected one. gcc/testsuite/ * g++.dg/modules/bad-mapper-3.C: Relax failure pattern. --- a/gcc/testsuite/g++.dg/modules/bad-mapper-3.C +++

[PATCH] testsuite/C++: suppress filename canonicalization in module tests

2022-06-28 Thread Jan Beulich via Gcc-patches
The pathname underneath gcm.cache/ is determined from the effective name used for the main input file of a particular module. When modules are built, no canonicalization occurs for the main input file. Hence the module file wouldn't be found if a different (the canonicalized) file name was used

Ping: [PATCH] libatomic: drop redundant all-multi command

2022-06-28 Thread Jan Beulich via Gcc-patches
On 27.05.2022 10:01, Jan Beulich wrote: > ./multilib.am already specifies this same command, and make warns about > the earlier one being ignored when seeing the later one. All that needs > retaining to still satisfy the preceding comment is the extra > dependency. > > libatomic/ > > *

[PATCH] testsuite/ix86: SSE2 is a prereq to _Float16 use

2022-06-28 Thread Jan Beulich via Gcc-patches
When enabling AVX512FP via attribute or pragma, the _Float16 type would remain unavailable when at initialization time SSE2 wouldn't be seen as available for use. While this may hint at a wider underlying issue (like the feature, the type may want providing dynamically, albeit this may be

[PATCH] testsuite/ix86: prune MMX ABI warning

2022-06-28 Thread Jan Beulich via Gcc-patches
So far on 32-bit hosts this test failed (for both C and C++) because of the ABI change warning occurring without (explictly) enabling MMX. gcc/testsuite/ * c-c++-common/torture/builtin-shufflevector-2.c: Prune ix86 MMX ABI warning. ---

Re: [PATCH] configure: arrange to use appropriate objcopy

2022-06-07 Thread Jan Beulich via Gcc-patches
On 07.06.2022 09:41, Jakub Jelinek wrote: > On Tue, Jun 07, 2022 at 08:12:26AM +0200, Jan Beulich via Gcc-patches wrote: >>> This regressed >>> Executing on host: /home/jakub/src/gcc/obj44/gcc/xgcc >>> -B/home/jakub/src/gcc/obj44/gcc/ -fdiagnostics-plain-output

Re: [PATCH] configure: arrange to use appropriate objcopy

2022-06-07 Thread Jan Beulich via Gcc-patches
On 04.06.2022 10:32, Jakub Jelinek wrote: > On Thu, Jun 02, 2022 at 05:32:10PM +0200, Jan Beulich via Gcc-patches wrote: >> Using the system objcopy is wrong when other configure checks have >> probed a different set of binutils (I've noticed the problem on a system >> where t

[PATCH] configure: arrange to use appropriate objcopy

2022-06-02 Thread Jan Beulich via Gcc-patches
Using the system objcopy is wrong when other configure checks have probed a different set of binutils (I've noticed the problem on a system where the base objcopy can't deal with compressed debug sections). Arrange for the matching one to be picked up, first and foremost if an "in tree" one is

[PATCH] x86-64: make "length_vex" also account for VEX.B use by register operand

2022-06-02 Thread Jan Beulich via Gcc-patches
The length attribute ought to be "the (bounding maximum) length of an instruction" according to the comment next to its definition. A register operand encoded using the ModR/M.rm field will additionally use VEX.B for encoding the highest bit of the register number. Hence for the high 8 GPR

[PATCH] x86: harmonize __builtin_ia32_psadbw*() types

2022-06-02 Thread Jan Beulich via Gcc-patches
The 64-bit, 128-bit, and 512-bit variants have VDI return type, in line with instruction behavior. Make the 256-bit builtin match, thus also making it match the insn it expands to (using VI8_AVX2_AVX512BW). gcc/ * config/i386/i386-builtin.def (__builtin_ia32_psadbw256): Change

[PATCH v2] x86: {,v}psadbw have commutative source operands

2022-06-02 Thread Jan Beulich via Gcc-patches
Like noticed for gas as well (binutils-gdb commit c8cad9d389b7), the "absolute difference" aspect of the insns makes their source operands commutative. gcc/ * config/i386/mmx.md (mmx_psadbw): Convert to expander. (*mmx_psadbw): New. Mark as commutative. *

Re: [PATCH] x86: {,v}psadbw have commutative source operands

2022-05-30 Thread Jan Beulich via Gcc-patches
On 27.05.2022 11:05, Uros Bizjak wrote: > On Fri, May 27, 2022 at 10:13 AM Jan Beulich wrote: >> >> Like noticed for gas as well (binutils-gdb commit c8cad9d389b7), the >> "absolute difference" aspect of the insns makes their source operands >> commutative. > > You will need to expand via

Re: [PATCH] x86: correct bmi2_umul3_1's MEM_P() uses

2022-05-27 Thread Jan Beulich via Gcc-patches
On 27.05.2022 10:57, Uros Bizjak wrote: > On Fri, May 27, 2022 at 10:05 AM Jan Beulich wrote: >> >> It's pretty clear that the operand numbers in the MEM_P() checks are >> off by one, perhaps due to a copy-and-paste oversight (unlike in most >> other places here we're dealing with two outputs).

[PATCH] x86: {,v}psadbw have commutative source operands

2022-05-27 Thread Jan Beulich via Gcc-patches
Like noticed for gas as well (binutils-gdb commit c8cad9d389b7), the "absolute difference" aspect of the insns makes their source operands commutative. gcc/ 2022-05-XX Jan Beulich * config/i386/mmx.md (mmx_psadbw): Mark as commutative. * config/i386/sse.md (_psadbw): Likewise.

[PATCH] x86: correct bmi2_umul3_1's MEM_P() uses

2022-05-27 Thread Jan Beulich via Gcc-patches
It's pretty clear that the operand numbers in the MEM_P() checks are off by one, perhaps due to a copy-and-paste oversight (unlike in most other places here we're dealing with two outputs). --- What I don't understand is why operand 2 is "nonimmediate_operand", not "register_operand" (which afaict

libatomic: drop redundant all-multi command

2022-05-27 Thread Jan Beulich via Gcc-patches
./multilib.am already specifies this same command, and make warns about the earlier one being ignored when seeing the later one. All that needs retaining to still satisfy the preceding comment is the extra dependency. libatomic/ 2022-05-XX Jan Beulich * Makefile.am (all-multi): Drop