RE: [RFC] Intel AVX10.1 Compiler Design and Support

2023-12-12 Thread Jiang, Haochen
> > On the other hand, a new EVEX-capable level might bring earlier adoption
> > of EVEX capabilities to AMD CPUs, which still should be an improvement
> > over AVX2.  This could benefit AMD as well.  So I would really like to
> > see some AMD feedback here.
> >
> > There's also the matter that time scales for EVEX adoption are so long
> > that by then, Intel CPUs may end up supporting and preferring 512 bit
> > vectors again.
> 
> True, there isn't even widespread VEX adoption yet ... and now there's
> APX as the next best thing to target.
> 
> That said, my main point was that x86-64-v4 is "broken" as it appears
> as a dead end - AVX512 is no more, the future is AVX10, but yet we have
> to define x86-64-v5 as something that includes x86-64-v4.
> 
> So, can we un-do x86-64-v4?

As far as I have heard, x86-64-v4 is rarely used, so there may be a small
chance to un-do it without breaking too many things. But I am not sure.

Thx,
Haochen

> 
> Richard.
> 
> > Thanks,
> > Florian
> >


Re: [RFC] Intel AVX10.1 Compiler Design and Support

2023-12-12 Thread Richard Biener
On Tue, Dec 12, 2023 at 10:05 AM Florian Weimer  wrote:
>
> * Richard Biener:
>
> > If it were possible I'd axe x86_64-v4.  Maybe we should add a x86_64-v3.5
> > that sits inbetween v3 and v4, offering AVX512 but restricted to 256bit
> > (and obviously not requiring more of the AVX512 features that v4
> > requires).
>
> As far as I understand it, GCC's Intel tuning for AVX-512 is leaning
> heavily towards 256 bit vector length anyway.

Indeed it does, enabling avx256_optimal everywhere, but enabling
512bit moves on Sapphire Rapids (and not Granite Rapids!?).

>  That's not true for the
> default tuning for -march=x86-64-v4, though, it prefers 512 bit vectors.
> I've seen third-party reports that AMD Zen 4 does better in some ways
> with 512 bit vectors than with 256 bit vectors (despite its 256-bit-wide
> execution ports), but I have not tried to verify these observations.
> Still, this suggests that restricting a post-x86-64-v3 level to 256 bit
> vectors may not be an easy decision.

The corner Intel painted itself into is that the small cores on its
hybrid consumer products only support 128bit natively (256bit emulated)
and its data-center "small core" SKU doesn't fare any better there.
That's the reason their marketing invented AVX10, which will allow
the AVX512 ISA to play "nice" with a smaller data path (but I'm sure, or
at least I hope, that actual implementations will have a native 256bit
data path and not emulate it via 128bit).

The current problem is that the castrated consumer SKUs cannot use
EVEX at all and in the future will be crippled to 256bits.  So that will
be the common thing to target when targeting EVEX support across
Intel/AMD - use 256bit only.  Note that AVX10 excludes Zen4 which
lacks support for two niche AVX512 ISA extensions.

> On the other hand, a new EVEX-capable level might bring earlier adoption
> of EVEX capabilities to AMD CPUs, which still should be an improvement
> over AVX2.  This could benefit AMD as well.  So I would really like to
> see some AMD feedback here.
>
> There's also the matter that time scales for EVEX adoption are so long
> that by then, Intel CPUs may end up supporting and preferring 512 bit
> vectors again.

True, there isn't even widespread VEX adoption yet ... and now there's
APX as the next best thing to target.

That said, my main point was that x86-64-v4 is "broken" as it appears
as a dead end - AVX512 is no more, the future is AVX10, but yet we have
to define x86-64-v5 as something that includes x86-64-v4.

So, can we un-do x86-64-v4?

Richard.

> Thanks,
> Florian
>


Re: [RFC] Intel AVX10.1 Compiler Design and Support

2023-12-12 Thread Florian Weimer
* Richard Biener:

> If it were possible I'd axe x86_64-v4.  Maybe we should add a x86_64-v3.5
> that sits inbetween v3 and v4, offering AVX512 but restricted to 256bit
> (and obviously not requiring more of the AVX512 features that v4
> requires).

As far as I understand it, GCC's Intel tuning for AVX-512 is leaning
heavily towards 256 bit vector length anyway.  That's not true for the
default tuning for -march=x86-64-v4, though, it prefers 512 bit vectors.
I've seen third-party reports that AMD Zen 4 does better in some ways
with 512 bit vectors than with 256 bit vectors (despite its 256-bit-wide
execution ports), but I have not tried to verify these observations.
Still, this suggests that restricting a post-x86-64-v3 level to 256 bit
vectors may not be an easy decision.

On the other hand, a new EVEX-capable level might bring earlier adoption
of EVEX capabilities to AMD CPUs, which still should be an improvement
over AVX2.  This could benefit AMD as well.  So I would really like to
see some AMD feedback here.

There's also the matter that time scales for EVEX adoption are so long
that by then, Intel CPUs may end up supporting and preferring 512 bit
vectors again.

Thanks,
Florian



RE: [RFC] Intel AVX10.1 Compiler Design and Support

2023-11-13 Thread Jiang, Haochen
> > > > I wonder whether adoption could be made easier by also providing a
> > > > -mavx10[.0] level that removes some of the more obscure sub-ISA
> > > > requirements to cover more existing implementations (I'd not add 
> > > > -mavx10.0-512 here).
> > > > I'd require only skylake-AVX512 features here, basically all
> > > > non-KNL AVX512 CPUs should have a "virtual" AVX10 level that
> > > > allows to use that feature set,
> > >
> > > We have -mno-evex512 can cover those cases, so what you want is like
> > > a simple alias of "-march=skylake-avx512 -mno-evex512"?
> >
> > For the AVX512 enabled sub-isas of skylake-avx512 yes I guess.
> >
> > > > restricted to 256bits so future AVX10-256 implementations can
> > > > handle it as well as all existing (and relevant, which excludes
> > > > KNL) AVX512 implementations.
> > > >
> > > > Otherwise AVX10 is really a hard sell (as AVX512 was originally).
> > >
> > > It's a rebranding of the existing AVX512 to AVX10, AVX10.0  just
> > > complicated things further(considering we already have x86-64-v4
> > > which is different from skylake-avx512).
> >
> > Well, the cut-off for "AVX512" is quite arbitrary.  Introducing a
> > "new" ISA that's only available in HW available in the future and
> > suggesting users to embrace that already (like Intel did with AVX512
> > without offering client SKU support) is a hard sell.
> >
> > I realize Intel thinks client SKU support for AVX10 (restricted to
> > 256bit) will be "easier".  But then don't expect anybody to adopt that in 
> > the next 10 years.
> >
> > Just to add - we were suggesting to use x86_64-v3 for the "next"
> > enterprise product but got downvoted to x86_64-v2 for compatibility reasons.
> >
> > If it were possible I'd axe x86_64-v4.  Maybe we should add a
> > x86_64-v3.5 that sits inbetween v3 and v4, offering AVX512 but
> > restricted to 256bit (and obviously not requiring more of the AVX512 
> > features that v4 requires).
>
> About the arch level is indeed a problem, especially since the default size of
> avx10 is 256.
> +Florian Weimer for more inputs.

IMO, the AVX10.1 options should be there, and the arch level issue should not
affect the existence of this series of options.

The issue we are currently facing is really the arch level one, since we
defined x86-64-v4 earlier. "-march=skylake-server -mno-evex512" is
much like an x86-64-v4-256.

Thx,
Haochen


Re: [RFC] Intel AVX10.1 Compiler Design and Support

2023-11-13 Thread Hongtao Liu
On Mon, Nov 13, 2023 at 7:25 PM Richard Biener
 wrote:
>
> On Mon, Nov 13, 2023 at 7:58 AM Hongtao Liu  wrote:
> >
> > On Fri, Nov 10, 2023 at 6:15 PM Richard Biener
> >  wrote:
> > >
> > > On Fri, Nov 10, 2023 at 2:42 AM Haochen Jiang  
> > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > > This RFC patch aims to add AVX10.1 options. After we added
> > > > > -m[no-]evex512 support, it makes a lot easier to add them comparing
> > > > > to the August version.
> > > > Detail for AVX10 is shown below:
> > > >
> > > > Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture 
> > > > Specification
> > > > It describes the Intel Advanced Vector Extensions 10 Instruction Set
> > > > Architecture.
> > > > https://cdrdv2.intel.com/v1/dl/getContent/784267
> > > >
> > > > > The Converged Vector ISA: Intel Advanced Vector Extensions 10
> > > > > Technical Paper
> > > > > It provides introductory information regarding the converged vector
> > > > > ISA: Intel Advanced Vector Extensions 10.
> > > > https://cdrdv2.intel.com/v1/dl/getContent/784343
> > > >
> > > > > Our proposal is to take AVX10.1-256 and AVX10.1-512 as two "virtual"
> > > > > ISAs in the compiler. AVX10.1-512 will imply AVX10.1-256. They will
> > > > > not enable anything at first. At the end of the option handling, we
> > > > > will check whether the two bits are set. If AVX10.1-256 is set, we
> > > > > will set the AVX512 related ISA bits. AVX10.1-512 will further set
> > > > > EVEX512 ISA bit.
> > > >
> > > > > It means that AVX10 options will be separated from the existing
> > > > > AVX512 and the newly added -m[no-]evex512 options. AVX10 and AVX512
> > > > > options will control (enable/disable/set vector size) the AVX512
> > > > > features underneath independently.
> > > > > If there’s potential overlap or conflict between AVX10 and AVX512
> > > > > options, some rules are provided to define the behavior, which will
> > > > > be described below.
> > > >
> > > > avx10.1 option will be provided as an alias of avx10.1-256.
> > > >
> > > > In the future, the AVX10 options will imply like this:
> > > >
> > > > AVX10.1-256 < AVX10.1-512
> > > >  ^ ^
> > > >  | |
> > > >
> > > > AVX10.2-256 < AVX10.2-512
> > > >  ^ ^
> > > >  | |
> > > >
> > > > AVX10.3-256 < AVX10.3-512
> > > >  ^ ^
> > > >  | |
> > > >
> > > > Each of them will have its own option to enable/disabled corresponding
> > > > features. The alias avx10.x will also be provided.
> > > >
> > > > > As mentioned in August version RFC, since we lean towards the
> > > > > adoption of AVX10 instead of AVX512 from now on, we don’t recommend
> > > > > users to combine the AVX10 and legacy AVX512 options.
> > >
> > > > I wonder whether adoption could be made easier by also providing a
> > > > -mavx10[.0] level that removes some of the more obscure sub-ISA
> > > > requirements to cover more existing implementations (I'd not add
> > > > -mavx10.0-512 here).
> > > > I'd require only skylake-AVX512 features here, basically all non-KNL
> > > > AVX512 CPUs should have a "virtual" AVX10 level that allows to use
> > > > that feature set,
> > We have -mno-evex512 can cover those cases, so what you want is like a
> > simple alias of "-march=skylake-avx512 -mno-evex512"?
>
> For the AVX512 enabled sub-isas of skylake-avx512 yes I guess.
>
> > > restricted to 256bits so future AVX10-256 implementations can handle it
> > > as well as all existing (and relevant, which excludes KNL) AVX512
> > > implementations.
> > >
> > > Otherwise AVX10 is really a hard sell (as AVX512 was originally).
> > It's a rebranding of the existing AVX512 to AVX10, AVX10.0  just
> > complicated things further(considering we already have x86-64-v4 which
> > is different from skylake-avx512).
>
> Well, the cut-off for "AVX512" is quite arbitrary.  Introducing a
> "new" ISA that's only available in HW available in the future and
> suggesting users to embrace that already (like Intel did with AVX512
> without offering client SKU support) is a hard sell.
>
> I realize Intel thinks client SKU support for AVX10 (restricted to 256bit)
> will be "easier".  But then don't expect anybody to adopt that in the next
> 10 years.
>
> Just to add - we were suggesting to use x86_64-v3 for the "next" enterprise
> product but got downvoted to x86_64-v2 for compatibility reasons.
>
> If it were possible I'd axe x86_64-v4.  Maybe we should add a x86_64-v3.5
> that sits inbetween v3 and v4, offering AVX512 but restricted to 256bit
> (and obviously not requiring more of the AVX512 features that v4 requires).
The arch level is indeed a problem, especially since the default
size of avx10 is 256.
+Florian Weimer for more inputs.
>
> Richard.
>
> > >
> > > > However, we would like to introduce some
> > > > simple rules for user when it comes to combination.
> > > >
> > > > 1. Enabling 

Re: [RFC] Intel AVX10.1 Compiler Design and Support

2023-11-13 Thread Richard Biener
On Mon, Nov 13, 2023 at 7:58 AM Hongtao Liu  wrote:
>
> On Fri, Nov 10, 2023 at 6:15 PM Richard Biener
>  wrote:
> >
> > On Fri, Nov 10, 2023 at 2:42 AM Haochen Jiang  
> > wrote:
> > >
> > > Hi all,
> > >
> > > This RFC patch aims to add AVX10.1 options. After we added -m[no-]evex512
> > > support, it makes a lot easier to add them comparing to the August 
> > > version.
> > > Detail for AVX10 is shown below:
> > >
> > > Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture 
> > > Specification
> > > It describes the Intel Advanced Vector Extensions 10 Instruction Set
> > > Architecture.
> > > https://cdrdv2.intel.com/v1/dl/getContent/784267
> > >
> > > The Converged Vector ISA: Intel Advanced Vector Extensions 10 Technical
> > > Paper
> > > It provides introductory information regarding the converged vector ISA:
> > > Intel Advanced Vector Extensions 10.
> > > https://cdrdv2.intel.com/v1/dl/getContent/784343
> > >
> > > Our proposal is to take AVX10.1-256 and AVX10.1-512 as two "virtual" ISAs
> > > in the compiler. AVX10.1-512 will imply AVX10.1-256. They will not enable
> > > anything at first. At the end of the option handling, we will check
> > > whether the two bits are set. If AVX10.1-256 is set, we will set the
> > > AVX512 related ISA bits. AVX10.1-512 will further set EVEX512 ISA bit.
> > >
> > > It means that AVX10 options will be separated from the existing AVX512
> > > and the newly added -m[no-]evex512 options. AVX10 and AVX512 options will
> > > control (enable/disable/set vector size) the AVX512 features underneath
> > > independently.
> > > If there’s potential overlap or conflict between AVX10 and AVX512 options,
> > > some rules are provided to define the behavior, which will be described
> > > below.
> > >
> > > avx10.1 option will be provided as an alias of avx10.1-256.
> > >
> > > In the future, the AVX10 options will imply like this:
> > >
> > > AVX10.1-256 < AVX10.1-512
> > >  ^ ^
> > >  | |
> > >
> > > AVX10.2-256 < AVX10.2-512
> > >  ^ ^
> > >  | |
> > >
> > > AVX10.3-256 < AVX10.3-512
> > >  ^ ^
> > >  | |
> > >
> > > Each of them will have its own option to enable/disabled corresponding
> > > features. The alias avx10.x will also be provided.
> > >
> > > As mentioned in August version RFC, since we lean towards the adoption of
> > > AVX10 instead of AVX512 from now on, we don’t recommend users to combine
> > > the AVX10 and legacy AVX512 options.
> >
> > I wonder whether adoption could be made easier by also providing a
> > -mavx10[.0] level that removes some of the more obscure sub-ISA requirements
> > to cover more existing implementations (I'd not add -mavx10.0-512 here).
> > I'd require only skylake-AVX512 features here, basically all non-KNL AVX512
> > CPUs should have a "virtual" AVX10 level that allows to use that feature 
> > set,
> We have -mno-evex512 can cover those cases, so what you want is like a
> simple alias of "-march=skylake-avx512 -mno-evex512"?

For the AVX512 enabled sub-isas of skylake-avx512 yes I guess.

> > restricted to 256bits so future AVX10-256 implementations can handle it
> > as well as all existing (and relevant, which excludes KNL) AVX512
> > implementations.
> >
> > Otherwise AVX10 is really a hard sell (as AVX512 was originally).
> It's a rebranding of the existing AVX512 to AVX10, AVX10.0  just
> complicated things further(considering we already have x86-64-v4 which
> is different from skylake-avx512).

Well, the cut-off for "AVX512" is quite arbitrary.  Introducing a
"new" ISA that's only available in HW available in the future and
suggesting users to embrace that already (like Intel did with AVX512
without offering client SKU support) is a hard sell.

I realize Intel thinks client SKU support for AVX10 (restricted to 256bit) will
be "easier".  But then don't expect anybody to adopt that in the next 10 years.

Just to add - we were suggesting to use x86_64-v3 for the "next" enterprise
product but got downvoted to x86_64-v2 for compatibility reasons.

If it were possible I'd axe x86_64-v4.  Maybe we should add a x86_64-v3.5
that sits inbetween v3 and v4, offering AVX512 but restricted to 256bit
(and obviously not requiring more of the AVX512 features that v4 requires).

Richard.

> >
> > > However, we would like to introduce some
> > > simple rules for user when it comes to combination.
> > >
> > > 1. Enabling AVX10 and AVX512 at the same command line with different 
> > > vector
> > > size will lead to a warning message. The behavior of the compiler will be
> > > enabling AVX10 with longer, i.e., 512 bit vector size.
> > >
> > > If the vector sizes are the same (e.g. -mavx10.1-256 -mavx512f 
> > > -mno-evex512,
> > > -mavx10.1-512 -mavx512f), it will be valid with the corresponding vector 
> > > size.
> > >
> > > 2. -mno-avx10.1 option can’t disable any 

Re: [RFC] Intel AVX10.1 Compiler Design and Support

2023-11-12 Thread Hongtao Liu
On Fri, Nov 10, 2023 at 6:15 PM Richard Biener
 wrote:
>
> On Fri, Nov 10, 2023 at 2:42 AM Haochen Jiang  wrote:
> >
> > Hi all,
> >
> > This RFC patch aims to add AVX10.1 options. After we added -m[no-]evex512
> > support, it makes a lot easier to add them comparing to the August version.
> > Detail for AVX10 is shown below:
> >
> > Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification
> > It describes the Intel Advanced Vector Extensions 10 Instruction Set
> > Architecture.
> > https://cdrdv2.intel.com/v1/dl/getContent/784267
> >
> > The Converged Vector ISA: Intel Advanced Vector Extensions 10 Technical
> > Paper
> > It provides introductory information regarding the converged vector ISA:
> > Intel Advanced Vector Extensions 10.
> >
> > Our proposal is to take AVX10.1-256 and AVX10.1-512 as two "virtual" ISAs in
> > the compiler. AVX10.1-512 will imply AVX10.1-256. They will not enable
> > anything at first. At the end of the option handling, we will check whether
> > the two bits are set. If AVX10.1-256 is set, we will set the AVX512 related
> > ISA bits. AVX10.1-512 will further set EVEX512 ISA bit.
> >
> > It means that AVX10 options will be separated from the existing AVX512 and
> > the newly added -m[no-]evex512 options. AVX10 and AVX512 options will
> > control (enable/disable/set vector size) the AVX512 features underneath
> > independently.
> > If there’s potential overlap or conflict between AVX10 and AVX512 options,
> > some rules are provided to define the behavior, which will be described
> > below.
> >
> > avx10.1 option will be provided as an alias of avx10.1-256.
> >
> > In the future, the AVX10 options will imply like this:
> >
> > AVX10.1-256 < AVX10.1-512
> >  ^ ^
> >  | |
> >
> > AVX10.2-256 < AVX10.2-512
> >  ^ ^
> >  | |
> >
> > AVX10.3-256 < AVX10.3-512
> >  ^ ^
> >  | |
> >
> > Each of them will have its own option to enable/disabled corresponding
> > features. The alias avx10.x will also be provided.
> >
> > As mentioned in August version RFC, since we lean towards the adoption of
> > AVX10 instead of AVX512 from now on, we don’t recommend users to combine the
> > AVX10 and legacy AVX512 options.
>
> I wonder whether adoption could be made easier by also providing a
> -mavx10[.0] level that removes some of the more obscure sub-ISA requirements
> to cover more existing implementations (I'd not add -mavx10.0-512 here).
> I'd require only skylake-AVX512 features here, basically all non-KNL AVX512
> CPUs should have a "virtual" AVX10 level that allows to use that feature set,
We have -mno-evex512, which can cover those cases, so what you want is like a
simple alias of "-march=skylake-avx512 -mno-evex512"?
> restricted to 256bits so future AVX10-256 implementations can handle it
> as well as all existing (and relevant, which excludes KNL) AVX512
> implementations.
>
> Otherwise AVX10 is really a hard sell (as AVX512 was originally).
It's a rebranding of the existing AVX512 to AVX10; AVX10.0 would just
complicate things further (considering we already have x86-64-v4, which
is different from skylake-avx512).
>
> > However, we would like to introduce some
> > simple rules for user when it comes to combination.
> >
> > 1. Enabling AVX10 and AVX512 at the same command line with different vector
> > size will lead to a warning message. The behavior of the compiler will be
> > enabling AVX10 with longer, i.e., 512 bit vector size.
> >
> > If the vector sizes are the same (e.g. -mavx10.1-256 -mavx512f -mno-evex512,
> > -mavx10.1-512 -mavx512f), it will be valid with the corresponding vector 
> > size.
> >
> > 2. -mno-avx10.1 option can’t disable any features enabled by AVX512 options 
> > or
> > impact the vector size, and vice versa. The compiler will emit warnings if
> > necessary.
> >
> > For the auto dispatch support including function multi versioning, function
> > attribute usage, the behavior will be identical to compiler options.
> >
> > If you have any questions, feel free to ask in this thread.
> >
> > Thx,
> > Haochen
> >
> >



-- 
BR,
Hongtao


Re: [RFC] Intel AVX10.1 Compiler Design and Support

2023-11-10 Thread Richard Biener
On Fri, Nov 10, 2023 at 2:42 AM Haochen Jiang  wrote:
>
> Hi all,
>
> This RFC patch aims to add AVX10.1 options. After we added -m[no-]evex512
> support, it makes a lot easier to add them comparing to the August version.
> Detail for AVX10 is shown below:
>
> Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification
> It describes the Intel Advanced Vector Extensions 10 Instruction Set
> Architecture.
> https://cdrdv2.intel.com/v1/dl/getContent/784267
>
> The Converged Vector ISA: Intel Advanced Vector Extensions 10 Technical Paper
> It provides introductory information regarding the converged vector ISA: Intel
> Advanced Vector Extensions 10.
> https://cdrdv2.intel.com/v1/dl/getContent/784343
>
> Our proposal is to take AVX10.1-256 and AVX10.1-512 as two "virtual" ISAs in
> the compiler. AVX10.1-512 will imply AVX10.1-256. They will not enable
> anything at first. At the end of the option handling, we will check whether
> the two bits are set. If AVX10.1-256 is set, we will set the AVX512 related
> ISA bits. AVX10.1-512 will further set EVEX512 ISA bit.
>
> It means that AVX10 options will be separated from the existing AVX512 and the
> newly added -m[no-]evex512 options. AVX10 and AVX512 options will control
> (enable/disable/set vector size) the AVX512 features underneath independently.
> If there’s potential overlap or conflict between AVX10 and AVX512 options,
> some rules are provided to define the behavior, which will be described below.
>
> avx10.1 option will be provided as an alias of avx10.1-256.
>
> In the future, the AVX10 options will imply like this:
>
> AVX10.1-256 < AVX10.1-512
>  ^ ^
>  | |
>
> AVX10.2-256 < AVX10.2-512
>  ^ ^
>  | |
>
> AVX10.3-256 < AVX10.3-512
>  ^ ^
>  | |
>
> Each of them will have its own option to enable/disabled corresponding
> features. The alias avx10.x will also be provided.
>
> As mentioned in August version RFC, since we lean towards the adoption of
> AVX10 instead of AVX512 from now on, we don’t recommend users to combine the
> AVX10 and legacy AVX512 options.

I wonder whether adoption could be made easier by also providing a
-mavx10[.0] level that removes some of the more obscure sub-ISA requirements
to cover more existing implementations (I'd not add -mavx10.0-512 here).
I'd require only skylake-AVX512 features here, basically all non-KNL AVX512
CPUs should have a "virtual" AVX10 level that allows to use that feature set,
restricted to 256bits so future AVX10-256 implementations can handle it
as well as all existing (and relevant, which excludes KNL) AVX512
implementations.

Otherwise AVX10 is really a hard sell (as AVX512 was originally).

> However, we would like to introduce some
> simple rules for user when it comes to combination.
>
> 1. Enabling AVX10 and AVX512 at the same command line with different vector
> size will lead to a warning message. The behavior of the compiler will be
> enabling AVX10 with longer, i.e., 512 bit vector size.
>
> If the vector sizes are the same (e.g. -mavx10.1-256 -mavx512f -mno-evex512,
> -mavx10.1-512 -mavx512f), it will be valid with the corresponding vector size.
>
> 2. -mno-avx10.1 option can’t disable any features enabled by AVX512 options or
> impact the vector size, and vice versa. The compiler will emit warnings if
> necessary.
>
> For the auto dispatch support including function multi versioning, function
> attribute usage, the behavior will be identical to compiler options.
>
> If you have any questions, feel free to ask in this thread.
>
> Thx,
> Haochen
>
>


[RFC] Intel AVX10.1 Compiler Design and Support

2023-11-09 Thread Haochen Jiang
Hi all,

This RFC patch aims to add AVX10.1 options. Now that -m[no-]evex512 support
has been added, it is a lot easier to add them compared to the August version.
Details on AVX10 are given below:

Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification 
It describes the Intel Advanced Vector Extensions 10 Instruction Set
Architecture.
https://cdrdv2.intel.com/v1/dl/getContent/784267

The Converged Vector ISA: Intel Advanced Vector Extensions 10 Technical Paper
It provides introductory information regarding the converged vector ISA: Intel
Advanced Vector Extensions 10.
https://cdrdv2.intel.com/v1/dl/getContent/784343

Our proposal is to take AVX10.1-256 and AVX10.1-512 as two "virtual" ISAs in
the compiler. AVX10.1-512 will imply AVX10.1-256. They will not enable
anything at first. At the end of the option handling, we will check whether
the two bits are set. If AVX10.1-256 is set, we will set the AVX512 related
ISA bits. AVX10.1-512 will further set EVEX512 ISA bit.
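
A minimal sketch of this deferred resolution, not the actual patch. The
flag names and both helper functions are hypothetical stand-ins assumed
only for illustration; they are not GCC's OPTION_MASK_ISA* values. Option
parsing only records the "virtual" bits, and a final pass expands them:

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration only.  */
enum
{
  ISA_AVX10_1_256 = 1 << 0, /* "virtual" ISA, enables nothing by itself */
  ISA_AVX10_1_512 = 1 << 1, /* "virtual" ISA, implies AVX10_1_256 */
  ISA_AVX512_SET  = 1 << 2, /* stands in for all AVX512 related ISA bits */
  ISA_EVEX512     = 1 << 3  /* 512 bit vector length allowed */
};

/* While parsing options, only the virtual bits are recorded.  */
static unsigned
handle_avx10_option (unsigned isa, bool want_512)
{
  isa |= ISA_AVX10_1_256;
  if (want_512)
    isa |= ISA_AVX10_1_512; /* the 512 variant implies the 256 one */
  return isa;
}

/* Run once at the end of option handling: expand the virtual bits.  */
static unsigned
finalize_isa (unsigned isa)
{
  if (isa & ISA_AVX10_1_256)
    isa |= ISA_AVX512_SET;
  if (isa & ISA_AVX10_1_512)
    isa |= ISA_EVEX512;
  return isa;
}
```

In this model, finalize_isa (handle_avx10_option (0, false)) ends up with
the AVX512 bits but without EVEX512, which is the AVX10.1-256 case.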

It means that AVX10 options will be separated from the existing AVX512 and the
newly added -m[no-]evex512 options. AVX10 and AVX512 options will control
(enable/disable/set vector size) the AVX512 features underneath independently.
If there’s potential overlap or conflict between AVX10 and AVX512 options,
some rules are provided to define the behavior, which will be described below.

avx10.1 option will be provided as an alias of avx10.1-256.

In the future, the AVX10 options will imply like this:

AVX10.1-256  <-  AVX10.1-512
     ^                ^
     |                |
AVX10.2-256  <-  AVX10.2-512
     ^                ^
     |                |
AVX10.3-256  <-  AVX10.3-512
     ^                ^
     |                |

Each of them will have its own option to enable/disable the corresponding
features. The alias avx10.x will also be provided.

As mentioned in the August version of the RFC, since we lean towards the
adoption of AVX10 instead of AVX512 from now on, we don’t recommend that users
combine the AVX10 and legacy AVX512 options. However, we would like to
introduce some simple rules for users who do combine them.

1. Enabling AVX10 and AVX512 on the same command line with different vector
sizes will lead to a warning message. The compiler will then enable AVX10
with the longer, i.e. 512 bit, vector size.

If the vector sizes are the same (e.g. -mavx10.1-256 -mavx512f -mno-evex512,
-mavx10.1-512 -mavx512f), it will be valid with the corresponding vector size.

2. -mno-avx10.1 option can’t disable any features enabled by AVX512 options or
impact the vector size, and vice versa. The compiler will emit warnings if
necessary.
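
As a rough illustration of rule 1 only (rule 2 is not modelled), here is a
sketch under stated assumptions: struct vec_choice and combine_vector_size
are hypothetical names, and a size of 0 is assumed to mean that the option
family was not used at all. For the AVX512 side, -mavx512f would count as
512 and -mavx512f -mno-evex512 as 256.

```c
#include <stdbool.h>

/* Hypothetical result type: the chosen vector size and whether the
   combination would trigger the warning from rule 1.  */
struct vec_choice
{
  int size;  /* resulting vector size in bits, 0 if neither family is used */
  bool warn; /* true if the two families disagreed on the size */
};

/* AVX10_SIZE and AVX512_SIZE are 0 (family unused), 256 or 512.  */
static struct vec_choice
combine_vector_size (int avx10_size, int avx512_size)
{
  struct vec_choice r = { 0, false };

  if (avx10_size == 0 || avx512_size == 0)
    /* Only one family (or none) present: nothing to reconcile.  */
    r.size = avx10_size ? avx10_size : avx512_size;
  else if (avx10_size == avx512_size)
    /* Same size on both sides: valid, no warning.  */
    r.size = avx10_size;
  else
    {
      /* Sizes differ: warn and take the longer, 512 bit size.  */
      r.size = 512;
      r.warn = true;
    }
  return r;
}
```

For example, combine_vector_size (256, 512) models "-mavx10.1-256 -mavx512f"
and yields size 512 with the warning set.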

For the auto dispatch support including function multi versioning, function
attribute usage, the behavior will be identical to compiler options.

If you have any questions, feel free to ask in this thread.

Thx,
Haochen




Re: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 23, 2023 at 4:31 PM Jakub Jelinek  wrote:
>
> On Wed, Aug 23, 2023 at 08:03:58AM +, Jiang, Haochen wrote:
> > We could first work on -mevex512 then further discuss -mavx10.1-256/512
> > since these -mavx10.1-256/512 is quite controversial.
> >
> > Just to clarify, -mno-evex512 -mavx512f should not enable 512 bit vector
> > right?
>
> I think it should enable them because -mavx512f is after it.  But it seems the
> option handling is more complex than I thought, e.g. -mavx512bw -mno-avx512bw
> just cancels each other, rather than
> enabling AVX512BW, AVX512F, AVX2 and all its dependencies (like -mavx512bw
> alone does) and then just disabling AVX512BW (like -mno-avx512bw does).
> But, if one uses separate target pragmas, it behaves like that:
> #pragma GCC target ("avx512bw")
> #ifdef __AVX512F__
> int a;
> #endif
> #ifdef __AVX512BW__
> int b;
> #endif
> #pragma GCC target ("no-avx512bw")
> #ifdef __AVX512F__
> int c;
> #endif
> #ifdef __AVX512BW__
> int d;
> #endif
> The above defines a, b and c vars even without any special -march= or other
> command line option.
>
> So, first important decision would be whether to make EVEX512
> OPTION_MASK_ISA_EVEX512 or OPTION_MASK_ISA2_EVEX512, the former would need
> to move some other ISA flag from the first to second set.
> That OPTION_MASK_ISA*_EVEX512 then should be added to
> OPTION_MASK_ISA_AVX512F_SET or OPTION_MASK_ISA2_AVX512F_SET (but, if it is
> the latter, we also need to do that for tons of other AVX512*_SET),
> and then just arrange for -mavx10.1-256 to enable
> OPTION_MASK_ISA*_AVX512*_SET of everything it needs except the EVEX512 set
> (but, only disable it from the newly added set, not actually act as
> -mavx512{f,bw,...} -mno-evex512).
> OPTION_MASK_ISA*_EVEX512_SET dunno, should it enable OPTION_MASK_ISA_AVX512F
> or just EVEX512?
> And then the UNSET cases...
We can make it OPTION_MASK_ISA2_EVEX512, but not set/unset it in
ix86_handle_option. Instead, in ix86_option_override_internal, after all
set/unset for the existing AVX512*** is done, if OPTION_MASK_ISA_AVX512F is
still set and there was no explicit set/unset of OPTION_MASK_ISA2_EVEX512,
we set OPTION_MASK_ISA2_EVEX512.
That would make -mavx512*** implicitly set -mevex512, but when there's an
explicit -mno-evex512, -mavx512f won't set -mevex512 no matter where
-mno-evex512 is put (-mno-evex512 -mavx512f still disables 512-bit).
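
To make the ordering argument concrete, a small model of that scheme. This
is only a sketch: struct isa_state and the model_* functions are
hypothetical and merely mirror the split between per-option handling and
the final override step; they are not the real ix86_handle_option or
ix86_option_override_internal.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the ISA flag words; only the three facts
   needed for the rule are tracked.  */
struct isa_state
{
  bool avx512f;          /* an AVX512 option left AVX512F enabled */
  bool evex512;          /* 512 bit vector length enabled */
  bool evex512_explicit; /* user wrote -mevex512 or -mno-evex512 */
};

/* Per-option step: AVX512 options never touch evex512 here.  */
static void
model_handle_option (struct isa_state *s, const char *opt)
{
  if (strcmp (opt, "-mavx512f") == 0)
    s->avx512f = true;
  else if (strcmp (opt, "-mno-avx512f") == 0)
    s->avx512f = false;
  else if (strcmp (opt, "-mevex512") == 0)
    {
      s->evex512 = true;
      s->evex512_explicit = true;
    }
  else if (strcmp (opt, "-mno-evex512") == 0)
    {
      s->evex512 = false;
      s->evex512_explicit = true;
    }
}

/* Final step after all options: default EVEX512 to on only if AVX512F
   survived and the user never mentioned evex512 explicitly.  */
static void
model_override_internal (struct isa_state *s)
{
  if (s->avx512f && !s->evex512_explicit)
    s->evex512 = true;
}

/* Convenience driver: process N options in order, then finalize.  */
static struct isa_state
model_resolve (const char *const *opts, int n)
{
  struct isa_state s = { false, false, false };
  for (int i = 0; i < n; i++)
    model_handle_option (&s, opts[i]);
  model_override_internal (&s);
  return s;
}
```

With this model, both "-mno-evex512 -mavx512f" and "-mavx512f -mno-evex512"
end with evex512 off, while plain "-mavx512f" ends with it on.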
>
> Jakub
>


-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 23, 2023 at 08:03:58AM +, Jiang, Haochen wrote:
> We could first work on -mevex512 then further discuss -mavx10.1-256/512 since
> these -mavx10.1-256/512 is quite controversial.
> 
> Just to clarify, -mno-evex512 -mavx512f should not enable 512 bit vector
> right?

I think it should enable them because -mavx512f is after it.  But it seems the
option handling is more complex than I thought, e.g. -mavx512bw -mno-avx512bw
just cancels each other, rather than
enabling AVX512BW, AVX512F, AVX2 and all its dependencies (like -mavx512bw
alone does) and then just disabling AVX512BW (like -mno-avx512bw does).
But, if one uses separate target pragmas, it behaves like that:
#pragma GCC target ("avx512bw")
#ifdef __AVX512F__
int a;
#endif
#ifdef __AVX512BW__
int b;
#endif
#pragma GCC target ("no-avx512bw")
#ifdef __AVX512F__
int c;
#endif
#ifdef __AVX512BW__
int d;
#endif
The above defines a, b and c vars even without any special -march= or other
command line option.

So, first important decision would be whether to make EVEX512
OPTION_MASK_ISA_EVEX512 or OPTION_MASK_ISA2_EVEX512, the former would need
to move some other ISA flag from the first to second set.
That OPTION_MASK_ISA*_EVEX512 then should be added to
OPTION_MASK_ISA_AVX512F_SET or OPTION_MASK_ISA2_AVX512F_SET (but, if it is
the latter, we also need to do that for tons of other AVX512*_SET),
and then just arrange for -mavx10.1-256 to enable
OPTION_MASK_ISA*_AVX512*_SET of everything it needs except the EVEX512 set
(but, only disable it from the newly added set, not actually act as
-mavx512{f,bw,...} -mno-evex512).
OPTION_MASK_ISA*_EVEX512_SET dunno, should it enable OPTION_MASK_ISA_AVX512F
or just EVEX512?
And then the UNSET cases...

Jakub



Re: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 23, 2023 at 4:16 PM Jakub Jelinek  wrote:
>
> On Wed, Aug 23, 2023 at 01:57:59AM +, Jiang, Haochen wrote:
> > > > Let's assume there's no delta now, AVX10.1-512 is equal to
> > > > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > > > other stuff.
> > > > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* 
> > > > > would be
> > > > > like now, except that the current AVX512* sets imply also 
> > > > > EVEX512/whatever
> > > > > it will be called, that option itself enables nothing (or 
> > > > > TARGET_AVX512F),
> > > > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > > > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > > > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> > > > then the combination basically is equal to AVX10.1-512(AVX512* sets +
> > > > EVEX512)
> > > > If this is your assumption, yes, there's no need for TARGET_AVX10_1.
> >
> > I think we still need that since the current w/o AVX512VL, we will not only
> > enable 512 bit vector instructions but also enable scalar instructions, 
> > which
> > means when it comes to -mavx512bw -mno-evex512, we should enable
> > the scalar function.
> >
> > And scalar functions will also be enabled in AVX10.1-256, we need something
> > to distinguish them out from the ISA set w/o AVX512VL.
>
> Ah, forgot about scalar instructions, even better, then we don't have to do
> that special case.  So, I think TARGET_AVX512F && !TARGET_EVEX512 && 
> !TARGET_AVX512VL
> in general should disable 512-bit modes in ix86_hard_regno_mode_ok.  That
> should prevent the need to replace TARGET_AVX512F to TARGET_EVEX512 on all
> the patterns which refer to 512-bit modes.  Also wonder if it
> wouldn't be easiest to make "v" constraint in that case be equivalent to
> just "x" so that all those hacks to make xmm16+ registers working in various
We can clear the EVEX SSE registers in ix86_conditional_register_usage
when TARGET_AVX512F && !TARGET_EVEX512 && !TARGET_AVX512VL if we don't
care much about the scalar ones.
> instructions through g modifiers wouldn't trigger.  Sure, that would
> penalize also scalar instructions, but the above case wouldn't be something
> any CPU actually supports, it would be only the common subset of say XeonPhi
> and AVX10.1-256.
>
> Jakub
>


-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 23, 2023 at 3:33 PM Richard Biener
 wrote:
>
> On Tue, Aug 22, 2023 at 4:36 PM Hongtao Liu  wrote:
> >
> > On Tue, Aug 22, 2023 at 9:54 PM Jakub Jelinek  wrote:
> > >
> > > On Tue, Aug 22, 2023 at 09:35:44PM +0800, Hongtao Liu wrote:
> > > > Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> > > > evex instruction patterns.
> > >
> > > Why?
> > > Internally for md etc. purposes, we should have the current
> > > TARGET_AVX512* etc. ISA flags, plus one new one, whatever we call it
> > > (TARGET_EVEX512 even if it is not completely descriptive because of kandq
> > > etc., or some other name) which says if 512-bit vector modes can be used,
> > > if g modifier can be used, if the 64-bit mask operations can be used etc.
> > > Plus, if AVX10.1 contains any instructions not covered in the preexisting
> > > TARGET_AVX512* sets, TARGET_AVX10_1 which covers that delta, otherwise
> > > keep -mavx10.1 just as an command line option which enables/disables
> > Let's assume there's no delta now, AVX10.1-512 is equal to
> > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG, VPOPCNTDQ}
> > > other stuff.
> > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would 
> > > be
> > > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
>
> As I said earlier -mavx10.1-256 (and -mavx10.1-512) should not exist.
> So instead
> we'd have -mavx512bw -mavx10.1 where -mavx512bw enables evex512 and
> -mavx10.1 will enable the 10.1 ISAs _not affecting_ whether evex512 is
> set or not.
>
> We then have the -mevex512 flag (or whatever name we agree to) to enable
> (or disable) 512bit support.
>
> If you insist on having -mavx10.1-256 that should alias to -mavx10.1 +
> -mno-evex512,
> but Jakub disagrees here, so I'd rather not have it at all.  We could have
I think we can just support -mevex512 for now; avx10.1-256/512 can
wait for a while, considering it adds no new instructions and is
controversial.
Basically, -mno-evex512 is good enough for most needs.
The only part where I disagree with Jakub is that for -mavx512f
-mno-evex512 -mavx512bw I think we need to disable 512-bit: an explicit
-mno-evex512 should take precedence over an implicit enable.
> -mavx10.1-512 aliasing to -mavx10.1 + -mevex512 (Jakub would agree here).
>
> Richard.



-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 23, 2023 at 01:57:59AM +, Jiang, Haochen wrote:
> > > Let's assume there's no delta now, AVX10.1-512 is equal to
> > > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > > other stuff.
> > > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* 
> > > > would be
> > > > like now, except that the current AVX512* sets imply also 
> > > > EVEX512/whatever
> > > > it will be called, that option itself enables nothing (or 
> > > > TARGET_AVX512F),
> > > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> > > then the combination basically is equal to AVX10.1-512(AVX512* sets +
> > > EVEX512)
> > > If this is your assumption, yes, there's no need for TARGET_AVX10_1.
> 
> I think we still need that since the current w/o AVX512VL, we will not only
> enable 512 bit vector instructions but also enable scalar instructions, which
> means when it comes to -mavx512bw -mno-evex512, we should enable
> the scalar function.
> 
> And scalar functions will also be enabled in AVX10.1-256, we need something
> to distinguish them out from the ISA set w/o AVX512VL.

Ah, forgot about scalar instructions, even better, then we don't have to do
that special case.  So, I think TARGET_AVX512F && !TARGET_EVEX512 && 
!TARGET_AVX512VL
in general should disable 512-bit modes in ix86_hard_regno_mode_ok.  That
should prevent the need to replace TARGET_AVX512F to TARGET_EVEX512 on all
the patterns which refer to 512-bit modes.  Also wonder if it
wouldn't be easiest to make "v" constraint in that case be equivalent to
just "x" so that all those hacks to make xmm16+ registers working in various
instructions through g modifiers wouldn't trigger.  Sure, that would
penalize also scalar instructions, but the above case wouldn't be something
any CPU actually supports, it would be only the common subset of say XeonPhi
and AVX10.1-256.
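
The corner case discussed above can be sketched as a few predicates.
This is a hedged illustration, not GCC code: struct target_flags and
the three functions are hypothetical, the register counts assume 64-bit
mode, and requiring both AVX512F and EVEX512 for 512-bit modes is an
assumption drawn from the thread's description of TARGET_EVEX512.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct target_flags {
    bool avx512f;   /* stands in for TARGET_AVX512F */
    bool avx512vl;  /* stands in for TARGET_AVX512VL */
    bool evex512;   /* stands in for TARGET_EVEX512 */
};

/* The corner case: AVX512F on, but neither 512-bit EVEX nor VL, so only
   scalar EVEX instructions remain.  */
static bool scalar_evex_only(const struct target_flags *t)
{
    assert(t != NULL);
    return t->avx512f && !t->evex512 && !t->avx512vl;
}

/* Assumption: 512-bit vector modes need both AVX512F and EVEX512. */
static bool mode_512_ok(const struct target_flags *t)
{
    assert(t != NULL);
    return t->avx512f && t->evex512;
}

/* Number of SSE registers the "v" constraint may use in 64-bit mode:
   32 (xmm0-xmm31) with AVX512F, but only 16, like "x", in the corner
   case or without AVX512F.  */
static int v_constraint_regs(const struct target_flags *t)
{
    assert(t != NULL);
    if (!t->avx512f || scalar_evex_only(t))
        return 16;
    return 32;
}
```

Treating "v" like "x" in the corner case gives up xmm16+ for scalar
instructions too, which is the penalty mentioned above.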

Jakub



RE: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, August 23, 2023 3:32 PM
> To: Hongtao Liu 
> Cc: Jakub Jelinek ; Jiang, Haochen
> ; ZiNgA BuRgA ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 22, 2023 at 4:36 PM Hongtao Liu  wrote:
> >
> > On Tue, Aug 22, 2023 at 9:54 PM Jakub Jelinek  wrote:
> > >
> > > On Tue, Aug 22, 2023 at 09:35:44PM +0800, Hongtao Liu wrote:
> > > > Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> > > > evex instruction patterns.
> > >
> > > Why?
> > > Internally for md etc. purposes, we should have the current
> > > TARGET_AVX512* etc. ISA flags, plus one new one, whatever we call it
> > > (TARGET_EVEX512 even if it is not completely descriptive because of kandq
> > > etc., or some other name) which says if 512-bit vector modes can be used,
> > > if g modifier can be used, if the 64-bit mask operations can be used etc.
> > > Plus, if AVX10.1 contains any instructions not covered in the preexisting
> > > TARGET_AVX512* sets, TARGET_AVX10_1 which covers that delta, otherwise
> > > keep -mavx10.1 just as an command line option which enables/disables
> > Let's assume there's no delta now, AVX10.1-512 is equal to
> > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,
> VPOPCNTDQ}
> > > other stuff.
> > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET*
> would be
> > > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> 
> As I said earlier -mavx10.1-256 (and -mavx10.1-512) should not exist.
> So instead
> we'd have -mavx512bw -mavx10.1 where -mavx512bw enables evex512 and
> -mavx10.1 will enable the 10.1 ISAs _not affecting_ whether evex512 is
> set or not.
> 
> We then have the -mevex512 flag (or whatever name we agree to) to enable
> (or disable) 512bit support.
> 
> If you insist on having -mavx10.1-256 that should alias to -mavx10.1 +
> -mno-evex512,
> but Jakub disagrees here, so I'd rather not have it at all.  We could have
> -mavx10.1-512 aliasing to -mavx10.1 + -mevex512 (Jakub would agree here).

We could first work on -mevex512 and then further discuss -mavx10.1-256/512,
since -mavx10.1-256/512 are quite controversial.

Just to clarify, -mno-evex512 -mavx512f should not enable 512-bit vectors, right?

Thx,
Haochen

> 
> Richard.


Re: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Richard Biener via Gcc-patches
On Tue, Aug 22, 2023 at 4:36 PM Hongtao Liu  wrote:
>
> On Tue, Aug 22, 2023 at 9:54 PM Jakub Jelinek  wrote:
> >
> > On Tue, Aug 22, 2023 at 09:35:44PM +0800, Hongtao Liu wrote:
> > > Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> > > evex instruction patterns.
> >
> > Why?
> > Internally for md etc. purposes, we should have the current
> > TARGET_AVX512* etc. ISA flags, plus one new one, whatever we call it
> > (TARGET_EVEX512 even if it is not completely descriptive because of kandq
> > etc., or some other name) which says if 512-bit vector modes can be used,
> > if g modifier can be used, if the 64-bit mask operations can be used etc.
> > Plus, if AVX10.1 contains any instructions not covered in the preexisting
> > TARGET_AVX512* sets, TARGET_AVX10_1 which covers that delta, otherwise
> > keep -mavx10.1 just as an command line option which enables/disables
> Let's assume there's no delta now, AVX10.1-512 is equal to
> AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG, VPOPCNTDQ}
> > other stuff.
> > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
> > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > and unsetting it doesn't disable all the TARGET_AVX512*.
> > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.

As I said earlier -mavx10.1-256 (and -mavx10.1-512) should not exist.
So instead
we'd have -mavx512bw -mavx10.1 where -mavx512bw enables evex512 and
-mavx10.1 will enable the 10.1 ISAs _not affecting_ whether evex512 is
set or not.

We then have the -mevex512 flag (or whatever name we agree to) to enable
(or disable) 512bit support.

If you insist on having -mavx10.1-256 that should alias to -mavx10.1 +
-mno-evex512,
but Jakub disagrees here, so I'd rather not have it at all.  We could have
-mavx10.1-512 aliasing to -mavx10.1 + -mevex512 (Jakub would agree here).
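
The aliasing proposed above can be written down as a small lookup. This
is a sketch under the assumptions of this message only; the type and
function names (evex512_effect, avx10_expansion, expand_avx10_1) are
hypothetical and none of this reflects an actual GCC implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* What an option does to the 512-bit switch. */
enum evex512_effect { EVEX512_UNCHANGED, EVEX512_DISABLE, EVEX512_ENABLE };

struct avx10_expansion {
    bool enable_avx10_1;          /* turn on the AVX10.1 ISA sets */
    enum evex512_effect evex512;  /* effect on -m[no-]evex512 */
};

/* Expands one of the three spellings; returns false for anything else. */
static bool expand_avx10_1(const char *opt, struct avx10_expansion *out)
{
    assert(opt != NULL && out != NULL);

    out->enable_avx10_1 = true;
    if (strcmp(opt, "-mavx10.1") == 0) {
        out->evex512 = EVEX512_UNCHANGED;   /* ISA only, size untouched */
    } else if (strcmp(opt, "-mavx10.1-256") == 0) {
        out->evex512 = EVEX512_DISABLE;     /* -mavx10.1 + -mno-evex512 */
    } else if (strcmp(opt, "-mavx10.1-512") == 0) {
        out->evex512 = EVEX512_ENABLE;      /* -mavx10.1 + -mevex512 */
    } else {
        out->enable_avx10_1 = false;
        out->evex512 = EVEX512_UNCHANGED;
        return false;
    }
    return true;
}
```

Keeping the ISA and the vector size as two separate fields mirrors the
request to separate -mavx10.1 from -m[no-]evex512.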

Richard.


RE: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Hongtao Liu 
> Sent: Wednesday, August 23, 2023 10:19 AM
> To: Jiang, Haochen 
> Cc: Jakub Jelinek ; Richard Biener
> ; ZiNgA BuRgA ;
> gcc-patches@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Wed, Aug 23, 2023 at 9:58 AM Jiang, Haochen
>  wrote:
> >
> > > -Original Message-
> > > From: Jakub Jelinek 
> > > Sent: Tuesday, August 22, 2023 11:02 PM
> > > To: Hongtao Liu 
> > > Cc: Richard Biener ; Jiang, Haochen
> > > ; ZiNgA BuRgA ;
> > > gcc- patc...@gcc.gnu.org
> > > Subject: Re: Intel AVX10.1 Compiler Design and Support
> > >
> > > On Tue, Aug 22, 2023 at 10:35:55PM +0800, Hongtao Liu wrote:
> > > > > Let's assume there's no delta now, AVX10.1-512 is equal to
> > > > > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > > > other stuff.
> > > > > The current common/config/i386/i386-common.cc
> > > > > OPTION_MASK_ISA*SET* would be like now, except that the current
> > > > > AVX512* sets imply also EVEX512/whatever it will be called, that
> > > > > option itself enables nothing (or TARGET_AVX512F), and unsetting it
> > > > > > doesn't disable all the TARGET_AVX512*.
> > > > > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > > > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > > > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> > > > then the combination basically is equal to AVX10.1-512(AVX512*
> > > > sets +
> > > > EVEX512)
> > > > If this is your assumption, yes, there's no need for TARGET_AVX10_1.
> >
> > I think we still need that since the current w/o AVX512VL, we will not
> > only enable 512 bit vector instructions but also enable scalar
> > > instructions, which means when it comes to -mavx512bw -mno-evex512,
> > > we should enable the scalar function.
> >
> > And scalar functions will also be enabled in AVX10.1-256, we need
> > something to distinguish them out from the ISA set w/o AVX512VL.
> Why do we need to distinguish scalar EVEX instructions?
> As long as -mavx512XXX -mno-evex512 does not generate zmm/64-bit kmask, it
> should be ok.
> 
> Assume there's no delta in AVX10.1, It sounds to me the design should be like
> 
> avx512*      <== mno-evex512 ==  avx512* + mevex512
> (no-evex512)                     (original AVX512 stuff)
>     /\                               /\
>     ||(equal)                        ||(equal)
>     \/                               \/
> avx10.1-256                      avx10.1-512
>     /\                               /\
>     ||                               ||
>     ||                               ||
>   implied                          implied
>     ||                               ||
>     ||                               ||
> avx10.2-256  <== implied ==      avx10.2-512
>     /\                               /\
>     ||                               ||
>     ||                               ||
>   implied                          implied
>     ||                               ||
>     ||                               ||
> avx10.3-256  <== implied ==      avx10.3-512
> 
> 1. The new instructions in avx10.x should be put in either avx10.x-256 or
> avx10.x-512 according to vector/kmask size
> 2. -mno-evex512 should disable -avx10.x-512.
> 3. -mavx512* will defaultly enable -mevex512, but -mavx10.1-256 will just
> enable -mavx512* but not -mevex512

If there is no objection within 24 hours, I will revert all the AVX10.1
patches that have been committed to trunk, since the design has changed.

I am also working on a sample patch for -mevex512. Although there is a
small encoding issue with the APX EVEX-promoted KMOVQ, most users will
not notice it. And -mevex512 is quite straightforward.

Thx,
Haochen

> 
> >
> > Thx,
> > Haochen
> >
> > >
> > > I think that would be my expectation.  -mavx512bw currently implies
> > > 512-bit vector support of avx512f and avx512bw, and with
> > > -mavx512{bw,vl} also 128-bit/256-bit vector support.  All pre-AVX10
> > > chips which do support AVX512BW support 512-bit vectors.  Now,
> > > -mavx10.1 will bring in also
> > > vl,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq as you
> > > wrote whic

Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 23, 2023 at 9:58 AM Jiang, Haochen  wrote:
>
> > -Original Message-
> > From: Jakub Jelinek 
> > Sent: Tuesday, August 22, 2023 11:02 PM
> > To: Hongtao Liu 
> > Cc: Richard Biener ; Jiang, Haochen
> > ; ZiNgA BuRgA ; gcc-
> > patc...@gcc.gnu.org
> > Subject: Re: Intel AVX10.1 Compiler Design and Support
> >
> > On Tue, Aug 22, 2023 at 10:35:55PM +0800, Hongtao Liu wrote:
> > > Let's assume there's no delta now, AVX10.1-512 is equal to
> > > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > > other stuff.
> > > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* 
> > > > would be
> > > > like now, except that the current AVX512* sets imply also 
> > > > EVEX512/whatever
> > > > it will be called, that option itself enables nothing (or 
> > > > TARGET_AVX512F),
> > > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> > > then the combination basically is equal to AVX10.1-512(AVX512* sets +
> > > EVEX512)
> > > If this is your assumption, yes, there's no need for TARGET_AVX10_1.
>
> I think we still need that since the current w/o AVX512VL, we will not only
> enable 512 bit vector instructions but also enable scalar instructions, which
> means when it comes to -mavx512bw -mno-evex512, we should enable
> the scalar function.
>
> And scalar functions will also be enabled in AVX10.1-256, we need something
> to distinguish them out from the ISA set w/o AVX512VL.
Why do we need to distinguish scalar EVEX instructions?
As long as -mavx512XXX -mno-evex512 does not generate zmm/64-bit kmask,
it should be ok.

Assume there's no delta in AVX10.1, It sounds to me the design should be like

avx512*      <== mno-evex512 ==  avx512* + mevex512
(no-evex512)                     (original AVX512 stuff)
    /\                               /\
    ||(equal)                        ||(equal)
    \/                               \/
avx10.1-256                      avx10.1-512
    /\                               /\
    ||                               ||
    ||                               ||
  implied                          implied
    ||                               ||
    ||                               ||
avx10.2-256  <== implied ==      avx10.2-512
    /\                               /\
    ||                               ||
    ||                               ||
  implied                          implied
    ||                               ||
    ||                               ||
avx10.3-256  <== implied ==      avx10.3-512

1. The new instructions in avx10.x should be put in either avx10.x-256
or avx10.x-512 according to vector/kmask size.
2. -mno-evex512 should disable -avx10.x-512.
3. -mavx512* will enable -mevex512 by default, but -mavx10.1-256 will
just enable -mavx512* but not -mevex512.
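
The implication arrows in the diagram and rule 2 can be sketched as a
partial order on (version, width) pairs. This is only a reading of the
diagram above; struct avx10_level and both functions are hypothetical
names, not GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* One node of the diagram, e.g. avx10.2-512 is {2, 512}. */
struct avx10_level {
    int minor;  /* the x in avx10.x */
    int width;  /* 256 or 512 */
};

static bool valid_level(struct avx10_level l)
{
    return l.minor >= 1 && (l.width == 256 || l.width == 512);
}

/* True if enabling A also enables everything in B: a newer version
   implies older ones, and the 512-bit flavour implies the 256-bit one.  */
static bool avx10_implies(struct avx10_level a, struct avx10_level b)
{
    assert(valid_level(a) && valid_level(b));
    return a.minor >= b.minor && a.width >= b.width;
}

/* Rule 2: -mno-evex512 turns any avx10.x-512 into avx10.x-256. */
static struct avx10_level apply_no_evex512(struct avx10_level l)
{
    assert(valid_level(l));
    if (l.width > 256)
        l.width = 256;
    return l;
}
```

For example, avx10.3-512 implies avx10.1-256, but avx10.2-256 does not
imply avx10.1-512, because nothing in the 256-bit column enables
512-bit vectors.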

>
> Thx,
> Haochen
>
> >
> > I think that would be my expectation.  -mavx512bw currently implies
> > 512-bit vector support of avx512f and avx512bw, and with -mavx512{bw,vl}
> > also 128-bit/256-bit vector support.  All pre-AVX10 chips which do support
> > AVX512BW support 512-bit vectors.  Now, -mavx10.1 will bring in also
> > vl,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq as you wrote
> > which weren't enabled before, but unless there is some existing or planned
> > CPU which would support 512-bit vectors in avx512f and avx512bw ISAs and
> > only support 128/256-bit vectors in those
> > dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq isas, I think there
> > is no need to differentiate further; the only CPUs which will support both
> > what -mavx512bw and -mavx10.1 requires will be (if there is no delta)
> > either CPUs with 128/256/512-bit vector support of those
> > f,vl,bw,dq,cd,...vpopcntdq ISAs, or AVX10.1-512 ISAs.
> > -mavx512vl -mavx512bw -mno-evex512 -mavx10.1-256 would on the other side
> > disable all 512-bit vector instructions and in the end just mean the
> > same as -mavx10.1-256.
> > For just
> > -mavx512bw -mno-evex512 -mavx10.1-256
> > the question is if that -mno-evex512 turns off also avx512bw/avx512f because
> > avx512vl isn't enabled at that point during processing, or if we do that
> > only at the end as a s

RE: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Jakub Jelinek 
> Sent: Tuesday, August 22, 2023 11:02 PM
> To: Hongtao Liu 
> Cc: Richard Biener ; Jiang, Haochen
> ; ZiNgA BuRgA ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 22, 2023 at 10:35:55PM +0800, Hongtao Liu wrote:
> > Let's assume there's no delta now, AVX10.1-512 is equal to
> > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > other stuff.
> > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would 
> > > be
> > > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> > then the combination basically is equal to AVX10.1-512(AVX512* sets +
> > EVEX512)
> > If this is your assumption, yes, there's no need for TARGET_AVX10_1.

I think we still need that: currently, without AVX512VL, we enable not only
512-bit vector instructions but also scalar instructions, which means that
for -mavx512bw -mno-evex512 we should still enable the scalar instructions.

Scalar instructions will also be enabled in AVX10.1-256, so we need something
to distinguish them from the ISA set without AVX512VL.

Thx,
Haochen

> 
> I think that would be my expectation.  -mavx512bw currently implies
> 512-bit vector support of avx512f and avx512bw, and with -mavx512{bw,vl}
> also 128-bit/256-bit vector support.  All pre-AVX10 chips which do support
> AVX512BW support 512-bit vectors.  Now, -mavx10.1 will bring in also
> vl,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq as you wrote
> which weren't enabled before, but unless there is some existing or planned
> CPU which would support 512-bit vectors in avx512f and avx512bw ISAs and
> only support 128/256-bit vectors in those
> dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq isas, I think there
> is no need to differentiate further; the only CPUs which will support both
> what -mavx512bw and -mavx10.1 requires will be (if there is no delta)
> either CPUs with 128/256/512-bit vector support of those
> f,vl,bw,dq,cd,...vpopcntdq ISAs, or AVX10.1-512 ISAs.
> -mavx512vl -mavx512bw -mno-evex512 -mavx10.1-256 would on the other side
> disable all 512-bit vector instructions and in the end just mean the
> same as -mavx10.1-256.
> For just
> -mavx512bw -mno-evex512 -mavx10.1-256
> the question is if that -mno-evex512 turns off also avx512bw/avx512f because
> avx512vl isn't enabled at that point during processing, or if we do that
> only at the end as a special case.  Of course, in this exact case there is
> no difference, because -mavx10.1-256 turns that back on.
> But it would make a difference on
> -mavx512bw -mno-evex512 -mavx512vl
> (when processed right away would disable AVX512BW (because VL isn't on)
> and in the end enable VL,F including EVEX512, or be equivalent to just
> -mavx512bw -mavx512vl if processed at the end, because -mavx512vl implied
> -mevex512 again.
> 
>   Jakub



Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 22, 2023 at 10:35:55PM +0800, Hongtao Liu wrote:
> Let's assume there's no delta now, AVX10.1-512 is equal to
> AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG, VPOPCNTDQ}
> > other stuff.
> > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
> > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > and unsetting it doesn't disable all the TARGET_AVX512*.
> > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> then the combination basically is equal to AVX10.1-512(AVX512* sets +
> EVEX512)
> If this is your assumption, yes, there's no need for TARGET_AVX10_1.

I think that would be my expectation.  -mavx512bw currently implies
512-bit vector support of avx512f and avx512bw, and with -mavx512{bw,vl}
also 128-bit/256-bit vector support.  All pre-AVX10 chips which do support
AVX512BW support 512-bit vectors.  Now, -mavx10.1 will bring in also
vl,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq as you wrote
which weren't enabled before, but unless there is some existing or planned
CPU which would support 512-bit vectors in avx512f and avx512bw ISAs and
only support 128/256-bit vectors in those
dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq isas, I think there
is no need to differentiate further; the only CPUs which will support both
what -mavx512bw and -mavx10.1 requires will be (if there is no delta)
either CPUs with 128/256/512-bit vector support of those
f,vl,bw,dq,cd,...vpopcntdq ISAs, or AVX10.1-512 ISAs.
-mavx512vl -mavx512bw -mno-evex512 -mavx10.1-256 would on the other side
disable all 512-bit vector instructions and in the end just mean the
same as -mavx10.1-256.
For just
-mavx512bw -mno-evex512 -mavx10.1-256
the question is if that -mno-evex512 turns off also avx512bw/avx512f because
avx512vl isn't enabled at that point during processing, or if we do that
only at the end as a special case.  Of course, in this exact case there is
no difference, because -mavx10.1-256 turns that back on.
But it would make a difference on
-mavx512bw -mno-evex512 -mavx512vl
(processing it right away would disable AVX512BW (because VL isn't on)
and in the end enable VL,F including EVEX512, whereas processing it at the
end would be equivalent to just -mavx512bw -mavx512vl, because -mavx512vl
implies -mevex512 again).
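
The two processing orders can be compared with a small simulation. It
models only the rule as posed in this message (without EVEX512 and
without VL, AVX512F and its dependents are dropped); replies elsewhere
in the thread point out that scalar EVEX instructions would still
remain. All names here are hypothetical and the flag set is reduced to
four bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
    ISA_AVX512F  = 1u << 0,
    ISA_AVX512BW = 1u << 1,
    ISA_AVX512VL = 1u << 2,
    ISA_EVEX512  = 1u << 3
};

enum opt { OPT_AVX512BW, OPT_AVX512VL, OPT_NO_EVEX512 };

/* Rule from this message: no 512-bit EVEX and no VL means AVX512F and
   everything depending on it is disabled.  */
static unsigned prune_if_nothing_left(unsigned flags)
{
    if (!(flags & ISA_EVEX512) && !(flags & ISA_AVX512VL))
        flags &= ~(unsigned)(ISA_AVX512F | ISA_AVX512BW);
    return flags;
}

/* Processes options left to right.  Each -mavx512* implies AVX512F and
   EVEX512.  The pruning rule runs either right after -mno-evex512 or
   once at the very end.  */
static unsigned process(const enum opt *opts, size_t n, bool prune_immediately)
{
    unsigned flags = 0;
    assert(opts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (opts[i]) {
        case OPT_AVX512BW:
            flags |= ISA_AVX512BW | ISA_AVX512F | ISA_EVEX512;
            break;
        case OPT_AVX512VL:
            flags |= ISA_AVX512VL | ISA_AVX512F | ISA_EVEX512;
            break;
        case OPT_NO_EVEX512:
            flags &= ~(unsigned)ISA_EVEX512;
            if (prune_immediately)
                flags = prune_if_nothing_left(flags);
            break;
        }
    }
    if (!prune_immediately)
        flags = prune_if_nothing_left(flags);
    return flags;
}
```

For -mavx512bw -mno-evex512 -mavx512vl, immediate pruning ends with
VL, F and EVEX512 but no BW, while pruning at the end keeps BW as well,
which is the difference described above.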

Jakub



Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 22, 2023 at 9:35 PM Hongtao Liu  wrote:
>
> On Tue, Aug 22, 2023 at 9:24 PM Richard Biener
>  wrote:
> >
> > On Tue, Aug 22, 2023 at 3:16 PM Jakub Jelinek  wrote:
> > >
> > > On Tue, Aug 22, 2023 at 09:02:29PM +0800, Hongtao Liu wrote:
> > > > > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best 
> > > > > option
> > > > > name to represent whether the effective ISA set allows 512-bit 
> > > > > vectors or
> > > > > not.  I think -mavx10.1 -mno-avx512cd should be fine.  And, 
> > > > > -mavx10.1-256
> > > > > option IMHO should be in the same spirit to all the others a positive 
> > > > > enablement,
> > > > > not both positive (enable avx512{f,cd,bw,dq,...} and negative 
> > > > > (disallow
> > > > > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because 
> > > > > the
> > > > > former would allow 512-bit vectors, the latter shouldn't disable 
> > > > > those again
> > > > > because it isn't a -mno-* option.  Sure, instructions which are 
> > > > > specific to
> > > > But there's implicit negative (disallow 512-bit vector), I think
> > >
> > > That is wrong.
> > >
> > > > > -mavx512f -mavx10.1-256 or -mavx10.1-256 -mavx512f shouldn't enable
> > > > 512-bit vector.
> > >
> > > Because then the -mavx10.1-256 option behaves completely differently from
> > > all the other isa options.
> > >
> > > We have the -march= options which are processed separately, but the normal
> > > ISA options either only enable something (when -mwhatever), or only 
> > > disable something
> > > (when -mno-whatever). -mavx512f -mavx10.1-256 should be a union of those
> > > ISAs, like say -mavx2 -mbmi is, not an intersection or something even
> > > harder to understand.
> > >
> > > > > Further, we should disallow a mix of evex512 and non-evex512 (e.g.
> > > > > -mavx10.1-512 -mavx10.2-256); there should be a unified separate switch
> > > > > that either disallows both or allows both, instead of some isa
> > > > > allowing it and some isa disallowing it.
> > >
> > > No, it will be really terrible user experience if the new options behave
> > > completely differently from everything else.  Because then we'll need to
> Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> evex instruction patterns.
> > > document it in detail how it behaves and users will have hard time to 
> > > figure
> > > it out, and specify what it does not just on the command line, but also 
> > > when
> > > mixing with target attribute or pragmas.  -mavx10.1-512 -mavx10.2-256 
> > > should
> > > be a union of those two ISAs.  Either internally there is an ISA flag 
> > > whether
> > > the instructions in the avx10.2 ISA but not avx10.1 ISA can operate on
> > > 512-bit vectors or not, in that case -mavx10.1-512 -mavx10.2-256 should
> > > enable the AVX10.1 set including 512-bit vectors + just the < 512-bit
> > > instructions from the 10.1 to 10.2 delta, or if there is no such 
> > > separation
> > > internally, it will just enable full AVX10.2-512.  User has asked for it.
> >
> > I think having all three -mavx10.1, -mavx10.1-256 and -mavx10.1-512 is just
> > confusing.  Please separate ISA (avx10.1) from size.  If -m[no-]evex512 
> > isn't
> > good propose something else.  -mavx512f will enable 512bits, -mavx10.1
> > will not unless -mevex512.  -mavx512f -mavx512vl -mno-evex512 will disable
> > 512bits.
> >
> > So scrap -mavx10.1-256 and -mavx10.1-512 please.
The related issue is what -mno-avx10.1-256/-mno-avx10.1-512 should mean.
For -mno-avx10.1-256, maybe it just disables the whole of avx10.1.
But for -mno-avx10.1-512, should it disable the whole of avx10.1 or just
EVEX512? Or maybe we just don't provide -mno-avx10.1-512, only
-mno-avx10.1-256, and use -mno-evex512 to disable 512-bit vectors.
>
> It sounds to me we would have something like
> avx512XXX
>     ^
>     |
> "independent": TARGET_AVX512VL || TARGET_AVX10_1 will enable
> 128/256-bit instruction.
>     |
> avx10.1-256
>     ^                         ^
>     |                         |
>     |                         |
>  implied                   implied
>     |                         |
>     |                         |
> avx10.2-256
>     ^                         ^
>     |                         |
>     |                         |
>  implied                   implied
>     |                         |
>     |                         |
> avx10.3-256 <---implied--- avx10.3-512
>     .
>
> And put every existing and new instruction under those flags
>
> >
> > Richard.
> >
> > > Jakub
> > >
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 22, 2023 at 9:54 PM Jakub Jelinek  wrote:
>
> On Tue, Aug 22, 2023 at 09:35:44PM +0800, Hongtao Liu wrote:
> > Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> > evex instruction patterns.
>
> Why?
> Internally for md etc. purposes, we should have the current
> TARGET_AVX512* etc. ISA flags, plus one new one, whatever we call it
> (TARGET_EVEX512 even if it is not completely descriptive because of kandq
> etc., or some other name) which says if 512-bit vector modes can be used,
> if g modifier can be used, if the 64-bit mask operations can be used etc.
> Plus, if AVX10.1 contains any instructions not covered in the preexisting
> TARGET_AVX512* sets, TARGET_AVX10_1 which covers that delta, otherwise
> keep -mavx10.1 just as an command line option which enables/disables
Let's assume there's no delta now, AVX10.1-512 is equal to
AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG, VPOPCNTDQ}
> other stuff.
> The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
> like now, except that the current AVX512* sets imply also EVEX512/whatever
> it will be called, that option itself enables nothing (or TARGET_AVX512F),
> and unsetting it doesn't disable all the TARGET_AVX512*.
> -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, and
-mavx10.1-256 doesn't clear EVEX512 but just enables all AVX512* sets?
Then the combination is basically equal to AVX10.1-512 (AVX512* sets +
EVEX512).
If this is your assumption, then yes, there's no need for TARGET_AVX10_1.
(My former understanding was that you wanted -mavx512bw -mavx10.1-256 to
enable all 128/256/scalar variants but only the avx512bw 512-bit
variants; this can't be done without TARGET_AVX10_1.)
So the whole point is that -mavx10.x-256 should neither clear nor set EVEX512,
and -mavx10.x-512 should set EVEX512.
> At the end of the option processing, if EVEX512/whatever is set but
> TARGET_AVX512VL is not, disable TARGET_AVX512F with all its dependencies,
> because VL is a precondition of 128/256-bit EVEX and if 512-bit EVEX is not
> enabled, there is nothing left.
There are still scalar EVEX instructions under TARGET_AVX512F (and other
non-AVX512VL sets) without EVEX512, so it is not the case that nothing is left.
>
> Jakub
>


-- 
BR,
Hongtao
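
The point about scalar instructions can be illustrated with a small Python sketch. It assumes the three-way split discussed in this thread: scalar EVEX forms need only AVX512F, 128/256-bit forms additionally need AVX512VL, and 512-bit forms need the proposed EVEX512 flag. The function name is made up for illustration and does not correspond to anything in GCC.

```python
def usable_evex_forms(avx512f, avx512vl, evex512):
    """Model which EVEX operand widths remain usable for a flag combination.

    Assumption (taken from the discussion, not from GCC sources): scalar forms
    hang off AVX512F alone, 128/256-bit forms need AVX512VL, and 512-bit forms
    need EVEX512.
    """
    forms = set()
    if not avx512f:
        return forms
    forms.add("scalar")
    if avx512vl:
        forms.update({"128", "256"})
    if evex512:
        forms.add("512")
    return forms


if __name__ == "__main__":
    # AVX512F without VL and without EVEX512: the scalar forms are still left.
    print(sorted(usable_evex_forms(True, False, False)))
```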


Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 22, 2023 at 09:35:44PM +0800, Hongtao Liu wrote:
> Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> evex instruction patterns.

Why?
Internally for md etc. purposes, we should have the current
TARGET_AVX512* etc. ISA flags, plus one new one, whatever we call it
(TARGET_EVEX512 even if it is not completely descriptive because of kandq
etc., or some other name) which says if 512-bit vector modes can be used,
if g modifier can be used, if the 64-bit mask operations can be used etc.
Plus, if AVX10.1 contains any instructions not covered in the preexisting
TARGET_AVX512* sets, TARGET_AVX10_1 which covers that delta, otherwise
keep -mavx10.1 just as a command line option which enables/disables
other stuff.
The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
like now, except that the current AVX512* sets imply also EVEX512/whatever
it will be called, that option itself enables nothing (or TARGET_AVX512F),
and unsetting it doesn't disable all the TARGET_AVX512*.
-mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
At the end of the option processing, if EVEX512/whatever is not set and
TARGET_AVX512VL is not either, disable TARGET_AVX512F with all its dependencies,
because VL is a precondition of 128/256-bit EVEX and if 512-bit EVEX is not
enabled, there is nothing left.

Jakub
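
A rough Python model of the scheme described in this message may help. It is only a sketch: the `Isa` flags, the `SET_MASK` table and the helper names are hypothetical stand-ins for the OPTION_MASK_ISA*SET* machinery, only three AVX512 sets are modelled, and the final rule is read as applying when 512-bit EVEX is not enabled and AVX512VL is not set, which is what its stated rationale describes.

```python
from enum import Flag, auto


class Isa(Flag):
    AVX512F = auto()
    AVX512VL = auto()
    AVX512BW = auto()
    EVEX512 = auto()  # proposed flag: 512-bit vectors and 64-bit masks allowed


# Hypothetical SET masks: each AVX512* option also implies EVEX512.
SET_MASK = {
    "avx512f": Isa.AVX512F | Isa.EVEX512,
    "avx512vl": Isa.AVX512VL | Isa.AVX512F | Isa.EVEX512,
    "avx512bw": Isa.AVX512BW | Isa.AVX512F | Isa.EVEX512,
    # -mavx10.1: the AVX512* sets, but without EVEX512.
    "avx10.1": Isa.AVX512F | Isa.AVX512VL | Isa.AVX512BW,
    # -mevex512 on its own enables nothing else.
    "evex512": Isa.EVEX512,
}


def enable(flags, name):
    """Positive option: OR in the SET mask, never clear anything."""
    return flags | SET_MASK[name]


def disable_evex512(flags):
    """-mno-evex512 clears only the width flag, not the AVX512* sets."""
    return flags & ~Isa.EVEX512


def finalize(flags):
    """End-of-option-processing fixup as read from the message above."""
    if not (flags & Isa.EVEX512) and not (flags & Isa.AVX512VL):
        # Neither 512-bit nor 128/256-bit EVEX is available. Every flag in
        # this small model depends on AVX512F, so everything is dropped.
        return Isa(0)
    return flags
```

Note that this models the proposal as stated; whether scalar EVEX forms should survive the final fixup is a separate question raised elsewhere in the thread.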



Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 22, 2023 at 9:24 PM Richard Biener
 wrote:
>
> On Tue, Aug 22, 2023 at 3:16 PM Jakub Jelinek  wrote:
> >
> > On Tue, Aug 22, 2023 at 09:02:29PM +0800, Hongtao Liu wrote:
> > > > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > > > name to represent whether the effective ISA set allows 512-bit vectors 
> > > > or
> > > > not.  I think -mavx10.1 -mno-avx512cd should be fine.  And, 
> > > > -mavx10.1-256
> > > > option IMHO should be in the same spirit to all the others a positive 
> > > > enablement,
> > > > not both positive (enable avx512{f,cd,bw,dq,...} and negative (disallow
> > > > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > > > former would allow 512-bit vectors, the latter shouldn't disable those 
> > > > again
> > > > because it isn't a -mno-* option.  Sure, instructions which are 
> > > > specific to
> > > But there's implicit negative (disallow 512-bit vector), I think
> >
> > That is wrong.
> >
> > > -mavx512f -mavx10.1-256 or -mavx10.1-256 -mavx512f shouldn't enable
> > > 512-bit vector.
> >
> > Because then the -mavx10.1-256 option behaves completely differently from
> > all the other isa options.
> >
> > We have the -march= options which are processed separately, but the normal
> > ISA options either only enable something (when -mwhatever), or only disable 
> > something
> > (when -mno-whatever). -mavx512f -mavx10.1-256 should be a union of those
> > ISAs, like say -mavx2 -mbmi is, not an intersection or something even
> > harder to understand.
> >
> > > Further, we should disallow a mix of evex512 and non-evex512 (e.g.
> > > -mavx10.1-512 -mavx10.2-256),they should be a unified separate switch
> > > that either disallows both or allows both. Instead of some isa
> > > allowing it and some isa disallowing it.
> >
> > No, it will be really terrible user experience if the new options behave
> > completely differently from everything else.  Because then we'll need to
Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
evex instruction patterns.
> > document it in detail how it behaves and users will have hard time to figure
> > it out, and specify what it does not just on the command line, but also when
> > mixing with target attribute or pragmas.  -mavx10.1-512 -mavx10.2-256 should
> > be a union of those two ISAs.  Either internally there is an ISA flag 
> > whether
> > the instructions in the avx10.2 ISA but not avx10.1 ISA can operate on
> > 512-bit vectors or not, in that case -mavx10.1-512 -mavx10.2-256 should
> > enable the AVX10.1 set including 512-bit vectors + just the < 512-bit
> > instructions from the 10.1 to 10.2 delta, or if there is no such separation
> > internally, it will just enable full AVX10.2-512.  User has asked for it.
>
> I think having all three -mavx10.1, -mavx10.1-256 and -mavx10.1-512 is just
> confusing.  Please separate ISA (avx10.1) from size.  If -m[no-]evex512 isn't
> good propose something else.  -mavx512f will enable 512bits, -mavx10.1
> will not unless -mevex512.  -mavx512f -mavx512vl -mno-evex512 will disable
> 512bits.
>
> So scrap -mavx10.1-256 and -mavx10.1-512 please.

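It sounds to me we would have something like
avx512XXX
   ^
   |
"independent": TARGET_AVX512VL || TARGET_AVX10_1 will enable
128/256-bit instruction.
   |
avx10.1-256 <---implied---avx10.1-512
   ^                         ^
   |                         |
   |                         |
implied                   implied
   |                         |
   |                         |
avx10.2-256 <---implied---avx10.2-512
   ^                         ^
   |                         |
   |                         |
implied                   implied
   |                         |
   |                         |
avx10.3-256 <---implied---avx10.3-512

And put every existing and new instruction under those flags
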
> Richard.
>
> > Jakub
> >



-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Richard Biener via Gcc-patches
On Tue, Aug 22, 2023 at 3:16 PM Jakub Jelinek  wrote:
>
> On Tue, Aug 22, 2023 at 09:02:29PM +0800, Hongtao Liu wrote:
> > > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > > name to represent whether the effective ISA set allows 512-bit vectors or
> > > not.  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> > > option IMHO should be in the same spirit to all the others a positive 
> > > enablement,
> > > not both positive (enable avx512{f,cd,bw,dq,...} and negative (disallow
> > > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > > former would allow 512-bit vectors, the latter shouldn't disable those 
> > > again
> > > because it isn't a -mno-* option.  Sure, instructions which are specific 
> > > to
> > But there's implicit negative (disallow 512-bit vector), I think
>
> That is wrong.
>
> > -mavx512f -mavx10.1-256 or -mavx10.1-256 -mavx512f shouldn't enable
> > 512-bit vector.
>
> Because then the -mavx10.1-256 option behaves completely differently from
> all the other isa options.
>
> We have the -march= options which are processed separately, but the normal
> ISA options either only enable something (when -mwhatever), or only disable 
> something
> (when -mno-whatever). -mavx512f -mavx10.1-256 should be a union of those
> ISAs, like say -mavx2 -mbmi is, not an intersection or something even
> harder to understand.
>
> > Further, we should disallow a mix of evex512 and non-evex512 (e.g.
> > -mavx10.1-512 -mavx10.2-256),they should be a unified separate switch
> > that either disallows both or allows both. Instead of some isa
> > allowing it and some isa disallowing it.
>
> No, it will be really terrible user experience if the new options behave
> completely differently from everything else.  Because then we'll need to
> document it in detail how it behaves and users will have hard time to figure
> it out, and specify what it does not just on the command line, but also when
> mixing with target attribute or pragmas.  -mavx10.1-512 -mavx10.2-256 should
> be a union of those two ISAs.  Either internally there is an ISA flag whether
> the instructions in the avx10.2 ISA but not avx10.1 ISA can operate on
> 512-bit vectors or not, in that case -mavx10.1-512 -mavx10.2-256 should
> enable the AVX10.1 set including 512-bit vectors + just the < 512-bit
> instructions from the 10.1 to 10.2 delta, or if there is no such separation
> internally, it will just enable full AVX10.2-512.  User has asked for it.

I think having all three -mavx10.1, -mavx10.1-256 and -mavx10.1-512 is just
confusing.  Please separate ISA (avx10.1) from size.  If -m[no-]evex512 isn't
good propose something else.  -mavx512f will enable 512bits, -mavx10.1
will not unless -mevex512.  -mavx512f -mavx512vl -mno-evex512 will disable
512bits.

So scrap -mavx10.1-256 and -mavx10.1-512 please.

Richard.

> Jakub
>
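
The three command lines in the message above can be walked through with a short Python sketch. It assumes the semantics proposed there (any -mavx512* option turns 512-bit vectors on, -mavx10.1 leaves the width alone, and -m[no-]evex512 toggles it); the function is a hypothetical illustration, and option spellings other than those quoted in the thread are not handled.

```python
def allows_512bit(options):
    """Walk options left to right; report whether 512-bit vectors end up on."""
    evex512 = False
    for opt in options:
        if opt.startswith("-mavx512") or opt == "-mevex512":
            evex512 = True
        elif opt == "-mno-evex512":
            evex512 = False
        # -mavx10.1 and unrelated options leave the width setting alone.
    return evex512


if __name__ == "__main__":
    for line in (
        ["-mavx512f"],
        ["-mavx10.1"],
        ["-mavx10.1", "-mevex512"],
        ["-mavx512f", "-mavx512vl", "-mno-evex512"],
    ):
        print(" ".join(line), "->", allows_512bit(line))
```

Because the width is a separate toggle, the ISA options stay purely additive and only -mno-evex512 ever removes 512-bit support.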


Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 22, 2023 at 09:02:29PM +0800, Hongtao Liu wrote:
> > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > name to represent whether the effective ISA set allows 512-bit vectors or
> > not.  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> > option IMHO should be in the same spirit to all the others a positive 
> > enablement,
> > not both positive (enable avx512{f,cd,bw,dq,...} and negative (disallow
> > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > former would allow 512-bit vectors, the latter shouldn't disable those again
> > because it isn't a -mno-* option.  Sure, instructions which are specific to
> But there's implicit negative (disallow 512-bit vector), I think

That is wrong.

> -mavx512f -mavx10.1-256 or -mavx10.1-256 -mavx512f shouldn't enable
> 512-bit vector.

Because then the -mavx10.1-256 option behaves completely differently from
all the other isa options.

We have the -march= options which are processed separately, but the normal
ISA options either only enable something (when -mwhatever), or only disable 
something
(when -mno-whatever). -mavx512f -mavx10.1-256 should be a union of those
ISAs, like say -mavx2 -mbmi is, not an intersection or something even
harder to understand.

> Further, we should disallow a mix of evex512 and non-evex512 (e.g.
> -mavx10.1-512 -mavx10.2-256),they should be a unified separate switch
> that either disallows both or allows both. Instead of some isa
> allowing it and some isa disallowing it.

No, it will be really terrible user experience if the new options behave
completely differently from everything else.  Because then we'll need to
document it in detail how it behaves and users will have hard time to figure
it out, and specify what it does not just on the command line, but also when
mixing with target attribute or pragmas.  -mavx10.1-512 -mavx10.2-256 should
be a union of those two ISAs.  Either internally there is an ISA flag whether
the instructions in the avx10.2 ISA but not avx10.1 ISA can operate on
512-bit vectors or not, in that case -mavx10.1-512 -mavx10.2-256 should
enable the AVX10.1 set including 512-bit vectors + just the < 512-bit
instructions from the 10.1 to 10.2 delta, or if there is no such separation
internally, it will just enable full AVX10.2-512.  User has asked for it.

Jakub
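
The two outcomes described above for -mavx10.1-512 -mavx10.2-256 can be sketched in Python. This is a toy model under stated assumptions: the group labels and the `combine` helper are invented for illustration, options are given as (level, width) pairs, and only levels 10.1 and 10.2 are modelled.

```python
AVX10_1 = "avx10.1"
AVX10_2_DELTA = "avx10.2-delta"  # hypothetical label: instructions new in 10.2


def combine(options, per_level_width_flag):
    """Union of -mavx10.N-W options under the two internal designs.

    Returns a set of (instruction_group, max_vector_bits) pairs.
    """
    groups = {}
    for level, width in options:
        # Level 10.2 covers the 10.1 set plus the 10.1 -> 10.2 delta.
        covered = [AVX10_1] if level == 1 else [AVX10_1, AVX10_2_DELTA]
        for group in covered:
            groups[group] = max(groups.get(group, 0), width)
    if groups and not per_level_width_flag:
        # One global width flag: every enabled group gets the widest request.
        widest = max(groups.values())
        groups = {group: widest for group in groups}
    return set(groups.items())


if __name__ == "__main__":
    request = [(1, 512), (2, 256)]  # -mavx10.1-512 -mavx10.2-256
    print(sorted(combine(request, per_level_width_flag=True)))
    print(sorted(combine(request, per_level_width_flag=False)))
```

In both designs the result is a union and does not depend on the order of the options, which is the property argued for in the message.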



Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 22, 2023 at 4:34 PM Jakub Jelinek  wrote:
>
> On Tue, Aug 22, 2023 at 09:36:15AM +0200, Richard Biener via Gcc-patches 
> wrote:
> > I think internally we should have conditional 512bit support work across
> > AVX512 and AVX10.
> >
> > I also think it makes sense to _internally_ have AVX10.1 (10.1!) just
> > enable the respective AVX512 features.  AVX10.2 would then internally
> > cover the ISA extensions added in 10.2 only.  Both would reduce the
> > redundancy and possibly make providing inter-operation between
> > AVX10.1 (10.1!) and AVX512 to the user easier.  I see AVX 10.1 (10.1!)
> > just as "re-branding" latest AVX512, so we should treat it that way
> > (making it an alias to the AVX512 features).
> >
> > Whether we want allow -mavx10.1 -mno-avx512cd or whether
> > we only allow the "positive" -mavx512f -mavx512... (omitting avx512cd)
> > is an entirely separate
> > question.  But I think to not wreck the core idea (more interoperability,
> > here between small/big cores) we absolutely have to
> > provide a subset of avx10.1 but with disabled 512bit vectors which
> > effectively means AVX512 with disabled 512bit support.
>
> Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> name to represent whether the effective ISA set allows 512-bit vectors or
> not.  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> option IMHO should be in the same spirit to all the others a positive 
> enablement,
> not both positive (enable avx512{f,cd,bw,dq,...} and negative (disallow
> 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> former would allow 512-bit vectors, the latter shouldn't disable those again
> because it isn't a -mno-* option.  Sure, instructions which are specific to
But there's an implicit negative (disallow 512-bit vectors); I think
-mavx512f -mavx10.1-256 or -mavx10.1-256 -mavx512f shouldn't enable
512-bit vectors.
Further, we should disallow a mix of evex512 and non-evex512 (e.g.
-mavx10.1-512 -mavx10.2-256); there should be a unified separate switch
that either disallows both or allows both, instead of some ISAs
allowing it and some ISAs disallowing it.
> AVX10.1 (aren't present in any currently existing AVX512* ISA set) might be
> enabled only in 128/256 bit variants if we differentiate that level.
> But, if one uses -mavx2 -mavx10.1-256, because no AVX512* has been enabled
> it can enable all the AVX10.1 implied AVX512* parts without EVEX.512.
>
> Jakub
>


-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Richard Biener via Gcc-patches
On Tue, Aug 22, 2023 at 10:53 AM Jiang, Haochen  wrote:
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, August 22, 2023 4:36 PM
> > To: Jakub Jelinek 
> > Cc: Jiang, Haochen ; ZiNgA BuRgA
> > ; Hongtao Liu ; gcc-
> > patc...@gcc.gnu.org
> > Subject: Re: Intel AVX10.1 Compiler Design and Support
> >
> > On Tue, Aug 22, 2023 at 10:34 AM Jakub Jelinek  wrote:
> > >
> > > On Tue, Aug 22, 2023 at 09:36:15AM +0200, Richard Biener via Gcc-patches
> > wrote:
> > > > I think internally we should have conditional 512bit support work across
> > > > AVX512 and AVX10.
> > > >
> > > > I also think it makes sense to _internally_ have AVX10.1 (10.1!) just
> > > > enable the respective AVX512 features.  AVX10.2 would then internally
> > > > cover the ISA extensions added in 10.2 only.  Both would reduce the
> > > > redundancy and possibly make providing inter-operation between
> > > > AVX10.1 (10.1!) and AVX512 to the user easier.  I see AVX 10.1 (10.1!)
> > > > just as "re-branding" latest AVX512, so we should treat it that way
> > > > (making it an alias to the AVX512 features).
> > > >
> > > > Whether we want allow -mavx10.1 -mno-avx512cd or whether
> > > > we only allow the "positive" -mavx512f -mavx512... (omitting avx512cd)
> > > > is an entirely separate
> > > > question.  But I think to not wreck the core idea (more 
> > > > interoperability,
> > > > here between small/big cores) we absolutely have to
> > > > provide a subset of avx10.1 but with disabled 512bit vectors which
> > > > effectively means AVX512 with disabled 512bit support.
> > >
> > > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > > name to represent whether the effective ISA set allows 512-bit vectors or
> > > not.
> >
> > Works for me.  Note it also implies mask regs are SImode, not DImode,
> > not sure if that relates to evex more than mask reg encodings are all evex 
> > ...
> >
>
> Just in case we are not on the same page.
>
> So we are looking forward to an "extended" -m[no-]avx10-max-512bit option,
> which can also be used on AVX512. The other basic logic will not change.

Yes, I think that fulfills the main complaints.

Internally I'd also like to avoid having TARGET_AVX10.1 guards in the md file
but alias -mavx10.1 to the set of AVX512 sub-ISAs it covers.  Only have
TARGET_AVX10.2 covering ISA extensions introduced with 10.2.

> BTW, -mevex512 is not a good name since there will be 64 bit mask operations
> promoted to EVEX128 in APX, which might cause confusion.
>
> Thx,
> Haochen
>
> > >  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> > > option IMHO should be in the same spirit to all the others a positive
> > enablement,
> > > not both positive (enable avx512{f,cd,bw,dq,...} and negative (disallow
> > > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > > former would allow 512-bit vectors, the latter shouldn't disable those 
> > > again
> > > because it isn't a -mno-* option.  Sure, instructions which are specific 
> > > to
> > > AVX10.1 (aren't present in any currently existing AVX512* ISA set) might 
> > > be
> > > enabled only in 128/256 bit variants if we differentiate that level.
> > > But, if one uses -mavx2 -mavx10.1-256, because no AVX512* has been enabled
> > > it can enable all the AVX10.1 implied AVX512* parts without EVEX.512.
> > >
> > > Jakub
> > >


RE: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, August 22, 2023 4:36 PM
> To: Jakub Jelinek 
> Cc: Jiang, Haochen ; ZiNgA BuRgA
> ; Hongtao Liu ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 22, 2023 at 10:34 AM Jakub Jelinek  wrote:
> >
> > On Tue, Aug 22, 2023 at 09:36:15AM +0200, Richard Biener via Gcc-patches
> wrote:
> > > I think internally we should have conditional 512bit support work across
> > > AVX512 and AVX10.
> > >
> > > I also think it makes sense to _internally_ have AVX10.1 (10.1!) just
> > > enable the respective AVX512 features.  AVX10.2 would then internally
> > > cover the ISA extensions added in 10.2 only.  Both would reduce the
> > > redundancy and possibly make providing inter-operation between
> > > AVX10.1 (10.1!) and AVX512 to the user easier.  I see AVX 10.1 (10.1!)
> > > just as "re-branding" latest AVX512, so we should treat it that way
> > > (making it an alias to the AVX512 features).
> > >
> > > Whether we want allow -mavx10.1 -mno-avx512cd or whether
> > > we only allow the "positive" -mavx512f -mavx512... (omitting avx512cd)
> > > is an entirely separate
> > > question.  But I think to not wreck the core idea (more interoperability,
> > > here between small/big cores) we absolutely have to
> > > provide a subset of avx10.1 but with disabled 512bit vectors which
> > > effectively means AVX512 with disabled 512bit support.
> >
> > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > name to represent whether the effective ISA set allows 512-bit vectors or
> > not.
> 
> Works for me.  Note it also implies mask regs are SImode, not DImode,
> not sure if that relates to evex more than mask reg encodings are all evex ...
> 

Just in case we are not on the same page.

So we are looking forward to an "extended" -m[no-]avx10-max-512bit option,
which can also be used on AVX512. The other basic logic will not change.

BTW, -mevex512 is not a good name since there will be 64 bit mask operations
promoted to EVEX128 in APX, which might cause confusion.

Thx,
Haochen

> >  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> > option IMHO should be in the same spirit to all the others a positive
> enablement,
> > not both positive (enable avx512{f,cd,bw,dq,...} and negative (disallow
> > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > former would allow 512-bit vectors, the latter shouldn't disable those again
> > because it isn't a -mno-* option.  Sure, instructions which are specific to
> > AVX10.1 (aren't present in any currently existing AVX512* ISA set) might be
> > enabled only in 128/256 bit variants if we differentiate that level.
> > But, if one uses -mavx2 -mavx10.1-256, because no AVX512* has been enabled
> > it can enable all the AVX10.1 implied AVX512* parts without EVEX.512.
> >
> > Jakub
> >


Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Richard Biener via Gcc-patches
On Tue, Aug 22, 2023 at 10:34 AM Jakub Jelinek  wrote:
>
> On Tue, Aug 22, 2023 at 09:36:15AM +0200, Richard Biener via Gcc-patches 
> wrote:
> > I think internally we should have conditional 512bit support work across
> > AVX512 and AVX10.
> >
> > I also think it makes sense to _internally_ have AVX10.1 (10.1!) just
> > enable the respective AVX512 features.  AVX10.2 would then internally
> > cover the ISA extensions added in 10.2 only.  Both would reduce the
> > redundancy and possibly make providing inter-operation between
> > AVX10.1 (10.1!) and AVX512 to the user easier.  I see AVX 10.1 (10.1!)
> > just as "re-branding" latest AVX512, so we should treat it that way
> > (making it an alias to the AVX512 features).
> >
> > Whether we want allow -mavx10.1 -mno-avx512cd or whether
> > we only allow the "positive" -mavx512f -mavx512... (omitting avx512cd)
> > is an entirely separate
> > question.  But I think to not wreck the core idea (more interoperability,
> > here between small/big cores) we absolutely have to
> > provide a subset of avx10.1 but with disabled 512bit vectors which
> > effectively means AVX512 with disabled 512bit support.
>
> Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> name to represent whether the effective ISA set allows 512-bit vectors or
> not.

Works for me.  Note it also implies mask regs are SImode, not DImode,
not sure if that relates to evex more than mask reg encodings are all evex ...
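
The mask-register remark follows from simple lane counting: a mask needs one bit per vector element. A rough Python sketch (the function name is hypothetical) shows why capping vectors at 256 bits caps masks at 32 bits, which corresponds to SImode rather than DImode.

```python
def mask_bits(vector_bits, element_bits):
    """One mask bit per element: lanes = vector width / element width."""
    if vector_bits % element_bits != 0:
        raise ValueError("element width must divide vector width")
    return vector_bits // element_bits


if __name__ == "__main__":
    # Byte elements are the worst case for mask width.
    print(mask_bits(512, 8))  # lanes in a 512-bit vector of bytes
    print(mask_bits(256, 8))  # lanes in a 256-bit vector of bytes
```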

>  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> option IMHO should be in the same spirit to all the others a positive 
> enablement,
> not both positive (enable avx512{f,cd,bw,dq,...} and negative (disallow
> 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> former would allow 512-bit vectors, the latter shouldn't disable those again
> because it isn't a -mno-* option.  Sure, instructions which are specific to
> AVX10.1 (aren't present in any currently existing AVX512* ISA set) might be
> enabled only in 128/256 bit variants if we differentiate that level.
> But, if one uses -mavx2 -mavx10.1-256, because no AVX512* has been enabled
> it can enable all the AVX10.1 implied AVX512* parts without EVEX.512.
>
> Jakub
>


Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 22, 2023 at 09:36:15AM +0200, Richard Biener via Gcc-patches wrote:
> I think internally we should have conditional 512bit support work across
> AVX512 and AVX10.
> 
> I also think it makes sense to _internally_ have AVX10.1 (10.1!) just
> enable the respective AVX512 features.  AVX10.2 would then internally
> cover the ISA extensions added in 10.2 only.  Both would reduce the
> redundancy and possibly make providing inter-operation between
> AVX10.1 (10.1!) and AVX512 to the user easier.  I see AVX 10.1 (10.1!)
> just as "re-branding" latest AVX512, so we should treat it that way
> (making it an alias to the AVX512 features).
> 
> Whether we want allow -mavx10.1 -mno-avx512cd or whether
> we only allow the "positive" -mavx512f -mavx512... (omitting avx512cd)
> is an entirely separate
> question.  But I think to not wreck the core idea (more interoperability,
> here between small/big cores) we absolutely have to
> provide a subset of avx10.1 but with disabled 512bit vectors which
> effectively means AVX512 with disabled 512bit support.

Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
name to represent whether the effective ISA set allows 512-bit vectors or
not.  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
option IMHO should be in the same spirit to all the others a positive 
enablement,
not both positive (enable avx512{f,cd,bw,dq,...} and negative (disallow
512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
former would allow 512-bit vectors, the latter shouldn't disable those again
because it isn't a -mno-* option.  Sure, instructions which are specific to
AVX10.1 (aren't present in any currently existing AVX512* ISA set) might be
enabled only in 128/256 bit variants if we differentiate that level.
But, if one uses -mavx2 -mavx10.1-256, because no AVX512* has been enabled
it can enable all the AVX10.1 implied AVX512* parts without EVEX.512.

Jakub



Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Richard Biener via Gcc-patches
On Tue, Aug 22, 2023 at 5:20 AM Jiang, Haochen  wrote:
>
> > -Original Message-
> > From: ZiNgA BuRgA 
> > Sent: Monday, August 21, 2023 5:27 PM
> > To: Richard Biener ; Hongtao Liu
> > 
> > Cc: Jiang, Haochen ; gcc-patches@gcc.gnu.org
> > Subject: Re: Intel AVX10.1 Compiler Design and Support
> >
> > Another way (not saying this is better, just throwing out ideas) is to
> > break AVX10.1 into all the AVX-512 subsets.
> > So you'd have something like -mavx10.1-256-vl, -mavx10.1-512-vbmi etc.
> >
> > * -mavx10.1-256  would effectively be an alias for all the 128+256-bit
> > subsets, and set the __AVX10_1__ define
> > * -mavx512vbmi  would effectively be an alias for `-mavx10.1-128-vbmi
> > -mavx10.1-256-vbmi -mavx10.1-512-vbmi` and set the __AVX512VBMI__ define
> > (`-mavx10.1-512-vl` might not make much sense unless it implies AVX512F?)
> > * -mno-avx512vbmi  would similarly be an alias for
> > `-mno-avx10.1-128-vbmi -mno-avx10.1-256-vbmi -mno-avx10.1-512-vbmi`;
> > with this, `-mavx10.1-256 -mno-avx512vbmi` would make sense, even if
> > unusual (enable all AVX10.1 but disable all VBMI)
> > * -mavx10.2-256  would act as a single feature, cementing in AVX10.2
> > like the current AVX10.1 proposal, and AVX-512 subsets can't be turned off
>
> I am considering a proposal quite similar to this if we want to change the
> design so that it is flexible.
>
> But there are a few proposals on the table. The concern with this proposal
> is that splitting out each AVX512 feature may be over-design, since in most
> scenarios we just need to keep the vector width the same.

I think internally we should have conditional 512bit support work across
AVX512 and AVX10.

I also think it makes sense to _internally_ have AVX10.1 (10.1!) just
enable the respective AVX512 features.  AVX10.2 would then internally
cover the ISA extensions added in 10.2 only.  Both would reduce the
redundancy and possibly make providing inter-operation between
AVX10.1 (10.1!) and AVX512 to the user easier.  I see AVX 10.1 (10.1!)
just as "re-branding" latest AVX512, so we should treat it that way
(making it an alias to the AVX512 features).

Whether we want allow -mavx10.1 -mno-avx512cd or whether
we only allow the "positive" -mavx512f -mavx512... (omitting avx512cd)
is an entirely separate
question.  But I think to not wreck the core idea (more interoperability,
here between small/big cores) we absolutely have to
provide a subset of avx10.1 but with disabled 512bit vectors which
effectively means AVX512 with disabled 512bit support.

Richard.

>
> Thx,
> Haochen
>
> >
> >
> > On 21/08/2023 5:36 pm, Richard Biener wrote:
> > > On Mon, Aug 21, 2023 at 3:20 AM Hongtao Liu via Gcc-patches
> > >  wrote:
> > >
> > > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since 
> > > that
> > > would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> > > flag that would restrict AVX512VL to 256bit, possibly using a common 
> > > internal
> > > flag for this and the -mavx10.1-256 vector size effect.
> > >
> > > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > > -march=native -mavx512vl-256 work (I think we should also allow the
> > > flag together with -mavx10.1*?)
> > >
> > > mavx512vl-256
> > > Target ...
> > > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > > the 256bit vector ISA subset of AVX512.
> > >
> > > Richard.
> >
>


RE: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: ZiNgA BuRgA 
> Sent: Monday, August 21, 2023 5:27 PM
> To: Richard Biener ; Hongtao Liu
> 
> Cc: Jiang, Haochen ; gcc-patches@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> Another way (not saying this is better, just throwing out ideas) is to
> break AVX10.1 into all the AVX-512 subsets.
> So you'd have something like -mavx10.1-256-vl, -mavx10.1-512-vbmi etc.
> 
> * -mavx10.1-256  would effectively be an alias for all the 128+256-bit
> subsets, and set the __AVX10_1__ define
> * -mavx512vbmi  would effectively be an alias for `-mavx10.1-128-vbmi
> -mavx10.1-256-vbmi -mavx10.1-512-vbmi` and set the __AVX512VBMI__ define
> (`-mavx10.1-512-vl` might not make much sense unless it implies AVX512F?)
> * -mno-avx512vbmi  would similarly be an alias for
> `-mno-avx10.1-128-vbmi -mno-avx10.1-256-vbmi -mno-avx10.1-512-vbmi`;
> with this, `-mavx10.1-256 -mno-avx512vbmi` would make sense, even if
> unusual (enable all AVX10.1 but disable all VBMI)
> * -mavx10.2-256  would act as a single feature, cementing in AVX10.2
> like the current AVX10.1 proposal, and AVX-512 subsets can't be turned off

I am considering a proposal quite similar to this if we want to change the
design so that it is flexible.

But there are a few proposals on the table. The concern with this proposal
is that splitting out each AVX512 feature may be over-design, since in most
scenarios we just need to keep the vector width the same.

Thx,
Haochen
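
For reference, the alias scheme quoted above can be modelled as expansion into (width, subset) pairs. The following Python sketch is only an illustration of that idea: the subset list is abbreviated, the helper names are invented, and only the option spellings that appear in the quoted proposal are handled.

```python
WIDTHS = (128, 256, 512)
SUBSETS = ("f", "vl", "bw", "vbmi")  # abbreviated list, for illustration only


def expand(option):
    """Expand one option into (is_enable, {(width, subset), ...})."""
    is_enable = not option.startswith("-mno-")
    name = option[len("-m"):] if is_enable else option[len("-mno-"):]
    if name == "avx10.1-256":
        # Alias for all the 128-bit and 256-bit subsets.
        pairs = {(w, s) for w in (128, 256) for s in SUBSETS}
    elif name.startswith("avx512"):
        # -mavx512vbmi is an alias for the vbmi subset at every width.
        pairs = {(w, name[len("avx512"):]) for w in WIDTHS}
    else:
        raise ValueError(f"unknown option: {option}")
    return is_enable, pairs


def apply_options(options):
    """Process options left to right, adding or removing expanded pairs."""
    enabled = set()
    for option in options:
        is_enable, pairs = expand(option)
        enabled = enabled | pairs if is_enable else enabled - pairs
    return enabled


if __name__ == "__main__":
    result = apply_options(["-mavx10.1-256", "-mno-avx512vbmi"])
    print(sorted(result))
```

With this model, `-mavx10.1-256 -mno-avx512vbmi` leaves every 128/256-bit subset except VBMI enabled, matching the "unusual but sensible" case in the proposal.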

> 
> 
> On 21/08/2023 5:36 pm, Richard Biener wrote:
> > On Mon, Aug 21, 2023 at 3:20 AM Hongtao Liu via Gcc-patches
> >  wrote:
> >
> > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
> > would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> > flag that would restrict AVX512VL to 256bit, possibly using a common 
> > internal
> > flag for this and the -mavx10.1-256 vector size effect.
> >
> > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > -march=native -mavx512vl-256 work (I think we should also allow the
> > flag together with -mavx10.1*?)
> >
> > mavx512vl-256
> > Target ...
> > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > the 256bit vector ISA subset of AVX512.
> >
> > Richard.
> 



Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 21, 2023 at 5:35 PM Richard Biener
 wrote:
>
> On Mon, Aug 21, 2023 at 10:28 AM Hongtao Liu  wrote:
> >
> > On Mon, Aug 21, 2023 at 4:09 PM Jakub Jelinek  wrote:
> > >
> > > On Mon, Aug 21, 2023 at 09:36:16AM +0200, Richard Biener via Gcc-patches 
> > > wrote:
> > > > > On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
> > > > >  wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > With the proposed design of these switches, how would I restrict 
> > > > > > AVX10.1
> > > > > > to particular AVX-512 subsets?
> > > > > We can't, avx10.1 is taken as an indivisible ISA which contains all
> > > > > AVX512 related instructions.
> > > > >
> > > > > > We’ve been taking these cases as bugs (but yes, intrinsics are 
> > > > > > still allowed, so in some cases it might prove difficult to 
> > > > > > guarantee this).
> > > > > intel sde support avx10.1-256 target which can be used to validate the
> > > > > binary(if there's invalid 512-bit vector register or 64-bit kmask
> > > > > register is used).
> > > > > > I don’t see any other way of doing what you want within the 
> > > > > > constraints of this design.
> > > > > It looks like the requirement is that we want a
> > > > > -mavx10-vector-width=256(or maybe reuse -mprefer-vector-width=256)
> > > > > option that acts on the original -mavx512XXX option to produce
> > > > > avx10.1-256 compatible binary. we can't use -mavx10.1-256 since it may
> > > > > include avx512fp16 directives and thus not be backward compatible
> > > > > SKX/CLX/ICX.
> > > >
> > > > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since 
> > > > that
> > > > would also make uses of 512bit intrinsics ill-formed.  So we'd need a 
> > > > new
> > > > flag that would restrict AVX512VL to 256bit, possibly using a common 
> > > > internal
> > > > flag for this and the -mavx10.1-256 vector size effect.
> > > >
> > > > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > > > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > > > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > > > -march=native -mavx512vl-256 work (I think we should also allow the
> > > > flag together with -mavx10.1*?)
> > > >
> > > > mavx512vl-256
> > > > Target ...
> > > > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > > > the 256bit vector ISA subset of AVX512.
> > >
> > > Wouldn't it be better to have it similarly to other ISA options as 
> > > something
> > > positive, say -mevex512 (the ISA docs talk about EVEX.512, EVEX.256 and
> > > EVEX.128)?
> > > Have -mavx512f (and anything that implies it right now) imply also 
> > > -mevex512
> > > but allow -mno-evex512 which wouldn't unset everything dependent on
> > > -mavx512f.  There is one gotcha, if -mavx512vl isn't enabled in the end,
> > > then -mavx512f -mno-evex512 should disable whole TARGET_AVX512F because
> > > nothing is left.
> > > TARGET_EVEX512 then would guard all TARGET_AVX512* intrinsics which 
> > > operate
> > > on 512-bit vector registers or 64-bit mask registers (in addition to the
> > > other TARGET_AVX512* options, perhaps except TARGET_AVX512F), whether the
> > > 512-bit modes can be used etc.
> > We have an undocumented option mavx10-max-512bit.
> >
> > ;; Only for implementation use
> > mavx10-max-512bit
> > Target Mask(ISA2_AVX10_512BIT) Var(ix86_isa_flags2) Undocumented Save
> > Indicates 512 bit vector width support for AVX10.
>
> Ah, missed that, but ...
>
> > Currently it's only used for AVX10 only, maybe we can extend it to
> > existing AVX512*** FLAGS.
> > so users can use -mavx512XXX -mno-avx10-max-512bit to get avx10.1-256
> > compatible binaries.
>
> ... -mno-avx10-max-512bit sounds awkward, no-..-max implies the max doesn't
> apply, so what is it then?
>
> If you think -mavx512vl-256 isn't good then maybe -mavx-width-512
> and -mno-avx-width-512 would be better (applying to both avx512 and avx10).
> I chose -mavx512vl-256 because of the existing -mavx10.1-256.  Btw,
> will we then have -mavx10.2-256 as well?  Do we allow -mavx10.1-512
> -mavx10.2-256 then, thus just enable 256bit for 10.2 extensions to 10.1?!
We're only allowing a single vector width.
-mavx10.1-512 -mavx10.2-256 will only enable -mavx10.2-256 + -mavx10.1-256.
> I think we opened up too many holes here and the options should be fixed
> to decouple the size from the base ISA.
I see, we can try to use -mavx-max-512bit (maybe another name) to
decouple the size from the base ISA.
And make
 -mavx10.1-256 just imply all -mavx512XXX + -mno-avx-max-512bit,
 -mavx10.1-512 imply -mavx512XXX + -mavx-max-512bit.
Then -mavx512vl-256 is just equal to -mavx512vl + -mno-avx-max-512bit.
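
A minimal sketch of that decoupling, written as a lookup table. This is
not GCC code: the struct, the table and expand_option() are hypothetical
names for illustration, and -mavx-max-512bit is only the placeholder
option name used in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical expansion of a width-carrying option into
   "base ISA" plus a separate vector-width bit. */
struct expansion {
    const char *option;      /* option as typed by the user */
    bool all_avx512_subsets; /* implies every -mavx512XXX subset */
    bool avx512vl;           /* implies -mavx512vl */
    bool max_512bit;         /* true: -mavx-max-512bit,
                                false: -mno-avx-max-512bit */
};

static const struct expansion expansions[] = {
    { "-mavx10.1-256",  true,  true, false },
    { "-mavx10.1-512",  true,  true, true  },
    { "-mavx512vl-256", false, true, false },
};

/* Return the expansion for opt, or NULL if opt is not one of the
   width-carrying aliases modelled above. */
static const struct expansion *expand_option(const char *opt)
{
    assert(opt != NULL);
    for (size_t i = 0; i < sizeof expansions / sizeof expansions[0]; i++)
        if (strcmp(expansions[i].option, opt) == 0)
            return &expansions[i];
    return NULL;
}
```

The point of the table is that the width bit is a separate column, so
the same bit can be shared by the AVX512 and AVX10 spellings.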

Lots of work to do, but still not too late for GCC14.1
>
> What variable we map this to internally doesn't really matter but yes,
> we'd need to guard 512bit patterns with (AVX512VL || AVX10) && 
> 512-enabled-flag
>
> Richard.
>
> > From the implementation perspective, we 

Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Richard Biener via Gcc-patches
On Mon, Aug 21, 2023 at 11:34 AM Richard Biener
 wrote:
>
> On Mon, Aug 21, 2023 at 10:28 AM Hongtao Liu  wrote:
> >
> > On Mon, Aug 21, 2023 at 4:09 PM Jakub Jelinek  wrote:
> > >
> > > On Mon, Aug 21, 2023 at 09:36:16AM +0200, Richard Biener via Gcc-patches 
> > > wrote:
> > > > > On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
> > > > >  wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > With the proposed design of these switches, how would I restrict 
> > > > > > AVX10.1
> > > > > > to particular AVX-512 subsets?
> > > > > We can't, avx10.1 is taken as an indivisible ISA which contains all
> > > > > AVX512 related instructions.
> > > > >
> > > > > > We’ve been taking these cases as bugs (but yes, intrinsics are 
> > > > > > still allowed, so in some cases it might prove difficult to 
> > > > > > guarantee this).
> > > > > intel sde support avx10.1-256 target which can be used to validate the
> > > > > binary(if there's invalid 512-bit vector register or 64-bit kmask
> > > > > register is used).
> > > > > > I don’t see any other way of doing what you want within the 
> > > > > > constraints of this design.
> > > > > It looks like the requirement is that we want a
> > > > > -mavx10-vector-width=256(or maybe reuse -mprefer-vector-width=256)
> > > > > option that acts on the original -mavx512XXX option to produce
> > > > > avx10.1-256 compatible binary. we can't use -mavx10.1-256 since it may
> > > > > include avx512fp16 directives and thus not be backward compatible
> > > > > SKX/CLX/ICX.
> > > >
> > > > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since 
> > > > that
> > > > would also make uses of 512bit intrinsics ill-formed.  So we'd need a 
> > > > new
> > > > flag that would restrict AVX512VL to 256bit, possibly using a common 
> > > > internal
> > > > flag for this and the -mavx10.1-256 vector size effect.
> > > >
> > > > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > > > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > > > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > > > -march=native -mavx512vl-256 work (I think we should also allow the
> > > > flag together with -mavx10.1*?)
> > > >
> > > > mavx512vl-256
> > > > Target ...
> > > > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > > > the 256bit vector ISA subset of AVX512.
> > >
> > > Wouldn't it be better to have it similarly to other ISA options as 
> > > something
> > > positive, say -mevex512 (the ISA docs talk about EVEX.512, EVEX.256 and
> > > EVEX.128)?
> > > Have -mavx512f (and anything that implies it right now) imply also 
> > > -mevex512
> > > but allow -mno-evex512 which wouldn't unset everything dependent on
> > > -mavx512f.  There is one gotcha, if -mavx512vl isn't enabled in the end,
> > > then -mavx512f -mno-evex512 should disable whole TARGET_AVX512F because
> > > nothing is left.
> > > TARGET_EVEX512 then would guard all TARGET_AVX512* intrinsics which 
> > > operate
> > > on 512-bit vector registers or 64-bit mask registers (in addition to the
> > > other TARGET_AVX512* options, perhaps except TARGET_AVX512F), whether the
> > > 512-bit modes can be used etc.
> > We have an undocumented option mavx10-max-512bit.
> >
> > ;; Only for implementation use
> > mavx10-max-512bit
> > Target Mask(ISA2_AVX10_512BIT) Var(ix86_isa_flags2) Undocumented Save
> > Indicates 512 bit vector width support for AVX10.
>
> Ah, missed that, but ...
>
> > Currently it's only used for AVX10 only, maybe we can extend it to
> > existing AVX512*** FLAGS.
> > so users can use -mavx512XXX -mno-avx10-max-512bit to get avx10.1-256
> > compatible binaries.
>
> ... -mno-avx10-max-512bit sounds awkward, no-..-max implies the max doesn't
> apply, so what is it then?
>
> If you think -mavx512vl-256 isn't good then maybe -mavx-width-512
> and -mno-avx-width-512 would be better (applying to both avx512 and avx10).
> I chose -mavx512vl-256 because of the existing -mavx10.1-256.  Btw,
> will we then have -mavx10.2-256 as well?  Do we allow -mavx10.1-512
> -mavx10.2-256 then, thus just enable 256bit for 10.2 extensions to 10.1?!
> I think we opened up too many holes here and the options should be fixed
> to decouple the size from the base ISA.

Like how about -mavx10.1 -mavx10.2 plus a -mavx10-512 where
-mavx10.[12...] enables just 256 bits (the intended default as Intel thinks)
and -mavx10-512 will enable 512 bits but for the whole selected ISA
(maybe have it enable -mavx10.1 if that wasn't specified, maybe not).
We can then allow -mno-avx10-512 also with AVX512?

>
> What variable we map this to internally doesn't really matter but yes,
> we'd need to guard 512bit patterns with (AVX512VL || AVX10) && 
> 512-enabled-flag
>
> Richard.
>
> > From the implementation perspective, we need to restrict all 512-bit
> > vector patterns/builtins/intrinsics under both AVX512XXX and
> > TARGET_AVX10_512BIT.
> > similar for register 

Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Richard Biener via Gcc-patches
On Mon, Aug 21, 2023 at 10:28 AM Hongtao Liu  wrote:
>
> On Mon, Aug 21, 2023 at 4:09 PM Jakub Jelinek  wrote:
> >
> > On Mon, Aug 21, 2023 at 09:36:16AM +0200, Richard Biener via Gcc-patches 
> > wrote:
> > > > On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > With the proposed design of these switches, how would I restrict 
> > > > > AVX10.1
> > > > > to particular AVX-512 subsets?
> > > > We can't, avx10.1 is taken as an indivisible ISA which contains all
> > > > AVX512 related instructions.
> > > >
> > > > > We’ve been taking these cases as bugs (but yes, intrinsics are still 
> > > > > allowed, so in some cases it might prove difficult to guarantee this).
> > > > intel sde support avx10.1-256 target which can be used to validate the
> > > > binary(if there's invalid 512-bit vector register or 64-bit kmask
> > > > register is used).
> > > > > I don’t see any other way of doing what you want within the 
> > > > > constraints of this design.
> > > > It looks like the requirement is that we want a
> > > > -mavx10-vector-width=256(or maybe reuse -mprefer-vector-width=256)
> > > > option that acts on the original -mavx512XXX option to produce
> > > > avx10.1-256 compatible binary. we can't use -mavx10.1-256 since it may
> > > > include avx512fp16 directives and thus not be backward compatible
> > > > SKX/CLX/ICX.
> > >
> > > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since 
> > > that
> > > would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> > > flag that would restrict AVX512VL to 256bit, possibly using a common 
> > > internal
> > > flag for this and the -mavx10.1-256 vector size effect.
> > >
> > > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > > -march=native -mavx512vl-256 work (I think we should also allow the
> > > flag together with -mavx10.1*?)
> > >
> > > mavx512vl-256
> > > Target ...
> > > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > > the 256bit vector ISA subset of AVX512.
> >
> > Wouldn't it be better to have it similarly to other ISA options as something
> > positive, say -mevex512 (the ISA docs talk about EVEX.512, EVEX.256 and
> > EVEX.128)?
> > Have -mavx512f (and anything that implies it right now) imply also -mevex512
> > but allow -mno-evex512 which wouldn't unset everything dependent on
> > -mavx512f.  There is one gotcha, if -mavx512vl isn't enabled in the end,
> > then -mavx512f -mno-evex512 should disable whole TARGET_AVX512F because
> > nothing is left.
> > TARGET_EVEX512 then would guard all TARGET_AVX512* intrinsics which operate
> > on 512-bit vector registers or 64-bit mask registers (in addition to the
> > other TARGET_AVX512* options, perhaps except TARGET_AVX512F), whether the
> > 512-bit modes can be used etc.
> We have an undocumented option mavx10-max-512bit.
>
> ;; Only for implementation use
> mavx10-max-512bit
> Target Mask(ISA2_AVX10_512BIT) Var(ix86_isa_flags2) Undocumented Save
> Indicates 512 bit vector width support for AVX10.

Ah, missed that, but ...

> Currently it's only used for AVX10 only, maybe we can extend it to
> existing AVX512*** FLAGS.
> so users can use -mavx512XXX -mno-avx10-max-512bit to get avx10.1-256
> compatible binaries.

... -mno-avx10-max-512bit sounds awkward, no-..-max implies the max doesn't
apply, so what is it then?

If you think -mavx512vl-256 isn't good then maybe -mavx-width-512
and -mno-avx-width-512 would be better (applying to both avx512 and avx10).
I chose -mavx512vl-256 because of the existing -mavx10.1-256.  Btw,
will we then have -mavx10.2-256 as well?  Do we allow -mavx10.1-512
-mavx10.2-256 then, thus just enable 256bit for 10.2 extensions to 10.1?!
I think we opened up too many holes here and the options should be fixed
to decouple the size from the base ISA.

What variable we map this to internally doesn't really matter but yes,
we'd need to guard 512bit patterns with (AVX512VL || AVX10) && 512-enabled-flag

Richard.

> From the implementation perspective, we need to restrict all 512-bit
> vector patterns/builtins/intrinsics under both AVX512XXX and
> TARGET_AVX10_512BIT.
> similar for register allocation, parameter passing, return value,
> vector_mode_supported_p, gather/scatter hook, and all other hooks.
> After that, the -mavx10-max-512bit will divide existing AVX512 into 2
> parts, AVX512XXX-256, AVX512XXX-512.
>
>
> >
> > Jakub
> >
>
>
> --
> BR,
> Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread ZiNgA BuRgA via Gcc-patches
Another way (not saying this is better, just throwing out ideas) is to 
break AVX10.1 into all the AVX-512 subsets.

So you'd have something like -mavx10.1-256-vl, -mavx10.1-512-vbmi etc.

* -mavx10.1-256  would effectively be an alias for all the 128+256-bit 
subsets, and set the __AVX10_1__ define
* -mavx512vbmi  would effectively be an alias for `-mavx10.1-128-vbmi 
-mavx10.1-256-vbmi -mavx10.1-512-vbmi` and set the __AVX512VBMI__ define 
(`-mavx10.1-512-vl` might not make much sense unless it implies AVX512F?)
* -mno-avx512vbmi  would similarly be an alias for 
`-mno-avx10.1-128-vbmi -mno-avx10.1-256-vbmi -mno-avx10.1-512-vbmi`; 
with this, `-mavx10.1-256 -mno-avx512vbmi` would make sense, even if 
unusual (enable all AVX10.1 but disable all VBMI)
* -mavx10.2-256  would act as a single feature, cementing in AVX10.2 
like the current AVX10.1 proposal, and AVX-512 subsets can't be turned off



On 21/08/2023 5:36 pm, Richard Biener wrote:

On Mon, Aug 21, 2023 at 3:20 AM Hongtao Liu via Gcc-patches
 wrote:

Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
flag that would restrict AVX512VL to 256bit, possibly using a common internal
flag for this and the -mavx10.1-256 vector size effect.

Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
-mavx512vl-256?  Writing these the last looks most sensible to me?
Note it should combine with -mavx512vl to -mavx512vl-256 to make
-march=native -mavx512vl-256 work (I think we should also allow the
flag together with -mavx10.1*?)

mavx512vl-256
Target ...
Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
the 256bit vector ISA subset of AVX512.

Richard.





Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 21, 2023 at 4:38 PM Jakub Jelinek  wrote:
>
> On Mon, Aug 21, 2023 at 04:28:20PM +0800, Hongtao Liu wrote:
> > We have an undocumented option mavx10-max-512bit.
>
> How it is called internally is one thing, but it is weird to use
> avx10 in an option name which would be meant for finding common subset
> of -mavx512xxx and -mavx10.1-256.
We can have an alias for the name, but internally use the same bit
since they're doing the same thing.
And the option is somewhat orthogonal to AVX512XXX/AVX10; it only
cares about vector/kmask size.
>
> Jakub
>


-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 21, 2023 at 04:28:20PM +0800, Hongtao Liu wrote:
> We have an undocumented option mavx10-max-512bit.

What it is called internally is one thing, but it is weird to use
avx10 in an option name which would be meant for finding the common subset
of -mavx512xxx and -mavx10.1-256.

Jakub



Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 21, 2023 at 4:09 PM Jakub Jelinek  wrote:
>
> On Mon, Aug 21, 2023 at 09:36:16AM +0200, Richard Biener via Gcc-patches 
> wrote:
> > > On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > With the proposed design of these switches, how would I restrict AVX10.1
> > > > to particular AVX-512 subsets?
> > > We can't, avx10.1 is taken as an indivisible ISA which contains all
> > > AVX512 related instructions.
> > >
> > > > We’ve been taking these cases as bugs (but yes, intrinsics are still 
> > > > allowed, so in some cases it might prove difficult to guarantee this).
> > > intel sde support avx10.1-256 target which can be used to validate the
> > > binary(if there's invalid 512-bit vector register or 64-bit kmask
> > > register is used).
> > > > I don’t see any other way of doing what you want within the constraints 
> > > > of this design.
> > > It looks like the requirement is that we want a
> > > -mavx10-vector-width=256(or maybe reuse -mprefer-vector-width=256)
> > > option that acts on the original -mavx512XXX option to produce
> > > avx10.1-256 compatible binary. we can't use -mavx10.1-256 since it may
> > > include avx512fp16 directives and thus not be backward compatible
> > > SKX/CLX/ICX.
> >
> > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
> > would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> > flag that would restrict AVX512VL to 256bit, possibly using a common 
> > internal
> > flag for this and the -mavx10.1-256 vector size effect.
> >
> > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > -march=native -mavx512vl-256 work (I think we should also allow the
> > flag together with -mavx10.1*?)
> >
> > mavx512vl-256
> > Target ...
> > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > the 256bit vector ISA subset of AVX512.
>
> Wouldn't it be better to have it similarly to other ISA options as something
> positive, say -mevex512 (the ISA docs talk about EVEX.512, EVEX.256 and
> EVEX.128)?
> Have -mavx512f (and anything that implies it right now) imply also -mevex512
> but allow -mno-evex512 which wouldn't unset everything dependent on
> -mavx512f.  There is one gotcha, if -mavx512vl isn't enabled in the end,
> then -mavx512f -mno-evex512 should disable whole TARGET_AVX512F because
> nothing is left.
> TARGET_EVEX512 then would guard all TARGET_AVX512* intrinsics which operate
> on 512-bit vector registers or 64-bit mask registers (in addition to the
> other TARGET_AVX512* options, perhaps except TARGET_AVX512F), whether the
> 512-bit modes can be used etc.
We have an undocumented option mavx10-max-512bit.

;; Only for implementation use
mavx10-max-512bit
Target Mask(ISA2_AVX10_512BIT) Var(ix86_isa_flags2) Undocumented Save
Indicates 512 bit vector width support for AVX10.

Currently it's used for AVX10 only; maybe we can extend it to the
existing AVX512*** flags,
so users can use -mavx512XXX -mno-avx10-max-512bit to get avx10.1-256
compatible binaries.

From the implementation perspective, we need to restrict all 512-bit
vector patterns/builtins/intrinsics under both AVX512XXX and
TARGET_AVX10_512BIT.
The same goes for register allocation, parameter passing, return values,
vector_mode_supported_p, the gather/scatter hooks, and all other hooks.
After that, -mavx10-max-512bit will divide existing AVX512 into 2
parts: AVX512XXX-256 and AVX512XXX-512.
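
As a rough illustration of that guard, here is a sketch of a
vector_mode_supported_p-style predicate. It is not the real hook: the
struct and function names are made up, max_512bit merely stands in for
TARGET_AVX10_512BIT, and only EVEX-encoded vector widths are considered
(the SSE/AVX paths for 128/256 bits are ignored).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced view of the target flags. */
struct target_flags {
    bool avx512f;     /* some AVX512XXX base is enabled */
    bool avx512vl;    /* 128/256-bit EVEX forms are enabled */
    bool max_512bit;  /* stands in for TARGET_AVX10_512BIT */
};

/* Would an EVEX vector of the given width be usable?
   512-bit needs both the AVX512 base and the width flag;
   128/256-bit needs the base plus AVX512VL. */
static bool evex_vector_bits_supported(const struct target_flags *t,
                                       unsigned bits)
{
    assert(t != NULL);
    if (!t->avx512f)
        return false;
    switch (bits) {
    case 512:
        return t->max_512bit;
    case 128:
    case 256:
        return t->avx512vl;
    default:
        return false;
    }
}
```

With avx512f and avx512vl set but max_512bit cleared, the predicate
accepts 256 bits and rejects 512 bits, which is the AVX512XXX-256 half
of the split.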


>
> Jakub
>


-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 21, 2023 at 09:36:16AM +0200, Richard Biener via Gcc-patches wrote:
> > On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
> >  wrote:
> > >
> > > Hi,
> > >
> > > With the proposed design of these switches, how would I restrict AVX10.1
> > > to particular AVX-512 subsets?
> > We can't, avx10.1 is taken as an indivisible ISA which contains all
> > AVX512 related instructions.
> >
> > > We’ve been taking these cases as bugs (but yes, intrinsics are still 
> > > allowed, so in some cases it might prove difficult to guarantee this).
> > intel sde support avx10.1-256 target which can be used to validate the
> > binary(if there's invalid 512-bit vector register or 64-bit kmask
> > register is used).
> > > I don’t see any other way of doing what you want within the constraints 
> > > of this design.
> > It looks like the requirement is that we want a
> > -mavx10-vector-width=256(or maybe reuse -mprefer-vector-width=256)
> > option that acts on the original -mavx512XXX option to produce
> > avx10.1-256 compatible binary. we can't use -mavx10.1-256 since it may
> > include avx512fp16 directives and thus not be backward compatible
> > SKX/CLX/ICX.
> 
> Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
> would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> flag that would restrict AVX512VL to 256bit, possibly using a common internal
> flag for this and the -mavx10.1-256 vector size effect.
> 
> Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> -mavx512vl-256?  Writing these the last looks most sensible to me?
> Note it should combine with -mavx512vl to -mavx512vl-256 to make
> -march=native -mavx512vl-256 work (I think we should also allow the
> flag together with -mavx10.1*?)
> 
> mavx512vl-256
> Target ...
> Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> the 256bit vector ISA subset of AVX512.

Wouldn't it be better to have it similarly to other ISA options as something
positive, say -mevex512 (the ISA docs talk about EVEX.512, EVEX.256 and
EVEX.128)?
Have -mavx512f (and anything that implies it right now) imply also -mevex512
but allow -mno-evex512 which wouldn't unset everything dependent on
-mavx512f.  There is one gotcha: if -mavx512vl isn't enabled in the end,
then -mavx512f -mno-evex512 should disable the whole of TARGET_AVX512F because
nothing is left.
TARGET_EVEX512 then would guard all TARGET_AVX512* intrinsics which operate
on 512-bit vector registers or 64-bit mask registers (in addition to the
other TARGET_AVX512* options, perhaps except TARGET_AVX512F), whether the
512-bit modes can be used etc.
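
To make the intended semantics concrete, here is a small model of how
the options could resolve.  It is only a sketch of the rules described
above, not GCC's option handling; the ISA_* bits, resolve_isa() and
can_use_512bit() are names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits for the model. */
#define ISA_AVX512F  (1u << 0)
#define ISA_AVX512VL (1u << 1)
#define ISA_EVEX512  (1u << 2)

/* Resolve -mavx512f / -mavx512vl / -mno-evex512 into a feature mask. */
static unsigned resolve_isa(bool avx512f, bool avx512vl, bool no_evex512)
{
    unsigned isa = 0;

    if (avx512vl)
        avx512f = true;                  /* -mavx512vl implies -mavx512f */
    if (avx512f)
        isa |= ISA_AVX512F | ISA_EVEX512; /* -mavx512f implies -mevex512 */
    if (avx512vl)
        isa |= ISA_AVX512VL;

    if (no_evex512) {
        isa &= ~ISA_EVEX512;
        /* The gotcha: without AVX512VL there are no 128/256-bit
           forms left, so AVX512F is dropped as well. */
        if (!(isa & ISA_AVX512VL))
            isa &= ~ISA_AVX512F;
    }
    return isa;
}

/* 512-bit modes and intrinsics need both the base ISA and EVEX512. */
static bool can_use_512bit(unsigned isa)
{
    return (isa & ISA_AVX512F) && (isa & ISA_EVEX512);
}
```

In this model -mavx512vl -mno-evex512 keeps AVX512F and AVX512VL but
refuses 512-bit modes, while -mavx512f -mno-evex512 resolves to nothing.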

Jakub



Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread ZiNgA BuRgA via Gcc-patches

Thanks for the responses!

It'd be unfortunate if AVX10 adoption is desired, yet there's no way to 
compile existing 256-bit code to be compatible with it.

Relying on SDE to check the output isn't a particularly viable solution.

It looks like `-mavx512vl -mprefer-vector-width=256` is my best bet 
under this design, and I hope it works.  Fortunately, I'm not relying on 
3rd party code here, so I control all intrinsics used.


Something like a `-mmax-vector-width=256` option sounds preferable 
though, particularly for those using 3rd party code which checks the 
`__AVX512VL__` define, and assumes 512-bit vectors are available.


Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Richard Biener via Gcc-patches
On Mon, Aug 21, 2023 at 3:20 AM Hongtao Liu via Gcc-patches
 wrote:
>
> On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
>  wrote:
> >
> > Hi,
> >
> > With the proposed design of these switches, how would I restrict AVX10.1
> > to particular AVX-512 subsets?
> We can't, avx10.1 is taken as an indivisible ISA which contains all
> AVX512 related instructions.
>
> > We’ve been taking these cases as bugs (but yes, intrinsics are still 
> > allowed, so in some cases it might prove difficult to guarantee this).
> intel sde support avx10.1-256 target which can be used to validate the
> binary(if there's invalid 512-bit vector register or 64-bit kmask
> register is used).
> > I don’t see any other way of doing what you want within the constraints of 
> > this design.
> It looks like the requirement is that we want a
> -mavx10-vector-width=256(or maybe reuse -mprefer-vector-width=256)
> option that acts on the original -mavx512XXX option to produce
> avx10.1-256 compatible binary. we can't use -mavx10.1-256 since it may
> include avx512fp16 directives and thus not be backward compatible
> SKX/CLX/ICX.

Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
flag that would restrict AVX512VL to 256bit, possibly using a common internal
flag for this and the -mavx10.1-256 vector size effect.

Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
-mavx512vl-256?  Writing these out, the last looks most sensible to me.
Note it should combine with -mavx512vl to -mavx512vl-256 to make
-march=native -mavx512vl-256 work (I think we should also allow the
flag together with -mavx10.1*?)

mavx512vl-256
Target ...
Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
the 256bit vector ISA subset of AVX512.

Richard.

> >
> > For example, usage of the |_mm256_rol_epi32| intrinsic should be
> > compatible on any AVX10/256 implementation, /as well as /any AVX-512VL
> > without AVX10 implementation (e.g. Skylake-X).  But how do I signal that
> > I want compatibility with both these targets?
> >
> >   * |-mavx512vl| lets the compiler use 512-bit registers -> incompatible
> > with 256-bit AVX10.
> >   * |-mavx512vl -mprefer-vector-width=256| might steer the compiler away
> > from 512-bit registers, but I don't think it guarantees it.
> >   * |-mavx10.1-256| lets the compiler use all Sapphire Rapids AVX-512
> > features at 256-bit wide (so in theory, it could choose to compile
> > it with |vpshldd|) -> incompatible with Skylake-X.
> >   * |-mavx10.1-256 -mno-avx512fp16 -mno-avx512...| will emit a warning
> > and ignore the attempts at disabling AVX-512 subsets.
> >   * |-mavx10.1-256 -mavx512vl| takes the /union/ of the features, not
> > the /intersection./
> >
> > Is there something like |-mavx512vl -mmax-vector-width=256|, or am I
> > misunderstanding the situation?
> >
> > Thanks!
>
>
>
> --
> BR,
> Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-20 Thread Hongtao Liu via Gcc-patches
On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
 wrote:
>
> Hi,
>
> With the proposed design of these switches, how would I restrict AVX10.1
> to particular AVX-512 subsets?
We can't, avx10.1 is taken as an indivisible ISA which contains all
AVX512 related instructions.

> We’ve been taking these cases as bugs (but yes, intrinsics are still allowed, 
> so in some cases it might prove difficult to guarantee this).
Intel SDE supports an avx10.1-256 target, which can be used to validate the
binary (i.e. to check whether an invalid 512-bit vector register or 64-bit
kmask register is used).
> I don’t see any other way of doing what you want within the constraints of 
> this design.
It looks like the requirement is that we want a
-mavx10-vector-width=256 (or maybe reuse -mprefer-vector-width=256)
option that acts on the original -mavx512XXX option to produce an
avx10.1-256 compatible binary. We can't use -mavx10.1-256 since it may
include avx512fp16 directives and thus not be backward compatible with
SKX/CLX/ICX.
>
> For example, usage of the |_mm256_rol_epi32| intrinsic should be
> compatible on any AVX10/256 implementation, /as well as /any AVX-512VL
> without AVX10 implementation (e.g. Skylake-X).  But how do I signal that
> I want compatibility with both these targets?
>
>   * |-mavx512vl| lets the compiler use 512-bit registers -> incompatible
> with 256-bit AVX10.
>   * |-mavx512vl -mprefer-vector-width=256| might steer the compiler away
> from 512-bit registers, but I don't think it guarantees it.
>   * |-mavx10.1-256| lets the compiler use all Sapphire Rapids AVX-512
> features at 256-bit wide (so in theory, it could choose to compile
> it with |vpshldd|) -> incompatible with Skylake-X.
>   * |-mavx10.1-256 -mno-avx512fp16 -mno-avx512...| will emit a warning
> and ignore the attempts at disabling AVX-512 subsets.
>   * |-mavx10.1-256 -mavx512vl| takes the /union/ of the features, not
> the /intersection./
>
> Is there something like |-mavx512vl -mmax-vector-width=256|, or am I
> misunderstanding the situation?
>
> Thanks!



-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-19 Thread Richard Biener via Gcc-patches



> Am 20.08.2023 um 00:45 schrieb ZiNgA BuRgA via Gcc-patches 
> :
> 
> Hi,
> 
> With the proposed design of these switches, how would I restrict AVX10.1 to 
> particular AVX-512 subsets?
> 
> For example, usage of the |_mm256_rol_epi32| intrinsic should be compatible 
> on any AVX10/256 implementation, /as well as /any AVX-512VL without AVX10 
> implementation (e.g. Skylake-X).  But how do I signal that I want 
> compatibility with both these targets?
> 
> * |-mavx512vl| lets the compiler use 512-bit registers -> incompatible
>   with 256-bit AVX10.
> * |-mavx512vl -mprefer-vector-width=256| might steer the compiler away
>   from 512-bit registers, but I don't think it guarantees it.

We’ve been taking these cases as bugs (but yes, intrinsics are still allowed, 
so in some cases it might prove difficult to guarantee this).

I don’t see any other way of doing what you want within the constraints of this 
design.

> * |-mavx10.1-256| lets the compiler use all Sapphire Rapids AVX-512
>   features at 256-bit wide (so in theory, it could choose to compile
>   it with |vpshldd|) -> incompatible with Skylake-X.
> * |-mavx10.1-256 -mno-avx512fp16 -mno-avx512...| will emit a warning
>   and ignore the attempts at disabling AVX-512 subsets.
> * |-mavx10.1-256 -mavx512vl| takes the /union/ of the features, not
>   the /intersection./
> 
> Is there something like |-mavx512vl -mmax-vector-width=256|, or am I 
> misunderstanding the situation?
> 
> Thanks!


Re: Intel AVX10.1 Compiler Design and Support

2023-08-19 Thread ZiNgA BuRgA via Gcc-patches

Hi,

With the proposed design of these switches, how would I restrict AVX10.1 
to particular AVX-512 subsets?


For example, usage of the |_mm256_rol_epi32| intrinsic should be 
compatible on any AVX10/256 implementation, /as well as /any AVX-512VL 
without AVX10 implementation (e.g. Skylake-X).  But how do I signal that 
I want compatibility with both these targets?


 * |-mavx512vl| lets the compiler use 512-bit registers -> incompatible
   with 256-bit AVX10.
 * |-mavx512vl -mprefer-vector-width=256| might steer the compiler away
   from 512-bit registers, but I don't think it guarantees it.
 * |-mavx10.1-256| lets the compiler use all Sapphire Rapids AVX-512
   features at 256-bit wide (so in theory, it could choose to compile
   it with |vpshldd|) -> incompatible with Skylake-X.
 * |-mavx10.1-256 -mno-avx512fp16 -mno-avx512...| will emit a warning
   and ignore the attempts at disabling AVX-512 subsets.
 * |-mavx10.1-256 -mavx512vl| takes the /union/ of the features, not
   the /intersection./

Is there something like |-mavx512vl -mmax-vector-width=256|, or am I 
misunderstanding the situation?
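
For reference, the closest workaround I can think of is sketched below. It is only a sketch under assumptions, not something the proposed switches provide: the translation unit is built with baseline flags, the 256-bit intrinsic is confined to one function carrying a target attribute, and that function is selected at run time. The names rol7_x8, rol7_x8_scalar and rol7_x8_avx512vl are made up for illustration. Nothing here formally stops the compiler from using zmm registers inside the attributed function, and the run-time check only looks for AVX-512VL, so detection of an AVX10/256-only CPU is not covered.

```c
#include <assert.h>
#include <stdint.h>

/* Portable fallback: rotate each of the 8 lanes left by 7 bits. */
static void rol7_x8_scalar(uint32_t v[8])
{
    for (int i = 0; i < 8; i++)
        v[i] = (v[i] << 7) | (v[i] >> 25);
}

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>

/* Only this function is allowed to use AVX-512VL encodings. */
__attribute__((target("avx512f,avx512vl")))
static void rol7_x8_avx512vl(uint32_t v[8])
{
    __m256i x = _mm256_loadu_si256((const __m256i *)v);
    x = _mm256_rol_epi32(x, 7);
    _mm256_storeu_si256((__m256i *)v, x);
}
#endif

/* Hypothetical entry point: picks the vector path when the CPU reports
   AVX-512VL, otherwise falls back to the scalar loop. */
void rol7_x8(uint32_t v[8])
{
    assert(v != 0);
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("avx512vl")) {
        rol7_x8_avx512vl(v);
        return;
    }
#endif
    rol7_x8_scalar(v);
}
```

Both paths compute the same result, so callers do not need to know which one ran.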


Thanks!


Re: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Joseph Myers
On Thu, 10 Aug 2023, Richard Biener via Gcc-patches wrote:

> Isn't this situation similar to the not defined ABI when passing generic
> vectors (via __attribute__((vector_size))) that do not map to vectors
> supported by the current ISA?  There's cases like vector<2> char or
> vector<1> double to consider for example that would fit in a lowpart of a
> supported vector register and as in the AVX512 case vectors that are larger
> than any supported vector register.

Note there is a difference in some cases (I don't know if this is relevant 
for x86) between "vectors supported by the current ISA" and "vectors whose 
ABI, for ISAs that do support them, can be implemented using the current 
ISA".

Specifically, when working on the VFP AAPCS variant for 32-bit Arm, I made 
sure that generic vectors had the same ABI on all processors supporting 
VFP, whether or not the vector parts of the instruction set were supported 
on the chosen processor.  On 32-bit Arm that's possible because vector 
registers are the same as floating-point registers (and even the 
single-precision-only VFP variant has suitable load and store 
instructions).

Of course if your ABI for some kinds of vectors uses registers not 
supported on all processors, and on the processors that do support those 
registers you use that ABI for corresponding generic vectors, then you 
won't be able to be compatible with that ABI for those generic vectors on 
processors without those registers.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 10, 2023 at 03:08:31PM +, Jiang, Haochen via Gcc-patches wrote:
> There are lots of discussions on arch level and ABIs and I really appreciate
> that.
> 
> For the arch level issue, it might be a little early to discuss and should
> not block these patches.
> 
> For ABI issue, the problem actually comes from the current behavior between
> GCC and clang/LLVM are different in return value for m512 w/o 512 bit support.
> Then it becomes a question to get unified and we get the whole discussion.
> However, it is a corner case.

What LLVM does looks just wrong to me.

Try:

typedef int V256 __attribute__((vector_size (32)));
typedef int V512 __attribute__((vector_size (64)));
typedef int V1024 __attribute__((vector_size (128)));

V256
foo256 (V256 x, V256 y)
{
  return x + y;
}

V512
foo512 (V512 x, V512 y)
{
  return x + y;
}

V1024
foo1024 (V1024 x, V1024 y)
{
  return x + y;
}

with -msse4, -mavx2 and -mavx512f.
GCC passes all arguments and all return values in memory with warnings in
the first case (-msse4), all but those of foo256 in the second case (-mavx2),
and only those of foo1024 in the last case (-mavx512f).  That matches the
psABI without/with the __m256 and/or __m512 additions.  It is unfortunate
that there is no interoperability between the pre-AVX2 vs. AVX2+ resp.
pre-AVX512F vs. AVX512F+ passing/returning, but that is a consequence of
wanting to get fast code on new ISAs.

LLVM passes all the arguments the same way as GCC (though without
warnings), but returns results differently.  For foo256 it returns the
result in the xmm0/xmm1 pair with -msse4 and in ymm0 with -mavx2 and later.
For foo512 it returns the result in the xmm0/xmm1/xmm2/xmm3 quadruplet with
-msse4, in the ymm0/ymm1 pair with -mavx2 and finally in zmm0 with
-mavx512f.  And for foo1024 it returns in memory with -msse4, in the
ymm0/ymm1/ymm2/ymm3 quadruplet with -mavx2 and in the zmm0/zmm1 pair with
-mavx512f.  I have no idea what in the psABI that would be based on,
including the different treatment of passing arguments vs. returning
results.  More importantly, this means not 2 different ABIs for one
function depending on ISA flags, but 3, maybe 4 (with -mno-sse?).
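
One practical consequence for code that has to link across objects built with different -m flags: keeping wide vectors out of the function signature and passing them by pointer means the calling convention no longer depends on which vector registers the ISA flags make available. A minimal sketch of that idea follows; the name add512 is illustrative and not taken from any patch in this thread.

```c
#include <assert.h>
#include <stddef.h>

typedef int V512 __attribute__((vector_size (64)));

/* Operands and result live in memory owned by the caller, so only
   pointers cross the call boundary, whatever -m flags are in effect. */
void
add512 (const V512 *x, const V512 *y, V512 *out)
{
  assert (x != NULL && y != NULL && out != NULL);
  *out = *x + *y;
}
```

The body can still be vectorized with whatever width the callee's flags allow; only the interface is kept flag-independent.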

Jakub



Re: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 10, 2023 at 03:08:11PM +, Zhang, Annita via Gcc-patches wrote:
> > IMO it is not acceptable for AVX10-256 to generate zmm registers.
> > 
> > If I have to choose among the three proposals, the second is better.
> > 
> > But the best choice I suppose is to keep what we are doing currently, which
> > is passing them in memory and emitting a warning. It is a reasonable behavior.

Completely agree on this.  If anything in the psABI should be changed, IMHO
it would be just a clarification, in case it is not clear enough, that when
__m256 and/or __m512 are passed on ISAs which do not support them, they are
passed in memory.  That is effectively what the psABI was saying before the
__m256 resp. __m512 support was added there.
So yes, warn and use memory if the ISA doesn't support those.

Jakub



RE: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Jiang, Haochen via Gcc-patches
Hi all,

There has been a lot of discussion on arch levels and ABIs, and I really
appreciate that.

For the arch level issue, it might be a little early to discuss, and it should
not block these patches.

For the ABI issue, the problem actually comes from GCC and clang/LLVM currently
behaving differently in how they return __m512 values without 512 bit support.
Unifying that behavior then became a question, which led to this whole
discussion. However, it is a corner case.

So let's first focus on the options design and the behavior of the options. We
could continue to discuss those two issues after the main behavior is settled.
Richard has raised some concerns about option combinations. Any other concerns?

Thx,
Haochen

> -----Original Message-----
> From: Gcc-patches On Behalf Of Haochen Jiang via Gcc-patches
> Sent: Tuesday, August 8, 2023 3:13 PM
> To: gcc-patches@gcc.gnu.org
> Cc: ubiz...@gmail.com; Liu, Hongtao
> Subject: Intel AVX10.1 Compiler Design and Support
> 
> Hi all,
> 
> We will send out our initial support of AVX10 and some sample patches in this
> mailing thread. And there will be more coming up afterwards. Therefore, we
> would like to share our proposed AVX10 design in GCC.
> 
> Here is a quick introduction to AVX10:
>   - AVX10 is the first major new ISA since the introduction of AVX512 in 2013.
>   - With the introduction of AVX10, we would like to establish a common,
>     converged vector instruction set across all Intel architectures, including
>     Xeon Server, Atom Server and Clients.
>   - The default maximum vector size for AVX10 will be 256 bit, while 512 bit
>     is optional.
>   - AVX10.1 will include all existing AVX512 instructions in Granite Rapids.
>   - There will be no new AVX512 CPUID introduced in future. All EVEX vector
>     instructions will be under the AVX10 umbrella.
>   - AVX10 will be a version-based ISA instead of tons of different CPUIDs like
>     AVX512BW, AVX512DQ, AVX512FP16, etc.
>   - Based on AVX10.1, AVX10.2 will introduce ymm embedded rounding, SAE
>     (Suppress All Exceptions) control and new instructions.
> 
> If you would like to have a closer look at the details, please follow the
> links below:
> 
> Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification
> It describes the Intel Advanced Vector Extensions 10 Instruction Set
> Architecture.
> https://cdrdv2.intel.com/v1/dl/getContent/784267
> 
> The Converged Vector ISA: Intel Advanced Vector Extensions 10 Technical Paper
> It provides introductory information regarding the converged vector ISA: Intel
> Advanced Vector Extensions 10.
> https://cdrdv2.intel.com/v1/dl/getContent/784343
> 
> Hence, we will have several compiler design ground rules for AVX10:
>   - AVX10 is a converged ISA feature set.
>     We will not provide -m[no-]xxx to enable/disable each single vector
>     feature in one version as we used to before. Instead, a simple option
>     -m[no-]avx10.x is used. If the 512 bit version is needed, -mavx10.x-512
>     is all you need. Also, the maximum vector width should be the same when
>     different versions of AVX10 are used. For example, enabling AVX10.1 with
>     512 bit vector width while enabling AVX10.2 with only 256 bit vector
>     width is not a desired behavior.
>   - AVX10 is an evolving ISA feature set.
>     Every feature that shows up in the current version will always show up
>     in future versions.
>   - AVX10 is an independent ISA feature set.
>     Although sharing the same instructions and encodings, AVX10 and AVX512
>     are conceptually independent features, which means they are orthogonal.
> 
> Since AVX10 will have several benefits, like bringing AVX512 features to Atom
> Server and Clients and replacing tons of AVX512 CPUIDs with a single AVX10
> option to enable features, we lean towards the adoption of AVX10 instead of
> AVX512 from now on.
> 
> Based on all we got, we would like to introduce the following compiler
> options:
>   - -mavx10.x: The option will enable AVX10.1-AVX10.x features with a default
>     256 bit vector width to ensure compatibility on all platforms.
>   - -mavx10.x-512: The option will enable AVX10.1-AVX10.x features with 512
>     bit vector width. A “-mno-avx10.x-512” option will not be provided, to
>     avoid confusion over whether it disables 512 bit vector width or avx10.x
>     itself.
>   - -mavx10.x-256: The option will enable AVX10.1-AVX10.x features with 256
>     bit vector width. But it will disable 512 bit vector width since the
>     vector size is indicated in the option. A “-mno-avx10.x-256” option will
>     not be provided, to stay aligned with the 512 one.
>   - -mno-avx10.x: The option will disable all the fea

RE: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Zhang, Annita via Gcc-patches
For the ABI change proposal, I'd suggest raising a discussion in the x86-64-abi group.

Thx,
Annita

> -----Original Message-----
> From: Jiang, Haochen
> Sent: Thursday, August 10, 2023 10:15 PM
> To: Beulich, Jan; Phoebe Wang
> Cc: Joseph Myers; Wang, Phoebe; Hongtao Liu; gcc-patc...@gcc.gnu.org;
> ubiz...@gmail.com; Liu, Hongtao; Zhang, Annita; x86-64-abi; llvm-dev;
> Craig Topper; Richard Biener
> Subject: RE: Intel AVX10.1 Compiler Design and Support
> 
> > -----Original Message-----
> > From: Jan Beulich
> > Sent: Thursday, August 10, 2023 9:31 PM
> > To: Phoebe Wang
> > Cc: Joseph Myers; Wang, Phoebe; Hongtao Liu; Jiang, Haochen;
> > gcc-patches@gcc.gnu.org; ubiz...@gmail.com; Liu, Hongtao; Zhang, Annita;
> > x86-64-abi; llvm-dev; Craig Topper; Richard Biener
> > Subject: Re: Intel AVX10.1 Compiler Design and Support
> >
> > On 10.08.2023 15:12, Phoebe Wang wrote:
> > >>  The psABI should have some simple rule covering all of the above I
> > >> think.
> > >
> > > psABI has a rule for the case doesn't mean the rule is a well
> > > defined ABI in practice. A well defined ABI should guarantee 1)
> > > interlinkable across different compile options within the same
> > > compiler; 2) interlinkable across different compilers. Both aspects
> > > are failed in the non 512-bit version.
> > >
> > > 1) is more important than 2) and becomes more critical on AVX10 targets.
> > > Because we expect AVX10-256 is a general setting for binaries that
> > > can run on both AVX10-256 and AVX10-512. It would be common that
> > > binaries compiled with AVX10-256 may link with native built binaries
> > > on AVX10-512 targets.
> 
> IMO it is not acceptable for AVX10-256 to generate zmm registers.
> 
> If I have to choose among the three proposals, the second is better.
> 
> But the best choice I suppose is to keep what we are doing currently, which is
> passing them in memory and emitting a warning. It is a reasonable behavior.
> 
> Thx,
> Haochen
> 
> >
> > But you're only describing a pre-existing problem here afaict. Code
> > compiled with -mavx512f passing __m512 type data to a function compiled
> > with only, say, -mavx2 won't interoperate properly either. What's
> > worse, imo the psABI doesn't sufficiently define what __m256 etc
> > actually are. After all these aren't types defined by the C standard
> > (as opposed to at least most other types in the respective table
> > there), and you can't really make assumptions like "this is what certain
> > compilers think this is".
> >
> > Jan


RE: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Jiang, Haochen via Gcc-patches
> -----Original Message-----
> From: Jan Beulich
> Sent: Thursday, August 10, 2023 9:31 PM
> To: Phoebe Wang
> Cc: Joseph Myers; Wang, Phoebe; Hongtao Liu; Jiang, Haochen;
> gcc-patches@gcc.gnu.org; ubiz...@gmail.com; Liu, Hongtao; Zhang, Annita;
> x86-64-abi; llvm-dev; Craig Topper; Richard Biener
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On 10.08.2023 15:12, Phoebe Wang wrote:
> >>  The psABI should have some simple rule covering all of the above I think.
> >
> > psABI has a rule for the case doesn't mean the rule is a well defined
> > ABI in practice. A well defined ABI should guarantee 1) interlinkable
> > across different compile options within the same compiler; 2)
> > interlinkable across different compilers. Both aspects are failed in the
> > non 512-bit version.
> >
> > 1) is more important than 2) and becomes more critical on AVX10 targets.
> > Because we expect AVX10-256 is a general setting for binaries that can
> > run on both AVX10-256 and AVX10-512. It would be common that binaries
> > compiled with AVX10-256 may link with native built binaries on AVX10-512
> > targets.

IMO it is not acceptable for AVX10-256 to generate zmm registers.

If I have to choose among the three proposals, the second is better.

But the best choice I suppose is to keep what we are doing currently, which is
passing them in memory and emitting a warning. It is a reasonable behavior.

Thx,
Haochen

> 
> But you're only describing a pre-existing problem here afaict. Code compiled
> with -mavx512f passing __m512 type data to a function compiled with only, say,
> -mavx2 won't interoperate properly either. What's worse, imo the psABI doesn't
> sufficiently define what __m256 etc actually are. After all these aren't types
> defined by the C standard (as opposed to at least most other types in the
> respective table there), and you can't really make assumptions like "this is
> what certain compilers think this is".
> 
> Jan


Re: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Richard Biener via Gcc-patches
On Thu, Aug 10, 2023 at 3:31 PM Jan Beulich  wrote:
>
> On 10.08.2023 15:12, Phoebe Wang wrote:
> >>  The psABI should have some simple rule covering all of the above I think.
> >
> > psABI has a rule for the case doesn't mean the rule is a well defined ABI
> > in practice. A well defined ABI should guarantee 1) interlinkable across
> > different compile options within the same compiler; 2) interlinkable across
> > different compilers. Both aspects are failed in the non 512-bit version.
> >
> > 1) is more important than 2) and becomes more critical on AVX10 targets.
> > Because we expect AVX10-256 is a general setting for binaries that can run
> > on both AVX10-256 and AVX10-512. It would be common that binaries compiled
> > with AVX10-256 may link with native built binaries on AVX10-512 targets.
>
> But you're only describing a pre-existing problem here afaict. Code compiled
> with -mavx512f passing __m512 type data to a function compiled with only,
> say, -mavx2 won't interoperate properly either. What's worse, imo the psABI
> doesn't sufficiently define what __m256 etc actually are. After all these
> aren't types defined by the C standard (as opposed to at least most other
> types in the respective table there), and you can't really make assumptions
> like "this is what certain compilers think this is".

You might be able to speak in terms of OpenMP SIMD with simdlen?

Richard.

> Jan


Re: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Jan Beulich via Gcc-patches
On 10.08.2023 15:12, Phoebe Wang wrote:
>>  The psABI should have some simple rule covering all of the above I think.
> 
> psABI has a rule for the case doesn't mean the rule is a well defined ABI
> in practice. A well defined ABI should guarantee 1) interlinkable across
> different compile options within the same compiler; 2) interlinkable across
> different compilers. Both aspects are failed in the non 512-bit version.
> 
> 1) is more important than 2) and becomes more critical on AVX10 targets.
> Because we expect AVX10-256 is a general setting for binaries that can run
> on both AVX10-256 and AVX10-512. It would be common that binaries compiled
> with AVX10-256 may link with native built binaries on AVX10-512 targets.

But you're only describing a pre-existing problem here afaict. Code compiled
with -mavx512f passing __m512 type data to a function compiled with only,
say, -mavx2 won't interoperate properly either. What's worse, imo the psABI
doesn't sufficiently define what __m256 etc actually are. After all these
aren't types defined by the C standard (as opposed to at least most other
types in the respective table there), and you can't really make assumptions
like "this is what certain compilers think this is".

Jan


Re: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Phoebe Wang via Gcc-patches
>  The psABI should have some simple rule covering all of the above I think.

That the psABI has a rule for this case doesn't mean the rule is a
well-defined ABI in practice. A well-defined ABI should guarantee that
objects are 1) interlinkable across different compile options within the
same compiler; 2) interlinkable across different compilers. The non-512-bit
version fails on both counts.

1) is more important than 2) and becomes more critical on AVX10 targets,
because we expect AVX10-256 to be a general setting for binaries that can run
on both AVX10-256 and AVX10-512. It would be common for binaries compiled
with AVX10-256 to link with natively built binaries on AVX10-512 targets.

Both 1) and 2) show the problem with the current rule in the psABI. So I
think the psABI should be updated to solve them.

Thanks
Phoebe

Richard Biener wrote on Thu, Aug 10, 2023 at 20:46:

> On Thu, Aug 10, 2023 at 2:37 PM Phoebe Wang via Gcc-patches
>  wrote:
> >
> > >  Changing ABIs like that for existing code that has worked for some time
> > >  on existing hardware is a bad idea.
> >
> > I agree, so Proposal 3 is the last choice.
> >
> > The target of the proposals is to solve the ABI incompatible issue between
> > AVX10-256 and AVX10-512 when passing/returning 512 vectors. So we are
> > discussing the default ABI rather than other vector variants.
> >
> > If you believe that changing 512-bit ABI (the 512-bit version) is a bad
> > idea, how about Proposal 1 and 2? I don't want to call the non 512-bit
> > version an ABI because it doesn't provide the interaction between 256-bit
> > and 512-bit targets. Besides, LLVM also behaves differently with GCC on non
> > 512-bit targets. It is a good time to solve the problem together if we make
> > the 512-bit ABI consistent and target independent. WDYT?
>
> Isn't this situation similar to the not defined ABI when passing generic
> vectors (via __attribute__((vector_size))) that do not map to vectors
> supported by the current ISA?  There's cases like vector<2> char or
> vector<1> double to consider for example that would fit in a lowpart of a
> supported vector register and as in the AVX512 case vectors that are larger
> than any supported vector register.
>
> The psABI should have some simple rule covering all of the above I think.
>
> Richard.
>
> > Thanks
> > Phoebe
> >
> > Joseph Myers wrote on Thu, Aug 10, 2023 at 04:43:
> >
> > > On Wed, 9 Aug 2023, Wang, Phoebe via Gcc-patches wrote:
> > >
> > > > Proposal 3: Change the ABI of 512-bit vector and always be
> > > > passed/returned from memory.
> > >
> > > Changing ABIs like that for existing code that has worked for some time on
> > > existing hardware is a bad idea.
> > >
> > > At this point it seems appropriate to remind people of another ABI
> > > consideration for vector extensions.  glibc's libmvec defines vector
> > > versions of various functions, including AVX512 ones (of course those
> > > function versions only work on hardware with the relevant instructions).
> > > glibc's headers use both _Pragma ("omp declare simd notinbranch") and
> > > __attribute__ ((__simd__ ("notinbranch"))) to declare, to the compiler
> > > including those headers, what function variants are available in glibc.
> > >
> > > Existing glibc versions need to continue to work with new compiler
> > > versions.  That is, it's part of the ABI, which must remain stable,
> > > exactly which function versions the above pragma and attribute imply are
> > > available - and of course the details of how those functions versions take
> > > arguments / return results are also part of the ABI (it would be OK for a
> > > new compiler to choose not to use some of those vector versions, but not
> > > to start calling them with a different ABI).
> > >
> > > Maybe you'll want to add new vector function versions, with different
> > > interfaces, to libmvec in future.  If so, you need a *different* pragma or
> > > attribute to declare to the compiler that the libmvec version using that
> > > pragma or attribute has the additional functions - so new compilers using
> > > the existing header will not try to generate calls to new function
> > > versions that don't exist in that glibc version (but new compilers using a
> > > new header version from new glibc will see the new pragma or attribute and
> > > so be able to generate the relevant calls to new functions).  And once
> > > you've defined the ABI for such a new pragma or attribute, that itself
> > > then becomes a stable interface - so if you end up with vector extensions
> > > involving yet another set of interfaces, they need another corresponding
> > > new pragma / attribute for libmvec to declare to the compiler that the new
> > > interfaces exist.
> > >
> > > --
> > > Joseph S. Myers
> > > jos...@codesourcery.com
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> Groups
> > > "X86-64 System V Application Binary Interface" group.
> > > To unsubscribe from this group and stop 

Re: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Richard Biener via Gcc-patches
On Thu, Aug 10, 2023 at 2:37 PM Phoebe Wang via Gcc-patches
 wrote:
>
> > >  Changing ABIs like that for existing code that has worked for some time
> > >  on existing hardware is a bad idea.
>
> I agree, so Proposal 3 is the last choice.
>
> The target of the proposals is to solve the ABI incompatible issue between
> AVX10-256 and AVX10-512 when passing/returning 512 vectors. So we are
> discussing the default ABI rather than other vector variants.
>
> If you believe that changing 512-bit ABI (the 512-bit version) is a bad
> idea, how about Proposal 1 and 2? I don't want to call the non 512-bit
> version an ABI because it doesn't provide the interaction between 256-bit
> and 512-bit targets. Besides, LLVM also behaves differently with GCC on non
> 512-bit targets. It is a good time to solve the problem together if we make
> the 512-bit ABI consistent and target independent. WDYT?

Isn't this situation similar to the not defined ABI when passing generic
vectors (via __attribute__((vector_size))) that do not map to vectors supported
by the current ISA?  There's cases like vector<2> char or vector<1> double
to consider for example that would fit in a lowpart of a supported vector
register and as in the AVX512 case vectors that are larger than any supported
vector register.
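
To make those cases concrete, here is a small sketch of such types using the GNU vector extension. The typedef names and helper functions are made up for illustration, and the vectors are only handled as locals or through pointers so that no by-value passing convention is involved.

```c
#include <assert.h>
#include <stddef.h>

typedef char   v2qi  __attribute__((vector_size (2)));   /* vector<2> char   */
typedef double v1df  __attribute__((vector_size (8)));   /* vector<1> double */
typedef int    v32si __attribute__((vector_size (128))); /* wider than zmm   */

/* Number of elements in a generic vector object. */
#define VEC_LANES(v) (sizeof (v) / sizeof ((v)[0]))

size_t lanes_v2qi (void)  { v2qi  v = { 0 }; return VEC_LANES (v); }
size_t lanes_v1df (void)  { v1df  v = { 0 }; return VEC_LANES (v); }
size_t lanes_v32si (void) { v32si v = { 0 }; return VEC_LANES (v); }

/* Element-wise add of two tiny vectors, passed by pointer. */
void
add_v2qi (const v2qi *a, const v2qi *b, v2qi *out)
{
  assert (a != NULL && b != NULL && out != NULL);
  *out = *a + *b;
}
```

How each of these types would travel in registers when passed by value is exactly the part a simple psABI rule would have to pin down.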

The psABI should have some simple rule covering all of the above I think.

Richard.

> Thanks
> Phoebe
>
> Joseph Myers wrote on Thu, Aug 10, 2023 at 04:43:
>
> > On Wed, 9 Aug 2023, Wang, Phoebe via Gcc-patches wrote:
> >
> > > Proposal 3: Change the ABI of 512-bit vector and always be
> > > passed/returned from memory.
> >
> > Changing ABIs like that for existing code that has worked for some time on
> > existing hardware is a bad idea.
> >
> > At this point it seems appropriate to remind people of another ABI
> > consideration for vector extensions.  glibc's libmvec defines vector
> > versions of various functions, including AVX512 ones (of course those
> > function versions only work on hardware with the relevant instructions).
> > glibc's headers use both _Pragma ("omp declare simd notinbranch") and
> > __attribute__ ((__simd__ ("notinbranch"))) to declare, to the compiler
> > including those headers, what function variants are available in glibc.
> >
> > Existing glibc versions need to continue to work with new compiler
> > versions.  That is, it's part of the ABI, which must remain stable,
> > exactly which function versions the above pragma and attribute imply are
> > available - and of course the details of how those functions versions take
> > arguments / return results are also part of the ABI (it would be OK for a
> > new compiler to choose not to use some of those vector versions, but not
> > to start calling them with a different ABI).
> >
> > Maybe you'll want to add new vector function versions, with different
> > interfaces, to libmvec in future.  If so, you need a *different* pragma or
> > attribute to declare to the compiler that the libmvec version using that
> > pragma or attribute has the additional functions - so new compilers using
> > the existing header will not try to generate calls to new function
> > versions that don't exist in that glibc version (but new compilers using a
> > new header version from new glibc will see the new pragma or attribute and
> > so be able to generate the relevant calls to new functions).  And once
> > you've defined the ABI for such a new pragma or attribute, that itself
> > then becomes a stable interface - so if you end up with vector extensions
> > involving yet another set of interfaces, they need another corresponding
> > new pragma / attribute for libmvec to declare to the compiler that the new
> > interfaces exist.
> >
> > --
> > Joseph S. Myers
> > jos...@codesourcery.com


Re: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Phoebe Wang via Gcc-patches
>  Changing ABIs like that for existing code that has worked for some time
>  on existing hardware is a bad idea.

I agree, so Proposal 3 is the last choice.

The goal of the proposals is to solve the ABI incompatibility between
AVX10-256 and AVX10-512 when passing/returning 512-bit vectors. So we are
discussing the default ABI rather than other vector variants.

If you believe that changing the 512-bit ABI (the 512-bit version) is a bad
idea, how about Proposals 1 and 2? I don't want to call the non-512-bit
version an ABI because it doesn't provide interoperation between 256-bit
and 512-bit targets. Besides, LLVM also behaves differently from GCC on
non-512-bit targets. It is a good time to solve the problem together if we
make the 512-bit ABI consistent and target independent. WDYT?

Thanks
Phoebe

Joseph Myers wrote on Thu, Aug 10, 2023 at 04:43:

> On Wed, 9 Aug 2023, Wang, Phoebe via Gcc-patches wrote:
>
> > Proposal 3: Change the ABI of 512-bit vector and always be
> > passed/returned from memory.
>
> Changing ABIs like that for existing code that has worked for some time on
> existing hardware is a bad idea.
>
> At this point it seems appropriate to remind people of another ABI
> consideration for vector extensions.  glibc's libmvec defines vector
> versions of various functions, including AVX512 ones (of course those
> function versions only work on hardware with the relevant instructions).
> glibc's headers use both _Pragma ("omp declare simd notinbranch") and
> __attribute__ ((__simd__ ("notinbranch"))) to declare, to the compiler
> including those headers, what function variants are available in glibc.
>
> Existing glibc versions need to continue to work with new compiler
> versions.  That is, it's part of the ABI, which must remain stable,
> exactly which function versions the above pragma and attribute imply are
> available - and of course the details of how those functions versions take
> arguments / return results are also part of the ABI (it would be OK for a
> new compiler to choose not to use some of those vector versions, but not
> to start calling them with a different ABI).
>
> Maybe you'll want to add new vector function versions, with different
> interfaces, to libmvec in future.  If so, you need a *different* pragma or
> attribute to declare to the compiler that the libmvec version using that
> pragma or attribute has the additional functions - so new compilers using
> the existing header will not try to generate calls to new function
> versions that don't exist in that glibc version (but new compilers using a
> new header version from new glibc will see the new pragma or attribute and
> so be able to generate the relevant calls to new functions).  And once
> you've defined the ABI for such a new pragma or attribute, that itself
> then becomes a stable interface - so if you end up with vector extensions
> involving yet another set of interfaces, they need another corresponding
> new pragma / attribute for libmvec to declare to the compiler that the new
> interfaces exist.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 09, 2023 at 08:43:00PM +, Joseph Myers wrote:
> At this point it seems appropriate to remind people of another ABI 
> consideration for vector extensions.  glibc's libmvec defines vector 
> versions of various functions, including AVX512 ones (of course those 
> function versions only work on hardware with the relevant instructions).  
> glibc's headers use both _Pragma ("omp declare simd notinbranch") and 
> __attribute__ ((__simd__ ("notinbranch"))) to declare, to the compiler 
> including those headers, what function variants are available in glibc.

For omp declare simd or the simd attribute, that simply implies that the
variants with 512-bit vectors may only be called from -mavx512f or
-mavx10.1-512 (or whatever the switch will be called) code, not from
-mavx10.1 code.
We shouldn't change that ABI because of AVX10.

Jakub



RE: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Joseph Myers
On Wed, 9 Aug 2023, Wang, Phoebe via Gcc-patches wrote:

> Proposal 3: Change the ABI of 512-bit vector and always be 
> passed/returned from memory.

Changing ABIs like that for existing code that has worked for some time on 
existing hardware is a bad idea.

At this point it seems appropriate to remind people of another ABI 
consideration for vector extensions.  glibc's libmvec defines vector 
versions of various functions, including AVX512 ones (of course those 
function versions only work on hardware with the relevant instructions).  
glibc's headers use both _Pragma ("omp declare simd notinbranch") and 
__attribute__ ((__simd__ ("notinbranch"))) to declare, to the compiler 
including those headers, what function variants are available in glibc.

Existing glibc versions need to continue to work with new compiler 
versions.  That is, it's part of the ABI, which must remain stable, 
exactly which function versions the above pragma and attribute imply are 
available - and of course the details of how those function versions take 
arguments / return results are also part of the ABI (it would be OK for a 
new compiler to choose not to use some of those vector versions, but not 
to start calling them with a different ABI).
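
As a rough illustration of the mechanism described above, here is a minimal
sketch of how a header can advertise vector variants with the simd attribute.
The function halve_plus_one and the DECLARE_SIMD macro are hypothetical
stand-ins, not glibc names; real libmvec variants follow the vector function
ABI naming (for example, the AVX-512 variant of sin is, to my knowledge,
_ZGVeN8v_sin).

```c
#include <assert.h>
#include <stddef.h>

/* GCC understands the simd attribute; other compilers see a plain function. */
#if defined(__GNUC__) && !defined(__clang__)
# define DECLARE_SIMD __attribute__((__simd__("notinbranch")))
#else
# define DECLARE_SIMD
#endif

/* Hypothetical stand-in for a libmvec routine such as sin(). */
DECLARE_SIMD double halve_plus_one(double x)
{
    return 0.5 * x + 1.0;
}

/* A loop that the vectorizer may turn into calls to a vector variant.
   Which variants the compiler may call is fixed by the attribute, which
   is why its meaning has to stay stable. */
void apply_halve_plus_one(double *out, const double *in, size_t n)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = halve_plus_one(in[i]);
}
```

The scalar result is the same whichever variant the compiler picks; only the
set of callable symbols and their calling convention is what the attribute
pins down.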

Maybe you'll want to add new vector function versions, with different 
interfaces, to libmvec in future.  If so, you need a *different* pragma or 
attribute to declare to the compiler that the libmvec version using that 
pragma or attribute has the additional functions - so new compilers using 
the existing header will not try to generate calls to new function 
versions that don't exist in that glibc version (but new compilers using a 
new header version from new glibc will see the new pragma or attribute and 
so be able to generate the relevant calls to new functions).  And once 
you've defined the ABI for such a new pragma or attribute, that itself 
then becomes a stable interface - so if you end up with vector extensions 
involving yet another set of interfaces, they need another corresponding 
new pragma / attribute for libmvec to declare to the compiler that the new 
interfaces exist.

-- 
Joseph S. Myers
jos...@codesourcery.com


RE: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Zhang, Annita via Gcc-patches



> -Original Message-
> From: Michael Matz 
> Sent: Wednesday, August 9, 2023 9:54 PM
> To: Zhang, Annita 
> Cc: Florian Weimer ; Hongtao Liu
> ; Beulich, Jan ; Jiang, Haochen
> ; gcc-patches@gcc.gnu.org; ubiz...@gmail.com;
> Liu, Hongtao ; Wang, Phoebe
> ; x86-64-abi ;
> llvm-dev ; Craig Topper ;
> Joseph Myers 
> Subject: RE: Intel AVX10.1 Compiler Design and Support
> 
> Hello,
> 
> On Wed, 9 Aug 2023, Zhang, Annita via Gcc-patches wrote:
> 
> > > > The question is whether you want to mandate the 16-bit floating
> > > > point extensions.  You might get better adoption if you stay
> > > > compatible with shipping CPUs.  Furthermore, the 256-bit tuning
> > > > apparently benefits current Intel CPUs, even though they can do
> > > > 512-bit vectors.
> > >
> > > (The thread subject is a bit misleading for this sub-topic, by the
> > > way.)
> > >
> > > Thanks,
> > > Florian
> >
> > > Since 256-bit and 512-bit diverge starting with AVX10.1 and will
> > > continue to do so in future AVX10 versions, I think it's hard to keep
> > > a single version number that covers both and increases monotonically.
> > > Hence I'd like to suggest x86-64-v5 for 512-bit and x86-64-v5-256 for
> > > 256-bit, and so on.
> 
> The raison d'etre for the x86-64-vX scheme is to make life sensible as a
> distributor.  That goal can only be achieved if this scheme contains only a
> few components that have a simple relationship.  That basically means:
> one dimension only.  If you now add a second dimension (with and without
> -512) we have to add another one if Intel (or whoever else) next does a
> marketing stunt for feature "foobar" and end up with x86-64-v6,
> x86-64-v6-512, x86-64-v6-1024, x86-64-v6-foobar, x86-64-v6-512-foobar,
> x86-64-v6-1024-foobar.
> 
> In short: no.
> 
> It isn't the right time anyway to assign meaning to x86-64-v5, as it wasn't
> the right time for assigning x86-64-v4 (as we now see).  These are supposed
> to reflect generally useful feature sets actually shipped in generally
> available CPUs in the market, and be vendor independent.  As such it's much
> too early to define v5 based purely on text documents.
> 
> 
> Ciao,
> Michael.

Makes sense.


RE: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 9 Aug 2023, Zhang, Annita via Gcc-patches wrote:

> > The question is whether you want to mandate the 16-bit floating point
> > extensions.  You might get better adoption if you stay compatible with
> > shipping CPUs.  Furthermore, the 256-bit tuning apparently benefits
> > current Intel CPUs, even though they can do 512-bit vectors.
> > 
> > (The thread subject is a bit misleading for this sub-topic, by the way.)
> > 
> > Thanks,
> > Florian
> 
> Since 256-bit and 512-bit diverge starting with AVX10.1 and will continue
> to do so in future AVX10 versions, I think it's hard to keep a single
> version number that covers both and increases monotonically. Hence I'd
> like to suggest x86-64-v5 for 512-bit and x86-64-v5-256 for 256-bit, and
> so on.

The raison d'etre for the x86-64-vX scheme is to make life sensible as a 
distributor.  That goal can only be achieved if this scheme contains only 
a few components that have a simple relationship.  That basically means: 
one dimension only.  If you now add a second dimension (with and without 
-512) we have to add another one if Intel (or whoever else) next does a 
marketing stunt for feature "foobar" and end up with x86-64-v6, 
x86-64-v6-512, x86-64-v6-1024, x86-64-v6-foobar, x86-64-v6-512-foobar, 
x86-64-v6-1024-foobar.
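
To put a number on that growth, here is a small sketch that enumerates the
level names for one version given a list of width suffixes and optional
feature suffixes. The function name enumerate_level_names and the constant
LEVEL_NAME_LEN are hypothetical; the point is only that W widths and F
optional features yield W * 2^F names per version.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LEVEL_NAME_LEN 64

/* Fill names[] with every combination of width suffix and optional feature
   suffixes for one version; returns how many names were produced. */
size_t enumerate_level_names(int version,
                             const char *const *widths, size_t n_widths,
                             const char *const *features, size_t n_features,
                             char names[][LEVEL_NAME_LEN], size_t max_names)
{
    size_t count = 0;

    assert(n_features < 16);
    /* Each bit of mask says whether one optional feature is present. */
    for (size_t mask = 0; mask < ((size_t)1 << n_features); mask++) {
        for (size_t w = 0; w < n_widths; w++) {
            if (count == max_names)
                return count;
            char *name = names[count++];
            snprintf(name, LEVEL_NAME_LEN, "x86-64-v%d%s", version, widths[w]);
            for (size_t f = 0; f < n_features; f++) {
                if (mask & ((size_t)1 << f)) {
                    strncat(name, "-", LEVEL_NAME_LEN - strlen(name) - 1);
                    strncat(name, features[f], LEVEL_NAME_LEN - strlen(name) - 1);
                }
            }
        }
    }
    return count;
}
```

With the width suffixes "", "-512" and "-1024" and the single feature
"foobar", this produces the six names listed above; a second optional
feature would double that to twelve.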

In short: no.

It isn't the right time anyway to assign meaning to x86-64-v5, as it 
wasn't the right time for assigning x86-64-v4 (as we now see).  These are 
supposed to reflect generally useful feature sets actually shipped in 
generally available CPUs in the market, and be vendor independent.  As 
such it's much too early to define v5 based purely on text documents.


Ciao,
Michael.


RE: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Zhang, Annita via Gcc-patches


> -Original Message-
> From: Florian Weimer 
> Sent: Wednesday, August 9, 2023 5:16 PM
> To: Hongtao Liu 
> Cc: Beulich, Jan ; Jiang, Haochen
> ; gcc-patches@gcc.gnu.org; ubiz...@gmail.com;
> Liu, Hongtao ; Zhang, Annita
> ; Wang, Phoebe ; x86-
> 64-abi ; llvm-dev ;
> Craig Topper ; Joseph Myers
> 
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> * Hongtao Liu:
> 
> > On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich  wrote:
> >> Aiui these ABI levels were intended to be incremental, i.e. higher
> >> versions would include everything earlier ones cover. Without such a
> >> guarantee, how would you propose compatibility checks to be
> >> implemented in a way
> 
> Correct, this was the intent.  But it's mostly to foster adoption and make it
> easier for developers to pick the variants that they want to target custom
> builds.  If it's an ascending chain, the trade-offs are simpler.
> 
> > Are there many software implementations based on this assumption?
> > At least in GCC, it's not a big problem, we can adjust code for the
> > new micro-architecture level.
> 
> The glibc framework can deal with alternate choices in principle, although I'd
> prefer not to go there for the reasons indicated.
> 
> >> applicable both forwards and backwards? If a new level is wanted
> >> here, then I guess it could only be something like v3.5.
> 
> > But if we use avx10.1 as v3.5, it's still not a subset of
> > x86-64-v4 (avx10.1 contains avx512fp16, avx512bf16, etc., which are not
> > in x86-64-v4), so there will still be a divergence.
> > Then 256-bit of x86-64-v4 as v3.5? That's too weird to me.
> 
> The question is whether you want to mandate the 16-bit floating point
> extensions.  You might get better adoption if you stay compatible with
> shipping CPUs.  Furthermore, the 256-bit tuning apparently benefits current
> Intel CPUs, even though they can do 512-bit vectors.
> 
> (The thread subject is a bit misleading for this sub-topic, by the way.)
> 
> Thanks,
> Florian

Since 256-bit and 512-bit diverge starting with AVX10.1 and will continue to 
do so in future AVX10 versions, I think it's hard to keep a single version 
number that covers both and increases monotonically. Hence I'd like to 
suggest x86-64-v5 for 512-bit and x86-64-v5-256 for 256-bit, and so on.

Thx,
Annita



 


Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 9, 2023 at 5:15 PM Florian Weimer  wrote:
>
> * Hongtao Liu:
>
> > On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich  wrote:
> >> Aiui these ABI levels were intended to be incremental, i.e. higher versions
> >> would include everything earlier ones cover. Without such a guarantee, how
> >> would you propose compatibility checks to be implemented in a way
>
> Correct, this was the intent.  But it's mostly to foster adoption and
> make it easier for developers to pick the variants that they want to
> target custom builds.  If it's an ascending chain, the trade-offs are
> simpler.
>
> > Are there many software implementations based on this assumption?
> > At least in GCC, it's not a big problem, we can adjust code for the
> > new micro-architecture level.
>
> The glibc framework can deal with alternate choices in principle,
> although I'd prefer not to go there for the reasons indicated.
>
> >> applicable both forwards and backwards? If a new level is wanted here, then
> >> I guess it could only be something like v3.5.
>
> > But if we use avx10.1 as v3.5, it's still not a subset of
> > x86-64-v4 (avx10.1 contains avx512fp16, avx512bf16, etc., which are not
> > in x86-64-v4), so there will still be a divergence.
> > Then 256-bit of x86-64-v4 as v3.5? That's too weird to me.
>
> The question is whether you want to mandate the 16-bit floating point
> extensions.  You might get better adoption if you stay compatible with
> shipping CPUs.  Furthermore, the 256-bit tuning apparently benefits
> current Intel CPUs, even though they can do 512-bit vectors.
It's not only 16-bit floating point; the whole picture of AVX512->AVX10 is
shown in Figure 1-1 (Intel® AVX-512 Feature Flags Across Intel® Xeon®
Processor Generations vs. Intel® AVX10) and Figure 1-2 (Intel® ISA Families
and Features) at https://cdrdv2.intel.com/v1/dl/getContent/784343 (this
link is a direct download of the PDF).



>
> (The thread subject is a bit misleading for this sub-topic, by the way.)
>
> Thanks,
> Florian
>


-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Florian Weimer via Gcc-patches
* Hongtao Liu:

> On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich  wrote:
>> Aiui these ABI levels were intended to be incremental, i.e. higher versions
>> would include everything earlier ones cover. Without such a guarantee, how
>> would you propose compatibility checks to be implemented in a way

Correct, this was the intent.  But it's mostly to foster adoption and
make it easier for developers to pick the variants that they want to
target custom builds.  If it's an ascending chain, the trade-offs are
simpler.

> Are there many software implementations based on this assumption?
> At least in GCC, it's not a big problem, we can adjust code for the
> new micro-architecture level.

The glibc framework can deal with alternate choices in principle,
although I'd prefer not to go there for the reasons indicated.

>> applicable both forwards and backwards? If a new level is wanted here, then
>> I guess it could only be something like v3.5.

> But if we use avx10.1 as v3.5, it's still not a subset of
> x86-64-v4 (avx10.1 contains avx512fp16, avx512bf16, etc., which are not in
> x86-64-v4), so there will still be a divergence.
> Then 256-bit of x86-64-v4 as v3.5? That's too weird to me.

The question is whether you want to mandate the 16-bit floating point
extensions.  You might get better adoption if you stay compatible with
shipping CPUs.  Furthermore, the 256-bit tuning apparently benefits
current Intel CPUs, even though they can do 512-bit vectors.

(The thread subject is a bit misleading for this sub-topic, by the way.)

Thanks,
Florian



Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 9, 2023 at 4:14 PM Florian Weimer  wrote:
>
> * Richard Biener via Gcc-patches:
>
> > I don’t think we can realistically change the ABI.  If we could,
> > passing them in two 256bit registers would be possible as well.
> >
> > Note I fully expect Intel to turn around and implement 512 bits on a
> > 256 bit data path on the E cores in 5 years.  And it will take at
> > least that time for AVX10 to take off (look at AVX512 for this and how
> > they cautiously chose to include bf16 to cut off Zen4).  So IMHO we
> > shouldn’t worry at all and just wait for AVX42 to arrive.
>
> Yes, the direction is a bit unclear.  In retrospect, we could have
> defined x86-64-v4 to use 256 bit vector width, so it could eventually be
> compatible with AVX10; it's also what current Intel CPUs prefer (and
NOTE: avx10.x-256 also inhibits the use of 64-bit kmasks, which are
supposed to be used only by zmm instructions.
But in theory, the 64-bit kmask intrinsics (i.e. kshift/kand/kor) can also
be used standalone.
> past, with the exception of the Xeon Phi line).  But in the meantime,
> AMD has started to ship CPUs that seem to prefer 512 bit vectors,
> despite having a double pumped implementation.  (Disclaimer: All CPU
> preferences inferred from current compiler tuning defaults, not actual
> experiments. 8-/)
>
> To me, this looks like we may have defined x86-64-v4 prematurely, and
> this suggests we should wait a bit to see where things are heading.
>
> Thanks,
> Florian
>


-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Florian Weimer via Gcc-patches
* Richard Biener via Gcc-patches:

> I don’t think we can realistically change the ABI.  If we could,
> passing them in two 256bit registers would be possible as well.
>
> Note I fully expect Intel to turn around and implement 512 bits on a
> 256 bit data path on the E cores in 5 years.  And it will take at
> least that time for AVX10 to take off (look at AVX512 for this and how
> they cautiously chose to include bf16 to cut off Zen4).  So IMHO we
> shouldn’t worry at all and just wait for AVX42 to arrive.

Yes, the direction is a bit unclear.  In retrospect, we could have
defined x86-64-v4 to use 256 bit vector width, so it could eventually be
compatible with AVX10; it's also what current Intel CPUs prefer (and
past, with the exception of the Xeon Phi line).  But in the meantime,
AMD has started to ship CPUs that seem to prefer 512 bit vectors,
despite having a double pumped implementation.  (Disclaimer: All CPU
preferences inferred from current compiler tuning defaults, not actual
experiments. 8-/)

To me, this looks like we may have defined x86-64-v4 prematurely, and
this suggests we should wait a bit to see where things are heading.

Thanks,
Florian



Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Jan Beulich via Gcc-patches
On 09.08.2023 09:38, Hongtao Liu wrote:
> On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich  wrote:
>>
>> On 09.08.2023 04:14, Hongtao Liu wrote:
>>> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu  wrote:

>>>> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers wrote:
>>>>>
>>>>> Do you have any comments on the interaction of AVX10 with the
>>>>> micro-architecture levels defined in the ABI (and supported with
>>>>> glibc-hwcaps directories in glibc)?  Given that the levels are cumulative,
>>>>> should we take it that any future levels will be ones supporting 512-bit
>>>>> vector width for AVX10 (because x86-64-v4 requires the current AVX512F,
>>>>> AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors
>>>>> that only support 256-bit vector width will be considered to match the
>>>>> x86-64-v3 micro-architecture level but not any higher level?
>>>> This is actually something we really want to discuss in the community,
>>>> our proposal for x86-64-v5: AVX10.2-256 (implying AVX10.1-256) + APX.
>>>> One big reason is that Intel E-cores will only support AVX10 256-bit; if
>>>> we want to use x86-64-v5 across server and client, it's better to
>>>> default to 256-bit.
>>
>> Aiui these ABI levels were intended to be incremental, i.e. higher versions
>> would include everything earlier ones cover. Without such a guarantee, how
>> would you propose compatibility checks to be implemented in a way
> Are there many software implementations based on this assumption?
> At least in GCC, it's not a big problem, we can adjust code for the
> new micro-architecture level.
>> applicable both forwards and backwards? If a new level is wanted here, then
>> I guess it could only be something like v3.5.
> But if we use avx10.1 as v3.5, it's still not a subset of
> x86-64-v4 (avx10.1 contains avx512fp16, avx512bf16, etc., which are not in
> x86-64-v4), so there will still be a divergence.

Hmm, yes. But something will end up being odd in any event. Versions no
longer being integral values is kind of indicating a "branch", i.e. v4
not being a successor. Maybe v3.1 would be better, for it to then have
possible successors v3.2, v3.3, etc. Of course it would be possible to
"merge" branches back then, into e.g. v5 covering AVX10.2/512 (and
thus fully covering everything that's in v4).
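
To make the "branch" and "merge" picture concrete, here is a minimal sketch
that models each level as a bitmask and checks inclusion. The feature bits
are deliberately coarse and hypothetical (they lump many ISA extensions
together), and LEVEL_AVX10_256 / LEVEL_AVX10_512 are made-up names for the
levels under discussion, not anything defined in the psABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t feature_set;

/* Coarse, illustrative feature groups (assumptions, not the psABI lists). */
#define FEAT_V3_BASELINE  (1u << 0)  /* everything x86-64-v3 requires      */
#define FEAT_EVEX_128_256 (1u << 1)  /* EVEX-encoded 128/256-bit operations */
#define FEAT_VEC512       (1u << 2)  /* 512-bit vector width                */
#define FEAT_FP16         (1u << 3)  /* avx512fp16-style instructions       */
#define FEAT_BF16         (1u << 4)  /* avx512bf16-style instructions       */

#define LEVEL_V3        (FEAT_V3_BASELINE)
#define LEVEL_V4        (LEVEL_V3 | FEAT_EVEX_128_256 | FEAT_VEC512)
#define LEVEL_AVX10_256 (LEVEL_V3 | FEAT_EVEX_128_256 | FEAT_FP16 | FEAT_BF16)
#define LEVEL_AVX10_512 (LEVEL_AVX10_256 | FEAT_VEC512)

/* True if every feature required by 'lower' is also present in 'higher'. */
bool level_includes(feature_set higher, feature_set lower)
{
    return (higher & lower) == lower;
}

/* Two levels form a chain only if one of them includes the other. */
bool levels_are_ordered(feature_set a, feature_set b)
{
    assert(a != 0 && b != 0);
    return level_includes(a, b) || level_includes(b, a);
}
```

Under these assumptions v4 and the 256-bit AVX10 level are not ordered
(neither includes the other), which is the "branch", while the 512-bit AVX10
level includes both and could serve as the point where the branches merge.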

Jan

> Then 256-bit of x86-64-v4 as v3.5? That's too weird to me.
> 
> Our main proposal is to make AVX10.x a new micro-architecture level
> with a 256-bit default; either v3.5 or v5 would be acceptable if it's
> just the name.



Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich  wrote:
>
> On 09.08.2023 04:14, Hongtao Liu wrote:
> > On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu  wrote:
> >>
> >> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers  
> >> wrote:
> >>>
> >>> Do you have any comments on the interaction of AVX10 with the
> >>> micro-architecture levels defined in the ABI (and supported with
> >>> glibc-hwcaps directories in glibc)?  Given that the levels are cumulative,
> >>> should we take it that any future levels will be ones supporting 512-bit
> >>> vector width for AVX10 (because x86-64-v4 requires the current AVX512F,
> >>> AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors
> >>> that only support 256-bit vector width will be considered to match the
> >>> x86-64-v3 micro-architecture level but not any higher level?
> >> This is actually something we really want to discuss in the community,
> >> our proposal for x86-64-v5: AVX10.2-256 (implying AVX10.1-256) + APX.
> >> One big reason is that Intel E-cores will only support AVX10 256-bit; if
> >> we want to use x86-64-v5 across server and client, it's better to
> >> default to 256-bit.
>
> Aiui these ABI levels were intended to be incremental, i.e. higher versions
> would include everything earlier ones cover. Without such a guarantee, how
> would you propose compatibility checks to be implemented in a way
Are there many software implementations based on this assumption?
At least in GCC, it's not a big problem, we can adjust code for the
new micro-architecture level.
> applicable both forwards and backwards? If a new level is wanted here, then
> I guess it could only be something like v3.5.
But if we use avx10.1 as v3.5, it's still not a subset of
x86-64-v4 (avx10.1 contains avx512fp16, avx512bf16, etc., which are not in
x86-64-v4), so there will still be a divergence.
Then 256-bit of x86-64-v4 as v3.5? That's too weird to me.

Our main proposal is to make AVX10.x a new micro-architecture level
with a 256-bit default; either v3.5 or v5 would be acceptable if it's
just the name.
>
> Jan



-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Jan Beulich via Gcc-patches
On 09.08.2023 04:14, Hongtao Liu wrote:
> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu  wrote:
>>
>> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers  wrote:
>>>
>>> Do you have any comments on the interaction of AVX10 with the
>>> micro-architecture levels defined in the ABI (and supported with
>>> glibc-hwcaps directories in glibc)?  Given that the levels are cumulative,
>>> should we take it that any future levels will be ones supporting 512-bit
>>> vector width for AVX10 (because x86-64-v4 requires the current AVX512F,
>>> AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors
>>> that only support 256-bit vector width will be considered to match the
>>> x86-64-v3 micro-architecture level but not any higher level?
>> This is actually something we really want to discuss in the community,
>> our proposal for x86-64-v5: AVX10.2-256 (implying AVX10.1-256) + APX.
>> One big reason is that Intel E-cores will only support AVX10 256-bit; if
>> we want to use x86-64-v5 across server and client, it's better to
>> default to 256-bit.

Aiui these ABI levels were intended to be incremental, i.e. higher versions
would include everything earlier ones cover. Without such a guarantee, how
would you propose compatibility checks to be implemented in a way
applicable both forwards and backwards? If a new level is wanted here, then
I guess it could only be something like v3.5.

Jan


RE: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, August 8, 2023 8:45 PM
> To: Jiang, Haochen 
> Cc: Jakub Jelinek ; gcc-patches@gcc.gnu.org;
> ubiz...@gmail.com; Liu, Hongtao 
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 8, 2023 at 10:15 AM Jiang, Haochen via Gcc-patches wrote:
> >
> > Hi Jakub,
> >
> > > So, what does this imply for the current ISAs?
> >
> > AVX10 will imply AVX2 on the ISA level. And we regard AVX10 as an
> > independent ISA feature set. Although they share the same instructions
> > and encodings, AVX10 and AVX512 are conceptually independent features,
> > which means they are orthogonal.
> >
> > > The expectations in lots of config/i386/* is that -mavx512f /
> > > TARGET_AVX512F means 512 bit vector support is available and most of
> > > the various -mavx512XXX options imply -mavx512f (and -mno-avx512f
> > > turns those off).  And if -mavx512vl / TARGET_AVX512VL isn't
> > > available, tons of places just use 512-bit EVEX instructions for
> > > 256-bit or 128-bit stuff (mostly to be able to access [xy]mm16+).
> >
> > For AVX10, the 128/256/scalar versions of the instructions are always
> > there, and also for [xy]mm16+. The 512 version is "optional", and the
> > user needs to request it with options. When the 512 version is enabled,
> > the 128/256/scalar versions are also enabled, which is kind of the
> > reverse of the current AVX512F/AVX512VL relation.
> >
> > Since we treat AVX10 and AVX512 as orthogonal, we will add OR logic
> > to the current patterns, as shown in our AVX512DQ+VL sample patches.
> 
> Hmm, so it sounds like AVX10 is currently, at the 10.1 level, a way to specify
> AVX512F and AVX512VL "differently", so wouldn't it make sense to make it
> complement those only so one can use, say, -mavx10 -mno-avx512bf16 to disable
> parts of the former AVX512 ISA one doesn't like to get code generated for?
> -mavx10 would then enable all the existing sub-AVX512 ISAs?
>

We treat AVX10 and AVX512 as two independent ISAs.

Therefore, it would be quite weird to disable something through another,
unrelated ISA. I don't think -mavx10.1 -mno-avx512f should disable anything.
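
As a rough sketch of the OR logic from the quoted discussion, assuming a
simplified option model: the struct isa_options and the helper names below
are hypothetical and do not mirror GCC's internal TARGET_* macros or its
actual option handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical view of the relevant command-line state. */
struct isa_options {
    bool avx512f, avx512vl, avx512dq;   /* legacy AVX512 options       */
    bool avx10_1_256, avx10_1_512;      /* AVX10.1 at 256 and 512 bits */
};

/* Models -mno-avx512f: turns off the AVX512 family, leaves AVX10 alone. */
void apply_no_avx512f(struct isa_options *o)
{
    assert(o != NULL);
    o->avx512f = false;
    o->avx512vl = false;
    o->avx512dq = false;
}

/* 128/256-bit form of an instruction shared by AVX512DQ+VL and AVX10.1:
   enabled if either ISA provides it (the OR logic). */
bool dq_insn_256_enabled(const struct isa_options *o)
{
    bool via_avx512 = o->avx512dq && o->avx512vl;
    bool via_avx10 = o->avx10_1_256 || o->avx10_1_512;
    return via_avx512 || via_avx10;
}

/* 512-bit form: needs AVX512DQ or the 512-bit flavour of AVX10.1. */
bool dq_insn_512_enabled(const struct isa_options *o)
{
    return o->avx512dq || o->avx10_1_512;
}
```

In this model, combining the 256-bit AVX10.1 option with -mno-avx512f still
leaves the 256-bit form available, because the AVX10 side of the OR is
untouched.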

Thx,
Haochen

> > > Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they
> > > have AVX512F CPUID even when the 512-bit vectors aren't present?
> > > What happens if one mixes the -mavx10* options together with
> > > -mno-avx512vl or similar options?  Will -mno-avx512f still imply 
> > > -mno-avx512vl etc.?
> >
> > For the CPUID part, AVX10 and AVX512 have different emulation. Only
> > Xeon Server will have AVX512 related CPUIDs for backward
> > compatibility. For GNR, it will be AVX512F, AVX512VL, AVX512CD,
> > AVX512BW, AVX512DQ, AVX512_IFMA, AVX512_VBMI, AVX512_VNNI,
> > AVX512_BF16, AVX512_BITALG, AVX512_VPOPCNTDQ, AVX512_VBMI2,
> > AVX512_FP16. Also, it will have AVX10 CPUIDs with 512 bit support set.
> > Atom Server and client will only have AVX10 CPUIDs with 256 bit support
> > set.
> >
> > -mno-avx512f will still imply -mno-avx512vl.
> >
> > As we mentioned below, we don't recommend that users combine the AVX10
> > and legacy AVX512 options. We understand that there will be different
> > opinions on how the compiler should behave for some controversial option
> > combinations.
> >
> > If someone mixes the options, the golden rule is that we use OR logic.
> > Therefore, enabling either feature will turn on the shared
> > instructions, no matter whether the other feature is unmentioned or
> > disabled. That is why we emit warnings for some scenarios, which is also
> > mentioned in the letter.
> 
> I'm refraining from commenting on the senselessness of AVX10 as you're
> likely on the same receiving side as us.
> 
> Thanks,
> Richard.
> 
> > Thx,
> > Haochen
> >
> > >
> > >   Jakub
> >


RE: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, August 9, 2023 1:38 PM
> To: Phoebe Wang 
> Cc: Hongtao Liu ; Joseph Myers
> ; Jiang, Haochen ; gcc-
> patc...@gcc.gnu.org; ubiz...@gmail.com; Liu, Hongtao
> ; Zhang, Annita ; Wang,
> Phoebe ; x86-64-abi ; llvm-dev ; Craig Topper
> 
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> 
> 
> > On 09.08.2023 at 06:02, Phoebe Wang via Gcc-patches wrote:
> >
> > I have some proposals about unifying ABI on AVX10 for both 256-bit
> > and 512-bit.
> >
> >
> >
> > Proposal 1: Promote attribute from AVX10-256 to AVX10-512 for any
> > function which has 512-bit or above vectors in passing/returning arguments.
> >
> > Problem: Binary cannot run on AVX10-256 only target.
> >
> > Reason:
> >
> > When a user passes or returns a 512-bit vector, they should be aware
> > that the code becomes target dependent. Users should be taught not to
> > do this on 256-bit targets, and that unexpected things will happen if
> > they insist.
> >
> > Actually, ICC and MSVC have already chosen to promote for the argument:
> > https://godbolt.org/z/vcrf9qW5z I think if the compiler has to choose
> > between two kinds of misbehavior, a wrong result or a crash due to an
> > illegal instruction, the latter is definitely better than the former.
> >
> > In this way, we can also declare that x86-64-v5 inherits from x86-64-v4
> > and interoperates with previous versions.
> >
> >
> >
> > Proposal 2: Abort compilation when user tries to pass/return 512-bit
> > vectors.
> >
> > Reason: This turns possible run time crash into compile time error.
> >
> >
> >
> > Proposal 3: Change the ABI of 512-bit vector and always be
> > passed/returned from memory.
> 
> I don’t think we can realistically change the ABI.  If we could, passing
> them in two 256bit registers would be possible as well.
> 
> Note I fully expect Intel to turn around and implement 512 bits on a 256 bit
> data path on the E cores in 5 years.  And it will take at least that time for
> AVX10 to take off (look at AVX512 for this and how they cautiously chose to
> include bf16 to cut off Zen4).  So IMHO we shouldn’t worry at all and just
> wait for AVX42 to arrive.

Let me try to clarify the whole thing.

I suppose Phoebe's "change" is based on LLVM.

In GCC, the current behavior is to pass 512-bit vectors in memory when there
is no 512-bit support. But when there is support, everything is passed in
registers.
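
For reference, a minimal sketch of the kind of function this is about, using
GCC's generic vector extension; the type name v16si and the function
add_v16si are only illustrative. Built without 512-bit support, GCC is
expected to pass and return the value in memory (and to mention the ABI
change under -Wpsabi); built with -mavx512f it would use zmm registers
instead.

```c
#include <assert.h>

/* Silence the note about the ABI change when 512-bit support is off. */
#pragma GCC diagnostic ignored "-Wpsabi"

/* A 512-bit vector of sixteen 32-bit integers (GCC/Clang extension). */
typedef int v16si __attribute__((vector_size(64)));

/* Takes and returns 512-bit vectors by value, so how a and b travel
   (memory vs. zmm registers) depends on the enabled ISA. */
v16si add_v16si(v16si a, v16si b)
{
    return a + b;
}

/* Plain scalar helper so callers can inspect a result without vectors. */
int first_plus_last(v16si v)
{
    int total = v[0] + v[15];
    assert(total == v[15] + v[0]);
    return total;
}
```

The source is identical in both cases; only the calling convention differs,
which is why mixing objects built with and without 512-bit support is the
delicate part.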

For AVX10, I would prefer to keep this pattern. But if most of you want to
change it, I have no objection, since AVX10 is a new start.

Thx,
Haochen

> 
> Richard
> 
> > Reason: We expect AVX10-256 to be a universal configuration, and in
> > most scenarios 512-bit vectors won't bring performance improvements. So
> > we can sacrifice a little 512-bit performance to achieve
> > interoperability between AVX10-256 and AVX10-512. This way, there won't
> > be any runtime issues in the future either.
> >
> >
> >
> > Thanks
> >
> > Phoebe
> >
> > On Wed, Aug 9, 2023 at 10:18, Hongtao Liu wrote:
> >
> >>> On Wed, Aug 9, 2023 at 10:14 AM Hongtao Liu  wrote:
> >>>
> >>> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu  wrote:
> >>>>
> >>>> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers wrote:
> >>>>>
> >>>>> Do you have any comments on the interaction of AVX10 with the
> >>>>> micro-architecture levels defined in the ABI (and supported with
> >>>>> glibc-hwcaps directories in glibc)?  Given that the levels are
> >>>>> cumulative, should we take it that any future levels will be ones
> >>>>> supporting 512-bit vector width for AVX10 (because x86-64-v4 requires
> >>>>> the current AVX512F, AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and
> >>>>> so any future processors that only support 256-bit vector width will
> >>>>> be considered to match the x86-64-v3 micro-architecture level but not
> >>>>> any higher level?
> >>>> This is actually something we really want to discuss in the
> >>>> community, our proposal for x86-64-v5: AVX10.2-256 (implying
> >>>> AVX10.1-256) + APX.
> >>>> One big reason is Intel E-core will only support AVX10 256-bit, if
> >>>> we want to us

Re: Intel AVX10.1 Compiler Design and Support

2023-08-08 Thread Richard Biener via Gcc-patches



> On 09.08.2023 at 06:02, Phoebe Wang via Gcc-patches wrote:
> 
> I have some proposals about unifying ABI on AVX10 for both 256-bit and
> 512-bit.
> 
> 
> 
> Proposal 1: Promote attribute from AVX10-256 to AVX10-512 for any function
> which has 512-bit or above vectors in passing/returning arguments.
> 
> Problem: Binary cannot run on AVX10-256 only target.
> 
> Reason:
> 
> When a user passes or returns a 512-bit vector, they should be aware that
> the code becomes target dependent. Users should be taught not to do this
> on 256-bit targets, and that unexpected things will happen if they
> insist.
> 
> Actually, ICC and MSVC have already chosen to promote for the argument:
> https://godbolt.org/z/vcrf9qW5z I think if the compiler has to choose
> between two kinds of misbehavior, a wrong result or a crash due to an
> illegal instruction, the latter is definitely better than the former.
> 
> In this way, we can also declare that x86-64-v5 inherits from x86-64-v4
> and interoperates with previous versions.
> 
> 
> 
> Proposal 2: Abort compilation when user tries to pass/return 512-bit
> vectors.
> 
> Reason: This turns possible run time crash into compile time error.
> 
> 
> 
> Proposal 3: Change the ABI of 512-bit vector and always be passed/returned
> from memory.

I don’t think we can realistically change the ABI.  If we could, passing them in 
two 256bit registers would be possible as well.

Note I fully expect Intel to turn around and implement 512 bits on a 256 bit 
data path on the E cores in 5 years.  And it will take at least that time for 
AVX10 to take off (look at AVX512 for this and how they cautiously chose to 
include bf16 to cut off Zen4).  So IMHO we shouldn’t worry at all and just wait 
for AVX42 to arrive.

Richard 

> Reason: We expect AVX10-256 to be a universal configuration, and in most
> scenarios 512-bit vectors won't bring performance improvements. So we can
> sacrifice a little 512-bit performance to achieve interoperability between
> AVX10-256 and AVX10-512. This way, there won't be any runtime issues in
> the future either.
> 
> 
> 
> Thanks
> 
> Phoebe
> 
> On Wed, Aug 9, 2023 at 10:18, Hongtao Liu wrote:
> 
>>> On Wed, Aug 9, 2023 at 10:14 AM Hongtao Liu  wrote:
>>> 
>>> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu  wrote:
 
>>>> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers wrote:
>>>>>
>>>>> Do you have any comments on the interaction of AVX10 with the
>>>>> micro-architecture levels defined in the ABI (and supported with
>>>>> glibc-hwcaps directories in glibc)?  Given that the levels are
>>>>> cumulative, should we take it that any future levels will be ones
>>>>> supporting 512-bit vector width for AVX10 (because x86-64-v4 requires
>>>>> the current AVX512F, AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and
>>>>> so any future processors that only support 256-bit vector width will
>>>>> be considered to match the x86-64-v3 micro-architecture level but not
>>>>> any higher level?
>>>> This is actually something we really want to discuss in the community,
>>>> our proposal for x86-64-v5: AVX10.2-256 (implying AVX10.1-256) + APX.
>>>> One big reason is that Intel E-cores will only support AVX10 256-bit; if
>>>> we want to use x86-64-v5 across server and client, it's better to
>>>> default to 256-bit.
>>> + ABI and LLVM folked for this topic.
>> s/folked/folks/
>> 
> 
> --
> Joseph S. Myers
> jos...@codesourcery.com
 
 
 
 --
 BR,
 Hongtao
>>> 
>>> 
>>> 
>>> --
>>> BR,
>>> Hongtao
>> 
>> 
>> 
>> --
>> BR,
>> Hongtao
>> 
>> --
>> You received this message because you are subscribed to the Google Groups
>> "X86-64 System V Application Binary Interface" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to x86-64-abi+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/x86-64-abi/CAMZc-bzj5971PJ4UN2aB4LB-9nj4q_fRiykT9My3syohGLbZrw%40mail.gmail.com
>> .
>> 


Re: Intel AVX10.1 Compiler Design and Support

2023-08-08 Thread Phoebe Wang via Gcc-patches
I have some proposals about unifying ABI on AVX10 for both 256-bit and
512-bit.



Proposal 1: Promote the attribute from AVX10-256 to AVX10-512 for any function
that passes or returns vectors of 512 bits or more.

Problem: The binary cannot run on an AVX10-256-only target.

Reason:

When users pass or return a 512-bit vector, they should be aware that the
code becomes target dependent. Users should be taught not to do this on
256-bit targets, and that unexpected things will happen if they insist.

Actually, ICC and MSVC have already chosen to promote for such arguments:
https://godbolt.org/z/vcrf9qW5z I think that if the compiler has to choose
between two misbehaviors, a wrong result or a crash due to an illegal
instruction, the latter is definitely better than the former.

In this way, we can also declare that x86-64-v5 inherits from x86-64-v4 and
interoperates with the previous levels.



Proposal 2: Abort compilation when the user tries to pass/return 512-bit
vectors.

Reason: This turns a possible run-time crash into a compile-time error.



Proposal 3: Change the ABI of 512-bit vectors so that they are always
passed/returned in memory.

Reason: We expect AVX10-256 to be the universal configuration, and in most
scenarios 512-bit vectors won't bring performance improvements. So we can
sacrifice a little 512-bit performance to achieve interoperability between
AVX10-256 and AVX10-512. In this way, there won't be any run-time issues in
the future either.
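
For illustration only, here is a minimal C sketch of what "passed/returned in memory" looks like at the source level today. It is not part of the proposal and does not change any ABI. The type `struct vec512` and the function `vec512_add` are hypothetical names. The sketch assumes the current System V x86-64 classification rules, under which a 64-byte aggregate of floats like this one is passed and returned through memory rather than in a zmm register, so the call boundary does not depend on 512-bit register support.

```c
#include <assert.h>

/* Hypothetical 512-bit payload: 16 floats wrapped in a plain struct.
   Assumption: under the System V x86-64 psABI this aggregate is
   classified MEMORY, so it travels through the stack, not through zmm. */
struct vec512 {
    float lane[16];
};

/* Element-wise add.  The compiler is free to vectorize the loop with
   whatever vector width the target allows; the calling convention stays
   the same on 256-bit-only and 512-bit-capable targets. */
static struct vec512 vec512_add(struct vec512 a, struct vec512 b)
{
    struct vec512 r;

    assert(sizeof(struct vec512) == 64);  /* 16 lanes of 4 bytes each */
    for (int i = 0; i < 16; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}
```

The cost Proposal 3 accepts is visible here: every call copies 64 bytes through memory instead of keeping the value in a register.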



Thanks

Phoebe

Hongtao Liu  wrote on Wed, Aug 9, 2023 at 10:18:

> On Wed, Aug 9, 2023 at 10:14 AM Hongtao Liu  wrote:
> >
> > On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu  wrote:
> > >
> > > On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers 
> wrote:
> > > >
> > > > Do you have any comments on the interaction of AVX10 with the
> > > > micro-architecture levels defined in the ABI (and supported with
> > > > glibc-hwcaps directories in glibc)?  Given that the levels are
> cumulative,
> > > > should we take it that any future levels will be ones supporting
> 512-bit
> > > > vector width for AVX10 (because x86-64-v4 requires the current
> AVX512F,
> > > > AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future
> processors
> > > > that only support 256-bit vector width will be considered to match
> the
> > > > x86-64-v3 micro-architecture level but not any higher level?
> > > This is actually something we really want to discuss in the community,
> > > our proposal for x86-64-v5: AVX10.2-256(Implying AVX10.1-256) + APX.
> > > One big reason is Intel E-core will only support AVX10 256-bit, if we
> > > > want to use x86-64-v5 across server and client, it's better to
> > > > default to 256-bit.
> > + ABI and LLVM folked for this topic.
> s/folked/folks/
>
> > > >
> > > > --
> > > > Joseph S. Myers
> > > > jos...@codesourcery.com
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao
>
> --
> You received this message because you are subscribed to the Google Groups
> "X86-64 System V Application Binary Interface" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to x86-64-abi+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/x86-64-abi/CAMZc-bzj5971PJ4UN2aB4LB-9nj4q_fRiykT9My3syohGLbZrw%40mail.gmail.com
> .
>


RE: Intel AVX10.1 Compiler Design and Support

2023-08-08 Thread Wang, Phoebe via Gcc-patches
I have some proposals about unifying ABI on AVX10 for both 256-bit and 512-bit.

Proposal 1: Promote the attribute from AVX10-256 to AVX10-512 for any function
that passes or returns vectors of 512 bits or more.
Problem: The binary cannot run on an AVX10-256-only target.
Reason:
When users pass or return a 512-bit vector, they should be aware that the code
becomes target dependent. Users should be taught not to do this on 256-bit
targets, and that unexpected things will happen if they insist.
Actually, ICC and MSVC have already chosen to promote for such arguments:
https://godbolt.org/z/vcrf9qW5z
I think that if the compiler has to choose between two misbehaviors, a wrong
result or a crash due to an illegal instruction, the latter is definitely
better than the former.
In this way, we can also declare that x86-64-v5 inherits from x86-64-v4 and
interoperates with the previous levels.

Proposal 2: Abort compilation when the user tries to pass/return 512-bit vectors.
Reason: This turns a possible run-time crash into a compile-time error.

Proposal 3: Change the ABI of 512-bit vectors so that they are always
passed/returned in memory.
Reason: We expect AVX10-256 to be the universal configuration, and in most
scenarios 512-bit vectors won't bring performance improvements. So we can
sacrifice a little 512-bit performance to achieve interoperability between
AVX10-256 and AVX10-512. In this way, there won't be any run-time issues in the
future either.

Thanks
Phoebe

-Original Message-
From: Hongtao Liu  
Sent: Wednesday, August 9, 2023 10:19 AM
To: Joseph Myers 
Cc: Jiang, Haochen ; gcc-patches@gcc.gnu.org; 
ubiz...@gmail.com; Liu, Hongtao ; Zhang, Annita 
; Wang, Phoebe ; x86-64-abi 
; llvm-dev ; Craig Topper 

Subject: Re: Intel AVX10.1 Compiler Design and Support

On Wed, Aug 9, 2023 at 10:14 AM Hongtao Liu  wrote:
>
> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu  wrote:
> >
> > On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers  wrote:
> > >
> > > Do you have any comments on the interaction of AVX10 with the 
> > > micro-architecture levels defined in the ABI (and supported with 
> > > glibc-hwcaps directories in glibc)?  Given that the levels are 
> > > cumulative, should we take it that any future levels will be ones 
> > > supporting 512-bit vector width for AVX10 (because x86-64-v4 
> > > requires the current AVX512F, AVX512BW, AVX512CD, AVX512DQ and 
> > > AVX512VL) - and so any future processors that only support 256-bit 
> > > vector width will be considered to match the
> > > x86-64-v3 micro-architecture level but not any higher level?
> > This is actually something we really want to discuss in the 
> > community, our proposal for x86-64-v5: AVX10.2-256(Implying AVX10.1-256) + 
> > APX.
> > One big reason is Intel E-core will only support AVX10 256-bit, if 
> > we want to use x86-64-v5 across server and client, it's better to
> > default to 256-bit.
> + ABI and LLVM folked for this topic.
s/folked/folks/

> > >
> > > --
> > > Joseph S. Myers
> > > jos...@codesourcery.com
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao



--
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-08 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 9, 2023 at 10:14 AM Hongtao Liu  wrote:
>
> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu  wrote:
> >
> > On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers  wrote:
> > >
> > > Do you have any comments on the interaction of AVX10 with the
> > > micro-architecture levels defined in the ABI (and supported with
> > > glibc-hwcaps directories in glibc)?  Given that the levels are cumulative,
> > > should we take it that any future levels will be ones supporting 512-bit
> > > vector width for AVX10 (because x86-64-v4 requires the current AVX512F,
> > > AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors
> > > that only support 256-bit vector width will be considered to match the
> > > x86-64-v3 micro-architecture level but not any higher level?
> > This is actually something we really want to discuss in the community,
> > our proposal for x86-64-v5: AVX10.2-256(Implying AVX10.1-256) + APX.
> > One big reason is Intel E-core will only support AVX10 256-bit, if we
> > > want to use x86-64-v5 across server and client, it's better to
> > > default to 256-bit.
> + ABI and LLVM folked for this topic.
s/folked/folks/

> > >
> > > --
> > > Joseph S. Myers
> > > jos...@codesourcery.com
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao



--
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-08 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu  wrote:
>
> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers  wrote:
> >
> > Do you have any comments on the interaction of AVX10 with the
> > micro-architecture levels defined in the ABI (and supported with
> > glibc-hwcaps directories in glibc)?  Given that the levels are cumulative,
> > should we take it that any future levels will be ones supporting 512-bit
> > vector width for AVX10 (because x86-64-v4 requires the current AVX512F,
> > AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors
> > that only support 256-bit vector width will be considered to match the
> > x86-64-v3 micro-architecture level but not any higher level?
> This is actually something we really want to discuss in the community,
> our proposal for x86-64-v5: AVX10.2-256(Implying AVX10.1-256) + APX.
> One big reason is Intel E-core will only support AVX10 256-bit, if we
> > want to use x86-64-v5 across server and client, it's better to
> > default to 256-bit.
+ ABI and LLVM folked for this topic.
> >
> > --
> > Joseph S. Myers
> > jos...@codesourcery.com
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-08 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 9, 2023 at 10:06 AM Hongtao Liu  wrote:
>
> On Tue, Aug 8, 2023 at 8:45 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Tue, Aug 8, 2023 at 10:15 AM Jiang, Haochen via Gcc-patches
> >  wrote:
> > >
> > > Hi Jakub,
> > >
> > > > So, what does this imply for the current ISAs?
> > >
> > > AVX10 will imply AVX2 on the ISA level. And we suppose AVX10 is an
> > > independent ISA feature set. Although sharing the same instructions and
> > > encodings, AVX10 and AVX512 are conceptual independent features, which
> > > means they are orthogonal.
> > >
> > > > The expectations in lots of config/i386/* is that -mavx512f / 
> > > > TARGET_AVX512F
> > > > means 512 bit vector support is available and most of the various 
> > > > -mavx512XXX
> > > > options imply -mavx512f (and -mno-avx512f turns those off).  And if
> > > > -mavx512vl / TARGET_AVX512VL isn't available, tons of places just use
> > > > 512-bit EVEX instructions for 256-bit or 128-bit stuff (mostly to be 
> > > > able to
> > > > access [xy]mm16+).
> > >
> > > For AVX10, the 128/256/scalar version of the instructions are always 
> > > there, and
> > > also for [xy]mm16+. 512 version is "optional", which needs user to 
> > > indicate them
> > > in options. When 512 version is enabled, 128/256/scalar version is also 
> > > enabled,
> > > which is kind of reverse relation between the current AVX512F/AVX512VL.
> > >
> > > Since we take AVX10 and AVX512 are orthogonal, we will add OR logic for 
> > > the current
> > > pattern, which is shown in our AVX512DQ+VL sample patches.
> >
> > Hmm, so it sounds like AVX10 is currently, at the 10.1 level, a way to 
> > specify
> > AVX512F and AVX512VL "differently", so wouldn't it make sense to make it
> In the future there will be platforms that only support AVX10.x-256, but not
> AVX512 stuff; it doesn't make much sense on such a platform to disable
> part of AVX512.
> We really want to make AVX10.x an indivisible feature, just like any other
> individual CPUID feature.
> > complement those only so one can use, say, -mavx10 -mno-avx512bf16 to 
> > disable
> > parts of the former AVX512 ISA one doesn't like to get code generated for?
> > -mavx10 would then enable all the existing sub-AVX512 ISAs?
> Another alternative solution is
to split AVX512 into AVX512-256 and AVX512-512, like AVX512F-256,
AVX512FP16-256, AVX512FP16-512, and make AVX10.1-256
imply those AVX512-256 features and AVX10.1-512 imply the AVX512-512 ones.
> >
> > > > Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they have
> > > > AVX512F CPUID even when the 512-bit vectors aren't present? What 
> > > > happens if
> > > > one mixes the -mavx10* options together with -mno-avx512vl or similar
> > > > options?  Will -mno-avx512f still imply -mno-avx512vl etc.?
> > >
> > > For the CPUID part, AVX10 and AVX512 have different emulation. Only Xeon 
> > > Server
> > > will have AVX512 related CPUIDs for backward compatibility. For GNR, it 
> > > will be
> > > AVX512F, AVX512VL, AVX512CD, AVX512BW, AVX512DQ, AVX512_IFMA, AVX512_VBMI,
> > > AVX512_VNNI, AVX512_BF16, AVX512_BITALG, AVX512_VPOPCNTDQ, AVX512_VBMI2,
> > > AVX512_FP16. Also, it will have AVX10 CPUIDs with 512 bit support set. 
> > > Atom Server and
> > > client will only have AVX10 CPUIDs with 256 bit support set.
> > >
> > > -mno-avx512f will still imply -mno-avx512vl.
> > >
> > > As we mentioned below, we don't recommend users to combine the AVX10 and 
> > > legacy
> > > AVX512 options. We understand that there will be different opinions on 
> > > what should
> > > compiler behave on some controversial option combinations.
> > >
> > > If there is someone mixes the options, the golden rule is that we are 
> > > using OR logic.
> > > Therefore, enabling either feature will turn on the shared instructions, 
> > > no matter the other
> > > feature is not mentioned or closed. That is why we are emitting warning 
> > > for some scenarios,
> > > which is also mentioned in the letter.
> >
> > I'm refraining from commenting on the senselessness of AVX10 as you're
> > likely on the same receiving side as us.
> >
> > Thanks,
> > Richard.
> >
> > > Thx,
> > > Haochen
> > >
> > > >
> > > >   Jakub
> > >
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-08 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 8, 2023 at 8:45 PM Richard Biener via Gcc-patches
 wrote:
>
> On Tue, Aug 8, 2023 at 10:15 AM Jiang, Haochen via Gcc-patches
>  wrote:
> >
> > Hi Jakub,
> >
> > > So, what does this imply for the current ISAs?
> >
> > AVX10 will imply AVX2 on the ISA level. And we suppose AVX10 is an
> > independent ISA feature set. Although sharing the same instructions and
> > encodings, AVX10 and AVX512 are conceptual independent features, which
> > means they are orthogonal.
> >
> > > The expectations in lots of config/i386/* is that -mavx512f / 
> > > TARGET_AVX512F
> > > means 512 bit vector support is available and most of the various 
> > > -mavx512XXX
> > > options imply -mavx512f (and -mno-avx512f turns those off).  And if
> > > -mavx512vl / TARGET_AVX512VL isn't available, tons of places just use
> > > 512-bit EVEX instructions for 256-bit or 128-bit stuff (mostly to be able 
> > > to
> > > access [xy]mm16+).
> >
> > For AVX10, the 128/256/scalar version of the instructions are always there, 
> > and
> > also for [xy]mm16+. 512 version is "optional", which needs user to indicate 
> > them
> > in options. When 512 version is enabled, 128/256/scalar version is also 
> > enabled,
> > which is kind of reverse relation between the current AVX512F/AVX512VL.
> >
> > Since we take AVX10 and AVX512 are orthogonal, we will add OR logic for the 
> > current
> > pattern, which is shown in our AVX512DQ+VL sample patches.
>
> Hmm, so it sounds like AVX10 is currently, at the 10.1 level, a way to specify
> AVX512F and AVX512VL "differently", so wouldn't it make sense to make it
In the future there will be platforms that only support AVX10.x-256, but not
AVX512 stuff; it doesn't make much sense on such a platform to disable
part of AVX512.
We really want to make AVX10.x an indivisible feature, just like any other
individual CPUID feature.
> complement those only so one can use, say, -mavx10 -mno-avx512bf16 to disable
> parts of the former AVX512 ISA one doesn't like to get code generated for?
> -mavx10 would then enable all the existing sub-AVX512 ISAs?
Another alternative solution is
>
> > > Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they have
> > > AVX512F CPUID even when the 512-bit vectors aren't present? What happens 
> > > if
> > > one mixes the -mavx10* options together with -mno-avx512vl or similar
> > > options?  Will -mno-avx512f still imply -mno-avx512vl etc.?
> >
> > For the CPUID part, AVX10 and AVX512 have different emulation. Only Xeon 
> > Server
> > will have AVX512 related CPUIDs for backward compatibility. For GNR, it 
> > will be
> > AVX512F, AVX512VL, AVX512CD, AVX512BW, AVX512DQ, AVX512_IFMA, AVX512_VBMI,
> > AVX512_VNNI, AVX512_BF16, AVX512_BITALG, AVX512_VPOPCNTDQ, AVX512_VBMI2,
> > AVX512_FP16. Also, it will have AVX10 CPUIDs with 512 bit support set. Atom 
> > Server and
> > client will only have AVX10 CPUIDs with 256 bit support set.
> >
> > -mno-avx512f will still imply -mno-avx512vl.
> >
> > As we mentioned below, we don't recommend users to combine the AVX10 and 
> > legacy
> > AVX512 options. We understand that there will be different opinions on what 
> > should
> > compiler behave on some controversial option combinations.
> >
> > If there is someone mixes the options, the golden rule is that we are using 
> > OR logic.
> > Therefore, enabling either feature will turn on the shared instructions, no 
> > matter the other
> > feature is not mentioned or closed. That is why we are emitting warning for 
> > some scenarios,
> > which is also mentioned in the letter.
>
> I'm refraining from commenting on the senselessness of AVX10 as you're
> likely on the same receiving side as us.
>
> Thanks,
> Richard.
>
> > Thx,
> > Haochen
> >
> > >
> > >   Jakub
> >



-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-08 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers  wrote:
>
> Do you have any comments on the interaction of AVX10 with the
> micro-architecture levels defined in the ABI (and supported with
> glibc-hwcaps directories in glibc)?  Given that the levels are cumulative,
> should we take it that any future levels will be ones supporting 512-bit
> vector width for AVX10 (because x86-64-v4 requires the current AVX512F,
> AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors
> that only support 256-bit vector width will be considered to match the
> x86-64-v3 micro-architecture level but not any higher level?
This is actually something we really want to discuss in the community,
our proposal for x86-64-v5: AVX10.2-256(Implying AVX10.1-256) + APX.
One big reason is Intel E-core will only support AVX10 256-bit, if we
want to use x86-64-v5 across server and client, it's better to
default to 256-bit.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com



-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-08 Thread Joseph Myers
Do you have any comments on the interaction of AVX10 with the 
micro-architecture levels defined in the ABI (and supported with 
glibc-hwcaps directories in glibc)?  Given that the levels are cumulative, 
should we take it that any future levels will be ones supporting 512-bit 
vector width for AVX10 (because x86-64-v4 requires the current AVX512F, 
AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors 
that only support 256-bit vector width will be considered to match the 
x86-64-v3 micro-architecture level but not any higher level?
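
To make the question concrete, the cumulative reading can be sketched in C as below. This is an illustration under stated assumptions, not glibc's or GCC's actual level detection. The feature bits and the function `x86_64_level` are hypothetical, and x86-64-v3 is collapsed to a single AVX2 bit even though the real level requires more features. The x86-64-v4 mask uses the five AVX512 features named above.

```c
#include <stdint.h>

/* Hypothetical feature bits; real code would derive them from CPUID. */
#define FEAT_AVX2      (1u << 0)
#define FEAT_AVX512F   (1u << 1)
#define FEAT_AVX512BW  (1u << 2)
#define FEAT_AVX512CD  (1u << 3)
#define FEAT_AVX512DQ  (1u << 4)
#define FEAT_AVX512VL  (1u << 5)
#define FEAT_AVX10_256 (1u << 6)
#define FEAT_AVX10_512 (1u << 7)

/* Simplification: v3 is represented by AVX2 alone in this sketch. */
#define LEVEL_V3_MASK (FEAT_AVX2)
/* v4 is cumulative: everything in v3 plus the five AVX512 features. */
#define LEVEL_V4_MASK (LEVEL_V3_MASK | FEAT_AVX512F | FEAT_AVX512BW | \
                       FEAT_AVX512CD | FEAT_AVX512DQ | FEAT_AVX512VL)

/* Return the highest level whose full feature mask is present.
   Anything below v3 is reported as 2 for the purposes of this sketch. */
static int x86_64_level(uint32_t feats)
{
    if ((feats & LEVEL_V4_MASK) == LEVEL_V4_MASK)
        return 4;
    if ((feats & LEVEL_V3_MASK) == LEVEL_V3_MASK)
        return 3;
    return 2;
}
```

With these definitions, a CPU reporting only `FEAT_AVX2 | FEAT_AVX10_256` stops at level 3, because the AVX10 bits do not satisfy any of the AVX512 requirements in the v4 mask.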

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Intel AVX10.1 Compiler Design and Support

2023-08-08 Thread Richard Biener via Gcc-patches
On Tue, Aug 8, 2023 at 10:15 AM Jiang, Haochen via Gcc-patches
 wrote:
>
> Hi Jakub,
>
> > So, what does this imply for the current ISAs?
>
> AVX10 will imply AVX2 on the ISA level. And we suppose AVX10 is an
> independent ISA feature set. Although sharing the same instructions and
> encodings, AVX10 and AVX512 are conceptual independent features, which
> means they are orthogonal.
>
> > The expectations in lots of config/i386/* is that -mavx512f / TARGET_AVX512F
> > means 512 bit vector support is available and most of the various 
> > -mavx512XXX
> > options imply -mavx512f (and -mno-avx512f turns those off).  And if
> > -mavx512vl / TARGET_AVX512VL isn't available, tons of places just use
> > 512-bit EVEX instructions for 256-bit or 128-bit stuff (mostly to be able to
> > access [xy]mm16+).
>
> For AVX10, the 128/256/scalar version of the instructions are always there, 
> and
> also for [xy]mm16+. 512 version is "optional", which needs user to indicate 
> them
> in options. When 512 version is enabled, 128/256/scalar version is also 
> enabled,
> which is kind of reverse relation between the current AVX512F/AVX512VL.
>
> Since we take AVX10 and AVX512 are orthogonal, we will add OR logic for the 
> current
> pattern, which is shown in our AVX512DQ+VL sample patches.

Hmm, so it sounds like AVX10 is currently, at the 10.1 level, a way to specify
AVX512F and AVX512VL "differently", so wouldn't it make sense to make it
complement those only so one can use, say, -mavx10 -mno-avx512bf16 to disable
parts of the former AVX512 ISA one doesn't like to get code generated for?
-mavx10 would then enable all the existing sub-AVX512 ISAs?

> > Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they have
> > AVX512F CPUID even when the 512-bit vectors aren't present? What happens if
> > one mixes the -mavx10* options together with -mno-avx512vl or similar
> > options?  Will -mno-avx512f still imply -mno-avx512vl etc.?
>
> For the CPUID part, AVX10 and AVX512 have different emulation. Only Xeon 
> Server
> will have AVX512 related CPUIDs for backward compatibility. For GNR, it will 
> be
> AVX512F, AVX512VL, AVX512CD, AVX512BW, AVX512DQ, AVX512_IFMA, AVX512_VBMI,
> AVX512_VNNI, AVX512_BF16, AVX512_BITALG, AVX512_VPOPCNTDQ, AVX512_VBMI2,
> AVX512_FP16. Also, it will have AVX10 CPUIDs with 512 bit support set. Atom 
> Server and
> client will only have AVX10 CPUIDs with 256 bit support set.
>
> -mno-avx512f will still imply -mno-avx512vl.
>
> As we mentioned below, we don't recommend users to combine the AVX10 and 
> legacy
> AVX512 options. We understand that there will be different opinions on what 
> should
> compiler behave on some controversial option combinations.
>
> If there is someone mixes the options, the golden rule is that we are using 
> OR logic.
> Therefore, enabling either feature will turn on the shared instructions, no 
> matter the other
> feature is not mentioned or closed. That is why we are emitting warning for 
> some scenarios,
> which is also mentioned in the letter.

I'm refraining from commenting on the senselessness of AVX10 as you're
likely on the same receiving side as us.

Thanks,
Richard.

> Thx,
> Haochen
>
> >
> >   Jakub
>


RE: Intel AVX10.1 Compiler Design and Support

2023-08-08 Thread Jiang, Haochen via Gcc-patches
Hi Jakub,

> So, what does this imply for the current ISAs?

AVX10 will imply AVX2 on the ISA level, and we treat AVX10 as an
independent ISA feature set. Although they share the same instructions and
encodings, AVX10 and AVX512 are conceptually independent features, which
means they are orthogonal.

> The expectations in lots of config/i386/* is that -mavx512f / TARGET_AVX512F
> means 512 bit vector support is available and most of the various -mavx512XXX
> options imply -mavx512f (and -mno-avx512f turns those off).  And if
> -mavx512vl / TARGET_AVX512VL isn't available, tons of places just use
> 512-bit EVEX instructions for 256-bit or 128-bit stuff (mostly to be able to
> access [xy]mm16+).

For AVX10, the 128/256/scalar versions of the instructions are always there,
as are [xy]mm16+. The 512-bit version is "optional" and the user needs to
request it through options. When the 512-bit version is enabled, the
128/256/scalar versions are enabled as well, which is roughly the reverse of
the current AVX512F/AVX512VL relationship.

Since we treat AVX10 and AVX512 as orthogonal, we will add OR logic to the
current patterns, as shown in our AVX512DQ+VL sample patches.

> Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they have
> AVX512F CPUID even when the 512-bit vectors aren't present? What happens if
> one mixes the -mavx10* options together with -mno-avx512vl or similar
> options?  Will -mno-avx512f still imply -mno-avx512vl etc.?

For the CPUID part, AVX10 and AVX512 are enumerated differently. Only Xeon Server
will have the AVX512-related CPUID bits, for backward compatibility. For GNR, these
will be AVX512F, AVX512VL, AVX512CD, AVX512BW, AVX512DQ, AVX512_IFMA, AVX512_VBMI,
AVX512_VNNI, AVX512_BF16, AVX512_BITALG, AVX512_VPOPCNTDQ, AVX512_VBMI2 and
AVX512_FP16. It will also have the AVX10 CPUID bits with 512-bit support set.
Atom Server and client will only have the AVX10 CPUID bits with 256-bit support set.

-mno-avx512f will still imply -mno-avx512vl.

As we mentioned below, we don't recommend that users combine the AVX10 and legacy
AVX512 options. We understand that there will be different opinions on how the
compiler should behave for some controversial option combinations.

If someone does mix the options, the golden rule is that we use OR logic.
Therefore, enabling either feature will turn on the shared instructions,
regardless of whether the other feature is unspecified or disabled. That is why
we emit warnings in some scenarios, as also mentioned in the letter.
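
As a rough illustration of the OR rule, the option state can be modelled as a few booleans. This is a sketch only: `struct isa_opts`, `disable_avx512f` and `shared_256bit_evex_enabled` are hypothetical names, not GCC internals, and the `avx10_256` field merely stands in for whichever AVX10 256-bit option is eventually chosen.

```c
#include <stdbool.h>

/* Hypothetical model of the relevant option state. */
struct isa_opts {
    bool avx512f;
    bool avx512vl;
    bool avx10_256;  /* stands in for an AVX10 256-bit option */
};

/* -mno-avx512f still implies -mno-avx512vl; the AVX10 side is untouched
   because the two feature sets are treated as orthogonal. */
static void disable_avx512f(struct isa_opts *o)
{
    o->avx512f = false;
    o->avx512vl = false;
}

/* A 256-bit EVEX instruction shared by both feature sets is available
   when either side enables it: the "OR logic". */
static bool shared_256bit_evex_enabled(const struct isa_opts *o)
{
    return (o->avx512f && o->avx512vl) || o->avx10_256;
}
```

In this model, turning off AVX512F while AVX10 stays enabled leaves the shared 256-bit instructions available, which is the kind of combination the warnings are meant to flag.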

Thx,
Haochen

> 
>   Jakub



Re: Intel AVX10.1 Compiler Design and Support

2023-08-08 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 08, 2023 at 03:13:09PM +0800, Haochen Jiang via Gcc-patches wrote:
> We will send out our initial support of AVX10 and some sample patches in this
> mailing thread. And there will be more coming up afterwards. Therefore, we 
> would
> like to share our proposed AVX10 design in GCC.
> 
> Here is a quick introduction to AVX10:
>   - AVX10 is the first major new ISA since the introduction of AVX512 in 2013.
>   - Since the introduction of AVX10, we would like to establish a common,
> converged vector instruction set across all Intel architectures, including
> Xeon Server, Atom Server and Clients.
>   - The default maximum vector size for AVX10 will be 256 bit, while 512 bit 
> is
> optional.

So, what does this imply for the current ISAs?
The expectations in lots of config/i386/* is that -mavx512f / TARGET_AVX512F
means 512 bit vector support is available and most of the various -mavx512XXX
options imply -mavx512f (and -mno-avx512f turns those off).  And if
-mavx512vl / TARGET_AVX512VL isn't available, tons of places just use
512-bit EVEX instructions for 256-bit or 128-bit stuff (mostly to be able to
access [xy]mm16+).
Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they have
AVX512F CPUID even when the 512-bit vectors aren't present? What happens if
one mixes the -mavx10* options together with -mno-avx512vl or similar
options?  Will -mno-avx512f still imply -mno-avx512vl etc.?
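
For context, this is roughly how user code checks the legacy AVX512 CPUID bits at run time today, using the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins that GCC provides on x86. The helper name `has_avx512f_and_vl` is hypothetical, and the sketch says nothing about how AVX10 will be enumerated. On non-x86 builds it simply reports false.

```c
#include <stdbool.h>

/* Returns true when the running CPU reports both AVX512F and AVX512VL.
   Code written like this assumes the two bits travel together with
   512-bit support, which is the assumption being questioned above. */
static bool has_avx512f_and_vl(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();  /* populate the CPU model data before querying */
    return __builtin_cpu_supports("avx512f") &&
           __builtin_cpu_supports("avx512vl");
#else
    return false;          /* conservative answer on other targets */
#endif
}
```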

Jakub