Re: New x86-64 micro-architecture levels

2020-07-31 Thread Carlos O'Donell via Gcc
On 7/22/20 6:34 AM, Florian Weimer wrote:
> * Jan Beulich:
> 
>> On 21.07.2020 20:04, Florian Weimer wrote:
>>> * Premachandra Mallappa:
>>> 
>>>> [AMD Public Use]
>>>> 
>>>> Hi Florian,
>>>> 
>>>>> I'm including a proposal for the levels below.  I use single
>>>>> letters for them, but I expect that the concrete
>>>>> implementation of this proposal will use names like
>>>>> “x86-100”, “x86-101”, like in the glibc patch referenced
>>>>> above.  (But we can discuss other approaches.)
>>>> 
>>>> Personally I am not a big fan of this, for 2 reasons 1. uses
>>>> just x86 in name on x86_64 as well
>>> 
>>> That's deliberate, so that we can use the same x86-* names for
>>> 32-bit library selection (once we define matching
>>> micro-architecture levels there).
>> 
>> While indeed I did understand it to be deliberate, in the light of 
>> 64-bit only ISA extensions (like AMX, and I suspect we're going to 
>> see more) I nevertheless think Premachandra has a point here.
> 
> Let me explain how I ended up there.  Maybe I'm wrong.

I did a review of your analysis, and it is my opinion that your
conclusion is correct.

> Previously, I observed that it is difficult to set LD_PRELOAD and 
> LD_LIBRARY_PATH on combined x86-64/i386 systems, so that the right 
> libraries are loaded for both variants, and users aren't confused by 
> dynamic linker warning messages.  On some systems, it is possible to
> use dynamic string tokens ($LIB), but not all.

The case of LD_PRELOAD is the most difficult because it is a direct request
to the dynamic loader to load a particular library. If the library to be
loaded is an absolute path then you'll always get warning messages if you
need to execute child processes that inherited LD_PRELOAD for an architecture
that doesn't match the architecture to be executed.

The case of LD_LIBRARY_PATH is generally less troublesome because you are
adding search paths, and the library loading can be suppressed by other
mechanisms that include search path pruning.

It is also possible that $LIB does not match what is actually required
for the system to operate correctly and it depends on /etc/ld.so.conf
(and included files) for correctness (despite it being a cache, see
glibc bug 22359). This is an ISV problem that the ISV can solve.

> Eventually, it will be possible to add and restrict glibc-hwcaps 
> subdirectories by setting an environment variable.  The original
> patch series only contains ld.so command line options because I
> wanted to avoid a discussion about the precise mechanism for setting
> the environment variable (current glibc has two approaches).  But the
> desire to provide this functionality is there: for adding additional 
> glibc-hwcaps subdirectories to be searched first, and for
> restricting selection to a subset of the built-in
> (automatically-selected) subdirectories.

If you allow the addition of subdirectories, those subdirectories
can then be processed as directories are normally processed and we
can indeed avoid emitting an error message. The addition of directories
is not a direct request to the loader to load a specific shared object.

> I was worried that we would run into the same problem as with 
> LD_PRELOAD, where x86-64 and i386 binaries may have different 
> requirements.  I wanted to minimize the conflict by sharing the
> names (eventually, once we have 32-bit variants).

Right, this would make it easier to deploy from the ISV side.

> But thinking about this again, I'm not sure if my worry is
> warranted. The main selection criterion is still the library load
> path, and that is already provided by some different means (e.g.
> $LIB).  Within the library path, there is the glibc-hwcaps
> subdirectory, but since it is nested under a specific library path
> subdirectory (determined by the architecture), adding subdirectories
> to be searched which do not exist on the file system, or suppressing
> directories which would not be searched in the first place, is not a
> problem.  The situation is completely benign and would not warrant
> any error message from the dynamic loader.

I agree completely.
 
> If this analysis is correct, there is no reason to share the 
> subdirectory names between x86-64 and i386 binaries, and we can put
> “64” somewhere in the x86-64 strings.

We can choose not to share the paths. In fact it may make it easier to
explain to users that they are distinct.

In summary:

The conclusion is that x86-64 and i386 shared objects can use different
directories because they are just search paths, and such search paths
have different semantics from explicit load requests like LD_PRELOAD,
therefore they can be suppressed at runtime without the need to issue
an error or warning diagnostic.
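
The difference between a search path and an explicit load request can be modelled in a few lines. This is only a sketch of the idea, not how ld.so is implemented: the function name `find_library`, the soname `libfoo.so.1`, and the subdirectory names are assumptions made for illustration. Candidate glibc-hwcaps subdirectories that do not exist are skipped without any diagnostic, and the plain library directory acts as the fallback.

```python
import os
import tempfile


def find_library(lib_dir, soname, hwcaps_subdirs):
    """Return the first existing candidate for soname, or None.

    hwcaps_subdirs is ordered from most to least preferred. Missing
    subdirectories are simply skipped: they are search paths, not
    explicit load requests, so no warning is needed.
    """
    candidates = [
        os.path.join(lib_dir, "glibc-hwcaps", sub, soname)
        for sub in hwcaps_subdirs
    ]
    # The unoptimized copy in the library directory itself is the fallback.
    candidates.append(os.path.join(lib_dir, soname))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def demo():
    """Build a throwaway tree and search it (hypothetical layout)."""
    with tempfile.TemporaryDirectory() as root:
        lib64 = os.path.join(root, "lib64")
        v2_dir = os.path.join(lib64, "glibc-hwcaps", "x86-64-v2")
        os.makedirs(v2_dir)
        # One baseline copy and one optimized copy; no x86-64-v3 directory.
        for directory in (lib64, v2_dir):
            open(os.path.join(directory, "libfoo.so.1"), "w").close()
        found = find_library(lib64, "libfoo.so.1", ["x86-64-v3", "x86-64-v2"])
        return os.path.relpath(found, root)
```

Calling `demo()` searches a tree in which the most preferred subdirectory is absent; the search moves on to the next candidate instead of failing, which is the behaviour the summary above relies on.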

Notes:
- We may wish to have an LD_DEBUG setting that helps catch issues
  with various paths, but that's a diagnostic setting whose semantics
  we can iron out as we discover developers making bad choices.

-- 
Cheers,
Carlos.



Re: New x86-64 micro-architecture levels

2020-07-31 Thread Carlos O'Donell via Gcc
On 7/22/20 5:26 AM, Richard Biener via Libc-alpha wrote:
> So for the bike-shedding I indeed think x86-10{0,1,2,3}
> or x86-{A,B,C,..}, eventually duplicating as x86_64- as
> suggested by Jan is better than x86-2014 or x86-avx2.

Agreed. If we really want to be clear, call it a "level"
or something else that has a direct English meaning
e.g. x86-level-101. This makes it unambiguously clear
that this is some kind of step at which some kind of
features are enabled.

-- 
Cheers,
Carlos.



Re: New x86-64 micro-architecture levels

2020-07-28 Thread Florian Weimer via Gcc
* Premachandra Mallappa:

> [AMD Public Use]

>>> Also we would also like to have dynamic loader support for "zen" / 
>>> "zen2" as a version of "Level D" and takes preference over Level D, 
>>> which may have super-optimized libraries from AMD or other vendors.
>
>> *That* shouldn't be too hard to implement if we can nail down the selection 
>> criteria.  Let's call this Zen-specific Level C x86-zen-avx2 for the sake of 
>> exposition.
>
> Some way of specifying a superset of "level C" , that "C" will capture fully.
>
> Zen/zen2 takes precedence over Level C, but not Level D, but falls
> back to "Level C" or "x86-avx2" but not "x86-avx".
>
> I think it is better to run a x86-zen on a x86-avx2 or x86-avx
> compared to running on a base x86_64 config.

We discussed this off-list for a bit and concluded that we do not want
to address this as part of this proposal.

I went ahead and created a merge request against the x86-64 psABI
supplement:

  

I used x86-64-v2 etc. as the level names, picking up the suggestion to
use x86-64 there.  I think we don't need to share names with 32-bit (if
that ever happens), as explained here:

  

There are only three new levels (level B was merged into level C).

I tried to make precise the meaning of the levels by matching them to
CPU features, based on their CPUID detection logic.  It's somewhat
complicated, but I think it's within reason for the task at hand.
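
As a rough illustration of matching levels to CPU features, the sketch below picks the highest level whose requirements, and those of every lower level, are all met. The feature lists are written from memory for illustration only and should be treated as assumptions; the psABI supplement is the authoritative definition. The names `LEVEL_REQUIREMENTS` and `highest_level` are hypothetical.

```python
# Illustrative feature sets per level, lowest level first.
# Assumption: check the psABI supplement before relying on these lists.
LEVEL_REQUIREMENTS = [
    ("x86-64-v2", {"CMPXCHG16B", "LAHF-SAHF", "POPCNT", "SSE3",
                   "SSE4_1", "SSE4_2", "SSSE3"}),
    ("x86-64-v3", {"AVX", "AVX2", "BMI1", "BMI2", "F16C", "FMA",
                   "LZCNT", "MOVBE", "OSXSAVE"}),
    ("x86-64-v4", {"AVX512F", "AVX512BW", "AVX512CD", "AVX512DQ",
                   "AVX512VL"}),
]


def highest_level(features):
    """Return the highest level supported by the given feature set.

    Levels are cumulative: a level only counts if every lower level is
    satisfied too, so the scan stops at the first unmet requirement.
    """
    level = "x86-64"  # baseline, always available on x86-64
    for name, required in LEVEL_REQUIREMENTS:
        if not required <= features:
            break
        level = name
    return level
```

Because the scan stops at the first gap, a CPU that reports AVX-512 features but lacks something from a lower level is still classified at the lower level.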

Thanks,
Florian



Re: New x86-64 micro-architecture levels

2020-07-23 Thread H.J. Lu via Gcc
On Thu, Jul 23, 2020 at 5:44 AM Michael Matz  wrote:
>
> Hello,
>
> On Wed, 22 Jul 2020, Mallappa, Premachandra wrote:
>
> > > That's deliberate, so that we can use the same x86-* names for 32-bit 
> > > library selection (once we define matching micro-architecture levels 
> > > there).
> >
> > Understood.
> >
> > > If numbers are out, what should we use instead?
> > > x86-sse4, x86-avx2, x86-avx512?  Would that work?
> >
> > Yes please, I think we have to choose somewhere, above would be more
> > descriptive
>
> And IMHO that's exactly the problem.  These names should _not_ be
> descriptive, because any description invokes a wrong feeling of precision.
> E.g. what Florian already mentioned: sse4 - does it imply 4.1 and 4.2, or
> avx512: what of F, CD, ER, PF, VL, DQ, BW, IFMA, VBMI, 4VNNIW, 4FMAPS,
> VPOPCNTDQ, VNNI, VBMI2, BITALG, VP2INTERSECT, GFNI, VPCLMULQDQ, VAES does
> that one imply (rhetorical question, list shown just to make silliness
> explicit).
>
> Regarding precision: I think we should rule out any mathematically correct
> scheme, e.g. one in which every ISA subset gets an index and the directory
> name contains a hexnumber constructed by bits with the corresponding index
> being one or zero, depending on if the ISA subset is required or not: I
> think we're currently at about 40 ISA subsets, and hence would end up in
> names like x86-32001afff and x86-22001afef (the latter missing two subsets
> compared to the former).
>
> No, IMHO the non-vendor names should be non-descript, and either be
> numbers or characters, of which I would vote for characters, i.e. A, B, C.
> Obviously, as already mentioned here, the mapping of level to feature set
> needs to be described in documentation somewhere, and should be maintained
> by either glibc, glibc/gcc/llvm or psABI people.
>
> I don't have many suggestions about vendor names, be they ISA-subset
> market names, or core names or company names.  I will just note that using
> such names has led to an explosion in the number of names without very good
> separation between them.  As long as we're only talking about -march=
> cmdline flags that may have been okay, if silly, but under this proposal
> every such name is potentially a subdirectory containing many shared
> libraries, and one that potentially needs to be searched at every library
> lookup in the dynamic linker; so it's prudent to limit the size of this
> name set as well.
>
> As for which subsets should or shouldn't be required in which level: I
> think the current suggestions all sound good, ultimately it's always going
> to be some compromise.
>

We can have x86-64, x86-64-v1, x86-64-v2, ..  These pseudo names
should be clearly documented.


-- 
H.J.


RE: New x86-64 micro-architecture levels

2020-07-23 Thread Michael Matz
Hello,

On Wed, 22 Jul 2020, Mallappa, Premachandra wrote:

> > That's deliberate, so that we can use the same x86-* names for 32-bit 
> > library selection (once we define matching micro-architecture levels there).
> 
> Understood.
> 
> > If numbers are out, what should we use instead?
> > x86-sse4, x86-avx2, x86-avx512?  Would that work?
> 
> Yes please, I think we have to choose somewhere, above would be more 
> descriptive

And IMHO that's exactly the problem.  These names should _not_ be 
descriptive, because any description invokes a wrong feeling of precision.  
E.g. what Florian already mentioned: sse4 - does it imply 4.1 and 4.2, or 
avx512: what of F, CD, ER, PF, VL, DQ, BW, IFMA, VBMI, 4VNNIW, 4FMAPS, 
VPOPCNTDQ, VNNI, VBMI2, BITALG, VP2INTERSECT, GFNI, VPCLMULQDQ, VAES does 
that one imply (rhetorical question, list shown just to make silliness 
explicit).

Regarding precision: I think we should rule out any mathematically correct 
scheme, e.g. one in which every ISA subset gets an index and the directory 
name contains a hexnumber constructed by bits with the corresponding index 
being one or zero, depending on if the ISA subset is required or not: I 
think we're currently at about 40 ISA subsets, and hence would end up in 
names like x86-32001afff and x86-22001afef (the latter missing two subsets 
compared to the former).
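
To make the bitmask scheme concrete, here is a small sketch of how such names would be built and compared. The helper names `subset_name` and `missing_subsets` are hypothetical, and the assignment of index numbers to ISA subsets is left open, as it is in the text above.

```python
def subset_name(required_indices):
    """Build a directory name from the indices of the required ISA subsets."""
    mask = 0
    for index in required_indices:
        mask |= 1 << index  # one bit per required subset
    return "x86-" + format(mask, "x")


def missing_subsets(name_a, name_b):
    """List the subset indices required by name_a but not by name_b."""
    a = int(name_a.split("-", 1)[1], 16)
    b = int(name_b.split("-", 1)[1], 16)
    diff = a & ~b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]
```

Working out which subsets separate two such names takes a decoding step like `missing_subsets`, which is the kind of precision the paragraph above argues a directory name should not have to carry.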

No, IMHO the non-vendor names should be non-descript, and either be 
numbers or characters, of which I would vote for characters, i.e. A, B, C.  
Obviously, as already mentioned here, the mapping of level to feature set 
needs to be described in documentation somewhere, and should be maintained 
by either glibc, glibc/gcc/llvm or psABI people.

I don't have many suggestions about vendor names, be they ISA-subset 
market names, or core names or company names.  I will just note that using 
such names has led to an explosion in the number of names without very good 
separation between them.  As long as we're only talking about -march= 
cmdline flags that may have been okay, if silly, but under this proposal 
every such name is potentially a subdirectory containing many shared 
libraries, and one that potentially needs to be searched at every library 
lookup in the dynamic linker; so it's prudent to limit the size of this 
name set as well.

As for which subsets should or shouldn't be required in which level: I 
think the current suggestions all sound good, ultimately it's always going 
to be some compromise.


Ciao,
Michael.


RE: New x86-64 micro-architecture levels

2020-07-22 Thread Mallappa, Premachandra
[AMD Public Use]


> That's deliberate, so that we can use the same x86-* names for 32-bit library 
> selection (once we define matching micro-architecture levels there).

Understood.

> If numbers are out, what should we use instead?
> x86-sse4, x86-avx2, x86-avx512?  Would that work?

Yes please, I think we have to choose somewhere, above would be more descriptive

> Let's merge Level B into level C then?

I would vote for this.

>> Also we would also like to have dynamic loader support for "zen" / 
>> "zen2" as a version of "Level D" and takes preference over Level D, 
>> which may have super-optimized libraries from AMD or other vendors.

> *That* shouldn't be too hard to implement if we can nail down the selection 
> criteria.  Let's call this Zen-specific Level C x86-zen-avx2 for the sake of 
> exposition.

Some way of specifying a superset of "level C" , that "C" will capture fully.

Zen/zen2 takes precedence over Level C, but not Level D, but falls back to 
"Level C" or "x86-avx2" but not "x86-avx".

I think it is better to run a x86-zen on a x86-avx2 or x86-avx compared to 
running on a base x86_64 config.

> With the levels I proposed, these aspects are covered.  But if we start to 
> create vendor-specific forks in the feature progression, things get 
> complicated.

I am not strictly proposing that OS vendors should create/maintain this (it would be 
nice if they did), but rather support for a cached load via a system-wide config. This 
directory may/will contain a subset of system libs.

> Do you think we need to figure this out in this iteration?  If yes, then I 
> really need a semi-formal description of the selection criteria for this 
> x86-zen-avx2 directory, so that I can pass it along with my psABI proposal.

Preference level (decreasing order) (I can only speak for AMD, others please 
pitch in)
- system wide config to override (in this case x86-zen)
- x86-avx2
- x86-sse4 (or avx, based on how we name and merge Level B)
- default x86_64
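
The preference list above could be expressed roughly as follows. This is a sketch of the ordering only; the function name `search_order`, the `override` parameter, and the way the system-wide config would be read are all assumptions, and the directory names are the placeholder names used in this message.

```python
# Generic levels from most to least demanding (placeholder names).
GENERIC_LEVELS = ["x86-avx2", "x86-sse4"]


def search_order(supported, override=None):
    """Return directory names in decreasing order of preference.

    supported: generic levels the running CPU can execute.
    override:  optional vendor directory from a system-wide config,
               e.g. "x86-zen"; it is tried before any generic level.
    """
    order = []
    if override is not None:
        order.append(override)
    order.extend(level for level in GENERIC_LEVELS if level in supported)
    order.append("x86_64")  # the default build is always the last resort
    return order
```

For example, `search_order({"x86-avx2", "x86-sse4"}, override="x86-zen")` puts the vendor directory first and the default last, while a CPU without AVX2 support simply drops that entry from the list.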


Re: New x86-64 micro-architecture levels

2020-07-22 Thread H.J. Lu via Gcc
On Wed, Jul 22, 2020 at 6:50 AM Richard Biener via Libc-alpha
 wrote:
>
> On Wed, Jul 22, 2020 at 12:16 PM Florian Weimer  wrote:
> >
> > * Richard Biener:
> >
> > > On Wed, Jul 22, 2020 at 10:58 AM Florian Weimer via Gcc  
> > > wrote:
> > >>
> > >> * Dongsheng Song:
> > >>
> > >> > I fully agree these names (100/101, A/B/C/D) are not very intuitive, I
> > >> > recommend using isa tags by year (e.g. x64_2010, x64_2014) like the
> > >> > python's platform tags (e.g. manylinux2010, manylinux2014).
> > >>
> > >> I started out with a year number, but that was before there was Level A.
> > >> Too many new CPUs only fall under level A unfortunately because they do
> > >> not even have AVX.  This even applies to some new server CPU designs
> > >> released this year.
> > >>
> > >> I'm concerned that putting a year into the level name suggests that
> > >> everything main-stream released after that year supports that level, and
> > >> that's not true.  I think for manylinux, it's different, and it actually
> > >> works out there.  No one is building a new GNU/Linux distribution that
> > >> is based on glibc 2.12 today, for example.  But not so much for x86
> > >> CPUs.
> > >>
> > >> If you think my worry is unfounded, then a year-based approach sounds
> > >> compelling.
> > >
> > > I think the main question is whether those levels are supposed to be
> > > an implementation detail hidden from most software developer or
> > > if people are expected to make conscious decisions between
> > > -march=x86-100 and -march=x86-101.  Implementation detail
> > > for system integrators, that is.
> >
> > Anyone who wants to optimize their software for something that's more
> > current than what was available in 2003 has to think about this in some
> > form.
> >
> > With these levels, I hope to provide a pre-packaged set of choices, with
> > a consistent user interface, in the sense that -march= options and file
> > system locations match.  Programmers will definitely encounter these
> > strings, and they need to know what they mean for their users.  We need
> > to provide them with the required information so that they can make
> > decisions based on their knowledge of their user base.  But the ultimate
> > decision really has to be a programmer choice.
> >
> > I'm not sure if GCC documentation or glibc documentation would be the
> > right place for this.  An online resource that can be linked to directly
> > seems more appropriate.
> >
> > Apart from that, there is the more limited audience of general purpose
> > distribution builders.  I expect they will pick one of these levels to
> > build all the distribution binaries, unless they want to be stuck in
> > 2003.  But as long as they do not choose the highest level defined,
> > programmers might still want to provide optimized library builds for
> > run-time selection, and then they need the same guidance as before.
> >
> > > If it's not merely an implementation detail then names without
> > > any chance of providing false hints (x86-2014 - oh, it will
> > > run fine on the CPU I bought in 2015; or, x86-avx2 - ah, of
> > > course I want avx2) is better.  But this also means this feature
> > > should come with extensive documentation on how it is
> > > supposed to be used.  For example we might suggest ISVs
> > > provide binaries for all architecture levels or use IFUNCs
> > > or other runtime CPU selection capabilities.
> >
> > I think we should document the mechanism as best as we can, and provide
> > intended use cases.  We shouldn't go as far as to tell programmers what
> > library versions they must build, except that they should always include
> > a fallback version if no optimized library can be selected.
> >
> > Describing the interactions with IFUNCs also makes sense.
> >
> > But I think we should not go overboard with this.  Historically, we've
> > done not such a great job with documenting toolchain features, I know,
> > and we should do better now.  I will try to write something helpful, but
> > it should still match the relative importance of this feature.
> >
> > > It's also required to provide a (extensive?) list of SKUs that fall
> > > into the respective categories (probably up to CPU vendors to amend
> > > those).
> >
> > I'm afraid, but SKUs are not very useful in this context.
> > Virtualization can disable features (e.g., some cloud providers
> > advertise they use certain SKUs, but some features are not available to
> > guests), and firmware updates have done so as well.  I think the only
> > way is to document our selection criteria, and encourage CPU vendors to
> > enhance their SKU browsers so that you can search by the (lack of)
> > support for certain CPU features.
> >
> > The selection criteria I suggested should not be affected by firmware
> > and microcode updates at least (I took that into consideration), but
> > it's just not possible to achieve virtualization and kernel version
> > independence, given that some features based on which we want to make
> > 

Re: New x86-64 micro-architecture levels

2020-07-22 Thread Richard Biener via Gcc
On Wed, Jul 22, 2020 at 12:16 PM Florian Weimer  wrote:
>
> * Richard Biener:
>
> > On Wed, Jul 22, 2020 at 10:58 AM Florian Weimer via Gcc  
> > wrote:
> >>
> >> * Dongsheng Song:
> >>
> >> > I fully agree these names (100/101, A/B/C/D) are not very intuitive, I
> >> > recommend using isa tags by year (e.g. x64_2010, x64_2014) like the
> >> > python's platform tags (e.g. manylinux2010, manylinux2014).
> >>
> >> I started out with a year number, but that was before there was Level A.
> >> Too many new CPUs only fall under level A unfortunately because they do
> >> not even have AVX.  This even applies to some new server CPU designs
> >> released this year.
> >>
> >> I'm concerned that putting a year into the level name suggests that
> >> everything main-stream released after that year supports that level, and
> >> that's not true.  I think for manylinux, it's different, and it actually
> >> works out there.  No one is building a new GNU/Linux distribution that
> >> is based on glibc 2.12 today, for example.  But not so much for x86
> >> CPUs.
> >>
> >> If you think my worry is unfounded, then a year-based approach sounds
> >> compelling.
> >
> > I think the main question is whether those levels are supposed to be
> > an implementation detail hidden from most software developer or
> > if people are expected to make conscious decisions between
> > -march=x86-100 and -march=x86-101.  Implementation detail
> > for system integrators, that is.
>
> Anyone who wants to optimize their software for something that's more
> current than what was available in 2003 has to think about this in some
> form.
>
> With these levels, I hope to provide a pre-packaged set of choices, with
> a consistent user interface, in the sense that -march= options and file
> system locations match.  Programmers will definitely encounter these
> strings, and they need to know what they mean for their users.  We need
> to provide them with the required information so that they can make
> decisions based on their knowledge of their user base.  But the ultimate
> decision really has to be a programmer choice.
>
> I'm not sure if GCC documentation or glibc documentation would be the
> right place for this.  An online resource that can be linked to directly
> seems more appropriate.
>
> Apart from that, there is the more limited audience of general purpose
> distribution builders.  I expect they will pick one of these levels to
> build all the distribution binaries, unless they want to be stuck in
> 2003.  But as long as they do not choose the highest level defined,
> programmers might still want to provide optimized library builds for
> run-time selection, and then they need the same guidance as before.
>
> > If it's not merely an implementation detail then names without
> > any chance of providing false hints (x86-2014 - oh, it will
> > run fine on the CPU I bought in 2015; or, x86-avx2 - ah, of
> > course I want avx2) is better.  But this also means this feature
> > should come with extensive documentation on how it is
> > supposed to be used.  For example we might suggest ISVs
> > provide binaries for all architecture levels or use IFUNCs
> > or other runtime CPU selection capabilities.
>
> I think we should document the mechanism as best as we can, and provide
> intended use cases.  We shouldn't go as far as to tell programmers what
> library versions they must build, except that they should always include
> a fallback version if no optimized library can be selected.
>
> Describing the interactions with IFUNCs also makes sense.
>
> But I think we should not go overboard with this.  Historically, we've
> done not such a great job with documenting toolchain features, I know,
> and we should do better now.  I will try to write something helpful, but
> it should still match the relative importance of this feature.
>
> > It's also required to provide a (extensive?) list of SKUs that fall
> > into the respective categories (probably up to CPU vendors to amend
> > those).
>
> I'm afraid, but SKUs are not very useful in this context.
> Virtualization can disable features (e.g., some cloud providers
> advertise they use certain SKUs, but some features are not available to
> guests), and firmware updates have done so as well.  I think the only
> way is to document our selection criteria, and encourage CPU vendors to
> enhance their SKU browsers so that you can search by the (lack of)
> support for certain CPU features.
>
> The selection criteria I suggested should not be affected by firmware
> and microcode updates at least (I took that into consideration), but
> it's just not possible to achieve virtualization and kernel version
> independence, given that some features based on which we want to make
> library selections demand kernel and hypervisor support.
>
> > Since this is a feature crossing multiple projects - at least
> > glibc and GCC - sharing the source of said documentation
> > would be important.
>
> Technically, the GCC web site would work for 

Re: New x86-64 micro-architecture levels

2020-07-22 Thread Jan Beulich
On 22.07.2020 12:34, Florian Weimer wrote:
> The remaining issue is the - vs _ issue.  I think GCC currently uses
> “x86-64” in places that are not part of identifiers or target triplets.
> Richard mentioned “x86_64-” as a potential choice.  Would it be too
> awkward to have “-march=x86_64-…”?

Personally I'm advocating for avoiding underscores whenever dashes
can also be used, and whenever they're not needed to distinguish
themselves from dashes (like in target triplets). But this doesn't
make their use "awkward" here of course - it's just my personal
view on it. And maybe, despite the mail was sent _to_ just me, it
was really me you meant to ask ...

Jan


Re: New x86-64 micro-architecture levels

2020-07-22 Thread Florian Weimer via Gcc
* Jan Beulich:

> On 21.07.2020 20:04, Florian Weimer wrote:
>> * Premachandra Mallappa:
>> 
>>> [AMD Public Use]
>>>
>>> Hi Florian,
>>>
>>>> I'm including a proposal for the levels below.  I use single letters for
>>>> them, but I expect that the concrete implementation of this proposal will
>>>> use names like “x86-100”, “x86-101”, like in the glibc patch referenced
>>>> above.  (But we can discuss other approaches.)
>>>
>>> Personally I am not a big fan of this, for 2 reasons 
>>> 1. uses just x86 in name on x86_64 as well
>> 
>> That's deliberate, so that we can use the same x86-* names for 32-bit
>> library selection (once we define matching micro-architecture levels
>> there).
>
> While indeed I did understand it to be deliberate, in the light of
> 64-bit only ISA extensions (like AMX, and I suspect we're going to
> see more) I nevertheless think Premachandra has a point here.

Let me explain how I ended up there.  Maybe I'm wrong.

Previously, I observed that it is difficult to set LD_PRELOAD and
LD_LIBRARY_PATH on combined x86-64/i386 systems, so that the right
libraries are loaded for both variants, and users aren't confused by
dynamic linker warning messages.  On some systems, it is possible to use
dynamic string tokens ($LIB), but not all.

Eventually, it will be possible to add and restrict glibc-hwcaps
subdirectories by setting an environment variable.  The original patch
series only contains ld.so command line options because I wanted to
avoid a discussion about the precise mechanism for setting the
environment variable (current glibc has two approaches).  But the desire
to provide this functionality is there: for adding additional
glibc-hwcaps subdirectories to be searched first, and for restricting
selection to a subset of the built-in (automatically-selected)
subdirectories.

I was worried that we would run into the same problem as with
LD_PRELOAD, where x86-64 and i386 binaries may have different
requirements.  I wanted to minimize the conflict by sharing the names
(eventually, once we have 32-bit variants).

But thinking about this again, I'm not sure if my worry is warranted.
The main selection criterion is still the library load path, and that is
already provided by some different means (e.g. $LIB).  Within the
library path, there is the glibc-hwcaps subdirectory, but since it is
nested under a specific library path subdirectory (determined by the
architecture), adding subdirectories to be searched which do not exist
on the file system, or suppressing directories which would not be
searched in the first place, is not a problem.  The situation is
completely benign and would not warrant any error message from the
dynamic loader.

If this analysis is correct, there is no reason to share the
subdirectory names between x86-64 and i386 binaries, and we can put “64”
somewhere in the x86-64 strings.

The remaining issue is the - vs _ issue.  I think GCC currently uses
“x86-64” in places that are not part of identifiers or target triplets.
Richard mentioned “x86_64-” as a potential choice.  Would it be too
awkward to have “-march=x86_64-…”?

Thanks,
Florian



Re: New x86-64 micro-architecture levels

2020-07-22 Thread Florian Weimer via Gcc
* Richard Biener:

> On Wed, Jul 22, 2020 at 10:58 AM Florian Weimer via Gcc  
> wrote:
>>
>> * Dongsheng Song:
>>
>> > I fully agree these names (100/101, A/B/C/D) are not very intuitive, I
>> > recommend using isa tags by year (e.g. x64_2010, x64_2014) like the
>> > python's platform tags (e.g. manylinux2010, manylinux2014).
>>
>> I started out with a year number, but that was before there was Level A.
>> Too many new CPUs only fall under level A unfortunately because they do
>> not even have AVX.  This even applies to some new server CPU designs
>> released this year.
>>
>> I'm concerned that putting a year into the level name suggests that
>> everything main-stream released after that year supports that level, and
>> that's not true.  I think for manylinux, it's different, and it actually
>> works out there.  No one is building a new GNU/Linux distribution that
>> is based on glibc 2.12 today, for example.  But not so much for x86
>> CPUs.
>>
>> If you think my worry is unfounded, then a year-based approach sounds
>> compelling.
>
> I think the main question is whether those levels are supposed to be
> an implementation detail hidden from most software developer or
> if people are expected to make conscious decisions between
> -march=x86-100 and -march=x86-101.  Implementation detail
> for system integrators, that is.

Anyone who wants to optimize their software for something that's more
current than what was available in 2003 has to think about this in some
form.

With these levels, I hope to provide a pre-packaged set of choices, with
a consistent user interface, in the sense that -march= options and file
system locations match.  Programmers will definitely encounter these
strings, and they need to know what they mean for their users.  We need
to provide them with the required information so that they can make
decisions based on their knowledge of their user base.  But the ultimate
decision really has to be a programmer choice.

I'm not sure if GCC documentation or glibc documentation would be the
right place for this.  An online resource that can be linked to directly
seems more appropriate.

Apart from that, there is the more limited audience of general purpose
distribution builders.  I expect they will pick one of these levels to
build all the distribution binaries, unless they want to be stuck in
2003.  But as long as they do not choose the highest level defined,
programmers might still want to provide optimized library builds for
run-time selection, and then they need the same guidance as before.

> If it's not merely an implementation detail then names without
> any chance of providing false hints (x86-2014 - oh, it will
> run fine on the CPU I bought in 2015; or, x86-avx2 - ah, of
> course I want avx2) is better.  But this also means this feature
> should come with extensive documentation on how it is
> supposed to be used.  For example we might suggest ISVs
> provide binaries for all architecture levels or use IFUNCs
> or other runtime CPU selection capabilities.

I think we should document the mechanism as best as we can, and provide
intended use cases.  We shouldn't go as far as to tell programmers what
library versions they must build, except that they should always include
a fallback version if no optimized library can be selected.

Describing the interactions with IFUNCs also makes sense.

But I think we should not go overboard with this.  Historically, we've
not done such a great job with documenting toolchain features, I know,
and we should do better now.  I will try to write something helpful, but
it should still match the relative importance of this feature.

> It's also required to provide a (extensive?) list of SKUs that fall
> into the respective categories (probably up to CPU vendors to amend
> those).

I'm afraid SKUs are not very useful in this context.
Virtualization can disable features (e.g., some cloud providers
advertise they use certain SKUs, but some features are not available to
guests), and firmware updates have done so as well.  I think the only
way is to document our selection criteria, and encourage CPU vendors to
enhance their SKU browsers so that you can search by the (lack of)
support for certain CPU features.

The selection criteria I suggested should not be affected by firmware
and microcode updates at least (I took that into consideration), but
it's just not possible to achieve virtualization and kernel version
independence, given that some features based on which we want to make
library selections demand kernel and hypervisor support.

> Since this is a feature crossing multiple projects - at least
> glibc and GCC - sharing the source of said documentation
> would be important.

Technically, the GCC web site would work for me.  It's not a wiki.  It's
not CVS.  We can update it outside of release cycle.  We are not forced
to use the GFDL with Invariant Sections.  It doesn't end up in our
product documentation, where it would be confusing if it discusses

Re: New x86-64 micro-architecture levels

2020-07-22 Thread Richard Biener via Gcc
On Wed, Jul 22, 2020 at 10:58 AM Florian Weimer via Gcc  wrote:
>
> * Dongsheng Song:
>
> > I fully agree these names (100/101, A/B/C/D) are not very intuitive, I
> > recommend using isa tags by year (e.g. x64_2010, x64_2014) like the
> > python's platform tags (e.g. manylinux2010, manylinux2014).
>
> I started out with a year number, but that was before there was Level A.
> Too many new CPUs only fall under level A unfortunately because they do
> not even have AVX.  This even applies to some new server CPU designs
> released this year.
>
> I'm concerned that putting a year into the level name suggests that
> everything main-stream released after that year supports that level, and
> that's not true.  I think for manylinux, it's different, and it actually
> works out there.  No one is building a new GNU/Linux distribution that
> is based on glibc 2.12 today, for example.  But not so much for x86
> CPUs.
>
> If you think my worry is unfounded, then a year-based approach sounds
> compelling.

I think the main question is whether those levels are supposed to be
an implementation detail hidden from most software developers or
if people are expected to make conscious decisions between
-march=x86-100 and -march=x86-101.  Implementation detail
for system integrators, that is.

If it's not merely an implementation detail then names without
any chance of providing false hints (x86-2014 - oh, it will
run fine on the CPU I bought in 2015; or, x86-avx2 - ah, of
course I want avx2) are better.  But this also means this feature
should come with extensive documentation on how it is
supposed to be used.  For example we might suggest ISVs
provide binaries for all architecture levels or use IFUNCs
or other runtime CPU selection capabilities.  It's also required
to provide a (extensive?) list of SKUs that fall into the respective
categories (probably up to CPU vendors to amend those).
Since this is a feature crossing multiple projects - at least
glibc and GCC - sharing the source of said documentation
would be important.

So for the bike-shedding I indeed think x86-10{0,1,2,3}
or x86-{A,B,C,..}, eventually duplicating as x86_64- as
suggested by Jan is better than x86-2014 or x86-avx2.

Richard.

> Thanks,
> Florian
>


Re: New x86-64 micro-architecture levels

2020-07-22 Thread Florian Weimer via Gcc
* Dongsheng Song:

> I fully agree these names (100/101, A/B/C/D) are not very intuitive, I
> recommend using isa tags by year (e.g. x64_2010, x64_2014) like the
> python's platform tags (e.g. manylinux2010, manylinux2014).

I started out with a year number, but that was before there was Level A.
Too many new CPUs only fall under level A unfortunately because they do
not even have AVX.  This even applies to some new server CPU designs
released this year.

I'm concerned that putting a year into the level name suggests that
everything main-stream released after that year supports that level, and
that's not true.  I think for manylinux, it's different, and it actually
works out there.  No one is building a new GNU/Linux distribution that
is based on glibc 2.12 today, for example.  But not so much for x86
CPUs.

If you think my worry is unfounded, then a year-based approach sounds
compelling.

Thanks,
Florian



Re: New x86-64 micro-architecture levels

2020-07-22 Thread Jan Beulich
On 21.07.2020 20:04, Florian Weimer wrote:
> * Premachandra Mallappa:
> 
>> [AMD Public Use]
>>
>> Hi Florian,
>>
>>> I'm including a proposal for the levels below.  I use single letters for 
>>> them, but I expect that the concrete implementation of this proposal will 
>>> use 
>>> names like “x86-100”, “x86-101”, like in the glibc patch referenced above.  
>>> (But we can discuss other approaches.)
>>
>> Personally I am not a big fan of this, for 2 reasons 
>> 1. uses just x86 in name on x86_64 as well
> 
> That's deliberate, so that we can use the same x86-* names for 32-bit
> library selection (once we define matching micro-architecture levels
> there).

While indeed I did understand it to be deliberate, in the light of
64-bit only ISA extensions (like AMX, and I suspect we're going to
see more) I nevertheless think Premachandra has a point here.

Jan


Re: New x86-64 micro-architecture levels

2020-07-21 Thread Dongsheng Song via Gcc
I fully agree these names (100/101, A/B/C/D) are not very intuitive, I
recommend using isa tags by year (e.g. x64_2010, x64_2014) like the
python's platform tags (e.g. manylinux2010, manylinux2014).


Re: New x86-64 micro-architecture levels

2020-07-21 Thread Florian Weimer via Gcc
* Premachandra Mallappa:

> [AMD Public Use]
>
> Hi Florian,
>
>> I'm including a proposal for the levels below.  I use single letters for 
>> them, but I expect that the concrete implementation of this proposal will 
>> use 
>> names like “x86-100”, “x86-101”, like in the glibc patch referenced above.  
>> (But we can discuss other approaches.)
>
> Personally I am not a big fan of this, for 2 reasons 
> 1. uses just x86 in name on x86_64 as well

That's deliberate, so that we can use the same x86-* names for 32-bit
library selection (once we define matching micro-architecture levels
there).

GCC has -m32 -march=x86-64 for K8 without 3DNow! (essentially the shared
x86-64/EM64T baseline), but I find this a bit confusing.

> 2. 100/101 not very intuitive

Any suggestions?  The advantage is that these numbers show a strong
preference ordering.  They also do not make false suggestions about
feature sets: if we named Level C "x86-avx2", it would still be wrong for
glibc to load libraries found in that directory just because a system has
AVX2 support, because the libraries might also need FMA, based on the
Level C definition.  On the GCC side, it avoids a confusion between -mavx2
and -march=x86-avx2.
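
To illustrate the point about cumulative levels, here is a minimal
sketch in C.  The feature bits, the abbreviated per-level requirement
lists, and the name highest_level are made up for this example; this is
not the glibc implementation, only a model of the ordering argument.

```c
#include <stddef.h>

/* Hypothetical feature bits, for illustration only. */
enum {
    FEAT_SSE4_2  = 1u << 0,
    FEAT_POPCNT  = 1u << 1,
    FEAT_AVX     = 1u << 2,
    FEAT_AVX2    = 1u << 3,
    FEAT_FMA     = 1u << 4,
    FEAT_AVX512F = 1u << 5
};

struct level {
    const char *name;
    unsigned required;   /* features this level adds on top of the previous one */
};

/* Ordered from lowest to highest.  The lists are abbreviated assumptions. */
static const struct level levels[] = {
    { "A", FEAT_SSE4_2 | FEAT_POPCNT },
    { "B", FEAT_AVX },
    { "C", FEAT_AVX2 | FEAT_FMA },
    { "D", FEAT_AVX512F },
};

/* Return the highest level for which this level and every lower level
   have all of their required features; "baseline" if none qualifies. */
const char *highest_level(unsigned cpu_features)
{
    const char *best = "baseline";
    for (size_t i = 0; i < sizeof levels / sizeof levels[0]; i++) {
        if ((cpu_features & levels[i].required) != levels[i].required)
            break;   /* a gap ends the progression */
        best = levels[i].name;
    }
    return best;
}
```

With this model, a CPU that reports AVX2 but not FMA stops at Level B,
which is why a directory named after AVX2 alone would be misleading.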

If numbers are out, what should we use instead?
x86-sse4, x86-avx2, x86-avx512?  Would that work?

>> * Level A
> ...
>> * Level B
>> This step is so small that it probably can be dropped, unless the benefits 
>> from using VEX encoding are truly significant.
>
> Yes, Agree, the delta is too small, can be clubbed into A or C.

Let's merge Level B into level C then?

>> * Level C
>> * Level D
>
> Others are in line with what we expect as logical grouping.

Thanks.

> Also we would also like to have dynamic loader support for "zen" /
> "zen2" as a version of "Level D" and takes preference over Level D,
> which may have super-optimized libraries from AMD or other vendors.

*That* shouldn't be too hard to implement if we can nail down the
selection criteria.  Let's call this Zen-specific Level C x86-zen-avx2
for the sake of exposition.

What's going to be difficult is the choice for a hypothetical Zen
successor that's compatible feature-flag-wise with Level D.

Basically, there are two choices here:

  * Level D wins because it's the more powerful ISA.
  * x86-zen-avx2 wins because it has the Zen architecture optimizations.

There's also a related issue with Level C vs x86-zen-avx2 depending on
how we implement the Zen detection for AMD family numbers in the glibc
dynamic linker.  What do I mean by this?  Suppose glibc detects that this
is a Level C capable Zen-type CPU, but it's not one of the family/model
numbers that were hard-coded into the glibc sources.  What should we do
then?  Should we still prefer the x86-zen-avx2 library over the Level C
library?

> These libraries are expected to be optimized according to
> micro-architectural details, not just ISA.

If it's supposed to be generally useful, we really need to document the
selection criteria for the subdirectory and make sure that it matches
what these libraries actually require at run time in terms of ISA.

I want to avoid two things here specifically: A hardware upgrade results
in crashes because we incorrectly load an incompatible library.  And, if
possible: A hardware upgrade (or kernel/hypervisor upgrade that exposes
more of the actual hardware) causes us to drop optimizations, so that
users experience a performance regression.

With the levels I proposed, these aspects are covered.  But if we start
to create vendor-specific forks in the feature progression, things get
complicated.

Do you think we need to figure this out in this iteration?  If yes, then
I really need a semi-formal description of the selection criteria for
this x86-zen-avx2 directory, so that I can pass it along with my psABI
proposal.

Thanks,
Florian



RE: New x86-64 micro-architecture levels

2020-07-21 Thread Mallappa, Premachandra
[AMD Public Use]

Hi Florian,

> I'm including a proposal for the levels below.  I use single letters for 
> them, but I expect that the concrete implementation of this proposal will use 
> names like “x86-100”, “x86-101”, like in the glibc patch referenced above.  
> (But we can discuss other approaches.)

Personally I am not a big fan of this, for 2 reasons 
1. uses just x86 in name on x86_64 as well
2. 100/101 not very intuitive


> * Level A
...
> * Level B
> This step is so small that it probably can be dropped, unless the benefits 
> from using VEX encoding are truly significant.

Yes, Agree, the delta is too small, can be clubbed into A or C.

> * Level C
> * Level D

Others are in line with what we expect as logical grouping.

As you mentioned, this is not easy to tackle.
We would also like to have dynamic loader support for "zen" / "zen2" as a 
version of "Level D" that takes preference over Level D,
and which may have super-optimized libraries from AMD or other vendors.
These libraries are expected to be optimized according to micro-architectural 
details, not just ISA.

Probably we can discuss this on the hwcaps thread.

-Prem

Re: New x86-64 micro-architecture levels

2020-07-15 Thread Florian Weimer via Gcc
* Mark Wielaard:

> One thing that wasn't clear to me from this proposal is how the glibc
> dynamic loader checks for the CPU feature flags. This is important for
> valgrind since it can communicate those through different means. cpuid
> interception, auxv AT_HWCAP/AT_HWCAP2 interception (but not AT_PLATFORM
> at the moment) and of course we can generate SIGILL for unsupported
> instructions. We currently don't intercept /proc/cpuinfo (but could).

glibc uses CPUID in combination with XGETBV.  There is also a masking
feature which I have not reviewed, but given that it only takes features
away, I don't think it matters to valgrind.
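
For reference, the usual shape of a CPUID plus XGETBV check for AVX
looks roughly like the sketch below.  It is not taken from the glibc
sources; the function name avx_usable is made up, and it assumes GCC's
<cpuid.h> and GNU inline assembly on x86.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Return 1 if the CPU reports AVX and the OS has enabled the AVX
   register state, 0 otherwise (always 0 on non-x86 targets). */
int avx_usable(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    /* CPUID leaf 1, ECX: bit 27 is OSXSAVE, bit 28 is AVX. */
    const unsigned int need = (1u << 27) | (1u << 28);
    if ((ecx & need) != need)
        return 0;

    /* XGETBV with ECX = 0 reads XCR0; bits 1 and 2 cover the SSE and
       AVX state.  Only executed once OSXSAVE has been confirmed. */
    uint32_t lo, hi;
    __asm__ volatile ("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    (void) hi;
    return (lo & 0x6) == 0x6;
#else
    return 0;
#endif
}
```

The CPUID bit alone is not enough: without the XCR0 check, code could
use AVX on a kernel or hypervisor that does not save the wider registers.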

Thanks,
Florian



Re: New x86-64 micro-architecture levels

2020-07-15 Thread H.J. Lu via Gcc
On Wed, Jul 15, 2020 at 7:38 AM Mark Wielaard  wrote:
>
> Hi Florian,
>
> I understand you want to discuss the x86_64 micro-architecture levels
> only in this thread, but it would be nice to have a similar discussion
> for other architectures.
>
> One thing that wasn't clear to me from this proposal is how the glibc
> dynamic loader checks for the CPU feature flags. This is important for
> valgrind since it can communicate those through different means. cpuid
> interception, auxv AT_HWCAP/AT_HWCAP2 interception (but not AT_PLATFORM
> at the moment) and of course we can generate SIGILL for unsupported
> instructions. We currently don't intercept /proc/cpuinfo (but could).

In a library, we can use:

https://sourceware.org/pipermail/libc-alpha/2020-June/115546.html

In GCC, we can use __builtin_cpu_supports.

The interface linked above supports all features, and __builtin_cpu_supports
in GCC 11 supports all features which GCC has codegen for.
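
A minimal sketch of the GCC side, assuming only feature names that
__builtin_cpu_supports has accepted for a long time.  The function name
detect_level and the abbreviated feature lists per level are my own
assumptions for illustration; CMPXCHG16B and LAHF/SAHF are left out
because older compilers may not accept names for them.

```c
/* Return "baseline", "A", "C" or "D" based on a subset of the features
   discussed in this thread.  Always "baseline" on non-x86 targets. */
const char *detect_level(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();

    /* Subset of Level A. */
    if (!(__builtin_cpu_supports("sse3")
          && __builtin_cpu_supports("ssse3")
          && __builtin_cpu_supports("sse4.1")
          && __builtin_cpu_supports("sse4.2")
          && __builtin_cpu_supports("popcnt")))
        return "baseline";

    /* Subset of Level C: AVX2 alone is not sufficient, FMA is needed too. */
    if (!(__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")))
        return "A";

    /* Subset of Level D. */
    if (!__builtin_cpu_supports("avx512f"))
        return "C";

    return "D";
#else
    return "baseline";
#endif
}
```

The checks are cumulative on purpose, so a CPU with a gap in the
progression never gets a level above the gap.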

> I think it is important to be precise here, because in the past this
> has sometimes caused confusion. For example for how to check correctly
> for avx, lzcnt, or fma[4] support.
>
> Thanks,
>
> Mark
>
> P.S. I don't particularly like the numbered names, but well, bike-shed...



-- 
H.J.


Re: New x86-64 micro-architecture levels

2020-07-15 Thread Mark Wielaard
Hi Florian,

I understand you want to discuss the x86_64 micro-architecture levels
only in this thread, but it would be nice to have a similar discussion
for other architectures.

One thing that wasn't clear to me from this proposal is how the glibc
dynamic loader checks for the CPU feature flags. This is important for
valgrind since it can communicate those through different means. cpuid
interception, auxv AT_HWCAP/AT_HWCAP2 interception (but not AT_PLATFORM
at the moment) and of course we can generate SIGILL for unsupported
instructions. We currently don't intercept /proc/cpuinfo (but could).

I think it is important to be precise here, because in the past this
has sometimes caused confusion. For example for how to check correctly
for avx, lzcnt, or fma[4] support.

Thanks,

Mark

P.S. I don't particularly like the numbered names, but well, bike-shed...


Re: New x86-64 micro-architecture levels

2020-07-13 Thread Jakub Jelinek via Gcc
On Mon, Jul 13, 2020 at 06:31:31AM -0700, H.J. Lu via Gcc wrote:
> > > H.J. has patches for ELF program properties.  I think
> > > GNU_PROPERTY_X86_ISA_1_NEEDED would convey this information.  This
> > > proposal and the glibc patches are independent of that.
> >
> > From (partly just halfway) recent discussions with H.J. I gained
> > the understanding that the piece we're aiming at getting to work
> > properly is the recording of GNU_PROPERTY_X86_FEATURE_2_*, not
> > so much GNU_PROPERTY_X86_ISA_1_*. If the ISA one is to be used as
> > a basis here, a lot of new flags will need adding (and properly
> > setting) first, I think.
> >
> 
> We can update GNU_PROPERTY_X86_ISA_1_* as needed.

I am not really sure such properties are a good idea, it will be a
maintainability nightmare (as it is on other OSes like Solaris).
Think about function multiversioning, target attribute for just some
functions, #pragma omp declare simd.  How do you differentiate between
using those on carefully written code that handles cpuid detection itself or
uses compiler support for that, where we do not want to mark the objects in
any way, they should work just fine even on K8, and cases where users want
something like that?

E.g. look for -mclear-hwcap stuff needed for Solaris because of that.

Jakub



Re: New x86-64 micro-architecture levels

2020-07-13 Thread H.J. Lu via Gcc
On Mon, Jul 13, 2020 at 12:48 AM Jan Beulich  wrote:
>
> On 13.07.2020 09:40, Florian Weimer wrote:
> > * Richard Biener:
> >>> 2. I have a library with AVX2 and FMA, which directory should it go?
> >>
> >> Eventually GCC/gas can annotate objects with the lowest architecture
> >> level that is applicable?
> >
> > H.J. has patches for ELF program properties.  I think
> > GNU_PROPERTY_X86_ISA_1_NEEDED would convey this information.  This
> > proposal and the glibc patches are independent of that.
>
> From (partly just halfway) recent discussions with H.J. I gained
> the understanding that the piece we're aiming at getting to work
> properly is the recording of GNU_PROPERTY_X86_FEATURE_2_*, not
> so much GNU_PROPERTY_X86_ISA_1_*. If the ISA one is to be used as
> a basis here, a lot of new flags will need adding (and properly
> setting) first, I think.
>

We can update GNU_PROPERTY_X86_ISA_1_* as needed.

-- 
H.J.


Re: New x86-64 micro-architecture levels

2020-07-13 Thread H.J. Lu via Gcc
On Sun, Jul 12, 2020 at 11:49 PM Florian Weimer  wrote:
>
> * H. J. Lu:
>
> > Looks good.  I like it.
>
> Thanks.  What do you think about Level B?  Should we keep it?

Please drop Level B.

> > My only concerns are
> >
> > 1. Names like “x86-100”, “x86-101”, what features do they support?
>
> I think we can add more diagnostic output to ld.so --help.  My patch
> does not show individual CPU flags, but I agree this could be useful.
> (It's not needed for the legacy HWCAP subdirectories because in general,
> those are named & defined by the kernel, not by individually named CPU
> feature flags.)
>
> > 2. I have a library with AVX2 and FMA, which directory should it go?
> >
> > Can we pass such info to ld.so and ld.so prints out the best directory
> > name?
>
> I think this would require generating matching GNU property notes (list
> the CPU features required by the binary).  Once we have that, we can add

I have turned on -mx86-used-note=yes by default for binutils 2.36.
I will add more ISAs bits after we determine which ISAs will be used.
But compilers need to generate GNU_PROPERTY_X86_ISA_1_NEEDED
property.

> something to binutils or indeed ld.so to analyze them and print the
> recommended directory.  But I think this is something that could come
> later.
>
> We can also write a GCC header which looks at macros such as __AVX2__
> and prints a #warning with the recommended directory name.  Checking for
> excess flags will be tricky in this context, though, and if we miss
> something, a wrong recommendation will be the result.
>
> Thanks,
> Florian


-- 
H.J.


Re: New x86-64 micro-architecture levels

2020-07-13 Thread Richard Biener via Gcc
On Mon, Jul 13, 2020 at 9:40 AM Florian Weimer  wrote:
>
> * Richard Biener:
>
> >> Looks good.  I like it.
> >
> > Likewise.  Btw, did you check that VIA family chips slot into Level A
> > at least?
>
> Those seem to lack SSE4.2, so they land in the baseline.
>
> > Where do AMD bdverN slot in?
>
> bdver1 to bdver3 (as defined by GCC) should land in Level B (so Level A
> if that is dropped).  bdver4 and znver1 (and later) should land in
> Level C.
>
> >>  My only concerns are
> >>
> >> 1. Names like “x86-100”, “x86-101”, what features do they support?
> >
> > Indeed I didn't get the -100, -101 part.  On the GCC side I'd have
> > suggested -march=generic-{A,B,C,D} implying the respective
> > -mtune.
>
> With literal A, B, C, D, or are they just placeholders?  If not literal
> levels, then what should we use there?
>
> I like the simplicity of numbers.  I used letters in the proposal to
> avoid confusion if we alter the proposal by dropping levels, shifting
> the meaning of those that come later.  I expect to switch back to
> numbers again for the final version.

They are indeed placeholders though I somehow prefer letters to
numbers.  But this is really bike-shedding territory.  Good documentation
on the tools side will be more important as well as consistent spelling
between tools sets, possibly driven by a good choice from within the
psABI document.

Richard.


Re: New x86-64 micro-architecture levels

2020-07-13 Thread Florian Weimer via Gcc
* Joseph Myers:

> On Fri, 10 Jul 2020, Florian Weimer via Gcc wrote:
>
>> * Level A
>> 
>> CMPXCHG16B, LAHF/SAHF, POPCNT, SSE3, SSE4.1, SSE4.2, SSSE3
>> 
>> This is one step above the K8 baseline and corresponds to a mainline CPU
>> model ca. 2008 to 2011.  It is also implemented by recent-ish
>> generations of Intel Atom server CPUs (although I haven't tested the
>> latest version).  A 32-bit variant would have to list many additional
>> CPU features here.
>
> FWIW, this is also the limit of what can be run under QEMU emulation, as 
> QEMU lacks support for AVX and newer instruction set features.

Oh, I had forgotten about that.  I should have Cc:ed the QEMU folks as well.
We'll need to make sure that we have matching CPU models in
QEMU/libvirt, even for the levels that do not have TCG support.

valgrind is another consumer, but in my tests, it was mostly okay with
AVX2 code (but that was without auto-vectorization).  AVX-512 is a
different matter, but that is also much further out.

> On the other hand, virtual machines seem liable to report something closer 
> to the K8 baseline to the guest OS, missing the level A features, even 
> when the underlying hardware supports everything in level B or level C.

They do this to support migration.  I suspect that in many cases,
those are just configuration errors.  That's why I want at least one
major distribution to switch to Level C as the baseline, to clean the
pipes.  Then even those distributions that depend on run-time selection
for performance-critical code will benefit. 8-/

Thanks,
Florian



Re: New x86-64 micro-architecture levels

2020-07-13 Thread Jan Beulich
On 13.07.2020 09:40, Florian Weimer wrote:
> * Richard Biener:
>>> 2. I have a library with AVX2 and FMA, which directory should it go?
>>
>> Eventually GCC/gas can annotate objects with the lowest architecture
>> level that is applicable?
> 
> H.J. has patches for ELF program properties.  I think
> GNU_PROPERTY_X86_ISA_1_NEEDED would convey this information.  This
> proposal and the glibc patches are independent of that.

From (partly just halfway) recent discussions with H.J. I gained
the understanding that the piece we're aiming at getting to work
properly is the recording of GNU_PROPERTY_X86_FEATURE_2_*, not
so much GNU_PROPERTY_X86_ISA_1_*. If the ISA one is to be used as
a basis here, a lot of new flags will need adding (and properly
setting) first, I think.

Jan


Re: New x86-64 micro-architecture levels

2020-07-13 Thread Florian Weimer via Gcc
* Richard Biener:

>> Looks good.  I like it.
>
> Likewise.  Btw, did you check that VIA family chips slot into Level A
> at least?

Those seem to lack SSE4.2, so they land in the baseline.

> Where do AMD bdverN slot in?

bdver1 to bdver3 (as defined by GCC) should land in Level B (so Level A
if that is dropped).  bdver4 and znver1 (and later) should land in
Level C.

>>  My only concerns are
>>
>> 1. Names like “x86-100”, “x86-101”, what features do they support?
>
> Indeed I didn't get the -100, -101 part.  On the GCC side I'd have
> suggested -march=generic-{A,B,C,D} implying the respective
> -mtune.

With literal A, B, C, D, or are they just placeholders?  If not literal
levels, then what should we use there?

I like the simplicity of numbers.  I used letters in the proposal to
avoid confusion if we alter the proposal by dropping levels, shifting
the meaning of those that come later.  I expect to switch back to
numbers again for the final version.

> Do the patches end up annotating ELF binaries with the architecture
> level and does ld.so check that info?

This is a separate feature that H.J. has been working on.

> For example IIRC there's a penalty to switch between VEX and
> not VEX encoded instructions so even on AVX capable hardware
> it might be profitable to use non-AVX libraries if the program is
> using only architecture level A?

But this is impossible to know in general.  It may also be possible that
the library contains an inner loop that can be nicely vectorized with
AVX instructions, but not with SSE4.2 instructions and earlier.  Then
preferring the non-AVX version would be a mistake.

Regarding the transition penalty, I believe this is mostly addressed by
those VZEROUPPER instructions?  I've already explained why I think those
aren't a viable optimization target, given the current calling
convention.

My glibc patches already provide a way to mask subdirectories which
would otherwise be selected, so manual optimization is still possible.

> On that side, does architecture level B+ suggest using VEX encoding
> everywhere?  It would be indeed nice to have the architecture levels
> documented in the psABI.

I think this falls under optimization, and I really did not want to
discuss it.

If there is a plan to change/amend the calling convention and some of
the levels should prefer to that, it's a different matter, of course.
(glibc can only give you four callee-saved 256-bit wide registers
easily, though, more would need close cooperation with GCC.)

The new glibc-hwcaps scheme in glibc scales a bit better than the old
one, so we do not have to settle this immediately and could add
additional subdirectories for objects that follow new calling convention
requirements.

>> 2. I have a library with AVX2 and FMA, which directory should it go?
>
> Eventually GCC/gas can annotate objects with the lowest architecture
> level that is applicable?

H.J. has patches for ELF program properties.  I think
GNU_PROPERTY_X86_ISA_1_NEEDED would convey this information.  This
proposal and the glibc patches are independent of that.

If that function ever gets deployed, I plan to add those notes to
ld.so.cache, so that ld.so can select shared objects based on them (or
any allocated ELF note, really).  Efficient LD_LIBRARY_PATH support is
not possible, I think, so those designated glibc-hwcaps subdirectories
still have a place.

Thanks,
Florian



Re: New x86-64 micro-architecture levels

2020-07-13 Thread Florian Weimer via Gcc
* Allan Sandfeld Jensen:

> On Freitag, 10. Juli 2020 19:30:09 CEST Florian Weimer via Gcc wrote:
>> glibc (or an alternative loader implementation) would search for
>> libraries starting at level D, going back to level A, and finally the
>> baseline implementation in the default library location.
>> 
>> I expect that some distributions will also use these levels to set a
>> baseline for the entire distribution (i.e., everything would be built to
>> level A or maybe even level C), and these libraries would then be
>> installed in the default location.
>> 
>> I'll be glad if I can get any feedback on this proposal.  I plan to turn
>> it into a merge request for the x86-64 psABI document eventually.

> Sounds good, though if I could dream I would also love a partial
> replacement option. So that you could have a generic x86-64 binary
> that only had some AVX2 optimized replacement functions in a
> supplementary library.
>
> Perhaps implemented by marking the library as a partial replacement, so
> the dynamic linker would also load the base or lower libraries except
> for functions already resolved.

I think you can do something like it today, at least from the glibc
dynamic loader perspective.  Programs link against the soname of the
optimized shared object (which can be empty), and that shared object
depends on the object with the fallback implementation.  A special
link-only shared object containing just the ABI under the front soname
(that of the optimized library) would be used via a linker object, so
that it is not possible to accidentally link against the wrong soname.

For non-versioned symbols, this setup has worked since forever.  For
versioned symbols, delegating from the optimized to the unoptimized
library needs at least glibc 2.30, with commit f0b2132b35248c1f4a80
("ld.so: Support moving versioned symbols between sonames [BZ #24741]"),
although some of us have backported this commit into earlier releases.

Where this falls flat is support for LTO and
-fno-semantic-interposition.  Some care is needed to make precisely the
right set of symbols interposable.  But to be honest, I'm not sure if this
entire mechanism is a big improvement over function multi-versioning.
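
For comparison, function multi-versioning with GCC's target_clones
attribute looks roughly like this.  It is a sketch only: the macro
MULTIVERSIONED and the function sum_array are made up, and the guard
assumes GCC on x86-64 GNU/Linux, where IFUNC support is available.
Elsewhere the attribute is simply omitted.

```c
#include <stddef.h>

#if defined(__x86_64__) && defined(__gnu_linux__) \
    && defined(__GNUC__) && !defined(__clang__)
/* GCC emits one clone per listed target plus an IFUNC resolver that
   picks a clone at load time. */
# define MULTIVERSIONED __attribute__((target_clones("default", "avx2")))
#else
# define MULTIVERSIONED
#endif

/* Sum an array of ints; the loop is a candidate for vectorization in
   the AVX2 clone when optimization is enabled. */
MULTIVERSIONED
long sum_array(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

Callers just call sum_array; there is no second soname and no
link-only stub library involved, which is the trade-off against the
split-library setup described above.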

Thanks,
Florian



Re: New x86-64 micro-architecture levels

2020-07-13 Thread Florian Weimer via Gcc
* H. J. Lu:

> Looks good.  I like it.

Thanks.  What do you think about Level B?  Should we keep it?

> My only concerns are
>
> 1. Names like “x86-100”, “x86-101”, what features do they support?

I think we can add more diagnostic output to ld.so --help.  My patch
does not show individual CPU flags, but I agree this could be useful.
(It's not needed for the legacy HWCAP subdirectories because in general,
those are named & defined by the kernel, not by individually named CPU
feature flags.)

> 2. I have a library with AVX2 and FMA, which directory should it go?
>
> Can we pass such info to ld.so and ld.so prints out the best directory
> name?

I think this would require generating matching GNU property notes (list
the CPU features required by the binary).  Once we have that, we can add
something to binutils or indeed ld.so to analyze them and print the
recommended directory.  But I think this is something that could come
later.

We can also write a GCC header which looks at macros such as __AVX2__
and prints a #warning with the recommended directory name.  Checking for
excess flags will be tricky in this context, though, and if we miss
something, a wrong recommendation will be the result.
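
A rough sketch of such a header follows.  The macro names
HWCAPS_LEVEL_HINT and HWCAPS_LEVEL_REPORT, the helper hwcaps_level_hint,
and the abbreviated feature lists are assumptions for illustration.  It
uses #pragma message rather than #warning, because #warning cannot
expand a macro into its text, and it does not attempt the excess-flag
check mentioned above.

```c
/* Pick the highest level whose (abbreviated) feature macros are all
   defined by the current -march=/-m options. */
#if defined(__AVX512F__) && defined(__AVX512BW__) && defined(__AVX512CD__) \
    && defined(__AVX512DQ__) && defined(__AVX512VL__) \
    && defined(__AVX2__) && defined(__FMA__)
# define HWCAPS_LEVEL_HINT "D"
#elif defined(__AVX2__) && defined(__FMA__)
# define HWCAPS_LEVEL_HINT "C"
#elif defined(__AVX__)
# define HWCAPS_LEVEL_HINT "B"
#elif defined(__SSE4_2__) && defined(__SSE4_1__) && defined(__SSSE3__) \
    && defined(__SSE3__) && defined(__POPCNT__)
# define HWCAPS_LEVEL_HINT "A"
#else
# define HWCAPS_LEVEL_HINT "baseline"
#endif

/* Opt-in diagnostic, so that ordinary builds stay quiet. */
#ifdef HWCAPS_LEVEL_REPORT
# pragma message("recommended level: " HWCAPS_LEVEL_HINT)
#endif

/* Run-time accessor for the compile-time hint. */
const char *hwcaps_level_hint(void)
{
    return HWCAPS_LEVEL_HINT;
}
```

Building with -DHWCAPS_LEVEL_REPORT prints the hint at compile time.
A build with, say, -mavx2 but without -mfma would be reported as
Level B here, not C.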

Thanks,
Florian



Re: New x86-64 micro-architecture levels

2020-07-13 Thread Richard Biener via Gcc
On Fri, Jul 10, 2020 at 11:45 PM H.J. Lu via Gcc  wrote:
>
> On Fri, Jul 10, 2020 at 10:30 AM Florian Weimer  wrote:
> >
> > Most Linux distributions still compile against the original x86-64
> > baseline that was based on the AMD K8 (minus the 3DNow! parts, for Intel
> > EM64T compatibility).
> >
> > There has been an attempt to use the existing AT_PLATFORM-based loading
> > mechanism in the glibc dynamic linker to enable a selection of optimized
> > libraries.  But the general selection mechanism in glibc is problematic:
> >
> >   hwcaps subdirectory selection in the dynamic loader
> >   
> >
> > We also have the problem that the glibc version of "haswell" is distinct
> > from GCC's -march=haswell (and presumably other compilers):
> >
> >   Definition of "haswell" platform is inconsistent with GCC
> >   
> >
> > And that the selection criteria are not what people expect:
> >
> >   Epyc and other current AMD CPUs do not select the "haswell" platform
> >   subdirectory
> >   
> >
> > Since the hwcaps-based selection does not work well regardless of
> > architecture (even in cases the kernel provides glibc with data), I
> > worked on a new mechanism that does not have the problems associated
> > with the old mechanism:
> >
> >   [PATCH 00/30] RFC: elf: glibc-hwcaps support
> >   
> >
> > (Don't be concerned that these patches have not been reviewed; we are
> > busy preparing the glibc 2.32 release, and these changes do not alter
> > the glibc ABI itself, so they do not have immediate priority.  I'm
> > fairly confident that a version of these changes will make it into glibc
> > 2.33, and I hope to backport them into Fedora 33, Fedora 32, and Red Hat
> > Enterprise Linux 8.4.  Debian as well, but I have never done anything
> > like it there, so I don't know if the patches will be accepted.)
> >
> > Out of the box, this should work fairly well for IBM POWER and Z, where
> > there is a clear progression of silicon versions (at least on paper
> > —virtualization may blur the picture somewhat).
> >
> > However, for x86, we do not have such a clear progression of
> > micro-architecture versions.  This is not just as a result of the
> > AMD/Intel competition, but also due to ongoing product differentiation
> > within one chip vendor.  I think we need these levels broadly for the
> > following reasons:
> >
> > * Selecting on individual CPU features (similar to the old hwcaps
> >   mechanism) in glibc has scalability issues, particularly for
> >   LD_LIBRARY_PATH processing.
> >
> > * Developers need guidance about useful targets for optimization.  I
> >   think there is value in limiting the choices, in the sense that “if
> >   you are able to test three builds in total, these are the things you
> >   should build”.
> >
> > * glibc and the compilers should align in their definition of the
> >   levels, so that developers can use an -march= option to build for a
> >   particular level that is recognized by glibc.  This is why I think the
> >   description of the levels should go into the psABI supplement.
> >
> > * A preference order for these levels avoids falling back to the K8
> >   baseline if the platform progresses to a new version due to
> >   glibc/kernel/hypervisor/hardware upgrades.
> >
> > I'm including a proposal for the levels below.  I use single letters for
> > them, but I expect that the concrete implementation of this proposal
> > will use names like “x86-100”, “x86-101”, like in the glibc patch
> > referenced above.  (But we can discuss other approaches.)
> >
> > I looked at various machines in the Red Hat labs and talked to Intel and
> > AMD engineers about this, but this concrete proposal is based on my own
> > analysis of the situation.  I excluded CPU features related to
> > cryptography and cache management, including hardware transactional
> > memory, and CPU timing.  I assume that we will see some of these
> > features being disabled by the firmware or the kernel over time.  That
> > would eliminate entire levels from selection, which is not desirable.
> > For cryptographic code, I expect that localized selection of an
> > optimized implementation works because such code tends to be isolated
> > blocks, running for dozens of cycles each time, not something that gets
> > scattered all over the place by the compiler.
> >
> > We previously discussed not emitting VZEROUPPER at later levels, but I
> > don't think this is beneficial because the ABI does not have
> > callee-saved vector registers, so it can only be useful with local
> > functions (or whatever LTO considers local), where there is no ABI
> > impact anyway.
> >
> > I did not include FSGSBASE because the FS base is already available at
> > %fs:0.  Changing the FS base in userspace breaks too 

Re: New x86-64 micro-architecture levels

2020-07-11 Thread Allan Sandfeld Jensen
On Freitag, 10. Juli 2020 19:30:09 CEST Florian Weimer via Gcc wrote:
> glibc (or an alternative loader implementation) would search for
> libraries starting at level D, going back to level A, and finally the
> baseline implementation in the default library location.
> 
> I expect that some distributions will also use these levels to set a
> baseline for the entire distribution (i.e., everything would be built to
> level A or maybe even level C), and these libraries would then be
> installed in the default location.
> 
> I'll be glad if I can get any feedback on this proposal.  I plan to turn
> it into a merge request for the x86-64 psABI document eventually.
> 
Sounds good, though if I could dream I would also love a partial replacement 
option. So that you could have a generic x86-64 binary that only had some AVX2 
optimized replacement functions in a supplementary library.

Perhaps implemented by marking the library as a partial replacement, so the 
dynamic linker would also load the base or lower-level libraries and use them 
for any functions not already resolved.
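
To make the partial-replacement idea concrete, here is a minimal sketch in C of the lookup rule it implies: walk the libraries from the most specialized to the baseline and take the first one that defines the symbol. This is only an illustration, not how glibc resolves symbols. The struct, the function `resolve`, the library names ("avx2-supplement", "baseline") and the symbol names are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a loaded library: a label plus the symbols it defines. */
struct lib {
    const char *name;
    const char *const *symbols;
    size_t nsyms;
};

/* Hypothetical example data: the supplement overrides only one function. */
static const char *const supplement_syms[] = { "blend_rows" };
static const char *const baseline_syms[]   = { "blend_rows", "scale_image" };

/* Search order: the partial replacement first, then the baseline library. */
static const struct lib search_order[] = {
    { "avx2-supplement", supplement_syms, 1 },
    { "baseline",        baseline_syms,   2 },
};

/* Return the name of the first library in `order` that defines `sym`,
 * or NULL if no library defines it. */
const char *resolve(const struct lib *order, size_t nlibs, const char *sym)
{
    for (size_t i = 0; i < nlibs; i++) {
        for (size_t j = 0; j < order[i].nsyms; j++) {
            if (strcmp(order[i].symbols[j], sym) == 0)
                return order[i].name;
        }
    }
    return NULL;
}
```

With this data, `resolve(search_order, 2, "blend_rows")` picks the supplement, while "scale_image" falls through to the baseline library.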

You could also add a level E for the AVX512 instructions in Ice Lake and 
above. The VBMI1/2 instructions would likely be useful for autovectorization 
in GCC.

'Allan




Re: New x86-64 micro-architecture levels

2020-07-10 Thread H.J. Lu via Gcc
On Fri, Jul 10, 2020 at 10:30 AM Florian Weimer  wrote:
>
> Most Linux distributions still compile against the original x86-64
> baseline that was based on the AMD K8 (minus the 3DNow! parts, for Intel
> EM64T compatibility).
>
> There has been an attempt to use the existing AT_PLATFORM-based loading
> mechanism in the glibc dynamic linker to enable a selection of optimized
> libraries.  But the general selection mechanism in glibc is problematic:
>
>   hwcaps subdirectory selection in the dynamic loader
>   
>
> We also have the problem that the glibc version of "haswell" is distinct
> from GCC's -march=haswell (and presumably other compilers):
>
>   Definition of "haswell" platform is inconsistent with GCC
>   
>
> And that the selection criteria are not what people expect:
>
>   Epyc and other current AMD CPUs do not select the "haswell" platform
>   subdirectory
>   
>
> Since the hwcaps-based selection does not work well regardless of
> architecture (even in cases where the kernel provides glibc with data), I
> worked on a new mechanism that does not have the problems associated
> with the old mechanism:
>
>   [PATCH 00/30] RFC: elf: glibc-hwcaps support
>   
>
> (Don't be concerned that these patches have not been reviewed; we are
> busy preparing the glibc 2.32 release, and these changes do not alter
> the glibc ABI itself, so they do not have immediate priority.  I'm
> fairly confident that a version of these changes will make it into glibc
> 2.33, and I hope to backport them into Fedora 33, Fedora 32, and Red Hat
> Enterprise Linux 8.4.  Debian as well, but I have never done anything
> like it there, so I don't know if the patches will be accepted.)
>
> Out of the box, this should work fairly well for IBM POWER and Z, where
> there is a clear progression of silicon versions (at least on paper
> —virtualization may blur the picture somewhat).
>
> However, for x86, we do not have such a clear progression of
> micro-architecture versions.  This is not just a result of the
> AMD/Intel competition, but also due to ongoing product differentiation
> within one chip vendor.  I think we need these levels broadly for the
> following reasons:
>
> * Selecting on individual CPU features (similar to the old hwcaps
>   mechanism) in glibc has scalability issues, particularly for
>   LD_LIBRARY_PATH processing.
>
> * Developers need guidance about useful targets for optimization.  I
>   think there is value in limiting the choices, in the sense that “if
>   you are able to test three builds in total, these are the things you
>   should build”.
>
> * glibc and the compilers should align in their definition of the
>   levels, so that developers can use an -march= option to build for a
>   particular level that is recognized by glibc.  This is why I think the
>   description of the levels should go into the psABI supplement.
>
> * A preference order for these levels avoids falling back to the K8
>   baseline if the platform progresses to a new version due to
>   glibc/kernel/hypervisor/hardware upgrades.
>
> I'm including a proposal for the levels below.  I use single letters for
> them, but I expect that the concrete implementation of this proposal
> will use names like “x86-100”, “x86-101”, like in the glibc patch
> referenced above.  (But we can discuss other approaches.)
>
> I looked at various machines in the Red Hat labs and talked to Intel and
> AMD engineers about this, but this concrete proposal is based on my own
> analysis of the situation.  I excluded CPU features related to
> cryptography and cache management, including hardware transactional
> memory, and CPU timing.  I assume that we will see some of these
> features being disabled by the firmware or the kernel over time.  That
> would eliminate entire levels from selection, which is not desirable.
> For cryptographic code, I expect that localized selection of an
> optimized implementation works because such code tends to be isolated
> blocks, running for dozens of cycles each time, not something that gets
> scattered all over the place by the compiler.
>
> We previously discussed not emitting VZEROUPPER at later levels, but I
> don't think this is beneficial because the ABI does not have
> callee-saved vector registers, so it can only be useful with local
> functions (or whatever LTO considers local), where there is no ABI
> impact anyway.
>
> I did not include FSGSBASE because the FS base is already available at
> %fs:0.  Changing the FS base in userspace breaks too much, so the main
> benefit is the tighter encoding of rdfsbase, which seems very slim.
>
> Not covered in this are tuning decisions.  I think we can benefit from
> some variance in this area between implementations; it should not affect
> 

Re: New x86-64 micro-architecture levels

2020-07-10 Thread Joseph Myers
On Fri, 10 Jul 2020, Florian Weimer via Gcc wrote:

> * Level A
> 
> CMPXCHG16B, LAHF/SAHF, POPCNT, SSE3, SSE4.1, SSE4.2, SSSE3
> 
> This is one step above the K8 baseline and corresponds to a mainline CPU
> model ca. 2008 to 2011.  It is also implemented by recent-ish
> generations of Intel Atom server CPUs (although I haven't tested the
> latest version).  A 32-bit variant would have to list many additional
> CPU features here.

FWIW, this is also the limit of what can be run under QEMU emulation, as 
QEMU lacks support for AVX and newer instruction set features.

On the other hand, virtual machines seem liable to report something closer 
to the K8 baseline to the guest OS, missing the level A features, even 
when the underlying hardware supports everything in level B or level C.
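
For anyone who wants to see what a given guest actually reports, here is a rough C sketch that tests the level A feature list quoted above against CPUID. It assumes the GCC/Clang <cpuid.h> helper __get_cpuid(). The function and macro names are made up for this example. The bit positions are the documented CPUID ones (leaf 1 ECX, and leaf 0x80000001 ECX for LAHF/SAHF in 64-bit mode). Inside a virtual machine the result reflects what the hypervisor exposes, not the underlying hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper providing __get_cpuid() */
#endif

/* CPUID leaf 1, ECX feature bits. */
#define LEAF1_SSE3        (UINT32_C(1) << 0)
#define LEAF1_SSSE3       (UINT32_C(1) << 9)
#define LEAF1_CMPXCHG16B  (UINT32_C(1) << 13)
#define LEAF1_SSE4_1      (UINT32_C(1) << 19)
#define LEAF1_SSE4_2      (UINT32_C(1) << 20)
#define LEAF1_POPCNT      (UINT32_C(1) << 23)

/* CPUID leaf 0x80000001, ECX: LAHF/SAHF usable in 64-bit mode. */
#define EXT_LAHF_SAHF     (UINT32_C(1) << 0)

/* All leaf 1 ECX bits required by the proposed level A. */
#define LEVEL_A_LEAF1_MASK                                         \
    (LEAF1_SSE3 | LEAF1_SSSE3 | LEAF1_CMPXCHG16B | LEAF1_SSE4_1 |  \
     LEAF1_SSE4_2 | LEAF1_POPCNT)

/* Pure check on raw register values, so it can be tested anywhere. */
bool level_a_from_cpuid(uint32_t leaf1_ecx, uint32_t ext_ecx)
{
    return (leaf1_ecx & LEVEL_A_LEAF1_MASK) == LEVEL_A_LEAF1_MASK
        && (ext_ecx & EXT_LAHF_SAHF) != 0;
}

/* Query the running CPU (or whatever the hypervisor presents as one).
 * Returns false on non-x86 targets. */
bool cpu_supports_level_a(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    uint32_t leaf1_ecx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    leaf1_ecx = ecx;

    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return false;
    return level_a_from_cpuid(leaf1_ecx, ecx);
#else
    return false;
#endif
}
```

Calling cpu_supports_level_a() on the host and again inside a guest is a quick way to see the mismatch described above.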

-- 
Joseph S. Myers
jos...@codesourcery.com