Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-05-07 Thread Aurelien Jarno
On 2020-05-07 13:04, Noah Meyerhans wrote:
> On Wed, May 06, 2020 at 04:15:09PM +0200, Aurelien Jarno wrote:
> > > >One solution for this would be to ship the optimized library in the same
> > > >package as the default library. Now this is not acceptable for embedded
> > > >systems as they might not need that library and can't remove it. This is
> > > >even more problematic if we need to add more optimized libraries. I guess
> > > >this might be the case for arm64 as there are many new extensions in the
> > > >pipe.
> > > 
> > > ACK. It's a problem to ship the different things in separate
> > > packages. If it's really a problem for smaller systems to have all the
> > > variants because of size, is there maybe another way to do things? How
> > > about keeping the existing libc and have an extra package
> > > ("libc-optimised") with all the optimised versions *and* the basic
> > > version, and have it provide/replace/conflict libc6?
> > > 
> > > (/me prepares to be embarrassed as you point out the obvious flaw I'm
> > > missing...)
> > 
> > I guess that the provide/replace/conflict libc6 will just prevent
> > installation of foreign libc6 packages, basically making this optimized
> > package useless in the multiarch context.
> > 
> > OTOH, what is the drawback of having GCC defaulting to -moutline-atomics?
> > It will improve performance on many more packages than only glibc, and
> > is way easier to implement overall. It also means users has nothing to
> > do to get additional performances.
> 
> For the current issue, defaulting to -moutline-atomics might be a sane
> approach.  As you said earlier, though, it seems that there are many new
> extensions in the pipe for ARM.  There may not be an equivalent solution
> for all of them, and even if there is, at some point the runtime
> overhead of all this conditional code is going to add up to something
> meaningful.

If we are talking about future extensions, another option for some of
them is to use ifunc. That is how the various SSE and AVX extensions are
supported on x86, and how NEON is supported on armv7.
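
For anyone not familiar with it, here is a minimal sketch of the idea,
assuming a GNU toolchain on an ELF target. All names (sum_u32,
sum_generic, sum_unrolled, cpu_has_fast_path) are made up for
illustration, and the "optimized" variant is only a stand-in; this is
not how any glibc routine is actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline implementation: works on every CPU. */
static uint64_t sum_generic(const uint32_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Stand-in for a variant that would use newer instructions. */
static uint64_t sum_unrolled(const uint32_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    uint64_t s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n)
        s0 += v[i];
    return s0 + s1;
}

/* Hypothetical feature probe; a real resolver would test for the
   extension that the optimized variant needs. */
static int cpu_has_fast_path(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();   /* needed before __builtin_cpu_supports in a resolver */
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;
#endif
}

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

static sum_fn resolve_sum(void)
{
    return cpu_has_fast_path() ? sum_unrolled : sum_generic;
}

#if defined(__has_attribute)
#  if __has_attribute(ifunc)
#    define HAVE_IFUNC 1
#  endif
#endif

#ifdef HAVE_IFUNC
/* The dynamic linker calls resolve_sum() once and binds sum_u32 to the result. */
uint64_t sum_u32(const uint32_t *v, size_t n) __attribute__((ifunc("resolve_sum")));
#else
/* Fallback for toolchains without ifunc: select on every call. */
uint64_t sum_u32(const uint32_t *v, size_t n)
{
    return resolve_sum()(v, n);
}
#endif
```

With ifunc the choice is made once, when the symbol is bound, so
callers of sum_u32() do not pay for a feature check on each call.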

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-05-07 Thread Noah Meyerhans
On Wed, May 06, 2020 at 04:15:09PM +0200, Aurelien Jarno wrote:
> > >One solution for this would be to ship the optimized library in the same
> > >package as the default library. Now this is not acceptable for embedded
> > >systems as they might not need that library and can't remove it. This is
> > >even more problematic if we need to add more optimized libraries. I guess
> > >this might be the case for arm64 as there are many new extensions in the
> > >pipe.
> > 
> > ACK. It's a problem to ship the different things in separate
> > packages. If it's really a problem for smaller systems to have all the
> > variants because of size, is there maybe another way to do things? How
> > about keeping the existing libc and have an extra package
> > ("libc-optimised") with all the optimised versions *and* the basic
> > version, and have it provide/replace/conflict libc6?
> > 
> > (/me prepares to be embarrassed as you point out the obvious flaw I'm
> > missing...)
> 
> I guess that the provide/replace/conflict libc6 will just prevent
> installation of foreign libc6 packages, basically making this optimized
> package useless in the multiarch context.
> 
> OTOH, what is the drawback of having GCC defaulting to -moutline-atomics?
> It will improve performance on many more packages than only glibc, and
> is way easier to implement overall. It also means users has nothing to
> do to get additional performances.

For the current issue, defaulting to -moutline-atomics might be a sane
approach.  As you said earlier, though, it seems that there are many new
extensions in the pipe for ARM.  There may not be an equivalent solution
for all of them, and even if there is, at some point the runtime
overhead of all this conditional code is going to add up to something
meaningful.

noah



Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-05-07 Thread Adrian Bunk
On Wed, May 06, 2020 at 01:56:24PM +0100, Steve McIntyre wrote:
>...
> On Sun, May 03, 2020 at 11:53:35PM +0200, Aurelien Jarno wrote:
> >
> >One solution for this would be to ship the optimized library in the same
> >package as the default library. Now this is not acceptable for embedded
> >systems as they might not need that library and can't remove it. This is
> >even more problematic if we need to add more optimized libraries. I guess
> >this might be the case for arm64 as there are many new extensions in the
> >pipe.
> 
> ACK. It's a problem to ship the different things in separate
> packages. If it's really a problem for smaller systems to have all the
> variants because of size, is there maybe another way to do things? How
> about keeping the existing libc and have an extra package
> ("libc-optimised") with all the optimised versions *and* the basic
> version, and have it provide/replace/conflict libc6?
>...

What Noah mentioned for a similar proposal also applies here:

On Mon, May 04, 2020 at 02:45:41PM -0400, Noah Meyerhans wrote:
>...
> I don't know how well dpkg would cope with transitioning
> between providers, which seems like the riskiest side of this kind of
> thing.

I'd guess you could make this an installation-only change with
a few hacks here and there, but once you think that through
with all the followup-hacks required it doesn't sound like
a good idea.

cu
Adrian



Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-05-06 Thread Aurelien Jarno
On 2020-05-06 13:56, Steve McIntyre wrote:
> Hey Aurelien,
> 
> On Sun, May 03, 2020 at 11:53:35PM +0200, Aurelien Jarno wrote:
> >
> >One solution for this would be to ship the optimized library in the same
> >package as the default library. Now this is not acceptable for embedded
> >systems as they might not need that library and can't remove it. This is
> >even more problematic if we need to add more optimized libraries. I guess
> >this might be the case for arm64 as there are many new extensions in the
> >pipe.
> 
> ACK. It's a problem to ship the different things in separate
> packages. If it's really a problem for smaller systems to have all the
> variants because of size, is there maybe another way to do things? How
> about keeping the existing libc and have an extra package
> ("libc-optimised") with all the optimised versions *and* the basic
> version, and have it provide/replace/conflict libc6?
> 
> (/me prepares to be embarrassed as you point out the obvious flaw I'm
> missing...)

I guess that the provide/replace/conflict libc6 will just prevent
installation of foreign libc6 packages, basically making this optimized
package useless in the multiarch context.

OTOH, what is the drawback of having GCC default to -moutline-atomics?
It will improve performance in many more packages than just glibc, and
is way easier to implement overall. It also means users have nothing to
do to get the additional performance.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-05-06 Thread Steve McIntyre
Hey Aurelien,

On Sun, May 03, 2020 at 11:53:35PM +0200, Aurelien Jarno wrote:
>
>One solution for this would be to ship the optimized library in the same
>package as the default library. Now this is not acceptable for embedded
>systems as they might not need that library and can't remove it. This is
>even more problematic if we need to add more optimized libraries. I guess
>this might be the case for arm64 as there are many new extensions in the
>pipe.

ACK. It's a problem to ship the different things in separate
packages. If it's really a problem for smaller systems to have all the
variants because of size, is there maybe another way to do things? How
about keeping the existing libc and having an extra package
("libc-optimised") with all the optimised versions *and* the basic
version, and have it provide/replace/conflict libc6?

(/me prepares to be embarrassed as you point out the obvious flaw I'm
missing...)

-- 
Steve McIntyre, Cambridge, UK.  st...@einval.com
"... the premise [is] that privacy is about hiding a wrong. It's not.
 Privacy is an inherent human right, and a requirement for maintaining
 the human condition with dignity and respect."
  -- Bruce Schneier



Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-05-06 Thread Adrian Bunk
On Mon, May 04, 2020 at 02:45:41PM -0400, Noah Meyerhans wrote:
>...
> I wonder if it'd make sense for libc to be a virtual package, with
> functionality provided by optimized builds and dependencies satisfied
> via Provides.  I don't know how well dpkg would cope with transitioning
> between providers, which seems like the riskiest side of this kind of
> thing.

What would happen if apt finds a dependency solution when installing or
updating packages that involves switching to a libc package that does
not run on your device?
There are situations where changing the libc package would be the only
possible solution of the dependencies.

IMHO there are far too many ways in which such a virtual package
solution could brick devices.

> noah

cu
Adrian



Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-05-04 Thread Noah Meyerhans
On Sun, May 03, 2020 at 11:53:35PM +0200, Aurelien Jarno wrote:
> The hardware capabilities system works fine upstream, but doesn't work
> for us because:
> 1) we want to be able to upgrade major upstream version online (as
> opposed to fedora for example)
> 2) we ship the optimized libraries in a different package
> 
> The various libc libraries need to have the same version at any time,
> this is especially true for ld.so vs libc.so. As we do not upgrade the
> default libc and the optimized one exactly at the same time (they are in
> different packages), we upgrade first the default libc and then we have
> the Debian specific nohwcap mechanism to prevent using the optimize
> library until it has also been upgraded.
> 
> One solution for this would be to ship the optimized library in the same
> package as the default library. Now this is not acceptable for embedded
> systems as they might not need that library and can't remove it. This is
> even more problematic if we need to add more optimized libraries. I guess
> this might be the case for arm64 as there are many new extensions in the
> pipe.

Thanks for taking the time to explain that!

I wonder if it'd make sense for libc to be a virtual package, with
functionality provided by optimized builds and dependencies satisfied
via Provides.  I don't know how well dpkg would cope with transitioning
between providers, which seems like the riskiest side of this kind of
thing.

noah



Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-05-03 Thread Aurelien Jarno
On 2020-04-21 18:37, Noah Meyerhans wrote:
> > To be honest from a glibc maintenance point of view it's something I
> > would like to avoid. We haven't been actively trying to remove the
> > remaining optimized libraries (on i386, hurd and alpha), but we have
> > tried to avoid adding new ones. The problem is not building a second
> > optimized glibc, but rather providing a safe upgrade as the optimized
> > and the non-optimized package have to be at the same version or one of
> > them has to be disabled. This has caused many system breakages overall.
> 
> Understood, that makes sense.  I wonder if it's worth it to investigate
> techniques to improve the situation around optimized libraries.  Do you
> have any thoughts on what such an improvement might look like?

The hardware capabilities system works fine upstream, but doesn't work
for us because:
1) we want to be able to upgrade major upstream versions online (as
opposed to Fedora, for example)
2) we ship the optimized libraries in a different package

The various libc libraries need to be at the same version at all times;
this is especially true for ld.so vs libc.so. As we do not upgrade the
default libc and the optimized one at exactly the same time (they are in
different packages), we upgrade the default libc first, and the
Debian-specific nohwcap mechanism prevents the optimized library from
being used until it has also been upgraded.
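
To make that concrete, here is a rough sketch of the decision the
loader has to make. It is an illustration only, not the actual ld.so
code: the function names are hypothetical, and the marker path is a
parameter just so the sketch can be exercised without touching /etc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* True when the marker file exists, meaning "an upgrade is in progress,
   ignore optimized libraries".  On a real system the path would be
   /etc/ld.so.nohwcap. */
static bool nohwcap_active(const char *marker_path)
{
    return access(marker_path, F_OK) == 0;
}

/* Writes into buf the directory libc should be loaded from and returns
   buf.  base and hwcap_subdir are illustrative values such as
   "/lib/aarch64-linux-gnu" and "atomics". */
static const char *choose_libdir(char *buf, size_t len, const char *base,
                                 const char *hwcap_subdir, bool cpu_supports,
                                 const char *marker_path)
{
    assert(len >= strlen(base) + 1 + strlen(hwcap_subdir) + 1);
    if (cpu_supports && !nohwcap_active(marker_path))
        snprintf(buf, len, "%s/%s", base, hwcap_subdir);  /* optimized build */
    else
        snprintf(buf, len, "%s", base);                   /* default build */
    return buf;
}
```

The optimized directory is only chosen when the CPU has the feature
and the marker file is absent; creating the marker before the upgrade
and removing it once both packages are at the same version forces
every newly started process back to the default libc in between.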

One solution for this would be to ship the optimized library in the same
package as the default library. Now this is not acceptable for embedded
systems as they might not need that library and can't remove it. This is
even more problematic if we need to add more optimized libraries. I guess
this might be the case for arm64 as there are many new extensions in the
pipe.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-04-22 Thread Steve Capper
On Wed, Apr 22, 2020 at 05:48:27PM +0100, Steve McIntyre wrote:
> Hi folks!

Hiya,

> 
> I'm adding a CC to Steve Capper, a colleague in Arm who's our expert
> here for this kind of question. He's also a DM in Debian... :-)

Now I feel guilty about not doing enough Debian :-).

> 
> On Tue, Apr 21, 2020 at 06:37:07PM -0400, Noah Meyerhans wrote:
> >On Sun, Apr 12, 2020 at 12:18:35PM +0200, Aurelien Jarno wrote:
> >
> >> It would also be nice to have numbers to see the impact on non-ARMv8.1
> >> CPU on real workloads. As pointed out by Florian, and if the impact is
> >> negligible, it might be a good idea to enable -moutline-atomics
> >> globally at the GCC level so that all software can benefit from it, and
> >> instead of only glibc. That could be either upstream or only in Debian,
> >> that's probably a separate discussion. Otherwise we will likely end up
> >> using this non-default GCC option on all packages that runs faster with
> >> it.
> >
> >Agreed.
> 
> I think the -moutline-atomics is probably good to enable by default
> once we've got it (gcc 10). that's the suggestion I've heard from gcc
> folks in Arm.
> 
> >> Also note that the mechanism allowing a safe upgrade *does* incur a 
> >> runtime overhead as every binary now has to test for the presence of
> >> /etc/ld.so.nohwcap to detect a possible upgrade of the glibc in
> >> progress. That's why we have disabled it on architecture not providing
> >> an optimized library [1].
> 
> Oh, ick. :-/
> 
> >Thanks for the pointer, it's interesting to see data on that.  This also
> >suggests that it might be worthwhile to investigate a better mechanism
> >for identifying the availability of hardware features.
> >
> >> > I've tested both options and found them to be acceptable on v8.1a
> >> > (Neoverse N1) and v8a (Cortex A72) CPUs.  I can provide bulk test
> >> > run data of the various different configuration permutations if
> >> > you'd like to see additional data.

That's good to hear!

> >> 
> >> As said above I think we would need more numbers on real workload to
> >> take a decision. Don't get me wrong I do not oppose on improving atomics
> >> on ARMv8.1, but I would like that we chose the best option. Also if we
> >> go with the -moutline-atomics option, I believe it rather has to be a
> >> ARM porters decision than a glibc maintainers decision (hence the Cc:).
> >
> >I'll see what I can come up with.
> >
> >Do the arm porters have any opinions on this matter?
> 
> It's a good question, and thanks for asking! I definitely think it's
> worth doing -moutline-atomics, and I'm hoping Steve can share some
> performance numbers to help convince. :-)
> 

We ran -moutline-atomics on a mixture of development hardware running,
IIRC, some DPDK lock tests that employed C11-style atomics. As expected
there was a performance penalty, but it was on the order of 1%.
The perf boost from moving to LSE was a lot larger (and we noticed the
variance dropping a lot with LSE too).

FWIW, I'd recommend -moutline-atomics for the general case. (I used
to be a fan of the multi-lib approach, but the way the runtime selection
is implemented in gcc with a direct branch changed my mind :-) ).

Cheers,
-- 
Steve



Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-04-22 Thread Florian Weimer
* Noah Meyerhans:

> On Sun, Apr 12, 2020 at 12:18:35PM +0200, Aurelien Jarno wrote:
>> > Significant performance impact has also been observed in less contrived
>> > cases (MariaDB and Postgres), but I don't have a repro to share.
>> 
>> But indeed what counts is number on real workloads. It would be nice to
>> get numbers when those software are run against a rebuilt glibc. As
>> those software are using a lot of atomics directly, it would be also
>> interesting to have numbers with those software also rebuilt to use
>> those new instructions.
>
> Agreed.  I don't have specific examples of real world impact at the
> moment.  AIUI, the most significant impact comes in the usage of atomics
> in pthread_mutex_lock().  When there are multiple threads contending for
> a lock, one thread will (approximately) always obtain the lock, while
> the others will starve.  With atomics support in place, the probability
> of obtaining the lock is roughly evenly distributed among all the
> threads.  So any workload in which multiple threads may contend for a
> lock should be a candidate to demonstrate this problem in the real
> world.

Does this behavior affect just one implementation with LSE, or also
implementations without LSE?

If the latter, we might need a different mutex implementation for
AArch64. 8-(



Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-04-22 Thread Steve McIntyre
On Wed, Apr 22, 2020 at 01:08:46PM -0400, Noah Meyerhans wrote:
>On Wed, Apr 22, 2020 at 05:48:27PM +0100, Steve McIntyre wrote:
>> I think the -moutline-atomics is probably good to enable by default
>> once we've got it (gcc 10). that's the suggestion I've heard from gcc
>> folks in Arm.
>
>JFTR, it's been backported to gcc 9 and is available in Debian's gcc-9
>as of 9.3.0-9. See
>https://salsa.debian.org/toolchain-team/gcc/-/blob/gcc-9-debian/debian/patches/git-updates.diff

Ah, cool. I knew it *was* being backported, but I wasn't aware it was
already with us. Woot!

-- 
Steve McIntyre, Cambridge, UK.  st...@einval.com
  Getting a SCSI chain working is perfectly simple if you remember that there
  must be exactly three terminations: one on one end of the cable, one on the
  far end, and the goat, terminated over the SCSI chain with a silver-handled
  knife whilst burning *black* candles. --- Anthony DeBoer



Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-04-22 Thread Noah Meyerhans
On Wed, Apr 22, 2020 at 05:48:27PM +0100, Steve McIntyre wrote:
> I think the -moutline-atomics is probably good to enable by default
> once we've got it (gcc 10). that's the suggestion I've heard from gcc
> folks in Arm.

JFTR, it's been backported to gcc 9 and is available in Debian's gcc-9
as of 9.3.0-9. See
https://salsa.debian.org/toolchain-team/gcc/-/blob/gcc-9-debian/debian/patches/git-updates.diff

noah



Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-04-22 Thread Wookey
On 2020-04-12 12:18 +0200, Aurelien Jarno wrote:

> The problem is not building a second
> optimized glibc, but rather providing a safe upgrade as the optimized
> and the non-optimized package have to be at the same version or one of
> them has to be disabled. This has caused many system breakages overall.
> 
> Also note that the mechanism allowing a safe upgrade *does* incur a 
> runtime overhead as every binary now has to test for the presence of
> /etc/ld.so.nohwcap to detect a possible upgrade of the glibc in
> progress. That's why we have disabled it on architecture not providing
> an optimized library [1].

Can you explain how this works please? I'm not familiar with this and
it seems like something worth understanding in this context.

How often is each binary checking for this file, and what exactly does
it indicate? And which binaries are checking?

Wookey
-- 
Principal hats:  Linaro, Debian, Wookware, ARM
http://wookware.org/




Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-04-22 Thread Steve McIntyre
Hi folks!

I'm adding a CC to Steve Capper, a colleague in Arm who's our expert
here for this kind of question. He's also a DM in Debian... :-)

On Tue, Apr 21, 2020 at 06:37:07PM -0400, Noah Meyerhans wrote:
>On Sun, Apr 12, 2020 at 12:18:35PM +0200, Aurelien Jarno wrote:
>
>> It would also be nice to have numbers to see the impact on non-ARMv8.1
>> CPU on real workloads. As pointed out by Florian, and if the impact is
>> negligible, it might be a good idea to enable -moutline-atomics
>> globally at the GCC level so that all software can benefit from it, and
>> instead of only glibc. That could be either upstream or only in Debian,
>> that's probably a separate discussion. Otherwise we will likely end up
>> using this non-default GCC option on all packages that runs faster with
>> it.
>
>Agreed.

I think -moutline-atomics is probably good to enable by default
once we've got it (gcc 10). That's the suggestion I've heard from gcc
folks in Arm.

>> Also note that the mechanism allowing a safe upgrade *does* incur a 
>> runtime overhead as every binary now has to test for the presence of
>> /etc/ld.so.nohwcap to detect a possible upgrade of the glibc in
>> progress. That's why we have disabled it on architecture not providing
>> an optimized library [1].

Oh, ick. :-/

>Thanks for the pointer, it's interesting to see data on that.  This also
>suggests that it might be worthwhile to investigate a better mechanism
>for identifying the availability of hardware features.
>
>> > I've tested both options and found them to be acceptable on v8.1a (Neoverse
>> > N1) and v8a (Cortex A72) CPUs.  I can provide bulk test run data of the
>> > various different configuration permutations if you'd like to see
>> > additional data.
>> 
>> As said above I think we would need more numbers on real workload to
>> take a decision. Don't get me wrong I do not oppose on improving atomics
>> on ARMv8.1, but I would like that we chose the best option. Also if we
>> go with the -moutline-atomics option, I believe it rather has to be a
>> ARM porters decision than a glibc maintainers decision (hence the Cc:).
>
>I'll see what I can come up with.
>
>Do the arm porters have any opinions on this matter?

It's a good question, and thanks for asking! I definitely think it's
worth doing -moutline-atomics, and I'm hoping Steve can share some
performance numbers to help convince. :-)

-- 
Steve McIntyre, Cambridge, UK.  st...@einval.com
Who needs computer imagery when you've got Brian Blessed?



Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-04-21 Thread Noah Meyerhans
On Sun, Apr 12, 2020 at 12:18:35PM +0200, Aurelien Jarno wrote:
> > Significant performance impact has also been observed in less contrived
> > cases (MariaDB and Postgres), but I don't have a repro to share.
> 
> But indeed what counts is number on real workloads. It would be nice to
> get numbers when those software are run against a rebuilt glibc. As
> those software are using a lot of atomics directly, it would be also
> interesting to have numbers with those software also rebuilt to use
> those new instructions.

Agreed.  I don't have specific examples of real world impact at the
moment.  AIUI, the most significant impact comes in the usage of atomics
in pthread_mutex_lock().  When there are multiple threads contending for
a lock, one thread will (approximately) always obtain the lock, while
the others will starve.  With atomics support in place, the probability
of obtaining the lock is roughly evenly distributed among all the
threads.  So any workload in which multiple threads may contend for a
lock should be a candidate to demonstrate this problem in the real
world.
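
A minimal sketch of how one might measure that is below. It is not the
sample code attached to the bug report; the names (run_contention,
struct contention, MAX_THREADS) are made up here. Build with -pthread.
It hands out a fixed number of lock acquisitions and records how many
each thread got.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

struct contention {
    pthread_mutex_t lock;
    long remaining;            /* acquisitions still to hand out */
    long wins[MAX_THREADS];    /* per-thread acquisition counts */
};

struct worker {
    struct contention *c;
    int id;
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    for (;;) {
        pthread_mutex_lock(&w->c->lock);
        if (w->c->remaining == 0) {
            pthread_mutex_unlock(&w->c->lock);
            return NULL;
        }
        w->c->remaining--;
        w->c->wins[w->id]++;
        pthread_mutex_unlock(&w->c->lock);
    }
}

/* Runs nthreads workers contending for `total` acquisitions, copies the
   per-thread counts to wins_out, and returns 0 on success or -1 if a
   thread or the mutex could not be created. */
static int run_contention(int nthreads, long total, long *wins_out)
{
    struct contention c = { .remaining = total };
    struct worker w[MAX_THREADS];
    pthread_t tid[MAX_THREADS];
    int started = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    if (pthread_mutex_init(&c.lock, NULL) != 0)
        return -1;
    for (int i = 0; i < nthreads; i++) {
        w[i] = (struct worker){ &c, i };
        if (pthread_create(&tid[i], NULL, worker_main, &w[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    pthread_mutex_destroy(&c.lock);
    for (int i = 0; i < nthreads; i++)
        wins_out[i] = c.wins[i];
    return started == nthreads ? 0 : -1;
}
```

The counts always add up to the total; what is interesting is their
spread. One thread holding nearly all of them would match the
starvation described above, while roughly equal counts would match the
behaviour with the new atomics.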

> It would also be nice to have numbers to see the impact on non-ARMv8.1
> CPU on real workloads. As pointed out by Florian, and if the impact is
> negligible, it might be a good idea to enable -moutline-atomics
> globally at the GCC level so that all software can benefit from it, and
> instead of only glibc. That could be either upstream or only in Debian,
> that's probably a separate discussion. Otherwise we will likely end up
> using this non-default GCC option on all packages that runs faster with
> it.

Agreed.

> To be honest from a glibc maintenance point of view it's something I
> would like to avoid. We haven't been actively trying to remove the
> remaining optimized libraries (on i386, hurd and alpha), but we have
> tried to avoid adding new ones. The problem is not building a second
> optimized glibc, but rather providing a safe upgrade as the optimized
> and the non-optimized package have to be at the same version or one of
> them has to be disabled. This has caused many system breakages overall.

Understood, that makes sense.  I wonder if it's worth it to investigate
techniques to improve the situation around optimized libraries.  Do you
have any thoughts on what such an improvement might look like?

> Also note that the mechanism allowing a safe upgrade *does* incur a 
> runtime overhead as every binary now has to test for the presence of
> /etc/ld.so.nohwcap to detect a possible upgrade of the glibc in
> progress. That's why we have disabled it on architecture not providing
> an optimized library [1].

Thanks for the pointer, it's interesting to see data on that.  This also
suggests that it might be worthwhile to investigate a better mechanism
for identifying the availability of hardware features.

> > I've tested both options and found them to be acceptable on v8.1a (Neoverse
> > N1) and v8a (Cortex A72) CPUs.  I can provide bulk test run data of the
> > various different configuration permutations if you'd like to see additional
> > data.
> 
> As said above I think we would need more numbers on real workload to
> take a decision. Don't get me wrong I do not oppose on improving atomics
> on ARMv8.1, but I would like that we chose the best option. Also if we
> go with the -moutline-atomics option, I believe it rather has to be a
> ARM porters decision than a glibc maintainers decision (hence the Cc:).

I'll see what I can come up with.

Do the arm porters have any opinions on this matter?

noah



Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-04-12 Thread Aurelien Jarno
Hi,

On 2020-04-10 13:16, Noah Meyerhans wrote:
> Package: src:glibc
> Version: 2.30-4
> Severity: wishlist
> X-Debbugs-CC: debian-arm@lists.debian.org
> 
> The ARMv8.1 spec, as implemented by the ARM Neoverse N1 processor,
> introduces a set of instructions [1] that result in significant performance
> improvements for multithreaded applications.  Sample code demonstrating the
> performance improvements is attached.  When run on a 16-core Neoverse N1
> host with glibc 2.30-4, runtimes vary significantly, ranging from lows
> around 250ms to highs around 15 seconds.  When linked against glibc rebuilt
> with support for these instructions, runtimes are consistently <50ms.

This is an impressive improvement!

> Significant performance impact has also been observed in less contrived
> cases (MariaDB and Postgres), but I don't have a repro to share.

But indeed what counts is numbers on real workloads. It would be nice to
get numbers when those programs are run against a rebuilt glibc. As
they use a lot of atomics directly, it would also be interesting to
have numbers with the programs themselves rebuilt to use the new
instructions.

> Gcc provides two ways to enable support for these instructions at build
> time.  The simplest, and least disruptive, is to enable -moutline-atomics
> globally in the arm64 glibc build.  As described at [2], this option enables
> runtime checks for the availability of the atomic instructions.  If found,
> they are used, otherwise ARMv8.0 compatible code is used.  The drawback of
> this option is that the check happens at runtime, thus introducing some
> overhead on all arm64 installations.

It would also be nice to have numbers showing the impact on non-ARMv8.1
CPUs on real workloads. As pointed out by Florian, if the impact is
negligible, it might be a good idea to enable -moutline-atomics
globally at the GCC level so that all software can benefit from it,
instead of only glibc. That could be done either upstream or only in
Debian; that's probably a separate discussion. Otherwise we will likely
end up using this non-default GCC option on every package that runs
faster with it.
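
For reference, the runtime check being discussed looks roughly like
the sketch below: probe the CPU once, cache the answer, and branch on
it for every atomic operation. This is only an illustration of the
technique, not libgcc's code. The function names are hypothetical,
and both paths use compiler builtins as stand-ins for the real
single-instruction and retry-loop sequences.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#ifndef HWCAP_ATOMICS
#define HWCAP_ATOMICS (1UL << 8)   /* arm64 Linux hwcap bit for LSE atomics */
#endif
#endif

/* Asks the kernel whether the CPU implements the ARMv8.1 atomics.
   Always false on other platforms in this sketch. */
static bool cpu_has_lse(void)
{
#if defined(__aarch64__) && defined(__linux__)
    return (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
    return false;
#endif
}

/* Cached probe result: -1 = not probed yet, 0 = no LSE, 1 = LSE. */
static int lse_cached = -1;

/* Stand-in for the ARMv8.0 path: a compare-and-swap retry loop. */
static uint32_t fetch_add_retry_loop(uint32_t *p, uint32_t v)
{
    uint32_t old = __atomic_load_n(p, __ATOMIC_RELAXED);
    while (!__atomic_compare_exchange_n(p, &old, old + v, true,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
        ;   /* a failed exchange refreshed `old`; try again */
    return old;
}

/* Stand-in for the ARMv8.1 path: a single atomic read-modify-write. */
static uint32_t fetch_add_single_op(uint32_t *p, uint32_t v)
{
    return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}

/* The out-of-line helper: one cached flag, one branch per operation. */
uint32_t outlined_fetch_add(uint32_t *p, uint32_t v)
{
    assert(p != NULL);
    int have = __atomic_load_n(&lse_cached, __ATOMIC_RELAXED);
    if (have < 0) {
        have = cpu_has_lse();
        __atomic_store_n(&lse_cached, have, __ATOMIC_RELAXED);
    }
    return have ? fetch_add_single_op(p, v) : fetch_add_retry_loop(p, v);
}
```

The overhead in question is the extra call and the branch on the
cached flag, paid on every atomic operation whether or not the CPU has
the new instructions.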

> The second option is to provide libraries built with explicit support for
> the ARM v8.1a spec via the -march=armv8.1-a flag.  This option is also
> described at [2].  This build would be incompatible with earlier versions of
> the spec, so it would need to be provided in a location where the linker
> will automatically discover it if it is usable (e.g.
> /lib/aarch64-linux-gnu/atomics/).  This does not incur any runtime overhead,
> but obviously involves an additional libc build, and the corresponding
> complexity and disk space utilization.  I'm not sure if this is an option
> that the glibc maintainers are interested in pursuing.

To be honest from a glibc maintenance point of view it's something I
would like to avoid. We haven't been actively trying to remove the
remaining optimized libraries (on i386, hurd and alpha), but we have
tried to avoid adding new ones. The problem is not building a second
optimized glibc, but rather providing a safe upgrade as the optimized
and the non-optimized package have to be at the same version or one of
them has to be disabled. This has caused many system breakages overall.

Also note that the mechanism allowing a safe upgrade *does* incur a
runtime overhead, as every binary now has to test for the presence of
/etc/ld.so.nohwcap to detect a possible glibc upgrade in progress.
That's why we have disabled it on architectures that do not provide
an optimized library [1].

> I've tested both options and found them to be acceptable on v8.1a (Neoverse
> N1) and v8a (Cortex A72) CPUs.  I can provide bulk test run data of the
> various different configuration permutations if you'd like to see additional
> data.

As said above, I think we would need more numbers on real workloads to
make a decision. Don't get me wrong, I am not opposed to improving
atomics on ARMv8.1, but I would like us to choose the best option. Also,
if we go with the -moutline-atomics option, I believe it has to be an
ARM porters' decision rather than a glibc maintainers' decision (hence
the Cc:).

> I can provide patches or merge requests implementing either option, at least
> for a starting point, if you'd like to see them.

Thanks for this offer, but I don't think that's the most difficult part;
it's fairly straightforward to go for either of those options once a
decision is taken.

Regards,
Aurelien

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=908928

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net