Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1
On 2020-05-07 13:04, Noah Meyerhans wrote:
> On Wed, May 06, 2020 at 04:15:09PM +0200, Aurelien Jarno wrote:
> > > >One solution for this would be to ship the optimized library in the same
> > > >package as the default library. Now this is not acceptable for embedded
> > > >systems as they might not need that library and can't remove it. This is
> > > >even more problematic if we need to add more optimized libraries. I guess
> > > >this might be the case for arm64 as there are many new extensions in the
> > > >pipe.
> > >
> > > ACK. It's a problem to ship the different things in separate
> > > packages. If it's really a problem for smaller systems to have all the
> > > variants because of size, is there maybe another way to do things? How
> > > about keeping the existing libc and have an extra package
> > > ("libc-optimised") with all the optimised versions *and* the basic
> > > version, and have it provide/replace/conflict libc6?
> > >
> > > (/me prepares to be embarrassed as you point out the obvious flaw I'm
> > > missing...)
> >
> > I guess that the provide/replace/conflict libc6 will just prevent
> > installation of foreign libc6 packages, basically making this optimized
> > package useless in the multiarch context.
> >
> > OTOH, what is the drawback of having GCC defaulting to -moutline-atomics?
> > It will improve performance on many more packages than only glibc, and
> > is way easier to implement overall. It also means users have nothing to
> > do to get additional performance.
>
> For the current issue, defaulting to -moutline-atomics might be a sane
> approach. As you said earlier, though, it seems that there are many new
> extensions in the pipe for ARM. There may not be an equivalent solution
> for all of them, and even if there is, at some point the runtime
> overhead of all this conditional code is going to add up to something
> meaningful.

If we are talking about future extensions, another option for some of
them is to use ifunc. It's how the various SSE and AVX extensions are
supported on x86, and how NEON is supported on armv7.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net
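To make the contrast between the two approaches concrete, here is a small conceptual model. It is only a sketch: real ifunc resolution is done by the dynamic loader for compiled code, and real outline atomics are compiled helper routines, so Python can only mimic the control flow. All function names below are hypothetical, and both "implementations" are stand-ins that compute the same result.

```python
from typing import Callable


def add_baseline(value: int, n: int) -> int:
    """Stand-in for the code path that works on every CPU."""
    return value + n


def add_extended(value: int, n: int) -> int:
    """Stand-in for the code path that would use newer instructions."""
    return value + n


def make_checked_add(have_extension: bool) -> Callable[[int, int], int]:
    """Conditional-code style: one entry point that tests a flag on every call."""

    def checked_add(value: int, n: int) -> int:
        if have_extension:  # evaluated again on each call
            return add_extended(value, n)
        return add_baseline(value, n)

    return checked_add


def resolve_add(have_extension: bool) -> Callable[[int, int], int]:
    """ifunc style: a resolver runs once and returns the implementation to bind."""
    return add_extended if have_extension else add_baseline


# The resolver is consulted once; later calls go straight to the chosen function.
add = resolve_add(have_extension=True)
```

In the first style every call pays for the flag test; in the second the choice is made once when the function is bound, which is why ifunc is attractive when many such extensions accumulate.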
Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1
On Wed, May 06, 2020 at 04:15:09PM +0200, Aurelien Jarno wrote:
> > >One solution for this would be to ship the optimized library in the same
> > >package as the default library. Now this is not acceptable for embedded
> > >systems as they might not need that library and can't remove it. This is
> > >even more problematic if we need to add more optimized libraries. I guess
> > >this might be the case for arm64 as there are many new extensions in the
> > >pipe.
> >
> > ACK. It's a problem to ship the different things in separate
> > packages. If it's really a problem for smaller systems to have all the
> > variants because of size, is there maybe another way to do things? How
> > about keeping the existing libc and have an extra package
> > ("libc-optimised") with all the optimised versions *and* the basic
> > version, and have it provide/replace/conflict libc6?
> >
> > (/me prepares to be embarrassed as you point out the obvious flaw I'm
> > missing...)
>
> I guess that the provide/replace/conflict libc6 will just prevent
> installation of foreign libc6 packages, basically making this optimized
> package useless in the multiarch context.
>
> OTOH, what is the drawback of having GCC defaulting to -moutline-atomics?
> It will improve performance on many more packages than only glibc, and
> is way easier to implement overall. It also means users have nothing to
> do to get additional performance.

For the current issue, defaulting to -moutline-atomics might be a sane
approach. As you said earlier, though, it seems that there are many new
extensions in the pipe for ARM. There may not be an equivalent solution
for all of them, and even if there is, at some point the runtime
overhead of all this conditional code is going to add up to something
meaningful.

noah
Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1
On Wed, May 06, 2020 at 01:56:24PM +0100, Steve McIntyre wrote:
>...
> On Sun, May 03, 2020 at 11:53:35PM +0200, Aurelien Jarno wrote:
> >
> >One solution for this would be to ship the optimized library in the same
> >package as the default library. Now this is not acceptable for embedded
> >systems as they might not need that library and can't remove it. This is
> >even more problematic if we need to add more optimized libraries. I guess
> >this might be the case for arm64 as there are many new extensions in the
> >pipe.
>
> ACK. It's a problem to ship the different things in separate
> packages. If it's really a problem for smaller systems to have all the
> variants because of size, is there maybe another way to do things? How
> about keeping the existing libc and have an extra package
> ("libc-optimised") with all the optimised versions *and* the basic
> version, and have it provide/replace/conflict libc6?
>...

What Noah mentioned for a similar proposal also applies here:

On Mon, May 04, 2020 at 02:45:41PM -0400, Noah Meyerhans wrote:
>...
> I don't know how well dpkg would cope with transitioning
> between providers, which seems like the riskiest side of this kind of
> thing.

I'd guess you could make this an installation-only change with a few
hacks here and there, but once you think that through with all the
followup-hacks required it doesn't sound like a good idea.

cu
Adrian
Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1
On 2020-05-06 13:56, Steve McIntyre wrote:
> Hey Aurelien,
>
> On Sun, May 03, 2020 at 11:53:35PM +0200, Aurelien Jarno wrote:
> >
> >One solution for this would be to ship the optimized library in the same
> >package as the default library. Now this is not acceptable for embedded
> >systems as they might not need that library and can't remove it. This is
> >even more problematic if we need to add more optimized libraries. I guess
> >this might be the case for arm64 as there are many new extensions in the
> >pipe.
>
> ACK. It's a problem to ship the different things in separate
> packages. If it's really a problem for smaller systems to have all the
> variants because of size, is there maybe another way to do things? How
> about keeping the existing libc and have an extra package
> ("libc-optimised") with all the optimised versions *and* the basic
> version, and have it provide/replace/conflict libc6?
>
> (/me prepares to be embarrassed as you point out the obvious flaw I'm
> missing...)

I guess that the provide/replace/conflict libc6 will just prevent
installation of foreign libc6 packages, basically making this optimized
package useless in the multiarch context.

OTOH, what is the drawback of having GCC defaulting to -moutline-atomics?
It will improve performance on many more packages than only glibc, and
is way easier to implement overall. It also means users have nothing to
do to get additional performance.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net
Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1
Hey Aurelien,

On Sun, May 03, 2020 at 11:53:35PM +0200, Aurelien Jarno wrote:
>
>One solution for this would be to ship the optimized library in the same
>package as the default library. Now this is not acceptable for embedded
>systems as they might not need that library and can't remove it. This is
>even more problematic if we need to add more optimized libraries. I guess
>this might be the case for arm64 as there are many new extensions in the
>pipe.

ACK. It's a problem to ship the different things in separate
packages. If it's really a problem for smaller systems to have all the
variants because of size, is there maybe another way to do things? How
about keeping the existing libc and have an extra package
("libc-optimised") with all the optimised versions *and* the basic
version, and have it provide/replace/conflict libc6?

(/me prepares to be embarrassed as you point out the obvious flaw I'm
missing...)

-- 
Steve McIntyre, Cambridge, UK.                                st...@einval.com
"... the premise [is] that privacy is about hiding a wrong. It's not.
 Privacy is an inherent human right, and a requirement for maintaining
 the human condition with dignity and respect."
  -- Bruce Schneier
Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1
On Mon, May 04, 2020 at 02:45:41PM -0400, Noah Meyerhans wrote:
>...
> I wonder if it'd make sense for libc to be a virtual package, with
> functionality provided by optimized builds and dependencies satisfied
> via Provides. I don't know how well dpkg would cope with transitioning
> between providers, which seems like the riskiest side of this kind of
> thing.

What would happen if apt finds a dependency solution when installing
or updating packages that involves switching to a libc package that
does not run on your device?

There are situations where changing the libc package would be the only
possible solution of the dependencies.

IMHO there are far too many ways in which such a virtual package
solution could brick devices.

> noah

cu
Adrian
Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1
On Sun, May 03, 2020 at 11:53:35PM +0200, Aurelien Jarno wrote:
> The hardware capabilities system works fine upstream, but doesn't work
> for us because:
> 1) we want to be able to upgrade major upstream version online (as
>    opposed to fedora for example)
> 2) we ship the optimized libraries in a different package
>
> The various libc libraries need to have the same version at any time,
> this is especially true for ld.so vs libc.so. As we do not upgrade the
> default libc and the optimized one exactly at the same time (they are in
> different packages), we first upgrade the default libc and then we have
> the Debian specific nohwcap mechanism to prevent using the optimized
> library until it has also been upgraded.
>
> One solution for this would be to ship the optimized library in the same
> package as the default library. Now this is not acceptable for embedded
> systems as they might not need that library and can't remove it. This is
> even more problematic if we need to add more optimized libraries. I guess
> this might be the case for arm64 as there are many new extensions in the
> pipe.

Thanks for taking the time to explain that!

I wonder if it'd make sense for libc to be a virtual package, with
functionality provided by optimized builds and dependencies satisfied
via Provides. I don't know how well dpkg would cope with transitioning
between providers, which seems like the riskiest side of this kind of
thing.

noah
Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1
On 2020-04-21 18:37, Noah Meyerhans wrote:
> > To be honest from a glibc maintenance point of view it's something I
> > would like to avoid. We haven't been actively trying to remove the
> > remaining optimized libraries (on i386, hurd and alpha), but we have
> > tried to avoid adding new ones. The problem is not building a second
> > optimized glibc, but rather providing a safe upgrade as the optimized
> > and the non-optimized package have to be at the same version or one of
> > them has to be disabled. This has caused many system breakages overall.
>
> Understood, that makes sense. I wonder if it's worth it to investigate
> techniques to improve the situation around optimized libraries. Do you
> have any thoughts on what such an improvement might look like?

The hardware capabilities system works fine upstream, but doesn't work
for us because:
1) we want to be able to upgrade major upstream version online (as
   opposed to fedora for example)
2) we ship the optimized libraries in a different package

The various libc libraries need to have the same version at any time,
this is especially true for ld.so vs libc.so. As we do not upgrade the
default libc and the optimized one exactly at the same time (they are in
different packages), we first upgrade the default libc and then we have
the Debian specific nohwcap mechanism to prevent using the optimized
library until it has also been upgraded.

One solution for this would be to ship the optimized library in the same
package as the default library. Now this is not acceptable for embedded
systems as they might not need that library and can't remove it. This is
even more problematic if we need to add more optimized libraries. I guess
this might be the case for arm64 as there are many new extensions in the
pipe.

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net
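The upgrade ordering described above can be walked through with a toy model. This is a sketch under stated assumptions, not the dynamic loader's real logic: the function names, the version strings, and the exact moment the nohwcap flag is set and cleared are all illustrative, and the model assumes ld.so ships in the default package. It only shows why, with the flag in place, the loader and the libc it picks stay at the same version throughout.

```python
from typing import Dict, List, Tuple


def pick_library(cpu_has_feature: bool, nohwcap_present: bool) -> str:
    """Return which libc variant would be used in this toy model."""
    if cpu_has_feature and not nohwcap_present:
        return "optimized"
    return "default"


def simulate_upgrade(
    old: str, new: str, cpu_has_feature: bool = True
) -> List[Tuple[str, str, str, str]]:
    """Each entry is (step, variant used, libc version, loader version)."""
    installed: Dict[str, str] = {"default": old, "optimized": old}
    nohwcap = False
    trace: List[Tuple[str, str, str, str]] = []

    def record(step: str) -> None:
        variant = pick_library(cpu_has_feature, nohwcap)
        # Assumption: ld.so lives in the default package and tracks its version.
        trace.append((step, variant, installed[variant], installed["default"]))

    record("before upgrade")

    # The default package is upgraded first; the flag (standing in for
    # /etc/ld.so.nohwcap) keeps the stale optimized library from being used.
    nohwcap = True
    installed["default"] = new
    record("default upgraded")

    # Once the optimized package has caught up, the flag can go away.
    installed["optimized"] = new
    nohwcap = False
    record("optimized upgraded")

    return trace


for entry in simulate_upgrade("2.30", "2.31"):
    print(entry)
```

Removing the flag from the middle step in this model would pair a new loader with an old optimized libc, which is the mismatch the mechanism exists to prevent.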
Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1
On Wed, Apr 22, 2020 at 05:48:27PM +0100, Steve McIntyre wrote:
> Hi folks!

Hiya,

>
> I'm adding a CC to Steve Capper, a colleague in Arm who's our expert
> here for this kind of question. He's also a DM in Debian... :-)

Now I feel guilty about not doing enough Debian :-).

>
> On Tue, Apr 21, 2020 at 06:37:07PM -0400, Noah Meyerhans wrote:
> >On Sun, Apr 12, 2020 at 12:18:35PM +0200, Aurelien Jarno wrote:
> >
> >> It would also be nice to have numbers to see the impact on non-ARMv8.1
> >> CPUs on real workloads. As pointed out by Florian, and if the impact is
> >> negligible, it might be a good idea to enable -moutline-atomics
> >> globally at the GCC level so that all software can benefit from it,
> >> instead of only glibc. That could be either upstream or only in Debian,
> >> that's probably a separate discussion. Otherwise we will likely end up
> >> using this non-default GCC option on all packages that run faster with
> >> it.
> >
> >Agreed.
>
> I think the -moutline-atomics is probably good to enable by default
> once we've got it (gcc 10). That's the suggestion I've heard from gcc
> folks in Arm.
>
> >> Also note that the mechanism allowing a safe upgrade *does* incur a
> >> runtime overhead as every binary now has to test for the presence of
> >> /etc/ld.so.nohwcap to detect a possible upgrade of the glibc in
> >> progress. That's why we have disabled it on architectures not providing
> >> an optimized library [1].
>
> Oh, ick. :-/
>
> >Thanks for the pointer, it's interesting to see data on that. This also
> >suggests that it might be worthwhile to investigate a better mechanism
> >for identifying the availability of hardware features.
> >
> >> > I've tested both options and found them to be acceptable on v8.1a
> >> > (Neoverse N1) and v8a (Cortex A72) CPUs. I can provide bulk test run
> >> > data of the various different configuration permutations if you'd
> >> > like to see additional data.

That's good to hear!

> >>
> >> As said above I think we would need more numbers on real workloads to
> >> take a decision. Don't get me wrong, I am not opposed to improving atomics
> >> on ARMv8.1, but I would like us to choose the best option. Also if we
> >> go with the -moutline-atomics option, I believe it has to be an
> >> ARM porters decision rather than a glibc maintainers decision (hence
> >> the Cc:).
> >
> >I'll see what I can come up with.
> >
> >Do the arm porters have any opinions on this matter?
>
> It's a good question, and thanks for asking! I definitely think it's
> worth doing -moutline-atomics, and I'm hoping Steve can share some
> performance numbers to help convince. :-)
>

We ran -moutline-atomics on a mixture of development hardware running,
IIRC, some DPDK lock tests that employed C11-style atomics. As expected
there was a performance penalty, but it was on the order of 1%. The perf
boost from moving to LSE was a lot larger (and we noticed the variance
dropping a lot with LSE too).

FWIW, I'd recommend -moutline-atomics for the general case. (I used to
be a fan of the multi-lib approach, but the way the runtime selection
is implemented in gcc with a direct branch changed my mind :-) ).

Cheers,
-- 
Steve
Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1
* Noah Meyerhans:

> On Sun, Apr 12, 2020 at 12:18:35PM +0200, Aurelien Jarno wrote:
>> > Significant performance impact has also been observed in less contrived
>> > cases (MariaDB and Postgres), but I don't have a repro to share.
>>
>> But indeed what counts is numbers on real workloads. It would be nice to
>> get numbers when that software is run against a rebuilt glibc. As
>> that software uses a lot of atomics directly, it would also be
>> interesting to have numbers with it also rebuilt to use
>> those new instructions.
>
> Agreed. I don't have specific examples of real world impact at the
> moment. AIUI, the most significant impact comes in the usage of atomics
> in pthread_mutex_lock(). When there are multiple threads contending for
> a lock, one thread will (approximately) always obtain the lock, while
> the others will starve. With atomics support in place, the probability
> of obtaining the lock is roughly evenly distributed among all the
> threads. So any workload in which multiple threads may contend for a
> lock should be a candidate to demonstrate this problem in the real
> world.

Does this behavior affect just one implementation with LSE, or also
implementations without LSE? If the latter, we might need a different
mutex implementation for AArch64. 8-(
Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1
On Wed, Apr 22, 2020 at 01:08:46PM -0400, Noah Meyerhans wrote:
>On Wed, Apr 22, 2020 at 05:48:27PM +0100, Steve McIntyre wrote:
>> I think the -moutline-atomics is probably good to enable by default
>> once we've got it (gcc 10). That's the suggestion I've heard from gcc
>> folks in Arm.
>
>JFTR, it's been backported to gcc 9 and is available in Debian's gcc-9
>as of 9.3.0-9. See
>https://salsa.debian.org/toolchain-team/gcc/-/blob/gcc-9-debian/debian/patches/git-updates.diff

Ah, cool. I knew it *was* being backported, but I wasn't aware it was
already with us. Woot!

-- 
Steve McIntyre, Cambridge, UK.                                st...@einval.com
  Getting a SCSI chain working is perfectly simple if you remember that there
  must be exactly three terminations: one on one end of the cable, one on the
  far end, and the goat, terminated over the SCSI chain with a silver-handled
  knife whilst burning *black* candles. --- Anthony DeBoer
Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1
On Wed, Apr 22, 2020 at 05:48:27PM +0100, Steve McIntyre wrote:
> I think the -moutline-atomics is probably good to enable by default
> once we've got it (gcc 10). That's the suggestion I've heard from gcc
> folks in Arm.

JFTR, it's been backported to gcc 9 and is available in Debian's gcc-9
as of 9.3.0-9. See
https://salsa.debian.org/toolchain-team/gcc/-/blob/gcc-9-debian/debian/patches/git-updates.diff

noah
Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1
On 2020-04-12 12:18 +0200, Aurelien Jarno wrote:
> The problem is not building a second
> optimized glibc, but rather providing a safe upgrade as the optimized
> and the non-optimized package have to be at the same version or one of
> them has to be disabled. This has caused many system breakages overall.
>
> Also note that the mechanism allowing a safe upgrade *does* incur a
> runtime overhead as every binary now has to test for the presence of
> /etc/ld.so.nohwcap to detect a possible upgrade of the glibc in
> progress. That's why we have disabled it on architectures not providing
> an optimized library [1].

Can you explain how this works please? I'm not familiar with this and
it seems like something worth understanding in this context.

How often is each binary checking for this file, and what exactly does
it indicate? And which binaries are checking?

Wookey
-- 
Principal hats: Linaro, Debian, Wookware, ARM
http://wookware.org/
Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1
Hi folks!

I'm adding a CC to Steve Capper, a colleague in Arm who's our expert
here for this kind of question. He's also a DM in Debian... :-)

On Tue, Apr 21, 2020 at 06:37:07PM -0400, Noah Meyerhans wrote:
>On Sun, Apr 12, 2020 at 12:18:35PM +0200, Aurelien Jarno wrote:
>
>> It would also be nice to have numbers to see the impact on non-ARMv8.1
>> CPUs on real workloads. As pointed out by Florian, and if the impact is
>> negligible, it might be a good idea to enable -moutline-atomics
>> globally at the GCC level so that all software can benefit from it,
>> instead of only glibc. That could be either upstream or only in Debian,
>> that's probably a separate discussion. Otherwise we will likely end up
>> using this non-default GCC option on all packages that run faster with
>> it.
>
>Agreed.

I think the -moutline-atomics is probably good to enable by default
once we've got it (gcc 10). That's the suggestion I've heard from gcc
folks in Arm.

>> Also note that the mechanism allowing a safe upgrade *does* incur a
>> runtime overhead as every binary now has to test for the presence of
>> /etc/ld.so.nohwcap to detect a possible upgrade of the glibc in
>> progress. That's why we have disabled it on architectures not providing
>> an optimized library [1].

Oh, ick. :-/

>Thanks for the pointer, it's interesting to see data on that. This also
>suggests that it might be worthwhile to investigate a better mechanism
>for identifying the availability of hardware features.
>
>> > I've tested both options and found them to be acceptable on v8.1a
>> > (Neoverse N1) and v8a (Cortex A72) CPUs. I can provide bulk test run
>> > data of the various different configuration permutations if you'd
>> > like to see additional data.
>>
>> As said above I think we would need more numbers on real workloads to
>> take a decision. Don't get me wrong, I am not opposed to improving atomics
>> on ARMv8.1, but I would like us to choose the best option. Also if we
>> go with the -moutline-atomics option, I believe it has to be an
>> ARM porters decision rather than a glibc maintainers decision (hence
>> the Cc:).
>
>I'll see what I can come up with.
>
>Do the arm porters have any opinions on this matter?

It's a good question, and thanks for asking! I definitely think it's
worth doing -moutline-atomics, and I'm hoping Steve can share some
performance numbers to help convince. :-)

-- 
Steve McIntyre, Cambridge, UK.                                st...@einval.com
Who needs computer imagery when you've got Brian Blessed?
Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1
On Sun, Apr 12, 2020 at 12:18:35PM +0200, Aurelien Jarno wrote:
> > Significant performance impact has also been observed in less contrived
> > cases (MariaDB and Postgres), but I don't have a repro to share.
>
> But indeed what counts is numbers on real workloads. It would be nice to
> get numbers when that software is run against a rebuilt glibc. As
> that software uses a lot of atomics directly, it would also be
> interesting to have numbers with it also rebuilt to use
> those new instructions.

Agreed. I don't have specific examples of real world impact at the
moment. AIUI, the most significant impact comes in the usage of atomics
in pthread_mutex_lock(). When there are multiple threads contending for
a lock, one thread will (approximately) always obtain the lock, while
the others will starve. With atomics support in place, the probability
of obtaining the lock is roughly evenly distributed among all the
threads. So any workload in which multiple threads may contend for a
lock should be a candidate to demonstrate this problem in the real
world.

> It would also be nice to have numbers to see the impact on non-ARMv8.1
> CPUs on real workloads. As pointed out by Florian, and if the impact is
> negligible, it might be a good idea to enable -moutline-atomics
> globally at the GCC level so that all software can benefit from it,
> instead of only glibc. That could be either upstream or only in Debian,
> that's probably a separate discussion. Otherwise we will likely end up
> using this non-default GCC option on all packages that run faster with
> it.

Agreed.

> To be honest from a glibc maintenance point of view it's something I
> would like to avoid. We haven't been actively trying to remove the
> remaining optimized libraries (on i386, hurd and alpha), but we have
> tried to avoid adding new ones. The problem is not building a second
> optimized glibc, but rather providing a safe upgrade as the optimized
> and the non-optimized package have to be at the same version or one of
> them has to be disabled. This has caused many system breakages overall.

Understood, that makes sense. I wonder if it's worth it to investigate
techniques to improve the situation around optimized libraries. Do you
have any thoughts on what such an improvement might look like?

> Also note that the mechanism allowing a safe upgrade *does* incur a
> runtime overhead as every binary now has to test for the presence of
> /etc/ld.so.nohwcap to detect a possible upgrade of the glibc in
> progress. That's why we have disabled it on architectures not providing
> an optimized library [1].

Thanks for the pointer, it's interesting to see data on that. This also
suggests that it might be worthwhile to investigate a better mechanism
for identifying the availability of hardware features.

> > I've tested both options and found them to be acceptable on v8.1a (Neoverse
> > N1) and v8a (Cortex A72) CPUs. I can provide bulk test run data of the
> > various different configuration permutations if you'd like to see additional
> > data.
>
> As said above I think we would need more numbers on real workloads to
> take a decision. Don't get me wrong, I am not opposed to improving atomics
> on ARMv8.1, but I would like us to choose the best option. Also if we
> go with the -moutline-atomics option, I believe it has to be an
> ARM porters decision rather than a glibc maintainers decision (hence
> the Cc:).

I'll see what I can come up with.

Do the arm porters have any opinions on this matter?

noah
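One way to look for the starvation pattern described above is to count how often each thread wins a contended lock. The harness below is only a sketch of that measurement, and it is not the sample code attached to the bug report. It is written in Python, where the interpreter's own thread scheduling dominates, so its numbers say nothing about LSE or ARM hardware; a real test would use pthreads in C on the target machine. The function name and parameters are made up for illustration.

```python
import threading
from typing import List


def count_acquisitions(num_threads: int, total: int) -> List[int]:
    """Have num_threads threads compete for one lock until it has been
    taken `total` times; return how many times each thread won it."""
    lock = threading.Lock()
    remaining = [total]          # shared counter, protected by the lock
    counts = [0] * num_threads   # counts[i] is only written by thread i

    def worker(idx: int) -> None:
        while True:
            with lock:
                if remaining[0] == 0:
                    return
                remaining[0] -= 1
                counts[idx] += 1

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


if __name__ == "__main__":
    result = count_acquisitions(num_threads=4, total=10_000)
    # A very uneven distribution means some threads were starved.
    print(result, "max/min ratio:", max(result) / max(1, min(result)))
```

A roughly even list would correspond to the fair behaviour reported with the new atomics; one thread holding almost all of the acquisitions would correspond to the starvation case.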
Re: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1
Hi,

On 2020-04-10 13:16, Noah Meyerhans wrote:
> Package: src:glibc
> Version: 2.30-4
> Severity: wishlist
> X-Debbugs-CC: debian-arm@lists.debian.org
>
> The ARMv8.1 spec, as implemented by the ARM Neoverse N1 processor,
> introduces a set of instructions [1] that result in significant performance
> improvements for multithreaded applications. Sample code demonstrating the
> performance improvements is attached. When run on a 16-core Neoverse N1
> host with glibc 2.30-4, runtimes vary significantly, ranging from lows
> around 250ms to highs around 15 seconds. When linked against glibc rebuilt
> with support for these instructions, runtimes are consistently <50ms.

This is an impressive improvement!

> Significant performance impact has also been observed in less contrived
> cases (MariaDB and Postgres), but I don't have a repro to share.

But indeed what counts is numbers on real workloads. It would be nice to
get numbers when that software is run against a rebuilt glibc. As
that software uses a lot of atomics directly, it would also be
interesting to have numbers with it also rebuilt to use
those new instructions.

> Gcc provides two ways to enable support for these instructions at build
> time. The simplest, and least disruptive, is to enable -moutline-atomics
> globally in the arm64 glibc build. As described at [2], this option enables
> runtime checks for the availability of the atomic instructions. If found,
> they are used, otherwise ARMv8.0 compatible code is used. The drawback of
> this option is that the check happens at runtime, thus introducing some
> overhead on all arm64 installations.

It would also be nice to have numbers to see the impact on non-ARMv8.1
CPUs on real workloads. As pointed out by Florian, and if the impact is
negligible, it might be a good idea to enable -moutline-atomics
globally at the GCC level so that all software can benefit from it,
instead of only glibc. That could be either upstream or only in Debian,
that's probably a separate discussion. Otherwise we will likely end up
using this non-default GCC option on all packages that run faster with
it.

> The second option is to provide libraries built with explicit support for
> the ARM v8.1a spec via the -march=armv8.1-a flag. This option is also
> described at [2]. This build would be incompatible with earlier versions of
> the spec, so it would need to be provided in a location where the linker
> will automatically discover it if it is usable (e.g.
> /lib/aarch64-linux-gnu/atomics/). This does not incur any runtime overhead,
> but obviously involves an additional libc build, and the corresponding
> complexity and disk space utilization. I'm not sure if this is an option
> that the glibc maintainers are interested in pursuing.

To be honest from a glibc maintenance point of view it's something I
would like to avoid. We haven't been actively trying to remove the
remaining optimized libraries (on i386, hurd and alpha), but we have
tried to avoid adding new ones. The problem is not building a second
optimized glibc, but rather providing a safe upgrade as the optimized
and the non-optimized package have to be at the same version or one of
them has to be disabled. This has caused many system breakages overall.

Also note that the mechanism allowing a safe upgrade *does* incur a
runtime overhead as every binary now has to test for the presence of
/etc/ld.so.nohwcap to detect a possible upgrade of the glibc in
progress. That's why we have disabled it on architectures not providing
an optimized library [1].

> I've tested both options and found them to be acceptable on v8.1a (Neoverse
> N1) and v8a (Cortex A72) CPUs. I can provide bulk test run data of the
> various different configuration permutations if you'd like to see additional
> data.

As said above I think we would need more numbers on real workloads to
take a decision. Don't get me wrong, I am not opposed to improving atomics
on ARMv8.1, but I would like us to choose the best option. Also if we
go with the -moutline-atomics option, I believe it has to be an
ARM porters decision rather than a glibc maintainers decision (hence
the Cc:).

> I can provide patches or merge requests implementing either option, at least
> for a starting point, if you'd like to see them.

Thanks for this offer, but I don't think that's the most difficult part,
it's fairly straightforward to go for either of those options once a
decision is taken.

Regards,
Aurelien

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=908928

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net
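Both options discussed above hinge on whether the running CPU has the ARMv8.1 atomic instructions. On arm64 Linux this capability is exposed as the `atomics` entry in the `Features` line of /proc/cpuinfo, the same name as the hwcap directory mentioned in the report. The sketch below checks for it; the function names are hypothetical and the sample feature strings are illustrative, not captured from the machines mentioned in this thread.

```python
def has_lse_atomics(cpuinfo_text: str) -> bool:
    """Return True if any 'Features' line in cpuinfo-style text lists 'atomics'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            if "atomics" in value.split():
                return True
    return False


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the kernel's CPU description (Linux only)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()


# Illustrative inputs only.
SAMPLE_WITH_LSE = "processor\t: 0\nFeatures\t: fp asimd crc32 atomics cpuid\n"
SAMPLE_WITHOUT_LSE = "processor\t: 0\nFeatures\t: fp asimd crc32 cpuid\n"

if __name__ == "__main__":
    try:
        print("LSE atomics available:", has_lse_atomics(read_cpuinfo()))
    except OSError:
        print("could not read /proc/cpuinfo on this system")
```

Matching whole words in the feature list, rather than searching for a substring, avoids false positives from other feature names that happen to contain the same letters.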