Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-05-07 Thread Noah Meyerhans
On Wed, May 06, 2020 at 04:15:09PM +0200, Aurelien Jarno wrote:
> > >One solution for this would be to ship the optimized library in the same
> > >package as the default library. Now this is not acceptable for embedded
> > >systems as they might not need that library and can't remove it. This is
> > >even more problematic if we need to add more optimized libraries. I guess
> > >this might be the case for arm64 as there are many new extensions in the
> > >pipe.
> > 
> > ACK. It's a problem to ship the different things in separate
> > packages. If it's really a problem for smaller systems to have all the
> > variants because of size, is there maybe another way to do things? How
> > about keeping the existing libc and have an extra package
> > ("libc-optimised") with all the optimised versions *and* the basic
> > version, and have it provide/replace/conflict libc6?
> > 
> > (/me prepares to be embarrassed as you point out the obvious flaw I'm
> > missing...)
> 
> I guess that the provide/replace/conflict libc6 will just prevent
> installation of foreign libc6 packages, basically making this optimized
> package useless in the multiarch context.
> 
> OTOH, what is the drawback of having GCC defaulting to -moutline-atomics?
> It will improve performance on many more packages than only glibc, and
> is way easier to implement overall. It also means users have nothing to
> do to get additional performance.

For the current issue, defaulting to -moutline-atomics might be a sane
approach.  As you said earlier, though, it seems that there are many new
extensions in the pipe for ARM.  There may not be an equivalent solution
for all of them, and even if there is, at some point the runtime
overhead of all this conditional code is going to add up to something
meaningful.

noah



Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-05-04 Thread Noah Meyerhans
On Sun, May 03, 2020 at 11:53:35PM +0200, Aurelien Jarno wrote:
> The hardware capabilities system works fine upstream, but doesn't work
> for us because:
> 1) we want to be able to upgrade major upstream version online (as
> opposed to fedora for example)
> 2) we ship the optimized libraries in a different package
> 
> The various libc libraries need to have the same version at any time,
> this is especially true for ld.so vs libc.so. As we do not upgrade the
> default libc and the optimized one exactly at the same time (they are in
> different packages), we upgrade first the default libc and then we have
> the Debian specific nohwcap mechanism to prevent using the optimized
> library until it has also been upgraded.
> 
> One solution for this would be to ship the optimized library in the same
> package as the default library. Now this is not acceptable for embedded
> systems as they might not need that library and can't remove it. This is
> even more problematic if we need to add more optimized libraries. I guess
> this might be the case for arm64 as there are many new extensions in the
> pipe.

Thanks for taking the time to explain that!

I wonder if it'd make sense for libc to be a virtual package, with
functionality provided by optimized builds and dependencies satisfied
via Provides.  I don't know how well dpkg would cope with transitioning
between providers, which seems like the riskiest side of this kind of
thing.
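
To make the version-skew guard quoted above concrete, here is a minimal sketch of the decision it implies: use the optimized directory only when the CPU supports it and no "do not use hwcap libraries" marker is in place. The function names are hypothetical, the directory is the example path from the original report, and the marker location mentioned in the usage note (/etc/ld.so.nohwcap) is my assumption about the Debian mechanism, not something stated in this thread. This is not the actual ld.so logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Example directories; the "atomics" subdirectory is the one suggested
   in the bug report, not an established layout. */
static const char DEFAULT_DIR[] = "/lib/aarch64-linux-gnu";
static const char OPTIMIZED_DIR[] = "/lib/aarch64-linux-gnu/atomics";

/* True if a marker file exists at the given path. */
static bool marker_present(const char *path) {
  assert(path != NULL);
  return access(path, F_OK) == 0;
}

/* Hypothetical selection rule: the optimized build is only eligible when
   the CPU supports it and the upgrade-in-progress marker is absent. */
static const char *pick_libdir(bool cpu_supports_variant, bool nohwcap_marker) {
  if (cpu_supports_variant && !nohwcap_marker)
    return OPTIMIZED_DIR;
  return DEFAULT_DIR;
}
```

A caller could write pick_libdir(has_lse, marker_present("/etc/ld.so.nohwcap")). The point is only that every additional optimized build multiplies the states this guard has to cover during an upgrade.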

noah



Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-04-22 Thread Noah Meyerhans
On Wed, Apr 22, 2020 at 05:48:27PM +0100, Steve McIntyre wrote:
> I think the -moutline-atomics is probably good to enable by default
> once we've got it (gcc 10). That's the suggestion I've heard from gcc
> folks in Arm.

JFTR, it's been backported to gcc 9 and is available in Debian's gcc-9
as of 9.3.0-9. See
https://salsa.debian.org/toolchain-team/gcc/-/blob/gcc-9-debian/debian/patches/git-updates.diff

noah



Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-04-11 Thread Noah Meyerhans
On Sat, Apr 11, 2020 at 10:23:54PM +0200, Florian Weimer wrote:
> Or put differently: If upstream doesn't want to default to
> -moutline-atomics, why should Debian?

Well, ultimately we own our build configurations and the optimizations
we enable therein.  If we don't want to enable -moutline-atomics
globally, then a second, optimized library is also an option.  IMO,
timing data like these should be enough to show that it's worth making a
change somewhere:

# 100 serial invocations of the "a.c" program attached to the bug
# report, linked against libc with -moutline-atomics
real    0m1.902s
user    0m3.488s
sys     0m25.498s

# 100 invocations of the same program linked against glibc with
# -march=armv8.1-a
real    0m1.844s
user    0m3.137s
sys     0m24.275s

# 100 invocations of the same program against our current libc build:
real    8m15.452s
user    130m33.139s
sys     0m1.162s
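
For anyone who wants the ratio rather than the raw numbers, a small sketch that converts time(1)-style durations to seconds and divides them. The helper names are made up for illustration, and only the wall-clock ("real") figures above are meant to be fed in.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a time(1)-style duration such as "8m15.452s" to seconds.
   Returns a negative value if the string does not match that shape. */
static double duration_seconds(const char *text) {
  int minutes = 0;
  double seconds = 0.0;
  if (sscanf(text, "%dm%lfs", &minutes, &seconds) != 2)
    return -1.0;
  return minutes * 60.0 + seconds;
}

/* Ratio of the baseline wall-clock time to the improved one. */
static double speedup(const char *baseline, const char *improved) {
  double b = duration_seconds(baseline);
  double i = duration_seconds(improved);
  assert(b > 0.0 && i > 0.0);
  return b / i;
}
```

With the figures above, speedup("8m15.452s", "0m1.902s") comes out at roughly 260 for the -moutline-atomics build, and speedup("8m15.452s", "0m1.844s") at roughly 269 for the -march=armv8.1-a build, so the two optimized variants are within a few percent of each other for this test.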

noah



Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-04-11 Thread Noah Meyerhans
On Sat, Apr 11, 2020 at 09:14:11PM +0200, Florian Weimer wrote:
> > At least if I'm reading the code right (which I may very well not be
> > doing, being generally unfamiliar with gcc internals), -mtune=generic
> > enables the equivalent of ARMv8 support:
> >
> > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/common/config/aarch64/aarch64-common.c;h=0bddcc8c3e9282a957c5479b4df7f68058093bab;hb=HEAD#l176
> >
> > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/aarch64/aarch64-cores.def;h=ea9b98b4b0ad2a578755561bba5b6d5c56115994;hb=HEAD
> >
> > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/aarch64/aarch64.h;h=8f08bad3562c4cbe8acdf5891e84f89d23ea6784;hb=HEAD#l226
> 
> Hmm.  I don't see anything that sets TARGET_OUTLINE_ATOMICS by
> default.

Only -moutline-atomics enables that.  Otherwise, unconditional support
for atomics is enabled by TARGET_LSE, which itself is enabled by a
number of options, e.g. -march=armv8-a+lse, -march=armv8.1-a, etc.

See
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/aarch64/aarch64.c;h=4af562a81ea760891fac3cf7101b8bf887fe7a0d;hb=HEAD#l18961
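
A quick way to see which of these modes a given build is in, as a sketch: as far as I know, recent GCC and Clang define the ACLE macro __ARM_FEATURE_ATOMICS when LSE is unconditionally enabled (the TARGET_LSE case), and leave it undefined otherwise, including with -moutline-atomics. The function names here are hypothetical. The same C11 atomic increment is compiled differently depending on the flags; the source does not change.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int counter;

/* True when the compiler was told it may emit LSE instructions inline,
   e.g. via -march=armv8.1-a. Always false on non-ARM targets. */
static bool lse_enabled_at_build(void) {
#ifdef __ARM_FEATURE_ATOMICS
  return true;
#else
  return false;
#endif
}

/* Increment the shared counter `times` times and return its new value.
   How the fetch-add is lowered (inline LSE, an exclusive load/store
   loop, or an out-of-line helper call) is the compiler's choice. */
static int bump(int times) {
  assert(times >= 0);
  for (int i = 0; i < times; i++)
    atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
  return atomic_load_explicit(&counter, memory_order_relaxed);
}
```

Compiling this with and without -march=armv8.1-a and comparing the disassembly of bump() is an easy way to confirm what a particular toolchain does.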



Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-04-11 Thread Noah Meyerhans
On Sat, Apr 11, 2020 at 08:44:29AM +0200, Florian Weimer wrote:
> > Gcc provides two ways to enable support for these instructions at build
> > time.  The simplest, and least disruptive, is to enable -moutline-atomics
> > globally in the arm64 glibc build.
> 
> Shouldn't GCC do this by default, at least for -mtune=generic?

Maybe.  Would you rather pursue that avenue first?

At least if I'm reading the code right (which I may very well not be
doing, being generally unfamiliar with gcc internals), -mtune=generic
enables the equivalent of ARMv8 support:

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/common/config/aarch64/aarch64-common.c;h=0bddcc8c3e9282a957c5479b4df7f68058093bab;hb=HEAD#l176

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/aarch64/aarch64-cores.def;h=ea9b98b4b0ad2a578755561bba5b6d5c56115994;hb=HEAD

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/aarch64/aarch64.h;h=8f08bad3562c4cbe8acdf5891e84f89d23ea6784;hb=HEAD#l226

noah



Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1

2020-04-10 Thread Noah Meyerhans
Package: src:glibc
Version: 2.30-4
Severity: wishlist
X-Debbugs-CC: debian-...@lists.debian.org

The ARMv8.1 spec, as implemented by the ARM Neoverse N1 processor,
introduces a set of instructions [1] that result in significant performance
improvements for multithreaded applications.  Sample code demonstrating the
performance improvements is attached.  When run on a 16-core Neoverse N1
host with glibc 2.30-4, runtimes vary significantly, ranging from lows
around 250ms to highs around 15 seconds.  When linked against glibc rebuilt
with support for these instructions, runtimes are consistently <50ms.
Significant performance impact has also been observed in less contrived
cases (MariaDB and Postgres), but I don't have a repro to share.

Gcc provides two ways to enable support for these instructions at build
time.  The simplest, and least disruptive, is to enable -moutline-atomics
globally in the arm64 glibc build.  As described at [2], this option enables
runtime checks for the availability of the atomic instructions.  If found,
they are used, otherwise ARMv8.0 compatible code is used.  The drawback of
this option is that the check happens at runtime, thus introducing some
overhead on all arm64 installations.
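
As a rough illustration of what such a runtime check looks like from user space, here is a minimal sketch that asks the kernel whether the atomic instructions are available and picks an implementation accordingly. It assumes Linux on arm64, where the capability is reported through AT_HWCAP as HWCAP_ATOMICS. The enum and helper names are hypothetical, and this is not the code GCC's helpers actually use; it only shows the shape of the check.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
# include <sys/auxv.h>
# ifndef HWCAP_ATOMICS
#  define HWCAP_ATOMICS (1UL << 8) /* LSE bit in AT_HWCAP on arm64 Linux */
# endif
#endif

enum atomic_impl { ATOMIC_LLSC, ATOMIC_LSE };

/* True when the kernel reports the ARMv8.1 atomic (LSE) instructions. */
static bool cpu_has_lse(void) {
#if defined(__linux__) && defined(__aarch64__)
  return (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
  return false; /* not arm64 Linux: nothing to detect */
#endif
}

/* The decision a runtime-dispatched atomic has to make on each call
   (or once, with the result cached). */
static enum atomic_impl pick_atomic_impl(bool has_lse) {
  return has_lse ? ATOMIC_LSE : ATOMIC_LLSC;
}

/* Printable name for the chosen implementation. */
static const char *atomic_impl_name(enum atomic_impl impl) {
  assert(impl == ATOMIC_LLSC || impl == ATOMIC_LSE);
  return impl == ATOMIC_LSE ? "LSE (ARMv8.1)" : "LL/SC (ARMv8.0)";
}
```

Printing atomic_impl_name(pick_atomic_impl(cpu_has_lse())) on the two test machines mentioned below should distinguish them. The overhead referred to above is the branch on this flag in front of every atomic operation.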

The second option is to provide libraries built with explicit support for
the ARM v8.1a spec via the -march=armv8.1-a flag.  This option is also
described at [2].  This build would be incompatible with earlier versions of
the spec, so it would need to be provided in a location where the linker
will automatically discover it if it is usable (e.g.
/lib/aarch64-linux-gnu/atomics/).  This does not incur any runtime overhead,
but obviously involves an additional libc build, and the corresponding
complexity and disk space utilization.  I'm not sure if this is an option
that the glibc maintainers are interested in pursuing.

I've tested both options and found them to be acceptable on v8.1a (Neoverse
N1) and v8a (Cortex A72) CPUs.  I can provide bulk test run data of the
various different configuration permutations if you'd like to see additional
data.

I can provide patches or merge requests implementing either option, at least
for a starting point, if you'd like to see them.

Thanks!
noah

1. https://static.docs.arm.com/ddi0557/a/DDI0557A_b_armv8_1_supplement.pdf
   Section B1
2. https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
/*
 * Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
 *
 * Licensed under the Apache License, Version 2.0 (the "License"). You may
 * not use this file except in compliance with the License. A copy of the
 * License is located at
 *
 *  http://aws.amazon.com/apache2.0/
 *
 * or in the "license" file accompanying this file. This file is distributed
 * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
 * express or implied. See the License for the specific language governing
 * permissions and limitations under the License.
*/

/* Build with:
 * gcc -O2 -o a.out a.c -lpthread -DITER=1000 -DTHREADS=64
*/

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#ifndef ITER
# define ITER 1000
#endif
#ifndef THREADS
# define THREADS 3
#endif

#if THREADS < 1
# error "THREADS is supposed to be at least 1"
#endif

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_ptr = 0;

typedef struct stats_s {
  uint64_t min, max;
  int times;
  uint64_t total;
  uint64_t flips;
} stats_t;

stats_t stats[THREADS + 1];
pthread_t threads[THREADS];

#ifdef __aarch64__
static uint64_t cpu_shift() {
  uint64_t shift = 0;
  __asm__ __volatile__ ("mrs %0,cntfrq_el0; clz %w0, %w0":"=r"(shift));
  return shift;
}
#endif

static uint64_t gettime() {
#ifdef __aarch64__
  uint64_t ret = 0;
  __asm__ __volatile__ ("isb; mrs %0,cntvct_el0":"=r"(ret));
  return ret << cpu_shift();

#elif defined __x86_64__
  uint64_t a, d;
  __asm__ __volatile__ ("rdtsc" : "=a" (a), "=d" (d));
  return ((uint64_t)a + ((uint64_t)d << 32));
#endif

  return 0;
}

static void init_stats() {
  int i;
  for (i = 0; i <= THREADS; i++) {
    stats_t *s = &stats[i];
    s->min = 100;
    s->max = 0;
    s->times = 0;
    s->total = 0;
    s->flips = 0;
  }
}

static void print_stat(int i) {
  stats_t *s = &stats[i];
  float average = (float) s->total / s->times;
  if (i == THREADS)
    fprintf(stdout, "server: min=%ld, max=%ld, average=%f, mutexes_locked=%d, flips=%ld\n", s->min, s->max, average, s->times, s->flips);
  else
    fprintf(stdout, "thread %d: min=%ld, max=%ld, average=%f, mutexes_locked=%d, flips=%ld\n", i, s->min, s->max, average, s->times, s->flips);
}

static void print_stats() {
  int i;
  for (i = 0; i <= THREADS; i++)
    print_stat(i);
}

static void update_stats(stats_t *s, uint64_t time) {
  ++s->times;
  if (time < s->min)
    s->min = time;
  if (time > s->max)
    s->max = time;
  s->total += time;
}

static void fun(int check, int set, stats_t *stat) {
  int loop = 1;
  while (loop) {
    uint64_t start = gettime();
    pthread_mutex_lock (&lock);
if (shared_ptr 

Bug#303478: forcely is not a word

2005-04-06 Thread Noah Meyerhans
Package: libc6
Version: 2.3.2.ds1-20
Severity: minor

In the libc6 postinst, the following can be found on line 89:
echo Non-interactive mode, upgrade glibc forcely

forcely is not a word in the English language (at least, not according
to any dictionary I've found).  Maybe you mean forcibly or forcedly.

noah


