Bug#924891: glibc: FTBFS: /<>/build-tree/amd64-libc/conform/UNIX98/ndbm.h/scratch/ndbm.h-test.c:1:10: fatal error: ndbm.h: No such file or directory

2019-03-28 Thread Aurelien Jarno
Hi Florian,

On 2019-03-27 23:59, Florian Weimer wrote:
> The issue reproduces outside the chroot, with the stretch userland.
> 
> What happens is that once we get out of the SIGUSR1 signal handler,
> the PKRU register has value zero.  This happens around this code in
> the test:
> 
>   /* Check that in a signal handler, there is no access.  */
>   xsignal (SIGUSR1, _handler);
>   xraise (SIGUSR1);
>   xsignal (SIGUSR1, SIG_DFL);
>   TEST_COMPARE (sigusr1_handler_ran, 1);
> 
> I checked the following (via a breakpoint in pkey_get; I don't think
> GDB can read the PKRU register directly): Inside the SIGUSR1 signal
> handler, PKRU has value 0x5554, as expected for this kernel, but
> after the return, we get zero.  This is the first time a signal is
> delivered on the main thread, so it's consistent with fairly broken
> signal handling as far as the PKRU register is concerned.  I guess
> clearing PKRU in this way might even constitute a minor security bug
> (because the zero value means no restrictions).

Thanks a lot for investigating and for all the details.

> This commit looks highly relevant:
> 
> commit a4455082dc6f0b5d51a23523f77600e8ede47c79
> Author: Dave Hansen 
> Date:   Wed Jun 8 10:25:33 2016 -0700
> 
> x86/signals: Add missing signal_compat code for x86 features
> 
> The 32-bit siginfo is a different binary format than the 64-bit
> one.  So, when running 32-bit binaries on 64-bit kernels, we have
> to convert the kernel's 64-bit version to a 32-bit version that
> userspace can grok.
> 
> If the siginfo_t layout is incorrect (with regard to what the
> hardware writes), I expect that we might end up copying back the wrong
> PKRU value.

This commit is actually already in the 4.9 kernel.

> I'm not sure what to do here.  This really looks like a kernel bug.
> Maybe we should just verify that this is fixed in the buster kernel
> and move on?

I agree. I have been able to find a machine where I can temporarily run
a VM. I found that the problem was fixed between kernels 4.10-rc6 and
4.10, more precisely between the following Debian packages:
- linux-image-4.10.0-rc6-amd64-unsigned version 4.10~rc6-1~exp1
- linux-image-4.10.0-trunk-amd64-unsigned version 4.10-1~exp1

I took a quick look at the commit logs, but I haven't identified the
commit yet. I'll look again and try to identify the commit fixing the
issue so that it can be backported to the stretch kernel. I'll then
reassign the bug there.

Regards,
Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



2019-03-28 Thread Lucas Nussbaum
On 27/03/19 at 23:59 +0100, Florian Weimer wrote:
> Lucas, can you run your rebuild tests on newer kernels?

Indeed. I upgraded the kernel to the stretch-backports one, and glibc
builds fine.

Lucas



2019-03-27 Thread Florian Weimer
retitle 924891 glibc: misc/tst-pkey fails due to cleared PKRU register after signal in amd64 32-bit compat mode
thanks

* Lucas Nussbaum:

> On 27/03/19 at 08:48 +0100, Florian Weimer wrote:
>> > If that's useful, I can easily provide access to an AWS VM to debug this
>> > issue.
>> 
>> Oh, that would be quite helpful indeed.
>
> Can you send your SSH key? (I thought there was a way to get the SSH key
> for a DD, but I cannot find it anymore)
>
> Then you will be able to ssh to root@18.184.55.40.
> sbuild and schroot are set up on the VM.
>
> When you are done, please 'poweroff' the machine, which will terminate
> it.

The issue reproduces outside the chroot, with the stretch userland.

What happens is that once we get out of the SIGUSR1 signal handler,
the PKRU register has value zero.  This happens around this code in
the test:

  /* Check that in a signal handler, there is no access.  */
  xsignal (SIGUSR1, _handler);
  xraise (SIGUSR1);
  xsignal (SIGUSR1, SIG_DFL);
  TEST_COMPARE (sigusr1_handler_ran, 1);

I checked the following (via a breakpoint in pkey_get; I don't think
GDB can read the PKRU register directly): Inside the SIGUSR1 signal
handler, PKRU has value 0x5554, as expected for this kernel, but
after the return, we get zero.  This is the first time a signal is
delivered on the main thread, so it's consistent with fairly broken
signal handling as far as the PKRU register is concerned.  I guess
clearing PKRU in this way might even constitute a minor security bug
(because the zero value means no restrictions).
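
For reference, the Intel SDM defines PKRU as two bits per protection key:
access-disable (AD) at bit 2k and write-disable (WD) at bit 2k+1 for key
k. As far as I know, glibc's pkey_get returns exactly that 2-bit field,
so PKEY_DISABLE_ACCESS is 1 and PKEY_DISABLE_WRITE is 2. The helpers
below (pkru_rights and pkru_dump are hypothetical names, not glibc API)
are a minimal sketch that decodes a raw value, which shows why 0x5554
access-disables keys 1 to 7 while 0 leaves every key unrestricted.

```c
#include <assert.h>
#include <stdio.h>

/* Two bits per key in PKRU (Intel SDM): bit 2k = AD, bit 2k+1 = WD.  */
enum { PKRU_AD = 0x1, PKRU_WD = 0x2 };

/* Rights field for KEY, in the same 0..3 encoding pkey_get uses.  */
static unsigned int
pkru_rights (unsigned int pkru, int key)
{
  assert (key >= 0 && key <= 15);
  return (pkru >> (2 * key)) & 3;
}

/* Print the restricted keys in PKRU and return how many there are.  */
static int
pkru_dump (unsigned int pkru)
{
  int restricted = 0;
  printf ("PKRU 0x%08x:", pkru);
  for (int key = 0; key < 16; ++key)
    {
      unsigned int r = pkru_rights (pkru, key);
      if (r == 0)
        continue;               /* no restriction on this key */
      printf (" key%d=%s%s%s", key,
              (r & PKRU_AD) ? "AD" : "",
              (r == (PKRU_AD | PKRU_WD)) ? "+" : "",
              (r & PKRU_WD) ? "WD" : "");
      ++restricted;
    }
  puts (restricted ? "" : " no restrictions");
  return restricted;
}
```

With these definitions, pkru_dump (0x5554) prints "PKRU 0x00005554:
key1=AD key2=AD key3=AD key4=AD key5=AD key6=AD key7=AD" and returns 7,
while pkru_dump (0) reports no restrictions at all.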

This commit looks highly relevant:

commit a4455082dc6f0b5d51a23523f77600e8ede47c79
Author: Dave Hansen 
Date:   Wed Jun 8 10:25:33 2016 -0700

x86/signals: Add missing signal_compat code for x86 features

The 32-bit siginfo is a different binary format than the 64-bit
one.  So, when running 32-bit binaries on 64-bit kernels, we have
to convert the kernel's 64-bit version to a 32-bit version that
userspace can grok.

If the siginfo_t layout is incorrect (with regard to what the
hardware writes), I expect that we might end up copying back the wrong
PKRU value.
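
To make the layout argument concrete, here is a toy illustration. The
structs are hypothetical and much simpler than the kernel's real
siginfo, and this sketch does not model where the kernel actually takes
the restored PKRU value from. It only shows that a pointer-sized field
moves every later field once pointers shrink to 32 bits, so a 64-bit
record has to be converted field by field rather than copied byte for
byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy layouts for illustration only, NOT the kernel's real siginfo.
   Fixed-width types let both layouts coexist in one program.  */
struct toy_siginfo64
{
  int32_t signo;
  int32_t code;
  uint64_t addr;   /* pointer-sized field in a 64-bit process */
  uint32_t pkey;
};

struct toy_siginfo32
{
  int32_t signo;
  int32_t code;
  uint32_t addr;   /* pointer-sized field in a 32-bit process */
  uint32_t pkey;
};

/* The narrower pointer shifts pkey, so the layouts differ.  */
_Static_assert (offsetof (struct toy_siginfo64, pkey)
                != offsetof (struct toy_siginfo32, pkey),
                "pkey sits at different offsets");

/* Field-by-field conversion in the spirit of the compat code the
   commit message describes.  */
static void
toy_siginfo_to_compat (struct toy_siginfo32 *to,
                       const struct toy_siginfo64 *from)
{
  memset (to, 0, sizeof *to);        /* never leave stale bytes behind */
  to->signo = from->signo;
  to->code = from->code;
  to->addr = (uint32_t) from->addr;  /* compat addresses fit in 32 bits */
  to->pkey = from->pkey;
}

/* What a naive byte copy would hand to 32-bit userspace as pkey.  */
static uint32_t
toy_naive_pkey (const struct toy_siginfo64 *from)
{
  struct toy_siginfo32 to;
  memcpy (&to, from, sizeof to);
  return to.pkey;
}
```

For a record with addr = 0x1234 and pkey = 5, the naive copy returns a
wrong pkey (on a little-endian host it reads the upper half of addr),
while toy_siginfo_to_compat preserves it.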

I'm not sure what to do here.  This really looks like a kernel bug.
Maybe we should just verify that this is fixed in the buster kernel
and move on?

Lucas, can you run your rebuild tests on newer kernels?
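
For checking other kernels without a full glibc build, a standalone
probe along the lines of the tst-pkey excerpt might look like this. It
is a sketch, not the glibc test: it assumes glibc 2.27 or later (for
pkey_alloc, pkey_get and pkey_free), and check_pkru_across_signal and
the other names are made up. It allocates a key with write access
disabled, raises SIGUSR1, and compares the key's rights before the
signal and after the handler returns.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>  /* pkey_alloc, pkey_get, pkey_free: glibc 2.27+ */

enum pkru_check
{
  PKRU_CHECK_UNSUPPORTED,  /* no usable protection keys here */
  PKRU_CHECK_PRESERVED,    /* rights survived the signal round trip */
  PKRU_CHECK_CLEARED       /* rights changed, as reported in this bug */
};

static int probe_key = -1;
static volatile sig_atomic_t probe_handler_ran;
static volatile sig_atomic_t probe_rights_in_handler = -1;

static void
probe_sigusr1 (int sig)
{
  (void) sig;
  probe_rights_in_handler = pkey_get (probe_key);
  probe_handler_ran = 1;
}

static enum pkru_check
check_pkru_across_signal (void)
{
  /* Allocate a key whose initial rights are "write disabled".  This
     fails without PKU/OSPKE or kernel support; pkey_get must not be
     called then, because on x86 it reads PKRU with RDPKRU.  */
  probe_key = pkey_alloc (0, PKEY_DISABLE_WRITE);
  if (probe_key < 0)
    return PKRU_CHECK_UNSUPPORTED;
  int before = pkey_get (probe_key);

  /* Same shape as the tst-pkey excerpt: install, raise, restore.  */
  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sigemptyset (&sa.sa_mask);
  sa.sa_handler = probe_sigusr1;
  sigaction (SIGUSR1, &sa, NULL);
  raise (SIGUSR1);
  sa.sa_handler = SIG_DFL;
  sigaction (SIGUSR1, &sa, NULL);
  assert (probe_handler_ran == 1);

  /* Read the rights again after returning from the handler.  */
  int after = pkey_get (probe_key);
  printf ("key %d: before=%d in handler=%d after=%d\n",
          probe_key, before, (int) probe_rights_in_handler, after);
  pkey_free (probe_key);
  return after == before ? PKRU_CHECK_PRESERVED : PKRU_CHECK_CLEARED;
}
```

Call check_pkru_across_signal () from a small main. To exercise the
32-bit compat path discussed here, build it with gcc -m32 and run it on
an amd64 kernel; going by the observation above, an affected kernel
would be expected to report after=0 and return PKRU_CHECK_CLEARED,
though this sketch has not been verified against the stretch kernel.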



2019-03-27 Thread Lucas Nussbaum
On 27/03/19 at 22:00 +0100, Lucas Nussbaum wrote:
> On 27/03/19 at 08:48 +0100, Florian Weimer wrote:
> > > If that's useful, I can easily provide access to an AWS VM to debug this
> > > issue.
> > 
> > Oh, that would be quite helpful indeed.
> 
> Can you send your SSH key? (I thought there was a way to get the SSH key
> for a DD, but I cannot find it anymore)

I found a way. You have access.

> Then you will be able to ssh to root@18.184.55.40.
> sbuild and schroot are set up on the VM.
> 
> When you are done, please 'poweroff' the machine, which will terminate
> it.
> 
> Lucas



2019-03-27 Thread Lucas Nussbaum
On 27/03/19 at 08:48 +0100, Florian Weimer wrote:
> > If that's useful, I can easily provide access to an AWS VM to debug this
> > issue.
> 
> Oh, that would be quite helpful indeed.

Can you send your SSH key? (I thought there was a way to get the SSH key
for a DD, but I cannot find it anymore)

Then you will be able to ssh to root@18.184.55.40.
sbuild and schroot are set up on the VM.

When you are done, please 'poweroff' the machine, which will terminate
it.

Lucas



2019-03-27 Thread Florian Weimer
* Lucas Nussbaum:

> On 26/03/19 at 23:10 +0100, Aurelien Jarno wrote:
>> On 2019-03-22 17:30, Florian Weimer wrote:
>> > > About the archive rebuild: The rebuild was done on EC2 VM instances from
>> > > Amazon Web Services, using a clean, minimal and up-to-date chroot. Every
>> > > failed build was retried once to eliminate random failures.
>> > 
>> > I believe the actual test failure is tst-pkey.
>> > 
>> > Presumably, this rebuild was performed on some Xeon SP CPU.  Do you
>> > know which model?  Do you have any information about the kernel and
>> > hypervisor used?
>> > 
>> > 32-bit protection key support has had issues from time to time.
>> 
>> Do you have some more details about the issue? Is it a glibc or a kernel
>> problem?
>> 
>> If we can't fix the issue easily on the libc side, I guess the way to
>> fix that is to XFAIL that test on 32-bit x86. 
>
> If that's useful, I can easily provide access to an AWS VM to debug this
> issue.

Oh, that would be quite helpful indeed.



2019-03-27 Thread Lucas Nussbaum
On 26/03/19 at 23:10 +0100, Aurelien Jarno wrote:
> On 2019-03-22 17:30, Florian Weimer wrote:
> > > About the archive rebuild: The rebuild was done on EC2 VM instances from
> > > Amazon Web Services, using a clean, minimal and up-to-date chroot. Every
> > > failed build was retried once to eliminate random failures.
> > 
> > I believe the actual test failure is tst-pkey.
> > 
> > Presumably, this rebuild was performed on some Xeon SP CPU.  Do you
> > know which model?  Do you have any information about the kernel and
> > hypervisor used?
> > 
> > 32-bit protection key support has had issues from time to time.
> 
> Do you have some more details about the issue? Is it a glibc or a kernel
> problem?
> 
> If we can't fix the issue easily on the libc side, I guess the way to
> fix that is to XFAIL that test on 32-bit x86. 

If that's useful, I can easily provide access to an AWS VM to debug this
issue.

Lucas



2019-03-26 Thread Aurelien Jarno
On 2019-03-22 17:30, Florian Weimer wrote:
> > About the archive rebuild: The rebuild was done on EC2 VM instances from
> > Amazon Web Services, using a clean, minimal and up-to-date chroot. Every
> > failed build was retried once to eliminate random failures.
> 
> I believe the actual test failure is tst-pkey.
> 
> Presumably, this rebuild was performed on some Xeon SP CPU.  Do you
> know which model?  Do you have any information about the kernel and
> hypervisor used?
> 
> 32-bit protection key support has had issues from time to time.

Do you have some more details about the issue? Is it a glibc or a kernel
problem?

If we can't fix the issue easily on the libc side, I guess the way to
fix that is to XFAIL that test on 32-bit x86. 




2019-03-25 Thread Lucas Nussbaum
On 22/03/19 at 17:30 +0100, Florian Weimer wrote:
> > About the archive rebuild: The rebuild was done on EC2 VM instances from
> > Amazon Web Services, using a clean, minimal and up-to-date chroot. Every
> > failed build was retried once to eliminate random failures.
> 
> I believe the actual test failure is tst-pkey.
> 
> Presumably, this rebuild was performed on some Xeon SP CPU.  Do you
> know which model?  Do you have any information about the kernel and
> hypervisor used?
> 
> 32-bit protection key support has had issues from time to time.

Hi,

Below is /proc/cpuinfo from one of the VMs. I believe that they are all
the same, but I'm not 100% sure.

I don't have any information about the kernel/hypervisor used on the
host system. In the VM, it's the current stretch kernel:
# uname -a
Linux ip-172-31-3-87 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux

- Lucas

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 85
model name  : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
stepping: 4
microcode   : 0x25a
cpu MHz : 2500.000
cache size  : 33792 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq ssse3 
fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave 
avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser 
fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f 
avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt 
xsavec xgetbv1 xsaves ida arat pku ospke
bugs: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips: 5000.00
clflush size: 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 85
model name  : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
stepping: 4
microcode   : 0x25a
cpu MHz : 2500.000
cache size  : 33792 KB
physical id : 0
siblings: 4
core id : 1
cpu cores   : 2
apicid  : 2
initial apicid  : 2
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq ssse3 
fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave 
avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser 
fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f 
avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt 
xsavec xgetbv1 xsaves ida arat pku ospke
bugs: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips: 5000.00
clflush size: 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model   : 85
model name  : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
stepping: 4
microcode   : 0x25a
cpu MHz : 2500.000
cache size  : 33792 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 2
apicid  : 1
initial apicid  : 1
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq ssse3 
fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave 
avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser 
fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f 
avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt 
xsavec xgetbv1 xsaves ida arat pku ospke
bugs: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips: 5000.00
clflush size: 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model   : 85
model name  : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
stepping: 4
microcode   : 0x25a
cpu MHz : 2500.000
cache size  : 33792 KB
physical id : 0
siblings: 4
core id : 1
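
The pku and ospke flags in the dump above indicate that protection keys
are available in this guest, which is what lets the pkey tests run at
all. A program can query the same bits directly with CPUID leaf 7: ECX
bit 3 (PKU) says the CPU implements protection keys, and ECX bit 4
(OSPKE) says the OS has enabled them. The sketch below assumes GCC or
Clang, whose <cpuid.h> provides __get_cpuid_count; the struct and
function names are hypothetical. Inside a VM, the result reflects what
the hypervisor exposes to the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#if defined (__x86_64__) || defined (__i386__)
# include <cpuid.h>  /* GCC/Clang helper header */
#endif

/* CPUID.(EAX=7,ECX=0):ECX bit 3 is PKU (the CPU implements protection
   keys); bit 4 is OSPKE (the OS has enabled them via CR4.PKE).  */
#define LEAF7_ECX_PKU   (1u << 3)
#define LEAF7_ECX_OSPKE (1u << 4)

struct pku_status
{
  bool pku;
  bool ospke;
};

static struct pku_status
decode_leaf7_ecx (uint32_t ecx)
{
  struct pku_status s;
  s.pku = (ecx & LEAF7_ECX_PKU) != 0;
  s.ospke = (ecx & LEAF7_ECX_OSPKE) != 0;
  return s;
}

/* Query the running CPU; reports nothing on non-x86 builds or when
   leaf 7 is unavailable.  */
static struct pku_status
query_pku_status (void)
{
#if defined (__x86_64__) || defined (__i386__)
  unsigned int eax, ebx, ecx, edx;
  if (__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return decode_leaf7_ecx (ecx);
#endif
  return decode_leaf7_ecx (0);
}
```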

2019-03-22 Thread Florian Weimer
> About the archive rebuild: The rebuild was done on EC2 VM instances from
> Amazon Web Services, using a clean, minimal and up-to-date chroot. Every
> failed build was retried once to eliminate random failures.

I believe the actual test failure is tst-pkey.

Presumably, this rebuild was performed on some Xeon SP CPU.  Do you
know which model?  Do you have any information about the kernel and
hypervisor used?

32-bit protection key support has had issues from time to time.

Thanks.