Bug#924891: glibc: FTBFS: /<>/build-tree/amd64-libc/conform/UNIX98/ndbm.h/scratch/ndbm.h-test.c:1:10: fatal error: ndbm.h: No such file or directory
Hi Florian,

On 2019-03-27 23:59, Florian Weimer wrote:
> retitle 924891 glibc: misc/tst-pkey fails due to cleared PKRU register after
> signal in amd64 32-bit compat mode
> thanks
>
> * Lucas Nussbaum:
>
> > On 27/03/19 at 08:48 +0100, Florian Weimer wrote:
> >> > If that's useful, I can easily provide access to an AWS VM to debug this
> >> > issue.
> >>
> >> Oh, that would be quite helpful indeed.
> >
> > Can you send your SSH key? (I thought there was a way to get the SSH key
> > for a DD, but I cannot find it anymore)
> >
> > Then you will be able to ssh to root@18.184.55.40.
> > There's sbuild and schroot setup on the VM.
> >
> > When you are done, please 'poweroff' the machine, which will terminate
> > it.
>
> The issue reproduces outside the chroot, with the stretch userland.
>
> What happens is that once we get out of the SIGUSR1 signal handler,
> the PKRU register has value zero. This happens around this code in
> the test:
>
>   /* Check that in a signal handler, there is no access. */
>   xsignal (SIGUSR1, _handler);
>   xraise (SIGUSR1);
>   xsignal (SIGUSR1, SIG_DFL);
>   TEST_COMPARE (sigusr1_handler_ran, 1);
>
> I checked the following (via a breakpoint in pkey_get; I don't think
> GDB can read the PKRU register directly): Inside the SIGUSR1 signal
> handler, PKRU has value 0x5554, as expected for this kernel, but
> after the return, we get zero. This is the first time a signal is
> delivered on the main thread, so it's consistent with fairly broken
> signal handling as far as the PKRU register is concerned. I guess
> clearing PKRU in this way might even constitute a minor security bug
> (because the zero value means no restrictions).

Thanks a lot for investigating and for all the details.

> This commit looks highly relevant:
>
> commit a4455082dc6f0b5d51a23523f77600e8ede47c79
> Author: Dave Hansen
> Date: Wed Jun 8 10:25:33 2016 -0700
>
>     x86/signals: Add missing signal_compat code for x86 features
>
>     The 32-bit siginfo is a different binary format than the 64-bit
>     one. So, when running 32-bit binaries on 64-bit kernels, we have
>     to convert the kernel's 64-bit version to a 32-bit version that
>     userspace can grok.
>
> If the siginfo_t layout is incorrect (with regards to what the
> hardware writes), I expect that we might end up copying back the wrong
> PKRU value.

This commit is actually already in the 4.9 kernel.

> I'm not sure what to do here. This really looks like a kernel bug.
> Maybe we should just verify that this is fixed in the buster kernel
> and move on?

I agree. I have been able to find a machine where I can temporarily run
a VM. I have found that the problem has been solved between kernel
4.10-rc6 and 4.10, more precisely between the following debian packages:
- linux-image-4.10.0-rc6-amd64-unsigned version 4.10~rc6-1~exp1
- linux-image-4.10.0-trunk-amd64-unsigned version 4.10-1~exp1

I gave a quick look at the commit logs, and I haven't identified a
commit. I'll look again and try to identify the commit fixing the issue
so that it can be backported in the stretch kernel. I'll then reassign
the bug there.

Regards,
Aurelien

-- 
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net
Bug#924891: glibc: FTBFS: /<>/build-tree/amd64-libc/conform/UNIX98/ndbm.h/scratch/ndbm.h-test.c:1:10: fatal error: ndbm.h: No such file or directory
On 27/03/19 at 23:59 +0100, Florian Weimer wrote:
> retitle 924891 glibc: misc/tst-pkey fails due to cleared PKRU register after
> signal in amd64 32-bit compat mode
> thanks
>
> * Lucas Nussbaum:
>
> > On 27/03/19 at 08:48 +0100, Florian Weimer wrote:
> >> > If that's useful, I can easily provide access to an AWS VM to debug this
> >> > issue.
> >>
> >> Oh, that would be quite helpful indeed.
> >
> > Can you send your SSH key? (I thought there was a way to get the SSH key
> > for a DD, but I cannot find it anymore)
> >
> > Then you will be able to ssh to root@18.184.55.40.
> > There's sbuild and schroot setup on the VM.
> >
> > When you are done, please 'poweroff' the machine, which will terminate
> > it.
>
> The issue reproduces outside the chroot, with the stretch userland.
>
> What happens is that once we get out of the SIGUSR1 signal handler,
> the PKRU register has value zero. This happens around this code in
> the test:
>
>   /* Check that in a signal handler, there is no access. */
>   xsignal (SIGUSR1, _handler);
>   xraise (SIGUSR1);
>   xsignal (SIGUSR1, SIG_DFL);
>   TEST_COMPARE (sigusr1_handler_ran, 1);
>
> I checked the following (via a breakpoint in pkey_get; I don't think
> GDB can read the PKRU register directly): Inside the SIGUSR1 signal
> handler, PKRU has value 0x5554, as expected for this kernel, but
> after the return, we get zero. This is the first time a signal is
> delivered on the main thread, so it's consistent with fairly broken
> signal handling as far as the PKRU register is concerned. I guess
> clearing PKRU in this way might even constitute a minor security bug
> (because the zero value means no restrictions).
>
> This commit looks highly relevant:
>
> commit a4455082dc6f0b5d51a23523f77600e8ede47c79
> Author: Dave Hansen
> Date: Wed Jun 8 10:25:33 2016 -0700
>
>     x86/signals: Add missing signal_compat code for x86 features
>
>     The 32-bit siginfo is a different binary format than the 64-bit
>     one. So, when running 32-bit binaries on 64-bit kernels, we have
>     to convert the kernel's 64-bit version to a 32-bit version that
>     userspace can grok.
>
> If the siginfo_t layout is incorrect (with regards to what the
> hardware writes), I expect that we might end up copying back the wrong
> PKRU value.
>
> I'm not sure what to do here. This really looks like a kernel bug.
> Maybe we should just verify that this is fixed in the buster kernel
> and move on?
>
> Lucas, can you run your rebuild tests on newer kernels?

Indeed. I upgraded the kernel to the stretch-backports one, and glibc
builds fine.

Lucas
Bug#924891: glibc: FTBFS: /<>/build-tree/amd64-libc/conform/UNIX98/ndbm.h/scratch/ndbm.h-test.c:1:10: fatal error: ndbm.h: No such file or directory
retitle 924891 glibc: misc/tst-pkey fails due to cleared PKRU register after signal in amd64 32-bit compat mode
thanks

* Lucas Nussbaum:

> On 27/03/19 at 08:48 +0100, Florian Weimer wrote:
>> > If that's useful, I can easily provide access to an AWS VM to debug this
>> > issue.
>>
>> Oh, that would be quite helpful indeed.
>
> Can you send your SSH key? (I thought there was a way to get the SSH key
> for a DD, but I cannot find it anymore)
>
> Then you will be able to ssh to root@18.184.55.40.
> There's sbuild and schroot setup on the VM.
>
> When you are done, please 'poweroff' the machine, which will terminate
> it.

The issue reproduces outside the chroot, with the stretch userland.

What happens is that once we get out of the SIGUSR1 signal handler,
the PKRU register has value zero. This happens around this code in
the test:

  /* Check that in a signal handler, there is no access. */
  xsignal (SIGUSR1, _handler);
  xraise (SIGUSR1);
  xsignal (SIGUSR1, SIG_DFL);
  TEST_COMPARE (sigusr1_handler_ran, 1);

I checked the following (via a breakpoint in pkey_get; I don't think
GDB can read the PKRU register directly): Inside the SIGUSR1 signal
handler, PKRU has value 0x5554, as expected for this kernel, but
after the return, we get zero. This is the first time a signal is
delivered on the main thread, so it's consistent with fairly broken
signal handling as far as the PKRU register is concerned. I guess
clearing PKRU in this way might even constitute a minor security bug
(because the zero value means no restrictions).

This commit looks highly relevant:

commit a4455082dc6f0b5d51a23523f77600e8ede47c79
Author: Dave Hansen
Date: Wed Jun 8 10:25:33 2016 -0700

    x86/signals: Add missing signal_compat code for x86 features

    The 32-bit siginfo is a different binary format than the 64-bit
    one. So, when running 32-bit binaries on 64-bit kernels, we have
    to convert the kernel's 64-bit version to a 32-bit version that
    userspace can grok.

If the siginfo_t layout is incorrect (with regards to what the
hardware writes), I expect that we might end up copying back the wrong
PKRU value.

I'm not sure what to do here. This really looks like a kernel bug.
Maybe we should just verify that this is fixed in the buster kernel
and move on?

Lucas, can you run your rebuild tests on newer kernels?
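
[Editorial sketch, not part of the original message.] The observation in the message above (access rights visible through pkey_get change once the SIGUSR1 handler returns) can be checked in isolation, roughly as follows. This is a minimal sketch and not the glibc tst-pkey test: the function name pkey_rights_survive_signal and the handler name on_sigusr1 are hypothetical, and it assumes a glibc that provides pkey_alloc, pkey_get and pkey_free (2.27 or later). To exercise the compat path discussed in this thread it would presumably have to be built as a 32-bit binary (for example with -m32) and run on an amd64 kernel.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for pkey_alloc, pkey_get, pkey_free */
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>

/* Set by the handler so the caller can confirm it actually ran. */
static volatile sig_atomic_t handler_ran;

static void on_sigusr1(int sig)
{
    (void) sig;
    handler_ran = 1;
}

/* Hypothetical helper (not from glibc).
   Returns  1 if the access rights of a freshly allocated key are the
              same before and after a signal handler has run,
            0 if they differ (the symptom described in this thread),
           -1 if protection keys are unavailable or setup failed. */
int pkey_rights_survive_signal(void)
{
    int key = pkey_alloc(0, PKEY_DISABLE_WRITE);
    if (key < 0)
        return -1;                     /* no pkey support here */

    int before = pkey_get(key);        /* rights as stored in PKRU */

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (before < 0 || sigaction(SIGUSR1, &sa, &old) != 0) {
        pkey_free(key);
        return -1;
    }

    handler_ran = 0;
    raise(SIGUSR1);                    /* handler runs before raise returns */
    sigaction(SIGUSR1, &old, NULL);    /* restore previous disposition */
    assert(handler_ran == 1);

    int after = pkey_get(key);         /* rights after sigreturn */
    pkey_free(key);
    return after == before;
}
```

Calling pkey_rights_survive_signal() from main and printing the result is enough to use it. On a kernel with the behaviour described above one would expect 0, because a PKRU value of zero makes pkey_get report no restrictions for the key.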
Bug#924891: glibc: FTBFS: /<>/build-tree/amd64-libc/conform/UNIX98/ndbm.h/scratch/ndbm.h-test.c:1:10: fatal error: ndbm.h: No such file or directory
On 27/03/19 at 22:00 +0100, Lucas Nussbaum wrote:
> On 27/03/19 at 08:48 +0100, Florian Weimer wrote:
> > > If that's useful, I can easily provide access to an AWS VM to debug this
> > > issue.
> >
> > Oh, that would be quite helpful indeed.
>
> Can you send your SSH key? (I thought there was a way to get the SSH key
> for a DD, but I cannot find it anymore)

I found a way. You have access.

> Then you will be able to ssh to root@18.184.55.40.
> There's sbuild and schroot setup on the VM.
>
> When you are done, please 'poweroff' the machine, which will terminate
> it.
>
> Lucas
Bug#924891: glibc: FTBFS: /<>/build-tree/amd64-libc/conform/UNIX98/ndbm.h/scratch/ndbm.h-test.c:1:10: fatal error: ndbm.h: No such file or directory
On 27/03/19 at 08:48 +0100, Florian Weimer wrote:
> > If that's useful, I can easily provide access to an AWS VM to debug this
> > issue.
>
> Oh, that would be quite helpful indeed.

Can you send your SSH key? (I thought there was a way to get the SSH key
for a DD, but I cannot find it anymore)

Then you will be able to ssh to root@18.184.55.40.
There's sbuild and schroot setup on the VM.

When you are done, please 'poweroff' the machine, which will terminate
it.

Lucas
Bug#924891: glibc: FTBFS: /<>/build-tree/amd64-libc/conform/UNIX98/ndbm.h/scratch/ndbm.h-test.c:1:10: fatal error: ndbm.h: No such file or directory
* Lucas Nussbaum:

> On 26/03/19 at 23:10 +0100, Aurelien Jarno wrote:
>> On 2019-03-22 17:30, Florian Weimer wrote:
>> > > About the archive rebuild: The rebuild was done on EC2 VM instances from
>> > > Amazon Web Services, using a clean, minimal and up-to-date chroot. Every
>> > > failed build was retried once to eliminate random failures.
>> >
>> > I believe the actual test failure is tst-pkey.
>> >
>> > Presumably, this rebuild was performed on some Xeon SP CPU. Do you
>> > know which model? Do you have any information about the kernel and
>> > hypervisor used?
>> >
>> > 32-bit protection key support has had issues from time to time.
>>
>> Do you have some more details about the issue? Is it a glibc or a kernel
>> problem?
>>
>> If we can't fix the issue easily on the libc side, I guess the way to
>> fix that is to XFAIL that test on 32-bit x86.
>
> If that's useful, I can easily provide access to an AWS VM to debug this
> issue.

Oh, that would be quite helpful indeed.
Bug#924891: glibc: FTBFS: /<>/build-tree/amd64-libc/conform/UNIX98/ndbm.h/scratch/ndbm.h-test.c:1:10: fatal error: ndbm.h: No such file or directory
On 26/03/19 at 23:10 +0100, Aurelien Jarno wrote:
> On 2019-03-22 17:30, Florian Weimer wrote:
> > > About the archive rebuild: The rebuild was done on EC2 VM instances from
> > > Amazon Web Services, using a clean, minimal and up-to-date chroot. Every
> > > failed build was retried once to eliminate random failures.
> >
> > I believe the actual test failure is tst-pkey.
> >
> > Presumably, this rebuild was performed on some Xeon SP CPU. Do you
> > know which model? Do you have any information about the kernel and
> > hypervisor used?
> >
> > 32-bit protection key support has had issues from time to time.
>
> Do you have some more details about the issue? Is it a glibc or a kernel
> problem?
>
> If we can't fix the issue easily on the libc side, I guess the way to
> fix that is to XFAIL that test on 32-bit x86.

If that's useful, I can easily provide access to an AWS VM to debug this
issue.

Lucas
Bug#924891: glibc: FTBFS: /<>/build-tree/amd64-libc/conform/UNIX98/ndbm.h/scratch/ndbm.h-test.c:1:10: fatal error: ndbm.h: No such file or directory
On 2019-03-22 17:30, Florian Weimer wrote:
> > About the archive rebuild: The rebuild was done on EC2 VM instances from
> > Amazon Web Services, using a clean, minimal and up-to-date chroot. Every
> > failed build was retried once to eliminate random failures.
>
> I believe the actual test failure is tst-pkey.
>
> Presumably, this rebuild was performed on some Xeon SP CPU. Do you
> know which model? Do you have any information about the kernel and
> hypervisor used?
>
> 32-bit protection key support has had issues from time to time.

Do you have some more details about the issue? Is it a glibc or a kernel
problem?

If we can't fix the issue easily on the libc side, I guess the way to
fix that is to XFAIL that test on 32-bit x86.

-- 
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net
Bug#924891: glibc: FTBFS: /<>/build-tree/amd64-libc/conform/UNIX98/ndbm.h/scratch/ndbm.h-test.c:1:10: fatal error: ndbm.h: No such file or directory
On 22/03/19 at 17:30 +0100, Florian Weimer wrote:
> > About the archive rebuild: The rebuild was done on EC2 VM instances from
> > Amazon Web Services, using a clean, minimal and up-to-date chroot. Every
> > failed build was retried once to eliminate random failures.
>
> I believe the actual test failure is tst-pkey.
>
> Presumably, this rebuild was performed on some Xeon SP CPU. Do you
> know which model? Do you have any information about the kernel and
> hypervisor used?
>
> 32-bit protection key support has had issues from time to time.

Hi,

Below is /proc/cpuinfo on one of the VM. I believe that they are all the
same, but I'm not 100% sure.

I don't have any information about the kernel/hypervisor used on the
host system. In the VM, it's the current stretch kernel:
# uname -a
Linux ip-172-31-3-87 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux

- Lucas

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
stepping : 4
microcode : 0x25a
cpu MHz : 2500.000
cache size : 33792 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 5000.00
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
stepping : 4
microcode : 0x25a
cpu MHz : 2500.000
cache size : 33792 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 5000.00
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
stepping : 4
microcode : 0x25a
cpu MHz : 2500.000
cache size : 33792 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 5000.00
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
stepping : 4
microcode : 0x25a
cpu MHz : 2500.000
cache size : 33792 KB
physical id : 0
siblings : 4
core id : 1
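
[Editorial sketch, not part of the original message.] The flags lines above end in "pku ospke": pku means the CPU implements protection keys, and ospke means the running kernel has enabled them, which is why tst-pkey exercises real hardware support on these VMs. A rough way to test for both words in a /proc/cpuinfo-style file is sketched below. The helper names flags_contain and cpuinfo_reports_pkeys are hypothetical and were introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if `flag` occurs in `flags` as a whole
   whitespace-delimited word (so "pku" does not match "xpku"). */
bool flags_contain(const char *flags, const char *flag)
{
    assert(flags != NULL && flag != NULL);
    size_t len = strlen(flag);
    if (len == 0)
        return false;

    const char *p = flags;
    while ((p = strstr(p, flag)) != NULL) {
        bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        char end = p[len];
        bool ends = end == '\0' || end == ' ' || end == '\t' || end == '\n';
        if (starts && ends)
            return true;
        p++;                     /* keep scanning after a partial match */
    }
    return false;
}

/* Hypothetical helper: inspects the first "flags" line of the file at
   `path`. Returns 1 if both pku and ospke are listed, 0 if not, and
   -1 if the file cannot be opened. */
int cpuinfo_reports_pkeys(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char line[8192];             /* flags lines are long but bounded */
    int result = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            result = flags_contain(line, "pku")
                     && flags_contain(line, "ospke");
            break;
        }
    }
    fclose(f);
    return result;
}
```

Calling cpuinfo_reports_pkeys("/proc/cpuinfo") on a Linux machine gives a quick answer. A result of 1 only says that protection keys are advertised and enabled, not that the kernel preserves PKRU correctly across signals.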
Bug#924891: glibc: FTBFS: /<>/build-tree/amd64-libc/conform/UNIX98/ndbm.h/scratch/ndbm.h-test.c:1:10: fatal error: ndbm.h: No such file or directory
> About the archive rebuild: The rebuild was done on EC2 VM instances from
> Amazon Web Services, using a clean, minimal and up-to-date chroot. Every
> failed build was retried once to eliminate random failures.

I believe the actual test failure is tst-pkey.

Presumably, this rebuild was performed on some Xeon SP CPU. Do you
know which model? Do you have any information about the kernel and
hypervisor used?

32-bit protection key support has had issues from time to time.

Thanks.