I have been trying to implement missing support of sigsetjmp/siglongjmp for
aarch64 architecture which is critical (it looks) to get boost based unit
tests running. To that end, I replaced the stubs of these functions in
libc/arch/aarch64/setjmp/ and the real implementations
in libc/arch/x64/setjmp/ to point to new musl 1.1.24 code which support
aarch64 as well (again one more reason why I did the musl upgrade).
After this change and also implementing the functionality in
arch/aarch64/feexcept.cc (taken from newlib) the boost unit tests working
on aarch64.
However, the problem is that two unit test crash on x64:
-
tst-feexcept.cc
-
tst-sigaltstack.so
More specifically all assertions actually pass but the tests apps crash due
to zero address page fault:
./scripts/run.py -e '/tests/tst-sigaltstack.so' -c 1
OSv v0.55.0-107-g81539853
eth0: 192.168.122.15
Booted up in 169.13 ms
Cmdline: /tests/tst-sigaltstack.so
SUMMARY: 6 tests, 0 failures
page fault outside application, addr: 0x0000000000000000
[registers]
RIP: 0x0000000040428884 <osv::application::run_main()+68>
RFL: 0x0000000000010202 CS: 0x0000000000000008 SS: 0x0000000000000010
RAX: 0x0000000000000000 RBX: 0x0000000000000000 RCX: 0x0000000000000001
RDX: 0x0000000000000000
RSI: 0x00001000000070ac RDI: 0x0000200000700e80 RBP: 0x0000200000700f90
R8: 0xfffffffffffff958
R9: 0x000000004091bb58 R10: 0x0000000000000050 R11: 0xffffa0007ff77400
R12: 0x0000000000000001
R13: 0x000000004091bb58 R14: 0x0000000000000050 R15: 0xffffa0007ff77400
RSP: 0x0000200000700f60
Aborted
[backtrace]
0x000000004020556e <???+1075860846>
0x000000004039268f <page_fault+143>
0x0000000040391546 <???+1077482822>
0x0000000040428a78 <???+1078102648>
0x000000004045a1c5 <???+1078305221>
0x00000000403f7a59 <thread_main_c+41>
0x00000000403924c2 <???+1077486786>
The gdb backtrace looks like this:
(gdb) bt
#0 0x0000000040398e92 in processor::cli_hlt () at arch/x64/processor.hh:247
#1 arch::halt_no_interrupts () at arch/x64/arch.hh:48
#2 osv::halt () at arch/x64/power.cc:26
#3 0x0000000040238530 in abort (fmt=fmt@entry=0x4064451f "Aborted\n") at
runtime.cc:132
#4 0x0000000040202ea6 in abort () at runtime.cc:98
#5 0x000000004020556f in mmu::vm_sigsegv (ef=0xffff800000ef9068, addr=0)
at core/mmu.cc:1314
#6 mmu::vm_sigsegv (ef=0xffff800000ef9068, addr=0) at core/mmu.cc:1308
#7 mmu::vm_fault (addr=0, addr@entry=88, ef=ef@entry=0xffff800000ef9068)
at core/mmu.cc:1336
#8 0x0000000040392690 in page_fault (ef=0xffff800000ef9068) at
arch/x64/mmu.cc:42
#9 <signal handler called>
#10 0x0000000040428884 in osv::application::run_main (this=0x0) at
/usr/include/c++/10/bits/stl_vector.h:918
#11 0x0000000040428a79 in operator() (app=<optimized out>, __closure=0x0)
at core/app.cc:233
#12 _FUN () at core/app.cc:235
#13 0x000000004045a1c6 in operator() (__closure=0xffffa00000cf6800) at
libc/pthread.cc:115
#14 std::__invoke_impl<void, pthread_private::pthread::pthread(void*
(*)(void*), void*, sigset_t, const
pthread_private::thread_attr*)::<lambda()>&> (__f=...) at
/usr/include/c++/10/bits/invoke.h:60
#15 std::__invoke_r<void, pthread_private::pthread::pthread(void*
(*)(void*), void*, sigset_t, const
pthread_private::thread_attr*)::<lambda()>&> (__fn=...) at
/usr/include/c++/10/bits/invoke.h:153
#16 std::_Function_handler<void(), pthread_private::pthread::pthread(void*
(*)(void*), void*, sigset_t, const
pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const
std::_Any_data &) (
__functor=...) at /usr/include/c++/10/bits/std_function.h:291
#17 0x00000000403f7a5a in sched::thread::main (this=0xffff800000ef4040) at
core/sched.cc:1210
#18 sched::thread_main_c (t=0xffff800000ef4040) at
arch/x64/arch-switch.hh:321
#19 0x00000000403924c3 in thread_main () at arch/x64/entry.S:113
I think the this=0x0 in frame 10 is the culprit? Did something write over
this memory? BTW the crash for tst-feexcept.cc looks pretty much identical.
Now, I have replaced two files in Makefile from libc to musl and added 3rd
musl file.
diff --git a/Makefile b/Makefile
index 9d2997a1..f7d740b0 100644
--- a/Makefile
+++ b/Makefile
@@ -1398,8 +1398,12 @@ musl += process/wait.o
musl += setjmp/$(musl_arch)/setjmp.o
musl += setjmp/$(musl_arch)/longjmp.o
-libc += arch/$(arch)/setjmp/siglongjmp.o
-libc += arch/$(arch)/setjmp/sigsetjmp.o
+#libc += arch/$(arch)/setjmp/siglongjmp.o
+#libc += arch/$(arch)/setjmp/sigsetjmp.o
+musl += signal/$(musl_arch)/sigsetjmp.o
+musl += signal/siglongjmp.o
+musl += signal/sigsetjmp_tail.o
+$(out)/musl/src/signal/sigsetjmp_tail.o: CFLAGS += --include
libc/syscall_to_function.h
libc += arch/$(arch)/setjmp/block.o
ifeq ($(arch),x64)
libc += arch/$(arch)/ucontext/getcontext.o
Now here is how the 2 files look in libc:
cat libc/arch/x64/setjmp/siglongjmp.c
#include <setjmp.h>
extern void __restore_sigs(void *set);
_Noreturn void siglongjmp(sigjmp_buf buf, int ret)
{
if (buf->__fl) __restore_sigs(buf->__ss);
longjmp(buf, ret);
}
cat libc/arch/x64/setjmp/sigsetjmp.s
/* Copyright 2011-2012 Nicholas J. Kain, licensed under standard MIT
license */
.global sigsetjmp
.global __sigsetjmp
.type sigsetjmp,@function
.type __sigsetjmp,@function
__sigsetjmp:
sigsetjmp:
andl %esi,%esi
movq %rsi,64(%rdi)
jz 1f
pushq %rdi
leaq 72(%rdi),%rdx
xorl %esi,%esi
movl $2,%edi
call sigprocmask
popq %rdi
1: jmp setjmp
And here how 2 changed and 1 extra new file looks in musl:
cat musl/src/signal/siglongjmp.c
#include <setjmp.h>
#include <signal.h>
#include "syscall.h"
#include "pthread_impl.h"
_Noreturn void siglongjmp(sigjmp_buf buf, int ret)
{
longjmp(buf, ret);
}
cat musl/src/signal/x86_64/sigsetjmp.s
.global sigsetjmp
.global __sigsetjmp
.type sigsetjmp,@function
.type __sigsetjmp,@function
sigsetjmp:
__sigsetjmp:
test %esi,%esi
jz 1f
popq 64(%rdi)
mov %rbx,72+8(%rdi)
mov %rdi,%rbx
call setjmp@PLT
pushq 64(%rbx)
mov %rbx,%rdi
mov %eax,%esi
mov 72+8(%rbx),%rbx
.hidden __sigsetjmp_tail
jmp __sigsetjmp_tail
1: jmp setjmp@PLT
cat musl/src/signal/sigsetjmp_tail.c
#include <setjmp.h>
#include <signal.h>
#include "syscall.h"
hidden int __sigsetjmp_tail(sigjmp_buf jb, int ret)
{
void *p = jb->__ss;
__syscall(SYS_rt_sigprocmask, SIG_SETMASK, ret?p:0, ret?0:p, _NSIG/8);
return ret;
}
Please note that we are actually replacing __syscall() in the las file with
the local function call to sigprocmask() using the --include
libc/syscall_to_function.h trick.
As one can tell the implementation of sigsetjmp has changed in musl and
that is causing some issues on our side. Here is the link to the musl
commit -
https://git.musl-libc.org/cgit/musl/commit/?id=583e55122e767b1586286a0d9c35e2a4027998ab
(redesign
sigsetjmp so that signal mask is restored after longjmp) that changed it.
I am not sure if I fully understand the change but it looks the return
address is stored in a different place. Maybe because we use thread local
variable for env causes some issues? Why are using thread local variables
in those 2 tests?
Now, these two tests also fail on aarch64 but for a different reason - we
do not have TLS implemented on aarch64 but I guess if we did those tests
would fail in similar way.
Unfortunately, I do not understand signal handling on OSv and its
limitations so I am not sure if the changes to sigsetjmp on musl side are
actually helpful to us and we should upgrade to the newest version. Or
maybe it would be better to keep the x64 version of it as is and only use
(copy) the pre-583e55122e767b1586286a0d9c35e2a4027998ab-commit version of
aarch64 sigsetjmp from musl instead which would be sad.
Any ideas?
Waldek
--
You received this message because you are subscribed to the Google Groups "OSv
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/osv-dev/bc3d65f7-f8d1-4bf6-afeb-76be861afa25n%40googlegroups.com.