I think I have all the answers. See below. On Thursday, January 6, 2022 at 4:55:25 AM UTC-5 Nadav Har'El wrote:
> On Thu, Jan 6, 2022 at 6:56 AM Waldek Kozaczuk <[email protected]> wrote: > >> Hi, >> >> When one compiles kernel with most symbols hidden (which also hides >> completely STD C++ library), the tst-tls-pie.so crashes like so: >> >> #0 0x00000000402da5b2 in processor::cli_hlt () at >> arch/x64/processor.hh:247 >> #1 arch::halt_no_interrupts () at arch/x64/arch.hh:48 >> #2 osv::halt () at arch/x64/power.cc:26 >> #3 0x00000000402211a0 in abort (fmt=fmt@entry=0x4049aef3 "Aborted\n") at >> runtime.cc:137 >> #4 0x00000000402211b2 in abort () at runtime.cc:103 >> #5 0x000000004039cade in osv::generate_signal (siginfo=..., >> ef=0xffff800000e87068) at libc/signal.cc:130 >> #6 0x000000004039cb9f in osv::handle_mmap_fault >> (addr=addr@entry=18446744073709547520, sig=sig@entry=11, >> ef=ef@entry=0xffff800000e87068) >> at libc/signal.cc:145 >> #7 0x000000004028709b in mmu::vm_sigsegv (ef=0xffff800000e87068, >> addr=18446744073709547520) at core/mmu.cc:1334 >> #8 mmu::vm_fault (addr=18446744073709547520, >> addr@entry=18446744073709551592, ef=ef@entry=0xffff800000e87068) at >> core/mmu.cc:1354 >> #9 0x00000000402d3d90 in page_fault (ef=0xffff800000e87068) at >> arch/x64/mmu.cc:42 >> #10 <signal handler called> >> #11 0x0000100000156cf7 in std::ostream::sentry::sentry >> (this=0x200000200df0, __os=...) >> at >> /usr/src/debug/gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/ostream.tcc:51 >> > > Since your kernel does *not* export the C++ library, where did this symbol > come from? Did you build the tests with the host's C++ library? > I'll continue reading with the assumption that you did. > It turns out that even though the symbols are hidden and their names do not show up in the stack trace printed by the kernel, there are still in the original elf before stripping so when I connect with gdb pointing to loader.elf the symbol names show up nicely. readelf -sW build/release/loader-stripped.elf | grep run_init_funcs #NOTHING readelf -sW build/release/loader.elf | grep run_init_funcs 14789: 0000000040297120 488 FUNC LOCAL DEFAULT 4 _ZN3elf6object14run_init_funcsEiPPc > > >> #12 0x000010000015743c in std::__ostream_insert<char, >> std::char_traits<char> > (__out=..., __s=0x40303a "PASS", __n=__n@entry=4) >> at >> /usr/src/debug/gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/ostream_insert.h:82 >> #13 0x0000000000401ccf in std::operator<< <std::char_traits<char> > >> (__s=<optimized out>, __out=...) >> at /usr/include/c++/10/bits/char_traits.h:371 >> #14 report (ok=<optimized out>, msg="v7 in init function") at >> /home/wkozaczuk/projects/osv-true-master/tests/tst-tls.cc:57 >> #15 0x00000000004014b3 in before_main () at >> /home/wkozaczuk/projects/osv-true-master/tests/tst-tls.cc:127 >> #16 0x00000000402972c5 in elf::object::run_init_funcs >> (this=0xffffa0000094a600, argc=argc@entry=1, >> argv=argv@entry=0xffffa0000094a400) >> at core/elf.cc:1178 >> #17 0x0000000040298a0b in elf::program::init_library (this=<optimized >> out>, argc=1, argv=0xffffa0000094a400) at core/elf.cc:1500 >> #18 0x000000004020c239 in osv::application::main >> (this=0xffffa0000094cc10) at core/app.cc:319 >> #19 0x0000000040365cb9 in operator() (app=<optimized out>, __closure=0x0) >> at core/app.cc:236 >> #20 _FUN () at core/app.cc:238 >> #21 0x0000000040398a66 in operator() (__closure=0xffffa00000abda00) at >> libc/pthread.cc:116 >> #22 std::__invoke_impl<void, pthread_private::pthread::pthread(void* >> (*)(void*), void*, sigset_t, const >> pthread_private::thread_attr*)::<lambda()>&> (__f=...) at >> /usr/include/c++/10/bits/invoke.h:60 >> #23 std::__invoke_r<void, pthread_private::pthread::pthread(void* >> (*)(void*), void*, sigset_t, const >> pthread_private::thread_attr*)::<lambda()>&> (__fn=...) at >> /usr/include/c++/10/bits/invoke.h:153 >> #24 std::_Function_handler<void(), >> pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const >> pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const >> std::_Any_data &) (__functor=...) at >> /usr/include/c++/10/bits/std_function.h:291 >> #25 0x00000000403391ea in sched::thread::main (this=0xffff800000e82040) >> at core/sched.cc:1267 >> #26 sched::thread_main_c (t=0xffff800000e82040) at >> arch/x64/arch-switch.hh:325 >> #27 0x00000000402d3b33 in thread_main () at arch/x64/entry.S:116 >> >> I have been researching this for a bit and realized that when I change >> the report() function not use std::cout but instead use printf() like so: >> >> printf("%s: %s\n", ok ? "PASS" : "FAIL", msg.c_str()); >> the crash goes away. >> >> I knew something was wrong with the initialization of some objects but >> could not quite pin it down. Then I tried to run the tst-tls-pie.so on the >> Linux host (it is a pie) and it crashed with a segmentation fault. This >> made me think that maybe something is wrong with the test program itself. >> > > Interesting. I remember tests/tst-tls.cc did work on Linux in the past, > and the "pie" compilation which you added in > 70547a6c64c56cbdf4c5b5d0c7eaa1e85badb0f7 should supposedly have worked too > but I don't see that you documented it. Maybe you never ran it on Linux? > I think I never ran it on Linux or if I did and my memory serves me well it did crash but I never got to the bottom of why. For example the tst-getopt-pie.so always worked on Linux. > > >> Finally, I found this on the internet - >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94810 - where somebody >> tried to compile and run a program with a similar construct: std::cout used >> in the __constructor__ annotated function. The stack trace looks very >> similar as well. The bug was rejected with the explanation that the >> std::ios_base::Init object may not be initialized yet if used in the >> __constructor__ annotated function. >> > >> What is interesting this test happens to work with kernel exposing all >> symbols including stdc++ because the kernel copy of std::ios_base::Init was >> already initialized. >> > > Very interesting, and nice detective work. Yes, I think this is exactly > the same bug. > *However*, I'm not sure I understand why it shouldn't have worked: > > If test-tls-pie's *depends* on libstdc++.so, unless I'm missing something, > OSv should run libstdc++'s constructors *before* it runs test-tls-pie's. > So when tests/tst-tls's constructor function runs, libstdc++ should > already have been fully constructed. > I see below you also raised this question.... > > That being said, I'm probably missing something because if I was > correctly, it would run correctly on Linux and > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94810 would not exist... > After more research and using the simpler program below (taken from gcc report above), it turns out that std::ios_base::Init::Init() that initializes std::cout is called by an init function that is generated by the compiler as part of the test program itself NOT as we thought as part of libstdc++.so. So indeed it all depends on the order of the init functions in the same object and it seems that the function calling std::ios_base::Init::Init() happens to be after our init function - before_main. Here is a simpler example: #include <iostream> __attribute__((constructor)) void constructor() { std::cout << "Starting" << std::endl; } int main() { std::cout << "Hello, World!" << std::endl; return 0; } __attribute__((destructor)) void destructor() { std::cout << "Finishing" << std::endl; } It crashes on Linux and on OSv. When connected with gdb with a breakpoint in elf::run_init_funcs() I see this: (gdb) p funcs $2 = (void (**)(int, char **)) 0x403db8 (gdb) p funcs[0] $3 = (void (*)(int, char **)) 0x401150 <frame_dummy> (gdb) p funcs[1] $4 = (void (*)(int, char **)) 0x401156 <constructor()> (gdb) p funcs[2] $5 = (void (*)(int, char **)) 0x401230 <_GLOBAL__sub_I__Z11constructorv> (gdb) disas 0x401156 Dump of assembler code for function _Z11constructorv: 0x0000000000401156 <+0>: push %rbp 0x0000000000401157 <+1>: mov %rsp,%rbp 0x000000000040115a <+4>: lea 0xeb0(%rip),%rsi # 0x402011 0x0000000000401161 <+11>: lea 0x2ed8(%rip),%rdi # 0x404040 <_ZSt4cout@@GLIBCXX_3.4> 0x0000000000401168 <+18>: call 0x401040 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt> 0x000000000040116d <+23>: mov %rax,%rdx 0x0000000000401170 <+26>: mov 0x2e69(%rip),%rax # 0x403fe0 0x0000000000401177 <+33>: mov %rax,%rsi 0x000000000040117a <+36>: mov %rdx,%rdi 0x000000000040117d <+39>: call 0x401050 <_ZNSolsEPFRSoS_E@plt> 0x0000000000401182 <+44>: nop 0x0000000000401183 <+45>: pop %rbp 0x0000000000401184 <+46>: ret End of assembler dump. (gdb) disas 0x401230 Dump of assembler code for function _GLOBAL__sub_I__Z11constructorv: 0x0000000000401230 <+0>: push %rbp 0x0000000000401231 <+1>: mov %rsp,%rbp 0x0000000000401234 <+4>: mov $0xffff,%esi 0x0000000000401239 <+9>: mov $0x1,%edi 0x000000000040123e <+14>: call 0x4011e7 <_Z41__static_initialization_and_destruction_0ii> 0x0000000000401243 <+19>: pop %rbp 0x0000000000401244 <+20>: ret End of assembler dump. (gdb) disas _Z41__static_initialization_and_destruction_0ii Dump of assembler code for function __static_initialization_and_destruction_0(int, int): 0x00000000402029e0 <+0>: push %rbp 0x00000000402029e1 <+1>: mov $0x405910d0,%edi 0x00000000402029e6 <+6>: mov %rsp,%rbp 0x00000000402029e9 <+9>: sub $0x170,%rsp 0x00000000402029f0 <+16>: call 0x403e9ec0 <_ZNSt8ios_base4InitC2Ev> 0x00000000402029f5 <+21>: mov $0x40590888,%edx 0x00000000402029fa <+26>: mov $0x405910d0,%esi 0x00000000402029ff <+31>: mov $0x403ea6c0,%edi 0x0000000040202a04 <+36>: call 0x40221dc0 <__cxxabiv1::__cxa_atexit(destructor_t, void*, void*)> 0x0000000040202a09 <+41>: mov $0x404a037e,%ecx 0x0000000040202a0e <+46>: mov $0x404a0368,%r8d 0x0000000040202a14 <+52>: xor %esi,%esi 0x0000000040202a16 <+54>: mov $0x404a0598,%edx 0x0000000040202a1b <+59>: mov $0x40591060,%edi 0x0000000040202a20 <+64>: call 0x40345500 <_ZN15tracepoint_baseC2EjRKSt9type_infoPKcS4_> 0x0000000040202a25 <+69>: mov $0x40591060,%esi 0x0000000040202a2a <+74>: mov $0x40234ca0,%edi 0x0000000040202a2f <+79>: mov $0x40590888,%edx 0x0000000040202a34 <+84>: movq $0x404a054f,0x38e641(%rip) # 0x40591080 <_ZL15trace_tcp_state+32> 0x0000000040202a3f <+95>: call 0x40221dc0 <__cxxabiv1::__cxa_atexit(destructor_t, void*, void*)> 0x0000000040202a44 <+100>: mov $0x404a03a0,%esi 0x0000000040202a49 <+105>: lea -0x170(%rbp),%rdi 0x0000000040202a50 <+112>: mov $0x2e,%ecx 0x0000000040202a55 <+117>: rep movsq %ds:(%rsi),%es:(%rdi) 0x0000000040202a58 <+120>: lea -0x170(%rbp),%rax 0x0000000040202a5f <+127>: mov %rbp,%rsi 0x0000000040202a62 <+130>: nopw 0x0(%rax,%rax,1) 0x0000000040202a68 <+136>: mov (%rax),%rdx 0x0000000040202a6b <+139>: movzbl 0x8(%rax),%ecx 0x0000000040202a6f <+143>: add $0x10,%rax 0x0000000040202a73 <+147>: mov %cl,0x40590f60(%rdx) 0x0000000040202a79 <+153>: cmp %rsi,%rax 0x0000000040202a7c <+156>: jne 0x40202a68 <__static_initialization_and_destruction_0(int, int)+136> 0x0000000040202a7e <+158>: leave 0x0000000040202a7f <+159>: ret As you can see the _GLOBAL__sub_I__Z11constructorv which calls __static_initialization_and_destruction_0 which calls _ZNSt8ios_base4InitC2Ev is after the _Z11constructorv function defined as part of the test program. So indeed the std::cout is not initialized yet at this point. So as the gcc report comment say, it is NOT valid to use std::cot (and others) in an init function as the order in which these function and generated by the compiler and called in undetermined. In other words that simple sample program above and our tst-tls-pie.so is not a valid C++ program. The reason it works when symbols are NOT hidden is because we include and export libstdc++.so (statically linked) os the std::cout gets initialized as part of the kernel early on. > > >> So I think the tst-tls.cc needs to be changed to not to use std:cout in >> the report function or uses different report() from before_main(). >> > > That sounds like a good workaround, but I would have liked to understand > better if this problem is expected, or some sort of OSv > initialization-order bug. > I think this is more than a workaround. It is actually fixing an invalid program that just happens to work when all symbols and libstdc++.so is exported. > > >> Waldek >> >> PS1. Regarding using std:cout in various __constructor__ kernel >> functions like parse_madt() (which calls debug()) and called by smp_init(), >> I wonder if we are just lucky that std::ios_base::Init is already >> initialized and we do not see similar crashes. >> > > I think we already ironed out these issues in the kernel log ago. We have > an explicit order for some things (see include/osv/prio.hh) and all the C++ > code which depends on other stuff being initialized happens after those > things. I don't remember the details though.. > > >> >> PS2: This printout with dynamic linker info suggests that all global >> objects in stdc++.so should have been initialized before calling >> before_main(). So why is std::ios_base::Init not initialized yet? >> >> ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Executing DT_INIT function >> ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Finished executing DT_INIT >> function >> ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Executing 12 >> DT_INIT_ARRAYSZ functions >> ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Finished executing 12 >> DT_INIT_ARRAYSZ functions >> ELF [tid:26, mod:4, /tests/libtls.so]: Executing DT_INIT function >> ELF [tid:26, mod:4, /tests/libtls.so]: Finished executing DT_INIT function >> ELF [tid:26, mod:4, /tests/libtls.so]: Executing 1 DT_INIT_ARRAYSZ >> functions >> ELF [tid:26, mod:4, /tests/libtls.so]: Finished executing 1 >> DT_INIT_ARRAYSZ functions >> ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Executing DT_INIT function >> ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Finished executing DT_INIT >> function >> ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Executing 2 DT_INIT_ARRAYSZ >> functions >> ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Finished executing 2 >> DT_INIT_ARRAYSZ functions >> ELF [tid:26, mod:2, /tests/tst-tls-pie.so]: Executing DT_INIT function >> ELF [tid:26, mod:2, /tests/tst-tls-pie.so]: Finished executing DT_INIT >> function >> ELF [tid:26, mod:2, /tests/tst-tls-pie.so]: Executing 3 DT_INIT_ARRAYSZ >> functions >> Aborted >> >> [backtrace] >> 0x000000004039cadd <???+1077529309> >> 0x000000004039cb9e <???+1077529502> >> 0x000000004028709a <???+1076392090> >> 0x00000000402d3d8f <???+1076706703> >> 0x00000000402d2bb6 <???+1076702134> >> 0x00000000004050bf <???+4214975> >> >> -- >> You received this message because you are subscribed to the Google Groups >> "OSv Development" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> To view this discussion on the web visit >> https://groups.google.com/d/msgid/osv-dev/d0c7f7c0-91f4-4478-9ccf-c2cfe13c15afn%40googlegroups.com >> >> <https://groups.google.com/d/msgid/osv-dev/d0c7f7c0-91f4-4478-9ccf-c2cfe13c15afn%40googlegroups.com?utm_medium=email&utm_source=footer> >> . >> > -- You received this message because you are subscribed to the Google Groups "OSv Development" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/6dac9cfd-5db7-408d-8742-73cc38cf6402n%40googlegroups.com.
