On Thu, Jan 6, 2022 at 6:56 AM Waldek Kozaczuk <[email protected]> wrote:
> Hi,
>
> When one compiles kernel with most symbols hidden (which also hides
> completely STD C++ library), the tst-tls-pie.so crashes like so:
>
> #0 0x00000000402da5b2 in processor::cli_hlt () at
> arch/x64/processor.hh:247
> #1 arch::halt_no_interrupts () at arch/x64/arch.hh:48
> #2 osv::halt () at arch/x64/power.cc:26
> #3 0x00000000402211a0 in abort (fmt=fmt@entry=0x4049aef3 "Aborted\n") at
> runtime.cc:137
> #4 0x00000000402211b2 in abort () at runtime.cc:103
> #5 0x000000004039cade in osv::generate_signal (siginfo=...,
> ef=0xffff800000e87068) at libc/signal.cc:130
> #6 0x000000004039cb9f in osv::handle_mmap_fault
> (addr=addr@entry=18446744073709547520,
> sig=sig@entry=11, ef=ef@entry=0xffff800000e87068)
> at libc/signal.cc:145
> #7 0x000000004028709b in mmu::vm_sigsegv (ef=0xffff800000e87068,
> addr=18446744073709547520) at core/mmu.cc:1334
> #8 mmu::vm_fault (addr=18446744073709547520, addr@entry=18446744073709551592,
> ef=ef@entry=0xffff800000e87068) at core/mmu.cc:1354
> #9 0x00000000402d3d90 in page_fault (ef=0xffff800000e87068) at
> arch/x64/mmu.cc:42
> #10 <signal handler called>
> #11 0x0000100000156cf7 in std::ostream::sentry::sentry
> (this=0x200000200df0, __os=...)
> at
> /usr/src/debug/gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/ostream.tcc:51
>
Since your kernel does *not* export the C++ library, where did this symbol
come from? Did you build the tests with the host's C++ library?
I'll continue reading with the assumption that you did.
> #12 0x000010000015743c in std::__ostream_insert<char,
> std::char_traits<char> > (__out=..., __s=0x40303a "PASS", __n=__n@entry=4)
> at
> /usr/src/debug/gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/ostream_insert.h:82
> #13 0x0000000000401ccf in std::operator<< <std::char_traits<char> >
> (__s=<optimized out>, __out=...)
> at /usr/include/c++/10/bits/char_traits.h:371
> #14 report (ok=<optimized out>, msg="v7 in init function") at
> /home/wkozaczuk/projects/osv-true-master/tests/tst-tls.cc:57
> #15 0x00000000004014b3 in before_main () at
> /home/wkozaczuk/projects/osv-true-master/tests/tst-tls.cc:127
> #16 0x00000000402972c5 in elf::object::run_init_funcs
> (this=0xffffa0000094a600, argc=argc@entry=1, argv=argv@entry
> =0xffffa0000094a400)
> at core/elf.cc:1178
> #17 0x0000000040298a0b in elf::program::init_library (this=<optimized
> out>, argc=1, argv=0xffffa0000094a400) at core/elf.cc:1500
> #18 0x000000004020c239 in osv::application::main (this=0xffffa0000094cc10)
> at core/app.cc:319
> #19 0x0000000040365cb9 in operator() (app=<optimized out>, __closure=0x0)
> at core/app.cc:236
> #20 _FUN () at core/app.cc:238
> #21 0x0000000040398a66 in operator() (__closure=0xffffa00000abda00) at
> libc/pthread.cc:116
> #22 std::__invoke_impl<void, pthread_private::pthread::pthread(void*
> (*)(void*), void*, sigset_t, const
> pthread_private::thread_attr*)::<lambda()>&> (__f=...) at
> /usr/include/c++/10/bits/invoke.h:60
> #23 std::__invoke_r<void, pthread_private::pthread::pthread(void*
> (*)(void*), void*, sigset_t, const
> pthread_private::thread_attr*)::<lambda()>&> (__fn=...) at
> /usr/include/c++/10/bits/invoke.h:153
> #24 std::_Function_handler<void(), pthread_private::pthread::pthread(void*
> (*)(void*), void*, sigset_t, const
> pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const
> std::_Any_data &) (__functor=...) at
> /usr/include/c++/10/bits/std_function.h:291
> #25 0x00000000403391ea in sched::thread::main (this=0xffff800000e82040) at
> core/sched.cc:1267
> #26 sched::thread_main_c (t=0xffff800000e82040) at
> arch/x64/arch-switch.hh:325
> #27 0x00000000402d3b33 in thread_main () at arch/x64/entry.S:116
>
> I have been researching this for a bit and realized that when I change the
> report() function not use std::cout but instead use printf() like so:
>
> printf("%s: %s\n", ok ? "PASS" : "FAIL", msg.c_str());
> the crash goes away.
>
> I knew something was wrong with the initialization of some objects but
> could not quite pin it down. Then I tried to run the tst-tls-pie.so on the
> Linux host (it is a pie) and it crashed with a segmentation fault. This
> made me think that maybe something is wrong with the test program itself.
>
Interesting. I remember tests/tst-tls.cc did work on Linux in the past, and
the "pie" compilation which you added in
70547a6c64c56cbdf4c5b5d0c7eaa1e85badb0f7 should supposedly have worked too
but I don't see that you documented it. Maybe you never ran it on Linux?
> Finally, I found this on the internet -
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94810 - where somebody tried
> to compile and run a program with a similar construct: std::cout used in
> the __constructor__ annotated function. The stack trace looks very similar
> as well. The bug was rejected with the explanation that the
> std::ios_base::Init object may not be initialized yet if used in the
> __constructor__ annotated function.
>
> What is interesting this test happens to work with kernel exposing all
> symbols including stdc++ because the kernel copy of std::ios_base::Init was
> already initialized.
>
Very interesting, and nice detective work. Yes, I think this is exactly the
same bug.
*However*, I'm not sure I understand why it shouldn't have worked:
If test-tls-pie's *depends* on libstdc++.so, unless I'm missing something,
OSv should run libstdc++'s constructors *before* it runs test-tls-pie's.
So when tests/tst-tls's constructor function runs, libstdc++ should already
have been fully constructed.
I see below you also raised this question....
That being said, I'm probably missing something because if I was correctly,
it would run correctly on Linux and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94810 would not exist...
> So I think the tst-tls.cc needs to be changed to not to use std:cout in
> the report function or uses different report() from before_main().
>
That sounds like a good workaround, but I would have liked to understand
better if this problem is expected, or some sort of OSv
initialization-order bug.
> Waldek
>
> PS1. Regarding using std:cout in various __constructor__ kernel functions
> like parse_madt() (which calls debug()) and called by smp_init(), I wonder
> if we are just lucky that std::ios_base::Init is already initialized and we
> do not see similar crashes.
>
I think we already ironed out these issues in the kernel log ago. We have
an explicit order for some things (see include/osv/prio.hh) and all the C++
code which depends on other stuff being initialized happens after those
things. I don't remember the details though..
>
> PS2: This printout with dynamic linker info suggests that all global
> objects in stdc++.so should have been initialized before calling
> before_main(). So why is std::ios_base::Init not initialized yet?
>
> ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Executing DT_INIT function
> ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Finished executing DT_INIT
> function
> ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Executing 12 DT_INIT_ARRAYSZ
> functions
> ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Finished executing 12
> DT_INIT_ARRAYSZ functions
> ELF [tid:26, mod:4, /tests/libtls.so]: Executing DT_INIT function
> ELF [tid:26, mod:4, /tests/libtls.so]: Finished executing DT_INIT function
> ELF [tid:26, mod:4, /tests/libtls.so]: Executing 1 DT_INIT_ARRAYSZ
> functions
> ELF [tid:26, mod:4, /tests/libtls.so]: Finished executing 1
> DT_INIT_ARRAYSZ functions
> ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Executing DT_INIT function
> ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Finished executing DT_INIT
> function
> ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Executing 2 DT_INIT_ARRAYSZ
> functions
> ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Finished executing 2
> DT_INIT_ARRAYSZ functions
> ELF [tid:26, mod:2, /tests/tst-tls-pie.so]: Executing DT_INIT function
> ELF [tid:26, mod:2, /tests/tst-tls-pie.so]: Finished executing DT_INIT
> function
> ELF [tid:26, mod:2, /tests/tst-tls-pie.so]: Executing 3 DT_INIT_ARRAYSZ
> functions
> Aborted
>
> [backtrace]
> 0x000000004039cadd <???+1077529309>
> 0x000000004039cb9e <???+1077529502>
> 0x000000004028709a <???+1076392090>
> 0x00000000402d3d8f <???+1076706703>
> 0x00000000402d2bb6 <???+1076702134>
> 0x00000000004050bf <???+4214975>
>
> --
> You received this message because you are subscribed to the Google Groups
> "OSv Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/osv-dev/d0c7f7c0-91f4-4478-9ccf-c2cfe13c15afn%40googlegroups.com
> <https://groups.google.com/d/msgid/osv-dev/d0c7f7c0-91f4-4478-9ccf-c2cfe13c15afn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
--
You received this message because you are subscribed to the Google Groups "OSv
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/osv-dev/CANEVyjsgZafUpxHr%2B09NtT-jgqdPxAav9Ps6BCHaE6ChfgXqbQ%40mail.gmail.com.