I think I have all the answers. See below.

On Thursday, January 6, 2022 at 4:55:25 AM UTC-5 Nadav Har'El wrote:

> On Thu, Jan 6, 2022 at 6:56 AM Waldek Kozaczuk <[email protected]> wrote:
>
>> Hi,
>>
>> When one compiles kernel with most symbols hidden (which also hides 
>> completely STD C++ library), the tst-tls-pie.so crashes like so:
>>
>> #0  0x00000000402da5b2 in processor::cli_hlt () at 
>> arch/x64/processor.hh:247
>> #1  arch::halt_no_interrupts () at arch/x64/arch.hh:48
>> #2  osv::halt () at arch/x64/power.cc:26
>> #3  0x00000000402211a0 in abort (fmt=fmt@entry=0x4049aef3 "Aborted\n") at 
>> runtime.cc:137
>> #4  0x00000000402211b2 in abort () at runtime.cc:103
>> #5  0x000000004039cade in osv::generate_signal (siginfo=..., 
>> ef=0xffff800000e87068) at libc/signal.cc:130
>> #6  0x000000004039cb9f in osv::handle_mmap_fault 
>> (addr=addr@entry=18446744073709547520, sig=sig@entry=11, 
>> ef=ef@entry=0xffff800000e87068)
>>     at libc/signal.cc:145
>> #7  0x000000004028709b in mmu::vm_sigsegv (ef=0xffff800000e87068, 
>> addr=18446744073709547520) at core/mmu.cc:1334
>> #8  mmu::vm_fault (addr=18446744073709547520, 
>> addr@entry=18446744073709551592, ef=ef@entry=0xffff800000e87068) at 
>> core/mmu.cc:1354
>> #9  0x00000000402d3d90 in page_fault (ef=0xffff800000e87068) at 
>> arch/x64/mmu.cc:42
>> #10 <signal handler called>
>> #11 0x0000100000156cf7 in std::ostream::sentry::sentry 
>> (this=0x200000200df0, __os=...)
>>     at 
>> /usr/src/debug/gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/ostream.tcc:51
>>
>
> Since your kernel does *not* export the C++ library, where did this symbol 
> come from? Did you build the tests with the host's C++ library?
> I'll continue reading with the assumption that you did.
>

It turns out that even though the symbols are hidden and their names do not 
show up in the stack trace printed by the kernel, there are still in the 
original elf before stripping so when I connect 
with gdb pointing to loader.elf the symbol names show up nicely.

readelf -sW build/release/loader-stripped.elf | grep run_init_funcs #NOTHING

readelf -sW build/release/loader.elf | grep run_init_funcs
 14789: 0000000040297120   488 FUNC    LOCAL  DEFAULT    4 
_ZN3elf6object14run_init_funcsEiPPc


>  
>
>> #12 0x000010000015743c in std::__ostream_insert<char, 
>> std::char_traits<char> > (__out=..., __s=0x40303a "PASS", __n=__n@entry=4)
>>     at 
>> /usr/src/debug/gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/ostream_insert.h:82
>> #13 0x0000000000401ccf in std::operator<< <std::char_traits<char> > 
>> (__s=<optimized out>, __out=...)
>>     at /usr/include/c++/10/bits/char_traits.h:371
>> #14 report (ok=<optimized out>, msg="v7 in init function") at 
>> /home/wkozaczuk/projects/osv-true-master/tests/tst-tls.cc:57
>> #15 0x00000000004014b3 in before_main () at 
>> /home/wkozaczuk/projects/osv-true-master/tests/tst-tls.cc:127
>> #16 0x00000000402972c5 in elf::object::run_init_funcs 
>> (this=0xffffa0000094a600, argc=argc@entry=1, 
>> argv=argv@entry=0xffffa0000094a400)
>>     at core/elf.cc:1178
>> #17 0x0000000040298a0b in elf::program::init_library (this=<optimized 
>> out>, argc=1, argv=0xffffa0000094a400) at core/elf.cc:1500
>> #18 0x000000004020c239 in osv::application::main 
>> (this=0xffffa0000094cc10) at core/app.cc:319
>> #19 0x0000000040365cb9 in operator() (app=<optimized out>, __closure=0x0) 
>> at core/app.cc:236
>> #20 _FUN () at core/app.cc:238
>> #21 0x0000000040398a66 in operator() (__closure=0xffffa00000abda00) at 
>> libc/pthread.cc:116
>> #22 std::__invoke_impl<void, pthread_private::pthread::pthread(void* 
>> (*)(void*), void*, sigset_t, const 
>> pthread_private::thread_attr*)::<lambda()>&> (__f=...) at 
>> /usr/include/c++/10/bits/invoke.h:60
>> #23 std::__invoke_r<void, pthread_private::pthread::pthread(void* 
>> (*)(void*), void*, sigset_t, const 
>> pthread_private::thread_attr*)::<lambda()>&> (__fn=...) at 
>> /usr/include/c++/10/bits/invoke.h:153
>> #24 std::_Function_handler<void(), 
>> pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const 
>> pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const 
>> std::_Any_data &) (__functor=...) at 
>> /usr/include/c++/10/bits/std_function.h:291
>> #25 0x00000000403391ea in sched::thread::main (this=0xffff800000e82040) 
>> at core/sched.cc:1267
>> #26 sched::thread_main_c (t=0xffff800000e82040) at 
>> arch/x64/arch-switch.hh:325
>> #27 0x00000000402d3b33 in thread_main () at arch/x64/entry.S:116
>>
>> I have been researching this for a bit and realized that when I change 
>> the report() function not use std::cout but instead use printf() like so:
>>
>> printf("%s: %s\n", ok ? "PASS" : "FAIL", msg.c_str());
>> the crash goes away.
>>
>> I knew something was wrong with the initialization of some objects but 
>> could not quite pin it down. Then I tried to run the tst-tls-pie.so on the 
>> Linux host (it is a pie) and it crashed with a segmentation fault. This 
>> made me think that maybe something is wrong with the test program itself. 
>>
>
> Interesting. I remember tests/tst-tls.cc did work on Linux in the past, 
> and the "pie" compilation which you added in 
> 70547a6c64c56cbdf4c5b5d0c7eaa1e85badb0f7 should supposedly have worked too 
> but I don't see that you documented it. Maybe you never ran it on Linux?
>

I think I never ran it on Linux or if I did and my memory serves me well it 
did crash but I never got to the bottom of why. For example the 
tst-getopt-pie.so always worked on Linux.

>
>
>> Finally, I found this on the internet - 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94810 - where somebody 
>> tried to compile and run a program with a similar construct: std::cout used 
>> in the __constructor__ annotated function. The stack trace looks very 
>> similar as well. The bug was rejected with the explanation that the 
>> std::ios_base::Init object may not be initialized yet if used in the 
>> __constructor__ annotated function.
>>
>
>> What is interesting this test happens to work with kernel exposing all 
>> symbols including stdc++ because the kernel copy of std::ios_base::Init was 
>> already initialized.
>>
>
> Very interesting, and nice detective work. Yes, I think this is exactly 
> the same bug.
> *However*, I'm not sure I understand why it shouldn't have worked:
>
> If test-tls-pie's *depends* on libstdc++.so, unless I'm missing something, 
> OSv should run libstdc++'s constructors *before* it runs test-tls-pie's.
> So when tests/tst-tls's constructor function runs, libstdc++ should 
> already have been fully constructed.
> I see below you also raised this question....
>
> That being said, I'm probably missing something because if I was 
> correctly, it would run correctly on Linux and 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94810 would not exist...
>

After more research and using the simpler program below (taken from gcc 
report above), it turns out that std::ios_base::Init::Init() that 
initializes std::cout is called by an init function that is generated by 
the compiler as part of the test program itself NOT as we thought as part 
of libstdc++.so. So indeed it all depends on the order of the init 
functions in the same object and it seems that the function calling 
std::ios_base::Init::Init() happens to be after our init function - 
before_main.

Here is a simpler example:
#include <iostream>

__attribute__((constructor)) void constructor() {
    std::cout << "Starting" << std::endl;
}

int main() {
    std::cout << "Hello, World!" << std::endl;
    return 0;
}

__attribute__((destructor)) void destructor() {
    std::cout << "Finishing" << std::endl;
}

It crashes on Linux and on OSv. When connected with gdb with a breakpoint 
in elf::run_init_funcs() I see this:

(gdb) p funcs
$2 = (void (**)(int, char **)) 0x403db8
(gdb) p funcs[0]
$3 = (void (*)(int, char **)) 0x401150 <frame_dummy>
(gdb) p funcs[1]
$4 = (void (*)(int, char **)) 0x401156 <constructor()>
(gdb) p funcs[2]
$5 = (void (*)(int, char **)) 0x401230 <_GLOBAL__sub_I__Z11constructorv>
(gdb) disas 0x401156
Dump of assembler code for function _Z11constructorv:
   0x0000000000401156 <+0>:        push   %rbp
   0x0000000000401157 <+1>:        mov    %rsp,%rbp
   0x000000000040115a <+4>:        lea    0xeb0(%rip),%rsi        # 0x402011
   0x0000000000401161 <+11>:        lea    0x2ed8(%rip),%rdi        # 
0x404040 <_ZSt4cout@@GLIBCXX_3.4>
   0x0000000000401168 <+18>:        call   0x401040 
<_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
   0x000000000040116d <+23>:        mov    %rax,%rdx
   0x0000000000401170 <+26>:        mov    0x2e69(%rip),%rax        # 
0x403fe0
   0x0000000000401177 <+33>:        mov    %rax,%rsi
   0x000000000040117a <+36>:        mov    %rdx,%rdi
   0x000000000040117d <+39>:        call   0x401050 <_ZNSolsEPFRSoS_E@plt>
   0x0000000000401182 <+44>:        nop
   0x0000000000401183 <+45>:        pop    %rbp
   0x0000000000401184 <+46>:        ret    
End of assembler dump.
(gdb) disas 0x401230
Dump of assembler code for function _GLOBAL__sub_I__Z11constructorv:
   0x0000000000401230 <+0>:        push   %rbp
   0x0000000000401231 <+1>:        mov    %rsp,%rbp
   0x0000000000401234 <+4>:        mov    $0xffff,%esi
   0x0000000000401239 <+9>:        mov    $0x1,%edi
   0x000000000040123e <+14>:        call   0x4011e7 
<_Z41__static_initialization_and_destruction_0ii>
   0x0000000000401243 <+19>:        pop    %rbp
   0x0000000000401244 <+20>:        ret    
End of assembler dump.
(gdb) disas _Z41__static_initialization_and_destruction_0ii
Dump of assembler code for function 
__static_initialization_and_destruction_0(int, int):
   0x00000000402029e0 <+0>:        push   %rbp
   0x00000000402029e1 <+1>:        mov    $0x405910d0,%edi
   0x00000000402029e6 <+6>:        mov    %rsp,%rbp
   0x00000000402029e9 <+9>:        sub    $0x170,%rsp
   0x00000000402029f0 <+16>:        call   0x403e9ec0 
<_ZNSt8ios_base4InitC2Ev>
   0x00000000402029f5 <+21>:        mov    $0x40590888,%edx
   0x00000000402029fa <+26>:        mov    $0x405910d0,%esi
   0x00000000402029ff <+31>:        mov    $0x403ea6c0,%edi
   0x0000000040202a04 <+36>:        call   0x40221dc0 
<__cxxabiv1::__cxa_atexit(destructor_t, void*, void*)>
   0x0000000040202a09 <+41>:        mov    $0x404a037e,%ecx
   0x0000000040202a0e <+46>:        mov    $0x404a0368,%r8d
   0x0000000040202a14 <+52>:        xor    %esi,%esi
   0x0000000040202a16 <+54>:        mov    $0x404a0598,%edx
   0x0000000040202a1b <+59>:        mov    $0x40591060,%edi
   0x0000000040202a20 <+64>:        call   0x40345500 
<_ZN15tracepoint_baseC2EjRKSt9type_infoPKcS4_>
   0x0000000040202a25 <+69>:        mov    $0x40591060,%esi
   0x0000000040202a2a <+74>:        mov    $0x40234ca0,%edi
   0x0000000040202a2f <+79>:        mov    $0x40590888,%edx
   0x0000000040202a34 <+84>:        movq   $0x404a054f,0x38e641(%rip)       
 # 0x40591080 <_ZL15trace_tcp_state+32>
   0x0000000040202a3f <+95>:        call   0x40221dc0 
<__cxxabiv1::__cxa_atexit(destructor_t, void*, void*)>
   0x0000000040202a44 <+100>:        mov    $0x404a03a0,%esi
   0x0000000040202a49 <+105>:        lea    -0x170(%rbp),%rdi
   0x0000000040202a50 <+112>:        mov    $0x2e,%ecx
   0x0000000040202a55 <+117>:        rep movsq %ds:(%rsi),%es:(%rdi)
   0x0000000040202a58 <+120>:        lea    -0x170(%rbp),%rax
   0x0000000040202a5f <+127>:        mov    %rbp,%rsi
   0x0000000040202a62 <+130>:        nopw   0x0(%rax,%rax,1)
   0x0000000040202a68 <+136>:        mov    (%rax),%rdx
   0x0000000040202a6b <+139>:        movzbl 0x8(%rax),%ecx
   0x0000000040202a6f <+143>:        add    $0x10,%rax
   0x0000000040202a73 <+147>:        mov    %cl,0x40590f60(%rdx)
   0x0000000040202a79 <+153>:        cmp    %rsi,%rax
   0x0000000040202a7c <+156>:        jne    0x40202a68 
<__static_initialization_and_destruction_0(int, int)+136>
   0x0000000040202a7e <+158>:        leave  
   0x0000000040202a7f <+159>:        ret    
 
As you can see the _GLOBAL__sub_I__Z11constructorv which calls 
__static_initialization_and_destruction_0 which calls 
_ZNSt8ios_base4InitC2Ev is after the _Z11constructorv function defined as 
part of the test program. So indeed the std::cout is not initialized yet at 
this point.

So as the gcc report comment say, it is NOT valid to use std::cot (and 
others) in an init function as the order in which these function and 
generated by the compiler and called in undetermined. In other words that 
simple sample program above and our tst-tls-pie.so is not a valid C++ 
program.

The reason it works when symbols are NOT hidden is because we include and 
export libstdc++.so (statically linked) os the std::cout gets initialized 
as part of the kernel early on.


>
>
>> So I think the tst-tls.cc needs to be changed to not to use std:cout in 
>> the report function or uses different report() from before_main(). 
>>
>
> That sounds like a good workaround, but I would have liked to understand 
> better if this problem is expected, or some sort of OSv 
> initialization-order bug.
>
I think this is more than a workaround. It is actually fixing an invalid 
program that just happens to work when all symbols and libstdc++.so is 
exported. 

>
>
>> Waldek
>>
>> PS1. Regarding using std:cout in various __constructor__  kernel 
>> functions like parse_madt() (which calls debug()) and called by smp_init(), 
>> I wonder if we are just lucky that std::ios_base::Init is already 
>> initialized and we do not see similar crashes.
>>
>
> I think we already ironed out these issues in the kernel log ago. We have 
> an explicit order for some things (see include/osv/prio.hh) and all the C++ 
> code which depends on other stuff being initialized happens after those 
> things. I don't remember the details though.. 
>  
>
>>
>> PS2: This printout with dynamic linker info suggests that all global 
>> objects in stdc++.so should have been initialized before calling 
>> before_main(). So why is std::ios_base::Init  not initialized yet?
>>
>> ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Executing DT_INIT function
>> ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Finished executing DT_INIT 
>> function
>> ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Executing 12 
>> DT_INIT_ARRAYSZ functions
>> ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Finished executing 12 
>> DT_INIT_ARRAYSZ functions
>> ELF [tid:26, mod:4, /tests/libtls.so]: Executing DT_INIT function
>> ELF [tid:26, mod:4, /tests/libtls.so]: Finished executing DT_INIT function
>> ELF [tid:26, mod:4, /tests/libtls.so]: Executing 1 DT_INIT_ARRAYSZ 
>> functions
>> ELF [tid:26, mod:4, /tests/libtls.so]: Finished executing 1 
>> DT_INIT_ARRAYSZ functions
>> ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Executing DT_INIT function
>> ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Finished executing DT_INIT 
>> function
>> ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Executing 2 DT_INIT_ARRAYSZ 
>> functions
>> ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Finished executing 2 
>> DT_INIT_ARRAYSZ functions
>> ELF [tid:26, mod:2, /tests/tst-tls-pie.so]: Executing DT_INIT function
>> ELF [tid:26, mod:2, /tests/tst-tls-pie.so]: Finished executing DT_INIT 
>> function
>> ELF [tid:26, mod:2, /tests/tst-tls-pie.so]: Executing 3 DT_INIT_ARRAYSZ 
>> functions
>> Aborted
>>
>> [backtrace]
>> 0x000000004039cadd <???+1077529309>
>> 0x000000004039cb9e <???+1077529502>
>> 0x000000004028709a <???+1076392090>
>> 0x00000000402d3d8f <???+1076706703>
>> 0x00000000402d2bb6 <???+1076702134>
>> 0x00000000004050bf <???+4214975>
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "OSv Development" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/osv-dev/d0c7f7c0-91f4-4478-9ccf-c2cfe13c15afn%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/osv-dev/d0c7f7c0-91f4-4478-9ccf-c2cfe13c15afn%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/6dac9cfd-5db7-408d-8742-73cc38cf6402n%40googlegroups.com.

Reply via email to