On Sat, Nov 30, 2019 at 5:32 AM Waldek Kozaczuk <[email protected]>
wrote:

> I think I have solved the mystery. It looks like another bug in the
> dynamic linker. This time related to an order in which objects are unloaded
> and FINI* functions are executed.
>

I'm afraid I didn't quite follow the details below and what's wrong with
the "wrong" order or what caused it.

I wonder if it's related to to issue
https://github.com/cloudius-systems/osv/issues/334 but I don't know how.



>
> As you can tell in the last test boost_filesystem, FINI function got
> executed after the libboost_unit_test_framework FINI function got called
> and unloaded. If you look at the first one, the order is opposite and that
> why it works.
>

Why is this order wrong?

If I understand correctly, our test loads (DT_NEED)
libboost_unit_test_framework
and that loads boost_filesystem
So you'd expect the program::remove_object() code to first finalize
unit_test_framework, and only then unload its dependency (boost_filesystem).
So it sounds the order you call wrong, is right, no? If it's not right -
then at least it was as intended ;-) And if this intent is wrong, it should
be fixed (in remove_object())

But there is a complication: in modules/tests/Makefile I see our test
depends on boost_filesystem *directly*.
So now the order that boost_filesystem gets unloaded may become random
(because we analyze the entire DAG for dependencies....).
This may explain why in some cases we got the "opposite" order and
everything worked.

>
>
> I think we need to change unloading logic to first execute all FINI
> functions of all objects probably on the same thread that the INIT ones got
> executed and only then unload the objects.
>

This is an interesting point. Indeed the loading code first loads all the
necessary objects in multiple levels, and only then runs all the init
functions (in some hopefully correct object), the unloading code doesn't do
this. But I don't think there's an easy way to do this, if at all.... It's
not always true that objects get unloaded in one large batch. It can also
happen like I mentioned above in our own example - boost_test_framework
says it wants to unload boost_filesystem, BUT, it is not the last holder of
boost_filesystem because the application also dt_needed boost_filesystem
directly. So we can't unload boost_filesystem just yet... And need to delay
its unload.



> What do you think?
>
> On Thursday, November 28, 2019 at 3:19:57 PM UTC-5, Nadav Har'El wrote:
>>
>> On Thu, Nov 28, 2019 at 10:04 PM Waldek Kozaczuk <[email protected]>
>> wrote:
>>
>>>
>>>
>>>>>
>>>>>
>>>>> Please note I have removed this line from the tests Makefile:
>>> -$(boost-tests:%=$(out)/tests/%): CXXFLAGS += -D_GLIBCXX_USE_CXX11_ABI=0
>>> \
>>> I do not fully understand the significance of it nor the reasons for it
>>> in the first place (why was it there). Could it cause this issue?
>>>
>>
>> This line was needed because we try to combine newly combined test code
>> and some super-antique version of Boost from external/.
>> It is explained in commit 6a3bff38a281e65ee715bab4fadef63e0918f7d3. You
>> shouldn't need it if you stop using the antique boost.
>>
>>
>>> BTW I also now noticed this linker warning (not sure if related):
>>>   LIBOSV.SO
>>>   STRIP loader.elf -> loader-stripped.elf
>>> strip:
>>> build/release.x64/loader.elf[.gnu.build.attributes.text._ZN5boost6system23dummy_exported_functionEv]:
>>> Warning: version note missing - assuming version 3
>>> strip:build/release.x64/loader-stripped.elf[.gnu.build.attributes.unlikely]:
>>> error: failed to copy merged notes into output: file in wrong format
>>>   LZ loader-stripped.elf
>>>
>>>
>> I noticed this too a few days ago, but didn't spend any time to chase it.
>>
>>
>>>
>>> Is the backtrace broken? The important line is
>>> #10 0x00001000000ed1e0 in ?? ()
>>>
>>> Which is inside the test executable. Did you try "osv syms"?
>>>
>>> I did. There were couple of libraries that had missing debug info
>>> missing. Installed still did not help and backtrace looks like this -
>>> broken:
>>>
>>
>> I'm still guessing the address 0x00001000000ed1e0 is in the actual test
>> itself, not any of the libraries.
>> It is possible that "osv syms" gets confused because this bug happens
>> while the test executable is being unloaded, so maybe it's no longer in the
>> list. You can comment out the code that removes the library from the list
>> that "osv syms" uses, and then maybe gdb will be able to know about it.
>>
>>
>>
>>> It seems there is a C++ destructor in the test code (or Boost framework)
>>>> being run, which causes a crash. But I don't know what it is...
>>>>
>>>> Not sure it is related but our usr.manifest.skel pulls ancient
>>> libgc_s.so from externals.
>>>
>>
>> The reason why we have this file is explained in commit
>> be565320c082c00069614c850d29b42831b3dea6
>>
>>
>>> Changing it to pull from host creates other issues which I will send
>>> email about.
>>>
>>
>> :-(
>>
> --
> You received this message because you are subscribed to the Google Groups
> "OSv Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/osv-dev/861da7d5-7a27-4a0a-bb03-2d417c425e4b%40googlegroups.com
> <https://groups.google.com/d/msgid/osv-dev/861da7d5-7a27-4a0a-bb03-2d417c425e4b%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/CANEVyjs47un2hKCLL-EfzcADUcC60ODh23JeTD-zEwo7%2B%3DRwyw%40mail.gmail.com.

Reply via email to