On Mon, Oct 5, 2020 at 9:48 PM Waldek Kozaczuk <[email protected]> wrote:

>
> On Thursday, October 1, 2020 at 2:56:09 PM UTC-4 Nadav Har'El wrote:
>
>>
>> I looked at the difference between the libc and musl implementation of
>> sigsetjmp.s, and there are extensive differences I don't understand.
>> One of them is a difference between a "call" and a "jmp". One of them is
>> the use of "@PLT" in the code.
>> Maybe that's an ABI (stack alignment) problem? Maybe linker?
>> There's also a new bizarre __sigsetjmp_tail which I don't understand.
>>
>> Can you try switching those different functions one by one and perhaps
>> finding which one is causing the problem?
>>
> I have narrowed down the tst-feexcept.cc to have only the following two
> assertions on (remaining code after in main is commented out) and also have
> changed env to be regular variable (see below) :
>
> static sigjmp_buf env;
>

I thought about it, thread_local is weird, but should have also worked (and
apparently, doesn't work exactly the same) -  the reason is that for
synchronic signals (sigfpe, sigsegv - things that happen because of bad
instructions, not signal from another thread), Osv runs the handler in the
same thread.


> ...
>
> int main(int argc, char **argv)
> {
>     // Test that fegetexcept() does not return a negative number
>     expect(fegetexcept() >= 0, true);
>

I guess you can drop this from the test too, and it will fail just the same.


>     // Test that *integer* division by zero generates, ironically, a SIGFPE
>     expect(sig_check([] { printf("%d\n", 1 / zero_i()); }, SIGFPE), true);
>
>     std::cout << "SUMMARY: " << tests << " tests, " << fails << "
> failures\n";
>     return fails == 0 ? 0 : 1;
> }
>
> And the test still crashes (and 2 assertions pass). Obviously, if I keep
> the 1st assertion only, it does not crash. So something to do with
> sig_check().
>
> Now, I have discovered that if I comment out the invocation of f() in
> sig_check the test does not crash
>

The invocation of f() is what causes siglongjmp() (in the signal handler)
to be called.
You can add a printout in the signal handler before siglongjmp() and see
that if it was reached, and then add a printout in if (sigsetjmp(env, 1)) {
(inside the if) - if that printout isn't shown, the siglongjmp didn't work.


> but the 2nd assertion obviously fails.
>
> template<typename Func>
> bool sig_check(Func f, int sig) {
>     // save_except works around a bug in Linux glibc
>     // (https://sourceware.org/bugzilla/show_bug.cgi?id=21035) where
>     // longjmp resets the FPU exception mask. So if we want it to  survive
>     // the longjmp in this test, we unfortunately need to save it ourselves
>     int save_except = fegetexcept();
>     if (sigsetjmp(env, 1)) {
>         // got the signal (and automatically restored handler)
>         fedisableexcept(FE_ALL_EXCEPT);
>         feenableexcept(save_except);
>         return true;
>     }
>     struct sigaction sa;
>     sa.sa_flags = SA_RESETHAND;
>     sigemptyset(&sa.sa_mask);
>     sa.sa_handler = [] (int) {
>         siglongjmp(env, 1);
>     };
>     assert(sigaction(sig, &sa, NULL) == 0);
>     //f();
>     sa.sa_handler = SIG_DFL;
>     assert(sigaction(sig, &sa, NULL) == 0);
>     fedisableexcept(FE_ALL_EXCEPT);
>     feenableexcept(save_except);
>     return false;
> }
>
> So maybe something is wrong with sa_handler - siglongjmp(env, 1); - which
> when called for division by zero exception possibly wrote over the memory.
>

On a wild hunch, can you please try to compile this test without
optimization (-O0), maybe it changes something (as in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982)? But I guess it's not
that, it's something else :-(

What do we actually benefit from taking Musl's new code and not keep the
old code, which worked well?

I think you can get rid of the thread-local and also have a chance to run
>> it on aarch64 then.
>>
> Getting rid of thread_local for env and making it static does not help on
> x64 - still exact same crash (which is good news in sense:-). Interestingly
> the same test does not crash on aarch64 now (no thread_local which is not
> supported yet) but some assertions fail.
>

Yes - it shouldn't be thread_local but for synchronous signals (like SIGFPE
and SIGSEGV) it doesn't matter - the handler runs in the same thread.

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/CANEVyjs6oLnT6G%2B93J15UcxfWqHryoXTk6DB%2Be7V6%2BVkS%3D%2BzjA%40mail.gmail.com.

Reply via email to