On Mon, Oct 5, 2020 at 9:48 PM Waldek Kozaczuk <[email protected]> wrote:
>
> On Thursday, October 1, 2020 at 2:56:09 PM UTC-4 Nadav Har'El wrote:
>
>>
>> I looked at the difference between the libc and musl implementation of
>> sigsetjmp.s, and there are extensive differences I don't understand.
>> One of them is a difference between a "call" and a "jmp". One of them is
>> the use of "@PLT" in the code.
>> Maybe that's an ABI (stack alignment) problem? Maybe linker?
>> There's also a new bizarre __sigsetjmp_tail which I don't understand.
>>
>> Can you try switching those different functions one by one and perhaps
>> finding which one is causing the problem?
>>
> I have narrowed down the tst-feexcept.cc to have only the following two
> assertions on (remaining code after in main is commented out) and also have
> changed env to be regular variable (see below) :
>
> static sigjmp_buf env;
>
I thought about it, thread_local is weird, but should have also worked (and
apparently, doesn't work exactly the same) - the reason is that for
synchronic signals (sigfpe, sigsegv - things that happen because of bad
instructions, not signal from another thread), Osv runs the handler in the
same thread.
> ...
>
> int main(int argc, char **argv)
> {
> // Test that fegetexcept() does not return a negative number
> expect(fegetexcept() >= 0, true);
>
I guess you can drop this from the test too, and it will fail just the same.
> // Test that *integer* division by zero generates, ironically, a SIGFPE
> expect(sig_check([] { printf("%d\n", 1 / zero_i()); }, SIGFPE), true);
>
> std::cout << "SUMMARY: " << tests << " tests, " << fails << "
> failures\n";
> return fails == 0 ? 0 : 1;
> }
>
> And the test still crashes (and 2 assertions pass). Obviously, if I keep
> the 1st assertion only, it does not crash. So something to do with
> sig_check().
>
> Now, I have discovered that if I comment out the invocation of f() in
> sig_check the test does not crash
>
The invocation of f() is what causes siglongjmp() (in the signal handler)
to be called.
You can add a printout in the signal handler before siglongjmp() and see
that if it was reached, and then add a printout in if (sigsetjmp(env, 1)) {
(inside the if) - if that printout isn't shown, the siglongjmp didn't work.
> but the 2nd assertion obviously fails.
>
> template<typename Func>
> bool sig_check(Func f, int sig) {
> // save_except works around a bug in Linux glibc
> // (https://sourceware.org/bugzilla/show_bug.cgi?id=21035) where
> // longjmp resets the FPU exception mask. So if we want it to survive
> // the longjmp in this test, we unfortunately need to save it ourselves
> int save_except = fegetexcept();
> if (sigsetjmp(env, 1)) {
> // got the signal (and automatically restored handler)
> fedisableexcept(FE_ALL_EXCEPT);
> feenableexcept(save_except);
> return true;
> }
> struct sigaction sa;
> sa.sa_flags = SA_RESETHAND;
> sigemptyset(&sa.sa_mask);
> sa.sa_handler = [] (int) {
> siglongjmp(env, 1);
> };
> assert(sigaction(sig, &sa, NULL) == 0);
> //f();
> sa.sa_handler = SIG_DFL;
> assert(sigaction(sig, &sa, NULL) == 0);
> fedisableexcept(FE_ALL_EXCEPT);
> feenableexcept(save_except);
> return false;
> }
>
> So maybe something is wrong with sa_handler - siglongjmp(env, 1); - which
> when called for division by zero exception possibly wrote over the memory.
>
On a wild hunch, can you please try to compile this test without
optimization (-O0), maybe it changes something (as in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982)? But I guess it's not
that, it's something else :-(
What do we actually benefit from taking Musl's new code and not keep the
old code, which worked well?
I think you can get rid of the thread-local and also have a chance to run
>> it on aarch64 then.
>>
> Getting rid of thread_local for env and making it static does not help on
> x64 - still exact same crash (which is good news in sense:-). Interestingly
> the same test does not crash on aarch64 now (no thread_local which is not
> supported yet) but some assertions fail.
>
Yes - it shouldn't be thread_local but for synchronous signals (like SIGFPE
and SIGSEGV) it doesn't matter - the handler runs in the same thread.
--
You received this message because you are subscribed to the Google Groups "OSv
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/osv-dev/CANEVyjs6oLnT6G%2B93J15UcxfWqHryoXTk6DB%2Be7V6%2BVkS%3D%2BzjA%40mail.gmail.com.