On Mon, 17 Jan 2022 09:00:25 GMT, Kevin Walls <kev...@openjdk.org> wrote:
>> Hi @navyxliu, >> >> nice catch. I can see how this can be annoying. >> >> I propose a simpler and more robust way to fix it though. We (1) set up >> general hotspot signal handling very early, then (2) proceed to initialize >> the heap - which you have shown can take some time - then (3) set up SIGQUIT >> handling. We core if we get a quit signal before (3). >> >> I would add SIGQUIT handling to the general signal handler too, just to >> cover the time frame between (1) and (3). At (3), it would be overwritten, >> but that would be fine. Before (3), we could just ignore SIGQUIT, or print >> out some generic information (I assume thread dumps are not yet possible at >> this stage). >> >> Since the documented behavior for the JVM is to threaddump on SIGQUIT unless >> we run with -Xrs, I think this is acceptable behavior. Not printing thread >> dump or only printing partial information is preferable to quitting with >> core. >> >> Then, this would work for any client that sends sigquit to a JVM, not just >> those using the attach framework. And it would work on all Posix platforms, >> not just Linux. And we'd would not have to rely on parsing the proc fs. >> >> Als note that a solution implemented in the client as you did has the >> disadvantage that I need a modern jcmd for it to work. However, I often just >> use whatever jcmd is in the path. Better to handle this in the receiving >> hotspot. >> >> I sketched out a simple patch to test if what I propose can work: >> >> >> diff --git a/src/hotspot/os/posix/signals_posix.cpp >> b/src/hotspot/os/posix/signals_posix.cpp >> index 2c020a79408..3f0dd91e03b 100644 >> --- a/src/hotspot/os/posix/signals_posix.cpp >> +++ b/src/hotspot/os/posix/signals_posix.cpp >> @@ -615,6 +615,11 @@ int JVM_HANDLE_XXX_SIGNAL(int sig, siginfo_t* info, >> #endif // ZERO >> } >> >> +if (sig == BREAK_SIGNAL) { >> + printf("too early for thread dumps...\n"); >> + signal_was_handled = true; >> +} >> + >> // Ignore SIGPIPE and SIGXFSZ (4229104, 6499219). >> if (!signal_was_handled && >> (sig == SIGPIPE || sig == SIGXFSZ)) { >> @@ -1279,6 +1284,7 @@ void install_signal_handlers() { >> set_signal_handler(SIGFPE); >> PPC64_ONLY(set_signal_handler(SIGTRAP);) >> set_signal_handler(SIGXFSZ); >> + set_signal_handler(BREAK_SIGNAL); // just for initialization phase >> >> It still misses a number of things (I did not check signal mask setup and >> ReduceSignalUsage must be handled too), but it shows the general direction >> and worked as expected. >> >> Cheers, Thomas > >> I propose a simpler and more robust way to fix it though > > Great, this is the kind of thing I was heading towards with the conversation > in the bug text. Although not sure why I could not reproduce the problem, > with various different JDK versions. Apologies @kevinjwalls as I also missed the discussion in the JBS issue. Ideally we would know if the target VM is ready before we send the SIGQUIT to attach - e.g. writing a well-known file that the attach mechanism looks for before trying to attach. That seems feasible but perhaps costly and not always possible(?) ... What has been presented here is a side-channel way of knowing. In that respect I like this. It is a pity it is Linux only. The alternative suggestions of just making the window during which SIGQUIT terminates the VM process small enough to be un-hittable, also has merit. I don't think we have to try and accommodate extreme code that just looks up a process in the process table and throws a signal at it. Ignoring SIGQUIT during the early VM startup seems a reasonable solutions (we can't produce a thread dump at that time anyway). It seems to me that we can simply install UserHandler for SIGQUIT very early in the VM initialization process and it will be a no-op until the JDK signal handling is properly initialized (just need to fix one assert). Cheers, David ------------- PR: https://git.openjdk.java.net/jdk/pull/7003