On Mon, 10 Jan 2022 05:19:26 GMT, Xin Liu <x...@openjdk.org> wrote: > There is a handshake protocol between attach and HotSpot. Linux > VirtualMachineImpl sends SIGQUIT(3) if the AttachListener has not been > initialized. It expects "Signal Dispatcher" to handle SIGBREAK(same as > SIGQUIT) and create AttachListener. However, it is possible that attach > starts "handshake" before os::initialize_jdk_signal_support() is called. The > signal handler which handles SIGQUIT has not been installed. Prior to > os::initialize_jdk_signal_support(), universe_init() is called. Its time > complexity will be linear to the initial size of heap with 'AlwaysPreTouch'. > It takes 20~30 seconds to initialize 128g heap on a server-class host(eg. EC2 > m4.10xlarge). Many tools such jcmd, jstack etc may force initializing HotSpot > quit prematurely. > > This patch checks '/proc/$pid/stat' SigCgt bitmask to ensure the signal will > be caught by the target process before striking it with SIGQUIT. It will > make HotSpot more robust. The fields of procfs are well > [documented](https://www.kernel.org/doc/html/latest/filesystems/proc.html#id10) > and have supported since Linux 2.6.30. libattach.so will not the only > consumer of it. I see that os_perf_linux.cpp supports it in libjvm.so. > > > Testing > > Before, this patch, once initialization takes long time, jcmd may quit the > java process. > > $java -Xms64g -XX:+AlwaysPreTouch -Xlog:gc+heap=debug:stderr > -XX:ParallelGCThreads=1 & > [1] 9589 > [0.028s][debug][gc,heap] Minimum heap 68719476736 Initial heap 68719476736 > Maximum heap 68719476736 > [0.034s][debug][gc,heap] Running G1 PreTouch with 1 workers for 16384 work > units pre-touching 68719476736B. > $jcmd 9589 VM.flags > 9589: > [1] + 9589 quit java -Xms64g -XX:+AlwaysPreTouch > -Xlog:gc+heap=debug:stderr > java.io.IOException: No such process > at jdk.attach/sun.tools.attach.VirtualMachineImpl.sendQuitTo(Native > Method) > at > jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:100) > at > jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58) > at > jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207) > at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113) > at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97) > > With this patch, jcmd will timeout but won't disrupt 15274. > > $ java -Xms64g -XX:+AlwaysPreTouch -XX:ParallelGCThreads=1 & > [1] 15274 > $ jcmd 15274 VM.flags > 15274: > com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file > /proc/15274/root/tmp/.java_pid15274: target process 15274 doesn't respond > within 10500ms or HotSpot VM not loaded > at > jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:105) > at > jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58) > at > jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207) > at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113) > at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
Hi Xin, A couple of comments below. I'm still thinking about this one ... seems okay but I'm not certain ... Thanks, David src/jdk.attach/linux/native/libattach/VirtualMachineImpl.c line 119: > 117: * Return -1 when it runs into any error. > 118: */ > 119: static int check_sigquit_caught(jint pid) { can't this just be a bool function? Even if int we don't need a ternary return value as only zero is of interest. src/jdk.attach/linux/native/libattach/VirtualMachineImpl.c line 164: > 162: { > 163: // Only give up sending SIGQUIT if we see that SigCgt is not set. > 164: if (check_sigquit_caught(pid) == 0) return; Suggestion (assumes bool function): // Only send the SIGQUIT if we can see that the target JVM is ready to catch it. if (check_sigquit_caught(pid) && kill((pid_t)pid, SIGQUIT) != 0) { JNU_ThrowIOExceptionWithLastError(env, "kill"); } ------------- PR: https://git.openjdk.java.net/jdk/pull/7003