On Mon, 17 Jan 2022 06:11:22 GMT, David Holmes <dhol...@openjdk.org> wrote:
>> There is a handshake protocol between attach and HotSpot. Linux
>> VirtualMachineImpl sends SIGQUIT(3) if the AttachListener has not been
>> initialized. It expects "Signal Dispatcher" to handle SIGBREAK (same as
>> SIGQUIT) and create the AttachListener. However, it is possible that attach
>> starts the "handshake" before os::initialize_jdk_signal_support() is called.
>> In that case the signal handler for SIGQUIT has not been installed yet.
>> Prior to os::initialize_jdk_signal_support(), universe_init() is called.
>> With 'AlwaysPreTouch', its running time is linear in the initial heap size.
>> It takes 20~30 seconds to initialize a 128g heap on a server-class host
>> (e.g. EC2 m4.10xlarge). Many tools such as jcmd and jstack may therefore
>> force a HotSpot that is still initializing to quit prematurely.
>>
>> This patch checks the SigCgt bitmask in '/proc/$pid/status' to ensure the
>> signal will be caught by the target process before striking it with
>> SIGQUIT. It will make HotSpot more robust. The fields of procfs are well
>> [documented](https://www.kernel.org/doc/html/latest/filesystems/proc.html#id10)
>> and have been supported since Linux 2.6.30. libattach.so will not be the
>> only consumer of it. I see that os_perf_linux.cpp also uses it in libjvm.so.
>>
>>
>> Testing
>>
>> Before this patch, if initialization takes a long time, jcmd may kill the
>> java process.
>>
>> $ java -Xms64g -XX:+AlwaysPreTouch -Xlog:gc+heap=debug:stderr -XX:ParallelGCThreads=1 &
>> [1] 9589
>> [0.028s][debug][gc,heap] Minimum heap 68719476736 Initial heap 68719476736 Maximum heap 68719476736
>> [0.034s][debug][gc,heap] Running G1 PreTouch with 1 workers for 16384 work units pre-touching 68719476736B.
>> $ jcmd 9589 VM.flags
>> 9589:
>> [1] + 9589 quit java -Xms64g -XX:+AlwaysPreTouch -Xlog:gc+heap=debug:stderr
>> java.io.IOException: No such process
>>     at jdk.attach/sun.tools.attach.VirtualMachineImpl.sendQuitTo(Native Method)
>>     at jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:100)
>>     at jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
>>     at jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
>>     at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
>>     at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
>>
>> With this patch, jcmd will time out but won't disrupt the target process
>> (15274 below).
>>
>> $ java -Xms64g -XX:+AlwaysPreTouch -XX:ParallelGCThreads=1 &
>> [1] 15274
>> $ jcmd 15274 VM.flags
>> 15274:
>> com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file /proc/15274/root/tmp/.java_pid15274: target process 15274 doesn't respond within 10500ms or HotSpot VM not loaded
>>     at jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:105)
>>     at jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
>>     at jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
>>     at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
>>     at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
>
> src/jdk.attach/linux/native/libattach/VirtualMachineImpl.c line 164:
>
>> 162: {
>> 163:     // Only give up sending SIGQUIT if we see that SigCgt is not set.
>> 164:     if (check_sigquit_caught(pid) == 0) return;
>
> Suggestion (assumes bool function):
>
> // Only send the SIGQUIT if we can see that the target JVM is ready to catch it.
> if (check_sigquit_caught(pid) && kill((pid_t)pid, SIGQUIT) != 0) {
>     JNU_ThrowIOExceptionWithLastError(env, "kill");
> }

The reason I would keep the -1 return value is that someone may disable procfs entirely through the kernel configuration. In that case we can never know the answer of `check_sigquit_caught` for sure. That is why the code only skips SIGQUIT when the check returns 0.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7003
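To make the three return values concrete, here is a minimal sketch of the idea. It is not the code in the PR: the body of `check_sigquit_caught` below is an assumption about how such a check could read the `SigCgt` line of `/proc/<pid>/status`, and `send_quit_unless_known_unsafe` is a hypothetical helper name used only for illustration.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not the PR's implementation.
 * Returns  1 if SIGQUIT is set in the target's SigCgt mask,
 *          0 if the mask was read and SIGQUIT is not set,
 *         -1 if the answer is unknown (procfs unavailable, process gone,
 *            or no SigCgt line found).
 */
static int check_sigquit_caught(int pid) {
    char path[64];
    char line[256];
    int result = -1;

    snprintf(path, sizeof(path), "/proc/%d/status", pid);
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        return -1;  /* cannot tell, e.g. procfs is not mounted */
    }
    while (fgets(line, sizeof(line), fp) != NULL) {
        unsigned long long mask;
        /* The line looks like "SigCgt:\t0000000000004a07" (hex bitmask). */
        if (sscanf(line, "SigCgt: %llx", &mask) == 1) {
            /* Signal n is represented by bit n - 1. */
            result = (mask & (1ULL << (SIGQUIT - 1))) != 0 ? 1 : 0;
            break;
        }
    }
    fclose(fp);
    return result;
}

/*
 * Hypothetical caller: skip SIGQUIT only when we positively know that it
 * would not be caught; on 1 or -1 behave as before and send it.
 * Returns 0 if the signal was skipped or sent, -1 if kill() failed.
 */
static int send_quit_unless_known_unsafe(int pid) {
    if (check_sigquit_caught(pid) == 0) {
        return 0;
    }
    return kill((pid_t)pid, SIGQUIT);
}
```

With a bool-returning function the "unknown" case would have to collapse into either true or false; keeping -1 separate lets the caller fall back to the old behaviour when procfs cannot be read.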