On Mon, 17 Jan 2022 06:11:22 GMT, David Holmes <dhol...@openjdk.org> wrote:

>> There is a handshake protocol between attach and HotSpot. Linux 
>> VirtualMachineImpl sends SIGQUIT(3) if the AttachListener has not been 
>> initialized. It expects "Signal Dispatcher" to handle SIGBREAK(same as 
>> SIGQUIT) and create AttachListener. However, it is possible that attach 
>> starts "handshake" before os::initialize_jdk_signal_support() is called. The 
>> signal handler which handles SIGQUIT has not been installed. Prior to 
>> os::initialize_jdk_signal_support(), universe_init() is called. Its time 
>> complexity will be linear to the initial size of heap with 'AlwaysPreTouch'. 
>>  It takes 20~30 seconds to initialize 128g heap on a server-class host(eg. 
>> EC2 m4.10xlarge). Many tools such jcmd, jstack etc may force initializing 
>> HotSpot quit prematurely. 
>> 
>> This patch checks '/proc/$pid/stat' SigCgt bitmask to ensure the signal will 
>> be caught by the target process before striking it with SIGQUIT.  It will 
>> make HotSpot more robust.  The fields of procfs are well 
>> [documented](https://www.kernel.org/doc/html/latest/filesystems/proc.html#id10)
>>  and have supported since Linux 2.6.30. libattach.so will not the only 
>> consumer of it. I see that os_perf_linux.cpp supports it in libjvm.so. 
>> 
>> 
>> Testing 
>> 
>> Before, this patch, once initialization takes long time, jcmd may quit the 
>> java process. 
>> 
>> $java -Xms64g -XX:+AlwaysPreTouch -Xlog:gc+heap=debug:stderr 
>> -XX:ParallelGCThreads=1 &
>> [1] 9589
>> [0.028s][debug][gc,heap] Minimum heap 68719476736  Initial heap 68719476736  
>> Maximum heap 68719476736
>> [0.034s][debug][gc,heap] Running G1 PreTouch with 1 workers for 16384 work 
>> units pre-touching 68719476736B.
>> $jcmd 9589 VM.flags
>> 9589:
>> [1]  + 9589 quit       java -Xms64g -XX:+AlwaysPreTouch 
>> -Xlog:gc+heap=debug:stderr
>> java.io.IOException: No such process
>>         at jdk.attach/sun.tools.attach.VirtualMachineImpl.sendQuitTo(Native 
>> Method)
>>         at 
>> jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:100)
>>         at 
>> jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
>>         at 
>> jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
>>         at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
>>         at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
>> 
>> With this patch, jcmd will timeout but won't disrupt 15274. 
>> 
>> $ java -Xms64g -XX:+AlwaysPreTouch -XX:ParallelGCThreads=1 &
>> [1] 15274
>> $ jcmd 15274 VM.flags
>> 15274:
>> com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file 
>> /proc/15274/root/tmp/.java_pid15274: target process 15274 doesn't respond 
>> within 10500ms or HotSpot VM not loaded
>>         at 
>> jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:105)
>>         at 
>> jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
>>         at 
>> jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
>>         at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
>>         at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
>
> src/jdk.attach/linux/native/libattach/VirtualMachineImpl.c line 164:
> 
>> 162: {
>> 163:     // Only give up sending SIGQUIT if we see that SigCgt is not set.
>> 164:     if (check_sigquit_caught(pid) == 0) return;
> 
> Suggestion (assumes bool function):
> 
> // Only send the SIGQUIT if we can see that the target JVM is ready to catch 
> it.
> if (check_sigquit_caught(pid) && kill((pid_t)pid, SIGQUIT) != 0) {
>     JNU_ThrowIOExceptionWithLastError(env, "kill");
> }

The reason I would leave retval -1 because I guess someone may disable procfs 
entirely with kernel configure. As a result, we never know that answer of 
`check_sigquit_caught` for sure.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7003

Reply via email to