On Mon, 10 Jan 2022 05:19:26 GMT, Xin Liu <x...@openjdk.org> wrote:

> There is a handshake protocol between attach and HotSpot. Linux 
> VirtualMachineImpl sends SIGQUIT(3) if the AttachListener has not been 
> initialized. It expects "Signal Dispatcher" to handle SIGBREAK(same as 
> SIGQUIT) and create AttachListener. However, it is possible that attach 
> starts "handshake" before os::initialize_jdk_signal_support() is called. The 
> signal handler which handles SIGQUIT has not been installed. Prior to 
> os::initialize_jdk_signal_support(), universe_init() is called. Its time 
> complexity will be linear to the initial size of heap with 'AlwaysPreTouch'.  
> It takes 20~30 seconds to initialize 128g heap on a server-class host(eg. EC2 
> m4.10xlarge). Many tools such jcmd, jstack etc may force initializing HotSpot 
> quit prematurely. 
> 
> This patch checks '/proc/$pid/stat' SigCgt bitmask to ensure the signal will 
> be caught by the target process before striking it with SIGQUIT.  It will 
> make HotSpot more robust.  The fields of procfs are well 
> [documented](https://www.kernel.org/doc/html/latest/filesystems/proc.html#id10)
>  and have supported since Linux 2.6.30. libattach.so will not the only 
> consumer of it. I see that os_perf_linux.cpp supports it in libjvm.so. 
> 
> 
> Testing 
> 
> Before, this patch, once initialization takes long time, jcmd may quit the 
> java process. 
> 
> $java -Xms64g -XX:+AlwaysPreTouch -Xlog:gc+heap=debug:stderr 
> -XX:ParallelGCThreads=1 &
> [1] 9589
> [0.028s][debug][gc,heap] Minimum heap 68719476736  Initial heap 68719476736  
> Maximum heap 68719476736
> [0.034s][debug][gc,heap] Running G1 PreTouch with 1 workers for 16384 work 
> units pre-touching 68719476736B.
> $jcmd 9589 VM.flags
> 9589:
> [1]  + 9589 quit       java -Xms64g -XX:+AlwaysPreTouch 
> -Xlog:gc+heap=debug:stderr
> java.io.IOException: No such process
>         at jdk.attach/sun.tools.attach.VirtualMachineImpl.sendQuitTo(Native 
> Method)
>         at 
> jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:100)
>         at 
> jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
>         at 
> jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
>         at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
>         at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
> 
> With this patch, jcmd will timeout but won't disrupt 15274. 
> 
> $ java -Xms64g -XX:+AlwaysPreTouch -XX:ParallelGCThreads=1 &
> [1] 15274
> $ jcmd 15274 VM.flags
> 15274:
> com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file 
> /proc/15274/root/tmp/.java_pid15274: target process 15274 doesn't respond 
> within 10500ms or HotSpot VM not loaded
>         at 
> jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:105)
>         at 
> jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
>         at 
> jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
>         at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
>         at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)

Hi Xin,

A couple of comments below. I'm still thinking about this one ... seems okay 
but I'm not certain ...

Thanks,
David

src/jdk.attach/linux/native/libattach/VirtualMachineImpl.c line 119:

> 117:  * Return -1 when it runs into any error.
> 118:  */
> 119: static int check_sigquit_caught(jint pid) {

can't this just be a bool function? Even if int we don't need a ternary return 
value as only zero is of interest.

src/jdk.attach/linux/native/libattach/VirtualMachineImpl.c line 164:

> 162: {
> 163:     // Only give up sending SIGQUIT if we see that SigCgt is not set.
> 164:     if (check_sigquit_caught(pid) == 0) return;

Suggestion (assumes bool function):

// Only send the SIGQUIT if we can see that the target JVM is ready to catch it.
if (check_sigquit_caught(pid) && kill((pid_t)pid, SIGQUIT) != 0) {
    JNU_ThrowIOExceptionWithLastError(env, "kill");
}

-------------

PR: https://git.openjdk.java.net/jdk/pull/7003

Reply via email to