Hello!

Requesting reviews for:

https://bugs.openjdk.java.net/browse/JDK-8202884
Webrev: http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/index.html

Details:
To attach to the threads of a process, we first do a ptrace attach to the main thread. We then use the libthread_db library (or, when running within a container, iterate through the /proc/<pid>/task files) to discover the threads of the process, and add them to the threads list (within SA) for this process. Once we have discovered all the threads and added them to the list, we invoke ptrace attach on each of these threads individually. In an application whose threads are exiting continuously, some of these threads might no longer exist by the time we try to ptrace attach to them.

The proposed fix includes the following modifications to solve this.

1. Check the state of the threads in the thread_db callback routine, and skip the thread if its state is TD_THR_UNKNOWN or TD_THR_ZOMBIE. SA does not try to ptrace attach to these threads and does not include them in the threads list.

2. While ptrace attaching to the thread, if ptrace(PTRACE_ATTACH, ...) fails with either ESRCH or EPERM, check the state of the thread by checking whether the /proc/<pid>/status file corresponding to that thread exists and, if so, reading the 'State:' line of that file. If the thread is dead (State: X) or is a zombie (State: Z), skip attaching to it and delete it from the SA list of threads. From the /proc man page, "Current state of the process. One of "R (running)", "S (sleeping)", "D (disk sleep)", "T (stopped)", "T (tracing stop)", "Z (zombie)", or "X (dead)"."

3. If waitpid() on the thread fails because the thread has exited or terminated, again skip this thread (delete it from SA's list of threads) instead of bailing out.
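
To illustrate modification 2, here is a minimal sketch in C of the 'State:' check after a failed attach. It is not the code in the webrev: all function names below are hypothetical, and treating a missing /proc/<pid>/status file as "thread is gone" is an assumption of this sketch, not something stated above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: return the state letter from a "State:" line of
 * /proc/<pid>/status (e.g. "State:\tZ (zombie)"), or '\0' for any other line. */
char parse_state_line(const char *line) {
  const char prefix[] = "State:";
  const char *p;

  assert(line != NULL);
  if (strncmp(line, prefix, sizeof(prefix) - 1) != 0) {
    return '\0';
  }
  p = line + sizeof(prefix) - 1;
  while (*p == ' ' || *p == '\t') {
    p++;  /* skip the whitespace between "State:" and the letter */
  }
  return (*p == '\n') ? '\0' : *p;
}

/* Hypothetical helper: true if the state letter means dead (X) or zombie (Z). */
bool state_means_gone(char state) {
  return state == 'X' || state == 'Z';
}

/* Hypothetical helper: read the state letter of thread 'tid'. Returns false
 * if the status file cannot be opened; otherwise returns true and stores the
 * letter in *state ('\0' if no "State:" line was found). */
bool read_thread_state(pid_t tid, char *state) {
  char path[64];
  char line[256];
  FILE *fp;

  *state = '\0';
  snprintf(path, sizeof(path), "/proc/%d/status", (int)tid);
  fp = fopen(path, "r");
  if (fp == NULL) {
    return false;
  }
  while (fgets(line, sizeof(line), fp) != NULL) {
    *state = parse_state_line(line);
    if (*state != '\0') {
      break;
    }
  }
  fclose(fp);
  return true;
}

/* Hypothetical decision routine: given the errno left by a failed
 * ptrace(PTRACE_ATTACH, ...), decide whether to skip the thread quietly. */
bool should_skip_thread(int attach_errno, pid_t tid) {
  char state;

  if (attach_errno != ESRCH && attach_errno != EPERM) {
    return false;  /* some other failure: do not skip silently */
  }
  if (!read_thread_state(tid, &state)) {
    return true;   /* ASSUMPTION: no status file means the thread is gone */
  }
  return state_means_gone(state);
}
```

A caller would invoke should_skip_thread(errno, tid) right after the attach fails, and remove the thread from its list when the result is true.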

Testing:
1. Tested by attaching and detaching multiple times to a test program spawning numerous short-lived threads.
2. The SA tests (under test/hotspot/jtreg/serviceability/sa) passed with 100 repeats on Mach5.
3. No new failures and no occurrences of JDK-8202884 seen when running the SA tests (tiers 1 to 5) on Mach5.

More details in the bug comments section.

Thank you,
Jini.
