Your webrev.02 looks good to me!
Yasumasa 2018年12月13日(木) 14:05 Yasumasa Suenaga <yasue...@gmail.com>: > > Hi Jini, > > 2018年12月13日(木) 12:14 Jini George <jini.geo...@oracle.com>: > > > > Thank you very much for looking into this, Yasumasa! > > > > The 'pid' used in process_doesnt_exist() is actually the lwpid of the > > thread to be attached to. > > > > From Pgrab(), we call ptrace_attach() first for the pid at line 448. In > > which case, we end up calling process_doesnt_exist() through > > ptrace_attach() with the pid. In this case, we completely error out if > > the 'pid' doesn't exist. Then we go on to discover the lwpids of this > > process through libpthread_db or by going through the > > /proc/<pid>/task/<lwpid> in case of the process running in a container, > > and we then invoke ptrace_attach() again on all these newly discovered > > lwpids at line 503. (we have already attached to the main thread (where > > the pid and the lwpid are the same). This time when > > process_doesnt_exist() gets called inside ptrace_attach(), we are > > dealing with the lwpids. And we would not error out if the thread is > > missing or is in an 'exiting' state when we try to attach. > > Ok, it seems good. > > I think zombie thread(s) will not exist in thread_info list at this point > because they will be removed at thread_db_callback(). > > > > From the proc man page, /proc/<pid>/task/<lwpid>/* and > > /proc/<lwpid-treated-as-a-pid>/* files would have the same content for > > the same lwpid. > > > > ============= < Man Page Snippet > ================================ > > /proc/[pid]/task (since Linux 2.6.0-test6) > > This is a directory that contains one subdirectory for > > each thread in the process. The name of each subdirectory is the > > numerical thread ID ([tid]) of the thread (see gettid(2)). Within each > > of these subdirectories, there is a set of files with the same names and > > contents as under the /proc/[pid] directories. > > ============= < Man Page Snippet End> ============================= > > > > Let me know if you are not Ok with this. > > > > Going forward, we should remove the libpthread_db dependency for the > > threads discovery and only depend on /proc/<pid>/task/<lwpid>s for this > > (https://bugs.openjdk.java.net/browse/JDK-8181313). > > It's great news! > I will help you :-) > > > Thanks, > > Yasumasa > > > > Thank you, > > Jini. > > > > On 12/13/2018 6:15 AM, Yasumasa Suenaga wrote: > > > Hi Jini, > > > > > > I have a comment for your webrev.02 . > > > > > > > > > You added process_doesnt_exist() to check process / thread liveness from > > > /proc/<PID>, but it is not enough. > > > Information of threads (LWP) will be stored in /proc/<PID>/task/<LWPID>. > > > So you should check /proc/<PID>/task/status for threads. > > > > > > > > > Thanks, > > > > > > Yasumasa > > > > > > > > > On 2018/12/12 21:15, Jini George wrote: > > >> Thank you very much for looking into this, JC! > > >> > > >> I have a revised webrev addressing your comments at: > > >> > > >> http://cr.openjdk.java.net/~jgeorge/8202884/webrev.02/index.html > > >> > > >> Requesting one more review for this. My comments inline: > > >> > > >> On 12/12/2018 2:53 AM, JC Beyler wrote: > > >>> Hi Jini, > > >>> > > >>> I saw a few nits: > > >>> http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/libproc_impl.h.udiff.html > > >>> > > >>> ? -> The comments are in the third person normally it seems so it > > >>> would > > >>> become (I also removed the s from threads): > > >>> > > >>> +// deletes a thread from the thread list > > >> Done. > > >>> > > >>> http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/libproc_impl.c.udiff.html > > >>> > > >>> ? -> You added two empty lines it seems that could be removed > > >> Done. > > >>> > > >>> > > >>> http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c.udiff.html > > >>> > > >>> ? -> Is there a real reason to have both enums? We could have a single > > >>> enum it seems and not lose too much > > >> > > >> You are right. I have done away with the WAITPID* enum. > > >> > > >>> ? -> you have a switch " > > >>> ? ? ? ?switch (errno) {" > > >>> ? ? ? ? -> Where really you could simplify the reading by moving the > > >>> EINTR case outside with its continue > > >>> ? ? ? ? -> The switch could then remain as it was (though you move > > >>> print_debug to print_error) > > >>> ? ? ? ? -> and just return in each cases > > >> I have changed this to: > > >> > > >> 206 } else { > > >> 207 switch (errno) { > > >> 208 case EINTR: > > >> 209 continue; > > >> 210 break; > > >> 211 case ECHILD: > > >> 212 print_debug("waitpid() failed. Child process pid (%d) does > > >> not exist \n", pid); > > >> 213 return ATTACH_THREAD_DEAD; > > >> 214 case EINVAL: > > >> 215 print_error("waitpid() failed. Invalid options > > >> argument.\n"); > > >> 216 return ATTACH_FAIL; > > >> 217 default: > > >> 218 print_error("waitpid() failed. Unexpected error %d\n", > > >> errno); > > >> 219 return ATTACH_FAIL; > > >> 220 } > > >> 221 } // else > > >> > > >> > > >>> > > >>> ? ?->?if (strncmp (buf, "State:", 6) == 0) { > > >>> ? ? ? -> You use sizeof("State:") right below; perhaps you could just > > >>> use "? const char const state[] = "State:";" and use sizeof(state) and > > >>> for the string, it seems less error prone > > >>> > > >>> ? -> A minor "bug" is here: > > >>> +? ? ? state = buf + sizeof ("State:"); > > >>> ? ? ? ? -> You did a strncmp above but that only assures the start of > > >>> the string is "State:", technically the character after the ':' is the > > >>> but it could only be that; sizeof("State:") is 7 and not 6. So you miss > > >>> one character when you are skipping spaces > > >>> ? ? ? ? -> It was probably ok because you always had at least one > > >>> space, ie "State: " > > >> > > >> Thanks! I have made some changes here to use a const char string and a > > >> variable to store the calculated length using strlen(). And I am using > > >> isspace() now to skip spaces since tabs could also be used as a > > >> delimiter. > > >> > > >>> ? -> Extra space here before the '(': "sizeof (buf)" > > >> Done. > > >>> > > >>> Finally your return sequence for that method could be simplified to: > > >>> > > >>> +? if (!found_state) { > > >>> +? ? print_error(" Could not find the State: string in the status file > > >>> for pid %d\n", pid); > > >>> +? } > > >>> +? fclose (fp); > > >>> +? return !found_state; > > >> > > >> I have modified this to: > > >> > > >> 257 if (!found_state) { > > >> 258 // Assuming the thread exists. > > >> 259 print_error("Could not find the 'State:' string in the > > >> /proc/%d/status file\n", pid); > > >> 260 } > > >> 261 fclose (fp); > > >> 262 return false; > > >> > > >> Thank you, > > >> Jini. > > >> > > >>> > > >>> Thanks! > > >>> Jc > > >>> > > >>> On Tue, Dec 11, 2018 at 9:30 AM Jini George <jini.george at oracle.com > > >>> <mailto:jini.george at oracle.com>> wrote: > > >>> > > >>> Hello ! > > >>> > > >>> Requesting reviews for: > > >>> > > >>> https://bugs.openjdk.java.net/browse/JDK-8202884 > > >>> Webrev: > > >>> http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/index.html > > >>> > > >>> Details: > > >>> For attaching to the threads in a process, we first go ahead and > > >>> do a > > >>> ptrace attach to the main thread. Later, we use the libthread_db > > >>> library > > >>> (or, in the case of being within a container, iterate through the > > >>> /proc/<pid>/task files) to discover the threads of the process, > > >>> and add > > >>> them to the threads list (within SA) for this process. Once, we > > >>> have > > >>> discovered all the threads and added these to the list of > > >>> threads, we > > >>> then invoke ptrace attach individually on all these threads to > > >>> attach to > > >>> these. When we deal with an application where the threads are > > >>> exiting > > >>> continuously, some of these threads might not exist by the time > > >>> we try > > >>> to ptrace attach to these threads. The proposed fix includes the > > >>> following modifications to solve this. > > >>> ? 1. Check the state of the threads in the thread_db callback > > >>> routine, > > >>> and skip if the state of the thread is TD_THR_UNKNOWN or > > >>> TD_THR_ZOMBIE. > > >>> SA does not try to ptrace attach to these threads and does not > > >>> include > > >>> these threads in the threads list. > > >>> ? 2. While ptrace attaching to the thread, if > > >>> ptrace(PTRACE_ATTACH, > > >>> ...) > > >>> fails with either ESCRH or EPERM, check the state of the thread by > > >>> checking if the /proc/<pid>/status file corresponding to that > > >>> thread > > >>> exists and if so, reading in the 'State:' line of that file. Skip > > >>> attaching to this thread and delete this thread from the SA list of > > >>> threads, if the thread is dead (State: X) or is a zombie (State: > > >>> Z). > > >>> ?From the /proc man page, "Current state of the process. One of "R > > >>> (running)", "S (sleeping)", "D (disk sleep)", "T (stopped)", "T > > >>> (tracing > > >>> stop)", "Z (zombie)", or "X (dead)"." > > >>> ? 3. If waitpid() on the thread is a failure, again skip this > > >>> thread > > >>> (delete this from SA's list of threads) instead of bailing out > > >>> if the > > >>> thread has exited or terminated. > > >>> > > >>> Testing: > > >>> 1. Tested by attaching and detaching multiple times to a test > > >>> program > > >>> spawning numerous short lived threads. > > >>> 2. The SA tests (under test/hotspot/jtreg/serviceability/sa) passed > > >>> with > > >>> 100 repeats on Mach5. > > >>> 3. No new failures and no occurrences of JDK-8202884 seen with > > >>> testing > > >>> the SA tests (tiers 1 to 5) on Mach5. > > >>> > > >>> More details in the bug comments section. > > >>> > > >>> Thank you, > > >>> Jini. > > >>> > > >>> > > >>> > > >>> -- > > >>> > > >>> Thanks, > > >>> Jc > > >>