Thank you very much for looking into this, JC!

I have a revised webrev addressing your comments at:

http://cr.openjdk.java.net/~jgeorge/8202884/webrev.02/index.html

Requesting one more review for this. My comments inline:

On 12/12/2018 2:53 AM, JC Beyler wrote:
Hi Jini,

I saw a few nits:
http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/libproc_impl.h.udiff.html
  -> The comments are in the third person normally it seems so it would become (I also removed the s from threads):

+// deletes a thread from the thread list
Done.

http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/libproc_impl.c.udiff.html
   -> You added two empty lines it seems that could be removed
Done.


http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c.udiff.html
  -> Is there a real reason to have both enums? We could have a single enum it seems and not lose too much

You are right. I have done away with the WAITPID* enum.

   -> you have a switch "
        switch (errno) {"
        -> Where really you could simplify the reading by moving the EINTR case outside with its continue         -> The switch could then remain as it was (though you move print_debug to print_error)
         -> and just return in each cases
I have changed this to:

206     } else {
207       switch (errno) {
208         case EINTR:
209           continue;
210           break;
211         case ECHILD:
212 print_debug("waitpid() failed. Child process pid (%d) does not exist \n", pid);
213           return ATTACH_THREAD_DEAD;
214         case EINVAL:
215           print_error("waitpid() failed. Invalid options argument.\n");
216           return ATTACH_FAIL;
217         default:
218           print_error("waitpid() failed. Unexpected error %d\n", errno);
219           return ATTACH_FAIL;
220       }
221     } // else



    -> if (strncmp (buf, "State:", 6) == 0) {
      -> You use sizeof("State:") right below; perhaps you could just use "  const char const state[] = "State:";" and use sizeof(state) and for the string, it seems less error prone

   -> A minor "bug" is here:
+      state = buf + sizeof ("State:");
        -> You did a strncmp above but that only assures the start of the string is "State:", technically the character after the ':' is the but it could only be that; sizeof("State:") is 7 and not 6. So you miss one character when you are skipping spaces         -> It was probably ok because you always had at least one space, ie "State: "

Thanks! I have made some changes here to use a const char string and a variable to store the calculated length using strlen(). And I am using isspace() now to skip spaces since tabs could also be used as a delimiter.

   -> Extra space here before the '(': "sizeof (buf)"
Done.

Finally your return sequence for that method could be simplified to:

+  if (!found_state) {
+    print_error(" Could not find the State: string in the status file for pid %d\n", pid);
+  }
+  fclose (fp);
+  return !found_state;

I have modified this to:

257   if (!found_state) {
258     // Assuming the thread exists.
259 print_error("Could not find the 'State:' string in the /proc/%d/status file\n", pid);
260   }
261   fclose (fp);
262   return false;

Thank you,
Jini.


Thanks!
Jc

On Tue, Dec 11, 2018 at 9:30 AM Jini George <jini.geo...@oracle.com <mailto:jini.geo...@oracle.com>> wrote:

    Hello !

    Requesting reviews for:

    https://bugs.openjdk.java.net/browse/JDK-8202884
    Webrev: http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/index.html

    Details:
    For attaching to the threads in a process, we first go ahead and do a
    ptrace attach to the main thread. Later, we use the libthread_db
    library
    (or, in the case of being within a container, iterate through the
    /proc/<pid>/task files) to discover the threads of the process, and add
    them to the threads list (within SA) for this process. Once, we have
    discovered all the threads and added these to the list of threads, we
    then invoke ptrace attach individually on all these threads to
    attach to
    these. When we deal with an application where the threads are exiting
    continuously, some of these threads might not exist by the time we try
    to ptrace attach to these threads. The proposed fix includes the
    following modifications to solve this.
       1. Check the state of the threads in the thread_db callback routine,
    and skip if the state of the thread is TD_THR_UNKNOWN or TD_THR_ZOMBIE.
    SA does not try to ptrace attach to these threads and does not include
    these threads in the threads list.
       2. While ptrace attaching to the thread, if ptrace(PTRACE_ATTACH,
    ...)
    fails with either ESCRH or EPERM, check the state of the thread by
    checking if the /proc/<pid>/status file corresponding to that thread
    exists and if so, reading in the 'State:' line of that file. Skip
    attaching to this thread and delete this thread from the SA list of
    threads, if the thread is dead (State: X) or is a zombie (State: Z).
      From the /proc man page, "Current state of the process. One of "R
    (running)", "S (sleeping)", "D (disk sleep)", "T (stopped)", "T
    (tracing
    stop)", "Z (zombie)", or "X (dead)"."
       3. If waitpid() on the thread is a failure, again skip this thread
    (delete this from SA's list of threads) instead of bailing out if the
    thread has exited or terminated.

    Testing:
    1. Tested by attaching and detaching multiple times to a test program
    spawning numerous short lived threads.
    2. The SA tests (under test/hotspot/jtreg/serviceability/sa) passed
    with
    100 repeats on Mach5.
    3. No new failures and no occurrences of JDK-8202884 seen with testing
    the SA tests (tiers 1 to 5) on Mach5.

    More details in the bug comments section.

    Thank you,
    Jini.



--

Thanks,
Jc

Reply via email to