On 12/12/2018 10:47 PM, JC Beyler wrote:
Hi Jini,

Should your return not be return !found_state instead of false here:

257   if (!found_state) {
258     // Assuming the thread exists.
259     print_error("Could not find the 'State:' string in the
/proc/%d/status file\n", pid);
260   }
261   fclose (fp);
262   return false;

In your webrev.00 it was the case but now, you always return the process does exist even if you have not found it.

I referred to the gdb sources to check what is done under this scenario, and in gdb, it is assumed that if the line beginning with 'State:' is not found, the thread is alive. But to be frank, I don't know under what circumstances will we ever encounter such a scenario. Let me know if you don't agree with this.

cf:
http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c.udiff.html
vs
http://cr.openjdk.java.net/~jgeorge/8202884/webrev.02/src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c.udiff.html

Tiny nit: no need to check *space if you are using isspace(*space) right after :)

Will change this. Thanks!


Apart from the return question, the webrev looks good to me :-)
Jc

Thank you again!
Jini



On Wed, Dec 12, 2018 at 4:15 AM Jini George <jini.geo...@oracle.com <mailto:jini.geo...@oracle.com>> wrote:

    Thank you very much for looking into this, JC!

    I have a revised webrev addressing your comments at:

    http://cr.openjdk.java.net/~jgeorge/8202884/webrev.02/index.html

    Requesting one more review for this. My comments inline:

    On 12/12/2018 2:53 AM, JC Beyler wrote:
     > Hi Jini,
     >
     > I saw a few nits:
     >
    
http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/libproc_impl.h.udiff.html
     >    -> The comments are in the third person normally it seems so
    it would
     > become (I also removed the s from threads):
     >
     > +// deletes a thread from the thread list
    Done.
     >
     >
    
http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/libproc_impl.c.udiff.html
     >    -> You added two empty lines it seems that could be removed
    Done.
     >
     >
     >
    
http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c.udiff.html
     >    -> Is there a real reason to have both enums? We could have a
    single
     > enum it seems and not lose too much

    You are right. I have done away with the WAITPID* enum.

     >    -> you have a switch "
     >         switch (errno) {"
     >          -> Where really you could simplify the reading by moving
    the
     > EINTR case outside with its continue
     >          -> The switch could then remain as it was (though you move
     > print_debug to print_error)
     >          -> and just return in each cases
    I have changed this to:

    206     } else {
    207       switch (errno) {
    208         case EINTR:
    209           continue;
    210           break;
    211         case ECHILD:
    212           print_debug("waitpid() failed. Child process pid (%d)
    does
    not exist \n", pid);
    213           return ATTACH_THREAD_DEAD;
    214         case EINVAL:
    215           print_error("waitpid() failed. Invalid options
    argument.\n");
    216           return ATTACH_FAIL;
    217         default:
    218           print_error("waitpid() failed. Unexpected error %d\n",
    errno);
    219           return ATTACH_FAIL;
    220       }
    221     } // else


     >
     >     -> if (strncmp (buf, "State:", 6) == 0) {
     >        -> You use sizeof("State:") right below; perhaps you could
    just
     > use "  const char const state[] = "State:";" and use
    sizeof(state) and
     > for the string, it seems less error prone
     >
     >    -> A minor "bug" is here:
     > +      state = buf + sizeof ("State:");
     >          -> You did a strncmp above but that only assures the
    start of
     > the string is "State:", technically the character after the ':'
    is the
     > but it could only be that; sizeof("State:") is 7 and not 6. So
    you miss
     > one character when you are skipping spaces
     >          -> It was probably ok because you always had at least one
     > space, ie "State: "

    Thanks! I have made some changes here to use a const char string and a
    variable to store the calculated length using strlen(). And I am using
    isspace() now to skip spaces since tabs could also be used as a
    delimiter.

     >    -> Extra space here before the '(': "sizeof (buf)"
    Done.
     >
     > Finally your return sequence for that method could be simplified to:
     >
     > +  if (!found_state) {
     > +    print_error(" Could not find the State: string in the status
    file
     > for pid %d\n", pid);
     > +  }
     > +  fclose (fp);
     > +  return !found_state;

    I have modified this to:

    257   if (!found_state) {
    258     // Assuming the thread exists.
    259     print_error("Could not find the 'State:' string in the
    /proc/%d/status file\n", pid);
    260   }
    261   fclose (fp);
    262   return false;

    Thank you,
    Jini.

     >
     > Thanks!
     > Jc
     >
     > On Tue, Dec 11, 2018 at 9:30 AM Jini George
    <jini.geo...@oracle.com <mailto:jini.geo...@oracle.com>
     > <mailto:jini.geo...@oracle.com <mailto:jini.geo...@oracle.com>>>
    wrote:
     >
     >     Hello !
     >
     >     Requesting reviews for:
     >
     > https://bugs.openjdk.java.net/browse/JDK-8202884
     >     Webrev:
    http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/index.html
     >
     >     Details:
     >     For attaching to the threads in a process, we first go ahead
    and do a
     >     ptrace attach to the main thread. Later, we use the libthread_db
     >     library
     >     (or, in the case of being within a container, iterate through the
     >     /proc/<pid>/task files) to discover the threads of the
    process, and add
     >     them to the threads list (within SA) for this process. Once,
    we have
     >     discovered all the threads and added these to the list of
    threads, we
     >     then invoke ptrace attach individually on all these threads to
     >     attach to
     >     these. When we deal with an application where the threads are
    exiting
     >     continuously, some of these threads might not exist by the
    time we try
     >     to ptrace attach to these threads. The proposed fix includes the
     >     following modifications to solve this.
     >        1. Check the state of the threads in the thread_db
    callback routine,
     >     and skip if the state of the thread is TD_THR_UNKNOWN or
    TD_THR_ZOMBIE.
     >     SA does not try to ptrace attach to these threads and does
    not include
     >     these threads in the threads list.
     >        2. While ptrace attaching to the thread, if
    ptrace(PTRACE_ATTACH,
     >     ...)
     >     fails with either ESCRH or EPERM, check the state of the
    thread by
     >     checking if the /proc/<pid>/status file corresponding to that
    thread
     >     exists and if so, reading in the 'State:' line of that file. Skip
     >     attaching to this thread and delete this thread from the SA
    list of
     >     threads, if the thread is dead (State: X) or is a zombie
    (State: Z).
     >       From the /proc man page, "Current state of the process. One
    of "R
     >     (running)", "S (sleeping)", "D (disk sleep)", "T (stopped)", "T
     >     (tracing
     >     stop)", "Z (zombie)", or "X (dead)"."
     >        3. If waitpid() on the thread is a failure, again skip
    this thread
     >     (delete this from SA's list of threads) instead of bailing
    out if the
     >     thread has exited or terminated.
     >
     >     Testing:
     >     1. Tested by attaching and detaching multiple times to a test
    program
     >     spawning numerous short lived threads.
     >     2. The SA tests (under test/hotspot/jtreg/serviceability/sa)
    passed
     >     with
     >     100 repeats on Mach5.
     >     3. No new failures and no occurrences of JDK-8202884 seen
    with testing
     >     the SA tests (tiers 1 to 5) on Mach5.
     >
     >     More details in the bug comments section.
     >
     >     Thank you,
     >     Jini.
     >
     >
     >
     > --
     >
     > Thanks,
     > Jc



--

Thanks,
Jc

Reply via email to