Thanks, JC! Will add the comments.

Thanks,
- Jini.

On 12/13/2018 12:09 AM, JC Beyler wrote:
Hi Jini,

Fair enough, thanks for the explanation. It makes sense to me. I imagine that it is a conservative approach, ie can't find the information then assume still live.

Perhaps a small comment above the 'X'/'Z' test saying that the threads are considered to be dead then? And for the return perhaps say: the thread is considered live if the state was not 'X'/'Z' or not found?

But those are really nits and no need to send another webrev for me!

Thanks!
Jc



On Wed, Dec 12, 2018 at 10:27 AM Jini George <jini.geo...@oracle.com <mailto:jini.geo...@oracle.com>> wrote:


    On 12/12/2018 10:47 PM, JC Beyler wrote:
     > Hi Jini,
     >
     > Should your return not be return !found_state instead of false here:
     >
     > 257   if (!found_state) {
     > 258     // Assuming the thread exists.
     > 259     print_error("Could not find the 'State:' string in the
     > /proc/%d/status file\n", pid);
     > 260   }
     > 261   fclose (fp);
     > 262   return false;
     >
     > In your webrev.00 it was the case but now, you always return the
    process
     > does exist even if you have not found it.

    I referred to the gdb sources to check what is done under this
    scenario,
    and in gdb, it is assumed that if the line beginning with 'State:' is
    not found, the thread is alive. But to be frank, I don't know under
    what
    circumstances will we ever encounter such a scenario. Let me know if
    you
    don't agree with this.

     > cf:
     >
    
http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c.udiff.html
     > vs
     >
    
http://cr.openjdk.java.net/~jgeorge/8202884/webrev.02/src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c.udiff.html
     >
     > Tiny nit: no need to check *space if you are using
    isspace(*space) right
     > after :)

    Will change this. Thanks!

     >
     > Apart from the return question, the webrev looks good to me :-)
     > Jc

    Thank you again!
    Jini

     >
     >
     > On Wed, Dec 12, 2018 at 4:15 AM Jini George
    <jini.geo...@oracle.com <mailto:jini.geo...@oracle.com>
     > <mailto:jini.geo...@oracle.com <mailto:jini.geo...@oracle.com>>>
    wrote:
     >
     >     Thank you very much for looking into this, JC!
     >
     >     I have a revised webrev addressing your comments at:
     >
     > http://cr.openjdk.java.net/~jgeorge/8202884/webrev.02/index.html
     >
     >     Requesting one more review for this. My comments inline:
     >
     >     On 12/12/2018 2:53 AM, JC Beyler wrote:
     >      > Hi Jini,
     >      >
     >      > I saw a few nits:
     >      >
     >
    
http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/libproc_impl.h.udiff.html
     >      >    -> The comments are in the third person normally it
    seems so
     >     it would
     >      > become (I also removed the s from threads):
     >      >
     >      > +// deletes a thread from the thread list
     >     Done.
     >      >
     >      >
     >
    
http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/libproc_impl.c.udiff.html
     >      >    -> You added two empty lines it seems that could be removed
     >     Done.
     >      >
     >      >
     >      >
     >
    
http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c.udiff.html
     >      >    -> Is there a real reason to have both enums? We could
    have a
     >     single
     >      > enum it seems and not lose too much
     >
     >     You are right. I have done away with the WAITPID* enum.
     >
     >      >    -> you have a switch "
     >      >         switch (errno) {"
     >      >          -> Where really you could simplify the reading by
    moving
     >     the
     >      > EINTR case outside with its continue
     >      >          -> The switch could then remain as it was (though
    you move
     >      > print_debug to print_error)
     >      >          -> and just return in each cases
     >     I have changed this to:
     >
     >     206     } else {
     >     207       switch (errno) {
     >     208         case EINTR:
     >     209           continue;
     >     210           break;
     >     211         case ECHILD:
     >     212           print_debug("waitpid() failed. Child process
    pid (%d)
     >     does
     >     not exist \n", pid);
     >     213           return ATTACH_THREAD_DEAD;
     >     214         case EINVAL:
     >     215           print_error("waitpid() failed. Invalid options
     >     argument.\n");
     >     216           return ATTACH_FAIL;
     >     217         default:
     >     218           print_error("waitpid() failed. Unexpected error
    %d\n",
     >     errno);
     >     219           return ATTACH_FAIL;
     >     220       }
     >     221     } // else
     >
     >
     >      >
     >      >     -> if (strncmp (buf, "State:", 6) == 0) {
     >      >        -> You use sizeof("State:") right below; perhaps
    you could
     >     just
     >      > use "  const char const state[] = "State:";" and use
     >     sizeof(state) and
     >      > for the string, it seems less error prone
     >      >
     >      >    -> A minor "bug" is here:
     >      > +      state = buf + sizeof ("State:");
     >      >          -> You did a strncmp above but that only assures the
     >     start of
     >      > the string is "State:", technically the character after
    the ':'
     >     is the
     >      > but it could only be that; sizeof("State:") is 7 and not 6. So
     >     you miss
     >      > one character when you are skipping spaces
     >      >          -> It was probably ok because you always had at
    least one
     >      > space, ie "State: "
     >
     >     Thanks! I have made some changes here to use a const char
    string and a
     >     variable to store the calculated length using strlen(). And I
    am using
     >     isspace() now to skip spaces since tabs could also be used as a
     >     delimiter.
     >
     >      >    -> Extra space here before the '(': "sizeof (buf)"
     >     Done.
     >      >
     >      > Finally your return sequence for that method could be
    simplified to:
     >      >
     >      > +  if (!found_state) {
     >      > +    print_error(" Could not find the State: string in the
    status
     >     file
     >      > for pid %d\n", pid);
     >      > +  }
     >      > +  fclose (fp);
     >      > +  return !found_state;
     >
     >     I have modified this to:
     >
     >     257   if (!found_state) {
     >     258     // Assuming the thread exists.
     >     259     print_error("Could not find the 'State:' string in the
     >     /proc/%d/status file\n", pid);
     >     260   }
     >     261   fclose (fp);
     >     262   return false;
     >
     >     Thank you,
     >     Jini.
     >
     >      >
     >      > Thanks!
     >      > Jc
     >      >
     >      > On Tue, Dec 11, 2018 at 9:30 AM Jini George
     >     <jini.geo...@oracle.com <mailto:jini.geo...@oracle.com>
    <mailto:jini.geo...@oracle.com <mailto:jini.geo...@oracle.com>>
     >      > <mailto:jini.geo...@oracle.com
    <mailto:jini.geo...@oracle.com> <mailto:jini.geo...@oracle.com
    <mailto:jini.geo...@oracle.com>>>>
     >     wrote:
     >      >
     >      >     Hello !
     >      >
     >      >     Requesting reviews for:
     >      >
     >      > https://bugs.openjdk.java.net/browse/JDK-8202884
     >      >     Webrev:
     > http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/index.html
     >      >
     >      >     Details:
     >      >     For attaching to the threads in a process, we first go
    ahead
     >     and do a
     >      >     ptrace attach to the main thread. Later, we use the
    libthread_db
     >      >     library
     >      >     (or, in the case of being within a container, iterate
    through the
     >      >     /proc/<pid>/task files) to discover the threads of the
     >     process, and add
     >      >     them to the threads list (within SA) for this process.
    Once,
     >     we have
     >      >     discovered all the threads and added these to the list of
     >     threads, we
     >      >     then invoke ptrace attach individually on all these
    threads to
     >      >     attach to
     >      >     these. When we deal with an application where the
    threads are
     >     exiting
     >      >     continuously, some of these threads might not exist by the
     >     time we try
     >      >     to ptrace attach to these threads. The proposed fix
    includes the
     >      >     following modifications to solve this.
     >      >        1. Check the state of the threads in the thread_db
     >     callback routine,
     >      >     and skip if the state of the thread is TD_THR_UNKNOWN or
     >     TD_THR_ZOMBIE.
     >      >     SA does not try to ptrace attach to these threads and does
     >     not include
     >      >     these threads in the threads list.
     >      >        2. While ptrace attaching to the thread, if
     >     ptrace(PTRACE_ATTACH,
     >      >     ...)
     >      >     fails with either ESCRH or EPERM, check the state of the
     >     thread by
     >      >     checking if the /proc/<pid>/status file corresponding
    to that
     >     thread
     >      >     exists and if so, reading in the 'State:' line of that
    file. Skip
     >      >     attaching to this thread and delete this thread from
    the SA
     >     list of
     >      >     threads, if the thread is dead (State: X) or is a zombie
     >     (State: Z).
     >      >       From the /proc man page, "Current state of the
    process. One
     >     of "R
     >      >     (running)", "S (sleeping)", "D (disk sleep)", "T
    (stopped)", "T
     >      >     (tracing
     >      >     stop)", "Z (zombie)", or "X (dead)"."
     >      >        3. If waitpid() on the thread is a failure, again skip
     >     this thread
     >      >     (delete this from SA's list of threads) instead of bailing
     >     out if the
     >      >     thread has exited or terminated.
     >      >
     >      >     Testing:
     >      >     1. Tested by attaching and detaching multiple times to
    a test
     >     program
     >      >     spawning numerous short lived threads.
     >      >     2. The SA tests (under
    test/hotspot/jtreg/serviceability/sa)
     >     passed
     >      >     with
     >      >     100 repeats on Mach5.
     >      >     3. No new failures and no occurrences of JDK-8202884 seen
     >     with testing
     >      >     the SA tests (tiers 1 to 5) on Mach5.
     >      >
     >      >     More details in the bug comments section.
     >      >
     >      >     Thank you,
     >      >     Jini.
     >      >
     >      >
     >      >
     >      > --
     >      >
     >      > Thanks,
     >      > Jc
     >
     >
     >
     > --
     >
     > Thanks,
     > Jc



--

Thanks,
Jc

Reply via email to