Thanks for looking into this.

Chris

On 6/20/19 6:42 PM, Daniil Titov wrote:
Thank you, Chris and Serguei, for reviewing this change.

I did more testing with a test program on Linux that repeats get_namespace_pid() 
method and reads from the real file ( the copy of /proc/<pid>/status).
I paused the program after the first several lines ( but not "NSpid:" line)  where 
processed and deleted all lines in the status file before " NSpid: ..." line in order
to this line became the first in the edited file. After that the program 
continues in the while loop but it seems as the original file content was
already buffered and the program just continues over the original unedited 
lines and successfully found the match.

Best regards,
Daniil

On 6/19/19, 9:13 PM, "Chris Plummer" <chris.plum...@oracle.com> wrote:

     Hi Daniil,
I think your fix is good, although I wonder about the robustness of this
     function given that the status file can change while it is reading it. I
     wonder if it can return a false negative because it never saw the
     matching line. This could happen if a line gets deleted, causing the
     matching line to suddenly be earlier in the file, possibly before the
     current location being read. Anyway, that's not really related to the
     current issue or fix, but if you think it might be an issue maybe a bug
     should be filed for it.
thanks, Chris On 6/19/19 9:02 PM, Daniil Titov wrote:
     > Please review the change that fixes an intermittent failure of 
serviceability/dcmd/framework/* tests on Linux platform.
     >
     > The problem here is that get_namespace_pid() method, that is called by 
mmap_attach_shared () that in turn is called by PerfMemory::attach(),
     > tries to read the namespace pid information from /proc/<pid>/status 
file. However, it doesn't check that the error indicator associated with
     > stream is set that results in the endless  loop (lines 664-677) if the process 
terminates after /proc/<pid>/status was opened (line 659)
     > and checked for null (line 661).
     >
     >    658          snprintf(fname, sizeof(fname), "/proc/%d/status", vmid);
     >     659         FILE *fp = fopen(fname, "r");
     >     660       
     >     661         if (fp) {
     >     662           int pid, nspid;
     >     663           int ret;
     >     664           while (!feof(fp)) {
     >     665             ret = fscanf(fp, "NSpid: %d %d", &pid, &nspid);
     >     666             if (ret == 1) {
     >     667               break;
     >     668             }
     >     669             if (ret == 2) {
     >     670               retpid = nspid;
     >     671               break;
     >     672             }
     >     673             for (;;) {
     >     674               int ch = fgetc(fp);
     >     675               if (ch == EOF || ch == (int)'\n') break;
     >     676             }
     >     677           }
     >     678           fclose(fp);
     >     679         }
     >
     > The fix adds the check for the error indicator to ensure that the 
"while" loop terminates properly if the file no longer exists.
     >
     > Issues [3] and [4] have the same cause and will be closed as duplicates 
of this issue.
     >
     > Testing: Mach5 hotspot_serviceability tests succeeded, tier1,tier2, and 
tier3 tests are in progress.
     >
     > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8220175/webrev.01/
     > [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8220175
     > [3] https://bugs.openjdk.java.net/browse/JDK-8223600
     > [4] https://bugs.openjdk.java.net/browse/JDK-8217351
     >
     > Thanks!
     > -Daniil
     >
     >



Reply via email to