Thank you, Chris and Serguei, for reviewing this change.
I did more testing with a test program on Linux that repeats
get_namespace_pid() method and reads from the real file ( the copy of
/proc/<pid>/status).
I paused the program after the first several lines ( but not "NSpid:"
line) where processed and deleted all lines in the status file
before " NSpid: ..." line in order
to this line became the first in the edited file. After that the
program continues in the while loop but it seems as the original file
content was
already buffered and the program just continues over the original
unedited lines and successfully found the match.
Best regards,
Daniil
On 6/19/19, 9:13 PM, "Chris Plummer" <chris.plum...@oracle.com> wrote:
Hi Daniil,
I think your fix is good, although I wonder about the
robustness of this
function given that the status file can change while it is
reading it. I
wonder if it can return a false negative because it never saw the
matching line. This could happen if a line gets deleted, causing
the
matching line to suddenly be earlier in the file, possibly
before the
current location being read. Anyway, that's not really related
to the
current issue or fix, but if you think it might be an issue
maybe a bug
should be filed for it.
thanks,
Chris
On 6/19/19 9:02 PM, Daniil Titov wrote:
> Please review the change that fixes an intermittent failure of
serviceability/dcmd/framework/* tests on Linux platform.
>
> The problem here is that get_namespace_pid() method, that is called
by mmap_attach_shared () that in turn is called by PerfMemory::attach(),
> tries to read the namespace pid information from /proc/<pid>/status
file. However, it doesn't check that the error indicator associated with
> stream is set that results in the endless loop (lines 664-677) if
the process terminates after /proc/<pid>/status was opened (line 659)
> and checked for null (line 661).
>
> 658 snprintf(fname, sizeof(fname), "/proc/%d/status", vmid);
> 659 FILE *fp = fopen(fname, "r");
> 660
> 661 if (fp) {
> 662 int pid, nspid;
> 663 int ret;
> 664 while (!feof(fp)) {
> 665 ret = fscanf(fp, "NSpid: %d %d", &pid, &nspid);
> 666 if (ret == 1) {
> 667 break;
> 668 }
> 669 if (ret == 2) {
> 670 retpid = nspid;
> 671 break;
> 672 }
> 673 for (;;) {
> 674 int ch = fgetc(fp);
> 675 if (ch == EOF || ch == (int)'\n') break;
> 676 }
> 677 }
> 678 fclose(fp);
> 679 }
>
> The fix adds the check for the error indicator to ensure that the
"while" loop terminates properly if the file no longer exists.
>
> Issues [3] and [4] have the same cause and will be closed as
duplicates of this issue.
>
> Testing: Mach5 hotspot_serviceability tests succeeded, tier1,tier2,
and tier3 tests are in progress.
>
> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8220175/webrev.01/
> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8220175
> [3] https://bugs.openjdk.java.net/browse/JDK-8223600
> [4] https://bugs.openjdk.java.net/browse/JDK-8217351
>
> Thanks!
> -Daniil
>
>