Re: perf: behavior of poll() changed in 3.18
On Thu, 22 Jan 2015, Jiri Olsa wrote: > > So what happens if you are using a signal handler to monitor a child and > > the child exits? > > AFAICS wrt to SIGIO, we notify only with POLL_IN if there's new data > and POLL_HUP if we reached the event_limit - the one you set with > PERF_EVENT_IOC_REFRESH ioctl I wrote a test for this: if you are monitoring another process with a SIGIO handler and the other process dies, then you get no signal for this at all, and in fact depending on the timing the last valid SIGIO overflow event signal might get lost (in my test I usually get one fewer POLL_IN signal than I would get from the equivelent poll() code). > I've actually never used the SIGIO interface in perf other than > when I was checking your gi repo with test code ;-) Yes, it seems that PAPI is really the only infrastructure that tries to do self-monitoring using the SIGIO interface. Which shows, as it breaks now and then. That was the whole reason I started writing my perf_event_test infrastructure in the first place. Vince -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf: behavior of poll() changed in 3.18
On Thu, Jan 22, 2015 at 01:53:35AM -0500, Vince Weaver wrote: > On Wed, 21 Jan 2015, Jiri Olsa wrote: > > > > However if we revert this code, we'll loose nice (and standard) way > > to check if the event is still valid.. not sure how to handle this. > > there's likely no need to revert as my code wasn't really released and > I've already fixed it to work with the new interface. > > I was mostly asking just so I could update the manpage to explain the new > behavior, as tools that expect to be backwards compatible will have to > handle both ways of detecting a process dieing. > > > > Part of why my code doesn't just exit on POLLHUP is because you can > > > get that result for reasons other than a process exit (for example, > > > if you are using ioctl(PERF_EVENT_IOC_REFRESH) > > > > Nope, this is related to POOL_HUP (notice the '_') which you'll get > > accompanied with SIGIO if you setup this. > > So what happens if you are using a signal handler to monitor a child and > the child exits? AFAICS wrt to SIGIO, we notify only with POLL_IN if there's new data and POLL_HUP if we reached the event_limit - the one you set with PERF_EVENT_IOC_REFRESH ioctl > > It's a shame the poll and signal handler interfaces are subtly different, > though I guess some of that is probably due to historical reasons. I've actually never used the SIGIO interface in perf other than when I was checking your gi repo with test code ;-) so I'm not sure what the correct behaviour should be when monitored process dies.. I'd say we should send SIGIO with POLL_HUP, but that clashes with that 'event_limit' thing.. I'll check on that thanks, jirka -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf: behavior of poll() changed in 3.18
On Thu, Jan 22, 2015 at 01:53:35AM -0500, Vince Weaver wrote: On Wed, 21 Jan 2015, Jiri Olsa wrote: However if we revert this code, we'll loose nice (and standard) way to check if the event is still valid.. not sure how to handle this. there's likely no need to revert as my code wasn't really released and I've already fixed it to work with the new interface. I was mostly asking just so I could update the manpage to explain the new behavior, as tools that expect to be backwards compatible will have to handle both ways of detecting a process dieing. Part of why my code doesn't just exit on POLLHUP is because you can get that result for reasons other than a process exit (for example, if you are using ioctl(PERF_EVENT_IOC_REFRESH) Nope, this is related to POOL_HUP (notice the '_') which you'll get accompanied with SIGIO if you setup this. So what happens if you are using a signal handler to monitor a child and the child exits? AFAICS wrt to SIGIO, we notify only with POLL_IN if there's new data and POLL_HUP if we reached the event_limit - the one you set with PERF_EVENT_IOC_REFRESH ioctl It's a shame the poll and signal handler interfaces are subtly different, though I guess some of that is probably due to historical reasons. I've actually never used the SIGIO interface in perf other than when I was checking your gi repo with test code ;-) so I'm not sure what the correct behaviour should be when monitored process dies.. I'd say we should send SIGIO with POLL_HUP, but that clashes with that 'event_limit' thing.. I'll check on that thanks, jirka -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf: behavior of poll() changed in 3.18
On Thu, 22 Jan 2015, Jiri Olsa wrote: So what happens if you are using a signal handler to monitor a child and the child exits? AFAICS wrt to SIGIO, we notify only with POLL_IN if there's new data and POLL_HUP if we reached the event_limit - the one you set with PERF_EVENT_IOC_REFRESH ioctl I wrote a test for this: if you are monitoring another process with a SIGIO handler and the other process dies, then you get no signal for this at all, and in fact depending on the timing the last valid SIGIO overflow event signal might get lost (in my test I usually get one fewer POLL_IN signal than I would get from the equivelent poll() code). I've actually never used the SIGIO interface in perf other than when I was checking your gi repo with test code ;-) Yes, it seems that PAPI is really the only infrastructure that tries to do self-monitoring using the SIGIO interface. Which shows, as it breaks now and then. That was the whole reason I started writing my perf_event_test infrastructure in the first place. Vince -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf: behavior of poll() changed in 3.18
On Wed, 21 Jan 2015, Jiri Olsa wrote: > > However if we revert this code, we'll loose nice (and standard) way > to check if the event is still valid.. not sure how to handle this. there's likely no need to revert as my code wasn't really released and I've already fixed it to work with the new interface. I was mostly asking just so I could update the manpage to explain the new behavior, as tools that expect to be backwards compatible will have to handle both ways of detecting a process dieing. > > Part of why my code doesn't just exit on POLLHUP is because you can > > get that result for reasons other than a process exit (for example, > > if you are using ioctl(PERF_EVENT_IOC_REFRESH) > > Nope, this is related to POOL_HUP (notice the '_') which you'll get > accompanied with SIGIO if you setup this. So what happens if you are using a signal handler to monitor a child and the child exits? It's a shame the poll and signal handler interfaces are subtly different, though I guess some of that is probably due to historical reasons. Vince -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf: behavior of poll() changed in 3.18
On Wed, Jan 21, 2015 at 01:06:30AM -0500, Vince Weaver wrote: > On Tue, 20 Jan 2015, Jiri Olsa wrote: > > > I made this change to get notification that monitored > > process exited. We use it in 'perf record' to find out > > that we have nothing more to monitor and quit. > > > > The logic is to return POLLHUP (via poll syscall) when > > the monitored process exits for the event. > > > > Could you please share your test code? > > My code does something like this to count the number of overflows > happening in a child: > > /* fds contains one fd, which is a harware event */ > /* attached to a child */ > > while(1) { > result=poll(fds,1,100); > if (result==0) { > waitpid(child,,WNOHANG); unrelated, but I noticed, that this seems broken.. you shouldn't check WIFEXITED(status) without waitpid(... WNOHANG) returning > 0 > if (WIFEXITED(status)) break; > if (WIFSIGNALED(status)) { > printf("Signalled %d!\n",WTERMSIG(status)); > break; > } > } > > if (fds[0].revents) count.in++; > if (fds[0].revents) count.hup++; > if (fds[0].revents) break; > } > > With the change you made this code hangs forever because poll no longer > returns "0" when the child exits, but instead returns "1" with > POLLHUP set. right, once event is HUPed the poll will return POLLHUP in revents > > I admit my code isn't the best out there, but it worked reliably up > until 3.18 when the change was made. I think that poll returning POLLHUP for an event which cease to work due to its process died is correct. On the other side we broke your code, which is against the policy. However if we revert this code, we'll loose nice (and standard) way to check if the event is still valid.. not sure how to handle this. thoughts? > > Part of why my code doesn't just exit on POLLHUP is because you can > get that result for reasons other than a process exit (for example, > if you are using ioctl(PERF_EVENT_IOC_REFRESH) Nope, this is related to POOL_HUP (notice the '_') which you'll get accompanied with SIGIO if you setup this. Before you could get POLLHUP (through poll) only if you'd called poll on unmapped event. Now (after commit 179033b3e064) you'll get it also if the process that the event monitored has died. jirka -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf: behavior of poll() changed in 3.18
On Wed, Jan 21, 2015 at 01:06:30AM -0500, Vince Weaver wrote: On Tue, 20 Jan 2015, Jiri Olsa wrote: I made this change to get notification that monitored process exited. We use it in 'perf record' to find out that we have nothing more to monitor and quit. The logic is to return POLLHUP (via poll syscall) when the monitored process exits for the event. Could you please share your test code? My code does something like this to count the number of overflows happening in a child: /* fds contains one fd, which is a harware event */ /* attached to a child */ while(1) { result=poll(fds,1,100); if (result==0) { waitpid(child,status,WNOHANG); unrelated, but I noticed, that this seems broken.. you shouldn't check WIFEXITED(status) without waitpid(... WNOHANG) returning 0 if (WIFEXITED(status)) break; if (WIFSIGNALED(status)) { printf(Signalled %d!\n,WTERMSIG(status)); break; } } if (fds[0].reventsPOLLIN) count.in++; if (fds[0].reventsPOLLHUP) count.hup++; if (fds[0].reventsPOLLERR) break; } With the change you made this code hangs forever because poll no longer returns 0 when the child exits, but instead returns 1 with POLLHUP set. right, once event is HUPed the poll will return POLLHUP in revents I admit my code isn't the best out there, but it worked reliably up until 3.18 when the change was made. I think that poll returning POLLHUP for an event which cease to work due to its process died is correct. On the other side we broke your code, which is against the policy. However if we revert this code, we'll loose nice (and standard) way to check if the event is still valid.. not sure how to handle this. thoughts? Part of why my code doesn't just exit on POLLHUP is because you can get that result for reasons other than a process exit (for example, if you are using ioctl(PERF_EVENT_IOC_REFRESH) Nope, this is related to POOL_HUP (notice the '_') which you'll get accompanied with SIGIO if you setup this. Before you could get POLLHUP (through poll) only if you'd called poll on unmapped event. Now (after commit 179033b3e064) you'll get it also if the process that the event monitored has died. jirka -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf: behavior of poll() changed in 3.18
On Wed, 21 Jan 2015, Jiri Olsa wrote: However if we revert this code, we'll loose nice (and standard) way to check if the event is still valid.. not sure how to handle this. there's likely no need to revert as my code wasn't really released and I've already fixed it to work with the new interface. I was mostly asking just so I could update the manpage to explain the new behavior, as tools that expect to be backwards compatible will have to handle both ways of detecting a process dieing. Part of why my code doesn't just exit on POLLHUP is because you can get that result for reasons other than a process exit (for example, if you are using ioctl(PERF_EVENT_IOC_REFRESH) Nope, this is related to POOL_HUP (notice the '_') which you'll get accompanied with SIGIO if you setup this. So what happens if you are using a signal handler to monitor a child and the child exits? It's a shame the poll and signal handler interfaces are subtly different, though I guess some of that is probably due to historical reasons. Vince -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf: behavior of poll() changed in 3.18
On Tue, 20 Jan 2015, Jiri Olsa wrote: > I made this change to get notification that monitored > process exited. We use it in 'perf record' to find out > that we have nothing more to monitor and quit. > > The logic is to return POLLHUP (via poll syscall) when > the monitored process exits for the event. > > Could you please share your test code? My code does something like this to count the number of overflows happening in a child: /* fds contains one fd, which is a harware event */ /* attached to a child */ while(1) { result=poll(fds,1,100); if (result==0) { waitpid(child,,WNOHANG); if (WIFEXITED(status)) break; if (WIFSIGNALED(status)) { printf("Signalled %d!\n",WTERMSIG(status)); break; } } if (fds[0].revents) count.in++; if (fds[0].revents) count.hup++; if (fds[0].revents) break; } With the change you made this code hangs forever because poll no longer returns "0" when the child exits, but instead returns "1" with POLLHUP set. I admit my code isn't the best out there, but it worked reliably up until 3.18 when the change was made. Part of why my code doesn't just exit on POLLHUP is because you can get that result for reasons other than a process exit (for example, if you are using ioctl(PERF_EVENT_IOC_REFRESH) Vince -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf: behavior of poll() changed in 3.18
On Tue, Jan 20, 2015 at 12:03:19PM -0500, Vince Weaver wrote: > Hello > > Some of my code that used poll() on a perf_event fd broke sometime between > 3.17 and current. (My perf_event_test validation testsuite caught this > too but I haven't been running it as regularly as I should). > > I bisected this to: > > commit 179033b3e064d2cd3f5f9945e76b0a0f0fbf4883 > Author: Jiri Olsa > Date: Thu Aug 7 11:48:26 2014 -0400 > > perf: Add PERF_EVENT_STATE_EXIT state for events with exited task > > > Before, when polling, my code would get a > result=0/WIFEXITED() > return when a child with events being polled exited. > > Now instead I get a > result=1/POLLHUP > return. > > Needless to say this broke my code. Is there a reason this change was > made? hi, I made this change to get notification that monitored process exited. We use it in 'perf record' to find out that we have nothing more to monitor and quit. The logic is to return POLLHUP (via poll syscall) when the monitored process exits for the event. Could you please share your test code? thanks, jirka -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
perf: behavior of poll() changed in 3.18
Hello Some of my code that used poll() on a perf_event fd broke sometime between 3.17 and current. (My perf_event_test validation testsuite caught this too but I haven't been running it as regularly as I should). I bisected this to: commit 179033b3e064d2cd3f5f9945e76b0a0f0fbf4883 Author: Jiri Olsa Date: Thu Aug 7 11:48:26 2014 -0400 perf: Add PERF_EVENT_STATE_EXIT state for events with exited task Before, when polling, my code would get a result=0/WIFEXITED() return when a child with events being polled exited. Now instead I get a result=1/POLLHUP return. Needless to say this broke my code. Is there a reason this change was made? Vince -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
perf: behavior of poll() changed in 3.18
Hello Some of my code that used poll() on a perf_event fd broke sometime between 3.17 and current. (My perf_event_test validation testsuite caught this too but I haven't been running it as regularly as I should). I bisected this to: commit 179033b3e064d2cd3f5f9945e76b0a0f0fbf4883 Author: Jiri Olsa jo...@kernel.org Date: Thu Aug 7 11:48:26 2014 -0400 perf: Add PERF_EVENT_STATE_EXIT state for events with exited task Before, when polling, my code would get a result=0/WIFEXITED() return when a child with events being polled exited. Now instead I get a result=1/POLLHUP return. Needless to say this broke my code. Is there a reason this change was made? Vince -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf: behavior of poll() changed in 3.18
On Tue, Jan 20, 2015 at 12:03:19PM -0500, Vince Weaver wrote: Hello Some of my code that used poll() on a perf_event fd broke sometime between 3.17 and current. (My perf_event_test validation testsuite caught this too but I haven't been running it as regularly as I should). I bisected this to: commit 179033b3e064d2cd3f5f9945e76b0a0f0fbf4883 Author: Jiri Olsa jo...@kernel.org Date: Thu Aug 7 11:48:26 2014 -0400 perf: Add PERF_EVENT_STATE_EXIT state for events with exited task Before, when polling, my code would get a result=0/WIFEXITED() return when a child with events being polled exited. Now instead I get a result=1/POLLHUP return. Needless to say this broke my code. Is there a reason this change was made? hi, I made this change to get notification that monitored process exited. We use it in 'perf record' to find out that we have nothing more to monitor and quit. The logic is to return POLLHUP (via poll syscall) when the monitored process exits for the event. Could you please share your test code? thanks, jirka -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf: behavior of poll() changed in 3.18
On Tue, 20 Jan 2015, Jiri Olsa wrote: I made this change to get notification that monitored process exited. We use it in 'perf record' to find out that we have nothing more to monitor and quit. The logic is to return POLLHUP (via poll syscall) when the monitored process exits for the event. Could you please share your test code? My code does something like this to count the number of overflows happening in a child: /* fds contains one fd, which is a harware event */ /* attached to a child */ while(1) { result=poll(fds,1,100); if (result==0) { waitpid(child,status,WNOHANG); if (WIFEXITED(status)) break; if (WIFSIGNALED(status)) { printf(Signalled %d!\n,WTERMSIG(status)); break; } } if (fds[0].reventsPOLLIN) count.in++; if (fds[0].reventsPOLLHUP) count.hup++; if (fds[0].reventsPOLLERR) break; } With the change you made this code hangs forever because poll no longer returns 0 when the child exits, but instead returns 1 with POLLHUP set. I admit my code isn't the best out there, but it worked reliably up until 3.18 when the change was made. Part of why my code doesn't just exit on POLLHUP is because you can get that result for reasons other than a process exit (for example, if you are using ioctl(PERF_EVENT_IOC_REFRESH) Vince -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/