Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps
On 24.05.2018 17:51, Michael S. Tsirkin wrote: > On Thu, May 24, 2018 at 05:46:05PM +0200, Thomas Huth wrote: >> On 24.05.2018 17:00, Michael S. Tsirkin wrote: >>> On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote: On 24.05.2018 16:30, Michael S. Tsirkin wrote: > Right now tests report OK status if QEMU crashes during cleanup. > Let's catch that case and fail the test. > > Signed-off-by: Michael S. Tsirkin> --- > tests/libqtest.c | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/tests/libqtest.c b/tests/libqtest.c > index 43fb97e..f869854 100644 > --- a/tests/libqtest.c > +++ b/tests/libqtest.c > @@ -103,8 +103,15 @@ static int socket_accept(int sock) > static void kill_qemu(QTestState *s) > { > if (s->qemu_pid != -1) { > +int wstatus = 0; > +pid_t pid; > + > kill(s->qemu_pid, SIGTERM); > -waitpid(s->qemu_pid, NULL, 0); > +pid = waitpid(s->qemu_pid, , 0); > + > +if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) { > +assert(!WCOREDUMP(wstatus)); > +} > } > } That's basically a good idea ... but I've already seen yet another issue in the past already: QEMU sometimes simply hangs in an endless loop during clean up and never terminates. I think we should detect that situation, too. So instead of killing QEMU at the end of the testing, I think we should rather try to terminate it with the QMP "quit" command. If QEMU does not terminate with an exit code of 0, then the test should be flagged a failed (and only if QEMU did not terminate at all, it should be killed with SIGKILL). Thomas >>> >>> Fine but can we agree to do this as a patch on top? And do you have >>> the time to implement this? >> >> Fine for me if we do that later. And no, I currently don't have time to >> work on this (but I've got it on my TODO list somewhere, so I hope I >> won't forget about it later...). >> >>> I'm seeing patches that cause crash on cleanup, it's not a theoretical >>> problem for me, so I'd like this one to go in first. >> >> Ok, so here are the two problems that I remember: >> >> 1) >> >> git checkout 17bd9597be45b96ae00716b0ae01a4d11bbee1ab~1 >> make -j4 subdir-nios2-softmmu >> nios2-softmmu/qemu-system-nios2 -monitor stdio >> >> ==> You can neither "quit" from the HMP prompt, nor kill QEMU with >> SIGTERM, you've got to use SIGKILL instead. >> >> Ok, libqtest likely would not have reported success in this case, too, >> we just did not notice since there is no libqtest in place that tests >> the nios2 machine in TCG mode. Anyway, it would be nice if qtest would >> properly detect the situation and report an error instead of just >> hanging in waitpid(). > > That's a tall order, it assumes we can define a reasonable > time during which QEMU should exit. Yeah, thinking about this again, and considering the problems that we've seen with overloaded test machines in the past, it's probably not worth the effort to find a reasonable timeout here. And since the problem with an endless loop won't go unnoticed (I originally thought SIGTERM would kill QEMU, so that endless loops would not be noticed here - I was just not aware of the signal handler that we have here in QEMU), let's simply forget about that idea with QMP "quit" + timeout + SIGKILL and continue with your patch instead. Thomas >> 2) >> >> git checkout b39b61e410022f96ceb53d4381d25cba5126ac44~1 >> make -j4 subdir-ppc-softmmu >> ppc-softmmu/qemu-system-ppc -M 40p -monitor stdio >> >> ===> QEMU asserts here with both, HMP "quit" and SIGTERM. This was the >> problem where libqtest did not report an error though it should have >> reported one. So QEMU was not hanging in an endless loop here, but core >> dumped ... Sorry, I apparently mixed this up in my mind with the first >> case. That means we should be fine here with your patch. >> >> Thomas
Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps
On Thu, May 24, 2018 at 05:46:05PM +0200, Thomas Huth wrote: > On 24.05.2018 17:00, Michael S. Tsirkin wrote: > > On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote: > >> On 24.05.2018 16:30, Michael S. Tsirkin wrote: > >>> Right now tests report OK status if QEMU crashes during cleanup. > >>> Let's catch that case and fail the test. > >>> > >>> Signed-off-by: Michael S. Tsirkin> >>> --- > >>> tests/libqtest.c | 9 - > >>> 1 file changed, 8 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/tests/libqtest.c b/tests/libqtest.c > >>> index 43fb97e..f869854 100644 > >>> --- a/tests/libqtest.c > >>> +++ b/tests/libqtest.c > >>> @@ -103,8 +103,15 @@ static int socket_accept(int sock) > >>> static void kill_qemu(QTestState *s) > >>> { > >>> if (s->qemu_pid != -1) { > >>> +int wstatus = 0; > >>> +pid_t pid; > >>> + > >>> kill(s->qemu_pid, SIGTERM); > >>> -waitpid(s->qemu_pid, NULL, 0); > >>> +pid = waitpid(s->qemu_pid, , 0); > >>> + > >>> +if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) { > >>> +assert(!WCOREDUMP(wstatus)); > >>> +} > >>> } > >>> } > >> > >> That's basically a good idea ... but I've already seen yet another issue > >> in the past already: QEMU sometimes simply hangs in an endless loop > >> during clean up and never terminates. I think we should detect that > >> situation, too. So instead of killing QEMU at the end of the testing, I > >> think we should > >> rather try to terminate it with the QMP "quit" command. If QEMU does not > >> terminate with an exit code of 0, then the test should be flagged a > >> failed (and only if QEMU did not terminate at all, it should be killed > >> with SIGKILL). > >> > >> Thomas > > > > Fine but can we agree to do this as a patch on top? And do you have > > the time to implement this? > > Fine for me if we do that later. And no, I currently don't have time to > work on this (but I've got it on my TODO list somewhere, so I hope I > won't forget about it later...). > > > I'm seeing patches that cause crash on cleanup, it's not a theoretical > > problem for me, so I'd like this one to go in first. > > Ok, so here are the two problems that I remember: > > 1) > > git checkout 17bd9597be45b96ae00716b0ae01a4d11bbee1ab~1 > make -j4 subdir-nios2-softmmu > nios2-softmmu/qemu-system-nios2 -monitor stdio > > ==> You can neither "quit" from the HMP prompt, nor kill QEMU with > SIGTERM, you've got to use SIGKILL instead. > > Ok, libqtest likely would not have reported success in this case, too, > we just did not notice since there is no libqtest in place that tests > the nios2 machine in TCG mode. Anyway, it would be nice if qtest would > properly detect the situation and report an error instead of just > hanging in waitpid(). That's a tall order, it assumes we can define a reasonable time during which QEMU should exit. > 2) > > git checkout b39b61e410022f96ceb53d4381d25cba5126ac44~1 > make -j4 subdir-ppc-softmmu > ppc-softmmu/qemu-system-ppc -M 40p -monitor stdio > > ===> QEMU asserts here with both, HMP "quit" and SIGTERM. This was the > problem where libqtest did not report an error though it should have > reported one. So QEMU was not hanging in an endless loop here, but core > dumped ... Sorry, I apparently mixed this up in my mind with the first > case. That means we should be fine here with your patch. > > Thomas
Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps
On 24.05.2018 17:00, Michael S. Tsirkin wrote: > On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote: >> On 24.05.2018 16:30, Michael S. Tsirkin wrote: >>> Right now tests report OK status if QEMU crashes during cleanup. >>> Let's catch that case and fail the test. >>> >>> Signed-off-by: Michael S. Tsirkin>>> --- >>> tests/libqtest.c | 9 - >>> 1 file changed, 8 insertions(+), 1 deletion(-) >>> >>> diff --git a/tests/libqtest.c b/tests/libqtest.c >>> index 43fb97e..f869854 100644 >>> --- a/tests/libqtest.c >>> +++ b/tests/libqtest.c >>> @@ -103,8 +103,15 @@ static int socket_accept(int sock) >>> static void kill_qemu(QTestState *s) >>> { >>> if (s->qemu_pid != -1) { >>> +int wstatus = 0; >>> +pid_t pid; >>> + >>> kill(s->qemu_pid, SIGTERM); >>> -waitpid(s->qemu_pid, NULL, 0); >>> +pid = waitpid(s->qemu_pid, , 0); >>> + >>> +if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) { >>> +assert(!WCOREDUMP(wstatus)); >>> +} >>> } >>> } >> >> That's basically a good idea ... but I've already seen yet another issue >> in the past already: QEMU sometimes simply hangs in an endless loop >> during clean up and never terminates. I think we should detect that >> situation, too. So instead of killing QEMU at the end of the testing, I >> think we should >> rather try to terminate it with the QMP "quit" command. If QEMU does not >> terminate with an exit code of 0, then the test should be flagged a >> failed (and only if QEMU did not terminate at all, it should be killed >> with SIGKILL). >> >> Thomas > > Fine but can we agree to do this as a patch on top? And do you have > the time to implement this? Fine for me if we do that later. And no, I currently don't have time to work on this (but I've got it on my TODO list somewhere, so I hope I won't forget about it later...). > I'm seeing patches that cause crash on cleanup, it's not a theoretical > problem for me, so I'd like this one to go in first. Ok, so here are the two problems that I remember: 1) git checkout 17bd9597be45b96ae00716b0ae01a4d11bbee1ab~1 make -j4 subdir-nios2-softmmu nios2-softmmu/qemu-system-nios2 -monitor stdio ==> You can neither "quit" from the HMP prompt, nor kill QEMU with SIGTERM, you've got to use SIGKILL instead. Ok, libqtest likely would not have reported success in this case, too, we just did not notice since there is no libqtest in place that tests the nios2 machine in TCG mode. Anyway, it would be nice if qtest would properly detect the situation and report an error instead of just hanging in waitpid(). 2) git checkout b39b61e410022f96ceb53d4381d25cba5126ac44~1 make -j4 subdir-ppc-softmmu ppc-softmmu/qemu-system-ppc -M 40p -monitor stdio ===> QEMU asserts here with both, HMP "quit" and SIGTERM. This was the problem where libqtest did not report an error though it should have reported one. So QEMU was not hanging in an endless loop here, but core dumped ... Sorry, I apparently mixed this up in my mind with the first case. That means we should be fine here with your patch. Thomas
Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps
On Thu, May 24, 2018 at 05:04:56PM +0200, Marc-André Lureau wrote: > Hi > > On Thu, May 24, 2018 at 5:00 PM, Michael S. Tsirkinwrote: > > On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote: > >> On 24.05.2018 16:30, Michael S. Tsirkin wrote: > >> > Right now tests report OK status if QEMU crashes during cleanup. > >> > Let's catch that case and fail the test. > >> > > >> > Signed-off-by: Michael S. Tsirkin > >> > --- > >> > tests/libqtest.c | 9 - > >> > 1 file changed, 8 insertions(+), 1 deletion(-) > >> > > >> > diff --git a/tests/libqtest.c b/tests/libqtest.c > >> > index 43fb97e..f869854 100644 > >> > --- a/tests/libqtest.c > >> > +++ b/tests/libqtest.c > >> > @@ -103,8 +103,15 @@ static int socket_accept(int sock) > >> > static void kill_qemu(QTestState *s) > >> > { > >> > if (s->qemu_pid != -1) { > >> > +int wstatus = 0; > >> > +pid_t pid; > >> > + > >> > kill(s->qemu_pid, SIGTERM); > >> > -waitpid(s->qemu_pid, NULL, 0); > >> > +pid = waitpid(s->qemu_pid, , 0); > >> > + > >> > +if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) { > >> > +assert(!WCOREDUMP(wstatus)); > >> > +} > >> > } > >> > } > >> > >> That's basically a good idea ... but I've already seen yet another issue > >> in the past already: QEMU sometimes simply hangs in an endless loop > >> during clean up and never terminates. I think we should detect that > >> situation, too. So instead of killing QEMU at the end of the testing, I > >> think we should > >> rather try to terminate it with the QMP "quit" command. If QEMU does not > >> terminate with an exit code of 0, then the test should be flagged a > >> failed (and only if QEMU did not terminate at all, it should be killed > >> with SIGKILL). > >> > >> Thomas > > > > Fine but can we agree to do this as a patch on top? And do you have > > the time to implement this? > > > > I'm seeing patches that cause crash on cleanup, it's not a theoretical > > problem for me, so I'd like this one to go in first. > > What is the difference between sending "quit" and SIGTERM (qemu has > code to handle it in termsig_handler()). Is this documented somewhere? SIGTERM might wake QEMU if it's blocked somewhere in kernel. Sending quit also tests QMP isn't blocked. > > -- > Marc-André Lureau
Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps
On 24 May 2018 at 15:30, Michael S. Tsirkinwrote: > Right now tests report OK status if QEMU crashes during cleanup. > Let's catch that case and fail the test. > Ha -- I've occasionally wondered why some obvious test failures got reported as OK. Thanks for tracking down what was going on. -- PMM
Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps
Hi On Thu, May 24, 2018 at 5:00 PM, Michael S. Tsirkinwrote: > On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote: >> On 24.05.2018 16:30, Michael S. Tsirkin wrote: >> > Right now tests report OK status if QEMU crashes during cleanup. >> > Let's catch that case and fail the test. >> > >> > Signed-off-by: Michael S. Tsirkin >> > --- >> > tests/libqtest.c | 9 - >> > 1 file changed, 8 insertions(+), 1 deletion(-) >> > >> > diff --git a/tests/libqtest.c b/tests/libqtest.c >> > index 43fb97e..f869854 100644 >> > --- a/tests/libqtest.c >> > +++ b/tests/libqtest.c >> > @@ -103,8 +103,15 @@ static int socket_accept(int sock) >> > static void kill_qemu(QTestState *s) >> > { >> > if (s->qemu_pid != -1) { >> > +int wstatus = 0; >> > +pid_t pid; >> > + >> > kill(s->qemu_pid, SIGTERM); >> > -waitpid(s->qemu_pid, NULL, 0); >> > +pid = waitpid(s->qemu_pid, , 0); >> > + >> > +if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) { >> > +assert(!WCOREDUMP(wstatus)); >> > +} >> > } >> > } >> >> That's basically a good idea ... but I've already seen yet another issue >> in the past already: QEMU sometimes simply hangs in an endless loop >> during clean up and never terminates. I think we should detect that >> situation, too. So instead of killing QEMU at the end of the testing, I >> think we should >> rather try to terminate it with the QMP "quit" command. If QEMU does not >> terminate with an exit code of 0, then the test should be flagged a >> failed (and only if QEMU did not terminate at all, it should be killed >> with SIGKILL). >> >> Thomas > > Fine but can we agree to do this as a patch on top? And do you have > the time to implement this? > > I'm seeing patches that cause crash on cleanup, it's not a theoretical > problem for me, so I'd like this one to go in first. What is the difference between sending "quit" and SIGTERM (qemu has code to handle it in termsig_handler()). Is this documented somewhere? -- Marc-André Lureau
Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps
On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote: > On 24.05.2018 16:30, Michael S. Tsirkin wrote: > > Right now tests report OK status if QEMU crashes during cleanup. > > Let's catch that case and fail the test. > > > > Signed-off-by: Michael S. Tsirkin> > --- > > tests/libqtest.c | 9 - > > 1 file changed, 8 insertions(+), 1 deletion(-) > > > > diff --git a/tests/libqtest.c b/tests/libqtest.c > > index 43fb97e..f869854 100644 > > --- a/tests/libqtest.c > > +++ b/tests/libqtest.c > > @@ -103,8 +103,15 @@ static int socket_accept(int sock) > > static void kill_qemu(QTestState *s) > > { > > if (s->qemu_pid != -1) { > > +int wstatus = 0; > > +pid_t pid; > > + > > kill(s->qemu_pid, SIGTERM); > > -waitpid(s->qemu_pid, NULL, 0); > > +pid = waitpid(s->qemu_pid, , 0); > > + > > +if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) { > > +assert(!WCOREDUMP(wstatus)); > > +} > > } > > } > > That's basically a good idea ... but I've already seen yet another issue > in the past already: QEMU sometimes simply hangs in an endless loop > during clean up and never terminates. I think we should detect that > situation, too. So instead of killing QEMU at the end of the testing, I think > we should > rather try to terminate it with the QMP "quit" command. If QEMU does not > terminate with an exit code of 0, then the test should be flagged a > failed (and only if QEMU did not terminate at all, it should be killed > with SIGKILL). > > Thomas Fine but can we agree to do this as a patch on top? And do you have the time to implement this? I'm seeing patches that cause crash on cleanup, it's not a theoretical problem for me, so I'd like this one to go in first. -- MST
Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps
On 24.05.2018 16:30, Michael S. Tsirkin wrote: > Right now tests report OK status if QEMU crashes during cleanup. > Let's catch that case and fail the test. > > Signed-off-by: Michael S. Tsirkin> --- > tests/libqtest.c | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/tests/libqtest.c b/tests/libqtest.c > index 43fb97e..f869854 100644 > --- a/tests/libqtest.c > +++ b/tests/libqtest.c > @@ -103,8 +103,15 @@ static int socket_accept(int sock) > static void kill_qemu(QTestState *s) > { > if (s->qemu_pid != -1) { > +int wstatus = 0; > +pid_t pid; > + > kill(s->qemu_pid, SIGTERM); > -waitpid(s->qemu_pid, NULL, 0); > +pid = waitpid(s->qemu_pid, , 0); > + > +if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) { > +assert(!WCOREDUMP(wstatus)); > +} > } > } That's basically a good idea ... but I've already seen yet another issue in the past already: QEMU sometimes simply hangs in an endless loop during clean up and never terminates. I think we should detect that situation, too. So instead of killing QEMU at the end of the testing, I think we should rather try to terminate it with the QMP "quit" command. If QEMU does not terminate with an exit code of 0, then the test should be flagged a failed (and only if QEMU did not terminate at all, it should be killed with SIGKILL). Thomas
[Qemu-devel] [PATCH] libqtest: fail if child coredumps
Right now tests report OK status if QEMU crashes during cleanup. Let's catch that case and fail the test. Signed-off-by: Michael S. Tsirkin--- tests/libqtest.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/tests/libqtest.c b/tests/libqtest.c index 43fb97e..f869854 100644 --- a/tests/libqtest.c +++ b/tests/libqtest.c @@ -103,8 +103,15 @@ static int socket_accept(int sock) static void kill_qemu(QTestState *s) { if (s->qemu_pid != -1) { +int wstatus = 0; +pid_t pid; + kill(s->qemu_pid, SIGTERM); -waitpid(s->qemu_pid, NULL, 0); +pid = waitpid(s->qemu_pid, , 0); + +if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) { +assert(!WCOREDUMP(wstatus)); +} } } -- MST