Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps

2018-05-24 Thread Thomas Huth
On 24.05.2018 17:51, Michael S. Tsirkin wrote:
> On Thu, May 24, 2018 at 05:46:05PM +0200, Thomas Huth wrote:
>> On 24.05.2018 17:00, Michael S. Tsirkin wrote:
>>> On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote:
 On 24.05.2018 16:30, Michael S. Tsirkin wrote:
> Right now tests report OK status if QEMU crashes during cleanup.
> Let's catch that case and fail the test.
>
> Signed-off-by: Michael S. Tsirkin 
> ---
>  tests/libqtest.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/tests/libqtest.c b/tests/libqtest.c
> index 43fb97e..f869854 100644
> --- a/tests/libqtest.c
> +++ b/tests/libqtest.c
> @@ -103,8 +103,15 @@ static int socket_accept(int sock)
>  static void kill_qemu(QTestState *s)
>  {
>  if (s->qemu_pid != -1) {
> +int wstatus = 0;
> +pid_t pid;
> +
>  kill(s->qemu_pid, SIGTERM);
> -waitpid(s->qemu_pid, NULL, 0);
> +pid = waitpid(s->qemu_pid, , 0);
> +
> +if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> +assert(!WCOREDUMP(wstatus));
> +}
>  }
>  }

 That's basically a good idea ... but I've already seen yet another issue
 in the past already: QEMU sometimes simply hangs in an endless loop
 during clean up and never terminates. I think we should detect that
 situation, too. So instead of killing QEMU at the end of the testing, I 
 think we should
 rather try to terminate it with the QMP "quit" command. If QEMU does not
 terminate with an exit code of 0, then the test should be flagged a
 failed (and only if QEMU did not terminate at all, it should be killed
 with SIGKILL).

  Thomas
>>>
>>> Fine but can we agree to do this as a patch on top? And do you have
>>> the time to implement this?
>>
>> Fine for me if we do that later. And no, I currently don't have time to
>> work on this (but I've got it on my TODO list somewhere, so I hope I
>> won't forget about it later...).
>>
>>> I'm seeing patches that cause crash on cleanup, it's not a theoretical
>>> problem for me, so I'd like this one to go in first.
>>
>> Ok, so here are the two problems that I remember:
>>
>> 1)
>>
>> git checkout 17bd9597be45b96ae00716b0ae01a4d11bbee1ab~1
>> make -j4 subdir-nios2-softmmu
>> nios2-softmmu/qemu-system-nios2 -monitor stdio
>>
>> ==> You can neither "quit" from the HMP prompt, nor kill QEMU with
>> SIGTERM, you've got to use SIGKILL instead.
>>
>> Ok, libqtest likely would not have reported success in this case, too,
>> we just did not notice since there is no libqtest in place that tests
>> the nios2 machine in TCG mode. Anyway, it would be nice if qtest would
>> properly detect the situation and report an error instead of just
>> hanging in waitpid().
> 
> That's a tall order, it assumes we can define a reasonable
> time during which QEMU should exit.

Yeah, thinking about this again, and considering the problems that we've
seen with overloaded test machines in the past, it's probably not worth
the effort to find a reasonable timeout here.

And since the problem with an endless loop won't go unnoticed (I
originally thought SIGTERM would kill QEMU, so that endless loops would
not be noticed here - I was just not aware of the signal handler that we
have here in QEMU), let's simply forget about that idea with QMP "quit"
+ timeout + SIGKILL and continue with your patch instead.

 Thomas


>> 2)
>>
>> git checkout b39b61e410022f96ceb53d4381d25cba5126ac44~1
>> make -j4 subdir-ppc-softmmu
>> ppc-softmmu/qemu-system-ppc -M 40p -monitor stdio
>>
>> ===> QEMU asserts here with both, HMP "quit" and SIGTERM. This was the
>> problem where libqtest did not report an error though it should have
>> reported one. So QEMU was not hanging in an endless loop here, but core
>> dumped ... Sorry, I apparently mixed this up in my mind with the first
>> case. That means we should be fine here with your patch.
>>
>>  Thomas




Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps

2018-05-24 Thread Michael S. Tsirkin
On Thu, May 24, 2018 at 05:46:05PM +0200, Thomas Huth wrote:
> On 24.05.2018 17:00, Michael S. Tsirkin wrote:
> > On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote:
> >> On 24.05.2018 16:30, Michael S. Tsirkin wrote:
> >>> Right now tests report OK status if QEMU crashes during cleanup.
> >>> Let's catch that case and fail the test.
> >>>
> >>> Signed-off-by: Michael S. Tsirkin 
> >>> ---
> >>>  tests/libqtest.c | 9 -
> >>>  1 file changed, 8 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/tests/libqtest.c b/tests/libqtest.c
> >>> index 43fb97e..f869854 100644
> >>> --- a/tests/libqtest.c
> >>> +++ b/tests/libqtest.c
> >>> @@ -103,8 +103,15 @@ static int socket_accept(int sock)
> >>>  static void kill_qemu(QTestState *s)
> >>>  {
> >>>  if (s->qemu_pid != -1) {
> >>> +int wstatus = 0;
> >>> +pid_t pid;
> >>> +
> >>>  kill(s->qemu_pid, SIGTERM);
> >>> -waitpid(s->qemu_pid, NULL, 0);
> >>> +pid = waitpid(s->qemu_pid, , 0);
> >>> +
> >>> +if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> >>> +assert(!WCOREDUMP(wstatus));
> >>> +}
> >>>  }
> >>>  }
> >>
> >> That's basically a good idea ... but I've already seen yet another issue
> >> in the past already: QEMU sometimes simply hangs in an endless loop
> >> during clean up and never terminates. I think we should detect that
> >> situation, too. So instead of killing QEMU at the end of the testing, I 
> >> think we should
> >> rather try to terminate it with the QMP "quit" command. If QEMU does not
> >> terminate with an exit code of 0, then the test should be flagged a
> >> failed (and only if QEMU did not terminate at all, it should be killed
> >> with SIGKILL).
> >>
> >>  Thomas
> > 
> > Fine but can we agree to do this as a patch on top? And do you have
> > the time to implement this?
> 
> Fine for me if we do that later. And no, I currently don't have time to
> work on this (but I've got it on my TODO list somewhere, so I hope I
> won't forget about it later...).
> 
> > I'm seeing patches that cause crash on cleanup, it's not a theoretical
> > problem for me, so I'd like this one to go in first.
> 
> Ok, so here are the two problems that I remember:
> 
> 1)
> 
> git checkout 17bd9597be45b96ae00716b0ae01a4d11bbee1ab~1
> make -j4 subdir-nios2-softmmu
> nios2-softmmu/qemu-system-nios2 -monitor stdio
> 
> ==> You can neither "quit" from the HMP prompt, nor kill QEMU with
> SIGTERM, you've got to use SIGKILL instead.
> 
> Ok, libqtest likely would not have reported success in this case, too,
> we just did not notice since there is no libqtest in place that tests
> the nios2 machine in TCG mode. Anyway, it would be nice if qtest would
> properly detect the situation and report an error instead of just
> hanging in waitpid().

That's a tall order, it assumes we can define a reasonable
time during which QEMU should exit.

> 2)
> 
> git checkout b39b61e410022f96ceb53d4381d25cba5126ac44~1
> make -j4 subdir-ppc-softmmu
> ppc-softmmu/qemu-system-ppc -M 40p -monitor stdio
> 
> ===> QEMU asserts here with both, HMP "quit" and SIGTERM. This was the
> problem where libqtest did not report an error though it should have
> reported one. So QEMU was not hanging in an endless loop here, but core
> dumped ... Sorry, I apparently mixed this up in my mind with the first
> case. That means we should be fine here with your patch.
> 
>  Thomas



Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps

2018-05-24 Thread Thomas Huth
On 24.05.2018 17:00, Michael S. Tsirkin wrote:
> On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote:
>> On 24.05.2018 16:30, Michael S. Tsirkin wrote:
>>> Right now tests report OK status if QEMU crashes during cleanup.
>>> Let's catch that case and fail the test.
>>>
>>> Signed-off-by: Michael S. Tsirkin 
>>> ---
>>>  tests/libqtest.c | 9 -
>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tests/libqtest.c b/tests/libqtest.c
>>> index 43fb97e..f869854 100644
>>> --- a/tests/libqtest.c
>>> +++ b/tests/libqtest.c
>>> @@ -103,8 +103,15 @@ static int socket_accept(int sock)
>>>  static void kill_qemu(QTestState *s)
>>>  {
>>>  if (s->qemu_pid != -1) {
>>> +int wstatus = 0;
>>> +pid_t pid;
>>> +
>>>  kill(s->qemu_pid, SIGTERM);
>>> -waitpid(s->qemu_pid, NULL, 0);
>>> +pid = waitpid(s->qemu_pid, , 0);
>>> +
>>> +if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
>>> +assert(!WCOREDUMP(wstatus));
>>> +}
>>>  }
>>>  }
>>
>> That's basically a good idea ... but I've already seen yet another issue
>> in the past already: QEMU sometimes simply hangs in an endless loop
>> during clean up and never terminates. I think we should detect that
>> situation, too. So instead of killing QEMU at the end of the testing, I 
>> think we should
>> rather try to terminate it with the QMP "quit" command. If QEMU does not
>> terminate with an exit code of 0, then the test should be flagged a
>> failed (and only if QEMU did not terminate at all, it should be killed
>> with SIGKILL).
>>
>>  Thomas
> 
> Fine but can we agree to do this as a patch on top? And do you have
> the time to implement this?

Fine for me if we do that later. And no, I currently don't have time to
work on this (but I've got it on my TODO list somewhere, so I hope I
won't forget about it later...).

> I'm seeing patches that cause crash on cleanup, it's not a theoretical
> problem for me, so I'd like this one to go in first.

Ok, so here are the two problems that I remember:

1)

git checkout 17bd9597be45b96ae00716b0ae01a4d11bbee1ab~1
make -j4 subdir-nios2-softmmu
nios2-softmmu/qemu-system-nios2 -monitor stdio

==> You can neither "quit" from the HMP prompt, nor kill QEMU with
SIGTERM, you've got to use SIGKILL instead.

Ok, libqtest likely would not have reported success in this case, too,
we just did not notice since there is no libqtest in place that tests
the nios2 machine in TCG mode. Anyway, it would be nice if qtest would
properly detect the situation and report an error instead of just
hanging in waitpid().

2)

git checkout b39b61e410022f96ceb53d4381d25cba5126ac44~1
make -j4 subdir-ppc-softmmu
ppc-softmmu/qemu-system-ppc -M 40p -monitor stdio

===> QEMU asserts here with both, HMP "quit" and SIGTERM. This was the
problem where libqtest did not report an error though it should have
reported one. So QEMU was not hanging in an endless loop here, but core
dumped ... Sorry, I apparently mixed this up in my mind with the first
case. That means we should be fine here with your patch.

 Thomas



Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps

2018-05-24 Thread Michael S. Tsirkin
On Thu, May 24, 2018 at 05:04:56PM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Thu, May 24, 2018 at 5:00 PM, Michael S. Tsirkin  wrote:
> > On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote:
> >> On 24.05.2018 16:30, Michael S. Tsirkin wrote:
> >> > Right now tests report OK status if QEMU crashes during cleanup.
> >> > Let's catch that case and fail the test.
> >> >
> >> > Signed-off-by: Michael S. Tsirkin 
> >> > ---
> >> >  tests/libqtest.c | 9 -
> >> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/tests/libqtest.c b/tests/libqtest.c
> >> > index 43fb97e..f869854 100644
> >> > --- a/tests/libqtest.c
> >> > +++ b/tests/libqtest.c
> >> > @@ -103,8 +103,15 @@ static int socket_accept(int sock)
> >> >  static void kill_qemu(QTestState *s)
> >> >  {
> >> >  if (s->qemu_pid != -1) {
> >> > +int wstatus = 0;
> >> > +pid_t pid;
> >> > +
> >> >  kill(s->qemu_pid, SIGTERM);
> >> > -waitpid(s->qemu_pid, NULL, 0);
> >> > +pid = waitpid(s->qemu_pid, , 0);
> >> > +
> >> > +if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> >> > +assert(!WCOREDUMP(wstatus));
> >> > +}
> >> >  }
> >> >  }
> >>
> >> That's basically a good idea ... but I've already seen yet another issue
> >> in the past already: QEMU sometimes simply hangs in an endless loop
> >> during clean up and never terminates. I think we should detect that
> >> situation, too. So instead of killing QEMU at the end of the testing, I 
> >> think we should
> >> rather try to terminate it with the QMP "quit" command. If QEMU does not
> >> terminate with an exit code of 0, then the test should be flagged a
> >> failed (and only if QEMU did not terminate at all, it should be killed
> >> with SIGKILL).
> >>
> >>  Thomas
> >
> > Fine but can we agree to do this as a patch on top? And do you have
> > the time to implement this?
> >
> > I'm seeing patches that cause crash on cleanup, it's not a theoretical
> > problem for me, so I'd like this one to go in first.
> 
> What is the difference between sending "quit" and SIGTERM (qemu has
> code to handle it in termsig_handler()). Is this documented somewhere?

SIGTERM might wake QEMU if it's blocked somewhere in kernel.
Sending quit also tests QMP isn't blocked.

> 
> -- 
> Marc-André Lureau



Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps

2018-05-24 Thread Peter Maydell
On 24 May 2018 at 15:30, Michael S. Tsirkin  wrote:
> Right now tests report OK status if QEMU crashes during cleanup.
> Let's catch that case and fail the test.
>

Ha -- I've occasionally wondered why some obvious
test failures got reported as OK. Thanks for tracking
down what was going on.

-- PMM



Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps

2018-05-24 Thread Marc-André Lureau
Hi

On Thu, May 24, 2018 at 5:00 PM, Michael S. Tsirkin  wrote:
> On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote:
>> On 24.05.2018 16:30, Michael S. Tsirkin wrote:
>> > Right now tests report OK status if QEMU crashes during cleanup.
>> > Let's catch that case and fail the test.
>> >
>> > Signed-off-by: Michael S. Tsirkin 
>> > ---
>> >  tests/libqtest.c | 9 -
>> >  1 file changed, 8 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/tests/libqtest.c b/tests/libqtest.c
>> > index 43fb97e..f869854 100644
>> > --- a/tests/libqtest.c
>> > +++ b/tests/libqtest.c
>> > @@ -103,8 +103,15 @@ static int socket_accept(int sock)
>> >  static void kill_qemu(QTestState *s)
>> >  {
>> >  if (s->qemu_pid != -1) {
>> > +int wstatus = 0;
>> > +pid_t pid;
>> > +
>> >  kill(s->qemu_pid, SIGTERM);
>> > -waitpid(s->qemu_pid, NULL, 0);
>> > +pid = waitpid(s->qemu_pid, , 0);
>> > +
>> > +if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
>> > +assert(!WCOREDUMP(wstatus));
>> > +}
>> >  }
>> >  }
>>
>> That's basically a good idea ... but I've already seen yet another issue
>> in the past already: QEMU sometimes simply hangs in an endless loop
>> during clean up and never terminates. I think we should detect that
>> situation, too. So instead of killing QEMU at the end of the testing, I 
>> think we should
>> rather try to terminate it with the QMP "quit" command. If QEMU does not
>> terminate with an exit code of 0, then the test should be flagged a
>> failed (and only if QEMU did not terminate at all, it should be killed
>> with SIGKILL).
>>
>>  Thomas
>
> Fine but can we agree to do this as a patch on top? And do you have
> the time to implement this?
>
> I'm seeing patches that cause crash on cleanup, it's not a theoretical
> problem for me, so I'd like this one to go in first.

What is the difference between sending "quit" and SIGTERM (qemu has
code to handle it in termsig_handler()). Is this documented somewhere?


-- 
Marc-André Lureau



Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps

2018-05-24 Thread Michael S. Tsirkin
On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote:
> On 24.05.2018 16:30, Michael S. Tsirkin wrote:
> > Right now tests report OK status if QEMU crashes during cleanup.
> > Let's catch that case and fail the test.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >  tests/libqtest.c | 9 -
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tests/libqtest.c b/tests/libqtest.c
> > index 43fb97e..f869854 100644
> > --- a/tests/libqtest.c
> > +++ b/tests/libqtest.c
> > @@ -103,8 +103,15 @@ static int socket_accept(int sock)
> >  static void kill_qemu(QTestState *s)
> >  {
> >  if (s->qemu_pid != -1) {
> > +int wstatus = 0;
> > +pid_t pid;
> > +
> >  kill(s->qemu_pid, SIGTERM);
> > -waitpid(s->qemu_pid, NULL, 0);
> > +pid = waitpid(s->qemu_pid, , 0);
> > +
> > +if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> > +assert(!WCOREDUMP(wstatus));
> > +}
> >  }
> >  }
> 
> That's basically a good idea ... but I've already seen yet another issue
> in the past already: QEMU sometimes simply hangs in an endless loop
> during clean up and never terminates. I think we should detect that
> situation, too. So instead of killing QEMU at the end of the testing, I think 
> we should
> rather try to terminate it with the QMP "quit" command. If QEMU does not
> terminate with an exit code of 0, then the test should be flagged a
> failed (and only if QEMU did not terminate at all, it should be killed
> with SIGKILL).
> 
>  Thomas

Fine but can we agree to do this as a patch on top? And do you have
the time to implement this?

I'm seeing patches that cause crash on cleanup, it's not a theoretical
problem for me, so I'd like this one to go in first.

-- 
MST



Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps

2018-05-24 Thread Thomas Huth
On 24.05.2018 16:30, Michael S. Tsirkin wrote:
> Right now tests report OK status if QEMU crashes during cleanup.
> Let's catch that case and fail the test.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  tests/libqtest.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/libqtest.c b/tests/libqtest.c
> index 43fb97e..f869854 100644
> --- a/tests/libqtest.c
> +++ b/tests/libqtest.c
> @@ -103,8 +103,15 @@ static int socket_accept(int sock)
>  static void kill_qemu(QTestState *s)
>  {
>  if (s->qemu_pid != -1) {
> +int wstatus = 0;
> +pid_t pid;
> +
>  kill(s->qemu_pid, SIGTERM);
> -waitpid(s->qemu_pid, NULL, 0);
> +pid = waitpid(s->qemu_pid, , 0);
> +
> +if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> +assert(!WCOREDUMP(wstatus));
> +}
>  }
>  }

That's basically a good idea ... but I've already seen yet another issue
in the past already: QEMU sometimes simply hangs in an endless loop
during clean up and never terminates. I think we should detect that
situation, too.

So instead of killing QEMU at the end of the testing, I think we should
rather try to terminate it with the QMP "quit" command. If QEMU does not
terminate with an exit code of 0, then the test should be flagged a
failed (and only if QEMU did not terminate at all, it should be killed
with SIGKILL).

 Thomas



[Qemu-devel] [PATCH] libqtest: fail if child coredumps

2018-05-24 Thread Michael S. Tsirkin
Right now tests report OK status if QEMU crashes during cleanup.
Let's catch that case and fail the test.

Signed-off-by: Michael S. Tsirkin 
---
 tests/libqtest.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tests/libqtest.c b/tests/libqtest.c
index 43fb97e..f869854 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -103,8 +103,15 @@ static int socket_accept(int sock)
 static void kill_qemu(QTestState *s)
 {
 if (s->qemu_pid != -1) {
+int wstatus = 0;
+pid_t pid;
+
 kill(s->qemu_pid, SIGTERM);
-waitpid(s->qemu_pid, NULL, 0);
+pid = waitpid(s->qemu_pid, , 0);
+
+if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
+assert(!WCOREDUMP(wstatus));
+}
 }
 }
 
-- 
MST