Re: qemu-guest agent asserts on shutdown

2020-06-04 Thread Marc-André Lureau
Hi

On Thu, Jun 4, 2020 at 10:54 AM Christian Ehrhardt wrote:
>
> Hi,
> while debugging a report I got in Ubuntu, I found that since qemu 4.0 the
> guest agent shutdown feature works (the guest shuts down), but the agent
> crashes every time it does so. This can be a big red herring when debugging
> other things, and people start to get "an application crashed, do you want
> to report" pop-ups if they have set up automatic crash reports.
>
> If you start the guest again after the shutdown and check the guest-agent
> status, you will see in the journal:
> -- Logs begin at Tue 2020-06-02 07:41:32 UTC, end at Thu 2020-06-04 08:07:37 UTC. --
> Jun 02 07:47:58 focal systemd[1]: Started QEMU Guest Agent.
> Jun 02 07:49:03 focal qemu-ga[1984]: info: guest-shutdown called, mode: (null)
> Jun 02 07:49:03 focal qemu-ga[1984]: **
> Jun 02 07:49:03 focal qemu-ga[1984]: ERROR:/build/qemu-7aKH5L/qemu-4.2/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
> Jun 02 07:49:03 focal qemu-ga[1984]: Bail out! ERROR:/build/qemu-7aKH5L/qemu-4.2/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
> Jun 02 07:49:04 focal systemd[1]: Stopping QEMU Guest Agent...
> Jun 02 07:49:04 focal systemd[1]: qemu-guest-agent.service: Succeeded.
> Jun 02 07:49:04 focal systemd[1]: Stopped QEMU Guest Agent.
>
> The actual assert is from "forever" [3] (v0.15) which is the initial addition 
> of qemu guest agent in 2011. That was later restructured in [1] (v1.1) and 
> [2] (v4.0).
>
> In a check through Ubuntu releases (Q = qemu, L = libvirt, G = guest agent) I got
> 1) Host: Q 2.11 L 4.0 (Bionic) - G 2.11 (Bionic)
> 2) Host: Q 4.0 L 5.4 (Eoan) - G 2.11 (Bionic)
> 3) Host: Q 4.2 L 6.0 (Focal) - G 2.11 (Bionic)
> 4) Host: Q 2.11 L 4.0 (Bionic) - G 4.0 (Eoan)
> 5) Host: Q 4.0 L 5.4 (Eoan) - G 4.0 (Eoan)
> 6) Host: Q 4.2 L 6.0 (Focal) - G 4.0 (Eoan)
> 7) Host: Q 2.11 L 4.0 (Bionic) - G 4.2 (Focal)
> 8) Host: Q 4.0 L 5.4 (Eoan) - G 4.2 (Focal)
> 9) Host: Q 4.2 L 6.0 (Focal) - G 4.2 (Focal)
>
> So the problem seems to be in the qemu-guest-agent itself, in versions >= 4.0.
> I did a build with [2] reverted and the crash is gone.
>
> I see from the host:
> $ virsh qemu-agent-command focal '{"execute": "guest-shutdown"}'
> "error: Guest agent is not responding: Guest agent disappeared while 
> executing command"
>
> I'm not sure which part of the communication breaks first, but it could be
> trying to send on a dying connection. The old code had:
>
>     rsp = qmp_dispatch(&ga_commands, QOBJECT(req), false);
>     if (rsp) {
>         ret = send_response(s, rsp);
>         ...
>     }
>
> While the new code is like:
>
>     rsp = qmp_dispatch(&ga_commands, obj, false);
> end:
>     ret = send_response(s, rsp);
>
> Maybe it now runs send_response() even when qmp_dispatch() fails?
>
> I didn't stare at it long enough to have a solution yet, but wanted to make 
> the maintainer of qga and the Author aware.
>

My bad, "guest-shutdown" is a QCO_NO_SUCCESS_RESP command, and in this
case qmp_dispatch() returns NULL. I'll send a patch.
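
The failure mode can be modelled outside of QEMU. The following is only a minimal sketch, not the actual patch: Response, AgentState, model_dispatch(), model_send_response() and process_command() are hypothetical stand-ins for the real qga types and functions. It shows a dispatcher that returns NULL for a command with no success response, and a call site that skips sending in that case instead of tripping the assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the real qga types; not QEMU's definitions. */
typedef struct { const char *text; } Response;
typedef struct { bool channel_open; int sent; } AgentState;

static Response ok_response = { "{\"return\": {}}" };

/* Models qmp_dispatch(): a command without a success response
 * (guest-shutdown in this thread) yields NULL when it succeeds. */
static Response *model_dispatch(const char *command)
{
    if (strcmp(command, "guest-shutdown") == 0) {
        return NULL;
    }
    return &ok_response;
}

/* Models send_response(), keeping the precondition seen in the log. */
static int model_send_response(AgentState *s, const Response *rsp)
{
    assert(rsp && s->channel_open);
    s->sent++;
    return 0;
}

/* Guarded call site: nothing to send is not an error, so return early. */
static int process_command(AgentState *s, const char *command)
{
    Response *rsp = model_dispatch(command);
    if (rsp == NULL) {
        return 0;
    }
    return model_send_response(s, rsp);
}
```

With an AgentState whose channel is open, process_command(&s, "guest-shutdown") returns 0 without sending anything, while a command that does produce a response (for example "guest-ping") still goes through model_send_response(). Dropping the NULL check reproduces the assertion failure from the journal.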

thanks


> [1]: https://git.qemu.org/?p=qemu.git;a=commit;h=125b310e1d62
> [2]: https://git.qemu.org/?p=qemu.git;a=commit;h=781f2b3d1e5e
> [3]: https://git.qemu.org/?p=qemu.git;a=commit;h=48ff7a625b36
>
> --
> Christian Ehrhardt
> Staff Engineer, Ubuntu Server
> Canonical Ltd




qemu-guest agent asserts on shutdown

2020-06-04 Thread Christian Ehrhardt
Hi,
while debugging a report I got in Ubuntu, I found that since qemu 4.0 the
guest agent shutdown feature works (the guest shuts down), but the agent
crashes every time it does so. This can be a big red herring when debugging
other things, and people start to get "an application crashed, do you want
to report" pop-ups if they have set up automatic crash reports.

If you start the guest again after the shutdown and check the guest-agent
status, you will see in the journal:
-- Logs begin at Tue 2020-06-02 07:41:32 UTC, end at Thu 2020-06-04 08:07:37 UTC. --
Jun 02 07:47:58 focal systemd[1]: Started QEMU Guest Agent.
Jun 02 07:49:03 focal qemu-ga[1984]: info: guest-shutdown called, mode: (null)
Jun 02 07:49:03 focal qemu-ga[1984]: **
Jun 02 07:49:03 focal qemu-ga[1984]: ERROR:/build/qemu-7aKH5L/qemu-4.2/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
Jun 02 07:49:03 focal qemu-ga[1984]: Bail out! ERROR:/build/qemu-7aKH5L/qemu-4.2/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
Jun 02 07:49:04 focal systemd[1]: Stopping QEMU Guest Agent...
Jun 02 07:49:04 focal systemd[1]: qemu-guest-agent.service: Succeeded.
Jun 02 07:49:04 focal systemd[1]: Stopped QEMU Guest Agent.

The actual assert is from "forever" [3] (v0.15) which is the initial
addition of qemu guest agent in 2011. That was later restructured in [1]
(v1.1) and [2] (v4.0).

In a check through Ubuntu releases (Q = qemu, L = libvirt, G = guest agent) I got
1) Host: Q 2.11 L 4.0 (Bionic) - G 2.11 (Bionic)
2) Host: Q 4.0 L 5.4 (Eoan) - G 2.11 (Bionic)
3) Host: Q 4.2 L 6.0 (Focal) - G 2.11 (Bionic)
4) Host: Q 2.11 L 4.0 (Bionic) - G 4.0 (Eoan)
5) Host: Q 4.0 L 5.4 (Eoan) - G 4.0 (Eoan)
6) Host: Q 4.2 L 6.0 (Focal) - G 4.0 (Eoan)
7) Host: Q 2.11 L 4.0 (Bionic) - G 4.2 (Focal)
8) Host: Q 4.0 L 5.4 (Eoan) - G 4.2 (Focal)
9) Host: Q 4.2 L 6.0 (Focal) - G 4.2 (Focal)

So the problem seems to be in the qemu-guest-agent itself, in versions >= 4.0.
I did a build with [2] reverted and the crash is gone.

I see from the host:
$ virsh qemu-agent-command focal '{"execute": "guest-shutdown"}'
"error: Guest agent is not responding: Guest agent disappeared while
executing command"

I'm not sure which part of the communication breaks first, but it could be
trying to send on a dying connection. The old code had:

    rsp = qmp_dispatch(&ga_commands, QOBJECT(req), false);
    if (rsp) {
        ret = send_response(s, rsp);
        ...
    }

While the new code is like:

    rsp = qmp_dispatch(&ga_commands, obj, false);
end:
    ret = send_response(s, rsp);

Maybe it now runs send_response() even when qmp_dispatch() fails?

I didn't stare at it long enough to have a solution yet, but wanted to make
the maintainer of qga and the Author aware.

[1]: https://git.qemu.org/?p=qemu.git;a=commit;h=125b310e1d62
[2]: https://git.qemu.org/?p=qemu.git;a=commit;h=781f2b3d1e5e
[3]: https://git.qemu.org/?p=qemu.git;a=commit;h=48ff7a625b36

-- 
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd