Hi,

On last week's call we discussed splitting non-core features of QEMU into their own process to reduce security risks, etc.
I wrote up a summary of my thoughts on this to try to cover the various issues. Feedback welcome and hopefully we can continue the discussion on a future call - maybe next week?

I would like to be part of the discussion, but it's a public holiday here on March 1st, so I won't be on that call.

Cheers,
Jes

Separating host-side virtagent and other tasks from core QEMU
=============================================================

To improve auditing of the core QEMU code, it would be ideal to be able to separate the core QEMU functionality from utility functionality by moving the utility functionality into its own process. This process will be referred to as the QEMU client below.

Components which are candidates for moving out of QEMU include:
 - virtagent
 - vnc server (and other graphical displays such as SDL, spice and curses)
 - human monitor

The idea is to have QEMU launch as a daemon, and then allow one or more client processes to connect to it. These will then offer the various services.

The main issue to discuss is how to handle various state information, reconnects, and migration.

Security
========

The primary reason for this discussion is security; however, there are other benefits from this split, as will be mentioned below.

During a demo of virtagent I hit a case where a bug in the agent command handling code caused a crash of the host QEMU process. While it is probably a simple bug, it shows how adding more complexity to the QEMU process increases the risk of adding security problems that could potentially be exploited by a hostile guest.

By splitting non-core functionality into a QEMU client process, the host process will be isolated from a large number of potential security problems, i.e. if a client process is killed or crashes, it should not affect the main QEMU process. In addition, it makes it easier to audit the core QEMU functionality.

virtagent
=========

In short, virtagent provides a set of simple commands, most of which do not have state associated with them. These include shutdown, ping, fsfreeze/fsthaw, etc. Other commands might be multi-stage commands which could fail if the client is disconnected from the daemon while the command is in progress. These include copy-paste and file copy.

vnc server
==========

The vnc server simply needs a connection to the video memory of the QEMU process, video mode state, as well as channels for sending keyboard and mouse events. It is stateless by nature and supports reconnects. This applies to the other graphical display engines (SDL, spice, and curses) as well.

Human monitor
=============

The human monitor is effectively stateless. It issues commands and prints the result. There is no state in the monitor and it can be built directly on top of QMP. An additional benefit here is that it would allow for multiple monitors.

Disconnects
===========

A client process may get killed or disconnected from the QEMU process, in which case it must be possible to launch a new client that connects to the QEMU process. For this, commands need to be provided that allow the client process to query the QEMU process and virtagent for current state information.

In-progress commands, such as copy-paste and file copy, may fail and will need to be re-run. However, neither of these is a vital command, and having to re-run them is acceptable behavior.

Migration
=========

Given that migration moves the guest to a new QEMU process, normally on a different host, any connection from management tools to the monitor, QMP sockets, virtio-serial, etc. has to be re-established through the new QEMU process.

Stateless connections, such as the monitor, QMP and the vnc server, handle reconnects without problems, and should not have any issues during migration that are different from the issues in the current implementation.

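
As a rough illustration of why stateless clients cope with disconnects and migration, here is a minimal Python sketch of a client that reconnects and then re-queries everything it needs. Nothing in it is an existing QEMU interface: connect and query_state are hypothetical callables standing in for whatever transport and state-query commands the daemon would end up providing.

```python
import time


def reconnect_and_resync(connect, query_state, attempts=5, delay=0.0):
    """Reconnect to the QEMU daemon, then fetch its current state again.

    `connect` and `query_state` are hypothetical placeholders, not real
    QEMU calls. `connect` is expected to raise ConnectionError on failure.
    """
    last_error = None
    for _ in range(attempts):
        try:
            conn = connect()
        except ConnectionError as exc:
            last_error = exc
            time.sleep(delay)
            continue
        # A stateless client keeps nothing from before the disconnect,
        # so it asks the daemon for everything it needs once more.
        return conn, query_state(conn)
    raise ConnectionError("daemon not reachable") from last_error
```

The point is only that the client carries no state across the reconnect; whether the new connection goes to the same QEMU process or to the one on the migration target makes no difference to it.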
The main issues causing problems are stateful events such as copy-paste, which is handled via a guest agent. The copy-paste problem can be handled by blocking copy-paste operations during migration, until the guest agent is reachable through the QEMU process on the new host. This does mean that copy-paste can block or fail temporarily (depending on whether it is implemented to return -EBUSY or to simply block); however, it is not a mission-critical feature, and it can also block or fail temporarily during normal operation on a non-virtual system.

By its nature, a file copy via a guest agent that has not completed when migration starts is going to fail, and will have to be restarted after migration has completed. This is again a non-critical feature and requiring a restart is acceptable.
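
To make the two copy-paste options above concrete (fail with -EBUSY versus block), here is a small Python sketch. It is only an illustration under my own assumptions: the CopyPasteGate class and its method names are made up for this mail and do not correspond to any existing QEMU or virtagent code.

```python
import errno
import threading


class CopyPasteGate:
    """Hypothetical gate that holds back copy-paste during migration."""

    def __init__(self):
        # Set while the guest agent is reachable, cleared during migration.
        self._agent_reachable = threading.Event()
        self._agent_reachable.set()

    def migration_started(self):
        self._agent_reachable.clear()

    def agent_reconnected(self):
        # Called once the agent is reachable through QEMU on the new host.
        self._agent_reachable.set()

    def try_copy_paste(self, operation):
        """Failing variant: return -EBUSY right away during migration."""
        if not self._agent_reachable.is_set():
            return -errno.EBUSY
        operation()
        return 0

    def copy_paste_blocking(self, operation, timeout=None):
        """Blocking variant: wait for the agent, give up after `timeout`."""
        if not self._agent_reachable.wait(timeout):
            return -errno.EBUSY
        operation()
        return 0
```

Either variant keeps the operation from being started while the agent cannot be reached; the choice only affects whether the caller sees a temporary error or a temporary stall.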