Hi On Tue, Nov 8, 2016 at 3:35 PM Wei Wang <wei.w.w...@intel.com> wrote:
On 11/08/2016 03:47 PM, Marc-André Lureau wrote: > Hi > > I suggest you split this patch for the various "features" you propose. OK. I'll make it several small patches. <*v1-AR1*> > > Master and slave can be either a client (i.e. connecting) or > server (listening) > in the socket communication. > "Client" and "Server" have already been used in the doc here. > > +The current vhost-user protocol is extended to support the > vhost-pci based inter-VM > +communication. In this case, Slave is a QEMU which runs a > vhost-pci server, and > +Master is another QEMU which runs a vhost-pci client. > + > > > > Why introduce new terminology "server" and "client"? What does it > change? This is confusing with socket client/server configuration. OK. I will try with "Slave" and "Master" in this doc when it's possible. <*v1-AR2*> > > Message Specification > --------------------- > > Note that all numbers are in the machine native byte order. A > vhost-user message > -consists of 3 header fields and a payload: > +consists of 4 header fields and a payload: > > ------------------------------------- > -| request | flags | size | payload | > ------------------------------------- > +---------------------------------------------- > +| request | flags | conn_id | size | payload | > +---------------------------------------------- > > * Request: 32-bit type of the request > * Flags: 32-bit bit field: > - Lower 2 bits are the version (currently 0x01) > - - Bit 2 is the reply flag - needs to be sent on each reply > from the slave > + - Bit 2 is the reply flag - needs to be sent on each reply > - Bit 3 is the need_reply flag - see > VHOST_USER_PROTOCOL_F_REPLY_ACK for > details. > + * Conn_id: 64-bit connection id to indentify a client socket > connection. It is > + introduced in version 0x02 to support the > "1-server-N-client" model > + and an asynchronous client read implementation. The > connection id, > + 0xFFFFFFFFFFFFFFFF, is used by an anonymous client > (e.g. a client who > + has not got its connection id from the server in the > initial talk) > > > I don't understand why you need a connection id, on each message. > What's the purpose? Since the communication is unicast, a single > message should be enough. Sure, please let me explain more: The QEMU socket is going to be upgraded to support 1 server socket being connected by multiple client sockets (I've made patches to achieve this). In other words, here, multiple masters will connect to one slave, and the slave creates a vhost-pci device for each master after receiving the necessary message info. The slave needs to know which master it is talking to when receiving a message, as it maintains multiple connections at the same time. You should be able to identify each connection in the slave (as a socket server), without a need for connection id: connected sockets are independent from each others. Also shed some light on the implementation: The slave maintains a table for those masters. Each master has an entry in the table (indexed by a "conn_id"). When the slave receives a message, the payload is added to the corresponding table entry. When things are ready (i.e. it has received enough info to create a vhost-pci device for the master), the device creation code creates and initializes a vhost-pci device (e.g. initialize VirtioPCIProxy in virtio-pci.c) from the corresponding table entry. > > * Size - 32-bit size of the payload > > > @@ -97,6 +106,13 @@ Depending on the request type, payload can be: > log offset: offset from start of supplied file descriptor > where logging starts (i.e. where guest address 0 would be > logged) > > +* Device info > + -------------------- > + | virito id | uuid | > + -------------------- > + Virtio id: 16-bit virtio id of the device > + UUID: 128-bit UUID to identify the QEMU instance that creates > the device > + > > > I wonder if UUID should be a different message. > We can make uuid another message if it has other usages. Do you see any other usages of uuid? Allows to associate data/configuration with a particular VM, in a multi-master/single-slave scenario. But tbh, I don't see how this is necessary, I can imagine solving this differently (having different connection address per vm for ex). I would like to understand your use case. > > In QEMU the vhost-user message is implemented with the following > struct: > > typedef struct VhostUserMsg { > @@ -109,6 +125,7 @@ typedef struct VhostUserMsg { > struct vhost_vring_addr addr; > VhostUserMemory memory; > VhostUserLog log; > + DeviceInfo dev_info; > }; > } QEMU_PACKED VhostUserMsg; > > @@ -119,17 +136,25 @@ The protocol for vhost-user is based on the > existing implementation of vhost > for the Linux Kernel. Most messages that can be sent via the Unix > domain socket > implementing vhost-user have an equivalent ioctl to the kernel > implementation. > > -The communication consists of master sending message requests and > slave sending > -message replies. Most of the requests don't require replies. Here > is a list of > -the ones that do: > +Traditionally, the communication consists of master sending > message requests > +and slave sending message replies. Most of the requests don't > require replies. > +Here is a list of the ones that do: > > * VHOST_GET_FEATURES > * VHOST_GET_PROTOCOL_FEATURES > * VHOST_GET_VRING_BASE > * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) > + * VHOST_USER_GET_CONN_ID > + * VHOST_USER_SET_PEER_CONNECTION > > Let's also fix the VHOST_USER prefix of the above requests. > Sure, will do. <*v1-AR3*> > [ Also see the section on REPLY_ACK protocol extension. ] > > +Currently, the communication also supports the Slave (server) > sending messages > +to the Master (client). Here is a list of them: > + * VHOST_USER_SET_FEATURES > > + * VHOST_USER_SET_PEER_CONNECTION (the serve may actively request > to disconnect > + with the client) > > > Oh, you are making the communication bidirectional? This is a > fundamental change in the protocol. This may be difficult to implement > in qemu, since the communication in synchronous, a request expects an > immediate reply, if it gets back a request (from the slave) in the > middle, it will fail. > Not really. Adding the above two doesn't affect the existing synchronous read() messages (basically, those VHOST_USER_GET_xx messages). Like VHOST_USER_SET_FEATURES, the _SET_ messages don't need a reply. Here, we just make the slave capable of actively sending messages to the master. Yes, that's the trouble. At any time the Master may send a request and expects an immediate reply. There is a race of getting a request from the Slave in the middle with your proposed change. I'd rather avoid making the request bidirectionnal if possible. (I proposed a second channel for Slave->Master request in the past: https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html) > Currently all requests (including VHOST_USER_SET_FEATURES) are coming > from the Master. I don't understand yet the purpose of > VHOST_USER_SET_PEER_CONNECTION to propose an alternative, but I would > rather keep the unidirectional communication if possible. > > There are several messages that the master sends with file > descriptors passed > in the ancillary data: > > @@ -259,6 +284,7 @@ Protocol features > #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1 > #define VHOST_USER_PROTOCOL_F_RARP 2 > #define VHOST_USER_PROTOCOL_F_REPLY_ACK 3 > +#define VHOST_USER_PROTOCOL_F_VHOST_PCI 4 > > Message types > ------------- > @@ -470,6 +496,43 @@ Message types > The first 6 bytes of the payload contain the mac address of > the guest to > allow the vhost user backend to construct and broadcast the > fake RARP. > > + * VHOST_USER_GET_CONN_ID > + > + Id: 20 > + Equivalent ioctl: N/A > + Master payload: u64 > + > + The client sends this message to the server to ask for its > connection id. > > > Confusing, please keep the Master/Slave terminology OK. > > + The connection id is then put into the message header (the > conn_id field), > + so that the server can always know who it is talking with. > + > > > Could you explain what the connection id is for? Explained above. Please let me know if I didn't make it clear. > > +This request should be sent only when VHOST_USER_PROTOCOL_F_VHOST_PCI > has... > > +* VHOST_USER_SET_DEV_INFO > + > + Id: 21 > + Equivalent ioctl: N/A > + Master payload: dev info > + > + The client sends the producer device info to the server. > > > "Master sends producer device info to the Slave" works, no? Yes, it works. The current dev info only contains a "virtio id" field (assume we'll take uuid out as a separate message), which tells the slave if it is a net, scsi, console or else. do you see any issue? > > Could we guarantee this message is sent before SET_VRING*? Why do we need to guarantee this? It would simplify the protocol to have expectations on when messages come. In particular, an early message with devinfo would allow to check/pre-configure the Slave for a particular device. Also VHOST_USER_SET_DEV_INFO should probably be unique (don't allow a device to be reconfigured) > > + This request should be sent only when > VHOST_USER_PROTOCOL_F_VHOST_PCI has > + been negotiated. > + > > > I think this message could be useful for other purposes than > vhost-pci, thus I would give it its own flag. Could you please give an example of other usage? Thanks. You could have a Slave that implements various devices, and pick the corresponding one dynamically (we already have implementations for net/input/gpu/scsi...) > > +* VHOST_USER_SET_PEER_CONNECTION > + > + Id: 22 > + Equivalent ioctl: N/A > + Master payload: u64 > + > + The producer device requests to connect or disconnect to > the consumer device. > > > producer->Master, consummer->Slave > > How does it interact with SET_VRING_ENABLE? It's independent of SET_VRING_ENABLE: SET_VRING_ENABLE enables a virtq to be in "active". SET_PEER_CONNECTION enables the peer (slave or master) device to be in "active". The driver shouldn't send packets if the device is inactive. I fail to see the difference with SET_VRING_ENABLE, perhaps someone more familiar with the protocol could help here. > > + The consumer device may request to disconnect to the producer > device. This > + request should be sent only when > VHOST_USER_PROTOCOL_F_VHOST_PCI has been > + negotiated. > + Connection request: If the reply message indicates > "success", the vhost-pci based > + inter-VM communication channel has been established. > + Disconnection request: If the reply message indicates > "success", the vhost-pci based > + inter-VM communication channel has been destroyed. > + #define VHOST_USER_SET_PEER_CONNECTION_F_OFF 0 > + #define VHOST_USER_SET_PEER_CONNECTION_F_ON 1 > + > I think it would be better to add one more command here: #define VHOST_USER_SET_PEER_CONNECTION_F_INIT 2 The master uses this command to tell the slave it's ready to create the vhost-pci device. Regarding the implementation, it is put at the bottom of vhost_net_start() function (when all the vring info have been sent and enabled). Do you have WIP branch for qemu vhost-pci? That could help to understand the context. thanks -- Marc-André Lureau