Re: [Qemu-devel] [virtio-comment] Re: [PATCH v1] docs/vhost-user: extend the vhost-user protocol to support the vhost-pci based inter-vm communication

Marc-André Lureau Tue, 08 Nov 2016 04:21:34 -0800

Hi

On Tue, Nov 8, 2016 at 3:35 PM Wei Wang <wei.w.w...@intel.com> wrote:


On 11/08/2016 03:47 PM, Marc-André Lureau wrote:
> Hi
>
> I suggest you split this patch for the various "features" you propose.

OK. I'll make it several small patches.   <*v1-AR1*>
>
>      Master and slave can be either a client (i.e. connecting) or
>     server (listening)
>      in the socket communication.
>
"Client" and "Server" have already been used in the doc here.
>
>     +The current vhost-user protocol is extended to support the
>     vhost-pci based inter-VM
>     +communication. In this case, Slave is a QEMU which runs a
>     vhost-pci server, and
>     +Master is another QEMU which runs a vhost-pci client.
>     +
>
>
>
> Why introduce new terminology "server" and "client"? What does it
> change? This is confusing with socket client/server configuration.

OK. I will try with "Slave" and "Master" in this doc when it's possible.
<*v1-AR2*>

>
>      Message Specification
>      ---------------------
>
>      Note that all numbers are in the machine native byte order. A
>     vhost-user message
>     -consists of 3 header fields and a payload:
>     +consists of 4 header fields and a payload:
>
>     -------------------------------------
>     -| request | flags | size | payload |
>     -------------------------------------
>     +----------------------------------------------
>     +| request | flags | conn_id | size | payload |
>     +----------------------------------------------
>
>       * Request: 32-bit type of the request
>       * Flags: 32-bit bit field:
>         - Lower 2 bits are the version (currently 0x01)
>     -   - Bit 2 is the reply flag - needs to be sent on each reply
>     from the slave
>     +   - Bit 2 is the reply flag - needs to be sent on each reply
>         - Bit 3 is the need_reply flag - see
>     VHOST_USER_PROTOCOL_F_REPLY_ACK for
>           details.
>     + * Conn_id: 64-bit connection id to indentify a client socket
>     connection. It is
>     +            introduced in version 0x02 to support the
>     "1-server-N-client" model
>     +            and an asynchronous client read implementation. The
>     connection id,
>     +            0xFFFFFFFFFFFFFFFF, is used by an anonymous client
>     (e.g. a client who
>     +            has not got its connection id from the server in the
>     initial talk)
>
>
> I don't understand why you need a connection id, on each message.
> What's the purpose? Since the communication is unicast, a single
> message should be enough.

Sure, please let me explain more:
The QEMU socket is going to be upgraded to support 1 server socket being
connected by multiple client sockets (I've made patches to achieve
this). In other words, here, multiple masters will connect to one slave,
and the slave creates a vhost-pci device for each master after receiving
the necessary message info. The slave needs to know which master it is
talking to when receiving a message, as it maintains multiple
connections at the same time.


You should be able to identify each connection in the slave (as a socket
server), without a need for connection id: connected sockets are
independent from each others.


Also shed some light on the implementation:
The slave maintains a table for those masters. Each master has an entry
in the table (indexed by a "conn_id"). When the slave receives a
message, the payload is added to the corresponding table entry. When
things are ready (i.e. it has received enough info to create a vhost-pci
device for the master), the device creation code creates and initializes
a vhost-pci device (e.g. initialize VirtioPCIProxy in virtio-pci.c) from
the corresponding table entry.

>
>       * Size - 32-bit size of the payload
>
>
>     @@ -97,6 +106,13 @@ Depending on the request type, payload can be:
>         log offset: offset from start of supplied file descriptor
>             where logging starts (i.e. where guest address 0 would be
>     logged)
>
>     +* Device info
>     +   --------------------
>     +   | virito id | uuid |
>     +   --------------------
>     +   Virtio id: 16-bit virtio id of the device
>     +   UUID: 128-bit UUID to identify the QEMU instance that creates
>     the device
>     +
>
>
> I wonder if UUID should be a different message.
>
We can make uuid another message if it has other usages.
Do you see any other usages of uuid?


Allows to associate data/configuration with a particular VM, in a
multi-master/single-slave scenario. But tbh, I don't see how this is
necessary, I can imagine solving this differently (having different
connection address per vm for ex). I would like to understand your use case.


>
>      In QEMU the vhost-user message is implemented with the following
>     struct:
>
>      typedef struct VhostUserMsg {
>     @@ -109,6 +125,7 @@ typedef struct VhostUserMsg {
>              struct vhost_vring_addr addr;
>              VhostUserMemory memory;
>              VhostUserLog log;
>     +        DeviceInfo dev_info;
>          };
>      } QEMU_PACKED VhostUserMsg;
>
>     @@ -119,17 +136,25 @@ The protocol for vhost-user is based on the
>     existing implementation of vhost
>      for the Linux Kernel. Most messages that can be sent via the Unix
>     domain socket
>      implementing vhost-user have an equivalent ioctl to the kernel
>     implementation.
>
>     -The communication consists of master sending message requests and
>     slave sending
>     -message replies. Most of the requests don't require replies. Here
>     is a list of
>     -the ones that do:
>     +Traditionally, the communication consists of master sending
>     message requests
>     +and slave sending message replies. Most of the requests don't
>     require replies.
>     +Here is a list of the ones that do:
>
>       * VHOST_GET_FEATURES
>       * VHOST_GET_PROTOCOL_FEATURES
>       * VHOST_GET_VRING_BASE
>       * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
>     + * VHOST_USER_GET_CONN_ID
>     + * VHOST_USER_SET_PEER_CONNECTION
>
> Let's also fix  the VHOST_USER prefix of the above requests.
>
Sure, will do. <*v1-AR3*>

>      [ Also see the section on REPLY_ACK protocol extension. ]
>
>     +Currently, the communication also supports the Slave (server)
>     sending messages
>     +to the Master (client). Here is a list of them:
>     + * VHOST_USER_SET_FEATURES
>
>     + * VHOST_USER_SET_PEER_CONNECTION (the serve may actively request
>     to disconnect
>     +   with the client)
>
>
> Oh, you are making the communication bidirectional? This is a
> fundamental change in the protocol. This may be difficult to implement
> in qemu, since the communication in synchronous, a request expects an
> immediate reply, if it gets back a request (from the slave) in the
> middle, it will fail.
>

Not really.
Adding the above two doesn't affect the existing synchronous read()
messages (basically, those VHOST_USER_GET_xx messages). Like
VHOST_USER_SET_FEATURES, the _SET_ messages don't need a reply. Here, we
just make the slave capable of actively sending messages to the master.


Yes, that's the trouble. At any time the Master may send a request and
expects an immediate reply. There is a race of getting a request from the
Slave in the middle with your proposed change. I'd rather avoid making the
request bidirectionnal if possible. (I proposed a second channel for
Slave->Master request in the past:
https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html)

> Currently all requests (including VHOST_USER_SET_FEATURES) are coming
> from the Master. I don't understand yet the purpose of
> VHOST_USER_SET_PEER_CONNECTION to propose an alternative, but I would
> rather keep the unidirectional communication if possible.
>
>      There are several messages that the master sends with file
>     descriptors passed
>      in the ancillary data:
>
>     @@ -259,6 +284,7 @@ Protocol features
>      #define VHOST_USER_PROTOCOL_F_LOG_SHMFD      1
>      #define VHOST_USER_PROTOCOL_F_RARP           2
>      #define VHOST_USER_PROTOCOL_F_REPLY_ACK      3
>     +#define VHOST_USER_PROTOCOL_F_VHOST_PCI      4
>
>      Message types
>      -------------
>     @@ -470,6 +496,43 @@ Message types
>            The first 6 bytes of the payload contain the mac address of
>     the guest to
>            allow the vhost user backend to construct and broadcast the
>     fake RARP.
>
>     + * VHOST_USER_GET_CONN_ID
>     +
>     +      Id: 20
>     +      Equivalent ioctl: N/A
>     +      Master payload: u64
>     +
>     +      The client sends this message to the server to ask for its
>     connection id.
>
>
> Confusing, please keep the Master/Slave terminology
OK.
>
>     + The connection id is then put into the message header (the
>     conn_id field),
>     +      so that the server can always know who it is talking with.
>     +
>
>
> Could you explain what the connection id is for?

Explained above. Please let me know if I didn't make it clear.

>
> +This request should be sent only when VHOST_USER_PROTOCOL_F_VHOST_PCI
> has...
>
>     +* VHOST_USER_SET_DEV_INFO
>     +
>     +      Id: 21
>     +      Equivalent ioctl: N/A
>     +      Master payload: dev info
>     +
>     +      The client sends the producer device info to the server.
>
>
> "Master sends producer device info to the Slave" works, no?

Yes, it works. The current dev info only contains a "virtio id" field
(assume we'll take uuid out as a separate message), which tells the
slave if it is a net, scsi, console or else. do you see any issue?
>
> Could we guarantee this message is sent before SET_VRING*?

Why do we need to guarantee this?


It would simplify the protocol to have expectations on when messages come.
In particular, an early message with devinfo would allow to
check/pre-configure the Slave for a particular device. Also
VHOST_USER_SET_DEV_INFO should probably be unique (don't allow a device to
be reconfigured)


>
>     + This request should be sent only when
>     VHOST_USER_PROTOCOL_F_VHOST_PCI has
>     +      been negotiated.
>     +
>
>
> I think this message could be useful for other purposes than
> vhost-pci, thus I would give it its own flag.

Could you please give an example of other usage? Thanks.


You could have a Slave that implements various devices, and pick the
corresponding one dynamically (we already have implementations for
net/input/gpu/scsi...)


>
>     +* VHOST_USER_SET_PEER_CONNECTION
>     +
>     +      Id: 22
>     +      Equivalent ioctl: N/A
>     +      Master payload: u64
>     +
>     +      The producer device requests to connect or disconnect to
>     the consumer device.
>
>
> producer->Master, consummer->Slave
>
> How does it interact with SET_VRING_ENABLE?

It's independent of SET_VRING_ENABLE:
SET_VRING_ENABLE enables a virtq to be in "active".
SET_PEER_CONNECTION enables the peer (slave or master) device to be in
"active". The driver shouldn't send packets if the device is inactive.


I fail to see the difference with SET_VRING_ENABLE, perhaps someone more
familiar with the protocol could help here.

>
>     + The consumer device may request to disconnect to the producer
>     device. This
>     +      request should be sent only when
>     VHOST_USER_PROTOCOL_F_VHOST_PCI has been
>     +      negotiated.
>     +      Connection request: If the reply message indicates
>     "success", the vhost-pci based
>     +      inter-VM communication channel has been established.
>     +      Disconnection request: If the reply message indicates
>     "success", the vhost-pci based
>     +      inter-VM communication channel has been destroyed.
>     +      #define VHOST_USER_SET_PEER_CONNECTION_F_OFF 0
>     +      #define VHOST_USER_SET_PEER_CONNECTION_F_ON 1
>     +
>
I think it would be better to add one more command here:
#define VHOST_USER_SET_PEER_CONNECTION_F_INIT 2

The master uses this command to tell the slave it's ready to create the
vhost-pci device. Regarding the implementation, it is put at the bottom
of vhost_net_start() function (when all the vring info have been sent
and enabled).


Do you have WIP branch for qemu vhost-pci? That could help to understand
the context.

thanks
-- 
Marc-André Lureau

Re: [Qemu-devel] [virtio-comment] Re: [PATCH v1] docs/vhost-user: extend the vhost-user protocol to support the vhost-pci based inter-vm communication

Reply via email to