On Mon, Apr 12, 2021 at 6:50 AM Stefan Hajnoczi <stefa...@redhat.com> wrote:
>
> On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote:
> > Add supports for datagram type for virtio-vsock. Datagram
> > sockets are connectionless and unreliable. To avoid contention
> > with stream and other sockets, add two more virtqueues and
> > a new feature bit to identify if those two new queues exist or not.
> >
> > Also add descriptions for resource management of datagram, which
> > does not use the existing credit update mechanism associated with
> > stream sockets.
> >
> > Signed-off-by: Jiang Wang <jiang.w...@bytedance.com>
> > ---
> > V2 addressed the comments for the previous version.
> >
> >  virtio-vsock.tex | 62 
> > +++++++++++++++++++++++++++++++++++++++++++++++---------
> >  1 file changed, 52 insertions(+), 10 deletions(-)
> >
> > diff --git a/virtio-vsock.tex b/virtio-vsock.tex
> > index da7e641..62c12e0 100644
> > --- a/virtio-vsock.tex
> > +++ b/virtio-vsock.tex
> > @@ -11,12 +11,25 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket 
> > Device / Virtqueues}
> >  \begin{description}
> >  \item[0] rx
> >  \item[1] tx
> > +\item[2] datagram rx
> > +\item[3] datagram tx
> > +\item[4] event
> > +\end{description}
> > +The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DRGAM 
> > is set. Otherwise, it
> > +only uses 3 queues, as the following. Rx and tx queues are always used for 
> > stream sockets.
> > +
> > +\begin{description}
> > +\item[0] rx
> > +\item[1] tx
> >  \item[2] event
> >  \end{description}
> >
>
> I suggest renaming "rx" and "tx" to "stream rx" and "stream tx"
> virtqueues and also adding this:

OK

>   When behavior differs between stream and datagram rx/tx virtqueues
>   their full names are used. Common behavior is simply described in
>   terms of rx/tx virtqueues and applies to both stream and datagram
>   virtqueues.

OK

> This way you won't need to duplicate portions of the spec that deal with
> populating the virtqueues, for example.
>
> It's also clearer to use a full name for stream rx/tx virtqueues instead
> of calling them rx/tx virtqueues now that we have datagram rx/tx
> virtqueues.

Sure.

> > +
> >  \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature 
> > bits}
> >
> > -There are currently no feature bits defined for this device.
> > +\begin{description}
> > +\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket 
> > type.
> > +\end{description}
> >
> >  \subsection{Device configuration layout}\label{sec:Device Types / Socket 
> > Device / Device configuration layout}
> >
> > @@ -107,6 +120,9 @@ \subsection{Device Operation}\label{sec:Device Types / 
> > Socket Device / Device Op
> >
> >  \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket 
> > Device / Device Operation / Virtqueue Flow Control}
> >
> > +Flow control applies to stream sockets; datagram sockets do not have
> > +flow control.
> > +
> >  The tx virtqueue carries packets initiated by applications and replies to
> >  received packets.  The rx virtqueue carries packets initiated by the 
> > device and
> >  replies to previously transmitted packets.
> > @@ -140,12 +156,15 @@ \subsubsection{Addressing}\label{sec:Device Types / 
> > Socket Device / Device Opera
> >  consists of a (cid, port number) tuple. The header fields used for this are
> >  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
> >
> > -Currently only stream sockets are supported. \field{type} is 1 for stream
> > -socket types.
> > +Currently stream and datagram (dgram) sockets are supported. \field{type} 
> > is 1 for stream
> > +socket types. \field{type} is 3 for dgram socket types.
> >
> >  Stream sockets provide in-order, guaranteed, connection-oriented delivery
> >  without message boundaries.
> >
> > +Datagram sockets provide connectionless unreliable messages of
> > +a fixed maximum length.
>
> Plus unordered (?) and with message boundaries. In other words:
>
>   Datagram sockets provide unordered, unreliable, connectionless message
>   with message boundaries and a fixed maximum length.

OK, will do. My implementation preserves order because there is only one
virtqueue, but I think the spec can still say "unordered".

> I didn't think of the fixed maximum length aspect before. I guess the
> intention is that the rx buffer size is the message size limit? That's
> different from UDP messages, which can be fragmented into multiple IP
> packets and can be larger than 64KiB:
> https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
>
> Is it possible to support large datagram messages in vsock? I'm a little
> concerned that applications that run successfully over UDP will not be
> portable if vsock has this limitation because it would impose extra
> message boundaries that the application protocol might not tolerate.
>
OK. I think one way to do that is to support fragmentation, as Stefano suggested.

> > +
> >  \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket 
> > Device / Device Operation / Buffer Space Management}
> >  \field{buf_alloc} and \field{fwd_cnt} are used for buffer space management 
> > of
> >  stream sockets. The guest and the device publish how much buffer space is
> > @@ -162,7 +181,7 @@ \subsubsection{Buffer Space 
> > Management}\label{sec:Device Types / Socket Device /
> >  u32 peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
> >  \end{lstlisting}
> >
> > -If there is insufficient buffer space, the sender waits until virtqueue 
> > buffers
> > +For stream sockets, if there is insufficient buffer space, the sender 
> > waits until virtqueue buffers
> >  are returned and checks \field{buf_alloc} and \field{fwd_cnt} again. 
> > Sending
> >  the VIRTIO_VSOCK_OP_CREDIT_REQUEST packet queries how much buffer space is
> >  available. The reply to this query is a VIRTIO_VSOCK_OP_CREDIT_UPDATE 
> > packet.
> > @@ -170,16 +189,28 @@ \subsubsection{Buffer Space 
> > Management}\label{sec:Device Types / Socket Device /
> >  previously receiving a VIRTIO_VSOCK_OP_CREDIT_REQUEST packet. This allows
> >  communicating updates any time a change in buffer space occurs.
> >
> > +Unlike stream sockets, dgram sockets do not use 
> > VIRTIO_VSOCK_OP_CREDIT_UPDATE or
> > +VIRTIO_VSOCK_OP_CREDIT_REQUEST packets. The dgram buffer management
> > +is split to two parts: tx side and rx side. For the tx side, there is
> > +additional buffer space for each socket.
>
> Plus:
>
>   ... according to the driver and device's available memory
>   resources. The amount of tx buffer space is an implementation detail
>   of both the device and the driver. It is not visible to the other side
>   and may be controlled by the application or administrative resource
>   limits.
>
> What I'm trying to describe here is that the additional tx buffer space
> isn't part of the device interface.
>
OK. Will do.

> > +The dgram sender sends packets when the virtqueue or the additional buffer 
> > is not full.
> > +When both are full, the sender
> > +MUST return an appropriate error to the upper layer application.
>
> MUST, SHOULD, etc clauses need to go into the
> devicenormative/drivernormative sections. They cannot be in regular
> text.
>
OK.
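
For reference, the tx-side behavior described in the patch text above (send
while the virtqueue or the additional buffer has room, fail when both are
full) could be sketched roughly as below. This is only a user-space
illustration: struct dgram_tx_state, dgram_tx() and the result codes are
hypothetical names, and -ENOBUFS is just an assumed choice of error, not
something the patch or the spec defines.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-socket tx state, for illustration only. */
struct dgram_tx_state {
    size_t vq_free;  /* free slots in the datagram tx virtqueue */
    size_t buf_free; /* free bytes in the additional tx buffer */
};

enum dgram_tx_result {
    DGRAM_TX_SENT = 0,     /* placed directly on the virtqueue */
    DGRAM_TX_BUFFERED = 1, /* parked in the additional tx buffer */
};

/*
 * Try to transmit one datagram of len bytes.
 * Returns a dgram_tx_result on success, or -ENOBUFS (assumed error code)
 * when both the virtqueue and the additional buffer are full.
 */
static int dgram_tx(struct dgram_tx_state *s, size_t len)
{
    if (s->vq_free > 0) {
        s->vq_free--;
        return DGRAM_TX_SENT;
    }
    if (len <= s->buf_free) {
        s->buf_free -= len;
        return DGRAM_TX_BUFFERED;
    }
    return -ENOBUFS;
}
```

The size of the additional buffer stays private to the sender here, which
matches the point that it is not part of the device interface.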

> > +For the rx side, dgram also uses the \field{buf_alloc}. If it is full, the 
> > packet
> > +is dropped by the receiver.
>
> UDP is connectionless so any number of other sources can send messages
> to the same destination, causing buf_alloc's value to be unpredictable.
> Can you explain how buf_alloc works with datagram sockets in more
> detail?

In my Linux kernel prototype, datagram sockets also use
virtio_transport_inc_rx_pkt() and virtio_transport_dec_rx_pkt() to update
vvs->rx_bytes and compare it with vvs->buf_alloc.
virtio_transport_inc_rx_pkt() is called when a datagram packet is enqueued,
and virtio_transport_dec_rx_pkt() is called when it is dequeued.
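
A minimal user-space sketch of that accounting, assuming one rx_bytes
counter per receiving socket that is checked against buf_alloc on enqueue.
The struct and function names below are hypothetical and are not the kernel
helpers themselves; they only illustrate the drop-when-full behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-socket rx state; field names mirror the prototype. */
struct dgram_rx_state {
    uint32_t buf_alloc; /* receive buffer space allotted to this socket */
    uint32_t rx_bytes;  /* payload bytes queued but not yet read */
};

/*
 * Account for an arriving datagram of len bytes.
 * Returns false if the datagram does not fit and must be dropped.
 * Invariant: rx_bytes <= buf_alloc, so the subtraction cannot underflow.
 */
static bool dgram_rx_enqueue(struct dgram_rx_state *s, uint32_t len)
{
    if (len > s->buf_alloc - s->rx_bytes)
        return false; /* buffer full: drop, no credit message is sent */
    s->rx_bytes += len;
    return true;
}

/* Release the space once the application has read the datagram. */
static void dgram_rx_dequeue(struct dgram_rx_state *s, uint32_t len)
{
    assert(len <= s->rx_bytes);
    s->rx_bytes -= len;
}
```

With buf_alloc = 100, a 60-byte datagram is accepted, a following 50-byte
datagram is dropped, and it would be accepted again once the first one has
been dequeued, no matter which source sent it.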

If multiple sources send messages to the same destination, they all share
that destination's buf_alloc. Do you want finer-grained control, for example
a separate buffer for each remote CID and port? That feels like overkill to
me. Any other suggestions?

> >  \drivernormative{\paragraph}{Device Operation: Buffer Space 
> > Management}{Device Types / Socket Device / Device Operation / Buffer Space 
> > Management}
> > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > -sufficient free buffer space for the payload.
> > +For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be 
> > transmitted when the peer has
> > +sufficient free buffer space for the payload. For dgram sockets, 
> > VIRTIO_VSOCK_OP_RW data packets
> > +MAY be transmitted when the peer buffer is full. Then the packet will be 
> > dropped by the receiver.
> >
> >  All packets associated with a stream flow MUST contain valid information in
> >  \field{buf_alloc} and \field{fwd_cnt} fields.
> >
> >  \devicenormative{\paragraph}{Device Operation: Buffer Space 
> > Management}{Device Types / Socket Device / Device Operation / Buffer Space 
> > Management}
> > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > -sufficient free buffer space for the payload.
> > +For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be 
> > transmitted when the peer has
> > +sufficient free buffer space for the payload. For dgram sockets, 
> > VIRTIO_VSOCK_OP_RW data packets
> > +MAY be transmitted when the peer buffer is full. Then the packet will be 
> > dropped by the receiver.
> >
> >  All packets associated with a stream flow MUST contain valid information in
> >  \field{buf_alloc} and \field{fwd_cnt} fields.
> > @@ -203,14 +234,14 @@ \subsubsection{Receive and Transmit}\label{sec:Device 
> > Types / Socket Device / De
> >  The \field{guest_cid} configuration field MUST be used as the source CID 
> > when
> >  sending outgoing packets.
> >
> > -A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> > +For stream sockets, A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet 
> > is received with an
> >  unknown \field{type} value.
>
> What about datagram sockets? Please state what must happen and why.

I think datagram sockets should do the same as stream sockets and reply
with a VIRTIO_VSOCK_OP_RST?
Any other ideas?

> >
> >  \devicenormative{\paragraph}{Device Operation: Receive and 
> > Transmit}{Device Types / Socket Device / Device Operation / Receive and 
> > Transmit}
> >
> >  The \field{guest_cid} configuration field MUST NOT contain a reserved CID 
> > as listed in \ref{sec:Device Types / Socket Device / Device configuration 
> > layout}.
> >
> > -A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> > +For stream sockets, A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet 
> > is received with an
> >  unknown \field{type} value.
> >
> >  \subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / 
> > Device Operation / Stream Sockets}
> > @@ -240,6 +271,17 @@ \subsubsection{Stream Sockets}\label{sec:Device Types 
> > / Socket Device / Device O
> >  destination) address tuple for a new connection while the other peer is 
> > still
> >  processing the old connection.
> >
> > +\subsubsection{Datagram Sockets}\label{sec:Device Types / Socket Device / 
> > Device Operation / Stream Sockets}
>
> s/Stream Sockets/Datagram Sockets/

Will do.

> > +
> > +Datagram (dgram) sockets are connectionless and unreliable. The sender 
> > just sends
> > +a message to the peer and hope it will be delivered. A VIRTIO_VSOCK_OP_RST 
> > reply is sent if
>
> s/hope/hopes/

Got it.

> > +a receiving socket does not exist on the destination.
> > +If the transmission or receiving buffers are full, the packets
> > +are dropped. If the transmission buffer is full, an appropriate error 
> > message
> > +is returned to the caller.
>
> It's unclear whether the caller is the driver/device or something else.
> I think you're referring to the application interface, which is outside
> the scope of the VIRTIO spec. I suggest dropping the last sentence.

Yeah, I was thinking about the application interface. I will drop this
sentence.

Thanks for all the feedback.