In addition to the data path established using vhost-pci-net, this
patch adds support for establishing a notification path between
two virtio devices. New registers are added to the virtio device to
record everything its driver needs to inject interrupts into the peer
device on the other end using hypercalls (here, we treat a
virtio<---->virtio connection as peer<---->peer).

Signed-off-by: Wei Wang <wei.w.w...@intel.com>
---
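Note for reviewers (not part of the commit message): a minimal C sketch
of how a driver might walk the new peer registers once peer_connection
reads 1. The struct mirrors only the peer fields added by this patch.
The helper names peer_collect_gsis and demo_device_select, the plain
host-endian memory access, and the GSI numbering are assumptions made
for illustration; the patch does not define them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Peer fields proposed for virtio_pci_common_cfg. Plain host-endian
 * integers are used for brevity; a real driver would use MMIO
 * accessors and little-endian conversions. */
struct peer_cfg {
        uint16_t peer_connection;
        uint16_t peer_num_rx_queues;
        uint16_t peer_rx_queue_select;
        uint32_t peer_rx_queue_gsi;
        uint64_t peer_uuid_hi;
        uint64_t peer_uuid_lo;
};

/* Hypothetical stand-in for the device side: after the driver writes
 * peer_rx_queue_select, the device exposes the matching GSI. The
 * "32 + index" numbering is invented for this sketch. */
static void demo_device_select(struct peer_cfg *cfg)
{
        cfg->peer_rx_queue_gsi = 32u + cfg->peer_rx_queue_select;
}

/* Collect the GSI of every peer RX virtqueue into gsi[].
 * Returns the number of queues, or -1 if the peer is not connected
 * or gsi[] is too small. */
static int peer_collect_gsis(struct peer_cfg *cfg,
                             void (*device_select)(struct peer_cfg *),
                             uint32_t *gsi, size_t max)
{
        assert(cfg != NULL && device_select != NULL && gsi != NULL);

        /* The peer registers are only meaningful once connected. */
        if (cfg->peer_connection != 1)
                return -1;
        if (cfg->peer_num_rx_queues > max)
                return -1;

        for (uint16_t i = 0; i < cfg->peer_num_rx_queues; i++) {
                cfg->peer_rx_queue_select = i;  /* driver write */
                device_select(cfg);             /* device reaction */
                gsi[i] = cfg->peer_rx_queue_gsi;
        }
        return cfg->peer_num_rx_queues;
}
```

The early return on peer_connection reflects the driver requirement not
to read the peer registers before the connection is reported.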
 content.tex | 227 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 222 insertions(+), 5 deletions(-)
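
Note for reviewers (not part of the commit message): a minimal C sketch
of the driver side of the peer feature bit exchange, assuming the driver
simply accepts the intersection of what the device offers and what it
supports itself. The struct peer_feature_msg is an in-memory
convenience, not the wire layout of struct vhost_pci_ctrl, and the names
FEATURE and reply_peer_features are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VHOST_PCI_CTRL_PEER_FEATURE_BITS 0

/* Peer feature bit numbers as listed in this patch. */
#define VIRTIO_NET_F_GUEST_TSO4 7
#define VIRTIO_NET_F_HOST_TSO4  11
#define VIRTIO_NET_F_MRG_RXBUF  15
#define VHOST_F_LOG_ALL         27

#define FEATURE(bit) (UINT64_C(1) << (bit))

/* In-memory view of one feature-bits message (assumption: the real
 * message carries the payload as a flexible array after the header). */
struct peer_feature_msg {
        uint32_t request;
        uint64_t vhost_pci_id;
        uint64_t feature_bits;
};

/* Build the reply the driver would place on the control transmitq:
 * same request and id, feature bits masked by what the driver supports. */
static struct peer_feature_msg
reply_peer_features(const struct peer_feature_msg *in,
                    uint64_t driver_supported)
{
        assert(in != NULL);
        assert(in->request == VHOST_PCI_CTRL_PEER_FEATURE_BITS);

        struct peer_feature_msg out = *in;
        out.feature_bits = in->feature_bits & driver_supported;
        return out;
}
```

If the reply is a strict subset of the offer, the device would then have
to re-negotiate with the peer, as described in the peer feature bits
section of the patch.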

diff --git a/content.tex b/content.tex
index 4b45678..5f9bdae 100644
--- a/content.tex
+++ b/content.tex
@@ -1295,6 +1295,14 @@ struct virtio_pci_common_cfg {
         le64 queue_desc;                /* read-write */
         le64 queue_avail;               /* read-write */
         le64 queue_used;                /* read-write */
+
+        /* About a peer device */
+        le16 peer_connection;           /* read-write */
+        le16 peer_num_rx_queues;        /* read-only for driver */
+        le16 peer_rx_queue_select;      /* read-write */
+        le32 peer_rx_queue_gsi;         /* read-only for driver */
+        le64 peer_uuid_hi;              /* read-only for driver */
+        le64 peer_uuid_lo;              /* read-only for driver */
 };
 \end{lstlisting}
 
@@ -1361,6 +1369,25 @@ struct virtio_pci_common_cfg {
 
 \item[\field{queue_used}]
         The driver writes the physical address of Used Ring here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
+
+\item[\field{peer_connection}]
+        Connection Control/Status. 1 - Connected; 0 - Disconnected.
+
+\item[\field{peer_num_rx_queues}]
+        The device uses this to report the number of RX virtqueues that the connected peer device uses.
+
+\item[\field{peer_rx_queue_select}]
+        The driver selects which RX virtqueue of the peer device the following fields refer to.
+
+\item[\field{peer_rx_queue_gsi}]
+        The device writes the GSI of an RX virtqueue of the peer device here.
+
+\item[\field{peer_uuid_hi}]
+        The device writes the high-order 64 bits of the peer UUID here.
+
+\item[\field{peer_uuid_lo}]
+        The device writes the low-order 64 bits of the peer UUID here.
+
 \end{description}
 
 \devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
@@ -1405,9 +1432,15 @@ The device MUST present a 0 in \field{queue_enable} on reset.
 The device MUST present a 0 in \field{queue_size} if the virtqueue
 corresponding to the current \field{queue_select} is unavailable.
 
+The peer device related registers are used when the device is connected to another device (e.g. a vhost-pci device instance). The device SHOULD negotiate with the peer device, and configure \field{peer_num_rx_queues}, \field{peer_rx_queue_gsi}, \field{peer_uuid_hi}, and \field{peer_uuid_lo}.
+
+When the device finishes the necessary negotiation with the peer device to establish the connection, it MUST write a 1 to \field{peer_connection} and notify the driver.
+
+When the device notices a driver request to write a 0 to \field{peer_connection}, it SHOULD first negotiate with the peer device to close the connection, and then write a 0 to \field{peer_connection} and notify the driver.
+
 \drivernormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
 
-The driver MUST NOT write to \field{device_feature}, \field{num_queues}, \field{config_generation} or \field{queue_notify_off}.
+The driver MUST NOT write to \field{device_feature}, \field{num_queues}, \field{config_generation}, \field{queue_notify_off}, \field{peer_num_rx_queues}, \field{peer_rx_queue_gsi}, \field{peer_uuid_hi}, or \field{peer_uuid_lo}.
 
 The driver MUST NOT write a value which is not a power of 2 to \field{queue_size}.
 
@@ -1419,6 +1452,12 @@ After writing 0 to \field{device_status}, the driver MUST wait for a read of
 
 The driver MUST NOT write a 0 to \field{queue_enable}.
 
+The driver MUST NOT write a 1 to \field{peer_connection}.
+
+The driver SHOULD NOT read the peer device related registers until it is notified that a 1 has been written to \field{peer_connection}.
+
+The driver MUST NOT unload until it reads a 0 from \field{peer_connection}.
+
 \subsubsection{Notification structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
 
 The notification location is found using the VIRTIO_PCI_CAP_NOTIFY_CFG
@@ -1476,11 +1515,11 @@ The \field{offset} for the \field{ISR status} has no alignment requirements.
 The ISR bits allow the device to distinguish between device-specific configuration
 change interrupts and normal virtqueue interrupts:
 
-\begin{tabular}{ |l||l|l|l| }
+\begin{tabular}{ |l||p{3.5cm}|p{3.5cm}|p{3.5cm}|l| }
 \hline
-Bits       & 0                               & 1               &  2 to 31 \\
+Bits    & 0               & 1                              & 2                            & 3 to 31 \\
 \hline
-Purpose    & Queue Interrupt  & Device Configuration Interrupt & Reserved \\
+Purpose & Queue Interrupt & Device Configuration Interrupt & Peer Device Status Interrupt & Reserved \\
 \hline
 \end{tabular}
 
@@ -5750,9 +5789,181 @@ descriptor for the \field{sense_len}, \field{residual},
 \field{status_qualifier}, \field{status}, \field{response} and
 \field{sense} fields.
 
+\section{Vhost-pci Net Device}\label{sec:Device Types / Vhost-pci Net Device}
+
+The vhost-pci net device enables point-to-point transmission of network packets between two isolated address spaces (e.g. virtual machines). An instance of the vhost-pci net device transmits packets to and receives packets from its peer device, which is usually a virtio net device in another address space.
+
+\subsection{Device ID}\label{sec:Device Types / Vhost-pci Net Device / Device ID}
+  TBD
+
+\subsection{Virtqueues}\label{sec:Device Types / Vhost-pci Net Device / Virtqueues}
+
+\begin{description}
+\item[0] control receiveq
+\item[1] control transmitq
+\item[2] receiveq
+\end{description}
+
+\subsection{Feature bits}\label{sec:Device Types / Vhost-pci Net Device / Feature bits}
+
+\subsubsection{Device feature bits}\label{sec:Device Types / Vhost-pci Net Device / Feature bits / Device feature bits}
+
+The device feature bits are the traditional feature bits, which are negotiated between the device and its driver.
+
+\begin{description}
+\item[VHOST_PCI_NET_F_MAC (0)] Device has given MAC address.
+
+\item[VHOST_PCI_NET_F_CTRL_MAC_ADDR (1)] Set MAC address through control channel.
+
+\item[VHOST_PCI_NET_F_MRG_RXBUF (2)] Driver can merge receive buffers.
+\end{description}
+
+\subsubsection{Peer feature bits}\label{sec:Device Types / Vhost-pci Net Device / Feature bits / Peer feature bits}
+The peer feature bits need to be negotiated with the peer device. The feature bits that have been negotiated with the peer device are sent to the driver for negotiation. If the driver accepts only a subset of the feature bits, the device needs to re-negotiate that subset with the peer device, which may trigger a reset of the peer device.
+
+\begin{description}
+\item[VIRTIO_NET_F_GUEST_TSO4 (7)] Virtio-net can receive TSOv4.
+
+\item[VIRTIO_NET_F_GUEST_TSO6 (8)] Virtio-net can receive TSOv6.
+
+\item[VIRTIO_NET_F_GUEST_ECN (9)] Virtio-net can receive TSO with ECN.
+
+\item[VIRTIO_NET_F_GUEST_UFO (10)] Virtio-net can receive UFO.
+
+\item[VIRTIO_NET_F_HOST_TSO4 (11)] Vhost-pci-net supports TSOv4.
+
+\item[VIRTIO_NET_F_HOST_TSO6 (12)] Vhost-pci-net supports TSOv6.
+
+\item[VIRTIO_NET_F_HOST_ECN (13)] Vhost-pci-net supports TSO with ECN.
+
+\item[VIRTIO_NET_F_HOST_UFO (14)] Vhost-pci-net supports UFO.
+
+\item[VIRTIO_NET_F_MRG_RXBUF (15)] Virtio-net can merge receive buffers.
+
+\item[VHOST_F_LOG_ALL (27)] Vhost-pci-net supports dirty page logging.
+
+\end{description}
+
+\devicenormative{\paragraph}{Peer feature bits}{Device Types / Vhost-pci Net Device / Feature bits / Peer feature bits}
+The device SHOULD send the feature bits that have been accepted by the peer device to the driver through the control receiveq.
+
+\drivernormative{\paragraph}{Peer feature bits}{Device Types / Vhost-pci Net Device / Feature bits / Peer feature bits}
+Upon receiving the peer feature bits from the device, the driver SHOULD send its supported peer feature bits to the device via the control transmitq.
+
+\subsection{Device configuration layout}\label{sec:Device Types / Vhost-pci Net Device / Device configuration layout}
+  None currently defined.
+
+\subsection{Device Initialization}\label{sec:Device Types / Vhost-pci Net Device / Device Initialization}
+
+The driver would perform a typical initialization routine like so:
+
+\begin{enumerate}
+\item Identify and initialize the control receiveq, control transmitq, and receiveq.
+
+\item Fill the receiveq and control receiveq with buffers.
+
+\item If the VHOST_PCI_NET_F_MAC feature bit is set, the configuration
+  space \field{mac} entry indicates the ``physical'' address of the
+  network card, otherwise the driver would typically generate a random
+  local MAC address.
+\end{enumerate}
+
+\subsection{Device Operation}\label{sec:Device Types / Vhost-pci Net Device / Device Operation}
+
+\subsubsection{Control Virtqueue}\label{sec:Device Types / Vhost-pci Net Device / Device Operation / Control Virtqueue}
+
+The pair of control virtqueues is used to exchange configuration messages between the device and driver. All the configuration messages are constructed using the following structure:
+
+\begin{lstlisting}
+struct vhost_pci_ctrl {
+        u32 request;
+        u64 vhost_pci_id;
+        u8  request_specific_payload[];
+};
+\end{lstlisting}
+
+The \field{vhost_pci_id} stores the id of the vhost-pci device. It is usually assigned by the vhost-pci device management software.
+The requests are defined following the VHOST_PCI_CTRL format, and they are introduced below.
+
+\begin{lstlisting}
+#define VHOST_PCI_CTRL_PEER_FEATURE_BITS 0
+\end{lstlisting}
+
+The device sends the peer feature bits that have been negotiated with the peer device to the driver via the control receiveq. The driver sends back its accepted peer feature bits to the device via the control transmitq.
+The request payload is described using the following structure:
+
+\begin{lstlisting}
+struct vhost_pci_ctrl_driver_feature_bits {
+        u64 feature_bits;
+};
+\end{lstlisting}
+
+\begin{lstlisting}
+#define VHOST_PCI_CTRL_PEER_MEM_INFO 1
+\end{lstlisting}
+
+The device sends the memory info obtained from the peer device to the driver. The payload is described using the structure below:
+
+\begin{lstlisting}
+struct vhost_pci_ctrl_peer_mem_info {
+#define VHOST_PCI_MEM_INFO_NEED_MAP_N 0
+#define VHOST_PCI_MEM_INFO_NEED_MAP_Y 1
+        u8 need_map;
+        u64 peer_mem;
+        u8 other_mem_info[];
+};
+\end{lstlisting}
+
+If \field{need_map} is set to VHOST_PCI_MEM_INFO_NEED_MAP_N, \field{peer_mem} stores the virtual address which already maps to the start of the peer memory. The driver can use it directly to access the peer memory.
+
+If \field{need_map} is set to VHOST_PCI_MEM_INFO_NEED_MAP_Y, the driver needs to map the peer memory via a device BAR, and \field{peer_mem} stores the BAR id. The driver sends back a message to the device with \field{peer_mem} set to the virtual address that maps to the peer memory.
+
+The \field{other_mem_info} stores other peer memory info for the driver to reference, and it is defined according to the implementation's need.
+
+\begin{lstlisting}
+#define VHOST_PCI_CTRL_PEER_VIRTQ_INFO 2
+\end{lstlisting}
+
+The device sends the virtqueue info obtained from the peer device to the driver. The payload is described using the structure below:
+
+\begin{lstlisting}
+struct vhost_pci_ctrl_peer_virtq_info {
+#define VHOST_PCI_PEER_VIRTQ_TX 0
+#define VHOST_PCI_PEER_VIRTQ_RX 1
+        u8 tx_or_rx;
+        u32 virtq_num;
+        struct virtq vq[];
+};
+\end{lstlisting}
+
+If the \field{tx_or_rx} is set to VHOST_PCI_PEER_VIRTQ_TX, the driver initializes \field{virtq_num} of virtqueues by sharing the TX virtqueues from the peer device, and uses them as its mirrored RX virtqueues. To receive packets from the peer device, the driver copies packets from the mirrored RX virtqueues to its own RX virtqueue (i.e. the defined receiveq).
+
+If the \field{tx_or_rx} is set to VHOST_PCI_PEER_VIRTQ_RX, the driver initializes \field{virtq_num} of virtqueues by sharing the RX virtqueues from the peer device, and uses them as its mirrored TX virtqueues. To transmit packets to the peer device, the driver copies packets to the mirrored TX virtqueues.
+
+\begin{lstlisting}
+#define VHOST_PCI_CTRL_DIRTY_PAGE_LOGGING 3
+\end{lstlisting}
+
+The device sends messages to turn on or off the page logging mode of the driver.
+\begin{lstlisting}
+struct vhost_pci_ctrl_dirty_page_logging {
+#define VHOST_PCI_DIRTY_PAGE_LOGGING_OFF 0
+#define VHOST_PCI_DIRTY_PAGE_LOGGING_ON  1
+        u8 off_or_on;
+};
+\end{lstlisting}
+
+Other types of vhost-pci devices (e.g. scsi, console) may use the same control virtqueue messages above. The messages defined below are specific to vhost-pci net devices.
+
+\begin{lstlisting}
+#define VHOST_PCI_CTRL_MAC 0x10000
+\end{lstlisting}
+
+If \field{VHOST_PCI_NET_F_CTRL_MAC_ADDR} is negotiated, the driver sends a message via the control transmitq to set the MAC address of the device.
+
 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
-Currently there are three device-independent feature bits defined:
+Currently there are four device-independent feature bits defined:
 
 \begin{description}
   \item[VIRTIO_F_RING_INDIRECT_DESC (28)] Negotiating this feature indicates
@@ -5764,6 +5975,10 @@ Currently there are three device-independent feature bits defined:
 
   \item[VIRTIO_F_VERSION_1(32)] This indicates compliance with this
     specification, giving a simple way to detect legacy devices or drivers.
+
+  \item[VIRTIO_F_PV_INTERRUPT(33)] Negotiating this feature indicates that the
+    driver can inject an interrupt to its peer device in a paravirtualized
+    way (e.g. hypercall).
 \end{description}
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
@@ -5776,6 +5991,8 @@ MAY fail to operate further if VIRTIO_F_VERSION_1 is not offered.
 A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
 if VIRTIO_F_VERSION_1 is not accepted.
 
+A device MUST check whether the management environment (e.g. a virtual machine monitor) supports PV interrupts and configure the VIRTIO_F_PV_INTERRUPT feature bit accordingly.
+
 \section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature Bits / Legacy Interface: Reserved Feature Bits}
 
 Transitional devices MAY offer the following:
-- 
1.9.1

