Re: [virtio-dev] [PATCH v7] virtio_net: support split header
在 2022/8/16 下午5:34, Heng Qi 写道: From: Xuan Zhuo The purpose of this feature is to split the header and the payload of the packet. |receive buffer| | 0th descriptor | 1th descriptor| | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | We can use a buffer plus a separate page when allocating the receive buffer. In this way, we can ensure that all payloads can be independently in a page, which is very beneficial for the zerocopy implemented by the upper layer. Signed-off-by: Xuan Zhuo Signed-off-by: Heng Qi Reviewed-by: Kangjie Xu --- v7: 1. Fix some presentation issues. 2. Use "split transport header". @Jason Wang 3. Clarify some paragraphs. @Cornelia Huck 4. determine the device what to do if it does not perform header split on a packet. v6: 1. Fix some syntax issues. @Cornelia Huck 2. Clarify some paragraphs. @Cornelia Huck 3. Determine the device what to do if it does not perform header split on a packet. v5: 1. Determine when hdr_len is credible in the process of rx 2. Clean up the use of buffers and descriptors 3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge v4: 1. fix typo @Cornelia Huck @Jason Wang 2. do not split header for IP fragmentation packet. @Jason Wang v3: 1. Fix some syntax issues 2. Fix some terminology issues 3. It is not unified with ip alignment, so ip alignment is not included 4. Make it clear that the device must support four types, in the case of successful negotiation. conformance.tex | 2 ++ content.tex | 102 2 files changed, 104 insertions(+) diff --git a/conformance.tex b/conformance.tex index 2b86fc6..4e2b82e 100644 --- a/conformance.tex +++ b/conformance.tex @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) } \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header} \end{itemize} \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance} @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header} \end{itemize} \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance} diff --git a/content.tex b/content.tex index e863709..5676da9 100644 --- a/content.tex +++ b/content.tex @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control channel. +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting +the transport header and the payload. + \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing. \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets. @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ. \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6. \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ. +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ. \end{description} \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O #define VIRTIO_NET_HDR_F_NEEDS_CSUM1 #define VIRTIO_NET_HDR_F_DATA_VALID2 #define VIRTIO_NET_HDR_F_RSC_INFO 4 +#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER 8 u8
[virtio-dev] Re: [PATCH v7] virtio_net: support split header
在 2022/9/2 下午2:21, Jason Wang 写道: On Tue, Aug 16, 2022 at 5:35 PM Heng Qi wrote: From: Xuan Zhuo The purpose of this feature is to split the header and the payload of the packet. |receive buffer| | 0th descriptor | 1th descriptor| | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | We can use a buffer plus a separate page when allocating the receive buffer. In this way, we can ensure that all payloads can be independently in a page, which is very beneficial for the zerocopy implemented by the upper layer. Signed-off-by: Xuan Zhuo Signed-off-by: Heng Qi Reviewed-by: Kangjie Xu --- v7: 1. Fix some presentation issues. 2. Use "split transport header". @Jason Wang 3. Clarify some paragraphs. @Cornelia Huck 4. determine the device what to do if it does not perform header split on a packet. v6: 1. Fix some syntax issues. @Cornelia Huck 2. Clarify some paragraphs. @Cornelia Huck 3. Determine the device what to do if it does not perform header split on a packet. v5: 1. Determine when hdr_len is credible in the process of rx 2. Clean up the use of buffers and descriptors 3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge v4: 1. fix typo @Cornelia Huck @Jason Wang 2. do not split header for IP fragmentation packet. @Jason Wang v3: 1. Fix some syntax issues 2. Fix some terminology issues 3. It is not unified with ip alignment, so ip alignment is not included 4. Make it clear that the device must support four types, in the case of successful negotiation. conformance.tex | 2 ++ content.tex | 102 2 files changed, 104 insertions(+) diff --git a/conformance.tex b/conformance.tex index 2b86fc6..4e2b82e 100644 --- a/conformance.tex +++ b/conformance.tex @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) } \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header} \end{itemize} \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance} @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header} \end{itemize} \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance} diff --git a/content.tex b/content.tex index e863709..5676da9 100644 --- a/content.tex +++ b/content.tex @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control channel. +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting +the transport header and the payload. + \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing. \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets. @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ. \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6. \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ. +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ. \end{description} \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O #define VIRTIO_NET_HDR_F_NEEDS_CSUM1 #define VIRTIO_NET_HDR_F_DATA_VALID2 #define VIRTIO_NET_HDR_F_RSC_INFO
[virtio-dev] Re: [PATCH v7] virtio_net: support split header
在 2022/9/2 下午2:41, Michael S. Tsirkin 写道: On Fri, Sep 02, 2022 at 02:21:04PM +0800, Jason Wang wrote: On Tue, Aug 16, 2022 at 5:35 PM Heng Qi wrote: From: Xuan Zhuo The purpose of this feature is to split the header and the payload of the packet. |receive buffer| | 0th descriptor | 1th descriptor| | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | We can use a buffer plus a separate page when allocating the receive buffer. In this way, we can ensure that all payloads can be independently in a page, which is very beneficial for the zerocopy implemented by the upper layer. Signed-off-by: Xuan Zhuo Signed-off-by: Heng Qi Reviewed-by: Kangjie Xu --- v7: 1. Fix some presentation issues. 2. Use "split transport header". @Jason Wang 3. Clarify some paragraphs. @Cornelia Huck 4. determine the device what to do if it does not perform header split on a packet. v6: 1. Fix some syntax issues. @Cornelia Huck 2. Clarify some paragraphs. @Cornelia Huck 3. Determine the device what to do if it does not perform header split on a packet. v5: 1. Determine when hdr_len is credible in the process of rx 2. Clean up the use of buffers and descriptors 3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge v4: 1. fix typo @Cornelia Huck @Jason Wang 2. do not split header for IP fragmentation packet. @Jason Wang v3: 1. Fix some syntax issues 2. Fix some terminology issues 3. It is not unified with ip alignment, so ip alignment is not included 4. Make it clear that the device must support four types, in the case of successful negotiation. conformance.tex | 2 ++ content.tex | 102 2 files changed, 104 insertions(+) diff --git a/conformance.tex b/conformance.tex index 2b86fc6..4e2b82e 100644 --- a/conformance.tex +++ b/conformance.tex @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) } \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header} \end{itemize} \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance} @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header} \end{itemize} \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance} diff --git a/content.tex b/content.tex index e863709..5676da9 100644 --- a/content.tex +++ b/content.tex @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control channel. +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting +the transport header and the payload. + \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing. \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets. @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ. \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6. \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ. +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ. \end{description} \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O #define VIRTIO_NET_HDR_F_NEEDS_CSUM1 #define VIRTIO_NET_HDR_F
[virtio-dev] Re: [PATCH v7] virtio_net: support split header
在 2022/9/2 下午2:21, Jason Wang 写道: On Tue, Aug 16, 2022 at 5:35 PM Heng Qi wrote: From: Xuan Zhuo The purpose of this feature is to split the header and the payload of the packet. |receive buffer| | 0th descriptor | 1th descriptor| | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | We can use a buffer plus a separate page when allocating the receive buffer. In this way, we can ensure that all payloads can be independently in a page, which is very beneficial for the zerocopy implemented by the upper layer. Signed-off-by: Xuan Zhuo Signed-off-by: Heng Qi Reviewed-by: Kangjie Xu --- v7: 1. Fix some presentation issues. 2. Use "split transport header". @Jason Wang 3. Clarify some paragraphs. @Cornelia Huck 4. determine the device what to do if it does not perform header split on a packet. v6: 1. Fix some syntax issues. @Cornelia Huck 2. Clarify some paragraphs. @Cornelia Huck 3. Determine the device what to do if it does not perform header split on a packet. v5: 1. Determine when hdr_len is credible in the process of rx 2. Clean up the use of buffers and descriptors 3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge v4: 1. fix typo @Cornelia Huck @Jason Wang 2. do not split header for IP fragmentation packet. @Jason Wang v3: 1. Fix some syntax issues 2. Fix some terminology issues 3. It is not unified with ip alignment, so ip alignment is not included 4. Make it clear that the device must support four types, in the case of successful negotiation. conformance.tex | 2 ++ content.tex | 102 2 files changed, 104 insertions(+) diff --git a/conformance.tex b/conformance.tex index 2b86fc6..4e2b82e 100644 --- a/conformance.tex +++ b/conformance.tex @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) } \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header} \end{itemize} \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance} @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header} \end{itemize} \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance} diff --git a/content.tex b/content.tex index e863709..5676da9 100644 --- a/content.tex +++ b/content.tex @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control channel. +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting +the transport header and the payload. + \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing. \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets. @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ. \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6. \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ. +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ. \end{description} \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O #define VIRTIO_NET_HDR_F_NEEDS_CSUM1 #define VIRTIO_NET_HDR_F_DATA_VALID2 #define VIRTIO_NET_HDR_F_RSC_INFO
Re: [virtio-dev] [PATCH v7] virtio_net: support split header
在 2022/8/16 下午5:34, Heng Qi 写道: From: Xuan Zhuo The purpose of this feature is to split the header and the payload of the packet. |receive buffer| | 0th descriptor | 1th descriptor| | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | We can use a buffer plus a separate page when allocating the receive buffer. In this way, we can ensure that all payloads can be independently in a page, which is very beneficial for the zerocopy implemented by the upper layer. Signed-off-by: Xuan Zhuo Signed-off-by: Heng Qi Reviewed-by: Kangjie Xu --- v7: 1. Fix some presentation issues. 2. Use "split transport header". @Jason Wang 3. Clarify some paragraphs. @Cornelia Huck 4. determine the device what to do if it does not perform header split on a packet. v6: 1. Fix some syntax issues. @Cornelia Huck 2. Clarify some paragraphs. @Cornelia Huck 3. Determine the device what to do if it does not perform header split on a packet. v5: 1. Determine when hdr_len is credible in the process of rx 2. Clean up the use of buffers and descriptors 3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge v4: 1. fix typo @Cornelia Huck @Jason Wang 2. do not split header for IP fragmentation packet. @Jason Wang v3: 1. Fix some syntax issues 2. Fix some terminology issues 3. It is not unified with ip alignment, so ip alignment is not included 4. Make it clear that the device must support four types, in the case of successful negotiation. conformance.tex | 2 ++ content.tex | 102 2 files changed, 104 insertions(+) diff --git a/conformance.tex b/conformance.tex index 2b86fc6..4e2b82e 100644 --- a/conformance.tex +++ b/conformance.tex @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) } \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header} \end{itemize} \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance} @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header} \end{itemize} \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance} diff --git a/content.tex b/content.tex index e863709..5676da9 100644 --- a/content.tex +++ b/content.tex @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control channel. +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting +the transport header and the payload. + \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing. \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets. @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ. \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6. \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ. +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ. \end{description} \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O #define VIRTIO_NET_HDR_F_NEEDS_CSUM1 #define VIRTIO_NET_HDR_F_DATA_VALID2 #define VIRTIO_NET_HDR_F_RSC_INFO 4 +#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER 8 u8
Re: [virtio-dev] [PATCH v7] virtio_net: support split header
在 2022/8/16 下午5:34, Heng Qi 写道: From: Xuan Zhuo The purpose of this feature is to split the header and the payload of the packet. |receive buffer| | 0th descriptor | 1th descriptor| | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | We can use a buffer plus a separate page when allocating the receive buffer. In this way, we can ensure that all payloads can be independently in a page, which is very beneficial for the zerocopy implemented by the upper layer. Signed-off-by: Xuan Zhuo Signed-off-by: Heng Qi Reviewed-by: Kangjie Xu --- v7: 1. Fix some presentation issues. 2. Use "split transport header". @Jason Wang 3. Clarify some paragraphs. @Cornelia Huck 4. determine the device what to do if it does not perform header split on a packet. v6: 1. Fix some syntax issues. @Cornelia Huck 2. Clarify some paragraphs. @Cornelia Huck 3. Determine the device what to do if it does not perform header split on a packet. v5: 1. Determine when hdr_len is credible in the process of rx 2. Clean up the use of buffers and descriptors 3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge v4: 1. fix typo @Cornelia Huck @Jason Wang 2. do not split header for IP fragmentation packet. @Jason Wang v3: 1. Fix some syntax issues 2. Fix some terminology issues 3. It is not unified with ip alignment, so ip alignment is not included 4. Make it clear that the device must support four types, in the case of successful negotiation. conformance.tex | 2 ++ content.tex | 102 2 files changed, 104 insertions(+) diff --git a/conformance.tex b/conformance.tex index 2b86fc6..4e2b82e 100644 --- a/conformance.tex +++ b/conformance.tex @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) } \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header} \end{itemize} \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance} @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header} \end{itemize} \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance} diff --git a/content.tex b/content.tex index e863709..5676da9 100644 --- a/content.tex +++ b/content.tex @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control channel. +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting +the transport header and the payload. + \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing. \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets. @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ. \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6. \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ. +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ. \end{description} \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O #define VIRTIO_NET_HDR_F_NEEDS_CSUM1 #define VIRTIO_NET_HDR_F_DATA_VALID2 #define VIRTIO_NET_HDR_F_RSC_INFO 4 +#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER 8 u8
Re: [virtio-dev] [PATCH v7] virtio_net: support split header
在 2022/8/25 下午10:22, Cornelia Huck 写道: On Tue, Aug 16 2022, Heng Qi wrote: From: Xuan Zhuo The purpose of this feature is to split the header and the payload of the packet. |receive buffer| | 0th descriptor | 1th descriptor| | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | We can use a buffer plus a separate page when allocating the receive buffer. In this way, we can ensure that all payloads can be independently in a page, which is very beneficial for the zerocopy implemented by the upper layer. Signed-off-by: Xuan Zhuo Signed-off-by: Heng Qi Reviewed-by: Kangjie Xu --- v7: 1. Fix some presentation issues. 2. Use "split transport header". @Jason Wang 3. Clarify some paragraphs. @Cornelia Huck 4. determine the device what to do if it does not perform header split on a packet. v6: 1. Fix some syntax issues. @Cornelia Huck 2. Clarify some paragraphs. @Cornelia Huck 3. Determine the device what to do if it does not perform header split on a packet. v5: 1. Determine when hdr_len is credible in the process of rx 2. Clean up the use of buffers and descriptors 3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge v4: 1. fix typo @Cornelia Huck @Jason Wang 2. do not split header for IP fragmentation packet. @Jason Wang v3: 1. Fix some syntax issues 2. Fix some terminology issues 3. It is not unified with ip alignment, so ip alignment is not included 4. Make it clear that the device must support four types, in the case of successful negotiation. conformance.tex | 2 ++ content.tex | 102 2 files changed, 104 insertions(+) I do not have any further comments on the change, let's see what the networking folks think. Okay. Thanks for your review. [Do we require patches to be posted to virtio-comment, or is virtio-dev enough? I'm a bit unsure right now.] We can then consider posting patches to virtio-comment.
Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
On Sat, Oct 08, 2022 at 12:37:45PM +0800, Jason Wang wrote: > On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin wrote: > > > > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote: > > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin > > > wrote: > > > > > > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote: > > > > > > Jason I think the issue with previous proposals is that they > > > > > > conflict > > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the > > > > > > driver flexibility in arranging the packet in memory is benefitial. > > > > > > > > > > > > > > > Yes, but I didn't found how it can conflict the any_layout. Device > > > > > can just > > > > > to not split the header when the layout doesn't fit for header > > > > > splitting. > > > > > (And this seems the case even if we're using buffers). > > > > > > > > Well spec says: > > > > > > > > indicates to both the device and the driver that no > > > > assumptions were made about framing. > > > > > > > > if device assumes that descriptor boundaries are where > > > > driver wants packet to be stored that is clearly > > > > an assumption. > > > > > > Yes but what I want to say is, the device can choose to not split the > > > packet if the framing doesn't fit. Does it still comply with the above > > > description? > > > > > > Thanks > > > > The point of ANY_LAYOUT is to give drivers maximum flexibility. > > For example, if driver wants to split the header at some specific > > offset this is already possible without extra functionality. > > I'm not sure how this would work without the support from the device. > This probably can only work if: > > 1) the driver know what kind of packet it can receive > 2) protocol have fixed length of the header > > This is probably not true consider: > > 1) TCP and UDP have different header length > 2) IPv6 has an variable length of the header > > > > > > Let's keep it that way. > > > > Now, let's formulate what are some of the problems with the current way. > > > > > > > > A- mergeable buffers is even more flexible, since a single packet > > is built up of multiple buffers. And in theory device can > > choose arbitrary set of buffers to store a packet. > > So you could supply a small buffer for headers followed by a bigger > > one for payload, in theory even without any changes. > > Problem 1: However since this is not how devices currently operate, > > a feature bit would be helpful. > > How do we know the bigger buffer is sufficient for the packet? If we > try to allocate 64K (not sufficient for the future even) it breaks the > effort of the mergeable buffer: > > header buffer #1 > payload buffer #1 > header buffer #2 > payload buffer #2 > > Is the device expected to > > 1) fill payload in header buffer #2, this breaks the effort that we > want to make payload page aligned > 2) skip header buffer #2, in this case, the device assumes the framing > when it breaks any layout > > > > > Problem 2: Also, in the past we found it useful to be able to figure out > > whether > > packet fits in a single buffer without looking at the header. > > For this reason, we have this text: > > > > If a receive packet is spread over multiple buffers, the device > > MUST use all buffers but the last (i.e. the first > > \field{num_buffers} - > > 1 buffers) completely up to the full length of each buffer > > supplied by the driver. > > > > if we want to keep this optimization and allow using a separate > > buffer for headers, then I think we could rely on the feature bit > > from Problem 1 and just make an exception for the first buffer. > > Also num_buffers is then always >= 2, maybe state this to avoid > > confusion. > > > > > > > > > > > > B- without mergeable, there's no flexibility. In particular, there can > > not be uninitialized space between header and data. > > I had two questions > > 1) why is this not a problem of mergeable? There's no guarantee that > the header is just the length of what the driver allocates for header > buffer anyhow > > E.g the header length could be smaller than the header buffer, the > device still needs to skip part of the space in the header buffer. > > 2) it should be the responsibility of the driver to handle the > uninitialized space, it should do anything that is necessary for > security, more below > We've talked a bit more about split header so far, but there still seem to be some issues, so let's recap. 一、 Method Discussion Review In order to adapt to the Eric's tcp receive interface to achieve zero copy, header and payload are required to be stored separately, and the payload is stored in a paged alignment way. Therefore, we have discussed several options for split header as follows: 1: method A ( depend on the descriptor chain ) | receive buffer| | 0th descriptor
Re: [virtio-dev] Re: [PATCH v7] virtio_net: support split header
On Fri, Sep 09, 2022 at 08:47:57PM +0800, Xuan Zhuo wrote: > > hi >Qi also sent another same email today. Due to some email client problems, >this email has some confusion in the format, so we can discuss >under another one. > >https://lists.oasis-open.org/archives/virtio-dev/202209/msg00066.html > Yes. Due to some formatting issues with the mail client, I resent this new email which may be in a clear style for your review. Do you have more questions about the contents of this new email? Looking forward to your comments. Thanks. > > - > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v7] virtio_net: support split header
On Sun, Sep 04, 2022 at 04:27:38PM -0400, Michael S. Tsirkin wrote: > On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote: > > We need to clarify that the purpose of header splitting is to make all > > payloads > > can be independently in a page, which is beneficial for the zerocopy > > implemented by the upper layer. > > absolutely, pls add motivation. > > > If the driver does not enforce that the buffers submitted to the receiveq > > MUST > > be composed of at least two descriptors, then header splitting will become > > meaningless, > > or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated > > at this time. > > > > > > Thanks. > > > > > > > This seems very narrow and unecessarily wasteful of descriptors. > What is wrong in this: > > .. > > seems to achieve the goal of data in a separate page without > using extra descriptors. > > thus my proposal to replace the requirement of a separate > descriptor with an offset of data from beginning of > buffer that driver sets. > We have carefully considered your suggestion. Let's summarize the schemes we've considered before and now. 1. Scheme A ( refer to spec v7 ) We refer to spec v7 and earlier as scheme A for short. Review scheme A below: | receive buffer| | 0th descriptor | 1th descriptor | | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | We use a buffer plus a separate page when allocating the receive buffer. In this way, we can ensure that all payloads can be put independently in a page, which is very beneficial for the zerocopy implemented by the upper layer. Scheme A better solves the problem of headroom, tailroom and memory waste, but as you said, this solution relies on descriptor chain. 2. Scheme B ( refer to your suggestion ) Our rethinking approach is no longer based on descriptor chain. We refer to your proposed offset-based scheme as scheme B. As you suggested, scheme B gives the device a buffer, using offset to indicate where to place the payload. Like this: .. But how to apply for this buffer? Since we want the payload to be placed on a separate page, the method we consider is to directly alloc two pages from driver of contiguous memory. Then the beginning of this contiguous memory is used to store the headroom, and the contiguous memory after the headroom is directly handed over to the device. Similar to the following: [-- receive buffer(2 pages) --] [<first page ---><-- second page >] [<-> ..< payload >] ^^ || |pointer to device | | Driver reserved, the later part is filled 3. Scheme C (this sheme we have sent to you on September 7th, maybe you miss it.) Based on your previous suggestion, we also considered another new scheme C. This scheme is implemented based on mergeable buffer, filling a separate page each time. If the split header is negotiated and the packet can be successfully split by the device, the device needs to find at least two buffers, namely two pages, one for the virtio-net header and transport header, and the other for the payload. Like the following: | receive buffer1(page) | receive buffer2 (page) | | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | At the same time, if XDP is considered, then the device needs to add headroom at the beginning of receive buffer1 when receiving packets, so that the driver can process programs similar to XDP. In order to solve this problem, scheme C introduce an offset which requires the device to write data from the offset to receive buffer1, like the following: | receive buffer (page) | receive buffer (page) | | <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | ^ | pointer to device 4. Summarize Then we simply compare the advantages and disadvantages of scheme A(spec v7), scheme B (offset buffer(2 pages)) and scheme C (based on mergeable buffer): 1). descriptor chain: i) A depends on desciptor chain; ii) B, C do not depend on desciptor chain. 2). page alloc i) B fills with two consecutive pages, which causes a great waste of memory for small packages such as arp; ii) C fills with a single page, slightly better than B. 3). Memory waste: i) The memory waste of scheme A is mainly the 0th descriptor that is skipped by the device; ii) When scheme B and
[virtio-dev] Re: [PATCH v7] virtio_net: support split header
在 2022/9/5 上午4:27, Michael S. Tsirkin 写道: On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote: We need to clarify that the purpose of header splitting is to make all payloads can be independently in a page, which is beneficial for the zerocopy implemented by the upper layer. absolutely, pls add motivation. If the driver does not enforce that the buffers submitted to the receiveq MUST be composed of at least two descriptors, then header splitting will become meaningless, or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated at this time. Thanks. This seems very narrow and unecessarily wasteful of descriptors. What is wrong in this: .. seems to achieve the goal of data in a separate page without using extra descriptors. thus my proposal to replace the requirement of a separate descriptor with an offset of data from beginning of buffer that driver sets. We have carefully considered your suggestion. We refer to spec v7 and earlier as scheme A for short. Review scheme A below: | receive buffer | | 0th descriptor | 1th descriptor | | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | We use a buffer plus a separate page when allocating the receive buffer. In this way, we can ensure that all payloads can be independently in a page, which is very beneficial for the zerocopy implemented by the upper layer. scheme A better solves the problem of headroom, tailroom and memory waste, but as you said, this solution relies on descriptor chain. Our rethinking approach is no longer based on or using descriptor chain. We refer to your proposed offset-based scheme as scheme B: As you suggested, scheme B gives the device a buffer, using offset to indicate where to place the payload like this: .. But how to apply for this buffer? Since we want the payload to be placed on a separate page, the method we consider is to directly apply to the driver for two pages of contiguous memory. Then the beginning of this contiguous memory is used to store the headroom, and the contiguous memory after the headroom is directly handed over to the device. similar to the following: <-- receive buffer(2 pages) -> <<-- first page --->< second page -->> <header>> Based on your previous suggestion, we also considered another new scheme C. This scheme is implemented based on mergeable buffer, filling a separate page each time. If the split header is negotiated and the packet can be successfully split by the device, the device needs to find at least two buffers, namely two pages, one for the virtio-net header and transport header, and the other for the data payload. Like the following: | receive buffer1(page) | receive buffer2 (page) | | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | At the same time, if XDP is considered, then the device needs to add headroom at the beginning of receive buffer1 when receiving packets, so that the driver can process programs similar to XDP. In order to solve this problem, can scheme C introduce an offset, which requires the device to write data from the offset position to receive buffer1, like the following: | receive buffer (page) | receive buffer (page) | | <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | Then we simply compare the advantages and disadvantages of scheme A(spec v7), scheme B (offset buffer(2 pages)) and scheme C (based on mergeable buffer): 1. desc chain: - A depends on desciptor chain; - B, C do not depend on desciptor chain. 2. page alloc - B fills two consecutive pages, which causes a great waste of memory for small packages such as arp; - C fills a single page, slightly better than B. 3. Memory waste: - The memory waste of scheme A is mainly the 0th descriptor that is skipped by the device; - When scheme B and scheme C successfully split the header, there is a huge waste of the first page, but the first page can be quickly released by copying. 4. headroom - The headrooms of plan A and plan B are reserved; - Scheme C requires the driver to set off to let the device skip off when using receive buffer1. 5. tailroom - When splitting the header, skb usually needs to store each independent page in the non-linear data area based on shinfo. - The tailroom of scheme A is reserved by itself; - Scheme B requires the driver to set the reserved padding area for the first receive buffer(2 pages) to use shinfo when the split header is not successfully executed; - Scheme C requires the driver to set max_len for the first receive buffer(page). Which plan do you prefer? --- Thanks. - To unsubscribe, e-mail: virtio-dev-unsubsc
Re: [virtio-dev] Re: [PATCH v7] virtio_net: support split header
在 2022/9/28 上午5:35, Michael S. Tsirkin 写道: On Wed, Sep 14, 2022 at 11:34:43AM +0800, Jason Wang wrote: 在 2022/9/9 20:38, Xuan Zhuo 写道: On Fri, 9 Sep 2022 07:15:02 -0400, "Michael S. Tsirkin" wrote: On Fri, Sep 09, 2022 at 03:41:54PM +0800, Heng Qi wrote: 在 2022/9/5 上午4:27, Michael S. Tsirkin 写道: On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote: We need to clarify that the purpose of header splitting is to make all payloads can be independently in a page, which is beneficial for the zerocopy implemented by the upper layer. absolutely, pls add motivation. If the driver does not enforce that the buffers submitted to the receiveq MUST be composed of at least two descriptors, then header splitting will become meaningless, or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated at this time. Thanks. This seems very narrow and unecessarily wasteful of descriptors. What is wrong in this: .. seems to achieve the goal of data in a separate page without using extra descriptors. thus my proposal to replace the requirement of a separate descriptor with an offset of data from beginning of buffer that driver sets. We have carefully considered your suggestion. We refer to spec v7 and earlier as scheme A for short. Review scheme A below: | receive buffer | | 0th descriptor | 1th descriptor | | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | We use a buffer plus a separate page when allocating the receive buffer. In this way, we can ensure that all payloads can be independently in a page, which is very beneficial for the zerocopy implemented by the upper layer. scheme A better solves the problem of headroom, tailroom and memory waste, but as you said, this solution relies on descriptor chain. Our rethinking approach is no longer based on or using descriptor chain. We refer to your proposed offset-based scheme as scheme B: As you suggested, scheme B gives the device a buffer, using offset to indicate where to place the payload like this: .. But how to apply for this buffer? Since we want the payload to be placed on a separate page, the method we consider is to directly apply to the driver for two pages of contiguous memory. Then the beginning of this contiguous memory is used to store the headroom, and the contiguous memory after the headroom is directly handed over to the device. similar to the following: <-- receive buffer(2 pages) -> <<-- first page --->< second page -->> <> Based on your previous suggestion, we also considered another new scheme C. This scheme is implemented based on mergeable buffer, filling a separate page each time. If the split header is negotiated and the packet can be successfully split by the device, the device needs to find at least two buffers, namely two pages, one for the virtio-net header and transport header, and the other for the data payload. Like the following: | receive buffer1(page) | receive buffer2 (page) | | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | At the same time, if XDP is considered, then the device needs to add headroom at the beginning of receive buffer1 when receiving packets, so that the driver can process programs similar to XDP. In order to solve this problem, can scheme C introduce an offset, which requires the device to write data from the offset position to receive buffer1, like the following: | receive buffer (page) | receive buffer (page) | | <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | And in fact, B and C both use an offset now, right? B: offset is used to get the position to place the payload. C: The offset is used to reserve some space for the device, which the driver can use as headroom. In order to make the payload page-aligned, we can only hand over the entire page to the device, so we cannot reserve some headroom in advance. For C, it might be better to do some tweak since mergeable buffer doesn't forbid using a descriptor chain as a single buffer. So if it's a descriptor chain we got back the method A by placing the payload in a dedicated buffer. If it's not placing the payload in an adjacent buffer. Thanks Let's find a way so devices do not care how descriptors are laid out. Hi, I'm not sure if you're replying to the wrong email (now in v7 thread), if not, I'm guessing you mean we continue to discuss a way for devices to not care how descriptors are laid out based on split header v7, and no longer consider the new options B and C? Thanks. Then we simply compare the advantages and disadvantages of scheme A(spec v7), scheme B (offset buffer(2 pages)) and scheme C (based on mergeable buffer): 1. desc chain: - A depends on desciptor chain; - B, C
[virtio-dev] Re: [PATCH v7] virtio_net: support split header
在 2022/9/5 下午3:52, Xuan Zhuo 写道: On Sun, 4 Sep 2022 16:31:59 -0400, "Michael S. Tsirkin" wrote: On Fri, Sep 02, 2022 at 04:58:16PM +0800, Heng Qi wrote: When VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the driver requires that the buffers submitted to receiveq MUST be composed of at least two descriptors, which means that each buffer the device gets is a descriptor chain, even if the device does not split the header for some packets. To store packet in the descriptor chain without header splitting by the device, the device MUST start with the first descriptor of the descriptor chain to store the packet, and MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}. Thanks. Descriptor chains will hurt performance badly. I understand the reasons for the performance impact here are: 1. Two buffers are used 2. One buffer occupies two descs This is the same as my understanding in the case of mergeable. We also need to pack the packets into two buffers, and a packet will eventually occupy two descs. How about simply making this feature depend on mergeable buffers? Then we have a separate buffer for the header and this works cleanly. Under mergeable, each buffer is independent, and the split header requires two unequal descs. If we implement it based on mergeable, then consider the scenario of tcp zerocopy, when we fill receive vq, each buffer is an separate page, and if we use an separate buffer to save the header, then this is a waste, we may have to copy at the driver layer. @Qi Do you think there will be other problems with this approach? Thanks. When we think about specs, we shouldn't be too distracted by the implementation. But when we did think about this, suppose the driver fills by page based on mergeable mode. in order to use the xdp program, the driver usually takes the beginning of a single page as the headroom, and fills the rest of the page into the virtqueue. Therefore, the empty buffer obtained by the device is always smaller than a page when we implement split header based on this mode, that is, the data load finally obtained by the driver is offset from the beginning of the page. This does not enjoy the benefits of zero copy. At the same time, since the header is always only more than 100 bytes, the page occupied by the header is a waste of the buffer. Thanks.
Re: [virtio-dev] Re: [PATCH] virtio_net: support split header
在 2022/8/4 下午9:50, Cornelia Huck 写道: On Thu, Aug 04 2022, Heng Qi wrote: 在 2022/8/4 下午2:27, Jason Wang 写道: On Mon, Aug 1, 2022 at 2:59 PM Heng Qi wrote: @@ -3820,9 +3826,13 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network driver MUST NOT use the \field{csum_start} and \field{csum_offset}. If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have -been negotiated, the driver MAY use \field{hdr_len} only as a hint about the +been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} +is not set, the driver MAY use \field{hdr_len} only as a hint about the transport header size. -The driver MUST NOT rely on \field{hdr_len} to be correct. + +If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is not set, the driver +MUST NOT rely on \field{hdr_len} to be correct. I think we should keep the above description as-is. For whatever case, the driver must not trust the metadata set by the device and must perform necessary sanity tests on them. My idea is to keep the current description as it is, but to emphasize in the next version: "If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set, the driver MAY treat the \field{hdr_len} as the length of the protocol header inside the first descriptor." Just to be clear, you suggest using "If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have been negotiated, the driver MAY use \field{hdr_len} only as a hint about the transport header size. The driver MUST NOT rely on \field{hdr_len} to be correct. If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set, the driver MAY treat the \field{hdr_len} as the length of the protocol header inside the first descriptor." Yes. I will use the above description to make it clearer in the next version. (Maybe "...the driver MAY use \field{hdr_len} as a hint about the length of the protocol header..."? It's still not reliable, right?) \field{hdr_len} is unreliable when VIRTIO_NET_F_SPLIT_HEADER is not negotiated. If VIRTIO_NET_F_SPLIT_HEADER is negotiated, "split header" MAY perform the split from the IP layer, so the protocol header and the transport header are different. so I think the "...the driver MAY use \field{hdr_len} only as a hint about the transport header size..." paragraph can be left as-is.
Re: [virtio-dev] Re: [PATCH] virtio_net: support split header
在 2022/8/9 下午5:18, Jason Wang 写道: On Thu, Aug 4, 2022 at 8:48 PM Heng Qi wrote: 在 2022/8/4 下午2:27, Jason Wang 写道: On Mon, Aug 1, 2022 at 2:59 PM Heng Qi wrote: From: Xuan Zhuo The purpose of this feature is to split the header and the payload of the packet. |receive buffer| | 0th descriptor | 1th descriptor| | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | We can use a buffer plus a separate page when allocating the receive buffer. In this way, we can ensure that all payloads can be independently in a page, which is very beneficial for the zerocopy implemented by the upper layer. Signed-off-by: Xuan Zhuo Signed-off-by: Heng Qi Reviewed-by: Kangjie Xu --- v6: 1. Fix some syntax issues. @Cornelia Huck 2. Clarify some paragraphs. @Cornelia Huck 3. Determine the device what to do if it does not perform header split on a packet. v5: 1. Determine when hdr_len is credible in the process of rx 2. Clean up the use of buffers and descriptors 3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge v4: 1. fix typo @Cornelia Huck @Jason Wang 2. do not split header for IP fragmentation packet. @Jason Wang v3: 1. Fix some syntax issues 2. Fix some terminology issues 3. It is not unified with ip alignment, so ip alignment is not included 4. Make it clear that the device must support four types, in the case of successful negotiation. conformance.tex | 2 + content.tex | 111 ++-- 2 files changed, 109 insertions(+), 4 deletions(-) diff --git a/conformance.tex b/conformance.tex index 2b86fc6..bd0f463 100644 --- a/conformance.tex +++ b/conformance.tex @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) } \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header} \end{itemize} \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance} @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header} \end{itemize} \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance} diff --git a/content.tex b/content.tex index e863709..74c36fe 100644 --- a/content.tex +++ b/content.tex @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control channel. +\item[VIRTIO_NET_F_SPLIT_HEADER (52)] Device supports splitting the protocol +header and the payload. + \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing. \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets. @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ. \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6. \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ. +\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ. \end{description} \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O #define VIRTIO_NET_HDR_F_NEEDS_CSUM1 #define VIRTIO_NET_HDR_F_DATA_VALID2 #define VIRTIO_NET_HDR_F_RSC_INFO 4 +#define VIRTIO_NET_HDR_F_SPLIT_HEADER 8 u8 flags; #define VIRTIO_NET_HDR_GSO_NONE0 #define VIRTIO_NET_HDR_GSO_TCPV4 1 @@ -3799,9 +3804,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network n
[virtio-dev] [PATCH v7] virtio_net: support split header
From: Xuan Zhuo The purpose of this feature is to split the header and the payload of the packet. |receive buffer| | 0th descriptor | 1th descriptor| | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | We can use a buffer plus a separate page when allocating the receive buffer. In this way, we can ensure that all payloads can be independently in a page, which is very beneficial for the zerocopy implemented by the upper layer. Signed-off-by: Xuan Zhuo Signed-off-by: Heng Qi Reviewed-by: Kangjie Xu --- v7: 1. Fix some presentation issues. 2. Use "split transport header". @Jason Wang 3. Clarify some paragraphs. @Cornelia Huck 4. determine the device what to do if it does not perform header split on a packet. v6: 1. Fix some syntax issues. @Cornelia Huck 2. Clarify some paragraphs. @Cornelia Huck 3. Determine the device what to do if it does not perform header split on a packet. v5: 1. Determine when hdr_len is credible in the process of rx 2. Clean up the use of buffers and descriptors 3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge v4: 1. fix typo @Cornelia Huck @Jason Wang 2. do not split header for IP fragmentation packet. @Jason Wang v3: 1. Fix some syntax issues 2. Fix some terminology issues 3. It is not unified with ip alignment, so ip alignment is not included 4. Make it clear that the device must support four types, in the case of successful negotiation. conformance.tex | 2 ++ content.tex | 102 2 files changed, 104 insertions(+) diff --git a/conformance.tex b/conformance.tex index 2b86fc6..4e2b82e 100644 --- a/conformance.tex +++ b/conformance.tex @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) } \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header} \end{itemize} \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance} @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header} \end{itemize} \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance} diff --git a/content.tex b/content.tex index e863709..5676da9 100644 --- a/content.tex +++ b/content.tex @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control channel. +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting +the transport header and the payload. + \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing. \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets. @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ. \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6. \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ. +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ. \end{description} \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O #define VIRTIO_NET_HDR_F_NEEDS_CSUM1 #define VIRTIO_NET_HDR_F_DATA_VALID2 #define VIRTIO_NET_HDR_F_RSC_INFO 4 +#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER 8 u8 flags; #define VIRTIO_NET_HDR_GSO_NONE0 #define VIRTIO_N
Re: [virtio-dev] Re: [PATCH] virtio_net: support split header
在 2022/8/4 下午2:27, Jason Wang 写道: On Mon, Aug 1, 2022 at 2:59 PM Heng Qi wrote: From: Xuan Zhuo The purpose of this feature is to split the header and the payload of the packet. |receive buffer| | 0th descriptor | 1th descriptor| | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | We can use a buffer plus a separate page when allocating the receive buffer. In this way, we can ensure that all payloads can be independently in a page, which is very beneficial for the zerocopy implemented by the upper layer. Signed-off-by: Xuan Zhuo Signed-off-by: Heng Qi Reviewed-by: Kangjie Xu --- v6: 1. Fix some syntax issues. @Cornelia Huck 2. Clarify some paragraphs. @Cornelia Huck 3. Determine the device what to do if it does not perform header split on a packet. v5: 1. Determine when hdr_len is credible in the process of rx 2. Clean up the use of buffers and descriptors 3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge v4: 1. fix typo @Cornelia Huck @Jason Wang 2. do not split header for IP fragmentation packet. @Jason Wang v3: 1. Fix some syntax issues 2. Fix some terminology issues 3. It is not unified with ip alignment, so ip alignment is not included 4. Make it clear that the device must support four types, in the case of successful negotiation. conformance.tex | 2 + content.tex | 111 ++-- 2 files changed, 109 insertions(+), 4 deletions(-) diff --git a/conformance.tex b/conformance.tex index 2b86fc6..bd0f463 100644 --- a/conformance.tex +++ b/conformance.tex @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) } \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header} \end{itemize} \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance} @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header} \end{itemize} \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance} diff --git a/content.tex b/content.tex index e863709..74c36fe 100644 --- a/content.tex +++ b/content.tex @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control channel. +\item[VIRTIO_NET_F_SPLIT_HEADER (52)] Device supports splitting the protocol +header and the payload. + \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing. \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets. @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ. \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6. \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ. +\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ. \end{description} \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O #define VIRTIO_NET_HDR_F_NEEDS_CSUM1 #define VIRTIO_NET_HDR_F_DATA_VALID2 #define VIRTIO_NET_HDR_F_RSC_INFO 4 +#define VIRTIO_NET_HDR_F_SPLIT_HEADER 8 u8 flags; #define VIRTIO_NET_HDR_GSO_NONE0 #define VIRTIO_NET_HDR_GSO_TCPV4 1 @@ -3799,9 +3804,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network not set VIRTIO_NET_HDR_F_RSC_INFO bit in \field{flags}. If one of the VIRTIO_NET_F_GUES
[virtio-dev] Re: [PATCH] virtio_net: support split header
The content of the mail just sent is about "[PATCH v6] virtio_net: support split header". 在 2022/8/1 下午2:59, Heng Qi 写道: From: Xuan Zhuo The purpose of this feature is to split the header and the payload of the packet. |receive buffer| | 0th descriptor | 1th descriptor| | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | We can use a buffer plus a separate page when allocating the receive buffer. In this way, we can ensure that all payloads can be independently in a page, which is very beneficial for the zerocopy implemented by the upper layer. Signed-off-by: Xuan Zhuo Signed-off-by: Heng Qi Reviewed-by: Kangjie Xu --- v6: 1. Fix some syntax issues. @Cornelia Huck 2. Clarify some paragraphs. @Cornelia Huck 3. Determine the device what to do if it does not perform header split on a packet. v5: 1. Determine when hdr_len is credible in the process of rx 2. Clean up the use of buffers and descriptors 3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge v4: 1. fix typo @Cornelia Huck @Jason Wang 2. do not split header for IP fragmentation packet. @Jason Wang v3: 1. Fix some syntax issues 2. Fix some terminology issues 3. It is not unified with ip alignment, so ip alignment is not included 4. Make it clear that the device must support four types, in the case of successful negotiation. conformance.tex | 2 + content.tex | 111 ++-- 2 files changed, 109 insertions(+), 4 deletions(-) diff --git a/conformance.tex b/conformance.tex index 2b86fc6..bd0f463 100644 --- a/conformance.tex +++ b/conformance.tex @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) } \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header} \end{itemize} \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance} @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header} \end{itemize} \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance} diff --git a/content.tex b/content.tex index e863709..74c36fe 100644 --- a/content.tex +++ b/content.tex @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control channel. +\item[VIRTIO_NET_F_SPLIT_HEADER (52)] Device supports splitting the protocol +header and the payload. + \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing. \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets. @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ. \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6. \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ. +\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ. \end{description} \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O #define VIRTIO_NET_HDR_F_NEEDS_CSUM1 #define VIRTIO_NET_HDR_F_DATA_VALID2 #define VIRTIO_NET_HDR_F_RSC_INFO 4 +#define VIRTIO_NET_HDR_F_SPLIT_HEADER 8 u8 flags; #define VIRTIO_NET_HDR_GSO_NONE0 #define VIRTIO_NET_HDR_GSO_TCPV4 1 @@ -3799,9 +3804,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network not set VIRTIO_NET_HD
[virtio-dev] [PATCH] virtio_net: support split header
From: Xuan Zhuo The purpose of this feature is to split the header and the payload of the packet. |receive buffer| | 0th descriptor | 1th descriptor| | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | We can use a buffer plus a separate page when allocating the receive buffer. In this way, we can ensure that all payloads can be independently in a page, which is very beneficial for the zerocopy implemented by the upper layer. Signed-off-by: Xuan Zhuo Signed-off-by: Heng Qi Reviewed-by: Kangjie Xu --- v6: 1. Fix some syntax issues. @Cornelia Huck 2. Clarify some paragraphs. @Cornelia Huck 3. Determine the device what to do if it does not perform header split on a packet. v5: 1. Determine when hdr_len is credible in the process of rx 2. Clean up the use of buffers and descriptors 3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge v4: 1. fix typo @Cornelia Huck @Jason Wang 2. do not split header for IP fragmentation packet. @Jason Wang v3: 1. Fix some syntax issues 2. Fix some terminology issues 3. It is not unified with ip alignment, so ip alignment is not included 4. Make it clear that the device must support four types, in the case of successful negotiation. conformance.tex | 2 + content.tex | 111 ++-- 2 files changed, 109 insertions(+), 4 deletions(-) diff --git a/conformance.tex b/conformance.tex index 2b86fc6..bd0f463 100644 --- a/conformance.tex +++ b/conformance.tex @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State} \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) } \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header} \end{itemize} \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance} @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing} \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header} \end{itemize} \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance} diff --git a/content.tex b/content.tex index e863709..74c36fe 100644 --- a/content.tex +++ b/content.tex @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control channel. +\item[VIRTIO_NET_F_SPLIT_HEADER (52)] Device supports splitting the protocol +header and the payload. + \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing. \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets. @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ. \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6. \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ. +\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ. \end{description} \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O #define VIRTIO_NET_HDR_F_NEEDS_CSUM1 #define VIRTIO_NET_HDR_F_DATA_VALID2 #define VIRTIO_NET_HDR_F_RSC_INFO 4 +#define VIRTIO_NET_HDR_F_SPLIT_HEADER 8 u8 flags; #define VIRTIO_NET_HDR_GSO_NONE0 #define VIRTIO_NET_HDR_GSO_TCPV4 1 @@ -3799,9 +3804,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network not set VIRTIO_NET_HDR_F_RSC_INFO bit in \field{flags}. If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have -been negotiated, the device SHOULD set \field{hdr_len} to a value
[virtio-dev] Re: [virtio-comment] Re: [PATCH v7] virtio-net: support inner header hash
在 2023/1/9 下午7:39, Michael S. Tsirkin 写道: Btw this "are defined below" all over the place is just contributing to making the spec unnecesarily verbose. Simple "are:" will do. Sure. I'll fix it in the next version. Thanks. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v7] virtio-net: support inner header hash
On Tue, Jan 10, 2023 at 12:57:38AM -0500, Michael S. Tsirkin wrote: > On Tue, Jan 10, 2023 at 12:25:02AM -0500, Michael S. Tsirkin wrote: > > > This will give extra pressure on the management stack, e.g it requires > > > the device to have an out of spec way for introspection. > > > > > > Thanks > > > > As I tried to explain this is already the case. Feature bits do not > > describe device capabilities fully, some of them are in config space. > > To be precise, this does not necessarily require introspection, but > it does require management control over config space > such as supported hash types just like it has control over feature bits. > E.g. QEMU currently seems to hard-code these to > #define VIRTIO_NET_RSS_SUPPORTED_HASHES (VIRTIO_NET_RSS_HASH_TYPE_IPv4 | \ > VIRTIO_NET_RSS_HASH_TYPE_TCPv4 | \ > VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | \ > VIRTIO_NET_RSS_HASH_TYPE_IPv6 | \ > VIRTIO_NET_RSS_HASH_TYPE_TCPv6 | \ > VIRTIO_NET_RSS_HASH_TYPE_UDPv6 | \ > VIRTIO_NET_RSS_HASH_TYPE_IP_EX | \ > VIRTIO_NET_RSS_HASH_TYPE_TCP_EX | \ > VIRTIO_NET_RSS_HASH_TYPE_UDP_EX) > > but there's no reason not to give management control over these. Yes, QEMU has requirements for live migration: the PCI config space will be checked in get_pci_config_device(), and if src and dst are inconsistent, it will prompt that the live migration failed. In fact, this is also done within our group. Live migration requires that the two VMs have the same rss configuration, otherwise the migration will fail. Therefore, it seems that we can regularize the description of VIRTIO_NET_F_HASH_TUNNEL into "[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner header hash for tunnel-encapsulated packets.", and use different hash_types to help the migration determine whether it can succeed. Thanks. > > -- > MST - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v7] virtio-net: support inner header hash
在 2023/1/10 下午3:26, Heng Qi 写道: On Tue, Jan 10, 2023 at 12:57:38AM -0500, Michael S. Tsirkin wrote: On Tue, Jan 10, 2023 at 12:25:02AM -0500, Michael S. Tsirkin wrote: This will give extra pressure on the management stack, e.g it requires the device to have an out of spec way for introspection. Thanks As I tried to explain this is already the case. Feature bits do not describe device capabilities fully, some of them are in config space. To be precise, this does not necessarily require introspection, but it does require management control over config space such as supported hash types just like it has control over feature bits. E.g. QEMU currently seems to hard-code these to #define VIRTIO_NET_RSS_SUPPORTED_HASHES (VIRTIO_NET_RSS_HASH_TYPE_IPv4 | \ VIRTIO_NET_RSS_HASH_TYPE_TCPv4 | \ VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | \ VIRTIO_NET_RSS_HASH_TYPE_IPv6 | \ VIRTIO_NET_RSS_HASH_TYPE_TCPv6 | \ VIRTIO_NET_RSS_HASH_TYPE_UDPv6 | \ VIRTIO_NET_RSS_HASH_TYPE_IP_EX | \ VIRTIO_NET_RSS_HASH_TYPE_TCP_EX | \ VIRTIO_NET_RSS_HASH_TYPE_UDP_EX) but there's no reason not to give management control over these. Yes, QEMU has requirements for live migration: the PCI config space will be checked in get_pci_config_device(), and if src and dst are inconsistent, it will prompt that the live migration failed. To be clearer, I mean \filed{supported_hash_types} in structure virtio_net_config. Thanks. In fact, this is also done within our group. Live migration requires that the two VMs have the same rss configuration, otherwise the migration will fail. Therefore, it seems that we can regularize the description of VIRTIO_NET_F_HASH_TUNNEL into "[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner header hash for tunnel-encapsulated packets.", and use different hash_types to help the migration determine whether it can succeed. Thanks. -- MST This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscr...@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org List help: virtio-comment-h...@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v3] virtio_net: support inner header hash
在 2022/12/16 下午8:49, Michael S. Tsirkin 写道: On Mon, Dec 05, 2022 at 02:36:39PM +0800, Heng Qi wrote: @@ -4005,6 +4159,24 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network #define VIRTIO_NET_HASH_REPORT_UDPv6_EX9 \end{lstlisting} +If \field{hash_report} differs from VIRTIO_NET_HASH_REPORT_NONE, +\field{hash_report_tunnel} can report the type of the tunnel-encapsulated +packet to the driver over the inner header hash calculation. +Possible values that the device can report in \field{hash_report_tunnel} +are defined below: + +\begin{lstlisting} +#define VIRTIO_NET_HASH_REPORT_GRE 1 +#define VIRTIO_NET_HASH_REPORT_VXLAN 2 +#define VIRTIO_NET_HASH_REPORT_GENEVE 3 +\end{lstlisting} + +The values VIRTIO_NET_HASH_REPORT_GRE, VIRTIO_NET_HASH_REPORT_VXLAN and +VIRTIO_NET_HASH_REPORT_GENEVE correspond to VIRTIO_NET_HASH_TYPE_GRE_INNER, +VIRTIO_NET_HASH_TYPE_VXLAN_INNER and VIRTIO_NET_HASH_TYPE_GENEVE_INNER bits +of supported hash types defined in respectively +\ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}. + \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue} The driver uses the control virtqueue (if VIRTIO_NET_F_CTRL_VQ is If the new feature flag is negotiated, we need to spell out more clearly what are the rules for packets that are not encapsulated. Is hash_report_tunnel set to 0 Yes, we should. When the _TUNNEL feature is negotiated, for non-encapsulated packets we set \field{hash_report_tunnel} to 0. then? Another comment is that we keep repeating GRE/VXLAN/GENEVE too many times. Let's add a paragraph defining a concept e.g. a "tunnel" or "tunneled packets", explaining how they are handled at a high level, and then just refer to the tunnel everywhere. Let's also add external references to specifications documenting the relevant tunnel types. Ok, I'll try to make this clear. Thanks. -- 2.19.1.6.gb485710b - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] [PATCH] virtio_net: support low and high rate of notification coalescing sets
在 2022/12/21 下午7:48, Alvaro Karsz 写道: Hi, I want to know which one is better than NetDim(Coalesce Adaptive) in driver. I know Heng Qi's work in this. Thanks Why choose? we can have both. ethtool can handle both pkt_rate_low/pkt_rate_high and use_adaptive_rx_coalesce/use_adaptive_tx_coalesce. The adaptive algorithm can even use this feature to set low and high coalescing sets. Hi, all. NetDIM is currently a mature library in the kernel. It uses the number of bytes, PPS and interrupt rate as samples to make an action, and it is performed independently in tx or rx direction. Also, There will be an extra worker to help us send the configuration based on the control queue to avoid interrupting the softirq. Although the method of pkt_rate_{low, high} params does not conflict with dim, I seem to have some doubts that the colasce parameters of the rx and tx directions are determined at the same time based on pkt_rate alone, will this be a problem? Thanks. Alvaro - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] RE: [virtio-dev] [PATCH v7] virtio-net: support inner header hash
On Wed, Jan 18, 2023 at 11:45:39PM +, Parav Pandit wrote: > > > > From: virtio-dev@lists.oasis-open.org > > Sent: Wednesday, January 4, 2023 2:14 AM > > > If the tunnel is used to encapsulate the packets, the hash calculated using > > the > > outer header of the receive packets is always fixed for the same flow > > packets, > > i.e. they will be steered to the same receive queue. > > > +\item[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner > > +header hash for GRE, VXLAN and GENEVE tunnel-encapsulated packets. > > + > A device may not support all 3 at the same time. > Please remove mentioning tunneling protocols description from here. > Just say device support inner header hash ... Sorry for the late reply due to vacation. Good idea, Michael suggested doing the same. But we also discussed this issue: Early, we used a feature bit to force devices to support GRE and VXLAN (see https://lists.oasis-open.org/archives/virtio-dev/202211/msg00183.html ). Later Jason suggested to use VIRTIO_NET_F_HASH_TUNNEL instead of VIRTIO_NET_F_HASH_GRE_VXLAN_GENEVE_INNER (see https://lists.oasis-open.org/archives/virtio-dev/202212/msg00014.html ). Now Michael proposes to remove this list (see https://lists.oasis-open.org/archives/virtio-dev/202301/msg00079.html ), because the migration is based on the feature bit and hash types to determine whether the live migration is successful. > > An additional bit map somewhere else should say supported hash over different > tunneling types. > Yes, we use \field{supported_hash_types} to declare supported hash types. > [...] > > > +The device calculates the hash on the inner IPv4 packet of an > > +encapsulated packet according to 'Enabled hash types' bitmask as follows: > > +\begin{itemize} > > + \item If VIRTIO_NET_HASH_TYPE_TCPv4 is set and the encapsulated packet > > has an inner > > + TCPv4 header, the hash is calculated over the following fields: > > +\begin{itemsize} > > + \item inner Source IP address > > + \item inner Destination IP address > > + \item inner Source TCP port > > + \item inner Destination TCP port > > +\end{itemsize} > > + \item Else if VIRTIO_NET_HASH_TYPE_UDPv4 is set and the encapsulated > > packet has an > > + inner UDPv4 header, the hash is calculated over the following fields: > > +\begin{itemsize} > > + \item inner Source IP address > > + \item inner Destination IP address > > + \item inner Source UDP port > > + \item inner Destination UDP port > > +\end{itemize} > > + \item Else if VIRTIO_NET_HASH_TYPE_IPv4 is set, the hash is calculated > > over > > the > > + following fields: > > +\begin{itemsize} > > + \item inner Source IP address > > + \item inner Destination IP address > > +\end{itemsize} > > + \item Else the device does not calculate the hash \end{itemize} > > + > > +The device calculates the hash on the inner IPv6 packet without an > > +extension header of an encapsulated packet according to 'Enabled hash > > types' > > bitmask as follows: > > +\begin{itemize} > > + \item If VIRTIO_NET_HASH_TYPE_TCPv6 is set and the encapsulated packet > > has an inner > > + TCPv6 header, the hash is calculated over the following fields: > > +\begin{itemsize} > > + \item inner Source IPv6 address > > + \item inner Destination IPv6 address > > + \item inner Source TCP port > > + \item inner Destination TCP port > > +\end{itemsize} > > + \item Else if VIRTIO_NET_HASH_TYPE_UDPv6 is set and the encapsulated > > packet has an > > + inner UDPv6 header, the hash is calculated over the following fields: > > +\begin{itemsize} > > + \item inner Source IPv6 address > > + \item inner Destination IPv6 address > > + \item inner Source UDP port > > + \item inner Destination UDP port > > +\end{itemize} > > + \item Else if VIRTIO_NET_HASH_TYPE_IPv6 is set, the hash is calculated > > over > > the > > + following fields: > > +\begin{itemsize} > > + \item inner Source IPv6 address > > + \item inner Destination IPv6 address > > +\end{itemsize} > > + \item Else the device does not calculate the hash \end{itemize} > > + > > +The device calculates the hash on the inner IPv6 packet with an > > +extension header of an encapsulated packet according to 'Enabled hash > > types' > > bitmask as follows: > > +\begin{itemsize} > > + \item If VIRTIO_NET_HASH_TYPE_TCP_EX is set and the encapsulated packet > > has an inner > > + TCPv6 header, the hash is calculated over the following fields: > > +\begin{itemize} > > + \item Home address from the home address option in the inner IPv6 > > destination > > + options header. If the inner extension header is not > > present, use the > > + inner Source IPv6 address. > > + \item
[virtio-dev] Re: [PATCH 1/2] virtio_net: fix syntax errors
在 2022/11/9 下午9:23, Michael S. Tsirkin 写道: On Wed, Nov 09, 2022 at 07:35:14PM +0800, Heng Qi wrote: Please ignore this email. Thanks. ok. if 2/2 is still relevant pls post it separately. I sent a new patch set with a cover-letter, please see in https://lists.oasis-open.org/archives/virtio-dev/202211/msg00025.html. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] RE: [virtio-dev] [PATCH v7] virtio-net: support inner header hash
在 2023/2/2 上午11:55, Parav Pandit 写道: From: virtio-comm...@lists.oasis-open.org On Behalf Of Michael S. Tsirkin Sent: Wednesday, February 1, 2023 1:57 AM Also, this patch is adding two functionalities. 1. Inner header hash calculation of existing already defined hash types 2. outer header hash for new type for GRE,VXLAN,GENEVE. #1 should be in 1st patch. #2 should be in 2nd patch. This is better to review. Parav, you come to this discussion pretty late. Asking to split up the patch when it's v1/v2 is ok. Asking after others have already reviewed v6 is not you are making review easier for yourself but re-review harder for others who already have a mind map of the patch. In this case unless we really want to enable these separately (and frankly I don't see a good reason to) then splitting it up makes review more confusing. No. There is no need to enable it separately. It was hard to parse new inner type decoding addition which has close to zero relation to outer headers. As you say, it has some history, I don't have strong opinion to split. But going forward in subsequent work, it is better to see logical changes in multiple patches. It seems that we don't need to emphasize the outer header hash, which is the same behavior as usual when the inner header hash is not raised. Thanks. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] [PATCH v10] virtio-net: support the virtqueue coalescing moderation
在 2023/3/7 上午6:57, Michael S. Tsirkin 写道: On Thu, Mar 02, 2023 at 11:36:18AM +, David Edmondson wrote: +for an enabled transmit/receive virtqueue whose number is \field{vqn}. Should this now be "whose index is \field{vqn}"? Ugh. I guess we'll have to fix the number/index mess in the spec first. Parav you said you are looking into it? # Where virtqueue number and virtqueue index are used. 1. In the Virtqueue Configuration Section, use the virtqueue index: "Write the **virtqueue index** (first queue is 0) to queue_select." 2. Both descriptions are used separately in the Notification Section. 2.1 Here vqn is called virtqueue index: "When VIRTIO_F_NOTIFICATION_DATA has not been negotiated, the driver sends an available buffer notification to the device by writing the **16-bit virtqueue index** of this virtqueue to the Queue Notify address. ... le32 { vqn: 16; next_off : 15; next_wrap: 1; }; ... If VIRTIO_F_NOTIFICATION_DATA has not been negotiated, the driver MUST use the queue_notify_data value instead of the **virtqueue index**." 2.2 Here vqn is called virtqueue number: "When VIRTIO_F_NOTIFICATION_DATA has not been negotiated, the Notification data contains the **Virtqueue number**. ... be32 { vqn: 16; next_off : 15; next_wrap: 1; }; ... When VIRTIO_F_NOTIFICATION_DATA has not been negotiated, this notification involves sending the **virtqueue number** to the device (method depending on the transport). ... vqn -- **VQ number** to be notified." # 0-based index and 0-based number are used respectively in the RSS Section: 1. "Field unclassified_queue contains the **0-based index** of the receive virtqueue to place unclassified packets in. Index 0 corresponds to receiveq1." 2. "use the result as the index in the indirection table to get **0-based number** of destination receiveq (value of 0 corresponds to receiveq1)." \field{vqn} has been called '0-based virtqueue index' or '0-based virtqueue number', I think both seem to be friendly to readers, so what are your options? Thanks. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/3/8 下午10:39, Michael S. Tsirkin 写道: On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote: 在 2023/2/28 下午7:16, Michael S. Tsirkin 写道: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: If the tunnel is used to encapsulate the packets, the hash calculated using the outer header of the receive packets is always fixed for the same flow packets, i.e. they will be steered to the same receive queue. Wait a second. How is this true? Does not everyone stick the inner header hash in the outer source port to solve this? Yes, you are right. That's what we did before the inner header hash, but it has a performance penalty, which I'll explain below. For example geneve spec says: it is necessary for entropy from encapsulated packets to be exposed in the tunnel header. The most common technique for this is to use the UDP source port The end point of the tunnel called the gateway (with DPDK on top of it). 1. When there is no inner header hash, entropy can be inserted into the udp src port of the outer header of the tunnel, and then the tunnel packet is handed over to the host. The host needs to take out a part of the CPUs to parse the outer headers (but not drop them) to calculate the inner hash for the inner payloads, and then use the inner hash to forward them to another part of the CPUs that are responsible for processing. I don't get this part. Leave inner hashes to the guest inside the tunnel, why is your host doing this? Assuming that the same flow includes a unidirectional flow a->b, or a bidirectional flow a->b and b->a, such flow may be out of order when processed by the gateway(DPDK): 1. In unidirectional mode, if the same flow is switched to another gateway for some reason, resulting in different outer IP address, then this flow may be processed by different CPUs after reaching the host if there is no inner hash. So after the host receives the flow, first use the forwarding CPUs to parse the inner hash, and then use the hash to ensure that the flow is processed by the same CPU. 2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may go to gateway 2. In order to ensure that the same flow is processed by the same CPU, we still need the forwarding CPUs to parse the real inner hash(here, the hash key needs to be replaced with a symmetric hash key). 1). During this process, the CPUs on the host is divided into two parts, one part is used as a forwarding node to parse the outer header, and the CPU utilization is low. Another part handles packets. Some overhead is clearly involved in *sending* packets - to calculate the hash and stick it in the port number. This is, however, a separate problem and if you want to solve it then my suggestion would be to teach the *transmit* side about GRE offloads, so it can fill the source port in the card. 2). The entropy of the source udp src port is not enough, that is, the queue is not widely distributed. how isn't it enough? 16 bit is enough to cover all vqs ... A 5-tuple brings more entropy than a single port, doesn't it? In fact, the inner hash of the physical network card used by the business team is indeed better than the udp port number of the outer header we modify now, but they did not give me the data. 2. When there is an inner header hash, the gateway will directly help parse the outer header, and use the inner 5 tuples to calculate the inner hash. The tunneled packet is then handed over to the host. 1) All the CPUs of the host are used to process data packets, and there is no need to use some CPUs to forward and parse the outer header. You really have to parse the outer header anyway, otherwise there's no tunneling. Unless you want to teach virtio to implement tunneling in hardware, which is something I'd find it easier to get behind. There is no need to parse the outer header twice, because we use shared memory. 2) The entropy of the original quintuple is sufficient, and the queue is widely distributed. It's exactly the same entropy, why would it be better? In fact you are taking out the outer hash entropy making things worse. I don't get the point, why the entropy of the inner 5-tuple and the outer tunnel header is the same, multiple streams have the same outer header. Thanks. Thanks. same goes for vxlan did not check further. so what is the problem? and which tunnel types actually suffer from the problem? This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscr...@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org List help: virtio-comment-h...@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www
[virtio-dev] Re: [virtio-comment] RE: [virtio-dev] RE: [PATCH v10] virtio-net: support the virtqueue coalescing moderation
在 2023/3/9 上午6:30, Parav Pandit 写道: From: virtio-dev@lists.oasis-open.org On Behalf Of Heng Qi Sent: Thursday, March 2, 2023 10:27 PM I remember we discussed that instead of mentioning each individual field, better to describe the whole structure being read-only or write-only. Consider the following scenarios: 1. A read-only field of the structure virtio_net_ctrl_coal is extended for CTRL_NOTF_COAL_RX/TX_SET to get some extra information A set command cannot extend the struct virtio_net_ctrl_coal, particularly for read-only and partially for write-only. This would mean that for the tiny number of bytes, an additional dma descriptor is to be allocated with read/write-only permissions. It would be inefficient for the driver to do so for the SET command to have vqn as write-only, reserved as read-only, rest fields as write-only dma attributes. As I think more, I think the whole set command fields should be read-only for device. Better to describe it this way instead of splitting individual fields. This way driver can just do a single DMA allocation with read-only attributes for set cmd. Get command doesn’t have any choice today the way CVQ is structured to it lives with the limitation. I think it is reasonable and will be revised in the next version. Looks good, however you have well covered in the device normative statements. So possibly it can be removed from here. I tend to keep this, as we have done in the past, we can have normative descriptions and the corresponding non-normative descriptions. Ok. but please revisit if the description can be simpler text than the normative lines. Ok. Thanks. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On Wed, Mar 15, 2023 at 10:57:40AM -0400, Michael S. Tsirkin wrote: > On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote: > > > > > > 在 2023/3/15 下午7:58, Michael S. Tsirkin 写道: > > > On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote: > > > > > > > > > > > > 在 2023/3/10 上午3:36, Michael S. Tsirkin 写道: > > > > > On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote: > > > > > > 在 2023/3/8 下午10:39, Michael S. Tsirkin 写道: > > > > > > > On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote: > > > > > > > > 在 2023/2/28 下午7:16, Michael S. Tsirkin 写道: > > > > > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: > > > > > > > > > > If the tunnel is used to encapsulate the packets, the hash > > > > > > > > > > calculated > > > > > > > > > > using the outer header of the receive packets is always > > > > > > > > > > fixed for the > > > > > > > > > > same flow packets, i.e. they will be steered to the same > > > > > > > > > > receive queue. > > > > > > > > > Wait a second. How is this true? Does not everyone stick the > > > > > > > > > inner header hash in the outer source port to solve this? > > > > > > > > Yes, you are right. That's what we did before the inner header > > > > > > > > hash, but it > > > > > > > > has a performance penalty, which I'll explain below. > > > > > > > > > > > > > > > > > For example geneve spec says: > > > > > > > > > > > > > > > > > >it is necessary for entropy from encapsulated packets > > > > > > > > > to be > > > > > > > > >exposed in the tunnel header. The most common > > > > > > > > > technique for this is > > > > > > > > >to use the UDP source port > > > > > > > > The end point of the tunnel called the gateway (with DPDK on > > > > > > > > top of it). > > > > > > > > > > > > > > > > 1. When there is no inner header hash, entropy can be inserted > > > > > > > > into the udp > > > > > > > > src port of the outer header of the tunnel, > > > > > > > > and then the tunnel packet is handed over to the host. The host > > > > > > > > needs to > > > > > > > > take out a part of the CPUs to parse the outer headers (but not > > > > > > > > drop them) > > > > > > > > to calculate the inner hash for the inner payloads, > > > > > > > > and then use the inner > > > > > > > > hash to forward them to another part of the CPUs that are > > > > > > > > responsible for > > > > > > > > processing. > > > > > > > I don't get this part. Leave inner hashes to the guest inside the > > > > > > > tunnel, why is your host doing this? > > > > > > > > Let's simplify some details and take a fresh look at two different > > > > scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2). > > > > > > > > 1. In Scenario1, we can improve the processing performance of the same > > > > flow > > > > by implementing inner symmetric hashing. > > > > > > > > This is because even though client1 and client2 communicate > > > > bidirectionally > > > > through the same flow, their data may pass > > > > > > > > through and be encapsulated by different tunnels, resulting in the same > > > > flow > > > > being hashed to different queues and processed by different CPUs. > > > > > > > > To ensure consistency and optimized processing, we need to parse out the > > > > inner header and compute a symmetric hash on it using a special rss key. > > > > > > > > Sorry for not mentioning the inner symmetric hash before, in order to > > > > prevent the introduction of more concepts, but it is indeed a kind of > > > > inner > > > > hash. > > > If parts of a flow go through different tunnels wo
[virtio-dev] Re: [PATCH v10] virtio-net: support inner header hash
在 2023/3/15 下午11:09, Michael S. Tsirkin 写道: On Wed, Mar 15, 2023 at 09:19:43PM +0800, Heng Qi wrote: Any encapsulation technology that includes UDP/L4 header likely do not prefer based on the inner header. This is because the outer header src port entropy is added based on the inner header. I was not able to follow the discussion in v9 that you had with Michael. Did you conclude if this is needed for vxlan too? If not, for now it may be better to skip vxlan and nvegre as they inherently have unique outer header UDP src port based on the inner header. Symmetric hashing ignores the order of the five-tuples when calculating the hash, that is, using (a1,a2,p1,p2) and (a2,a1,p2,p1) respectively can calculate the same hash. There is a scenario that the two directions client1->client2 and client2->client1 of the same flow may pass through different tunnels. In order to allow the data in two directions to be processed by the same CPU, we need to calculate a symmetric hash based on the inner packet header. Sorry I didn't mention this earlier just to avoid introducing the concept of symmetric hashing. But the hash is already there in the port. Is it then maybe just the question of ignoring the IP addresses when hashing? We do not ignore the IP address, because after the tunnel sends the packets to the processing host, the processing host will parse the outer headers, and then use the inner symmetric hash to hand over the packets of the same flow to the same cpu for processing (for the network topology, please check my latest reply thread). Thanks. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] RE: [virtio-comment] RE: [PATCH v13] virtio-net: support the virtqueue coalescing moderation
在 2023/3/23 上午1:02, Parav Pandit 写道: From: Michael S. Tsirkin Sent: Wednesday, March 22, 2023 12:53 PM On Wed, Mar 22, 2023 at 04:49:58PM +, Parav Pandit wrote: From: Michael S. Tsirkin Sent: Wednesday, March 22, 2023 12:47 PM I agree with Cornelia here. Yes if devices do not want to trust drivers then they will validate input but what exactly happens then is currently up to device. If we want to try and specify devices in all cases of out of spec input that's a big project, certainly doable but I would rather not connect it to this, rather boutique, feature. Both of your and Cornelia's comment is abstract to me. We cannot change past. But we can make sure things are consistent. Currently we don't describe device behaviour if driver is out of spec and I see 0 reasons to start doing it with coalescing commands specifically. For the new command of interest here, hen driver supplied incorrect values, the device will return error. It might be easier for device to just set NEEDS_RESET and stop responding. This approach of treating all errors as a fatal category is completely the opposite of making the device and driver resilient to (recoverable) errors. We shouldn't go this route. Different discussion... For a hypervisor implementation that's often better than returning error since device state is then preserved making things easier to debug. How to implement is upto the device to figure out. what to do is also up to the device. Previously error code as not returned hence new command cannot return the error code is going backward. Returning the failure code is a way to indicate that the driver had a recoverable error. I agree with you. Part of the specification [1] covered something we're talking about, e.g. if an untrusted driver sends a disabled vq, the device returns an error: [1] +The device MUST respond to VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET commands with VIRTIO_NET_ERR if the designated virtqueue is disabled. Maybe we should modify [1] to: "The device MUST respond to VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET commands with VIRTIO_NET_ERR if the designated \field{vqn} is not the virtqueue number of an enabled transmit or receive virtqueue." Thanks! - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
在 2023/3/23 上午11:58, Heng Qi 写道: 在 2023/3/23 上午11:13, Parav Pandit 写道: From: Michael S. Tsirkin Sent: Wednesday, March 22, 2023 12:42 PM Yes. But my point is this. Some flows can be IPv4 others IPv6. Do you see a way to have a key that will result in a symmetrical hash for both IPv4 and IPv6? Can you give an example please? Heng, Is that the requirement to have two completely different flows (ipv, ipv6) to steer to a single RQ? Michael should be talking about whether there is a symmetric key that can serve both IPv4 and IPv6, so that they can respectively achieve the purpose of symmetric hashing. I am not an expert in hashing, but this article [1] deduces a symmetric hash key, and I think it should be possible to deduce a specific key to meet this requirement. Sorry for the lost link: [1] https://www.ndsl.kaist.edu/~kyoungsoo/papers/TR-symRSS.pdf Or we should support XOR hashing. Thanks. The requirement, what I understood, is between two directions for a flow to result in a single hash value. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
在 2023/3/23 上午11:13, Parav Pandit 写道: From: Michael S. Tsirkin Sent: Wednesday, March 22, 2023 12:42 PM Yes. But my point is this. Some flows can be IPv4 others IPv6. Do you see a way to have a key that will result in a symmetrical hash for both IPv4 and IPv6? Can you give an example please? Heng, Is that the requirement to have two completely different flows (ipv, ipv6) to steer to a single RQ? Michael should be talking about whether there is a symmetric key that can serve both IPv4 and IPv6, so that they can respectively achieve the purpose of symmetric hashing. I am not an expert in hashing, but this article [1] deduces a symmetric hash key, and I think it should be possible to deduce a specific key to meet this requirement. Or we should support XOR hashing. Thanks. The requirement, what I understood, is between two directions for a flow to result in a single hash value. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/3/10 上午3:36, Michael S. Tsirkin 写道: On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote: 在 2023/3/8 下午10:39, Michael S. Tsirkin 写道: On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote: 在 2023/2/28 下午7:16, Michael S. Tsirkin 写道: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: If the tunnel is used to encapsulate the packets, the hash calculated using the outer header of the receive packets is always fixed for the same flow packets, i.e. they will be steered to the same receive queue. Wait a second. How is this true? Does not everyone stick the inner header hash in the outer source port to solve this? Yes, you are right. That's what we did before the inner header hash, but it has a performance penalty, which I'll explain below. For example geneve spec says: it is necessary for entropy from encapsulated packets to be exposed in the tunnel header. The most common technique for this is to use the UDP source port The end point of the tunnel called the gateway (with DPDK on top of it). 1. When there is no inner header hash, entropy can be inserted into the udp src port of the outer header of the tunnel, and then the tunnel packet is handed over to the host. The host needs to take out a part of the CPUs to parse the outer headers (but not drop them) to calculate the inner hash for the inner payloads, and then use the inner hash to forward them to another part of the CPUs that are responsible for processing. I don't get this part. Leave inner hashes to the guest inside the tunnel, why is your host doing this? Let's simplify some details and take a fresh look at two different scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2). 1. In Scenario1, we can improve the processing performance of the same flow by implementing inner symmetric hashing. This is because even though client1 and client2 communicate bidirectionally through the same flow, their data may pass through and be encapsulated by different tunnels, resulting in the same flow being hashed to different queues and processed by different CPUs. To ensure consistency and optimized processing, we need to parse out the inner header and compute a symmetric hash on it using a special rss key. Sorry for not mentioning the inner symmetric hash before, in order to prevent the introduction of more concepts, but it is indeed a kind of inner hash. 2. In Scenario2 with GRE, the lack of outer transport headers means that flows between multiple communication pairs encapsulated by the same tunnel will all be hashed to the same queue. To address this, we need to implement inner hashing to improve the performance of RSS. By parsing and calculating the inner hash, different flows can be hashed to different queues. Thanks. Assuming that the same flow includes a unidirectional flow a->b, or a bidirectional flow a->b and b->a, such flow may be out of order when processed by the gateway(DPDK): 1. In unidirectional mode, if the same flow is switched to another gateway for some reason, resulting in different outer IP address, then this flow may be processed by different CPUs after reaching the host if there is no inner hash. So after the host receives the flow, first use the forwarding CPUs to parse the inner hash, and then use the hash to ensure that the flow is processed by the same CPU. 2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may go to gateway 2. In order to ensure that the same flow is processed by the same CPU, we still need the forwarding CPUs to parse the real inner hash(here, the hash key needs to be replaced with a symmetric hash key). Oh intersting. What are those gateways, how come there's expectation that you can change their addresses and topology completely seamlessly without any reordering whatsoever? Isn't network topology change kind of guaranteed to change ordering sometimes? 1). During this process, the CPUs on the host is divided into two parts, one part is used as a forwarding node to parse the outer header, and the CPU utilization is low. Another part handles packets. Some overhead is clearly involved in *sending* packets - to calculate the hash and stick it in the port number. This is, however, a separate problem and if you want to solve it then my suggestion would be to teach the *transmit* side about GRE offloads, so it can fill the source port in the card. 2). The entropy of the source udp src port is not enough, that is, the queue is not widely distributed. how isn't it enough? 16 bit is enough to cover all vqs ... A 5-tuple brings more entropy than a single port, doesn't it? But you don't need more for RSS, the indirection table is not that large. In fact, the inner hash of the physical network card used by the business team is indeed better than the udp port number of the outer header we modify now, but they did not give me the data. Admittedly, out hash value is 32
[virtio-dev] Re: [virtio-comment] Re: [PATCH v10] virtio-net: support inner header hash
On Wed, Mar 15, 2023 at 07:06:53PM -0400, Parav Pandit wrote: > > > On 3/15/2023 9:19 AM, Heng Qi wrote: > > > > > >在 2023/3/15 上午11:23, Parav Pandit 写道: > >> > >> > >>On 3/6/2023 10:48 AM, Heng Qi wrote: > >> > [..] > >>> \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ. > >>>+\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ. > >>I think this should also say that HASH_TUNNEL requires either of > >>the F_RSS or F_HASH_REPORT. > >>Because without it HASH_TUNNEL is not useful. > > > >F_HASH_TUNNEL indicates that the hash should be calculated using > >the inner packet header, even without F_RSS or F_HASH_REPORT, > >we can continue to use the hash value in scenarios such as RPS or > >ebpf programs. > Yes. > Even for rps or ebpf programs, F_HASH_TUNNEL is fine. > When such feature arrives in future, it above line will have OR > condition for RPS feature bit. > > > > > > >I think it's fine to let F_HASH_TUNNEL rely on F_RSS or > >_F_HASH_REPORT as those are probably important scenarios where > >inner packet header hash is used. > Yes. > > >>If not, for now it may be better to skip vxlan and nvegre as > >>they inherently have unique outer header UDP src port based on > >>the inner header. > > > >Symmetric hashing ignores the order of the five-tuples when > >calculating the hash, that is, using (a1,a2,p1,p2) and > >(a2,a1,p2,p1) respectively can calculate the same hash. > >There is a scenario that the two directions client1->client2 and > >client2->client1 of the same flow may pass through different > >tunnels. > >In order to allow the data in two directions to be processed by > >the same CPU, we need to calculate a symmetric hash based on the > >inner packet header. > I am lost in two directions and two clients above. > When you say two directions, do you mean tx and rx? > and do symmetric hashing between tx and rx between two end points > within single tunnel? > A rough description of this scene is as follows: Client1 sends packets to client2, and client2 sends packets to client1 respectively. This is called two directions, and they are in the same flow. The packets in the two directions may reach the processing host through different tunnels. In this scenario, we need to hash these packets to the same queue for the same cpu to process, so inner symmetric hashing is required. client1 client2 | | | __ | +->| tunnels |<---+ |-| | | | | | | v v +-+ | processing host | +-+ > >>>+\field{hash_tunnel_types} is set to > >>>VIRTIO_NET_HASH_TUNNEL_TYPE_NONE by the device for the > >>>+unencapsulated packets. > >>>+ > I missed this before. unencapsulated is not a term. > s/unencapsulated packets/Non encapsulated packets or non tunneled packets > Yes, you are right! Thanks. > This publicly archived list offers a means to provide input to the > OASIS Virtual I/O Device (VIRTIO) TC. > > In order to verify user consent to the Feedback License terms and > to minimize spam in the list archive, subscription is required > before posting. > > Subscribe: virtio-comment-subscr...@lists.oasis-open.org > Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org > List help: virtio-comment-h...@lists.oasis-open.org > List archive: https://lists.oasis-open.org/archives/virtio-comment/ > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists > Committee: https://www.oasis-open.org/committees/virtio/ > Join OASIS: https://www.oasis-open.org/join/ - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v10] virtio-net: support inner header hash
在 2023/3/16 上午7:24, Parav Pandit 写道: On 3/15/2023 8:10 AM, Michael S. Tsirkin wrote: On Tue, Mar 14, 2023 at 11:23:55PM -0400, Parav Pandit wrote: If not, for now it may be better to skip vxlan and nvegre as they inherently have unique outer header UDP src port based on the inner header. So what's left, GRE? GRE is actually different, in that it's not IP at all. Sorry, I wrongly wrote nvegre above. IPoIP, GRE and NVGRE are left. vxlan and geneve has the udp src entropy. Not sure I understand "its not IP at all". GRE has outer IP header + GRE header with the key to identify the flow. The key is effectively the hash for the flow. So if we are talking about GRE, hash is indeed not calculated at all at the moment, right? Hash of the outer IP header of the src and dst IP can be still calculated currently for GRE when the optional key is not present. And I would say a natural first step for GRE is actually adding a hash type that will support this protocol. For GRE and NVGRE GRE_header.key as the flow/hash identifier should work without inner header hash. Older version of the GRE doesn't have key, so inner header hash is useful. Yes, indeed. The old GRE does not have keys such as flow id. Even with the new GRE, we have no chance to use optional fields, because we have connected with various operators, and the most basic fields can get the best compatibility. Thanks. How about doing that? It seems like this should be a small step and completely uncontroversial. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/2/28 下午7:16, Michael S. Tsirkin 写道: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: If the tunnel is used to encapsulate the packets, the hash calculated using the outer header of the receive packets is always fixed for the same flow packets, i.e. they will be steered to the same receive queue. Wait a second. How is this true? Does not everyone stick the inner header hash in the outer source port to solve this? For example geneve spec says: it is necessary for entropy from encapsulated packets to be exposed in the tunnel header. The most common technique for this is to use the UDP source port same goes for vxlan did not check further. so what is the problem? and which tunnel types actually suffer from the problem? Inner hash can at least hash tunnel flows without outer transport headers like GRE to multiple queues, which is beneficial to us. For tunnel flows with outer transport headers like VXLAN, although they can hash flows to different queues by setting different outer udp port, this does not conflict with inner hash. Inner hashing can also be used for this purpose. For the same flow, packets in the receiving and sending directions may pass through different tunnels respectively, which cause the same flow to be hashed to different queues. In this case, we have to calculate a symmetric hash (can be called an inner symmetric hash, which is a type of inner hash.) through the inner header, so that the same flow can be hashed to the same queue. Symmetric hashing can ignore the order of the 5-tuples to calculate the hash, that is, the hash values calculated by (a1, a2, a3, a4) and (a2, a1, a4, a3) respectively are the same. Thanks. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/3/15 下午7:58, Michael S. Tsirkin 写道: On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote: 在 2023/3/10 上午3:36, Michael S. Tsirkin 写道: On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote: 在 2023/3/8 下午10:39, Michael S. Tsirkin 写道: On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote: 在 2023/2/28 下午7:16, Michael S. Tsirkin 写道: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: If the tunnel is used to encapsulate the packets, the hash calculated using the outer header of the receive packets is always fixed for the same flow packets, i.e. they will be steered to the same receive queue. Wait a second. How is this true? Does not everyone stick the inner header hash in the outer source port to solve this? Yes, you are right. That's what we did before the inner header hash, but it has a performance penalty, which I'll explain below. For example geneve spec says: it is necessary for entropy from encapsulated packets to be exposed in the tunnel header. The most common technique for this is to use the UDP source port The end point of the tunnel called the gateway (with DPDK on top of it). 1. When there is no inner header hash, entropy can be inserted into the udp src port of the outer header of the tunnel, and then the tunnel packet is handed over to the host. The host needs to take out a part of the CPUs to parse the outer headers (but not drop them) to calculate the inner hash for the inner payloads, and then use the inner hash to forward them to another part of the CPUs that are responsible for processing. I don't get this part. Leave inner hashes to the guest inside the tunnel, why is your host doing this? Let's simplify some details and take a fresh look at two different scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2). 1. In Scenario1, we can improve the processing performance of the same flow by implementing inner symmetric hashing. This is because even though client1 and client2 communicate bidirectionally through the same flow, their data may pass through and be encapsulated by different tunnels, resulting in the same flow being hashed to different queues and processed by different CPUs. To ensure consistency and optimized processing, we need to parse out the inner header and compute a symmetric hash on it using a special rss key. Sorry for not mentioning the inner symmetric hash before, in order to prevent the introduction of more concepts, but it is indeed a kind of inner hash. If parts of a flow go through different tunnels won't this cause reordering at the network level? Why is it so important to prevent it at the nic then? Or, since you are stressing symmetric hash, are you talking about TX and RX side going through different tunnels? Yes, the directions client1->client2 and client2->client1 may go through different tunnels. Using inner symmetric hashing can satisfy the same CPU to process two directions of the same flow to improve performance. 2. In Scenario2 with GRE, the lack of outer transport headers means that flows between multiple communication pairs encapsulated by the same tunnel will all be hashed to the same queue. To address this, we need to implement inner hashing to improve the performance of RSS. By parsing and calculating the inner hash, different flows can be hashed to different queues. Thanks. Well 2 is at least inexact, there's flowID there. It's just 8 bit We use the most basic GRE header fields (not NVGRE), not even optional fields. There is also no flow id in the GRE header, should you be referring to NVGRE? Thanks. so not sufficient if there are more than 512 queues. Still 512 queues is quite a lot. Are you trying to solve for configurations with more than 512 queues then? Assuming that the same flow includes a unidirectional flow a->b, or a bidirectional flow a->b and b->a, such flow may be out of order when processed by the gateway(DPDK): 1. In unidirectional mode, if the same flow is switched to another gateway for some reason, resulting in different outer IP address, then this flow may be processed by different CPUs after reaching the host if there is no inner hash. So after the host receives the flow, first use the forwarding CPUs to parse the inner hash, and then use the hash to ensure that the flow is processed by the same CPU. 2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may go to gateway 2. In order to ensure that the same flow is processed by the same CPU, we still need the forwarding CPUs to parse the real inner hash(here, the hash key needs to be replaced with a symmetric hash key). Oh intersting. What are those gateways, how come there's expectation that you can change their addresses and topology completely seamlessly without any reordering whatsoever? Isn't network topology change kind of guaranteed to change ordering sometimes? 1). During this process, the CPUs on the hos
[virtio-dev] Re: [PATCH v10] virtio-net: support inner header hash
在 2023/3/15 上午11:23, Parav Pandit 写道: On 3/6/2023 10:48 AM, Heng Qi wrote: +\item[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner header hash May be to say inner packet header hash.. This make its little more clear about "which header" that you explained in the commit log. Sure, I'll add this. + for tunnel-encapsulated packets. + \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing. \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets. @@ -139,6 +142,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ. \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6. \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ. +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ. I think this should also say that HASH_TUNNEL requires either of the F_RSS or F_HASH_REPORT. Because without it HASH_TUNNEL is not useful. F_HASH_TUNNEL indicates that the hash should be calculated using the inner packet header, even without F_RSS or F_HASH_REPORT, we can continue to use the hash value in scenarios such as RPS or ebpf programs. Right? if no, than my below comments are meaningless. I think it's fine to let F_HASH_TUNNEL rely on F_RSS or _F_HASH_REPORT as those are probably important scenarios where inner packet header hash is used. \end{description} \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} @@ -198,20 +202,27 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device u8 rss_max_key_size; le16 rss_max_indirection_table_length; le32 supported_hash_types; + le32 supported_tunnel_hash_types; }; \end{lstlisting} -The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set. +The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL is set. It specifies the maximum supported length of RSS key in bytes. I think rss_max_key_size field dependency should be only of the existing feature bits F_RSS and F_HASH_REPORT. This is because those are the bits really deciding to consider rss_max_key_size. The following field, \field{rss_max_indirection_table_length} only exists if VIRTIO_NET_F_RSS is set. It specifies the maximum number of 16-bit entries in RSS indirection table. The next field, \field{supported_hash_types} only exists if the device supports hash calculation, -i.e. if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set. +i.e. if VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL is set. Same as above. Field \field{supported_hash_types} contains the bitmask of supported hash types. See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types} for details of supported hash types. +The next field, \field{supported_tunnel_hash_types} only exists if the device +supports inner hash calculation, i.e. if VIRTIO_NET_F_HASH_TUNNEL is set. + Above line "the next field .." can be just same as "Device supports inner packet header hash calculation, i.e..." This is because here, the term "header" is missed, which is present in the definition of feature bit 52. I'll rephrase the description to make the whole text more consistent. The device MUST set \field{rss_max_key_size} to at least 40, if it offers -VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT. +VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL. This needs change if the above first comment about rss_max_key_size is right. The device MUST set \field{rss_max_indirection_table_length} to at least 128, if it offers VIRTIO_NET_F_RSS. @@ -843,11 +854,13 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network \begin{itemize} \item The feature VIRTIO_NET_F_RSS was negotiated. The device uses the hash to determine the receive virtqueue to place incoming packets. \item The feature VIRTIO_NET_F_HASH_REPORT was negotiated. The device reports the hash value and the hash type with the packet. +\item The feature VIRTIO_NET_F_HASH_TUNNEL was negotiated. The device supports inner hash calculation. \end{itemize} inner packet header hash .. Ok. +If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated, the encapsulation +hash type below indicates that the hash is calculated over the inner header of +the encapsulated packet: +Hash type applicable for inner payload of the gre-encapsulated packet +\begin{lstlisting} +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE (1 << 1) +\end{lstlisting} +Hash
[virtio-dev] Re: [virtio-comment] Re: [PATCH v10] virtio-net: support inner header hash
在 2023/3/15 下午8:10, Michael S. Tsirkin 写道: On Tue, Mar 14, 2023 at 11:23:55PM -0400, Parav Pandit wrote: If not, for now it may be better to skip vxlan and nvegre as they inherently have unique outer header UDP src port based on the inner header. So what's left, GRE? GRE is actually different, in that it's not IP at all. I do not think so, I mentioned that VXLAN and GENEVE need inner symmetric hashing, and we need this. And we know inner hashing doesn't conflict with other ways of adding entropy. Thanks. So if we are talking about GRE, hash is indeed not calculated at all at the moment, right? And I would say a natural first step for GRE is actually adding a hash type that will support this protocol. How about doing that? It seems like this should be a small step and completely uncontroversial. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
Hi, Jason. Long time no see. :) 在 2023/2/22 上午11:22, Jason Wang 写道: 在 2023/2/22 01:50, Michael S. Tsirkin 写道: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: +\subparagraph{Security risks between encapsulated packets and RSS} +There may be potential security risks when encapsulated packets using RSS to +select queues for placement. When a user inside a tunnel tries to control the What do you mean by "user" here? Is it a remote or local one? I mean a remote attacker who is not under the control of the tunnel owner. Thanks. +enqueuing of encapsulated packets, then the user can flood the device with invaild +packets, and the flooded packets may be hashed into the same queue as packets in +other normal tunnels, which causing the queue to overflow. + +This can pose several security risks: +\begin{itemize} +\item Encapsulated packets in the normal tunnels cannot be enqueued due to queue + overflow, resulting in a large amount of packet loss. +\item The delay and retransmission of packets in the normal tunnels are extremely increased. +\item The user can observe the traffic information and enqueue information of other normal + tunnels, and conduct targeted DoS attacks. +\end{\itemize} + Hmm with this all written out it sounds pretty severe. I think we need first understand whether or not it's a problem that we need to solve at spec level: 1) anything make encapsulated packets different or why we can't hit this problem without encapsulation 2) whether or not it's the implementation details that the spec doesn't need to care (or how it is solved in real NIC) Thanks At this point with no ways to mitigate, I don't feel this is something e.g. Linux can enable. I am not going to nack the spec patch if others find this somehow useful e.g. for dpdk. How about CC e.g. dpdk devs or whoever else is going to use this and asking them for the opinion? - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash
在 2023/2/22 下午2:21, Michael S. Tsirkin 写道: On Wed, Feb 22, 2023 at 10:34:39AM +0800, Heng Qi wrote: The user will figure out how to mitigate when such QoS is not available. Either to run in best-effort mode or mitigate differently. Yes, our cloud security and cloud network team will configure and use inner hash on dpdk. Sounds good. More practical for dpdk than Linux. Is there a chance that when the interface is close to be final, but before the vote, you post a patch to the dpdk list and get some acks from the maintainers, cc virtio-dev. This way we won't merge something that will then go unused? That would be best - do you have a prototype? Not yet, dpdk and the business team are waiting for our virtio specification, and they have stated as a business team that their implementation on dpdk will not necessarily be open sourced to the community. In fact I discussed with them the security issues between tunnels, and I will quote their solutions to tunnel attacks below, but this is a problem between the tunnels, not the introduction of inner hash. I don't think we need to focus too much on this, but I'll do my best to describe the security issues between tunnels in v10. " This is not a problem with the inner hash, it is a general problem with the outer hash. I communicated with our people who are doing cloud security (they are also one of the demanders of inner hash), and it is a common problem for one tunnel to attack another tunnel. For example, there is a tunnel t1; a tunnel t2; a tunnel endpoint VTEP0, and the vni id of t1 is id1, and the vni id of v2 is id2; a VM. At this time, regardless of the inner hash or the outer hash, the traffic of tunnel t1 and tunnel t2 will reach the VM through VTEP0 (whether it is a single queue or multiple queues), and may be placed on the same queue to cause queue overflow. Do note (and explain in spec?) that with just an outer hash and RSS it is possible to configure the tunnels to use distict queues. Impossible with this interface but arguably only works for a small number of tunnels anyway. # Solutions: More like mitigations. Yes, you are right. 1. Some current forwarding tools such as DPDK have good forwarding performance, and it is difficult to fill up the queue; Oh that's a good point. If driver is generally faster than the device and queues stay away from filling up there's no DoS. I'd add this to the spec. Ok. 2. or switch the attack traffic to the attack clusters; What is that? This is done by the monitoring part outside the tunnel, which is also an important mitigation method they mentioned to prevent DoS between tunnels. For example, the monitoring part cuts off, limits or redirects the abnormal traffic of the tunnel. 3. or connect the traffic of different tunnels to different network card ports or network devices. Not sure how this is relevant. These a distinct outer MAC - with this why do we need a tunnel? 4.. " - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] RE: [PATCH v6] virtio-net: support the virtqueue coalescing moderation
Hi, Parav. 在 2023/2/22 下午10:28, Parav Pandit 写道: From: Michael S. Tsirkin Sent: Wednesday, February 22, 2023 1:29 AM MMIO device has vq_index register too. I am inclined to vq_index given current state of spec [mst@tuck virtio]$ git grep vq_index|wc -l 0 :) Grep for "virtqueue index". We need to change "virtqueue index" to "virtqueue number". Yes, this is more unified in the spec. :) This version is closed, please help review the v7 version [1] if you have time. [1] https://lists.oasis-open.org/archives/virtio-dev/202302/msg00520.html Thanks very much for your response! and new additions by Michael. admin vq? Good point, I will change that to vqn too. Ok. sounds good. Michael, Can you please suggest vqn or vq_index to use? [mst@tuck virtio]$ git grep vqn|wc -l 5 I'd say vqn has precedent. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/2/27 下午4:35, Jason Wang 写道: On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin wrote: On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote: Btw, this kind of 1:1 hash features seems not scalable and flexible. It requires an endless extension on bits/fields. Modern NICs allow the user to customize the hash calculation, for virtio-net we can allow to use eBPF program to classify the packets. It seems to be more flexible and scalable and there's almost no maintain burden in the spec (only bytecode is required, no need any fancy features/interactions like maps), easy to be migrated etc. Prototype is also easy, tun/tap had an eBPF classifier for years. Thanks Yea BPF offload would be great to have. We have been discussing it for years though - security issues keep blocking it. *Maybe* it's finally going to be there but I'm not going to block this work waiting for BPF offload. And easily migrated is what BPF is not. Just to make sure we're at the same page. I meant to find a way to allow the driver/user to fully customize what it wants to hash/classify. Similar technologies which is based on private solution has been used by some vendors, which allow user to customize the classifier[1] ePBF looks like a good open-source solution candidate for this (there could be others). But there could be many kinds of eBPF programs that could be offloaded. One famous one is XDP which requires many features other than the bytecode/VM like map access, tailcall. Starting from such a complicated type is hard. Instead, we can start from a simple type, that is the eBPF classifier. All it needs is to pass the bytecode to the device, the device can choose to run it or compile it to what it can understand for classifying. We don't need maps, tail calls and other features. We don't need to worry about the security because of its simplicity: the eBPF program is only in charge of doing classification, no other interactions with the driver and packet modification is prohibited. The feature is limited only to the VM/bytecode abstraction itself. Since the instruction set of ebpf is not complicated, some devices already support the offloading of ebpf, but what troubles them is how to standardize each device to implement standard and optional subsystem interfaces. There are two reasons: #1, due to the rapid development of ebpf , so it is difficult for them to ensure the backward compatibility of the interface. #2, such as network and blk devices, which interfaces must be implemented to allow the same ebpf program to run perfectly on each other. Also, ebpf program is not isolated, it is expected to interact with userspace programs, and few examples can demonstrate how to use them effectively. Maybe we can take advantage of the virtio spec to get out there first. Thanks. What's more, it's a good first step to achieve full eBPF offloading in the future. Thanks [1] https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html -- MST - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] [PATCH v6] virtio-net: Fix and update VIRTIO_NET_F_NOTF_COAL feature
Hi, Alvaro. Thanks for your work ! As suggested earlier in the 'virtqueue coalescing' thread, I will be working on top of your patch. :) 在 2023/2/19 下午5:03, Alvaro Karsz 写道: This patch makes several improvements to the notification coalescing feature, including: - Consolidating virtio_net_ctrl_coal_tx and virtio_net_ctrl_coal_rx into a single struct, virtio_net_ctrl_coal, as they are identical. - Emphasizing that the coalescing commands are best-effort. - Defining the behavior of coalescing with regards to delivering notifications when a change occur. - Stating that the commands should apply to all the receive/transmit virtqueues. - Stating that every receive/transmit virtqueue should count it's own packets. - A new intro explaining the entire coalescing operation. Signed-off-by: Alvaro Karsz Reviewed-by: Parav Pandit --- v2: - Add the last 2 points to the patch. - Rephrase the "commands are best-effort" note. - Replace "notification" with "used buffer notification" to be more consistent. v3: - Add an intro explaining the entire coalescing operation. v4: - Minor wording fixes. - Rephrase the general note. v5: - Replace virtio_net_ctrl_coal->usecs with virtio_net_ctrl_coal->max_usecs v6: - Replace usecs with microseconds. - Make the TX/RX examples relevant for MQ as well. - Explain that upon meeting the notification condition, the device starts counting packets and microseconds again. - Add Parav's Reviewed-by tag (Thanks!) device-types/net/description.tex | 74 +++- 1 file changed, 44 insertions(+), 30 deletions(-) diff --git a/device-types/net/description.tex b/device-types/net/description.tex index 1741c79..c7a8ca6 100644 --- a/device-types/net/description.tex +++ b/device-types/net/description.tex @@ -1514,15 +1514,15 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi If the VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can send control commands for dynamically changing the coalescing parameters. -\begin{lstlisting} -struct virtio_net_ctrl_coal_rx { -le32 rx_max_packets; -le32 rx_usecs; -}; +\begin{note} +The behavior of the device in response to these commands is best-effort: +the device may generate notifications more or less frequently than specified. +\end{note} -struct virtio_net_ctrl_coal_tx { -le32 tx_max_packets; -le32 tx_usecs; +\begin{lstlisting} +struct virtio_net_ctrl_coal { +le32 max_packets; +le32 max_usecs; }; #define VIRTIO_NET_CTRL_NOTF_COAL 6 @@ -1532,49 +1532,63 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi Coalescing parameters: \begin{itemize} -\item \field{rx_usecs}: Maximum number of usecs to delay a RX notification. -\item \field{tx_usecs}: Maximum number of usecs to delay a TX notification. -\item \field{rx_max_packets}: Maximum number of packets to receive before a RX notification. -\item \field{tx_max_packets}: Maximum number of packets to send before a TX notification. +\item \field{max_usecs} for RX: Maximum number of microseconds to delay a RX notification. +\item \field{max_usecs} for TX: Maximum number of microseconds to delay a TX notification. +\item \field{max_packets} for RX: Maximum number of packets to receive before a RX notification. +\item \field{max_packets} for TX: Maximum number of packets to send before a TX notification. \end{itemize} - The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands: \begin{enumerate} -\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{tx_usecs} and \field{tx_max_packets} parameters. -\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{rx_usecs} and \field{rx_max_packets} parameters. +\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{max_usecs} and \field{max_packets} parameters for all the transmit virtqueues. +\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{max_usecs} and \field{max_packets} parameters for all the receive virtqueues. \end{enumerate} -\subparagraph{RX Notifications}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / RX Notifications} +\subparagraph{Operation}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / Operation} + +The device sends a used buffer notification once the notification conditions are met, if the notifications are not suppressed as explained in \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Used Buffer Notification Suppression}. + +When the device has non-zero \field{max_usecs} and non-zero \field{max_packets}, it starts counting microseconds and packets upon receiving/sending a packet. +The device counts packets and microseconds for each receive virtqueue and transmit virtqueue separately. +In this case, the notification
Re: [virtio-dev] Re: [virtio-comment] [PATCH v3] virtio-net: support the virtqueue coalescing moderation
On Fri, Feb 17, 2023 at 10:42:21AM +0200, Alvaro Karsz wrote: > Hi Heng, > > > +\item[VIRTIO_NET_F_VQ_NOTF_COAL(52)] Device supports the virtqueue > > +notifications coalescing. > > + > > \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing. > > > > \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets. > > @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit > > requirements}\label{sec:Device Types / Network Device > > \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ. > > \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or > > VIRTIO_NET_F_HOST_TSO6. > > \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ. > > +\item[VIRTIO_NET_F_VQ_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ. > > \end{description} > > > > \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / > > Network Device / Feature bits / Legacy Interface: Feature bits} > > @@ -4501,8 +4505,11 @@ \subsubsection{Control Virtqueue}\label{sec:Device > > Types / Network Device / Devi > > }; > > > > #define VIRTIO_NET_CTRL_NOTF_COAL 6 > > - #define VIRTIO_NET_CTRL_NOTF_COAL_TX_SET 0 > > + #define VIRTIO_NET_CTRL_NOTF_COAL_TX_SET 0 > > #define VIRTIO_NET_CTRL_NOTF_COAL_RX_SET 1 > > + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET 2 > > + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET 3 > > + > > \end{lstlisting} > > > > Coalescing parameters: > > @@ -4514,12 +4521,67 @@ \subsubsection{Control Virtqueue}\label{sec:Device > > Types / Network Device / Devi > > \end{itemize} > > > > > > -The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands: > > +The class VIRTIO_NET_CTRL_NOTF_COAL has 4 commands: > > \begin{enumerate} > > \item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{tx_usecs} and > > \field{tx_max_packets} parameters. > > \item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{rx_usecs} and > > \field{rx_max_packets} parameters. > > +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET: set the \field{max_packets} and > > \field{max_usecs} parameters for a enabled > > +transmit/receive virtqueue whose > > number is \field{vqn}. > > +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET: the device gets the > > \field{max_packets} and \field{max_usecs} parameters of > > +a enabled transmit/receive > > virtqueue whose number is \field{vqn}, and then > > +responds them to the driver. > > \end{enumerate} > > > > +If the VIRTIO_NET_F_VQ_NOTF_COAL feature is negotiated: > > +1. a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command to set the > > coalescing > > + parameters of a enabled transmit/receive virtqueue. > > +2. a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command to a > > device, and the device > > + responds to the driver with the coalescing parameters of a enabled > > transmit/receive virtqueue. > > + > > +\begin{lstlisting} > > +struct virtio_net_ctrl_coal_vq { > > +le16 vqn; > > +le16 reserved; > > +le32 max_packets; > > +le32 max_usecs; > > +}; > > +\end{lstlisting} > > + > > Maybe we can use struct virtio_net_ctrl_coal inside struct > virtio_net_ctrl_coal_vq instead of repeating max_usecs and > max_packets? > I'm not sure if it would be confusing, what do you think? > Hi Alvaro. I guess you mean one of the following two forms: #1 struct virtio_net_ctrl_coal { le32 max_packets; le32 max_usecs; }; struct virtio_net_ctrl_coal_vq { le16 vqn; le16 reserved; struct virtio_net_ctrl_coal coal; } coal_vq; #2 struct virtio_net_ctrl_coal { le32 max_packets; le32 max_usecs; le16 vqn; // if _F_VQ_NOTF_COAL is negotiated le16 reserved; // if _F_VQ_NOTF_COAL is negotiated }; If it's #1, I think the format is a bit ugly, it's not semantic to use coal_vq to send global commands when _F_VQ_NOTF_COAL is not negotiated, and the presence of vqn and reserved is awkward. If it's #2, I think this is a bit like the v1 version, using virtio_net_ctrl_coal as a virtual queue to send commands does not seem to be semantic, but it is indeed more unified in function. I think we should hear from Michael and Parav. > > +Virtqueue coalescing parameters: > > +\begin{itemize} > > +\item \field{vqn}: The virtqueue number of the enabled transmit or receive > > virtqueue, excluding the control virtqueue. > > +\item \field{max_packets}: The maximum number of packets sent/received by > > the specified virtqueue before a TX/RX notification. > > +\item \field{max_usecs}: The maximum number of TX/RX usecs that the > > specified virtqueue delays a TX/RX notification. > > +\end{itemize} > > + > > +\field{reserved} is reserved and it is ignored by the device. > > + > > max_packets is the same with VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and with > VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET. > ("Maximum number of packets to receive/send before a RX/TX notification"). > The fact that this is applied to all VQs or to a specific one
Re: [virtio-dev] Re: [virtio-comment] [PATCH v3] virtio-net: support the virtqueue coalescing moderation
On Fri, Feb 17, 2023 at 11:40:15AM +0200, Alvaro Karsz wrote: > > > Maybe we can use struct virtio_net_ctrl_coal inside struct > > > virtio_net_ctrl_coal_vq instead of repeating max_usecs and > > > max_packets? > > > I'm not sure if it would be confusing, what do you think? > > > > > > > Hi Alvaro. > > > > I guess you mean one of the following two forms: > > > > #1 > > struct virtio_net_ctrl_coal { > > le32 max_packets; > > le32 max_usecs; > > }; > > > > struct virtio_net_ctrl_coal_vq { > > le16 vqn; > > le16 reserved; > > struct virtio_net_ctrl_coal coal; > > } coal_vq; > > > > #2 > > struct virtio_net_ctrl_coal { > > le32 max_packets; > > le32 max_usecs; > > le16 vqn; // if _F_VQ_NOTF_COAL is negotiated > > le16 reserved; // if _F_VQ_NOTF_COAL is negotiated > > }; > > > > If it's #1, I think the format is a bit ugly, it's not semantic to use > > coal_vq to send global commands when _F_VQ_NOTF_COAL is not negotiated, and > > the presence of vqn and reserved is awkward. > > If it's #2, I think this is a bit like the v1 version, using > > virtio_net_ctrl_coal as a virtual queue to send commands does not seem to > > be semantic, but it is indeed more unified in function. > > > > I think we should hear from Michael and Parav. > > > > I meant #1. > We can see virtio_net_ctrl_coal as a struct holding coalescing > parameters, regardless of the commands. > Yes, let's wait for more comments on that. > > > > > +Virtqueue coalescing parameters: > > > > +\begin{itemize} > > > > +\item \field{vqn}: The virtqueue number of the enabled transmit or > > > > receive virtqueue, excluding the control virtqueue. > > > > +\item \field{max_packets}: The maximum number of packets sent/received > > > > by the specified virtqueue before a TX/RX notification. > > > > +\item \field{max_usecs}: The maximum number of TX/RX usecs that the > > > > specified virtqueue delays a TX/RX notification. > > > > +\end{itemize} > > > > + > > > > +\field{reserved} is reserved and it is ignored by the device. > > > > + > > > > > > max_packets is the same with VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and with > > > VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET. > > > ("Maximum number of packets to receive/send before a RX/TX notification"). > > > The fact that this is applied to all VQs or to a specific one is > > > derived from the command. > > > Same for max_usecs. > > > Maybe we can join the coalescing parameters somehow instead of > > > repeating the explanations? > > > > > Any thoughts on this part? Good idea, and if so, is there a good way to expose vqn to the interpretation of max_packets ? #1 \item \field{vqn}: The virtqueue number of the enabled transmit or receive virtqueue. \item \field{max_packets}: The maximum number of packets sent/received by the specified virtqueue before a TX/RX notification. #2 \item \field{max_packets}: Maximum number of packets to receive/send before a RX/TX notification. Thanks. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support the virtqueue coalescing moderation
在 2023/2/28 上午4:19, Michael S. Tsirkin 写道: On Mon, Feb 27, 2023 at 08:45:43PM +0200, Alvaro Karsz wrote: -If the VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can -send control commands for dynamically changing the coalescing parameters. +If the VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can send the VIRTIO_NET_CTRL_NOTF_COAL_TX_SET +and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands to dynamically change the coalescing parameters for all transmit +and receive virtqueues, respectively. + +If the VIRTIO_NET_F_VQ_NOTF_COAL feature is negotiated: +\begin{itemize} +\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command to set coalescing parameters of a given + enabled transmit/receive virtqueue. +\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command to a device, and the device responds with + coalescing parameters of a given enabled transmit/receive virtqueue. +\end{itemize} The VQ_NOTF_COAL commands are enclosed in an itemize and the NOTF_COAL commands aren't. I think that we should be consistent here. The VQ commands are enclosed in an itemize because it has both GET and SET operations. I don't think VIRTIO_NET_CTRL_NOTF_COAL_RX_SET and VIRTIO_NET_CTRL_NOTF_COAL_TX_SET have to do this, because they are both settings, just in different directions, and we have described them like this elsewhere. I'm not sure that describing the commands in here is necessary, maybe just mentioning which commands can be used with which feature bit is Isn't that what this sentence does? Yes, but it also describes what the command does "to dynamically change the coalescing parameters for all transmit and receive virtqueues" Seems a bit repetitive to me. enough (this is what I meant in v8, sorry if I wasn't clear). I'm not saying that describing the commands in here is not good, but notice that the commands are described in 3 different places. Three places describe different effects. #1 describes which commands are available for which feature. #2 describes which command can use which structure. #3 describes which parameters can be set for each command, and whether they can affect "global values". It is placed in the "Operation" section because it explains how the device should behave. You're right, but some parts seems repetitive to me: #1: "commands to dynamically change the coalescing parameters for all transmit and receive virtqueues" #2: "commands use the virtio_net_ctrl_coal structure to set \field{max_usecs} and \field{max_packets} for all transmit/receive virtqueues." #3: " set the \field{max_usecs} and \field{max_packets} parameters for all transmit virtqueues, and values of..." Feels like every time we re-explain the commands with more detail. So, for example we read #2 and understand what the command does (set usecs and packets), then we read #3 and understand that it does some more stuff. IMO we need to explain it once, with all the details. This is how we've written it historically. The idea is that reader can get everything on the first pass: 1st high level picture then detailed description then the formalities. Other specs do it differently so one has to re-read them many times to understand. I don't like this personally and I much prefer the virtio way. Maybe #1 and #2 descriptions can be combined and described together. This is #1. \begin{note} The behavior of the device in response to these commands is best-effort: @@ -1519,25 +1531,76 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi le32 max_usecs; }; +struct virtio_net_ctrl_coal_vq { +le16 vqn; +le16 reserved; +struct virtio_net_ctrl_coal coal; +}; + #define VIRTIO_NET_CTRL_NOTF_COAL 6 #define VIRTIO_NET_CTRL_NOTF_COAL_TX_SET 0 #define VIRTIO_NET_CTRL_NOTF_COAL_RX_SET 1 + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET 2 + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET 3 \end{lstlisting} +The VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands use the +virtio_net_ctrl_coal structure to set \field{max_usecs} and \field{max_packets} for all +transmit/receive virtqueues. + +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command uses the virtio_net_ctrl_coal_vq structure +to set \field{max_usecs} and \field{max_packets} for the supplied virtqueue number \field{vqn}. + +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command gets the values of \field{max_usecs} and +\field{max_packets} of the specified virtqueue from the device by setting \field{vqn} +in the virtio_net_ctrl_coal_vq structure. + This is #2. +# Read/Write attributes for coalescing parameters +\begin{itemize} +\item For commands VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, \field{max_usecs} + and \field{max_packets} are write-only for a driver. +\item For the command VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET, \field{vqn}, \field{reserved}, \field{max_usecs} + and \field{max_packets} are write-only
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support the virtqueue coalescing moderation
在 2023/2/28 下午3:49, Michael S. Tsirkin 写道: On Tue, Feb 28, 2023 at 10:41:18AM +0800, Heng Qi wrote: The problem is the global qualifier. And it's not even global - there are two sets for rx and for tx and does not apply to cvq at all. How about "RX/TX coalescing parameters"? As Parav suggested, we probably don't need to mention "global values/coalescing parameters" since this values/parameters are the ones set by CTRL_NOTF_COAL_RX/TX_SET. Just like this: " After disabling and re-enabling a virtuqueue, the device MUST revert to coalescing parameters to the one configured using VIRTIO_NET_CTRL_NOTF_COAL. " Thanks. VIRTIO_NET_CTRL_NOTF_COAL is not a thing. Let's wait for Alvaro's new patch to be merged though, see what teminology he comes up with and reuse. Ok, are you referring to Alvaro's "virtio-net: Fix and update VIRTIO_NET_F_NOTF_COAL feature", if so I see that his v7 version is stable and has been voted in https://github.com/oasis-tcs/ virtio-spec/issues/159 , this patch was made on top of his v7 version. Thanks. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v4] virtio-net: Fix and update VIRTIO_NET_F_NOTF_COAL feature
在 2023/2/16 下午3:32, Alvaro Karsz 写道: This patch makes several improvements to the notification coalescing feature, including: - Consolidating virtio_net_ctrl_coal_tx and virtio_net_ctrl_coal_rx into a single struct, virtio_net_ctrl_coal, as they are identical. - Emphasizing that the coalescing commands are best-effort. - Defining the behavior of coalescing with regards to delivering notifications when a change occur. - Stating that the commands should apply to all the receive/transmit virtqueues. - Stating that every receive/transmit virtqueue should count it's own packets. - A new intro explaining the entire coalescing operation. Signed-off-by: Alvaro Karsz --- v2: - Add the last 2 points to the patch. - Rephrase the "commands are best-effort" note. - Replace "notification" with "used buffer notification" to be more consistent. v3: - Add an intro explaining the entire coalescing operation. v4: - Minor wording fixes. - Rephrase the general note. device-types/net/description.tex | 69 +++- 1 file changed, 41 insertions(+), 28 deletions(-) diff --git a/device-types/net/description.tex b/device-types/net/description.tex index 1741c79..11760f3 100644 --- a/device-types/net/description.tex +++ b/device-types/net/description.tex @@ -1514,15 +1514,15 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi If the VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can send control commands for dynamically changing the coalescing parameters. -\begin{lstlisting} -struct virtio_net_ctrl_coal_rx { -le32 rx_max_packets; -le32 rx_usecs; -}; +\begin{note} +The behavior of the device in response to these commands is best-effort: +the device may generate notifications more or less frequently than specified. +\end{note} -struct virtio_net_ctrl_coal_tx { -le32 tx_max_packets; -le32 tx_usecs; +\begin{lstlisting} +struct virtio_net_ctrl_coal { +le32 max_packets; +le32 usecs; }; #define VIRTIO_NET_CTRL_NOTF_COAL 6 @@ -1532,49 +1532,62 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi Coalescing parameters: \begin{itemize} -\item \field{rx_usecs}: Maximum number of usecs to delay a RX notification. -\item \field{tx_usecs}: Maximum number of usecs to delay a TX notification. -\item \field{rx_max_packets}: Maximum number of packets to receive before a RX notification. -\item \field{tx_max_packets}: Maximum number of packets to send before a TX notification. +\item \field{usecs} for RX: Maximum number of usecs to delay a RX notification. +\item \field{usecs} for TX: Maximum number of usecs to delay a TX notification. +\item \field{max_packets} for RX: Maximum number of packets to receive before a RX notification. +\item \field{max_packets} for TX: Maximum number of packets to send before a TX notification. \end{itemize} - The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands: \begin{enumerate} -\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{tx_usecs} and \field{tx_max_packets} parameters. -\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{rx_usecs} and \field{rx_max_packets} parameters. +\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{usecs} and \field{max_packets} parameters for all the transmit virtqueues. +\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{usecs} and \field{max_packets} parameters for all the receive virtqueues. \end{enumerate} -\subparagraph{RX Notifications}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / RX Notifications} +\subparagraph{Operation}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / Operation} + +The device sends a used buffer notification once the notification conditions are met, if the notifications are not suppressed as explained in \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Used Buffer Notification Suppression}. + +When the device has non-zero \field{usecs} and non-zero \field{max_packets}, it starts counting usecs and packets upon receiving/sending a packet. +The device counts packets and usecs for each receive virtqueue and transmit virtqueue separately. +In this case, the notification conditions are met when \field{usecs} usecs elapses, or upon sending/receiving \field{max_packets} packets, whichever happens first. + Hi, Alvaro. "when \field{usecs} usecs elapses" --> "when \field{usecs} elapses", right? Thanks for your clear work. +When the device has \field{usecs} = 0 or \field{max_packets} = 0, the notification conditions are met after every packet received/sent. + +\subparagraph{RX Example}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / RX Example} If, for example: \begin{itemize} -\item
Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v8] virtio-net: support inner header hash
在 2023/2/18 上午12:24, Parav Pandit 写道: From: virtio-dev@lists.oasis-open.org On Behalf Of Heng Qi [..] We assume that hash_report_tunnel_types is still present in the next version, I am little lost. Hi, Parav. You are not lost. I'm just answering some of Michael's questions and making assumptions. :) I thought we all agreed that reporting just the tunnel type in data path path virtio_net_hdr is not useful to sw. And hence we should omit in the virtio_net_hdr. Did I miss the motivation to add it back? If not, probably it is better to review and discuss v9 without it, which will be easier to discuss. but it only exists in virtio net hdr and should be populated by the device after the hash calculation. hash_tunnel_types already controls whether the device computes internal header hashes. I don't really follow this, hash_report_tunnel_type is better off keeping it "report" literally. Talking about VIRTIO_NET_F_HASH_TUNNEL here. Not hash_report_tunnel_type. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] [PATCH v3] virtio-net: support the virtqueue coalescing moderation
在 2023/2/17 下午6:07, Michael S. Tsirkin 写道: On Fri, Feb 17, 2023 at 11:40:15AM +0200, Alvaro Karsz wrote: Maybe we can use struct virtio_net_ctrl_coal inside struct virtio_net_ctrl_coal_vq instead of repeating max_usecs and max_packets? I'm not sure if it would be confusing, what do you think? Hi Alvaro. I guess you mean one of the following two forms: #1 struct virtio_net_ctrl_coal { le32 max_packets; le32 max_usecs; }; struct virtio_net_ctrl_coal_vq { le16 vqn; le16 reserved; struct virtio_net_ctrl_coal coal; } coal_vq; #2 struct virtio_net_ctrl_coal { le32 max_packets; le32 max_usecs; le16 vqn; // if _F_VQ_NOTF_COAL is negotiated le16 reserved; // if _F_VQ_NOTF_COAL is negotiated }; If it's #1, I think the format is a bit ugly, it's not semantic to use coal_vq to send global commands when _F_VQ_NOTF_COAL is not negotiated, and the presence of vqn and reserved is awkward. If it's #2, I think this is a bit like the v1 version, using virtio_net_ctrl_coal as a virtual queue to send commands does not seem to be semantic, but it is indeed more unified in function. I think we should hear from Michael and Parav. I meant #1. We can see virtio_net_ctrl_coal as a struct holding coalescing parameters, regardless of the commands. Yes, let's wait for more comments on that. Reusing virtio_net_ctrl_coal is a nice thought. Makes it a bit clearer these have exactly the same role. Whether to put vqn first or last does not matter imho. +Virtqueue coalescing parameters: +\begin{itemize} +\item \field{vqn}: The virtqueue number of the enabled transmit or receive virtqueue, excluding the control virtqueue. +\item \field{max_packets}: The maximum number of packets sent/received by the specified virtqueue before a TX/RX notification. +\item \field{max_usecs}: The maximum number of TX/RX usecs that the specified virtqueue delays a TX/RX notification. +\end{itemize} + +\field{reserved} is reserved and it is ignored by the device. + max_packets is the same with VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and with VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET. ("Maximum number of packets to receive/send before a RX/TX notification"). The fact that this is applied to all VQs or to a specific one is derived from the command. Same for max_usecs. Maybe we can join the coalescing parameters somehow instead of repeating the explanations? Any thoughts on this part? I think I agree. Generally I think we should first of all describe the new VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET, moving all explanation text to that. Then just explain that VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET have the same effect as VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET repeated for all currently enabled tx/rx vqs. Plus maybe a single annotated example where there's a mix of VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET, VIRTIO_NET_CTRL_NOTF_COAL_RX_SET and VIRTIO_NET_CTRL_NOTF_COAL_TX_SET commands. For example with 2 vq pairs: 1. VIRTIO_NET_CTRL_NOTF_COAL_RX_SET sets for vq 0 and 2, vq 1 and 3 retain reset value 2. VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET sets vq 0, vq 2 retains value from 1 3. VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET sets vq 1, vq 3 retains reset value 4. VIRTIO_NET_CTRL_NOTF_COAL_TX_SET overrides command 3 no need for many examples. Good idea. This is a clear and comprehensive example. Thanks. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v3] virtio-net: support the virtqueue coalescing moderation
On Fri, Feb 17, 2023 at 04:12:34PM +, Parav Pandit wrote: > > > From: Michael S. Tsirkin > > Sent: Friday, February 17, 2023 6:35 AM > > > > We mention the device reset case, but nothing about VQ reset. > > > > > > I feel that no matter how we handle this, we break something. > > > > > > Having default coalescing values may collide with "Upon reset, a > > > device MUST initialize all coalescing parameters to 0." > > > > No this is after device reset. > > > > > We can say that VQ reset doesn't affect the "global parameters" and a > > > device reset does, but this collides with "Device Requirements: > > > Virtqueue Reset". > > > > Not really. > When the device resets, VQ objects are destroyed in the device. > So VQ's notifications parameters doesn't exists on device reset. > > And so the same case with VQ reset. > When a VQ is reset (disabled), VQ's notifications configuration is removed in > the device too. > Just like its desc ring and other addresses are invalid. Yes, but there seems to be such a situation: when the device is reactivated, as the specification says, all parameters are set to 0 (use parameters as the default configuration on the device). When CTRL_COAL_SET and CTRL_COAL_VQ_SET are sent, the configuration is updated (the parameters of each vq may be different, but the global parameter configuration may be recorded), at this time, if vq is reset, should the parameters be 0 or a recorded global parameters after it is re-enabled? > > > > In fact, resetting coalescing values after vq reset may be derived > > > from "Upon reset, a device MUST initialize all coalescing parameters > > > to 0". > > > This is consistent with "Device Requirements: Virtqueue Reset". > > > > > > I can add in my patch some clarifications. > > > > > > This will break the linux virtio_net ethtool implementation a little, > > > we'll need to fix it. > > > > Not good. I feel we must come up with spec that is backwards compatible. > > Hmm, maybe this is why Parav kept talking about modes. > > I did not realize at the time, sorry Parav. > > > > I still feel modes are not the best way to describe things so I propose > > this: > > - in addition to per vq parameters, device that supports global TX/RX > > commands and ring reset maintains two sets of default parameters: for RX > > and > > TX > > - existing commands change default and change all enabled vqs > > of the correct type (RX/TX) to the same value > > - vq reset changes a vq to the default > > - device reset changes defaults to 0 and changes all vqs to 0 > > > > note how defaults are only used for ring reset. is "vq reset parameter" > > a better name? I feel we will repeat "reset" too many times in a sentence > > if we > > call it that though. > > > > So fundamentally the only change is with RING_RESET, then default is not > > always 0, it can be set by the global command. > > True default is not zero when global are configured. > It is ok to report VQ parameters on GET command when globals are configured. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/2/23 下午9:13, Michael S. Tsirkin 写道: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: +\subparagraph{Security risks between encapsulated packets and RSS} +There may be potential security risks when encapsulated packets using RSS to +select queues for placement. Is this just with RSS? I assume hash calculation is also used for something like queueing so there's a similar risk even just with hash reporting, no? I don't understand why it would be risky to just report hash when not used for queuing, and even we don't report whether hash come from inner or outer now because there is no more hash_report_tunnel. When a user inside a tunnel tries to control the +enqueuing of encapsulated packets, then the user can flood the device with invaild +packets, and the flooded packets may be hashed into the same queue as packets in +other normal tunnels, which causing the queue to overflow. + +This can pose several security risks: +\begin{itemize} +\item Encapsulated packets in the normal tunnels cannot be enqueued due to queue + overflow, resulting in a large amount of packet loss. +\item The delay and retransmission of packets in the normal tunnels are extremely increased. +\item The user can observe the traffic information and enqueue information of other normal + tunnels, and conduct targeted DoS attacks. +\end{\itemize} + So for RSS specifically, we brain-stormed with Amnon (Cc'd) and came up with an idea: RSS indirection table entries are 16 bit but onlu 15 bits are used to indentify an RX queue. We can use the remaining bit as a "tunnel bit" to signal whether to use the inner or the outer hash for queue selection. The lookup will work like this then: calculate outer hash if (rss[outer hash] & tunnel bit) How a tunnel bit distinguishes between multiple tunnel types, and I think it is not so reasonable to use the indirection table to determine the switch of the inner hash. The inner hash is only the ability to calculate the hash, and does not involve the indirection table. Thanks. then calculate inner hash return rss[inner hash] & ~tunnel bit else return rss[outer hash] this fixes the security issue returning us back to status quo : specific tunnels can be directed to separate queues. This is for RSS. For hash reporting indirection table is not used. Maybe it is enough to signal to driver that inner hash was used. We do need that signalling though. My question would be whether it's practical to implement in hardware. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/2/24 上午10:45, Jason Wang 写道: 在 2023/2/23 12:41, Heng Qi 写道: 在 2023/2/23 上午10:50, Jason Wang 写道: Hi: 在 2023/2/22 14:46, Heng Qi 写道: Hi, Jason. Long time no see. :) 在 2023/2/22 上午11:22, Jason Wang 写道: 在 2023/2/22 01:50, Michael S. Tsirkin 写道: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: +\subparagraph{Security risks between encapsulated packets and RSS} +There may be potential security risks when encapsulated packets using RSS to +select queues for placement. When a user inside a tunnel tries to control the What do you mean by "user" here? Is it a remote or local one? I mean a remote attacker who is not under the control of the tunnel owner. Anything may the tunnel different? I think this can happen even without tunnel (and even with single queue). I agree. How to mitigate those attackers seems more like a implementation details where might require fair queuing or other QOS technology which has been well studied. I am also not sure whether this point needs to be focused on in the spec, and I see that the protection against tunnel DoS is more protected outside the device, but it seems to be okay to give some attack reminders. Maybe it's sufficient to say the device should make sure the fairness among different flows when queuing packets? Yes, maybe the device does not guarantee QoS or needs to guarantee enqueue fairness between flows. Thanks. Thanks Thanks. It seems out of the scope of the spec (unless we want to let driver manageable QOS). Thanks Thanks. +enqueuing of encapsulated packets, then the user can flood the device with invaild +packets, and the flooded packets may be hashed into the same queue as packets in +other normal tunnels, which causing the queue to overflow. + +This can pose several security risks: +\begin{itemize} +\item Encapsulated packets in the normal tunnels cannot be enqueued due to queue + overflow, resulting in a large amount of packet loss. +\item The delay and retransmission of packets in the normal tunnels are extremely increased. +\item The user can observe the traffic information and enqueue information of other normal + tunnels, and conduct targeted DoS attacks. +\end{\itemize} + Hmm with this all written out it sounds pretty severe. I think we need first understand whether or not it's a problem that we need to solve at spec level: 1) anything make encapsulated packets different or why we can't hit this problem without encapsulation 2) whether or not it's the implementation details that the spec doesn't need to care (or how it is solved in real NIC) Thanks At this point with no ways to mitigate, I don't feel this is something e.g. Linux can enable. I am not going to nack the spec patch if others find this somehow useful e.g. for dpdk. How about CC e.g. dpdk devs or whoever else is going to use this and asking them for the opinion? - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscr...@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org List help: virtio-comment-h...@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/2/24 下午4:13, Michael S. Tsirkin 写道: On Thu, Feb 23, 2023 at 02:40:46PM +, Parav Pandit wrote: From: Michael S. Tsirkin Sent: Thursday, February 23, 2023 8:14 AM On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: So for RSS specifically, we brain-stormed with Amnon (Cc'd) and came up with an idea: RSS indirection table entries are 16 bit but onlu 15 bits are used to indentify an RX queue. We can use the remaining bit as a "tunnel bit" to signal whether to use the inner or the outer hash for queue selection. I further brainstormed internally with Saeed and Rony on this. The inner hash is only needed for GRE, IPIP etc. For VXLAN and NVGRE Linux kernel transmit side uses the entropy of the source port of the outer header. It does that based on the inner header. Refer to [1] as one example. [1] https://elixir.bootlin.com/linux/latest/source/drivers/net/geneve.c#L922 But I think hash was requested for RSS with dpdk, no? I think yes, at least probably the first customer to use the feature might be dpdk.:) The lookup will work like this then: calculate outer hash if (rss[outer hash] & tunnel bit) Tunnel bit, you mean tunneled packet, right? this idea stores a bit in the indirection table which signals which of the hashes to use for rss This allows inner hash to have the ability to select a queue and place packets to the queue (that is, parallel to RSS), which seems to be different from our discussion before v9. Thanks. then calculate inner hash return rss[inner hash] & ~tunnel bit Why to end with a tunnel bit? this just clears the bit so we end up with a vq number. else return rss[outer hash] this fixes the security issue returning us back to status quo : specific tunnels can be directed to separate queues. The number of tunnels is far higher than the number of queues with para virt driver doing decap. True. This seeks to get us back to where we are before the feature: driver can send specific outer hashes to specific queues. outer hash collisions remain a problem. This is for RSS. For hash reporting indirection table is not used. Maybe it is enough to signal to driver that inner hash was used. We do need that signalling though. My question would be whether it's practical to implement in hardware. In above example, hw calculating double hash is difficult without much gain. Either calculating on one inner or outer makes sense. Signaling whether calculated on inner or outer is fine because hw exactly tells what it did. This, in a sense, is what reporting hash tunnel type did. Do you now think we need it? - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/2/23 上午10:50, Jason Wang 写道: Hi: 在 2023/2/22 14:46, Heng Qi 写道: Hi, Jason. Long time no see. :) 在 2023/2/22 上午11:22, Jason Wang 写道: 在 2023/2/22 01:50, Michael S. Tsirkin 写道: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: +\subparagraph{Security risks between encapsulated packets and RSS} +There may be potential security risks when encapsulated packets using RSS to +select queues for placement. When a user inside a tunnel tries to control the What do you mean by "user" here? Is it a remote or local one? I mean a remote attacker who is not under the control of the tunnel owner. Anything may the tunnel different? I think this can happen even without tunnel (and even with single queue). I agree. How to mitigate those attackers seems more like a implementation details where might require fair queuing or other QOS technology which has been well studied. I am also not sure whether this point needs to be focused on in the spec, and I see that the protection against tunnel DoS is more protected outside the device, but it seems to be okay to give some attack reminders. Thanks. It seems out of the scope of the spec (unless we want to let driver manageable QOS). Thanks Thanks. +enqueuing of encapsulated packets, then the user can flood the device with invaild +packets, and the flooded packets may be hashed into the same queue as packets in +other normal tunnels, which causing the queue to overflow. + +This can pose several security risks: +\begin{itemize} +\item Encapsulated packets in the normal tunnels cannot be enqueued due to queue + overflow, resulting in a large amount of packet loss. +\item The delay and retransmission of packets in the normal tunnels are extremely increased. +\item The user can observe the traffic information and enqueue information of other normal + tunnels, and conduct targeted DoS attacks. +\end{\itemize} + Hmm with this all written out it sounds pretty severe. I think we need first understand whether or not it's a problem that we need to solve at spec level: 1) anything make encapsulated packets different or why we can't hit this problem without encapsulation 2) whether or not it's the implementation details that the spec doesn't need to care (or how it is solved in real NIC) Thanks At this point with no ways to mitigate, I don't feel this is something e.g. Linux can enable. I am not going to nack the spec patch if others find this somehow useful e.g. for dpdk. How about CC e.g. dpdk devs or whoever else is going to use this and asking them for the opinion? - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/2/28 下午4:52, Michael S. Tsirkin 写道: On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote: On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin wrote: On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote: On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin wrote: On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote: Btw, this kind of 1:1 hash features seems not scalable and flexible. It requires an endless extension on bits/fields. Modern NICs allow the user to customize the hash calculation, for virtio-net we can allow to use eBPF program to classify the packets. It seems to be more flexible and scalable and there's almost no maintain burden in the spec (only bytecode is required, no need any fancy features/interactions like maps), easy to be migrated etc. Prototype is also easy, tun/tap had an eBPF classifier for years. Thanks Yea BPF offload would be great to have. We have been discussing it for years though - security issues keep blocking it. *Maybe* it's finally going to be there but I'm not going to block this work waiting for BPF offload. And easily migrated is what BPF is not. Just to make sure we're at the same page. I meant to find a way to allow the driver/user to fully customize what it wants to hash/classify. Similar technologies which is based on private solution has been used by some vendors, which allow user to customize the classifier[1] ePBF looks like a good open-source solution candidate for this (there could be others). But there could be many kinds of eBPF programs that could be offloaded. One famous one is XDP which requires many features other than the bytecode/VM like map access, tailcall. Starting from such a complicated type is hard. Instead, we can start from a simple type, that is the eBPF classifier. All it needs is to pass the bytecode to the device, the device can choose to run it or compile it to what it can understand for classifying. We don't need maps, tail calls and other features. Until people start asking exactly for maps because they want state for their classifier? Yes, but let's compare the eBPF without maps with the static feature proposed here. It is much more scalable and flexible. And it makes sense - if you want e.g. load balancing you need stats which needs maps. Yes, but we know it's possible to have that (through the XDP offload). This is impossible with the approach proposed here. I'm not actually objecting. And at least we then don't need to worry about leaking info - it's not virtio leaking info it's the bpf program. I wonder what does Heng Qi think. Heng Qi would it work for your scenario? We are positive on ebpf, which looks adequate in our scenario. Although it currently has some problems in offloading, such as imperfect interfaces, unstable, and user-unfriendly ebpf codes may consume a lot of device resources. Device support for ebpf will also take time. Also, the presence of ebpf offload does not conflict with other solutions, eg we still have RSS. Our goal is to pass this patch first. For the support of ebpf offloading, we have not collected internal requirements for the time being, but it is indeed a good direction. Thanks. We don't need to worry about the security because of its simplicity: the eBPF program is only in charge of doing classification, no other interactions with the driver and packet modification is prohibited. The feature is limited only to the VM/bytecode abstraction itself. What's more, it's a good first step to achieve full eBPF offloading in the future. Thanks [1] https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html Dave seems to have nacked this approach, no? I may miss something but looking at kernel commit, there are few patches to support that: E.g commit c7648810961682b9388be2dd041df06915647445 Author: Tony Nguyen Date: Mon Sep 9 06:47:44 2019 -0700 ice: Implement Dynamic Device Personalization (DDP) download And it has been used by DPDK drivers. Thanks If we are talking about netdev then this discussion has to take place on netdev. If it's dpdk this is more believable. -- MST - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] [PATCH v10] virtio-net: support the virtqueue coalescing moderation
Currently, coalescing parameters are grouped for all transmit and receive virtqueues. This patch supports setting or getting the parameters for a specified virtqueue, and a typical application of this function is netdim[1]. When the traffic between virtqueues is unbalanced, for example, one virtqueue is busy and another virtqueue is idle, then it will be very useful to control coalescing parameters at the virtqueue granularity. [1] https://docs.kernel.org/networking/net_dim.html Signed-off-by: Heng Qi Reviewed-by: Xuan Zhuo --- This patch is on top of Alvaro's latest v7 patch: https://lists.oasis-open.org/archives/virtio-dev/202302/msg00431.html . v9->v10: 1. Remove the "global values". @Parav Pandit 2. Avoid multiple interpretations of command behavior. @Alvaro Karsz v8->v9: 1. Declare the commands that can be sent for each feature. @Alvaro Karsz 2. Add information about "global values" in the command's explanation. @Alvaro Karsz v7->v8: 1. Use "best-effort" in Alvaro's patch instead of "the device may set the parameter to a value close to 2". @Michael S . Tsirkin, @David Edmondson v6->v7: 1. Clarify the relationship of VIRTIO_NET_CTRL_NOTF_COAL_TX/RX_SET and VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET. @Alvaro Karsz, @Michael S. Tsirkin 2. Remove the formula for vqn range. @Parav Pandit 3. Some expressions are clearer. @Parav Pandit, @Michael S. Tsirkin v5->v6: 1. Explain that the device may set a different value than the one passed in by the driver. @David Edmondson v4->v5: 1. Add the correspondence between virtio_net_ctrl_coal and virtio_net_ctrl_coal_vq and control commands. @Michael S. Tsirkin 2. Add read and write attributes for each field. @Michael S. Tsirkin 3. A clearer description of how to set coalescing parameters for vq reset. @Michael S. Tsirkin 4. Fix some syntax errors. @Michael S. Tsirkin, @David Edmondson v3->v4: 1. Include virtio_net_ctrl_coal in the virtio_net_ctrl_coal_vq structure. @Alvaro Karsz 2. Add consideration of vq reset. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz 3. Avoid too many examples by giving a comprehensive example. @Michael S. Tsirkin 4. Fix typos and streamline clarifications. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz v2->v3: 1. Add the netdim link. @Parav Pandit 2. VIRTIO_NET_F_VQ_NOTF_COAL no longer depends on VIRTIO_NET_F_NOTF_COAL. @Michael S. Tsirkin, @Alvaro Karsz 3. _VQ_GET is explained more. @Michael S. Tsirkin 4. Add more examples to avoid misunderstandings. @Michael S. Tsirkin 5. Clarify some statements. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz 6. Adjust the virtio_net_ctrl_coal_vq structure. @Michael S. Tsirkin 7. Fix some typos. @Michael S. Tsirkin v1->v2: 1. Rename VIRTIO_NET_F_PERQUEUE_NOTF_COAL to VIRTIO_NET_F_VQ_NOTF_COAL. @Michael S. Tsirkin 2. Use the \field{vqn} instead of the qid. @Michael S. Tsirkin 3. Unify tx and rx control structures into one structure virtio_net_ctrl_coal_vq. @Michael S. Tsirkin 4. Add a new control command VIRTIO_NET_CTRL_NOTF_COAL_VQ. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz 5. The special value 0xFFF is removed because VIRTIO_NET_CTRL_NOTF_COAL can be used. @Alvaro Karsz 6. Clarify some special scenarios. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz device-types/net/description.tex | 88 1 file changed, 77 insertions(+), 11 deletions(-) diff --git a/device-types/net/description.tex b/device-types/net/description.tex index e71e33b..b5d45e8 100644 --- a/device-types/net/description.tex +++ b/device-types/net/description.tex @@ -83,6 +83,8 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control channel. +\item[VIRTIO_NET_F_VQ_NOTF_COAL(52)] Device supports virtqueue notification coalescing. + \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing. \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets. @@ -139,6 +141,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ. \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6. \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ. +\item[VIRTIO_NET_F_VQ_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ. \end{description} \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} @@ -1505,13 +1508,13 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device
Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/3/1 下午7:07, Michael S. Tsirkin 写道: On Wed, Mar 01, 2023 at 11:30:37AM +0800, Heng Qi wrote: 在 2023/2/28 下午7:16, Michael S. Tsirkin 写道: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: If the tunnel is used to encapsulate the packets, the hash calculated using the outer header of the receive packets is always fixed for the same flow packets, i.e. they will be steered to the same receive queue. Wait a second. How is this true? Does not everyone stick the inner header hash in the outer source port to solve this? For example geneve spec says: it is necessary for entropy from encapsulated packets to be exposed in the tunnel header. The most common technique for this is to use the UDP source port same goes for vxlan did not check further. so what is the problem? and which tunnel types actually suffer from the problem? In fact, similar to protocols such as GRE, there is no outer transport header. Thanks. Sorry I don't understand the answer. What is similar to what? By GRE you mean NVGRE? That has FlowID for this purpose. Only 8 bit - is this the issue? Not enough entropy? Sorry I almost missed this email. Did you miss the reply in the other thread: " The end point of the tunnel called the gateway (with DPDK on top of it). 1. When there is no inner header hash, entropy can be inserted into the udp src port of the outer header of the tunnel, and then the tunnel packet is handed over to the host. The host needs to take out a part of the CPUs to parse the outer headers (but not drop them) to calculate the inner hash for the inner payloads, and then use the inner hash to forward them to another part of the CPUs that are responsible for processing. 1). During this process, the CPUs on the host are divided into two parts, one part is used as a forwarding node to parse the outer header, and the CPU utilization is low. Another part handles packets. 2). The entropy of the source udp src port is not enough, that is, the queue is not widely distributed. 2. When there is an inner header hash, the gateway will directly help parse the outer header, and use the inner 5 tuples to calculate the inner hash. The tunneled packet is then handed over to the host. 1) All the CPUs of the host are used to process data packets, and there is no need to use some CPUs to forward and parse the outer header. 2) The entropy of the original quintuple is sufficient, and the queue is widely distributed. " In this thread, I mean protocols such as Generic Routing Encapsulation (GRE)[1], which have IPv4 as *Delivery Header*. Compared with VXLAN, which increases entropy through outer udp src port, GRE has less entropy. [1] https://www.rfc-editor.org/rfc/rfc2784.html Thanks. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v5] virtio-net: support the virtqueue coalescing moderation
在 2023/2/21 下午7:48, David Edmondson 写道: On Tuesday, 2023-02-21 at 16:38:52 +08, Heng Qi wrote: ... +A device MAY set the coalescing parameter to a value close to a power of 2 value. What is this about? If it is intended to indicate that a device may use a value different to that passed by the driver, more text to describe that would be Yes. How about adding the following explanation to the "Operation" subparagraph? " When a device receives the VIRTIO_NET_CTRL_NOTF_COAL commands to set an coalescing parameter, it may set the parameter to a value close to a power of 2. For example: If the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command with \field{max_usecs} = 7 is received, the device may set \field{max_usecs} = 8 for a given enabled virtqueue. " Thanks. useful. If not, what does it mean? - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/2/22 上午7:18, Michael S. Tsirkin 写道: On Tue, Feb 21, 2023 at 10:32:11PM +, Parav Pandit wrote: From: Michael S. Tsirkin Sent: Tuesday, February 21, 2023 4:46 PM What is this information driver can't observe? It sees all the packets after all, we are not stripping tunneling headers. Just the tunnel type. If/when that tunnel header is stripped, it gets complicated where tunnel type is still present in the virtio_net_hdr because hash_report_tunnel feature bit is negotiated. whoever strips off the tunnel has I imagine strip off the virtio net hdr too - everything else in it such as gso type refers to the outer packet. I also don't really know what are upper layer drivers - for sure layering of drivers is not covered in the spec for now so I am not sure what do you mean by that. The risk I mentioned is leaking the information *on the network*. Got it. \begin{lstlisting} struct virtio_net_rss_config { le32 hash_types; +le32 hash_tunnel_types; This field is not needed as device config space advertisement for the support is enough. If the intent is to enable hashing for the specific tunnel(s), an individual command is better. new command? I am not sure why we want that. why not handle tunnels like we do other protocols? I didn't follow. We probably discussed in another thread that to set M bits, it is wise to avoid setting N other bits just to keep the command happy, where N >>> M and these N have a very strong relation in hw resource setup and packet steering. Any examples of 'other protocols'? #define VIRTIO_NET_HASH_TYPE_IPv4 (1 << 0) #define VIRTIO_NET_HASH_TYPE_TCPv4 (1 << 1) #define VIRTIO_NET_HASH_TYPE_UDPv4 (1 << 2) this kind of thing. I don't see how a tunnel is different fundamentally. Why does it need its own field? Driver is in control to enable/disable tunnel based inner hash acceleration only when its needed. This way certain data path hw parsers can be enabled/disabled. Without this it will be always enabled even if there may not be any user of it. Device has scope to optimize this flow. I feel you misunderstand the question. Or maybe I misunderstand what you are proposing. So tunnels need their own bits. But why a separate field and not just more bits along the existing ones? Because the hashing is not covering the outer header contents. We may be still not discussing the same. So let me refresh the context. The question of discussion was, Scenario: 1. device advertises the ability to hash on the inner packet header. 2. device prefers that driver enable it only when it needs to use this extra packet parser in hardware. There are 3 options. a. Because the feature is negotiated, it means it is enabled for all the tunnel types. Pros: 1. No need to extend cvq cmd. Cons: 1. device parser is always enabled, and the driver never uses it. This may result in inferior rx performance. b. Since the feature is useful in a narrow case of sw-based vxlan etc driver, better not to enable hw for it. Hence, have the knob to explicitly enable in hw. So have the cvq command. b.1 should it be combined with the existing command? Cons: a. when the driver wants to enable hash on inner, it needs to supply the exact same RSS config as before. Sw overhead with no gain. b. device needs to parse new command value, compare with old config, and drop the RSS config, just enable inner hashing hw parser. Or destroy the old rss config and re-apply. This results in weird behavior for the short interval with no apparent gain. b.2 should it be on its own command? Pros: a. device and driver doesn't need to bother about b.1.a and b.1.b. b. still benefits from not always enabling hw parser, as this is not a common case. c. has the ability to enable when needed. I prefer b.1. With reporting of the tunnel type gone I don't see a fundamental difference between hashing over tunneling types and other protocol types we support. It's just a flag telling device over which bits to calculate the hash. We don't have a separate command for hashing of TCPv6, why have it for vxlan? Extending with more HASH_TYPE makes total sense to me, seems to fit better with the existing design and will make patch smaller. +1. It is infrequent to configure the *tunnel hash types* through commands, and when configuring the *hash types*, the hash key and indirection table are not required too. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/2/28 下午7:16, Michael S. Tsirkin 写道: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: If the tunnel is used to encapsulate the packets, the hash calculated using the outer header of the receive packets is always fixed for the same flow packets, i.e. they will be steered to the same receive queue. Wait a second. How is this true? Does not everyone stick the inner header hash in the outer source port to solve this? Yes, you are right. That's what we did before the inner header hash, but it has a performance penalty, which I'll explain below. For example geneve spec says: it is necessary for entropy from encapsulated packets to be exposed in the tunnel header. The most common technique for this is to use the UDP source port The end point of the tunnel called the gateway (with DPDK on top of it). 1. When there is no inner header hash, entropy can be inserted into the udp src port of the outer header of the tunnel, and then the tunnel packet is handed over to the host. The host needs to take out a part of the CPUs to parse the outer headers (but not drop them) to calculate the inner hash for the inner payloads, and then use the inner hash to forward them to another part of the CPUs that are responsible for processing. 1). During this process, the CPUs on the host is divided into two parts, one part is used as a forwarding node to parse the outer header, and the CPU utilization is low. Another part handles packets. 2). The entropy of the source udp src port is not enough, that is, the queue is not widely distributed. 2. When there is an inner header hash, the gateway will directly help parse the outer header, and use the inner 5 tuples to calculate the inner hash. The tunneled packet is then handed over to the host. 1) All the CPUs of the host are used to process data packets, and there is no need to use some CPUs to forward and parse the outer header. 2) The entropy of the original quintuple is sufficient, and the queue is widely distributed. Thanks. same goes for vxlan did not check further. so what is the problem? and which tunnel types actually suffer from the problem? - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/2/28 下午7:16, Michael S. Tsirkin 写道: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: If the tunnel is used to encapsulate the packets, the hash calculated using the outer header of the receive packets is always fixed for the same flow packets, i.e. they will be steered to the same receive queue. Wait a second. How is this true? Does not everyone stick the inner header hash in the outer source port to solve this? For example geneve spec says: it is necessary for entropy from encapsulated packets to be exposed in the tunnel header. The most common technique for this is to use the UDP source port same goes for vxlan did not check further. so what is the problem? and which tunnel types actually suffer from the problem? In fact, similar to protocols such as GRE, there is no outer transport header. Thanks. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash
在 2023/2/21 下午8:47, Parav Pandit 写道: From: virtio-comm...@lists.oasis-open.org On Behalf Of Heng Qi Also, a feature bit VIRTIO_NET_F_HASH_REPORT_TUNNEL are added to report an encapsulation type, and the feature depends on VIRTIO_NET_F_HASH_REPORT. As we discussed that tunnel type alone is not useful the sw, neither as an individual field nor merged with some other field. Hence, please remove this feature bit. HASH_TUNNEL is good enough. Please remove the references to it at more places below. If we don't want it at all, I'll remove it and references to it. Ok. thanks. [..] #define VIRTIO_NET_HASH_TYPE_UDP_EX(1 << 8) \end{lstlisting} Lets please remove the below encoding. Here is the encoding of the hash tunnel types, I guess you are referring to remove the encoding of the hash_report_tunnel types? Right. [..] +RSS to select queues for placement. When a user inside a tunnel +tries to control the enqueuing of encapsulated packets, then the +user can flood the device with invaild packets, and the flooded +packets may be hashed into the same queue as packets in other normal +tunnels, which causing the queue to overflow. Invalid packets are confusing and the wording of "which causing" is not proper. There is some duplicate wording below too. I think above and below risk can be summarized in bit simpler manner. How about, When a specific receive queue is shared to receive packets of multiple tunnels, there is no quality of service for packets of multiple tunnels. + I think this sentence can be used as a starting summary, and readers may still need to expand the explanation. Do you think the following description is ok? " When a specific receive queue is shared to receive encapsulating packets of multiple tunnels, there is no quality of service for these packets of multiple tunnels. For example: A user inside the tunnel floods a device with packets, then the packets are hashed into the shared receive queue and cause the queue to overflow, and this increases packet loss and latency for other tunnels. " Even without a queue overflow, this shared receive queue may not have a balanced number of packets. For example, tunnel-2 occupied 90% of the queue and left only 10% for tunnel-1. So, your example is right (and extreme), a generic mention of QoS covers both aspects. Secondly "user inside the tunnel" is challenging to explain. In above sentence talks specifically about the "receive queue" as an object. "overflow" in my example. If we only use a one-sentence summary to describe tunnel risks, then I think this subparagraph is called "QoS problem for tunnel hash". struct virtio_net_rss_config { le32 hash_types; +le32 hash_tunnel_types; This field is not needed as device config space advertisement for the support is enough. If so, virtio_net_hash_config does not require hash_tunnel_types when it does not need to configure the specific tunnel(s). If the intent is to enable hashing for the specific tunnel(s), an individual command is better. Drivers MAY filter out some tunneling types when negotiating this feature. Do you think it would be better for us to add a separate command? I don't see tools like ethtool that can configure specific tunnels in userspace. The reason I proposed different command is, Let's say we have only single command. Rss config command has many other fields unrelated to the inner hash. Yes. For inner hash, fields such as indirection table are irrelevant. Hence, to enable inner hash, the driver needs to supply the exact same value for unrelated fields to the same value. And device needs to compare it with the old value and maintain some sort of cache to derive that nothing changes from the previous hash config, hence, ignore hw configuration for rss. This mechanism slows down the command for the unrelated task. Totally agree. Hence, I am considering a separate command that would be simple for the device and driver to implement. I agree with you. Do you want me to do it in this patch, or should we do it in another patch? Thanks! :) Regardless, this new field cannot be in the middle of the new structure as it breaks backward compatibility. Yes, you are right. I'll fix this. Thank you very much! - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] RE: [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash
在 2023/2/21 下午8:47, Parav Pandit 写道: From: virtio-comm...@lists.oasis-open.org On Behalf Of Heng Qi Also, a feature bit VIRTIO_NET_F_HASH_REPORT_TUNNEL are added to report an encapsulation type, and the feature depends on VIRTIO_NET_F_HASH_REPORT. As we discussed that tunnel type alone is not useful the sw, neither as an individual field nor merged with some other field. Hence, please remove this feature bit. HASH_TUNNEL is good enough. Please remove the references to it at more places below. If we don't want it at all, I'll remove it and references to it. Ok. thanks. [..] #define VIRTIO_NET_HASH_TYPE_UDP_EX(1 << 8) \end{lstlisting} Lets please remove the below encoding. Here is the encoding of the hash tunnel types, I guess you are referring to remove the encoding of the hash_report_tunnel types? Right. [..] +RSS to select queues for placement. When a user inside a tunnel +tries to control the enqueuing of encapsulated packets, then the +user can flood the device with invaild packets, and the flooded +packets may be hashed into the same queue as packets in other normal +tunnels, which causing the queue to overflow. Invalid packets are confusing and the wording of "which causing" is not proper. There is some duplicate wording below too. I think above and below risk can be summarized in bit simpler manner. How about, When a specific receive queue is shared to receive packets of multiple tunnels, there is no quality of service for packets of multiple tunnels. + I think this sentence can be used as a starting summary, and readers may still need to expand the explanation. Do you think the following description is ok? " When a specific receive queue is shared to receive encapsulating packets of multiple tunnels, there is no quality of service for these packets of multiple tunnels. For example: A user inside the tunnel floods a device with packets, then the packets are hashed into the shared receive queue and cause the queue to overflow, and this increases packet loss and latency for other tunnels. " Even without a queue overflow, this shared receive queue may not have a balanced number of packets. For example, tunnel-2 occupied 90% of the queue and left only 10% for tunnel-1. So, your example is right (and extreme), a generic mention of QoS covers both aspects. Secondly "user inside the tunnel" is challenging to explain. In above sentence talks specifically about the "receive queue" as an object. struct virtio_net_rss_config { le32 hash_types; +le32 hash_tunnel_types; This field is not needed as device config space advertisement for the support is enough. If so, virtio_net_hash_config does not require hash_tunnel_types when it does not need to configure the specific tunnel(s). If the intent is to enable hashing for the specific tunnel(s), an individual command is better. Drivers MAY filter out some tunneling types when negotiating this feature. Do you think it would be better for us to add a separate command? I don't see tools like ethtool that can configure specific tunnels in userspace. The reason I proposed different command is, Let's say we have only single command. Rss config command has many other fields unrelated to the inner hash. And virtio_net_hash_config seems to suffice except for le16 reserved[4]. Hence, to enable inner hash, the driver needs to supply the exact same value for unrelated fields to the same value. And device needs to compare it with the old value and maintain some sort of cache to derive that nothing changes from the previous hash config, hence, ignore hw configuration for rss. This mechanism slows down the command for the unrelated task. Hence, I am considering a separate command that would be simple for the device and driver to implement. Regardless, this new field cannot be in the middle of the new structure as it breaks backward compatibility. Yes, you are right. I'll fix this. Thank you very much! - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [virtio-comment] [PATCH v13] virtio-net: support inner header hash
在 2023/4/26 下午10:48, Michael S. Tsirkin 写道: On Wed, Apr 26, 2023 at 10:14:30PM +0800, Heng Qi wrote: This does not mean that every device needs to implement and support all of these, they can choose to support some protocols they want. I add these because we have scale application scenarios for modern protocols VXLAN-GPE/GENEVE: +\item In scenarios where the same flow passing through different tunnels is expected to be received in the same queue, + warm caches, lessing locking, etc. are optimized to obtain receiving performance. Maybe the legacy GRE, VXLAN-GPE and GENEVE? But it has a little crossover. Thanks. But VXLAN-GPE/GENEVE can use source port for entropy. It is recommended that the UDP source port number be calculated using a hash of fields from the inner packet That is best because it allows end to end control and is protocol agnostic. Yes. I agree with this, I don't think we have an argument on this point right now.:) For VXLAN-GPE/GENEVE or other modern tunneling protocols, we have to deal with scenarios where the same flow passes through different tunnels. Having them hashed to the same rx queue, is hard to do via outer headers. All that is missing is symmetric Toepliz and all is well? The scenarios above or in the commit log also require inner headers. Thanks. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] RE: [virtio-comment] RE: [PATCH v13] virtio-net: support inner header hash
在 2023/4/26 下午10:24, Parav Pandit 写道: From: Heng Qi Sent: Wednesday, April 26, 2023 10:04 AM Yes, but that seems like a tiny cost, and the cvq command-related structure is much simpler. Current structure size is 24 bytes. This size becomes multiplier with device count scale to be always available and rarely changes. As we add new features such device capabilities grow making the multiplier bigger. For example a. flow steering capabilities (how many flows, what mask, supported protocols, generic options) b. hds capabilities c. counter capabilities (histogram based, which error counters supported, etc) d. which new type of tx vq improvements supported. e. hw gro context count supported May be more.. Depending on the container/VM size certain capabilities may change from device to device. Hence it is hard to deduplicate them at device level. This makes sense. In general, we should be careful about adding things to the device space unless the benefit is non-trivial. Thanks. Therefore, ability to query them over a non_always_available transport is preferred choice from the device. A driver may choose to cache it if its being frequently accessed or ask device when needed. Even when it's cached by driver, it is coming from the component that doesn’t have transport level timeouts associated with it. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v13] virtio-net: support inner header hash
在 2023/4/26 上午4:28, Parav Pandit 写道: On 4/23/2023 3:35 AM, Heng Qi wrote: \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} @@ -198,6 +202,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device u8 rss_max_key_size; le16 rss_max_indirection_table_length; le32 supported_hash_types; + le32 supported_tunnel_hash_types; }; In v12 I was asking this to move to above field from the config area to the GET command in comment [1] as, "With that no need to define two fields at two different places in config area and also in cvq." I'm not sure if this is sufficiently motivated, RSS also has supports_hash_types in config space. We don't actually need cvq and config to sync on supported_tunnel_hash_types, since it doesn't need to change (meaning supported_tunnel_hash_types doesn't send configuration change notifications). I am sorry if that was not clear enough. [1] https://lore.kernel.org/virtio-dev/569cbaf9-f1fb-0e1f-a2ef-b1d7cd7db...@nvidia.com/ \subparagraph{Supported/enabled hash types} \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types} +This paragraph relies on definitions from \hyperref[intro:IP]{[IP]}, \hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}. Hash types applicable for IPv4 packets: \begin{lstlisting} #define VIRTIO_NET_HASH_TYPE_IPv4 (1 << 0) @@ -980,6 +993,152 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network (see \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / IPv6 packets without extension header}). \end{itemize} +\paragraph{Inner Header Hash} +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Inner Header Hash} + +If VIRTIO_NET_F_HASH_TUNNEL has been negotiated, the device supports inner header hash and the driver can send +commands VIRTIO_NET_CTRL_TUNNEL_HASH_SET and VIRTIO_NET_CTRL_TUNNEL_HASH_GET for the inner header hash configuration. + +struct virtio_net_hash_tunnel_config { Please move field from the config struct to here. Both are RO fields. le32 supported_hash_tunnel_types; + le32 hash_tunnel_types; +}; + +#define VIRTIO_NET_CTRL_TUNNEL_HASH 7 + #define VIRTIO_NET_CTRL_TUNNEL_HASH_SET 0 + #define VIRTIO_NET_CTRL_TUNNEL_HASH_GET 1 + +Filed \field{hash_tunnel_types} contains a bitmask of configured hash tunnel types as +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash tunnel types}. + +The class VIRTIO_NET_CTRL_TUNNEL_HASH has the following commands: +\begin{itemize} +\item VIRTIO_NET_CTRL_TUNNEL_HASH_SET: set the \field{hash_tunnel_types} to configure the inner header hash calculation for the device. +\item VIRTIO_NET_CTRL_TUNNEL_HASH_GET: get the \field{hash_tunnel_types} from the device. +\end{itemize} + +For the command VIRTIO_NET_CTRL_TUNNEL_HASH_SET, the structure virtio_net_hash_tunnel_config is write-only for the driver. +For the command VIRTIO_NET_CTRL_TUNNEL_HASH_GET, the structure virtio_net_hash_tunnel_config is read-only for the driver. + You need to split the structures to two, one for get and one for set in above description as get and set contains different fields. + +If VIRTIO_NET_HASH_TUNNEL_TYPE_NONE is set or the encapsulation type is not included in \field{hash_tunnel_types}, +the hash of the outer header is calculated for the received encapsulated packet. + + +For scenarios with sufficient external entropy or no internal hashing requirements, inner header hash may not be needed: +A tunnel is often expected to isolate the external network from the internal one. By completely ignoring entropy +in the external header and replacing it with entropy from the internal header, for hash calculations, this expectation You wanted to say inner here like rest of the places. s/internal header/inner header I want to make the 'external' and 'internal' correspond, but avoid the internal header, and use a unified 'inner header' is also reasonable.:) +The driver MUST NOT set any VIRTIO_NET_HASH_TUNNEL_TYPE_ flags that are not supported by the device. Multiple flags so, s/flags that are/flags which are/ Will fix. Thanks! - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/3/21 上午3:48, Michael S. Tsirkin 写道: On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote: We use the most basic GRE header fields (not NVGRE), not even optional fields. I'd say yes, the most convincing usecase is with legacy GRE. Yes. But we still have a strong need for VXLAN and GENEVE to do symmetric hashing. Please consider this. Given that, do you need the rest of protocols there? I would say that I checked the current tunneling protocols used for overlay networks and their respective RFC versions compared to each other. They are: 1. GRE_rfc2784 :This protocol is only specified for IPv4 and used as either the payload or delivery protocol. link : https://datatracker.ietf.org/doc/rfc2784/ 2. GRE_rfc2890: This protocol describes extensions by which two fields, Key and Sequence Number, can be optionally carried in the GRE Header. link: https://www.rfc-editor.org/rfc/rfc2890 3. GRE_rfc7676: IPv6 Support for Generic Routing Encapsulation (GRE). This protocol is specified for IPv6 and used as either the payload or delivery protocol. Note that this does not change the GRE header format or any behaviors specified by RFC 2784 or RFC 2890. link: https://datatracker.ietf.org/doc/rfc7676/ 4. GRE-in-UDP: GRE-in-UDP Encapsulation. This specifies a method of encapsulating network protocol packets within GRE and UDP headers. This GRE-in-UDP encapsulation allows the UDP source port field to be used as an entropy field. This protocol is specified for IPv4 and IPv6, and used as either the payload or delivery protocol. link: https://www.rfc-editor.org/rfc/rfc8086 5. VXLAN: Virtual eXtensible Local Area Network. link: https://datatracker.ietf.org/doc/rfc7348/ 6. VXLAN-GPE: Generic Protocol Extension for VXLAN. This protocol describes extending Virtual eXtensible Local Area Network (VXLAN) via changes to the VXLAN header. link: https://www.ietf.org/archive/id/draft-ietf-nvo3-vxlan-gpe-12.txt 7. GENEVE: Generic Network Virtualization Encapsulation. link: https://datatracker.ietf.org/doc/rfc8926/ 8. IPIP: IP Encapsulation within IP. link: https://www.rfc-editor.org/rfc/rfc2003 9. NVGRE: Network Virtualization Using Generic Routing Encapsulation link: https://www.rfc-editor.org/rfc/rfc7637.html 10. STT: Stateless Transport Tunneling. STT is particularly useful when some tunnel endpoints are in end-systems, as it utilizes the capabilities of the network interface card to improve performance. link: https://www.ietf.org/archive/id/draft-davie-stt-08.txt Among them, GRE_rfc2784, VXLAN and GENEVE are our internal requirements for inner header hashing. GRE_rfc2784 requires RSS hashing to different queues. For the monitoring scenario I mentioned, VXLAN or GRE_rfc2890 also needs to use inner symmetric hashing. I know you mean to want this feature to only support GRE_rfc2784, since it's the most convincing for RSS. But RSS hashes packets to different queues for different streams. For the same flow, it needs to hash it to the same queue. So this doesn't distort the role of RSS, and I believe that for modern protocols like VXLAN and others, inner symmetric hashing is still a common requirement for other vendors using virtio devices. So, can we make this feature support all the protocols I have checked above, so that vendors can choose to support the protocols they want. And this can avoid the addition of new tunnel protocols in the near future as much as possible. Do you think it's ok? Again: I'm so sorry I didn't realize I missed this until I checked my emails. We can start with just legacy GRE (think about including IPv6 or not). Given how narrow this usecase is, I'd be fine with focusing just on this, and addressing more protocols down the road with something programmable like BPF. WDYT? - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/3/21 上午3:45, Michael S. Tsirkin 写道: On Thu, Mar 16, 2023 at 09:17:26PM +0800, Heng Qi wrote: On Wed, Mar 15, 2023 at 10:57:40AM -0400, Michael S. Tsirkin wrote: On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote: 在 2023/3/15 下午7:58, Michael S. Tsirkin 写道: On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote: 在 2023/3/10 上午3:36, Michael S. Tsirkin 写道: On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote: 在 2023/3/8 下午10:39, Michael S. Tsirkin 写道: On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote: 在 2023/2/28 下午7:16, Michael S. Tsirkin 写道: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: If the tunnel is used to encapsulate the packets, the hash calculated using the outer header of the receive packets is always fixed for the same flow packets, i.e. they will be steered to the same receive queue. Wait a second. How is this true? Does not everyone stick the inner header hash in the outer source port to solve this? Yes, you are right. That's what we did before the inner header hash, but it has a performance penalty, which I'll explain below. For example geneve spec says: it is necessary for entropy from encapsulated packets to be exposed in the tunnel header. The most common technique for this is to use the UDP source port The end point of the tunnel called the gateway (with DPDK on top of it). 1. When there is no inner header hash, entropy can be inserted into the udp src port of the outer header of the tunnel, and then the tunnel packet is handed over to the host. The host needs to take out a part of the CPUs to parse the outer headers (but not drop them) to calculate the inner hash for the inner payloads, and then use the inner hash to forward them to another part of the CPUs that are responsible for processing. I don't get this part. Leave inner hashes to the guest inside the tunnel, why is your host doing this? Let's simplify some details and take a fresh look at two different scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2). 1. In Scenario1, we can improve the processing performance of the same flow by implementing inner symmetric hashing. This is because even though client1 and client2 communicate bidirectionally through the same flow, their data may pass through and be encapsulated by different tunnels, resulting in the same flow being hashed to different queues and processed by different CPUs. To ensure consistency and optimized processing, we need to parse out the inner header and compute a symmetric hash on it using a special rss key. Sorry for not mentioning the inner symmetric hash before, in order to prevent the introduction of more concepts, but it is indeed a kind of inner hash. If parts of a flow go through different tunnels won't this cause reordering at the network level? Why is it so important to prevent it at the nic then? Or, since you are stressing symmetric hash, are you talking about TX and RX side going through different tunnels? Yes, the directions client1->client2 and client2->client1 may go through different tunnels. Using inner symmetric hashing can satisfy the same CPU to process two directions of the same flow to improve performance. Well sure but ... are you just doing forwarding or inner processing too? When there is an inner hash, there is no forwarding anymore. If forwarding why do you care about matching TX and RX queues? If e2e In fact, we are just matching on the same rx queue. The network topology is roughly as follows. The processing host will receive the packets sent from client1 and client2 respectively, then make some action judgments, and return them to client2 and client1 respectively. client1 client2 | | | __ | +->| tunnel |<+ || | | | | | | v v +-+ | processing host | +-+ Thanks. monotoring host would be a better term Sure. I'm so sorry I didn't realize I missed this until I checked my emails. :( processing can't you just store the incoming hash in the flow and reuse on TX? This is what Linux is doing... 2. In Scenario2 with GRE, the lack of outer transport headers means that flows between multiple communication pairs encapsulated by the same tunnel will all be hashed to the same queue. To address this, we need to implement inner hashing to improve the performance of RSS. By parsing and calculating the inner hash, different flows can be hashed to different queues. Thanks. Well 2 is at least inexact, there's flowID there. It's just 8 bit We use the most basic GRE header fields (not NVGRE), not even optional fields. There is also no flow id in the GRE header, should you be referring to NVGRE? Thanks. so not sufficient if there are more than 512 queues. Still 512 queues is quite a lot. Are you try
[virtio-dev] Re: [PATCH v12] virtio-net: support inner header hash
在 2023/4/12 上午5:03, Parav Pandit 写道: On 4/3/2023 12:58 AM, Heng Qi wrote: To achieve this, the device can calculate a suitable hash based on the inner headers of this flow, for example using the Toeplitz combined with a symmetric hash key. I am not sure you need symmetric hash key. Toeplitz with symmetric hashing without the symmetric key is possible too. So just mentioning it as a 'combined with symmetric hashing' is enough. Yes, as discussed with Michael, we will also support XOR hashing or Toeplitz symmetric hashing or even both, I'm thinking of starting a draft in a separate thread to let us have initial discussions. The statement here I will modify. \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} @@ -198,6 +202,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device u8 rss_max_key_size; le16 rss_max_indirection_table_length; le32 supported_hash_types; + le32 supported_tunnel_hash_types; }; Given that a set command is added via cvq, it make sense to also do symetrric work to get it via a cvq. This is also similar to the latest work for notification coalescing for VQ where get and set done using single channel = cvq. Only set is given to match the existing RSS/HASH configuration methods. But we should really look ahead as you suggest. Granted that RSS and other fields are done differently, but it was bit in the past. Yes. With that no need to define two fields at two different places in config area and also in cvq. Just the new opcode is needed for GET and things will be fine. Right. +If VIRTIO_NET_F_HASH_TUNNEL has been negotiated, the device supports inner header hash and +the driver can configure the inner header hash calculation for encapsulated packets \ref{sec:Device Types / Network Device / Device OperatiHn / Processing of Incoming Packets / Hash calculation for incoming packets / Tunnel/Encapsulated packet} +by issuing the command VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG from the class VIRTIO_NET_CTRL_MQ. +The command sets \field{hash_tunnel_types} in the structure virtio_net_hash_tunnel_config. + +struct virtio_net_hash_tunnel_config { + le32 hash_tunnel_types; +}; + VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG_SET and VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG_GET Will do. +Filed \field{hash_tunnel_types} contains a bitmask of supported hash tunnel types as +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash tunnel types}. + +\subparagraph{Tunnel/Encapsulated packet} +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Tunnel/Encapsulated packet} + +A tunnel packet is encapsulated from the original packet based on the tunneling +protocol (only a single level of encapsulation is currently supported). The +encapsulated packet contains an outer header and an inner header, and the device +calculates the hash over either the inner header or the outer header. + +If VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received encapsulated packet's +outer header matches one of the supported \field{hash_tunnel_types}, the hash +of the inner header is calculated. s/one of the supported/one of the configured/ Because support comes from GET or config space area; out of which a smaller or equal subset of tunnel types are configured. Yes, configured is obviously more accurate than supported. +\devicenormative{\subparagraph}{Inner Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Header Hash} + +The device MUST calculate the outer header hash if the received encapsulated packet has an encapsulation type not in \field{supported_tunnel_hash_types}. + Since the configured set can be smaller, a better reword is: The device MUST calculate the hash from the outer header if the received encapsulated packet type is not matching from hash_tunnel_types. Will modify. +The device MUST respond to the VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG command with VIRTIO_NET_ERR if the device +received an unrecognized or unsupported VIRTIO_NET_HASH_TUNNEL_TYPE_ flag. + +Upon reset, the device MUST initialize \field{hash_tunnel_type} to 0. + +\drivernormative{\subparagraph}{Inner Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Header Hash} + +The driver MUST have negotiated the feature VIRTIO_NET_F_HASH_TUNNEL when issuing the command VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG. + +The driver MUST NOT set any VIRTIO_NET_HASH_TUNNEL_TYPE_ flags that are not supported by the device. + \paragraph{Hash reporting for incoming packets} \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash reporting
Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v12] virtio-net: support inner header hash
在 2023/4/14 上午11:10, Jason Wang 写道: On Fri, Apr 14, 2023 at 5:46 AM Michael S. Tsirkin wrote: On Thu, Apr 13, 2023 at 07:03:26PM +0800, Heng Qi wrote: For example, when the packets of certain +tunnels are spread across multiple receive queues, these receive queues may have an unbalanced +amount of packets. This can cause a specific receive queue to become full, resulting in packet loss. We have many places that can lead to packet dropping. For example, the automatic steering is best effort. I tend to avoid mentioning things like this. Ok. And Michael what do you think about this? I think this text did not do a great job explaining the security aspect. Here's a better, shorter explanation: It is often an expectation of users that a tunnel isolates the external network from the internal one. By completely ignoring entropy in the external header and replacing it with entropy from the internal header, for hash calculations, this expectation might be violated to a certain extent, depending on how the hash is used. When the hash use is limited to RSS queue selection, the effect will likely be limited to ability of users inside the tunnel to cause packet drops in multiple queues (as opposed to a single queue without the feature). And this is only for GRE-in-UDP? This makes me think if we should add GRE support for the outer header like: https://docs.napatech.com/r/Feature-Set-N-ANL9/Hash-Key-Type-10-3-Tuple-GREv0 I think this is for tunneling protocols with specific flow fields, such as GRE:key filed, NVGRE:FLOWID filed. This requires us to make a requirement when calculating the hash of the tunnels when F_TUNNEL_HASH is not negotiated. It's a new work. Thanks. Thanks + +Possible mitigations: +\begin{itemize} +\item Use a tool with good forwarding performance to keep the receive queue from filling up. +\item If the QoS is unavailable, the driver can set \field{hash_tunnel_types} to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE + to disable inner header hash for encapsulated packets. +\item Choose a hash key that can avoid queue collisions. +\item Perform appropriate QoS before packets consume the receive buffers of the receive queues. +\end{itemize} + +The limitations mentioned above exist with/without the inner header hash. This conflicts with the tile "Tunnel QoS limitation" which readers may think it happens only for tunnel. Perhaps a "QoS Advices" is better? Plural of "advice" is "advice" not "advices". This advice is somewhat bogus though. The point I keep trying to make is that this: Choose a hash key that can avoid queue collisions. is impossible with the feature and possible without. This was the whole reason I asked for a security considerations sections. -- MST - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v12] virtio-net: support inner header hash
在 2023/4/14 上午5:43, Michael S. Tsirkin 写道: On Thu, Apr 13, 2023 at 07:03:26PM +0800, Heng Qi wrote: For example, when the packets of certain +tunnels are spread across multiple receive queues, these receive queues may have an unbalanced +amount of packets. This can cause a specific receive queue to become full, resulting in packet loss. We have many places that can lead to packet dropping. For example, the automatic steering is best effort. I tend to avoid mentioning things like this. Ok. And Michael what do you think about this? I think this text did not do a great job explaining the security aspect. Here's a better, shorter explanation: It is often an expectation of users that a tunnel isolates the external network from the internal one. By completely ignoring entropy in the external header and replacing it with entropy from the internal header, for hash calculations, this expectation might be violated to a certain extent, depending on how the hash is used. When the hash use is limited to RSS queue selection, the effect will likely be limited to ability of users inside the tunnel to cause packet drops in multiple queues (as opposed to a single queue without the feature). Sure. Will do in the v13. + +Possible mitigations: +\begin{itemize} +\item Use a tool with good forwarding performance to keep the receive queue from filling up. +\item If the QoS is unavailable, the driver can set \field{hash_tunnel_types} to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE + to disable inner header hash for encapsulated packets. +\item Choose a hash key that can avoid queue collisions. +\item Perform appropriate QoS before packets consume the receive buffers of the receive queues. +\end{itemize} + +The limitations mentioned above exist with/without the inner header hash. This conflicts with the tile "Tunnel QoS limitation" which readers may think it happens only for tunnel. Perhaps a "QoS Advices" is better? Plural of "advice" is "advice" not "advices". My fault. This advice is somewhat bogus though. The point I keep trying to make is that this: Choose a hash key that can avoid queue collisions. is impossible with the feature and possible without. I don't think so, the outer headers also has corresponding entropy for different streams. Thanks. This was the whole reason I asked for a security considerations sections. Thanks! Thanks This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscr...@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org List help: virtio-comment-h...@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] RE: [virtio-dev] [PATCH] virtio-net: Fix and update VIRTIO_NET_F_NOTF_COAL feature
在 2023/2/7 上午5:53, Parav Pandit 写道: From: virtio-dev@lists.oasis-open.org On Behalf Of Michael S. Tsirkin On Mon, Feb 06, 2023 at 07:13:43PM +, Parav Pandit wrote: From: virtio-dev@lists.oasis-open.org On Behalf Of Alvaro Karsz This patch makes several improvements to the notification coalescing feature, including: - Consolidating virtio_net_ctrl_coal_tx and virtio_net_ctrl_coal_rx into a single struct, virtio_net_ctrl_coal, as they are identical. - Emphasizing that the coalescing commands are best-effort. - Defining the behavior of coalescing with regards to delivering notifications when a change occur. Patch needs to do one thing at a time. Please split above into three patches. Signed-off-by: Alvaro Karsz --- device-types/net/description.tex | 40 ++-- 1 file changed, 22 insertions(+), 18 deletions(-) diff --git a/device-types/net/description.tex b/device-types/net/description.tex index 1741c79..2a98411 100644 --- a/device-types/net/description.tex +++ b/device-types/net/description.tex @@ -1514,15 +1514,12 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi If the VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can send control commands for dynamically changing the coalescing parameters. -\begin{lstlisting} -struct virtio_net_ctrl_coal_rx { -le32 rx_max_packets; -le32 rx_usecs; -}; +Note: In general, these commands are best-effort: A device could +send a notification even if it is not supposed to. Please remove this note from this patch. Instead of Note, we need to describe this device requirements description. We better need to describe the motivation for it. We may want to say there may be jitter in notification, but device should not be sending when it is not supposed to. It's explicitly allowed: split-ring.tex:The driver MUST handle spurious notifications from the device. split-ring.tex:The device MUST handle spurious notifications from the driver. The intent is the guide to device implementation to have less spurious interrupts. Best effort wording says that device is free to implement timer of any granularity of choice, which kind of defeats the purpose of IM. So, both can handle/generate spurious notifications, but it shouldn't be best line guidance. I also have more description to add in this area with regards to GSO and LRO. I make humble suggestion that we draft is jointly in separate patch combining these clarifications. -struct virtio_net_ctrl_coal_tx { -le32 tx_max_packets; -le32 tx_usecs; +\begin{lstlisting} +struct virtio_net_ctrl_coal { +le32 max_packets; +le32 usecs; }; This is one good change to go as separate patch. #define VIRTIO_NET_CTRL_NOTF_COAL 6 @@ -1532,25 +1529,25 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi Coalescing parameters: \begin{itemize} -\item \field{rx_usecs}: Maximum number of usecs to delay a RX notification. -\item \field{tx_usecs}: Maximum number of usecs to delay a TX notification. -\item \field{rx_max_packets}: Maximum number of packets to receive before a RX notification. -\item \field{tx_max_packets}: Maximum number of packets to send before a TX notification. +\item \field{usecs} for RX: Maximum number of usecs to delay a RX notification. +\item \field{usecs} for TX: Maximum number of usecs to delay a TX notification. +\item \field{max_packets} for RX: Maximum number of packets to +receive before a RX notification. +\item \field{max_packets} for TX: Maximum number of packets to send +before a TX notification. \end{itemize} s/for Rx/For receive virtqueue s/for Tx/For transmit virtqueue Which virtqueue? It says TX/RX pretty consistently in this text. Changing to receive virtqueue/transmit virtqueue would be a big change and frankly for a very modest gain in readability. Rather maybe just say RX/TX where we describe virtqueue. We are describing rest of the previous sections as transmitq, receiveq. Would like to keep this section consistent with rest of the Network device section, and not just notification coalescing section. The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands: \begin{enumerate} -\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{tx_usecs} and \field{tx_max_packets} parameters. -\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{rx_usecs} and \field{rx_max_packets} parameters. +\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{usecs} and \field{max_packets} parameters for TX. +\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{usecs} and \field{max_packets} parameters for RX. \end{enumerate} \subparagraph{RX Notifications}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / RX Notifications} If, for example: \begin{itemize} -\item \field{rx_usecs} = 10. -\item \field{rx_max_packets} = 15. +\item \field{usecs} = 10. +\item \field{max_packets} = 15.
[virtio-dev] Re: [virtio-comment] Re: [PATCH v2] virtio-net: support the virtqueue coalescing moderation
在 2023/2/11 下午4:45, Alvaro Karsz 写道: Please add short description something like, When the driver prefers to use per virtqueue notifications coalescing, and if queue group (transmit or receive) level notification coalescing is enabled, driver SHOULD first disable device level notification coalescing. Or it should be, I disagree here. IMO "queue group level notification coalescing" is not something to enable or disable, but a shortcut to set all TX/RX queues at once. Why should the spec force a driver to "disable device level notification coalescing" (I assume you mean send a VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET command with zeros)? What if the driver sends a VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET command, and then a single queue traffic increases? why should it zero the parameters to all other queues? Hi, Alvaro! Thanks for your reply! I think Parav refers more to the scene where ethool sets parameters at the queue group level and the scene where netdim sets parameters for a single queue. In this scenario, netdim should really determine the coalescing parameters of the device, and the parameters at the queue group level set by ethtool should be ignored (many drivers are designed this way, such as mlx), that is, we need to give netdim the right, because it It has the ability to dynamically adjust parameters. (However, I think this friendly constraint is also possible in driver implementation.) Of course, if we consider setting coalescing parameters at the queue group level and single queue level separately through ethtool, then as you said, we should not set any priority for them. Back to reality, I think the function of ethtool to set single queue parameters may come later, which is thankless for users because of netdim. Therefore, if our specification tends to be practical, we can add Parav's proposal, and if our specification tends to be more general, then hand over the constraints to the driver implementation. what do you think? I think that this should be discussed in the driver implementation stage, not in the spec. Virtqueue level notifications coalescing, and device level notifications can be enabled together. When both of them are enabled, per virtqueue notifications coalescing take priority over queue group level. How do you enable Virtqueue level notifications coalescing? Why are they different entities? I don't think that we should have priorities, but the last command should be the one that dictates the coalescing parameters. For example, let's look at vq0 (RX): Device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0 should change the parameters accordingly (all RX vqs should do the same). Then device receives VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with vqn = 0, vq0 changes the parameters accordingly (all RX vqs are still using the "old" parameters) Then device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0 changes the parameters accordingly (all RX vqs should do the same). Yes I see what you mean, thanks for the clear example. This should be the second scenario I described above, let's discuss how to solve it in the above reply. Thanks! :) This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscr...@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org List help: virtio-comment-h...@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] RE: [virtio-comment] [PATCH] virtio-net: support per-queue coalescing moderation
在 2023/2/8 下午11:04, Parav Pandit 写道: From: Michael S. Tsirkin Sent: Wednesday, February 8, 2023 9:48 AM On Wed, Feb 08, 2023 at 02:44:37PM +, Parav Pandit wrote: From: Michael S. Tsirkin Sent: Wednesday, February 8, 2023 9:43 AM On Wed, Feb 08, 2023 at 02:37:55PM +, Parav Pandit wrote: From: Michael S. Tsirkin Sent: Wednesday, February 8, 2023 9:18 AM On Wed, Feb 08, 2023 at 07:30:34PM +0800, Heng Qi wrote: I see two options. 1. Just have per VQ params. Software has the full knowledge of in which it is operating, and state remains at software level. This effectively achieves both the mode. 2. Have a mode cmd, Mode = (a) per device or (b) per VQ (c) disable After the mode is set, driver can set per device or per VQ. I find this more clear. Thanks. Rereading this I think I misunderstood the proposal. Now we are burning memory on maintaining mode, and this information is duplicated. It is not maintained in the pci resident memory, so it doesn't hurt. I'd say let's just add a new command COAL_QUEUE_SET with vqn as parameter. Existing commands are simply defined as a shortcut to running COAL_QUEUE_SET on all tx/rx queues respectively. Latest command dictates the parameters. To disable just set everything to 0 (btw we should make this explicit in the spec, but it can be guessed from: Upon reset, a device MUST initialize all coalescing parameters to 0. ) Switching between the modes (per q vs per device) implicitly is ambiguous and it only means device may need to iterate. hmm i feel it's only ambiguous because i failed to explain in well. This state is either better maintained in sw by always having per vq or have clearly defined mode of what device should do. Per Q is very common even for several years old devices. Last time I counted, there were at least 15 such devices supporting it. So actual usage wise, I practically see that most implementations will end up with per vq mode. I like to hear from Heng or Alvaro if they see any use of per device. Right so given this, most devices will be in per queue mode all the time. why do you want a mode then? just keep per queue. existing commands are kept around for compat but internally just translate to per-queue. Since the space is not released, do we need to keep the compat? It's been accepted for half a year so we can't say for sure no one built this. That is likely but we should have the ability to have the Errata/ECN to correct it, specially for unrelease spec. The way I propose is just a bit of firmware on device that scans all queues and copies same parameters everywhere. This scanning loop in sw appears cheaper to me than some embedded fw. But is not a lot of concern. Seems easier than worrying about this, and we get disabling coalescing for free which you wanted. With an extra mode its extra logic in the device fast path. Maybe it's cheap on hardware side but in software it's an extra branch, not free. Most performant data path wouldn't implement and read the extra mode. It is always fw that is going to program same value, or per queue valued or disable value in each Q regardless whichever way we craft the CVQ cmd. The sequence that bothers me is below. 1. driver set global params 2. few minutes later, now driver set param for Q=1 On this command, a device need to decide: Should Q = 2 to N (a) either work with previous globals, or (b) because per Q was set for one queue, they rest of the queues implicitly disable it. If it is (b), When a command on Q object =1 is issued, it affects other Q objects. <- This I want to avoid. A cmd that modifies the object, should only modify that object. If it is (a), it is mixed mode operation, which is ambiguous definition. I think it should be a. I think we should blur the concept of mode. There seems to be no mode here. From the perspective of the device, it only needs to distinguish commands and do what it should do. Thanks. A better semantic is to define such change at device level and no extra cost in the data path. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [virtio-comment] [PATCH] virtio-net: support per-queue coalescing moderation
在 2023/2/9 上午1:53, Alvaro Karsz 写道: > > From: Michael S. Tsirkin Sent: Wednesday, February 8, 2023 9:48 AM On Wed, Feb 08, 2023 at 02:44:37PM +, Parav Pandit wrote: From: Michael S. Tsirkin Sent: Wednesday, February 8, 2023 9:43 AM On Wed, Feb 08, 2023 at 02:37:55PM +, Parav Pandit wrote: From: Michael S. Tsirkin Sent: Wednesday, February 8, 2023 9:18 AM On Wed, Feb 08, 2023 at 07:30:34PM +0800, Heng Qi wrote: I see two options. 1. Just have per VQ params. Software has the full knowledge of in which it is operating, and state remains at software level. This effectively achieves both the mode. 2. Have a mode cmd, Mode = (a) per device or (b) per VQ (c) disable After the mode is set, driver can set per device or per VQ. I find this more clear. Thanks. Rereading this I think I misunderstood the proposal. Now we are burning memory on maintaining mode, and this information is duplicated. It is not maintained in the pci resident memory, so it doesn't hurt. I'd say let's just add a new command COAL_QUEUE_SET with vqn as parameter. Existing commands are simply defined as a shortcut to running COAL_QUEUE_SET on all tx/rx queues respectively. Latest command dictates the parameters. To disable just set everything to 0 (btw we should make this explicit in the spec, but it can be guessed from: Upon reset, a device MUST initialize all coalescing parameters to 0. ) Switching between the modes (per q vs per device) implicitly is ambiguous and it only means device may need to iterate. hmm i feel it's only ambiguous because i failed to explain in well. This state is either better maintained in sw by always having per vq or have clearly defined mode of what device should do. Per Q is very common even for several years old devices. Last time I counted, there were at least 15 such devices supporting it. So actual usage wise, I practically see that most implementations will end up with per vq mode. I like to hear from Heng or Alvaro if they see any use of per device. Right so given this, most devices will be in per queue mode all the time. why do you want a mode then? just keep per queue. existing commands are kept around for compat but internally just translate to per-queue. Since the space is not released, do we need to keep the compat? It's been accepted for half a year so we can't say for sure no one built this. That is likely but we should have the ability to have the Errata/ECN to correct it, specially for unrelease spec. The way I propose is just a bit of firmware on device that scans all queues and copies same parameters everywhere. This scanning loop in sw appears cheaper to me than some embedded fw. But is not a lot of concern. Seems easier than worrying about this, and we get disabling coalescing for free which you wanted. With an extra mode its extra logic in the device fast path. Maybe it's cheap on hardware side but in software it's an extra branch, not free. Most performant data path wouldn't implement and read the extra mode. It is always fw that is going to program same value, or per queue valued or disable value in each Q regardless whichever way we craft the CVQ cmd. The sequence that bothers me is below. 1. driver set global params 2. few minutes later, now driver set param for Q=1 On this command, a device need to decide: Should Q = 2 to N (a) either work with previous globals, or (b) because per Q was set for one queue, they rest of the queues implicitly disable it. If it is (b), When a command on Q object =1 is issued, it affects other Q objects. <- This I want to avoid. A cmd that modifies the object, should only modify that object. If it is (a), it is mixed mode operation, which is ambiguous definition. A better semantic is to define such change at device level and no extra cost in the data path. I think that (a) is the way to go. I don't think that we should work with operation modes at all. I agree to keep the current global settings (VIRTIO_NET_F_NOTF_COAL and its corresponding commands), because our hardware team has limited resources for the control queue, and they don't want to send a separate cmd for each queue when send a global setting cmd. Then adding a VIRTIO_NET_F_PERQUEUE_NOTF_COAL or VIRTIO_NET_F_VQ_NOTF_COAL feature bit and new commands for per-queue or VQ setting looks better to me. In my opinion: We should have 2 features: VIRTIO_NET_F_PERQUEUE_NOTF_COAL and VIRTIO_NET_F_NOTF_COAL. VIRTIO_NET_F_PERQUEUE_NOTF_COAL sets per queue parameters, and VIRTIO_NET_F_NOTF_COAL sets parameters for all queues. VIRTIO_NET_F_NOTF_COAL has 2 commands: VIRTIO_NET_CTRL_NOTF_COAL_RX_SET VIRTIO_NET_CTRL_NOTF_COAL_TX_SET VIRTIO_NET_F_PERQUEUE_NOTF_COAL has 2 commands: VIRTIO_NET_CTRL_NOTF_COAL_PER_QUEUE_TX_SET VIRTIO_NET_CTRL_NOTF_COAL_PER_QUEUE_RX_SET We can see VIRTIO_NET_CTRL_NOTF_COAL_RX_SET as a virtio level shortcut for setting all queues with one command, exactly as intended wi
[virtio-dev] Re: [virtio-comment] [PATCH] virtio-net: support per-queue coalescing moderation
在 2023/2/9 上午4:52, Michael S. Tsirkin 写道: On Wed, Feb 08, 2023 at 07:53:09PM +0200, Alvaro Karsz wrote: > > From: Michael S. Tsirkin Sent: Wednesday, February 8, 2023 9:48 AM On Wed, Feb 08, 2023 at 02:44:37PM +, Parav Pandit wrote: From: Michael S. Tsirkin Sent: Wednesday, February 8, 2023 9:43 AM On Wed, Feb 08, 2023 at 02:37:55PM +, Parav Pandit wrote: From: Michael S. Tsirkin Sent: Wednesday, February 8, 2023 9:18 AM On Wed, Feb 08, 2023 at 07:30:34PM +0800, Heng Qi wrote: I see two options. 1. Just have per VQ params. Software has the full knowledge of in which it is operating, and state remains at software level. This effectively achieves both the mode. 2. Have a mode cmd, Mode = (a) per device or (b) per VQ (c) disable After the mode is set, driver can set per device or per VQ. I find this more clear. Thanks. Rereading this I think I misunderstood the proposal. Now we are burning memory on maintaining mode, and this information is duplicated. It is not maintained in the pci resident memory, so it doesn't hurt. I'd say let's just add a new command COAL_QUEUE_SET with vqn as parameter. Existing commands are simply defined as a shortcut to running COAL_QUEUE_SET on all tx/rx queues respectively. Latest command dictates the parameters. To disable just set everything to 0 (btw we should make this explicit in the spec, but it can be guessed from: Upon reset, a device MUST initialize all coalescing parameters to 0. ) Switching between the modes (per q vs per device) implicitly is ambiguous and it only means device may need to iterate. hmm i feel it's only ambiguous because i failed to explain in well. This state is either better maintained in sw by always having per vq or have clearly defined mode of what device should do. Per Q is very common even for several years old devices. Last time I counted, there were at least 15 such devices supporting it. So actual usage wise, I practically see that most implementations will end up with per vq mode. I like to hear from Heng or Alvaro if they see any use of per device. Right so given this, most devices will be in per queue mode all the time. why do you want a mode then? just keep per queue. existing commands are kept around for compat but internally just translate to per-queue. Since the space is not released, do we need to keep the compat? It's been accepted for half a year so we can't say for sure no one built this. That is likely but we should have the ability to have the Errata/ECN to correct it, specially for unrelease spec. The way I propose is just a bit of firmware on device that scans all queues and copies same parameters everywhere. This scanning loop in sw appears cheaper to me than some embedded fw. But is not a lot of concern. Seems easier than worrying about this, and we get disabling coalescing for free which you wanted. With an extra mode its extra logic in the device fast path. Maybe it's cheap on hardware side but in software it's an extra branch, not free. Most performant data path wouldn't implement and read the extra mode. It is always fw that is going to program same value, or per queue valued or disable value in each Q regardless whichever way we craft the CVQ cmd. The sequence that bothers me is below. 1. driver set global params 2. few minutes later, now driver set param for Q=1 On this command, a device need to decide: Should Q = 2 to N (a) either work with previous globals, or (b) because per Q was set for one queue, they rest of the queues implicitly disable it. If it is (b), When a command on Q object =1 is issued, it affects other Q objects. <- This I want to avoid. A cmd that modifies the object, should only modify that object. If it is (a), it is mixed mode operation, which is ambiguous definition. A better semantic is to define such change at device level and no extra cost in the data path. I think that (a) is the way to go. I don't think that we should work with operation modes at all. In my opinion: We should have 2 features: VIRTIO_NET_F_PERQUEUE_NOTF_COAL and VIRTIO_NET_F_NOTF_COAL. VIRTIO_NET_F_PERQUEUE_NOTF_COAL sets per queue parameters, and VIRTIO_NET_F_NOTF_COAL sets parameters for all queues. VIRTIO_NET_F_NOTF_COAL has 2 commands: VIRTIO_NET_CTRL_NOTF_COAL_RX_SET VIRTIO_NET_CTRL_NOTF_COAL_TX_SET VIRTIO_NET_F_PERQUEUE_NOTF_COAL has 2 commands: VIRTIO_NET_CTRL_NOTF_COAL_PER_QUEUE_TX_SET VIRTIO_NET_CTRL_NOTF_COAL_PER_QUEUE_RX_SET We can see VIRTIO_NET_CTRL_NOTF_COAL_RX_SET as a virtio level shortcut for setting all queues with one command, exactly as intended with rx_qid= 0x, and without breaking devices following the current spec. The device's FW can decide if it stores parameters received with VIRTIO_NET_CTRL_NOTF_COAL_RX_SET in a global set, or if it iterates through all queues, but IMO the best way it to iterate through all queues. Seems like a win-win situation to me. We achieve the same func
Re: [virtio-dev] Re: [virtio-comment] [PATCH] virtio-net: support per-queue coalescing moderation
在 2023/2/9 上午6:35, Alvaro Karsz 写道: From: Alvaro Karsz Sent: Wednesday, February 8, 2023 4:56 PM Alvaro, Do you know if any software used it? Can you get some real data? I implemented this feature in our DPU, so at least 1 vendor is using this feature But which software (virtio net driver) in which OS is using this? Sorry, I'm not sure I understand your question. The feature is implemented in the linux kernel https://github.com/torvalds/linux/commit/699b045a8e43bd1063db4795be685bfd659649dc So we'll always have kernel versions accepting this feature, if offered. (I will add support for the per vq command of course). I really don't know about other vendors.. You are suggesting to reserve the command and feature bit for safety, so, if we reserve them, why not just use them? What do we lose here? If it is used by some unknown software, only that sw breaks by using non release spec. If we use it by changing the definition, it may break that unknown sw. If we know there is no known sw, we are better of with redefinition (by adding vqn, and by removing tx,rx from it). Not having this feature/command even complicates things now that we are talking about removing the RX and TX from the per vq command, how do you change parameters to all TX queues? to all RX queues? we'll need 2 special indexes, so we now need le32 to hold the queue index. No need for special index. How does a driver disable all queues or reset all queues? -> One by one. So if user want to change for all TXQ, sw can do it one by one by iterating TXQ vqns. Yes, but resetting the queues doesn't require a control command. If a server has 64K queues, and a user wants to set all coalescing parameters to X (maybe with ethtool), it will generate 64K control commands.. At least our hardware design doesn't expect that. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] [PATCH v2] virtio-net: support the virtqueue coalescing moderation
在 2023/2/10 下午6:16, Alvaro Karsz 写道: So, should we remove VIRTIO_NET_F_CTRL_VQ here, or fix VIRTIO_NET_F_HOST_ECN? Ah good point. But I think VIRTIO_NET_F_VQ_NOTF_COAL should not depend on VIRTIO_NET_F_NOTF_COAL. This way devices can drop the all-rx/all-tx commands if they want to. We need to confirm this. If we make VIRTIO_NET_F_VQ_NOTF_COAL independent of VIRTIO_NET_F_NOTF_COAL, do we need to give vqn a special value so that the driver can also have the fast path of sending all queues with global settings via VIRTIO_NET_F_VQ_NOTF_COAL? IMO we don't need a special vqn value. A device that can modify all the vqs should offer VIRTIO_NET_F_NOTF_COAL. That's clear, thanks for the quick reply. Have a great weekend! - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [virtio-comment] [PATCH] virtio-net: support per-queue coalescing moderation
在 2023/2/8 下午6:10, Michael S. Tsirkin 写道: On Wed, Feb 08, 2023 at 09:57:54AM +0800, Heng Qi wrote: I think it's a good idea to do this on top of Alvaro's patch unifying these two structures. I saw Alvaro's patch, but it doesn't seem to be stable yet, is there a good way for me to unify the two structures, since a patch should only do one thing. Problem is you were trying to change these existing structures too so the patches conflicted. However I think at this point we are in agreement on a new command with a new structure. In this case there won't be a conflict as you are not touching existing commands so no need for you to depend on Alvaro's patch. I get it. Thanks. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v2] virtio-net: support the virtqueue coalescing moderation
On Sat, Feb 11, 2023 at 01:47:16PM +, Parav Pandit wrote: > > > > From: Alvaro Karsz > > Sent: Saturday, February 11, 2023 3:45 AM > > > > > Please add short description something like, > > > > > > When the driver prefers to use per virtqueue notifications coalescing, > > > and if > > queue group (transmit or receive) level notification coalescing is enabled, > > driver > > SHOULD first disable device level notification coalescing. > > > Or it should be, > > > > > > > I disagree here. > > IMO "queue group level notification coalescing" is not something to enable > > or > > disable, but a shortcut to set all TX/RX queues at once. > That short cut is the enable/disablement. > > > Why should the spec force a driver to "disable device level notification > > coalescing" (I assume you mean send a > > VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET command with zeros)? > Yes. Because to have well defined behavior when sw configured both one after > the another. > > > What if the driver sends a VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET > > command, and then a single queue traffic increases? why should it zero the > > parameters to all other queues? > That is short transition when driver is switching over to per queue mode. > This is fine to have short glitch. > > > I think that this should be discussed in the driver implementation stage, > > not in > > the spec. > > > There should be a clear guidance on how device should behave when both per q > and per device are configured. > > > > Virtqueue level notifications coalescing, and device level notifications > > > can be > > enabled together. > > > When both of them are enabled, per virtqueue notifications coalescing take > > priority over queue group level. > > > > How do you enable Virtqueue level notifications coalescing? Why are they > > different entities? > Using the new command that has vqn in it. > > > I don't think that we should have priorities, but the last command should > > be the > > one that dictates the coalescing parameters. > > > Priority is applicable when driver has issued both the commands. Per tx/rx, > and per vqn. > > > For example, let's look at vq0 (RX): > > Device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0 should change > > the parameters accordingly (all RX vqs should do the same). > > Then device receives VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with vqn = 0, > > vq0 changes the parameters accordingly (all RX vqs are still using the "old" > > parameters) Then device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0 > > changes the parameters accordingly (all RX vqs should do the same). > In this example, per VQ were overridden with per device. > Yes, so the last one is applicable, so priority of last one applies. > > We continue to refuse to add the mode, and hence need to supply these > description of both the sequence on how device should behave. > > Sequence_1: > 1. tx/rx group level > 2. per vqn level > When #2 is done, VQ's whose per vq level is configured, follows vqn, rest of > the VQs follow #1. > > Sequence_2: > 1. per vqn > 2. tx/rx group level > When #2 is done, group level overrides the per vqn parameters. Adding examples of the two command sequences is due, I will do so in the next release. :) Thanks. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] Re: [PATCH v2] virtio-net: support the virtqueue coalescing moderation
On Sat, Feb 11, 2023 at 06:13:55PM +0200, Alvaro Karsz wrote: > I think that I wasn't clear enough. > > I'm not saying that we should not define in the spec how to handle a > situation when a device receives both RX_SET and VQ_SET (or a driver > sends both). > I'm saying that I don't think that the driver should handle the > situation the way you described it: > > > When the driver prefers to use per virtqueue notifications coalescing, and > > if queue group (transmit or receive) level notification coalescing is > > enabled, driver SHOULD first disable device level notification coalescing. > > Or it should be, > > > > Virtqueue level notifications coalescing, and device level notifications > > can be enabled together. > > When both of them are enabled, per virtqueue notifications coalescing take > > priority over queue group level. > > This implies that we have 2 modes and have priorities. > > I think that if we want to refer to this situation in the spec, it > should be something like: > "A Device should use the last coalescing parameters received for a > virtqueue, regardless of the command used to deliver the parameters." Your suggestion is good, the per-device command and the per-queue command need some examples and behavior definitions, I will add them to avoid some misunderstandings. Thanks. > (just an example to make the point). - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v2] virtio-net: support the virtqueue coalescing moderation
On Sun, Feb 12, 2023 at 04:35:37AM -0500, Michael S. Tsirkin wrote: > On Sat, Feb 11, 2023 at 01:47:16PM +, Parav Pandit wrote: > > > > > > > From: Alvaro Karsz > > > Sent: Saturday, February 11, 2023 3:45 AM > > > > > > > Please add short description something like, > > > > > > > > When the driver prefers to use per virtqueue notifications coalescing, > > > > and if > > > queue group (transmit or receive) level notification coalescing is > > > enabled, driver > > > SHOULD first disable device level notification coalescing. > > > > Or it should be, > > > > > > > > > > I disagree here. > > > IMO "queue group level notification coalescing" is not something to > > > enable or > > > disable, but a shortcut to set all TX/RX queues at once. > > That short cut is the enable/disablement. > > > > > Why should the spec force a driver to "disable device level notification > > > coalescing" (I assume you mean send a > > > VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET command with zeros)? > > Yes. Because to have well defined behavior when sw configured both one > > after the another. > > > > > What if the driver sends a VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET > > > command, and then a single queue traffic increases? why should it zero the > > > parameters to all other queues? > > That is short transition when driver is switching over to per queue mode. > > This is fine to have short glitch. > > > > > I think that this should be discussed in the driver implementation stage, > > > not in > > > the spec. > > > > > There should be a clear guidance on how device should behave when both per > > q and per device are configured. > > > > > > Virtqueue level notifications coalescing, and device level > > > > notifications can be > > > enabled together. > > > > When both of them are enabled, per virtqueue notifications coalescing > > > > take > > > priority over queue group level. > > > > > > How do you enable Virtqueue level notifications coalescing? Why are they > > > different entities? > > Using the new command that has vqn in it. > > > > > I don't think that we should have priorities, but the last command should > > > be the > > > one that dictates the coalescing parameters. > > > > > Priority is applicable when driver has issued both the commands. Per tx/rx, > > and per vqn. > > > > > For example, let's look at vq0 (RX): > > > Device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0 should change > > > the parameters accordingly (all RX vqs should do the same). > > > Then device receives VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with vqn = 0, > > > vq0 changes the parameters accordingly (all RX vqs are still using the > > > "old" > > > parameters) Then device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0 > > > changes the parameters accordingly (all RX vqs should do the same). > > In this example, per VQ were overridden with per device. > > Yes, so the last one is applicable, so priority of last one applies. > > > > We continue to refuse to add the mode, and hence need to supply these > > description of both the sequence on how device should behave. > > > > Sequence_1: > > 1. tx/rx group level > > 2. per vqn level > > When #2 is done, VQ's whose per vq level is configured, follows vqn, rest > > of the VQs follow #1. > > > > Sequence_2: > > 1. per vqn > > 2. tx/rx group level > > When #2 is done, group level overrides the per vqn parameters. > > Since there's apparently some room for misunderstanding, I think adding these > examples can't hurt. Ok, I'll handle this. > I would also be more specific and just use specific numbers in the > example, to avoid any ambiguity. Agree. Thanks. > > -- > MST - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH 0/4] Short document fixes to inner hash feature
在 2023/7/13 下午7:33, Michael S. Tsirkin 写道: On Thu, Jul 13, 2023 at 11:12:13AM +0200, Cornelia Huck wrote: On Thu, Jul 13 2023, Parav Pandit wrote: This short patches fixes the editing errors that stops the pdf and html generation. These 3 fixes are for the patch [1] for the github issue [2]. [1] https://lists.oasis-open.org/archives/virtio-comment/202307/msg00024.html [2] https://github.com/oasis-tcs/virtio-spec/issues/173 Patch summary: patch-1 place C code under listing patch-2 avoid hyphen and extra braces patch-3 use table as hyperlink do not work well in C code listing patch-4 refer 'advice' as 'note' Patch 1 to 3 appears to be must in the testing. Patch 4 is not a fix and can be done later if it requires discussion. Parav Pandit (4): virtio-net: Place C code under listing virtio-net: Avoid hyphen and extra braces virtio-net: Use table to describe inner hash to rfc mapping virtio-net: Use note instead of advice device-types/net/description.tex | 45 ++-- introduction.tex | 15 +-- 2 files changed, 38 insertions(+), 22 deletions(-) FTR, this is the diff I have locally (I had missed one underscore in the references yesterday...); maybe we can make the intra-reference links in introdcution.tex a bit nicer, but otherwise, this should be the minimal change to make this build: Perfect. Seems like clearly an editorial fix. Heng Qi, in the future I'd like to ask you to please build the spec and review the resulting PDF and HTML, before posting. Yes, I will! diff --git a/device-types/net/description.tex b/device-types/net/description.tex index 206020de567d..76585b0dd6d3 100644 --- a/device-types/net/description.tex +++ b/device-types/net/description.tex @@ -1024,12 +1024,14 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network If VIRTIO_NET_F_HASH_TUNNEL has been negotiated, the driver can send the command VIRTIO_NET_CTRL_HASH_TUNNEL_SET to configure the calculation of the inner header hash. +\begin{lstlisting} struct virtnet_hash_tunnel { le32 enabled_tunnel_types; }; #define VIRTIO_NET_CTRL_HASH_TUNNEL 7 #define VIRTIO_NET_CTRL_HASH_TUNNEL_SET 0 +\end{lstlisting} Field \field{enabled_tunnel_types} contains the bitmask of encapsulation types enabled for inner header hash. See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / @@ -1063,16 +1065,16 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network Hash calculation for incoming packets / Encapsulation types supported/enabled for inner header hash} Encapsulation types applicable for inner header hash: -\begin{lstlisting} -#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_2784(1 << 0) /* \hyperref[intro:gre_rfc2784]{[GRE_rfc2784]} */ -#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_2890(1 << 1) /* \hyperref[intro:gre_rfc2890]{[GRE_rfc2890]} */ -#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_7676(1 << 2) /* \hyperref[intro:gre_rfc7676]{[GRE_rfc7676]} */ -#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_UDP (1 << 3) /* \hyperref[intro:gre_in_udp_rfc8086]{[GRE-in-UDP]} */ -#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN (1 << 4) /* \hyperref[intro:vxlan]{[VXLAN]} */ -#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN_GPE (1 << 5) /* \hyperref[intro:vxlan_gpe]{[VXLAN-GPE]} */ -#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE (1 << 6) /* \hyperref[intro:geneve]{[GENEVE]} */ -#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP(1 << 7) /* \hyperref[intro:ipip]{[IPIP]} */ -#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE (1 << 8) /* \hyperref[intro:nvgre]{[NVGRE]} */ +\begin{lstlisting}[escapechar=|] +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_2784(1 << 0) /* |\hyperref[intro:rfc2784]{[RFC2784]}| */ +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_2890(1 << 1) /* |\hyperref[intro:rfc2890]{[RFC2890]}| */ +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_7676(1 << 2) /* |\hyperref[intro:rfc7676]{[RFC7676]}| */ +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_UDP (1 << 3) /* |\hyperref[intro:rfc8086]{[GRE-in-UDP]}| */ +#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN (1 << 4) /* |\hyperref[intro:vxlan]{[VXLAN]}| */ +#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN_GPE (1 << 5) /* |\hyperref[intro:vxlan-gpe]{[VXLAN-GPE]}| */ +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE (1 << 6) /* |\hyperref[intro:geneve]{[GENEVE]}| */ +#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP(1 << 7) /* |\hyperref[intro:ipip]{[IPIP]}| */ +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE (1 << 8) /* |\hyperref[intro:nvgre]{[NVGRE]}| */ \end{lstlisting} \subparagraph{Advice} diff --git a/introduction.tex b/introduction.tex index 81f07a4fee19..6f10a94b6fde 100644 --- a/introduction.tex +++ b/introduction.tex @@ -102,18 +102,18 @@ \section{Normative References}\labe
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [Proposal] Relationship between XDP and rx-csum in virtio-net
On Tue, May 30, 2023 at 03:33:22PM -0400, Michael S. Tsirkin wrote: > On Wed, May 24, 2023 at 04:12:46PM +0800, Heng Qi wrote: > > On Wed, May 24, 2023 at 02:07:14AM -0400, Michael S. Tsirkin wrote: > > > On Tue, May 23, 2023 at 09:51:44PM +0800, Heng Qi wrote: > > > > On Tue, May 23, 2023 at 09:30:28AM -0400, Michael S. Tsirkin wrote: > > > > > On Tue, May 23, 2023 at 05:18:20PM +0800, Heng Qi wrote: > > > > > > On Tue, May 23, 2023 at 03:15:37AM -0400, Michael S. Tsirkin wrote: > > > > > > > On Tue, May 23, 2023 at 10:41:18AM +0800, Heng Qi wrote: > > > > > > > > On Mon, May 22, 2023 at 03:10:05PM -0400, Michael S. Tsirkin > > > > > > > > wrote: > > > > > > > > > On Mon, May 22, 2023 at 08:12:00PM +0800, Heng Qi wrote: > > > > > > > > > > 1) Add a feature bit to the virtio specification to tell > > > > > > > > > > the sender that a fully > > > > > > > > > > csumed packet must be sent. > > > > > > > > > > > > > > > > > > Who is the sender in this picture? The driver? > > > > > > > > > > > > > > > > The device or the driver. > > > > > > > > > > > > > > > > When the device is hw, the sender is more likely to be a device. > > > > > > > > When the device is sw, the sender can be a device or a driver. > > > > > > > > > > > > > > > > But in general, this feature is inclined to constrain the > > > > > > > > behavior of the device and > > > > > > > > the driver from the receiving side. > > > > > > > > > > > > > > Based on above I am guessing you are talking about driver getting > > > > > > > packets from device, I wish you used terms from virtio spec. > > > > > > > > > > > > Yes, I'm going to use the terminology of the virtio spec. > > > > > > > > > > > > > > > > > > > > > For example: > > > > > > > > VIRTIO_NET_F_UNNECESSARY_CSUM : The driver tells the device > > > > > > > > that you must send me a fully csumed packet. > > > > > > > > > > > > > > > > Then the specific implementation can be > > > > > > > > > > > > > > > > (1) the sender sends a fully csumed packet; > > > > > > > > (2) the receiver receives a CHECKSUM_PARTIAL packet, and the > > > > > > > > device helps calculate the fully csum > > > > > > > > (because the two parties in the communication are located > > > > > > > > on the same host, the packet is trusted.). > > > > > > > > > > > > > > > > In summary, if VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the > > > > > > > > driver will no longer receive any packets marked > > > > > > > > CHECKSUM_PARTIAL. > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > This is what clearing VIRTIO_NET_F_GUEST_CSUM does. > > > > > > > > > > > > Yes, but with VIRTIO_NET_F_GUEST_CSUM cleared, although the device > > > > > > can > > > > > > receive a fully checksummed packet, we can no longer enjoy > > > > > > the device's ability to validate the packet checksum. That is, the > > > > > > value > > > > > > of \field{flags} in the virtio_net_hdr structure is set to 0, which > > > > > > means > > > > > > that the packet received by the driver will not be marked as > > > > > > VIRTIO_NET_HDR_F_DATA_VALID. > > > > > > > > > > > > So, we need a feature bit (let's say VIRTIO_NET_F_UNNECESSARY_CSUM). > > > > > > If VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the device must > > > > > > give the > > > > > > driver a fully checksummed packet, and the packet is validated by > > > > > > the > > > > > > device with \field{flags} set to VIRTIO_NET_HDR_F_DATA_VALID. > > > > > > > > > > > > > > > > > > > > I feel you are tryin
Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [Proposal] Relationship between XDP and rx-csum in virtio-net
On Wed, May 24, 2023 at 02:07:14AM -0400, Michael S. Tsirkin wrote: > On Tue, May 23, 2023 at 09:51:44PM +0800, Heng Qi wrote: > > On Tue, May 23, 2023 at 09:30:28AM -0400, Michael S. Tsirkin wrote: > > > On Tue, May 23, 2023 at 05:18:20PM +0800, Heng Qi wrote: > > > > On Tue, May 23, 2023 at 03:15:37AM -0400, Michael S. Tsirkin wrote: > > > > > On Tue, May 23, 2023 at 10:41:18AM +0800, Heng Qi wrote: > > > > > > On Mon, May 22, 2023 at 03:10:05PM -0400, Michael S. Tsirkin wrote: > > > > > > > On Mon, May 22, 2023 at 08:12:00PM +0800, Heng Qi wrote: > > > > > > > > 1) Add a feature bit to the virtio specification to tell the > > > > > > > > sender that a fully > > > > > > > > csumed packet must be sent. > > > > > > > > > > > > > > Who is the sender in this picture? The driver? > > > > > > > > > > > > The device or the driver. > > > > > > > > > > > > When the device is hw, the sender is more likely to be a device. > > > > > > When the device is sw, the sender can be a device or a driver. > > > > > > > > > > > > But in general, this feature is inclined to constrain the behavior > > > > > > of the device and > > > > > > the driver from the receiving side. > > > > > > > > > > Based on above I am guessing you are talking about driver getting > > > > > packets from device, I wish you used terms from virtio spec. > > > > > > > > Yes, I'm going to use the terminology of the virtio spec. > > > > > > > > > > > > > > > For example: > > > > > > VIRTIO_NET_F_UNNECESSARY_CSUM : The driver tells the device that > > > > > > you must send me a fully csumed packet. > > > > > > > > > > > > Then the specific implementation can be > > > > > > > > > > > > (1) the sender sends a fully csumed packet; > > > > > > (2) the receiver receives a CHECKSUM_PARTIAL packet, and the device > > > > > > helps calculate the fully csum > > > > > > (because the two parties in the communication are located on > > > > > > the same host, the packet is trusted.). > > > > > > > > > > > > In summary, if VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the > > > > > > driver will no longer receive any packets marked CHECKSUM_PARTIAL. > > > > > > > > > > > > Thanks. > > > > > > > > > > This is what clearing VIRTIO_NET_F_GUEST_CSUM does. > > > > > > > > Yes, but with VIRTIO_NET_F_GUEST_CSUM cleared, although the device can > > > > receive a fully checksummed packet, we can no longer enjoy > > > > the device's ability to validate the packet checksum. That is, the value > > > > of \field{flags} in the virtio_net_hdr structure is set to 0, which > > > > means > > > > that the packet received by the driver will not be marked as > > > > VIRTIO_NET_HDR_F_DATA_VALID. > > > > > > > > So, we need a feature bit (let's say VIRTIO_NET_F_UNNECESSARY_CSUM). > > > > If VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the device must give the > > > > driver a fully checksummed packet, and the packet is validated by the > > > > device with \field{flags} set to VIRTIO_NET_HDR_F_DATA_VALID. > > > > > > > > > > > > > > I feel you are trying to say that clearing VIRTIO_NET_F_GUEST_CSUM > > > > > disables all offloads but you want to keep some of them? > > > > > > > > > > > > > No, what I mean is that a feature VIRTIO_NET_F_UNNECESSARY_CSUM is > > > > needed > > > > in addition to VIRTIO_NET_F_GUEST_CSUM, if both features are negotiated, > > > > then the driver may always receive packets marked as > > > > VIRTIO_NET_HDR_F_DATA_VALID, which means that we can now load XDP at the > > > > same time. > > > > > > Makes no sense to me. VIRTIO_NET_F_GUEST_CSUM set already allows > > > VIRTIO_NET_HDR_F_DATA_VALID: > > > > We need to focus on what happens when the XDP program is loaded: > > > > The current virtio-net needs to turn off the VIRTIO_NET_F_GUEST_CSUM > > feature when loading XDP. The main reason for doing this is because
[virtio-dev] Re: [Proposal] Relationship between XDP and rx-csum in virtio-net
On Mon, May 22, 2023 at 03:10:05PM -0400, Michael S. Tsirkin wrote: > On Mon, May 22, 2023 at 08:12:00PM +0800, Heng Qi wrote: > > 1) Add a feature bit to the virtio specification to tell the sender that a > > fully > > csumed packet must be sent. > > Who is the sender in this picture? The driver? The device or the driver. When the device is hw, the sender is more likely to be a device. When the device is sw, the sender can be a device or a driver. But in general, this feature is inclined to constrain the behavior of the device and the driver from the receiving side. For example: VIRTIO_NET_F_UNNECESSARY_CSUM : The driver tells the device that you must send me a fully csumed packet. Then the specific implementation can be (1) the sender sends a fully csumed packet; (2) the receiver receives a CHECKSUM_PARTIAL packet, and the device helps calculate the fully csum (because the two parties in the communication are located on the same host, the packet is trusted.). In summary, if VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the driver will no longer receive any packets marked CHECKSUM_PARTIAL. Thanks. > > -- > MST - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [Proposal] Relationship between XDP and rx-csum in virtio-net
On Tue, May 23, 2023 at 09:30:28AM -0400, Michael S. Tsirkin wrote: > On Tue, May 23, 2023 at 05:18:20PM +0800, Heng Qi wrote: > > On Tue, May 23, 2023 at 03:15:37AM -0400, Michael S. Tsirkin wrote: > > > On Tue, May 23, 2023 at 10:41:18AM +0800, Heng Qi wrote: > > > > On Mon, May 22, 2023 at 03:10:05PM -0400, Michael S. Tsirkin wrote: > > > > > On Mon, May 22, 2023 at 08:12:00PM +0800, Heng Qi wrote: > > > > > > 1) Add a feature bit to the virtio specification to tell the sender > > > > > > that a fully > > > > > > csumed packet must be sent. > > > > > > > > > > Who is the sender in this picture? The driver? > > > > > > > > The device or the driver. > > > > > > > > When the device is hw, the sender is more likely to be a device. > > > > When the device is sw, the sender can be a device or a driver. > > > > > > > > But in general, this feature is inclined to constrain the behavior of > > > > the device and > > > > the driver from the receiving side. > > > > > > Based on above I am guessing you are talking about driver getting > > > packets from device, I wish you used terms from virtio spec. > > > > Yes, I'm going to use the terminology of the virtio spec. > > > > > > > > > For example: > > > > VIRTIO_NET_F_UNNECESSARY_CSUM : The driver tells the device that you > > > > must send me a fully csumed packet. > > > > > > > > Then the specific implementation can be > > > > > > > > (1) the sender sends a fully csumed packet; > > > > (2) the receiver receives a CHECKSUM_PARTIAL packet, and the device > > > > helps calculate the fully csum > > > > (because the two parties in the communication are located on the > > > > same host, the packet is trusted.). > > > > > > > > In summary, if VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the driver > > > > will no longer receive any packets marked CHECKSUM_PARTIAL. > > > > > > > > Thanks. > > > > > > This is what clearing VIRTIO_NET_F_GUEST_CSUM does. > > > > Yes, but with VIRTIO_NET_F_GUEST_CSUM cleared, although the device can > > receive a fully checksummed packet, we can no longer enjoy > > the device's ability to validate the packet checksum. That is, the value > > of \field{flags} in the virtio_net_hdr structure is set to 0, which means > > that the packet received by the driver will not be marked as > > VIRTIO_NET_HDR_F_DATA_VALID. > > > > So, we need a feature bit (let's say VIRTIO_NET_F_UNNECESSARY_CSUM). > > If VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the device must give the > > driver a fully checksummed packet, and the packet is validated by the > > device with \field{flags} set to VIRTIO_NET_HDR_F_DATA_VALID. > > > > > > > > I feel you are trying to say that clearing VIRTIO_NET_F_GUEST_CSUM > > > disables all offloads but you want to keep some of them? > > > > > > > No, what I mean is that a feature VIRTIO_NET_F_UNNECESSARY_CSUM is needed > > in addition to VIRTIO_NET_F_GUEST_CSUM, if both features are negotiated, > > then the driver may always receive packets marked as > > VIRTIO_NET_HDR_F_DATA_VALID, which means that we can now load XDP at the > > same time. > > Makes no sense to me. VIRTIO_NET_F_GUEST_CSUM set already allows > VIRTIO_NET_HDR_F_DATA_VALID: We need to focus on what happens when the XDP program is loaded: The current virtio-net needs to turn off the VIRTIO_NET_F_GUEST_CSUM feature when loading XDP. The main reason for doing this is because VIRTIO_NET_F_GUEST_CSUM allows to receive packets marked as VIRTIO_NET_HDR_F_NEEDS_CSUM. Such packets are not compatible with XDP programs, because we cannot guarantee that the csum_{start, offset} fields are correct after XDP modifies the packets. So in order for the driver to continue to enjoy the device's ability to validate packets while XDP is loaded, we need a new feature to tell the device not to deliver packets marked as VIRTIO_NET_HDR_F_NEEDS_CSUM (in other words, the device can only deliver packets marked as VIRTIO_NET_HDR_F_DATA_VALID). Thanks. > \item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the > VIRTIO_NET_HDR_F_NEEDS_CSUM bit in \field{flags} can be > set: if so, the packet checksum at offset \field{csum_offset} > from \field{csum_start} and any preceding checksums > have been validated. The checksum on the packet is incomplete and > if bit VIRTIO_NET_HDR_F_RSC_IN
Re: [virtio-dev] Re: [Proposal] Relationship between XDP and rx-csum in virtio-net
On Tue, May 23, 2023 at 03:15:37AM -0400, Michael S. Tsirkin wrote: > On Tue, May 23, 2023 at 10:41:18AM +0800, Heng Qi wrote: > > On Mon, May 22, 2023 at 03:10:05PM -0400, Michael S. Tsirkin wrote: > > > On Mon, May 22, 2023 at 08:12:00PM +0800, Heng Qi wrote: > > > > 1) Add a feature bit to the virtio specification to tell the sender > > > > that a fully > > > > csumed packet must be sent. > > > > > > Who is the sender in this picture? The driver? > > > > The device or the driver. > > > > When the device is hw, the sender is more likely to be a device. > > When the device is sw, the sender can be a device or a driver. > > > > But in general, this feature is inclined to constrain the behavior of the > > device and > > the driver from the receiving side. > > Based on above I am guessing you are talking about driver getting > packets from device, I wish you used terms from virtio spec. Yes, I'm going to use the terminology of the virtio spec. > > > For example: > > VIRTIO_NET_F_UNNECESSARY_CSUM : The driver tells the device that you must > > send me a fully csumed packet. > > > > Then the specific implementation can be > > > > (1) the sender sends a fully csumed packet; > > (2) the receiver receives a CHECKSUM_PARTIAL packet, and the device helps > > calculate the fully csum > > (because the two parties in the communication are located on the same > > host, the packet is trusted.). > > > > In summary, if VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the driver will > > no longer receive any packets marked CHECKSUM_PARTIAL. > > > > Thanks. > > This is what clearing VIRTIO_NET_F_GUEST_CSUM does. Yes, but with VIRTIO_NET_F_GUEST_CSUM cleared, although the device can receive a fully checksummed packet, we can no longer enjoy the device's ability to validate the packet checksum. That is, the value of \field{flags} in the virtio_net_hdr structure is set to 0, which means that the packet received by the driver will not be marked as VIRTIO_NET_HDR_F_DATA_VALID. So, we need a feature bit (let's say VIRTIO_NET_F_UNNECESSARY_CSUM). If VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the device must give the driver a fully checksummed packet, and the packet is validated by the device with \field{flags} set to VIRTIO_NET_HDR_F_DATA_VALID. > > I feel you are trying to say that clearing VIRTIO_NET_F_GUEST_CSUM > disables all offloads but you want to keep some of them? > No, what I mean is that a feature VIRTIO_NET_F_UNNECESSARY_CSUM is needed in addition to VIRTIO_NET_F_GUEST_CSUM, if both features are negotiated, then the driver may always receive packets marked as VIRTIO_NET_HDR_F_DATA_VALID, which means that we can now load XDP at the same time. > Again please use virtio terminology not Linux. to help you out, > in current linux, VIRTIO_NET_HDR_F_NEEDS_CSUM and VIRTIO_NET_HDR_F_DATA_VALID > will set CHECKSUM_PARTIAL and CHECKSUM_UNNECESSARY respectively. > Sure. Will do as you suggested. Thanks. > > -- > MST > > > - > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] [Proposal] Relationship between XDP and rx-csum in virtio-net
Currently, the VIRTIO_NET_F_GUEST_CSUM(NETIF_F_RXCSUM) feature of the virtio-net driver conflicts with the loading of the XDP program, which is caused by the problem described in [1][2], that is, XDP may cause errors in partial csumed-related fields and resulting in packet dropping. rx CHECKSUM_PARTIAL mainly exists in the virtualized environment, and its purpose is to save computing resource overhead. The *goal* of this proposal is to enable the coexistence of XDP and VIRTIO_NET_F_GUEST_CSUM. 1. We need to understand why the device driver receives the rx CHECKSUM_PARTIAL packet. Drivers related to the virtualized environment, such as virtio-net/veth/loopback, etc., may receive partial csumed packets. When the tx device finds that the destination rx device of the packet is located on the same host, it is clear that the packet may not pass through the physical link, so the tx device sends the packet with csum_{start, offset} directly to the rx side to save computational resources without computing a fully csum (depends on the specific implementation, some virtio-net backend devices are known to behave like this currently). From [3], the stack trusts such packets. However, veth still has NETIF_F_RXCSUM turned on when loading XDP. This may cause packet dropping as [1][2] stated. But currently the veth community does not seem to have reported such problems, can we guess that the coexistence of XDP and rx CHECKSUM_PARTIAL has less negative impact? 2. About rx CHECKSUM_UNECESSARY: We have just seen that in a virtualized environment a packet may flow between the same host, so not computing the complete csum for the packet saves some cpu resources. The purpose of the checksum is to verify that packets passing through the physical link are correct. Of course, it is also reasonable to do a fully csum for packets of the virtualized environment, which is exactly what we need. rx CHECKSUM_UNECESSARY indicates that the packet has been fully checked, that is, it is a credible packet. If such a packet is modified by the XDP program, the user should recalculate the correct checksum using bpf_csum_diff() and bpf_{l3,l4}_csum_replace(). Therefore, for those drivers(physical nic drivers?), such as atlantic/bnxt/mlx, etc., XDP and NETIF_F_RXCSUM coexist, because their packets will be fully checked at the tx side. AWS's ena driver is also designed to be in this fully checksum mode (we also mentioned below that a feature bit can be provided for virtio-net, telling the sender that a fully checksum must be calculated to implement similar behavior to other drivers), although it is in a virtualized environment. 3. To sum up: It seems that only virtio-net sets XDP and VIRTIO_NET_F_GUEST_CSUM as mutually exclusive, which may cause the following problems: When XDP loads, 1) For packets that are fully checked by the sender, packets are marked as CHECKSUM_UNECESSARY by the rx csum hw offloading. virtio-net driver needs additional CPU resources to compute the checksum for any packet. When testing with the following command in Aliyun ECS: qperf dst_ip -lp 8989 -m 64K -t 20 tcp_bw (mtu = 1500, dev layer GRO is on) The csum-related overhead we tested on X86 is 11.7%, and on ARM is 15.8%. 2) One of the main functions of the XDP prog is to be used as a monitoring and firewall, etc., which means that the XDP prog may not modify the packet. This is applicable to both rx CHECKSUM_PARTIAL and rx CHECKSUM_UNECESSARY, but we ignore the rx csum hw offloading capability for both cases. 4. How we try to solve: 1) Add a feature bit to the virtio specification to tell the sender that a fully csumed packet must be sent. Then XDP can coexist with VIRTIO_NET_F_GUEST_CSUM when this feature bit is negotiated. (similar to ENA behavior) 2) Modify the current virtio-net driver No longer filter the VIRTIO_NET_F_GUEST_CSUM feature in virtnet_xdp_set(). Then we can immediately get the ability from VIRTIO_NET_F_GUEST_CSUM and enjoy the software CPU resources saved by rx csum hw offloading. (This method is a bit rude) 5. Ending This is a proposal and does not represent a formal solution. Looking forward to feedback from the community and exploring a possible/common solution to the problem described in this proposal. 6. Quote [1] 18ba58e1c234 virtio-net: fail XDP set if guest csum is negotiated We don't support partial csumed packet since its metadata will be lost or incorrect during XDP processing. So fail the XDP set if guest_csum feature is negotiated. [2] e59ff2c49ae1 virtio-net: disable guest csum during XDP set We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we can receive partial csumed packets with metadata kept in the vnet_hdr. This may have several side effects: - It could be overridden by header adjustment, thus is might be not correct after XDP processing. - There's no way to pass such metadata information through
[virtio-dev] Re: [virtio-comment] Re: [PATCH v18] virtio-net: support inner header hash
在 2023/6/22 下午8:32, Parav Pandit 写道: From: Michael S. Tsirkin Sent: Thursday, June 22, 2023 2:23 AM On Wed, Jun 21, 2023 at 08:52:04PM +, Parav Pandit wrote: From: Michael S. Tsirkin Sent: Wednesday, June 21, 2023 4:38 PM And the field is RO so no memory cost to exposing it in all VFs. Two structures do not bring the asymmetry. Accessing current and enabled fields via two different mechanism is bringing the asymmetry. I guess it's a matter of taste, but it is clearly more consistent with other hash things, to which it's very similar. This is consistent with new commands we define including notification coalescing whose GET is not coming config space. But there GET just reports the current state. Not the read only capability. So there would be cost per VF to keep it in config space. This one is RO no cost per VF. Let's make it convenient? And each VF can have different value hence requires per VF storage in the device. Nah, config space is too convenient when we can live with its limitations. I don't thin kwe prefer not to keep growing it. For some things such as this one it's perfect. Fields are different between different devices. Not sure what's the implication? Implication is device needs to store this in always available on-chip memory which is not good. For example, for migration driver might want to validate that two devices have same capability. doing it without dma is nicer. A migration driver for real world scenario, will almost have to use the dma for amount of data it needs to exchange. Not migration itself, provisioning. Provisioning driver usually do not attach to the member device directly. This requires device reset, followed by reaching _DRIVER stage, querying features etc and config area. And unbinding it and second reset by member driver. Ugh. Provisioning driver also needs to get the state or capabilities even when member driver is already attached. So config space is not much a gain either. Another example, future admin transport will have ability to provision devices by supplying their config space. This will include this capability automatically, if instead we hide it in a command we need to do extra custom work. So we do not prefer to keep growing the config space anymore, hence GET is the right approach to me. Heh I know you hate config space. Let it go, stop wasting time arguing about the same thing on every turn and instead help define admin transport to solve it This was discussed many times, a driver to have a direct (non-intercepted by owner device) channel to device. If you mean this non-intercepted channel as admin transport, fine. we can do that, sure. If you mean this is intercepted and it is going over admin cmd, then it is of no use for all future interfaces. We discussed this in thread with you and Jason. I provided concrete example with size and device provisioning math too and other example of multi-physical address VQ. So transporting register by register over some admin transport is sub-optimal. Not register by register, we can send all of config space as long as it's RO. This field is. It is RO in context of one member device, but every VF can have different value. The device will never know if one will use new cmdvq to access or some old driver will use without it. And hence, it always needs to provision it on onchip memory for backward compatibility. Yes, I think we also have to consider upcoming 1. device counters (e.g. supported_device_counter), 2. receive flow filters (e.g. supported_flow_types, supported_max_entries), 3. header splits (e.g. supported_split_types) etc. Continuous expansion of the configuration space needs to be careful. Instead of decision point being RO vs RW, any new fields via cmdvq and existing fields stays in cfg space, give predictable behavior to size the member devices in the system. Once the cmdvq is available, we can get rid of GET command used in this version for new future features. Till that arrives, GET command is the efficient way. Yes, I agree. Thanks. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash
在 2023/6/29 上午9:56, Parav Pandit 写道: From: Michael S. Tsirkin Sent: Wednesday, June 28, 2023 3:45 PM Maybe I get it. You want to use the new features as a carrot to force drivers to implement DMA? You suspect they will ignore the spec requirement just because things seem to work? Right because it is not a must normative. Well SHOULD also does not mean "ok to just ignore". This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course. RECOMMENDED and SHOULD forces the device to support MMIO, which is not good. So rather a good design is device tells the starting offset for the extended config space. And extended config space MUST be accessed using a DMA. With this sw can have infinite size MMIO and hw device forces DMA based on its implementation of where to start DMA from. This also gives the ability to maintain current config as MMIO for backward compatibility. There's some logic here, for sure. you just might be right. However, surely we can discuss this small tweak in 1.4 timeframe? Sure, if we prefer the DMA approach I don't have a problem in adding temporary one field to config space. I propose to add a line to the spec " Device Configuration Space" section, something like, Note: Any new device configuration space fields additional MUST consider accessing such fields via a DMA interface. And this will guide the new patches of what to do instead of last moment rush. Yea, except again I'd probably make it a SHOULD: e.g. I can see how switching to MMIO might be an option for qemu helping us debug DMA issues. There are too many queues whose debugging is needed and MMIO likely not the way to debug. The time to discuss this detail would be around when proposal for the DMA access to config space is on list though: I feel this SHOULD vs MUST is a small enough detail. From implementation POV it is certainly critical and good step forward to optimize virtio interface. Going back to inner hash. If we move supported_tunnels back to config space, do you feel we still need GET or just drop it? I note we do not have GET for either hash or rss config. For hash and rss config, debugging is missing. :) Yes, we can drop the GET after switching supported_tunnels to struct virtio_net_hash_config. Great! Glad to hear this! And if we no longer have GET is there still a reason for a separate command as opposed to a field in virtio_net_hash_config? I know this was done in v11 but there it was misaligned. We went with a command because we needed it for supported_tunnels but now that is no longer the case and there are reserved words in virtio_net_hash_config ... Let me know how you feel it about that, not critical for me. struct virtio_net_hash_config reserved is fine. +1. Inner header hash is orthogonal to RSS, and it's fine to have its own structure and commands. There is no need to send additional RSS fields when we configure inner header hash. Thanks. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash
On Thu, Jun 29, 2023 at 01:56:34AM +, Parav Pandit wrote: > > > > From: Michael S. Tsirkin > > Sent: Wednesday, June 28, 2023 3:45 PM > > > > Maybe I get it. You want to use the new features as a carrot to > > > > force drivers to implement DMA? You suspect they will ignore the > > > > spec requirement just because things seem to work? > > > > > > > Right because it is not a must normative. > > > > Well SHOULD also does not mean "ok to just ignore". > > > > This word, or the adjective "RECOMMENDED", mean that there > >may exist valid reasons in particular circumstances to ignore a > >particular item, but the full implications must be understood and > >carefully weighed before choosing a different course. > > > RECOMMENDED and SHOULD forces the device to support MMIO, which is not good. > So rather a good design is device tells the starting offset for the extended > config space. > And extended config space MUST be accessed using a DMA. > With this sw can have infinite size MMIO and hw device forces DMA based on > its implementation of where to start DMA from. > This also gives the ability to maintain current config as MMIO for backward > compatibility. > > > > > > > > > There's some logic here, for sure. you just might be right. > > > > > > > > However, surely we can discuss this small tweak in 1.4 timeframe? > > > > > > Sure, if we prefer the DMA approach I don't have a problem in adding > > temporary one field to config space. > > > > > > I propose to add a line to the spec " Device Configuration Space" > > > section, something like, > > > > > > Note: Any new device configuration space fields additional MUST consider > > accessing such fields via a DMA interface. > > > > > > And this will guide the new patches of what to do instead of last moment > > rush. > > > > Yea, except again I'd probably make it a SHOULD: e.g. I can see how > > switching to > > MMIO might be an option for qemu helping us debug DMA issues. > > > There are too many queues whose debugging is needed and MMIO likely not the > way to debug. > > > The time to discuss this detail would be around when proposal for the DMA > > access to config space is on list though: I feel this SHOULD vs MUST is a > > small > > enough detail. > > > From implementation POV it is certainly critical and good step forward to > optimize virtio interface. > > > Going back to inner hash. If we move supported_tunnels back to config space, > > do you feel we still need GET or just drop it? I note we do not have GET for > > either hash or rss config. > > > For hash and rss config, debugging is missing. :) > Yes, we can drop the GET after switching supported_tunnels to struct > virtio_net_hash_config. > I would like to make sure if we're aligned. The new version should contain the following: 1. The supported_tunnel_types are placed in the device config space; 2. Reserve the following structure: struct virtnet_hash_tunnel { le32 enabled_tunnel_types; }; 3. Reserve the SET command for enabled_tunnel_types and remove the GET command for enabled_tunnel_types. If there is no problem, I will modify it accordingly. Thanks! > > And if we no longer have GET is there still a reason for a separate command > > as > > opposed to a field in virtio_net_hash_config? > > I know this was done in v11 but there it was misaligned. > > We went with a command because we needed it for supported_tunnels but > > now that is no longer the case and there are reserved words in > > virtio_net_hash_config ... > > > > Let me know how you feel it about that, not critical for me. > > struct virtio_net_hash_config reserved is fine. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash
在 2023/6/29 下午7:48, Michael S. Tsirkin 写道: On Thu, Jun 29, 2023 at 10:05:09AM +0800, Heng Qi wrote: 在 2023/6/29 上午9:56, Parav Pandit 写道: From: Michael S. Tsirkin Sent: Wednesday, June 28, 2023 3:45 PM Maybe I get it. You want to use the new features as a carrot to force drivers to implement DMA? You suspect they will ignore the spec requirement just because things seem to work? Right because it is not a must normative. Well SHOULD also does not mean "ok to just ignore". This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course. RECOMMENDED and SHOULD forces the device to support MMIO, which is not good. So rather a good design is device tells the starting offset for the extended config space. And extended config space MUST be accessed using a DMA. With this sw can have infinite size MMIO and hw device forces DMA based on its implementation of where to start DMA from. This also gives the ability to maintain current config as MMIO for backward compatibility. There's some logic here, for sure. you just might be right. However, surely we can discuss this small tweak in 1.4 timeframe? Sure, if we prefer the DMA approach I don't have a problem in adding temporary one field to config space. I propose to add a line to the spec " Device Configuration Space" section, something like, Note: Any new device configuration space fields additional MUST consider accessing such fields via a DMA interface. And this will guide the new patches of what to do instead of last moment rush. Yea, except again I'd probably make it a SHOULD: e.g. I can see how switching to MMIO might be an option for qemu helping us debug DMA issues. There are too many queues whose debugging is needed and MMIO likely not the way to debug. The time to discuss this detail would be around when proposal for the DMA access to config space is on list though: I feel this SHOULD vs MUST is a small enough detail. From implementation POV it is certainly critical and good step forward to optimize virtio interface. Going back to inner hash. If we move supported_tunnels back to config space, do you feel we still need GET or just drop it? I note we do not have GET for either hash or rss config. For hash and rss config, debugging is missing. :) Yes, we can drop the GET after switching supported_tunnels to struct virtio_net_hash_config. Great! Glad to hear this! And if we no longer have GET is there still a reason for a separate command as opposed to a field in virtio_net_hash_config? I know this was done in v11 but there it was misaligned. We went with a command because we needed it for supported_tunnels but now that is no longer the case and there are reserved words in virtio_net_hash_config ... Let me know how you feel it about that, not critical for me. struct virtio_net_hash_config reserved is fine. +1. Inner header hash is orthogonal to RSS, and it's fine to have its own structure and commands. There is no need to send additional RSS fields when we configure inner header hash. Thanks. Not RSS, hash calculations. It's not critical, but I note that practically you said you will enable this with symmetric hash so it makes sense to me to send this in the same command This works for me. Thanks. with the key. Not critical though if there's opposition. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] RE: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash
On Thu, Jun 29, 2023 at 04:59:28PM +, Parav Pandit wrote: > > > > From: Michael S. Tsirkin > > Sent: Thursday, June 29, 2023 7:48 AM > > > > > > struct virtio_net_hash_config reserved is fine. > > > > > > +1. > > > > > > Inner header hash is orthogonal to RSS, and it's fine to have its own > > > structure and commands. > > > There is no need to send additional RSS fields when we configure inner > > > header hash. > > > > > > Thanks. > > > > Not RSS, hash calculations. It's not critical, but I note that practically > > you said > > you will enable this with symmetric hash so it makes sense to me to send > > this in > > the same command with the key. > > > > In the v19, we have, > > +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ along with > VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT. > > So it is done along with rss, so in same struct as rss config is fine. Do you mean having both virtio_net_rss_config and virtio_net_hash_config have enabled_hash_types? Like this: struct virtio_net_rss_config { le32 hash_types; le16 indirection_table_mask; struct rss_rq_id unclassified_queue; struct rss_rq_id indirection_table[indirection_table_length]; le16 max_tx_vq; u8 hash_key_length; u8 hash_key_data[hash_key_length]; +le32 enabled_tunnel_types; }; struct virtio_net_hash_config { le32 hash_types; -le16 reserved[4]; +le32 enabled_tunnel_types; +le16 reserved[2]; u8 hash_key_length; u8 hash_key_data[hash_key_length]; }; If yes, this should have been discussed in v10 [1] before, enabled_tunnel_types in virtio_net_rss_config will follow the variable length field and cause misalignment. If we let the inner header hash reuse the virtio_net_hash_config structure, it can work, but the only disadvantage is that the configuration of the inner header hash and *RSS*(not hash calculations) becomes somewhat coupled. Just imagine: If the driver and the device negotiated VIRTIO_NET_F_HASH_TUNNEL and VIRTIO_NET_F_RSS, but did not negotiate VIRTIO_NET_F_HASH_REPORT, 1. then if we only want to configure the inner header hash (such as enabled_tunnel_types), it is good for us to send virtio_net_hash_config alone; 2. but then if we want to configure the inner header hash and RSS (such as indirection table), we need to send all virtio_net_rss_config and virtio_net_hash_config once, because virtio_net_rss_config now does not carry enabled_tunnel_types due to misalignment. So, I think the following structure will make it clearer to configure inner header hash and RSS/hash calculation. But in any case, if we still propose to reuse virtio_net_hash_config proposal, I am ok, no objection: 1. The supported_tunnel_types are placed in the device config space; 2. Reserve the following structure: struct virtnet_hash_tunnel { le32 enabled_tunnel_types; }; 3. Reserve the SET command for enabled_tunnel_types and remove the GET command for enabled_tunnel_types. [1] https://lists.oasis-open.org/archives/virtio-dev/202303/msg00317.html Thanks a lot! - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [virtio-comment] [PATCH v17] virtio-net: support inner header hash
On Tue, Jun 20, 2023 at 08:06:16AM -0400, Michael S. Tsirkin wrote: > On Mon, Jun 12, 2023 at 04:09:20PM +0800, Heng Qi wrote: > > 1. Currently, a received encapsulated packet has an outer and an inner > > header, but > > the virtio device is unable to calculate the hash for the inner header. The > > same > > flow can traverse through different tunnels, resulting in the encapsulated > > packets being spread across multiple receive queues (refer to the figure > > below). > > However, in certain scenarios, we may need to direct these encapsulated > > packets of > > the same flow to a single receive queue. This facilitates the processing > > of the flow by the same CPU to improve performance (warm caches, less > > locking, etc.). > > > >client1client2 > > |+---+ | > > +--->|tunnels|<+ > >+---+ > > | | > > v v > > +-+ > > | monitoring host | > > +-+ > > > > To achieve this, the device can calculate a symmetric hash based on the > > inner headers > > of the same flow. > > > > 2. For legacy systems, they may lack entropy fields which modern protocols > > have in > > the outer header, resulting in multiple flows with the same outer header but > > different inner headers being directed to the same receive queue. This > > results in > > poor receive performance. > > > > To address this limitation, inner header hash can be used to enable the > > device to advertise > > the capability to calculate the hash for the inner packet, regaining better > > receive performance. > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/173 > > > > Signed-off-by: Heng Qi > > Reviewed-by: Xuan Zhuo > > Reviewed-by: Parav Pandit > > > Looks good but small rewording suggestions: > > > --- > > v16->v17: > > 1. Some small rewrites. @Parav Pandit > > 2. Add Parav's Reviewed-by tag (Thanks!). > > > > v15->v16: > > 1. Remove the hash_option. In order to delimit the inner header hash > > and RSS > >configuration, the ability to configure the outer src udp port hash > > is given > >to RSS. This is orthogonal to inner header hash, which will be done > > in the > >RSS capability extension topic (considered as an RSS extension > > together > >with the symmetric toeplitz hash algorithm, etc.). @Parav Pandit > > @Michael S . Tsirkin > > Hmm maybe. I'd like the TC see this patch though, they are designed to > work together. Can we see at least a draft? I have been burned I've drawn up a simple draft on hashing based on outer src port, but not yet on toeplitz symmetric hashing, I'll send it out in about 4 weeks, but please don't delay this work. You have my word. > many times by contributors just addressing what they > care about, promising to follow through with other work and > never bothering. I got it. > > > 2. Fix a 'field' typo. @Parav Pandit > > > > v14->v15: > > 1. Add tunnel hash option suggested by @Michael S . Tsirkin > > 2. Adjust some descriptions. > > > > v13->v14: > > 1. Move supported_hash_tunnel_types from config space into cvq command. > > @Parav Pandit > > 2. Rebase to master branch. > > 3. Some minor modifications. > > > > v12->v13: > > 1. Add a GET command for hash_tunnel_types. @Parav Pandit > > 2. Add tunneling protocol explanation. @Jason Wang > > 3. Add comments on some usage scenarios for inner hash. > > > > v11->v12: > > 1. Add a command VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG. > > 2. Refine the commit log. @Michael S . Tsirkin > > 3. Add some tunnel types. > > > > v10->v11: > > 1. Revise commit log for clarity for readers. > > 2. Some modifications to avoid undefined terms. @Parav Pandit > > 3. Change VIRTIO_NET_F_HASH_TUNNEL dependency. @Parav Pandit > > 4. Add the normative statements. @Parav Pandit > > > > v9->v10: > > 1. Removed hash_report_tunnel related information. @Parav Pandit > > 2. Re-describe the limitations of QoS for tunneling. > > 3. Some clarification. > > > > v8->v9: > > 1. Merge hash_report_tunnel_types into hash_report.