Re: [virtio-dev] [PATCH v7] virtio_net: support split header

2022-09-01 Thread Heng Qi

在 2022/8/16 下午5:34, Heng Qi 写道:

From: Xuan Zhuo

The purpose of this feature is to split the header and the payload of
the packet.

|receive buffer|
|   0th descriptor | 1th descriptor|
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo
Signed-off-by: Heng Qi
Reviewed-by: Kangjie Xu
---
v7:
1. Fix some presentation issues.
2. Use "split transport header". @Jason Wang
3. Clarify some paragraphs. @Cornelia Huck
4. determine the device what to do if it does not perform header split 
on a packet.

v6:
1. Fix some syntax issues. @Cornelia Huck
2. Clarify some paragraphs. @Cornelia Huck
3. Determine the device what to do if it does not perform header split 
on a packet.

v5:
1. Determine when hdr_len is credible in the process of rx
2. Clean up the use of buffers and descriptors
3. Clarify the meaning of used lenght if the first descriptor is 
skipped in the case of merge

v4:
1. fix typo @Cornelia Huck @Jason Wang
2. do not split header for IP fragmentation packet. @Jason Wang

v3:
1. Fix some syntax issues
2. Fix some terminology issues
3. It is not unified with ip alignment, so ip alignment is not included
4. Make it clear that the device must support four types, in the case 
of successful negotiation.

  conformance.tex |   2 ++
  content.tex | 102 
  2 files changed, 104 insertions(+)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..4e2b82e 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}
  
  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}

@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}
  
  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}

diff --git a/content.tex b/content.tex
index e863709..5676da9 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
  channel.
  
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting

+the transport header and the payload.
+
  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
  
  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.

@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
  \end{description}
  
  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}

@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / 
Network Device / Device O
  #define VIRTIO_NET_HDR_F_NEEDS_CSUM1
  #define VIRTIO_NET_HDR_F_DATA_VALID2
  #define VIRTIO_NET_HDR_F_RSC_INFO  4
+#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
  u8 

[virtio-dev] Re: [PATCH v7] virtio_net: support split header

2022-09-02 Thread Heng Qi

在 2022/9/2 下午2:21, Jason Wang 写道:

On Tue, Aug 16, 2022 at 5:35 PM Heng Qi  wrote:

From: Xuan Zhuo

The purpose of this feature is to split the header and the payload of
the packet.

|receive buffer|
|   0th descriptor | 1th descriptor|
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo
Signed-off-by: Heng Qi
Reviewed-by: Kangjie Xu
---
v7:
 1. Fix some presentation issues.
 2. Use "split transport header". @Jason Wang
 3. Clarify some paragraphs. @Cornelia Huck
 4. determine the device what to do if it does not perform header split 
on a packet.

v6:
 1. Fix some syntax issues. @Cornelia Huck
 2. Clarify some paragraphs. @Cornelia Huck
 3. Determine the device what to do if it does not perform header split 
on a packet.

v5:
 1. Determine when hdr_len is credible in the process of rx
 2. Clean up the use of buffers and descriptors
 3. Clarify the meaning of used lenght if the first descriptor is 
skipped in the case of merge

v4:
 1. fix typo @Cornelia Huck @Jason Wang
 2. do not split header for IP fragmentation packet. @Jason Wang

v3:
 1. Fix some syntax issues
 2. Fix some terminology issues
 3. It is not unified with ip alignment, so ip alignment is not included
 4. Make it clear that the device must support four types, in the case 
of successful negotiation.

  conformance.tex |   2 ++
  content.tex | 102 
  2 files changed, 104 insertions(+)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..4e2b82e 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}

  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / 
Driver Conformance / Block Driver Conformance}
@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}

  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / 
Device Conformance / Block Device Conformance}
diff --git a/content.tex b/content.tex
index e863709..5676da9 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
  channel.

+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
+the transport header and the payload.
+
  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.

  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
  \end{description}

  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / 
Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / 
Network Device / Device O
  #define VIRTIO_NET_HDR_F_NEEDS_CSUM1
  #define VIRTIO_NET_HDR_F_DATA_VALID2
  #define VIRTIO_NET_HDR_F_RSC_INFO

[virtio-dev] Re: [PATCH v7] virtio_net: support split header

2022-09-02 Thread Heng Qi


在 2022/9/2 下午2:41, Michael S. Tsirkin 写道:

On Fri, Sep 02, 2022 at 02:21:04PM +0800, Jason Wang wrote:

On Tue, Aug 16, 2022 at 5:35 PM Heng Qi  wrote:

From: Xuan Zhuo

The purpose of this feature is to split the header and the payload of
the packet.

|receive buffer|
|   0th descriptor | 1th descriptor|
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo
Signed-off-by: Heng Qi
Reviewed-by: Kangjie Xu
---
v7:
 1. Fix some presentation issues.
 2. Use "split transport header". @Jason Wang
 3. Clarify some paragraphs. @Cornelia Huck
 4. determine the device what to do if it does not perform header split 
on a packet.

v6:
 1. Fix some syntax issues. @Cornelia Huck
 2. Clarify some paragraphs. @Cornelia Huck
 3. Determine the device what to do if it does not perform header split 
on a packet.

v5:
 1. Determine when hdr_len is credible in the process of rx
 2. Clean up the use of buffers and descriptors
 3. Clarify the meaning of used lenght if the first descriptor is 
skipped in the case of merge

v4:
 1. fix typo @Cornelia Huck @Jason Wang
 2. do not split header for IP fragmentation packet. @Jason Wang

v3:
 1. Fix some syntax issues
 2. Fix some terminology issues
 3. It is not unified with ip alignment, so ip alignment is not included
 4. Make it clear that the device must support four types, in the case 
of successful negotiation.

  conformance.tex |   2 ++
  content.tex | 102 
  2 files changed, 104 insertions(+)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..4e2b82e 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}

  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / 
Driver Conformance / Block Driver Conformance}
@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}

  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / 
Device Conformance / Block Device Conformance}
diff --git a/content.tex b/content.tex
index e863709..5676da9 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
  channel.

+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
+the transport header and the payload.
+
  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.

  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
  \end{description}

  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / 
Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / 
Network Device / Device O
  #define VIRTIO_NET_HDR_F_NEEDS_CSUM1
  #define VIRTIO_NET_HDR_F

[virtio-dev] Re: [PATCH v7] virtio_net: support split header

2022-09-02 Thread Heng Qi


在 2022/9/2 下午2:21, Jason Wang 写道:

On Tue, Aug 16, 2022 at 5:35 PM Heng Qi  wrote:

From: Xuan Zhuo

The purpose of this feature is to split the header and the payload of
the packet.

|receive buffer|
|   0th descriptor | 1th descriptor|
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo
Signed-off-by: Heng Qi
Reviewed-by: Kangjie Xu
---
v7:
 1. Fix some presentation issues.
 2. Use "split transport header". @Jason Wang
 3. Clarify some paragraphs. @Cornelia Huck
 4. determine the device what to do if it does not perform header split 
on a packet.

v6:
 1. Fix some syntax issues. @Cornelia Huck
 2. Clarify some paragraphs. @Cornelia Huck
 3. Determine the device what to do if it does not perform header split 
on a packet.

v5:
 1. Determine when hdr_len is credible in the process of rx
 2. Clean up the use of buffers and descriptors
 3. Clarify the meaning of used lenght if the first descriptor is 
skipped in the case of merge

v4:
 1. fix typo @Cornelia Huck @Jason Wang
 2. do not split header for IP fragmentation packet. @Jason Wang

v3:
 1. Fix some syntax issues
 2. Fix some terminology issues
 3. It is not unified with ip alignment, so ip alignment is not included
 4. Make it clear that the device must support four types, in the case 
of successful negotiation.

  conformance.tex |   2 ++
  content.tex | 102 
  2 files changed, 104 insertions(+)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..4e2b82e 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}

  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / 
Driver Conformance / Block Driver Conformance}
@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}

  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / 
Device Conformance / Block Device Conformance}
diff --git a/content.tex b/content.tex
index e863709..5676da9 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
  channel.

+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
+the transport header and the payload.
+
  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.

  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
  \end{description}

  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / 
Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / 
Network Device / Device O
  #define VIRTIO_NET_HDR_F_NEEDS_CSUM1
  #define VIRTIO_NET_HDR_F_DATA_VALID2
  #define VIRTIO_NET_HDR_F_RSC_INFO

Re: [virtio-dev] [PATCH v7] virtio_net: support split header

2022-09-07 Thread Heng Qi


在 2022/8/16 下午5:34, Heng Qi 写道:

From: Xuan Zhuo

The purpose of this feature is to split the header and the payload of
the packet.

|receive buffer|
|   0th descriptor | 1th descriptor|
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo
Signed-off-by: Heng Qi
Reviewed-by: Kangjie Xu
---
v7:
1. Fix some presentation issues.
2. Use "split transport header". @Jason Wang
3. Clarify some paragraphs. @Cornelia Huck
4. determine the device what to do if it does not perform header split 
on a packet.

v6:
1. Fix some syntax issues. @Cornelia Huck
2. Clarify some paragraphs. @Cornelia Huck
3. Determine the device what to do if it does not perform header split 
on a packet.

v5:
1. Determine when hdr_len is credible in the process of rx
2. Clean up the use of buffers and descriptors
3. Clarify the meaning of used lenght if the first descriptor is 
skipped in the case of merge

v4:
1. fix typo @Cornelia Huck @Jason Wang
2. do not split header for IP fragmentation packet. @Jason Wang

v3:
1. Fix some syntax issues
2. Fix some terminology issues
3. It is not unified with ip alignment, so ip alignment is not included
4. Make it clear that the device must support four types, in the case 
of successful negotiation.

  conformance.tex |   2 ++
  content.tex | 102 
  2 files changed, 104 insertions(+)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..4e2b82e 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}
  
  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}

@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}
  
  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}

diff --git a/content.tex b/content.tex
index e863709..5676da9 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
  channel.
  
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting

+the transport header and the payload.
+
  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
  
  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.

@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
  \end{description}
  
  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}

@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / 
Network Device / Device O
  #define VIRTIO_NET_HDR_F_NEEDS_CSUM1
  #define VIRTIO_NET_HDR_F_DATA_VALID2
  #define VIRTIO_NET_HDR_F_RSC_INFO  4
+#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
  u8 

Re: [virtio-dev] [PATCH v7] virtio_net: support split header

2022-08-30 Thread Heng Qi


在 2022/8/16 下午5:34, Heng Qi 写道:

From: Xuan Zhuo

The purpose of this feature is to split the header and the payload of
the packet.

|receive buffer|
|   0th descriptor | 1th descriptor|
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo
Signed-off-by: Heng Qi
Reviewed-by: Kangjie Xu
---
v7:
1. Fix some presentation issues.
2. Use "split transport header". @Jason Wang
3. Clarify some paragraphs. @Cornelia Huck
4. determine the device what to do if it does not perform header split 
on a packet.

v6:
1. Fix some syntax issues. @Cornelia Huck
2. Clarify some paragraphs. @Cornelia Huck
3. Determine the device what to do if it does not perform header split 
on a packet.

v5:
1. Determine when hdr_len is credible in the process of rx
2. Clean up the use of buffers and descriptors
3. Clarify the meaning of used lenght if the first descriptor is 
skipped in the case of merge

v4:
1. fix typo @Cornelia Huck @Jason Wang
2. do not split header for IP fragmentation packet. @Jason Wang

v3:
1. Fix some syntax issues
2. Fix some terminology issues
3. It is not unified with ip alignment, so ip alignment is not included
4. Make it clear that the device must support four types, in the case 
of successful negotiation.

  conformance.tex |   2 ++
  content.tex | 102 
  2 files changed, 104 insertions(+)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..4e2b82e 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}
  
  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}

@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}
  
  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}

diff --git a/content.tex b/content.tex
index e863709..5676da9 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
  channel.
  
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting

+the transport header and the payload.
+
  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
  
  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.

@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
  \end{description}
  
  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}

@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / 
Network Device / Device O
  #define VIRTIO_NET_HDR_F_NEEDS_CSUM1
  #define VIRTIO_NET_HDR_F_DATA_VALID2
  #define VIRTIO_NET_HDR_F_RSC_INFO  4
+#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
  u8 

Re: [virtio-dev] [PATCH v7] virtio_net: support split header

2022-08-30 Thread Heng Qi

在 2022/8/25 下午10:22, Cornelia Huck 写道:

On Tue, Aug 16 2022, Heng Qi  wrote:


From: Xuan Zhuo

The purpose of this feature is to split the header and the payload of
the packet.

|receive buffer|
|   0th descriptor | 1th descriptor|
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo
Signed-off-by: Heng Qi
Reviewed-by: Kangjie Xu
---
v7:
1. Fix some presentation issues.
2. Use "split transport header". @Jason Wang
3. Clarify some paragraphs. @Cornelia Huck
4. determine the device what to do if it does not perform header split 
on a packet.

v6:
1. Fix some syntax issues. @Cornelia Huck
2. Clarify some paragraphs. @Cornelia Huck
3. Determine the device what to do if it does not perform header split 
on a packet.

v5:
1. Determine when hdr_len is credible in the process of rx
2. Clean up the use of buffers and descriptors
3. Clarify the meaning of used lenght if the first descriptor is 
skipped in the case of merge

v4:
1. fix typo @Cornelia Huck @Jason Wang
2. do not split header for IP fragmentation packet. @Jason Wang

v3:
1. Fix some syntax issues
2. Fix some terminology issues
3. It is not unified with ip alignment, so ip alignment is not included
4. Make it clear that the device must support four types, in the case 
of successful negotiation.

  conformance.tex |   2 ++
  content.tex | 102 
  2 files changed, 104 insertions(+)

I do not have any further comments on the change, let's see what the
networking folks think.


Okay. Thanks for your review.



[Do we require patches to be posted to virtio-comment, or is virtio-dev
enough? I'm a bit unsure right now.]


We can then consider posting patches to virtio-comment.


Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-10-20 Thread Heng Qi
On Sat, Oct 08, 2022 at 12:37:45PM +0800, Jason Wang wrote:
> On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > Jason I think the issue with previous proposals is that they 
> > > > > > conflict
> > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > driver flexibility in arranging the packet in memory is benefitial.
> > > > >
> > > > >
> > > > > Yes, but I didn't found how it can conflict the any_layout. Device 
> > > > > can just
> > > > > to not split the header when the layout doesn't fit for header 
> > > > > splitting.
> > > > > (And this seems the case even if we're using buffers).
> > > >
> > > > Well spec says:
> > > >
> > > > indicates to both the device and the driver that no
> > > > assumptions were made about framing.
> > > >
> > > > if device assumes that descriptor boundaries are where
> > > > driver wants packet to be stored that is clearly
> > > > an assumption.
> > >
> > > Yes but what I want to say is, the device can choose to not split the
> > > packet if the framing doesn't fit. Does it still comply with the above
> > > description?
> > >
> > > Thanks
> >
> > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > For example, if driver wants to split the header at some specific
> > offset this is already possible without extra functionality.
> 
> I'm not sure how this would work without the support from the device.
> This probably can only work if:
> 
> 1) the driver know what kind of packet it can receive
> 2) protocol have fixed length of the header
> 
> This is probably not true consider:
> 
> 1) TCP and UDP have different header length
> 2) IPv6 has an variable length of the header
> 
> 
> >
> > Let's keep it that way.
> >
> > Now, let's formulate what are some of the problems with the current way.
> >
> >
> >
> > A- mergeable buffers is even more flexible, since a single packet
> >   is built up of multiple buffers. And in theory device can
> >   choose arbitrary set of buffers to store a packet.
> >   So you could supply a small buffer for headers followed by a bigger
> >   one for payload, in theory even without any changes.
> >   Problem 1: However since this is not how devices currently operate,
> >   a feature bit would be helpful.
> 
> How do we know the bigger buffer is sufficient for the packet? If we
> try to allocate 64K (not sufficient for the future even) it breaks the
> effort of the mergeable buffer:
> 
> header buffer #1
> payload buffer #1
> header buffer #2
> payload buffer #2
> 
> Is the device expected to
> 
> 1) fill payload in header buffer #2, this breaks the effort that we
> want to make payload page aligned
> 2) skip header buffer #2, in this case, the device assumes the framing
> when it breaks any layout
> 
> >
> >   Problem 2: Also, in the past we found it useful to be able to figure out 
> > whether
> >   packet fits in a single buffer without looking at the header.
> >   For this reason, we have this text:
> >
> > If a receive packet is spread over multiple buffers, the device
> > MUST use all buffers but the last (i.e. the first 
> > \field{num_buffers} -
> > 1 buffers) completely up to the full length of each buffer
> > supplied by the driver.
> >
> >   if we want to keep this optimization and allow using a separate
> >   buffer for headers, then I think we could rely on the feature bit
> >   from Problem 1 and just make an exception for the first buffer.
> >   Also num_buffers is then always >= 2, maybe state this to avoid
> >   confusion.
> >
> >
> >
> >
> >
> > B- without mergeable, there's no flexibility. In particular, there can
> > not be uninitialized space between header and data.
> 
> I had two questions
> 
> 1) why is this not a problem of mergeable? There's no guarantee that
> the header is just the length of what the driver allocates for header
> buffer anyhow
> 
> E.g the header length could be smaller than the header buffer, the
> device still needs to skip part of the space in the header buffer.
> 
> 2) it should be the responsibility of the driver to handle the
> uninitialized space, it should do anything that is necessary for
> security, more below
> 


We've talked a bit more about split header so far, but there still seem to
be some issues, so let's recap.

一、 Method Discussion Review

In order to adapt to the Eric's tcp receive interface to achieve zero copy,
header and payload are required to be stored separately, and the payload is
stored in a paged alignment way. Therefore, we have discussed several options
for split header as follows:

1: method A ( depend on the descriptor chain )
| receive buffer| 
|  0th descriptor

Re: [virtio-dev] Re: [PATCH v7] virtio_net: support split header

2022-09-13 Thread Heng Qi
On Fri, Sep 09, 2022 at 08:47:57PM +0800, Xuan Zhuo wrote:
> 
> hi
>Qi also sent another same email today. Due to some email client problems,
>this email has some confusion in the format, so we can discuss
>under another one.
> 
>https://lists.oasis-open.org/archives/virtio-dev/202209/msg00066.html
> 

Yes. Due to some formatting issues with the mail client, I resent this new email
which may be in a clear style for your review.

Do you have more questions about the contents of this new email?

Looking forward to your comments.

Thanks.

> 
> -
> To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v7] virtio_net: support split header

2022-09-09 Thread Heng Qi
On Sun, Sep 04, 2022 at 04:27:38PM -0400, Michael S. Tsirkin wrote:
> On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote:
> > We need to clarify that the purpose of header splitting is to make all 
> > payloads
> > can be independently in a page, which is beneficial for the zerocopy
> > implemented by the upper layer.
> 
> absolutely, pls add motivation.
> 
> > If the driver does not enforce that the buffers submitted to the receiveq 
> > MUST
> > be composed of at least two descriptors, then header splitting will become 
> > meaningless,
> > or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated 
> > at this time.
> > 
> > 
> > Thanks.
> > 
> > 
> 
> 
> This seems very narrow and unecessarily wasteful of descriptors.
> What is wrong in this:
> 
> .. 
> 
> seems to achieve the goal of data in a separate page without
> using extra descriptors.
> 
> thus my proposal to replace the requirement of a separate
> descriptor with an offset of data from beginning of
> buffer that driver sets.
>


We have carefully considered your suggestion. 

Let's summarize the schemes we've considered before and now.

1. Scheme A ( refer to spec v7 )

We refer to spec v7 and earlier as scheme A for short. Review scheme A below: 
| receive buffer| 
|  0th descriptor  | 1th descriptor | 
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|  payload   | 

We use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be put
independently in a page, which is very beneficial for the zerocopy 
implemented by the upper layer. 

Scheme A better solves the problem of headroom, tailroom and
memory waste, but as you said, this solution relies on descriptor chain. 

2. Scheme B ( refer to your suggestion )

Our rethinking approach is no longer based on descriptor chain.
 
We refer to your proposed offset-based scheme as scheme B.
As you suggested, scheme B gives the device a buffer, using offset to
indicate where to place the payload. Like this: 

..  

But how to apply for this buffer?
Since we want the payload to be placed on a separate page, the method
we consider is to directly alloc two pages from driver of contiguous memory. 

Then the beginning of this contiguous memory is used to store the headroom,
and the contiguous memory after the headroom is directly handed over to the 
device.
Similar to the following: 

[-- receive buffer(2 pages) --] 
[<first page ---><-- second page >] 
[<-> ..<   payload >] 
   ^^
   ||
   |pointer to device
   |
   |
   Driver reserved, the later part is filled

3. Scheme C (this sheme we have sent to you on September 7th, maybe you miss 
it.)

Based on your previous suggestion, we also considered another new scheme C. 
This scheme is implemented based on mergeable buffer, filling a separate page 
each time. 

If the split header is negotiated and the packet can be successfully split by 
the device,
the device needs to find at least two buffers, namely two pages, one for the 
virtio-net header
and transport header, and the other for the payload. Like the following: 

|   receive buffer1(page)  | receive buffer2 (page) 
  | 
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload
  | 

At the same time, if XDP is considered, then the device needs to add headroom 
at the
beginning of receive buffer1 when receiving packets, so that the driver can 
process
programs similar to XDP. 

In order to solve this problem, scheme C introduce an offset which requires
the device to write data from the offset to receive buffer1, like the 
following: 

|   receive buffer (page) | 
receive buffer (page) | 
| <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| 
payload   | 
^
|
pointer to device
   


4. Summarize

Then we simply compare the advantages and disadvantages of scheme A(spec v7),
scheme B (offset buffer(2 pages)) and scheme C (based on mergeable buffer): 

1). descriptor chain: 
 i)  A depends on desciptor chain;
 ii) B, C do not depend on desciptor chain. 

2). page alloc 
 i) B fills with two consecutive pages, which causes a great waste of 
memory
for small packages such as arp;
 ii) C fills with a single page, slightly better than B. 

3). Memory waste: 
 i) The memory waste of scheme A is mainly the 0th descriptor
that is skipped by the device;
 ii) When scheme B and 

[virtio-dev] Re: [PATCH v7] virtio_net: support split header

2022-09-09 Thread Heng Qi




在 2022/9/5 上午4:27, Michael S. Tsirkin 写道:

On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote:

We need to clarify that the purpose of header splitting is to make all payloads
can be independently in a page, which is beneficial for the zerocopy
implemented by the upper layer.

absolutely, pls add motivation.


If the driver does not enforce that the buffers submitted to the receiveq MUST
be composed of at least two descriptors, then header splitting will become 
meaningless,
or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated at 
this time.


Thanks.




This seems very narrow and unecessarily wasteful of descriptors.
What is wrong in this:

.. 

seems to achieve the goal of data in a separate page without
using extra descriptors.

thus my proposal to replace the requirement of a separate
descriptor with an offset of data from beginning of
buffer that driver sets.



We have carefully considered your suggestion.

We refer to spec v7 and earlier as scheme A for short. Review scheme A 
below:


| receive buffer |

| 0th descriptor | 1th descriptor |

| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |

We use a buffer plus a separate page when allocating the receive

buffer. In this way, we can ensure that all payloads can be

independently in a page, which is very beneficial for the zerocopy

implemented by the upper layer.

scheme A better solves the problem of headroom, tailroom and memory 
waste, but as you said, this solution relies on descriptor chain.


Our rethinking approach is no longer based on or using descriptor chain.

We refer to your proposed offset-based scheme as scheme B:

As you suggested, scheme B gives the device a buffer, using offset to 
indicate where to place the payload like this:


.. 

But how to apply for this buffer? Since we want the payload to be placed 
on a separate page, the method we consider is to directly apply to the 
driver for two pages of contiguous memory.


Then the beginning of this contiguous memory is used to store the 
headroom, and the contiguous memory after the headroom is directly 
handed over to the device. similar to the following:


<-- receive buffer(2 pages) 
->


<<-- first page 
--->< second page -->>


<header>>


Based on your previous suggestion, we also considered another new scheme C.

This scheme is implemented based on mergeable buffer, filling a separate 
page each time.


If the split header is negotiated and the packet can be successfully 
split by the device, the device needs to find at least two buffers, 
namely two pages, one for the virtio-net header and transport header, 
and the other for the data payload. Like the following:


| receive buffer1(page) | receive buffer2 (page) |

| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |

At the same time, if XDP is considered, then the device needs to add 
headroom at the beginning of receive buffer1 when receiving packets, so 
that the driver can process programs similar to XDP. In order to solve 
this problem, can scheme C introduce an offset, which requires the 
device to write data from the offset position to receive buffer1, like 
the following:


| receive buffer (page) | receive buffer (page) |

| <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold 
-->| payload |


Then we simply compare the advantages and disadvantages of scheme A(spec 
v7), scheme B (offset buffer(2 pages)) and scheme C (based on mergeable 
buffer):


1. desc chain:

- A depends on desciptor chain; - B, C do not depend on desciptor chain.

2. page alloc

- B fills two consecutive pages, which causes a great waste of memory 
for small packages such as arp; - C fills a single page, slightly better 
than B.


3. Memory waste:

- The memory waste of scheme A is mainly the 0th descriptor that is 
skipped by the device; - When scheme B and scheme C successfully split 
the header, there is a huge waste of the first page, but the first page 
can be quickly released by copying.


4. headroom

- The headrooms of plan A and plan B are reserved; - Scheme C requires 
the driver to set off to let the device skip off when using receive buffer1.


5. tailroom

- When splitting the header, skb usually needs to store each independent 
page in the non-linear data area based on shinfo. - The tailroom of 
scheme A is reserved by itself; - Scheme B requires the driver to set 
the reserved padding area for the first receive buffer(2 pages) to use 
shinfo when the split header is not successfully executed; - Scheme C 
requires the driver to set max_len for the first receive buffer(page).



Which plan do you prefer?

---

Thanks.


-
To unsubscribe, e-mail: virtio-dev-unsubsc

Re: [virtio-dev] Re: [PATCH v7] virtio_net: support split header

2022-09-27 Thread Heng Qi




在 2022/9/28 上午5:35, Michael S. Tsirkin 写道:

On Wed, Sep 14, 2022 at 11:34:43AM +0800, Jason Wang wrote:

在 2022/9/9 20:38, Xuan Zhuo 写道:

On Fri, 9 Sep 2022 07:15:02 -0400, "Michael S. Tsirkin"  wrote:

On Fri, Sep 09, 2022 at 03:41:54PM +0800, Heng Qi wrote:

在 2022/9/5 上午4:27, Michael S. Tsirkin 写道:

On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote:

We need to clarify that the purpose of header splitting is to make all payloads
can be independently in a page, which is beneficial for the zerocopy
implemented by the upper layer.

absolutely, pls add motivation.


If the driver does not enforce that the buffers submitted to the receiveq MUST
be composed of at least two descriptors, then header splitting will become 
meaningless,
or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated at 
this time.


Thanks.



This seems very narrow and unecessarily wasteful of descriptors.
What is wrong in this:

.. 

seems to achieve the goal of data in a separate page without
using extra descriptors.

thus my proposal to replace the requirement of a separate
descriptor with an offset of data from beginning of
buffer that driver sets.



We have carefully considered your suggestion.

We refer to spec v7 and earlier as scheme A for short. Review scheme A
below:

| receive buffer |

| 0th descriptor | 1th descriptor |

| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |

We use a buffer plus a separate page when allocating the receive

buffer. In this way, we can ensure that all payloads can be

independently in a page, which is very beneficial for the zerocopy

implemented by the upper layer.

scheme A better solves the problem of headroom, tailroom and memory waste,
but as you said, this solution relies on descriptor chain.

Our rethinking approach is no longer based on or using descriptor chain.

We refer to your proposed offset-based scheme as scheme B:

As you suggested, scheme B gives the device a buffer, using offset to
indicate where to place the payload like this:

.. 

But how to apply for this buffer? Since we want the payload to be placed on
a separate page, the method we consider is to directly apply to the driver
for two pages of contiguous memory.

Then the beginning of this contiguous memory is used to store the headroom,
and the contiguous memory after the headroom is directly handed over to the
device. similar to the following:

<-- receive buffer(2 pages)
->

<<-- first page
--->< second page -->>

<>

Based on your previous suggestion, we also considered another new scheme C.

This scheme is implemented based on mergeable buffer, filling a separate
page each time.

If the split header is negotiated and the packet can be successfully split
by the device, the device needs to find at least two buffers, namely two
pages, one for the virtio-net header and transport header, and the other for
the data payload. Like the following:

| receive buffer1(page) | receive buffer2 (page) |

| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |

At the same time, if XDP is considered, then the device needs to add
headroom at the beginning of receive buffer1 when receiving packets, so that
the driver can process programs similar to XDP. In order to solve this
problem, can scheme C introduce an offset, which requires the device to
write data from the offset position to receive buffer1, like the following:

| receive buffer (page) | receive buffer (page) |

| <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|
payload |

And in fact, B and C both use an offset now, right?

B: offset is used to get the position to place the payload.
C: The offset is used to reserve some space for the device, which the driver can
 use as headroom.

 In order to make the payload page-aligned, we can only hand over the entire
 page to the device, so we cannot reserve some headroom in advance.


For C, it might be better to do some tweak since mergeable buffer doesn't
forbid using a descriptor chain as a single buffer.

So if it's a descriptor chain we got back the method A by placing the
payload in a dedicated buffer. If it's not placing the payload in an
adjacent buffer.

Thanks

Let's find a way so devices do not care how descriptors are laid out.


Hi, I'm not sure if you're replying to the wrong email (now in v7 thread),
if not, I'm guessing you mean we continue to discuss a way for devices
to not care how descriptors are laid out based on split header v7,
and no longer consider the new options B and C?


Thanks.




Then we simply compare the advantages and disadvantages of scheme A(spec
v7), scheme B (offset buffer(2 pages)) and scheme C (based on mergeable
buffer):

1. desc chain:

- A depends on desciptor chain; - B, C

[virtio-dev] Re: [PATCH v7] virtio_net: support split header

2022-09-05 Thread Heng Qi


在 2022/9/5 下午3:52, Xuan Zhuo 写道:

On Sun, 4 Sep 2022 16:31:59 -0400, "Michael S. Tsirkin"  wrote:

On Fri, Sep 02, 2022 at 04:58:16PM +0800, Heng Qi wrote:

When VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated,
the driver requires that the buffers submitted to receiveq
MUST be composed of at least two descriptors,
which means that each buffer the device gets is a descriptor chain,
even if the device does not split the header for some packets.

To store packet in the descriptor chain without header splitting
by the device, the device MUST start with the first descriptor of
the descriptor chain to store the packet, and MUST NOT set the
VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.

Thanks.

Descriptor chains will hurt performance badly.

I understand the reasons for the performance impact here are:
1. Two buffers are used
2. One buffer occupies two descs

This is the same as my understanding in the case of mergeable. We also need to
pack the packets into two buffers, and a packet will eventually occupy two
descs.



How about simply making this feature depend on mergeable buffers?
Then we have a separate buffer for the header and
this works cleanly.


Under mergeable, each buffer is independent, and the split header requires two
unequal descs.

If we implement it based on mergeable, then consider the scenario of tcp
zerocopy, when we fill receive vq, each buffer is an separate page, and if we 
use an
separate buffer to save the header, then this is a waste, we may
have to copy at the driver layer.

@Qi Do you think there will be other problems with this approach?

Thanks.


When we think about specs, we shouldn't be too distracted by the implementation.

But when we did think about this, suppose the driver fills by page based on
mergeable mode. in order to use the xdp program, the driver usually takes
the beginning of a single page as the headroom, and fills the rest of the page
into the virtqueue. Therefore, the empty buffer obtained by the
device is always smaller than a page when we implement split header
based on this mode, that is, the data load finally obtained by the driver
is offset from the beginning of the page. This does not enjoy the benefits of 
zero copy.

At the same time, since the header is always only more than 100 bytes,
the page occupied by the header is a waste of the buffer.

Thanks.








Re: [virtio-dev] Re: [PATCH] virtio_net: support split header

2022-08-04 Thread Heng Qi


在 2022/8/4 下午9:50, Cornelia Huck 写道:

On Thu, Aug 04 2022, Heng Qi  wrote:


在 2022/8/4 下午2:27, Jason Wang 写道:

On Mon, Aug 1, 2022 at 2:59 PM Heng Qi   wrote:

@@ -3820,9 +3826,13 @@ \subsubsection{Processing of Incoming 
Packets}\label{sec:Device Types / Network
   driver MUST NOT use the \field{csum_start} and \field{csum_offset}.

   If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
-been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
+been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}
+is not set, the driver MAY use \field{hdr_len} only as a hint about the
   transport header size.
-The driver MUST NOT rely on \field{hdr_len} to be correct.
+
+If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is not set, the 
driver
+MUST NOT rely on \field{hdr_len} to be correct.

I think we should keep the above description as-is. For whatever case,
the driver must not trust the metadata set by the device and must
perform necessary sanity tests on them.


My idea is to keep the current description as it is,
but to emphasize in the next version:
"If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set,
the driver MAY treat the \field{hdr_len} as the length of the
protocol header inside the first descriptor."


Just to be clear, you suggest using

"If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
transport header size.

The driver MUST NOT rely on \field{hdr_len} to be correct.

If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set,
the driver MAY treat the \field{hdr_len} as the length of the
protocol header inside the first descriptor."


Yes. I will use the above description to make it clearer in the next version.




(Maybe "...the driver MAY use \field{hdr_len} as a hint about the length
of the protocol header..."? It's still not reliable, right?)


\field{hdr_len} is unreliable when VIRTIO_NET_F_SPLIT_HEADER is not negotiated.


If VIRTIO_NET_F_SPLIT_HEADER is negotiated, "split header" MAY perform the split
from the IP layer, so the protocol header and the transport header are 
different.

so I think the "...the driver MAY use \field{hdr_len} only as a hint about the
transport header size..." paragraph can be left as-is.




Re: [virtio-dev] Re: [PATCH] virtio_net: support split header

2022-08-10 Thread Heng Qi


在 2022/8/9 下午5:18, Jason Wang 写道:

On Thu, Aug 4, 2022 at 8:48 PM Heng Qi  wrote:


在 2022/8/4 下午2:27, Jason Wang 写道:

On Mon, Aug 1, 2022 at 2:59 PM Heng Qi  wrote:

From: Xuan Zhuo

The purpose of this feature is to split the header and the payload of
the packet.

|receive buffer|
|   0th descriptor | 1th descriptor|
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo
Signed-off-by: Heng Qi
Reviewed-by: Kangjie Xu
---

v6:
 1. Fix some syntax issues. @Cornelia Huck
 2. Clarify some paragraphs. @Cornelia Huck
 3. Determine the device what to do if it does not perform header split on 
a packet.

v5:
 1. Determine when hdr_len is credible in the process of rx
 2. Clean up the use of buffers and descriptors
 3. Clarify the meaning of used lenght if the first descriptor is skipped 
in the case of merge

v4:
 1. fix typo @Cornelia Huck @Jason Wang
 2. do not split header for IP fragmentation packet. @Jason Wang

v3:
 1. Fix some syntax issues
 2. Fix some terminology issues
 3. It is not unified with ip alignment, so ip alignment is not included
 4. Make it clear that the device must support four types, in the case of
 successful negotiation.

  conformance.tex |   2 +
  content.tex | 111 ++--
  2 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..bd0f463 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Header}
  \end{itemize}

  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / 
Driver Conformance / Block Driver Conformance}
@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Header}
  \end{itemize}

  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / 
Device Conformance / Block Device Conformance}
diff --git a/content.tex b/content.tex
index e863709..74c36fe 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
  channel.

+\item[VIRTIO_NET_F_SPLIT_HEADER (52)] Device supports splitting the protocol
+header and the payload.
+
  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.

  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
  \end{description}

  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / 
Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / 
Network Device / Device O
  #define VIRTIO_NET_HDR_F_NEEDS_CSUM1
  #define VIRTIO_NET_HDR_F_DATA_VALID2
  #define VIRTIO_NET_HDR_F_RSC_INFO  4
+#define VIRTIO_NET_HDR_F_SPLIT_HEADER  8
  u8 flags;
  #define VIRTIO_NET_HDR_GSO_NONE0
  #define VIRTIO_NET_HDR_GSO_TCPV4   1
@@ -3799,9 +3804,10 @@ \subsubsection{Processing of Incoming 
Packets}\label{sec:Device Types / Network
  n

[virtio-dev] [PATCH v7] virtio_net: support split header

2022-08-16 Thread Heng Qi
From: Xuan Zhuo 

The purpose of this feature is to split the header and the payload of
the packet.

|receive buffer|
|   0th descriptor | 1th descriptor|
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo 
Signed-off-by: Heng Qi 
Reviewed-by: Kangjie Xu 
---
v7:
1. Fix some presentation issues.
2. Use "split transport header". @Jason Wang
3. Clarify some paragraphs. @Cornelia Huck
4. determine the device what to do if it does not perform header split 
on a packet.

v6:
1. Fix some syntax issues. @Cornelia Huck
2. Clarify some paragraphs. @Cornelia Huck
3. Determine the device what to do if it does not perform header split 
on a packet.

v5:
1. Determine when hdr_len is credible in the process of rx
2. Clean up the use of buffers and descriptors
3. Clarify the meaning of used lenght if the first descriptor is 
skipped in the case of merge

v4:
1. fix typo @Cornelia Huck @Jason Wang
2. do not split header for IP fragmentation packet. @Jason Wang

v3:
1. Fix some syntax issues
2. Fix some terminology issues
3. It is not unified with ip alignment, so ip alignment is not included
4. Make it clear that the device must support four types, in the case 
of successful negotiation.

 conformance.tex |   2 ++
 content.tex | 102 
 2 files changed, 104 insertions(+)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..4e2b82e 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
 \end{itemize}
 
 \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / 
Driver Conformance / Block Driver Conformance}
@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
 \end{itemize}
 
 \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / 
Device Conformance / Block Device Conformance}
diff --git a/content.tex b/content.tex
index e863709..5676da9 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
 \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
 channel.
 
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
+the transport header and the payload.
+
 \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
 
 \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
 \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
 \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
 \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
 \end{description}
 
 \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / 
Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / 
Network Device / Device O
 #define VIRTIO_NET_HDR_F_NEEDS_CSUM1
 #define VIRTIO_NET_HDR_F_DATA_VALID2
 #define VIRTIO_NET_HDR_F_RSC_INFO  4
+#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
 u8 flags;
 #define VIRTIO_NET_HDR_GSO_NONE0
 #define VIRTIO_N

Re: [virtio-dev] Re: [PATCH] virtio_net: support split header

2022-08-04 Thread Heng Qi


在 2022/8/4 下午2:27, Jason Wang 写道:

On Mon, Aug 1, 2022 at 2:59 PM Heng Qi  wrote:

From: Xuan Zhuo

The purpose of this feature is to split the header and the payload of
the packet.

|receive buffer|
|   0th descriptor | 1th descriptor|
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo
Signed-off-by: Heng Qi
Reviewed-by: Kangjie Xu
---

v6:
 1. Fix some syntax issues. @Cornelia Huck
 2. Clarify some paragraphs. @Cornelia Huck
 3. Determine the device what to do if it does not perform header split on 
a packet.

v5:
 1. Determine when hdr_len is credible in the process of rx
 2. Clean up the use of buffers and descriptors
 3. Clarify the meaning of used lenght if the first descriptor is skipped 
in the case of merge

v4:
 1. fix typo @Cornelia Huck @Jason Wang
 2. do not split header for IP fragmentation packet. @Jason Wang

v3:
 1. Fix some syntax issues
 2. Fix some terminology issues
 3. It is not unified with ip alignment, so ip alignment is not included
 4. Make it clear that the device must support four types, in the case of
 successful negotiation.

  conformance.tex |   2 +
  content.tex | 111 ++--
  2 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..bd0f463 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Header}
  \end{itemize}

  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / 
Driver Conformance / Block Driver Conformance}
@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Header}
  \end{itemize}

  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / 
Device Conformance / Block Device Conformance}
diff --git a/content.tex b/content.tex
index e863709..74c36fe 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
  channel.

+\item[VIRTIO_NET_F_SPLIT_HEADER (52)] Device supports splitting the protocol
+header and the payload.
+
  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.

  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
  \end{description}

  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / 
Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / 
Network Device / Device O
  #define VIRTIO_NET_HDR_F_NEEDS_CSUM1
  #define VIRTIO_NET_HDR_F_DATA_VALID2
  #define VIRTIO_NET_HDR_F_RSC_INFO  4
+#define VIRTIO_NET_HDR_F_SPLIT_HEADER  8
  u8 flags;
  #define VIRTIO_NET_HDR_GSO_NONE0
  #define VIRTIO_NET_HDR_GSO_TCPV4   1
@@ -3799,9 +3804,10 @@ \subsubsection{Processing of Incoming 
Packets}\label{sec:Device Types / Network
  not set VIRTIO_NET_HDR_F_RSC_INFO bit in \field{flags}.

  If one of the VIRTIO_NET_F_GUES

[virtio-dev] Re: [PATCH] virtio_net: support split header

2022-08-01 Thread Heng Qi
The content of the mail just sent is about "[PATCH v6] virtio_net: 
support split header".


在 2022/8/1 下午2:59, Heng Qi 写道:

From: Xuan Zhuo 

The purpose of this feature is to split the header and the payload of
the packet.

|receive buffer|
|   0th descriptor | 1th descriptor|
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo 
Signed-off-by: Heng Qi 
Reviewed-by: Kangjie Xu 
---

v6:
 1. Fix some syntax issues. @Cornelia Huck
 2. Clarify some paragraphs. @Cornelia Huck
 3. Determine the device what to do if it does not perform header split on 
a packet.

v5:
 1. Determine when hdr_len is credible in the process of rx
 2. Clean up the use of buffers and descriptors
 3. Clarify the meaning of used lenght if the first descriptor is skipped 
in the case of merge

v4:
 1. fix typo @Cornelia Huck @Jason Wang
 2. do not split header for IP fragmentation packet. @Jason Wang

v3:
 1. Fix some syntax issues
 2. Fix some terminology issues
 3. It is not unified with ip alignment, so ip alignment is not included
 4. Make it clear that the device must support four types, in the case of
 successful negotiation.

  conformance.tex |   2 +
  content.tex | 111 ++--
  2 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..bd0f463 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Header}
  \end{itemize}
  
  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}

@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Header}
  \end{itemize}
  
  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}

diff --git a/content.tex b/content.tex
index e863709..74c36fe 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
  channel.
  
+\item[VIRTIO_NET_F_SPLIT_HEADER (52)] Device supports splitting the protocol

+header and the payload.
+
  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
  
  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.

@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
  \end{description}
  
  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}

@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / 
Network Device / Device O
  #define VIRTIO_NET_HDR_F_NEEDS_CSUM1
  #define VIRTIO_NET_HDR_F_DATA_VALID2
  #define VIRTIO_NET_HDR_F_RSC_INFO  4
+#define VIRTIO_NET_HDR_F_SPLIT_HEADER  8
  u8 flags;
  #define VIRTIO_NET_HDR_GSO_NONE0
  #define VIRTIO_NET_HDR_GSO_TCPV4   1
@@ -3799,9 +3804,10 @@ \subsubsection{Processing of Incoming 
Packets}\label{sec:Device Types / Network
  not set VIRTIO_NET_HD

[virtio-dev] [PATCH] virtio_net: support split header

2022-08-01 Thread Heng Qi
From: Xuan Zhuo 

The purpose of this feature is to split the header and the payload of
the packet.

|receive buffer|
|   0th descriptor | 1th descriptor|
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo 
Signed-off-by: Heng Qi 
Reviewed-by: Kangjie Xu 
---

v6:
1. Fix some syntax issues. @Cornelia Huck
2. Clarify some paragraphs. @Cornelia Huck
3. Determine the device what to do if it does not perform header split on a 
packet.

v5:
1. Determine when hdr_len is credible in the process of rx
2. Clean up the use of buffers and descriptors
3. Clarify the meaning of used lenght if the first descriptor is skipped in 
the case of merge

v4:
1. fix typo @Cornelia Huck @Jason Wang
2. do not split header for IP fragmentation packet. @Jason Wang

v3:
1. Fix some syntax issues
2. Fix some terminology issues
3. It is not unified with ip alignment, so ip alignment is not included
4. Make it clear that the device must support four types, in the case of
successful negotiation.

 conformance.tex |   2 +
 content.tex | 111 ++--
 2 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..bd0f463 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Header}
 \end{itemize}
 
 \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / 
Driver Conformance / Block Driver Conformance}
@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Header}
 \end{itemize}
 
 \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / 
Device Conformance / Block Device Conformance}
diff --git a/content.tex b/content.tex
index e863709..74c36fe 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
 \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
 channel.
 
+\item[VIRTIO_NET_F_SPLIT_HEADER (52)] Device supports splitting the protocol
+header and the payload.
+
 \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
 
 \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
 \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
 \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
 \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
 \end{description}
 
 \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / 
Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / 
Network Device / Device O
 #define VIRTIO_NET_HDR_F_NEEDS_CSUM1
 #define VIRTIO_NET_HDR_F_DATA_VALID2
 #define VIRTIO_NET_HDR_F_RSC_INFO  4
+#define VIRTIO_NET_HDR_F_SPLIT_HEADER  8
 u8 flags;
 #define VIRTIO_NET_HDR_GSO_NONE0
 #define VIRTIO_NET_HDR_GSO_TCPV4   1
@@ -3799,9 +3804,10 @@ \subsubsection{Processing of Incoming 
Packets}\label{sec:Device Types / Network
 not set VIRTIO_NET_HDR_F_RSC_INFO bit in \field{flags}.
 
 If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
-been negotiated, the device SHOULD set \field{hdr_len} to a value

[virtio-dev] Re: [virtio-comment] Re: [PATCH v7] virtio-net: support inner header hash

2023-01-09 Thread Heng Qi




在 2023/1/9 下午7:39, Michael S. Tsirkin 写道:

Btw this "are defined below" all over the place is just contributing
to making the spec unnecesarily verbose. Simple "are:" will do.


Sure. I'll fix it in the next version.

Thanks.


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v7] virtio-net: support inner header hash

2023-01-09 Thread Heng Qi
On Tue, Jan 10, 2023 at 12:57:38AM -0500, Michael S. Tsirkin wrote:
> On Tue, Jan 10, 2023 at 12:25:02AM -0500, Michael S. Tsirkin wrote:
> > > This will give extra pressure on the management stack, e.g it requires
> > > the device to have an out of spec way for introspection.
> > > 
> > > Thanks
> > 
> > As I tried to explain this is already the case. Feature bits do not
> > describe device capabilities fully, some of them are in config space.
> 
> To be precise, this does not necessarily require introspection, but
> it does require management control over config space
> such as supported hash types just like it has control over feature bits.
> E.g. QEMU currently seems to hard-code these to
> #define VIRTIO_NET_RSS_SUPPORTED_HASHES (VIRTIO_NET_RSS_HASH_TYPE_IPv4 | \
>  VIRTIO_NET_RSS_HASH_TYPE_TCPv4 | \
>  VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | \
>  VIRTIO_NET_RSS_HASH_TYPE_IPv6 | \
>  VIRTIO_NET_RSS_HASH_TYPE_TCPv6 | \
>  VIRTIO_NET_RSS_HASH_TYPE_UDPv6 | \
>  VIRTIO_NET_RSS_HASH_TYPE_IP_EX | \
>  VIRTIO_NET_RSS_HASH_TYPE_TCP_EX | \
>  VIRTIO_NET_RSS_HASH_TYPE_UDP_EX)
> 
> but there's no reason not to give management control over these.

Yes, QEMU has requirements for live migration: the PCI config space will be
checked in get_pci_config_device(), and if src and dst are inconsistent, it
will prompt that the live migration failed. 
In fact, this is also done within our group. Live migration requires that
the two VMs have the same rss configuration, otherwise the migration will fail.

Therefore, it seems that we can regularize the description of 
VIRTIO_NET_F_HASH_TUNNEL into
"[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner header hash for 
tunnel-encapsulated packets.",
and use different hash_types to help the migration determine whether it can 
succeed.

Thanks.

> 
> -- 
> MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v7] virtio-net: support inner header hash

2023-01-10 Thread Heng Qi




在 2023/1/10 下午3:26, Heng Qi 写道:

On Tue, Jan 10, 2023 at 12:57:38AM -0500, Michael S. Tsirkin wrote:

On Tue, Jan 10, 2023 at 12:25:02AM -0500, Michael S. Tsirkin wrote:

This will give extra pressure on the management stack, e.g it requires
the device to have an out of spec way for introspection.

Thanks

As I tried to explain this is already the case. Feature bits do not
describe device capabilities fully, some of them are in config space.

To be precise, this does not necessarily require introspection, but
it does require management control over config space
such as supported hash types just like it has control over feature bits.
E.g. QEMU currently seems to hard-code these to
#define VIRTIO_NET_RSS_SUPPORTED_HASHES (VIRTIO_NET_RSS_HASH_TYPE_IPv4 | \
  VIRTIO_NET_RSS_HASH_TYPE_TCPv4 | \
  VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | \
  VIRTIO_NET_RSS_HASH_TYPE_IPv6 | \
  VIRTIO_NET_RSS_HASH_TYPE_TCPv6 | \
  VIRTIO_NET_RSS_HASH_TYPE_UDPv6 | \
  VIRTIO_NET_RSS_HASH_TYPE_IP_EX | \
  VIRTIO_NET_RSS_HASH_TYPE_TCP_EX | \
  VIRTIO_NET_RSS_HASH_TYPE_UDP_EX)

but there's no reason not to give management control over these.

Yes, QEMU has requirements for live migration: the PCI config space will be
checked in get_pci_config_device(), and if src and dst are inconsistent, it
will prompt that the live migration failed.


To be clearer, I mean \filed{supported_hash_types} in structure 
virtio_net_config.


Thanks.


In fact, this is also done within our group. Live migration requires that
the two VMs have the same rss configuration, otherwise the migration will fail.

Therefore, it seems that we can regularize the description of 
VIRTIO_NET_F_HASH_TUNNEL into
"[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner header hash for 
tunnel-encapsulated packets.",
and use different hash_types to help the migration determine whether it can 
succeed.

Thanks.


--
MST

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscr...@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org
List help: virtio-comment-h...@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v3] virtio_net: support inner header hash

2022-12-18 Thread Heng Qi




在 2022/12/16 下午8:49, Michael S. Tsirkin 写道:

On Mon, Dec 05, 2022 at 02:36:39PM +0800, Heng Qi wrote:

@@ -4005,6 +4159,24 @@ \subsubsection{Processing of Incoming 
Packets}\label{sec:Device Types / Network
  #define VIRTIO_NET_HASH_REPORT_UDPv6_EX9
  \end{lstlisting}
  
+If \field{hash_report} differs from VIRTIO_NET_HASH_REPORT_NONE,

+\field{hash_report_tunnel} can report the type of the tunnel-encapsulated
+packet to the driver over the inner header hash calculation.
+Possible values that the device can report in \field{hash_report_tunnel}
+are defined below:
+
+\begin{lstlisting}
+#define VIRTIO_NET_HASH_REPORT_GRE 1
+#define VIRTIO_NET_HASH_REPORT_VXLAN   2
+#define VIRTIO_NET_HASH_REPORT_GENEVE  3
+\end{lstlisting}
+
+The values VIRTIO_NET_HASH_REPORT_GRE, VIRTIO_NET_HASH_REPORT_VXLAN and
+VIRTIO_NET_HASH_REPORT_GENEVE correspond to VIRTIO_NET_HASH_TYPE_GRE_INNER,
+VIRTIO_NET_HASH_TYPE_VXLAN_INNER and VIRTIO_NET_HASH_TYPE_GENEVE_INNER bits
+of supported hash types defined in respectively
+\ref{sec:Device Types / Network Device / Device Operation / Processing of 
Incoming Packets / Hash calculation for incoming packets / Supported/enabled 
hash types}.
+
  \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / 
Device Operation / Control Virtqueue}
  
  The driver uses the control virtqueue (if VIRTIO_NET_F_CTRL_VQ is

If the new feature flag is negotiated, we need to spell out more clearly what 
are the rules
for packets that are not encapsulated. Is hash_report_tunnel set to 0


Yes, we should. When the _TUNNEL feature is negotiated, for 
non-encapsulated packets we set \field{hash_report_tunnel} to 0.



then? Another comment is that we keep repeating GRE/VXLAN/GENEVE too
many times. Let's add a paragraph defining a concept e.g. a "tunnel" or
"tunneled packets", explaining how they are handled at a high level,
and then just refer to the tunnel everywhere.
Let's also add external references to specifications documenting the
relevant tunnel types.


Ok, I'll try to make this clear.

Thanks.





--
2.19.1.6.gb485710b



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] [PATCH] virtio_net: support low and high rate of notification coalescing sets

2022-12-21 Thread Heng Qi




在 2022/12/21 下午7:48, Alvaro Karsz 写道:

Hi,


I want to know which one is better than NetDim(Coalesce Adaptive) in driver.

I know Heng Qi's work in this.

Thanks


Why choose? we can have both.
ethtool can handle both pkt_rate_low/pkt_rate_high and
use_adaptive_rx_coalesce/use_adaptive_tx_coalesce.

The adaptive algorithm can even use this feature to set low and high
coalescing sets.


Hi, all.


NetDIM is currently a mature library in the kernel. It uses the number 
of bytes, PPS and interrupt rate


as samples to make an action, and it is performed independently in tx or 
rx direction. Also, There will


be an extra worker to help us send the configuration based on the 
control queue to avoid interrupting
the softirq. Although the method of pkt_rate_{low, high} params does not 
conflict with dim, I seem


to have some doubts that the colasce parameters of the rx and tx 
directions are determined at the


same time based on pkt_rate alone, will this be a problem?


Thanks.



Alvaro



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] RE: [virtio-dev] [PATCH v7] virtio-net: support inner header hash

2023-01-30 Thread Heng Qi
On Wed, Jan 18, 2023 at 11:45:39PM +, Parav Pandit wrote:
> 
> 
> > From: virtio-dev@lists.oasis-open.org  
> > Sent: Wednesday, January 4, 2023 2:14 AM
> 
> > If the tunnel is used to encapsulate the packets, the hash calculated using 
> > the
> > outer header of the receive packets is always fixed for the same flow 
> > packets,
> > i.e. they will be steered to the same receive queue.
> 
> > +\item[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner
> > +header hash for GRE, VXLAN and GENEVE tunnel-encapsulated packets.
> > +
> A device may not support all 3 at the same time.
> Please remove mentioning tunneling protocols description from here.
> Just say device support inner header hash ...

Sorry for the late reply due to vacation.

Good idea, Michael suggested doing the same. But we also discussed this issue:
Early, we used a feature bit to force devices to support GRE and VXLAN (see
https://lists.oasis-open.org/archives/virtio-dev/202211/msg00183.html ).
Later Jason suggested to use VIRTIO_NET_F_HASH_TUNNEL instead of 
VIRTIO_NET_F_HASH_GRE_VXLAN_GENEVE_INNER (see
https://lists.oasis-open.org/archives/virtio-dev/202212/msg00014.html ). Now 
Michael proposes to remove this list (see
https://lists.oasis-open.org/archives/virtio-dev/202301/msg00079.html ), 
because the migration is based on the feature
bit and hash types to determine whether the live migration is successful.
> 
> An additional bit map somewhere else should say supported hash over different 
> tunneling types.
> 

Yes, we use \field{supported_hash_types} to declare supported hash types.

> [...]
> 
> > +The device calculates the hash on the inner IPv4 packet of an
> > +encapsulated packet according to 'Enabled hash types' bitmask as follows:
> > +\begin{itemize}
> > +  \item If VIRTIO_NET_HASH_TYPE_TCPv4 is set and the encapsulated packet
> > has an inner
> > +   TCPv4 header, the hash is calculated over the following fields:
> > +\begin{itemsize}
> > +  \item inner Source IP address
> > +  \item inner Destination IP address
> > +  \item inner Source TCP port
> > +  \item inner Destination TCP port
> > +\end{itemsize}
> > +  \item Else if VIRTIO_NET_HASH_TYPE_UDPv4 is set and the encapsulated
> > packet has an
> > +   inner UDPv4 header, the hash is calculated over the following fields:
> > +\begin{itemsize}
> > +  \item inner Source IP address
> > +  \item inner Destination IP address
> > +  \item inner Source UDP port
> > +  \item inner Destination UDP port
> > +\end{itemize}
> > +  \item Else if VIRTIO_NET_HASH_TYPE_IPv4 is set, the hash is calculated 
> > over
> > the
> > +   following fields:
> > +\begin{itemsize}
> > +  \item inner Source IP address
> > +  \item inner Destination IP address
> > +\end{itemsize}
> > +  \item Else the device does not calculate the hash \end{itemize}
> > +
> > +The device calculates the hash on the inner IPv6 packet without an
> > +extension header of an encapsulated packet according to 'Enabled hash 
> > types'
> > bitmask as follows:
> > +\begin{itemize}
> > +  \item If VIRTIO_NET_HASH_TYPE_TCPv6 is set and the encapsulated packet
> > has an inner
> > +   TCPv6 header, the hash is calculated over the following fields:
> > +\begin{itemsize}
> > +  \item inner Source IPv6 address
> > +  \item inner Destination IPv6 address
> > +  \item inner Source TCP port
> > +  \item inner Destination TCP port
> > +\end{itemsize}
> > +  \item Else if VIRTIO_NET_HASH_TYPE_UDPv6 is set and the encapsulated
> > packet has an
> > +   inner UDPv6 header, the hash is calculated over the following fields:
> > +\begin{itemsize}
> > +  \item inner Source IPv6 address
> > +  \item inner Destination IPv6 address
> > +  \item inner Source UDP port
> > +  \item inner Destination UDP port
> > +\end{itemize}
> > +  \item Else if VIRTIO_NET_HASH_TYPE_IPv6 is set, the hash is calculated 
> > over
> > the
> > +   following fields:
> > +\begin{itemsize}
> > +  \item inner Source IPv6 address
> > +  \item inner Destination IPv6 address
> > +\end{itemsize}
> > +  \item Else the device does not calculate the hash \end{itemize}
> > +
> > +The device calculates the hash on the inner IPv6 packet with an
> > +extension header of an encapsulated packet according to 'Enabled hash 
> > types'
> > bitmask as follows:
> > +\begin{itemsize}
> > +  \item If VIRTIO_NET_HASH_TYPE_TCP_EX is set and the encapsulated packet
> > has an inner
> > +   TCPv6 header, the hash is calculated over the following fields:
> > +\begin{itemize}
> > +  \item Home address from the home address option in the inner IPv6
> > destination
> > +  options header. If the inner extension header is not 
> > present, use the
> > +  inner Source IPv6 address.
> > +  \item 

[virtio-dev] Re: [PATCH 1/2] virtio_net: fix syntax errors

2022-11-09 Thread Heng Qi




在 2022/11/9 下午9:23, Michael S. Tsirkin 写道:

On Wed, Nov 09, 2022 at 07:35:14PM +0800, Heng Qi wrote:

Please ignore this email.

Thanks.

ok. if 2/2 is still relevant pls post it separately.


I sent a new patch set with a cover-letter, please see in 
https://lists.oasis-open.org/archives/virtio-dev/202211/msg00025.html.





-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] RE: [virtio-dev] [PATCH v7] virtio-net: support inner header hash

2023-02-01 Thread Heng Qi




在 2023/2/2 上午11:55, Parav Pandit 写道:

From: virtio-comm...@lists.oasis-open.org  On Behalf Of Michael S. Tsirkin
Sent: Wednesday, February 1, 2023 1:57 AM

Also, this patch is adding two functionalities.
1. Inner header hash calculation of existing already defined hash
types 2. outer header hash for new type for GRE,VXLAN,GENEVE.
#1 should be in 1st patch.
#2 should be in 2nd patch.
This is better to review.

Parav, you come to this discussion pretty late. Asking to split up the patch 
when
it's v1/v2 is ok. Asking after others have already reviewed v6 is not you are
making review easier for yourself but re-review harder for others who already
have a mind map of the patch.

In this case unless we really want to enable these separately (and frankly I 
don't
see a good reason to) then splitting it up makes review more confusing.


No. There is no need to enable it separately.
It was hard to parse new inner type decoding addition which has close to zero 
relation to outer headers.

As you say, it has some history, I don't have strong opinion to split.
But going forward in subsequent work, it is better to see logical changes in 
multiple patches.


It seems that we don't need to emphasize the outer header hash, which is the
same behavior as usual when the inner header hash is not raised.

Thanks.



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] [PATCH v10] virtio-net: support the virtqueue coalescing moderation

2023-03-08 Thread Heng Qi




在 2023/3/7 上午6:57, Michael S. Tsirkin 写道:

On Thu, Mar 02, 2023 at 11:36:18AM +, David Edmondson wrote:

+for an enabled transmit/receive 
virtqueue whose number is \field{vqn}.

Should this now be "whose index is \field{vqn}"?

Ugh.  I guess we'll have to fix the number/index mess in the spec
first. Parav you said you are looking into it?




# Where virtqueue number and virtqueue index are used.
  1. In the Virtqueue Configuration Section, use the virtqueue index: 
"Write the **virtqueue index** (first queue is 0) to queue_select."

  2. Both descriptions are used separately in the Notification Section.
  2.1 Here vqn is called virtqueue index:
    "When VIRTIO_F_NOTIFICATION_DATA has not been negotiated, 
the driver sends an available buffer notification
  to the device by writing the **16-bit virtqueue index** 
of this virtqueue to the Queue Notify address.

  ...

  le32 {
 vqn: 16;
 next_off : 15;
 next_wrap: 1;
 };
 ...

 If VIRTIO_F_NOTIFICATION_DATA has not been negotiated, the 
driver MUST use the queue_notify_data value instead of the **virtqueue 
index**."


  2.2 Here vqn is called virtqueue number:
    "When VIRTIO_F_NOTIFICATION_DATA has not been negotiated, 
the Notification data contains the **Virtqueue number**.

    ...

    be32 {
    vqn: 16;
    next_off : 15;
    next_wrap: 1;
 };
    ...

    When VIRTIO_F_NOTIFICATION_DATA has not been negotiated, 
this notification involves sending the **virtqueue number** to the 
device (method depending on the transport).

    ...

    vqn -- **VQ number** to be notified."

# 0-based index and 0-based number are used respectively in the RSS Section:
1. "Field unclassified_queue contains the **0-based index** of the 
receive virtqueue to place unclassified packets in. Index 0 corresponds 
to receiveq1."
2. "use the result as the index in the indirection table to get 
**0-based number** of destination receiveq (value of 0 corresponds to 
receiveq1)."




\field{vqn} has been called '0-based virtqueue index' or '0-based 
virtqueue number',

I think both seem to be friendly to readers, so what are your options?

Thanks.






-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-08 Thread Heng Qi




在 2023/3/8 下午10:39, Michael S. Tsirkin 写道:

On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:


在 2023/2/28 下午7:16, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?

Yes, you are right. That's what we did before the inner header hash, but it
has a performance penalty, which I'll explain below.


For example geneve spec says:

 it is necessary for entropy from encapsulated packets to be
 exposed in the tunnel header.  The most common technique for this is
 to use the UDP source port

The end point of the tunnel called the gateway (with DPDK on top of it).

1. When there is no inner header hash, entropy can be inserted into the udp
src port of the outer header of the tunnel,
and then the tunnel packet is handed over to the host. The host needs to
take out a part of the CPUs to parse the outer headers (but not drop them)
to calculate the inner hash for the inner payloads,
and then use the inner
hash to forward them to another part of the CPUs that are responsible for
processing.

I don't get this part. Leave inner hashes to the guest inside the
tunnel, why is your host doing this?


Assuming that the same flow includes a unidirectional flow a->b, or a 
bidirectional flow a->b and b->a,

such flow may be out of order when processed by the gateway(DPDK):

1. In unidirectional mode, if the same flow is switched to another 
gateway for some reason, resulting in different outer IP address,
    then this flow may be processed by different CPUs after reaching 
the host if there is no inner hash. So after the host receives the
    flow, first use the forwarding CPUs to parse the inner hash, and 
then use the hash to ensure that the flow is processed by the

    same CPU.
2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow 
may go to gateway 2. In order to ensure that the same flow is
    processed by the same CPU, we still need the forwarding CPUs to 
parse the real inner hash(here, the hash key needs to be replaced with a 
symmetric hash key).





1). During this process, the CPUs on the host is divided into two parts, one
part is used as a forwarding node to parse the outer header,
  and the CPU utilization is low. Another part handles packets.

Some overhead is clearly involved in *sending* packets -
to calculate the hash and stick it in the port number.
This is, however, a separate problem and if you want to
solve it then my suggestion would be to teach the *transmit*
side about GRE offloads, so it can fill the source port in the card.


2). The entropy of the source udp src port is not enough, that is, the queue
is not widely distributed.

how isn't it enough? 16 bit is enough to cover all vqs ...


A 5-tuple brings more entropy than a single port, doesn't it? In fact, 
the inner hash of the physical network card used by
the business team is indeed better than the udp port number of the outer 
header we modify now, but they did not give me the data.



2. When there is an inner header hash, the gateway will directly help parse
the outer header, and use the inner 5 tuples to calculate the inner hash.
The tunneled packet is then handed over to the host.
1) All the CPUs of the host are used to process data packets, and there is
no need to use some CPUs to forward and parse the outer header.

You really have to parse the outer header anyway,
otherwise there's no tunneling.
Unless you want to teach virtio to implement tunneling
in hardware, which is something I'd find it easier to
get behind.


There is no need to parse the outer header twice, because we use shared 
memory.



2) The entropy of the original quintuple is sufficient, and the queue is
widely distributed.

It's exactly the same entropy, why would it be better? In fact you
are taking out the outer hash entropy making things worse.


I don't get the point, why the entropy of the inner 5-tuple and the 
outer tunnel header is the same,

multiple streams have the same outer header.

Thanks.



Thanks.

same goes for vxlan did not check further.

so what is the problem?  and which tunnel types actually suffer from the
problem?



This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscr...@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org
List help: virtio-comment-h...@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www

[virtio-dev] Re: [virtio-comment] RE: [virtio-dev] RE: [PATCH v10] virtio-net: support the virtqueue coalescing moderation

2023-03-08 Thread Heng Qi




在 2023/3/9 上午6:30, Parav Pandit 写道:

From: virtio-dev@lists.oasis-open.org  On
Behalf Of Heng Qi
Sent: Thursday, March 2, 2023 10:27 PM


I remember we discussed that instead of mentioning each individual field,

better to describe the whole structure being read-only or write-only.

Consider the following scenarios:
1. A read-only field of the structure virtio_net_ctrl_coal is extended for
CTRL_NOTF_COAL_RX/TX_SET to get some extra information

A set command cannot extend the struct virtio_net_ctrl_coal, particularly for 
read-only and partially for write-only.
This would mean that for the tiny number of bytes, an additional dma descriptor 
is to be allocated with read/write-only permissions.
It would be inefficient for the driver to do so for the SET command to have vqn 
as write-only, reserved as read-only, rest fields as write-only dma attributes.

As I think more, I think the whole set command fields should be read-only for 
device. Better to describe it this way instead of splitting individual fields.
This way driver can just do a single DMA allocation with read-only attributes 
for set cmd.

Get command doesn’t have any choice today the way CVQ is structured to it lives 
with the limitation.


I think it is reasonable and will be revised in the next version.




Looks good, however you have well covered in the device normative

statements.

So possibly it can be removed from here.

I tend to keep this, as we have done in the past, we can have normative
descriptions and the corresponding non-normative descriptions.


Ok. but please revisit if the description can be simpler text than the 
normative lines.


Ok.

Thanks.



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-16 Thread Heng Qi
On Wed, Mar 15, 2023 at 10:57:40AM -0400, Michael S. Tsirkin wrote:
> On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
> > 
> > 
> > 在 2023/3/15 下午7:58, Michael S. Tsirkin 写道:
> > > On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:
> > > > 
> > > > 
> > > > 在 2023/3/10 上午3:36, Michael S. Tsirkin 写道:
> > > > > On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:
> > > > > > 在 2023/3/8 下午10:39, Michael S. Tsirkin 写道:
> > > > > > > On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
> > > > > > > > 在 2023/2/28 下午7:16, Michael S. Tsirkin 写道:
> > > > > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > > > > If the tunnel is used to encapsulate the packets, the hash 
> > > > > > > > > > calculated
> > > > > > > > > > using the outer header of the receive packets is always 
> > > > > > > > > > fixed for the
> > > > > > > > > > same flow packets, i.e. they will be steered to the same 
> > > > > > > > > > receive queue.
> > > > > > > > > Wait a second. How is this true? Does not everyone stick the
> > > > > > > > > inner header hash in the outer source port to solve this?
> > > > > > > > Yes, you are right. That's what we did before the inner header 
> > > > > > > > hash, but it
> > > > > > > > has a performance penalty, which I'll explain below.
> > > > > > > > 
> > > > > > > > > For example geneve spec says:
> > > > > > > > > 
> > > > > > > > >it is necessary for entropy from encapsulated packets 
> > > > > > > > > to be
> > > > > > > > >exposed in the tunnel header.  The most common 
> > > > > > > > > technique for this is
> > > > > > > > >to use the UDP source port
> > > > > > > > The end point of the tunnel called the gateway (with DPDK on 
> > > > > > > > top of it).
> > > > > > > > 
> > > > > > > > 1. When there is no inner header hash, entropy can be inserted 
> > > > > > > > into the udp
> > > > > > > > src port of the outer header of the tunnel,
> > > > > > > > and then the tunnel packet is handed over to the host. The host 
> > > > > > > > needs to
> > > > > > > > take out a part of the CPUs to parse the outer headers (but not 
> > > > > > > > drop them)
> > > > > > > > to calculate the inner hash for the inner payloads,
> > > > > > > > and then use the inner
> > > > > > > > hash to forward them to another part of the CPUs that are 
> > > > > > > > responsible for
> > > > > > > > processing.
> > > > > > > I don't get this part. Leave inner hashes to the guest inside the
> > > > > > > tunnel, why is your host doing this?
> > > > 
> > > > Let's simplify some details and take a fresh look at two different
> > > > scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).
> > > > 
> > > > 1. In Scenario1, we can improve the processing performance of the same 
> > > > flow
> > > > by implementing inner symmetric hashing.
> > > > 
> > > > This is because even though client1 and client2 communicate 
> > > > bidirectionally
> > > > through the same flow, their data may pass
> > > > 
> > > > through and be encapsulated by different tunnels, resulting in the same 
> > > > flow
> > > > being hashed to different queues and processed by different CPUs.
> > > > 
> > > > To ensure consistency and optimized processing, we need to parse out the
> > > > inner header and compute a symmetric hash on it using a special rss key.
> > > > 
> > > > Sorry for not mentioning the inner symmetric hash before, in order to
> > > > prevent the introduction of more concepts, but it is indeed a kind of 
> > > > inner
> > > > hash.
> > > If parts of a flow go through different tunnels wo

[virtio-dev] Re: [PATCH v10] virtio-net: support inner header hash

2023-03-16 Thread Heng Qi




在 2023/3/15 下午11:09, Michael S. Tsirkin 写道:

On Wed, Mar 15, 2023 at 09:19:43PM +0800, Heng Qi wrote:

Any encapsulation technology that includes UDP/L4 header likely do not
prefer based on the inner header. This is because the outer header src
port entropy is added based on the inner header.

I was not able to follow the discussion in v9 that you had with Michael.
Did you conclude if this is needed for vxlan too?

If not, for now it may be better to skip vxlan and nvegre as they
inherently have unique outer header UDP src port based on the inner
header.

Symmetric hashing ignores the order of the five-tuples when calculating the
hash, that is, using (a1,a2,p1,p2) and (a2,a1,p2,p1) respectively can
calculate the same hash.
There is a scenario that the two directions client1->client2 and
client2->client1 of the same flow may pass through different tunnels.
In order to allow the data in two directions to be processed by the same
CPU, we need to calculate a symmetric hash based on the inner packet header.
Sorry I didn't mention this earlier just to avoid introducing the concept of
symmetric hashing.

But the hash is already there in the port. Is it then maybe just
the question of ignoring the IP addresses when hashing?


We do not ignore the IP address, because after the tunnel sends the 
packets to the processing host,
the processing host will parse the outer headers, and then use the inner 
symmetric hash to hand over the
packets of the same flow to the same cpu for processing (for the network 
topology, please check my latest reply thread).


Thanks.



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] RE: [virtio-comment] RE: [PATCH v13] virtio-net: support the virtqueue coalescing moderation

2023-03-22 Thread Heng Qi




在 2023/3/23 上午1:02, Parav Pandit 写道:

From: Michael S. Tsirkin 
Sent: Wednesday, March 22, 2023 12:53 PM

On Wed, Mar 22, 2023 at 04:49:58PM +, Parav Pandit wrote:

From: Michael S. Tsirkin 
Sent: Wednesday, March 22, 2023 12:47 PM

I agree with Cornelia here. Yes if devices do not want to trust
drivers then they will validate input but what exactly happens then is

currently up to device.

If we want to try and specify devices in all cases of out of spec
input that's a big project, certainly doable but I would rather not
connect it to this, rather boutique, feature.

Both of your and Cornelia's comment is abstract to me.
We cannot change past.

But we can make sure things are consistent. Currently we don't describe device
behaviour if driver is out of spec and I see 0 reasons to start doing it with
coalescing commands specifically.


For the new command of interest here, hen driver supplied incorrect values,

the device will return error.

It might be easier for device to just set NEEDS_RESET and stop responding.

This approach of treating all errors as a fatal category is completely the 
opposite of making the device and driver resilient to (recoverable) errors.
We shouldn't go this route.
Different discussion...


For
a hypervisor implementation that's often better than returning error since
device state is then preserved making things easier to debug.


How to implement is upto the device to figure out.


what to do is also up to the device.

Previously error code as not returned hence new command cannot return the error 
code is going backward.

Returning the failure code is a way to indicate that the driver had a 
recoverable error.


I agree with you. Part of the specification [1] covered something we're 
talking about, e.g. if an untrusted driver sends a disabled vq, the 
device returns an error:


[1] +The device MUST respond to VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and 
VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET commands with VIRTIO_NET_ERR if the 
designated virtqueue is disabled.


Maybe we should modify [1] to:

"The device MUST respond to VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and 
VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET commands with VIRTIO_NET_ERR if the 
designated \field{vqn} is not the virtqueue number of an enabled 
transmit or receive virtqueue."



Thanks!





-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash

2023-03-22 Thread Heng Qi




在 2023/3/23 上午11:58, Heng Qi 写道:



在 2023/3/23 上午11:13, Parav Pandit 写道:

From: Michael S. Tsirkin 
Sent: Wednesday, March 22, 2023 12:42 PM
Yes. But my point is this. Some flows can be IPv4 others IPv6.
Do you see a way to have a key that will result in a symmetrical 
hash for both

IPv4 and IPv6? Can you give an example please?


Heng,

Is that the requirement to have two completely different flows (ipv, 
ipv6) to steer to a single RQ?


Michael should be talking about whether there is a symmetric key that 
can serve both IPv4 and IPv6, so that they can respectively achieve 
the purpose of symmetric hashing.
I am not an expert in hashing, but this article [1] deduces a 
symmetric hash key, and I think it should be possible to deduce a 
specific key to meet this requirement.


Sorry for the lost link:
[1] https://www.ndsl.kaist.edu/~kyoungsoo/papers/TR-symRSS.pdf



Or we should support XOR hashing.

Thanks.

The requirement, what I understood, is between two directions for a 
flow to result in a single hash value.



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash

2023-03-22 Thread Heng Qi




在 2023/3/23 上午11:13, Parav Pandit 写道:

From: Michael S. Tsirkin 
Sent: Wednesday, March 22, 2023 12:42 PM
  

Yes. But my point is this. Some flows can be IPv4 others IPv6.
Do you see a way to have a key that will result in a symmetrical hash for both
IPv4 and IPv6? Can you give an example please?


Heng,

Is that the requirement to have two completely different flows (ipv, ipv6) to 
steer to a single RQ?


Michael should be talking about whether there is a symmetric key that 
can serve both IPv4 and IPv6, so that they can respectively achieve the 
purpose of symmetric hashing.
I am not an expert in hashing, but this article [1] deduces a symmetric 
hash key, and I think it should be possible to deduce a specific key to 
meet this requirement.


Or we should support XOR hashing.

Thanks.


The requirement, what I understood, is between two directions for a flow to 
result in a single hash value.



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-10 Thread Heng Qi





在 2023/3/10 上午3:36, Michael S. Tsirkin 写道:

On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:


在 2023/3/8 下午10:39, Michael S. Tsirkin 写道:

On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:

在 2023/2/28 下午7:16, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?

Yes, you are right. That's what we did before the inner header hash, but it
has a performance penalty, which I'll explain below.


For example geneve spec says:

  it is necessary for entropy from encapsulated packets to be
  exposed in the tunnel header.  The most common technique for this is
  to use the UDP source port

The end point of the tunnel called the gateway (with DPDK on top of it).

1. When there is no inner header hash, entropy can be inserted into the udp
src port of the outer header of the tunnel,
and then the tunnel packet is handed over to the host. The host needs to
take out a part of the CPUs to parse the outer headers (but not drop them)
to calculate the inner hash for the inner payloads,
and then use the inner
hash to forward them to another part of the CPUs that are responsible for
processing.

I don't get this part. Leave inner hashes to the guest inside the
tunnel, why is your host doing this?



Let's simplify some details and take a fresh look at two different 
scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).


1. In Scenario1, we can improve the processing performance of the same 
flow by implementing inner symmetric hashing.


This is because even though client1 and client2 communicate 
bidirectionally through the same flow, their data may pass


through and be encapsulated by different tunnels, resulting in the same 
flow being hashed to different queues and processed by different CPUs.


To ensure consistency and optimized processing, we need to parse out the 
inner header and compute a symmetric hash on it using a special rss key.


Sorry for not mentioning the inner symmetric hash before, in order to 
prevent the introduction of more concepts, but it is indeed a kind of 
inner hash.


2. In Scenario2 with GRE, the lack of outer transport headers means that 
flows between multiple communication pairs encapsulated by the same tunnel


will all be hashed to the same queue. To address this, we need to 
implement inner hashing to improve the performance of RSS. By parsing 
and calculating


the inner hash, different flows can be hashed to different queues.

Thanks.




Assuming that the same flow includes a unidirectional flow a->b, or a
bidirectional flow a->b and b->a,
such flow may be out of order when processed by the gateway(DPDK):

1. In unidirectional mode, if the same flow is switched to another gateway
for some reason, resulting in different outer IP address,
     then this flow may be processed by different CPUs after reaching the
host if there is no inner hash. So after the host receives the
     flow, first use the forwarding CPUs to parse the inner hash, and then
use the hash to ensure that the flow is processed by the
     same CPU.
2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may
go to gateway 2. In order to ensure that the same flow is
     processed by the same CPU, we still need the forwarding CPUs to parse
the real inner hash(here, the hash key needs to be replaced with a symmetric
hash key).

Oh intersting. What are those gateways, how come there's expectation
that you can change their addresses and topology
completely seamlessly without any reordering whatsoever?
Isn't network topology change kind of guaranteed to change ordering
sometimes?



1). During this process, the CPUs on the host is divided into two parts, one
part is used as a forwarding node to parse the outer header,
   and the CPU utilization is low. Another part handles packets.

Some overhead is clearly involved in *sending* packets -
to calculate the hash and stick it in the port number.
This is, however, a separate problem and if you want to
solve it then my suggestion would be to teach the *transmit*
side about GRE offloads, so it can fill the source port in the card.


2). The entropy of the source udp src port is not enough, that is, the queue
is not widely distributed.

how isn't it enough? 16 bit is enough to cover all vqs ...

A 5-tuple brings more entropy than a single port, doesn't it?

But you don't need more for RSS, the indirection table is not
that large.


In fact, the
inner hash of the physical network card used by
the business team is indeed better than the udp port number of the outer
header we modify now, but they did not give me the data.

Admittedly, out hash value is 32

[virtio-dev] Re: [virtio-comment] Re: [PATCH v10] virtio-net: support inner header hash

2023-03-16 Thread Heng Qi
On Wed, Mar 15, 2023 at 07:06:53PM -0400, Parav Pandit wrote:
> 
> 
> On 3/15/2023 9:19 AM, Heng Qi wrote:
> >
> >
> >在 2023/3/15 上午11:23, Parav Pandit 写道:
> >>
> >>
> >>On 3/6/2023 10:48 AM, Heng Qi wrote:
> >>
> [..]
> >>>  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> >>>+\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ.
> >>I think this should also say that HASH_TUNNEL requires either of
> >>the F_RSS or F_HASH_REPORT.
> >>Because without it HASH_TUNNEL is not useful.
> >
> >F_HASH_TUNNEL indicates that the hash should be calculated using
> >the inner packet header, even without F_RSS or F_HASH_REPORT,
> >we can continue to use the hash value in scenarios such as RPS or
> >ebpf programs.
> Yes.
> Even for rps or ebpf programs, F_HASH_TUNNEL is fine.
> When such feature arrives in future, it above line will have OR
> condition for RPS feature bit.
> 
> >
> >
> >I think it's fine to let F_HASH_TUNNEL rely on F_RSS or
> >_F_HASH_REPORT as those are probably important scenarios where
> >inner packet header hash is used.
> Yes.
> 
> >>If not, for now it may be better to skip vxlan and nvegre as
> >>they inherently have unique outer header UDP src port based on
> >>the inner header.
> >
> >Symmetric hashing ignores the order of the five-tuples when
> >calculating the hash, that is, using (a1,a2,p1,p2) and
> >(a2,a1,p2,p1) respectively can calculate the same hash.
> >There is a scenario that the two directions client1->client2 and
> >client2->client1 of the same flow may pass through different
> >tunnels.
> >In order to allow the data in two directions to be processed by
> >the same CPU, we need to calculate a symmetric hash based on the
> >inner packet header.
> I am lost in two directions and two clients above.
> When you say two directions, do you mean tx and rx?
> and do symmetric hashing between tx and rx between two end points
> within single tunnel?
>

A rough description of this scene is as follows:
Client1 sends packets to client2, and client2 sends packets to client1 
respectively.
This is called two directions, and they are in the same flow.
The packets in the two directions may reach the processing host through
different tunnels. In this scenario, we need to hash these packets to the
same queue for the same cpu to process, so inner symmetric hashing is required.


client1   client2
| |
|  __ |
+->| tunnels |<---+
   |-|
  |  |
  |  |
  |  |
  v  v
+-+
| processing host |
+-+


> >>>+\field{hash_tunnel_types} is set to
> >>>VIRTIO_NET_HASH_TUNNEL_TYPE_NONE by the device for the
> >>>+unencapsulated packets.
> >>>+
> I missed this before. unencapsulated is not a term.
> s/unencapsulated packets/Non encapsulated packets or non tunneled packets
> 

Yes, you are right!

Thanks.

> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
> 
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
> 
> Subscribe: virtio-comment-subscr...@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org
> List help: virtio-comment-h...@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v10] virtio-net: support inner header hash

2023-03-16 Thread Heng Qi




在 2023/3/16 上午7:24, Parav Pandit 写道:


On 3/15/2023 8:10 AM, Michael S. Tsirkin wrote:

On Tue, Mar 14, 2023 at 11:23:55PM -0400, Parav Pandit wrote:
If not, for now it may be better to skip vxlan and nvegre as they 
inherently

have unique outer header UDP src port based on the inner header.


So what's left, GRE?  GRE is actually different, in that it's not IP at
all.

Sorry, I wrongly wrote nvegre above.

IPoIP, GRE and NVGRE are left.

vxlan and geneve has the udp src entropy.




Not sure I understand "its not IP at all".

GRE has outer IP header + GRE header with the key to identify the flow.
The key is effectively the hash for the flow.


So if we are talking about GRE, hash is indeed not calculated at all at
the moment, right? 
Hash of the outer IP header of the src and dst IP can be still 
calculated currently for GRE when the optional key is not present.



And I would say a natural first step for GRE is
actually adding a hash type that will support this protocol.

For GRE and NVGRE GRE_header.key as the flow/hash identifier should 
work without inner header hash.
Older version of the GRE doesn't have key, so inner header hash is 
useful.


Yes, indeed.
The old GRE does not have keys such as flow id. Even with the new GRE, 
we have no chance to use optional fields,
because we have connected with various operators, and the most basic 
fields can get the best compatibility.


Thanks.




How about doing that? It seems like this should be a small step
and completely uncontroversial.


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-09 Thread Heng Qi




在 2023/2/28 下午7:16, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?
For example geneve spec says:

it is necessary for entropy from encapsulated packets to be
exposed in the tunnel header.  The most common technique for this is
to use the UDP source port

same goes for vxlan did not check further.

so what is the problem?  and which tunnel types actually suffer from the
problem?



Inner hash can at least hash tunnel flows without outer transport 
headers like GRE to multiple queues,

which is beneficial to us.

For tunnel flows with outer transport headers like VXLAN, although they 
can hash flows to different queues
by setting different outer udp port, this does not conflict with inner 
hash. Inner hashing can also be used for this purpose.


For the same flow, packets in the receiving and sending directions may 
pass through different tunnels respectively, which cause
the same flow to be hashed to different queues. In this case, we have to 
calculate a symmetric hash (can be called an inner symmetric hash, which 
is a type of inner hash.)
through the inner header, so that the same flow can be hashed to the 
same queue.


Symmetric hashing can ignore the order of the 5-tuples to calculate the 
hash, that is, the hash values ​​calculated by (a1, a2, a3, a4) and (a2, 
a1, a4, a3) respectively are the same.


Thanks.



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-15 Thread Heng Qi




在 2023/3/15 下午7:58, Michael S. Tsirkin 写道:

On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:



在 2023/3/10 上午3:36, Michael S. Tsirkin 写道:

On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:

在 2023/3/8 下午10:39, Michael S. Tsirkin 写道:

On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:

在 2023/2/28 下午7:16, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?

Yes, you are right. That's what we did before the inner header hash, but it
has a performance penalty, which I'll explain below.


For example geneve spec says:

   it is necessary for entropy from encapsulated packets to be
   exposed in the tunnel header.  The most common technique for this is
   to use the UDP source port

The end point of the tunnel called the gateway (with DPDK on top of it).

1. When there is no inner header hash, entropy can be inserted into the udp
src port of the outer header of the tunnel,
and then the tunnel packet is handed over to the host. The host needs to
take out a part of the CPUs to parse the outer headers (but not drop them)
to calculate the inner hash for the inner payloads,
and then use the inner
hash to forward them to another part of the CPUs that are responsible for
processing.

I don't get this part. Leave inner hashes to the guest inside the
tunnel, why is your host doing this?


Let's simplify some details and take a fresh look at two different
scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).

1. In Scenario1, we can improve the processing performance of the same flow
by implementing inner symmetric hashing.

This is because even though client1 and client2 communicate bidirectionally
through the same flow, their data may pass

through and be encapsulated by different tunnels, resulting in the same flow
being hashed to different queues and processed by different CPUs.

To ensure consistency and optimized processing, we need to parse out the
inner header and compute a symmetric hash on it using a special rss key.

Sorry for not mentioning the inner symmetric hash before, in order to
prevent the introduction of more concepts, but it is indeed a kind of inner
hash.

If parts of a flow go through different tunnels won't this cause
reordering at the network level? Why is it so important to prevent it at
the nic then?  Or, since you are stressing symmetric hash, are you
talking about TX and RX side going through different tunnels?


Yes, the directions client1->client2 and client2->client1 may go through 
different tunnels.
Using inner symmetric hashing can satisfy the same CPU to process two 
directions of the same flow to improve performance.






2. In Scenario2 with GRE, the lack of outer transport headers means that
flows between multiple communication pairs encapsulated by the same tunnel

will all be hashed to the same queue. To address this, we need to implement
inner hashing to improve the performance of RSS. By parsing and calculating

the inner hash, different flows can be hashed to different queues.

Thanks.



Well 2 is at least inexact, there's flowID there. It's just 8 bit


We use the most basic GRE header fields (not NVGRE), not even optional 
fields.
There is also no flow id in the GRE header, should you be referring to 
NVGRE?


Thanks.


so not sufficient if there are more than 512 queues. Still 512 queues
is quite a lot. Are you trying to solve for configurations with
more than 512 queues then?



Assuming that the same flow includes a unidirectional flow a->b, or a
bidirectional flow a->b and b->a,
such flow may be out of order when processed by the gateway(DPDK):

1. In unidirectional mode, if the same flow is switched to another gateway
for some reason, resulting in different outer IP address,
      then this flow may be processed by different CPUs after reaching the
host if there is no inner hash. So after the host receives the
      flow, first use the forwarding CPUs to parse the inner hash, and then
use the hash to ensure that the flow is processed by the
      same CPU.
2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may
go to gateway 2. In order to ensure that the same flow is
      processed by the same CPU, we still need the forwarding CPUs to parse
the real inner hash(here, the hash key needs to be replaced with a symmetric
hash key).

Oh intersting. What are those gateways, how come there's expectation
that you can change their addresses and topology
completely seamlessly without any reordering whatsoever?
Isn't network topology change kind of guaranteed to change ordering
sometimes?



1). During this process, the CPUs on the hos

[virtio-dev] Re: [PATCH v10] virtio-net: support inner header hash

2023-03-15 Thread Heng Qi




在 2023/3/15 上午11:23, Parav Pandit 写道:



On 3/6/2023 10:48 AM, Heng Qi wrote:


  +\item[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner header hash

May be to say inner packet header hash..
This make its little more clear about "which header" that you 
explained in the commit log.




Sure, I'll add this.


+    for tunnel-encapsulated packets.
+
  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications 
coalescing.
    \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 
packets.
@@ -139,6 +142,7 @@ \subsubsection{Feature bit 
requirements}\label{sec:Device Types / Network Device

  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.

  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ.
I think this should also say that HASH_TUNNEL requires either of the 
F_RSS or F_HASH_REPORT.

Because without it HASH_TUNNEL is not useful.


F_HASH_TUNNEL indicates that the hash should be calculated using the 
inner packet header, even without F_RSS or F_HASH_REPORT,
we can continue to use the hash value in scenarios such as RPS or ebpf 
programs.



Right?
if no, than my below comments are meaningless.


I think it's fine to let F_HASH_TUNNEL rely on F_RSS or _F_HASH_REPORT 
as those are probably important scenarios where inner packet header hash 
is used.






  \end{description}
    \subsubsection{Legacy Interface: Feature bits}\label{sec:Device 
Types / Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -198,20 +202,27 @@ \subsection{Device configuration 
layout}\label{sec:Device Types / Network Device

  u8 rss_max_key_size;
  le16 rss_max_indirection_table_length;
  le32 supported_hash_types;
+    le32 supported_tunnel_hash_types;
  };
  \end{lstlisting}
-The following field, \field{rss_max_key_size} only exists if 
VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
+The following field, \field{rss_max_key_size} only exists if 
VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or 
VIRTIO_NET_F_HASH_TUNNEL is set.

  It specifies the maximum supported length of RSS key in bytes.

I think rss_max_key_size field dependency should be only of the 
existing feature bits F_RSS and F_HASH_REPORT.
This is because those are the bits really deciding to consider 
rss_max_key_size.


  The following field, \field{rss_max_indirection_table_length} only 
exists if VIRTIO_NET_F_RSS is set.
  It specifies the maximum number of 16-bit entries in RSS 
indirection table.
    The next field, \field{supported_hash_types} only exists if the 
device supports hash calculation,

-i.e. if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
+i.e. if VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or 
VIRTIO_NET_F_HASH_TUNNEL is set.



Same as above.

  Field \field{supported_hash_types} contains the bitmask of 
supported hash types.
  See \ref{sec:Device Types / Network Device / Device Operation / 
Processing of Incoming Packets / Hash calculation for incoming 
packets / Supported/enabled hash types} for details of supported hash 
types.
  +The next field, \field{supported_tunnel_hash_types} only exists if 
the device
+supports inner hash calculation, i.e. if VIRTIO_NET_F_HASH_TUNNEL is 
set.

+
Above line "the next field .." can be just same as "Device supports 
inner packet header hash calculation, i.e..."
This is because here, the term "header" is missed, which is present in 
the definition of feature bit 52.


I'll rephrase the description to make the whole text more consistent.



    The device MUST set \field{rss_max_key_size} to at least 40, if 
it offers

-VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT.
+VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL.
This needs change if the above first comment about rss_max_key_size is 
right.


  The device MUST set \field{rss_max_indirection_table_length} to at 
least 128, if it offers

  VIRTIO_NET_F_RSS.
@@ -843,11 +854,13 @@ \subsubsection{Processing of Incoming 
Packets}\label{sec:Device Types / Network

  \begin{itemize}
  \item The feature VIRTIO_NET_F_RSS was negotiated. The device uses 
the hash to determine the receive virtqueue to place incoming packets.
  \item The feature VIRTIO_NET_F_HASH_REPORT was negotiated. The 
device reports the hash value and the hash type with the packet.
+\item The feature VIRTIO_NET_F_HASH_TUNNEL was negotiated. The 
device supports inner hash calculation.

  \end{itemize}


inner packet header hash ..


Ok.



+If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated, the 
encapsulation
+hash type below indicates that the hash is calculated over the inner 
header of

+the encapsulated packet:
+Hash type applicable for inner payload of the gre-encapsulated packet
+\begin{lstlisting}
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE (1 << 1)
+\end{lstlisting}
+Hash 

[virtio-dev] Re: [virtio-comment] Re: [PATCH v10] virtio-net: support inner header hash

2023-03-15 Thread Heng Qi




在 2023/3/15 下午8:10, Michael S. Tsirkin 写道:

On Tue, Mar 14, 2023 at 11:23:55PM -0400, Parav Pandit wrote:

If not, for now it may be better to skip vxlan and nvegre as they inherently
have unique outer header UDP src port based on the inner header.

So what's left, GRE?  GRE is actually different, in that it's not IP at
all.


I do not think so, I mentioned that VXLAN and GENEVE need inner 
symmetric hashing, and we need this.


And we know inner hashing doesn't conflict with other ways of adding 
entropy.


Thanks.


So if we are talking about GRE, hash is indeed not calculated at all at
the moment, right?  And I would say a natural first step for GRE is
actually adding a hash type that will support this protocol.

How about doing that? It seems like this should be a small step
and completely uncontroversial.





-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Heng Qi

Hi, Jason. Long time no see. :)

在 2023/2/22 上午11:22, Jason Wang 写道:


在 2023/2/22 01:50, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

+\subparagraph{Security risks between encapsulated packets and RSS}
+There may be potential security risks when encapsulated packets 
using RSS to
+select queues for placement. When a user inside a tunnel tries to 
control the



What do you mean by "user" here? Is it a remote or local one?



I mean a remote attacker who is not under the control of the tunnel owner.

Thanks.



+enqueuing of encapsulated packets, then the user can flood the 
device with invaild
+packets, and the flooded packets may be hashed into the same queue 
as packets in

+other normal tunnels, which causing the queue to overflow.
+
+This can pose several security risks:
+\begin{itemize}
+\item  Encapsulated packets in the normal tunnels cannot be 
enqueued due to queue

+   overflow, resulting in a large amount of packet loss.
+\item  The delay and retransmission of packets in the normal 
tunnels are extremely increased.
+\item  The user can observe the traffic information and enqueue 
information of other normal

+   tunnels, and conduct targeted DoS attacks.
+\end{\itemize}
+

Hmm with this all written out it sounds pretty severe.



I think we need first understand whether or not it's a problem that we 
need to solve at spec level:


1) anything make encapsulated packets different or why we can't hit 
this problem without encapsulation


2) whether or not it's the implementation details that the spec 
doesn't need to care (or how it is solved in real NIC)


Thanks



At this point with no ways to mitigate, I don't feel this is something
e.g. Linux can enable.  I am not going to nack the spec patch if
others  find this somehow useful e.g. for dpdk.
How about CC e.g. dpdk devs or whoever else is going to use this
and asking them for the opinion?





-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Heng Qi




在 2023/2/22 下午2:21, Michael S. Tsirkin 写道:

On Wed, Feb 22, 2023 at 10:34:39AM +0800, Heng Qi wrote:

The user will figure out how to mitigate when such QoS is not available. Either 
to run in best-effort mode or mitigate differently.

Yes, our cloud security and cloud network team will configure and use inner
hash on dpdk.

Sounds good. More practical for dpdk than Linux.
Is there a chance that when the interface is close
to be final, but before the vote, you post a patch to the dpdk list and
get some acks from the maintainers, cc virtio-dev. This way we won't
merge something that will then go unused?
That would be best - do you have a prototype?


Not yet, dpdk and the business team are waiting for our virtio 
specification, and
they have stated as a business team that their implementation on dpdk 
will not necessarily be open sourced to the community.





In fact I discussed with them the security issues between
tunnels,
and I will quote their solutions to tunnel attacks below, but this is a
problem between the tunnels, not the introduction of inner hash.
I don't think we need to focus too much on this, but I'll do my best to
describe the security issues between tunnels in v10.

"
This is not a problem with the inner hash, it is a general problem with the
outer hash.
I communicated with our people who are doing cloud security (they are also
one of the demanders of inner hash),
and it is a common problem for one tunnel to attack another tunnel.

For example, there is a tunnel t1; a tunnel t2; a tunnel endpoint VTEP0, and
the vni id of t1 is id1, and the vni id of v2 is id2; a VM.

At this time, regardless of the inner hash or the outer hash, the traffic of
tunnel t1 and tunnel t2 will reach the VM through VTEP0 (whether it is a
single queue or multiple queues),
and may be placed on the same queue to cause queue overflow.

Do note (and explain in spec?) that with just an outer hash and RSS it
is possible to configure the tunnels to use distict queues. Impossible
with this interface but arguably only works for a small number of
tunnels anyway.


# Solutions:

More like mitigations.


Yes, you are right.




1. Some current forwarding tools such as DPDK have good forwarding
performance, and it is difficult to fill up the queue;

Oh that's a good point. If driver is generally faster than the device
and queues stay away from filling up there's no DoS.
I'd add this to the spec.


Ok.




2. or switch the attack traffic to the attack clusters;

What is that?


This is done by the monitoring part outside the tunnel, which is also an 
important mitigation method they mentioned
to prevent DoS between tunnels. For example, the monitoring part cuts 
off, limits or redirects the abnormal traffic of the tunnel.





3. or connect the traffic of different tunnels to different network card
ports or network devices.

Not sure how this is relevant. These a distinct outer MAC - with this
why do we need a tunnel?


4..
"



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] RE: [PATCH v6] virtio-net: support the virtqueue coalescing moderation

2023-02-22 Thread Heng Qi

Hi, Parav.

在 2023/2/22 下午10:28, Parav Pandit 写道:



From: Michael S. Tsirkin 
Sent: Wednesday, February 22, 2023 1:29 AM

MMIO device has vq_index register too.
I am inclined to vq_index given current state of spec

[mst@tuck virtio]$ git grep vq_index|wc -l
0


:)

Grep for "virtqueue index".
We need to change "virtqueue index" to "virtqueue number".



Yes, this is more unified in the spec. :)

This version is closed, please help review the v7 version [1] if you 
have time.


[1] https://lists.oasis-open.org/archives/virtio-dev/202302/msg00520.html

Thanks very much for your response!




and new additions by Michael.

admin vq? Good point, I will change that to vqn too.

Ok. sounds good.


Michael,
Can you please suggest vqn or vq_index to use?

[mst@tuck virtio]$ git grep vqn|wc -l
5

I'd say vqn has precedent.


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-27 Thread Heng Qi




在 2023/2/27 下午4:35, Jason Wang 写道:

On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin  wrote:

On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:

Btw, this kind of 1:1 hash features seems not scalable and flexible.
It requires an endless extension on bits/fields. Modern NICs allow the
user to customize the hash calculation, for virtio-net we can allow to
use eBPF program to classify the packets. It seems to be more flexible
and scalable and there's almost no maintain burden in the spec (only
bytecode is required, no need any fancy features/interactions like
maps), easy to be migrated etc.

Prototype is also easy, tun/tap had an eBPF classifier for years.

Thanks

Yea BPF offload would be great to have. We have been discussing it for
years though - security issues keep blocking it. *Maybe* it's finally
going to be there but I'm not going to block this work waiting for BPF
offload.  And easily migrated is what BPF is not.

Just to make sure we're at the same page. I meant to find a way to
allow the driver/user to fully customize what it wants to
hash/classify. Similar technologies which is based on private solution
has been used by some vendors, which allow user to customize the
classifier[1]

ePBF looks like a good open-source solution candidate for this (there
could be others). But there could be many kinds of eBPF programs that
could be offloaded. One famous one is XDP which requires many features
other than the bytecode/VM like map access, tailcall. Starting from
such a complicated type is hard. Instead, we can start from a simple
type, that is the eBPF classifier. All it needs is to pass the
bytecode to the device, the device can choose to run it or compile it
to what it can understand for classifying. We don't need maps, tail
calls and other features. We don't need to worry about the security
because of its simplicity: the eBPF program is only in charge of doing
classification, no other interactions with the driver and packet
modification is prohibited. The feature is limited only to the
VM/bytecode abstraction itself.


Since the instruction set of ebpf is not complicated, some devices 
already support the offloading of ebpf,
but what troubles them is how to standardize each device to implement 
standard and optional subsystem interfaces.

There are two reasons:
#1, due to the rapid development of ebpf , so it is difficult for them 
to ensure the backward compatibility of the interface.
#2, such as network and blk devices, which interfaces must be 
implemented to allow the same ebpf program to run perfectly on each other.
Also, ebpf program is not isolated, it is expected to interact with 
userspace programs, and few examples can demonstrate

how to use them effectively.

Maybe we can take advantage of the virtio spec to get out there first.

Thanks.



What's more, it's a good first step to achieve full eBPF offloading in
the future.

Thanks

[1] 
https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html


--
MST




-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] [PATCH v6] virtio-net: Fix and update VIRTIO_NET_F_NOTF_COAL feature

2023-02-19 Thread Heng Qi

Hi, Alvaro.
Thanks for your work !
As suggested earlier in the 'virtqueue coalescing' thread, I will be 
working on top of your patch. :)



在 2023/2/19 下午5:03, Alvaro Karsz 写道:

This patch makes several improvements to the notification coalescing
feature, including:

- Consolidating virtio_net_ctrl_coal_tx and virtio_net_ctrl_coal_rx
   into a single struct, virtio_net_ctrl_coal, as they are identical.
- Emphasizing that the coalescing commands are best-effort.
- Defining the behavior of coalescing with regards to delivering
   notifications when a change occur.
- Stating that the commands should apply to all the receive/transmit
   virtqueues.
- Stating that every receive/transmit virtqueue should count it's own
   packets.
- A new intro explaining the entire coalescing operation.

Signed-off-by: Alvaro Karsz 
Reviewed-by: Parav Pandit 
---
v2:
- Add the last 2 points to the patch.
- Rephrase the "commands are best-effort" note.
- Replace "notification" with "used buffer notification" to be
  more consistent.
v3:
- Add an intro explaining the entire coalescing operation.
v4:
- Minor wording fixes.
- Rephrase the general note.
v5:
- Replace virtio_net_ctrl_coal->usecs with
  virtio_net_ctrl_coal->max_usecs
v6:
- Replace usecs with microseconds.
- Make the TX/RX examples relevant for MQ as well.
- Explain that upon meeting the notification condition, the
  device starts counting packets and microseconds again.
- Add Parav's Reviewed-by tag (Thanks!)

  device-types/net/description.tex | 74 +++-
  1 file changed, 44 insertions(+), 30 deletions(-)

diff --git a/device-types/net/description.tex b/device-types/net/description.tex
index 1741c79..c7a8ca6 100644
--- a/device-types/net/description.tex
+++ b/device-types/net/description.tex
@@ -1514,15 +1514,15 @@ \subsubsection{Control Virtqueue}\label{sec:Device 
Types / Network Device / Devi
  If the VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can
  send control commands for dynamically changing the coalescing parameters.
  
-\begin{lstlisting}

-struct virtio_net_ctrl_coal_rx {
-le32 rx_max_packets;
-le32 rx_usecs;
-};
+\begin{note}
+The behavior of the device in response to these commands is best-effort:
+the device may generate notifications more or less frequently than specified.
+\end{note}
  
-struct virtio_net_ctrl_coal_tx {

-le32 tx_max_packets;
-le32 tx_usecs;
+\begin{lstlisting}
+struct virtio_net_ctrl_coal {
+le32 max_packets;
+le32 max_usecs;
  };
  
  #define VIRTIO_NET_CTRL_NOTF_COAL 6

@@ -1532,49 +1532,63 @@ \subsubsection{Control Virtqueue}\label{sec:Device 
Types / Network Device / Devi
  
  Coalescing parameters:

  \begin{itemize}
-\item \field{rx_usecs}: Maximum number of usecs to delay a RX notification.
-\item \field{tx_usecs}: Maximum number of usecs to delay a TX notification.
-\item \field{rx_max_packets}: Maximum number of packets to receive before a RX 
notification.
-\item \field{tx_max_packets}: Maximum number of packets to send before a TX 
notification.
+\item \field{max_usecs} for RX: Maximum number of microseconds to delay a RX 
notification.
+\item \field{max_usecs} for TX: Maximum number of microseconds to delay a TX 
notification.
+\item \field{max_packets} for RX: Maximum number of packets to receive before 
a RX notification.
+\item \field{max_packets} for TX: Maximum number of packets to send before a 
TX notification.
  \end{itemize}
  
-

  The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands:
  \begin{enumerate}
-\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{tx_usecs} and 
\field{tx_max_packets} parameters.
-\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{rx_usecs} and 
\field{rx_max_packets} parameters.
+\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{max_usecs} and 
\field{max_packets} parameters for all the transmit virtqueues.
+\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{max_usecs} and 
\field{max_packets} parameters for all the receive virtqueues.
  \end{enumerate}
  
-\subparagraph{RX Notifications}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / RX Notifications}

+\subparagraph{Operation}\label{sec:Device Types / Network Device / Device 
Operation / Control Virtqueue / Notifications Coalescing / Operation}
+
+The device sends a used buffer notification once the notification conditions 
are met, if the notifications are not suppressed as explained in \ref{sec:Basic 
Facilities of a Virtio Device / Virtqueues / Used Buffer Notification 
Suppression}.
+
+When the device has non-zero \field{max_usecs} and non-zero 
\field{max_packets}, it starts counting microseconds and packets upon 
receiving/sending a packet.
+The device counts packets and microseconds for each receive virtqueue and 
transmit virtqueue separately.
+In this case, the notification 

Re: [virtio-dev] Re: [virtio-comment] [PATCH v3] virtio-net: support the virtqueue coalescing moderation

2023-02-17 Thread Heng Qi
On Fri, Feb 17, 2023 at 10:42:21AM +0200, Alvaro Karsz wrote:
> Hi Heng,
> 
> > +\item[VIRTIO_NET_F_VQ_NOTF_COAL(52)] Device supports the virtqueue
> > +notifications coalescing.
> > +
> >  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> >
> >  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit 
> > requirements}\label{sec:Device Types / Network Device
> >  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> >  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
> > VIRTIO_NET_F_HOST_TSO6.
> >  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > +\item[VIRTIO_NET_F_VQ_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> >  \end{description}
> >
> >  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / 
> > Network Device / Feature bits / Legacy Interface: Feature bits}
> > @@ -4501,8 +4505,11 @@ \subsubsection{Control Virtqueue}\label{sec:Device 
> > Types / Network Device / Devi
> >  };
> >
> >  #define VIRTIO_NET_CTRL_NOTF_COAL 6
> > - #define VIRTIO_NET_CTRL_NOTF_COAL_TX_SET  0
> > + #define VIRTIO_NET_CTRL_NOTF_COAL_TX_SET 0
> >   #define VIRTIO_NET_CTRL_NOTF_COAL_RX_SET 1
> > + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET 2
> > + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET 3
> > +
> >  \end{lstlisting}
> >
> >  Coalescing parameters:
> > @@ -4514,12 +4521,67 @@ \subsubsection{Control Virtqueue}\label{sec:Device 
> > Types / Network Device / Devi
> >  \end{itemize}
> >
> >
> > -The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands:
> > +The class VIRTIO_NET_CTRL_NOTF_COAL has 4 commands:
> >  \begin{enumerate}
> >  \item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{tx_usecs} and 
> > \field{tx_max_packets} parameters.
> >  \item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{rx_usecs} and 
> > \field{rx_max_packets} parameters.
> > +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET: set the \field{max_packets} and 
> > \field{max_usecs} parameters for a enabled
> > +transmit/receive virtqueue whose 
> > number is \field{vqn}.
> > +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET: the device gets the 
> > \field{max_packets} and \field{max_usecs} parameters of
> > +a enabled transmit/receive 
> > virtqueue whose number is \field{vqn}, and then
> > +responds them to the driver.
> >  \end{enumerate}
> >
> > +If the VIRTIO_NET_F_VQ_NOTF_COAL feature is negotiated:
> > +1. a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command to set the 
> > coalescing
> > +   parameters of a enabled transmit/receive virtqueue.
> > +2. a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command to a 
> > device, and the device
> > +   responds to the driver with the coalescing parameters of a enabled 
> > transmit/receive virtqueue.
> > +
> > +\begin{lstlisting}
> > +struct virtio_net_ctrl_coal_vq {
> > +le16 vqn;
> > +le16 reserved;
> > +le32 max_packets;
> > +le32 max_usecs;
> > +};
> > +\end{lstlisting}
> > +
> 
> Maybe we can use struct virtio_net_ctrl_coal inside struct
> virtio_net_ctrl_coal_vq instead of repeating max_usecs and
> max_packets?
> I'm not sure if it would be confusing, what do you think?
> 

Hi Alvaro.

I guess you mean one of the following two forms:

#1
struct virtio_net_ctrl_coal {
le32 max_packets;
le32 max_usecs;
};

struct virtio_net_ctrl_coal_vq {
le16 vqn;
le16 reserved;
struct virtio_net_ctrl_coal coal;
} coal_vq;

#2
struct virtio_net_ctrl_coal {
le32 max_packets;
le32 max_usecs;
le16 vqn; // if _F_VQ_NOTF_COAL is negotiated
le16 reserved; // if _F_VQ_NOTF_COAL is negotiated
};

If it's #1, I think the format is a bit ugly, it's not semantic to use coal_vq 
to send global commands when _F_VQ_NOTF_COAL is not negotiated, and the 
presence of vqn and reserved is awkward.
If it's #2, I think this is a bit like the v1 version, using 
virtio_net_ctrl_coal as a virtual queue to send commands does not seem to be 
semantic, but it is indeed more unified in function.

I think we should hear from Michael and Parav.

> > +Virtqueue coalescing parameters:
> > +\begin{itemize}
> > +\item \field{vqn}: The virtqueue number of the enabled transmit or receive 
> > virtqueue, excluding the control virtqueue.
> > +\item \field{max_packets}: The maximum number of packets sent/received by 
> > the specified virtqueue before a TX/RX notification.
> > +\item \field{max_usecs}: The maximum number of TX/RX usecs that the 
> > specified virtqueue delays a TX/RX notification.
> > +\end{itemize}
> > +
> > +\field{reserved} is reserved and it is ignored by the device.
> > +
> 
> max_packets is the same with VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and with
> VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET.
> ("Maximum number of packets to receive/send before a RX/TX notification").
> The fact that this is applied to all VQs or to a specific one 

Re: [virtio-dev] Re: [virtio-comment] [PATCH v3] virtio-net: support the virtqueue coalescing moderation

2023-02-17 Thread Heng Qi
On Fri, Feb 17, 2023 at 11:40:15AM +0200, Alvaro Karsz wrote:
> > > Maybe we can use struct virtio_net_ctrl_coal inside struct
> > > virtio_net_ctrl_coal_vq instead of repeating max_usecs and
> > > max_packets?
> > > I'm not sure if it would be confusing, what do you think?
> > >
> >
> > Hi Alvaro.
> >
> > I guess you mean one of the following two forms:
> >
> > #1
> > struct virtio_net_ctrl_coal {
> > le32 max_packets;
> > le32 max_usecs;
> > };
> >
> > struct virtio_net_ctrl_coal_vq {
> > le16 vqn;
> > le16 reserved;
> > struct virtio_net_ctrl_coal coal;
> > } coal_vq;
> >
> > #2
> > struct virtio_net_ctrl_coal {
> > le32 max_packets;
> > le32 max_usecs;
> > le16 vqn; // if _F_VQ_NOTF_COAL is negotiated
> > le16 reserved; // if _F_VQ_NOTF_COAL is negotiated
> > };
> >
> > If it's #1, I think the format is a bit ugly, it's not semantic to use 
> > coal_vq to send global commands when _F_VQ_NOTF_COAL is not negotiated, and 
> > the presence of vqn and reserved is awkward.
> > If it's #2, I think this is a bit like the v1 version, using 
> > virtio_net_ctrl_coal as a virtual queue to send commands does not seem to 
> > be semantic, but it is indeed more unified in function.
> >
> > I think we should hear from Michael and Parav.
> >
> 
> I meant #1.
> We can see virtio_net_ctrl_coal as a struct holding coalescing
> parameters, regardless of the commands.
> Yes, let's wait for more comments on that.
> 
> > > > +Virtqueue coalescing parameters:
> > > > +\begin{itemize}
> > > > +\item \field{vqn}: The virtqueue number of the enabled transmit or 
> > > > receive virtqueue, excluding the control virtqueue.
> > > > +\item \field{max_packets}: The maximum number of packets sent/received 
> > > > by the specified virtqueue before a TX/RX notification.
> > > > +\item \field{max_usecs}: The maximum number of TX/RX usecs that the 
> > > > specified virtqueue delays a TX/RX notification.
> > > > +\end{itemize}
> > > > +
> > > > +\field{reserved} is reserved and it is ignored by the device.
> > > > +
> > >
> > > max_packets is the same with VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and with
> > > VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET.
> > > ("Maximum number of packets to receive/send before a RX/TX notification").
> > > The fact that this is applied to all VQs or to a specific one is
> > > derived from the command.
> > > Same for max_usecs.
> > > Maybe we can join the coalescing parameters somehow instead of
> > > repeating the explanations?
> > >
> 
> Any thoughts on this part?

Good idea, and if so, is there a good way to expose vqn to the interpretation 
of max_packets ?

#1
\item \field{vqn}: The virtqueue number of the enabled transmit or receive 
virtqueue.
\item \field{max_packets}: The maximum number of packets sent/received by the 
specified virtqueue before a TX/RX notification.

#2
\item \field{max_packets}: Maximum number of packets to receive/send before a 
RX/TX notification.

Thanks.


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v9] virtio-net: support the virtqueue coalescing moderation

2023-02-27 Thread Heng Qi




在 2023/2/28 上午4:19, Michael S. Tsirkin 写道:

On Mon, Feb 27, 2023 at 08:45:43PM +0200, Alvaro Karsz wrote:

-If the VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can
-send control commands for dynamically changing the coalescing parameters.
+If the VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can send the 
VIRTIO_NET_CTRL_NOTF_COAL_TX_SET
+and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands to dynamically change the 
coalescing parameters for all transmit
+and receive virtqueues, respectively.
+
+If the VIRTIO_NET_F_VQ_NOTF_COAL feature is negotiated:
+\begin{itemize}
+\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command to set 
coalescing parameters of a given
+  enabled transmit/receive virtqueue.
+\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command to a 
device, and the device responds with
+  coalescing parameters of a given enabled transmit/receive virtqueue.
+\end{itemize}


The VQ_NOTF_COAL commands are enclosed in an itemize and the NOTF_COAL
commands aren't.
I think that we should be consistent here.

The VQ commands are enclosed in an itemize because it has both GET and
SET operations.

I don't think VIRTIO_NET_CTRL_NOTF_COAL_RX_SET and
VIRTIO_NET_CTRL_NOTF_COAL_TX_SET
have to do this, because they are both settings, just in different
directions, and we have described them like this elsewhere.


I'm not sure that describing the commands in here is necessary, maybe
just mentioning which commands can be used with which feature bit is

Isn't that what this sentence does?


Yes, but it also describes what the command does
"to dynamically change the coalescing parameters for all transmit and
receive virtqueues"

Seems a bit repetitive to me.


enough (this is what I meant in v8, sorry if I wasn't clear).

I'm not saying that describing the commands in here is not good, but
notice that the commands are described in 3 different places.

Three places describe different effects.

#1 describes which commands are available for which feature.
#2 describes which command can use which structure.
#3 describes which parameters can be set for each command, and whether
they can affect "global values".
It is placed in the "Operation" section because it explains how the
device should behave.


You're right, but some parts seems repetitive to me:
#1:  "commands to dynamically change the coalescing parameters for all
transmit and receive virtqueues"

#2: "commands use the virtio_net_ctrl_coal structure to set
\field{max_usecs} and \field{max_packets} for all transmit/receive
virtqueues."

#3: " set the \field{max_usecs} and \field{max_packets} parameters for
all transmit virtqueues, and values of..."

Feels like every time we re-explain the commands with more detail.
So, for example we read #2 and understand what the command does (set
usecs and packets), then we read #3 and understand that it does some
more stuff.

IMO we need to explain it once, with all the details.


This is how we've written it historically. The idea is that
reader can get everything on the first pass: 1st high level
picture then detailed description then the formalities.

Other specs do it differently so one has to re-read them
many times to understand. I don't like this personally and
I much prefer the virtio way.



Maybe #1 and #2 descriptions can be combined and described together.


This is #1.


   \begin{note}
   The behavior of the device in response to these commands is best-effort:
@@ -1519,25 +1531,76 @@ \subsubsection{Control Virtqueue}\label{sec:Device 
Types / Network Device / Devi
   le32 max_usecs;
   };

+struct virtio_net_ctrl_coal_vq {
+le16 vqn;
+le16 reserved;
+struct virtio_net_ctrl_coal coal;
+};
+
   #define VIRTIO_NET_CTRL_NOTF_COAL 6
#define VIRTIO_NET_CTRL_NOTF_COAL_TX_SET  0
#define VIRTIO_NET_CTRL_NOTF_COAL_RX_SET 1
+ #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET 2
+ #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET 3
   \end{lstlisting}

+The VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET 
commands use the
+virtio_net_ctrl_coal structure to set \field{max_usecs} and 
\field{max_packets} for all
+transmit/receive virtqueues.
+
+The VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command uses the virtio_net_ctrl_coal_vq 
structure
+to set \field{max_usecs} and \field{max_packets} for the supplied virtqueue 
number \field{vqn}.
+
+The VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command gets the values of 
\field{max_usecs} and
+\field{max_packets} of the specified virtqueue from the device by setting 
\field{vqn}
+in the virtio_net_ctrl_coal_vq structure.
+

This is #2.


+# Read/Write attributes for coalescing parameters
+\begin{itemize}
+\item For commands VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and 
VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, \field{max_usecs}
+  and \field{max_packets} are write-only for a driver.
+\item For the command VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET, \field{vqn}, 
\field{reserved}, \field{max_usecs}
+  and \field{max_packets} are write-only 

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support the virtqueue coalescing moderation

2023-02-28 Thread Heng Qi




在 2023/2/28 下午3:49, Michael S. Tsirkin 写道:

On Tue, Feb 28, 2023 at 10:41:18AM +0800, Heng Qi wrote:

The problem is the global qualifier. And it's not even global -
there are two sets for rx and for tx and does not apply to cvq at all.
How about "RX/TX coalescing parameters"?

As Parav suggested, we probably don't need to mention "global
values/coalescing parameters" since this values/parameters are the ones set
by CTRL_NOTF_COAL_RX/TX_SET.

Just like this:
"
After disabling and re-enabling a virtuqueue, the device MUST revert to
coalescing parameters to the one configured using VIRTIO_NET_CTRL_NOTF_COAL.
"

Thanks.

VIRTIO_NET_CTRL_NOTF_COAL is not a thing. Let's wait for Alvaro's new
patch to be merged though, see what teminology he comes up with and
reuse.


Ok, are you referring to Alvaro's "virtio-net: Fix and update 
VIRTIO_NET_F_NOTF_COAL feature",
if so I see that his v7 version is stable and has been voted in 
https://github.com/oasis-tcs/ virtio-spec/issues/159 ,

this patch was made on top of his v7 version.

Thanks.



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4] virtio-net: Fix and update VIRTIO_NET_F_NOTF_COAL feature

2023-02-16 Thread Heng Qi




在 2023/2/16 下午3:32, Alvaro Karsz 写道:

This patch makes several improvements to the notification coalescing
feature, including:

- Consolidating virtio_net_ctrl_coal_tx and virtio_net_ctrl_coal_rx
   into a single struct, virtio_net_ctrl_coal, as they are identical.
- Emphasizing that the coalescing commands are best-effort.
- Defining the behavior of coalescing with regards to delivering
   notifications when a change occur.
- Stating that the commands should apply to all the receive/transmit
   virtqueues.
- Stating that every receive/transmit virtqueue should count it's own
   packets.
- A new intro explaining the entire coalescing operation.

Signed-off-by: Alvaro Karsz 
---
v2:
- Add the last 2 points to the patch.
- Rephrase the "commands are best-effort" note.
- Replace "notification" with "used buffer notification" to be
  more consistent.
v3:
- Add an intro explaining the entire coalescing operation.
v4:
- Minor wording fixes.
- Rephrase the general note.

  device-types/net/description.tex | 69 +++-
  1 file changed, 41 insertions(+), 28 deletions(-)

diff --git a/device-types/net/description.tex b/device-types/net/description.tex
index 1741c79..11760f3 100644
--- a/device-types/net/description.tex
+++ b/device-types/net/description.tex
@@ -1514,15 +1514,15 @@ \subsubsection{Control Virtqueue}\label{sec:Device 
Types / Network Device / Devi
  If the VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can
  send control commands for dynamically changing the coalescing parameters.
  
-\begin{lstlisting}

-struct virtio_net_ctrl_coal_rx {
-le32 rx_max_packets;
-le32 rx_usecs;
-};
+\begin{note}
+The behavior of the device in response to these commands is best-effort:
+the device may generate notifications more or less frequently than specified.
+\end{note}
  
-struct virtio_net_ctrl_coal_tx {

-le32 tx_max_packets;
-le32 tx_usecs;
+\begin{lstlisting}
+struct virtio_net_ctrl_coal {
+le32 max_packets;
+le32 usecs;
  };
  
  #define VIRTIO_NET_CTRL_NOTF_COAL 6

@@ -1532,49 +1532,62 @@ \subsubsection{Control Virtqueue}\label{sec:Device 
Types / Network Device / Devi
  
  Coalescing parameters:

  \begin{itemize}
-\item \field{rx_usecs}: Maximum number of usecs to delay a RX notification.
-\item \field{tx_usecs}: Maximum number of usecs to delay a TX notification.
-\item \field{rx_max_packets}: Maximum number of packets to receive before a RX 
notification.
-\item \field{tx_max_packets}: Maximum number of packets to send before a TX 
notification.
+\item \field{usecs} for RX: Maximum number of usecs to delay a RX notification.
+\item \field{usecs} for TX: Maximum number of usecs to delay a TX notification.
+\item \field{max_packets} for RX: Maximum number of packets to receive before 
a RX notification.
+\item \field{max_packets} for TX: Maximum number of packets to send before a 
TX notification.
  \end{itemize}
  
-

  The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands:
  \begin{enumerate}
-\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{tx_usecs} and 
\field{tx_max_packets} parameters.
-\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{rx_usecs} and 
\field{rx_max_packets} parameters.
+\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{usecs} and 
\field{max_packets} parameters for all the transmit virtqueues.
+\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{usecs} and 
\field{max_packets} parameters for all the receive virtqueues.
  \end{enumerate}
  
-\subparagraph{RX Notifications}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / RX Notifications}

+\subparagraph{Operation}\label{sec:Device Types / Network Device / Device 
Operation / Control Virtqueue / Notifications Coalescing / Operation}
+
+The device sends a used buffer notification once the notification conditions 
are met, if the notifications are not suppressed as explained in \ref{sec:Basic 
Facilities of a Virtio Device / Virtqueues / Used Buffer Notification 
Suppression}.
+
+When the device has non-zero \field{usecs} and non-zero \field{max_packets}, 
it starts counting usecs and packets upon receiving/sending a packet.
+The device counts packets and usecs for each receive virtqueue and transmit 
virtqueue separately.
+In this case, the notification conditions are met when \field{usecs} usecs 
elapses, or upon sending/receiving \field{max_packets} packets, whichever 
happens first.
+


Hi, Alvaro.

"when \field{usecs} usecs elapses"  --> "when \field{usecs} elapses", right?

Thanks for your clear work.


+When the device has \field{usecs} = 0 or \field{max_packets} = 0, the 
notification conditions are met after every packet received/sent.
+
+\subparagraph{RX Example}\label{sec:Device Types / Network Device / Device 
Operation / Control Virtqueue / Notifications Coalescing / RX Example}
  
  If, for example:

  \begin{itemize}
-\item 

Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v8] virtio-net: support inner header hash

2023-02-17 Thread Heng Qi



在 2023/2/18 上午12:24, Parav Pandit 写道:

From: virtio-dev@lists.oasis-open.org  On
Behalf Of Heng Qi

[..]


We assume that hash_report_tunnel_types is still present in the next version,

I am little lost.


Hi, Parav.

You are not lost. I'm just answering some of Michael's questions and 
making assumptions. :)



I thought we all agreed that reporting just the tunnel type in data path path 
virtio_net_hdr is not useful to sw.
And hence we should omit in the virtio_net_hdr.

Did I miss the motivation to add it back?
If not, probably it is better to review and discuss v9 without it, which will 
be easier to discuss.


but it only exists in virtio net hdr and should be populated by the device after
the hash calculation. hash_tunnel_types already controls whether the device
computes internal header hashes.


I don't really follow this, hash_report_tunnel_type is better off
keeping it "report" literally.


Talking about VIRTIO_NET_F_HASH_TUNNEL here. Not
hash_report_tunnel_type.


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] [PATCH v3] virtio-net: support the virtqueue coalescing moderation

2023-02-17 Thread Heng Qi



在 2023/2/17 下午6:07, Michael S. Tsirkin 写道:

On Fri, Feb 17, 2023 at 11:40:15AM +0200, Alvaro Karsz wrote:

Maybe we can use struct virtio_net_ctrl_coal inside struct
virtio_net_ctrl_coal_vq instead of repeating max_usecs and
max_packets?
I'm not sure if it would be confusing, what do you think?


Hi Alvaro.

I guess you mean one of the following two forms:

#1
struct virtio_net_ctrl_coal {
 le32 max_packets;
 le32 max_usecs;
};

struct virtio_net_ctrl_coal_vq {
 le16 vqn;
 le16 reserved;
 struct virtio_net_ctrl_coal coal;
} coal_vq;

#2
struct virtio_net_ctrl_coal {
 le32 max_packets;
 le32 max_usecs;
 le16 vqn; // if _F_VQ_NOTF_COAL is negotiated
 le16 reserved; // if _F_VQ_NOTF_COAL is negotiated
};

If it's #1, I think the format is a bit ugly, it's not semantic to use coal_vq 
to send global commands when _F_VQ_NOTF_COAL is not negotiated, and the 
presence of vqn and reserved is awkward.
If it's #2, I think this is a bit like the v1 version, using 
virtio_net_ctrl_coal as a virtual queue to send commands does not seem to be 
semantic, but it is indeed more unified in function.

I think we should hear from Michael and Parav.


I meant #1.
We can see virtio_net_ctrl_coal as a struct holding coalescing
parameters, regardless of the commands.
Yes, let's wait for more comments on that.

Reusing virtio_net_ctrl_coal is a nice thought. Makes it a bit clearer
these have exactly the same role.
Whether to put vqn first or last does not matter imho.


+Virtqueue coalescing parameters:
+\begin{itemize}
+\item \field{vqn}: The virtqueue number of the enabled transmit or receive 
virtqueue, excluding the control virtqueue.
+\item \field{max_packets}: The maximum number of packets sent/received by the 
specified virtqueue before a TX/RX notification.
+\item \field{max_usecs}: The maximum number of TX/RX usecs that the specified 
virtqueue delays a TX/RX notification.
+\end{itemize}
+
+\field{reserved} is reserved and it is ignored by the device.
+

max_packets is the same with VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and with
VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET.
("Maximum number of packets to receive/send before a RX/TX notification").
The fact that this is applied to all VQs or to a specific one is
derived from the command.
Same for max_usecs.
Maybe we can join the coalescing parameters somehow instead of
repeating the explanations?


Any thoughts on this part?

I think I agree. Generally I think we should first of all describe the
new VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET, moving all explanation text
to that.

Then just explain that VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and
VIRTIO_NET_CTRL_NOTF_COAL_RX_SET have the same effect
as VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET repeated for
all currently enabled tx/rx vqs.
Plus maybe a single annotated example where there's a mix of
VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET, VIRTIO_NET_CTRL_NOTF_COAL_RX_SET and
VIRTIO_NET_CTRL_NOTF_COAL_TX_SET commands. For example with 2 vq pairs:

1. VIRTIO_NET_CTRL_NOTF_COAL_RX_SET sets for vq 0 and 2, vq 1 and 3 retain 
reset value
2. VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET sets vq 0, vq 2 retains value from 1
3. VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET sets vq 1, vq 3 retains reset value
4. VIRTIO_NET_CTRL_NOTF_COAL_TX_SET overrides command 3

no need for many examples.



Good idea. This is a clear and comprehensive example.

Thanks.






-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v3] virtio-net: support the virtqueue coalescing moderation

2023-02-17 Thread Heng Qi
On Fri, Feb 17, 2023 at 04:12:34PM +, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin 
> > Sent: Friday, February 17, 2023 6:35 AM
> 
> > > We mention the device reset case, but nothing about VQ reset.
> > >
> > > I feel that no matter how we handle this, we break something.
> > >
> > > Having default coalescing values may collide with "Upon reset, a
> > > device MUST initialize all coalescing parameters to 0."
> > 
> > No this is after device reset.
> > 
> > > We can say that VQ reset doesn't affect the "global parameters" and a
> > > device reset does, but this collides with "Device Requirements:
> > > Virtqueue Reset".
> > >
> Not really.
> When the device resets, VQ objects are destroyed in the device.
> So VQ's notifications parameters doesn't exists on device reset.
> 
> And so the same case with VQ reset.
> When a VQ is reset (disabled), VQ's notifications configuration is removed in 
> the device too.
> Just like its desc ring and other addresses are invalid.

Yes, but there seems to be such a situation: when the device is reactivated, as 
the specification says,
all parameters are set to 0 (use parameters as the default configuration on the 
device).
When CTRL_COAL_SET and CTRL_COAL_VQ_SET are sent, the configuration is updated 
(the parameters of each vq may be different,
but the global parameter configuration may be recorded), at this time, if vq is 
reset,
should the parameters be 0 or a recorded global parameters after it is 
re-enabled?

> 
> > > In fact, resetting coalescing values after vq reset may be derived
> > > from "Upon reset, a device MUST initialize all coalescing parameters
> > > to 0".
> > > This is consistent with "Device Requirements: Virtqueue Reset".
> > >
> > > I can add in my patch some clarifications.
> > >
> > > This will break the linux virtio_net ethtool implementation a little,
> > > we'll need to fix it.
> > 
> > Not good. I feel we must come up with spec that is backwards compatible.
> > Hmm, maybe this is why Parav kept talking about modes.
> > I did not realize at the time, sorry Parav.
> > 
> > I still feel modes are not the best way to describe things so I propose 
> > this:
> > - in addition to per vq parameters, device that supports global TX/RX
> >   commands and ring reset maintains two sets of default parameters: for RX 
> > and
> > TX
> > - existing commands change default and change all enabled vqs
> >   of the correct type (RX/TX) to the same value
> > - vq reset changes a vq to the default
> > - device reset changes defaults to 0 and changes all vqs to  0
> > 
> > note how defaults are only used for ring reset.  is "vq reset parameter"
> > a better name? I feel we will repeat "reset" too many times in a sentence 
> > if we
> > call it that though.
> > 
> > So fundamentally the only change is with RING_RESET, then default is not
> > always 0, it can be set by the global command.
> 
> True default is not zero when global are configured.
> It is ok to report VQ parameters on GET command when globals are configured.

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-23 Thread Heng Qi




在 2023/2/23 下午9:13, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

+\subparagraph{Security risks between encapsulated packets and RSS}
+There may be potential security risks when encapsulated packets using RSS to
+select queues for placement.

Is this just with RSS? I assume hash calculation is also used for
something like queueing so there's a similar risk even just
with hash reporting, no?


I don't understand why it would be risky to just report hash when not 
used for queuing,
and even we don't report whether hash come from inner or outer now 
because there is no more hash_report_tunnel.





When a user inside a tunnel tries to control the
+enqueuing of encapsulated packets, then the user can flood the device with 
invaild
+packets, and the flooded packets may be hashed into the same queue as packets 
in
+other normal tunnels, which causing the queue to overflow.
+
+This can pose several security risks:
+\begin{itemize}
+\item  Encapsulated packets in the normal tunnels cannot be enqueued due to 
queue
+   overflow, resulting in a large amount of packet loss.
+\item  The delay and retransmission of packets in the normal tunnels are 
extremely increased.
+\item  The user can observe the traffic information and enqueue information of 
other normal
+   tunnels, and conduct targeted DoS attacks.
+\end{\itemize}
+


So for RSS specifically, we brain-stormed with Amnon (Cc'd) and came
up with an idea: RSS indirection table entries are 16 bit but
onlu 15 bits are used to indentify an RX queue.
We can use the remaining bit as a "tunnel bit" to signal whether to use the
inner or the outer hash for queue selection.

The lookup will work like this then:

calculate outer hash
if (rss[outer hash] & tunnel bit)


How a tunnel bit distinguishes between multiple tunnel types, and I 
think it is not so reasonable to use the
indirection table to determine the switch of the inner hash. The inner 
hash is only the ability to calculate the hash,

and does not involve the indirection table.

Thanks.


then
calculate inner hash
return rss[inner hash] & ~tunnel bit
else
return rss[outer hash]


this fixes the security issue returning us back to
status quo : specific tunnels can be directed to separate queues.


This is for RSS.


For hash reporting indirection table is not used.
Maybe it is enough to signal to driver that inner hash was used.
We do need that signalling though.

My question would be whether it's practical to implement in hardware.




-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-23 Thread Heng Qi




在 2023/2/24 上午10:45, Jason Wang 写道:


在 2023/2/23 12:41, Heng Qi 写道:



在 2023/2/23 上午10:50, Jason Wang 写道:

Hi:

在 2023/2/22 14:46, Heng Qi 写道:

Hi, Jason. Long time no see. :)

在 2023/2/22 上午11:22, Jason Wang 写道:


在 2023/2/22 01:50, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

+\subparagraph{Security risks between encapsulated packets and RSS}
+There may be potential security risks when encapsulated packets 
using RSS to
+select queues for placement. When a user inside a tunnel tries 
to control the



What do you mean by "user" here? Is it a remote or local one?



I mean a remote attacker who is not under the control of the tunnel 
owner.



Anything may the tunnel different? I think this can happen even 
without tunnel (and even with single queue).


I agree.



How to mitigate those attackers seems more like a implementation 
details where might require fair queuing or other QOS technology 
which has been well studied.


I am also not sure whether this point needs to be focused on in the 
spec, and I see that the protection against tunnel DoS is more 
protected outside the device,

but it seems to be okay to give some attack reminders.



Maybe it's sufficient to say the device should make sure the fairness 
among different flows when queuing packets?


Yes, maybe the device does not guarantee QoS or needs to guarantee 
enqueue fairness between flows.


Thanks.



Thanks




Thanks.



It seems out of the scope of the spec (unless we want to let driver 
manageable QOS).


Thanks




Thanks.



+enqueuing of encapsulated packets, then the user can flood the 
device with invaild
+packets, and the flooded packets may be hashed into the same 
queue as packets in

+other normal tunnels, which causing the queue to overflow.
+
+This can pose several security risks:
+\begin{itemize}
+\item  Encapsulated packets in the normal tunnels cannot be 
enqueued due to queue

+   overflow, resulting in a large amount of packet loss.
+\item  The delay and retransmission of packets in the normal 
tunnels are extremely increased.
+\item  The user can observe the traffic information and enqueue 
information of other normal

+   tunnels, and conduct targeted DoS attacks.
+\end{\itemize}
+

Hmm with this all written out it sounds pretty severe.



I think we need first understand whether or not it's a problem 
that we need to solve at spec level:


1) anything make encapsulated packets different or why we can't 
hit this problem without encapsulation


2) whether or not it's the implementation details that the spec 
doesn't need to care (or how it is solved in real NIC)


Thanks


At this point with no ways to mitigate, I don't feel this is 
something

e.g. Linux can enable.  I am not going to nack the spec patch if
others  find this somehow useful e.g. for dpdk.
How about CC e.g. dpdk devs or whoever else is going to use this
and asking them for the opinion?







-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org





This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscr...@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org
List help: virtio-comment-h...@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: 
https://www.oasis-open.org/policies-guidelines/mailing-lists

Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-24 Thread Heng Qi




在 2023/2/24 下午4:13, Michael S. Tsirkin 写道:

On Thu, Feb 23, 2023 at 02:40:46PM +, Parav Pandit wrote:



From: Michael S. Tsirkin 
Sent: Thursday, February 23, 2023 8:14 AM

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:



So for RSS specifically, we brain-stormed with Amnon (Cc'd) and came up with
an idea: RSS indirection table entries are 16 bit but onlu 15 bits are used to
indentify an RX queue.
We can use the remaining bit as a "tunnel bit" to signal whether to use the
inner or the outer hash for queue selection.


I further brainstormed internally with Saeed and Rony on this.

The inner hash is only needed for GRE, IPIP etc.
For VXLAN and NVGRE Linux kernel transmit side uses the entropy of the source 
port of the outer header.
It does that based on the inner header.
Refer to [1] as one example.

[1] https://elixir.bootlin.com/linux/latest/source/drivers/net/geneve.c#L922

But I think hash was requested for RSS with dpdk, no?


I think yes, at least probably the first customer to use the feature 
might be dpdk.:)






The lookup will work like this then:

calculate outer hash
if (rss[outer hash] & tunnel bit)

Tunnel bit, you mean tunneled packet, right?

this idea stores a bit in the indirection table
which signals which of the hashes to use for rss


This allows inner hash to have the ability to select a queue and place 
packets to the queue (that is, parallel to RSS),

which seems to be different from our discussion before v9. 

Thanks.




then
calculate inner hash
return rss[inner hash] & ~tunnel bit

Why to end with a tunnel bit?


this just clears the bit so we end up with a vq number.


else
return rss[outer hash]


this fixes the security issue returning us back to status quo : specific 
tunnels can
be directed to separate queues.


The number of tunnels is far higher than the number of queues with para virt 
driver doing decap.

True. This seeks to get us back to where we are before the feature:
driver can send specific outer hashes to specific queues.
outer hash collisions remain a problem.



This is for RSS.


For hash reporting indirection table is not used.
Maybe it is enough to signal to driver that inner hash was used.
We do need that signalling though.

My question would be whether it's practical to implement in hardware.

In above example, hw calculating double hash is difficult without much gain.
Either calculating on one inner or outer makes sense.

Signaling whether calculated on inner or outer is fine because hw exactly tells 
what it did.

This, in a sense, is what reporting hash tunnel type did.
Do you now think we need it?




-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-22 Thread Heng Qi




在 2023/2/23 上午10:50, Jason Wang 写道:

Hi:

在 2023/2/22 14:46, Heng Qi 写道:

Hi, Jason. Long time no see. :)

在 2023/2/22 上午11:22, Jason Wang 写道:


在 2023/2/22 01:50, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

+\subparagraph{Security risks between encapsulated packets and RSS}
+There may be potential security risks when encapsulated packets 
using RSS to
+select queues for placement. When a user inside a tunnel tries to 
control the



What do you mean by "user" here? Is it a remote or local one?



I mean a remote attacker who is not under the control of the tunnel 
owner.



Anything may the tunnel different? I think this can happen even 
without tunnel (and even with single queue).


I agree.



How to mitigate those attackers seems more like a implementation 
details where might require fair queuing or other QOS technology which 
has been well studied.


I am also not sure whether this point needs to be focused on in the 
spec, and I see that the protection against tunnel DoS is more protected 
outside the device,

but it seems to be okay to give some attack reminders.

Thanks.



It seems out of the scope of the spec (unless we want to let driver 
manageable QOS).


Thanks




Thanks.



+enqueuing of encapsulated packets, then the user can flood the 
device with invaild
+packets, and the flooded packets may be hashed into the same 
queue as packets in

+other normal tunnels, which causing the queue to overflow.
+
+This can pose several security risks:
+\begin{itemize}
+\item  Encapsulated packets in the normal tunnels cannot be 
enqueued due to queue

+   overflow, resulting in a large amount of packet loss.
+\item  The delay and retransmission of packets in the normal 
tunnels are extremely increased.
+\item  The user can observe the traffic information and enqueue 
information of other normal

+   tunnels, and conduct targeted DoS attacks.
+\end{\itemize}
+

Hmm with this all written out it sounds pretty severe.



I think we need first understand whether or not it's a problem that 
we need to solve at spec level:


1) anything make encapsulated packets different or why we can't hit 
this problem without encapsulation


2) whether or not it's the implementation details that the spec 
doesn't need to care (or how it is solved in real NIC)


Thanks



At this point with no ways to mitigate, I don't feel this is something
e.g. Linux can enable.  I am not going to nack the spec patch if
others  find this somehow useful e.g. for dpdk.
How about CC e.g. dpdk devs or whoever else is going to use this
and asking them for the opinion?







-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-28 Thread Heng Qi




在 2023/2/28 下午4:52, Michael S. Tsirkin 写道:

On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote:

On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin  wrote:

On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:

On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin  wrote:

On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:

Btw, this kind of 1:1 hash features seems not scalable and flexible.
It requires an endless extension on bits/fields. Modern NICs allow the
user to customize the hash calculation, for virtio-net we can allow to
use eBPF program to classify the packets. It seems to be more flexible
and scalable and there's almost no maintain burden in the spec (only
bytecode is required, no need any fancy features/interactions like
maps), easy to be migrated etc.

Prototype is also easy, tun/tap had an eBPF classifier for years.

Thanks

Yea BPF offload would be great to have. We have been discussing it for
years though - security issues keep blocking it. *Maybe* it's finally
going to be there but I'm not going to block this work waiting for BPF
offload.  And easily migrated is what BPF is not.

Just to make sure we're at the same page. I meant to find a way to
allow the driver/user to fully customize what it wants to
hash/classify. Similar technologies which is based on private solution
has been used by some vendors, which allow user to customize the
classifier[1]

ePBF looks like a good open-source solution candidate for this (there
could be others). But there could be many kinds of eBPF programs that
could be offloaded. One famous one is XDP which requires many features
other than the bytecode/VM like map access, tailcall. Starting from
such a complicated type is hard. Instead, we can start from a simple
type, that is the eBPF classifier. All it needs is to pass the
bytecode to the device, the device can choose to run it or compile it
to what it can understand for classifying. We don't need maps, tail
calls and other features.

Until people start asking exactly for maps because they want
state for their classifier?

Yes, but let's compare the eBPF without maps with the static feature
proposed here. It is much more scalable and flexible.


And it makes sense - if you want
e.g. load balancing you need stats which needs maps.

Yes, but we know it's possible to have that (through the XDP offload).
This is impossible with the approach proposed here.

I'm not actually objecting. And at least we then don't need to
worry about leaking info - it's not virtio leaking info
it's the bpf program. I wonder what does Heng Qi think.
Heng Qi would it work for your scenario?


We are positive on ebpf, which looks adequate in our scenario. Although 
it currently has some problems in offloading,
such as imperfect interfaces, unstable, and user-unfriendly ebpf codes 
may consume a lot of device resources. Device support for ebpf will also 
take time.
Also, the presence of ebpf offload does not conflict with other 
solutions, eg we still have RSS.


Our goal is to pass this patch first. For the support of ebpf 
offloading, we have not collected internal requirements for the time 
being, but it is indeed a good direction.


Thanks.




We don't need to worry about the security
because of its simplicity: the eBPF program is only in charge of doing
classification, no other interactions with the driver and packet
modification is prohibited. The feature is limited only to the
VM/bytecode abstraction itself.

What's more, it's a good first step to achieve full eBPF offloading in
the future.

Thanks

[1] 
https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html

Dave seems to have nacked this approach, no?

I may miss something but looking at kernel commit, there are few
patches to support that:

E.g

commit c7648810961682b9388be2dd041df06915647445
Author: Tony Nguyen 
Date:   Mon Sep 9 06:47:44 2019 -0700

 ice: Implement Dynamic Device Personalization (DDP) download

And it has been used by DPDK drivers.

Thanks

If we are talking about netdev then this discussion has to take place on netdev.
If it's dpdk this is more believable.


--
MST



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v10] virtio-net: support the virtqueue coalescing moderation

2023-03-01 Thread Heng Qi
Currently, coalescing parameters are grouped for all transmit and receive
virtqueues. This patch supports setting or getting the parameters for a
specified virtqueue, and a typical application of this function is netdim[1].

When the traffic between virtqueues is unbalanced, for example, one virtqueue
is busy and another virtqueue is idle, then it will be very useful to
control coalescing parameters at the virtqueue granularity.

[1] https://docs.kernel.org/networking/net_dim.html

Signed-off-by: Heng Qi 
Reviewed-by: Xuan Zhuo 
---
This patch is on top of Alvaro's latest v7 patch: 
https://lists.oasis-open.org/archives/virtio-dev/202302/msg00431.html .

v9->v10:
   1. Remove the "global values". @Parav Pandit
   2. Avoid multiple interpretations of command behavior. @Alvaro Karsz

v8->v9:
   1. Declare the commands that can be sent for each feature. @Alvaro Karsz
   2. Add information about "global values" in the command's explanation. 
@Alvaro Karsz

v7->v8:
   1. Use "best-effort" in Alvaro's patch instead of "the device may set 
the parameter to a value close to 2". @Michael S . Tsirkin, @David Edmondson

v6->v7:
   1. Clarify the relationship of VIRTIO_NET_CTRL_NOTF_COAL_TX/RX_SET and 
VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET. @Alvaro Karsz, @Michael S. Tsirkin
   2. Remove the formula for vqn range. @Parav Pandit
   3. Some expressions are clearer. @Parav Pandit, @Michael S. Tsirkin

v5->v6:
   1. Explain that the device may set a different value than the one passed 
in by the driver. @David Edmondson

v4->v5:
   1. Add the correspondence between virtio_net_ctrl_coal and 
virtio_net_ctrl_coal_vq and control commands. @Michael S. Tsirkin
   2. Add read and write attributes for each field. @Michael S. Tsirkin
   3. A clearer description of how to set coalescing parameters for vq 
reset. @Michael S. Tsirkin
   4. Fix some syntax errors. @Michael S. Tsirkin, @David Edmondson

v3->v4:
   1. Include virtio_net_ctrl_coal in the virtio_net_ctrl_coal_vq 
structure. @Alvaro Karsz
   2. Add consideration of vq reset. @Michael S. Tsirkin, @Parav Pandit, 
@Alvaro Karsz
   3. Avoid too many examples by giving a comprehensive example. @Michael 
S. Tsirkin
   4. Fix typos and streamline clarifications. @Michael S. Tsirkin, @Parav 
Pandit, @Alvaro Karsz

v2->v3:
   1. Add the netdim link. @Parav Pandit
   2. VIRTIO_NET_F_VQ_NOTF_COAL no longer depends on 
VIRTIO_NET_F_NOTF_COAL. @Michael S. Tsirkin, @Alvaro Karsz
   3. _VQ_GET is explained more. @Michael S. Tsirkin
   4. Add more examples to avoid misunderstandings. @Michael S. Tsirkin
   5. Clarify some statements. @Michael S. Tsirkin, @Parav Pandit, @Alvaro 
Karsz
   6. Adjust the virtio_net_ctrl_coal_vq structure. @Michael S. Tsirkin
   7. Fix some typos. @Michael S. Tsirkin

v1->v2:
   1. Rename VIRTIO_NET_F_PERQUEUE_NOTF_COAL to VIRTIO_NET_F_VQ_NOTF_COAL. 
@Michael S. Tsirkin
   2. Use the \field{vqn} instead of the qid. @Michael S. Tsirkin
   3. Unify tx and rx control structures into one structure 
virtio_net_ctrl_coal_vq. @Michael S. Tsirkin
   4. Add a new control command VIRTIO_NET_CTRL_NOTF_COAL_VQ. @Michael S. 
Tsirkin, @Parav Pandit, @Alvaro Karsz
   5. The special value 0xFFF is removed because VIRTIO_NET_CTRL_NOTF_COAL 
can be used. @Alvaro Karsz
   6. Clarify some special scenarios. @Michael S. Tsirkin, @Parav Pandit, 
@Alvaro Karsz

 device-types/net/description.tex | 88 
 1 file changed, 77 insertions(+), 11 deletions(-)

diff --git a/device-types/net/description.tex b/device-types/net/description.tex
index e71e33b..b5d45e8 100644
--- a/device-types/net/description.tex
+++ b/device-types/net/description.tex
@@ -83,6 +83,8 @@ \subsection{Feature bits}\label{sec:Device Types / Network 
Device / Feature bits
 \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
 channel.
 
+\item[VIRTIO_NET_F_VQ_NOTF_COAL(52)] Device supports virtqueue notification 
coalescing.
+
 \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
 
 \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
@@ -139,6 +141,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
 \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
 \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
 \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_VQ_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
 \end{description}
 
 \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / 
Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -1505,13 +1508,13 @@ \subsubsection{Control Virtqueue}\label{sec:Device 
Types / Network Device / Devi
 
 \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device

Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-01 Thread Heng Qi




在 2023/3/1 下午7:07, Michael S. Tsirkin 写道:

On Wed, Mar 01, 2023 at 11:30:37AM +0800, Heng Qi wrote:


在 2023/2/28 下午7:16, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?
For example geneve spec says:

 it is necessary for entropy from encapsulated packets to be
 exposed in the tunnel header.  The most common technique for this is
 to use the UDP source port

same goes for vxlan did not check further.

so what is the problem?  and which tunnel types actually suffer from the
problem?


In fact, similar to protocols such as GRE, there is no outer transport
header.

Thanks.


Sorry I don't understand the answer. What is similar to what?
By GRE you mean NVGRE? That has FlowID for this purpose.
Only 8 bit - is this the issue? Not enough entropy?


Sorry I almost missed this email. 

Did you miss the reply in the other thread:
"
The end point of the tunnel called the gateway (with DPDK on top of it).

1. When there is no inner header hash, entropy can be inserted into the 
udp src port of the outer header of the tunnel,
and then the tunnel packet is handed over to the host. The host needs to 
take out a part of the CPUs to parse the outer headers (but not drop them)
to calculate the inner hash for the inner payloads, and then use the 
inner hash to forward them to another part of the CPUs that are 
responsible for processing.
1). During this process, the CPUs on the host are divided into two 
parts, one part is used as a forwarding node to parse the outer header,

  and the CPU utilization is low. Another part handles packets.
2). The entropy of the source udp src port is not enough, that is, the 
queue is not widely distributed.


2. When there is an inner header hash, the gateway will directly help 
parse the outer header, and use the inner 5 tuples to calculate the 
inner hash.

The tunneled packet is then handed over to the host.
1) All the CPUs of the host are used to process data packets, and there 
is no need to use some CPUs to forward and parse the outer header.
2) The entropy of the original quintuple is sufficient, and the queue is 
widely distributed.

"

In this thread, I mean protocols such as Generic Routing Encapsulation 
(GRE)[1], which have IPv4 as *Delivery Header*.
Compared with VXLAN, which increases entropy through outer udp src port, 
GRE has less entropy.


[1] https://www.rfc-editor.org/rfc/rfc2784.html

Thanks.









-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v5] virtio-net: support the virtqueue coalescing moderation

2023-02-21 Thread Heng Qi



在 2023/2/21 下午7:48, David Edmondson 写道:

On Tuesday, 2023-02-21 at 16:38:52 +08, Heng Qi wrote:

...
+A device MAY set the coalescing parameter to a value close to a power of 2 
value.

What is this about?

If it is intended to indicate that a device may use a value different to
that passed by the driver, more text to describe that would be


Yes. How about adding the following explanation to the "Operation" 
subparagraph?

"
When a device receives the VIRTIO_NET_CTRL_NOTF_COAL commands to set an 
coalescing parameter, it may set the parameter to a value close to a 
power of 2. For example:
If the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command with \field{max_usecs} = 
7 is received, the device may set \field{max_usecs} = 8 for a given 
enabled virtqueue.

"

Thanks.


useful. If not, what does it mean?


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Heng Qi




在 2023/2/22 上午7:18, Michael S. Tsirkin 写道:

On Tue, Feb 21, 2023 at 10:32:11PM +, Parav Pandit wrote:

From: Michael S. Tsirkin 
Sent: Tuesday, February 21, 2023 4:46 PM

What is this information driver can't observe? It sees all the packets after 
all,
we are not stripping tunneling headers.

Just the tunnel type.
If/when that tunnel header is stripped, it gets complicated where tunnel type 
is still present in the virtio_net_hdr because hash_report_tunnel feature bit 
is negotiated.

whoever strips off the tunnel has I imagine strip off the virtio net hdr
too - everything else in it such as gso type refers to the outer packet.


I also don't really know what are upper layer drivers - for sure layering of
drivers is not covered in the spec for now so I am not sure what do you mean by
that.  The risk I mentioned is leaking the information *on the network*.


Got it.





\begin{lstlisting}  struct virtio_net_rss_config {

  le32 hash_types;
+le32 hash_tunnel_types;

This field is not needed as device config space advertisement
for the support

is enough.

If the intent is to enable hashing for the specific tunnel(s),
an individual

command is better.

new command? I am not sure why we want that. why not handle
tunnels like we do other protocols?

I didn't follow.
We probably discussed in another thread that to set M bits, it is
wise to avoid

setting N other bits just to keep the command happy, where N >>> M
and these N have a very strong relation in hw resource setup and packet

steering.

Any examples of 'other protocols'?

#define VIRTIO_NET_HASH_TYPE_IPv4  (1 << 0)
#define VIRTIO_NET_HASH_TYPE_TCPv4 (1 << 1)
#define VIRTIO_NET_HASH_TYPE_UDPv4 (1 << 2)

this kind of thing.

I don't see how a tunnel is different fundamentally. Why does it
need its own field?

Driver is in control to enable/disable tunnel based inner hash acceleration

only when its needed.

This way certain data path hw parsers can be enabled/disabled.
Without this it will be always enabled even if there may not be any user of it.
Device has scope to optimize this flow.

I feel you misunderstand the question. Or maybe I misunderstand what you are
proposing.  So tunnels need their own bits. But why a separate field and not 
just
more bits along the existing ones?

Because the hashing is not covering the outer header contents.

We may be still not discussing the same.
So let me refresh the context.

The question of discussion was,
Scenario:
1. device advertises the ability to hash on the inner packet header.
2. device prefers that driver enable it only when it needs to use this extra 
packet parser in hardware.

There are 3 options.
a. Because the feature is negotiated, it means it is enabled for all the tunnel 
types.
Pros:
1. No need to extend cvq cmd.
Cons:
1. device parser is always enabled, and the driver never uses it. This may 
result in inferior rx performance.

b. Since the feature is useful in a narrow case of sw-based vxlan etc driver, 
better not to enable hw for it.
Hence, have the knob to explicitly enable in hw.
So have the cvq command.
b.1 should it be combined with the existing command?
Cons:
a. when the driver wants to enable hash on inner, it needs to supply the exact 
same RSS config as before. Sw overhead with no gain.
b. device needs to parse new command value, compare with old config, and drop 
the RSS config, just enable inner hashing hw parser.
Or destroy the old rss config and re-apply. This results in weird behavior for 
the short interval with no apparent gain.

b.2 should it be on its own command?
Pros:
a. device and driver doesn't need to bother about b.1.a and b.1.b.
b. still benefits from not always enabling hw parser, as this is not a common 
case.
c. has the ability to enable when needed.

I prefer b.1. With reporting of the tunnel type gone I don't see a
fundamental difference between hashing over tunneling types and other
protocol types we support.  It's just a flag telling device over which
bits to calculate the hash. We don't have a separate command for hashing
of TCPv6, why have it for vxlan?  Extending with more HASH_TYPE makes
total sense to me, seems to fit better with the existing design and will
make patch smaller.


+1.

It is infrequent to configure the *tunnel hash types* through commands, 
and when configuring the *hash types*,

the hash key and indirection table are not required too.







-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-28 Thread Heng Qi




在 2023/2/28 下午7:16, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?


Yes, you are right. That's what we did before the inner header hash, but 
it has a performance penalty, which I'll explain below.



For example geneve spec says:

it is necessary for entropy from encapsulated packets to be
exposed in the tunnel header.  The most common technique for this is
to use the UDP source port


The end point of the tunnel called the gateway (with DPDK on top of it).

1. When there is no inner header hash, entropy can be inserted into the 
udp src port of the outer header of the tunnel,
and then the tunnel packet is handed over to the host. The host needs to 
take out a part of the CPUs to parse the outer headers (but not drop them)
to calculate the inner hash for the inner payloads, and then use the 
inner hash to forward them to another part of the CPUs that are 
responsible for processing.
1). During this process, the CPUs on the host is divided into two parts, 
one part is used as a forwarding node to parse the outer header,

 and the CPU utilization is low. Another part handles packets.
2). The entropy of the source udp src port is not enough, that is, the 
queue is not widely distributed.


2. When there is an inner header hash, the gateway will directly help 
parse the outer header, and use the inner 5 tuples to calculate the 
inner hash.

The tunneled packet is then handed over to the host.
1) All the CPUs of the host are used to process data packets, and there 
is no need to use some CPUs to forward and parse the outer header.
2) The entropy of the original quintuple is sufficient, and the queue is 
widely distributed.


Thanks.



same goes for vxlan did not check further.

so what is the problem?  and which tunnel types actually suffer from the
problem?




-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-28 Thread Heng Qi




在 2023/2/28 下午7:16, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?
For example geneve spec says:

it is necessary for entropy from encapsulated packets to be
exposed in the tunnel header.  The most common technique for this is
to use the UDP source port

same goes for vxlan did not check further.

so what is the problem?  and which tunnel types actually suffer from the
problem?



In fact, similar to protocols such as GRE, there is no outer transport 
header.


Thanks.






-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Heng Qi



在 2023/2/21 下午8:47, Parav Pandit 写道:



From: virtio-comm...@lists.oasis-open.org  On Behalf Of Heng Qi

Also, a feature bit VIRTIO_NET_F_HASH_REPORT_TUNNEL are added to
report an encapsulation type, and the feature depends on
VIRTIO_NET_F_HASH_REPORT.

As we discussed that tunnel type alone is not useful the sw, neither as an

individual field nor merged with some other field.

Hence, please remove this feature bit. HASH_TUNNEL is good enough.
Please remove the references to it at more places below.

If we don't want it at all, I'll remove it and references to it.


Ok. thanks.
[..]

   #define VIRTIO_NET_HASH_TYPE_UDP_EX(1 << 8)
   \end{lstlisting}


Lets please remove the below encoding.

Here is the encoding of the hash tunnel types, I guess you are referring to
remove the encoding of the hash_report_tunnel types?


Right.
[..]

+RSS to select queues for placement. When a user inside a tunnel
+tries to control the enqueuing of encapsulated packets, then the
+user can flood the device with invaild packets, and the flooded
+packets may be hashed into the same queue as packets in other normal
+tunnels, which causing
the queue to overflow.


Invalid packets are confusing and the wording of "which causing" is not

proper.

There is some duplicate wording below too.

I think above and below risk can be summarized in bit simpler manner.

How about,

When a specific receive queue is shared to receive packets of multiple

tunnels, there is no quality of service for packets of multiple tunnels.

+

I think this sentence can be used as a starting summary, and readers may still
need to expand the explanation.
Do you think the following description is ok?
"
When a specific receive queue is shared to receive encapsulating packets of
multiple tunnels, there is no quality of service for these packets of multiple
tunnels.
For example:
A user inside the tunnel floods a device with packets, then the packets are
hashed into the shared receive queue and cause the queue to overflow, and
this increases packet loss and latency for other tunnels.
"

Even without a queue overflow, this shared receive queue may not have a 
balanced number of packets.
For example, tunnel-2 occupied 90% of the queue and left only 10% for tunnel-1.
So, your example is right (and extreme), a generic mention of QoS covers both 
aspects.

Secondly "user inside the tunnel" is challenging to explain.
In above sentence talks specifically about the "receive queue" as an object.
  


"overflow" in my example. If we only use a one-sentence summary to 
describe tunnel risks, then I think this subparagraph is called "QoS 
problem for tunnel hash".



struct virtio_net_rss_config {

   le32 hash_types;
+le32 hash_tunnel_types;

This field is not needed as device config space advertisement for the support

is enough.

If so, virtio_net_hash_config does not require hash_tunnel_types when it does
not need to configure the specific tunnel(s).


If the intent is to enable hashing for the specific tunnel(s), an individual

command is better.

Drivers MAY filter out some tunneling types when negotiating this feature.
Do you think it would be better for us to add a separate command? I don't see
tools like ethtool that can configure specific tunnels in userspace.


The reason I proposed different command is,
Let's say we have only single command.
Rss config command has many other fields unrelated to the inner hash.


Yes. For inner hash, fields such as indirection table are irrelevant.


Hence, to enable inner hash, the driver needs to supply the exact same value 
for unrelated fields to the same value.
And device needs to compare it with the old value and maintain some sort of 
cache to derive that nothing changes from the previous hash config, hence, 
ignore hw configuration for rss.

This mechanism slows down the command for the unrelated task.


Totally agree.


Hence, I am considering a separate command that would be simple for the device 
and driver to implement.


I agree with you. Do you want me to do it in this patch, or should we do 
it in another patch?


Thanks! :)




Regardless, this new field cannot be in the middle of the new structure as it

breaks backward compatibility.
Yes, you are right. I'll fix this.

Thank you very much!


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] RE: [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Heng Qi



在 2023/2/21 下午8:47, Parav Pandit 写道:



From: virtio-comm...@lists.oasis-open.org  On Behalf Of Heng Qi

Also, a feature bit VIRTIO_NET_F_HASH_REPORT_TUNNEL are added to
report an encapsulation type, and the feature depends on
VIRTIO_NET_F_HASH_REPORT.

As we discussed that tunnel type alone is not useful the sw, neither as an

individual field nor merged with some other field.

Hence, please remove this feature bit. HASH_TUNNEL is good enough.
Please remove the references to it at more places below.

If we don't want it at all, I'll remove it and references to it.


Ok. thanks.
[..]

   #define VIRTIO_NET_HASH_TYPE_UDP_EX(1 << 8)
   \end{lstlisting}


Lets please remove the below encoding.

Here is the encoding of the hash tunnel types, I guess you are referring to
remove the encoding of the hash_report_tunnel types?


Right.
[..]

+RSS to select queues for placement. When a user inside a tunnel
+tries to control the enqueuing of encapsulated packets, then the
+user can flood the device with invaild packets, and the flooded
+packets may be hashed into the same queue as packets in other normal
+tunnels, which causing
the queue to overflow.


Invalid packets are confusing and the wording of "which causing" is not

proper.

There is some duplicate wording below too.

I think above and below risk can be summarized in bit simpler manner.

How about,

When a specific receive queue is shared to receive packets of multiple

tunnels, there is no quality of service for packets of multiple tunnels.

+

I think this sentence can be used as a starting summary, and readers may still
need to expand the explanation.
Do you think the following description is ok?
"
When a specific receive queue is shared to receive encapsulating packets of
multiple tunnels, there is no quality of service for these packets of multiple
tunnels.
For example:
A user inside the tunnel floods a device with packets, then the packets are
hashed into the shared receive queue and cause the queue to overflow, and
this increases packet loss and latency for other tunnels.
"

Even without a queue overflow, this shared receive queue may not have a 
balanced number of packets.
For example, tunnel-2 occupied 90% of the queue and left only 10% for tunnel-1.
So, your example is right (and extreme), a generic mention of QoS covers both 
aspects.

Secondly "user inside the tunnel" is challenging to explain.
In above sentence talks specifically about the "receive queue" as an object.
  

struct virtio_net_rss_config {

   le32 hash_types;
+le32 hash_tunnel_types;

This field is not needed as device config space advertisement for the support

is enough.

If so, virtio_net_hash_config does not require hash_tunnel_types when it does
not need to configure the specific tunnel(s).


If the intent is to enable hashing for the specific tunnel(s), an individual

command is better.

Drivers MAY filter out some tunneling types when negotiating this feature.
Do you think it would be better for us to add a separate command? I don't see
tools like ethtool that can configure specific tunnels in userspace.


The reason I proposed different command is,
Let's say we have only single command.
Rss config command has many other fields unrelated to the inner hash.


And virtio_net_hash_config seems to suffice except for le16 reserved[4].


Hence, to enable inner hash, the driver needs to supply the exact same value 
for unrelated fields to the same value.
And device needs to compare it with the old value and maintain some sort of 
cache to derive that nothing changes from the previous hash config, hence, 
ignore hw configuration for rss.

This mechanism slows down the command for the unrelated task.
Hence, I am considering a separate command that would be simple for the device 
and driver to implement.


Regardless, this new field cannot be in the middle of the new structure as it

breaks backward compatibility.
Yes, you are right. I'll fix this.

Thank you very much!

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio-comment] [PATCH v13] virtio-net: support inner header hash

2023-04-26 Thread Heng Qi




在 2023/4/26 下午10:48, Michael S. Tsirkin 写道:

On Wed, Apr 26, 2023 at 10:14:30PM +0800, Heng Qi wrote:

This does not mean that every device needs to implement and support all of
these, they can choose to support some protocols they want.

I add these because we have scale application scenarios for modern protocols
VXLAN-GPE/GENEVE:

+\item In scenarios where the same flow passing through different tunnels is 
expected to be received in the same queue,
+  warm caches, lessing locking, etc. are optimized to obtain receiving 
performance.


Maybe the legacy GRE, VXLAN-GPE and GENEVE? But it has a little crossover.

Thanks.

But VXLAN-GPE/GENEVE can use source port for entropy.

It is recommended that the UDP source port number
 be calculated using a hash of fields from the inner packet

That is best because
it allows end to end control and is protocol agnostic.


Yes. I agree with this, I don't think we have an argument on this point 
right now.:)


For VXLAN-GPE/GENEVE or other modern tunneling protocols, we have to 
deal with

scenarios where the same flow passes through different tunnels.

Having them hashed to the same rx queue, is hard to do via outer headers.


All that is missing is symmetric Toepliz and all is well?


The scenarios above or in the commit log also require inner headers.


Thanks.






-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] RE: [virtio-comment] RE: [PATCH v13] virtio-net: support inner header hash

2023-04-26 Thread Heng Qi




在 2023/4/26 下午10:24, Parav Pandit 写道:



From: Heng Qi 
Sent: Wednesday, April 26, 2023 10:04 AM
Yes, but that seems like a tiny cost, and the cvq command-related structure is
much simpler.

Current structure size is 24 bytes.
This size becomes multiplier with device count scale to be always available and 
rarely changes.

As we add new features such device capabilities grow making the multiplier 
bigger.
For example
a. flow steering capabilities (how many flows, what mask, supported protocols, 
generic options)
b. hds capabilities
c. counter capabilities (histogram based, which error counters supported, etc)
d. which new type of tx vq improvements supported.
e. hw gro context count supported

May be more..

Depending on the container/VM size certain capabilities may change from device 
to device.
Hence it is hard to deduplicate them at device level.


This makes sense. In general, we should be careful about adding things 
to the device space unless the benefit is non-trivial.


Thanks.



Therefore, ability to query them over a non_always_available transport is 
preferred choice from the device.

A driver may choose to cache it if its being frequently accessed or ask device 
when needed.
Even when it's cached by driver, it is coming from the component that doesn’t 
have transport level timeouts associated with it.



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v13] virtio-net: support inner header hash

2023-04-26 Thread Heng Qi




在 2023/4/26 上午4:28, Parav Pandit 写道:



On 4/23/2023 3:35 AM, Heng Qi wrote:
    \subsubsection{Legacy Interface: Feature bits}\label{sec:Device 
Types / Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -198,6 +202,7 @@ \subsection{Device configuration 
layout}\label{sec:Device Types / Network Device

  u8 rss_max_key_size;
  le16 rss_max_indirection_table_length;
  le32 supported_hash_types;
+    le32 supported_tunnel_hash_types;
  };
In v12 I was asking this to move to above field from the config area 
to the GET command in comment [1] as,


"With that no need to define two fields at two different places in 
config area and also in cvq."


I'm not sure if this is sufficiently motivated, RSS also has 
supports_hash_types in config space.


We don't actually need cvq and config to sync on 
supported_tunnel_hash_types, since it doesn't need to change (meaning 
supported_tunnel_hash_types doesn't send configuration change 
notifications).




I am sorry if that was not clear enough.

[1] 
https://lore.kernel.org/virtio-dev/569cbaf9-f1fb-0e1f-a2ef-b1d7cd7db...@nvidia.com/



  \subparagraph{Supported/enabled hash types}
  \label{sec:Device Types / Network Device / Device Operation / 
Processing of Incoming Packets / Hash calculation for incoming 
packets / Supported/enabled hash types}
+This paragraph relies on definitions from \hyperref[intro:IP]{[IP]}, 
\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.

  Hash types applicable for IPv4 packets:
  \begin{lstlisting}
  #define VIRTIO_NET_HASH_TYPE_IPv4  (1 << 0)
@@ -980,6 +993,152 @@ \subsubsection{Processing of Incoming 
Packets}\label{sec:Device Types / Network
  (see \ref{sec:Device Types / Network Device / Device Operation / 
Processing of Incoming Packets / Hash calculation for incoming 
packets / IPv6 packets without extension header}).

  \end{itemize}
  +\paragraph{Inner Header Hash}
+\label{sec:Device Types / Network Device / Device Operation / 
Processing of Incoming Packets / Inner Header Hash}

+
+If VIRTIO_NET_F_HASH_TUNNEL has been negotiated, the device supports 
inner header hash and the driver can send
+commands VIRTIO_NET_CTRL_TUNNEL_HASH_SET and 
VIRTIO_NET_CTRL_TUNNEL_HASH_GET for the inner header hash configuration.

+
+struct virtio_net_hash_tunnel_config {

Please move field from the config struct to here. Both are RO fields.

le32 supported_hash_tunnel_types;

+    le32 hash_tunnel_types;
+};
+
+#define VIRTIO_NET_CTRL_TUNNEL_HASH 7
+ #define VIRTIO_NET_CTRL_TUNNEL_HASH_SET 0
+ #define VIRTIO_NET_CTRL_TUNNEL_HASH_GET 1
+
+Filed \field{hash_tunnel_types} contains a bitmask of configured 
hash tunnel types as
+defined in \ref{sec:Device Types / Network Device / Device Operation 
/ Processing of Incoming Packets / Hash calculation for incoming 
packets / Supported/enabled hash tunnel types}.

+
+The class VIRTIO_NET_CTRL_TUNNEL_HASH has the following commands:
+\begin{itemize}
+\item VIRTIO_NET_CTRL_TUNNEL_HASH_SET: set the 
\field{hash_tunnel_types} to configure the inner header hash 
calculation for the device.
+\item VIRTIO_NET_CTRL_TUNNEL_HASH_GET: get the 
\field{hash_tunnel_types} from the device.

+\end{itemize}
+
+For the command VIRTIO_NET_CTRL_TUNNEL_HASH_SET, the structure 
virtio_net_hash_tunnel_config is write-only for the driver.
+For the command VIRTIO_NET_CTRL_TUNNEL_HASH_GET, the structure 
virtio_net_hash_tunnel_config is read-only for the driver.

+
You need to split the structures to two, one for get and one for set 
in above description as get and set contains different fields.

+
+If VIRTIO_NET_HASH_TUNNEL_TYPE_NONE is set or the encapsulation type 
is not included in \field{hash_tunnel_types},
+the hash of the outer header is calculated for the received 
encapsulated packet.

+
+
+For scenarios with sufficient external entropy or no internal 
hashing requirements, inner header hash may not be needed:
+A tunnel is often expected to isolate the external network from the 
internal one. By completely ignoring entropy
+in the external header and replacing it with entropy from the 
internal header, for hash calculations, this expectation

You wanted to say inner here like rest of the places.

s/internal header/inner header


I want to make the 'external' and 'internal' correspond, but avoid the 
internal header, and use a unified 'inner header' is also reasonable.:)





+The driver MUST NOT set any VIRTIO_NET_HASH_TUNNEL_TYPE_ flags that 
are not supported by the device.

Multiple flags so,

s/flags that are/flags which are/


Will fix.

Thanks!


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-30 Thread Heng Qi




在 2023/3/21 上午3:48, Michael S. Tsirkin 写道:

On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:

We use the most basic GRE header fields (not NVGRE), not even optional
fields.

I'd say yes, the most convincing usecase is with legacy GRE.


Yes. But we still have a strong need for VXLAN and GENEVE to do 
symmetric hashing. Please consider this.



Given that, do you need the rest of protocols there?


I would say that I checked the current tunneling protocols used for 
overlay networks and their respective RFC versions compared to each other.


They are:

1. GRE_rfc2784 :This protocol is only specified for IPv4 and used as 
either the payload or delivery protocol.

    link : https://datatracker.ietf.org/doc/rfc2784/

2. GRE_rfc2890: This protocol describes extensions by which two fields, 
Key and Sequence Number, can be optionally carried in the GRE Header.

    link: https://www.rfc-editor.org/rfc/rfc2890

3. GRE_rfc7676: IPv6 Support for Generic Routing Encapsulation (GRE). 
This protocol is specified for IPv6 and used as either the payload or 
delivery protocol.
    Note that this does not change the GRE header format or any 
behaviors specified by RFC 2784 or RFC 2890.

    link: https://datatracker.ietf.org/doc/rfc7676/

4. GRE-in-UDP: GRE-in-UDP Encapsulation. This specifies a method of 
encapsulating network protocol packets within GRE and UDP headers.
    This GRE-in-UDP encapsulation allows the UDP source port field to 
be used as an entropy field. This protocol is specified for IPv4 and 
IPv6, and used as either the payload or delivery protocol.

    link: https://www.rfc-editor.org/rfc/rfc8086

5. VXLAN: Virtual eXtensible Local Area Network.
    link: https://datatracker.ietf.org/doc/rfc7348/

6. VXLAN-GPE: Generic Protocol Extension for VXLAN. This protocol 
describes extending Virtual eXtensible Local Area Network (VXLAN) via 
changes to the VXLAN header.

    link: https://www.ietf.org/archive/id/draft-ietf-nvo3-vxlan-gpe-12.txt

7. GENEVE: Generic Network Virtualization Encapsulation.
    link: https://datatracker.ietf.org/doc/rfc8926/

8. IPIP: IP Encapsulation within IP.
    link: https://www.rfc-editor.org/rfc/rfc2003

9. NVGRE: Network Virtualization Using Generic Routing Encapsulation
    link: https://www.rfc-editor.org/rfc/rfc7637.html

10. STT: Stateless Transport Tunneling. STT is particularly useful when 
some tunnel endpoints are in end-systems, as it utilizes the 
capabilities of the network interface card to improve performance.

  link: https://www.ietf.org/archive/id/draft-davie-stt-08.txt

Among them, GRE_rfc2784, VXLAN and GENEVE are our internal requirements 
for inner header hashing.

GRE_rfc2784 requires RSS hashing to different queues.
For the monitoring scenario I mentioned, VXLAN or GRE_rfc2890 also needs 
to use inner symmetric hashing.


I know you mean to want this feature to only support GRE_rfc2784, since 
it's the most convincing for RSS.

But RSS hashes packets to different queues for different streams.
For the same flow, it needs to hash it to the same queue.
So this doesn't distort the role of RSS, and I believe that for modern 
protocols like VXLAN and others, inner symmetric hashing is still a 
common requirement for other vendors using virtio devices.


So, can we make this feature support all the protocols I have checked 
above, so that vendors can choose to support the protocols they want. 
And this can avoid the addition of new tunnel protocols

in the near future as much as possible.

Do you think it's ok?

Again: I'm so sorry I didn't realize I missed this until I checked my 
emails. 



We can start with just legacy GRE (think about including IPv6 or not).
Given how narrow this usecase is, I'd be fine with focusing
just on this, and addressing more protocols down the road
with something programmable like BPF. WDYT?




-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-30 Thread Heng Qi




在 2023/3/21 上午3:45, Michael S. Tsirkin 写道:

On Thu, Mar 16, 2023 at 09:17:26PM +0800, Heng Qi wrote:

On Wed, Mar 15, 2023 at 10:57:40AM -0400, Michael S. Tsirkin wrote:

On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:


在 2023/3/15 下午7:58, Michael S. Tsirkin 写道:

On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:


在 2023/3/10 上午3:36, Michael S. Tsirkin 写道:

On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:

在 2023/3/8 下午10:39, Michael S. Tsirkin 写道:

On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:

在 2023/2/28 下午7:16, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?

Yes, you are right. That's what we did before the inner header hash, but it
has a performance penalty, which I'll explain below.


For example geneve spec says:

it is necessary for entropy from encapsulated packets to be
exposed in the tunnel header.  The most common technique for this is
to use the UDP source port

The end point of the tunnel called the gateway (with DPDK on top of it).

1. When there is no inner header hash, entropy can be inserted into the udp
src port of the outer header of the tunnel,
and then the tunnel packet is handed over to the host. The host needs to
take out a part of the CPUs to parse the outer headers (but not drop them)
to calculate the inner hash for the inner payloads,
and then use the inner
hash to forward them to another part of the CPUs that are responsible for
processing.

I don't get this part. Leave inner hashes to the guest inside the
tunnel, why is your host doing this?

Let's simplify some details and take a fresh look at two different
scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).

1. In Scenario1, we can improve the processing performance of the same flow
by implementing inner symmetric hashing.

This is because even though client1 and client2 communicate bidirectionally
through the same flow, their data may pass

through and be encapsulated by different tunnels, resulting in the same flow
being hashed to different queues and processed by different CPUs.

To ensure consistency and optimized processing, we need to parse out the
inner header and compute a symmetric hash on it using a special rss key.

Sorry for not mentioning the inner symmetric hash before, in order to
prevent the introduction of more concepts, but it is indeed a kind of inner
hash.

If parts of a flow go through different tunnels won't this cause
reordering at the network level? Why is it so important to prevent it at
the nic then?  Or, since you are stressing symmetric hash, are you
talking about TX and RX side going through different tunnels?

Yes, the directions client1->client2 and client2->client1 may go through
different tunnels.
Using inner symmetric hashing can satisfy the same CPU to process two
directions of the same flow to improve performance.

Well sure but ... are you just doing forwarding or inner processing too?

When there is an inner hash, there is no forwarding anymore.


If forwarding why do you care about matching TX and RX queues? If e2e

In fact, we are just matching on the same rx queue. The network topology
is roughly as follows. The processing host will receive the packets
sent from client1 and client2 respectively, then make some action judgments,
and return them to client2 and client1 respectively.

client1   client2
| |
|  __ |
+->| tunnel |<+
   ||
  |  |
  |  |
  |  |
  v  v
+-+
| processing host |
+-+

Thanks.

monotoring host would be a better term


Sure.

I'm so sorry I didn't realize I missed this until I checked my emails.  :(





processing can't you just store the incoming hash in the flow and reuse
on TX? This is what Linux is doing...






2. In Scenario2 with GRE, the lack of outer transport headers means that
flows between multiple communication pairs encapsulated by the same tunnel

will all be hashed to the same queue. To address this, we need to implement
inner hashing to improve the performance of RSS. By parsing and calculating

the inner hash, different flows can be hashed to different queues.

Thanks.



Well 2 is at least inexact, there's flowID there. It's just 8 bit

We use the most basic GRE header fields (not NVGRE), not even optional
fields.
There is also no flow id in the GRE header, should you be referring to
NVGRE?

Thanks.


so not sufficient if there are more than 512 queues. Still 512 queues
is quite a lot. Are you try

[virtio-dev] Re: [PATCH v12] virtio-net: support inner header hash

2023-04-11 Thread Heng Qi




在 2023/4/12 上午5:03, Parav Pandit 写道:



On 4/3/2023 12:58 AM, Heng Qi wrote:

To achieve this, the device can calculate a suitable hash based on 
the inner headers
of this flow, for example using the Toeplitz combined with a 
symmetric hash key.


I am not sure you need symmetric hash key. Toeplitz with symmetric 
hashing without the symmetric key is possible too.


So just mentioning it as a 'combined with symmetric hashing' is enough.


Yes, as discussed with Michael, we will also support XOR hashing or 
Toeplitz symmetric hashing or even both, I'm thinking of starting a 
draft in a separate thread to let us have initial discussions.


The statement here I will modify.



  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device 
Types / Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -198,6 +202,7 @@ \subsection{Device configuration 
layout}\label{sec:Device Types / Network Device

  u8 rss_max_key_size;
  le16 rss_max_indirection_table_length;
  le32 supported_hash_types;
+    le32 supported_tunnel_hash_types;
  };
Given that a set command is added via cvq, it make sense to also do 
symetrric work to get it via a cvq.
This is also similar to the latest work for notification coalescing 
for VQ where get and set done using single channel = cvq.


Only set is given to match the existing RSS/HASH configuration methods. 
But we should really look ahead as you suggest.




Granted that RSS and other fields are done differently, but it was bit 
in the past.


Yes.



With that no need to define two fields at two different places in 
config area and also in cvq.


Just the new opcode is needed for GET and things will be fine.


Right.



+If VIRTIO_NET_F_HASH_TUNNEL has been negotiated, the device supports 
inner header hash and
+the driver can configure the inner header hash calculation for 
encapsulated packets \ref{sec:Device Types / Network Device / Device 
OperatiHn / Processing of Incoming Packets / Hash calculation for 
incoming packets / Tunnel/Encapsulated packet}
+by issuing the command VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG from the 
class VIRTIO_NET_CTRL_MQ.
+The command sets \field{hash_tunnel_types} in the structure 
virtio_net_hash_tunnel_config.

+
+struct virtio_net_hash_tunnel_config {
+    le32 hash_tunnel_types;
+};
+

VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG_SET
and
VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG_GET


Will do.



+Filed \field{hash_tunnel_types} contains a bitmask of supported hash 
tunnel types as
+defined in \ref{sec:Device Types / Network Device / Device Operation 
/ Processing of Incoming Packets / Hash calculation for incoming 
packets / Supported/enabled hash tunnel types}.

+
+\subparagraph{Tunnel/Encapsulated packet}
+\label{sec:Device Types / Network Device / Device Operation / 
Processing of Incoming Packets / Hash calculation for incoming 
packets / Tunnel/Encapsulated packet}

+
+A tunnel packet is encapsulated from the original packet based on 
the tunneling
+protocol (only a single level of encapsulation is currently 
supported). The
+encapsulated packet contains an outer header and an inner header, 
and the device

+calculates the hash over either the inner header or the outer header.
+
+If VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received 
encapsulated packet's
+outer header matches one of the supported \field{hash_tunnel_types}, 
the hash

+of the inner header is calculated.

s/one of the supported/one of the configured/
Because support comes from GET or config space area; out of which a 
smaller or equal subset of tunnel types are configured.


Yes, configured is obviously more accurate than supported.



+\devicenormative{\subparagraph}{Inner Header Hash}{Device Types / 
Network Device / Device Operation / Control Virtqueue / Inner Header 
Hash}

+
+The device MUST calculate the outer header hash if the received 
encapsulated packet has an encapsulation type not in 
\field{supported_tunnel_hash_types}.

+

Since the configured set can be smaller, a better reword is:
The device MUST calculate the hash from the outer header if the 
received encapsulated packet type is not matching from hash_tunnel_types.


Will modify.



+The device MUST respond to the VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG 
command with VIRTIO_NET_ERR if the device
+received an unrecognized or unsupported VIRTIO_NET_HASH_TUNNEL_TYPE_ 
flag.

+
+Upon reset, the device MUST initialize \field{hash_tunnel_type} to 0.
+
+\drivernormative{\subparagraph}{Inner Header Hash}{Device Types / 
Network Device / Device Operation / Control Virtqueue / Inner Header 
Hash}

+
+The driver MUST have negotiated the feature VIRTIO_NET_F_HASH_TUNNEL 
when issuing the command VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG.

+
+The driver MUST NOT set any VIRTIO_NET_HASH_TUNNEL_TYPE_ flags that 
are not supported by the device.

+
  \paragraph{Hash reporting for incoming packets}
  \label{sec:Device Types / Network Device / Device Operation / 
Processing of Incoming Packets / Hash reporting

Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v12] virtio-net: support inner header hash

2023-04-13 Thread Heng Qi




在 2023/4/14 上午11:10, Jason Wang 写道:

On Fri, Apr 14, 2023 at 5:46 AM Michael S. Tsirkin  wrote:

On Thu, Apr 13, 2023 at 07:03:26PM +0800, Heng Qi wrote:

   For example, when the packets of certain
+tunnels are spread across multiple receive queues, these receive
queues may have an unbalanced
+amount of packets. This can cause a specific receive queue to
become full, resulting in packet loss.


We have many places that can lead to packet dropping. For example, the
automatic steering is best effort. I tend to avoid mentioning things
like this.

Ok. And Michael what do you think about this?


I think this text did not do a great job explaining the
security aspect. Here's a better, shorter explanation:

 It is often an expectation of users that a tunnel isolates the external
 network from the internal one. By completely ignoring entropy in the
 external header and replacing it with entropy from the internal header,
 for hash calculations, this expectation might be violated to a certain
 extent, depending on how the hash is used. When the hash use is limited
 to RSS queue selection, the effect will likely be limited to ability of
 users inside the tunnel to cause packet drops in multiple queues (as
 opposed to a single queue without the feature).

And this is only for GRE-in-UDP? This makes me think if we should add
GRE support for the outer header like:

https://docs.napatech.com/r/Feature-Set-N-ANL9/Hash-Key-Type-10-3-Tuple-GREv0


I think this is for tunneling protocols with specific flow fields, such 
as GRE:key filed, NVGRE:FLOWID filed.


This requires us to make a requirement when calculating the hash of the 
tunnels when F_TUNNEL_HASH is not negotiated. It's a new work.


Thanks.



Thanks





+
+Possible mitigations:
+\begin{itemize}
+\item Use a tool with good forwarding performance to keep the
receive queue from filling up.
+\item If the QoS is unavailable, the driver can set
\field{hash_tunnel_types} to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE
+  to disable inner header hash for encapsulated packets.
+\item Choose a hash key that can avoid queue collisions.
+\item Perform appropriate QoS before packets consume the receive
buffers of the receive queues.
+\end{itemize}
+
+The limitations mentioned above exist with/without the inner header
hash.


This conflicts with the tile "Tunnel QoS limitation" which readers may
think it happens only for tunnel.

Perhaps a "QoS Advices" is better?

Plural of "advice" is "advice" not "advices".

This advice is somewhat bogus though.

The point I keep trying to make is that this:

 Choose a hash key that can avoid queue collisions.

is impossible with the feature and possible without.
This was the whole reason I asked for a security
considerations sections.

--
MST



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v12] virtio-net: support inner header hash

2023-04-13 Thread Heng Qi




在 2023/4/14 上午5:43, Michael S. Tsirkin 写道:

On Thu, Apr 13, 2023 at 07:03:26PM +0800, Heng Qi wrote:

   For example, when the packets of certain
+tunnels are spread across multiple receive queues, these receive
queues may have an unbalanced
+amount of packets. This can cause a specific receive queue to
become full, resulting in packet loss.


We have many places that can lead to packet dropping. For example, the
automatic steering is best effort. I tend to avoid mentioning things
like this.

Ok. And Michael what do you think about this?


I think this text did not do a great job explaining the
security aspect. Here's a better, shorter explanation:

It is often an expectation of users that a tunnel isolates the external
network from the internal one. By completely ignoring entropy in the
external header and replacing it with entropy from the internal header,
for hash calculations, this expectation might be violated to a certain
extent, depending on how the hash is used. When the hash use is limited
to RSS queue selection, the effect will likely be limited to ability of
users inside the tunnel to cause packet drops in multiple queues (as
opposed to a single queue without the feature).


Sure. Will do  in the v13.






+
+Possible mitigations:
+\begin{itemize}
+\item Use a tool with good forwarding performance to keep the
receive queue from filling up.
+\item If the QoS is unavailable, the driver can set
\field{hash_tunnel_types} to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE
+  to disable inner header hash for encapsulated packets.
+\item Choose a hash key that can avoid queue collisions.
+\item Perform appropriate QoS before packets consume the receive
buffers of the receive queues.
+\end{itemize}
+
+The limitations mentioned above exist with/without the inner header
hash.


This conflicts with the tile "Tunnel QoS limitation" which readers may
think it happens only for tunnel.

Perhaps a "QoS Advices" is better?

Plural of "advice" is "advice" not "advices".


My fault.



This advice is somewhat bogus though.

The point I keep trying to make is that this:

Choose a hash key that can avoid queue collisions.

is impossible with the feature and possible without.


I don't think so, the outer headers also has corresponding entropy for 
different streams.


Thanks.


This was the whole reason I asked for a security
considerations sections.



Thanks!


Thanks


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscr...@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org
List help: virtio-comment-h...@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines:
https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] RE: [virtio-dev] [PATCH] virtio-net: Fix and update VIRTIO_NET_F_NOTF_COAL feature

2023-02-06 Thread Heng Qi




在 2023/2/7 上午5:53, Parav Pandit 写道:



From: virtio-dev@lists.oasis-open.org  On
Behalf Of Michael S. Tsirkin

On Mon, Feb 06, 2023 at 07:13:43PM +, Parav Pandit wrote:



From: virtio-dev@lists.oasis-open.org
 On Behalf Of Alvaro Karsz

This patch makes several improvements to the notification coalescing
feature,
including:

- Consolidating virtio_net_ctrl_coal_tx and virtio_net_ctrl_coal_rx
   into a single struct, virtio_net_ctrl_coal, as they are identical.
- Emphasizing that the coalescing commands are best-effort.
- Defining the behavior of coalescing with regards to delivering
   notifications when a change occur.


Patch needs to do one thing at a time.
Please split above into three patches.


Signed-off-by: Alvaro Karsz 
---
  device-types/net/description.tex | 40
++--
  1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/device-types/net/description.tex
b/device-types/net/description.tex
index 1741c79..2a98411 100644
--- a/device-types/net/description.tex
+++ b/device-types/net/description.tex
@@ -1514,15 +1514,12 @@ \subsubsection{Control
Virtqueue}\label{sec:Device Types / Network Device / Devi  If the
VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can  send
control commands for dynamically changing the coalescing parameters.

-\begin{lstlisting}
-struct virtio_net_ctrl_coal_rx {
-le32 rx_max_packets;
-le32 rx_usecs;
-};
+Note: In general, these commands are best-effort: A device could
+send a
notification even if it is not supposed to.


Please remove this note from this patch.
Instead of Note, we need to describe this device requirements description.
We better need to describe the motivation for it.
We may want to say there may be jitter in notification, but device should not

be sending when it is not supposed to.

It's explicitly allowed:

split-ring.tex:The driver MUST handle spurious notifications from the device.
split-ring.tex:The device MUST handle spurious notifications from the driver.


The intent is the guide to device implementation to have less spurious 
interrupts.
Best effort wording says that device is free to implement timer of any 
granularity of choice, which kind of defeats the purpose of IM.

So, both can handle/generate spurious notifications, but it shouldn't be best 
line guidance.




I also have more description to add in this area with regards to GSO and LRO.
I make humble suggestion that we draft is jointly in separate patch combining

these clarifications.

-struct virtio_net_ctrl_coal_tx {
-le32 tx_max_packets;
-le32 tx_usecs;
+\begin{lstlisting}
+struct virtio_net_ctrl_coal {
+le32 max_packets;
+le32 usecs;
  };


This is one good change to go as separate patch.


  #define VIRTIO_NET_CTRL_NOTF_COAL 6 @@ -1532,25 +1529,25 @@
\subsubsection{Control Virtqueue}\label{sec:Device Types / Network
Device / Devi

  Coalescing parameters:
  \begin{itemize}
-\item \field{rx_usecs}: Maximum number of usecs to delay a RX

notification.

-\item \field{tx_usecs}: Maximum number of usecs to delay a TX

notification.

-\item \field{rx_max_packets}: Maximum number of packets to receive
before a RX notification.
-\item \field{tx_max_packets}: Maximum number of packets to send
before a TX notification.
+\item \field{usecs} for RX: Maximum number of usecs to delay a RX
notification.
+\item \field{usecs} for TX: Maximum number of usecs to delay a TX
notification.
+\item \field{max_packets} for RX: Maximum number of packets to
+receive
before a RX notification.
+\item \field{max_packets} for TX: Maximum number of packets to send
+before
a TX notification.
  \end{itemize}


s/for Rx/For receive virtqueue
s/for Tx/For transmit virtqueue

Which virtqueue? It says TX/RX pretty consistently in this text.
Changing to receive virtqueue/transmit virtqueue would be a big change and
frankly for a very modest gain in readability.
Rather maybe just say RX/TX where we describe virtqueue.


We are describing rest of the previous sections as transmitq, receiveq.
Would like to keep this section consistent with rest of the Network device 
section, and not just notification coalescing section.


  The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands:
  \begin{enumerate}
-\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{tx_usecs}
and \field{tx_max_packets} parameters.
-\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{rx_usecs}
and \field{rx_max_packets} parameters.
+\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{usecs} and
\field{max_packets} parameters for TX.
+\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{usecs} and
\field{max_packets} parameters for RX.
  \end{enumerate}

  \subparagraph{RX Notifications}\label{sec:Device Types / Network
Device / Device Operation / Control Virtqueue / Notifications
Coalescing / RX Notifications}

  If, for example:
  \begin{itemize}
-\item \field{rx_usecs} = 10.
-\item \field{rx_max_packets} = 15.
+\item \field{usecs} = 10.
+\item \field{max_packets} = 15.
  

[virtio-dev] Re: [virtio-comment] Re: [PATCH v2] virtio-net: support the virtqueue coalescing moderation

2023-02-11 Thread Heng Qi




在 2023/2/11 下午4:45, Alvaro Karsz 写道:

Please add short description something like,

When the driver prefers to use per virtqueue notifications coalescing, and if 
queue group (transmit or receive) level notification coalescing is enabled, 
driver SHOULD first disable device level notification coalescing.
Or it should be,


I disagree here.
IMO "queue group level notification coalescing" is not something to
enable or disable, but a shortcut to set all TX/RX queues at once.
Why should the spec force a driver to "disable device level
notification coalescing" (I assume you mean send a
VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET command with zeros)?
What if the driver sends a VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET
command, and then a single queue traffic increases? why should it zero
the parameters to all other queues?


Hi, Alvaro! Thanks for your reply!

I think Parav refers more to the scene where ethool sets parameters at 
the queue group level and
the scene where netdim sets parameters for a single queue. In this 
scenario, netdim should really
determine the coalescing parameters of the device, and the parameters at 
the queue group level set
by ethtool should be ignored (many drivers are designed this way, such 
as mlx), that is, we need to give netdim the right,
because it It has the ability to dynamically adjust parameters. 
(However, I think this friendly constraint is also possible in driver 
implementation.)


Of course, if we consider setting coalescing parameters at the queue 
group level and single queue level separately through ethtool,

then as you said, we should not set any priority for them.

Back to reality, I think the function of ethtool to set single queue 
parameters may come later, which is thankless for users because of netdim.


Therefore, if our specification tends to be practical, we can add 
Parav's proposal, and if our specification tends to be more general, 
then hand over

the constraints to the driver implementation. what do you think?


I think that this should be discussed in the driver implementation
stage, not in the spec.


Virtqueue level notifications coalescing, and device level notifications can be 
enabled together.
When both of them are enabled, per virtqueue notifications coalescing take 
priority over queue group level.

How do you enable  Virtqueue level notifications coalescing? Why are
they different entities?
I don't think that we should have priorities, but the last command
should be the one that dictates the coalescing parameters.

For example, let's look at vq0 (RX):
Device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0 should change
the parameters accordingly (all RX vqs should do the same).
Then device receives VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with vqn = 0,
vq0 changes the parameters accordingly (all RX vqs are still using the
"old" parameters)
Then device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0 changes the
parameters accordingly (all RX vqs should do the same).


Yes I see what you mean, thanks for the clear example. This should be 
the second scenario I described above,

let's discuss how to solve it in the above reply.

Thanks! :)



This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscr...@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org
List help: virtio-comment-h...@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] RE: [virtio-comment] [PATCH] virtio-net: support per-queue coalescing moderation

2023-02-08 Thread Heng Qi




在 2023/2/8 下午11:04, Parav Pandit 写道:



From: Michael S. Tsirkin 
Sent: Wednesday, February 8, 2023 9:48 AM

On Wed, Feb 08, 2023 at 02:44:37PM +, Parav Pandit wrote:

From: Michael S. Tsirkin 
Sent: Wednesday, February 8, 2023 9:43 AM

On Wed, Feb 08, 2023 at 02:37:55PM +, Parav Pandit wrote:

From: Michael S. Tsirkin 
Sent: Wednesday, February 8, 2023 9:18 AM

On Wed, Feb 08, 2023 at 07:30:34PM +0800, Heng Qi wrote:

I see two options.
1. Just have per VQ params. Software has the full knowledge
of in which it is

operating, and state remains at software level.

This effectively achieves both the mode.

2. Have a mode cmd,
Mode = (a) per device or (b) per VQ (c) disable After the
mode is set, driver can set per device or per VQ.

I find this more clear.

Thanks.


Rereading this I think I misunderstood the proposal.
Now we are burning memory on maintaining mode, and this
information is duplicated.


It is not maintained in the pci resident memory, so it doesn't hurt.


I'd say let's just add a new command COAL_QUEUE_SET with vqn as

parameter.

Existing commands are simply defined as a shortcut to running
COAL_QUEUE_SET on all tx/rx queues respectively.

Latest command dictates the parameters. To disable just set
everything to 0 (btw we should make this explicit in the spec,
but it can be

guessed from:

Upon reset, a device MUST initialize all coalescing parameters to 0.
)


Switching between the modes (per q vs per device) implicitly is
ambiguous

and it only means device may need to iterate.

hmm i feel it's only ambiguous because i failed to explain in well.


This state is either better maintained in sw by always having per
vq or have

clearly defined mode of what device should do.

Per Q is very common even for several years old devices.
Last time I counted, there were at least 15 such devices supporting it.

So actual usage wise, I practically see that most implementations
will end up

with per vq mode.

I like to hear from Heng or Alvaro if they see any use of per device.


Right so given this, most devices will be in per queue mode all the
time. why do you want a mode then? just keep per queue.
existing commands are kept around for compat but internally just
translate to per-queue.

Since the space is not released, do we need to keep the compat?

It's been accepted for half a year so we can't say for sure no one built this.

That is likely but we should have the ability to have the Errata/ECN to correct 
it, specially for unrelease spec.


The way I propose is just a bit of firmware on device that scans all queues and
copies same parameters everywhere.

This scanning loop in sw appears cheaper to me than some embedded fw.
But is not a lot of concern.


Seems easier than worrying about this,
and we get disabling coalescing for free which you wanted. With an extra mode
its extra logic in the device fast path. Maybe it's cheap on hardware side but 
in
software it's an extra branch, not free.

Most performant data path wouldn't implement and read the extra mode.
It is always fw that is going to program same value, or per queue valued or 
disable value in each Q regardless whichever way we craft the CVQ cmd.

The sequence that bothers me is below.
1. driver set global params
2. few minutes later, now driver set param for Q=1

On this command, a device need to decide:
Should Q = 2 to N
(a) either work with previous globals, or
(b) because per Q was set for one queue, they rest of the queues implicitly 
disable it.

If it is (b),
When a command on Q object =1 is issued, it affects other Q objects. <- This I 
want to avoid.
A cmd that modifies the object, should only modify that object.

If it is (a), it is mixed mode operation, which is ambiguous definition.


I think it should be a. I think we should blur the concept of mode. 
There seems to be no mode here.
From the perspective of the device, it only needs to distinguish 
commands and do what it should do.


Thanks.



A better semantic is to define such change at device level and no extra cost in 
the data path.

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio-comment] [PATCH] virtio-net: support per-queue coalescing moderation

2023-02-08 Thread Heng Qi




在 2023/2/9 上午1:53, Alvaro Karsz 写道:

  > > From: Michael S. Tsirkin 

Sent: Wednesday, February 8, 2023 9:48 AM

On Wed, Feb 08, 2023 at 02:44:37PM +, Parav Pandit wrote:

From: Michael S. Tsirkin 
Sent: Wednesday, February 8, 2023 9:43 AM

On Wed, Feb 08, 2023 at 02:37:55PM +, Parav Pandit wrote:

From: Michael S. Tsirkin 
Sent: Wednesday, February 8, 2023 9:18 AM

On Wed, Feb 08, 2023 at 07:30:34PM +0800, Heng Qi wrote:

I see two options.
1. Just have per VQ params. Software has the full knowledge
of in which it is

operating, and state remains at software level.

This effectively achieves both the mode.

2. Have a mode cmd,
Mode = (a) per device or (b) per VQ (c) disable After the
mode is set, driver can set per device or per VQ.

I find this more clear.

Thanks.


Rereading this I think I misunderstood the proposal.
Now we are burning memory on maintaining mode, and this
information is duplicated.


It is not maintained in the pci resident memory, so it doesn't hurt.


I'd say let's just add a new command COAL_QUEUE_SET with vqn as

parameter.

Existing commands are simply defined as a shortcut to running
COAL_QUEUE_SET on all tx/rx queues respectively.

Latest command dictates the parameters. To disable just set
everything to 0 (btw we should make this explicit in the spec,
but it can be

guessed from:

Upon reset, a device MUST initialize all coalescing parameters to 0.
)


Switching between the modes (per q vs per device) implicitly is
ambiguous

and it only means device may need to iterate.

hmm i feel it's only ambiguous because i failed to explain in well.


This state is either better maintained in sw by always having per
vq or have

clearly defined mode of what device should do.

Per Q is very common even for several years old devices.
Last time I counted, there were at least 15 such devices supporting it.

So actual usage wise, I practically see that most implementations
will end up

with per vq mode.

I like to hear from Heng or Alvaro if they see any use of per device.


Right so given this, most devices will be in per queue mode all the
time. why do you want a mode then? just keep per queue.
existing commands are kept around for compat but internally just
translate to per-queue.

Since the space is not released, do we need to keep the compat?

It's been accepted for half a year so we can't say for sure no one built this.

That is likely but we should have the ability to have the Errata/ECN to correct 
it, specially for unrelease spec.


The way I propose is just a bit of firmware on device that scans all queues and
copies same parameters everywhere.

This scanning loop in sw appears cheaper to me than some embedded fw.
But is not a lot of concern.


Seems easier than worrying about this,
and we get disabling coalescing for free which you wanted. With an extra mode
its extra logic in the device fast path. Maybe it's cheap on hardware side but 
in
software it's an extra branch, not free.

Most performant data path wouldn't implement and read the extra mode.
It is always fw that is going to program same value, or per queue valued or 
disable value in each Q regardless whichever way we craft the CVQ cmd.

The sequence that bothers me is below.
1. driver set global params
2. few minutes later, now driver set param for Q=1

On this command, a device need to decide:
Should Q = 2 to N
(a) either work with previous globals, or
(b) because per Q was set for one queue, they rest of the queues implicitly 
disable it.

If it is (b),
When a command on Q object =1 is issued, it affects other Q objects. <- This I 
want to avoid.
A cmd that modifies the object, should only modify that object.

If it is (a), it is mixed mode operation, which is ambiguous definition.

A better semantic is to define such change at device level and no extra cost in 
the data path.

I think that (a) is the way to go.
I don't think that we should work with operation modes at all.


I agree to keep the current global settings (VIRTIO_NET_F_NOTF_COAL and 
its corresponding commands),
because our hardware team has limited resources for the control queue, 
and they don't want to send a separate

cmd for each queue when send a global setting cmd.

Then adding a VIRTIO_NET_F_PERQUEUE_NOTF_COAL or 
VIRTIO_NET_F_VQ_NOTF_COAL feature bit and new commands for per-queue or 
VQ setting looks better to me.

In my opinion:

We should have 2 features:
VIRTIO_NET_F_PERQUEUE_NOTF_COAL and VIRTIO_NET_F_NOTF_COAL.

VIRTIO_NET_F_PERQUEUE_NOTF_COAL sets per queue parameters, and
VIRTIO_NET_F_NOTF_COAL sets parameters for all queues.

VIRTIO_NET_F_NOTF_COAL has 2 commands:
 VIRTIO_NET_CTRL_NOTF_COAL_RX_SET
 VIRTIO_NET_CTRL_NOTF_COAL_TX_SET

VIRTIO_NET_F_PERQUEUE_NOTF_COAL has 2 commands:
 VIRTIO_NET_CTRL_NOTF_COAL_PER_QUEUE_TX_SET
 VIRTIO_NET_CTRL_NOTF_COAL_PER_QUEUE_RX_SET

We can see VIRTIO_NET_CTRL_NOTF_COAL_RX_SET as a virtio level shortcut
for setting all queues with one command, exactly as intended wi

[virtio-dev] Re: [virtio-comment] [PATCH] virtio-net: support per-queue coalescing moderation

2023-02-08 Thread Heng Qi




在 2023/2/9 上午4:52, Michael S. Tsirkin 写道:

On Wed, Feb 08, 2023 at 07:53:09PM +0200, Alvaro Karsz wrote:

  > > From: Michael S. Tsirkin 

Sent: Wednesday, February 8, 2023 9:48 AM

On Wed, Feb 08, 2023 at 02:44:37PM +, Parav Pandit wrote:

From: Michael S. Tsirkin 
Sent: Wednesday, February 8, 2023 9:43 AM

On Wed, Feb 08, 2023 at 02:37:55PM +, Parav Pandit wrote:

From: Michael S. Tsirkin 
Sent: Wednesday, February 8, 2023 9:18 AM

On Wed, Feb 08, 2023 at 07:30:34PM +0800, Heng Qi wrote:

I see two options.
1. Just have per VQ params. Software has the full knowledge
of in which it is

operating, and state remains at software level.

This effectively achieves both the mode.

2. Have a mode cmd,
Mode = (a) per device or (b) per VQ (c) disable After the
mode is set, driver can set per device or per VQ.

I find this more clear.

Thanks.


Rereading this I think I misunderstood the proposal.
Now we are burning memory on maintaining mode, and this
information is duplicated.


It is not maintained in the pci resident memory, so it doesn't hurt.


I'd say let's just add a new command COAL_QUEUE_SET with vqn as

parameter.

Existing commands are simply defined as a shortcut to running
COAL_QUEUE_SET on all tx/rx queues respectively.

Latest command dictates the parameters. To disable just set
everything to 0 (btw we should make this explicit in the spec,
but it can be

guessed from:

Upon reset, a device MUST initialize all coalescing parameters to 0.
)


Switching between the modes (per q vs per device) implicitly is
ambiguous

and it only means device may need to iterate.

hmm i feel it's only ambiguous because i failed to explain in well.


This state is either better maintained in sw by always having per
vq or have

clearly defined mode of what device should do.

Per Q is very common even for several years old devices.
Last time I counted, there were at least 15 such devices supporting it.

So actual usage wise, I practically see that most implementations
will end up

with per vq mode.

I like to hear from Heng or Alvaro if they see any use of per device.


Right so given this, most devices will be in per queue mode all the
time. why do you want a mode then? just keep per queue.
existing commands are kept around for compat but internally just
translate to per-queue.

Since the space is not released, do we need to keep the compat?

It's been accepted for half a year so we can't say for sure no one built this.

That is likely but we should have the ability to have the Errata/ECN to correct 
it, specially for unrelease spec.


The way I propose is just a bit of firmware on device that scans all queues and
copies same parameters everywhere.

This scanning loop in sw appears cheaper to me than some embedded fw.
But is not a lot of concern.


Seems easier than worrying about this,
and we get disabling coalescing for free which you wanted. With an extra mode
its extra logic in the device fast path. Maybe it's cheap on hardware side but 
in
software it's an extra branch, not free.

Most performant data path wouldn't implement and read the extra mode.
It is always fw that is going to program same value, or per queue valued or 
disable value in each Q regardless whichever way we craft the CVQ cmd.

The sequence that bothers me is below.
1. driver set global params
2. few minutes later, now driver set param for Q=1

On this command, a device need to decide:
Should Q = 2 to N
(a) either work with previous globals, or
(b) because per Q was set for one queue, they rest of the queues implicitly 
disable it.

If it is (b),
When a command on Q object =1 is issued, it affects other Q objects. <- This I 
want to avoid.
A cmd that modifies the object, should only modify that object.

If it is (a), it is mixed mode operation, which is ambiguous definition.

A better semantic is to define such change at device level and no extra cost in 
the data path.

I think that (a) is the way to go.
I don't think that we should work with operation modes at all.

In my opinion:

We should have 2 features:
VIRTIO_NET_F_PERQUEUE_NOTF_COAL and VIRTIO_NET_F_NOTF_COAL.

VIRTIO_NET_F_PERQUEUE_NOTF_COAL sets per queue parameters, and
VIRTIO_NET_F_NOTF_COAL sets parameters for all queues.

VIRTIO_NET_F_NOTF_COAL has 2 commands:
 VIRTIO_NET_CTRL_NOTF_COAL_RX_SET
 VIRTIO_NET_CTRL_NOTF_COAL_TX_SET

VIRTIO_NET_F_PERQUEUE_NOTF_COAL has 2 commands:
 VIRTIO_NET_CTRL_NOTF_COAL_PER_QUEUE_TX_SET
 VIRTIO_NET_CTRL_NOTF_COAL_PER_QUEUE_RX_SET

We can see VIRTIO_NET_CTRL_NOTF_COAL_RX_SET as a virtio level shortcut
for setting all queues with one command, exactly as intended with
rx_qid= 0x, and without breaking devices following the current
spec.

The device's FW can decide if it stores parameters received with
VIRTIO_NET_CTRL_NOTF_COAL_RX_SET in a global set, or if it iterates
through all queues, but IMO the best way it to iterate through all
queues.

Seems like a win-win situation to me.
We achieve the same func

Re: [virtio-dev] Re: [virtio-comment] [PATCH] virtio-net: support per-queue coalescing moderation

2023-02-08 Thread Heng Qi




在 2023/2/9 上午6:35, Alvaro Karsz 写道:

From: Alvaro Karsz 
Sent: Wednesday, February 8, 2023 4:56 PM

Alvaro,
Do you know if any software used it? Can you get some real data?

I implemented this feature in our DPU, so at least 1 vendor is using this 
feature

But which software (virtio net driver) in which OS is using this?

Sorry, I'm not sure I understand your question.

The feature is implemented in the linux kernel
https://github.com/torvalds/linux/commit/699b045a8e43bd1063db4795be685bfd659649dc
So we'll always have kernel versions accepting this feature, if offered.


(I will add support for the per vq command of course).
I really don't know about other vendors..

You are suggesting to reserve the command and feature bit for safety, so, if we
reserve them, why not just use them? What do we lose here?


If it is used by some unknown software, only that sw breaks by using non 
release spec.
If we use it by changing the definition, it may break that unknown sw.
If we know there is no known sw, we are better of with redefinition (by adding 
vqn, and by removing tx,rx from it).


Not having this feature/command even complicates things now that we are
talking about removing the RX and TX from the per vq command, how do you
change parameters to all TX queues? to all RX queues? we'll need 2 special
indexes, so we now need le32 to hold the queue index.

No need for special index.
How does a driver disable all queues or reset all queues? -> One by one.
So if user want to change for all TXQ, sw can do it one by one by iterating TXQ 
vqns.

Yes, but resetting the queues doesn't require a control command.
If a server has 64K queues, and a user wants to set all coalescing
parameters to X (maybe with ethtool), it will generate 64K control
commands..


At least our hardware design doesn't expect that.


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] [PATCH v2] virtio-net: support the virtqueue coalescing moderation

2023-02-10 Thread Heng Qi




在 2023/2/10 下午6:16, Alvaro Karsz 写道:

So, should we remove VIRTIO_NET_F_CTRL_VQ here, or fix VIRTIO_NET_F_HOST_ECN?

Ah good point.
But I think  VIRTIO_NET_F_VQ_NOTF_COAL should not depend on 
VIRTIO_NET_F_NOTF_COAL.
This way devices can drop the all-rx/all-tx commands if they want to.

We need to confirm this. If we make VIRTIO_NET_F_VQ_NOTF_COAL
independent of VIRTIO_NET_F_NOTF_COAL,
do we need to give vqn a special value so that the driver can also have
the fast path of sending all queues with global settings via
VIRTIO_NET_F_VQ_NOTF_COAL?

IMO we don't need a special vqn value.
A device that can modify all the vqs should offer VIRTIO_NET_F_NOTF_COAL.


That's clear, thanks for the quick reply.
Have a great weekend!


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio-comment] [PATCH] virtio-net: support per-queue coalescing moderation

2023-02-08 Thread Heng Qi




在 2023/2/8 下午6:10, Michael S. Tsirkin 写道:

On Wed, Feb 08, 2023 at 09:57:54AM +0800, Heng Qi wrote:

I think it's a good idea to do this on top of Alvaro's patch
unifying these two structures.

I saw Alvaro's patch, but it doesn't seem to be stable yet, is there a good
way for me to
unify the two structures, since a patch should only do one thing.

Problem is you were trying to change these existing structures too
so the patches conflicted. However I think at this point
we are in agreement on a new command with a new structure.
In this case there won't be a conflict as you are
not touching existing commands so no need for you
to depend on Alvaro's patch.


I get it.

Thanks.






-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v2] virtio-net: support the virtqueue coalescing moderation

2023-02-12 Thread Heng Qi
On Sat, Feb 11, 2023 at 01:47:16PM +, Parav Pandit wrote:
> 
> 
> > From: Alvaro Karsz 
> > Sent: Saturday, February 11, 2023 3:45 AM
> > 
> > > Please add short description something like,
> > >
> > > When the driver prefers to use per virtqueue notifications coalescing, 
> > > and if
> > queue group (transmit or receive) level notification coalescing is enabled, 
> > driver
> > SHOULD first disable device level notification coalescing.
> > > Or it should be,
> > >
> > 
> > I disagree here.
> > IMO "queue group level notification coalescing" is not something to enable 
> > or
> > disable, but a shortcut to set all TX/RX queues at once.
> That short cut is the enable/disablement.
> 
> > Why should the spec force a driver to "disable device level notification
> > coalescing" (I assume you mean send a
> > VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET command with zeros)?
> Yes. Because to have well defined behavior when sw configured both one after 
> the another.
> 
> > What if the driver sends a VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET
> > command, and then a single queue traffic increases? why should it zero the
> > parameters to all other queues?
> That is short transition when driver is switching over to per queue mode.
> This is fine to have short glitch.
> 
> > I think that this should be discussed in the driver implementation stage, 
> > not in
> > the spec.
> > 
> There should be a clear guidance on how device should behave when both per q 
> and per device are configured.
> 
> > > Virtqueue level notifications coalescing, and device level notifications 
> > > can be
> > enabled together.
> > > When both of them are enabled, per virtqueue notifications coalescing take
> > priority over queue group level.
> > 
> > How do you enable  Virtqueue level notifications coalescing? Why are they
> > different entities?
> Using the new command that has vqn in it.
> 
> > I don't think that we should have priorities, but the last command should 
> > be the
> > one that dictates the coalescing parameters.
> > 
> Priority is applicable when driver has issued both the commands. Per tx/rx, 
> and per vqn.
> 
> > For example, let's look at vq0 (RX):
> > Device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0 should change
> > the parameters accordingly (all RX vqs should do the same).
> > Then device receives VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with vqn = 0,
> > vq0 changes the parameters accordingly (all RX vqs are still using the "old"
> > parameters) Then device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0
> > changes the parameters accordingly (all RX vqs should do the same).
> In this example, per VQ were overridden with per device.
> Yes, so the last one is applicable, so priority of last one applies.
> 
> We continue to refuse to add the mode, and hence need to supply these 
> description of both the sequence on how device should behave.
> 
> Sequence_1:
> 1. tx/rx group level
> 2. per vqn level
> When #2 is done, VQ's whose per vq level is configured, follows vqn, rest of 
> the VQs follow #1.
> 
> Sequence_2:
> 1. per vqn
> 2. tx/rx group level
> When #2 is done, group level overrides the per vqn parameters. 

Adding examples of the two command sequences is due, I will do so in the next 
release. :)

Thanks.


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] Re: [PATCH v2] virtio-net: support the virtqueue coalescing moderation

2023-02-12 Thread Heng Qi
On Sat, Feb 11, 2023 at 06:13:55PM +0200, Alvaro Karsz wrote:
> I think that I wasn't clear enough.
> 
> I'm not saying that we should not define in the spec how to handle a
> situation when a device receives both  RX_SET and VQ_SET (or a driver
> sends both).
> I'm saying that I don't think that the driver should handle the
> situation the way you described it:
> 
> > When the driver prefers to use per virtqueue notifications coalescing, and 
> > if queue group (transmit or receive) level notification coalescing is 
> > enabled, driver SHOULD first disable device level notification coalescing.
> > Or it should be,
> >
> > Virtqueue level notifications coalescing, and device level notifications 
> > can be enabled together.
> > When both of them are enabled, per virtqueue notifications coalescing take 
> > priority over queue group level.
> 
> This implies that we have 2 modes and have priorities.
> 
> I think that if we want to refer to this situation in the spec, it
> should be something like:
> "A Device should use the last coalescing parameters received for a
> virtqueue, regardless of the command used to deliver the parameters."

Your suggestion is good, the per-device command and the per-queue command need 
some examples and behavior definitions,
I will add them to avoid some misunderstandings.

Thanks.

> (just an example to make the point).

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v2] virtio-net: support the virtqueue coalescing moderation

2023-02-12 Thread Heng Qi
On Sun, Feb 12, 2023 at 04:35:37AM -0500, Michael S. Tsirkin wrote:
> On Sat, Feb 11, 2023 at 01:47:16PM +, Parav Pandit wrote:
> > 
> > 
> > > From: Alvaro Karsz 
> > > Sent: Saturday, February 11, 2023 3:45 AM
> > > 
> > > > Please add short description something like,
> > > >
> > > > When the driver prefers to use per virtqueue notifications coalescing, 
> > > > and if
> > > queue group (transmit or receive) level notification coalescing is 
> > > enabled, driver
> > > SHOULD first disable device level notification coalescing.
> > > > Or it should be,
> > > >
> > > 
> > > I disagree here.
> > > IMO "queue group level notification coalescing" is not something to 
> > > enable or
> > > disable, but a shortcut to set all TX/RX queues at once.
> > That short cut is the enable/disablement.
> > 
> > > Why should the spec force a driver to "disable device level notification
> > > coalescing" (I assume you mean send a
> > > VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET command with zeros)?
> > Yes. Because to have well defined behavior when sw configured both one 
> > after the another.
> > 
> > > What if the driver sends a VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET
> > > command, and then a single queue traffic increases? why should it zero the
> > > parameters to all other queues?
> > That is short transition when driver is switching over to per queue mode.
> > This is fine to have short glitch.
> > 
> > > I think that this should be discussed in the driver implementation stage, 
> > > not in
> > > the spec.
> > > 
> > There should be a clear guidance on how device should behave when both per 
> > q and per device are configured.
> > 
> > > > Virtqueue level notifications coalescing, and device level 
> > > > notifications can be
> > > enabled together.
> > > > When both of them are enabled, per virtqueue notifications coalescing 
> > > > take
> > > priority over queue group level.
> > > 
> > > How do you enable  Virtqueue level notifications coalescing? Why are they
> > > different entities?
> > Using the new command that has vqn in it.
> > 
> > > I don't think that we should have priorities, but the last command should 
> > > be the
> > > one that dictates the coalescing parameters.
> > > 
> > Priority is applicable when driver has issued both the commands. Per tx/rx, 
> > and per vqn.
> > 
> > > For example, let's look at vq0 (RX):
> > > Device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0 should change
> > > the parameters accordingly (all RX vqs should do the same).
> > > Then device receives VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with vqn = 0,
> > > vq0 changes the parameters accordingly (all RX vqs are still using the 
> > > "old"
> > > parameters) Then device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0
> > > changes the parameters accordingly (all RX vqs should do the same).
> > In this example, per VQ were overridden with per device.
> > Yes, so the last one is applicable, so priority of last one applies.
> > 
> > We continue to refuse to add the mode, and hence need to supply these 
> > description of both the sequence on how device should behave.
> > 
> > Sequence_1:
> > 1. tx/rx group level
> > 2. per vqn level
> > When #2 is done, VQ's whose per vq level is configured, follows vqn, rest 
> > of the VQs follow #1.
> > 
> > Sequence_2:
> > 1. per vqn
> > 2. tx/rx group level
> > When #2 is done, group level overrides the per vqn parameters. 
> 
> Since there's apparently some room for misunderstanding, I think adding these 
> examples can't hurt.

Ok, I'll handle this.

> I would also be more specific and just use specific numbers in the
> example, to avoid any ambiguity.

Agree.

Thanks.

> 
> -- 
> MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH 0/4] Short document fixes to inner hash feature

2023-07-13 Thread Heng Qi




在 2023/7/13 下午7:33, Michael S. Tsirkin 写道:

On Thu, Jul 13, 2023 at 11:12:13AM +0200, Cornelia Huck wrote:

On Thu, Jul 13 2023, Parav Pandit  wrote:


This short patches fixes the editing errors that stops the pdf and html
generation.

These 3 fixes are for the patch [1] for the github issue [2].

[1] https://lists.oasis-open.org/archives/virtio-comment/202307/msg00024.html
[2] https://github.com/oasis-tcs/virtio-spec/issues/173

Patch summary:
patch-1 place C code under listing
patch-2 avoid hyphen and extra braces
patch-3 use table as hyperlink do not work well in C code listing
patch-4 refer 'advice' as 'note'

Patch 1 to 3 appears to be must in the testing.
Patch 4 is not a fix and can be done later if it requires discussion.

Parav Pandit (4):
   virtio-net: Place C code under listing
   virtio-net: Avoid hyphen and extra braces
   virtio-net: Use table to describe inner hash to rfc mapping
   virtio-net: Use note instead of advice

  device-types/net/description.tex | 45 ++--
  introduction.tex | 15 +--
  2 files changed, 38 insertions(+), 22 deletions(-)


FTR, this is the diff I have locally (I had missed one underscore in the
references yesterday...); maybe we can make the intra-reference links in
introdcution.tex a bit nicer, but otherwise, this should be the minimal
change to make this build:

Perfect. Seems like clearly an editorial fix.

Heng Qi, in the future I'd like to ask you to please build the
spec and review the resulting PDF and HTML, before posting.


Yes, I will!






diff --git a/device-types/net/description.tex b/device-types/net/description.tex
index 206020de567d..76585b0dd6d3 100644
--- a/device-types/net/description.tex
+++ b/device-types/net/description.tex
@@ -1024,12 +1024,14 @@ \subsubsection{Processing of Incoming 
Packets}\label{sec:Device Types / Network
  If VIRTIO_NET_F_HASH_TUNNEL has been negotiated, the driver can send the 
command
  VIRTIO_NET_CTRL_HASH_TUNNEL_SET to configure the calculation of the inner 
header hash.
  
+\begin{lstlisting}

  struct virtnet_hash_tunnel {
  le32 enabled_tunnel_types;
  };
  
  #define VIRTIO_NET_CTRL_HASH_TUNNEL 7

   #define VIRTIO_NET_CTRL_HASH_TUNNEL_SET 0
+\end{lstlisting}
  
  Field \field{enabled_tunnel_types} contains the bitmask of encapsulation types enabled for inner header hash.

  See \ref{sec:Device Types / Network Device / Device Operation / Processing of 
Incoming Packets /
@@ -1063,16 +1065,16 @@ \subsubsection{Processing of Incoming 
Packets}\label{sec:Device Types / Network
  Hash calculation for incoming packets / Encapsulation types supported/enabled 
for inner header hash}
  
  Encapsulation types applicable for inner header hash:

-\begin{lstlisting}
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_2784(1 << 0) /* 
\hyperref[intro:gre_rfc2784]{[GRE_rfc2784]} */
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_2890(1 << 1) /* 
\hyperref[intro:gre_rfc2890]{[GRE_rfc2890]} */
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_7676(1 << 2) /* 
\hyperref[intro:gre_rfc7676]{[GRE_rfc7676]} */
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_UDP (1 << 3) /* 
\hyperref[intro:gre_in_udp_rfc8086]{[GRE-in-UDP]} */
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN   (1 << 4) /* 
\hyperref[intro:vxlan]{[VXLAN]} */
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN_GPE   (1 << 5) /* 
\hyperref[intro:vxlan_gpe]{[VXLAN-GPE]} */
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE  (1 << 6) /* 
\hyperref[intro:geneve]{[GENEVE]} */
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP(1 << 7) /* 
\hyperref[intro:ipip]{[IPIP]} */
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE   (1 << 8) /* 
\hyperref[intro:nvgre]{[NVGRE]} */
+\begin{lstlisting}[escapechar=|]
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_2784(1 << 0) /* 
|\hyperref[intro:rfc2784]{[RFC2784]}| */
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_2890(1 << 1) /* 
|\hyperref[intro:rfc2890]{[RFC2890]}| */
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_7676(1 << 2) /* 
|\hyperref[intro:rfc7676]{[RFC7676]}| */
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_UDP (1 << 3) /* 
|\hyperref[intro:rfc8086]{[GRE-in-UDP]}| */
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN   (1 << 4) /* 
|\hyperref[intro:vxlan]{[VXLAN]}| */
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN_GPE   (1 << 5) /* 
|\hyperref[intro:vxlan-gpe]{[VXLAN-GPE]}| */
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE  (1 << 6) /* 
|\hyperref[intro:geneve]{[GENEVE]}| */
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP(1 << 7) /* 
|\hyperref[intro:ipip]{[IPIP]}| */
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE   (1 << 8) /* 
|\hyperref[intro:nvgre]{[NVGRE]}| */
  \end{lstlisting}
  
  \subparagraph{Advice}

diff --git a/introduction.tex b/introduction.tex
index 81f07a4fee19..6f10a94b6fde 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -102,18 +102,18 @@ \section{Normative References}\labe

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [Proposal] Relationship between XDP and rx-csum in virtio-net

2023-05-31 Thread Heng Qi
On Tue, May 30, 2023 at 03:33:22PM -0400, Michael S. Tsirkin wrote:
> On Wed, May 24, 2023 at 04:12:46PM +0800, Heng Qi wrote:
> > On Wed, May 24, 2023 at 02:07:14AM -0400, Michael S. Tsirkin wrote:
> > > On Tue, May 23, 2023 at 09:51:44PM +0800, Heng Qi wrote:
> > > > On Tue, May 23, 2023 at 09:30:28AM -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, May 23, 2023 at 05:18:20PM +0800, Heng Qi wrote:
> > > > > > On Tue, May 23, 2023 at 03:15:37AM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Tue, May 23, 2023 at 10:41:18AM +0800, Heng Qi wrote:
> > > > > > > > On Mon, May 22, 2023 at 03:10:05PM -0400, Michael S. Tsirkin 
> > > > > > > > wrote:
> > > > > > > > > On Mon, May 22, 2023 at 08:12:00PM +0800, Heng Qi wrote:
> > > > > > > > > > 1) Add a feature bit to the virtio specification to tell 
> > > > > > > > > > the sender that a fully
> > > > > > > > > > csumed packet must be sent.
> > > > > > > > > 
> > > > > > > > > Who is the sender in this picture? The driver?
> > > > > > > > 
> > > > > > > > The device or the driver.
> > > > > > > > 
> > > > > > > > When the device is hw, the sender is more likely to be a device.
> > > > > > > > When the device is sw, the sender can be a device or a driver.
> > > > > > > >
> > > > > > > > But in general, this feature is inclined to constrain the 
> > > > > > > > behavior of the device and
> > > > > > > > the driver from the receiving side.
> > > > > > > 
> > > > > > > Based on above I am guessing you are talking about driver getting
> > > > > > > packets from device, I wish you used terms from virtio spec.
> > > > > > 
> > > > > > Yes, I'm going to use the terminology of the virtio spec.
> > > > > > 
> > > > > > > 
> > > > > > > > For example: 
> > > > > > > > VIRTIO_NET_F_UNNECESSARY_CSUM : The driver tells the device 
> > > > > > > > that you must send me a fully csumed packet.
> > > > > > > > 
> > > > > > > > Then the specific implementation can be
> > > > > > > > 
> > > > > > > > (1) the sender sends a fully csumed packet;
> > > > > > > > (2) the receiver receives a CHECKSUM_PARTIAL packet, and the 
> > > > > > > > device helps calculate the fully csum
> > > > > > > > (because the two parties in the communication are located 
> > > > > > > > on the same host, the packet is trusted.).
> > > > > > > > 
> > > > > > > > In summary, if VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the 
> > > > > > > > driver will no longer receive any packets marked 
> > > > > > > > CHECKSUM_PARTIAL.
> > > > > > > > 
> > > > > > > > Thanks.
> > > > > > > 
> > > > > > > This is what clearing VIRTIO_NET_F_GUEST_CSUM does.
> > > > > > 
> > > > > > Yes, but with VIRTIO_NET_F_GUEST_CSUM cleared, although the device 
> > > > > > can
> > > > > > receive a fully checksummed packet, we can no longer enjoy
> > > > > > the device's ability to validate the packet checksum. That is, the 
> > > > > > value
> > > > > > of \field{flags} in the virtio_net_hdr structure is set to 0, which 
> > > > > > means
> > > > > > that the packet received by the driver will not be marked as
> > > > > > VIRTIO_NET_HDR_F_DATA_VALID.
> > > > > > 
> > > > > > So, we need a feature bit (let's say VIRTIO_NET_F_UNNECESSARY_CSUM).
> > > > > > If VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the device must 
> > > > > > give the
> > > > > > driver a fully checksummed packet, and the packet is validated by 
> > > > > > the
> > > > > > device with \field{flags} set to VIRTIO_NET_HDR_F_DATA_VALID.
> > > > > > 
> > > > > > > 
> > > > > > > I feel you are tryin

Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [Proposal] Relationship between XDP and rx-csum in virtio-net

2023-05-24 Thread Heng Qi
On Wed, May 24, 2023 at 02:07:14AM -0400, Michael S. Tsirkin wrote:
> On Tue, May 23, 2023 at 09:51:44PM +0800, Heng Qi wrote:
> > On Tue, May 23, 2023 at 09:30:28AM -0400, Michael S. Tsirkin wrote:
> > > On Tue, May 23, 2023 at 05:18:20PM +0800, Heng Qi wrote:
> > > > On Tue, May 23, 2023 at 03:15:37AM -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, May 23, 2023 at 10:41:18AM +0800, Heng Qi wrote:
> > > > > > On Mon, May 22, 2023 at 03:10:05PM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Mon, May 22, 2023 at 08:12:00PM +0800, Heng Qi wrote:
> > > > > > > > 1) Add a feature bit to the virtio specification to tell the 
> > > > > > > > sender that a fully
> > > > > > > > csumed packet must be sent.
> > > > > > > 
> > > > > > > Who is the sender in this picture? The driver?
> > > > > > 
> > > > > > The device or the driver.
> > > > > > 
> > > > > > When the device is hw, the sender is more likely to be a device.
> > > > > > When the device is sw, the sender can be a device or a driver.
> > > > > >
> > > > > > But in general, this feature is inclined to constrain the behavior 
> > > > > > of the device and
> > > > > > the driver from the receiving side.
> > > > > 
> > > > > Based on above I am guessing you are talking about driver getting
> > > > > packets from device, I wish you used terms from virtio spec.
> > > > 
> > > > Yes, I'm going to use the terminology of the virtio spec.
> > > > 
> > > > > 
> > > > > > For example: 
> > > > > > VIRTIO_NET_F_UNNECESSARY_CSUM : The driver tells the device that 
> > > > > > you must send me a fully csumed packet.
> > > > > > 
> > > > > > Then the specific implementation can be
> > > > > > 
> > > > > > (1) the sender sends a fully csumed packet;
> > > > > > (2) the receiver receives a CHECKSUM_PARTIAL packet, and the device 
> > > > > > helps calculate the fully csum
> > > > > > (because the two parties in the communication are located on 
> > > > > > the same host, the packet is trusted.).
> > > > > > 
> > > > > > In summary, if VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the 
> > > > > > driver will no longer receive any packets marked CHECKSUM_PARTIAL.
> > > > > > 
> > > > > > Thanks.
> > > > > 
> > > > > This is what clearing VIRTIO_NET_F_GUEST_CSUM does.
> > > > 
> > > > Yes, but with VIRTIO_NET_F_GUEST_CSUM cleared, although the device can
> > > > receive a fully checksummed packet, we can no longer enjoy
> > > > the device's ability to validate the packet checksum. That is, the value
> > > > of \field{flags} in the virtio_net_hdr structure is set to 0, which 
> > > > means
> > > > that the packet received by the driver will not be marked as
> > > > VIRTIO_NET_HDR_F_DATA_VALID.
> > > > 
> > > > So, we need a feature bit (let's say VIRTIO_NET_F_UNNECESSARY_CSUM).
> > > > If VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the device must give the
> > > > driver a fully checksummed packet, and the packet is validated by the
> > > > device with \field{flags} set to VIRTIO_NET_HDR_F_DATA_VALID.
> > > > 
> > > > > 
> > > > > I feel you are trying to say that clearing VIRTIO_NET_F_GUEST_CSUM
> > > > > disables all offloads but you want to keep some of them?
> > > > > 
> > > > 
> > > > No, what I mean is that a feature VIRTIO_NET_F_UNNECESSARY_CSUM is 
> > > > needed
> > > > in addition to VIRTIO_NET_F_GUEST_CSUM, if both features are negotiated,
> > > > then the driver may always receive packets marked as
> > > > VIRTIO_NET_HDR_F_DATA_VALID, which means that we can now load XDP at the
> > > > same time.
> > > 
> > > Makes no sense to me. VIRTIO_NET_F_GUEST_CSUM set already allows
> > > VIRTIO_NET_HDR_F_DATA_VALID:
> > 
> > We need to focus on what happens when the XDP program is loaded:
> > 
> > The current virtio-net needs to turn off the VIRTIO_NET_F_GUEST_CSUM
> > feature when loading XDP. The main reason for doing this is because

[virtio-dev] Re: [Proposal] Relationship between XDP and rx-csum in virtio-net

2023-05-22 Thread Heng Qi
On Mon, May 22, 2023 at 03:10:05PM -0400, Michael S. Tsirkin wrote:
> On Mon, May 22, 2023 at 08:12:00PM +0800, Heng Qi wrote:
> > 1) Add a feature bit to the virtio specification to tell the sender that a 
> > fully
> > csumed packet must be sent.
> 
> Who is the sender in this picture? The driver?

The device or the driver.

When the device is hw, the sender is more likely to be a device.
When the device is sw, the sender can be a device or a driver.

But in general, this feature is inclined to constrain the behavior of the 
device and
the driver from the receiving side.

For example: 
VIRTIO_NET_F_UNNECESSARY_CSUM : The driver tells the device that you must send 
me a fully csumed packet.

Then the specific implementation can be

(1) the sender sends a fully csumed packet;
(2) the receiver receives a CHECKSUM_PARTIAL packet, and the device helps 
calculate the fully csum
(because the two parties in the communication are located on the same host, 
the packet is trusted.).

In summary, if VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the driver will no 
longer receive any packets marked CHECKSUM_PARTIAL.

Thanks.

> 
> -- 
> MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [Proposal] Relationship between XDP and rx-csum in virtio-net

2023-05-23 Thread Heng Qi
On Tue, May 23, 2023 at 09:30:28AM -0400, Michael S. Tsirkin wrote:
> On Tue, May 23, 2023 at 05:18:20PM +0800, Heng Qi wrote:
> > On Tue, May 23, 2023 at 03:15:37AM -0400, Michael S. Tsirkin wrote:
> > > On Tue, May 23, 2023 at 10:41:18AM +0800, Heng Qi wrote:
> > > > On Mon, May 22, 2023 at 03:10:05PM -0400, Michael S. Tsirkin wrote:
> > > > > On Mon, May 22, 2023 at 08:12:00PM +0800, Heng Qi wrote:
> > > > > > 1) Add a feature bit to the virtio specification to tell the sender 
> > > > > > that a fully
> > > > > > csumed packet must be sent.
> > > > > 
> > > > > Who is the sender in this picture? The driver?
> > > > 
> > > > The device or the driver.
> > > > 
> > > > When the device is hw, the sender is more likely to be a device.
> > > > When the device is sw, the sender can be a device or a driver.
> > > >
> > > > But in general, this feature is inclined to constrain the behavior of 
> > > > the device and
> > > > the driver from the receiving side.
> > > 
> > > Based on above I am guessing you are talking about driver getting
> > > packets from device, I wish you used terms from virtio spec.
> > 
> > Yes, I'm going to use the terminology of the virtio spec.
> > 
> > > 
> > > > For example: 
> > > > VIRTIO_NET_F_UNNECESSARY_CSUM : The driver tells the device that you 
> > > > must send me a fully csumed packet.
> > > > 
> > > > Then the specific implementation can be
> > > > 
> > > > (1) the sender sends a fully csumed packet;
> > > > (2) the receiver receives a CHECKSUM_PARTIAL packet, and the device 
> > > > helps calculate the fully csum
> > > > (because the two parties in the communication are located on the 
> > > > same host, the packet is trusted.).
> > > > 
> > > > In summary, if VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the driver 
> > > > will no longer receive any packets marked CHECKSUM_PARTIAL.
> > > > 
> > > > Thanks.
> > > 
> > > This is what clearing VIRTIO_NET_F_GUEST_CSUM does.
> > 
> > Yes, but with VIRTIO_NET_F_GUEST_CSUM cleared, although the device can
> > receive a fully checksummed packet, we can no longer enjoy
> > the device's ability to validate the packet checksum. That is, the value
> > of \field{flags} in the virtio_net_hdr structure is set to 0, which means
> > that the packet received by the driver will not be marked as
> > VIRTIO_NET_HDR_F_DATA_VALID.
> > 
> > So, we need a feature bit (let's say VIRTIO_NET_F_UNNECESSARY_CSUM).
> > If VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the device must give the
> > driver a fully checksummed packet, and the packet is validated by the
> > device with \field{flags} set to VIRTIO_NET_HDR_F_DATA_VALID.
> > 
> > > 
> > > I feel you are trying to say that clearing VIRTIO_NET_F_GUEST_CSUM
> > > disables all offloads but you want to keep some of them?
> > > 
> > 
> > No, what I mean is that a feature VIRTIO_NET_F_UNNECESSARY_CSUM is needed
> > in addition to VIRTIO_NET_F_GUEST_CSUM, if both features are negotiated,
> > then the driver may always receive packets marked as
> > VIRTIO_NET_HDR_F_DATA_VALID, which means that we can now load XDP at the
> > same time.
> 
> Makes no sense to me. VIRTIO_NET_F_GUEST_CSUM set already allows
> VIRTIO_NET_HDR_F_DATA_VALID:

We need to focus on what happens when the XDP program is loaded:

The current virtio-net needs to turn off the VIRTIO_NET_F_GUEST_CSUM
feature when loading XDP. The main reason for doing this is because
VIRTIO_NET_F_GUEST_CSUM allows to receive packets marked as
VIRTIO_NET_HDR_F_NEEDS_CSUM. Such packets are not compatible with
XDP programs, because we cannot guarantee that the csum_{start, offset}
fields are correct after XDP modifies the packets.

So in order for the driver to continue to enjoy the device's ability to
validate packets while XDP is loaded, we need a new feature to tell the
device not to deliver packets marked as VIRTIO_NET_HDR_F_NEEDS_CSUM (in
other words, the device can only deliver packets marked as
VIRTIO_NET_HDR_F_DATA_VALID).

Thanks.

> \item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
>   VIRTIO_NET_HDR_F_NEEDS_CSUM bit in \field{flags} can be
>   set: if so, the packet checksum at offset \field{csum_offset}
>   from \field{csum_start} and any preceding checksums
>   have been validated.  The checksum on the packet is incomplete and
>   if bit VIRTIO_NET_HDR_F_RSC_IN

Re: [virtio-dev] Re: [Proposal] Relationship between XDP and rx-csum in virtio-net

2023-05-23 Thread Heng Qi
On Tue, May 23, 2023 at 03:15:37AM -0400, Michael S. Tsirkin wrote:
> On Tue, May 23, 2023 at 10:41:18AM +0800, Heng Qi wrote:
> > On Mon, May 22, 2023 at 03:10:05PM -0400, Michael S. Tsirkin wrote:
> > > On Mon, May 22, 2023 at 08:12:00PM +0800, Heng Qi wrote:
> > > > 1) Add a feature bit to the virtio specification to tell the sender 
> > > > that a fully
> > > > csumed packet must be sent.
> > > 
> > > Who is the sender in this picture? The driver?
> > 
> > The device or the driver.
> > 
> > When the device is hw, the sender is more likely to be a device.
> > When the device is sw, the sender can be a device or a driver.
> >
> > But in general, this feature is inclined to constrain the behavior of the 
> > device and
> > the driver from the receiving side.
> 
> Based on above I am guessing you are talking about driver getting
> packets from device, I wish you used terms from virtio spec.

Yes, I'm going to use the terminology of the virtio spec.

> 
> > For example: 
> > VIRTIO_NET_F_UNNECESSARY_CSUM : The driver tells the device that you must 
> > send me a fully csumed packet.
> > 
> > Then the specific implementation can be
> > 
> > (1) the sender sends a fully csumed packet;
> > (2) the receiver receives a CHECKSUM_PARTIAL packet, and the device helps 
> > calculate the fully csum
> > (because the two parties in the communication are located on the same 
> > host, the packet is trusted.).
> > 
> > In summary, if VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the driver will 
> > no longer receive any packets marked CHECKSUM_PARTIAL.
> > 
> > Thanks.
> 
> This is what clearing VIRTIO_NET_F_GUEST_CSUM does.

Yes, but with VIRTIO_NET_F_GUEST_CSUM cleared, although the device can
receive a fully checksummed packet, we can no longer enjoy
the device's ability to validate the packet checksum. That is, the value
of \field{flags} in the virtio_net_hdr structure is set to 0, which means
that the packet received by the driver will not be marked as
VIRTIO_NET_HDR_F_DATA_VALID.

So, we need a feature bit (let's say VIRTIO_NET_F_UNNECESSARY_CSUM).
If VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the device must give the
driver a fully checksummed packet, and the packet is validated by the
device with \field{flags} set to VIRTIO_NET_HDR_F_DATA_VALID.

> 
> I feel you are trying to say that clearing VIRTIO_NET_F_GUEST_CSUM
> disables all offloads but you want to keep some of them?
> 

No, what I mean is that a feature VIRTIO_NET_F_UNNECESSARY_CSUM is needed
in addition to VIRTIO_NET_F_GUEST_CSUM, if both features are negotiated,
then the driver may always receive packets marked as
VIRTIO_NET_HDR_F_DATA_VALID, which means that we can now load XDP at the
same time.

> Again please use virtio terminology not Linux. to help you out,
> in current linux, VIRTIO_NET_HDR_F_NEEDS_CSUM and VIRTIO_NET_HDR_F_DATA_VALID
> will set CHECKSUM_PARTIAL and CHECKSUM_UNNECESSARY respectively.
> 

Sure. Will do as you suggested.

Thanks.

> 
> -- 
> MST
> 
> 
> -
> To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [Proposal] Relationship between XDP and rx-csum in virtio-net

2023-05-22 Thread Heng Qi


Currently, the VIRTIO_NET_F_GUEST_CSUM(NETIF_F_RXCSUM) feature of the virtio-net
driver conflicts with the loading of the XDP program, which is caused by the
problem described in [1][2], that is, XDP may cause errors in partial 
csumed-related
fields and resulting in packet dropping. rx CHECKSUM_PARTIAL mainly exists in 
the
virtualized environment, and its purpose is to save computing resource overhead.

The *goal* of this proposal is to enable the coexistence of XDP and 
VIRTIO_NET_F_GUEST_CSUM.

1. We need to understand why the device driver receives the rx CHECKSUM_PARTIAL 
packet.

Drivers related to the virtualized environment, such as 
virtio-net/veth/loopback,
etc., may receive partial csumed packets.

When the tx device finds that the destination rx device of the packet is
located on the same host, it is clear that the packet may not pass through
the physical link, so the tx device sends the packet with csum_{start, offset}
directly to the rx side to save computational resources without computing a 
fully csum
(depends on the specific implementation, some virtio-net backend devices are 
known to
behave like this currently). From [3], the stack trusts such packets.

However, veth still has NETIF_F_RXCSUM turned on when loading XDP. This may 
cause
packet dropping as [1][2] stated. But currently the veth community does not 
seem to
have reported such problems, can we guess that the coexistence of XDP and
rx CHECKSUM_PARTIAL has less negative impact?

2. About rx CHECKSUM_UNECESSARY:

We have just seen that in a virtualized environment a packet may flow between 
the
same host, so not computing the complete csum for the packet saves some cpu 
resources.

The purpose of the checksum is to verify that packets passing through the
physical link are correct. Of course, it is also reasonable to do a fully csum 
for
packets of the virtualized environment, which is exactly what we need.

rx CHECKSUM_UNECESSARY indicates that the packet has been fully checked,
that is, it is a credible packet. If such a packet is modified by the XDP 
program,
the user should recalculate the correct checksum using bpf_csum_diff() and
bpf_{l3,l4}_csum_replace().

Therefore, for those drivers(physical nic drivers?), such as atlantic/bnxt/mlx,
etc., XDP and NETIF_F_RXCSUM coexist, because their packets will be fully 
checked
at the tx side.

AWS's ena driver is also designed to be in this fully checksum mode
(we also mentioned below that a feature bit can be provided for virtio-net,
telling the sender that a fully checksum must be calculated to implement similar
behavior to other drivers), although it is in a virtualized environment.

3. To sum up:

It seems that only virtio-net sets XDP and VIRTIO_NET_F_GUEST_CSUM as mutually
exclusive, which may cause the following problems:

When XDP loads,

1) For packets that are fully checked by the sender, packets are marked as 
CHECKSUM_UNECESSARY
by the rx csum hw offloading.

virtio-net driver needs additional CPU resources to compute the checksum for 
any packet.

When testing with the following command in Aliyun ECS:
qperf dst_ip -lp 8989 -m 64K -t 20 tcp_bw
(mtu = 1500, dev layer GRO is on)

The csum-related overhead we tested on X86 is 11.7%, and on ARM is 15.8%.

2)
One of the main functions of the XDP prog is to be used as a monitoring and
firewall, etc., which means that the XDP prog may not modify the packet.
This is applicable to both rx CHECKSUM_PARTIAL and rx CHECKSUM_UNECESSARY,
but we ignore the rx csum hw offloading capability for both cases.

4. How we try to solve:

1) Add a feature bit to the virtio specification to tell the sender that a fully
csumed packet must be sent. Then XDP can coexist with VIRTIO_NET_F_GUEST_CSUM 
when this
feature bit is negotiated. (similar to ENA behavior)

2) Modify the current virtio-net driver

No longer filter the VIRTIO_NET_F_GUEST_CSUM feature in virtnet_xdp_set().
Then we can immediately get the ability from VIRTIO_NET_F_GUEST_CSUM and enjoy 
the software
CPU resources saved by rx csum hw offloading.
(This method is a bit rude)

5. Ending 

This is a proposal and does not represent a formal solution. Looking forward to 
feedback
from the community and exploring a possible/common solution to the problem 
described in
this proposal.

6. Quote

[1] 18ba58e1c234

virtio-net: fail XDP set if guest csum is negotiated

We don't support partial csumed packet since its metadata will be lost
or incorrect during XDP processing. So fail the XDP set if guest_csum
feature is negotiated.

[2] e59ff2c49ae1

virtio-net: disable guest csum during XDP set

We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we
can receive partial csumed packets with metadata kept in the
vnet_hdr. This may have several side effects:

- It could be overridden by header adjustment, thus is might be not
  correct after XDP processing.
- There's no way to pass such metadata information through
  

[virtio-dev] Re: [virtio-comment] Re: [PATCH v18] virtio-net: support inner header hash

2023-06-22 Thread Heng Qi




在 2023/6/22 下午8:32, Parav Pandit 写道:



From: Michael S. Tsirkin 
Sent: Thursday, June 22, 2023 2:23 AM

On Wed, Jun 21, 2023 at 08:52:04PM +, Parav Pandit wrote:

From: Michael S. Tsirkin 
Sent: Wednesday, June 21, 2023 4:38 PM

And the field is RO so no memory cost to exposing it in all VFs.

Two structures do not bring the asymmetry.
Accessing current and enabled fields via two different mechanism
is bringing

the asymmetry.

I guess it's a matter of taste, but it is clearly more consistent
with other hash things, to which it's very similar.


This is consistent with new commands we define including notification

coalescing whose GET is not coming config space.

But there GET just reports the current state. Not the read only capability. So
there would be cost per VF to keep it in config space.
This one is RO no cost per VF. Let's make it convenient?


And each VF can have different value hence requires per VF storage in the 
device.
  

Nah, config space is too convenient when we can live with its
limitations. I don't thin kwe prefer not to keep growing it.
For some things such as this one it's perfect.


Fields are different between different devices.

Not sure what's the implication?

Implication is device needs to store this in always available on-chip memory 
which is not good.


For example, for migration driver might want to validate that two
devices have same capability. doing it without dma is nicer.


A migration driver for real world scenario, will almost have to use the dma for

amount of data it needs to exchange.

Not migration itself, provisioning.


Provisioning driver usually do not attach to the member device directly.
This requires device reset, followed by reaching _DRIVER stage, querying 
features etc and config area.
And unbinding it and second reset by member driver. Ugh.

Provisioning driver also needs to get the state or capabilities even when 
member driver is already attached.
So config space is not much a gain either.
  

Another example, future admin transport will have ability to
provision devices by supplying their config space.
This will include this capability automatically, if instead we hide
it in a command we need to do extra custom work.


So we do not prefer to keep growing the config space anymore,
hence GET is the right approach to me.

Heh I know you hate config space. Let it go, stop wasting time
arguing about the same thing on every turn and instead help define
admin transport to solve it

This was discussed many times, a driver to have a direct (non-intercepted by

owner device) channel to device.

If you mean this non-intercepted channel as admin transport, fine.

we can do that, sure.


If you mean this is intercepted and it is going over admin cmd, then it is of no

use for all future interfaces.

We discussed this in thread with you and Jason.
I provided concrete example with size and device provisioning math too and

other example of multi-physical address VQ.

So transporting register by register over some admin transport is sub-optimal.


Not register by register, we can send all of config space as long as it's RO. 
This
field is.


It is RO in context of one member device, but every VF can have different value.
The device will never know if one will use new cmdvq to access or some old 
driver will use without it.
And hence, it always needs to provision it on onchip memory for backward 
compatibility.


Yes, I think we also have to consider upcoming
    1. device counters (e.g. supported_device_counter),
    2. receive flow filters (e.g. supported_flow_types, 
supported_max_entries),

    3. header splits (e.g. supported_split_types) etc.
Continuous expansion of the configuration space needs to be careful.



Instead of decision point being RO vs RW,
any new fields via cmdvq and existing fields stays in cfg space, give 
predictable behavior to size the member devices in the system.
Once the cmdvq is available, we can get rid of GET command used in this version 
for new future features.
Till that arrives, GET command is the efficient way.


Yes, I agree.

Thanks.


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Heng Qi




在 2023/6/29 上午9:56, Parav Pandit 写道:



From: Michael S. Tsirkin 
Sent: Wednesday, June 28, 2023 3:45 PM

Maybe I get it. You want to use the new features as a carrot to
force drivers to implement DMA? You suspect they will ignore the
spec requirement just because things seem to work?


Right because it is not a must normative.

Well SHOULD also does not mean "ok to just ignore".

This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.


RECOMMENDED and SHOULD forces the device to support MMIO, which is not good.
So rather a good design is device tells the starting offset for the extended 
config space.
And extended config space MUST be accessed using a DMA.
With this sw can have infinite size MMIO and hw device forces DMA based on its 
implementation of where to start DMA from.
This also gives the ability to maintain current config as MMIO for backward 
compatibility.
  



There's some logic here, for sure. you just might be right.

However, surely we can discuss this small tweak in 1.4 timeframe?

Sure, if we prefer the DMA approach I don't have a problem in adding

temporary one field to config space.

I propose to add a line to the spec " Device Configuration Space"
section, something like,

Note: Any new device configuration space fields additional MUST consider

accessing such fields via a DMA interface.

And this will guide the new patches of what to do instead of last moment

rush.

Yea, except again I'd probably make it a SHOULD: e.g. I can see how switching to
MMIO might be an option for qemu helping us debug DMA issues.


There are too many queues whose debugging is needed and MMIO likely not the way 
to debug.
  

The time to discuss this detail would be around when proposal for the DMA
access to config space is on list though: I feel this SHOULD vs MUST is a small
enough detail.


 From implementation POV it is certainly critical and good step forward to 
optimize virtio interface.
  

Going back to inner hash. If we move supported_tunnels back to config space,
do you feel we still need GET or just drop it? I note we do not have GET for
either hash or rss config.


For hash and rss config, debugging is missing. :)
Yes, we can drop the GET after switching supported_tunnels to struct 
virtio_net_hash_config.


Great! Glad to hear this!

  

And if we no longer have GET is there still a reason for a separate command as
opposed to a field in virtio_net_hash_config?
I know this was done in v11 but there it was misaligned.
We went with a command because we needed it for supported_tunnels but
now that is no longer the case and there are reserved words in
virtio_net_hash_config ...

Let me know how you feel it about that, not critical for me.

struct virtio_net_hash_config reserved is fine.


+1.

Inner header hash is orthogonal to RSS, and it's fine to have its own 
structure and commands.
There is no need to send additional RSS fields when we configure inner 
header hash.


Thanks.



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-29 Thread Heng Qi
On Thu, Jun 29, 2023 at 01:56:34AM +, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin 
> > Sent: Wednesday, June 28, 2023 3:45 PM
> > > > Maybe I get it. You want to use the new features as a carrot to
> > > > force drivers to implement DMA? You suspect they will ignore the
> > > > spec requirement just because things seem to work?
> > > >
> > > Right because it is not a must normative.
> > 
> > Well SHOULD also does not mean "ok to just ignore".
> > 
> > This word, or the adjective "RECOMMENDED", mean that there
> >may exist valid reasons in particular circumstances to ignore a
> >particular item, but the full implications must be understood and
> >carefully weighed before choosing a different course.
> >
> RECOMMENDED and SHOULD forces the device to support MMIO, which is not good.
> So rather a good design is device tells the starting offset for the extended 
> config space.
> And extended config space MUST be accessed using a DMA.
> With this sw can have infinite size MMIO and hw device forces DMA based on 
> its implementation of where to start DMA from.
> This also gives the ability to maintain current config as MMIO for backward 
> compatibility.
>  
> > 
> > 
> > > > There's some logic here, for sure. you just might be right.
> > > >
> > > > However, surely we can discuss this small tweak in 1.4 timeframe?
> > >
> > > Sure, if we prefer the DMA approach I don't have a problem in adding
> > temporary one field to config space.
> > >
> > > I propose to add a line to the spec " Device Configuration Space"
> > > section, something like,
> > >
> > > Note: Any new device configuration space fields additional MUST consider
> > accessing such fields via a DMA interface.
> > >
> > > And this will guide the new patches of what to do instead of last moment
> > rush.
> > 
> > Yea, except again I'd probably make it a SHOULD: e.g. I can see how 
> > switching to
> > MMIO might be an option for qemu helping us debug DMA issues.
> >
> There are too many queues whose debugging is needed and MMIO likely not the 
> way to debug.
>  
> > The time to discuss this detail would be around when proposal for the DMA
> > access to config space is on list though: I feel this SHOULD vs MUST is a 
> > small
> > enough detail.
> >
> From implementation POV it is certainly critical and good step forward to 
> optimize virtio interface.
>  
> > Going back to inner hash. If we move supported_tunnels back to config space,
> > do you feel we still need GET or just drop it? I note we do not have GET for
> > either hash or rss config.
> >
> For hash and rss config, debugging is missing. :)
> Yes, we can drop the GET after switching supported_tunnels to struct 
> virtio_net_hash_config.
>  

I would like to make sure if we're aligned. The new version should contain the 
following:
1. The supported_tunnel_types are placed in the device config space;
2. Reserve the following structure:

 struct virtnet_hash_tunnel {
le32 enabled_tunnel_types;
 };

3. Reserve the SET command for enabled_tunnel_types and remove the GET command 
for enabled_tunnel_types.

If there is no problem, I will modify it accordingly.

Thanks!

> > And if we no longer have GET is there still a reason for a separate command 
> > as
> > opposed to a field in virtio_net_hash_config?
> > I know this was done in v11 but there it was misaligned.
> > We went with a command because we needed it for supported_tunnels but
> > now that is no longer the case and there are reserved words in
> > virtio_net_hash_config ...
> > 
> > Let me know how you feel it about that, not critical for me.
> 
> struct virtio_net_hash_config reserved is fine.

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-29 Thread Heng Qi




在 2023/6/29 下午7:48, Michael S. Tsirkin 写道:

On Thu, Jun 29, 2023 at 10:05:09AM +0800, Heng Qi wrote:


在 2023/6/29 上午9:56, Parav Pandit 写道:

From: Michael S. Tsirkin 
Sent: Wednesday, June 28, 2023 3:45 PM

Maybe I get it. You want to use the new features as a carrot to
force drivers to implement DMA? You suspect they will ignore the
spec requirement just because things seem to work?


Right because it is not a must normative.

Well SHOULD also does not mean "ok to just ignore".

This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.


RECOMMENDED and SHOULD forces the device to support MMIO, which is not good.
So rather a good design is device tells the starting offset for the extended 
config space.
And extended config space MUST be accessed using a DMA.
With this sw can have infinite size MMIO and hw device forces DMA based on its 
implementation of where to start DMA from.
This also gives the ability to maintain current config as MMIO for backward 
compatibility.

There's some logic here, for sure. you just might be right.

However, surely we can discuss this small tweak in 1.4 timeframe?

Sure, if we prefer the DMA approach I don't have a problem in adding

temporary one field to config space.

I propose to add a line to the spec " Device Configuration Space"
section, something like,

Note: Any new device configuration space fields additional MUST consider

accessing such fields via a DMA interface.

And this will guide the new patches of what to do instead of last moment

rush.

Yea, except again I'd probably make it a SHOULD: e.g. I can see how switching to
MMIO might be an option for qemu helping us debug DMA issues.


There are too many queues whose debugging is needed and MMIO likely not the way 
to debug.

The time to discuss this detail would be around when proposal for the DMA
access to config space is on list though: I feel this SHOULD vs MUST is a small
enough detail.


  From implementation POV it is certainly critical and good step forward to 
optimize virtio interface.

Going back to inner hash. If we move supported_tunnels back to config space,
do you feel we still need GET or just drop it? I note we do not have GET for
either hash or rss config.


For hash and rss config, debugging is missing. :)
Yes, we can drop the GET after switching supported_tunnels to struct 
virtio_net_hash_config.

Great! Glad to hear this!


And if we no longer have GET is there still a reason for a separate command as
opposed to a field in virtio_net_hash_config?
I know this was done in v11 but there it was misaligned.
We went with a command because we needed it for supported_tunnels but
now that is no longer the case and there are reserved words in
virtio_net_hash_config ...

Let me know how you feel it about that, not critical for me.

struct virtio_net_hash_config reserved is fine.

+1.

Inner header hash is orthogonal to RSS, and it's fine to have its own
structure and commands.
There is no need to send additional RSS fields when we configure inner
header hash.

Thanks.

Not RSS, hash calculations. It's not critical, but I note that
practically you said you will enable this with symmetric hash
so it makes sense to me to send this in the same command


This works for me.

Thanks.


with the key.

Not critical though if there's opposition.




-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] RE: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-29 Thread Heng Qi
On Thu, Jun 29, 2023 at 04:59:28PM +, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin 
> > Sent: Thursday, June 29, 2023 7:48 AM
> 
> 
> > > > struct virtio_net_hash_config reserved is fine.
> > >
> > > +1.
> > >
> > > Inner header hash is orthogonal to RSS, and it's fine to have its own
> > > structure and commands.
> > > There is no need to send additional RSS fields when we configure inner
> > > header hash.
> > >
> > > Thanks.
> > 
> > Not RSS, hash calculations. It's not critical, but I note that practically 
> > you said
> > you will enable this with symmetric hash so it makes sense to me to send 
> > this in
> > the same command with the key.
> > 
> 
> In the v19, we have,
> 
> +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ along with 
> VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT.
> 
> So it is done along with rss, so in same struct as rss config is fine.

Do you mean having both virtio_net_rss_config and virtio_net_hash_config
have enabled_hash_types?
Like this:

struct virtio_net_rss_config {
 le32 hash_types;
 le16 indirection_table_mask;
 struct rss_rq_id unclassified_queue;
 struct rss_rq_id indirection_table[indirection_table_length];
 le16 max_tx_vq;
 u8 hash_key_length;
 u8 hash_key_data[hash_key_length];
+le32 enabled_tunnel_types; 
};

struct virtio_net_hash_config {
 le32 hash_types;
-le16 reserved[4];
+le32 enabled_tunnel_types;
+le16 reserved[2];
 u8 hash_key_length;
 u8 hash_key_data[hash_key_length];
};


If yes, this should have been discussed in v10 [1] before, enabled_tunnel_types 
in virtio_net_rss_config
will follow the variable length field and cause misalignment.

If we let the inner header hash reuse the virtio_net_hash_config structure, it 
can work, but the only disadvantage
is that the configuration of the inner header hash and *RSS*(not hash 
calculations) becomes somewhat coupled.
Just imagine:
If the driver and the device negotiated VIRTIO_NET_F_HASH_TUNNEL and 
VIRTIO_NET_F_RSS, but did not negotiate VIRTIO_NET_F_HASH_REPORT,
1. then if we only want to configure the inner header hash (such as 
enabled_tunnel_types), it is good for us to send
virtio_net_hash_config alone;
2. but then if we want to configure the inner header hash and RSS (such as 
indirection table), we need to send all
virtio_net_rss_config and virtio_net_hash_config once, because 
virtio_net_rss_config now does not carry enabled_tunnel_types
due to misalignment.

So, I think the following structure will make it clearer to configure inner 
header hash and RSS/hash calculation.
But in any case, if we still propose to reuse virtio_net_hash_config proposal, 
I am ok, no objection:

1. The supported_tunnel_types are placed in the device config space;

2.
Reserve the following structure:

  struct virtnet_hash_tunnel {
le32 enabled_tunnel_types;
  };

3. Reserve the SET command for enabled_tunnel_types and remove the GET
command for enabled_tunnel_types.

[1] https://lists.oasis-open.org/archives/virtio-dev/202303/msg00317.html

Thanks a lot!

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio-comment] [PATCH v17] virtio-net: support inner header hash

2023-06-20 Thread Heng Qi
On Tue, Jun 20, 2023 at 08:06:16AM -0400, Michael S. Tsirkin wrote:
> On Mon, Jun 12, 2023 at 04:09:20PM +0800, Heng Qi wrote:
> > 1. Currently, a received encapsulated packet has an outer and an inner 
> > header, but
> > the virtio device is unable to calculate the hash for the inner header. The 
> > same
> > flow can traverse through different tunnels, resulting in the encapsulated
> > packets being spread across multiple receive queues (refer to the figure 
> > below).
> > However, in certain scenarios, we may need to direct these encapsulated 
> > packets of
> > the same flow to a single receive queue. This facilitates the processing
> > of the flow by the same CPU to improve performance (warm caches, less 
> > locking, etc.).
> > 
> >client1client2
> >   |+---+ |
> >   +--->|tunnels|<+
> >+---+
> >   |  |
> >   v  v
> >   +-+
> >   | monitoring host |
> >   +-+
> > 
> > To achieve this, the device can calculate a symmetric hash based on the 
> > inner headers
> > of the same flow.
> > 
> > 2. For legacy systems, they may lack entropy fields which modern protocols 
> > have in
> > the outer header, resulting in multiple flows with the same outer header but
> > different inner headers being directed to the same receive queue. This 
> > results in
> > poor receive performance.
> > 
> > To address this limitation, inner header hash can be used to enable the 
> > device to advertise
> > the capability to calculate the hash for the inner packet, regaining better 
> > receive performance.
> > 
> > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/173
> > 
> > Signed-off-by: Heng Qi 
> > Reviewed-by: Xuan Zhuo 
> > Reviewed-by: Parav Pandit 
> 
> 
> Looks good but small rewording suggestions:
> 
> > ---
> > v16->v17:
> > 1. Some small rewrites. @Parav Pandit
> > 2. Add Parav's Reviewed-by tag (Thanks!).
> > 
> > v15->v16:
> > 1. Remove the hash_option. In order to delimit the inner header hash 
> > and RSS
> >configuration, the ability to configure the outer src udp port hash 
> > is given
> >to RSS. This is orthogonal to inner header hash, which will be done 
> > in the
> >RSS capability extension topic (considered as an RSS extension 
> > together
> >with the symmetric toeplitz hash algorithm, etc.). @Parav Pandit 
> > @Michael S . Tsirkin
> 
> Hmm maybe. I'd like the TC see this patch though, they are designed to
> work together. Can we see at least a draft? I have been burned

I've drawn up a simple draft on hashing based on outer src port,
but not yet on toeplitz symmetric hashing, I'll send it out in about 4
weeks, but please don't delay this work. You have my word.

> many times by contributors just addressing what they
> care about, promising to follow through with other work and
> never bothering.

I got it.

> 
> > 2. Fix a 'field' typo. @Parav Pandit
> > 
> > v14->v15:
> > 1. Add tunnel hash option suggested by @Michael S . Tsirkin
> > 2. Adjust some descriptions.
> > 
> > v13->v14:
> > 1. Move supported_hash_tunnel_types from config space into cvq command. 
> > @Parav Pandit
> > 2. Rebase to master branch.
> > 3. Some minor modifications.
> > 
> > v12->v13:
> > 1. Add a GET command for hash_tunnel_types. @Parav Pandit
> > 2. Add tunneling protocol explanation. @Jason Wang
> > 3. Add comments on some usage scenarios for inner hash.
> > 
> > v11->v12:
> > 1. Add a command VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG.
> > 2. Refine the commit log. @Michael S . Tsirkin
> > 3. Add some tunnel types.
> > 
> > v10->v11:
> > 1. Revise commit log for clarity for readers.
> > 2. Some modifications to avoid undefined terms. @Parav Pandit
> > 3. Change VIRTIO_NET_F_HASH_TUNNEL dependency. @Parav Pandit
> > 4. Add the normative statements. @Parav Pandit
> > 
> > v9->v10:
> > 1. Removed hash_report_tunnel related information. @Parav Pandit
> > 2. Re-describe the limitations of QoS for tunneling.
> > 3. Some clarification.
> > 
> > v8->v9:
> > 1. Merge hash_report_tunnel_types into hash_report.

  1   2   >