[ovs-dev] Can ovs dnat support 1:1 shifted port range mapping?

2023-03-01 Thread
Hi, folks

 

Netfilter added a shifted port range mapping function in Linux kernel 4.19; it looks like this:

 

iptables -t nat -A zone_wan_prerouting -p tcp -m tcp --dport 5000:5100 -j DNAT 
--to-destination '192.168.1.2:2000-2100/5000'

 

5000-5100 is mapped to 2000-2100

 

Can the OVS ct action do this? If not, can it do the case below?

 

iptables -t nat -A zone_wan_prerouting -p tcp -m tcp --dport 5000:5100 -j DNAT 
--to-destination '192.168.1.2:5000-5100'
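
For reference, the closest syntax I can find in ovs-actions(7) is a destination port range inside ct NAT, something like the sketch below (not tested here; the table numbers and bridge name are just placeholders, and matching the 5000-5100 dport range itself is a separate problem, see below):

ovs-ofctl add-flow br-wan "table=0,priority=100,tcp,actions=ct(commit,nat(dst=192.168.1.2:5000-5100),table=1)"

As far as I can tell this selects a port from the given range rather than doing a 1:1 shifted mapping.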

 

I’m wondering how we can express this precisely with openflow; expressing a tcp dport range is also a big problem.
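
As far as I understand, a dport range can only be matched by splitting it into bitmask matches on tp_dst, so 5000-5100 would need several flows, roughly like this (a sketch only, actions omitted and not verified):

ovs-ofctl add-flow br-wan "tcp,tp_dst=0x1388/0xfff8,actions=..."   # 5000-5007
ovs-ofctl add-flow br-wan "tcp,tp_dst=0x1390/0xfff0,actions=..."   # 5008-5023
ovs-ofctl add-flow br-wan "tcp,tp_dst=0x13a0/0xffe0,actions=..."   # 5024-5055
ovs-ofctl add-flow br-wan "tcp,tp_dst=0x13c0/0xffe0,actions=..."   # 5056-5087
ovs-ofctl add-flow br-wan "tcp,tp_dst=0x13e0/0xfff8,actions=..."   # 5088-5095
ovs-ofctl add-flow br-wan "tcp,tp_dst=0x13e8/0xfffc,actions=..."   # 5096-5099
ovs-ofctl add-flow br-wan "tcp,tp_dst=5100,actions=..."            # 5100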

 



[ovs-dev] Is it necessary for ovs that version of ovs command line tools must be same as the one of ovs-vswitchd?

2021-11-26 Thread
Hi, folks

 

Recently we found some weird issues: ovs-vswitchd will crash occasionally when ovs-ofctl add-br is run with the protocol option and the version of the OVS command line tools in the neutron agent (which is packaged as a k8s pod) is different from the one on the host, or the openflows in br-int don’t work as expected even though the flows themselves look ok.

 

We often have such cases, i.e. the ovs version on the host is different from the one in the neutron agent, and yet they did work as expected.

 

Per my understanding, they should have the same version; otherwise a higher version is talking with a lower version, or vice versa. Can anybody give us advice on how we can fix such issues? Is there a list which shows which versions can work together? Thank you in advance for your help.
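
For reference, the versions on the two sides can be compared with commands like these (run on the host and inside the neutron agent pod respectively):

ovs-vswitchd --version
ovs-ofctl --version
ovs-vsctl get Open_vSwitch . ovs_version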



Re: [ovs-dev] [PATCH] correct macro name

2021-09-13 Thread
Acked-by: Yi Yang 

-----Original Message-----
From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of chengtcli--- via dev
Sent: 2021-09-13 21:34
To: ovs-dev 
Subject: [ovs-dev] [PATCH] correct macro name

From: lic121 

fix macro name from "VLXAN_GPE_FLAGS_P" to "VXLAN_GPE_FLAGS_P"

Signed-off-by: lic121 
---
 lib/packets.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/packets.h b/lib/packets.h
index 515bb59..e8bdf08 100644
--- a/lib/packets.h
+++ b/lib/packets.h
@@ -1492,7 +1492,7 @@ BUILD_ASSERT_DECL(sizeof(struct vxlanhdr) == 8);

 /* Fields in struct vxlanhdr.vx_gpe.flags */
 #define VXLAN_GPE_FLAGS_VER 0x30/* Version. */
-#define VLXAN_GPE_FLAGS_P   0x04/* Next Protocol Bit. */
+#define VXLAN_GPE_FLAGS_P   0x04/* Next Protocol Bit. */
 #define VXLAN_GPE_FLAGS_O   0x01/* OAM Bit. */

 /* VXLAN-GPE header flags. */
--
1.8.3.1



chengt...@qq.com


[ovs-dev] Does anybody know how we can work around NORMAL action limitation about vlan?

2021-08-05 Thread
Hi, folks

 

I’m changing Openstack Neutron to use an openflow-based pipeline to implement qrouter and floating IP. Everything is ok when the two VMs are on two different compute nodes and in two different subnets but the same network (so the vlan tag is the same), but the NORMAL action drops packets if they are on different networks (the main difference being that the vlan tags are different). We can see this information with “ovs-appctl ofproto/trace …”:

 

dropping VLAN 2 tagged packet received on port tap351a3443-9c configured as 
VLAN 1 access port

 

I believe it is caused by the different vlans, so I changed the vlan id and IN_PORT with mod_vlan_vid:2 and resubmit(qr-492d8a7b-2d,73), but NORMAL still dropped it.

 

dropping VLAN 2 tagged packet received on port qr-492d8a7b-2d configured as 
VLAN 2 access port

 

Can anybody help explain why it is so?

 

But it is ok if I use output:patch-tun directly instead of NORMAL in table 94. I’m wondering how OVN works around this limitation of NORMAL. Does anybody know how I can fix it if I still want to leverage the existing pipeline? Thank you all in advance for your help.
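
In case it is useful, the port-level VLAN configuration can be inspected and experimented with like this (only a sketch with my port names; I have not confirmed whether changing vlan_mode is a valid workaround for the Neutron pipeline):

ovs-vsctl list port qr-492d8a7b-2d
ovs-vsctl set port qr-492d8a7b-2d vlan_mode=trunk trunks=1,2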



[ovs-dev] Re: [ovs-discuss] Do you know how I can set nd_options_type field for ipv6 ND message?

2021-02-24 Thread
No, a neighbor advertisement (the reply to a neighbor solicitation, i.e. the IPv6 counterpart of an ARP reply) must have nd_options_type==2.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com]
Sent: 2021-02-25 3:05
To: Yi Yang (杨燚)-云服务集团
Cc: f...@sysclose.org; b...@ovn.org; ovs-dev@openvswitch.org; ovs-disc...@openvswitch.org
Subject: Re: [ovs-discuss] Do you know how I can set nd_options_type field for ipv6 ND message?

On Tue, Feb 23, 2021 at 6:03 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> Out of curious, I remember OVN doesn't support OVS DPDK, I believe OVN also 
> does IPv6 ND by openflow, is it acceptable to use slow path to handle IPv6 ND 
> for OVS kernel datapath?
>
Maybe OVN uses IPv6 ND, but not setting/matching 'nd_options_type'?
William


[ovs-dev] Re: [ovs-discuss] Do you know how I can set nd_options_type field for ipv6 ND message?

2021-02-23 Thread
Out of curiosity: I remember OVN doesn't support OVS DPDK, and I believe OVN also does IPv6 ND by openflow. Is it acceptable to use the slow path to handle IPv6 ND for the OVS kernel datapath?

-----Original Message-----
From: Yi Yang (杨燚)-云服务集团
Sent: 2021-02-24 9:56
To: 'u9012...@gmail.com'
Cc: 'f...@sysclose.org' ; 'b...@ovn.org' ; 'ovs-dev@openvswitch.org' ; 'ovs-disc...@openvswitch.org'
Subject: Re: [ovs-discuss] Do you know how I can set nd_options_type field for ipv6 ND message?
Importance: High

Thanks, got it. I'll add this to the kernel datapath if nobody has done it.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com]
Sent: 2021-02-24 2:31
To: Yi Yang (杨燚)-云服务集团
Cc: f...@sysclose.org; b...@ovn.org; ovs-dev@openvswitch.org; ovs-disc...@openvswitch.org
Subject: Re: [ovs-discuss] Do you know how I can set nd_options_type field for ipv6 ND message?

On Sun, Feb 21, 2021 at 9:39 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> Thanks William, it is my ovs-ofctl issue, my ovs-vswitchd is new, but 
> ovs-ofctl is old, but after I used ovs-ofctl, I still saw issues:
>
> OFPT_ERROR (OF1.3) (xid=0x6): OFPBAC_BAD_SET_ARGUMENT OFPT_FLOW_MOD 
> (OF1.3) (xid=0x6): ADD 
> icmp6,icmp_type=135,icmp_code=0,nd_sll=22:70:e0:d0:fc:75 
> cookie:0x1234567890123456 
> actions=set_field:136->icmpv6_type,set_field:0->icmpv6_code,set_field:
> 2->nd_options_type,resubmit(,0) OFPT_ERROR (OF1.3) (xid=0x6): 
> OFPBAC_BAD_SET_ARGUMENT OFPT_FLOW_MOD (OF1.3) (xid=0x6): ADD 
> icmp6,icmp_type=136,icmp_code=0 cookie:0x1234567890123456 
> actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:22:70:e0:d0:
> fc:76->eth_src,move:NXM_NX_IPV6_SRC[]->NXM_NX_IPV6_DST[],set_field:fe8
> 0::2070:e0ff:fed0:fc76->ipv6_src,set_field:22:70:e0:d0:fc:76->nd_tll,s
> et_field:57344->nd_reserved,IN_PORT
>
> I saw this thread 
> https://www.mail-archive.com/ovs-dev@openvswitch.org/msg47815.html discussed 
> it,  it can work after I applied patch in 
> https://www.mail-archive.com/ovs-dev@openvswitch.org/msg47815.html.
>
> So @Flavio, maybe you need to apply the patch in 
> https://www.mail-archive.com/ovs-dev@openvswitch.org/msg47815.html to fix 
> this issue in kernel datapath, per my check, kernel datapath with this patch 
> can work, I verified it by using openflow to do NS reply and ICMPv6 ping 
> reply.
>
> William, I guess  you're using ovs userspace, so it doesn't have this issue.

Yes, I'm testing it using the userspace datapath. I think the kernel datapath doesn't support setting nd_ext.
Looks like you can get it working by doing it in the slow path, like the patch you pointed to.
William


[ovs-dev] Re: [ovs-discuss] Do you know how I can set nd_options_type field for ipv6 ND message?

2021-02-23 Thread
Thanks, got it. I'll add this to the kernel datapath if nobody has done it.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com]
Sent: 2021-02-24 2:31
To: Yi Yang (杨燚)-云服务集团
Cc: f...@sysclose.org; b...@ovn.org; ovs-dev@openvswitch.org; ovs-disc...@openvswitch.org
Subject: Re: [ovs-discuss] Do you know how I can set nd_options_type field for ipv6 ND message?

On Sun, Feb 21, 2021 at 9:39 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> Thanks William, it is my ovs-ofctl issue, my ovs-vswitchd is new, but 
> ovs-ofctl is old, but after I used ovs-ofctl, I still saw issues:
>
> OFPT_ERROR (OF1.3) (xid=0x6): OFPBAC_BAD_SET_ARGUMENT OFPT_FLOW_MOD 
> (OF1.3) (xid=0x6): ADD 
> icmp6,icmp_type=135,icmp_code=0,nd_sll=22:70:e0:d0:fc:75 
> cookie:0x1234567890123456 
> actions=set_field:136->icmpv6_type,set_field:0->icmpv6_code,set_field:
> 2->nd_options_type,resubmit(,0) OFPT_ERROR (OF1.3) (xid=0x6): 
> OFPBAC_BAD_SET_ARGUMENT OFPT_FLOW_MOD (OF1.3) (xid=0x6): ADD 
> icmp6,icmp_type=136,icmp_code=0 cookie:0x1234567890123456 
> actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:22:70:e0:d0:
> fc:76->eth_src,move:NXM_NX_IPV6_SRC[]->NXM_NX_IPV6_DST[],set_field:fe8
> 0::2070:e0ff:fed0:fc76->ipv6_src,set_field:22:70:e0:d0:fc:76->nd_tll,s
> et_field:57344->nd_reserved,IN_PORT
>
> I saw this thread 
> https://www.mail-archive.com/ovs-dev@openvswitch.org/msg47815.html discussed 
> it,  it can work after I applied patch in 
> https://www.mail-archive.com/ovs-dev@openvswitch.org/msg47815.html.
>
> So @Flavio, maybe you need to apply the patch in 
> https://www.mail-archive.com/ovs-dev@openvswitch.org/msg47815.html to fix 
> this issue in kernel datapath, per my check, kernel datapath with this patch 
> can work, I verified it by using openflow to do NS reply and ICMPv6 ping 
> reply.
>
> William, I guess  you're using ovs userspace, so it doesn't have this issue.

Yes, I'm testing it using the userspace datapath. I think the kernel datapath doesn't support setting nd_ext.
Looks like you can get it working by doing it in the slow path, like the patch you pointed to.
William


[ovs-dev] Re: [ovs-discuss] Do you know how I can set nd_options_type field for ipv6 ND message?

2021-02-21 Thread
Thanks William, it was my ovs-ofctl issue: my ovs-vswitchd is new but my ovs-ofctl was old. However, even after I switched to the new ovs-ofctl, I still saw issues:

OFPT_ERROR (OF1.3) (xid=0x6): OFPBAC_BAD_SET_ARGUMENT
OFPT_FLOW_MOD (OF1.3) (xid=0x6): ADD 
icmp6,icmp_type=135,icmp_code=0,nd_sll=22:70:e0:d0:fc:75 
cookie:0x1234567890123456 
actions=set_field:136->icmpv6_type,set_field:0->icmpv6_code,set_field:2->nd_options_type,resubmit(,0)
OFPT_ERROR (OF1.3) (xid=0x6): OFPBAC_BAD_SET_ARGUMENT
OFPT_FLOW_MOD (OF1.3) (xid=0x6): ADD icmp6,icmp_type=136,icmp_code=0 
cookie:0x1234567890123456 
actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:22:70:e0:d0:fc:76->eth_src,move:NXM_NX_IPV6_SRC[]->NXM_NX_IPV6_DST[],set_field:fe80::2070:e0ff:fed0:fc76->ipv6_src,set_field:22:70:e0:d0:fc:76->nd_tll,set_field:57344->nd_reserved,IN_PORT

I saw this thread https://www.mail-archive.com/ovs-dev@openvswitch.org/msg47815.html discussed it; it works after I applied the patch in https://www.mail-archive.com/ovs-dev@openvswitch.org/msg47815.html.

So @Flavio, maybe you need to apply the patch in https://www.mail-archive.com/ovs-dev@openvswitch.org/msg47815.html to fix this issue in the kernel datapath. Per my check, the kernel datapath with this patch works; I verified it by using openflow to do NS reply and ICMPv6 ping reply.

William, I guess you're using ovs userspace, so it doesn't have this issue.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com]
Sent: 2021-02-22 12:20
To: Yi Yang (杨燚)-云服务集团
Cc: vishal.deep.ajm...@ericsson.com; b...@ovn.org; ovs-dev@openvswitch.org; ovs-disc...@openvswitch.org
Subject: Re: [ovs-discuss] Do you know how I can set nd_options_type field for ipv6 ND message?

On Sat, Feb 20, 2021 at 2:11 AM Yi Yang (杨燚)-云服务集团  wrote:
>
> Hi, folks
>
>
>
> I need to set nd_options_type to 2 for NS message to respond IPv6 NS, my flow 
> is below, why nd_options_type can’t be set? Per commit 
> 9b2b84973db76e1138d9234ff1b84bb6bb156011, it should work, what’s wrong? 
> Appreciate your help in advance, thank you.
>
>
>
> $ sudo ovs-ofctl -Oopenflow13 add-flow br-int 
> "table=0,ipv6,icmp6,icmp_type=135,icmp_code=0,nd_target=fe80::505c:cff:fe88:392f,nd_sll=52:5c:0c:88:39:2f,actions=set_field:136->icmpv6_type,set_field:0->icmpv6_code,move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:52:5c:0c:88:39:3f->eth_src,move:NXM_NX_IPV6_SRC[]->NXM_NX_IPV6_DST[],set_field:fe80::505c:cff:fe88:392f->ipv6_src,set_field:52:5c:0c:88:39:3f->nd_tll,set_field:2->nd_options_type,set_field:OxE000->nd_reserved,output:IN_PORT"
>
> ovs-ofctl: nd_options_type is not a valid OXM field name
>
> $ sudo ovs-ofctl -Oopenflow13 add-flow br-int 
> "table=0,ipv6,icmp6,icmp_type=135,icmp_code=0,nd_target=fe80::505c:cff:fe88:392f,nd_sll=52:5c:0c:88:39:2f,actions=set_field:136->icmpv6_type,set_field:0->icmpv6_code,move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:52:5c:0c:88:39:3f->eth_src,move:NXM_NX_IPV6_SRC[]->NXM_NX_IPV6_DST[],set_field:fe80::505c:cff:fe88:392f->ipv6_src,set_field:52:5c:0c:88:39:3f->nd_tll,load:2->ERICOXM_OF_ICMPV6_ND_OPTIONS_TYPE[],set_field:OxE000->nd_reserved,output:IN_PORT"
>
> ovs-ofctl: ERICOXM_OF_ICMPV6_ND_OPTIONS_TYPE[]: unknown field 
> `ERICOXM_OF_ICMPV6_ND_OPTIONS_TYPE'

I tested by doing
roos:~/ovs# ovs-ofctl add-flow br0 "in_port=1
icmp6,icmpv6_code=0,icmpv6_type=135
actions=set_field:2->nd_options_type, 2"
roos:~/ovs# ovs-ofctl dump-flows br0
 cookie=0x0, duration=8.492s, table=0, n_packets=0, n_bytes=0,
icmp6,in_port="afxdp-p0",icmp_type=135,icmp_code=0
actions=load:0x2->ERICOXM_OF_ICMPV6_ND_OPTIONS_TYPE[],output:2
 cookie=0x0, duration=1078.248s, table=0, n_packets=0, n_bytes=0,
priority=0 actions=NORMAL

Looks ok.
William


[ovs-dev] Do you know how I can set nd_options_type field for ipv6 ND message?

2021-02-20 Thread
Hi, folks

 

I need to set nd_options_type to 2 in the reply to an IPv6 NS message; my flows are below. Why can’t nd_options_type be set? Per commit 9b2b84973db76e1138d9234ff1b84bb6bb156011, it should work, so what’s wrong? I appreciate your help in advance, thank you.

 

$ sudo ovs-ofctl -Oopenflow13 add-flow br-int 
"table=0,ipv6,icmp6,icmp_type=135,icmp_code=0,nd_target=fe80::505c:cff:fe88:392f,nd_sll=52:5c:0c:88:39:2f,actions=set_field:136->icmpv6_type,set_field:0->icmpv6_code,move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:52:5c:0c:88:39:3f->eth_src,move:NXM_NX_IPV6_SRC[]->NXM_NX_IPV6_DST[],set_field:fe80::505c:cff:fe88:392f->ipv6_src,set_field:52:5c:0c:88:39:3f->nd_tll,set_field:2->nd_options_type,set_field:OxE000->nd_reserved,output:IN_PORT"

ovs-ofctl: nd_options_type is not a valid OXM field name

$ sudo ovs-ofctl -Oopenflow13 add-flow br-int 
"table=0,ipv6,icmp6,icmp_type=135,icmp_code=0,nd_target=fe80::505c:cff:fe88:392f,nd_sll=52:5c:0c:88:39:2f,actions=set_field:136->icmpv6_type,set_field:0->icmpv6_code,move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:52:5c:0c:88:39:3f->eth_src,move:NXM_NX_IPV6_SRC[]->NXM_NX_IPV6_DST[],set_field:fe80::505c:cff:fe88:392f->ipv6_src,set_field:52:5c:0c:88:39:3f->nd_tll,load:2->ERICOXM_OF_ICMPV6_ND_OPTIONS_TYPE[],set_field:OxE000->nd_reserved,output:IN_PORT"

ovs-ofctl: ERICOXM_OF_ICMPV6_ND_OPTIONS_TYPE[]: unknown field 
`ERICOXM_OF_ICMPV6_ND_OPTIONS_TYPE'

$

 



[ovs-dev] Re: Re: [PATCH V3 2/4] Add GSO support for DPDK data path

2021-02-07 Thread
Yes, GSO is ok, but GRO may have that issue. I didn't see that issue in my openstack environment, so it would be great if we could have a test case that triggers it.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com]
Sent: 2021-02-07 23:46
To: Yi Yang (杨燚)-云服务集团
Cc: i.maxim...@ovn.org; yang_y...@163.com; ovs-dev@openvswitch.org; f...@sysclose.org
Subject: Re: [ovs-dev] Re: [PATCH V3 2/4] Add GSO support for DPDK data path

On Tue, Oct 27, 2020 at 6:02 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> -----Original Message-----
> From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Ilya Maximets
> Sent: 2020-10-27 21:03
> To: yang_y...@163.com; ovs-dev@openvswitch.org
> Cc: f...@sysclose.org; i.maxim...@ovn.org
> Subject: Re: [ovs-dev] [PATCH V3 2/4] Add GSO support for DPDK data path
>
> On 8/7/20 12:56 PM, yang_y...@163.com wrote:
> > From: Yi Yang 
> >
> > GSO(Generic Segment Offload) can segment large UDP  and TCP packet 
> > to small packets per MTU of destination , especially for the case 
> > that physical NIC can't do hardware offload VXLAN TSO and VXLAN UFO, 
> > GSO can make sure userspace TSO can still work but not drop.
> >
> > In addition, GSO can help improve UDP performane when UFO is enabled 
> > in VM.
> >
> > GSO can support TCP, UDP, VXLAN TCP, VXLAN UDP, it is done in Tx 
> > function of physical NIC.
> >
> > Signed-off-by: Yi Yang 
> > ---
> >  lib/dp-packet.h|  21 +++-
> >  lib/netdev-dpdk.c  | 358
> > +
> >  lib/netdev-linux.c |  17 ++-
> >  lib/netdev.c   |  67 +++---
> >  4 files changed, 417 insertions(+), 46 deletions(-)

snip

> >
> > @@ -2339,24 +2428,19 @@ netdev_dpdk_prep_hwol_batch(struct netdev_dpdk 
> > *dev, struct rte_mbuf **pkts,
> >  return cnt;
> >  }
> >
> > -/* Tries to transmit 'pkts' to txq 'qid' of device 'dev'.  Takes 
> > ownership of
> > - * 'pkts', even in case of failure.
> > - *
> > - * Returns the number of packets that weren't transmitted. */  
> > static inline int -netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int 
> > qid,
> > - struct rte_mbuf **pkts, int cnt)
> > +__netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid,
> > +   struct rte_mbuf **pkts, int cnt)
> >  {
> >  uint32_t nb_tx = 0;
> > -uint16_t nb_tx_prep = cnt;
> > +uint32_t nb_tx_prep;
> >
> > -if (userspace_tso_enabled()) {
> > -nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
> > -if (nb_tx_prep != cnt) {
> > -VLOG_WARN_RL(, "%s: Output batch contains invalid packets. "
> > - "Only %u/%u are valid: %s", dev->up.name, 
> > nb_tx_prep,
> > - cnt, rte_strerror(rte_errno));
> > -}
> > +nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
> > +if (nb_tx_prep != cnt) {
> > +VLOG_WARN_RL(, "%s: Output batch contains invalid packets. "
> > +  "Only %u/%u are valid: %s",
> > + dev->up.name, nb_tx_prep,
> > + cnt, rte_strerror(rte_errno));
> >  }
> >
> >  while (nb_tx != nb_tx_prep) {
> > @@ -2384,6 +2468,200 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, 
> > int qid,
> >  return cnt - nb_tx;
> >  }
> >
> > +static inline void
> > +set_multiseg_udptcp_cksum(struct rte_mbuf *mbuf)
>
> I didn't review the patch, only had a quick glance, but this part bothers me. 
>  OVS doesn't support multi-segment mbufs, so it should not be possible for 
> such mbufs being transmitted by OVS.  So, I do not understand why this 
> function needs to work with such mbufs.
>
> [Yi Yang] Only DPDK driver/Tx function will use it, not OVS, 
> set_multiseg_udptcp_cksum is called in GSO part, it is last step before Tx 
> function, it is a big external mbuf before rte_gso_segment, that isn't a 
> multi-segmented mbuf.
>

Hi Ilya,

Now I understand Yi Yang's point better and I agree with him.
Looks like the patch does the GSO at the DPDK TX function.
It creates multi-seg mbufs after rte_gso_segment(), but immediately sends them out to the DPDK port without traversing other parts of the OVS code. I guess in this case it should work OK?

William


[ovs-dev] Re: Re: [PATCH V3 3/4] Add VXLAN TCP and UDP GRO support for DPDK data path

2021-02-06 Thread
I have sent a v4 patch to remove the GRO and GSO code to avoid such concerns.

GRO and GSO must use multi-seg mbufs. Our local openstack environment has used this code and didn't see any corruption issue, so I'm not sure which case would result in corruption.

Linearizing an mbuf means copying, which would have a huge impact on performance.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com]
Sent: 2021-02-07 0:02
To: Yi Yang (杨燚)-云服务集团
Cc: i.maxim...@ovn.org; yang_y...@163.com; ovs-dev@openvswitch.org; f...@sysclose.org
Subject: Re: [ovs-dev] Re: [PATCH V3 3/4] Add VXLAN TCP and UDP GRO support for DPDK data path

On Tue, Oct 27, 2020 at 5:50 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> -----Original Message-----
> From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Ilya Maximets
> Sent: 2020-10-27 21:12
> To: yang_y...@163.com; ovs-dev@openvswitch.org
> Cc: f...@sysclose.org; i.maxim...@ovn.org
> Subject: Re: [ovs-dev] [PATCH V3 3/4] Add VXLAN TCP and UDP GRO support for DPDK data path
>
> On 8/7/20 12:56 PM, yang_y...@163.com wrote:
> > From: Yi Yang 
> >
> > GRO(Generic Receive Offload) can help improve performance when TSO 
> > (TCP Segment Offload) or VXLAN TSO is enabled on transmit side, this 
> > can avoid overhead of ovs DPDK data path and enqueue vhost for VM by 
> > merging many small packets to large packets (65535 bytes at most) 
> > once it receives packets from physical NIC.
>
> IIUC, this patch allows multi-segment mbufs to float across different parts 
> of OVS.  This will definitely crash it somewhere.  Much more changes all over 
> the OVS required to make it safely work with such mbufs.  There were few 
> attempts to introduce this support, but all of them ended up being rejected.  
> As it is this patch is not acceptable as it doesn't cover almost anything 
> beside simple cases inside netdev implementation.
>
> Here is the latest attempt with multi-segment mbufs:
> https://patchwork.ozlabs.org/project/openvswitch/list/?series=130193
> tate=*

Thanks, that's very helpful. Looks like a huge amount of work to introduce 
multi-seg mbuf.
>
> Best regards, Ilya Maximets.
>
> [Yi Yang] We have to support this because we have supported TSO for TCP, it 
> can't handle big UDP, this is why we must introduce GSO, the prerequisite of 
> GSO is multi-segment  must be enabled because GSOed mbufs are 
> multi-segmented, but it is just last  step before dpdk Tx, so I don't think 
> it is an issue, per my test in our openstack environment, I didn't encounter 
> any crash, this just enabled DPDK PMD driver to handle GSOed mbuf. For GRO, 
> reassembling also use chained multi-segment mbuf to avoid copy, per long time 
> test, it also didn't lead to any crash. We can fix some corner cases if they 
> aren't covered.
>
I just started to understand the problem. Sorry if I missed something.
So currently what do we do to prevent DPDK from sending OVS a multi-seg mbuf?
Do we check it and linearize the mbuf?
Can we make GSO/GRO work using linearized mbufs?

Regards,
William


[ovs-dev] Re: Re: Re: [PATCH] netdev-dpdk: fix incorrect shinfo initialization

2021-02-06 Thread
In OVS DPDK (for tap in lib/netdev-linux.c), we don't use net_tap; the internal type is also a tap type in the OVS DPDK use case. That means all the bridge ports (i.e. br-int, ovs-netdev, br-phy, etc.) are taps, and it would result in deadlock if we used PMDs to handle them.

For non-bridge tap ports, a PMD is much better than ovs-vswitchd, and ovs DPDK already supports that: you just add the tap by vdev (if the tap has already been created, it will be opened). But if the existing tap is in a network namespace, it can't be handled. A patch I sent before added a netns option, which can fix this issue.
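
For example, I would expect adding an existing tap as a vdev to look roughly like this (only a sketch; the net_tap vdev name and devargs are my assumption based on DPDK's tap PMD, not verified here):

ovs-vsctl add-port br-phy tap0 -- set Interface tap0 type=dpdk options:dpdk-devargs=net_tap0,iface=tap0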

The TSO-related code in lib/netdev-linux.c can be ported into the DPDK net_tap PMD driver; it is very easy.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com]
Sent: 2021-02-07 0:16
To: Yi Yang (杨燚)-云服务集团
Cc: i.maxim...@ovn.org; f...@sysclose.org; yang_y...@163.com; ovs-dev@openvswitch.org; olivier.m...@6wind.com
Subject: Re: [ovs-dev] Re: Re: [PATCH] netdev-dpdk: fix incorrect shinfo initialization

On Mon, Feb 1, 2021 at 5:48 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> Thanks Ilya, net_tap PMD is handling tap device on host side, so it can 
> leverage vnet header to do TSO/GSO, maybe net_pmd authors don't know how to 
> do this, from source code, tap fd isn't enabled vnet header and TSO.
>
thanks, learned a lot from these discussions.

I looked at the DPDK net_tap and indeed it doesn't support virtio net hdr.
Do you guys think it makes sense to add TSO at dpdk net_tap?
Or simply using the current OVS's userspace-enable-tso on tap/veth is good 
enough?
(using type=system, not using dpdk port type on tap/veth.)
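
For reference, by userspace-enable-tso I mean roughly this kind of setup (just a sketch; the other_config key name is taken from the OVS userspace TSO documentation):

ovs-vsctl set Open_vSwitch . other_config:userspace-tso-enable=true
ovs-vsctl add-port br0 tap0 -- set Interface tap0 type=system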

Regards,
William


[ovs-dev] Re: Re: [PATCH] netdev-dpdk: fix incorrect shinfo initialization

2021-02-01 Thread
Thanks Ilya. The net_tap PMD is handling the tap device on the host side, so it could leverage the vnet header to do TSO/GSO; maybe the net_tap authors don't know how to do this. From the source code, the tap fd doesn't have the vnet header and TSO enabled.

-----Original Message-----
From: Ilya Maximets [mailto:i.maxim...@ovn.org]
Sent: 2021-02-02 3:47
To: Yi Yang (杨燚)-云服务集团 ; f...@sysclose.org; i.maxim...@ovn.org
Cc: yang_y...@163.com; ovs-dev@openvswitch.org; olivier.m...@6wind.com
Subject: Re: Re: [ovs-dev] [PATCH] netdev-dpdk: fix incorrect shinfo initialization

On 10/28/20 1:35 AM, Yi Yang (杨燚)-云服务集团 wrote:
> -----Original Message-----
> From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Flavio Leitner
> Sent: 2020-10-27 21:08
> To: Ilya Maximets 
> Cc: yang_y...@163.com; ovs-dev@openvswitch.org; olivier.m...@6wind.com
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: fix incorrect shinfo initialization
> 
> On Tue, Oct 27, 2020 at 01:47:22PM +0100, Ilya Maximets wrote:
>> On 10/27/20 12:34 PM, Flavio Leitner wrote:
>>> On Wed, Oct 14, 2020 at 03:22:48PM +0800, yang_y...@163.com wrote:
>>>> From: Yi Yang 
>>>>
>>>> shinfo is used to store reference counter and free callback of an 
>>>> external buffer, but it is stored in mbuf if the mbuf has tailroom 
>>>> for it.
>>>>
>>>> This is wrong because the mbuf (and its data) can be freed before 
>>>> the external buffer, for example:
>>>>
>>>>   pkt2 = rte_pktmbuf_alloc(mp);
>>>>   rte_pktmbuf_attach(pkt2, pkt);
>>>>   rte_pktmbuf_free(pkt);
>>
>> How is that possible with OVS?  Right now OVS doesn't support 
>> multi-segement mbufs and will, likely, not support them in a near 
>> future because it requires changes all other the codebase.
>>
>> Is there any other scenario that could lead to issues with current 
>> OVS implementation?
> 
> This is copying packets. The shinfo is allocated in the mbuf of the first 
> packet which could be deleted without any references to the external buffer 
> still using it.
> 
> Fbl
> 
> [Yi Yang]  Yes, this is not related with multi-segment mbuf, dpdk interfaces 
> to system interfaces communication will use it if the packet size is greater 
> than mtu size, i.e. TSO case from veth/tap to dpdk/vhost and backward will 
> use it, this is a wrong use of shinfo, the same fix (which is used by 
> virtio/vhost driver)has been merged into dpdk branch.

Thanks.  Sorry for the delay.
I added some of this information to the commit message and applied to master.  
Backported down to 2.13.

I'm wondering, though, why net_tap PMD implements TSO in userspace and
doesn't offload this to kernel via virtio headers?   In many cases
actual segmentation is not necessary or could be done later by HW, so it makes 
sense to not waste cycles in userspace and let the kernel decide if it's needed 
or not.

Best regards, Ilya Maximets.


[ovs-dev] Why does VXLAN tunnel need to wait for a long time to work normally in OVS DPDK?

2021-01-19 Thread
Hi, folks

 

I noticed that a ping from the local IP to the remote IP of a VXLAN port doesn’t work immediately after OVS DPDK is started and the VXLAN port is created; sometimes it needs to wait for more than 1 minute. I found that the ARP request triggered by the ping and the ARP request sent by ovs itself are not answered in time; once the pinging side receives an ARP reply, it works. Moreover, I can’t see the dpdk port receiving the ARP request, so obviously the ARP request is intercepted by ovs and treated specially, otherwise it could not have received an ARP reply. It made no difference even when I added an arp entry with “ovs-appctl tnl/arp/set BRIDGE IP MAC”.

 

Can anybody help explain why, and what happened? How can I fix this?
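
For reference, I assume the relevant tunnel state can be inspected while this is happening with something like:

ovs-appctl tnl/ports/show
ovs-appctl tnl/arp/show
ovs-appctl ovs/route/show
ovs-appctl fdb/show BRIDGE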



[ovs-dev] Re: [PATCH V3 1/4] Enable VXLAN TSO for DPDK datapath

2020-11-23 Thread
Flavio, thank you so much for the clarification. I'll push "Enable VXLAN TSO for DPDK datapath" first; my replies to your comments are inline, please check them in the later part.

-----Original Message-----
From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Flavio Leitner
Sent: 2020-11-24 3:10
To: yang_y_yi 
Cc: ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] [PATCH V3 1/4] Enable VXLAN TSO for DPDK datapath


Hi Yi,

On Mon, Nov 02, 2020 at 11:16:49AM +0800, yang_y_yi wrote:
> 
> 
> Thanks a lot, Flavio, please check inline comments for more discussion.
> 
> 
> 
> At 2020-10-31 01:55:57, "Flavio Leitner"  wrote:
> >
> >Hi Yi,
> >
> >Thanks for the patch and sorry the delay to review it.
> >See my comments in line.
> >
> >Thanks,
> >fbl
> >
> >
> >On Fri, Aug 07, 2020 at 06:56:45PM +0800, yang_y...@163.com wrote:
> >> From: Yi Yang 
> >> 
> >> Many NICs can support VXLAN TSO which can help improve 
> >> across-compute-node VM-to-VM performance in case that MTU is set to 
> >> 1500.
> >> 
> >> This patch allows dpdkvhostuserclient interface and veth/tap 
> >> interface to leverage NICs' offload capability to maximize 
> >> across-compute-node TCP performance, with it applied, OVS DPDK can 
> >> reach linespeed for across-compute-node VM-to-VM TCP performance.
> >> 
> >> Signed-off-by: Yi Yang 
> >> ---
> >>  lib/dp-packet.h   |  76 
> >>  lib/netdev-dpdk.c | 188 
> >> ++
> >>  lib/netdev-linux.c|  20 ++
> >>  lib/netdev-provider.h |   1 +
> >>  lib/netdev.c  |  69 --
> >>  5 files changed, 338 insertions(+), 16 deletions(-)
> >> 
> >> diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 
> >> 0430cca..79895f2 100644
> >> --- a/lib/dp-packet.h
> >> +++ b/lib/dp-packet.h
> >> @@ -81,6 +81,8 @@ enum dp_packet_offload_mask {
> >>  DEF_OL_FLAG(DP_PACKET_OL_TX_UDP_CKSUM, PKT_TX_UDP_CKSUM, 0x400),
> >>  /* Offload SCTP checksum. */
> >>  DEF_OL_FLAG(DP_PACKET_OL_TX_SCTP_CKSUM, PKT_TX_SCTP_CKSUM, 
> >> 0x800),
> >> +/* VXLAN TCP Segmentation Offload. */
> >> +DEF_OL_FLAG(DP_PACKET_OL_TX_TUNNEL_VXLAN, PKT_TX_TUNNEL_VXLAN, 
> >> + 0x1000),
> >>  /* Adding new field requires adding to 
> >> DP_PACKET_OL_SUPPORTED_MASK. */  };
> >>  
> >> @@ -1032,6 +1034,80 @@ dp_packet_hwol_set_tcp_seg(struct dp_packet *b)
> >>  *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_TCP_SEG;  }
> >>  
> >> +#ifdef DPDK_NETDEV
> >> +/* Mark packet 'b' for VXLAN TCP segmentation offloading. */ 
> >> +static inline void dp_packet_hwol_set_vxlan_tcp_seg(struct 
> >> +dp_packet *b) {
> >> +b->mbuf.ol_flags |= DP_PACKET_OL_TX_TUNNEL_VXLAN;
> >> +b->mbuf.l2_len += sizeof(struct udp_header) +
> >> +  sizeof(struct vxlanhdr);
> >
> >
> >What about L3 length?
> 
> For tunnel offload, l2_len must be original l2_len plus vxlan and udp header 
> size, l3_len is still be inner l3_len.

Ok, I see now that the patch requires DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM. 
That feature is not very common, but might be fine to start with it,
and if needed add extra support for segmenting inner TCP only.

[Yi Yang] All the NICs which support tunnel offload can support DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM; DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM will not be used if no tunnel offload feature is available. For tunnel TCP segmenting it is also necessary, so it is a must-have feature on a NIC that supports tunnel offload.

> >
> >> +b->mbuf.outer_l2_len = ETH_HEADER_LEN;
> >> +b->mbuf.outer_l3_len = IP_HEADER_LEN;
> >
> >What about IPv6?
> 
> Good catch, we need to care outer ipv6 case. I'll split it to a single 
> function dp_packet_hwol_set_outer_l2_len & dp_packet_hwol_set_l3_len to 
> handle this.

Ok.

> >
> >> +}
> >> +
> >> +/* Check if it is a VXLAN packet */
> >> +static inline bool
> >> +dp_packet_hwol_is_vxlan_tcp_seg(struct dp_packet *b)
> >> +{
> >> +return (b->mbuf.ol_flags & DP_PACKET_OL_TX_TUNNEL_VXLAN);
> >
> >
> >Please use dp_packet_ol_flags_ptr()
> 
> Ok, will use use dp_packet_ol_flags_ptr() in new version.
> 
> >
> >> +}
> >> +
> >> +/* Set l2_len for the packet 'b' */
> >> +static inline void
> >> +dp_packet_hwol_set_l2_len(struct dp_packet *b, int l2_len)
> >> +{
> >> +b->mbuf.l2_len = l2_len;
> >> +}
> >
> >This function is only called by Linux in the ingress
> >path before the data processing, so it shouldn't set
> >any value other than the ones related to the iface
> >offloading at this point. Also that the data path can
> >change the packet and there is nothing to update
> >those lengths.
> 
> Does "Linux" mean "system interfaces"?, we need to use l2_len, but I saw 
> l2_len isn't set in some cases, so added this function.

Yes, system interfaces. See more details below.


> >In the egress path it calls netdev_dpdk_prep_hwol_packet()
> >to update those fields.
> 
> If output port is system interfaces (veth or tap), 
> netdev_dpdk_prep_hwol_packet() won't be called.

If the mbuf needs to be copied then you're 

[ovs-dev] Re: can userspace conntrack support IP fragment?

2020-11-17 Thread
It works after I disabled my GRO, so please ignore my issue, thanks a lot.

-----Original Message-----
From: Yi Yang (杨燚)-云服务集团
Sent: 2020-11-17 9:38
To: 'acon...@redhat.com'
Cc: 'yihung@gmail.com' ; 'u9012...@gmail.com' ; 'dlu...@gmail.com' ; 'd...@openvswitch.org' ; 'yang_y...@163.com'
Subject: Re: can userspace conntrack support IP fragment?
Importance: High

Thanks Aaron, here are my ipf settings

# ovs-appctl dpctl/ipf-get-status netdev@ovs-netdev
Fragmentation Module Status
---
v4 enabled: 1
v6 enabled: 1
max num frags (v4/v6): 1000
num frag: 0
min v4 frag size: 1200
v4 frags accepted: 660
v4 frags completed: 660
v4 frags expired: 0
v4 frags too small: 0
v4 frags overlapped: 0
v4 frags purged: 0
min v6 frag size: 1280
v6 frags accepted: 0
v6 frags completed: 0
v6 frags expired: 0
v6 frags too small: 0
v6 frags overlapped: 0
v6 frags purged: 0

I tried a big-packet ping; ICMP is ok, but tcp and udp are not ok. So I really don't know what's wrong. The IP fragment size should be 1500, which is the VM MTU value.

root@yangyi-ovsdpdk-vm1-on-07:~# ping 172.16.1.250 -s 8192
PING 172.16.1.250 (172.16.1.250) 8192(8220) bytes of data.
8200 bytes from 172.16.1.250: icmp_seq=1 ttl=64 time=1.06 ms
8200 bytes from 172.16.1.250: icmp_seq=2 ttl=64 time=0.651 ms
8200 bytes from 172.16.1.250: icmp_seq=3 ttl=64 time=0.541 ms
8200 bytes from 172.16.1.250: icmp_seq=4 ttl=64 time=0.485 ms
8200 bytes from 172.16.1.250: icmp_seq=5 ttl=64 time=0.600 ms
8200 bytes from 172.16.1.250: icmp_seq=6 ttl=64 time=0.536 ms
^C
--- 172.16.1.250 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5000ms
rtt min/avg/max/mdev = 0.485/0.646/1.067/0.197 ms
root@yangyi-ovsdpdk-vm1-on-07:~# tcpdump -i ens3 -vvv -c 5 icmp
tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 
bytes
01:32:15.373681 IP (tos 0x0, ttl 64, id 3275, offset 0, flags [+], proto ICMP 
(1), length 1500)
172.16.1.250 > 172.16.2.10: ICMP echo request, id 1610, seq 22, length 1480
01:32:15.373705 IP (tos 0x0, ttl 64, id 3275, offset 1480, flags [+], proto 
ICMP (1), length 1500)
172.16.1.250 > 172.16.2.10: icmp
01:32:15.373709 IP (tos 0x0, ttl 64, id 3275, offset 2960, flags [+], proto 
ICMP (1), length 1500)
172.16.1.250 > 172.16.2.10: icmp
01:32:15.373712 IP (tos 0x0, ttl 64, id 3275, offset 4440, flags [+], proto 
ICMP (1), length 1500)
172.16.1.250 > 172.16.2.10: icmp
01:32:15.373715 IP (tos 0x0, ttl 64, id 3275, offset 5920, flags [+], proto 
ICMP (1), length 1500)
172.16.1.250 > 172.16.2.10: icmp
5 packets captured
240 packets received by filter
233 packets dropped by kernel
root@yangyi-ovsdpdk-vm1-on-07:~# iperf3 -t 5 -i 1 -c 172.16.1.250 
--get-server-output
Connecting to host 172.16.1.250, port 5201
[  4] local 172.16.2.10 port 55350 connected to 172.16.1.250 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-1.00   sec   433 KBytes  3.54 Mbits/sec   88   2.83 KBytes
[  4]   1.00-2.00   sec  1.01 MBytes  8.43 Mbits/sec  124   4.24 KBytes
[  4]   2.00-3.00   sec   921 KBytes  7.54 Mbits/sec  270   7.07 KBytes
[  4]   3.00-4.00   sec   573 KBytes  4.69 Mbits/sec   86   4.24 KBytes
[  4]   4.00-5.00   sec  1.06 MBytes  8.86 Mbits/sec  152   2.83 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-5.00   sec  3.94 MBytes  6.61 Mbits/sec  720 sender
[  4]   0.00-5.00   sec  3.82 MBytes  6.40 Mbits/sec  receiver

Server output:
Accepted connection from 172.16.2.10, port 55348
[  5] local 172.16.1.250 port 5201 connected to 172.16.2.10 port 55350
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-1.00   sec   317 KBytes  2.59 Mbits/sec
[  5]   1.00-2.00   sec  1015 KBytes  8.32 Mbits/sec
[  5]   2.00-3.00   sec   897 KBytes  7.34 Mbits/sec
[  5]   3.00-4.00   sec   590 KBytes  4.83 Mbits/sec
[  5]   4.00-5.00   sec  1.04 MBytes  8.71 Mbits/sec


iperf Done.
root@yangyi-ovsdpdk-vm1-on-07:~# iperf3 -t 5 -i 1 -c 172.16.1.250 
--get-server-output -u -b 1G -l 8192
Connecting to host 172.16.1.250, port 5201
[  4] local 172.16.2.10 port 58188 connected to 172.16.1.250 port 5201
[ ID] Interval   Transfer Bandwidth   Total Datagrams
[  4]   0.00-1.00   sec   119 MBytes   998 Mbits/sec  15223
[  4]   1.00-2.00   sec   118 MBytes   990 Mbits/sec  15110
[  4]   2.00-3.00   sec   120 MBytes  1.01 Gbits/sec  15418
[  4]   3.00-4.00   sec   118 MBytes   989 Mbits/sec  15088
[  4]   4.00-5.00   sec   121 MBytes  1.01 Gbits/sec  15443
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   JitterLost/Total 
Datagrams
[  4]   0.00-5.00   sec   596 MBytes  1000 Mbits/sec  0.000 ms  0/0 (-nan%)
[  4] Sent 0 datagrams

[ovs-dev] Re: can userspace conntrack support IP fragment?

2020-11-16 Thread
.00   sec  0.00 Bytes  0.00 bits/sec  0.000 ms  0/0 (-nan%)
[  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec  0.000 ms  0/0 (-nan%)
[  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec  0.000 ms  0/0 (-nan%)
[  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec  0.000 ms  0/0 (-nan%)


iperf Done.
root@yangyi-ovsdpdk-vm1-on-07:~#

-----Original Message-----
From: Aaron Conole [mailto:acon...@redhat.com]
Sent: 2020-11-16 22:58
To: Yi Yang (杨燚)-云服务集团
Cc: yihung@gmail.com; u9012...@gmail.com; dlu...@gmail.com; d...@openvswitch.org; yang_y...@163.com
Subject: Re: can userspace conntrack support IP fragment?

"Yi Yang (杨燚)-云服务集团"  writes:

> Hi, folks
>
>  
>
> I used latest ovs matser in Openstack, when I enabled security group 
> and port security (note: openstack is using ovs openflow to implement 
> security group), TCP performance is about several Mbps, big UDP packet (i.e.
> 8192) can’t work, but after disabled security group and port security, 
> everything is ok, I doubt userspace conntrack can’t support IP 
> fragment (or recent changes introduced bugs),
> https://bugzilla.redhat.com/show_bug.cgi?id=1639173 said it can’t 
> handle big ICMP packet, anybody can help clarify what limitations of 
> userspace conntrack are? Is there any existing document to warn users about 
> them? Thank you in advance.

What were your frag settings?  For example, try:

  ovs-appctl dpctl/ipf-set-min-frag v4 1000
  ovs-appctl dpctl/ipf-set-max-nfrags 500

See if that helps?

IIRC, the fragmentation engine doesn't support ICMP, just tcp/udp.



[ovs-dev] can userspace conntrack support IP fragment?

2020-11-16 Thread
Hi, folks

 

I used the latest ovs master in Openstack. When I enabled security group and port security (note: openstack is using ovs openflow to implement security group), TCP performance was only several Mbps and big UDP packets (e.g. 8192 bytes) couldn’t work, but after I disabled security group and port security, everything was ok. I suspect userspace conntrack can’t support IP fragments (or recent changes introduced bugs); https://bugzilla.redhat.com/show_bug.cgi?id=1639173 said it can’t handle big ICMP packets. Can anybody help clarify what the limitations of userspace conntrack are? Is there any existing document to warn users about them? Thank you in advance.



[ovs-dev] Re: [PATCH V3 2/4] Add GSO support for DPDK data path

2020-10-27 Thread
-----Original Message-----
From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Ilya Maximets
Sent: 2020-10-27 21:03
To: yang_y...@163.com; ovs-dev@openvswitch.org
Cc: f...@sysclose.org; i.maxim...@ovn.org
Subject: Re: [ovs-dev] [PATCH V3 2/4] Add GSO support for DPDK data path

On 8/7/20 12:56 PM, yang_y...@163.com wrote:
> From: Yi Yang 
> 
> GSO(Generic Segment Offload) can segment large UDP  and TCP packet to 
> small packets per MTU of destination , especially for the case that 
> physical NIC can't do hardware offload VXLAN TSO and VXLAN UFO, GSO 
> can make sure userspace TSO can still work but not drop.
> 
> In addition, GSO can help improve UDP performane when UFO is enabled 
> in VM.
> 
> GSO can support TCP, UDP, VXLAN TCP, VXLAN UDP, it is done in Tx 
> function of physical NIC.
> 
> Signed-off-by: Yi Yang 
> ---
>  lib/dp-packet.h|  21 +++-
>  lib/netdev-dpdk.c  | 358 
> +
>  lib/netdev-linux.c |  17 ++-
>  lib/netdev.c   |  67 +++---
>  4 files changed, 417 insertions(+), 46 deletions(-)
> 
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 79895f2..c33868d 
> 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -83,6 +83,8 @@ enum dp_packet_offload_mask {
>  DEF_OL_FLAG(DP_PACKET_OL_TX_SCTP_CKSUM, PKT_TX_SCTP_CKSUM, 0x800),
>  /* VXLAN TCP Segmentation Offload. */
>  DEF_OL_FLAG(DP_PACKET_OL_TX_TUNNEL_VXLAN, PKT_TX_TUNNEL_VXLAN, 
> 0x1000),
> +/* UDP Segmentation Offload. */
> +DEF_OL_FLAG(DP_PACKET_OL_TX_UDP_SEG, PKT_TX_UDP_SEG, 0x2000),
>  /* Adding new field requires adding to 
> DP_PACKET_OL_SUPPORTED_MASK. */  };
>  
> @@ -97,7 +99,8 @@ enum dp_packet_offload_mask {
>   DP_PACKET_OL_TX_IPV6  | \
>   DP_PACKET_OL_TX_TCP_CKSUM | \
>   DP_PACKET_OL_TX_UDP_CKSUM | \
> - DP_PACKET_OL_TX_SCTP_CKSUM)
> + DP_PACKET_OL_TX_SCTP_CKSUM| \
> + DP_PACKET_OL_TX_UDP_SEG)
>  
>  #define DP_PACKET_OL_TX_L4_MASK (DP_PACKET_OL_TX_TCP_CKSUM | \
>   DP_PACKET_OL_TX_UDP_CKSUM | \ @@ 
> -956,6 +959,13 @@ dp_packet_hwol_is_tso(const struct dp_packet *b)
>  return !!(*dp_packet_ol_flags_ptr(b) & DP_PACKET_OL_TX_TCP_SEG);  
> }
>  
> +/* Returns 'true' if packet 'b' is marked for UDP segmentation 
> +offloading. */ static inline bool dp_packet_hwol_is_uso(const struct 
> +dp_packet *b) {
> +return !!(*dp_packet_ol_flags_ptr(b) & DP_PACKET_OL_TX_UDP_SEG); 
> +}
> +
>  /* Returns 'true' if packet 'b' is marked for IPv4 checksum 
> offloading. */  static inline bool  dp_packet_hwol_is_ipv4(const 
> struct dp_packet *b) @@ -1034,6 +1044,15 @@ 
> dp_packet_hwol_set_tcp_seg(struct dp_packet *b)
>  *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_TCP_SEG;  }
>  
> +/* Mark packet 'b' for UDP segmentation offloading.  It implies that
> + * either the packet 'b' is marked for IPv4 or IPv6 checksum 
> +offloading
> + * and also for UDP checksum offloading. */ static inline void 
> +dp_packet_hwol_set_udp_seg(struct dp_packet *b) {
> +*dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_UDP_SEG; }
> +
>  #ifdef DPDK_NETDEV
>  /* Mark packet 'b' for VXLAN TCP segmentation offloading. */  static 
> inline void diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 
> 30493ed..888a45e 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -38,13 +38,15 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
>  
>  #include "cmap.h"
>  #include "coverage.h"
> @@ -162,6 +164,7 @@ typedef uint16_t dpdk_port_t;
> | DEV_TX_OFFLOAD_UDP_CKSUM\
> | DEV_TX_OFFLOAD_IPV4_CKSUM)
>  
> +#define MAX_GSO_MBUFS 64
>  
>  static const struct rte_eth_conf port_conf = {
>  .rxmode = {
> @@ -2171,6 +2174,16 @@ is_local_to_local(uint16_t src_port_id, struct 
> netdev_dpdk *dev)
>  return ret;
>  }
>  
> +static uint16_t
> +get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype) {
> +if (ethertype == htons(RTE_ETHER_TYPE_IPV4)) {
> +return rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr);
> +} else { /* assume ethertype == RTE_ETHER_TYPE_IPV6 */
> +return rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr);
> +}
> +}
> +
>  /* Prepare the packet for HWOL.
>   * Return True if the packet is OK to continue. */  static bool @@ 
> -2203,10 +2216,9 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
> struct rte_mbuf *mbuf)
>   * also can't be handled. So PKT_TX_TUNNEL_VXLAN must be cleared
>   * outer_l2_len and outer_l3_len must be zeroed.
>   */
> -if (!(dev->up.ol_flags & NETDEV_TX_OFFLOAD_VXLAN_TSO)
> -

[ovs-dev] Re: [PATCH V3 3/4] Add VXLAN TCP and UDP GRO support for DPDK data path

2020-10-27 Thread
-----Original Message-----
From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Ilya Maximets
Sent: 2020-10-27 21:12
To: yang_y...@163.com; ovs-dev@openvswitch.org
Cc: f...@sysclose.org; i.maxim...@ovn.org
Subject: Re: [ovs-dev] [PATCH V3 3/4] Add VXLAN TCP and UDP GRO support for DPDK data path

On 8/7/20 12:56 PM, yang_y...@163.com wrote:
> From: Yi Yang 
> 
> GRO(Generic Receive Offload) can help improve performance when TSO 
> (TCP Segment Offload) or VXLAN TSO is enabled on transmit side, this 
> can avoid overhead of ovs DPDK data path and enqueue vhost for VM by 
> merging many small packets to large packets (65535 bytes at most) once 
> it receives packets from physical NIC.

IIUC, this patch allows multi-segment mbufs to float across different parts of 
OVS.  This will definitely crash it somewhere.  Much more changes all over the 
OVS required to make it safely work with such mbufs.  There were few attempts 
to introduce this support, but all of them ended up being rejected.  As it is 
this patch is not acceptable as it doesn't cover almost anything beside simple 
cases inside netdev implementation.

Here is the latest attempt with multi-segment mbufs:
https://patchwork.ozlabs.org/project/openvswitch/list/?series=130193=*

Best regards, Ilya Maximets.

[Yi Yang] We have to support this because we have supported TSO for TCP but it can't handle big UDP; this is why we must introduce GSO. The prerequisite for GSO is that multi-segment mbufs must be enabled, because GSOed mbufs are multi-segmented, but that is just the last step before dpdk Tx, so I don't think it is an issue. Per my test in our openstack environment, I didn't encounter any crash; this just enables the DPDK PMD driver to handle GSOed mbufs. For GRO, reassembling also uses chained multi-segment mbufs to avoid copies, and per long-time testing it also didn't lead to any crash. We can fix any corner cases that aren't covered.


> 
> It can work for both VXLAN and vlan case.
> 
> Signed-off-by: Yi Yang 
> ---
>  lib/dp-packet.h|  37 -
>  lib/netdev-dpdk.c  | 227 
> -
>  lib/netdev-linux.c | 112 --
>  3 files changed, 365 insertions(+), 11 deletions(-)
> 
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h index c33868d..18307c0 
> 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -580,7 +580,16 @@ dp_packet_set_size(struct dp_packet *b, uint32_t v)
>   * (and thus 'v') will always be <= UINT16_MAX; this means that there is 
> no
>   * loss of accuracy in assigning 'v' to 'data_len'.
>   */
> -b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
> +if (b->mbuf.nb_segs <= 1) {
> +b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
> +} else {
> +/* For multi-seg packet, if it is resize, data_len should be
> + * adjusted by offset, this will happend in case of push or pop.
> + */
> +if (b->mbuf.pkt_len != 0) {
> +b->mbuf.data_len += v - b->mbuf.pkt_len;
> +}
> +}
>  b->mbuf.pkt_len = v; /* Total length of all segments linked 
> to
>* this segment. */  } @@ 
> -1092,6 +1101,20 @@ dp_packet_hwol_set_l4_len(struct dp_packet *b, int 
> l4_len)  {
>  b->mbuf.l4_len = l4_len;
>  }
> +
> +/* Set outer_l2_len for the packet 'b' */ static inline void 
> +dp_packet_hwol_set_outer_l2_len(struct dp_packet *b, int 
> +outer_l2_len) {
> +b->mbuf.outer_l2_len = outer_l2_len; }
> +
> +/* Set outer_l3_len for the packet 'b' */ static inline void 
> +dp_packet_hwol_set_outer_l3_len(struct dp_packet *b, int 
> +outer_l3_len) {
> +b->mbuf.outer_l3_len = outer_l3_len; }
>  #else
>  /* Mark packet 'b' for VXLAN TCP segmentation offloading. */  static 
> inline void @@ -1125,6 +1148,18 @@ dp_packet_hwol_set_l4_len(struct 
> dp_packet *b OVS_UNUSED,
>int l4_len OVS_UNUSED)  {  }
> +
> +/* Set outer_l2_len for the packet 'b' */ static inline void 
> +dp_packet_hwol_set_outer_l2_len(struct dp_packet *b, int 
> +outer_l2_len) { }
> +
> +/* Set outer_l3_len for the packet 'b' */ static inline void 
> +dp_packet_hwol_set_outer_l3_len(struct dp_packet *b, int 
> +outer_l3_len) { }
>  #endif /* DPDK_NETDEV */
>  
>  static inline bool
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 888a45e..b6c57a6 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /* Include rte_compat.h first to allow experimental API's needed for the
>   * rte_meter.h rfc4115 functions. Once they are no longer marked as
> @@ -47,6 +48,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "cmap.h"
>  #include "coverage.h"
> @@ -2184,6 +2186,8 @@ get_udptcp_checksum(void *l3_hdr, void *l4_hdr, 
> uint16_t ethertype)
>  }
>  }
>  
> +#define UDP_VXLAN_ETH_HDR_SIZE 30
> +
>  /* Prepare the packet for HWOL.
>   * Return True if 

[ovs-dev] Re: [PATCH] netdev-dpdk: fix incorrect shinfo initialization

2020-10-27 Thread
-----Original Message-----
From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Flavio Leitner
Sent: 2020-10-27 21:08
To: Ilya Maximets 
Cc: yang_y...@163.com; ovs-dev@openvswitch.org; olivier.m...@6wind.com
Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: fix incorrect shinfo initialization

On Tue, Oct 27, 2020 at 01:47:22PM +0100, Ilya Maximets wrote:
> On 10/27/20 12:34 PM, Flavio Leitner wrote:
> > On Wed, Oct 14, 2020 at 03:22:48PM +0800, yang_y...@163.com wrote:
> >> From: Yi Yang 
> >>
> >> shinfo is used to store reference counter and free callback of an 
> >> external buffer, but it is stored in mbuf if the mbuf has tailroom 
> >> for it.
> >>
> >> This is wrong because the mbuf (and its data) can be freed before 
> >> the external buffer, for example:
> >>
> >>   pkt2 = rte_pktmbuf_alloc(mp);
> >>   rte_pktmbuf_attach(pkt2, pkt);
> >>   rte_pktmbuf_free(pkt);
> 
> How is that possible with OVS?  Right now OVS doesn't support 
> multi-segement mbufs and will, likely, not support them in a near 
> future because it requires changes all other the codebase.
> 
> Is there any other scenario that could lead to issues with current OVS 
> implementation?

This is copying packets. The shinfo is allocated in the mbuf of the first 
packet which could be deleted without any references to the external buffer 
still using it.

Fbl

[Yi Yang] Yes, this is not related to multi-segment mbufs. Communication from dpdk interfaces to system interfaces will use it if the packet size is greater than the mtu size, i.e. the TSO case from veth/tap to dpdk/vhost and back will use it. This was a wrong use of shinfo; the same fix (which is used by the virtio/vhost driver) has been merged into the dpdk branch.
 



> 
> >>
> >> After this, pkt is freed, but it still contains shinfo, which is 
> >> referenced by pkt2.
> >>
> >> Fix this by always storing shinfo at the tail of external buffer.
> >>
> >> Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload 
> >> support")
> >> Co-authored-by: Olivier Matz 
> >> Signed-off-by: Olivier Matz 
> >> Signed-off-by: Yi Yang 
> >> ---
> > 
> > Acked-by: Flavio Leitner 
> > 
> > Thanks Yi,
> > fbl
> > 
> 

--
fbl


[ovs-dev] Re: [PATCH] netdev-dpdk: fix incorrect shinfo initialization

2020-10-26 Thread
Hi, folks

Can anybody help merge this fix patch? Please add comments if you think it has any issue. Thanks a lot.

-----Original Message-----
From: yang_y...@163.com [mailto:yang_y...@163.com]
Sent: 2020-10-14 15:23
To: ovs-dev@openvswitch.org
Cc: b...@ovn.org; i.maxim...@ovn.org; ian.sto...@intel.com; u9012...@gmail.com; olivier.m...@6wind.com; f...@sysclose.org; Yi Yang (杨燚)-云服务集团 ; yang_y...@163.com
Subject: [PATCH] netdev-dpdk: fix incorrect shinfo initialization

From: Yi Yang 

shinfo is used to store reference counter and free callback of an external 
buffer, but it is stored in mbuf if the mbuf has tailroom for it.

This is wrong because the mbuf (and its data) can be freed before the external 
buffer, for example:

  pkt2 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_attach(pkt2, pkt);
  rte_pktmbuf_free(pkt);

After this, pkt is freed, but it still contains shinfo, which is referenced by 
pkt2.

Fix this by always storing shinfo at the tail of external buffer.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Co-authored-by: Olivier Matz 
Signed-off-by: Olivier Matz 
Signed-off-by: Yi Yang 
---
 lib/netdev-dpdk.c | 30 ++
 1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0b830be..c7f9326 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2654,12 +2654,8 @@ dpdk_pktmbuf_attach_extbuf(struct rte_mbuf *pkt, uint32_t data_len)
 uint16_t buf_len;
 void *buf;
 
-if (rte_pktmbuf_tailroom(pkt) >= sizeof *shinfo) {
-shinfo = rte_pktmbuf_mtod(pkt, struct rte_mbuf_ext_shared_info *);
-} else {
-total_len += sizeof *shinfo + sizeof(uintptr_t);
-total_len = RTE_ALIGN_CEIL(total_len, sizeof(uintptr_t));
-}
+total_len += sizeof *shinfo + sizeof(uintptr_t);
+total_len = RTE_ALIGN_CEIL(total_len, sizeof(uintptr_t));
 
 if (OVS_UNLIKELY(total_len > UINT16_MAX)) {
 VLOG_ERR("Can't copy packet: too big %u", total_len); @@ -2674,20 
+2670,14 @@ dpdk_pktmbuf_attach_extbuf(struct rte_mbuf *pkt, uint32_t data_len)
 }
 
 /* Initialize shinfo. */
-if (shinfo) {
-shinfo->free_cb = netdev_dpdk_extbuf_free;
-shinfo->fcb_opaque = buf;
-rte_mbuf_ext_refcnt_set(shinfo, 1);
-} else {
-shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, _len,
-netdev_dpdk_extbuf_free,
-buf);
-if (OVS_UNLIKELY(shinfo == NULL)) {
-rte_free(buf);
-VLOG_ERR("Failed to initialize shared info for mbuf while "
- "attempting to attach an external buffer.");
-return NULL;
-}
+shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, _len,
+netdev_dpdk_extbuf_free,
+buf);
+if (OVS_UNLIKELY(shinfo == NULL)) {
+rte_free(buf);
+VLOG_ERR("Failed to initialize shared info for mbuf while "
+ "attempting to attach an external buffer.");
+return NULL;
 }
 
 rte_pktmbuf_attach_extbuf(pkt, buf, rte_malloc_virt2iova(buf), buf_len,
--
1.8.3.1



[ovs-dev] Re: [PATCH 2/2] odp-util: Add missing comma after gtpu attributes.

2020-10-19 Thread
Acked-by: Yi Yang 

-----Original Message-----
From: Ilya Maximets [mailto:i.maxim...@ovn.org]
Sent: 2020-10-20 2:24
To: ovs-dev@openvswitch.org
Cc: William Tu ; Yi Yang (杨燚)-云服务集团 ; Ilya Maximets 
Subject: [PATCH 2/2] odp-util: Add missing comma after gtpu attributes.

Currently flows are printed like this:
'tunnel(gtpu(flags=0x7f,msgtype=0)flags(0))'
With this change:
'tunnel(gtpu(flags=0x7f,msgtype=0),flags(0))'

Fixes: 3c6d05a02e0f ("userspace: Add GTP-U support.")
Signed-off-by: Ilya Maximets 
---
 lib/odp-util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/odp-util.c b/lib/odp-util.c
index e7424a9ac..0bd2f9aa8 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -3971,7 +3971,7 @@ format_odp_tun_attr(const struct nlattr *attr, const struct nlattr *mask_attr,
 case OVS_TUNNEL_KEY_ATTR_GTPU_OPTS:
 ds_put_cstr(ds, "gtpu(");
 format_odp_tun_gtpu_opt(a, ma, ds, verbose);
-ds_put_cstr(ds, ")");
+ds_put_cstr(ds, "),");
 break;
 case __OVS_TUNNEL_KEY_ATTR_MAX:
 default:
--
2.25.4



[ovs-dev] Re: [PATCH 1/2] odp-util: Fix using uninitialized gtpu metadata.

2020-10-19 Thread
Acked-by: Yi Yang 

-----Original Message-----
From: Ilya Maximets [mailto:i.maxim...@ovn.org]
Sent: 2020-10-20 2:24
To: ovs-dev@openvswitch.org
Cc: William Tu ; Yi Yang (杨燚)-云服务集团 ; Ilya Maximets 
Subject: [PATCH 1/2] odp-util: Fix using uninitialized gtpu metadata.

If a datapath flow doesn't have one of the fields of gtpu metadata, e.g.
'tunnel(gtpu())', uninitialized stack memory will be used instead.

 ==3485429==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x853a1b in format_u8x lib/odp-util.c:3474:13
#1 0x86ee9c in format_odp_tun_gtpu_opt lib/odp-util.c:3713:5
#2 0x86a099 in format_odp_tun_attr lib/odp-util.c:3973:13
#3 0x83afe6 in format_odp_key_attr__ lib/odp-util.c:4179:9
#4 0x838afb in odp_flow_format lib/odp-util.c:4563:17
#5 0x738422 in log_flow_message lib/dpif.c:1750:5
#6 0x738e2f in log_flow_put_message lib/dpif.c:1784:9
#7 0x7371a4 in dpif_operate lib/dpif.c:1377:21
#8 0x7363ef in dpif_flow_put lib/dpif.c:1035:5
#9 0xc7aab7 in dpctl_put_flow lib/dpctl.c:1171:13
#10 0xc65a4f in dpctl_unixctl_handler lib/dpctl.c:2701:17
#11 0xaaad04 in process_command lib/unixctl.c:308:13
#12 0xaa87f7 in run_connection lib/unixctl.c:342:17
#13 0xaa842e in unixctl_server_run lib/unixctl.c:393:21
#14 0x51c09c in main vswitchd/ovs-vswitchd.c:128:9
#15 0x7f88344391a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)
#16 0x46b92d in _start (vswitchd/ovs-vswitchd+0x46b92d)

  Uninitialized value was stored to memory at
#0 0x87da17 in scan_gtpu_metadata lib/odp-util.c:5221:27
#1 0x874588 in parse_odp_key_mask_attr__ lib/odp-util.c:5862:9
#2 0x83ee14 in parse_odp_key_mask_attr lib/odp-util.c:5808:18
#3 0x83e8b5 in odp_flow_from_string lib/odp-util.c:6065:18
#4 0xc7a4f3 in dpctl_put_flow lib/dpctl.c:1145:13
#5 0xc65a4f in dpctl_unixctl_handler lib/dpctl.c:2701:17
#6 0xaaad04 in process_command lib/unixctl.c:308:13
#7 0xaa87f7 in run_connection lib/unixctl.c:342:17
#8 0xaa842e in unixctl_server_run lib/unixctl.c:393:21
#9 0x51c09c in main vswitchd/ovs-vswitchd.c:128:9
#10 0x7f88344391a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)

  Uninitialized value was created by an allocation of 'msgtype_ma' in the
  stack frame of function 'scan_gtpu_metadata'
#0 0x87d440 in scan_gtpu_metadata lib/odp-util.c:5187

Fix that by initializing fields to all zeroes by default.

Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=21426
Fixes: 3c6d05a02e0f ("userspace: Add GTP-U support.")
Signed-off-by: Ilya Maximets 
---
 lib/odp-util.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/odp-util.c b/lib/odp-util.c index 5989381e9..e7424a9ac 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -5186,8 +5186,8 @@ scan_gtpu_metadata(const char *s,
struct gtpu_metadata *mask)  {
 const char *s_base = s;
-uint8_t flags, flags_ma;
-uint8_t msgtype, msgtype_ma;
+uint8_t flags = 0, flags_ma = 0;
+uint8_t msgtype = 0, msgtype_ma = 0;
 int len;
 
 if (!strncmp(s, "flags=", 6)) {
--
2.25.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: [PATCH v2] userspace: fix bad UDP performance issue of veth

2020-09-23 Thread
Aaron, I have sent out v3 to address your comments, but a runtime set isn't a 
good fit: other_config is read in bridge_run(), so reacting to changes at 
runtime would add unnecessary overhead to the bridge_run() loop and would also 
leave old and new interfaces with inconsistent socket buffer sizes. I think the 
best way is to restart ovs-vswitchd so that the new value takes effect; the 
intended behaviour is roughly the sketch below.
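
The knob is read once and cached, so bridge_run() does no extra per-iteration 
work and every interface created afterwards sees the same value. Function and 
variable names here are illustrative, not the actual v3 code:

    #include "ovs-thread.h"
    #include "smap.h"

    #define DEFAULT_SOCK_BUF_SIZE 212992

    static int userspace_sock_buf_size = DEFAULT_SOCK_BUF_SIZE;

    /* Read other_config:userspace-sock-buf-size exactly once at startup;
     * later changes only take effect after ovs-vswitchd restarts. */
    static void
    sock_buf_size_init(const struct smap *other_config)
    {
        static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;

        if (ovsthread_once_start(&once)) {
            userspace_sock_buf_size = smap_get_int(other_config,
                                                   "userspace-sock-buf-size",
                                                   DEFAULT_SOCK_BUF_SIZE);
            ovsthread_once_done(&once);
        }
    }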

-邮件原件-
发件人: Yi Yang (杨燚)-云服务集团 
发送时间: 2020年9月18日 9:48
收件人: 'acon...@redhat.com' 
抄送: 'yang_y...@163.com' ; 'ovs-dev@openvswitch.org' 
; 'i.maxim...@ovn.org' ; 
'f...@sysclose.org' 
主题: 答复: 答复: [ovs-dev] [PATCH v2] userspace: fix bad UDP performance issue of 
veth
重要性: 高

Good idea,  but I don't have BSD to check it, maybe somebody can port it to BSD 
if he/she really care performance on BSD, I think it makes sense to use a 
separate patch to handle this.

-邮件原件-
发件人: Aaron Conole [mailto:acon...@redhat.com] 
发送时间: 2020年9月17日 22:34
收件人: Yi Yang (杨燚)-云服务集团 
抄送: yang_y...@163.com; ovs-dev@openvswitch.org; i.maxim...@ovn.org; 
f...@sysclose.org
主题: Re: 答复: [ovs-dev] [PATCH v2] userspace: fix bad UDP performance issue of 
veth

"Yi Yang (杨燚)-云服务集团"  writes:

> Aaron, thank you so much for comments, I'll update it to fix your comment in 
> v3, replies for comments inline, please check them.

Thanks.

I have one more comment to consider.  SO_SNDBUF / SO_RCVBUF are available on 
many OSes - does it make sense to make a similar change to the BSD code as well 
since that is also a userspace datapath component?

> -邮件原件-
> 发件人: dev [mailto:ovs-dev-boun...@openvswitch.org] 代表 Aaron Conole
> 发送时间: 2020年9月17日 1:17
> 收件人: yang_y...@163.com
> 抄送: ovs-dev@openvswitch.org; i.maxim...@ovn.org; f...@sysclose.org
> 主题: Re: [ovs-dev] [PATCH v2] userspace: fix bad UDP performance issue 
> of veth
>
> yang_y...@163.com writes:
>
>> From: Yi Yang 
>>
>> iperf3 UDP performance of veth to veth case is very very bad because 
>> of too many packet loss, the root cause is rmem_default and 
>> wmem_default are just 212992, but iperf3 UDP test used 8K UDP size 
>> which resulted in many UDP fragment in case that MTU size is 1500, 
>> one 8K UDP send would enqueue 6 UDP fragments to socket receive 
>> queue, the default small socket buffer size can't cache so many 
>> packets that many packets are lost.
>>
>> This commit fixed packet loss issue, it allows users to set socket 
>> receive and send buffer size per their own system environment to 
>> proper value, therefore there will not be packet loss.
>>
>> Users can set system interface socket buffer size by command lines:
>>
>>   $ sudo sh -c "echo 1073741823 > /proc/sys/net/core/wmem_max"
>>   $ sudo sh -c "echo 1073741823 > /proc/sys/net/core/rmem_max"
>>
>> or
>>
>>   $ sudo ovs-vsctl set Open_vSwitch . \
>> other_config:userspace-sock-buf-size=1073741823
>>
>> But final socket buffer size is minimum one among of them.
>> Possible value range is 212992 to 1073741823. Current default value 
>> for other_config:userspace-sock-buf-size is 212992, users need to 
>> increase it to improve UDP performance, the changed value will take 
>> effect after restarting ovs-vswitchd. More details about it is in the 
>> document Documentation/howto/userspace-udp-performance-tunning.rst.
>>
>> By the way, big socket buffer doesn't mean it will allocate big 
>> buffer on creating socket, actually it won't alocate any extra buffer 
>> compared to default socket buffer size, it just means more skbuffs 
>> can be enqueued to socket receive queue and send queue, therefore 
>> there will not be packet loss.
>>
>> The below is for your reference.
>>
>> The result before apply this commit
>> ===
>> $ ip netns exec ns02 iperf3 -t 5 -i 1 -u -b 100M -c 10.15.2.6 
>> --get-server-output -A 5 Connecting to host 10.15.2.6, port 5201 [  
>> 4] local 10.15.2.2 port 59053 connected to 10.15.2.6 port 5201
>> [ ID] Interval   Transfer Bandwidth   Total Datagrams
>> [  4]   0.00-1.00   sec  10.8 MBytes  90.3 Mbits/sec  1378
>> [  4]   1.00-2.00   sec  11.9 MBytes   100 Mbits/sec  1526
>> [  4]   2.00-3.00   sec  11.9 MBytes   100 Mbits/sec  1526
>> [  4]   3.00-4.00   sec  11.9 MBytes   100 Mbits/sec  1526
>> [  4]   4.00-5.00   sec  11.9 MBytes   100 Mbits/sec  1526
>> - - - - - - - - - - - - - - - - - - - - - - - - -
>> [ ID] Interval   Transfer Bandwidth   JitterLost/Total 
>> Datagrams
>> [  4]   0.00-5.00   sec  58.5 MBytes  98.1 Mbits/sec  0.047 ms  357/531 (67%)
>> [  4] Sent 531 datagrams
>>
>> Server output:
>> 

[ovs-dev] Re: Re: Re: [PATCH v2 0/3] userspace: enable tap interface statistics and status update support

2020-09-21 Thread
Got it, thank you both; I'll try these and get it done.
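
For my own notes, the query path I understand you two are suggesting is roughly 
the sketch below. It assumes the TUNGETDEVNETNS ioctl is available (recent 
kernel, per the commit quoted further down) and that the ioctl returns an fd 
referring to the device's current netns; error handling is omitted:

    #include <sys/ioctl.h>
    #include <linux/if_tun.h>

    /* 1. Ask the tun/tap character device which netns it currently lives in. */
    static int
    tap_get_netns_fd(int tap_fd)
    {
        return ioctl(tap_fd, TUNGETDEVNETNS);
    }

    /* 2. RTM_GETNSID with a NETNSA_FD attribute translates that fd into a
     *    netnsid relative to our own namespace.
     * 3. RTM_GETLINK with IFLA_TARGET_NETNSID set to that id (the path
     *    netdev_linux_update_via_netlink() already uses) then returns the
     *    device's flags and statistics without ever calling setns(). */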

-邮件原件-
发件人: Aaron Conole [mailto:acon...@redhat.com] 
发送时间: 2020年9月21日 22:46
收件人: Flavio Leitner 
抄送: Yi Yang (杨燚)-云服务集团 ; yang_y...@163.com; 
ovs-dev@openvswitch.org; i.maxim...@ovn.org
主题: Re: 答复: 答复: [ovs-dev] [PATCH v2 0/3] userspace: enable tap 
interface?statistics and status update support

Flavio Leitner  writes:

> On Fri, Sep 18, 2020 at 02:07:51AM +, Yi Yang (杨燚)-云服务集团 wrote:
>> To be clarified, tap socket isn't created in netns currently because OVS 
>> doesn't have such info, current way is:
>> 
>> # Step 1, add tap interface into ovs bridge in root netns, tap socket is 
>> created at this point.
>> # step 2, move tap interface to specified netns.
>> 
>> So question, how do you get netns id from tap socket? Would you guys 
>> like to add netlink API for this in Linux kernel?  I know you Redhat 
>> guys are familiar with rtnl.
>> 
>> Can you two guys clearly explain how to implement the behavior you expect?
>
> The commit below implements an ioctl API to get the netns fd from the tun 
> device.
> With that information, we can use netlink RTM_GETNSID to translate the 
> fd to the netns id. Finally, call rtm_getlink with that netnsid as 
> target netnsid to get all the info.
>
> Those are 3 calls, which is not really efficient. I would suggest to 
> improve rtm_getlink to accept a target netns fd as well or add an 
> ioctl to tun to return the netns id right away.

Yes.  We can implement this without any requirement on modifying the kernel.  
While it wouldn't be as efficient, it is still worthwhile, and doesn't require 
reassigning thread netns information.

It would require ensuring that we fill the netdev->netnsid information - and I 
don't have a strong opinion on the best way to do that (it will require a 
special case in the code right now to call the tun ioctl).

> commit 0c3e0e3bb623c3735b8c9ab8aa8332f944f83a9f
> Author: Kirill Tkhai 
> Date:   Wed Mar 20 12:16:42 2019 +0300
>
> tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns 
> of tun device
> 
> In commit f2780d6d7475 "tun: Add ioctl() SIOCGSKNS cmd to allow
> obtaining net ns of tun device" it was missed that tun may change
> its net ns, while net ns of socket remains the same as it was
> created initially. SIOCGSKNS returns net ns of socket, so it is
> not suitable for obtaining net ns of device.
> 
> We may have two tun devices with the same names in two net ns,
> and in this case it's not possible to determ, which of them
> fd refers to (TUNGETIFF will return the same name).
> 
> This patch adds new ioctl() cmd for obtaining net ns of a device.
> 
> Reported-by: Harald Albrecht 
> Signed-off-by: Kirill Tkhai 
> Signed-off-by: David S. Miller 
>
> HTH,
> fbl
>
>> 
>> 
>> -邮件原件-
>> 发件人: Flavio Leitner [mailto:f...@sysclose.org]
>> 发送时间: 2020年9月17日 19:59
>> 收件人: Yi Yang (杨燚)-云服务集团 
>> 抄送: acon...@redhat.com; yang_y...@163.com; ovs-dev@openvswitch.org; 
>> i.maxim...@ovn.org
>> 主题: Re: 答复: [ovs-dev] [PATCH v2 0/3] userspace: enable tap 
>> interface?statistics and status update support
>> 
>> On Thu, Sep 17, 2020 at 01:05:22AM +, Yi Yang (杨燚)-云服务集团 wrote:
>> > Aaron, any caller thread just binds it to netns on calling 
>> > enter_netns, once it has entered netns, it won't disappear, so 
>> > exit_netns caller thread must be current thread, once it exits 
>> > netns, it returns back to original root netns, at this point, this 
>> > thread can disappear, not a question, isn't it? So I'm not sure why 
>> > you're saying it is unsafe.
>> > 
>> > It is impossible to let Linux kernel  provide that API with netns 
>> > as argument, although it is possible to do it theoretically, it is 
>> > impractical  fantasy IMO :-)
>> 
>> OVS already uses rtm_getlink to get that information, see 
>> netdev_linux_update_via_netlink().
>> 
>> What we need is to get netnsid from the tap socket. I also think that is a 
>> reasonable kernel API addition.  See for example:
>>  a86bd14ec ("netlink: provide network namespace id from a msg.").
>> 
>> fbl
>> 
>> > 
>> > -邮件原件-
>> > 发件人: dev [mailto:ovs-dev-boun...@openvswitch.org] 代表 Aaron Conole
>> > 发送时间: 2020年9月17日 0:38
>> > 收件人: yang_y...@163.com
>> > 抄送: ovs-dev@openvswitch.org; i.maxim...@ovn.org; f...@sysclose.org
>> > 主题: Re: [ovs-dev] [PATCH v2 0/3] userspace: enable tap interface 
>> > statistics and status update support
>> >

[ovs-dev] Re: Re: [PATCH v2 0/3] userspace: enable tap interface statistics and status update support

2020-09-17 Thread
To clarify, the tap socket isn't currently created in the netns because OVS 
doesn't have that information. The current flow is:

# Step 1: add the tap interface to the OVS bridge in the root netns; the tap 
socket is created at this point.
# Step 2: move the tap interface to the target netns (see the example below).
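
For example (bridge, interface and namespace names are illustrative):

  # Step 1: the tap device and its socket are created in the root netns.
  $ ovs-vsctl add-port br0 tap1 -- set interface tap1 type=tap
  # Step 2: the device is moved afterwards, so OVS never learns about ns01.
  $ ip link set tap1 netns ns01
  $ ip netns exec ns01 ip link set tap1 up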

So the question is: how do you get the netns id from the tap socket? Would you 
like to add a netlink API for this in the Linux kernel? I know you Red Hat 
folks are familiar with rtnl.

Can you explain clearly how you would implement the behavior you expect?


-邮件原件-
发件人: Flavio Leitner [mailto:f...@sysclose.org] 
发送时间: 2020年9月17日 19:59
收件人: Yi Yang (杨燚)-云服务集团 
抄送: acon...@redhat.com; yang_y...@163.com; ovs-dev@openvswitch.org; 
i.maxim...@ovn.org
主题: Re: 答复: [ovs-dev] [PATCH v2 0/3] userspace: enable tap interface?statistics 
and status update support

On Thu, Sep 17, 2020 at 01:05:22AM +, Yi Yang (杨燚)-云服务集团 wrote:
> Aaron, any caller thread just binds it to netns on calling enter_netns, once 
> it has entered netns, it won't disappear, so exit_netns caller thread must be 
> current thread, once it exits netns, it returns back to original root netns, 
> at this point, this thread can disappear, not a question, isn't it? So I'm 
> not sure why you're saying it is unsafe.
> 
> It is impossible to let Linux kernel  provide that API with netns as 
> argument, although it is possible to do it theoretically, it is 
> impractical  fantasy IMO :-)

OVS already uses rtm_getlink to get that information, see 
netdev_linux_update_via_netlink().

What we need is to get netnsid from the tap socket. I also think that is a 
reasonable kernel API addition.  See for example:
 a86bd14ec ("netlink: provide network namespace id from a msg.").

fbl

> 
> -邮件原件-
> 发件人: dev [mailto:ovs-dev-boun...@openvswitch.org] 代表 Aaron Conole
> 发送时间: 2020年9月17日 0:38
> 收件人: yang_y...@163.com
> 抄送: ovs-dev@openvswitch.org; i.maxim...@ovn.org; f...@sysclose.org
> 主题: Re: [ovs-dev] [PATCH v2 0/3] userspace: enable tap interface 
> statistics and status update support
> 
> yang_y...@163.com writes:
> 
> > From: Yi Yang 
> >
> > OVS userspace datapath can't support tap interface statistics and 
> > status update, so users can't get these information by cmd 
> > "ovs-vsctl list interface tap1", the root cause of this issue is OVS 
> > doesn't know network namespace of tap interface.
> >
> > This patch series fixed this issue and make sure tap interface can 
> > show statistics and get status update.
> >
> > Yi Yang (3):
> >   Add netns option for tap interface in userspace datapath
> >   Fix tap interface statistics issue
> >   Fix tap interface status update issue in network namespace
> >
> >  lib/dpif-netlink.c |  51 +
> >  lib/dpif-netlink.h |   3 +
> >  lib/netdev-linux-private.h |   1 +
> >  lib/netdev-linux.c | 481 
> > -
> >  lib/netlink-socket.c   | 146 ++
> >  lib/netlink-socket.h   |   2 +
> >  lib/socket-util-unix.c |  37 
> >  lib/socket-util.h  |   3 +
> >  8 files changed, 675 insertions(+), 49 deletions(-)
> >
> > --
> >
> > Changelog
> >
> >   v1 -> v2:
> > * Split pmd thread support to seperate patch series
> > * Check enter_netns return error
> > * Limit setns to network namespace only by CLONE_NEWNET
> 
> Sorry, but more thinking about this I don't support this series going in.  It 
> reassociates the thread with a netns that may disappear causing faults in the 
> middle of processing - I don't think it's safe.
> 
> NAK.
> 
> I think the correct solution is to add support in the kernel for 
> getting the netns/ifindex from the tap socket, and then use that to 
> query the statistics.  This should be solved by using (or creating if 
> one doesn't
> exist) a kernel API to do this query by getting the netns information and 
> using that to do these get operations.
> 
> Maybe someone disagrees.
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev



--
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: [PATCH v2] userspace: fix bad UDP performance issue of veth

2020-09-17 Thread
Good idea, but I don't have a BSD system to check it; maybe somebody can port 
it to BSD if they really care about performance there. I think it makes sense 
to handle that in a separate patch.
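
The change itself is plain POSIX socket API, roughly the sketch below 
(illustrative, not the exact patch code), so the BSD netdev code could reuse it 
almost as-is; only the system-wide cap differs (net.core.rmem_max/wmem_max on 
Linux, kern.ipc.maxsockbuf on FreeBSD):

    #include <sys/socket.h>

    /* Ask for a larger socket buffer; the kernel silently clamps the
     * request to its system-wide maximum. */
    static void
    set_sock_buf_size(int fd, int size)
    {
        setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, sizeof size);
        setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, sizeof size);
    }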

-邮件原件-
发件人: Aaron Conole [mailto:acon...@redhat.com] 
发送时间: 2020年9月17日 22:34
收件人: Yi Yang (杨燚)-云服务集团 
抄送: yang_y...@163.com; ovs-dev@openvswitch.org; i.maxim...@ovn.org; 
f...@sysclose.org
主题: Re: 答复: [ovs-dev] [PATCH v2] userspace: fix bad UDP performance issue of 
veth

"Yi Yang (杨燚)-云服务集团"  writes:

> Aaron, thank you so much for comments, I'll update it to fix your comment in 
> v3, replies for comments inline, please check them.

Thanks.

I have one more comment to consider.  SO_SNDBUF / SO_RCVBUF are available on 
many OSes - does it make sense to make a similar change to the BSD code as well 
since that is also a userspace datapath component?

> -邮件原件-
> 发件人: dev [mailto:ovs-dev-boun...@openvswitch.org] 代表 Aaron Conole
> 发送时间: 2020年9月17日 1:17
> 收件人: yang_y...@163.com
> 抄送: ovs-dev@openvswitch.org; i.maxim...@ovn.org; f...@sysclose.org
> 主题: Re: [ovs-dev] [PATCH v2] userspace: fix bad UDP performance issue 
> of veth
>
> yang_y...@163.com writes:
>
>> From: Yi Yang 
>>
>> iperf3 UDP performance of veth to veth case is very very bad because 
>> of too many packet loss, the root cause is rmem_default and 
>> wmem_default are just 212992, but iperf3 UDP test used 8K UDP size 
>> which resulted in many UDP fragment in case that MTU size is 1500, 
>> one 8K UDP send would enqueue 6 UDP fragments to socket receive 
>> queue, the default small socket buffer size can't cache so many 
>> packets that many packets are lost.
>>
>> This commit fixed packet loss issue, it allows users to set socket 
>> receive and send buffer size per their own system environment to 
>> proper value, therefore there will not be packet loss.
>>
>> Users can set system interface socket buffer size by command lines:
>>
>>   $ sudo sh -c "echo 1073741823 > /proc/sys/net/core/wmem_max"
>>   $ sudo sh -c "echo 1073741823 > /proc/sys/net/core/rmem_max"
>>
>> or
>>
>>   $ sudo ovs-vsctl set Open_vSwitch . \
>> other_config:userspace-sock-buf-size=1073741823
>>
>> But final socket buffer size is minimum one among of them.
>> Possible value range is 212992 to 1073741823. Current default value 
>> for other_config:userspace-sock-buf-size is 212992, users need to 
>> increase it to improve UDP performance, the changed value will take 
>> effect after restarting ovs-vswitchd. More details about it is in the 
>> document Documentation/howto/userspace-udp-performance-tunning.rst.
>>
>> By the way, big socket buffer doesn't mean it will allocate big 
>> buffer on creating socket, actually it won't alocate any extra buffer 
>> compared to default socket buffer size, it just means more skbuffs 
>> can be enqueued to socket receive queue and send queue, therefore 
>> there will not be packet loss.
>>
>> The below is for your reference.
>>
>> The result before apply this commit
>> ===
>> $ ip netns exec ns02 iperf3 -t 5 -i 1 -u -b 100M -c 10.15.2.6 
>> --get-server-output -A 5 Connecting to host 10.15.2.6, port 5201 [  
>> 4] local 10.15.2.2 port 59053 connected to 10.15.2.6 port 5201
>> [ ID] Interval   Transfer Bandwidth   Total Datagrams
>> [  4]   0.00-1.00   sec  10.8 MBytes  90.3 Mbits/sec  1378
>> [  4]   1.00-2.00   sec  11.9 MBytes   100 Mbits/sec  1526
>> [  4]   2.00-3.00   sec  11.9 MBytes   100 Mbits/sec  1526
>> [  4]   3.00-4.00   sec  11.9 MBytes   100 Mbits/sec  1526
>> [  4]   4.00-5.00   sec  11.9 MBytes   100 Mbits/sec  1526
>> - - - - - - - - - - - - - - - - - - - - - - - - -
>> [ ID] Interval   Transfer Bandwidth   JitterLost/Total 
>> Datagrams
>> [  4]   0.00-5.00   sec  58.5 MBytes  98.1 Mbits/sec  0.047 ms  357/531 (67%)
>> [  4] Sent 531 datagrams
>>
>> Server output:
>> ---
>> Accepted connection from 10.15.2.2, port 60314 [  5] local 10.15.2.6 
>> port 5201 connected to 10.15.2.2 port 59053
>> [ ID] Interval   Transfer Bandwidth   JitterLost/Total 
>> Datagrams
>> [  5]   0.00-1.00   sec  1.36 MBytes  11.4 Mbits/sec  0.047 ms  357/531 (67%)
>> [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
>> [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
>> [  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
>> [  5]   4.00-5.00   sec  0.00 Bytes  0.0

[ovs-dev] Re: [PATCH v2] userspace: fix bad UDP performance issue of veth

2020-09-16 Thread
Aaron, thank you so much for the comments; I'll update the patch to address 
them in v3. My replies are inline below, please check them.

-邮件原件-
发件人: dev [mailto:ovs-dev-boun...@openvswitch.org] 代表 Aaron Conole
发送时间: 2020年9月17日 1:17
收件人: yang_y...@163.com
抄送: ovs-dev@openvswitch.org; i.maxim...@ovn.org; f...@sysclose.org
主题: Re: [ovs-dev] [PATCH v2] userspace: fix bad UDP performance issue of veth

yang_y...@163.com writes:

> From: Yi Yang 
>
> iperf3 UDP performance of veth to veth case is very very bad because 
> of too many packet loss, the root cause is rmem_default and 
> wmem_default are just 212992, but iperf3 UDP test used 8K UDP size 
> which resulted in many UDP fragment in case that MTU size is 1500, one 
> 8K UDP send would enqueue 6 UDP fragments to socket receive queue, the 
> default small socket buffer size can't cache so many packets that many 
> packets are lost.
>
> This commit fixed packet loss issue, it allows users to set socket 
> receive and send buffer size per their own system environment to 
> proper value, therefore there will not be packet loss.
>
> Users can set system interface socket buffer size by command lines:
>
>   $ sudo sh -c "echo 1073741823 > /proc/sys/net/core/wmem_max"
>   $ sudo sh -c "echo 1073741823 > /proc/sys/net/core/rmem_max"
>
> or
>
>   $ sudo ovs-vsctl set Open_vSwitch . \
> other_config:userspace-sock-buf-size=1073741823
>
> But final socket buffer size is minimum one among of them.
> Possible value range is 212992 to 1073741823. Current default value 
> for other_config:userspace-sock-buf-size is 212992, users need to 
> increase it to improve UDP performance, the changed value will take 
> effect after restarting ovs-vswitchd. More details about it is in the 
> document Documentation/howto/userspace-udp-performance-tunning.rst.
>
> By the way, big socket buffer doesn't mean it will allocate big buffer 
> on creating socket, actually it won't alocate any extra buffer 
> compared to default socket buffer size, it just means more skbuffs can 
> be enqueued to socket receive queue and send queue, therefore there 
> will not be packet loss.
>
> The below is for your reference.
>
> The result before apply this commit
> ===
> $ ip netns exec ns02 iperf3 -t 5 -i 1 -u -b 100M -c 10.15.2.6 
> --get-server-output -A 5 Connecting to host 10.15.2.6, port 5201 [  4] 
> local 10.15.2.2 port 59053 connected to 10.15.2.6 port 5201
> [ ID] Interval   Transfer Bandwidth   Total Datagrams
> [  4]   0.00-1.00   sec  10.8 MBytes  90.3 Mbits/sec  1378
> [  4]   1.00-2.00   sec  11.9 MBytes   100 Mbits/sec  1526
> [  4]   2.00-3.00   sec  11.9 MBytes   100 Mbits/sec  1526
> [  4]   3.00-4.00   sec  11.9 MBytes   100 Mbits/sec  1526
> [  4]   4.00-5.00   sec  11.9 MBytes   100 Mbits/sec  1526
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval   Transfer Bandwidth   JitterLost/Total 
> Datagrams
> [  4]   0.00-5.00   sec  58.5 MBytes  98.1 Mbits/sec  0.047 ms  357/531 (67%)
> [  4] Sent 531 datagrams
>
> Server output:
> ---
> Accepted connection from 10.15.2.2, port 60314 [  5] local 10.15.2.6 
> port 5201 connected to 10.15.2.2 port 59053
> [ ID] Interval   Transfer Bandwidth   JitterLost/Total 
> Datagrams
> [  5]   0.00-1.00   sec  1.36 MBytes  11.4 Mbits/sec  0.047 ms  357/531 (67%)
> [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
> [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
> [  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
> [  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
>
> iperf Done.
>
> The result after apply this commit
> ===
> $ sudo ip netns exec ns02 iperf3 -t 5 -i 1 -u -b 4G -c 10.15.2.6 
> --get-server-output -A 5 Connecting to host 10.15.2.6, port 5201 [  4] 
> local 10.15.2.2 port 48547 connected to 10.15.2.6 port 5201
> [ ID] Interval   Transfer Bandwidth   Total Datagrams
> [  4]   0.00-1.00   sec   440 MBytes  3.69 Gbits/sec  56276
> [  4]   1.00-2.00   sec   481 MBytes  4.04 Gbits/sec  61579
> [  4]   2.00-3.00   sec   474 MBytes  3.98 Gbits/sec  60678
> [  4]   3.00-4.00   sec   480 MBytes  4.03 Gbits/sec  61452
> [  4]   4.00-5.00   sec   480 MBytes  4.03 Gbits/sec  61441
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval   Transfer Bandwidth   JitterLost/Total 
> Datagrams
> [  4]   0.00-5.00   sec  2.30 GBytes  3.95 Gbits/sec  0.024 ms  0/301426 (0%)
> [  4] Sent 301426 datagrams
>
> Server output:
> ---
> Accepted connection from 10.15.2.2, port 60320 [  5] local 10.15.2.6 
> port 5201 connected to 10.15.2.2 port 48547
> [ ID] Interval   Transfer Bandwidth   JitterLost/Total 
> Datagrams
> [  5]   0.00-1.00   sec   209 

[ovs-dev] Re: [PATCH v2 0/3] userspace: enable tap interface statistics and status update support

2020-09-16 Thread
Aaron, a caller thread only binds itself to the netns when it calls 
enter_netns; once it has entered the netns, the netns won't disappear while the 
thread is inside it. The exit_netns caller must be that same thread, and only 
after it has exited the netns and returned to the original root netns can the 
thread go away. Isn't that the case? So I'm not sure why you're saying it is 
unsafe.

Expecting the Linux kernel to provide that API with a netns argument seems 
unrealistic to me; it may be possible in theory, but it looks impractical IMO 
:-)
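
For reference, the enter/exit pair being discussed is essentially the standard 
per-thread setns() pattern; a simplified sketch, not the exact patch code, with 
error handling omitted:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Switch only the calling thread into 'netns_name' and return an fd
     * for the original netns so the same thread can switch back later. */
    static int
    enter_netns(const char *netns_name)
    {
        char path[256];
        int old_ns = open("/proc/self/ns/net", O_RDONLY);
        int new_ns;

        snprintf(path, sizeof path, "/var/run/netns/%s", netns_name);
        new_ns = open(path, O_RDONLY);
        setns(new_ns, CLONE_NEWNET);
        close(new_ns);
        return old_ns;
    }

    static void
    exit_netns(int old_ns)
    {
        setns(old_ns, CLONE_NEWNET);  /* same thread returns to the root netns */
        close(old_ns);
    }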

-邮件原件-
发件人: dev [mailto:ovs-dev-boun...@openvswitch.org] 代表 Aaron Conole
发送时间: 2020年9月17日 0:38
收件人: yang_y...@163.com
抄送: ovs-dev@openvswitch.org; i.maxim...@ovn.org; f...@sysclose.org
主题: Re: [ovs-dev] [PATCH v2 0/3] userspace: enable tap interface statistics and 
status update support

yang_y...@163.com writes:

> From: Yi Yang 
>
> OVS userspace datapath can't support tap interface statistics and 
> status update, so users can't get these information by cmd "ovs-vsctl 
> list interface tap1", the root cause of this issue is OVS doesn't know 
> network namespace of tap interface.
>
> This patch series fixed this issue and make sure tap interface can 
> show statistics and get status update.
>
> Yi Yang (3):
>   Add netns option for tap interface in userspace datapath
>   Fix tap interface statistics issue
>   Fix tap interface status update issue in network namespace
>
>  lib/dpif-netlink.c |  51 +
>  lib/dpif-netlink.h |   3 +
>  lib/netdev-linux-private.h |   1 +
>  lib/netdev-linux.c | 481 
> -
>  lib/netlink-socket.c   | 146 ++
>  lib/netlink-socket.h   |   2 +
>  lib/socket-util-unix.c |  37 
>  lib/socket-util.h  |   3 +
>  8 files changed, 675 insertions(+), 49 deletions(-)
>
> --
>
> Changelog
>
>   v1 -> v2:
> * Split pmd thread support to seperate patch series
> * Check enter_netns return error
> * Limit setns to network namespace only by CLONE_NEWNET

Sorry, but more thinking about this I don't support this series going in.  It 
reassociates the thread with a netns that may disappear causing faults in the 
middle of processing - I don't think it's safe.

NAK.

I think the correct solution is to add support in the kernel for getting the 
netns/ifindex from the tap socket, and then use that to query the statistics.  
This should be solved by using (or creating if one doesn't
exist) a kernel API to do this query by getting the netns information and using 
that to do these get operations.

Maybe someone disagrees.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: Re: Re: [PATCH] userspace: fix bad UDP performance issue of veth

2020-09-04 Thread
No problem, I will add 
Documentation/topics/userspace-udp-performance-tunning.rst to document more 
information. This also helps reduce the VM-to-VM UDP packet loss rate.

-邮件原件-
发件人: Ilya Maximets [mailto:i.maxim...@ovn.org] 
发送时间: 2020年9月3日 20:03
收件人: Yi Yang (杨燚)-云服务集团 ; f...@sysclose.org
抄送: yang_y...@163.com; ovs-dev@openvswitch.org; i.maxim...@ovn.org; Aaron 
Conole 
主题: Re: 答复: [ovs-dev] 答复: 答复: [PATCH] userspace: fix bad UDP performance issue 
of?veth

On 9/3/20 4:06 AM, Yi Yang (杨燚)-云服务集团 wrote:
> As I have replied per Aaron's concern, users need to use 
> /proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max to set values 
> they prefer, final values are smaller one between 
> rmem_max(SOL_RCVBUF)/wmem_max(SOL_SNDBUF) and  1073741823. So 
> /proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max are just knob 
> you're expecting.

AFAIU, the question here is not the size of buffers, but the fact of calling 
setsockopt with SOL_RCVBUF/SOL_SNDBUF regardless of the value.

By making this syscall you're disabling a big pile of optimizations in kernel 
TCP stack and this is not a good thing to do.  So, if user wants to have better 
UDP performance and agrees to sacrifice some features of TCP stack, we could 
allow that with a special configuration knob for the OVS interface.  But we 
should not sacrifice any TCP features by default even if this increases UDP 
preformance.

Documentation with pros and cons will be needed for this configuration knob.

Best regards, Ilya Maximets.

> 
> -邮件原件-
> 发件人: Flavio Leitner [mailto:f...@sysclose.org]
> 发送时间: 2020年9月3日 1:32
> 收件人: Yi Yang (杨燚)-云服务集团 
> 抄送: yang_y...@163.com; ovs-dev@openvswitch.org; i.maxim...@ovn.org
> 主题: Re: [ovs-dev] 答复: 答复: [PATCH] userspace: fix bad UDP performance 
> issue of?veth
> 
> On Mon, Aug 31, 2020 at 12:38:16AM +, Yi Yang (杨燚)-云服务集团 wrote:
>> Flavio, per my test, it also improved TCP performance, you can use 
>> run-iperf.sh script  I sent to have a try. Actually, iperf3 did the 
>> same thing by using -w option.
> 
> I believe that the results improve with lab tests on a controlled 
> environment. The kernel auto-tune buffer and the backlogging happens when the 
> system is busy with something else and you have multiple devices and TCP 
> streams going on.  For instance, the bufferbloat problem as Aaron already 
> mentioned.
> 
> So, I agree with Aaron that this needs a config knob for cases where using 
> the maximum is not a good idea. For instance, I know that some software 
> solutions out there recommends to pump those defaults to huge numbers, which 
> might be ok for that solution, but it may cause OVS issues under load.
> 
> Does that make sense?
> 
> fbl
> 
>>
>> -邮件原件-
>> 发件人: Flavio Leitner [mailto:f...@sysclose.org]
>> 发送时间: 2020年8月27日 21:28
>> 收件人: Yi Yang (杨燚)-云服务集团 
>> 抄送: acon...@redhat.com; yang_y...@163.com; ovs-dev@openvswitch.org; 
>> i.maxim...@ovn.org
>> 主题: Re: [ovs-dev] 答复: [PATCH] userspace: fix bad UDP performance 
>> issue of?veth
>>
>>
>> Hi,
>>
>>
>> On Wed, Aug 26, 2020 at 12:47:43AM +, Yi Yang (杨燚)-云服务集团 wrote:
>>> Aaron, thank for your comments, actually final value depends on 
>>> /proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max, so it 
>>> is still configurable. setsockopt(...) will set it to minimum one 
>>> among of
>>> 1073741823 and w/rmem_max.
>>>
>>> -邮件原件-
>>> 发件人: dev [mailto:ovs-dev-boun...@openvswitch.org] 代表 Aaron Conole
>>> 发送时间: 2020年8月25日 23:26
>>> 收件人: yang_y...@163.com
>>> 抄送: ovs-dev@openvswitch.org; i.maxim...@ovn.org; f...@sysclose.org
>>> 主题: Re: [ovs-dev] [PATCH] userspace: fix bad UDP performance issue 
>>> of veth
>>>
>>> yang_y...@163.com writes:
>>>
>>>> From: Yi Yang 
>>>>
>>>> iperf3 UDP performance of veth to veth case is very very bad 
>>>> because of too many packet loss, the root cause is rmem_default and 
>>>> wmem_default are just 212992, but iperf3 UDP test used 8K UDP size 
>>>> which resulted in many UDP fragment in case that MTU size is 1500, 
>>>> one 8K UDP send would enqueue 6 UDP fragments to socket receive 
>>>> queue, the default small socket buffer size can't cache so many 
>>>> packets that many packets are lost.
>>>>
>>>> This commit fixed packet loss issue, it set socket receive and send 
>>>> buffer to maximum possible value, therefore there will not be 
>>>> packet loss forever, this also helps improve TCP performance 
>>>> because of no retransmit.
>>>&

[ovs-dev] Re: Re: Re: [PATCH] userspace: fix bad UDP performance issue of veth

2020-09-02 Thread
As I replied to Aaron's concern, users set the values they prefer via 
/proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max; the final value is 
the smaller of rmem_max (SO_RCVBUF) / wmem_max (SO_SNDBUF) and 1073741823. So 
/proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max are exactly the 
knob you're expecting.
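
A quick way to check the effective value on a given socket is the sketch below 
(illustrative only):

    #include <stdio.h>
    #include <sys/socket.h>

    static void
    show_effective_rcvbuf(int fd)
    {
        int req = 1073741823, eff;
        socklen_t len = sizeof eff;

        setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &req, sizeof req);
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &eff, &len);
        /* Without SO_RCVBUFFORCE the kernel clamps the request to rmem_max,
         * so 'eff' reflects min(req, rmem_max); Linux reports twice that
         * value to account for its bookkeeping overhead. */
        printf("effective SO_RCVBUF: %d\n", eff);
    }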

-邮件原件-
发件人: Flavio Leitner [mailto:f...@sysclose.org] 
发送时间: 2020年9月3日 1:32
收件人: Yi Yang (杨燚)-云服务集团 
抄送: yang_y...@163.com; ovs-dev@openvswitch.org; i.maxim...@ovn.org
主题: Re: [ovs-dev] 答复: 答复: [PATCH] userspace: fix bad UDP performance issue 
of?veth

On Mon, Aug 31, 2020 at 12:38:16AM +, Yi Yang (杨燚)-云服务集团 wrote:
> Flavio, per my test, it also improved TCP performance, you can use 
> run-iperf.sh script  I sent to have a try. Actually, iperf3 did the 
> same thing by using -w option.

I believe that the results improve with lab tests on a controlled environment. 
The kernel auto-tune buffer and the backlogging happens when the system is busy 
with something else and you have multiple devices and TCP streams going on.  
For instance, the bufferbloat problem as Aaron already mentioned.

So, I agree with Aaron that this needs a config knob for cases where using the 
maximum is not a good idea. For instance, I know that some software solutions 
out there recommends to pump those defaults to huge numbers, which might be ok 
for that solution, but it may cause OVS issues under load.

Does that make sense?

fbl

> 
> -邮件原件-
> 发件人: Flavio Leitner [mailto:f...@sysclose.org]
> 发送时间: 2020年8月27日 21:28
> 收件人: Yi Yang (杨燚)-云服务集团 
> 抄送: acon...@redhat.com; yang_y...@163.com; ovs-dev@openvswitch.org; 
> i.maxim...@ovn.org
> 主题: Re: [ovs-dev] 答复: [PATCH] userspace: fix bad UDP performance issue 
> of?veth
> 
> 
> Hi,
> 
> 
> On Wed, Aug 26, 2020 at 12:47:43AM +, Yi Yang (杨燚)-云服务集团 wrote:
> > Aaron, thank for your comments, actually final value depends on 
> > /proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max, so it 
> > is still configurable. setsockopt(...) will set it to minimum one 
> > among of
> > 1073741823 and w/rmem_max.
> > 
> > -邮件原件-
> > 发件人: dev [mailto:ovs-dev-boun...@openvswitch.org] 代表 Aaron Conole
> > 发送时间: 2020年8月25日 23:26
> > 收件人: yang_y...@163.com
> > 抄送: ovs-dev@openvswitch.org; i.maxim...@ovn.org; f...@sysclose.org
> > 主题: Re: [ovs-dev] [PATCH] userspace: fix bad UDP performance issue 
> > of veth
> > 
> > yang_y...@163.com writes:
> > 
> > > From: Yi Yang 
> > >
> > > iperf3 UDP performance of veth to veth case is very very bad 
> > > because of too many packet loss, the root cause is rmem_default 
> > > and wmem_default are just 212992, but iperf3 UDP test used 8K UDP 
> > > size which resulted in many UDP fragment in case that MTU size is 
> > > 1500, one 8K UDP send would enqueue 6 UDP fragments to socket 
> > > receive queue, the default small socket buffer size can't cache so 
> > > many packets that many packets are lost.
> > >
> > > This commit fixed packet loss issue, it set socket receive and 
> > > send buffer to maximum possible value, therefore there will not be 
> > > packet loss forever, this also helps improve TCP performance 
> > > because of no retransmit.
> > >
> > > By the way, big socket buffer doesn't mean it will allocate big 
> > > buffer on creating socket, actually it won't alocate any extra 
> > > buffer compared to default socket buffer size, it just means more 
> > > skbuffs can be enqueued to socket receive queue and send queue, 
> > > therefore there will not be packet loss.
> > >
> > > The below is for your reference.
> > >
> > > The result before apply this commit 
> > > ===
> > > $ ip netns exec ns02 iperf3 -t 5 -i 1 -u -b 100M -c 10.15.2.6 
> > > --get-server-output -A 5 Connecting to host 10.15.2.6, port 5201 [ 
> > > 4] local 10.15.2.2 port 59053 connected to 10.15.2.6 port 5201
> > > [ ID] Interval   Transfer Bandwidth   Total Datagrams
> > > [  4]   0.00-1.00   sec  10.8 MBytes  90.3 Mbits/sec  1378
> > > [  4]   1.00-2.00   sec  11.9 MBytes   100 Mbits/sec  1526
> > > [  4]   2.00-3.00   sec  11.9 MBytes   100 Mbits/sec  1526
> > > [  4]   3.00-4.00   sec  11.9 MBytes   100 Mbits/sec  1526
> > > [  4]   4.00-5.00   sec  11.9 MBytes   100 Mbits/sec  1526
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval   Transfer Bandwidth   Jitter
> > > Lost/Total Datagrams
> > > [  4]   0.00-5.00   sec  58.5 MBytes  98.1

[ovs-dev] Re: Re: [PATCH] userspace: fix bad UDP performance issue of veth

2020-08-30 Thread
Flavio, per my tests it also improved TCP performance; you can give it a try 
with the run-iperf.sh script I sent. Actually, iperf3 does the same thing with 
its -w option.
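
For example (the namespace, server address and the 4M window below are only 
illustrative values; -w makes iperf3 set SO_SNDBUF/SO_RCVBUF on its own 
sockets):

  $ ip netns exec ns02 iperf3 -u -b 4G -w 4M -c 10.15.2.6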

-邮件原件-
发件人: Flavio Leitner [mailto:f...@sysclose.org] 
发送时间: 2020年8月27日 21:28
收件人: Yi Yang (杨燚)-云服务集团 
抄送: acon...@redhat.com; yang_y...@163.com; ovs-dev@openvswitch.org; 
i.maxim...@ovn.org
主题: Re: [ovs-dev] 答复: [PATCH] userspace: fix bad UDP performance issue of?veth


Hi,


On Wed, Aug 26, 2020 at 12:47:43AM +, Yi Yang (杨燚)-云服务集团 wrote:
> Aaron, thank for your comments, actually final value depends on 
> /proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max, so it is 
> still configurable. setsockopt(...) will set it to minimum one among 
> of
> 1073741823 and w/rmem_max.
> 
> -邮件原件-
> 发件人: dev [mailto:ovs-dev-boun...@openvswitch.org] 代表 Aaron Conole
> 发送时间: 2020年8月25日 23:26
> 收件人: yang_y...@163.com
> 抄送: ovs-dev@openvswitch.org; i.maxim...@ovn.org; f...@sysclose.org
> 主题: Re: [ovs-dev] [PATCH] userspace: fix bad UDP performance issue of 
> veth
> 
> yang_y...@163.com writes:
> 
> > From: Yi Yang 
> >
> > iperf3 UDP performance of veth to veth case is very very bad because 
> > of too many packet loss, the root cause is rmem_default and 
> > wmem_default are just 212992, but iperf3 UDP test used 8K UDP size 
> > which resulted in many UDP fragment in case that MTU size is 1500, 
> > one 8K UDP send would enqueue 6 UDP fragments to socket receive 
> > queue, the default small socket buffer size can't cache so many 
> > packets that many packets are lost.
> >
> > This commit fixed packet loss issue, it set socket receive and send 
> > buffer to maximum possible value, therefore there will not be packet 
> > loss forever, this also helps improve TCP performance because of no 
> > retransmit.
> >
> > By the way, big socket buffer doesn't mean it will allocate big 
> > buffer on creating socket, actually it won't alocate any extra 
> > buffer compared to default socket buffer size, it just means more 
> > skbuffs can be enqueued to socket receive queue and send queue, 
> > therefore there will not be packet loss.
> >
> > The below is for your reference.
> >
> > The result before apply this commit
> > ===
> > $ ip netns exec ns02 iperf3 -t 5 -i 1 -u -b 100M -c 10.15.2.6 
> > --get-server-output -A 5 Connecting to host 10.15.2.6, port 5201 [  
> > 4] local 10.15.2.2 port 59053 connected to 10.15.2.6 port 5201
> > [ ID] Interval   Transfer Bandwidth   Total Datagrams
> > [  4]   0.00-1.00   sec  10.8 MBytes  90.3 Mbits/sec  1378
> > [  4]   1.00-2.00   sec  11.9 MBytes   100 Mbits/sec  1526
> > [  4]   2.00-3.00   sec  11.9 MBytes   100 Mbits/sec  1526
> > [  4]   3.00-4.00   sec  11.9 MBytes   100 Mbits/sec  1526
> > [  4]   4.00-5.00   sec  11.9 MBytes   100 Mbits/sec  1526
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval   Transfer Bandwidth   JitterLost/Total 
> > Datagrams
> > [  4]   0.00-5.00   sec  58.5 MBytes  98.1 Mbits/sec  0.047 ms  357/531 
> > (67%)
> > [  4] Sent 531 datagrams
> >
> > Server output:
> > ---
> > Accepted connection from 10.15.2.2, port 60314 [  5] local 10.15.2.6 
> > port 5201 connected to 10.15.2.2 port 59053
> > [ ID] Interval   Transfer Bandwidth   JitterLost/Total 
> > Datagrams
> > [  5]   0.00-1.00   sec  1.36 MBytes  11.4 Mbits/sec  0.047 ms  357/531 
> > (67%)
> > [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
> > [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
> > [  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
> > [  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
> >
> > iperf Done.
> >
> > The result after apply this commit
> > ===
> > $ sudo ip netns exec ns02 iperf3 -t 5 -i 1 -u -b 4G -c 10.15.2.6 
> > --get-server-output -A 5 Connecting to host 10.15.2.6, port 5201 [  
> > 4] local 10.15.2.2 port 48547 connected to 10.15.2.6 port 5201
> > [ ID] Interval   Transfer Bandwidth   Total Datagrams
> > [  4]   0.00-1.00   sec   440 MBytes  3.69 Gbits/sec  56276
> > [  4]   1.00-2.00   sec   481 MBytes  4.04 Gbits/sec  61579
> > [  4]   2.00-3.00   sec   474 MBytes  3.98 Gbits/sec  60678
> > [  4]   3.00-4.00   sec   480 MBytes  4.03 Gbits/sec  61452
> > [  4]   4.00-5.00   sec   480 MBytes  4.03 Gbits/sec  61441
&

[ovs-dev] Re: [PATCH V1 3/4] Fix tap interface statistics issue

2020-08-25 Thread
Aaron, thank you so much for the review. Yes, I should handle the error. For 
setns, I will check whether it can work as you suggested, and I'll send v2 to 
address your comments.

-邮件原件-
发件人: dev [mailto:ovs-dev-boun...@openvswitch.org] 代表 Aaron Conole
发送时间: 2020年8月25日 23:06
收件人: yang_y...@163.com
抄送: ovs-dev@openvswitch.org; i.maxim...@ovn.org; f...@sysclose.org
主题: Re: [ovs-dev] [PATCH V1 3/4] Fix tap interface statistics issue

yang_y...@163.com writes:

> From: Yi Yang 
>
> After tap interface is moved to network namespace, "ovs-vsctl list 
> interface tapXXX" can get statistics info of tap interface, the root 
> cause is OVS still gets statistics info in root namespace.
>
> With netns option help, OVS can get statistics info in tap interface 
> netns.
>
> This patch added enter and exit netns helpers and change 
> statistics-related functions for those tap interfaces which have been 
> moved into netns and make sure "ovs-vsctl list interface tapXXX" can 
> get statistics info correctly.
>
> Here is a result sample for reference:
> name: tap1
> ofport  : 4
> ofport_request  : []
> options : {netns=ns01}
> other_config: {}
> statistics  : {rx_bytes=6228, rx_packets=68, tx_bytes=8310, 
> tx_packets=95}
> status  : {}
> type: tap
>
> Signed-off-by: Yi Yang 
> ---
>  lib/dpif-netlink.c   |  51 ++
>  lib/dpif-netlink.h   |   3 ++
>  lib/netdev-linux.c   |  60 -
>  lib/netlink-socket.c | 146 
> +++
>  lib/netlink-socket.h |   2 +
>  5 files changed, 260 insertions(+), 2 deletions(-)
>
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c index 
> 7da4fb5..8ed37ed 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -4282,6 +4282,43 @@ dpif_netlink_vport_transact(const struct 
> dpif_netlink_vport *request,
>  return error;
>  }
>  
> +static int
> +dpif_netlink_vport_transact_nopool(const struct dpif_netlink_vport *request,
> +   struct dpif_netlink_vport *reply,
> +   struct ofpbuf **bufp) {
> +struct ofpbuf *request_buf;
> +int error;
> +
> +ovs_assert((reply != NULL) == (bufp != NULL));
> +
> +error = dpif_netlink_init();
> +if (error) {
> +if (reply) {
> +*bufp = NULL;
> +dpif_netlink_vport_init(reply);
> +}
> +return error;
> +}
> +
> +request_buf = ofpbuf_new(1024);
> +dpif_netlink_vport_to_ofpbuf(request, request_buf);
> +error = nl_transact_nopool(NETLINK_GENERIC, request_buf, bufp);
> +ofpbuf_delete(request_buf);
> +
> +if (reply) {
> +if (!error) {
> +error = dpif_netlink_vport_from_ofpbuf(reply, *bufp);
> +}
> +if (error) {
> +dpif_netlink_vport_init(reply);
> +ofpbuf_delete(*bufp);
> +*bufp = NULL;
> +}
> +}
> +return error;
> +}
> +
>  /* Obtains information about the kernel vport named 'name' and stores it into
>   * '*reply' and '*bufp'.  The caller must free '*bufp' when the reply is no
>   * longer needed ('reply' will contain pointers into '*bufp').  */ @@ 
> -4298,6 +4335,20 @@ dpif_netlink_vport_get(const char *name, struct 
> dpif_netlink_vport *reply,
>  return dpif_netlink_vport_transact(, reply, bufp);  }
>  
> +int
> +dpif_netlink_vport_get_nopool(const char *name,
> +  struct dpif_netlink_vport *reply,
> +  struct ofpbuf **bufp) {
> +struct dpif_netlink_vport request;
> +
> +dpif_netlink_vport_init();
> +request.cmd = OVS_VPORT_CMD_GET;
> +request.name = name;
> +
> +return dpif_netlink_vport_transact_nopool(, reply, bufp); 
> +}
> +
>  /* Parses the contents of 'buf', which contains a "struct ovs_header" 
> followed
>   * by Netlink attributes, into 'dp'.  Returns 0 if successful, otherwise a
>   * positive errno value.
> diff --git a/lib/dpif-netlink.h b/lib/dpif-netlink.h index 
> 24294bc..9372241 100644
> --- a/lib/dpif-netlink.h
> +++ b/lib/dpif-netlink.h
> @@ -55,6 +55,9 @@ int dpif_netlink_vport_transact(const struct 
> dpif_netlink_vport *request,
>  struct ofpbuf **bufp);  int 
> dpif_netlink_vport_get(const char *name, struct dpif_netlink_vport *reply,
> struct ofpbuf **bufp);
> +int dpif_netlink_vport_get_nopool(const char *name,
> +  struct dpif_netlink_vport *reply,
> +  struct ofpbuf **bufp);
>  
>  bool dpif_netlink_is_internal_device(const char *name);
>  
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 
> bb3aa9b..2a7d5ec 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -520,8 +520,16 @@ static struct vlog_rate_limit rl = 
> VLOG_RATE_LIMIT_INIT(5, 20);
>   * changes in the device miimon status, so we can use atomic_count. 

[ovs-dev] Re: [PATCH] userspace: fix bad UDP performance issue of veth

2020-08-25 Thread
Aaron, thanks for your comments. The final value actually depends on 
/proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max, so it is still 
configurable: setsockopt(...) ends up with the minimum of 1073741823 and 
wmem_max/rmem_max.

-邮件原件-
发件人: dev [mailto:ovs-dev-boun...@openvswitch.org] 代表 Aaron Conole
发送时间: 2020年8月25日 23:26
收件人: yang_y...@163.com
抄送: ovs-dev@openvswitch.org; i.maxim...@ovn.org; f...@sysclose.org
主题: Re: [ovs-dev] [PATCH] userspace: fix bad UDP performance issue of veth

yang_y...@163.com writes:

> From: Yi Yang 
>
> iperf3 UDP performance of veth to veth case is very very bad because 
> of too many packet loss, the root cause is rmem_default and 
> wmem_default are just 212992, but iperf3 UDP test used 8K UDP size 
> which resulted in many UDP fragment in case that MTU size is 1500, one 
> 8K UDP send would enqueue 6 UDP fragments to socket receive queue, the 
> default small socket buffer size can't cache so many packets that many 
> packets are lost.
>
> This commit fixed packet loss issue, it set socket receive and send 
> buffer to maximum possible value, therefore there will not be packet 
> loss forever, this also helps improve TCP performance because of no 
> retransmit.
>
> By the way, big socket buffer doesn't mean it will allocate big buffer 
> on creating socket, actually it won't alocate any extra buffer 
> compared to default socket buffer size, it just means more skbuffs can 
> be enqueued to socket receive queue and send queue, therefore there 
> will not be packet loss.
>
> The below is for your reference.
>
> The result before apply this commit
> ===
> $ ip netns exec ns02 iperf3 -t 5 -i 1 -u -b 100M -c 10.15.2.6 
> --get-server-output -A 5 Connecting to host 10.15.2.6, port 5201 [  4] 
> local 10.15.2.2 port 59053 connected to 10.15.2.6 port 5201
> [ ID] Interval   Transfer Bandwidth   Total Datagrams
> [  4]   0.00-1.00   sec  10.8 MBytes  90.3 Mbits/sec  1378
> [  4]   1.00-2.00   sec  11.9 MBytes   100 Mbits/sec  1526
> [  4]   2.00-3.00   sec  11.9 MBytes   100 Mbits/sec  1526
> [  4]   3.00-4.00   sec  11.9 MBytes   100 Mbits/sec  1526
> [  4]   4.00-5.00   sec  11.9 MBytes   100 Mbits/sec  1526
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval   Transfer Bandwidth   JitterLost/Total 
> Datagrams
> [  4]   0.00-5.00   sec  58.5 MBytes  98.1 Mbits/sec  0.047 ms  357/531 (67%)
> [  4] Sent 531 datagrams
>
> Server output:
> ---
> Accepted connection from 10.15.2.2, port 60314 [  5] local 10.15.2.6 
> port 5201 connected to 10.15.2.2 port 59053
> [ ID] Interval   Transfer Bandwidth   JitterLost/Total 
> Datagrams
> [  5]   0.00-1.00   sec  1.36 MBytes  11.4 Mbits/sec  0.047 ms  357/531 (67%)
> [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
> [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
> [  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
> [  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
>
> iperf Done.
>
> The result after apply this commit
> ===
> $ sudo ip netns exec ns02 iperf3 -t 5 -i 1 -u -b 4G -c 10.15.2.6 
> --get-server-output -A 5 Connecting to host 10.15.2.6, port 5201 [  4] 
> local 10.15.2.2 port 48547 connected to 10.15.2.6 port 5201
> [ ID] Interval   Transfer Bandwidth   Total Datagrams
> [  4]   0.00-1.00   sec   440 MBytes  3.69 Gbits/sec  56276
> [  4]   1.00-2.00   sec   481 MBytes  4.04 Gbits/sec  61579
> [  4]   2.00-3.00   sec   474 MBytes  3.98 Gbits/sec  60678
> [  4]   3.00-4.00   sec   480 MBytes  4.03 Gbits/sec  61452
> [  4]   4.00-5.00   sec   480 MBytes  4.03 Gbits/sec  61441
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval   Transfer Bandwidth   JitterLost/Total 
> Datagrams
> [  4]   0.00-5.00   sec  2.30 GBytes  3.95 Gbits/sec  0.024 ms  0/301426 (0%)
> [  4] Sent 301426 datagrams
>
> Server output:
> ---
> Accepted connection from 10.15.2.2, port 60320 [  5] local 10.15.2.6 
> port 5201 connected to 10.15.2.2 port 48547
> [ ID] Interval   Transfer Bandwidth   JitterLost/Total 
> Datagrams
> [  5]   0.00-1.00   sec   209 MBytes  1.75 Gbits/sec  0.021 ms  0/26704 (0%)
> [  5]   1.00-2.00   sec   258 MBytes  2.16 Gbits/sec  0.025 ms  0/32967 (0%)
> [  5]   2.00-3.00   sec   258 MBytes  2.16 Gbits/sec  0.022 ms  0/32987 (0%)
> [  5]   3.00-4.00   sec   257 MBytes  2.16 Gbits/sec  0.023 ms  0/32954 (0%)
> [  5]   4.00-5.00   sec   257 MBytes  2.16 Gbits/sec  0.021 ms  0/32937 (0%)
> [  5]   5.00-6.00   sec   255 MBytes  2.14 Gbits/sec  0.026 ms  0/32685 (0%)
> [  5]   6.00-7.00   sec   254 MBytes  2.13 Gbits/sec  0.025 ms  0/32453 (0%)
> [  5]   7.00-8.00   sec   255 MBytes  2.14 Gbits/sec  0.026 ms  

[ovs-dev] Re: [PATCH v2 2/5] Enable VXLAN TSO for DPDK datapath

2020-07-12 Thread
Flavio, thank you so much for reviewing again; I'll fix these in the next 
version, detailed replies inline. Sorry for the bad Outlook email format, 
Outlook is all I can use.

-邮件原件-
发件人: dev [mailto:ovs-dev-boun...@openvswitch.org] 代表 Flavio Leitner
发送时间: 2020年7月11日 4:51
收件人: yang_y...@163.com
抄送: ovs-dev@openvswitch.org
主题: Re: [ovs-dev] [PATCH v2 2/5] Enable VXLAN TSO for DPDK datapath


Hi Yi,

This is not a full review, but netdev-dpdk.c is used by Window and BSD as well, 
and there is a 'linux' function which seems to be a copy of another existing 
one. Perhaps we can use just one?

[Yi Yang] I have some changes on top of those functions; do you mean we can 
invoke the functions in netdev-linux? How would we handle non-Linux cases?

This patch resets ol_flags from vhostuser ignoring what has been set by 
rte_vhost_dequeue_burst(). What happens if a VM turns off offloading? Also that 
it is always enabled while userspace offloading is experimental and default to 
off.

[Yi Yang] I didn't realize this, will fix them in new version.

Why do we need to set l2_len, l3_len and l4_len when receiving from the VM? 
Those are not used by OVS and if the packet changes during the pipeline 
execution, they will need to be updated at the appropriate prepare function, 
which for dpdk is netdev_dpdk_prep_hwol_packet().

[Yi Yang] Currently, netdev_dpdk_prep_hwol_packet assumes ol_flags and the 
l*_len fields have been set correctly before it is called, because it needs 
them to make some decisions. But in netdev_dpdk_prep_hwol_batch, dev is the 
output device, not the input device, so we can't tell whether the packet came 
from a VM or from a physical DPDK port. Your concern makes sense; I will move 
this into netdev_dpdk_prep_hwol_packet. Can you propose a way to decide, inside 
netdev_dpdk_prep_hwol_packet, whether the packet came from a VM?
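
For context, what the prepare step ultimately has to guarantee for a 
VXLAN-encapsulated TCP packet is the usual DPDK tunnel-TSO field layout; a 
rough sketch with illustrative IPv4 constants, not the exact patch code:

    #include <rte_mbuf.h>

    /* outer Eth | outer IPv4 | UDP | VXLAN | inner Eth | inner IPv4 | TCP */
    static void
    set_vxlan_tso_fields(struct rte_mbuf *m, uint16_t tso_segsz)
    {
        m->ol_flags |= PKT_TX_TUNNEL_VXLAN | PKT_TX_OUTER_IPV4 |
                       PKT_TX_OUTER_IP_CKSUM | PKT_TX_IPV4 |
                       PKT_TX_IP_CKSUM | PKT_TX_TCP_SEG;
        m->outer_l2_len = 14;       /* outer Ethernet */
        m->outer_l3_len = 20;       /* outer IPv4, no options */
        m->l2_len = 8 + 8 + 14;     /* outer UDP + VXLAN + inner Ethernet */
        m->l3_len = 20;             /* inner IPv4, no options */
        m->l4_len = 20;             /* inner TCP, no options */
        m->tso_segsz = tso_segsz;   /* inner TCP payload bytes per segment */
    }

Whether these values survive from the vhost receive path or are recomputed in 
the prepare function is exactly the open question above.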

Few more comments below. 

Thanks!
fbl

On Wed, Jul 01, 2020 at 05:15:30PM +0800, yang_y...@163.com wrote:
> From: Yi Yang 
> 
> Many NICs can support VXLAN TSO which can help improve 
> across-compute-node VM-to-VM performance in case that MTU is set to 
> 1500.
> 
> This patch allows dpdkvhostuserclient interface and veth/tap interface 
> to leverage NICs' offload capability to maximize across-compute-node 
> TCP performance, with it applied, OVS DPDK can reach linespeed for 
> across-compute-node VM-to-VM TCP performance.
> 
> Signed-off-by: Yi Yang 
> ---
>  lib/dp-packet.h|  61 +
>  lib/netdev-dpdk.c  | 193 
> +
>  lib/netdev-linux.c |  20 ++
>  lib/netdev.c   |  14 ++--
>  4 files changed, 271 insertions(+), 17 deletions(-)
> 
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 070d111..07af124 
> 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -1034,6 +1034,67 @@ dp_packet_hwol_set_tcp_seg(struct dp_packet *b)
>  *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_TCP_SEG;  }
>  
> +#ifdef DPDK_NETDEV
> +/* Mark packet 'b' for VXLAN TCP segmentation offloading. */ static 
> +inline void dp_packet_hwol_set_vxlan_tcp_seg(struct dp_packet *b) {
> +b->mbuf.ol_flags |= PKT_TX_TUNNEL_VXLAN;
> +b->mbuf.l2_len += sizeof(struct udp_header) +
> +  sizeof(struct vxlanhdr);
> +b->mbuf.outer_l2_len = ETH_HEADER_LEN;
> +b->mbuf.outer_l3_len = IP_HEADER_LEN;

What about IPv6?
[Yi Yang] The current DPDK GSO code can't handle IPv6 (I mean an IPv6 outer IP 
header), and I'm not sure whether the NIC can handle it. So far only IPv4 has 
been verified; I will add a check here after I confirm it.


> +}
> +
> +/* Set l2_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l2_len(struct dp_packet *b, int l2_len) {
> +b->mbuf.l2_len = l2_len;
> +}
> +
> +/* Set l3_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l3_len(struct dp_packet *b, int l3_len) {
> +b->mbuf.l3_len = l3_len;
> +}
> +
> +/* Set l4_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l4_len(struct dp_packet *b, int l4_len) {
> +b->mbuf.l4_len = l4_len;
> +}
> +#else
> +/* Mark packet 'b' for VXLAN TCP segmentation offloading. */ static 
> +inline void dp_packet_hwol_set_vxlan_tcp_seg(struct dp_packet *b 
> +OVS_UNUSED) { }
> +
> +/* Set l2_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l2_len(struct dp_packet *b OVS_UNUSED,
> +  int l2_len OVS_UNUSED) { }
> +
> +/* Set l3_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l3_len(struct dp_packet *b OVS_UNUSED,
> +  int l3_len OVS_UNUSED) { }
> +
> +/* Set l4_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l4_len(struct dp_packet *b OVS_UNUSED,
> +  int l4_len OVS_UNUSED) { } #endif /* 
> +DPDK_NETDEV */
> +
>  static inline bool
>  dp_packet_ip_checksum_valid(const struct dp_packet *p)  { diff --git 
> a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 44ebf96..bf5fa63 100644
> --- a/lib/netdev-dpdk.c
> +++ 

[ovs-dev] Re: [PATCH v2 1/5] Fix dp_packet_set_size error for multi-seg mbuf

2020-07-12 Thread
Flavio, thank you so much for reviewing; I'll fix these per your comments in 
the next version. Some replies to your comments are inline.

-邮件原件-
发件人: dev [mailto:ovs-dev-boun...@openvswitch.org] 代表 Flavio Leitner
发送时间: 2020年7月11日 3:42
收件人: yang_y...@163.com
抄送: ovs-dev@openvswitch.org
主题: Re: [ovs-dev] [PATCH v2 1/5] Fix dp_packet_set_size error for multi-seg mbuf


Hi Yi,

Thanks for putting this patch-set together.

On Wed, Jul 01, 2020 at 05:15:29PM +0800, yang_y...@163.com wrote:
> From: Yi Yang 
> 
> For multi-seg mbuf, pkt_len isn't equal to data_len, data_len is 
> data_len of the first seg, pkt_len is sum of data_len of all the segs, 
> so for such packets, dp_packet_set_size shouldn't change data_len.
> 
> Signed-off-by: Yi Yang 
> ---
>  lib/dp-packet.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 0430cca..070d111 
> 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -575,7 +575,9 @@ dp_packet_set_size(struct dp_packet *b, uint32_t v)
>   * (and thus 'v') will always be <= UINT16_MAX; this means that there is 
> no
>   * loss of accuracy in assigning 'v' to 'data_len'.
>   */
> -b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
> +if (b->mbuf.nb_segs <= 1) {
> +b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
> +}
>  b->mbuf.pkt_len = v; /* Total length of all segments linked 
> to
>* this segment. */  }

Currently OVS doesn't support multi-seg mbuf, so although this patch wouldn't 
break anything it doesn't sound correct as it is.  It seems incomplete/limited 
as well.

I think at least the patch should add a comment explaining why and when that is 
needed. 

Another thing is that this change alone has no users.  Usually we do changes 
along with the first user. I am still reviewing the following patches, but I 
suspect this change is for GRO/GSO, so in my opinion it makes sense to be part 
of one of them.
Doing so helps to backtrack the reason for a specific change.

I think we should prioritize single mbuf as they carry less data, so OVS has 
less time to process them. Therefore, it seems appropriate to use OVS_LIKELY().

[Yi Yang] Makes sense, I'll merge it into the GRO patch; yes, GRO needs it, and
OVS_LIKELY will be better.
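
As a concrete illustration, a minimal sketch of how the merged
dp_packet_set_size() hunk might read with OVS_LIKELY(), assuming single-segment
mbufs remain the common case (this is not the final patch):

/* Sketch only: keep the single-segment case on the fast path.  For
 * multi-segment mbufs (e.g. packets assembled by GRO), data_len only
 * describes the first segment, so leave it alone and update pkt_len. */
if (OVS_LIKELY(b->mbuf.nb_segs <= 1)) {
    b->mbuf.data_len = (uint16_t) v;   /* Current segment length. */
}
b->mbuf.pkt_len = v;                   /* Total length of all linked segments. */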
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: [PATCH v1] Enable VXLAN TSO for DPDK datapath

2020-06-04 Thread
Hi, Ian

I have fixed the issues your comments mentioned and passed the Travis build check; 
the result is in my GitHub repo, https://github.com/yyang13/ovs. The v1 sent out 
last week missed a change in lib/netdev-linux.c, so please use the patch in 
https://github.com/yyang13/ovs for your setup.

-----Original Message-----
From: Yi Yang (杨燚)-云服务集团 
Sent: June 4, 2020 8:30
To: 'ian.sto...@intel.com' ; 'yang_y...@163.com' 
; 'ovs-dev@openvswitch.org' 
Cc: 'b...@ovn.org' ; 'u9012...@gmail.com' 
Subject: Re: [PATCH v1] Enable VXLAN TSO for DPDK datapath
Importance: High

Ian, thank you so much for the comments, I will run Travis to find all the 
similar issues and fix them in v2. For the setup, I have sent you an email that 
includes the DPDK bug link; please use that to set it up, and let me know if you 
have any issues.

-----Original Message-----
From: Stokes, Ian [mailto:ian.sto...@intel.com] 
Sent: June 3, 2020 20:43
To: yang_y...@163.com; ovs-dev@openvswitch.org
Cc: b...@ovn.org; u9012...@gmail.com; Yi Yang (杨燚)-云服务集团 
Subject: Re: [PATCH v1] Enable VXLAN TSO for DPDK datapath



On 6/1/2020 11:15 AM, yang_y...@163.com wrote:
> From: Yi Yang 
> 
> Many NICs can support VXLAN TSO which can help improve 
> across-compute-node VM-to-VM performance in case that MTU is set to 
> 1500.
> 
> This patch allows dpdkvhostuserclient interface and veth/tap interface 
> to leverage NICs' offload capability to maximize across-compute-node 
> TCP performance, with it applied, OVS DPDK can reach linespeed for 
> across-compute-node VM-to-VM TCP performance.
> 
> GSO (for UDP and software VXLAN TSO ) and GRO will be added in the 
> near future.

Hi Yi, thanks for working on this, I have not tested yet as there were multiple 
compilation issues to be addressed, I've flagged these below.


If you can address these and meanwhile I'll look to deploy a setup in 
our lab to enable testing/review in greater detail.

> 
> Signed-off-by: Yi Yang 
> ---
>   lib/dp-packet.h   |  58 
>   lib/netdev-dpdk.c | 193 
> ++
>   lib/netdev.c  |  16 ++---
>   3 files changed, 243 insertions(+), 24 deletions(-)
> 
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index 0430cca..7faa675 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -1032,6 +1032,64 @@ dp_packet_hwol_set_tcp_seg(struct dp_packet *b)
>   *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_TCP_SEG;
>   }
>   
> +#ifdef DPDK_NETDEV
> +/* Mark packet 'b' for VXLAN TCP segmentation offloading. */
> +static inline void
> +dp_packet_hwol_set_vxlan_tcp_seg(struct dp_packet *b)
> +{
> +b->mbuf.ol_flags |= PKT_TX_TUNNEL_VXLAN;
> +b->mbuf.l2_len += sizeof(struct udp_header) +
> +  sizeof(struct vxlanhdr);
> +b->mbuf.outer_l2_len = ETH_HEADER_LEN;
> +b->mbuf.outer_l3_len = IP_HEADER_LEN;
> +}
> +
> +/* Set l2_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l2_len(struct dp_packet *b, int l2_len)
> +{
> +b->mbuf.l2_len = l2_len;
> +}
> +
> +/* Set l3_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l3_len(struct dp_packet *b, int l3_len)
> +{
> +b->mbuf.l3_len = l3_len;
> +}
> +
> +/* Set l4_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l4_len(struct dp_packet *b, int l4_len)
> +{
> +b->mbuf.l4_len = l4_len;
> +}
> +#else

For non DPDK case, need to add OVS_UNUSED after each function parameter 
below, otherwise it will be treated as unused parameter and cause 
compilation failure for non DPDK case.

> +/* Mark packet 'b' for VXLAN TCP segmentation offloading. */
> +static inline void
> +dp_packet_hwol_set_vxlan_tcp_seg(struct dp_packet *b)
> +{
> +}
> +
> +/* Set l2_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l2_len(struct dp_packet *b, int l2_len)
> +{
> +}
> +
> +/* Set l3_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l3_len(struct dp_packet *b, int l3_len)
> +{
> +}
> +
> +/* Set l4_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l4_len(struct dp_packet *b, int l4_len)
> +{
> +}
> +#endif /* DPDK_NETDEV */
> +
>   static inline bool
>   dp_packet_ip_checksum_valid(const struct dp_packet *p)
>   {
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 44ebf96..c2424f7 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -44,6 +44,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
>   
>   #include "cmap.h"
>   #include "coverage.h"
> @@ -405,6 +406,7 @@ enum dpdk_hw_ol_features {
>   NETDEV_RX_HW_SCATTER = 1 << 2,
>   NETDEV_TX_TSO_OFFLOAD = 1 << 3,
>   NETDEV_TX_SCTP_CHECKSUM_OFFLOAD = 

[ovs-dev] Re: [PATCH v1] Enable VXLAN TSO for DPDK datapath

2020-06-03 Thread
Ian, thank you so much for the comments, I will run Travis to find all the 
similar issues and fix them in v2. For the setup, I have sent you an email that 
includes the DPDK bug link; please use that to set it up, and let me know if you 
have any issues.

-----Original Message-----
From: Stokes, Ian [mailto:ian.sto...@intel.com] 
Sent: June 3, 2020 20:43
To: yang_y...@163.com; ovs-dev@openvswitch.org
Cc: b...@ovn.org; u9012...@gmail.com; Yi Yang (杨燚)-云服务集团 
Subject: Re: [PATCH v1] Enable VXLAN TSO for DPDK datapath



On 6/1/2020 11:15 AM, yang_y...@163.com wrote:
> From: Yi Yang 
> 
> Many NICs can support VXLAN TSO which can help improve 
> across-compute-node VM-to-VM performance in case that MTU is set to 
> 1500.
> 
> This patch allows dpdkvhostuserclient interface and veth/tap interface 
> to leverage NICs' offload capability to maximize across-compute-node 
> TCP performance, with it applied, OVS DPDK can reach linespeed for 
> across-compute-node VM-to-VM TCP performance.
> 
> GSO (for UDP and software VXLAN TSO ) and GRO will be added in the 
> near future.

Hi Yi, thanks for working on this, I have not tested yet as there were multiple 
compilation issues to be addressed, I've flagged these below.


If you can address these and meanwhile I'll look to deploy a setup in 
our lab to enable testing/review in greater detail.

> 
> Signed-off-by: Yi Yang 
> ---
>   lib/dp-packet.h   |  58 
>   lib/netdev-dpdk.c | 193 
> ++
>   lib/netdev.c  |  16 ++---
>   3 files changed, 243 insertions(+), 24 deletions(-)
> 
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index 0430cca..7faa675 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -1032,6 +1032,64 @@ dp_packet_hwol_set_tcp_seg(struct dp_packet *b)
>   *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_TCP_SEG;
>   }
>   
> +#ifdef DPDK_NETDEV
> +/* Mark packet 'b' for VXLAN TCP segmentation offloading. */
> +static inline void
> +dp_packet_hwol_set_vxlan_tcp_seg(struct dp_packet *b)
> +{
> +b->mbuf.ol_flags |= PKT_TX_TUNNEL_VXLAN;
> +b->mbuf.l2_len += sizeof(struct udp_header) +
> +  sizeof(struct vxlanhdr);
> +b->mbuf.outer_l2_len = ETH_HEADER_LEN;
> +b->mbuf.outer_l3_len = IP_HEADER_LEN;
> +}
> +
> +/* Set l2_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l2_len(struct dp_packet *b, int l2_len)
> +{
> +b->mbuf.l2_len = l2_len;
> +}
> +
> +/* Set l3_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l3_len(struct dp_packet *b, int l3_len)
> +{
> +b->mbuf.l3_len = l3_len;
> +}
> +
> +/* Set l4_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l4_len(struct dp_packet *b, int l4_len)
> +{
> +b->mbuf.l4_len = l4_len;
> +}
> +#else

For non DPDK case, need to add OVS_UNUSED after each function parameter 
below, otherwise it will be treated as unused parameter and cause 
compilation failure for non DPDK case.

> +/* Mark packet 'b' for VXLAN TCP segmentation offloading. */
> +static inline void
> +dp_packet_hwol_set_vxlan_tcp_seg(struct dp_packet *b)
> +{
> +}
> +
> +/* Set l2_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l2_len(struct dp_packet *b, int l2_len)
> +{
> +}
> +
> +/* Set l3_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l3_len(struct dp_packet *b, int l3_len)
> +{
> +}
> +
> +/* Set l4_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l4_len(struct dp_packet *b, int l4_len)
> +{
> +}
> +#endif /* DPDK_NETDEV */
> +
>   static inline bool
>   dp_packet_ip_checksum_valid(const struct dp_packet *p)
>   {
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 44ebf96..c2424f7 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -44,6 +44,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
>   
>   #include "cmap.h"
>   #include "coverage.h"
> @@ -405,6 +406,7 @@ enum dpdk_hw_ol_features {
>   NETDEV_RX_HW_SCATTER = 1 << 2,
>   NETDEV_TX_TSO_OFFLOAD = 1 << 3,
>   NETDEV_TX_SCTP_CHECKSUM_OFFLOAD = 1 << 4,
> +NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD = 1 << 5,
>   };
>   
>   /*
> @@ -988,6 +990,12 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int 
> n_rxq, int n_txq)
>   
>   if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
>   conf.txmode.offloads |= DPDK_TX_TSO_OFFLOAD_FLAGS;
> +/* Enable VXLAN TSO support if available */
> +if (dev->hw_ol_features & NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD) {
> + 

[ovs-dev] Re: Re: [PATCH v7] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-04-13 Thread
ec  6.41 GBytes  5.50 Gbits/sec  56704184 KBytes
[  4]  20.00-30.00  sec  6.64 GBytes  5.71 Gbits/sec  55720189 KBytes
[  4]  30.00-40.00  sec  6.52 GBytes  5.60 Gbits/sec  53433178 KBytes
[  4]  40.00-50.00  sec  6.41 GBytes  5.51 Gbits/sec  52541185 KBytes
[  4]  50.00-60.00  sec  6.52 GBytes  5.60 Gbits/sec  56081141 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-60.00  sec  38.8 GBytes  5.55 Gbits/sec  331152 sender
[  4]   0.00-60.00  sec  38.8 GBytes  5.55 Gbits/sec  receiver

Server output:
Accepted connection from 10.15.1.2, port 59548
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 59550
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-10.00  sec  6.25 GBytes  5.37 Gbits/sec
[  5]  10.00-20.00  sec  6.41 GBytes  5.51 Gbits/sec
[  5]  20.00-30.00  sec  6.64 GBytes  5.71 Gbits/sec
[  5]  30.00-40.00  sec  6.52 GBytes  5.60 Gbits/sec
[  5]  40.00-50.00  sec  6.41 GBytes  5.51 Gbits/sec
[  5]  50.00-60.00  sec  6.51 GBytes  5.60 Gbits/sec
[  5]  60.00-60.04  sec  22.5 MBytes  4.71 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-60.04  sec  0.00 Bytes  0.00 bits/sec  sender
[  5]   0.00-60.04  sec  38.8 GBytes  5.55 Gbits/sec  receiver


iperf Done.
[yangyi@localhost ovs-master]$

TPACKET_V2 TSO
===
[yangyi@localhost ovs-master]$ sudo ../run-iperf3.sh
Connecting to host 10.15.1.3, port 5201
[  4] local 10.15.1.2 port 32884 connected to 10.15.1.3 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-10.00  sec  10.7 GBytes  9.21 Gbits/sec7   3.13 MBytes
[  4]  10.00-20.00  sec  10.8 GBytes  9.25 Gbits/sec0   3.13 MBytes
[  4]  20.00-30.00  sec  10.8 GBytes  9.25 Gbits/sec0   3.13 MBytes
[  4]  30.00-40.00  sec  10.8 GBytes  9.29 Gbits/sec0   3.13 MBytes
[  4]  40.00-50.00  sec  10.8 GBytes  9.30 Gbits/sec0   3.13 MBytes
[  4]  50.00-60.00  sec  10.7 GBytes  9.20 Gbits/sec0   3.13 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-60.00  sec  64.6 GBytes  9.25 Gbits/sec7 sender
[  4]   0.00-60.00  sec  64.6 GBytes  9.25 Gbits/sec  receiver

Server output:
Accepted connection from 10.15.1.2, port 32882
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 32884
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-10.00  sec  10.7 GBytes  9.17 Gbits/sec
[  5]  10.00-20.00  sec  10.8 GBytes  9.25 Gbits/sec
[  5]  20.00-30.00  sec  10.8 GBytes  9.25 Gbits/sec
[  5]  30.00-40.00  sec  10.8 GBytes  9.29 Gbits/sec
[  5]  40.00-50.00  sec  10.8 GBytes  9.30 Gbits/sec
[  5]  50.00-60.00  sec  10.7 GBytes  9.20 Gbits/sec
[  5]  60.00-60.04  sec  46.0 MBytes  9.29 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-60.04  sec  0.00 Bytes  0.00 bits/sec  sender
[  5]   0.00-60.04  sec  64.6 GBytes  9.24 Gbits/sec  receiver


iperf Done.
[yangyi@localhost ovs-master]$

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: March 19, 2020 22:53
To: Yi Yang (杨燚)-云服务集团 
Cc: i.maxim...@ovn.org; yang_y...@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] Re: [PATCH v7] Use TPACKET_V3 to accelerate veth for 
userspace datapath

On Wed, Mar 18, 2020 at 8:12 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> Hi, folks
>
> As I said, TPACKET_V3 does have kernel implementation issue, I tried to fix 
> it in Linux kernel 5.5.9, here is my test data with tpacket_v3 and tso 
> enabled. On my low end server, my goal is to reach 16Gbps at least, I still 
> have another idea to improve it.
>
Can you share your kernel fix?
Or have you sent patch somewhere?
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: [PATCH v7] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-03-25 Thread
William, FYI, here is my Linux kernel fix patch for the TPACKET_V3 TSO performance 
issue. You can try it on the Ubuntu 4.15.0 kernel.

https://patchwork.ozlabs.org/patch/1261410/


-----Original Message-----
From: Yi Yang (杨燚)-云服务集团 
Sent: March 23, 2020 18:00
To: 'u9012...@gmail.com' 
Cc: 'i.maxim...@ovn.org' ; 'yang_y...@163.com' 
; 'ovs-dev@openvswitch.org' 
Subject: Re: [ovs-dev] Re: [PATCH v7] Use TPACKET_V3 to accelerate veth for 
userspace datapath
Importance: High

Hi, folks

I implemented my goal in Ubuntu kernel 4.15.0-92.93, here is my performance 
data with tpacket_v3 and tso. So now I'm very sure tpacket_v3 can do better.

[yangyi@localhost ovs-master]$ sudo ../run-iperf3.sh
iperf3: no process found
Connecting to host 10.15.1.3, port 5201
[  4] local 10.15.1.2 port 44976 connected to 10.15.1.3 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-10.00  sec  19.6 GBytes  16.8 Gbits/sec  106586307 KBytes
[  4]  10.00-20.00  sec  19.5 GBytes  16.7 Gbits/sec  104625215 KBytes
[  4]  20.00-30.00  sec  20.0 GBytes  17.2 Gbits/sec  106962301 KBytes
[  4]  30.00-40.00  sec  19.9 GBytes  17.1 Gbits/sec  102262346 KBytes
[  4]  40.00-50.00  sec  19.8 GBytes  17.0 Gbits/sec  105383225 KBytes
[  4]  50.00-60.00  sec  19.9 GBytes  17.1 Gbits/sec  103177294 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-60.00  sec   119 GBytes  17.0 Gbits/sec  628995 sender
[  4]   0.00-60.00  sec   119 GBytes  17.0 Gbits/sec  receiver

Server output:
Accepted connection from 10.15.1.2, port 44974
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 44976
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-10.00  sec  19.5 GBytes  16.7 Gbits/sec
[  5]  10.00-20.00  sec  19.5 GBytes  16.7 Gbits/sec
[  5]  20.00-30.00  sec  20.0 GBytes  17.2 Gbits/sec
[  5]  30.00-40.00  sec  19.9 GBytes  17.1 Gbits/sec
[  5]  40.00-50.00  sec  19.8 GBytes  17.0 Gbits/sec
[  5]  50.00-60.00  sec  19.9 GBytes  17.1 Gbits/sec
[  5]  60.00-60.04  sec  89.1 MBytes  17.5 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-60.04  sec  0.00 Bytes  0.00 bits/sec  sender
[  5]   0.00-60.04  sec   119 GBytes  17.0 Gbits/sec  receiver


iperf Done.
[yangyi@localhost ovs-master]$

-----Original Message-----
From: Yi Yang (杨燚)-云服务集团 
Sent: March 19, 2020 11:12
To: 'u9012...@gmail.com' 
Cc: 'i.maxim...@ovn.org' ; 'yang_y...@163.com' 
; 'ovs-dev@openvswitch.org' 
Subject: Re: [ovs-dev] Re: [PATCH v7] Use TPACKET_V3 to accelerate veth for 
userspace datapath
Importance: High

Hi, folks

As I said, TPACKET_V3 does have kernel implementation issue, I tried to fix it 
in Linux kernel 5.5.9, here is my test data with tpacket_v3 and tso enabled. On 
my low end server, my goal is to reach 16Gbps at least, I still have another 
idea to improve it.

[yangyi@localhost ovs-master]$ sudo ../run-iperf3.sh
Connecting to host 10.15.1.3, port 5201
[  4] local 10.15.1.2 port 42336 connected to 10.15.1.3 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-10.00  sec  12.9 GBytes  11.1 Gbits/sec1   3.09 MBytes
[  4]  10.00-20.00  sec  12.9 GBytes  11.1 Gbits/sec0   3.09 MBytes
[  4]  20.00-30.00  sec  12.9 GBytes  11.1 Gbits/sec3   3.09 MBytes
[  4]  30.00-40.00  sec  12.8 GBytes  11.0 Gbits/sec0   3.09 MBytes
[  4]  40.00-50.00  sec  12.8 GBytes  11.0 Gbits/sec0   3.09 MBytes
[  4]  50.00-60.00  sec  12.8 GBytes  11.0 Gbits/sec0   3.09 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-60.00  sec  77.2 GBytes  11.1 Gbits/sec4 sender
[  4]   0.00-60.00  sec  77.2 GBytes  11.1 Gbits/sec  receiver

Server output:
Accepted connection from 10.15.1.2, port 42334
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 42336
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-10.00  sec  12.9 GBytes  11.1 Gbits/sec
[  5]  10.00-20.00  sec  12.9 GBytes  11.1 Gbits/sec
[  5]  20.00-30.00  sec  12.9 GBytes  11.1 Gbits/sec
[  5]  30.00-40.00  sec  12.8 GBytes  11.0 Gbits/sec
[  5]  40.00-50.00  sec  12.8 GBytes  11.0 Gbits/sec
[  5]  50.00-60.00  sec  12.8 GBytes  11.0 Gbits/sec
[  5]  60.00-60.01  sec  14.3 MBytes  12.4 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-60.01  sec  0.00 Bytes  0.00 bits/sec  sender
[  5]   0.00-60.01  sec  77.2 GBytes  11.0 Gbits/sec  receiver


iperf Done.
[yangyi@localhost ovs-master]$

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: March 19, 2020 5:42
To: Yi Yang (杨燚)-云服务集团 
Cc: i.maxim...@ovn.org; yang_y...@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] Re: [PATCH v7] Use TPACKET_V3 to accelerate veth for 
userspace datapath

On Wed, Mar 18

[ovs-dev] Re: Re: [PATCH v7] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-03-23 Thread
Hi, folks

I implemented my goal in Ubuntu kernel 4.15.0-92.93, here is my performance 
data with tpacket_v3 and tso. So now I'm very sure tpacket_v3 can do better.

[yangyi@localhost ovs-master]$ sudo ../run-iperf3.sh
iperf3: no process found
Connecting to host 10.15.1.3, port 5201
[  4] local 10.15.1.2 port 44976 connected to 10.15.1.3 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-10.00  sec  19.6 GBytes  16.8 Gbits/sec  106586307 KBytes
[  4]  10.00-20.00  sec  19.5 GBytes  16.7 Gbits/sec  104625215 KBytes
[  4]  20.00-30.00  sec  20.0 GBytes  17.2 Gbits/sec  106962301 KBytes
[  4]  30.00-40.00  sec  19.9 GBytes  17.1 Gbits/sec  102262346 KBytes
[  4]  40.00-50.00  sec  19.8 GBytes  17.0 Gbits/sec  105383225 KBytes
[  4]  50.00-60.00  sec  19.9 GBytes  17.1 Gbits/sec  103177294 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-60.00  sec   119 GBytes  17.0 Gbits/sec  628995 sender
[  4]   0.00-60.00  sec   119 GBytes  17.0 Gbits/sec  receiver

Server output:
Accepted connection from 10.15.1.2, port 44974
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 44976
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-10.00  sec  19.5 GBytes  16.7 Gbits/sec
[  5]  10.00-20.00  sec  19.5 GBytes  16.7 Gbits/sec
[  5]  20.00-30.00  sec  20.0 GBytes  17.2 Gbits/sec
[  5]  30.00-40.00  sec  19.9 GBytes  17.1 Gbits/sec
[  5]  40.00-50.00  sec  19.8 GBytes  17.0 Gbits/sec
[  5]  50.00-60.00  sec  19.9 GBytes  17.1 Gbits/sec
[  5]  60.00-60.04  sec  89.1 MBytes  17.5 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-60.04  sec  0.00 Bytes  0.00 bits/sec  sender
[  5]   0.00-60.04  sec   119 GBytes  17.0 Gbits/sec  receiver


iperf Done.
[yangyi@localhost ovs-master]$

-----Original Message-----
From: Yi Yang (杨燚)-云服务集团 
Sent: March 19, 2020 11:12
To: 'u9012...@gmail.com' 
Cc: 'i.maxim...@ovn.org' ; 'yang_y...@163.com' 
; 'ovs-dev@openvswitch.org' 
Subject: Re: [ovs-dev] Re: [PATCH v7] Use TPACKET_V3 to accelerate veth for 
userspace datapath
Importance: High

Hi, folks

As I said, TPACKET_V3 does have kernel implementation issue, I tried to fix it 
in Linux kernel 5.5.9, here is my test data with tpacket_v3 and tso enabled. On 
my low end server, my goal is to reach 16Gbps at least, I still have another 
idea to improve it.

[yangyi@localhost ovs-master]$ sudo ../run-iperf3.sh
Connecting to host 10.15.1.3, port 5201
[  4] local 10.15.1.2 port 42336 connected to 10.15.1.3 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-10.00  sec  12.9 GBytes  11.1 Gbits/sec1   3.09 MBytes
[  4]  10.00-20.00  sec  12.9 GBytes  11.1 Gbits/sec0   3.09 MBytes
[  4]  20.00-30.00  sec  12.9 GBytes  11.1 Gbits/sec3   3.09 MBytes
[  4]  30.00-40.00  sec  12.8 GBytes  11.0 Gbits/sec0   3.09 MBytes
[  4]  40.00-50.00  sec  12.8 GBytes  11.0 Gbits/sec0   3.09 MBytes
[  4]  50.00-60.00  sec  12.8 GBytes  11.0 Gbits/sec0   3.09 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-60.00  sec  77.2 GBytes  11.1 Gbits/sec4 sender
[  4]   0.00-60.00  sec  77.2 GBytes  11.1 Gbits/sec  receiver

Server output:
Accepted connection from 10.15.1.2, port 42334
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 42336
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-10.00  sec  12.9 GBytes  11.1 Gbits/sec
[  5]  10.00-20.00  sec  12.9 GBytes  11.1 Gbits/sec
[  5]  20.00-30.00  sec  12.9 GBytes  11.1 Gbits/sec
[  5]  30.00-40.00  sec  12.8 GBytes  11.0 Gbits/sec
[  5]  40.00-50.00  sec  12.8 GBytes  11.0 Gbits/sec
[  5]  50.00-60.00  sec  12.8 GBytes  11.0 Gbits/sec
[  5]  60.00-60.01  sec  14.3 MBytes  12.4 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-60.01  sec  0.00 Bytes  0.00 bits/sec  sender
[  5]   0.00-60.01  sec  77.2 GBytes  11.0 Gbits/sec  receiver


iperf Done.
[yangyi@localhost ovs-master]$

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: March 19, 2020 5:42
To: Yi Yang (杨燚)-云服务集团 
Cc: i.maxim...@ovn.org; yang_y...@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] Re: [PATCH v7] Use TPACKET_V3 to accelerate veth for 
userspace datapath

On Wed, Mar 18, 2020 at 6:22 AM Yi Yang (杨燚)-云服务集团  wrote:
>
> Ilya, raw socket for the interface type of which is "system" has been 
> set to non-block mode, can you explain which syscall will lead to 
> sleep? Yes, pmd thread will consume CPU resource even if it has 
> nothing to do, but all the type=dpdk ports are handled by pmd thread, 
> here we just let system interfaces look like a DPDK interface. I 
> di

[ovs-dev] Re: Re: [PATCH v7] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-03-19 Thread
William, this is just a simple experiment; I'm still trying other ideas to get a 
higher performance improvement. The final patch is for the Linux kernel net tree, 
not for ovs.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: March 19, 2020 22:53
To: Yi Yang (杨燚)-云服务集团 
Cc: i.maxim...@ovn.org; yang_y...@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] Re: [PATCH v7] Use TPACKET_V3 to accelerate veth for 
userspace datapath

On Wed, Mar 18, 2020 at 8:12 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> Hi, folks
>
> As I said, TPACKET_V3 does have kernel implementation issue, I tried to fix 
> it in Linux kernel 5.5.9, here is my test data with tpacket_v3 and tso 
> enabled. On my low end server, my goal is to reach 16Gbps at least, I still 
> have another idea to improve it.
>
Can you share your kernel fix?
Or have you sent patch somewhere?
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: [PATCH v7] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-03-18 Thread
Hi, folks

As I said, TPACKET_V3 does have kernel implementation issue, I tried to fix it 
in Linux kernel 5.5.9, here is my test data with tpacket_v3 and tso enabled. On 
my low end server, my goal is to reach 16Gbps at least, I still have another 
idea to improve it.

[yangyi@localhost ovs-master]$ sudo ../run-iperf3.sh
Connecting to host 10.15.1.3, port 5201
[  4] local 10.15.1.2 port 42336 connected to 10.15.1.3 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-10.00  sec  12.9 GBytes  11.1 Gbits/sec1   3.09 MBytes
[  4]  10.00-20.00  sec  12.9 GBytes  11.1 Gbits/sec0   3.09 MBytes
[  4]  20.00-30.00  sec  12.9 GBytes  11.1 Gbits/sec3   3.09 MBytes
[  4]  30.00-40.00  sec  12.8 GBytes  11.0 Gbits/sec0   3.09 MBytes
[  4]  40.00-50.00  sec  12.8 GBytes  11.0 Gbits/sec0   3.09 MBytes
[  4]  50.00-60.00  sec  12.8 GBytes  11.0 Gbits/sec0   3.09 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-60.00  sec  77.2 GBytes  11.1 Gbits/sec4 sender
[  4]   0.00-60.00  sec  77.2 GBytes  11.1 Gbits/sec  receiver

Server output:
Accepted connection from 10.15.1.2, port 42334
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 42336
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-10.00  sec  12.9 GBytes  11.1 Gbits/sec
[  5]  10.00-20.00  sec  12.9 GBytes  11.1 Gbits/sec
[  5]  20.00-30.00  sec  12.9 GBytes  11.1 Gbits/sec
[  5]  30.00-40.00  sec  12.8 GBytes  11.0 Gbits/sec
[  5]  40.00-50.00  sec  12.8 GBytes  11.0 Gbits/sec
[  5]  50.00-60.00  sec  12.8 GBytes  11.0 Gbits/sec
[  5]  60.00-60.01  sec  14.3 MBytes  12.4 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-60.01  sec  0.00 Bytes  0.00 bits/sec  sender
[  5]   0.00-60.01  sec  77.2 GBytes  11.0 Gbits/sec  receiver


iperf Done.
[yangyi@localhost ovs-master]$

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: March 19, 2020 5:42
To: Yi Yang (杨燚)-云服务集团 
Cc: i.maxim...@ovn.org; yang_y...@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] Re: [PATCH v7] Use TPACKET_V3 to accelerate veth for 
userspace datapath

On Wed, Mar 18, 2020 at 6:22 AM Yi Yang (杨燚)-云服务集团  wrote:
>
> Ilya, raw socket for the interface type of which is "system" has been 
> set to non-block mode, can you explain which syscall will lead to 
> sleep? Yes, pmd thread will consume CPU resource even if it has 
> nothing to do, but all the type=dpdk ports are handled by pmd thread, 
> here we just let system interfaces look like a DPDK interface. I 
> didn't see any problem in my test, it will be better if you can tell 
> me what will result in a problem and how I can reproduce it. By the 
> way, type=tap/internal interfaces are still be handled by ovs-vswitchd thread.
>
> In addition, only one line change is there, ".is_pmd = true,", 
> ".is_pmd = false," will keep it in ovs-vswitchd if there is any other 
> concern. We can change non-thread-safe parts to support pmd.
>

Hi Yi Yang and Ilya,

How about making tpacket_v3 a new netdev class with type="tpacket"?
Like my original patch:
https://mail.openvswitch.org/pipermail/ovs-dev/2019-December/366229.html

Users have to create it specifically by doing type="tpacket", ex:
  $ ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="tpacket"
And we can set is_pmd=true for this particular type.
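
For illustration, a rough sketch of what such a dedicated class could look like
in lib/netdev-linux.c; the construct/rxq_recv/send hooks named below are
hypothetical, and a real patch would reuse the existing netdev-linux plumbing:

/* Sketch only: a separate netdev class so that only ports explicitly added
 * with type="tpacket" use the TPACKET_V3 mmap rings and a PMD thread. */
const struct netdev_class netdev_tpacket_class = {
    .type = "tpacket",
    .is_pmd = true,                        /* Polled by PMD threads. */
    .construct = netdev_tpacket_construct, /* Hypothetical: set up rx/tx rings. */
    .destruct = netdev_linux_destruct,
    .rxq_recv = netdev_tpacket_rxq_recv,   /* Hypothetical: walk the rx ring. */
    .send = netdev_tpacket_send,           /* Hypothetical: fill tx ring, then sendto(). */
};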

Regards
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: [PATCH v7] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-03-18 Thread
William, that can't fix Ilya's concern. We can fix the thread-safety issues if most 
of the code in lib/netdev-linux.c is not thread-safe, but we also have to consider 
how to fix the scalability issue: it obviously isn't scalable for one ovs-vswitchd 
thread to handle all such interfaces, this is a performance bottleneck and the 
performance is linearly related to the number of interfaces. I think a pmd thread 
is the natural choice, we have no reason to refuse this way; we can first fix the 
current code if it does have issues supporting pmd threads.

If it is difficult to support pmd currently, I can remove ".is_pmd = true". If a 
user wants to go that way, maybe we can add an option at the interface level, 
options:is_pmd=true, but I'm not sure whether is_pmd can be set when adding an interface.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: March 19, 2020 5:42
To: Yi Yang (杨燚)-云服务集团 
Cc: i.maxim...@ovn.org; yang_y...@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] Re: [PATCH v7] Use TPACKET_V3 to accelerate veth for 
userspace datapath

On Wed, Mar 18, 2020 at 6:22 AM Yi Yang (杨燚)-云服务集团  wrote:
>
> Ilya, raw socket for the interface type of which is "system" has been 
> set to non-block mode, can you explain which syscall will lead to 
> sleep? Yes, pmd thread will consume CPU resource even if it has 
> nothing to do, but all the type=dpdk ports are handled by pmd thread, 
> here we just let system interfaces look like a DPDK interface. I 
> didn't see any problem in my test, it will be better if you can tell 
> me what will result in a problem and how I can reproduce it. By the 
> way, type=tap/internal interfaces are still be handled by ovs-vswitchd thread.
>
> In addition, only one line change is there, ".is_pmd = true,", 
> ".is_pmd = false," will keep it in ovs-vswitchd if there is any other 
> concern. We can change non-thread-safe parts to support pmd.
>

Hi Yi Yang and Ilya,

How about making tpacket_v3 a new netdev class with type="tpacket"?
Like my original patch:
https://mail.openvswitch.org/pipermail/ovs-dev/2019-December/366229.html

Users have to create it specifically by doing type="tpacket", ex:
  $ ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="tpacket"
And we can set is_pmd=true for this particular type.

Regards
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-03-17 Thread
Ok, I will send out v7 with these changes.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: March 18, 2020 11:56
To: Yi Yang (杨燚)-云服务集团 
Cc: i.maxim...@ovn.org; b...@ovn.org; yang_y...@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for 
userspace datapath

On Tue, Mar 17, 2020 at 7:00 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> By the way, with tpacket_v3, zero copy optimization and is_pmd=true, the 
> performance is much better, 3.77Gbps, (3.77-1.34)/1.34 = 1.81 , i.e. 181% 
> improvement, here is the performance data.
>
Can you send out the tpacket_v3 patch together with these optimizations to the 
mailing list?
Thanks
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-03-17 Thread
By the way, with tpacket_v3, zero copy optimization and is_pmd=true, the 
performance is much better, 3.77Gbps, (3.77-1.34)/1.34 = 1.81 , i.e. 181% 
improvement, here is the performance data.

is_pmd = true
=
eipadmin@eip01:~$ sudo ./run-iperf3.sh
Connecting to host 10.15.1.3, port 5201
[  4] local 10.15.1.2 port 43210 connected to 10.15.1.3 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-10.00  sec  4.34 GBytes  3.73 Gbits/sec0   3.03 MBytes
[  4]  10.00-20.00  sec  4.40 GBytes  3.78 Gbits/sec0   3.03 MBytes
[  4]  20.00-30.00  sec  4.40 GBytes  3.78 Gbits/sec0   3.03 MBytes
[  4]  30.00-40.00  sec  4.40 GBytes  3.78 Gbits/sec0   3.03 MBytes
[  4]  40.00-50.00  sec  4.40 GBytes  3.78 Gbits/sec0   3.03 MBytes
[  4]  50.00-60.00  sec  4.40 GBytes  3.78 Gbits/sec0   3.03 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-60.00  sec  26.3 GBytes  3.77 Gbits/sec0 sender
[  4]   0.00-60.00  sec  26.3 GBytes  3.77 Gbits/sec  receiver

Server output:
Accepted connection from 10.15.1.2, port 43208
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 43210
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-10.00  sec  4.32 GBytes  3.71 Gbits/sec
[  5]  10.00-20.00  sec  4.40 GBytes  3.78 Gbits/sec
[  5]  20.00-30.00  sec  4.40 GBytes  3.78 Gbits/sec
[  5]  30.00-40.00  sec  4.40 GBytes  3.78 Gbits/sec
[  5]  40.00-50.00  sec  4.40 GBytes  3.78 Gbits/sec
[  5]  50.00-60.00  sec  4.40 GBytes  3.78 Gbits/sec


iperf Done.
eipadmin@eip01:~$

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: March 17, 2020 22:58
To: Yi Yang (杨燚)-云服务集团 
Cc: i.maxim...@ovn.org; b...@ovn.org; yang_y...@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for 
userspace datapath

On Tue, Mar 17, 2020 at 2:08 AM Yi Yang (杨燚)-云服务集团  wrote:
>
> Hi, William
>
> Finally, my highend server is available and so I can do performance 
> comparison again, tpacket_v3 obviously has big performance improvement, here 
> is my data. By the way, in order to get stable performance data, please use 
> taskset to pin ovs-vswitchd to a physical core (you shouldn't schedule other 
> task to its logical sibling core for stable performance data), iperf3 client 
> an iperf3 use different cores, for my case, ovs-vswitchd is pinned to core 1, 
> iperf3 server is pinned to core 4, iperf3 client is pinned to core 5.
>
> According to my test, tpacket_v3 can get about 55% improvement (from 1.34 to 
> 2.08,  (2.08-1.34)/1.34 = 0.55) , with my further optimization (use zero copy 
> for receive side), it can have more improvement (from 1.34 to 2.21, 
> (2.21-1.34)/1.34 = 0.65), so I still think performance improvement is big, 
> please reconsider it again.
>

That's great improvement.
What is your optimization "zero copy for receive side"?
Does it include in the patch?

Regards
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-03-17 Thread
William, are you trying my patch for zero copy? I can send it to you to try on 
your platform. Per your af_xdp change, I found that dp_packet can use a 
pre-allocated buffer, so I used that approach: because tpacket_v3 has already set 
up the rx ring, dp_packet can directly use those rx ring buffers.
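
For reference, a minimal sketch of that zero-copy idea, assuming the TPACKET_V3
frame stays valid until its block is returned to the kernel (the helper name is
hypothetical):

/* Sketch only: wrap a frame from the mmap'ed TPACKET_V3 rx ring in a
 * dp_packet without copying.  dp_packet_use_const() records the buffer
 * and size but does not take ownership, so the ring block must not be
 * retired until 'pkt' has been processed. */
static void
tpacket_frame_to_dp_packet(struct dp_packet *pkt, const void *frame,
                           uint32_t len)
{
    dp_packet_use_const(pkt, frame, len);
}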

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: March 17, 2020 22:58
To: Yi Yang (杨燚)-云服务集团 
Cc: i.maxim...@ovn.org; b...@ovn.org; yang_y...@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for 
userspace datapath

On Tue, Mar 17, 2020 at 2:08 AM Yi Yang (杨燚)-云服务集团  wrote:
>
> Hi, William
>
> Finally, my highend server is available and so I can do performance 
> comparison again, tpacket_v3 obviously has big performance improvement, here 
> is my data. By the way, in order to get stable performance data, please use 
> taskset to pin ovs-vswitchd to a physical core (you shouldn't schedule other 
> task to its logical sibling core for stable performance data), iperf3 client 
> an iperf3 use different cores, for my case, ovs-vswitchd is pinned to core 1, 
> iperf3 server is pinned to core 4, iperf3 client is pinned to core 5.
>
> According to my test, tpacket_v3 can get about 55% improvement (from 1.34 to 
> 2.08,  (2.08-1.34)/1.34 = 0.55) , with my further optimization (use zero copy 
> for receive side), it can have more improvement (from 1.34 to 2.21, 
> (2.21-1.34)/1.34 = 0.65), so I still think performance improvement is big, 
> please reconsider it again.
>

That's great improvement.
What is your optimization "zero copy for receive side"?
Does it include in the patch?

Regards
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-03-17 Thread
[  4]   0.00-60.00  sec  15.4 GBytes  2.21 Gbits/sec0 sender
[  4]   0.00-60.00  sec  15.4 GBytes  2.21 Gbits/sec  receiver

Server output:
Accepted connection from 10.15.1.2, port 43180
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 43182
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-10.00  sec  2.53 GBytes  2.17 Gbits/sec
[  5]  10.00-20.00  sec  2.58 GBytes  2.22 Gbits/sec
[  5]  20.00-30.00  sec  2.58 GBytes  2.22 Gbits/sec
[  5]  30.00-40.00  sec  2.59 GBytes  2.22 Gbits/sec
[  5]  40.00-50.00  sec  2.57 GBytes  2.21 Gbits/sec
[  5]  50.00-60.00  sec  2.57 GBytes  2.21 Gbits/sec


iperf Done.
eipadmin@eip01:~$

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: March 14, 2020 22:18
To: Yi Yang (杨燚)-云服务集团 
Cc: i.maxim...@ovn.org; b...@ovn.org; yang_y...@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for 
userspace datapath

On Fri, Mar 13, 2020 at 9:45 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> Io_uring is a feature brought in by Linux kernel 5.1, so it can't be 
> used on Linux system with kernel version < 5.1. tpacket_v3 is only one 
> way to avoid system call on almost all the Linux kernel versions, it 
> is unique from this perspective. Maybe you will miss it if someone 
> fixes kernel side issue :-)
>
> In addition, according to what Flavio said, TSO can't support VXLAN 
> currently, but in most cloud scenarios, VXLAN is only one choice, so for such 
> cases, TSO can be ignored.
>
> My point is we can provide one option for such use cases, once kernel side 
> issue is fixed, all the Linux distributions can apply this fix, users can get 
> immediate benefits without change. So maybe adding a switch 
> userspace-use-tpacket-v3 in other-config (set to False by default) is an 
> acceptable way to handle this.
>

The tpacket_v3 patch now shows very little performance improvement.
So there is little incentive to merge and maintain this code.
Do you know whether tpacket_v3 will show a bigger performance improvement once the 
kernel side is fixed?

Or another way is to study io_uring and compare its performance with tpacket_v3.

William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-03-15 Thread
All the definitions/macros have been in include/linux/if_packet.h since 3.10.0, 
so that case will not exist.

-----Original Message-----
From: Ben Pfaff [mailto:b...@ovn.org] 
Sent: March 15, 2020 4:04
To: Yi Yang (杨燚)-云服务集团 
Cc: u9012...@gmail.com; yang_y...@163.com; ovs-dev@openvswitch.org
Subject: Re: Re: Re: [ovs-dev] [PATCH v6] Use TPACKET_V3 to accelerate veth for 
userspace datapath

There might still be a misunderstanding.

There can be a difference between the kernel that OVS runs on (version
A) and the kernel headers against which it is built (version B).  Often, the 
latter are supplied by the distribution and they are not usually kept as up to 
date, so B < A is common.

I don't know whether this is likely to be a problem in this particular case.

On Sat, Mar 14, 2020 at 03:35:46AM +, Yi Yang (杨燚)-云服务集团 wrote:
> Got it, then we can safely remove inclue/linux/if_packet.h in ovs 
> because the minimal Linux version OVS supports has supported 
> tpacket_v3. Thanks Ben for clarification.
> 
> -----Original Message-----
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: March 13, 2020 23:57
> To: Yi Yang (杨燚)-云服务集团 
> Cc: u9012...@gmail.com; yang_y...@163.com; ovs-dev@openvswitch.org
> Subject: Re: Re: [ovs-dev] [PATCH v6] Use TPACKET_V3 to accelerate veth for 
> userspace datapath
> 
> On Fri, Mar 13, 2020 at 01:04:07AM +, Yi Yang (杨燚)-云服务集团 wrote:
> > Per my understanding, Ben meant a build system (which isn't Linux 
> > probably, it doesn't have include/linux/if_packet.h) should be able 
> > to build tpacket_v3 code in order that built-out binary can work on 
> > Linux system with tpacket_v3 feature, this is Ben's point, that is 
> > why he wanted me to add include/linux/if_packet.h in ovs repo.
> > 
> > Ben, can you help double confirm if include/linux/if_packet.h in ovs 
> > is necessary?
> 
> I think my meaning was misunderstood.  Linux always has if_packet.h.
> Only recent enough Linux has TPACKET_V3 in if_packet.h.  If the system is 
> Linux but the TPACKET_V3 types and constants are not defined in if_packet.h, 
> then the build system should define them.


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-03-13 Thread
io_uring is a feature introduced in Linux kernel 5.1, so it can't be used on 
Linux systems with kernel version < 5.1. tpacket_v3 is the only way to avoid 
system calls on almost all the Linux kernel versions, so it is unique from this 
perspective. Maybe you will miss it if someone fixes the kernel-side issue :-)

In addition, according to what Flavio said, TSO can't support VXLAN currently, 
but in most cloud scenarios VXLAN is the only choice, so for such cases TSO can 
be ignored.

My point is that we can provide an option for such use cases; once the kernel-side 
issue is fixed, all the Linux distributions can apply the fix and users get 
immediate benefits without any change. So maybe adding a switch, 
userspace-use-tpacket-v3, in other_config (set to false by default) is an 
acceptable way to handle this.
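
For illustration, a minimal sketch of how such a knob might be read, assuming it
is exposed through the usual Open_vSwitch other_config column (the option name
and the helper below are hypothetical; smap_get_bool() is the existing OVS helper):

/* Sketch only: read a hypothetical other_config:userspace-use-tpacket-v3
 * switch, defaulting to false so existing setups keep the recvmmsg path. */
static bool
userspace_tpacket_v3_enabled(const struct smap *other_config)
{
    return smap_get_bool(other_config, "userspace-use-tpacket-v3", false);
}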

-----Original Message-----
From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Ilya Maximets
Sent: March 14, 2020 0:48
To: William Tu ; Ben Pfaff 
Cc: yang_y...@163.com; ovs-dev@openvswitch.org; i.maxim...@ovn.org
Subject: Re: [ovs-dev] Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for 
userspace datapath

On 3/13/20 5:22 PM, William Tu wrote:
> On Fri, Mar 13, 2020 at 8:57 AM Ben Pfaff  wrote:
>>
>> On Fri, Mar 13, 2020 at 01:04:07AM +, Yi Yang (杨燚)-云服务集团 wrote:
>>> Per my understanding, Ben meant a build system (which isn't Linux 
>>> probably, it doesn't have include/linux/if_packet.h) should be able 
>>> to build tpacket_v3 code in order that built-out binary can work on 
>>> Linux system with tpacket_v3 feature, this is Ben's point, that is 
>>> why he wanted me to add include/linux/if_packet.h in ovs repo.
>>>
>>> Ben, can you help double confirm if include/linux/if_packet.h in ovs 
>>> is necessary?
>>
>> I think my meaning was misunderstood.  Linux always has if_packet.h.
>> Only recent enough Linux has TPACKET_V3 in if_packet.h.  If the 
>> system is Linux but the TPACKET_V3 types and constants are not 
>> defined in if_packet.h, then the build system should define them.
> 
> Thanks!
> 
> My suggestion is that if the system is Linux but the TPACKET_V3 types 
> and constants are not defined in if_packet.h, then just skip using
> TPACKET_V3 and
> use the current recvmmsg approach.  Because when we start  TPACKET_V3 
> patch, the af_packet on veth performance is about 200Mbps, so 
> tpacket_v3 has huge performance benefits.
> 
> With YiYang's patch
> "Use batch process recv for tap and raw socket in netdev datapath"
> the af_packet on veth improves to 1.47Gbps. And tpacket_v3 shows 
> similar or 7% better performance. So there isn't a huge benefits now.

With such a small performance benefit does it make sense to have these 700 
lines of code that is so hard to read and maintain?

Another point is that hopefully segmentation offloading in userspace datapath 
will evolve so we could enable it by default and all this code will become 
almost useless.

If you're looking for poll mode/async -like solutions we could try and check 
io_uring way for calling same recvmsg/sendmsg.  That might have more benefits 
and it will support all the functionality supported by these calls.  Even 
better, we could also make io_uring support as an internal library and reuse it 
for other OVS subsystems like making async poll/timers/logging/etc in the 
future.
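
For reference, a minimal liburing-based sketch of that recvmsg idea (a standalone
illustration under the assumption that liburing is available; it is not OVS code,
and 'fd' is just a raw socket):

#include <liburing.h>
#include <sys/socket.h>

/* Sketch only: submit one asynchronous recvmsg() on 'fd' through io_uring
 * and wait for its completion.  Error handling is intentionally minimal. */
static ssize_t
uring_recvmsg_once(int fd, struct msghdr *msg)
{
    struct io_uring ring;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    ssize_t n = -1;

    if (io_uring_queue_init(32, &ring, 0) < 0) {
        return -1;
    }
    sqe = io_uring_get_sqe(&ring);
    io_uring_prep_recvmsg(sqe, fd, msg, 0);
    io_uring_submit(&ring);
    if (io_uring_wait_cqe(&ring, &cqe) == 0) {
        n = cqe->res;               /* Bytes received, or -errno. */
        io_uring_cqe_seen(&ring, cqe);
    }
    io_uring_queue_exit(&ring);
    return n;
}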

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-03-13 Thread
Got it, then we can safely remove include/linux/if_packet.h in ovs because the 
minimal Linux version OVS supports already supports tpacket_v3. Thanks Ben for 
the clarification.

-----Original Message-----
From: Ben Pfaff [mailto:b...@ovn.org] 
Sent: March 13, 2020 23:57
To: Yi Yang (杨燚)-云服务集团 
Cc: u9012...@gmail.com; yang_y...@163.com; ovs-dev@openvswitch.org
Subject: Re: Re: [ovs-dev] [PATCH v6] Use TPACKET_V3 to accelerate veth for 
userspace datapath

On Fri, Mar 13, 2020 at 01:04:07AM +, Yi Yang (杨燚)-云服务集团 wrote:
> Per my understanding, Ben meant a build system (which isn't Linux 
> probably, it doesn't have include/linux/if_packet.h) should be able to 
> build tpacket_v3 code in order that built-out binary can work on Linux 
> system with tpacket_v3 feature, this is Ben's point, that is why he 
> wanted me to add include/linux/if_packet.h in ovs repo.
> 
> Ben, can you help double confirm if include/linux/if_packet.h in ovs 
> is necessary?

I think my meaning was misunderstood.  Linux always has if_packet.h.
Only recent enough Linux has TPACKET_V3 in if_packet.h.  If the system is Linux 
but the TPACKET_V3 types and constants are not defined in if_packet.h, then the 
build system should define them.
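
As an illustration of that approach, a hedged sketch of the kind of fallback
declaration OVS could carry, assuming configure detects the missing type (the
HAVE_STRUCT_TPACKET_REQ3 guard is hypothetical; the field layout follows the
kernel UAPI):

/* Sketch only: provide struct tpacket_req3 when the system if_packet.h
 * predates TPACKET_V3. */
#ifndef HAVE_STRUCT_TPACKET_REQ3
struct tpacket_req3 {
    unsigned int tp_block_size;      /* Minimal size of contiguous block. */
    unsigned int tp_block_nr;        /* Number of blocks. */
    unsigned int tp_frame_size;      /* Size of frame. */
    unsigned int tp_frame_nr;        /* Total number of frames. */
    unsigned int tp_retire_blk_tov;  /* Timeout in msecs. */
    unsigned int tp_sizeof_priv;     /* Offset to private data area. */
    unsigned int tp_feature_req_word;
};
#endif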
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-03-12 Thread
Hi, Ben, can you help double confirm if include/linux/if_packet.h in ovs is 
necessary?

William, let me paste Ben's comments here. The original comments link is 
https://mail.openvswitch.org/pipermail/ovs-dev/2020-January/367085.html.

In addition, given that tpacket_v3 isn't performing well currently, I agree we 
needn't use it as the default implementation. I think we can add an 
other_config:userspace-use-tpacket-v3=true|false option and set it to false by 
default, what do you think about it? I will change it this way in the next version if yes.

"""
Thanks for the patch!

I am a bit concerned about version compatibility issues here.  There are
two relevant kinds of versions.  The first is the version of the
kernel/library headers.  This patch works pretty hard to adapt to the
headers that are available at compile time, only dealing with the
versions of the protocols that are available from the headers.  This
approach is sometimes fine, but an approach can be better is to simply
declare the structures or constants that the headers lack.  This is
often pretty easy for Linux data structures.  OVS does this for some
structures that it cares about with the headers in ovs/include/linux.
This approach has two advantages: the OVS code (outside these special
declarations) doesn't have to care whether particular structures are
declared, because they are always declared, and the OVS build always
supports a particular feature regardless of the headers of the system on
which it was built.

The second kind of version is the version of the system that OVS runs
on.  Unless a given feature is one that is supported by every version
that OVS cares about, OVS needs to test at runtime whether the feature
is supported and, if not, fall back to the older feature.  I don't see
that in this code.  Instead, it looks to me like it assumes that if the
feature was available at build time, then it is available at runtime.
This is not a good way to do things, since we want people to be able to
get builds from distributors such as Red Hat or Debian and then run
those builds on a diverse collection of kernels.

One specific comment I have here is that, in acinclude.m4, it would be
better to use AC_CHECK_TYPE or AC_CHECK_TYPES thatn OVS_GREP_IFELSE.
The latter is for testing for kernel builds only; we can't use the
normal AC_* tests for those because we often can't successfully build
kernel headers using the compiler and flags that Autoconf sets up for
building OVS.

Thanks,
"""

Per my understanding, Ben meant a build system (which isn't Linux probably, it 
doesn't have include/linux/if_packet.h) should be able to build tpacket_v3 code 
in order that built-out binary can work on Linux system with tpacket_v3 
feature, this is Ben's point, that is why he wanted me to add 
include/linux/if_packet.h in ovs repo.

Ben, can you help double confirm if include/linux/if_packet.h in ovs is 
necessary?

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: March 13, 2020 2:35
To: Yi Yang (杨燚)-云服务集团 
Cc: yang_y...@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace 
datapath

On Wed, Mar 11, 2020 at 6:14 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> > >
> > > TPACKET_V3 can support TSO, but its performance isn't good because 
> > > of
> > > TPACKET_V3 kernel implementation issue, so it falls back to
> >
> > What's the implementation issue? If we use latest kernel, does the 
> > issue still exist?
> >
> > [Yi Yang] Per my check, the issue is the kernel can't feed enough 
> > packets to tpacket_recv, so in many cases, no packets received, no 
> > 32 packets available, but for original non-tpacket case, one recv 
> > will get 32 packets in most cases, throughput is about more than 
> > twice for veth, for tap case, it is more than three times, I read 
> > kernel source code, but I can't find root cause, I'll check from tpacket 
> > maintainer.
> >
> > > recvmmsg in case userspace-tso-enable is set to true, but its 
> > > performance is better than recvmmsg in case userspace-tso-enable 
> > > is set to false, so just use TPACKET_V3 in that case.
> > >
> > > Signed-off-by: Yi Yang 
> > > Co-authored-by: William Tu 
> > > Signed-off-by: William Tu 
> > > ---
> > > diff --git a/include/linux/if_packet.h b/include/linux/if_packet.h 
> > > new file mode 100644 index 000..e20aacc
> > > --- /dev/null
> > > +++ b/include/linux/if_packet.h
> >
> > if OVS_CHECK_LINUX_TPACKET returns false, can we simply fall back to 
> > recvmmsg?
> > So this is not needed?
> >
> > [Yi Yang] As you said, ovs support Linux kernel 3.10.0 or above, so 
> > no that case existing, isn't it?
>
> I mean if kernel supports it AND if_packet.h header exi

[ovs-dev] Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-03-11 Thread
Thanks William, replies inline.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: March 12, 2020 1:51
To: Yi Yang (杨燚)-云服务集团 
Cc: yang_y...@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace 
datapath

On Tue, Mar 10, 2020 at 7:42 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> Hi, William
>
> I'll fix some your concerns in next ver, please check other inline replies.
>
> -----Original Message-----
> From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of William Tu
> Sent: March 11, 2020 3:43
> To: yang_y_yi 
> Cc: ovs-dev 
> Subject: Re: [ovs-dev] [PATCH v6] Use TPACKET_V3 to accelerate veth for 
> userspace datapath
>
> On Fri, Mar 6, 2020 at 6:35 AM  wrote:
> >
> > From: Yi Yang 
> >
> > We can avoid high system call overhead by using TPACKET_V3 and using 
> > DPDK-like poll to receive and send packets (Note: send still needs 
> > to call sendto to trigger final packet transmission).
> >
> > From Linux kernel 3.10 on, TPACKET_V3 has been supported, so all the 
> > Linux kernels current OVS supports can run
> > TPACKET_V3 without any problem.
> >
> > I can see about 30% performance improvement for veth compared to 
> > last recvmmsg optimization if I use TPACKET_V3, it is about 1.98 
> > Gbps, but it was 1.47 Gbps before.
>
> On my testbed, I didn't see any performance gain.
> For a 100 sec TCP iperf3, I see with/without tpacket show the same 1.70Gbps.
> Do you think if we set .is_pmd=true, the performance might be better 
> because tpacket is ring-based?
>
> [Yi Yang] Please make sure userspace-tso-enabled is set to false for 
> your test, if it is true, tpacket_v3 isn't used.
>
> Please use physical machines, it isn't so noticeable if you use it 
> inside VMs. Here is my data for your reference ( I used a 5.5.7 
> kernel, but it is not relevant to kernel version basically).
>
> My physical machine is a low end server, so performance improvement 
> isn't so obvious. But a big improvement is retr value is almost 0. To 
> set is_pmd to true and use dpdk buffer is my next step to improve 
> performance further. I also have a tpacket_v3 patch for tap in hand. 
> In my previous physical server, improvement is very obvious. My goal 
> is about 4Gbps, it is 3.9Gbps in my previous physical server with 
> is_pmd set to true and use dpdk buffer for dp_packet.

With the current patch, is_pmd is always false.
How do you set is_pmd to true?

[Yi Yang] I have patches in my hand to do this, my goal is to use pmd thread to 
handle such case, it is more scalable than ovs-vswitchd, currently, only one 
ovs-vswitchd is handling all the such interfaces, I don't think it is an 
efficient way for the use cases which pursue performance.


>
> No tpacket_v3
> =
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval   Transfer Bandwidth   Retr
> [  4]   0.00-60.00  sec  7.90 GBytes  1.13 Gbits/sec  39672 sender
> [  4]   0.00-60.00  sec  7.90 GBytes  1.13 Gbits/sec  receiver
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval   Transfer Bandwidth
> [  5]   0.00-60.00  sec  0.00 Bytes  0.00 bits/sec  sender
> [  5]   0.00-60.00  sec  7.90 GBytes  1.13 Gbits/sec  receiver
>

>
> iperf Done.
> [yangyi@localhost ovs-master]$ uname -a Linux localhost.localdomain 
> 5.5.7-1.el7.elrepo.x86_64 #1 SMP Fri Feb 28
> 12:21:58 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
> tpacket_v3
> ==

> [ ID] Interval   Transfer Bandwidth
> [  5]   0.00-60.02  sec  0.00 Bytes  0.00 bits/sec  sender
> [  5]   0.00-60.02  sec  8.39 GBytes  1.20 Gbits/sec  receiver
>

So your current result is
no tpacket 1.13G (with some retransmission) with tpacket 1.20G (zero 
retransmission) This is around 7% improvement.

[Yi Yang] It is so from this test result, but on my high-end server, I did see 
higher improvement, but I can't use it now, will recheck this once it is 
available.


>
>
>
> >
> > TPACKET_V3 can support TSO, but its performance isn't good because 
> > of
> > TPACKET_V3 kernel implementation issue, so it falls back to
>
> What's the implementation issue? If we use latest kernel, does the 
> issue still exist?
>
> [Yi Yang] Per my check, the issue is the kernel can't feed enough 
> packets to tpacket_recv, so in many cases, no packets received, no 32 
> packets available, but for original non-tpacket case, one recv will 
> get 32 packets in most cases, throughput is about more than twice for 
> veth, for tap case, it is more than three times, I read kernel source 
> code, but I can't find root cause, I'll check from tpacket ma

[ovs-dev] Re: [PATCH v5] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-03-06 Thread
Hi, William

After I checked some doubtful points, I think the performance issue lies in the 
tpacket_v3 kernel-side implementation, so I fall back to the original code path in 
case userspace-tso-enabled is set to true. V6 has been sent out, please review v6.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: February 27, 2020 23:30
To: Yi Yang (杨燚)-云服务集团 
Cc: yang_y...@126.com; yang_y...@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] [PATCH v5] Use TPACKET_V3 to accelerate veth for userspace 
datapath

On Tue, Feb 25, 2020 at 5:41 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> In the same environment, but I used tap but not veth, retr number is 0 
> for the case without this patch (of course, I applied Flavio's tap 
> enable patch)
>

Right, because tap does not use the tpacket_v3 mmap packet, so it works fine.

> vagrant@ubuntu1804:~$ sudo ./run-iperf3.sh Connecting to host 
> 10.15.1.3, port 5201 [  4] local 10.15.1.2 port 54572 connected to 
> 10.15.1.3 port 5201
> [ ID] Interval   Transfer Bandwidth   Retr  Cwnd
> [  4]   0.00-10.00  sec  12.6 GBytes  10.9 Gbits/sec0   3.14 MBytes
> [  4]  10.00-20.00  sec  12.8 GBytes  11.0 Gbits/sec0   3.14 MBytes
> [  4]  20.00-30.00  sec  10.2 GBytes  8.76 Gbits/sec0   3.14 MBytes
> [  4]  30.00-40.00  sec  10.0 GBytes  8.63 Gbits/sec0   3.14 MBytes
> [  4]  40.00-50.00  sec  10.4 GBytes  8.94 Gbits/sec0   3.14 MBytes
> [  4]  50.00-60.00  sec  10.8 GBytes  9.31 Gbits/sec0   3.14 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval   Transfer Bandwidth   Retr
> [  4]   0.00-60.00  sec  67.0 GBytes  9.59 Gbits/sec0 sender
> [  4]   0.00-60.00  sec  67.0 GBytes  9.59 Gbits/sec
> receiver
>

> >
> > I can see about 30% performance improvement for veth compared to 
> > last recvmmsg optimization if I use TPACKET_V3, it is about 1.98 
> > Gbps, but it was 1.47 Gbps before.
> >
> > TPACKET_V3 can support TSO, it can work only if your kernel can 
> > support, this has been verified on Ubuntu 18.04 5.3.0-40-generic , 
> > if you find the performance is very poor, please turn off tso for 
> > veth interfces in case userspace-tso-enable is set to true.
>
> Do you test the performance of enabling TSO?
>
> Using veth (like your run-iperf3.sh) and with kernel 5.3.
> Without your patch, with TSO enabled, I can get around 6Gbps But with 
> this patch, with TSO enabled, the performance drops to 1.9Gbps.
>

Are you investigating this issue?
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: [PATCH] userspace-tso: Document the minimum kernel version.

2020-03-02 Thread
I totally agree with Flavio; the OVS documentation is the best place to tell users 
where the issue is and how it should be fixed. Maybe the statement is not entirely 
accurate for distribution kernels, but it is indeed very valuable information for 
end users, and the new statement obviously addresses Ilya's concerns.

-----Original Message-----
From: Flavio Leitner [mailto:f...@sysclose.org] 
Sent: March 3, 2020 3:52
To: Ilya Maximets 
Cc: d...@openvswitch.org; Yi Yang (杨燚)-云服务集团 
Subject: Re: [ovs-dev] [PATCH] userspace-tso: Document the minimum kernel version.

On Mon, Mar 02, 2020 at 01:11:35PM +0100, Ilya Maximets wrote:
> On 2/28/20 11:58 PM, Flavio Leitner wrote:
> > The kernel needs to be at least 4.19-rc7 to include the commit
> > 9d2f67e43b73 ("net/packet: fix packet drop as of virtio gso") 
> > otherwise the TSO packets are dropped when using raw sockets.
> > 
> > Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload 
> > support")
> > Reported-by: Yi Yang (杨燚)-云服务集团 
> > Signed-off-by: Flavio Leitner 
> > ---
> >  Documentation/topics/userspace-tso.rst | 4 
> >  1 file changed, 4 insertions(+)
> > 
> > This applies to branch-2.13 as well.
> > 
> > diff --git a/Documentation/topics/userspace-tso.rst 
> > b/Documentation/topics/userspace-tso.rst
> > index 0fbac93a5..da370e64f 100644
> > --- a/Documentation/topics/userspace-tso.rst
> > +++ b/Documentation/topics/userspace-tso.rst
> > @@ -113,6 +113,10 @@ __ https://patches.dpdk.org/patch/64136/
> >  This fix is expected to be included in the 19.11.1 release. When 
> > OVS migrates  to this DPDK release, this limitation can be removed.
> >  
> > +All kernel devices that use the raw socket interface (veth, for 
> > +example) require a kernel with minimum version of 4.19-rc7 to include the 
> > commit:
> > +9d2f67e43b73 ("net/packet: fix packet drop as of virtio gso").
> 
> I'm not very happy with this kind of documentation updates.  The main 
> reason is that every distribution uses their own custom kernel with 
> some patches backported or not.  In this particular case we seem to 
> have issue with particular ubuntu kernel that happened to not have this bug 
> fix backported.
> If someone uses non-lognterm upstream kernel that is out of its 
> support lifetime it's his/her responsibility to backport bugfixes.
> Upstream LTS kernel 4.14 has this bugfix backported and there should 
> be no issues. Upstream LTS 4.9 might not have this issue at all and 
> work fine (I didn't check).
> 
> What we can document is that particular kernel version from the Ubuntu 
> 16.04 distribution has an issue and should not be used along with 
> userspace-tso.
> Ideally, someone should file a bug for ubuntu kernel maintainers, so 
> they will backport relevant patch and update their kernel since it's 
> still in a supported state.  In this case we will have no need to document 
> anything.
> 
> What do you think?

I think it's worth it for Ubuntu users to open a bug, so that this gets solved.

But I think the documentation is still valid. Yi and I spent time to find the 
root cause and it is something out of OvS control, so the best we can do is 
document the requirement.

Maybe we can word it differently so that the kernel version becomes less 
relevant. For example:

"All kernel devices that use the raw socket interface (veth, for example) 
require the kernel commit 9d2f67e43b73 ("net/packet: fix packet drop as of 
virtio gso") in order to work properly. This commit was merged in upstream 
kernel 4.19-rc7, so make sure your kernel is either newer or contains the 
backport."

How does that sound to you now?

Thanks,
--
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: Re: [PATCH v4 0/3] Add support for TSO with DPDK

2020-02-29 Thread
Flavio, got it, thanks a lot. By the way, regarding the excessive-retransmissions 
issue for veth, I did further investigation and I suspect it is also a veth-related 
bug on the kernel side; maybe you Red Hat kernel folks can help fix it. The tap 
interface doesn't have this issue and has much higher performance than the veth 
interface.

-----Original Message-----
From: Flavio Leitner [mailto:f...@sysclose.org] 
Sent: February 29, 2020 1:56
To: Yi Yang (杨燚)-云服务集团 
Cc: pkusunyif...@gmail.com; d...@openvswitch.org; i.maxim...@ovn.org; 
txfh2...@aliyun.com
Subject: Re: Re: Re: [ovs-dev] [PATCH v4 0/3] Add support for TSO with DPDK


Hi Yi Yang,

This is the bug fix required to make veth TSO work in OvS:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9d2f67e43b73e8af7438be219b66a5de0cfa8bd9

commit 9d2f67e43b73e8af7438be219b66a5de0cfa8bd9
Author: Jianfeng Tan 
Date:   Sat Sep 29 15:41:27 2018 +

net/packet: fix packet drop as of virtio gso

When we use raw socket as the vhost backend, a packet from virito with
gso offloading information, cannot be sent out in later validaton at
xmit path, as we did not set correct skb->protocol which is further used
for looking up the gso function.

To fix this, we set this field according to virito hdr information.

Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and 
skb GSO conversion")
Signed-off-by: Jianfeng Tan 
Signed-off-by: David S. Miller 


So, the minimum kernel version is 4.19.

fbl
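
As background for the commit described above, the gist of the fix is to derive the L3 protocol from the virtio-net header's GSO type when the raw-socket sender did not set it, so the later GSO lookup in the xmit path can succeed. The kernel does this with a small helper in its virtio-net header handling; the standalone function below is only an illustration of that mapping, not the kernel code, and the function name is made up here.

#include <linux/if_ether.h>
#include <linux/virtio_net.h>
#include <stdint.h>
#include <stdio.h>

/* Returns the EtherType implied by the virtio-net GSO type, or 0 if nothing
 * can be derived; this mirrors the idea of the fix, not the literal diff. */
static uint16_t gso_type_to_eth_proto(const struct virtio_net_hdr *vnet)
{
    switch (vnet->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
    case VIRTIO_NET_HDR_GSO_TCPV4:
    case VIRTIO_NET_HDR_GSO_UDP:
        return ETH_P_IP;        /* IPv4 TSO/UFO */
    case VIRTIO_NET_HDR_GSO_TCPV6:
        return ETH_P_IPV6;      /* IPv6 TSO */
    default:
        return 0;               /* no GSO type, nothing to derive */
    }
}

int main(void)
{
    struct virtio_net_hdr hdr = { .gso_type = VIRTIO_NET_HDR_GSO_TCPV4 };

    printf("derived EtherType: 0x%04x\n", gso_type_to_eth_proto(&hdr));
    return 0;
}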

On Sun, Feb 23, 2020 at 08:45:16AM +0000, Yi Yang (杨燚)-云服务集团 wrote:
> Hi, Flavio
> 
> Just let you know, your TSO support patch does need higher kernel version, it 
> will be great if you can add document to tell users which kernel version is 
> minimal requirement. I can confirm it can work after I used Ubuntu 18.04 and 
> use kernel 5.3.0-40-generic.
> 
> vagrant@ubuntu1804:~$ uname -a
> Linux ubuntu1804 5.3.0-40-generic #32~18.04.1-Ubuntu SMP Mon Feb 3 
> 14:05:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux vagrant@ubuntu1804:~$
> 
> By the way, TPACKET_V3 also can support TSO without any new change needed, so 
> I think TPACKET_V3 can work normally with userspace-tso-enable=true as long 
> as your TSO patch can work normally for veth-to-veth case.
> 
> I'll send out my tpacket patch v5 for review.
> 
> -----Original Message-----
> From: Yi Yang (杨燚)-云服务集团
> Sent: February 23, 2020 11:05
> To: 'f...@sysclose.org' 
> Cc: 'pkusunyif...@gmail.com' ; 
> 'd...@openvswitch.org' ; 'i.maxim...@ovn.org' 
> ; 'txfh2...@aliyun.com' 
> Subject: Re: Re: [ovs-dev] [PATCH v4 0/3] Add support for TSO with DPDK
> Importance: High
> 
> Hi, Flavio
> 
> After I ran it repeatedly in different servers, I'm very sure it can't work 
> on Ubuntu 16.04 kernel 4.15.0-55-generic and Upstream kernel 4.15.9, so can 
> you tell me your kernel version when you ran run-iperf3.sh I provided ? I 
> doubt this TSO patch for veth needs higher kernel version.
> 
> -----Original Message-----
> From: Flavio Leitner [mailto:f...@sysclose.org]
> Sent: February 20, 2020 21:41
> To: Yi Yang (杨燚)-云服务集团 
> Cc: pkusunyif...@gmail.com; d...@openvswitch.org; i.maxim...@ovn.org; 
> txfh2...@aliyun.com
> Subject: Re: Re: [ovs-dev] [PATCH v4 0/3] Add support for TSO with DPDK
> 
> On Thu, Feb 20, 2020 at 10:10:36AM +, Yi Yang (杨 D)-云服务集团 wrote:
> > Hi, Flavio
> > 
> > I find this tso feature doesn't work normally on my Ubuntu 16.04, 
> > here is my result. My kernel version is
> > 
> > $ uname -a
> > Linux cmp008 4.15.0-55-generic #60~16.04.2-Ubuntu SMP Thu Jul 4
> > 09:03:09 UTC
> > 2019 x86_64 x86_64 x86_64 GNU/Linux
> > $
> 
> I tested with 4.15.0 upstream and it worked. Can you do the same?
> 
> > $ ./run-iperf3.sh
> > Connecting to host 10.15.1.3, port 5201 [  4] local 10.15.1.2 port
> > 56466 connected to 10.15.1.3 port 5201
> > [ ID] Interval   Transfer Bandwidth   Retr  Cwnd
> > [  4]   0.00-10.00  sec  7.05 MBytes  5.91 Mbits/sec  2212   5.66 KBytes
> > [  4]  10.00-20.00  sec  7.67 MBytes  6.44 Mbits/sec  2484   5.66 KBytes
> > [  4]  20.00-30.00  sec  7.77 MBytes  6.52 Mbits/sec  2500   5.66 KBytes
> > [  4]  30.00-40.00  sec  7.77 MBytes  6.52 Mbits/sec  2490   5.66 KBytes
> > [  4]  40.00-50.00  sec  7.76 MBytes  6.51 Mbits/sec  2500   5.66 KBytes
> > [  4]  50.00-60.00  sec  7.79 MBytes  6.54 Mbits/sec  2504   5.66 KBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval   Transfer Bandwidth   Retr
> > [  4]   0.00-60.00  sec  45.8 MBytes  6.40 Mbits/sec  14690
> > sender
> > [  4]   0.00-60.00  sec  45.7 MBytes  6.40 Mbits/sec
> > receiver
> 
> That looks like TSO packets are being dropped and the traffic is basically 
> TCP retransmissions 

[ovs-dev] Re: [PATCH v5] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-02-27 Thread
William, here I don't use my patch; I just showed you that tap is ok and veth is 
not ok. By capturing packets, I'm very sure the packets are truncated, and veth's 
packets are different from tap's: during bulk transfer the packet sizes should all 
be about 64K, but veth doesn't show that pattern (a 1514-byte packet follows a big 
packet), so I think the code for veth is wrong. Yes, I'm verifying it and will send 
out a patch to fix this issue once it is verified.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: February 27, 2020 23:30
To: Yi Yang (杨燚)-云服务集团 
Cc: yang_y...@126.com; yang_y...@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] [PATCH v5] Use TPACKET_V3 to accelerate veth for userspace 
datapath

On Tue, Feb 25, 2020 at 5:41 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> In the same environment, but I used tap but not veth, retr number is 0 
> for the case without this patch (of course, I applied Flavio's tap 
> enable patch)
>

Right, because tap does not use the tpacket_v3 mmap packet, so it works fine.

> vagrant@ubuntu1804:~$ sudo ./run-iperf3.sh Connecting to host 
> 10.15.1.3, port 5201 [  4] local 10.15.1.2 port 54572 connected to 
> 10.15.1.3 port 5201
> [ ID] Interval   Transfer Bandwidth   Retr  Cwnd
> [  4]   0.00-10.00  sec  12.6 GBytes  10.9 Gbits/sec0   3.14 MBytes
> [  4]  10.00-20.00  sec  12.8 GBytes  11.0 Gbits/sec0   3.14 MBytes
> [  4]  20.00-30.00  sec  10.2 GBytes  8.76 Gbits/sec0   3.14 MBytes
> [  4]  30.00-40.00  sec  10.0 GBytes  8.63 Gbits/sec0   3.14 MBytes
> [  4]  40.00-50.00  sec  10.4 GBytes  8.94 Gbits/sec0   3.14 MBytes
> [  4]  50.00-60.00  sec  10.8 GBytes  9.31 Gbits/sec0   3.14 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval   Transfer Bandwidth   Retr
> [  4]   0.00-60.00  sec  67.0 GBytes  9.59 Gbits/sec0 sender
> [  4]   0.00-60.00  sec  67.0 GBytes  9.59 Gbits/sec
> receiver
>

> >
> > I can see about 30% performance improvement for veth compared to 
> > last recvmmsg optimization if I use TPACKET_V3, it is about 1.98 
> > Gbps, but it was 1.47 Gbps before.
> >
> > TPACKET_V3 can support TSO, it can work only if your kernel can 
> > support, this has been verified on Ubuntu 18.04 5.3.0-40-generic , 
> > if you find the performance is very poor, please turn off tso for 
> > veth interfces in case userspace-tso-enable is set to true.
>
> Do you test the performance of enabling TSO?
>
> Using veth (like your run-iperf3.sh) and with kernel 5.3.
> Without your patch, with TSO enabled, I can get around 6Gbps But with 
> this patch, with TSO enabled, the performance drops to 1.9Gbps.
>

Are you investigating this issue?
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: [PATCH v4 0/3] Add support for TSO with DPDK

2020-02-23 Thread
Hi, Flavio

Just to let you know, your TSO support patch does need a newer kernel version; it 
would be great if you could add documentation telling users the minimum required 
kernel version. I can confirm it works after I switched to Ubuntu 18.04 with 
kernel 5.3.0-40-generic.

vagrant@ubuntu1804:~$ uname -a
Linux ubuntu1804 5.3.0-40-generic #32~18.04.1-Ubuntu SMP Mon Feb 3 14:05:59 UTC 
2020 x86_64 x86_64 x86_64 GNU/Linux
vagrant@ubuntu1804:~$

By the way, TPACKET_V3 can also support TSO without any new changes needed, so I 
think TPACKET_V3 can work normally with userspace-tso-enable=true as long as your 
TSO patch works normally for the veth-to-veth case.

I'll send out my tpacket patch v5 for review.

-----Original Message-----
From: Yi Yang (杨燚)-云服务集团 
Sent: February 23, 2020 11:05
To: 'f...@sysclose.org' 
Cc: 'pkusunyif...@gmail.com' ; 'd...@openvswitch.org' 
; 'i.maxim...@ovn.org' ; 
'txfh2...@aliyun.com' 
Subject: Re: Re: [ovs-dev] [PATCH v4 0/3] Add support for TSO with DPDK
Importance: High

Hi, Flavio

After running it repeatedly on different servers, I'm very sure it can't work on 
the Ubuntu 16.04 kernel 4.15.0-55-generic or on upstream kernel 4.15.9, so can you 
tell me which kernel version you used when you ran the run-iperf3.sh I provided? 
I suspect this TSO patch for veth needs a newer kernel version.

-----Original Message-----
From: Flavio Leitner [mailto:f...@sysclose.org] 
Sent: February 20, 2020 21:41
To: Yi Yang (杨燚)-云服务集团 
Cc: pkusunyif...@gmail.com; d...@openvswitch.org; i.maxim...@ovn.org; 
txfh2...@aliyun.com
Subject: Re: Re: [ovs-dev] [PATCH v4 0/3] Add support for TSO with DPDK

On Thu, Feb 20, 2020 at 10:10:36AM +, Yi Yang (杨 D)-云服务集团 wrote:
> Hi, Flavio
> 
> I find this tso feature doesn't work normally on my Ubuntu 16.04, here 
> is my result. My kernel version is
> 
> $ uname -a
> Linux cmp008 4.15.0-55-generic #60~16.04.2-Ubuntu SMP Thu Jul 4 
> 09:03:09 UTC
> 2019 x86_64 x86_64 x86_64 GNU/Linux
> $

I tested with 4.15.0 upstream and it worked. Can you do the same?

> $ ./run-iperf3.sh
> Connecting to host 10.15.1.3, port 5201 [  4] local 10.15.1.2 port 
> 56466 connected to 10.15.1.3 port 5201
> [ ID] Interval   Transfer Bandwidth   Retr  Cwnd
> [  4]   0.00-10.00  sec  7.05 MBytes  5.91 Mbits/sec  2212   5.66 KBytes
> [  4]  10.00-20.00  sec  7.67 MBytes  6.44 Mbits/sec  2484   5.66 KBytes
> [  4]  20.00-30.00  sec  7.77 MBytes  6.52 Mbits/sec  2500   5.66 KBytes
> [  4]  30.00-40.00  sec  7.77 MBytes  6.52 Mbits/sec  2490   5.66 KBytes
> [  4]  40.00-50.00  sec  7.76 MBytes  6.51 Mbits/sec  2500   5.66 KBytes
> [  4]  50.00-60.00  sec  7.79 MBytes  6.54 Mbits/sec  2504   5.66 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval   Transfer Bandwidth   Retr
> [  4]   0.00-60.00  sec  45.8 MBytes  6.40 Mbits/sec  14690
> sender
> [  4]   0.00-60.00  sec  45.7 MBytes  6.40 Mbits/sec
> receiver

That looks like TSO packets are being dropped and the traffic is basically TCP 
retransmissions of MTU size.

fbl


> 
> Server output:
> Accepted connection from 10.15.1.2, port 56464 [  5] local 10.15.1.3 
> port 5201 connected to 10.15.1.2 port 56466
> [ ID] Interval   Transfer Bandwidth
> [  5]   0.00-10.00  sec  6.90 MBytes  5.79 Mbits/sec
> [  5]  10.00-20.00  sec  7.71 MBytes  6.47 Mbits/sec [  5]  
> 20.00-30.00  sec  7.73 MBytes  6.48 Mbits/sec [  5]  30.00-40.00  sec  
> 7.79 MBytes  6.53 Mbits/sec [  5]  40.00-50.00  sec  7.79 MBytes  6.53 
> Mbits/sec [  5]  50.00-60.00  sec  7.79 MBytes  6.54 Mbits/sec
> 
> 
> iperf Done.
> $
> 
> But it does work for tap, I'm not sure if it is a kernel issue, which 
> kernel version are you using? I didn't use tpacket_v3 patch. Here is 
> my local ovs info.
> 
> $ git log
> commit 1223cf123ed141c0a0110ebed17572bdb2e3d0f4
> Author: Ilya Maximets 
> Date:   Thu Feb 6 14:24:23 2020 +0100
> 
> netdev-dpdk: Don't enable offloading on HW device if not requested.
> 
> DPDK drivers has different implementations of transmit functions.
> Enabled offloading may cause driver to choose slower variant
> significantly affecting performance if userspace TSO wasn't requested.
> 
> Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
> Reported-by: David Marchand 
> Acked-by: David Marchand 
> Acked-by: Flavio Leitner 
> Acked-by: Kevin Traynor 
> Signed-off-by: Ilya Maximets 
> 
> commit 73858f9dbe83daf8cc8d4b604acc23eb62cc3f52
> Author: Flavio Leitner 
> Date:   Mon Feb 3 18:45:50 2020 -0300
> 
> netdev-linux: Prepend the std packet in the TSO packet
> 
> Usually TSO packets are close to 50k, 60k bytes long, so to
> to copy less bytes when receiving a packet from the kernel
> change the approach. Instead of extending the MTU sized
> packet received and append

[ovs-dev] Re: Re: [PATCH v4 0/3] Add support for TSO with DPDK

2020-02-22 Thread
Hi, Flavio

After running it repeatedly on different servers, I'm very sure it can't work on 
the Ubuntu 16.04 kernel 4.15.0-55-generic or on upstream kernel 4.15.9, so can you 
tell me which kernel version you used when you ran the run-iperf3.sh I provided? 
I suspect this TSO patch for veth needs a newer kernel version.

-----Original Message-----
From: Flavio Leitner [mailto:f...@sysclose.org] 
Sent: February 20, 2020 21:41
To: Yi Yang (杨燚)-云服务集团 
Cc: pkusunyif...@gmail.com; d...@openvswitch.org; i.maxim...@ovn.org; 
txfh2...@aliyun.com
Subject: Re: Re: [ovs-dev] [PATCH v4 0/3] Add support for TSO with DPDK

On Thu, Feb 20, 2020 at 10:10:36AM +, Yi Yang (杨 D)-云服务集团 wrote:
> Hi, Flavio
> 
> I find this tso feature doesn't work normally on my Ubuntu 16.04, here 
> is my result. My kernel version is
> 
> $ uname -a
> Linux cmp008 4.15.0-55-generic #60~16.04.2-Ubuntu SMP Thu Jul 4 
> 09:03:09 UTC
> 2019 x86_64 x86_64 x86_64 GNU/Linux
> $

I tested with 4.15.0 upstream and it worked. Can you do the same?

> $ ./run-iperf3.sh
> Connecting to host 10.15.1.3, port 5201 [  4] local 10.15.1.2 port 
> 56466 connected to 10.15.1.3 port 5201
> [ ID] Interval   Transfer Bandwidth   Retr  Cwnd
> [  4]   0.00-10.00  sec  7.05 MBytes  5.91 Mbits/sec  2212   5.66 KBytes
> [  4]  10.00-20.00  sec  7.67 MBytes  6.44 Mbits/sec  2484   5.66 KBytes
> [  4]  20.00-30.00  sec  7.77 MBytes  6.52 Mbits/sec  2500   5.66 KBytes
> [  4]  30.00-40.00  sec  7.77 MBytes  6.52 Mbits/sec  2490   5.66 KBytes
> [  4]  40.00-50.00  sec  7.76 MBytes  6.51 Mbits/sec  2500   5.66 KBytes
> [  4]  50.00-60.00  sec  7.79 MBytes  6.54 Mbits/sec  2504   5.66 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval   Transfer Bandwidth   Retr
> [  4]   0.00-60.00  sec  45.8 MBytes  6.40 Mbits/sec  14690
> sender
> [  4]   0.00-60.00  sec  45.7 MBytes  6.40 Mbits/sec
> receiver

That looks like TSO packets are being dropped and the traffic is basically TCP 
retransmissions of MTU size.

fbl


> 
> Server output:
> Accepted connection from 10.15.1.2, port 56464 [  5] local 10.15.1.3 
> port 5201 connected to 10.15.1.2 port 56466
> [ ID] Interval   Transfer Bandwidth
> [  5]   0.00-10.00  sec  6.90 MBytes  5.79 Mbits/sec
> [  5]  10.00-20.00  sec  7.71 MBytes  6.47 Mbits/sec [  5]  
> 20.00-30.00  sec  7.73 MBytes  6.48 Mbits/sec [  5]  30.00-40.00  sec  
> 7.79 MBytes  6.53 Mbits/sec [  5]  40.00-50.00  sec  7.79 MBytes  6.53 
> Mbits/sec [  5]  50.00-60.00  sec  7.79 MBytes  6.54 Mbits/sec
> 
> 
> iperf Done.
> $
> 
> But it does work for tap, I'm not sure if it is a kernel issue, which 
> kernel version are you using? I didn't use tpacket_v3 patch. Here is 
> my local ovs info.
> 
> $ git log
> commit 1223cf123ed141c0a0110ebed17572bdb2e3d0f4
> Author: Ilya Maximets 
> Date:   Thu Feb 6 14:24:23 2020 +0100
> 
> netdev-dpdk: Don't enable offloading on HW device if not requested.
> 
> DPDK drivers has different implementations of transmit functions.
> Enabled offloading may cause driver to choose slower variant
> significantly affecting performance if userspace TSO wasn't requested.
> 
> Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
> Reported-by: David Marchand 
> Acked-by: David Marchand 
> Acked-by: Flavio Leitner 
> Acked-by: Kevin Traynor 
> Signed-off-by: Ilya Maximets 
> 
> commit 73858f9dbe83daf8cc8d4b604acc23eb62cc3f52
> Author: Flavio Leitner 
> Date:   Mon Feb 3 18:45:50 2020 -0300
> 
> netdev-linux: Prepend the std packet in the TSO packet
> 
> Usually TSO packets are close to 50k, 60k bytes long, so to
> to copy less bytes when receiving a packet from the kernel
> change the approach. Instead of extending the MTU sized
> packet received and append with remaining TSO data from
> the TSO buffer, allocate a TSO packet with enough headroom
> to prepend the std packet data.
> 
> Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
> Suggested-by: Ben Pfaff 
> Signed-off-by: Flavio Leitner 
> Signed-off-by: Ben Pfaff 
> 
> commit 2297cbe6cc25b6b1862c499ce8f16f52f75d9e5f
> Author: Flavio Leitner 
> Date:   Mon Feb 3 11:22:22 2020 -0300
> 
> netdev-linux-private: fix max length to be 16 bits
> 
> The dp_packet length is limited to 16 bits, so document that
> and fix the length value accordingly.
> 
> Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
> Signed-off-by: Flavio Leitner 
> Signed-off-by: Ben Pfaff 
> 
> commit 3d6a6f450af5b7eaf4b532983cb14458ae792b72
> Author: David Marchand 
> Date:   Tue Feb 4 22:28:26 2020 +0100
> 
>   

[ovs-dev] Re: Re: [PATCH v4 0/3] Add support for TSO with DPDK

2020-02-20 Thread
Very weird: I built 4.15.9 from the upstream kernel and the result is the same. 
What's wrong? I can't understand it. I used current OVS master directly this 
time.

$ ./run-iperf3.sh
Connecting to host 10.15.1.3, port 5201
[  4] local 10.15.1.2 port 54078 connected to 10.15.1.3 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-10.00  sec  6.01 MBytes  5.04 Mbits/sec  1688   5.66 KBytes
[  4]  10.00-20.00  sec  6.17 MBytes  5.17 Mbits/sec  1725   7.07 KBytes
[  4]  20.00-30.00  sec  6.51 MBytes  5.46 Mbits/sec  1828   5.66 KBytes
[  4]  30.00-40.00  sec  5.58 MBytes  4.68 Mbits/sec  1509   7.07 KBytes
[  4]  40.00-50.00  sec  4.83 MBytes  4.05 Mbits/sec  1182   7.07 KBytes
[  4]  50.00-60.00  sec  4.49 MBytes  3.77 Mbits/sec  1110   5.66 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-60.00  sec  33.6 MBytes  4.70 Mbits/sec  9042 sender
[  4]   0.00-60.00  sec  33.5 MBytes  4.69 Mbits/sec  receiver

Server output:
Accepted connection from 10.15.1.2, port 54076
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 54078
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-10.00  sec  5.89 MBytes  4.94 Mbits/sec
[  5]  10.00-20.00  sec  6.23 MBytes  5.22 Mbits/sec
[  5]  20.00-30.00  sec  6.46 MBytes  5.42 Mbits/sec
[  5]  30.00-40.00  sec  5.62 MBytes  4.71 Mbits/sec
[  5]  40.00-50.00  sec  4.83 MBytes  4.05 Mbits/sec
[  5]  50.00-60.00  sec  4.45 MBytes  3.73 Mbits/sec


iperf Done.
$ uname -a
Linux cmp008 4.15.9 #1 SMP Fri Feb 21 09:27:41 UTC 2020 x86_64 x86_64 x86_64 
GNU/Linux
eipadmin@cmp008:~$
eipadmin@cmp008:~$ cd ovs-master/
eipadmin@cmp008:~/ovs-master$ git log
commit ac23d20fc90da3b1c9b2117d1e22102e99fba006
Author: Yi-Hung Wei 
Date:   Fri Feb 7 14:55:06 2020 -0800

conntrack: Fix TCP conntrack state

If a TCP connection is in SYN_SENT state, receiving another SYN packet
would just renew the timeout of that conntrack entry rather than create
a new one.  Thus, tcp_conn_update() should return CT_UPDATE_VALID_NEW.

This also fixes regressions of a couple of  OVN system tests.

Fixes: a867c010ee91 ("conntrack: Fix conntrack new state")
Reported-by: Dumitru Ceara 
Signed-off-by: Yi-Hung Wei 
Tested-by: Dumitru Ceara 
Signed-off-by: William Tu 

commit 486139d9e4b81dae04b2bb7487d45366865ac0ad
Author: Tomasz Konieczny 
Date:   Wed Feb 12 14:15:56 2020 +0100

docs: Update DPDK version table

Signed-off-by: Tomasz Konieczny 
Acked-by: Flavio Leitner 
Acked-by: Kevin Traynor 
Signed-off-by: Ian Stokes 

commit 9efbdaa201530ab7023a69176aba54c32c468efb
Author: Ben Pfaff 
Date:   Thu Feb 13 16:27:01 2020 -0800

Set release date for 2.13.0.

The "Valentine's Day" release.

Acked-by: Flavio Leitner 
Signed-off-by: Ben Pfaff 

commit 19e99c83bb4da4617730f20392515d8aca5b61ba
Author: Yi-Hung Wei 
$

-----Original Message-----
From: Flavio Leitner [mailto:f...@sysclose.org] 
Sent: February 20, 2020 21:41
To: Yi Yang (杨燚)-云服务集团 
Cc: pkusunyif...@gmail.com; d...@openvswitch.org; i.maxim...@ovn.org; 
txfh2...@aliyun.com
Subject: Re: Re: [ovs-dev] [PATCH v4 0/3] Add support for TSO with DPDK

On Thu, Feb 20, 2020 at 10:10:36AM +, Yi Yang (杨 D)-云服务集团 wrote:
> Hi, Flavio
> 
> I find this tso feature doesn't work normally on my Ubuntu 16.04, here 
> is my result. My kernel version is
> 
> $ uname -a
> Linux cmp008 4.15.0-55-generic #60~16.04.2-Ubuntu SMP Thu Jul 4 
> 09:03:09 UTC
> 2019 x86_64 x86_64 x86_64 GNU/Linux
> $

I tested with 4.15.0 upstream and it worked. Can you do the same?

> $ ./run-iperf3.sh
> Connecting to host 10.15.1.3, port 5201 [  4] local 10.15.1.2 port 
> 56466 connected to 10.15.1.3 port 5201
> [ ID] Interval   Transfer Bandwidth   Retr  Cwnd
> [  4]   0.00-10.00  sec  7.05 MBytes  5.91 Mbits/sec  2212   5.66 KBytes
> [  4]  10.00-20.00  sec  7.67 MBytes  6.44 Mbits/sec  2484   5.66 KBytes
> [  4]  20.00-30.00  sec  7.77 MBytes  6.52 Mbits/sec  2500   5.66 KBytes
> [  4]  30.00-40.00  sec  7.77 MBytes  6.52 Mbits/sec  2490   5.66 KBytes
> [  4]  40.00-50.00  sec  7.76 MBytes  6.51 Mbits/sec  2500   5.66 KBytes
> [  4]  50.00-60.00  sec  7.79 MBytes  6.54 Mbits/sec  2504   5.66 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval   Transfer Bandwidth   Retr
> [  4]   0.00-60.00  sec  45.8 MBytes  6.40 Mbits/sec  14690
> sender
> [  4]   0.00-60.00  sec  45.7 MBytes  6.40 Mbits/sec
> receiver

That looks like TSO packets are being dropped and the traffic is basically TCP 
retransmissions of MTU size.

fbl


> 
> Server output:
> Accepted connection from 10.15.1.2, port 56464 [  5] local 10.15.1.3 
> port 5201 connected to 10.15.1.2 port 56466
> [ ID] Interval   Transfer Bandwidth
> [  5]   0.00-10.00  sec 

[ovs-dev] Re: Re: [PATCH v4 0/3] Add support for TSO with DPDK

2020-02-20 Thread
No, I didn't use VMs, just veth in netns; I suspect it is an Ubuntu kernel bug.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: February 21, 2020 3:21
To: Yi Yang (杨燚)-云服务集团 
Cc: f...@sysclose.org; pkusunyif...@gmail.com; d...@openvswitch.org; 
i.maxim...@ovn.org; txfh2...@aliyun.com
Subject: Re: [ovs-dev] Re: [PATCH v4 0/3] Add support for TSO with DPDK

On Thu, Feb 20, 2020 at 2:12 AM Yi Yang (杨燚)-云服务集团  wrote:
>
> Hi, Flavio
>
> I find this tso feature doesn't work normally on my Ubuntu 16.04, here 
> is my result. My kernel version is

Hi Yiyang,

I'm so confused with your description. Which case does not work for you?
Yifeng and Flavio were using OVS-DPDK with vhostuser to VM, is this the case 
you're talking about?

>
> $ uname -a
> Linux cmp008 4.15.0-55-generic #60~16.04.2-Ubuntu SMP Thu Jul 4 
> 09:03:09 UTC
> 2019 x86_64 x86_64 x86_64 GNU/Linux
> $
>
> $ ./run-iperf3.sh

Which case is this one?

> Connecting to host 10.15.1.3, port 5201 [  4] local 10.15.1.2 port 
> 56466 connected to 10.15.1.3 port 5201
> [ ID] Interval   Transfer Bandwidth   Retr  Cwnd
> [  4]   0.00-10.00  sec  7.05 MBytes  5.91 Mbits/sec  2212   5.66 KBytes
> [  4]  10.00-20.00  sec  7.67 MBytes  6.44 Mbits/sec  2484   5.66 KBytes
> [  4]  20.00-30.00  sec  7.77 MBytes  6.52 Mbits/sec  2500   5.66 KBytes
> [  4]  30.00-40.00  sec  7.77 MBytes  6.52 Mbits/sec  2490   5.66 KBytes
> [  4]  40.00-50.00  sec  7.76 MBytes  6.51 Mbits/sec  2500   5.66 KBytes
> [  4]  50.00-60.00  sec  7.79 MBytes  6.54 Mbits/sec  2504   5.66 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval   Transfer Bandwidth   Retr
> [  4]   0.00-60.00  sec  45.8 MBytes  6.40 Mbits/sec  14690
> sender
> [  4]   0.00-60.00  sec  45.7 MBytes  6.40 Mbits/sec
> receiver
>
> Server output:
> Accepted connection from 10.15.1.2, port 56464 [  5] local 10.15.1.3 
> port 5201 connected to 10.15.1.2 port 56466
> [ ID] Interval   Transfer Bandwidth
> [  5]   0.00-10.00  sec  6.90 MBytes  5.79 Mbits/sec
> [  5]  10.00-20.00  sec  7.71 MBytes  6.47 Mbits/sec [  5]  
> 20.00-30.00  sec  7.73 MBytes  6.48 Mbits/sec [  5]  30.00-40.00  sec  
> 7.79 MBytes  6.53 Mbits/sec [  5]  40.00-50.00  sec  7.79 MBytes  6.53 
> Mbits/sec [  5]  50.00-60.00  sec  7.79 MBytes  6.54 Mbits/sec
>
>
> iperf Done.
> $
>
> But it does work for tap, I'm not sure if it is a kernel issue, which 
> kernel
  ^^^
So which case does not work?

> version are you using? I didn't use tpacket_v3 patch. Here is my local 
> ovs info.

William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: OVS performance issue: why small udp packet pps performance between VMs is highly related with number of ovs ports and number of VMs?

2020-02-19 Thread
Hi, Flavio and Ilya

After checking further, I'm very sure it isn't caused by the bandwidth limit; the 
issue is still there after I completely removed the bandwidth limit (tc qdisc show 
confirms this).

The issue becomes more serious when we start more VMs in a subnet, and we did see 
actions that output to many ports. Here is the output of "sudo ovs-appctl 
dpif/dump-flows br-int" while iperf3 is running:

recirc_id(0),in_port(12),eth(src=fa:16:3e:49:26:51,dst=fa:16:3e:a7:0a:3a),eth_type(0x0800),ipv4(tos=0/0x3,frag=no),
 packets:11012944, bytes:726983412, used:0.000s, flags:SP., 
actions:push_vlan(vid=1,pcp=0),2,set(tunnel(tun_id=0x49,src=10.3.2.17,dst=10.3.2.16,ttl=64,tp_dst=4789,flags(df|key))),pop_vlan,9,8,11,13,14,15,16,17,18,19

The number of output ports is linearly related to the number of VMs.

Obviously, the MAC FDB entries OVS learned didn't include this destination MAC, 
which is the MAC of the iperf3 server VM on another compute node.

$ sudo ovs-vsctl show | grep Bridge
Bridge br-floating
Bridge br-int
Bridge "br-bond1"
Bridge br-tun
$ sudo ovs-appctl fdb/show br-floating | grep fa:16:3e:49:26:51
$ sudo ovs-appctl fdb/show br-tun | grep fa:16:3e:49:26:51
$ sudo ovs-appctl fdb/show br-bond1 | grep fa:16:3e:49:26:51
$ sudo ovs-appctl fdb/show br-int | grep fa:16:3e:49:26:51

This is indeed done by the default "NORMAL action" in br-int. My question is: why 
does the "NORMAL" action output the packet to those other ports? Why can't it 
learn the MACs on other compute nodes over VXLAN? Is there any good way to fix it?

By the way, all the other VMs on the same subnet as the iperf3 client VM receive 
the iperf3 client packets (ifconfig eth0 in these VMs shows the RX packet count 
increasing very quickly, and tcpdump can see these packets). The destination MAC 
of these packets is fa:16:3e:49:26:51, not a broadcast MAC, and the VM interfaces 
are not in promiscuous mode.

Looking forward to your guidance: is this the default behavior of OVS? Can you 
explain this "NORMAL" action?

-----Original Message-----
From: Flavio Leitner [mailto:f...@sysclose.org] 
Sent: February 13, 2020 22:21
To: Ilya Maximets 
Cc: Yi Yang (杨燚)-云服务集团 ; ovs-disc...@openvswitch.org; 
ovs-dev@openvswitch.org
Subject: Re: Re: [ovs-dev] OVS performance issue: why small udp packet pps 
performance between VMs is highly related with number of ovs ports and number 
of VMs?

On Thu, Feb 13, 2020 at 03:07:33PM +0100, Ilya Maximets wrote:
> On 2/13/20 2:52 PM, Yi Yang (杨燚)-云服务集团 wrote:
> > Thanks Ilya, iperf3 udp should be single direction, source IP address and 
> > destination IP address are two VMs' IP, udp bandwidth will be 0 if they are 
> > wrong, but obviously UDP loss rate is 0, so it isn't the case you're 
> > saying, do we have way to disable MAC learning or MAC broadcast?
> 
> NORMAL action acts like an L2 learning switch.  If you don't want to 
> use MAC learning, remove flow with NORMAL action and add direct 
> forwarding flow like output:.  But I don't think that 
> you want to do that in OpenStack setup.

Also iperf3 establishes the control connection which uses TCP in both 
directions. So, in theory, the FDB should be updated.

> > Is NORMAL action or MAC learning slow path process? If so, ovs-vswitchd 
> > daemon should have high cpu utilization.
> 
> It's not a slow path, so there will be no cpu usage by ovs-vswitchd 
> userspace process.  To confirm that you're flooding packets, you may 
> dump installed datapath flows with the following command:
> 
> ovs-appctl dpctl/dump-flows
> 
> In case of flood, you will see datapath flow with big number of output 
> ports like this:
> 
> <...>  actions:,,...

I'd suggest to look at the fdb: ovs-appctl fdb/show  and port stats to see 
if there is traffic moving as well.
Maybe it's not your UDP test packet, but another unrelated traffic in the 
network.

HTH,
fbl


> 
> > 
> > -----Original Message-----
> > From: Ilya Maximets [mailto:i.maxim...@ovn.org]
> > Sent: February 13, 2020 21:23
> > To: Flavio Leitner ; Yi Yang (杨燚)-云服务集团 
> > 
> > Cc: ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org; Ilya 
> > Maximets 
> > Subject: Re: [ovs-dev] OVS performance issue: why small udp packet pps 
> > performance between VMs is highly related with number of ovs ports and 
> > number of VMs?
> > 
> > On 2/13/20 12:48 PM, Flavio Leitner wrote:
> >> On Thu, Feb 13, 2020 at 09:18:38AM +, Yi Yang (杨燚)-云服务集团 wrote:
> >>> Hi, all
> >>>
> >>> We find ovs has serious performance issue, we only launch one VM 
> >>> in one compute, and do iperf small udp pps performance test 
> >>> between these two VMs, we can see about 18 pps (packets per 
> >>> second, -l 16), but
> >>>
> >>> 1) if we add 100 veth ports in br-int bridge, respectively, t

[ovs-dev] Re: Re: [PATCH v3] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-02-18 Thread
Hi, William

I pushed the fix patch to my GitHub repo, and Travis showed that all the builds 
passed: https://travis-ci.org/yyang13/ovs (my OVS repo: https://github.com/yyang13/ovs)

I'll include this in the next version; I'll send it out after I get more comments 
and work out a way to make sure TPACKET_V3 and TSO can coexist.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: February 18, 2020 23:32
To: Yi Yang (杨燚)-云服务集团 
Cc: yang_y...@126.com; ovs-dev@openvswitch.org; yang_y...@163.com
Subject: Re: [ovs-dev] Re: [PATCH v3] Use TPACKET_V3 to accelerate veth for 
userspace datapath

On Fri, Feb 14, 2020 at 8:10 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> William, I don't know why I can't receive your comments in my outlook, 
> https://mail.openvswitch.org/pipermail/ovs-dev/2020-February/367860.ht
> ml
>
Are you able to use GitHub? If so, Travis can link directly to your GitHub repo, 
so when you check code into your repo, Travis will start the test.

> I don't know how to check travis build issue, can you help provide a 
> quick guide in order that I can fix it?
>
OK!

William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: [PATCH v3] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-02-18 Thread
Thanks William, I forked OVS to my GitHub account and Travis worked; I'll fix 
those build issues.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: February 18, 2020 23:32
To: Yi Yang (杨燚)-云服务集团 
Cc: yang_y...@126.com; ovs-dev@openvswitch.org; yang_y...@163.com
Subject: Re: [ovs-dev] Re: [PATCH v3] Use TPACKET_V3 to accelerate veth for 
userspace datapath

On Fri, Feb 14, 2020 at 8:10 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> William, I don't know why I can't receive your comments in my outlook, 
> https://mail.openvswitch.org/pipermail/ovs-dev/2020-February/367860.ht
> ml
>
Are you able to use GitHub? If so, Travis can link directly to your GitHub repo, 
so when you check code into your repo, Travis will start the test.

> I don't know how to check travis build issue, can you help provide a 
> quick guide in order that I can fix it?
>
OK!

William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: [PATCH v2] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-02-13 Thread
The maximum packet size for TSO is 65535 plus the Ethernet header (plus the VLAN 
header if present). For TSO, the frame size is set to 64K+4K (the extra 4K is for 
the tpacket3_hdr header), and the block size is the same as the frame size; 
tpacket send requires that one packet fit entirely inside one frame, because a 
packet can't cross a frame boundary for tpacket.

Please read my code for details at 
https://mail.openvswitch.org/pipermail/ovs-dev/2020-February/367689.html; I 
have sent out the v3 patch, but there are no comments so far.
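
To make the sizing above concrete, here is a small standalone sketch of how such a ring request could be filled in. It is illustrative only: the real patch is at the URL above, and the number of frames (64) is an arbitrary choice for the example.

#include <linux/if_packet.h>
#include <stdio.h>
#include <unistd.h>

static void fill_tso_ring_req(struct tpacket_req3 *req, unsigned int nframes)
{
    /* 64 KB payload + 4 KB for the tpacket3_hdr metadata; the block holds
     * exactly one frame, and both values are multiples of the 4 KB page. */
    req->tp_frame_size = (1 << 16) + (1 << 12);
    req->tp_block_size = req->tp_frame_size;
    req->tp_block_nr = nframes;
    req->tp_frame_nr = nframes;
    req->tp_retire_blk_tov = 0;
    req->tp_sizeof_priv = 0;
    req->tp_feature_req_word = 0;
}

int main(void)
{
    struct tpacket_req3 req;

    fill_tso_ring_req(&req, 64);
    printf("frame %u, block %u, blocks %u (page size %ld)\n",
           req.tp_frame_size, req.tp_block_size, req.tp_block_nr,
           sysconf(_SC_PAGESIZE));
    return 0;
}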


-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: February 14, 2020 9:56
To: Yi Yang (杨燚)-云服务集团 
Cc: i.maxim...@ovn.org; yang_y...@126.com; ovs-dev@openvswitch.org; 
yang_y...@163.com; b...@ovn.org
Subject: Re: [ovs-dev] [PATCH v2] Use TPACKET_V3 to accelerate veth for userspace 
datapath

On Thu, Feb 13, 2020 at 5:00 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> No, block size and frame size are defined by user, you can specify any size, 
> but block size must be pagesize aligned, please read v3 patch and try it in 
> your environment.
>

Right, but
How do we set block size and frame size to accommodate the TSO 64K-size packet?

William

> -----Original Message-----
> From: William Tu [mailto:u9012...@gmail.com]
> Sent: February 14, 2020 8:38
> To: Ilya Maximets 
> Cc: yang_y...@126.com; ovs-dev ; yang_y_yi 
> ; Ben Pfaff ; Yi Yang (杨燚)-云服务集团 
> 
> Subject: Re: [ovs-dev] [PATCH v2] Use TPACKET_V3 to accelerate veth for 
> userspace datapath
>
> On Fri, Feb 7, 2020 at 6:43 AM Ilya Maximets  wrote:
> >
> > On 2/7/20 12:50 PM, yang_y...@126.com wrote:
> > > From: Yi Yang 
> > >
> > > We can avoid high system call overhead by using TPACKET_V3 and 
> > > using DPDK-like poll to receive and send packets (Note: send still 
> > > needs to call sendto to trigger final packet transmission).
> > >
> > >>From Linux kernel 3.10 on, TPACKET_V3 has been supported,
> > > so all the Linux kernels current OVS supports can run
> > > TPACKET_V3 without any problem.
> > >
> > > I can see about 30% performance improvement for veth compared to 
> > > last recvmmsg optimization if I use TPACKET_V3, it is about 1.98 
> > > Gbps, but it was 1.47 Gbps before.
> > >
> > > Note: it can't support TSO which is in progress.
> >
> > So, this patch effectively breaks TSO functionality in compile time, 
> > i.e. it compiles out the TSO capable function invocation.
> > I don't think that we should mege that. For this patch to be 
> > acceptable, tpacket implementation should support TSO or it should 
> > be possible to dynamically switch to usual sendmmsg if we want to enable 
> > TSO support.
> >
> I think it's impossible to support tpacket + TSO, because tpacket 
> pre-allocate a ring buffer with 2K buffer size, and each descriptor can only 
> point to one entry.
> (If I understand correctly)
>
> So I think we should dynamically switch back to sendmmsg when TSO is enabled.
>
> Regards,
> William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: [PATCH v2] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-02-13 Thread
No, the block size and frame size are defined by the user; you can specify any 
size, but the block size must be page-size aligned. Please read the v3 patch and 
try it in your environment.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com] 
Sent: February 14, 2020 8:38
To: Ilya Maximets 
Cc: yang_y...@126.com; ovs-dev ; yang_y_yi 
; Ben Pfaff ; Yi Yang (杨燚)-云服务集团 

Subject: Re: [ovs-dev] [PATCH v2] Use TPACKET_V3 to accelerate veth for userspace 
datapath

On Fri, Feb 7, 2020 at 6:43 AM Ilya Maximets  wrote:
>
> On 2/7/20 12:50 PM, yang_y...@126.com wrote:
> > From: Yi Yang 
> >
> > We can avoid high system call overhead by using TPACKET_V3 and using 
> > DPDK-like poll to receive and send packets (Note: send still needs 
> > to call sendto to trigger final packet transmission).
> >
> >>From Linux kernel 3.10 on, TPACKET_V3 has been supported,
> > so all the Linux kernels current OVS supports can run
> > TPACKET_V3 without any problem.
> >
> > I can see about 30% performance improvement for veth compared to 
> > last recvmmsg optimization if I use TPACKET_V3, it is about 1.98 
> > Gbps, but it was 1.47 Gbps before.
> >
> > Note: it can't support TSO which is in progress.
>
> So, this patch effectively breaks TSO functionality in compile time, 
> i.e. it compiles out the TSO capable function invocation.
> I don't think that we should mege that. For this patch to be 
> acceptable, tpacket implementation should support TSO or it should be 
> possible to dynamically switch to usual sendmmsg if we want to enable TSO 
> support.
>
I think it's impossible to support tpacket + TSO, because tpacket pre-allocate 
a ring buffer with 2K buffer size, and each descriptor can only point to one 
entry.
(If I understand correctly)

So I think we should dynamically switch back to sendmmsg when TSO is enabled.

Regards,
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
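
As background for the fallback discussed in this thread (switching back to sendmmsg when TSO is enabled): batched transmission over a socket without the mmap ring is essentially one sendmmsg() call per burst. Below is a minimal standalone sketch, not the OVS code; the 32-message cap and the AF_UNIX socket pair are only there to keep the example self-contained.

#define _GNU_SOURCE                    /* for sendmmsg() */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Sends up to 32 buffers in one system call and returns how many were queued;
 * this is the kind of batching the plain (non-tpacket) send path relies on. */
static int send_batch(int fd, struct iovec *iovs, unsigned int n)
{
    struct mmsghdr msgs[32];

    if (n > 32) {
        n = 32;                        /* one burst, like a 32-packet batch */
    }
    memset(msgs, 0, sizeof msgs);
    for (unsigned int i = 0; i < n; i++) {
        msgs[i].msg_hdr.msg_iov = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    return sendmmsg(fd, msgs, n, 0);
}

int main(void)
{
    int fds[2];
    char payload[] = "hello";
    struct iovec iov = { .iov_base = payload, .iov_len = sizeof payload };

    /* AF_UNIX datagram pair just to keep the example runnable as-is. */
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) < 0) {
        perror("socketpair");
        return 1;
    }
    printf("queued %d message(s)\n", send_batch(fds[0], &iov, 1));
    return 0;
}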


[ovs-dev] Re: OVS performance issue: why small udp packet pps performance between VMs is highly related with number of ovs ports and number of VMs?

2020-02-13 Thread
Thanks Ilya. iperf3 UDP should be single-direction; the source and destination IP 
addresses are the two VMs' IPs, and the UDP bandwidth would be 0 if they were 
wrong. But the UDP loss rate is obviously 0, so it isn't the case you're 
describing. Do we have a way to disable MAC learning or MAC broadcast?

Is the NORMAL action or MAC learning a slow-path process? If so, the ovs-vswitchd 
daemon should show high CPU utilization.

-----Original Message-----
From: Ilya Maximets [mailto:i.maxim...@ovn.org] 
Sent: February 13, 2020 21:23
To: Flavio Leitner ; Yi Yang (杨燚)-云服务集团 

Cc: ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org; Ilya Maximets 

Subject: Re: [ovs-dev] OVS performance issue: why small udp packet pps performance 
between VMs is highly related with number of ovs ports and number of VMs?

On 2/13/20 12:48 PM, Flavio Leitner wrote:
> On Thu, Feb 13, 2020 at 09:18:38AM +, Yi Yang (杨燚)-云服务集团 wrote:
>> Hi, all
>>
>> We find ovs has serious performance issue, we only launch one VM in 
>> one compute, and do iperf small udp pps performance test between 
>> these two VMs, we can see about 18 pps (packets per second, -l 
>> 16), but
>>
>> 1) if we add 100 veth ports in br-int bridge, respectively, then the pps 
>> performance will be about 5 pps.
>> 2) If we launch one more VM in every compute node, but don’t run any 
>> workload, the pps performance will be about 9 pps. (note, no 
>> above veth ports in this test)
>> 3) If we launch two more VMs in every compute node (totally 3 VMs 
>> every compute nodes), but don’t run any workload , the pps 
>> performance will be about 5 pps (note, no above veth ports in 
>> this test)
>>
>> Anybody can help explain why it is so? Is there any known way to 
>> optimized this? I really think ovs performance is bad (we can draw 
>> such conclusion from our test result at least), I don’t want to 
>> defame ovs ☺
>>
>> BTW, we used ovs kernel datapath and vhost, we can see every port has a 
>> vhost kernel thread, it is running with 100% cpu utilization if we run iperf 
>> in VM, bu for those idle VMs, the corresponding vhost still has about 30% 
>> cpu utilization, I don’t understand why.
>>
>> In addition, we find udp performance is also very bad for small UDP packet 
>> for physical NIC. But it can reach 26 pps for –l 80 which enough covers 
>> vxlan header (8 bytes) + inner eth header (14) + ipudp header (28) + 16 = 
>> 66, if we consider performance overhead ovs bridge introduces, pps 
>> performance between VMs should be able to reach 20 pps at least, other 
>> VMs and ports shouldn’t have so big hurt against it because they are idle, 
>> no any workload there.
> 
> What do you have in the flow table?  It sounds like the traffic is 
> being broadcast to all ports. Check the FDB to see if OvS is learning 
> the mac addresses.
> 
> It's been a while since I don't run performance tests with kernel 
> datapath, but it should be no different than Linux bridge with just 
> action NORMAL in the flow table.
> 

I agree that if your performance heavily depends on the number of ports than 
you're most likely just flooding all the packets to all the ports.  Since 
you're using UDP traffic, please, be sure that you're sending some packets in 
backward direction, so OVS and all other switches (if any) will learn/not 
forget to which port packets should be sent.  Also, check if your IP addresses 
are correct.  If for some reason it's not possible for OVS to learn MAC 
addresses correctly, avoid using action:NORMAL.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: OVS performance issue: why small udp packet pps performance between VMs is highly related with number of ovs ports and number of VMs?

2020-02-13 Thread
Flavio, this is an OpenStack environment and all the flows are added by Neutron. 
The NORMAL action is the default flow present before Neutron adds any flows; it 
is OVS's default flow.

-----Original Message-----
From: Flavio Leitner [mailto:f...@sysclose.org] 
Sent: February 13, 2020 19:48
To: Yi Yang (杨燚)-云服务集团 
Cc: ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org; i.maxim...@ovn.org
Subject: Re: [ovs-dev] OVS performance issue: why small udp packet pps performance 
between VMs is highly related with number of ovs ports and number of VMs?

On Thu, Feb 13, 2020 at 09:18:38AM +, Yi Yang (杨燚)-云服务集团 wrote:
> Hi, all
> 
> We find ovs has serious performance issue, we only launch one VM in 
> one compute, and do iperf small udp pps performance test between these 
> two VMs, we can see about 18 pps (packets per second, -l 16), but
> 
> 1) if we add 100 veth ports in br-int bridge, respectively, then the pps 
> performance will be about 5 pps.
> 2) If we launch one more VM in every compute node, but don’t run any 
> workload, the pps performance will be about 9 pps. (note, no above 
> veth ports in this test)
> 3) If we launch two more VMs in every compute node (totally 3 VMs 
> every compute nodes), but don’t run any workload , the pps performance 
> will be about 5 pps (note, no above veth ports in this test)
> 
> Anybody can help explain why it is so? Is there any known way to 
> optimized this? I really think ovs performance is bad (we can draw 
> such conclusion from our test result at least), I don’t want to defame 
> ovs ☺
> 
> BTW, we used ovs kernel datapath and vhost, we can see every port has a vhost 
> kernel thread, it is running with 100% cpu utilization if we run iperf in VM, 
> bu for those idle VMs, the corresponding vhost still has about 30% cpu 
> utilization, I don’t understand why.
> 
> In addition, we find udp performance is also very bad for small UDP packet 
> for physical NIC. But it can reach 26 pps for –l 80 which enough covers 
> vxlan header (8 bytes) + inner eth header (14) + ipudp header (28) + 16 = 66, 
> if we consider performance overhead ovs bridge introduces, pps performance 
> between VMs should be able to reach 20 pps at least, other VMs and ports 
> shouldn’t have so big hurt against it because they are idle, no any workload 
> there.

What do you have in the flow table?  It sounds like the traffic is being 
broadcast to all ports. Check the FDB to see if OvS is learning the mac 
addresses.

It's been a while since I don't run performance tests with kernel datapath, but 
it should be no different than Linux bridge with just action NORMAL in the flow 
table.

--
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] OVS performance issue: why small udp packet pps performance between VMs is highly related with number of ovs ports and number of VMs?

2020-02-13 Thread
Hi, all

We find that OVS has a serious performance issue. We launch only one VM on each 
compute node and run an iperf small-UDP-packet pps test between these two VMs; we 
can see about 18 pps (packets per second, -l 16), but

1) If we add 100 veth ports to the br-int bridge, the pps performance drops to 
about 5 pps.
2) If we launch one more VM on every compute node, but don't run any workload, 
the pps performance drops to about 9 pps. (Note: none of the veth ports mentioned 
above are present in this test.)
3) If we launch two more VMs on every compute node (3 VMs per compute node in 
total), but don't run any workload, the pps performance drops to about 5 pps. 
(Note: none of the veth ports mentioned above are present in this test.)

Can anybody help explain why this is so? Is there any known way to optimize this? 
I really think OVS performance is bad (we can draw such a conclusion from our test 
results, at least); I don't want to defame OVS ☺

BTW, we used the OVS kernel datapath and vhost. We can see that every port has a 
vhost kernel thread, which runs at 100% CPU utilization when we run iperf in a VM, 
but for the idle VMs the corresponding vhost thread still has about 30% CPU 
utilization, and I don't understand why.

In addition, we find that UDP performance for small packets is also very bad on 
the physical NIC, but it can reach 26 pps for -l 80, which is enough to cover the 
VXLAN header (8 bytes) + inner Ethernet header (14) + IP/UDP headers (28) + 16 = 
66 bytes. Even considering the performance overhead the OVS bridge introduces, pps 
performance between VMs should be able to reach at least 20 pps; the other VMs and 
ports shouldn't hurt it so much, because they are idle with no workload at all.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: Re: Re: Re: [PATCH v2] netdev-linux: Prepend the std packet in the TSO packet

2020-02-06 Thread
Got it. I didn't apply the third one, "netdev-linux-private: fix max length to be 
16 bits", which must be the reason for the issue. I had thought you just added a 
comment in that patch, so I didn't apply it; it changed 65536 to 65535. It is my 
mistake. Thanks a lot.
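
As a side note on why that one-character change matters: the dp_packet length is limited to 16 bits, so 65536 silently wraps to 0 while 65535 is the largest value that fits. A trivial standalone illustration:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t wraps = (uint16_t) 65536;   /* 2^16 does not fit: wraps to 0 */
    uint16_t fits = 65535;               /* largest 16-bit value */

    printf("65536 stored in 16 bits: %u\n", wraps);
    printf("65535 stored in 16 bits: %u\n", fits);
    return 0;
}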

-----Original Message-----
From: Flavio Leitner [mailto:f...@sysclose.org] 
Sent: February 6, 2020 21:12
To: Yi Yang (杨燚)-云服务集团 
Cc: d...@openvswitch.org; i.maxim...@ovn.org; txfh2...@aliyun.com
Subject: Re: Re: Re: Re: [ovs-dev] Re: [PATCH v2] netdev-linux: Prepend the std 
packet in the TSO packet

On Thu, Feb 06, 2020 at 01:03:32PM +, Yi Yang (杨燚)-云服务集团 wrote:
> Hi, Flavio
> 
> What's the difference between
> https://github.com/fleitner/ovs/tree/tso-tap-enable-tx-v1 and ovs 
> master? My network is very slow, maybe you call tell me which commits 
> are new compared to ovs master, I can get those commits and apply them 
> against my local ovs tree, that will be faster way to do quick check. 
> I used current ovs master, I didn't apply any special patch except 
> your patches.

You can browse the list of patches there:

netdev-linux: Enable TSO in the TAP device.
netdev-linux: Prepend the std packet in the TSO packet
netdev-linux-private: fix max length to be 16 bits 

On top of: Prepare for 2.13.0. 

You can always add a git remote and fetch only the changes you need.

fbl


> 
> -----Original Message-----
> From: Flavio Leitner [mailto:f...@sysclose.org]
> Sent: February 6, 2020 20:10
> To: Yi Yang (杨燚)-云服务集团 
> Cc: d...@openvswitch.org; i.maxim...@ovn.org; txfh2...@aliyun.com
> Subject: Re: Re: Re: [ovs-dev] Re: [PATCH v2] netdev-linux: Prepend the std 
> packet in the TSO packet
> 
> On Thu, Feb 06, 2020 at 07:34:35AM +, Yi Yang (杨燚)-云服务集团 wrote:
> > Hi, Flavio
> > 
> > I tried current ovs master and your two patches (this one and the 
> > one for tap), tap to tap and veth to veth performance are very bad 
> > when tso is on, I think merged tso patch series are wrong for such 
> > use cases, BTW, the performance data is normal if I turned off tso 
> > "sudo ip netns exec nsXXX ethtool -K vethX tso off", for tap, the 
> > same is there (I used your tap patch for this), it is ok when I 
> > turned off tso.
> 
> Can you try out the code from here instead:
> https://github.com/fleitner/ovs/tree/tso-tap-enable-tx-v1
>  
> > here is performance data:
> > 
> > Connecting to host 10.15.1.3, port 5201 [  4] local 10.15.1.2 port
> > 36024 connected to 10.15.1.3 port 5201
> > [ ID] Interval   Transfer Bandwidth   Retr  Cwnd
> > [  4]   0.00-10.00  sec  7.59 MBytes  6.37 Mbits/sec  2410   7.07 KBytes
> > [  4]  10.00-20.00  sec  7.76 MBytes  6.51 Mbits/sec  2496   5.66 KBytes
> > [  4]  20.00-30.00  sec  7.99 MBytes  6.70 Mbits/sec  2536   5.66 KBytes
> > [  4]  30.00-40.00  sec  7.85 MBytes  6.58 Mbits/sec  2506   5.66 KBytes
> > [  4]  40.00-50.00  sec  7.93 MBytes  6.65 Mbits/sec  2556   5.66 KBytes
> > [  4]  50.00-60.00  sec  8.25 MBytes  6.92 Mbits/sec  2696   7.07 KBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval   Transfer Bandwidth   Retr
> > [  4]   0.00-60.00  sec  47.4 MBytes  6.62 Mbits/sec  15200 
> > sender
> > [  4]   0.00-60.00  sec  47.3 MBytes  6.61 Mbits/sec  
> > receiver
> > 
> > Server output:
> > Accepted connection from 10.15.1.2, port 36022 [  5] local 10.15.1.3 
> > port 5201 connected to 10.15.1.2 port 36024
> > [ ID] Interval   Transfer Bandwidth
> > [  5]   0.00-10.00  sec  7.44 MBytes  6.24 Mbits/sec
> > [  5]  10.00-20.00  sec  7.80 MBytes  6.54 Mbits/sec [  5]
> > 20.00-30.00  sec  7.95 MBytes  6.67 Mbits/sec [  5]  30.00-40.00  
> > sec
> > 7.83 MBytes  6.57 Mbits/sec [  5]  40.00-50.00  sec  8.00 MBytes  
> > 6.71 Mbits/sec [  5]  50.00-60.00  sec  8.23 MBytes  6.91 Mbits/sec
> 
> Yeah, that looks like to be the performance only for TCP retransmits. 
> 
> Here is your script running with the code I provided before/above:
> [root@wsfd-netdev93 yiyang]# ./yiyang-test.sh PING 10.15.1.3 (10.15.1.3) 
> 56(84) bytes of data.
> 64 bytes from 10.15.1.3: icmp_seq=1 ttl=64 time=0.425 ms
> 64 bytes from 10.15.1.3: icmp_seq=2 ttl=64 time=0.253 ms
> 64 bytes from 10.15.1.3: icmp_seq=3 ttl=64 time=0.356 ms
> 
> --- 10.15.1.3 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 44ms rtt 
> min/avg/max/mdev = 0.253/0.344/0.425/0.073 ms PING 10.15.1.2 (10.15.1.2) 
> 56(84) bytes of data.
> 64 bytes from 10.15.1.2: icmp_seq=1 ttl=64 time=0.302 ms
> 64 bytes from 10.15.1.2: icmp_seq=2 ttl=64 time=0.475 ms
> 64 bytes from 10.15.1.2: icmp_seq=3 ttl=64 time=0.383 ms
> 
> --- 10.15.1.

[ovs-dev] Re: [PATCH] Use TPACKET_V1/V2/V3 to accelerate veth for DPDK datapath

2020-02-06 Thread
Thanks Ilya for pointing this out. I checked if_packet.h on git.kernel.org using 
the v3.10 tag, and it does indeed support TPACKET_V3:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/if_packet.h?h=v3.10

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/if_packet.h?h=v3.10#n225

#define TPACKET3_HDRLEN (TPACKET_ALIGN(sizeof(struct tpacket3_hdr)) + 
sizeof(struct sockaddr_ll))

So we can safely support only TPACKET_V3, because OVS's minimum kernel 
requirement is 3.10.0.
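
A quick way to confirm the same thing against the kernel headers installed on a given build machine is a trivial check program (not part of OVS):

#include <linux/if_packet.h>
#include <stdio.h>

int main(void)
{
#ifdef TPACKET3_HDRLEN
    printf("TPACKET_V3 available, TPACKET3_HDRLEN = %zu\n",
           (size_t) TPACKET3_HDRLEN);
#else
    printf("TPACKET_V3 not available in these kernel headers\n");
#endif
    return 0;
}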

-----Original Message-----
From: Ilya Maximets [mailto:i.maxim...@ovn.org] 
Sent: February 6, 2020 19:11
To: ovs-dev@openvswitch.org; Yi Yang (杨燚)-云服务集团 ; William 
Tu 
Cc: Ben Pfaff ; Ilya Maximets 
Subject: Re: [ovs-dev] [PATCH] Use TPACKET_V1/V2/V3 to accelerate veth for DPDK 
datapath

> On Sun, Feb 2, 2020 at 8:06 PM Yi Yang (杨燚)-云服务集团  
> wrote:
>>
>> Hi, William
>>
>> Sorry for last reply, I don't know why I always can't get your 
>> comments email from my outlook, Ben's comments are ok, I also can't 
>> see your comments in outlook junk box.
>>
>> About your comments in
>> https://mail.openvswitch.org/pipermail/ovs-dev/2020-January/367146.ht
>> ml, I checked it in my CentOS 7 which has 3.10.0 kernel, TPACKET_V3 
>> sample code can work, so I'm ok to remove V1 code.
>>
> 
> OK thank you for confirming that v3 works on 3.10 cento 7!


FYI, RHEL/CentOS 3.10 kernels have almost nothing in common with the upstream 3.10 
kernel.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: Re: Re: [PATCH v2] netdev-linux: Prepend the std packet in the TSO packet

2020-02-06 Thread
Hi, Flavio

What's the difference between 
https://github.com/fleitner/ovs/tree/tso-tap-enable-tx-v1 and OVS master? My 
network is very slow; maybe you can tell me which commits are new compared to OVS 
master, so I can grab those commits and apply them to my local OVS tree. That 
would be a faster way to do a quick check. I used current OVS master; I didn't 
apply any special patches except yours.

-----Original Message-----
From: Flavio Leitner [mailto:f...@sysclose.org] 
Sent: February 6, 2020 20:10
To: Yi Yang (杨燚)-云服务集团 
Cc: d...@openvswitch.org; i.maxim...@ovn.org; txfh2...@aliyun.com
Subject: Re: Re: Re: [ovs-dev] Re: [PATCH v2] netdev-linux: Prepend the std packet 
in the TSO packet

On Thu, Feb 06, 2020 at 07:34:35AM +, Yi Yang (杨燚)-云服务集团 wrote:
> Hi, Flavio
> 
> I tried current ovs master and your two patches (this one and the one 
> for tap), tap to tap and veth to veth performance are very bad when 
> tso is on, I think merged tso patch series are wrong for such use 
> cases, BTW, the performance data is normal if I turned off tso "sudo 
> ip netns exec nsXXX ethtool -K vethX tso off", for tap, the same is 
> there (I used your tap patch for this), it is ok when I turned off 
> tso.

Can you try out the code from here instead:
https://github.com/fleitner/ovs/tree/tso-tap-enable-tx-v1 
 
> here is performance data:
> 
> Connecting to host 10.15.1.3, port 5201 [  4] local 10.15.1.2 port 
> 36024 connected to 10.15.1.3 port 5201
> [ ID] Interval   Transfer Bandwidth   Retr  Cwnd
> [  4]   0.00-10.00  sec  7.59 MBytes  6.37 Mbits/sec  2410   7.07 KBytes
> [  4]  10.00-20.00  sec  7.76 MBytes  6.51 Mbits/sec  2496   5.66 KBytes
> [  4]  20.00-30.00  sec  7.99 MBytes  6.70 Mbits/sec  2536   5.66 KBytes
> [  4]  30.00-40.00  sec  7.85 MBytes  6.58 Mbits/sec  2506   5.66 KBytes
> [  4]  40.00-50.00  sec  7.93 MBytes  6.65 Mbits/sec  2556   5.66 KBytes
> [  4]  50.00-60.00  sec  8.25 MBytes  6.92 Mbits/sec  2696   7.07 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval   Transfer Bandwidth   Retr
> [  4]   0.00-60.00  sec  47.4 MBytes  6.62 Mbits/sec  15200 sender
> [  4]   0.00-60.00  sec  47.3 MBytes  6.61 Mbits/sec  receiver
> 
> Server output:
> Accepted connection from 10.15.1.2, port 36022 [  5] local 10.15.1.3 
> port 5201 connected to 10.15.1.2 port 36024
> [ ID] Interval   Transfer Bandwidth
> [  5]   0.00-10.00  sec  7.44 MBytes  6.24 Mbits/sec
> [  5]  10.00-20.00  sec  7.80 MBytes  6.54 Mbits/sec [  5]  
> 20.00-30.00  sec  7.95 MBytes  6.67 Mbits/sec [  5]  30.00-40.00  sec  
> 7.83 MBytes  6.57 Mbits/sec [  5]  40.00-50.00  sec  8.00 MBytes  6.71 
> Mbits/sec [  5]  50.00-60.00  sec  8.23 MBytes  6.91 Mbits/sec

Yeah, that looks like to be the performance only for TCP retransmits. 

Here is your script running with the code I provided before/above:
[root@wsfd-netdev93 yiyang]# ./yiyang-test.sh PING 10.15.1.3 (10.15.1.3) 56(84) 
bytes of data.
64 bytes from 10.15.1.3: icmp_seq=1 ttl=64 time=0.425 ms
64 bytes from 10.15.1.3: icmp_seq=2 ttl=64 time=0.253 ms
64 bytes from 10.15.1.3: icmp_seq=3 ttl=64 time=0.356 ms

--- 10.15.1.3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 44ms rtt 
min/avg/max/mdev = 0.253/0.344/0.425/0.073 ms PING 10.15.1.2 (10.15.1.2) 56(84) 
bytes of data.
64 bytes from 10.15.1.2: icmp_seq=1 ttl=64 time=0.302 ms
64 bytes from 10.15.1.2: icmp_seq=2 ttl=64 time=0.475 ms
64 bytes from 10.15.1.2: icmp_seq=3 ttl=64 time=0.383 ms

--- 10.15.1.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 43ms rtt 
min/avg/max/mdev = 0.302/0.386/0.475/0.074 ms Connecting to host 10.15.1.3, 
port 5201 [  5] local 10.15.1.2 port 59660 connected to 10.15.1.3 port 5201
[ ID] Interval   Transfer Bitrate Retr  Cwnd
[  5]   0.00-10.00  sec  7.31 GBytes  6.28 Gbits/sec  49510187 KBytes   
[  5]  10.00-20.00  sec  7.37 GBytes  6.33 Gbits/sec  51415215 KBytes   
[  5]  20.00-30.00  sec  7.30 GBytes  6.27 Gbits/sec  46197182 KBytes   
[  5]  30.00-40.00  sec  7.30 GBytes  6.27 Gbits/sec  46344202 KBytes   
[  5]  40.00-50.00  sec  7.16 GBytes  6.15 Gbits/sec  49123287 KBytes   
[  5]  50.00-60.00  sec  7.33 GBytes  6.30 Gbits/sec  48734214 KBytes   
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bitrate Retr
[  5]   0.00-60.00  sec  43.8 GBytes  6.27 Gbits/sec  291323 sender
[  5]   0.00-60.00  sec  43.8 GBytes  6.27 Gbits/sec  receiver

Server output:
Accepted connection from 10.15.1.2, port 59658 [  5] local 10.15.1.3 port 5201 
connected to 10.15.1.2 port 59660
[ ID] Interval   Transfer Bitrate
[  5]   0.00-10.00  sec  7.30 GBytes  6.27 Gbits/sec  
[  5]  10.00-20.00  sec  7.37 GByt

[ovs-dev] 答复: 答复: 答复: [PATCH v2] netdev-linux: Prepend the std packet in the TSO packet

2020-02-05 Thread
Hi, Flavio

I tried current ovs master plus your two patches (this one and the one for tap);
tap-to-tap and veth-to-veth performance is very bad when TSO is on, so I think
the merged TSO patch series is broken for such use cases. BTW, the performance
data is normal if I turn off TSO with "sudo ip netns exec nsXXX ethtool -K vethX
tso off"; the same holds for tap (I used your tap patch for this), it is OK when
I turn off TSO.

here is performance data:

Connecting to host 10.15.1.3, port 5201
[  4] local 10.15.1.2 port 36024 connected to 10.15.1.3 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-10.00  sec  7.59 MBytes  6.37 Mbits/sec  2410   7.07 KBytes
[  4]  10.00-20.00  sec  7.76 MBytes  6.51 Mbits/sec  2496   5.66 KBytes
[  4]  20.00-30.00  sec  7.99 MBytes  6.70 Mbits/sec  2536   5.66 KBytes
[  4]  30.00-40.00  sec  7.85 MBytes  6.58 Mbits/sec  2506   5.66 KBytes
[  4]  40.00-50.00  sec  7.93 MBytes  6.65 Mbits/sec  2556   5.66 KBytes
[  4]  50.00-60.00  sec  8.25 MBytes  6.92 Mbits/sec  2696   7.07 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-60.00  sec  47.4 MBytes  6.62 Mbits/sec  15200 sender
[  4]   0.00-60.00  sec  47.3 MBytes  6.61 Mbits/sec  receiver

Server output:
Accepted connection from 10.15.1.2, port 36022
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 36024
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-10.00  sec  7.44 MBytes  6.24 Mbits/sec
[  5]  10.00-20.00  sec  7.80 MBytes  6.54 Mbits/sec
[  5]  20.00-30.00  sec  7.95 MBytes  6.67 Mbits/sec
[  5]  30.00-40.00  sec  7.83 MBytes  6.57 Mbits/sec
[  5]  40.00-50.00  sec  8.00 MBytes  6.71 Mbits/sec
[  5]  50.00-60.00  sec  8.23 MBytes  6.91 Mbits/sec


iperf Done.

You can use the script below to check the performance (it needs root, i.e. sudo
./run-iperf3.sh).

$ cat run-iperf3.sh
#!/bin/bash

ovs-vsctl add-br br-int -- set bridge br-int datapath_type=netdev 
protocols=OpenFlow10,OpenFlow12,OpenFlow13 stp_enable=false
ip link add veth1 type veth peer name vethbr1
ip link add veth2 type veth peer name vethbr2
ip netns add ns01
ip netns add ns02

ip link set veth1 netns ns01
ip link set veth2 netns ns02

ip netns exec ns01 ifconfig lo 127.0.0.1 up
ip netns exec ns01 ifconfig veth1 10.15.1.2/24 up

ip netns exec ns02 ifconfig lo 127.0.0.1 up
ip netns exec ns02 ifconfig veth2 10.15.1.3/24 up

ifconfig vethbr1 0 up
ifconfig vethbr2 0 up


ovs-vsctl add-port br-int vethbr1
ovs-vsctl add-port br-int vethbr2

ip netns exec ns01 ping 10.15.1.3 -c 3
ip netns exec ns02 ping 10.15.1.2 -c 3

ip netns exec ns02 iperf3 -s -i 10 -D
ip netns exec ns01 iperf3 -t 60 -i 10 -c 10.15.1.3 --get-server-output
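
To reproduce the "tso off" numbers mentioned above, additionally run (veth names
as created by the script):

ip netns exec ns01 ethtool -K veth1 tso off
ip netns exec ns02 ethtool -K veth2 tso off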

-邮件原件-
发件人: Flavio Leitner [mailto:f...@sysclose.org] 
发送时间: 2020年2月5日 19:46
收件人: Yi Yang (杨燚)-云服务集团 
抄送: d...@openvswitch.org; i.maxim...@ovn.org; txfh2...@aliyun.com
主题: Re: 答复: [ovs-dev] 答复: [PATCH v2] netdev-linux: Prepend the std packet in 
the TSO packet

On Wed, Feb 05, 2020 at 12:05:23AM +, Yi Yang (杨燚)-云服务集团 wrote:
> Thanks Flavio, which kernel version can support TSO for tap device?

That ioctl was introduced in 2.6.27.
fbl

> 
> -邮件原件-
> 发件人: Flavio Leitner [mailto:f...@sysclose.org]
> 发送时间: 2020年2月5日 1:12
> 收件人: Yi Yang (杨燚)-云服务集团 
> 抄送: d...@openvswitch.org; i.maxim...@ovn.org; txfh2...@aliyun.com
> 主题: Re: [ovs-dev] 答复: [PATCH v2] netdev-linux: Prepend the std packet 
> in the TSO packet
> 
> On Tue, Feb 04, 2020 at 12:00:19PM -0300, Flavio Leitner wrote:
> > On Tue, Feb 04, 2020 at 12:51:24AM +, Yi Yang (杨 D)-云服务集团 wrote:
> > > Hi, Flavio
> > > 
> > > With this one patch and previous several merged TSO-related 
> > > patches, can veth work with "ethtool -K vethX tx on"? I always 
> > > can't figure out why veth can work in dpdk data path when tx 
> > > offload features are on, it looks like you're fixing this big issue, 
> > > right?
> > 
> > If you have tso enabled with dpdk, then veth works with vethX tx on 
> > (which is the default setting for veth). Otherwise TSO is not 
> > enabled and then you need to turn tx offloading off.
> > 
> > You said "can work in dpdk data path when tx ... are on", so I think 
> > it's okay? Not sure though.
> > 
> > > For tap interface, it can't support TSO, do you Redhat guys have 
> > > plan to enable it on kernel side.
> > 
> > Yeah, currently it only works in one direction (OvS -> kernel). I am 
> > looking into this now.
> 
> With TSO enabled on the TAP device:
> 
> Traffic Direction  TSO disabled TSO enabled  
> VM->tap2.98 Gbits/sec   22.9 Gbits/sec
> tap->VM2.29 Gbits/sec   18.0 Gbits/sec
> 
> The code is in

[ovs-dev] 答复: 答复: [PATCH v2] netdev-linux: Prepend the std packet in the TSO packet

2020-02-04 Thread
Thanks Flavio, which kernel version can support TSO for tap device?

-邮件原件-
发件人: Flavio Leitner [mailto:f...@sysclose.org] 
发送时间: 2020年2月5日 1:12
收件人: Yi Yang (杨燚)-云服务集团 
抄送: d...@openvswitch.org; i.maxim...@ovn.org; txfh2...@aliyun.com
主题: Re: [ovs-dev] 答复: [PATCH v2] netdev-linux: Prepend the std packet in the 
TSO packet

On Tue, Feb 04, 2020 at 12:00:19PM -0300, Flavio Leitner wrote:
> On Tue, Feb 04, 2020 at 12:51:24AM +, Yi Yang (杨 D)-云服务集团 wrote:
> > Hi, Flavio
> > 
> > With this one patch and previous several merged TSO-related patches, 
> > can veth work with "ethtool -K vethX tx on"? I always can't figure 
> > out why veth can work in dpdk data path when tx offload features are 
> > on, it looks like you're fixing this big issue, right?
> 
> If you have tso enabled with dpdk, then veth works with vethX tx on 
> (which is the default setting for veth). Otherwise TSO is not enabled 
> and then you need to turn tx offloading off.
> 
> You said "can work in dpdk data path when tx ... are on", so I think 
> it's okay? Not sure though.
> 
> > For tap interface, it can't support TSO, do you Redhat guys have 
> > plan to enable it on kernel side.
> 
> Yeah, currently it only works in one direction (OvS -> kernel). I am 
> looking into this now.

With TSO enabled on the TAP device:

Traffic Direction  TSO disabled TSO enabled  
VM->tap2.98 Gbits/sec   22.9 Gbits/sec
tap->VM2.29 Gbits/sec   18.0 Gbits/sec

The code is in my github branch:
https://github.com/fleitner/ovs/tree/tso-tap-enable-tx-v1

commit 884371df3bf3df836d4c2ab2d62b420339691fe8
Author: Flavio Leitner 
Date:   Tue Feb 4 11:18:49 2020 -0300

netdev-linux: Enable TSO in the TAP device.

Use ioctl TUNSETOFFLOAD if kernel supports to enable TSO
offloading in the tap device.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Flavio Leitner 

Please try it out and let me know if it works for you as well.
Thanks
fbl


> VM->veth   2.96 Gbits/sec   22.6 Gbits/sec
> veth->VM   2.30 Gbits/sec   9.58 Gbits/sec



> See below my iperf3 results as a reference:
> 
> Traffic Direction  TSO disabled TSO enabled  
> VM->tap2.98 Gbits/sec   22.7 Gbits/sec
> VM->veth   2.96 Gbits/sec   22.6 Gbits/sec
> veth->VM   2.30 Gbits/sec   9.58 Gbits/sec
> tap->VM2.29 Gbits/sec   2.19 Gbits/sec
> 
> fbl
>  
> > -邮件原件-
> > 发件人: Flavio Leitner [mailto:f...@sysclose.org]
> > 发送时间: 2020年2月4日 5:46
> > 收件人: d...@openvswitch.org
> > 抄送: Stokes Ian ; Loftus Ciara 
> > ; Ilya Maximets ; Yi 
> > Yang (杨
> > ?D)-云服务集团 ; txfh2007 ; Ben 
> > Pfaff ; Flavio Leitner 
> > 主题: [PATCH v2] netdev-linux: Prepend the std packet in the TSO 
> > packet
> > 
> > Usually TSO packets are close to 50k, 60k bytes long, so to to copy 
> > less bytes when receiving a packet from the kernel change the 
> > approach. Instead of extending the MTU sized packet received and 
> > append with remaining TSO data from the TSO buffer, allocate a TSO 
> > packet with enough headroom to prepend the std packet data.
> > 
> > Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload 
> > support")
> > Suggested-by: Ben Pfaff 
> > Signed-off-by: Flavio Leitner 
> > ---
> >  lib/dp-packet.c|   8 +--
> >  lib/dp-packet.h|   2 +
> >  lib/netdev-linux-private.h |   3 +-
> >  lib/netdev-linux.c | 117 ++---
> >  4 files changed, 78 insertions(+), 52 deletions(-)
> > 
> > V2:
> >   - tso packets tailroom depends on headroom in netdev_linux_rxq_recv()
> >   - iov_len uses packet's tailroom.
> > 
> >   This patch depends on a previous posted patch to work:
> >   Subject: netdev-linux-private: fix max length to be 16 bits
> >   
> > https://mail.openvswitch.org/pipermail/ovs-dev/2020-February/367469.
> > html
> > 
> >   With both patches applied, I can run iperf3 and scp on both directions
> >   with good performance and no issues.
> > 
> > diff --git a/lib/dp-packet.c b/lib/dp-packet.c index 
> > 8dfedcb7c..cd2623500
> > 100644
> > --- a/lib/dp-packet.c
> > +++ b/lib/dp-packet.c
> > @@ -243,8 +243,8 @@ dp_packet_copy__(struct dp_packet *b, uint8_t 
> > *new_base,
> >  
> >  /* Reallocates 'b' so that it has exactly 'new_headroom' and 'new_tailroom'
> >   * bytes of headroom and tailroom, respectivel

[ovs-dev] 答复: 答复: 答复: [PATCH] socket-util: Introduce emulation and wrapper for recvmmsg().

2020-01-07 Thread
Ben, I think the patch using recvmmsg is ready for merge if you want. Basically,
4.15 or later kernels can support TPACKET_V3, but I'm not sure whether recvmmsg
and TPACKET_V3 can coexist. Do you mean we can use the HAVE_TPACKET_V3/V2 config
macros to build different versions for different kernels?

-邮件原件-
发件人: Ben Pfaff [mailto:b...@ovn.org] 
发送时间: 2020年1月8日 4:11
收件人: Yi Yang (杨燚)-云服务集团 
抄送: d...@openvswitch.org
主题: Re: 答复: 答复: [PATCH] socket-util: Introduce emulation and wrapper for 
recvmmsg().

On Mon, Dec 23, 2019 at 12:22:52AM +, Yi Yang (杨燚)-云服务集团 wrote:
> Ben, socket.h in master does include sendmmsg
> 
> https://github.com/openvswitch/ovs/blob/master/include/sparse/sys/sock
> et.h#L165
> 
> Per your explanation, I understood why you call recvmsg there, so I don't 
> have other comments.
> 
> As William explained in his RFC patch, I think TPACKET_V3 is the best way to 
> fix this. I tried af_packet to use veth in OVS DPDK, it's performance is 2 
> times more than my patch, about 4Gbps, for my patch, veth performance is 
> about 1.47Gbps, af_packet just used TPACKET_V2, TPACKET_V3 should be much 
> better than TPACKET_V2 per William's explanation.

OK.  Do you want to continue working to use recvmmsg() in OVS?  Or do you want 
to withdraw the idea in favor of TPACKET_V3?  The possible advantage of 
recvmmsg() is that it's going to be available pretty much everywhere, whereas 
TPACKET_V3 is a more recent addition to Linux.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 答复: [dpdk.org代发][dpdk-dev] Question about using virtio_user in OVS-DPDK

2020-01-01 Thread
William, use the below option for your tap0

sudo ovs-vsctl add-port br-int virtio_user0 -- set Interface virtio_user0 
type=dpdk 
options:dpdk-devargs=net_virtio_user0,iface=tap0,path=/dev/vhost-net,queue_size=1024

virtio_user can also create tap0 if it doesn't exist; remove "iface=tap0" from
the options in that case.

Maybe the OVS port name can't be the same as the tap interface name, but that is
just my guess.
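
For the case where virtio_user should create tap0 itself, a minimal sketch
(device name and queue size are only examples):

sudo ovs-vsctl add-port br-int virtio_user0 -- set Interface virtio_user0 type=dpdk \
    options:dpdk-devargs=net_virtio_user0,path=/dev/vhost-net,queue_size=1024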

-邮件原件-
发件人: dev [mailto:dev-boun...@dpdk.org] 代表 William Tu
发送时间: 2020年1月1日 6:19
收件人:  ; d...@dpdk.org
抄送: Jianfeng Tan 
主题: [dpdk.org代发][dpdk-dev] Question about using virtio_user in OVS-DPDK

Hi,

I'm trying to find a faster way to communicate from userspace OVS to kernel. So 
I create a virtio_user port at OVS-DPDK, and send packets to kernel's tap 
device.

packets in OVS userspace -> virtio-user port -> vhost-net (kernel) -> tap 
device (kernel) As described in paper[1], figure 1 for legacy applications.

But there is no documentation about it. I tried:
1) load vhost-net
# lsmod | grep vhost
vhost_net  32768  0
vhost  57344  1 vhost_net
tap28672  1 vhost_net
tun57344  8 vhost_net

2) start OVS
3) create tap and attach to OVS
ip tuntap add mode tap tap0
ip link set dev tap0 up
ovs-vsctl add-port br0 tap0 -- set interface tap0 type=dpdk \
options:dpdk-devargs=vdev:net_virtio_user1,iface=tap0,path=/dev/vhost-net

So I thought this is a faster channel using virtio ring than readv/writev to 
the tap fd.
But it doesn't work.
2019-12-31T22:06:39.956Z|00033|netdev|WARN|could not create netdev
tap0 of unknown type dpdk
2019-12-31T22:06:39.956Z|00034|bridge|WARN|could not open network device tap0 
(Address family not supported by protocol)

Any suggestions? Or do I understand the concept of virtio_user correctly?

[1] VIRTIO-USER: A New Versatile Channel for Kernel-Bypass Networks Thanks 
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 答复: [PATCH RFC] WIP: netdev-tpacket: Add AF_PACKET v3 support.

2019-12-23 Thread
William, the kind of tap interface you're describing can only be used for a VM;
that is why Open vSwitch had to introduce the internal type for the case I'm
describing.

In the OVS DPDK case, the interface created by the command below is a tap
interface.

ovs-vsctl add-port br-int tapX -- set interface tapX type=internal

It won't work if you create the tap interface the following way:

ip tuntap add tapX mode tap
ovs-vsctl add-port br-int tapX

I have tried af_packet for it and it can't work; I don't think af_xdp can work
for such a tap either, maybe you can double-check this.



-邮件原件-
发件人: William Tu [mailto:u9012...@gmail.com] 
发送时间: 2019年12月24日 8:17
收件人: Yi Yang (杨燚)-云服务集团 
抄送: b...@ovn.org; d...@openvswitch.org; i.maxim...@ovn.org; echau...@redhat.com
主题: Re: [PATCH RFC] WIP: netdev-tpacket: Add AF_PACKET v3 support.

On Sun, Dec 22, 2019 at 4:35 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> Thanks William, af_packet only can open tap interface, it can't create 
> tap interface. Tap interface onlu can be created by the below way
>
> ovs-vsctl add-port tapX -- set interface tapX type=internal
>
> this tap is very special, it is like a mystery to me so far. "ip 
> tuntap add tapX mode tap" can't work for such tap interface.

Why not? What's the error message?
you can create a tapX device using ip tuntap first, and add tapX using OVS

using ovs-vsctl add-port tapX -- set interface tapX type=afxdp

Regards,
William
>
> Anybody can tell me how I can create such a tap interface without using "
> ovs-vsctl add-port tapX"
>
> By the way, I tried af_packet for veth, the performance is very good, 
> it is about 4Gbps on my machine, but it used TPACKET_V2.
>
> -邮件原件-
> 发件人: William Tu [mailto:u9012...@gmail.com]
> 发送时间: 2019年12月21日 1:50
> 收件人: Ben Pfaff 
> 抄送: d...@openvswitch.org; i.maxim...@ovn.org; Yi Yang (杨燚)-云服务集团
> ; echau...@redhat.com
> 主题: Re: [PATCH RFC] WIP: netdev-tpacket: Add AF_PACKET v3 support.
>
> On Thu, Dec 19, 2019 at 08:44:30PM -0800, Ben Pfaff wrote:
> > On Thu, Dec 19, 2019 at 04:41:25PM -0800, William Tu wrote:
> > > Currently the performance of sending packets from userspace ovs to 
> > > kernel veth device is pretty bad as reported from YiYang[1].
> > > The patch adds AF_PACKET v3, tpacket v3, as another way to tx/rx 
> > > packet to linux device, hopefully showing better performance.
> > >
> > > AF_PACKET v3 should get closed to 1Mpps, as shown[2]. However, my 
> > > current patch using iperf tcp shows only 1.4Gbps, maybe I'm doing 
> > > something wrong.  Also DPDK has similar implementation using 
> > > AF_PACKET v2[3].  This is still work-in-progress but any feedbacks 
> > > are welcome.
> >
> > Is there a good reason that this is implemented as a new kind of 
> > netdev rather than just a new way for the existing netdev 
> > implementation to do packet i/o?
>
> The AF_PACKET v3 is more like PMD mode driver (the netdev-afxdp and 
> other dpdk netdev), which has its own memory mgmt, ring structure, and 
> polling the descriptors. So I implemented it as a new kind. I feel its 
> pretty different than tap or existing af_packet netdev.
>
> But integrate it to the existing netdev (lib/netdev-linux.c) is also OK.
>
> William
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 答复: iperf tcp issue on veth using afxdp

2019-12-23 Thread
Thanks Yifeng, those are good performance numbers. I'll run it on my machine and
get back to you with my results.

-邮件原件-
发件人: Yifeng Sun [mailto:pkusunyif...@gmail.com] 
发送时间: 2019年12月24日 6:59
收件人: Yi Yang (杨燚)-云服务集团 
抄送: u9012...@gmail.com; d...@openvswitch.org; i.maxim...@ovn.org; 
echau...@redhat.com
主题: Re: [ovs-dev] iperf tcp issue on veth using afxdp

Hi Yi,

I don't have OVS DPDK setup yet. I need to set it up first.

On my machine, afxdp can reach 4.6Gbps.

[  3]  0.0- 1.0 sec   564 MBytes  4.73 Gbits/sec
[  3]  1.0- 2.0 sec   553 MBytes  4.64 Gbits/sec
[  3]  2.0- 3.0 sec   558 MBytes  4.68 Gbits/sec
[  3]  3.0- 4.0 sec   556 MBytes  4.66 Gbits/sec
[  3]  4.0- 5.0 sec   545 MBytes  4.57 Gbits/sec
[  3]  5.0- 6.0 sec   554 MBytes  4.64 Gbits/sec
[  3]  6.0- 7.0 sec   548 MBytes  4.60 Gbits/sec
[  3]  7.0- 8.0 sec   548 MBytes  4.60 Gbits/sec
[  3]  8.0- 9.0 sec   550 MBytes  4.62 Gbits/sec
[  3]  9.0-10.0 sec   548 MBytes  4.60 Gbits/sec

Thanks,
Yifeng

On Sun, Dec 22, 2019 at 4:40 PM Yi Yang (杨燚)-云服务集团  wrote:
>
> Hi, Yifeng
>
> I'll try it again. By the way, did you try af_packet for veth in OVS DPDK? In 
> my machine it can reach 4Gbps, do you think af_xdp can reach this number?
>
> -邮件原件-
> 发件人: Yifeng Sun [mailto:pkusunyif...@gmail.com]
> 发送时间: 2019年12月21日 9:11
> 收件人: William Tu 
> 抄送:  ; Ilya Maximets 
> ; Eelco Chaudron ; Yi Yang 
> (杨燚)-云服务集团 
> 主题: Re: [ovs-dev] iperf tcp issue on veth using afxdp
>
> This seems to be related to netdev-afxdp's batch size bigger than kernel's 
> xdp batch size.
> I created a patch to fix it.
>
> https://patchwork.ozlabs.org/patch/1214397/
>
> Could anyone take a look at this patch?
>
> Thanks,
> Yifeng
>
> On Fri, Nov 22, 2019 at 9:52 AM William Tu  wrote:
> >
> > Hi Ilya and Eelco,
> >
> > Yiyang reports very poor TCP performance on his setup and I can also 
> > reproduce it on my machine. Somehow I think this might be a kernel 
> > issue, but I don't know where to debug this. Need your suggestion 
> > about how to debug.
> >
> > So the setup is like the system-traffic, creating 2 namespaces and 
> > veth devices and attach to OVS. I do remember to turn off tx offload 
> > and ping, UDP, nc (tcp-mode) works fine.
> >
> > TCP using iperf drops to 0Mbps after 4 seconds.
> > At server side:
> > root@osboxes:~/ovs# ip netns exec at_ns0 iperf -s
> > 
> > Server listening on TCP port 5001
> > TCP window size:  128 KByte (default)
> > 
> > [  4] local 10.1.1.1 port 5001 connected with 10.1.1.2 port 40384 
> > Waiting for server threads to complete. Interrupt again to force quit.
> >
> > At client side
> > root@osboxes:~/bpf-next# ip netns exec at_ns1 iperf -c 10.1.1.1 -i 1 
> > -t 10
> > 
> > Client connecting to 10.1.1.1, TCP port 5001 TCP window size: 85.0 
> > KByte (default)
> > 
> > [  3] local 10.1.1.2 port 40384 connected with 10.1.1.1 port 5001
> > [ ID] Interval   Transfer Bandwidth
> > [  3]  0.0- 1.0 sec  17.0 MBytes   143 Mbits/sec
> > [  3]  1.0- 2.0 sec  9.62 MBytes  80.7 Mbits/sec [  3]  2.0- 3.0 sec
> > 6.75 MBytes  56.6 Mbits/sec [  3]  3.0- 4.0 sec  11.0 MBytes  92.3 
> > Mbits/sec [  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec [  3]  6.0-
> > 7.0 sec  0.00 Bytes  0.00 bits/sec [  3]  7.0- 8.0 sec  0.00 Bytes
> > 0.00 bits/sec [  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec [  3]
> > 9.0-10.0 sec  0.00 Bytes  0.00 bits/sec [  3] 10.0-11.0 sec  0.00 
> > Bytes  0.00 bits/sec
> >
> > (after this, even ping stops working)
> >
> > Script to reproduce
> > -
> > ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
> >
> > ip netns add at_ns0
> > ip link add p0 type veth peer name afxdp-p0 ip link set p0 netns
> > at_ns0 ip link set dev afxdp-p0 up ovs-vsctl add-port br0 afxdp-p0
> >
> > ovs-vsctl -- set interface afxdp-p0 options:n_rxq=1 type="afxdp"
> > options:xdp-mode=native
> > ip netns exec at_ns0 sh << NS_EXEC_HEREDOC ip addr add "10.1.1.1/24"
> > dev p0 ip link set dev p0 up NS_EXEC_HEREDOC
> >
> > ip netns add at_ns1
> > ip link add p1 type veth peer name afxdp-p1 ip link set p1 netns
> > at_ns1 ip link set dev afxdp-p1 up ovs-vsctl add-port br0 afxdp-p1 
> > -- \
> >set interface afxdp-p1 options:n_rxq=1 type="afxdp"
> > option

[ovs-dev] 答复: iperf tcp issue on veth using afxdp

2019-12-22 Thread
Hi, Yifeng

I'll try it again. By the way, did you try af_packet for veth in OVS DPDK? On my
machine it can reach 4Gbps; do you think af_xdp can reach that number?

-邮件原件-
发件人: Yifeng Sun [mailto:pkusunyif...@gmail.com] 
发送时间: 2019年12月21日 9:11
收件人: William Tu 
抄送:  ; Ilya Maximets 
; Eelco Chaudron ; Yi Yang (杨燚)-云服务集团 

主题: Re: [ovs-dev] iperf tcp issue on veth using afxdp

This seems to be related to netdev-afxdp's batch size bigger than kernel's xdp 
batch size.
I created a patch to fix it.

https://patchwork.ozlabs.org/patch/1214397/

Could anyone take a look at this patch?

Thanks,
Yifeng

On Fri, Nov 22, 2019 at 9:52 AM William Tu  wrote:
>
> Hi Ilya and Eelco,
>
> Yiyang reports very poor TCP performance on his setup and I can also 
> reproduce it on my machine. Somehow I think this might be a kernel 
> issue, but I don't know where to debug this. Need your suggestion 
> about how to debug.
>
> So the setup is like the system-traffic, creating 2 namespaces and 
> veth devices and attach to OVS. I do remember to turn off tx offload 
> and ping, UDP, nc (tcp-mode) works fine.
>
> TCP using iperf drops to 0Mbps after 4 seconds.
> At server side:
> root@osboxes:~/ovs# ip netns exec at_ns0 iperf -s
> 
> Server listening on TCP port 5001
> TCP window size:  128 KByte (default)
> 
> [  4] local 10.1.1.1 port 5001 connected with 10.1.1.2 port 40384 
> Waiting for server threads to complete. Interrupt again to force quit.
>
> At client side
> root@osboxes:~/bpf-next# ip netns exec at_ns1 iperf -c 10.1.1.1 -i 1 
> -t 10
> 
> Client connecting to 10.1.1.1, TCP port 5001 TCP window size: 85.0 
> KByte (default)
> 
> [  3] local 10.1.1.2 port 40384 connected with 10.1.1.1 port 5001
> [ ID] Interval   Transfer Bandwidth
> [  3]  0.0- 1.0 sec  17.0 MBytes   143 Mbits/sec
> [  3]  1.0- 2.0 sec  9.62 MBytes  80.7 Mbits/sec [  3]  2.0- 3.0 sec  
> 6.75 MBytes  56.6 Mbits/sec [  3]  3.0- 4.0 sec  11.0 MBytes  92.3 
> Mbits/sec [  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec [  3]  6.0- 
> 7.0 sec  0.00 Bytes  0.00 bits/sec [  3]  7.0- 8.0 sec  0.00 Bytes  
> 0.00 bits/sec [  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec [  3]  
> 9.0-10.0 sec  0.00 Bytes  0.00 bits/sec [  3] 10.0-11.0 sec  0.00 
> Bytes  0.00 bits/sec
>
> (after this, even ping stops working)
>
> Script to reproduce
> -
> ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
>
> ip netns add at_ns0
> ip link add p0 type veth peer name afxdp-p0 ip link set p0 netns 
> at_ns0 ip link set dev afxdp-p0 up ovs-vsctl add-port br0 afxdp-p0
>
> ovs-vsctl -- set interface afxdp-p0 options:n_rxq=1 type="afxdp"
> options:xdp-mode=native
> ip netns exec at_ns0 sh << NS_EXEC_HEREDOC ip addr add "10.1.1.1/24" 
> dev p0 ip link set dev p0 up NS_EXEC_HEREDOC
>
> ip netns add at_ns1
> ip link add p1 type veth peer name afxdp-p1 ip link set p1 netns 
> at_ns1 ip link set dev afxdp-p1 up ovs-vsctl add-port br0 afxdp-p1 -- 
> \
>set interface afxdp-p1 options:n_rxq=1 type="afxdp"
> options:xdp-mode=native
>
> ip netns exec at_ns1 sh << NS_EXEC_HEREDOC ip addr add "10.1.1.2/24" 
> dev p1 ip link set dev p1 up NS_EXEC_HEREDOC
>
> ethtool -K afxdp-p0 tx off
> ethtool -K afxdp-p1 tx off
> ip netns exec at_ns0 ethtool -K p0 tx off ip netns exec at_ns1 ethtool 
> -K p1 tx off
>
> ip netns exec at_ns0 ping  -c 10 -i .2 10.1.1.2 echo "ip netns exec 
> at_ns1 iperf -c 10.1.1.1 -i 1 -t 10"
> ip netns exec at_ns0 iperf -s
>
> Thank you
> William
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 答复: 答复: [PATCH RFC] WIP: netdev-tpacket: Add AF_PACKET v3 support.

2019-12-22 Thread
Thanks William,
https://www.kernel.org/doc/Documentation/networking/packet_mmap.txt is a very
good document for TPACKET_V*; I completely agree TPACKET_V3 is the best way to
improve tap and veth performance. Can you tell us how to use your patch?
lib/netdev-linux.c is still there, so which recv function will be called when I
add a veth/tap in OVS DPDK?

-邮件原件-
发件人: William Tu [mailto:u9012...@gmail.com] 
发送时间: 2019年12月21日 1:43
收件人: Yi Yang (杨燚)-云服务集团 
抄送: d...@openvswitch.org; i.maxim...@ovn.org; b...@ovn.org; echau...@redhat.com
主题: Re: 答复: [PATCH RFC] WIP: netdev-tpacket: Add AF_PACKET v3 support.

On Fri, Dec 20, 2019 at 06:09:08AM +, Yi Yang (杨燚)-云服务集团 wrote:
> Hi, William
> 
> What kernel version can support AF_PACKET v3? I can try it with your patch.

Hi Yiyang,

Kernel +4.0 should have v3 support.

I'm also reading this doc:
https://www.kernel.org/doc/Documentation/networking/packet_mmap.txt

---
+ AF_PACKET TPACKET_V3 example
---

AF_PACKET's TPACKET_V3 ring buffer can be configured to use non-static frame 
sizes by doing it's own memory management. It is based on blocks where polling 
works on a per block basis instead of per ring as in TPACKET_V2 and predecessor.

It is said that TPACKET_V3 brings the following benefits:
 *) ~15 - 20% reduction in CPU-usage
 *) ~20% increase in packet capture rate
 *) ~2x increase in packet density
 *) Port aggregation analysis
 *) Non static frame size to capture entire packet payload

So it seems to be a good candidate to be used with packet fanout.

DPDK library is using TPACKET_V2, and V3 is better due to:
TPACKET_V2 --> TPACKET_V3:
- Flexible buffer implementation for RX_RING:
1. Blocks can be configured with non-static frame-size
2. Read/poll is at a block-level (as opposed to packet-level)
3. Added poll timeout to avoid indefinite user-space wait
   on idle links
4. Added user-configurable knobs:
4.1 block::timeout
4.2 tpkt_hdr::sk_rxhash
- RX Hash data available in user space
- TX_RING semantics are conceptually similar to TPACKET_V2;

Thanks
William

> 
> -邮件原件-
> 发件人: William Tu [mailto:u9012...@gmail.com]
> 发送时间: 2019年12月20日 8:41
> 收件人: d...@openvswitch.org
> 抄送: i.maxim...@ovn.org; Yi Yang (杨燚)-云服务集团 ; 
> b...@ovn.org; echau...@redhat.com
> 主题: [PATCH RFC] WIP: netdev-tpacket: Add AF_PACKET v3 support.
> 
> Currently the performance of sending packets from userspace ovs to 
> kernel veth device is pretty bad as reported from YiYang[1].
> The patch adds AF_PACKET v3, tpacket v3, as another way to tx/rx 
> packet to linux device, hopefully showing better performance.
> 
> AF_PACKET v3 should get closed to 1Mpps, as shown[2]. However, my 
> current patch using iperf tcp shows only 1.4Gbps, maybe I'm doing something 
> wrong.
> Also DPDK has similar implementation using AF_PACKET v2[3].  This is 
> still work-in-progress but any feedbacks are welcome.
> 
> [1] https://patchwork.ozlabs.org/patch/1204939/
> [2] slide 18, https://www.netdevconf.info/2.2/slides/karlsson-afpacket-talk.
> pdf
> [3] dpdk/drivers/net/af_packet/rte_eth_af_packet.c
> ---
>  lib/automake.mk|   2 +
>  lib/netdev-linux-private.h |  23 +++
>  lib/netdev-linux.c |  24 ++-
>  lib/netdev-provider.h  |   1 +
>  lib/netdev-tpacket.c   | 487
> +
>  lib/netdev-tpacket.h   |  43 
>  lib/netdev.c   |   1 +
>  7 files changed, 580 insertions(+), 1 deletion(-)  create mode 100644 
> lib/netdev-tpacket.c  create mode 100644 lib/netdev-tpacket.h
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 答复: 答复: [PATCH] socket-util: Introduce emulation and wrapper for recvmmsg().

2019-12-22 Thread
Ben, socket.h in master does include sendmmsg

https://github.com/openvswitch/ovs/blob/master/include/sparse/sys/socket.h#L165

Per your explanation, I understood why you call recvmsg there, so I don't have 
other comments.

As William explained in his RFC patch, I think TPACKET_V3 is the best way to fix
this. I tried af_packet with veth in OVS DPDK, and its performance is about
twice that of my patch, roughly 4Gbps versus about 1.47Gbps for veth with my
patch. af_packet only used TPACKET_V2, and TPACKET_V3 should be much better than
TPACKET_V2 per William's explanation.
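
Roughly what I used to attach the veth through DPDK's af_packet PMD (the vdev
name and the veth name are only examples):

sudo ovs-vsctl add-port br-int afpacket0 -- set Interface afpacket0 type=dpdk \
    options:dpdk-devargs=net_af_packet0,iface=vethbr1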

-邮件原件-
发件人: Ben Pfaff [mailto:b...@ovn.org] 
发送时间: 2019年12月21日 4:51
收件人: Yi Yang (杨燚)-云服务集团 
抄送: d...@openvswitch.org
主题: Re: 答复: [PATCH] socket-util: Introduce emulation and wrapper for recvmmsg().

On Fri, Dec 20, 2019 at 01:25:29AM +, Yi Yang (杨 D)-云服务集团 wrote:
> Current ovs matser has included sendmmsg declaration in 
> include/sparse/sys/socket.h

I believe you are mistaken.

> int sendmmsg(int, struct mmsghdr *, unsigned int, unsigned int);
> 
> I saw  "+^L" in your patch.

Yes, OVS uses page breaks to separate logical sections of code.  The coding 
style document mentions this.

> --- a/lib/socket-util.c
> +++ b/lib/socket-util.c
> @@ -1283,3 +1283,59 @@ wrap_sendmmsg(int fd, struct mmsghdr *msgs, 
> unsigned int n, unsigned int flags)  }  #endif  #endif
> +^L
> +#ifndef _WIN32 /* Avoid using recvmsg on Windows entirely. */
> 
> +#undef recvmmsg
> +int
> +wrap_recvmmsg(int fd, struct mmsghdr *msgs, unsigned int n,
> +  int flags, struct timespec *timeout) {
> +ovs_assert(!timeout);   /* XXX not emulated */
> +
> +static bool recvmmsg_broken = false;
> +if (!recvmmsg_broken) {
> +int save_errno = errno;
> +int retval = recvmmsg(fd, msgs, n, flags, timeout);
> +if (retval >= 0 || errno != ENOSYS) {
> +return retval;
> +}
> +recvmmsg_broken = true;
> +errno = save_errno;
> +}
> +return emulate_recvmmsg(fd, msgs, n, flags, timeout); } #endif
> 
> I don't understand why call recvmmsg here although we have known 
> recvmmsg isn't defined,

Can you explain that comment?  I don't believe that the code tries to call 
recvmmsg() when it is not defined.  The code inside #ifndef HAVE_SENDMMSG only 
uses emulate_recvmmsg(), which itself only calls recvmsg(), not recvmmsg().

> I don't think "static bool recvmmsg_broken" is thread-safe. 

It is thread-safe enough, because it is merely an optimization: if the value is 
wrong, then at most the code gets a little bit slower.

> I think we can completely remove the below part if we do know recvmmsg 
> isn't defined (I think autoconf can detect it very precisely, we 
> needn't to do runtime check for this)
> +static bool recvmmsg_broken = false;
> +if (!recvmmsg_broken) {
> +int save_errno = errno;
> +int retval = recvmmsg(fd, msgs, n, flags, timeout);
> +if (retval >= 0 || errno != ENOSYS) {
> +return retval;
> +}
> +recvmmsg_broken = true;
> +errno = save_errno;
> +}

There are three cases:

1. The C library does not have recvmmsg().  Then we cannot call it at
   all.  In this case, HAVE_SENDMMSG is false and the "#ifndef
   HAVE_SENDMMSG" fork will use emulate_recvmmsg().

2. The C library has recvmmsg() but the kernel does not, because it is
   too old.  Then wrap_recvmmsg() will receive an ENOSYS error from the
   kernel, call emulate_recvmmsg(), and set recvmmsg_broken so that
   future calls don't have to bother going into the kernel at all.

3. The C library and the kernel both have recvmmsg().  Then
   wrap_recvmmsg() will call recvmmsg() and either succeed or get back
   some error other than ENOSYS.  recvmmsg_broken will remain false, and
   all future calls to recvmmsg() will also take the kernel fast path.

Autoconf cannot distinguish cases 2 and 3, nor can anything that runs at build 
time, because there is no way to guess whether the runtime kernel matches the 
build time kernel.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 答复: 答复: [PATCH] Use batch process recv for tap and raw socket in netdev datapath

2019-12-19 Thread
Ben, per my understanding, the buffers are allocated in
netdev_linux_batch_rxq_recv_sock, so they should already be thread-local. Do you
mean maintaining static buffers per thread (declared as a thread-local storage
array)? I don't think that is feasible; the buffers may not be free at the next
receive because the ones from the previous receive haven't been consumed yet.

I'll try your recvmmsg emulation patch; I didn't see it before, as I don't check
all the patches on ovs-dev, there are really too many :-)

-邮件原件-
发件人: Ben Pfaff [mailto:b...@ovn.org] 
发送时间: 2019年12月19日 23:56
收件人: Yi Yang (杨燚)-云服务集团 
抄送: yang_y...@163.com; ovs-dev@openvswitch.org; ian.sto...@intel.com
主题: Re: 答复: [PATCH] Use batch process recv for tap and raw socket in netdev 
datapath

On Wed, Dec 18, 2019 at 02:01:47AM +, Yi Yang (杨 D)-云服务集团 wrote:
> Ben, thank for your review, for recvmmsg, we have to prepare some 
> buffers for it, but we have no way to know how many packets are there 
> for socket, so these mallocs are must-have overhead, maybe 
> self-adaptive malloc mechanism is better, for example, the first 
> receive just mallocs 4 buffers, if it receives 4 buffers successfully, 
> we can increase it to 8, till it is up to 32, if it can't receive all 
> the buffers, we can decrease it by one half, but this will make code 
> complicated a bit.

I don't know whether this is actually a performance problem in practice.
My thought is caching: maintain a per-thread collection of buffers and receive 
into those, then return only the ones that actually got populated and keep the 
rest for next time.

> Your fix is right, I should be set to 0 when retval < 0, thank for 
> your review again, I'll update it with your fix patch and send another 
> version.

Would you mind reviewing the patch I posted that adds recvmmsg() emulation for 
systems that don't have it?  I CCed you on it.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 答复: 答复: [openvswitch.org代发] [PATCH v2] netdev-afxdp: Best-effort configuration of XDP mode.

2019-11-19 Thread
Ilya, got it, thanks a lot.
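
So for veth ports the safer choice is the native XDP mode, e.g. as in the
earlier reproduction scripts (port name is only an example):

ovs-vsctl set interface afxdp-p0 type="afxdp" options:xdp-mode=native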

-邮件原件-
发件人: Ilya Maximets [mailto:i.maxim...@ovn.org] 
发送时间: 2019年11月19日 19:55
收件人: Yi Yang (杨燚)-云服务集团 ; i.maxim...@ovn.org; 
ovs-dev@openvswitch.org
主题: Re: 答复: [openvswitch.org代发][ovs-dev] [PATCH v2] netdev-afxdp: Best-effort 
configuration of XDP mode.

On 19.11.2019 10:00, Yi Yang (杨燚)-云服务集团 wrote:
> Hi, Ilya
> 
> Can you explain what kernel limitations are for TCP for veth? I can't 
> understand why veth has such limitations only for TCP. I saw a veth 
> bug 
> (https://tech.vijayp.ca/linux-kernel-bug-delivers-corrupt-tcp-ip-data-
> to-mes
> os-kubernetes-docker-containers-4986f88f7a19) but it has been fixed in 2016.

Hi.

Have you read the issue referenced in docs:
https://github.com/cilium/cilium/issues/3077
?

In short, TCP stack clones the packets and netif_receive_generic_xdp() drops 
all the cloned packets. Native XDP for veth seems to work fine.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 答复: Re: 答复: [ovs-discuss] why action "meter" only can be specified once?

2019-08-12 Thread
Thanks Ilya, your proposal works for me, not bad :-)
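
For the record, a concrete version of that layout with ovs-ofctl (OpenFlow 1.3;
the bridge name and the final actions are only examples):

ovs-ofctl -O OpenFlow13 add-flow br-int "table=0,ip,nw_src=10.0.0.0/24,actions=meter:4,resubmit(,1)"
ovs-ofctl -O OpenFlow13 add-flow br-int "table=1,ip,nw_src=10.0.0.2,actions=meter:1,normal"
ovs-ofctl -O OpenFlow13 add-flow br-int "table=1,ip,nw_src=10.0.0.3,actions=meter:2,normal"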

-邮件原件-
发件人: Ilya Maximets [mailto:i.maxim...@samsung.com] 
发送时间: 2019年8月12日 14:48
收件人: ovs-dev@openvswitch.org; Yi Yang (杨燚)-云服务集团 
抄送: William Tu ; Ben Pfaff 
主题: Re: Re: [ovs-dev] 答复: [ovs-discuss] why action "meter" only can be 
specified once?

Hi.
If you'll try to use more recent OVS version, you'll get more informative error 
message:
ovs-ofctl: duplicate meter instruction not allowed, for OpenFlow 1.1+ 
compatibility

So, it seems like an issue between different versions of OF standards.

Anyway, you may overcome this issue by splitting your flows in two parts:

table=0,ip,nw_src=10.0.0.0/24 actions=meter:4,resubmit:1
table=1,ip,nw_src=10.0.0.2actions=meter:1, ...other-actions
table=1,ip,nw_src=10.0.0.3actions=meter:2, ...other-actions
table=1,ip,nw_src=10.0.0.4actions=meter:3, ...other-actions

Best regards, Ilya Maximets.

> William, I have several flows to share a meter, at the time, every flow has 
> its own meter, they look like the below:
> 
> table=0,ip,nw_src=10.0.0.2 actions=meter:1,meter:4, ...other-actions
> table=0,ip,nw_src=10.0.0.3 actions=meter:2,meter:4, ...other-actions
> table=0,ip,nw_src=10.0.0.3 actions=meter:3,meter:4, ...other-actions
> 
> meter 4 are shared by three flows so that we can let tenants leverage their 
> bandwidth more efficient between three flows, so total limit is 100Mbps, 
> every one limit is 50Mbps, if flow 3 has lower bandwidth utilization rate, 
> flow 1 and 2 can leverage it. I can get every flow stats and also can get all 
> the flows stats by meter 4, that is what I'm saying.
> 
> I think supporting it isn't a big technical issue, it will be great if you 
> can add this feature, I think it is very helpful, we have real use cases 
> which is called shared bandwith.
> 
> -邮件原件-
> 发件人: William Tu [mailto:u9012063 at gmail.com]
> 发送时间: 2019年8月10日 0:54
> 收件人: Yi Yang (杨燚)-云服务集团 
> 抄送: ovs-dev at openvswitch.org; ovs-discuss at openvswitch.org
> 主题: Re: [ovs-discuss] why action "meter" only can be specified once?
> 
> On Mon, Aug 5, 2019 at 12:39 AM Yi Yang (杨燚)-云服务集团  
> wrote:
>>
>> Hi, all
>>
>>
>>
>> I was told meter only can be specified once, but actually there is such case 
>> existing, i.e. multiple flows share a total bandwidth, but every flow also 
>> has its own bandwidth limit, by two meters, we can not only get every flow 
>> stats but also get total stats, I think this is very reasonable user 
>> scenario.
>>
> 
> I don't understand your use case.
> You can create multiple meters and each flow can use it own meter to rate 
> limit, right?
>>
>>
>> ovs-ofctl: instruction meter may be specified only once
> How do you get this error?
> 
> Thanks
> William

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 答复: [ovs-discuss] why action "meter" only can be specified once?

2019-08-11 Thread
William, I have several flows sharing one meter while, at the same time, every
flow also has its own meter; they look like the below:

table=0,ip,nw_src=10.0.0.2 actions=meter:1,meter:4, ...other-actions
table=0,ip,nw_src=10.0.0.3 actions=meter:2,meter:4, ...other-actions
table=0,ip,nw_src=10.0.0.4 actions=meter:3,meter:4, ...other-actions

Meter 4 is shared by the three flows so that tenants can use their bandwidth
more efficiently across them: the total limit is 100Mbps and each flow's limit
is 50Mbps, so if flow 3 has a lower bandwidth utilization rate, flows 1 and 2
can use the leftover. I can get per-flow stats and also get the aggregate stats
of all the flows via meter 4, that is what I'm saying.

I think supporting this isn't a big technical issue; it would be great if you
could add this feature, it is very helpful, and we have real use cases for it,
called shared bandwidth.
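
If multiple meter instructions were allowed, the meters themselves would be
created as usual, e.g. (OpenFlow 1.3, rates as described above):

ovs-ofctl -O OpenFlow13 add-meter br-int "meter=1,kbps,band=type=drop,rate=50000"
ovs-ofctl -O OpenFlow13 add-meter br-int "meter=4,kbps,band=type=drop,rate=100000"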

-邮件原件-
发件人: William Tu [mailto:u9012...@gmail.com] 
发送时间: 2019年8月10日 0:54
收件人: Yi Yang (杨燚)-云服务集团 
抄送: ovs-dev@openvswitch.org; ovs-disc...@openvswitch.org
主题: Re: [ovs-discuss] why action "meter" only can be specified once?

On Mon, Aug 5, 2019 at 12:39 AM Yi Yang (杨燚)-云服务集团  wrote:
>
> Hi, all
>
>
>
> I was told meter only can be specified once, but actually there is such case 
> existing, i.e. multiple flows share a total bandwidth, but every flow also 
> has its own bandwidth limit, by two meters, we can not only get every flow 
> stats but also get total stats, I think this is very reasonable user scenario.
>

I don't understand your use case.
You can create multiple meters and each flow can use it own meter to rate 
limit, right?
>
>
> ovs-ofctl: instruction meter may be specified only once
How do you get this error?

Thanks
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 答复: 答复: How can we improve veth and tap performance in OVS DPDK?

2019-07-31 Thread
Got it, thanks Ilya.
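
For the record, "disable tx offload" here means turning it off on both ends of
each veth pair, e.g. (interface names are only examples):

ethtool -K vethbr1 tx off
ip netns exec ns01 ethtool -K veth1 tx off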

-邮件原件-
发件人: Ilya Maximets [mailto:i.maxim...@samsung.com] 
发送时间: 2019年7月31日 15:50
收件人: Yi Yang (杨燚)-云服务集团 ; ovs-dev@openvswitch.org
主题: Re: 答复: [ovs-dev] How can we improve veth and tap performance in OVS DPDK?

On 31.07.2019 3:44, Yi Yang (杨燚)-云服务集团 wrote:
> Thanks Ilya, it works after disable tx offload, the performance is indeed 
> very poor,> about one tenth of ovs kernel. This is a very very strong warning 
> for us, I strongly> suggest ovs document should tell ovs DPDK users the truth 
> in bold word.

The truth is that DPDK is intended to bypass the kernel to achieve performance, 
but you're going to push all the traffic back to kernel.  In this case you 
will, obviously, never get performance better than the performance of your 
kernel anyway (even with offloading support).  So, it makes *no sense* using 
DPDK in this kind of setup and sending packets back and forth between the 
kernel and userspace. Just keep everything in kernel.

> 
> For ovn, last year, the information I got is ovn can't support VXLAN, is it 
> true so> far? In my mind, GENEVE is worse than VXLAN as far as the 
> performance is concerned.

At least, it should be much better than pushing all the traffic back to kernel.
If you don't like OVN, use ODL or any other SDN controller.

Best regards, Ilya Maximets.

> 
> -邮件原件-
> 发件人: Ilya Maximets [mailto:i.maxim...@samsung.com]
> 发送时间: 2019年7月30日 0:18
> 收件人: ovs-dev@openvswitch.org; Yi Yang (杨燚)-云服务集团 
> 主题: Re: [ovs-dev] How can we improve veth and tap performance in OVS DPDK?
> 
> 
> 
> On 29.07.2019 19:07, Ilya Maximets wrote:
>>> Hi, all
>>> We’re trying OVS DPDK in openstack cloud, but a big warn makes us hesitate.
>>> Floating IP and qrouter use tap interfaces which are attached into 
>>> br-int, SNAT also should use similar way, so OVS DPDK will impact on 
>>> VM network performance significantly, I believe many cloud providers 
>>> have deployed OVS DPDK, my questions are:
>>>
>>> 1.   Do we have some known ways to improve this?
>>
>> As RedHat OSP guide suggests, you could use any SDN controller (like
>> OpenDayLight) or, alternatively, you could use OVN as a network provider for 
>> OpenStack.
>> This way all the required functionality will be handled by the 
>> OpenFlow rules inside OVS without necessity to send traffic over veths and 
>> taps to Linux Kernel.
>>
>>> 2.   Is there any existing effort for this? Veth in kubernetes should
>>> have the same performance issue in OVS DPDK case.
>>
>> It makes no sense right now to run OVS-DPDK on veth pairs in Kubernetes.
>> The only benefit from OVS-DPDK in K8s might be from using 
>> virtio-vhost-user
> 
> I meant virtio-user ports.
> 
>> ports instead of veths for container networking. But this is not implemented.
>> Running DPDK apps inside K8s containers has a lot of unresolved issues right 
>> now.
>>
>> One approach that could improve performance of veths and taps in the 
>> future is using AF_XDP sockets which are supported in OVS now. But 
>> AF_XDP doesn't work properly for virtual interfaces (veths, taps) yet due to 
>> issues in Linux Kernel.
>>
>>>
>>> I also found a very weird issue. I added two veth pairs into ovs 
>>> bridge and ovs DPDK bridge, for ovs case, iperf3 can work well, but 
>>> it can’t for OVS DPDK case, what’s wrong.
>>
>> This is exactly same issue as we already discussed previously. 
>> Disable tx offloading on veth pairs and everything will work.
>>
>> Best regards, Ilya Maximets.
>>
>>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 答复: 答复: 答复: [ovs-discuss] How can I delete flows which match a given cookie value?

2019-07-18 Thread
Cookie is enough; I just think matching on two fields could avoid deleting flows
by mistake.
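
For reference, Ben's suggestion below (encoding the 16-bit priority into the
64-bit cookie) could look roughly like this, with made-up cookie values:

sudo ovs-ofctl -Oopenflow13 add-flow br-int "cookie=0x12340001,priority=1,icmp,actions=drop"
sudo ovs-ofctl -Oopenflow13 del-flows br-int "cookie=0x0000000000000001/0x000000000000ffff"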

-邮件原件-
发件人: Ben Pfaff [mailto:b...@ovn.org] 
发送时间: 2019年7月18日 11:34
收件人: Yi Yang (杨燚)-云服务集团 
抄送: ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
主题: Re: 答复: 答复: [ovs-discuss] How can I delete flows which match a given cookie 
value?

Hmm.  I can see how that would be inconvenient.

If you have control over the cookies, and enough space in them, then you could 
encode the (16-bit) priority as part of the (64-bit) cookie.

On Thu, Jul 18, 2019 at 12:18:55AM +, Yi Yang (杨燚)-云服务集团 wrote:
> Got it, thanks Ben, I want to use cookie and priority to del-flows, but only 
> --strict option can handle priority. So I only can use cookie to del-flows in 
> batch, --strict only can delete one flow.
> 
> sudo ovs-ofctl -Oopenflow13 del-flows br-int "cookie=0x01/-1,priority=1"
> ovs-ofctl: unknown keyword priority
> 
> -邮件原件-
> 发件人: Ben Pfaff [mailto:b...@ovn.org]
> 发送时间: 2019年7月18日 2:02
> 收件人: Yi Yang (杨燚)-云服务集团 
> 抄送: ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
> 主题: Re: 答复: [ovs-discuss] How can I delete flows which match a given cookie 
> value?
> 
> --strict means that only exact matches are deleted, so add 'icmp' to your 
> del-flows command to delete the flow.
> 
> On Wed, Jul 17, 2019 at 03:35:09AM +, Yi Yang (杨燚)-云服务集团 wrote:
> > Ben, I found del-flows ran successfully but the flows aren't deleted, 
> > what's wrong?
> > 
> > [yangyi@localhost ~]$ sudo ovs-ofctl -Oopenflow10 add-flow br-int 
> > "table=0,cookie=0x1234,priority=1,icmp,actions=drop"
> > [yangyi@localhost ~]$ sudo ovs-ofctl -Oopenflow10 dump-flows br-int 
> > 
> > NXST_FLOW reply (xid=0x4):
> >  cookie=0x1234, duration=3.994s, table=0, n_packets=0, n_bytes=0, 
> > idle_age=3, priority=1,icmp actions=drop [yangyi@localhost ~]$ sudo 
> > ovs-ofctl -Oopenflow10 --strict del-flows br-int 
> > "table=0,cookie=0x1234/-1,priority=1"
> > [yangyi@localhost ~]$ sudo ovs-ofctl -Oopenflow10 dump-flows br-int 
> > 
> > NXST_FLOW reply (xid=0x4):
> >  cookie=0x1234, duration=49.866s, table=0, n_packets=0, n_bytes=0, 
> > idle_age=49, priority=1,icmp actions=drop [yangyi@localhost ~]$
> > 
> > -邮件原件-
> > 发件人: Ben Pfaff [mailto:b...@ovn.org]
> > 发送时间: 2019年7月17日 1:04
> > 收件人: Yi Yang (杨燚)-云服务集团 
> > 抄送: ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
> > 主题: Re: [ovs-discuss] How can I delete flows which match a given cookie 
> > value?
> > 
> > On Tue, Jul 16, 2019 at 09:35:06AM +, Yi Yang (杨燚)-云服务集团 wrote:
> > > I need to add and delete flows according to user operations, I 
> > > know openflowplugin in Opendaylight can do this, but it seems 
> > > “ovs-ofctl del-flows” can’t do this way, why can’t cookie value be 
> > > used to do this for “ovs-ofctl del-flows”?
> > > 
> > >  
> > > 
> > > sudo ovs-ofctl -Oopenflow13 --strict del-flows br-int 
> > > "table=2,cookie=12345"
> > 
> > To match on a cookie, specify a mask, e.g. cookie=12345/-1.
> 
> 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 答复: 答复: [ovs-discuss] How can I delete flows which match a given cookie value?

2019-07-17 Thread
Got it, thanks Ben. I want to use both cookie and priority to del-flows, but
only the --strict option can handle priority, so I can only use the cookie to
del-flows in batch; --strict can only delete one flow.

sudo ovs-ofctl -Oopenflow13 del-flows br-int "cookie=0x01/-1,priority=1"
ovs-ofctl: unknown keyword priority
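
For a single flow, the --strict variant does work once the full match and the
priority are spelled out, e.g. against the icmp flow from the other thread:

sudo ovs-ofctl -Oopenflow13 --strict del-flows br-int "cookie=0x1234/-1,icmp,priority=1"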

-邮件原件-
发件人: Ben Pfaff [mailto:b...@ovn.org] 
发送时间: 2019年7月18日 2:02
收件人: Yi Yang (杨燚)-云服务集团 
抄送: ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
主题: Re: 答复: [ovs-discuss] How can I delete flows which match a given cookie 
value?

--strict means that only exact matches are deleted, so add 'icmp' to your 
del-flows command to delete the flow.

On Wed, Jul 17, 2019 at 03:35:09AM +, Yi Yang (杨燚)-云服务集团 wrote:
> Ben, I found del-flows ran successfully but the flows aren't deleted, what's 
> wrong?
> 
> [yangyi@localhost ~]$ sudo ovs-ofctl -Oopenflow10 add-flow br-int 
> "table=0,cookie=0x1234,priority=1,icmp,actions=drop"
> [yangyi@localhost ~]$ sudo ovs-ofctl -Oopenflow10 dump-flows br-int   
>   
> NXST_FLOW reply (xid=0x4):
>  cookie=0x1234, duration=3.994s, table=0, n_packets=0, n_bytes=0, 
> idle_age=3, priority=1,icmp actions=drop [yangyi@localhost ~]$ sudo 
> ovs-ofctl -Oopenflow10 --strict del-flows br-int 
> "table=0,cookie=0x1234/-1,priority=1"
> [yangyi@localhost ~]$ sudo ovs-ofctl -Oopenflow10 dump-flows br-int   
>   
> NXST_FLOW reply (xid=0x4):
>  cookie=0x1234, duration=49.866s, table=0, n_packets=0, n_bytes=0, 
> idle_age=49, priority=1,icmp actions=drop [yangyi@localhost ~]$
> 
> -邮件原件-
> 发件人: Ben Pfaff [mailto:b...@ovn.org]
> 发送时间: 2019年7月17日 1:04
> 收件人: Yi Yang (杨燚)-云服务集团 
> 抄送: ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
> 主题: Re: [ovs-discuss] How can I delete flows which match a given cookie value?
> 
> On Tue, Jul 16, 2019 at 09:35:06AM +, Yi Yang (杨燚)-云服务集团 wrote:
> > I need to add and delete flows according to user operations, I know 
> > openflowplugin in Opendaylight can do this, but it seems “ovs-ofctl 
> > del-flows” can’t do this way, why can’t cookie value be used to do 
> > this for “ovs-ofctl del-flows”?
> > 
> >  
> > 
> > sudo ovs-ofctl -Oopenflow13 --strict del-flows br-int "table=2,cookie=12345"
> 
> To match on a cookie, specify a mask, e.g. cookie=12345/-1.


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 答复: [ovs-discuss] How can I delete flows which match a given cookie value?

2019-07-16 Thread
Ben, I found del-flows ran successfully but the flows aren't deleted, what's 
wrong?

[yangyi@localhost ~]$ sudo ovs-ofctl -Oopenflow10 add-flow br-int 
"table=0,cookie=0x1234,priority=1,icmp,actions=drop"
[yangyi@localhost ~]$ sudo ovs-ofctl -Oopenflow10 dump-flows br-int 

NXST_FLOW reply (xid=0x4):
 cookie=0x1234, duration=3.994s, table=0, n_packets=0, n_bytes=0, idle_age=3, 
priority=1,icmp actions=drop
[yangyi@localhost ~]$ sudo ovs-ofctl -Oopenflow10 --strict del-flows br-int 
"table=0,cookie=0x1234/-1,priority=1"
[yangyi@localhost ~]$ sudo ovs-ofctl -Oopenflow10 dump-flows br-int 

NXST_FLOW reply (xid=0x4):
 cookie=0x1234, duration=49.866s, table=0, n_packets=0, n_bytes=0, idle_age=49, 
priority=1,icmp actions=drop
[yangyi@localhost ~]$

-邮件原件-
发件人: Ben Pfaff [mailto:b...@ovn.org] 
发送时间: 2019年7月17日 1:04
收件人: Yi Yang (杨燚)-云服务集团 
抄送: ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
主题: Re: [ovs-discuss] How can I delete flows which match a given cookie value?

On Tue, Jul 16, 2019 at 09:35:06AM +, Yi Yang (杨燚)-云服务集团 wrote:
> I need to add and delete flows according to user operations, I know 
> openflowplugin in Opendaylight can do this, but it seems “ovs-ofctl 
> del-flows” can’t do this way, why can’t cookie value be used to do 
> this for “ovs-ofctl del-flows”?
> 
>  
> 
> sudo ovs-ofctl -Oopenflow13 --strict del-flows br-int "table=2,cookie=12345"

To match on a cookie, specify a mask, e.g. cookie=12345/-1.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 答复: [ovs-discuss] How can I delete flows which match a given cookie value?

2019-07-16 Thread
Thanks Ben, it works.

-----Original Message-----
From: Ben Pfaff [mailto:b...@ovn.org] 
Sent: July 17, 2019 1:04
To: Yi Yang (杨燚)-云服务集团 
Cc: ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
Subject: Re: [ovs-discuss] How can I delete flows which match a given cookie value?

On Tue, Jul 16, 2019 at 09:35:06AM +, Yi Yang (杨燚)-云服务集团 wrote:
> I need to add and delete flows according to user operations, I know 
> openflowplugin in Opendaylight can do this, but it seems “ovs-ofctl 
> del-flows” can’t do this way, why can’t cookie value be used to do 
> this for “ovs-ofctl del-flows”?
> 
>  
> 
> sudo ovs-ofctl -Oopenflow13 --strict del-flows br-int "table=2,cookie=12345"

To match on a cookie, specify a mask, e.g. cookie=12345/-1.


[ovs-dev] RE: RE: Why is ovs DPDK much worse than ovs in my test case?

2019-07-11 Thread
Ilya, you're right. I captured 64K packets even though the MTU is 1500 when I use 
the OVS kernel datapath, but the packet size is almost always <1500 when I use OVS-DPDK.

00:34:33.331360 IP 192.168.200.101.48968 > 192.168.230.101.5201: Flags [.], seq 
17462881:17528041, ack 0, win 229, options [nop,nop,TS val 148218621 ecr 
148145855], length 65160

00:34:33.332064 IP 192.168.200.101.48968 > 192.168.230.101.5201: Flags [.], seq 
17528041:17588857, ack 0, win 229, options [nop,nop,TS val 148218621 ecr 
148145855], length 60816

Thank you so much, I will use e1000 for this. It would be great if OVS DPDK could 
handle this the same way the kernel does; otherwise it will hurt people's 
impression of OVS DPDK. It certainly surprised me.

-----Original Message-----
From: Ilya Maximets [mailto:i.maxim...@samsung.com] 
Sent: July 11, 2019 15:35
To: Yi Yang (杨燚)-云服务集团 ; ovs-dev@openvswitch.org
Subject: Re: RE: [ovs-dev] Why is ovs DPDK much worse than ovs in my test case?

On 11.07.2019 3:27, Yi Yang (杨燚)-云服务集团 wrote:
> BTW, offload features are on in my test client1 and server1 (iperf 
> server)
> 
...
> -----Original Message-----
> From: Yi Yang (杨燚)-云服务集团
> Sent: July 11, 2019 8:22
> To: i.maxim...@samsung.com; ovs-dev@openvswitch.org
> Cc: Yi Yang (杨燚)-云服务集团 
> Subject: RE: [ovs-dev] Why is ovs DPDK much worse than ovs in my test case?
> Importance: High
> 
> Ilya, thank you so much, using 9K MTU for all the virtio interfaces in 
> transport path does help (including DPDK port), the data is here.

8K usually works a bit better for me than 9K. Probably, because of the page 
size.

Have you configured MTU for the tap interfaces on host side too just in case 
that host kernel doesn't negotiate the MTU with guest?

> 
> vagrant@client1:~$ iperf -t 60 -i 10 -c 192.168.230.101
> 
> Client connecting to 192.168.230.101, TCP port 5001 TCP window size:  
> 325 KByte (default)
> 
> [  3] local 192.168.200.101 port 53956 connected with 192.168.230.101 port 
> 5001
> [ ID] Interval   Transfer Bandwidth
> [  3]  0.0-10.0 sec   315 MBytes   264 Mbits/sec
> [  3] 10.0-20.0 sec   333 MBytes   280 Mbits/sec
> [  3] 20.0-30.0 sec   300 MBytes   252 Mbits/sec
> [  3] 30.0-40.0 sec   307 MBytes   258 Mbits/sec
> [  3] 40.0-50.0 sec   322 MBytes   270 Mbits/sec
> [  3] 50.0-60.0 sec   316 MBytes   265 Mbits/sec
> [  3]  0.0-60.0 sec  1.85 GBytes   265 Mbits/sec
> vagrant@client1:~$
> 
> But it is still much worse than ovs kernel. In my test case, I used 
> VirtualBox network, the whole transport path traverses several different VMs, 
> every VM has turned on offload features except ovs DPDK VM, I understand tso 
> offload should be done on send side, so when the packet is sent out from the 
> send side or receive side, it has been segmented by tso to adapt to path MTU, 
> so in ovs kernel VM/ovs DPDK VM, the packet size has been MTU of ovs 
> port/DPDK port, so it needn't do tso work, right?

Not sure if I understand the question correctly, but I'll try to clarify. I 
assume that all your VMs located on the same physical host.
Linux kernel is smart and it will not segment the packets until it is 
unavoidable. If all the interfaces on a packet path supports TSO, kernel will 
never segment packets and will always traverse 64K packets all the way from the 
iperf client to iperf server.
In the case of OVS with DPDK, its VM doesn't support TSO. So packets will be 
split into segments to fit the MTU before being sent to that VM.

The key point here is the virtio interfaces you're using for VMs.
virtio-net is a para-virtual network interface. This means that the guest knows 
that interface is virtual and it knows that host is able to receive packets 
larger than MTU if offloading was negotiated.
At the same time host knows that guest is able to receive packets larger than 
MTU too. So, nothing will be segmented.

In case of OVS with DPDK host knows that guest is not able to receive packets 
larger than MTU and splits them before sending.

You can't send packets larger than the MTU to a physical network, but you are able to do 
that with a virtual network if it was negotiated.


Best regards, Ilya Maximets.
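
Since the userspace datapath in this setup cannot negotiate TSO with vhost-user guests, segmentation happens before packets reach the guest, as explained above. Newer OVS releases (2.13 or later, as far as I know) add an experimental userspace TSO mode that lets vhost-user guests keep large packets, closer to what the kernel datapath does. A minimal sketch, assuming such a release and a DPDK NIC driver that supports the required offloads:

ovs-vsctl set Open_vSwitch . other_config:userspace-tso-enable=true
# The option is read at startup, so ovs-vswitchd has to be restarted, e.g.:
systemctl restart openvswitch-switch   # service name varies by distribution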


[ovs-dev] RE: Why is ovs DPDK much worse than ovs in my test case?

2019-07-10 Thread
BTW, offload features are enabled on my test client1 and server1 (the iperf server).

vagrant@client1:~$ ethtool -k enp0s8
Features for enp0s8:
rx-checksumming: on [fixed]
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: on
udp-fragmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: on [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: on [fixed]
hw-tc-offload: off [fixed]
vagrant@client1:~$

vagrant@server1:~$ ifconfig enp0s8
enp0s8Link encap:Ethernet  HWaddr 08:00:27:c0:a6:0b
  inet addr:192.168.230.101  Bcast:192.168.230.255  Mask:255.255.255.0
  inet6 addr: fe80::a00:27ff:fec0:a60b/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
  RX packets:4228443 errors:0 dropped:0 overruns:0 frame:0
  TX packets:2484988 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:34527894301 (34.5 GB)  TX bytes:528944799 (528.9 MB)

vagrant@server1:~$ ethtool -k enp0s8
Features for enp0s8:
rx-checksumming: on [fixed]
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: on
udp-fragmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: on [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: on [fixed]
hw-tc-offload: off [fixed]
vagrant@server1:~$
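
All of these offloads being enabled means the kernel-datapath numbers benefit from 64K TSO segments end to end. As a sanity check (a suggestion, not something done in this thread), the kernel-datapath run could be repeated with the offloads disabled on both endpoints to get a comparison closer to the non-TSO OVS-DPDK path:

sudo ethtool -K enp0s8 tso off gso off gro off   # run on client1 and server1
iperf -t 60 -i 10 -c 192.168.230.101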

-----Original Message-----
From: Yi Yang (杨燚)-云服务集团 
Sent: July 11, 2019 8:22
To: i.maxim...@samsung.com; ovs-dev@openvswitch.org
Cc: Yi Yang (杨燚)-云服务集团 
Subject: RE: [ovs-dev] Why is ovs DPDK much worse than ovs in my test case?
Importance: High

Ilya, thank you so much, using 9K MTU for all the virtio interfaces in 
transport path does help (including DPDK port), the data is here.

vagrant@client1:~$ iperf -t 60 -i 10 -c 192.168.230.101

Client connecting to 192.168.230.101, TCP port 5001
TCP window size:  325 KByte (default)

[  3] local 192.168.200.101 port 53956 connected with 192.168.230.101 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec   315 MBytes   264 Mbits/sec
[  3] 10.0-20.0 sec   333 MBytes   280 Mbits/sec
[  3] 20.0-30.0 sec   300 MBytes   252 Mbits/sec
[  3] 30.0-40.0 sec   307 MBytes   258 Mbits/sec
[  3] 40.0-50.0 sec   322 MBytes   270 Mbits/sec
[  3] 50.0-60.0 sec   316 MBytes   265 Mbits/sec
[  3]  0.0-60.0 sec  1.85 GBytes   265 Mbits/sec
vagrant@client1:~$

But it is still much worse than ovs kernel. In my test case, I used VirtualBox 
network, the whole transport path traverses several different VMs, every VM has 
turned on offload features except ovs DPDK VM, I understand tso offload should 
be done on send side, so when the packet is sent out from the send side or 
receive side, it has been segmented by tso to adapt to path MTU, so in ovs 
kernel VM/ovs DPDK VM, the packet size has been MTU of ovs port/DPDK

[ovs-dev] RE: Why is ovs DPDK much worse than ovs in my test case?

2019-07-10 Thread
Ilya, thank you so much. Using a 9K MTU for all the virtio interfaces in the 
transport path (including the DPDK port) does help; the data is below.

vagrant@client1:~$ iperf -t 60 -i 10 -c 192.168.230.101

Client connecting to 192.168.230.101, TCP port 5001
TCP window size:  325 KByte (default)

[  3] local 192.168.200.101 port 53956 connected with 192.168.230.101 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec   315 MBytes   264 Mbits/sec
[  3] 10.0-20.0 sec   333 MBytes   280 Mbits/sec
[  3] 20.0-30.0 sec   300 MBytes   252 Mbits/sec
[  3] 30.0-40.0 sec   307 MBytes   258 Mbits/sec
[  3] 40.0-50.0 sec   322 MBytes   270 Mbits/sec
[  3] 50.0-60.0 sec   316 MBytes   265 Mbits/sec
[  3]  0.0-60.0 sec  1.85 GBytes   265 Mbits/sec
vagrant@client1:~$

But it is still much worse than the OVS kernel datapath. In my test case I used a 
VirtualBox network, so the whole transport path traverses several different VMs, and 
every VM has offload features turned on except the OVS-DPDK VM. My understanding is 
that TSO offload should be done on the send side, so by the time a packet leaves the 
sender (or the receiver replies), it has already been segmented by TSO to fit the path 
MTU. In the OVS kernel VM / OVS-DPDK VM the packet size is therefore already at the MTU 
of the OVS port / DPDK port, so no TSO work is needed there, right?
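
For reference, a minimal sketch of how a 9K MTU can be requested on the OVS side for this setup; dpdk0/dpdk1 are the port names from the bridge config quoted below, while the tap names are only placeholders for whatever host-side interfaces back the VMs:

ovs-vsctl set Interface dpdk0 mtu_request=9000
ovs-vsctl set Interface dpdk1 mtu_request=9000
# Host-side tap/virtio backing interfaces (names here are hypothetical) need a matching MTU:
ip link set dev tap0 mtu 9000
ip link set dev tap1 mtu 9000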

-----Original Message-----
From: Ilya Maximets [mailto:i.maxim...@samsung.com] 
Sent: July 10, 2019 18:11
To: ovs-dev@openvswitch.org; Yi Yang (杨燚)-云服务集团 
Subject: Re: [ovs-dev] Why is ovs DPDK much worse than ovs in my test case?

> Hi, all
> 
> I just use ovs as a static router in my test case, ovs is ran in 
> vagrant VM, ethernet interfaces uses virtio driver, I create two ovs 
> bridges, each one adds one ethernet interface, two bridges are 
> connected by patch port, only default openflow rule is there.
> 
> table=0, priority=0 actions=NORMAL
> Bridge br-int
> Port patch-br-ex
> Interface patch-br-ex
> type: patch
> options: {peer=patch-br-int}
> Port br-int
> Interface br-int
> type: internal
> Port "dpdk0"
> Interface "dpdk0"
> type: dpdk
> options: {dpdk-devargs=":00:08.0"}
> Bridge br-ex
> Port "dpdk1"
> Interface "dpdk1"
> type: dpdk
> options: {dpdk-devargs=":00:09.0"}
> Port patch-br-int
> Interface patch-br-int
> type: patch
> options: {peer=patch-br-ex}
> Port br-ex
> Interface br-ex
> type: internal
> 
> But when I run iperf to do performance benchmark, the result shocked me.
> 
> For ovs nondpdk, the result is
> 
> vagrant at client1:~$ iperf -t 60 -i 10 -c 192.168.230.101
> 
> 
> Client connecting to 192.168.230.101, TCP port 5001 TCP window size: 
> 85.0 KByte (default)
> 
> [  3] local 192.168.200.101 port 53900 connected with 192.168.230.101 
> port
> 5001
> [ ID] Interval   Transfer Bandwidth
> [  3]  0.0-10.0 sec  1.05 GBytes   905 Mbits/sec
> [  3] 10.0-20.0 sec  1.02 GBytes   877 Mbits/sec
> [  3] 20.0-30.0 sec  1.07 GBytes   922 Mbits/sec
> [  3] 30.0-40.0 sec  1.08 GBytes   927 Mbits/sec
> [  3] 40.0-50.0 sec  1.06 GBytes   914 Mbits/sec
> [  3] 50.0-60.0 sec  1.07 GBytes   922 Mbits/sec
> [  3]  0.0-60.0 sec  6.37 GBytes   911 Mbits/sec
> 
> vagrant at client1:~$
> 
> For ovs dpdk, the bandwidth is just about 45Mbits/sec, why? I really 
> don’t understand what happened.
> 
> vagrant at client1:~$ iperf -t 60 -i 10 -c 192.168.230.101
> 
> 
> Client connecting to 192.168.230.101, TCP port 5001 TCP window size: 
> 85.0 KByte (default)
> 
> [  3] local 192.168.200.101 port 53908 connected with 192.168.230.101 
> port
> 5001
> [ ID] Interval   Transfer Bandwidth
> [  3]  0.0-10.0 sec  54.6 MBytes  45.8 Mbits/sec [  3] 10.0-20.0 sec  
> 55.5 MBytes  46.6 Mbits/sec [  3] 20.0-30.0 sec  52.5 MBytes  44.0 
> Mbits/sec [  3] 30.0-40.0 sec  53.6 MBytes  45.0 Mbits/sec [  3] 
> 40.0-50.0 sec  54.0 MBytes  45.3 Mbits/sec [  3] 50.0-60.0 sec  53.9 
> MBytes  45.2 Mbits/sec
> [  3]  0.0-60.0 sec   324 MBytes  45.3 Mbits/sec
> 
> vagrant at client1:~$
> 
> By the way, I tried to pin physical cores to qemu processes which 
> correspond to ovs pmd threads, but it hardly affects on performance.
> 
>
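
For completeness, a rough sketch of how the two-bridge static-router topology from the quoted config could be created with ovs-vsctl for the userspace datapath. The bridge, port, and patch names come from the quoted config; the full PCI addresses (0000:00:08.0 / 0000:00:09.0) and the datapath_type setting are assumptions:

ovs-vsctl add-br br-int -- set bridge br-int datapath_type=netdev
ovs-vsctl add-br br-ex -- set bridge br-ex datapath_type=netdev
ovs-vsctl add-port br-int dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:00:08.0
ovs-vsctl add-port br-ex dpdk1 -- set Interface dpdk1 type=dpdk options:dpdk-devargs=0000:00:09.0
# Patch ports connecting the two bridges, matching the quoted config:
ovs-vsctl add-port br-int patch-br-ex -- set Interface patch-br-ex type=patch options:peer=patch-br-int
ovs-vsctl add-port br-ex patch-br-int -- set Interface patch-br-int type=patch options:peer=patch-br-ex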