[ovs-dev] which fields should be masked or unmasked while using megaflow match?
Hi, is there any policy about which fields should be wildcarded when using a megaflow match?

exp 1:
    table=0,priority=0,actions=NORMAL
The datapath flows then look like this:
    recirc_id(0),in_port(3),eth(src=b6:49:dd:5d:3a:a6,dst=2e:b5:7b:d6:52:c2),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2
    recirc_id(0),in_port(2),eth(src=2e:b5:7b:d6:52:c2,dst=b6:49:dd:5d:3a:a6),eth_type(0x0800),ipv4(frag=no), packets:12, bytes:1176, used:0.825s, actions:3

exp 2:
    table=0,in_port=1,actions=2
    table=0,in_port=2,actions=1
The datapath flows then look like this:
    recirc_id(0),in_port(2),eth_type(0x0800),ipv4(frag=no), packets:26, bytes:2548, used:0.441s, actions:3
    recirc_id(0),in_port(3),eth_type(0x0800),ipv4(frag=no), packets:26, bytes:2548, used:0.441s, actions:2
My question: why are ETH_SRC and ETH_DST matched when the NORMAL action is used?

exp 3:
    table=0,in_port=1,nw_src=1.1.1.0/24,actions=2
    table=0,in_port=2,nw_src=1.1.1.0/24,actions=1
The datapath flows then look like this:
    recirc_id(0),in_port(3),eth_type(0x0800),ipv4(src=1.1.1.0/255.255.255.0,frag=no), packets:1863, bytes:182574, used:0.552s, actions:2
    recirc_id(0),in_port(2),eth_type(0x0800),ipv4(src=1.1.1.0/255.255.255.0,frag=no), packets:1863, bytes:182574, used:0.552s, actions:3

exp 4:
    table=0,in_port=1,nw_src=1.1.1.0/24,actions=mod_nw_src:1.1.1.3,output:2
    table=0,in_port=2,actions=1
The datapath flows then look like this:
    recirc_id(0),in_port(3),eth_type(0x0800),ipv4(src=1.1.1.2,frag=no), packets:37, bytes:3626, used:0.332s, actions:set(ipv4(src=1.1.1.3)),2
    recirc_id(0),in_port(2),eth_type(0x0800),ipv4(frag=no), packets:37, bytes:3626, used:0.332s, actions:3
My question: why is NW_SRC=1.1.1.2 matched with a full 255.255.255.255 mask, instead of the 255.255.255.0 mask of the rule we created?

In short: are there any rules for how the flow mask is set when using a megaflow match? Which fields should be wildcarded, and why? We can extract all fields from packets, and we can find the rule that matches the packet, but why are the datapath flow's match fields not the same as the userspace rule's?
___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] is there any performance consideration for max emc cache numbers and megaflow cache numbers?
Hi, in the OVS code:

    MAX_FLOWS = 65536                                        // for megaflow cache
    #define EM_FLOW_HASH_SHIFT 13
    #define EM_FLOW_HASH_ENTRIES (1u << EM_FLOW_HASH_SHIFT)  // for EMC cache

Why were 65536 and 8192 chosen? Is there any performance consideration? Can I just enlarge these numbers so that packets only ever hit the EMC cache and the megaflow cache?

Another question: is there any document/data on packet throughput in netdev DPDK mode with only the EMC cache, with only the megaflow cache, or with only the userspace flow lookup?
Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND
I can reproduce this problem with the script supplied by vguntaka on both OVS 2.5 and OVS 2.6.

1. Add a bridge
    ovs-vsctl add-br br0

2. Add a VM port
    ovs-vsctl add-port br0 tap0 -- set interface tap0 type=internal
    ip netns add ns0
    ip link set dev tap0 netns ns0
    ip netns exec ns0 ip link set dev tap0 up
    ip netns exec ns0 ip addr add dev tap0 1.1.1.2/24
    ip netns exec ns0 ip route add default via 1.1.1.1
    ip netns exec ns0 ip neigh add 1.1.1.1 lladdr 00:00:00:00:11:11 dev tap0

3. Send packets
    ip netns exec ns0 ping 2.2.2.2

4. Add flows (make sure packets are still being sent)
    while true
    do
        ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=1,actions=controller"
        ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=2,actions=controller"
        ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=3,actions=controller"
        ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=4,actions=controller"
        ovs-ofctl del-flows br0
    done

After waiting about one or two minutes, the error "OFPT_ERROR (xid=0x4): OFPFMFC_BAD_COMMAND" is printed on the console.

Also, I noticed that when using OpenFlow 1.3 the error disappears, like this:
    ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=1,actions=controller" -O openflow13
[ovs-dev] does ovs bfd support flow based tunnel?
Can I enable BFD on a flow-based tunnel? Does it work?
[ovs-dev] why the max action length is 32K in kernel?
In the function nla_alloc_flow_actions() there is a check: if the action length is greater than MAX_ACTIONS_BUFSIZE (32K), the kernel datapath flow will not be installed and packets will be dropped.

But in the function xlate_actions() there is this clause:

    if (nl_attr_oversized(ctx.odp_actions->size)) {
        /* These datapath actions are too big for a Netlink attribute, so we
         * can't hand them to the kernel directly.  dpif_execute() can execute
         * them one by one with help, so just mark the result as SLOW_ACTION to
         * prevent the flow from being installed. */
        COVERAGE_INC(xlate_actions_oversize);
        ctx.xout->slow |= SLOW_ACTION;
    }

and in nl_attr_oversized() the check is:

    return payload_size > UINT16_MAX - NLA_HDRLEN;

So in userspace the maximum action length is almost 64K, but in kernel space it is only 32K.

My question: why are the two maximum action lengths different? A packet is dropped when its action length exceeds 32K, yet a packet can be executed in the slow path when its action length exceeds 64K?
Re: [ovs-dev] does ovs bfd support flow based tunnel?
For a flow-based tunnel:

    ovs-vsctl add-port br-int vxlan1 -- set interface vxlan1 type=vxlan options:remote_ip=flow options:key=flow options:local_ip=10.10.0.1
    ovs-vsctl set interface vxlan1 bfd:enable=true

When I enable BFD on such a VXLAN interface, I cannot capture any BFD packets on the physical port (the one used by the VXLAN interface).

At 2017-09-14 23:38:19, "Miguel Angel Ajo Pelayo" wrote:
What do you mean by flow-based tunnel? We're using it internally to provide HA connectivity to Gateway_Chassis on OVN, and it's working like a charm to monitor tunnel endpoints on OVS bridges. https://github.com/openvswitch/ovs/blob/master/ovn/controller/bfd.c

On Tue, Sep 12, 2017 at 9:19 PM, ychen wrote:
can I enable bfd on flow based tunnel? does it work?
[ovs-dev] is there any document about how to build debian package with dpdk?
We modified a little of the DPDK-related code, so we must rebuild the OVS Debian packages with DPDK ourselves. Is there any guide on how to build the openvswitch-switch-dpdk package?
Re: [ovs-dev] is there any document about how to build debian package with dpdk?
I have read this document, but following this guide I could not build the openvswitch-switch-dpdk package. I want to build the package against our own libdpdk; are there any guides for that?

At 2017-09-21 16:25:58, "Bodireddy, Bhanuprakash" wrote:
>>we modified little code for dpdk, so we must rebuild ovs debian package with
>>dpdk by ourself.
>>so is there any guide about how to build openvswith-dpdk package?
>
>There is a guide on this here
>http://docs.openvswitch.org/en/latest/intro/install/debian/
>
>- Bhanuprakash.
[ovs-dev] can not well distributed when use dp_hash for ovs group
Hi, I tested dp_hash for an OVS select group and found that dp_hash does not distribute traffic well; some buckets are never selected at all. In my testing environment I have 11 buckets:

    group_id=131841,type=select,selection_method=dp_hash,
    bucket=bucket_id:51162,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.29:80))),
    bucket=bucket_id:42099,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.25:80))),
    bucket=bucket_id:53526,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.27:80))),
    bucket=bucket_id:12221,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.40:80))),
    bucket=bucket_id:2787,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.26:80))),
    bucket=bucket_id:18951,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.24:80))),
    bucket=bucket_id:32559,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.62:80))),
    bucket=bucket_id:35550,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.43:80))),
    bucket=bucket_id:9026,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.57:80))),
    bucket=bucket_id:26811,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.34:80)))

But about 3~5 buckets are always left unselected.
In the function xlate_dp_hash_select_group(), I found this code:

    uint32_t mask = (1 << log_2_ceil(n_buckets)) - 1;
    uint32_t basis = 0xc2b73583 * (ctx->xin->flow.dp_hash & mask);
    uint32_t score = (hash_int(bucket->bucket_id, basis) & 0xffff) * bucket->weight;

With this formula, if n_buckets is 11 there are only 16 possible values for basis. So how can we make sure that the best score is well distributed over the 11 buckets?
[ovs-dev] can not update userspace vxlan tunnel neigh mac when peer VTEP mac changed
Hi,
I found that userspace VXLAN sometimes does not work well.

1. First data packet loss
When the tunnel neighbor cache is empty, the first data packet triggers sending an ARP request to the peer VTEP and is itself dropped; the tunnel neighbor cache entry is only added when the ARP reply is received.

    err = tnl_neigh_lookup(out_dev->xbridge->name, &d_ip6, &dmac);
    if (err) {
        xlate_report(ctx, OFT_DETAIL,
                     "neighbor cache miss for %s on bridge %s, "
                     "sending %s request",
                     buf_dip6, out_dev->xbridge->name, d_ip ? "ARP" : "ND");
        if (d_ip) {
            tnl_send_arp_request(ctx, out_dev, smac, s_ip, d_ip);
        } else {
            tnl_send_nd_request(ctx, out_dev, smac, &s_ip6, &d_ip6);
        }
        return err;
    }

2. Connection lost when the peer VTEP MAC changes
Suppose the VTEP MAC is already in the tunnel neighbor cache, e.g.:

    10.182.6.81 fa:eb:26:c3:16:a5 br-phy

When data packets come in, this MAC is used to encapsulate the outer VXLAN header. But if the MAC of VTEP 10.182.6.81 changes from fa:eb:26:c3:16:a5 to 24:eb:26:c3:16:a5 (because the NIC was replaced), data packets keep being sent with the old MAC fa:eb:26:c3:16:a5, and the peer VTEP will not accept them because the MAC does not match. The stale tunnel neighbor entry only ages out once the data packets stop being sent.

    if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)) {
        tnl_neigh_snoop(flow, wc, ctx->xbridge->name);
    }

3. Is anybody working on these problems?
Re: [ovs-dev] can not update userspace vxlan tunnel neigh mac when peer VTEP mac changed
Hi Jan, thanks for your reply.

We have already modified the code to snoop on GARP packets, but these two problems still exist. I think the main problem is that GARP packets are not sent from an interface when its MAC or IP address is changed (reading the Linux kernel code, there is no such process), so we must depend on data packets to trigger the ARP request. I know that in the Linux kernel, when an ARP request is triggered, data packets are cached for a certain time, so the first data packet can still be sent out once the ARP reply is received.

For the second problem: can we update the tunnel neighbor cache when we receive a data packet from the remote VTEP? We can fetch tun_src and the outer source MAC from the data packet.

At 2018-03-28 04:41:12, "Jan Scheurich" wrote:
>Hi Ychen,
>
>Funny! Again we are already working on a solution for problem 1.
>
>In our scenario the situation arises with a tunnel next hop being a VRRP
>switch pair. The switch sends periodic gratuitous ARPs (GARPs) to announce
>the VRRP IP&MAC but OVS native tunneling doesn't snoop on GARPs, only on ARP
>replies. The host IP stack, on the other hand, accepts these GARPs and stops
>sending refresh ARP requests itself. Hence nothing for OVS to snoop upon.
>
>The solution is to make OVS snoop on GARP requests also.
>
>It is quite possible that this will also fix your problem 2. If you also have
>a VRRP tunnel next hop which just moves its VRRP IP address but not the MAC
>address, it should send a GARP with the new IP/MAC mapping when it moves the
>IP address, which would now update OVS' tunnel neighbor cache.
>
>@Mano: Can you submit the GARP patch in the near future?
>
>BR, Jan
>
>> -----Original Message-----
>> From: ovs-dev-boun...@openvswitch.org
>> [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of ychen
>> Sent: Tuesday, 27 March, 2018 14:44
>> To: d...@openvswitch.org
>> Subject: [ovs-dev] can not update userspace vxlan tunnel neigh mac when peer
>> VTEP mac changed
>>
>> Hi,
>>    I found that sometime userspace vxlan can not work happily.
>>    1. first data packet loss
>>       when tunnel neigh cache is empty, then the first data packet
>> triggered sending ARP packet to peer VTEP, and the data packet dropped,
>> tunnel neigh cache added this entry when receive ARP reply packet.
>>
>> err = tnl_neigh_lookup(out_dev->xbridge->name, &d_ip6, &dmac);
>> if (err) {
>>     xlate_report(ctx, OFT_DETAIL,
>>                  "neighbor cache miss for %s on bridge %s, "
>>                  "sending %s request",
>>                  buf_dip6, out_dev->xbridge->name, d_ip ? "ARP" : "ND");
>>     if (d_ip) {
>>         tnl_send_arp_request(ctx, out_dev, smac, s_ip, d_ip);
>>     } else {
>>         tnl_send_nd_request(ctx, out_dev, smac, &s_ip6, &d_ip6);
>>     }
>>     return err;
>> }
>>
>> 2. connection lost when peer VTEP mac changed
>>    when VTEP mac is already in tunnel neigh cache, exp:
>>    10.182.6.81 fa:eb:26:c3:16:a5 br-phy
>>
>>    so when data packet come in, it will use this mac for encaping outer
>> VXLAN header.
>>    but VTEP 10.182.6.81 mac changed from fa:eb:26:c3:16:a5 to
>> 24:eb:26:c3:16:a5 because of NIC changed.
>>
>>    data packet continue sending with the old mac fa:eb:26:c3:16:a5, but
>> the peer VTEP will not accept these packets because of mac not match.
>>    the wrong tunnel neigh entry aging until the data packet stop sending.
>>
>>    if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)) {
>>        tnl_neigh_snoop(flow, wc, ctx->xbridge->name);
>>    }
>>
>> 3. is there anybody has working for these problems?
Re: [ovs-dev] [PATCH 2/3] ofproto-dpif: Improve dp_hash selection method for select groups
Hi Jan,

When I tested dp_hash with the new patch, vswitchd was killed by a segmentation fault under some conditions:

1. Add a group with no buckets; then "winner" will be NULL.
2. Add buckets with weight 0; then "winner" will also be NULL.

I made a small modification to the patch; could you help check whether it is correct?

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 8f6070d..b3a9639 100755
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -4773,6 +4773,8 @@ group_setup_dp_hash_table(struct group_dpif *group, size_t max_hash)
         webster[i].value = bucket->weight;
         i++;
     }
+    /* Consider bucket weight equal to 0. */
+    if (!min_weight) min_weight = 1;
     uint32_t min_slots = ceil(total_weight / min_weight);
     n_hash = MAX(16, 1L << log_2_ceil(min_slots));
@@ -4794,11 +4796,12 @@ group_setup_dp_hash_table(struct group_dpif *group, size_t max_hash)
     for (int hash = 0; hash < n_hash; hash++) {
         VLOG_DBG("Hash value: %d", hash);
         double max_val = 0.0;
-        struct webster *winner;
+        struct webster *winner = NULL;
         for (i = 0; i < n_buckets; i++) {
             VLOG_DBG("Webster[%d]: divisor=%d value=%.2f", i,
                      webster[i].divisor, webster[i].value);
-            if (webster[i].value > max_val) {
+            /* Use >= in case there is only one bucket, with weight 0. */
+            if (webster[i].value >= max_val) {
                 max_val = webster[i].value;
                 winner = &webster[i];
             }
@@ -4827,7 +4830,8 @@ group_set_selection_method(struct group_dpif *group)
         group->selection_method = SEL_METHOD_DEFAULT;
     } else if (!strcmp(selection_method, "dp_hash")) {
         /* Try to use dp_hash if possible at all. */
-        if (group_setup_dp_hash_table(group, 64)) {
+        uint32_t n_buckets = group->up.n_buckets;
+        if (n_buckets && group_setup_dp_hash_table(group, 64)) {
             group->selection_method = SEL_METHOD_DP_HASH;
             group->hash_alg = props->selection_method_param >> 32;
             if (group->hash_alg >= __OVS_HASH_MAX) {

Another question: I found that in xlate_default_select_group() and xlate_hash_fields_select_group(), when group_best_live_bucket() returns NULL, ofproto_group_unref() is called. Why does the dp_hash path not need to call it when no best bucket is found (e.g. for a group with no buckets)?

At 2018-03-21 02:16:17, "Jan Scheurich" wrote:
>The current implementation of the "dp_hash" selection method suffers
>from two deficiencies: 1. The hash mask and hence the number of dp_hash
>values is just large enough to cover the number of group buckets, but
>does not consider the case that buckets have different weights. 2. The
>xlate-time selection of best bucket from the masked dp_hash value often
>results in bucket load distributions that are quite different from the
>bucket weights because the number of available masked dp_hash values
>is too small (2-6 bits compared to 32 bits of a full hash in the default
>hash selection method).
>
>This commit provides a more accurate implementation of the dp_hash
>select group by applying the well known Webster method for distributing
>a small number of "seats" fairly over the weighted "parties"
>(see https://en.wikipedia.org/wiki/Webster/Sainte-Lagu%C3%AB_method).
>The dp_hash mask is automatically chosen large enough to provide good
>enough accuracy even with widely differing weights.
>
>This distribution happens at group modification time and the resulting
>table is stored with the group-dpif struct. At xlation time, we use the
>masked dp_hash values as index to look up the assigned bucket.
>
>If the bucket should not be live, we do a circular search over the
>mapping table until we find the first live bucket. As the buckets in
>the table are by construction in pseudo-random order with a frequency
>according to their weight, this method maintains correct distribution
>even if one or more buckets are non-live.
>
>Xlation is further simplified by storing some derived select group state
>at group construction in struct group-dpif in a form better suited for
>xlation purposes.
>
>Signed-off-by: Jan Scheurich
>Signed-off-by: Nitin Katiyar
>Co-authored-by: Nitin Katiyar
>Signed-off-by: Jan Scheurich
>---
> include/openvswitch/ofp-group.h |   1 +
> ofproto/ofproto-dpif-xlate.c    |  70
> ofproto/ofproto-dpif.c          | 142
> ofproto/ofproto-dpif.h          |  13
> 4 files changed, 200 insertions(+), 26 deletions(-)
>
>diff --git a/include/openvswitch/ofp-group.h b/include/openvswitch/ofp-group.h
>index 8d893a5..af4033d 100644
>--- a/include/openvswitch/ofp-group.h
>+++ b/include/openvswitch/ofp-group.h
>@@ -47,6 +47,7 @@ struct bucket_counter {
> /* Bucket for use in groups. */
> struct ofputil_bucket {
>     struct ovs_list list_node;
>+    uint16_t aux;          /* Padding. Also used for temporary data.
Re: [ovs-dev] [PATCH v2 2/3] ofproto-dpif: Improve dp_hash selection method for select groups
Hi Jan,

I think the following code should also be modified:

+    for (int hash = 0; hash < n_hash; hash++) {
+        double max_val = 0.0;
+        struct webster *winner;
+        for (i = 0; i < n_buckets; i++) {
+            if (webster[i].value > max_val) {   ===> if bucket->weight is 0 and there is only one bucket (with weight 0), then winner stays uninitialized
+                max_val = webster[i].value;
+                winner = &webster[i];
+            }
+        }

Test with a command like this:

    ovs-ofctl add-group br-int -O openflow15 "group_id=2,type=select,selection_method=dp_hash,bucket=bucket_id=1,weight=0,actions=output:10"

vswitchd crashed after the command was issued.

At 2018-04-16 22:26:27, "Jan Scheurich" wrote:
>The current implementation of the "dp_hash" selection method suffers
>from two deficiencies: 1. The hash mask and hence the number of dp_hash
>values is just large enough to cover the number of group buckets, but
>does not consider the case that buckets have different weights. 2. The
>xlate-time selection of best bucket from the masked dp_hash value often
>results in bucket load distributions that are quite different from the
>bucket weights because the number of available masked dp_hash values
>is too small (2-6 bits compared to 32 bits of a full hash in the default
>hash selection method).
>
>This commit provides a more accurate implementation of the dp_hash
>select group by applying the well known Webster method for distributing
>a small number of "seats" fairly over the weighted "parties"
>(see https://en.wikipedia.org/wiki/Webster/Sainte-Lagu%C3%AB_method).
>The dp_hash mask is automatically chosen large enough to provide good
>enough accuracy even with widely differing weights.
>
>This distribution happens at group modification time and the resulting
>table is stored with the group-dpif struct. At xlation time, we use the
>masked dp_hash values as index to look up the assigned bucket.
>
>If the bucket should not be live, we do a circular search over the
>mapping table until we find the first live bucket.
>As the buckets in the table are by construction in pseudo-random order
>with a frequency according to their weight, this method maintains
>correct distribution even if one or more buckets are non-live.
>
>Xlation is further simplified by storing some derived select group state
>at group construction in struct group-dpif in a form better suited for
>xlation purposes.
>
>Adapted the unit test case for dp_hash select group accordingly.
>
>Signed-off-by: Jan Scheurich
>Signed-off-by: Nitin Katiyar
>Co-authored-by: Nitin Katiyar
>---
> include/openvswitch/ofp-group.h |   1 +
> ofproto/ofproto-dpif-xlate.c    |  74 +---
> ofproto/ofproto-dpif.c          | 146
> ofproto/ofproto-dpif.h          |  13
> tests/ofproto-dpif.at           |  18 ++--
> 5 files changed, 221 insertions(+), 31 deletions(-)
>
>diff --git a/include/openvswitch/ofp-group.h b/include/openvswitch/ofp-group.h
>index 8d893a5..af4033d 100644
>--- a/include/openvswitch/ofp-group.h
>+++ b/include/openvswitch/ofp-group.h
>@@ -47,6 +47,7 @@ struct bucket_counter {
> /* Bucket for use in groups. */
> struct ofputil_bucket {
>     struct ovs_list list_node;
>+    uint16_t aux;          /* Padding. Also used for temporary data. */
>     uint16_t weight;       /* Relative weight, for "select" groups. */
>     ofp_port_t watch_port; /* Port whose state affects whether this bucket
>                            * is live.
>                            * Only required for fast failover
>diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>index c8baba1..df245c5 100644
>--- a/ofproto/ofproto-dpif-xlate.c
>+++ b/ofproto/ofproto-dpif-xlate.c
>@@ -4235,35 +4235,55 @@ xlate_hash_fields_select_group(struct xlate_ctx *ctx, struct group_dpif *group,
>     }
> }
>
>+static struct ofputil_bucket *
>+group_dp_hash_best_bucket(struct xlate_ctx *ctx,
>+                          const struct group_dpif *group,
>+                          uint32_t dp_hash)
>+{
>+    struct ofputil_bucket *bucket, *best_bucket = NULL;
>+    uint32_t n_hash = group->hash_mask + 1;
>+
>+    uint32_t hash = dp_hash &= group->hash_mask;
>+    ctx->wc->masks.dp_hash |= group->hash_mask;
>+
>+    /* Starting from the original masked dp_hash value iterate over the
>+     * hash mapping table to find the first live bucket. As the buckets
>+     * are quasi-randomly spread over the hash values, this maintains
>+     * a distribution according to bucket weights even when some buckets
>+     * are non-live. */
>+    for (int i = 0; i < n_hash; i++) {
>+        bucket = group->hash_map[(hash + i) % n_hash];
>+        if (bucket_is_alive(ctx, bucket, 0)) {
>+            best_bucket = bucket;
>+            break;
>+        }
>+    }
>+
>+    return best_bucket;
>+}
>+
> static void
> xlate_dp_hash_select_group(struct xlate_ctx *ctx, struct group_dpif *group,
>
[ovs-dev] can not do ecmp with ovs group when send packet out from userspace vxlan port
1. Environment

    Bridge br-int
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port "vf-10.180.0.95"
            Interface "vf-10.180.0.95"
                type: vxlan
                options: {csum="true", df_default="false", in_key=flow, local_ip="10.180.0.95", out_key=flow, remote_ip=flow}
        Port tap111
            Interface tap111
                type: internal
    Bridge br-phy
        fail_mode: secure
        Port "dpdk_phy1"
            Interface "dpdk_phy1"
                type: dpdk
                options: {dpdk-devargs=":01:10.0", n_rxq="2"}
        Port br-phy
            Interface br-phy
                type: internal
        Port "dpdk_phy0"
            Interface "dpdk_phy0"
                type: dpdk
                options: {dpdk-devargs=":01:10.1", n_rxq="2"}

    01:10.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
    01:10.2 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)

    175: br-phy: mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
        link/ether fa:86:77:0b:1a:31 brd ff:ff:ff:ff:ff:ff
        inet 10.180.0.95/24 scope global br-phy
           valid_lft forever preferred_lft forever
        inet6 fe80::f886:77ff:fe0b:1a31/64 scope link
           valid_lft forever preferred_lft forever

Bridge br-phy flows:
    table=0,priority=150,in_port=LOCAL actions=group:1
    table=0,priority=150,in_port="dpdk_phy0" actions=LOCAL
    table=0,priority=150,in_port="dpdk_phy1" actions=LOCAL
    group_id=1,type=select,bucket=watch_port:"dpdk_phy0",actions=output:"dpdk_phy0",bucket=watch_port:"dpdk_phy1",actions=output:"dpdk_phy1"

Bridge br-int flows:
    table=0, priority=100,in_port="tap111", actions=set_field:10.180.0.81->tun_dst,set_field:0x1435->tun_id,output:"vf-10.180.0.95"

tap111 configuration:
    ip netns add ns111
    ip link set dev tap111 netns ns111
    ip netns exec ns111 ip link set dev tap111 up
    ip netns exec ns111 ip addr add 192.168.10.5/24 dev tap111
    ip netns exec ns111 ip neigh add 192.168.10.6 lladdr 00:00:00:00:11:66 dev tap111

Packets are sent from tap111 with ip_dst=192.168.10.6, ip_src=192.168.10.5, UDP dst port 5000, and UDP src port ranging from 4 to 65534.

2. Phenomenon

We only ever observe packets leaving from dpdk_phy0, never a mix of sometimes dpdk_phy0 and sometimes dpdk_phy1.

3. Code trace in OVS

a. We can observe the packets sent from dpdk_phy0 with outer header (dst=10.180.0.81, src=10.180.0.95, UDP src port in the range 32768~65535, dst port 4789) and inner header (dst=192.168.10.6, src=192.168.10.5, UDP dst port 5000, UDP src port ranging from 4 to 65534).

b. As we can see, the default group selection method is used.

FIRST QUESTION: why are UDP ports not used in the hash calculation? In the function flow_hash_symmetric_l4() we can see the following code:

    if (fields.eth_type == htons(ETH_TYPE_IP)) {
        fields.ipv4_addr = flow->nw_src ^ flow->nw_dst;
        fields.ip_proto = flow->nw_proto;
        if (fields.ip_proto == IPPROTO_TCP || fields.ip_proto == IPPROTO_SCTP) {
            fields.tp_port = flow->tp_src ^ flow->tp_dst;
        }
    }

c. When a packet is sent out through a userspace VXLAN port, group selection is done first; the complete tunneled packet is only built and sent out afterwards.

SECOND QUESTION: how can we use the tunnel source port in the group hash?
When the packet is translated in xlate_select_group(), flow->tp_src is always 0:

    Thread 1 "ovs-vswitchd" hit Breakpoint 2, xlate_default_select_group (ctx=0x7ffc80f91e10, group=0x55f7e8024950) at ofproto/ofproto-dpif-xlate.c:4135
    4135        struct flow_wildcards *wc = ctx->wc;
    (gdb) p/x ctx->xin->flow->tp_dst
    $6 = 0xb512
    (gdb) p/x ctx->xin->flow->tp_src
    $8 = 0x0
    (gdb) bt
    #0  xlate_default_select_group (ctx=0x7ffc80f91e10, group=0x55f7e8024950) at ofproto/ofproto-dpif-xlate.c:4135
    #1  0x55f7e7440f6d in xlate_select_group (ctx=0x7ffc80f91e10, group=0x55f7e8024950) at ofproto/ofproto-dpif-xlate.c:4260
    #2  0x55f7e744100f in xlate_group_action__ (ctx=0x7ffc80f91e10, group=0x55f7e8024950) at ofproto/ofproto-dpif-xlate.c:4287
    #3  0x55f7e74410df in xlate_group_action (ctx=0x7ffc80f91e10, group_id=1) at ofproto/ofproto-dpif-xlate.c:4314
    #4  0x55f7e7445405 in do_xlate_actions (ofpacts=0x55f7e7ff3758, ofpacts_len=8, ctx=0x7ffc80f91e10) at ofproto/ofproto-dpif-xlate.c:6215
    #5  0x55f7e7440117 in xlate_recursively (ctx=0x7ffc80f91e10, rule=0x55f7e7ffb1f0, deepens=true) at ofproto/ofproto-dpif-xlate.c:3907
    #6  0x55f7e744069d in xlate_table_action (ctx=0x7ffc80f91e10, in_port=65534, table_id=0 '\000', may_packet_in=true, honor_table_miss=true, with_ct_orig=false) at ofproto/ofproto-dpif-xlate.c:4033
    #7  0x55f7e743f07d in apply_nested_clone_actions (ctx=0x7ffc80f91e10, in_dev=0x55f7e8033b60, out_dev=0x55f7e7fff320) at ofproto/ofproto-dpif-xlate.c:3559
    #8  0x55f7e743e266 in validate_and_combine_post_tnl_actions (ctx=0x7ffc80f91e10, xport=0x55f7e8033b60,
[ovs-dev] meter stats cleared when modify meter bands
Hi all,
I have a question: why do the meter stats need to be cleared when just modifying the meter bands? When handle_modify_meter() is called, it eventually calls dpif_netdev_meter_set(); in this function a new dp_meter is allocated and attached, hence the stats are cleared. If we just updated the dp_meter band configuration in place, the stats would be kept across a meter modify. Is there any rationale behind this behavior of the meter modify action?
Re: [ovs-dev] same tcp session encapsulated with different udp src port in kernel mode if packet has do ip_forward
We can easily reproduce this phenomenon by sending a TCP socket stream from an OVS internal port.

At 2019-10-30 19:49:16, "ychen" wrote:

Hi,
when we use docker to establish a TCP session, we found that a packet which must do an upcall to userspace is encapsulated with a different UDP source port than a packet that only needs datapath flow forwarding. After some code research and kprobe debugging, we found the following:

1. udp_flow_src_port() is used to get the port, so when both skb->l4_hash == 0 and skb->sw_hash == 0, the 5-tuple is used to calculate skb->hash.

2. When the first packet of a TCP session arrives, it does an upcall to userspace, and then ovs_packet_cmd_execute() is called; a new skb is allocated with both l4_hash and sw_hash set to 0.

3. When a non-first packet of the TCP session arrives, ovs_dp_process_packet()->ovs_execute_actions() is called, and this time the original skb is preserved. When the packet has gone through ip_forward(), the kprobe debug prints skb->l4_hash=1, sw_hash=0.

4. We searched the kernel code and found:

    static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
    {
        if (sk->sk_txhash) {
            skb->l4_hash = 1;
            skb->hash = sk->sk_txhash;
        }
    }

    static inline void sk_set_txhash(struct sock *sk)
    {
        sk->sk_txhash = net_tx_rndhash();   ==> it is a random value!!
    }

5. So let's summarize: when a packet is processed only by the datapath flow, skb->hash is a random value for the same TCP session, but when a packet is first processed in userspace and then in kernel space, skb->hash is calculated from the 5-tuple?

Our testing environment: Debian 9, kernel 4.9.65; OVS version 2.8.2.

A simple topology looks like this:

    docker_eth0 <---+
                    | veth, ip_forward
                    +--- host_veth0 <-> port-eth (OVS internal)

host_veth0 and the port-eth device reside on the physical host.

So can we treat skb->hash as an attribute: when sending a packet to userspace, encode this attribute, and then in ovs_packet_cmd_execute() retrieve the same hash value from userspace?
Another important tip: if we send packets from a QEMU-based tap device, the VXLAN source port is always the same for a given TCP session; only when we send packets from docker, where packets go through ip_forward(), can the VXLAN source port differ within the same TCP session.
Re: [ovs-dev] same tcp session encapsulated with different udp src port in kernel mode if packet goes through ip_forward
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -209,6 +209,7 @@ struct upcall {
     ofp_port_t in_port;            /* OpenFlow in port, or OFPP_NONE. */
     uint16_t mru;                  /* If !0, Maximum receive unit of fragmented IP packet */
+    uint32_t skb_hash;
     enum dpif_upcall_type type;    /* Datapath type of the upcall. */
     const struct nlattr *userdata; /* Userdata for DPIF_UC_ACTION Upcalls. */
@@ -772,6 +773,7 @@ recv_upcalls(struct handler *handler)
         struct upcall *upcall = &upcalls[n_upcalls];
         struct flow *flow = &flows[n_upcalls];
         unsigned int mru;
+        unsigned int skb_hash;
         int error;

         ofpbuf_use_stub(recv_buf, recv_stubs[n_upcalls],
@@ -792,6 +794,12 @@ recv_upcalls(struct handler *handler)
             mru = 0;
         }
+        if (dupcall->skb_hash) {
+            skb_hash = nl_attr_get_u32(dupcall->skb_hash);
+        } else {
+            skb_hash = 0;
+        }
+
         error = upcall_receive(upcall, udpif->backer, &dupcall->packet,
                                dupcall->type, dupcall->userdata, flow, mru,
                                &dupcall->ufid, PMD_ID_NULL);
@@ -816,7 +824,7 @@ recv_upcalls(struct handler *handler)
         upcall->out_tun_key = dupcall->out_tun_key;
         upcall->actions = dupcall->actions;
-
+        upcall->skb_hash = skb_hash;
         pkt_metadata_from_flow(&dupcall->packet.md, flow);
         flow_extract(&dupcall->packet, flow);
@@ -1470,6 +1478,7 @@ handle_upcalls(struct udpif *udpif, struct upcall *upcalls,
         op->dop.u.execute.needs_help = (upcall->xout.slow & SLOW_ACTION) != 0;
         op->dop.u.execute.probe = false;
         op->dop.u.execute.mtu = upcall->mru;
+        op->dop.u.execute.skb_hash = upcall->skb_hash;
     }
 }
--
2.1.4

At 2019-11-06 12:04:57, "Tonghao Zhang" wrote:
>On Mon, Nov 4, 2019 at 7:44 PM ychen wrote:
>>
>> we can easily reproduce this phenomenon by using tcp socket stream sending
>> from ovs internal port.
>>
>> At 2019-10-30 19:49:16, "ychen" wrote:
>>
>> Hi,
>> when we use docker to establish tcp session, we found that the packet
>> which must do upcall to userspace has different encapsulated udp source port
>> with packet that only needs do datapath flow forwarding.
>> [snip]
>Should be fixed. The patch will be sent.
[ovs-dev] why is the behavior for weight=0 in a group's dp_hash method different from the default selection method?
hi,
I noticed that we can set a bucket's weight to 0 when adding/modifying a group.
1. when we use the default select method, and all the buckets with weight larger than 0 become dead, we can still pick the bucket whose weight is 0. here is the code, from pick_default_select_group()->group_best_live_bucket():

    LIST_FOR_EACH (bucket, list_node, &group->up.buckets) {
        if (bucket_is_alive(ctx, bucket, 0)) {      /* true when only the weight=0 bucket is active */
            uint32_t score = (hash_int(bucket->bucket_id, basis) & 0xffff)
                             * bucket->weight;
            if (score >= best_score) {              /* a bucket with weight=0 does match this clause */
                best_bucket = bucket;
                best_score = score;
            }
        }
    }

2. but for the dp_hash selection method, we initialize the buckets in group_construct(), and a bucket whose weight is 0 is excluded. Here is the code:

    for (int hash = 0; hash < n_hash; hash++) {
        struct webster *winner = &webster[0];
        for (i = 1; i < n_buckets; i++) {
            if (webster[i].value > winner->value) { /* a bucket with weight=0 is always excluded */
                winner = &webster[i];
            }
        }
        ...
    }

so here is my question: why is the behavior different for the dp_hash method and the default selection method?
[ovs-dev] datapath flow will match packet's ttl when we use dec_ttl in action
hi,
when I send IP packets whose TTL in the IP header is random in the range 1-255, with all other IP header fields unchanged, 255 datapath flows are generated, each with a different ttl value. of course, I use the dec_ttl action; here is the code:

    case OFPACT_DEC_TTL:
        wc->masks.nw_ttl = 0xff;

my question is: can we optimize the dec_ttl action to only differentiate TTL>1 and TTL<=1? as we all know, when TTL=0, we should send the packet to the controller and let it decide whether we should send an ICMP error packet out. when TTL is larger than 1, I think there is no difference, am I right?
[ovs-dev] same tcp session selects different ovs group bucket when tcp retransmits
Hi, all:
recently we met a problem: when using an ovs group with selection method dp_hash, the same tcp session selects different ovs group buckets when a tcp packet retransmits. if we fill different snat gateways into the group buckets, that will reset the tcp session after packet retransmission.
we can reproduce this problem in a simple environment:

Node1: (debian 9.8 with kernel version 4.9.65 and ovs version 2.10.1) acts as an http server
ovs-vsctl add-br br-int
ovs-vsctl set bridge br-int protocols="OpenFlow10","OpenFlow11","OpenFlow12","OpenFlow13","OpenFlow14","OpenFlow15"
ovs-vsctl add-port br-int tap111 -- set interface tap111 type=internal
ovs-vsctl add-port br-int vxlan111 -- set interface vxlan111 type=vxlan options:in_key=flow options:local_ip="10.185.2.46" options:out_key=flow options:remote_ip=flow
ip link set dev tap111 netns ns111
ip netns exec ns111 ip link set dev tap111 up
ip netns exec ns111 ip link set dev tap111 mtu 1450
ip netns exec ns111 ip address add 10.1.1.1/24 dev tap111

//only an emulation: we just set a different nw_ttl in each bucket, so we can easily observe the problem by capturing packets
ovs-ofctl add-group br-int -O openflow15 \
  "group_id=2233,type=select,selection_method=dp_hash,bucket=bucket_id=1,actions=mod_nw_ttl:10,output:vxlan111,bucket=bucket_id=2,actions=mod_nw_ttl:20,output:vxlan111"
ovs-ofctl add-flow br-int -O openflow15 "priority=100,in_port=tap111,ip,actions=set_field:1122->tun_id,set_field:10.185.2.47->tun_dst,group:2233"
ovs-ofctl add-flow br-int -O openflow15 "priority=100,in_port=tap111,arp,actions=set_field:1122->tun_id,set_field:10.185.2.47->tun_dst,output:vxlan111"
ovs-ofctl add-flow br-int -O openflow15 "priority=100,in_port=vxlan111,tun_id=1122,actions=output:tap111"

//use tc netem to emulate tcp retransmission
ip netns exec ns111 tc qdisc add dev tap111 root netem loss 1%

Node2: (debian 9.1 with kernel version 4.9.0 and ovs version 2.8.2) acts as an http client
ovs-vsctl add-br br-int
ovs-vsctl set bridge br-int protocols="OpenFlow10","OpenFlow11","OpenFlow12","OpenFlow13","OpenFlow14","OpenFlow15"
ovs-vsctl add-port br-int tap111 -- set interface tap111 type=internal
ovs-vsctl add-port br-int vxlan111 -- set interface vxlan111 type=vxlan options:in_key=flow options:local_ip="10.185.2.47" options:out_key=flow options:remote_ip=flow
ip link set dev tap111 netns ns111
ip netns exec ns111 ip link set dev tap111 up
ip netns exec ns111 ip link set dev tap111 mtu 1450
ip netns exec ns111 ip address add 10.1.1.8/24 dev tap111
ovs-ofctl add-flow br-int -O openflow15 "priority=100,in_port=tap111,actions=set_field:1122->tun_id,set_field:10.185.2.46->tun_dst,output:vxlan111"
ovs-ofctl add-flow br-int -O openflow15 "priority=100,in_port=vxlan111,tun_id=1122,actions=output:tap111"

In such an environment, when we try to get a large file from Node1 (the http server), we find that after a tcp retransmission, not only does the vxlan udp source port in the outer header change, but the inner IP header TTL changes as well.
I think maybe sk_rethink_txhash() changes skb->hash when tcp retransmits, and any function that calls skb_get_hash() will be affected, like execute_hash() and udp_flow_src_port().
[ovs-dev] dp_hash algorithm works incorrectly when tcp retransmits
We met a problem where the same tcp session selects a different ovs group bucket during tcp retransmission, and we can easily reproduce this phenomenon. After some code research, we found that when tcp retransmits, it may call the function sk_rethink_txhash(), and this function changes skb->hash, hence a different ovs group bucket is selected. does anyone have good suggestions to fix this problem?
[ovs-dev] question about userspace flow stats with meter
hi, I want to know how datapath stats are mapped to userspace flow stats. are there any documents?
example:
table=0,in_port=1, meter=11,goto_table:2
table=2,in_port=1,output:2
meter: rate=1Mbps
when I send packets at 2Mbps from port 1, and in total 10000 packets are transmitted, I first expected that table=0 would show stats of 10000 packets, and table=2 would only have 5000 packets (the meter stats show that 5000 packets were dropped), but actually both table=0 and table=2 show stats of 10000 packets.
[ovs-dev] ifup locked when start ovs in debian9 with systemd
1. phenomenon
ifup: waiting for lock on /run/network/ifstate.br-int

2. configurations
/etc/network/interfaces:
allow-ovs br-int
iface br-int inet manual
    ovs_type OVSBridge
    ovs_ports tap111
allow-br-int tap111
iface ngwintp inet manual
    ovs_bridge br-int
    ovs_type OVSIntPort

3. start ovs
systemctl start openvswitch-switch
now we can see 2 ifup processes, and when using the command "systemctl status openvswitch-switch", we can see the error "ifup: waiting for lock on /run/network/ifstate.br-int"

4. I found that in the ovs ifupdown.sh script there is this shell command:
if /etc/init.d/openvswitch-switch status > /dev/null 2>&1; then :; else
    /etc/init.d/openvswitch-switch start
fi
does it mean that if openvswitch is not running, then start it? but when using systemd, the command "/etc/init.d/openvswitch-switch status" always returns a value not equal to 0, hence openvswitch is restarted and ifup is started again; that is what causes the LOCK.

5. when using the following shell command instead:
if ovs_ctl status > /dev/null 2>&1; then :; else
    /etc/init.d/openvswitch-switch start
fi
we can start and stop openvswitch smoothly, but when we use "ifup --allow=ovs br-int", ifup LOCKS again.

6. my question is: why do we need to bring up the openvswitch process in ifupdown.sh? is there a simple way to fix this problem?
Re: [ovs-dev] ifup locked when start ovs in debian9 with systemd
thanks, Shetty. I tried the patch you provided, and it does fix the problem when ovs is started.

At 2017-06-19 23:00:18, "Guru Shetty" wrote:
What OVS version is this? What is the platform version? i.e Debian/Ubuntu etc.
Does your OVS have the following fix?
https://github.com/openvswitch/ovs/commit/15af3d44c65eb3cd724378ce1b30c51aa87f4f69

On 19 June 2017 at 07:17, ychen wrote:
>> [snip]
[ovs-dev] vswitchd crashed when revalidate flows in ovs 2.8.2
Hi, has anyone seen the following backtrace?

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfi'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f82d6ffd700 (LWP 10089))]
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x7f835a2b042a in __GI_abort () at abort.c:89
#2 0x7f835a2a7e67 in __assert_fail_base (fmt=, assertion=assertion@entry=0x7f835ab39df2 "mutex->__data.__owner == 0", file=file@entry=0x7f835ab39dd5 "../nptl/pthread_mutex_lock.c", line=line@entry=81, function=function@entry=0x7f835ab39f60 <__PRETTY_FUNCTION__.8475> "__pthread_mutex_lock") at assert.c:92
#3 0x7f835a2a7f12 in __GI___assert_fail (assertion=assertion@entry=0x7f835ab39df2 "mutex->__data.__owner == 0", file=file@entry=0x7f835ab39dd5 "../nptl/pthread_mutex_lock.c", line=line@entry=81, function=function@entry=0x7f835ab39f60 <__PRETTY_FUNCTION__.8475> "__pthread_mutex_lock") at assert.c:101
#4 0x7f835ab30d50 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f835b3935e0) at ../nptl/pthread_mutex_lock.c:81
#5 0x7f835b064218 in ovs_mutex_lock_at (l_=l_@entry=0x7f835b3935e0, where=where@entry=0x7f835b1052cb "lib/seq.c:141") at lib/ovs-thread.c:76
#6 0x7f835b0841d7 in seq_change (seq=0x55982c7b5630) at lib/seq.c:141
#7 0x7f835b062d06 in ovsrcu_quiesce () at lib/ovs-rcu.c:152
#8 0x7f835b5f7058 in revalidator_sweep__ (revalidator=revalidator@entry=0x55982c7bb178, purge=purge@entry=false) at ofproto/ofproto-dpif-upcall.c:2549
#9 0x7f835b5f9b80 in revalidator_sweep (revalidator=0x55982c7bb178) at ofproto/ofproto-dpif-upcall.c:2556
#10 udpif_revalidator (arg=0x55982c7bb178) at ofproto/ofproto-dpif-upcall.c:913
#11 0x7f835b0641d7 in ovsthread_wrapper (aux_=) at lib/ovs-thread.c:348
#12 0x7f835ab2e4a4 in start_thread (arg=0x7f82d6ffd700) at pthread_create.c:456
#13 0x7f835a364d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

we haven't found a rule for how to reproduce it, but it crashes frequently, about once a day.
kernel version: 4.9.0-3-openstack-amd64
ovs version: 2.8.2
Re: [ovs-dev] vswitchd crashed when revalidate flows in ovs 2.8.2
(gdb) p/x seq_mutex
$1 = {
  lock = {
    __data = {
      __lock = 0x2,
      __count = 0x0,
      __owner = 0x0,        ==> owner is already 0, but it still aborts
      __nusers = 0x0,
      __kind = 0x2,
      __spins = 0x0,
      __elision = 0x0,
      __list = {
        __prev = 0x0,
        __next = 0x0
      }
    },
    __size = {0x2, 0x0 , 0x2, 0x0 },
    __align = 0x2
  },
  where = 0x7f835b0e5520
}

At 2019-08-26 19:51:20, "ychen" wrote:
[original backtrace quoted in full; snipped]
[ovs-dev] Meter measures incorrectly when using multi-pmd in ovs 2.10
Hi, I met a problem when sending packets using netperf in multi-thread mode. The reproducing conditions are like this:
1. ovs 2.10 in dpdk usermode with 2 pmds
2. set a Meter with rate=100,000 pps, burst=20,000 packets
3. when using single-thread mode for netperf, the Meter behaves correctly, and packets above 100,000 pps are dropped; when using multi-thread mode for netperf, we noticed packets coming from both 2 pmds, and in this case the Meter measurement does not work.
But the Meter behaves correctly in ovs 2.8, whether using a single pmd or multiple pmds. Also, we have merged the patch 42697ca7757b594cc841d944e43ffc17905e3188 (long_delta_t = now / 1000 - meter->used / 1000) but the problem still exists.
We researched the meter code and found that the meter uses time_usec() to compute the delta time in ovs 2.10, but when the function dp_netdev_run_meter() is called, the input parameter 'now' comes from pmd->ctx.now, and pmd->ctx.now may be updated when a packet is received in the function dp_netdev_process_rxq_port().
so let's suppose the following condition:
pmd1 receives a packet at T1
pmd2 receives a packet at T2 (T2 < T1)
if pmd1 handles the Meter first, meter->used is changed to pmd1->ctx.now; then when the Meter is handled in pmd2, now = pmd2->ctx.now and
long_delta_t = now / 1000 - meter->used / 1000 = T2/1000 - T1/1000
which will be a negative value!!!
then the delta time is computed with this clause:
delta_t = (long_delta_t > (long long int)meter->max_delta_t) ? meter->max_delta_t : (uint32_t)long_delta_t;
in this case, delta_t = (uint32_t)(T2/1000 - T1/1000) will overflow???
for now, we just fix this problem by replacing the input parameter 'now' with time_usec(). we don't know whether this code change has any side effects (performance issues?)
[ovs-dev] group dp_hash method works incorrectly when using snat
Hi,
We found that when the same TCP session uses snat with a dp_hash group as the output action, the SYN packet and the other packets behave differently: the SYN packet outputs to one group bucket, and the other packets output to another group bucket.

Here are the ovs flows:
table=0,in_port=DOWN_PORT,tun_id=vni,ip,actions=ct(nat,zone=ZID,table=1)
table=1,ip,ct_state=+new,ct(commit,nat,src=SNAT_PUB_IP,zone=ZID,table=2)
table=1,ip,ct_state=-new,actions=goto_table(table=2)
table=2,ip,actions=group:1
group=1,type=select,selection_method=dp_hash,bucket=actions=output:UP_PORT1,bucket=actions=output:UP_PORT2

Here are the datapath flows:
tunnel(tun_id=0x1435,src=10.185.2.87,dst=10.185.2.93,flags(-df+csum+key)),recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(src=192.168.100.16/255.255.255.240,frag=no), packets:5, bytes:455, used:2.978s, flags:FP., actions:meter(248),meter(249),ct(zone=1298,nat),recirc(0x176)

flow-dump from pmd on cpu core: 6
tunnel(tun_id=0x1435,src=10.185.2.87,dst=10.185.2.93,flags(-df+csum+key)),ct_state(+new-inv),ct_zone(0x512),recirc_id(0x176),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:meter(250),ct(commit,zone=1298,nat(src=172.16.1.152:1024-65535)),recirc(0x177)
tunnel(tun_id=0x1435,src=10.185.2.87,dst=10.185.2.93,flags(-df+csum+key)),ct_state(-new-inv),ct_zone(0x512),recirc_id(0x176),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:00:00:00,dst=00:00:00:00:00:00),eth_type(0x0800),ipv4(ttl=64,frag=no), packets:4, bytes:389, used:3.002s, flags:FP., actions:set(eth(src=fa:25:fa:c2:52:71,dst=xx:xx:xx:xx:xx:xx)),set(ipv4(ttl=63)),hash(hash_l4(0)),recirc(0x178)

flow-dump from pmd on cpu core: 6
tunnel(tun_id=0x1435,src=10.185.2.87,dst=10.185.2.93,flags(-df+csum+key)),recirc_id(0x178),dp_hash(0x8a6c9809/0xf),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:4, bytes:389, used:3.025s, flags:FP., actions:2
tunnel(tun_id=0x1435,src=10.185.2.87,dst=10.185.2.93,flags(-df+csum+key)),recirc_id(0x178),dp_hash(0xbab97b2e/0xf),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:3

flow-dump from pmd on cpu core: 6
tunnel(tun_id=0x1435,src=10.185.2.87,dst=10.185.2.93,flags(-df+csum+key)),recirc_id(0x177),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:00:00:00,dst=00:00:00:00:00:00),eth_type(0x0800),ipv4(ttl=64,frag=no), packets:0, bytes:0, used:never, actions:set(eth(src=fa:25:fa:c2:52:71,dst=xx:xx:xx:xx:xx:xx)),set(ipv4(ttl=63)),hash(hash_l4(0)),recirc(0x178)

from the above datapath flows, we can draw these conclusions:
1. the first SYN packet matches ct_state=+new and recirculates 3 times
2. the other packets match ct_state=-new and recirculate only 2 times
3. packets matching +new and packets matching -new have different dp_hash values, hence may output to different ports (TCP packets of the same session output to different ports may increase the risk of reordering)

we researched the ovs code and found the following:

dpif_netdev_packet_get_rss_hash(struct dp_packet *packet, const struct miniflow *mf)
{
    uint32_t hash, recirc_depth;

    if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
        hash = dp_packet_get_rss_hash(packet);
    } else {
        hash = miniflow_hash_5tuple(mf, 0);
        dp_packet_set_rss_hash(packet, hash);
    }

    /* The RSS hash must account for the recirculation depth to avoid
     * collisions in the exact match cache */
    recirc_depth = *recirc_depth_get_unsafe();
    if (OVS_UNLIKELY(recirc_depth)) {
        hash = hash_finish(hash, recirc_depth);   => this code changes the RSS hash, and this function is called before EMC lookup
        dp_packet_set_rss_hash(packet, hash);
    }
    return hash;
}

so is there any method to fix this problem? we tried changing the ovs flow to:
table=1,ip,ct_state=-new,actions=ct(commit, table=2)
and the problem disappears, but then packets matching ct_state=-new also need to recirc 3 times, which may decrease performance.
Re: [ovs-dev] [PATCH] dpif-netdev: Do not mix recirculation depth into RSS hash itself.
Thanks! I have verified this in our testing environment, and it really works!

At 2019-10-24 20:32:11, "Ilya Maximets" wrote:
>Mixing of RSS hash with recirculation depth is useful for flow lookup
>because same packet after recirculation should match with different
>datapath rule. Setting of the mixed value back to the packet is
>completely unnecessary because recirculation depth is different on
>each recirculation, i.e. we will have different packet hash for
>flow lookup anyway.
>
>This should fix the issue that packets from the same flow could be
>directed to different buckets based on a dp_hash or different ports of
>a balanced bonding in case they were recirculated different number of
>times (e.g. due to conntrack rules).
>With this change, the original RSS hash will remain the same making
>it possible to calculate equal dp_hash values for such packets.
>
>Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-September/363127.html
>Fixes: 048963aa8507 ("dpif-netdev: Reset RSS hash when recirculating.")
>Signed-off-by: Ilya Maximets
>---
> lib/dpif-netdev.c | 1 -
> 1 file changed, 1 deletion(-)
>
>diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>index 4546b55e8..c09b8fd95 100644
>--- a/lib/dpif-netdev.c
>+++ b/lib/dpif-netdev.c
>@@ -6288,7 +6288,6 @@ dpif_netdev_packet_get_rss_hash(struct dp_packet *packet,
>     recirc_depth = *recirc_depth_get_unsafe();
>     if (OVS_UNLIKELY(recirc_depth)) {
>         hash = hash_finish(hash, recirc_depth);
>-        dp_packet_set_rss_hash(packet, hash);
>     }
>     return hash;
> }
>--
>2.17.1
Re: [ovs-dev] meter stats cleared when modify meter bands
I know that in the latest ovs versions, both the kernel datapath and the dpdk userspace datapath support the meter action. what I want to know is why we need to clear the stats when modifying meter bands? are there any considerations? I think it would be easy to keep the meter stats when only the meter bands are modified.

At 2021-07-28 13:46:21, "Tonghao Zhang" wrote:
>On Wed, Jul 28, 2021 at 10:57 AM ychen wrote:
>>
>> Hi, all:
>> I have a question: why do meter stats need to be cleared when just modifying meter bands?
>> when the function handle_modify_meter() is called, it finally calls dpif_netdev_meter_set(); in this function a new dp_meter is allocated and attached, hence the stats are cleared.
>> if we just updated the dp_meter bands configuration, the stats would be kept across the meter modify.
>> Is there any consideration behind this meter modify action?
>The commit 80738e5f93a70 supports the meter for kernel datapath.
>and even though kernel modules support to set stats, but userspace
>doesn't set them.
>https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=96fbc13d7e770b542d2d1fcf700d0baadc6e8063
>
>If needed, we can support this.
>
>--
>Best regards, Tonghao
[ovs-dev] dest mac in fast datapath does not act as expected
hi,
when we send 2 packets with different dest macs within 10s (the fast datapath flow aging time), with the same userspace flow action, the second packet acts incorrectly.

1. problem phenomenon:
userspace flow:
in_port=1,table=0,cookie=0x123,priority=500,tun_id=0x3562,actions=set_field:fe:ff:ff:ff:ff:ff->eth_src,set_field:fa:16:3e:c0:ee:8c->eth_dst,output:tap111
packet captured:
13:51:02.097914 fe:ff:ff:ff:ff:ff > fa:16:3e:c0:ee:8c, ethertype IPv4 (0x0800), length 115: 10.194.50.240.53 > 10.199.16.44.48651: 35479 1/0/1 A 66.102.251.24 (73)  // first packet, mac changed correctly
13:51:04.213568 fe:ff:ff:ff:ff:ff > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 115: 10.194.50.240.53 > 10.199.16.44.48651: 35479 1/0/1 A 66.102.251.24 (73)  // second packet, dest mac stays unchanged

2. environment
ovs 2.12, can only reproduce in kernel mode

3. reproduce steps
client node --> server node

3.1 client node configuration:
$ sudo ovs-vsctl show
77f97d1d-e34f-4e4c-b4f1-1d2299a4411a
    Bridge br-test
        fail_mode: secure
        Port "vxlan11"
            Interface "vxlan11"
                type: vxlan
                options: {in_key=flow, local_ip="10.185.2.87", out_key=flow, remote_ip=flow}
        Port br-test
            Interface br-test
                type: internal
        Port "tap11"
            Interface "tap11"
                type: internal
    ovs_version: "2.12.0"

$ sudo ovs-ofctl dump-flows br-test -O openflow13
 cookie=0x0, duration=223377.958s, table=0, n_packets=7512, n_bytes=1755097, reset_counts in_port=tap11 actions=set_field:0x3562->tun_id,load:0xab90251->NXM_NX_TUN_IPV4_DST[],output:vxlan11

3.2 server node configuration:
# ovs-vsctl show
f39fb127-019b-41c3-86b7-a420a3b4d7f2
    Bridge br-int
        fail_mode: secure
        Port "vf-10.185.2.81"
            Interface "vf-10.185.2.81"
                type: vxlan
                options: {csum="true", df_default="false", in_key=flow, local_ip="10.185.2.81", out_key=flow, remote_ip=flow}
        Port br-int
            Interface br-int
                type: internal
        Port "tap111"
            Interface "tap111"
                type: internal
    ovs_version: "2.12.0"

# ovs-ofctl add-flow br-int -O openflow13 "in_port=1,table=0,cookie=0x123,priority=500,tun_id=0x3562,actions=set_field:fe:ff:ff:ff:ff:ff->eth_src,set_field:c6:3a:16:ec:e0:d9->eth_dst,output:tap111"

3.3 sending packets
packet payload:
    dst mac: c6:3a:16:ec:e0:d9
    src mac: 02:00:00:00:00:00
    src ip: 10.194.50.241
    dst ip: 10.100.100.212
    proto: udp
    l4 port: 45678
# ovs-ofctl packet-out br-test 1 "table=0" "c63a16ece0d902000800451c400040118de60ac232f10a6464d48000b26e00082084"
sleep 1s, then send the second packet:
# ovs-ofctl packet-out br-test 1 "table=0" "02000800451c400040118de60ac232f10a6464d48000b26e00082084"

3.4 server node packet capture
10:49:59.725865 fe:ff:ff:ff:ff:ff > c6:3a:16:ec:e0:d9, ethertype IPv4 (0x0800), length 60: 10.194.50.241.32768 > 10.100.100.212.45678: UDP, length 0
10:50:00.881564 fe:ff:ff:ff:ff:ff > 11:11:11:11:11:11, ethertype IPv4 (0x0800), length 60: 10.194.50.241.32768 > 10.100.100.212.45678: UDP, length 0  // this is wrong, the dest mac should be c6:3a:16:ec:e0:d9

3.5 fast datapath flow in server node
recirc_id(0),tunnel(tun_id=0x3562,src=10.185.2.87,dst=10.185.2.81,flags(-df-csum+key)),in_port(1),eth(src=02:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:1, bytes:60, used:2.176s, actions:set(eth(src=fe:ff:ff:ff:ff:ff)),3
correct datapath flow:
recirc_id(0),tunnel(tun_id=0x3562,src=10.185.2.87,dst=10.185.2.81,flags(-df-csum+key)),in_port(1),eth(src=02:00:00:00:00:00,dst=11:11:11:11:11:11),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:set(eth(src=fe:ff:ff:ff:ff:ff,dst=c6:3a:16:ec:e0:d9)),3
compared with the correct datapath flow, the dest mac has disappeared from both the match and the action.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
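The failure mode in 3.5 can be modeled with a toy megaflow cache in Python (illustrative only; the real datapath flow cache and flow translation are far more involved): because the installed megaflow neither matches on eth_dst nor rewrites it, the second packet, which differs only in its dest mac, still hits the cached entry and runs the first packet's incomplete action list.

```python
# Toy megaflow cache: a cached flow matches a packet if all of the
# flow's *masked* fields are equal; unmasked fields are wildcarded.
# Illustrative model only -- not actual OVS datapath code.

def masked_key(packet, mask_fields):
    """Project a packet onto the masked fields, in a canonical order."""
    return tuple(sorted((f, packet[f]) for f in mask_fields))

class MegaflowCache:
    def __init__(self):
        self.flows = {}   # (mask, masked key) -> actions

    def install(self, packet, mask_fields, actions):
        key = (frozenset(mask_fields), masked_key(packet, mask_fields))
        self.flows[key] = actions

    def lookup(self, packet):
        for (mask_fields, key), actions in self.flows.items():
            if masked_key(packet, mask_fields) == key:
                return actions
        return None

cache = MegaflowCache()

# The buggy flow from section 3.5: eth_dst is absent from both the
# match (mask) and the actions.
pkt1 = {"in_port": 1, "eth_src": "02:00:00:00:00:00",
        "eth_dst": "c6:3a:16:ec:e0:d9"}
cache.install(pkt1, {"in_port", "eth_src"},
              [("set", "eth_src", "fe:ff:ff:ff:ff:ff"), ("output", 3)])

# The second packet differs only in eth_dst, which is wildcarded, so it
# hits the same cached flow -- and its dest mac is never rewritten.
pkt2 = dict(pkt1, eth_dst="11:11:11:11:11:11")
assert cache.lookup(pkt2) == cache.lookup(pkt1)
```

In the correct datapath flow shown above, eth_dst appears in both the mask and the set() action, so a packet with a different dest mac would miss the cache, go to userspace, and get its own flow installed.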
Re: [ovs-dev] dest mac in fast datapath does not act as expected
thanks! upgrading to ovs version 2.12.1 fixed my problem.

At 2022-08-17 18:49:46, "Ilya Maximets" wrote:
>On 8/17/22 11:32, ychen wrote:
>> hi,
>> when we send 2 packets with different dest macs within 10s (the fast
>> datapath flow aging time), with the same userspace flow action, the
>> second packet acts incorrectly.
>> [...]