Re: [ovs-dev] [PATCH ovn] northd: Remove the usage of 'ct.inv' in logical flows.
On Thu, Mar 4, 2021 at 5:25 PM Numan Siddique wrote: > > On Fri, Mar 5, 2021 at 4:22 AM Han Zhou wrote: > > > > On Mon, Mar 1, 2021 at 5:40 AM Numan Siddique wrote: > > > > > > On Fri, Feb 26, 2021 at 1:15 AM Han Zhou wrote: > > > > > > > > On Thu, Feb 25, 2021 at 1:25 AM Numan Siddique wrote: > > > > > > > > > > On Thu, Feb 25, 2021 at 1:12 PM Han Zhou wrote: > > > > > > > > > > > > On Wed, Feb 24, 2021 at 5:27 AM wrote: > > > > > > > > > > > > > > From: Numan Siddique > > > > > > > > > > > > > > Presently we add 65535 priority lflows in the stages - > > > > > > > 'ls_in_acl' and 'ls_out_acl' to drop packets which > > > > > > > match on 'ct.inv'. > > > > > > > > > > > > > > As per the 'ovs-fields' man page, this > > > > > > > ct state field can be used to identify problems such as: > > > > > > > • L3/L4 protocol handler is not loaded/unavailable. > > > > > > > > > > > > > > • L3/L4 protocol handler determines that the packet is > > > > > > > malformed. > > > > > > > > > > > > > > • Packets are unexpected length for protocol. > > > > > > > > > > > > > > This patch removes the usage of this field for the following > > > > > > > reasons: > > > > > > > > > > > > > > • Some of the smart NICs which support offloading datapath > > > > > > >flows don't support this field. > > > > > > > > > > > > What do you mean by "don't support this field"? Do you mean the NIC > > > > > > offloading supports connection tracking, but cannot detect if a > > packet > > > > is > > > > > > invalid and always populate the ct.inv as 0? > > > > > > > > > > I think so. From what I understand, the kernel conntrack feature is > > used > > > > > for the actual connection tracking. So NIC can't tell if the packet is > > > > > invalid or not > > > > > (say due to out-of-window tcp errors). > > > > > > > > > I know some NICs support CT offloading and some doesn't. > > > > So here what you are referring are the NICs that doesn't support CT > > > > offloading, which falls back to kernel datapath when CT is used, is it? > > If > > > > this is the case, then even without ct.inv it still couldn't support > > > > ct.est, etc. right? > > > > Or, do you mean this is specifically for NICs that support CT offloading > > > > but not ct.inv, i.e. it can do regular conntrack in NIC but just can't > > > > identify out-of-window packets, and that's why it supports ct.est but > > not > > > > ct.inv? > > > > I am still quite confused. Could clarify a little more, which types of > > NICs > > > > would benefit from this, and how? > > > > > > I'm not sure if I can explain the issue well. Can you please look > > > into this bugzilla - > > > https://bugzilla.redhat.com/show_bug.cgi?id=1921946 > > > We can discuss further if you have further questions or comments. > > > > > Unfortunately this one seems to require access permission. > > Ok. Let me try to share in some other way. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > • A recent commit in kernel ovs datapath sets the committed > > > > > > >connection tracking entry to be liberal for out-of-window > > > > > > >tcp packets (nf_ct_set_tcp_be_liberal()). Such TCP > > > > > > >packets will not be marked as invalid. > > > > > > > > > > > > > > > > > > > Could you share a link to this commit? > > > > > > > > > > Sure. > > > > > > > > > > > https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next/+/e2ef5203c817a60bfb591343ffd851b6537370ff > > > > > > > > > Thanks for sharing. So OVS is not capable of detecting a out-of-window > > > > packet now. Could you explain more about the motivation? I couldn't get > > the > > > > full picture from commit message of that patch. Do you have a link that > > > > discusses more details? > > > > > > Let me share with you the patch which I first submitted to handle this > > issue. > > > During the review, @Florian Westphal suggested being liberal for > > out-of-window > > > packets to solve this issue. > > > > > > I think the patch discussions have enough information. Please let me know > > > if you have further questions or comments and we can discuss further. > > > > > > Initial approach taken by me - > > > > > https://patchwork.ozlabs.org/project/netdev/patch/20201006083355.121018-1-nusid...@redhat.com/ > > > Final approach taken - > > > > > https://patchwork.ozlabs.org/project/netdev/patch/20201109072930.14048-1-nusid...@redhat.com/ > > > > > Thanks for the pointers. Now I have a better context. It seems all these > > work was to deal with (optimize) the LB (stateful) + stateless ACL use > > cases. > > 1. we don't want to track packets coming from VIF (because there is no > > stateful ACL) > > 2. but packets to VIF need to go through CT because there is LB configured > > which requires CT (for nat), which regarded return packets as invalid. > > > > With the above patch, the return packets won't be invalid any more in the > > above scenario. > > However, isn't the stateful ACL support also insufficient now? With
Re: [ovs-dev] [PATCH ovn 2/2] northd: Fix the missing force_snat_for_lb flows when router_ip is configured.
On Fri, Mar 5, 2021 at 7:10 AM Ben Pfaff wrote: > > On Fri, Mar 05, 2021 at 06:46:55AM +0530, Numan Siddique wrote: > > On Fri, Mar 5, 2021 at 2:30 AM Ben Pfaff wrote: > > > > > > On Wed, Mar 03, 2021 at 11:32:39PM +0530, num...@ovn.org wrote: > > > > From: Numan Siddique > > > > > > > > The commit c6e21a23bd8 which supported the option > > > > 'lb_force_snat_ip=router_ip' > > > > on a gateway router, missed out on > > > > - updating the flows in 'lr_in_dnat' to set 'flags.force_snat_for_lb > > > > = 1'. > > > > - removing the flow to drop if ip.dst == router port IP in > > > > 'lr_in_ip_input' > > > > stage. > > > > > > > > This patch fixes these issue and adds a system test to cover the > > > > hairpin load balancer traffic for the gateway routers. > > > > > > > > Fixes: c6e21a23bd8("northd: Provide the Gateway router option > > > > 'lb_force_snat_ip' to take router port ips.") > > > > > > > > CC: Ben Pfaff > > > > Signed-off-by: Numan Siddique > > > > > > Thanks! I took a look at the differences between the C and the DDlog > > > versions. I have a few comments. > > > > > > The C version adds a new flow to the output. The DDlog version doesn't > > > appear to, yet I do see a similar flow in the DDlog code. I am not sure > > > how that happened. > > > > Thanks for the review. > > > > Can you please tell me in which table the new flow is seen? > > > > If a logical router has the option 'lb_force_nat_ip=router_ip' is set, then > > both > > the C version and ddlog version do the below > > > > (1) Update the flow in the table lr_in_dnat to add > > "flags.force_snat_for_lb = 1" in the action > > for the packets which are destined to load balancer vips. > > > > (2) Don't add the priority-60 flow in 'lr_in_ip_input' to drop the > > packets for router ips. > > The below suggestion provided by you to replace "not > > force_lb_snat" with ".force_lb_snat = false" > > does that. > > If they do the same thing, then that's great and no change is needed. > What surprises me is that the C version of the changes has the following > hunk to add a new flow, but there's no corresponding change to the DDlog > version to add a new flow. > > @@ -9153,10 +9153,17 @@ build_lrouter_force_snat_flows_op(struct ovn_port *op, > > if (op->lrp_networks.n_ipv4_addrs) { > ds_clear(match); > ds_clear(actions); > > +ds_put_format(match, "inport == %s && ip4.dst == %s", > + op->json_key, op->lrp_networks.ipv4_addrs[0].addr_s); > +ovn_lflow_add(lflows, op->od, S_ROUTER_IN_UNSNAT, 110, > + ds_cstr(match), "ct_snat;"); > + > +ds_clear(match); > + > /* Higher priority rules to force SNAT with the router port ip. > * This only takes effect when the packet has already been > * load balanced once. */ > ds_put_format(match, "flags.force_snat_for_lb == 1 && ip4 && " >"outport == %s", op->json_key); > Oops. You're right. I totally missed it. Thanks a lot for spotting this. I'll address this and send v2. > > > You can optimize it by writing > > > > + .force_lb_snat = false, > > > and then omitting the later clause. With the way DDlog currently > > > works, this isn't just cosmetic, it will actually yield faster > > > code because DDlog will discard nonmatching records earlier. (If > > > that's an actual performance problem it will come out later in > > > profiling, I'm just giving a tip here that I only recently learned > > > myself.) > > > > I tried like below as per your suggestion and the compilation failed > > Oh, I meant like the following incremental. Sorry I wasn't clear! > > diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl > index 0e208c1f5160..21092cc1bcea 100644 > --- a/northd/ovn_northd.dl > +++ b/northd/ovn_northd.dl > @@ -5142,12 +5142,11 @@ Flow(.logical_datapath = lr_uuid, > .external_ids = stage_hint(lrp_uuid)) :- > (.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid}, > .router = {.snat_ips = snat_ips, > - .force_lb_snat = force_lb_snat, > + .force_lb_snat = false, Thanks. I'll try this out. Numan >.lr = nb::Logical_Router{._uuid = > lr_uuid}}, > .networks = networks), > var addr = FlatMap(networks.ipv4_addrs), > not snat_ips.contains_key(IPv4{addr.addr}), > -not force_lb_snat, > var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec(). > Flow(.logical_datapath = lr_uuid, > .stage = router_stage(IN, IP_INPUT), > @@ -5157,12 +5156,11 @@ Flow(.logical_datapath = lr_uuid, > .external_ids = stage_hint(lrp_uuid)) :- > (.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid}, > .router = {.snat_ips = snat_ips, > - .force_lb_snat = force_lb_snat, > +
Re: [ovs-dev] [PATCH v12 00/11] Add offload support for sFlow
Hi Ilya, I think about your suggestion recently. But I'm still not very clear about the design. Please see my reply below: On 3/1/2021 8:48 PM, Ilya Maximets wrote: On 3/1/21 9:30 AM, Chris Mi wrote: Hi Simon, Ilya, Could I know what should we do to make progress for this patch set? It has been posted in the community for a long time In general, the way to get your patches reviewed is to review other patches. It's simply because we still have a huge review backlog (214 patches right now in patchwork and most of them needs review) and bugfixes usually has a bit higher priority. By reviewing other patches you're reducing amount of work for maintainers so they can get to your patches faster. OK, I see. For the series and comments from Eelco: I didn't read the patches carefully, only a quick glance, but I still do not understand why we need a separate thread to poll psample events. Why can't we just allow usual handler threads to do that? I'm not sure if you are aware of that the psample netlink is different from the ovs netlink. Without offload, kernel sends missed packet and sFlow packet to userspace using the same netlink 'ovs_packet_family'. So they can use the same handler thread. But in order to offload sFlow action, we use psample kernel module to send sampled packets from kernel to userspace. The format for ovs netlink message and psample netlink messages are different. From the architecture perspective it's not a good thing to call ofproto code from the offload-provider. This introduces lots of complications and might cause lots of issues in the future. I'd say that design should look like this: handler thread -> dpif_recv() -> dpif_netlink_recv() -> netdev_offload_recv() -> netdev_offload_tc_recv() -> nl_sock_recv() In order to use the handler thread, I'm not sure if we should add psample socket fd to every handler's epoll_fd. If we should do that, we should consider how to differentiate if the event comes from ovs or psample netlink. Maybe we should allocate a special port number for psample and assign it to event.data.u32. Anyway, that's the details. If this is the right direction, I'll think about it. Last three calls should be implemented. The better version of this will be to throw away dpif part from above call chain and make it: handler thread -> netdev_offload_recv() -> netdev_offload_tc_recv() -> nl_sock_recv() If we throw away dpif part, maybe we have to write some duplicate epoll code for psample only. And we can't block in nl_sock_recv(). Maybe we have to add psample socket fd to every handler's epoll_fd. So we have to change the dpif somehow. I don't know if I understand your suggestion correctly. Or I missed anything So if you have time, could you please elaborate? Thanks, Chris This way we could avoid touching dpif-netdev and still have psample offloading for the case where netdev-offload-tc is used from the userspace datapath. Above architecture also implies implementation of: - netdev_offload_recv_wait() - netdev_offload_recv_purge() - and the netdev_offload_tc_* counterparts. Best regards, Ilya Maximets. Thanks, Chris On 2/23/2021 5:08 PM, Roi Dayan wrote: On 2021-01-27 8:23 AM, Chris Mi wrote: This patch set adds offload support for sFlow. Psample is a genetlink channel for packet sampling. TC action act_sample uses psample to send sampled packets to userspace. When offloading sample action to TC, userspace creates a unique ID to map sFlow action and tunnel info and passes this ID to kernel instead of the sFlow info. psample will send this ID and sampled packet to userspace. Using the ID, userspace can recover the sFlow info and send sampled packet to the right sFlow monitoring host. v2-v1: - Fix robot errors. v3-v2: - Remove Gerrit Change-Id. - Add patch #9 to fix older kernels build issue. - Add travis test result. v4-v3: - Fix offload issue when sampling rate is 1. v5-v4: - Move polling thread from ofproto to netdev-offload-tc. v6-v5: - Rebase. - Add GitHub Actions test result. v7-v6: - Remove Gerrit Change-Id. - Fix "ERROR: Inappropriate spacing around cast" v8-v7 - Address Eelco Chaudron's comment for patch #11. v9-v8 - Remove sflow_len from struct dpif_sflow_attr. - Log a debug message for other userspace actions. v10-v9 - Address Eelco Chaudron's comments on v9. v11-v10 - Fix a bracing error. v12-v11 - Add duplicate sample group id check. Chris Mi (11): compat: Add psample and tc sample action defines for older kernels ovs-kmod-ctl: Load kernel module psample dpif: Introduce register sFlow upcall callback API ofproto: Add upcall callback to process sFlow packet netdev-offload: Introduce register sFlow upcall callback API netdev-offload-tc: Implement register sFlow upcall callback API dpif-netlink: Implement register sFlow upcall callback API netdev-offload-tc: Introduce group ID management API
Re: [ovs-dev] [PATCH ovn 2/2] northd: Fix the missing force_snat_for_lb flows when router_ip is configured.
On Fri, Mar 05, 2021 at 06:46:55AM +0530, Numan Siddique wrote: > On Fri, Mar 5, 2021 at 2:30 AM Ben Pfaff wrote: > > > > On Wed, Mar 03, 2021 at 11:32:39PM +0530, num...@ovn.org wrote: > > > From: Numan Siddique > > > > > > The commit c6e21a23bd8 which supported the option > > > 'lb_force_snat_ip=router_ip' > > > on a gateway router, missed out on > > > - updating the flows in 'lr_in_dnat' to set 'flags.force_snat_for_lb = > > > 1'. > > > - removing the flow to drop if ip.dst == router port IP in > > > 'lr_in_ip_input' > > > stage. > > > > > > This patch fixes these issue and adds a system test to cover the > > > hairpin load balancer traffic for the gateway routers. > > > > > > Fixes: c6e21a23bd8("northd: Provide the Gateway router option > > > 'lb_force_snat_ip' to take router port ips.") > > > > > > CC: Ben Pfaff > > > Signed-off-by: Numan Siddique > > > > Thanks! I took a look at the differences between the C and the DDlog > > versions. I have a few comments. > > > > The C version adds a new flow to the output. The DDlog version doesn't > > appear to, yet I do see a similar flow in the DDlog code. I am not sure > > how that happened. > > Thanks for the review. > > Can you please tell me in which table the new flow is seen? > > If a logical router has the option 'lb_force_nat_ip=router_ip' is set, then > both > the C version and ddlog version do the below > > (1) Update the flow in the table lr_in_dnat to add > "flags.force_snat_for_lb = 1" in the action > for the packets which are destined to load balancer vips. > > (2) Don't add the priority-60 flow in 'lr_in_ip_input' to drop the > packets for router ips. > The below suggestion provided by you to replace "not > force_lb_snat" with ".force_lb_snat = false" > does that. If they do the same thing, then that's great and no change is needed. What surprises me is that the C version of the changes has the following hunk to add a new flow, but there's no corresponding change to the DDlog version to add a new flow. @@ -9153,10 +9153,17 @@ build_lrouter_force_snat_flows_op(struct ovn_port *op, if (op->lrp_networks.n_ipv4_addrs) { ds_clear(match); ds_clear(actions); +ds_put_format(match, "inport == %s && ip4.dst == %s", + op->json_key, op->lrp_networks.ipv4_addrs[0].addr_s); +ovn_lflow_add(lflows, op->od, S_ROUTER_IN_UNSNAT, 110, + ds_cstr(match), "ct_snat;"); + +ds_clear(match); + /* Higher priority rules to force SNAT with the router port ip. * This only takes effect when the packet has already been * load balanced once. */ ds_put_format(match, "flags.force_snat_for_lb == 1 && ip4 && " "outport == %s", op->json_key); > > You can optimize it by writing > > > + .force_lb_snat = false, > > and then omitting the later clause. With the way DDlog currently > > works, this isn't just cosmetic, it will actually yield faster > > code because DDlog will discard nonmatching records earlier. (If > > that's an actual performance problem it will come out later in > > profiling, I'm just giving a tip here that I only recently learned > > myself.) > > I tried like below as per your suggestion and the compilation failed Oh, I meant like the following incremental. Sorry I wasn't clear! diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index 0e208c1f5160..21092cc1bcea 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -5142,12 +5142,11 @@ Flow(.logical_datapath = lr_uuid, .external_ids = stage_hint(lrp_uuid)) :- (.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid}, .router = {.snat_ips = snat_ips, - .force_lb_snat = force_lb_snat, + .force_lb_snat = false, .lr = nb::Logical_Router{._uuid = lr_uuid}}, .networks = networks), var addr = FlatMap(networks.ipv4_addrs), not snat_ips.contains_key(IPv4{addr.addr}), -not force_lb_snat, var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec(). Flow(.logical_datapath = lr_uuid, .stage = router_stage(IN, IP_INPUT), @@ -5157,12 +5156,11 @@ Flow(.logical_datapath = lr_uuid, .external_ids = stage_hint(lrp_uuid)) :- (.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid}, .router = {.snat_ips = snat_ips, - .force_lb_snat = force_lb_snat, + .force_lb_snat = false, .lr = nb::Logical_Router{._uuid = lr_uuid}}, .networks = networks), var addr = FlatMap(networks.ipv6_addrs), not snat_ips.contains_key(IPv6{addr.addr}), -not force_lb_snat, var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec(). for (RouterPortNetworksIPv4Addr(
Re: [ovs-dev] [PATCH ovn] northd: Remove the usage of 'ct.inv' in logical flows.
On Fri, Mar 5, 2021 at 4:22 AM Han Zhou wrote: > > On Mon, Mar 1, 2021 at 5:40 AM Numan Siddique wrote: > > > > On Fri, Feb 26, 2021 at 1:15 AM Han Zhou wrote: > > > > > > On Thu, Feb 25, 2021 at 1:25 AM Numan Siddique wrote: > > > > > > > > On Thu, Feb 25, 2021 at 1:12 PM Han Zhou wrote: > > > > > > > > > > On Wed, Feb 24, 2021 at 5:27 AM wrote: > > > > > > > > > > > > From: Numan Siddique > > > > > > > > > > > > Presently we add 65535 priority lflows in the stages - > > > > > > 'ls_in_acl' and 'ls_out_acl' to drop packets which > > > > > > match on 'ct.inv'. > > > > > > > > > > > > As per the 'ovs-fields' man page, this > > > > > > ct state field can be used to identify problems such as: > > > > > > • L3/L4 protocol handler is not loaded/unavailable. > > > > > > > > > > > > • L3/L4 protocol handler determines that the packet is > > > > > > malformed. > > > > > > > > > > > > • Packets are unexpected length for protocol. > > > > > > > > > > > > This patch removes the usage of this field for the following > > > > > > reasons: > > > > > > > > > > > > • Some of the smart NICs which support offloading datapath > > > > > >flows don't support this field. > > > > > > > > > > What do you mean by "don't support this field"? Do you mean the NIC > > > > > offloading supports connection tracking, but cannot detect if a > packet > > > is > > > > > invalid and always populate the ct.inv as 0? > > > > > > > > I think so. From what I understand, the kernel conntrack feature is > used > > > > for the actual connection tracking. So NIC can't tell if the packet is > > > > invalid or not > > > > (say due to out-of-window tcp errors). > > > > > > > I know some NICs support CT offloading and some doesn't. > > > So here what you are referring are the NICs that doesn't support CT > > > offloading, which falls back to kernel datapath when CT is used, is it? > If > > > this is the case, then even without ct.inv it still couldn't support > > > ct.est, etc. right? > > > Or, do you mean this is specifically for NICs that support CT offloading > > > but not ct.inv, i.e. it can do regular conntrack in NIC but just can't > > > identify out-of-window packets, and that's why it supports ct.est but > not > > > ct.inv? > > > I am still quite confused. Could clarify a little more, which types of > NICs > > > would benefit from this, and how? > > > > I'm not sure if I can explain the issue well. Can you please look > > into this bugzilla - > > https://bugzilla.redhat.com/show_bug.cgi?id=1921946 > > We can discuss further if you have further questions or comments. > > > Unfortunately this one seems to require access permission. Ok. Let me try to share in some other way. > > > > > > > > > > > > > > > > > > > > > > > • A recent commit in kernel ovs datapath sets the committed > > > > > >connection tracking entry to be liberal for out-of-window > > > > > >tcp packets (nf_ct_set_tcp_be_liberal()). Such TCP > > > > > >packets will not be marked as invalid. > > > > > > > > > > > > > > > > Could you share a link to this commit? > > > > > > > > Sure. > > > > > > > > https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next/+/e2ef5203c817a60bfb591343ffd851b6537370ff > > > > > > > Thanks for sharing. So OVS is not capable of detecting a out-of-window > > > packet now. Could you explain more about the motivation? I couldn't get > the > > > full picture from commit message of that patch. Do you have a link that > > > discusses more details? > > > > Let me share with you the patch which I first submitted to handle this > issue. > > During the review, @Florian Westphal suggested being liberal for > out-of-window > > packets to solve this issue. > > > > I think the patch discussions have enough information. Please let me know > > if you have further questions or comments and we can discuss further. > > > > Initial approach taken by me - > > > https://patchwork.ozlabs.org/project/netdev/patch/20201006083355.121018-1-nusid...@redhat.com/ > > Final approach taken - > > > https://patchwork.ozlabs.org/project/netdev/patch/20201109072930.14048-1-nusid...@redhat.com/ > > > Thanks for the pointers. Now I have a better context. It seems all these > work was to deal with (optimize) the LB (stateful) + stateless ACL use > cases. > 1. we don't want to track packets coming from VIF (because there is no > stateful ACL) > 2. but packets to VIF need to go through CT because there is LB configured > which requires CT (for nat), which regarded return packets as invalid. > > With the above patch, the return packets won't be invalid any more in the > above scenario. > However, isn't the stateful ACL support also insufficient now? With the > above patch, middle-of-traffic packets without established connection will > be regarded as "new", instead of "invalid", right? Then I wonder if we lose > the value of stateful ACL completely. Any considerations here? (I am not a > fan of stateful ACLs but just thinking if our
Re: [ovs-dev] [PATCH ovn 2/2] northd: Fix the missing force_snat_for_lb flows when router_ip is configured.
On Fri, Mar 5, 2021 at 2:30 AM Ben Pfaff wrote: > > On Wed, Mar 03, 2021 at 11:32:39PM +0530, num...@ovn.org wrote: > > From: Numan Siddique > > > > The commit c6e21a23bd8 which supported the option > > 'lb_force_snat_ip=router_ip' > > on a gateway router, missed out on > > - updating the flows in 'lr_in_dnat' to set 'flags.force_snat_for_lb = 1'. > > - removing the flow to drop if ip.dst == router port IP in > > 'lr_in_ip_input' > > stage. > > > > This patch fixes these issue and adds a system test to cover the > > hairpin load balancer traffic for the gateway routers. > > > > Fixes: c6e21a23bd8("northd: Provide the Gateway router option > > 'lb_force_snat_ip' to take router port ips.") > > > > CC: Ben Pfaff > > Signed-off-by: Numan Siddique > > Thanks! I took a look at the differences between the C and the DDlog > versions. I have a few comments. > > The C version adds a new flow to the output. The DDlog version doesn't > appear to, yet I do see a similar flow in the DDlog code. I am not sure > how that happened. Thanks for the review. Can you please tell me in which table the new flow is seen? If a logical router has the option 'lb_force_nat_ip=router_ip' is set, then both the C version and ddlog version do the below (1) Update the flow in the table lr_in_dnat to add "flags.force_snat_for_lb = 1" in the action for the packets which are destined to load balancer vips. (2) Don't add the priority-60 flow in 'lr_in_ip_input' to drop the packets for router ips. The below suggestion provided by you to replace "not force_lb_snat" with ".force_lb_snat = false" does that. > > Looking at the following: > > (.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid}, > > .router = {.snat_ips = snat_ips, > > + .force_lb_snat = force_lb_snat, > >.lr = nb::Logical_Router{._uuid = > > lr_uuid}}, > > .networks = networks), > > var addr = FlatMap(networks.ipv4_addrs), > > not snat_ips.contains_key(IPv4{addr.addr}), > > +not force_lb_snat, > > var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec(). > > You can optimize it by writing > > + .force_lb_snat = false, > and then omitting the later clause. With the way DDlog currently > works, this isn't just cosmetic, it will actually yield faster > code because DDlog will discard nonmatching records earlier. (If > that's an actual performance problem it will come out later in > profiling, I'm just giving a tip here that I only recently learned > myself.) I tried like below as per your suggestion and the compilation failed *** ddlog -i ../northd/ovn_northd.dl -o ./northd -L /home/nusiddiq/.local/ddlog/lib -L ./northd error: failed to parse input file: "../northd/ovn_northd.dl" (line 5150, column 20): unexpected "=" expecting "in" * diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index 90839b663..a7cd0892a 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -5147,7 +5147,7 @@ Flow(.logical_datapath = lr_uuid, .networks = networks), var addr = FlatMap(networks.ipv4_addrs), not snat_ips.contains_key(IPv4{addr.addr}), -not force_lb_snat, +.force_lb_snat = false, var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec(). Flow(.logical_datapath = lr_uuid, .stage = router_stage(IN, IP_INPUT), @@ -5162,7 +5162,7 @@ Flow(.logical_datapath = lr_uuid, .networks = networks), var addr = FlatMap(networks.ipv6_addrs), not snat_ips.contains_key(IPv6{addr.addr}), -not force_lb_snat, +.force_lb_snat = false, var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec(). I also tried "force_lb_snat = false" and it still failed. >From what I understand the above code does (2) which I mentioned. Thanks Numan > > Thanks, > > Ben. > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] northd: Remove the usage of 'ct.inv' in logical flows.
On Mon, Mar 1, 2021 at 5:40 AM Numan Siddique wrote: > > On Fri, Feb 26, 2021 at 1:15 AM Han Zhou wrote: > > > > On Thu, Feb 25, 2021 at 1:25 AM Numan Siddique wrote: > > > > > > On Thu, Feb 25, 2021 at 1:12 PM Han Zhou wrote: > > > > > > > > On Wed, Feb 24, 2021 at 5:27 AM wrote: > > > > > > > > > > From: Numan Siddique > > > > > > > > > > Presently we add 65535 priority lflows in the stages - > > > > > 'ls_in_acl' and 'ls_out_acl' to drop packets which > > > > > match on 'ct.inv'. > > > > > > > > > > As per the 'ovs-fields' man page, this > > > > > ct state field can be used to identify problems such as: > > > > > • L3/L4 protocol handler is not loaded/unavailable. > > > > > > > > > > • L3/L4 protocol handler determines that the packet is > > > > > malformed. > > > > > > > > > > • Packets are unexpected length for protocol. > > > > > > > > > > This patch removes the usage of this field for the following > > > > > reasons: > > > > > > > > > > • Some of the smart NICs which support offloading datapath > > > > >flows don't support this field. > > > > > > > > What do you mean by "don't support this field"? Do you mean the NIC > > > > offloading supports connection tracking, but cannot detect if a packet > > is > > > > invalid and always populate the ct.inv as 0? > > > > > > I think so. From what I understand, the kernel conntrack feature is used > > > for the actual connection tracking. So NIC can't tell if the packet is > > > invalid or not > > > (say due to out-of-window tcp errors). > > > > > I know some NICs support CT offloading and some doesn't. > > So here what you are referring are the NICs that doesn't support CT > > offloading, which falls back to kernel datapath when CT is used, is it? If > > this is the case, then even without ct.inv it still couldn't support > > ct.est, etc. right? > > Or, do you mean this is specifically for NICs that support CT offloading > > but not ct.inv, i.e. it can do regular conntrack in NIC but just can't > > identify out-of-window packets, and that's why it supports ct.est but not > > ct.inv? > > I am still quite confused. Could clarify a little more, which types of NICs > > would benefit from this, and how? > > I'm not sure if I can explain the issue well. Can you please look > into this bugzilla - > https://bugzilla.redhat.com/show_bug.cgi?id=1921946 > We can discuss further if you have further questions or comments. > Unfortunately this one seems to require access permission. > > > > > > > > > > > > > > > > > • A recent commit in kernel ovs datapath sets the committed > > > > >connection tracking entry to be liberal for out-of-window > > > > >tcp packets (nf_ct_set_tcp_be_liberal()). Such TCP > > > > >packets will not be marked as invalid. > > > > > > > > > > > > > Could you share a link to this commit? > > > > > > Sure. > > > > > https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next/+/e2ef5203c817a60bfb591343ffd851b6537370ff > > > > > Thanks for sharing. So OVS is not capable of detecting a out-of-window > > packet now. Could you explain more about the motivation? I couldn't get the > > full picture from commit message of that patch. Do you have a link that > > discusses more details? > > Let me share with you the patch which I first submitted to handle this issue. > During the review, @Florian Westphal suggested being liberal for out-of-window > packets to solve this issue. > > I think the patch discussions have enough information. Please let me know > if you have further questions or comments and we can discuss further. > > Initial approach taken by me - > https://patchwork.ozlabs.org/project/netdev/patch/20201006083355.121018-1-nusid...@redhat.com/ > Final approach taken - > https://patchwork.ozlabs.org/project/netdev/patch/20201109072930.14048-1-nusid...@redhat.com/ > Thanks for the pointers. Now I have a better context. It seems all these work was to deal with (optimize) the LB (stateful) + stateless ACL use cases. 1. we don't want to track packets coming from VIF (because there is no stateful ACL) 2. but packets to VIF need to go through CT because there is LB configured which requires CT (for nat), which regarded return packets as invalid. With the above patch, the return packets won't be invalid any more in the above scenario. However, isn't the stateful ACL support also insufficient now? With the above patch, middle-of-traffic packets without established connection will be regarded as "new", instead of "invalid", right? Then I wonder if we lose the value of stateful ACL completely. Any considerations here? (I am not a fan of stateful ACLs but just thinking if our implementation reflects what we are declaring to the users) > > > > > > > > > > > > > • Even if a ct.inv packet is delivered to a VIF, the > > > > >networking stack of the VIF's kernel can handle such > > > > >packets. > > > > > > > > I have some concern for this point. We shouldn't make assumptions for > > > > what's configured
Re: [ovs-dev] [PATCH ovn 2/2] northd: Fix the missing force_snat_for_lb flows when router_ip is configured.
On Wed, Mar 03, 2021 at 11:32:39PM +0530, num...@ovn.org wrote: > From: Numan Siddique > > The commit c6e21a23bd8 which supported the option 'lb_force_snat_ip=router_ip' > on a gateway router, missed out on > - updating the flows in 'lr_in_dnat' to set 'flags.force_snat_for_lb = 1'. > - removing the flow to drop if ip.dst == router port IP in 'lr_in_ip_input' > stage. > > This patch fixes these issue and adds a system test to cover the > hairpin load balancer traffic for the gateway routers. > > Fixes: c6e21a23bd8("northd: Provide the Gateway router option > 'lb_force_snat_ip' to take router port ips.") > > CC: Ben Pfaff > Signed-off-by: Numan Siddique Thanks! I took a look at the differences between the C and the DDlog versions. I have a few comments. The C version adds a new flow to the output. The DDlog version doesn't appear to, yet I do see a similar flow in the DDlog code. I am not sure how that happened. Looking at the following: > (.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid}, > .router = {.snat_ips = snat_ips, > + .force_lb_snat = force_lb_snat, >.lr = nb::Logical_Router{._uuid = > lr_uuid}}, > .networks = networks), > var addr = FlatMap(networks.ipv4_addrs), > not snat_ips.contains_key(IPv4{addr.addr}), > +not force_lb_snat, > var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec(). You can optimize it by writing > + .force_lb_snat = false, and then omitting the later clause. With the way DDlog currently works, this isn't just cosmetic, it will actually yield faster code because DDlog will discard nonmatching records earlier. (If that's an actual performance problem it will come out later in profiling, I'm just giving a tip here that I only recently learned myself.) Thanks, Ben. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn 1/2] northd-ddlog: Fix lb_force_snat_ip router option.
On Wed, Mar 03, 2021 at 11:32:22PM +0530, num...@ovn.org wrote: > From: Numan Siddique > > There were few typos because of which lflows related to > router option lb_force_snat_ip were not generated correctly. > > This patch fixes it. > > Fixes: 0e77b3bcbfe("ovn-northd-ddlog: New implementation of ovn-northd based > on ddlog.") > CC: Ben Pfaff > Signed-off-by: Numan Siddique Thank you for the fix! Acked-by: Ben Pfaff ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v5] python: Send notifications after the transaction ends
Acked-by: Brian Haley Thanks for fixing this Terry. -Brian On 3/4/21 11:18 AM, Terry Wilson wrote: The Python IDL notification mechanism was sending a notification for each processed update in a transaction as it was processed. This causes issues with multi-row changes that contain references to each other. For example, if a Logical_Router_Port is created along with a Gateway_Chassis, and the LRP.gateway_chassis set to that GC, then when the notify() passes the CREATE event for the LRP, the GC will not yet have been processed, so __getattr__ when _uuid_to_row fails to find the GC, will return the default value for LRP.gateway_chassis which is []. This patch has the process_update methods return the notifications that would be produced when a row changes, so they can be queued and sent after all rows have been processed. Fixes: d7d417fcddf9 ("Allow subclasses of Idl to define a notification hook") Signed-off-by: Terry Wilson --- python/ovs/db/idl.py | 39 --- tests/ovsdb-idl.at | 22 ++ tests/test-ovsdb.py | 7 +-- 3 files changed, 51 insertions(+), 17 deletions(-) ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4 1/1] python: Send notifications after the transaction ends
Bleep bloop. Greetings Terry Wilson, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: ERROR: Author Terry Wilson needs to sign off. Lines checked: 220, Warnings: 0, Errors: 1 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] RFC: netdev-afxdp: Support for XDP metadata HW hints.
One big problem of netdev-afxdp is that there is no metadata support from the hardware at all. For example, OVS netdev-afxdp has to do rxhash, or TCP checksum in software, resulting in high performance overhead. A generic meta data type for XDP frame using BTF is proposed[1] and there is sample implementation[2][3]. This patch experiments enabling the XDP metadata, or called HW hints, and shows the potential performance improvement. The patch uses only the rxhash value provided from HW, so avoiding at the calculation of hash at lib/dpif-netdev.c: if (!dp_packet_rss_valid(execute->packet)) { dp_packet_set_rss_hash(execute->packet, flow_hash_5tuple(execute->flow, 0)); } Using '$ ovs-appctl dpif-netdev/pmd-stats-show', the 'avg processing cycles per packet' drops from 402 to 272. More details below Reference: -- [1] https://www.kernel.org/doc/html/latest/bpf/btf.html [2] https://netdevconf.info/0x14/pub/slides/54/[1]%20XDP%20meta%20data%20acceleration.pdf [3] https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/log/?h=topic/xdp_metadata4 Testbed: Two Xeon E5-2620 v3 2.4GHz connected back-to-back using Mellanox ConnectX-6Dx 25GbE. Before starting OVS, enable the MD by: $ bpftool net xdp show xdp: enp2s0f0np0(4) md_btf_id(1) md_btf_enabled(0) enp2s0f1np1(5) md_btf_id(2) md_btf_enabled(0) $ bpftool net xdp set dev enp2s0f0np0 md_btf on $ bpftool net xdp xdp: enp2s0f0np0(4) md_btf_id(1) md_btf_enabled(1) Limitations/TODO: - 1. Support only AF_XDP native mode, not zero-copy mode. 2. Currently only three fields: vlan, hash, and flow_mark, and only receive side supports XDP metadata. 3. Control plane, how to enable and probe the structure, not upstream yet. OVS rxdrop without HW hints: --- Drop rate: 4.8Mpps pmd thread numa_id 0 core_id 3: packets received: 196592006 packet recirculations: 0 avg. datapath passes per packet: 1.00 emc hits: 196592006 smc hits: 0 megaflow hits: 0 avg. subtable lookups per megaflow hit: 0.00 miss with success upcall: 0 miss with failed upcall: 0 avg. packets per output batch: 0.00 idle cycles: 56009063835 (41.43%) processing cycles: 79164971931 (58.57%) avg cycles per packet: 687.59 (135174035766/196592006) avg processing cycles per packet: 402.69 (79164971931/196592006) pmd thread numa_id 0 core_id 3: Iterations: 339607649 (0.23 us/it) - Used TSC cycles: 188620512777 ( 99.9 % of total cycles) - idle iterations:330697002 ( 40.3 % of used cycles) - busy iterations: 8910647 ( 59.7 % of used cycles) Rx packets: 285140031 (3624 Kpps, 395 cycles/pkt) Datapath passes: 285140031 (1.00 passes/pkt) - EMC hits: 28513 (100.0 %) - SMC hits: 0 ( 0.0 %) - Megaflow hits: 0 ( 0.0 %, 0.00 subtbl lookups/hit) - Upcalls:0 ( 0.0 %, 0.0 us/upcall) - Lost upcalls: 0 ( 0.0 %) Tx packets: 0 Perf report: 17.56% pmd-c03/id:11 ovs-vswitchd[.] netdev_afxdp_rxq_recv 14.39% pmd-c03/id:11 ovs-vswitchd[.] dp_netdev_process_rxq_port 14.17% pmd-c03/id:11 ovs-vswitchd[.] pmd_thread_main 10.86% pmd-c03/id:11 [vdso] [.] __vdso_clock_gettime 10.19% pmd-c03/id:11 ovs-vswitchd[.] pmd_perf_end_iteration 7.71% pmd-c03/id:11 ovs-vswitchd[.] time_timespec__ 5.64% pmd-c03/id:11 ovs-vswitchd[.] time_usec 3.88% pmd-c03/id:11 ovs-vswitchd[.] netdev_get_class 2.95% pmd-c03/id:11 ovs-vswitchd[.] netdev_rxq_recv 2.78% pmd-c03/id:11 libbpf.so.0.2.0 [.] xsk_socket__fd 2.74% pmd-c03/id:11 ovs-vswitchd[.] pmd_perf_start_iteration 2.11% pmd-c03/id:11 libc-2.27.so[.] __clock_gettime 1.32% pmd-c03/id:11 ovs-vswitchd[.] xsk_socket__fd@plt OVS rxdrop with HW hints: - rxdrop rate: 4.73Mpps pmd thread numa_id 0 core_id 7: packets received: 13686880 packet recirculations: 0 avg. datapath passes per packet: 1.00 emc hits: 13686880 smc hits: 0 megaflow hits: 0 avg. subtable lookups per megaflow hit: 0.00 miss with success upcall: 0 miss with failed upcall: 0 avg. packets per output batch: 0.00 idle cycles: 3182105544 (46.02%) processing cycles: 3732023844 (53.98%) avg cycles per packet: 505.16 (6914129388/13686880) avg processing cycles per packet: 272.67 (3732023844/13686880) pmd thread numa_id 0 core_id 7: Iterations: 392909539 (0.18 us/it) - Used TSC cycles: 167697342678 ( 99.9 % of total cycles) - idle iterations:382539861 ( 46.0 % of used cycles) - busy iterations: 10369678 ( 54.0 % of used cycles) Rx packets: 331829656 (4743 Kpps, 273 cycles/pkt) Datapath passes: 331829656 (1.00 passes/pkt) - EMC hits: 331829656 (100.0 %) - SMC hits:
Re: [ovs-dev] [PATCH ovn 00/11] ovn-northd-ddlog improvements
On Thu, Mar 04, 2021 at 06:07:42PM +0530, Numan Siddique wrote: > On Thu, Mar 4, 2021 at 9:40 AM Ben Pfaff wrote: > Thanks for this series. > > Acked-by: Numan Siddique > > Please note that I'm still learning ddlog and haven't understood much > of the northd ddlog code. > > I ran the patches and found a few northd-ddlog tests to be failing > when run with -j5. If I run individually they pass. > May be the scale improvements in northd-ddlog could help in those tests. > > I found one particular test failing 100% of the time. > > Can you please take a look into that. Thanks for all the comments. I'm looking into this stuff now. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v5] python: Send notifications after the transaction ends
Acked-by: Flavio Fernandes mailto:fla...@flaviof.com>> > On Mar 4, 2021, at 11:18 AM, Terry Wilson wrote: > > The Python IDL notification mechanism was sending a notification > for each processed update in a transaction as it was processed. > This causes issues with multi-row changes that contain references > to each other. > > For example, if a Logical_Router_Port is created along with a > Gateway_Chassis, and the LRP.gateway_chassis set to that GC, then > when the notify() passes the CREATE event for the LRP, the GC will > not yet have been processed, so __getattr__ when _uuid_to_row fails > to find the GC, will return the default value for LRP.gateway_chassis > which is []. > > This patch has the process_update methods return the notifications > that would be produced when a row changes, so they can be queued > and sent after all rows have been processed. > > Fixes: d7d417fcddf9 ("Allow subclasses of Idl to define a notification hook") > Signed-off-by: Terry Wilson > --- > python/ovs/db/idl.py | 39 --- > tests/ovsdb-idl.at | 22 ++ > tests/test-ovsdb.py | 7 +-- > 3 files changed, 51 insertions(+), 17 deletions(-) > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3] netlink: ignore IFLA_WIRELESS events
Bleep bloop. Greetings Michal Kazior, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Line is 80 characters long (recommended limit is 79) #158 FILE: lib/rtnetlink.h:49: don't really indicate real netdev change that Lines checked: 165, Warnings: 1, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v4] netlink: ignore IFLA_WIRELESS events
From: Michal Kazior Some older wireless drivers - ones relying on the old and long deprecated wireless extension ioctl system - can generate quite a bit of IFLA_WIRELESS events depending on their configuration and runtime conditions. These are delivered as RTNLGRP_LINK via RTM_NEWLINK messages. These tend to be relatively easily identifiable because they report the change mask being 0. This isn't guaranteed but in practice it shouldn't be a problem. None of the wireless events that I ever observed actually carry any unique information about netdev states that ovs-vswitchd is interested in. Hence ignoring these shouldn't cause any problems. These events can be responsible for a significant CPU churn as ovs-vswitchd attempts to do plenty of work for each and every netlink message regardless of what that message carries. On low-end devices such as consumer-grade routers these can lead to a lot of CPU cycles being wasted, adding up to heat output and reducing performance. It could be argued that wireless drivers in question should be fixed, but that isn't exactly a trivial thing to do. Patching ovs seems far more viable while still making sense. Signed-off-by: Michal Kazior --- Notes: v4: - fixed comment-too-long checkpatch warnin [0day robot] v3: - dont change rtnetlink_parse() semantics, instead extend rtnetlink_change struct and update its consumers [Ilya] - adjusted commit log to reflect different approach [Ilya] v2: - fix bracing style [0day robot / checkpatch] lib/if-notifier.c | 7 ++- lib/netdev-linux.c | 9 + lib/route-table.c | 4 lib/rtnetlink.c| 18 ++ lib/rtnetlink.h| 3 +++ 5 files changed, 40 insertions(+), 1 deletion(-) diff --git a/lib/if-notifier.c b/lib/if-notifier.c index 9a64f9b15..7ba9cc316 100644 --- a/lib/if-notifier.c +++ b/lib/if-notifier.c @@ -26,9 +26,14 @@ struct if_notifier { }; static void -if_notifier_cb(const struct rtnetlink_change *change OVS_UNUSED, void *aux) +if_notifier_cb(const struct rtnetlink_change *change, void *aux) { struct if_notifier *notifier; + +if (change->irrelevant) { +return; +} + notifier = aux; notifier->cb(notifier->aux); } diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 6be23dbee..388288f71 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -663,6 +663,10 @@ netdev_linux_update_lag(struct rtnetlink_change *change) { struct linux_lag_member *lag; +if (change->irrelevant) { +return; +} + if (change->sub && netdev_linux_kind_is_lag(change->sub)) { lag = shash_find_data(_shash, change->ifname); @@ -887,6 +891,10 @@ netdev_linux_update(struct netdev_linux *dev, int nsid, const struct rtnetlink_change *change) OVS_REQUIRES(dev->mutex) { +if (change->irrelevant) { +return; +} + if (netdev_linux_netnsid_is_eq(dev, nsid)) { netdev_linux_update__(dev, change); } @@ -6344,6 +6352,7 @@ netdev_linux_update_via_netlink(struct netdev_linux *netdev) } if (rtnetlink_parse(reply, change) +&& !change->irrelevant && change->nlmsg_type == RTM_NEWLINK) { bool changed = false; error = 0; diff --git a/lib/route-table.c b/lib/route-table.c index 6c82cdfdd..a4531df1e 100644 --- a/lib/route-table.c +++ b/lib/route-table.c @@ -342,6 +342,10 @@ static void name_table_change(const struct rtnetlink_change *change, void *aux OVS_UNUSED) { +if (change->irrelevant) { +return; +} + /* Changes to interface status can cause routing table changes that some * versions of the linux kernel do not advertise for some reason. */ route_table_valid = false; diff --git a/lib/rtnetlink.c b/lib/rtnetlink.c index 125802925..ebb032aa7 100644 --- a/lib/rtnetlink.c +++ b/lib/rtnetlink.c @@ -99,6 +99,7 @@ rtnetlink_parse(struct ofpbuf *buf, struct rtnetlink_change *change) [IFLA_MTU]= { .type = NL_A_U32,.optional = true }, [IFLA_ADDRESS] = { .type = NL_A_UNSPEC, .optional = true }, [IFLA_LINKINFO] = { .type = NL_A_NESTED, .optional = true }, +[IFLA_WIRELESS] = { .type = NL_A_UNSPEC, .optional = true }, }; struct nlattr *attrs[ARRAY_SIZE(policy)]; @@ -111,6 +112,23 @@ rtnetlink_parse(struct ofpbuf *buf, struct rtnetlink_change *change) ifinfo = ofpbuf_at(buf, NLMSG_HDRLEN, sizeof *ifinfo); +/* Wireless events can be spammy and cause a + * lot of unnecessary churn and CPU load in + * ovs-vswitchd. The best way to filter them out + * is to rely on the IFLA_WIRELESS and + * ifi_change. As per rtnetlink_ifinfo_prep() in + * the kernel, the ifi_change = 0. That combined + * with the fact wireless events never really + * change interface state (as
[ovs-dev] [PATCH v5] python: Send notifications after the transaction ends
The Python IDL notification mechanism was sending a notification for each processed update in a transaction as it was processed. This causes issues with multi-row changes that contain references to each other. For example, if a Logical_Router_Port is created along with a Gateway_Chassis, and the LRP.gateway_chassis set to that GC, then when the notify() passes the CREATE event for the LRP, the GC will not yet have been processed, so __getattr__ when _uuid_to_row fails to find the GC, will return the default value for LRP.gateway_chassis which is []. This patch has the process_update methods return the notifications that would be produced when a row changes, so they can be queued and sent after all rows have been processed. Fixes: d7d417fcddf9 ("Allow subclasses of Idl to define a notification hook") Signed-off-by: Terry Wilson --- python/ovs/db/idl.py | 39 --- tests/ovsdb-idl.at | 22 ++ tests/test-ovsdb.py | 7 +-- 3 files changed, 51 insertions(+), 17 deletions(-) diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py index 5850ac7ab..4226d1cb2 100644 --- a/python/ovs/db/idl.py +++ b/python/ovs/db/idl.py @@ -12,6 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. +import collections import functools import uuid @@ -39,6 +40,10 @@ OVSDB_UPDATE2 = 1 CLUSTERED = "clustered" +Notice = collections.namedtuple('Notice', ('event', 'row', 'updates')) +Notice.__new__.__defaults__ = (None,) # default updates=None + + class Idl(object): """Open vSwitch Database Interface Definition Language (OVSDB IDL). @@ -614,6 +619,7 @@ class Idl(object): raise error.Error(" is not an object", table_updates) +notices = [] for table_name, table_update in table_updates.items(): table = tables.get(table_name) if not table: @@ -639,7 +645,9 @@ class Idl(object): % (table_name, uuid_string)) if version == OVSDB_UPDATE2: -if self.__process_update2(table, uuid, row_update): +changes = self.__process_update2(table, uuid, row_update) +if changes: +notices.append(changes) self.change_seqno += 1 continue @@ -652,17 +660,20 @@ class Idl(object): raise error.Error(' missing "old" and ' '"new" members', row_update) -if self.__process_update(table, uuid, old, new): +changes = self.__process_update(table, uuid, old, new) +if changes: +notices.append(changes) self.change_seqno += 1 +for notice in notices: +self.notify(*notice) def __process_update2(self, table, uuid, row_update): +"""Returns Notice if a column changed, False otherwise.""" row = table.rows.get(uuid) -changed = False if "delete" in row_update: if row: del table.rows[uuid] -self.notify(ROW_DELETE, row) -changed = True +return Notice(ROW_DELETE, row) else: # XXX rate-limit vlog.warn("cannot delete missing row %s from table" @@ -681,29 +692,27 @@ class Idl(object): changed = self.__row_update(table, row, row_update) table.rows[uuid] = row if changed: -self.notify(ROW_CREATE, row) +return Notice(ROW_CREATE, row) elif "modify" in row_update: if not row: raise error.Error('Modify non-existing row') old_row = self.__apply_diff(table, row, row_update['modify']) -self.notify(ROW_UPDATE, row, Row(self, table, uuid, old_row)) -changed = True +return Notice(ROW_UPDATE, row, Row(self, table, uuid, old_row)) else: raise error.Error(' unknown operation', row_update) -return changed +return False def __process_update(self, table, uuid, old, new): -"""Returns True if a column changed, False otherwise.""" +"""Returns Notice if a column changed, False otherwise.""" row = table.rows.get(uuid) changed = False if not new: # Delete row. if row: del table.rows[uuid] -changed = True -self.notify(ROW_DELETE, row) +return Notice(ROW_DELETE, row) else: # XXX rate-limit vlog.warn("cannot delete missing row %s from table %s" @@ -723,7 +732,7 @@ class Idl(object): if op == ROW_CREATE: table.rows[uuid] = row
[ovs-dev] [PATCH v4 1/1] python: Send notifications after the transaction ends
The Python IDL notification mechanism was sending a notification for each processed update in a transaction as it was processed. This causes issues with multi-row changes that contain references to each other. For example, if a Logical_Router_Port is created along with a Gateway_Chassis, and the LRP.gateway_chassis set to that GC, then when the notify() passes the CREATE event for the LRP, the GC will not yet have been processed, so __getattr__ when _uuid_to_row fails to find the GC, will return the default value for LRP.gateway_chassis which is []. This patch has the process_update methods return the notifications that would be produced when a row changes, so they can be queued and sent after all rows have been processed. Fixes: d7d417fcddf9 ("Allow subclasses of Idl to define a notification hook") --- python/ovs/db/idl.py | 39 --- tests/ovsdb-idl.at | 22 ++ tests/test-ovsdb.py | 7 +-- 3 files changed, 51 insertions(+), 17 deletions(-) diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py index 5850ac7ab..4226d1cb2 100644 --- a/python/ovs/db/idl.py +++ b/python/ovs/db/idl.py @@ -12,6 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. +import collections import functools import uuid @@ -39,6 +40,10 @@ OVSDB_UPDATE2 = 1 CLUSTERED = "clustered" +Notice = collections.namedtuple('Notice', ('event', 'row', 'updates')) +Notice.__new__.__defaults__ = (None,) # default updates=None + + class Idl(object): """Open vSwitch Database Interface Definition Language (OVSDB IDL). @@ -614,6 +619,7 @@ class Idl(object): raise error.Error(" is not an object", table_updates) +notices = [] for table_name, table_update in table_updates.items(): table = tables.get(table_name) if not table: @@ -639,7 +645,9 @@ class Idl(object): % (table_name, uuid_string)) if version == OVSDB_UPDATE2: -if self.__process_update2(table, uuid, row_update): +changes = self.__process_update2(table, uuid, row_update) +if changes: +notices.append(changes) self.change_seqno += 1 continue @@ -652,17 +660,20 @@ class Idl(object): raise error.Error(' missing "old" and ' '"new" members', row_update) -if self.__process_update(table, uuid, old, new): +changes = self.__process_update(table, uuid, old, new) +if changes: +notices.append(changes) self.change_seqno += 1 +for notice in notices: +self.notify(*notice) def __process_update2(self, table, uuid, row_update): +"""Returns Notice if a column changed, False otherwise.""" row = table.rows.get(uuid) -changed = False if "delete" in row_update: if row: del table.rows[uuid] -self.notify(ROW_DELETE, row) -changed = True +return Notice(ROW_DELETE, row) else: # XXX rate-limit vlog.warn("cannot delete missing row %s from table" @@ -681,29 +692,27 @@ class Idl(object): changed = self.__row_update(table, row, row_update) table.rows[uuid] = row if changed: -self.notify(ROW_CREATE, row) +return Notice(ROW_CREATE, row) elif "modify" in row_update: if not row: raise error.Error('Modify non-existing row') old_row = self.__apply_diff(table, row, row_update['modify']) -self.notify(ROW_UPDATE, row, Row(self, table, uuid, old_row)) -changed = True +return Notice(ROW_UPDATE, row, Row(self, table, uuid, old_row)) else: raise error.Error(' unknown operation', row_update) -return changed +return False def __process_update(self, table, uuid, old, new): -"""Returns True if a column changed, False otherwise.""" +"""Returns Notice if a column changed, False otherwise.""" row = table.rows.get(uuid) changed = False if not new: # Delete row. if row: del table.rows[uuid] -changed = True -self.notify(ROW_DELETE, row) +return Notice(ROW_DELETE, row) else: # XXX rate-limit vlog.warn("cannot delete missing row %s from table %s" @@ -723,7 +732,7 @@ class Idl(object): if op == ROW_CREATE: table.rows[uuid] = row if changed: -
[ovs-dev] [PATCH v4 0/1] Send notifications after the transaction ends
Ok, I think this is likely to be the final iteration of the patch as we are down to the typos. In Neutron, we process these notifications in a separate thread. So what that means is that we were experiencing race conditions where sometimes the notification's row would have a value set, and other times it wouldn't. In fixing some other race conditions related to that event processing thread, we tried to add a read-only copy of the Row object and that actually turned the race condition into a consistent failure on py27 due to the table_updates dict not maintaining insertion-order, but consistently putting Logical_Router_Port before Gateway_Chassis. In py3 things could still fail as in the test case provided, e.g. if rows referenced each other in the same transaction, or depending on the order in which ovsdb-server adds the updates. Terry Wilson (1): python: Send notifications after the transaction ends python/ovs/db/idl.py | 39 --- tests/ovsdb-idl.at | 22 ++ tests/test-ovsdb.py | 7 +-- 3 files changed, 51 insertions(+), 17 deletions(-) -- 2.17.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3] netlink: ignore IFLA_WIRELESS events
From: Michal Kazior Some older wireless drivers - ones relying on the old and long deprecated wireless extension ioctl system - can generate quite a bit of IFLA_WIRELESS events depending on their configuration and runtime conditions. These are delivered as RTNLGRP_LINK via RTM_NEWLINK messages. These tend to be relatively easily identifiable because they report the change mask being 0. This isn't guaranteed but in practice it shouldn't be a problem. None of the wireless events that I ever observed actually carry any unique information about netdev states that ovs-vswitchd is interested in. Hence ignoring these shouldn't cause any problems. These events can be responsible for a significant CPU churn as ovs-vswitchd attempts to do plenty of work for each and every netlink message regardless of what that message carries. On low-end devices such as consumer-grade routers these can lead to a lot of CPU cycles being wasted, adding up to heat output and reducing performance. It could be argued that wireless drivers in question should be fixed, but that isn't exactly a trivial thing to do. Patching ovs seems far more viable while still making sense. Signed-off-by: Michal Kazior --- Notes: v3: - dont change rtnetlink_parse() semantics, instead extend rtnetlink_change struct and update its consumers [Ilya] - adjusted commit log to reflect different approach [Ilya] v2: - fix bracing style [0day robot / checkpatch] lib/if-notifier.c | 7 ++- lib/netdev-linux.c | 9 + lib/route-table.c | 4 lib/rtnetlink.c| 18 ++ lib/rtnetlink.h| 3 +++ 5 files changed, 40 insertions(+), 1 deletion(-) diff --git a/lib/if-notifier.c b/lib/if-notifier.c index 9a64f9b15..7ba9cc316 100644 --- a/lib/if-notifier.c +++ b/lib/if-notifier.c @@ -26,9 +26,14 @@ struct if_notifier { }; static void -if_notifier_cb(const struct rtnetlink_change *change OVS_UNUSED, void *aux) +if_notifier_cb(const struct rtnetlink_change *change, void *aux) { struct if_notifier *notifier; + +if (change->irrelevant) { +return; +} + notifier = aux; notifier->cb(notifier->aux); } diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 6be23dbee..388288f71 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -663,6 +663,10 @@ netdev_linux_update_lag(struct rtnetlink_change *change) { struct linux_lag_member *lag; +if (change->irrelevant) { +return; +} + if (change->sub && netdev_linux_kind_is_lag(change->sub)) { lag = shash_find_data(_shash, change->ifname); @@ -887,6 +891,10 @@ netdev_linux_update(struct netdev_linux *dev, int nsid, const struct rtnetlink_change *change) OVS_REQUIRES(dev->mutex) { +if (change->irrelevant) { +return; +} + if (netdev_linux_netnsid_is_eq(dev, nsid)) { netdev_linux_update__(dev, change); } @@ -6344,6 +6352,7 @@ netdev_linux_update_via_netlink(struct netdev_linux *netdev) } if (rtnetlink_parse(reply, change) +&& !change->irrelevant && change->nlmsg_type == RTM_NEWLINK) { bool changed = false; error = 0; diff --git a/lib/route-table.c b/lib/route-table.c index 6c82cdfdd..a4531df1e 100644 --- a/lib/route-table.c +++ b/lib/route-table.c @@ -342,6 +342,10 @@ static void name_table_change(const struct rtnetlink_change *change, void *aux OVS_UNUSED) { +if (change->irrelevant) { +return; +} + /* Changes to interface status can cause routing table changes that some * versions of the linux kernel do not advertise for some reason. */ route_table_valid = false; diff --git a/lib/rtnetlink.c b/lib/rtnetlink.c index 125802925..ebb032aa7 100644 --- a/lib/rtnetlink.c +++ b/lib/rtnetlink.c @@ -99,6 +99,7 @@ rtnetlink_parse(struct ofpbuf *buf, struct rtnetlink_change *change) [IFLA_MTU]= { .type = NL_A_U32,.optional = true }, [IFLA_ADDRESS] = { .type = NL_A_UNSPEC, .optional = true }, [IFLA_LINKINFO] = { .type = NL_A_NESTED, .optional = true }, +[IFLA_WIRELESS] = { .type = NL_A_UNSPEC, .optional = true }, }; struct nlattr *attrs[ARRAY_SIZE(policy)]; @@ -111,6 +112,23 @@ rtnetlink_parse(struct ofpbuf *buf, struct rtnetlink_change *change) ifinfo = ofpbuf_at(buf, NLMSG_HDRLEN, sizeof *ifinfo); +/* Wireless events can be spammy and cause a + * lot of unnecessary churn and CPU load in + * ovs-vswitchd. The best way to filter them out + * is to rely on the IFLA_WIRELESS and + * ifi_change. As per rtnetlink_ifinfo_prep() in + * the kernel, the ifi_change = 0. That combined + * with the fact wireless events never really + * change interface state (as far as core + * networking is concerned) they can be ignored +
[ovs-dev] [PATCH v2 3/3] netdev-offload-tc: Add support for ct_state flags inv and rpl
Signed-off-by: Paul Blakey Acked-by: Roi Dayan --- lib/netdev-offload-tc.c | 30 ++ 1 file changed, 30 insertions(+) diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c index 0775883..8553cb2 100644 --- a/lib/netdev-offload-tc.c +++ b/lib/netdev-offload-tc.c @@ -751,6 +751,20 @@ parse_tc_flower_to_match(struct tc_flower *flower, ct_statem |= OVS_CS_F_TRACKED; } +if (mask->ct_state & TCA_FLOWER_KEY_CT_FLAGS_REPLY) { +if (key->ct_state & TCA_FLOWER_KEY_CT_FLAGS_REPLY) { +ct_statev |= OVS_CS_F_REPLY_DIR; +} +ct_statem |= OVS_CS_F_REPLY_DIR; +} + +if (mask->ct_state & TCA_FLOWER_KEY_CT_FLAGS_INVALID) { +if (key->ct_state & TCA_FLOWER_KEY_CT_FLAGS_INVALID) { +ct_statev |= OVS_CS_F_INVALID; +} +ct_statem |= OVS_CS_F_INVALID; +} + match_set_ct_state_masked(match, ct_statev, ct_statem); } @@ -1492,6 +1506,22 @@ parse_match_ct_state_to_flower(struct tc_flower *flower, struct match *match) mask->ct_state &= ~OVS_CS_F_TRACKED; } +if (mask->ct_state & OVS_CS_F_REPLY_DIR) { +if (key->ct_state & OVS_CS_F_REPLY_DIR) { +flower->key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_REPLY; +} +flower->mask.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_REPLY; +mask->ct_state &= ~OVS_CS_F_REPLY_DIR; +} + +if (mask->ct_state & OVS_CS_F_INVALID) { +if (key->ct_state & OVS_CS_F_INVALID) { +flower->key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_INVALID; +} +flower->mask.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_INVALID; +mask->ct_state &= ~OVS_CS_F_INVALID; +} + if (flower->key.ct_state & TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED) { flower->key.ct_state &= ~(TCA_FLOWER_KEY_CT_FLAGS_NEW); flower->mask.ct_state &= ~(TCA_FLOWER_KEY_CT_FLAGS_NEW); -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v2 2/3] netdev-offload-tc: Probe for support for any of the ct_state flags
Upstream kernel now rejects unsupported ct_state flags. Earlier kernels, ignored it but still echoed back the requested ct_state, if ct_state was supported. ct_state initial support had trk, new, est, and rel flags. If kernel echos back ct_state, assume support for trk, new, est, and rel. If kernel rejects a specific unsupported flag, continue and use reject mechanisim to probe for flags rep and inv. Disallow inserting rules with unnsupported ct_state flags. Signed-off-by: Paul Blakey Acked-by: Roi Dayan --- lib/netdev-offload-tc.c | 205 1 file changed, 157 insertions(+), 48 deletions(-) diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c index 04b5e21..0775883 100644 --- a/lib/netdev-offload-tc.c +++ b/lib/netdev-offload-tc.c @@ -48,6 +48,7 @@ static struct hmap ufid_to_tc = HMAP_INITIALIZER(_to_tc); static struct hmap tc_to_ufid = HMAP_INITIALIZER(_to_ufid); static bool multi_mask_per_prio = false; static bool block_support = false; +static uint16_t ct_state_support; struct netlink_field { int offset; @@ -1456,6 +1457,66 @@ flower_match_to_tun_opt(struct tc_flower *flower, const struct flow_tnl *tnl, flower->mask.tunnel.metadata.present.len = tnl->metadata.present.len; } +static void +parse_match_ct_state_to_flower(struct tc_flower *flower, struct match *match) +{ +const struct flow *key = >flow; +struct flow *mask = >wc.masks; + +if (!ct_state_support) { +return; +} + +if ((ct_state_support & mask->ct_state) == mask->ct_state) { +if (mask->ct_state & OVS_CS_F_NEW) { +if (key->ct_state & OVS_CS_F_NEW) { +flower->key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_NEW; +} +flower->mask.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_NEW; +mask->ct_state &= ~OVS_CS_F_NEW; +} + +if (mask->ct_state & OVS_CS_F_ESTABLISHED) { +if (key->ct_state & OVS_CS_F_ESTABLISHED) { +flower->key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED; +} +flower->mask.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED; +mask->ct_state &= ~OVS_CS_F_ESTABLISHED; +} + +if (mask->ct_state & OVS_CS_F_TRACKED) { +if (key->ct_state & OVS_CS_F_TRACKED) { +flower->key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_TRACKED; +} +flower->mask.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_TRACKED; +mask->ct_state &= ~OVS_CS_F_TRACKED; +} + +if (flower->key.ct_state & TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED) { +flower->key.ct_state &= ~(TCA_FLOWER_KEY_CT_FLAGS_NEW); +flower->mask.ct_state &= ~(TCA_FLOWER_KEY_CT_FLAGS_NEW); +} +} + +if (mask->ct_zone) { +flower->key.ct_zone = key->ct_zone; +flower->mask.ct_zone = mask->ct_zone; +mask->ct_zone = 0; +} + +if (mask->ct_mark) { +flower->key.ct_mark = key->ct_mark; +flower->mask.ct_mark = mask->ct_mark; +mask->ct_mark = 0; +} + +if (!ovs_u128_is_zero(mask->ct_label)) { +flower->key.ct_label = key->ct_label; +flower->mask.ct_label = mask->ct_label; +mask->ct_label = OVS_U128_ZERO; +} +} + static int netdev_tc_flow_put(struct netdev *netdev, struct match *match, struct nlattr *actions, size_t actions_len, @@ -1708,54 +1769,7 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match, } } -if (mask->ct_state) { -if (mask->ct_state & OVS_CS_F_NEW) { -if (key->ct_state & OVS_CS_F_NEW) { -flower.key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_NEW; -} -flower.mask.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_NEW; -mask->ct_state &= ~OVS_CS_F_NEW; -} - -if (mask->ct_state & OVS_CS_F_ESTABLISHED) { -if (key->ct_state & OVS_CS_F_ESTABLISHED) { -flower.key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED; -} -flower.mask.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED; -mask->ct_state &= ~OVS_CS_F_ESTABLISHED; -} - -if (mask->ct_state & OVS_CS_F_TRACKED) { -if (key->ct_state & OVS_CS_F_TRACKED) { -flower.key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_TRACKED; -} -flower.mask.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_TRACKED; -mask->ct_state &= ~OVS_CS_F_TRACKED; -} - -if (flower.key.ct_state & TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED) { -flower.key.ct_state &= ~(TCA_FLOWER_KEY_CT_FLAGS_NEW); -flower.mask.ct_state &= ~(TCA_FLOWER_KEY_CT_FLAGS_NEW); -} -} - -if (mask->ct_zone) { -flower.key.ct_zone = key->ct_zone; -flower.mask.ct_zone = mask->ct_zone; -mask->ct_zone = 0; -} - -if (mask->ct_mark) { -flower.key.ct_mark = key->ct_mark; -
[ovs-dev] [PATCH v2 0/3] Add offload support for ct_state rpl and inv flags
Add offload support for ct_state rpl and inv flags. For example: ovs-ofctl del-flows br-ovs ovs-ofctl add-flow br-ovs arp,actions=normal ovs-ofctl add-flow br-ovs "table=0, ip,ct_state=-trk actions=ct(table=1,zone=5)" ovs-ofctl add-flow br-ovs "table=1, ip,ct_state=+trk+new actions=ct(zone=5, commit),normal" ovs-ofctl add-flow br-ovs "table=1, ip,ct_zone=5,ct_state=+trk+est+rpl actions=normal" ovs-ofctl add-flow br-ovs "table=1, ip,ct_zone=5,ct_state=+trk+est-rpl actions=normal" Also probe for actual kernel ct_state per flag support. Changelog: v1->v2: In probe patch, if no support for ct_state flags, dont support mark/label/zone... Paul Blakey (3): compat: Add ct_state flags definitions netdev-offload-tc: Probe for support for any of the ct_state flags netdev-offload-tc: Add support for ct_state flags inv and rpl acinclude.m4| 6 +- include/linux/pkt_cls.h | 5 +- lib/netdev-offload-tc.c | 235 ++-- 3 files changed, 194 insertions(+), 52 deletions(-) -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v2 1/3] compat: Add ct_state flags definitions
Add TCA_FLOWER_KEY_CT_FLAGS_REPLY, and TCA_FLOWER_KEY_CT_FLAGS_INVALID. Signed-off-by: Paul Blakey Acked-by: Roi Dayan --- acinclude.m4| 6 +++--- include/linux/pkt_cls.h | 5 - 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/acinclude.m4 b/acinclude.m4 index 435685c..15a54d6 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -209,10 +209,10 @@ dnl Configure Linux tc compat. AC_DEFUN([OVS_CHECK_LINUX_TC], [ AC_COMPILE_IFELSE([ AC_LANG_PROGRAM([#include ], [ -int x = TCA_ACT_FLAGS; +int x = TCA_FLOWER_KEY_CT_FLAGS_REPLY; ])], -[AC_DEFINE([HAVE_TCA_ACT_FLAGS], [1], - [Define to 1 if TCA_ACT_FLAGS is available.])]) +[AC_DEFINE([HAVE_TCA_FLOWER_KEY_CT_FLAGS_REPLY], [1], + [Define to 1 if TCA_FLOWER_KEY_CT_FLAGS_REPLY is available.])]) AC_CHECK_MEMBERS([struct tcf_t.firstuse], [], [], [#include ]) diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h index b0a5ce8..bc51a57 100644 --- a/include/linux/pkt_cls.h +++ b/include/linux/pkt_cls.h @@ -1,7 +1,7 @@ #ifndef __LINUX_PKT_CLS_WRAPPER_H #define __LINUX_PKT_CLS_WRAPPER_H 1 -#if defined(__KERNEL__) || defined(HAVE_TCA_ACT_FLAGS) +#if defined(__KERNEL__) || defined(HAVE_TCA_FLOWER_KEY_CT_FLAGS_REPLY) #include_next #else @@ -255,6 +255,9 @@ enum { TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED = 1 << 1, /* Part of an existing connection. */ TCA_FLOWER_KEY_CT_FLAGS_RELATED = 1 << 2, /* Related to an established connection. */ TCA_FLOWER_KEY_CT_FLAGS_TRACKED = 1 << 3, /* Conntrack has occurred. */ + TCA_FLOWER_KEY_CT_FLAGS_INVALID = 1 << 4, /* Conntrack is invalid. */ + TCA_FLOWER_KEY_CT_FLAGS_REPLY = 1 << 5, /* Packet is in the reply direction. */ + __TCA_FLOWER_KEY_CT_FLAGS_MAX, }; enum { -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn 00/11] ovn-northd-ddlog improvements
On Thu, Mar 4, 2021 at 9:40 AM Ben Pfaff wrote: > > The first commit here is an improvement to the tests. It is not > ddlog specific: > tests: Improve synchronization and debuggability. > > The next commit is an improvement to ovs-sandbox for > ovn-northd-ddlog: > ovs-sandbox: Make it possible to disable recording ddlog input. > > The next several commits improve the ddlog code from my viewpoint > as someone who likes to read code, without changing its behavior: > ovn-northd-ddlog: Improve type safety for datapath stages. > ovn-northd-ddlog: Use object form of is_some(), drop is_none(). > ovn-northd-ddlog: Make map_get_*() more object-like. > ovn-northd-ddlog: Add general-purpose bitwise library. > ovn-northd-ddlog: Define in_addr, in6_addr, eth_addr in ddlog code. > > These commits are performance improvements. The one in the middle > (come to think of it, it should not be in the middle) modifies > ovn-northd-ddlog.c, the other two modify the ddlog code: > ovn-northd-ddlog: Avoid N*M crossproduct joining switches with > routers. > ovn-northd-ddlog: Apply multiple database updates in single ddlog txn. > ovn-northd-ddlog: Rephrase RouterStaticRoute rule. > > This makes it easier to debug performance: > ovn-northd-ddlog: Add profiling support. > > northd/automake.mk|4 +- > northd/bitwise.dl | 272 ++ > northd/bitwise.rs | 133 + > northd/helpers.dl | 22 +- > northd/ipam.dl| 66 ++- > northd/lrouter.dl | 55 +- > northd/lswitch.dl | 26 +- > northd/multicast.dl | 52 +- > northd/ovn-northd-ddlog.c | 155 -- > northd/ovn-northd.8.xml | 36 +- > northd/ovn.dl | 286 +-- > northd/ovn.rs | 295 --- > northd/ovn_northd.dl | 1018 + > tests/ovn.at | 23 +- > tutorial/ovs-sandbox |7 +- > 15 files changed, 1373 insertions(+), 1077 deletions(-) > create mode 100644 northd/bitwise.dl > create mode 100644 northd/bitwise.rs Thanks for this series. Acked-by: Numan Siddique Please note that I'm still learning ddlog and haven't understood much of the northd ddlog code. I ran the patches and found a few northd-ddlog tests to be failing when run with -j5. If I run individually they pass. May be the scale improvements in northd-ddlog could help in those tests. I found one particular test failing 100% of the time. Can you please take a look into that. Checking for 1 rows in sb Chassis_Private with name=hv3 nb_cfg=1... found 1 -- 3. *_cfg(*_cfg_timestamp): nb=2(+1117) sb=2(+1107) hv=1(+0) test 2 = 2 ../../tests/ovn-macros.at:327: "$@" test 1614860305430 -gt 1614860304313 ../../tests/ovn-macros.at:327: "$@" test 1614860305487 -gt 1614860304380 ../../tests/ovn-macros.at:327: "$@" test 1614860304343 -gt 0 ../../tests/ovn-macros.at:327: "$@" ovn-controller -vconsole:off --detach --no-chdir --pidfile --log-file Waiting until 1 rows in sb Chassis_Private with name=hv3 nb_cfg=2... ovn-macros.at:378: waiting until test $count = $(count_rows $db:$table $a $b $c $d $e)... ovn-macros.at:378: wait succeeded immediately -- 4. *_cfg(*_cfg_timestamp): nb=2(+0) sb=2(+0) hv=1(-1) hv3_ts=1614860305531 test 1614860305531 = 1614860304342 ../../tests/ovn-macros.at:327: "$@" ../../tests/ovn-macros.at:327: exit code was 1, expected 0 292. ovn.at:23123: 292. ovn -- nb_cfg timestamp -- ovn-northd-ddlog (ovn.at:23123): FAILED (1614860305531) --- Another issue I want to report (which I see happening with the present master). When the sources are configured with "CFLAGS=-g -fsanitize=address", many tests fail with address sanitizer complaining about memory leaks. I have copied below some of the memory logs. Please let me know if you need the complete log files. - = 40. ovn.at:1903: 40. ovn -- 3 HVs, 1 LS, 3 lports/HV -- ovn-northd-ddlog (ovn.at:1903): FAILED (ovs-macros.at:228) ==2426681==ERROR: LeakSanitizer: detected memory leaks Direct leak of 1824 byte(s) in 76 object(s) allocated from: #0 0x14fd5df453cf in __interceptor_malloc (/lib64/libasan.so.6+0xab3cf) #1 0x4141ff in ddlog_delta_get_table (/home/nusiddiq/workspace_cpp/ovn-org/ovn-for-reviews/ovn/_gcc/northd/ovn-northd-ddlog+0x4141ff) #2 0x410eac in main ../northd/ovn-northd-ddlog.c:1233 #3 0x14fd5d6921e1 in __libc_start_main (/lib64/libc.so.6+0x281e1) Direct leak of 720 byte(s) in 30 object(s) allocated from: #0 0x14fd5df453cf in __interceptor_malloc (/lib64/libasan.so.6+0xab3cf) #1 0x412a75 in ddlog_transaction_commit_dump_changes (/home/nusiddiq/workspace_cpp/ovn-org/ovn-for-reviews/ovn/_gcc/northd/ovn-northd-ddlog+0x412a75) #2 0x40cf24 in northd_parse_updates ../northd/ovn-northd-ddlog.c:448 #3 0x40d6ac in northd_run ../northd/ovn-northd-ddlog.c:548 #4 0x410e93 in main ../northd/ovn-northd-ddlog.c:1232 #5
Re: [ovs-dev] [PATCH 2/3] netdev-offload-tc: Probe for support for any of the ct_state flags
On Wed, 3 Mar 2021, Marcelo Ricardo Leitner wrote: > Hi, > > On Wed, Mar 03, 2021 at 02:15:35PM +0200, Paul Blakey wrote: > > Upstream kernel now rejects unsupported ct_state flags. > > Earlier kernels, ignored it but still echoed back the requested ct_state, > > if ct_state was supported. ct_state initial support had trk, new, est, > > and rel flags. > > > > If kernel echos back ct_state, assume support for trk, new, est, and > > rel. If kernel rejects a specific unsupported flag, continue and > > use reject mechanisim to probe for flags rep and inv. > > > > Disallow inserting rules with unnsupported ct_state flags. > > > > Signed-off-by: Paul Blakey > > Acked-by: Roi Dayan > > --- > > lib/netdev-offload-tc.c | 96 > > + > > 1 file changed, 96 insertions(+) > > > > diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c > > index 04b5e21..8940409 100644 > > --- a/lib/netdev-offload-tc.c > > +++ b/lib/netdev-offload-tc.c > > @@ -48,6 +48,7 @@ static struct hmap ufid_to_tc = > > HMAP_INITIALIZER(_to_tc); > > static struct hmap tc_to_ufid = HMAP_INITIALIZER(_to_ufid); > > static bool multi_mask_per_prio = false; > > static bool block_support = false; > > +static uint16_t ct_state_support; > > > > struct netlink_field { > > int offset; > > @@ -1709,6 +1710,10 @@ netdev_tc_flow_put(struct netdev *netdev, struct > > match *match, > > } > > > > if (mask->ct_state) { > > +if ((ct_state_support & mask->ct_state) != mask->ct_state) { > > +return -EOPNOTSUPP; > > +} > > + > > if (mask->ct_state & OVS_CS_F_NEW) { > > if (key->ct_state & OVS_CS_F_NEW) { > > flower.key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_NEW; > > Considering some ct_state support is a must for CT with TC, it can use > a similar check on other fields as well, like (c'ed) the below, to > not try to offload flows with anything related to CT and not just > ct_state. Thoughts? Yes ill add these to v2. > > @@ -1739,13 +1739,13 @@ netdev_tc_flow_put(struct netdev *netdev, struct > match *match, > } > } > > -if (mask->ct_zone) { > +if (mask->ct_zone && ct_state_support) { > > (either this way or return right away as you did above) > > flower.key.ct_zone = key->ct_zone; > flower.mask.ct_zone = mask->ct_zone; > mask->ct_zone = 0; > } > > -if (mask->ct_mark) { > +if (mask->ct_mark && ct_state_support) { > flower.key.ct_mark = key->ct_mark; > flower.mask.ct_mark = mask->ct_mark; > mask->ct_mark = 0; > @@ -1837,6 +1837,10 @@ netdev_tc_flow_put(struct netdev *netdev, struct match > *match, > const struct nlattr *ct = nl_attr_get(nla); > const size_t ct_len = nl_attr_get_size(nla); > > + if (!ct_state_support) { > + return -EOPNOTSUPP; > + } > + > err = parse_put_flow_ct_action(, action, ct, ct_len); > if (err) { > return err; > > > > @@ -2029,6 +2034,96 @@ out: > > tc_add_del_qdisc(ifindex, false, block_id, TC_INGRESS); > > } > > > > + > > +static int > > +probe_insert_ct_state_rule(int ifindex, uint16_t ct_state, struct tcf_id > > *id) > > +{ > > +int prio = TC_RESERVED_PRIORITY_MAX + 1; > > +struct tc_flower flower; > > + > > +memset(, 0, sizeof flower); > > +flower.key.ct_state = ct_state; > > +flower.mask.ct_state = ct_state; > > +flower.tc_policy = TC_POLICY_SKIP_HW; > > +flower.key.eth_type = htons(ETH_P_IP); > > +flower.mask.eth_type = OVS_BE16_MAX; > > + > > +*id = tc_make_tcf_id(ifindex, 0, prio, TC_INGRESS); > > +return tc_replace_flower(id, ); > > +} > > + > > +static void > > +probe_ct_state_support(int ifindex) > > +{ > > +struct tc_flower flower; > > +uint16_t ct_state; > > +struct tcf_id id; > > +int error; > > + > > +error = tc_add_del_qdisc(ifindex, true, 0, TC_INGRESS); > > +if (error) { > > +return; > > +} > > + > > +/* Test for base ct_state match support */ > > +ct_state = TCA_FLOWER_KEY_CT_FLAGS_NEW | > > TCA_FLOWER_KEY_CT_FLAGS_TRACKED; > > +error = probe_insert_ct_state_rule(ifindex, ct_state, ); > > +if (error) { > > +goto out; > > +} > > + > > +error = tc_get_flower(, ); > > +if (error || flower.mask.ct_state != ct_state) { > > +goto out_del; > > +} > > + > > +tc_del_filter(); > > +ct_state_support = TCA_FLOWER_KEY_CT_FLAGS_NEW | > > + TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED | > > + TCA_FLOWER_KEY_CT_FLAGS_TRACKED | > > + TCA_FLOWER_KEY_CT_FLAGS_RELATED; > > + > > +/* Test for reject, ct_state >= MAX */ > > +ct_state = ~0; > > +error = probe_insert_ct_state_rule(ifindex, ct_state, ); > > +if (!error) { > > +/* No reject, can't continue probing other
Re: [ovs-dev] [PATCH v3 2/2] dpctl: dpif: allow viewing and configuring dp cache sizes
On 3 Mar 2021, at 21:44, Flavio Leitner wrote: > On Wed, Mar 03, 2021 at 02:53:22PM +0100, Eelco Chaudron wrote: >> This patch adds a general way of viewing/configuring datapath >> cache sizes. With an implementation for the netlink interface. >> >> The ovs-dpctl/ovs-appctl show commands will display the >> current cache sizes configured: >> >> ovs-dpctl show >> system@ovs-system: >> lookups: hit:25 missed:63 lost:0 >> flows: 0 >> masks: hit:282 total:0 hit/pkt:3.20 >> cache: hit:4 hit rate:4.5455% >> caches: >> masks-cache: size: 256 >> port 0: ovs-system (internal) >> port 1: br-int (internal) >> port 2: genev_sys_6081 (geneve: packet_type=ptap) >> port 3: br-ex (internal) >> port 4: eth2 >> port 5: sw0p1 (internal) >> port 6: sw0p3 (internal) >> >> A specific cache can be configured as follows: >> >> ovs-appctl dpctl/cache-set-size DP CACHE SIZE >> ovs-dpctl cache-set-size DP CACHE SIZE >> >> For example to disable the cache do: >> >> $ ovs-dpctl cache-set-size system@ovs-system masks-cache 0 >> Setting cache size successful, new size 0. >> >> Signed-off-by: Eelco Chaudron >> --- > > Same nit in the commit with regards to precision as before. > > The API doesn't say anything about cache 'name' pointer ownership > most probably because it is hardcoded to a single name. I think > it's fine for now but in the future we might need to change that > using xstrdup() and let the caller calls free(). > > Thanks for including the test unit. Works here. > > Acked-by: Flavio Leitner Thanks Flavio! Ilya do you need a v4 for the commit message change? ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 1/2] dpctl: dpif: add kernel datapath cache hit output
On 3 Mar 2021, at 21:44, Flavio Leitner wrote: On Wed, Mar 03, 2021 at 02:52:33PM +0100, Eelco Chaudron wrote: This patch adds cache usage statistics to the output: $ ovs-dpctl show system@ovs-system: lookups: hit:24 missed:71 lost:0 flows: 0 masks: hit:334 total:0 hit/pkt:3.52 cache: hit:4 hit rate:4.2105% nit: maybe the maintainer can fix that to be 2 digits as the code below. port 0: ovs-system (internal) port 1: genev_sys_6081 (geneve: packet_type=ptap) port 2: br-int (internal) port 3: br-ex (internal) port 4: eth2 port 5: sw1p1 (internal) port 6: sw0p4 (internal) Signed-off-by: Eelco Chaudron --- v2: - Changed precision to 2 digits - Handle missing kernel feature at netlink level v3: - Rebase on the latest master branch datapath/linux/compat/include/linux/openvswitch.h |2 +- lib/dpctl.c |9 + lib/dpif-netdev.c |1 + lib/dpif-netlink.c|9 + lib/dpif.h|2 ++ 5 files changed, 22 insertions(+), 1 deletion(-) diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h index 875de2025..690d78871 100644 --- a/datapath/linux/compat/include/linux/openvswitch.h +++ b/datapath/linux/compat/include/linux/openvswitch.h @@ -122,8 +122,8 @@ struct ovs_dp_megaflow_stats { __u64 n_mask_hit;/* Number of masks used for flow lookups. */ __u32 n_masks; /* Number of masks for the datapath. */ __u32 pad0; /* Pad for future expension. */ + __u64 n_cache_hit; /* Number of cache matches for flow lookups. */ another nit: the datapath uses tab + spaces. Otherwise: Acked-by: Flavio Leitner Thanks Flavio! Ilya do you need a v4 for the nits? __u64 pad1; /* Pad for future expension. */ - __u64 pad2; /* Pad for future expension. */ }; struct ovs_vport_stats { diff --git a/lib/dpctl.c b/lib/dpctl.c index ef8ae7402..acc677974 100644 --- a/lib/dpctl.c +++ b/lib/dpctl.c @@ -611,6 +611,15 @@ show_dpif(struct dpif *dpif, struct dpctl_params *dpctl_p) dpctl_print(dpctl_p, " masks: hit:%"PRIu64" total:%"PRIu32 " hit/pkt:%.2f\n", stats.n_mask_hit, stats.n_masks, avg); + +if (stats.n_cache_hit != UINT64_MAX) { +double avg_hits = n_pkts ? +(double) stats.n_cache_hit / n_pkts * 100 : 0.0; + +dpctl_print(dpctl_p, +" cache: hit:%"PRIu64" hit rate:%.2f%%\n", +stats.n_cache_hit, avg_hits); +} } } diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 816945375..c3d3aac36 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -2025,6 +2025,7 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats) } stats->n_masks = UINT32_MAX; stats->n_mask_hit = UINT64_MAX; +stats->n_cache_hit = UINT64_MAX; return 0; } diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c index ceb56c685..392b894c5 100644 --- a/lib/dpif-netlink.c +++ b/lib/dpif-netlink.c @@ -672,9 +672,18 @@ dpif_netlink_get_stats(const struct dpif *dpif_, struct dpif_dp_stats *stats) stats->n_masks = dp.megaflow_stats->n_masks; stats->n_mask_hit = get_32aligned_u64( _stats->n_mask_hit); +stats->n_cache_hit = get_32aligned_u64( +_stats->n_cache_hit); + +if (!stats->n_cache_hit) { +/* Old kernels don't use this field and always + * report zero instead. Disable this stat. */ +stats->n_cache_hit = UINT64_MAX; +} } else { stats->n_masks = UINT32_MAX; stats->n_mask_hit = UINT64_MAX; +stats->n_cache_hit = UINT64_MAX; } ofpbuf_delete(buf); } diff --git a/lib/dpif.h b/lib/dpif.h index ecda896c7..600e4522c 100644 --- a/lib/dpif.h +++ b/lib/dpif.h @@ -429,6 +429,8 @@ struct dpif_dp_stats { uint64_t n_missed; /* Number of flow table misses. */ uint64_t n_lost;/* Number of misses not sent to userspace. */ uint64_t n_flows; /* Number of flows present. */ +uint64_t n_cache_hit; /* Number of mega flow mask cache hits for + flow table matches. */ uint64_t n_mask_hit;/* Number of mega flow masks visited for flow table matches. */ uint32_t n_masks; /* Number of mega flow masks. */ -- fbl ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev