Re: [ovs-dev] [PATCH ovn] northd: Remove the usage of 'ct.inv' in logical flows.

2021-03-04 Thread Han Zhou
On Thu, Mar 4, 2021 at 5:25 PM Numan Siddique  wrote:
>
> On Fri, Mar 5, 2021 at 4:22 AM Han Zhou  wrote:
> >
> > On Mon, Mar 1, 2021 at 5:40 AM Numan Siddique  wrote:
> > >
> > > On Fri, Feb 26, 2021 at 1:15 AM Han Zhou  wrote:
> > > >
> > > > On Thu, Feb 25, 2021 at 1:25 AM Numan Siddique 
wrote:
> > > > >
> > > > > On Thu, Feb 25, 2021 at 1:12 PM Han Zhou  wrote:
> > > > > >
> > > > > > On Wed, Feb 24, 2021 at 5:27 AM  wrote:
> > > > > > >
> > > > > > > From: Numan Siddique 
> > > > > > >
> > > > > > > Presently we add 65535 priority lflows in the stages -
> > > > > > > 'ls_in_acl' and 'ls_out_acl' to drop packets which
> > > > > > > match on 'ct.inv'.
> > > > > > >
> > > > > > > As per the 'ovs-fields' man page, this
> > > > > > > ct state field can be used to identify problems such as:
> > > > > > >  •  L3/L4 protocol handler is not loaded/unavailable.
> > > > > > >
> > > > > > >  •  L3/L4 protocol handler determines that the packet is
> > > > > > > malformed.
> > > > > > >
> > > > > > >  •  Packets are unexpected length for protocol.
> > > > > > >
> > > > > > > This patch removes the usage of this field for the following
> > > > > > > reasons:
> > > > > > >
> > > > > > >  • Some of the smart NICs which support offloading datapath
> > > > > > >flows don't support this field.
> > > > > >
> > > > > > What do you mean by "don't support this field"? Do you mean the
NIC
> > > > > > offloading supports connection tracking, but cannot detect if a
> > packet
> > > > is
> > > > > > invalid and always populate the ct.inv as 0?
> > > > >
> > > > > I think so. From what I understand, the kernel conntrack feature
is
> > used
> > > > > for the actual connection tracking. So NIC can't tell if the
packet is
> > > > > invalid or not
> > > > > (say due to out-of-window tcp errors).
> > > > >
> > > > I know some NICs support CT offloading and some doesn't.
> > > > So here what you are referring are the NICs that doesn't support CT
> > > > offloading, which falls back to kernel datapath when CT is used, is
it?
> > If
> > > > this is the case, then even without ct.inv it still couldn't support
> > > > ct.est, etc. right?
> > > > Or, do you mean this is specifically for NICs that support CT
offloading
> > > > but not ct.inv, i.e. it can do regular conntrack in NIC but just
can't
> > > > identify out-of-window packets, and that's why it supports ct.est
but
> > not
> > > > ct.inv?
> > > > I am still quite confused. Could clarify a little more, which types
of
> > NICs
> > > > would benefit from this, and how?
> > >
> > > I'm not sure if I can explain the issue well.  Can you please look
> > > into this bugzilla -
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1921946
> > > We can discuss further if you have further questions or comments.
> > >
> > Unfortunately this one seems to require access permission.
>
> Ok. Let me try to share in some other way.
>
> >
> > >
> > > >
> > > > > >
> > > > > > >
> > > > > > >  • A recent commit in kernel ovs datapath sets the committed
> > > > > > >connection tracking entry to be liberal for out-of-window
> > > > > > >tcp packets (nf_ct_set_tcp_be_liberal()).  Such TCP
> > > > > > >packets will not be marked as invalid.
> > > > > > >
> > > > > >
> > > > > > Could you share a link to this commit?
> > > > >
> > > > > Sure.
> > > > >
> > > >
> >
https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next/+/e2ef5203c817a60bfb591343ffd851b6537370ff
> > > > >
> > > > Thanks for sharing. So OVS is not capable of detecting a
out-of-window
> > > > packet now. Could you explain more about the motivation? I couldn't
get
> > the
> > > > full picture from commit message of that patch. Do you have a link
that
> > > > discusses more details?
> > >
> > > Let me share with you the patch which I first submitted to handle this
> > issue.
> > > During the review, @Florian Westphal suggested being liberal for
> > out-of-window
> > > packets to solve this issue.
> > >
> > > I think the patch discussions have enough information. Please let me
know
> > > if you have further questions or comments and we can discuss further.
> > >
> > > Initial approach taken by me -
> > >
> >
https://patchwork.ozlabs.org/project/netdev/patch/20201006083355.121018-1-nusid...@redhat.com/
> > > Final approach taken -
> > >
> >
https://patchwork.ozlabs.org/project/netdev/patch/20201109072930.14048-1-nusid...@redhat.com/
> > >
> > Thanks for the pointers. Now I have a better context. It seems all these
> > work was to deal with (optimize) the LB (stateful) + stateless ACL use
> > cases.
> > 1. we don't want to track packets coming from VIF (because there is no
> > stateful ACL)
> > 2. but packets to VIF need to go through CT because there is LB
configured
> > which requires CT (for nat), which regarded return packets as invalid.
> >
> > With the above patch, the return packets won't be invalid any more in
the
> > above scenario.
> > However, isn't the stateful ACL support also insufficient now? With 

Re: [ovs-dev] [PATCH ovn 2/2] northd: Fix the missing force_snat_for_lb flows when router_ip is configured.

2021-03-04 Thread Numan Siddique
On Fri, Mar 5, 2021 at 7:10 AM Ben Pfaff  wrote:
>
> On Fri, Mar 05, 2021 at 06:46:55AM +0530, Numan Siddique wrote:
> > On Fri, Mar 5, 2021 at 2:30 AM Ben Pfaff  wrote:
> > >
> > > On Wed, Mar 03, 2021 at 11:32:39PM +0530, num...@ovn.org wrote:
> > > > From: Numan Siddique 
> > > >
> > > > The commit c6e21a23bd8 which supported the option 
> > > > 'lb_force_snat_ip=router_ip'
> > > > on a gateway router, missed out on
> > > >   - updating the flows in 'lr_in_dnat' to set 'flags.force_snat_for_lb 
> > > > = 1'.
> > > >   - removing the flow to drop if ip.dst == router port IP in 
> > > > 'lr_in_ip_input'
> > > > stage.
> > > >
> > > > This patch fixes these issue and adds a system test to cover the
> > > > hairpin load balancer traffic for the gateway routers.
> > > >
> > > > Fixes: c6e21a23bd8("northd: Provide the Gateway router option 
> > > > 'lb_force_snat_ip' to take router port ips.")
> > > >
> > > > CC: Ben Pfaff 
> > > > Signed-off-by: Numan Siddique 
> > >
> > > Thanks!  I took a look at the differences between the C and the DDlog
> > > versions.  I have a few comments.
> > >
> > > The C version adds a new flow to the output.  The DDlog version doesn't
> > > appear to, yet I do see a similar flow in the DDlog code.  I am not sure
> > > how that happened.
> >
> > Thanks for the review.
> >
> > Can you please tell me in which table the new flow is seen?
> >
> > If a logical router has the option 'lb_force_nat_ip=router_ip' is set, then 
> > both
> > the C version and ddlog version do the below
> >
> >   (1) Update the flow in the table lr_in_dnat to add
> > "flags.force_snat_for_lb = 1" in the action
> > for the packets which are destined to load balancer vips.
> >
> >   (2) Don't add the priority-60 flow in 'lr_in_ip_input' to drop the
> > packets for router ips.
> > The below suggestion provided by you to replace "not
> > force_lb_snat" with  ".force_lb_snat = false"
> > does that.
>
> If they do the same thing, then that's great and no change is needed.
> What surprises me is that the C version of the changes has the following
> hunk to add a new flow, but there's no corresponding change to the DDlog
> version to add a new flow.
>
> @@ -9153,10 +9153,17 @@ build_lrouter_force_snat_flows_op(struct ovn_port *op,
>
>  if (op->lrp_networks.n_ipv4_addrs) {
>  ds_clear(match);
>  ds_clear(actions);
>
> +ds_put_format(match, "inport == %s && ip4.dst == %s",
> +  op->json_key, op->lrp_networks.ipv4_addrs[0].addr_s);
> +ovn_lflow_add(lflows, op->od, S_ROUTER_IN_UNSNAT, 110,
> +  ds_cstr(match), "ct_snat;");
> +
> +ds_clear(match);
> +
>  /* Higher priority rules to force SNAT with the router port ip.
>   * This only takes effect when the packet has already been
>   * load balanced once. */
>  ds_put_format(match, "flags.force_snat_for_lb == 1 && ip4 && "
>"outport == %s", op->json_key);
>

Oops. You're right. I totally missed it. Thanks a lot for spotting this.
I'll address this and send v2.

> > > You can optimize it by writing
> > > > +  .force_lb_snat = false,
> > > and then omitting the later clause.  With the way DDlog currently
> > > works, this isn't just cosmetic, it will actually yield faster
> > > code because DDlog will discard nonmatching records earlier.  (If
> > > that's an actual performance problem it will come out later in
> > > profiling, I'm just giving a tip here that I only recently learned
> > > myself.)
> >
> > I tried like below as per your suggestion and the compilation failed
>
> Oh, I meant like the following incremental.  Sorry I wasn't clear!
>
> diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
> index 0e208c1f5160..21092cc1bcea 100644
> --- a/northd/ovn_northd.dl
> +++ b/northd/ovn_northd.dl
> @@ -5142,12 +5142,11 @@ Flow(.logical_datapath = lr_uuid,
>   .external_ids = stage_hint(lrp_uuid)) :-
>  (.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid},
>  .router = {.snat_ips = snat_ips,
> -  .force_lb_snat = force_lb_snat,
> +  .force_lb_snat = false,

Thanks. I'll try this out.

Numan

>.lr = nb::Logical_Router{._uuid = 
> lr_uuid}},
>  .networks = networks),
>  var addr = FlatMap(networks.ipv4_addrs),
>  not snat_ips.contains_key(IPv4{addr.addr}),
> -not force_lb_snat,
>  var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec().
>  Flow(.logical_datapath = lr_uuid,
>   .stage = router_stage(IN, IP_INPUT),
> @@ -5157,12 +5156,11 @@ Flow(.logical_datapath = lr_uuid,
>   .external_ids = stage_hint(lrp_uuid)) :-
>  (.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid},
>  .router = {.snat_ips = snat_ips,
> -  .force_lb_snat = force_lb_snat,
> +   

Re: [ovs-dev] [PATCH v12 00/11] Add offload support for sFlow

2021-03-04 Thread Chris Mi

Hi Ilya,

I think about your suggestion recently. But I'm still not very clear 
about the design.

Please see my reply below:

On 3/1/2021 8:48 PM, Ilya Maximets wrote:

On 3/1/21 9:30 AM, Chris Mi wrote:

Hi Simon, Ilya,

Could I know what should we do to make progress for this patch set?
It has been posted in the community for a long time 

In general, the way to get your patches reviewed is to review other
patches.  It's simply because we still have a huge review backlog
(214 patches right now in patchwork and most of them needs review)
and bugfixes usually has a bit higher priority.  By reviewing other
patches you're reducing amount of work for maintainers so they can
get to your patches faster.

OK, I see.


For the series and comments from Eelco:
I didn't read the patches carefully, only a quick glance, but I still
do not understand why we need a separate thread to poll psample events.
Why can't we just allow usual handler threads to do that?
I'm not sure if you are aware of that the psample netlink is different 
from the ovs
netlink. Without offload, kernel sends missed packet and sFlow packet to 
userspace
using the same netlink 'ovs_packet_family'. So they can use the same 
handler thread.
But in order to offload sFlow action, we use psample kernel module to 
send sampled
packets from kernel to userspace. The format for ovs netlink message and 
psample

netlink messages are different.

   From the
architecture perspective it's not a good thing to call ofproto code
from the offload-provider.  This introduces lots of complications
and might cause lots of issues in the future.

I'd say that design should look like this:

   handler thread ->
 dpif_recv() ->
   dpif_netlink_recv() ->
 netdev_offload_recv() ->
   netdev_offload_tc_recv() ->
 nl_sock_recv()
In order to use the handler thread, I'm not sure if we should add 
psample socket
fd to every handler's epoll_fd. If we should do that, we should consider 
how to
differentiate if the event comes from ovs or psample netlink. Maybe we 
should

allocate a special port number for psample and assign it to event.data.u32.
Anyway, that's the details. If this is the right direction, I'll think 
about it.


Last three calls should be implemented.

The better version of this will be to throw away dpif part from
above call chain and make it:

   handler thread ->
 netdev_offload_recv() ->
   netdev_offload_tc_recv() ->
 nl_sock_recv()

If we throw away dpif part, maybe we have to write some duplicate epoll code
for psample only. And we can't block in nl_sock_recv(). Maybe we have to add
psample socket fd to every handler's epoll_fd. So we have to change the dpif
somehow.

I don't know if I understand your suggestion correctly. Or I missed anything
So if you have time, could you please elaborate?

Thanks,
Chris


This way we could avoid touching dpif-netdev and still have psample
offloading for the case where netdev-offload-tc is used from the
userspace datapath.

Above architecture also implies implementation of:
  - netdev_offload_recv_wait()
  - netdev_offload_recv_purge()
  - and the netdev_offload_tc_* counterparts.

Best regards, Ilya Maximets.


Thanks,
Chris

On 2/23/2021 5:08 PM, Roi Dayan wrote:


On 2021-01-27 8:23 AM, Chris Mi wrote:

This patch set adds offload support for sFlow.

Psample is a genetlink channel for packet sampling. TC action act_sample
uses psample to send sampled packets to userspace.

When offloading sample action to TC, userspace creates a unique ID to
map sFlow action and tunnel info and passes this ID to kernel instead
of the sFlow info. psample will send this ID and sampled packet to
userspace. Using the ID, userspace can recover the sFlow info and send
sampled packet to the right sFlow monitoring host.

v2-v1:
- Fix robot errors.
v3-v2:
- Remove Gerrit Change-Id.
- Add patch #9 to fix older kernels build issue.
- Add travis test result.
v4-v3:
- Fix offload issue when sampling rate is 1.
v5-v4:
- Move polling thread from ofproto to netdev-offload-tc.
v6-v5:
- Rebase.
- Add GitHub Actions test result.
v7-v6:
- Remove Gerrit Change-Id.
- Fix "ERROR: Inappropriate spacing around cast"
v8-v7
- Address Eelco Chaudron's comment for patch #11.
v9-v8
- Remove sflow_len from struct dpif_sflow_attr.
- Log a debug message for other userspace actions.
v10-v9
- Address Eelco Chaudron's comments on v9.
v11-v10
- Fix a bracing error.
v12-v11
- Add duplicate sample group id check.

Chris Mi (11):
    compat: Add psample and tc sample action defines for older kernels
    ovs-kmod-ctl: Load kernel module psample
    dpif: Introduce register sFlow upcall callback API
    ofproto: Add upcall callback to process sFlow packet
    netdev-offload: Introduce register sFlow upcall callback API
    netdev-offload-tc: Implement register sFlow upcall callback API
    dpif-netlink: Implement register sFlow upcall callback API
    netdev-offload-tc: Introduce group ID management API
    

Re: [ovs-dev] [PATCH ovn 2/2] northd: Fix the missing force_snat_for_lb flows when router_ip is configured.

2021-03-04 Thread Ben Pfaff
On Fri, Mar 05, 2021 at 06:46:55AM +0530, Numan Siddique wrote:
> On Fri, Mar 5, 2021 at 2:30 AM Ben Pfaff  wrote:
> >
> > On Wed, Mar 03, 2021 at 11:32:39PM +0530, num...@ovn.org wrote:
> > > From: Numan Siddique 
> > >
> > > The commit c6e21a23bd8 which supported the option 
> > > 'lb_force_snat_ip=router_ip'
> > > on a gateway router, missed out on
> > >   - updating the flows in 'lr_in_dnat' to set 'flags.force_snat_for_lb = 
> > > 1'.
> > >   - removing the flow to drop if ip.dst == router port IP in 
> > > 'lr_in_ip_input'
> > > stage.
> > >
> > > This patch fixes these issue and adds a system test to cover the
> > > hairpin load balancer traffic for the gateway routers.
> > >
> > > Fixes: c6e21a23bd8("northd: Provide the Gateway router option 
> > > 'lb_force_snat_ip' to take router port ips.")
> > >
> > > CC: Ben Pfaff 
> > > Signed-off-by: Numan Siddique 
> >
> > Thanks!  I took a look at the differences between the C and the DDlog
> > versions.  I have a few comments.
> >
> > The C version adds a new flow to the output.  The DDlog version doesn't
> > appear to, yet I do see a similar flow in the DDlog code.  I am not sure
> > how that happened.
> 
> Thanks for the review.
> 
> Can you please tell me in which table the new flow is seen?
> 
> If a logical router has the option 'lb_force_nat_ip=router_ip' is set, then 
> both
> the C version and ddlog version do the below
> 
>   (1) Update the flow in the table lr_in_dnat to add
> "flags.force_snat_for_lb = 1" in the action
> for the packets which are destined to load balancer vips.
> 
>   (2) Don't add the priority-60 flow in 'lr_in_ip_input' to drop the
> packets for router ips.
> The below suggestion provided by you to replace "not
> force_lb_snat" with  ".force_lb_snat = false"
> does that.

If they do the same thing, then that's great and no change is needed.
What surprises me is that the C version of the changes has the following
hunk to add a new flow, but there's no corresponding change to the DDlog
version to add a new flow.

@@ -9153,10 +9153,17 @@ build_lrouter_force_snat_flows_op(struct ovn_port *op,
 
 if (op->lrp_networks.n_ipv4_addrs) {
 ds_clear(match);
 ds_clear(actions);
 
+ds_put_format(match, "inport == %s && ip4.dst == %s",
+  op->json_key, op->lrp_networks.ipv4_addrs[0].addr_s);
+ovn_lflow_add(lflows, op->od, S_ROUTER_IN_UNSNAT, 110,
+  ds_cstr(match), "ct_snat;");
+
+ds_clear(match);
+
 /* Higher priority rules to force SNAT with the router port ip.
  * This only takes effect when the packet has already been
  * load balanced once. */
 ds_put_format(match, "flags.force_snat_for_lb == 1 && ip4 && "
   "outport == %s", op->json_key);

> > You can optimize it by writing
> > > +  .force_lb_snat = false,
> > and then omitting the later clause.  With the way DDlog currently
> > works, this isn't just cosmetic, it will actually yield faster
> > code because DDlog will discard nonmatching records earlier.  (If
> > that's an actual performance problem it will come out later in
> > profiling, I'm just giving a tip here that I only recently learned
> > myself.)
> 
> I tried like below as per your suggestion and the compilation failed

Oh, I meant like the following incremental.  Sorry I wasn't clear!

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 0e208c1f5160..21092cc1bcea 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -5142,12 +5142,11 @@ Flow(.logical_datapath = lr_uuid,
  .external_ids = stage_hint(lrp_uuid)) :-
 (.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid},
 .router = {.snat_ips = snat_ips,
-  .force_lb_snat = force_lb_snat,
+  .force_lb_snat = false,
   .lr = nb::Logical_Router{._uuid = lr_uuid}},
 .networks = networks),
 var addr = FlatMap(networks.ipv4_addrs),
 not snat_ips.contains_key(IPv4{addr.addr}),
-not force_lb_snat,
 var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec().
 Flow(.logical_datapath = lr_uuid,
  .stage = router_stage(IN, IP_INPUT),
@@ -5157,12 +5156,11 @@ Flow(.logical_datapath = lr_uuid,
  .external_ids = stage_hint(lrp_uuid)) :-
 (.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid},
 .router = {.snat_ips = snat_ips,
-  .force_lb_snat = force_lb_snat,
+  .force_lb_snat = false,
   .lr = nb::Logical_Router{._uuid = lr_uuid}},
 .networks = networks),
 var addr = FlatMap(networks.ipv6_addrs),
 not snat_ips.contains_key(IPv6{addr.addr}),
-not force_lb_snat,
 var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec().
 
 for (RouterPortNetworksIPv4Addr(

Re: [ovs-dev] [PATCH ovn] northd: Remove the usage of 'ct.inv' in logical flows.

2021-03-04 Thread Numan Siddique
On Fri, Mar 5, 2021 at 4:22 AM Han Zhou  wrote:
>
> On Mon, Mar 1, 2021 at 5:40 AM Numan Siddique  wrote:
> >
> > On Fri, Feb 26, 2021 at 1:15 AM Han Zhou  wrote:
> > >
> > > On Thu, Feb 25, 2021 at 1:25 AM Numan Siddique  wrote:
> > > >
> > > > On Thu, Feb 25, 2021 at 1:12 PM Han Zhou  wrote:
> > > > >
> > > > > On Wed, Feb 24, 2021 at 5:27 AM  wrote:
> > > > > >
> > > > > > From: Numan Siddique 
> > > > > >
> > > > > > Presently we add 65535 priority lflows in the stages -
> > > > > > 'ls_in_acl' and 'ls_out_acl' to drop packets which
> > > > > > match on 'ct.inv'.
> > > > > >
> > > > > > As per the 'ovs-fields' man page, this
> > > > > > ct state field can be used to identify problems such as:
> > > > > >  •  L3/L4 protocol handler is not loaded/unavailable.
> > > > > >
> > > > > >  •  L3/L4 protocol handler determines that the packet is
> > > > > > malformed.
> > > > > >
> > > > > >  •  Packets are unexpected length for protocol.
> > > > > >
> > > > > > This patch removes the usage of this field for the following
> > > > > > reasons:
> > > > > >
> > > > > >  • Some of the smart NICs which support offloading datapath
> > > > > >flows don't support this field.
> > > > >
> > > > > What do you mean by "don't support this field"? Do you mean the NIC
> > > > > offloading supports connection tracking, but cannot detect if a
> packet
> > > is
> > > > > invalid and always populate the ct.inv as 0?
> > > >
> > > > I think so. From what I understand, the kernel conntrack feature is
> used
> > > > for the actual connection tracking. So NIC can't tell if the packet is
> > > > invalid or not
> > > > (say due to out-of-window tcp errors).
> > > >
> > > I know some NICs support CT offloading and some doesn't.
> > > So here what you are referring are the NICs that doesn't support CT
> > > offloading, which falls back to kernel datapath when CT is used, is it?
> If
> > > this is the case, then even without ct.inv it still couldn't support
> > > ct.est, etc. right?
> > > Or, do you mean this is specifically for NICs that support CT offloading
> > > but not ct.inv, i.e. it can do regular conntrack in NIC but just can't
> > > identify out-of-window packets, and that's why it supports ct.est but
> not
> > > ct.inv?
> > > I am still quite confused. Could clarify a little more, which types of
> NICs
> > > would benefit from this, and how?
> >
> > I'm not sure if I can explain the issue well.  Can you please look
> > into this bugzilla -
> > https://bugzilla.redhat.com/show_bug.cgi?id=1921946
> > We can discuss further if you have further questions or comments.
> >
> Unfortunately this one seems to require access permission.

Ok. Let me try to share in some other way.

>
> >
> > >
> > > > >
> > > > > >
> > > > > >  • A recent commit in kernel ovs datapath sets the committed
> > > > > >connection tracking entry to be liberal for out-of-window
> > > > > >tcp packets (nf_ct_set_tcp_be_liberal()).  Such TCP
> > > > > >packets will not be marked as invalid.
> > > > > >
> > > > >
> > > > > Could you share a link to this commit?
> > > >
> > > > Sure.
> > > >
> > >
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next/+/e2ef5203c817a60bfb591343ffd851b6537370ff
> > > >
> > > Thanks for sharing. So OVS is not capable of detecting a out-of-window
> > > packet now. Could you explain more about the motivation? I couldn't get
> the
> > > full picture from commit message of that patch. Do you have a link that
> > > discusses more details?
> >
> > Let me share with you the patch which I first submitted to handle this
> issue.
> > During the review, @Florian Westphal suggested being liberal for
> out-of-window
> > packets to solve this issue.
> >
> > I think the patch discussions have enough information. Please let me know
> > if you have further questions or comments and we can discuss further.
> >
> > Initial approach taken by me -
> >
> https://patchwork.ozlabs.org/project/netdev/patch/20201006083355.121018-1-nusid...@redhat.com/
> > Final approach taken -
> >
> https://patchwork.ozlabs.org/project/netdev/patch/20201109072930.14048-1-nusid...@redhat.com/
> >
> Thanks for the pointers. Now I have a better context. It seems all these
> work was to deal with (optimize) the LB (stateful) + stateless ACL use
> cases.
> 1. we don't want to track packets coming from VIF (because there is no
> stateful ACL)
> 2. but packets to VIF need to go through CT because there is LB configured
> which requires CT (for nat), which regarded return packets as invalid.
>
> With the above patch, the return packets won't be invalid any more in the
> above scenario.
> However, isn't the stateful ACL support also insufficient now? With the
> above patch, middle-of-traffic packets without established connection will
> be regarded as "new", instead of "invalid", right? Then I wonder if we lose
> the value of stateful ACL completely. Any considerations here? (I am not a
> fan of stateful ACLs but just thinking if our 

Re: [ovs-dev] [PATCH ovn 2/2] northd: Fix the missing force_snat_for_lb flows when router_ip is configured.

2021-03-04 Thread Numan Siddique
On Fri, Mar 5, 2021 at 2:30 AM Ben Pfaff  wrote:
>
> On Wed, Mar 03, 2021 at 11:32:39PM +0530, num...@ovn.org wrote:
> > From: Numan Siddique 
> >
> > The commit c6e21a23bd8 which supported the option 
> > 'lb_force_snat_ip=router_ip'
> > on a gateway router, missed out on
> >   - updating the flows in 'lr_in_dnat' to set 'flags.force_snat_for_lb = 1'.
> >   - removing the flow to drop if ip.dst == router port IP in 
> > 'lr_in_ip_input'
> > stage.
> >
> > This patch fixes these issue and adds a system test to cover the
> > hairpin load balancer traffic for the gateway routers.
> >
> > Fixes: c6e21a23bd8("northd: Provide the Gateway router option 
> > 'lb_force_snat_ip' to take router port ips.")
> >
> > CC: Ben Pfaff 
> > Signed-off-by: Numan Siddique 
>
> Thanks!  I took a look at the differences between the C and the DDlog
> versions.  I have a few comments.
>
> The C version adds a new flow to the output.  The DDlog version doesn't
> appear to, yet I do see a similar flow in the DDlog code.  I am not sure
> how that happened.

Thanks for the review.

Can you please tell me in which table the new flow is seen?

If a logical router has the option 'lb_force_nat_ip=router_ip' is set, then both
the C version and ddlog version do the below

  (1) Update the flow in the table lr_in_dnat to add
"flags.force_snat_for_lb = 1" in the action
for the packets which are destined to load balancer vips.

  (2) Don't add the priority-60 flow in 'lr_in_ip_input' to drop the
packets for router ips.
The below suggestion provided by you to replace "not
force_lb_snat" with  ".force_lb_snat = false"
does that.


>
> Looking at the following:
> >  (.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid},
> >  .router = {.snat_ips = snat_ips,
> > +  .force_lb_snat = force_lb_snat,
> >.lr = nb::Logical_Router{._uuid = 
> > lr_uuid}},
> >  .networks = networks),
> >  var addr = FlatMap(networks.ipv4_addrs),
> >  not snat_ips.contains_key(IPv4{addr.addr}),
> > +not force_lb_snat,
> >  var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec().
>
> You can optimize it by writing
> > +  .force_lb_snat = false,
> and then omitting the later clause.  With the way DDlog currently
> works, this isn't just cosmetic, it will actually yield faster
> code because DDlog will discard nonmatching records earlier.  (If
> that's an actual performance problem it will come out later in
> profiling, I'm just giving a tip here that I only recently learned
> myself.)

I tried like below as per your suggestion and the compilation failed

***
ddlog -i ../northd/ovn_northd.dl -o ./northd -L
/home/nusiddiq/.local/ddlog/lib -L ./northd

error: failed to parse input file: "../northd/ovn_northd.dl" (line
5150, column 20):
unexpected "="
expecting "in"
*


diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 90839b663..a7cd0892a 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -5147,7 +5147,7 @@ Flow(.logical_datapath = lr_uuid,
 .networks = networks),
 var addr = FlatMap(networks.ipv4_addrs),
 not snat_ips.contains_key(IPv4{addr.addr}),
-not force_lb_snat,
+.force_lb_snat = false,
 var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec().
 Flow(.logical_datapath = lr_uuid,
  .stage = router_stage(IN, IP_INPUT),
@@ -5162,7 +5162,7 @@ Flow(.logical_datapath = lr_uuid,
 .networks = networks),
 var addr = FlatMap(networks.ipv6_addrs),
 not snat_ips.contains_key(IPv6{addr.addr}),
-not force_lb_snat,
+.force_lb_snat = false,
 var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec().
 

I also tried "force_lb_snat = false" and it still failed.

>From what I understand the above code does (2) which I mentioned.

Thanks
Numan

>
> Thanks,
>
> Ben.
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] northd: Remove the usage of 'ct.inv' in logical flows.

2021-03-04 Thread Han Zhou
On Mon, Mar 1, 2021 at 5:40 AM Numan Siddique  wrote:
>
> On Fri, Feb 26, 2021 at 1:15 AM Han Zhou  wrote:
> >
> > On Thu, Feb 25, 2021 at 1:25 AM Numan Siddique  wrote:
> > >
> > > On Thu, Feb 25, 2021 at 1:12 PM Han Zhou  wrote:
> > > >
> > > > On Wed, Feb 24, 2021 at 5:27 AM  wrote:
> > > > >
> > > > > From: Numan Siddique 
> > > > >
> > > > > Presently we add 65535 priority lflows in the stages -
> > > > > 'ls_in_acl' and 'ls_out_acl' to drop packets which
> > > > > match on 'ct.inv'.
> > > > >
> > > > > As per the 'ovs-fields' man page, this
> > > > > ct state field can be used to identify problems such as:
> > > > >  •  L3/L4 protocol handler is not loaded/unavailable.
> > > > >
> > > > >  •  L3/L4 protocol handler determines that the packet is
> > > > > malformed.
> > > > >
> > > > >  •  Packets are unexpected length for protocol.
> > > > >
> > > > > This patch removes the usage of this field for the following
> > > > > reasons:
> > > > >
> > > > >  • Some of the smart NICs which support offloading datapath
> > > > >flows don't support this field.
> > > >
> > > > What do you mean by "don't support this field"? Do you mean the NIC
> > > > offloading supports connection tracking, but cannot detect if a
packet
> > is
> > > > invalid and always populate the ct.inv as 0?
> > >
> > > I think so. From what I understand, the kernel conntrack feature is
used
> > > for the actual connection tracking. So NIC can't tell if the packet is
> > > invalid or not
> > > (say due to out-of-window tcp errors).
> > >
> > I know some NICs support CT offloading and some doesn't.
> > So here what you are referring are the NICs that doesn't support CT
> > offloading, which falls back to kernel datapath when CT is used, is it?
If
> > this is the case, then even without ct.inv it still couldn't support
> > ct.est, etc. right?
> > Or, do you mean this is specifically for NICs that support CT offloading
> > but not ct.inv, i.e. it can do regular conntrack in NIC but just can't
> > identify out-of-window packets, and that's why it supports ct.est but
not
> > ct.inv?
> > I am still quite confused. Could clarify a little more, which types of
NICs
> > would benefit from this, and how?
>
> I'm not sure if I can explain the issue well.  Can you please look
> into this bugzilla -
> https://bugzilla.redhat.com/show_bug.cgi?id=1921946
> We can discuss further if you have further questions or comments.
>
Unfortunately this one seems to require access permission.

>
> >
> > > >
> > > > >
> > > > >  • A recent commit in kernel ovs datapath sets the committed
> > > > >connection tracking entry to be liberal for out-of-window
> > > > >tcp packets (nf_ct_set_tcp_be_liberal()).  Such TCP
> > > > >packets will not be marked as invalid.
> > > > >
> > > >
> > > > Could you share a link to this commit?
> > >
> > > Sure.
> > >
> >
https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next/+/e2ef5203c817a60bfb591343ffd851b6537370ff
> > >
> > Thanks for sharing. So OVS is not capable of detecting a out-of-window
> > packet now. Could you explain more about the motivation? I couldn't get
the
> > full picture from commit message of that patch. Do you have a link that
> > discusses more details?
>
> Let me share with you the patch which I first submitted to handle this
issue.
> During the review, @Florian Westphal suggested being liberal for
out-of-window
> packets to solve this issue.
>
> I think the patch discussions have enough information. Please let me know
> if you have further questions or comments and we can discuss further.
>
> Initial approach taken by me -
>
https://patchwork.ozlabs.org/project/netdev/patch/20201006083355.121018-1-nusid...@redhat.com/
> Final approach taken -
>
https://patchwork.ozlabs.org/project/netdev/patch/20201109072930.14048-1-nusid...@redhat.com/
>
Thanks for the pointers. Now I have a better context. It seems all these
work was to deal with (optimize) the LB (stateful) + stateless ACL use
cases.
1. we don't want to track packets coming from VIF (because there is no
stateful ACL)
2. but packets to VIF need to go through CT because there is LB configured
which requires CT (for nat), which regarded return packets as invalid.

With the above patch, the return packets won't be invalid any more in the
above scenario.
However, isn't the stateful ACL support also insufficient now? With the
above patch, middle-of-traffic packets without established connection will
be regarded as "new", instead of "invalid", right? Then I wonder if we lose
the value of stateful ACL completely. Any considerations here? (I am not a
fan of stateful ACLs but just thinking if our implementation reflects what
we are declaring to the users)

>
>
> >
> > > >
> > > > >  • Even if a ct.inv packet is delivered to a VIF, the
> > > > >networking stack of the VIF's kernel can handle such
> > > > >packets.
> > > >
> > > > I have some concern for this point. We shouldn't make assumptions
for
> > > > what's configured 

Re: [ovs-dev] [PATCH ovn 2/2] northd: Fix the missing force_snat_for_lb flows when router_ip is configured.

2021-03-04 Thread Ben Pfaff
On Wed, Mar 03, 2021 at 11:32:39PM +0530, num...@ovn.org wrote:
> From: Numan Siddique 
> 
> The commit c6e21a23bd8 which supported the option 'lb_force_snat_ip=router_ip'
> on a gateway router, missed out on
>   - updating the flows in 'lr_in_dnat' to set 'flags.force_snat_for_lb = 1'.
>   - removing the flow to drop if ip.dst == router port IP in 'lr_in_ip_input'
> stage.
> 
> This patch fixes these issue and adds a system test to cover the
> hairpin load balancer traffic for the gateway routers.
> 
> Fixes: c6e21a23bd8("northd: Provide the Gateway router option 
> 'lb_force_snat_ip' to take router port ips.")
> 
> CC: Ben Pfaff 
> Signed-off-by: Numan Siddique 

Thanks!  I took a look at the differences between the C and the DDlog
versions.  I have a few comments.

The C version adds a new flow to the output.  The DDlog version doesn't
appear to, yet I do see a similar flow in the DDlog code.  I am not sure
how that happened.

Looking at the following:
>  (.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid},
>  .router = {.snat_ips = snat_ips,
> +  .force_lb_snat = force_lb_snat,
>.lr = nb::Logical_Router{._uuid = 
> lr_uuid}},
>  .networks = networks),
>  var addr = FlatMap(networks.ipv4_addrs),
>  not snat_ips.contains_key(IPv4{addr.addr}),
> +not force_lb_snat,
>  var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec().

You can optimize it by writing
> +  .force_lb_snat = false,
and then omitting the later clause.  With the way DDlog currently
works, this isn't just cosmetic, it will actually yield faster
code because DDlog will discard nonmatching records earlier.  (If
that's an actual performance problem it will come out later in
profiling, I'm just giving a tip here that I only recently learned
myself.)

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 1/2] northd-ddlog: Fix lb_force_snat_ip router option.

2021-03-04 Thread Ben Pfaff
On Wed, Mar 03, 2021 at 11:32:22PM +0530, num...@ovn.org wrote:
> From: Numan Siddique 
> 
> There were few typos because of which lflows related to
> router option lb_force_snat_ip were not generated correctly.
> 
> This patch fixes it.
> 
> Fixes: 0e77b3bcbfe("ovn-northd-ddlog: New implementation of ovn-northd based 
> on ddlog.")
> CC: Ben Pfaff 
> Signed-off-by: Numan Siddique 

Thank you for the fix!

Acked-by: Ben Pfaff 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5] python: Send notifications after the transaction ends

2021-03-04 Thread Brian Haley

Acked-by: Brian Haley 

Thanks for fixing this Terry.

-Brian

On 3/4/21 11:18 AM, Terry Wilson wrote:

The Python IDL notification mechanism was sending a notification
for each processed update in a transaction as it was processed.
This causes issues with multi-row changes that contain references
to each other.

For example, if a Logical_Router_Port is created along with a
Gateway_Chassis, and the LRP.gateway_chassis set to that GC, then
when the notify() passes the CREATE event for the LRP, the GC will
not yet have been processed, so __getattr__ when _uuid_to_row fails
to find the GC, will return the default value for LRP.gateway_chassis
which is [].

This patch has the process_update methods return the notifications
that would be produced when a row changes, so they can be queued
and sent after all rows have been processed.

Fixes: d7d417fcddf9 ("Allow subclasses of Idl to define a notification hook")
Signed-off-by: Terry Wilson 
---
  python/ovs/db/idl.py | 39 ---
  tests/ovsdb-idl.at   | 22 ++
  tests/test-ovsdb.py  |  7 +--
  3 files changed, 51 insertions(+), 17 deletions(-)



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4 1/1] python: Send notifications after the transaction ends

2021-03-04 Thread 0-day Robot
Bleep bloop.  Greetings Terry Wilson, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Terry Wilson  needs to sign off.
Lines checked: 220, Warnings: 0, Errors: 1


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] RFC: netdev-afxdp: Support for XDP metadata HW hints.

2021-03-04 Thread William Tu
One big problem of netdev-afxdp is that there is no metadata support
from the hardware at all.  For example, OVS netdev-afxdp has to do rxhash,
or TCP checksum in software, resulting in high performance overhead.

A generic meta data type for XDP frame using BTF is proposed[1] and
there is sample implementation[2][3].  This patch experiments enabling 
the XDP metadata, or called HW hints, and shows the potential performance
improvement.  The patch uses only the rxhash value provided from HW,
so avoiding at the calculation of hash at lib/dpif-netdev.c:
if (!dp_packet_rss_valid(execute->packet)) {
dp_packet_set_rss_hash(execute->packet,
   flow_hash_5tuple(execute->flow, 0));
}

Using '$ ovs-appctl dpif-netdev/pmd-stats-show', the 'avg processing
cycles per packet' drops from 402 to 272.  More details below

Reference:
--
[1] https://www.kernel.org/doc/html/latest/bpf/btf.html
[2] 
https://netdevconf.info/0x14/pub/slides/54/[1]%20XDP%20meta%20data%20acceleration.pdf
[3] 
https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/log/?h=topic/xdp_metadata4

Testbed:

Two Xeon E5-2620 v3 2.4GHz connected back-to-back using Mellanox
ConnectX-6Dx 25GbE. Before starting OVS, enable the MD by:
$ bpftool net xdp show
xdp:
enp2s0f0np0(4) md_btf_id(1) md_btf_enabled(0)
enp2s0f1np1(5) md_btf_id(2) md_btf_enabled(0)
$ bpftool net xdp set dev enp2s0f0np0 md_btf on
$ bpftool net xdp
xdp:
enp2s0f0np0(4) md_btf_id(1) md_btf_enabled(1)

Limitations/TODO:
-
1. Support only AF_XDP native mode, not zero-copy mode.
2. Currently only three fields: vlan, hash, and flow_mark, and only receive
   side supports XDP metadata.
3. Control plane, how to enable and probe the structure, not upstream yet.

OVS rxdrop without HW hints:
---
Drop rate: 4.8Mpps

pmd thread numa_id 0 core_id 3:
  packets received: 196592006
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 196592006
  smc hits: 0
  megaflow hits: 0
  avg. subtable lookups per megaflow hit: 0.00
  miss with success upcall: 0
  miss with failed upcall: 0
  avg. packets per output batch: 0.00
  idle cycles: 56009063835 (41.43%)
  processing cycles: 79164971931 (58.57%)
  avg cycles per packet: 687.59 (135174035766/196592006)
  avg processing cycles per packet: 402.69 (79164971931/196592006)

pmd thread numa_id 0 core_id 3:
  Iterations:   339607649  (0.23 us/it)
  - Used TSC cycles: 188620512777  ( 99.9 % of total cycles)
  - idle iterations:330697002  ( 40.3 % of used cycles)
  - busy iterations:  8910647  ( 59.7 % of used cycles)
  Rx packets:   285140031  (3624 Kpps, 395 cycles/pkt)
  Datapath passes:  285140031  (1.00 passes/pkt)
  - EMC hits:   28513  (100.0 %)
  - SMC hits:   0  (  0.0 %)
  - Megaflow hits:  0  (  0.0 %, 0.00 subtbl lookups/hit)
  - Upcalls:0  (  0.0 %, 0.0 us/upcall)
  - Lost upcalls:   0  (  0.0 %)
  Tx packets:   0

Perf report:
  17.56%  pmd-c03/id:11  ovs-vswitchd[.] netdev_afxdp_rxq_recv
  14.39%  pmd-c03/id:11  ovs-vswitchd[.] dp_netdev_process_rxq_port
  14.17%  pmd-c03/id:11  ovs-vswitchd[.] pmd_thread_main
  10.86%  pmd-c03/id:11  [vdso]  [.] __vdso_clock_gettime
  10.19%  pmd-c03/id:11  ovs-vswitchd[.] pmd_perf_end_iteration
   7.71%  pmd-c03/id:11  ovs-vswitchd[.] time_timespec__
   5.64%  pmd-c03/id:11  ovs-vswitchd[.] time_usec
   3.88%  pmd-c03/id:11  ovs-vswitchd[.] netdev_get_class
   2.95%  pmd-c03/id:11  ovs-vswitchd[.] netdev_rxq_recv
   2.78%  pmd-c03/id:11  libbpf.so.0.2.0 [.] xsk_socket__fd
   2.74%  pmd-c03/id:11  ovs-vswitchd[.] pmd_perf_start_iteration
   2.11%  pmd-c03/id:11  libc-2.27.so[.] __clock_gettime
   1.32%  pmd-c03/id:11  ovs-vswitchd[.] xsk_socket__fd@plt

OVS rxdrop with HW hints:
-
rxdrop rate: 4.73Mpps

pmd thread numa_id 0 core_id 7:
  packets received: 13686880
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 13686880
  smc hits: 0
  megaflow hits: 0
  avg. subtable lookups per megaflow hit: 0.00
  miss with success upcall: 0
  miss with failed upcall: 0
  avg. packets per output batch: 0.00
  idle cycles: 3182105544 (46.02%)
  processing cycles: 3732023844 (53.98%)
  avg cycles per packet: 505.16 (6914129388/13686880)
  avg processing cycles per packet: 272.67 (3732023844/13686880)

pmd thread numa_id 0 core_id 7:

  Iterations:   392909539  (0.18 us/it)
  - Used TSC cycles: 167697342678  ( 99.9 % of total cycles)
  - idle iterations:382539861  ( 46.0 % of used cycles)
  - busy iterations: 10369678  ( 54.0 % of used cycles)
  Rx packets:   331829656  (4743 Kpps, 273 cycles/pkt)
  Datapath passes:  331829656  (1.00 passes/pkt)
  - EMC hits:   331829656  (100.0 %)
  - SMC hits:

Re: [ovs-dev] [PATCH ovn 00/11] ovn-northd-ddlog improvements

2021-03-04 Thread Ben Pfaff
On Thu, Mar 04, 2021 at 06:07:42PM +0530, Numan Siddique wrote:
> On Thu, Mar 4, 2021 at 9:40 AM Ben Pfaff  wrote:
> Thanks for this series.
> 
> Acked-by: Numan Siddique 
> 
> Please note that I'm still learning ddlog and haven't understood much
> of the northd ddlog code.
> 
> I ran the patches and found a few northd-ddlog tests to be failing
> when run with -j5. If I run individually they pass.
> May be the scale improvements in northd-ddlog could help in those tests.
> 
> I found one particular test failing 100% of the time.
> 
> Can you please take a look into that.

Thanks for all the comments.  I'm looking into this stuff now.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5] python: Send notifications after the transaction ends

2021-03-04 Thread Flavio Fernandes
Acked-by: Flavio Fernandes mailto:fla...@flaviof.com>>

> On Mar 4, 2021, at 11:18 AM, Terry Wilson  wrote:
> 
> The Python IDL notification mechanism was sending a notification
> for each processed update in a transaction as it was processed.
> This causes issues with multi-row changes that contain references
> to each other.
> 
> For example, if a Logical_Router_Port is created along with a
> Gateway_Chassis, and the LRP.gateway_chassis set to that GC, then
> when the notify() passes the CREATE event for the LRP, the GC will
> not yet have been processed, so __getattr__ when _uuid_to_row fails
> to find the GC, will return the default value for LRP.gateway_chassis
> which is [].
> 
> This patch has the process_update methods return the notifications
> that would be produced when a row changes, so they can be queued
> and sent after all rows have been processed.
> 
> Fixes: d7d417fcddf9 ("Allow subclasses of Idl to define a notification hook")
> Signed-off-by: Terry Wilson 
> ---
> python/ovs/db/idl.py | 39 ---
> tests/ovsdb-idl.at   | 22 ++
> tests/test-ovsdb.py  |  7 +--
> 3 files changed, 51 insertions(+), 17 deletions(-)
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] netlink: ignore IFLA_WIRELESS events

2021-03-04 Thread 0-day Robot
Bleep bloop.  Greetings Michal Kazior, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line is 80 characters long (recommended limit is 79)
#158 FILE: lib/rtnetlink.h:49:
   don't really indicate real netdev change that

Lines checked: 165, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4] netlink: ignore IFLA_WIRELESS events

2021-03-04 Thread Michal Kazior
From: Michal Kazior 

Some older wireless drivers - ones relying on the
old and long deprecated wireless extension ioctl
system - can generate quite a bit of IFLA_WIRELESS
events depending on their configuration and
runtime conditions. These are delivered as
RTNLGRP_LINK via RTM_NEWLINK messages.

These tend to be relatively easily identifiable
because they report the change mask being 0. This
isn't guaranteed but in practice it shouldn't be a
problem. None of the wireless events that I ever
observed actually carry any unique information
about netdev states that ovs-vswitchd is
interested in. Hence ignoring these shouldn't
cause any problems.

These events can be responsible for a significant
CPU churn as ovs-vswitchd attempts to do plenty of
work for each and every netlink message regardless
of what that message carries. On low-end devices
such as consumer-grade routers these can lead to a
lot of CPU cycles being wasted, adding up to heat
output and reducing performance.

It could be argued that wireless drivers in
question should be fixed, but that isn't exactly a
trivial thing to do. Patching ovs seems far more
viable while still making sense.

Signed-off-by: Michal Kazior 
---

Notes:
v4:
 - fixed comment-too-long checkpatch warnin [0day robot]

v3:
 - dont change rtnetlink_parse() semantics, instead
   extend rtnetlink_change struct and update its
   consumers [Ilya]
 - adjusted commit log to reflect different approach
   [Ilya]

v2:
 - fix bracing style [0day robot / checkpatch]

 lib/if-notifier.c  |  7 ++-
 lib/netdev-linux.c |  9 +
 lib/route-table.c  |  4 
 lib/rtnetlink.c| 18 ++
 lib/rtnetlink.h|  3 +++
 5 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/lib/if-notifier.c b/lib/if-notifier.c
index 9a64f9b15..7ba9cc316 100644
--- a/lib/if-notifier.c
+++ b/lib/if-notifier.c
@@ -26,9 +26,14 @@ struct if_notifier {
 };
 
 static void
-if_notifier_cb(const struct rtnetlink_change *change OVS_UNUSED, void *aux)
+if_notifier_cb(const struct rtnetlink_change *change, void *aux)
 {
 struct if_notifier *notifier;
+
+if (change->irrelevant) {
+return;
+}
+
 notifier = aux;
 notifier->cb(notifier->aux);
 }
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 6be23dbee..388288f71 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -663,6 +663,10 @@ netdev_linux_update_lag(struct rtnetlink_change *change)
 {
 struct linux_lag_member *lag;
 
+if (change->irrelevant) {
+return;
+}
+
 if (change->sub && netdev_linux_kind_is_lag(change->sub)) {
 lag = shash_find_data(_shash, change->ifname);
 
@@ -887,6 +891,10 @@ netdev_linux_update(struct netdev_linux *dev, int nsid,
 const struct rtnetlink_change *change)
 OVS_REQUIRES(dev->mutex)
 {
+if (change->irrelevant) {
+return;
+}
+
 if (netdev_linux_netnsid_is_eq(dev, nsid)) {
 netdev_linux_update__(dev, change);
 }
@@ -6344,6 +6352,7 @@ netdev_linux_update_via_netlink(struct netdev_linux 
*netdev)
 }
 
 if (rtnetlink_parse(reply, change)
+&& !change->irrelevant
 && change->nlmsg_type == RTM_NEWLINK) {
 bool changed = false;
 error = 0;
diff --git a/lib/route-table.c b/lib/route-table.c
index 6c82cdfdd..a4531df1e 100644
--- a/lib/route-table.c
+++ b/lib/route-table.c
@@ -342,6 +342,10 @@ static void
 name_table_change(const struct rtnetlink_change *change,
   void *aux OVS_UNUSED)
 {
+if (change->irrelevant) {
+return;
+}
+
 /* Changes to interface status can cause routing table changes that some
  * versions of the linux kernel do not advertise for some reason. */
 route_table_valid = false;
diff --git a/lib/rtnetlink.c b/lib/rtnetlink.c
index 125802925..ebb032aa7 100644
--- a/lib/rtnetlink.c
+++ b/lib/rtnetlink.c
@@ -99,6 +99,7 @@ rtnetlink_parse(struct ofpbuf *buf, struct rtnetlink_change 
*change)
 [IFLA_MTU]= { .type = NL_A_U32,.optional = true },
 [IFLA_ADDRESS] = { .type = NL_A_UNSPEC, .optional = true },
 [IFLA_LINKINFO] = { .type = NL_A_NESTED, .optional = true },
+[IFLA_WIRELESS] = { .type = NL_A_UNSPEC, .optional = true },
 };
 
 struct nlattr *attrs[ARRAY_SIZE(policy)];
@@ -111,6 +112,23 @@ rtnetlink_parse(struct ofpbuf *buf, struct 
rtnetlink_change *change)
 
 ifinfo = ofpbuf_at(buf, NLMSG_HDRLEN, sizeof *ifinfo);
 
+/* Wireless events can be spammy and cause a
+ * lot of unnecessary churn and CPU load in
+ * ovs-vswitchd. The best way to filter them out
+ * is to rely on the IFLA_WIRELESS and
+ * ifi_change. As per rtnetlink_ifinfo_prep() in
+ * the kernel, the ifi_change = 0. That combined
+ * with the fact wireless events never really
+ * change interface state (as 

[ovs-dev] [PATCH v5] python: Send notifications after the transaction ends

2021-03-04 Thread Terry Wilson
The Python IDL notification mechanism was sending a notification
for each processed update in a transaction as it was processed.
This causes issues with multi-row changes that contain references
to each other.

For example, if a Logical_Router_Port is created along with a
Gateway_Chassis, and the LRP.gateway_chassis set to that GC, then
when the notify() passes the CREATE event for the LRP, the GC will
not yet have been processed, so __getattr__ when _uuid_to_row fails
to find the GC, will return the default value for LRP.gateway_chassis
which is [].

This patch has the process_update methods return the notifications
that would be produced when a row changes, so they can be queued
and sent after all rows have been processed.

Fixes: d7d417fcddf9 ("Allow subclasses of Idl to define a notification hook")
Signed-off-by: Terry Wilson 
---
 python/ovs/db/idl.py | 39 ---
 tests/ovsdb-idl.at   | 22 ++
 tests/test-ovsdb.py  |  7 +--
 3 files changed, 51 insertions(+), 17 deletions(-)

diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py
index 5850ac7ab..4226d1cb2 100644
--- a/python/ovs/db/idl.py
+++ b/python/ovs/db/idl.py
@@ -12,6 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+import collections
 import functools
 import uuid
 
@@ -39,6 +40,10 @@ OVSDB_UPDATE2 = 1
 CLUSTERED = "clustered"
 
 
+Notice = collections.namedtuple('Notice', ('event', 'row', 'updates'))
+Notice.__new__.__defaults__ = (None,)  # default updates=None
+
+
 class Idl(object):
 """Open vSwitch Database Interface Definition Language (OVSDB IDL).
 
@@ -614,6 +619,7 @@ class Idl(object):
 raise error.Error(" is not an object",
   table_updates)
 
+notices = []
 for table_name, table_update in table_updates.items():
 table = tables.get(table_name)
 if not table:
@@ -639,7 +645,9 @@ class Idl(object):
   % (table_name, uuid_string))
 
 if version == OVSDB_UPDATE2:
-if self.__process_update2(table, uuid, row_update):
+changes = self.__process_update2(table, uuid, row_update)
+if changes:
+notices.append(changes)
 self.change_seqno += 1
 continue
 
@@ -652,17 +660,20 @@ class Idl(object):
 raise error.Error(' missing "old" and '
   '"new" members', row_update)
 
-if self.__process_update(table, uuid, old, new):
+changes = self.__process_update(table, uuid, old, new)
+if changes:
+notices.append(changes)
 self.change_seqno += 1
+for notice in notices:
+self.notify(*notice)
 
 def __process_update2(self, table, uuid, row_update):
+"""Returns Notice if a column changed, False otherwise."""
 row = table.rows.get(uuid)
-changed = False
 if "delete" in row_update:
 if row:
 del table.rows[uuid]
-self.notify(ROW_DELETE, row)
-changed = True
+return Notice(ROW_DELETE, row)
 else:
 # XXX rate-limit
 vlog.warn("cannot delete missing row %s from table"
@@ -681,29 +692,27 @@ class Idl(object):
 changed = self.__row_update(table, row, row_update)
 table.rows[uuid] = row
 if changed:
-self.notify(ROW_CREATE, row)
+return Notice(ROW_CREATE, row)
 elif "modify" in row_update:
 if not row:
 raise error.Error('Modify non-existing row')
 
 old_row = self.__apply_diff(table, row, row_update['modify'])
-self.notify(ROW_UPDATE, row, Row(self, table, uuid, old_row))
-changed = True
+return Notice(ROW_UPDATE, row, Row(self, table, uuid, old_row))
 else:
 raise error.Error(' unknown operation',
   row_update)
-return changed
+return False
 
 def __process_update(self, table, uuid, old, new):
-"""Returns True if a column changed, False otherwise."""
+"""Returns Notice if a column changed, False otherwise."""
 row = table.rows.get(uuid)
 changed = False
 if not new:
 # Delete row.
 if row:
 del table.rows[uuid]
-changed = True
-self.notify(ROW_DELETE, row)
+return Notice(ROW_DELETE, row)
 else:
 # XXX rate-limit
 vlog.warn("cannot delete missing row %s from table %s"
@@ -723,7 +732,7 @@ class Idl(object):
 if op == ROW_CREATE:
 table.rows[uuid] = row

[ovs-dev] [PATCH v4 1/1] python: Send notifications after the transaction ends

2021-03-04 Thread Terry Wilson
The Python IDL notification mechanism was sending a notification
for each processed update in a transaction as it was processed.
This causes issues with multi-row changes that contain references
to each other.

For example, if a Logical_Router_Port is created along with a
Gateway_Chassis, and the LRP.gateway_chassis set to that GC, then
when the notify() passes the CREATE event for the LRP, the GC will
not yet have been processed, so __getattr__ when _uuid_to_row fails
to find the GC, will return the default value for LRP.gateway_chassis
which is [].

This patch has the process_update methods return the notifications
that would be produced when a row changes, so they can be queued
and sent after all rows have been processed.

Fixes: d7d417fcddf9 ("Allow subclasses of Idl to define a notification hook")
---
 python/ovs/db/idl.py | 39 ---
 tests/ovsdb-idl.at   | 22 ++
 tests/test-ovsdb.py  |  7 +--
 3 files changed, 51 insertions(+), 17 deletions(-)

diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py
index 5850ac7ab..4226d1cb2 100644
--- a/python/ovs/db/idl.py
+++ b/python/ovs/db/idl.py
@@ -12,6 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+import collections
 import functools
 import uuid
 
@@ -39,6 +40,10 @@ OVSDB_UPDATE2 = 1
 CLUSTERED = "clustered"
 
 
+Notice = collections.namedtuple('Notice', ('event', 'row', 'updates'))
+Notice.__new__.__defaults__ = (None,)  # default updates=None
+
+
 class Idl(object):
 """Open vSwitch Database Interface Definition Language (OVSDB IDL).
 
@@ -614,6 +619,7 @@ class Idl(object):
 raise error.Error(" is not an object",
   table_updates)
 
+notices = []
 for table_name, table_update in table_updates.items():
 table = tables.get(table_name)
 if not table:
@@ -639,7 +645,9 @@ class Idl(object):
   % (table_name, uuid_string))
 
 if version == OVSDB_UPDATE2:
-if self.__process_update2(table, uuid, row_update):
+changes = self.__process_update2(table, uuid, row_update)
+if changes:
+notices.append(changes)
 self.change_seqno += 1
 continue
 
@@ -652,17 +660,20 @@ class Idl(object):
 raise error.Error(' missing "old" and '
   '"new" members', row_update)
 
-if self.__process_update(table, uuid, old, new):
+changes = self.__process_update(table, uuid, old, new)
+if changes:
+notices.append(changes)
 self.change_seqno += 1
+for notice in notices:
+self.notify(*notice)
 
 def __process_update2(self, table, uuid, row_update):
+"""Returns Notice if a column changed, False otherwise."""
 row = table.rows.get(uuid)
-changed = False
 if "delete" in row_update:
 if row:
 del table.rows[uuid]
-self.notify(ROW_DELETE, row)
-changed = True
+return Notice(ROW_DELETE, row)
 else:
 # XXX rate-limit
 vlog.warn("cannot delete missing row %s from table"
@@ -681,29 +692,27 @@ class Idl(object):
 changed = self.__row_update(table, row, row_update)
 table.rows[uuid] = row
 if changed:
-self.notify(ROW_CREATE, row)
+return Notice(ROW_CREATE, row)
 elif "modify" in row_update:
 if not row:
 raise error.Error('Modify non-existing row')
 
 old_row = self.__apply_diff(table, row, row_update['modify'])
-self.notify(ROW_UPDATE, row, Row(self, table, uuid, old_row))
-changed = True
+return Notice(ROW_UPDATE, row, Row(self, table, uuid, old_row))
 else:
 raise error.Error(' unknown operation',
   row_update)
-return changed
+return False
 
 def __process_update(self, table, uuid, old, new):
-"""Returns True if a column changed, False otherwise."""
+"""Returns Notice if a column changed, False otherwise."""
 row = table.rows.get(uuid)
 changed = False
 if not new:
 # Delete row.
 if row:
 del table.rows[uuid]
-changed = True
-self.notify(ROW_DELETE, row)
+return Notice(ROW_DELETE, row)
 else:
 # XXX rate-limit
 vlog.warn("cannot delete missing row %s from table %s"
@@ -723,7 +732,7 @@ class Idl(object):
 if op == ROW_CREATE:
 table.rows[uuid] = row
 if changed:
-   

[ovs-dev] [PATCH v4 0/1] Send notifications after the transaction ends

2021-03-04 Thread Terry Wilson
Ok, I think this is likely to be the final iteration of the patch as we
are down to the typos.

In Neutron, we process these notifications in a separate thread. So what
that means is that we were experiencing race conditions where sometimes
the notification's row would have a value set, and other times it
wouldn't. In fixing some other race conditions related to that event
processing thread, we tried to add a read-only copy of the Row object
and that actually turned the race condition into a consistent failure on
py27 due to the table_updates dict not maintaining insertion-order, but
consistently putting Logical_Router_Port before Gateway_Chassis. In py3
things could still fail as in the test case provided, e.g. if rows
referenced each other in the same transaction, or depending on the order
in which ovsdb-server adds the updates.

Terry Wilson (1):
  python: Send notifications after the transaction ends

 python/ovs/db/idl.py | 39 ---
 tests/ovsdb-idl.at   | 22 ++
 tests/test-ovsdb.py  |  7 +--
 3 files changed, 51 insertions(+), 17 deletions(-)

-- 
2.17.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3] netlink: ignore IFLA_WIRELESS events

2021-03-04 Thread Michal Kazior
From: Michal Kazior 

Some older wireless drivers - ones relying on the
old and long deprecated wireless extension ioctl
system - can generate quite a bit of IFLA_WIRELESS
events depending on their configuration and
runtime conditions. These are delivered as
RTNLGRP_LINK via RTM_NEWLINK messages.

These tend to be relatively easily identifiable
because they report the change mask being 0. This
isn't guaranteed but in practice it shouldn't be a
problem. None of the wireless events that I ever
observed actually carry any unique information
about netdev states that ovs-vswitchd is
interested in. Hence ignoring these shouldn't
cause any problems.

These events can be responsible for a significant
CPU churn as ovs-vswitchd attempts to do plenty of
work for each and every netlink message regardless
of what that message carries. On low-end devices
such as consumer-grade routers these can lead to a
lot of CPU cycles being wasted, adding up to heat
output and reducing performance.

It could be argued that wireless drivers in
question should be fixed, but that isn't exactly a
trivial thing to do. Patching ovs seems far more
viable while still making sense.

Signed-off-by: Michal Kazior 
---

Notes:
v3:
 - dont change rtnetlink_parse() semantics, instead
   extend rtnetlink_change struct and update its
   consumers [Ilya]
 - adjusted commit log to reflect different approach
   [Ilya]

v2:
 - fix bracing style [0day robot / checkpatch]

 lib/if-notifier.c  |  7 ++-
 lib/netdev-linux.c |  9 +
 lib/route-table.c  |  4 
 lib/rtnetlink.c| 18 ++
 lib/rtnetlink.h|  3 +++
 5 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/lib/if-notifier.c b/lib/if-notifier.c
index 9a64f9b15..7ba9cc316 100644
--- a/lib/if-notifier.c
+++ b/lib/if-notifier.c
@@ -26,9 +26,14 @@ struct if_notifier {
 };
 
 static void
-if_notifier_cb(const struct rtnetlink_change *change OVS_UNUSED, void *aux)
+if_notifier_cb(const struct rtnetlink_change *change, void *aux)
 {
 struct if_notifier *notifier;
+
+if (change->irrelevant) {
+return;
+}
+
 notifier = aux;
 notifier->cb(notifier->aux);
 }
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 6be23dbee..388288f71 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -663,6 +663,10 @@ netdev_linux_update_lag(struct rtnetlink_change *change)
 {
 struct linux_lag_member *lag;
 
+if (change->irrelevant) {
+return;
+}
+
 if (change->sub && netdev_linux_kind_is_lag(change->sub)) {
 lag = shash_find_data(_shash, change->ifname);
 
@@ -887,6 +891,10 @@ netdev_linux_update(struct netdev_linux *dev, int nsid,
 const struct rtnetlink_change *change)
 OVS_REQUIRES(dev->mutex)
 {
+if (change->irrelevant) {
+return;
+}
+
 if (netdev_linux_netnsid_is_eq(dev, nsid)) {
 netdev_linux_update__(dev, change);
 }
@@ -6344,6 +6352,7 @@ netdev_linux_update_via_netlink(struct netdev_linux 
*netdev)
 }
 
 if (rtnetlink_parse(reply, change)
+&& !change->irrelevant
 && change->nlmsg_type == RTM_NEWLINK) {
 bool changed = false;
 error = 0;
diff --git a/lib/route-table.c b/lib/route-table.c
index 6c82cdfdd..a4531df1e 100644
--- a/lib/route-table.c
+++ b/lib/route-table.c
@@ -342,6 +342,10 @@ static void
 name_table_change(const struct rtnetlink_change *change,
   void *aux OVS_UNUSED)
 {
+if (change->irrelevant) {
+return;
+}
+
 /* Changes to interface status can cause routing table changes that some
  * versions of the linux kernel do not advertise for some reason. */
 route_table_valid = false;
diff --git a/lib/rtnetlink.c b/lib/rtnetlink.c
index 125802925..ebb032aa7 100644
--- a/lib/rtnetlink.c
+++ b/lib/rtnetlink.c
@@ -99,6 +99,7 @@ rtnetlink_parse(struct ofpbuf *buf, struct rtnetlink_change 
*change)
 [IFLA_MTU]= { .type = NL_A_U32,.optional = true },
 [IFLA_ADDRESS] = { .type = NL_A_UNSPEC, .optional = true },
 [IFLA_LINKINFO] = { .type = NL_A_NESTED, .optional = true },
+[IFLA_WIRELESS] = { .type = NL_A_UNSPEC, .optional = true },
 };
 
 struct nlattr *attrs[ARRAY_SIZE(policy)];
@@ -111,6 +112,23 @@ rtnetlink_parse(struct ofpbuf *buf, struct 
rtnetlink_change *change)
 
 ifinfo = ofpbuf_at(buf, NLMSG_HDRLEN, sizeof *ifinfo);
 
+/* Wireless events can be spammy and cause a
+ * lot of unnecessary churn and CPU load in
+ * ovs-vswitchd. The best way to filter them out
+ * is to rely on the IFLA_WIRELESS and
+ * ifi_change. As per rtnetlink_ifinfo_prep() in
+ * the kernel, the ifi_change = 0. That combined
+ * with the fact wireless events never really
+ * change interface state (as far as core
+ * networking is concerned) they can be ignored
+ 

[ovs-dev] [PATCH v2 3/3] netdev-offload-tc: Add support for ct_state flags inv and rpl

2021-03-04 Thread Paul Blakey
Signed-off-by: Paul Blakey 
Acked-by: Roi Dayan 
---
 lib/netdev-offload-tc.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 0775883..8553cb2 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -751,6 +751,20 @@ parse_tc_flower_to_match(struct tc_flower *flower,
 ct_statem |= OVS_CS_F_TRACKED;
 }
 
+if (mask->ct_state & TCA_FLOWER_KEY_CT_FLAGS_REPLY) {
+if (key->ct_state & TCA_FLOWER_KEY_CT_FLAGS_REPLY) {
+ct_statev |= OVS_CS_F_REPLY_DIR;
+}
+ct_statem |= OVS_CS_F_REPLY_DIR;
+}
+
+if (mask->ct_state & TCA_FLOWER_KEY_CT_FLAGS_INVALID) {
+if (key->ct_state & TCA_FLOWER_KEY_CT_FLAGS_INVALID) {
+ct_statev |= OVS_CS_F_INVALID;
+}
+ct_statem |= OVS_CS_F_INVALID;
+}
+
 match_set_ct_state_masked(match, ct_statev, ct_statem);
 }
 
@@ -1492,6 +1506,22 @@ parse_match_ct_state_to_flower(struct tc_flower *flower, 
struct match *match)
 mask->ct_state &= ~OVS_CS_F_TRACKED;
 }
 
+if (mask->ct_state & OVS_CS_F_REPLY_DIR) {
+if (key->ct_state & OVS_CS_F_REPLY_DIR) {
+flower->key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_REPLY;
+}
+flower->mask.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_REPLY;
+mask->ct_state &= ~OVS_CS_F_REPLY_DIR;
+}
+
+if (mask->ct_state & OVS_CS_F_INVALID) {
+if (key->ct_state & OVS_CS_F_INVALID) {
+flower->key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_INVALID;
+}
+flower->mask.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_INVALID;
+mask->ct_state &= ~OVS_CS_F_INVALID;
+}
+
 if (flower->key.ct_state & TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED) {
 flower->key.ct_state &= ~(TCA_FLOWER_KEY_CT_FLAGS_NEW);
 flower->mask.ct_state &= ~(TCA_FLOWER_KEY_CT_FLAGS_NEW);
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 2/3] netdev-offload-tc: Probe for support for any of the ct_state flags

2021-03-04 Thread Paul Blakey
Upstream kernel now rejects unsupported ct_state flags.
Earlier kernels, ignored it but still echoed back the requested ct_state,
if ct_state was supported. ct_state initial support had trk, new, est,
and rel flags.

If kernel echos back ct_state, assume support for trk, new, est, and
rel. If kernel rejects a specific unsupported flag, continue and
use reject mechanisim to probe for flags rep and inv.

Disallow inserting rules with unnsupported ct_state flags.

Signed-off-by: Paul Blakey 
Acked-by: Roi Dayan 
---
 lib/netdev-offload-tc.c | 205 
 1 file changed, 157 insertions(+), 48 deletions(-)

diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 04b5e21..0775883 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -48,6 +48,7 @@ static struct hmap ufid_to_tc = HMAP_INITIALIZER(_to_tc);
 static struct hmap tc_to_ufid = HMAP_INITIALIZER(_to_ufid);
 static bool multi_mask_per_prio = false;
 static bool block_support = false;
+static uint16_t ct_state_support;
 
 struct netlink_field {
 int offset;
@@ -1456,6 +1457,66 @@ flower_match_to_tun_opt(struct tc_flower *flower, const 
struct flow_tnl *tnl,
 flower->mask.tunnel.metadata.present.len = tnl->metadata.present.len;
 }
 
+static void
+parse_match_ct_state_to_flower(struct tc_flower *flower, struct match *match)
+{
+const struct flow *key = >flow;
+struct flow *mask = >wc.masks;
+
+if (!ct_state_support) {
+return;
+}
+
+if ((ct_state_support & mask->ct_state) == mask->ct_state) {
+if (mask->ct_state & OVS_CS_F_NEW) {
+if (key->ct_state & OVS_CS_F_NEW) {
+flower->key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_NEW;
+}
+flower->mask.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_NEW;
+mask->ct_state &= ~OVS_CS_F_NEW;
+}
+
+if (mask->ct_state & OVS_CS_F_ESTABLISHED) {
+if (key->ct_state & OVS_CS_F_ESTABLISHED) {
+flower->key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED;
+}
+flower->mask.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED;
+mask->ct_state &= ~OVS_CS_F_ESTABLISHED;
+}
+
+if (mask->ct_state & OVS_CS_F_TRACKED) {
+if (key->ct_state & OVS_CS_F_TRACKED) {
+flower->key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_TRACKED;
+}
+flower->mask.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_TRACKED;
+mask->ct_state &= ~OVS_CS_F_TRACKED;
+}
+
+if (flower->key.ct_state & TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED) {
+flower->key.ct_state &= ~(TCA_FLOWER_KEY_CT_FLAGS_NEW);
+flower->mask.ct_state &= ~(TCA_FLOWER_KEY_CT_FLAGS_NEW);
+}
+}
+
+if (mask->ct_zone) {
+flower->key.ct_zone = key->ct_zone;
+flower->mask.ct_zone = mask->ct_zone;
+mask->ct_zone = 0;
+}
+
+if (mask->ct_mark) {
+flower->key.ct_mark = key->ct_mark;
+flower->mask.ct_mark = mask->ct_mark;
+mask->ct_mark = 0;
+}
+
+if (!ovs_u128_is_zero(mask->ct_label)) {
+flower->key.ct_label = key->ct_label;
+flower->mask.ct_label = mask->ct_label;
+mask->ct_label = OVS_U128_ZERO;
+}
+}
+
 static int
 netdev_tc_flow_put(struct netdev *netdev, struct match *match,
struct nlattr *actions, size_t actions_len,
@@ -1708,54 +1769,7 @@ netdev_tc_flow_put(struct netdev *netdev, struct match 
*match,
 }
 }
 
-if (mask->ct_state) {
-if (mask->ct_state & OVS_CS_F_NEW) {
-if (key->ct_state & OVS_CS_F_NEW) {
-flower.key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_NEW;
-}
-flower.mask.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_NEW;
-mask->ct_state &= ~OVS_CS_F_NEW;
-}
-
-if (mask->ct_state & OVS_CS_F_ESTABLISHED) {
-if (key->ct_state & OVS_CS_F_ESTABLISHED) {
-flower.key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED;
-}
-flower.mask.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED;
-mask->ct_state &= ~OVS_CS_F_ESTABLISHED;
-}
-
-if (mask->ct_state & OVS_CS_F_TRACKED) {
-if (key->ct_state & OVS_CS_F_TRACKED) {
-flower.key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_TRACKED;
-}
-flower.mask.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_TRACKED;
-mask->ct_state &= ~OVS_CS_F_TRACKED;
-}
-
-if (flower.key.ct_state & TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED) {
-flower.key.ct_state &= ~(TCA_FLOWER_KEY_CT_FLAGS_NEW);
-flower.mask.ct_state &= ~(TCA_FLOWER_KEY_CT_FLAGS_NEW);
-}
-}
-
-if (mask->ct_zone) {
-flower.key.ct_zone = key->ct_zone;
-flower.mask.ct_zone = mask->ct_zone;
-mask->ct_zone = 0;
-}
-
-if (mask->ct_mark) {
-flower.key.ct_mark = key->ct_mark;
-

[ovs-dev] [PATCH v2 0/3] Add offload support for ct_state rpl and inv flags

2021-03-04 Thread Paul Blakey
Add offload support for ct_state rpl and inv flags.

For example:
ovs-ofctl del-flows br-ovs
ovs-ofctl add-flow br-ovs arp,actions=normal
ovs-ofctl add-flow br-ovs "table=0, ip,ct_state=-trk actions=ct(table=1,zone=5)"
ovs-ofctl add-flow br-ovs "table=1, ip,ct_state=+trk+new actions=ct(zone=5, 
commit),normal"
ovs-ofctl add-flow br-ovs "table=1, ip,ct_zone=5,ct_state=+trk+est+rpl 
actions=normal"
ovs-ofctl add-flow br-ovs "table=1, ip,ct_zone=5,ct_state=+trk+est-rpl 
actions=normal"

Also probe for actual kernel ct_state per flag support.

Changelog:
v1->v2:
In probe patch, if no support for ct_state flags, dont support 
mark/label/zone...

Paul Blakey (3):
  compat: Add ct_state flags definitions
  netdev-offload-tc: Probe for support for any of the ct_state flags
  netdev-offload-tc: Add support for ct_state flags inv and rpl

 acinclude.m4|   6 +-
 include/linux/pkt_cls.h |   5 +-
 lib/netdev-offload-tc.c | 235 ++--
 3 files changed, 194 insertions(+), 52 deletions(-)

-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 1/3] compat: Add ct_state flags definitions

2021-03-04 Thread Paul Blakey
Add TCA_FLOWER_KEY_CT_FLAGS_REPLY, and TCA_FLOWER_KEY_CT_FLAGS_INVALID.

Signed-off-by: Paul Blakey 
Acked-by: Roi Dayan 
---
 acinclude.m4| 6 +++---
 include/linux/pkt_cls.h | 5 -
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index 435685c..15a54d6 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -209,10 +209,10 @@ dnl Configure Linux tc compat.
 AC_DEFUN([OVS_CHECK_LINUX_TC], [
   AC_COMPILE_IFELSE([
 AC_LANG_PROGRAM([#include ], [
-int x = TCA_ACT_FLAGS;
+int x = TCA_FLOWER_KEY_CT_FLAGS_REPLY;
 ])],
-[AC_DEFINE([HAVE_TCA_ACT_FLAGS], [1],
-   [Define to 1 if TCA_ACT_FLAGS is available.])])
+[AC_DEFINE([HAVE_TCA_FLOWER_KEY_CT_FLAGS_REPLY], [1],
+   [Define to 1 if TCA_FLOWER_KEY_CT_FLAGS_REPLY is available.])])
 
   AC_CHECK_MEMBERS([struct tcf_t.firstuse], [], [], [#include 
])
 
diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index b0a5ce8..bc51a57 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -1,7 +1,7 @@
 #ifndef __LINUX_PKT_CLS_WRAPPER_H
 #define __LINUX_PKT_CLS_WRAPPER_H 1
 
-#if defined(__KERNEL__) || defined(HAVE_TCA_ACT_FLAGS)
+#if defined(__KERNEL__) || defined(HAVE_TCA_FLOWER_KEY_CT_FLAGS_REPLY)
 #include_next 
 #else
 
@@ -255,6 +255,9 @@ enum {
TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED = 1 << 1, /* Part of an existing 
connection. */
TCA_FLOWER_KEY_CT_FLAGS_RELATED = 1 << 2, /* Related to an established 
connection. */
TCA_FLOWER_KEY_CT_FLAGS_TRACKED = 1 << 3, /* Conntrack has occurred. */
+   TCA_FLOWER_KEY_CT_FLAGS_INVALID = 1 << 4, /* Conntrack is invalid. */
+   TCA_FLOWER_KEY_CT_FLAGS_REPLY = 1 << 5, /* Packet is in the reply 
direction. */
+   __TCA_FLOWER_KEY_CT_FLAGS_MAX,
 };
 
 enum {
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 00/11] ovn-northd-ddlog improvements

2021-03-04 Thread Numan Siddique
On Thu, Mar 4, 2021 at 9:40 AM Ben Pfaff  wrote:
>
> The first commit here is an improvement to the tests.  It is not
> ddlog specific:
>   tests: Improve synchronization and debuggability.
>
> The next commit is an improvement to ovs-sandbox for
> ovn-northd-ddlog:
>   ovs-sandbox: Make it possible to disable recording ddlog input.
>
> The next several commits improve the ddlog code from my viewpoint
> as someone who likes to read code, without changing its behavior:
>   ovn-northd-ddlog: Improve type safety for datapath stages.
>   ovn-northd-ddlog: Use object form of is_some(), drop is_none().
>   ovn-northd-ddlog: Make map_get_*() more object-like.
>   ovn-northd-ddlog: Add general-purpose bitwise library.
>   ovn-northd-ddlog: Define in_addr, in6_addr, eth_addr in ddlog code.
>
> These commits are performance improvements.  The one in the middle
> (come to think of it, it should not be in the middle) modifies
> ovn-northd-ddlog.c, the other two modify the ddlog code:
>   ovn-northd-ddlog: Avoid N*M crossproduct joining switches with
> routers.
>   ovn-northd-ddlog: Apply multiple database updates in single ddlog txn.
>   ovn-northd-ddlog: Rephrase RouterStaticRoute rule.
>
> This makes it easier to debug performance:
>   ovn-northd-ddlog: Add profiling support.
>
>  northd/automake.mk|4 +-
>  northd/bitwise.dl |  272 ++
>  northd/bitwise.rs |  133 +
>  northd/helpers.dl |   22 +-
>  northd/ipam.dl|   66 ++-
>  northd/lrouter.dl |   55 +-
>  northd/lswitch.dl |   26 +-
>  northd/multicast.dl   |   52 +-
>  northd/ovn-northd-ddlog.c |  155 --
>  northd/ovn-northd.8.xml   |   36 +-
>  northd/ovn.dl |  286 +--
>  northd/ovn.rs |  295 ---
>  northd/ovn_northd.dl  | 1018 +
>  tests/ovn.at  |   23 +-
>  tutorial/ovs-sandbox  |7 +-
>  15 files changed, 1373 insertions(+), 1077 deletions(-)
>  create mode 100644 northd/bitwise.dl
>  create mode 100644 northd/bitwise.rs


Thanks for this series.

Acked-by: Numan Siddique 

Please note that I'm still learning ddlog and haven't understood much
of the northd ddlog code.

I ran the patches and found a few northd-ddlog tests to be failing
when run with -j5. If I run individually they pass.
May be the scale improvements in northd-ddlog could help in those tests.

I found one particular test failing 100% of the time.

Can you please take a look into that.


Checking for 1 rows in sb Chassis_Private with name=hv3 nb_cfg=1... found 1

-- 3. *_cfg(*_cfg_timestamp): nb=2(+1117) sb=2(+1107) hv=1(+0)
test 2 = 2
../../tests/ovn-macros.at:327: "$@"
test 1614860305430 -gt 1614860304313
../../tests/ovn-macros.at:327: "$@"
test 1614860305487 -gt 1614860304380
../../tests/ovn-macros.at:327: "$@"
test 1614860304343 -gt 0
../../tests/ovn-macros.at:327: "$@"
ovn-controller -vconsole:off --detach --no-chdir --pidfile --log-file
Waiting until 1 rows in sb Chassis_Private with name=hv3 nb_cfg=2...
ovn-macros.at:378: waiting until test $count = $(count_rows $db:$table
$a $b $c $d $e)...
ovn-macros.at:378: wait succeeded immediately

-- 4. *_cfg(*_cfg_timestamp): nb=2(+0) sb=2(+0) hv=1(-1)
hv3_ts=1614860305531
test 1614860305531 = 1614860304342
../../tests/ovn-macros.at:327: "$@"
../../tests/ovn-macros.at:327: exit code was 1, expected 0
292. ovn.at:23123: 292. ovn -- nb_cfg timestamp -- ovn-northd-ddlog
(ovn.at:23123): FAILED (1614860305531)

---

Another issue I want to report (which I see happening with the present master).
When the sources are configured with "CFLAGS=-g -fsanitize=address",
many tests fail with address sanitizer
complaining about memory leaks.  I have copied below some of the
memory logs. Please let me know
if you need the complete log files.


-
=
40. ovn.at:1903: 40. ovn -- 3 HVs, 1 LS, 3 lports/HV --
ovn-northd-ddlog (ovn.at:1903): FAILED (ovs-macros.at:228)

==2426681==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1824 byte(s) in 76 object(s) allocated from:
#0 0x14fd5df453cf in __interceptor_malloc (/lib64/libasan.so.6+0xab3cf)
#1 0x4141ff in ddlog_delta_get_table
(/home/nusiddiq/workspace_cpp/ovn-org/ovn-for-reviews/ovn/_gcc/northd/ovn-northd-ddlog+0x4141ff)
#2 0x410eac in main ../northd/ovn-northd-ddlog.c:1233
#3 0x14fd5d6921e1 in __libc_start_main (/lib64/libc.so.6+0x281e1)

Direct leak of 720 byte(s) in 30 object(s) allocated from:
#0 0x14fd5df453cf in __interceptor_malloc (/lib64/libasan.so.6+0xab3cf)
#1 0x412a75 in ddlog_transaction_commit_dump_changes
(/home/nusiddiq/workspace_cpp/ovn-org/ovn-for-reviews/ovn/_gcc/northd/ovn-northd-ddlog+0x412a75)
#2 0x40cf24 in northd_parse_updates ../northd/ovn-northd-ddlog.c:448
#3 0x40d6ac in northd_run ../northd/ovn-northd-ddlog.c:548
#4 0x410e93 in main ../northd/ovn-northd-ddlog.c:1232
#5 

Re: [ovs-dev] [PATCH 2/3] netdev-offload-tc: Probe for support for any of the ct_state flags

2021-03-04 Thread Paul Blakey




On Wed, 3 Mar 2021, Marcelo Ricardo Leitner wrote:

> Hi,
> 
> On Wed, Mar 03, 2021 at 02:15:35PM +0200, Paul Blakey wrote:
> > Upstream kernel now rejects unsupported ct_state flags.
> > Earlier kernels, ignored it but still echoed back the requested ct_state,
> > if ct_state was supported. ct_state initial support had trk, new, est,
> > and rel flags.
> > 
> > If kernel echos back ct_state, assume support for trk, new, est, and
> > rel. If kernel rejects a specific unsupported flag, continue and
> > use reject mechanisim to probe for flags rep and inv.
> > 
> > Disallow inserting rules with unnsupported ct_state flags.
> > 
> > Signed-off-by: Paul Blakey 
> > Acked-by: Roi Dayan 
> > ---
> >  lib/netdev-offload-tc.c | 96 
> > +
> >  1 file changed, 96 insertions(+)
> > 
> > diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
> > index 04b5e21..8940409 100644
> > --- a/lib/netdev-offload-tc.c
> > +++ b/lib/netdev-offload-tc.c
> > @@ -48,6 +48,7 @@ static struct hmap ufid_to_tc = 
> > HMAP_INITIALIZER(_to_tc);
> >  static struct hmap tc_to_ufid = HMAP_INITIALIZER(_to_ufid);
> >  static bool multi_mask_per_prio = false;
> >  static bool block_support = false;
> > +static uint16_t ct_state_support;
> >  
> >  struct netlink_field {
> >  int offset;
> > @@ -1709,6 +1710,10 @@ netdev_tc_flow_put(struct netdev *netdev, struct 
> > match *match,
> >  }
> >  
> >  if (mask->ct_state) {
> > +if ((ct_state_support & mask->ct_state) != mask->ct_state) {
> > +return -EOPNOTSUPP;
> > +}
> > +
> >  if (mask->ct_state & OVS_CS_F_NEW) {
> >  if (key->ct_state & OVS_CS_F_NEW) {
> >  flower.key.ct_state |= TCA_FLOWER_KEY_CT_FLAGS_NEW;
> 
> Considering some ct_state support is a must for CT with TC, it can use
> a similar check on other fields as well, like (c'ed) the below, to
> not try to offload flows with anything related to CT and not just
> ct_state. Thoughts?

Yes ill add these to v2.

> 
> @@ -1739,13 +1739,13 @@ netdev_tc_flow_put(struct netdev *netdev, struct 
> match *match,
>  }
>  }
> 
> -if (mask->ct_zone) {
> +if (mask->ct_zone && ct_state_support) {
> 
> (either this way or return right away as you did above)
> 
>  flower.key.ct_zone = key->ct_zone;
>  flower.mask.ct_zone = mask->ct_zone;
>  mask->ct_zone = 0;
>  }
> 
> -if (mask->ct_mark) {
> +if (mask->ct_mark && ct_state_support) {
>  flower.key.ct_mark = key->ct_mark;
>  flower.mask.ct_mark = mask->ct_mark;
>  mask->ct_mark = 0;
> @@ -1837,6 +1837,10 @@ netdev_tc_flow_put(struct netdev *netdev, struct match 
> *match,
>  const struct nlattr *ct = nl_attr_get(nla);
>  const size_t ct_len = nl_attr_get_size(nla);
> 
> +   if (!ct_state_support) {
> +   return -EOPNOTSUPP;
> +   }
> +
>  err = parse_put_flow_ct_action(, action, ct, ct_len);
>  if (err) {
>  return err;
> 
> 
> > @@ -2029,6 +2034,96 @@ out:
> >  tc_add_del_qdisc(ifindex, false, block_id, TC_INGRESS);
> >  }
> >  
> > +
> > +static int
> > +probe_insert_ct_state_rule(int ifindex, uint16_t ct_state, struct tcf_id 
> > *id)
> > +{
> > +int prio = TC_RESERVED_PRIORITY_MAX + 1;
> > +struct tc_flower flower;
> > +
> > +memset(, 0, sizeof flower);
> > +flower.key.ct_state = ct_state;
> > +flower.mask.ct_state = ct_state;
> > +flower.tc_policy = TC_POLICY_SKIP_HW;
> > +flower.key.eth_type = htons(ETH_P_IP);
> > +flower.mask.eth_type = OVS_BE16_MAX;
> > +
> > +*id = tc_make_tcf_id(ifindex, 0, prio, TC_INGRESS);
> > +return tc_replace_flower(id, );
> > +}
> > +
> > +static void
> > +probe_ct_state_support(int ifindex)
> > +{
> > +struct tc_flower flower;
> > +uint16_t ct_state;
> > +struct tcf_id id;
> > +int error;
> > +
> > +error = tc_add_del_qdisc(ifindex, true, 0, TC_INGRESS);
> > +if (error) {
> > +return;
> > +}
> > +
> > +/* Test for base ct_state match support */
> > +ct_state = TCA_FLOWER_KEY_CT_FLAGS_NEW | 
> > TCA_FLOWER_KEY_CT_FLAGS_TRACKED;
> > +error = probe_insert_ct_state_rule(ifindex, ct_state, );
> > +if (error) {
> > +goto out;
> > +}
> > +
> > +error = tc_get_flower(, );
> > +if (error || flower.mask.ct_state != ct_state) {
> > +goto out_del;
> > +}
> > +
> > +tc_del_filter();
> > +ct_state_support = TCA_FLOWER_KEY_CT_FLAGS_NEW |
> > +   TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED |
> > +   TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
> > +   TCA_FLOWER_KEY_CT_FLAGS_RELATED;
> > +
> > +/* Test for reject, ct_state >= MAX */
> > +ct_state = ~0;
> > +error = probe_insert_ct_state_rule(ifindex, ct_state, );
> > +if (!error) {
> > +/* No reject, can't continue probing other 

Re: [ovs-dev] [PATCH v3 2/2] dpctl: dpif: allow viewing and configuring dp cache sizes

2021-03-04 Thread Eelco Chaudron



On 3 Mar 2021, at 21:44, Flavio Leitner wrote:

> On Wed, Mar 03, 2021 at 02:53:22PM +0100, Eelco Chaudron wrote:
>> This patch adds a general way of viewing/configuring datapath
>> cache sizes. With an implementation for the netlink interface.
>>
>> The ovs-dpctl/ovs-appctl show commands will display the
>> current cache sizes configured:
>>
>> ovs-dpctl show
>> system@ovs-system:
>>   lookups: hit:25 missed:63 lost:0
>>   flows: 0
>>   masks: hit:282 total:0 hit/pkt:3.20
>>   cache: hit:4 hit rate:4.5455%
>>   caches:
>> masks-cache: size: 256
>>   port 0: ovs-system (internal)
>>   port 1: br-int (internal)
>>   port 2: genev_sys_6081 (geneve: packet_type=ptap)
>>   port 3: br-ex (internal)
>>   port 4: eth2
>>   port 5: sw0p1 (internal)
>>   port 6: sw0p3 (internal)
>>
>> A specific cache can be configured as follows:
>>
>> ovs-appctl dpctl/cache-set-size DP CACHE SIZE
>> ovs-dpctl cache-set-size DP CACHE SIZE
>>
>> For example to disable the cache do:
>>
>> $ ovs-dpctl cache-set-size system@ovs-system masks-cache 0
>> Setting cache size successful, new size 0.
>>
>> Signed-off-by: Eelco Chaudron 
>> ---
>
> Same nit in the commit with regards to precision as before.
>
> The API doesn't say anything about cache 'name' pointer ownership
> most probably because it is hardcoded to a single name. I think
> it's fine for now but in the future we might need to change that
> using xstrdup() and let the caller calls free().
>
> Thanks for including the test unit. Works here.
>
> Acked-by: Flavio Leitner 


Thanks Flavio! Ilya do you need a v4 for the commit message change?

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 1/2] dpctl: dpif: add kernel datapath cache hit output

2021-03-04 Thread Eelco Chaudron



On 3 Mar 2021, at 21:44, Flavio Leitner wrote:


On Wed, Mar 03, 2021 at 02:52:33PM +0100, Eelco Chaudron wrote:

This patch adds cache usage statistics to the output:

$ ovs-dpctl show
system@ovs-system:
  lookups: hit:24 missed:71 lost:0
  flows: 0
  masks: hit:334 total:0 hit/pkt:3.52
  cache: hit:4 hit rate:4.2105%


nit: maybe the maintainer can fix that to be 2 digits as the code 
below.



  port 0: ovs-system (internal)
  port 1: genev_sys_6081 (geneve: packet_type=ptap)
  port 2: br-int (internal)
  port 3: br-ex (internal)
  port 4: eth2
  port 5: sw1p1 (internal)
  port 6: sw0p4 (internal)

Signed-off-by: Eelco Chaudron 
---
v2: - Changed precision to 2 digits
- Handle missing kernel feature at netlink level
v3: - Rebase on the latest master branch

 datapath/linux/compat/include/linux/openvswitch.h |2 +-
 lib/dpctl.c   |9 +
 lib/dpif-netdev.c |1 +
 lib/dpif-netlink.c|9 +
 lib/dpif.h|2 ++
 5 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h

index 875de2025..690d78871 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -122,8 +122,8 @@ struct ovs_dp_megaflow_stats {
__u64 n_mask_hit;/* Number of masks used for flow lookups. */
__u32 n_masks;   /* Number of masks for the datapath. */
__u32 pad0;  /* Pad for future expension. */
+	__u64 n_cache_hit;   /* Number of cache matches for flow 
lookups. */


another nit: the datapath uses tab + spaces.

Otherwise:
Acked-by: Flavio Leitner 


Thanks Flavio! Ilya do you need a v4 for the nits?


__u64 pad1;  /* Pad for future expension. */
-   __u64 pad2;  /* Pad for future expension. */
 };

 struct ovs_vport_stats {
diff --git a/lib/dpctl.c b/lib/dpctl.c
index ef8ae7402..acc677974 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -611,6 +611,15 @@ show_dpif(struct dpif *dpif, struct dpctl_params 
*dpctl_p)
 dpctl_print(dpctl_p, "  masks: hit:%"PRIu64" 
total:%"PRIu32

  " hit/pkt:%.2f\n",
 stats.n_mask_hit, stats.n_masks, avg);
+
+if (stats.n_cache_hit != UINT64_MAX) {
+double avg_hits = n_pkts ?
+(double) stats.n_cache_hit / n_pkts * 100 : 0.0;
+
+dpctl_print(dpctl_p,
+"  cache: hit:%"PRIu64" hit 
rate:%.2f%%\n",

+stats.n_cache_hit, avg_hits);
+}
 }
 }

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 816945375..c3d3aac36 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2025,6 +2025,7 @@ dpif_netdev_get_stats(const struct dpif *dpif, 
struct dpif_dp_stats *stats)

 }
 stats->n_masks = UINT32_MAX;
 stats->n_mask_hit = UINT64_MAX;
+stats->n_cache_hit = UINT64_MAX;

 return 0;
 }
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index ceb56c685..392b894c5 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -672,9 +672,18 @@ dpif_netlink_get_stats(const struct dpif *dpif_, 
struct dpif_dp_stats *stats)

 stats->n_masks = dp.megaflow_stats->n_masks;
 stats->n_mask_hit = get_32aligned_u64(
 _stats->n_mask_hit);
+stats->n_cache_hit = get_32aligned_u64(
+_stats->n_cache_hit);
+
+if (!stats->n_cache_hit) {
+/* Old kernels don't use this field and always
+ * report zero instead. Disable this stat. */
+stats->n_cache_hit = UINT64_MAX;
+}
 } else {
 stats->n_masks = UINT32_MAX;
 stats->n_mask_hit = UINT64_MAX;
+stats->n_cache_hit = UINT64_MAX;
 }
 ofpbuf_delete(buf);
 }
diff --git a/lib/dpif.h b/lib/dpif.h
index ecda896c7..600e4522c 100644
--- a/lib/dpif.h
+++ b/lib/dpif.h
@@ -429,6 +429,8 @@ struct dpif_dp_stats {
 uint64_t n_missed;  /* Number of flow table misses. */
 uint64_t n_lost;/* Number of misses not sent to 
userspace. */

 uint64_t n_flows;   /* Number of flows present. */
+uint64_t n_cache_hit;   /* Number of mega flow mask cache 
hits for

+   flow table matches. */
 uint64_t n_mask_hit;/* Number of mega flow masks visited 
for

flow table matches. */
 uint32_t n_masks;   /* Number of mega flow masks. */



--
fbl


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev