Thanks Frode. Can those drop rules be added in such a way that they
don't prevent an OVN load balancer or DNAT entry on the logical
router's IP from working?

Thanks
Tom

On Fri, 1 Apr 2022 at 06:35, Frode Nordahl <[email protected]> wrote:

> On Thu, Mar 31, 2022 at 5:06 PM Stéphane Graber <[email protected]> wrote:
> >
> > On Thu, Mar 31, 2022 at 6:55 AM Ilya Maximets <[email protected]> wrote:
> > >
> > > On 3/31/22 06:24, Stéphane Graber wrote:
> > > > Thanks for the patch!
> > > >
> > > > I've applied it on top of my 5.17.1 and I'm now getting a very very
> > > > large number of:
> > > >
> > > > [  188.616220] openvswitch: netlink: ovs_ct_free_action at depth 3
> > > >
> > > > After just 3 minutes of uptime I'm seeing 230 such lines but also:
> > > >
> > > > [  108.726680] net_ratelimit: 6283 callbacks suppressed
> > > > [  114.515126] net_ratelimit: 3959 callbacks suppressed
> > > > [  120.523252] net_ratelimit: 4388 callbacks suppressed
> > > > [  127.458617] net_ratelimit: 2992 callbacks suppressed
> > > > [  133.543592] net_ratelimit: 2348 callbacks suppressed
> > > > [  140.099915] net_ratelimit: 5089 callbacks suppressed
> > > > [  145.116922] net_ratelimit: 6136 callbacks suppressed
> > > > [  150.551786] net_ratelimit: 5953 callbacks suppressed
> > > > [  155.555861] net_ratelimit: 5108 callbacks suppressed
> > > > [  162.740087] net_ratelimit: 6507 callbacks suppressed
> > > > [  168.059741] net_ratelimit: 3161 callbacks suppressed
> > > > [  173.092062] net_ratelimit: 4377 callbacks suppressed
> > > > [  178.104068] net_ratelimit: 3090 callbacks suppressed
> > > > [  183.135704] net_ratelimit: 2735 callbacks suppressed
> > > > [  188.616160] net_ratelimit: 2893 callbacks suppressed
> > > > [  193.628158] net_ratelimit: 4149 callbacks suppressed
> > > > [  199.128232] net_ratelimit: 3813 callbacks suppressed
> > > > [  204.136278] net_ratelimit: 2884 callbacks suppressed
> > > > [  211.443642] net_ratelimit: 5322 callbacks suppressed
> > > > [  216.643209] net_ratelimit: 5820 callbacks suppressed
> > > > [  221.648342] net_ratelimit: 4908 callbacks suppressed
> > > > [  226.656369] net_ratelimit: 4533 callbacks suppressed
> > > > [  231.680416] net_ratelimit: 4247 callbacks suppressed
> > > > [  237.172416] net_ratelimit: 2481 callbacks suppressed
> > > >
> > > > I did a 30min test run like earlier and I'm now seeing a tiny
> > > > increase in kmalloc-256 of just 448KB :)
> > > >
> > > > So that fix definitely did the trick to plug that leak!
> > > >
> > > > Tested-by: Stéphane Graber <[email protected]>
> > >
> > > Thanks!
> > >
> > > Such fast flow rotation on a system is not a healthy sign, though.
> > > But at least it doesn't leak the memory anymore.
> > >
> > > May I ask you to find the datapath flow that triggers the issue?
> > > I wasn't able to reproduce it with OVN system tests, so I'm curious
> > > what it looks like.  The flow should have nested clone() or
> > > check_pkt_len() actions with the ct() action inside.  'depth 3'
> > > suggests that it is doubly nested, i.e. clone(...clone(...ct())) or
> > > clone(...check_pkt_len(...ct())) or something similar.  ovs-dpctl
> > > can be used to dump flows.
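For anyone following along, the nesting depth Ilya describes can be spotted mechanically in a dump. Below is a rough editor's sketch in Python (not part of OVS; the set of actions treated as "nesting" is an assumption based on this thread) that scans an ovs-dpctl-style actions string and reports the depth of each ct() action:

```python
import re

# Datapath actions that carry a nested list of sub-actions
# (assumption based on the actions discussed in this thread).
NESTING = {"clone", "check_pkt_len", "sample", "dec_ttl"}

def ct_depths(actions):
    """Return the nesting depth of every ct() action in an
    ovs-dpctl-style actions string (top level = depth 1)."""
    depths = []
    stack = []  # one entry per open paren: True if opened by a nesting action
    for m in re.finditer(r"(\w+)\(|\)", actions):
        if m.group(0) == ")":
            if stack:
                stack.pop()
        else:
            name = m.group(1)
            if name == "ct":
                depths.append(sum(stack) + 1)
            stack.append(name in NESTING)
    return depths

# A doubly nested ct(), matching the 'depth 3' kernel message:
ct_depths("clone(clone(ct(commit)))")  # -> [3]
```

Any result above 1, fed from the actions column of `ovs-dpctl dump-flows`, would identify a flow of the shape Ilya is asking about.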
> >
> > So I've run (a rather crazy) "while :; do ovs-dpctl dump-flows | grep
> > clone; done" on all 3 servers and didn't see a single hit.
> >
> > Instead I went to grep for one of the addresses directly and I'm seeing:
> > https://gist.github.com/stgraber/4c7fe7b4e45cd3fa36d54759091bb2c0
> >
> > I don't know if that's telling you anything useful, but it appears to
> > be a couple of machines trying to access a TCP port on the OVN router.
> > My OVN router addresses are publicly reachable (45.45.148.136) in this
> > case, so some amount of port scanning and the like is somewhat common
> > and the likely cause of this traffic in the first place.
>
> FWIW, I've scheduled time to look into making OVN add drop rules so
> that the recirculation until the TTL reaches 0 does not happen when
> TCP/UDP traffic is directed at the LRP address. Hopefully it will be
> simple, but the LRP address is also used for SNAT and needs to let
> conntrack accept return traffic, so there may be more to it. I'll find
> out when I start unraveling.
>
> --
> Frode Nordahl
>
> > > For now, I'll work on a proper patch for the kernel.
> >
> > Thanks!
> >
> > Stéphane
> >
> > > Best regards, Ilya Maximets.
> > >
> > > >
> > > > Stéphane
> > > >
> > > >
> > > > On Wed, Mar 30, 2022 at 9:24 PM Ilya Maximets <[email protected]> wrote:
> > > >>
> > > >> On 3/31/22 03:00, Stéphane Graber wrote:
> > > >>> So it looks like my main issue and the one mentioned in the PS
> > > >>> section may actually be one and the same.
> > > >>>
> > > >>> I've run an experiment now where I monitor the normal leakage over
> > > >>> 30min, getting me around 1.1GiB of extra kmalloc-256.
> > > >>> I then repeated the experiment but with a firewall (good old
> > > >>> iptables) dropping all non-ICMP traffic headed to the OVN router
> > > >>> addresses; this reduced the leakage to just 89MiB.
> > > >>>
> > > >>> So it looks like that traffic is getting lost in a redirection loop
> > > >>> for a while in OVN/OVS, triggers the "openvswitch: ovs-system:
> > > >>> deferred action limit reached, drop recirc action" entry and also
> > > >>> leaks a bunch of memory in the process. We did attempt to pin the
> > > >>> issue to this cause earlier by generating some traffic which would
> > > >>> hit the recirc loop, but that didn't cause a very visible increase
> > > >>> at the time. Given the new data, however, our attempt at manually
> > > >>> triggering it on an otherwise unused network must have been flawed
> > > >>> somehow.
> > > >>>
> > > >>> The amount of traffic dropped by the firewall is quite small too,
> > > >>> which suggests that things could be significantly worse.
> > > >>> In this case I saw a total of 3478 packets dropped for a total of
> > > >>> 183852 bytes.
> > > >>>
> > > >>> We'll be doing a bit more digging on the OVN side to see what may
> > > >>> be causing this and will report back.
> > > >>>
> > > >>> Stéphane
> > > >>>
> > > >>> On Wed, Mar 30, 2022 at 5:08 PM Stéphane Graber <[email protected]> wrote:
> > > >>>>
> > > >>>> Hello,
> > > >>>>
> > > >>>> I'm using OVS 2.17.0 combined with OVN 22.03.0 on Ubuntu 20.04
> > > >>>> and a 5.17.1 mainline kernel.
> > > >>>>
> > > >>>> I'm trying to debug a very problematic kernel memory leak which
> > > >>>> is happening in this environment.
> > > >>>> Sadly I don't have a clear reproducer or much idea of when it
> > > >>>> first appeared. I went through about a year of metrics and some
> > > >>>> amount of leakage may always have been present; its magnitude
> > > >>>> just changed recently, making it such that my normal weekly
> > > >>>> server maintenance is no longer frequent enough to take care of
> > > >>>> it.
> > > >>>> Basically what I'm doing is running a LXD + OVN setup on 3
> > > >>>> servers; they all act as OVN chassis with various priorities to
> > > >>>> spread the load.
> > > >>>> All 3 servers also normally run a combination of containers and
> > > >>>> virtual machines attached to about a dozen different OVN networks.
> > > >>>>
> > > >>>> What I'm seeing is about 2MB/s of memory leakage (kmalloc-256
> > > >>>> slub) which, after enabling slub debugging, can be tracked down
> > > >>>> to nf_ct_tmpl_alloc kernel calls such as those made by the
> > > >>>> openvswitch kernel module as part of its ovs_ct_copy_action
> > > >>>> function, which is exposed as OVS_ACTION_ATTR_CT to userspace
> > > >>>> through the openvswitch netlink API.
> > > >>>>
> > > >>>> This means that in just a couple of days I'm dealing with just
> > > >>>> shy of 40GiB of those kmalloc-256 entries.
> > > >>>> Here is one of the servers, which has been running for just 6 hours:
> > > >>>>
> > > >>>> ```
> > > >>>> root@abydos:~# uptime
> > > >>>>  20:51:44 up  6:32,  1 user,  load average: 6.63, 5.95, 5.26
> > > >>>>
> > > >>>> root@abydos:~# slabtop -o -s c | head -n10
> > > >>>>  Active / Total Objects (% used)    : 24919212 / 25427299 (98.0%)
> > > >>>>  Active / Total Slabs (% used)      : 541777 / 541777 (100.0%)
> > > >>>>  Active / Total Caches (% used)     : 150 / 197 (76.1%)
> > > >>>>  Active / Total Size (% used)       : 11576509.49K / 11680410.31K (99.1%)
> > > >>>>  Minimum / Average / Maximum Object : 0.01K / 0.46K / 50.52K
> > > >>>>
> > > >>>>   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> > > >>>> 13048822 13048737  99%    0.75K 310688       42   9942016K kmalloc-256
> > > >>>> 2959671 2677549  90%    0.10K  75889       39    303556K buffer_head
> > > >>>> 460411 460411 100%    0.57K  16444       28    263104K radix_tree_node
> > > >>>>
> > > >>>> root@abydos:~# cat /sys/kernel/debug/slab/kmalloc-256/alloc_traces | sort -rn | head -n5
> > > >>>> 12203895 nf_ct_tmpl_alloc+0x55/0xb0 [nf_conntrack] age=57/2975485/5871871 pid=1964-2048 cpus=0-31 nodes=0-1
> > > >>>>  803599 metadata_dst_alloc+0x25/0x50 age=2/2663973/5873408 pid=0-683331 cpus=0-31 nodes=0-1
> > > >>>>   32773 memcg_alloc_slab_cgroups+0x3d/0x90 age=386/4430883/5878515 pid=1-731302 cpus=0-31 nodes=0-1
> > > >>>>    3861 do_seccomp+0xdb/0xb80 age=749613/4661870/5878386 pid=752-648826 cpus=0-31 nodes=0-1
> > > >>>>    2314 device_add+0x504/0x920 age=751269/5665662/5883377 pid=1-648698 cpus=0-31 nodes=0-1
> > > >>>>
> > > >>>> root@abydos:~# cat /sys/kernel/debug/slab/kmalloc-256/free_traces | sort -rn | head -n5
> > > >>>> 8129152 <not-available> age=4300785451 pid=0 cpus=0 nodes=0-1
> > > >>>> 2770915 reserve_sfa_size+0xdf/0x110 [openvswitch] age=1912/2970717/5881994 pid=1964-2069 cpus=0-31 nodes=0-1
> > > >>>> 1621182 dst_destroy+0x70/0xd0 age=4/3065853/5883592 pid=0-733033 cpus=0-31 nodes=0-1
> > > >>>>  288686 nf_ct_tmpl_free+0x1b/0x30 [nf_conntrack] age=109/2985710/5879968 pid=0-733208 cpus=0-31 nodes=0-1
> > > >>>>  134435 ovs_nla_free_flow_actions+0x68/0x90 [openvswitch] age=134/2955781/5883717 pid=0-733208 cpus=0-31 nodes=0-1
> > > >>>> ```
> > > >>>>
> > > >>>> Here you can see 12M calls to nf_ct_tmpl_alloc but just 288k to
> > > >>>> nf_ct_tmpl_free.
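As an aside, the imbalance can be computed directly from those dumps. A small editor's sketch in Python (the `count symbol+offset ...` line format is inferred from the output quoted above):

```python
import re

def trace_counts(dump):
    """Parse 'count symbol+offset ...' lines from a slub
    alloc_traces/free_traces dump into {symbol: count}."""
    counts = {}
    for line in dump.splitlines():
        m = re.match(r"\s*(\d+)\s+(\S+)", line)
        if m:
            counts[m.group(2).split("+")[0]] = int(m.group(1))
    return counts

alloc = trace_counts("12203895 nf_ct_tmpl_alloc+0x55/0xb0 [nf_conntrack]")
free = trace_counts(" 288686 nf_ct_tmpl_free+0x1b/0x30 [nf_conntrack]")
outstanding = alloc["nf_ct_tmpl_alloc"] - free["nf_ct_tmpl_free"]
# 11915209 templates never freed; at a nominal 256 bytes each
# that is roughly 2.8GiB of kmalloc-256 memory outstanding.
```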
> > > >>>>
> > > >>>> Things I've done so far to try to isolate things:
> > > >>>>
> > > >>>> 1) I've evacuated all workloads from the server, so the only thing
> > > >>>> running on it is OVS vswitchd. This did not change anything.
> > > >>>> 2) I've added iptables/ip6tables raw table rules marking all
> > > >>>> traffic as NOTRACK. This did not change anything.
> > > >>>> 3) I've played with the chassis assignment. This does change
> > > >>>> things, in that a server with no active chassis will not show any
> > > >>>> leakage (thankfully). The busier the network I move back to the
> > > >>>> host, the faster the leakage.
> > > >>>>
> > > >>>>
> > > >>>> I've had both Frode and Tom (CCed) assist with a variety of ideas
> > > >>>> and questions, but while we have found some unrelated OVS and
> > > >>>> kernel issues, we're yet to figure this one out. So I wanted to
> > > >>>> reach out to the wider community to see if anyone has either seen
> > > >>>> something like this before or has suggestions as to where to look
> > > >>>> next.
> > > >>>>
> > > >>>> I can pretty easily rebuild the kernel, OVS or OVN, and while this
> > > >>>> cluster is a production environment, the fact that I can evacuate
> > > >>>> one of the three servers with no user-visible impact makes it not
> > > >>>> too bad to debug. Having to constantly reboot the entire setup to
> > > >>>> clear the memory leak is the bigger annoyance right now :)
> > > >>>>
> > > >>>> Stéphane
> > > >>>>
> > > >>>> PS: The side OVS/kernel issue I'm referring to is
> > > >>>> https://lore.kernel.org/netdev/[email protected]/
> > > >>>> which allowed us to track down an issue with OVN logical routers
> > > >>>> properly responding to ICMP on their external address but getting
> > > >>>> into a recirculation loop when any other kind of traffic is thrown
> > > >>>> at them (instead of being immediately dropped or rejected). The
> > > >>>> kernel change made it possible to track this down.
> > > >>
> > > >> Hi, Stéphane.
> > > >>
> > > >> Thanks for the report.
> > > >>
> > > >> I'm not sure if that's what you see, but I think that I found one
> > > >> pretty serious memory leak in the openvswitch module.  In short,
> > > >> it leaks all the dynamically allocated memory for nested actions.
> > > >> E.g. if you have a datapath flow with actions:clone(ct(commit)),
> > > >> the structure allocated by nf_ct_tmpl_alloc for a nested ct() action
> > > >> will not be freed.
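To restate the bug in miniature: the old free path walked only the top-level action list. A toy Python model (an editor's illustration, not the kernel code) of the difference between the old and the fixed traversal:

```python
# Actions are modeled as (name, payload) tuples; nesting actions carry
# a list of sub-actions as their payload.
NESTING = {"clone", "check_pkt_len", "sample", "dec_ttl"}

def free_actions(actions, recurse, freed, depth=1):
    """Record the depth of every ct() template we free; stands in for
    the kernel walking the flow's netlink action attributes."""
    for name, payload in actions:
        if name == "ct":
            freed.append(depth)            # nf_ct_tmpl_free() in the kernel
        elif name in NESTING and recurse:
            free_actions(payload, recurse, freed, depth + 1)

flow = [("clone", [("clone", [("ct", None)])]), ("ct", None)]

old, new = [], []
free_actions(flow, recurse=False, freed=old)   # pre-patch: top level only
free_actions(flow, recurse=True, freed=new)    # patched: recurses
# old == [1]: the doubly nested ct() template leaked
# new == [3, 1]: everything freed; depth 3 matches the log message
```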
> > > >>
> > > >> Could you try the change below?
> > > >>
> > > >> It will additionally print a rate-limited error if a nested CT is
> > > >> encountered:
> > > >>   openvswitch: netlink: ovs_ct_free_action at depth 2
> > > >>
> > > >> This message can be used to confirm the issue.
> > > >>
> > > >> The very last bit of the fix is not necessary, but I just spotted
> > > >> the additional formatting issue and fixed it.
> > > >>
> > > >> ---
> > > >> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> > > >> index 5176f6ccac8e..248c39da465e 100644
> > > >> --- a/net/openvswitch/flow_netlink.c
> > > >> +++ b/net/openvswitch/flow_netlink.c
> > > >> @@ -2317,6 +2317,65 @@ static struct sw_flow_actions *nla_alloc_flow_actions(int size)
> > > >>         return sfa;
> > > >>  }
> > > >>
> > > >> +static void ovs_nla_free_flow_actions_nla(const struct nlattr *actions,
> > > >> +                                         int len, int depth);
> > > >> +
> > > >> +static void ovs_nla_free_check_pkt_len_action(const struct nlattr *action,
> > > >> +                                             int depth)
> > > >> +{
> > > >> +       const struct nlattr *a;
> > > >> +       int rem;
> > > >> +
> > > >> +       nla_for_each_nested(a, action, rem) {
> > > >> +               switch (nla_type(a)) {
> > > >> +               case OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL:
> > > >> +               case OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER:
> > > >> +                       ovs_nla_free_flow_actions_nla(nla_data(a), nla_len(a),
> > > >> +                                                     depth);
> > > >> +                       break;
> > > >> +               }
> > > >> +       }
> > > >> +}
> > > >> +
> > > >> +static void ovs_nla_free_clone_action(const struct nlattr *action, int depth)
> > > >> +{
> > > >> +       const struct nlattr *a;
> > > >> +       int rem;
> > > >> +
> > > >> +       nla_for_each_nested(a, action, rem) {
> > > >> +               switch (nla_type(a)) {
> > > >> +               case OVS_CLONE_ATTR_EXEC:
> > > >> +                       /* Real list of actions follows this attribute.
> > > >> +                        * Free them and return. */
> > > >> +                       a = nla_next(a, &rem);
> > > >> +                       ovs_nla_free_flow_actions_nla(a, rem, depth);
> > > >> +                       return;
> > > >> +               }
> > > >> +       }
> > > >> +}
> > > >> +
> > > >> +static void ovs_nla_free_dec_ttl_action(const struct nlattr *action, int depth)
> > > >> +{
> > > >> +       const struct nlattr *a = nla_data(action);
> > > >> +
> > > >> +       switch (nla_type(a)) {
> > > >> +       case OVS_DEC_TTL_ATTR_ACTION:
> > > >> +               ovs_nla_free_flow_actions_nla(nla_data(a), nla_len(a), depth);
> > > >> +               break;
> > > >> +       }
> > > >> +}
> > > >> +
> > > >> +static void ovs_nla_free_sample_action(const struct nlattr *action, int depth)
> > > >> +{
> > > >> +       const struct nlattr *a = nla_data(action);
> > > >> +
> > > >> +       switch (nla_type(a)) {
> > > >> +       case OVS_SAMPLE_ATTR_ARG:
> > > >> +               ovs_nla_free_flow_actions_nla(nla_data(a), nla_len(a), depth);
> > > >> +               break;
> > > >> +       }
> > > >> +}
> > > >> +
> > > >>  static void ovs_nla_free_set_action(const struct nlattr *a)
> > > >>  {
> > > >>         const struct nlattr *ovs_key = nla_data(a);
> > > >> @@ -2330,25 +2389,55 @@ static void ovs_nla_free_set_action(const struct nlattr *a)
> > > >>         }
> > > >>  }
> > > >>
> > > >> -void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
> > > >> +static void ovs_nla_free_flow_actions_nla(const struct nlattr *actions,
> > > >> +                                         int len, int depth)
> > > >>  {
> > > >>         const struct nlattr *a;
> > > >>         int rem;
> > > >>
> > > >> -       if (!sf_acts)
> > > >> +       if (!actions)
> > > >>                 return;
> > > >>
> > > >> -       nla_for_each_attr(a, sf_acts->actions, sf_acts->actions_len, rem) {
> > > >> +       depth++;
> > > >> +
> > > >> +       nla_for_each_attr(a, actions, len, rem) {
> > > >>                 switch (nla_type(a)) {
> > > >> -               case OVS_ACTION_ATTR_SET:
> > > >> -                       ovs_nla_free_set_action(a);
> > > >> +               case OVS_ACTION_ATTR_CHECK_PKT_LEN:
> > > >> +                       ovs_nla_free_check_pkt_len_action(a, depth);
> > > >> +                       break;
> > > >> +
> > > >> +               case OVS_ACTION_ATTR_CLONE:
> > > >> +                       ovs_nla_free_clone_action(a, depth);
> > > >>                         break;
> > > >> +
> > > >>                 case OVS_ACTION_ATTR_CT:
> > > >> +                       if (depth != 1)
> > > >> +                               OVS_NLERR(true,
> > > >> +                                       "ovs_ct_free_action at depth %d", depth);
> > > >>                         ovs_ct_free_action(a);
> > > >>                         break;
> > > >> +
> > > >> +               case OVS_ACTION_ATTR_DEC_TTL:
> > > >> +                       ovs_nla_free_dec_ttl_action(a, depth);
> > > >> +                       break;
> > > >> +
> > > >> +               case OVS_ACTION_ATTR_SAMPLE:
> > > >> +                       ovs_nla_free_sample_action(a, depth);
> > > >> +                       break;
> > > >> +
> > > >> +               case OVS_ACTION_ATTR_SET:
> > > >> +                       ovs_nla_free_set_action(a);
> > > >> +                       break;
> > > >>                 }
> > > >>         }
> > > >> +}
> > > >> +
> > > >> +void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
> > > >> +{
> > > >> +       if (!sf_acts)
> > > >> +               return;
> > > >>
> > > >> +       ovs_nla_free_flow_actions_nla(sf_acts->actions, sf_acts->actions_len, 0);
> > > >>         kfree(sf_acts);
> > > >>  }
> > > >>
> > > >> @@ -3458,7 +3547,9 @@ static int clone_action_to_attr(const struct nlattr *attr,
> > > >>         if (!start)
> > > >>                 return -EMSGSIZE;
> > > >>
> > > >> -       err = ovs_nla_put_actions(nla_data(attr), rem, skb);
> > > >> +       /* Skipping the OVS_CLONE_ATTR_EXEC that is always present. */
> > > >> +       attr = nla_next(nla_data(attr), &rem);
> > > >> +       err = ovs_nla_put_actions(attr, rem, skb);
> > > >>
> > > >>         if (err)
> > > >>                 nla_nest_cancel(skb, start);
> > > >> ---
> > > >>
> > > >> Best regards, Ilya Maximets.
> > > > _______________________________________________
> > > > dev mailing list
> > > > [email protected]
> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
>