On Thu, Apr 19, 2018 at 08:07:33PM -0700, Gregory Rose wrote:
> On 4/19/2018 4:18 PM, Flavio Leitner wrote:
> > On Tue, Apr 17, 2018 at 12:34:08PM -0700, Greg Rose wrote:
> > > On RHEL 7.x kernels we observe a panic induced by a paging error
> > > when the timer kicks off a job that subsequently accesses memory
> > > that belonged to the openvswitch kernel module but was since
> > > unloaded - thus the paging error.
> > >
> > > The panic can be induced on any RHEL 7.x kernel with the following test:
> > >
> > > while `true`
> > > do
> > > make check-kmod TESTSUITEFLAGS="-k \!gre"
> > > done
> > >
> > > On the systems I've been testing on it generally takes anywhere from a
> > > minute to 15 minutes or so to repro but never longer than that. Similar
> > > results have been seen by other testers.
> > >
> > > This patch does not fix the underlying bug, which does need to be
> > > investigated and fixed, but it does prevent it from occurring. We
> > > would like to prevent customer systems from panicking while we do
> > > futher investigation to find the root cause.
> > This looks like related and it is fixed in kernel-3.10.0-842.el7 as
> > part of 7.5.
Agree. It seems likely to be the same issue. Sorry I missed this patch
being posted. Thanks far catching it Flavio!
> >
> > Bug 1531680 - openvswitch: list_add corruption splat on running OVS
> > check-kernel tests
> > [...]
> > Running the OVS check-kernel on kernel-3.10.0-826.el7 results in a
> > splat and/or panic. I bisected this down to:
> >
> > fc2302dde0d9 ("[net] openvswitch: Fix refcount leak on force commit")
This commit hash is wrong [0]. It's actually RHEL commit
f37ed043ed24 ("[net] openvswitch: Add force commit")
[0] https://bugzilla.redhat.com/show_bug.cgi?id=1531680#c2
> >
> > I believe this is caused by a missing del_timer() when deleting the ct
> > in OVS. RHEL7 does not yet have the changes to remove the timers
> > (f330a7fdbe16 ("netfilter: conntrack: get rid of conntrack timer")).
> > The force commit feature was added upstream _after_ f330a7fdbe16.
> >
> > And the RHEL-only patch is:
> > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> > index 7582bf79033e..ef2a324e88b4 100644
> > --- a/net/openvswitch/conntrack.c
> > +++ b/net/openvswitch/conntrack.c
> > @@ -554,7 +554,8 @@ ovs_ct_expect_find(struct net *net, const struct
> > nf_conntrack_zone *zone,
> > if (h) {
> > struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
> > - nf_ct_delete(ct, 0, 0);
> > + if (del_timer(&ct->timeout))
> > + nf_ct_delete(ct, 0, 0);
> > nf_conntrack_put(&ct->ct_general);
> > }
> > }
> > @@ -705,7 +706,8 @@ static bool skb_nfct_cached(struct net *net,
> > * the reference.
> > */
> > if (nf_ct_is_confirmed(ct))
> > - nf_ct_delete(ct, 0, 0);
> > + if (del_timer(&ct->timeout))
> > + nf_ct_delete(ct, 0, 0);
> > nf_conntrack_put(&ct->ct_general);
> > nf_ct_set(skb, NULL, 0);
> >
> >
> > If that's the case, we should say thank Eric.
> > Thanks,
> > fbl
>
> Fantastic, I'll test this and whip up a patch.
>
> Thanks!
>
> - Greg
I'll be on the lookout for it. Thanks.
[..]
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev