Hi, Conole. Thank you for pointing out the mistakes in my last patch. Yes, I 
did missed some something important.
The correct and complete way to fix this problem is like the following patch. 
In fact, we have found and fixed this
crash problem one year ago; and this patch has worked very well in our 
production environment for 11 months.

From 7e13a39cdbdfb484d4cb6b074f08471dc16d22b7 Mon Sep 17 00:00:00 2001
From: Wang Liang <[email protected]>
Date: Thu, 15 Apr 2021 13:52:01 +0800
Subject: [PATCH] Fix userspace datapath crash caused by IP fragments

The ovs userspace datapath will crash on openflow packet-out message
with IP packet largger than MTU. This patch will fix the problem.
When pre-processing the IP fragments, clone the packet if it has
'dnsteal' flag set.

Signed-off-by: Wang Liang <[email protected]>
---
 lib/ipf.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/lib/ipf.c b/lib/ipf.c
index c20bcc0b3..176140afb 100644
--- a/lib/ipf.c
+++ b/lib/ipf.c
@@ -811,7 +811,7 @@ ipf_process_frag(struct ipf *ipf, struct ipf_list *ipf_list,
              * recommend not setting the mempool number of buffers too low
              * and also clamp the number of fragments. */
             struct ipf_frag *frag = &ipf_list->frag_list[last_inuse_idx + 1];
-            frag->pkt = pkt;
+            frag->pkt = dnsteal ? dp_packet_clone(pkt) : pkt;
             frag->start_data_byte = start_data_byte;
             frag->end_data_byte = end_data_byte;
             frag->dnsteal = dnsteal;
@@ -1338,9 +1338,7 @@ ipf_destroy(struct ipf *ipf)
         while (ipf_list->last_sent_idx < ipf_list->last_inuse_idx) {
             struct dp_packet *pkt
                 = ipf_list->frag_list[ipf_list->last_sent_idx + 1].pkt;
-            if (!ipf_list->frag_list[ipf_list->last_sent_idx + 1].dnsteal) {
-                dp_packet_delete(pkt);
-            }
+            dp_packet_delete(pkt);
             atomic_count_dec(&ipf->nfrag);
             ipf_list->last_sent_idx++;
         }
-- 
2.27.0


-----原始邮件-----
发件人: Aaron Conole <[email protected]>
日期: 2021年4月15日 星期四 上午2:54
收件人: 王亮 Liang Wang <[email protected]>
抄送: "[email protected]" <[email protected]>, "[email protected]" 
<[email protected]>
主题: Re: [ovs-dev] [PATCH] ipf.c: Fix userspace datapath crash caused by IP 
fragments

    王亮 Liang Wang <[email protected]> writes:

    > From d5c454bc9c68a6b2b40a2301a21f7337d6086d4c Mon Sep 17 00:00:00 2001
    > From: Wang Liang <[email protected]>
    > Date: Mon, 12 Apr 2021 18:24:21 +0800
    > Subject: [PATCH] Fix userspace datapath crash caused by IP fragments
    >
    > The ovs userspace datapath will crash on openflow packet-out message
    > with IP packet largger than MTU. This patch will fix the problem.
    > When pre-processing the IP fragments, clone the packet if it has
    > 'dnsteal' flag set.
    >
    > Signed-off-by: Wang Liang <[email protected]>
    > ---

    I guess the real issue is that we don't honor the dnsteal flag
    correctly.  It implies that we cannot safely hold a reference to the
    dp_packet because some other entity (read: the conntrack processing
    pipeline) will free it.

    Can you provide a test case that shows this?  I've done a little bit of
    testing, but haven't been able to reproduce.

    >  lib/ipf.c | 2 +-
    >  1 file changed, 1 insertion(+), 1 deletion(-)
    >
    > diff --git a/lib/ipf.c b/lib/ipf.c
    > index c20bcc0b3..43f81706c 100644
    > --- a/lib/ipf.c
    > +++ b/lib/ipf.c
    > @@ -811,7 +811,7 @@ ipf_process_frag(struct ipf *ipf, struct ipf_list 
*ipf_list,
    >               * recommend not setting the mempool number of buffers too 
low
    >               * and also clamp the number of fragments. */
    >              struct ipf_frag *frag = &ipf_list->frag_list[last_inuse_idx 
+ 1];
    > -            frag->pkt = pkt;
    > +            frag->pkt = dnsteal ? dp_packet_clone(pkt) : pkt;

    I think the problem we have is once we do this duplication, I think
    we'll leak a reference to the packet, because we record the dnsteal
    value with frag, and that will skip the dp_packet_delete() calls later
    on during fragment queue release.

    I think with this change we also need to remove the dnsteal flag and
    simply always own the packet that comes in.  Maybe I missed something?

    >              frag->start_data_byte = start_data_byte;
    >              frag->end_data_byte = end_data_byte;
    >              frag->dnsteal = dnsteal;


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to