Hi Ben,
I was working on the code changes and I can think of two approaches we can take 
to prevent this crash in the dpdk datapath.

1. Today the dp_packet module that we have is never return any error . The only 
error handling it is by assert using OVS_NOT_REACHED(). Should we go ahead and 
restructure the entire dp_packet module to do some error handling and  return 
(refer to openvswitchlib/dp-packet.c). This would require a more detailed 
effort to ensure all flows are handled.
2.  Do not call the  dp_packet_uninit without properly checking we have 
adequate headroom for DPBUF_DPDK buffers. 

Any suggestions on which approach we should go ahead with  ?

Regards & Thanks
-----Original Message-----
From: Ben Pfaff [mailto:b...@ovn.org] 
Sent: Friday, May 11, 2018 2:00 AM
To: Anju Thomas <anju.tho...@ericsson.com>
Cc: d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH v1] Fix crash due to multiple tnl push action

On Thu, May 10, 2018 at 08:51:03AM +0000, Anju Thomas wrote:
> It looks like your commit message describes at least two other bugs in 
> OVS, though.  First, if OVS crashes when it pushes tunnel headers, 
> even if there's not enough headroom, that's really bad.  At worst, it 
> should drop the packet.  Do you know where the crash occurs?  We 
> should fix the problem, since it might recur in some other context.
> [Anju] OVS was actually crashing in the dpdk datapath. The crash is a manual 
> assert in our case.
> The rootcause is that dp receives the actions after the upcall (say with >=3 
> tunnel pushes ) . Now as part of action processing , since it is a tunnel 
> push action , we try to find space in the dpdk mbuf packet headroom (which Is 
> 128 bytes). By the time we try to process the third tunnel push , there is no 
> headroom left since each tunnel header is of 50 bytes (50 *3 > 128 bytes 
> headroom). Hence it manually asserts .  This assert is to catch any 
> unexpected code flow . Do you think that in this case we should still go 
> ahead and  prevent the crash ? 

I don't understand why it's OK to crash in this case.  Why do you think so?

> Second, it's a little shocking to hear that an encapsulation action without a 
> following output action causes a memory leak.  We also need to fix that.  Do 
> you have any more details?
> [Anju] Now as explained above, the crash happens because we run out of 
> headroom. But in case we have say 2 or less than 2 tunnel pushes we will have 
> a mem leak as packet is never freed because the tnl push is the dp last 
> action and there is no other output action or any other action like recirc 
> that may translate to output action in the end leading  to packet buffer not 
> being freed.
> Are you proposing that we have some sort of preventive fix in the dp to 
> handle an incorrect action list from the upcall handling? 

Yes.  It's unacceptable to leak memory because there's a packet modification 
without an output action.  The kernel datapath never does this, for example.
dev mailing list

Reply via email to