Hi Ilya,
I had a look at the patch that you have submitted.
In push_tnl_action, I can see that with the patch you are trying to return
error in case we don’t find the tunnel port or return the error that
netdev_push header returns. And we are deciding to delete the batch in the
calling function dp_execute_cb in case of error. But netdev_push_header always
returns 0. Actually the following code in push_tnl_action is dead code.
if (!err) {
return 0;
}
The actual issue is that the push_header function always return void . So any
error happening during push_header is never captured or acted upon.
I think the fix also be should be to add/capture error scenarios during tnl
header push as well.
Regards & Thanks
Anju
-----Original Message-----
From: Ilya Maximets [mailto:[email protected]]
Sent: Monday, May 14, 2018 6:28 PM
To: [email protected]; Anju Thomas <[email protected]>; Ben Pfaff
<[email protected]>
Cc: Tiago Lam <[email protected]>
Subject: Re: [ovs-dev] [PATCH v1] Fix crash due to multiple tnl push action
Hello Anju, Ben,
Looks like I fixed a leak reported here in my recent patch:
https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346892.html
Could you, please, take a look at it?
I actually managed to reproduce the packet leak on tunnel-push-pop unit tests
with valgrind:
==6445== 2,912 (2,208 direct, 704 indirect) bytes in 4 blocks are definitely
lost in loss record 400 of 410
==6445== at 0x4C2DB8F: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6445== by 0x516314: xmalloc (util.c:120)
==6445== by 0x466154: dp_packet_new (dp-packet.c:138)
==6445== by 0x466154: dp_packet_new_with_headroom (dp-packet.c:148)
==6445== by 0x4F6CFE: eth_from_hex (packets.c:498)
==6445== by 0x48EB43: eth_from_packet (netdev-dummy.c:1450)
==6445== by 0x48EB43: netdev_dummy_receive (netdev-dummy.c:1562)
==6445== by 0x515980: process_command (unixctl.c:313)
==6445== by 0x515980: run_connection (unixctl.c:347)
==6445== by 0x515980: unixctl_server_run (unixctl.c:398)
==6445== by 0x406F1E: main (ovs-vswitchd.c:120)
Above patch fixes it.
Best regards, Ilya Maximets.
> Hi Ben,
>
> I will work on these changes as well.
>
> Regards
> Anju
>
> -----Original Message-----
> From: Ben Pfaff [mailto:blp at ovn.org]
> Sent: Friday, May 11, 2018 2:00 AM
> To: Anju Thomas <anju.thomas at ericsson.com>
> Cc: dev at openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v1] Fix crash due to multiple tnl push
> action
>
> On Thu, May 10, 2018 at 08:51:03AM +0000, Anju Thomas wrote:
>> It looks like your commit message describes at least two other bugs
>> in OVS, though. First, if OVS crashes when it pushes tunnel headers,
>> even if there's not enough headroom, that's really bad. At worst, it
>> should drop the packet. Do you know where the crash occurs? We
>> should fix the problem, since it might recur in some other context.
>>
>> [Anju] OVS was actually crashing in the dpdk datapath. The crash is a manual
>> assert in our case.
>> The rootcause is that dp receives the actions after the upcall (say with >=3
>> tunnel pushes ) . Now as part of action processing , since it is a tunnel
>> push action , we try to find space in the dpdk mbuf packet headroom (which
>> Is 128 bytes). By the time we try to process the third tunnel push , there
>> is no headroom left since each tunnel header is of 50 bytes (50 *3 > 128
>> bytes headroom). Hence it manually asserts . This assert is to catch any
>> unexpected code flow . Do you think that in this case we should still go
>> ahead and prevent the crash ?
>
> I don't understand why it's OK to crash in this case. Why do you think so?
>
>> Second, it's a little shocking to hear that an encapsulation action without
>> a following output action causes a memory leak. We also need to fix that.
>> Do you have any more details?
>> [Anju] Now as explained above, the crash happens because we run out of
>> headroom. But in case we have say 2 or less than 2 tunnel pushes we will
>> have a mem leak as packet is never freed because the tnl push is the dp last
>> action and there is no other output action or any other action like recirc
>> that may translate to output action in the end leading to packet buffer not
>> being freed.
>> Are you proposing that we have some sort of preventive fix in the dp to
>> handle an incorrect action list from the upcall handling?
>
> Yes. It's unacceptable to leak memory because there's a packet modification
> without an output action. The kernel datapath never does this, for example.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev