On 11/16/21 19:31, Ilya Maximets wrote:
On 10/25/21 19:45, David Christensen wrote:
In certain high load situations, such as when creating a large number of
ports on a switch, the parameter 'sock' may be passed to nl_sock_recv__()
as null, resulting in a segmentation fault when 'sock' is later
dereferenced, such as when calling recvmsg().
Hi, David. Thanks for the patch.
It's OK to check for a NULL pointer there, I guess. However,
do you know from where it was actually called? This function,
in general, should not be called without the actual socket,
so we, probably, should fix the caller instead.
Best regards, Ilya Maximets.
Hi, Ilya Maximets.
When I looked at the coredump file, ch->sock was nil and was passed to
nl_sock_recv():
(gdb) l
2701
2702 while (handler->event_offset < handler->n_events) {
2703 int idx = handler->epoll_events[handler->event_offset].data.u32;
2704 struct dpif_channel *ch = &dpif->channels[idx];
(gdb) p idx
$26 = 4
(gdb) p *dpif->channels@5
$27 = {{sock = 0x1001ae88240, last_poll = -9223372036854775808}, {sock =
0x1001aa9a8a0, last_poll = -9223372036854775808}, {sock = 0x1001ae09510,
last_poll = 60634070}, {sock = 0x1001a9dbb60, last_poll = 60756950}, {sock =
0x0,
last_poll = 61340749}}
The above snippet is from lib/dpif-netlink.c and the caller is
dpif_netlink_recv_vport_dispatch().
The channel at idx=4 had sock=0x0, which was passed to nl_sock_recv() via
ch->sock parameter.
In that function, it tried to access sock->fd when calling recvmsg(), causing
the segfault.
I'm not enough experienced in Open vSwitch to explain why sock was nil at that
given index.
The fix seems worth, though.
Cheers!
The ovs-vswitchd.log will display something like this:
fatal_signal(revalidator138)|WARN|terminating with signal 11 (signal 11)
Tested this change under the same circumstances that originally generated
the segmentation fault and it ran successfully for four days without any
issue.
Signed-off-by: Murilo Opsfelder Araujo <[email protected]>
Signed-off-by: David Christensen <[email protected]>
IBM-BZ: #193057
---
lib/netlink-socket.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c
index 5867de564..3ab4d8485 100644
--- a/lib/netlink-socket.c
+++ b/lib/netlink-socket.c
@@ -653,6 +653,10 @@ nl_sock_recv__(struct nl_sock *sock, struct ofpbuf *buf,
int *nsid, bool wait)
int *ptr;
int error;
+ if (sock == NULL) {
+ return ECONNRESET;
+ }
+
ovs_assert(buf->allocated >= sizeof *nlmsghdr);
ofpbuf_clear(buf);
--
Murilo
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev