Bug#994728: libtirpc3:amd64: rpcbind stops replying to subnet broadcast CALLIT after one stray UDP datagram

2022-01-05 Martin Dorey
On Sun, 19 Sep 2021 19:11:26 -0700 Martin Dorey wrote:
> Package: libtirpc3
> Version: 1.1.4-0.4

My production occurrence was always on Stretch, but I'd thought that the
patch might be more easily accepted against a less stale branch.  It
applies to Stretch too, where I've just put it into production.

> sendmsg(7, {msg_name={sa_family=AF_INET, sin_port=htons(800),
> sin_addr=inet_addr("172.27.5.162")}, msg_namelen=16,
> msg_iov=[{iov_base="\0\2:\270\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\371\0\0\0\4"...,
> iov_len=36}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP,
> cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=0,
> ipi_spec_dst=inet_addr("127.0.0.1"), ipi_addr=inet_addr("127.0.0.1")}}],
> msg_controllen=32, msg_flags=0}, 0) = -1 EINVAL (Invalid argument)

I don't, today, see EINVAL being returned on Stretch, though I did before.
While successfully sent, the response doesn't get to its intended target
for me today, perhaps ending up at 127.0.0.1 instead of the intended
172.27.5.162 (to use the IP address from the above output).

> Once an rpcbind process has got into this state, it doesn't
> recover without being restarted.

I found, today, that sending Stretch rpcbind the poison pill from a
different machine caused it to recover.  I've only seen the problem
exhibited, then, when the poison has been sent locally.
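
For illustration, a stray datagram of this kind can be produced with a few
lines of C.  This is only a sketch: send_stray_datagram is a made-up name,
the one-byte payload is an assumption (presumably any datagram would do),
and the port of rpcbind's remote-call socket has to be found separately,
for example from strace output.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send a single one-byte UDP datagram to ip:port.
 * Returns the number of bytes sent, or -1 on failure.
 */
ssize_t send_stray_datagram(const char *ip, uint16_t port)
{
    struct sockaddr_in dst;
    const char payload = 0;
    ssize_t sent;
    int fd;

    assert(ip != NULL);

    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
        return -1;              /* not a dotted-quad IPv4 address */

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    sent = sendto(fd, &payload, sizeof payload, 0,
                  (struct sockaddr *)&dst, sizeof dst);
    close(fd);
    return sent;
}
```

Calling send_stray_datagram("127.0.0.1", port) on the rpcbind host itself
corresponds to the local case; calling it from another machine with the
server's address corresponds to the remote case.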

> First enable remote call support.

That was first disabled in Buster, so isn't needed to reproduce the problem
on Stretch.

> rpcinfo -b 10 4

For my Stretch reproduction today, I have to make this call from a
different computer to the one on which rpcbind is running to see the
problem.  Sending a unicast request, like rpcinfo -T udp sirius 10 4,
didn't suffer from the problem for me, at least not today, on Stretch.


Bug#994728: libtirpc3:amd64: rpcbind stops replying to subnet broadcast CALLIT after one stray UDP datagram

2021-09-19 Martin Dorey
Package: libtirpc3
Version: 1.1.4-0.4
Severity: normal
Tags: patch

Dear Maintainer,

My NIS setup stops working occasionally.
The clients rely on subnet broadcast CALLIT requests to locate
the NIS servers.
The rpcbind process on the NIS server sees the requests but
fails to send the reply.
The strace output looks like this:

recvmsg(6, {msg_name={sa_family=AF_INET, sin_port=htons(800), 
sin_addr=inet_addr("172.27.5.162")}, msg_namelen=128->16, 
msg_iov=[{iov_base="\0\2:\270\0\0\0\0\0\0\0\2\0\1\206\240\0\0\0\2\0\0\0\5\0\0\0\0\0\0\0\0"...,
 iov_len=9000}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, 
cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("eth0"), 
ipi_spec_dst=inet_addr("172.27.8.1"), ipi_addr=inet_addr("172.27.63.255")}}], 
msg_controllen=32, msg_flags=0}, 0) = 64
...
sendmsg(7, {msg_name={sa_family=AF_INET, sin_port=htons(800), 
sin_addr=inet_addr("172.27.5.162")}, msg_namelen=16, 
msg_iov=[{iov_base="\0\2:\270\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\371\0\0\0\4"...,
 iov_len=36}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, 
cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=0, 
ipi_spec_dst=inet_addr("127.0.0.1"), ipi_addr=inet_addr("127.0.0.1")}}], 
msg_controllen=32, msg_flags=0}, 0) = -1 EINVAL (Invalid argument)

... where I think the active ingredient is that ipi_spec_dst or
ipi_addr is 127.0.0.1 rather than the 172.27.5.162 intended
reply address.
Once an rpcbind process has got into this state, it doesn't
recover without being restarted.
rpcbind is calling svc_sendreply on the xprt it created in
create_rmtcall_fd, which isn't where the request originated.
That calls svc_dg_reply which assumes:

/* cmsg already set in svc_dg_recv */

... as of:

https://git.linux-nfs.org/?p=steved/libtirpc.git;a=commit;h=74ef3df0236c55185225c62fba34953f2582da72
(Try to ensure datagram replies come from the address requests were sent to.)

That's been zero-initialized, so everything works fine until
a port scan or some such sends a datagram to the same port.
Its IP_PKTINFO gets remembered and used on every subsequent
reply.
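
The mechanism can be sketched in isolation, without sockets.  This is a
simplified model, not libtirpc's code: struct dg_state and the three
function names are made up for illustration.  remember_pktinfo stands in
for what a recvmsg leaves behind in the per-transport msghdr,
pending_pktinfo shows what a later sendmsg would be handed, and
forget_pktinfo clears the control pointer and length.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the per-transport state that outlives a request. */
struct dg_state {
    struct msghdr msg;
    union {
        struct cmsghdr hdr;     /* forces suitable alignment */
        char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
    } cmsg;
};

/* Model of a receive: store an IP_PKTINFO naming the given local address. */
void remember_pktinfo(struct dg_state *st, const char *dotted_quad)
{
    struct in_pktinfo pi;
    struct cmsghdr *c;

    memset(&st->cmsg, 0, sizeof st->cmsg);
    st->msg.msg_control = st->cmsg.buf;
    st->msg.msg_controllen = sizeof st->cmsg.buf;

    c = CMSG_FIRSTHDR(&st->msg);
    assert(c != NULL);
    c->cmsg_level = IPPROTO_IP;         /* same value as SOL_IP */
    c->cmsg_type = IP_PKTINFO;
    c->cmsg_len = CMSG_LEN(sizeof pi);

    memset(&pi, 0, sizeof pi);
    inet_pton(AF_INET, dotted_quad, &pi.ipi_spec_dst);
    pi.ipi_addr = pi.ipi_spec_dst;
    memcpy(CMSG_DATA(c), &pi, sizeof pi);
}

/* Returns 1 and fills *out if a send using st->msg would carry IP_PKTINFO. */
int pending_pktinfo(struct dg_state *st, struct in_pktinfo *out)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(&st->msg); c != NULL;
         c = CMSG_NXTHDR(&st->msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            memcpy(out, CMSG_DATA(c), sizeof *out);
            return 1;
        }
    }
    return 0;
}

/* Drop the remembered control data so that later sends carry none. */
void forget_pktinfo(struct dg_state *st)
{
    st->msg.msg_control = NULL;
    st->msg.msg_controllen = 0;
}
```

A zero-initialized struct dg_state reports nothing pending.  After
remember_pktinfo(&st, "127.0.0.1"), every later check reports the loopback
address until forget_pktinfo(&st) is called.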

I can demonstrate the problem without needing NIS.
First enable remote call support.
On Debian, that can be done with:

sudo tee --append /etc/default/rpcbind < [...]

--- src/svc_dg.c.orig	2021-09-19 18:24:32.462610751 -0700
+++ src/svc_dg.c	2021-09-19 18:14:52.271066229 -0700
@@ -278,6 +278,8 @@
 			if (su->su_cache)
 				cache_set(xprt, slen);
 		}
+		msg->msg_control = NULL;
+		msg->msg_controllen = 0;
 	}
 	return (stat);
 }