• Vitaliy Makkoveev [2023-10-04 22:03]:
On Wed, Oct 04, 2023 at 09:13:59PM +0200, Alexander Bluhm wrote:
On Wed, Oct 04, 2023 at 08:42:48PM +0200, Kirill Miazine wrote:
If it happens again, could you send the 'ps axlww | grep ifconfig'
output? Then we see the wait channel where it hangs in the kernel.
$ ps axlww
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
Here it happened again:
0 75339 23922 0 10 0 360 296 wg_ifq D+U p0 0:00.00 ifconfig wg1 destroy
wg_peer_destroy()
	...
	NET_LOCK();
	while (!ifq_empty(&sc->sc_if.if_snd)) {
		NET_UNLOCK();
		tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
		NET_LOCK();
	}
	NET_UNLOCK();
This net lock dance looks fishy. And the sleep has a timeout of 1
millisecond. But that may be per packet. So if you have a
long queue or the queue refills somehow, it will take forever.
I think the difference in the usage is constant traffic that keeps
the send queue full. The timeout hides the problem when there are
only a few packets.
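The refill scenario described above can be modelled in userland. This is only a minimal sketch, not kernel code: `struct model_ifq`, `model_wait_empty()` and the `arrive`/`drain` rates are hypothetical names invented for illustration, and each loop iteration stands in for one tsleep_nsec() timeout.

```c
#include <stddef.h>

/* Hypothetical userland model of a send queue; not the kernel's ifqueue. */
struct model_ifq {
	size_t len;	/* packets currently queued */
	size_t maxlen;	/* queue capacity */
};

/*
 * Model of the wait loop. One iteration represents one sleep timeout,
 * during which `arrive' packets are enqueued and up to `drain' packets
 * are transmitted.
 *
 * Returns the number of sleeps needed before the queue is seen empty,
 * or -1 if it is still non-empty after max_iter sleeps.
 */
long
model_wait_empty(struct model_ifq *q, size_t arrive, size_t drain,
    long max_iter)
{
	long i;

	for (i = 0; i < max_iter; i++) {
		if (q->len == 0)
			return i;
		/* Producers refill the queue while we sleep. */
		q->len += arrive;
		if (q->len > q->maxlen)
			q->len = q->maxlen;
		/* The transmit side removes at most `drain' packets. */
		q->len -= (drain < q->len) ? drain : q->len;
	}
	return (q->len == 0) ? max_iter : -1;
}
```

With 8 queued packets, no new arrivals and one packet sent per sleep, the model sees an empty queue after 8 sleeps. As soon as arrivals match or exceed the drain rate, the queue never empties and the loop only ends because of the artificial max_iter bound, which the real loop does not have.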
This should ensure wg_qstart() will not dereference dying `peer'. Looks
crappy and potentially could block forever, but should work. However,
the netlock is unnecessary here. A netlocked wg_output() could fill `if_snd'
while the netlock is released before tsleep(), so it serializes nothing but
stops packet processing.
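The "serializes nothing" point can be shown with a small deterministic model. This is a sketch under stated assumptions, not the kernel's locking: `struct lock_model`, `model_sleep()` and both wait functions are hypothetical, the lock is a plain flag, and the producer is assumed to enqueue exactly one packet per sleep whenever the lock is free.

```c
/* Hypothetical single-threaded model; the "lock" is just a flag. */
struct lock_model {
	int locked;		/* 1 while the modelled net lock is held */
	unsigned int queued;	/* packets on the modelled send queue */
	unsigned int enqueued;	/* packets accepted from the producer */
};

/* One sleep: a lock-taking producer runs, then one packet is sent. */
static void
model_sleep(struct lock_model *m)
{
	if (!m->locked) {
		m->queued++;
		m->enqueued++;
	}
	if (m->queued > 0)
		m->queued--;
}

/* Wait loop that holds the lock only around the emptiness check. */
unsigned int
wait_with_lock_dance(struct lock_model *m, unsigned int max_sleeps)
{
	unsigned int n = 0;

	m->locked = 1;
	while (m->queued != 0 && n < max_sleeps) {
		m->locked = 0;	/* released before sleeping */
		model_sleep(m);
		n++;
		m->locked = 1;
	}
	m->locked = 0;
	return n;
}

/* The same wait loop without touching the lock at all. */
unsigned int
wait_without_lock(struct lock_model *m, unsigned int max_sleeps)
{
	unsigned int n = 0;

	while (m->queued != 0 && n < max_sleeps) {
		model_sleep(m);
		n++;
	}
	return n;
}
```

Starting both variants with the same queue length gives the same number of sleeps and the same number of packets accepted from the producer, because the lock is always dropped during the only window in which the producer runs.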
Kirill, does this diff help?
nope, same hang.
tested on a fresh Vultr VM with -current and patch below. VM got added
to my normal WG network, and VM was accessed by SSH over that WG network.
then:
# ifconfig wg1 down (from ssh -- connection to ssh session disappears)
# ifconfig wg1 delete (from console)
# ifconfig wg1 destroy (from console -- command hangs)
interestingly, destroy works fine from ssh when commands are entered in
a tmux session and executed immediately after each other:
# ifconfig wg1 down; ifconfig wg1 delete; ifconfig wg1 destroy
looks like a timing issue.
Index: sys/net/if_wg.c
===================================================================
RCS file: /cvs/src/sys/net/if_wg.c,v
retrieving revision 1.31
diff -u -p -r1.31 if_wg.c
--- sys/net/if_wg.c 26 Sep 2023 15:16:44 -0000 1.31
+++ sys/net/if_wg.c 4 Oct 2023 20:01:16 -0000
@@ -507,13 +507,8 @@ wg_peer_destroy(struct wg_peer *peer)
noise_remote_clear(&peer->p_remote);
- NET_LOCK();
- while (!ifq_empty(&sc->sc_if.if_snd)) {
- NET_UNLOCK();
+ while (!ifq_empty(&sc->sc_if.if_snd))
tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
- NET_LOCK();
- }
- NET_UNLOCK();
taskq_barrier(wg_crypt_taskq);
taskq_barrier(net_tq(sc->sc_if.if_index));