UPDATE: During the outage, I ran a simple test to check if both sides have access to each other's specified UDP ports, then I found something interesting: Some 148 bytes length packets are transferred between parties which I can't recognize what are them. By the way here I copy my test results. Anybody can spot what's going on here? The middle-node host name which runs wg1 is ir-pp and the exit-note (to free internet) which runs wg2 is sf-do: (these two parts happened at the same time and are copied from 2 different servers):
======================== root@ir-pp:~# nc -u sf-do 50840 123456 1234 12 ^C root@ir-pp:~# tcpdump 'port 50841' dropped privs to tcpdump tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes 20:56:57.607342 IP ir-pp.50841 > sf-do.50840: UDP, length 148 20:57:01.042471 IP sf-do.43161 > ir-pp.50841: UDP, length 3 20:57:02.361827 IP ir-pp.50841 > sf-do.50840: UDP, length 148 20:57:03.635754 IP sf-do.43161 > ir-pp.50841: UDP, length 5 20:57:06.740922 IP sf-do.43161 > ir-pp.50841: UDP, length 7 20:57:07.552305 IP ir-pp.50841 > sf-do.50840: UDP, length 148 ^C 6 packets captured 15 packets received by filter 0 packets dropped by kernel root@ir-pp:~# ======================== root@sf-do:~# tcpdump 'port 50840' dropped privs to tcpdump tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes 20:56:09.126958 IP sf-do.50840 > ir-pp.50841: UDP, length 148 20:56:14.758974 IP sf-do.50840 > ir-pp.50841: UDP, length 148 20:56:18.222814 IP ir-pp.38136 > sf-do.50840: UDP, length 7 20:56:20.391041 IP sf-do.50840 > ir-pp.50841: UDP, length 148 20:56:22.307488 IP ir-pp.38136 > sf-do.50840: UDP, length 5 20:56:24.702590 IP ir-pp.38136 > sf-do.50840: UDP, length 3 20:56:25.510985 IP sf-do.50840 > ir-pp.50841: UDP, length 148 20:56:30.630917 IP sf-do.50840 > ir-pp.50841: UDP, length 148 20:56:35.750965 IP sf-do.50840 > ir-pp.50841: UDP, length 148 ^C 9 packets captured 9 packets received by filter 0 packets dropped by kernel root@sf-do:~# nc -u ir-pp 50841 12 1234 123456 ^C root@sf-do:~# حامد صابر <[email protected]> در تاریخ جمعه ۳ سپتامبر ۲۰۲۱ ساعت ۸:۳۹ نوشت: > > Hi again, > Something new happened which makes me more confused. > I wrote a small shell-script to check the connection between wg1 and > wg2, so whenever it drops, the script restarts the wg1 and everything > comes back. > Yes yes I don't like this way of addressing issues, but what could I > do if no meaningful debug information exists, and I predict that it > might be a bug of Wireguard itself? > BTW this system worked fine and the anti-censorship VPN chain was up > and running till this morning. > The connection went down at 7:49 and didn't come back with > auto-restart. The process of restarting continued for about 6 minutes, > and at last at 7:55 it came back. > During the outage I checked both sides and everything seemed fine. > Sides could ping the public ip of each other, their wg udp ports were > accessible to each other, and even handshake seemed to be finished > correctly (using wg-show command) but peers couldn't ping each other. > And the most confusing part is everything came back to life without > any action from my side. Just after 6 minutes of continuously > restarting the wg1 interface! > Isn't there a bug? Somebody please help me to debug this problem and > find the cause. > > Here I bring you my shell-script code, and then its related log: > > ------------------------- > #!/bin/bash > > exec >>/var/log/wg-ping 2>&1 > while true > do > connection=$(ping -c 1 10.10.10.1) > time=$(date +%H:%M) > seconds=$(date +%S) > seconds=${seconds#0} > if [[ "$connection" != *"icmp"* ]]; then > echo " " > date > wg-quick down wg1 > echo " " > wg-quick up wg1 > connection=$(ping -c 1 10.10.10.1) > time=$(date +%T) > if [[ "$connection" != *"icmp"* ]]; then > echo "$time ERROR" > else > echo "$time OK" > echo " " > fi > elif [[ $seconds -lt 5 ]]; then > echo "$time OK" > fi > sleep 5 > done > =============== > Sample log of simply restarting the wg1 which makes everything fine > (and happens every few hours): > ----------------------- > 01:17 OK > 01:18 OK > 01:19 OK > 01:20 OK > 01:21 OK > 01:22 OK > 01:23 OK > 01:24 OK > 01:25 OK > 01:26 OK > > Fri Sep 3 01:26:41 +0430 2021 > [#] ip route del default dev wg1 table middle > [#] ip rule del iif wg0 lookup middle > [#] ip link delete dev wg1 > > [#] ip link add wg1 type wireguard > [#] wg setconf wg1 /dev/fd/63 > [#] ip -4 address add 10.10.10.2/32 dev wg1 > [#] ip link set mtu 1420 up dev wg1 > [#] ip -4 route add 10.10.10.1/32 dev wg1 > [#] ip route add default dev wg1 table middle > [#] ip rule add iif wg0 lookup middle > [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0 > 01:26:42 OK > > 01:27 OK > 01:28 OK > 01:30 OK > 01:31 OK > 01:32 OK > 01:33 OK > 01:34 OK > ===================================================== > And the log for the confusing situation I explained: > ----------------------- > 07:45 OK > 07:46 OK > 07:47 OK > 07:48 OK > 07:49 OK > > Fri Sep 3 07:49:42 +0430 2021 > [#] ip route del default dev wg1 table middle > [#] ip rule del iif wg0 lookup middle > [#] ip link delete dev wg1 > > [#] ip link add wg1 type wireguard > [#] wg setconf wg1 /dev/fd/63 > [#] ip -4 address add 10.10.10.2/32 dev wg1 > [#] ip link set mtu 1420 up dev wg1 > [#] ip -4 route add 10.10.10.1/32 dev wg1 > [#] ip route add default dev wg1 table middle > [#] ip rule add iif wg0 lookup middle > [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0 > 07:49:53 ERROR > > Fri Sep 3 07:50:08 +0430 2021 > [#] ip route del default dev wg1 table middle > [#] ip rule del iif wg0 lookup middle > [#] ip link delete dev wg1 > > [#] ip link add wg1 type wireguard > [#] wg setconf wg1 /dev/fd/63 > [#] ip -4 address add 10.10.10.2/32 dev wg1 > [#] ip link set mtu 1420 up dev wg1 > [#] ip -4 route add 10.10.10.1/32 dev wg1 > [#] ip route add default dev wg1 table middle > [#] ip rule add iif wg0 lookup middle > [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0 > 07:50:18 ERROR > > ====== > LOTS OF RETRY LOGS CROPPED > ====== > > Fri Sep 3 07:55:13 +0430 2021 > [#] ip route del default dev wg1 table middle > [#] ip rule del iif wg0 lookup middle > [#] ip link delete dev wg1 > > [#] ip link add wg1 type wireguard > [#] wg setconf wg1 /dev/fd/63 > [#] ip -4 address add 10.10.10.2/32 dev wg1 > [#] ip link set mtu 1420 up dev wg1 > [#] ip -4 route add 10.10.10.1/32 dev wg1 > [#] ip route add default dev wg1 table middle > [#] ip rule add iif wg0 lookup middle > [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0 > 07:55:23 ERROR > > Fri Sep 3 07:55:38 +0430 2021 > [#] ip route del default dev wg1 table middle > [#] ip rule del iif wg0 lookup middle > [#] ip link delete dev wg1 > > [#] ip link add wg1 type wireguard > [#] wg setconf wg1 /dev/fd/63 > [#] ip -4 address add 10.10.10.2/32 dev wg1 > [#] ip link set mtu 1420 up dev wg1 > [#] ip -4 route add 10.10.10.1/32 dev wg1 > [#] ip route add default dev wg1 table middle > [#] ip rule add iif wg0 lookup middle > [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0 > 07:55:39 OK > > 07:56 OK > 07:57 OK > 07:58 OK > 07:59 OK > 08:00 OK > --------------------------------------- > > > > حامد صابر <[email protected]> در تاریخ پنجشنبه ۲ سپتامبر ۲۰۲۱ ساعت > ۸:۴۷ نوشت: > > > > -Thanks for reply. > > For test reasons, I turned the firewall off on my middle-node server. > > But I can't understand how this issue may be related to firewall, > > because most of the time it works (and to me it means firewall is Ok), > > but sometime for some unknown reason it stops working, which when I > > restart the wg1 interface with this command everything comes back to > > life: > > wg-quick down wg1 && wg-quick up wg1 > > > > BTW here it is the firewall status of middle-node: > > ================== > > ● firewalld.service - firewalld - dynamic firewall daemon > > Loaded: loaded (/usr/lib/systemd/system/firewalld.service; > > disabled; vendor preset: enabled) > > Active: inactive (dead) > > Docs: man:firewalld(1) > > ----------------------------------- > > Chain INPUT (policy ACCEPT) > > target prot opt source destination > > > > Chain FORWARD (policy ACCEPT) > > target prot opt source destination > > > > Chain OUTPUT (policy ACCEPT) > > target prot opt source destination > > ----------------------------------- > > > > And the firewall status of exit-node: > > ==================== > > Unit firewalld.service could not be found. > > --------------------------- > > Chain INPUT (policy ACCEPT) > > target prot opt source destination > > ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:50842 > > ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:53 > > > > Chain FORWARD (policy ACCEPT) > > target prot opt source destination > > ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 > > ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 > > > > Chain OUTPUT (policy ACCEPT) > > target prot opt source destination > > --------------------- > > Regards > > > > John Lauro <[email protected]> در تاریخ چهارشنبه ۱ سپتامبر > > ۲۰۲۱ ساعت ۲۱:۵۱ نوشت: > > > > > > Just a guess, but I would be suspicious about connection tracking causing > > > the issue. What are your firewall rules? > > > > > > On Wed, Sep 1, 2021 at 9:51 AM حامد صابر <[email protected]> wrote: > > >> > > >> Dear friends, > > >> I have configured 3 wireguard interfaces on 2 servers to act as a > > >> chained VPN for me (to bypass the internet censorship in my country), > > >> with this schema: > > >> > > >> client -- wg0 on middle-node -- wg1 on middle node -- wg2 on exit node > > >> (to free internet) > > >> > > >> Everything works fine, but after a while, the connection between wg1 > > >> and wg2 drops and I can't find the reason. The connection comes back > > >> to action by simply switching the wg1 down and up again using > > >> wg-quick. And the amazing behaviour is that sometimes the connection > > >> comes back to work automatically after some random time passes, > > >> without any actions from my side (sometimes after a few tens of > > >> minutes, sometimes after a few hours). > > >> When the wg1-wg2 connection is not working, anything else between 2 > > >> servers (middle-node and exit-node) works fine. I mean I can ping the > > >> public IP of each server from another part, but the local wireguard ip > > >> of none of them are accessible. > > >> > > >> I tried to monitor the situation and read the logs but couldn't find > > >> what is happening here, so please help! > > >> > > >> The configuration: > > >> ====================== > > >> > > >> client (my mobile phone): > > >> ------------------------------------------- > > >> [Interface] > > >> Address = 10.10.20.2/32 > > >> PrivateKey = <private key of client> > > >> DNS = 10.10.10.1 > > >> > > >> ### Middle Node > > >> [Peer] > > >> PublicKey = <public key of wg0> > > >> PresharedKey = <preshared key> > > >> AllowedIPs = 0.0.0.0/0 > > >> Endpoint = middle-node:50842 > > >> ====================== > > >> > > >> wg0 (in middle-node server): > > >> ------------------------------------------- > > >> [Interface] > > >> Address = 10.10.20.1/24 > > >> ListenPort = 50842 > > >> PrivateKey = <private key of wg0> > > >> > > >> ### Client > > >> [Peer] > > >> PublicKey = <public key of client> > > >> PresharedKey = <preshared key> > > >> AllowedIPs = 10.10.20.2/32 > > >> ====================== > > >> > > >> wg1 (again in middle-node server): > > >> ------------------------------------------- > > >> [Interface] > > >> Address = 10.10.10.2/32 > > >> PrivateKey = <private key of wg1> > > >> > > >> PostUp = ip route add default dev wg1 table middle > > >> PostUp = ip rule add iif wg0 lookup middle > > >> PostUp = wg set wg1 peer <publickey of wg2 (in exit-node)> allowed-ips > > >> 0.0.0.0/0 > > >> > > >> PreDown = ip route del default dev wg1 table middle > > >> PreDown = ip rule del iif wg0 lookup middle > > >> > > >> ### Exit Node > > >> [Peer] > > >> PublicKey = <publickey of wg2 (in exit-node)> > > >> PresharedKey = <preshared key> > > >> AllowedIPs = 10.10.10.1/32 > > >> Endpoint = exit-node:50842 > > >> PersistentKeepalive = 25 > > >> ====================== > > >> > > >> wg2 (in exit-node server): > > >> ------------------------------------------- > > >> [Interface] > > >> Address = 10.10.10.1/24 > > >> ListenPort = 50842 > > >> PrivateKey = <private key of wg2> > > >> > > >> PostUp = iptables -A FORWARD -i eth0 -o wg2 -j ACCEPT > > >> PostUp = iptables -A FORWARD -i wg2 -j ACCEPT > > >> PostUp = iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE > > >> > > >> PostDown = iptables -D FORWARD -i eth0 -o wg2 -j ACCEPT > > >> PostDown = iptables -D FORWARD -i wg2 -j ACCEPT > > >> PostDown = iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE > > >> > > >> ### Middle Node > > >> [Peer] > > >> PublicKey = <publickey of wg1 (in middle-node)> > > >> PresharedKey = <preshared key> > > >> AllowedIPs = 10.0.0.0/8 > > >> ====================== > > >> ====================== > > >> ====================== > > >> > > >> Sample log of dmesg when the wg1-wg2 connection is not working: > > >> ------------------------------------------- > > >> [Wed Sep 1 11:19:32 2021] wireguard: wg1: Sending keepalive packet to > > >> peer 12 (~exit-node-ip~:50842) > > >> [Wed Sep 1 11:19:44 2021] wireguard: wg0: Sending keepalive packet to > > >> peer 8 (~client-ip~:65323) > > >> [Wed Sep 1 11:19:44 2021] wireguard: wg1: Receiving keepalive packet > > >> from peer 12 (~exit-node-ip~:50842) > > >> [Wed Sep 1 11:20:09 2021] wireguard: wg0: Receiving handshake > > >> initiation from peer 8 (~client-ip~:65323) > > >> [Wed Sep 1 11:20:09 2021] wireguard: wg0: Sending handshake response > > >> to peer 8 (~client-ip~:65323) > > >> [Wed Sep 1 11:20:09 2021] wireguard: wg0: Keypair 2867 destroyed for > > >> peer 8 > > >> [Wed Sep 1 11:20:09 2021] wireguard: wg0: Keypair 2871 created for peer > > >> 8 > > >> [Wed Sep 1 11:20:09 2021] wireguard: wg0: Receiving keepalive packet > > >> from peer 8 (~client-ip~:65323) > > >> [Wed Sep 1 11:21:19 2021] wireguard: wg0: Sending keepalive packet to > > >> peer 8 (~client-ip~:65323) > > >> [Wed Sep 1 11:21:24 2021] wireguard: wg1: Retrying handshake with > > >> peer 12 (~exit-node-ip~:50842) because we stopped hearing back after > > >> 15 seconds > > >> [Wed Sep 1 11:21:24 2021] wireguard: wg1: Sending handshake > > >> initiation to peer 12 (~exit-node-ip~:50842) > > >> [Wed Sep 1 11:21:30 2021] wireguard: wg1: Handshake for peer 12 > > >> (~exit-node-ip~:50842) did not complete after 5 seconds, retrying (try > > >> 2) > > >> ====================== > > >> > > >> Sample log of dmesg when the wg1-wg2 connection is coming back using > > >> manual restart: > > >> ------------------------------------------- > > >> [Wed Sep 1 11:45:52 2021] wireguard: wg1: Sending handshake > > >> initiation to peer 12 (~exit-node-ip~:50842) > > >> [Wed Sep 1 11:45:52 2021] wireguard: wg0: Sending keepalive packet to > > >> peer 8 (~client-ip~:2335) > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Handshake for peer 12 > > >> (~exit-node-ip~:50842) did not complete after 5 seconds, retrying (try > > >> 3) > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Sending handshake > > >> initiation to peer 12 (~exit-node-ip~:50842) > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Keypair 2878 destroyed for > > >> peer 12 > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Peer 12 > > >> (~exit-node-ip~:50842) destroyed > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Interface destroyed > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Interface created > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Peer 13 created > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Sending keepalive packet to > > >> peer 13 (~exit-node-ip~:50842) > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Sending handshake > > >> initiation to peer 13 (~exit-node-ip~:50842) > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Receiving handshake > > >> response from peer 13 (~exit-node-ip~:50842) > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Keypair 2881 created for peer > > >> 13 > > >> [Wed Sep 1 11:46:12 2021] wireguard: wg0: Receiving keepalive packet > > >> from peer 8 (~client-ip~:2335) > > >> [Wed Sep 1 11:46:14 2021] wireguard: wg1: Receiving keepalive packet > > >> from peer 13 (~exit-node-ip~:50842) > > >> [Wed Sep 1 11:46:27 2021] wireguard: wg0: Sending keepalive packet to > > >> peer 8 (~client-ip~:2335) > > >> [Wed Sep 1 11:46:28 2021] wireguard: wg1: Receiving keepalive packet > > >> from peer 13 (~exit-node-ip~:50842) > > >> [Wed Sep 1 11:46:52 2021] wireguard: wg1: Receiving keepalive packet > > >> from peer 13 (~exit-node-ip~:50842) > > >> > > >> > > >> Thanks in advance for your kind help
