If relevant, here are some more details about the interfaces and routes from the good and the bad example, for comparison:
root@eng196-router:~# ip a sh wg0
46: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    link/none
root@eng196-router:~# ip r sh dev wg0
10.5.44.0/24 scope link
172.27.0.0/24 scope link
root@eng196-router:~# ip a sh br1
11: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 44:d9:e7:x:y:z brd ff:ff:ff:ff:ff:ff
    inet 10.29.85.100/24 brd 10.29.85.255 scope global br1
       valid_lft forever preferred_lft forever
    inet6 fe80::7c4c:1dff:fe84:fece/64 scope link
       valid_lft forever preferred_lft forever

root@zi1-router:~# ip a sh wg0
18: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    link/none
root@zi1-router:~# ip r sh dev wg0
10.5.44.0/24 scope link
172.27.0.0/24 scope link
root@zi1-router:~# ip a sh br1
12: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 74:83:c2:x:y:z brd ff:ff:ff:ff:ff:ff
    inet 10.34.0.100/24 brd 10.34.0.255 scope global br1
       valid_lft forever preferred_lft forever
    inet6 fe80::2c2e:76ff:fedc:d8e/64 scope link
       valid_lft forever preferred_lft forever

On 19.11.2021 at 00:40, Christoph Loesch wrote:
Hi,

I am using WireGuard on about 20 EdgeRouters (based on Debian stretch). Each router has exactly the same configuration (apart from the router IP addresses and the WireGuard keys/passphrases). It works very well on most of them, but on five routers WireGuard uses the wrong IP address for outgoing connections over the tunnel.

All routers use kernel 4.14.54-UBNT and wireguard-tools v1.0.20210914. The WireGuard Debian package is from github/WireGuard/wireguard-vyatta-ubnt.

On the problematic routers the public IP address is used for the tunnel instead of the private IP address. Interestingly, even in the bad example the wg tunnel is running and the server can reach the routers (= wg clients), but not the other way round.

In the following examples, 172.27.0.1 is the WireGuard server's internal IP address. The routers use IP addresses in the 10.0.0.0/8 range for the wg tunnel, and this range is allowed on the server.

I already debugged this with tcpdump, which showed that the wrong IP is used. But even in a simple ping you can see the wrong IP after the word "from".

Good example:

eng196-router:~$ \ping -I wg0 -c1 172.27.0.1
ping: Warning: source address might be selected on device other than wg0.
PING 172.27.0.1 (172.27.0.1) from 10.29.85.100 wg0: 56(84) bytes of data.
64 bytes from 172.27.0.1: icmp_seq=1 ttl=64 time=6.82 ms

--- 172.27.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 6.826/6.826/6.826/0.000 ms

Bad example:

zi1-router:~$ \ping -I wg0 -c1 172.27.0.1
ping: Warning: source address might be selected on device other than wg0.
PING 172.27.0.1 (172.27.0.1) from 78.41.x.y wg0: 56(84) bytes of data.

--- 172.27.0.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

Configurations:

eng196-router:~# wg
interface: wg0
  public key: SoV2obcH0qWfCRY3gZbkLNeMa1QRcnhNDCeiI9weszA=
  private key: (hidden)
  listening port: 58205

peer: 1syRMYD1jIVFMUMm5hF/j0MzjMQmuC5mlcT1VVugIkU=
  preshared key: (hidden)
  endpoint: 86.59.x.y:1024
  allowed ips: 172.27.0.0/24, 10.5.44.0/24
  latest handshake: 53 seconds ago
  transfer: 24.57 MiB received, 26.48 MiB sent
  persistent keepalive: every 25 seconds

zi1-router:~# wg
interface: wg0
  public key: aYtVhblpR0XSsAb/dXF3zM9Hu+LxlvrR5RWFU2psF3M=
  private key: (hidden)
  listening port: 45514

peer: 1syRMYD1jIVFMUMm5hF/j0MzjMQmuC5mlcT1VVugIkU=
  preshared key: (hidden)
  endpoint: 86.59.x.y:51820
  allowed ips: 172.27.0.0/24, 10.5.44.0/24
  latest handshake: 13 seconds ago
  transfer: 1.79 MiB received, 6.26 MiB sent
  persistent keepalive: every 25 seconds

What could cause the wrong selection? Why does it work on most routers but not on some? There must be some difference, or I guess something gets confused by specific IP addresses? How could I debug this further to find the difference and/or the cause of this problem?

Thanks for any hints and kind regards,
Christoph
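Since the wrong source address shows up after the word "from" in ping's header line, a check like the following could be run on each of the ~20 routers to spot the affected ones without reading every output by hand. This is a minimal POSIX sh sketch, assuming the iputils-style header format shown in the examples above; the helper names `ping_source` and `in_tunnel_range` are hypothetical, and the range check only handles dotted-quad IPv4 against 10.0.0.0/8 (the range described as allowed on the server).

```shell
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print the source
# address that ping reports after "from" in its header line, e.g.
#   PING 172.27.0.1 (172.27.0.1) from 10.29.85.100 wg0: 56(84) bytes of data.
# sed -n with the p flag prints only lines where the substitution matched,
# so reply and statistics lines are ignored.
ping_source() {
    sed -n 's/^PING [^ ]* ([^)]*) from \([^ ]*\) .*/\1/p'
}

# Hypothetical helper: succeed if "$1" is a dotted-quad IPv4 address in
# 10.0.0.0/8 (first octet exactly 10), fail otherwise.
in_tunnel_range() {
    case "$1" in
        10.*) return 0 ;;
        *)    return 1 ;;
    esac
}
```

On a router this could be used as `src=$(\ping -I wg0 -c1 172.27.0.1 | ping_source)` followed by `in_tunnel_range "$src" || echo "unexpected source: $src"`. Matching on the header line rather than on tcpdump output keeps the check to a single command that works the same on every router.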
