If relevant, here are some more details about the interfaces and routes on the
good and the bad example, for comparison:

root@eng196-router:~# ip a sh wg0
46: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    link/none
root@eng196-router:~# ip r sh dev wg0
10.5.44.0/24 scope link
172.27.0.0/24 scope link
root@eng196-router:~# ip a sh br1
11: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 44:d9:e7:x:y:z brd ff:ff:ff:ff:ff:ff
    inet 10.29.85.100/24 brd 10.29.85.255 scope global br1
       valid_lft forever preferred_lft forever
    inet6 fe80::7c4c:1dff:fe84:fece/64 scope link
       valid_lft forever preferred_lft forever

root@zi1-router:~# ip a sh wg0
18: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    link/none
root@zi1-router:~# ip r sh dev wg0
10.5.44.0/24 scope link
172.27.0.0/24 scope link
root@zi1-router:~# ip a sh br1
12: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 74:83:c2:x:y:z brd ff:ff:ff:ff:ff:ff
    inet 10.34.0.100/24 brd 10.34.0.255 scope global br1
       valid_lft forever preferred_lft forever
    inet6 fe80::2c2e:76ff:fedc:d8e/64 scope link
       valid_lft forever preferred_lft forever

On 19.11.2021 at 00:40, Christoph Loesch wrote:
Hi,

I am using WireGuard on about 20 EdgeRouters (based on Debian stretch).
Each router has exactly the same configuration (apart from the router IP
addresses and the WireGuard keys/passphrases).
It works very well on most of them, but on five routers WireGuard uses the
wrong IP address for outgoing connections over the tunnel.
All routers use kernel 4.14.54-UBNT and wireguard-tools v1.0.20210914.
The WireGuard Debian package is from github/WireGuard/wireguard-vyatta-ubnt.

On the problematic routers the public IP address is used for the tunnel instead
of the private IP address.
Interestingly, even in the bad example the wg tunnel is running and the server
can reach the routers (= wg clients), but not the other way round.

In the following examples 172.27.0.1 is the WireGuard server's internal IP
address.
The routers use IP addresses in the 10.0.0.0/8 range for the wg tunnel, and
these are allowed on the server.
I have already debugged this with tcpdump, which is how I found out that the
wrong IP address is used.
But a simple ping also shows the wrong IP address, right after the word "from".
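
To repeat that check on every router without reading the output by eye, the
source address can be pulled out of the ping header and compared against the
10.0.0.0/8 range. This is only a minimal sketch: it assumes the
"PING ... from <address> <interface>:" header format seen in the examples
below, and the function names ping_source and in_tunnel_range are hypothetical
helpers, not part of ping or wg.

```shell
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print the source
# address that follows the word "from" in the "PING ..." header line.
ping_source() {
    sed -n 's/^PING .* from \([^ ]*\) .*/\1/p'
}

# Hypothetical helper: report whether an IPv4 address lies in 10.0.0.0/8,
# the range the routers are expected to use for the wg tunnel.
in_tunnel_range() {
    case "$1" in
        10.*) echo "inside 10.0.0.0/8" ;;
        *)    echo "outside 10.0.0.0/8" ;;
    esac
}

# Demonstration on the header line from the good example.
header='PING 172.27.0.1 (172.27.0.1) from 10.29.85.100 wg0: 56(84) bytes of data.'
src=$(printf '%s\n' "$header" | ping_source)
echo "$src: $(in_tunnel_range "$src")"
# prints "10.29.85.100: inside 10.0.0.0/8"
```

On a router the live input would come from the same command as in the examples,
i.e. ping -I wg0 -c1 172.27.0.1 | ping_source.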

Good example:
eng196-router:~$ \ping -I wg0 -c1 172.27.0.1
ping: Warning: source address might be selected on device other than wg0.
PING 172.27.0.1 (172.27.0.1) from 10.29.85.100 wg0: 56(84) bytes of data.
64 bytes from 172.27.0.1: icmp_seq=1 ttl=64 time=6.82 ms
--- 172.27.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 6.826/6.826/6.826/0.000 ms

Bad example:
zi1-router:~$ \ping -I wg0 -c1 172.27.0.1
ping: Warning: source address might be selected on device other than wg0.
PING 172.27.0.1 (172.27.0.1) from 78.41.x.y wg0: 56(84) bytes of data.
--- 172.27.0.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

Configurations:
eng196-router:~# wg
interface: wg0
  public key: SoV2obcH0qWfCRY3gZbkLNeMa1QRcnhNDCeiI9weszA=
  private key: (hidden)
  listening port: 58205
peer: 1syRMYD1jIVFMUMm5hF/j0MzjMQmuC5mlcT1VVugIkU=
  preshared key: (hidden)
  endpoint: 86.59.x.y:1024
  allowed ips: 172.27.0.0/24, 10.5.44.0/24
  latest handshake: 53 seconds ago
  transfer: 24.57 MiB received, 26.48 MiB sent
  persistent keepalive: every 25 seconds

zi1-router:~# wg
interface: wg0
  public key: aYtVhblpR0XSsAb/dXF3zM9Hu+LxlvrR5RWFU2psF3M=
  private key: (hidden)
  listening port: 45514
peer: 1syRMYD1jIVFMUMm5hF/j0MzjMQmuC5mlcT1VVugIkU=
  preshared key: (hidden)
  endpoint: 86.59.x.y:51820
  allowed ips: 172.27.0.0/24, 10.5.44.0/24
  latest handshake: 13 seconds ago
  transfer: 1.79 MiB received, 6.26 MiB sent
  persistent keepalive: every 25 seconds

What could cause the wrong source address selection?
Why does it work on most routers but not on some? There must be some
difference, or something gets confused by specific IP addresses, I guess?
How could I debug this further to find the difference and/or the cause of this
problem?

Thanks for any hints and kind regards,
Christoph
