Hi,

I was able to fix this issue, at least temporarily, by adding the correct src IP
to the routes, as shown in the following example.
Now I just don't fully understand what causes WireGuard to select the IP from
the wrong interface, or on what basis it selects the IP in the first place.
Even ping gives the warning: "ping: Warning: source address might be selected on
device other than wg0."
That warning goes away when the routes have the correct IP set as src.
-> So I can definitely say that WireGuard somehow selects the wrong IP for
outgoing packets.
I just don't know why this happens on only 5 out of more than 20 devices with
the same configuration.

ip route del <NET>
ip route add <NET> dev <ALIAS_DEV> src <SRC_IP>
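A minimal sketch for applying the same workaround to every route on a device at
once, assuming a POSIX shell and iproute2. `pin_src_cmds` is a hypothetical
helper, not part of wireguard-tools or iproute2. It only prints
`ip route replace` commands so they can be reviewed first; `ip route replace`
changes an existing route in place instead of the del/add pair above.

```shell
#!/bin/sh
# Hypothetical helper (not part of wireguard-tools or iproute2).
# Reads `ip route show dev DEV` output on stdin and prints one
# `ip route replace` command per route, pinning the source address.
# It only prints the commands; nothing changes until they are run as root.
pin_src_cmds() {
    dev="$1"   # outgoing device, e.g. wg0
    addr="$2"  # source address to pin, e.g. the br1 address
    # The first field of each route line is the destination prefix;
    # the "|| [ -n ... ]" also handles a last line without a newline.
    while read -r net _rest || [ -n "$net" ]; do
        [ -n "$net" ] || continue
        printf 'ip route replace %s dev %s src %s\n' "$net" "$dev" "$addr"
    done
}
```

On zi1-router, `ip route show dev wg0 | pin_src_cmds wg0 10.34.0.100` would
print the two commands for 10.5.44.0/24 and 172.27.0.0/24; piping that output
to `sh` as root would apply them. Like the manual fix, this is presumably lost
whenever the routes are recreated, so it stays a workaround.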

Information about the interfaces:
root@zi1-router:~# ip a sh wg0
18: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    link/none
root@zi1-router:~# ip -4 a sh br0
11: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 78.41.x.y/32 scope global br0
       valid_lft forever preferred_lft forever
root@zi1-router:~# ip -4 a sh br1
12: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 10.34.0.100/24 brd 10.34.0.255 scope global br1
       valid_lft forever preferred_lft forever

root@zi1-router:~# ip r sh dev wg0
10.5.44.0/24 scope link
172.27.0.0/24 scope link
root@zi1-router:~# ip r d 172.27.0.0/24
root@zi1-router:~# ip r d 10.5.44.0/24
root@zi1-router:~# ip r a 172.27.0.0/24 dev wg0 src 10.34.0.100
root@zi1-router:~# ip r a 10.5.44.0/24 dev wg0 src 10.34.0.100
root@zi1-router:~# ip r sh dev wg0
10.5.44.0/24 scope link src 10.34.0.100
172.27.0.0/24 scope link src 10.34.0.100

root@zi1-router:~# ping 172.27.0.1
PING 172.27.0.1 (172.27.0.1) 56(84) bytes of data.
64 bytes from 172.27.0.1: icmp_seq=1 ttl=64 time=13.1 ms

Kind regards,
Christoph

On 19.11.2021 at 01:11, Christoph Loesch wrote:
In case it is relevant, here are some more details about the interfaces and
routes from a good and a bad example for comparison:

root@eng196-router:~# ip a sh wg0
46: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    link/none
root@eng196-router:~# ip r sh dev wg0
10.5.44.0/24 scope link
172.27.0.0/24 scope link
root@eng196-router:~# ip a sh br1
11: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 44:d9:e7:x:y:z brd ff:ff:ff:ff:ff:ff
    inet 10.29.85.100/24 brd 10.29.85.255 scope global br1
       valid_lft forever preferred_lft forever
    inet6 fe80::7c4c:1dff:fe84:fece/64 scope link
       valid_lft forever preferred_lft forever

root@zi1-router:~# ip a sh wg0
18: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    link/none
root@zi1-router:~# ip r sh dev wg0
10.5.44.0/24 scope link
172.27.0.0/24 scope link
root@zi1-router:~# ip a sh br1
12: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 74:83:c2:x:y:z brd ff:ff:ff:ff:ff:ff
    inet 10.34.0.100/24 brd 10.34.0.255 scope global br1
       valid_lft forever preferred_lft forever
    inet6 fe80::2c2e:76ff:fedc:d8e/64 scope link
       valid_lft forever preferred_lft forever
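A hedged guess at the cause, not verified on these routers: as far as I
understand the Linux kernel's `inet_select_addr()` (net/ipv4/devinet.c), when a
route has no `src` and the outgoing device (here wg0) has no IPv4 address of its
own, the kernel falls back to the first suitable primary address it finds while
walking all interfaces in device-list order. In the listings above, br1 is
interface 11 on eng196-router but interface 12 on zi1-router, where br0
(interface 11) carries the public /32 address. The full address list of the
good router is not shown, so this is unconfirmed. The sketch below models only
that fallback step, assuming a POSIX shell with awk, `ip -4 -o addr show` output
as input, listing order as a stand-in for the kernel's device-list order, and
"scope global" as a stand-in for the kernel's exact scope check.
`first_global_addr` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper modelling (not reproducing) the kernel's fallback
# source selection for a device without its own IPv4 address.
# Input: `ip -4 -o addr show` lines, e.g.
#   11: br0    inet 78.41.x.y/32 scope global br0 ...
# Output: the first primary, global-scope address not on lo, in listing order.
first_global_addr() {
    awk '$2 != "lo" && / scope global / && !/ secondary / {
        split($4, a, "/")   # strip the prefix length from "ADDR/LEN"
        print a[1]
        exit
    }'
}
```

Running `ip -4 -o addr show | first_global_addr` on each router would print the
address this model predicts; if it matches the public address only on the five
bad routers, that would support the guess.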

On 19.11.2021 at 00:40, Christoph Loesch wrote:
Hi,

I am using WireGuard on about 20 EdgeRouters (based on Debian stretch).
Each router has exactly the same configuration (apart from the router IP
addresses and the WireGuard keys/passphrases).
It works very well on most of them, but on five routers WireGuard uses the
wrong IP address for outgoing connections over the tunnel.
All routers use kernel 4.14.54-UBNT and wireguard-tools v1.0.20210914.
The WireGuard Debian package is from github/WireGuard/wireguard-vyatta-ubnt.

On the problematic routers, the public IP address is used for the tunnel
instead of the private IP address.
Interestingly, even in the bad example the WireGuard tunnel is up and the
server can reach the routers (= WireGuard clients), but not the other way round.

In the following examples, 172.27.0.1 is the WireGuard server's internal IP
address.
The routers use IP addresses in the 10.0.0.0/8 range for the WireGuard tunnel,
which are allowed on the server.
I have already debugged this with tcpdump, which showed that the wrong IP is
used.
But even a simple ping shows the wrong IP after the word "from".
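`ip route get` asks the kernel which route and source address it would use for a
destination without sending any packet, so it may be a quicker check than ping
or tcpdump. A minimal sketch, assuming a POSIX shell with awk and iproute2;
`route_src` is a hypothetical helper that extracts the `src` field, and the
exact output layout of `ip route get` can vary between iproute2 versions.

```shell
#!/bin/sh
# Hypothetical helper: reads `ip route get ...` output on stdin and prints
# the value that follows the first "src" keyword (the selected source address).
route_src() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "src") { print $(i + 1); exit }
    }'
}
```

For example, `ip route get 172.27.0.1 oif wg0 | route_src`, where `oif wg0` is
meant to roughly mirror `ping -I wg0`, would be expected to print 10.29.85.100
on a good router and 78.41.x.y on a bad one if the kernel's choice is the issue.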

Good example:
eng196-router:~$ \ping -I wg0 -c1 172.27.0.1
ping: Warning: source address might be selected on device other than wg0.
PING 172.27.0.1 (172.27.0.1) from 10.29.85.100 wg0: 56(84) bytes of data.
64 bytes from 172.27.0.1: icmp_seq=1 ttl=64 time=6.82 ms
--- 172.27.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 6.826/6.826/6.826/0.000 ms

Bad example:
zi1-router:~$ \ping -I wg0 -c1 172.27.0.1
ping: Warning: source address might be selected on device other than wg0.
PING 172.27.0.1 (172.27.0.1) from 78.41.x.y wg0: 56(84) bytes of data.
--- 172.27.0.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

Configurations:
eng196-router:~# wg
interface: wg0
  public key: SoV2obcH0qWfCRY3gZbkLNeMa1QRcnhNDCeiI9weszA=
  private key: (hidden)
  listening port: 58205
peer: 1syRMYD1jIVFMUMm5hF/j0MzjMQmuC5mlcT1VVugIkU=
  preshared key: (hidden)
  endpoint: 86.59.x.y:1024
  allowed ips: 172.27.0.0/24, 10.5.44.0/24
  latest handshake: 53 seconds ago
  transfer: 24.57 MiB received, 26.48 MiB sent
  persistent keepalive: every 25 seconds

zi1-router:~# wg
interface: wg0
  public key: aYtVhblpR0XSsAb/dXF3zM9Hu+LxlvrR5RWFU2psF3M=
  private key: (hidden)
  listening port: 45514
peer: 1syRMYD1jIVFMUMm5hF/j0MzjMQmuC5mlcT1VVugIkU=
  preshared key: (hidden)
  endpoint: 86.59.x.y:51820
  allowed ips: 172.27.0.0/24, 10.5.44.0/24
  latest handshake: 13 seconds ago
  transfer: 1.79 MiB received, 6.26 MiB sent
  persistent keepalive: every 25 seconds

What could cause the wrong selection?
Why does it work on most routers but not on some? There must be some
difference, or maybe something gets confused by specific IP addresses?
How could I debug this further to find the difference and/or the cause of this
problem?

Thanks for any hints and kind regards,
Christoph
