Short version: if I set WG_ENDPOINT_RESOLUTION_RETRIES=infinity, I would like 
wg(8) to actually retry infinitely, rather than exiting the first time it gets 
what it assumes to be a permanent failure.

Long version:

When WG_ENDPOINT_RESOLUTION_RETRIES is set, wg will retry endpoint resolution 
failures, but it special-cases two or three getaddrinfo() error codes [0]: 
EAI_NONAME, EAI_FAIL and (if defined) EAI_NODATA. It considers these 
"permanent" failures that are not worth retrying, so it exits on the first one 
regardless of the retry setting.
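
To make the behavior concrete, here is a minimal sketch of the decision as I 
understand it from [0]. It is a simplified model, not the actual code from 
config.c: the function names is_treated_as_permanent() and should_retry() are 
made up for illustration, and a negative retry count stands in for "infinity".

```c
#include <netdb.h>
#include <stdbool.h>

/* Hypothetical helper (not a wireguard-tools function): true for the error
 * codes that this post says wg treats as permanent failures. */
static bool is_treated_as_permanent(int gai_err)
{
	if (gai_err == EAI_NONAME || gai_err == EAI_FAIL)
		return true;
#ifdef EAI_NODATA /* not defined on every platform */
	if (gai_err == EAI_NODATA)
		return true;
#endif
	return false;
}

/* Hypothetical model of the retry decision. *retries < 0 means "infinity".
 * Returns true if another resolution attempt should be made. */
static bool should_retry(int gai_err, int *retries)
{
	if (is_treated_as_permanent(gai_err))
		return false; /* gives up even when retries are infinite */
	if (*retries < 0)
		return true;
	if (*retries == 0)
		return false;
	--*retries;
	return true;
}
```

With this model, should_retry(EAI_NONAME, &retries) is false even when retries 
is -1, which matches what I see at boot.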

I have several WireGuard tunnels that are set to start at boot on a NixOS box I 
host. NixOS sets this variable to infinity for me [1]. Despite this, when I 
reboot that host, the tunnels consistently fail on startup. They fail with an 
error that wg(8) considers permanent:

Oct 29 19:08:48 mord kernel: wireguard: WireGuard 1.0.20200908 loaded. See www.wireguard.com for information.
Oct 29 19:08:48 mord wireguard-wg0-peer-eG9xbERdnkQrnrpnRrteQQ4zKn-Bi2WA2V2Y2X0UCl0--x3d-start[1046]: Name or service not known: `4184f12df2343796c155.freemyip.com:12345'
Oct 29 19:08:48 mord systemd[1]: wireguard-wg0-peer-eG9xbERdnkQrnrpnRrteQQ4zKn-Bi2WA2V2Y2X0UCl0\x3d.service: Main process exited, code=exited, status=1/FAILURE
Oct 29 19:08:48 mord wireguard-wg0-peer-CBdpnmSVnwIvgtj4M1g3LlCLm-wooeo--x2bs5AARyoPjxU--x3d-start[1048]: Name or service not known: `d67930b08f5396e21ae1.freemyip.com:12345'
Oct 29 19:08:48 mord systemd[1]: wireguard-wg0-peer-CBdpnmSVnwIvgtj4M1g3LlCLm-wooeo\x2bs5AARyoPjxU\x3d.service: Main process exited, code=exited, status=1/FAILURE
Oct 29 19:08:48 mord wireguard-wg0-peer-J4rZgtReGrTwTglP05wLQt1GniIfUV4o4zAqcO-b3AI--x3d-start[1047]: Name or service not known: `9fa2756baed60cb5f18e.freemyip.com:12345'
Oct 29 19:08:48 mord systemd[1]: wireguard-wg0-peer-J4rZgtReGrTwTglP05wLQt1GniIfUV4o4zAqcO-b3AI\x3d.service: Main process exited, code=exited, status=1/FAILURE

This host gets an IP from DHCP a few seconds later, and after that I can SSH in 
and manually start the WireGuard tunnels without issue.

The assumption that wg(8) makes - that EAI_NONAME / "name or service not known" 
is a permanent failure - may be true in some cases, but isn't true in mine: the 
same names resolve fine once the network is up.

I think it might also make sense, along with not special-casing those error 
codes, to lower the default number of retries to maybe 1 or 2 (instead of 15)? 
That would achieve the desired effect of not taking forever to fail if the 
error truly is permanent, but also allow use cases like mine where the tunnel 
is configured to start on boot and I want "the network will be up soon, trust 
me" retry behavior.
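
A rough sketch of what I have in mind, to show the idea rather than to propose 
an actual patch. The names parse_retries_proposed() and 
should_retry_proposed() and the default of 2 are assumptions for illustration 
(2 is just one of the values suggested above), and falling back to the default 
on a malformed value is my own choice here, not necessarily what wg does today.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical default, taken from the "1 or 2" suggested in this post. */
#define PROPOSED_DEFAULT_RETRIES 2

/* Hypothetical parser for a WG_ENDPOINT_RESOLUTION_RETRIES-style value.
 * NULL (unset) gives the default, "infinity" gives -1. */
static long parse_retries_proposed(const char *value)
{
	char *end;
	long n;

	if (!value)
		return PROPOSED_DEFAULT_RETRIES;
	if (!strcmp(value, "infinity"))
		return -1;
	n = strtol(value, &end, 10);
	/* Assumption: malformed or negative input falls back to the default. */
	if (end == value || *end != '\0' || n < 0)
		return PROPOSED_DEFAULT_RETRIES;
	return n;
}

/* Proposed decision: the error code no longer matters, only the budget.
 * *retries < 0 means "infinity". */
static bool should_retry_proposed(long *retries)
{
	if (*retries < 0)
		return true;
	if (*retries == 0)
		return false;
	--*retries;
	return true;
}
```

With the variable unset, a truly permanent failure would exit after two 
retries; with it set to infinity, wg would keep trying until the network comes 
up.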

0: https://git.zx2c4.com/wireguard-tools/tree/src/config.c#n245

1: 
https://github.com/NixOS/nixpkgs/blob/nixos-20.09/nixos/modules/services/networking/wireguard.nix#L280
