Hi, I have a satellite ISP where bandwidth costs in the range of $10-100/MB
(that's MB, not GB).
There are also additional costs for each "connection", meaning it's more
efficient to transmit in bursts than as a gentle, intermittent trickle of
packets.
An additional limitation of the network is that it takes 5-15 seconds to start
passing packets once it's been idle for a while ("a while" is about 25
seconds). During that time wireguard will retransmit packets on a fixed 5 sec
interval. However, the network queues all these packets, so after say 15
seconds I might have 4+ rekey packets queued, which result in 4+ responses
from the far end. Such data is quite expensive on this network.
The device side is behind a network NAT and my goal is to keep a reverse
connection in place, ie the far end of the network can send packets back to
the device. So I enable Keepalive in the wireguard config.
A conceptual description would be: a hub and spoke arrangement of IOT devices
on the far end of a satellite internet link, where we want the hub to be able
to keep an open pipe and push commands to the IOT devices. The NAT UDP timeout
on the satellite link firewalls is measured at approx 3 minutes.
So I face the following challenges:
- keepalive packets ideally only need to be sent every 3 minutes, but the
  encryption rekey timers are set closer to 2 minutes. For example, setting
  the keepalive timer to 2 minutes leads to sending a 148 byte rekey packet at
  every interval. However, setting the keepalive timer just short of 2 minutes
  leads to 1 keepalive around the 2 min mark, followed by a rekey packet at
  the second 2 min mark, and so on.
- wireguard has a 2 minute-ish rekey timeout, which causes it to send a 148
  byte request and triggers a 92 byte response. However, as the retry interval
  is 5 seconds, this usually leads to sending 3-10x 148 byte requests (which
  are queued and retransmitted once the interface is up) and to an equal
  number of 92 byte responses.
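To put rough numbers on the second point, here is a minimal sketch of the
burst cost. It assumes one initiation is sent at t = 0 and another every retry
interval after that, that the network queues and eventually delivers every
one of them, and that each draws exactly one response. Timer jitter is
ignored, and the function names are made up for illustration (they are not
from the wireguard source).

```c
/* Handshake message sizes as quoted in this post (bytes). */
enum { INITIATION_BYTES = 148, RESPONSE_BYTES = 92 };

/*
 * Number of initiations sent while the link is still waking up, assuming one
 * at t = 0 and one more every retry_secs, all of them queued by the network.
 * retry_secs must be non-zero. Jitter is ignored.
 */
static unsigned queued_initiations(unsigned wakeup_secs, unsigned retry_secs)
{
	return wakeup_secs / retry_secs + 1;
}

/* Total bytes if every queued initiation draws exactly one response. */
static unsigned handshake_burst_bytes(unsigned wakeup_secs, unsigned retry_secs)
{
	return queued_initiations(wakeup_secs, retry_secs) *
	       (INITIATION_BYTES + RESPONSE_BYTES);
}
```

With a 15 second wake-up and the stock 5 second retry this gives 4 initiations
and 960 bytes for what should have been a single 240 byte exchange.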
So questions:
- Is it feasible within the design of wireguard to "debounce" a stream of
  rekey packets that arrive more or less back to back (at about 22kbit/s)?
  In particular, it's the replies that I want to queue, sending only the
  latest. I couldn't see that this was feasible from the code as it stands
  today. Suggestions appreciated though.
- Is it possible to adjust these constants?
  REKEY_AFTER_TIME = 120,
  REJECT_AFTER_TIME = 180,
  My concern looking at the code is that if I have some unmodified clients
  using the default settings, then it's not clear to me how they would respond
  if one side has passed the REJECT_AFTER_TIME interval and the other has not.
  (The intended scenario might be a hub and spoke of IOT clients on the
  satellite network, being accessed by other clients via the general internet.
  The IOT clients and hub server would be modified, but the other clients
  would be at defaults.)
  Can anyone comment on the implications of, say, altering only the client IOT
  devices to have REKEY/REJECT times closer to 30 minutes (ie the server
  remaining on defaults)?
- I implemented a very basic backoff on the resend of rekeys which better
  suits the characteristics of this network, eg first retry is not until after
  15 seconds, then it retries at 10, 15, 20, 25 sec interval after that.
  Usually this leads to very few retries for my network. Code is below, any
  comments?
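For anyone who wants to reason about the backoff without building the module,
here is a standalone sketch of the retransmit delay as I read the expression
in the patch to wg_timers_handshake_initiated(). It is only a model: jitter is
left out, EXTRA_DELAY is just a name I've given to the literal 5 in that
expression, and both helper functions are hypothetical.

```c
/* Values taken from the patch in this post; EXTRA_DELAY names the literal 5. */
enum { REKEY_TIMEOUT = 10, REKEY_BACKOFF = 5, EXTRA_DELAY = 5 };

/*
 * Seconds until the next retransmit, given the number of retries counted so
 * far. Mirrors the patched timer expression, without the random jitter.
 */
static unsigned retry_delay_secs(unsigned attempts)
{
	return REKEY_TIMEOUT + attempts * REKEY_BACKOFF + EXTRA_DELAY;
}

/*
 * Initiations sent in the window [0, wakeup_secs], counting the first one at
 * t = 0 and assuming the attempt counter goes up by one per retransmit.
 */
static unsigned initiations_during_wakeup(unsigned wakeup_secs)
{
	unsigned sent = 1, t = 0, attempts = 0;

	for (;;) {
		t += retry_delay_secs(attempts);
		if (t > wakeup_secs)
			break;
		++sent;
		++attempts;
	}
	return sent;
}
```

Under this model the delays come out as 15, 20, 25, ... seconds, so a 15
second wake-up sees at most 2 initiations rather than the 4 you would get from
a fixed 5 second retry.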
Results:
With these changes, and assuming a somewhat unreliable satellite network which
might not have coverage for some of the time (leading to additional
retransmits), I see theoretical monthly idle usage close to 3MB/month.
However, being able to increase the REKEY/REJECT times to 30 mins might drop
this by a factor of 10x or more. Can it be done?
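As a back-of-envelope check on the handshake part of that estimate, here is a
small sketch. It assumes an otherwise idle tunnel, exactly one 148 byte
initiation and one 92 byte response per rekey interval, no retransmits, and a
30 day month. Keepalives and any IP/UDP overhead are not counted, and the
function name is made up for illustration.

```c
/* Handshake sizes as quoted in this post, plus a 30 day month in seconds. */
enum {
	INITIATION_BYTES = 148,
	RESPONSE_BYTES = 92,
	SECS_PER_30_DAYS = 30 * 24 * 3600
};

/*
 * Bytes per 30 days spent on handshakes alone, assuming one clean handshake
 * every rekey_interval_secs (must be non-zero) and nothing else.
 */
static unsigned long handshake_bytes_per_month(unsigned long rekey_interval_secs)
{
	return (SECS_PER_30_DAYS / rekey_interval_secs) *
	       (INITIATION_BYTES + RESPONSE_BYTES);
}
```

Under these assumptions a handshake every 240 seconds (ie on every second
keepalive, as described above) costs 2,592,000 bytes per 30 days, every 120
seconds costs 5,184,000 bytes, and every 1800 seconds costs 345,600 bytes.
Keepalives would still be needed on top of that to hold the NAT mapping open.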
Thanks
Ed W
Patch:
--- a/src/messages.h 2021-09-06 16:24:47.121985094 +0000
+++ b/src/messages.h 2021-09-06 13:54:59.879700016 +0000
@@ -40,14 +40,15 @@
enum limits {
REKEY_AFTER_MESSAGES = 1ULL << 60,
REJECT_AFTER_MESSAGES = U64_MAX - COUNTER_WINDOW_SIZE - 1,
- REKEY_TIMEOUT = 5,
+ REKEY_TIMEOUT = 10,
+ REKEY_BACKOFF = 5,
REKEY_TIMEOUT_JITTER_MAX_JIFFIES = HZ / 3,
REKEY_AFTER_TIME = 120,
REJECT_AFTER_TIME = 180,
INITIATIONS_PER_SECOND = 50,
MAX_PEERS_PER_DEVICE = 1U << 20,
KEEPALIVE_TIMEOUT = 10,
- MAX_TIMER_HANDSHAKES = 90 / REKEY_TIMEOUT,
+ MAX_TIMER_HANDSHAKES = 5, /* 100 secs */
MAX_QUEUED_INCOMING_HANDSHAKES = 4096, /* TODO: replace this with DQL */
MAX_STAGED_PACKETS = 128,
MAX_QUEUED_PACKETS = 1024 /* TODO: replace this with DQL */
--- a/src/timers.c 2021-09-06 16:24:47.122985106 +0000
+++ b/src/timers.c 2021-09-06 16:27:41.050156437 +0000
@@ -64,7 +64,7 @@
++peer->timer_handshake_attempts;
pr_debug("%s: Handshake for peer %llu (%pISpfsc) did not complete after %d seconds, retrying (try %d)\n",
peer->device->dev->name, peer->internal_id,
- &peer->endpoint.addr, REKEY_TIMEOUT,
+ &peer->endpoint.addr, (REKEY_TIMEOUT + (peer->timer_handshake_attempts * REKEY_BACKOFF)),
peer->timer_handshake_attempts + 1);
/* We clear the endpoint address src address, in case this is
@@ -182,7 +182,7 @@
void wg_timers_handshake_initiated(struct wg_peer *peer)
{
mod_peer_timer(peer, &peer->timer_retransmit_handshake,
- jiffies + REKEY_TIMEOUT * HZ +
+ jiffies + (REKEY_TIMEOUT + (peer->timer_handshake_attempts * REKEY_BACKOFF) + 5) * HZ +
prandom_u32_max(REKEY_TIMEOUT_JITTER_MAX_JIFFIES));
}