Hi, I have a satellite ISP where bandwidth costs somewhere in the $10-100/MB
range (that's MB, not GB). There are also additional costs for each
"connection", meaning it's more efficient to transmit in bursts than as a
gentle, intermittent trickle of packets.

An additional limitation of the network is that it takes 5-15 seconds to start
passing packets once it's been idle for a while ("a while" is about 25
seconds). During that time wireguard will retransmit its rekey packets on a
fixed 5 sec interval. However, the network queues all of these packets, so
after, say, 15 seconds I might have 4+ rekey packets queued, which result in
4+ responses from the far end. Such data is quite expensive on this network.
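
To put a number on that burst, here is a rough sketch in plain userspace C
(not kernel code). The 148 and 92 byte sizes are the ones quoted further down
in this mail; the helper names are made up for illustration, and the small
random jitter on the retransmit timer is ignored.

```c
#include <assert.h>

/* Handshake initiation and response sizes quoted in this mail, in bytes. */
enum { INITIATION_BYTES = 148, RESPONSE_BYTES = 92 };

/*
 * Initiations sitting in the link's queue when it finally starts passing
 * traffic: the first one at t=0 plus one retransmit per full retry interval.
 * Hypothetical helper; timer jitter is not modelled.
 */
static int queued_initiations(int wakeup_secs, int retry_secs)
{
    assert(wakeup_secs >= 0 && retry_secs > 0);
    return 1 + wakeup_secs / retry_secs;
}

/* Bytes transferred if every queued initiation also draws a response. */
static int burst_bytes(int wakeup_secs, int retry_secs)
{
    return queued_initiations(wakeup_secs, retry_secs) *
           (INITIATION_BYTES + RESPONSE_BYTES);
}
```

With a 15 second wake-up and the 5 second retry, queued_initiations(15, 5) is
4 and burst_bytes(15, 5) is 960, for an exchange that needs only 240 bytes.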

The device side is behind a network NAT and my goal is to keep a reverse
connection in place, i.e. the far end of the network can send packets back to
the device. So I enable Keepalive in the wireguard config.

A conceptual description would be: hub and spoke arrangement of IOT devices on
the far end of a satellite internet link, where we want the hub to be able to
keep an open pipe and push commands to the IOT devices. The NAT UDP timeout on
the satellite link firewalls is measured at approx 3 minutes.


So I face the following challenges:

- I'd like keepalive packets to be sent about every 3 minutes, but the
encryption rekey timers are set closer to 2 minutes. For example, setting the
keepalive timer to 2 minutes leads to sending a 148 byte rekey packet on every
keepalive. However, setting the keepalive timer just short of 2 minutes leads
to a pattern of 1 keepalive around the 2 min mark, followed by a rekey packet
at the second 2 min mark, etc.

- wireguard has a 2 minute ish rekey timeout, which causes it to send a 148
byte request and trigger a 92 byte response. However, as the retry interval is
5 seconds, this usually leads to sending 3-10x 148 byte requests (which are
queued and retransmitted once the interface is up) and to an equal number of
92 byte responses.
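
The keepalive/rekey interaction in the first point can be sketched with a toy
model. This is only an approximation under stated assumptions: one side sends
all the keepalives, a rekey is triggered whenever a keepalive goes out on a
session that is at least REKEY_AFTER_TIME (120 s) old, and the rekey completes
immediately. The struct and function names are invented for this example.

```c
#include <assert.h>

/* Counters returned by the toy model below (hypothetical type). */
struct idle_counts {
    int keepalives; /* keepalive packets sent */
    int rekeys;     /* handshakes triggered by those keepalives */
};

/*
 * Walk an idle link for duration_secs, sending a keepalive every
 * keepalive_secs. A keepalive sent on a session aged rekey_after_secs or
 * more is assumed to trigger a handshake that resets the session age.
 */
static struct idle_counts simulate_idle(int keepalive_secs,
                                        int rekey_after_secs,
                                        int duration_secs)
{
    struct idle_counts c = { 0, 0 };
    int session_start = 0;

    assert(keepalive_secs > 0 && rekey_after_secs > 0);
    for (int t = keepalive_secs; t <= duration_secs; t += keepalive_secs) {
        c.keepalives++;
        if (t - session_start >= rekey_after_secs) {
            c.rekeys++;
            session_start = t; /* new session starts now */
        }
    }
    return c;
}
```

Over one hour, a 120 second keepalive gives 30 keepalives and 30 rekeys in
this model, while 110 seconds gives 32 keepalives but only 16 rekeys, which is
the alternating pattern described above.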


So questions:

- Is it feasible within the design of wireguard to "debounce" a stream of rekey
packets that arrive in quick succession (at about 22kbit/s)? In particular,
it's the replies that I want to queue so that only the latest is sent. I
couldn't see that this was feasible from the code as it stands today.
Suggestions appreciated though.


- Is it possible to adjust these constants?

     REKEY_AFTER_TIME = 120,
     REJECT_AFTER_TIME = 180,

My concern looking at the code is that if I have some unmodified clients using
the default settings, then it's not clear to me how they would respond if one
side has passed the REJECT_AFTER_TIME interval and the other has not. (The
intended scenario might be a hub and spoke of IOT clients on the satellite
network, being accessed by other clients via the general internet. The IOT
clients and hub server would be modified, but the other clients would be at
defaults.)

Can anyone comment on the implications of, say, altering only the client IOT
devices to have REKEY/REJECT times closer to 30 minutes (i.e. the server
remaining on defaults)?


- I implemented a very basic backoff on the resend of rekeys which better suits
the characteristics of this network, e.g. the first retry is not until after 15
seconds, then it retries at 10, 15, 20, 25 sec intervals after that. Usually
this leads to very few retries on my network. Code is below, any comments?
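
For reference, the retry schedule can be worked out from the formula in the
patch with a small userspace sketch (not kernel code). The constants are
copied from the patch; EXTRA_SECS stands for the literal "+ 5" added in
wg_timers_handshake_initiated(), and the helper names are hypothetical. It
assumes the attempt counter starts at 0 for a fresh handshake and ignores
jitter.

```c
#include <assert.h>

/* Values taken from the patch; EXTRA_SECS is the literal "+ 5" in timers.c. */
enum { REKEY_TIMEOUT = 10, REKEY_BACKOFF = 5, EXTRA_SECS = 5 };

/* Seconds until the next retransmit, given the retries made so far. */
static int retry_delay_secs(int attempts)
{
    assert(attempts >= 0);
    return REKEY_TIMEOUT + attempts * REKEY_BACKOFF + EXTRA_SECS;
}

/* Seconds after the first initiation at which retransmit n (1-based) is sent. */
static int retry_time_secs(int n)
{
    int total = 0;

    for (int i = 0; i < n; i++)
        total += retry_delay_secs(i);
    return total;
}
```

Evaluated literally, the successive intervals come out as 15, 20, 25, 30, 35
seconds, so the third retransmit goes out 60 seconds after the first
initiation. That is a little longer than the intervals quoted above, so it is
worth checking which of the two is intended.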


Results:

With these changes, and assuming a somewhat unreliable satellite network which
might not have coverage some of the time (leading to additional retransmits),
I see theoretical monthly idle usage close to 3MB/month. However, being able to
increase the REKEY/REJECT times to 30 mins might drop this by a factor of 10x
or more. Can it be done?
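
A back-of-envelope way to compare rekey intervals, as a userspace C sketch.
The assumptions are mine and deliberately crude: a 30-day month, one 148 + 92
byte handshake per interval, and nothing else on the wire (no keepalives, no
IP/UDP headers, no retransmits). monthly_bytes is a hypothetical helper name.

```c
#include <assert.h>

enum { SECONDS_PER_MONTH = 30 * 24 * 60 * 60 }; /* assumes a 30-day month */

/*
 * Bytes per month for an event of bytes_per_event that repeats every
 * period_secs. Only whole periods are counted.
 */
static long monthly_bytes(long period_secs, long bytes_per_event)
{
    assert(period_secs > 0 && bytes_per_event >= 0);
    return (SECONDS_PER_MONTH / period_secs) * bytes_per_event;
}
```

Under those assumptions, monthly_bytes(180, 148 + 92) is 3456000 bytes and
monthly_bytes(1800, 148 + 92) is 345600 bytes. Any keepalives still needed to
hold the NAT mapping open would come on top of the second figure.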

Thanks

Ed W


Patch:

--- a/src/messages.h    2021-09-06 16:24:47.121985094 +0000
+++ b/src/messages.h    2021-09-06 13:54:59.879700016 +0000
@@ -40,14 +40,15 @@
 enum limits {
     REKEY_AFTER_MESSAGES = 1ULL << 60,
     REJECT_AFTER_MESSAGES = U64_MAX - COUNTER_WINDOW_SIZE - 1,
-    REKEY_TIMEOUT = 5,
+    REKEY_TIMEOUT = 10,
+    REKEY_BACKOFF = 5,
     REKEY_TIMEOUT_JITTER_MAX_JIFFIES = HZ / 3,
     REKEY_AFTER_TIME = 120,
     REJECT_AFTER_TIME = 180,
     INITIATIONS_PER_SECOND = 50,
     MAX_PEERS_PER_DEVICE = 1U << 20,
     KEEPALIVE_TIMEOUT = 10,
-    MAX_TIMER_HANDSHAKES = 90 / REKEY_TIMEOUT,
+    MAX_TIMER_HANDSHAKES = 5, /* 100 secs */
     MAX_QUEUED_INCOMING_HANDSHAKES = 4096, /* TODO: replace this with DQL */
     MAX_STAGED_PACKETS = 128,
     MAX_QUEUED_PACKETS = 1024 /* TODO: replace this with DQL */
--- a/src/timers.c    2021-09-06 16:24:47.122985106 +0000
+++ b/src/timers.c    2021-09-06 16:27:41.050156437 +0000
@@ -64,7 +64,7 @@
         ++peer->timer_handshake_attempts;
         pr_debug("%s: Handshake for peer %llu (%pISpfsc) did not complete after %d seconds, retrying (try %d)\n",
              peer->device->dev->name, peer->internal_id,
-             &peer->endpoint.addr, REKEY_TIMEOUT,
+             &peer->endpoint.addr, (REKEY_TIMEOUT + (peer->timer_handshake_attempts * REKEY_BACKOFF)),
              peer->timer_handshake_attempts + 1);

         /* We clear the endpoint address src address, in case this is
@@ -182,7 +182,7 @@
 void wg_timers_handshake_initiated(struct wg_peer *peer)
 {
     mod_peer_timer(peer, &peer->timer_retransmit_handshake,
-               jiffies + REKEY_TIMEOUT * HZ +
+               jiffies + (REKEY_TIMEOUT + (peer->timer_handshake_attempts * REKEY_BACKOFF) + 5) * HZ +
                prandom_u32_max(REKEY_TIMEOUT_JITTER_MAX_JIFFIES));
 }



