Hi Jason

Apologies for taking some time to get back to you.
We tried to verify a few things and to see whether we could spot anything
unusual, and waited for a few more instances to occur so that we had
sufficient data.

> That's surprising behavior. Thanks for debugging it. Can you see if
> you can reproduce with dynamic logging enabled? That'll give some
> useful information in dmesg:
>
>            # modprobe wireguard && echo module wireguard +p >
> /sys/kernel/debug/dynamic_debug/control

I enabled the debug control and also set
  sysctl -w net.core.message_cost=0
and have extracted a sample of the issue.
Please find it here: https://nem3d.net/wireguard_20210512a.txt
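
For anyone wanting to reproduce the capture, the steps can be sketched
roughly as below. This is only a sketch: the run/DRY_RUN helper and the
function names are illustrative, root is required, and debugfs is assumed
to be mounted at /sys/kernel/debug.

```shell
#!/bin/sh
# Sketch only: helper and function names are illustrative, not from the report.

# With DRY_RUN set, commands are printed to stderr instead of executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*" >&2
    else
        "$@"
    fi
}

enable_wg_debug() {
    run modprobe wireguard
    # Turn on the module's dynamic debug messages (assumes debugfs is mounted).
    run sh -c 'echo "module wireguard +p" > /sys/kernel/debug/dynamic_debug/control'
    # Disable rate limiting of kernel network messages so no lines are dropped.
    run sysctl -w net.core.message_cost=0
}

save_kernel_log() {
    # $1: destination file for the kernel log (placeholder path chosen by the caller).
    run dmesg > "$1"
}

# Example (as root):
#   enable_wg_debug
#   ... wait for the issue to occur ...
#   save_kernel_log /tmp/wireguard_debug.txt
```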

From my observation, the symptoms are always the same:
1. Everything is WORKING:
LXC container d1-h sends handshake initiation.
Host wg0 receives, re-creates keypair, answers
d1-h receives, re-creates keypair, sends keepalive
wg0 receives keepalive
etc.


2. At some point it BREAKS:
d1-h stops hearing back after 15 seconds.
Handshake initiation loop as described above.
d1-h stops hearing back after 15 seconds.
etc.

As mentioned, the resolution is to dump the config,
remove the peer, and syncconf to restore.
This time, I used "nsenter -n" to apply this procedure to the
unprivileged container interface d1-h.
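
Roughly, that procedure can be sketched as follows. The container PID,
peer public key, and config path are placeholders rather than the values
used here, and the run/DRY_RUN helper is only illustrative. Since
"nsenter -n" switches only the network namespace, the host's wg binary
and filesystem are used.

```shell
#!/bin/sh
# Sketch only: all argument values are placeholders supplied by the caller.

# With DRY_RUN set, commands are printed to stderr instead of executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*" >&2
    else
        "$@"
    fi
}

# reset_peer PID IFACE PEER_PUBKEY CONF_FILE
reset_peer() {
    pid=$1; iface=$2; peer=$3; conf=$4

    # 1. Dump the running config from inside the container's network namespace.
    #    (In dry-run mode this only creates an empty file.)
    run nsenter -t "$pid" -n wg showconf "$iface" > "$conf" || return 1

    # 2. Remove the affected peer.
    run nsenter -t "$pid" -n wg set "$iface" peer "$peer" remove || return 1

    # 3. Re-apply the saved config, which adds the peer back.
    run nsenter -t "$pid" -n wg syncconf "$iface" "$conf"
}

# Example (as root, placeholder values):
#   reset_peer 12345 d1-h '<peer public key>' /root/d1-h.conf
```

The dump is written before the peer is removed, and the function stops
early if the dump fails, so the peer is never dropped without a saved
config to restore from.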

Lastly, we also saw similar behavior between two physical hosts.
I will try to gather similar debug information for that case.

Please let me know if further information is needed to
better understand the problem.

Raoul
