> Fun fact: initial versions of WireGuard from years ago weren't like
> this. We wound up redoing some crypto and coming up with the `_psk2`
> variant for this purpose. I'm glad it's useful. I'm interested to
> learn: what are you doing this for? Got any code online?

That's a dangerous question to ask, because I'm really excited about it! It's an embedded device that connects otherwise-insecure stuff together into a transparent overlay network that's centrally configurable. You get a set of them in a box, plug one legacy thingamabob into each, and all the devices show up in a GUI. You configure all the IP allocations and allowed traffic flows, and an included hardware token signs the configuration.

You then throw the master key into a safe somewhere, confident in the knowledge that even if your network infrastructure is all broken into by nation-state-du-jour your traffic will stay confidential; even if the network you're using to communicate is re-addressed underneath you -- or your stuff is moved across the country -- none of your legacy stuff will need reconfiguration; and even if the legacy stuff itself is broken into, the network configuration is being enforced by hardware.

The boxes themselves have no persistent storage at all; in fact, possession of a private key the devices don't have is required to unlock the flash. The point is to resist malware infection by making assured remediation as simple as power-cycling the unit -- eventually, I'll have a dedicated microcontroller acting as a watchdog which shorts the reset pin to ground if the unit can't provide a TPM-backed health attestation every minute or so. The grand master plan is that as soon as you hack in and try to run something interesting, it all resets and you're back out again.

Of course, each unit can't be updated every time you need to add a new one, and that's where the LLAs and in-band authentication stuff comes in. New boxes use Zeroconf to find peers, after which they connect and present the certificate authorizing their `AllowedIPs` and appropriate firewall setup.
My goal is for every packet that comes out of each device except for ICMP, DHCP, and (m)DNS to be a WireGuard packet, cutting the attack surface to the bone -- and until you present a valid certificate, the only thing allowed inside the tunnel is TFTP.

Most of my code for this thing is fairly hacky and environment-specific at the moment -- that `wg-lla.sh` Gist from before is the first real piece I've been able to clean up and open-source. It's a fairly big project, but there are some more sections that I'm fairly certain I'll end up releasing as well; for example, the mesh-routing setup might be useful to some people. The principle is that by "brute-forcing" the MAC1 field from handshake initiations against the static public keys of all known peers, you can figure out which peer (or, rather, which peer's endpoint) to send the handshake and subsequent flow towards, which makes every node in the mesh a potential endpoint for any other peer in the mesh. There's also a microcontroller-compatible implementation of WireGuard I'm working on (though it's in the very early stages), targeted at the Cortex-M0 platform and written purely in `no_std`, `forbid(unsafe_code)` Rust. All this stuff integrates into one big product in the end, but I'm very open to hearing community feedback on which of these bits would be most useful to others -- I'll prioritize them. (And if you're really interested in any of it, it's all at least proof-of-concept, and I do contract work!)

By the way, putting the PSK exchange at the end is useful for another reason, too: you can use it to chain authentication mechanisms. The key here is that the PSK can be updated after sending an initiation packet, but before receiving the response. I've done an experiment using nfqueue on the initiator to catch an outgoing handshake request and stick an extra nonce on the end -- which is signed using a secondary key.
On the responder side, another nonce is chosen; a new PSK, calculated by hashing the nonces, is set using `wg`, and the initiator ID is noted. The handshake initiation is then released for processing, which occurs using the freshly-set PSK. When WireGuard sends the handshake response, nfqueue intercepts the outgoing packet, matches the initiator ID, and sticks the responder's nonce on the end, encrypted to the secondary key. The initiator intercepts the response with nfqueue, decrypts the second nonce, calculates the new PSK, and issues the same `wg set` command before releasing the response packet. This all works just fine (at least, as long as the daemon stays up), proving that you can do interactive authentication out-of-band. I've since realized that sticking the authentication I'm looking for inside the tunnel is a much better choice for my application, but I'm glad to have options.

> This sounds like a motivation for doing the LLv6 generation inside of
> your daemon, not inside of the kernel, right? In that case, your
> design must already take into account a malicious peer finding public
> key collisions after hashing.

I'm not actually looking for a feature here as much as I am a standard, and this definitely shouldn't go in the kernel. (Heck, I wrote a Blake2s implementation in Bash just so it wouldn't have to go any deeper than `wg-quick`.) That said, part of me would really like to see a command like `wg lla AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=` that spits out `fe8b:5ea9:9e65:3bc2:b593:db41:30d1:0a4e`. That would serve the dual purposes of avoiding running a hash algorithm in a shell script and serving as a standard. There aren't a lot of decisions to make when you sit down with the goal of making an LLA from a public key hash -- I listed them in my prior post, and I'm pretty sure the list is exhaustive -- and it would be a shame if we didn't have sensible defaults to adopt.
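
To make the `wg lla` idea concrete, here is a minimal Python sketch of one way to derive a link-local address from a public key. The specific choices are assumptions for illustration, not necessarily what `wg-lla.sh` does: the function name `lla_from_pubkey` is hypothetical, Blake2s is asked for a 16-byte digest directly (which is not the same as truncating a 32-byte digest), no key, salt, or personalization string is used, and the top 10 bits are simply overwritten with the `fe80::/10` prefix.

```python
import base64
import hashlib
import ipaddress


def lla_from_pubkey(pubkey_b64: str) -> ipaddress.IPv6Address:
    """Map a base64 WireGuard public key to an address in fe80::/10.

    Hypothetical helper: the hash parameters and masking below are
    assumptions, not a published standard.
    """
    key = base64.b64decode(pubkey_b64)
    if len(key) != 32:
        raise ValueError("expected a 32-byte Curve25519 public key")

    # Assumption: unkeyed Blake2s with a native 16-byte output.
    digest = bytearray(hashlib.blake2s(key, digest_size=16).digest())

    # Overwrite the top 10 bits with 1111111010 (fe80::/10),
    # keeping the remaining 118 bits of the hash.
    digest[0] = 0xFE
    digest[1] = 0x80 | (digest[1] & 0x3F)

    return ipaddress.IPv6Address(bytes(digest))


if __name__ == "__main__":
    # The all-zero key from the example command above.
    print(lla_from_pubkey("A" * 43 + "="))
```

Because the hash parameters here are guesses, the printed address should not be expected to match the example output above; pinning down exactly these parameters is what a standard would be for.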
As for security, a 256-bit ECC public key only gives 128 bits of security in the first place. Hashing the key down to 16 bytes doesn't hurt security, because it would take as much effort to find a key that collides with a specific target's hash as it would to just run Pollard's rho and crack the key you're trying to mess with. The compromise comes in when you start masking off bits, and losing 10 bits to fit into `fe80::/10` isn't actually that bad -- in fact, I argue that it's negligible, because finding a colliding keypair requires an ECC scalar multiplication to determine the public key associated with each private key guess. This easily takes more than 1024 times as long as running the hash itself, meaning that the process of finding a keypair that's a second-preimage of a desired 118-bit LLA suffix actually takes longer than brute-forcing a second-preimage of a 128-bit Blake2s hash. (For reference, Curve25519 takes [832457 cycles][1] for a single scalar multiplication; Blake2s on a single 64-byte block takes [5.5 cycles per byte][2], or 352 cycles. These numbers are from different microarchitectures, so it's kind of an apples-to-oranges thing, but we're talking orders of magnitude here.)

[1]: https://cr.yp.to/ecdh/curve25519-20051115.pdf
[2]: https://blake2.net/blake2.pdf
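
A back-of-the-envelope check of the comparison above, using only the two cycle counts already quoted. This is a rough sketch: the helper name `log2_work` is made up for illustration, and it assumes each keypair guess costs one scalar multiplication plus one hash, while each plain hash guess costs one 64-byte Blake2s block.

```python
import math

CURVE25519_CYCLES = 832457  # one scalar multiplication (figure quoted above)
BLAKE2S_CYCLES = 352        # one 64-byte block at 5.5 cycles per byte


def log2_work(search_bits: int, cycles_per_guess: int) -> float:
    """log2 of the total cycles needed to try 2**search_bits guesses."""
    return search_bits + math.log2(cycles_per_guess)


# Second-preimage of a 118-bit LLA suffix: every guess needs a scalar
# multiplication (private key -> public key) and then a hash.
keypair_search = log2_work(118, CURVE25519_CYCLES + BLAKE2S_CYCLES)

# Second-preimage of a full 128-bit Blake2s output: every guess is one hash.
hash_search = log2_work(128, BLAKE2S_CYCLES)

print(f"keypair search: about 2^{keypair_search:.1f} cycles")
print(f"hash search:    about 2^{hash_search:.1f} cycles")
```

With these inputs the keypair search comes out at roughly 2^137.7 cycles against roughly 2^136.5 for the plain hash search, so the 10 masked bits are more than paid for by the cost of the scalar multiplication, subject to the apples-to-oranges caveat about microarchitectures.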