Re: [RFC] wiregard RX packet processing.

2022-01-04 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Hi Sebastian,
>
> Seems like you've identified two things, the use of need_resched, and
> potentially surrounding napi_schedule in local_bh_{disable,enable}.
>
> Regarding need_resched, I pulled that out of other code that seemed to
> have the "same requirements", as vaguely conceived. It indeed might
> not be right. The intent is to have that worker running at maximum
> throughput for extended periods of time, but not preventing other
> threads from running elsewhere, so that, e.g., a user's machine
> doesn't have a jenky mouse when downloading a file.
>
> What are the effects of unconditionally calling cond_resched() without
> checking for if (need_resched())? Sounds like you're saying none at
> all?

I believe so: AFAIU, you use need_resched() if you need to do some kind
of teardown before the schedule point, like this example I was recently
looking at:

https://elixir.bootlin.com/linux/latest/source/net/bpf/test_run.c#L73

If you just need to maybe reschedule, you can just call cond_resched()
and it'll do what it says on the tin: do a schedule if needed, and
return immediately otherwise.

> Regarding napi_schedule, I actually wasn't aware that it's requirement
> to _only_ ever run from softirq was a strict one. When I switched to
> using napi_schedule in this way, throughput really jumped up
> significantly. Part of this indeed is from the batching, so that the
> napi callback can then handle more packets in one go later. But I
> assumed it was something inside of NAPI that was batching and
> scheduling it, rather than a mistake on my part to call this from a wq
> and not from a softirq.
>
> What, then, are the effects of surrounding that in
> local_bh_{disable,enable} as you've done in the patch? You mentioned
> one aspect is that it will "invoke wg_packet_rx_poll() where you see
> only one skb." It sounds like that'd be bad for performance, though,
> given that the design of napi is really geared toward batching.

Heh, I wrote a whole long explanation he about variable batch sizes
because you don't control when the NAPI is scheduled, etc... And then I
noticed the while loop is calling ptr_ring_consume_bh(), which means
that there's already a local_bh_disable/enable pair on every loop
invocation. So you already have this :)

Which of course raises the question of whether there's anything to gain
from *adding* batching to the worker? Something like:

#define BATCH_SIZE 8
void wg_packet_decrypt_worker(struct work_struct *work)
{
struct crypt_queue *queue = container_of(work, struct multicore_worker,
 work)->ptr;
void *skbs[BATCH_SIZE];
bool again;
int i;

restart:
local_bh_disable();
ptr_ring_consume_batched(&queue->ring, skbs, BATCH_SIZE);

for (i = 0; i < BATCH_SIZE; i++) {
struct sk_buff *skb = skbs[i];
enum packet_state state;

if (!skb)
break;

state = likely(decrypt_packet(skb, PACKET_CB(skb)->keypair)) ?
PACKET_STATE_CRYPTED : PACKET_STATE_DEAD;
wg_queue_enqueue_per_peer_rx(skb, state);
}

again = !ptr_ring_empty(&queue->ring);
local_bh_enable();

if (again) {
cond_resched();
goto restart;
}
}


Another thing that might be worth looking into is whether it makes sense
to enable threaded NAPI for Wireguard. See:
https://lore.kernel.org/r/20210208193410.3859094-1-wei...@google.com

-Toke


Re: [RFC PATCH 0/4] Introduce per-peer MTU setting

2022-01-04 Thread Toke Høiland-Jørgensen
leon@is.currently.online writes:

> From: Leon Schuermann 
>
> This patch series is an attempt to integrate a per-peer MTU setting
> into WireGuard. With matching changes to the wireguard-tools,
> individual MTU values can be set and retrieved for each registered
> peer.
>
> While Linux supports setting an MTU metric for specific FIB route
> entries [which I've only found out after implementing this :)], and
> thus allows to lower the MTU for individual peers, this appears to
> disable regular path MTU discovery (PMTUD) entirely on the
> route. While regular PMTUD does not work over the tunnel link, it
> should still be usable on the rest of the route.

I'm not sure I understand the use case? Either PMTUD works through the
tunnel and you can just let that do its job, or it doesn't and you have
to do out-of-band discovery anyway in which case you can just use the
FIB route MTU? Or what do you mean by "usable on the rest of the route"?

> Furthermore, with the goal of eventually introducing an in-band
> per-peer PMTUD mechanism, keeping an internal per-peer MTU value does
> not require modifying the FIB and thus potentially interfere with
> userspace.

What "in-band per-peer PMTUD mechanism"? And why does it need this?

-Toke


Re: eBPF + IPv6 + WireGuard

2021-12-23 Thread Toke Høiland-Jørgensen
Alex  writes:

> Hi all,
>
> I am championing WireGuard at work, and I have been granted permission
> to use it for establishing remote access to a private IPv6 VLAN for all
> employees. I have experimented with different approaches and ran in to
> an issue.
>
> With the help of a machine dedicated fully to the job of remote access
> (Ubuntu 20.04 / Linux 5.4), I've managed to establish end-to-end
> connectivity between my work laptop and the servers. The VPN gateway
> has a wg0 interface for employees and a "private0" interface for
> private VLAN connectivity. The goal is to link the two.
>
> I noticed an odd quirk: It only works when
> "net.ipv6.conf.all.forwarding" is 1. If this value is 0,
> "net.ipv6.conf.[wg0 | private0].forwarding" has no effect. In other
> words, I cannot seem to enable forwarding for *only* the interfaces
> that need it. It only works when forwarding is enabled for *all*
> interfaces. This is a problem because when the "all" value is set to 1,
> the machine will start to behave as a router on VLANs for which it is
> most definitely not a router.
>
> For the servers, I am using the officially defined[0] subnet-router
> anycast address as the default IPv6 gateway, and it works well.
> However, when I flipped the switch on "net.ipv6.conf.all.forwarding",
> the VPN gateway started using NDP to announce that it was a router
> across *all* VLANs! Specifically, it was claiming ownership over our
> global subnet anycast address 2001:::::. This is neither
> true nor desired. The VPN gateway started to receive outbound non-VPN
> Internet traffic which broke Internet connectivity for all servers. 
>
> This leads me to my first question: Why does
> "net.ipv6.conf.wg0.forwarding" have no effect?

That's how forwarding works in the kernel for IPv6:

https://elixir.bootlin.com/linux/latest/source/Documentation/networking/ip-sysctl.rst#L1981

I.e., there *is* no per-interface forwarding setting, you'll need to
setup netfilter rules or some other mechanism to control which
interfaces packets will be forwarded to/from. The per-interface
"forwarding" attribute only controls things like whether the kernel will
accept router advertisements on that interface. Yes, this is a bit
confusing (and also different from IPv4).

The isRouter flag in neighbour advertisements should be controllable via
the per-interface 'forwarding' setting; the default changes when you
enable the global setting, but I think it should be possible to re-set
it? See:

https://elixir.bootlin.com/linux/latest/source/Documentation/networking/ip-sysctl.rst#L2148

> In researching a solution, I decided to give eBPF a try. I attached an
> XDP program to "private0" (a layer 2 device, naturally) and
> successfully redirected packets to "wg0" (a layer 3 device) with
> bpf_redirect[1]. If the packets are unmodified, WireGuard will happily
> pass them to the remote hosts. This is an issue because the remote
> hosts are expecting to receive IP(v6) packets, not Ethernet frames. If
> I use bpf_xdp_adjust_head[1] to strip off the Ethernet frame, WireGuard
> will drop the packet[2] before it can be sent to the remote host.
>
> My theory is that bpf_xdp_adjust_head is modifying the data pointer
> only and not any underlying structures that may be associated with the
> packet (sk_buff perhaps).
>
> This leads me to my second question: Why can't I redirect traffic
> received on an L2 interface to an L3 interface, *even after stripping
> off the Ethernet frame?*

I assume you were using generic XDP here? Anyway, XDP is layer2-only by
definition, so there's not really any way to achieve what you're trying
to do with XDP.

> Finally, during this whole process I was using WireShark to inspect
> traffic received by the remote host (i.e. my work laptop). With the
> help of the extract-handshakes.sh script, I was able to decrypt traffic.
> I did discover a bug though. I believe "index_hashtable_insert" should
> actually be "wg_index_hashtable_insert"[3].
>
> Does anyone have any insights as to what a proper solution would look
> like? Is there a way to achieve my goal without introducing eBPF? Is
> XDP completely unsuited for this particular purpose? Do I actually need
> to operate on the sk_buff level, as opposed to the xdp_buff level?

You could maybe do something with TC-BPF, but why can't you just enable
net.ipv6.conf.all.forwarding and just use netfilter to control which
interfaces packets will be forwarded across? That seems like it would be
the simplest solution...

-Toke


Re: Cannot add wg0 on CentOS 8

2021-12-11 Thread Toke Høiland-Jørgensen
Phil Perry  writes:

> On 09/12/2021 10:47, Mehdi Haghgoo wrote:
>> Hi,
>> 
>> I'm trying to add  a wireguard device on CentOS Stream (was Linux just 
>> converted to Stream).
>> When I enter "ip link add wg0 type wireguard" or try it using wg-quick wg0 
>> up, I get the following error:
>> 
>> RTNETLINK answers: Operation not supported
>> 
>> I have wireguard-tools and kmod-wireguard installed.
>> Additional information:
>> 
>> uname -r: 4.18.0-147.8.1.el8_1.x86_64
>> 
>> wg --version: wireguard-tools v1.0.20210914 - 
>> https://git.zx2c4.com/wireguard-tools/
>> 
>> kmod-wireguard version: 1.0.20211208
>> 
>> 
>> Best regards,
>> Mehdi
>
> Is the wireguard module loaded? I doubt it.
>
> You are running a *very* old kernel, and I suspect the version of 
> kmod-wireguard you are running does not support that kernel. Why are you 
> running a kernel that is two years out of date and full of unpatched 
> security holes?

RHEL (and thus CentOS) kernel version numbers are a complete fiction, so
don't mind the '4.18'. The important bit is the '-147' at the end, which
is still way out of date, though...

> Updating your system is probably the first step to fixing your issue.

My guess would be that he actually did update the system, it's just
running the old kernel still? I.e., if you install kmod-wireguard from
elrepo, that depends on a specific version of the kernel (-348 in the
current version), but if you haven't rebooted to actually run that
kernel, the module is obviously not going to work...

So, erm, Mehdi, have you tried turning it off and back on again? ;)

-Toke


Re: Source IP for multihomed peer

2021-10-15 Thread Toke Høiland-Jørgensen
> 2) Is there any way to force the source ip of the connection from boxA 
> to always use address boxA1 ?

In theory this should be possible to enforce via policy routing. Just
tried this on a simple veth setup:

# ip a add 10.11.1.1/24 dev veth0
# ip a add 10.11.2.1/24 dev veth0
# ping 10.11.1.2 -c 1
12:09:22.385888 IP 10.11.1.1 > 10.11.1.2: ICMP echo request, id 15, seq 1, 
length 64
12:09:22.385903 IP 10.11.1.2 > 10.11.1.1: ICMP echo reply, id 15, seq 1, length 
64

# ip r add 10.11.1.2 src 10.11.2.1 dev veth0
# ping 10.11.1.2 -c 1
12:09:53.251386 IP 10.11.2.1 > 10.11.1.2: ICMP echo request, id 16, seq 1, 
length 64
12:09:53.251403 IP 10.11.1.2 > 10.11.2.1: ICMP echo reply, id 16, seq 1, length 
64

I think this ought to work for wireguard's source selection as well. If
you don't have a particular destination, you should be able to do
something similar based on sports with ip-rule using the wireguard
source port:

# ip rule add sport 1234 lookup 100
# ip route add table 100 default via 1.2.3.4 src 3.4.5.6

That last bit I didn't test, though...

-Toke


Re: [PATCH net] wireguard: remove peer cache in netns_pre_exit

2021-09-03 Thread Toke Høiland-Jørgensen
Hangbin Liu  writes:

> On Thu, Sep 02, 2021 at 06:26:23PM +0200, Toke Høiland-Jørgensen wrote:
>> Ran this through the same series of tests as the previous patch, and
>> indeed it also seems to resolve the issue, so feel free to add:
>> 
>> Tested-by: Toke Høiland-Jørgensen 
>
>
> Thanks Toke for the testing during my PTO. I will try also do the test.
> Toke has all my reproducer. So please feel free to send the patch if I
> can't finish the test before next week.

You're welcome - yeah, I ran both the scripts you provided me with, and
Jason's patch fixes the issue in both :)

-Toke



Re: [PATCH net] wireguard: remove peer cache in netns_pre_exit

2021-09-02 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Hi Hangbin,
>
> Thanks for the patch and especially for the test. While I see that
> you've pointed to a real problem, I don't think that this particular way
> of fixing it is correct, because it will cause issues for userspace that
> expects to be able to read back the list of peers for, for example,
> keeping track of the latest endpoint addresses or rx/tx transfer
> quantities.
>
> I think the real solution here is to simply clear the endpoint src cache
> and consequently the dst_cache. This is slightly complicated by the fact
> that dst_cache releases dsts lazily, so I needed to add a little utility
> function for that, but that was pretty easy to do.
>
> Can you take a look at the below patch and let me know if it works for
> you and passes other testing you and Toke might be doing with it? (Also,
> please CC the wireguard mailing list in addition to netdev next time?)
> If the patch looks good to you and works well, I'll include it in the
> next series of wireguard patches I send back out to netdev. I'm back
> from travels next week and will begin working on the next series then.

Ran this through the same series of tests as the previous patch, and
indeed it also seems to resolve the issue, so feel free to add:

Tested-by: Toke Høiland-Jørgensen 

-Toke



Re: passing-through TOS/DSCP marking

2021-07-05 Thread Toke Høiland-Jørgensen
Daniel Golle  writes:

> Hi Toke,
>
> On Mon, Jul 05, 2021 at 06:59:10PM +0200, Toke Høiland-Jørgensen wrote:
>> Daniel Golle  writes:
>> > ...
>> >> The only potential operational issue with using it on multiple wg
>> >> interfaces is if they share IP space; because in that case you might
>> >> have packets from different tunnels ending up with identical hashes,
>> >> confusing the egress side. Fixing this would require the outer BPF
>> >> program to know about wg endpoint addresses and map the packets back to
>> >> their inner ifindexes using that. But as long as the wireguard tunnels
>> >> are using different IP subnets (or mostly forwarding traffic without the
>> >> inner addresses as sources or destinations), the hash collision
>> >> probability should not be bigger than just traffic on a single tunnel, I
>> >> suppose.
>> >> 
>> >> One particular thing to watch out for here is IPv6 link-local traffic;
>> >> sine wg doesn't generate link-local addresses automatically, they are
>> >> commonly configured with (the same) static address (like fe80::1 or
>> >> fe80::2), which would make link-local traffic identical across wg
>> >> interfaces. But this is only used for particular setups (I use it for
>> >> running Babel over wg, for instance), just make sure it won't be an
>> >> issue for your deployment scenario :)
>> >
>> > All this is good to know, but from what I can see now shouldn't be
>> > a problem in our deployment -- it's multiple wireguard links which are
>> > (using fwmark and ip rules) routed over several uplinks. We then use
>> > mwan3 to balance most of the gateway traffic accross the available
>> > wireguard interfaces, using MASQ/SNAT on each tunnel which has a
>> > unique transfer network assigned, and no IPv6 at all.
>> > Hence it should be ok to go under the restrictions you described.
>> 
>> Alright, so the wireguard-to-physical interfaces is always many-to-one?
>> I.e., each wireguard interface is always routed out the same physical
>> interface, but there may be multiple wg interfaces sharing the same
>> uplink?
>
> Well, on the access concentrator in the datacentre this is the case:
> All wireguard tunnels are using the same interface.
>
> On the remote system there are *many* tunnels to the same access
> concentrator, each routed over a different uplink interface.
> So there it's a 1:1 mapping, each wgX has it's distinct ethX.
> (and there the current solution already works fine)

Alright. The important thing here is that no single wg interface splits
its traffic over multiple uplinks, so that's all fine :)

>> I'm asking because in that case it does make sense to keep separate
>> instances of the whole setup per physical interface to limit hash
>> collisions; otherwise, the lookup table could also be made global and
>> shared between all physical interfaces, so you'd avoid having to specify
>> the relationship explicitly...
>> 
>> >> >  * Once a wireguard interface goes down, one cannot unload the
>> >> >remaining program on the upstream interface, as
>> >> >preserve-dscp wg0 eth0 --unload
>> >> >would fail in case of 'wg0' having gone missing.
>> >> >What do you suggest to do in this case?
>> >> 
>> >> Just fixing the userspace utility to deal with this case properly as
>> >> well is probably the easiest. How are you thinking you'd deploy this?
>> >> Via ifup hooks on openwrt, or something different?
>> >
>> > Yes, I use ifup hooks configured in an init script for procd and have
>> > it tied to the wireguard config sections in /etc/config/network:
>> >
>> > https://git.openwrt.org/?p=openwrt/staging/dangole.git;a=blob;f=package/network/utils/bpf-examples/files/wireguard-preserve-dscp.init;h=f1e5e25e663308e057285e2bd8e3bcb9560bdd54;hb=5923a78d74be3f05e734b0be0a832a87be8d369b#l56
>> >
>> > Passing multiple inner interfaces to one call to the to-be-modified
>> > preserve-dscp tool could be achieved by some shell magic dealing with
>> > the configuration...
>> 
>> Not necessary: it's perfectly fine to attach them one at a time.
>
> So assume we changed the userspace tool to accept multiple inner
> interfaces, let's say we called:
> preserve-dscp wg0 wg1 wg2 eth0
>
> And then, at some later point in time we want to add 'wg3'.
> So calling
> preserve-dscp 

Re: passing-through TOS/DSCP marking

2021-07-05 Thread Toke Høiland-Jørgensen
Daniel Golle  writes:

> On Mon, Jul 05, 2021 at 05:21:25PM +0200, Toke Høiland-Jørgensen wrote:
>> Daniel Golle  writes:
>> ...
>> > I have managed to test your solution and it seems to do the job.
>> > Remaining issues:
>> >  * What to do if there are many tunnels all sharing the same upstream
>> >interface? In this case I'm thinking of doing:
>> >preserve-dscp wg0 eth0
>> >preserve-dscp wg1 eth0
>> >preserve-dscp wg2 eth0
>> >...
>> >But I'm unsure whether this is indented or if further details need
>> >to be implemented in order to make that work.
>> 
>> Hmm, not sure whether that will work out of the box, actually. Would
>> definitely be doable to make the userspace utility understand how to do
>> this properly, though. There's nothing in principle preventing this from
>> working; the loader should just be smart enough to do incremental
>> loading of multiple "ingress" programs while still sharing the map
>> between all of them.
>
> You make it at least sound easy :)

I'd say the implementation is relatively straight-forward for anyone
familiar with how BPF works; figuring out how it's *supposed* to work is
the hard bit ;)

>> The only potential operational issue with using it on multiple wg
>> interfaces is if they share IP space; because in that case you might
>> have packets from different tunnels ending up with identical hashes,
>> confusing the egress side. Fixing this would require the outer BPF
>> program to know about wg endpoint addresses and map the packets back to
>> their inner ifindexes using that. But as long as the wireguard tunnels
>> are using different IP subnets (or mostly forwarding traffic without the
>> inner addresses as sources or destinations), the hash collision
>> probability should not be bigger than just traffic on a single tunnel, I
>> suppose.
>> 
>> One particular thing to watch out for here is IPv6 link-local traffic;
>> sine wg doesn't generate link-local addresses automatically, they are
>> commonly configured with (the same) static address (like fe80::1 or
>> fe80::2), which would make link-local traffic identical across wg
>> interfaces. But this is only used for particular setups (I use it for
>> running Babel over wg, for instance), just make sure it won't be an
>> issue for your deployment scenario :)
>
> All this is good to know, but from what I can see now shouldn't be
> a problem in our deployment -- it's multiple wireguard links which are
> (using fwmark and ip rules) routed over several uplinks. We then use
> mwan3 to balance most of the gateway traffic accross the available
> wireguard interfaces, using MASQ/SNAT on each tunnel which has a
> unique transfer network assigned, and no IPv6 at all.
> Hence it should be ok to go under the restrictions you described.

Alright, so the wireguard-to-physical interfaces is always many-to-one?
I.e., each wireguard interface is always routed out the same physical
interface, but there may be multiple wg interfaces sharing the same
uplink?

I'm asking because in that case it does make sense to keep separate
instances of the whole setup per physical interface to limit hash
collisions; otherwise, the lookup table could also be made global and
shared between all physical interfaces, so you'd avoid having to specify
the relationship explicitly...

>> >  * Once a wireguard interface goes down, one cannot unload the
>> >remaining program on the upstream interface, as
>> >preserve-dscp wg0 eth0 --unload
>> >would fail in case of 'wg0' having gone missing.
>> >What do you suggest to do in this case?
>> 
>> Just fixing the userspace utility to deal with this case properly as
>> well is probably the easiest. How are you thinking you'd deploy this?
>> Via ifup hooks on openwrt, or something different?
>
> Yes, I use ifup hooks configured in an init script for procd and have
> it tied to the wireguard config sections in /etc/config/network:
>
> https://git.openwrt.org/?p=openwrt/staging/dangole.git;a=blob;f=package/network/utils/bpf-examples/files/wireguard-preserve-dscp.init;h=f1e5e25e663308e057285e2bd8e3bcb9560bdd54;hb=5923a78d74be3f05e734b0be0a832a87be8d369b#l56
>
> Passing multiple inner interfaces to one call to the to-be-modified
> preserve-dscp tool could be achieved by some shell magic dealing with
> the configuration...

Not necessary: it's perfectly fine to attach them one at a time.

> We will have to restart the filter for all inner interfaces in case of
> one being added or removed, right?

Nope, that's no 

Re: passing-through TOS/DSCP marking

2021-07-05 Thread Toke Høiland-Jørgensen
Daniel Golle  writes:

> Hi Toke,
>
> thank you for the ongoing efforts and support on this issue.

You're welcome! :)

> On Wed, Jun 30, 2021 at 10:55:09PM +0200, Toke Høiland-Jørgensen wrote:
>> Daniel Golle  writes:
>> > ...
>> >> >
>> >> > In terms of toolchain: LLVM/Clang is a very bulky beast, I gave up on
>> >> > that and started working on integrating GCC-10's BPF target in our build
>> >> > system...
>> >> 
>> >> I saw that, but I have no idea if GCC's BPF target support will support
>> >> this. My tentative guess would be no, unfortunately :(
>> >
>> > Probably you are right. When building the BPF object with GCC, the
>> > result is:
>> > root@OpenWrt:/usr/lib/bpf# preserve-dscp wg0 eth0
>> > libbpf: elf: skipping unrecognized data section(4) .stab
>> > libbpf: elf: skipping relo section(5) .rel.stab for section(4) .stab
>> > libbpf: elf: skipping unrecognized data section(13) .comment
>> > libbpf: BTF is required, but is missing or corrupted.
>> > Couldn't open file: preserve_dscp_kern.o
>> 
>> Hmm, for this example it should be possible to make it run without BTF.
>> I'm only using that for the map definition, so that could be changed to
>> the old format; you could try this patch:
>> 
>> diff --git a/preserve-dscp/preserve_dscp_kern.c 
>> b/preserve-dscp/preserve_dscp_kern.c
>> index 24120cb8a3ff..08248e1f0e41 100644
>> --- a/preserve-dscp/preserve_dscp_kern.c
>> +++ b/preserve-dscp/preserve_dscp_kern.c
>> @@ -9,12 +9,12 @@
>>   * otherwise clean up stale entries. Instead, we just rely on the LRU 
>> mechanism
>>   * to evict old entries as the map fills up.
>>   */
>> -struct {
>> -   __uint(type, BPF_MAP_TYPE_LRU_HASH);
>> -   __type(key, __u32);
>> -   __type(value, __u8);
>> -   __uint(max_entries, 16384);
>> -} flow_dscps SEC(".maps");
>> +struct bpf_map_def SEC("maps") flow_dscps = {
>> +   .type   = BPF_MAP_TYPE_LRU_HASH,
>> +   .key_size   = sizeof(__u32),
>> +   .value_size = sizeof(__u8),
>> +   .max_entries= 16384,
>> +};
>>  
>>  const volatile static int ip_only = 0;
>> 
>
> That change gives me the next error:
> libbpf: map '' (legacy): static maps are not supported
>
> Also speaks for itself...

Ah, right, the ip_only config also needs to be moved to an old-style
map, then...

>> > Using the LLVM/Clang compiled object also doesn't work:
>> > root@OpenWrt:/usr/lib/bpf# preserve-dscp wg0 eth0
>> > libbpf: Error in bpf_create_map_xattr(flow_dscps):Operation not 
>> > permitted(-1). Retrying without BTF.
>> > libbpf: map 'flow_dscps': failed to create: Operation not permitted(-1)
>> > libbpf: permission error while running as root; try raising 'ulimit -l'? 
>> > current value: 512.0 KiB
>> > libbpf: failed to load object 'preserve_dscp_kern.o'
>> > Failed to load object
>> >
>> > Probably Kernel 5.4.124 is too old...?
>> 
>> Here I think the hint is in the error message ;)
>
> Yep, I realized I had to increase it to ulimit to 2048...
> With that at least the LLVM/Clang generated BPF object seems to load
> properly, and I can load and unload it as expected.

Awesome!

>> >> An alternative to getting LLVM built as part of the OpenWrt toolchain is
>> >> to just use the host clang to build the BPF binaries. It doesn't
>> >> actually need to be cross-compiled with a special compiler, the BPF byte
>> >> code format is the same on all architectures except for endianness, so
>> >> just passing that to the host clang should theoretically be enough...
>> >
>> > I believe that having a way to build BPF objects compatible with the
>> > target built-into our toolchain would be a huge step forward.
>> > And given that gcc already get's pretty far, I think it'd be worth
>> > fixing/patching what ever is missing (I haven't even tried GCC-11 yet)
>> 
>> For this example that might work (as noted above), but for other things
>> BTF is a hard requirement, and I don't believe GCC supports that at all,
>> sadly :(
>
> It looks like this has changed very recently:
>
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d5cf2b5db325fd5c053ca7bc8d6a54a06cd71124

Uhh, exciting! Will be interesting to see how compatible this will be;
and how BPF upstream will deal with multiple compilers :)

>>

Re: passing-through TOS/DSCP marking

2021-06-30 Thread Toke Høiland-Jørgensen
Daniel Golle  writes:

> Hi Toke,
>
> On Mon, Jun 21, 2021 at 04:27:08PM +0200, Toke Høiland-Jørgensen wrote:
>> Daniel Golle  writes:
>> 
>> > On Fri, Jun 18, 2021 at 02:24:29PM +0200, Jason A. Donenfeld wrote:
>> >> Hey Toke,
>> >> 
>> >> On Fri, Jun 18, 2021 at 1:05 AM Toke Høiland-Jørgensen  
>> >> wrote:
>> >> > > I think you can achieve something similar using BPF filters, by 
>> >> > > relying
>> >> > > on wireguard passing through the skb->hash value when encrypting.
>> >> > >
>> >> > > Simply attach a TC-BPF filter to the wireguard netdev, pull out the 
>> >> > > DSCP
>> >> > > value and store it in a map keyed on skb->hash. Then, run a second BPF
>> >> > > filter on the physical interface that shares that same map, lookup the
>> >> > > DSCP value based on the skb->hash value, and rewrite the outer IP
>> >> > > header.
>> >> > >
>> >> > > The read-side filter will need to use bpf_get_hash_recalc() to make 
>> >> > > sure
>> >> > > the hash is calculated before the packet gets handed to wireguard, and
>> >> > > it'll be subject to hash collisions, but I think it should generally
>> >> > > work fairly well (for anything that's flow-based of course). And it 
>> >> > > can
>> >> > > be done without patching wireguard itself :)
>> >> >
>> >> > Just for fun I implemented such a pair of eBPF filters, and tested that
>> >> > it does indeed work for preserving DSCP marks on a Wireguard tunnel. The
>> >> > PoC is here:
>> >> >
>> >> > https://github.com/xdp-project/bpf-examples/tree/master/preserve-dscp
>> >> >
>> >> > To try it out (you'll need a recent-ish kernel and clang version) run:
>> >> >
>> >> > git clone --recurse-submodules 
>> >> > https://github.com/xdp-project/bpf-examples
>> >> > cd bpf-examples/preserve-dscp
>> >> > make
>> >> > ./preserve-dscp wg0 eth0
>> >> >
>> >> > (assuming wg0 and eth0 are the wireguard and physical interfaces in
>> >> > question, respectively).
>> >> >
>> >> > To actually deploy this it would probably need a few tweaks; in
>> >> > particular the second filter that rewrites packets should probably check
>> >> > that the packets are actually part of the Wireguard tunnel in question
>> >> > (by parsing the UDP header and checking the source port) before writing
>> >> > anything to the packet.
>> >> >
>> >> > -Toke
>> >> 
>> >> That is a super cool approach. Thanks for writing that! Sounds like a
>> >> good approach, and one pretty easy to deploy, without the need to
>> >> patch kernels and such.
>> >> 
>> >> Also, nice usage of BPF_MAP_TYPE_LRU_HASH for this.
>> >> 
>> >> Daniel -- can you let the list know if this works for your use case?
>> >
>> > Turns out not exactly easy to deploy (on OpenWrt), as it depends on an
>> > extremely recent environment. I will try pushing to that direction, but
>> > it doesn't look like it's going to be ready very soon.
>> >
>> > In terms of toolchain: LLVM/Clang is a very bulky beast, I gave up on
>> > that and started working on integrating GCC-10's BPF target in our build
>> > system...
>> 
>> I saw that, but I have no idea if GCC's BPF target support will support
>> this. My tentative guess would be no, unfortunately :(
>
> Probably you are right. When building the BPF object with GCC, the
> result is:
> root@OpenWrt:/usr/lib/bpf# preserve-dscp wg0 eth0
> libbpf: elf: skipping unrecognized data section(4) .stab
> libbpf: elf: skipping relo section(5) .rel.stab for section(4) .stab
> libbpf: elf: skipping unrecognized data section(13) .comment
> libbpf: BTF is required, but is missing or corrupted.
> Couldn't open file: preserve_dscp_kern.o

Hmm, for this example it should be possible to make it run without BTF.
I'm only using that for the map definition, so that could be changed to
the old format; you could try this patch:

diff --git a/preserve-dscp/preserve_dscp_kern.c 
b/preserve-dscp/preserve_dscp_kern.c
index 24120cb8a3ff..08248e1f0e41 100644
--- a/preserve-dscp/preserve_dscp_kern.c
+++ 

Re: passing-through TOS/DSCP marking

2021-06-21 Thread Toke Høiland-Jørgensen
Daniel Golle  writes:

> On Fri, Jun 18, 2021 at 02:24:29PM +0200, Jason A. Donenfeld wrote:
>> Hey Toke,
>> 
>> On Fri, Jun 18, 2021 at 1:05 AM Toke Høiland-Jørgensen  wrote:
>> > > I think you can achieve something similar using BPF filters, by relying
>> > > on wireguard passing through the skb->hash value when encrypting.
>> > >
>> > > Simply attach a TC-BPF filter to the wireguard netdev, pull out the DSCP
>> > > value and store it in a map keyed on skb->hash. Then, run a second BPF
>> > > filter on the physical interface that shares that same map, lookup the
>> > > DSCP value based on the skb->hash value, and rewrite the outer IP
>> > > header.
>> > >
>> > > The read-side filter will need to use bpf_get_hash_recalc() to make sure
>> > > the hash is calculated before the packet gets handed to wireguard, and
>> > > it'll be subject to hash collisions, but I think it should generally
>> > > work fairly well (for anything that's flow-based of course). And it can
>> > > be done without patching wireguard itself :)
>> >
>> > Just for fun I implemented such a pair of eBPF filters, and tested that
>> > it does indeed work for preserving DSCP marks on a Wireguard tunnel. The
>> > PoC is here:
>> >
>> > https://github.com/xdp-project/bpf-examples/tree/master/preserve-dscp
>> >
>> > To try it out (you'll need a recent-ish kernel and clang version) run:
>> >
>> > git clone --recurse-submodules https://github.com/xdp-project/bpf-examples
>> > cd bpf-examples/preserve-dscp
>> > make
>> > ./preserve-dscp wg0 eth0
>> >
>> > (assuming wg0 and eth0 are the wireguard and physical interfaces in
>> > question, respectively).
>> >
>> > To actually deploy this it would probably need a few tweaks; in
>> > particular the second filter that rewrites packets should probably check
>> > that the packets are actually part of the Wireguard tunnel in question
>> > (by parsing the UDP header and checking the source port) before writing
>> > anything to the packet.
>> >
>> > -Toke
>> 
>> That is a super cool approach. Thanks for writing that! Sounds like a
>> good approach, and one pretty easy to deploy, without the need to
>> patch kernels and such.
>> 
>> Also, nice usage of BPF_MAP_TYPE_LRU_HASH for this.
>> 
>> Daniel -- can you let the list know if this works for your use case?
>
> Turns out not exactly easy to deploy (on OpenWrt), as it depends on an
> extremely recent environment. I will try pushing to that direction, but
> it doesn't look like it's going to be ready very soon.
>
> In terms of toolchain: LLVM/Clang is a very bulky beast, I gave up on
> that and started working on integrating GCC-10's BPF target in our build
> system...

I saw that, but I have no idea if GCC's BPF target support will support
this. My tentative guess would be no, unfortunately :(

An alternative to getting LLVM built as part of the OpenWrt toolchain is
to just use the host clang to build the BPF binaries. It doesn't
actually need to be cross-compiled with a special compiler, the BPF byte
code format is the same on all architectures except for endianness, so
just passing that to the host clang should theoretically be enough...

> In terms of kernel support: recent kernels don't build yet because of
> gelf_getsymshndx, so we got to update libelf first for that. Recent
> libelf doesn't seem to be an option yet on many of the build hosts we
> currently support (Darwin and such).
>
> In terms of library support: our build of libbpf comes from Linux
> release tarballs. There isn't yet a release supporting bpf_tc_attach,
> the easiest would be to wait for Linux 5.13 to be released.

I used the libbpf TC loading support for convenience, but it's possible
to load it using 'tc' as well without too much trouble (right now the
userspace component sets a config variable before loading the program,
but it can be restructured to not need that).

Alternatively, the bpf-examples repository is setup with a libbpf
submodule that it can link statically against, so you could use that for
now?

> I (of course ;) also tried and spend almost a day looking for a
> quick-and-dirty path for temporary deployment, so I could at least give
> feedback -- bpf-examples also isn't exactly made to be cross-compiled
> manually, so I have failed with that as well so far.

Heh, no, it isn't, really. Anything in particular you need to make this
easier? We already added some bits to xdp-tools for supporting
cross-compilation (and that shares some lineage with bpf-examples), so
porting those over should not be too difficult.

See: https://github.com/xdp-project/xdp-tools/pull/78 and
https://github.com/xdp-project/xdp-tools/issues/74

Unfortunately I don't have a lot of time to poke more at this right now,
but feel free to open up an issue / pull request to the bpf-examples
repository with any changes you need :)

-Toke


Re: passing-through TOS/DSCP marking

2021-06-17 Thread Toke Høiland-Jørgensen
Toke Høiland-Jørgensen  writes:

> Daniel Golle  writes:
>
>> Hi Florent,
>>
>> On Thu, Jun 17, 2021 at 07:55:09AM +, Florent Daigniere wrote:
>>> On Thu, 2021-06-17 at 01:33 +0200, Toke Høiland-Jørgensen wrote:
>>> > Daniel Golle  writes:
>>> > 
>>> > > Hi Jason,
>>> > > 
>>> > > On Wed, Jun 16, 2021 at 06:28:12PM +0200, Jason A. Donenfeld wrote:
>>> > > > WireGuard does not copy the inner DSCP mark to the outside, aside
>>> > > > from
>>> > > > the ECN bits, in order to avoid a data leak.
>>> > > 
>>> > > That's a very valid argument.
>>> > > 
>>> > > However, from my experience now, Wireguard is not suitable for
>>> > > VoIP/RTP
>>> > > data (minimize-delay) being sent through the same tunnel as TCP bulk
>>> > > (maximize-throughput) traffic in bandwidth constraint and/or high-
>>> > > latency
>>> > > environments, as that ruins the VoIP calls to the degree of not
>>> > > being
>>> > > understandable. ECN helps quite a bit when it comes to avoid packet
>>> > > drops
>>> > > for TCP traffic, but that's not enough to avoid high jitter and
>>> > > drops for
>>> > > RTP/UDP traffic at the same time.
>>> > > 
>>> > > I thought about ways to improve that and wonder what you would
>>> > > suggest.
>>> > > My ideas are:
>>> > >  * have different tunnels depending on inner DSCP bits and mark them
>>> > >accordingly on the outside.
>>> > >=> we already got multiple tunnels and that would double the
>>> > > number.
>>> > > 
>>> > >  * mark outer packets with DSCP bits based on their size.
>>> > >VoIP RTP/UDP packets are typically "medium sized" while TCP
>>> > > packets
>>> > >typically max out the MTU.
>>> > >=> we would not leak information, but that assumption may not
>>> > > always
>>> > >   be true
>>> > > 
>>> > >  * patch wireguard kernel code to allow preserving inner DSCP bits.
>>> > >=> even only having 2 differentl classes of traffic (critical vs.
>>> > >   bulk) would already help a lot...
>>> > > 
>>> > > 
>>> > > What do you think? Any other ideas?
>>> > 
>>> > Can you share a few more details about the network setup? I.e., where
>>> > is
>>> > the bottleneck link that requires this special treatment?
>>> 
>>> I can tell you about mine. WiFi in a congested environment: "voip on
>>> mobile phones". WMM/802.11e uses the diffserv markings; most commercial
>>> APs will do the right thing provided packets are marked appropriately.
>>> 
>>> At the time I have sent patches (back in 2019) for both the golang and
>>> linux implementation that turned it on by default. I believe that
>>> Russell Strong further improved upon them by adding a knob (20190318 on
>>> this mailing list).
>>
>> Thank you very much for the hint!
>> This patch is exactly what I was looking for:
>> https://lists.zx2c4.com/pipermail/wireguard/2019-March/004026.html
>>
>> Unfortunately it has not received a great amount of feedback back then.
>> I'll try forward-porting and deploying it now, because to me it looks
>> like the best solution money can buy :)
>
> I think you can achieve something similar using BPF filters, by relying
> on wireguard passing through the skb->hash value when encrypting.
>
> Simply attach a TC-BPF filter to the wireguard netdev, pull out the DSCP
> value and store it in a map keyed on skb->hash. Then, run a second BPF
> filter on the physical interface that shares that same map, lookup the
> DSCP value based on the skb->hash value, and rewrite the outer IP
> header.
>
> The read-side filter will need to use bpf_get_hash_recalc() to make sure
> the hash is calculated before the packet gets handed to wireguard, and
> it'll be subject to hash collisions, but I think it should generally
> work fairly well (for anything that's flow-based of course). And it can
> be done without patching wireguard itself :)

Just for fun I implemented such a pair of eBPF filters, and tested that
it does indeed work for preserving DSCP marks on a Wireguard tunnel. The
PoC is here:

https://github.com/xdp-project/bpf-examples/tree/master/preserve-dscp

To try it out (you'll need a recent-ish kernel and clang version) run:

git clone --recurse-submodules https://github.com/xdp-project/bpf-examples
cd bpf-examples/preserve-dscp
make
./preserve-dscp wg0 eth0

(assuming wg0 and eth0 are the wireguard and physical interfaces in
question, respectively).

To actually deploy this it would probably need a few tweaks; in
particular the second filter that rewrites packets should probably check
that the packets are actually part of the Wireguard tunnel in question
(by parsing the UDP header and checking the source port) before writing
anything to the packet.

-Toke


Re: passing-through TOS/DSCP marking

2021-06-17 Thread Toke Høiland-Jørgensen
Reid Rankin  writes:

> It can also be done in a shell script with nftables (maybe iptables too,
> haven't tried) by taking advantage of fwmark passthrough. You can have one
> rule that matches incoming outgoing packets (heh) with a certain dscp value
> and marks them, and another rule that matches outgoing outgoing packets
> with that mark and sets the DSCP bits back.

The fwmark is not passed through wireguard, though, it's cleared during
skb scrubbing:

https://elixir.bootlin.com/linux/latest/source/net/core/skbuff.c#L5344

There's an fwmark config that you can set which will make wireguard
apply a certain mark to all outgoing packets, but that has nothing to
do with what was set on the inner packet...

-Toke


Re: passing-through TOS/DSCP marking

2021-06-17 Thread Toke Høiland-Jørgensen
Daniel Golle  writes:

> Hi Florent,
>
> On Thu, Jun 17, 2021 at 07:55:09AM +, Florent Daigniere wrote:
>> On Thu, 2021-06-17 at 01:33 +0200, Toke Høiland-Jørgensen wrote:
>> > Daniel Golle  writes:
>> > 
>> > > Hi Jason,
>> > > 
>> > > On Wed, Jun 16, 2021 at 06:28:12PM +0200, Jason A. Donenfeld wrote:
>> > > > WireGuard does not copy the inner DSCP mark to the outside, aside
>> > > > from
>> > > > the ECN bits, in order to avoid a data leak.
>> > > 
>> > > That's a very valid argument.
>> > > 
>> > > However, from my experience now, Wireguard is not suitable for
>> > > VoIP/RTP
>> > > data (minimize-delay) being sent through the same tunnel as TCP bulk
>> > > (maximize-throughput) traffic in bandwidth constraint and/or high-
>> > > latency
>> > > environments, as that ruins the VoIP calls to the degree of not
>> > > being
>> > > understandable. ECN helps quite a bit when it comes to avoid packet
>> > > drops
>> > > for TCP traffic, but that's not enough to avoid high jitter and
>> > > drops for
>> > > RTP/UDP traffic at the same time.
>> > > 
>> > > I thought about ways to improve that and wonder what you would
>> > > suggest.
>> > > My ideas are:
>> > >  * have different tunnels depending on inner DSCP bits and mark them
>> > >accordingly on the outside.
>> > >=> we already got multiple tunnels and that would double the
>> > > number.
>> > > 
>> > >  * mark outer packets with DSCP bits based on their size.
>> > >VoIP RTP/UDP packets are typically "medium sized" while TCP
>> > > packets
>> > >typically max out the MTU.
>> > >=> we would not leak information, but that assumption may not
>> > > always
>> > >   be true
>> > > 
>> > >  * patch wireguard kernel code to allow preserving inner DSCP bits.
>> > >=> even only having 2 differentl classes of traffic (critical vs.
>> > >   bulk) would already help a lot...
>> > > 
>> > > 
>> > > What do you think? Any other ideas?
>> > 
>> > Can you share a few more details about the network setup? I.e., where
>> > is
>> > the bottleneck link that requires this special treatment?
>> 
>> I can tell you about mine. WiFi in a congested environment: "voip on
>> mobile phones". WMM/802.11e uses the diffserv markings; most commercial
>> APs will do the right thing provided packets are marked appropriately.
>> 
>> At the time I have sent patches (back in 2019) for both the golang and
>> linux implementation that turned it on by default. I believe that
>> Russell Strong further improved upon them by adding a knob (20190318 on
>> this mailing list).
>
> Thank you very much for the hint!
> This patch is exactly what I was looking for:
> https://lists.zx2c4.com/pipermail/wireguard/2019-March/004026.html
>
> Unfortunately it has not received a great amount of feedback back then.
> I'll try forward-porting and deploying it now, because to me it looks
> like the best solution money can buy :)

I think you can achieve something similar using BPF filters, by relying
on wireguard passing through the skb->hash value when encrypting.

Simply attach a TC-BPF filter to the wireguard netdev, pull out the DSCP
value and store it in a map keyed on skb->hash. Then, run a second BPF
filter on the physical interface that shares that same map, lookup the
DSCP value based on the skb->hash value, and rewrite the outer IP
header.

The read-side filter will need to use bpf_get_hash_recalc() to make sure
the hash is calculated before the packet gets handed to wireguard, and
it'll be subject to hash collisions, but I think it should generally
work fairly well (for anything that's flow-based of course). And it can
be done without patching wireguard itself :)

-Toke


Re: passing-through TOS/DSCP marking

2021-06-16 Thread Toke Høiland-Jørgensen
Daniel Golle  writes:

> Hi Jason,
>
> On Wed, Jun 16, 2021 at 06:28:12PM +0200, Jason A. Donenfeld wrote:
>> WireGuard does not copy the inner DSCP mark to the outside, aside from
>> the ECN bits, in order to avoid a data leak.
>
> That's a very valid argument.
>
> However, from my experience now, Wireguard is not suitable for VoIP/RTP
> data (minimize-delay) being sent through the same tunnel as TCP bulk
> (maximize-throughput) traffic in bandwidth constraint and/or high-latency
> environments, as that ruins the VoIP calls to the degree of not being
> understandable. ECN helps quite a bit when it comes to avoid packet drops
> for TCP traffic, but that's not enough to avoid high jitter and drops for
> RTP/UDP traffic at the same time.
>
> I thought about ways to improve that and wonder what you would suggest.
> My ideas are:
>  * have different tunnels depending on inner DSCP bits and mark them
>accordingly on the outside.
>=> we already got multiple tunnels and that would double the number.
>
>  * mark outer packets with DSCP bits based on their size.
>VoIP RTP/UDP packets are typically "medium sized" while TCP packets
>typically max out the MTU.
>=> we would not leak information, but that assumption may not always
>   be true
>
>  * patch wireguard kernel code to allow preserving inner DSCP bits.
>=> even only having 2 differentl classes of traffic (critical vs.
>   bulk) would already help a lot...
>
>
> What do you think? Any other ideas?

Can you share a few more details about the network setup? I.e., where is
the bottleneck link that requires this special treatment? If the
bottleneck router is the same one that does the wireguard encapsulation
(e.g., a home router with a slow uplink), you should be able to just use
flow queueing (fq_codel or sch_cake) in place of your diffserv-based
prioritisation and get most of the benefit: wireguard will save the
packet hash before encapsulation, so any qdisc running on the same box
can actually distinguish flows even on the encapsulated packets...

-Toke


[PATCH] sysdep/bsd: propagate OS-level IFF_MULTICAST to internal IF_MULTICAST flag

2021-04-19 Thread Toke Høiland-Jørgensen
The BSD code did not propagate the OS-level IFF_MULTICAST flag to the
Bird-internal IF_MULTICAST flag, which causes problems with Wireguard
interfaces on FreeBSD. The Linux sysdep code does propagate the flag
already, so just copy over the same check and flag update.

Tested-by: Stefan Haller 
Signed-off-by: Toke Høiland-Jørgensen 
---
 sysdep/bsd/krt-sock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sysdep/bsd/krt-sock.c b/sysdep/bsd/krt-sock.c
index c2faa23dd44f..cd89544063c7 100644
--- a/sysdep/bsd/krt-sock.c
+++ b/sysdep/bsd/krt-sock.c
@@ -665,6 +665,9 @@ krt_read_ifinfo(struct ks_msg *msg, int scan)
   else
 f.flags |= IF_MULTIACCESS;  /* NBMA */
 
+  if (fl & IFF_MULTICAST)
+f.flags |= IF_MULTICAST;
+
   iface = if_update(&f);
 
   if (!scan)
-- 
2.31.1



Re: FreeBSD if_wg POINTTOPOINT and MULTICAST behaviour

2021-04-19 Thread Toke Høiland-Jørgensen
Stefan Haller  writes:

> On Mon, Apr 19, 2021 at 01:42:58PM -0600, Jason A. Donenfeld wrote:
>> On Mon, Apr 19, 2021 at 1:42 PM Stefan Haller  wrote:
>> >
>> > On Mon, Apr 19, 2021 at 08:25:46PM +0200, Toke Høiland-Jørgensen wrote:
>> > > Stefan, any chance you could test this patch to Bird (*instead of* the
>> > > previous one that removes the check from the Babel code)?
>> >
>> > The patch is working on FreeBSD 13.0.
>> 
>> Just FYI, the previous patch was added to ports, so I wanted to double
>> check that you removed that previous patch before adding this one...
>
> Yes, I did remove the old patch (in proto/babel/babel.c):
>
>> root@fbsd:/usr/ports/net/bird2 # git status .
>> On branch main
>> Changes not staged for commit:
>>   (use "git add/rm ..." to update what will be committed)
>>   (use "git restore ..." to discard changes in working directory)
>> deleted:files/patch-proto_babel_babel.c
>> 
>> Untracked files:
>>   (use "git add ..." to include in what will be committed)
>> files/patch-sysdep_bsd_krt-sock.c

Awesome! Thank you for testing! :)

-Toke


Re: FreeBSD if_wg POINTTOPOINT and MULTICAST behaviour

2021-04-19 Thread Toke Høiland-Jørgensen
Toke Høiland-Jørgensen  writes:

> Stefan Haller  writes:
>
>> Hi Jason,
>>
>> On Thu, Apr 15, 2021 at 06:05:03PM -0600, Jason A. Donenfeld wrote:
>>> I spent the day playing around with bird and babel and sorted out
>>> FreeBSD's v6 situation. Basically, ff00::/8 addresses are treated
>>> differently, and they're blocked unless the interface sets
>>> IFF_MULTICAST. So I've committed
>>> https://git.zx2c4.com/wireguard-freebsd/commit/?id=a7a84a17faf784857f076e37aa4818f6b6c12a95
>>> to do this.
>>
>> That is also what I observed. Without IFF_MULTICAST I see the following
>> error in bird's log:
>>
>> bird[8045]: babel1: Socket error: IPV6_MULTICAST_IF: Can't assign requested 
>> address
>> bird[8045]: babel1: Cannot open socket for wg1
>>
>>> Stefan - please let me know if those work for you. In my testing thus
>>> far, things seem to work for me.
>>
>> After applying Toke's patch for bird and your Wireguard patch in
>> a7a84a17faf784 everything is working as before (with minor config
>> changes).
>>
>> Just for the record, my previous configuration looked like this (using
>> POINTTOPOINT interfaces, I use ifconfig to set the peer address):
>>
>>> [Interface]
>>> ...
>>> Address = fe80::5/64
>>> PostUp = ifconfig %i inet 169.254.42.5/32 169.254.42.2
>>
>> My new configuration without POINTTOPOINT, but only a single peer
>> directly attached to other side of the wg tunnel:
>>
>>> [Interface]
>>> ...
>>> Address = 169.254.42.5/32, fe80::5/64
>>> PostUp = route add 169.254.42.2 -iface %i
>>
>> So for me everything works as expected again. A big thanks to all of you for
>> figuring out what was going wrong and getting it fixed so quickly.
>
> Great! You're welcome :)

Stefan, any chance you could test this patch to Bird (*instead of* the
previous one that removes the check from the Babel code)?

-Toke

diff --git a/sysdep/bsd/krt-sock.c b/sysdep/bsd/krt-sock.c
index c2faa23dd44f..cd89544063c7 100644
--- a/sysdep/bsd/krt-sock.c
+++ b/sysdep/bsd/krt-sock.c
@@ -665,6 +665,9 @@ krt_read_ifinfo(struct ks_msg *msg, int scan)
   else
 f.flags |= IF_MULTIACCESS;  /* NBMA */
 
+  if (fl & IFF_MULTICAST)
+f.flags |= IF_MULTICAST;
+
   iface = if_update(&f);
 
   if (!scan)


Re: [PATCH] babel: Drop check for IF_MULTICAST interface flag

2021-04-19 Thread Toke Høiland-Jørgensen
Ondrej Zajicek  writes:

> On Mon, Apr 19, 2021 at 03:55:18PM +0200, Toke Høiland-Jørgensen wrote:
>> Ondrej Zajicek  writes:
>> 
>> > Is there a reason why to disregard the IF_MULTICAST flag? This seems to me
>> > more like a bug in FreeBSD Wireguard implementation that should be fixed
>> > there. Is this flag properly checked on Linux, or is there some reason why
>> > the flag is missing?
>> 
>> We did fix Wireguard - see:
>> https://git.zx2c4.com/wireguard-freebsd/patch/?id=a7a84a17faf784857f076e37aa4818f6b6c12a95
>> 
>> However, that didn't help, Babel still refused to use the interface.
>> Looking at krt-sock.c, the IF_MULTICAST flag is only set on
>> IFF_POINTOPOINT or IFF_BROADCAST on bsd. The Linux code (in netlink.c)
>> has a further:
>> 
>>   if (fl & IFF_MULTICAST)
>>  f.flags |= IF_MULTICAST;
>> 
>> beneath the other flag checks, so maybe that's really what's missing on
>> the BSD side?
>
> Yes, it is likely that it is an issue in sysdep/bsd code.

Alright, I'll send a patch for that then :)

>> > Routing protocols in BIRD generally follow this flag (and perhaps use
>> > it to switch to unicast-only mode), so i do not see why Babel should
>> > behave differently.
>> 
>> Yeah, I do believe I originally copied that check from one of the other
>> protocols. I can see how it makes sense to check the flag and change
>> operation mode based on it, but given that Babel doesn't do that it just
>> seems kinda redundant? If the interface *actually* is unable to send
>> multicast packets, the subsequent socket operation is going to fail, and
>> at least that produces an error message instead of just silently
>> ignoring the interface like that flag check does :)
>
> Well, i am OK with generating a warning in cases of non-matching interface
> type, instead of ignoring it silently. (In contrast to iface down or missing
> lladdr, which should be silent, as it may correct later.)

OK, fine with me; I'll send an updated patch that adds a warning instead
of dropping the check...

-Toke


Re: [PATCH] babel: Drop check for IF_MULTICAST interface flag

2021-04-19 Thread Toke Høiland-Jørgensen
Ondrej Zajicek  writes:

> On Thu, Apr 15, 2021 at 03:44:50PM +0200, Toke Høiland-Jørgensen wrote:
>> The babel protocol code was checking interfaces for the IF_MULTICAST flag
>> and refusing to run if this isn't present. However, there are cases where
>> this flag doesn't correspond to the actual capability of sending multicast
>> packets. For instance, Wireguard interfaces on FreeBSD doesn't set the
>> required flags, but Babel will run just fine over such an interface given
>> the right configuration.
>
> Hi
>
> Is there a reason why to disregard the IF_MULTICAST flag? This seems to me
> more like a bug in FreeBSD Wireguard implementation that should be fixed
> there. Is this flag properly checked on Linux, or is there some reason why
> the flag is missing?

We did fix Wireguard - see:
https://git.zx2c4.com/wireguard-freebsd/patch/?id=a7a84a17faf784857f076e37aa4818f6b6c12a95

However, that didn't help, Babel still refused to use the interface.
Looking at krt-sock.c, the IF_MULTICAST flag is only set on
IFF_POINTOPOINT or IFF_BROADCAST on bsd. The Linux code (in netlink.c)
has a further:

  if (fl & IFF_MULTICAST)
f.flags |= IF_MULTICAST;

beneath the other flag checks, so maybe that's really what's missing on
the BSD side?

> Routing protocols in BIRD generally follow this flag (and perhaps use
> it to switch to unicast-only mode), so i do not see why Babel should
> behave differently.

Yeah, I do believe I originally copied that check from one of the other
protocols. I can see how it makes sense to check the flag and change
operation mode based on it, but given that Babel doesn't do that it just
seems kinda redundant? If the interface *actually* is unable to send
multicast packets, the subsequent socket operation is going to fail, and
at least that produces an error message instead of just silently
ignoring the interface like that flag check does :)

-Toke


Re: FreeBSD if_wg POINTTOPOINT and MULTICAST behaviour

2021-04-16 Thread Toke Høiland-Jørgensen
Stefan Haller  writes:

> Hi Jason,
>
> On Thu, Apr 15, 2021 at 06:05:03PM -0600, Jason A. Donenfeld wrote:
>> I spent the day playing around with bird and babel and sorted out
>> FreeBSD's v6 situation. Basically, ff00::/8 addresses are treated
>> differently, and they're blocked unless the interface sets
>> IFF_MULTICAST. So I've committed
>> https://git.zx2c4.com/wireguard-freebsd/commit/?id=a7a84a17faf784857f076e37aa4818f6b6c12a95
>> to do this.
>
> That is also what I observed. Without IFF_MULTICAST I see the following
> error in bird's log:
>
> bird[8045]: babel1: Socket error: IPV6_MULTICAST_IF: Can't assign requested 
> address
> bird[8045]: babel1: Cannot open socket for wg1
>
>> Stefan - please let me know if those work for you. In my testing thus
>> far, things seem to work for me.
>
> After applying Toke's patch for bird and your Wireguard patch in
> a7a84a17faf784 everything is working as before (with minor config
> changes).
>
> Just for the record, my previous configuration looked like this (using
> POINTTOPOINT interfaces, I use ifconfig to set the peer address):
>
>> [Interface]
>> ...
>> Address = fe80::5/64
>> PostUp = ifconfig %i inet 169.254.42.5/32 169.254.42.2
>
> My new configuration without POINTTOPOINT, but only a single peer
> directly attached to other side of the wg tunnel:
>
>> [Interface]
>> ...
>> Address = 169.254.42.5/32, fe80::5/64
>> PostUp = route add 169.254.42.2 -iface %i
>
> So for me everything works as expected again. A big thanks to all of you for
> figuring out what was going wrong and getting it fixed so quickly.

Great! You're welcome :)

-Toke


Re: FreeBSD if_wg POINTTOPOINT and MULTICAST behaviour

2021-04-15 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Hi Stefan,
>
> Sounds like Toke has come up with the optimal solution, so I think
> I'll drop the "link1" patch/hack.
>
> One thing I was wondering though, mostly for my own curiosity, is what
> config you're using that shows the problem, and how does the problem
> manifest?
>
> I just put this in bird.conf:
>
> router id 192.168.88.2;
> protocol babel {
>interface "wg0" {
>type wired;
>port 1234;
>};
> };
>
> And then I ran:
>
> # pkg install bird2
> # bird -d -c bird.conf
>
> And I didn't see any troubling error messages. But maybe it's more
> subtle than that?

I think it would just silently ignore the interface; does it say
anything about running on it? You could also see if there's any traffic,
there should be UDP packets with dest port 6696 appearing if it does
run...

-Toke


[PATCH] babel: Drop check for IF_MULTICAST interface flag

2021-04-15 Thread Toke Høiland-Jørgensen
The babel protocol code was checking interfaces for the IF_MULTICAST flag
and refusing to run if this isn't present. However, there are cases where
this flag doesn't correspond to the actual capability of sending multicast
packets. For instance, Wireguard interfaces on FreeBSD doesn't set the
required flags, but Babel will run just fine over such an interface given
the right configuration.

Since we're also checking for the presence of a link-local addresses right
below the flag check, we don't really need it. So let's just drop the check
and trust that users will only configure Babel on interfaces that can
handle the traffic.

Reported-by: Stefan Haller 
Signed-off-by: Toke Høiland-Jørgensen 
---
 proto/babel/babel.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/proto/babel/babel.c b/proto/babel/babel.c
index 4b6b9d7f9f6f..297b86b06a46 100644
--- a/proto/babel/babel.c
+++ b/proto/babel/babel.c
@@ -1658,10 +1658,6 @@ babel_if_notify(struct proto *P, unsigned flags, struct 
iface *iface)
 if (!(iface->flags & IF_UP))
   return;
 
-/* We only speak multicast */
-if (!(iface->flags & IF_MULTICAST))
-  return;
-
 /* Ignore ifaces without link-local address */
 if (!iface->llv6)
   return;
@@ -1736,10 +1732,6 @@ babel_reconfigure_ifaces(struct babel_proto *p, struct 
babel_config *cf)
 if (!(iface->flags & IF_UP))
   continue;
 
-/* Ignore non-multicast ifaces */
-if (!(iface->flags & IF_MULTICAST))
-  continue;
-
 /* Ignore ifaces without link-local address */
 if (!iface->llv6)
   continue;
-- 
2.31.1



Re: FreeBSD if_wg POINTTOPOINT and MULTICAST behaviour

2021-04-15 Thread Toke Høiland-Jørgensen
Stefan Haller  writes:

> Hi Toke,
>
> On Thu, Apr 15, 2021 at 12:14:04AM +0200, Toke Høiland-Jørgensen wrote:
>> That's because the babel protocol code is checking for Bird's internal
>> MULTICAST flag, which is set like:
>> 
>>   else if (fl & IFF_POINTOPOINT)/* PtP */
>> f.flags |= IF_MULTICAST;
>>   else if (fl & IFF_BROADCAST)  /* Broadcast */
>> f.flags |= IF_MULTIACCESS | IF_BROADCAST | IF_MULTICAST;
>> 
>> so it needs either the OS-level POINTOPOINT or the BROADCAST flag set.
>> Wireguard interfaces on Linux has POINTOPOINT which is enough for Bird.
>
> That explains a lot. I expected something like this, but did not have
> time yet to look more closely.
>
>> And yeah, for now Babel only speaks multicast; the spec does allow for
>> unicast communication, but the code in Bird doesn't implement that yet
>> (I'm the author of the Babel implementation in Bird). Even for unicast,
>> Babel still needs multicast for discovery, but in the case of Wireguard
>> that could be replaced by reading the peers directly from the Wireguard
>> kernel module. Add in updating of Wireguard AllowedIPs, and presto,
>> there's you completely dynamic mesh requiring only a single wg interface
>> on each peer :)
>
> Overall, this sounds like a great idea. Having to create so many
> wireguard p2p tunnels to form a mesh is quite cumbersome. Using
> Wireguards AllowedIPs as an alternative to the kernel routing table
> sounds useful. The peer discovery could also be useful outside of the
> babel protocol implementation (even though it will always be quite
> non-standard). One could probably assume that the first configured
> v6/128 and v4/32 IPs belong to the directly connected peer.
>
>> Quite happy to review Bird patches if someone wants to hack on this,
>> BTW, but otherwise it's on my "eventually" list :P
>
> While I am interested and it sounds like a great opportunity to learn
> cool new things I don't know a lot about yet, I have to see if I am
> actually up to the task. :)
>
> Anyway, I think there is an agreement that it is better to add specific
> support for Wireguard interfaces in bird instead of changing the
> interface flags.

Yeah; in the meantime, you can just patch Bird; just get rid of this
check in proto/babel/babel.c (there are two of them):

if (!(iface->flags & IF_MULTICAST))
  continue;

should have no ill effects. Actually I think we could just get rid of
that check entirely, it's not strictly needed for anything other than
maybe filtering a very wide glob of interfaces. I'll send a patch...

-Toke


Re: FreeBSD if_wg POINTTOPOINT and MULTICAST behaviour

2021-04-15 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Hey Toke,
>
> Regarding POINTTOPOINT flag in Linux vs FreeBSD -- apparently FreeBSD
> routes everything differently simply by virtue of the interface having
> that flag, whereas on Linux, PTP routing mode is only switched on if
> you actually add an address with a dest peer. So for FreeBSD, the
> different routing seemed somewhat disruptive, so I stopped setting
> that flag in a recent snapshot release.

Ah, of course that would be different...

> Hey Stefan,
>
> Looking at bird's source code (iface.c), it looks like bird will only
> look at IFF flags if you fail to specify the interface type in the
> config file. Can you try *not changing to link1/ptp mode* and then
> setting the type flag in your bird config? Specifically:
>
> interface  [instance ] {
> ...
> type [broadcast|bcast|pointopoint|ptp|
> nonbroadcast|nbma|pointomultipoint|ptmp];

This is from the OSPF protocol code, though, so it won't work for
Babel.

-Toke


Re: FreeBSD if_wg POINTTOPOINT and MULTICAST behaviour

2021-04-14 Thread Toke Høiland-Jørgensen
Stefan Haller  writes:

> Hi Jason,
>
> Thanks for your clarification. I understand that setting this flag would
> be a false promise to userspace, because generally Wireguard is
> point-to-multipoint and doesn't copy messages to multiple peers (which
> is not exactly necessary in my case, where only a single peer is
> configured on both sides).
>
> I just wanted to ensure that the introduced change was intentional
> before looking into other directions, hence my question.
>
> On Wed, Apr 14, 2021 at 02:24:20PM -0600, Jason A. Donenfeld wrote:
>> Does bird completely ignore interfaces without it? Is there no setting
>> to change that?
>
> At least a brief look at the code suggests this: [1]
>
> The Babel protocol seems to rely on well-known *link-local* IPv6
> multicast addresses. I did not find anything related to unicast "hello"
> messages in the RFC or in the implementations. (OSPF is similar, but
> as far as I remember unicast hellos are explicitly allowed.)
>
> One odd thing I noticed: On Linux (5.11.13-arch1-1, so quite recent),
> the interface does not list the MULTICAST flag and the interface is
> still used by bird:
>
> # ip l show dev wg1
> 4: wg1:  mtu 1400 qdisc noqueue state UNKNOWN 
> mode DEFAULT group default qlen 1000
>
> I will have a closer look why it doesn't work on FreeBSD but the same thing
> works on Linux. I am probably missing something important.

That's because the babel protocol code is checking for Bird's internal
MULTICAST flag, which is set like:

  else if (fl & IFF_POINTOPOINT)/* PtP */
f.flags |= IF_MULTICAST;
  else if (fl & IFF_BROADCAST)  /* Broadcast */
f.flags |= IF_MULTIACCESS | IF_BROADCAST | IF_MULTICAST;

so it needs either the OS-level POINTOPOINT or the BROADCAST flag set.
Wireguard interfaces on Linux has POINTOPOINT which is enough for Bird.

And yeah, for now Babel only speaks multicast; the spec does allow for
unicast communication, but the code in Bird doesn't implement that yet
(I'm the author of the Babel implementation in Bird). Even for unicast,
Babel still needs multicast for discovery, but in the case of Wireguard
that could be replaced by reading the peers directly from the Wireguard
kernel module. Add in updating of Wireguard AllowedIPs, and presto,
there's you completely dynamic mesh requiring only a single wg interface
on each peer :)

Quite happy to review Bird patches if someone wants to hack on this,
BTW, but otherwise it's on my "eventually" list :P

-Toke


Re: [PATCH RFC v1] wireguard: queueing: get rid of per-peer ring buffers

2021-02-17 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> On Wed, Feb 17, 2021 at 7:36 PM Toke Høiland-Jørgensen  wrote:
>> Are these performance measurements are based on micro-benchmarks of the
>> queueing structure, or overall wireguard performance? Do you see any
>> measurable difference in the overall performance (i.e., throughput
>> drop)?
>
> These are from counting cycles per instruction using perf and seeing
> which instructions are hotspots that take a greater or smaller
> percentage of the overall time.

Right. Would still love to see some actual numbers if you have them.
I.e., what kind of overhead is the queueing operations compared to the
rest of the wg data path, and how much of that is the hotspot
operations? Even better if you have a comparison with a spinlock
version, but I do realise that may be asking too much :)

>> And what about relative to using one of the existing skb queueing
>> primitives in the kernel? Including some actual numbers would be nice to
>> justify adding yet-another skb queueing scheme to the kernel :)
>
> If you're referring to skb_queue_* and friends, those very much will
> not work in any way, shape, or form here. Aside from the fact that the
> MPSC nature of it is problematic for performance, those functions use
> a doubly linked list. In wireguard's case, there is only one pointer
> available (skb->prev), as skb->next is used to create the singly
> linked skb_list (see skb_list_walk_safe) of gso frags. And in fact, by
> having these two pointers next to each other for the separate lists,
> it doesn't need to pull in another cache line. This isn't "yet-another
> queueing scheme" in the kernel. This is just a singly linked list
> queue.

Having this clearly articulated in the commit message would be good, and
could prevent others from pushing back against what really does appear
at first glance to be "yet-another queueing scheme"...

I.e., in the version you posted you go "the ring buffer is too much
memory, so here's a new linked-list queueing algoritm", skipping the
"and this is why we can't use any of the existing ones" in-between.

>> I say this also because the actual queueing of the packets has never
>> really shown up on any performance radar in the qdisc and mac80211
>> layers, which both use traditional spinlock-protected queueing
>> structures.
>
> Those are single threaded and the locks aren't really contended much.
>
>> that would be good; also for figuring out if this algorithm might be
>> useful in other areas as well (and don't get me wrong, I'm fascinated by
>> it!).
>
> If I find the motivation -- and if the mailing list conversations
> don't become overly miserable -- I might try to fashion the queueing
> mechanism into a general header-only data structure in include/linux/.
> But that'd take a bit of work to see if there are actually places
> where it matters and where it's useful. WireGuard can get away with it
> because of its workqueue design, but other things probably aren't as
> lucky like that. So I'm on the fence about generality.

Yeah, I can't think of any off the top of my head either. But I'll
definitely keep this in mind if I do run into any. If there's no obvious
contenders, IMO it would be fine to just keep it internal to wg until
such a use case shows up, and then generalise it at that time. Although
that does give it less visibility for other users, it also saves you
some potentially-redundant work :)

>> > - if (wg_packet_queue_init(&peer->tx_queue, wg_packet_tx_worker, false,
>> > -  MAX_QUEUED_PACKETS))
>> > - goto err_2;
>> > + INIT_WORK(&peer->transmit_packet_work, wg_packet_tx_worker);
>>
>> It's not quite clear to me why changing the queue primitives requires
>> adding another work queue?
>
> It doesn't require a new workqueue. It's just that a workqueue was
> init'd earlier in the call to "wg_packet_queue_init", which allocated
> a ring buffer at the same time. We're not going through that
> infrastructure anymore, but I still want the workqueue it used, so I
> init it there instead. I truncated the diff in my quoted reply -- take
> a look at that quote above and you'll see more clearly what I mean.

Ah, right, it's moving things from wg_packet_queue_init() - missed that.
Thanks!

>> > +#define NEXT(skb) ((skb)->prev)
>>
>> In particular, please explain this oxymoronic define :)
>
> I can write more about that, sure. But it's what I wrote earlier in
> this email -- the next pointer is taken; the prev one is free. So,
> th

Re: [PATCH RFC v1] wireguard: queueing: get rid of per-peer ring buffers

2021-02-17 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Having two ring buffers per-peer means that every peer results in two
> massive ring allocations. On an 8-core x86_64 machine, this commit
> reduces the per-peer allocation from 18,688 bytes to 1,856 bytes, which
> is an 90% reduction. Ninety percent! With some single-machine
> deployments approaching 400,000 peers, we're talking about a reduction
> from 7 gigs of memory down to 700 megs of memory.
>
> In order to get rid of these per-peer allocations, this commit switches
> to using a list-based queueing approach. Currently GSO fragments are
> chained together using the skb->next pointer, so we form the per-peer
> queue around the unused skb->prev pointer, which makes sense because the
> links are pointing backwards.

"which makes sense because the links are pointing backwards" - huh?

> Multiple cores can write into the queue at any given time, because its
> writes occur in the start_xmit path or in the udp_recv path. But reads
> happen in a single workqueue item per-peer, amounting to a
> multi-producer, single-consumer paradigm.
>
> The MPSC queue is implemented locklessly and never blocks. However, it
> is not linearizable (though it is serializable), with a very tight and
> unlikely race on writes, which, when hit (about 0.15% of the time on a
> fully loaded 16-core x86_64 system), causes the queue reader to
> terminate early. However, because every packet sent queues up the same
> workqueue item after it is fully added, the queue resumes again, and
> stopping early isn't actually a problem, since at that point the packet
> wouldn't have yet been added to the encryption queue. These properties
> allow us to avoid disabling interrupts or spinning.

Wow, so this was a fascinating rabbit hole into the concurrent algorithm
realm, thanks to Dmitry's link to his original posting of the algorithm.
Maybe referencing the origin of the algorithm would be nice for context
and posterity (as well as commenting it so the original properties are
not lost if the source should disappear)?

> Performance-wise, ordinarily list-based queues aren't preferable to
> ringbuffers, because of cache misses when following pointers around.
> However, we *already* have to follow the adjacent pointers when working
> through fragments, so there shouldn't actually be any change there. A
> potential downside is that dequeueing is a bit more complicated, but the
> ptr_ring structure used prior had a spinlock when dequeueing, so all and
> all the difference appears to be a wash.
>
> Actually, from profiling, the biggest performance hit, by far, of this
> commit winds up being atomic_add_unless(count, 1, max) and atomic_
> dec(count), which account for the majority of CPU time, according to
> perf. In that sense, the previous ring buffer was superior in that it
> could check if it was full by head==tail, which the list-based approach
> cannot do.

Are these performance measurements are based on micro-benchmarks of the
queueing structure, or overall wireguard performance? Do you see any
measurable difference in the overall performance (i.e., throughput
drop)? And what about relative to using one of the existing skb queueing
primitives in the kernel? Including some actual numbers would be nice to
justify adding yet-another skb queueing scheme to the kernel :)

I say this also because the actual queueing of the packets has never
really shown up on any performance radar in the qdisc and mac80211
layers, which both use traditional spinlock-protected queueing
structures. Now Wireguard does have a somewhat unusual structure with
the MPSC pattern, so it may of course be different here. But quantifying
that would be good; also for figuring out if this algorithm might be
useful in other areas as well (and don't get me wrong, I'm fascinated by
it!).

> Cc: Dmitry Vyukov 
> Signed-off-by: Jason A. Donenfeld 
> ---
> Hoping to get some feedback here from people running massive deployments
> and running into ram issues, as well as Dmitry on the queueing semantics
> (the mpsc queue is his design), before I send this to Dave for merging.
> These changes are quite invasive, so I don't want to get anything wrong.
>
>  drivers/net/wireguard/device.c   | 12 ++---
>  drivers/net/wireguard/device.h   | 15 +++---
>  drivers/net/wireguard/peer.c | 29 ---
>  drivers/net/wireguard/peer.h |  4 +-
>  drivers/net/wireguard/queueing.c | 82 +---
>  drivers/net/wireguard/queueing.h | 45 +-
>  drivers/net/wireguard/receive.c  | 16 +++
>  drivers/net/wireguard/send.c | 31 +---
>  8 files changed, 141 insertions(+), 93 deletions(-)
>
> diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c
> index cd51a2afa28e..d744199823b3 100644
> --- a/drivers/net/wireguard/device.c
> +++ b/drivers/net/wireguard/device.c
> @@ -234,8 +234,8 @@ static void wg_destruct(struct net_device *dev)
>   destroy_workqueue(wg->handshake_receive_wq);
>   destro

Re: network namespace wireguard routing [Was: Re: Userspace Networking Stack + WireGuard + Go]

2021-01-14 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> On Wed, Jan 13, 2021 at 5:46 PM Toke Høiland-Jørgensen  wrote:
>> 5. also requires CAP_SYS_ADMIN (and I think by extension, so does 3.,
>> and 4.). From 'man setns':
>>
>>Network, IPC, time, and UTS namespaces
>>   In order to reassociate itself with a new network, IPC,
>>   time, or UTS namespace, the caller must have the
>>   CAP_SYS_ADMIN capability both in its own user namespace
>>   and in the user namespace that owns the target namespace.
>
> For this, you just create a new user namespace first. You can try it
> yourself from the command line:
>
> zx2c4@thinkpad ~ $ unshare -n
> unshare: unshare failed: Operation not permitted
> zx2c4@thinkpad ~ $ unshare -Un
> nobody@thinkpad ~ $ ip a
> 1: lo:  mtu 65536 qdisc noop state DOWN group default qlen 1000
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

Ah, right, of course, silly me :)

-Toke


Re: network namespace wireguard routing [Was: Re: Userspace Networking Stack + WireGuard + Go]

2021-01-13 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> In order to prevent this Go thread from being hijacked with Linux
> concerns, I've changed the Subject line of the email. Please keep
> follow ups in this thread rather than the other.
>
> Response is in line below:
>
> On Wed, Jan 13, 2021 at 5:26 PM Julian Orth  wrote:
>>
>> On 13/01/2021 17.04, Jason A. Donenfeld wrote:
>>
>>  > Even if you're unprivileged and want a WireGuard interface for just a
>>  > single application that's bound to the lifetime of that application,
>>  > you can still use WireGuard's normal kernel interface inside of a user
>>  > namespace + a network namespace, and get a private process-specific
>>  > WireGuard interface.
>>
>> That's what my patches from back in 2018 were trying to accomplish.
>> Unless I've missed something since, I do not see how what you're
>> describing would work.  Unless you also
>>
>> - create a TUN device in the network namespace
>> - add a default route through that TUN device
>> - manually route all traffic between the init network namespace and your
>>network namespace.
>>
>> Is that what you meant or is there a simpler way?
>
> What I meant was:
>
> 1. User opens his shell and runs ./blah. That executes in the init
> namespace where all the physical interfaces are.
> 2. blah creates a wireguard interface.
> 3. blah creates a network namespace.
> 4. blah moves that wireguard interface into that network namespace.
> 5. blah calls `setns()` on one of its threads to use that network namespace.
>
> Thinking about this in more detail, I'm guessing you take issue with
> step #2? Since that actually might require privileges in the init
> namespace?

5. also requires CAP_SYS_ADMIN (and I think by extension, so does 3.,
and 4.). From 'man setns':

   Network, IPC, time, and UTS namespaces
  In order to reassociate itself with a new network, IPC,
  time, or UTS namespace, the caller must have the
  CAP_SYS_ADMIN capability both in its own user namespace
  and in the user namespace that owns the target namespace.

-Toke


Re: Wireguard not available for CentOS Stream

2021-01-05 Thread Toke Høiland-Jørgensen
Jeffrey Walton  writes:

> On Tue, Jan 5, 2021 at 8:06 AM Silvan Nagl  wrote:
>>
>> Thank you for this information.
>> Since Stream is more or less like a very old Fedora version now I am
>> convinced using Fedora Server wont be that bad at all.
>>
>> Gonna test it soon.
>
> Yeah, I think you'll like it.
>
> I got tired of dicking around with all the breaks and workarounds
> caused by Red Hat and CentOS antique software. I also did not feel
> comfortable with abandoned kernels.
>
> I don't understand how Red Hat or CentOS can provide a 2.6 or 3.10
> kernel in good conscience. Even the kernel folks tell you to use a
> modern kernel, because those old kernels get no attention. The new
> kernels get the bug fixes and security updates (and include the bug
> fixes of the old kernels).

The version number for RHEL kernels is completely fictional. IIRC we
backport 1/3 of all patches in each new kernel release, but keep the
version number fixed and do an insane amount of engineering to keep the
internal kernel ABI stable in spite of the backports.

We can argue about whether this is a reasonable thing to do in the first
place (and I'm not sure I'll actually argue that it is), but it's wrong
to think of RHEL kernels as "ancient with no security updates"... :)

-Toke


Re: Interest in adding multicast support to Wireguard?

2020-09-21 Thread Toke Høiland-Jørgensen
Derrick Lyndon Pallas  writes:

> I know this has come up a few times before, but if there was resolution, 
> I couldn't find it.
>
> I am trying to set up a hub-and-spoke network with many clients 
> connected to a single concentrator. One application I need to support 
> relies on mDNS.

While I generally support adding multicast support to Wireguard, for
mDNS you may be able to solve your problem by using a 'hybrid proxy'
which makes it possible to do mDNS discovery over unicast DNS. See
https://tools.ietf.org/html/draft-ietf-dnssd-hybrid-10 for the standard,
and https://github.com/sbyx/ohybridproxy for an implementation. I'm
running the latter on my network, and while it does require a bit of
configuration, it works reasonably well for device discovery across
routed network segments... So depending on your setup it could likely be
made to work across wg links as well :)

-Toke


Re: Standardized IPv6 ULA from PublicKey

2020-06-29 Thread Toke Høiland-Jørgensen
Roman Mamedov  writes:

> On Mon, 29 Jun 2020 13:03:40 +0200
> Toke Høiland-Jørgensen  wrote:
>
>> Eh? This is specified pretty clearly in RFC4291, section 2.1:
>
> It also says:
>
> -
>
> 2.5.6.  Link-Local IPv6 Unicast Addresses
>
>Link-Local addresses are for use on a single link.  Link-Local
>addresses have the following format:
>
>|   10 |
>|  bits| 54 bits |  64 bits   |
>+--+-++
>|111010|   0 |   interface ID |
>+--+-++
>
>
> -
>
> So should we also follow the designated format for link-locals, or
> accept that WG's case differs from what they had in mind in those
> sections.

In general I'd say that deviating from the RFC needs a good reason.
Expanding the number of bits we can use for the identifier may be a good
reason to expand the LL interface ID width (although I'm not actually
too worried about collisions even if we only use 64 bits). I have yet to
hear a good reason for not just having LL addresses enabled by default :)

> That the "interface" is a special one, with a "link" that doesn't
> function as other kinds of links do, that there's no "neighbour" per
> se to contact by an all-neighbour multicast for instance, no mechanism
> for the "all routers" multicast to work, etc (i.e. all of what the LLs
> were intended to support).

But it's not special. If wireguard is setup as a single point-to-point
link (allowed-ip ::/0), "all-neighbour multicast" and "broadcast"
already works. If there are multiple peers on the interface it doesn't,
but that is also a bug that should be fixed as far as I'm concerned.

> To be clear I'm not against adding LLs, just that "the RFC says so"
> shouldn't be considered the main argument for that when it comes to
> WG.

Oh sure, to me the main argument is also "because it's useful" rather
than "because the RFC says so". But in my view "because it's useful" is
also the reason that requirement is in the RFC in the first place, so
really it amounts to the same thing.

-Toke


Re: Standardized IPv6 ULA from PublicKey

2020-06-29 Thread Toke Høiland-Jørgensen
Roman Mamedov  writes:

> On Mon, 29 Jun 2020 12:22:49 +0200
> Toke Høiland-Jørgensen  wrote:
>
>> Reid Rankin  writes:
>> 
>> > Each IPv6 network device is *required* to have a link-local
>> > address by the RFC
>> 
>> Given this
>
> What you quoted is the shakiest statement of the entire proposal. Might be a
> cool idea and all, but I don't think RFCs say anything about "requiring" that
> for point-to-point L3 interfaces, where there's no functioning multicast or
> broadcast to begin with. And it doesn't seem nice that submitter is trying to
> skew facts in their favor like that.

Eh? This is specified pretty clearly in RFC4291, section 2.1:

2.1.  Addressing Model

   IPv6 addresses of all types are assigned to interfaces, not nodes.
   An IPv6 unicast address refers to a single interface.  Since each
   interface belongs to a single node, any of that node's interfaces'
   unicast addresses may be used as an identifier for the node.

   All interfaces are required to have at least one Link-Local unicast
   address (see Section 2.8 for additional required addresses).  A
   single interface may also have multiple IPv6 addresses of any type
   (unicast, anycast, and multicast) or scope.  Unicast addresses with a
   scope greater than link-scope are not needed for interfaces that are
   not used as the origin or destination of any IPv6 packets to or from
   non-neighbors.  This is sometimes convenient for point-to-point
   interfaces.  There is one exception to this addressing model:

  A unicast address or a set of unicast addresses may be assigned to
  multiple physical interfaces if the implementation treats the
  multiple physical interfaces as one interface when presenting it
  to the internet layer.  This is useful for load-sharing over
  multiple physical interfaces.

   Currently, IPv6 continues the IPv4 model in that a subnet prefix is
   associated with one link.  Multiple subnet prefixes may be assigned
   to the same link.


The fact that Wireguard doesn't assign one is often a source of
annoyance, and since there already is a unique identifier for each peer
on a link (the public key), I really don't see why wg shouldn't just
assign a LL identifier and be done with it. Sure, have a config knob to
turn it off if you're not using IPv6, but let's make this the default
and have wg devices 'just work' over IPv6 by default.

-Toke


Re: Standardized IPv6 ULA from PublicKey

2020-06-29 Thread Toke Høiland-Jørgensen
Reid Rankin  writes:

> Each IPv6 network device is *required* to have a link-local
> address by the RFC

Given this, and how obvious it is to just hash the pubkey into a LL
address, I think the right thing to do would just be to take the hashing
scheme you proposed and put it into the wg kernel part, on by default.
Maybe with a switch to turn it back off for the paranoid :)

-Toke


Re: [PATCH net] wireguard: Use tunnel helpers for decapsulating ECN markings

2020-04-28 Thread Toke Høiland-Jørgensen
"Rodney W. Grimes"  writes:

> Replying to a single issue I am reading, and really hope I
> am miss understanding.  I am neither a wireguard or linux
> user so I may be miss understanding what is said.
>
> Inline at {RWG}
>
>> "Jason A. Donenfeld"  writes:
>> 
>> > Hey Toke,
>> >
>> > Thanks for fixing this. I wasn't aware there was a newer ECN RFC. A
>> > few comments below:
>> >
>> > On Mon, Apr 27, 2020 at 8:47 AM Toke H?iland-J?rgensen  
>> > wrote:
>> >> RFC6040 also recommends dropping packets on certain combinations of
>> >> erroneous code points on the inner and outer packet headers which 
>> >> shouldn't
>> >> appear in normal operation. The helper signals this by a return value > 1,
>> >> so also add a handler for this case.
>> >
>> > This worries me. In the old implementation, we propagate some outer
>> > header data to the inner header, which is technically an authenticity
>> > violation, but minor enough that we let it slide. This patch here
>> > seems to make that violation a bit worse: namely, we're now changing
>> > the behavior based on a combination of outer header + inner header. An
>> > attacker can manipulate the outer header (set it to CE) in order to
>> > learn whether the inner header was CE or not, based on whether or not
>> > the packet gets dropped, which is often observable. That's some form
>
> Why is anyone dropping on decap over the CE bit?  It should be passed
> on, not lead to a packet drop.  If the outer header is CE on an inner
> header of CE it should just continue to be a CE, dropping it is actually
> breaking the purpose of the CE codepoint, to signal congestion before
> having to cause a packet loss.
>
>> > of an oracle, which I'm not too keen on having in wireguard. On the
>> > other hand, we pretty much already _explicitly leak this bit_ on tx
>> > side -- in send.c:
>> >
>> > PACKET_CB(skb)->ds = ip_tunnel_ecn_encap(0, ip_hdr(skb), skb); // inner 
>> > packet
>> > ...
>> > wg_socket_send_skb_to_peer(peer, skb, PACKET_CB(skb)->ds); // outer packet
>> >
>> > We considered that leak a-okay. But a decryption oracle seems slightly
>> > worse than an explicit and intentional leak. But maybe not that much
>> > worse.
>> 
>> Well, seeing as those two bits on the outer header are already copied
>> from the inner header, there's no additional leak added by this change,
>> is there? An in-path observer could set CE and observe that the packet
>> gets dropped, but all they would learn is that the bits were zero
>
> Again why is CE leading to anyone dropping?
>
>> (non-ECT). Which they already knew because they could just read the bits
>> directly from the header.
>> 
>> Also note, BTW, that another difference between RFC 3168 and 6040 is the
>> propagation of ECT(1) from outer to inner header. That's not actually
>> done correctly in Linux ATM, but I sent a separate patch to fix this[0],
>> which Wireguard will also benefit from with this patch.
>
> Thanks for this.
>
>> 
>> > I wanted to check with you: is the analysis above correct? And can you
>> > somehow imagine the ==2 case leading to different behavior, in which
>> > the packet isn't dropped? Or would that ruin the "[de]congestion" part
>> > of ECN? I just want to make sure I understand the full picture before
>> > moving in one direction or another.
>> 
>> So I think the logic here is supposed to be that if there are CE marks
>> on the outer header, then an AQM somewhere along the path has marked the
>> packet, which is supposed to be a congestion signal, which we want to
>> propagate all the way to the receiver (who will then echo it back to the
>> receiver). However, if the inner packet is non-ECT then we can't
>> actually propagate the ECN signal; and a drop is thus the only
>> alternative congestion signal available to us.
>
> You cannot get a CE mark on the outer packet if the inner packet is
> not ECT, as the outer packet would also be not ECT and thus not
> eligible for CE mark.  If you get the above sited condition something
> has gone *wrong*.

Yup, you're quite right. If everything is working correctly, this should
never happen. This being the internet, though, there are bound to be
cases where it will go wrong :)

>> This case shouldn't
>> actually happen that often, a middlebox has to be misconfigured to
>> CE-mark a non-ECT packet in the first place. But, well, misconfigured
>> middleboxes do exist as you're no doubt aware :)
>
> That is true, though I believe the be liberal in what you accept
> concept would say ok, someone messed up, just propogate it and
> let the end nodes deal with it, otherwise your creating a blackhole
> that could be very hard to find.

But that would lead you to ignore a congestion signal. And someone has
to go through an awful lot of trouble to set this signal; if they're
just randomly mangling bits the packet checksum will likely be wrong and
the packet would be dropped anyway. So on balance I'd tend to agree with
the RFC that the right thing to do is to propagate the congestion

Re: [PATCH net v2] wireguard: use tunnel helpers for decapsulating ECN markings

2020-04-28 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> On Mon, Apr 27, 2020 at 3:16 PM Toke Høiland-Jørgensen  
> wrote:
>>
>> WireGuard currently only propagates ECN markings on tunnel decap according
>> to the old RFC3168 specification. However, the spec has since been updated
>> in RFC6040 to recommend slightly different decapsulation semantics. This
>> was implemented in the kernel as a set of common helpers for ECN
>> decapsulation, so let's just switch over WireGuard to using those, so it
>> can benefit from this enhancement and any future tweaks.
>>
>> RFC6040 also recommends dropping packets on certain combinations of
>> erroneous code points on the inner and outer packet headers which shouldn't
>> appear in normal operation. The helper signals this by a return value > 1,
>> so also add a handler for this case.
>
> Thanks for the details in your other email and for this v2. I've
> applied this to the wireguard tree and will send things up to net
> later this week with a few other things brewing there.

Thanks!

> By the way, the original code came out of a discussion I had with Dave
> Taht while I was coding this on an airplane many years ago. I read
> some old RFCs, made some changes, he tested them with cake, and told
> me that the behavior looked correct. And that's about as far as I've
> forayed into ECN land with WireGuard. It seems like it might be
> helpful (at some point) to add something to the netns.sh test to make
> sure that all this machinery is actually working and continues to work
> properly as things change in the future.

Yeah, good point. I guess I can look into that too at some point :)

-Toke



[PATCH net v2] wireguard: use tunnel helpers for decapsulating ECN markings

2020-04-27 Thread Toke Høiland-Jørgensen
WireGuard currently only propagates ECN markings on tunnel decap according
to the old RFC3168 specification. However, the spec has since been updated
in RFC6040 to recommend slightly different decapsulation semantics. This
was implemented in the kernel as a set of common helpers for ECN
decapsulation, so let's just switch over WireGuard to using those, so it
can benefit from this enhancement and any future tweaks.

RFC6040 also recommends dropping packets on certain combinations of
erroneous code points on the inner and outer packet headers which shouldn't
appear in normal operation. The helper signals this by a return value > 1,
so also add a handler for this case.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Reported-by: Olivier Tilmans 
Cc: Dave Taht 
Cc: Rodney W. Grimes 
Signed-off-by: Toke Høiland-Jørgensen 
---
v2:
  - Don't log decap errors, and make sure they are recorded as frame errors,
not length errors.

 drivers/net/wireguard/receive.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireguard/receive.c b/drivers/net/wireguard/receive.c
index da3b782ab7d3..ad36f358c807 100644
--- a/drivers/net/wireguard/receive.c
+++ b/drivers/net/wireguard/receive.c
@@ -393,13 +393,15 @@ static void wg_packet_consume_data_done(struct wg_peer 
*peer,
len = ntohs(ip_hdr(skb)->tot_len);
if (unlikely(len < sizeof(struct iphdr)))
goto dishonest_packet_size;
-   if (INET_ECN_is_ce(PACKET_CB(skb)->ds))
-   IP_ECN_set_ce(ip_hdr(skb));
+   if (INET_ECN_decapsulate(skb, PACKET_CB(skb)->ds,
+ip_hdr(skb)->tos) > 1)
+   goto ecn_decap_error;
} else if (skb->protocol == htons(ETH_P_IPV6)) {
len = ntohs(ipv6_hdr(skb)->payload_len) +
  sizeof(struct ipv6hdr);
-   if (INET_ECN_is_ce(PACKET_CB(skb)->ds))
-   IP6_ECN_set_ce(skb, ipv6_hdr(skb));
+   if (INET_ECN_decapsulate(skb, PACKET_CB(skb)->ds,
+ipv6_get_dsfield(ipv6_hdr(skb))) > 1)
+   goto ecn_decap_error;
} else {
goto dishonest_packet_type;
}
@@ -437,6 +439,7 @@ static void wg_packet_consume_data_done(struct wg_peer 
*peer,
 dishonest_packet_type:
net_dbg_ratelimited("%s: Packet is neither ipv4 nor ipv6 from peer %llu 
(%pISpfsc)\n",
dev->name, peer->internal_id, &peer->endpoint.addr);
+ecn_decap_error:
++dev->stats.rx_errors;
++dev->stats.rx_frame_errors;
goto packet_processed;
-- 
2.26.2



Re: [PATCH net] wireguard: Use tunnel helpers for decapsulating ECN markings

2020-04-27 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Hey Toke,
>
> Thanks for fixing this. I wasn't aware there was a newer ECN RFC. A
> few comments below:
>
> On Mon, Apr 27, 2020 at 8:47 AM Toke Høiland-Jørgensen  
> wrote:
>> RFC6040 also recommends dropping packets on certain combinations of
>> erroneous code points on the inner and outer packet headers which shouldn't
>> appear in normal operation. The helper signals this by a return value > 1,
>> so also add a handler for this case.
>
> This worries me. In the old implementation, we propagate some outer
> header data to the inner header, which is technically an authenticity
> violation, but minor enough that we let it slide. This patch here
> seems to make that violation a bit worse: namely, we're now changing
> the behavior based on a combination of outer header + inner header. An
> attacker can manipulate the outer header (set it to CE) in order to
> learn whether the inner header was CE or not, based on whether or not
> the packet gets dropped, which is often observable. That's some form
> of an oracle, which I'm not too keen on having in wireguard. On the
> other hand, we pretty much already _explicitly leak this bit_ on tx
> side -- in send.c:
>
> PACKET_CB(skb)->ds = ip_tunnel_ecn_encap(0, ip_hdr(skb), skb); // inner packet
> ...
> wg_socket_send_skb_to_peer(peer, skb, PACKET_CB(skb)->ds); // outer packet
>
> We considered that leak a-okay. But a decryption oracle seems slightly
> worse than an explicit and intentional leak. But maybe not that much
> worse.

Well, seeing as those two bits on the outer header are already copied
from the inner header, there's no additional leak added by this change,
is there? An in-path observer could set CE and observe that the packet
gets dropped, but all they would learn is that the bits were zero
(non-ECT). Which they already knew because they could just read the bits
directly from the header.

Also note, BTW, that another difference between RFC 3168 and 6040 is the
propagation of ECT(1) from outer to inner header. That's not actually
done correctly in Linux ATM, but I sent a separate patch to fix this[0],
which Wireguard will also benefit from with this patch.

> I wanted to check with you: is the analysis above correct? And can you
> somehow imagine the ==2 case leading to different behavior, in which
> the packet isn't dropped? Or would that ruin the "[de]congestion" part
> of ECN? I just want to make sure I understand the full picture before
> moving in one direction or another.

So I think the logic here is supposed to be that if there are CE marks
on the outer header, then an AQM somewhere along the path has marked the
packet, which is supposed to be a congestion signal, which we want to
propagate all the way to the receiver (who will then echo it back to the
receiver). However, if the inner packet is non-ECT then we can't
actually propagate the ECN signal; and a drop is thus the only
alternative congestion signal available to us. This case shouldn't
actually happen that often, a middlebox has to be misconfigured to
CE-mark a non-ECT packet in the first place. But, well, misconfigured
middleboxes do exist as you're no doubt aware :)

>> +   if (INET_ECN_decapsulate(skb, PACKET_CB(skb)->ds,
>> +ip_hdr(skb)->tos) > 1)
>> +   if (INET_ECN_decapsulate(skb, PACKET_CB(skb)->ds,
>> +ipv6_get_dsfield(ipv6_hdr(skb))) > 
>> 1)
>
> The documentation for the function says:
>
> *  returns 0 on success
> *  1 if something is broken and should be logged (!!! above)
> *  2 if packet should be dropped
>
> Would it be more clear to explicitly check for ==2 then?

Hmm, maybe? Other callers seem to use >1, so I figured it was better to
be consistent with those. I won't insist on that, though, so if you'd
rather I use a ==2 check I can certainly change it?

>> +ecn_decap_error:
>> +   net_dbg_ratelimited("%s: Non-ECT packet from peer %llu (%pISpfsc)\n",
>> +   dev->name, peer->internal_id, 
>> &peer->endpoint.addr);
>
> All the other error messages in this block are in the form of: "Packet
> .* from peer %llu (%pISpfsc)\n", so better text here might be "Packet
> is non-ECT from peer %llu (%pISpfsc)\n". However, do you think we
> really need to log to the console for this? Does it add much in the
> way of wireguard internals debugging? Can't network congestion induce
> it in complicated scenarios? Maybe it'd be best just to increment
> those error counters instead.

The other callers do seem to hide the logging be

[PATCH net] wireguard: Use tunnel helpers for decapsulating ECN markings

2020-04-27 Thread Toke Høiland-Jørgensen
WireGuard currently only propagates ECN markings on tunnel decap according
to the old RFC3168 specification. However, the spec has since been updated
in RFC6040 to recommend slightly different decapsulation semantics. This
was implemented in the kernel as a set of common helpers for ECN
decapsulation, so let's just switch over WireGuard to using those, so it
can benefit from this enhancement and any future tweaks.

RFC6040 also recommends dropping packets on certain combinations of
erroneous code points on the inner and outer packet headers which shouldn't
appear in normal operation. The helper signals this by a return value > 1,
so also add a handler for this case.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Reported-by: Olivier Tilmans 
Cc: Dave Taht 
Cc: Rodney W. Grimes 
Signed-off-by: Toke Høiland-Jørgensen 
---
 drivers/net/wireguard/receive.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireguard/receive.c b/drivers/net/wireguard/receive.c
index da3b782ab7d3..f33e476ad574 100644
--- a/drivers/net/wireguard/receive.c
+++ b/drivers/net/wireguard/receive.c
@@ -393,13 +393,15 @@ static void wg_packet_consume_data_done(struct wg_peer 
*peer,
len = ntohs(ip_hdr(skb)->tot_len);
if (unlikely(len < sizeof(struct iphdr)))
goto dishonest_packet_size;
-   if (INET_ECN_is_ce(PACKET_CB(skb)->ds))
-   IP_ECN_set_ce(ip_hdr(skb));
+   if (INET_ECN_decapsulate(skb, PACKET_CB(skb)->ds,
+ip_hdr(skb)->tos) > 1)
+   goto ecn_decap_error;
} else if (skb->protocol == htons(ETH_P_IPV6)) {
len = ntohs(ipv6_hdr(skb)->payload_len) +
  sizeof(struct ipv6hdr);
-   if (INET_ECN_is_ce(PACKET_CB(skb)->ds))
-   IP6_ECN_set_ce(skb, ipv6_hdr(skb));
+   if (INET_ECN_decapsulate(skb, PACKET_CB(skb)->ds,
+ipv6_get_dsfield(ipv6_hdr(skb))) > 1)
+   goto ecn_decap_error;
} else {
goto dishonest_packet_type;
}
@@ -446,6 +448,12 @@ static void wg_packet_consume_data_done(struct wg_peer 
*peer,
++dev->stats.rx_errors;
++dev->stats.rx_length_errors;
goto packet_processed;
+ecn_decap_error:
+   net_dbg_ratelimited("%s: Non-ECT packet from peer %llu (%pISpfsc)\n",
+   dev->name, peer->internal_id, &peer->endpoint.addr);
+   ++dev->stats.rx_errors;
+   ++dev->stats.rx_length_errors;
+   goto packet_processed;
 packet_processed:
dev_kfree_skb(skb);
 }
-- 
2.26.2



Re: Attaching XDP program into wireguard interface

2020-04-24 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> On Fri, Apr 24, 2020 at 3:59 PM Toke Høiland-Jørgensen  wrote:
>>
>> "Jason A. Donenfeld"  writes:
>>
>> > Oh. Set XDP_FLAGS_SKB_MODE.
>>
>> Yeah, you'd definitely need to run this in skb/generic XDP mode.
>>
>> -Toke
>
> It looks like the code in question is likely:
>
> bpf_op = bpf_chk = ops->ndo_bpf;
>if (!bpf_op && (flags & (XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE))) {
>NL_SET_ERR_MSG(extack, "underlying driver does not
> support XDP in native mode");
>return -EOPNOTSUPP;
>}
>if (!bpf_op || (flags & XDP_FLAGS_SKB_MODE))
>bpf_op = generic_xdp_install;
>if (bpf_op == bpf_chk)
>bpf_chk = generic_xdp_install;
>
> It looks like bpf_op == generic_xdp_install is the case when neither
> XDP_FLAGS_DRV_MODE or XDP_FLAGS_HW_MODE is set. Setting
> XDP_FLAGS_SKB_MODE explicitly will force it on all drivers, but not
> specifying it will fallback to it if the driver doesn't have hardware
> support, which is WireGuard's case, unless either XDP_FLAGS_DRV_MODE
> or XDP_FLAGS_HW_MODE are set.

Yup, that sounds right :)

-Toke


Re: Attaching XDP program into wireguard interface

2020-04-24 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Oh. Set XDP_FLAGS_SKB_MODE.

Yeah, you'd definitely need to run this in skb/generic XDP mode.

-Toke


Re: Tunnel traffic in VRF

2020-01-25 Thread Toke Høiland-Jørgensen
"Steven Honson"  writes:

> Hi Daniele,
>
> By VRFs, do you mean Linux network namespaces, or something different?

VRFs are a separate feature from namespaces - see
https://cumulusnetworks.com/blog/vrf-for-linux/

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: multiple endpoints for a single peer -- implementation details

2020-01-16 Thread Toke Høiland-Jørgensen
Motiejus Jakštys  writes:

> On Thu, Jan 16, 2020 at 11:55 AM Toke Høiland-Jørgensen  wrote:
>>
>> Motiejus Jakštys  writes:
>>
>> > [...]
>> Before you go and re-invent the happy eyeballs algorithm, may I suggest
>> you read the RFC? :)
>>
>> https://tools.ietf.org/html/rfc8305
>>
>> Specifically, the considerations for how to do multiple connection
>> attempts safely without overloading the network is relevant for this. As
>> is the bit about sorting DNS responses.
>
> From the RFC I learned that the client should not issue parallel
> requests, and time between tries must be reasonable (say 100ms). So
> the original problem goes away. Thanks for the link.

You're welcome :)

>> [...]
>>
>> > 2. more a concern: neither kernel, nor wireguard-go implementations
>> > are willing to accept more than one endpoint, and it would be messy to
>> > extend:
>> >
>> > include/uapi/linux/wireguard.h
>> >   WGPEER_A_ENDPOINT: struct sockaddr_in or struct sockaddr_in6
>> >
>> > device/uapi.go:
>> >   case "endpoint":
>> >   ...
>> > endpoint, err := CreateEndpoint(value)
>> >
>> > Endpoint is fixed to be a single UDP address, and both kernel and
>> > wireguard-go refuse unknown keys. To have tooling
>> > backwards-compatibility (i.e. use newer wireguard-tools with older
>> > kernel implementations), wireguard-tools would need to know the
>> > supported "features" of the underlying implementation. And there is no
>> > version negotiation between the user/kernel space. Which makes it
>> > tricky to add features like this.
>>
>> Eh? The kernel API is netlink - you could just add multiple
>> WGPEER_A_ENDPOINT attributes. Or add a new one
>> (WGPEER_A_ENDPOINTS_MULTI?). Same thing for wireguard-go (I assume).
>
> New netlink attribute in userspace alone would mean that older kernels
> will not understand the message. Now that I looked at the headers
> again, kernel does send WGPEER_A_PROTOCOL_VERSION. Which solves the
> problem. Go has an equivalent "protocol_version".

Or a new userspace just adds the new attribute, and if the kernel
doesn't understand it, it will be rejected and userspace can fall back
to the old behaviour. At least I hope that's what the kernel does :)

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: multiple endpoints for a single peer -- implementation details

2020-01-16 Thread Toke Høiland-Jørgensen
Motiejus Jakštys  writes:

> Hi all,
>
> I thought I'd implement a prototype to use multiple endpoints for a
> single peer, but after some analysis on "## Select new endpoint during
> each handshake"[1], I'd like to share the concerns with future readers
> who might try the same endeavor. TLDR: I think the kernel is not in
> the best position to do this, "decision making in user space" may be
> more appropriate.
>
> To make it happen, handshake process would change. New suggested flow:
> - Initiator sends a handshake packet to all endpoints quasi-simultaneously.
>   - Each handshake is a new message with a different ephemeral key et al.
> - Responder receives the first one and responds.
> - Responder receives more handshakes within 1/INITIATIONS_PER_SECOND
> and discards them.
> - Responder may receive more after 1/INITIATIONS_PER_SECOND and responds.
>
> Responder needs to maintain more than one handshake state for
> MAX_TIMER_HANDSHAKES, specifically, the whole `noise_handshake`
> struct. Following a later suggestion in the thread, this can have an
> upper bound of MAX_ENDPOINTS_PER_PEER (TBD constant).

Before you go and re-invent the happy eyeballs algorithm, may I suggest
you read the RFC? :)

https://tools.ietf.org/html/rfc8305

Specifically, the considerations for how to do multiple connection
attempts safely without overloading the network is relevant for this. As
is the bit about sorting DNS responses.

[...]

> 2. more a concern: neither kernel, nor wireguard-go implementations
> are willing to accept more than one endpoint, and it would be messy to
> extend:
>
> include/uapi/linux/wireguard.h
>   WGPEER_A_ENDPOINT: struct sockaddr_in or struct sockaddr_in6
>
> device/uapi.go:
>   case "endpoint":
>   ...
> endpoint, err := CreateEndpoint(value)
>
> Endpoint is fixed to be a single UDP address, and both kernel and
> wireguard-go refuse unknown keys. To have tooling
> backwards-compatibility (i.e. use newer wireguard-tools with older
> kernel implementations), wireguard-tools would need to know the
> supported "features" of the underlying implementation. And there is no
> version negotiation between the user/kernel space. Which makes it
> tricky to add features like this.

Eh? The kernel API is netlink - you could just add multiple
WGPEER_A_ENDPOINT attributes. Or add a new one
(WGPEER_A_ENDPOINTS_MULTI?). Same thing for wireguard-go (I assume).

> I am suggesting that "## Decision-making in userspace" would work
> better here. Userspace would regularly* issue handshake initiations
> and measure how long it takes for each endpoint, and hand over the
> *single* endpoint to the kernel to connect.

Why not just let userspace add more endpoints to an already established
peer, and have the kernel figure out the selection between them?

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: organization of wireguard linux kernel repos moving forward

2019-12-09 Thread Toke Høiland-Jørgensen
David Ahern  writes:

> On 12/9/19 5:49 AM, Jason A. Donenfeld wrote:
>> I'd definitely be interested in this. Back in 2015, that was the plan.
>> Then it took a long time to get to where we are now, and since then
>> wg(8) has really evolved into its own useful thing. The easiest thing
>> would be to move wg(8) wholesale into iproute2 like you suggested;
>> that'd allow people to continue using their infrastructure and whatnot
>> they've used for a long time now. A more nuanced approach would be
>> coming up with a _parallel_ iproute2 tool with mostly the same syntax
>> as wg(8) but as a subcommand of ip(8). Originally the latter appealed
>> to me, but at this point maybe the former is better after all. I
>> suppose something to consider is that wg(8) is actually a
>> cross-platform tool now, with a unified syntax across a whole bunch of
>> operating systems. But it's also just boring C.
>
> If wg is to move into iproute2, it needs to align with the other
> commands and leverage the generic facilities where possible. ie., any
> functionality that overlaps with existing iproute2 code to be converted
> to use iproute2 code.

Thought that might be the case :)

That means a re-implementation, then. In which case the question becomes
whether it's better to do it as an 'ip' subcommand (or even just new
parameters to 'ip link'), or a new top-level utility striving for
compatibility with 'wg'. But that's mostly a UI issue...

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: organization of wireguard linux kernel repos moving forward

2019-12-09 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> On Mon, Dec 9, 2019 at 1:43 PM Toke Høiland-Jørgensen  wrote:
>>
>> "Jason A. Donenfeld"  writes:
>>
>> > 2) wireguard-tools.git will have the userspace utilities and scripts,
>> > such as wg(8) and wg-quick(8), and be easily packageable by distros.
>> > This repo won't be live until we get a bit closer to the 5.6 release,
>> > but when it is live, it will live at:
>> > https://git.zx2c4.com/wireguard-tools/ [currently 404s]
>> > https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/wireguard-tools.git/
>> > [currently 404s]
>>
>> Any plans for integrating this further with iproute2? One could imagine
>> either teaching 'ip' about the wireguard-specific config (keys etc), or
>> even just moving the 'wg' binary wholesale into iproute2?
>
> I'd definitely be interested in this. Back in 2015, that was the plan.
> Then it took a long time to get to where we are now, and since then
> wg(8) has really evolved into its own useful thing. The easiest thing
> would be to move wg(8) wholesale into iproute2 like you suggested;
> that'd allow people to continue using their infrastructure and whatnot
> they've used for a long time now. A more nuanced approach would be
> coming up with a _parallel_ iproute2 tool with mostly the same syntax
> as wg(8) but as a subcommand of ip(8). Originally the latter appealed
> to me, but at this point maybe the former is better after all. I
> suppose something to consider is that wg(8) is actually a
> cross-platform tool now, with a unified syntax across a whole bunch of
> operating systems.

Hmm, I don't really have any opinion about which approach makes the most
sense; I'm primarily concerned with getting the support into iproute2 so
that it is possible to set up and configure a wireguard tunnel "out of
the box". Both approaches would achieve that, I think...

> But it's also just boring C.

Well, we could always rewrite it in Rust or something? ;)

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: organization of wireguard linux kernel repos moving forward

2019-12-09 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> 2) wireguard-tools.git will have the userspace utilities and scripts,
> such as wg(8) and wg-quick(8), and be easily packageable by distros.
> This repo won't be live until we get a bit closer to the 5.6 release,
> but when it is live, it will live at:
> https://git.zx2c4.com/wireguard-tools/ [currently 404s]
> https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/wireguard-tools.git/
> [currently 404s]

Any plans for integrating this further with iproute2? One could imagine
either teaching 'ip' about the wireguard-specific config (keys etc), or
even just moving the 'wg' binary wholesale into iproute2?

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: WireGuard to port to existing Crypto API

2019-11-19 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Hey folks,
>
> Small update on this thread: it turns out that at the same time as I
> was stepping toward compromising and using the old crypto API here
> with this thread, other kernel developers were interested in
> compromising to upstream some aspects of Zinc. The result is that
> everybody took constructive steps toward each other, and the first
> part of Zinc has been merged:
>
> https://lore.kernel.org/linux-crypto/CAHmME9rxGp439vNYECm85bgibkVyrN7Qc+5v3r8QBmBXPZM=d...@mail.gmail.com/
>
> It's not called "Zinc" any more, and some of the design decisions I
> liked aren't there, but I think the lion's share of what we were after
> is there, and a few other pieces should be possible to upstream one at
> a time.
>
> These steps forward should unlock WireGuard upstreaming, which I
> expect to get rolling again soon. WireGuard is probably mostly okay,
> but I still do anticipate review with lots of feedback to incorporate,
> since now there's more impetus for people to take the patch submission
> seriously. I'll keep this list updated as we move forward.

This is great news! I'll keep by eyes peeled for the wireguard
submission. Do note that net-next closes during the merge window,
though, so you'll probably have to wait until after 5.5-rc1 is out the
door to submit that (and so we'll get Wireguard with Linux 5.6).

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: WireGuard to port to existing Crypto API

2019-09-25 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Hi folks,
>
> I'm at the Kernel Recipes conference now and got a chance to talk with
> DaveM a bit about WireGuard upstreaming. His viewpoint has recently
> solidified: in order to go upstream, WireGuard must port to the
> existing crypto API, and handle the Zinc project separately. As DaveM
> is the upstream network tree maintainer, his opinion is quite
> instructive.
>
> I've long resisted the idea of porting to the existing crypto API,
> because I think there are serious problems with it, in terms of
> primitives, API, performance, and overall safety. I didn't want to
> ship WireGuard in a form that I thought was sub-optimal from a
> security perspective, since WireGuard is a security-focused project.
>
> But it seems like with or without us, WireGuard will get ported to the
> existing crypto API. So it's probably better that we just fully
> embrace it, and afterwards work evolutionarily to get Zinc into Linux
> piecemeal. I've ported WireGuard already several times as a PoC to the
> API and have a decent idea of the ways it can go wrong and generally
> how to do it in the least-bad way.
>
> I realize this kind of compromise might come as a disappointment for
> some folks. But it's probably better that as a project we remain
> intimately involved with our Linux kernel users and the security of
> the implementation, rather than slinking away in protest because we
> couldn't get it all in at once. So we'll work with upstream, port to
> the crypto API, and get the process moving again. We'll pick up the
> Zinc work after that's done.

On the contrary, kudos on taking the pragmatic route! Much as I have
enjoyed watching your efforts on Zinc, I always thought it was a shame
it had to hold back the upstreaming of WireGuard. So as far as I'm
concerned, doing that separately sounds like the right approach at this
point, and I'll look forward to seeing the patches land :)

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Introduce Wireguard support to bird

2019-07-08 Thread Toke Høiland-Jørgensen
Janne Heß  writes:

> On 6/8/19 12:18 AM, Toke Høiland-Jørgensen wrote:
>> Janne Heß  writes:
>> 
>>> Hey everyone,
>>>
>>> as advertised, I have completed the Wireguard support.
>>> You might see that I am not really a C expert, but I hope the code is good 
>>> enough.
>>> If you need me to change anything or have additional questions, just
>>> let me know.
>> 
>> Hi Janne
>> 
>> Awesome that you have taken the time to implement this!
>> 
>> I'll let the Bird maintainers comment on the approach of embedding the
>> wireguard netlink library; but I have two other concerns:
>> 
>> - As far as I can tell there's nothing preventing Bird from removing
>>AllowedIPs that it did not itself install from an interface, right?
>> 
>> - The algorithm is basically O(P*M*N) for inserting N routes on an
>>interface with P peers that each have M existing AllowedIPs. That is
>>not going to scale very far :/
>> 
>> -Toke
>> 
>
> Hi Toke,
>
> thanks for your review.
>
> Your first observation is correct. Due to differences in semantics, this 
> cannot be implemented in the same way it is implemented in the KRT 
> protocol because AllowedIPs don't have a notion of the proto field.

Yeah, not sure what a good way to handle that might be. You could add
this capability to the upstream wireguard? 

> Addressing your second point, the WireGuard netlink interface and 
> library only offer and understand the peers and allowed IPs as a linked 
> list.
> As we only get each route seperately, we cannot process multiple routes 
> in one run. We're open to suggestions and patches.

Well, there are a couple of options, I guess:

- Fix the wireguard netlink interface to work better with
  insertion/deletion semantics.

- Cache the kernel response in a data structure that allows more
  efficient lookups.


Before doing any of this, you may want to wait for feedback on whether
this approach to adding the support is the right way to go. Personally,
I tend to think that the "right" way to add the support is by
introducing a new 'wireguard' protocol type to Bird. But that's up to
Ondrej et al, of course :)

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: [ANNOUNCE] WireGuard Snapshot `0.0.20190702` Available

2019-07-04 Thread Toke Høiland-Jørgensen
Jaron Kent-Dobias  writes:

> On 7/4/19 5:45 AM, Jason A. Donenfeld wrote:
>> Try running `rm net/wireguard/.check` and trying again.
>> 
>> https://git.zx2c4.com/WireGuard/commit/?id=c31f8664cff475b8f4160506e582fc423c71f381
>> 
>> Something something cthulhu something.
>
> It worked! Interesting use of an inline Euclid's algorithm.

Wow, yeah. I'm impressed and horrified; cthulhu fhtagn indeed!

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Introduce Wireguard support to bird

2019-06-07 Thread Toke Høiland-Jørgensen
Ondrej Zajicek  writes:

> On Fri, Jun 07, 2019 at 06:21:42PM +, Janne Heß wrote:
>> Hey everyone,
>> 
>> as advertised, I have completed the Wireguard support.
>> You might see that I am not really a C expert, but I hope the code is good 
>> enough.
>> If you need me to change anything or have additional questions, just let me 
>> know.
>
> Hi
>
> Could you explain what the Wireguard support broadly does (compared to
> regular route update)?

Wireguard ties prefixes to public keys to figure out which peer to send
a packet to. This supersedes any regular routing on that interface, so
it needs to be done in addition to installing the route on the
interface. See https://www.wireguard.com/#cryptokey-routing

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Introduce Wireguard support to bird

2019-06-07 Thread Toke Høiland-Jørgensen
Janne Heß  writes:

> Hey everyone,
>
> as advertised, I have completed the Wireguard support.
> You might see that I am not really a C expert, but I hope the code is good 
> enough.
> If you need me to change anything or have additional questions, just
> let me know.

Hi Janne

Awesome that you have taken the time to implement this!

I'll let the Bird maintainers comment on the approach of embedding the
wireguard netlink library; but I have two other concerns:

- As far as I can tell there's nothing preventing Bird from removing
  AllowedIPs that it did not itself install from an interface, right?

- The algorithm is basically O(P*M*N) for inserting N routes on an
  interface with P peers that each have M existing AllowedIPs. That is
  not going to scale very far :/

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Overlapping AllowedIPs Configuration

2019-06-06 Thread Toke Høiland-Jørgensen
Paul Zillmann  writes:

> Hello,
>
> we have the same problem here, although our allowed IP ranges should be 
> 0.0.0.0/0 for all peers.
> We have OSPF traffic on the wireguard links so it should be task of the 
> Kernel's routing table to decide where to send what.
>
> The problem is that the allowed-ips configuration has multiple purposes: 
> routing table and firewall/packet filter. This introduces these 
> problems. It would be helpfull to get a compile flag or something else 
> to make this behavior optional.

That is probably not going to happen; the crypto-routing is quite
integral to Wireguard, and is an important security feature.

> Right now Wireguard isn't very friendly to dynamic routing.
>
> I came up with multiple solutions:
> - create multiple interfaces + tunnels.
> or
> - create a bash script that injects the Kernel's routing table into the 
> wg tool every other minute.
>
> Do you guys have a better idea? If not I would create the bash script.

IMO, the "right" way to fix this is to make your routing daemon aware of
wireguard and have it configure the routes directly into the wireguard
table. That also gives you security for your routing protocol
automatically (since only authenticated sources/destinations will be
allowed), as long as you have a secure way of bootstrapping the
wireguard keying.

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Host routes – ARP on wireguard interfaces?

2018-12-03 Thread Toke Høiland-Jørgensen
Matthias Urlichs  writes:

> On 03.12.18 14:14, Toke Høiland-Jørgensen wrote:
>> I'm not sure I quite understand what it is you are trying to achieve;
>> why can't just you reconfigure the wireguard interface to route the IP
>> to the right peer?
>
> Because that (a) requires a new mechanism and (b) requires locking,
> because currently you can't atomically remove/add an address from/to a peer.
>
> For a "normal" interface I'd change the host route to whatever the
> nexthop to the real destination address is, and I'm *done*. That's one
> atomic "ip route replace" command (or its netlink equivalent). I've
> found a couple of HA management programs which can do that.
>
> For a wireguard interface I need to find the correct peer (by matching
> the real destination against all Allowed-IP entries), lock the peer
> against changes, read the Allowed-IP list, add the multihomed address,
> and write the list back. Before/after I do all of this I have to remove
> the multihomed address from whatever peer it was previously set to, so
> there's an indeterminate time during which the destination is either
> unreachable or random. The aforementioned HA managers have no idea what
> wireguard is, and their authors may or may not be interested in
> special-casing a still-somewhat-obscure network interface type.

I'm pretty sure you don't have to go through that whole dance. You can
just add the AllowedIP to the new peer, which Wireguard will interpret
as a 'move'.

That still leaves the 'new mechanism' complaint, of course. I'm not sure
it's quite trivial to have the kernel-side do what you want, though, so
it may not be something that is likely to show up. I have a similar
problem because I want to run routing protocols over wireguard; my plan
is to teach the Bird routing daemon about wg peers to resolve this...

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Host routes – ARP on wireguard interfaces?

2018-12-03 Thread Toke Høiland-Jørgensen
Matthias Urlichs  writes:

> Hi,
>
> My problem: I have a couple of host routes within my network that point
> to a service's "current master". When that machine goes down, I need to
> automatically redirect its address to a different gateway. That's easy –
> lookup the nexthop for the "real" server address and change the host
> route – except when the destination is on wireguard and needs to be
> (re)directed to a (different) peer.
>
> My preferred solution for this (and also the one that is most intuitive
> IMHO) would be some sort of pseudo-ARP – just enough to make "ip route
> add FOO via BAR" work, 'BAR' being a host address that's associated with
> exactly one wireguard peer.
>
> Thoughts?

I'm not sure I quite understand what it is you are trying to achieve;
why can't just you reconfigure the wireguard interface to route the IP
to the right peer?

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Multiple endpoints with same public key

2018-12-03 Thread Toke Høiland-Jørgensen
Davide Depau  writes:

> A few days ago I was struggling with a very slow connection and I was
> wondering whether WireGuard can support this setup (please see attached
> graph).
>
> There is a WireGuard server (the port it's listening on is reachable from
> the outside), then one client with two interfaces connected to the Internet
> with two different IP addresses.
>
> Is it possible to have *one* WireGuard interface on the client, which sends
> packets to the server through both interfaces in a round-robin fashion? I
> would expect the server to detect the client (identified by the public key)
> is sending packets from multiple endpoints, and send packets to both
> endpoints.

I think this would be better solved at a higher layer: Run two tunnels
(to two different port numbers on the server, for instance), and have
the kernel do ECMP routing across both wireguard interfaces...

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Sending just ssh traffic via wg

2018-10-05 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Hey Konstantin,
>
> When you're doing policy routing with packets that are being forwarded
> by the system -- a router, for example -- then the prerouting table is
> sufficient. But for locally generated packets, you have to use the
> OUTPUT table and also probably MASQUERADE. I just reproduced
> everything here and confirm this works:
>
> ip route add default dev wg0 table 2468
> ip rule add fwmark 1234 table 2468
> wg set wg0 peer [...] allowed-ips 0.0.0.0/0
> sysctl net.ipv4.conf.wg0.rp_filter=0
> iptables -t nat -A POSTROUTING -p tcp --dport 22 -m addrtype
> --src-type LOCAL -j MASQUERADE
> iptables -t mangle -A OUTPUT -p tcp --dport 22 -j MARK --set-mark 1234

Any reason why you can't just do

ip rule add dport 22 lookup 2468

?

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Configure WireGuard for Roaming Between IPv4, IPv6

2018-09-16 Thread Toke Høiland-Jørgensen
Lane Russell  writes:

> Since this is a home setup and my /56 might (will) change at some
> point, I don't want to have to reconfigure my router, server, and
> clients. Unless there's a way to dynamically reconfigure these devices
> in such a situation?

Ah, right; renumbering is a PITA. Hmm, you could tell your ISP to get a
clue and stop doing that? ;)

Otherwise I suppose it *may* be possible to run a DHCPv6 server on the
wireguard server. Assuming the clients have the server configured with
AllowedIPs=::/0, the DHCP request should get through to the server. So
as long as the DHCP daemon replies via unicast, it could work. You'd
need to manually configure a static link-local (fe80::/64) address for
each client, which can be used as a source address for a DHCP request.
And then have a trigger script on the wireguard server add the IP to
AllowedIPs for the peer when it assigns an IP. I think odhcpd (that is
used on OpenWrt) allows this at least.

Note I haven't tested this; but I *think* it could be made to work, with
a bit of tweaking :)

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Configure WireGuard for Roaming Between IPv4, IPv6

2018-09-16 Thread Toke Høiland-Jørgensen
Lane Russell  writes:

> Thanks so much for setting me straight. I've gotten IPv6 working over
> my IPv4 tunnels to ensure that IPv6 traffic can't leak out while I'm
> using Wireguard. Since my ISP uses SLAAC to hand out /56s, I have a
> /64 pointed at the local subnet where my VPN server is. From there,
> the VPN clients use my ULA prefix to talk to the server. The server
> masquerades these ULA addresses to its global address.

Why are you using masquerading? Kinda defeats the whole point of IPv6,
doesn't it? :)

You can just pick a public /64 from your subnet and assign that for use
inside the tunnel, then give your clients addresses from that and use
normal routing on the wireguard server. You'll have to get the prefix
routed to your wireguard server, of course; either set that up manually,
or use something like DHCP prefix delegation, or a routing daemon...

If you don't want to use a whole /64 (but really, there's no reason you
shouldn't be able to), you can also use /128's inside the tunnel and
just route those from your gateway to your wireguard server.

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: wg addconf :: AllowedIPs gets deleted with the additions of peers

2018-06-26 Thread Toke Høiland-Jørgensen
Adrian Sevcenco  writes:

> On 06/25/2018 11:37 PM, Toke Høiland-Jørgensen wrote:
>> Adrian Sevcenco  writes:
>> 
>>> On 06/25/2018 10:55 PM, Toke Høiland-Jørgensen wrote:
>>>> Adrian Sevcenco  writes:
>>>>
>>>>> Hi! It seems that AllowedIPs declaration gets erased when peers are
>>>>> added with addconf
>>>>
>>>> You can't have the same AllowedIPs for two different peers... :)
>>>
>>> Err... so, it's a bug or a feature?
>> 
>> A feature. The AllowedIPs controls which IP addresses will be routed to
>> that peer. They refer to addresses inside the tunnel. So depending on
>> your setup you'd specify the single IP you assign each peer, or possibly
>> any subnets behind that peer you want routed through the tunnel.
> Then, how can i set a default allow everything for each peer? Should i 
> make a different tunnel for each peer?

Yes, if you want point-to-point links where all traffic is sent to a
single other peer, you'll need separate interfaces.

If you want a road warrior type setup, where client devices connect to a
server and use that as a default gateway, you'd assign each client a
single IP (inside the tunnel) and put that in each peer config's
allowedips. The clients can then all have 0.0.0.0/0 as allowedip of the
server.

> But given your explanation i still feel that it is a bug that when an 
> AllowIPs is declared with the addition of a second peer the declaration 
> from the first peer gets erased ...

Well, the UI can be surprising, sure, but the alternative would be to
report an error if you try to set the same allowedIP on different peers,
which is not necessarily better. And it's not a bug in that it is
intentional behaviour ;)

> It should be either a global setting per tunnel OR an individual setting 
> per peer (in which case it should stay set)

I think the point of confusion is that it is called 'allowedips', but it
really means 'ips that are allowed from this peer *and* routed to this
peer'. I.e., it is also a routing table. See
https://www.wireguard.com/#cryptokey-routing


-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: wg addconf :: AllowedIPs gets deleted with the additions of peers

2018-06-25 Thread Toke Høiland-Jørgensen
Adrian Sevcenco  writes:

> On 06/25/2018 10:55 PM, Toke Høiland-Jørgensen wrote:
>> Adrian Sevcenco  writes:
>> 
>>> Hi! It seems that AllowedIPs declaration gets erased when peers are
>>> added with addconf
>> 
>> You can't have the same AllowedIPs for two different peers... :)
>
> Err... so, it's a bug or a feature?

A feature. The AllowedIPs controls which IP addresses will be routed to
that peer. They refer to addresses inside the tunnel. So depending on
your setup you'd specify the single IP you assign each peer, or possibly
any subnets behind that peer you want routed through the tunnel.

> If it is a feature how can i make server accept whatever ip get the 
> client(s) in various networks?

Changing IPs *on the outside* of the tunnel will be accepted
automatically. The Endpoint specifier is only the initial address; if a
device changes its IP, it'll just keep sending packets from the new IP,
and because they are authenticated by the crypto, the other peer will
accept them and change its notion of what IP the other peer is
reachable at automatically. So as long as only one peer changes its IP
at a time, roaming mostly just works :)

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: wg addconf :: AllowedIPs gets deleted with the additions of peers

2018-06-25 Thread Toke Høiland-Jørgensen
Adrian Sevcenco  writes:

> Hi! It seems that AllowedIPs declaration gets erased when peers are 
> added with addconf

You can't have the same AllowedIPs for two different peers... :)

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Need for HW-clock independent timestamps

2018-05-16 Thread Toke Høiland-Jørgensen
Axel Neumann  writes:

> On 13.05.2018 14:37, Toke Høiland-Jørgensen wrote:> Matthias Urlichs
>  writes:
>>
>>> Can anybody think of problems with this solution?
>>
>> Well, the possibility of DOS if you set the counter too high,
>
> Correct me please, but skipping even many counter values should not be
> a problem at all. So do you mean DOS in case your hit a wrap around of
> the counter? IMO this can be easily prevented.

No I meant DOS if you fail to save state properly. I.e., I send seqno
10, lose my state, reboot, and re-initialise to seqno 100. I have
now essentially locked myself out of the network until my seqno goes
above 10 again. Since I have no way of reliably detecting this
condition, there is no straight-forward manual recovery possible.

>> and the
>> possibility of replay attacks if you fail to save the last state when
>> you shut down comes to mind :)
>
> Where is that possibility? If you fail then you would send
> handshake_initiation messages with an already outdated timestamp
> field. Exactly what now happens by default with non-HWC equipped
> devices after each reboot.

You'd need to not only save your own seqno, but also the last seen seqno
from every peer. Otherwise you're vulnerable to a replay attack after
rebooting. And if you lose that state you are, well, vulnerable to a
replay attack after rebooting :)

>> (Not saying it's not possible to create a workable solution, just that
>> it's not trivial and requires careful thought to not break the security
>> assumptions of the protocol).
>
> I agree, but looking at the recent discussion (how to secure NTP as a
> work around for for non-HWC devices) some of the assumptions made by
> the current approach seem already quite questionable to me right now.
> Like super-simple WG and firewall setup. Instead of two-lines
> documentation you will likely need 2 pages plus some references for
> further reading to other tools (like NTP) and also inherit related
> problems. That does not sound like the WG philosophy to me.

Oh, I totally agree that it would be good if a solution could be found
to this. I'm just objecting to the assertion that "it's easy, just
replace the timestamp with an increasing seqno".

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Need for HW-clock independent timestamps

2018-05-13 Thread Toke Høiland-Jørgensen
Matthias Urlichs  writes:

> Can anybody think of problems with this solution?

Well, the possibility of DOS if you set the counter too high, and the
possibility of replay attacks if you fail to save the last state when
you shut down comes to mind :)

(Not saying it's not possible to create a workable solution, just that
it's not trivial and requires careful thought to not break the security
assumptions of the protocol).

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Need for HW-clock independent timestamps

2018-05-13 Thread Toke Høiland-Jørgensen
reiner otto  writes:

> Having implemented this solution already, I consider it some type of
> hack, as the standard time sync unfortunately happens very late in the
> start of the services, after rc.local called. And the sync might take
> quite some time.
>
> Which means, I had to "hack" the time sync immediately after WAN up,
> and to be done in a single shot, before starting WG.

Yeah, messing init script order is going to be hackish. You'd want to
add a hotplug script to react to when the NTP daemon syncs and apply
config after that. See the dnsmasq hotplug script for an example:

https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=package/network/services/dnsmasq/files/dnsmasqsec.hotplug;h=781d5337348bb8e927bd15f1caac94a5e7a9ba63;hb=HEAD

For WG it should be necessary to wait to configure the wg interfaces;
they just won't validate correctly until time is fixed. So you could
configure the interfaces in /etc/config/network and just have the
hotplug script add the default route (or whatever your needs are).

> However, as a real RTC is rather cheap, it might be a good idea, in
> case of commercial apps, to ask the supplier of the device to be used
> for the inclusion of a RTC. The more requests, the better the chances
> to find more devices with RTC included.

Sure, for people who are building their own hardware. Most people
aren't, though...

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Need for HW-clock independent timestamps

2018-05-12 Thread Toke Høiland-Jørgensen
Axel Neumann  writes:

> Thanks a lot for your replies. But as you can see from my comments below
> this all does not look like a valid option for many embedded use cases.
> BTW, my background are community mesh networks which are maintained by
> all kind of different individuals using a zoo of different device
> types.

We had a long discussion on the issue of time on embedded routers with
no RTC around the time support for DNSSEC was added to dnsmasq. The
solution we ended up with in OpenWrt was that dnsmasq will run without
validating signatures until NTP indicates that it has synced to a time
server. See the --dnssec-no-timecheck and --dnssec-timestamp options to
dnsmasq for details on how this works.

You're right, of course, that "just add an RTC" is not a solution...

The analogue for a wireguard deployment would be to run NTP on the
unsecured links and not configure the wireguard tunnels until NTP has
synced. This has different security implications for a VPN than for
dnssec, of course, but it could be doable. Depends on your setup how
this is best done; you don't give enough details for me to have an
informed opinion :)

> I would really appreciate if WG can further elaborate on this issue.
> There are many real-life communities with embedded-device deployments
> that would be looking forward to use WG.
>
> Could you also comment on the described approach (see again at the end
> of the mail) of allowing (maybe as an alternative) a sequence number
> instead of a timestamp?

Can't comment on the security implications of this; but even if it is
possible without degrading the security of the protocol, this is a
non-trivial change at the protocol level; so if you want to deploy
anything within the next ~6-12 months, I'd suggest finding a
workaround...

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: WG load balancing?

2018-05-10 Thread Toke Høiland-Jørgensen
Tim Weippert  writes:

> Hi Matthias, 
>
>
> On Thu, May 10, 2018 at 11:21:44AM +0200, Matthias Urlichs wrote:
>> Hello list,
>> 
>> Assume a branch office with two uplinks to the Internet that wants to
>> use WG to talk to the main office, using both of these uplinks in
>> parallel (assuming they're both up) for better uplink speed (and for
>> redundancy if they aren't). Now the obvious idea is to create two WG
>> interfaces on each side, and add a couple of firewall rules to make sure
>> that packets fwmarked 1 go out on the first uplink, and so on.
>> 
>> That's the easy part. The hard part is how to teach the kernel to load
>> balance its default route between the WG interfaces. I tried to use a
>> libteam or bonding interface to tie them together, but apparently WG
>> isn't Ethernet, so that doesn't work.
>> 
>> I thought about using a GRE tunnel, but tunnels have fixed endpoint
>> addresses – somehow I don't think it'd be a good idea to create two
>> wireguard interfaces with the same IP address … and I don't really want
>> to do heavy-handed address mangling on every packet. Losing all
>> connectivity whenever I happen to flush my firewall tables doesn't
>> appeal to me.
>
> Maybe you can use some kind of dynamic routing approach here. Use FRR,
> Quagga or Bird with e.g. OSPF and ECMP ( Equal Cost Multipath) to utilize
> both links. (practically you can also have two default routes with the
> same metric and this should do a round robin fashioned loadbalancing)

You can add ECMP routes with 'ip route' via the 'nexthop' parameter. See
'man ip-route'.

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: WG load balancing?

2018-05-10 Thread Toke Høiland-Jørgensen
Matthias Urlichs  writes:

> Hello list,
>
> Assume a branch office with two uplinks to the Internet that wants to
> use WG to talk to the main office, using both of these uplinks in
> parallel (assuming they're both up) for better uplink speed (and for
> redundancy if they aren't). Now the obvious idea is to create two WG
> interfaces on each side, and add a couple of firewall rules to make sure
> that packets fwmarked 1 go out on the first uplink, and so on.
>
> That's the easy part. The hard part is how to teach the kernel to load
> balance its default route between the WG interfaces. I tried to use a
> libteam or bonding interface to tie them together, but apparently WG
> isn't Ethernet, so that doesn't work.
>
> I thought about using a GRE tunnel, but tunnels have fixed endpoint
> addresses – somehow I don't think it'd be a good idea to create two
> wireguard interfaces with the same IP address … and I don't really want
> to do heavy-handed address mangling on every packet. Losing all
> connectivity whenever I happen to flush my firewall tables doesn't
> appeal to me.

You could create GRE tunnels on the internal IP addresses of the
wireguard interface? Or use the kernel's ECMP routing as suggested by
Tim :)

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: WG interface to ipv4

2018-05-06 Thread Toke Høiland-Jørgensen
ѽ҉ᶬḳ℠  writes:

> For that matter it is pretty easy in ssh to limit its socket and
> iface/ip range exposure. Is it due to the inferior design of ssh that
> such security hardening features are made available/considered? If you
> keep the ssh keys safe that should be all that matters, should it not?

SSH is different for two reasons: It runs over TCP, and it runs in
userspace.

Because it runs over TCP, it will react to unauthenticated packets,
perform a handshake and exchange quite a bit of traffic before its gets
to the point where it can authenticate its peer. Wireguard does not
exhibit this behaviour: Instead, every data packet is authenticated
individually, and if it doesn't match it is simply dropped. So an
attacker that doesn't know the private key can't even discover that a
host is running wireguard.

Secondly, because SSH runs in userspace, a lot of the processing (such
as the TCP handshake) is done by the kernel on the application's behalf.
So the only way the application has of telling the kernel not to do
this, is by setting the listen address. Wireguard lives directly in the
kernel and so can perform the authentication directly after receiving
the packet, without suffering a context switch to userspace.

The first reason is obviously more important than the second one. Either
way, the decision about whether to add a configuration knob is a
tradeoff; where any possible security gains have to be weighed against
the added complexity (which includes maintaining the extra code, the
risk of misconfiguration, and the cognitive load on the user who has to
deal with more options). Wireguard, in general, tries very hard to avoid
configuration knobs that are not absolutely necessary; and since in this
case the security gains are lower than in many other cases (to the point
where they are mostly theoretical), this decision does make sense :)

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: WG via systemd (dis)advantage?

2018-05-02 Thread Toke Høiland-Jørgensen
ѽ҉ᶬḳ℠  writes:

> Debian kernel 4.15.11
> WG 0.0.20180420-1
>
> Out of curiosity having WG configured/manged via systemd, which is 
> working thus far as having the interface up and listening (also after 
> rebooting the server).
>
> Now I was looking to manipulate the network interface with some 
> post-up/post-down which though does not seems applicable with systemd, 
> like it would be when managed through networking.service.

You can't do one-line up/down scripts, but you could create a separate
service file and have it depend on the wg interface...

> Hence, wondering whether I miss something about systemd or whether it is 
> rather a bit of a disadvantage to configure/manage WG through systemd as 
> opposed to networking.service?
> What is the benefit of systemd vs. networking.service WG management?

If you're using systemd-networkd to configure the rest of your
networking, having wireguard configured the same way can be useful; and
systemd-networkd can manage dependencies between interfaces as well (to
a certain extent).

However, as you note, things like running arbitrary scripts is a bit
more of a hassle...

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: The case for anycast (contra move semantics for unicast AllowedIPs).

2018-04-20 Thread Toke Høiland-Jørgensen
George Walker  writes:

> Hi Jason, Juliusz, Toke, Dave, et. al.,
>
> A year ago we discussed Multicast addressing and the move semantics for
> allowedIPs.

Yeah, whatever happened to actually implementing that? :)

> Recently, I've been mulling over move semantics and their
> implication for WireGuard's support for anycast addressing.
>
> Unlike Multicast addresses, there are no designated address ranges for
> anycast: an anycast address is just a unicast address that's
> advertised in more than one place. As I understand it, the move
> semantics for allowedIPs preclude anycast addressing, as IP addresses
> can only be assigned to one peer at a time. This makes me wonder if it
> might be more correct to allow unicast allowedIPs to be assigned to
> more than one peer, treating them as anycast in that case.

That is not how anycasting works. You don't actually get more than one
path to an anycast address, and you don't duplicate any packets
(otherwise TCP would break, as you say). What anycast does is just that
it announced the same address in multiple places *at the control plane*,
and each router can then pick the closest path to that address and
install that one route in its data plane.

The AllowedIP configuration in WireGuard is the data plane configuration
(i.e., it corresponds to the kernel FIB). So supporting anycast at that
level doesn't make sense; you'll need to decide which peer gets your
"anycast" traffic. To the extent that anycast makes sense as a concept
at all on top of a VPN, you'd need to run some sort of control plane
(e.g., a routing protocol; there's a WireGuard GSOC that looks into
that) which would then configure appropriate AllowedIP configs into
wireguard.

> Also, if memory serves, move semantics account for a large proportion
> of the troubleshooting requests that show up on this list, suggesting
> to me that the status quo --elegant though it is!-- may not be
> especially intuitive.

This is probably a separate point that might be worth exploring further ;)


-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Another roaming problem

2018-03-09 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Neat script, looks pretty easy to use. The wg repo has a kprobes
> script too for extracting ephemeral keys from the kernel:
>
> https://git.zx2c4.com/WireGuard/tree/contrib/examples/extract-handshakes

Neat! Brave new world of debugging ;)

/me goes to write some more printk's


-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Another roaming problem

2018-03-09 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> On Fri, Mar 9, 2018 at 3:39 PM, Toke Høiland-Jørgensen  wrote:
>> And leaving it running a bit more, there is also a call from
>> expired_retransmit_handshake:
>
> Yep! These are the two calls in timers.c.

Right, cool. Kprobes are awesome, BTW (this was my first time trying this):

https://github.com/iovisor/bcc/blob/master/examples/tracing/stacksnoop.py

./stacksnoop.py socket_clear_peer_endpoint_src

and presto; nice stack traces every time they are called :D

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Another roaming problem

2018-03-09 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Hi Toke,
>
> That all makes sense. I'm going out of town extremely soon, but I'll
> fix this when I've returned. I have a pretty good idea of what's
> required. If you're curious to try it yourself, just try removing
> invocations of socket_clear_peer_endpoint_src inside timers.c.

Cool! I'll be travelling for a few weeks myself, so no great rush. Not
sure I'll the time to try out any changes before I leave...

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Another roaming problem

2018-03-09 Thread Toke Høiland-Jørgensen
Toke Høiland-Jørgensen  writes:

> Toke Høiland-Jørgensen  writes:
>
>> "Jason A. Donenfeld"  writes:
>>
>>> On Thu, Mar 8, 2018 at 6:50 PM, Toke Høiland-Jørgensen  wrote:
>>>> Well, I do generally setup routing in a somewhat unusual manner.
>>>>
>>>> I can try to capture some packet dumps tomorrow to poke into it a bit 
>>>> more. Anything in particular I should look for?
>>>
>>> One thing to examine is when WireGuard calls
>>> `socket_clear_peer_endpoint_src'. This makes wireguard forget the
>>> source address that it should be using and fall back to the default.
>>> You could add a pr_info(...) call in this function. I have an inkling
>>> that I make calls to this function too zealously and in potentially
>>> unneeded places, such as on handshake transmission retries.
>>>
>>> I'm headed out of town super soon, so likely debugging this will have
>>> to wait until I'm back, but do let me know what you find, and we'll
>>> get this fixed up upon return.
>>
>> Well, completely failed to reproduce it; everything works as its
>> supposed to now (wireguard correctly picks the public IP as its source
>> address when replying to the client).
>>
>> Not sure if I have changed something in my setup or what is going on;
>> but at least I can roam now, so I'm happy ;)
>
> Scratch that, it's still happening; just not straight away upon roaming.
> It is definitely a timeout thing; installed a kprobe on the function you
> mentioned and got this strack trace when it switches IP:
>
> TIME(s)FUNCTION
> 104.999884129  socket_clear_peer_endpoint_src
>   socket_clear_peer_endpoint_src
>   expired_new_handshake
>   call_timer_fn
>   run_timer_softirq
>   __do_softirq
>   irq_exit
>   smp_apic_timer_interrupt
>   __irqentry_text_start
>   cpuidle_enter_state
>   do_idle
>   cpu_startup_entry
>   start_secondary
>   secondary_startup_64

And leaving it running a bit more, there is also a call from
expired_retransmit_handshake:

449.079751015  socket_clear_peer_endpoint_src
socket_clear_peer_endpoint_src
expired_retransmit_handshake
call_timer_fn
run_timer_softirq
__do_softirq
irq_exit
smp_apic_timer_interrupt
__irqentry_text_start
cpuidle_enter_state
do_idle
cpu_startup_entry
start_secondary
secondary_startup_64


-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Another roaming problem

2018-03-09 Thread Toke Høiland-Jørgensen
Toke Høiland-Jørgensen  writes:

> "Jason A. Donenfeld"  writes:
>
>> On Thu, Mar 8, 2018 at 6:50 PM, Toke Høiland-Jørgensen  wrote:
>>> Well, I do generally setup routing in a somewhat unusual manner.
>>>
>>> I can try to capture some packet dumps tomorrow to poke into it a bit more. 
>>> Anything in particular I should look for?
>>
>> One thing to examine is when WireGuard calls
>> `socket_clear_peer_endpoint_src'. This makes wireguard forget the
>> source address that it should be using and fall back to the default.
>> You could add a pr_info(...) call in this function. I have an inkling
>> that I make calls to this function too zealously and in potentially
>> unneeded places, such as on handshake transmission retries.
>>
>> I'm headed out of town super soon, so likely debugging this will have
>> to wait until I'm back, but do let me know what you find, and we'll
>> get this fixed up upon return.
>
> Well, completely failed to reproduce it; everything works as its
> supposed to now (wireguard correctly picks the public IP as its source
> address when replying to the client).
>
> Not sure if I have changed something in my setup or what is going on;
> but at least I can roam now, so I'm happy ;)

Scratch that, it's still happening; just not straight away upon roaming.
It is definitely a timeout thing; installed a kprobe on the function you
mentioned and got this strack trace when it switches IP:

TIME(s)FUNCTION
104.999884129  socket_clear_peer_endpoint_src
socket_clear_peer_endpoint_src
expired_new_handshake
call_timer_fn
run_timer_softirq
__do_softirq
irq_exit
smp_apic_timer_interrupt
__irqentry_text_start
cpuidle_enter_state
do_idle
cpu_startup_entry
start_secondary
secondary_startup_64


Think it may be related to powersave on the phone or something? Doesn't
seem to happen with my laptop at least...

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Another roaming problem

2018-03-09 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> On Thu, Mar 8, 2018 at 6:50 PM, Toke Høiland-Jørgensen  wrote:
>> Well, I do generally setup routing in a somewhat unusual manner.
>>
>> I can try to capture some packet dumps tomorrow to poke into it a bit more. 
>> Anything in particular I should look for?
>
> One thing to examine is when WireGuard calls
> `socket_clear_peer_endpoint_src'. This makes wireguard forget the
> source address that it should be using and fall back to the default.
> You could add a pr_info(...) call in this function. I have an inkling
> that I make calls to this function too zealously and in potentially
> unneeded places, such as on handshake transmission retries.
>
> I'm headed out of town super soon, so likely debugging this will have
> to wait until I'm back, but do let me know what you find, and we'll
> get this fixed up upon return.

Well, completely failed to reproduce it; everything works as its
supposed to now (wireguard correctly picks the public IP as its source
address when replying to the client).

Not sure if I have changed something in my setup or what is going on;
but at least I can roam now, so I'm happy ;)

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Another roaming problem

2018-03-08 Thread Toke Høiland-Jørgensen


On 8 March 2018 18:39:15 CET, "Jason A. Donenfeld"  wrote:
>Hi Toke,
>
>On Thu, Mar 8, 2018 at 6:23 PM, Toke Høiland-Jørgensen 
>wrote:
>>
>> I have a gateway device with two interfaces, one public and one
>private.
>> This device performs NAT, and is also the one running wireguard (as
>the
>> 'server'). The client roams. So I have two cases:
>>
>>
>> C (public IP) --- (public IP) GW (private IP) -- [LAN]
>>
>> In this case, C talks to GW on GWs public IP; everything works fine.
>>
>> Second case:
>>
>> [internet] --- (public IP) GW (private IP) -- [LAN] -- C (private IP)
>>
>> Here, C talks to GW; it still tries to send packets to the public IP
>of
>> GW (because that is what it's configured to do), but because GW sees
>> that the source IP is on its internal subnet, it replies with a
>source
>> address in the private subnet. This works fine as long as the client
>is
>> on the LAN; but once it roams outside, it now thinks that the
>wireguard
>> server lives on the private IP of the GW, which is obviously can't
>reach
>> from its shiny new public IP.
>>
>> So what I'd want to happen is that GW should keep using its public
>> IP as the source of the wireguard packets, even when talking to a
>client
>> on a directly-connected internal subnet. Or, alternatively, that C
>> should ignore the source address change of the packets coming from GW
>> and keep sending its packets to the public IP it was first configured
>to
>> use...
>>
>
>In this case, WireGuard is indeed supposed to make the right decision.
>Namely, it should continue replying using the correct source address.
>It's not supposed to switch to the internal one. I have the exact same
>setup at home, so I just tried things out again to verify, and from my
>end it seems to be working fine:
>
>zx2c4@thinkpad ~ $ wg
>interface: martino
>  public key: 4HUj8boJyeZI70WVxmKhHfGAohtoyFQpWk96OpuFcVY=
>  private key: (hidden)
>  listening port: 53249
>  fwmark: 0xca6c
>
>peer: GMvmorUa9WzHAkOVOxQKSrw3F1JruA4bTN1NkWN0T3E=
>  preshared key: (hidden)
>  endpoint: 129.228.12.33:1
>  allowed ips: 0.0.0.0/0, ::/0
>  latest handshake: 48 seconds ago
>  transfer: 1.06 KiB received, 19.50 KiB sent
>zx2c4@thinkpad ~ $ ip link set wwan0 down
>zx2c4@thinkpad ~ $ ip link set wlan0 up
>zx2c4@thinkpad ~ $ pingg
>PING google.com (172.217.19.142) 56(84) bytes of data.
>64 bytes from mrs08s04-in-f14.1e100.net (172.217.19.142): icmp_seq=1
>ttl=53 time=20.1 ms
>64 bytes from mrs08s04-in-f14.1e100.net (172.217.19.142): icmp_seq=2
>ttl=53 time=19.1 ms
>^C
>--- google.com ping statistics ---
>2 packets transmitted, 2 received, 0% packet loss, time 1001ms
>rtt min/avg/max/mdev = 19.181/19.666/20.151/0.485 ms
>zx2c4@thinkpad ~ $ wg
>interface: martino
>  public key: 4HUj8boJyeZI70WVxmKhHfGAohtoyFQpWk96OpuFcVY=
>  private key: (hidden)
>  listening port: 53249
>  fwmark: 0xca6c
>
>peer: GMvmorUa9WzHAkOVOxQKSrw3F1JruA4bTN1NkWN0T3E=
>  preshared key: (hidden)
>  endpoint: 129.228.12.33:1
>  allowed ips: 0.0.0.0/0, ::/0
>  latest handshake: 5 seconds ago
>  transfer: 113.70 KiB received, 85.43 KiB sent
>
>I wonder what might be different about your configuration...

Well, I do generally setup routing in a somewhat unusual manner.

I can try to capture some packet dumps tomorrow to poke into it a bit more. 
Anything in particular I should look for?

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Another roaming problem

2018-03-08 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> On Thu, Mar 8, 2018 at 5:59 PM, Toke Høiland-Jørgensen  wrote:
>>> and so I wonder if a simpler solution would also
>>>involve NAT -- namely, configuring "hair pin" NAT?
>>
>> What's that?
>
> It's the terrible vendor term for hitting the gateway through one of
> its IPs (say, the public one) and having it forward packets for you to
> another machine on the same LAN. The idea here, being, you'd get to
> keep using the same IP address for communicating, even when you're
> behind NAT in the private network. (This seems to work well for me at
> my house.)
>
> Wikipedia describes it in terms of the p2p discovery issue, which is
> slightly different, but still the same underlying concept:
> https://en.wikipedia.org/wiki/Hairpinning

Ah, right. In that case I think I probably didn't explain my setup well
enough. Let me try again:


I have a gateway device with two interfaces, one public and one private.
This device performs NAT, and is also the one running wireguard (as the
'server'). The client roams. So I have two cases:


C (public IP) --- (public IP) GW (private IP) -- [LAN]

In this case, C talks to GW on GWs public IP; everything works fine.

Second case:

[internet] --- (public IP) GW (private IP) -- [LAN] -- C (private IP)

Here, C talks to GW; it still tries to send packets to the public IP of
GW (because that is what it's configured to do), but because GW sees
that the source IP is on its internal subnet, it replies with a source
address in the private subnet. This works fine as long as the client is
on the LAN; but once it roams outside, it now thinks that the wireguard
server lives on the private IP of the GW, which is obviously can't reach
from its shiny new public IP.

So what I'd want to happen is that GW should keep using its public
IP as the source of the wireguard packets, even when talking to a client
on a directly-connected internal subnet. Or, alternatively, that C
should ignore the source address change of the packets coming from GW
and keep sending its packets to the public IP it was first configured to
use...

This is all orthogonal to NAT, as far as I can tell :)

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Another roaming problem

2018-03-08 Thread Toke Høiland-Jørgensen


On 8 March 2018 17:18:47 CET, "Jason A. Donenfeld"  wrote:
>Hi Toke,
>
>On Thu, Mar 8, 2018 at 3:29 PM, Toke Høiland-Jørgensen 
>wrote:
>> So is there a way to either tell the client not to change its idea of
>> the endpoint, or to tell the server to always use a certain source
>> address for outgoing packets?
>
>There have been some discussions on adding another [gasp] nob to clamp
>an endpoint, for this reason and some other related ones. But the
>source address caching is supposed to be sticky. That is -- it's
>supposed to be that WireGuard will use the correct source address
>based on in the prior incoming packet. I can try to reproduce to see
>if perhaps you're uncovering some incorrect behavior here. More
>generally speaking, it seems like this problem is occurring for you
>because of NAT 

Well, in the sense that this wouldn't be a problem if there was no NAT on the 
internet, sure...

But other than that, how is it related to NAT?

> and so I wonder if a simpler solution would also
>involve NAT -- namely, configuring "hair pin" NAT?

What's that?

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Another roaming problem

2018-03-08 Thread Toke Høiland-Jørgensen
So I ran into another roaming problem, which I thought I'd open a
separate thread for.

Basically, the problem is this: I run wireguard on my gateway router,
which I then connect to from road warriors (laptop, phone) and tunnel
all my traffic through it.

This works well, except that when the client is connected to the local
network (behind the gateway router), it'll start talking to the internal
interface of the gateway device, and so the client will change its idea
of the endpoint address to the internal (private) address. And so, when
I leave the local network, it can no longer reach the server, and I have
to restart the wireguard interface on the client to get connectivity.

So is there a way to either tell the client not to change its idea of
the endpoint, or to tell the server to always use a certain source
address for outgoing packets?

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Roaming between IPv4 and IPv6?

2018-03-07 Thread Toke Høiland-Jørgensen
Kalin KOZHUHAROV  writes:

> On Tue, Mar 6, 2018 at 11:14 PM, Jason A. Donenfeld  wrote:
>> On Tue, Mar 6, 2018 at 11:08 PM, Toke Høiland-Jørgensen  wrote:
>>> I think the idea of configuring both v4 and v6 on startup and caching
>>> them is a reasonable idea. Maybe even configure all available addresses
>>> when doing the initial DNS lookup? Or is that awkward to do?
>>
>> You mean taking one v4 and one v6? That's probably possible. Since
>> getaddrinfo has complicated ordering logic, this probably be best
>> expressed as something like "endpoint" and "secondary endpoint" when
>> told by userspace, with them then being swapped when the FIB complains
>> about trying to route to one of them.
>>
> A slight simplification/generalization will be to define a peer in
> terms of and ordered C-list of IP addresses (whether v4 or v6), 0 or
> more (currently 0 or 1 IP+port).

Yeah, this is basically what I meant: Resolve *all* A and  records
of the configured hostname (for bonus points: get the port number from
SRV records), and stuff them all into the kernel, which will then use
all of them as possible candidates for connecting and use whichever
works (or do happy eyeballs, or something).

However, yeah, this is maybe a bit overkill, but could be a cool idea
for GSOC.

For a simple v4/v6 roaming fix, having one v4 and one v6 configured and
switching between them when the FIB state changes would probably
suffice. I think I would add a v6 preference, though; otherwise it'll
never roam back to v6 once it's on v4 unless the client connects to a
v6-only network.

So something like: If v6 FIB becomes routable, try the v6 address and
switch to that if it works; if v6 FIB becomes unroutable, switch to v4
address...

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Roaming between IPv4 and IPv6?

2018-03-06 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Hey Toke,
>
> For incoming packets, this would be strange behavior, since it's
> listening on v4 and v6.

Yeah, I think the incoming side is fine (it works over both v4 and v6 as
long as I have connectivity on the other end).

> For outgoing packets, if wireguard thinks it should be sending to a v6
> address, then that's what it will do.

Right, so it's not just me, this doesn't actually work currently. Cool ;)

> One way to fix this would be to re-resolve DNS from userspace, which
> is a bit ugly. Another way would be to simply store the last v4
> address, and fall back to that if it can't establish a route for the
> v6 address. And yet another way -- if simplicity is desired -- would
> be to do nothing (the status quo), and not build legacy semantics into
> something new. Any opinions on this?

While I can appreciate the simplicity of doing nothing, I think seamless
roaming even across v4/v6 is a pretty killer feature to have. It turns
wireguard into a "universal connectivity" tool that you can just enable
and forget about, without having to worry about calls dropping when
roaming, etc.

I think the idea of configuring both v4 and v6 on startup and caching
them is a reasonable idea. Maybe even configure all available addresses
when doing the initial DNS lookup? Or is that awkward to do?

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Roaming between IPv4 and IPv6?

2018-03-06 Thread Toke Høiland-Jørgensen
Hi

Is wireguard supposed to be able to handle roaming between IPv4 and
IPv6? If I setup a tunnel to a dual-stack machine and establish the
tunnel using IPv6, then switch to a different interface (from WiFi to
cellular in this case) that doesn't have IPv6, the tunnel stops having
connectivity. Whereas if I go in the other direction (first establish
the tunnel over IPv4), I can seamlessly roam between different
connections.

Is this expected behaviour, or should v4/v6 roaming work?

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Babel over wireguard

2017-12-06 Thread Toke Høiland-Jørgensen


On 6 December 2017 13:07:56 CET, Ryan Whelan  wrote:
>I'm looking to run babel over wireguard links and running into issues. 
>I
>seem to be unable to get Bird or the reference implementation of Babel
>to
>bind to any wireguard interfaces.  Is this a known issue? or has anyone
>found a config that works?

You need to manually add link-local IPs (fe80:: something) to the wireguard 
interface. Also, since wireguard doesn't support multicast it is only likely to 
work on point-to-point links with AllowedIPs set to ::/0 (or something else 
that includes the Babel multicast address).

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Babel over wireguard

2017-12-06 Thread Toke Høiland-Jørgensen
Ryan Whelan  writes:

> If you're gauging interest, I would be very interested in using
> unicast atop Wireguard for routing selection

Noted. No promises as to when I'll get around to implementing this,
though ;)

> Thank you for the explanation; very helpful.

You're very welcome!

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Babel over wireguard

2017-12-06 Thread Toke Høiland-Jørgensen
Ryan Whelan  writes:

> Are there any routing protocol implementations that do not depend on
> multicast?

We are in the process of standardising Babel, and one of the things we
are adding is the ability to run entirely over unicast. So in the
future, Babel will be able to do this (and integration with Wireguard is
one of the things I want to achieve with this). But for now, no
implementation exists.

Other than that, maybe BGP? But you'd still need integration with
Wireguard if you don't want to just set AllowedIPs to ::/0

> In my setup, 2 hosts will be able to route to one another over 2
> different wg interfaces and I just need something to select whichever
> interface has the least latency. Anything like that exist? :D

You can do this with point-to-point wireguard links. I.e., as long as
the wireguard link only has two peers, you can set AllowedIPs to
0.0.0.0/0, ::/0 on both sides, assign manual link-local addresses
(anything in fe80::/64 will work, so you could just assign fe80::1/64 to
one side and fe80::2/64 to the other side; they don't need to be
globally unique either). Then you can run babeld on top, which will
instruct the kernel to send appropriate packets to the wireguard
interface, and wireguard will forward it to the other side.

It's not currently possible to run a routing daemon on a multi-peer
wireguard interface. The routing daemon would need to reconfigure
wireguard in the kernel when it adds routes. I am planning to add this
to Bird at some point, but have not gotten around to it yet...

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Text-based IPC for Userspace Implementations

2017-05-17 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Hello list,
>
> Re:JSON --
>
> JSON is difficult to parse and requires large libraries, especially
> for something like JSON-RPC. I'd rather not have these dependencies or
> complications.
>
> On the contrary, a simple key=value newline scheme can be parsed
> trivially by anyone in a safe way, and allows for very quick in basic
> implementations in nearly all environments, even bash. Even given a
> JSON library, a stream of key=value is considerably easier and more
> flexible in the parsing stage.

My point was not so much "just use JSON". My point was "you are
disguising something that requires stateful parsing as a simple
key/value scheme, which is bound to lead to bugs eventually". I.e. if
someone were to write a parser that would (e.g.) just grep for a public
key, they'd end up with ambiguous results at best.

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Text-based IPC for Userspace Implementations

2017-05-16 Thread Toke Høiland-Jørgensen
Jonathan Rudenberg  writes:

>> On May 16, 2017, at 09:12, Toke Høiland-Jørgensen  wrote:
>> 
>> So why not avoid any possible confusion and just emit JSON? Or another
>> well-established serialisation format where the nesting can be made
>> explicit... :)
>
> +1 to this, requiring implementors to also implement a custom
> serialization format will lead to avoidable bugs.
>
> I took a quick look and there is a bit of precedent for the use of
> JSON in the kernel tree:
> https://github.com/torvalds/linux/tree/3401e8d1e1300742ed41910b9338b9da52689a16/tools/perf/pmu-events

Yeah, JSON is definitely gaining ground as the go-to format. See also
NetJSON: http://netjson.org/docs/what.html

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Text-based IPC for Userspace Implementations

2017-05-16 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Hey guys,
>
> Currently wg(8) talks to the kernel by passing structs through an
> ioctl. Due to upstream demands, this is going to be changed to
> netlink, and we'll ditch those structs. wg(8) previously tried to
> re-use those same structs for userspace implementations, passing them
> through a unix socket. Implementors did not like dealing with this.
> Since the structs are going for the kernel stuff, we might as well
> move to something better, too, for the userspace stuff. Therefore
> wg(8) now has a very simple text-based IPC format over unix sockets
> (or Windows named pipes) that can be easily implemented in nearly
> every language, even bash.
>
> I've written a small description of it here: 
> https://www.wireguard.io/xplatform/
>
> This will be part of the next snapshot.
>
> Any questions?

I applaud the general idea of moving to a text-based interface. But why
invent Yet Another Configuration Format?

I guess you could say that key=value pairs are fairly straight forward;
but from the examples it looks like there's an implicit nested
structure? I.e. a public_key=xxx line denotes the start of a new
endpoint, and the following keys are logically part of that endpoint?
Or? If this is the case, you'll need a stateful parser to parse it,
which is not immediately obvious from the description, and is bound to
trip people up at some point...

So why not avoid any possible confusion and just emit JSON? Or another
well-established serialisation format where the nesting can be made
explicit... :)

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: [RFC] Multicast and IPv6 Link Local Addresses

2017-04-17 Thread Toke Høiland-Jørgensen
Baptiste Jonglez  writes:

> Hi Jason,
>
> On Fri, Apr 07, 2017 at 04:02:49PM +0200, Jason A. Donenfeld wrote:
>> Various networking people have been poking and prodding about
>> supporting IPv6 Link Local addresses and about supporting special
>> multicast addresses. *I MAY VERY WELL NEVER CHOOSE TO IMPLEMENT THIS*
>> but in case I do, I wanted to start spec'ing out what this might look
>> like in order to think about it better. There are a lot of odd
>> concerns to take into account, so I doubt that the below will wind up
>> as a final solution.
>
> This is good news!  I can't wait to see Babel running on a wireguard
> interface with several peers, or even what OSPF would look like on such a
> network.
>
> That being said, for the purpose of a routing protocol like Babel, I think
> it still makes more sense to use only *point-to-point* wireguard links.
> Link-local and multicast communication solves the problem of discovering
> remote routing daemon, but the AllowedIPs list is still static, which does
> not make sense for a routing protocol.  With point-to-point links, you can
> bypass this limitation by simply setting AllowedIPs to ::/0.  Of course,
> once we have dynamic AllowedIPs, this will change :)

Yeah, for the use case I'm envisioning, I would teach the routing daemon
to update wireguard's route tables...

> There is just the minor issue of subnets that encompass both unicast and
> multicast addresses, the simplest one being ::/0.  Such subnets could be
> automatically split by wireguard, or just have the "moving" semantic of
> unicast subnets.  With this last option, a user would have to explicitly
> add a subnet which is *within* a multicast prefix to trigger the "cloning"
> semantic.

I agree that the least confusing would be to only treat prefixes
entirely contained in the known multicast prefixes as having "cloning"
semantics.

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: [RFC] Multicast and IPv6 Link Local Addresses

2017-04-08 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Hey George,
>
> More excellent feedback, thanks. Be sure to CC the list next time though.
>
> If I understand correctly, your suggestion is to not clutter
> everything with a horrible "multi:" prefix, but instead allow
> multicast addressees, which are well defined, to be added to multiple
> peers, and only allow unicast addresses to be added to one peer at a
> time keeping the current behavior. I find that a very nice UI
> solution. Wonderful.

I'd be fine with this, as long as it's on the subnet level (which I
believe is the case, just making sure). I.e., if I want behaviour
corresponding to full link-local multicast, I can just add fe02::/16 to
every peer in my config and be done with it.

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Multicast over a wireguard link?

2016-12-20 Thread Toke Høiland-Jørgensen


On 20 December 2016 19:43:15 CET, "Jason A. Donenfeld"  wrote:
>On Tue, Dec 20, 2016 at 7:40 PM, Toke Høiland-Jørgensen 
>wrote:
>> Right, but that means that even if multicast is added, a routing
>> protocol won't be terribly useful, since it can't tell wireguard
>which
>> subnets lives behind which peers. What I would need is to be able to
>> assign /32s (or IPv6 lladdrs) to the interface for each peer, and
>have
>> the kernel routing table determine which subnets should go to each of
>> those. My hope was that wireguard could then figure out where to send
>> the packet from the /32s (which would be in the wireguard config,
>> presumably).
>
>Ahh, I see. In this case, the routing protocol needs to update
>WireGuard, not the kernel's routing table. This forces you to
>re-envision your routing protocol in terms of "which public key should
>get which routes?" which strikes me as an authenticity improvement.

Hmm, longer term carrying public keys in the routing protocol might be viable, 
but getting to there is not trivial. 

However a netlink interface that says, essentially, "add traffic going to 
subnet A to the same peer that has address X". That would turn wireguard into a 
normal routing table (conceptually, at least). If the same netlink interface 
could be used, it wouldn't even be necessary to modify the routing daemon...

___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Multicast over a wireguard link?

2016-12-20 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> On Tue, Dec 20, 2016 at 7:19 PM, Toke Høiland-Jørgensen  wrote:
>> Can wireguard handle routed traffic to an endpoint. I.e. if endpoint A
>> has address 10.0.0.1/32 and endpoint B has a route 'ip route add
>> 10.10.10.1/24 via 10.0.0.1 dev wg0', would the traffic go where it's
>> supposed to?
>
> The `via` isn't relavent, since that only functions to determine which
> device to send it out of. All the ordinary routing table will do is
> direct it toward wg0. Once it hits wg0, then the allowed-ips entries
> are consulted to see where to send it (the "cryptokey routing table").
> See sections 2 and 3 of [1].

Right, but that means that even if multicast is added, a routing
protocol won't be terribly useful, since it can't tell wireguard which
subnets lives behind which peers. What I would need is to be able to
assign /32s (or IPv6 lladdrs) to the interface for each peer, and have
the kernel routing table determine which subnets should go to each of
those. My hope was that wireguard could then figure out where to send
the packet from the /32s (which would be in the wireguard config,
presumably).

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: Multicast over a wireguard link?

2016-12-20 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Hi Toke,
>
> Right now, no, there's no multicast. But it wouldn't be that hard to
> add the ability to add the same allowed-ips entry to multiple peers,
> and have WireGuard duplicate the message to all of them.

The multicast destination address is known, so that should work. Would
also serve as a kind of ACL, I guess.

> It's not complicated in theory, but I wonder if this would be
> genuinely useful, and whether or not it'd open up a wormhole of
> potential issues.

Well, I would certainly limit it to multicast addresses. But other than
that it sounds like an excellent idea.

Can wireguard handle routed traffic to an endpoint. I.e. if endpoint A
has address 10.0.0.1/32 and endpoint B has a route 'ip route add
10.10.10.1/24 via 10.0.0.1 dev wg0', would the traffic go where it's
supposed to?

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Multicast over a wireguard link?

2016-12-20 Thread Toke Høiland-Jørgensen
Does Wireguard has a notion of multicast? I would like to eventually
replace my current VPN setup with wireguard. I currently run Tinc in
switch (layer 2) mode, and run the Babel routing protocol on top.

Babel announces itself via link-local multicast to everyone on the link.
Does this work with wireguard? I guess the equivalent semantics would be
"send this packet to all known peers"...

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard


Re: [WireGuard] Major Queueing Algorithm Simplification

2016-11-04 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Hey,
>
> This might be of interest...
>
> Before, every time I got a GSO superpacket from the kernel, I'd split
> it into little packets, and then queue each little packet as a
> different parallel job.
>
> Now, every time I get a GSO super packet from the kernel, I split it
> into little packets, and queue up that whole bundle of packets as a
> single parallel job. This means that each GSO superpacket expansion
> gets processed on a single CPU. This greatly simplifies the algorithm,
> and delivers mega impressive performance throughput gains.
>
> In practice, what this means is that if you call send(tcp_socket_fd,
> buffer, biglength), then each 65k contiguous chunk of buffer will be
> encrypted on the same CPU. Before, each 1.5k contiguous chunk would be
> encrypted on the same CPU.
>
> I had thought about doing this a long time ago, but didn't, due to
> reasons that are now fuzzy to me. I believe it had something to do
> with latency. But at the moment, I think this solution will actually
> reduce latency on systems with lots of cores, since it means those
> cores don't all have to be synchronized before a bundle can be sent
> out. I haven't measured this yet, and I welcome any such tests. The
> magic commit for this is [1], if you'd like to compare before and
> after.
>
> Are there any obvious objections I've overlooked with this simplified
> approach?

My guess would be that it would worsen latency. You now basically have
head of line blocking where an entire superpacket needs to be processed
before another flow gets to transmit one packet.

I guess this also means that the total amount of data that is currently
being processed increases? I.e., before you would have (max number of
jobs * 1.5K) bytes queued up for encryption at once, where now you will
have (max number of jobs * 65K) bytes? That can be a substantive amount
of latency.

But really, instead of guessing why not measure?

Simply run a `flent tcp_4up ` through the tunnel (requires a
netperf server instance running on ) and look at the latency
graph. The TCP flows will start five seconds after the ping flow; this
shouldn't cause the ping RTT to rise by more than max ~10ms. And of
course, trying this on a machine that does *not* have a gazillion
megafast cores as well is important :)

There's an Ubuntu PPA to get Flent, or you can just `pip install flent`.
See https://flent.org/intro.html#installing-flent

-Toke
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
http://lists.zx2c4.com/mailman/listinfo/wireguard


  1   2   >