Re: WireGuard patchset for OpenBSD, rev. 2

2020-05-27 Thread Jason A. Donenfeld
Hi Otto,

On Wed, May 27, 2020 at 4:07 AM Otto Moerbeek  wrote:
> Although I'm not terribly interested in bugs that are only seen (s0
> far) using emulation, please send me the details on how you set up
> qemu.

Right, it could very well be a TCG bug. But maybe not. Here's how to
get things [not-]working:

$ qemu-system-sparc64 --version
QEMU emulator version 5.0.0
$ qemu-img convert -O qcow2 miniroot66.fs disk.qcow2
$ qemu-img resize disk.qcow2 20G
$ qemu-system-sparc64 \
-machine sun4u \
-m 1024 \
-drive file=disk.qcow2,if=ide \
-net nic,model=ne2k_pci -net user \
-nographic -serial stdio -monitor none \
-boot c

OpenBIOS for Sparc64
[...]
Loading FCode image...
Loaded 5840 bytes
entry point is 0x4000
Evaluating FCode...
OpenBSD IEEE 1275 Bootblock 1.4
..>> OpenBSD BOOT 1.14
Trying bsd...
[...]
OpenBSD 6.6 (RAMDISK) #84: Sat Oct 12 10:42:12 MDT 2019
   dera...@sparc64.openbsd.org:/usr/src/sys/arch/sparc64/compile/RAMDISK
[...]
Welcome to the OpenBSD/sparc64 6.6 installation program.
(I)nstall, (U)pgrade, (A)utoinstall or (S)hell?

If you swap out miniroot66.fs for miniroot67.fs, you'll get the error
I sent prior.

Jason



Re: WireGuard patchset for OpenBSD, rev. 2

2020-05-27 Thread Matt Dunwoodie
On Wed, 27 May 2020 01:43:34 -0600
"Jason A. Donenfeld"  wrote:

> On Wed, May 27, 2020 at 1:34 AM Martin Pieuchot 
> wrote:
> > First question is, is it possible to use the wg(4) interface
> > without a port?  
> 
> No, that is not how WireGuard works. For details on the actual
> protocol particulars, please see
> https://www.wireguard.com/papers/wireguard.pdf . Our rev 1 submission
> has links to further links as well in the cover letter.

Oh, I hope we haven't gotten terminology mixed up. Yes, WireGuard
requires a UDP port, no it does not require any software from the ports
tree. All functionality is accessible from the base system as a "first
class citizen".

Matt



pipex(4): kill NET_LOCK() in pipex_ioctl()

2020-05-27 Thread Vitaliy Makkoveev
pipex(4) is simultaneously protected by KERNEL_LOCK() and NET_LOCK() and
the last is not required. I guess to start remove NET_LOCK(). Diff below
drops NET_LOCK() in pipex_ioctl() and underlaying routines. At least
this helps to kill unlock/lock mess in pppx_add_session() and
pppx_if_destroy().

Index: sys/net/pipex.c
===
RCS file: /cvs/src/sys/net/pipex.c,v
retrieving revision 1.113
diff -u -p -r1.113 pipex.c
--- sys/net/pipex.c 7 Apr 2020 07:11:22 -   1.113
+++ sys/net/pipex.c 27 May 2020 08:46:32 -
@@ -205,7 +205,8 @@ pipex_ioctl(struct pipex_iface_context *
 {
int pipexmode, ret = 0;
 
-   NET_LOCK();
+   KERNEL_ASSERT_LOCKED();
+
switch (cmd) {
case PIPEXSMODE:
pipexmode = *(int *)data;
@@ -250,7 +251,6 @@ pipex_ioctl(struct pipex_iface_context *
ret = ENOTTY;
break;
}
-   NET_UNLOCK();
 
return (ret);
 }
@@ -269,6 +269,8 @@ pipex_add_session(struct pipex_session_r
struct ifnet *over_ifp = NULL;
 #endif
 
+   KERNEL_ASSERT_LOCKED();
+
/* Checks requeted parameters.  */
if (!iface->pipexmode)
return (ENXIO);
@@ -423,7 +425,6 @@ pipex_add_session(struct pipex_session_r
}
 #endif
 
-   NET_ASSERT_LOCKED();
/* commit the session */
if (!in_nullhost(session->ip_address.sin_addr)) {
if (pipex_lookup_by_ip_address(session->ip_address.sin_addr)
@@ -473,7 +474,7 @@ pipex_add_session(struct pipex_session_r
 int
 pipex_notify_close_session(struct pipex_session *session)
 {
-   NET_ASSERT_LOCKED();
+   KERNEL_ASSERT_LOCKED();
session->state = PIPEX_STATE_CLOSE_WAIT;
session->stat.idle_time = 0;



Re: clear margins when remapping efifb

2020-05-27 Thread Mark Kettenis
> Date: Wed, 27 May 2020 19:39:07 +1000
> From: Jonathan Gray 
> 
> When testing the row and column increase for efifb on a 1920x1080
> display I noticed the top part of the screen continues to contain part
> of a white on blue line from earlier in the dmesg even after the machine
> has finished booting.
> 
> RI_CENTER changes the ri_bits offset, doing RI_CLEARMARGINS in cnremap
> clears the fragment of a line caused by using RI_CENTER.

ok kettenis@

> Index: efifb.c
> ===
> RCS file: /cvs/src/sys/arch/amd64/amd64/efifb.c,v
> retrieving revision 1.31
> diff -u -p -r1.31 efifb.c
> --- efifb.c   27 May 2020 07:48:02 -  1.31
> +++ efifb.c   27 May 2020 09:27:50 -
> @@ -219,7 +219,7 @@ efifb_attach(struct device *parent, stru
>   crow = ri->ri_crow;
>  
>   efifb_rasops_preinit(fb);
> - ri->ri_flg &= ~RI_CLEAR;
> + ri->ri_flg &= ~(RI_CLEAR | RI_CLEARMARGINS);
>   ri->ri_flg |= RI_VCONS | RI_WRONLY;
>  
>   rasops_init(ri, efifb_std_descr.nrows, efifb_std_descr.ncols);
> @@ -478,7 +478,7 @@ efifb_cnremap(void)
>  
>   efifb_rasops_preinit(fb);
>   ri->ri_flg &= ~RI_CLEAR;
> - ri->ri_flg |= RI_CENTER | RI_WRONLY;
> + ri->ri_flg |= RI_CENTER | RI_WRONLY | RI_CLEARMARGINS;
>  
>   rasops_init(ri, efifb_std_descr.nrows, efifb_std_descr.ncols);
>  
> 
> 



Remove KERNEL_LOCK() in socket(2) and socketpair(2)

2020-05-27 Thread Martin Pieuchot
When these two syscalls have been marked NOLOCK, falloc(), fdinsert() &
friends weren't ready to be executed without KERNEL_LOCK().  This is no
longer true.  kqueue(2), for example, do the same dances without this
lock.

ok?

Index: kern/uipc_syscalls.c
===
RCS file: /cvs/src/sys/kern/uipc_syscalls.c,v
retrieving revision 1.184
diff -u -p -r1.184 uipc_syscalls.c
--- kern/uipc_syscalls.c15 Jan 2020 13:17:35 -  1.184
+++ kern/uipc_syscalls.c27 May 2020 10:13:37 -
@@ -97,7 +97,6 @@ sys_socket(struct proc *p, void *v, regi
if (error)
return (error);
 
-   KERNEL_LOCK();
fdplock(fdp);
error = falloc(p, , );
if (error) {
@@ -114,7 +113,6 @@ sys_socket(struct proc *p, void *v, regi
FRELE(fp, p);
*retval = fd;
}
-   KERNEL_UNLOCK();
return (error);
 }
 
@@ -450,7 +448,6 @@ sys_socketpair(struct proc *p, void *v, 
if (error != 0)
goto free2;
}
-   KERNEL_LOCK();
fdplock(fdp);
if ((error = falloc(p, , [0])) != 0)
goto free3;
@@ -475,7 +472,6 @@ sys_socketpair(struct proc *p, void *v, 
fdpunlock(fdp);
FRELE(fp1, p);
FRELE(fp2, p);
-   KERNEL_UNLOCK();
return (0);
}
fdremove(fdp, sv[1]);
@@ -487,7 +483,6 @@ free4:
so1 = NULL;
 free3:
fdpunlock(fdp);
-   KERNEL_UNLOCK();
 free2:
if (so2 != NULL)
(void)soclose(so2, 0);



Re: WireGuard patchset for OpenBSD, rev. 2

2020-05-27 Thread Matt Dunwoodie
On Wed, 27 May 2020 09:34:53 +0200
Martin Pieuchot  wrote:

> Hello Matt,
> 
> Thank you for your submission.  

Hi Martin,

No worries, thank you for your feedback. This is something I want to
help make happen and if I recall correctly, someone once said that if I
wanted a new feature on OpenBSD then submit a patch :)

> On 26/05/20(Tue) 19:39, Matt Dunwoodie wrote:  
> > After some feedback and comments, we've addressed the concerns, and
> > fixed a few things from our side too. Overall the structure is
> > familiar with no major changes, so any prior readings mostly carry
> > over.
> 
> It's hard to review a such big diff.  If you're interested in getting
> feedbacks in the code itself I'd suggest you split it :)  

I do realise it is a bit unwieldy, but it is a fairly
integrated/cohesive patch. I'll have a think about splitting it up
if/when we get to a rev. 3.

> Regarding the kernel, I'd suggest you use "#if NWG > 0" like it is
> done for other pseudo-drives with 'needs-flag'.  

For the most part there is no significant changes to other parts of the
network stack, so I don't believe this should be necessary. If there is
anything in particular that you think should be flagged like that then
please do say.

> Can you re-use PACKET_TAG_GRE instead of introduce
> PACKET_TAG_WIREGUARD? The former is already used by multiple
> pseudo-drivers.  

As far as I can tell PACKET_TAG_GRE is only used in gif(4) and gre(4),
however they just use the tag to store the if_index. I also understand
m_pkthdr.ph_tagsset is limited on space so adding a new TAG type is not
a great idea therefore, I've repurposed PACKET_TAG_GIF to
PACKET_TAG_WIREGUARD.

In this case we store a little bit more information (and not the
if_index) and I would not want any kind of conflict between wg(4)
PACKET_TAG_GRE and gif(4)/gre(4) PACKET_TAG_GRE.

> Aren't wg_noise.c and wg_cookie.c meant to be used by if_wg.c only?
> Or would other parts of the kernel benefit from them?  If wg(4) is
> the only user I'm questionnings whether putting them in crypto/ is
> the best choice.  

Yes, I'm open to moving them somewhere, the were originally in crypto/
as they handled specifically the crypto side of WireGuard. The end
result though, I don't think they should be merged into if_wg.c as they
are designed to be separate to help auditing/review. Would it make
sense to have them in sys/net/?

> Why are you using rwlock to protect cookie_* and noise_* states?  Is
> any sleeping involved?  Did you consider a mutex with IPL_NONE? rwlock
> introduce sleep point that might be surprising.  Did you try the
> option WITNESS of the kernel to check your locking?  

Both the cookie_* and noise_* are expected to run concurrently, and I
wouldn't want one thread holding up another with a mutex while
performing (relatively) expensive crypto operations. Where possible and
correct, we use a read lock.

Running with WITNESS does not raise any errors, I've checked.

> Keeping comments away from source code, like you did in wg_cookie.h
> can result in them not being updated.  This isn't something common in
> the tree and after some times comments tend to lie :o)  

Makes sense, and if it isn't that common I'm more than happy to move
them.

> Would you mind using variables of more than one-letter?  That makes is
> harder to grep for patterns.  

I'll see what I can do :)
 
> If you prefix fields of variable, I'd suggest picking the firs letter
> of the two words, for example 'wt_' for "struct wg_tag" instead of
> "t_" currently.  Then grepping for 'wt_endpoint' is more likely to
> yield what we're looking for :)  

Very reasonable - will do.
 
> Is mq_push() required?  I'm always surprise to see mbuf APIs grow and
> grow over years.  Anyway, that could be a separate change.  

To be explicit, the behaviour of mq_push is required, that is: we want
to add a packet to the queue; if the queue is full we want to drop the
oldest packet in the queue.

I'm aware of APIs growing though, so understand the concern. While it
would be possible to emulate this behaviour with mq_full, and then some
combination of mq_dequeue and mq_enqueue the result would be racey and
may have unintended side effects. The end goal is that we want to
achieve the behaviour above atomically.

> ioctl(2) shouldn't require the NET_LOCK, so if your pseudo-driver uses
> its own locks to serialize access to its data, please say that in the
> comments for SIOCSWG and SIOCGWG.  

Yes, I'll add some comments.

> WG_PEERS_FOREACH{,SAFE} macros are only introducing abstraction and
> make code harder to understand.  

Absolutely, I agree here. It was a remnant from some refactoring.

> Please do not cast function pointer in task_set(), use the expected
> prototype.  

Noted, will fix.

> Why did you introduce wg_queue_*(), isn't mq_init(9) API good enough?
>  

The queueing semantics for WireGuard are a little different from mq_*
for the most part due to encrypting on a taskq with multiple threads.

You may imagine a 

Re: WireGuard patchset for OpenBSD, rev. 2

2020-05-27 Thread Otto Moerbeek
On Wed, May 27, 2020 at 03:14:29AM -0600, Jason A. Donenfeld wrote:

> One interesting quirk in doing this on qemu is that the 6.7 and
> -current kernel both crash:
> 
> Loading FCode image...
> Loaded 6882 bytes
> entry point is 0x4000
> Evaluating FCode...
> OpenBSD IEEE 1275 Bootblock 2.0
> Unhandled Exception 0x0030
> PC = 0xffd0f3ac NPC = 0xffd0f3b0
> Stopping execution
> 
> Luckily it works fine on 6.6, so that's where I debugged this issue.
> But this might be a bug worth looking into. Otto's recent bootblk
> patch is a possible culprit, so I've CC'd him.
> 
> Jason
> 

Although I'm not terribly interested in bugs that are only seen (s0
far) using emulation, please send me the details on how you set up
qemu.

-Otto



clear margins when remapping efifb

2020-05-27 Thread Jonathan Gray
When testing the row and column increase for efifb on a 1920x1080
display I noticed the top part of the screen continues to contain part
of a white on blue line from earlier in the dmesg even after the machine
has finished booting.

RI_CENTER changes the ri_bits offset, doing RI_CLEARMARGINS in cnremap
clears the fragment of a line caused by using RI_CENTER.

Index: efifb.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/efifb.c,v
retrieving revision 1.31
diff -u -p -r1.31 efifb.c
--- efifb.c 27 May 2020 07:48:02 -  1.31
+++ efifb.c 27 May 2020 09:27:50 -
@@ -219,7 +219,7 @@ efifb_attach(struct device *parent, stru
crow = ri->ri_crow;
 
efifb_rasops_preinit(fb);
-   ri->ri_flg &= ~RI_CLEAR;
+   ri->ri_flg &= ~(RI_CLEAR | RI_CLEARMARGINS);
ri->ri_flg |= RI_VCONS | RI_WRONLY;
 
rasops_init(ri, efifb_std_descr.nrows, efifb_std_descr.ncols);
@@ -478,7 +478,7 @@ efifb_cnremap(void)
 
efifb_rasops_preinit(fb);
ri->ri_flg &= ~RI_CLEAR;
-   ri->ri_flg |= RI_CENTER | RI_WRONLY;
+   ri->ri_flg |= RI_CENTER | RI_WRONLY | RI_CLEARMARGINS;
 
rasops_init(ri, efifb_std_descr.nrows, efifb_std_descr.ncols);
 



Re: WireGuard patchset for OpenBSD, rev. 2

2020-05-27 Thread Jason A. Donenfeld
On Wed, May 27, 2020 at 2:12 AM Jason A. Donenfeld  wrote:
>
> Hi again Klemens,
>
> On Tue, May 26, 2020 at 5:42 PM Jason A. Donenfeld  wrote:
> >
> > On Tue, May 26, 2020 at 4:52 PM Jason A. Donenfeld  wrote:
> > > With regards to your crash, though, that's a bit more puzzling, and
> > > I'd be interested to learn more details. Because these structs are
> > > already naturally aligned, the __packed attribute, even with the odd
> > > nesting Matt had prior, should have produced all entirely aligned
> > > accesses. That makes me think your kaboom was coming from someplace
> > > else. One possibility is that you were running the git tree on the two
> > > days that I was playing with uint128_t, only to find out that some of
> > > openbsd's patches to clang miscalculate stack sizes when they're in
> > > use, so that work was shelved for another day and the commits removed;
> > > perhaps you were just unlucky? Or you hit some other bug that's
> > > lurking. Either way, output from ddb's `bt` would at least be useful.
> >
> > Do you know off hand if we're able to assume any type of alignment
> > with mbuf->m_data? mtod just casts without any address fixup, which
> > means if mbuf->m_data isn't aligned by some other mechanism, we're in
> > trouble. But I would assume there _is_ some alignment imposed, since
> > the rest of the stack appears to parse tcp headers and such directly
> > without byte-by-byte copies being made.
>
> After a day of TCG-run compilation, I've got a working sparc64 setup
> and can confirm the issue. Working on a fix.

The latest git master should work well now:

solar# arch -s
sparc64
solar# wg-quick up demo
[#] ifconfig wg0 create
[#] wg setconf wg0 /dev/fd/63
[#] ifconfig wg0 inet 192.168.4.155/24 alias
[#] ifconfig wg0 mtu 1420
[#] ifconfig wg0 up
[#] cp /etc/resolv.conf /etc/resolv.conf.wg-quick-backup.demo
[#] printf nameserver %s\n 8.8.8.8 8.8.4.4 1.1.1.1 1.0.0.1
[#] route -q -n add -inet 0.0.0.0/1 -iface 192.168.4.155
[#] route -q -n add -inet 128.0.0.0/1 -iface 192.168.4.155
[#] route -q -n delete -inet 163.172.161.0
[#] route -q -n add -inet 163.172.161.0 -gateway 10.0.2.2
[+] Backgrounding route monitor
solar# curl zx2c4.com/ip
163.172.161.0
demo.wireguard.com

One interesting quirk in doing this on qemu is that the 6.7 and
-current kernel both crash:

Loading FCode image...
Loaded 6882 bytes
entry point is 0x4000
Evaluating FCode...
OpenBSD IEEE 1275 Bootblock 2.0
Unhandled Exception 0x0030
PC = 0xffd0f3ac NPC = 0xffd0f3b0
Stopping execution

Luckily it works fine on 6.6, so that's where I debugged this issue.
But this might be a bug worth looking into. Otto's recent bootblk
patch is a possible culprit, so I've CC'd him.

Jason



Re: WireGuard patchset for OpenBSD, rev. 2

2020-05-27 Thread Jason A. Donenfeld
Hey David,

On Wed, May 27, 2020 at 2:26 AM David Gwynne  wrote:
>
> On Tue, May 26, 2020 at 05:42:13PM -0600, Jason A. Donenfeld wrote:
> > On Tue, May 26, 2020 at 4:52 PM Jason A. Donenfeld  wrote:
> > > With regards to your crash, though, that's a bit more puzzling, and
> > > I'd be interested to learn more details. Because these structs are
> > > already naturally aligned, the __packed attribute, even with the odd
> > > nesting Matt had prior, should have produced all entirely aligned
> > > accesses. That makes me think your kaboom was coming from someplace
> > > else. One possibility is that you were running the git tree on the two
> > > days that I was playing with uint128_t, only to find out that some of
> > > openbsd's patches to clang miscalculate stack sizes when they're in
> > > use, so that work was shelved for another day and the commits removed;
> > > perhaps you were just unlucky? Or you hit some other bug that's
> > > lurking. Either way, output from ddb's `bt` would at least be useful.
>
> When you say clang patches miscalculate the stack size with uint128_t,
> do you know which one(s) specifically? Was it -msave-args?

Yep, exactly. It looked to me like that plugin was considering each
argument of a function to be register-sized, which obviously isn't the
case for uint128_t. I had planned to come back to that after the
wireguard stuff is finished.

> Anyway, tl;dr, my guess is you're reading a 64bit value out of a packet,
> but I'm pretty sure the stack probably only provides 32bit alignment.

Yep, that seems to be exactly the case. Earlier today I looked at
every struct passed to mtod for the entire kernel, and found two other
drivers that do this with 64bit types, but maybe they're in contexts
where that's okay somehow.

Now that I've got my sparc64 rig cooking away at these kernels, I can
confirm that this is indeed the case, and it's trivially fixed by just
memcpy'ing to/from a properly variable on the stack. In my experience,
this pattern,

uint64_t i;
memcpy(, somecharbuffer, sizeof(i));
return i;

optimizes to just the simple boring cast on architectures with fast
unaligned access (e.g. amd64), while doing the smart thing for
architectures that trap or have a performance penalty. IOW, the
compiler generates optimal code over that pattern, so that's what I've
done to work around the issue here.

> - The network stack accesses bits of packets directly. I'm pretty
> sure so far this is limited to 32bit word accesses for things like
> addresses in IPv4 headers, GRE protocol fields, etc. I'm not sure
> there are (currently) any 64bit accesses.

My review earlier today indeed showed all max 32bit, except for two drivers:
- sys/net/switchofp.c and sys/net/ofp.h
- sys/dev/usb/if_athn_usb.c and sys/dev/usb/if_athn_usb.h

Either those are incorrect, are limited to archs that don't trap, or
are used in contexts where other factors make them safe. But beyond
those two, I couldn't find any others.

> Sorry for the rambling. Hopefully you know a bit more about mbuf
> and network stack alignment now though.

Not a problem at all, super appreciated. Looking forward to your other
comments on the driver too.

Jason



Re: WireGuard patchset for OpenBSD, rev. 2

2020-05-27 Thread David Gwynne
On Tue, May 26, 2020 at 05:42:13PM -0600, Jason A. Donenfeld wrote:
> On Tue, May 26, 2020 at 4:52 PM Jason A. Donenfeld  wrote:
> > With regards to your crash, though, that's a bit more puzzling, and
> > I'd be interested to learn more details. Because these structs are
> > already naturally aligned, the __packed attribute, even with the odd
> > nesting Matt had prior, should have produced all entirely aligned
> > accesses. That makes me think your kaboom was coming from someplace
> > else. One possibility is that you were running the git tree on the two
> > days that I was playing with uint128_t, only to find out that some of
> > openbsd's patches to clang miscalculate stack sizes when they're in
> > use, so that work was shelved for another day and the commits removed;
> > perhaps you were just unlucky? Or you hit some other bug that's
> > lurking. Either way, output from ddb's `bt` would at least be useful.

When you say clang patches miscalculate the stack size with uint128_t,
do you know which one(s) specifically? Was it -msave-args?

> Do you know off hand if we're able to assume any type of alignment
> with mbuf->m_data? mtod just casts without any address fixup, which
> means if mbuf->m_data isn't aligned by some other mechanism, we're in
> trouble. But I would assume there _is_ some alignment imposed, since
> the rest of the stack appears to parse tcp headers and such directly
> without byte-by-byte copies being made.

It's probably more correct to say that payload alignment is required by
the network stack rather than imposed. However, you could argue
that's a useless distinction to make in practice.

Anyway, tl;dr, my guess is you're reading a 64bit value out of a packet,
but I'm pretty sure the stack probably only provides 32bit alignment.

The long version of the relevant things are:

- OpenBSD runs on some architectures that require strict alignment
for accessing words, and the fault handlers for unaligned accesses
in the kernel drop you into ddb. ie, you don't want to do that.

- The network stack accesses bits of packets directly. I'm pretty
sure so far this is limited to 32bit word accesses for things like
addresses in IPv4 headers, GRE protocol fields, etc. I'm not sure
there are (currently) any 64bit accesses.

- mbufs and mbuf clusters (big buffers for packet data) are allocated
out of pools which provide 64 byte alignment. The data portion of
mbufs starts after some header structures, but should remain long
aligned. At least on amd64 and sparc64 it happens to be 16 byte
aligned. Figuring it out on a 32bit arch is too much maths for me atm.

- The mbuf API tries to keep the m_data pointer in newly allocated mbufs
long word aligned.

- However, when you're building a packet and you put a header onto it,
you usually do that with m_prepend. m_prepend will put the new header
up against the existing data, unless there is no space to do that with
the current mbuf. In that latter situation it will allocate a new mbuf
and add the new header there. Because it is a new mbuf, and as per the
previous point, the new header will start on a long word boundary.

This is mostly a problem for Ethernet related stuff (14 byte headers,
yay), and has a few consequences.

One of the consequences is that the receive side of Ethernet drivers
in OpenBSD try hard to align Ethernet payloads to 4 byte boundaries.
This means they allocate an mbuf with at least their mtu + 2 bytes,
and then chop 2 bytes off the front and give that to the chip to
rx into.

So if you're on a sparc64 running one of those, let's say the driver does
this and it gets an mbuf cluster to rx into. That cluster will be 64
byte aligned, which is good. Let's also say it gets a wg packet. The
layout of the packet will be:

- 2 byte ETHER_ALIGN pad at offset 0
- 14 byte Ethernet header at offset 2
- 20 byte IP header at offset 16 (hooray)
- 8 byte UDP header at offset 36
- 4 byte wg message type + pad at offset 44
- 4 byte wg/noise receiver index at offset 48
- 8 byte wg/noise counter/none at offset 52

52 % 8 = explosion.

However, in your diff struct noise_data is __packed, so the compiler
shouldn't make assumptions about the alignment and should do a byte
by byte load. It's hard to say what's happening without a backtrace
like you suggest.

For reasons I would suggest making struct noise_data look like this:

struct noise_data {
uint32_tr_idx;
uint32_tnonce_lo;
uint32_tnonce_hi;
};

And making the load of the counter look like this:

uint64_t ctr;

ctr = lemtoh32(>nonce_lo) |
((uint64_t)lemtoh32(>nonce_hi) << 32);

Sorry for the rambling. Hopefully you know a bit more about mbuf
and network stack alignment now though.



Re: WireGuard patchset for OpenBSD, rev. 2

2020-05-27 Thread Jason A. Donenfeld
Hi again Klemens,

On Tue, May 26, 2020 at 5:42 PM Jason A. Donenfeld  wrote:
>
> On Tue, May 26, 2020 at 4:52 PM Jason A. Donenfeld  wrote:
> > With regards to your crash, though, that's a bit more puzzling, and
> > I'd be interested to learn more details. Because these structs are
> > already naturally aligned, the __packed attribute, even with the odd
> > nesting Matt had prior, should have produced all entirely aligned
> > accesses. That makes me think your kaboom was coming from someplace
> > else. One possibility is that you were running the git tree on the two
> > days that I was playing with uint128_t, only to find out that some of
> > openbsd's patches to clang miscalculate stack sizes when they're in
> > use, so that work was shelved for another day and the commits removed;
> > perhaps you were just unlucky? Or you hit some other bug that's
> > lurking. Either way, output from ddb's `bt` would at least be useful.
>
> Do you know off hand if we're able to assume any type of alignment
> with mbuf->m_data? mtod just casts without any address fixup, which
> means if mbuf->m_data isn't aligned by some other mechanism, we're in
> trouble. But I would assume there _is_ some alignment imposed, since
> the rest of the stack appears to parse tcp headers and such directly
> without byte-by-byte copies being made.

After a day of TCG-run compilation, I've got a working sparc64 setup
and can confirm the issue. Working on a fix.

Jason



Re: WireGuard patchset for OpenBSD, rev. 2

2020-05-27 Thread Matt Dunwoodie
On Tue, 26 May 2020 13:28:22 +0200
Tobias Heider  wrote:

> Hi Matt,
> 
> just repeating what I commented yesterday for the new diff to make
> sure it isn't overlooked.

Thank you for repeating it, I didn't get around to addressing it before
the new diff.

> > +int
> > +wg_ioctl_get(struct wg_softc *sc, struct wg_data_io *data)
> > +{
> > +   struct wg_interface_io  *iface_p, iface_o;
> > +   struct wg_peer_io   *peer_p, peer_o;
> > +   struct wg_aip_io*aip_p;
> > +
> > +   struct wg_peer  *peer;
> > +   struct wg_aip   *aip;
> > +
> > +   size_t   size, peer_count, aip_count;
> > +   int  ret = 0, is_suser =
> > suser(curproc) == 0; +
> > +   size = sizeof(struct wg_interface_io);
> > +   if (data->wgd_size < size && !is_suser)
> > +   goto ret_size;
> > +
> > +   iface_p = data->wgd_interface;
> > +   bzero(_o, sizeof(iface_o));
> > +
> > +   rw_enter_read(>sc_lock);
> > +
> > +   if (sc->sc_udp_port != 0) {
> > +   iface_o.i_port = ntohs(sc->sc_udp_port);
> > +   iface_o.i_flags |= WG_INTERFACE_HAS_PORT;
> > +   }
> > +
> > +   if (sc->sc_udp_rtable != 0) {
> > +   iface_o.i_rtable = sc->sc_udp_rtable;
> > +   iface_o.i_flags |= WG_INTERFACE_HAS_RTABLE;
> > +   }
> > +
> > +   if (!is_suser)
> > +   goto out;
> > +
> > +   if (noise_local_keys(>sc_local, iface_o.i_public,
> > +   iface_o.i_private) == 0) {
> > +   iface_o.i_flags |= WG_INTERFACE_HAS_PUBLIC;
> > +   iface_o.i_flags |= WG_INTERFACE_HAS_PRIVATE;
> > +   }
> > +
> > +   if (((SIZE_MAX - size) / sizeof(struct wg_peer_io)) <
> > sc->sc_peer_num)
> > +   goto error;
> > +   size += sizeof(struct wg_peer_io) * sc->sc_peer_num;
> > +   if (((SIZE_MAX - size) / sizeof(struct wg_aip_io)) <
> > sc->sc_aip_num)
> > +   goto error;  
> 
> I still think those two should return an error.  'goto error' is
> misleading as it doesn't actually set ret != 0.  'error' should
> probably be renamed to 'cleanup' to avoid confusion and ret should be
> set to ERANGE .

I'll elaborate on this a bit. These goto errors are just checks for
integer overflows on size. Upon further consideration it probably
doesn't make sense if we are using a size_t. Therefore it makes sense
to remove these two checks.

> > +   size += sizeof(struct wg_aip_io) * sc->sc_aip_num;
> > +   if (data->wgd_size < size)
> > +   goto error;

This has been changed to "goto unlock_and_ret_size;". I think Jason
touched on it, but the problem is that if we return ERANGE, then
wg_data_io does not get copied back to userspace and the caller cannot
read the returned size. Therefore we need to return 0 to indicate no
error.

> > +   peer_count = 0;
> > +   peer_p = _p->i_peers[0];
> > +   WG_PEERS_FOREACH(peer, sc) {
> > +   bzero(_o, sizeof(peer_o));
> > +   peer_o.p_flags = WG_PEER_HAS_PUBLIC;
> > +   peer_o.p_protocol_version = 1;
> > +
> > +   if (noise_remote_keys(>p_remote,
> > peer_o.p_public,
> > +   peer_o.p_psk) == 0)
> > +   peer_o.p_flags |= WG_PEER_HAS_PSK;
> > +
> > +   if
> > (wg_timers_get_persistent_keepalive(>p_timers,
> > +   _o.p_pka) == 0)
> > +   peer_o.p_flags |= WG_PEER_HAS_PKA;
> > +
> > +   if (wg_peer_get_sockaddr(peer, _o.p_sa) == 0)
> > +   peer_o.p_flags |= WG_PEER_HAS_ENDPOINT;
> > +
> > +   mtx_enter(>p_counters_mtx);
> > +   peer_o.p_txbytes = peer->p_counters_tx;
> > +   peer_o.p_rxbytes = peer->p_counters_rx;
> > +   mtx_leave(>p_counters_mtx);
> > +
> > +   wg_timers_get_last_handshake(>p_timers,
> > +   _o.p_last_handshake);
> > +
> > +   aip_count = 0;
> > +   aip_p = _p->p_aips[0];
> > +   LIST_FOREACH(aip, >p_aip, a_entry) {
> > +   if ((ret = copyout(>a_data, aip_p,
> > sizeof(*aip_p))) != 0)
> > +   goto error;
> > +   aip_p++;
> > +   aip_count++;
> > +   }
> > +   peer_o.p_aips_count = aip_count;
> > +
> > +   if ((ret = copyout(_o, peer_p,
> > sizeof(peer_o))) != 0)
> > +   goto error;
> > +
> > +   peer_p = (struct wg_peer_io *)aip_p;
> > +   peer_count++;
> > +   }
> > +   iface_o.i_peers_count = peer_count;
> > +
> > +out:
> > +   if ((ret = copyout(_o, iface_p, sizeof(iface_o))) !=
> > 0)
> > +   goto error;
> > +error:
> > +   rw_exit_read(>sc_lock);
> > +   explicit_bzero(_o, sizeof(iface_o));
> > +   explicit_bzero(_o, sizeof(peer_o));
> > +ret_size:
> > +   data->wgd_size = size;
> > +   return ret;
> > +}  
> 



Re: WireGuard patchset for OpenBSD, rev. 2

2020-05-27 Thread Jason A. Donenfeld
Hi Martin,

To answer a few but not all of your questions:

On Wed, May 27, 2020 at 1:34 AM Martin Pieuchot  wrote:
> First question is, is it possible to use the wg(4) interface without a
> port?

No, that is not how WireGuard works. For details on the actual
protocol particulars, please see
https://www.wireguard.com/papers/wireguard.pdf . Our rev 1 submission
has links to further links as well in the cover letter.

> Aren't wg_noise.c and wg_cookie.c meant to be used by if_wg.c only?  Or
> would other parts of the kernel benefit from them?  If wg(4) is the only
> user I'm questionnings whether putting them in crypto/ is the best choice.

Yes, they should only be used by if_wg.c. Moving them closer to
if_wg.c would make sense imo.

> Is IFT_WIREGUARD needed?  Isn't the interface a point-to-point interface?

It is a point to point interface, but it also has some important
traits that point to point interfaces do not have. In this case, that
means disabling automatically generated v6 ip addresses, which
wireguard explicitly does not support. (It does support manually
generated and cryptographically bound link local v6 addresses,
however.)

> Why did you re-implemented a routing table?  Did you consider storing
> the information you need in existing data structures?

WireGuard's cryptokey routing is a different structure with different
semantics and is intended to work separately from the normal system
routing table. Trying to merge the two is a recipe for disaster. Paper
has details.

> My overall impression is that this can be simplified and integrate
> better in the kernel :)  If you could explain the basic architecture or
> point me where you explained it, it would help.

See paper above. Also see links on cover letter of rev 1 submission.

Regards,
Jason



Re: WireGuard patchset for OpenBSD, rev. 2

2020-05-27 Thread Martin Pieuchot
Hello Matt,

Thank you for your submission.

On 26/05/20(Tue) 19:39, Matt Dunwoodie wrote:
> After some feedback and comments, we've addressed the concerns, and
> fixed a few things from our side too. Overall the structure is familiar
> with no major changes, so any prior readings mostly carry over.

It's hard to review a such big diff.  If you're interested in getting
feedbacks in the code itself I'd suggest you split it :)

First question is, is it possible to use the wg(4) interface without a
port?

Regarding the kernel, I'd suggest you use "#if NWG > 0" like it is done
for other pseudo-drives with 'needs-flag'.

Can you re-use PACKET_TAG_GRE instead of introduce PACKET_TAG_WIREGUARD?
The former is already used by multiple pseudo-drivers.

Aren't wg_noise.c and wg_cookie.c meant to be used by if_wg.c only?  Or
would other parts of the kernel benefit from them?  If wg(4) is the only
user I'm questionnings whether putting them in crypto/ is the best choice.

The blake implementation and ChaCha extensions could be reviewed
independently.

Why are you using rwlock to protect cookie_* and noise_* states?  Is any
sleeping involved?  Did you consider a mutex with IPL_NONE? rwlock
introduce sleep point that might be surprising.  Did you try the option
WITNESS of the kernel to check your locking?

Keeping comments away from source code, like you did in wg_cookie.h can
result in them not being updated.  This isn't something common in the
tree and after some times comments tend to lie :o)

Would you mind using variables of more than one-letter?  That makes is
harder to grep for patterns.

If you prefix fields of variable, I'd suggest picking the firs letter of
the two words, for example 'wt_' for "struct wg_tag" instead of "t_"
currently.  Then grepping for 'wt_endpoint' is more likely to yield what
we're looking for :)

Is mq_push() required?  I'm always surprise to see mbuf APIs grow and
grow over years.  Anyway, that could be a separate change.

ioctl(2) shouldn't require the NET_LOCK, so if your pseudo-driver uses
its own locks to serialize access to its data, please say that in the
comments for SIOCSWG and SIOCGWG.

Is IFT_WIREGUARD needed?  Isn't the interface a point-to-point interface?

WG_PEERS_FOREACH{,SAFE} macros are only introducing abstraction and
make code harder to understand.

Please do not cast function pointer in task_set(), use the expected
prototype.

Why did you introduce wg_queue_*(), isn't mq_init(9) API good enough?

Why did you re-implemented a routing table?  Did you consider storing
the information you need in existing data structures?

My overall impression is that this can be simplified and integrate
better in the kernel :)  If you could explain the basic architecture or
point me where you explained it, it would help.

Thanks



Re: [RFC] pppd: add pipex(4) L2TP control support

2020-05-27 Thread Martin Pieuchot
On 26/05/20(Tue) 10:31, Claudio Jeker wrote:
> [...] 
> npppd(8) is server only it can not establish a connection. pppd(8) on the
> other hand is more client side (but I think it can do both).

Could someone knowledgable indicate that in the man pages?  Currently
there is:

npppd – new Point-to-Point Protocol daemon
pppd – Point-to-Point Protocol daemon

Confusing...

The DESCRIPTIONs aren't much more helpful :)