Hey David,

On Wed, May 27, 2020 at 2:26 AM David Gwynne <da...@gwynne.id.au> wrote:
>
> On Tue, May 26, 2020 at 05:42:13PM -0600, Jason A. Donenfeld wrote:
> > On Tue, May 26, 2020 at 4:52 PM Jason A. Donenfeld <ja...@zx2c4.com> wrote:
> > > With regards to your crash, though, that's a bit more puzzling, and
> > > I'd be interested to learn more details. Because these structs are
> > > already naturally aligned, the __packed attribute, even with the odd
> > > nesting Matt had prior, should have produced all entirely aligned
> > > accesses. That makes me think your kaboom was coming from someplace
> > > else. One possibility is that you were running the git tree on the two
> > > days that I was playing with uint128_t, only to find out that some of
> > > openbsd's patches to clang miscalculate stack sizes when they're in
> > > use, so that work was shelved for another day and the commits removed;
> > > perhaps you were just unlucky? Or you hit some other bug that's
> > > lurking. Either way, output from ddb's `bt` would at least be useful.
>
> When you say clang patches miscalculate the stack size with uint128_t,
> do you know which one(s) specifically? Was it -msave-args?

Yep, exactly. It looked to me like that plugin was considering each
argument of a function to be register-sized, which obviously isn't the
case for uint128_t. I had planned to come back to that after the
wireguard stuff is finished.

> Anyway, tl;dr, my guess is you're reading a 64bit value out of a packet,
> but I'm pretty sure the stack probably only provides 32bit alignment.

Yep, that seems to be exactly the case. Earlier today I looked at
every struct passed to mtod for the entire kernel, and found two other
drivers that do this with 64bit types, but maybe they're in contexts
where that's okay somehow.

Now that I've got my sparc64 rig cooking away at these kernels, I can
confirm that this is indeed the case, and it's trivially fixed by just
memcpy'ing to/from a properly variable on the stack. In my experience,
this pattern,

uint64_t i;
memcpy(&i, somecharbuffer, sizeof(i));
return i;

optimizes to just the simple boring cast on architectures with fast
unaligned access (e.g. amd64), while doing the smart thing for
architectures that trap or have a performance penalty. IOW, the
compiler generates optimal code over that pattern, so that's what I've
done to work around the issue here.

> - The network stack accesses bits of packets directly. I'm pretty
> sure so far this is limited to 32bit word accesses for things like
> addresses in IPv4 headers, GRE protocol fields, etc. I'm not sure
> there are (currently) any 64bit accesses.

My review earlier today indeed showed all max 32bit, except for two drivers:
- sys/net/switchofp.c and sys/net/ofp.h
- sys/dev/usb/if_athn_usb.c and sys/dev/usb/if_athn_usb.h

Either those are incorrect, are limited to archs that don't trap, or
are used in contexts where other factors make them safe. But beyond
those two, I couldn't find any others.

> Sorry for the rambling. Hopefully you know a bit more about mbuf
> and network stack alignment now though.

Not a problem at all, super appreciated. Looking forward to your other
comments on the driver too.

Jason

Reply via email to