On Tue, May 26, 2020 at 05:42:13PM -0600, Jason A. Donenfeld wrote: > On Tue, May 26, 2020 at 4:52 PM Jason A. Donenfeld <ja...@zx2c4.com> wrote: > > With regards to your crash, though, that's a bit more puzzling, and > > I'd be interested to learn more details. Because these structs are > > already naturally aligned, the __packed attribute, even with the odd > > nesting Matt had prior, should have produced all entirely aligned > > accesses. That makes me think your kaboom was coming from someplace > > else. One possibility is that you were running the git tree on the two > > days that I was playing with uint128_t, only to find out that some of > > openbsd's patches to clang miscalculate stack sizes when they're in > > use, so that work was shelved for another day and the commits removed; > > perhaps you were just unlucky? Or you hit some other bug that's > > lurking. Either way, output from ddb's `bt` would at least be useful.
When you say clang patches miscalculate the stack size with uint128_t, do you know which one(s) specifically? Was it -msave-args? > Do you know off hand if we're able to assume any type of alignment > with mbuf->m_data? mtod just casts without any address fixup, which > means if mbuf->m_data isn't aligned by some other mechanism, we're in > trouble. But I would assume there _is_ some alignment imposed, since > the rest of the stack appears to parse tcp headers and such directly > without byte-by-byte copies being made. It's probably more correct to say that payload alignment is required by the network stack rather than imposed. However, you could argue that's a useless distinction to make in practice. Anyway, tl;dr, my guess is you're reading a 64bit value out of a packet, but I'm pretty sure the stack probably only provides 32bit alignment. The long version of the relevant things are: - OpenBSD runs on some architectures that require strict alignment for accessing words, and the fault handlers for unaligned accesses in the kernel drop you into ddb. ie, you don't want to do that. - The network stack accesses bits of packets directly. I'm pretty sure so far this is limited to 32bit word accesses for things like addresses in IPv4 headers, GRE protocol fields, etc. I'm not sure there are (currently) any 64bit accesses. - mbufs and mbuf clusters (big buffers for packet data) are allocated out of pools which provide 64 byte alignment. The data portion of mbufs starts after some header structures, but should remain long aligned. At least on amd64 and sparc64 it happens to be 16 byte aligned. Figuring it out on a 32bit arch is too much maths for me atm. - The mbuf API tries to keep the m_data pointer in newly allocated mbufs long word aligned. - However, when you're building a packet and you put a header onto it, you usually do that with m_prepend. m_prepend will put the new header up against the existing data, unless there is no space to do that with the current mbuf. In that latter situation it will allocate a new mbuf and add the new header there. Because it is a new mbuf, and as per the previous point, the new header will start on a long word boundary. This is mostly a problem for Ethernet related stuff (14 byte headers, yay), and has a few consequences. One of the consequences is that the receive side of Ethernet drivers in OpenBSD try hard to align Ethernet payloads to 4 byte boundaries. This means they allocate an mbuf with at least their mtu + 2 bytes, and then chop 2 bytes off the front and give that to the chip to rx into. So if you're on a sparc64 running one of those, let's say the driver does this and it gets an mbuf cluster to rx into. That cluster will be 64 byte aligned, which is good. Let's also say it gets a wg packet. The layout of the packet will be: - 2 byte ETHER_ALIGN pad at offset 0 - 14 byte Ethernet header at offset 2 - 20 byte IP header at offset 16 (hooray) - 8 byte UDP header at offset 36 - 4 byte wg message type + pad at offset 44 - 4 byte wg/noise receiver index at offset 48 - 8 byte wg/noise counter/none at offset 52 52 % 8 = explosion. However, in your diff struct noise_data is __packed, so the compiler shouldn't make assumptions about the alignment and should do a byte by byte load. It's hard to say what's happening without a backtrace like you suggest. For reasons I would suggest making struct noise_data look like this: struct noise_data { uint32_t r_idx; uint32_t nonce_lo; uint32_t nonce_hi; }; And making the load of the counter look like this: uint64_t ctr; ctr = lemtoh32(&data->nonce_lo) | ((uint64_t)lemtoh32(&data->nonce_hi) << 32); Sorry for the rambling. Hopefully you know a bit more about mbuf and network stack alignment now though.