> Date: Sun, 25 Oct 2020 10:42:38 +0100 (CET)
> From: Mark Kettenis <[email protected]>
>
> While making radeondrm(4) work on powerpc64 I'm running into an
> interesting unaligned access issue.
>
> Modern POWER CPUs generally support unaligned access. Normal load and
> store instructions work fine with addresses that aren't naturally
> aligned when operating on cached memory. As a result, clang will
> optimize code by replacing two 32-bit store instructions with a single
> 64-bit store instruction even if there is only 32-bit alignment.
>
> However, this doesn't work for memory that is mapped uncacheable. And
> there is some code in radeondrm(4) (and also in amdgpu(4)) that
> generates alignment exceptions because it is writing to bits of video
> memory that are mapped through the graphics aperture.
>
> There are two ways to fix this. The compiler won't apply this
> optimization if memory is accessed through pointers that are marked
> volatile. Hence the fix below. In my opinion that is the right fix
> as rdev->uvd.cpu_addr is a volatile pointer and that aspect shouldn't
> be dropped. The downside of this approach is that we may need to
> maintain some additional local fixes.
>
> The alternative is to emulate the access in the kernel. I fear that
> is what Linux does, which is why they don't notice this issue. As
> such, this issue may crop up in more places and the emulation would
> catch them all. But I'm a bit reluctant to add this emulation since
> it may hide bugs in other parts of our kernel.
>
> Thoughts? ok?
There is more code in radeondrm(4) and amdgpu(4) that is affected by
this issue, and some of it isn't easy to "volatilize".
There is an llvm option to enforce strict alignment, but it isn't
exposed as a proper option by clang. I'm still investigating the use
of that option, but meanwhile I think I'll commit the attached diff
such that the kernel side of things works and I can look at what needs
to happen on the userland side.
ok?
Index: arch/powerpc64/powerpc64/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/trap.c,v
retrieving revision 1.45
diff -u -p -r1.45 trap.c
--- arch/powerpc64/powerpc64/trap.c 22 Oct 2020 13:41:49 -0000 1.45
+++ arch/powerpc64/powerpc64/trap.c 26 Oct 2020 19:12:28 -0000
@@ -162,6 +162,51 @@ trap(struct trapframe *frame)
printf("dar 0x%lx dsisr 0x%lx\n", frame->dar, frame->dsisr);
goto fatal;
+ case EXC_ALI:
+ {
+ /*
+ * In general POWER allows unaligned loads and stores
+ * and executes those instructions in an efficient
+ * way. As a result compilers may combine word-sized
+ * stores into a single doubleword store instruction
+ * even if the address is not guaranteed to be
+ * doubleword aligned. Such unaligned stores are not
+ * supported in storage that is Caching Inhibited.
+ * Access to such storage should be done through
+ * volatile pointers which inhibit the aforementioned
+ * optimizations. Unfortunately code in the amdgpu(4)
+ * and radeondrm(4) drivers happens to run into such
+ * unaligned access because pointers aren't always
+ * marked as volatile. For that reason we emulate
+ * certain store instructions here.
+ */
+ uint32_t insn = *(uint32_t *)frame->srr0;
+
+ /* STD and STDU */
+ if ((insn & 0xfc000002) == 0xf8000000) {
+ uint32_t rs = (insn >> 21) & 0x1f;
+ uint32_t ra = (insn >> 16) & 0x1f;
+ uint64_t ds = insn & 0xfffc;
+ uint64_t ea;
+
+ if (ds & 0x8000)
+ ds |= ~0x7fff;
+ if ((insn & 0x00000001) == 0 && ra == 0)
+ ea = ds;
+ else
+ ea = frame->fixreg[ra] + ds;
+
+ *(uint32_t *)(ea + 0) = frame->fixreg[rs] >> 32;
+ *(uint32_t *)(ea + 4) = frame->fixreg[rs];
+ if (insn & 0x00000001)
+ frame->fixreg[ra] = ea;
+ frame->srr0 += 4;
+ return;
+ }
+ printf("dar 0x%lx dsisr 0x%lx\n", frame->dar, frame->dsisr);
+ goto fatal;
+ }
+
case EXC_DSE|EXC_USER:
pm = p->p_vmspace->vm_map.pmap;
error = pmap_slbd_fault(pm, frame->dar);