On Sat, Dec 11, 2021 at 06:28:53PM -0700, Ted Bullock wrote:
> On 2021-12-11 5:39 a.m., Mark Kettenis wrote:
> >> Date: Sat, 11 Dec 2021 05:10:41 -0700
> >> From: Ted Bullock <tbull...@comlore.com>
> > 
> > The are several reasons why that test can fail though.  It can be an
> > endian-ness issue or on sparc64 it could also be an IOMMU issue where
> > the wrong address is programmed into the hardware because CPU
> > addresses aren't properly translated into device virtual addresses.
> > 
> 
> Trying to figure out what the hell is happening here is making my eyes
> bleed a little...  there are lots of preprocessor stuff in this code
> that looks fragile to me. I've not written much in the last few years
> but surely this isn't a normal way of programming or maybe the authors
> are smarter than me. :( Anyway I'm looking for where things could get
> broken.
> 
> >> sys/dev/pci/drm/radeon/r100.c:3651
> >> WREG32(scratch, 0xCAFEDEAD);
> 
> Starting here this is a macro that calls an inline function:
> #define WREG32(reg, v) r100_mm_wreg(rdev, (reg), (v), false)
> 
> fwiw r100_mm_wreg is called only by one other thing, the macro:
> #define WREG32_IDX(reg, v) r100_mm_wreg(rdev, (reg), (v), true)
> 
> I don't know why they wrapped an inline function that is called in only
> 2 different ways behind a macro but they did so ok, then looking at
> r100_mm_wreg:
> 
> static inline void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, 
> uint32_t v,
>                               bool always_indirect)
> {
>       if ((reg < rdev->rmmio_size || reg < RADEON_MIN_MMIO_SIZE) && 
> !always_indirect)
>               writel(v, ((void __iomem *)rdev->rmmio) + reg);
>       else
>               r100_mm_wreg_slow(rdev, reg, v);
> }
> 
> This has some pointer math but this doesn't look like it has anything to
> cause endian issues, so I suppose it's fine.
> 
> >> r = radeon_ring_lock(rdev, ring, 2);
> >> if (r) {
> >>    DRM_ERROR("radeon: cp failed to lock ring (%d).\n", r);
> >>    radeon_scratch_free(rdev, scratch);
> >>    return r;
> >> }
> 
> This locking stuff has nothing that looks problematic to me for endian
> issues.
> 
> >> radeon_ring_write(ring, PACKET0(scratch, 0));
> 
> Until we get to this ^, this is kind of a nightmare for me to grasp.
> 
> The driver uses ring, or circular queue data structures as I'm familiar
> with for interacting with the gpu, work items are written to the queue
> and read by the gpu. The queue code doesn't look like it should have
> endian issues and obviously it's working on other platforms so can
> probably ignore it. It's weird to me that linux, bsd et al have circular
> queue data structures pre-rolled so I don't know why they made their
> own, perhaps ignorance or hubris, or they are super smart?
> 
> PACKET0(scratch, 0) this is kind of a monster though.
> 
> #define PACKET0(reg, n)       (CP_PACKET0 |                                   
> \
>                        REG_SET(PACKET0_BASE_INDEX, (reg) >> 2) |      \
>                        REG_SET(PACKET0_COUNT, (n)))
> and also here:
> 
> #define REG_SET(FIELD, v) (((v) << FIELD##_SHIFT) & FIELD##_MASK)
> 
> Think that maybe this is a big candidate for some sort of endian bug? I
> do. hmmm. This looks insanely fragile but maybe I'm crazy and probably a
> moron or something.
> 
> >> radeon_ring_write(ring, 0xDEADBEEF);

I notice that radeon_ring_write() takes a uint32_t argument.
When writing to memory which is shared with the device, such values need
to be byte-swapped for the device to read them in the expected byte-order.
And swapped back again in case such memory is read by the host.

If no attention was given to this by the original developers then you
will have a lot of fun trying to track down the places where byte swaps
are missing. Any multi-byte read/write access to data structures in memory
shared with the device (i.e. mapped for DMA) needs to do this.
You can look at virtually all the drivers in our tree for examples, and
read the htole32(3) man page for details.

I have not tested the following patch at all (not even compiled it).
And even if this patch is correct it will probably not suffice to make
everything work.
But fixes for missing byte-swaps should all be of this nature, assuming
the device expects little endian and your host is using big endian:

diff 1fa0b3b4477a96dd9841c14c78e338c6ab0abe1d /usr/src
blob - 4674299c6900dc3bfd32b29579f34986df2429b6
file + sys/dev/pci/drm/radeon/radeon.h
--- sys/dev/pci/drm/radeon/radeon.h
+++ sys/dev/pci/drm/radeon/radeon.h
@@ -2737,7 +2737,7 @@ static inline void radeon_ring_write(struct radeon_rin
        if (ring->count_dw <= 0)
                DRM_ERROR("radeon: writing more dwords to the ring than 
expected!\n");
 
-       ring->ring[ring->wptr++] = v;
+       ring->ring[ring->wptr++] = htole32(v);
        ring->wptr &= ring->ptr_mask;
        ring->count_dw--;
        ring->ring_free_dw--;


> >> radeon_ring_unlock_commit(rdev, ring, false);
> >> for (i = 0; i < rdev->usec_timeout; i++) {
> >>    tmp = RREG32(scratch);
> >>    if (tmp == 0xDEADBEEF) {
> >>            break;
> >>    }
> >>    udelay(1);
> >> }
> >> if (i < rdev->usec_timeout) {
> >>    DRM_INFO("ring test succeeded in %d usecs\n", i);
> >> } else {
> >>    DRM_ERROR("radeon: ring test failed (scratch(0x%04X)=0x%08X)\n",
> >>              scratch, tmp);
> >>    r = -EINVAL;
> >> }
> 
> 
> -- 
> Ted Bullock <tbull...@comlore.com>
> 
> 

Reply via email to