Re: SunBlade 100: X is very yellow with XVR-100 (radeon r100)

2021-12-12 Thread Stefan Sperling
On Sat, Dec 11, 2021 at 06:28:53PM -0700, Ted Bullock wrote:
> On 2021-12-11 5:39 a.m., Mark Kettenis wrote:
> >> Date: Sat, 11 Dec 2021 05:10:41 -0700
> >> From: Ted Bullock 
> > 
> > There are several reasons why that test can fail though.  It can be an
> > endian-ness issue or on sparc64 it could also be an IOMMU issue where
> > the wrong address is programmed into the hardware because CPU
> > addresses aren't properly translated into device virtual addresses.
> > 
> 
> Trying to figure out what the hell is happening here is making my eyes
> bleed a little...  there is a lot of preprocessor stuff in this code
> that looks fragile to me. I've not written much in the last few years
> but surely this isn't a normal way of programming or maybe the authors
> are smarter than me. :( Anyway I'm looking for where things could get
> broken.
> 
> >> sys/dev/pci/drm/radeon/r100.c:3651
> >> WREG32(scratch, 0xCAFEDEAD);
> 
> Starting here this is a macro that calls an inline function:
> #define WREG32(reg, v) r100_mm_wreg(rdev, (reg), (v), false)
> 
> fwiw r100_mm_wreg is called only by one other thing, the macro:
> #define WREG32_IDX(reg, v) r100_mm_wreg(rdev, (reg), (v), true)
> 
> I don't know why they wrapped an inline function that is called in only
> 2 different ways behind macros, but they did. OK, so looking at
> r100_mm_wreg:
> 
> static inline void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg,
>     uint32_t v, bool always_indirect)
> {
>     if ((reg < rdev->rmmio_size || reg < RADEON_MIN_MMIO_SIZE) &&
>         !always_indirect)
>         writel(v, ((void __iomem *)rdev->rmmio) + reg);
>     else
>         r100_mm_wreg_slow(rdev, reg, v);
> }
> 
> This has some pointer math, but nothing that looks like it could cause
> endian issues, so I suppose it's fine.
> 
> >> r = radeon_ring_lock(rdev, ring, 2);
> >> if (r) {
> >>     DRM_ERROR("radeon: cp failed to lock ring (%d).\n", r);
> >>     radeon_scratch_free(rdev, scratch);
> >>     return r;
> >> }
> 
> This locking stuff has nothing that looks problematic to me for endian
> issues.
> 
> >> radeon_ring_write(ring, PACKET0(scratch, 0));
> 
> Then we get to this ^, which is kind of a nightmare for me to grasp.
> 
> The driver uses ring (circular queue) data structures, as I understand
> them, for interacting with the gpu: work items are written to the queue
> and read by the gpu. The queue code doesn't look like it should have
> endian issues and it's obviously working on other platforms, so we can
> probably ignore it. It's weird to me, since linux, bsd et al have circular
> queue data structures pre-rolled, so I don't know why they made their
> own; perhaps ignorance or hubris, or they are super smart?
> 
> PACKET0(scratch, 0) this is kind of a monster though.
> 
> #define PACKET0(reg, n) (CP_PACKET0 |                       \
>         REG_SET(PACKET0_BASE_INDEX, (reg) >> 2) |           \
>         REG_SET(PACKET0_COUNT, (n)))
> and also here:
> 
> #define REG_SET(FIELD, v) (((v) << FIELD##_SHIFT) & FIELD##_MASK)
> 
> Could this be a big candidate for some sort of endian bug? I think so.
> Hmm. This looks insanely fragile, but maybe I'm crazy and probably a
> moron or something.
> 
> >> radeon_ring_write(ring, 0xDEADBEEF);

I notice that radeon_ring_write() takes a uint32_t argument.
When writing to memory which is shared with the device, such values need
to be byte-swapped for the device to read them in the expected byte-order.
And swapped back again in case such memory is read by the host.

If no attention was given to this by the original developers then you
will have a lot of fun trying to track down the places where byte swaps
are missing. Any multi-byte read/write access to data structures in memory
shared with the device (i.e. mapped for DMA) needs to do this.
You can look at virtually all the drivers in our tree for examples, and
read the htole32(3) man page for details.

I have not tested the following patch at all (not even compiled it).
And even if this patch is correct it will probably not suffice to make
everything work.
But fixes for missing byte-swaps should all be of this nature, assuming
the device expects little endian and your host is using big endian:

diff 1fa0b3b4477a96dd9841c14c78e338c6ab0abe1d /usr/src
blob - 4674299c6900dc3bfd32b29579f34986df2429b6
file + sys/dev/pci/drm/radeon/radeon.h
--- sys/dev/pci/drm/radeon/radeon.h
+++ sys/dev/pci/drm/radeon/radeon.h
@@ -2737,7 +2737,7 @@ static inline void radeon_ring_write(struct radeon_rin
if (ring->count_dw <= 0)
DRM_ERROR("radeon: writing more dwords to the ring than expected!\n");
 
-   ring->ring[ring->wptr++] = v;
+   ring->ring[ring->wptr++] = htole32(v);
ring->wptr &= ring->ptr_mask;
ring->count_dw--;
ring->ring_free_dw--;


> >> radeon_ring_unlock_commit(rdev, ring, false);
> >> for (i = 0; i < rdev->usec_timeout; i++) {
> >> 

2021-12-11 Thread Ted Bullock
On 2021-12-11 5:39 a.m., Mark Kettenis wrote:
>> Date: Sat, 11 Dec 2021 05:10:41 -0700
>> From: Ted Bullock 
> 
> There are several reasons why that test can fail though.  It can be an
> endian-ness issue or on sparc64 it could also be an IOMMU issue where
> the wrong address is programmed into the hardware because CPU
> addresses aren't properly translated into device virtual addresses.
> 

Trying to figure out what the hell is happening here is making my eyes
bleed a little...  there is a lot of preprocessor stuff in this code
that looks fragile to me. I've not written much in the last few years
but surely this isn't a normal way of programming or maybe the authors
are smarter than me. :( Anyway I'm looking for where things could get
broken.

>> sys/dev/pci/drm/radeon/r100.c:3651
>> WREG32(scratch, 0xCAFEDEAD);

Starting here this is a macro that calls an inline function:
#define WREG32(reg, v) r100_mm_wreg(rdev, (reg), (v), false)

fwiw r100_mm_wreg is called only by one other thing, the macro:
#define WREG32_IDX(reg, v) r100_mm_wreg(rdev, (reg), (v), true)

I don't know why they wrapped an inline function that is called in only
2 different ways behind macros, but they did. OK, so looking at
r100_mm_wreg:

static inline void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg,
    uint32_t v, bool always_indirect)
{
    if ((reg < rdev->rmmio_size || reg < RADEON_MIN_MMIO_SIZE) &&
        !always_indirect)
        writel(v, ((void __iomem *)rdev->rmmio) + reg);
    else
        r100_mm_wreg_slow(rdev, reg, v);
}

This has some pointer math, but nothing that looks like it could cause
endian issues, so I suppose it's fine.

>> r = radeon_ring_lock(rdev, ring, 2);
>> if (r) {
>>  DRM_ERROR("radeon: cp failed to lock ring (%d).\n", r);
>>  radeon_scratch_free(rdev, scratch);
>>  return r;
>> }

This locking stuff has nothing that looks problematic to me for endian
issues.

>> radeon_ring_write(ring, PACKET0(scratch, 0));

Then we get to this ^, which is kind of a nightmare for me to grasp.

The driver uses ring (circular queue) data structures, as I understand
them, for interacting with the gpu: work items are written to the queue
and read by the gpu. The queue code doesn't look like it should have
endian issues and it's obviously working on other platforms, so we can
probably ignore it. It's weird to me, since linux, bsd et al have circular
queue data structures pre-rolled, so I don't know why they made their
own; perhaps ignorance or hubris, or they are super smart?

PACKET0(scratch, 0) this is kind of a monster though.

#define PACKET0(reg, n) (CP_PACKET0 |   \
 REG_SET(PACKET0_BASE_INDEX, (reg) >> 2) |  \
 REG_SET(PACKET0_COUNT, (n)))
and also here:

#define REG_SET(FIELD, v) (((v) << FIELD##_SHIFT) & FIELD##_MASK)

Could this be a big candidate for some sort of endian bug? I think so.
Hmm. This looks insanely fragile, but maybe I'm crazy and probably a
moron or something.

>> radeon_ring_write(ring, 0xDEADBEEF);
>> radeon_ring_unlock_commit(rdev, ring, false);
>> for (i = 0; i < rdev->usec_timeout; i++) {
>>  tmp = RREG32(scratch);
>>  if (tmp == 0xDEADBEEF) {
>>  break;
>>  }
>>  udelay(1);
>> }
>> if (i < rdev->usec_timeout) {
>>  DRM_INFO("ring test succeeded in %d usecs\n", i);
>> } else {
>>  DRM_ERROR("radeon: ring test failed (scratch(0x%04X)=0x%08X)\n",
>>scratch, tmp);
>>  r = -EINVAL;
>> }


-- 
Ted Bullock 



2021-12-11 Thread Mark Kettenis
> Date: Sat, 11 Dec 2021 05:10:41 -0700
> From: Ted Bullock 
> 
> On 2021-12-11 4:41 a.m., Mark Kettenis wrote:
> >> Date: Fri, 10 Dec 2021 17:24:58 -0700
> >> From: Ted Bullock 
> > So the real problem is:
> > 
> >> [drm] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
> >> [drm] *ERROR* radeon: cp isn't working (-22).
> >> drm:pid0:r100_startup *ERROR* failed initializing CP (-22).
> >> drm:pid0:r100_init *ERROR* Disabling GPU acceleration
> >> [drm] *ERROR* Wait for CP idle timeout, shutting down CP.
> >> Failed to wait GUI idle while programming pipes. Bad things might happen.
> > 
> > as a result of this GPU acceleration is disabled and software
> > rendering is used.  Which obviously has endian-ness issues.
> 
> Yeah so there are actually 2 problems here.  The first is the fault you 
> can see above causing it to fall back to software rendering.  The second 
> is that there is going to be some sort of endian issue (probably) with 
> the software renderer causing everything to display in the wrong colors.
> 
> > The sad truth is that most of us don't have much time to test older
> > hardware and we tend to favor making new hardware work correctly over
> > keeping the really old stuff working.  But help is appreciated and we
> > certainly won't outright reject any fixes you discover.
> 
> That's totally expected, and not a problem for me. I'm definitely not 
> looking for other people to swoop in and fix this old stuff for me, but 
> I am trying to document what I'm finding, and if it's possible to keep 
> stuff working a while longer I think it's worth my time. It's not like 
> there will ever be another ultrasparc workstation made but there is 
> definitely big endian stuff out in the world. Like that new powerpc 
> system which is unfortunately a little too expensive to just buy to have 
> one sitting around.
> 
> Is this more appropriate to take to the freedesktop.org bug list btw?

Yes, but the most likely answer you'll get there is probably "we don't
care about big-endian platforms".

> > That said I think Jonathan said that support for the R100 is going to
> > be removed from Mesa, which would probably mean the end of GPU
> > acceleration support for that hardware.
> 
> That's kind of sad to hear given how much hardware is going to still be 
> out there, but I guess it depends on people using it, testing and 
> fixing. c'est la vie.
> 
> ok, regarding this fault, it's also apparently impacting macppc [0] and 
> has been around for a while [1].

There are several reasons why that test can fail though.  It can be an
endian-ness issue or on sparc64 it could also be an IOMMU issue where
the wrong address is programmed into the hardware because CPU
addresses aren't properly translated into device virtual addresses.

> sys/dev/pci/drm/radeon/r100.c:3651
> WREG32(scratch, 0xCAFEDEAD);
> r = radeon_ring_lock(rdev, ring, 2);
> if (r) {
>   DRM_ERROR("radeon: cp failed to lock ring (%d).\n", r);
>   radeon_scratch_free(rdev, scratch);
>   return r;
> }
> radeon_ring_write(ring, PACKET0(scratch, 0));
> radeon_ring_write(ring, 0xDEADBEEF);
> radeon_ring_unlock_commit(rdev, ring, false);
> for (i = 0; i < rdev->usec_timeout; i++) {
>   tmp = RREG32(scratch);
>   if (tmp == 0xDEADBEEF) {
>   break;
>   }
>   udelay(1);
> }
> if (i < rdev->usec_timeout) {
>   DRM_INFO("ring test succeeded in %d usecs\n", i);
> } else {
>   DRM_ERROR("radeon: ring test failed (scratch(0x%04X)=0x%08X)\n",
> scratch, tmp);
>   r = -EINVAL;
> }
> 
> [0] https://marc.info/?l=openbsd-bugs&m=162447131102854
> [1] https://gitlab.freedesktop.org/drm/amd/-/issues/162
> 
> -- 
> Ted Bullock 
> 



2021-12-11 Thread Jonathan Gray
On Sat, Dec 11, 2021 at 12:41:46PM +0100, Mark Kettenis wrote:
> > Date: Fri, 10 Dec 2021 17:24:58 -0700
> > From: Ted Bullock 
> > 
> > On 2021-12-10 12:53 a.m., Jonathan Gray wrote:
> > > On Thu, Dec 09, 2021 at 10:01:30PM -0700, Ted Bullock wrote:
> > >> Thoughts folks? This is clearly going to impact all big endian + radeon gear.
> > >>
> > >> Actually, I bet that the macppc platform has the same problem too.
> > > 
> > > sparc64 maps pci little endian, I don't think macppc does
> > > 
> > > can you try the following?
> > 
> > Yeah that did resolve the bios warning; X is yellow still though.
> 
> So the real problem is:
> 
> > [drm] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
> > [drm] *ERROR* radeon: cp isn't working (-22).
> > drm:pid0:r100_startup *ERROR* failed initializing CP (-22).
> > drm:pid0:r100_init *ERROR* Disabling GPU acceleration
> > [drm] *ERROR* Wait for CP idle timeout, shutting down CP.
> > Failed to wait GUI idle while programming pipes. Bad things might happen.
> 
> as a result of this GPU acceleration is disabled and software
> rendering is used.  Which obviously has endian-ness issues.
> 
> I believe the XVR-300 doesn't hit these errors and still (mostly)
> works.  But you can't plug one of those into a blade100.
> 
> The sad truth is that most of us don't have much time to test older
> hardware and we tend to favor making new hardware work correctly over
> keeping the really old stuff working.  But help is appreciated and we
> certainly won't outright reject any fixes you discover.

I test on old r100 and i915 hardware but on i386 not sparc64.

> 
> That said I think Jonathan said that support for the R100 is going to
> be removed from Mesa, which would probably mean the end of GPU
> acceleration support for that hardware.

In Mesa git (and >= 22.0) the non-gallium 'classic' drivers are removed.
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10153
For radeon, r100/r200 are removed.
For intel, i915 (gen 2-3) and i965 (gen 4-7) are removed.
For i965 there is a new replacement gallium driver (crocus).

The proposed way of supporting older hardware is building the 'amber'
branch based on 21.3 in addition to building a newer release, with 
the nvidia libglvnd acting as libGL calling different Mesa versions.
I don't think we'd want to go that route.

I have a Mesa 21.3 update in the works which still has the classic
drivers but that is on hold for now as it expects the kernel drm to
implement android sync file interfaces in at least the intel vulkan
driver.



2021-12-11 Thread Ted Bullock

On 2021-12-11 4:41 a.m., Mark Kettenis wrote:
>> Date: Fri, 10 Dec 2021 17:24:58 -0700
>> From: Ted Bullock 
> 
> So the real problem is:
> 
>> [drm] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
>> [drm] *ERROR* radeon: cp isn't working (-22).
>> drm:pid0:r100_startup *ERROR* failed initializing CP (-22).
>> drm:pid0:r100_init *ERROR* Disabling GPU acceleration
>> [drm] *ERROR* Wait for CP idle timeout, shutting down CP.
>> Failed to wait GUI idle while programming pipes. Bad things might happen.
> 
> as a result of this GPU acceleration is disabled and software
> rendering is used.  Which obviously has endian-ness issues.

Yeah so there are actually 2 problems here.  The first is the fault you
can see above causing it to fall back to software rendering.  The second
is that there is going to be some sort of endian issue (probably) with
the software renderer causing everything to display in the wrong colors.

> The sad truth is that most of us don't have much time to test older
> hardware and we tend to favor making new hardware work correctly over
> keeping the really old stuff working.  But help is appreciated and we
> certainly won't outright reject any fixes you discover.

That's totally expected, and not a problem for me. I'm definitely not
looking for other people to swoop in and fix this old stuff for me, but
I am trying to document what I'm finding, and if it's possible to keep
stuff working a while longer I think it's worth my time. It's not like
there will ever be another ultrasparc workstation made but there is
definitely big endian stuff out in the world. Like that new powerpc
system which is unfortunately a little too expensive to just buy to have
one sitting around.

Is this more appropriate to take to the freedesktop.org bug list btw?

> That said I think Jonathan said that support for the R100 is going to
> be removed from Mesa, which would probably mean the end of GPU
> acceleration support for that hardware.

That's kind of sad to hear given how much hardware is going to still be
out there, but I guess it depends on people using it, testing and
fixing. c'est la vie.

ok, regarding this fault, it's also apparently impacting macppc [0] and
has been around for a while [1].

sys/dev/pci/drm/radeon/r100.c:3651
WREG32(scratch, 0xCAFEDEAD);
r = radeon_ring_lock(rdev, ring, 2);
if (r) {
    DRM_ERROR("radeon: cp failed to lock ring (%d).\n", r);
    radeon_scratch_free(rdev, scratch);
    return r;
}
radeon_ring_write(ring, PACKET0(scratch, 0));
radeon_ring_write(ring, 0xDEADBEEF);
radeon_ring_unlock_commit(rdev, ring, false);
for (i = 0; i < rdev->usec_timeout; i++) {
    tmp = RREG32(scratch);
    if (tmp == 0xDEADBEEF) {
        break;
    }
    udelay(1);
}
if (i < rdev->usec_timeout) {
    DRM_INFO("ring test succeeded in %d usecs\n", i);
} else {
    DRM_ERROR("radeon: ring test failed (scratch(0x%04X)=0x%08X)\n",
        scratch, tmp);
    r = -EINVAL;
}

[0] https://marc.info/?l=openbsd-bugs&m=162447131102854
[1] https://gitlab.freedesktop.org/drm/amd/-/issues/162

--
Ted Bullock 



2021-12-11 Thread Mark Kettenis
> Date: Fri, 10 Dec 2021 17:24:58 -0700
> From: Ted Bullock 
> 
> On 2021-12-10 12:53 a.m., Jonathan Gray wrote:
> > On Thu, Dec 09, 2021 at 10:01:30PM -0700, Ted Bullock wrote:
> >> Thoughts folks? This is clearly going to impact all big endian + radeon gear.
> >>
> >> Actually, I bet that the macppc platform has the same problem too.
> > 
> > sparc64 maps pci little endian, I don't think macppc does
> > 
> > can you try the following?
> 
> Yeah that did resolve the bios warning; X is yellow still though.

So the real problem is:

> [drm] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
> [drm] *ERROR* radeon: cp isn't working (-22).
> drm:pid0:r100_startup *ERROR* failed initializing CP (-22).
> drm:pid0:r100_init *ERROR* Disabling GPU acceleration
> [drm] *ERROR* Wait for CP idle timeout, shutting down CP.
> Failed to wait GUI idle while programming pipes. Bad things might happen.

as a result of this GPU acceleration is disabled and software
rendering is used.  Which obviously has endian-ness issues.

I believe the XVR-300 doesn't hit these errors and still (mostly)
works.  But you can't plug one of those into a blade100.

The sad truth is that most of us don't have much time to test older
hardware and we tend to favor making new hardware work correctly over
keeping the really old stuff working.  But help is appreciated and we
certainly won't outright reject any fixes you discover.

That said I think Jonathan said that support for the R100 is going to
be removed from Mesa, which would probably mean the end of GPU
acceleration support for that hardware.

> FWIW, I was worried there might be a hardware fault here so I tested on
> solaris and it was working appropriately there.
> 
> Current relevant dmesg:
> 
> radeondrm0: ivec 0x7d5
> machfb0 at pci0 dev 19 function 0 "ATI Rage XL" rev 0x27
> machfb0: ATY,RageXL, 1152x900
> wsdisplay0 at machfb0 mux 1
> wsdisplay0: screen 0 added (std, sun emulation)
> usb0 at ohci0: USB revision 1.0
> uhub0 at usb0 configuration 1 interface 0 "Sun OHCI root hub" rev 1.00/1.00 addr 1
> dt: 451 probes
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> bootpath: /pci@1f,0/ide@d,0/disk@0,0
> root on wd0a (abe1c474.a) swap on wd0b dump on wd0b
> radeondrm0: RV100
> [drm] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
> [drm] *ERROR* radeon: cp isn't working (-22).
> drm:pid0:r100_startup *ERROR* failed initializing CP (-22).
> drm:pid0:r100_init *ERROR* Disabling GPU acceleration
> [drm] *ERROR* Wait for CP idle timeout, shutting down CP.
> Failed to wait GUI idle while programming pipes. Bad things might happen.
> radeondrm0: 1280x1024, 8bpp
> wsdisplay1 at radeondrm0 mux 1
> wsdisplay1: screen 0 added (std, sun emulation)
> Bogus possible_clones: [ENCODER:45:TMDS-45] possible_clones=0x6 (full encoder mask=0x7)
> Bogus possible_clones: [ENCODER:46:TV-46] possible_clones=0x5 (full encoder mask=0x7)
> Bogus possible_clones: [ENCODER:48:DAC-48] possible_clones=0x3 (full encoder mask=0x7)
> 
> 
> 
> 
> -- 
> Ted Bullock 
> 
> 



2021-12-10 Thread Ted Bullock
On 2021-12-10 12:53 a.m., Jonathan Gray wrote:
> On Thu, Dec 09, 2021 at 10:01:30PM -0700, Ted Bullock wrote:
>> Thoughts folks? This is clearly going to impact all big endian + radeon gear.
>>
>> Actually, I bet that the macppc platform has the same problem too.
> 
> sparc64 maps pci little endian, I don't think macppc does
> 
> can you try the following?

Yeah that did resolve the bios warning; X is yellow still though.

FWIW, I was worried there might be a hardware fault here so I tested on solaris
and it was working appropriately there.

Current relevant dmesg:

radeondrm0: ivec 0x7d5
machfb0 at pci0 dev 19 function 0 "ATI Rage XL" rev 0x27
machfb0: ATY,RageXL, 1152x900
wsdisplay0 at machfb0 mux 1
wsdisplay0: screen 0 added (std, sun emulation)
usb0 at ohci0: USB revision 1.0
uhub0 at usb0 configuration 1 interface 0 "Sun OHCI root hub" rev 1.00/1.00 addr 1
dt: 451 probes
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
bootpath: /pci@1f,0/ide@d,0/disk@0,0
root on wd0a (abe1c474.a) swap on wd0b dump on wd0b
radeondrm0: RV100
[drm] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
[drm] *ERROR* radeon: cp isn't working (-22).
drm:pid0:r100_startup *ERROR* failed initializing CP (-22).
drm:pid0:r100_init *ERROR* Disabling GPU acceleration
[drm] *ERROR* Wait for CP idle timeout, shutting down CP.
Failed to wait GUI idle while programming pipes. Bad things might happen.
radeondrm0: 1280x1024, 8bpp
wsdisplay1 at radeondrm0 mux 1
wsdisplay1: screen 0 added (std, sun emulation)
Bogus possible_clones: [ENCODER:45:TMDS-45] possible_clones=0x6 (full encoder mask=0x7)
Bogus possible_clones: [ENCODER:46:TV-46] possible_clones=0x5 (full encoder mask=0x7)
Bogus possible_clones: [ENCODER:48:DAC-48] possible_clones=0x3 (full encoder mask=0x7)




-- 
Ted Bullock 



2021-12-09 Thread Jonathan Gray
On Thu, Dec 09, 2021 at 10:01:30PM -0700, Ted Bullock wrote:
> On 2021-12-09 6:46 p.m., Ted Bullock wrote:
> > On 2021-12-06 4:21 p.m., Ted Bullock wrote:
> > I think that there is a bug triggered by endian code here:
> > 
> >> radeondrm0: RV100
> >> BIOS signature incorrect 0 0
> > 
> > in sys/dev/pci/drm/radeon/radeon_bios.c:840
> > 
> > if (rdev->bios[0] != 0x55 || rdev->bios[1] != 0xaa) {
> >     printk("BIOS signature incorrect %x %x\n", rdev->bios[0], rdev->bios[1]);
> >     goto free_bios;
> > }
> > 
> > I'm pretty sure that on sparc those bytes aren't going to be reporting
> > the same information as on a little endian machine. Or am I crazy and
> > wrong...
> > 
> 
> Indeed, I'm correct about there being an endian bug here.
> 
> I wrote some testing printfs to determine the code path since I'm still
> an uneducated peasant who doesn't understand ddb.  At least part of the
> problem for this card/system, starts with the following code:
> 
> function: radeon_read_bios in sys/dev/pci/drm/radeon/radeon_bios.c:157
> 
> I added a test around the memcpy where the card's bios is copied to a
> buffer rdev->bios and printed the first 8 bytes.
> 
>   printk("radeon bios header: %x %x %x %x %x %x %x %x\n",
>   bios[0],
>   bios[1],
>   bios[2],
>   bios[3],
>   bios[4],
>   bios[5],
>   bios[6],
>   bios[7]);
> 
>   rdev->bios = kmalloc(size, GFP_KERNEL);
>   memcpy(rdev->bios, bios, size);
> 
>   printk("buffered bios header: %x %x %x %x %x %x %x %x\n",
>   rdev->bios[0],
>   rdev->bios[1],
>   rdev->bios[2],
>   rdev->bios[3],
>   rdev->bios[4],
>   rdev->bios[5],
>   rdev->bios[6],
>   rdev->bios[7]);
> 
> On the following boot I see this:
> 
> Rebooting with command: boot
> Boot device: disk  File and args:
> OpenBSD IEEE 1275 Bootblock 2.1
> ..>> OpenBSD BOOT 1.22
> Trying bsd...
> 
> 
> 
> radeondrm0: RV100
> radeon bios header: 55 aa 34 0 0 0 0 0
> buffered bios header: 0 0 0 0 0 34 aa 55
> BIOS signature incorrect 0 0
> [drm] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
> [drm] *ERROR* radeon: cp isn't working (-22).
> drm:pid0:r100_startup *ERROR* failed initializing CP (-22).
> drm:pid0:r100_init *ERROR* Disabling GPU acceleration
> [drm] *ERROR* Wait for CP idle timeout, shutting down CP.
> Failed to wait GUI idle while programming pipes. Bad things might happen.
> radeondrm0: 1280x1024, 8bpp
> wsdisplay1 at radeondrm0 mux 1
> wsdisplay1: screen 0 added (std, sun emulation)
> Bogus possible_clones: [ENCODER:45:TMDS-45] possible_clones=0x6 (full encoder mask=0x7)
> Bogus possible_clones: [ENCODER:46:TV-46] possible_clones=0x5 (full encoder mask=0x7)
> Bogus possible_clones: [ENCODER:48:DAC-48] possible_clones=0x3 (full encoder mask=0x7)
> 
> Thoughts folks? This is clearly going to impact all big endian + radeon gear.
> 
> Actually, I bet that the macppc platform has the same problem too.

sparc64 maps pci little endian, I don't think macppc does

can you try the following?

Index: sys/dev/pci/drm/amd/amdgpu/amdgpu_bios.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/amd/amdgpu/amdgpu_bios.c,v
retrieving revision 1.4
diff -u -p -r1.4 amdgpu_bios.c
--- sys/dev/pci/drm/amd/amdgpu/amdgpu_bios.c	7 Jul 2021 02:38:22 -0000	1.4
+++ sys/dev/pci/drm/amd/amdgpu/amdgpu_bios.c	10 Dec 2021 07:41:39 -0000
@@ -200,7 +200,6 @@ bool amdgpu_read_bios(struct amdgpu_devi
 #else
 bool amdgpu_read_bios(struct amdgpu_device *adev)
 {
-   uint8_t __iomem *bios;
size_t size;
pcireg_t address, mask;
bus_space_handle_t romh;
@@ -218,25 +217,15 @@ bool amdgpu_read_bios(struct amdgpu_devi
size = PCI_ROM_SIZE(mask);
if (size == 0)
return false;
-   rc = bus_space_map(adev->memt, PCI_ROM_ADDR(address), size,
-   BUS_SPACE_MAP_LINEAR, &romh);
+   rc = bus_space_map(adev->memt, PCI_ROM_ADDR(address), size, 0, &romh);
if (rc != 0) {
printf(": can't map PCI ROM (%d)\n", rc);
return false;
}
-   bios = (uint8_t *)bus_space_vaddr(adev->memt, romh);
-   if (!bios) {
-   printf(": bus_space_vaddr failed\n");
-   return false;
-   }
 
adev->bios = kzalloc(size, GFP_KERNEL);
-   if (adev->bios == NULL) {
-   bus_space_unmap(adev->memt, romh, size);
-   return false;
-   }
adev->bios_size = size;
-   memcpy_fromio(adev->bios, bios, size);
+   bus_space_read_region_1(adev->memt, romh, 0, adev->bios, size);
bus_space_unmap(adev->memt, romh, size);
 
if (!check_atom_bios(adev->bios, size)) {
Index: sys/dev/pci/drm/radeon/radeon_bios.c
===================================================================
RCS file: 

2021-12-09 Thread Ted Bullock
On 2021-12-09 6:46 p.m., Ted Bullock wrote:
> On 2021-12-06 4:21 p.m., Ted Bullock wrote:
> I think that there is a bug triggered by endian code here:
> 
>> radeondrm0: RV100
>> BIOS signature incorrect 0 0
> 
> in sys/dev/pci/drm/radeon/radeon_bios.c:840
> 
> if (rdev->bios[0] != 0x55 || rdev->bios[1] != 0xaa) {
>     printk("BIOS signature incorrect %x %x\n", rdev->bios[0], rdev->bios[1]);
>     goto free_bios;
> }
> 
> I'm pretty sure that on sparc those bytes aren't going to be reporting
> the same information as on a little endian machine. Or am I crazy and
> wrong...
> 

Indeed, I'm correct about there being an endian bug here.

I wrote some testing printfs to determine the code path since I'm still
an uneducated peasant who doesn't understand ddb.  At least part of the
problem for this card/system, starts with the following code:

function: radeon_read_bios in sys/dev/pci/drm/radeon/radeon_bios.c:157

I added a test around the memcpy where the card's bios is copied to a
buffer rdev->bios and printed the first 8 bytes.

printk("radeon bios header: %x %x %x %x %x %x %x %x\n",
bios[0],
bios[1],
bios[2],
bios[3],
bios[4],
bios[5],
bios[6],
bios[7]);

rdev->bios = kmalloc(size, GFP_KERNEL);
memcpy(rdev->bios, bios, size);

printk("buffered bios header: %x %x %x %x %x %x %x %x\n",
rdev->bios[0],
rdev->bios[1],
rdev->bios[2],
rdev->bios[3],
rdev->bios[4],
rdev->bios[5],
rdev->bios[6],
rdev->bios[7]);

On the following boot I see this:

Rebooting with command: boot
Boot device: disk  File and args:
OpenBSD IEEE 1275 Bootblock 2.1
..>> OpenBSD BOOT 1.22
Trying bsd...



radeondrm0: RV100
radeon bios header: 55 aa 34 0 0 0 0 0
buffered bios header: 0 0 0 0 0 34 aa 55
BIOS signature incorrect 0 0
[drm] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
[drm] *ERROR* radeon: cp isn't working (-22).
drm:pid0:r100_startup *ERROR* failed initializing CP (-22).
drm:pid0:r100_init *ERROR* Disabling GPU acceleration
[drm] *ERROR* Wait for CP idle timeout, shutting down CP.
Failed to wait GUI idle while programming pipes. Bad things might happen.
radeondrm0: 1280x1024, 8bpp
wsdisplay1 at radeondrm0 mux 1
wsdisplay1: screen 0 added (std, sun emulation)
Bogus possible_clones: [ENCODER:45:TMDS-45] possible_clones=0x6 (full encoder mask=0x7)
Bogus possible_clones: [ENCODER:46:TV-46] possible_clones=0x5 (full encoder mask=0x7)
Bogus possible_clones: [ENCODER:48:DAC-48] possible_clones=0x3 (full encoder mask=0x7)

Thoughts folks? This is clearly going to impact all big endian + radeon gear.

Actually, I bet that the macppc platform has the same problem too.


-- 
Ted Bullock 



2021-12-09 Thread Ted Bullock
On 2021-12-06 4:21 p.m., Ted Bullock wrote:
> Ok, so this time I plugged in a discrete GPU into this ultrasparc
> system, the sun XVR-100 which is a PCI card with vga and dvi ports.  The
> card uses an ati radeon r100 generation video chip.

I think that there is a bug triggered by endian code here:

> radeondrm0: RV100
> BIOS signature incorrect 0 0

in sys/dev/pci/drm/radeon/radeon_bios.c:840

if (rdev->bios[0] != 0x55 || rdev->bios[1] != 0xaa) {
    printk("BIOS signature incorrect %x %x\n", rdev->bios[0], rdev->bios[1]);
    goto free_bios;
}

I'm pretty sure that on sparc those bytes aren't going to be reporting
the same information as on a little endian machine. Or am I crazy and
wrong...

At the moment I don't know how to use the debugger to inspect what's
happening here. So stay tuned I suppose while I learn some stuff. In the
meantime I'll throw in some printfs to see what's actually there (after this
slow machine builds a test kernel, which seems to take a while :P)

> [drm] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
> [drm] *ERROR* radeon: cp isn't working (-22).
> drm:pid0:r100_startup *ERROR* failed initializing CP (-22).
> drm:pid0:r100_init *ERROR* Disabling GPU acceleration
> [drm] *ERROR* Wait for CP idle timeout, shutting down CP.
> Failed to wait GUI idle while programming pipes. Bad things might happen.
> radeondrm0: 1280x1024, 8bpp
> wsdisplay1 at radeondrm0 mux 1: console (std, sun emulation), using wskbd0
> Bogus possible_clones: [ENCODER:45:TMDS-45] possible_clones=0x6 (full encoder mask=0x7)
> Bogus possible_clones: [ENCODER:46:TV-46] possible_clones=0x5 (full encoder mask=0x7)
> Bogus possible_clones: [ENCODER:48:DAC-48] possible_clones=0x3 (full encoder mask=0x7)
> 

^^^ I don't think that any of this information means anything until
talking to the card's bios works.

-- 
Ted Bullock