On Tue, Oct 23, 2012 at 8:31 AM, Benjamin Herrenschmidt <[email protected]> wrote:
> On Tue, 2012-10-23 at 21:45 +1100, Benjamin Herrenschmidt wrote:
>> On Tue, 2012-10-23 at 18:54 +1100, Benjamin Herrenschmidt wrote:
>> > On Tue, 2012-10-23 at 18:42 +1100, Benjamin Herrenschmidt wrote:
>> > >
>> > > As you can see, it's not doing much before the failure:
>> >
>> > All right, that debug output is bad, it's missing a bunch of stuff,
>> > due to a bad log level (the printk(KERN_DEBUG) in the atom debug
>> > stuff doesn't work anymore on new kernels btw)
>>
>> More data: I've done a bit of AtomDis under Dave's instructions and
>> improved my tracing, and what it looks like is we run those 3 tables in
>> that order:
>
> And more :-)
>
> .../...
>
>> I don't know (yet) whether anything happens in between that doesn't go
>> via ATOM, in which case that wouldn't be traced. That's the next thing
>> to check (including interrupts though we shouldn't be getting any at
>> this stage afaik).
>
> So I think it's in between. From what I can tell, the error happens
> somewhere inside the call to drm_vblank_pre_modeset() from
> atombios_crtc_dpms().
>
> This is actually a bit nasty, but basically that pre_modeset() calls
> drm_vblank_get() which enables vblanks.
>
> Now, it doesn't look like it's actually taking any interrupt ... Well
> assuming this is indeed using the irq handler in evergreen.c, which
> doesn't appear to be called, but I might have confused my ASICs and
> missed something specific to CEDAR here.
>
> Here's what I've traced so far:
>
> evergreen_irq_set: vblank 0
> evergreen_irq_set: hpd 1
> evergreen_irq_set: hpd 2
> evergreen_irq_set: hpd 3
> evergreen_irq_set: hpd 4
> 0001:01:00.0: EEH freeze detected, fstate=3 pcierr=9 msg: irqset 2
>
> What that output means is that it called evergreen_irq_set() which
> enables vblank0 (and various hpd's but those were already enabled), and
> the freeze is detected at the tracepoint "irqset 2" that I added in
> there.
>
> This point is basically right at the end of evergreen_irq_set(), where I
> do a 500ms delay and check for freeze. A previous trace point right
> before writing to CP_INT_CNTL didn't show any freeze.
>
> Now the interrupt being an MSI, it's a memory store ... I had a vague
> memory of one of you guys mentioning address limitations to 40-bit or so
> in the radeon, though I thought that shouldn't affect MSIs right ?
> Well ...
>
> Our 64-bit MSIs are actually using all 64 address bits. If the
> radeon doesn't do that properly and crops the address bits, the MSIs are
> going to hit wrong, right in the middle of nowhere, possibly some DMA
> space.
>
> So I hacked my platform code to force it to only hand out 32-bit MSI
> addresses and guess what ? ... the problem seems to be gone. Ouch.
>
> That's really nasty. Supporting only a subset of the PCI address space
> for DMA was already fairly nasty to begin with, but not doing the full
> MSI addresses looks like a clear violation of the PCIe spec :-(
>
> I'll do some more tests tomorrow to confirm whether that is the problem
> or not at which point, if it is, we'll need some kind of quirk to
> indicate that it supports only MSI32 and not MSI64 or something along
> those lines. Guys, go shoot your HW engineers please.
>

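To make the suspected failure mode concrete, here is a rough user-space sketch (plain C, not driver or platform code) of what happens when a device only drives the low N bits of an MSI address. The helper names truncate_addr() and addr_fits() are made up for illustration, the example addresses are arbitrary, and the 40-bit width is the suspected limit, not a confirmed one:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the address that actually goes out on the bus
 * if the device only implements the low `bits` address bits.
 */
static uint64_t truncate_addr(uint64_t addr, unsigned bits)
{
    if (bits >= 64)
        return addr;                          /* nothing is cropped */
    return addr & ((UINT64_C(1) << bits) - 1);
}

/* Hypothetical helper: 1 if the address survives the cropping unchanged. */
static int addr_fits(uint64_t addr, unsigned bits)
{
    return truncate_addr(addr, bits) == addr;
}

/* Small self-check with arbitrary example addresses. */
void msi_sketch_self_check(void)
{
    /* A 32-bit MSI address is untouched by a 40-bit limit. */
    assert(addr_fits(UINT64_C(0xFEE00000), 40));

    /* An address using the upper bits is cropped and lands elsewhere. */
    assert(!addr_fits(UINT64_C(0xFFFF000000001000), 40));
    assert(truncate_addr(UINT64_C(0xFFFF000000001000), 40) == UINT64_C(0x1000));
}
```

With a platform that hands out MSI addresses using all 64 bits, the second case is the one that matters: the store carrying the interrupt goes to the cropped address instead of the MSI target, which would be consistent with the freeze going away once only 32-bit MSI addresses are handed out.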
Well, we only support a 40 bit DMA mask, so I suspect MSIs are limited
to 40 bits as well.

Alex
_______________________________________________
xorg-driver-ati mailing list
[email protected]
http://lists.x.org/mailman/listinfo/xorg-driver-ati
