[Nouveau] reproducible CACHE_ERRORS

Marcin Slusarz Thu, 20 Dec 2012 14:59:45 -0800

Hi

I found a way to reliably reproduce PFIFO CACHE_ERRORs, but I don't understand
why they happen.


How to reproduce them:
1) Run a couple of glxinfo loops in parallel - 3 is usually enough.
   while true; do nvgl glxinfo >/dev/null 2>/dev/null; done
2) Run glxgears.
3) If no error shows up yet, resize the glxgears window or start more glxinfo
   loops.
Note: you need at least 2 CPUs.
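
The steps above can be sketched as a small helper script. This is only an
illustration: start_loops, stop_loops and LOOP_PIDS are hypothetical names
made up for this sketch, not part of nouveau or Mesa.

```shell
#!/bin/sh
# Hypothetical helpers for running a command in several endless
# background loops, as in step 1 of the reproduction recipe.

LOOP_PIDS=""

# start_loops COUNT CMD [ARGS...]: run CMD in COUNT endless background loops.
start_loops() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        ( while true; do "$@" >/dev/null 2>&1; done ) &
        LOOP_PIDS="$LOOP_PIDS $!"
        i=$((i + 1))
    done
}

# stop_loops: kill every loop started by start_loops and reap it.
stop_loops() {
    for pid in $LOOP_PIDS; do
        kill "$pid" 2>/dev/null
    done
    wait 2>/dev/null
    LOOP_PIDS=""
}
```

With these defined, the recipe becomes "start_loops 3 nvgl glxinfo", then
"glxgears" in the foreground, then "stop_loops" once glxgears exits.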

Usually the error looks like this (with slightly improved logging):
nouveau E[   PFIFO][0000:02:00.0] CACHE_ERROR - ch 6 [glxgears[15559]] subc 0 mthd 0x0060 data 0x8000000f c1p0 0x20000010 HASH_FAILED (unknown bits 0x20000000) c1_hash 0x00000436

What I found so far:
1) It's triggered by setting NV11_SUBCHAN_DMA_SEMAPHORE to NvSema
   (0x8000000f) in nv84_fence_emit. The hardware reports that it cannot find
   the RAMHT entry for the NvSema object (NV04_PFIFO_CACHE1_PULL0 ==
   HASH_FAILED; the unknown 30th bit, 0x20000000, is frequently set too).
2) In about 95% of cases the CACHE_ERRORs are triggered on the glxgears channel.
3) The RAMHT entry was definitely created and used many times before the
   error was reported. The next use of NvSema usually does NOT trigger
   another CACHE_ERROR.
4) NV04_PFIFO_CACHE1_HASH holds the same value that was written to RAMHT.
5) I can replace the glxinfo loops with an application that creates and
   destroys PGRAPH objects. PCRYPT, PMPEG and SW objects do NOT provoke this
   bug. (program attached)
6) There are no interrupts between CACHE_ERRORs, so it's not caused by a race
   in cache_error/software method handling.
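
For reference, the c1p0 value from the log can be split into the HASH_FAILED
flag and the remaining bits with plain shell arithmetic. This is a sketch
under one assumption: the mask HASH_FAILED=0x10 is inferred from the log line
itself (0x20000010 with unknown bits 0x20000000), not taken from the register
headers, and decode_pull0 is a hypothetical helper name. Any other PULL0 bit
is simply reported as unknown here.

```shell
#!/bin/sh
# Assumed mask, inferred from the log line: 0x20000010 & ~0x20000000 = 0x10.
HASH_FAILED=$((0x00000010))

# decode_pull0 VALUE: print the HASH_FAILED flag (if set) and all other bits.
decode_pull0() {
    val=$(($1))
    known=$((val & HASH_FAILED))
    unknown=$((val & ~HASH_FAILED))
    if [ "$known" -ne 0 ]; then
        printf 'HASH_FAILED '
    fi
    printf 'unknown=0x%08x\n' "$unknown"
}

# 0x20000000 is 1 << 29: bit 29 counting from 0, i.e. the 30th bit.
decode_pull0 0x20000010   # prints "HASH_FAILED unknown=0x20000000"
```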

Any ideas how to debug it?

Marcin

test_objects.tar.gz
Description: Binary data

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau
