Re: [Dri-devel] Deadlock with radeon DRI
On Fri, 10 Oct 2003, Keith Whitwell wrote: Date: Fri, 10 Oct 2003 21:55:57 +0100 From: Keith Whitwell [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: DRI Devel [EMAIL PROTECTED] Reply-To: [EMAIL PROTECTED] Content-Type: text/plain; charset=us-ascii; format=flowed Subject: Re: [Dri-devel] Deadlock with radeon DRI John Dennis wrote: The locking problem is solved, my original analysis was incorrect. The problem was that DRM_CAS was not correctly implemented on IA64. Thus this was an IA64 issue only, this is consistent with others who showed up in a google search describing the problem, all were on IA64. I have filed an XFree86 bug report on this. I could not find a DRI specific bug reporting mechanism other than the dri-devel list. This list is good. Can you put together a patch to correct the behaviour? http://bugs.xfree86.org/show_bug.cgi?id=778 has John's patch that fixes this. Thanks Keith, TTYL -- Mike A. Harris ___ Devel mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/devel
Re: [Dri-devel] Deadlock with radeon DRI
The locking problem is solved, my original analysis was incorrect. The problem was that DRM_CAS was not correctly implemented on IA64. Thus this was an IA64 issue only, this is consistent with others who showed up in a google search describing the problem, all were on IA64. I have filed an XFree86 bug report on this. I could not find a DRI specific bug reporting mechanism other than the dri-devel list. The IA64 implementation of CAS was this: #define DRM_CAS(lock,old,new,__ret) \ do { \ unsigned int __result, __old = (old); \ __asm__ __volatile__( \ mf\n\ mov ar.ccv=%2\n \ ;;\n\ cmpxchg4.acq %0=%1,%3,ar.ccv\ : =r (__result), =m (__drm_dummy_lock(lock)) \ : r (__old), r (new) \ : memory); \ __ret = (__result) != (__old);\ } while (0) The problem was with the data types given to the cmpxchg4 instruction. All of the lock types in DRM are int's and on IA64 thats 4 bytes wide. The digit suffix cmpxchg4 signifies this instruction operates on a 4 byte quantity. One might expect then since this instruction operates on 4 byte values and in DRM the locks are 4 bytes everything is fine, but it isn't. The cmpxchg4 instruction operates this way: cmpxchg4 r1=[r3],r2,ar.ccv 4 bytes are read at the address pointed to by r3, that 32 bit value is then zero extended to 64 bits. The 64 bit value is then compared to the 64 bit value stored in appliation register CCV. If the two 64 bit values are equal then the least significant 4 bytes in r2 are written back to the address pointed to by r3. The original value pointed to by r3 is stored in r1. The entire operation is atomic. The mistake in the DRM_CAS implemenation is that the comparison is 64 bits wide, thus the value stored in ar.ccv (%2 in the asm) must be 64 bits wide and for us that means zero extending the 32 bit old parameter to 64 bits. Because of the way GCC asm blocks work to tie C variables and data types to asm values the promotion of old from unsigned int to unsigned long was not happening. Thus when old was stored into ar.ccv its most significant 32 bits contained garbage. (Actually because of the way GCC generates constants it turns out the upper 32 bits was 0x, this was from the OR of DRM_LOCK_HELD which is defined as 0x8000, but the compiler generates a 64 bit OR operation using the immediate value 0x8000, which is legal because the upper 32 bits are undefined on int (32 bit) operations). The bottom line is that the test would fail when it shouldn't because the high 32 bits in ar.ccv were not zero. One might think that because old was assigned to __old in a local block which was unsigned int the compiler would know enough when using this value in the asm to have zero extended it. But that's not true, in asm blocks its critical to define the asm value correctly so the compiler can translate between the C code variable and what the asm code is referring to. The line: : r (__old), r (new) says %2 is mapped by r (__old), in other words put __old in a general 64 bit register. We've told the compiler to put 64 bits of __old into a register, but __old is a 32 bit value with its high order 32 bits undefined. We need to tell the compiler to widen the type when assigning it to a general register, thus the asm template type definition needs to be modified with a cast to unsigned long. : r ((unsigned long)__old), r (new) Only with this will the compiler know to widen the 32 bit __old value to 64 bits inside the asm code. Thanks to Jakub Jelinek who helped me understand the nuances of GCC asm templates and type conversions. As a minor side note, definitions of bit flags should be tagged as unsigned. Thus things like: #define DRM_LOCK_HELD 0x8000 #define DRM_LOCK_CONT 0x4000 should really be: #define DRM_LOCK_HELD 0x8000U #define DRM_LOCK_CONT 0x4000U John ___ Devel mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/devel
Re: [Dri-devel] Deadlock with radeon DRI
John Dennis wrote: The locking problem is solved, my original analysis was incorrect. The problem was that DRM_CAS was not correctly implemented on IA64. Thus this was an IA64 issue only, this is consistent with others who showed up in a google search describing the problem, all were on IA64. I have filed an XFree86 bug report on this. I could not find a DRI specific bug reporting mechanism other than the dri-devel list. This list is good. Can you put together a patch to correct the behaviour? Keith ___ Devel mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/devel
Re: [Dri-devel] Deadlock with radeon DRI
John Dennis wrote: [Note: this is cross posted between dri-devel and [EMAIL PROTECTED] ] I'm trying to debug a hung X server problem with DRI using the radeon driver. Sources are XFree86 4.3.0. This happens to be on ia64, but at the moment I don't see anything architecture specific about the problem. The symptom of the problem is the following message from the drm radeon kernel driver: [drm:radeon_lock_take] *ERROR* x holds heavyweight lock where x is a context id. I've tracked the sequence of events down to the following: DRIFinishScreenInit is called during the radeon driver initialization, inside DRIFinishScreenInit is the following code snippet: /* Now that we have created the X server's context, we can grab the * hardware lock for the X server. */ DRILock(pScreen, 0); pDRIPriv-grabbedDRILock = TRUE; Slightly later on RADEONAdjustFrame is called and it does the following: #ifdef XF86DRI if (info-CPStarted) DRILock(pScrn-pScreen, 0); #endif Its this DRILock which is causing the *ERROR* x holds heavyweight lock message. The reason is both DRIFinishScreenInit and RADEONAdjustFrame are executing in the server and using the servers DRI lock. DRIFinishScreenInit never unlocks, it sets the grabbedDRILock flag, big deal, no one ever references this flag. When RADEONAdjustFrame calls DRILock its already locked because DRIFinishScreenInit locked and never unlocked. The dri kernel driver on the second lock call then suspends the X server process (DRM(lock_take) returns zero to DRM(lock) because the context holding the lock and context requesting the lock are the same, this then causes DRM(lock) to put the X server on the lock wait queue). Putting the X server on the wait queue waiting for the lock to be released then deadlocks the X server because its the process holding the lock on its context. Questions: The whole crux of the problem seems to me the taking and holding of the lock in DRIFinishScreenInit. Why is this being done? It is done because the X server expects to be holding the lock whenever it is between the Wakeup Block handlers. The odd man out in this case is when the server first starts up, it won't have aquired the lock coming in through the Wakeup handler. So, it gets aquired at this point. The rest of the DRI initialization code needs to be holding the lock, so we have to grab it somewhere. The problem seems to be that RADEONAdjustFrame() is designed to be called from cursor handling routines that are executed outside the Wakeup/Block handlers (perhaps this came in with SilkenMouse?) but is being called during initialization after the point the lock is grabbed. I haven't deeply investigated this but two solutions spring to mind: - Hack: Move the call to RADEONAdjustFrame() during initialization to before the lock is grabbed. - Better: Replace the call to RADEONAdjustFrame() during initialization with something like: if (info-FBDev) { fbdevHWAdjustFrame(scrnIndex, x, y, flags); } else { RADEONDoAdjustFrame(pScrn, x, y, FALSE); } which is basically what RADEONAdjustFrame() wraps. Keith ___ Devel mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/devel
Re: [Dri-devel] Deadlock with radeon DRI
Keith Whitwell wrote: John Dennis wrote: [Note: this is cross posted between dri-devel and [EMAIL PROTECTED] ] I'm trying to debug a hung X server problem with DRI using the radeon driver. Sources are XFree86 4.3.0. This happens to be on ia64, but at the moment I don't see anything architecture specific about the problem. The symptom of the problem is the following message from the drm radeon kernel driver: [drm:radeon_lock_take] *ERROR* x holds heavyweight lock where x is a context id. I've tracked the sequence of events down to the following: DRIFinishScreenInit is called during the radeon driver initialization, inside DRIFinishScreenInit is the following code snippet: /* Now that we have created the X server's context, we can grab the * hardware lock for the X server. */ DRILock(pScreen, 0); pDRIPriv-grabbedDRILock = TRUE; Slightly later on RADEONAdjustFrame is called and it does the following: #ifdef XF86DRI if (info-CPStarted) DRILock(pScrn-pScreen, 0); #endif Its this DRILock which is causing the *ERROR* x holds heavyweight lock message. The reason is both DRIFinishScreenInit and RADEONAdjustFrame are executing in the server and using the servers DRI lock. DRIFinishScreenInit never unlocks, it sets the grabbedDRILock flag, big deal, no one ever references this flag. When RADEONAdjustFrame calls DRILock its already locked because DRIFinishScreenInit locked and never unlocked. The dri kernel driver on the second lock call then suspends the X server process (DRM(lock_take) returns zero to DRM(lock) because the context holding the lock and context requesting the lock are the same, this then causes DRM(lock) to put the X server on the lock wait queue). Putting the X server on the wait queue waiting for the lock to be released then deadlocks the X server because its the process holding the lock on its context. Questions: The whole crux of the problem seems to me the taking and holding of the lock in DRIFinishScreenInit. Why is this being done? It is done because the X server expects to be holding the lock whenever it is between the Wakeup Block handlers. The odd man out in this case is when the server first starts up, it won't have aquired the lock coming in through the Wakeup handler. So, it gets aquired at this point. The rest of the DRI initialization code needs to be holding the lock, so we have to grab it somewhere. The problem seems to be that RADEONAdjustFrame() is designed to be called from cursor handling routines that are executed outside the Wakeup/Block handlers (perhaps this came in with SilkenMouse?) but is being called during initialization after the point the lock is grabbed. Actually, looking more closely at RADEONAdjustFrame(), I would guess that the test of (info-CPStarted) is designed to avoid precisely this problem, right? So I wonder why that's not working for you... Keith ___ Devel mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/devel
Re: [Dri-devel] Deadlock with radeon DRI
Keith Whitwell wrote: I haven't deeply investigated this but two solutions spring to mind: - Hack: Move the call to RADEONAdjustFrame() during initialization to before the lock is grabbed. - Better: Replace the call to RADEONAdjustFrame() during initialization with something like: if (info-FBDev) { fbdevHWAdjustFrame(scrnIndex, x, y, flags); } else { RADEONDoAdjustFrame(pScrn, x, y, FALSE); } which is basically what RADEONAdjustFrame() wraps. That seems like the right way to go, but I'd feel better if the body of RADEONAdjectFrame was moved to a new function called RADEONAdjectFrameLocked. RADEONAdjectFrame would just lock, call RADEONAdjectFrameLocked, and unlock. That matches what's been done elsewhere in the 3D driver, anyway. ___ Devel mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/devel