Re: [Dri-devel] Deadlock with radeon DRI

2003-10-11 Thread Mike A. Harris
On Fri, 10 Oct 2003, Keith Whitwell wrote:

Date: Fri, 10 Oct 2003 21:55:57 +0100
From: Keith Whitwell [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: DRI Devel [EMAIL PROTECTED]
Reply-To: [EMAIL PROTECTED]
Content-Type: text/plain; charset=us-ascii; format=flowed
Subject: Re: [Dri-devel] Deadlock with radeon DRI

John Dennis wrote:
 The locking problem is solved, my original analysis was incorrect. The
 problem was that DRM_CAS was not correctly implemented on IA64. Thus
 this was an IA64 issue only, this is consistent with others who showed
 up in a google search describing the problem, all were on IA64.
 
 I have filed an XFree86 bug report on this. I could not find a DRI
 specific bug reporting mechanism other than the dri-devel list.

This list is good.  Can you put together a patch to correct the behaviour?

http://bugs.xfree86.org/show_bug.cgi?id=778 has John's patch that 
fixes this.

Thanks Keith,
TTYL

-- 
Mike A. Harris

___
Devel mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/devel


Re: [Dri-devel] Deadlock with radeon DRI

2003-10-10 Thread John Dennis
The locking problem is solved, my original analysis was incorrect. The
problem was that DRM_CAS was not correctly implemented on IA64. Thus
this was an IA64 issue only, this is consistent with others who showed
up in a google search describing the problem, all were on IA64.

I have filed an XFree86 bug report on this. I could not find a DRI
specific bug reporting mechanism other than the dri-devel list.

The IA64 implementation of CAS was this:

#define DRM_CAS(lock,old,new,__ret)   \
do {  \
unsigned int __result, __old = (old); \
__asm__ __volatile__( \
mf\n\
mov ar.ccv=%2\n \
;;\n\
cmpxchg4.acq %0=%1,%3,ar.ccv\
: =r (__result), =m (__drm_dummy_lock(lock))  \
: r (__old), r (new)  \
: memory);  \
__ret = (__result) != (__old);\
} while (0)

The problem was with the data types given to the cmpxchg4
instruction. All of the lock types in DRM are int's and on IA64 thats
4 bytes wide. The digit suffix cmpxchg4 signifies this instruction
operates on a 4 byte quantity. One might expect then since this
instruction operates on 4 byte values and in DRM the locks are 4 bytes
everything is fine, but it isn't.

The cmpxchg4 instruction operates this way: 

cmpxchg4 r1=[r3],r2,ar.ccv

4 bytes are read at the address pointed to by r3, that 32 bit value is
then zero extended to 64 bits. The 64 bit value is then compared to
the 64 bit value stored in appliation register CCV. If the two 64 bit
values are equal then the least significant 4 bytes in r2 are written
back to the address pointed to by r3. The original value pointed to by
r3 is stored in r1. The entire operation is atomic.

The mistake in the DRM_CAS implemenation is that the comparison is 64
bits wide, thus the value stored in ar.ccv (%2 in the asm) must be 64
bits wide and for us that means zero extending the 32 bit old
parameter to 64 bits.

Because of the way GCC asm blocks work to tie C variables and data
types to asm values the promotion of old from unsigned int to
unsigned long was not happening. Thus when old was stored into
ar.ccv its most significant 32 bits contained garbage. (Actually
because of the way GCC generates constants it turns out the upper 32
bits was 0x, this was from the OR of DRM_LOCK_HELD which is
defined as 0x8000, but the compiler generates a 64 bit OR
operation using the immediate value 0x8000, which is legal
because the upper 32 bits are undefined on int (32 bit) operations).

The bottom line is that the test would fail when it shouldn't because
the high 32 bits in ar.ccv were not zero.

One might think that because old was assigned to __old in a local
block which was unsigned int the compiler would know enough when using
this value in the asm to have zero extended it. But that's not true,
in asm blocks its critical to define the asm value correctly so the
compiler can translate between the C code variable and what the asm
code is referring to.

The line:

: r (__old), r (new)

says %2 is mapped by r (__old), in other words put __old in a
general 64 bit register. We've told the compiler to put 64 bits of
__old into a register, but __old is a 32 bit value with its high
order 32 bits undefined. We need to tell the compiler to widen the
type when assigning it to a general register, thus the asm template
type definition needs to be modified with a cast to unsigned long.

: r ((unsigned long)__old), r (new)

Only with this will the compiler know to widen the 32 bit __old value
to 64 bits inside the asm code.

Thanks to Jakub Jelinek who helped me understand the nuances of GCC
asm templates and type conversions.

As a minor side note, definitions of bit flags should be tagged as
unsigned. Thus things like:

#define DRM_LOCK_HELD  0x8000
#define DRM_LOCK_CONT  0x4000

should really be:

#define DRM_LOCK_HELD  0x8000U
#define DRM_LOCK_CONT  0x4000U

John


___
Devel mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/devel


Re: [Dri-devel] Deadlock with radeon DRI

2003-10-10 Thread Keith Whitwell
John Dennis wrote:
The locking problem is solved, my original analysis was incorrect. The
problem was that DRM_CAS was not correctly implemented on IA64. Thus
this was an IA64 issue only, this is consistent with others who showed
up in a google search describing the problem, all were on IA64.
I have filed an XFree86 bug report on this. I could not find a DRI
specific bug reporting mechanism other than the dri-devel list.
This list is good.  Can you put together a patch to correct the behaviour?

Keith

___
Devel mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/devel


Re: [Dri-devel] Deadlock with radeon DRI

2003-10-02 Thread Keith Whitwell
John Dennis wrote:
[Note: this is cross posted between dri-devel and [EMAIL PROTECTED] ]

I'm trying to debug a hung X server problem with DRI using the radeon
driver. Sources are XFree86 4.3.0. This happens to be on ia64, but at
the moment I don't see anything architecture specific about the problem.
The symptom of the problem is the following message from the drm
radeon kernel driver:
[drm:radeon_lock_take] *ERROR* x holds heavyweight lock

where x is a context id. I've tracked the sequence of events down to
the following:
DRIFinishScreenInit is called during the radeon driver initialization,
inside DRIFinishScreenInit is the following code snippet:
/* Now that we have created the X server's context, we can grab the
 * hardware lock for the X server.
 */
DRILock(pScreen, 0);
pDRIPriv-grabbedDRILock = TRUE;
Slightly later on RADEONAdjustFrame is called and it does the following:

#ifdef XF86DRI
if (info-CPStarted) DRILock(pScrn-pScreen, 0);
#endif
Its this DRILock which is causing the *ERROR* x holds heavyweight
lock message. The reason is both DRIFinishScreenInit and
RADEONAdjustFrame are executing in the server and using the servers
DRI lock. DRIFinishScreenInit never unlocks, it sets the
grabbedDRILock flag, big deal, no one ever references this flag. When
RADEONAdjustFrame calls DRILock its already locked because
DRIFinishScreenInit locked and never unlocked. The dri kernel driver
on the second lock call then suspends the X server process
(DRM(lock_take) returns zero to DRM(lock) because the context holding
the lock and context requesting the lock are the same, this then
causes DRM(lock) to put the X server on the lock wait queue). Putting
the X server on the wait queue waiting for the lock to be released
then deadlocks the X server because its the process holding the lock
on its context.
Questions:

The whole crux of the problem seems to me the taking and holding of
the lock in DRIFinishScreenInit. Why is this being done? 
It is done because the X server expects to be holding the lock whenever it is 
between the Wakeup  Block handlers.  The odd man out in this case is when the 
server first starts up, it won't have aquired the lock coming in through the 
Wakeup handler.  So, it gets aquired at this point.

The rest of the DRI initialization code needs to be holding the lock, so we 
have to grab it somewhere.

The problem seems to be that RADEONAdjustFrame() is designed to be called from 
cursor handling routines that are executed outside the Wakeup/Block handlers 
(perhaps this came in with SilkenMouse?) but is being called during 
initialization after the point the lock is grabbed.

I haven't deeply investigated this but two solutions spring to mind:
	- Hack:  Move the call to RADEONAdjustFrame() during initialization to before 
the lock is grabbed.
	- Better:  Replace the call to RADEONAdjustFrame() during initialization with 
something like:

if (info-FBDev) {
fbdevHWAdjustFrame(scrnIndex, x, y, flags);
} else {
RADEONDoAdjustFrame(pScrn, x, y, FALSE);
}
which is basically what RADEONAdjustFrame() wraps.

Keith

___
Devel mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/devel


Re: [Dri-devel] Deadlock with radeon DRI

2003-10-02 Thread Keith Whitwell
Keith Whitwell wrote:
John Dennis wrote:

[Note: this is cross posted between dri-devel and [EMAIL PROTECTED] ]

I'm trying to debug a hung X server problem with DRI using the radeon
driver. Sources are XFree86 4.3.0. This happens to be on ia64, but at
the moment I don't see anything architecture specific about the problem.
The symptom of the problem is the following message from the drm
radeon kernel driver:
[drm:radeon_lock_take] *ERROR* x holds heavyweight lock

where x is a context id. I've tracked the sequence of events down to
the following:
DRIFinishScreenInit is called during the radeon driver initialization,
inside DRIFinishScreenInit is the following code snippet:
/* Now that we have created the X server's context, we can grab the
 * hardware lock for the X server.
 */
DRILock(pScreen, 0);
pDRIPriv-grabbedDRILock = TRUE;
Slightly later on RADEONAdjustFrame is called and it does the following:

#ifdef XF86DRI
if (info-CPStarted) DRILock(pScrn-pScreen, 0);
#endif
Its this DRILock which is causing the *ERROR* x holds heavyweight
lock message. The reason is both DRIFinishScreenInit and
RADEONAdjustFrame are executing in the server and using the servers
DRI lock. DRIFinishScreenInit never unlocks, it sets the
grabbedDRILock flag, big deal, no one ever references this flag. When
RADEONAdjustFrame calls DRILock its already locked because
DRIFinishScreenInit locked and never unlocked. The dri kernel driver
on the second lock call then suspends the X server process
(DRM(lock_take) returns zero to DRM(lock) because the context holding
the lock and context requesting the lock are the same, this then
causes DRM(lock) to put the X server on the lock wait queue). Putting
the X server on the wait queue waiting for the lock to be released
then deadlocks the X server because its the process holding the lock
on its context.
Questions:

The whole crux of the problem seems to me the taking and holding of
the lock in DRIFinishScreenInit. Why is this being done? 


It is done because the X server expects to be holding the lock whenever 
it is between the Wakeup  Block handlers.  The odd man out in this case 
is when the server first starts up, it won't have aquired the lock 
coming in through the Wakeup handler.  So, it gets aquired at this point.

The rest of the DRI initialization code needs to be holding the lock, so 
we have to grab it somewhere.

The problem seems to be that RADEONAdjustFrame() is designed to be 
called from cursor handling routines that are executed outside the 
Wakeup/Block handlers (perhaps this came in with SilkenMouse?) but is 
being called during initialization after the point the lock is grabbed.
Actually, looking more closely at RADEONAdjustFrame(), I would guess that the 
test of (info-CPStarted) is designed to avoid precisely this problem, right? 
 So I wonder why that's not working for you...

Keith

	

___
Devel mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/devel


Re: [Dri-devel] Deadlock with radeon DRI

2003-10-02 Thread Ian Romanick
Keith Whitwell wrote:

I haven't deeply investigated this but two solutions spring to mind:
- Hack:  Move the call to RADEONAdjustFrame() during initialization 
to before the lock is grabbed.
- Better:  Replace the call to RADEONAdjustFrame() during 
initialization with something like:

if (info-FBDev) {
fbdevHWAdjustFrame(scrnIndex, x, y, flags);
} else {
RADEONDoAdjustFrame(pScrn, x, y, FALSE);
}
which is basically what RADEONAdjustFrame() wraps.
That seems like the right way to go, but I'd feel better if the body of 
RADEONAdjectFrame was moved to a new function called 
RADEONAdjectFrameLocked.  RADEONAdjectFrame would just lock, call 
RADEONAdjectFrameLocked, and unlock.  That matches what's been done 
elsewhere in the 3D driver, anyway.

___
Devel mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/devel