Re: [Ocfs2-devel] Race condition between OCFS2 downconvert thread and ocfs2 cluster lock.

2012-02-21 Thread Sunil Mushran
bast queued and flushed, before the ast was queued. Unlikely with o2dlm. dlmthread always sends ASTs before BASTs. Can you recreate the entire lockres? A full dump may yield more information. Sunil On 02/20/2012 10:12 PM, xiaowei...@oracle.com wrote: I am trying to fix bug 13611997, CT's

Re: [Ocfs2-devel] Race condition between OCFS2 downconvert thread and ocfs2 cluster lock.

2012-02-21 Thread Sunil Mushran
Moreover, what is lockres_clear_pending doing in 1.4? That code is not meant for 1.4. It fixes a problem associated with fsdlm. It was left out of 1.4 for a reason. Meaning this bug was introduced by the patch that introduced this one in 1.4. On 02/20/2012 10:12 PM, xiaowei...@oracle.com wrote:

Re: [Ocfs2-devel] Race condition between OCFS2 downconvert thread and ocfs2 cluster lock.

2012-02-21 Thread Xiaowei.hu
Hi Sunil, I mean it executes in this way: node A calls ocfs2_dlm_lock() and releases the res spin lock; here A doesn't hold spin locks. Then it starts to execute the proxy AST handler and processes the BAST request from node B, then dlmthread flushes the BAST. After this, node A starts to queue its AST in

Re: [Ocfs2-devel] Race condition between OCFS2 downconvert thread and ocfs2 cluster lock.

2012-02-21 Thread Xiaowei.hu
Yes, I noticed that lockres_set_pending and lockres_clear_pending don't exist in the 1.4 code. But the 1.4 code did have the problem: when locking a new lockres, lockres->l_action = OCFS2_AST_ATTACH and l_flags |= OCFS2_LOCK_BUSY are set, and the spin lock is released before the AST is queued. Also there is no

Re: [Ocfs2-devel] Race condition between OCFS2 downconvert thread and ocfs2 cluster lock.

2012-02-21 Thread Sunil Mushran
Both AST and BAST can only be sent by the master. And we ensure the master sends the ASTs before the BASTs. Do you have the full lockres dump? On 02/21/2012 04:36 PM, Xiaowei.hu wrote: Hi Sunil, I mean it executes in this way: node A calls ocfs2_dlm_lock() and releases the res spin lock; here A doesn't

Re: [Ocfs2-devel] Race condition between OCFS2 downconvert thread and ocfs2 cluster lock.

2012-02-21 Thread Xiaowei.hu
Here is the whole lockres and backtrace:
KERNEL: vmlinux
DUMPFILE: vmcore1
CPUS: 16
DATE: Tue Jan 17 20:48:22 2012
UPTIME: 100 days, 21:11:24
LOAD AVERAGE: 7.11, 7.46, 7.85
TASKS: 1210
NODENAME: sgi-not-efped05
RELEASE: 2.6.18-92.el5
VERSION: #1 SMP Tue Apr 29 13:16:15 EDT 2008
MACHINE: x86_64

Re: [Ocfs2-devel] Race condition between OCFS2 downconvert thread and ocfs2 cluster lock.

2012-02-21 Thread Sunil Mushran
New lockres requested at PR. It has not received the AST, but still has blocking set? This is pretty whacked. What patches are on this tree? Srini, have you verified the patches? Sunil On 02/21/2012 04:58 PM, Xiaowei.hu wrote: Here is the whole lockres and backtrace: KERNEL: vmlinux

[Ocfs2-devel] Race condition between OCFS2 downconvert thread and ocfs2 cluster lock.

2012-02-20 Thread xiaowei . hu
I am trying to fix bug 13611997. CT's machine ran into a BUG in the ocfs2dc thread: BUG_ON(lockres->l_action != OCFS2_AST_CONVERT && lockres->l_action != OCFS2_AST_DOWNCONVERT); I analyzed the vmcore: lockres->l_action = OCFS2_AST_ATTACH and l_flags=326 (which means