I have upgraded to ocfs2 1.2.8 and am still getting the same lock problem.
Here are the /var/log/messages entries produced by:
echo R mas PAF1A9 >/proc/fs/ocfs2_dlm/debug
I'm not sure how to decode this. Is this lock still held?
N1 kernel: (13416,1):dlm_dump_one_lock_resource:259 struct dlm_ctxt: mas, node=0, key=4125434387
N1 kernel: (13416,1):dlm_print_one_lock_resource:294 lockres: PAF1A9, owner=0, state=0
N1 kernel: (13416,1):__dlm_print_one_lock_resource:309 lockres: PAF1A9, owner=0, state=0
N1 kernel: (13416,1):__dlm_print_one_lock_resource:311 last used: 20693697, on purge list: no
N1 kernel: (13416,1):dlm_print_lockres_refmap:277 refmap nodes: [ ], inflight=0
N1 kernel: (13416,1):__dlm_print_one_lock_resource:313 granted queue:
N1 kernel: (13416,1):__dlm_print_one_lock_resource:325 type=5, conv=-1, node=0, cookie=0:35003811, ast=(empty=y,pend=n), bast=(empty=y,pend=n)
N1 kernel: (13416,1):__dlm_print_one_lock_resource:328 converting queue:
N1 kernel: (13416,1):__dlm_print_one_lock_resource:343 blocked queue:
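A dump like the one above can be read mechanically by grouping the lock entries under the queue header they follow. The following is only a minimal sketch: the function name parse_lockres_dump and the ASSUMED_MODES table are hypothetical, and the mapping of type=5 to an exclusive lock is an assumption about the dlm's mode numbering, not something the dump itself states. Whether an entry on the granted queue means an application still holds the lock is likewise an open question here.

```python
import re

# Assumption: lock "type" follows the LKM_* mode numbering of the ocfs2 dlm
# (0 = NL, 3 = PR, 5 = EX). Check the dlm headers of your kernel first.
ASSUMED_MODES = {0: "NL", 3: "PR", 5: "EX"}

QUEUE_RE = re.compile(r"(granted|converting|blocked) queue:")
LOCK_RE = re.compile(r"type=(-?\d+), conv=(-?\d+), node=(\d+), cookie=([^,\s]+)")


def parse_lockres_dump(text):
    """Group the lock entries of one dumped lock resource by queue."""
    flat = " ".join(text.split())  # undo line wrapping from syslog or mail
    queues = {"granted": [], "converting": [], "blocked": []}
    # re.split with one capture group yields [preamble, name, body, name, body, ...]
    parts = QUEUE_RE.split(flat)
    for name, body in zip(parts[1::2], parts[2::2]):
        for m in LOCK_RE.finditer(body):
            mode = int(m.group(1))
            queues[name].append({
                "type": mode,
                "mode": ASSUMED_MODES.get(mode, "unknown"),
                "conv": int(m.group(2)),
                "node": int(m.group(3)),
                "cookie": m.group(4),
            })
    return queues


SAMPLE = """
N1 kernel: (13416,1):__dlm_print_one_lock_resource:313 granted queue:
N1 kernel: (13416,1):__dlm_print_one_lock_resource:325 type=5,
conv=-1, node=0, cookie=0:35003811, ast=(empty=y,pend=n),
bast=(empty=y,pend=n)
N1 kernel: (13416,1):__dlm_print_one_lock_resource:328 converting
queue:
N1 kernel: (13416,1):__dlm_print_one_lock_resource:343 blocked queue:
"""

result = parse_lockres_dump(SAMPLE)
for queue_name, entries in result.items():
    print(queue_name, entries)
```

For this dump the sketch finds one entry on the granted queue (node 0) and nothing on the converting or blocked queues.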
Thank you,
charlie
-----Original Message-----
From: Sunil Mushran [mailto:[EMAIL PROTECTED]
Sent: Friday, February 01, 2008 2:29 PM
To: Charlie Sharkey
Cc: [email protected]
Subject: Re: [Ocfs2-users] Dlm question
There are 3 issues. I'll address them in the reverse order.
debugfs, not to be confused with debugfs.ocfs2, is a kernel component.
It used to be shipped with the ocfs2 kernel module package because
RHEL4/SLES9 did not bundle it.
RHEL5/SLES10 build and ship it as part of the kernel (not as a module), and
hence the scanlocks check fails. The solution is to comment out the section
that checks whether it is loaded. (The section is marked with the comment
"#is debugfs loaded?")
The second issue is the oops. File a bugzilla with NOVELL.
We will handle this via them as we need to see what patches your
kernel/ocfs2 has.
The first issue indicates that the lock is busy (-16 is EBUSY),
meaning there are holders. As the locks are files, you can use fuser to
see which pid is using it. If you want to see the state of the lock, you
will have to dump it via the dlm proc interface.
# echo R domain lock >/proc/fs/ocfs2_dlm/debug
Here domain will be the directory in /dlm and lock the file in it.
The state will be dumped in /var/log/messages.
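If the dump has to be requested for many locks, the echo command above could be wrapped in a small script. This is only a sketch: the helper names build_dump_request and request_lock_dump are hypothetical, and it assumes the proc file accepts exactly the text that echo would write ("R", the domain, the lock, and a newline).

```python
DLM_DEBUG = "/proc/fs/ocfs2_dlm/debug"  # path taken from the command above


def build_dump_request(domain, lock):
    """Return the text that `echo R domain lock` would write."""
    for name in (domain, lock):
        # A name with whitespace would change how the request is split.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError("domain and lock must be non-empty, without spaces")
    return "R {} {}\n".format(domain, lock)


def request_lock_dump(domain, lock, debug_path=DLM_DEBUG):
    """Ask the dlm to dump one lock resource; output goes to /var/log/messages."""
    with open(debug_path, "w") as fh:
        fh.write(build_dump_request(domain, lock))


# Example (needs root and a loaded ocfs2 dlm, so it is left commented out):
# request_lock_dump("mas", "P00000")
print(repr(build_dump_request("mas", "P00000")))
```

The request is built separately from the write so the string can be checked without touching /proc.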
No, scanlocks cannot dump dlmfs locks.
Sunil
Charlie Sharkey wrote:
Hi,
I'm having some dlm issues on a system. It looks like the scenario
went something like this:
19:45:29 --> 19:47:31  16 dlm locks are released using o2dlm_unlock().
                       ocfs2 logs an error into /var/log/messages, but
                       returns ok to the application
19:49:15               a dlm lock (o2dlm_lock()) is put on P00000 -- ok
19:49:37               lock on P00000 is released -- ok
19:49:40               a lock is attempted on P00000 and the lock fails.
                       Returned error is "Trylock failed"
Here is the data from /var/log/messages:
Jan 31 19:45:29 N1 kernel: (25033,1):dlmfs_unlink:512 ERROR: unlink P50005, error -16 from destroy
Jan 31 19:45:43 N1 kernel: (25038,1):dlmfs_unlink:512 ERROR: unlink P20010, error -16 from destroy
Jan 31 19:45:44 N1 kernel: (25030,1):dlmfs_unlink:512 ERROR: unlink P20002, error -16 from destroy
Jan 31 19:45:59 N1 kernel: (25034,3):dlmfs_unlink:512 ERROR: unlink P60006, error -16 from destroy
Jan 31 19:46:07 N1 kernel: (25043,0):dlmfs_unlink:512 ERROR: unlink P70015, error -16 from destroy
Jan 31 19:46:08 N1 kernel: (25035,0):dlmfs_unlink:512 ERROR: unlink P70007, error -16 from destroy
Jan 31 19:46:10 N1 kernel: (25041,1):dlmfs_unlink:512 ERROR: unlink P50013, error -16 from destroy
Jan 31 19:46:25 N1 kernel: (25040,1):dlmfs_unlink:512 ERROR: unlink P40012, error -16 from destroy
Jan 31 19:46:30 N1 kernel: (25042,1):dlmfs_unlink:512 ERROR: unlink P60014, error -16 from destroy
Jan 31 19:46:30 N1 kernel: (25032,1):dlmfs_unlink:512 ERROR: unlink P40004, error -16 from destroy
Jan 31 19:47:07 N1 kernel: (25028,1):dlmfs_unlink:512 ERROR: unlink P00000, error -16 from destroy
Jan 31 19:47:08 N1 kernel: (25036,1):dlmfs_unlink:512 ERROR: unlink P00008, error -16 from destroy
Jan 31 19:47:09 N1 kernel: (25029,1):dlmfs_unlink:512 ERROR: unlink P10001, error -16 from destroy
Jan 31 19:47:19 N1 kernel: (25037,0):dlmfs_unlink:512 ERROR: unlink P10009, error -16 from destroy
Jan 31 19:47:30 N1 kernel: (25039,1):dlmfs_unlink:512 ERROR: unlink P30011, error -16 from destroy
Jan 31 19:47:31 N1 kernel: (25031,1):dlmfs_unlink:512 ERROR: unlink P30003, error -16 from destroy
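To see at a glance which locks failed to unlink and what the numeric error means, messages like the ones above could be parsed with a short script. This is a rough sketch, not part of any ocfs2 tool: parse_unlink_errors is a hypothetical name, the regular expression is fitted only to the lines shown here, and the first number in the parentheses is assumed to be the pid.

```python
import errno
import re

UNLINK_RE = re.compile(
    r"(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+kernel:\s+"
    r"\((\d+),(\d+)\):dlmfs_unlink:\d+\s+ERROR:\s+unlink\s+(\S+),\s+"
    r"error\s+(-?\d+)\s+from\s+destroy"
)


def parse_unlink_errors(text):
    """Extract time, pid, lock name and symbolic errno from dlmfs_unlink errors."""
    flat = " ".join(text.split())  # entries may be wrapped or run together
    found = []
    for m in UNLINK_RE.finditer(flat):
        stamp, _host, pid, _second, lock, err = m.groups()
        code = abs(int(err))  # the kernel reports errors as negative errno values
        found.append({
            "time": stamp,
            "pid": int(pid),  # assumption: first number in "(25033,1)" is the pid
            "lock": lock,
            "errno": errno.errorcode.get(code, str(code)),
        })
    return found


SAMPLE = """
Jan 31 19:45:29 N1 kernel: (25033,1):dlmfs_unlink:512 ERROR: unlink
P50005, error -16 from destroy Jan 31 19:45:43 N1 kernel:
(25038,1):dlmfs_unlink:512 ERROR: unlink P20010, error -16 from
destroy
"""

errors = parse_unlink_errors(SAMPLE)
for entry in errors:
    print(entry["time"], entry["lock"], entry["errno"])
```

The symbolic name comes from Python's errno table, which maps 16 to EBUSY on Linux.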
Here is the data from the application dlm log file:
01/31/2008 19:42:50 C000: Dlm Lock fd/id 150/P00000, returning: ok
01/31/2008 19:42:50 C001: Dlm Lock fd/id 152/P10001, returning: ok
01/31/2008 19:42:50 C002: Dlm Lock fd/id 154/P20002, returning: ok
01/31/2008 19:42:51 C003: Dlm Lock fd/id 156/P30003, returning: ok
01/31/2008 19:42:51 C004: Dlm Lock fd/id 158/P40004, returning: ok
01/31/2008 19:42:52 C005: Dlm Lock fd/id 160/P50005, returning: ok
01/31/2008 19:42:52 C006: Dlm Lock fd/id 162/P60006, returning: ok
01/31/2008 19:42:52 C007: Dlm Lock fd/id 164/P70007, returning: ok
01/31/2008 19:42:53 C008: Dlm Lock fd/id 166/P00008, returning: ok
01/31/2008 19:42:53 C009: Dlm Lock fd/id 168/P10009, returning: ok
01/31/2008 19:42:53 C00A: Dlm Lock fd/id 170/P20010, returning: ok
01/31/2008 19:42:54 C00B: Dlm Lock fd/id 172/P30011, returning: ok
01/31/2008 19:42:54 C00C: Dlm Lock fd/id 174/P40012, returning: ok
01/31/2008 19:42:54 C00D: Dlm Lock fd/id 178/P50013, returning: ok
01/31/2008 19:42:55 C00E: Dlm Lock fd/id 180/P60014, returning: ok
01/31/2008 19:42:58 C00F: Dlm Lock fd/id 182/P70015, returning: ok
01/31/2008 19:45:29 C005: Dlm UnLock. fd/id 160/P50005, returning ok
01/31/2008 19:45:43 C00A: Dlm UnLock. fd/id 170/P20010, returning ok
01/31/2008 19:45:44 C002: Dlm UnLock. fd/id 154/P20002, returning ok
01/31/2008 19:45:59 C006: Dlm UnLock. fd/id 162/P60006, returning ok
01/31/2008 19:46:07 C00F: Dlm UnLock. fd/id 182/P70015, returning ok
01/31/2008 19:46:08 C007: Dlm UnLock. fd/id 164/P70007, returning ok
01/31/2008 19:46:10 C00D: Dlm UnLock. fd/id 178/P50013, returning ok
01/31/2008 19:46:25 C00C: Dlm UnLock. fd/id 174/P40012, returning ok
01/31/2008 19:46:30 C00E: Dlm UnLock. fd/id 180/P60014, returning ok
01/31/2008 19:46:30 C004: Dlm UnLock. fd/id 158/P40004, returning ok
01/31/2008 19:47:07 C000: Dlm UnLock. fd/id 150/P00000, returning ok
01/31/2008 19:47:08 C008: Dlm UnLock. fd/id 166/P00008, returning ok
01/31/2008 19:47:09 C001: Dlm UnLock. fd/id 152/P10001, returning ok
01/31/2008 19:47:19 C009: Dlm UnLock. fd/id 168/P10009, returning ok
01/31/2008 19:47:30 C00B: Dlm UnLock. fd/id 172/P30011, returning ok
01/31/2008 19:47:31 C003: Dlm UnLock. fd/id 156/P30003, returning ok
01/31/2008 19:49:15 C000: Dlm Lock fd/id 150/P00000, returning: ok
01/31/2008 19:49:37 C000: Dlm UnLock. fd/id 150/P00000, returning ok
01/31/2008 19:49:40 C000: Dlm Lock fd/id 150/P00000, returning: Trylock failed
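The application log above can be checked the same way, to find every failed lock attempt and how long after the last successful unlock of the same id it happened. This is a minimal sketch under stated assumptions: the function names parse_app_log and failures_after_unlock are hypothetical, the line layout is inferred only from the entries shown, and the date is read as month/day/year.

```python
import re
from datetime import datetime

APP_RE = re.compile(
    r"(\d\d/\d\d/\d{4} \d\d:\d\d:\d\d) (\w+): Dlm (Lock|UnLock)\.? "
    r"fd/id (\d+)/(\w+), returning:? (.+?)(?= \d\d/\d\d/\d{4} |$)"
)


def parse_app_log(text):
    """Turn the application dlm log into a list of event dictionaries."""
    flat = " ".join(text.split())  # rejoin wrapped lines
    events = []
    for m in APP_RE.finditer(flat):
        stamp, client, op, fd, lock_id, outcome = m.groups()
        events.append({
            "time": datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S"),
            "client": client,
            "op": op.lower(),
            "fd": int(fd),
            "id": lock_id,
            "result": outcome.strip(),
        })
    return events


def failures_after_unlock(events):
    """List (id, result, seconds since last ok unlock) for each failed lock."""
    last_unlock = {}
    failures = []
    for ev in events:
        ok = ev["result"] == "ok"
        if ev["op"] == "unlock" and ok:
            last_unlock[ev["id"]] = ev["time"]
        elif ev["op"] == "lock" and not ok:
            prev = last_unlock.get(ev["id"])
            gap = (ev["time"] - prev).total_seconds() if prev else None
            failures.append((ev["id"], ev["result"], gap))
    return failures


SAMPLE = """
01/31/2008 19:49:15 C000: Dlm Lock fd/id 150/P00000, returning: ok
01/31/2008 19:49:37 C000: Dlm UnLock. fd/id 150/P00000, returning ok
01/31/2008 19:49:40 C000: Dlm Lock fd/id 150/P00000, returning:
Trylock failed
"""

failed = failures_after_unlock(parse_app_log(SAMPLE))
print(failed)
```

On the three sample entries this reports the P00000 trylock failure three seconds after the preceding unlock.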
I also had a problem with this system the day before this. Here is the
data from that:
SYSTEM MAP: /boot/System.map-2.6.16.46-0.14.PTF.284042.0-smp
DEBUG KERNEL: ../vmlinux.debug (2.6.16.46-0.14.PTF.284042.0-smp)
DUMPFILE: vmcore
CPUS: 4
DATE: Wed Jan 30 17:44:49 2008
UPTIME: 9 days, 00:55:51
LOAD AVERAGE: 1.10, 1.06, 1.01
TASKS: 341
NODENAME: N1
RELEASE: 2.6.16.46-0.14.PTF.284042.0-smp
VERSION: #1 SMP Thu May 17 14:00:09 UTC 2007
MACHINE: i686 (2327 Mhz)
MEMORY: 2 GB
PANIC: "kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2780!"
PID: 31585
COMMAND: "masx"
TASK: dab912d0 [THREAD_INFO: d5840000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 31585 TASK: dab912d0 CPU: 0 COMMAND: "masx"
#0 [d5841d78] crash_kexec at c013bb1a
#1 [d5841dbc] die at c01055fe
#2 [d5841dec] do_invalid_op at c0105ce2
#3 [d5841e9c] error_code (via invalid_op) at c0104e4d
EAX: 00000051 EBX: ca668280 ECX: 00000000 EDX: 00000296 EBP: da0c1c00
DS: 007b ESI: da0c1c00 ES: 007b EDI: ca668280
CS: 0060 EIP: fb835e95 ERR: ffffffff EFLAGS: 00010296
#4 [d5841ed0] dlm_empty_lockres at fb835e95
#5 [d5841ee0] dlm_unregister_domain at fb827305
#6 [d5841f18] dlmfs_clear_inode at fb6c2eae
#7 [d5841f24] clear_inode at c0175dfe
#8 [d5841f30] generic_delete_inode at c0175eee
#9 [d5841f3c] iput at c0175838
#10 [d5841f48] dput at c01744e0
#11 [d5841f54] do_rmdir at c016e63d
#12 [d5841fb8] sysenter_entry at c0103bd4
EAX: 00000028 EBX: 08299988 ECX: 00000000 EDX: 08273be4
DS: 007b ESI: 00000000 ES: 007b EDI: 082ebf28
SS: 007b ESP: bf999f7c EBP: bf999fa8
CS: 0073 EIP: ffffe410 ERR: 00000028 EFLAGS: 00000246
This is a SUSE SLES10 SP1 system, with a SUSE nfs patch.
Ocfs2 tools version 1.2.3-0.7
Ocfs2 version 1.2.5-SLES-r2997
I was hoping you would have some ideas on this.
Also, another question. I have been trying to run one of the debugging
scripts, for example, scanlocks. I keep getting the message 'Module
debugfs not loaded'. I don't see any debugfs.ko on the system. Isn't
it a part of the ocfs2 tools?
Thank you,
Charlie
_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users