Yes. It shows node 0 not only as the owner (or master) but also
as holding an EX lock.

struct dlm_ctxt: mas, node=0, key=4125434387
lockres: PAF1A9, owner=0, state=0
last used: 20693697, on purge list: no
refmap nodes: [ ], inflight=0
granted queue:
type=5, conv=-1, node=0, cookie=0:35003811, ast=(empty=y,pend=n), bast=(empty=y,pend=n)
converting queue:
blocked queue:
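For reference, the type field on the granted queue is the o2dlm lock level, which follows the standard DLM mode numbering (0=NL, 1=CR, 2=CW, 3=PR, 4=PW, 5=EX). A quick sketch of decoding that line (the variable names here are illustrative only):

```shell
# Decode the "type" field from a granted-queue line in the dlm dump.
# o2dlm lock levels: 0=NL 1=CR 2=CW 3=PR 4=PW 5=EX.
line='type=5, conv=-1, node=0, cookie=0:35003811'
level=$(echo "$line" | sed 's/^type=\([0-9]*\),.*/\1/')
modes="NL CR CW PR PW EX"
# Pick the (level+1)-th name from the mode table.
name=$(echo $modes | cut -d' ' -f$((level + 1)))
echo "type=$level => $name"
```

For the dump above this prints the EX reading given at the top of this mail.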


Charlie Sharkey wrote:
I have upgraded to ocfs2 1.2.8 and am getting the same lock problem.
Here are the /var/log/messages entries from:  echo R mas PAF1A9 > /proc/fs/ocfs2_dlm/debug
I'm not sure how to decode this. Is this lock still held?


N1 kernel: (13416,1):dlm_dump_one_lock_resource:259 struct dlm_ctxt:
mas, node=0, key=4125434387

N1 kernel: (13416,1):dlm_print_one_lock_resource:294 lockres: PAF1A9,
owner=0, state=0

N1 kernel: (13416,1):__dlm_print_one_lock_resource:309 lockres: PAF1A9,
owner=0, state=0

N1 kernel: (13416,1):__dlm_print_one_lock_resource:311   last used:
20693697, on purge list: no

N1 kernel: (13416,1):dlm_print_lockres_refmap:277   refmap nodes: [ ],
inflight=0

N1 kernel: (13416,1):__dlm_print_one_lock_resource:313   granted queue:

N1 kernel: (13416,1):__dlm_print_one_lock_resource:325     type=5,
conv=-1, node=0, cookie=0:35003811, ast=(empty=y,pend=n),
bast=(empty=y,pend=n)
N1 kernel: (13416,1):__dlm_print_one_lock_resource:328   converting
queue:

N1 kernel: (13416,1):__dlm_print_one_lock_resource:343   blocked queue:

Thank you,

charlie


-----Original Message-----
From: Sunil Mushran [mailto:[EMAIL PROTECTED]]
Sent: Friday, February 01, 2008 2:29 PM
To: Charlie Sharkey
Cc: [email protected]
Subject: Re: [Ocfs2-users] Dlm question

There are 3 issues. I'll address them in the reverse order.

debugfs, not to be confused with debugfs.ocfs2, is a kernel component.
It used to be shipped with the ocfs2 kernel module package because
RHEL4/SLES9 did not bundle it.

RHEL5/SLES10 build and ship it as part of the kernel (not as a module),
and hence the scanlocks check fails. The solution is to comment out the
section that checks whether it is loaded. (The section is marked with
"#is debugfs loaded?")
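A way to tell the built-in case apart without relying on lsmod is /proc/filesystems, which lists debugfs whether it is built in or loaded as a module. A sketch (check_debugfs and the sample file are illustrative, not part of scanlocks):

```shell
# Built-in debugfs never shows up in lsmod, but it does appear in
# /proc/filesystems; classify based on that.
check_debugfs() {
  if grep -q debugfs "${1:-/proc/filesystems}"; then
    echo "debugfs available"
  else
    echo "debugfs not available"
  fi
}
# Sample data standing in for /proc/filesystems on a built-in kernel:
printf 'nodev\tdebugfs\n' > /tmp/fs.sample
check_debugfs /tmp/fs.sample
```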

The second issue is the oops. File a bugzilla with NOVELL.
We will handle this via them as we need to see what patches your
kernel/ocfs2 has.

The first issue indicates that the lock is busy (-16 is EBUSY), meaning
there are holders. As the locks are files, you can use fuser to see
which pid is using it. If you want to see the state of the lock, you
will have to dump it via the dlm proc interface:
# echo R domain lock > /proc/fs/ocfs2_dlm/debug
Here domain is the directory in /dlm and lock is the file in it.
The state will be dumped in /var/log/messages.
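For example, with the names from this thread (domain mas, lock PAF1A9; substitute your own), the sequence would look like this sketch, with the privileged commands left commented out:

```shell
DOMAIN=mas
LOCK=PAF1A9
# Which pids currently hold the lock file (dlmfs is normally on /dlm):
#   fuser /dlm/$DOMAIN/$LOCK
# Dump the lock state to /var/log/messages (needs root):
#   echo "R $DOMAIN $LOCK" > /proc/fs/ocfs2_dlm/debug
CMD="R $DOMAIN $LOCK"
echo "$CMD"
```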

No, scanlocks cannot dump dlmfs locks.

Sunil

Charlie Sharkey wrote:
Hi,

I'm having some dlm issues on a system. It looks like the scenario went something like:

  19:45:29 --> 19:47:31   16 dlm locks are released using o2dlm_unlock().
                          ocfs2 logs an error into /var/log/messages, but
                          returns ok to the application

  19:49:15                a dlm lock (o2dlm_lock()) is put on P00000 -- ok

  19:49:37                lock on P00000 is released -- ok

  19:49:40                a lock is attempted on P00000 and the lock fails.
                          Returned error is "Trylock failed"
Here is the data from /var/log/messages:

Jan 31 19:45:29 N1 kernel: (25033,1):dlmfs_unlink:512 ERROR: unlink P50005, error -16 from destroy
Jan 31 19:45:43 N1 kernel: (25038,1):dlmfs_unlink:512 ERROR: unlink P20010, error -16 from destroy
Jan 31 19:45:44 N1 kernel: (25030,1):dlmfs_unlink:512 ERROR: unlink P20002, error -16 from destroy
Jan 31 19:45:59 N1 kernel: (25034,3):dlmfs_unlink:512 ERROR: unlink P60006, error -16 from destroy
Jan 31 19:46:07 N1 kernel: (25043,0):dlmfs_unlink:512 ERROR: unlink P70015, error -16 from destroy
Jan 31 19:46:08 N1 kernel: (25035,0):dlmfs_unlink:512 ERROR: unlink P70007, error -16 from destroy
Jan 31 19:46:10 N1 kernel: (25041,1):dlmfs_unlink:512 ERROR: unlink P50013, error -16 from destroy
Jan 31 19:46:25 N1 kernel: (25040,1):dlmfs_unlink:512 ERROR: unlink P40012, error -16 from destroy
Jan 31 19:46:30 N1 kernel: (25042,1):dlmfs_unlink:512 ERROR: unlink P60014, error -16 from destroy
Jan 31 19:46:30 N1 kernel: (25032,1):dlmfs_unlink:512 ERROR: unlink P40004, error -16 from destroy
Jan 31 19:47:07 N1 kernel: (25028,1):dlmfs_unlink:512 ERROR: unlink P00000, error -16 from destroy
Jan 31 19:47:08 N1 kernel: (25036,1):dlmfs_unlink:512 ERROR: unlink P00008, error -16 from destroy
Jan 31 19:47:09 N1 kernel: (25029,1):dlmfs_unlink:512 ERROR: unlink P10001, error -16 from destroy
Jan 31 19:47:19 N1 kernel: (25037,0):dlmfs_unlink:512 ERROR: unlink P10009, error -16 from destroy
Jan 31 19:47:30 N1 kernel: (25039,1):dlmfs_unlink:512 ERROR: unlink P30011, error -16 from destroy
Jan 31 19:47:31 N1 kernel: (25031,1):dlmfs_unlink:512 ERROR: unlink P30003, error -16 from destroy
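Each of these entries decodes the same way; a sketch that pulls the lock name and errno out of one of them (-16 is -EBUSY, i.e. the lock file still had holders at unlink time):

```shell
# One sample line from the log above; extract the lock name and errno.
msg='Jan 31 19:45:29 N1 kernel: (25033,1):dlmfs_unlink:512 ERROR: unlink P50005, error -16 from destroy'
lock=$(echo "$msg" | sed 's/.*unlink \([A-Z0-9]*\),.*/\1/')
err=$(echo "$msg" | sed 's/.*error \(-[0-9]*\) .*/\1/')
echo "$lock failed with $err (EBUSY: lock still in use)"
```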


Here is data from the application dlm log file

01/31/2008 19:42:50  C000: Dlm Lock fd/id 150/P00000, returning: ok
01/31/2008 19:42:50  C001: Dlm Lock fd/id 152/P10001, returning: ok
01/31/2008 19:42:50  C002: Dlm Lock fd/id 154/P20002, returning: ok
01/31/2008 19:42:51  C003: Dlm Lock fd/id 156/P30003, returning: ok
01/31/2008 19:42:51  C004: Dlm Lock fd/id 158/P40004, returning: ok
01/31/2008 19:42:52  C005: Dlm Lock fd/id 160/P50005, returning: ok
01/31/2008 19:42:52  C006: Dlm Lock fd/id 162/P60006, returning: ok
01/31/2008 19:42:52  C007: Dlm Lock fd/id 164/P70007, returning: ok
01/31/2008 19:42:53  C008: Dlm Lock fd/id 166/P00008, returning: ok
01/31/2008 19:42:53  C009: Dlm Lock fd/id 168/P10009, returning: ok
01/31/2008 19:42:53  C00A: Dlm Lock fd/id 170/P20010, returning: ok
01/31/2008 19:42:54  C00B: Dlm Lock fd/id 172/P30011, returning: ok
01/31/2008 19:42:54  C00C: Dlm Lock fd/id 174/P40012, returning: ok
01/31/2008 19:42:54  C00D: Dlm Lock fd/id 178/P50013, returning: ok
01/31/2008 19:42:55  C00E: Dlm Lock fd/id 180/P60014, returning: ok
01/31/2008 19:42:58  C00F: Dlm Lock fd/id 182/P70015, returning: ok
01/31/2008 19:45:29  C005: Dlm UnLock.  fd/id 160/P50005, returning ok
01/31/2008 19:45:43  C00A: Dlm UnLock.  fd/id 170/P20010, returning ok
01/31/2008 19:45:44  C002: Dlm UnLock.  fd/id 154/P20002, returning ok
01/31/2008 19:45:59  C006: Dlm UnLock.  fd/id 162/P60006, returning ok
01/31/2008 19:46:07  C00F: Dlm UnLock.  fd/id 182/P70015, returning ok
01/31/2008 19:46:08  C007: Dlm UnLock.  fd/id 164/P70007, returning ok
01/31/2008 19:46:10  C00D: Dlm UnLock.  fd/id 178/P50013, returning ok
01/31/2008 19:46:25  C00C: Dlm UnLock.  fd/id 174/P40012, returning ok
01/31/2008 19:46:30  C00E: Dlm UnLock.  fd/id 180/P60014, returning ok
01/31/2008 19:46:30  C004: Dlm UnLock.  fd/id 158/P40004, returning ok
01/31/2008 19:47:07  C000: Dlm UnLock.  fd/id 150/P00000, returning ok
01/31/2008 19:47:08  C008: Dlm UnLock.  fd/id 166/P00008, returning ok
01/31/2008 19:47:09  C001: Dlm UnLock.  fd/id 152/P10001, returning ok
01/31/2008 19:47:19  C009: Dlm UnLock.  fd/id 168/P10009, returning ok
01/31/2008 19:47:30  C00B: Dlm UnLock.  fd/id 172/P30011, returning ok
01/31/2008 19:47:31  C003: Dlm UnLock.  fd/id 156/P30003, returning ok
01/31/2008 19:49:15  C000: Dlm Lock fd/id 150/P00000, returning: ok
01/31/2008 19:49:37  C000: Dlm UnLock.  fd/id 150/P00000, returning ok
01/31/2008 19:49:40 C000: Dlm Lock fd/id 150/P00000, returning: Trylock failed


I also had a problem with this system the day before this. Here is the
data from that:

  SYSTEM MAP: /boot/System.map-2.6.16.46-0.14.PTF.284042.0-smp
DEBUG KERNEL: ../vmlinux.debug (2.6.16.46-0.14.PTF.284042.0-smp)
    DUMPFILE: vmcore
        CPUS: 4
        DATE: Wed Jan 30 17:44:49 2008
      UPTIME: 9 days, 00:55:51
LOAD AVERAGE: 1.10, 1.06, 1.01
       TASKS: 341
    NODENAME: N1
     RELEASE: 2.6.16.46-0.14.PTF.284042.0-smp
     VERSION: #1 SMP Thu May 17 14:00:09 UTC 2007
     MACHINE: i686  (2327 Mhz)
      MEMORY: 2 GB
       PANIC: "kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2780!"
         PID: 31585
     COMMAND: "masx"
        TASK: dab912d0  [THREAD_INFO: d5840000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 31585  TASK: dab912d0  CPU: 0   COMMAND: "masx"
 #0 [d5841d78] crash_kexec at c013bb1a
 #1 [d5841dbc] die at c01055fe
 #2 [d5841dec] do_invalid_op at c0105ce2
 #3 [d5841e9c] error_code (via invalid_op) at c0104e4d
    EAX: 00000051  EBX: ca668280  ECX: 00000000  EDX: 00000296  EBP:
da0c1c00
    DS:  007b      ESI: da0c1c00  ES:  007b      EDI: ca668280
    CS:  0060      EIP: fb835e95  ERR: ffffffff  EFLAGS: 00010296
 #4 [d5841ed0] dlm_empty_lockres at fb835e95
 #5 [d5841ee0] dlm_unregister_domain at fb827305
 #6 [d5841f18] dlmfs_clear_inode at fb6c2eae
 #7 [d5841f24] clear_inode at c0175dfe
 #8 [d5841f30] generic_delete_inode at c0175eee
 #9 [d5841f3c] iput at c0175838
#10 [d5841f48] dput at c01744e0
#11 [d5841f54] do_rmdir at c016e63d
#12 [d5841fb8] sysenter_entry at c0103bd4
    EAX: 00000028  EBX: 08299988  ECX: 00000000  EDX: 08273be4
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 082ebf28
    SS:  007b      ESP: bf999f7c  EBP: bf999fa8
    CS:  0073      EIP: ffffe410  ERR: 00000028  EFLAGS: 00000246


This is a SUSE SLES10 SP1 system, with a SUSE nfs patch.
Ocfs2 tools version 1.2.3-0.7
Ocfs2 version 1.2.5-SLES-r2997

I was hoping you would have some ideas on this.

Also, another question. I have been trying to run one of the debugging
scripts, for example, scanlocks. I keep getting the message 'Module
debugfs not loaded'. I don't see any debugfs.ko on the system. Isn't it
a part of the ocfs2 tools?

Thank you,

Charlie

_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users


