Aha, ok, I don't see the oops, or anything about the hang in the logs. The hung machine still replies to pings.
The story now is that I thought I could use:

tunefs.ocfs2 --cloned-volume /dev/mylvmsnapshot

in order to mount the snapshot (big mistake). Well, I did manage to mount the snapshot, but as soon as I unmounted it, the umount process hung, and then the whole machine hung, except that it still responds to pings.

Now, I have downloaded ocfs2-1.4-userguide.pdf, went to section 'f) DLM Debugging', and tried the commands there on the still-working node, but only 'cat /sys/kernel/debug/o2dlm/*/dlm_state' worked. It produced the following output:

Domain: 1ACAFCEE7ACA47C089069117560F5C91  Key: 0xb9d649ba
Thread Pid: 5664  Node: 0  State: JOINED
Number of Joins: 1  Joining Node: 255
Domain Map: 0
Live Map: 0
Lock Resources: 51168 (180512)
MLEs: 0 (291689)
  Blocking: 0 (139713)
  Mastery: 0 (151976)
  Migration: 0 (0)
Lists: Dirty=Empty  Purge=InUse  PendingASTs=Empty  PendingBASTs=Empty
Purge Count: 8  Refs: 51169
Dead Node: 255
Recovery Pid: 5665  Master: 255  State: INACTIVE
Recovery Map:
Recovery Node State:

The other commands:

debugfs.ocfs2 –R “fs_locks –B” /dev/drbd0
debugfs.ocfs2 –R “fs_locks –B” /dev/vg/lv
debugfs.ocfs2 –R “dlm_locks M000000000000000022d63c00000000” /dev/drbd0

produced the error:

open: Device name specified was not found while opening context for device –R
debugfs.ocfs2 1.4.2
debugfs:

and:

ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN

produced no D state process.

I am sorry to write this on the mailing list, but I am a noob, so I don't even know whether it is a bug, a misconfiguration, or a misunderstanding.

PS. Is the nodiratime option supported for mounts? I used it, but I don't see it in the user guide.

-----Original Message-----
From: Sunil Mushran <sunil.mush...@oracle.com>
To: sylarrrr...@aim.com
Cc: tao...@oracle.com; ocfs2-users@oss.oracle.com
Sent: Tue, Jul 7, 2009 8:46 pm
Subject: Re: [Ocfs2-users] umount hang + high CPU

The fix was for the oops you saw. The hang is a different issue. We have no info on that.
For that, if you would like to diagnose the problem, read up on the dlm notes in the 1.4 user's guide. It explains a debugging process vis-a-vis hangs. If the issue is dlm related, then we would like to have the tcpdumps.

Lastly, emails are not an efficient vehicle for handling such issues. Use the bugzilla as it allows us to collect information in one place.

Sunil

sylarrrr...@aim.com wrote:
> So this bug is not over yet :(
>
> I have checked my kernel source and indeed it has this patch, but I
> still get the hang.
>
> PS. my linux-2.6-2.6.30/fs/ocfs2/dcache.c kernel source has:
>
>         else
>                 mlog_errno(ret);
>
>         /*
>          * In case of error, manually free the allocation and do the iput().
>          * We need to do this because error here means no d_instantiate(),
>          * which means iput() will not be called during dput(dentry).
>          */
>         if (ret < 0 && !alias) {
>                 ocfs2_lock_res_free(&dl->dl_lockres);
>                 BUG_ON(dl->dl_count != 1);
>                 spin_lock(&dentry_attach_lock);
>                 dentry->d_fsdata = NULL;
>                 spin_unlock(&dentry_attach_lock);
>                 kfree(dl);
>                 iput(inode);
>         }
>
>         dput(alias);
>
>         return ret;
> }
>
>
_______________________________________________ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users