Here's the result of the command. I'll check for a newer version of the tools.
  PID STAT COMMAND          WIDE-WCHAN-COLUMN
    1 S    init             -
    2 S    migration/0      migration_thread
    3 SN   ksoftirqd/0      ksoftirqd
    4 S    migration/1      migration_thread
    5 SN   ksoftirqd/1      ksoftirqd
    6 S    migration/2      migration_thread
    7 SN   ksoftirqd/2      ksoftirqd
    8 S    migration/3      migration_thread
    9 SN   ksoftirqd/3      ksoftirqd
   10 S    migration/4      migration_thread
   11 SN   ksoftirqd/4      ksoftirqd
   12 S    migration/5      migration_thread
   13 SN   ksoftirqd/5      ksoftirqd
   14 S    migration/6      migration_thread
   15 SN   ksoftirqd/6      ksoftirqd
   16 S    migration/7      migration_thread
   17 SN   ksoftirqd/7      ksoftirqd
   18 S<   events/0         worker_thread
   19 S<   events/1         worker_thread
   20 S<   events/2         worker_thread
   21 S<   events/3         worker_thread
   22 S<   events/4         worker_thread
   23 S<   events/5         worker_thread
   24 S<   events/6         worker_thread
   25 S<   events/7         worker_thread
   26 S<   khelper          worker_thread
   27 S<   kthread          worker_thread
   37 S<   kblockd/0        worker_thread
   38 S<   kblockd/1        worker_thread
   39 S<   kblockd/2        worker_thread
   40 S<   kblockd/3        worker_thread
   41 S<   kblockd/4        worker_thread
   42 S<   kblockd/5        worker_thread
   43 S<   kblockd/6        worker_thread
   44 S<   kblockd/7        worker_thread
   45 S<   kacpid           worker_thread
   46 S<   kacpi_notify     worker_thread
  327 S    pdflush          pdflush
  328 S    pdflush          pdflush
  329 S    kswapd0          kswapd
  330 S<   aio/0            worker_thread
  331 S<   aio/1            worker_thread
  332 S<   aio/2            worker_thread
  333 S<   aio/3            worker_thread
  334 S<   aio/4            worker_thread
  335 S<   aio/5            worker_thread
  336 S<   aio/6            worker_thread
  337 S<   aio/7            worker_thread
  582 S<   cqueue/0         worker_thread
  583 S<   cqueue/1         worker_thread
  584 S<   cqueue/2         worker_thread
  585 S<   cqueue/3         worker_thread
  586 S<   cqueue/4         worker_thread
  587 S<   cqueue/5         worker_thread
  588 S<   cqueue/6         worker_thread
  589 S<   cqueue/7         worker_thread
  590 S<   kseriod          serio_thread
  623 S<   kpsmoused        worker_thread
 1056 S<   ata/0            worker_thread
 1057 S<   ata/1            worker_thread
 1058 S<   ata/2            worker_thread
 1059 S<   ata/3            worker_thread
 1060 S<   ata/4            worker_thread
 1061 S<   ata/5            worker_thread
 1062 S<   ata/6            worker_thread
 1063 S<   ata/7            worker_thread
 1064 S<   ata_aux          worker_thread
 1093 S<   scsi_eh_0        scsi_error_handler
 1218 S<   scsi_eh_1        scsi_error_handler
 1232 S<   qla2xxx_1_dpc    144669341936254977
 2061 S<   scsi_eh_2        scsi_error_handler
 2111 S<   qla2xxx_2_dpc    18446604440027791361
 2190 S    kjournald        kjournald
 2251 S<s  udevd            -
 3469 S<   khubd            hub_thread
 4474 S<   scsi_eh_3        scsi_error_handler
 4475 S<   usb-storage      -
 4620 S<   kmpathd/0        worker_thread
 4621 S<   kmpathd/1        worker_thread
 4622 S<   kmpathd/2        worker_thread
 4623 S<   kmpathd/3        worker_thread
 4624 S<   kmpathd/4        worker_thread
 4625 S<   kmpathd/5        worker_thread
 4626 S<   kmpathd/6        worker_thread
 4627 S<   kmpathd/7        worker_thread
 5768 S    kjournald        kjournald
 5770 S    kjournald        kjournald
 5823 S<   kauditd          kauditd_thread
 6117 Ss   resmgrd          -
 6249 Ss   acpid            -
 6326 Ss   dbus-daemon      -
 6494 Ss   hald             -
 6695 S    hald-addon-acpi  -
 7050 S<   bond             worker_thread
 7244 S    hald-addon-stor  -
 7495 Ss   syslog-ng        -
 7499 Ss   klogd            syslog
 7524 SLl  multipathd       stext
 7529 Ss   portmap          -
 7547 Ss   slpd             -
 7626 Ss   irqbalance       1
 7658 SN   kipmi0           -
 7725 S    snmpd            -
 7950 S<   CID_control      OS_cidWait
 7951 D<   CID_timer        -
 7952 S<   CID_sched_0      OS_cidWait
 7953 S<   CID_sched_1      OS_cidWait
 7975 S    btitool          OS_cidWait
 7989 Ss   startpar         -
 8094 Sl   qlremote         stext
 8146 Ss   sshd             -
 8191 S<   user_dlm         worker_thread
 8206 Ss   ntpd             -
 8217 S<   o2net            worker_thread
 8250 Ss   cron             -
 8261 S<   o2hb-D5304888F9  -
 8272 Ss   httpd2-prefork   -
 8273 S    httpd2-prefork   -
 8274 S    httpd2-prefork   -
 8275 S    httpd2-prefork   -
 8276 S    httpd2-prefork   -
 8277 S    httpd2-prefork   -
 8324 S<   ocfs2_wq         worker_thread
 8325 S<   ocfs2dc          ocfs2_downconvert_thread
 8326 S<   dlm_thread       -
 8327 S<   dlm_reco_thread  -
 8328 S<   dlm_wq           worker_thread
 8329 S    kjournald        kjournald
 8330 S<   ocfs2cmt         ocfs2_commit_thread
 8336 S<   o2hb-B98C95FB4B  -
 8353 S<   ocfs2dc          ocfs2_downconvert_thread
 8354 S<   dlm_thread       -
 8355 S<   dlm_reco_thread  -
 8356 S<   dlm_wq           worker_thread
 8357 S    kjournald        kjournald
 8358 S<   ocfs2cmt         ocfs2_commit_thread
 8364 S<   o2hb-B3EE601AEB  -
 8381 S<   ocfs2dc          ocfs2_downconvert_thread
 8382 S<   dlm_thread       -
 8383 S<   dlm_reco_thread  -
 8384 S<   dlm_wq           worker_thread
 8385 S    kjournald        kjournald
 8386 S<   ocfs2cmt         ocfs2_commit_thread
 8392 S<   o2hb-2043DFCC18  -
 8409 S<   ocfs2dc          ocfs2_downconvert_thread
 8410 S<   dlm_thread       -
 8411 S<   dlm_reco_thread  -
 8412 S<   dlm_wq           worker_thread
 8413 S    kjournald        kjournald
 8414 S<   ocfs2cmt         ocfs2_commit_thread
 8420 S<   o2hb-6B6685A881  -
 8437 S<   ocfs2dc          ocfs2_downconvert_thread
 8438 S<   dlm_thread       -
 8439 S<   dlm_reco_thread  -
 8440 S<   dlm_wq           worker_thread
 8441 S    kjournald        kjournald
 8442 S<   ocfs2cmt         ocfs2_commit_thread
 8538 S    logger           pipe_wait
 8540 Ss   startpar         -
 8542 Dsl  vt               ocfs2_wait_for_mask
 8555 Ss+  mingetty         -
 8556 Ss+  mingetty         -
 8557 Ss+  mingetty         -
 8558 Ss+  mingetty         -
 8559 Ss+  mingetty         -
 8560 Ss+  mingetty         -
 8615 S<   dlm_thread       -
 8616 S<   dlm_reco_thread  -
 8617 S<   dlm_wq           worker_thread
 9369 R+   ps               -
10405 Ss   sshd             -
10407 Ss+  vtcon            -
26609 Ss   sshd             -
26611 Ss   bash             wait
26698 S+   gdb              wait
26894 Ss   sshd             -
26896 Ss+  bash             -
29881 Ss   sshd             -
29883 Ss   bash             wait

-----Original Message-----
From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
Sent: Wednesday, January 13, 2010 4:04 PM
To: Charlie Sharkey
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] hung process -- sles10 sp2

Charlie Sharkey wrote:
> version info
> ---------------
> n1 kernel: OCFS2 Node Manager 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC 2008
> n1 kernel: OCFS2 DLM 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC 2008
> n1 kernel: OCFS2 DLMFS 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC 2008
> ocfs2-tools-1.4.0-0.5
> ocfs2console-1.4.0-0.5
> Linux n1 2.6.16.60-0.34-smp #1 SMP Fri Jan 16 14:59:01 UTC 2009
> x86_64 x86_64 x86_64 GNU/Linux
>
> ============================================================================
>
> One of the nodes of a six node cluster got a hung process. The 'ps -elf'
> command shows it as:
>
> 5 D vtape 8542 1 6 77 0 - 77376 ocfs2_ Jan12 ? 01:34:31
> /opt/bti/mas/bin/vt -d -p /var/run/vt.pid
>
> The system isn't hung, I can ssh into the system and ls each ocfs2
> directory. I have run the debugfs.ocfs2 command debugfs.ocfs2 -R "stats"
> and it shows no errors.
> I ran the 'scanlocks2' script and it didn't show any hung locks. It did
> create some files (/tmp/_fsl_dm-22 through /tmp/_fsl_dm-26). The contents
> of those files are: "Debug string proto 2 found, but 1 is the highest I
> understand."

You have an old debugfs.ocfs2. See if SLES has a newer ocfs2-tools. With
it, rerun scanlocks2. That will tell us whether the dlm is involved or not.

Meanwhile, what does this say:

ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
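[Editor's note: for anyone skimming a listing like the one above, the stuck process is the one whose STAT field starts with "D" (uninterruptible sleep); its WCHAN column names the kernel function it is waiting in (here, pid 8542 in ocfs2_wait_for_mask). A small filter sketch follows. The file /tmp/ps-wchan.txt is a made-up sample holding a few lines copied from the listing; the awk field positions assume the exact pid,stat,comm,wchan column order used in the thread.]

```shell
# Sketch: filter uninterruptible (D-state) processes out of output
# saved from "ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN".
# /tmp/ps-wchan.txt is a hypothetical sample file, not from the thread.
cat > /tmp/ps-wchan.txt <<'EOF'
 PID STAT COMMAND WIDE-WCHAN-COLUMN
7951 D<   CID_timer -
8542 Dsl  vt ocfs2_wait_for_mask
9369 R+   ps -
EOF

# Skip the header; STAT is field 2, and a leading "D" marks
# uninterruptible sleep. Print pid, command, and wait channel.
awk 'NR > 1 && $2 ~ /^D/ {print $1, $3, $4}' /tmp/ps-wchan.txt
# prints:
# 7951 CID_timer -
# 8542 vt ocfs2_wait_for_mask
```

On a live system the same filter can be fed straight from ps via a pipe; the wait channel it surfaces is usually the fastest hint as to which subsystem a hung process is blocked in.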