The nature of the computational problem and the architecture of the cluster would have a significant bearing on the solution, I think. Offhand, I would guess it's a bandwidth issue, since the NFS server does serve the requests eventually. If you have a really large cluster or a very write-intensive computational problem, you can tie up the NFS server fairly quickly.
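If you want to check the bandwidth theory against your logs, it may help to measure how long each outage lasts and whether several nodes lose the server at the same moment, rather than only counting "not responding" lines. Here is a rough Python sketch along those lines. It assumes the standard syslog layout seen in your excerpt (including the optional "messages:" prefix that grep adds when searching several files) and a placeholder year, since syslog timestamps carry none. The function names are made up for illustration and are not part of OSCAR or any NFS tool.

```python
import re
import sys
from collections import defaultdict
from datetime import datetime

# Matches kernel NFS client messages such as
#   Dec 28 04:05:06 node7.metis kernel: nfs: server nfs_oscar not responding, still trying
#   Dec 28 04:05:57 node2.metis kernel: nfs: server nfs_oscar OK
# An optional "messages:" style prefix (added by grep over several files) is skipped.
LINE_RE = re.compile(
    r"^(?:[\w.]+:)?(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+kernel:\s+"
    r"nfs: server (?P<server>\S+) (?P<state>not responding|OK)"
)


def outage_durations(lines, year=2000):
    """Map (client, server) to a list of outage lengths in seconds.

    An outage starts at the first "not responding" line for a client/server
    pair and ends at the next "OK" line for the same pair. syslog omits the
    year, so `year` is only a placeholder (assumption); a log that spans
    New Year would need extra handling. Unfinished outages are not reported.
    """
    pending = {}
    durations = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        key = (m["host"], m["server"])
        if m["state"] == "not responding":
            pending.setdefault(key, ts)  # keep the earliest warning of this outage
        elif key in pending:
            durations[key].append((ts - pending.pop(key)).total_seconds())
    return dict(durations)


def summarize(durations):
    """Print one line per client/server pair: count, longest and total outage."""
    for (host, server), secs in sorted(durations.items()):
        print(f"{host} -> {server}: {len(secs)} outages, "
              f"longest {max(secs):.0f}s, total {sum(secs):.0f}s")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 nfs_outages.py /var/log/messages
    with open(sys.argv[1]) as fh:
        summarize(outage_durations(fh))
```

Running it over each node's messages file gives a per-node picture. If many nodes drop out over the same minutes, as node2, node3 and node7 do in your excerpt, that would point more toward the server or the network than toward any single client.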
About how many nodes are you using?

>===== Original Message From Howell Silverman <[EMAIL PROTECTED]> =====
>Can anyone help lead us to a solution?
>
>> >Subject: NFS problem on [machine]
>> >
>> >> On the new cluster [x], we just found that for a few
>> >> times, the master node was not responding [to] NFS requests.
>> >> Attached are lines grepped with 'nfs' from all
>> >> 'messages' file.
>> >>
>> >> This is a serious problem to us. When the nfs server
>> >> stops responding, many running jobs are restarted from
>> >> scratch. Is this a problem of the nfs configuration,
>> >> oscar or the hardware? What can we do to make NFS
>> >> stable?
>> >> Please advise.
>> >>
>> >> Thanks,
>
>[MORE INFORMATION]
>
>> We noticed this when our big jobs (need 3 days) were
>> restarted. From the log, it was happening more and more
>> frequently. Any suggestion on identifying the source of
>> the problem?
>>
>> [EMAIL PROTECTED] log]# grep nfs messages | grep not | wc -l
>> 339
>> [EMAIL PROTECTED] log]# grep nfs messages.1 | grep not | wc -l
>> 381
>> [EMAIL PROTECTED] log]# grep nfs messages.2 | grep not | wc -l
>> 9
>> [EMAIL PROTECTED] log]# grep nfs messages.3 | grep not | wc -l
>> 0
>> [EMAIL PROTECTED] log]# grep nfs messages.4 | grep not | wc -l
>> 0
>> [EMAIL PROTECTED] log]# ls -l messages*
>> -rw------- 1 root root 130926 Dec 29 15:45 messages
>> -rw------- 1 root root 509784 Dec 28 04:02 messages.1
>> -rw------- 1 root root 416508 Dec 21 04:02 messages.2
>> -rw------- 1 root root 586158 Dec 14 04:02 messages.3
>> -rw------- 1 root root 413372 Dec 7 04:02 messages.4
>
>[ ... more message ....]
>messages:Dec 28 04:05:06 node7.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:05:30 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:05:46 node2.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:05:57 node2.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:06:09 node3.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:06:34 node7.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:06:58 node7.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:07:48 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:08:20 node2.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:09:01 node2.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:09:29 node7.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:09:53 node3.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:10:10 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:10:38 node2.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:10:47 node3.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:11:06 node2.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:11:25 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:11:35 node2.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:11:42 node3.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:11:47 node2.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:11:51 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:12:08 node2.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:12:35 node2.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:13:14 node3.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:14:27 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:15:36 node7.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:16:30 node3.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:17:27 node7.metis kernel: nfs: server nfs_oscar OK
>
>[snip]

_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users
