On Wed, 4 Apr 2012 at 6:33pm, Tru Huynh wrote:

> On Tue, Apr 03, 2012 at 03:19:51PM -0700, Joshua Baker-LePain wrote:
>> Yes. We have the SGE commlib errors, and the Open MPI
>> "routed:binomial" errors. I'm mainly focusing on the SGE problem
>> right now, as I think (hope) that fixing that will also fix the MPI
>> issue.
>
> could it be related to NFS (locking?) between your CentOS-6 clients
> and NFS shared SGE directory?
> or readdir failure such as:
> http://bugs.centos.org/view.php?id=5496

Aside: Wow, NFS in 6.2 seems rather wonky. We've also hit this
<https://bugzilla.redhat.com/show_bug.cgi?id=770250>.

That being said, our SGE directory isn't NFS shared. We use local spool
directories and local SGE installations on all the nodes. The only thing
that's NFS mounted is $SGE_ROOT/$SGE_CELL/common so that we can have a
shadow master.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF