On Wed, 4 Apr 2012 at 6:33pm, Tru Huynh wrote:

> On Tue, Apr 03, 2012 at 03:19:51PM -0700, Joshua Baker-LePain wrote:
>
>> Yes.  We have the SGE commlib errors, and the Open MPI
>> "routed:binomial" errors.  I'm mainly focusing on the SGE problem
>> right now, as I think (hope) that fixing that will also fix the MPI
>> issue.
>
> could it be related to NFS (locking?) between your CentOS-6 clients
> and NFS shared SGE directory?
>
> or readdir failure such as:
> http://bugs.centos.org/view.php?id=5496

Aside: Wow, NFS in 6.2 seems rather wonky. We've also hit this bug: <https://bugzilla.redhat.com/show_bug.cgi?id=770250>.

That being said, our SGE directory isn't NFS shared. We use local spool directories and local SGE installations on all the nodes. The only thing that's NFS mounted is $SGE_ROOT/$SGE_CELL/common so that we can have a shadow master.
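For anyone wanting to confirm a similar layout on their own nodes, one way is to look up which mount each SGE path falls under. The following is only a minimal sketch, assuming a Linux node with /proc/mounts; the helper names (parse_mounts, fs_type_for), the /opt/sge fallback, and the spool path are hypothetical and not taken from our actual setup.

```python
import os


def parse_mounts(text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text.

    Note: escaped characters in mount points (e.g. \\040 for a space)
    are not decoded in this sketch.
    """
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[1], fields[2]))
    return mounts


def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point containing path."""
    path = os.path.normpath(path)
    best_mp, best_type = "", None
    for mp, fstype in mounts:
        mp_norm = os.path.normpath(mp)
        prefix = mp_norm if mp_norm.endswith("/") else mp_norm + "/"
        if path == mp_norm or path.startswith(prefix):
            # ">=" lets a later mount on the same point override an earlier one.
            if len(mp_norm) >= len(best_mp):
                best_mp, best_type = mp_norm, fstype
    return best_type


if __name__ == "__main__":
    # Hypothetical fallbacks; adjust to the local installation.
    sge_root = os.environ.get("SGE_ROOT", "/opt/sge")
    sge_cell = os.environ.get("SGE_CELL", "default")
    with open("/proc/mounts") as f:
        mounts = parse_mounts(f.read())
    for sub in ("common", "spool"):
        path = os.path.join(sge_root, sge_cell, sub)
        print(path, "->", fs_type_for(path, mounts))
```

On a node set up as described above, the common directory would be expected to report an NFS type (nfs or nfs4) and everything else a local filesystem type.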

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
