I’ll create a patch that you can try; if it works, we can commit it.
> On Jan 18, 2017, at 3:29 AM, William Hay <w....@ucl.ac.uk> wrote:
> 
> On Tue, Jan 17, 2017 at 09:56:54AM -0800, r...@open-mpi.org wrote:
>> As I recall, the problem was that qrsh isn't available on the backend
>> compute nodes, and so we can't use a tree for launch. If that isn't
>> true, then we can certainly adjust it.
>> 
> qrsh should be available on all nodes of a SoGE cluster but, depending on
> how things are set up, may not be findable (i.e. not in the PATH) when you
> qrsh -inherit into a node. A workaround would be to start backend processes
> with qrsh -inherit -v PATH, which will copy the PATH from the master node to
> the slave node process, or otherwise pass the location of qrsh from one node
> to another. That of course assumes that qrsh is in the same location on all
> nodes.
> 
> I've tested that it is possible to qrsh from the head node of a job to a
> slave node and then on to another slave node by this method.
> 
> William
> 
> 
>>> On Jan 17, 2017, at 9:37 AM, Mark Dixon <m.c.di...@leeds.ac.uk> wrote:
>>> 
>>> Hi,
>>> 
>>> While commissioning a new cluster, I wanted to run HPL across the whole
>>> thing using openmpi 2.0.1.
>>> 
>>> I couldn't get it to start on more than 129 hosts under Son of Gridengine
>>> (128 remote plus the localhost running the mpirun command). openmpi would
>>> sit there, waiting for all the orted's to check in; however, there were
>>> "only" a maximum of 128 qrsh processes, therefore a maximum of 128 orted's,
>>> therefore waiting a loooong time.
>>> 
>>> Increasing plm_rsh_num_concurrent beyond the default of 128 gets the job to
>>> launch.
>>> 
>>> Is this intentional, please?
>>> 
>>> Doesn't openmpi use a tree-like startup sometimes? Any particular reason
>>> it's not using it here?
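The hang Mark describes can be reproduced with a toy Python model (a sketch, not Open MPI code). It assumes, as his report suggests, that the flat rsh/qrsh launcher frees a slot counted against plm_rsh_num_concurrent only when that qrsh process exits, and that under Son of Grid Engine each qrsh stays alive for as long as its orted. The helper names (`flat_launch`, `tree_children`, `tree_launch_order`) and the k-ary tree layout are hypothetical; Open MPI's own tree-based launch may be arranged differently.

```python
from collections import deque


def flat_launch(num_remote: int, max_concurrent: int, qrsh_exits: bool) -> int:
    """Toy model of a flat launcher: mpirun starts every remote daemon itself.

    Hypothetical assumption: a launch slot is freed only when its remote shell
    exits. With qrsh_exits=False, each qrsh lives as long as its orted.
    Returns how many orteds get started before the launcher stops.
    """
    pending = deque(range(num_remote))
    active = deque()  # remote shells (qrsh processes) still running
    started = 0
    while pending:
        if len(active) < max_concurrent:
            active.append(pending.popleft())  # qrsh -> orted on the next host
            started += 1
        elif qrsh_exits:
            active.popleft()  # the oldest remote shell returns, freeing a slot
        else:
            break  # every slot is held by a live qrsh: mpirun waits forever
    return started


def tree_children(rank: int, num_daemons: int, fanout: int) -> list[int]:
    """Children of `rank` in a hypothetical k-ary launch tree.

    Rank 0 is mpirun's own node; this layout is for illustration only.
    """
    first = rank * fanout + 1
    return [c for c in range(first, first + fanout) if c < num_daemons]


def tree_launch_order(num_daemons: int, fanout: int) -> list[int]:
    """Breadth-first order in which daemons would come up in the tree."""
    order, queue = [], deque([0])
    while queue:
        rank = queue.popleft()
        order.append(rank)
        queue.extend(tree_children(rank, num_daemons, fanout))
    return order


if __name__ == "__main__":
    remote = 199  # hypothetical 200-host job: 199 remote hosts plus mpirun's node
    print(flat_launch(remote, 128, qrsh_exits=False))  # 128: job never starts
    print(flat_launch(remote, 256, qrsh_exits=False))  # 199: raising the cap works
    # In the tree, no launcher ever holds more than `fanout` remote shells.
    order = tree_launch_order(remote + 1, fanout=32)
    print(len(order), max(len(tree_children(r, remote + 1, 32)) for r in order))
    # prints "200 32"
```

With 199 remote hosts and the default cap of 128, the model stops at 128 daemons, matching the hang; raising the cap (in Open MPI, e.g. `mpirun --mca plm_rsh_num_concurrent 256 ...` or the `OMPI_MCA_plm_rsh_num_concurrent` environment variable) lets every daemon start. The tree variant sidesteps the cap differently: no launcher opens more than `fanout` remote shells, but every non-leaf daemon must itself be able to run qrsh, which is why qrsh being findable on the compute nodes (William's `qrsh -inherit -v PATH` workaround) matters.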
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users