I’ll create a patch that you can try. If it works okay, we can commit it.

> On Jan 18, 2017, at 3:29 AM, William Hay <w....@ucl.ac.uk> wrote:
> 
> On Tue, Jan 17, 2017 at 09:56:54AM -0800, r...@open-mpi.org wrote:
>> As I recall, the problem was that qrsh isn't available on the backend
>> compute nodes, and so we can't use a tree for launch. If that isn't
>> true, then we can certainly adjust it.
>> 
> qrsh should be available on all nodes of a SoGE cluster but, depending on how
> things are set up, may not be findable (i.e. not in the PATH) when you
> qrsh -inherit into a node. A workaround would be to start backend processes
> with qrsh -inherit -v PATH, which will copy the PATH from the master node to
> the slave node process, or otherwise pass the location of qrsh from one node
> to another. That of course assumes that qrsh is in the same location on all
> nodes.
> 
> I've tested that it is possible to qrsh from the head node of a job to a
> slave node and then on to another slave node by this method.
> 
> William
> 
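The workaround William describes above (forwarding PATH so that qrsh stays findable on the next hop) can be sketched as a small command builder. This is only an illustration, not Open MPI's launcher code: the helper name build_qrsh_inherit_argv, the host name node042, and the use of hostname as the remote command are all placeholders chosen for the example. Only qrsh, -inherit, and -v PATH come from the message itself.

```python
import shlex


def build_qrsh_inherit_argv(host, command, forward_vars=("PATH",)):
    """Build an argv list for `qrsh -inherit` that forwards environment variables.

    Hypothetical helper for illustration; `host` and `command` are
    supplied by the caller and are not taken from the thread.
    """
    argv = ["qrsh", "-inherit"]
    for name in forward_vars:
        # `-v NAME` asks qrsh to copy NAME from the calling environment
        argv += ["-v", name]
    argv.append(host)
    argv += list(command)
    return argv


if __name__ == "__main__":
    # Placeholder host and command, printed as a shell-quoted line
    print(shlex.join(build_qrsh_inherit_argv("node042", ["hostname"])))
```

As William notes, this only helps if qrsh lives in the same location on every node.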
> 
>>> On Jan 17, 2017, at 9:37 AM, Mark Dixon <m.c.di...@leeds.ac.uk> wrote:
>>> 
>>> Hi,
>>> 
>>> While commissioning a new cluster, I wanted to run HPL across the whole 
>>> thing using openmpi 2.0.1.
>>> 
>>> I couldn't get it to start on more than 129 hosts under Son of Gridengine 
>>> (128 remote plus the localhost running the mpirun command). openmpi would 
>>> sit there, waiting for all the orted's to check in; however, there were 
>>> "only" a maximum of 128 qrsh processes, therefore a maximum of 128 orted's, 
>>> therefore waiting a loooong time.
>>> 
>>> Increasing plm_rsh_num_concurrent beyond the default of 128 gets the job to 
>>> launch.
>>> 
>>> Is this intentional, please?
>>> 
>>> Doesn't openmpi use a tree-like startup sometimes - any particular reason 
>>> it's not using it here?
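The interim workaround Mark reports (raising plm_rsh_num_concurrent above its default of 128) can be sketched as follows. This is a rough illustration under stated assumptions: the helper name build_mpirun_argv and the ./xhpl program path are placeholders, and setting the limit to the number of remote hosts is a guess at a sufficient value, since the message only says that increasing it beyond 128 lets the job launch.

```python
import shlex

# Default value reported in the message above
DEFAULT_NUM_CONCURRENT = 128


def build_mpirun_argv(total_hosts, program):
    """Build an mpirun argv, raising plm_rsh_num_concurrent when needed.

    Hypothetical helper: one host runs mpirun itself, so the remaining
    total_hosts - 1 hosts each need a qrsh session.
    """
    remote_hosts = max(total_hosts - 1, 0)
    argv = ["mpirun"]
    if remote_hosts > DEFAULT_NUM_CONCURRENT:
        # Assumption: a limit equal to the remote host count is enough
        argv += ["--mca", "plm_rsh_num_concurrent", str(remote_hosts)]
    argv += list(program)
    return argv


if __name__ == "__main__":
    # 200 hosts is an arbitrary example size; ./xhpl is a placeholder path
    print(shlex.join(build_mpirun_argv(200, ["./xhpl"])))
```

With 129 hosts or fewer the helper leaves the default untouched, matching the observation that launches up to that size worked.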
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
