Hi Reuti

As far as I am concerned, you SGE users “own” the SGE support - so feel free to 
submit a patch!

Ralph

> On Sep 13, 2017, at 9:10 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> 
> Hi,
> 
> I wonder whether it has ever been discussed that SGE can behave similarly 
> to Torque/PBS regarding the mangling of hostnames. It's similar to 
> https://github.com/open-mpi/ompi/issues/2328, in that a node can have 
> multiple network interfaces, each with a unique name. SGE's operation can 
> be routed to a specific network interface by means of a file:
> 
> $SGE_ROOT/$SGE_CELL/common/host_aliases
> 
> which has the format:
> 
> <sge-name of the node> <one or more blanks> <real long or short hostname>
> 
> Hence in the generated $PE_HOSTFILE the name known to SGE is listed, although 
> the `hostname` command provides the real name. Open MPI would in this case 
> start a `qrsh -inherit …` call instead of forking, as it thinks that these 
> are different machines (assuming an allocation_rule of $PE_SLOTS so that the 
> `mpiexec` is supposed to be on the same machine as the started tasks).
> 
> I tried the "old" way of providing a start_proc_args to the PE to create a 
> symbolic link to `hostname` in $TMPDIR, so that an adjusted `hostname` call 
> is available inside the job script, but obviously Open MPI calls 
> gethostname() directly rather than an external binary.
> 
> So I mangled the hostname in the created machinefile in the jobscript to feed 
> an "adjusted" $PE_HOSTFILE to Open MPI and then it's working as intended: 
> Open MPI creates forks.
> 
> Does anyone else need such a patch in Open MPI and is it suitable to be 
> included?
> 
> -- Reuti
> 
> PS: Only the headnodes have more than one network interface in our case, so 
> this didn't come to my attention until now, when the need arose to also use 
> some cores on the headnodes. They are known internally to SGE as "login" and 
> "master", but the external names may be "foo" and "baz", which 
> gethostname() returns.
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
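
For anyone who wants to try the workaround described in the quoted message before a patch exists, here is a rough sketch of the hostfile rewriting step. It is not Reuti's actual script and not anything Open MPI ships. It assumes the host_aliases format quoted above (SGE name first, real hostname after it) and that each $PE_HOSTFILE line starts with the host name followed by further whitespace-separated fields. The function names `parse_host_aliases` and `rewrite_hostfile` are made up for illustration, and when several real names are listed the sketch simply picks the first one.

```python
def parse_host_aliases(text):
    """Map each SGE-internal name to the first real hostname on its line.

    Assumed line format (taken from the message above):
        <sge-name> <one or more blanks> <real hostname> [...]
    Lines with fewer than two fields are ignored.
    """
    sge_to_real = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        sge_to_real[fields[0]] = fields[1]
    return sge_to_real


def rewrite_hostfile(hostfile_text, sge_to_real):
    """Replace the host name in the first column where an alias is known.

    All other columns are passed through unchanged; hosts without an
    entry in the alias map keep their original name.
    """
    out = []
    for line in hostfile_text.splitlines():
        fields = line.split()
        if fields:
            fields[0] = sge_to_real.get(fields[0], fields[0])
        out.append(" ".join(fields))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical sample data using the names from the PS above.
    aliases = "login foo\nmaster   baz\n"
    hostfile = "login 4 all.q@login UNDEFINED\nnode01 8 all.q@node01 UNDEFINED\n"
    print(rewrite_hostfile(hostfile, parse_host_aliases(aliases)), end="")
```

In a job script one would presumably write the result to a file under $TMPDIR and point $PE_HOSTFILE at it before calling `mpiexec`, which matches the "adjusted" $PE_HOSTFILE approach described above. Whether the short or the long hostname is the right replacement depends on what gethostname() returns on the node in question.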
