Here is most of the command output when run on a grid machine:

mpiexec --version
mpiexec (OpenRTE) 2.0.2

ompi_info | grep grid
                 MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component v2.0.2)

The qconf command you suggested gives:
denied: host "" is neither submit nor admin host

Does that suggest anything?

qconf is restricted to sysadmins, which I am not.
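Since I can't run qconf, one thing I can check from inside a job is which SGE variables the job environment actually carries. The sketch below rests on an assumption I have not verified against the Open MPI 2.0.2 source: that its gridengine support looks for SGE_ROOT, ARC, PE_HOSTFILE and JOB_ID, and that the launcher switches from ssh to qrsh only when both SGE_ROOT and ARC are set. `report_sge_env` and `would_use_qrsh` are hypothetical helper names, not Open MPI functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Variables we ASSUME Open MPI's gridengine support looks for before it
 * treats the job as an SGE parallel job (not verified against 2.0.2). */
static const char *const sge_vars[] = {"SGE_ROOT", "ARC", "PE_HOSTFILE", "JOB_ID"};
#define N_SGE_VARS (sizeof sge_vars / sizeof sge_vars[0])

/* Assumed launcher rule: qrsh only when both SGE_ROOT and ARC are set;
 * otherwise the rsh/ssh launcher is used. */
bool would_use_qrsh(const char *sge_root, const char *arc)
{
    return sge_root != NULL && arc != NULL;
}

/* Print each variable as this process sees it; true if all of them are set. */
bool report_sge_env(FILE *out)
{
    assert(out != NULL);
    bool all_set = true;
    for (size_t i = 0; i < N_SGE_VARS; i++) {
        const char *value = getenv(sge_vars[i]);
        fprintf(out, "%-12s %s\n", sge_vars[i], value != NULL ? value : "(unset)");
        if (value == NULL)
            all_set = false;
    }
    fprintf(out, "launcher (assumed rule): %s\n",
            would_use_qrsh(getenv("SGE_ROOT"), getenv("ARC")) ? "qrsh" : "ssh");
    return all_set;
}
```

For example, `int main(void) { return report_sge_env(stdout) ? 0 : 1; }` built with mpicc and called from runme in place of `./hello` would print the variables as the job sees them. If SGE_ROOT or ARC came back unset, that would fit the `/usr/bin/ssh` launch in the log quoted below.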

I would note that we are running Debian stretch on the cluster machines.
On some of our other (non-grid) machines, running Debian buster, the output
is:

mpiexec --version
mpiexec (OpenRTE) 3.1.3
Report bugs to

ompi_info | grep grid
                 MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component
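So the stretch cluster runs Open MPI 2.0.2 and the buster machines run 3.1.3, a full major release apart. If that needs checking across many hosts, a small helper can parse the first line of `mpiexec --version` and compare releases. This is a sketch that assumes the line always has the `mpiexec (OpenRTE) X.Y.Z` form seen above; `parse_ompi_version` and `compare_versions` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Parse a line such as "mpiexec (OpenRTE) 2.0.2" into {major, minor, patch}.
 * Assumes the exact prefix shown by the stretch and buster machines. */
bool parse_ompi_version(const char *line, int v[3])
{
    assert(line != NULL && v != NULL);
    return sscanf(line, "mpiexec (OpenRTE) %d.%d.%d", &v[0], &v[1], &v[2]) == 3;
}

/* Compare two parsed versions: negative, zero or positive, like strcmp. */
int compare_versions(const int a[3], const int b[3])
{
    for (int i = 0; i < 3; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}
```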


-David Laidlaw

On Thu, Jul 25, 2019 at 2:13 PM Reuti <> wrote:

> On 25.07.2019 at 18:59, David Laidlaw via users wrote:
> > I have been trying to run some MPI jobs under SGE for almost a year
> > without success. What seems like a very simple test program fails; the
> > ingredients of it are below. Any suggestions on any piece of the test,
> > reasons for failure, requests for additional info, configuration thoughts,
> > etc. would be much appreciated. I suspect the linkage between SGE and MPI,
> > but can't identify the problem. We do have SGE support built into MPI. We
> > also have the SGE parallel environment (PE) set up as described in several
> > places on the web.
> >
> > Many thanks for any input!
> Did you compile Open MPI on your own or was it delivered with the Linux
> distribution? That it tries to use `ssh` is quite strange, as nowadays Open
> MPI and others have built-in support to detect that they are running under
> the control of a queuing system. It should use `qrsh` in your case.
>
> What does:
>
>    mpiexec --version
>    ompi_info | grep grid
>
> reveal? What does:
>
>    qconf -sconf | egrep "(command|daemon)"
>
> show?
>
> -- Reuti
> > Cheers,
> >
> > -David Laidlaw
> >
> > Here is how I submit the job:
> >
> >    /usr/bin/qsub /gpfs/main/home/dhl/liggghtsTest/hello2/runme
> >
> >
> > Here is what is in runme:
> >
> >   #!/bin/bash
> >   #$ -cwd
> >   #$ -pe orte_fill 1
> >   env PATH="$PATH" /usr/bin/mpirun --mca plm_base_verbose 1 -display-allocation ./hello
> >
> >
> > Here is hello.c:
> >
> > #include <mpi.h>
> > #include <stdio.h>
> > #include <unistd.h>
> > #include <stdlib.h>
> >
> > int main(int argc, char** argv) {
> >     // Initialize the MPI environment
> >     MPI_Init(NULL, NULL);
> >
> >     // Get the number of processes
> >     int world_size;
> >     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
> >
> >     // Get the rank of the process
> >     int world_rank;
> >     MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
> >
> >     // Get the name of the processor
> >     char processor_name[MPI_MAX_PROCESSOR_NAME];
> >     int name_len;
> >     MPI_Get_processor_name(processor_name, &name_len);
> >
> >     // Print off a hello world message
> >     printf("Hello world from processor %s, rank %d out of %d processors\n",
> >            processor_name, world_rank, world_size);
> >     // system("printenv");
> >
> >     sleep(15); // sleep for 15 seconds
> >
> >     // Finalize the MPI environment.
> >     MPI_Finalize();
> > }
> >
> >
> > This command will build it:
> >
> >      mpicc hello.c -o hello
> >
> >
> > Running produces the following:
> >
> > /var/spool/gridengine/execd/dblade01/active_jobs/1895308.1/pe_hostfile
> >
> > --------------------------------------------------------------------------
> > ORTE was unable to reliably start one or more daemons.
> > This usually is caused by:
> >
> > * not finding the required libraries and/or binaries on
> >   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
> >   settings, or configure OMPI with --enable-orterun-prefix-by-default
> >
> > * lack of authority to execute on one or more specified nodes.
> >   Please verify your allocation and authorities.
> >
> > * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
> >   Please check with your sys admin to determine the correct location to use.
> >
> > *  compilation of the orted with dynamic libraries when static are required
> >   (e.g., on Cray). Please check your configure cmd line and consider using
> >   one of the contrib/platform definitions for your system type.
> >
> > * an inability to create a connection back to mpirun due to a
> >   lack of common network interfaces and/or no route found between
> >   them. Please check network connectivity (including firewalls
> >   and network routing requirements).
> >
> > --------------------------------------------------------------------------
> >
> >
> > and:
> >
> > [dblade01:10902] [[37323,0],0] plm:rsh: final template argv:
> >         /usr/bin/ssh <template>     set path = ( /usr/bin $path ) ;
> >         if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ;
> >         if ( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /usr/lib ;
> >         if ( $?OMPI_have_llp == 1 ) setenv
> >         _PATH == 1 ) set OMPI_have_dllp ;
> >         if ( $?DYLD_LIBRARY_PATH == 0 ) setenv DYLD_LIBRARY_PATH /usr/lib ;
> >         if ( $?OMPI_have_dllp == 1 ) setenv DYLD_LIBRARY_PATH /usr/lib:$DYLD_LIBRARY_PATH ;
> >         /usr/bin/orted --hnp-topo-sig 0N:2S:0L3:4L2:4L1:4C:4H:x86_64
> >         -mca ess "env" -mca ess_base_jobid "2446000128" -mca ess_base_vpid "<template>"
> >         -mca ess_base_num_procs "2" -mca orte_hnp_uri "2446000128.0;usock;tcp://"
> >         --mca plm_base_verbose "1" -mca plm "rsh" -mca orte_display_alloc "1"
> >         -mca pmix "^s1,s2,cray"
> > ssh_exchange_identification: read: Connection reset by peer
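The first line of that job output is the path of the job's pe_hostfile. To see what the orte_fill PE actually allocated, a helper like the one below could print that file from inside the job. It is a sketch that assumes the layout described in the SGE documentation (one line per host: host name, slot count, queue, processor range); `struct pe_host`, `parse_pe_line` and `count_pe_slots` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One pe_hostfile entry. Assumed layout, per SGE documentation:
 * host name, slot count, queue, processor range (the range is ignored). */
struct pe_host {
    char host[256];
    int slots;
    char queue[256];
};

/* Parse one line; true if a host name and a positive slot count were found. */
bool parse_pe_line(const char *line, struct pe_host *out)
{
    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out); /* missing fields stay empty / zero */
    /* %255s leaves room for the terminating NUL in each 256-byte buffer. */
    int n = sscanf(line, "%255s %d %255s", out->host, &out->slots, out->queue);
    return n >= 2 && out->slots > 0;
}

/* Print every entry of a pe_hostfile and return the total slot count,
 * or -1 if path is NULL or the file cannot be opened. */
int count_pe_slots(const char *path)
{
    if (path == NULL)
        return -1;
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char line[1024];
    struct pe_host entry;
    int total = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_pe_line(line, &entry)) {
            printf("%s: %d slot(s), queue %s\n", entry.host, entry.slots,
                   entry.queue[0] != '\0' ? entry.queue : "(none)");
            total += entry.slots;
        }
    }
    fclose(fp);
    return total;
}
```

Calling `count_pe_slots(getenv("PE_HOSTFILE"))` from a small `main` (which would also need `<stdlib.h>` for `getenv`) in runme would list the granted hosts; with `-pe orte_fill 1` one would expect a total of 1.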
> >
> >
> >
> > _______________________________________________
> > users mailing list