Here is most of the command output when run on a grid machine:
dblade65.dhl(101) mpiexec --version
mpiexec (OpenRTE) 2.0.2
dblade65.dhl(102) ompi_info | grep grid
                 MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component v2.0.2)
dblade65.dhl(103) c
denied: host "dblade65.cs.brown.edu" is neither submit nor admin host

Does that suggest anything? qconf is restricted to sysadmins, which I am not. I would note that we are running Debian stretch on the cluster machines.

On some of our other (non-grid) machines, running Debian buster, the output is:

cslab3d.dhl(101) mpiexec --version
mpiexec (OpenRTE) 3.1.3
Report bugs to http://www.open-mpi.org/community/help/
cslab3d.dhl(102) ompi_info | grep grid
                 MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component v3.1.3)

Thanks!

-David Laidlaw


On Thu, Jul 25, 2019 at 2:13 PM Reuti <re...@staff.uni-marburg.de> wrote:
>
> On 25.07.2019 at 18:59, David Laidlaw via users wrote:
>
> > I have been trying to run some MPI jobs under SGE for almost a year
> > without success. What seems like a very simple test program fails; the
> > ingredients of it are below. Any suggestions on any piece of the test,
> > reasons for failure, requests for additional info, configuration
> > thoughts, etc. would be much appreciated. I suspect the linkage between
> > SGE and MPI, but can't identify the problem. We do have SGE support
> > built into MPI. We also have the SGE parallel environment (PE) set up
> > as described in several places on the web.
> >
> > Many thanks for any input!
>
> Did you compile Open MPI on your own, or was it delivered with the Linux
> distribution? That it tries to use `ssh` is quite strange, as nowadays
> Open MPI and others have built-in support to detect that they are running
> under the control of a queuing system. It should use `qrsh` in your case.
>
> What does:
>
> mpiexec --version
> ompi_info | grep grid
>
> reveal? What does:
>
> qconf -sconf | egrep "(command|daemon)"
>
> show?
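[Editor's aside: since `qconf` refuses to run on the compute node (it is neither a submit nor an admin host), the check Reuti asks for would have to be run by a sysadmin or from a submit host. For orientation only, on a gridengine cluster configured for tight integration, the lines his `egrep "(command|daemon)"` pulls out of `qconf -sconf` typically look like the sketch below. These are common defaults on modern Grid Engine installs, not values confirmed for this cluster; older setups instead list explicit `ssh`/`sshd` paths here, which is one way `ssh` can end up in the startup path.]

```
qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin
```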
> --
> Reuti
>
> > Cheers,
> >
> > -David Laidlaw
> >
> >
> > Here is how I submit the job:
> >
> > /usr/bin/qsub /gpfs/main/home/dhl/liggghtsTest/hello2/runme
> >
> >
> > Here is what is in runme:
> >
> > #!/bin/bash
> > #$ -cwd
> > #$ -pe orte_fill 1
> > env PATH="$PATH" /usr/bin/mpirun --mca plm_base_verbose 1 -display-allocation ./hello
> >
> >
> > Here is hello.c:
> >
> > #include <mpi.h>
> > #include <stdio.h>
> > #include <unistd.h>
> > #include <stdlib.h>
> >
> > int main(int argc, char** argv) {
> >     // Initialize the MPI environment
> >     MPI_Init(NULL, NULL);
> >
> >     // Get the number of processes
> >     int world_size;
> >     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
> >
> >     // Get the rank of the process
> >     int world_rank;
> >     MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
> >
> >     // Get the name of the processor
> >     char processor_name[MPI_MAX_PROCESSOR_NAME];
> >     int name_len;
> >     MPI_Get_processor_name(processor_name, &name_len);
> >
> >     // Print off a hello world message
> >     printf("Hello world from processor %s, rank %d out of %d processors\n",
> >            processor_name, world_rank, world_size);
> >     // system("printenv");
> >
> >     sleep(15); // sleep for 15 seconds
> >
> >     // Finalize the MPI environment.
> >     MPI_Finalize();
> > }
> >
> >
> > This command will build it:
> >
> > mpicc hello.c -o hello
> >
> >
> > Running produces the following:
> >
> > /var/spool/gridengine/execd/dblade01/active_jobs/1895308.1/pe_hostfile
> > dblade01.cs.brown.edu 1 shor...@dblade01.cs.brown.edu UNDEFINED
> >
> > --------------------------------------------------------------------------
> > ORTE was unable to reliably start one or more daemons.
> > This usually is caused by:
> >
> > * not finding the required libraries and/or binaries on
> >   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
> >   settings, or configure OMPI with --enable-orterun-prefix-by-default
> >
> > * lack of authority to execute on one or more specified nodes.
> >   Please verify your allocation and authorities.
> >
> > * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
> >   Please check with your sys admin to determine the correct location to use.
> >
> > * compilation of the orted with dynamic libraries when static are required
> >   (e.g., on Cray). Please check your configure cmd line and consider using
> >   one of the contrib/platform definitions for your system type.
> >
> > * an inability to create a connection back to mpirun due to a
> >   lack of common network interfaces and/or no route found between
> >   them. Please check network connectivity (including firewalls
> >   and network routing requirements).
> > --------------------------------------------------------------------------
> >
> >
> > and:
> >
> > [dblade01:10902] [[37323,0],0] plm:rsh: final template argv:
> >     /usr/bin/ssh <template> set path = ( /usr/bin $path ) ;
> >     if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ;
> >     if ( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /usr/lib ;
> >     if ( $?OMPI_have_llp == 1 ) setenv LD_LIBRARY_PATH /usr/lib:$LD_LIBRARY_PATH ;
> >     if ( $?DYLD_LIBRARY_PATH == 1 ) set OMPI_have_dllp ;
> >     if ( $?DYLD_LIBRARY_PATH == 0 ) setenv DYLD_LIBRARY_PATH /usr/lib ;
> >     if ( $?OMPI_have_dllp == 1 ) setenv DYLD_LIBRARY_PATH /usr/lib:$DYLD_LIBRARY_PATH ;
> >     /usr/bin/orted --hnp-topo-sig 0N:2S:0L3:4L2:4L1:4C:4H:x86_64
> >     -mca ess "env" -mca ess_base_jobid "2446000128"
> >     -mca ess_base_vpid "<template>" -mca ess_base_num_procs "2"
> >     -mca orte_hnp_uri "2446000128.0;usock;tcp://10.116.85.90:44791"
> >     --mca plm_base_verbose "1" -mca plm "rsh" -mca orte_display_alloc "1"
> >     -mca pmix "^s1,s2,cray"
> > ssh_exchange_identification: read: Connection reset by peer
> >
> >
> >
> > _______________________________________________
> > users mailing list
> > users@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/users