Re: [OMPI users] can't run MPI job under SGE

2019-07-29 Thread Reuti via users


> On 29.07.2019 at 17:17, David Laidlaw wrote:
> 
> I will try building a newer ompi version in my home directory, but that will 
> take me some time.
> 
> qconf is not available to me on any machine.  It provides that same error 
> wherever I am able to try it:
> > denied: host "..." is neither submit nor admin host
> 
> Here is what it produces when I have a sysadmin run it:
> $ qconf -sconf | egrep "(command|daemon)"
> qlogin_command   /sysvol/sge.test/bin/qlogin-wrapper
> qlogin_daemon/sysvol/sge.test/bin/grid-sshd -i
> rlogin_command   builtin
> rlogin_daemonbuiltin
> rsh_command  builtin
> rsh_daemon   builtin

That's fine. I wondered whether rsh_* would contain a redirection to `ssh`
(to find the source of the `ssh` used in your error output).
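
For comparison (an illustrative sketch only, not taken from your cluster): on a
setup where the remote startup had been redirected to `ssh`, one would instead
expect something like:

  rsh_command  /usr/bin/ssh
  rsh_daemon   /usr/sbin/sshd -i

With `builtin` shown above, SGE uses its own startup mechanism, so the `ssh` in
the error output presumably comes from Open MPI's own rsh launcher.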

-- Reuti


Re: [OMPI users] can't run MPI job under SGE

2019-07-29 Thread David Laidlaw via users
I will try building a newer ompi version in my home directory, but that
will take me some time.

qconf is not available to me on any machine.  It provides that same error
wherever I am able to try it:

> denied: host ". .." is neither submit nor admin host


Here is what it produces when I have a sysadmin run it:

$ qconf -sconf | egrep "(command|daemon)"
qlogin_command   /sysvol/sge.test/bin/qlogin-wrapper
qlogin_daemon/sysvol/sge.test/bin/grid-sshd -i
rlogin_command   builtin
rlogin_daemonbuiltin
rsh_command  builtin
rsh_daemon   builtin


Does that suggest anything?

Thanks!

-David Laidlaw




On Thu, Jul 25, 2019 at 5:21 PM Reuti  wrote:

>
> On 25.07.2019 at 23:00, David Laidlaw wrote:
>
> > Here is most of the command output when run on a grid machine:
> >
> > dblade65.dhl(101) mpiexec --version
> > mpiexec (OpenRTE) 2.0.2
>
> This is somewhat old. I would suggest installing a fresh one. You can
> even compile one in your home directory and install it, e.g., in
> $HOME/local/openmpi-3.1.4_gcc-7.4.0_shared (by --prefix=…intended path…),
> and then use this installation for all your jobs (adjust for your version
> of gcc). In your ~/.bash_profile and the job script:
>
> DEFAULT_MANPATH="$(manpath -q)"
> MY_OMPI="$HOME/local/openmpi-3.1.4_gcc-7.4.0_shared"
> export PATH="$MY_OMPI/bin:$PATH"
> export
> LD_LIBRARY_PATH="$MY_OMPI/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
> export MANPATH="$MY_OMPI/share/man${DEFAULT_MANPATH:+:$DEFAULT_MANPATH}"
> unset MY_OMPI
> unset DEFAULT_MANPATH
>
> Essentially there is no conflict with the already installed version.
>
>
> > dblade65.dhl(102) ompi_info | grep grid
> >  MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component
> v2.0.2)
> > dblade65.dhl(103) c
> > denied: host "dblade65.cs.brown.edu" is neither submit nor admin host
> > dblade65.dhl(104)
>
> On a node it’s ok this way.
>
>
> > Does that suggest anything?
> >
> > qconf is restricted to sysadmins, which I am not.
>
> What error is output if you try it anyway? Usually viewing is possible for
> everyone.
>
>
> > I would note that we are running debian stretch on the cluster
> machines.  On some of our other (non-grid) machines, running debian buster,
> the output is:
> >
> > cslab3d.dhl(101) mpiexec --version
> > mpiexec (OpenRTE) 3.1.3
> > Report bugs to http://www.open-mpi.org/community/help/
> > cslab3d.dhl(102) ompi_info | grep grid
> >  MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component
> v3.1.3)
>
> If you compile on such a machine and intend to run it in the cluster, it
> won't work, as the versions don't match. Hence the suggestion above: use a
> personal version in your $HOME for both compiling and running the
> applications.
>
> Side note: Open MPI binds the processes to cores by default. In case more
> than one MPI job is running on a node one will have to use `mpiexec
> --bind-to none …` as otherwise all jobs on this node will use core 0
> upwards.
>
> -- Reuti
>
>
> > Thanks!
> >
> > -David Laidlaw
> >
> > On Thu, Jul 25, 2019 at 2:13 PM Reuti 
> wrote:
> >
> > On 25.07.2019 at 18:59, David Laidlaw via users wrote:
> >
> > > I have been trying to run some MPI jobs under SGE for almost a year
> without success.  What seems like a very simple test program fails; the
> ingredients of it are below.  Any suggestions on any piece of the test,
> reasons for failure, requests for additional info, configuration thoughts,
> etc. would be much appreciated.  I suspect the linkage between SGE and MPI,
> but can't identify the problem.  We do have SGE support built into MPI.  We
> also have the SGE parallel environment (PE) set up as described in several
> places on the web.
> > >
> > > Many thanks for any input!
> >
> > Did you compile Open MPI on your own or was it delivered with the Linux
> distribution? That it tries to use `ssh` is quite strange, as nowadays Open
> MPI and others have built-in support to detect that they are running under
> the control of a queuing system. It should use `qrsh` in your case.
> >
> > What does:
> >
> > mpiexec --version
> > ompi_info | grep grid
> >
> > reveal? What does:
> >
> > qconf -sconf | egrep "(command|daemon)"
> >
> > show?
> >
> > -- Reuti
> >
> >
> > > Cheers,
> > >
> > > -David Laidlaw
> > >
> > >
> > >
> > >
> > > Here is how I submit the job:
> > >
> > >/usr/bin/qsub /gpfs/main/home/dhl/liggghtsTest/hello2/runme
> > >
> > >
> > > Here is what is in runme:
> > >
> > >   #!/bin/bash
> > >   #$ -cwd
> > >   #$ -pe orte_fill 1
> > >   env PATH="$PATH" /usr/bin/mpirun --mca plm_base_verbose 1 -display-allocation ./hello
> > >
> > >
> > > Here is hello.c:
> > >
> > > #include <mpi.h>
> > > #include <stdio.h>
> > > #include <stdlib.h>
> > > #include <unistd.h>
> > >
> > > int main(int argc, char** argv) {
> > > // Initialize the MPI environment
> > > MPI_Init(NULL, NULL);
> > >
> > > // Get the number of processes
> > > 

Re: [OMPI users] can't run MPI job under SGE

2019-07-25 Thread Reuti via users


On 25.07.2019 at 23:00, David Laidlaw wrote:

> Here is most of the command output when run on a grid machine:
> 
> dblade65.dhl(101) mpiexec --version
> mpiexec (OpenRTE) 2.0.2

This is somewhat old. I would suggest installing a fresh one. You can even
compile one in your home directory and install it, e.g., in
$HOME/local/openmpi-3.1.4_gcc-7.4.0_shared (by --prefix=…intended path…) and
then use this installation for all your jobs (adjust for your version of gcc).
In your ~/.bash_profile and the job script:

DEFAULT_MANPATH="$(manpath -q)"
MY_OMPI="$HOME/local/openmpi-3.1.4_gcc-7.4.0_shared"
export PATH="$MY_OMPI/bin:$PATH"
export LD_LIBRARY_PATH="$MY_OMPI/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export MANPATH="$MY_OMPI/share/man${DEFAULT_MANPATH:+:$DEFAULT_MANPATH}"
unset MY_OMPI
unset DEFAULT_MANPATH

Essentially there is no conflict with the already installed version.
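
A minimal build sketch along those lines (the download URL, gcc version and
-j value are only examples; --with-sge ensures the gridengine support is built):

  wget https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.4.tar.bz2
  tar xjf openmpi-3.1.4.tar.bz2
  cd openmpi-3.1.4
  ./configure --prefix=$HOME/local/openmpi-3.1.4_gcc-7.4.0_shared --with-sge
  make -j4 && make install

Afterwards recompile hello.c with the mpicc from this installation and let the
job script call this mpirun (via the PATH set above) instead of /usr/bin/mpirun.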


> dblade65.dhl(102) ompi_info | grep grid
>  MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component 
> v2.0.2)
> dblade65.dhl(103) c
> denied: host "dblade65.cs.brown.edu" is neither submit nor admin host
> dblade65.dhl(104) 

On a node it’s ok this way.


> Does that suggest anything?
> 
> qconf is restricted to sysadmins, which I am not.

What error is output if you try it anyway? Usually viewing is possible for
everyone.


> I would note that we are running debian stretch on the cluster machines.  On 
> some of our other (non-grid) machines, running debian buster, the output is:
> 
> cslab3d.dhl(101) mpiexec --version
> mpiexec (OpenRTE) 3.1.3
> Report bugs to http://www.open-mpi.org/community/help/
> cslab3d.dhl(102) ompi_info | grep grid
>  MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component 
> v3.1.3)

If you compile on such a machine and intend to run it in the cluster, it won't
work, as the versions don't match. Hence the suggestion above: use a personal
version in your $HOME for both compiling and running the applications.

Side note: Open MPI binds the processes to cores by default. In case more than 
one MPI job is running on a node one will have to use `mpiexec --bind-to none 
…` as otherwise all jobs on this node will use core 0 upwards.
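
Applied to the runme script from the first mail, that would be (a sketch,
leaving the rest of the command unchanged):

  env PATH="$PATH" /usr/bin/mpirun --bind-to none --mca plm_base_verbose 1 -display-allocation ./hello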

-- Reuti


> Thanks!
> 
> -David Laidlaw
> 
> On Thu, Jul 25, 2019 at 2:13 PM Reuti  wrote:
> 
> On 25.07.2019 at 18:59, David Laidlaw via users wrote:
> 
> > I have been trying to run some MPI jobs under SGE for almost a year without 
> > success.  What seems like a very simple test program fails; the ingredients 
> > of it are below.  Any suggestions on any piece of the test, reasons for 
> > failure, requests for additional info, configuration thoughts, etc. would 
> > be much appreciated.  I suspect the linkage between SGE and MPI, but can't
> > identify the problem.  We do have SGE support built into MPI.  We also have
> > the SGE parallel environment (PE) set up as described in several places on 
> > the web.
> > 
> > Many thanks for any input!
> 
> Did you compile Open MPI on your own or was it delivered with the Linux 
> distribution? That it tries to use `ssh` is quite strange, as nowadays Open 
> MPI and others have built-in support to detect that they are running under 
> the control of a queuing system. It should use `qrsh` in your case.
> 
> What does:
> 
> mpiexec --version
> ompi_info | grep grid
> 
> reveal? What does:
> 
> qconf -sconf | egrep "(command|daemon)"
> 
> show?
> 
> -- Reuti
> 
> 
> > Cheers,
> > 
> > -David Laidlaw
> > 
> > 
> > 
> > 
> > Here is how I submit the job:
> > 
> >/usr/bin/qsub /gpfs/main/home/dhl/liggghtsTest/hello2/runme
> > 
> > 
> > Here is what is in runme:
> > 
> >   #!/bin/bash
> >   #$ -cwd
> >   #$ -pe orte_fill 1
> >   env PATH="$PATH" /usr/bin/mpirun --mca plm_base_verbose 1 -display-allocation ./hello
> > 
> > 
> > Here is hello.c:
> > 
> > #include <mpi.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> > 
> > int main(int argc, char** argv) {
> > // Initialize the MPI environment
> > MPI_Init(NULL, NULL);
> > 
> > // Get the number of processes
> > int world_size;
> > MPI_Comm_size(MPI_COMM_WORLD, &world_size);
> > 
> > // Get the rank of the process
> > int world_rank;
> > MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
> > 
> > // Get the name of the processor
> > char processor_name[MPI_MAX_PROCESSOR_NAME];
> > int name_len;
> > MPI_Get_processor_name(processor_name, &name_len);
> > 
> > // Print off a hello world message
> > printf("Hello world from processor %s, rank %d out of %d processors\n",
> >processor_name, world_rank, world_size);
> > // system("printenv");
> > 
> > sleep(15); // sleep for 15 seconds
> > 
> > // Finalize the MPI environment.
> > MPI_Finalize();
> > }
> > 
> > 
> > This command will build it:
> > 
> >  mpicc hello.c -o hello
> > 
> > 
> > Running produces the following:
> > 
> > /var/spool/gridengine/execd/dblade01/active_jobs/1895308.1/pe_hostfile
> > dblade01.cs.brown.edu 1 

Re: [OMPI users] can't run MPI job under SGE

2019-07-25 Thread David Laidlaw via users
Here is most of the command output when run on a grid machine:


dblade65.dhl(101) mpiexec --version

mpiexec (OpenRTE) 2.0.2

dblade65.dhl(102) ompi_info | grep grid

 MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component
v2.0.2)

dblade65.dhl(103) c

denied: host "dblade65.cs.brown.edu" is neither submit nor admin host

dblade65.dhl(104)


Does that suggest anything?

qconf is restricted to sysadmins, which I am not.

I would note that we are running debian stretch on the cluster machines.
On some of our other (non-grid) machines, running debian buster, the output
is:

cslab3d.dhl(101) mpiexec --version

mpiexec (OpenRTE) 3.1.3

Report bugs to http://www.open-mpi.org/community/help/

cslab3d.dhl(102) ompi_info | grep grid

 MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component
v3.1.3)


Thanks!

-David Laidlaw

On Thu, Jul 25, 2019 at 2:13 PM Reuti  wrote:

>
> On 25.07.2019 at 18:59, David Laidlaw via users wrote:
>
> > I have been trying to run some MPI jobs under SGE for almost a year
> without success.  What seems like a very simple test program fails; the
> ingredients of it are below.  Any suggestions on any piece of the test,
> reasons for failure, requests for additional info, configuration thoughts,
> etc. would be much appreciated.  I suspect the linkage between SGE and MPI,
> but can't identify the problem.  We do have SGE support built into MPI.  We
> also have the SGE parallel environment (PE) set up as described in several
> places on the web.
> >
> > Many thanks for any input!
>
> Did you compile Open MPI on your own or was it delivered with the Linux
> distribution? That it tries to use `ssh` is quite strange, as nowadays Open
> MPI and others have built-in support to detect that they are running under
> the control of a queuing system. It should use `qrsh` in your case.
>
> What does:
>
> mpiexec --version
> ompi_info | grep grid
>
> reveal? What does:
>
> qconf -sconf | egrep "(command|daemon)"
>
> show?
>
> -- Reuti
>
>
> > Cheers,
> >
> > -David Laidlaw
> >
> >
> >
> >
> > Here is how I submit the job:
> >
> >/usr/bin/qsub /gpfs/main/home/dhl/liggghtsTest/hello2/runme
> >
> >
> > Here is what is in runme:
> >
> >   #!/bin/bash
> >   #$ -cwd
> >   #$ -pe orte_fill 1
> >   env PATH="$PATH" /usr/bin/mpirun --mca plm_base_verbose 1 -display-allocation ./hello
> >
> >
> > Here is hello.c:
> >
> > #include <mpi.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> >
> > int main(int argc, char** argv) {
> > // Initialize the MPI environment
> > MPI_Init(NULL, NULL);
> >
> > // Get the number of processes
> > int world_size;
> > MPI_Comm_size(MPI_COMM_WORLD, &world_size);
> >
> > // Get the rank of the process
> > int world_rank;
> > MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
> >
> > // Get the name of the processor
> > char processor_name[MPI_MAX_PROCESSOR_NAME];
> > int name_len;
> > MPI_Get_processor_name(processor_name, &name_len);
> >
> > // Print off a hello world message
> > printf("Hello world from processor %s, rank %d out of %d processors\n",
> >processor_name, world_rank, world_size);
> > // system("printenv");
> >
> > sleep(15); // sleep for 15 seconds
> >
> > // Finalize the MPI environment.
> > MPI_Finalize();
> > }
> >
> >
> > This command will build it:
> >
> >  mpicc hello.c -o hello
> >
> >
> > Running produces the following:
> >
> > /var/spool/gridengine/execd/dblade01/active_jobs/1895308.1/pe_hostfile
> > dblade01.cs.brown.edu 1 shor...@dblade01.cs.brown.edu UNDEFINED
> >
> --
> > ORTE was unable to reliably start one or more daemons.
> > This usually is caused by:
> >
> > * not finding the required libraries and/or binaries on
> >   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
> >   settings, or configure OMPI with --enable-orterun-prefix-by-default
> >
> > * lack of authority to execute on one or more specified nodes.
> >   Please verify your allocation and authorities.
> >
> > * the inability to write startup files into /tmp
> (--tmpdir/orte_tmpdir_base).
> >   Please check with your sys admin to determine the correct location to
> use.
> >
> > *  compilation of the orted with dynamic libraries when static are
> required
> >   (e.g., on Cray). Please check your configure cmd line and consider
> using
> >   one of the contrib/platform definitions for your system type.
> >
> > * an inability to create a connection back to mpirun due to a
> >   lack of common network interfaces and/or no route found between
> >   them. Please check network connectivity (including firewalls
> >   and network routing requirements).
> >
> --
> >
> >
> > and:
> >
> > [dblade01:10902] [[37323,0],0] plm:rsh: final template argv:
> > /usr/bin/ssh  set path = ( /usr/bin $path ) ; if (
> $?
> > LD_LIBRARY_PATH == 1 ) set 

Re: [OMPI users] can't run MPI job under SGE

2019-07-25 Thread David Laidlaw via users
Thanks for the input, John.  Here are some responses (inline):

On Thu, Jul 25, 2019 at 1:21 PM John Hearns via users <
users@lists.open-mpi.org> wrote:

> Have you checked your ssh between nodes?
>

ssh is not allowed between nodes, but my understanding is that processes
should be getting set up and run by SGE, since it handles the queuing.


> Also how is your Path set up?
>

It should be using the same startup scripts as I use on other machines
within our dept, since the filesystem and home directories are shared
across both grid and non-grid machines.  In any case, I have put in fully
qualified pathnames for everything that I start up.


> A. Construct a hosts file and mpirun by hand
>

I have looked at the hosts file, and it seems correct.  I don't know that I
can pass a hosts file to mpirun directly, since SGE queues things and
determines what hosts will be assigned.


>
> B. Use modules rather than .bashrc files
>

Hmm.  I don't really understand this one.  (I know what both are, but I
don't understand the problem that would be solved by converting to
modules.)


> C. Slurm
>

I don't run the grid/cluster, so I can't choose the queuing tools that are
run.  There are plans to migrate to slurm at some point in the future, but
that doesn't help me now...

Thanks!

-David Laidlaw


>
> On Thu, 25 Jul 2019, 18:00 David Laidlaw via users, <
> users@lists.open-mpi.org> wrote:
>
>> I have been trying to run some MPI jobs under SGE for almost a year
>> without success.  What seems like a very simple test program fails; the
>> ingredients of it are below.  Any suggestions on any piece of the test,
>> reasons for failure, requests for additional info, configuration thoughts,
>> etc. would be much appreciated.  I suspect the linkage between SGE and MPI,
>> but can't identify the problem.  We do have SGE support built into MPI.  We
>> also have the SGE parallel environment (PE) set up as described in several
>> places on the web.
>>
>> Many thanks for any input!
>>
>> Cheers,
>>
>> -David Laidlaw
>>
>>
>>
>>
>> Here is how I submit the job:
>>
>>/usr/bin/qsub /gpfs/main/home/dhl/liggghtsTest/hello2/runme
>>
>>
>> Here is what is in runme:
>>
>>   #!/bin/bash
>>   #$ -cwd
>>   #$ -pe orte_fill 1
>>   env PATH="$PATH" /usr/bin/mpirun --mca plm_base_verbose 1 -display-allocation ./hello
>>
>>
>> Here is hello.c:
>>
>> #include <mpi.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>>
>> int main(int argc, char** argv) {
>> // Initialize the MPI environment
>> MPI_Init(NULL, NULL);
>>
>> // Get the number of processes
>> int world_size;
>> MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>>
>> // Get the rank of the process
>> int world_rank;
>> MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
>>
>> // Get the name of the processor
>> char processor_name[MPI_MAX_PROCESSOR_NAME];
>> int name_len;
>> MPI_Get_processor_name(processor_name, &name_len);
>>
>> // Print off a hello world message
>> printf("Hello world from processor %s, rank %d out of %d processors\n",
>>processor_name, world_rank, world_size);
>> // system("printenv");
>>
>> sleep(15); // sleep for 15 seconds
>>
>> // Finalize the MPI environment.
>> MPI_Finalize();
>> }
>>
>>
>> This command will build it:
>>
>>  mpicc hello.c -o hello
>>
>>
>> Running produces the following:
>>
>> /var/spool/gridengine/execd/dblade01/active_jobs/1895308.1/pe_hostfile
>> dblade01.cs.brown.edu 1 shor...@dblade01.cs.brown.edu UNDEFINED
>> --
>> ORTE was unable to reliably start one or more daemons.
>> This usually is caused by:
>>
>> * not finding the required libraries and/or binaries on
>>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>>
>> * lack of authority to execute on one or more specified nodes.
>>   Please verify your allocation and authorities.
>>
>> * the inability to write startup files into /tmp
>> (--tmpdir/orte_tmpdir_base).
>>   Please check with your sys admin to determine the correct location to
>> use.
>>
>> *  compilation of the orted with dynamic libraries when static are
>> required
>>   (e.g., on Cray). Please check your configure cmd line and consider using
>>   one of the contrib/platform definitions for your system type.
>>
>> * an inability to create a connection back to mpirun due to a
>>   lack of common network interfaces and/or no route found between
>>   them. Please check network connectivity (including firewalls
>>   and network routing requirements).
>> --
>>
>>
>> and:
>>
>> [dblade01:10902] [[37323,0],0] plm:rsh: final template argv:
>> /usr/bin/ssh  set path = ( /usr/bin $path ) ; if (
>> $?
>> LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if ( $?LD_LIBRARY_PATH
>>  == 0 ) setenv LD_LIBRARY_PATH /usr/lib ; if ( $?OMPI_have_llp == 1 )
>> 

Re: [OMPI users] can't run MPI job under SGE

2019-07-25 Thread Reuti via users


On 25.07.2019 at 18:59, David Laidlaw via users wrote:

> I have been trying to run some MPI jobs under SGE for almost a year without 
> success.  What seems like a very simple test program fails; the ingredients 
> of it are below.  Any suggestions on any piece of the test, reasons for 
> failure, requests for additional info, configuration thoughts, etc. would be 
> much appreciated.  I suspect the linkage between SGIEand MPI, but can't 
> identify the problem.  We do have SGE support build into MPI.  We also have 
> the SGE parallel environment (PE) set up as described in several places on 
> the web.
> 
> Many thanks for any input!

Did you compile Open MPI on your own or was it delivered with the Linux 
distribution? That it tries to use `ssh` is quite strange, as nowadays Open MPI 
and others have built-in support to detect that they are running under the 
control of a queuing system. It should use `qrsh` in your case.
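
For reference, a sketch of a typical tight-integration PE (not this cluster's
actual settings); the decisive entry is control_slaves TRUE, which allows Open
MPI to start its daemons via `qrsh -inherit`:

  $ qconf -sp orte_fill
  pe_name            orte_fill
  slots              999
  user_lists         NONE
  xuser_lists        NONE
  start_proc_args    /bin/true
  stop_proc_args     /bin/true
  allocation_rule    $fill_up
  control_slaves     TRUE
  job_is_first_task  FALSE
  urgency_slots      min
  accounting_summary FALSE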

What does:

mpiexec --version
ompi_info | grep grid

reveal? What does:

qconf -sconf | egrep "(command|daemon)"

show?

-- Reuti


> Cheers,
> 
> -David Laidlaw
> 
> 
> 
> 
> Here is how I submit the job:
> 
>/usr/bin/qsub /gpfs/main/home/dhl/liggghtsTest/hello2/runme
> 
> 
> Here is what is in runme:
> 
>   #!/bin/bash
>   #$ -cwd
>   #$ -pe orte_fill 1
>   env PATH="$PATH" /usr/bin/mpirun --mca plm_base_verbose 1 -display-allocation ./hello
> 
> 
> Here is hello.c:
> 
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> 
> int main(int argc, char** argv) {
> // Initialize the MPI environment
> MPI_Init(NULL, NULL);
> 
> // Get the number of processes
> int world_size;
> MPI_Comm_size(MPI_COMM_WORLD, &world_size);
> 
> // Get the rank of the process
> int world_rank;
> MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
> 
> // Get the name of the processor
> char processor_name[MPI_MAX_PROCESSOR_NAME];
> int name_len;
> MPI_Get_processor_name(processor_name, &name_len);
> 
> // Print off a hello world message
> printf("Hello world from processor %s, rank %d out of %d processors\n",
>processor_name, world_rank, world_size);
> // system("printenv");
> 
> sleep(15); // sleep for 15 seconds
> 
> // Finalize the MPI environment.
> MPI_Finalize();
> }
> 
> 
> This command will build it:
> 
>  mpicc hello.c -o hello
> 
> 
> Running produces the following:
> 
> /var/spool/gridengine/execd/dblade01/active_jobs/1895308.1/pe_hostfile
> dblade01.cs.brown.edu 1 shor...@dblade01.cs.brown.edu UNDEFINED
> --
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
> 
> * not finding the required libraries and/or binaries on
>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>   settings, or configure OMPI with --enable-orterun-prefix-by-default
> 
> * lack of authority to execute on one or more specified nodes.
>   Please verify your allocation and authorities.
> 
> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>   Please check with your sys admin to determine the correct location to use.
> 
> *  compilation of the orted with dynamic libraries when static are required
>   (e.g., on Cray). Please check your configure cmd line and consider using
>   one of the contrib/platform definitions for your system type.
> 
> * an inability to create a connection back to mpirun due to a
>   lack of common network interfaces and/or no route found between
>   them. Please check network connectivity (including firewalls
>   and network routing requirements).
> --
> 
> 
> and:
> 
> [dblade01:10902] [[37323,0],0] plm:rsh: final template argv:
> /usr/bin/ssh  set path = ( /usr/bin $path ) ; if ( $?
> LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if ( $?LD_LIBRARY_PATH
>  == 0 ) setenv LD_LIBRARY_PATH /usr/lib ; if ( $?OMPI_have_llp == 1 ) setenv
> LD_LIBRARY_PATH /usr/lib:$LD_LIBRARY_PATH ; if ( $?DYLD_LIBRARY
> _PATH == 1 ) set OMPI_have_dllp ; if ( $?DYLD_LIBRARY_PATH == 0 ) setenv
> DYLD_LIBRARY_PATH /usr/lib ; if ( $?OMPI_have_dllp == 1 ) setenv DY
> LD_LIBRARY_PATH /usr/lib:$DYLD_LIBRARY_PATH ;   /usr/bin/orted --hnp-topo-sig
> 0N:2S:0L3:4L2:4L1:4C:4H:x86_64 -mca ess "env" -mca ess_base_jo
> bid "2446000128" -mca ess_base_vpid "" -mca ess_base_num_procs "2" -
> mca orte_hnp_uri "2446000128.0;usock;tcp://10.116.85.90:44791"
>  --mca plm_base_verbose "1" -mca plm "rsh" -mca orte_display_alloc "1" -mca
> pmix "^s1,s2,cray"
> ssh_exchange_identification: read: Connection reset by peer
> 
> 
> 


Re: [OMPI users] can't run MPI job under SGE

2019-07-25 Thread John Hearns via users
Have you checked your ssh between nodes?
Also how is your Path set up?
There is a difference between interactive and non-interactive login sessions.

I advise:
A. Construct a hosts file and mpirun by hand

B. Use modules rather than .bashrc files

C. Slurm

On Thu, 25 Jul 2019, 18:00 David Laidlaw via users, <
users@lists.open-mpi.org> wrote:

> I have been trying to run some MPI jobs under SGE for almost a year
> without success.  What seems like a very simple test program fails; the
> ingredients of it are below.  Any suggestions on any piece of the test,
> reasons for failure, requests for additional info, configuration thoughts,
> etc. would be much appreciated.  I suspect the linkage between SGE and MPI,
> but can't identify the problem.  We do have SGE support built into MPI.  We
> also have the SGE parallel environment (PE) set up as described in several
> places on the web.
>
> Many thanks for any input!
>
> Cheers,
>
> -David Laidlaw
>
>
>
>
> Here is how I submit the job:
>
>/usr/bin/qsub /gpfs/main/home/dhl/liggghtsTest/hello2/runme
>
>
> Here is what is in runme:
>
>   #!/bin/bash
>   #$ -cwd
>   #$ -pe orte_fill 1
>   env PATH="$PATH" /usr/bin/mpirun --mca plm_base_verbose 1 -display-allocation ./hello
>
>
> Here is hello.c:
>
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> int main(int argc, char** argv) {
> // Initialize the MPI environment
> MPI_Init(NULL, NULL);
>
> // Get the number of processes
> int world_size;
> MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>
> // Get the rank of the process
> int world_rank;
> MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
>
> // Get the name of the processor
> char processor_name[MPI_MAX_PROCESSOR_NAME];
> int name_len;
> MPI_Get_processor_name(processor_name, &name_len);
>
> // Print off a hello world message
> printf("Hello world from processor %s, rank %d out of %d processors\n",
>processor_name, world_rank, world_size);
> // system("printenv");
>
> sleep(15); // sleep for 15 seconds
>
> // Finalize the MPI environment.
> MPI_Finalize();
> }
>
>
> This command will build it:
>
>  mpicc hello.c -o hello
>
>
> Running produces the following:
>
> /var/spool/gridengine/execd/dblade01/active_jobs/1895308.1/pe_hostfile
> dblade01.cs.brown.edu 1 shor...@dblade01.cs.brown.edu UNDEFINED
> --
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
>
> * not finding the required libraries and/or binaries on
>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>
> * lack of authority to execute on one or more specified nodes.
>   Please verify your allocation and authorities.
>
> * the inability to write startup files into /tmp
> (--tmpdir/orte_tmpdir_base).
>   Please check with your sys admin to determine the correct location to
> use.
>
> *  compilation of the orted with dynamic libraries when static are required
>   (e.g., on Cray). Please check your configure cmd line and consider using
>   one of the contrib/platform definitions for your system type.
>
> * an inability to create a connection back to mpirun due to a
>   lack of common network interfaces and/or no route found between
>   them. Please check network connectivity (including firewalls
>   and network routing requirements).
> --
>
>
> and:
>
> [dblade01:10902] [[37323,0],0] plm:rsh: final template argv:
> /usr/bin/ssh  set path = ( /usr/bin $path ) ; if ( $?
> LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if ( $?LD_LIBRARY_PATH
>  == 0 ) setenv LD_LIBRARY_PATH /usr/lib ; if ( $?OMPI_have_llp == 1 )
> setenv
> LD_LIBRARY_PATH /usr/lib:$LD_LIBRARY_PATH ; if ( $?DYLD_LIBRARY
> _PATH == 1 ) set OMPI_have_dllp ; if ( $?DYLD_LIBRARY_PATH == 0 ) setenv
> DYLD_LIBRARY_PATH /usr/lib ; if ( $?OMPI_have_dllp == 1 ) setenv DY
> LD_LIBRARY_PATH /usr/lib:$DYLD_LIBRARY_PATH ;   /usr/bin/orted
> --hnp-topo-sig
> 0N:2S:0L3:4L2:4L1:4C:4H:x86_64 -mca ess "env" -mca ess_base_jo
> bid "2446000128" -mca ess_base_vpid "" -mca ess_base_num_procs
> "2" -
> mca orte_hnp_uri "2446000128.0;usock;tcp://10.116.85.90:44791"
>  --mca plm_base_verbose "1" -mca plm "rsh" -mca orte_display_alloc "1" -mca
> pmix "^s1,s2,cray"
> ssh_exchange_identification: read: Connection reset by peer
>
>
>
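
A hedged diagnostic sketch for the failure quoted above: raising the verbosity
of the launcher and allocator frameworks usually shows whether the SGE
allocation was detected and why the rsh/ssh path is chosen (the exact output
varies by Open MPI version):

  env PATH="$PATH" /usr/bin/mpirun --mca plm_base_verbose 10 --mca ras_base_verbose 10 \
      -display-allocation ./hello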