Re: [OMPI users] can't run MPI job under SGE

2019-07-29 Thread Reuti via users


> On 29.07.2019 at 17:17, David Laidlaw wrote:
> 
> I will try building a newer ompi version in my home directory, but that will 
> take me some time.
> 
> qconf is not available to me on any machine.  It gives the same error
> wherever I am able to try it:
> > denied: host "..." is neither submit nor admin host
> 
> Here is what it produces when I have a sysadmin run it:
> $ qconf -sconf | egrep "(command|daemon)"
> qlogin_command   /sysvol/sge.test/bin/qlogin-wrapper
> qlogin_daemon    /sysvol/sge.test/bin/grid-sshd -i
> rlogin_command   builtin
> rlogin_daemon    builtin
> rsh_command      builtin
> rsh_daemon       builtin

That's fine. I wondered whether rsh_* would contain a redirection to `ssh`
(which would explain where the `ssh` in your error output comes from).

-- Reuti
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] can't run MPI job under SGE

2019-07-29 Thread David Laidlaw via users
I will try building a newer ompi version in my home directory, but that
will take me some time.

qconf is not available to me on any machine.  It gives the same error
wherever I am able to try it:

> denied: host "..." is neither submit nor admin host


Here is what it produces when I have a sysadmin run it:

$ qconf -sconf | egrep "(command|daemon)"
qlogin_command   /sysvol/sge.test/bin/qlogin-wrapper
qlogin_daemon    /sysvol/sge.test/bin/grid-sshd -i
rlogin_command   builtin
rlogin_daemon    builtin
rsh_command      builtin
rsh_daemon       builtin


Does that suggest anything?

Thanks!

-David Laidlaw




On Thu, Jul 25, 2019 at 5:21 PM Reuti wrote:

>
> On 25.07.2019 at 23:00, David Laidlaw wrote:
>
> > Here is most of the command output when run on a grid machine:
> >
> > dblade65.dhl(101) mpiexec --version
> > mpiexec (OpenRTE) 2.0.2
>
> This is rather old. I would suggest installing a fresh one. You can
> even compile one in your home directory and install it, e.g. in
> $HOME/local/openmpi-3.1.4_gcc-7.4.0_shared (by --prefix=…intended path…),
> and then use it for all your jobs (adjust for your version of gcc). Put the
> following in your ~/.bash_profile and in the job script:
>
> DEFAULT_MANPATH="$(manpath -q)"
> MY_OMPI="$HOME/local/openmpi-3.1.4_gcc-7.4.0_shared"
> export PATH="$MY_OMPI/bin:$PATH"
> # use lib instead of lib64 below if that is where your build put its libraries
> export LD_LIBRARY_PATH="$MY_OMPI/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
> export MANPATH="$MY_OMPI/share/man${DEFAULT_MANPATH:+:$DEFAULT_MANPATH}"
> unset MY_OMPI
> unset DEFAULT_MANPATH
>
> Essentially there is no conflict with the already installed version.
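After sourcing the lines above (in ~/.bash_profile or at the top of the job script), it can be worth confirming that the private build really is the one being picked up, since compiling with one Open MPI version and running with another does not work. The following is only a sketch: the function name `uses_private_ompi` is hypothetical, and its default prefix is simply the example path from above.

```shell
#!/bin/bash
# Hypothetical helper (not part of Open MPI): succeed only if the mpiexec
# found on PATH is the one installed under the given private prefix.
# The default prefix is the example path used above; adjust as needed.
uses_private_ompi() {
    local prefix="${1:-$HOME/local/openmpi-3.1.4_gcc-7.4.0_shared}"
    local found
    # command -v prints the path bash would execute, or fails if none is found
    found="$(command -v mpiexec)" || return 1
    case "$found" in
        "$prefix"/bin/mpiexec)
            return 0 ;;
        *)
            echo "mpiexec resolves to $found, not $prefix/bin/mpiexec" >&2
            return 1 ;;
    esac
}
```

In a job script, a line such as `uses_private_ompi || exit 1` placed just before the `mpiexec` call would make the job stop early instead of silently running with the system installation.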
>
>
> > dblade65.dhl(102) ompi_info | grep grid
> >  MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component v2.0.2)
> > dblade65.dhl(103) c
> > denied: host "dblade65.cs.brown.edu" is neither submit nor admin host
> > dblade65.dhl(104)
>
> On a node it’s ok this way.
>
>
> > Does that suggest anything?
> >
> > qconf is restricted to sysadmins, which I am not.
>
> What error do you get if you try it anyway? Usually viewing the
> configuration is always possible.
>
>
> > I would note that we are running Debian stretch on the cluster
> > machines.  On some of our other (non-grid) machines, running Debian buster,
> > the output is:
> >
> > cslab3d.dhl(101) mpiexec --version
> > mpiexec (OpenRTE) 3.1.3
> > Report bugs to http://www.open-mpi.org/community/help/
> > cslab3d.dhl(102) ompi_info | grep grid
> >  MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component v3.1.3)
>
> If you compile on such a machine and intend to run the program on the
> cluster, it won't work, as the versions don't match. Hence the solution
> above: use a personal version in your $HOME for both compiling and running
> the applications.
>
> Side note: Open MPI binds processes to cores by default. If more than one
> MPI job runs on a node, you will have to use `mpiexec --bind-to none …`,
> as otherwise all jobs on this node will use core 0 upwards.
>
> -- Reuti
>
>
> > Thanks!
> >
> > -David Laidlaw
> >
> > On Thu, Jul 25, 2019 at 2:13 PM Reuti wrote:
> >
> > On 25.07.2019 at 18:59, David Laidlaw via users wrote:
> >
> > > I have been trying to run some MPI jobs under SGE for almost a year
> > > without success.  What seems like a very simple test program fails; the
> > > ingredients of it are below.  Any suggestions on any piece of the test,
> > > reasons for failure, requests for additional info, configuration thoughts,
> > > etc. would be much appreciated.  I suspect the linkage between SGE and MPI,
> > > but can't identify the problem.  We do have SGE support built into MPI.  We
> > > also have the SGE parallel environment (PE) set up as described in several
> > > places on the web.
> > >
> > > Many thanks for any input!
> >
> > Did you compile Open MPI on your own or was it delivered with the Linux
> > distribution? It is quite strange that it tries to use `ssh`, as nowadays
> > Open MPI and others have built-in support to detect that they are running
> > under the control of a queuing system. It should use `qrsh` in your case.
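One rough way to see whether a job looks like "running under the control of a queuing system" from Open MPI's point of view is to print the relevant Grid Engine variables in the job script. As far as we recall the Open MPI 2.x/3.x sources, the rsh/ssh launcher switches to `qrsh -inherit` only when `SGE_ROOT`, `ARC`, `PE_HOSTFILE` and `JOB_ID` are all set; treat that variable list as an assumption to verify against your version. The function name `check_sge_env` is hypothetical.

```shell
#!/bin/bash
# Hypothetical diagnostic (not part of Open MPI or SGE): report which of the
# Grid Engine variables that, to our understanding, Open MPI checks before
# launching remote daemons via qrsh are present. Returns non-zero if any
# of them is missing.
check_sge_env() {
    local var missing=0
    for var in SGE_ROOT ARC PE_HOSTFILE JOB_ID; do
        # ${!var-} is bash indirect expansion: the value of the variable named $var
        if [ -n "${!var-}" ]; then
            echo "ok:      $var=${!var}"
        else
            echo "missing: $var"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_sge_env` just before the `mpirun` line of a job script would put the result into the job's output file, next to the `plm_base_verbose` output.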
> >
> > What does:
> >
> > mpiexec --version
> > ompi_info | grep grid
> >
> > reveal? What does:
> >
> > qconf -sconf | egrep "(command|daemon)"
> >
> > show?
> >
> > -- Reuti
> >
> >
> > > Cheers,
> > >
> > > -David Laidlaw
> > >
> > >
> > >
> > >
> > > Here is how I submit the job:
> > >
> > >   /usr/bin/qsub /gpfs/main/home/dhl/liggghtsTest/hello2/runme
> > >
> > >
> > > Here is what is in runme:
> > >
> > >   #!/bin/bash
> > >   #$ -cwd
> > >   #$ -pe orte_fill 1
> > >   env PATH="$PATH" /usr/bin/mpirun --mca plm_base_verbose 1 -display-allocation ./hello
> > >
> > >
> > > Here is hello.c:
> > >
> > > #include 
> > > #include 
> > > #include 
> > > #include 
> > >
> > > int main(int argc, char** argv) {
> > > // Initialize the MPI environment
> > > MPI_Init(NULL, NULL);
> > >
> > > // Get the number of processes
> > > 

Re: [OMPI users] TMPDIR for running openMPI job under grid

2019-07-29 Thread Kulshrestha, Vipul via users
Thanks. I will give this a try.

Regards,
Vipul


From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ralph Castain via users
Sent: Friday, July 26, 2019 3:24 PM
To: Open MPI Users 
Cc: Ralph Castain 
Subject: Re: [OMPI users] TMPDIR for running openMPI job under grid

Upgrade to OMPI v4 or at least something in the v3 series. If you continue to 
have a problem, then set PMIX_MCA_ptl=tcp in your environment.



On Jul 26, 2019, at 12:12 PM, Kulshrestha, Vipul via users
<users@lists.open-mpi.org> wrote:

Hi,

I am trying to set up my Open MPI application to run under the grid.

It works sometimes, but sometimes I get the error below. I have contacted my
grid site administrator, and their answer is that they cannot change the
TMPDIR path used in the grid configuration.

I have tried setting TMPDIR, but it does not help (probably because the grid
engine resets it).

What other alternatives do I have?

Another question, out of curiosity: why does Open MPI create such a long name?
I understand that part of this path depends on the TMPDIR value, but even
after that it adds unnecessary characters like “openmpi-sessions-<5 digit
number>@_0/”, which could have been shortened to something like
“omp-<5 digit number>@_0/<5 digit number>”, saving 14 characters (almost 15%
of the possible length).

Thanks,
Vipul

PMIx has detected a temporary directory name that results
in a path that is too long for the Unix domain socket:

Temp dir: /var/spool/sge/wv2/tmp/<9 digit grid job id>.1.<16 character queuename>.q/openmpi-sessions-43757@<12 character machine name>_0/50671

Try setting your TMPDIR environmental variable to point to
something shorter in length
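The limit behind this message is the size of `sun_path` in `struct sockaddr_un`, which is 108 bytes on Linux including the terminating NUL, so everything PMIx builds below TMPDIR has to fit into that. A rough pre-flight check for a job script could look like the sketch below. The function name `fits_unix_socket` and the 40-byte default allowance for whatever PMIx appends below the directory (session subdirectories and the socket file name) are assumptions for illustration, not values taken from PMIx.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of Open MPI or PMIx): would a Unix
# domain socket created somewhere below DIR still fit into sun_path?
# Assumptions: Linux, where sun_path is 108 bytes including the trailing NUL;
# RESERVE is a guessed allowance for what PMIx appends below DIR.
# ${#dir} counts characters, which equals bytes for ASCII-only paths.
SUN_PATH_MAX=108

fits_unix_socket() {
    local dir="$1"
    local reserve="${2:-40}"
    local total=$(( ${#dir} + reserve + 1 ))   # +1 for the trailing NUL
    [ "$total" -le "$SUN_PATH_MAX" ]
}
```

For example, `fits_unix_socket "${TMPDIR:-/tmp}" || echo "TMPDIR may be too long for PMIx sockets" >&2` before `mpirun` would flag the problem in the job output before PMIx fails.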
