Hello,
This must be the problem. I've checked that each compute node can only
resolve its own IP address:
For example in compute-0-0:
$ /opt/gridengine/utilbin/lx26-amd64/gethostbyaddr 10.4.0.2
Hostname: compute-0-0.local
Aliases: compute-0-0
Host Address(es): 10.4.0.2
But for 10.4.0.3 (compute-0-1):
$ /opt/gridengine/utilbin/lx26-amd64/gethostbyaddr 10.4.0.3
error resolving ip "10.4.0.3": can't resolve ip address (h_errno =
HOST_NOT_FOUND)
And the inverse on compute-0-1: it can resolve 10.4.0.3 but not 10.4.0.2.
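(A common fix is to give every node an /etc/hosts that lists all nodes, not just itself; a minimal sketch using the two IPs above. On Rocks this file is normally regenerated by the frontend's own tooling, so treat this as illustration only:)

```shell
# Hypothetical /etc/hosts entries, to be present on EVERY node
# (on Rocks the frontend usually manages this file centrally).
cat >> /etc/hosts <<'EOF'
10.4.0.2   compute-0-0.local compute-0-0
10.4.0.3   compute-0-1.local compute-0-1
EOF
# Verify from either node that both addresses now resolve:
/opt/gridengine/utilbin/lx26-amd64/gethostbyaddr 10.4.0.2
/opt/gridengine/utilbin/lx26-amd64/gethostbyaddr 10.4.0.3
```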
Regards,
Guillermo.
On 12/11/2012 13:35, Guillermo Marco Puche wrote:
Hello,
OK, I've patched my nodes with the RPM fix for MPI and SGE (I forgot
to install it on the compute nodes).
I removed the -np 16 argument and got this new error:
error: commlib error: access denied (client IP resolved to host name
"". This is not identical to clients host name "")
error: executing task of job 97 failed: failed sending task to
[email protected]: can't find connection
--------------------------------------------------------------------------
A daemon (pid 3037) died unexpectedly with status 1 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have
the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
On 12/11/2012 13:11, Reuti wrote:
On 12.11.2012 at 12:18, Guillermo Marco Puche wrote:
Hello,
I'm currently trying the following job script, submitted with qsub.
I don't know why it only uses the CPUs of one of my two compute
nodes; it's not using both. (compute-0-2 is a currently powered-off
node.)
#!/bin/bash
#$ -S /bin/bash
#$ -V
### name
#$ -N aln_left
### work dir
#$ -cwd
### outputs
#$ -j y
### PE
#$ -pe orte 16
### all.q
#$ -q all.q
mpirun -np 16 pBWA aln -f aln_left
/data_in/references/genomes/human/hg19/bwa_ref/hg19.fa
/data_in/data/rawdata/HapMap_1.fastq >
/data_out_2/tmp/05_11_12/mpi/HapMap_cloud.left.sai
If compute-0-2 is powered off, it won't get slots assigned by SGE.
The 16 slots must then be available on the running machines -
otherwise the job would stay in "qw" state. As Open MPI was compiled
with tight integration, the argument "-np 16" isn't necessary: it
will detect the granted number of slots and their location
automatically.
-- Reuti
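(For reference, a tight-integration version of the job script above, with -np dropped, might look like this sketch:)

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -V
#$ -N aln_left
#$ -cwd
#$ -j y
#$ -pe orte 16
#$ -q all.q
# With tight integration, Open MPI queries SGE for the granted slots
# and their hosts, so neither -np nor a machinefile is needed.
mpirun pBWA aln -f aln_left \
    /data_in/references/genomes/human/hg19/bwa_ref/hg19.fa \
    /data_in/data/rawdata/HapMap_1.fastq \
    > /data_out_2/tmp/05_11_12/mpi/HapMap_cloud.left.sai
```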
Here's the all.q config file:
qname all.q
hostlist @allhosts
seq_no 0
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list make mpich mpi orte openmpi smp
rerun FALSE
slots 0,[compute-0-0.local=8],[compute-0-1.local=8], \
[compute-0-2.local.sg=8]
tmpdir /tmp
shell /bin/csh
prolog NONE
epilog NONE
shell_start_mode posix_compliant
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists NONE
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state default
s_rt INFINITY
h_rt INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem INFINITY
Best regards,
Guillermo.
On 05/11/2012 12:01, Reuti wrote:
Hi,
On 05.11.2012 at 10:55, Guillermo Marco Puche wrote:
I've managed to compile Open MPI for Rocks:
ompi_info | grep grid
MCA ras: gridengine (MCA v2.0, API v2.0,
Component v1.4.3)
Now I'm really confused about how I should run my pBWA program with
Open MPI.
Program website (http://pbwa.sourceforge.net/) suggests something
like:
sqsub -q mpi -n 240 -r 1h --mpp 4G ./pBWA bla bla bla...
Seems to be a local proprietary command on Sharcnet, or at least a
wrapper to another unknown queuing system.
I don't have sqsub, but qsub provided by SGE. The "-q" option isn't
valid here, since in SGE it's for queue selection.
Correct, the SGE paradigm is to request resources and SGE will
select an appropriate queue for your job which fulfils the
requirements.
Maybe the solution is to create a simple bash job script and
include the parallel environment for SGE and the number of slots
(since pBWA internally supports Open MPI).
What is the actual setup of your SGE? Most likely you will need to
define a PE and request it during submission, like for any other
Open MPI application:
$ qsub -pe orte 240 -l h_rt=1:00:00,h_vmem=4G ./pBWA bla bla bla...
Assuming "-n" gives the number of cores.
Assuming "-r 1h" means wallclock time: -l h_rt=1:00:00
Assuming "--mpp 4G" requests the memory per slot: -l h_vmem=4G
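(To see whether the PE side is set up for tight integration, qconf can show it; the values listed below are the usual tight-integration ones, assumed rather than taken from this cluster:)

```shell
# Inspect the PE definition SGE will use (here the "orte" PE):
qconf -sp orte
# For tight Open MPI integration the definition typically contains:
#   pe_name            orte
#   slots              9999
#   allocation_rule    $round_robin
#   control_slaves     TRUE
#   job_is_first_task  FALSE
# Check that the queue actually offers the PE:
qconf -sq all.q | grep pe_list
```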
Necessary setup:
http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
-- Reuti
Regards,
Guillermo.
On 26/10/2012 12:21, Reuti wrote:
On 26.10.2012 at 12:02, Guillermo Marco Puche wrote:
Hello,
Like I said, I'm using Rocks cluster 5.4.3 and it comes with
mpirun (Open MPI) 1.4.3.
But $ ompi_info | grep gridengine shows nothing.
So I'm confused whether I have to update and rebuild Open MPI to the
latest version.
You can also remove the supplied version 1.4.3 from your system
and build it from source with SGE support. But I don't see the
advantage of using an old version. Does ROCKS supply the source of
the Open MPI version it uses?
Or whether I can keep the current version of MPI and re-build it
(that would be the preferred option, to keep the stability of the
cluster).
If you compile and install only in your own $HOME (as normal
user, no root access necessary), then there is no impact to any
system tool at all. You just have to take care which version you
use by setting the correct $PATH and $LD_LIBRARY_PATH during
compilation of your application and during execution of it.
Therefore I suggested including the name of the compiler and Open
MPI version used in the build installation's directory name.
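(Selecting such a home-built version might look like this, assuming the install prefix used in the configure examples elsewhere in this thread:)

```shell
# Put the home-built Open MPI first for both compiling and running:
export PATH=$HOME/local/openmpi-1.6.2_gcc/bin:$PATH
export LD_LIBRARY_PATH=$HOME/local/openmpi-1.6.2_gcc/lib:$LD_LIBRARY_PATH
# Confirm the intended binaries are picked up:
which mpicc mpiexec
mpiexec -V
```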
There was just a question on the MPICH2 mailing list about which
version of `mpiexec` to use; maybe it's additional info:
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2012-October/013318.html
-- Reuti
Thanks !
Best regards,
Guillermo.
On 26/10/2012 11:59, Reuti wrote:
On 26.10.2012 at 09:40, Guillermo Marco Puche wrote:
Hello,
Thank you for the links Reuti !
When they talk about:
shell $ ./configure --with-sge
Is it in the bash shell or some other special shell?
There is no special shell required (please have a look at the
INSTALL file in Open MPI's tar-archive).
Do I have to be in a specific directory to execute that command?
Depends.
As it's set up according to the GNU build system
(http://en.wikipedia.org/wiki/GNU_build_system), you can either:
$ tar -xf openmpi-1.6.2.tar.gz
$ cd openmpi-1.6.2
$ ./configure --prefix=$HOME/local/openmpi-1.6.2_gcc --with-sge
$ make
$ make install
It's quite common to build inside the source tree. But if it is
set up in the right way, it also supports building in different
directories inside or outside the source tree which avoids a
`make distclean` in case you want to generate different builds:
$ tar -xf openmpi-1.6.2.tar.gz
$ mkdir openmpi-gcc
$ cd openmpi-gcc
$ ../openmpi-1.6.2/configure --prefix=$HOME/local/openmpi-1.6.2_gcc --with-sge
$ make
$ make install
Meanwhile, in another window, you can execute:
$ mkdir openmpi-intel
$ cd openmpi-intel
$ ../openmpi-1.6.2/configure --prefix=$HOME/local/openmpi-1.6.2_intel CC=icc CXX=icpc FC=ifort F77=ifort --disable-vt --with-sge
$ make
$ make install
(Not to confuse anyone: there is a bug in the combination of the
Intel compiler and GNU headers with the above version of Open MPI;
disabling VampirTrace support helps.)
-- Reuti
Thank you !
Sorry again for my ignorance.
Regards,
Guillermo.
On 25/10/2012 19:50, Reuti wrote:
On 25.10.2012 at 19:36, Guillermo Marco Puche wrote:
Hello,
I've no idea who compiled the application. I just found on the
seqanswers forum that pBWA was a nice speed-up over the
original BWA, since it natively supports Open MPI.
As you told me, I'll look further into how to compile Open MPI
with SGE support. If anyone knows a good introduction/tutorial on
this, it would be appreciated.
The Open MPI site has huge documentation:
http://www.open-mpi.org/faq/?category=building#build-rte-sge
http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
Be sure that during execution you pick the correct `mpiexec`
and LD_LIBRARY_PATH from your own build. You can also adjust
the location of Open MPI with the usual --prefix. I put it in
--prefix=$HOME/local/openmpi-1.6.2_shared_gcc, reflecting the
version I built.
-- Reuti
Then i'll try to run it with my current version of open-mpi
and update if needed.
Thanks.
Best regards,
Guillermo.
On 25/10/2012 18:53, Reuti wrote:
Please keep the list posted, so that others can participate
on the discussion. I'm not aware of this application, but
maybe someone else is on the list who could be of broader
help.
Again: who compiled the application, as I can see only the
source at the site you posted?
-- Reuti
On 25.10.2012 at 13:23, Guillermo Marco Puche wrote:
$ ompi_info | grep grid
Returns nothing. Like I said, I'm a newbie to MPI.
I didn't know that I had to compile anything. I have a Rocks
installation out of the box.
So MPI is installed but nothing more, I guess.
I've found an old thread in the Rocks discussion list:
https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2012-April/057303.html
The user asking is using this script:
#$ -S /bin/bash
#
#
# Export all environment variables
#$ -V
# specify the PE and core #
#$ -pe mpi 128
# Customize job name
#$ -N job_hpl_2.0
# Use current working directory
#$ -cwd
# Join stdout and stderr into one file
#$ -j y
# The mpirun command; note the lack of host names, as SGE will
# provide them on-the-fly.
mpirun -np $NSLOTS ./xhpl >> xhpl.out
But then I read this:
In Rocks' SGE setup:
mpi is loosely integrated,
mpich and orte are tightly integrated,
the required qsub args differ between mpi/mpich and orte,
mpi and mpich need a machinefile by default,
mpi and mpich are for MPICH2,
orte is for Open MPI.
regards
-LT
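(In job-script terms, the difference -LT describes is roughly the following sketch; ./my_mpi_program is a placeholder, and $TMPDIR/machines is the machine file loosely-integrated SGE PEs conventionally generate:)

```shell
# Loosely integrated PE (mpi): pass the slot count and the
# SGE-generated machine file yourself.
mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./my_mpi_program

# Tightly integrated PE (orte) with an SGE-aware Open MPI: no extra
# arguments, the library obtains slots and hosts from SGE directly.
mpirun ./my_mpi_program
```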
The program I need to run is pBWA:
http://pbwa.sourceforge.net/
It uses MPI.
At this moment I'm somewhat confused about the next step.
I thought I could just run pBWA with multiple processes using MPI
and a simple SGE job.
Regards,
Guillermo.
On 25/10/2012 13:17, Reuti wrote:
On 25.10.2012 at 13:11, Guillermo Marco Puche wrote:
Hello Reuti,
I'm stuck here. I've no idea what MPI library I've
got. I'm using Rocks Cluster Viper 5.4.3, which comes
with CentOS 5.6, SGE, SPM, Open MPI and MPI.
How can I check which library I have installed?
I found this:
$ mpirun -V
mpirun (Open MPI) 1.4.3
Report bugs to
http://www.open-mpi.org/community/help/
Good, and is this the one you also used to compile the application?
To check whether Open MPI was built with SGE support:
$ ompi_info | grep grid
MCA ras: gridengine (MCA v2.0, API
v2.0, Component v1.6.2)
-- Reuti
Thanks,
Best regards,
Guillermo.
On 25/10/2012 13:05, Reuti wrote:
On 25.10.2012 at 10:37, Guillermo Marco Puche wrote:
Hello !
I found a new version of my tool which supports
multi-threading, but also MPI or Open MPI for
additional processes.
I'm fairly new to MPI with SGE. What would be the right
qsub command, or configuration inside a job file, to ask
SGE to run the job with 2 MPI processes?
Will the following code work in a SGE job file?
#$ -pe mpi 2
That's supposed to make the job work with 2 processes
instead of 1.
Not out of the box: it will grant 2 slots for the job
according to the allocation rules of the PE. But how to
start your application in the jobscript inside the
granted allocation is up to you. Fortunately the MPI
libraries got an (almost) automatic integration into
queuing systems nowadays without further user
intervention.
Which of the MPI libraries mentioned above do you use
when compiling your application?
-- Reuti
Regards,
Guillermo.
On 22/10/2012 17:19, Reuti wrote:
On 22.10.2012 at 16:31, Guillermo Marco Puche wrote:
I'm using a program where I can specify the number
of threads I want to use.
Only threads and not additional processes? Then you
are limited to one node, unless you add something
like
http://www.kerrighed.org/wiki/index.php/Main_Page or
http://www.scalemp.com
to get a cluster-wide unique process and memory space.
-- Reuti
I'm able to launch multiple instances of that tool
on separate nodes.
For example: job_process_00 on compute-0-0,
job_process_01 on compute-0-1, etc. Each job calls
the program, which splits up into 8 threads
(each of my nodes has 8 CPUs).
When I request 16 threads I can't split them as 8
threads per node, so I would like to split them
between 2 compute nodes.
Currently I have 4 compute nodes and I would like to
speed up the process by running my program with 16
threads split across more than one compute node. At
this moment I'm stuck using only 1 compute node per
process, with 8 threads.
Thank you !
Best regards,
Guillermo.
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users