Hello,
This must be the problem. I've checked that each compute node can only
resolve its own IP address:
For example in compute-0-0:
$ /opt/gridengine/utilbin/lx26-amd64/gethostbyaddr 10.4.0.2
Hostname: compute-0-0.local
Aliases: compute-0-0
Host Address(es): 10.4.0.2
But for 10.4.0.3 (compute-0-1):
$ /opt/gridengine/utilbin/lx26-amd64/gethostbyaddr 10.4.0.3
error resolving ip "10.4.0.3": can't resolve ip address (h_errno =
HOST_NOT_FOUND)
And the inverse on compute-0-1: it can resolve 10.4.0.3 but not 10.4.0.2.
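(A common fix is to give every node an /etc/hosts that lists all nodes, not just itself; a minimal sketch using the two IPs above. On Rocks this file is normally regenerated by the frontend's own tooling, so treat this as illustration only:)

```shell
# Hypothetical /etc/hosts entries, to be present on EVERY node
# (on Rocks the frontend usually manages this file centrally).
cat >> /etc/hosts <<'EOF'
10.4.0.2   compute-0-0.local compute-0-0
10.4.0.3   compute-0-1.local compute-0-1
EOF
# Verify from either node that both addresses now resolve:
/opt/gridengine/utilbin/lx26-amd64/gethostbyaddr 10.4.0.2
/opt/gridengine/utilbin/lx26-amd64/gethostbyaddr 10.4.0.3
```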
Regards,
Guillermo.
On 12/11/2012 13:35, Guillermo Marco Puche wrote:
Hello,
OK, I've patched my nodes with the RPM fix for MPI and SGE (I forgot
to install it on the compute nodes).
I removed the -np 16 argument and got this new error:
error: commlib error: access denied (client IP resolved to host name
"". This is not identical to clients host name "")
error: executing task of job 97 failed: failed sending task to
[email protected]: can't find connection
--------------------------------------------------------------------------
A daemon (pid 3037) died unexpectedly with status 1 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have
the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
On 12/11/2012 13:11, Reuti wrote:
On 12.11.2012 at 12:18, Guillermo Marco Puche wrote:
Hello,
I'm currently trying the following job script, submitted with qsub.
I don't know why it only uses the CPUs of one of my two compute
nodes; it's not using both. (compute-0-2 is a currently powered-off
node.)
#!/bin/bash
#$ -S /bin/bash
#$ -V
### name
#$ -N aln_left
### work dir
#$ -cwd
### outputs
#$ -j y
### PE
#$ -pe orte 16
### all.q
#$ -q all.q
mpirun -np 16 pBWA aln -f aln_left
/data_in/references/genomes/human/hg19/bwa_ref/hg19.fa
/data_in/data/rawdata/HapMap_1.fastq >
/data_out_2/tmp/05_11_12/mpi/HapMap_cloud.left.sai
If compute-0-2 is powered off, it won't get slots assigned by SGE.
The 16 slots must then be available on the running machines -
otherwise the job would stay in "qw" state. As Open MPI was compiled
with tight integration, the argument "-np 16" isn't necessary: it
will detect the granted number of slots and their location
automatically.
-- Reuti
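(For reference, a tight-integration version of the job script above, with -np dropped, might look like this sketch:)

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -V
#$ -N aln_left
#$ -cwd
#$ -j y
#$ -pe orte 16
#$ -q all.q
# With tight integration, Open MPI queries SGE for the granted slots
# and their hosts, so neither -np nor a machinefile is needed.
mpirun pBWA aln -f aln_left \
    /data_in/references/genomes/human/hg19/bwa_ref/hg19.fa \
    /data_in/data/rawdata/HapMap_1.fastq \
    > /data_out_2/tmp/05_11_12/mpi/HapMap_cloud.left.sai
```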
Here's the all.q config file:
qname all.q
hostlist @allhosts
seq_no 0
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list make mpich mpi orte openmpi smp
rerun FALSE
slots 0,[compute-0-0.local=8],[compute-0-1.local=8], \
[compute-0-2.local.sg=8]
tmpdir /tmp
shell /bin/csh
prolog NONE
epilog NONE
shell_start_mode posix_compliant
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists NONE
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state default
s_rt INFINITY
h_rt INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem INFINITY
Best regards,
Guillermo.
On 05/11/2012 12:01, Reuti wrote:
Hi,
On 05.11.2012 at 10:55, Guillermo Marco Puche wrote:
I've managed to compile Open MPI for Rocks:
ompi_info | grep grid
MCA ras: gridengine (MCA v2.0, API v2.0,
Component v1.4.3)
Now I'm really confused about how I should run my pBWA program with
Open MPI.
Program website (http://pbwa.sourceforge.net/) suggests something
like:
sqsub -q mpi -n 240 -r 1h --mpp 4G ./pBWA bla bla bla...
Seems to be a local proprietary command on Sharcnet, or at least a
wrapper to another unknown queuing system.
I don't have sqsub, but qsub provided by SGE. The "-q" option isn't
valid here, since in SGE it's for queue selection.
Correct, the SGE paradigm is to request resources and SGE will
select an appropriate queue for your job which fulfils the
requirements.
Maybe the solution is to create a simple bash job script and
include the parallel environment for SGE and the number of slots
(since pBWA internally supports Open MPI).
What is the actual setup of your SGE? Most likely you will need to
define a PE and request it during submission, like for any other
Open MPI application:
$ qsub -pe orte 240 -l h_rt=1:00:00,h_vmem=4G ./pBWA bla bla bla...
Assuming "-n" gives the number of cores.
Assuming "-r 1h" means wallclock time: -l h_rt=1:00:00
Assuming "--mpp 4G" requests the memory per slot: -l h_vmem=4G
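(To see whether the PE side is set up for tight integration, qconf can show it; the values listed below are the usual tight-integration ones, assumed rather than taken from this cluster:)

```shell
# Inspect the PE definition SGE will use (here the "orte" PE):
qconf -sp orte
# For tight Open MPI integration the definition typically contains:
#   pe_name            orte
#   slots              9999
#   allocation_rule    $round_robin
#   control_slaves     TRUE
#   job_is_first_task  FALSE
# Check that the queue actually offers the PE:
qconf -sq all.q | grep pe_list
```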
Necessary setup:
http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
-- Reuti
Regards,
Guillermo.
On 26/10/2012 12:21, Reuti wrote:
On 26.10.2012 at 12:02, Guillermo Marco Puche wrote:
Hello,
Like I said, I'm using Rocks cluster 5.4.3 and it comes with
mpirun (Open MPI) 1.4.3.
But $ ompi_info | grep gridengine shows nothing.
So I'm confused whether I have to update and rebuild Open MPI to the
latest version.
You can also remove the supplied version 1.4.3 from your system
and build it from source with SGE support. But I don't see the
advantage of using an old version. Does ROCKS supply the source of
the Open MPI version it uses?
Or whether I can keep the current version of MPI and re-build it
(that would be the preferred option, to keep the stability of the
cluster).
If you compile and install only in your own $HOME (as normal
user, no root access necessary), then there is no impact to any
system tool at all. You just have to take care which version you
use by setting the correct $PATH and $LD_LIBRARY_PATH during
compilation of your application and during execution of it.
Therefore I suggested including the name of the compiler and Open
MPI version used in the build installation's directory name.
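(Selecting such a home-built version might look like this, assuming the install prefix used in the configure examples elsewhere in this thread:)

```shell
# Put the home-built Open MPI first for both compiling and running:
export PATH=$HOME/local/openmpi-1.6.2_gcc/bin:$PATH
export LD_LIBRARY_PATH=$HOME/local/openmpi-1.6.2_gcc/lib:$LD_LIBRARY_PATH
# Confirm the intended binaries are picked up:
which mpicc mpiexec
mpiexec -V
```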
There was just a question on the MPICH2 mailing list about which
version of `mpiexec` to use; maybe it's additional info:
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2012-October/013318.html
-- Reuti
Thanks !
Best regards,
Guillermo.
On 26/10/2012 11:59, Reuti wrote:
On 26.10.2012 at 09:40, Guillermo Marco Puche wrote:
Hello,
Thank you for the links Reuti !
When they talk about:
shell $ ./configure --with-sge
Is it in the bash shell or some other special shell?
There is no special shell required (please have a look at the
INSTALL file in Open MPI's tar-archive).
Do I have to be in a specific directory to execute that command?
Depends.
As it's set up according to the GNU build system
(http://en.wikipedia.org/wiki/GNU_build_system), you can either:
$ tar -xf openmpi-1.6.2.tar.gz
$ cd openmpi-1.6.2
$ ./configure --prefix=$HOME/local/openmpi-1.6.2_gcc --with-sge
$ make
$ make install
It's quite common to build inside the source tree. But if it is
set up in the right way, it also supports building in different
directories inside or outside the source tree which avoids a
`make distclean` in case you want to generate different builds:
$ tar -xf openmpi-1.6.2.tar.gz
$ mkdir openmpi-gcc
$ cd openmpi-gcc
$ ../openmpi-1.6.2/configure --prefix=$HOME/local/openmpi-1.6.2_gcc --with-sge
$ make
$ make install
Meanwhile, in another window, you can execute:
$ mkdir openmpi-intel
$ cd openmpi-intel
$ ../openmpi-1.6.2/configure --prefix=$HOME/local/openmpi-1.6.2_intel CC=icc CXX=icpc FC=ifort F77=ifort --disable-vt --with-sge
$ make
$ make install
(Not to confuse anyone: there is a bug in the combination of the
Intel compiler and GNU headers with the above version of Open MPI;
disabling VampirTrace support helps.)
-- Reuti
Thank you !
Sorry again for my ignorance.
Regards,
Guillermo.
On 25/10/2012 19:50, Reuti wrote:
On 25.10.2012 at 19:36, Guillermo Marco Puche wrote:
Hello,
I've no idea who compiled the application. I just found on the
seqanswers forum that pBWA was a nice speed-up over the
original BWA, since it natively supports Open MPI.
As you told me, I'll look further into how to compile Open MPI
with SGE support. If anyone knows a good introduction/tutorial on
this, it would be appreciated.
The Open MPI site has huge documentation:
http://www.open-mpi.org/faq/?category=building#build-rte-sge
http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
Be sure that during execution you pick the correct `mpiexec`
and LD_LIBRARY_PATH from your own build. You can also adjust
the location of Open MPI with the usual --prefix. I put it in
--prefix=$HOME/local/openmpi-1.6.2_shared_gcc, reflecting the
version I built.
-- Reuti
Then i'll try to run it with my current version of open-mpi
and update if needed.
Thanks.
Best regards,
Guillermo.
On 25/10/2012 18:53, Reuti wrote:
Please keep the list posted, so that others can participate
on the discussion. I'm not aware of this application, but
maybe someone else is on the list who could be of broader
help.
Again: who compiled the application, as I can see only the
source at the site you posted?
-- Reuti
On 25.10.2012 at 13:23, Guillermo Marco Puche wrote:
$ ompi_info | grep grid
Returns nothing. Like I said, I'm a newbie to MPI.
I didn't know that I had to compile anything. I have a Rocks
installation out of the box.
So MPI is installed but nothing more, I guess.
I've found an old thread in the Rocks discussion list:
https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2012-April/057303.html
The user asking is using this script:
#$ -S /bin/bash
#
#
# Export all environment variables
#$ -V
# specify the PE and core #
#$ -pe mpi 128
# Customize job name
#$ -N job_hpl_2.0
# Use current working directory
#$ -cwd
# Join stdout and stderr into one file
#$ -j y
# The mpirun command; note the lack of host names, as SGE will
# provide them on-the-fly.
mpirun -np $NSLOTS ./xhpl >> xhpl.out
But then I read this:
In Rocks' SGE setup:
mpi is loosely integrated,
mpich and orte are tightly integrated,
the required qsub args differ between mpi/mpich and orte,
mpi and mpich need a machinefile by default,
mpi and mpich are for MPICH2,
orte is for Open MPI.
regards
-LT
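(In job-script terms, the difference -LT describes is roughly the following sketch; ./my_mpi_program is a placeholder, and $TMPDIR/machines is the machine file loosely-integrated SGE PEs conventionally generate:)

```shell
# Loosely integrated PE (mpi): pass the slot count and the
# SGE-generated machine file yourself.
mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./my_mpi_program

# Tightly integrated PE (orte) with an SGE-aware Open MPI: no extra
# arguments, the library obtains slots and hosts from SGE directly.
mpirun ./my_mpi_program
```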
The program I need to run is pBWA:
http://pbwa.sourceforge.net/
It uses MPI.
At this moment I'm somewhat confused about the next step.
I thought I could just run pBWA with multiple processes using MPI
and a simple SGE job.
Regards,
Guillermo.
On 25/10/2012 13:17, Reuti wrote:
On 25.10.2012 at 13:11, Guillermo Marco Puche wrote:
Hello Reuti,
I'm stuck here. I've no idea what MPI library I've
got. I'm using Rocks Cluster Viper 5.4.3, which comes
with CentOS 5.6, SGE, SPM, Open MPI and MPI.
How can I check which library I have installed?
I found this:
$ mpirun -V
mpirun (Open MPI) 1.4.3
Report bugs to
http://www.open-mpi.org/community/help/
Good, and is this the one you also used to compile the application?
To check whether Open MPI was built with SGE support:
$ ompi_info | grep grid
MCA ras: gridengine (MCA v2.0, API
v2.0, Component v1.6.2)
-- Reuti
Thanks,
Best regards,
Guillermo.
On 25/10/2012 13:05, Reuti wrote:
On 25.10.2012 at 10:37, Guillermo Marco Puche wrote:
Hello !
I found a new version of my tool which supports
multi-threading, but also MPI or Open MPI for
additional processes.
I'm fairly new to MPI with SGE. What would be the right
qsub command, or configuration inside a job file, to ask
SGE to run the job with 2 MPI processes?
Will the following code work in a SGE job file?
#$ -pe mpi 2
That's supposed to make the job work with 2 processes
instead of 1.
Not out of the box: it will grant 2 slots for the job
according to the allocation rules of the PE. But how to
start your application in the jobscript inside the
granted allocation is up to you. Fortunately the MPI
libraries got an (almost) automatic integration into
queuing systems nowadays without further user
intervention.
Which of the MPI libraries mentioned above do you use
when compiling your application?
-- Reuti
Regards,
Guillermo.
On 22/10/2012 17:19, Reuti wrote:
On 22.10.2012 at 16:31, Guillermo Marco Puche wrote:
I'm using a program where I can specify the number
of threads I want to use.
Only threads and not additional processes? Then you
are limited to one node, unless you add something
like
http://www.kerrighed.org/wiki/index.php/Main_Page or
http://www.scalemp.com
to get a cluster-wide unique process and memory space.
-- Reuti
I'm able to launch multiple instances of that tool
on separate nodes.
For example: job_process_00 on compute-0-0,
job_process_01 on compute-0-1, etc. Each job calls
the program, which splits up into 8 threads
(each of my nodes has 8 CPUs).
When I request 16 threads I can't split them as 8
threads per node, so I would like to split them
between 2 compute nodes.
Currently I have 4 compute nodes and I would like to
speed up the process by running my program with 16
threads split across more than one compute node. At
this moment I'm stuck using only 1 compute node per
process, with 8 threads.
Thank you !
Best regards,
Guillermo.
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users