[Please keep the list posted so others can follow.]

On Jan 15, 2013, at 2:24 PM, Weiner John wrote:

> Hi Reuti:
> 
> Thank you for your questions and suggestions.  I'll try to answer them and to 
> show you with more clarity what is going wrong.  
> 1. Yes I do mean Open MPI and as far as I know there is only one "mpirun" 
> installed in this system.  Basically the Rocks cluster installation includes 
> a package called SGE that is installed together with everything else when the 
> cluster configuration is set up on the front end node.
> 2. Therefore I am quite sure that Open MPI was compiled with SGE.

In the links I posted was a way mentioned to test it:

$ ompi_info | grep grid
              MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.3)


>  The same Rocks installation works with sge flawlessly on another cluster we 
> have here.

Open MPI could have been compiled outside of the Rocks installation. Each and 
every user could compile his/her own version of Open MPI and put it e.g. in 
~/local/myopenmpi, and it will work as long as only this version is used.


> 3. When I execute a shell file test.sh with ./test.sh, where test.sh is:
> 
> /opt/.../bin/mpiexec  -host 
> compute-0-0:16,compute-0-1:16,compute-0-2:16,compute-0-3:16  
> /opt/.../bin/fdtd-engine-mpich2nem  /home/jweiner/.../myprogram
> 
> the program executes as expected.  Sixteen processes are launched on each of 
> compute-0-0, compute-0-1, compute-0-2, compute-0-3
> 
> If I now place the same execution command line in a sge-type shell file, 
> myfile.sh,
> 
> #!/bin/bash
> #$ -V -cwd
> #$ -m bea
> #$ -M [email protected]
> /opt/.../bin/mpiexec  -host 
> compute-0-0:16,compute-0-1:16,compute-0-2:16,compute-0-3:16  
> /opt/.../bin/fdtd-engine-mpich2nem  /home/jweiner/.../myprogram

How do you know beforehand that you will be granted access to these nodes? 
Instead, the list of hosts granted by SGE must be used. With a properly compiled 
Open MPI it can even be left out entirely when running under SGE:

#!/bin/bash
#$ -cwd
#$ -m bea
#$ -M [email protected]
mpiexec  /opt/.../bin/fdtd-engine-mpich2nem  /home/jweiner/.../myprogram

(Why is mpich2 in the name? You can't start MPICH2 programs with Open 
MPI's `mpiexec` - well, at least not to run them in parallel. Most likely n 
times the serial version will be started.)
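Just as an illustration of where the granted host list actually lives (the file 
content and path below are made up, and a tightly integrated Open MPI reads this 
file by itself so you never have to): at run time SGE points $PE_HOSTFILE at a 
file with one line per granted host. If you really had to pass an explicit host 
list to an MPI that is not tightly integrated, it would have to be derived from 
there, never hard-coded:

```shell
# Hypothetical $PE_HOSTFILE content for a 32-slot grant.
# Format per line: <hostname> <slots> <queue-instance> <processor-range>
cat > /tmp/pe_hostfile.example <<'EOF'
compute-0-2 16 all.q@compute-0-2 <NULL>
compute-0-3 16 all.q@compute-0-3 <NULL>
EOF

# Turn it into the host:slots list that mpiexec's -host option expects
# (only needed when the MPI library is NOT tightly integrated with SGE):
HOSTS=$(awk '{printf "%s%s:%s", sep, $1, $2; sep=","}' /tmp/pe_hostfile.example)
echo "$HOSTS"
```

This prints `compute-0-2:16,compute-0-3:16` - i.e. exactly what was granted, 
which may differ from job to job.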

Personally, I would judge the -V option as dangerous: some playing around 
with the environment in your interactive session might have the effect that 
sometimes the job runs and sometimes it fails, depending on what was changed 
in the shell. I prefer forwarding only those environment variables which I 
really need, or setting them explicitly in the jobscript.
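A sketch of what I mean (the variable names and the path are examples, not 
taken from your cluster): forward single variables with -v instead of the 
whole environment with -V, or set them in the script itself.

```shell
#!/bin/bash
#$ -cwd
#$ -v LD_LIBRARY_PATH                 # forward just this one variable from qsub's environment
export PATH=/opt/openmpi/bin:$PATH    # or set what you need explicitly in the jobscript
echo "$PATH"
```

This way the job behaves the same no matter what was fiddled with in the 
interactive shell beforehand.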


> 
> The command line is,
> 
> qsub -pe orte 64 myfile.sh
> 
> If I query the submission with qstat -f, I find 32 processes are distributed 
> to compute-0-2 and 32 processes to compute-0-3 with a status of "r" 
> indicating that they are running.

Nope. SGE will only output what was granted, not whether there is actually load 
on these machines.


>  The compute nodes compute-0-0 and compute-0-1 are not used, and the error 
> file contains the message,
> 
> error: executing task of job 47 failed: execution daemon on host 
> "compute-0-0" didn't accept task
> error: executing task of job 47 failed: execution daemon on host 
> "compute-0-1" didn't accept task

Correct output, as you try to use these two nodes despite the fact that they 
weren't granted for your job.



> Despite the r status on compute-0-2 and compute-0-3, in fact nothing happens 
> and the operation has to be aborted with qdel.
> 
> Somehow sge is not interacting correctly with the compute nodes.

It's working the other way round: SGE will grant access to the requested number 
of slots and Open MPI has to use these and only these.

-- Reuti



> Is there some environment variable to check or to edit? I didn't understand 
> your comment about qrsh -inherit.
> 
> Any suggestions would be greatly appreciated.
> 
> John
> 
> 
> On Jan 15, 2013, at 8:13 AM, Reuti <[email protected]> wrote:
> 
>> On Jan 15, 2013, at 2:06 AM, John Weiner wrote:
>> 
>>> Dear Experts:
>>> 
>>> I am a newbie to linux clusters and have only yeoman competence in 
>>> information technology generally so my culture and intuition are not deep.  
>>> Some help on a perplexing problem would be greatly appreciated.
>>> 
>>> About a month ago we installed Rocks v. 6.1 on a small cluster consisting 
>>> of a FrontEnd and two compute nodes.  The installation proceeded without 
>>> error and parallel processing on the cluster works fine.  The SGE queue 
>>> works fine as well.  SGE is a package installed with Rocks software.
>>> 
>>> We have just installed another Rocks 6.1 cluster, using different hardware, 
>>> on a FrontEnd and 5 Compute nodes.  After some adjustments to the BIOS on 
>>> the motherboards of the compute nodes, the installation looks normal, 1 
>>> FrontEnd and compute-0-0, compute-0-1…compute-0-5.  The compute nodes 
>>> consist of a SuperMicro motherboard with dual processors, Intel E5 2650 
>>> with hyper threading.  Each compute node has a total of 16 physical cores 
>>> and with hyper threading the "effective" number of cores is 32.
>>> 
>>>> When parallel jobs, using MPI,
>> 
>> By MPI you mean Open MPI (as you use a PE "orte" below)? There is only one 
>> `mpirun` installed, or at least the correct one is called in the jobscript?
>> 
>> 
>>>> are submitted "by hand", typing out the explicit commands at the command 
>>>> line, the system works without any problem.  When the very same job is 
>>>> submitted to the SGE queue, an error is generated, and although qstat 
>>>> indicates a running program, in fact it is not.  qstat -f shows that the 
>>>> job was not distributed among the four compute nodes as specified by 
>>>> the mpiexec command.
>> 
>> It's working the other way round: SGE will grant access to the requested 
>> number of slots and Open MPI has to use these and only these.
>> 
>> http://www.open-mpi.org/faq/?category=building#build-rte-sge
>> 
>> http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
>> 
>> The granted allocation can be viewed by:
>> 
>> $ qstat -g t
>> 
>> 
>>>> The command line for submitting the job to the SGE queue is
>>>> 
>>>> qsub -pe orte 64 shellfile.sh  (there are 64 cores specified for the job 
>>>> on 4 compute nodes)
>> 
>> You compiled Open MPI with --with-sge?
>> 
>> 
>>> In this case job 43 was started, but the program does not run on the 
>>> specified nodes with the specified cores.
>>>> 
>>>> The error from shellfile.sh.e43 is:
>>>> 
>>>> error: executing task of job 43 failed: execution daemon on host 
>>>> "compute-0-0" didn't accept task
>>>> error: executing task of job 43 failed: execution daemon on host 
>>>> "compute-0-1" didn't accept task
>> 
>> You set up a PE for Tight Integration of Open MPI?
>> 
>> One of the causes can be that at least one `qrsh -inherit ...` call was made 
>> to a slave machine of the parallel job beyond what the granted slot count 
>> thereon allows.
>> 
>> -- Reuti
>> 
>> NB: Nowadays often only one `qrsh -inherit ...` call is made at all to each 
>> slave machine, as additional processes are started as forks (you can observe 
>> this with `ps -e f`).
>> 
>> 
>>>> The job had been submitted to compute-0-0, compute-0-1, compute-0-2, 
>>>> compute-0-3
>>>> 
>>>> What does "execution daemon on host "compute-0-0" didn't accept task" mean?
>>> 
>>> Since SGE works without problems on the earlier cluster, I don't understand 
>>> where the error is here.
>>>> 
>>>> Any suggestions would be much appreciated.
>>> 
>>> John
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
>>> 
>> 
> 
> 

