[Please keep the list posted so others can follow.]
On 15.01.2013 at 14:24, Weiner John wrote:
> Hi Reuti:
>
> Thank you for your questions and suggestions. I'll try to answer them and to
> show you with more clarity what is going wrong.
> 1. Yes I do mean Open MPI and as far as I know there is only one "mpirun"
> installed in this system. Basically the Rocks cluster installation includes
> a package called SGE that is installed together with everything else when the
> cluster configuration is set up on the front end node.
> 2. Therefore I am quite sure that Open MPI was compiled with SGE.
In the links I posted was a way mentioned to test it:
$ ompi_info | grep grid
MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.3)
> The same Rocks installation works with sge flawlessly on another cluster we
> have here.
Open MPI could have been compiled outside of the Rocks installation. Each and
every user could compile his/her own version of Open MPI and put it e.g. in
~/local/myopenmpi, and then only this version would be used.
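To rule out a stray user-built copy, it is worth checking which binary is
actually resolved and whether it is the SGE-aware build (a quick diagnostic
sketch; output depends on your installation):

```shell
# Which mpiexec does the shell pick up? May be a user's ~/local copy
# instead of the Rocks-installed one.
command -v mpiexec
# Open MPI identifies itself; MPICH2 prints something different here.
mpiexec --version
# Non-empty output means this Open MPI was compiled with SGE support.
ompi_info | grep gridengine
```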
> 3. When I execute a shell file test.sh with ./test.sh, where test.sh is:
>
> /opt/.../bin/mpiexec -host
> compute-0-0:16,compute-0-1:16,compute-0-2:16,compute-0-3:16
> /opt/.../bin/fdtd-engine-mpich2nem /home/jweiner/.../myprogram
>
> the program executes as expected. Sixteen processes are launched on each of
> compute-0-0, compute-0-1, compute-0-2, compute-0-3
>
> If I now place the same execution command line in a sge-type shell file,
> myfile.sh,
>
> #!/bin/bash
> #$ -V -cwd
> #$ -m bea
> #$ -M [email protected]
> /opt/.../bin/mpiexec -host
> compute-0-0:16,compute-0-1:16,compute-0-2:16,compute-0-3:16
> /opt/.../bin/fdtd-engine-mpich2nem /home/jweiner/.../myprogram
How do you know beforehand that you will be granted access to these nodes?
Instead, the list of hosts granted by SGE must be used. With a properly
compiled Open MPI it can even be left out when running under SGE.
#!/bin/bash
#$ -cwd
#$ -m bea
#$ -M [email protected]
mpiexec /opt/.../bin/fdtd-engine-mpich2nem /home/jweiner/.../myprogram
(Why is mpich2 here in the name? You can't start MPICH2 programs with Open
MPI's `mpiexec` - well, at least not to run them in parallel. Most likely n
times the serial version will be started.)
Personally I would judge the -V option as dangerous: some playing around with
the environment in your interactive session might have the effect that
sometimes the job runs and sometimes fails, depending on what was changed in
the shell. I prefer forwarding only those environment variables which I really
need, or setting them explicitly in the job script.
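Selective forwarding can be done with -v instead of -V (a sketch; the variable
names are placeholders for whatever the application really needs, and the
truncated paths are copied from the script above):

```shell
#!/bin/bash
#$ -cwd
#$ -m bea
#$ -M [email protected]
# Forward only the named variables from the submission environment,
# instead of the entire interactive environment via -V:
#$ -v PATH,LD_LIBRARY_PATH
mpiexec /opt/.../bin/fdtd-engine-mpich2nem /home/jweiner/.../myprogram
```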
>
> The command line is,
>
> qsub -pe orte 64 myfile.sh
>
> If I query the submission with qstat -f, I find 32 processes are distributed
> to compute-0-2 and 32 processes to compute-0-3 with a status of "r"
> indicating that they are running.
Nope. SGE will only output what was granted, not whether there is really load
on these machines.
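To check what was granted per host and whether anything is actually running
there, one can combine the two commands mentioned below in this thread (the
hostname is taken from the example above):

```shell
# Per-host MASTER/SLAVE allocation that SGE granted for the job.
qstat -g t
# Inspect the process tree on a granted node: look for orted and
# the fdtd-engine processes hanging off it.
ssh compute-0-2 'ps -e f'
```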
> The compute nodes compute-0-0 and compute-0-1 are not used, and the error
> file contains the message,
>
> error: executing task of job 47 failed: execution daemon on host
> "compute-0-0" didn't accept task
> error: executing task of job 47 failed: execution daemon on host
> "compute-0-1" didn't accept task
Correct output, as you try to use these two nodes despite the fact that they
weren't granted for your job.
> Despite the r status on compute-0-2 and compute-0-3, in fact nothing happens
> and the operation has to be aborted with qdel.
>
> Somehow sge is not interacting correctly with the compute nodes.
It's working the other way round: SGE will grant access to the requested number
of slots and Open MPI has to use these and only these.
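Even an MPI without SGE support can be pointed at exactly the granted nodes by
reading them from the $PE_HOSTFILE that SGE writes for each parallel job (one
line per host: hostname, slot count, queue instance, processor range). A
minimal sketch, assuming that file format:

```shell
#!/bin/bash
# Build a "host:slots,host:slots" list from the allocation SGE granted.
# $PE_HOSTFILE is set by SGE inside a parallel job; each line reads
# "<hostname> <slots> <queue-instance> <processor-range>".
hosts=$(awk '{printf "%s%s:%s", sep, $1, $2; sep=","}' "$PE_HOSTFILE")
echo "granted hosts: $hosts"
# e.g.: mpiexec -host "$hosts" /opt/.../bin/fdtd-engine-mpich2nem ...
```

With an SGE-aware Open MPI this is unnecessary, since `mpiexec` picks up the
allocation itself.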
-- Reuti
> Is there some environment variable to check or to edit? I didn't understand
> your comment about qrsh -inherit.
>
> Any suggestions would be greatly appreciated.
>
> John
>
>
> On Jan 15, 2013, at 8:13 AM, Reuti <[email protected]> wrote:
>
>> On 15.01.2013 at 02:06, John Weiner wrote:
>>
>>> Dear Experts:
>>>
>>> I am a newbie to linux clusters and have only yeoman competence in
>>> information technology generally so my culture and intuition are not deep.
>>> Some help on a perplexing problem would be greatly appreciated.
>>>
>>> About a month ago we installed Rocks v. 6.1 on a small cluster consisting
>>> of a FrontEnd and two compute nodes. The installation proceeded without
>>> error and parallel processing on the cluster works fine. The SGE queue
>>> works fine as well. SGE is a package installed with Rocks software.
>>>
>>> We have just installed another Rocks 6.1 cluster, using different hardware,
>>> on a FrontEnd and 5 Compute nodes. After some adjustments to the BIOS on
>>> the motherboards of the compute nodes, the installation looks normal, 1
>>> FrontEnd and compute-0-0, compute-0-1…compute-0-5. The compute nodes
>>> consist of a SuperMicro motherboard with dual processors, Intel E5 2650
>>> with hyper threading. Each compute node has a total of 16 physical cores
>>> and with hyper threading the "effective" number of cores is 32.
>>>
>>>> When parallel jobs, using MPI,
>>
>> By MPI you mean Open MPI (as you use a PE "orte" below) - there is only one
>> `mpirun` installed, or at least the correct one is called in the job script?
>>
>>
>>>> are submitted "by hand", typing out the explicit commands at the command
>>>> line, the system works without any problem. When the very same job is
>>>> submitted to the SGE queue, an error is generated, and although qstat
>>>> indicates a running program, in fact it is not. qstat -f shows that the
>>>> job was not distributed among the four compute nodes as specified by the
>>>> mpiexec command.
>>
>> It's working the other way round: SGE will grant access to the requested
>> number of slots and Open MPI has to use these and only these.
>>
>> http://www.open-mpi.org/faq/?category=building#build-rte-sge
>>
>> http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
>>
>> The granted allocation can be viewed by:
>>
>> $ qstat -g t
>>
>>
>>>> The command line for submitting the job to the SGE queue is
>>>>
>>>> qsub -pe orte 64 shellfile.sh (there are 64 cores specified for the job
>>>> on 4 compute nodes)
>>
>> You compiled Open MPI with --with-sge?
>>
>>
>>> In this case job 43 was started, but the program does not run on the
>>> specified nodes with the specified cores.
>>>>
>>>> The error from shellfile.sh.e43 is:
>>>>
>>>> error: executing task of job 43 failed: execution daemon on host
>>>> "compute-0-0" didn't accept task
>>>> error: executing task of job 43 failed: execution daemon on host
>>>> "compute-0-1" didn't accept task
>>
>> You set up a PE for Tight Integration of Open MPI?
>>
>> One of the causes can be that one `qrsh -inherit ...` call more is made to a
>> slave machine of the parallel job than allowed by the granted slot count
>> thereon.
>>
>> -- Reuti
>>
>> NB: Nowadays often only one `qrsh -inherit ...` call is made at all to each
>> slave machine, as additional processes are started as forks (you can observe
>> this with `ps -e f`).
>>
>>
>>>> The job had been submitted to compute-0-0, compute-0-1, compute-0-2,
>>>> compute-0-3
>>>>
>>>> What does "execution daemon on host "compute-0-0" didn't accept task" mean?
>>>
>>> Since SGE works without problems on the earlier cluster, I don't understand
>>> where the error is here.
>>>>
>>>> Any suggestions would be much appreciated.
>>>
>>> John
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
>>>
>>
>
>