Hi,
On 16.11.2011 at 04:29, Vang Le wrote:
> Hello GridUsers,
> My grid is running and can dispatch jobs, but they only run on one node at a
> time.
> When I try running mpirun in a batch script, I get errors like
> "execution daemon on host <hostname> didn't accept task" as shown at the
> bottom of this email.
Can you please check whether your Open MPI was built with SGE support:
$ ompi_info | grep grid
MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.4.3)
A simple `hostname` run through mpirun should work. Did you install this
version of Open MPI on all machines? What does your PE definition look like:
is "control_slaves TRUE" set?
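
A minimal sketch of that PE check, in case it helps. `check_pe` is a
hypothetical helper name (not part of SGE); it only reads the text that
`qconf -sp` prints and looks at the control_slaves line. With control_slaves
set to FALSE, the execution daemons would typically refuse the tasks that
mpirun starts on the slave nodes, which would match the error you see.

```shell
#!/bin/bash
# check_pe (hypothetical helper): reads a PE definition on stdin, in the
# "attribute value" format printed by `qconf -sp <pe_name>`, and reports
# whether control_slaves is TRUE. Returns 0 if it is, 1 otherwise.
check_pe() {
    awk '$1 == "control_slaves" { found = 1; val = toupper($2) }
         END {
             if (found && val == "TRUE") { print "control_slaves OK"; exit 0 }
             print "control_slaves is not TRUE"
             exit 1
         }'
}
```

On the cluster it would be used as `qconf -sp test_pe | check_pe`, with
test_pe being the PE name from your batch script.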
-- Reuti
> I can run mpirun outside of SGE without any problems.
> I suspect that when mpirun is run inside the SGE batch script, it cannot
> communicate with the exec nodes successfully.
>
>
> My system information:
> 3 servers running Ubuntu Lucid Lynx, with Open MPI recompiled to support
> gridengine. SGE was installed from the Ubuntu repository, and the correct
> environment variables are set up.
> I also set up passwordless ssh access for the openmpi user account, which is
> the same account that I use to submit SGE batch jobs.
>
>
> Any help is very much appreciated.
>
> Vang.
>
>
>
>
> ============ERROR================
> error: executing task of job 63 failed: execution daemon on host "node1"
> didn't accept task
> error: executing task of job 63 failed: execution daemon on host "submithost"
> didn't accept task
> --------------------------------------------------------------------------
> A daemon (pid 13317) died unexpectedly with status 1 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
>
>
> ============CONTENT OF SGE BATCH SUBMIT==============
>
> #!/bin/bash
>
> # run at current working directory
> #$ -cwd
> #$ -V
> # Specify the shell for this job
> #$ -S /bin/bash
> #$ -pe test_pe 5
> #$ -P test1
>
> # Merge the standard output and standard error
> #$ -j y
>
> # Specify the location of the output messages
> #$ -o messages.txt
>
> #---------Customization part starts below -------------
> # Customization
> # Which email address should be notified when this job starts and ends
> #
> #$ -M <my_gmail_id>@gmail.com
> #$ -m be
>
> export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
>
> mpirun -np $NSLOTS hostname
> mpirun -np $NSLOTS ~/hello
>
>
>
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users