Hi William and Reuti,
Thank you for your suggestions and your time. They were really helpful, and I have solved almost all of my problems.

I installed rsh-redone-client and rsh-redone-server, and I also modified my PE so that "control_slaves TRUE" is set. I can now run these commands:

mpirun -np $NSLOTS hostname
mpirun -np $NSLOTS ~/hello
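The function below is a minimal sketch for confirming the PE change. It reads the output of `qconf -sp test_pe` and reports `control_slaves` together with `job_is_first_task`, which is often set to FALSE for Open MPI. The helper name `check_pe` is hypothetical. Only `qconf -sp` comes from Grid Engine itself.

```shell
#!/bin/bash
# Sketch: report the PE settings that tight integration relies on.
# Reads "qconf -sp <pe>" output on stdin; check_pe is a hypothetical helper.
check_pe() {
    awk '
        $1 == "control_slaves"    { cs = $2 }
        $1 == "job_is_first_task" { jf = $2 }
        END {
            print "control_slaves=" cs
            print "job_is_first_task=" jf
            # Fail if tight integration is not enabled
            if (cs != "TRUE") { print "WARN: control_slaves is not TRUE"; exit 1 }
        }'
}
# Usage on the cluster (test_pe is the PE name used in this thread):
#   qconf -sp test_pe | check_pe
```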

However, I still cannot start an interactive parallel job with qsh or qrsh. Both fail:
---------
$ qrsh -pe test_pe 5
Your "qrsh" request could not be scheduled, try again later.
---------
$ qsh -pe test_pe 5
Your job 50 ("INTERACTIVE") has been submitted
waiting for interactive job to be scheduled ...

Your "qsh" request could not be scheduled, try again later.
---------

I googled and found a suggestion to edit the /etc/hosts.equiv file to permit
rsh and rlogin without a password. However, when I ran "qconf -mconf" on the
management host, I saw this:
----
rlogin_daemon                /usr/sbin/sshd -i
rlogin_command               /usr/bin/ssh
----
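The sketch below prints every remote-startup entry from the global configuration. `qconf -sconf` shows that configuration without opening an editor. According to the remote_startup(5) man page as I understand it, each entry pair serves a different command:

- `rsh_command` and `rsh_daemon` are used for `qrsh` with a command, including the `qrsh -inherit` calls that a tightly integrated mpirun makes.
- `rlogin_*` is used for `qrsh` without a command.
- `qlogin_*` is used for `qlogin`.
- qsh does not use any of them; it starts an xterm.

"Could not be scheduled" is reported by the scheduler before any of these settings come into play, so they are unlikely to cause this particular error. The helper name `show_remote_startup` is hypothetical.

```shell
#!/bin/bash
# Sketch: print the remote-startup settings from "qconf -sconf" (on stdin)
# as key=value, one per line. show_remote_startup is a hypothetical helper.
show_remote_startup() {
    awk '$1 ~ /^(rsh|rlogin|qlogin)_(command|daemon)$/ {
        key = $1
        $1 = ""                 # drop the key; $0 is rebuilt from the rest
        sub(/^ +/, "")          # remove the leading separator left behind
        print key "=" $0
    }'
}
# Usage:
#   qconf -sconf | show_remote_startup
```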

Do I need to change something in the queue or PE configuration to run interactive parallel jobs?
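This is a minimal sketch for the queue side. It assumes the interactive PE jobs should go to `all.q` (Grid Engine's default queue name; substitute your own). An interactive parallel job needs a queue whose `qtype` includes INTERACTIVE and whose `pe_list` contains the PE. qrsh and qsh also request immediate scheduling by default, so the same message appears when 5 slots are not free at that moment. The helper name `check_queue` is hypothetical.

```shell
#!/bin/bash
# Sketch: does a queue accept interactive jobs, and does it list the PE?
# Reads "qconf -sq <queue>" output on stdin; $1 is the PE name.
# Assumption: no per-host "[host=...]" overrides in qtype or pe_list.
check_queue() {
    awk -v pe="$1" '
        $1 == "qtype" || $1 == "pe_list" {
            key = $1
            line = $0
            sub(/^[a-z_]+[ \t]+/, "", line)     # drop the key name
            n = split(line, v, "[ ,\t]+")       # values may be space- or comma-separated
            for (i = 1; i <= n; i++) {
                if (key == "qtype" && v[i] == "INTERACTIVE") inter = 1
                if (key == "pe_list" && v[i] == pe) haspe = 1
            }
        }
        END {
            print "interactive=" (inter ? "yes" : "no")
            print "pe_listed=" (haspe ? "yes" : "no")
            if (!(inter && haspe)) exit 1
        }'
}
# Usage (change qtype/pe_list with "qconf -mq all.q" if needed):
#   qconf -sq all.q | check_queue test_pe
# Let the request wait for free slots instead of failing immediately:
#   qrsh -now no -pe test_pe 5 hostname
```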

Regards
Vang.

On 11/16/11 11:03 AM, Reuti wrote:
Hi,

On 16.11.2011 at 04:29, Vang Le wrote:

Hello GridUsers,
My grid is running and can deliver jobs, but they only run on one node at a
time.
When I try running mpirun in a batch script, I get errors like "execution daemon on
host <hostname> didn't accept task", as shown at the bottom of this email.
Can you please check whether your Open MPI was built with proper SGE
support:

$ ompi_info | grep grid
                  MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.4.3)
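Reuti's check can be repeated on every machine. This is a minimal sketch. It assumes password-less ssh to each host, which the thread says is set up, and that `ompi_info` is on the remote non-interactive PATH. `has_gridengine` and `check_hosts` are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: succeed if "ompi_info" output (on stdin) lists the gridengine
# ras component, as in Reuti's example above.
has_gridengine() {
    grep -q 'MCA ras: gridengine'
}

# Run the check on each host given as an argument (assumes password-less
# ssh and ompi_info on the remote non-interactive PATH).
check_hosts() {
    local h
    for h in "$@"; do
        if ssh "$h" ompi_info | has_gridengine; then
            echo "$h: gridengine support found"
        else
            echo "$h: gridengine support MISSING"
        fi
    done
}
# Usage (host names taken from the error log; add your third server):
#   check_hosts node1 submithost
```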

A simple `hostname` should work. Did you install this version of Open MPI on all machines?
What does your PE definition look like? Is "control_slaves TRUE" set?

-- Reuti


I can run mpirun outside of SGE without any problems.
I suspect that when mpirun is started inside the SGE batch script, it cannot
communicate with the execution nodes.


My system information:
3 servers running Ubuntu Lucid Lynx, with Open MPI recompiled to support
Grid Engine. SGE was installed from the Ubuntu repository, with the environment
variables set up correctly.
I also set up password-less ssh access for the openmpi user account, which is the
same account that I use to submit SGE batch jobs.


Any help is very much appreciated.

Vang.




============ERROR================
error: executing task of job 63 failed: execution daemon on host "node1" didn't 
accept task
error: executing task of job 63 failed: execution daemon on host "submithost" 
didn't accept task
--------------------------------------------------------------------------
A daemon (pid 13317) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.


============CONTENT OF SGE BATCH SUBMIT==============

#!/bin/bash

# Run in the current working directory
#$ -cwd
#$ -V
# Specify the shell for this job
#$ -S /bin/bash
#$ -pe test_pe 5
#$ -P test1

# Merge the standard output and standard error
#$ -j y

# Specify the location of the output messages
#$ -o messages.txt

#---------Customization part starts below -------------
# Customization
# Email address to notify when this job starts and ends
#
#$ -M <my_gmail_id>@gmail.com
#$ -m be

export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH

mpirun -np $NSLOTS hostname
mpirun -np $NSLOTS ~/hello
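Lines like the following could be added to the batch script above to log what SGE actually granted before mpirun starts. This is a sketch. `PE_HOSTFILE` and `NSLOTS` are variables that SGE sets in parallel jobs, and `sum_pe_slots` is a hypothetical helper. With tight integration, mpirun reads the same host file, so the slot total should match `$NSLOTS`.

```shell
#!/bin/bash
# Sketch: log the granted hosts and slots at the start of a parallel job.
# Each PE_HOSTFILE line has the form: host slots queue processor-range

# Sum the second column (slots) of a PE host file given as $1.
sum_pe_slots() {
    awk '{ total += $2 } END { print total + 0 }' "$1"
}

# Only runs inside an SGE parallel job, where PE_HOSTFILE is set.
if [ -n "$PE_HOSTFILE" ] && [ -r "$PE_HOSTFILE" ]; then
    cat "$PE_HOSTFILE"
    echo "slots in PE_HOSTFILE: $(sum_pe_slots "$PE_HOSTFILE"), NSLOTS: $NSLOTS"
fi
```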




_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
