On 16 November 2011 13:52, Vang Le <[email protected]> wrote:
> Hi William and Reuti,
> Thank you for your suggestions and your time. They are really helpful. I
> solved almost all of my problems.
>
> I installed rsh-redone-client and rsh-redone-server, and I also modified my PE so
> that "control_slaves TRUE" is set. I can run this part now:
>
> mpirun -np $NSLOTS hostname
> mpirun -np $NSLOTS ~/hello
>
> However, I still cannot start an interactive PE with qsh or qrsh. They both
> said:
> ---------
> $ qrsh -pe test_pe 5
> Your "qrsh" request could not be scheduled, try again later.
> ---------
> qsh -pe test_pe 5
> Your job 50 ("INTERACTIVE") has been submitted
> waiting for interactive job to be scheduled ...
>
> Your "qsh" request could not be scheduled, try again later.
> ---------
>
> I googled and found something about editing the /etc/hosts.equiv
> file to permit rsh and rlogin without a password. However, typing "qconf
> -mconf" at the management host, I saw this:
> ----
> rlogin_daemon /usr/sbin/sshd -i
> rlogin_command /usr/bin/ssh
> ----
>
> Do I need to change something in the queue and PE to run interactive PE?

Check that qtype in the queue configuration (queue_conf) is either INTERACTIVE or BATCH INTERACTIVE if you want to run without -now n
William

>
> Regards
> Vang.
>
> On 11/16/11 11:03 AM, Reuti wrote:
>
> Hi,
>
> On 16.11.2011 at 04:29, Vang Le wrote:
>
> Hello GridUsers,
> My grid is running and it can dispatch jobs, but they only run on one node at a
> time.
> When I tried running with mpirun in a batch script, I got errors like
> "execution daemon on host <hostname> didn't accept task" as shown at the
> bottom of this email.
>
> can you please check whether your Open MPI was built with support for SGE
> properly:
>
> $ ompi_info | grep grid
> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.4.3)
>
> A simple `hostname` should work. You installed this version of Open MPI on
> all machines? What does your PE definition look like: "control_slaves TRUE"
> is set?
>
> -- Reuti
>
>
> I can run mpirun outside of SGE without any problems.
> I suspect that when mpirun is put inside the SGE batch script, it cannot
> communicate with the exec nodes successfully.
>
>
> My system information:
> 3 servers running Ubuntu Lucid Lynx with Open MPI recompiled to support
> gridengine. SGE was installed via the Ubuntu repository, and I set up the
> correct environment variables.
> I also set up passwordless ssh access for the openmpi user account, which is the
> same account that I use to submit SGE batch jobs.
>
>
> Any help is very much appreciated.
>
> Vang.
>
>
>
>
> ============ERROR================
> error: executing task of job 63 failed: execution daemon on host "node1"
> didn't accept task
> error: executing task of job 63 failed: execution daemon on host
> "submithost" didn't accept task
> --------------------------------------------------------------------------
> A daemon (pid 13317) died unexpectedly with status 1 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node.
> You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
>
>
> ============CONTENT OF SGE BATCH SUBMIT==============
>
> #!/bin/bash
>
> # run at current working directory
> #$ -cwd
> #$ -V
> # Specify the shell for this job
> #$ -S /bin/bash
> #$ -pe test_pe 5
> #$ -P test1
>
> # Merge the standard output and standard error
> #$ -j y
>
> # Specify the location of the output messages
> #$ -o messages.txt
>
> #---------Customization part starts below -------------
> # Customization
> # Email address to notify when this job begins and ends
> #
> #$ -M <my_gmail_id>@gmail.com
> #$ -m be
>
> export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
>
> mpirun -np $NSLOTS hostname
> mpirun -np $NSLOTS ~/hello
>
>
>
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
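
A rough sketch of the qtype check suggested in the reply at the top of this message. It assumes `qconf -sq <queue>` prints a `qtype` line listing the allowed job types, as it does on a typical Grid Engine install. The function name `queue_allows_interactive` and the queue name `all.q` are placeholders and do not come from the thread.

```shell
#!/bin/sh
# Reads a queue configuration (the text printed by `qconf -sq <queue>`)
# on stdin and succeeds only if its qtype line includes INTERACTIVE.
# queue_allows_interactive is a hypothetical helper name.
queue_allows_interactive() {
    awk '
        # The line of interest looks like: "qtype    BATCH INTERACTIVE"
        $1 == "qtype" {
            for (i = 2; i <= NF; i++)
                if ($i == "INTERACTIVE") found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on a cluster (all.q is a placeholder queue name):
#   if qconf -sq all.q | queue_allows_interactive; then
#       echo "all.q accepts interactive jobs"
#   else
#       echo "all.q is batch-only; fix qtype with: qconf -mq all.q"
#   fi
```

If the queue turns out to be batch-only, qrsh and qsh requests have nowhere to be scheduled, which would match the "could not be scheduled, try again later" message quoted above. Other causes, such as no free slots at submit time, are still possible.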
