Hello GridUsers,

My grid is running and can deliver jobs, but they only run on one node at a time. When I tried running mpirun in a batch script, I got errors like "execution daemon on host <hostname> didn't accept task", as shown at the bottom of this email.
I can run mpirun outside of SGE without any problems. I suspect that when mpirun is run inside the SGE batch script, it cannot communicate with the exec nodes successfully.

My system information: 3 servers running Ubuntu Lucid Lynx, with Open MPI recompiled to support gridengine. SGE was installed from the Ubuntu repository, and I set up the correct environment variables. I also set up passwordless SSH access for the openmpi user account, which is the same account I use to submit SGE batch jobs.

Any help is very much appreciated.

Vang.

============ERROR================
error: executing task of job 63 failed: execution daemon on host "node1" didn't accept task
error: executing task of job 63 failed: execution daemon on host "submithost" didn't accept task
--------------------------------------------------------------------------
A daemon (pid 13317) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.

============CONTENT OF SGE BATCH SUBMIT==============
#!/bin/bash
# Run in the current working directory
#$ -cwd
#$ -V
# Specify the shell for this job
#$ -S /bin/bash
#$ -pe test_pe 5
#$ -P test1
# Merge the standard output and standard error
#$ -j y
# Specify the location of the output messages
#$ -o messages.txt
#---------Customization part starts below -------------
# Customization
# Which email address should be notified when this job begins and ends
#
#$ -M <my_gmail_id>@gmail.com
#$ -m be
export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
mpirun -np $NSLOTS hostname
mpirun -np $NSLOTS ~/hello
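
A minimal sketch of checks that might help narrow this down, offered as a starting point and not a confirmed diagnosis. It assumes the PE name test_pe from the batch script above and that qconf and ompi_info are on the PATH of the submitting account. The helper run_if_present is a hypothetical name introduced here for illustration. As far as I understand, Open MPI's SGE integration starts its remote daemons through the execution daemon, which requires control_slaves to be TRUE in the parallel environment, so that attribute is worth looking at in the qconf output.

```shell
#!/bin/bash
# Hypothetical diagnostic helper; not part of the original job script.
# PE name taken from the "#$ -pe test_pe 5" line in the batch script.
PE_NAME="${PE_NAME:-test_pe}"

# Run a command only if its executable is on PATH; otherwise report that it was skipped.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped: $1 not found on PATH"
    fi
}

# Show the parallel environment definition (look at control_slaves and allocation_rule).
run_if_present qconf -sp "$PE_NAME"

# Check whether this Open MPI build reports a gridengine component.
ompi_out="$(run_if_present ompi_info)"
echo "$ompi_out" | grep -i -e gridengine -e skipped \
    || echo "no gridengine component reported by ompi_info"
```

Running this as the same account that submits the job shows the PE settings and the Open MPI build information side by side with the error output above.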
