I am installing GE 8.0b for testing, using the binary installers (install_execd and install_qmaster). Everything seemed to go smoothly. However, when I submitted a test job (sleeper.sh), it did not run. I suspect I missed something in the configuration (I accepted most of the default options during installation). Here is a snippet from the log files (the master node is both the qmaster and an execution host):

Qmaster log:

10/07/2011 12:15:26| main|master|I|read job database with 0 entries in 0 seconds
10/07/2011 12:15:26| main|master|E|error opening file "/usr/local/sge/default/spool/qmaster/./sharetree" for reading: No such file or directory
10/07/2011 12:15:26| main|master|I|qmaster hard descriptor limit is set to 8192
10/07/2011 12:15:26| main|master|I|qmaster soft descriptor limit is set to 8192
10/07/2011 12:15:26| main|master|I|qmaster will use max. 8172 file descriptors for communication
10/07/2011 12:15:26| main|master|I|qmaster will accept max. 99 dynamic event clients
10/07/2011 12:15:26| main|master|I|starting up SGE 8.0.0b (lx-amd64)
10/07/2011 12:22:57|worker|master|W|job 8.1 failed on host master invalid execution state because: shepherd exited with exit status 127: invalid execution state

Execution log:

10/07/2011 12:22:38| main|master|I|starting up SGE 8.0.0b (lx-amd64)
10/07/2011 12:22:57| main|master|E|shepherd of job 8.1 exited with exit status = 127
10/07/2011 12:22:57| main|master|E|abnormal termination of shepherd for job 8.1: no "exit_status" file
10/07/2011 12:22:57| main|master|E|can't open file active_jobs/8.1/error: No such file or directory
10/07/2011 12:22:57| main|master|E|can't open pid file "active_jobs/8.1/pid" for job 8.1
10/07/2011 12:22:57| main|master|E|can't open usage file "active_jobs/8.1/usage" for job 8.1: No such file or directory
10/07/2011 12:22:57| main|master|E|shepherd exited with exit status 127: invalid execution state

Both qmaster and execd are running:

root@master:/usr/local/sge/default/spool/qmaster[1113]> ps aguwx | grep sge
sgeadmin 7410 0.0 0.8 653128 34976 ? Sl 12:15 0:00 /usr/local/sge/bin/lx-amd64/sge_qmaster
sgeadmin 7546 0.0 0.0 111580 2532 ? Sl 12:22 0:00 /usr/local/sge/bin/lx-amd64/sge_execd
root 7669 0.0 0.0 61192 764 pts/1 S+ 12:33 0:00 grep sge

Can someone help? Thank you.
