Ideas anyone? I installed 8.0.0a, and ran into the same problem. There must be something very stupid I'm missing.
On Fri, Oct 7, 2011 at 12:34 PM, Allan Tran <[email protected]> wrote:
> I am installing GE 8.0b for testing using binary (install_execd and
> install_qmaster). Everything seemed going very smoothly. However when I
> submit a test job (sleeper.sh), it didn't run.
> I guess there is something I missed in configurations (I used most default
> options when I installed).
> Here is snippet of what in the log files: (master node is qmaster and also
> execution)
>
> Qmaster log:
> 10/07/2011 12:15:26| main|master|I|read job database with 0 entries in 0
> seconds
> 10/07/2011 12:15:26| main|master|E|error opening file
> "/usr/local/sge/default/spool/qmaster/./sharetree" for reading: No such file
> or directory
> 10/07/2011 12:15:26| main|master|I|qmaster hard descriptor limit is set to
> 8192
> 10/07/2011 12:15:26| main|master|I|qmaster soft descriptor limit is set to
> 8192
> 10/07/2011 12:15:26| main|master|I|qmaster will use max. 8172 file
> descriptors for communication
> 10/07/2011 12:15:26| main|master|I|qmaster will accept max. 99 dynamic
> event clients
> 10/07/2011 12:15:26| main|master|I|starting up SGE 8.0.0b (lx-amd64)
> 10/07/2011 12:22:57|worker|master|W|job 8.1 failed on host master invalid
> execution state because: shepherd exited with exit status 127: invalid
> execution state
>
> Execution log
> 10/07/2011 12:22:38| main|master|I|starting up SGE 8.0.0b (lx-amd64)
> 10/07/2011 12:22:57| main|master|E|shepherd of job 8.1 exited with exit
> status = 127
> 10/07/2011 12:22:57| main|master|E|abnormal termination of shepherd for
> job 8.1: no "exit_status" file
> 10/07/2011 12:22:57| main|master|E|can't open file active_jobs/8.1/error:
> No such file or directory
> 10/07/2011 12:22:57| main|master|E|can't open pid file
> "active_jobs/8.1/pid" for job 8.1
> 10/07/2011 12:22:57| main|master|E|can't open usage file
> "active_jobs/8.1/usage" for job 8.1: No such file or directory
> 10/07/2011 12:22:57| main|master|E|shepherd exited with exit status 127:
> invalid execution state
>
> Both qmaster and execd are running
> root@master:/usr/local/sge/default/spool/qmaster[1113]> ps aguwx | grep
> sge
> sgeadmin 7410 0.0 0.8 653128 34976 ? Sl 12:15 0:00
> /usr/local/sge/bin/lx-amd64/sge_qmaster
> sgeadmin 7546 0.0 0.0 111580 2532 ? Sl 12:22 0:00
> /usr/local/sge/bin/lx-amd64/sge_execd
> root 7669 0.0 0.0 61192 764 pts/1 S+ 12:33 0:00 grep sge
>
>
> Can someone help?
> Thank you
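
For anyone comparing notes on the same failure, here is a minimal sketch for pulling only the warning and error entries out of messages like the ones quoted above. It assumes the pipe-delimited layout seen in those lines (timestamp, thread, host, single-letter level, message) and nothing more. The names `parse_line` and `errors_and_warnings` are made up for illustration and are not part of Grid Engine.

```python
import re

# Assumed layout, inferred only from the quoted log lines:
#   "MM/DD/YYYY HH:MM:SS| thread|host|LEVEL|message"
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})"  # timestamp
    r"\|\s*([^|]+)"                            # thread (may be left-padded)
    r"\|([^|]+)"                               # host
    r"\|([A-Z])"                               # single-letter level (I, W, E seen above)
    r"\|(.*)$"                                 # free-form message
)


def parse_line(line):
    """Return a dict for one log line, or None if it does not match the assumed layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    timestamp, thread, host, level, message = m.groups()
    return {
        "timestamp": timestamp,
        "thread": thread.strip(),
        "host": host,
        "level": level,
        "message": message,
    }


def errors_and_warnings(lines):
    """Keep only the entries whose level is E or W."""
    records = (parse_line(line) for line in lines)
    return [rec for rec in records if rec and rec["level"] in ("E", "W")]


# Sample lines copied from the quoted execution log (wrapped lines rejoined).
SAMPLE = [
    "10/07/2011 12:22:38| main|master|I|starting up SGE 8.0.0b (lx-amd64)",
    "10/07/2011 12:22:57| main|master|E|shepherd of job 8.1 exited with exit status = 127",
    '10/07/2011 12:22:57| main|master|E|can\'t open pid file "active_jobs/8.1/pid" for job 8.1',
]

if __name__ == "__main__":
    for rec in errors_and_warnings(SAMPLE):
        print(rec["timestamp"], rec["level"], rec["message"])
```

To use it on a real file, pass an open file handle to `errors_and_warnings` instead of `SAMPLE`. Lines that do not fit the assumed layout are skipped, not guessed at.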
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
