Any ideas, anyone? I installed 8.0.0a and ran into the same problem. There must
be something very stupid I'm missing.

On Fri, Oct 7, 2011 at 12:34 PM, Allan Tran <[email protected]> wrote:

> I am installing GE 8.0b for testing using the binaries (install_execd and
> install_qmaster). Everything seemed to go very smoothly. However, when I
> submitted a test job (sleeper.sh), it didn't run.
> I guess I missed something in the configuration (I used mostly default
> options when I installed).
> Here is a snippet of what is in the log files (the master node is both the
> qmaster and the execution host):
>
> Qmaster log:
> 10/07/2011 12:15:26|  main|master|I|read job database with 0 entries in 0
> seconds
> 10/07/2011 12:15:26|  main|master|E|error opening file
> "/usr/local/sge/default/spool/qmaster/./sharetree" for reading: No such file
> or directory
> 10/07/2011 12:15:26|  main|master|I|qmaster hard descriptor limit is set to
> 8192
> 10/07/2011 12:15:26|  main|master|I|qmaster soft descriptor limit is set to
> 8192
> 10/07/2011 12:15:26|  main|master|I|qmaster will use max. 8172 file
> descriptors for communication
> 10/07/2011 12:15:26|  main|master|I|qmaster will accept max. 99 dynamic
> event clients
> 10/07/2011 12:15:26|  main|master|I|starting up SGE 8.0.0b (lx-amd64)
> 10/07/2011 12:22:57|worker|master|W|job 8.1 failed on host master invalid
> execution state because: shepherd exited with exit status 127: invalid
> execution state
>
> Execution log:
> 10/07/2011 12:22:38|  main|master|I|starting up SGE 8.0.0b (lx-amd64)
> 10/07/2011 12:22:57|  main|master|E|shepherd of job 8.1 exited with exit
> status = 127
> 10/07/2011 12:22:57|  main|master|E|abnormal termination of shepherd for
> job 8.1: no "exit_status" file
> 10/07/2011 12:22:57|  main|master|E|can't open file active_jobs/8.1/error:
> No such file or directory
> 10/07/2011 12:22:57|  main|master|E|can't open pid file
> "active_jobs/8.1/pid" for job 8.1
> 10/07/2011 12:22:57|  main|master|E|can't open usage file
> "active_jobs/8.1/usage" for job 8.1: No such file or directory
> 10/07/2011 12:22:57|  main|master|E|shepherd exited with exit status 127:
> invalid execution state
>
> Both qmaster and execd are running:
> root@master:/usr/local/sge/default/spool/qmaster[1113]> ps aguwx | grep
> sge
> sgeadmin  7410  0.0  0.8 653128 34976 ?        Sl   12:15   0:00
> /usr/local/sge/bin/lx-amd64/sge_qmaster
> sgeadmin  7546  0.0  0.0 111580  2532 ?        Sl   12:22   0:00
> /usr/local/sge/bin/lx-amd64/sge_execd
> root      7669  0.0  0.0  61192   764 pts/1    S+   12:33   0:00 grep sge
>
>
> Can someone help?
> Thank you
>
>
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users