Hi Chris,
I don't see the 'au' state indicated (but maybe I ran the command wrong):
[root@mseas rocksconfig.d]# qstat -f -u '*'
############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
12 0.50000 PeJob phaley qw 05/25/2016 09:55:03 1
One more piece of information: on one of my compute nodes, I did try
manually launching a daemon, but that didn't seem to help either:
compute-0-0(~)% ps aux | grep sge
sge 23767 0.0 0.0 153572 2248 ? Sl May24 0:02 ./sge_execd
I agree that I don't have a functional grid, but I don't know where the
exact problem is yet.
Thanks
On 05/25/2016 11:17 AM, Chris Dagdigian wrote:
I'd be willing to bet the output of "qstat -f -u '*'" shows that all
your compute nodes are in 'au' state.
If there is no sge_execd process running on each compute node then
Grid Engine won't work and it can't dispatch "work" to those nodes.
The errors you see and the jobs pending forever in wait state are just
symptoms of the real problem -- you have no functional grid to which
the jobs can be dispatched.
Basically your compute nodes fell over; if you can restart SGE on
those nodes and monitor via 'qstat -f' to confirm that the 'au' state
goes away, then your jobs should start flowing again.
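The 'au' check described above can be sketched as a small shell filter. This is only a rough sketch, assuming the usual "qstat -f" layout in which each queue instance line starts with a queue@host name and carries its state letters, if any, in the sixth column. The function name au_queues is made up for illustration and is not part of Grid Engine.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): read "qstat -f" output on
# stdin and print the queue instances whose state column contains both
# 'a' (alarm) and 'u' (unknown, i.e. sge_execd cannot be contacted).
#
# Assumed layout of a queue instance line:
#   queuename  qtype  resv/used/tot.  load_avg  arch  states
# Header lines, separators and job lines have no '@' in the first field,
# so they are skipped.
au_queues() {
    awk '$1 ~ /@/ && NF >= 6 && $6 ~ /a/ && $6 ~ /u/ { print $1 }'
}

# Typical use on the master host:
#   qstat -f | au_queues
```

If the filter prints nothing after SGE has been restarted on the nodes, no queue instance is reporting 'au' any more. Note that it also prints nothing when "qstat -f" lists no queue instances at all, so an empty result is worth checking against the raw output.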
Chris
Pat Haley wrote:
We have also noticed that there are no SGE daemons running on any of
the execution nodes (I don't know if that is normal or not). We have
also collected the information below from qconf. Any help in
resolving this would be greatly appreciated.
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: pha...@mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301