Hi Chris,

I don't see the 'au' state indicated (but maybe I ran the command incorrectly):

[root@mseas rocksconfig.d]# qstat -f -u '*'

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     12 0.50000 PeJob      phaley       qw    05/25/2016 09:55:03     1

One more piece of information: on one of my compute nodes, I tried manually launching a daemon, but that didn't seem to help either.

compute-0-0(~)% ps aux | grep sge
sge      23767  0.0  0.0 153572  2248 ?        Sl   May24 0:02 ./sge_execd

I agree that I don't have a functional grid, but I don't know where the exact problem is yet.

Thanks

On 05/25/2016 11:17 AM, Chris Dagdigian wrote:

I'd be willing to bet the output of "qstat -f -u '*'" shows that all your compute nodes are in 'au' state.

If there is no sge_execd process running on each compute node then Grid Engine won't work and it can't dispatch "work" to those nodes.
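As a rough sketch of that check (not part of the original message): the helper below reads "ps aux" output on stdin and succeeds only if some process's command is sge_execd. The function name has_execd is made up for illustration, and it assumes the standard Linux "ps aux" layout, where the command starts in the 11th column.

```shell
# has_execd: hypothetical helper, reads `ps aux` output on stdin.
# Exits 0 if any process command (column 11) is sge_execd, with or
# without a leading path such as "./"; exits 1 otherwise.
has_execd() {
  awk '$11 ~ /(^|\/)sge_execd$/ { found = 1 } END { exit !found }'
}
```

On a compute node it could be used as: ps aux | has_execd && echo "sge_execd is running". Unlike "ps aux | grep sge", this does not match the grep process itself.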

The errors you see and the jobs pending forever in wait state are just symptoms of the real problem -- you have no functional grid in which to dispatch the jobs.

Basically your compute nodes fell over; if you can restart SGE on those nodes and monitor via 'qstat -f' to confirm that the 'au' state goes away, then your jobs should start flowing again.
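One possible way to do that monitoring, as a sketch only: the helper below counts queue instances in the "qstat -f" listing whose state column contains both 'a' and 'u'. The function name count_au is hypothetical, and it assumes the usual six-column queue-instance line (queuename, qtype, slots, load_avg, arch, states), where the states column is empty for a healthy queue instance.

```shell
# count_au: hypothetical helper, reads `qstat -f` output on stdin.
# Prints how many queue-instance lines (first field contains "@")
# have a 6th field (states) containing both 'a' and 'u'.
count_au() {
  awk '$1 ~ /@/ && NF >= 6 && $6 ~ /a/ && $6 ~ /u/ { n++ } END { print n + 0 }'
}
```

It could be run as: qstat -f | count_au. After sge_execd is restarted on the nodes, the count should drop toward 0 if the nodes are reporting in again.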

Chris



Pat Haley wrote:

We have also noticed that there are no SGE daemons running on any of the execution nodes (I don't know if that is normal or not). We have also collected the information below from qconf. Any help in resolving this would be greatly appreciated.


--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  pha...@mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
