I have a public queue *intel_all.q* and a private queue *namd.q* with
*subordinate_list      intel_all.q=1*

Some nodes of namd.q are also included in intel_all.q; there I have set
*suspend_method        /storage/Scripts/job_resubmit.sh $job_id*

cat /storage/Scripts/job_resubmit.sh
#!/bin/sh
/storage/SGE/bin/lx24-amd64/qresub $1
/storage/SGE/bin/lx24-amd64/qdel $1
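
A more defensive variant of this script could look like the sketch below. It only deletes the original job if qresub succeeded, and it reports failures on stderr so that a failing suspend script is easier to spot. The function name resubmit_and_kill and the SGE_BIN variable are hypothetical names introduced for illustration, and it is an assumption (not verified) that a failing script is related to the jobs staying in state S.

```shell
#!/bin/sh
# Sketch only: a more defensive take on job_resubmit.sh.
# SGE_BIN and resubmit_and_kill are illustrative names, not part of SGE.
SGE_BIN="${SGE_BIN:-/storage/SGE/bin/lx24-amd64}"

resubmit_and_kill() {
    job_id="$1"
    if [ -z "$job_id" ]; then
        echo "usage: resubmit_and_kill <job_id>" >&2
        return 2
    fi
    # Resubmit first; delete the original only if the copy was accepted.
    if "$SGE_BIN/qresub" "$job_id"; then
        "$SGE_BIN/qdel" "$job_id"
    else
        echo "qresub failed for job $job_id; not deleting it" >&2
        return 1
    fi
}

# When run as the suspend_method, the job id arrives as the first argument.
if [ "$#" -gt 0 ]; then
    resubmit_and_kill "$1"
fi
```

Pointing SGE_BIN at a directory with stub qresub/qdel scripts makes it possible to try the logic without touching real jobs.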

When even one job is submitted to the private queue, the public jobs should be resubmitted and killed.
Sometimes this doesn't work: they get status S (suspended)
sge143 lx24-amd64 24 45.65 47.3G 30.4G 48.0G 0.0
   namd.q               BIP   24/24
   intel_all.q          BIP *23/24    S*

5219266 0.50511 SemanticEx alexla *S* 12/29/2011 16:34:08 intel_all.q@sge143 1

and are still actually running and using the node's resources (CPU & memory).
How can I solve this problem?



