On 18.12.2014 at 22:21, [email protected] wrote:
> 
> We've got a job that was suspended via:
> 
>       qmod -sj $jobid
> 
> that's continuing to run.  The job consists of a BASH script, which in
> turn submits other jobs in a loop, sleeping for 30 seconds after each loop.
> 
> When I examine the job status on the node where it is executing via:
>       ps -e f | grep $jobid
> 
> I see that the process is sleeping (state "S"), which is not unexpected,
> given the 'sleep 30' in the loop, but not suspended (state "T"):
> 
>       30559 ?        SNs    0:02  |   \_ /bin/bash /var/tmp/gridengine/8.1.6/default/spool/node-5-2/job_scripts/2367998

Maybe this was introduced in this version, as in 6.2u5 it works for me. Do
you have a chance to test another version on a different machine with the
application in question?
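
A quick check that needs no second machine (a minimal sketch, assuming a Linux node with procps `ps` and `pgrep`; the function and variable names are made up for illustration): when no suspend_method is configured, SGE suspends a job by sending it SIGSTOP, so one can send that signal by hand to a copy of the trivial date/sleep loop quoted below and see whether the shell enters state T at all.

```shell
#!/bin/bash
# Sketch (assumption: run by hand on a Linux node, outside SGE).
# Does a bash loop shaped like the trivial job enter state T (stopped)
# when it receives SIGSTOP, SGE's default suspend signal?

# Print the first letter of a process's STAT column
# (S = sleeping, R = running, T = stopped).
state_of() {
    ps -o stat= -p "$1" | cut -c1
}

# Stand-in job: same shape as the trivial job script. Output goes to
# /dev/null so no leftover process keeps a terminal or pipe open.
bash -c 'while true; do date; sleep 30; done' > /dev/null 2>&1 &
job=$!
sleep 1                      # let the loop reach its 'sleep 30'

before=$(state_of "$job")
kill -STOP "$job"            # SGE's default suspend signal
sleep 1
stopped=$(state_of "$job")
kill -CONT "$job"            # SGE's default resume signal
sleep 1
resumed=$(state_of "$job")

echo "before STOP: $before  after STOP: $stopped  after CONT: $resumed"

# Clean up: remember the loop's current 'sleep 30' child, then kill
# both so nothing is left running.
child=$(pgrep -P "$job" || true)
kill -KILL "$job"
if [ -n "$child" ]; then
    kill $child 2>/dev/null || true
fi
wait "$job" 2>/dev/null || true
```

On the execution host the equivalent check would be `state_of 30559` right after `qmod -sj`. If the stand-in goes to T but the real job shell stays in S, that would suggest the stop signal is not reaching the job's processes at all, rather than something in the script preventing it.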

-- Reuti


> Indeed, the job is not suspended, as it keeps performing the action
> inside the loop.
> 
> The problem can be consistently reproduced with a trivial job, such as:
> 
> ------------------------
> #! /bin/bash
> i=0
> while [ $i -le 100 ]
> do
>       date
>       i=$((i + 1))
>       sleep 30
> done
> ------------------------
> 
> Submitting that job to SGE, then executing 'qmod -sj $jobid' after it
> starts does not suspend the running job. The 'qstat' command does show
> the job as being in the 's' (suspended) state.
> 
> We're not using any custom 'suspend_method' or changing the default
> signals sent by SGE.
> 
> Jobs that SGE suspends because of queue subordination have never
> shown this behavior.
> 
> Any suggestions about how to proceed with troubleshooting?
> 
> Thanks,
> 
> Mark
> 
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
