Hi,
On 01.11.2013 at 19:18, Joseph Farran wrote:
> Yes, after going through the logs, the subsequent restarts are messed up.
>
> I've played with it more and there is no easy way to do this inside the job
> submission script,
Inside the submission script it is possible (I thought you were looking to get
it implemented in SGE itself). The users then have to take care of it
themselves [i.e. you have to trust the users], or you use a "starter_method":
#!/bin/sh
. /usr/sge/default/common/settings.sh
# Background timer: suspend this job via qmod after 172800 s (48 hours).
{ sleep 172800; qmod -sj "$JOB_ID"; } &
./my_application
> so I will have to resort (as you indicated) to using an outside script that
> runs periodically and does a "qmod -sj" on the job / job.task-id when it is
> near the s_rt value.
>
> It seems to me that Grid Engine is missing an option in the checkpoint
> environment to handle the case where the s_rt value has been reached and then
> trigger the equivalent of a suspension ("qmod -sj").
Yes. I would call it runtime-interval inside the checkpoint definition or so,
to distinguish it from s/h_rt.
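
A rough sketch of the outside approach, assuming some wrapper (for instance a
modified `qstatus -r`) prints one line per running job as
"<job_id> <running_seconds>". That input format and the function name
`jobs_to_suspend` are assumptions for illustration, not something SGE provides:

```shell
#!/bin/sh
# Hypothetical helper: read "<job_id> <running_seconds>" lines on stdin and
# print the IDs of all jobs that have been running for at least $1 seconds.
jobs_to_suspend() {
    awk -v limit="$1" '$2 + 0 >= limit + 0 { print $1 }'
}
```

Run periodically (e.g. from cron) with a threshold somewhat below s_rt, the
printed IDs could then be fed to `qmod -sj`, one job at a time:
`<wrapper> | jobs_to_suspend 170000 | while read -r id; do qmod -sj "$id"; done`
where <wrapper> stands for whatever produces the per-job runtime lines.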
-- Reuti
> Best,
> Joseph
>
> On 10/31/2013 04:23 PM, Reuti wrote:
>>
>> Although this looks fine, I can't get it working. I mean: it works the
>> first time, but in the second iteration the job is killed directly even
>> if there is no h_rt attached at all (or set in the queue definition).
>>
>> It looks like SGE checks whether any warning was already issued and, if
>> so, sends a SIGKILL directly. On the one hand this is wrong, of course. But
>> it's certainly a matter for discussion: is s_rt/h_rt per iteration or for the
>> overall job time? (maybe: queue = per iteration, resource request = overall
>> time?)
>>
>> I see only the option to do this outside of SGE: run `qstatus -r`*) once in
>> a while to get the runtime per job and take appropriate measures,
>> i.e. execute `qmod -sj <job_id>` as you intended.
>>
>> -- Reuti
>>
>> *) It's necessary to make a change to the awk script to get the raw output
>> instead of the formatted time in the "(relative)" case:
>>
>> starttime=sprintf("%s", running_seconds)
>>
>