On 10.09.2012 at 10:20, Lars van der bijl wrote:

>> <snip>
>> Why do you want to release only certain array tasks? Usually a plain
>> `qhold`/`qrls`, like `qalter`, will affect the complete array job, i.e. all
>> tasks. If for example task 26 of the first job fails, you only want to block
>> task 26 of job 2 and let all others run?
>
> yes exactly. but knowing which things to unblock would be tricky,
> unless there is information in the epilog on which task should be
> unblocked in job 2.
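One way to get that information into the epilog is a small script that, when task N of job 1 succeeds, releases only task N of job 2. A minimal sketch, not a tested implementation: the successor job id (1235 here) and the way the exit status reaches the epilog are assumptions, while SGE_TASK_ID is the variable SGE really sets for array tasks:

```shell
#!/bin/sh
# Hypothetical epilog for job 1. SUCCESSOR_JOB would be recorded at submit
# time (e.g. via `qsub -ac successor=1235`) and read back here; it is
# hard-coded for this sketch.
SUCCESSOR_JOB=${SUCCESSOR_JOB:-1235}
TASK=${SGE_TASK_ID:-26}

# Assume the task's exit status was passed to the epilog as $1.
EXIT_STATUS=${1:-0}

if [ "$EXIT_STATUS" -eq 0 ]; then
    # Release only the matching task of the successor array job.
    if command -v qrls >/dev/null 2>&1; then
        qrls "$SUCCESSOR_JOB" -t "$TASK"
    else
        # Dry run outside an SGE installation.
        echo "would run: qrls $SUCCESSOR_JOB -t $TASK"
    fi
else
    echo "task $TASK failed, leaving $SUCCESSOR_JOB.$TASK on hold"
fi
```

With this in place, job 2 would be submitted entirely on hold (`qsub -h -t 1-100 ...`) and its tasks released one by one as the predecessors finish cleanly.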
As it's your workflow, you could record it in the job context with `qsub -ac ...`.

>> Nevertheless, the above commands allow a task range to be given or a single
>> task index:
>>
>> $ qrls 1234 -t 1-10
>> $ qrls 1234.42
>>
>> will release only tasks 1-10, and the others stay on hold.
>>
>>>> - Create a special queue for some kind of `enabler' jobs which run forever
>>>> (looping e.g. once a minute until they quit); the original job will
>>>> create/touch a special file for which the `enabler' is waiting. If the
>>>> existence of the relevant file is detected, the `enabler' can release a
>>>> hold of a certain job or even just submit the successor job.
>>>>
>>>> - Creating a workflow can be done with the tool GEL
>>>> (http://wildfire.bii.a-star.edu.sg/,
>>>> http://wildfire.bii.a-star.edu.sg/docs/gel_ref.pdf), where you can
>>>> check for files. But the jobs will be submitted during the workflow and
>>>> not all in advance. Maybe it is useful anyway.
>>>>
>>>> -- Reuti
>>>
>>> it would still be nice to know whether it would be possible to implement
>>> the "dormant" task approach. the company I work for would be willing
>>> to pay for such development, depending on the feasibility.
>>
>> I'm still not sure what you mean by a "dormant" state, as the error state is
>> not sufficient. Similarly, you can use `qhold 1234.42` and `qmod -rj 1234.42`
>> to put task 42 back into waiting state.
>>
>> In which state should a "dormant" task be?
>
> if a task errors that's one thing. our wrappers catch that, see if you
> hit the retry limit and exit with 100.

Good.

> but there are many cases where it errors with 137 or 139 and it gets
> removed from the queue. or a task doesn't error but the host application
> spits out corrupt data.

Okay, now I see. You could use a script like:

#!/bin/sh
. /usr/sge/default/common/settings.sh
qalter -h u $JOB_ID
qmod -rj $JOB_ID
kill -9 -- -$1

as the "terminate_method $job_pid" in the queue definition, seeing "hRq" as a dormant state.
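Hooking such a script into the queue means pointing its terminate_method at it; the install path below is an assumption, while terminate_method and the $job_pid pseudo variable are the real queue_conf attributes:

```
# Excerpt of the queue definition (edit with `qconf -mq <queue>`),
# assuming the script above is installed as /usr/sge/scripts/dormant.sh:
terminate_method      /usr/sge/scripts/dormant.sh $job_pid
```

SGE then runs this script instead of sending a plain SIGKILL when it terminates a task, so the task ends up held ("h") and rescheduled ("Rq") rather than being removed.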
-- Reuti

> instead of removing a task i'd want to be able to run it again. just
> have it be put in a non-active state or "dormant" so that I could run
> it again without having to submit a new set of tasks. we very rarely
> run a single task.
> they always have dependencies and always have batching. so being able
> to run a subset of tasks again without having to do a re-submission
> would make a huge difference.
>
>> -- Reuti

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
