Hi,

On 15.01.2014 at 11:16, Joe Borġ wrote:
> Using h_rt kills the job after the allotted time.

Yes.

> Can't this be disabled?

There is no feature in SGE to extend the granted runtime of a job (I heard such a thing is available in Torque).

> We only want to use it as a rough guide.

If you want to do this only once in a while for a particular job, you can just kill (or softstop) the `sgeexecd` on the node. You will lose control of the jobs on the node, and of the node itself (from SGE's view, `qhost` shows "-" for the node's load). So you have to check from time to time whether the job in question has finished, and then restart the `sgeexecd`. Also, no new jobs will be scheduled to the node while it is down. Only when the `sgeexecd` is restarted will it discover that the job finished (and send an email if applicable). Other (still) running jobs will then have their runtime supervised again.

-- Reuti

> Thanks
>
> Regards,
> Joseph David Borġ
> josephb.org
>
> On 13 January 2014 17:43, Reuti <[email protected]> wrote:
> On 13.01.2014 at 18:33, Joe Borġ wrote:
>
> > Thanks. Can you please tell me what I'm doing wrong?
> >
> > qsub -q test.q -R y -l h_rt=60 -pe test.pe 1 small.bash
> > qsub -q test.q -R y -l h_rt=120 -pe test.pe 2 big.bash
> > qsub -q test.q -R y -l h_rt=60 -pe test.pe 1 small.bash
> > qsub -q test.q -R y -l h_rt=60 -pe test.pe 1 small.bash
>
> Only the parallel job needs "-R y".
>
> > job-ID  prior    name        user      state  submit/start at      queue  slots  ja-task-ID
> > ---------------------------------------------------------------------------------------------
> > 156757  0.50000  small.bash  joe.borg  qw     01/13/2014 16:45:18         1
> > 156761  0.50000  big.bash    joe.borg  qw     01/13/2014 16:55:31         2
> > 156762  0.50000  small.bash  joe.borg  qw     01/13/2014 16:55:33         1
> > 156763  0.50000  small.bash  joe.borg  qw     01/13/2014 16:55:34         1
> >
> > ...But when I release...
>
> max_reservation is set?
>
> But the reservation feature must also be seen in a running cluster.
> If all four jobs are on hold and released at once, I wouldn't be surprised if it's not strictly FIFO.
>
> > job-ID  prior    name        user      state  submit/start at      queue        slots  ja-task-ID
> > ---------------------------------------------------------------------------------------------------
> > 156757  0.50000  small.bash  joe.borg  r      01/13/2014 16:56:06  test.q@test  1
> > 156762  0.50000  small.bash  joe.borg  r      01/13/2014 16:56:06  test.q@test  1
> > 156761  0.50000  big.bash    joe.borg  qw     01/13/2014 16:55:31               2
> > 156763  0.50000  small.bash  joe.borg  qw     01/13/2014 16:55:34               1
>
> As job 156762 has the same runtime as 156757, backfilling will occur to use the otherwise idle core. Whether job 156762 is started or not, the parallel one 156761 will start at the same time. Only 156763 shouldn't start.
>
> -- Reuti
>
> > Thanks
> >
> > Regards,
> > Joseph David Borġ
> > josephb.org
> >
> > On 13 January 2014 17:26, Reuti <[email protected]> wrote:
> > On 13.01.2014 at 17:24, Joe Borġ wrote:
> >
> > > Hi Reuti,
> > >
> > > I am using a PE, so that's fine.
> > >
> > > I've not set any of the other 3. Will the job be killed if default_duration is exceeded?
> >
> > No. It can be set to any value you like (like a few weeks), but it shouldn't be set to "INFINITY", as SGE judges infinity to be smaller than infinity and so backfilling will always occur.
> >
> > -- Reuti
> >
> > > Thanks
> > >
> > > Regards,
> > > Joseph David Borġ
> > > josephb.org
> > >
> > > On 13 January 2014 16:16, Reuti <[email protected]> wrote:
> > > Hi,
> > >
> > > On 13.01.2014 at 16:58, Joe Borġ wrote:
> > >
> > > > I'm trying to set up an SGE queue and am having a problem getting the jobs to start in the right order.
> > > > Here is my example - test.q with 2 possible slots and the following jobs queued:
> > > >
> > > > job-ID  prior    name        user      state  submit/start at      queue  slots  ja-task-ID
> > > > ---------------------------------------------------------------------------------------------
> > > > 1       0.50000  small.bash  joe.borg  qw     01/13/2014 15:43:16         1
> > > > 2       0.50000  big.bash    joe.borg  qw     01/13/2014 15:43:24         2
> > > > 3       0.50000  small.bash  joe.borg  qw     01/13/2014 15:43:27         1
> > > > 4       0.50000  small.bash  joe.borg  qw     01/13/2014 15:43:28         1
> > > >
> > > > I want the jobs to run in that order, but (obviously), when I enable the queue, the small jobs fill the available slots and the big job has to wait for them to complete. I'd like it set up so that job 1 runs alone and finishes, then job 2 runs (with both slots), then the final 2 jobs, 3 & 4, run together.
> > > >
> > > > I've looked at -R y on submission, but it doesn't seem to work.
> > >
> > > For the reservation to work (and it's only necessary to request it for the parallel job) it's necessary to have suitable "h_rt" requests for all jobs.
> > >
> > > - Do you request any "h_rt" for all jobs?
> > > - Do you have a "default_duration" set to a proper value in the scheduler configuration otherwise?
> > > - Is "max_reservation" set to a value like 16?
> > >
> > > -- Reuti
> > >
> > > > Regards,
> > > > Joseph David Borġ
> > > > josephb.org
> > > > _______________________________________________
> > > > users mailing list
> > > > [email protected]
> > > > https://gridengine.org/mailman/listinfo/users
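
Editor's sketch, not part of the original messages: the advice in this thread (an "h_rt" request on every job, "-R y" only on the parallel job, and a look at the scheduler configuration for "max_reservation" and "default_duration") could be tried roughly as below. The queue, PE and script names are taken from the thread. The `run` helper, the `submit_all` function and the `RUN` variable are hypothetical names introduced here so the script only prints the commands unless you opt in to executing them.

```shell
#!/bin/sh
# Dry-run sketch (assumption: helper names are illustrative only).
# By default the commands are printed; set RUN=1 on a host with SGE
# installed to actually execute them.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

submit_all() {
    # Every job carries an h_rt request so the scheduler can plan ahead.
    run qsub -q test.q -l h_rt=60 -pe test.pe 1 small.bash
    # Only the parallel job asks for a reservation with -R y.
    run qsub -q test.q -R y -l h_rt=120 -pe test.pe 2 big.bash
    run qsub -q test.q -l h_rt=60 -pe test.pe 1 small.bash
    run qsub -q test.q -l h_rt=60 -pe test.pe 1 small.bash
}

# Show the scheduler configuration; look for max_reservation and
# default_duration in its output.
run qconf -ssconf

submit_all
```

With RUN unset, this prints one `qconf` line followed by the four `qsub` lines, which makes it easy to review the requests before submitting anything to a real cluster.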
