Hi Reuti - Thanks for the quick reply - yes, the jobs do get suspended 
if the Q instance gets suspended (I made a mistake in checking this), 
but there seemed to be no way to kill them as a last resort.

I'll check the checkpointing link; that sounds like a way to handle
the problem more elegantly.
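
For the last-resort kill, the custom suspend method Reuti suggests might look roughly like the sketch below. It is untested against a real SGE install: the script name and path are hypothetical, and it assumes the queue's suspend_method hands over the job's pid through the $job_pid pseudo-variable and that the job leads its own process group.

```sh
#!/bin/sh
# kill_on_suspend.sh (hypothetical name): kill a job instead of suspending it.
# Assumed queue setting, e.g. for the claw9 instance only:
#   suspend_method  /path/to/kill_on_suspend.sh $job_pid

kill_job_group() {
    pid="$1"
    sig="${2:-TERM}"
    # Refuse anything that is not a plain number.
    case "$pid" in
        ''|*[!0-9]*) echo "usage: kill_job_group <pid> [signal]" >&2; return 2 ;;
    esac
    # Refuse 0 and 1: negated, they would address our own group or everything.
    [ "$pid" -gt 1 ] || return 2
    # Assumption: the job leads its own process group, so a negative pid
    # reaches its children too. Fall back to the single pid otherwise.
    kill -s "$sig" -- "-$pid" 2>/dev/null || kill -s "$sig" "$pid" 2>/dev/null
}

# Act only when called with a pid, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    kill_job_group "$@"
fi
```

The queue would then be pointed at the script with something like qconf -mq claws. The default signal here is TERM; passing KILL as a second argument would be the harsher variant.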

hjm


On Tuesday 07 February 2012 12:47:36 Reuti wrote:
> On 07.02.2012 at 20:58, Harry Mangalam wrote:
> > I run a cluster that is a mostly peaceful mix of open
> > (universally available, under SGE 6.2) and condo nodes
> > (generally open, except when the owners want them under their
> > control and available only to them). I've assigned a node Q to
> > the owner, who can disable/enable & suspend/resume the QUEUES
> > according to the docs.
> 
> You mean the jobs are not suspended, even though the queue
> instance they are running in got suspended?
> 
> You could use a custom suspend method to kill the jobs instead of
> suspending them. Or maybe better: attach a checkpointing
> environment in a JSV. This way the jobs would stay at the top of
> the queue, if the checkpointing environment is set up to
> reschedule on suspend. You would be using the checkpointing
> facility only for the removal of the jobs from the node, i.e. for
> a migration.
> 
> http://arc.liv.ac.uk/SGE/howto/checkpointing.html
> 
> -- Reuti
> 
> >  Is there a mechanism to allow the Q owner to suspend or even
> >  kill running JOBS or is that forbidden to the Q owner and only
> >  available to the admin?
> > 
> > i.e., in the following extract, argardne is the owner of the Q on
> > node claw9, but he can't kill/suspend jobs running there - he
> > can only operate on Qs.
> > 
> > $ qconf -sq claws
> > qname                 claws
> > hostlist              @execlaws
> > seq_no                0
> > load_thresholds       np_load_avg=1.1
> > suspend_thresholds    NONE
> > nsuspend              1
> > suspend_interval      00:05:00
> > priority              0
> > min_cpu_interval      00:05:00
> > processors            1-4,[claw1.bduc=1-2],[claw5.bduc=1-8],[claw7.bduc=1-8], \
> >                       [claw8.bduc=1-16],[claw9.bduc=1-48]
> > qtype                 BATCH,[claw1.bduc=BATCH],[claw5.bduc=BATCH], \
> >                       [claw9.bduc=BATCH],[claw8.bduc=BATCH],[claw7.bduc=BATCH]
> > ...
> > owner_list            NONE,[claw9.bduc=argardne]
> > user_lists            arusers,[claw9.bduc=arusers],[claw5.bduc=arusers], \
> >                       [claw7.bduc=arusers],[claw1.bduc=arusers]

-- 
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[ZOT 2225] / 92697  Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
Citizens United: Democracy on meth
- Walter Egan
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
