Re: [gridengine users] Rocks 5.4: Terminate Non-SGE Jobs on Compute Nodes by Normal Users

Chris Dagdigian Fri, 19 Aug 2011 09:51:42 -0700

I think I learned this trick from Reuti:

- Any legit job running under Grid Engine will be a child process (strictly, a descendant, via sge_shepherd) of an sge_execd daemon.

A nice little trick is a cron job that does a "kill -9" on any user process that is not a child of sge_execd -- that will quickly send a message to the people bypassing the resource scheduling layer.
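A minimal sketch of such a cron job in Python, under stated assumptions: a Linux /proc filesystem, regular users having UIDs of 500 or higher (the RHEL 5 / Rocks 5 default), and legitimate jobs being identifiable by an sge_execd process somewhere in their ancestry. The names read_procs, find_rogue, MIN_UID and main are hypothetical, not part of Grid Engine. It walks each user process's parent chain and, by default, only prints what it would kill.

```python
#!/usr/bin/env python3
# Hypothetical cron helper: SIGKILL regular-user processes that have no
# sge_execd ancestor. Assumes Linux /proc and the UID threshold below.
import os
import signal

MIN_UID = 500             # assumption: regular users start at UID 500 (RHEL 5 / Rocks 5)
EXECD_NAME = "sge_execd"  # process name of the Grid Engine execution daemon


def read_procs(proc_root="/proc"):
    """Return {pid: (ppid, uid, comm)} for every process visible in /proc."""
    procs = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
            # Owner of /proc/<pid> is normally the process's effective UID.
            uid = os.stat(os.path.join(proc_root, entry)).st_uid
            # comm sits in parentheses and may contain spaces; ppid is the
            # second field after the closing parenthesis (the first is state).
            close = stat.rindex(")")
            comm = stat[stat.index("(") + 1:close]
            ppid = int(stat[close + 2:].split()[1])
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable while scanning
        procs[int(entry)] = (ppid, uid, comm)
    return procs


def find_rogue(procs, min_uid=MIN_UID, execd_name=EXECD_NAME):
    """Return PIDs owned by regular users that have no sge_execd ancestor."""
    rogue = set()
    for pid, (ppid, uid, _comm) in procs.items():
        if uid < min_uid:
            continue  # root and system accounts are left alone
        parent, seen, under_execd = ppid, set(), False
        # Climb the parent chain; 'seen' guards against a malformed cycle.
        while parent in procs and parent not in seen:
            seen.add(parent)
            if procs[parent][2] == execd_name:
                under_execd = True
                break
            parent = procs[parent][0]
        if not under_execd:
            rogue.add(pid)
    return rogue


def main(dry_run=True):
    procs = read_procs()
    for pid in sorted(find_rogue(procs)):
        if dry_run:
            print(f"would kill {pid} ({procs[pid][2]}, uid {procs[pid][1]})")
            continue
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # process already gone


if __name__ == "__main__":
    main(dry_run=True)  # flip to False only after reviewing the dry-run output
```

Run it from root's crontab in dry-run mode first. Note that it also catches interactive shells started through sshd, which is the point, but admin accounts above the UID threshold and any interactive access you route through qrsh/qlogin should be checked against your own setup before switching to SIGKILL.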

That said, however, I've been in this position in a number of environments and I can tell you that you will NEVER win the battle with users trying to game the system. The motivated user will always have more time and more incentive than an overworked cluster administrator.

While simple technical measures like that "kill -9" trick or Reuti's more sensible suggestion of blocking interactive SSH access to nodes outside of SGE should be pursued, I'd suggest that you don't spend much more time than that developing technical countermeasures.

The real way this gets solved in a multi-user cluster environment is by treating acceptable cluster usage as a human resources policy. You'll never win a technical battle with a motivated power user.

Acceptable cluster use should be governed by a published policy, and when the policy is avoided or gamed, the response should involve mentors, managers or the HR department, not technology or scripts.


In a corporate setting this comes down to:

1. First time you bypass SGE, the admins send you a warning

2. Second time you get caught your manager gets notified

3. Third time? Account is disabled and you are reported to the HR department for violating company policy repeatedly

Sorry for being long-winded, but most long-time cluster admins might share my opinion that cluster use policies can't be treated as a technical war between admins and users -- it's far easier and better to treat this as a workplace behavior issue.


-Chris


Reuti wrote:

Hi,

On 19.08.2011 at 18:30, Gowtham wrote:

In some of the computing clusters across our campus, we have noticed many users
running their jobs outside of the SGE queuing system. While we plan to
continue tutoring them about the benefits of using a queuing system, not
everyone seems to be getting the message; as a result, these
violating users' jobs are hampering those who have been
using SGE.

On all our Rocks-based clusters, we keep the list of
each cluster's users in a flat text file, one user per line.

Is there a way by which I (as root) can kill all those
jobs started outside of SGE on compute nodes by these
normal users?
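Since the user list is already kept as a flat file, a small Python helper could turn it into numeric UIDs so that a cleanup sweep only ever touches listed accounts rather than relying on a UID cut-off. This is a sketch under assumptions: the function name load_listed_uids is hypothetical, lines starting with "#" are treated as comments (the original format does not say), and names not known to the node's passwd database are silently skipped. The standard pwd module is Unix-only.

```python
import pwd


def load_listed_uids(path):
    """Map the usernames in a one-user-per-line file to numeric UIDs."""
    uids = set()
    with open(path) as f:
        for line in f:
            name = line.strip()
            if not name or name.startswith("#"):
                continue  # skip blank lines and (assumed) comment lines
            try:
                uids.add(pwd.getpwnam(name).pw_uid)
            except KeyError:
                pass  # name not known on this node; ignore it
    return uids
```

For example, load_listed_uids("/path/to/cluster_users.txt") (a hypothetical path) returns a set of UIDs; a process whose owner is in that set and that has no sge_execd ancestor would be a candidate for termination.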

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
