I think I learned this trick from Reuti:
- Any legit job running under Grid Engine will be a descendant of an
sge_execd daemon (sge_execd starts an sge_shepherd, which starts the job).
A nice little trick is a cronjob that does a "kill -9" on any user
process that is not a descendant of sge_execd -- that will quickly send a
message to the people bypassing the resource scheduling layer.
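Here is a rough sketch of what such a cronjob could look like, in Python. It is only an illustration under some assumptions, not a tested production script: the function names are made up for this example, the user list is assumed to be a flat text file with one login name per line (the path is passed as an argument), and "ps -eo pid=,ppid=,uid=,comm=" is assumed to behave as it does with procps on Linux. It defaults to a dry run that only prints what it would kill.

```python
#!/usr/bin/env python3
"""Sketch: find listed users' processes that are not under sge_execd."""
import os
import pwd
import signal
import subprocess
import sys


def parse_ps(text):
    """Parse `ps -eo pid=,ppid=,uid=,comm=` output into {pid: (ppid, uid, comm)}."""
    table = {}
    for line in text.splitlines():
        fields = line.split(None, 3)  # comm may contain spaces, so split only 3 times
        if len(fields) == 4:
            pid, ppid, uid, comm = fields
            table[int(pid)] = (int(ppid), int(uid), comm.strip())
    return table


def under_execd(pid, table):
    """Walk the parent chain; True if pid or any ancestor is sge_execd."""
    seen = set()
    while pid in table and pid not in seen:
        seen.add(pid)
        ppid, _uid, comm = table[pid]
        if comm == "sge_execd":
            return True
        pid = ppid
    return False


def find_rogue_pids(table, uids):
    """PIDs owned by one of `uids` that have no sge_execd ancestor."""
    return sorted(pid for pid, (_ppid, uid, _comm) in table.items()
                  if uid in uids and not under_execd(pid, table))


def main(users_file, do_kill=False):
    # Assumption: users_file holds one login name per line.
    uids = set()
    with open(users_file) as fh:
        for name in (line.strip() for line in fh):
            if not name:
                continue
            try:
                uids.add(pwd.getpwnam(name).pw_uid)
            except KeyError:
                pass  # unknown login name, skip it
    uids.discard(0)  # never touch root's processes, even if listed
    out = subprocess.run(["ps", "-eo", "pid=,ppid=,uid=,comm="],
                         capture_output=True, text=True, check=True).stdout
    table = parse_ps(out)
    for pid in find_rogue_pids(table, uids):
        print("outside SGE: pid %d (%s)" % (pid, table[pid][2]))
        if do_kill:
            try:
                os.kill(pid, signal.SIGKILL)  # the "kill -9" from above
            except ProcessLookupError:
                pass  # process already exited


if __name__ == "__main__":
    if len(sys.argv) >= 2:
        main(sys.argv[1], do_kill="--kill" in sys.argv[2:])
    else:
        print("usage: script USERS_FILE [--kill]", file=sys.stderr)
```

Run it by hand without --kill for a while before putting it in root's crontab. One caveat with this approach: a job process that daemonizes itself gets re-parented away from sge_execd, and parallel tasks started over plain ssh never were under it, so both would look "rogue" to a check like this.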
That said, I've been in this position in a number of
environments and I can tell you that you will NEVER win the battle with
users trying to game the system. The motivated user will always have
more time and more incentive than an overworked cluster administrator.
While simple technical measures like that "kill -9" trick or Reuti's
more sensible suggestion of blocking interactive SSH access to nodes
outside of SGE should be pursued, I'd suggest that you don't spend much
more time than that developing technical countermeasures.
The real way this gets solved in a multi-user cluster environment is by
treating acceptable cluster usage as a human resources policy. You'll
never win a technical battle with a motivated power user.
Acceptable cluster use should be governed by a published policy and when
the policy is avoided or gamed then the response should involve mentors,
managers or the HR department, not technology or scripts.
In a corporate setting this comes down to:
1. First time you bypass SGE the admins send you a warning
2. Second time you get caught your manager gets notified
3. Third time? Account is disabled and you are reported to the HR
department for violating company policy repeatedly
Sorry for being long-winded but most long-time cluster admins might
share my opinion that cluster use policies can't be treated as a
technical war between admins and users -- it's far easier and better to
treat this as a workplace behavior thing.
-Chris
Reuti wrote:
Hi,
On 19.08.2011 at 18:30, Gowtham wrote:
In some of the computing clusters across our campus, we have noticed many users
running their jobs outside of the SGE queuing system. While we have plans to
continue tutoring them about the benefits of using a queuing system, not
everyone seems to be getting the message - as such, the jobs of
users who bypass SGE are hampering those who have been
using SGE.
On all our Rocks-based clusters, we keep the list of each
cluster's users in a flat text file, one user per line.
Is there a way by which I (as root) can kill all those
jobs submitted outside of SGE on compute nodes by these
normal users?