I agree with Chris. You will never find a technical solution to
"policing" your cluster. I have found the number one most effective tool
to be shame. I publish a list of the largest disk hogs and rogue jobs to
the entire community under the guise of a "How is the cluster doing" report.
Nothing like peer pressure to put a stop to this kind of activity.
--Gary
From: Chris Dagdigian <[email protected]>
To: Reuti <[email protected]>, Gowtham <[email protected]>,
Sun Grid Engine Discussion List <[email protected]>, NPACI Rocks
Discussion List <[email protected]>
Date: 08/19/2011 12:52 PM
Subject: Re: [gridengine users] Rocks 5.4: Terminate Non-SGE Jobs
on Compute Nodes by Normal Users
Sent by: [email protected]
I think I learned this trick from Reuti:
- Any legit job running under Grid Engine will be a child process of
an sge_execd daemon.
A nice little trick is a cronjob that does a "kill -9" on any user
process that is not a child of sge_execd -- that will quickly send a
message to the people bypassing the resource scheduling layer.
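A minimal sketch of that check, in Python, might look like the following.
It is an illustration, not the script anyone in this thread actually runs.
The names Proc, descends_from, rogue_pids and read_process_table are made up
for the example, and the UID cutoff of 500 for "normal" users is an
assumption to adjust per site. Because jobs usually sit a level or two below
sge_execd (under a shepherd process), the sketch walks the whole parent chain
rather than checking only the direct parent. It only reports PIDs; the actual
kill is left as a comment.

```python
import subprocess
from typing import Dict, List, NamedTuple, Set


class Proc(NamedTuple):
    pid: int
    ppid: int
    uid: int
    comm: str


def descends_from(pid: int, table: Dict[int, Proc], roots: Set[int]) -> bool:
    """Follow parent links upward; True if pid or any ancestor is in roots."""
    seen = set()
    while pid in table and pid not in seen:
        if pid in roots:
            return True
        seen.add(pid)  # guard against a malformed table with a loop
        pid = table[pid].ppid
    return False


def rogue_pids(procs: List[Proc], min_uid: int = 500,
               daemon: str = "sge_execd") -> List[int]:
    """PIDs owned by normal users (uid >= min_uid, an assumed cutoff)
    that do not descend from any process named `daemon`."""
    table = {p.pid: p for p in procs}
    roots = {p.pid for p in procs if p.comm == daemon}
    return sorted(
        p.pid for p in procs
        if p.uid >= min_uid and not descends_from(p.pid, table, roots)
    )


def read_process_table() -> List[Proc]:
    """Snapshot the local process table using ps(1) with header-less columns."""
    out = subprocess.run(
        ["ps", "-eo", "pid=,ppid=,uid=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    procs = []
    for line in out.splitlines():
        fields = line.split(None, 3)
        if len(fields) == 4:
            pid, ppid, uid, comm = fields
            procs.append(Proc(int(pid), int(ppid), int(uid), comm))
    return procs


if __name__ == "__main__":
    for pid in rogue_pids(read_process_table()):
        # Dry run: print only. A real cron job would call
        # os.kill(pid, signal.SIGKILL) here, run as root.
        print(pid)
```

Note that a check like this also flags legitimate interactive logins and
anything a user has daemonized (those get reparented to init), so it makes
most sense on compute nodes where users are not supposed to log in at all.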
That said, however, I've been in this position in a number of
environments and I can tell you that you will NEVER win the battle with
users trying to game the system. The motivated user will always have
more time and more incentive than an overworked cluster administrator.
While simple technical measures like that "kill -9" trick or Reuti's
more sensible suggestion of blocking interactive SSH access to nodes
outside of SGE should be pursued, I'd suggest that you don't spend much
more time than that developing technical countermeasures.
The real way this gets solved in a multi-user cluster environment is by
treating acceptable cluster usage as a human resources policy. You'll
never win a technical battle with a motivated power user.
Acceptable cluster use should be governed by a published policy and when
the policy is avoided or gamed then the response should involve mentors,
managers or the HR department, not technology or scripts.
In a corporate setting this comes down to:
1. First time you bypass SGE the admins send you a warning
2. Second time you get caught your manager gets notified
3. Third time? Account is disabled and you are reported to the HR
department for violating company policy repeatedly
Sorry for being long winded but most long-time cluster admins might
share my opinion that cluster use policies can't be treated as a
technical war between admins and users -- it's far easier and better to
treat this as a workplace behavior thing.
-Chris
Reuti wrote:
> Hi,
>
> Am 19.08.2011 um 18:30 schrieb Gowtham:
>
>> In some of the computing clusters across our campus, we have noticed
>> many users running their jobs outside of the SGE queuing system. While we
>> have plans to continue tutoring them about the benefits of using a queuing
>> system, not everyone seems to be getting the message - as such, these
>> violating-users' jobs are hampering those who have been
>> using SGE.
>>
>> On all our Rocks based clusters, we do keep the list of
>> cluster's users in a flat text file, one user per line.
>>
>> Is there a way by which I (as root) can kill all those
>> jobs submitted outside of SGE on compute nodes by these
>> normal users?
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users