Thanks, core binding looks like it does what we need.  Do I understand 
correctly that if a process spawns more threads than it has slots, those 
threads are simply restricted to the cores the job has been allocated, so they 
only slow down their own job, and the job won’t actually get killed?
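
One way to check this behaviour by hand is a small sketch like the one below. It assumes a Linux host and that the binding works through the ordinary CPU affinity mask, which is an assumption about the setup rather than something confirmed in this thread. The function name affinities_seen_by_threads is made up for illustration. It pins the calling thread to a single CPU, starts several worker threads, and records the CPU set each worker is allowed to run on.

```python
import os
import threading


def affinities_seen_by_threads(n_threads):
    """Pin the calling thread to one CPU, start n_threads workers, and
    return (pinned_cpu_set, list of CPU sets reported by each worker).

    Linux only: relies on os.sched_setaffinity / os.sched_getaffinity.
    """
    original = os.sched_getaffinity(0)   # CPUs we are currently allowed on
    one_cpu = {min(original)}            # pick a single allowed CPU
    seen = []
    lock = threading.Lock()

    def worker():
        # pid 0 means "the calling thread" on Linux
        mask = os.sched_getaffinity(0)
        with lock:
            seen.append(mask)

    # Assumption: this mimics what a one-core binding does to the job.
    os.sched_setaffinity(0, one_cpu)
    try:
        threads = [threading.Thread(target=worker) for _ in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        os.sched_setaffinity(0, original)  # restore the original mask
    return one_cpu, seen


if __name__ == "__main__":
    cpu, seen = affinities_seen_by_threads(4)
    print("pinned to", cpu, "workers saw", seen)
```

If every worker reports the same single-CPU set, the extra threads are sharing that core rather than spilling onto others, and nothing in the affinity mechanism itself kills the process.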

I’ll be very careful in testing this :-)

Simon.

From: "MacMullan, Hugh" <hugh...@wharton.upenn.edu>
Date: Thursday, 30 July 2015 16:20
To: Simon Andrews <simon.andr...@babraham.ac.uk>, 
"users@gridengine.org" <users@gridengine.org>
Subject: RE: Monitoring slot usage

Hi Simon:

We use 'Core Binding' to restrict users to the same number of cores as slots 
requested.

http://www.gridengine.eu/grid-engine-internals/87-exploiting-the-grid-engine-core-binding-feature

We use a JSV (job submission verifier) to assign the binding value (forcing 
compliance) based on the other job inputs: single-slot and MPI jobs are bound 
to 1 core (for each slot requested), and OpenMP jobs are bound to the number of 
slots requested in the '-pe' option.
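
The policy described above can be sketched as a small decision function. This is only the decision logic, not a working JSV script and not the script actually used at Wharton. The function name choose_binding and the parallel environment names "smp" and "openmp" are hypothetical and would need to match the local configuration.

```python
def choose_binding(pe_name, slots, openmp_pes=("smp", "openmp")):
    """Return a value for '-binding' following the policy described above.

    pe_name    -- parallel environment name, or None for a serial job
    slots      -- number of slots requested (1 for a serial job)
    openmp_pes -- hypothetical names of the PEs used for threaded jobs
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    if pe_name is not None and pe_name in openmp_pes:
        # Threaded (OpenMP/SMP) job: bind to as many cores as slots.
        return f"linear:{slots}"
    # Serial and MPI jobs: one core for each slot.
    return "linear:1"


if __name__ == "__main__":
    print(choose_binding(None, 1))
    print(choose_binding("mpi", 16))
    print(choose_binding("smp", 8))
```

Keeping the rule in one function makes it easy to test the policy on its own before wiring it into whatever JSV mechanism the site uses.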

Or you might be able to just put '-binding linear:1' in 
$SGE_ROOT/default/common/sge_request, and then have users specify '-binding 
linear:#' if they're doing an SMP job.

Test carefully! :)

-Hugh

From: users-boun...@gridengine.org [mailto:users-boun...@gridengine.org] 
On Behalf Of Simon Andrews
Sent: Thursday, July 30, 2015 11:01 AM
To: users@gridengine.org
Subject: [gridengine users] Monitoring slot usage

What is the recommended way of identifying jobs which are consuming more CPU 
than they’ve requested?  I have an environment set up where people mostly 
submit SMP jobs through a parallel environment and we can use this information 
to schedule them appropriately.  We’ve had several cases though where the jobs 
have used significantly more cores on the machine they’re assigned to than they 
requested, so the nodes become overloaded and go into an alarm state.

What options do I have for monitoring the number of cores simultaneously used 
by a job and comparing this to the number which were requested so I can find 
cases where the actual usage is way above the request and kill them?
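
One rough way to frame the comparison is sketched below: divide the CPU time a job has accumulated by its elapsed wallclock time to get an average number of busy cores, then compare that to the slots requested. The function names and the tolerance factor are made up for illustration, and how the CPU and wallclock figures are collected for each job is left out because it depends on the site.

```python
def average_cores_used(cpu_seconds, wall_seconds):
    """Average number of cores kept busy over the measured interval."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


def exceeds_request(cpu_seconds, wall_seconds, slots, tolerance=1.5):
    """True if average core usage is more than tolerance x slots requested.

    tolerance is an arbitrary illustrative margin, not a recommended value.
    """
    return average_cores_used(cpu_seconds, wall_seconds) > slots * tolerance


if __name__ == "__main__":
    # A job that asked for 2 slots but accumulated 8 CPU-hours in 1 hour.
    print(exceeds_request(cpu_seconds=8 * 3600, wall_seconds=3600, slots=2))
```

An average over the whole run hides short bursts, so a job that is idle for a long time and then briefly uses every core would not be flagged by this check.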

Thanks

Simon.
The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT Registered 
Charity No. 1053902.
The information transmitted in this email is directed only to the addressee. If 
you received this in error, please contact the sender and delete this email 
from your system. The contents of this e-mail are the views of the sender and 
do not necessarily represent the views of the Babraham Institute. Full 
conditions at: www.babraham.ac.uk<http://www.babraham.ac.uk/terms>
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
