Re: [gridengine users] Monitoring slot usage

2015-07-30 Thread MacMullan, Hugh
Hi Simon:

We use 'Core Binding' to restrict users to the same number of cores as slots 
requested.

http://www.gridengine.eu/grid-engine-internals/87-exploiting-the-grid-engine-core-binding-feature

We use a JSV (job submission verifier) to assign the binding value and force 
compliance, based on the other job inputs: single-slot and MPI jobs are bound 
to 1 core (for each slot requested), while OpenMP jobs are bound to the number 
of slots requested in the -pe option.
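
A minimal sketch of that decision logic, written in Python rather than as a 
real JSV script. It does not use the JSV API; the function name and the PE 
names in OPENMP_PES are hypothetical, and only the 'linear:N' binding value 
comes from this message.

```python
from typing import Optional

# Hypothetical names of the parallel environments used for OpenMP/SMP jobs.
# A real site would substitute its own PE names here.
OPENMP_PES = {"smp", "openmp"}


def binding_for_job(pe_name: Optional[str], slots: int) -> str:
    """Return a '-binding' value following the policy described above."""
    if slots < 1:
        raise ValueError("slots must be at least 1")
    if pe_name in OPENMP_PES:
        # OpenMP job: bind to as many cores as slots requested in the PE.
        return f"linear:{slots}"
    # Single-slot and MPI jobs: 1 core for each slot requested.
    return "linear:1"
```

For example, binding_for_job("smp", 8) gives 'linear:8', while a 16-slot job 
in any other PE, or a job with no PE at all, gives 'linear:1'.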

Or you might be able to just put '-binding linear:1' in 
$SGE_ROOT/default/common/sge_request, and then have users specify '-binding 
linear:#' if they're doing an SMP job.

Test carefully! :)

-Hugh

From: users-boun...@gridengine.org [mailto:users-boun...@gridengine.org] On 
Behalf Of Simon Andrews
Sent: Thursday, July 30, 2015 11:01 AM
To: users@gridengine.org
Subject: [gridengine users] Monitoring slot usage

What is the recommended way of identifying jobs which are consuming more CPU 
than they've requested?  I have an environment set up where people mostly 
submit SMP jobs through a parallel environment and we can use this information 
to schedule them appropriately.  We've had several cases though where the jobs 
have used significantly more cores on the machine they're assigned to than they 
requested, so the nodes become overloaded and go into an alarm state.

What options do I have for monitoring the number of cores simultaneously used 
by a job and comparing this to the number which were requested so I can find 
cases where the actual usage is way above the request and kill them?

Thanks

Simon.
The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT Registered 
Charity No. 1053902.
The information transmitted in this email is directed only to the addressee. If 
you received this in error, please contact the sender and delete this email 
from your system. The contents of this e-mail are the views of the sender and 
do not necessarily represent the views of the Babraham Institute. Full 
conditions at: www.babraham.ac.uk <http://www.babraham.ac.uk/terms>
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Monitoring slot usage

2015-07-30 Thread Feng Zhang
I have a similar issue too, especially when users run MPI+multithreaded
jobs. Some multithreaded programs by default use all of the cores they
find on a node.
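
As an illustration of how a threaded program could avoid this, here is a
small Python sketch that sizes its thread pool from the cores the process
is actually allowed to run on, not from the total on the node. It assumes
Linux, where os.sched_getaffinity reflects any CPU affinity mask applied to
the process; the function names and the requested_slots parameter are
hypothetical.

```python
import os


def usable_cores() -> int:
    """Number of cores this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: honours the CPU affinity mask of the current process.
        return len(os.sched_getaffinity(0))
    # Platforms without the affinity API: fall back to the total core count.
    return os.cpu_count() or 1


def thread_count(requested_slots: int) -> int:
    """Cap the thread count at both the slot request and the usable cores."""
    return max(1, min(requested_slots, usable_cores()))
```

A program that calls thread_count() with its slot request never starts more
threads than it was given cores for, whatever the size of the node.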

For now I have a script that scans the CPU and RAM usage on all nodes and
warns me if it finds any overloaded nodes.

I'm not sure whether SGE has a built-in ability to track the CPU cores each
job uses, but it should not be difficult to write a script that does this
routinely outside of SGE.
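
A rough sketch of the comparison such a script could make, in Python. It
assumes the per-job CPU time and the requested slot count have already been
collected by some other means (how to gather them is site-specific and not
shown). The JobSample type, the function names, and the 1.5x tolerance are
all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class JobSample:
    job_id: str
    slots: int           # slots the job requested
    cpu_seconds: float   # CPU time the job accumulated during the window
    wall_seconds: float  # length of the sampling window


def average_cores(sample: JobSample) -> float:
    """Average number of cores kept busy over the sampling window."""
    if sample.wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return sample.cpu_seconds / sample.wall_seconds


def overusing_jobs(samples: Iterable[JobSample],
                   tolerance: float = 1.5) -> List[str]:
    """IDs of jobs whose average core usage exceeds slots * tolerance."""
    return [s.job_id for s in samples
            if average_cores(s) > s.slots * tolerance]
```

A job that requested 1 slot but burned 480 CPU-seconds in a 60-second
window averaged 8 cores and would be flagged; a 4-slot job with 240
CPU-seconds in the same window averaged 4 cores and would not.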



-- 
Best,

Feng


Re: [gridengine users] Monitoring slot usage

2015-07-30 Thread Simon Andrews
Thanks, core binding looks like it does what we need.  Do I understand 
correctly that if a process spawns more threads than it has slots, those 
threads will simply be restricted to the cores it's been allocated, so they'll 
just end up slowing down their own job, and the job won't actually get killed?

I’ll be very careful in testing this :-)

Simon.
