Re: [gridengine users] Monitoring slot usage

2015-07-30 Thread MacMullan, Hugh
Hi Simon:

We use 'Core Binding' to restrict users to the same number of cores as slots 
requested.

http://www.gridengine.eu/grid-engine-internals/87-exploiting-the-grid-engine-core-binding-feature

We use a JSV to assign the binding value (to force compliance) based on the 
other job inputs: single-slot and MPI jobs are bound to 1 core (for each slot 
requested), while OpenMP jobs are bound to the number of slots requested in 
the -pe option.
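
The JSV rule above might look roughly like this in Python. This is only a 
sketch: the PE names ("openmp", "smp", "mpi") are hypothetical, and a real JSV 
would read and write the job parameters through Grid Engine's JSV interface 
rather than take them as function arguments.

```python
def binding_for(pe_name, slots, openmp_pes=("openmp", "smp")):
    """Return a -binding value for a job, following the rule described above.

    pe_name is None for a job submitted without -pe (a single-slot job).
    The names in openmp_pes are made-up examples of shared-memory PEs.
    """
    if pe_name is None:
        # Single-slot job: one core.
        return "linear:1"
    if pe_name in openmp_pes:
        # OpenMP/SMP job: one core per requested slot, all on one host.
        return f"linear:{slots}"
    # MPI job: each task (slot) is bound to a single core.
    return "linear:1"


if __name__ == "__main__":
    for pe, n in [(None, 1), ("openmp", 8), ("mpi", 16)]:
        print(pe, n, "->", binding_for(pe, n))
```

Keeping the decision in one small function makes it easy to test the rule on 
its own before wiring it into the submit path.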

Or you might be able to just put '-binding linear:1' in 
$SGE_ROOT/default/common/sge_request, and then have users specify '-binding 
linear:#' if they're running an SMP job.

Test carefully! :)

-Hugh

From: users-boun...@gridengine.org [mailto:users-boun...@gridengine.org] On 
Behalf Of Simon Andrews
Sent: Thursday, July 30, 2015 11:01 AM
To: users@gridengine.org
Subject: [gridengine users] Monitoring slot usage

What is the recommended way of identifying jobs which are consuming more CPU 
than they've requested?  I have an environment set up where people mostly 
submit SMP jobs through a parallel environment and we can use this information 
to schedule them appropriately.  We've had several cases though where the jobs 
have used significantly more cores on the machine they're assigned to than they 
requested, so the nodes become overloaded and go into an alarm state.

What options do I have for monitoring the number of cores simultaneously used 
by a job and comparing this to the number which were requested so I can find 
cases where the actual usage is way above the request and kill them?

Thanks

Simon.
The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT Registered 
Charity No. 1053902.
The information transmitted in this email is directed only to the addressee. If 
you received this in error, please contact the sender and delete this email 
from your system. The contents of this e-mail are the views of the sender and 
do not necessarily represent the views of the Babraham Institute. Full 
conditions at: www.babraham.ac.uk (http://www.babraham.ac.uk/terms)
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Filling up nodes when using gepetools

2015-07-30 Thread William Hay
On Thu, 30 Jul 2015 12:57:13 +
Winkler, Ursula (ursula.wink...@uni-graz.at) wrote:

>> My suggestion was to modify your jsv/gepetools to force single node
>> parallel jobs into PEs with $pe_slots allocation rules (which gives
>> you control over where they are scheduled via queue_sort_method and
>> load_formula) while sending the others to PEs with other
>> (appropriate) allocation rules that won't cause (ii).
>
> Well, I created an additional PE with allocation_rule $pe_slots
> and added an if condition to pe.jsv so that all jobs which request
> just a single node are assigned to this new PE. But the annoying
> situation didn't change. The scheduler configuration is set to
> queue_sort_method load and load_formula slots. So what am I
> still missing?

How is job_load_adjustments configured?

 


Re: [gridengine users] Monitoring slot usage

2015-07-30 Thread Feng Zhang
I have a similar issue too, especially when users run MPI+multithreaded
jobs. Some multithreaded programs by default use all of the cores they
find on a node.

For now I have a script that scans CPU and RAM usage on all nodes and
warns me if it finds any overloaded nodes.

I'm not sure SGE has a built-in ability to track the CPU cores each job
uses, but it should not be difficult to write a script that does this
routinely outside of SGE.
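
The comparison step of such a script could be sketched like this in Python.
The input format is an assumption: each job is described by its id, the slots
it requested, and the number of cores it is currently keeping busy (for
example, the summed %CPU of the job's processes divided by 100). How those
numbers are collected per job is left out here, and the tolerance factor is
an arbitrary choice.

```python
def find_overusers(jobs, tolerance=1.5):
    """Return the jobs that use clearly more cores than they requested.

    jobs: iterable of (job_id, slots_requested, cores_busy) tuples.
    tolerance: how far above its request a job may go before it is flagged
               (1.5 means 50% over; this threshold is only an example).
    """
    flagged = []
    for job_id, slots, cores_busy in jobs:
        if cores_busy > slots * tolerance:
            flagged.append((job_id, slots, cores_busy))
    return flagged


if __name__ == "__main__":
    # Made-up sample data: job 102 asked for 4 slots but keeps ~12 cores busy.
    sample = [("101", 1, 0.98), ("102", 4, 11.7), ("103", 8, 8.4)]
    for job_id, slots, busy in find_overusers(sample):
        print(f"job {job_id}: requested {slots} slots, using ~{busy} cores")
```

A tolerance above 1.0 avoids flagging jobs that only briefly exceed their
request, for instance during start-up.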



-- 
Best,

Feng



Re: [gridengine users] Filling up nodes when using gepetools

2015-07-30 Thread Winkler, Ursula (ursula.wink...@uni-graz.at)
> Sorry to step into the discussion: `qstat -j ...` shows the requested one,
> the granted one is in `qstat -r`.
>
> $ qsub -pe * 2 test.sh
> Your job 44329 (test.sh) has been submitted
> $ qstat -j 44329
> ...
> parallel environment:  * range: 2
> ...

My jobs:

qstat -j ...
   ...
   parallel environment: gepetools_1host range 2
   ...

That's the PE I created for that purpose, so qstat -j shows the right info.

> $ qstat -r
> ...
>    Requested PE: * 2
>    Granted PE:   make 2

qstat -r
...
   Requested PE:   gepetools_1host 2
   Granted PE:     gepetools_1host 2



Re: [gridengine users] Filling up nodes when using gepetools

2015-07-30 Thread Winkler, Ursula (ursula.wink...@uni-graz.at)
>> Well, I created an additional PE with allocation_rule $pe_slots
>> and added an if condition to pe.jsv so that all jobs which request
>> just a single node are assigned to this new PE. But the annoying
>> situation didn't change. The scheduler configuration is set to
>> queue_sort_method load and load_formula slots. So what am I
>> still missing?
>
> I believe it should be a load_formula of -slots so the more slots are
> available (fewest used) the lower the load and the more attractive the node.
> The page Reuti pointed to manages to write this both ways around.

Setting load_formula to -slots doesn't change anything: every job still
starts on a separate host (but in this case that should be the correct
behaviour, if I don't misinterpret the instructions on the web page Reuti
mentioned).

I must be missing something else, and something pretty basic...
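
To make the sign question concrete, here is a toy model in Python. It rests on
two assumptions taken from this thread, not on the scheduler source: with
queue_sort_method load the host with the lowest load_formula value is
preferred, and "slots" evaluates to the number of free slots on a host. The
host names and slot counts are made up.

```python
def pick_host(free_slots, formula, needed):
    """Pick the host a toy 'sort by load' scheduler would choose.

    free_slots: dict mapping host name -> free slots on that host.
    formula:    function turning a host's free slots into a load value.
    needed:     slots the job needs on a single host ($pe_slots style).
    Returns the host with the lowest load value that can fit the job,
    or None if no host has enough free slots. Ties go to the first name.
    """
    candidates = {h: s for h, s in free_slots.items() if s >= needed}
    if not candidates:
        return None
    return min(sorted(candidates), key=lambda h: formula(candidates[h]))


if __name__ == "__main__":
    # node1 already runs a 2-slot job; node2 and node3 are empty.
    free = {"node1": 10, "node2": 12, "node3": 12}
    print("load_formula slots :", pick_host(free, lambda s: s, 2))
    print("load_formula -slots:", pick_host(free, lambda s: -s, 2))
```

In this model "slots" packs the new job onto the partly used node1 (fill up),
while "-slots" sends it to an empty node (spread), which matches the
behaviour reported above for -slots.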



Re: [gridengine users] Filling up nodes when using gepetools

2015-07-30 Thread William Hay
On Thu, 30 Jul 2015 06:12:52 +
Winkler, Ursula (ursula.wink...@uni-graz.at) wrote:

> -----Original Message-----
> From: Reuti [mailto:re...@staff.uni-marburg.de]
> Sent: Wednesday, 29 July 2015 15:10
> To: Winkler, Ursula (ursula.wink...@uni-graz.at)
> Cc: users@gridengine.org
> Subject: Re: [gridengine users] Filling up nodes when using gepetools
>
>> Hi,
>>
>> On 29.07.2015 at 12:50, Winkler, Ursula
>> (ursula.wink...@uni-graz.at) wrote:
>>
>>>>> Node1 has 12 cores/slots and one MPI job with 2 slots is running
>>>>> on it. A user submits job2, which requires at most 10 slots.
>>>>> Independently of the schedule_interval, job_load_adjustments,
>>>>> load_formula and/or load_adjustment_decay_time
>>>>> parameter settings, job2 usually won't start on Node1 if
>>>>
>>>> What about queue_sort_method?
>>>
>>> Doesn't work either.
>>
>> As long as the requested PE has $pe_slots as allocation_rule, it
>> should be possible to use a fill up configuration:
>>
>> https://blogs.oracle.com/sgrell/entry/grid_engine_scheduler_hacks_least
>
> Thank you for the link; I didn't know that about $pe_slots. But
> unfortunately it still doesn't work, maybe because of the gepetools
> sub-PEs. Setting $pe_slots there too has the effect that jobs
> don't start anymore.
>
> Ursula

$pe_slots restricts you to a single node so I'm guessing the jobs that
don't start are jobs that need more than one node.  While we don't use
gepetools we do have a JSV that rewrites people's requested PE based on
the number

What you need, I think, is something that routes jobs that request 1 node
to PEs with a $pe_slots allocation rule, while other jobs are routed to
PEs with an allocation rule equal to the requested ppn.  In all
cases the number of slots to request should be nodes*ppn.
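
The routing described above could be sketched like this in Python. The PE
names ("smp_1host", "mpi_ppn<N>") are hypothetical placeholders, one
fixed-allocation PE per ppn value is assumed to exist, and a real JSV would
apply the result to the job's PE request instead of returning it.

```python
def route_job(nodes, ppn, single_node_pe="smp_1host", multi_node_prefix="mpi_ppn"):
    """Choose a PE and slot count from a nodes/ppn style request.

    Single-node jobs go to a PE with allocation_rule $pe_slots; all other
    jobs go to a PE whose fixed allocation_rule equals ppn. The slot count
    is always nodes * ppn. PE names are illustrative only.
    """
    if nodes < 1 or ppn < 1:
        raise ValueError("nodes and ppn must both be at least 1")
    slots = nodes * ppn
    if nodes == 1:
        return single_node_pe, slots
    return f"{multi_node_prefix}{ppn}", slots


if __name__ == "__main__":
    print(route_job(1, 10))   # single node: $pe_slots PE, 10 slots
    print(route_job(4, 12))   # four nodes: PE with allocation_rule 12, 48 slots
```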
 
 
 


Re: [gridengine users] Filling up nodes when using gepetools

2015-07-30 Thread Winkler, Ursula (ursula.wink...@uni-graz.at)

> I believe it should be a load_formula of -slots so the more slots are
> available (fewest used) the lower the load and the more attractive the
> node.  The page Reuti pointed to manages to write this both ways around.

I'll try it out tomorrow - I'm not at the office now and it's a little bit 
difficult from here.


Re: [gridengine users] Filling up nodes when using gepetools

2015-07-30 Thread Winkler, Ursula (ursula.wink...@uni-graz.at)


> Ignore previous message.  Me getting it back to front I think.  That
> looks correct (I think).  Have you checked the jobs show the right
> granted PE with qstat -j?

Yes, of course.



Re: [gridengine users] Monitoring slot usage

2015-07-30 Thread Simon Andrews
Thanks, core binding looks like it does what we need.  Do I understand 
correctly that if a process spawns more threads than it has slots, those 
threads will simply be restricted to the cores it’s been allocated, so they’ll 
just end up slowing down their own job, and that the job won’t actually get 
killed?
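
One way to check this on a test node is to have the job report which cores it 
is allowed to run on. The sketch below assumes Linux, where Python exposes the 
affinity mask as os.sched_getaffinity, and assumes the binding is applied as a 
CPU affinity mask; on other platforms it falls back to counting all cores.

```python
import os


def allowed_cores():
    """Return the sorted list of core ids this process may run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: the affinity mask is inherited by threads and child processes.
        return sorted(os.sched_getaffinity(0))
    # No affinity support on this platform: assume every core is usable.
    return list(range(os.cpu_count() or 1))


def report():
    """Build a one-line summary comparing allowed cores with NSLOTS."""
    cores = allowed_cores()
    # NSLOTS is set by Grid Engine inside a job; it is absent elsewhere.
    nslots = os.environ.get("NSLOTS", "unset")
    return f"NSLOTS={nslots} allowed_cores={len(cores)} ids={cores}"


if __name__ == "__main__":
    print(report())
```

If the job script runs this and the number of allowed cores matches NSLOTS, 
any extra threads have to share those cores.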

I’ll be very careful in testing this :-)

Simon.



Re: [gridengine users] Filling up nodes when using gepetools

2015-07-30 Thread Reuti

On 30.07.2015 at 18:14, Winkler, Ursula (ursula.wink...@uni-graz.at) wrote:
 
 
 
>> Ignore previous message.  Me getting it back to front I think.  That
>> looks correct (I think).  Have you checked the jobs show the right
>> granted PE with qstat -j?
>
> Yes, of course.

Sorry to step into the discussion: `qstat -j ...` shows the requested one, 
the granted one is in `qstat -r`.

$ qsub -pe * 2 test.sh
Your job 44329 (test.sh) has been submitted
$ qstat -j 44329
...
parallel environment:  * range: 2
...
$ qstat -r
...
   Requested PE: * 2
   Granted PE:   make 2

-- Reuti




Re: [gridengine users] Filling up nodes when using gepetools

2015-07-30 Thread Winkler, Ursula (ursula.wink...@uni-graz.at)




On 30.07.2015 at 18:29, Reuti (re...@staff.uni-marburg.de) wrote:

 
> Sorry to step into the discussion: `qstat -j ...` shows the requested one,
> the granted one is in `qstat -r`.
 

Right now I can't say for sure whether I checked it with qstat -j, but I did 
check it. When I'm back in the office I probably still have the output in 
some screen window, so I can tell you exactly. I also did a test: I removed 
the PE temporarily from the queue, with the result that the jobs could no 
longer start (as expected).