Mark - I analyzed this issue on a UGE 8.1.4 cluster last fall and found that the process below was the simplest way to get some of the overbooking to show up. I think any version of SGE with the original share tree code will show this issue. The following steps show how to get one of the share tree booking issues to manifest. The root cause is that only one task from a job array is counted when calculating online usage, and a large 'adjustment' is then added at the end when the final usage is delivered. The adjustment is delivered in a single scheduler interval even though that usage was really accrued over the life of the job, which causes a large spike in share tree usage because the adjustment is never decayed.
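A quick way to see the spike as it happens is to poll sge_share_mon in a loop while the tests below run. A rough sketch (the utilbin path and the option letters are from memory, so double-check them against the usage message on your install):

    # Poll per-user share tree usage every 10 seconds.  -c (collection count)
    # and -h (print header) are how I remember the sge_share_mon options on
    # 8.1.4; check its usage message if your version differs.  With the bug,
    # the usage for the array-job user jumps right after the array finishes
    # instead of having grown smoothly over the job's lifetime.
    SHARE_MON="$SGE_ROOT/utilbin/$("$SGE_ROOT/util/arch")/sge_share_mon"

    while true; do
        date
        "$SHARE_MON" -h -c 1
        sleep 10
    done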
My test setup is as follows (a rough command sketch of these steps is in the P.S. at the end of this message):

0) Make sure you have 8.1.4+ on at least 3 hosts (I had 6 hosts with 8 slots each)
1) Enable the share tree as usual with a tree that gives equal shares to all users (a simple Root with a single default leaf)
2) Enable SHARETREE_RESERVED_USAGE
3) (For testing ease) Disable decay (set halftime to 0 and halflife_decay_list to none)
4) (For testing ease) Decrease the scheduling interval to about 10 seconds
5) (For testing ease) Decrease the load_report_time to 5 seconds

Test 1 (Show incorrect usage with task arrays):

0) Make sure no jobs are running and clear the share tree usage
1) As user1, submit a single-slot job (e.g. qsub -b y sleep 100000)
2) As user2, submit a 2-task array (e.g. qsub -t 1-2 -b y sleep 10000)
3) Verify that three slots are in use
4) Using sge_share_mon, verify that the usage for each user is growing at the same rate
5) Wait 5 minutes
6) Verify that the usage for each user is still about equal
7) Kill the task array job
8) Verify that after the next scheduler run the usage for user2 is double that of user1

If things were working correctly, user2 should have always been twice user1.

Test 2 (Show correct usage with PEs):

0) Make sure no jobs are running and clear the share tree usage
1) As user1, submit a single-slot job (e.g. qsub -b y sleep 100000)
2) As user2, submit a 2-slot PE job (e.g. qsub -pe smp 2 -b y sleep 10000)
3) Verify that three slots are in use
4) Using sge_share_mon, verify that the usage for user2 is growing at double the rate of user1
5) Wait 5 minutes
6) Verify that the usage of user2 is double that of user1
7) Kill the PE job
8) Verify that after the next scheduler run the usage for user2 is double that of user1

This is the proper behavior.

Test 3 (Show incorrect usage with task arrays and PEs):

0) Make sure no jobs are running and clear the share tree usage
1) As user1, submit a single-slot job (e.g. qsub -b y sleep 100000)
2) As user2, submit a 2-task array where each task uses a 2-slot PE (e.g. qsub -t 1-2 -pe smp 2 -b y sleep 10000)
3) Verify that five slots are in use
4) Using sge_share_mon, verify that the usage for user2 is growing at double the rate of user1
5) Wait 5 minutes
6) Verify that the usage of user2 is double that of user1
7) Kill the task array job
8) Verify that after the next scheduler run the usage for user2 is 4 times that of user1

If things were working correctly, user2 should have always been 4 times user1.

In UGE we fixed this issue, and that led to us solving the 2298 issue as well, since the fix made the 2298 problem much more obvious. Issue 2298 can be reproduced by submitting an array job with more tasks than you have slots available, so that some tasks are pending while others are running. This should be easy to see in the testing environment above (not 100% sure here, though, since the majority of my time with 2298 was spent on our code that was already patched to address the 'adjustment' issue). These two issues play against each other in a way that can make it hard to identify the real causes of the improper usage in an active cluster. I think these problems would be very difficult to identify in a cluster with a large half-life unless very large, long-running job arrays are used (runtimes at least 10 times the half-life).

Even after fixing these two issues there was a bit more work we needed to do to get the share tree working properly with array jobs, but we feel it is fully fixed in UGE 8.1.7. The changes have been running in production for several months without incident at a site where crazy share tree usage was initially seen daily.

-Cameron
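P.S. For reference, here is roughly how the setup steps and the test submissions translate into commands on my cluster. Treat this as a sketch: the share tree file contents and attribute names are from my notes (verify against share_tree(5), sched_conf(5) and sge_conf(5)), user1/user2 are placeholders, and the smp PE has to exist with enough slots.

    # --- Setup ---

    # Share tree giving equal shares to every user: a Root node with a single
    # "default" leaf, loaded from a file with qconf -Astree.
    printf '%s\n' \
        'id=0' 'name=Root' 'type=0' 'shares=1' 'childnodes=1' \
        'id=1' 'name=default' 'type=0' 'shares=100' 'childnodes=NONE' \
        > /tmp/equal_stree
    qconf -Astree /tmp/equal_stree

    # Scheduler configuration (qconf -msconf opens an editor) -- set:
    #   halftime            0        # disable decay for the test
    #   halflife_decay_list none
    #   schedule_interval   0:0:10   # roughly 10 second scheduler runs
    qconf -msconf

    # Global configuration (qconf -mconf) -- set:
    #   load_report_time    0:0:5
    #   execd_params        SHARETREE_RESERVED_USAGE=true
    qconf -mconf

    # Clear accumulated share tree usage before each test
    qconf -clearusage

    # --- Submissions ---

    # Test 1: single-slot job vs. 2-task array
    su - user1 -c 'qsub -b y sleep 100000'
    su - user2 -c 'qsub -t 1-2 -b y sleep 10000'

    # Test 2: single-slot job vs. 2-slot PE job
    su - user1 -c 'qsub -b y sleep 100000'
    su - user2 -c 'qsub -pe smp 2 -b y sleep 10000'

    # Test 3: single-slot job vs. 2-task array where each task takes a 2-slot PE
    su - user1 -c 'qsub -b y sleep 100000'
    su - user2 -c 'qsub -t 1-2 -pe smp 2 -b y sleep 10000'

    # Issue 2298: an array with more tasks than free slots, so some tasks run
    # while others stay pending (pick a task count larger than the free slots
    # on your test cluster)
    su - user2 -c 'qsub -t 1-100 -b y sleep 10000'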
