Mark - I analyzed this issue on a UGE 8.1.4 cluster last fall and found that the process below was the simplest way to get some of the overbooking to show up. I think any version of SGE with the original share tree code will show this issue. The following steps show how to get one of the share tree booking issues to manifest. The root cause is that only one task from a job array is counted when calculating online usage, and a large 'adjustment' is then added at the end when the final usage is delivered. The adjustment is delivered in a single scheduler interval even though that usage was really accrued over the life of the job, which causes a large spike in share tree usage because the adjustment is never decayed.
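A quick way to see the spike as it happens is to poll sge_share_mon in a loop while the tests below run. A rough sketch (the utilbin path and the option letters are from memory, so double-check them against the usage message on your install):

    # Poll per-user share tree usage every 10 seconds.  -c (collection count)
    # and -h (print header) are how I remember the sge_share_mon options on
    # 8.1.4; check its usage message if your version differs.  With the bug,
    # the usage for the array-job user jumps right after the array finishes
    # instead of having grown smoothly over the job's lifetime.
    SHARE_MON="$SGE_ROOT/utilbin/$("$SGE_ROOT/util/arch")/sge_share_mon"

    while true; do
        date
        "$SHARE_MON" -h -c 1
        sleep 10
    done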
My test setup is as follows (a rough command sketch of these steps is in the P.S. at the end of this message):

0) Make sure you have 8.1.4+ on at least 3 hosts (I had 6 hosts with 8 slots each)
1) Enable the share tree as usual with a tree that gives equal shares to all users (a simple Root with a single default leaf)
2) Enable SHARETREE_RESERVED_USAGE
3) (For testing ease) Disable decay (set halftime to 0 and halflife_decay_list to none)
4) (For testing ease) Decrease the scheduling interval to about 10 seconds
5) (For testing ease) Decrease the load_report_time to 5 seconds

Test 1 (Show incorrect usage with task arrays):

0) Make sure no jobs are running and clear the share tree usage
1) As user1, submit a single-slot job (e.g. qsub -b y sleep 100000)
2) As user2, submit a 2-task array (e.g. qsub -t 1-2 -b y sleep 10000)
3) Verify that three slots are in use
4) Using sge_share_mon, verify that the usage for each user is growing at the same rate
5) Wait 5 minutes
6) Verify that the usage for each user is still about equal
7) Kill the task array job
8) Verify that after the next scheduler run the usage for user2 is double that of user1

If things were working correctly, user2 should have always been twice user1.

Test 2 (Show correct usage with PEs):

0) Make sure no jobs are running and clear the share tree usage
1) As user1, submit a single-slot job (e.g. qsub -b y sleep 100000)
2) As user2, submit a 2-slot PE job (e.g. qsub -pe smp 2 -b y sleep 10000)
3) Verify that three slots are in use
4) Using sge_share_mon, verify that the usage for user2 is growing at double the rate of user1
5) Wait 5 minutes
6) Verify that the usage of user2 is double that of user1
7) Kill the PE job
8) Verify that after the next scheduler run the usage for user2 is double that of user1

This is the proper behavior.

Test 3 (Show incorrect usage with task arrays and PEs):

0) Make sure no jobs are running and clear the share tree usage
1) As user1, submit a single-slot job (e.g. qsub -b y sleep 100000)
2) As user2, submit a 2-task array where each task uses a 2-slot PE (e.g. qsub -t 1-2 -pe smp 2 -b y sleep 10000)
3) Verify that five slots are in use
4) Using sge_share_mon, verify that the usage for user2 is growing at double the rate of user1
5) Wait 5 minutes
6) Verify that the usage of user2 is double that of user1
7) Kill the task array job
8) Verify that after the next scheduler run the usage for user2 is 4 times that of user1

If things were working correctly, user2 should have always been 4 times user1.

In UGE we fixed this issue, and that led to us solving the 2298 issue as well, since the fix made the 2298 problem much more obvious. Issue 2298 can be reproduced by submitting an array job with more tasks than you have slots available, so that some tasks are pending while others are running. This should be easy to see in the testing environment above (not 100% sure here, though, since the majority of my time with 2298 was spent on our code that was already patched to address the 'adjustment' issue). These two issues play against each other in a way that can make it hard to identify the real causes of the improper usage in an active cluster. I think these problems would be very difficult to identify in a cluster with a large half-life unless very large, long-running job arrays are used (runtimes at least 10 times the half-life).

Even after fixing these two issues there was a bit more work we needed to do to get the share tree working properly with array jobs, but we feel it is fully fixed in UGE 8.1.7. The changes have been running in production for several months without incident at a site where crazy share tree usage was initially seen daily.

-Cameron
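P.S. For reference, here is roughly how the setup steps and the test submissions translate into commands on my cluster. Treat this as a sketch: the share tree file contents and attribute names are from my notes (verify against share_tree(5), sched_conf(5) and sge_conf(5)), user1/user2 are placeholders, and the smp PE has to exist with enough slots.

    # --- Setup ---

    # Share tree giving equal shares to every user: a Root node with a single
    # "default" leaf, loaded from a file with qconf -Astree.
    printf '%s\n' \
        'id=0' 'name=Root' 'type=0' 'shares=1' 'childnodes=1' \
        'id=1' 'name=default' 'type=0' 'shares=100' 'childnodes=NONE' \
        > /tmp/equal_stree
    qconf -Astree /tmp/equal_stree

    # Scheduler configuration (qconf -msconf opens an editor) -- set:
    #   halftime            0        # disable decay for the test
    #   halflife_decay_list none
    #   schedule_interval   0:0:10   # roughly 10 second scheduler runs
    qconf -msconf

    # Global configuration (qconf -mconf) -- set:
    #   load_report_time    0:0:5
    #   execd_params        SHARETREE_RESERVED_USAGE=true
    qconf -mconf

    # Clear accumulated share tree usage before each test
    qconf -clearusage

    # --- Submissions ---

    # Test 1: single-slot job vs. 2-task array
    su - user1 -c 'qsub -b y sleep 100000'
    su - user2 -c 'qsub -t 1-2 -b y sleep 10000'

    # Test 2: single-slot job vs. 2-slot PE job
    su - user1 -c 'qsub -b y sleep 100000'
    su - user2 -c 'qsub -pe smp 2 -b y sleep 10000'

    # Test 3: single-slot job vs. 2-task array where each task takes a 2-slot PE
    su - user1 -c 'qsub -b y sleep 100000'
    su - user2 -c 'qsub -t 1-2 -pe smp 2 -b y sleep 10000'

    # Issue 2298: an array with more tasks than free slots, so some tasks run
    # while others stay pending (pick a task count larger than the free slots
    # on your test cluster)
    su - user2 -c 'qsub -t 1-100 -b y sleep 10000'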
