Have you looked at sshare?

Phil Eckert
LLNL

From: Mario Kadastik <[email protected]>
Reply-To: slurm-dev <[email protected]>
Date: Tuesday, January 22, 2013 11:17 AM
To: slurm-dev <[email protected]>
Subject: [slurm-dev] fairshare usage

Hi,

is there some decent way to get the current state of multifactor fairshare? Something akin to Maui's diagnose -f output, which shows groups (accounts for Slurm) and users with their fairshare target as well as their historic usage over the past N days. This would seriously help in understanding how the fairshare is computed from the actual usage statistics and the current cluster state.

For example, we have all user fairshares set to parent, and for the accounts:

Account    Share
---------- ---------
root       1
grid       1
grid-ops   1
hepusers   100
kbfiusers  1

Now let's assume one of the users in hepusers spends the past N days computing with the full cluster and then another user submits a number of jobs. It would be logical to assume that, as there is no distinction between the users in an account, the newcomer's priority would be higher, as (s)he hasn't had any allocated time.

[root@slurm-1 slurm]# sreport cluster accountutilizationbyuser start=2013-01-08
--------------------------------------------------------------------------------
Cluster/Account/User Utilization 2013-01-08T00:00:00 - 2013-01-21T23:59:59 (1209600 secs)
Time reported in CPU Minutes
--------------------------------------------------------------------------------
  Cluster         Account     Login     Proper Name       Used
--------- --------------- --------- --------------- ----------
t2estonia            root                              7801048
t2estonia            grid                                    0
t2estonia            grid    cms134 mapped user fo+          0
t2estonia            grid sgmcms000 mapped user fo+          0
t2estonia        hepusers                              7801048
t2estonia        hepusers    andres     Andres Tiko      85048
t2estonia        hepusers     mario  Mario Kadastik    7716000

So according to this, Mario (me) has used a huge amount of compute time in comparison to andres.

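As a rough check of how lopsided the usage is, here is a minimal Python sketch that only restates the CPU-minute figures from the sreport output above. The variable names are made up for illustration, and this does not reproduce how Slurm itself computes or decays usage.

```python
# CPU minutes per user, copied from the sreport output above.
used_cpu_minutes = {"andres": 85048, "mario": 7716000}

# Total reported for the hepusers account in the same output.
hepusers_total = 7801048

# Fraction of the account's usage attributable to each user.
usage_fraction = {
    user: used / hepusers_total for user, used in used_cpu_minutes.items()
}

# How many times more CPU time mario used than andres.
ratio = used_cpu_minutes["mario"] / used_cpu_minutes["andres"]

for user, fraction in usage_fraction.items():
    print(f"{user}: {fraction:.1%} of hepusers usage")
print(f"mario/andres usage ratio: {ratio:.1f}")
```

By these numbers mario accounts for nearly all of the account's usage, roughly ninety times that of andres, which is the gap one would expect a per-user fairshare factor to reflect.
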
However, if I look at the priorities from sprio -nl, I see this:

[root@slurm-1 slurm]# sprio -nl|head -3
JOBID  USER    PRIORITY    AGE        FAIRSHARE  JOBSIZE    PARTITION  QOS
53498  mario   0.00003497  0.2404977  0.4897101  0.9919238  1.0000000  0.0000000
53499  mario   0.00003497  0.2404977  0.4897101  0.9919238  1.0000000  0.0000000
[root@slurm-1 slurm]# sprio -nl|grep andres|head -1
53835  andres  0.00003497  0.2396412  0.4897101  0.9919238  1.0000000  0.0000000

So in fact the fairshare factor is identical for both users, even though one has been getting a lot of the resource while the other has not. Or do I misunderstand the =parent part? I also tried setting all user shares to 1. I have no clue how long it will take for sprio to recompute this, but right now it's showing the same priorities. That's one of the reasons why I'd like to be able to see how the actual usage and decay over time affect the factor, so that I can better understand the algorithm and tune the weights.

Thanks,

Mario Kadastik, PhD
Researcher

---
"Physics is like sex, sure it may have practical reasons, but that's not why we do it"
-- Richard P. Feynman
