Look at the sshare and sprio tools. Quoting Mario Kadastik <[email protected]>:
> Hi,
>
> Is there some decent way to get the current state of multifactor
> fairshare? Something akin to maui's diagnose -f output that shows groups
> (accounts for slurm) and users with their fairshare target as well
> as their historic usage over the past N days. This would seriously
> help in understanding how the fairshare is computed based on the actual
> usage statistics and current cluster state.
>
> For example, we have all user fairshares set to parent, and for the accounts:
>
> Account    Share
> ---------- ---------
> root       1
> grid       1
> grid-ops   1
> hepusers   100
> kbfiusers  1
>
> Now let's assume one of the users in hepusers spends the past N days
> computing with the full cluster and then another user submits a
> number of jobs. It would be logical to assume that, as there is no
> distinction between the users in an account, the newcomer's priority
> would be higher, as (s)he hasn't had any allocated time.
>
> [root@slurm-1 slurm]# sreport cluster accountutilizationbyuser start=2013-01-08
> --------------------------------------------------------------------------------
> Cluster/Account/User Utilization 2013-01-08T00:00:00 - 2013-01-21T23:59:59 (1209600 secs)
> Time reported in CPU Minutes
> --------------------------------------------------------------------------------
> Cluster   Account         Login     Proper Name     Used
> --------- --------------- --------- --------------- ----------
> t2estonia root                                      7801048
> t2estonia grid                                      0
> t2estonia grid            cms134    mapped user fo+ 0
> t2estonia grid            sgmcms000 mapped user fo+ 0
> t2estonia hepusers                                  7801048
> t2estonia hepusers        andres    Andres Tiko     85048
> t2estonia hepusers        mario     Mario Kadastik  7716000
>
> So according to this, Mario (me) has computed a huge amount of time
> in comparison to andres. However, if I look at the priorities from
> sprio -nl I see this:
>
> [root@slurm-1 slurm]# sprio -nl|head -3
> JOBID USER   PRIORITY   AGE       FAIRSHARE JOBSIZE   PARTITION QOS
> 53498 mario  0.00003497 0.2404977 0.4897101 0.9919238 1.0000000 0.0000000
> 53499 mario  0.00003497 0.2404977 0.4897101 0.9919238 1.0000000 0.0000000
> [root@slurm-1 slurm]# sprio -nl|grep andres|head -1
> 53835 andres 0.00003497 0.2396412 0.4897101 0.9919238 1.0000000 0.0000000
>
> So in fact the fairshare factor is the same for both users, even
> though one has been getting a lot of the resource while the
> other has not.
>
> Or do I misunderstand the =parent part? I also tried setting all
> user shares to 1 and have no clue how long it will take for sprio
> to recompute this, but right now it's showing the same priorities.
>
> That's one of the reasons why I'd like to be able to see how the
> actual usage and decay over time affect the factor, so that I can
> better understand the algorithm and tune the weights.
>
> Thanks,
>
> Mario Kadastik, PhD
> Researcher
>
> ---
> "Physics is like sex, sure it may have practical reasons, but
> that's not why we do it"
> -- Richard P. Feynman
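
As a rough illustration of the numbers in the quoted message, here is a minimal Python sketch. It assumes the classic multifactor form F = 2**(-U/S), with U the account's normalized usage and S its normalized shares among sibling accounts, and it assumes that users set to parent are evaluated at the account level. The function and variable names are made up for this example; only the share and usage figures come from the message above. Since hepusers is the only account with any usage, its normalized usage is 1 regardless of the decay half-life.

```python
# Sketch only: assumes the classic fair-share form F = 2**(-U/S).
# Names here are hypothetical; the numbers are taken from the quoted message.

# Shares of the accounts directly under root (root itself is not a sibling).
account_shares = {"grid": 1, "grid-ops": 1, "hepusers": 100, "kbfiusers": 1}

# CPU minutes per account as reported by sreport in the quoted message.
account_usage_cpu_min = {"grid": 0, "hepusers": 7801048}


def fairshare_factor(account, shares, usage):
    """Return 2**(-U/S) for one account under the assumptions stated above."""
    total_shares = sum(shares.values())
    total_usage = sum(usage.values())
    norm_shares = shares[account] / total_shares
    # Accounts with no recorded usage count as zero.
    norm_usage = usage.get(account, 0) / total_usage if total_usage else 0.0
    return 2 ** (-norm_usage / norm_shares)


factor = fairshare_factor("hepusers", account_shares, account_usage_cpu_min)
print(f"{factor:.7f}")  # prints 0.4897101
```

Under these assumptions the result agrees with the FAIRSHARE column that sprio printed for both mario and andres, which is consistent with the factor being derived from the account's usage and shares rather than from each user's own usage when users are set to parent.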
