Re: [slurm-users] priority/multifactor, sshare, and AccountingStorageEnforce

2020-07-09, from Paul Edmon
Try setting RawShares to something greater than 1.  I've seen it be the
case that when you set it to 1 it creates weirdness like this.



-Paul Edmon-


On 7/9/2020 1:12 PM, Dumont, Joey wrote:


Hi,


We recently set up fair tree scheduling (we have 19.05 running), and 
are trying to use sshare to see usage information. Unfortunately, 
sshare reports all zeros, even though there seems to be data in the 
backend DB. Here's an example output:



$ sshare -l
             Account       User  RawShares  NormShares  RawUsage  NormUsage  EffectvUsage  FairShare  LevelFS  GrpTRESMins  TRESRunMins
-------------------- ---------- ---------- ----------- --------- ---------- ------------- ---------- -------- ------------ ------------------------------
root                                                           0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
 covid                                   1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
  covid-01                               1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
  covid-02                               1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
 group1                                  1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
  subgroup1                              1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
  subgroups                              1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
  subgroups                              4                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
  subgroups                              1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
 SUBGROUP                                1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
 SUBGROUP                                1                     0       0.00          0.00                                  cpu=0,mem=0,energy=0,node=0,b+
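For reference, the NormShares column (blank in the output above) is just each account's RawShares divided by the total shares of its siblings under the same parent. A minimal sketch of that normalization, not Slurm's actual code, with account names taken from the output for illustration:

```python
# Sketch of per-level share normalization as used by fair-tree
# (assumption: simplified model, not Slurm's implementation).
def norm_shares(raw_shares):
    """Normalize sibling RawShares so they sum to 1.0."""
    total = sum(raw_shares.values())
    return {name: shares / total for name, shares in raw_shares.items()}

# Sibling accounts with RawShares=1 each split the level evenly:
siblings = {"covid": 1, "group1": 1, "SUBGROUP": 1}
print(norm_shares(siblings))
```

With every sibling at RawShares=1 the normalized shares are all equal, which is part of why a table full of 1s (plus zero recorded usage) gives so little to distinguish accounts.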




And the slurm.conf config:


ClusterName=trixie
SlurmctldHost=trixie(10.10.0.11)
SlurmctldHost=hn2(10.10.0.12)
GresTypes=gpu
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/gpfs/share/slurm/
SlurmdSpoolDir=/var/spool/slurm/d
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/cgroup
ReturnToService=2
PrologFlags=x11
TaskPlugin=task/cgroup

# TIMERS
SlurmctldTimeout=60
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#

# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
FastSchedule=1

SchedulerParameters=bf_interval=60,bf_continue,bf_resolution=600,bf_window=2880,bf_max_job_test=5000,bf_max_job_part=1000,bf_max_job_user=10,bf_max_job_start=100
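As a sanity check on the backfill parameters above (values copied from the SchedulerParameters line; units as documented in the slurm.conf man page, bf_resolution in seconds and bf_window in minutes):

```python
# Human-readable view of the backfill settings above
# (assumption: units per the slurm.conf documentation).
bf_interval = 60      # seconds between backfill passes
bf_resolution = 600   # planning resolution, in seconds
bf_window = 2880      # how far ahead backfill plans, in minutes

print(bf_resolution / 60)   # resolution in minutes
print(bf_window / 60 / 24)  # look-ahead window in days
```

So this configuration backfills once a minute, planning at 10-minute granularity over a 2-day window.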

PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0
PriorityWeightFairshare=10
PriorityWeightAge=1000
PriorityWeightPartition=1
PriorityWeightJobSize=1000
PriorityMaxAge=1-0
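With PriorityDecayHalfLife=14-0, recorded usage should halve every 14 days. A minimal sketch of that exponential decay, a simplified model rather than Slurm's exact accounting code:

```python
# Exponential decay of historical usage under a half-life
# (assumption: simple continuous model, not Slurm's implementation).
def decayed_usage(raw_usage, elapsed_days, half_life_days=14.0):
    """Usage still counted against an account after elapsed_days."""
    return raw_usage * 0.5 ** (elapsed_days / half_life_days)

# 1000 CPU-seconds of recorded usage:
print(decayed_usage(1000, 14))  # after one half-life -> 500.0
print(decayed_usage(1000, 28))  # after two half-lives -> 250.0
```

If sshare shows RawUsage stuck at 0 even right after jobs run, the problem is upstream of this decay, i.e. usage is not reaching the controller from the database in the first place.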

# LOGGING
SlurmctldDebug=3
