Dear Slurm community users,

We are using Slurm version 20.02.x.

We see the message below appearing many times in the slurmctld log, and whenever it appears, sinfo/squeue output becomes slow. There is no timeout, as I have set MessageTimeout to 100.

Warning: Note very large processing time from load_part_uid_allow_list: usec=10800885 began=16:27:55.952
[2021-08-29T16:28:06.753] Warning: Note very large processing time from _slurmctld_background: usec=10801120 began=16:27:55.952

Is this a bug or a configuration issue? Has anybody faced a similar issue? Could anybody throw some light on this? Please find the attached slurm.conf below.

Regards,
Navin
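P.S. To pull these timings out of the log, I used a small shell helper of my own (a sketch, not part of Slurm; the default log path matches the attached slurm.conf):

```shell
# Scan a slurmctld log for "very large processing time" warnings and
# print each reported function with its processing time in seconds.
slow_calls() {
    # $1: path to slurmctld.log (defaults to the path in our slurm.conf)
    grep -o 'processing time from [^:]*: usec=[0-9]*' "${1:-/var/slurm/log/slurmctld.log}" |
    awk -F'[ =]' '{sub(/:$/, "", $4); printf "%s %.1fs\n", $4, $6 / 1000000}'
}
```

For the two log lines above it prints `load_part_uid_allow_list 10.8s` and `_slurmctld_background 10.8s`.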
ClusterName=merckhpc
ControlMachine=Master
ControlAddr=localhost
AuthType=auth/munge
CredType=cred/munge
CacheGroups=1
ReturnToService=0
ProctrackType=proctrack/linuxproc
SlurmctldPort=6817
SlurmdPort=6818
SchedulerPort=7321
SlurmctldPidFile=/var/slurm/slurmctld.pid
SlurmdPidFile=/var/slurm/slurmd.%n.pid
SlurmdSpoolDir=/var/slurm/spool/slurmd.%n.spool
StateSaveLocation=/var/slurm/state
SlurmctldLogFile=/var/slurm/log/slurmctld.log
SlurmdLogFile=/var/slurm/log/slurmd.%n.log.%h
SlurmUser=hpcadmin
MpiDefault=none
SwitchType=switch/none
TaskPlugin=task/affinity
TaskPluginParam=Sched
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
KillWait=30
MinJobAge=3600
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
AccountingStorageEnforce=associations
AccountingStorageHost=localhost
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreJobComment=YES
JobCompType=jobcomp/slurmdbd
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmdDebug=5
SlurmctldDebug=5
Waittime=0
Epilog=/etc/slurm/slurm.epilog.clean
GresTypes=gpu
MaxArraySize=10000
MaxJobCount=5000000
MessageTimeout=100
SchedulerParameters=enable_user_top,default_queue_depth=1000000
PriorityType=priority/multifactor
PriorityDecayHalfLife=2
PriorityUsageResetPeriod=DAILY
PriorityWeightFairshare=500000
PriorityFlags=FAIR_TREE

NodeName=node[35-40] NodeHostname=bng1x[1847-1852] NodeAddr=node[35-40] CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=1 RealMemory=386626
NodeName=node[17-26] NodeHostName=bng1x[1590-1599] NodeAddr=node[17-26] CPUs=36 Boards=1 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=257680 Feature=K2200 Gres=gpu:2
NodeName=node41 NodeHostName=bng1x1855 NodeAddr=node41 CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=1 RealMemory=386643 Feature=V100S Gres=gpu:2
NodeName=node[32-33] NodeHostname=bng1x[1793-1794] NodeAddr=node[32-33] Sockets=2 CoresPerSocket=24 RealMemory=773690
NodeName=node[28-31] NodeHostname=bng1x[1737-1740] NodeAddr=node[28-31] Sockets=2 CoresPerSocket=28 RealMemory=257586
NodeName=node[27] NodeHostname=bng1x1600 NodeAddr=node27 Sockets=2 CoresPerSocket=18 RealMemory=515728 Feature=K40 Gres=gpu:2
NodeName=node[34] NodeHostname=bng1x1795 NodeAddr=node34 Sockets=2 CoresPerSocket=24 RealMemory=773682 Feature=RTX Gres=gpu:8

PartitionName=Normal Nodes=node[28-33,35-40] Default=Yes MaxTime=INFINITE State=UP Shared=YES OverSubscribe=NO
PartitionName=testq Nodes=node41 Default=NO MaxTime=INFINITE State=UP Shared=YES
PartitionName=smallgpu Nodes=node[34] Default=NO MaxTime=INFINITE State=UP Shared=YES OverSubscribe=NO
PartitionName=biggpu Nodes=node[17-27] Default=NO MaxTime=INFINITE State=UP Shared=YES OverSubscribe=NO