Hi,

I noticed a node today with a load average of around 5,000.  After logging
in to investigate, I found thousands of flock processes running.

Snipped ps aux|grep flock:

root     80850  0.0  0.0   4056   484 ?        S    11:48   0:00 flock -x /cgroup/memory -c /etc/slurm/cgroup/release_memory sync /slurm/uid_3373/job_959282
root     80851  0.0  0.0   4056   488 ?        S    11:48   0:00 flock -x /cgroup/memory -c /etc/slurm/cgroup/release_memory sync /slurm/uid_3373/job_959282
root     80852  0.0  0.0   4056   484 ?        S    11:48   0:00 flock -x /cgroup/memory -c /etc/slurm/cgroup/release_memory sync /slurm/uid_3373/job_959282
root     81016  0.0  0.0   4056   480 ?        S    11:48   0:00 flock -x /cgroup/memory -c rmdir /cgroup/memory/slurm/uid_3373
root     81026  0.0  0.0   4056   484 ?        S    11:48   0:00 flock -x /cgroup/memory -c rmdir /cgroup/memory/slurm/uid_3373
root     81470  0.0  0.0 103248   900 pts/1    S+   11:48   0:00 grep flock
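
For anyone who wants to see how these waiters break down, here is a rough
sketch that groups them by the command each one is waiting to run. It
assumes flock shows up as the bare command name, as in the listing above.
count_flock is just a name made up for illustration, not anything shipped
with Slurm.

```shell
# Hypothetical helper: summarize flock waiters by their full command line.
# Reads one command line per row on stdin (e.g. from `ps -eo args=`) and
# prints "count command", most frequent first.
count_flock() {
    grep -E '^flock ' |      # keep only rows whose command is flock
        sort |               # group identical command lines together
        uniq -c |            # count each distinct command line
        sort -rn |           # largest count first
        sed -E 's/^ +//'     # drop the leading padding uniq adds
}

# On the affected node (the trailing "=" suppresses the ps header):
#   ps -eo args= | count_flock
```

A single job directory showing up thousands of times would suggest the
release script is being launched over and over for the same cgroup, rather
than many jobs finishing at once.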


I haven’t seen this before. Is it likely related to the type of job
running on the node?  What’s going on here: bad code in the job, or a bad
Slurm config somewhere?  Is there anything that can be done to fix this
behavior?

Slurm version 14.11.3

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167

