If you are running on CentOS/RedHat, gstack is also a useful command for
printing the stack of every thread in a running process.
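
A minimal sketch of how that could look. It assumes the daemon logging the
thread-count message is slurmctld and that gstack and pidof are installed;
the function name dump_stacks is just an illustrative helper, not anything
that ships with Slurm:

```shell
#!/bin/sh
# Sketch: print the stack of every thread in a running daemon using gstack.
# Assumption: the daemon of interest is slurmctld; pass another name to override.
dump_stacks() {
    proc_name="${1:-slurmctld}"
    # pidof -s prints a single PID, and fails if no such process is running.
    pid=$(pidof -s "$proc_name") || {
        echo "no running process named $proc_name" >&2
        return 1
    }
    # gstack attaches briefly and prints one backtrace per thread.
    gstack "$pid"
}
```

Running dump_stacks as root while the "over limit" message is being logged,
and redirecting the output to a file, should show where the threads are
blocked.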
On 05/13/2014 01:37 PM, Mario Kadastik wrote:
Hi,
I'm seeing this in the logs:
[2014-05-13T14:24:22.363] server_thread_count over limit (256), waiting
and user commands get this during that time:
[root@slurm-1 ~]# squeue -j 73271
squeue: error: slurm_receive_msg: Socket timed out on send/recv operation
slurm_load_jobs error: Socket timed out on send/recv operation
Any ideas how to debug what the 256 threads are actually doing, so that we can
understand the underlying cause? I doubt it's normal to exhaust the thread
count on a 5000-job-slot cluster...
Mario Kadastik, PhD
Senior researcher
---
"Physics is like sex, sure it may have practical reasons, but that's not why we
do it"
-- Richard P. Feynman
--
Thanks,
/David/Bigagli