If you are running on CentOS/Red Hat, gstack is also a useful command for printing the stack.
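A minimal sketch for making sense of that output, assuming the process in question is slurmctld and that gstack prints gdb's usual `thread apply all bt` layout. The helper name summarize_stacks is hypothetical. It groups threads whose backtraces are identical, so a dump of 256 threads collapses into a handful of distinct stacks, most common first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a gstack (or gdb "thread apply all bt") dump
# by counting how many threads share an identical backtrace.
# Assumes gdb's usual layout: a "Thread N (...):" header followed by
# "#K  0xADDR in func (...) ..." frame lines, innermost frame first.
summarize_stacks() {
  awk '
    /^Thread / {                      # new thread: flush the previous stack
      if (stack != "") count[stack]++
      stack = ""
      next
    }
    /^#[0-9]+/ {                      # frame line: extract the function name
      if (match($0, / in [^ ]+/)) {
        fn = substr($0, RSTART + 4, RLENGTH - 4)
      } else {                        # frames printed without an address
        fn = $2
      }
      stack = (stack == "" ? fn : stack " <- " fn)
    }
    END {
      if (stack != "") count[stack]++
      for (s in count) print count[s], s
    }
  ' | sort -rn
}
```

For example, `gstack "$(pidof slurmctld)" | summarize_stacks` on the controller host. Where gstack is not installed, `gdb -p "$(pidof slurmctld)" -batch -ex 'thread apply all bt'` produces similar output, and `ps -o nlwp= -p "$(pidof slurmctld)"` shows the current thread count to compare against the 256 limit. Attaching a debugger pauses the daemon briefly while the stacks are collected.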

On 05/13/2014 01:37 PM, Mario Kadastik wrote:
Hi,

I'm seeing this in the logs:

[2014-05-13T14:24:22.363] server_thread_count over limit (256), waiting

and during that time user commands fail with:

[root@slurm-1 ~]# squeue -j 73271
squeue: error: slurm_receive_msg: Socket timed out on send/recv operation
slurm_load_jobs error: Socket timed out on send/recv operation

Any ideas how to debug what the 256 threads are actually doing, to understand the
underlying cause? I doubt it's normal that we're exhausting the thread count
on a 5000-job-slot cluster...

Mario Kadastik, PhD
Senior researcher

---
   "Physics is like sex, sure it may have practical reasons, but that's not why we do it"
      -- Richard P. Feynman

--

Thanks,
/David/Bigagli
