[slurm-dev] Re: submitting 100k job array causes slurmctld to socket timeout

lilinji Sun, 15 Mar 2015 18:29:10 -0700

We use the Slurm cluster on TH-1A.

When I submit jobs with sbatch, they always stay in the PD (pending) state for about an hour before they start running. What could be the reason?

linji, Li
BGI | IT manager(NSCC-TJ)
Office +86 02259803611, Mobile +86 18665379896, Fax +86 02259803604
BGI Tianjin Corporation, BGI Tianjin Medical Examination Laboratory
Add: E3, Airport Industrial Zone, Tianjin, China 300308
mail: [email protected]   web: http://www.genomics.cn
 
 
From: Daniel Letai
Date: 2015-03-15 21:46
To: slurm-dev
Subject: [slurm-dev] submitting 100k job array causes slurmctld to socket timeout
Hi,

We are testing a new Slurm installation (14.11.4) on a 1k-node cluster.

Several things we've tried:
Increasing slurmctld threads (a range of 8 ports)
Increasing munge threads (threads=10)
Increasing MessageTimeout to 30
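
For readers following along, the three changes listed above might look roughly like the sketch below. This is an illustration under assumptions, not the poster's actual configuration: only the size of the port range (8 ports) is stated, so the 6817 base port is assumed, and the munged line assumes the thread count is passed on the daemon's command line.

```
# slurm.conf (sketch; base port 6817 is an assumption, only "8 ports" is given)
SlurmctldPort=6817-6824
MessageTimeout=30

# munge daemon started with 10 worker threads (assumed invocation):
#   munged --num-threads=10
```

Changes to slurm.conf generally have to be distributed to all nodes, and a change to SlurmctldPort normally requires restarting the daemons rather than only reconfiguring them.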


We are using accounting (the database is on a different server).

Thanks for any help
