[root@holy-slurm01 ~]# squeue
squeue: error: slurm_receive_msg: Insane message length
slurm_load_jobs error: Insane message length

[root@holy-slurm01 ~]# sdiag
*******************************************************
sdiag output at Sun Sep 29 15:12:13 2013
Data since      Sat Sep 28 20:00:01 2013
*******************************************************
Server thread count: 3
Agent queue size:    0

Jobs submitted: 21797
Jobs started:   12030
Jobs completed: 12209
Jobs canceled:  70
Jobs failed:    5

Main schedule statistics (microseconds):
        Last cycle:   9207042
        Max cycle:    10088674
        Total cycles: 1563
        Mean cycle:   17859
        Mean depth cycle:  12138
        Cycles per minute: 1
        Last queue length: 496816

Backfilling stats
        Total backfilled jobs (since last slurm start): 9325
        Total backfilled jobs (since last stats cycle start): 4952
        Total cycles: 84
        Last cycle when: Sun Sep 29 15:06:15 2013
        Last cycle: 2555321
        Max cycle:  27633565
        Mean cycle: 6115033
        Last depth cycle: 3
        Last depth cycle (try sched): 2
        Depth Mean: 278
        Depth Mean (try depth): 62
        Last queue length: 496814
        Queue length mean: 100807

I'm guessing this is because there are roughly 500,000 jobs in the queue, which is our configured upper limit (MaxJobCount=500000). Is there anything that can be done about this? Commands that query jobs, such as squeue and scancel, are not working, so I can't tell who submitted this many jobs.
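
In case it helps anyone hitting the same wall, here is a rough sketch of what I'm planning to try. It assumes slurmdbd accounting is enabled; sacct pulls from the accounting database rather than slurmctld's job RPC, so it shouldn't trip the same message-size limit. SOMEUSER is a placeholder, and the start time just mirrors the "Data since" line in the sdiag output above.

    # Confirm the configured job limit (run on the controller).
    scontrol show config | grep -i MaxJobCount

    # Tally pending jobs per user via the accounting database instead of
    # the controller's job RPC.
    sacct --allusers --noheader --starttime=2013-09-28T20:00:00 \
          --state=PENDING --format=User%30 | sort | uniq -c | sort -rn | head

    # If scancel's own job lookup hits the same limit, explicit job IDs
    # pulled from sacct can be fed to it instead.
    sacct --allusers --noheader --starttime=2013-09-28T20:00:00 \
          --state=PENDING --user=SOMEUSER --format=JobID | xargs scancel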

-Paul Edmon-
