Hi all,
Today a single user submitted 7000 jobs, and squeue and scancel now return the
error message: Insane Message Length.
I have read in a previous thread on the slurm-devel list
<https://groups.google.com/forum/#!searchin/slurm-devel/Insane$20message$20length|sort:relevance/slurm-devel/7gyGUEg3zWg/4cxCPzRMMc8J>
that this happens because MAX_MSG_SIZE sets a maximum message size of 16 MB
(our slurm version is 2.2.7), which these 7000 jobs exceed. I was not
able to cancel even a single job with scancel.
With sacct I was able to retrieve the JobID of all the jobs in the queue.
My questions are:
If I stop the slurm control daemon and then start it with the startclean
option, will I lose all the jobs, or only the pending ones?
Is there a way of cancelling all the pending jobs without also cancelling the
running ones? I have 1000 jobs running at this moment and I would like to
preserve them.
Would it be possible to stop slurmctld and then manually delete the pending
jobs from /var/slurm-clustername?
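
To make the second question concrete, here is a rough Python sketch of the
filtering step I have in mind. It assumes the sacct output can be produced in a
pipe-separated "JobID|State" form (for example with --parsable2 and
--format=JobID,State, which I have not verified on 2.2.7). The function names
and the sample data are made up for illustration, and the script only prints
scancel command lines instead of running anything.

```python
def pending_job_ids(sacct_text):
    """Return the JobIDs whose state is PENDING.

    Expects one 'JobID|State' record per line (assumed sacct output format).
    Job steps such as '1001.0' are skipped so only whole jobs are listed.
    """
    ids = []
    for line in sacct_text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split("|")
        if len(fields) < 2:
            continue  # not a JobID|State record
        job_id, state = fields[0], fields[1]
        if "." in job_id:
            continue  # job step, not a job
        if state.startswith("PENDING"):
            ids.append(job_id)
    return ids


def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical sample data, not taken from our cluster.
SAMPLE = """\
1001|RUNNING
1001.0|RUNNING
1002|PENDING
1003|PENDING
1004|COMPLETED
1005|PENDING
"""

if __name__ == "__main__":
    # Print one scancel command line per small batch of pending jobs.
    for group in batches(pending_job_ids(SAMPLE), 2):
        print("scancel " + " ".join(group))
```

The running jobs never appear in the generated command lines, so they would be
left untouched. Whether scancel accepts such a list while the controller still
returns the message length error is exactly what I do not know.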

Thanks in advance.

Juan Pancorbo.
