Hi,

We have noticed that when some of our users run large queries with sacct, the slurmdbd daemon uses too much RAM and the OOM killer kills the slurmdbd process.
We have around 11 million jobs in our database. If we run the following query, slurmdbd's memory usage goes above ~15 GB and the daemon is killed by the OOM killer (the VM hosting it has 16 GB of RAM):

    sacct --format=jobid,user,group,account,cluster,cputime,cputimeraw,elapsed,ncpus,state,start,end -X --allusers -S 2017-01-01T00:00 -E 2017-12-30T00:00 -s COMPLETED,FAILED,CANCELLED,TIMEOUT

If we run a query that returns ~3 million jobs, slurmdbd's memory usage stays around ~4 GB.

After some debugging, we have noticed that MySQL can handle the query without issues, so there is no fine-tuning that we can do on the MySQL server. It is slurmdbd's memory usage that grows very quickly, and then the OOM killer does its job.

The problem we see is that any user in the cluster doing some testing with sacct can crash the slurmdbd daemon.

Does anyone know of a workaround for this issue?

Thanks in advance for any help or suggestions.

Regards,
Pablo.

P.S. I know I can increase the memory in the VM as a short-term solution, but I guess this won't scale in the long term.