I will have to try a few of those tweaks to the configuration we have. They may help a lot.

We are running on bare metal hardware, though we are logging quite a bit, so that likely doesn't help.

I would say high throughput for us would be around 100 jobs completing simultaneously, with the scheduler then trying to schedule those cores again only to have them become available almost immediately. Essentially the master gets so busy that it won't respond to any outside probing. The only way to get any information is to watch the log roll by, as sdiag is also unresponsive.

Again, we will have to try some of that machine tuning. It should be helpful.

-Paul Edmon-

On 1/26/2014 7:21 PM, Moe Jette wrote:

A great deal depends upon your hardware and configuration. Slurm should be able to handle a few hundred jobs per second when tuned for high throughput as described here:
http://slurm.schedmd.com/high_throughput.html
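
To give a rough idea, the kind of slurm.conf settings that page walks through look something like the sketch below. The values are illustrative only, not recommendations for your site; see the page above and the slurm.conf man page for the details and defaults.

  # Illustrative slurm.conf fragment; example values only
  # Keep controller and slurmd logging to a minimum
  SlurmctldDebug=error
  SlurmdDebug=error
  # Give RPCs more time to complete when the controller is busy
  MessageTimeout=30
  # Defer per-submission scheduling attempts and batch the decisions instead
  SchedulerParameters=defer,batch_sched_delay=3
  # How long completed job records are retained in slurmctld memory (seconds)
  MinJobAge=300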

If it is not tuned for high throughput (say, with lots of logging, running on a virtual machine, etc.), then the slurmctld daemon will definitely bog down. What sort of throughput were you seeing? Did the jobs just exit right away?

Moe Jette
SchedMD

Quoting Paul Edmon <[email protected]>:


So I've found that if someone submits a ton of jobs that have a very short runtime, Slurm tends to thrash, as jobs are launching and exiting pretty much constantly. Is there an easy way to enforce a minimum runtime?

-Paul Edmon-

