On 10/26/2015 10:03 AM, Trey Dockendorf wrote:
> We ran into this swap issue when using SelectTypeParameters=CR_CPU_Memory.
> We are still on 14.03.10 and have a very ugly hack that adds a
> SchedulerParameter of "assume_swap" which basically forces SLURM to ignore
> memory allocations of swapped jobs. The patch was very rushed so likely we
> ended up just making SLURM behave like it's configured with CR_CPU instead
> of CR_CPU_Memory. When we upgrade to 15.08.x we will be using CR_CPU
> without our patch since we define MaxMemoryPerCPU on all partitions. So
> far in testing, CR_CPU and MaxMemoryPerCPU results in behavior where a 64GB
> node can have 64GB worth of suspended jobs and still run 64GB worth of
> active jobs.

That's exactly what I want: the ability to use swap to suspend jobs, leaving
100% of system RAM for running jobs.

> If a user requests 1 CPU and 64GB with MaxMemoryPerCPU=2000,
> they end up with 32 CPUs which we use for QOS resource limits and
> accounting.

Ah, I didn't realize that's the way that worked. That seems to fix my
problem: if Slurm doesn't track RAM or swap, the issue goes away.

> Attached are the patches. They likely only work on 14.03.x releases. I
> wouldn't recommend using the patches, but they may give an idea of how to
> implement a proper solution that is worthy of being submitted for inclusion
> in SLURM.

The only problem I can think of with using CR_CPU and MaxMemoryPerCPU is
that Slurm will consume CPUs when users need more RAM per CPU than is
available. Assume a node with 32 CPUs and 64GB of RAM. When a user asks for
1 CPU and 32GB of RAM, 16 CPUs will be consumed, but 15 of them are left
idle. So if another user submits a bunch of 1-CPU, 1GB jobs, they won't be
able to use the 15 idle CPUs tied up by the large-memory job.

Thanks for the patch and tip. I applied it to 14.03 successfully, but the
changes in 15.08 look large enough to make it difficult: 10 of 17 diffs
failed in the first patch, and some of the function calls have different
parameters.
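
To make the CPU arithmetic above concrete, here is a rough sketch in Python.
It only models the rounding described in this thread, not Slurm's actual
code. The function name and the 2048 MB cap (64GB spread over 32 CPUs, with
1GB taken as 1024 MB) are my own assumptions for illustration. Note that the
thread writes MaxMemoryPerCPU; the slurm.conf parameter is spelled
MaxMemPerCPU.

```python
import math

def cpus_charged(req_cpus, req_mem_mb, max_mem_per_cpu_mb):
    """Hypothetical helper: CPUs a job holds once its memory request is
    spread across CPUs so that no CPU exceeds the per-CPU memory cap."""
    needed_for_mem = math.ceil(req_mem_mb / max_mem_per_cpu_mb)
    return max(req_cpus, needed_for_mem)

NODE_CPUS = 32
MAX_MEM_PER_CPU_MB = 2048          # assumed: 64GB / 32 CPUs

# The large-memory job from the example: 1 CPU, 32GB.
charged = cpus_charged(1, 32 * 1024, MAX_MEM_PER_CPU_MB)
idle_but_charged = charged - 1     # CPUs held only to cover the memory
free_for_others = NODE_CPUS - charged

print(charged, idle_but_charged, free_for_others)
```

With a cap of exactly 2000 MB instead of 2048 MB, the same 32GB request
would round up to one more CPU, so the exact count depends on how the cap
and the node's memory are expressed.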
