Hi everyone, I have a question relating to process memory usage.

Right now I'm using 'sched/backfill' with CR_CPU_MEMORY as the select type 
parameter. Apart from having to use the "defer" scheduler parameter because of 
the large job submissions, everything is working OK.
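
For context, the scheduling and selection setup boils down to something like 
this in slurm.conf (simplified; the cons_res line is what I understand has to 
go together with CR_CPU_MEMORY):

    # scheduler: backfill, with deferred scheduling for the big submission bursts
    SchedulerType=sched/backfill
    SchedulerParameters=defer
    # consumable resources: CPUs and memory
    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU_MEMORY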

There is one particular case, however, where memory requirements lead to 
sub-optimal cluster utilization, and I would like to hear if there's a better 
suggestion/approach/configuration.

If you remember my previous messages, I'm running batches (on the order of 
10-20k submissions) of bioinformatics programs. In this case I'm using 
"merlin". In the batch I'm currently running, the "normal" usage is about 
~1 GB per process, but in about 5% of cases usage spikes to 9 GB. 
Unfortunately, I cannot determine a priori which processes are going to take 
more memory.

I think you already see the problem. If I set a limit of ~1 GB I can maximize 
CPU usage, but 5% of those jobs (taking as much as 6 hours each) will be 
killed. If I set a 9 GB memory limit, I can load less than 30% of my current 
CPU capacity. At that point it is simply better to run everything with a 
~1 GB limit and re-run the killed instances. I cannot simply ignore memory 
allocation either, since it has already happened that all those jobs landed on 
a single 64-core machine that could not handle the load.
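
Concretely, that two-pass approach would look roughly like the sketch below 
("merlin_job.sh" is just a placeholder name for my wrapper script around 
merlin, and the second pass assumes the first batch has finished):

    # first pass: everything with the "normal" limit (--mem is in MB here),
    # remembering which job id belongs to which task
    for i in $(seq 1 20000); do
        jobid=$(sbatch --mem=1024 merlin_job.sh "$i" | awk '{print $4}')
        echo "$jobid $i" >> first_pass.map
    done

    # second pass: once everything has finished, resubmit whatever did not
    # complete with the big limit; the exact State reported for a memory kill
    # depends on the Slurm version, hence the broad "not COMPLETED" check
    while read jobid task; do
        state=$(sacct -n -X -P -j "$jobid" --format=State)
        [ "$state" = "COMPLETED" ] || sbatch --mem=9216 merlin_job.sh "$task"
    done < first_pass.map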

I'm wondering if the GANG scheduler can help me here. I can add loads of swap 
space if necessary, as long as the VM subsystem is not thrashing all the time. 
It would be very nice if the scheduler would simply put those processes to 
sleep when a threshold is hit and re-schedule the allocation based on the 
current memory usage.
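
From what I read in the documentation, turning gang scheduling on would mean 
something along these lines in slurm.conf, but I'm not sure I have this right:

    # what I think a gang scheduling setup looks like (untested):
    PreemptMode=SUSPEND,GANG
    # length of each time slice, in seconds
    SchedulerTimeSlice=30
    # over-subscription also has to be allowed on the partition,
    # e.g. Shared=FORCE:2 on the PartitionName line
    # (the parameter is called OverSubscribe in newer Slurm versions)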

Thanks for any pointer.
