Others and I have wanted to run jobs in parallel where the available memory is the limiting factor.
I currently have a prototype that starts jobs until less than M MB is free. If the free memory drops below M/2 MB, GNU Parallel kills off the youngest jobs and puts them back on the queue for later retrying. Git version 962ad80 can do:

  memjob() {
    # 0.1 * 100_000_000 = 2g max
    (/usr/bin/time -v \
       perl -e 'srand(shift);while(rand()>0.1) { push@a,"x"x(rand()*100_000_000); } `sleep 3`' "$@" 2>&1 |
       grep "Maximum resident";
     echo $@ done)
  }
  export -f memjob

  seq 100 | parallel -j500 --joblog jl --memfree 4g --delay .1 --retries 40 memjob

Here a good value for --memfree seems to be 2x the maximum memory use of the program. It clearly works for some jobs, but it also fails miserably on other jobs, so I would like to have something closer to real jobs to test on.

You can easily imagine having 100 small jobs running that are all set to grab 50% of the physical RAM at 23:59:59; then GNU Parallel will not be fast enough to kill off jobs and the machine will go into swap-death. And I do not see a way to guard against that.

So if you would like to see this feature working in the future, please help by producing some profile data from real programs that you would like to run. If you install 'forever' and 'timestamp' from tangetools (https://github.com/ole-tange/tangetools), you should be able to run this (replace my_program and my_helper_program with the programs you want to measure):

  forever ps -au`whoami` u | timestamp | egrep 'my_program|my_helper_program'

The data I need in particular is: How fast is memory being used? How much compared to before? Is it released again? The above command ought to be able to give me that.

This can give me an indication of how much free "buffer" GNU Parallel should leave, and how fast it must be able to react to change (by killing the youngsters).

/Ole
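
PS: In case it helps: here is a rough sketch of one way the collected log could be boiled down to the numbers I am after, i.e. total RSS per sample and the change since the previous sample. It assumes timestamp prepends a single field (so the RSS column of 'ps u' ends up as field 7), that all lines from one ps run get the same timestamp, and it uses memprofile.log as a placeholder filename; adjust any of that if your output looks different.

  # Save the profile to a file (stop with Ctrl-C when you have enough):
  forever ps -au`whoami` u | timestamp |
    egrep 'my_program|my_helper_program' > memprofile.log

  # Print one line per sample: total RSS (KB) of the matched processes
  # and the change since the previous sample.
  awk '
    $1 != t {          # a new timestamp = a new sample starts
      if (t != "") printf "%s total_rss_kb=%d delta_kb=%d\n", t, rss, rss - prev
      prev = rss; rss = 0; t = $1
    }
    { rss += $7 }      # sum RSS (field 7 when timestamp adds one field)
    END { if (t != "") printf "%s total_rss_kb=%d delta_kb=%d\n", t, rss, rss - prev }
  ' memprofile.log

That should show how much memory is in use at any time, how fast it grows, and whether it is released again.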