> I just swapped a machine to death by starting 1 job per CPU on a
> 48 core machine. The problem was that each job took more than
> 1/48th of the memory.
Definitely been there.

> Would it make sense to have a setting in GNU Parallel that
> automatically runs 'ulimit' with the relevant amount of memory, so
> if you ask for X jobs to be run on a given server, then each job is
> only allowed 1/X'th of the memory on the machine.

I like it. Maybe '--whatever 100%' is up to the limit of the machine
and '--whatever 100M' allows setting a specific size to be divided
amongst the jobs.

I wondered if one could set a memory limit on a group, so that on
average the processes could use no more than 1/X'th of the memory but
any individual process might use more than 1/X'th.

Reading the ulimit description in the bash(1) man page, I noticed this
ulimit option:

    -v  The maximum amount of virtual memory available to the shell
        and, on some systems, to its children

It seems the goal you have is to enforce that none of the processes
swap. Do you think a 'ulimit -v'-like behavior could be collectively
attached to the group of processes that GNU Parallel spawns? If so,
could the virtual limit be set to only the physical memory? (A rough
per-job sketch is in the P.S. below.)

> I am pretty sure it does make sense to have that. But would it also
> make sense to have this as default (with an option to override it)?
> Or will that be an unpleasant surprise?

I would not make it a default for parallel (but I would personally
immediately add any such option to my ~/.parallel/config).

> For most situations it will not make a difference, so I am
> interested in whether you anticipate more surprise by having your
> big job killed or the happiness of not swapping the machine to
> death?

The principle of least surprise says running too many memory-hungry
processes should swap the machine into the ground, IMHO.

- Rhys
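
P.S. A rough sketch of the per-job version done by hand today, with no
new parallel option. It assumes bash's 'ulimit -v' counts in kilobytes
(as it does on Linux), and 'myjob' is just a stand-in for whatever you
actually run:

    # give each of N concurrent jobs roughly 1/N'th of physical memory
    jobs=48
    total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
    per_job_kb=$(( total_kb / jobs ))

    seq 1000 | parallel -j "$jobs" \
        "bash -c 'ulimit -v $per_job_kb; exec myjob {}'"

Note this caps virtual memory per process, not the sum over the whole
group, so it is stricter than the "on average 1/X'th" idea above.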
