On Wed, May 7, 2014 at 10:33 AM, Sebastian Eiser
<sebastian.ei...@gmail.com> wrote:

> Just a thought, which may be a simple solution, but suiting most people.
>
> The kernel is pretty good at killing misbehaving jobs. @Ole: can you capture
> SIGKILL from a job? Can you record memory usage shortly after SIGKILL?

If you look at --joblog: it records the exit value and the signal
that the command died from.
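
A minimal sketch of pulling the killed jobs back out of such a log. It
assumes the usual tab-separated joblog layout (Seq, Host, Starttime,
JobRuntime, Send, Receive, Exitval, Signal, Command), so Signal is column 8
and Command is column 9. The function name killed_jobs is made up for this
example.

```shell
# List the commands in a joblog that died from a given signal.
# Assumption: tab-separated joblog with a header line, Signal in
# column 8 and Command in column 9.
# Usage: killed_jobs JOBLOG [SIGNAL]   (SIGNAL defaults to 9, SIGKILL)
killed_jobs() {
  awk -F '\t' -v sig="${2:-9}" 'NR > 1 && $8 == sig { print $9 }' "$1"
}
```

For example, `killed_jobs my_joblog` would print the commands that died
from SIGKILL, which is what the OOM-killer sends.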

If we are talking about swapping, the kernel will rarely kill a job. It
only does that when we are running out of memory (both virtual and
physical), and that situation is much simpler to deal with:

cat jobs | parallel -j100% --joblog my_joblog
cat jobs | parallel -j50% --resume-failed --joblog my_joblog
cat jobs | parallel -j25% --resume-failed --joblog my_joblog
cat jobs | parallel -j12% --resume-failed --joblog my_joblog
cat jobs | parallel -j6% --resume-failed --joblog my_joblog
cat jobs | parallel -j3% --resume-failed --joblog my_joblog
cat jobs | parallel -j1% --resume-failed --joblog my_joblog
cat jobs | parallel -j1 --resume-failed --joblog my_joblog

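Each run retries only the jobs that failed, with roughly half as many
jobslots as the run before, ending at one job at a time. This should scale
up to 256-core machines. So the above should work if you have disabled swap
and enabled the OOM-killer.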
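
The same ladder can be sketched as a loop. This is only an illustration:
the function names job_slot_steps and run_with_backoff are made up for the
example, and it assumes GNU Parallel is installed as `parallel`.

```shell
# Print the -j values used above: 100%, 50%, 25%, 12%, 6%, 3%, 1%,
# and finally a plain 1 (one job at a time).
job_slot_steps() {
  pct=100
  while [ "$pct" -ge 1 ]; do
    echo "${pct}%"
    pct=$((pct / 2))   # integer division: 25 -> 12 -> 6 -> 3 -> 1 -> 0
  done
  echo 1
}

# Run the jobs in file $1, logging to joblog $2, and retry the failed
# ones with fewer and fewer jobslots.
# Assumption: `parallel` is GNU Parallel with --joblog and --resume-failed.
run_with_backoff() {
  jobs_file=$1
  joblog=$2
  resume=""
  for j in $(job_slot_steps); do
    # $resume is empty on the first run, --resume-failed afterwards.
    parallel -j"$j" $resume --joblog "$joblog" < "$jobs_file"
    resume="--resume-failed"
  done
}

# Usage (hypothetical file names):
#   run_with_backoff jobs my_joblog
```

The loop does not stop early. Once no failed jobs are left in the joblog,
the remaining runs should have nothing to do.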

> Some people disable swap deliberately, so using swap as metric might not be
> general enough.

The above deals with that situation.

/Ole
