bugid: loadavg_invalid_content

Thomas Danckaert Tue, 13 Jun 2017 06:39:27 -0700

Dear parallel developers,

on our hpc cluster, when jobs on different nodes use parallel simultaneously, they abort with the following error message:


parallel: This should not happen. You have found a bug.
Please contact <[email protected]> and include:
* The version number: 20170522

* The bugid: loadavg_invalid_content:/home/thomasd/.parallel/tmp/sshlogin/:/loadavg


This is the command being run

parallel --tmpdir /dev/shm/pbs.2448832.hpc-pbs --no-notice --load 100% --delay 1 /home/thomasd/create_lut.sh -m 0 -l 4 -p 0 -s 2 -v {1} -r {2} /home/thomasd/grid_hpc.cfg ::: {0..12} ::: {0..8}

When only one parallel process is running at a time, it works fine. I think the parallel jobs on different nodes, which share the same home directory, are accessing the same “loadavg” file.

As a workaround I can pass each parallel job a different $PARALLEL_HOME environment variable, and this seems to avoid the problem. When I look in those directories, they each contain a directory named after the hostname of the node used for the job (i.e. tmp/sshlogin/hpc-nXYZ), and a “:” directory (tmp/sshlogin/:).
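
For reference, the workaround looks roughly like the sketch below. It is only an illustration: the directory location and the use of $PBS_JOBID (with a fallback to the shell's PID) are assumptions chosen for this example, not something parallel requires.

```shell
#!/bin/sh
# Sketch of the workaround: give every job its own $PARALLEL_HOME so that
# jobs on different nodes do not share one tmp/sshlogin/:/loadavg file.

# Assumption: the batch system sets PBS_JOBID; otherwise use this shell's PID.
job_tag="${PBS_JOBID:-$$}"

# Assumption: any per-job, writable location works; this path is hypothetical.
PARALLEL_HOME="${TMPDIR:-/tmp}/parallel-home.$(uname -n).${job_tag}"
mkdir -p "$PARALLEL_HOME"
export PARALLEL_HOME

# parallel is then started from this same shell, with the same arguments
# as in the command shown above.
echo "PARALLEL_HOME=$PARALLEL_HOME"
```

Including both the node name and the job id in the path keeps the directories distinct even if two jobs land on the same node.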


Sincerely,

Thomas Danckaert
