On Thu, Jun 23, 2011 at 2:47 AM, Jon Wilson <[email protected]> wrote:

>  1) submit jobs that will take several hours to run, during which time I
> won't have anything else in particular to do
>  2) Go work on bringing cluster nodes back up
>  3) Change ~/.parallel/sshloginfile
>  4) GNU parallel notices that the file has changed, just like if I were
> using -j procfile, and immediately starts jobs on those additional nodes.
>
> I am using parallel 20110522.  Is this behavior already implemented?  If
> not, I would like to request this feature.

That is currently not implemented.

A workaround for you may be to put all the nodes, including the ones
that are down, in ~/.parallel/sshloginfile and use --retries to rerun
a job if it fails on a node (e.g. because the node is not up). You
should set --retries to number_of_nodes_down+1, so that if GNU
Parallel retries on another node that is also down, it keeps retrying
until it finds one that is up.

This abuses --retries: if a job actually _does_ fail, then that job
will be run number_of_nodes_down+1 times.
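
A minimal sketch of the workaround above, assuming two of the listed
nodes are currently down. The count, ./long_job and the input names
are placeholders, not anything from your setup. The option is spelled
--retries in the GNU Parallel manual, and ".." is the shorthand for
~/.parallel/sshloginfile:

```shell
#!/bin/sh
# Assumption: two of the nodes listed in ~/.parallel/sshloginfile are down.
nodes_down=2

# One more attempt than there are dead nodes, so a job that keeps
# landing on dead nodes should still reach a live one.
retries=$((nodes_down + 1))

# ".." tells GNU Parallel to read ~/.parallel/sshloginfile.
# ./long_job and input1..input3 are hypothetical placeholders.
cmd="parallel --sshloginfile .. --retries $retries ./long_job {} ::: input1 input2 input3"

# Print the command for inspection instead of running it.
echo "$cmd"
```

Once the printed command looks right, it can be run with eval "$cmd"
or simply typed directly with the number filled in.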

If you still want the feature, file a wishlist item at
https://savannah.gnu.org/bugs/?group=parallel

/Ole
