On Sat, Aug 16, 2014 at 8:10 PM, Douglas A. Augusto <daaugu...@gmail.com> wrote:
> If the ability to dynamically include/exclude servers is implemented (for
> instance by re-reading a file containing the list of servers) then the user
> could take care of maintaining a list of active servers by doing something
> like (just to get the idea):
>
> while true; do parallel -k 'if ssh {} /bin/true; then echo "{}"; fi' :::
> host1 host2 ... hostN > active_hosts.slf; sleep 10; done

So you are basically suggesting a daemon that keeps the slf updated.

Daemon:

  forever {
    nice parallel --nonall -j0 -k --slf original.slf --tag echo | remove final tab > tmp.slf
    if diff tmp.slf original.slf:
      mv tmp.slf tmp2.slf
    sleep 10
  }

Parallel:

  sub init {
    cp original.slf tmp2.slf
    start daemon
  }

  if tmp2.slf changed:
    @new     = grep { not $existing{$_} } @slf
    @back    = grep { $existing{$_} and $existing{$_}->jobslots == 0 } @slf
    @removed = grep { not in @slf } keys %existing
    for @new:     add_host
    for @back:    reset_jobslots
    for @removed: remove_host

  sub add_host       { do as normal }
  sub reset_jobslots { jobslots = original_jobslots }
  sub remove_host    { set jobslots = 0 }

  sub cleanup {
    kill daemon
    rm tmp.slf tmp2.slf
  }

It is starting to look more and more doable.

> Of course, the jobs that were sent to the unavailable servers before they
> were detected as down will still fail. But in this case I think it is okay
> to re-run GNU Parallel with --resume-failed.

Or the user should use --retries, which actively selects a server on which the
job has failed the least number of times.

/Ole
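
To make the daemon half concrete, here is a rough Perl sketch of that loop
(Perl simply because GNU Parallel itself is Perl). The file names are the ones
from the sketch above; --timeout 10 and the 2>/dev/null are just assumptions so
a dead host cannot stall the probe. One deliberate tweak: the fresh probe is
compared against the previously published tmp2.slf instead of original.slf, so
hosts that come back online get published again:

  #!/usr/bin/perl
  # Probe every host in original.slf and publish the reachable ones
  # in tmp2.slf whenever the set changes.
  use strict;
  use warnings;

  while(1) {
      # --tag prefixes each output line with the sshlogin, so running a
      # bare 'echo' and stripping the trailing tab leaves just the host.
      system(q{nice parallel --nonall -j0 -k --timeout 10 }.
             q{--slf original.slf --tag echo 2>/dev/null }.
             q{| perl -pe 's/\t$//' > tmp.slf});
      # Only replace the published list when it actually changed.
      if(system("diff -q tmp.slf tmp2.slf >/dev/null 2>&1") != 0) {
          rename "tmp.slf", "tmp2.slf" or warn "rename: $!\n";
      }
      sleep 10;
  }

GNU Parallel's init would start this in the background and cleanup would kill
it again, as in the sketch above.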
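
And a similarly rough sketch of the update step that runs whenever tmp2.slf
has changed. %existing is flattened here to a plain host => jobslots hash
(0 meaning "temporarily removed"), and %original_jobslots stands in for
whatever is recorded when the host is first added; the real bookkeeping inside
GNU Parallel will of course look different:

  #!/usr/bin/perl
  use strict;
  use warnings;

  # host => current jobslots; 0 means the host is temporarily removed.
  my %existing = ( 'host1' => 8, 'host2' => 0 );
  # Jobslots the host had when it was first added.
  my %original_jobslots = ( 'host1' => 8, 'host2' => 8, 'host3' => 4 );

  open(my $fh, '<', 'tmp2.slf') or die "tmp2.slf: $!";
  chomp(my @slf = <$fh>);
  close $fh;
  my %in_slf = map { $_ => 1 } @slf;

  my @new     = grep { not exists $existing{$_} } @slf;        # never seen
  my @back    = grep { exists $existing{$_}
                       and $existing{$_} == 0 } @slf;          # was down
  my @removed = grep { not $in_slf{$_} } keys %existing;       # gone

  # add_host / reset_jobslots: give the host its original jobslots.
  $existing{$_} = $original_jobslots{$_} // 1 for @new, @back;
  # remove_host: keep the host around but stop giving it jobs.
  $existing{$_} = 0 for @removed;

Setting jobslots to 0 instead of deleting the host keeps its identity (and
history), so reset_jobslots only has to restore the original number when the
host answers again.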