On Sat, Aug 16, 2014 at 8:10 PM, Douglas A. Augusto <daaugu...@gmail.com> wrote:
> If the ability to dynamically include/exclude servers is implemented (for
> instance by re-reading a file containing the list of servers) then the user
> could take care of maintaining a list of active servers by doing something
> like (just to get the idea):
>
> while true; do parallel -k 'if ssh {} /bin/true; then echo "{}"; fi' :::
> host1 host2 ... hostN > active_hosts.slf; sleep 10; done

So you are basically suggesting a daemon that keeps the slf updated.

Daemon:

  forever {
    nice parallel --nonall -j0 -k --slf original.slf --tag echo | remove final tab > tmp.slf
    if diff tmp.slf original.slf:
      mv tmp.slf tmp2.slf
    sleep 10
  }

Parallel:

  sub init {
    cp original.slf tmp2.slf
    start daemon
  }

  if tmp2.slf changed:
    @new     = grep { not $existing{$_} } @slf
    @back    = grep { $existing{$_} and $existing{$_}->jobslots == 0 } @slf
    @removed = grep { not in @slf } keys %existing
    for @new:     add_host
    for @back:    reset_jobslots
    for @removed: remove_host

  sub add_host       { do as normal }
  sub reset_jobslots { jobslots = original_jobslots }
  sub remove_host    { set jobslots = 0 }

  sub cleanup {
    kill daemon
    rm tmp.slf tmp2.slf
  }

It is starting to look more and more doable.

> Of course, the jobs that were sent to the unavailable servers before they
> were detected as down will still fail. But in this case I think it is okay
> to re-run GNU Parallel with --resume-failed.

Or the user should use --retries, which actively selects a server on which the
job has failed the least number of times.

/Ole
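
To make the daemon half concrete, here is a rough Perl sketch of that loop
(Perl simply because GNU Parallel itself is Perl). The file names are the ones
from the sketch above; --timeout 10 and the 2>/dev/null are just assumptions so
a dead host cannot stall the probe. One deliberate tweak: the fresh probe is
compared against the previously published tmp2.slf instead of original.slf, so
hosts that come back online get published again:

  #!/usr/bin/perl
  # Probe every host in original.slf and publish the reachable ones
  # in tmp2.slf whenever the set changes.
  use strict;
  use warnings;

  while(1) {
      # --tag prefixes each output line with the sshlogin, so running a
      # bare 'echo' and stripping the trailing tab leaves just the host.
      system(q{nice parallel --nonall -j0 -k --timeout 10 }.
             q{--slf original.slf --tag echo 2>/dev/null }.
             q{| perl -pe 's/\t$//' > tmp.slf});
      # Only replace the published list when it actually changed.
      if(system("diff -q tmp.slf tmp2.slf >/dev/null 2>&1") != 0) {
          rename "tmp.slf", "tmp2.slf" or warn "rename: $!\n";
      }
      sleep 10;
  }

GNU Parallel's init would start this in the background and cleanup would kill
it again, as in the sketch above.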
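
And a similarly rough sketch of the update step that runs whenever tmp2.slf
has changed. %existing is flattened here to a plain host => jobslots hash
(0 meaning "temporarily removed"), and %original_jobslots stands in for
whatever is recorded when the host is first added; the real bookkeeping inside
GNU Parallel will of course look different:

  #!/usr/bin/perl
  use strict;
  use warnings;

  # host => current jobslots; 0 means the host is temporarily removed.
  my %existing = ( 'host1' => 8, 'host2' => 0 );
  # Jobslots the host had when it was first added.
  my %original_jobslots = ( 'host1' => 8, 'host2' => 8, 'host3' => 4 );

  open(my $fh, '<', 'tmp2.slf') or die "tmp2.slf: $!";
  chomp(my @slf = <$fh>);
  close $fh;
  my %in_slf = map { $_ => 1 } @slf;

  my @new     = grep { not exists $existing{$_} } @slf;        # never seen
  my @back    = grep { exists $existing{$_}
                       and $existing{$_} == 0 } @slf;          # was down
  my @removed = grep { not $in_slf{$_} } keys %existing;       # gone

  # add_host / reset_jobslots: give the host its original jobslots.
  $existing{$_} = $original_jobslots{$_} // 1 for @new, @back;
  # remove_host: keep the host around but stop giving it jobs.
  $existing{$_} = 0 for @removed;

Setting jobslots to 0 instead of deleting the host keeps its identity (and
history), so reset_jobslots only has to restore the original number when the
host answers again.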