On Sat, Nov 23, 2013 at 1:19 AM, Ole Tange <[email protected]> wrote:
> On Wed, Nov 20, 2013 at 11:11 AM, Adam Lindberg <[email protected]> wrote:
>> Running the following command results in a crash
> [...]
>> $ parallel --filter-hosts --controlmaster -j 128 --nonall --tag --slf
>> servers 'ps aux | grep [o]psworks | wc -l'

I have worked a bit on this.

The problem seems to be that --filter-hosts starts 4 ssh connections to each server in parallel. If the connections are proxied through a single machine (e.g. using SSH's ProxyCommand or ControlMaster) then this single machine's ssh daemon may be overloaded and reject some ssh connections.

The problem only arises when there are a lot of machines (e.g. with 3 machines it will never happen), and only when you do not connect directly (i.e. no proxy = no problems).

My experiments show that putting in a delay (--delay 0.1) for every ssh command makes the problem much smaller. The same is true if the connections are retried (--retries 3).

The problem with using these is that they make --filter-hosts slower, and if you have many hosts and you connect directly to these hosts, then you will be paying a price without getting any benefit.

I have chosen safety over speed, so --filter-hosts ought to work better now, albeit slower: 0.4 seconds per host + 18 seconds if one or more hosts are down.

/Ole
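
For readers who want to see the idea behind the delay and retry settings in isolation, here is a minimal sketch in plain shell. It is not how GNU parallel implements --delay or --retries internally. The helper name retry_with_delay and the host name server1 are made up for illustration. It simply runs a command up to a given number of times and pauses between attempts, which is roughly the behaviour that eased the load on the proxying ssh daemon in the experiments above.

```shell
#!/bin/sh
# Hypothetical helper (not part of GNU parallel): run a command up to
# MAX_TRIES times, sleeping DELAY seconds before each retry.
# Usage: retry_with_delay MAX_TRIES DELAY command [args...]
retry_with_delay() {
    rwd_max=$1
    rwd_delay=$2
    shift 2
    rwd_try=1
    while :; do
        # Succeed as soon as one attempt succeeds.
        "$@" && return 0
        # Give up once the attempt budget is used.
        [ "$rwd_try" -ge "$rwd_max" ] && return 1
        rwd_try=$((rwd_try + 1))
        # Spread the attempts out so the proxy host is not hit in a burst.
        sleep "$rwd_delay"
    done
}

# Example call (server1 is a placeholder host name, so it is left commented out):
# retry_with_delay 3 0.1 ssh -o BatchMode=yes server1 true
```

The trade-off is the one described above: every failing host costs extra time, so this only pays off when connections are actually being rejected.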
