We've been experiencing this as well. Our simple solution is to keep retrying
the ssh connection until it succeeds, instead of waiting for a fixed amount of
time. Something like this:
import subprocess
import time

def wait_for_ssh_connection(opts, host):
    u.message("Waiting for ssh connection to host {}".format(host))
    connected = False
    while not connected:
        try:
            # check_call raises CalledProcessError on a non-zero exit status,
            # so reaching the next line means the remote "ls" succeeded.
            subprocess.check_call(s.ssh_command(opts) +
                                  ['-t', '-t', '%s@%s' % (opts.user, host), "ls"])
            connected = True
        except subprocess.CalledProcessError:
            print("Ssh connection to host {} failed, retrying in 10 seconds...".format(host))
            time.sleep(10)
    print("Ssh connection to host {} successfully established!".format(host))
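
The loop above retries forever if the host never comes up. A bounded variant
could look roughly like the sketch below. It is not part of spark-ec2: the
function name wait_for_port and all of its parameters are hypothetical, and it
assumes Python 3. It only checks that TCP port 22 accepts connections, which
does not guarantee that sshd is ready to authenticate, so it is best seen as a
cheap pre-check before the real ssh call. The defaults (18 attempts, 10 seconds
apart) add up to about 3 minutes of waiting.

```python
import socket
import time


def wait_for_port(host, port=22, attempts=18, delay=10.0, timeout=5.0,
                  sleep=time.sleep):
    """Return True once a TCP connection to host:port succeeds.

    Returns False after `attempts` failed tries. All names and defaults
    here are illustrative assumptions, not spark-ec2 settings.
    """
    for attempt in range(1, attempts + 1):
        try:
            # create_connection raises OSError on refusal, timeout,
            # or a hostname that does not resolve.
            sock = socket.create_connection((host, port), timeout=timeout)
            sock.close()
            return True
        except OSError:
            # Sleep between tries, but not after the final failure.
            if attempt < attempts:
                sleep(delay)
    return False
```

A caller would typically do something like `if not wait_for_port(host): raise
RuntimeError(...)` and only then run the ssh command, so a host that never
comes up produces a clear error instead of an endless loop.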
HTH
Pierre Borckmans
RealImpact Analytics | Brussels Office
www.realimpactanalytics.com | [email protected]
FR +32 485 91 87 31 | Skype pierre.borckmans
On 19 Apr 2014, at 06:51, Patrick Wendell <[email protected]> wrote:
> Unfortunately - I think a lot of this is due to generally increased latency
> on ec2 itself. I've noticed that it's way more common than it used to be for
> instances to come online past the "wait" timeout in the ec2 script.
>
>
> On Fri, Apr 18, 2014 at 9:11 PM, FRANK AUSTIN NOTHAFT <[email protected]>
> wrote:
> Aureliano,
>
> I've been noticing this error recently as well:
>
> ssh: connect to host ec-xx-xx-xx-xx.compute-1.amazonaws.com port 22:
> Connection refused
> Error 255 while executing remote command, retrying after 30 seconds
>
> However, this isn't an issue with the spark-ec2 scripts. After the scripts
> fail, if you wait a bit longer (e.g., another 2 minutes), the EC2 hosts will
> finish launching and port 22 will open up. Until the EC2 host has launched
> and opened port 22 for SSH, SSH cannot succeed, and the Spark-ec2 scripts
> will fail. I've noticed that EC2 machine launch latency seems to be highest
> in Oregon; I haven't run into this problem on either the California or
> Virginia EC2 farms. To work around this issue, I've manually modified my copy
> of the EC2 scripts to wait for 6 failures (i.e., 3 minutes), which seems to
> work OK. Might be worth a try on your end. I can't comment about the password
> request; I haven't seen that on my end.
>
> Regards,
>
> Frank Austin Nothaft
> [email protected]
> [email protected]
> 202-340-0466
>
>
> On Fri, Apr 18, 2014 at 8:57 PM, Aureliano Buendia <[email protected]>
> wrote:
> Hi,
>
> Since 0.9.0 spark-ec2 has gone unstable. During launch it throws many errors
> like:
>
> ssh: connect to host ec-xx-xx-xx-xx.compute-1.amazonaws.com port 22:
> Connection refused
> Error 255 while executing remote command, retrying after 30 seconds
>
> ... and recently, it prompts for passwords:
>
> Warning: Permanently added '' (RSA) to the list of known hosts.
> Password:
>
> Note that the hostname in Permanently added '' is missing in the log, which
> is probably why it asks for a password.
>
> Is this a known bug?
>
>