It's happening again.  The error message is:

ssh: connect to host X.X.X.X port 22: Connection refused


I ran nmap against the instance, and it's not accepting connections on
any TCP port, so SSH isn't the only problem -- the datanode and
tasktracker clearly aren't running either.  I can ping it, so it's at least
partially booted.
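
For anyone who wants to repeat this check without nmap, here is a minimal
Python sketch that tries a plain TCP connect on a few ports.  Port 22 comes
from the error above; 50010 (datanode) and 50060 (tasktracker HTTP) are
assumed to be the stock Hadoop defaults and may not match a given config.
The names port_open and check_host are made up for illustration.

```python
import socket

# Port 22 is from the SSH error message. 50010 (datanode) and 50060
# (tasktracker HTTP) are ASSUMED stock Hadoop defaults; adjust as needed.
PORTS = {"ssh": 22, "datanode": 50010, "tasktracker": 50060}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_host(host, ports=PORTS):
    """Map each service name to whether its port accepted a connection."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling check_host with the instance address returns a dict of service name
to True/False.  All False on a host that still answers ping would match
what I'm seeing here.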

I'm about ready to give up on this Ubuntu 10.04 image.  There's this reboot
flakiness, and I'm noticing that the mapred/hdfs nofile limit "upgrade" via
/etc/security/limits.d/hadoop.nofiles.conf appears to work only some of the
time (e.g. an hour ago, one out of five of my cluster nodes had the
appropriate nofile limit reported by ulimit -a; now none of the five have
the "upgraded" value).
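
Since ulimit -a only reports the limit of the shell it is run from, which
may differ from what an already-running daemon got, it can be worth reading
the limit of the process itself.  A minimal sketch, assuming a Linux kernel
that exposes /proc/<pid>/limits; the function names are hypothetical, and
the pid of the datanode or tasktracker has to be looked up separately.

```python
import resource


def parse_nofile(limits_text):
    """Extract (soft, hard) 'Max open files' values from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: Max, open, files, <soft>, <hard>, <units>
            fields = line.split()
            return fields[3], fields[4]
    return None


def nofile_for_pid(pid):
    """Read the nofile limits of a running process (Linux only)."""
    with open("/proc/%d/limits" % pid) as f:
        return parse_nofile(f.read())


def nofile_for_self():
    """Limits of the current Python process, as reported by getrlimit."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Comparing nofile_for_pid on a Hadoop daemon with the value in
hadoop.nofiles.conf shows whether the raised limit actually reached that
process, independent of what an interactive shell reports.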

What 64-bit AMI are you guys using for high-volume data processing with
Hive and MapReduce?  My workload is on the order of 150 GB compressed, ~8K
files, and I chew through it in around 1.5-2 hours on most days given a
15+1 node cluster (m1.xlarges in EC2).
