It's happening again. The error message is: ssh: connect to host X.X.X.X port 22: Connection refused
I ran nmap against the instance, and it isn't accepting connections on any TCP port, so it's not just SSH that's the problem; the datanode and tasktracker clearly aren't running either. I can ping it, so it has at least partially booted.

I'm about ready to give up on this Ubuntu 10.04 image. Besides the reboot flakiness, the mapred/hdfs nofile limit "upgrade" via /etc/security/limits.d/hadoop.nofiles.conf seems to take effect only some of the time. For example, an hour ago one out of five of my cluster nodes reported the upgraded nofile limit in ulimit -a; now none of the five do.

Which 64-bit AMI are you guys using for high-volume data processing with Hive and MapReduce? My workload is on the order of 150 GB compressed (~8K files), and I chew through it in around 1.5-2 hours on most days with a 15+1 node cluster (m1.xlarges in EC2).
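On the nofile question: ulimit -a only reports the limits of the shell it runs in, not those of an already-running daemon. A minimal sketch for checking what a process actually got is to read the "Max open files" line from Linux's /proc/<pid>/limits. It assumes a Linux /proc filesystem and that you find the daemon's PID separately (for example with jps or pgrep -f DataNode); the function name open_file_limits is hypothetical, and you may need to run it as root or as the daemon's user if reading the file is denied.

```python
import os
import sys


def open_file_limits(pid):
    """Return (soft, hard) 'Max open files' limits for a Linux process.

    Reads /proc/<pid>/limits; None stands for 'unlimited'.
    """
    with open("/proc/%d/limits" % pid) as f:
        for line in f:
            if line.startswith("Max open files"):
                # Remaining fields are: soft limit, hard limit, units ("files").
                soft, hard = line[len("Max open files"):].split()[:2]
                return tuple(None if v == "unlimited" else int(v)
                             for v in (soft, hard))
    raise ValueError("no 'Max open files' line for pid %d" % pid)


if __name__ == "__main__":
    # Use the PID given on the command line, or this process by default.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    soft, hard = open_file_limits(pid)
    print("pid %d: nofile soft=%s hard=%s" % (pid, soft, hard))
```

Running it against the datanode and tasktracker PIDs on each node would show whether the daemons themselves picked up the value from hadoop.nofiles.conf, independent of what a login shell reports.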
