Looks like a network congestion issue to me. I don't know exactly how to do this, but I would try increasing the heartbeat timeout.
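For reference, when heartbeats are delayed the setting that matters is the NameNode's dead-node expiry window, not a single timeout. Below is a rough sketch that assumes the Hadoop 0.20 / CDH3-era rule: expiry = 2 * heartbeat.recheck.interval (milliseconds, default 5 minutes) + 10 * dfs.heartbeat.interval (seconds, default 3). Newer Hadoop releases renamed the first property to dfs.namenode.heartbeat.recheck-interval. The function name datanode_expiry_ms and the 10-minute value are hypothetical and only illustrate the arithmetic. They are not recommended settings.

```python
# Sketch: how long the NameNode waits before declaring a silent DataNode dead.
# Assumes the Hadoop 0.20 / CDH3 rule:
#   expiry_ms = 2 * heartbeat.recheck.interval + 10 * 1000 * dfs.heartbeat.interval

DEFAULT_RECHECK_MS = 5 * 60 * 1000  # heartbeat.recheck.interval default (ms)
DEFAULT_HEARTBEAT_S = 3             # dfs.heartbeat.interval default (s)


def datanode_expiry_ms(recheck_ms: int = DEFAULT_RECHECK_MS,
                       heartbeat_s: int = DEFAULT_HEARTBEAT_S) -> int:
    """Return the dead-node expiry window in milliseconds (hypothetical helper)."""
    return 2 * recheck_ms + 10 * 1000 * heartbeat_s


if __name__ == "__main__":
    default = datanode_expiry_ms()
    # Illustrative only: double the recheck interval to 10 minutes.
    relaxed = datanode_expiry_ms(recheck_ms=10 * 60 * 1000)
    print(f"default expiry: {default / 60000:.1f} min")  # prints: default expiry: 10.5 min
    print(f"relaxed expiry: {relaxed / 60000:.1f} min")  # prints: relaxed expiry: 20.5 min
```

A wider window makes the NameNode more tolerant of slow heartbeats during congestion. The cost is that it takes longer to notice a DataNode that has actually died.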
Tom, any ideas? Have you seen this before on AWS? I don't think anything is wrong with the AMI; I suspect something is wrong with the Hadoop configuration.

On Wednesday, October 5, 2011, John Conwell wrote:

> It starts with Hadoop reporting blocks of data being 'lost', then individual
> data nodes stop responding, the individual data nodes get taken offline,
> then jobs get killed, then the data nodes come back online and the data
> blocks get re-replicated up to the correct replication factor.
>
> The end result is that about 80% of the time my Hadoop jobs get killed
> because some task fails 3 times in a row, but about an hour after the job
> gets killed, all data nodes are back online and all data is fully
> replicated.
>
> Before I go rat-holing down "why are my data nodes going down", I want to
> cover the easy scenarios like "oh yeah... you're totally misconfigured. You
> should use the ABC AMI with the Cloudera install and config scripts".
> Basically, I want to validate whether there are any best practices for
> setting up a Cloudera distribution of Hadoop on EC2.
>
> I know Cloudera has created their own AMIs. Should I be using them? Does
> it matter?
>
> On Wed, Oct 5, 2011 at 9:43 AM, Andrei Savu <[email protected]> wrote:
>
>> What do you mean by failing? Is the Hadoop daemon shutting down or the
>> machine as a whole?
>>
>> On Wednesday, October 5, 2011, John Conwell wrote:
>>
>>> I'm having stability issues (data nodes constantly failing under very
>>> little load) on the Hadoop clusters I'm creating, and I'm trying to figure
>>> out the best practice for creating the most stable Hadoop environment on
>>> EC2.
>>>
>>> In order to run the CDH install and config scripts, I'm setting
>>> whirr.hadoop-install-function to install_cdh_hadoop, and
>>> whirr.hadoop-configure-function to configure_cdh_hadoop. But I'm using a
>>> plain-jane Ubuntu amd64 AMI (ami-da0cf8b3). Should I also be using the
>>> Cloudera AMIs as well as the Cloudera install and config scripts?
>>>
>>> Are there any best practices for how to set up a Cloudera distribution of
>>> Hadoop on EC2?
>>>
>>> --
>>> Thanks,
>>> John C
>>
>> --
>> -- Andrei Savu / andreisavu.ro
>
> --
> Thanks,
> John C

--
-- Andrei Savu / andreisavu.ro
