I have created the following issue for this: https://issues.apache.org/jira/browse/WHIRR-501
On Fri, Feb 3, 2012 at 8:24 PM, Andrei Savu <[email protected]> wrote:
> Good catch Karel! I have tried to investigate this in the past but I have
> never considered that it may be a race condition with a cron job (most of
> the synchronisation tests we've added are designed to prove that this is
> not a condition triggered by Whirr).
>
> What if we stop the crond service while running the install/configure
> scripts?
> http://www.cyberciti.biz/faq/howto-linux-unix-start-restart-cron/
>
>> In my opinion, as much of the installation/configuration steps as possible should
>> be done using a config management tool (puppet/chef).
>
> Totally agree + we have the needed infrastructure for this.
>
>> Once the configuration is published to each node you can trigger
>> puppet/chef as much as you like, and eventually you should reach a
>> good state. Running the complete whirr-generated script(s) multiple
>> times is going to be slower and much more error prone.
>
> + it's hard to make retry-friendly bash scripts.
>
>> Regards,
>> Karel
>>
>> On Mon, Oct 3, 2011 at 10:22 PM, Paul Baclace <[email protected]> wrote:
>> > Two runs of whirr on EC2 yesterday randomly failed to install Hadoop
>> > components. First it occurred on the master node, but when it occurred in
>> > one slave and not another, I could find the diff of the /tmp/logs/ from
>> > jclouds. In a third run, everything worked fine. Same scripts driving
>> > whirr, same AMI, same number of nodes, same region, etc. Snippets of
>> > /tmp/logs/stderr.log shown below indicate that apt-get update had "Could not
>> > get lock /var/lib/dpkg/lock" on one slave, but not another.
>> >
>> > This is a serious reliability issue. What is non-deterministic here?
>> >
>> > Paul
>> >
>> > ------------ slave 1 -------------------
>> > + register_cloudera_repo
>> > + which dpkg
>> > + cat
>> > + curl -s http://archive.cloudera.com/debian/archive.key
>> > + sudo apt-key add -
>> > + sudo apt-get update
>> > E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
>> > E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
>> > + which dpkg
>> > + apt-get update
>> > E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
>> > E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
>> > + apt-get -y install hadoop-0.20
>> >
>> > -------------- slave 2 ---------------
>> > + register_cloudera_repo
>> > + which dpkg
>> > + cat
>> > + curl -s http://archive.cloudera.com/debian/archive.key
>> > + sudo apt-key add -
>> > + sudo apt-get update
>> > + which dpkg
>> > + apt-get update
>> > + apt-get -y install hadoop-0.20
>> > dpkg-preconfigure: unable to re-open stdin:
>> > + cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.dist
>> > + update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.dist 90
>> > + install_cdh_hbase -c aws-ec2 -u http://apache.cs.utah.edu/hbase/hbase-0.90.3/hbase-0.90.3.tar.gz
>> >
>> > -------------
>>
>> --
>> Karel Vervaeke
>> http://outerthought.org/
>> Open Source Content Applications
>> Makers of Kauri, Daisy CMS and Lily
>
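The "retry-friendly bash" approach mentioned above could be sketched as a small wrapper that keeps retrying while another process (e.g. a cron-triggered apt run) holds the dpkg lock. This is only a sketch, not part of Whirr; the `retry` helper name and the attempt count / delay values are assumptions:

```shell
#!/bin/bash
# Hypothetical retry helper (not part of Whirr): run a command until it
# succeeds, sleeping between attempts, to survive transient failures such
# as the "Could not get lock /var/lib/dpkg/lock" errors shown above.
retry() {
  local max=$1 delay=$2
  shift 2
  local n=1
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "retry: '$*' failed after $max attempts" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example usage in an install script (30 attempts, 10s apart):
#   retry 30 10 sudo apt-get update
#   retry 30 10 sudo apt-get -y install hadoop-0.20
```

Compared to stopping crond outright, this keeps the installer correct even when some other process briefly takes the lock.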
