This bites me regularly as well. I suspect this is caused by cron jobs: there are several cron jobs invoking dpkg/apt/aptitude, all of which take the dpkg lock.
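The lock is only held while such a cron job runs, so the failure is transient. Until the cron jobs are dealt with, one stopgap is to retry an individual idempotent step such as apt-get update a few times, instead of rerunning the whole script. This is a minimal sketch. `retry` is a hypothetical helper name, not something Whirr or apt provides, and the attempt count and delay in the usage lines are arbitrary.

```shell
#!/bin/sh
# retry MAX_TRIES DELAY COMMAND [ARGS...]
# Hypothetical helper (not part of Whirr or apt): runs COMMAND up to
# MAX_TRIES times, sleeping DELAY seconds between failed attempts.
# Returns 0 on the first success, otherwise the exit status of the last attempt.
retry() {
  local max_tries="$1" delay="$2" attempt=1 rc
  shift 2
  while :; do
    # "cmd && return 0" keeps a failing attempt from aborting a "set -e" script.
    "$@" && return 0
    rc=$?
    if [ "$attempt" -ge "$max_tries" ]; then
      echo "retry: '$*' failed after $attempt attempts (exit $rc)" >&2
      return "$rc"
    fi
    echo "retry: '$*' exited with $rc, retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage (numbers are arbitrary): give a cron-driven apt/dpkg run time to
# release /var/lib/dpkg/lock before giving up.
#   retry 10 30 sudo apt-get update
#   retry 10 30 sudo apt-get -y install hadoop-0.20
```

This only papers over the race. It does not stop the cron jobs from taking the lock.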
I'm seeing this on BYON nodes, so a quick hack is to disable these cron jobs. Simply removing the following files should do the trick:

/etc/cron.daily/standard
/etc/cron.daily/dpkg
/etc/cron.daily/man-db
/etc/cron.daily/apt
/etc/cron.daily/aptitude
/etc/cron.weekly/man-db

For EC2 you would have to create your own images without those cron jobs (blech).

This kind of problem (and others, such as failed downloads and other randomness) can never be completely avoided. In my opinion, as many of the installation/configuration steps as possible should be done with a configuration management tool (Puppet/Chef). Once the configuration is published to each node, you can trigger Puppet/Chef as often as you like, and eventually you should reach a good state. Running the complete Whirr-generated script(s) multiple times is going to be slower and much more error-prone.

Regards,
Karel

On Mon, Oct 3, 2011 at 10:22 PM, Paul Baclace <[email protected]> wrote:
> Two runs of whirr on EC2 yesterday randomly failed to install Hadoop
> components. First it occurred on the master node, but when it occurred in
> one slave and not another, I could find the diff of the /tmp/logs/ from
> jclouds. In a third run, everything worked fine. Same scripts driving
> whirr, same AMI, same number of nodes, same region, etc. Snippets of
> /tmp/logs/stderr.log shown below indicate that apt-get update had "Could not
> get lock /var/lib/dpkg/lock" on one slave, but not another.
>
> This is a serious reliability issue. What is non-deterministic here?
>
> Paul
>
> ------------ slave 1 -------------------
> + register_cloudera_repo
> + which dpkg
> + cat
> + curl -s http://archive.cloudera.com/debian/archive.key
> + sudo apt-key add -
> + sudo apt-get update
> E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
> E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
> + which dpkg
> + apt-get update
> E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
> E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
> + apt-get -y install hadoop-0.20
>
> -------------- slave 2 ---------------
> + register_cloudera_repo
> + which dpkg
> + cat
> + curl -s http://archive.cloudera.com/debian/archive.key
> + sudo apt-key add -
> + sudo apt-get update
> + which dpkg
> + apt-get update
> + apt-get -y install hadoop-0.20
> dpkg-preconfigure: unable to re-open stdin:
> + cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.dist
> + update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.dist 90
> + install_cdh_hbase -c aws-ec2 -u http://apache.cs.utah.edu/hbase/hbase-0.90.3/hbase-0.90.3.tar.gz
>
> -------------

--
Karel Vervaeke
http://outerthought.org/
Open Source Content Applications
Makers of Kauri, Daisy CMS and Lily
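For BYON nodes, the quick hack from the reply above (disabling the dpkg/apt cron jobs) could be scripted roughly as follows. This is a sketch under a few assumptions. The function name `disable_apt_cron_jobs` and the backup location `/var/backups/disabled-cron` are hypothetical choices. The files are moved aside rather than deleted so they can be restored later. The optional ROOT argument exists only so the script can be tried on a scratch directory first.

```shell
#!/bin/sh
# disable_apt_cron_jobs [ROOT]
# Hypothetical helper: moves the cron jobs that invoke dpkg/apt/aptitude out
# of /etc/cron.daily and /etc/cron.weekly so they cannot grab
# /var/lib/dpkg/lock while the install scripts run.
# ROOT defaults to "/"; pass another directory to try it on a scratch tree.
disable_apt_cron_jobs() {
  local root="${1:-/}"
  # Assumed backup location; anything outside the cron directories works.
  local backup="${root%/}/var/backups/disabled-cron"
  local job
  mkdir -p "$backup"
  for job in \
    etc/cron.daily/standard \
    etc/cron.daily/dpkg \
    etc/cron.daily/man-db \
    etc/cron.daily/apt \
    etc/cron.daily/aptitude \
    etc/cron.weekly/man-db
  do
    if [ -e "${root%/}/$job" ]; then
      # Flatten the path (etc/cron.daily/man-db -> etc_cron.daily_man-db) so
      # the daily and weekly man-db jobs do not overwrite each other.
      mv "${root%/}/$job" "$backup/$(echo "$job" | tr '/' '_')"
    fi
  done
}

# Usage on a real node (needs root):
#   disable_apt_cron_jobs
```

Jobs that are already gone are skipped, so running it twice is harmless. For EC2 the same steps would have to be baked into a custom image.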
