This bites me regularly as well.
I suspect this is caused by cron jobs - there are several cron jobs
invoking dpkg/apt/aptitude, all of which take the dpkg lock.

I'm seeing this on BYON nodes, so a quick hack is to disable these
cron jobs. Simply removing these files should do the trick:
"/etc/cron.daily/standard",
"/etc/cron.daily/dpkg",
"/etc/cron.daily/man-db",
"/etc/cron.daily/apt",
"/etc/cron.daily/aptitude",
"/etc/cron.weekly/man-db"
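
A minimal sketch of that hack, assuming a plain POSIX shell run as root.
It moves the files listed above aside rather than deleting them, so they
can be restored later. The function name, its two arguments and the
default backup directory are my own illustration; nothing here comes
from whirr.

```shell
#!/bin/sh
# Sketch only: move the cron jobs that take the dpkg lock out of the way.
# disable_cron_jobs, ROOT and BACKUP are hypothetical names for illustration.
#   $1 = filesystem root prefix (empty for the real system)
#   $2 = directory that receives the disabled jobs
disable_cron_jobs() {
    root="${1:-}"
    backup="${2:-/root/disabled-cron}"
    mkdir -p "$backup" || return 1
    for job in cron.daily/standard cron.daily/dpkg cron.daily/man-db \
               cron.daily/apt cron.daily/aptitude cron.weekly/man-db; do
        src="$root/etc/$job"
        if [ -f "$src" ]; then
            # Flatten "cron.daily/apt" to "cron.daily_apt" so that the two
            # man-db jobs do not overwrite each other in the backup dir.
            mv "$src" "$backup/$(echo "$job" | tr '/' '_')" || return 1
        fi
    done
}

# Example (as root, before starting the install):
#   disable_cron_jobs "" /root/disabled-cron
```

Files that do not exist on a given image are simply skipped.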

For EC2 you would have to create your own images without those cron jobs (blech).

This kind of problem (and others, such as failed downloads and other
randomness) can never be completely avoided.
In my opinion, as many of the installation/configuration steps as
possible should be done using a config management tool (puppet/chef).
Once the configuration is published to each node you can trigger
puppet/chef as often as you like, and eventually you should reach a
good state. Running the complete whirr-generated script(s) multiple
times is going to be slower and much more error prone.
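
To illustrate the "run it again until it converges" idea without
puppet/chef, here is a rough shell sketch that retries a single step
instead of the whole script. The retry function and its arguments are
hypothetical, and it assumes the wrapped command exits non-zero when it
fails (for example when it cannot take the dpkg lock).

```shell
#!/bin/sh
# Sketch only: retry one command a bounded number of times.
#   retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the command succeeds, 1 once the attempts run out.
retry() {
    max="$1"
    delay="$2"
    shift 2
    attempt=1
    while :; do
        # Run the command; stop at the first success.
        "$@" && return 0
        if [ "$attempt" -ge "$max" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example: give a cron job up to ~2.5 minutes to release the lock.
#   retry 10 15 apt-get update
```

This only papers over the lock contention; it does nothing for the
other kinds of randomness mentioned above.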

Regards,
Karel

On Mon, Oct 3, 2011 at 10:22 PM, Paul Baclace <[email protected]> wrote:
> Two runs of whirr on EC2 yesterday randomly failed to install Hadoop
> components.  First it occurred on the master node, but when it occurred in
> one slave and not another, I could find the diff of the /tmp/logs/ from
> jclouds.  In a third run, everything worked fine.  Same scripts driving
> whirr, same AMI, same number of nodes, same region, etc. Snippets of
> /tmp/logs/stderr.log shown below indicate that apt-get update had "Could not
> get lock /var/lib/dpkg/lock" on one slave, but not another.
>
> This is a serious reliability issue.  What is non-deterministic here?
>
> Paul
>
> ------------ slave 1 -------------------
> + register_cloudera_repo
> + which dpkg
> + cat
> + curl -s http://archive.cloudera.com/debian/archive.key
> + sudo apt-key add -
> + sudo apt-get update
> E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily
> unavailable)
> E: Unable to lock the administration directory (/var/lib/dpkg/), is another
> process using it?
> + which dpkg
> + apt-get update
> E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily
> unavailable)
> E: Unable to lock the administration directory (/var/lib/dpkg/), is another
> process using it?
> + apt-get -y install hadoop-0.20
>
> -------------- slave 2 ---------------
> + register_cloudera_repo
> + which dpkg
> + cat
> + curl -s http://archive.cloudera.com/debian/archive.key
> + sudo apt-key add -
> + sudo apt-get update
> + which dpkg
> + apt-get update
> + apt-get -y install hadoop-0.20
> dpkg-preconfigure: unable to re-open stdin:
> + cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.dist
> + update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf
> /etc/hadoop-0.20/conf.dist 90
> + install_cdh_hbase -c aws-ec2 -u
> http://apache.cs.utah.edu/hbase/hbase-0.90.3/hbase-0.90.3.tar.gz
>
> -------------



-- 
Karel Vervaeke
http://outerthought.org/
Open Source Content Applications
Makers of Kauri, Daisy CMS and Lily
