On 20111112 11:51 , Andrei Savu wrote:
Paul -
I am sorry you are affected by this issue but there isn't much we can
do when the external repositories are unavailable - except for failing
fast. Any other suggestion?
I think it is a bug to not detect a stuck package lock
(/var/lib/dpkg/lock etc.). If only one invocation of a package
installer should be active at a time (reasonable assumption, especially
since you don't really want to rely on the package locking mechanism to
serialize parallel threads), then the function script could look for the
lock first, and remove it if it is more than 1 minute old. Since
install/configure is done when no users are logged in, the normal
reasons for locking are not present.
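A minimal sketch of the age check suggested above, written in Python for illustration. The helper name `lock_is_stale` and the 60-second default are hypothetical. It only reports whether the lock file's modification time is older than the threshold and removes nothing. Since dpkg holds an fcntl lock on this file rather than relying on the file's existence, treating the mtime as "when the lock was taken" is an assumption, and the result is at best a heuristic.

```python
import os
import time

# Path taken from the error message quoted in this thread.
DPKG_LOCK = "/var/lib/dpkg/lock"


def lock_is_stale(path=DPKG_LOCK, max_age_seconds=60, now=None):
    """Return True if `path` exists and was last modified more than
    `max_age_seconds` ago. Hypothetical helper; it never removes the file."""
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return False  # no lock file, so nothing to clear
    if now is None:
        now = time.time()
    return (now - mtime) > max_age_seconds
```

A caller would still need to confirm that no installer process is running before clearing anything; this check alone does not establish that.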
I know you are using a custom AMI - what if you install the JDK by
default and override the install_java function with an empty one?
I tried that, but my pre-installed java was under a different dir and
some paths are hardwired; this approach can work, but I could not get it
working in the time I had (1 hour) to correct the problem.
Paul
-- Andrei Savu
On Sat, Nov 12, 2011 at 9:20 PM, Paul Baclace <[email protected]> wrote:
Holding the dpkg lock:
0 S root 961 762 0 80 0 - 6669 poll_s 18:29 ? 00:00:00 apt-get -y install sun-java6-jdk
(Still holding after 45 minutes.)
The root cause is that the JDK installs fail and a timeout occurs.
A retry (or just marching on) eventually succeeds, but the hung
apt-get leaves dpkg locked, so no further installs occur.
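One way to keep a hung installer from holding the lock indefinitely is to bound each attempt and make sure the process is gone before moving on. A sketch in Python, assuming a hypothetical helper `run_with_timeout`; the timeout and the example command are placeholders, not what Whirr itself does.

```python
import subprocess


def run_with_timeout(cmd, timeout_seconds):
    """Run `cmd` (a list of arguments) and return True only if it exits
    with status 0 within the timeout. subprocess.run kills and reaps the
    child when the timeout expires, so the direct child does not linger."""
    try:
        result = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


# Example call (not executed here):
# run_with_timeout(["apt-get", "-y", "install", "sun-java6-jdk"], 600)
```

Only the direct child is killed on timeout; any processes it spawned would need separate handling (for example, a process group), which this sketch does not attempt.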
I launched 21 broken clusters this morning... :^(
Paul
On 20111112 11:02 , Paul Baclace wrote:
Here is a guess: a remote repo went missing during an install,
and the package system was left in a locked state, never to be
cleared again.
What if Whirr forced the dpkg lock clear? Does it rely on that
lock for serialization?
Paul
On 20111112 10:44 , Paul Baclace wrote:
I am seeing this error, not due to any change I made:
E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
What causes this intermittent problem? At the moment, it is
very repeatable.
Paul
On 20111111 22:23 , Andrei Savu wrote:
Can you make the S3 files public? Is this happening on all
machines?
You should probably consider using
whirr.instance-templates-max-percent-failures, as described here:
http://whirr.apache.org/docs/0.6.0/configuration-guide.html
Cheers,
-- Andrei Savu / andreisavu.ro <http://andreisavu.ro>
On Sat, Nov 12, 2011 at 2:22 AM, Arun Ramakrishnan <[email protected]> wrote:
Guys,
It looks like the Hadoop apt packages aren't getting
installed. Any ideas?
###################################################
2011-11-11 12:31:31,893 DEBUG [jclouds.compute] (user thread 6) << stderr from jclouds-script-1321043482986 as [email protected]
sed: can't read /etc/hadoop-0.20/conf.dist/hadoop-env.sh: No such file or directory
sed: can't read /etc/hadoop-0.20/conf.dist/hadoop-env.sh: No such file or directory
chgrp: invalid group: `hadoop'
chgrp: invalid group: `hadoop'
E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
hadoop-0.20-datanode: unrecognized service
E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
hadoop-0.20-tasktracker: unrecognized service
##################################################
I am using binaries that I built from 0.7 a few weeks back.
Full log :
http://incentica-public.s3.amazonaws.com/whirr-ccore44.log
Config :
http://incentica-public.s3.amazonaws.com/whirr_cdh.properties
This seems to happen non-deterministically, and more often for
larger clusters (10+).
thanks
Arun