On 20111112 11:51 , Andrei Savu wrote:
Paul -

I am sorry you are affected by this issue but there isn't much we can do when the external repositories are unavailable - except for failing fast. Any other suggestion?

I think it is a bug to not detect a stuck package lock (/var/lib/dpkg/lock etc.). If only one invocation of a package installer should be active at a time (reasonable assumption, especially since you don't really want to rely on the package locking mechanism to serialize parallel threads), then the function script could look for the lock first, and remove it if it is more than 1 minute old. Since install/configure is done when no users are logged in, the normal reasons for locking are not present.
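
For illustration, here is a minimal sketch of the detection step, written in Python rather than in the Whirr function scripts themselves. It assumes the dpkg lock is an fcntl-style advisory lock on the file (consistent with the "11: Resource temporarily unavailable" / EAGAIN error quoted below), so the file merely existing does not mean the lock is held. The helper names `lock_is_held` and `wait_for_lock` are hypothetical. The sketch only detects a lock that stays held past a deadline; it does not remove the file or decide what to do with the process holding it.

```python
import errno
import fcntl
import os
import time


def lock_is_held(path):
    """Return True if another process holds an fcntl lock on `path`."""
    try:
        fd = os.open(path, os.O_RDWR)
    except FileNotFoundError:
        return False  # no lock file at all, so nothing can be holding it
    try:
        try:
            # Non-blocking attempt: fails at once if someone else has the lock.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EAGAIN, errno.EACCES):
                return True
            raise
        fcntl.lockf(fd, fcntl.LOCK_UN)  # we got it, so release it again
        return False
    finally:
        os.close(fd)


def wait_for_lock(path, timeout=60.0, poll=1.0):
    """Wait up to `timeout` seconds for the lock to become free.

    Returns True if the lock was free before the deadline and False if it
    was still held, i.e. it looks stuck.
    """
    deadline = time.monotonic() + timeout
    while lock_is_held(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True
```

A function script could call something like `wait_for_lock("/var/lib/dpkg/lock", timeout=60)` before each install and fail with a clear message when it returns False, instead of letting every later apt-get call fail on the lock.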


I know you are using a custom AMI - what if you install the JDK by default and override the install_java function with an empty one?

I tried that, but my pre-installed java was under a different dir and some paths are hardwired; this approach can work, but I could not get it working in the time I had (1 hour) to correct the problem.


Paul


-- Andrei Savu

On Sat, Nov 12, 2011 at 9:20 PM, Paul Baclace <[email protected]> wrote:


    HOLDING dpkg lock::

0 S root 961 762 0 80 0 - 6669 poll_s 18:29 ? 00:00:00 apt-get -y install sun-java6-jdk

    (still holding after 45 minutes.)

    The root cause of the problem is: the JDK installs fail, a timeout
    occurs, and a retry (or just marching on) eventually succeeds, but
    that leaves dpkg locked, so no further installs occur.
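
One way to avoid that state is to make the timeout also terminate the installer, so that a timed-out apt-get cannot keep holding the lock. The following is a minimal sketch in Python, not what Whirr actually does; `run_with_timeout` is a hypothetical helper. It starts the command in its own session so that the whole process group (the installer plus anything it spawned) can be killed when the deadline passes.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout):
    """Run `cmd`; return its exit code, or None if it was killed on timeout."""
    # start_new_session=True makes the child a session and process-group
    # leader, so its process group id equals its pid.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Kill the whole group so no child process survives holding the lock.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the child to avoid leaving a zombie
        return None
```

For example, `run_with_timeout(["apt-get", "-y", "install", "sun-java6-jdk"], 600)` would return None after ten minutes instead of leaving the process behind. Killing an installer part-way through can leave packages half-configured, so this trades one failure mode for another that still needs handling.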

    I launched 21 broken clusters this morning...  :^(


    Paul


    On 20111112 11:02 , Paul Baclace wrote:
    Here is a guess:  a remote repo went missing during an install,
    and the package system was left in a locked state, never to be
    cleared again.

    What if Whirr forced the dpkg lock clear?  Does it rely on that
    lock for serialization?

    Paul


    On 20111112 10:44 , Paul Baclace wrote:
    I am seeing this error, not due to any change I made:

    E: Could not get lock /var/lib/dpkg/lock - open (11: Resource
    temporarily unavailable)
    E: Unable to lock the administration directory (/var/lib/dpkg/),
    is another process using it?

    What causes this intermittent problem?  At the moment, it is
    very repeatable.


    Paul

    On 20111111 22:23 , Andrei Savu wrote:
    Can you make the S3 files public? Is this happening on all
    machines?

    You should probably consider
    using whirr.instance-templates-max-percent-failures as
    described here:
    http://whirr.apache.org/docs/0.6.0/configuration-guide.html

    Cheers,

    -- Andrei Savu / andreisavu.ro <http://andreisavu.ro>

    On Sat, Nov 12, 2011 at 2:22 AM, Arun Ramakrishnan
    <[email protected]> wrote:

        Guys,

        It looks like the Hadoop apt packages aren't getting
        installed. Any ideas?

        ###################################################

        2011-11-11 12:31:31,893 DEBUG [jclouds.compute] (user
        thread 6) << stderr from jclouds-script-1321043482986 as
        [email protected]
        sed: can't read /etc/hadoop-0.20/conf.dist/hadoop-env.sh:
        No such file or directory
        sed: can't read /etc/hadoop-0.20/conf.dist/hadoop-env.sh:
        No such file or directory
        chgrp: invalid group: `hadoop'
        chgrp: invalid group: `hadoop'
        E: Could not get lock /var/lib/dpkg/lock - open (11:
        Resource temporarily unavailable)
        E: Unable to lock the administration directory
        (/var/lib/dpkg/), is another process using it?
        hadoop-0.20-datanode: unrecognized service
        E: Could not get lock /var/lib/dpkg/lock - open (11:
        Resource temporarily unavailable)
        E: Unable to lock the administration directory
        (/var/lib/dpkg/), is another process using it?
        hadoop-0.20-tasktracker: unrecognized service

        ##################################################

        I am using binaries that I built from 0.7 a few weeks back.


        Full log :
        http://incentica-public.s3.amazonaws.com/whirr-ccore44.log
        Config  :
        http://incentica-public.s3.amazonaws.com/whirr_cdh.properties


        This seems to happen non-deterministically, and more often
        for larger clusters (10+).


        thanks
        Arun






