Nathan: How many interfaces or IP addresses are you bringing up? That error
message makes it sound like there could be a lot of contention on the
lock. Could you also get the output of `pstree | grep -B3 lockfile`
while a VM is coming up? (You'll need to attach to a free virtual
terminal using the kvm console.)

Upon reading more of the lockfile-create manpage, it appears that
there's a non-configurable 5-minute timeout on stale locks. Setting the
--use-pid option might free up the lock more quickly if the parent
process has died for some reason.
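
For what it's worth, the idea behind a PID-stamped lock can be sketched
in plain shell. This is only an illustration of the concept, not how
lockfile-progs implements --use-pid; the `take_lock` function and the
temporary lock path are made up for the example.

```shell
#!/bin/sh
# Sketch: a lock file that records its owner's PID, so a waiter can tell
# a live lock from one left behind by a dead process.
# Hypothetical names throughout; not the lockfile-progs implementation.
LOCK="${TMPDIR:-/tmp}/pidlock-sketch.$$"

# take_lock PID: succeed if the lock is free, or if the recorded holder
# is no longer running (stale lock).
take_lock() {
    if [ -f "$LOCK" ]; then
        holder=$(cat "$LOCK")
        # kill -0 sends no signal; it only checks that the PID exists.
        # (A live process owned by another user would also fail here;
        # the sketch ignores that case.)
        if kill -0 "$holder" 2>/dev/null; then
            return 1              # holder still alive: lock is valid
        fi
        rm -f "$LOCK"             # holder is gone: treat lock as stale
    fi
    # noclobber makes the redirect fail if the file already exists,
    # so two callers cannot both create the lock.
    ( set -C; echo "$1" > "$LOCK" ) 2>/dev/null
}

# Simulate a parent that takes the lock and then dies.
sleep 30 &
live_pid=$!
echo "$live_pid" > "$LOCK"

if take_lock "$$"; then first=acquired; else first=busy; fi

kill "$live_pid"
wait "$live_pid" 2>/dev/null || :

if take_lock "$$"; then second=acquired; else second=busy; fi

rm -f "$LOCK"
echo "while holder alive: $first; after holder died: $second"
```

With a check like this a waiter does not have to sit out the staleness
timeout once the holder has exited.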

It's not clear to me how this could prevent networking from coming up,
since the network has to be up for NTP to run, and the if-up.d script
backgrounds the ntpdate locking+syncing script. sshd in 12.04 and 14.04
is started from an upstart script which does not depend on the NTP
service. The NTP service itself is fairly early in the sysvinit order at
S23, so there might be other init scripts blocked behind it.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to ntp in Ubuntu.
https://bugs.launchpad.net/bugs/1125726

Title:
  boot-time race between /etc/network/if-up.d/ntpdate and
  "/etc/init.d/ntp start"

Status in ntp package in Ubuntu:
  Fix Released
Status in ntp source package in Precise:
  Fix Released
Status in ntp source package in Trusty:
  Fix Released

Bug description:
  [Impact]
  * Hardware clocks are not stepped at boot, which can prevent NTP from ever
    syncing the clock.
    Incorrect clocks can cause serious issues in distributed systems.

  * Upstream originally added a lock file to eliminate a race between the ntp
    service (which keeps the clock synchronized during normal operation) and
    ntpdate (which is used to step the clock by large intervals at boot time).
    That change had a flaw which introduced a deadlock. An Ubuntu patch was
    applied which broke the locking mechanism entirely, reintroducing the race
    condition.

  * This change undoes the Ubuntu patch and fixes the deadlock by unlocking
    before attempting to start the ntp service.

  [Test Case]

  * There are two bugs: The race, and the deadlock. To reproduce the race more
    consistently:
    - add 'sleep 30' to '/etc/network/if-up.d/ntpdate' on the line preceding
      '/usr/sbin/ntpdate-debian -s $OPTS 2>/dev/null || :', and comment out
      'invoke-rc.d --quiet $service stop >/dev/null 2>&1 || true'. This will
      reproduce the case where the ntp service starts between the stop command
      and the ntpdate command.
      The result will be that the ntpdate command fails. There will be a
      message in syslog like:
        'ntpdate[17660]: the NTP socket is in use, exiting'
    - Reintroducing the lock brings back the deadlock issue. Both the ntpdate
      if-up.d script and the ntp init script check the lock file, but the
      ntpdate script attempted to start the ntp init script before unlocking
      the lock. Moving the unlock before the init script invocation fixes
      the deadlock. The original deadlock behavior is described here:
        https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/246203
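
  The ordering problem can be sketched with a toy lock. This is an
  illustration under assumptions, not the actual scripts: `mkdir` stands
  in for lockfile-create, all function names are hypothetical, and the
  retry count is bounded so the "deadlock" shows up as a failure instead
  of hanging.

```shell
#!/bin/sh
# Sketch only: mkdir stands in for lockfile-create, and the functions
# below are hypothetical stand-ins for the real scripts.
LOCKDIR="${TMPDIR:-/tmp}/ntpdate-sketch.$$.lock"

acquire_lock() {
    # mkdir is atomic: it fails if the directory already exists.
    tries=0
    while ! mkdir "$LOCKDIR" 2>/dev/null; do
        tries=$((tries + 1))
        [ "$tries" -ge 3 ] && return 1   # give up instead of blocking forever
        sleep 1
    done
}

release_lock() {
    rmdir "$LOCKDIR" 2>/dev/null || :
}

# Stand-in for the ntp init script, which also takes the lock.
start_ntp_service() {
    if acquire_lock; then
        echo "ntp: started"
        release_lock
    else
        echo "ntp: blocked on lock"
        return 1
    fi
}

# Stand-in for the if-up.d script with the fixed ordering.
ifup_ntpdate_fixed() {
    acquire_lock || return 1
    echo "ntpdate: clock stepped"
    release_lock            # unlock first ...
    start_ntp_service       # ... then start the service
}

# Same script with the original ordering.
ifup_ntpdate_buggy() {
    acquire_lock || return 1
    echo "ntpdate: clock stepped"
    start_ntp_service       # service waits on a lock we still hold
    status=$?
    release_lock
    return "$status"
}

fixed_out=$(ifup_ntpdate_fixed) || :
buggy_out=$(ifup_ntpdate_buggy) || :

printf 'fixed ordering:\n%s\n\nold ordering:\n%s\n' "$fixed_out" "$buggy_out"
```

  In the old ordering the service start can never get the lock because
  its caller is the one holding it; releasing before the start removes
  that dependency.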

  [Regression Potential]

  * Low. Out-of-sync clocks could be changed by a large amount at boot time,
    but only for machines with static IPs. The clock is only likely to be in this
    state if the clock was very skewed at boot time, which is also unlikely
    since NTP usually keeps the software clock in sync during operation and
    the hardware clock is updated at shutdown.
