** Description changed:

  init (in Ubuntu 13.04) accidentally aborts with the following messages
  in the kernel log:
  
  [    4.981969] init: log.c:813: Assertion failed in log_clear_unflushed: ! 
log->io
  [    4.991031] init: Caught abort, core dumped
  [    4.995571] Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x00000600
  
- I've figured out that the probable cause is a race between the line 699 of 
the file init/log.c
- of the function log_read_watch() that sets the field log->remote_closed to 1 
(the caller clears log->io)
- and the routine log_clear_unflushed() that causes the abort when 
log->remote_closed == 1 but log->io != NULL:
- log_clear_unflushed() may preempt log_read_watch() when log->remote_closed is 
set to 1 but
- before log->io is cleared for the same log.
+ I've figured out that the probable cause is a race between the line 699 of 
the file init/log.c of the function log_read_watch() that sets the field
+ log->remote_closed to 1 (the caller clears log->io) and the routine 
log_clear_unflushed() that causes the abort when log->remote_closed == 1 but 
log->io != NULL: log_clear_unflushed() may preempt log_read_watch() when
+ log->remote_closed is set to 1 but before log->io is cleared for
+ the same log.
  
  I applied the following patch to prove this assertion:
  
  @@ -264,6 +266,8 @@
-                        */
-                       log->io->error_handler = NULL;
-  
+      */
+     log->io->error_handler = NULL;
+ 
  +                     nih_fatal("log %p: clearing log->io\n", log);
  +
-                       nih_free (log->io);
-                       log->io = NULL;
-               }
+     nih_free (log->io);
+     log->io = NULL;
+    }
  @@ -384,6 +388,8 @@
-  
-       nih_free (err);
-  
+ 
+   nih_free (err);
+ 
  +     nih_fatal("log %p: clearing log->io\n", log);
  +
-       /* Ensure the NihIo is closed */
-       nih_free (log->io);
-       log->io = NULL;
+   /* Ensure the NihIo is closed */
+   nih_free (log->io);
+   log->io = NULL;
  @@ -696,7 +702,11 @@
-                        *
-                        */
-                       if (saved && saved != EAGAIN && saved != EWOULDBLOCK)
+      *
+      */
+     if (saved && saved != EAGAIN && saved != EWOULDBLOCK)
  +                     {
  +                             nih_fatal("log %p: setting 
log->remote_closed\n", log);
-                               log->remote_closed = 1;
+      log->remote_closed = 1;
  +                     }
-  
-                       close (log->fd);
-                       log->fd = -1;
+ 
+     close (log->fd);
+     log->fd = -1;
  @@ -806,6 +816,8 @@
-               nih_assert (log->open_errno);
-  
-               if (log->remote_closed) {
+    nih_assert (log->open_errno);
+ 
+    if (log->remote_closed) {
  +                     nih_fatal("log %p: checking log->io\n", log);
  +
-                       /* Parent job has ended and unflushed data
-                        * exists.
-                        */
- 
+     /* Parent job has ended and unflushed data
+      * exists.
+      */
  
  I got the  following kernel log before the abort:
  
  [    4.635146] init: log 0xb882a660: clearing log->io
  [    4.640487] init: log 0xb88035e0: setting log->remote_closed
  [    4.654816] init: log 0xb882ce88: clearing log->io
  [    4.655263] init: log 0xb882a1d0: setting log->remote_closed
  [    4.706758] init: log 0xb8831370: setting log->remote_closed
  [    6.714543] init: log 0xb882a600: clearing log->io
  [    6.733596] init: log 0xb882c6a0: clearing log->io
  [    6.746124] init: log 0xb882cbd8: setting log->remote_closed
  [    6.765075] init: log 0xb882b1d0: clearing log->io
  [    6.787772] init: log 0xb8842b50: clearing log->io
  [    6.801163] init: log 0xb882cb78: clearing log->io
  [    6.810717] init: log 0xb882df58: clearing log->io
  [    6.864869] init: log 0xb883e418: clearing log->io
  [    6.871033] init: log 0xb882d2a8: clearing log->io
  [    6.899113] init: log 0xb882d268: clearing log->io
  [    6.927477] init: log 0xb884eb28: clearing log->io
  [    6.942432] init: log 0xb88519a8: clearing log->io
  [    6.997231] init: log 0xb8850c40: clearing log->io
  [    7.031685] init: log 0xb8850650: clearing log->io
  [    7.042562] init: log 0xb8856d10: clearing log->io
  [    7.042811] init: log 0xb8857cf8: clearing log->io
  [    7.063461] init: log 0xb885a850: setting log->remote_closed
  [    7.063996] init: log 0xb883e478: setting log->remote_closed
  [    7.095505] init: log 0xb8854960: setting log->remote_closed
  [    7.135581] init: log 0xb883e478: checking log->io
  [    7.135756] init: log.c:825: Assertion failed in log_clear_unflushed: ! 
log->io
  [    7.140537] init: Caught abort, core dumped
  [    7.143211] Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x00000600
  
- So the assertion has been proved for the log at 0xb883e478 which has the 
field remote_closed updated
- but the field io dirty.
+ So the assertion has been proved for the log at 0xb883e478 which has the
+ field remote_closed updated but the field io dirty.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1188642

Title:
  Init aborts with the message: init: log.c:813: Assertion failed in
  log_clear_unflushed: ! log->io

To manage notifications about this bug go to:
https://bugs.launchpad.net/upstart/+bug/1188642/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to