I think I have nailed it down now. Here is a brief description of what is
happening (if I read the code right).
There seems to be a race when we get new log data for one of the jobs after
the job has been terminated. While processing it we call log_io_reader and
eventually log_file_write, which tries to flush the unflushed buffer to disk.
This succeeds; mind that this is before we got the disk-writable signal.
Since it succeeds, log->unflushed->len becomes 0, but we don't remove that log
instance from the list of logs that need to be flushed (log_unflushed_files).
So the next time we get the signal that the disk is writable, we try again to
flush that log and BOOM, it hits the assert that checks the log has something
to be flushed, but it was already flushed.
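
To make the sequence concrete, here is a minimal standalone model of the
inconsistent state. ModelLog and both functions are hypothetical stand-ins
invented for illustration, not the real Upstart or libnih types; the only
thing taken from the report is the condition that a log on
log_unflushed_files is expected to have a non-zero log->unflushed->len.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a log object (not the real Upstart struct). */
typedef struct {
    size_t unflushed_len;      /* models log->unflushed->len */
    bool   on_unflushed_list;  /* models membership in log_unflushed_files */
} ModelLog;

/* Models the write path described above: all pending data reaches the
 * disk, but list membership is left untouched. */
static void model_write_succeeds(ModelLog *log)
{
    log->unflushed_len = 0;
}

/* Models the precondition checked once the disk becomes writable: every
 * listed log must still have data pending. Returns false where the real
 * code would hit the assertion. */
static bool model_precondition_holds(const ModelLog *log)
{
    return !log->on_unflushed_list || log->unflushed_len > 0;
}
```

In this model, a listed log with pending data satisfies the precondition, and
stops satisfying it as soon as model_write_succeeds has run on it.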


The unflushed log length actually changes on line 562. That's where we shrink
the unflushed buffer by the amount we managed to write to disk, which in our
case is the full length, leaving a zero-length buffer after shrinking.
So I can see at least three fixes:
1) after calling nih_io_buffer_shrink (log->unflushed, (size_t)wlen), we
should check whether log->unflushed->len is 0, and if so remove the log from
the log_unflushed_files list.
2) we make log_clear_unflushed more tolerant of logs which have already been
flushed successfully before reaching this point.
3) we don't try to flush unflushed buffers until we get the disk-writable
signal.
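
A rough sketch of what options 1 and 2 could look like, written against a
simplified standalone model rather than the real code. SketchLog,
sketch_after_write and sketch_clear_unflushed are hypothetical names made up
for this example; an actual patch would operate on the real log object and
the log_unflushed_files list instead of a boolean flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a log object (not the real Upstart struct). */
typedef struct {
    size_t unflushed_len;      /* models log->unflushed->len */
    bool   on_unflushed_list;  /* models membership in log_unflushed_files */
} SketchLog;

/* Option 1: after shrinking the buffer by the number of bytes written,
 * drop a fully flushed log from the unflushed list straight away. */
static void sketch_after_write(SketchLog *log, size_t wlen)
{
    assert(wlen <= log->unflushed_len);
    log->unflushed_len -= wlen;          /* models the buffer shrink */
    if (log->unflushed_len == 0)
        log->on_unflushed_list = false;  /* models removal from the list */
}

/* Option 2: when the disk becomes writable, tolerate logs that were
 * already flushed. Returns true only if data was still pending. */
static bool sketch_clear_unflushed(SketchLog *log)
{
    if (!log->on_unflushed_list)
        return false;
    log->on_unflushed_list = false;
    if (log->unflushed_len == 0)
        return false;                    /* already flushed: skip, no assert */
    log->unflushed_len = 0;              /* pretend the flush succeeds */
    return true;
}
```

Option 1 keeps the list consistent at the point where the length changes,
while option 2 only makes the consumer of the list defensive; the two are not
mutually exclusive.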

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to upstart in Ubuntu.
https://bugs.launchpad.net/bugs/1447756

Title:
  segfault in log.c code causes phone reboot loops

Status in the base for Ubuntu mobile products:
  Fix Committed
Status in Upstart:
  New
Status in upstart package in Ubuntu:
  Confirmed

Bug description:
  We recently started getting reports from phone users that their
  devices go into a reboot loop after changing the language or getting
  an OTA upgrade (both of which end with a reboot of the phone)

  after a bit of research we collected the log at
  http://pastebin.ubuntu.com/10872934/

  this shows a segfault of upstart's init binary in the log.c code:

  [    6.999083]init: log.c:819: Assertion failed in log_clear_unflushed: log->unflushed->len
  [    7.000279]init: Caught abort, core dumped
  [    7.467176]Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000600

To manage notifications about this bug go to:
https://bugs.launchpad.net/canonical-devices-system-image/+bug/1447756/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
