** Description changed:
+ = Summary =
+
+ The version of Upstart in vivid is affected by a couple of bugs relating
+ to flushing data from early-boot jobs to disk, both of which can result
+ in a crash:
+
+ == Problem 1 ==
+
+ An internal list is mishandled meaning a crash could occur randomly.
** Changed in: upstart (Ubuntu Vivid)
Status: New => In Progress
** Changed in: upstart (Ubuntu Vivid)
Assignee: (unassigned) => James Hunt (jamesodhunt)
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchp
** Branch linked: lp:~jamesodhunt/ubuntu/vivid/upstart/sru-bug-1447756
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1447756
Title:
segfault in log.c code causes phone reboot loops
To manage notifi
** Changed in: canonical-devices-system-image
Status: In Progress => Fix Released
--
On Thu, May 21, 2015 at 05:05:16PM -, Alex Kaluzhny wrote:
> Is the fix landing in vivid?
The fix has landed in the stable phone overlay ppa (upstart
1.13.2-0ubuntu13.1).
James, can you please follow through on SRUing this to vivid? I've copied
the package into the vivid-proposed queue from
** Changed in: upstart (Ubuntu Utopic)
Status: New => Won't Fix
--
Is the fix landing in vivid?
--
Per Ondrej on private bug
https://bugs.launchpad.net/barajas/+bug/1439778/comments/30
"I can confirm that the phone which we received from BQ had the same issue and the
problem was resolved once the fix developed in bug #1447756 was applied.
We should now revert workaround committed as part of Bug #
Unfortunately, ondra no longer has the failing phones so we may need to
take a decision to just land this if QA are happy the change has not
regressed the behaviour for non-failing phones.
I've tried to raise QA but they are sprinting in the US this week so no
direct response.
I believe that sil2
Thanks for nudging -proposed Steve.
Silo 021 now includes upstart version 1.13.2-0ubuntu13.1 (which
sil2100 synced from wily).
Basic test plan is here:
https://wiki.ubuntu.com/Process/TestPlans/upstart-bug-1447756
I've tested this as follows:
$ wget http://people.canonical.com/~jhm/baraj
This bug was fixed in the package upstart - 1.13.2-0ubuntu14
---
upstart (1.13.2-0ubuntu14) wily; urgency=medium
* Cherry-pick upstream fix for LP: #1447756, fixing broken handling
when flushing logs to disk.
* Cherry-pick follow-on upstream fix for LP: #1447756.
-- James Hu
@james we can land in vivid first if we need to, and the fix does not need to
be made to utopic
We do need the fix to land by May 21
--
Erroneous test failures can (and should) be overridden in proposed-
migration. Also the requirement is not that the package *reach* the
development release before being SRUed, only that it be *uploaded* to
the development series.
--
Looks like kernel bug 1429756 could block 1.13.2-0ubuntu14 landing for
wily (being a pre-req to getting this fix into vivid and utopic):
https://jenkins.qa.ubuntu.com/view/Wily/view/AutoPkgTest/job/wily-adt-upstart/ARCH=amd64,label=adt/9/
--
** Branch linked: lp:ubuntu/upstart
--
Just to confirm: I just received another device exhibiting the boot loop issue, and
after a quick investigation it was the same problem, a race where the child process
continues logging after the parent died but before the writable disk signal.
Once I used the patched "upstart" binary, the device booted normally, so all good.
--
** Also affects: upstart (Ubuntu Utopic)
Importance: Undecided
Status: New
** Also affects: upstart (Ubuntu Wily)
Importance: Critical
Assignee: James Hunt (jamesodhunt)
Status: In Progress
** Also affects: upstart (Ubuntu Vivid)
Importance: Undecided
Status: Ne
** Changed in: upstart
Status: In Progress => Fix Committed
--
** Tags added: patch
--
MP raised on lp:upstart to start the trickle-down to the rtm package.
--
Applying the top 2 commits (r1665 and r1666) from
lp:~jamesodhunt/upstart/bug-1447756-the-actual-fix [1] to
https://bugs.launchpad.net/ubuntu-rtm/+source/upstart/1.13.2-0ubuntu1rtm1
is now working for me.
I've tested this by building on the device itself and also by building
in a ARCH=armhf utopi
** Changed in: canonical-devices-system-image
Status: Fix Committed => In Progress
** Changed in: canonical-devices-system-image
Milestone: ww19-ota => ww22-2015
--
lp:~jamesodhunt/upstart/bug-1447756-the-actual-fix contains the fix and
a new test (which correctly fails with the current lp:upstart but passes
with the fix in that branch).
The code has been tested on a failing device and a server system. I am
currently testing on a non-failing krillin device. I
** Branch linked: lp:~jamesodhunt/upstart/bug-1447756-the-actual-fix
--
** Changed in: upstart (Ubuntu)
Status: Confirmed => In Progress
** Changed in: upstart
Status: New => In Progress
--
\o/. Yes, Upstart uses asserts extremely aggressively. It's unfortunate
that we've never hit this issue in testing but I'm currently working on
new tests for this slightly unusual scenario.
--
I can confirm fix: http://paste.ubuntu.com/11095313/ does the job.
It will safely ignore entries of log_unflushed_files list which have
log->unflushed->len set to zero. Since we know how this state is reached it
seems like previous nih_assert (log->unflushed->len); was too aggressive.
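
To illustrate the behaviour described above, here is a minimal C sketch of a
list walk that skips entries whose unflushed buffer is already empty instead of
asserting on them. The types and function names (sketch_log, sketch_flush,
sketch_clear_unflushed) are hypothetical stand-ins, not the real Upstart or
libnih API, and this is not the patch from the paste.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the log structures; not Upstart's real types. */
struct sketch_buf {
    size_t len;                /* bytes still waiting to be written */
};

struct sketch_log {
    struct sketch_buf unflushed;
    struct sketch_log *next;   /* next entry on the unflushed list */
};

/* Hypothetical flush: pretend the write succeeded and drain the buffer.
 * Callers must only pass logs that still hold data. */
void sketch_flush(struct sketch_log *log)
{
    assert(log->unflushed.len > 0);
    log->unflushed.len = 0;
}

/* Walk the list and flush each log, tolerating entries that an earlier
 * write has already drained. Returns the number of logs flushed. */
size_t sketch_clear_unflushed(struct sketch_log *head)
{
    size_t flushed = 0;

    for (struct sketch_log *log = head; log != NULL; log = log->next) {
        if (log->unflushed.len == 0)
            continue;          /* nothing left to write: skip, do not assert */
        sketch_flush(log);
        flushed++;
    }
    return flushed;
}
```

With this shape, an entry that reaches the list with len set to zero is simply
passed over, so the walk no longer aborts on a state that is known to be
reachable.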
--
Ondrej - aha! with the debug, this is making more sense now. Yes, since
the ureadahead-touch job spawns a process in the background (ureadahead)
and then the job itself exits, the log associated with the main job
process gets added to the unflushed list. ureadahead then writes output
and the NihIo
The ureadahead-touch upstart job definitely does not need to log
anything; we should add "console none" to it so it does not attempt
to. (Logging there only makes sense when actively debugging ureadahead
anyway.)
--
Sorry for spamming, but I guess that log_io_reader is called by nih_io_watcher
which has been initialised by "nih_io_reopen"
which I suppose is called when job starts?
So that would go back to my original finding, job dies and is restarted before
we get signal about disk being writable, but at t
So one thing which I still cannot track down is how is that unidentified
log_io_reader called. Or who is calling it.
I have put traces to log_read_watch under condition "if (io->recv_buf->len) {"
where we call "log_io_reader" but call is not coming from there.
So what are other options for log_io
So when the job gets terminated, we fail to write to the disk and it is
added to the unflushed list.
The problem is another call to the write function later on, but before we get the
writable disk signal:
[7.460627]init: log_handle_unflushed:778:len=32673,
path='/var/log/upstart/ureadahead-touch.
Hi Ondrej,
Regarding #15, I'm not sure this is correct. As you say, when the job
process terminates, job_process_terminated() gets called. This calls
log_handle_unflushed() and that function calls log_read_watch(), which
ultimately calls write(2). However, even if the write is successful
before 'i
Sorry in previous comment, replace all "flash" with "flush"
Actually one more issue I can see there is this:
Job dies -> it's added to log_unflushed_files when it has unflushed data
but if job is restarted before we get disk writable signal, it will mess up the
things, job will still remain in log
I think I have nailed it down now; here is a brief description of what is happening
(if I read the code right).
There seems to be a race when we get new log data for one of the jobs after the job
has been terminated, and while processing it we call log_io_reader and
eventually log_file_write which will try to
ondra and I have been hammering away at this, but progress is painfully
slow given that:
a) the problem is not seen on every boot.
b) we can only view the end of kmsg log.
c) rebuild times are relatively slow.
From what ondra says he's seen today, it sounds as though we might be
hitting a stack
Disabling as a workaround is bug #1452663
--
Agreed, either we open new bug to track this, or we don't mark this as
fixed. This is the time one could use state "workaround"
--
I don't think we should close this one; the bug still persists and needs to
be fixed urgently. The workaround we ship cannot stay for long (since it
removes all logging for system jobs).
--
** Changed in: canonical-devices-system-image
Status: Confirmed => Fix Committed
--
I've rebuilt the fix in a clean environment and the init binary below
now boots fine for me on a bq aquaris E4.5:
http://people.canonical.com/~jhunt/upstart/bugs/bug-1447756/armhf/
--
hi James
So one way is to disable upstart logging altogether with the "--no-log" kernel
command-line option. We are going with this option for the next OTA, until we can
crack the actual root cause of this issue.
As for restore, best bet is with MTK flash_tool
--
Hi ondra/ogra - Can you comment on my suggestions in #6 and #7? My
device is still bricked so if you have any suggestions on how to perform
a full reset, that'd be great as udf is unable to recover it.
--
** Changed in: canonical-devices-system-image
Assignee: (unassigned) => Ondrej Kubik (w-ondra)
--
** Changed in: canonical-devices-system-image
Milestone: ww17-2015 => ww19-ota
--
Something else to try - disable /etc/init/flush-early-job-log.conf on
boot...
$ sudo mount -oremount,rw /
$ echo manual | sudo tee /etc/init/flush-early-job-log.override
$ sudo reboot
... and post-boot do the following:
$ for i in $(seq 17); do sudo initctl notify-disk-writeable; done
As Steve
I've tried the fix on my bq device and it appears to be in a reboot loop
(like the one the fix was supposed to resolve).
As such, I'd recommend testing the binaries only for the session init
initially (/usr/bin/ubuntu-touch-session /
/usr/share/lightdm/sessions/ubuntu-touch.desktop).
Also, for b
Test binaries are available here:
http://people.canonical.com/~jhunt/upstart/bugs/bug-1447756/armhf/
--
i386+amd64 are packaged here:
https://launchpad.net/~jamesodhunt/+archive/ubuntu/bug-1447756/+packages
--
** Branch linked: lp:~jamesodhunt/ubuntu/vivid/upstart/bug-1447756
--
** Branch linked: lp:~jamesodhunt/upstart/bug-1447756
--
** Also affects: upstart
Importance: Undecided
Status: New
** Changed in: upstart
Assignee: (unassigned) => James Hunt (jamesodhunt)
--
I think I understand what's happening now - it's not the log that isn't
being freed, it's the list entry the log is attached to that is not
freed. The effect is the same though - calling log_clear_unflushed()
multiple times could trigger this issue since the still-valid (but
incorrectly so) list ent
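
The stale-entry hazard described in this comment can be sketched in C as
follows. This is a generic illustration only: the sketch_entry type and the
sketch_push / sketch_release_all helpers are hypothetical and are not taken
from Upstart or libnih. The point is that each list entry is unlinked and freed
together with its payload, so a repeated call finds an empty list instead of
entries that point at freed logs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list entry that owns a log payload; not Upstart's real type. */
struct sketch_entry {
    char *log;                  /* stand-in for the attached log object */
    struct sketch_entry *next;
};

/* Add an entry with a freshly allocated payload to the front of the list.
 * Returns 0 on success, -1 on allocation failure. */
int sketch_push(struct sketch_entry **head, size_t payload_size)
{
    struct sketch_entry *entry = malloc(sizeof *entry);
    if (entry == NULL)
        return -1;

    entry->log = malloc(payload_size);
    if (entry->log == NULL) {
        free(entry);
        return -1;
    }

    entry->next = *head;
    *head = entry;
    return 0;
}

/* Release every entry. The entry is unlinked before it is freed, so the
 * list never keeps a node whose payload has gone away, and calling this
 * function again is a harmless no-op. Returns the number released. */
size_t sketch_release_all(struct sketch_entry **head)
{
    size_t released = 0;

    while (*head != NULL) {
        struct sketch_entry *entry = *head;
        *head = entry->next;    /* unlink first */
        free(entry->log);       /* then free the payload */
        free(entry);            /* and the entry itself */
        released++;
    }
    return released;
}
```

Freeing only the payload while leaving the entry linked would leave the second
call walking nodes with dangling pointers, which matches the kind of failure
described above.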
** Changed in: upstart (Ubuntu)
Assignee: (unassigned) => James Hunt (jamesodhunt)
--
I think I see a potential problem if 'initctl notify-disk-writeable' is
called multiple times. The log_clear_unflushed() function walks the
log_unflushed_files list, attempting to flush each of the logs and
freeing them when done with nih_free(). But as far as I know,
nih_free() will not cause th
Note for the record that this bug has so far only been reported on the
ubuntu-rtm branch, not the ubuntu branch, of the upstart package.
However, the differences between these branches are negligible and
include no changes to the upstream code.
--
** Changed in: upstart (Ubuntu)
Importance: Undecided => Critical
** Changed in: upstart (Ubuntu)
Status: New => Confirmed
** Also affects: canonical-devices-system-image
Importance: Undecided
Status: New
** Changed in: canonical-devices-system-image
Importance: Undecided