** Description changed:
+ = Summary =
+
+ The version of Upstart in vivid is affected by a couple of bugs relating
+ to the flushing of data from early-boot jobs to disk, both of which can
+ result in a crash:
+
+ == Problem 1 ==
+
+ An internal list is mishandled meaning a crash could occur randomly.
** Changed in: upstart (Ubuntu Vivid)
Status: New => In Progress
** Changed in: upstart (Ubuntu Vivid)
Assignee: (unassigned) => James Hunt (jamesodhunt)
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to upstart in
** Branch linked: lp:~jamesodhunt/ubuntu/vivid/upstart/sru-bug-1447756
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to upstart in Ubuntu.
https://bugs.launchpad.net/bugs/1447756
Title:
segfault in log.c code causes phone
** Changed in: upstart (Ubuntu Utopic)
Status: New => Won't Fix
--
On Thu, May 21, 2015 at 05:05:16PM -, Alex Kaluzhny wrote:
Is the fix landing in vivid?
The fix has landed in the stable phone overlay ppa (upstart
1.13.2-0ubuntu13.1).
James, can you please follow through on SRUing this to vivid? I've copied
the package into the vivid-proposed queue from
Is the fix landing in vivid?
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to upstart in Ubuntu.
https://bugs.launchpad.net/bugs/1447756
Title:
segfault in log.c code causes phone reboot loops
Status in the base for Ubuntu
Unfortunately, ondra no longer has the failing phones so we may need to
take a decision to just land this if QA are happy the change has not
regressed the behaviour for non-failing phones.
I've tried to raise QA but they are sprinting in the US this week so no
direct response.
I believe that
Per Ondrej on private bug
https://bugs.launchpad.net/barajas/+bug/1439778/comments/30
I can confirm that the phone which we received from BQ had the same issue, and the problem
was resolved once the fix developed in bug #1447756 was applied.
We should now revert workaround committed as part of Bug
This bug was fixed in the package upstart - 1.13.2-0ubuntu14
---
upstart (1.13.2-0ubuntu14) wily; urgency=medium
* Cherry-pick upstream fix for LP: #1447756, fixing broken handling
when flushing logs to disk.
* Cherry-pick follow-on upstream fix for LP: #1447756.
-- James
Thanks for nudging -proposed Steve.
Silo 021 now includes upstart version 1.13.2-0ubuntu13.1 (which
sil2100 synced from wily).
Basic test plan is here:
https://wiki.ubuntu.com/Process/TestPlans/upstart-bug-1447756
I've tested this as follows:
$ wget
@james we can land in vivid first if we need to, and the fix does not need to
be made to utopic
We do need the fix to land by May 21
--
Erroneous test failures can (and should) be overridden in
proposed-migration. Also the requirement is not that the package *reach*
the development release before being SRUed, only that it be *uploaded* to
the development series.
--
** Branch linked: lp:ubuntu/upstart
--
Looks like kernel bug 1429756 could block 1.13.2-0ubuntu14 landing for
wily (being a pre-req to getting this fix into vivid and utopic):
https://jenkins.qa.ubuntu.com/view/Wily/view/AutoPkgTest/job/wily-adt-upstart/ARCH=amd64,label=adt/9/
--
** Changed in: upstart
Status: In Progress => Fix Committed
--
** Also affects: upstart (Ubuntu Utopic)
Importance: Undecided
Status: New
** Also affects: upstart (Ubuntu Wily)
Importance: Critical
Assignee: James Hunt (jamesodhunt)
Status: In Progress
** Also affects: upstart (Ubuntu Vivid)
Importance: Undecided
Status:
Just to confirm: I just received another device exhibiting the boot loop issue, and
after a quick investigation it was the same problem: a race in which the child process
continues logging after the parent has died but before the writable-disk signal arrives.
Once I used the patched upstart binary, the device booted normally, so all good.
--
Applying the top 2 commits (r1665 and r1666) from
lp:~jamesodhunt/upstart/bug-1447756-the-actual-fix [1] to
https://bugs.launchpad.net/ubuntu-rtm/+source/upstart/1.13.2-0ubuntu1rtm1
is now working for me.
I've tested this by building on the device itself and also by building
in a ARCH=armhf
** Tags added: patch
--
MP raised on lp:upstart to start the trickle-down to the rtm package.
--
I think I have nailed it down now; here is a brief description of what is happening
(if I read the code right).
There seems to be a race when we get new log data for one of the jobs after the job
has been terminated, and while processing it we call log_io_reader and
eventually log_file_write which will try to
** Changed in: canonical-devices-system-image
Status: Fix Committed => In Progress
** Changed in: canonical-devices-system-image
Milestone: ww19-ota => ww22-2015
--
lp:~jamesodhunt/upstart/bug-1447756-the-actual-fix contains the fix and
a new test (which correctly fails with the current lp:upstart but passes
with the fix in that branch).
The code has been tested on a failing device and a server system. I am
currently testing on a non-failing krillin device.
Sorry for spamming, but I guess that log_io_reader is called by nih_io_watcher
which has been initialised by nih_io_reopen
which I suppose is called when job starts?
So that would go back to my original finding, job dies and is restarted before
we get signal about disk being writable, but at
So when the job gets terminated we don't succeed in writing to the disk and it is
added to the unflushed list.
The problem is another call to the write function later on, but before we get the
writable disk signal:
[7.460627]init: log_handle_unflushed:778:len=32673,
So one thing which I still cannot track down is how that unidentified
log_io_reader is called, or who is calling it.
I have put traces in log_read_watch under the condition if (io->recv_buf->len) {
where we call log_io_reader, but the call is not coming from there.
So what are other options for
Ondrej - aha! with the debug, this is making more sense now. Yes, since
the ureadahead-touch job spawns a process in the background (ureadahead)
and then the job itself exits, the log associated with the main job
process gets added to the unflushed list. ureadahead then writes output
and the NihIo
I can confirm the fix at http://paste.ubuntu.com/11095313/ does the job.
It will safely ignore entries in the log_unflushed_files list which have
log->unflushed->len set to zero. Since we know how this state is reached, it
seems the previous nih_assert (log->unflushed->len); was too aggressive.
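
The idea confirmed above can be sketched in a few lines of C. This is only an illustrative model, not the actual Upstart patch: pending_log, try_write and flush_pending are hypothetical names, and the real code in log.c works on Log objects held in an NihList. It shows a flush pass that skips an entry whose buffer was already drained by a later write, instead of asserting that every listed entry still holds data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a log with buffered data; NOT Upstart's Log type. */
typedef struct pending_log {
    char   buf[64];             /* data not yet written to disk */
    size_t unflushed_len;       /* number of valid bytes in buf */
    struct pending_log *next;   /* next entry on the unflushed list */
} pending_log;

/* Simulated write: it can only succeed once the disk is writable. */
int try_write(pending_log *log, int disk_writable)
{
    if (!disk_writable)
        return -1;
    log->unflushed_len = 0;     /* everything buffered has been written */
    return 0;
}

/* Walk the unflushed list and return how many entries were flushed.
 * An entry may legitimately be empty by now (a later write drained it),
 * so it is skipped rather than treated as a fatal error. */
size_t flush_pending(pending_log *head, int disk_writable)
{
    size_t flushed = 0;

    for (pending_log *log = head; log != NULL; log = log->next) {
        /* Invariant that should always hold, unlike "len must be non-zero". */
        assert(log->unflushed_len <= sizeof log->buf);

        if (log->unflushed_len == 0)
            continue;           /* nothing left to flush for this entry */

        if (try_write(log, disk_writable) == 0)
            flushed++;
    }
    return flushed;
}
```

Calling flush_pending a second time on the same list is harmless in this model, since every entry is then empty and simply skipped.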
--
Hi Ondrej,
Regarding #15, I'm not sure this is correct. As you say, when the job
process terminates, job_process_terminated() gets called. This calls
log_handle_unflushed() and that function calls log_read_watch(), which
ultimately calls write(2). However, even if the write is successful
before
Sorry in previous comment, replace all flash with flush
Actually one more issue I can see there is this:
Job dies - it's added to log_unflushed_files when it has unflushed data
but if job is restarted before we get disk writable signal, it will mess up the
things, job will still remain in
The ureadahead-touch upstart job definitely does not need to log
anything, so we should add console none to it so that it does not attempt
to (logging there only makes sense when actively debugging ureadahead
anyway).
--
\o/. Yes, Upstart uses asserts extremely aggressively. It's unfortunate
that we've never hit this issue in testing but I'm currently working on
new tests for this slightly unusual scenario.
--
** Changed in: upstart (Ubuntu)
Status: Confirmed => In Progress
** Changed in: upstart
Status: New => In Progress
--
** Branch linked: lp:~jamesodhunt/upstart/bug-1447756-the-actual-fix
--
ondra and I have been hammering away at this, but progress is painfully
slow given that:
a) the problem is not seen on every boot.
b) we can only view the end of kmsg log.
c) rebuild times are relatively slow.
From what ondra says he's seen today, it sounds as though we might be
hitting a stack
Disabling as a workaround is bug #1452663
--
** Changed in: canonical-devices-system-image
Status: Confirmed => Fix Committed
--
I don't think we should close this one; the bug still persists and needs to
be fixed urgently. The workaround we ship cannot stay for long (since it
removes all logging for system jobs).
--
Agreed, either we open a new bug to track this, or we don't mark this as
fixed. This is the time one could use state workaround
--
I've rebuilt the fix in a clean environment and the init binary below
now boots fine for me on a bq aquaris E4.5:
http://people.canonical.com/~jhunt/upstart/bugs/bug-1447756/armhf/
--
hi James
So one way is to disable upstart logging altogether with the --no-log kernel
command-line option. We are going with this option for the next OTA, until we can
crack the actual root cause of this issue.
As for restore, best bet is with MTK flash_tool
--
Hi ondra/ogra - Can you comment on my suggestions in #6 and #7? My
device is still bricked so if you have any suggestions on how to perform
a full reset, that'd be great as udf is unable to recover it.
--
** Changed in: canonical-devices-system-image
Milestone: ww17-2015 => ww19-ota
--
** Changed in: canonical-devices-system-image
Assignee: (unassigned) => Ondrej Kubik (w-ondra)
--
** Changed in: upstart (Ubuntu)
Assignee: (unassigned) => James Hunt (jamesodhunt)
--
I think I understand what's happening now - it's not the log that isn't
being freed, it's the list entry the log is attached to that is not
freed. The effect is the same though - calling log_clear_unflushed()
multiple times could trigger this issue since the still-valid (but
incorrectly so) list
Test binaries are available here:
http://people.canonical.com/~jhunt/upstart/bugs/bug-1447756/armhf/
--
** Branch linked: lp:~jamesodhunt/ubuntu/vivid/upstart/bug-1447756
--
** Also affects: upstart
Importance: Undecided
Status: New
** Changed in: upstart
Assignee: (unassigned) => James Hunt (jamesodhunt)
--
I've tried the fix on my bq device and it appears to be in a reboot loop
(like the one the fix was supposed to resolve).
As such, I'd recommend testing the binaries only for the session init
initially (/usr/bin/ubuntu-touch-session /
/usr/share/lightdm/sessions/ubuntu-touch.desktop).
Also, for
Something else to try - disable /etc/init/flush-early-job-log.conf on
boot...
$ sudo mount -oremount,rw /
$ echo manual | sudo tee /etc/init/flush-early-job-log.override
$ sudo reboot
... and post-boot do the following:
$ for i in $(seq 17); do sudo initctl notify-disk-writeable; done
As Steve
I think I see a potential problem if 'initctl notify-disk-writeable' is
called multiple times. The log_clear_unflushed() function walks the
log_unflushed_files list, attempting to flush each of the logs and
freeing them when done with nih_free(). But as far as I know,
nih_free() will not cause
** Changed in: upstart (Ubuntu)
Importance: Undecided => Critical
** Changed in: upstart (Ubuntu)
Status: New => Confirmed
** Also affects: canonical-devices-system-image
Importance: Undecided
Status: New
** Changed in: canonical-devices-system-image
Importance: Undecided =>
Note for the record that this bug has so far only been reported on the
ubuntu-rtm branch, not the ubuntu branch, of the upstart package.
However, the differences between these branches are negligible and
include no changes to the upstream code.
--