There are two issues here that interact, which is why they are confusing
people.  The first is that the kernel has a potential livelock problem
in the writeback code: if new pages that require writeback are
constantly being dirtied, the sync(2) system call will never return (or
at least not until all of the pages are clean, and on a busy system
with lots of processes writing to the disk that might never happen).
It doesn't happen every time sync(2) is called, but since dpkg was
calling sync(2) all the time, it tended to happen there.  Still, this
problem can happen without dpkg being involved at all, and on many
different file systems, since it's a problem with the generic writeback
code.  Trying to backport the fix for this to the ancient kernel in
10.04 is going to be _hard_.  There are people at Red Hat who are paid
the big bucks to do this kind of painful backporting (which in this case
means multiple patches, spread across multiple kernel releases before
the problem was finally fixed, with all sorts of dependencies).  Good
luck finding a volunteer willing to figure this out.  I wouldn't.  I
would much rather run a 3.x kernel.  And if I had a business that needed
a stable enterprise kernel, I'd pay the darned Red Hat or SLES support
fees and get a professionally managed enterprise kernel.
Unfortunately, in my experience Canonical doesn't have paid kernel
engineers with either the skill or the bandwidth (I'm not sure which) to
do this kind of very tricky backporting to ancient LTS kernels, at least
compared to what Red Hat has done.  I've seen this with ext4 bug fixes
that don't get backported to 10.04, but that Red Hat has been willing to
backport to their RHEL6 kernel.

Note that this problem is much less likely to hit desktop/laptop
systems, where there generally aren't server processes continuously
writing to the file system.  So for most Ubuntu systems, which tend not
to be production servers running highly stressful workloads, this won't
be an issue.  The people who are complaining on this Launchpad bug are
probably outliers, which probably explains the low priority that paid
Canonical engineers give to this kind of backporting.
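The livelock in the first issue can be illustrated with a toy model.
This is a hypothetical simulation, not kernel code, and its numbers are
made up: each tick, other processes dirty some new pages, writeback then
cleans up to a fixed number of pages, and a naive sync only returns once
no dirty pages remain.  The function name naive_sync_ticks() and the
max_ticks cutoff (a stand-in for "never returns") are illustrative
assumptions.

```c
#include <assert.h>

/*
 * Toy model (hypothetical, not kernel code). Each tick, other processes
 * dirty `dirtied_per_tick` new pages, then writeback cleans up to
 * `cleaned_per_tick` pages. A naive sync() only returns once no dirty
 * pages remain. Returns the number of ticks that took, or -1 if it was
 * still waiting after `max_ticks` (our stand-in for "never returns").
 */
static long naive_sync_ticks(long dirty, long cleaned_per_tick,
                             long dirtied_per_tick, long max_ticks)
{
    assert(cleaned_per_tick > 0);

    for (long tick = 1; tick <= max_ticks; tick++) {
        dirty += dirtied_per_tick;          /* writers keep dirtying pages */
        dirty -= dirty < cleaned_per_tick ? dirty : cleaned_per_tick;
        if (dirty == 0)
            return tick;                    /* everything is clean at last */
    }
    return -1;                              /* still waiting: livelocked */
}
```

With 1000 dirty pages and writeback cleaning 100 pages per tick, an idle
system finishes in 10 ticks and one with a slow writer (50 pages per
tick) finishes in 20, but once writers dirty pages as fast as writeback
can clean them, the call never completes.  That is the busy-server case;
a typical desktop looks more like the first two.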

The second problem/bugfix is the fix to dpkg, which significantly
improves both its performance and its impact on the system as a whole
by using sync_file_range() instead of sync().  This fix also tends to
remove one of the more common ways of tickling the bug above, but that's
not the only reason backporting this dpkg package would be a good idea:
it also speeds up package installs and reduces their overall impact on
the system.
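To make the difference concrete, here is a minimal C sketch of the
per-file approach, assuming Linux with glibc.  It is not dpkg's actual
code; the helper flush_written_files() and the MAX_BATCH limit are
hypothetical.  It first starts writeback for each file it just wrote
with sync_file_range(..., SYNC_FILE_RANGE_WRITE), then waits on each one
with fsync(), so it only has to wait for those files' own pages rather
than for every dirty page on the system the way sync() does.

```c
#define _GNU_SOURCE /* sync_file_range() is a Linux-specific GNU extension */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_BATCH 64 /* arbitrary batch limit for this sketch */

/*
 * Hypothetical helper (not dpkg's code): make a batch of freshly written
 * files durable without calling sync(2), which would also flush every
 * other dirty page on the system. Returns 0 on success, -1 on any error.
 */
static int flush_written_files(const char *const paths[], size_t n)
{
    int fds[MAX_BATCH];
    int rc = 0;

    assert(n <= MAX_BATCH); /* callers must split larger batches */

    /* Pass 1: start asynchronous writeback of each file's dirty pages.
       offset 0 with nbytes 0 means "from the start to end of file";
       SYNC_FILE_RANGE_WRITE queues the I/O but does not wait for it. */
    for (size_t i = 0; i < n; i++) {
        fds[i] = open(paths[i], O_WRONLY);
        if (fds[i] < 0) {
            perror(paths[i]);
            rc = -1;
            continue;
        }
        if (sync_file_range(fds[i], 0, 0, SYNC_FILE_RANGE_WRITE) != 0) {
            perror("sync_file_range");
            rc = -1;
        }
    }

    /* Pass 2: wait for each file to reach the disk. fsync() also flushes
       the file's metadata, which sync_file_range() does not. */
    for (size_t i = 0; i < n; i++) {
        if (fds[i] < 0)
            continue;
        if (fsync(fds[i]) != 0) {
            perror("fsync");
            rc = -1;
        }
        close(fds[i]);
    }
    return rc;
}
```

Splitting the work into a "start writeback" pass and a "wait" pass lets
much of the I/O for all of the files overlap, instead of serializing one
full fsync() after another.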

Or people could just upgrade their systems to Ubuntu 12.04 LTS...

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/624877

Title:
  INFO: task dpkg:23317 blocked for more than 120 seconds.

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/624877/+subscriptions

-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
