Arjan van de Ven wrote:
On Mon, 3 Dec 2007 11:27:15 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:
Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly
broken.
What should it do when the NFS server doesn't answer anymore or
when the network to the SAN RAID array located a few hundred KM
* Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > > Er, it won't play well if that happen when tasks are frozen for
> > > suspend.
> >
> > right now any suspend attempt times out after 20 seconds:
> >
> > $ grep TIMEOUT kernel/power/process.c
> > #define TIMEOUT (20 * HZ)
> >
On Monday, 3 of December 2007, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
>
> > > This feature will save one full reporter-developer round-trip during
> > > investigation of a significant number of bug reports.
> > >
> > > It might be more practical if it were to
* Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > This feature will save one full reporter-developer round-trip during
> > investigation of a significant number of bug reports.
> >
> > It might be more practical if it were to dump the traces for _all_
> > D-state processes when it fires -
On Monday, 3 of December 2007, Andrew Morton wrote:
> On Mon, 3 Dec 2007 15:19:25 +0100
> Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> > this patch extends the soft-lockup detector to automatically
> > detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
> > printed the following way:
> >
>
On Mon, 3 Dec 2007 15:19:25 +0100
Ingo Molnar <[EMAIL PROTECTED]> wrote:
> this patch extends the soft-lockup detector to automatically
> detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
> printed the following way:
>
> -->
> INFO: task prctl:3042 blocked for more
On Dec 3, 2007 6:17 AM, Andi Kleen <[EMAIL PROTECTED]> wrote:
> That won't address my concerns about already "breaking" (as in
> frightening the user etc.) common error handling scenarios by default.
Andi, may I respectfully submit that you're not understanding real users here?
Real users
> the scsi layer will have the IO totally aborted within that time anyway;
> the retry timeout for disks is 30 seconds after all.
There are blocking waits who wait for multiple IOs.
Also i think the SCSI driver can tune this anyways and I suspect
iSCSI and friends increase it (?)
-Andi
--
To
On Mon, 3 Dec 2007 11:27:15 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:
> > Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly
> > broken.
>
> What should it do when the NFS server doesn't answer anymore or
> when the network to the SAN RAID array located a few hundred KM away
>
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > debugging feature can be disabled/enabled on a wide scale already:
> >
> > - in the .config
> >
> > - runtime, temporarily, via:
> >
> > echo 0 > /proc/sys/kernel/hung_task_timeout_secs
>
> That won't address my concerns about already
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> Now Ingo's latest unreleased version with single line messages might
> be actually ok if he turns off the backtraces by default.
> Unfortunately I wasn't able to find out so far if he has done that or
> not, he always cuts away these parts of the
On Mon, Dec 03, 2007 at 02:55:47PM +0100, Ingo Molnar wrote:
>
> * Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > I would still appreciate if you could state what default value you
> > plan to set the backtrace sysctl to in the submitted patch.
>
> there's no "backtrace sysctl" planned for the
On Mon, Dec 03, 2007 at 02:59:16PM +0100, Ingo Molnar wrote:
> Andi, is that true? If yes, why didnt Andi state this concern outright,
> instead of pooh-pooh-ing the patch on various other grounds?
No of course not. Radoslaw is talking nonsense.
-Andi
--
To unsubscribe from this list: send the
> It's more like "lets warn about it and fix the problems when we find
> some."
It is already known there are lots of problems. I won't repeat
them because I already wrote too much about them. Feel free
to read back in the thread.
Now if all the known problems are fixed and only some hard to
* Radoslaw Szkodzinski <[EMAIL PROTECTED]> wrote:
> On Mon, 3 Dec 2007 14:29:56 +0100
> > * Andi Kleen <[EMAIL PROTECTED]> wrote:
> >
> > > > feedback about an impending catastrophy has been duly noted
> > >
> > > The point was less about an impending catastrophe, but more of a
> > > timebomb
* Pekka Enberg <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> > > "audit thousands of callsites in 8 million lines of code first" is a
> > > nice euphemism for hiding from the blame forever. We had 10 years for it
>
> On Dec 3, 2007 2:13 PM,
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> I would still appreciate if you could state what default value you
> plan to set the backtrace sysctl to in the submitted patch.
there's no "backtrace sysctl" planned for the moment. This "hung tasks"
debugging feature can be disabled/enabled on a
Hi,
On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> > "audit thousands of callsites in 8 million lines of code first" is a
> > nice euphemism for hiding from the blame forever. We had 10 years for it
On Dec 3, 2007 2:13 PM, Andi Kleen <[EMAIL PROTECTED]> wrote:
> Ok your approach
I would still appreciate if you could state what default value
you plan to set the backtrace sysctl to in the submitted patch.
-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at
On Mon, 3 Dec 2007 14:29:56 +0100
> * Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > > feedback about an impending catastrophy has been duly noted
> >
> > The point was less about an impending catastrophe, but more of a
> > timebomb ticking until the next widely used release.
I think I know why
> negative
I would consider it positive, but ok. If I was negative I would
probably not care and just make always sure to disable SOFTLOCKUP
in the kernels I use.
> feedback about an impending catastrophy has been duly noted
The point was less about an impending catastrophe, but more of a
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > you are over-designing it way too much - a backtrace is obviously
> > very helpful and it must be printed by default. There's enough
> > configurability in it already so that you can turn it off if you
> > want.
>
> So it will hit everybody first
On Mon, Dec 03, 2007 at 01:28:33PM +0100, Ingo Molnar wrote:
>
> > On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> > > no. (that's why i added the '(or a kill -9)' qualification above - if
> > > NFS is mounted noninterruptible then standard signals (such as Ctrl-C)
> > > should
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> > no. (that's why i added the '(or a kill -9)' qualification above - if
> > NFS is mounted noninterruptible then standard signals (such as Ctrl-C)
> > should not have an interrupting
On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> no. (that's why i added the '(or a kill -9)' qualification above - if
> NFS is mounted noninterruptible then standard signals (such as Ctrl-C)
> should not have an interrupting effect.)
NFS is already interruptible with umount -f (I
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Mon, Dec 03, 2007 at 11:38:15AM +0100, Ingo Molnar wrote:
> >
> > * Andi Kleen <[EMAIL PROTECTED]> wrote:
> >
> > > > Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
> > >
> > > What should it do when the NFS server doesn't
On Mon, Dec 03, 2007 at 11:38:15AM +0100, Ingo Molnar wrote:
>
> * Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > > Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
> >
> > What should it do when the NFS server doesn't answer anymore or when
> > the network to the SAN RAID
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
>
> What should it do when the NFS server doesn't answer anymore or when
> the network to the SAN RAID array located a few hundred KM away
> develops some hickup? [...]
maybe:
> Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
What should it do when the NFS server doesn't answer anymore or
when the network to the SAN RAID array located a few hundred KM away develops
some hickup? Or just the SCSI driver decides to do lengthy error
recovery --
* Radoslaw Szkodzinski <[EMAIL PROTECTED]> wrote:
> > iirc TASK_KILLABLE fixed NFS only. While that's a good thing there
> > are unfortunately a lot more subsystems that would need the same
> > treatment.
>
> Yes, that's exactly why the patch is needed - to find the bugs and fix
> them.
On Mon, 3 Dec 2007 10:55:01 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Sun, Dec 02, 2007 at 04:59:13PM -0800, Arjan van de Ven wrote:
> > On Mon, 3 Dec 2007 01:07:41 +0100
> > Andi Kleen <[EMAIL PROTECTED]> wrote:
> >
> > > This patch will likely work against that by breaking error paths.
On Sun, Dec 02, 2007 at 04:59:13PM -0800, Arjan van de Ven wrote:
> On Mon, 3 Dec 2007 01:07:41 +0100
> Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > > We really need to get better diagnostics for the
> > > bad-kernel-behavior-that-is-seen-as-bug cases. If we ever want to
> > > get to the scenario
On Sun, Dec 02, 2007 at 04:59:13PM -0800, Arjan van de Ven wrote:
On Mon, 3 Dec 2007 01:07:41 +0100
Andi Kleen [EMAIL PROTECTED] wrote:
We really need to get better diagnostics for the
bad-kernel-behavior-that-is-seen-as-bug cases. If we ever want to
get to the scenario where we have
* Andi Kleen [EMAIL PROTECTED] wrote:
Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
What should it do when the NFS server doesn't answer anymore or when
the network to the SAN RAID array located a few hundred KM away
develops some hickup? [...]
maybe: if the
On Mon, 3 Dec 2007 10:55:01 +0100
Andi Kleen [EMAIL PROTECTED] wrote:
On Sun, Dec 02, 2007 at 04:59:13PM -0800, Arjan van de Ven wrote:
On Mon, 3 Dec 2007 01:07:41 +0100
Andi Kleen [EMAIL PROTECTED] wrote:
This patch will likely work against that by breaking error paths.
it won't
* Radoslaw Szkodzinski [EMAIL PROTECTED] wrote:
iirc TASK_KILLABLE fixed NFS only. While that's a good thing there
are unfortunately a lot more subsystems that would need the same
treatment.
Yes, that's exactly why the patch is needed - to find the bugs and fix
them. Otherwise
Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
What should it do when the NFS server doesn't answer anymore or
when the network to the SAN RAID array located a few hundred KM away develops
some hickup? Or just the SCSI driver decides to do lengthy error
recovery --
On Mon, Dec 03, 2007 at 11:38:15AM +0100, Ingo Molnar wrote:
* Andi Kleen [EMAIL PROTECTED] wrote:
Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
What should it do when the NFS server doesn't answer anymore or when
the network to the SAN RAID array located a
* Andi Kleen [EMAIL PROTECTED] wrote:
On Mon, Dec 03, 2007 at 11:38:15AM +0100, Ingo Molnar wrote:
* Andi Kleen [EMAIL PROTECTED] wrote:
Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
What should it do when the NFS server doesn't answer anymore or when
* Andi Kleen [EMAIL PROTECTED] wrote:
On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
no. (that's why i added the '(or a kill -9)' qualification above - if
NFS is mounted noninterruptible then standard signals (such as Ctrl-C)
should not have an interrupting effect.)
NFS
On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
no. (that's why i added the '(or a kill -9)' qualification above - if
NFS is mounted noninterruptible then standard signals (such as Ctrl-C)
should not have an interrupting effect.)
NFS is already interruptible with umount -f (I
On Mon, Dec 03, 2007 at 01:28:33PM +0100, Ingo Molnar wrote:
On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
no. (that's why i added the '(or a kill -9)' qualification above - if
NFS is mounted noninterruptible then standard signals (such as Ctrl-C)
should not have an
* Andi Kleen [EMAIL PROTECTED] wrote:
you are over-designing it way too much - a backtrace is obviously
very helpful and it must be printed by default. There's enough
configurability in it already so that you can turn it off if you
want.
So it will hit everybody first before they
negative
I would consider it positive, but ok. If I was negative I would
probably not care and just make always sure to disable SOFTLOCKUP
in the kernels I use.
feedback about an impending catastrophy has been duly noted
The point was less about an impending catastrophe, but more of a
* Radoslaw Szkodzinski [EMAIL PROTECTED] wrote:
On Mon, 3 Dec 2007 14:29:56 +0100
* Andi Kleen [EMAIL PROTECTED] wrote:
feedback about an impending catastrophy has been duly noted
The point was less about an impending catastrophe, but more of a
timebomb ticking until the
* Andi Kleen [EMAIL PROTECTED] wrote:
I would still appreciate if you could state what default value you
plan to set the backtrace sysctl to in the submitted patch.
there's no backtrace sysctl planned for the moment. This hung tasks
debugging feature can be disabled/enabled on a wide scale
Hi,
On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
audit thousands of callsites in 8 million lines of code first is a
nice euphemism for hiding from the blame forever. We had 10 years for it
On Dec 3, 2007 2:13 PM, Andi Kleen [EMAIL PROTECTED] wrote:
Ok your approach is then
On Mon, 3 Dec 2007 14:29:56 +0100
* Andi Kleen [EMAIL PROTECTED] wrote:
feedback about an impending catastrophy has been duly noted
The point was less about an impending catastrophe, but more of a
timebomb ticking until the next widely used release.
I think I know why Andi is so
I would still appreciate if you could state what default value
you plan to set the backtrace sysctl to in the submitted patch.
-Andi
* Pekka Enberg [EMAIL PROTECTED] wrote:
Hi,
On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
audit thousands of callsites in 8 million lines of code first is a
nice euphemism for hiding from the blame forever. We had 10 years for it
On Dec 3, 2007 2:13 PM, Andi Kleen
* Andi Kleen [EMAIL PROTECTED] wrote:
debugging feature can be disabled/enabled on a wide scale already:
- in the .config
- runtime, temporarily, via:
echo 0 > /proc/sys/kernel/hung_task_timeout_secs
That won't address my concerns about already breaking (as in
On Mon, Dec 03, 2007 at 02:55:47PM +0100, Ingo Molnar wrote:
* Andi Kleen [EMAIL PROTECTED] wrote:
I would still appreciate if you could state what default value you
plan to set the backtrace sysctl to in the submitted patch.
there's no backtrace sysctl planned for the moment. This
It's more like lets warn about it and fix the problems when we find
some.
It is already known there are lots of problems. I won't repeat
them because I already wrote too much about them. Feel free
to read back in the thread.
Now if all the known problems are fixed and only some hard to know
* Andi Kleen [EMAIL PROTECTED] wrote:
Now Ingo's latest unreleased version with single line messages might
be actually ok if he turns off the backtraces by default.
Unfortunately I wasn't able to find out so far if he has done that or
not, he always cuts away these parts of the emails.
On Mon, Dec 03, 2007 at 02:59:16PM +0100, Ingo Molnar wrote:
Andi, is that true? If yes, why didnt Andi state this concern outright,
instead of pooh-pooh-ing the patch on various other grounds?
No of course not. Radoslaw is talking nonsense.
-Andi
On Mon, 3 Dec 2007 11:27:15 +0100
Andi Kleen [EMAIL PROTECTED] wrote:
Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly
broken.
What should it do when the NFS server doesn't answer anymore or
when the network to the SAN RAID array located a few hundred KM away
develops some
the scsi layer will have the IO totally aborted within that time anyway;
the retry timeout for disks is 30 seconds after all.
There are blocking waits who wait for multiple IOs.
Also i think the SCSI driver can tune this anyways and I suspect
iSCSI and friends increase it (?)
-Andi
On Dec 3, 2007 6:17 AM, Andi Kleen [EMAIL PROTECTED] wrote:
That won't address my concerns about already breaking (as in
frightening the user etc.) common error handling scenarios by default.
Andi, may I respectfully submit that you're not understanding real users here?
Real users either:
-
On Mon, 3 Dec 2007 15:19:25 +0100
Ingo Molnar [EMAIL PROTECTED] wrote:
this patch extends the soft-lockup detector to automatically
detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
printed the following way:
--
INFO: task prctl:3042 blocked for more than 120
On Monday, 3 of December 2007, Andrew Morton wrote:
On Mon, 3 Dec 2007 15:19:25 +0100
Ingo Molnar [EMAIL PROTECTED] wrote:
this patch extends the soft-lockup detector to automatically
detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
printed the following way:
* Rafael J. Wysocki [EMAIL PROTECTED] wrote:
This feature will save one full reporter-developer round-trip during
investigation of a significant number of bug reports.
It might be more practical if it were to dump the traces for _all_
D-state processes when it fires - basically an
On Monday, 3 of December 2007, Ingo Molnar wrote:
* Rafael J. Wysocki [EMAIL PROTECTED] wrote:
This feature will save one full reporter-developer round-trip during
investigation of a significant number of bug reports.
It might be more practical if it were to dump the traces for
* Rafael J. Wysocki [EMAIL PROTECTED] wrote:
Er, it won't play well if that happen when tasks are frozen for
suspend.
right now any suspend attempt times out after 20 seconds:
$ grep TIMEOUT kernel/power/process.c
#define TIMEOUT (20 * HZ)
end_time = jiffies +
On Mon, 3 Dec 2007 01:07:41 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:
> > We really need to get better diagnostics for the
> > bad-kernel-behavior-that-is-seen-as-bug cases. If we ever want to
> > get to the scenario where we have a more or less robust measure of
> > kernel quality (and we're
> We really need to get better diagnostics for the
> bad-kernel-behavior-that-is-seen-as-bug cases. If we ever want to get
> to the scenario where we have a more or less robust measure of kernel
> quality (and we're not all that far off for several cases), one thing
One measure to kernel quality
> Delay accounting (or the /proc/<PID>/sched fields that i added recently)
> only get updated once a task has finished its unreasonably long delay
> and has scheduled.
If it is stuck forever then you can just use sysrq-t
If it recovers delay accounting will catch it.
> detected_ this way. This is
On Sun, 2 Dec 2007 21:47:25 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:
> > Out of direct experience, 95% of the "too long delay" cases are
> > plain old bugs. The rest we can (and must!) convert to
> > TASK_KILLABLE or could
>
> I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs).
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > do you realize that more than 120 seconds TASK_UNINTERRUPTIBLE _is_
> > something that most humans consider as "buggy" in the overwhelming
> > majority of cases, regardless of the reason? Yes, there are and will
> > be some exceptions, but not
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > Until now users had little direct recourse to get such problems
> > fixed. (we had sysrq-t, but that included no real metric of how long
> > a task was
>
> Actually task delay accounting can measure this now. iirc someone had
> a latencytop based
On Sun, 2 Dec 2007 22:19:25 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:
> >
> > Until now users had little direct recourse to get such problems
> > fixed. (we had sysrq-t, but that included no real metric of how
> > long a task was
>
> Actually task delay accounting can measure this now. iirc
Ingo Molnar <[EMAIL PROTECTED]> writes:
>
> do you realize that more than 120 seconds TASK_UNINTERRUPTIBLE _is_
> something that most humans consider as "buggy" in the overwhelming
> majority of cases, regardless of the reason? Yes, there are and will be
> some exceptions, but not nearly as
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Sun, Dec 02, 2007 at 10:10:27PM +0100, Ingo Molnar wrote:
> > what if you considered - just for a minute - the possibility of this
> > debug tool being the thing that actually animates developers to fix such
> > long delay bugs that have bothered
On Sun, Dec 02, 2007 at 10:10:27PM +0100, Ingo Molnar wrote:
> what if you considered - just for a minute - the possibility of this
> debug tool being the thing that actually animates developers to fix such
> long delay bugs that have bothered users for almost a decade meanwhile?
Throwing
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > Out of direct experience, 95% of the "too long delay" cases are plain
> > old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could
>
> I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs). It
> would be pretty bad to
> Out of direct experience, 95% of the "too long delay" cases are plain
> old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could
I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs).
It would be pretty bad to merge this patch without converting them to
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > .. and it's even a tool to show where we missed making something
> > TASK_KILLABLE... anything that triggers from NFS and the like really
> > ought to be TASK_KILLABLE after all. This patch will point any
> > omissions out quite nicely without
* Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> > TASK_KILLABLE should be the right solution i think.
>
> .. and it's even a tool to show where we missed making something
> TASK_KILLABLE... anything that triggers from NFS and the like really
> ought to be TASK_KILLABLE after all. This patch
> .. and it's even a tool to show where we missed making something
> TASK_KILLABLE... anything that triggers from NFS and the like really
> ought to be TASK_KILLABLE after all. This patch will point any
> omissions out quite nicely without having to do any kind of destructive
> testing.
It would
On Sun, 2 Dec 2007 19:59:45 +0100
Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> * Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > Ingo Molnar <[EMAIL PROTECTED]> writes:
> >
> > > this patch extends the soft-lockup detector to automatically
> > > detect hung TASK_UNINTERRUPTIBLE tasks. Such hung
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> Ingo Molnar <[EMAIL PROTECTED]> writes:
>
> > this patch extends the soft-lockup detector to automatically
> > detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
> > printed the following way:
>
> That will likely trigger anytime a hard
Ingo Molnar <[EMAIL PROTECTED]> writes:
> this patch extends the soft-lockup detector to automatically
> detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
> printed the following way:
That will likely trigger anytime a hard nfs/cifs mount loses its
server for 120s. To make this work
On Sun, 2 Dec 2007, Ingo Oeser wrote:
> > maybe, but we'd have to see how often this gets triggered. An OOM is
> > something that could happen in any overloaded system - while a hung task
> > is likely due to a kernel bug.
>
> What about a client using hard mounted NFS shares here? That
* Ingo Oeser <[EMAIL PROTECTED]> wrote:
> On Saturday 01 December 2007, Ingo Molnar wrote:
> > maybe, but we'd have to see how often this gets triggered. An OOM is
> > something that could happen in any overloaded system - while a hung task
> > is likely due to a kernel bug.
>
> What about a
* Ingo Oeser [EMAIL PROTECTED] wrote:
On Saturday 01 December 2007, Ingo Molnar wrote:
maybe, but we'd have to see how often this gets triggered. An OOM is
something that could happen in any overloaded system - while a hung task
is likely due to a kernel bug.
What about a client
On Sun, 2 Dec 2007, Ingo Oeser wrote:
maybe, but we'd have to see how often this gets triggered. An OOM is
something that could happen in any overloaded system - while a hung task
is likely due to a kernel bug.
What about a client using hard mounted NFS shares here? That shouldn't be
Ingo Molnar [EMAIL PROTECTED] writes:
this patch extends the soft-lockup detector to automatically
detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
printed the following way:
That will likely trigger anytime a hard nfs/cifs mount loses its
server for 120s. To make this work you
* Andi Kleen [EMAIL PROTECTED] wrote:
Ingo Molnar [EMAIL PROTECTED] writes:
this patch extends the soft-lockup detector to automatically
detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
printed the following way:
That will likely trigger anytime a hard nfs/cifs mount loses
On Sun, 2 Dec 2007 19:59:45 +0100
Ingo Molnar [EMAIL PROTECTED] wrote:
* Andi Kleen [EMAIL PROTECTED] wrote:
Ingo Molnar [EMAIL PROTECTED] writes:
this patch extends the soft-lockup detector to automatically
detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
printed
* Andi Kleen [EMAIL PROTECTED] wrote:
.. and it's even a tool to show where we missed making something
TASK_KILLABLE... anything that triggers from NFS and the like really
ought to be TASK_KILLABLE after all. This patch will point any
omissions out quite nicely without having to do
* Arjan van de Ven [EMAIL PROTECTED] wrote:
TASK_KILLABLE should be the right solution i think.
.. and it's even a tool to show where we missed making something
TASK_KILLABLE... anything that triggers from NFS and the like really
ought to be TASK_KILLABLE after all. This patch will
Out of direct experience, 95% of the too long delay cases are plain
old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could
I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs).
It would be pretty bad to merge this patch without converting them to
TASK_KILLABLE
.. and it's even a tool to show where we missed making something
TASK_KILLABLE... anything that triggers from NFS and the like really
ought to be TASK_KILLABLE after all. This patch will point any
omissions out quite nicely without having to do any kind of destructive
testing.
It would be
On Sun, Dec 02, 2007 at 10:10:27PM +0100, Ingo Molnar wrote:
what if you considered - just for a minute - the possibility of this
debug tool being the thing that actually animates developers to fix such
long delay bugs that have bothered users for almost a decade meanwhile?
Throwing frequent
* Andi Kleen [EMAIL PROTECTED] wrote:
On Sun, Dec 02, 2007 at 10:10:27PM +0100, Ingo Molnar wrote:
what if you considered - just for a minute - the possibility of this
debug tool being the thing that actually animates developers to fix such
long delay bugs that have bothered users for
Ingo Molnar [EMAIL PROTECTED] writes:
do you realize that more than 120 seconds TASK_UNINTERRUPTIBLE _is_
something that most humans consider as buggy in the overwhelming
majority of cases, regardless of the reason? Yes, there are and will be
some exceptions, but not nearly as countless as
* Andi Kleen [EMAIL PROTECTED] wrote:
Out of direct experience, 95% of the too long delay cases are plain
old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could
I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs). It
would be pretty bad to merge this
* Andi Kleen [EMAIL PROTECTED] wrote:
do you realize that more than 120 seconds TASK_UNINTERRUPTIBLE _is_
something that most humans consider as buggy in the overwhelming
majority of cases, regardless of the reason? Yes, there are and will
be some exceptions, but not nearly as
On Sun, 2 Dec 2007 22:19:25 +0100
Andi Kleen [EMAIL PROTECTED] wrote:
Until now users had little direct recourse to get such problems
fixed. (we had sysrq-t, but that included no real metric of how
long a task was
Actually task delay accounting can measure this now. iirc someone
* Andi Kleen [EMAIL PROTECTED] wrote:
Until now users had little direct recourse to get such problems
fixed. (we had sysrq-t, but that included no real metric of how long
a task was
Actually task delay accounting can measure this now. iirc someone had
a latencytop based on it