subject:"Re\: \[BUG\] Strange 1\-second pauses during Resume\-from\-RAM"

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-04 Thread Ingo Molnar


* Jörn Engel <[EMAIL PROTECTED]> wrote:

> On Mon, 3 December 2007 01:57:02 +0100, Jörn Engel wrote:
> > 
> > After an eternity of compile time, this config does generate some useful
> > output.  qemu is not to blame.
> 
> Or is it?  The output definitely looks suspicious.  Large amounts of 
> code get processed within a microsecond, while update_wall_time() 
> appears to cause huge delays every time it is called: 
> http://logfs.org/~joern/trace
> 
> Does this output make sense or does it rather indicate some sloppiness 
> wrt. time in the qemu virtual machine?

not sure. It could be qemu being scheduled away? You could try to run 
qemu with nice -20 or so, to avoid getting preempted. If time lapses 
like this still show up:

 trace-cm 434   0D.h. 1008us!: do_timer (tick_periodic)
 trace-cm 434   0D.h. 1972us+: update_wall_time (do_timer)

 trace-cm 434   0D.h. 1008us!: do_timer (tick_periodic)
 trace-cm 434   0D.h. 1972us+: update_wall_time (do_timer)

then that could indicate a timekeeping weirdness, OR it could mean that 
qemu is simply very slow. (there could be timer hw access between those 
two function calls)

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-04 Thread Ingo Molnar


* Jörn Engel [EMAIL PROTECTED] wrote:

 On Mon, 3 December 2007 01:57:02 +0100, Jörn Engel wrote:
  
  After an eternity of compile time, this config does generate some useful
  output.  qemu is not to blame.
 
 Or is it?  The output definitely looks suspicious.  Large amounts of 
 code get processed within a microsecond, while update_wall_time() 
 appears to cause huge delays every time it is called: 
 http://logfs.org/~joern/trace
 
 Does this output make sense or does it rather indicate some sloppiness 
 wrt. time in the qemu virtual machine?

not sure. It could be qemu being scheduled away? You could try to run 
qemu with nice -20 or so, to avoid getting preempted. If time lapses 
like this still show up:

 trace-cm 434   0D.h. 1008us!: do_timer (tick_periodic)
 trace-cm 434   0D.h. 1972us+: update_wall_time (do_timer)

 trace-cm 434   0D.h. 1008us!: do_timer (tick_periodic)
 trace-cm 434   0D.h. 1972us+: update_wall_time (do_timer)

then that could indicate a timekeeping weirdness, OR it could mean that 
qemu is simply very slow. (there could be timer hw access between those 
two function calls)

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-03 Thread Jörn Engel

On Mon, 3 December 2007 01:57:02 +0100, Jörn Engel wrote:
> 
> After an eternity of compile time, this config does generate some useful
> output.  qemu is not to blame.

Or is it?  The output definitely looks suspicious.  Large amounts of
code get processed within a microsecond, while update_wall_time()
appears to cause huge delays every time it is called:
http://logfs.org/~joern/trace

Does this output make sense or does it rather indicate some sloppiness
wrt. time in the qemu virtual machine?

Jörn

-- 
tglx1 thinks that joern should get a (TM) for "Thinking Is Hard"
-- Thomas Gleixner
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-03 Thread Jörn Engel

On Mon, 3 December 2007 01:57:02 +0100, Jörn Engel wrote:
 
 After an eternity of compile time, this config does generate some useful
 output.  qemu is not to blame.

Or is it?  The output definitely looks suspicious.  Large amounts of
code get processed within a microsecond, while update_wall_time()
appears to cause huge delays every time it is called:
http://logfs.org/~joern/trace

Does this output make sense or does it rather indicate some sloppiness
wrt. time in the qemu virtual machine?

Jörn

-- 
tglx1 thinks that joern should get a (TM) for Thinking Is Hard
-- Thomas Gleixner
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 22:19:00 +0100, Ingo Molnar wrote:
> * Jörn Engel <[EMAIL PROTECTED]> wrote:
> 
> > Maybe one more thing: can you send me the config you used for the 
> > setup above?  I'd like to know whether qemu or my config is to blame.
> 
> sure - attached.

After an eternity of compile time, this config does generate some useful
output.  qemu is not to blame.

Jörn

-- 
Joern's library part 9:
http://www.scl.ameslab.gov/Publications/Gus/TwelveWays.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 21:45:59 +0100, Ingo Molnar wrote:
> 
> to capture that trace i did not use -rt, i just patched latest -git 
> with:
> 
>   
> http://people.redhat.com/mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch
> 
> (this has your fixes included already)
> 
> have done:
> 
>   echo 1 > /proc/sys/kernel/mcount_enabled
> 
> and have run:
> 
>   ./trace-cmd sleep 1 > trace.txt
> 
>   http://people.redhat.com/mingo/latency-tracing-patches/trace-cmd.c
> 
> to capture a 1 second trace of what the system is doing. I think your 
> troubles are due to running it within a qemu guest - that is not a 
> typical utilization so you are on unchartered waters.

Maybe one more thing: can you send me the config you used for the setup
above?  I'd like to know whether qemu or my config is to blame.

Jörn

-- 
Eighty percent of success is showing up.
-- Woody Allen
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 21:45:59 +0100, Ingo Molnar wrote:
> 
> to capture a 1 second trace of what the system is doing. I think your 
> troubles are due to running it within a qemu guest - that is not a 
> typical utilization so you are on unchartered waters.

Looks like it.  Guess I'll switch to something else for the moment.

Jörn

-- 
Linux is more the core point of a concept that surrounds "open source"
which, in turn, is based on a false concept. This concept is that
people actually want to look at source code.
-- Rob Enderle
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Ingo Molnar


* Jörn Engel <[EMAIL PROTECTED]> wrote:

> > oprofile helps if you can reliably reproduce the slowdown in a loop 
> > or for a long amount of time, with lots of CPU utilization - and 
> > then it's also lower overhead. The tracer can be used to capture 
> > rare or complex events, and gives the full flow control and what is 
> > happening within the kernel.
> 
> Such a trace would be useful indeed.  But so far the patch has only 
> given me grief and nothing remotely like useful output.  Maybe I 
> should simply use the complete -rt patch instead of debugging the 
> broken-out latency-tracer patch.

to capture that trace i did not use -rt, i just patched latest -git 
with:

  
http://people.redhat.com/mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch

(this has your fixes included already)

have done:

  echo 1 > /proc/sys/kernel/mcount_enabled

and have run:

  ./trace-cmd sleep 1 > trace.txt

  http://people.redhat.com/mingo/latency-tracing-patches/trace-cmd.c

to capture a 1 second trace of what the system is doing. I think your 
troubles are due to running it within a qemu guest - that is not a 
typical utilization so you are on unchartered waters.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 21:07:22 +0100, Ingo Molnar wrote:
> * Jörn Engel <[EMAIL PROTECTED]> wrote:
> 
> > Result looked like a livelock and finally convinced me to abandon the 
> > latency tracer.  Sorry, but it appears to be the right tool for the 
> > wrong job.
> 
> hm, we routinely use it in -rt to capture "what on earth is happening" 
> incidents. The snippet below is a random snipped from a trace that i've 
> just captured, with mcount enabled. It seems to work fine here, with and 
> without mcount. (pit clocksource is almost never used, that's why you 
> had those early problems.)
> 
> oprofile helps if you can reliably reproduce the slowdown in a loop or 
> for a long amount of time, with lots of CPU utilization - and then it's 
> also lower overhead. The tracer can be used to capture rare or complex 
> events, and gives the full flow control and what is happening within the 
> kernel.

Such a trace would be useful indeed.  But so far the patch has only
given me grief and nothing remotely like useful output.  Maybe I should
simply use the complete -rt patch instead of debugging the broken-out
latency-tracer patch.

Jörn

-- 
Mundie uses a textbook tactic of manipulation: start with some
reasonable talk, and lead the audience to an unreasonable conclusion.
-- Bruce Perens
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Ingo Molnar


* Jörn Engel <[EMAIL PROTECTED]> wrote:

> On Sun, 2 December 2007 16:47:46 +0100, Ingo Molnar wrote:
> > 
> > well what does the trace say, where do the delays come from? To get a 
> > quick overview you can make tracing lighter weight by doing:
> > 
> >  echo 0 > /proc/sys/kernel/mcount_enabled
> >  echo 1 > /proc/sys/kernel/trace_syscalls
> 
> I mistyped and did 
>  echo 1 > /proc/sys/kernel/mcount_enabled
> 
> Result looked like a livelock and finally convinced me to abandon the 
> latency tracer.  Sorry, but it appears to be the right tool for the 
> wrong job.

hm, we routinely use it in -rt to capture "what on earth is happening" 
incidents. The snippet below is a random snipped from a trace that i've 
just captured, with mcount enabled. It seems to work fine here, with and 
without mcount. (pit clocksource is almost never used, that's why you 
had those early problems.)

oprofile helps if you can reliably reproduce the slowdown in a loop or 
for a long amount of time, with lots of CPU utilization - and then it's 
also lower overhead. The tracer can be used to capture rare or complex 
events, and gives the full flow control and what is happening within the 
kernel.

Ingo

>
   0 1D...  811us : sched_clock_idle_sleep_event (acpi_processor_idle)
   0 1D...  813us : _spin_lock (sched_clock_idle_sleep_event)
trace-cm 2463  0  814us : native_flush_tlb_others (flush_tlb_mm)
   0 1D...  815us : __update_rq_clock (sched_clock_idle_sleep_event)
trace-cm 2463  0  817us : _spin_lock (native_flush_tlb_others)
   0 1D...  817us+: acpi_cstate_enter (acpi_processor_idle)
trace-cm 2463  0  820us+: send_IPI_mask_bitmask (native_flush_tlb_others)
trace-cm 2463  0D...  823us+: apic_wait_icr_idle (send_IPI_mask_bitmask)
trace-cm 2463  0  856us+: up_write (copy_process)
trace-cm 2463  0  859us+: copy_keys (copy_process)
trace-cm 2463  0  862us+: copy_namespaces (copy_process)
trace-cm 2463  0  865us+: copy_thread (copy_process)
trace-cm 2463  0  868us+: memcpy (copy_thread)
trace-cm 2463  0  871us+: alloc_pid (copy_process)
trace-cm 2463  0  874us+: kmem_cache_alloc (alloc_pid)
trace-cm 2463  0  877us+: _spin_lock_irq (alloc_pid)
trace-cm 2463  0  880us+: _write_lock_irq (copy_process)
trace-cm 2463  0D...  883us+: _spin_lock (copy_process)
trace-cm 2463  0D...  887us+: recalc_sigpending (copy_process)
trace-cm 2463  0D...  890us+: recalc_sigpending_tsk (recalc_sigpending)
trace-cm 2463  0D...  893us+: attach_pid (copy_process)
trace-cm 2463  0D...  896us+: attach_pid (copy_process)
trace-cm 2463  0D...  899us+: attach_pid (copy_process)
trace-cm 2463  0  902us+: proc_fork_connector (copy_process)
trace-cm 2463  0  905us+: wake_up_new_task (do_fork)
trace-cm 2463  0  908us+: task_rq_lock (wake_up_new_task)
trace-cm 2463  0D...  911us+: _spin_lock (task_rq_lock)
trace-cm 2463  0D...  914us+: update_rq_clock (wake_up_new_task)
trace-cm 2463  0D...  918us+: __update_rq_clock (update_rq_clock)
trace-cm 2463  0D...  921us+: effective_prio (wake_up_new_task)
trace-cm 2463  0D...  924us+: wake_up_new_task  (0 0)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 16:47:46 +0100, Ingo Molnar wrote:
> 
> well what does the trace say, where do the delays come from? To get a 
> quick overview you can make tracing lighter weight by doing:
> 
>  echo 0 > /proc/sys/kernel/mcount_enabled
>  echo 1 > /proc/sys/kernel/trace_syscalls

I mistyped and did 
 echo 1 > /proc/sys/kernel/mcount_enabled

Result looked like a livelock and finally convinced me to abandon the
latency tracer.  Sorry, but it appears to be the right tool for the
wrong job.

Jörn

-- 
They laughed at Galileo.  They laughed at Copernicus.  They laughed at
Columbus. But remember, they also laughed at Bozo the Clown.
-- unknown
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Ingo Molnar


* Jörn Engel <[EMAIL PROTECTED]> wrote:

> I do.  Went through 10odd runs and annotated the function right below 
> mcount each time.  Seems to work now.
> 
> Trouble is that it doesn't solve my real problem at hand.  Something 
> is causing significant delays when writing to logfs.  Core logfs code 
> is not running, but may cause whatever other code is running and 
> burning up all the cpu time.  Wasting 100ms of "qemu-time" to write a 
> single page happens fairly frequently.
> 
> With the latency tracer the problem appears to have become worse.  Now 
> the loftlockup code triggers quite frequently.  Which makes a bit of 
> sense, as the problem is a busy CPU, rather than an idle one.

well what does the trace say, where do the delays come from? To get a 
quick overview you can make tracing lighter weight by doing:

 echo 0 > /proc/sys/kernel/mcount_enabled
 echo 1 > /proc/sys/kernel/trace_syscalls

(this turns the latency tracer into a "global strace" kind of tracer)

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Ingo Molnar


* Jörn Engel <[EMAIL PROTECTED]> wrote:

> On Sun, 2 December 2007 13:31:25 +0100, Jörn Engel wrote:
> > 
> > After another ten or so notrace annotations throughout the spinlock
> > code, the latency tracer appears to work.  Not sure how many useful
> > information is missing through all the annotations, though.
> 
> And here is a patch with the needed annotations.  Looks a bit shabby, 
> as it was generated though git diff, patcher, interdiff and 
> combinediff.

thanks - the patches applied fine, i've added them to my latency-tracer 
patchqueue. Steve Rostedt might want to pick the fixes up for -rt as 
well.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 13:31:25 +0100, Jörn Engel wrote:
> 
> After another ten or so notrace annotations throughout the spinlock
> code, the latency tracer appears to work.  Not sure how many useful
> information is missing through all the annotations, though.

And here is a patch with the needed annotations.  Looks a bit shabby, as
it was generated though git diff, patcher, interdiff and combinediff.

Jörn

-- 
Joern's library part 10:
http://blogs.msdn.com/David_Gristwood/archive/2004/06/24/164849.aspx

unchanged:
--- a/arch/x86/kernel/i8253.c
+++ b/arch/x86/kernel/i8253.c
@@ -125,7 +125,7 @@ void __init setup_pit_timer(void)
  * to just read by itself. So use jiffies to emulate a free
  * running counter:
  */
-static cycle_t pit_read(void)
+static notrace cycle_t pit_read(void)
 {
unsigned long flags;
int count;
unchanged:
--- a/kernel/spinlock.c
+++ b/kernel/spinlock.c
@@ -76,7 +76,7 @@ void __lockfunc _read_lock(rwlock_t *lock)
 }
 EXPORT_SYMBOL(_read_lock);
 
-unsigned long __lockfunc _spin_lock_irqsave(spinlock_t *lock)
+unsigned long notrace __lockfunc _spin_lock_irqsave(spinlock_t *lock)
 {
unsigned long flags;
 
@@ -341,7 +341,7 @@ void __lockfunc _read_unlock(rwlock_t *lock)
 }
 EXPORT_SYMBOL(_read_unlock);
 
-void __lockfunc _spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
+void notrace __lockfunc _spin_unlock_irqrestore(spinlock_t *lock, unsigned 
long flags)
 {
spin_release(>dep_map, 1, _RET_IP_);
_raw_spin_unlock(lock);
unchanged:
--- a/lib/spinlock_debug.c
+++ b/lib/spinlock_debug.c
@@ -148,7 +148,7 @@ int _raw_spin_trylock(spinlock_t *lock)
return ret;
 }
 
-void _raw_spin_unlock(spinlock_t *lock)
+void notrace _raw_spin_unlock(spinlock_t *lock)
 {
debug_spin_unlock(lock);
__raw_spin_unlock(>raw_lock);
only in patch2:
unchanged:
--- linux/arch/x86/kernel/tsc_32.c
+++ linux-2.6.24-rc3logfs/arch/x86/kernel/tsc_32.c  2007-12-02 
15:21:15.0 +0100
@@ -92,7 +92,7 @@
 /*
  * Scheduler clock - returns current time in nanosec units.
  */
-unsigned long long native_sched_clock(void)
+unsigned notrace long long native_sched_clock(void)
 {
unsigned long long this_offset;
 
only in patch2:
unchanged:
--- linux/kernel/lockdep.c
+++ linux-2.6.24-rc3logfs/kernel/lockdep.c  2007-12-02 15:21:16.0 
+0100
@@ -139,7 +139,7 @@
return i;
 }
 
-static void lock_time_inc(struct lock_time *lt, s64 time)
+static notrace void lock_time_inc(struct lock_time *lt, s64 time)
 {
if (time > lt->max)
lt->max = time;
@@ -198,7 +198,7 @@
memset(class->contention_point, 0, sizeof(class->contention_point));
 }
 
-static struct lock_class_stats *get_lock_stats(struct lock_class *class)
+static notrace struct lock_class_stats *get_lock_stats(struct lock_class 
*class)
 {
return _cpu_var(lock_stats)[class - lock_classes];
 }
@@ -208,7 +208,7 @@
put_cpu_var(lock_stats);
 }
 
-static void lock_release_holdtime(struct held_lock *hlock)
+static notrace void lock_release_holdtime(struct held_lock *hlock)
 {
struct lock_class_stats *stats;
s64 holdtime;
@@ -2872,7 +2872,7 @@
 }
 EXPORT_SYMBOL_GPL(lock_contended);
 
-void lock_acquired(struct lockdep_map *lock)
+void notrace lock_acquired(struct lockdep_map *lock)
 {
unsigned long flags;
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 14:57:11 +0100, Ingo Molnar wrote:
> 
> hm, do you have CONFIG_FRAME_POINTERS=y, i.e. are the dumps reliable?

I do.  Went through 10odd runs and annotated the function right below
mcount each time.  Seems to work now.

Trouble is that it doesn't solve my real problem at hand.  Something is
causing significant delays when writing to logfs.  Core logfs code is
not running, but may cause whatever other code is running and burning up
all the cpu time.  Wasting 100ms of "qemu-time" to write a single page
happens fairly frequently.

With the latency tracer the problem appears to have become worse.  Now
the loftlockup code triggers quite frequently.  Which makes a bit of
sense, as the problem is a busy CPU, rather than an idle one.

Guess I'll try oprofile or lcov instead.

Jörn

-- 
Joern's library part 5:
http://www.faqs.org/faqs/compression-faq/part2/section-9.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Ingo Molnar


* Jörn Engel <[EMAIL PROTECTED]> wrote:

> On Sun, 2 December 2007 09:56:08 +0100, Ingo Molnar wrote:
> > * Jörn Engel <[EMAIL PROTECTED]> wrote:
> > 
> > > > ah. You should mark pit_read() function as notrace. PIT clocksource 
> > > > is rare. (add the 'notrace' word to the function prototype)
> > > 
> > > Hardly a change at all.  Apart from some offsets, this dump is 
> > > identical.
> > > 
> > > stopped custom tracer.
> > > BUG: spinlock recursion on CPU#0, sh/954
> > >  lock: c030f280, .magic: dead4ead, .owner: sh/954, .owner_cpu: 0
> > > Pid: 954, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #3
> > >  [] show_trace_log_lvl+0x35/0x54
> > >  [] show_trace+0x2c/0x2e
> > >  [] dump_stack+0x84/0x8a
> > >  [] spin_bug+0xa7/0xae
> > >  [] _raw_spin_lock+0x45/0xfa
> > >  [] _spin_lock_irqsave+0x68/0x7a
> > >  [] pit_read+0xf/0x91
> > >  [] get_monotonic_cycles+0xf/0x2d
> > >  [] now+0x2a/0x7c
> > >  [] trace+0x4d/0x1e8
> > >  [] __mcount+0x95/0xa6
> > >  [] mcount+0x14/0x18
> > >  [] lock_acquired+0xe/0x1d7
> > >  [] _spin_lock_irqsave+0x70/0x7a
> > >  [] pit_read+0xf/0x91
> > 
> > hm, it seems lock_acquired() [in kernel/lockdep.c] needs to be marked 
> > 'notrace' too - otherwise we recurse back into pit_read().
> 
> This time not even the offsets have changed.  Dump is identical.

hm, do you have CONFIG_FRAME_POINTERS=y, i.e. are the dumps reliable?

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Ingo Molnar


* Jörn Engel <[EMAIL PROTECTED]> wrote:

> On Sun, 2 December 2007 12:31:43 +0100, Jörn Engel wrote:
> > 
> > This time not even the offsets have changed.  Dump is identical.
> 
> After another ten or so notrace annotations throughout the spinlock 
> code, the latency tracer appears to work.  Not sure how many useful 
> information is missing through all the annotations, though.

a few annotations out of thousands of function calls in the kernel it's 
usually not significant.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 12:31:43 +0100, Jörn Engel wrote:
> 
> This time not even the offsets have changed.  Dump is identical.

After another ten or so notrace annotations throughout the spinlock
code, the latency tracer appears to work.  Not sure how many useful
information is missing through all the annotations, though.

Jörn

-- 
Das Aufregende am Schreiben ist es, eine Ordnung zu schaffen, wo
vorher keine existiert hat.
-- Doris Lessing
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 09:56:08 +0100, Ingo Molnar wrote:
> * Jörn Engel <[EMAIL PROTECTED]> wrote:
> 
> > > ah. You should mark pit_read() function as notrace. PIT clocksource 
> > > is rare. (add the 'notrace' word to the function prototype)
> > 
> > Hardly a change at all.  Apart from some offsets, this dump is 
> > identical.
> > 
> > stopped custom tracer.
> > BUG: spinlock recursion on CPU#0, sh/954
> >  lock: c030f280, .magic: dead4ead, .owner: sh/954, .owner_cpu: 0
> > Pid: 954, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #3
> >  [] show_trace_log_lvl+0x35/0x54
> >  [] show_trace+0x2c/0x2e
> >  [] dump_stack+0x84/0x8a
> >  [] spin_bug+0xa7/0xae
> >  [] _raw_spin_lock+0x45/0xfa
> >  [] _spin_lock_irqsave+0x68/0x7a
> >  [] pit_read+0xf/0x91
> >  [] get_monotonic_cycles+0xf/0x2d
> >  [] now+0x2a/0x7c
> >  [] trace+0x4d/0x1e8
> >  [] __mcount+0x95/0xa6
> >  [] mcount+0x14/0x18
> >  [] lock_acquired+0xe/0x1d7
> >  [] _spin_lock_irqsave+0x70/0x7a
> >  [] pit_read+0xf/0x91
> 
> hm, it seems lock_acquired() [in kernel/lockdep.c] needs to be marked 
> 'notrace' too - otherwise we recurse back into pit_read().

This time not even the offsets have changed.  Dump is identical.

Jörn

-- 
Mundie uses a textbook tactic of manipulation: start with some
reasonable talk, and lead the audience to an unreasonable conclusion.
-- Bruce Perens
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Ingo Molnar


* Jörn Engel <[EMAIL PROTECTED]> wrote:

> > ah. You should mark pit_read() function as notrace. PIT clocksource 
> > is rare. (add the 'notrace' word to the function prototype)
> 
> Hardly a change at all.  Apart from some offsets, this dump is 
> identical.
> 
> stopped custom tracer.
> BUG: spinlock recursion on CPU#0, sh/954
>  lock: c030f280, .magic: dead4ead, .owner: sh/954, .owner_cpu: 0
> Pid: 954, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #3
>  [] show_trace_log_lvl+0x35/0x54
>  [] show_trace+0x2c/0x2e
>  [] dump_stack+0x84/0x8a
>  [] spin_bug+0xa7/0xae
>  [] _raw_spin_lock+0x45/0xfa
>  [] _spin_lock_irqsave+0x68/0x7a
>  [] pit_read+0xf/0x91
>  [] get_monotonic_cycles+0xf/0x2d
>  [] now+0x2a/0x7c
>  [] trace+0x4d/0x1e8
>  [] __mcount+0x95/0xa6
>  [] mcount+0x14/0x18
>  [] lock_acquired+0xe/0x1d7
>  [] _spin_lock_irqsave+0x70/0x7a
>  [] pit_read+0xf/0x91

hm, it seems lock_acquired() [in kernel/lockdep.c] needs to be marked 
'notrace' too - otherwise we recurse back into pit_read().

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Ingo Molnar


* Jörn Engel [EMAIL PROTECTED] wrote:

  ah. You should mark pit_read() function as notrace. PIT clocksource 
  is rare. (add the 'notrace' word to the function prototype)
 
 Hardly a change at all.  Apart from some offsets, this dump is 
 identical.
 
 stopped custom tracer.
 BUG: spinlock recursion on CPU#0, sh/954
  lock: c030f280, .magic: dead4ead, .owner: sh/954, .owner_cpu: 0
 Pid: 954, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #3
  [c0103a04] show_trace_log_lvl+0x35/0x54
  [c010450a] show_trace+0x2c/0x2e
  [c0104e6d] dump_stack+0x84/0x8a
  [c01ded7c] spin_bug+0xa7/0xae
  [c01def14] _raw_spin_lock+0x45/0xfa
  [c02a02b1] _spin_lock_irqsave+0x68/0x7a
  [c01087e2] pit_read+0xf/0x91
  [c0130ee1] get_monotonic_cycles+0xf/0x2d
  [c013c0e7] now+0x2a/0x7c
  [c013c333] trace+0x4d/0x1e8
  [c013dbeb] __mcount+0x95/0xa6
  [c010d354] mcount+0x14/0x18
  [c0135a3c] lock_acquired+0xe/0x1d7
  [c02a02b9] _spin_lock_irqsave+0x70/0x7a
  [c01087e2] pit_read+0xf/0x91

hm, it seems lock_acquired() [in kernel/lockdep.c] needs to be marked 
'notrace' too - otherwise we recurse back into pit_read().

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 09:56:08 +0100, Ingo Molnar wrote:
 * Jörn Engel [EMAIL PROTECTED] wrote:
 
   ah. You should mark pit_read() function as notrace. PIT clocksource 
   is rare. (add the 'notrace' word to the function prototype)
  
  Hardly a change at all.  Apart from some offsets, this dump is 
  identical.
  
  stopped custom tracer.
  BUG: spinlock recursion on CPU#0, sh/954
   lock: c030f280, .magic: dead4ead, .owner: sh/954, .owner_cpu: 0
  Pid: 954, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #3
   [c0103a04] show_trace_log_lvl+0x35/0x54
   [c010450a] show_trace+0x2c/0x2e
   [c0104e6d] dump_stack+0x84/0x8a
   [c01ded7c] spin_bug+0xa7/0xae
   [c01def14] _raw_spin_lock+0x45/0xfa
   [c02a02b1] _spin_lock_irqsave+0x68/0x7a
   [c01087e2] pit_read+0xf/0x91
   [c0130ee1] get_monotonic_cycles+0xf/0x2d
   [c013c0e7] now+0x2a/0x7c
   [c013c333] trace+0x4d/0x1e8
   [c013dbeb] __mcount+0x95/0xa6
   [c010d354] mcount+0x14/0x18
   [c0135a3c] lock_acquired+0xe/0x1d7
   [c02a02b9] _spin_lock_irqsave+0x70/0x7a
   [c01087e2] pit_read+0xf/0x91
 
 hm, it seems lock_acquired() [in kernel/lockdep.c] needs to be marked 
 'notrace' too - otherwise we recurse back into pit_read().

This time not even the offsets have changed.  Dump is identical.

Jörn

-- 
Mundie uses a textbook tactic of manipulation: start with some
reasonable talk, and lead the audience to an unreasonable conclusion.
-- Bruce Perens
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 12:31:43 +0100, Jörn Engel wrote:
 
 This time not even the offsets have changed.  Dump is identical.

After another ten or so notrace annotations throughout the spinlock
code, the latency tracer appears to work.  Not sure how many useful
information is missing through all the annotations, though.

Jörn

-- 
Das Aufregende am Schreiben ist es, eine Ordnung zu schaffen, wo
vorher keine existiert hat.
-- Doris Lessing
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Ingo Molnar


* Jörn Engel [EMAIL PROTECTED] wrote:

 On Sun, 2 December 2007 09:56:08 +0100, Ingo Molnar wrote:
  * Jörn Engel [EMAIL PROTECTED] wrote:
  
ah. You should mark pit_read() function as notrace. PIT clocksource 
is rare. (add the 'notrace' word to the function prototype)
   
   Hardly a change at all.  Apart from some offsets, this dump is 
   identical.
   
   stopped custom tracer.
   BUG: spinlock recursion on CPU#0, sh/954
lock: c030f280, .magic: dead4ead, .owner: sh/954, .owner_cpu: 0
   Pid: 954, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #3
[c0103a04] show_trace_log_lvl+0x35/0x54
[c010450a] show_trace+0x2c/0x2e
[c0104e6d] dump_stack+0x84/0x8a
[c01ded7c] spin_bug+0xa7/0xae
[c01def14] _raw_spin_lock+0x45/0xfa
[c02a02b1] _spin_lock_irqsave+0x68/0x7a
[c01087e2] pit_read+0xf/0x91
[c0130ee1] get_monotonic_cycles+0xf/0x2d
[c013c0e7] now+0x2a/0x7c
[c013c333] trace+0x4d/0x1e8
[c013dbeb] __mcount+0x95/0xa6
[c010d354] mcount+0x14/0x18
[c0135a3c] lock_acquired+0xe/0x1d7
[c02a02b9] _spin_lock_irqsave+0x70/0x7a
[c01087e2] pit_read+0xf/0x91
  
  hm, it seems lock_acquired() [in kernel/lockdep.c] needs to be marked 
  'notrace' too - otherwise we recurse back into pit_read().
 
 This time not even the offsets have changed.  Dump is identical.

hm, do you have CONFIG_FRAME_POINTERS=y, i.e. are the dumps reliable?

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Ingo Molnar


* Jörn Engel [EMAIL PROTECTED] wrote:

 On Sun, 2 December 2007 12:31:43 +0100, Jörn Engel wrote:
  
  This time not even the offsets have changed.  Dump is identical.
 
 After another ten or so notrace annotations throughout the spinlock 
 code, the latency tracer appears to work.  Not sure how many useful 
 information is missing through all the annotations, though.

a few annotations out of thousands of function calls in the kernel it's 
usually not significant.

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 14:57:11 +0100, Ingo Molnar wrote:
 
 hm, do you have CONFIG_FRAME_POINTERS=y, i.e. are the dumps reliable?

I do.  Went through 10odd runs and annotated the function right below
mcount each time.  Seems to work now.

Trouble is that it doesn't solve my real problem at hand.  Something is
causing significant delays when writing to logfs.  Core logfs code is
not running, but may cause whatever other code is running and burning up
all the cpu time.  Wasting 100ms of qemu-time to write a single page
happens fairly frequently.

With the latency tracer the problem appears to have become worse.  Now
the loftlockup code triggers quite frequently.  Which makes a bit of
sense, as the problem is a busy CPU, rather than an idle one.

Guess I'll try oprofile or lcov instead.

Jörn

-- 
Joern's library part 5:
http://www.faqs.org/faqs/compression-faq/part2/section-9.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 13:31:25 +0100, Jörn Engel wrote:
 
 After another ten or so notrace annotations throughout the spinlock
 code, the latency tracer appears to work.  Not sure how many useful
 information is missing through all the annotations, though.

And here is a patch with the needed annotations.  Looks a bit shabby, as
it was generated though git diff, patcher, interdiff and combinediff.

Jörn

-- 
Joern's library part 10:
http://blogs.msdn.com/David_Gristwood/archive/2004/06/24/164849.aspx

unchanged:
--- a/arch/x86/kernel/i8253.c
+++ b/arch/x86/kernel/i8253.c
@@ -125,7 +125,7 @@ void __init setup_pit_timer(void)
  * to just read by itself. So use jiffies to emulate a free
  * running counter:
  */
-static cycle_t pit_read(void)
+static notrace cycle_t pit_read(void)
 {
unsigned long flags;
int count;
unchanged:
--- a/kernel/spinlock.c
+++ b/kernel/spinlock.c
@@ -76,7 +76,7 @@ void __lockfunc _read_lock(rwlock_t *lock)
 }
 EXPORT_SYMBOL(_read_lock);
 
-unsigned long __lockfunc _spin_lock_irqsave(spinlock_t *lock)
+unsigned long notrace __lockfunc _spin_lock_irqsave(spinlock_t *lock)
 {
unsigned long flags;
 
@@ -341,7 +341,7 @@ void __lockfunc _read_unlock(rwlock_t *lock)
 }
 EXPORT_SYMBOL(_read_unlock);
 
-void __lockfunc _spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
+void notrace __lockfunc _spin_unlock_irqrestore(spinlock_t *lock, unsigned 
long flags)
 {
spin_release(lock-dep_map, 1, _RET_IP_);
_raw_spin_unlock(lock);
unchanged:
--- a/lib/spinlock_debug.c
+++ b/lib/spinlock_debug.c
@@ -148,7 +148,7 @@ int _raw_spin_trylock(spinlock_t *lock)
return ret;
 }
 
-void _raw_spin_unlock(spinlock_t *lock)
+void notrace _raw_spin_unlock(spinlock_t *lock)
 {
debug_spin_unlock(lock);
__raw_spin_unlock(lock-raw_lock);
only in patch2:
unchanged:
--- linux/arch/x86/kernel/tsc_32.c
+++ linux-2.6.24-rc3logfs/arch/x86/kernel/tsc_32.c  2007-12-02 
15:21:15.0 +0100
@@ -92,7 +92,7 @@
 /*
  * Scheduler clock - returns current time in nanosec units.
  */
-unsigned long long native_sched_clock(void)
+unsigned notrace long long native_sched_clock(void)
 {
unsigned long long this_offset;
 
only in patch2:
unchanged:
--- linux/kernel/lockdep.c
+++ linux-2.6.24-rc3logfs/kernel/lockdep.c  2007-12-02 15:21:16.0 
+0100
@@ -139,7 +139,7 @@
return i;
 }
 
-static void lock_time_inc(struct lock_time *lt, s64 time)
+static notrace void lock_time_inc(struct lock_time *lt, s64 time)
 {
if (time  lt-max)
lt-max = time;
@@ -198,7 +198,7 @@
memset(class-contention_point, 0, sizeof(class-contention_point));
 }
 
-static struct lock_class_stats *get_lock_stats(struct lock_class *class)
+static notrace struct lock_class_stats *get_lock_stats(struct lock_class 
*class)
 {
return get_cpu_var(lock_stats)[class - lock_classes];
 }
@@ -208,7 +208,7 @@
put_cpu_var(lock_stats);
 }
 
-static void lock_release_holdtime(struct held_lock *hlock)
+static notrace void lock_release_holdtime(struct held_lock *hlock)
 {
struct lock_class_stats *stats;
s64 holdtime;
@@ -2872,7 +2872,7 @@
 }
 EXPORT_SYMBOL_GPL(lock_contended);
 
-void lock_acquired(struct lockdep_map *lock)
+void notrace lock_acquired(struct lockdep_map *lock)
 {
unsigned long flags;
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Ingo Molnar


* Jörn Engel [EMAIL PROTECTED] wrote:

 On Sun, 2 December 2007 13:31:25 +0100, Jörn Engel wrote:
  
  After another ten or so notrace annotations throughout the spinlock
  code, the latency tracer appears to work.  Not sure how many useful
  information is missing through all the annotations, though.
 
 And here is a patch with the needed annotations.  Looks a bit shabby, 
 as it was generated though git diff, patcher, interdiff and 
 combinediff.

thanks - the patches applied fine, i've added them to my latency-tracer 
patchqueue. Steve Rostedt might want to pick the fixes up for -rt as 
well.

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Ingo Molnar


* Jörn Engel [EMAIL PROTECTED] wrote:

 I do.  Went through 10odd runs and annotated the function right below 
 mcount each time.  Seems to work now.
 
 Trouble is that it doesn't solve my real problem at hand.  Something 
 is causing significant delays when writing to logfs.  Core logfs code 
 is not running, but may cause whatever other code is running and 
 burning up all the cpu time.  Wasting 100ms of qemu-time to write a 
 single page happens fairly frequently.
 
 With the latency tracer the problem appears to have become worse.  Now 
 the loftlockup code triggers quite frequently.  Which makes a bit of 
 sense, as the problem is a busy CPU, rather than an idle one.

well what does the trace say, where do the delays come from? To get a 
quick overview you can make tracing lighter weight by doing:

 echo 0  /proc/sys/kernel/mcount_enabled
 echo 1  /proc/sys/kernel/trace_syscalls

(this turns the latency tracer into a global strace kind of tracer)

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 16:47:46 +0100, Ingo Molnar wrote:
 
 well what does the trace say, where do the delays come from? To get a 
 quick overview you can make tracing lighter weight by doing:
 
  echo 0  /proc/sys/kernel/mcount_enabled
  echo 1  /proc/sys/kernel/trace_syscalls

I mistyped and did 
 echo 1  /proc/sys/kernel/mcount_enabled

Result looked like a livelock and finally convinced me to abandon the
latency tracer.  Sorry, but it appears to be the right tool for the
wrong job.

Jörn

-- 
They laughed at Galileo.  They laughed at Copernicus.  They laughed at
Columbus. But remember, they also laughed at Bozo the Clown.
-- unknown
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Ingo Molnar


* Jörn Engel [EMAIL PROTECTED] wrote:

 On Sun, 2 December 2007 16:47:46 +0100, Ingo Molnar wrote:
  
  well what does the trace say, where do the delays come from? To get a 
  quick overview you can make tracing lighter weight by doing:
  
   echo 0  /proc/sys/kernel/mcount_enabled
   echo 1  /proc/sys/kernel/trace_syscalls
 
 I mistyped and did 
  echo 1  /proc/sys/kernel/mcount_enabled
 
 Result looked like a livelock and finally convinced me to abandon the 
 latency tracer.  Sorry, but it appears to be the right tool for the 
 wrong job.

hm, we routinely use it in -rt to capture what on earth is happening 
incidents. The snippet below is a random snipped from a trace that i've 
just captured, with mcount enabled. It seems to work fine here, with and 
without mcount. (pit clocksource is almost never used, that's why you 
had those early problems.)

oprofile helps if you can reliably reproduce the slowdown in a loop or 
for a long amount of time, with lots of CPU utilization - and then it's 
also lower overhead. The tracer can be used to capture rare or complex 
events, and gives the full flow control and what is happening within the 
kernel.

Ingo


  idle 0 1D...  811us : sched_clock_idle_sleep_event (acpi_processor_idle)
  idle 0 1D...  813us : _spin_lock (sched_clock_idle_sleep_event)
trace-cm 2463  0  814us : native_flush_tlb_others (flush_tlb_mm)
  idle 0 1D...  815us : __update_rq_clock (sched_clock_idle_sleep_event)
trace-cm 2463  0  817us : _spin_lock (native_flush_tlb_others)
  idle 0 1D...  817us+: acpi_cstate_enter (acpi_processor_idle)
trace-cm 2463  0  820us+: send_IPI_mask_bitmask (native_flush_tlb_others)
trace-cm 2463  0D...  823us+: apic_wait_icr_idle (send_IPI_mask_bitmask)
trace-cm 2463  0  856us+: up_write (copy_process)
trace-cm 2463  0  859us+: copy_keys (copy_process)
trace-cm 2463  0  862us+: copy_namespaces (copy_process)
trace-cm 2463  0  865us+: copy_thread (copy_process)
trace-cm 2463  0  868us+: memcpy (copy_thread)
trace-cm 2463  0  871us+: alloc_pid (copy_process)
trace-cm 2463  0  874us+: kmem_cache_alloc (alloc_pid)
trace-cm 2463  0  877us+: _spin_lock_irq (alloc_pid)
trace-cm 2463  0  880us+: _write_lock_irq (copy_process)
trace-cm 2463  0D...  883us+: _spin_lock (copy_process)
trace-cm 2463  0D...  887us+: recalc_sigpending (copy_process)
trace-cm 2463  0D...  890us+: recalc_sigpending_tsk (recalc_sigpending)
trace-cm 2463  0D...  893us+: attach_pid (copy_process)
trace-cm 2463  0D...  896us+: attach_pid (copy_process)
trace-cm 2463  0D...  899us+: attach_pid (copy_process)
trace-cm 2463  0  902us+: proc_fork_connector (copy_process)
trace-cm 2463  0  905us+: wake_up_new_task (do_fork)
trace-cm 2463  0  908us+: task_rq_lock (wake_up_new_task)
trace-cm 2463  0D...  911us+: _spin_lock (task_rq_lock)
trace-cm 2463  0D...  914us+: update_rq_clock (wake_up_new_task)
trace-cm 2463  0D...  918us+: __update_rq_clock (update_rq_clock)
trace-cm 2463  0D...  921us+: effective_prio (wake_up_new_task)
trace-cm 2463  0D...  924us+: wake_up_new_task hwclock 2474 (0 0)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 21:07:22 +0100, Ingo Molnar wrote:
 * Jörn Engel [EMAIL PROTECTED] wrote:
 
  Result looked like a livelock and finally convinced me to abandon the 
  latency tracer.  Sorry, but it appears to be the right tool for the 
  wrong job.
 
 hm, we routinely use it in -rt to capture what on earth is happening 
 incidents. The snippet below is a random snipped from a trace that i've 
 just captured, with mcount enabled. It seems to work fine here, with and 
 without mcount. (pit clocksource is almost never used, that's why you 
 had those early problems.)
 
 oprofile helps if you can reliably reproduce the slowdown in a loop or 
 for a long amount of time, with lots of CPU utilization - and then it's 
 also lower overhead. The tracer can be used to capture rare or complex 
 events, and gives the full flow control and what is happening within the 
 kernel.

Such a trace would be useful indeed.  But so far the patch has only
given me grief and nothing remotely like useful output.  Maybe I should
simply use the complete -rt patch instead of debugging the broken-out
latency-tracer patch.

Jörn

-- 
Mundie uses a textbook tactic of manipulation: start with some
reasonable talk, and lead the audience to an unreasonable conclusion.
-- Bruce Perens
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Ingo Molnar


* Jörn Engel [EMAIL PROTECTED] wrote:

  oprofile helps if you can reliably reproduce the slowdown in a loop 
  or for a long amount of time, with lots of CPU utilization - and 
  then it's also lower overhead. The tracer can be used to capture 
  rare or complex events, and gives the full flow control and what is 
  happening within the kernel.
 
 Such a trace would be useful indeed.  But so far the patch has only 
 given me grief and nothing remotely like useful output.  Maybe I 
 should simply use the complete -rt patch instead of debugging the 
 broken-out latency-tracer patch.

to capture that trace i did not use -rt, i just patched latest -git 
with:

  
http://people.redhat.com/mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch

(this has your fixes included already)

have done:

  echo 1  /proc/sys/kernel/mcount_enabled

and have run:

  ./trace-cmd sleep 1  trace.txt

  http://people.redhat.com/mingo/latency-tracing-patches/trace-cmd.c

to capture a 1 second trace of what the system is doing. I think your 
troubles are due to running it within a qemu guest - that is not a 
typical utilization so you are on unchartered waters.

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 21:45:59 +0100, Ingo Molnar wrote:
 
 to capture a 1 second trace of what the system is doing. I think your 
 troubles are due to running it within a qemu guest - that is not a 
 typical utilization so you are on unchartered waters.

Looks like it.  Guess I'll switch to something else for the moment.

Jörn

-- 
Linux is more the core point of a concept that surrounds open source
which, in turn, is based on a false concept. This concept is that
people actually want to look at source code.
-- Rob Enderle
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 21:45:59 +0100, Ingo Molnar wrote:
 
 to capture that trace i did not use -rt, i just patched latest -git 
 with:
 
   
 http://people.redhat.com/mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch
 
 (this has your fixes included already)
 
 have done:
 
   echo 1  /proc/sys/kernel/mcount_enabled
 
 and have run:
 
   ./trace-cmd sleep 1  trace.txt
 
   http://people.redhat.com/mingo/latency-tracing-patches/trace-cmd.c
 
 to capture a 1 second trace of what the system is doing. I think your 
 troubles are due to running it within a qemu guest - that is not a 
 typical utilization so you are on unchartered waters.

Maybe one more thing: can you send me the config you used for the setup
above?  I'd like to know whether qemu or my config is to blame.

Jörn

-- 
Eighty percent of success is showing up.
-- Woody Allen
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-02 Thread Jörn Engel

On Sun, 2 December 2007 22:19:00 +0100, Ingo Molnar wrote:
 * Jörn Engel [EMAIL PROTECTED] wrote:
 
  Maybe one more thing: can you send me the config you used for the 
  setup above?  I'd like to know whether qemu or my config is to blame.
 
 sure - attached.

After an eternity of compile time, this config does generate some useful
output.  qemu is not to blame.

Jörn

-- 
Joern's library part 9:
http://www.scl.ameslab.gov/Publications/Gus/TwelveWays.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-01 Thread Jörn Engel

On Sat, 1 December 2007 21:54:56 +0100, Ingo Molnar wrote:
> * J??rn Engel <[EMAIL PROTECTED]> wrote:
> > 
> > stopped custom tracer.
> > BUG: spinlock recursion on CPU#0, sh/953
> >  lock: c030f280, .magic: dead4ead, .owner: sh/953, .owner_cpu: 0
> > Pid: 953, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #2
> >  [] show_trace_log_lvl+0x35/0x54
> >  [] show_trace+0x2c/0x2e
> >  [] dump_stack+0x84/0x8a
> >  [] spin_bug+0xa7/0xae
> >  [] _raw_spin_lock+0x45/0xfa
> >  [] _spin_lock_irqsave+0x68/0x7a
> >  [] pit_read+0x14/0x99
> >  [] get_monotonic_cycles+0xf/0x2d
> 
> ah. You should mark pit_read() function as notrace. PIT clocksource is 
> rare. (add the 'notrace' word to the function prototype)

Hardly a change at all.  Apart from some offsets, this dump is
identical.

stopped custom tracer.
BUG: spinlock recursion on CPU#0, sh/954
 lock: c030f280, .magic: dead4ead, .owner: sh/954, .owner_cpu: 0
Pid: 954, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #3
 [] show_trace_log_lvl+0x35/0x54
 [] show_trace+0x2c/0x2e
 [] dump_stack+0x84/0x8a
 [] spin_bug+0xa7/0xae
 [] _raw_spin_lock+0x45/0xfa
 [] _spin_lock_irqsave+0x68/0x7a
 [] pit_read+0xf/0x91
 [] get_monotonic_cycles+0xf/0x2d
 [] now+0x2a/0x7c
 [] trace+0x4d/0x1e8
 [] __mcount+0x95/0xa6
 [] mcount+0x14/0x18
 [] lock_acquired+0xe/0x1d7
 [] _spin_lock_irqsave+0x70/0x7a
 [] pit_read+0xf/0x91
 [] update_wall_time+0x23/0x692
 [] do_timer+0x24/0xb1
 [] tick_periodic+0x49/0x84
 [] tick_handle_periodic+0x22/0x73
 [] timer_interrupt+0x4f/0x56
 [] handle_IRQ_event+0x24/0x4f
 [] handle_edge_irq+0xb8/0x125
 [] do_IRQ+0x89/0xa3
 [] common_interrupt+0x23/0x28
 [] mcount+0x14/0x18
 [] sysctl_head_finish+0xc/0x33
 [] proc_sys_write+0x96/0xa0
 [] vfs_write+0xa6/0x14c
 [] sys_write+0x4c/0x70
 [] syscall_call+0x7/0xb
 ===

Jörn

-- 
Don't worry about people stealing your ideas. If your ideas are any good,
you'll have to ram them down people's throats.
-- Howard Aiken quoted by Ken Iverson quoted by Jim Horning quoted by
   Raph Levien, 1979
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-01 Thread Ingo Molnar


* J??rn Engel <[EMAIL PROTECTED]> wrote:

> Almost.  "-serial stdio" was missing.  Much better now.
> 
> stopped custom tracer.
> BUG: spinlock recursion on CPU#0, sh/953
>  lock: c030f280, .magic: dead4ead, .owner: sh/953, .owner_cpu: 0
> Pid: 953, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #2
>  [] show_trace_log_lvl+0x35/0x54
>  [] show_trace+0x2c/0x2e
>  [] dump_stack+0x84/0x8a
>  [] spin_bug+0xa7/0xae
>  [] _raw_spin_lock+0x45/0xfa
>  [] _spin_lock_irqsave+0x68/0x7a
>  [] pit_read+0x14/0x99
>  [] get_monotonic_cycles+0xf/0x2d

ah. You should mark pit_read() function as notrace. PIT clocksource is 
rare. (add the 'notrace' word to the function prototype)

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-01 Thread Jörn Engel

On Sat, 1 December 2007 19:32:56 +0100, Ingo Molnar wrote:
> * Jörn Engel <[EMAIL PROTECTED]> wrote:
> 
> > I have to change my qemu setup a little to see the top of those 
> > dumps...
> 
> btw., if you start qemu like this:
> 
> qemu -cdrom ./cdrom.iso -hda ./hda.img -boot c -full-screen -kernel 
> ~/bzImage -append "root=/dev/hda1 earlyprintk=serial,ttyS0,9600 
> console=tty console=ttyS0,9600 enforcing=0 debug"
> 
> you'll get the inner kernel's serial console log to qemu's standard 
> output. Pretty useful for capturing kernel crashes.

Almost.  "-serial stdio" was missing.  Much better now.

stopped custom tracer.
BUG: spinlock recursion on CPU#0, sh/953
 lock: c030f280, .magic: dead4ead, .owner: sh/953, .owner_cpu: 0
Pid: 953, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #2
 [] show_trace_log_lvl+0x35/0x54
 [] show_trace+0x2c/0x2e
 [] dump_stack+0x84/0x8a
 [] spin_bug+0xa7/0xae
 [] _raw_spin_lock+0x45/0xfa
 [] _spin_lock_irqsave+0x68/0x7a
 [] pit_read+0x14/0x99
 [] get_monotonic_cycles+0xf/0x2d
 [] now+0x2a/0x7c
 [] trace+0x4d/0x1e8
 [] __mcount+0x95/0xa6
 [] mcount+0x14/0x18
 [] lock_acquired+0xe/0x1d7
 [] _spin_lock_irqsave+0x70/0x7a
 [] pit_read+0x14/0x99
 [] update_wall_time+0x23/0x692
 [] do_timer+0x24/0xb1
 [] tick_periodic+0x49/0x84
 [] tick_handle_periodic+0x22/0x73
 [] timer_interrupt+0x4f/0x56
 [] handle_IRQ_event+0x24/0x4f
 [] handle_edge_irq+0xb8/0x125
 [] do_IRQ+0x89/0xa3
 [] common_interrupt+0x23/0x28
 [] vfs_write+0xa6/0x14c
 [] sys_write+0x4c/0x70
 [] syscall_call+0x7/0xb
 ===

I assume you have the latency tracer working.  If you could send me your
config, I could do a manual config-bisect and see which part of mine
causes the problem.

Jörn

-- 
Admonish your friends privately, but praise them openly.
-- Publilius Syrus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-01 Thread Ingo Molnar


* Jörn Engel <[EMAIL PROTECTED]> wrote:

> > hm, that looks weird. if you disable CONFIG_PROVE_LOCKING, does that 
> > improve things? (or just turns a noisy lockup into a silent lockup?)
> 
> Not much, although the dumps look different now:
> http://logfs.org/~joern/trace3.jpg
> http://logfs.org/~joern/trace4.jpg
> 
> I have to change my qemu setup a little to see the top of those 
> dumps...

btw., if you start qemu like this:

qemu -cdrom ./cdrom.iso -hda ./hda.img -boot c -full-screen -kernel 
~/bzImage -append "root=/dev/hda1 earlyprintk=serial,ttyS0,9600 
console=tty console=ttyS0,9600 enforcing=0 debug"

you'll get the inner kernel's serial console log to qemu's standard 
output. Pretty useful for capturing kernel crashes.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-01 Thread Jörn Engel

On Fri, 30 November 2007 19:46:25 +0100, Ingo Molnar wrote:
> * Jörn Engel <[EMAIL PROTECTED]> wrote:
> > 
> > It compiles.  It boots with a 512M RAM (384M was too little with all
> > the other debug options on).  But it seems to lock up when running
> > trace-cmd.  On a rerun it locks up again, but with different output.
> 
> hm, you should decrease MAX_TRACE in kernel/latency_tracing.c from 1 
> million to 16K or so. 1 million entries probably depletes lowmem quite 
> seriously.

That's ok.  RAM is cheap.

> > Rerun was captured:
> > http://logfs.org/~joern/trace1.jpg
> 
> hm, that looks weird. if you disable CONFIG_PROVE_LOCKING, does that 
> improve things? (or just turns a noisy lockup into a silent lockup?)

Not much, although the dumps look different now:
http://logfs.org/~joern/trace3.jpg
http://logfs.org/~joern/trace4.jpg

I have to change my qemu setup a little to see the top of those
dumps...

> > I should do a couple of runs, but my girlfriend claims realtime 
> > priority for the evening.
> 
> yeah, SCHED_IDLE is not generally well received by them.

...as soon as more urgent tasks has finished (weekend is over).

Jörn

-- 
It does not matter how slowly you go, so long as you do not stop.
-- Confucius
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-01 Thread Jörn Engel

On Fri, 30 November 2007 19:46:25 +0100, Ingo Molnar wrote:
 * Jörn Engel [EMAIL PROTECTED] wrote:
  
  It compiles.  It boots with a 512M RAM (384M was too little with all
  the other debug options on).  But it seems to lock up when running
  trace-cmd.  On a rerun it locks up again, but with different output.
 
 hm, you should decrease MAX_TRACE in kernel/latency_tracing.c from 1 
 million to 16K or so. 1 million entries probably depletes lowmem quite 
 seriously.

That's ok.  RAM is cheap.

  Rerun was captured:
  http://logfs.org/~joern/trace1.jpg
 
 hm, that looks weird. if you disable CONFIG_PROVE_LOCKING, does that 
 improve things? (or just turns a noisy lockup into a silent lockup?)

Not much, although the dumps look different now:
http://logfs.org/~joern/trace3.jpg
http://logfs.org/~joern/trace4.jpg

I have to change my qemu setup a little to see the top of those
dumps...

  I should do a couple of runs, but my girlfriend claims realtime 
  priority for the evening.
 
 yeah, SCHED_IDLE is not generally well received by them.

...as soon as more urgent tasks has finished (weekend is over).

Jörn

-- 
It does not matter how slowly you go, so long as you do not stop.
-- Confucius
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-01 Thread Ingo Molnar


* Jörn Engel [EMAIL PROTECTED] wrote:

  hm, that looks weird. if you disable CONFIG_PROVE_LOCKING, does that 
  improve things? (or just turns a noisy lockup into a silent lockup?)
 
 Not much, although the dumps look different now:
 http://logfs.org/~joern/trace3.jpg
 http://logfs.org/~joern/trace4.jpg
 
 I have to change my qemu setup a little to see the top of those 
 dumps...

btw., if you start qemu like this:

qemu -cdrom ./cdrom.iso -hda ./hda.img -boot c -full-screen -kernel 
~/bzImage -append root=/dev/hda1 earlyprintk=serial,ttyS0,9600 
console=tty console=ttyS0,9600 enforcing=0 debug

you'll get the inner kernel's serial console log to qemu's standard 
output. Pretty useful for capturing kernel crashes.

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-01 Thread Jörn Engel

On Sat, 1 December 2007 19:32:56 +0100, Ingo Molnar wrote:
 * Jörn Engel [EMAIL PROTECTED] wrote:
 
  I have to change my qemu setup a little to see the top of those 
  dumps...
 
 btw., if you start qemu like this:
 
 qemu -cdrom ./cdrom.iso -hda ./hda.img -boot c -full-screen -kernel 
 ~/bzImage -append root=/dev/hda1 earlyprintk=serial,ttyS0,9600 
 console=tty console=ttyS0,9600 enforcing=0 debug
 
 you'll get the inner kernel's serial console log to qemu's standard 
 output. Pretty useful for capturing kernel crashes.

Almost.  -serial stdio was missing.  Much better now.

stopped custom tracer.
BUG: spinlock recursion on CPU#0, sh/953
 lock: c030f280, .magic: dead4ead, .owner: sh/953, .owner_cpu: 0
Pid: 953, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #2
 [c0103a04] show_trace_log_lvl+0x35/0x54
 [c010450a] show_trace+0x2c/0x2e
 [c0104e6d] dump_stack+0x84/0x8a
 [c01ded7c] spin_bug+0xa7/0xae
 [c01def14] _raw_spin_lock+0x45/0xfa
 [c02a02b1] _spin_lock_irqsave+0x68/0x7a
 [c01087e7] pit_read+0x14/0x99
 [c0130ee9] get_monotonic_cycles+0xf/0x2d
 [c013c0ef] now+0x2a/0x7c
 [c013c33b] trace+0x4d/0x1e8
 [c013dbf3] __mcount+0x95/0xa6
 [c010d35c] mcount+0x14/0x18
 [c0135a44] lock_acquired+0xe/0x1d7
 [c02a02b9] _spin_lock_irqsave+0x70/0x7a
 [c01087e7] pit_read+0x14/0x99
 [c0130791] update_wall_time+0x23/0x692
 [c0121756] do_timer+0x24/0xb1
 [c01331fe] tick_periodic+0x49/0x84
 [c013325b] tick_handle_periodic+0x22/0x73
 [c0106315] timer_interrupt+0x4f/0x56
 [c013e2c7] handle_IRQ_event+0x24/0x4f
 [c013f44a] handle_edge_irq+0xb8/0x125
 [c01054ee] do_IRQ+0x89/0xa3
 [c01033df] common_interrupt+0x23/0x28
 [c015d924] vfs_write+0xa6/0x14c
 [c015df6e] sys_write+0x4c/0x70
 [c0102a1f] syscall_call+0x7/0xb
 ===

I assume you have the latency tracer working.  If you could send me your
config, I could do a manual config-bisect and see which part of mine
causes the problem.

Jörn

-- 
Admonish your friends privately, but praise them openly.
-- Publilius Syrus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-01 Thread Ingo Molnar


* J??rn Engel [EMAIL PROTECTED] wrote:

 Almost.  -serial stdio was missing.  Much better now.
 
 stopped custom tracer.
 BUG: spinlock recursion on CPU#0, sh/953
  lock: c030f280, .magic: dead4ead, .owner: sh/953, .owner_cpu: 0
 Pid: 953, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #2
  [c0103a04] show_trace_log_lvl+0x35/0x54
  [c010450a] show_trace+0x2c/0x2e
  [c0104e6d] dump_stack+0x84/0x8a
  [c01ded7c] spin_bug+0xa7/0xae
  [c01def14] _raw_spin_lock+0x45/0xfa
  [c02a02b1] _spin_lock_irqsave+0x68/0x7a
  [c01087e7] pit_read+0x14/0x99
  [c0130ee9] get_monotonic_cycles+0xf/0x2d

ah. You should mark pit_read() function as notrace. PIT clocksource is 
rare. (add the 'notrace' word to the function prototype)

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-01 Thread Jörn Engel

On Sat, 1 December 2007 21:54:56 +0100, Ingo Molnar wrote:
 * J??rn Engel [EMAIL PROTECTED] wrote:
  
  stopped custom tracer.
  BUG: spinlock recursion on CPU#0, sh/953
   lock: c030f280, .magic: dead4ead, .owner: sh/953, .owner_cpu: 0
  Pid: 953, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #2
   [c0103a04] show_trace_log_lvl+0x35/0x54
   [c010450a] show_trace+0x2c/0x2e
   [c0104e6d] dump_stack+0x84/0x8a
   [c01ded7c] spin_bug+0xa7/0xae
   [c01def14] _raw_spin_lock+0x45/0xfa
   [c02a02b1] _spin_lock_irqsave+0x68/0x7a
   [c01087e7] pit_read+0x14/0x99
   [c0130ee9] get_monotonic_cycles+0xf/0x2d
 
 ah. You should mark pit_read() function as notrace. PIT clocksource is 
 rare. (add the 'notrace' word to the function prototype)

Hardly a change at all.  Apart from some offsets, this dump is
identical.

stopped custom tracer.
BUG: spinlock recursion on CPU#0, sh/954
 lock: c030f280, .magic: dead4ead, .owner: sh/954, .owner_cpu: 0
Pid: 954, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #3
 [c0103a04] show_trace_log_lvl+0x35/0x54
 [c010450a] show_trace+0x2c/0x2e
 [c0104e6d] dump_stack+0x84/0x8a
 [c01ded7c] spin_bug+0xa7/0xae
 [c01def14] _raw_spin_lock+0x45/0xfa
 [c02a02b1] _spin_lock_irqsave+0x68/0x7a
 [c01087e2] pit_read+0xf/0x91
 [c0130ee1] get_monotonic_cycles+0xf/0x2d
 [c013c0e7] now+0x2a/0x7c
 [c013c333] trace+0x4d/0x1e8
 [c013dbeb] __mcount+0x95/0xa6
 [c010d354] mcount+0x14/0x18
 [c0135a3c] lock_acquired+0xe/0x1d7
 [c02a02b9] _spin_lock_irqsave+0x70/0x7a
 [c01087e2] pit_read+0xf/0x91
 [c0130789] update_wall_time+0x23/0x692
 [c012174e] do_timer+0x24/0xb1
 [c01331f6] tick_periodic+0x49/0x84
 [c0133253] tick_handle_periodic+0x22/0x73
 [c0106315] timer_interrupt+0x4f/0x56
 [c013e2bf] handle_IRQ_event+0x24/0x4f
 [c013f442] handle_edge_irq+0xb8/0x125
 [c01054ee] do_IRQ+0x89/0xa3
 [c01033df] common_interrupt+0x23/0x28
 [c010d354] mcount+0x14/0x18
 [c0120130] sysctl_head_finish+0xc/0x33
 [c0192d64] proc_sys_write+0x96/0xa0
 [c015d91c] vfs_write+0xa6/0x14c
 [c015df66] sys_write+0x4c/0x70
 [c0102a1f] syscall_call+0x7/0xb
 ===

Jörn

-- 
Don't worry about people stealing your ideas. If your ideas are any good,
you'll have to ram them down people's throats.
-- Howard Aiken quoted by Ken Iverson quoted by Jim Horning quoted by
   Raph Levien, 1979
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-30 Thread Mark Lord


Mark Lord wrote:

Ingo Molnar wrote:

* Ingo Molnar <[EMAIL PROTECTED]> wrote:


pick up the latest latency tracer patch from:


sorry, wrong URLs, the correct links are:

   
http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch 


   http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c

..

Is there a version of these that works with 2.6.23.1 ?

I'm not using 2.6.24-* on this machine because:
 (a) behaviour may be different, and
 (b) something broke VMware compatibility again.

..

An update:

Ingo's 2.6.23 version of the latency-tracing-patches only lock-up on resume 
here.

But now that I've hacked vmware to work on 2.6.24,
my notebook is now running the newer kernels.

So now to see if the "strange 1-second pauses" ever happen here,
and if they do I'll patch in Ingo's stuff to try and find out why.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-30 Thread Ingo Molnar


* Jörn Engel <[EMAIL PROTECTED]> wrote:

> On Fri, 30 November 2007 14:43:12 +0100, Ingo Molnar wrote:
> > 
> >   
> > http://redhat.com/~mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch
> > 
> > does it work any better?
> 
> It compiles.  It boots with a 512M RAM (384M was too little with all
> the other debug options on).  But it seems to lock up when running
> trace-cmd.  On a rerun it locks up again, but with different output.

hm, you should decrease MAX_TRACE in kernel/latency_tracing.c from 1 
million to 16K or so. 1 million entries probably depletes lowmem quite 
seriously.

> Rerun was captured:
> http://logfs.org/~joern/trace1.jpg

hm, that looks weird. if you disable CONFIG_PROVE_LOCKING, does that 
improve things? (or just turns a noisy lockup into a silent lockup?)

> I should do a couple of runs, but my girlfriend claims realtime 
> priority for the evening.

yeah, SCHED_IDLE is not generally well received by them.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-30 Thread Jörn Engel

On Fri, 30 November 2007 14:43:12 +0100, Ingo Molnar wrote:
> 
>   
> http://redhat.com/~mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch
> 
> does it work any better?

It compiles.  It boots with a 512M RAM (384M was too little with all
the other debug options on).  But it seems to lock up when running
trace-cmd.  On a rerun it locks up again, but with different output.
Rerun was captured:
http://logfs.org/~joern/trace1.jpg

I should do a couple of runs, but my girlfriend claims realtime priority
for the evening.

Jörn

-- 
Chance favors only the prepared mind.
-- Louis Pasteur
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-30 Thread Jörn Engel

On Fri, 30 November 2007 14:35:46 +0100, Ingo Molnar wrote:
> * Jörn Engel <[EMAIL PROTECTED]> wrote:
> > 
> > kernel/sched.c:3384: warning: ‘struct prio_array’ declared inside parameter 
> > list
> > kernel/sched.c:3384: warning: its scope is only this definition or 
> > declaration, which is probably not what you want
> > kernel/sched.c: In function ‘trace_array’:
> > kernel/sched.c:3391: error: dereferencing pointer to incomplete type
> > kernel/sched.c:3393: error: dereferencing pointer to incomplete type
> > kernel/sched.c:3393: error: dereferencing pointer to incomplete type
> > kernel/sched.c:3396: error: dereferencing pointer to incomplete type
> > kernel/sched.c:3396: error: dereferencing pointer to incomplete type
> > kernel/sched.c: In function ‘trace_all_runnable_tasks’:
> > kernel/sched.c:3407: error: ‘struct rq’ has no member named ‘active’
> > make[1]: *** [kernel/sched.o] Error 1
> > 
> > And I cannot find a definition of struct prio_array in current git
> > either.  Is another patch needed?
> 
> change that to rt_prio_array in the code.

Solves the prio_array problem, but leaves the non-existing member
active.  I've upgraded to -rc3 and will give your latest patch a whirl.

Jörn

-- 
Write programs that do one thing and do it well. Write programs to work
together. Write programs to handle text streams, because that is a
universal interface.
-- Doug MacIlroy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-30 Thread Ingo Molnar


* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > > sorry, wrong URLs, the correct links are:
> > > 
> > >
> > > http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch
> > >http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c
> > 
> > Don't seem to work with plain 2.6.23:

could you try this updated version:

  
http://redhat.com/~mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch

does it work any better?

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-30 Thread Ingo Molnar


* Jörn Engel <[EMAIL PROTECTED]> wrote:

> On Thu, 15 November 2007 20:36:12 +0100, Ingo Molnar wrote:
> > * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> > 
> > > pick up the latest latency tracer patch from:
> > 
> > sorry, wrong URLs, the correct links are:
> > 
> >
> > http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch
> >http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c
> 
> Don't seem to work with plain 2.6.23:
> 
> kernel/sched.c:3384: warning: ‘struct prio_array’ declared inside parameter 
> list
> kernel/sched.c:3384: warning: its scope is only this definition or 
> declaration, which is probably not what you want
> kernel/sched.c: In function ‘trace_array’:
> kernel/sched.c:3391: error: dereferencing pointer to incomplete type
> kernel/sched.c:3393: error: dereferencing pointer to incomplete type
> kernel/sched.c:3393: error: dereferencing pointer to incomplete type
> kernel/sched.c:3396: error: dereferencing pointer to incomplete type
> kernel/sched.c:3396: error: dereferencing pointer to incomplete type
> kernel/sched.c: In function ‘trace_all_runnable_tasks’:
> kernel/sched.c:3407: error: ‘struct rq’ has no member named ‘active’
> make[1]: *** [kernel/sched.o] Error 1
> 
> And I cannot find a definition of struct prio_array in current git
> either.  Is another patch needed?

change that to rt_prio_array in the code.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-30 Thread Jörn Engel

On Thu, 15 November 2007 20:36:12 +0100, Ingo Molnar wrote:
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> 
> > pick up the latest latency tracer patch from:
> 
> sorry, wrong URLs, the correct links are:
> 
>
> http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch
>http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c

Don't seem to work with plain 2.6.23:

kernel/sched.c:3384: warning: ‘struct prio_array’ declared inside parameter list
kernel/sched.c:3384: warning: its scope is only this definition or declaration, 
which is probably not what you want
kernel/sched.c: In function ‘trace_array’:
kernel/sched.c:3391: error: dereferencing pointer to incomplete type
kernel/sched.c:3393: error: dereferencing pointer to incomplete type
kernel/sched.c:3393: error: dereferencing pointer to incomplete type
kernel/sched.c:3396: error: dereferencing pointer to incomplete type
kernel/sched.c:3396: error: dereferencing pointer to incomplete type
kernel/sched.c: In function ‘trace_all_runnable_tasks’:
kernel/sched.c:3407: error: ‘struct rq’ has no member named ‘active’
make[1]: *** [kernel/sched.o] Error 1

And I cannot find a definition of struct prio_array in current git
either.  Is another patch needed?

Jörn

-- 
Time? What's that? Time is only worth what you do with it.
-- Theo de Raadt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-30 Thread Ingo Molnar


* Jörn Engel [EMAIL PROTECTED] wrote:

 On Fri, 30 November 2007 14:43:12 +0100, Ingo Molnar wrote:
  

  http://redhat.com/~mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch
  
  does it work any better?
 
 It compiles.  It boots with a 512M RAM (384M was too little with all
 the other debug options on).  But it seems to lock up when running
 trace-cmd.  On a rerun it locks up again, but with different output.

hm, you should decrease MAX_TRACE in kernel/latency_tracing.c from 1 
million to 16K or so. 1 million entries probably depletes lowmem quite 
seriously.

 Rerun was captured:
 http://logfs.org/~joern/trace1.jpg

hm, that looks weird. if you disable CONFIG_PROVE_LOCKING, does that 
improve things? (or just turns a noisy lockup into a silent lockup?)

 I should do a couple of runs, but my girlfriend claims realtime 
 priority for the evening.

yeah, SCHED_IDLE is not generally well received by them.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-30 Thread Jörn Engel

On Fri, 30 November 2007 14:43:12 +0100, Ingo Molnar wrote:
 
   
 http://redhat.com/~mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch
 
 does it work any better?

It compiles.  It boots with a 512M RAM (384M was too little with all
the other debug options on).  But it seems to lock up when running
trace-cmd.  On a rerun it locks up again, but with different output.
Rerun was captured:
http://logfs.org/~joern/trace1.jpg

I should do a couple of runs, but my girlfriend claims realtime priority
for the evening.

Jörn

-- 
Chance favors only the prepared mind.
-- Louis Pasteur
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-30 Thread Ingo Molnar


* Ingo Molnar [EMAIL PROTECTED] wrote:

   sorry, wrong URLs, the correct links are:
   
  
   http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch
  http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c
  
  Don't seem to work with plain 2.6.23:

could you try this updated version:

  
http://redhat.com/~mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch

does it work any better?

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-30 Thread Jörn Engel

On Thu, 15 November 2007 20:36:12 +0100, Ingo Molnar wrote:
 * Ingo Molnar [EMAIL PROTECTED] wrote:
 
  pick up the latest latency tracer patch from:
 
 sorry, wrong URLs, the correct links are:
 

 http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch
http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c

Don't seem to work with plain 2.6.23:

kernel/sched.c:3384: warning: ‘struct prio_array’ declared inside parameter list
kernel/sched.c:3384: warning: its scope is only this definition or declaration, 
which is probably not what you want
kernel/sched.c: In function ‘trace_array’:
kernel/sched.c:3391: error: dereferencing pointer to incomplete type
kernel/sched.c:3393: error: dereferencing pointer to incomplete type
kernel/sched.c:3393: error: dereferencing pointer to incomplete type
kernel/sched.c:3396: error: dereferencing pointer to incomplete type
kernel/sched.c:3396: error: dereferencing pointer to incomplete type
kernel/sched.c: In function ‘trace_all_runnable_tasks’:
kernel/sched.c:3407: error: ‘struct rq’ has no member named ‘active’
make[1]: *** [kernel/sched.o] Error 1

And I cannot find a definition of struct prio_array in current git
either.  Is another patch needed?

Jörn

-- 
Time? What's that? Time is only worth what you do with it.
-- Theo de Raadt
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-30 Thread Jörn Engel

On Fri, 30 November 2007 14:35:46 +0100, Ingo Molnar wrote:
 * Jörn Engel [EMAIL PROTECTED] wrote:
  
  kernel/sched.c:3384: warning: ‘struct prio_array’ declared inside parameter 
  list
  kernel/sched.c:3384: warning: its scope is only this definition or 
  declaration, which is probably not what you want
  kernel/sched.c: In function ‘trace_array’:
  kernel/sched.c:3391: error: dereferencing pointer to incomplete type
  kernel/sched.c:3393: error: dereferencing pointer to incomplete type
  kernel/sched.c:3393: error: dereferencing pointer to incomplete type
  kernel/sched.c:3396: error: dereferencing pointer to incomplete type
  kernel/sched.c:3396: error: dereferencing pointer to incomplete type
  kernel/sched.c: In function ‘trace_all_runnable_tasks’:
  kernel/sched.c:3407: error: ‘struct rq’ has no member named ‘active’
  make[1]: *** [kernel/sched.o] Error 1
  
  And I cannot find a definition of struct prio_array in current git
  either.  Is another patch needed?
 
 change that to rt_prio_array in the code.

Solves the prio_array problem, but leaves the non-existing member
active.  I've upgraded to -rc3 and will give your latest patch a whirl.

Jörn

-- 
Write programs that do one thing and do it well. Write programs to work
together. Write programs to handle text streams, because that is a
universal interface.
-- Doug MacIlroy
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-30 Thread Ingo Molnar


* Jörn Engel [EMAIL PROTECTED] wrote:

 On Thu, 15 November 2007 20:36:12 +0100, Ingo Molnar wrote:
  * Ingo Molnar [EMAIL PROTECTED] wrote:
  
   pick up the latest latency tracer patch from:
  
  sorry, wrong URLs, the correct links are:
  
 
  http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch
 http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c
 
 Don't seem to work with plain 2.6.23:
 
 kernel/sched.c:3384: warning: ‘struct prio_array’ declared inside parameter 
 list
 kernel/sched.c:3384: warning: its scope is only this definition or 
 declaration, which is probably not what you want
 kernel/sched.c: In function ‘trace_array’:
 kernel/sched.c:3391: error: dereferencing pointer to incomplete type
 kernel/sched.c:3393: error: dereferencing pointer to incomplete type
 kernel/sched.c:3393: error: dereferencing pointer to incomplete type
 kernel/sched.c:3396: error: dereferencing pointer to incomplete type
 kernel/sched.c:3396: error: dereferencing pointer to incomplete type
 kernel/sched.c: In function ‘trace_all_runnable_tasks’:
 kernel/sched.c:3407: error: ‘struct rq’ has no member named ‘active’
 make[1]: *** [kernel/sched.o] Error 1
 
 And I cannot find a definition of struct prio_array in current git
 either.  Is another patch needed?

change that to rt_prio_array in the code.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-30 Thread Mark Lord


Mark Lord wrote:

Ingo Molnar wrote:

* Ingo Molnar [EMAIL PROTECTED] wrote:


pick up the latest latency tracer patch from:


sorry, wrong URLs, the correct links are:

   
http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch 


   http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c

..

Is there a version of these that works with 2.6.23.1 ?

I'm not using 2.6.24-* on this machine because:
 (a) behaviour may be different, and
 (b) something broke VMware compatibility again.

..

An update:

Ingo's 2.6.23 version of the latency-tracing-patches only lock-up on resume 
here.

But now that I've hacked vmware to work on 2.6.24,
my notebook is now running the newer kernels.

So now to see if the strange 1-second pauses ever happen here,
and if they do I'll patch in Ingo's stuff to try and find out why.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-18 Thread Mark Lord


Ingo Molnar wrote:

* Mark Lord <[EMAIL PROTECTED]> wrote:

Since Ingo's latency trace patches lock up the machine on resume, the 
next thing I'll try instead is to re-enable CONFIG_IRQBALANCE=y.


hm, which patch did you try? Could you check whether all chunks from the 
patch below are applied? (these are the fixed i did when i was doing 
cross-suspend traces - this is not something i've done before, so the 
tracer had to be adjusted)

..

Hi Ingo!

I was using your 2.6.23.1 version of the patches, plus the fix you posted.

The new patch you gave just now is for 2.6.24, which won't apply to
the older kernel.  I'm not switching kernels yet, as doing so might mask
the problem without actually resolving it for good.

Cheers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-18 Thread Ingo Molnar


* Mark Lord <[EMAIL PROTECTED]> wrote:

> Since Ingo's latency trace patches lock up the machine on resume, the 
> next thing I'll try instead is to re-enable CONFIG_IRQBALANCE=y.

hm, which patch did you try? Could you check whether all chunks from the 
patch below are applied? (these are the fixed i did when i was doing 
cross-suspend traces - this is not something i've done before, so the 
tracer had to be adjusted)

i suspect if you turn off CONFIG_FUNCTION_TRACING then you wont get any 
hung resume - and the resulting trace would still be pretty useful. (it 
will show scheduling and irq activities, etc.)

Ingo

---
 arch/x86/kernel/stacktrace.c |2 +-
 arch/x86/power/cpu.c |3 ++-
 drivers/acpi/namespace/nsutils.c |2 +-
 drivers/acpi/namespace/nswalk.c  |2 +-
 include/linux/sched.h|2 ++
 kernel/latency_trace.c   |   26 +++---
 kernel/softirq.c |6 +++---
 7 files changed, 33 insertions(+), 10 deletions(-)

Index: linux/arch/x86/kernel/stacktrace.c
===
--- linux.orig/arch/x86/kernel/stacktrace.c
+++ linux/arch/x86/kernel/stacktrace.c
@@ -22,7 +22,7 @@ static int save_stack_stack(void *data, 
return -1;
 }
 
-static void save_stack_address(void *data, unsigned long addr)
+static void notrace save_stack_address(void *data, unsigned long addr)
 {
struct stack_trace *trace = (struct stack_trace *)data;
if (trace->skip > 0) {
Index: linux/arch/x86/power/cpu.c
===
--- linux.orig/arch/x86/power/cpu.c
+++ linux/arch/x86/power/cpu.c
@@ -123,8 +123,9 @@ void __restore_processor_state(struct sa
mcheck_init(_cpu_data);
 }
 
-void restore_processor_state(void)
+void notrace restore_processor_state(void)
 {
+   trace_resume();
__restore_processor_state(_context);
 }
 
Index: linux/drivers/acpi/namespace/nsutils.c
===
--- linux.orig/drivers/acpi/namespace/nsutils.c
+++ linux/drivers/acpi/namespace/nsutils.c
@@ -923,7 +923,7 @@ struct acpi_namespace_node *acpi_ns_get_
  *
  
**/
 
-struct acpi_namespace_node *acpi_ns_get_next_valid_node(struct
+struct acpi_namespace_node * notrace acpi_ns_get_next_valid_node(struct
acpi_namespace_node
*node)
 {
Index: linux/drivers/acpi/namespace/nswalk.c
===
--- linux.orig/drivers/acpi/namespace/nswalk.c
+++ linux/drivers/acpi/namespace/nswalk.c
@@ -65,7 +65,7 @@ ACPI_MODULE_NAME("nswalk")
  *  within Scope is returned.
  *
  
**/
-struct acpi_namespace_node *acpi_ns_get_next_node(acpi_object_type type, 
struct acpi_namespace_node
+struct acpi_namespace_node * notrace acpi_ns_get_next_node(acpi_object_type 
type, struct acpi_namespace_node
  *parent_node, struct 
acpi_namespace_node
  *child_node)
 {
Index: linux/include/linux/sched.h
===
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -337,6 +337,7 @@ static inline void touch_all_softlockup_
   extern long user_trace_stop(void);
   extern void trace_cmdline(void);
   extern void init_tracer(void);
+  extern void trace_resume(void);
 #else
 # define mcount_enabled0
 # define trace_enabled 0
@@ -358,6 +359,7 @@ static inline void touch_all_softlockup_
 # define user_trace_stop() do { } while (0)
 # define trace_cmdline()   do { } while (0)
 # define init_tracer() do { } while (0)
+# define trace_resume()do { } while (0)
 #endif
 
 #ifdef CONFIG_WAKEUP_TIMING
Index: linux/kernel/latency_trace.c
===
--- linux.orig/kernel/latency_trace.c
+++ linux/kernel/latency_trace.c
@@ -258,19 +258,28 @@ static struct cpu_trace cpu_traces[NR_CP
 #endif
  } };
 
-static notrace cycle_t now(struct cpu_trace *tr, int monotonic)
+static inline notrace cycle_t __now(int monotonic)
 {
-   cycles_t now, delta, last = tr->last_cycles;
+   cycles_t now;
 
if (trace_use_raw_cycles && !monotonic)
now = get_cycles();
else
now = get_monotonic_cycles();
 
+   return now;
+}
+
+static notrace cycle_t now(struct cpu_trace *tr, int monotonic)
+{
+   cycles_t now, delta, last = tr->last_cycles;
+
+   now = __now(monotonic);
+
/*

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-18 Thread Mark Lord


Mark Lord wrote:

I have been reporting this off and on since 2.6.23 was released.
This problem was not apparent up to perhaps 2.6.23-rc8,
but definitely became common in 2.6.23 and 2.6.23.1.

Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds
of kernel time to complete.

Once in a while, it takes *much* longer, in the 14-20 second range.
These long events *seem* to be mostly after the notebook has been
in suspend for a longish time, but there's really nothing consistent here.

So git-bisect isn't going to work for this one.   I recently rebuilt the 
kernel
to include printk timestamps, and then it went 2 days without the issue 
happening,

until this morning (after an overnight suspend) finally.

The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ 
w/3GB DDR2.

PCIe express chipset, ATI graphics, SATA hard drive.

00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 
945GT Express Memory Controller Hub (rev 03)
00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 
945GT Express PCI Express Root Port (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High 
Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express 
Port 1 (rev 01)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express 
Port 2 (rev 01)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express 
Port 4 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
#1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
#2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
#3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
#4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI 
Controller (rev 01)

00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface 
Bridge (rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) 
Serial ATA Storage Controller IDE (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller 
(rev 01)
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility 
X1400
03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX 
(rev 02)

03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832
03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 
SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19)

03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01)
03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host 
Adapter (rev 0a)

03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05)
0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG 
Network Connection (rev 02)


There's also a USB2 hub connected, with the Logitech mouse being the
only thing plugged into it at present.

The best clue I have of what is happening, other than that it was first
seen during the late 2.6.23-rc* series, are the following two sets of
kernel logs.  The first set is from a "normal" fast wake-up,
and the second set is from the "slow" wake-up seen this morning.

Both logs show the same information, in pretty much the same order.
The BIG difference is a bunch of unexplained pauses of exactly 1-second
each that occur during the slow wakeup.

The only other difference is that the SATA disk messages are placed
differently within the two logs.  I'm thinking on that one but it
doesn't look significant to me.

The real question is, why the 1-sec pauses?

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

HERE IS THE NORMAL RESUME LOG.


[  820.147914] CPU1 is down
[0.355669] Back to C!
[0.356120] Enabling non-boot CPUs ...
[0.377167] SMP alternatives: switching to SMP code
[0.378565] Booting processor 1/1 eip 3000
[0.378568] CPU 1 irqstacks, hard=c0375000 soft=c0373000
[0.389374] Initializing CPU#1
[0.470225] Calibrating delay using timer specific routine.. 4324.85 
BogoMIPS (lpj=7204571)
[0.470235] CPU: After generic identify, caps: bfebfbff 2010 
  e3bd  0001 

[0.470244] monitor/mwait feature present.
[0.470248] CPU: L1 I cache: 32K, L1 D cache: 32K
[0.470252] CPU: L2 cache: 4096K
[0.470255] CPU: Physical Processor ID: 0
[0.470257] CPU: Processor Core ID: 1
[0.470260] CPU: After all inits, caps: bfebfbff 2010  
3940 e3bd  0001 
[0.471330] CPU1: Intel(R) Core(TM)2 CPU T7400  @ 2.16GHz 
stepping 06

[0.471724] CPU1 is up
[0.474787] Switched to high resolution mode on CPU 1
[0.483415] PM: Writing back config space on device :00:01.0 at 
offset a (was f, writing 0)
[0.483422] PM: Writing back config space on device :00:01.0 at 
offset 7 (was

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-18 Thread Ingo Molnar


* Mark Lord [EMAIL PROTECTED] wrote:

 Since Ingo's latency trace patches lock up the machine on resume, the 
 next thing I'll try instead is to re-enable CONFIG_IRQBALANCE=y.

hm, which patch did you try? Could you check whether all chunks from the 
patch below are applied? (these are the fixed i did when i was doing 
cross-suspend traces - this is not something i've done before, so the 
tracer had to be adjusted)

i suspect if you turn off CONFIG_FUNCTION_TRACING then you wont get any 
hung resume - and the resulting trace would still be pretty useful. (it 
will show scheduling and irq activities, etc.)

Ingo

---
 arch/x86/kernel/stacktrace.c |2 +-
 arch/x86/power/cpu.c |3 ++-
 drivers/acpi/namespace/nsutils.c |2 +-
 drivers/acpi/namespace/nswalk.c  |2 +-
 include/linux/sched.h|2 ++
 kernel/latency_trace.c   |   26 +++---
 kernel/softirq.c |6 +++---
 7 files changed, 33 insertions(+), 10 deletions(-)

Index: linux/arch/x86/kernel/stacktrace.c
===
--- linux.orig/arch/x86/kernel/stacktrace.c
+++ linux/arch/x86/kernel/stacktrace.c
@@ -22,7 +22,7 @@ static int save_stack_stack(void *data, 
return -1;
 }
 
-static void save_stack_address(void *data, unsigned long addr)
+static void notrace save_stack_address(void *data, unsigned long addr)
 {
struct stack_trace *trace = (struct stack_trace *)data;
if (trace-skip  0) {
Index: linux/arch/x86/power/cpu.c
===
--- linux.orig/arch/x86/power/cpu.c
+++ linux/arch/x86/power/cpu.c
@@ -123,8 +123,9 @@ void __restore_processor_state(struct sa
mcheck_init(boot_cpu_data);
 }
 
-void restore_processor_state(void)
+void notrace restore_processor_state(void)
 {
+   trace_resume();
__restore_processor_state(saved_context);
 }
 
Index: linux/drivers/acpi/namespace/nsutils.c
===
--- linux.orig/drivers/acpi/namespace/nsutils.c
+++ linux/drivers/acpi/namespace/nsutils.c
@@ -923,7 +923,7 @@ struct acpi_namespace_node *acpi_ns_get_
  *
  
**/
 
-struct acpi_namespace_node *acpi_ns_get_next_valid_node(struct
+struct acpi_namespace_node * notrace acpi_ns_get_next_valid_node(struct
acpi_namespace_node
*node)
 {
Index: linux/drivers/acpi/namespace/nswalk.c
===
--- linux.orig/drivers/acpi/namespace/nswalk.c
+++ linux/drivers/acpi/namespace/nswalk.c
@@ -65,7 +65,7 @@ ACPI_MODULE_NAME(nswalk)
  *  within Scope is returned.
  *
  
**/
-struct acpi_namespace_node *acpi_ns_get_next_node(acpi_object_type type, 
struct acpi_namespace_node
+struct acpi_namespace_node * notrace acpi_ns_get_next_node(acpi_object_type 
type, struct acpi_namespace_node
  *parent_node, struct 
acpi_namespace_node
  *child_node)
 {
Index: linux/include/linux/sched.h
===
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -337,6 +337,7 @@ static inline void touch_all_softlockup_
   extern long user_trace_stop(void);
   extern void trace_cmdline(void);
   extern void init_tracer(void);
+  extern void trace_resume(void);
 #else
 # define mcount_enabled0
 # define trace_enabled 0
@@ -358,6 +359,7 @@ static inline void touch_all_softlockup_
 # define user_trace_stop() do { } while (0)
 # define trace_cmdline()   do { } while (0)
 # define init_tracer() do { } while (0)
+# define trace_resume()do { } while (0)
 #endif
 
 #ifdef CONFIG_WAKEUP_TIMING
Index: linux/kernel/latency_trace.c
===
--- linux.orig/kernel/latency_trace.c
+++ linux/kernel/latency_trace.c
@@ -258,19 +258,28 @@ static struct cpu_trace cpu_traces[NR_CP
 #endif
  } };
 
-static notrace cycle_t now(struct cpu_trace *tr, int monotonic)
+static inline notrace cycle_t __now(int monotonic)
 {
-   cycles_t now, delta, last = tr-last_cycles;
+   cycles_t now;
 
if (trace_use_raw_cycles  !monotonic)
now = get_cycles();
else
now = get_monotonic_cycles();
 
+   return now;
+}
+
+static notrace cycle_t now(struct cpu_trace *tr, int monotonic)
+{
+   cycles_t now, delta, last = tr-last_cycles;
+
+   now = __now(monotonic);
+
/*

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-18 Thread Mark Lord


Mark Lord wrote:

I have been reporting this off and on since 2.6.23 was released.
This problem was not apparent up to perhaps 2.6.23-rc8,
but definitely became common in 2.6.23 and 2.6.23.1.

Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds
of kernel time to complete.

Once in a while, it takes *much* longer, in the 14-20 second range.
These long events *seem* to be mostly after the notebook has been
in suspend for a longish time, but there's really nothing consistent here.

So git-bisect isn't going to work for this one.   I recently rebuilt the 
kernel
to include printk timestamps, and then it went 2 days without the issue 
happening,

until this morning (after an overnight suspend) finally.

The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ 
w/3GB DDR2.

PCIe express chipset, ATI graphics, SATA hard drive.

00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 
945GT Express Memory Controller Hub (rev 03)
00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 
945GT Express PCI Express Root Port (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High 
Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express 
Port 1 (rev 01)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express 
Port 2 (rev 01)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express 
Port 4 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
#1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
#2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
#3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
#4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI 
Controller (rev 01)

00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface 
Bridge (rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) 
Serial ATA Storage Controller IDE (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller 
(rev 01)
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility 
X1400
03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX 
(rev 02)

03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832
03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 
SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19)

03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01)
03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host 
Adapter (rev 0a)

03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05)
0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG 
Network Connection (rev 02)


There's also a USB2 hub connected, with the Logitech mouse being the
only thing plugged into it at present.

The best clue I have of what is happening, other than that it was first
seen during the late 2.6.23-rc* series, are the following two sets of
kernel logs.  The first set is from a normal fast wake-up,
and the second set is from the slow wake-up seen this morning.

Both logs show the same information, in pretty much the same order.
The BIG difference is a bunch of unexplained pauses of exactly 1-second
each that occur during the slow wakeup.

The only other difference is that the SATA disk messages are placed
differently within the two logs.  I'm thinking on that one but it
doesn't look significant to me.

The real question is, why the 1-sec pauses?

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

HERE IS THE NORMAL RESUME LOG.


[  820.147914] CPU1 is down
[0.355669] Back to C!
[0.356120] Enabling non-boot CPUs ...
[0.377167] SMP alternatives: switching to SMP code
[0.378565] Booting processor 1/1 eip 3000
[0.378568] CPU 1 irqstacks, hard=c0375000 soft=c0373000
[0.389374] Initializing CPU#1
[0.470225] Calibrating delay using timer specific routine.. 4324.85 
BogoMIPS (lpj=7204571)
[0.470235] CPU: After generic identify, caps: bfebfbff 2010 
  e3bd  0001 

[0.470244] monitor/mwait feature present.
[0.470248] CPU: L1 I cache: 32K, L1 D cache: 32K
[0.470252] CPU: L2 cache: 4096K
[0.470255] CPU: Physical Processor ID: 0
[0.470257] CPU: Processor Core ID: 1
[0.470260] CPU: After all inits, caps: bfebfbff 2010  
3940 e3bd  0001 
[0.471330] CPU1: Intel(R) Core(TM)2 CPU T7400  @ 2.16GHz 
stepping 06

[0.471724] CPU1 is up
[0.474787] Switched to high resolution mode on CPU 1
[0.483415] PM: Writing back config space on device :00:01.0 at 
offset a (was f, writing 0)
[0.483422] PM: Writing back config space on device :00:01.0 at 
offset 7 (was e0e0,

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-18 Thread Mark Lord


Ingo Molnar wrote:

* Mark Lord [EMAIL PROTECTED] wrote:

Since Ingo's latency trace patches lock up the machine on resume, the 
next thing I'll try instead is to re-enable CONFIG_IRQBALANCE=y.


hm, which patch did you try? Could you check whether all chunks from the 
patch below are applied? (these are the fixed i did when i was doing 
cross-suspend traces - this is not something i've done before, so the 
tracer had to be adjusted)

..

Hi Ingo!

I was using your 2.6.23.1 version of the patches, plus the fix you posted.

The new patch you gave just now is for 2.6.24, which won't apply to
the older kernel.  I'm not switching kernels yet, as doing so might mask
the problem without actually resolving it for good.

Cheers
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-16 Thread Mark Lord


Ingo Molnar wrote:

* Ingo Molnar <[EMAIL PROTECTED]> wrote:


Is there a version of these that works with 2.6.23.1 ?

yes, i've backported it and have uploaded the v2.6.23 version to:

  
http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.23.1-combo.patch

..

ok, i experimented around with the latency tracer, trying to capture the 
trace of a full suspend+resume cycle, and it needed the tracer fix below 
(GTOD clocksource suspend/resume would otherwise confuse the tracer and 
you'd get no trace output as a result).


once that tracer bug was fixed, the best method to generate a trace was 
to do this:


   echo 1 > /proc/sys/kernel/stackframe_tracing
   echo 1 > /proc/sys/kernel/syscall_tracing
   ./trace-cmd bash -c "echo mem > /sys/power/state" > trace.txt

..

I've applied the tracer for 2.6.23.1 patch, plus the bugfix,
and now I just get blinking-LEDs (black screen of death) on resume.

Cheers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-16 Thread Mark Lord


Mark Lord wrote:

Ingo Molnar wrote:

* Ingo Molnar <[EMAIL PROTECTED]> wrote:


pick up the latest latency tracer patch from:


sorry, wrong URLs, the correct links are:

   
http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch 


   http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c

..

make trace-cmd
cc -Wall -O2 -strace-cmd.c   -o trace-cmd
trace-cmd.c: In function ‘main’:
trace-cmd.c:65: warning: label ‘usage’ defined but not used

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-16 Thread Mike Galbraith

On Fri, 2007-11-16 at 12:23 +0100, Ingo Molnar wrote:
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> 
> > once that tracer bug was fixed, the best method to generate a trace 
> > was to do this:
> > 
> >echo 1 > /proc/sys/kernel/stackframe_tracing
> >echo 1 > /proc/sys/kernel/syscall_tracing
> >./trace-cmd bash -c "echo mem > /sys/power/state" > trace.txt
> 
> so here's an UP suspend+resume trace i did:
> 
>   
> http://redhat.com/~mingo/latency-tracing-patches/misc/trace-suspend-long.txt.bz2
> 
> tons of detail - which might be interesting to other folks as well. Fact 
> is, our suspend-to-RAM+resume cycle is very, very slow, even on fast 
> hardware - and this trace shows all the reasons why.
> 
> This was a fully cached system - i.e. i've done a suspend+resume before 
> to warm up the caches. (not that suspend+resume does much IO normally.)
> 
> The trace shows that a suspend+resume cycle is 7.95 seconds long 
> (without counting the time the box spent suspended) - ouch! This was a 
> T60 with Core2Duo 1.83GHz.

Ouch?  That's an order of magnitude faster than my 3GHz P4 :)

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-16 Thread Ingo Molnar


* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> so here's an UP suspend+resume trace i did:
> 
>   
> http://redhat.com/~mingo/latency-tracing-patches/misc/trace-suspend-long.txt.bz2
> 
> tons of detail - which might be interesting to other folks as well. 
> Fact is, our suspend-to-RAM+resume cycle is very, very slow, even on 
> fast hardware - and this trace shows all the reasons why.
> 
> This was a fully cached system - i.e. i've done a suspend+resume 
> before to warm up the caches. (not that suspend+resume does much IO 
> normally.)
> 
> The trace shows that a suspend+resume cycle is 7.95 seconds long 
> (without counting the time the box spent suspended) - ouch! This was a 
> T60 with Core2Duo 1.83GHz.

and the amount of time spent executing on the CPU was only 70 msecs! So 
we spent 99% of that 7.9 seconds with just waiting around. Here are the 
top 10 sleep reasons:

864  schedule()<-schedule_timeout()<-ps2_sendbyte()<-ps2_command()
183  schedule()<-vt_waitactive()<-vt_ioctl()<-tty_ioctl()
164  schedule()<-schedule_timeout()<-acpi_ec_wait()<-acpi_ec_transaction()
157  
schedule()<-refrigerator()<-get_signal_to_deliver()<-do_notify_resume()
118  schedule()<-worker_thread()<-kthread()<-kernel_thread_helper()
 80  schedule()<-do_msleep()<-msleep()<-sata_link_debounce()
 64  schedule()<-schedule_timeout()<-inet_csk_accept()<-inet_accept()
 37  
schedule()<-__mutex_lock_slowpath()<-mutex_lock()<-acpi_ec_transaction()
 20  schedule()<-schedule_timeout()<-do_select()<-core_sys_select()
 20  schedule()<-io_schedule()<-sync_buffer()<-__wait_on_bit()

what's weird are all those ps2 related sleeps - they make up for much of 
the delay. This how such a sleep point looks like:

bash 3500  0D... 8641415us : 
schedule()<-schedule_timeout()<-ps2_sendbyte()<-ps2_command()
bash 3500  0D... 8641417us : 
psmouse_sliced_command()<-synaptics_pt_write()<-ps2_sendbyte()<-ps2_command()

it starts somewhere here:

bash 3500  0 5302376us : serio_reconnect_driver (serio_resume)

and ends:

 kseriod 208   0D... 9182560us : 
synaptics_query_hardware()<-synaptics_reconnect()<-psmouse_reconnect()<-serio_reconnect_driver()

so this section (serio_resume()) took almost 4 seconds.

the main delay seems to be dpm_resume():

bash 3500  0.N.. 3061040us : dpm_resume (device_resume)
...
bash 3500  0 9105017us : mutex_unlock (dpm_resume)
bash 3500  0 9105018us : mutex_unlock (device_resume)

6.1 seconds!

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-16 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> once that tracer bug was fixed, the best method to generate a trace 
> was to do this:
> 
>echo 1 > /proc/sys/kernel/stackframe_tracing
>echo 1 > /proc/sys/kernel/syscall_tracing
>./trace-cmd bash -c "echo mem > /sys/power/state" > trace.txt

so here's an UP suspend+resume trace i did:

http://redhat.com/~mingo/latency-tracing-patches/misc/trace-suspend-long.txt.bz2

tons of detail - which might be interesting to other folks as well. Fact 
is, our suspend-to-RAM+resume cycle is very, very slow, even on fast 
hardware - and this trace shows all the reasons why.

This was a fully cached system - i.e. i've done a suspend+resume before 
to warm up the caches. (not that suspend+resume does much IO normally.)

The trace shows that a suspend+resume cycle is 7.95 seconds long 
(without counting the time the box spent suspended) - ouch! This was a 
T60 with Core2Duo 1.83GHz.

For example here is where freezing starts:

bash-2397  0 31686us : remove_wait_queue (vt_waitactive)
bash-2397  0 31688us : freeze_processes (enter_state)
bash-2397  0 31689us : printk (freeze_processes)

here is where the ACPI code triggers the suspend:

bash-2397  0D... 1904138us : acpi_hw_low_level_write 
(acpi_hw_register_write)

but this is a whopping 1.9 seconds into the trace already!

first sign of life after i opened the laptop lid again:

bash-2397  0D... 1904138us : __restore_processor_state 
(restore_processor_state)
bash-2397  0D... 1904138us : enable_sep_cpu (__restore_processor_state)

(in the trace there's no delay visible - the period of time spent 
suspended is not visible to the tracer.)

One good way to start looking at such traces is to filter out 
rescheduling events alone:

  grep ': schedule <'  trace-suspend-long.txt

that gives a rough outline of what's going on:

   -0 0D... 1776566us : schedule  (0 20)
 bash-2397  0D... 1786748us : schedule <-0> (20 0)
 scsi_eh_-419   0D... 1786814us : schedule  (0 -5)
 bash-2397  0D... 1786960us : schedule  (-5 0)
 scsi_eh_-421   0D... 1787020us : schedule  (0 -5)
 bash-2397  0D... 1787125us : schedule  (-5 0)

so you can zoom in on the real area of interest by searching for the 
timestamp.

Hope this helps,

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-16 Thread Ingo Molnar


ok, i experimented around with the latency tracer, trying to capture the 
trace of a full suspend+resume cycle, and it needed the tracer fix below 
(GTOD clocksource suspend/resume would otherwise confuse the tracer and 
you'd get no trace output as a result).

once that tracer bug was fixed, the best method to generate a trace was 
to do this:

   echo 1 > /proc/sys/kernel/stackframe_tracing
   echo 1 > /proc/sys/kernel/syscall_tracing
   ./trace-cmd bash -c "echo mem > /sys/power/state" > trace.txt

Ingo

---
 kernel/latency_trace.c |   53 +
 1 file changed, 36 insertions(+), 17 deletions(-)

Index: linux/kernel/latency_trace.c
===
--- linux.orig/kernel/latency_trace.c
+++ linux/kernel/latency_trace.c
@@ -67,13 +67,6 @@ static unsigned long notrace cycles_to_u
 }
 #endif
 
-static notrace inline cycle_t now(void)
-{
-   if (trace_use_raw_cycles)
-   return get_cycles();
-   return get_monotonic_cycles();
-}
-
 #ifndef irqs_off
 # define irqs_off()0
 #endif
@@ -237,6 +230,8 @@ struct cpu_trace {
unsigned long critical_sequence;
atomic_t underrun;
atomic_t overrun;
+   cycle_t cycles;
+   cycle_t last_cycles;
int early_warning;
int latency_type;
int cpu;
@@ -263,6 +258,30 @@ static struct cpu_trace cpu_traces[NR_CP
 #endif
  } };
 
+static notrace cycle_t now(struct cpu_trace *tr, int monotonic)
+{
+   cycles_t now, delta, last = tr->last_cycles;
+
+   if (trace_use_raw_cycles && !monotonic)
+   now = get_cycles();
+   else
+   now = get_monotonic_cycles();
+
+   /*
+* Protect against time warps:
+*/
+   if (unlikely(now < last))
+   delta = 1;
+   else
+   delta = now - last;
+
+   tr->last_cycles = now;
+   tr->cycles += delta;
+
+   return tr->cycles;
+}
+
+
 #ifdef CONFIG_EVENT_TRACE
 
 int trace_enabled = 0;
@@ -615,7 +634,7 @@ trace(int cpu, enum trace_type type,
 again:
idx = tr->trace_idx;
idx_next = idx + 1;
-   timestamp = now();
+   timestamp = now(tr, 0);
 
if (unlikely((trace_freerunning || print_functions || 
atomic_read(>underrun)) &&
 (idx_next >= MAX_TRACE) && !atomic_read(>overrun))) {
@@ -1743,7 +1762,7 @@ check_critical_timing(int cpu, struct cp
 * as long as possible:
 */
T0 = tr->preempt_timestamp;
-   T1 = get_monotonic_cycles();
+   T1 = now(tr, 1);
delta = T1-T0;
 
local_save_flags(flags);
@@ -1757,7 +1776,7 @@ check_critical_timing(int cpu, struct cp
 * might change it (it can only get larger so the latency
 * is fair to be reported):
 */
-   T2 = get_monotonic_cycles();
+   T2 = now(tr, 1);
 
delta = T2-T0;
 
@@ -1807,7 +1826,7 @@ check_critical_timing(int cpu, struct cp
printk(" =>   ended at timestamp %lu: ", t1);
print_symbol("<%s>\n", tr->critical_end);
dump_stack();
-   t1 = cycles_to_usecs(get_monotonic_cycles());
+   t1 = cycles_to_usecs(now(tr, 1));
printk(" =>   dump-end timestamp %lu\n\n", t1);
 #endif
 
@@ -1817,7 +1836,7 @@ check_critical_timing(int cpu, struct cp
 
 out:
tr->critical_sequence = max_sequence;
-   tr->preempt_timestamp = get_monotonic_cycles();
+   tr->preempt_timestamp = now(tr, 1);
tr->early_warning = 0;
reset_trace_idx(cpu, tr);
_trace_cmdline(cpu, tr);
@@ -1866,7 +1885,7 @@ __start_critical_timing(unsigned long ei
atomic_inc(>disabled);
 
tr->critical_sequence = max_sequence;
-   tr->preempt_timestamp = get_monotonic_cycles();
+   tr->preempt_timestamp = now(tr, 1);
tr->critical_start = eip;
reset_trace_idx(cpu, tr);
tr->latency_type = latency_type;
@@ -2150,7 +2169,7 @@ check_wakeup_timing(struct cpu_trace *tr
goto out;
 
T0 = tr->preempt_timestamp;
-   T1 = get_monotonic_cycles();
+   T1 = now(tr, 1);
/*
 * Any wraparound or time warp and we are out:
 */
@@ -2255,7 +2274,7 @@ void __trace_start_sched_wakeup(struct t
 // if (!atomic_read(>disabled)) {
atomic_inc(>disabled);
tr->critical_sequence = max_sequence;
-   tr->preempt_timestamp = get_monotonic_cycles();
+   tr->preempt_timestamp = now(tr, 1);
tr->latency_type = WAKEUP_LATENCY;
tr->critical_start = CALLER_ADDR0;
_trace_cmdline(raw_smp_processor_id(), tr);
@@ -2366,7 +2385,7 @@ long user_trace_start(void)
reset_trace_idx(cpu, tr);
 
tr->critical_sequence = max_sequence;
-   tr->preempt_timestamp = get_monotonic_cycles();
+   tr->preempt_timestamp = now(tr, 1);
tr->critical_start = CALLER_ADDR0;

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-16 Thread Ingo Molnar


* Ingo Molnar [EMAIL PROTECTED] wrote:

 once that tracer bug was fixed, the best method to generate a trace 
 was to do this:
 
echo 1  /proc/sys/kernel/stackframe_tracing
echo 1  /proc/sys/kernel/syscall_tracing
./trace-cmd bash -c echo mem  /sys/power/state  trace.txt

so here's an UP suspend+resume trace i did:

  
http://redhat.com/~mingo/latency-tracing-patches/misc/trace-suspend-long.txt.bz2

tons of detail - which might be interesting to other folks as well. Fact 
is, our suspend-to-RAM+resume cycle is very, very slow, even on fast 
hardware - and this trace shows all the reasons why.

This was a fully cached system - i.e. i've done a suspend+resume before 
to warm up the caches. (not that suspend+resume does much IO normally.)

The trace shows that a suspend+resume cycle is 7.95 seconds long 
(without counting the time the box spent suspended) - ouch! This was a 
T60 with Core2Duo 1.83GHz.

For example here is where freezing starts:

bash-2397  0 31686us : remove_wait_queue (vt_waitactive)
bash-2397  0 31688us : freeze_processes (enter_state)
bash-2397  0 31689us : printk (freeze_processes)

here is where the ACPI code triggers the suspend:

bash-2397  0D... 1904138us : acpi_hw_low_level_write 
(acpi_hw_register_write)

but this is a whopping 1.9 seconds into the trace already!

first sign of life after i opened the laptop lid again:

bash-2397  0D... 1904138us : __restore_processor_state 
(restore_processor_state)
bash-2397  0D... 1904138us : enable_sep_cpu (__restore_processor_state)

(in the trace there's no delay visible - the period of time spent 
suspended is not visible to the tracer.)

One good way to start looking at such traces is to filter out 
rescheduling events alone:

  grep ': schedule '  trace-suspend-long.txt

that gives a rough outline of what's going on:

   idle-0 0D... 1776566us : schedule bash-2397 (0 20)
 bash-2397  0D... 1786748us : schedule idle-0 (20 0)
 scsi_eh_-419   0D... 1786814us : schedule bash-2397 (0 -5)
 bash-2397  0D... 1786960us : schedule scsi_eh_-419 (-5 0)
 scsi_eh_-421   0D... 1787020us : schedule bash-2397 (0 -5)
 bash-2397  0D... 1787125us : schedule scsi_eh_-421 (-5 0)

so you can zoom in on the real area of interest by searching for the 
timestamp.

Hope this helps,

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-16 Thread Mike Galbraith

On Fri, 2007-11-16 at 12:23 +0100, Ingo Molnar wrote:
 * Ingo Molnar [EMAIL PROTECTED] wrote:
 
  once that tracer bug was fixed, the best method to generate a trace 
  was to do this:
  
 echo 1  /proc/sys/kernel/stackframe_tracing
 echo 1  /proc/sys/kernel/syscall_tracing
 ./trace-cmd bash -c echo mem  /sys/power/state  trace.txt
 
 so here's an UP suspend+resume trace i did:
 
   
 http://redhat.com/~mingo/latency-tracing-patches/misc/trace-suspend-long.txt.bz2
 
 tons of detail - which might be interesting to other folks as well. Fact 
 is, our suspend-to-RAM+resume cycle is very, very slow, even on fast 
 hardware - and this trace shows all the reasons why.
 
 This was a fully cached system - i.e. i've done a suspend+resume before 
 to warm up the caches. (not that suspend+resume does much IO normally.)
 
 The trace shows that a suspend+resume cycle is 7.95 seconds long 
 (without counting the time the box spent suspended) - ouch! This was a 
 T60 with Core2Duo 1.83GHz.

Ouch?  That's an order of magnitude faster than my 3GHz P4 :)

-Mike

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-16 Thread Ingo Molnar


* Ingo Molnar [EMAIL PROTECTED] wrote:

 so here's an UP suspend+resume trace i did:
 
   
 http://redhat.com/~mingo/latency-tracing-patches/misc/trace-suspend-long.txt.bz2
 
 tons of detail - which might be interesting to other folks as well. 
 Fact is, our suspend-to-RAM+resume cycle is very, very slow, even on 
 fast hardware - and this trace shows all the reasons why.
 
 This was a fully cached system - i.e. i've done a suspend+resume 
 before to warm up the caches. (not that suspend+resume does much IO 
 normally.)
 
 The trace shows that a suspend+resume cycle is 7.95 seconds long 
 (without counting the time the box spent suspended) - ouch! This was a 
 T60 with Core2Duo 1.83GHz.

and the amount of time spent executing on the CPU was only 70 msecs! So 
we spent 99% of that 7.9 seconds with just waiting around. Here are the 
top 10 sleep reasons:

864  schedule()-schedule_timeout()-ps2_sendbyte()-ps2_command()
183  schedule()-vt_waitactive()-vt_ioctl()-tty_ioctl()
164  schedule()-schedule_timeout()-acpi_ec_wait()-acpi_ec_transaction()
157  
schedule()-refrigerator()-get_signal_to_deliver()-do_notify_resume()
118  schedule()-worker_thread()-kthread()-kernel_thread_helper()
 80  schedule()-do_msleep()-msleep()-sata_link_debounce()
 64  schedule()-schedule_timeout()-inet_csk_accept()-inet_accept()
 37  
schedule()-__mutex_lock_slowpath()-mutex_lock()-acpi_ec_transaction()
 20  schedule()-schedule_timeout()-do_select()-core_sys_select()
 20  schedule()-io_schedule()-sync_buffer()-__wait_on_bit()

what's weird are all those ps2 related sleeps - they make up for much of 
the delay. This how such a sleep point looks like:

bash 3500  0D... 8641415us : 
schedule()-schedule_timeout()-ps2_sendbyte()-ps2_command()
bash 3500  0D... 8641417us : 
psmouse_sliced_command()-synaptics_pt_write()-ps2_sendbyte()-ps2_command()

it starts somewhere here:

bash 3500  0 5302376us : serio_reconnect_driver (serio_resume)

and ends:

 kseriod 208   0D... 9182560us : 
synaptics_query_hardware()-synaptics_reconnect()-psmouse_reconnect()-serio_reconnect_driver()

so this section (serio_resume()) took almost 4 seconds.

the main delay seems to be dpm_resume():

bash 3500  0.N.. 3061040us : dpm_resume (device_resume)
...
bash 3500  0 9105017us : mutex_unlock (dpm_resume)
bash 3500  0 9105018us : mutex_unlock (device_resume)

6.1 seconds!

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-16 Thread Ingo Molnar


ok, i experimented around with the latency tracer, trying to capture the 
trace of a full suspend+resume cycle, and it needed the tracer fix below 
(GTOD clocksource suspend/resume would otherwise confuse the tracer and 
you'd get no trace output as a result).

once that tracer bug was fixed, the best method to generate a trace was 
to do this:

   echo 1  /proc/sys/kernel/stackframe_tracing
   echo 1  /proc/sys/kernel/syscall_tracing
   ./trace-cmd bash -c echo mem  /sys/power/state  trace.txt

Ingo

---
 kernel/latency_trace.c |   53 +
 1 file changed, 36 insertions(+), 17 deletions(-)

Index: linux/kernel/latency_trace.c
===
--- linux.orig/kernel/latency_trace.c
+++ linux/kernel/latency_trace.c
@@ -67,13 +67,6 @@ static unsigned long notrace cycles_to_u
 }
 #endif
 
-static notrace inline cycle_t now(void)
-{
-   if (trace_use_raw_cycles)
-   return get_cycles();
-   return get_monotonic_cycles();
-}
-
 #ifndef irqs_off
 # define irqs_off()0
 #endif
@@ -237,6 +230,8 @@ struct cpu_trace {
unsigned long critical_sequence;
atomic_t underrun;
atomic_t overrun;
+   cycle_t cycles;
+   cycle_t last_cycles;
int early_warning;
int latency_type;
int cpu;
@@ -263,6 +258,30 @@ static struct cpu_trace cpu_traces[NR_CP
 #endif
  } };
 
+static notrace cycle_t now(struct cpu_trace *tr, int monotonic)
+{
+   cycles_t now, delta, last = tr-last_cycles;
+
+   if (trace_use_raw_cycles  !monotonic)
+   now = get_cycles();
+   else
+   now = get_monotonic_cycles();
+
+   /*
+* Protect against time warps:
+*/
+   if (unlikely(now  last))
+   delta = 1;
+   else
+   delta = now - last;
+
+   tr-last_cycles = now;
+   tr-cycles += delta;
+
+   return tr-cycles;
+}
+
+
 #ifdef CONFIG_EVENT_TRACE
 
 int trace_enabled = 0;
@@ -615,7 +634,7 @@ trace(int cpu, enum trace_type type,
 again:
idx = tr-trace_idx;
idx_next = idx + 1;
-   timestamp = now();
+   timestamp = now(tr, 0);
 
if (unlikely((trace_freerunning || print_functions || 
atomic_read(tr-underrun)) 
 (idx_next = MAX_TRACE)  !atomic_read(tr-overrun))) {
@@ -1743,7 +1762,7 @@ check_critical_timing(int cpu, struct cp
 * as long as possible:
 */
T0 = tr-preempt_timestamp;
-   T1 = get_monotonic_cycles();
+   T1 = now(tr, 1);
delta = T1-T0;
 
local_save_flags(flags);
@@ -1757,7 +1776,7 @@ check_critical_timing(int cpu, struct cp
 * might change it (it can only get larger so the latency
 * is fair to be reported):
 */
-   T2 = get_monotonic_cycles();
+   T2 = now(tr, 1);
 
delta = T2-T0;
 
@@ -1807,7 +1826,7 @@ check_critical_timing(int cpu, struct cp
printk( =   ended at timestamp %lu: , t1);
print_symbol(%s\n, tr-critical_end);
dump_stack();
-   t1 = cycles_to_usecs(get_monotonic_cycles());
+   t1 = cycles_to_usecs(now(tr, 1));
printk( =   dump-end timestamp %lu\n\n, t1);
 #endif
 
@@ -1817,7 +1836,7 @@ check_critical_timing(int cpu, struct cp
 
 out:
tr-critical_sequence = max_sequence;
-   tr-preempt_timestamp = get_monotonic_cycles();
+   tr-preempt_timestamp = now(tr, 1);
tr-early_warning = 0;
reset_trace_idx(cpu, tr);
_trace_cmdline(cpu, tr);
@@ -1866,7 +1885,7 @@ __start_critical_timing(unsigned long ei
atomic_inc(tr-disabled);
 
tr-critical_sequence = max_sequence;
-   tr-preempt_timestamp = get_monotonic_cycles();
+   tr-preempt_timestamp = now(tr, 1);
tr-critical_start = eip;
reset_trace_idx(cpu, tr);
tr-latency_type = latency_type;
@@ -2150,7 +2169,7 @@ check_wakeup_timing(struct cpu_trace *tr
goto out;
 
T0 = tr-preempt_timestamp;
-   T1 = get_monotonic_cycles();
+   T1 = now(tr, 1);
/*
 * Any wraparound or time warp and we are out:
 */
@@ -2255,7 +2274,7 @@ void __trace_start_sched_wakeup(struct t
 // if (!atomic_read(tr-disabled)) {
atomic_inc(tr-disabled);
tr-critical_sequence = max_sequence;
-   tr-preempt_timestamp = get_monotonic_cycles();
+   tr-preempt_timestamp = now(tr, 1);
tr-latency_type = WAKEUP_LATENCY;
tr-critical_start = CALLER_ADDR0;
_trace_cmdline(raw_smp_processor_id(), tr);
@@ -2366,7 +2385,7 @@ long user_trace_start(void)
reset_trace_idx(cpu, tr);
 
tr-critical_sequence = max_sequence;
-   tr-preempt_timestamp = get_monotonic_cycles();
+   tr-preempt_timestamp = now(tr, 1);
tr-critical_start = CALLER_ADDR0;
_trace_cmdline(cpu, tr);
mcount();
@@ -2425,7

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-16 Thread Mark Lord


Ingo Molnar wrote:

* Ingo Molnar [EMAIL PROTECTED] wrote:


Is there a version of these that works with 2.6.23.1 ?

yes, i've backported it and have uploaded the v2.6.23 version to:

  
http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.23.1-combo.patch

..

ok, i experimented around with the latency tracer, trying to capture the 
trace of a full suspend+resume cycle, and it needed the tracer fix below 
(GTOD clocksource suspend/resume would otherwise confuse the tracer and 
you'd get no trace output as a result).


once that tracer bug was fixed, the best method to generate a trace was 
to do this:


   echo 1  /proc/sys/kernel/stackframe_tracing
   echo 1  /proc/sys/kernel/syscall_tracing
   ./trace-cmd bash -c echo mem  /sys/power/state  trace.txt

..

I've applied the tracer for 2.6.23.1 patch, plus the bugfix,
and now I just get blinking-LEDs (black screen of death) on resume.

Cheers
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-16 Thread Mark Lord


Mark Lord wrote:

Ingo Molnar wrote:

* Ingo Molnar [EMAIL PROTECTED] wrote:


pick up the latest latency tracer patch from:


sorry, wrong URLs, the correct links are:

   
http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch 


   http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c

..

make trace-cmd
cc -Wall -O2 -strace-cmd.c   -o trace-cmd
trace-cmd.c: In function ‘main’:
trace-cmd.c:65: warning: label ‘usage’ defined but not used

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Ingo Molnar


* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > Is there a version of these that works with 2.6.23.1 ?
> 
> yes, i've backported it and have uploaded the v2.6.23 version to:
> 
>   
> http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.23.1-combo.patch

btw., if the trace is too large and the interesting section of suspend 
does not fit into it then you can narrow it down to the most important 
events only by changing this in trace-cmd.c:

system("echo 1 > /proc/sys/kernel/mcount_enabled");

to:

system("echo 0 > /proc/sys/kernel/mcount_enabled");

that way we'll still trace IRQs and scheduling events, which is enough 
to see roughly where the delay is happening. To include symbolic 
backtraces in the trace, do this:

echo 1 > /proc/sys/kernel/stackframe_tracing

to get such trace entries:

  ls 3688  1D..3 4642us+: deactivate_task  (0 0)
  ls 3688  1D..3 4644us+: 
schedule()<-do_exit()<-sys_exit_group()<-sys_exit_group()
  ls 3688  1D..3 4647us+: sysenter_past_esp()<-(   -1)()<-(0)()<-(
0)()
   0 1D..2 4652us : schedule  (0 20)

this way we can see why a task goes to sleep and which exact kernel 
codepath is waking it up.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Ingo Molnar


* Mark Lord <[EMAIL PROTECTED]> wrote:

>> sorry, wrong URLs, the correct links are:
>>
>>
>> http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch
>>http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c
> ..
>
> Is there a version of these that works with 2.6.23.1 ?

yes, i've backported it and have uploaded the v2.6.23 version to:

  
http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.23.1-combo.patch

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Mark Lord


Ingo Molnar wrote:

* Ingo Molnar <[EMAIL PROTECTED]> wrote:


pick up the latest latency tracer patch from:


sorry, wrong URLs, the correct links are:

   
http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch
   http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c

..

Is there a version of these that works with 2.6.23.1 ?

I'm not using 2.6.24-* on this machine because:
 (a) behaviour may be different, and
 (b) something broke VMware compatibility again.

Thanks
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Rafael J. Wysocki

On Thursday, 15 of November 2007, Mark Lord wrote:
> > Hi!
> > 
> >> > I have been reporting this off and on since 2.6.23 was released.
> >> > This problem was not apparent up to perhaps 2.6.23-rc8,
> >> > but definitely became common in 2.6.23 and 2.6.23.1.
> >> > 
> >> > Most of the time, a resume-from-RAM on my notebook takes about 2.1 
> >> > seconds
> >> > of kernel time to complete.
> >> > 
> >> > Once in a while, it takes *much* longer, in the 14-20 second range.
> >> > These long events *seem* to be mostly after the notebook has been
> >> > in suspend for a longish time, but there's really nothing consistent 
> >> > here.
> >> > 
> >> > So git-bisect isn't going to work for this one.   I recently rebuilt the 
> >> > kernel
> >> > to include printk timestamps, and then it went 2 days without the issue 
> >> > happening,
> >> > until this morning (after an overnight suspend) finally.
> >> > 
> >> > The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ 
> >> > w/3GB 
> >> > DDR2.
> >> > PCIe express chipset, ATI graphics, SATA hard drive.
> >> > 
> >> > 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 
> >> > 945GT 
> >> > Express Memory Controller Hub (rev 03)
> >> > 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 
> >> > 945GT 
> >> > Express PCI Express Root Port (rev 03)
> >> > 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High 
> >> > Definition Audio Controller (rev 01)
> >> > 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express 
> >> > Port 
> >> > 1 (rev 01)
> >> > 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express 
> >> > Port 
> >> > 2 (rev 01)
> >> > 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express 
> >> > Port 
> >> > 4 (rev 01)
> >> > 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
> >> > #1 
> >> > (rev 01)
> >> > 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
> >> > #2 
> >> > (rev 01)
> >> > 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
> >> > #3 
> >> > (rev 01)
> >> > 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
> >> > #4 
> >> > (rev 01)
> >> > 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI 
> >> > Controller (rev 01)
> >> > 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
> >> > 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface 
> >> > Bridge (rev 01)
> >> > 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) 
> >> > Serial 
> >> > ATA Storage Controller IDE (rev 01)
> >> > 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller 
> >> > (rev 
> >> > 01)
> >> > 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility 
> >> > X1400
> >> > 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX 
> >> > (rev 02)
> >> > 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832
> >> > 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 
> >> > SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19)
> >> > 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01)
> >> > 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host 
> >> > Adapter (rev 0a)
> >> > 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 
> >> > 05)
> >> > 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG 
> >> > Network 
> >> > Connection (rev 02)
> >> > 
> >> > There's also a USB2 hub connected, with the Logitech mouse being the
> >> > only thing plugged into it at present.
> >> > 
> >> > The best clue I have of what is happening, other than that it was first
> >> > seen during the late 2.6.23-rc* series, are the following two sets of
> >> > kernel logs.  The first set is from a "normal" fast wake-up,
> >> > and the second set is from the "slow" wake-up seen this morning.
> >> > 
> >> > Both logs show the same information, in pretty much the same order.
> >> > The BIG difference is a bunch of unexplained pauses of exactly 1-second
> >> > each that occur during the slow wakeup.
> >> > 
> >> > The only other difference is that the SATA disk messages are placed
> >> > differently within the two logs.  I'm thinking on that one but it
> >> > doesn't look significant to me.
> >> > 
> >> > The real question is, why the 1-sec pauses?
> > 
> > Can you try nohz=off highres=off? Strange stuff is happening with
> > nohz.
> ..
> 
> (added Ingo to CC: list:  maybe this is some weird interaction with CFS
> and jiffies being reset to 0 on resume ??)
> 
> I can try it, but it won't help debug the problem much.

Well, if you could verify that it doesn't happen at all with NO_HZ unset, that
would be a valuable data point, IMO.

> Remember, this happens very inconsistently, maybe 3-4 times a day,
> or not at all for 3-4 days.
> 
> But if somebody has a specific bug-fix patch that could explain this,
> then I'll

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Ingo Molnar


* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> pick up the latest latency tracer patch from:

sorry, wrong URLs, the correct links are:

   
http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch
   http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Ingo Molnar

* Mark Lord <[EMAIL PROTECTED]> wrote:

>> Can you try nohz=off highres=off? Strange stuff is happening with 
>> nohz.
>
> (added Ingo to CC: list: maybe this is some weird interaction with CFS 
> and jiffies being reset to 0 on resume ??)

hm, CFS should have no impact here. To see what's happening you could 
try to use the latency tracer of the -rt patch and do a cross-resume 
trace.

pick up the latest latency tracer patch from:

  http://redhat.com/~mingo/private/latency-tracer-v2.6.24-rc2-git5-combo.patch

apply it and enable CONFIG_FUNCTION_TRACING, then pick up trace-cmd.c:

  http://redhat.com/~mingo/private/trace-cmd.c

and do something like:

  ./trace-cmd pm-suspend > trace.txt

or:

  ./trace-cmd /bin/bash -c "echo ram > /sys/power/state" > trace.txt

this should trigger suspend - then you should do the resume. If 
everything goes well then trace.txt should contain a pretty large trace 
of all the stuff we do during a suspend+resume.

and wait for such a pause and send us the resulting trace.txt.

if it's an SMP box then first do:

  echo 1 > /proc/sys/kernel/trace_all_cpus

to get a global trace. Let me know if something doesnt work with this 
scheme.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Mark Lord


Hi!


> I have been reporting this off and on since 2.6.23 was released.
> This problem was not apparent up to perhaps 2.6.23-rc8,
> but definitely became common in 2.6.23 and 2.6.23.1.
> 
> Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds

> of kernel time to complete.
> 
> Once in a while, it takes *much* longer, in the 14-20 second range.

> These long events *seem* to be mostly after the notebook has been
> in suspend for a longish time, but there's really nothing consistent here.
> 
> So git-bisect isn't going to work for this one.   I recently rebuilt the 
> kernel
> to include printk timestamps, and then it went 2 days without the issue 
> happening,

> until this morning (after an overnight suspend) finally.
> 
> The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB 
> DDR2.

> PCIe express chipset, ATI graphics, SATA hard drive.
> 
> 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT 
> Express Memory Controller Hub (rev 03)
> 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT 
> Express PCI Express Root Port (rev 03)
> 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High 
> Definition Audio Controller (rev 01)
> 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 
> 1 (rev 01)
> 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 
> 2 (rev 01)
> 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 
> 4 (rev 01)
> 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 
> (rev 01)
> 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 
> (rev 01)
> 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 
> (rev 01)
> 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 
> (rev 01)
> 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI 
> Controller (rev 01)

> 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
> 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface 
> Bridge (rev 01)
> 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial 
> ATA Storage Controller IDE (rev 01)
> 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 
> 01)
> 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility 
> X1400
> 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX 
> (rev 02)

> 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832
> 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 
> SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19)

> 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01)
> 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host 
> Adapter (rev 0a)

> 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05)
> 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network 
> Connection (rev 02)
> 
> There's also a USB2 hub connected, with the Logitech mouse being the

> only thing plugged into it at present.
> 
> The best clue I have of what is happening, other than that it was first

> seen during the late 2.6.23-rc* series, are the following two sets of
> kernel logs.  The first set is from a "normal" fast wake-up,
> and the second set is from the "slow" wake-up seen this morning.
> 
> Both logs show the same information, in pretty much the same order.

> The BIG difference is a bunch of unexplained pauses of exactly 1-second
> each that occur during the slow wakeup.
> 
> The only other difference is that the SATA disk messages are placed

> differently within the two logs.  I'm thinking on that one but it
> doesn't look significant to me.
> 
> The real question is, why the 1-sec pauses?


Can you try nohz=off highres=off? Strange stuff is happening with
nohz.

..

(added Ingo to CC: list:  maybe this is some weird interaction with CFS
and jiffies being reset to 0 on resume ??)

I can try it, but it won't help debug the problem much.
Remember, this happens very inconsistently, maybe 3-4 times a day,
or not at all for 3-4 days.

But if somebody has a specific bug-fix patch that could explain this,
then I'll happily apply it here.

Cheers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Pavel Machek

Hi!

> I have been reporting this off and on since 2.6.23 was released.
> This problem was not apparent up to perhaps 2.6.23-rc8,
> but definitely became common in 2.6.23 and 2.6.23.1.
> 
> Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds
> of kernel time to complete.
> 
> Once in a while, it takes *much* longer, in the 14-20 second range.
> These long events *seem* to be mostly after the notebook has been
> in suspend for a longish time, but there's really nothing consistent here.
> 
> So git-bisect isn't going to work for this one.   I recently rebuilt the 
> kernel
> to include printk timestamps, and then it went 2 days without the issue 
> happening,
> until this morning (after an overnight suspend) finally.
> 
> The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB 
> DDR2.
> PCIe express chipset, ATI graphics, SATA hard drive.
> 
> 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT 
> Express Memory Controller Hub (rev 03)
> 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT 
> Express PCI Express Root Port (rev 03)
> 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High 
> Definition Audio Controller (rev 01)
> 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 
> 1 (rev 01)
> 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 
> 2 (rev 01)
> 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 
> 4 (rev 01)
> 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 
> (rev 01)
> 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 
> (rev 01)
> 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 
> (rev 01)
> 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 
> (rev 01)
> 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI 
> Controller (rev 01)
> 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
> 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface 
> Bridge (rev 01)
> 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial 
> ATA Storage Controller IDE (rev 01)
> 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 
> 01)
> 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility 
> X1400
> 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX 
> (rev 02)
> 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832
> 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 
> SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19)
> 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01)
> 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host 
> Adapter (rev 0a)
> 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05)
> 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network 
> Connection (rev 02)
> 
> There's also a USB2 hub connected, with the Logitech mouse being the
> only thing plugged into it at present.
> 
> The best clue I have of what is happening, other than that it was first
> seen during the late 2.6.23-rc* series, are the following two sets of
> kernel logs.  The first set is from a "normal" fast wake-up,
> and the second set is from the "slow" wake-up seen this morning.
> 
> Both logs show the same information, in pretty much the same order.
> The BIG difference is a bunch of unexplained pauses of exactly 1-second
> each that occur during the slow wakeup.
> 
> The only other difference is that the SATA disk messages are placed
> differently within the two logs.  I'm thinking on that one but it
> doesn't look significant to me.
> 
> The real question is, why the 1-sec pauses?

Can you try nohz=off highres=off? Strange stuff is happening with
nohz.
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Mark Lord


Mark Lord wrote:

Ray Lee wrote:
..

The real question is, why the 1-sec pauses?


Well, and why the 1-second pauses eventually stop, too. Seems
interesting that they don't continue. Also, they're pretty much
dead-on one- and two-second pauses, with HZ accuracy. Is this with a
NO_HZ kernel?

..

Yes, A NO_HZ kernel.  Full config follows:

..

Blah.. that was for the wrong kernel.

Here is the correct .config that was in use for both suspend logs:


#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23.1
# Wed Nov 14 09:41:54 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=16
# CONFIG_CPUSETS is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set
CONFIG_BLK_DEV_BSG=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
# CONFIG_IOSCHED_DEADLINE is not set
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_SMP=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
CONFIG_MCORE2=y
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_MINIMUM_CPU_FAMILY=4
CONFIG_HPET_TIMER=y
CONFIG_NR_CPUS=2
# CONFIG_SCHED_SMT is not set
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_MCE is not set
CONFIG_VM86=y
# CONFIG_TOSHIBA is not set
CONFIG_I8K=m
CONFIG_X86_REBOOTFIXUPS=y
CONFIG_MICROCODE=m
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m

#
# Firmware Drivers
#
CONFIG_EDD=m
CONFIG_DELL_RBU=m
CONFIG_DCDBAS=m
CONFIG_DMIID=y
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
CONFIG_VMSPLIT_3G=y
# CONFIG_VMSPLIT_3G_OPT is not set
# CONFIG_VMSPLIT_2G is not set
# CONFIG_VMSPLIT_2G_OPT is not set
# CONFIG_VMSPLIT_1G is not set

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Mark Lord


Ray Lee wrote:

On Nov 15, 2007 8:32 AM, Mark Lord <[EMAIL PROTECTED]> wrote:

I have been reporting this off and on since 2.6.23 was released.
This problem was not apparent up to perhaps 2.6.23-rc8,
but definitely became common in 2.6.23 and 2.6.23.1.

Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds
of kernel time to complete.

Once in a while, it takes *much* longer, in the 14-20 second range.
These long events *seem* to be mostly after the notebook has been
in suspend for a longish time, but there's really nothing consistent here.

So git-bisect isn't going to work for this one.   I recently rebuilt the kernel
to include printk timestamps, and then it went 2 days without the issue 
happening,
until this morning (after an overnight suspend) finally.

The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB DDR2.
PCIe express chipset, ATI graphics, SATA hard drive.

00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT 
Express Memory Controller Hub (rev 03)
00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT 
Express PCI Express Root Port (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition 
Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 
(rev 01)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 
(rev 01)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 
(rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 
01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 
01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 
01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 
01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI 
Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge 
(rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial ATA 
Storage Controller IDE (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility X1400
03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02)
03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832
03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 
SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19)
03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01)
03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter 
(rev 0a)
03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05)
0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network 
Connection (rev 02)

There's also a USB2 hub connected, with the Logitech mouse being the
only thing plugged into it at present.

The best clue I have of what is happening, other than that it was first
seen during the late 2.6.23-rc* series, are the following two sets of
kernel logs.  The first set is from a "normal" fast wake-up,
and the second set is from the "slow" wake-up seen this morning.

Both logs show the same information, in pretty much the same order.
The BIG difference is a bunch of unexplained pauses of exactly 1-second
each that occur during the slow wakeup.

The only other difference is that the SATA disk messages are placed
differently within the two logs.  I'm thinking on that one but it
doesn't look significant to me.

The real question is, why the 1-sec pauses?


Well, and why the 1-second pauses eventually stop, too. Seems
interesting that they don't continue. Also, they're pretty much
dead-on one- and two-second pauses, with HZ accuracy. Is this with a
NO_HZ kernel?

..

Yes, A NO_HZ kernel.  Full config follows:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24-rc2-git4
# Wed Nov 14 14:19:45 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Ray Lee

On Nov 15, 2007 8:32 AM, Mark Lord <[EMAIL PROTECTED]> wrote:
> I have been reporting this off and on since 2.6.23 was released.
> This problem was not apparent up to perhaps 2.6.23-rc8,
> but definitely became common in 2.6.23 and 2.6.23.1.
>
> Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds
> of kernel time to complete.
>
> Once in a while, it takes *much* longer, in the 14-20 second range.
> These long events *seem* to be mostly after the notebook has been
> in suspend for a longish time, but there's really nothing consistent here.
>
> So git-bisect isn't going to work for this one.   I recently rebuilt the 
> kernel
> to include printk timestamps, and then it went 2 days without the issue 
> happening,
> until this morning (after an overnight suspend) finally.
>
> The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB 
> DDR2.
> PCIe express chipset, ATI graphics, SATA hard drive.
>
> 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT 
> Express Memory Controller Hub (rev 03)
> 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT 
> Express PCI Express Root Port (rev 03)
> 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition 
> Audio Controller (rev 01)
> 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 
> (rev 01)
> 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 
> (rev 01)
> 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 
> (rev 01)
> 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 
> (rev 01)
> 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 
> (rev 01)
> 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 
> (rev 01)
> 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 
> (rev 01)
> 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI 
> Controller (rev 01)
> 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
> 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge 
> (rev 01)
> 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial 
> ATA Storage Controller IDE (rev 01)
> 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 
> 01)
> 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility X1400
> 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 
> 02)
> 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832
> 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 
> SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19)
> 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01)
> 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter 
> (rev 0a)
> 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05)
> 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network 
> Connection (rev 02)
>
> There's also a USB2 hub connected, with the Logitech mouse being the
> only thing plugged into it at present.
>
> The best clue I have of what is happening, other than that it was first
> seen during the late 2.6.23-rc* series, are the following two sets of
> kernel logs.  The first set is from a "normal" fast wake-up,
> and the second set is from the "slow" wake-up seen this morning.
>
> Both logs show the same information, in pretty much the same order.
> The BIG difference is a bunch of unexplained pauses of exactly 1-second
> each that occur during the slow wakeup.
>
> The only other difference is that the SATA disk messages are placed
> differently within the two logs.  I'm thinking on that one but it
> doesn't look significant to me.
>
> The real question is, why the 1-sec pauses?

Well, and why the 1-second pauses eventually stop, too. Seems
interesting that they don't continue. Also, they're pretty much
dead-on one- and two-second pauses, with HZ accuracy. Is this with a
NO_HZ kernel?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Pavel Machek

Hi!

 I have been reporting this off and on since 2.6.23 was released.
 This problem was not apparent up to perhaps 2.6.23-rc8,
 but definitely became common in 2.6.23 and 2.6.23.1.
 
 Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds
 of kernel time to complete.
 
 Once in a while, it takes *much* longer, in the 14-20 second range.
 These long events *seem* to be mostly after the notebook has been
 in suspend for a longish time, but there's really nothing consistent here.
 
 So git-bisect isn't going to work for this one.   I recently rebuilt the 
 kernel
 to include printk timestamps, and then it went 2 days without the issue 
 happening,
 until this morning (after an overnight suspend) finally.
 
 The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB 
 DDR2.
 PCIe express chipset, ATI graphics, SATA hard drive.
 
 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT 
 Express Memory Controller Hub (rev 03)
 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT 
 Express PCI Express Root Port (rev 03)
 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High 
 Definition Audio Controller (rev 01)
 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 
 1 (rev 01)
 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 
 2 (rev 01)
 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 
 4 (rev 01)
 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 
 (rev 01)
 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 
 (rev 01)
 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 
 (rev 01)
 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 
 (rev 01)
 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI 
 Controller (rev 01)
 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface 
 Bridge (rev 01)
 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial 
 ATA Storage Controller IDE (rev 01)
 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 
 01)
 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility 
 X1400
 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX 
 (rev 02)
 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832
 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 
 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19)
 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01)
 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host 
 Adapter (rev 0a)
 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05)
 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network 
 Connection (rev 02)
 
 There's also a USB2 hub connected, with the Logitech mouse being the
 only thing plugged into it at present.
 
 The best clue I have of what is happening, other than that it was first
 seen during the late 2.6.23-rc* series, are the following two sets of
 kernel logs.  The first set is from a normal fast wake-up,
 and the second set is from the slow wake-up seen this morning.
 
 Both logs show the same information, in pretty much the same order.
 The BIG difference is a bunch of unexplained pauses of exactly 1-second
 each that occur during the slow wakeup.
 
 The only other difference is that the SATA disk messages are placed
 differently within the two logs.  I'm thinking on that one but it
 doesn't look significant to me.
 
 The real question is, why the 1-sec pauses?

Can you try nohz=off highres=off? Strange stuff is happening with
nohz.
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Ray Lee

On Nov 15, 2007 8:32 AM, Mark Lord [EMAIL PROTECTED] wrote:
 I have been reporting this off and on since 2.6.23 was released.
 This problem was not apparent up to perhaps 2.6.23-rc8,
 but definitely became common in 2.6.23 and 2.6.23.1.

 Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds
 of kernel time to complete.

 Once in a while, it takes *much* longer, in the 14-20 second range.
 These long events *seem* to be mostly after the notebook has been
 in suspend for a longish time, but there's really nothing consistent here.

 So git-bisect isn't going to work for this one.   I recently rebuilt the 
 kernel
 to include printk timestamps, and then it went 2 days without the issue 
 happening,
 until this morning (after an overnight suspend) finally.

 The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB 
 DDR2.
 PCIe express chipset, ATI graphics, SATA hard drive.

 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT 
 Express Memory Controller Hub (rev 03)
 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT 
 Express PCI Express Root Port (rev 03)
 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition 
 Audio Controller (rev 01)
 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 
 (rev 01)
 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 
 (rev 01)
 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 
 (rev 01)
 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 
 (rev 01)
 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 
 (rev 01)
 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 
 (rev 01)
 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 
 (rev 01)
 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI 
 Controller (rev 01)
 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge 
 (rev 01)
 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial 
 ATA Storage Controller IDE (rev 01)
 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 
 01)
 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility X1400
 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 
 02)
 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832
 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 
 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19)
 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01)
 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter 
 (rev 0a)
 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05)
 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network 
 Connection (rev 02)

 There's also a USB2 hub connected, with the Logitech mouse being the
 only thing plugged into it at present.

 The best clue I have of what is happening, other than that it was first
 seen during the late 2.6.23-rc* series, are the following two sets of
 kernel logs.  The first set is from a normal fast wake-up,
 and the second set is from the slow wake-up seen this morning.

 Both logs show the same information, in pretty much the same order.
 The BIG difference is a bunch of unexplained pauses of exactly 1-second
 each that occur during the slow wakeup.

 The only other difference is that the SATA disk messages are placed
 differently within the two logs.  I'm thinking on that one but it
 doesn't look significant to me.

 The real question is, why the 1-sec pauses?

Well, and why the 1-second pauses eventually stop, too. Seems
interesting that they don't continue. Also, they're pretty much
dead-on one- and two-second pauses, with HZ accuracy. Is this with a
NO_HZ kernel?
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Mark Lord


Ray Lee wrote:

On Nov 15, 2007 8:32 AM, Mark Lord [EMAIL PROTECTED] wrote:

I have been reporting this off and on since 2.6.23 was released.
This problem was not apparent up to perhaps 2.6.23-rc8,
but definitely became common in 2.6.23 and 2.6.23.1.

Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds
of kernel time to complete.

Once in a while, it takes *much* longer, in the 14-20 second range.
These long events *seem* to be mostly after the notebook has been
in suspend for a longish time, but there's really nothing consistent here.

So git-bisect isn't going to work for this one.   I recently rebuilt the kernel
to include printk timestamps, and then it went 2 days without the issue 
happening,
until this morning (after an overnight suspend) finally.

The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB DDR2.
PCIe express chipset, ATI graphics, SATA hard drive.

00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT 
Express Memory Controller Hub (rev 03)
00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT 
Express PCI Express Root Port (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition 
Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 
(rev 01)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 
(rev 01)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 
(rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 
01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 
01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 
01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 
01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI 
Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge 
(rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial ATA 
Storage Controller IDE (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility X1400
03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02)
03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832
03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 
SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19)
03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01)
03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter 
(rev 0a)
03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05)
0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network 
Connection (rev 02)

There's also a USB2 hub connected, with the Logitech mouse being the
only thing plugged into it at present.

The best clue I have of what is happening, other than that it was first
seen during the late 2.6.23-rc* series, are the following two sets of
kernel logs.  The first set is from a normal fast wake-up,
and the second set is from the slow wake-up seen this morning.

Both logs show the same information, in pretty much the same order.
The BIG difference is a bunch of unexplained pauses of exactly 1-second
each that occur during the slow wakeup.

The only other difference is that the SATA disk messages are placed
differently within the two logs.  I'm thinking on that one but it
doesn't look significant to me.

The real question is, why the 1-sec pauses?


Well, and why the 1-second pauses eventually stop, too. Seems
interesting that they don't continue. Also, they're pretty much
dead-on one- and two-second pauses, with HZ accuracy. Is this with a
NO_HZ kernel?

..

Yes, A NO_HZ kernel.  Full config follows:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24-rc2-git4
# Wed Nov 14 14:19:45 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Mark Lord


Mark Lord wrote:

Ray Lee wrote:
..

The real question is, why the 1-sec pauses?


Well, and why the 1-second pauses eventually stop, too. Seems
interesting that they don't continue. Also, they're pretty much
dead-on one- and two-second pauses, with HZ accuracy. Is this with a
NO_HZ kernel?

..

Yes, A NO_HZ kernel.  Full config follows:

..

Blah.. that was for the wrong kernel.

Here is the correct .config that was in use for both suspend logs:


#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23.1
# Wed Nov 14 09:41:54 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=16
# CONFIG_CPUSETS is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set
CONFIG_BLK_DEV_BSG=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
# CONFIG_IOSCHED_DEADLINE is not set
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED=cfq

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_SMP=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
CONFIG_MCORE2=y
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_MINIMUM_CPU_FAMILY=4
CONFIG_HPET_TIMER=y
CONFIG_NR_CPUS=2
# CONFIG_SCHED_SMT is not set
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_MCE is not set
CONFIG_VM86=y
# CONFIG_TOSHIBA is not set
CONFIG_I8K=m
CONFIG_X86_REBOOTFIXUPS=y
CONFIG_MICROCODE=m
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m

#
# Firmware Drivers
#
CONFIG_EDD=m
CONFIG_DELL_RBU=m
CONFIG_DCDBAS=m
CONFIG_DMIID=y
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
CONFIG_VMSPLIT_3G=y
# CONFIG_VMSPLIT_3G_OPT is not set
# CONFIG_VMSPLIT_2G is not set
# CONFIG_VMSPLIT_2G_OPT is not set
# CONFIG_VMSPLIT_1G is not set

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Mark Lord


Hi!


 I have been reporting this off and on since 2.6.23 was released.
 This problem was not apparent up to perhaps 2.6.23-rc8,
 but definitely became common in 2.6.23 and 2.6.23.1.
 
 Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds

 of kernel time to complete.
 
 Once in a while, it takes *much* longer, in the 14-20 second range.

 These long events *seem* to be mostly after the notebook has been
 in suspend for a longish time, but there's really nothing consistent here.
 
 So git-bisect isn't going to work for this one.   I recently rebuilt the 
 kernel
 to include printk timestamps, and then it went 2 days without the issue 
 happening,

 until this morning (after an overnight suspend) finally.
 
 The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB 
 DDR2.

 PCIe express chipset, ATI graphics, SATA hard drive.
 
 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT 
 Express Memory Controller Hub (rev 03)
 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT 
 Express PCI Express Root Port (rev 03)
 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High 
 Definition Audio Controller (rev 01)
 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 
 1 (rev 01)
 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 
 2 (rev 01)
 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 
 4 (rev 01)
 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 
 (rev 01)
 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 
 (rev 01)
 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 
 (rev 01)
 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 
 (rev 01)
 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI 
 Controller (rev 01)

 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface 
 Bridge (rev 01)
 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial 
 ATA Storage Controller IDE (rev 01)
 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 
 01)
 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility 
 X1400
 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX 
 (rev 02)

 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832
 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 
 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19)

 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01)
 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host 
 Adapter (rev 0a)

 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05)
 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network 
 Connection (rev 02)
 
 There's also a USB2 hub connected, with the Logitech mouse being the

 only thing plugged into it at present.
 
 The best clue I have of what is happening, other than that it was first

 seen during the late 2.6.23-rc* series, are the following two sets of
 kernel logs.  The first set is from a normal fast wake-up,
 and the second set is from the slow wake-up seen this morning.
 
 Both logs show the same information, in pretty much the same order.

 The BIG difference is a bunch of unexplained pauses of exactly 1-second
 each that occur during the slow wakeup.
 
 The only other difference is that the SATA disk messages are placed

 differently within the two logs.  I'm thinking on that one but it
 doesn't look significant to me.
 
 The real question is, why the 1-sec pauses?


Can you try nohz=off highres=off? Strange stuff is happening with
nohz.

..

(added Ingo to CC: list:  maybe this is some weird interaction with CFS
and jiffies being reset to 0 on resume ??)

I can try it, but it won't help debug the problem much.
Remember, this happens very inconsistently, maybe 3-4 times a day,
or not at all for 3-4 days.

But if somebody has a specific bug-fix patch that could explain this,
then I'll happily apply it here.

Cheers
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Ingo Molnar


* Ingo Molnar [EMAIL PROTECTED] wrote:

 pick up the latest latency tracer patch from:

sorry, wrong URLs, the correct links are:

   
http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch
   http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Ingo Molnar


* Mark Lord [EMAIL PROTECTED] wrote:

 Can you try nohz=off highres=off? Strange stuff is happening with 
 nohz.

 (added Ingo to CC: list: maybe this is some weird interaction with CFS 
 and jiffies being reset to 0 on resume ??)

hm, CFS should have no impact here. To see what's happening you could 
try to use the latency tracer of the -rt patch and do a cross-resume 
trace.

pick up the latest latency tracer patch from:

  http://redhat.com/~mingo/private/latency-tracer-v2.6.24-rc2-git5-combo.patch

apply it and enable CONFIG_FUNCTION_TRACING, then pick up trace-cmd.c:

  http://redhat.com/~mingo/private/trace-cmd.c

and do something like:

  ./trace-cmd pm-suspend  trace.txt

or:

  ./trace-cmd /bin/bash -c echo ram  /sys/power/state  trace.txt

this should trigger suspend - then you should do the resume. If 
everything goes well then trace.txt should contain a pretty large trace 
of all the stuff we do during a suspend+resume.

and wait for such a pause and send us the resulting trace.txt.

if it's an SMP box then first do:

  echo 1  /proc/sys/kernel/trace_all_cpus

to get a global trace. Let me know if something doesnt work with this 
scheme.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Rafael J. Wysocki

On Thursday, 15 of November 2007, Mark Lord wrote:
  Hi!
  
   I have been reporting this off and on since 2.6.23 was released.
   This problem was not apparent up to perhaps 2.6.23-rc8,
   but definitely became common in 2.6.23 and 2.6.23.1.
   
   Most of the time, a resume-from-RAM on my notebook takes about 2.1 
   seconds
   of kernel time to complete.
   
   Once in a while, it takes *much* longer, in the 14-20 second range.
   These long events *seem* to be mostly after the notebook has been
   in suspend for a longish time, but there's really nothing consistent 
   here.
   
   So git-bisect isn't going to work for this one.   I recently rebuilt the 
   kernel
   to include printk timestamps, and then it went 2 days without the issue 
   happening,
   until this morning (after an overnight suspend) finally.
   
   The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ 
   w/3GB 
   DDR2.
   PCIe express chipset, ATI graphics, SATA hard drive.
   
   00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 
   945GT 
   Express Memory Controller Hub (rev 03)
   00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 
   945GT 
   Express PCI Express Root Port (rev 03)
   00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High 
   Definition Audio Controller (rev 01)
   00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express 
   Port 
   1 (rev 01)
   00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express 
   Port 
   2 (rev 01)
   00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express 
   Port 
   4 (rev 01)
   00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
   #1 
   (rev 01)
   00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
   #2 
   (rev 01)
   00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
   #3 
   (rev 01)
   00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
   #4 
   (rev 01)
   00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI 
   Controller (rev 01)
   00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
   00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface 
   Bridge (rev 01)
   00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) 
   Serial 
   ATA Storage Controller IDE (rev 01)
   00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller 
   (rev 
   01)
   01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility 
   X1400
   03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX 
   (rev 02)
   03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832
   03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 
   SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19)
   03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01)
   03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host 
   Adapter (rev 0a)
   03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 
   05)
   0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG 
   Network 
   Connection (rev 02)
   
   There's also a USB2 hub connected, with the Logitech mouse being the
   only thing plugged into it at present.
   
   The best clue I have of what is happening, other than that it was first
   seen during the late 2.6.23-rc* series, are the following two sets of
   kernel logs.  The first set is from a normal fast wake-up,
   and the second set is from the slow wake-up seen this morning.
   
   Both logs show the same information, in pretty much the same order.
   The BIG difference is a bunch of unexplained pauses of exactly 1-second
   each that occur during the slow wakeup.
   
   The only other difference is that the SATA disk messages are placed
   differently within the two logs.  I'm thinking on that one but it
   doesn't look significant to me.
   
   The real question is, why the 1-sec pauses?
  
  Can you try nohz=off highres=off? Strange stuff is happening with
  nohz.
 ..
 
 (added Ingo to CC: list:  maybe this is some weird interaction with CFS
 and jiffies being reset to 0 on resume ??)
 
 I can try it, but it won't help debug the problem much.

Well, if you could verify that it doesn't happen at all with NO_HZ unset, that
would be a valuable data point, IMO.

 Remember, this happens very inconsistently, maybe 3-4 times a day,
 or not at all for 3-4 days.
 
 But if somebody has a specific bug-fix patch that could explain this,
 then I'll happily apply it here.

I'm not aware of any fix related to the symptoms that you observe.

Greetings,
Rafael
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Mark Lord


Ingo Molnar wrote:

* Ingo Molnar [EMAIL PROTECTED] wrote:


pick up the latest latency tracer patch from:


sorry, wrong URLs, the correct links are:

   
http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch
   http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c

..

Is there a version of these that works with 2.6.23.1 ?

I'm not using 2.6.24-* on this machine because:
 (a) behaviour may be different, and
 (b) something broke VMware compatibility again.

Thanks
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Ingo Molnar


* Mark Lord [EMAIL PROTECTED] wrote:

 sorry, wrong URLs, the correct links are:


 http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch
http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c
 ..

 Is there a version of these that works with 2.6.23.1 ?

yes, i've backported it and have uploaded the v2.6.23 version to:

  
http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.23.1-combo.patch

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-11-15 Thread Ingo Molnar


* Ingo Molnar [EMAIL PROTECTED] wrote:

  Is there a version of these that works with 2.6.23.1 ?
 
 yes, i've backported it and have uploaded the v2.6.23 version to:
 
   
 http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.23.1-combo.patch

btw., if the trace is too large and the interesting section of suspend 
does not fit into it then you can narrow it down to the most important 
events only by changing this in trace-cmd.c:

system(echo 1  /proc/sys/kernel/mcount_enabled);

to:

system(echo 0  /proc/sys/kernel/mcount_enabled);

that way we'll still trace IRQs and scheduling events, which is enough 
to see roughly where the delay is happening. To include symbolic 
backtraces in the trace, do this:

echo 1  /proc/sys/kernel/stackframe_tracing

to get such trace entries:

  ls 3688  1D..3 4642us+: deactivate_task ls 3688 (0 0)
  ls 3688  1D..3 4644us+: 
schedule()-do_exit()-sys_exit_group()-sys_exit_group()
  ls 3688  1D..3 4647us+: sysenter_past_esp()-(   -1)()-(0)()-(
0)()
  idle 0 1D..2 4652us : schedule ls 3688 (0 20)

this way we can see why a task goes to sleep and which exact kernel 
codepath is waking it up.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 >

100 matches

Mail list logo