Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel <[EMAIL PROTECTED]> wrote: > On Mon, 3 December 2007 01:57:02 +0100, Jörn Engel wrote: > > > > After an eternity of compile time, this config does generate some useful > > output. qemu is not to blame. > > Or is it? The output definitely looks suspicious. Large amounts of > code get processed within a microsecond, while update_wall_time() > appears to cause huge delays every time it is called: > http://logfs.org/~joern/trace > > Does this output make sense or does it rather indicate some sloppiness > wrt. time in the qemu virtual machine? not sure. It could be qemu being scheduled away? You could try to run qemu with nice -20 or so, to avoid getting preempted. If time lapses like this still show up: trace-cm 434 0D.h. 1008us!: do_timer (tick_periodic) trace-cm 434 0D.h. 1972us+: update_wall_time (do_timer) trace-cm 434 0D.h. 1008us!: do_timer (tick_periodic) trace-cm 434 0D.h. 1972us+: update_wall_time (do_timer) then that could indicate a timekeeping weirdness, OR it could mean that qemu is simply very slow. (there could be timer hw access between those two function calls) Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel [EMAIL PROTECTED] wrote: On Mon, 3 December 2007 01:57:02 +0100, Jörn Engel wrote: After an eternity of compile time, this config does generate some useful output. qemu is not to blame. Or is it? The output definitely looks suspicious. Large amounts of code get processed within a microsecond, while update_wall_time() appears to cause huge delays every time it is called: http://logfs.org/~joern/trace Does this output make sense or does it rather indicate some sloppiness wrt. time in the qemu virtual machine? not sure. It could be qemu being scheduled away? You could try to run qemu with nice -20 or so, to avoid getting preempted. If time lapses like this still show up: trace-cm 434 0D.h. 1008us!: do_timer (tick_periodic) trace-cm 434 0D.h. 1972us+: update_wall_time (do_timer) trace-cm 434 0D.h. 1008us!: do_timer (tick_periodic) trace-cm 434 0D.h. 1972us+: update_wall_time (do_timer) then that could indicate a timekeeping weirdness, OR it could mean that qemu is simply very slow. (there could be timer hw access between those two function calls) Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Mon, 3 December 2007 01:57:02 +0100, Jörn Engel wrote: > > After an eternity of compile time, this config does generate some useful > output. qemu is not to blame. Or is it? The output definitely looks suspicious. Large amounts of code get processed within a microsecond, while update_wall_time() appears to cause huge delays every time it is called: http://logfs.org/~joern/trace Does this output make sense or does it rather indicate some sloppiness wrt. time in the qemu virtual machine? Jörn -- tglx1 thinks that joern should get a (TM) for "Thinking Is Hard" -- Thomas Gleixner -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Mon, 3 December 2007 01:57:02 +0100, Jörn Engel wrote: After an eternity of compile time, this config does generate some useful output. qemu is not to blame. Or is it? The output definitely looks suspicious. Large amounts of code get processed within a microsecond, while update_wall_time() appears to cause huge delays every time it is called: http://logfs.org/~joern/trace Does this output make sense or does it rather indicate some sloppiness wrt. time in the qemu virtual machine? Jörn -- tglx1 thinks that joern should get a (TM) for Thinking Is Hard -- Thomas Gleixner -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 22:19:00 +0100, Ingo Molnar wrote: > * Jörn Engel <[EMAIL PROTECTED]> wrote: > > > Maybe one more thing: can you send me the config you used for the > > setup above? I'd like to know whether qemu or my config is to blame. > > sure - attached. After an eternity of compile time, this config does generate some useful output. qemu is not to blame. Jörn -- Joern's library part 9: http://www.scl.ameslab.gov/Publications/Gus/TwelveWays.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 21:45:59 +0100, Ingo Molnar wrote: > > to capture that trace i did not use -rt, i just patched latest -git > with: > > > http://people.redhat.com/mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch > > (this has your fixes included already) > > have done: > > echo 1 > /proc/sys/kernel/mcount_enabled > > and have run: > > ./trace-cmd sleep 1 > trace.txt > > http://people.redhat.com/mingo/latency-tracing-patches/trace-cmd.c > > to capture a 1 second trace of what the system is doing. I think your > troubles are due to running it within a qemu guest - that is not a > typical utilization so you are on unchartered waters. Maybe one more thing: can you send me the config you used for the setup above? I'd like to know whether qemu or my config is to blame. Jörn -- Eighty percent of success is showing up. -- Woody Allen -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 21:45:59 +0100, Ingo Molnar wrote: > > to capture a 1 second trace of what the system is doing. I think your > troubles are due to running it within a qemu guest - that is not a > typical utilization so you are on unchartered waters. Looks like it. Guess I'll switch to something else for the moment. Jörn -- Linux is more the core point of a concept that surrounds "open source" which, in turn, is based on a false concept. This concept is that people actually want to look at source code. -- Rob Enderle -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel <[EMAIL PROTECTED]> wrote: > > oprofile helps if you can reliably reproduce the slowdown in a loop > > or for a long amount of time, with lots of CPU utilization - and > > then it's also lower overhead. The tracer can be used to capture > > rare or complex events, and gives the full flow control and what is > > happening within the kernel. > > Such a trace would be useful indeed. But so far the patch has only > given me grief and nothing remotely like useful output. Maybe I > should simply use the complete -rt patch instead of debugging the > broken-out latency-tracer patch. to capture that trace i did not use -rt, i just patched latest -git with: http://people.redhat.com/mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch (this has your fixes included already) have done: echo 1 > /proc/sys/kernel/mcount_enabled and have run: ./trace-cmd sleep 1 > trace.txt http://people.redhat.com/mingo/latency-tracing-patches/trace-cmd.c to capture a 1 second trace of what the system is doing. I think your troubles are due to running it within a qemu guest - that is not a typical utilization so you are on unchartered waters. Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 21:07:22 +0100, Ingo Molnar wrote: > * Jörn Engel <[EMAIL PROTECTED]> wrote: > > > Result looked like a livelock and finally convinced me to abandon the > > latency tracer. Sorry, but it appears to be the right tool for the > > wrong job. > > hm, we routinely use it in -rt to capture "what on earth is happening" > incidents. The snippet below is a random snipped from a trace that i've > just captured, with mcount enabled. It seems to work fine here, with and > without mcount. (pit clocksource is almost never used, that's why you > had those early problems.) > > oprofile helps if you can reliably reproduce the slowdown in a loop or > for a long amount of time, with lots of CPU utilization - and then it's > also lower overhead. The tracer can be used to capture rare or complex > events, and gives the full flow control and what is happening within the > kernel. Such a trace would be useful indeed. But so far the patch has only given me grief and nothing remotely like useful output. Maybe I should simply use the complete -rt patch instead of debugging the broken-out latency-tracer patch. Jörn -- Mundie uses a textbook tactic of manipulation: start with some reasonable talk, and lead the audience to an unreasonable conclusion. -- Bruce Perens -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel <[EMAIL PROTECTED]> wrote: > On Sun, 2 December 2007 16:47:46 +0100, Ingo Molnar wrote: > > > > well what does the trace say, where do the delays come from? To get a > > quick overview you can make tracing lighter weight by doing: > > > > echo 0 > /proc/sys/kernel/mcount_enabled > > echo 1 > /proc/sys/kernel/trace_syscalls > > I mistyped and did > echo 1 > /proc/sys/kernel/mcount_enabled > > Result looked like a livelock and finally convinced me to abandon the > latency tracer. Sorry, but it appears to be the right tool for the > wrong job. hm, we routinely use it in -rt to capture "what on earth is happening" incidents. The snippet below is a random snipped from a trace that i've just captured, with mcount enabled. It seems to work fine here, with and without mcount. (pit clocksource is almost never used, that's why you had those early problems.) oprofile helps if you can reliably reproduce the slowdown in a loop or for a long amount of time, with lots of CPU utilization - and then it's also lower overhead. The tracer can be used to capture rare or complex events, and gives the full flow control and what is happening within the kernel. Ingo > 0 1D... 811us : sched_clock_idle_sleep_event (acpi_processor_idle) 0 1D... 813us : _spin_lock (sched_clock_idle_sleep_event) trace-cm 2463 0 814us : native_flush_tlb_others (flush_tlb_mm) 0 1D... 815us : __update_rq_clock (sched_clock_idle_sleep_event) trace-cm 2463 0 817us : _spin_lock (native_flush_tlb_others) 0 1D... 817us+: acpi_cstate_enter (acpi_processor_idle) trace-cm 2463 0 820us+: send_IPI_mask_bitmask (native_flush_tlb_others) trace-cm 2463 0D... 823us+: apic_wait_icr_idle (send_IPI_mask_bitmask) trace-cm 2463 0 856us+: up_write (copy_process) trace-cm 2463 0 859us+: copy_keys (copy_process) trace-cm 2463 0 862us+: copy_namespaces (copy_process) trace-cm 2463 0 865us+: copy_thread (copy_process) trace-cm 2463 0 868us+: memcpy (copy_thread) trace-cm 2463 0 871us+: alloc_pid (copy_process) trace-cm 2463 0 874us+: kmem_cache_alloc (alloc_pid) trace-cm 2463 0 877us+: _spin_lock_irq (alloc_pid) trace-cm 2463 0 880us+: _write_lock_irq (copy_process) trace-cm 2463 0D... 883us+: _spin_lock (copy_process) trace-cm 2463 0D... 887us+: recalc_sigpending (copy_process) trace-cm 2463 0D... 890us+: recalc_sigpending_tsk (recalc_sigpending) trace-cm 2463 0D... 893us+: attach_pid (copy_process) trace-cm 2463 0D... 896us+: attach_pid (copy_process) trace-cm 2463 0D... 899us+: attach_pid (copy_process) trace-cm 2463 0 902us+: proc_fork_connector (copy_process) trace-cm 2463 0 905us+: wake_up_new_task (do_fork) trace-cm 2463 0 908us+: task_rq_lock (wake_up_new_task) trace-cm 2463 0D... 911us+: _spin_lock (task_rq_lock) trace-cm 2463 0D... 914us+: update_rq_clock (wake_up_new_task) trace-cm 2463 0D... 918us+: __update_rq_clock (update_rq_clock) trace-cm 2463 0D... 921us+: effective_prio (wake_up_new_task) trace-cm 2463 0D... 924us+: wake_up_new_task (0 0) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 16:47:46 +0100, Ingo Molnar wrote: > > well what does the trace say, where do the delays come from? To get a > quick overview you can make tracing lighter weight by doing: > > echo 0 > /proc/sys/kernel/mcount_enabled > echo 1 > /proc/sys/kernel/trace_syscalls I mistyped and did echo 1 > /proc/sys/kernel/mcount_enabled Result looked like a livelock and finally convinced me to abandon the latency tracer. Sorry, but it appears to be the right tool for the wrong job. Jörn -- They laughed at Galileo. They laughed at Copernicus. They laughed at Columbus. But remember, they also laughed at Bozo the Clown. -- unknown -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel <[EMAIL PROTECTED]> wrote: > I do. Went through 10odd runs and annotated the function right below > mcount each time. Seems to work now. > > Trouble is that it doesn't solve my real problem at hand. Something > is causing significant delays when writing to logfs. Core logfs code > is not running, but may cause whatever other code is running and > burning up all the cpu time. Wasting 100ms of "qemu-time" to write a > single page happens fairly frequently. > > With the latency tracer the problem appears to have become worse. Now > the loftlockup code triggers quite frequently. Which makes a bit of > sense, as the problem is a busy CPU, rather than an idle one. well what does the trace say, where do the delays come from? To get a quick overview you can make tracing lighter weight by doing: echo 0 > /proc/sys/kernel/mcount_enabled echo 1 > /proc/sys/kernel/trace_syscalls (this turns the latency tracer into a "global strace" kind of tracer) Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel <[EMAIL PROTECTED]> wrote: > On Sun, 2 December 2007 13:31:25 +0100, Jörn Engel wrote: > > > > After another ten or so notrace annotations throughout the spinlock > > code, the latency tracer appears to work. Not sure how many useful > > information is missing through all the annotations, though. > > And here is a patch with the needed annotations. Looks a bit shabby, > as it was generated though git diff, patcher, interdiff and > combinediff. thanks - the patches applied fine, i've added them to my latency-tracer patchqueue. Steve Rostedt might want to pick the fixes up for -rt as well. Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 13:31:25 +0100, Jörn Engel wrote: > > After another ten or so notrace annotations throughout the spinlock > code, the latency tracer appears to work. Not sure how many useful > information is missing through all the annotations, though. And here is a patch with the needed annotations. Looks a bit shabby, as it was generated though git diff, patcher, interdiff and combinediff. Jörn -- Joern's library part 10: http://blogs.msdn.com/David_Gristwood/archive/2004/06/24/164849.aspx unchanged: --- a/arch/x86/kernel/i8253.c +++ b/arch/x86/kernel/i8253.c @@ -125,7 +125,7 @@ void __init setup_pit_timer(void) * to just read by itself. So use jiffies to emulate a free * running counter: */ -static cycle_t pit_read(void) +static notrace cycle_t pit_read(void) { unsigned long flags; int count; unchanged: --- a/kernel/spinlock.c +++ b/kernel/spinlock.c @@ -76,7 +76,7 @@ void __lockfunc _read_lock(rwlock_t *lock) } EXPORT_SYMBOL(_read_lock); -unsigned long __lockfunc _spin_lock_irqsave(spinlock_t *lock) +unsigned long notrace __lockfunc _spin_lock_irqsave(spinlock_t *lock) { unsigned long flags; @@ -341,7 +341,7 @@ void __lockfunc _read_unlock(rwlock_t *lock) } EXPORT_SYMBOL(_read_unlock); -void __lockfunc _spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags) +void notrace __lockfunc _spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags) { spin_release(>dep_map, 1, _RET_IP_); _raw_spin_unlock(lock); unchanged: --- a/lib/spinlock_debug.c +++ b/lib/spinlock_debug.c @@ -148,7 +148,7 @@ int _raw_spin_trylock(spinlock_t *lock) return ret; } -void _raw_spin_unlock(spinlock_t *lock) +void notrace _raw_spin_unlock(spinlock_t *lock) { debug_spin_unlock(lock); __raw_spin_unlock(>raw_lock); only in patch2: unchanged: --- linux/arch/x86/kernel/tsc_32.c +++ linux-2.6.24-rc3logfs/arch/x86/kernel/tsc_32.c 2007-12-02 15:21:15.0 +0100 @@ -92,7 +92,7 @@ /* * Scheduler clock - returns current time in nanosec units. */ -unsigned long long native_sched_clock(void) +unsigned notrace long long native_sched_clock(void) { unsigned long long this_offset; only in patch2: unchanged: --- linux/kernel/lockdep.c +++ linux-2.6.24-rc3logfs/kernel/lockdep.c 2007-12-02 15:21:16.0 +0100 @@ -139,7 +139,7 @@ return i; } -static void lock_time_inc(struct lock_time *lt, s64 time) +static notrace void lock_time_inc(struct lock_time *lt, s64 time) { if (time > lt->max) lt->max = time; @@ -198,7 +198,7 @@ memset(class->contention_point, 0, sizeof(class->contention_point)); } -static struct lock_class_stats *get_lock_stats(struct lock_class *class) +static notrace struct lock_class_stats *get_lock_stats(struct lock_class *class) { return _cpu_var(lock_stats)[class - lock_classes]; } @@ -208,7 +208,7 @@ put_cpu_var(lock_stats); } -static void lock_release_holdtime(struct held_lock *hlock) +static notrace void lock_release_holdtime(struct held_lock *hlock) { struct lock_class_stats *stats; s64 holdtime; @@ -2872,7 +2872,7 @@ } EXPORT_SYMBOL_GPL(lock_contended); -void lock_acquired(struct lockdep_map *lock) +void notrace lock_acquired(struct lockdep_map *lock) { unsigned long flags; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 14:57:11 +0100, Ingo Molnar wrote: > > hm, do you have CONFIG_FRAME_POINTERS=y, i.e. are the dumps reliable? I do. Went through 10odd runs and annotated the function right below mcount each time. Seems to work now. Trouble is that it doesn't solve my real problem at hand. Something is causing significant delays when writing to logfs. Core logfs code is not running, but may cause whatever other code is running and burning up all the cpu time. Wasting 100ms of "qemu-time" to write a single page happens fairly frequently. With the latency tracer the problem appears to have become worse. Now the loftlockup code triggers quite frequently. Which makes a bit of sense, as the problem is a busy CPU, rather than an idle one. Guess I'll try oprofile or lcov instead. Jörn -- Joern's library part 5: http://www.faqs.org/faqs/compression-faq/part2/section-9.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel <[EMAIL PROTECTED]> wrote: > On Sun, 2 December 2007 09:56:08 +0100, Ingo Molnar wrote: > > * Jörn Engel <[EMAIL PROTECTED]> wrote: > > > > > > ah. You should mark pit_read() function as notrace. PIT clocksource > > > > is rare. (add the 'notrace' word to the function prototype) > > > > > > Hardly a change at all. Apart from some offsets, this dump is > > > identical. > > > > > > stopped custom tracer. > > > BUG: spinlock recursion on CPU#0, sh/954 > > > lock: c030f280, .magic: dead4ead, .owner: sh/954, .owner_cpu: 0 > > > Pid: 954, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #3 > > > [] show_trace_log_lvl+0x35/0x54 > > > [] show_trace+0x2c/0x2e > > > [] dump_stack+0x84/0x8a > > > [] spin_bug+0xa7/0xae > > > [] _raw_spin_lock+0x45/0xfa > > > [] _spin_lock_irqsave+0x68/0x7a > > > [] pit_read+0xf/0x91 > > > [] get_monotonic_cycles+0xf/0x2d > > > [] now+0x2a/0x7c > > > [] trace+0x4d/0x1e8 > > > [] __mcount+0x95/0xa6 > > > [] mcount+0x14/0x18 > > > [] lock_acquired+0xe/0x1d7 > > > [] _spin_lock_irqsave+0x70/0x7a > > > [] pit_read+0xf/0x91 > > > > hm, it seems lock_acquired() [in kernel/lockdep.c] needs to be marked > > 'notrace' too - otherwise we recurse back into pit_read(). > > This time not even the offsets have changed. Dump is identical. hm, do you have CONFIG_FRAME_POINTERS=y, i.e. are the dumps reliable? Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel <[EMAIL PROTECTED]> wrote: > On Sun, 2 December 2007 12:31:43 +0100, Jörn Engel wrote: > > > > This time not even the offsets have changed. Dump is identical. > > After another ten or so notrace annotations throughout the spinlock > code, the latency tracer appears to work. Not sure how many useful > information is missing through all the annotations, though. a few annotations out of thousands of function calls in the kernel it's usually not significant. Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 12:31:43 +0100, Jörn Engel wrote: > > This time not even the offsets have changed. Dump is identical. After another ten or so notrace annotations throughout the spinlock code, the latency tracer appears to work. Not sure how many useful information is missing through all the annotations, though. Jörn -- Das Aufregende am Schreiben ist es, eine Ordnung zu schaffen, wo vorher keine existiert hat. -- Doris Lessing -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 09:56:08 +0100, Ingo Molnar wrote: > * Jörn Engel <[EMAIL PROTECTED]> wrote: > > > > ah. You should mark pit_read() function as notrace. PIT clocksource > > > is rare. (add the 'notrace' word to the function prototype) > > > > Hardly a change at all. Apart from some offsets, this dump is > > identical. > > > > stopped custom tracer. > > BUG: spinlock recursion on CPU#0, sh/954 > > lock: c030f280, .magic: dead4ead, .owner: sh/954, .owner_cpu: 0 > > Pid: 954, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #3 > > [] show_trace_log_lvl+0x35/0x54 > > [] show_trace+0x2c/0x2e > > [] dump_stack+0x84/0x8a > > [] spin_bug+0xa7/0xae > > [] _raw_spin_lock+0x45/0xfa > > [] _spin_lock_irqsave+0x68/0x7a > > [] pit_read+0xf/0x91 > > [] get_monotonic_cycles+0xf/0x2d > > [] now+0x2a/0x7c > > [] trace+0x4d/0x1e8 > > [] __mcount+0x95/0xa6 > > [] mcount+0x14/0x18 > > [] lock_acquired+0xe/0x1d7 > > [] _spin_lock_irqsave+0x70/0x7a > > [] pit_read+0xf/0x91 > > hm, it seems lock_acquired() [in kernel/lockdep.c] needs to be marked > 'notrace' too - otherwise we recurse back into pit_read(). This time not even the offsets have changed. Dump is identical. Jörn -- Mundie uses a textbook tactic of manipulation: start with some reasonable talk, and lead the audience to an unreasonable conclusion. -- Bruce Perens -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel <[EMAIL PROTECTED]> wrote: > > ah. You should mark pit_read() function as notrace. PIT clocksource > > is rare. (add the 'notrace' word to the function prototype) > > Hardly a change at all. Apart from some offsets, this dump is > identical. > > stopped custom tracer. > BUG: spinlock recursion on CPU#0, sh/954 > lock: c030f280, .magic: dead4ead, .owner: sh/954, .owner_cpu: 0 > Pid: 954, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #3 > [] show_trace_log_lvl+0x35/0x54 > [] show_trace+0x2c/0x2e > [] dump_stack+0x84/0x8a > [] spin_bug+0xa7/0xae > [] _raw_spin_lock+0x45/0xfa > [] _spin_lock_irqsave+0x68/0x7a > [] pit_read+0xf/0x91 > [] get_monotonic_cycles+0xf/0x2d > [] now+0x2a/0x7c > [] trace+0x4d/0x1e8 > [] __mcount+0x95/0xa6 > [] mcount+0x14/0x18 > [] lock_acquired+0xe/0x1d7 > [] _spin_lock_irqsave+0x70/0x7a > [] pit_read+0xf/0x91 hm, it seems lock_acquired() [in kernel/lockdep.c] needs to be marked 'notrace' too - otherwise we recurse back into pit_read(). Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel [EMAIL PROTECTED] wrote: ah. You should mark pit_read() function as notrace. PIT clocksource is rare. (add the 'notrace' word to the function prototype) Hardly a change at all. Apart from some offsets, this dump is identical. stopped custom tracer. BUG: spinlock recursion on CPU#0, sh/954 lock: c030f280, .magic: dead4ead, .owner: sh/954, .owner_cpu: 0 Pid: 954, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #3 [c0103a04] show_trace_log_lvl+0x35/0x54 [c010450a] show_trace+0x2c/0x2e [c0104e6d] dump_stack+0x84/0x8a [c01ded7c] spin_bug+0xa7/0xae [c01def14] _raw_spin_lock+0x45/0xfa [c02a02b1] _spin_lock_irqsave+0x68/0x7a [c01087e2] pit_read+0xf/0x91 [c0130ee1] get_monotonic_cycles+0xf/0x2d [c013c0e7] now+0x2a/0x7c [c013c333] trace+0x4d/0x1e8 [c013dbeb] __mcount+0x95/0xa6 [c010d354] mcount+0x14/0x18 [c0135a3c] lock_acquired+0xe/0x1d7 [c02a02b9] _spin_lock_irqsave+0x70/0x7a [c01087e2] pit_read+0xf/0x91 hm, it seems lock_acquired() [in kernel/lockdep.c] needs to be marked 'notrace' too - otherwise we recurse back into pit_read(). Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 09:56:08 +0100, Ingo Molnar wrote: * Jörn Engel [EMAIL PROTECTED] wrote: ah. You should mark pit_read() function as notrace. PIT clocksource is rare. (add the 'notrace' word to the function prototype) Hardly a change at all. Apart from some offsets, this dump is identical. stopped custom tracer. BUG: spinlock recursion on CPU#0, sh/954 lock: c030f280, .magic: dead4ead, .owner: sh/954, .owner_cpu: 0 Pid: 954, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #3 [c0103a04] show_trace_log_lvl+0x35/0x54 [c010450a] show_trace+0x2c/0x2e [c0104e6d] dump_stack+0x84/0x8a [c01ded7c] spin_bug+0xa7/0xae [c01def14] _raw_spin_lock+0x45/0xfa [c02a02b1] _spin_lock_irqsave+0x68/0x7a [c01087e2] pit_read+0xf/0x91 [c0130ee1] get_monotonic_cycles+0xf/0x2d [c013c0e7] now+0x2a/0x7c [c013c333] trace+0x4d/0x1e8 [c013dbeb] __mcount+0x95/0xa6 [c010d354] mcount+0x14/0x18 [c0135a3c] lock_acquired+0xe/0x1d7 [c02a02b9] _spin_lock_irqsave+0x70/0x7a [c01087e2] pit_read+0xf/0x91 hm, it seems lock_acquired() [in kernel/lockdep.c] needs to be marked 'notrace' too - otherwise we recurse back into pit_read(). This time not even the offsets have changed. Dump is identical. Jörn -- Mundie uses a textbook tactic of manipulation: start with some reasonable talk, and lead the audience to an unreasonable conclusion. -- Bruce Perens -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 12:31:43 +0100, Jörn Engel wrote: This time not even the offsets have changed. Dump is identical. After another ten or so notrace annotations throughout the spinlock code, the latency tracer appears to work. Not sure how many useful information is missing through all the annotations, though. Jörn -- Das Aufregende am Schreiben ist es, eine Ordnung zu schaffen, wo vorher keine existiert hat. -- Doris Lessing -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel [EMAIL PROTECTED] wrote: On Sun, 2 December 2007 09:56:08 +0100, Ingo Molnar wrote: * Jörn Engel [EMAIL PROTECTED] wrote: ah. You should mark pit_read() function as notrace. PIT clocksource is rare. (add the 'notrace' word to the function prototype) Hardly a change at all. Apart from some offsets, this dump is identical. stopped custom tracer. BUG: spinlock recursion on CPU#0, sh/954 lock: c030f280, .magic: dead4ead, .owner: sh/954, .owner_cpu: 0 Pid: 954, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #3 [c0103a04] show_trace_log_lvl+0x35/0x54 [c010450a] show_trace+0x2c/0x2e [c0104e6d] dump_stack+0x84/0x8a [c01ded7c] spin_bug+0xa7/0xae [c01def14] _raw_spin_lock+0x45/0xfa [c02a02b1] _spin_lock_irqsave+0x68/0x7a [c01087e2] pit_read+0xf/0x91 [c0130ee1] get_monotonic_cycles+0xf/0x2d [c013c0e7] now+0x2a/0x7c [c013c333] trace+0x4d/0x1e8 [c013dbeb] __mcount+0x95/0xa6 [c010d354] mcount+0x14/0x18 [c0135a3c] lock_acquired+0xe/0x1d7 [c02a02b9] _spin_lock_irqsave+0x70/0x7a [c01087e2] pit_read+0xf/0x91 hm, it seems lock_acquired() [in kernel/lockdep.c] needs to be marked 'notrace' too - otherwise we recurse back into pit_read(). This time not even the offsets have changed. Dump is identical. hm, do you have CONFIG_FRAME_POINTERS=y, i.e. are the dumps reliable? Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel [EMAIL PROTECTED] wrote: On Sun, 2 December 2007 12:31:43 +0100, Jörn Engel wrote: This time not even the offsets have changed. Dump is identical. After another ten or so notrace annotations throughout the spinlock code, the latency tracer appears to work. Not sure how many useful information is missing through all the annotations, though. a few annotations out of thousands of function calls in the kernel it's usually not significant. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 14:57:11 +0100, Ingo Molnar wrote: hm, do you have CONFIG_FRAME_POINTERS=y, i.e. are the dumps reliable? I do. Went through 10odd runs and annotated the function right below mcount each time. Seems to work now. Trouble is that it doesn't solve my real problem at hand. Something is causing significant delays when writing to logfs. Core logfs code is not running, but may cause whatever other code is running and burning up all the cpu time. Wasting 100ms of qemu-time to write a single page happens fairly frequently. With the latency tracer the problem appears to have become worse. Now the loftlockup code triggers quite frequently. Which makes a bit of sense, as the problem is a busy CPU, rather than an idle one. Guess I'll try oprofile or lcov instead. Jörn -- Joern's library part 5: http://www.faqs.org/faqs/compression-faq/part2/section-9.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 13:31:25 +0100, Jörn Engel wrote: After another ten or so notrace annotations throughout the spinlock code, the latency tracer appears to work. Not sure how many useful information is missing through all the annotations, though. And here is a patch with the needed annotations. Looks a bit shabby, as it was generated though git diff, patcher, interdiff and combinediff. Jörn -- Joern's library part 10: http://blogs.msdn.com/David_Gristwood/archive/2004/06/24/164849.aspx unchanged: --- a/arch/x86/kernel/i8253.c +++ b/arch/x86/kernel/i8253.c @@ -125,7 +125,7 @@ void __init setup_pit_timer(void) * to just read by itself. So use jiffies to emulate a free * running counter: */ -static cycle_t pit_read(void) +static notrace cycle_t pit_read(void) { unsigned long flags; int count; unchanged: --- a/kernel/spinlock.c +++ b/kernel/spinlock.c @@ -76,7 +76,7 @@ void __lockfunc _read_lock(rwlock_t *lock) } EXPORT_SYMBOL(_read_lock); -unsigned long __lockfunc _spin_lock_irqsave(spinlock_t *lock) +unsigned long notrace __lockfunc _spin_lock_irqsave(spinlock_t *lock) { unsigned long flags; @@ -341,7 +341,7 @@ void __lockfunc _read_unlock(rwlock_t *lock) } EXPORT_SYMBOL(_read_unlock); -void __lockfunc _spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags) +void notrace __lockfunc _spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags) { spin_release(lock-dep_map, 1, _RET_IP_); _raw_spin_unlock(lock); unchanged: --- a/lib/spinlock_debug.c +++ b/lib/spinlock_debug.c @@ -148,7 +148,7 @@ int _raw_spin_trylock(spinlock_t *lock) return ret; } -void _raw_spin_unlock(spinlock_t *lock) +void notrace _raw_spin_unlock(spinlock_t *lock) { debug_spin_unlock(lock); __raw_spin_unlock(lock-raw_lock); only in patch2: unchanged: --- linux/arch/x86/kernel/tsc_32.c +++ linux-2.6.24-rc3logfs/arch/x86/kernel/tsc_32.c 2007-12-02 15:21:15.0 +0100 @@ -92,7 +92,7 @@ /* * Scheduler clock - returns current time in nanosec units. */ -unsigned long long native_sched_clock(void) +unsigned notrace long long native_sched_clock(void) { unsigned long long this_offset; only in patch2: unchanged: --- linux/kernel/lockdep.c +++ linux-2.6.24-rc3logfs/kernel/lockdep.c 2007-12-02 15:21:16.0 +0100 @@ -139,7 +139,7 @@ return i; } -static void lock_time_inc(struct lock_time *lt, s64 time) +static notrace void lock_time_inc(struct lock_time *lt, s64 time) { if (time lt-max) lt-max = time; @@ -198,7 +198,7 @@ memset(class-contention_point, 0, sizeof(class-contention_point)); } -static struct lock_class_stats *get_lock_stats(struct lock_class *class) +static notrace struct lock_class_stats *get_lock_stats(struct lock_class *class) { return get_cpu_var(lock_stats)[class - lock_classes]; } @@ -208,7 +208,7 @@ put_cpu_var(lock_stats); } -static void lock_release_holdtime(struct held_lock *hlock) +static notrace void lock_release_holdtime(struct held_lock *hlock) { struct lock_class_stats *stats; s64 holdtime; @@ -2872,7 +2872,7 @@ } EXPORT_SYMBOL_GPL(lock_contended); -void lock_acquired(struct lockdep_map *lock) +void notrace lock_acquired(struct lockdep_map *lock) { unsigned long flags; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel [EMAIL PROTECTED] wrote: On Sun, 2 December 2007 13:31:25 +0100, Jörn Engel wrote: After another ten or so notrace annotations throughout the spinlock code, the latency tracer appears to work. Not sure how many useful information is missing through all the annotations, though. And here is a patch with the needed annotations. Looks a bit shabby, as it was generated though git diff, patcher, interdiff and combinediff. thanks - the patches applied fine, i've added them to my latency-tracer patchqueue. Steve Rostedt might want to pick the fixes up for -rt as well. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel [EMAIL PROTECTED] wrote: I do. Went through 10odd runs and annotated the function right below mcount each time. Seems to work now. Trouble is that it doesn't solve my real problem at hand. Something is causing significant delays when writing to logfs. Core logfs code is not running, but may cause whatever other code is running and burning up all the cpu time. Wasting 100ms of qemu-time to write a single page happens fairly frequently. With the latency tracer the problem appears to have become worse. Now the loftlockup code triggers quite frequently. Which makes a bit of sense, as the problem is a busy CPU, rather than an idle one. well what does the trace say, where do the delays come from? To get a quick overview you can make tracing lighter weight by doing: echo 0 /proc/sys/kernel/mcount_enabled echo 1 /proc/sys/kernel/trace_syscalls (this turns the latency tracer into a global strace kind of tracer) Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 16:47:46 +0100, Ingo Molnar wrote: well what does the trace say, where do the delays come from? To get a quick overview you can make tracing lighter weight by doing: echo 0 /proc/sys/kernel/mcount_enabled echo 1 /proc/sys/kernel/trace_syscalls I mistyped and did echo 1 /proc/sys/kernel/mcount_enabled Result looked like a livelock and finally convinced me to abandon the latency tracer. Sorry, but it appears to be the right tool for the wrong job. Jörn -- They laughed at Galileo. They laughed at Copernicus. They laughed at Columbus. But remember, they also laughed at Bozo the Clown. -- unknown -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel [EMAIL PROTECTED] wrote: On Sun, 2 December 2007 16:47:46 +0100, Ingo Molnar wrote: well what does the trace say, where do the delays come from? To get a quick overview you can make tracing lighter weight by doing: echo 0 /proc/sys/kernel/mcount_enabled echo 1 /proc/sys/kernel/trace_syscalls I mistyped and did echo 1 /proc/sys/kernel/mcount_enabled Result looked like a livelock and finally convinced me to abandon the latency tracer. Sorry, but it appears to be the right tool for the wrong job. hm, we routinely use it in -rt to capture what on earth is happening incidents. The snippet below is a random snipped from a trace that i've just captured, with mcount enabled. It seems to work fine here, with and without mcount. (pit clocksource is almost never used, that's why you had those early problems.) oprofile helps if you can reliably reproduce the slowdown in a loop or for a long amount of time, with lots of CPU utilization - and then it's also lower overhead. The tracer can be used to capture rare or complex events, and gives the full flow control and what is happening within the kernel. Ingo idle 0 1D... 811us : sched_clock_idle_sleep_event (acpi_processor_idle) idle 0 1D... 813us : _spin_lock (sched_clock_idle_sleep_event) trace-cm 2463 0 814us : native_flush_tlb_others (flush_tlb_mm) idle 0 1D... 815us : __update_rq_clock (sched_clock_idle_sleep_event) trace-cm 2463 0 817us : _spin_lock (native_flush_tlb_others) idle 0 1D... 817us+: acpi_cstate_enter (acpi_processor_idle) trace-cm 2463 0 820us+: send_IPI_mask_bitmask (native_flush_tlb_others) trace-cm 2463 0D... 823us+: apic_wait_icr_idle (send_IPI_mask_bitmask) trace-cm 2463 0 856us+: up_write (copy_process) trace-cm 2463 0 859us+: copy_keys (copy_process) trace-cm 2463 0 862us+: copy_namespaces (copy_process) trace-cm 2463 0 865us+: copy_thread (copy_process) trace-cm 2463 0 868us+: memcpy (copy_thread) trace-cm 2463 0 871us+: alloc_pid (copy_process) trace-cm 2463 0 874us+: kmem_cache_alloc (alloc_pid) trace-cm 2463 0 877us+: _spin_lock_irq (alloc_pid) trace-cm 2463 0 880us+: _write_lock_irq (copy_process) trace-cm 2463 0D... 883us+: _spin_lock (copy_process) trace-cm 2463 0D... 887us+: recalc_sigpending (copy_process) trace-cm 2463 0D... 890us+: recalc_sigpending_tsk (recalc_sigpending) trace-cm 2463 0D... 893us+: attach_pid (copy_process) trace-cm 2463 0D... 896us+: attach_pid (copy_process) trace-cm 2463 0D... 899us+: attach_pid (copy_process) trace-cm 2463 0 902us+: proc_fork_connector (copy_process) trace-cm 2463 0 905us+: wake_up_new_task (do_fork) trace-cm 2463 0 908us+: task_rq_lock (wake_up_new_task) trace-cm 2463 0D... 911us+: _spin_lock (task_rq_lock) trace-cm 2463 0D... 914us+: update_rq_clock (wake_up_new_task) trace-cm 2463 0D... 918us+: __update_rq_clock (update_rq_clock) trace-cm 2463 0D... 921us+: effective_prio (wake_up_new_task) trace-cm 2463 0D... 924us+: wake_up_new_task hwclock 2474 (0 0) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 21:07:22 +0100, Ingo Molnar wrote: * Jörn Engel [EMAIL PROTECTED] wrote: Result looked like a livelock and finally convinced me to abandon the latency tracer. Sorry, but it appears to be the right tool for the wrong job. hm, we routinely use it in -rt to capture what on earth is happening incidents. The snippet below is a random snipped from a trace that i've just captured, with mcount enabled. It seems to work fine here, with and without mcount. (pit clocksource is almost never used, that's why you had those early problems.) oprofile helps if you can reliably reproduce the slowdown in a loop or for a long amount of time, with lots of CPU utilization - and then it's also lower overhead. The tracer can be used to capture rare or complex events, and gives the full flow control and what is happening within the kernel. Such a trace would be useful indeed. But so far the patch has only given me grief and nothing remotely like useful output. Maybe I should simply use the complete -rt patch instead of debugging the broken-out latency-tracer patch. Jörn -- Mundie uses a textbook tactic of manipulation: start with some reasonable talk, and lead the audience to an unreasonable conclusion. -- Bruce Perens -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel [EMAIL PROTECTED] wrote: oprofile helps if you can reliably reproduce the slowdown in a loop or for a long amount of time, with lots of CPU utilization - and then it's also lower overhead. The tracer can be used to capture rare or complex events, and gives the full flow control and what is happening within the kernel. Such a trace would be useful indeed. But so far the patch has only given me grief and nothing remotely like useful output. Maybe I should simply use the complete -rt patch instead of debugging the broken-out latency-tracer patch. to capture that trace i did not use -rt, i just patched latest -git with: http://people.redhat.com/mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch (this has your fixes included already) have done: echo 1 /proc/sys/kernel/mcount_enabled and have run: ./trace-cmd sleep 1 trace.txt http://people.redhat.com/mingo/latency-tracing-patches/trace-cmd.c to capture a 1 second trace of what the system is doing. I think your troubles are due to running it within a qemu guest - that is not a typical utilization so you are on unchartered waters. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 21:45:59 +0100, Ingo Molnar wrote: to capture a 1 second trace of what the system is doing. I think your troubles are due to running it within a qemu guest - that is not a typical utilization so you are on unchartered waters. Looks like it. Guess I'll switch to something else for the moment. Jörn -- Linux is more the core point of a concept that surrounds open source which, in turn, is based on a false concept. This concept is that people actually want to look at source code. -- Rob Enderle -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 21:45:59 +0100, Ingo Molnar wrote: to capture that trace i did not use -rt, i just patched latest -git with: http://people.redhat.com/mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch (this has your fixes included already) have done: echo 1 /proc/sys/kernel/mcount_enabled and have run: ./trace-cmd sleep 1 trace.txt http://people.redhat.com/mingo/latency-tracing-patches/trace-cmd.c to capture a 1 second trace of what the system is doing. I think your troubles are due to running it within a qemu guest - that is not a typical utilization so you are on unchartered waters. Maybe one more thing: can you send me the config you used for the setup above? I'd like to know whether qemu or my config is to blame. Jörn -- Eighty percent of success is showing up. -- Woody Allen -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sun, 2 December 2007 22:19:00 +0100, Ingo Molnar wrote: * Jörn Engel [EMAIL PROTECTED] wrote: Maybe one more thing: can you send me the config you used for the setup above? I'd like to know whether qemu or my config is to blame. sure - attached. After an eternity of compile time, this config does generate some useful output. qemu is not to blame. Jörn -- Joern's library part 9: http://www.scl.ameslab.gov/Publications/Gus/TwelveWays.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sat, 1 December 2007 21:54:56 +0100, Ingo Molnar wrote: > * J??rn Engel <[EMAIL PROTECTED]> wrote: > > > > stopped custom tracer. > > BUG: spinlock recursion on CPU#0, sh/953 > > lock: c030f280, .magic: dead4ead, .owner: sh/953, .owner_cpu: 0 > > Pid: 953, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #2 > > [] show_trace_log_lvl+0x35/0x54 > > [] show_trace+0x2c/0x2e > > [] dump_stack+0x84/0x8a > > [] spin_bug+0xa7/0xae > > [] _raw_spin_lock+0x45/0xfa > > [] _spin_lock_irqsave+0x68/0x7a > > [] pit_read+0x14/0x99 > > [] get_monotonic_cycles+0xf/0x2d > > ah. You should mark pit_read() function as notrace. PIT clocksource is > rare. (add the 'notrace' word to the function prototype) Hardly a change at all. Apart from some offsets, this dump is identical. stopped custom tracer. BUG: spinlock recursion on CPU#0, sh/954 lock: c030f280, .magic: dead4ead, .owner: sh/954, .owner_cpu: 0 Pid: 954, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #3 [] show_trace_log_lvl+0x35/0x54 [] show_trace+0x2c/0x2e [] dump_stack+0x84/0x8a [] spin_bug+0xa7/0xae [] _raw_spin_lock+0x45/0xfa [] _spin_lock_irqsave+0x68/0x7a [] pit_read+0xf/0x91 [] get_monotonic_cycles+0xf/0x2d [] now+0x2a/0x7c [] trace+0x4d/0x1e8 [] __mcount+0x95/0xa6 [] mcount+0x14/0x18 [] lock_acquired+0xe/0x1d7 [] _spin_lock_irqsave+0x70/0x7a [] pit_read+0xf/0x91 [] update_wall_time+0x23/0x692 [] do_timer+0x24/0xb1 [] tick_periodic+0x49/0x84 [] tick_handle_periodic+0x22/0x73 [] timer_interrupt+0x4f/0x56 [] handle_IRQ_event+0x24/0x4f [] handle_edge_irq+0xb8/0x125 [] do_IRQ+0x89/0xa3 [] common_interrupt+0x23/0x28 [] mcount+0x14/0x18 [] sysctl_head_finish+0xc/0x33 [] proc_sys_write+0x96/0xa0 [] vfs_write+0xa6/0x14c [] sys_write+0x4c/0x70 [] syscall_call+0x7/0xb === Jörn -- Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats. -- Howard Aiken quoted by Ken Iverson quoted by Jim Horning quoted by Raph Levien, 1979 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* J??rn Engel <[EMAIL PROTECTED]> wrote: > Almost. "-serial stdio" was missing. Much better now. > > stopped custom tracer. > BUG: spinlock recursion on CPU#0, sh/953 > lock: c030f280, .magic: dead4ead, .owner: sh/953, .owner_cpu: 0 > Pid: 953, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #2 > [] show_trace_log_lvl+0x35/0x54 > [] show_trace+0x2c/0x2e > [] dump_stack+0x84/0x8a > [] spin_bug+0xa7/0xae > [] _raw_spin_lock+0x45/0xfa > [] _spin_lock_irqsave+0x68/0x7a > [] pit_read+0x14/0x99 > [] get_monotonic_cycles+0xf/0x2d ah. You should mark pit_read() function as notrace. PIT clocksource is rare. (add the 'notrace' word to the function prototype) Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sat, 1 December 2007 19:32:56 +0100, Ingo Molnar wrote: > * Jörn Engel <[EMAIL PROTECTED]> wrote: > > > I have to change my qemu setup a little to see the top of those > > dumps... > > btw., if you start qemu like this: > > qemu -cdrom ./cdrom.iso -hda ./hda.img -boot c -full-screen -kernel > ~/bzImage -append "root=/dev/hda1 earlyprintk=serial,ttyS0,9600 > console=tty console=ttyS0,9600 enforcing=0 debug" > > you'll get the inner kernel's serial console log to qemu's standard > output. Pretty useful for capturing kernel crashes. Almost. "-serial stdio" was missing. Much better now. stopped custom tracer. BUG: spinlock recursion on CPU#0, sh/953 lock: c030f280, .magic: dead4ead, .owner: sh/953, .owner_cpu: 0 Pid: 953, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #2 [] show_trace_log_lvl+0x35/0x54 [] show_trace+0x2c/0x2e [] dump_stack+0x84/0x8a [] spin_bug+0xa7/0xae [] _raw_spin_lock+0x45/0xfa [] _spin_lock_irqsave+0x68/0x7a [] pit_read+0x14/0x99 [] get_monotonic_cycles+0xf/0x2d [] now+0x2a/0x7c [] trace+0x4d/0x1e8 [] __mcount+0x95/0xa6 [] mcount+0x14/0x18 [] lock_acquired+0xe/0x1d7 [] _spin_lock_irqsave+0x70/0x7a [] pit_read+0x14/0x99 [] update_wall_time+0x23/0x692 [] do_timer+0x24/0xb1 [] tick_periodic+0x49/0x84 [] tick_handle_periodic+0x22/0x73 [] timer_interrupt+0x4f/0x56 [] handle_IRQ_event+0x24/0x4f [] handle_edge_irq+0xb8/0x125 [] do_IRQ+0x89/0xa3 [] common_interrupt+0x23/0x28 [] vfs_write+0xa6/0x14c [] sys_write+0x4c/0x70 [] syscall_call+0x7/0xb === I assume you have the latency tracer working. If you could send me your config, I could do a manual config-bisect and see which part of mine causes the problem. Jörn -- Admonish your friends privately, but praise them openly. -- Publilius Syrus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel <[EMAIL PROTECTED]> wrote: > > hm, that looks weird. if you disable CONFIG_PROVE_LOCKING, does that > > improve things? (or just turns a noisy lockup into a silent lockup?) > > Not much, although the dumps look different now: > http://logfs.org/~joern/trace3.jpg > http://logfs.org/~joern/trace4.jpg > > I have to change my qemu setup a little to see the top of those > dumps... btw., if you start qemu like this: qemu -cdrom ./cdrom.iso -hda ./hda.img -boot c -full-screen -kernel ~/bzImage -append "root=/dev/hda1 earlyprintk=serial,ttyS0,9600 console=tty console=ttyS0,9600 enforcing=0 debug" you'll get the inner kernel's serial console log to qemu's standard output. Pretty useful for capturing kernel crashes. Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Fri, 30 November 2007 19:46:25 +0100, Ingo Molnar wrote: > * Jörn Engel <[EMAIL PROTECTED]> wrote: > > > > It compiles. It boots with a 512M RAM (384M was too little with all > > the other debug options on). But it seems to lock up when running > > trace-cmd. On a rerun it locks up again, but with different output. > > hm, you should decrease MAX_TRACE in kernel/latency_tracing.c from 1 > million to 16K or so. 1 million entries probably depletes lowmem quite > seriously. That's ok. RAM is cheap. > > Rerun was captured: > > http://logfs.org/~joern/trace1.jpg > > hm, that looks weird. if you disable CONFIG_PROVE_LOCKING, does that > improve things? (or just turns a noisy lockup into a silent lockup?) Not much, although the dumps look different now: http://logfs.org/~joern/trace3.jpg http://logfs.org/~joern/trace4.jpg I have to change my qemu setup a little to see the top of those dumps... > > I should do a couple of runs, but my girlfriend claims realtime > > priority for the evening. > > yeah, SCHED_IDLE is not generally well received by them. ...as soon as more urgent tasks has finished (weekend is over). Jörn -- It does not matter how slowly you go, so long as you do not stop. -- Confucius -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Fri, 30 November 2007 19:46:25 +0100, Ingo Molnar wrote: * Jörn Engel [EMAIL PROTECTED] wrote: It compiles. It boots with a 512M RAM (384M was too little with all the other debug options on). But it seems to lock up when running trace-cmd. On a rerun it locks up again, but with different output. hm, you should decrease MAX_TRACE in kernel/latency_tracing.c from 1 million to 16K or so. 1 million entries probably depletes lowmem quite seriously. That's ok. RAM is cheap. Rerun was captured: http://logfs.org/~joern/trace1.jpg hm, that looks weird. if you disable CONFIG_PROVE_LOCKING, does that improve things? (or just turns a noisy lockup into a silent lockup?) Not much, although the dumps look different now: http://logfs.org/~joern/trace3.jpg http://logfs.org/~joern/trace4.jpg I have to change my qemu setup a little to see the top of those dumps... I should do a couple of runs, but my girlfriend claims realtime priority for the evening. yeah, SCHED_IDLE is not generally well received by them. ...as soon as more urgent tasks has finished (weekend is over). Jörn -- It does not matter how slowly you go, so long as you do not stop. -- Confucius -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel [EMAIL PROTECTED] wrote: hm, that looks weird. if you disable CONFIG_PROVE_LOCKING, does that improve things? (or just turns a noisy lockup into a silent lockup?) Not much, although the dumps look different now: http://logfs.org/~joern/trace3.jpg http://logfs.org/~joern/trace4.jpg I have to change my qemu setup a little to see the top of those dumps... btw., if you start qemu like this: qemu -cdrom ./cdrom.iso -hda ./hda.img -boot c -full-screen -kernel ~/bzImage -append root=/dev/hda1 earlyprintk=serial,ttyS0,9600 console=tty console=ttyS0,9600 enforcing=0 debug you'll get the inner kernel's serial console log to qemu's standard output. Pretty useful for capturing kernel crashes. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sat, 1 December 2007 19:32:56 +0100, Ingo Molnar wrote: * Jörn Engel [EMAIL PROTECTED] wrote: I have to change my qemu setup a little to see the top of those dumps... btw., if you start qemu like this: qemu -cdrom ./cdrom.iso -hda ./hda.img -boot c -full-screen -kernel ~/bzImage -append root=/dev/hda1 earlyprintk=serial,ttyS0,9600 console=tty console=ttyS0,9600 enforcing=0 debug you'll get the inner kernel's serial console log to qemu's standard output. Pretty useful for capturing kernel crashes. Almost. -serial stdio was missing. Much better now. stopped custom tracer. BUG: spinlock recursion on CPU#0, sh/953 lock: c030f280, .magic: dead4ead, .owner: sh/953, .owner_cpu: 0 Pid: 953, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #2 [c0103a04] show_trace_log_lvl+0x35/0x54 [c010450a] show_trace+0x2c/0x2e [c0104e6d] dump_stack+0x84/0x8a [c01ded7c] spin_bug+0xa7/0xae [c01def14] _raw_spin_lock+0x45/0xfa [c02a02b1] _spin_lock_irqsave+0x68/0x7a [c01087e7] pit_read+0x14/0x99 [c0130ee9] get_monotonic_cycles+0xf/0x2d [c013c0ef] now+0x2a/0x7c [c013c33b] trace+0x4d/0x1e8 [c013dbf3] __mcount+0x95/0xa6 [c010d35c] mcount+0x14/0x18 [c0135a44] lock_acquired+0xe/0x1d7 [c02a02b9] _spin_lock_irqsave+0x70/0x7a [c01087e7] pit_read+0x14/0x99 [c0130791] update_wall_time+0x23/0x692 [c0121756] do_timer+0x24/0xb1 [c01331fe] tick_periodic+0x49/0x84 [c013325b] tick_handle_periodic+0x22/0x73 [c0106315] timer_interrupt+0x4f/0x56 [c013e2c7] handle_IRQ_event+0x24/0x4f [c013f44a] handle_edge_irq+0xb8/0x125 [c01054ee] do_IRQ+0x89/0xa3 [c01033df] common_interrupt+0x23/0x28 [c015d924] vfs_write+0xa6/0x14c [c015df6e] sys_write+0x4c/0x70 [c0102a1f] syscall_call+0x7/0xb === I assume you have the latency tracer working. If you could send me your config, I could do a manual config-bisect and see which part of mine causes the problem. Jörn -- Admonish your friends privately, but praise them openly. -- Publilius Syrus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* J??rn Engel [EMAIL PROTECTED] wrote: Almost. -serial stdio was missing. Much better now. stopped custom tracer. BUG: spinlock recursion on CPU#0, sh/953 lock: c030f280, .magic: dead4ead, .owner: sh/953, .owner_cpu: 0 Pid: 953, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #2 [c0103a04] show_trace_log_lvl+0x35/0x54 [c010450a] show_trace+0x2c/0x2e [c0104e6d] dump_stack+0x84/0x8a [c01ded7c] spin_bug+0xa7/0xae [c01def14] _raw_spin_lock+0x45/0xfa [c02a02b1] _spin_lock_irqsave+0x68/0x7a [c01087e7] pit_read+0x14/0x99 [c0130ee9] get_monotonic_cycles+0xf/0x2d ah. You should mark pit_read() function as notrace. PIT clocksource is rare. (add the 'notrace' word to the function prototype) Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Sat, 1 December 2007 21:54:56 +0100, Ingo Molnar wrote: * J??rn Engel [EMAIL PROTECTED] wrote: stopped custom tracer. BUG: spinlock recursion on CPU#0, sh/953 lock: c030f280, .magic: dead4ead, .owner: sh/953, .owner_cpu: 0 Pid: 953, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #2 [c0103a04] show_trace_log_lvl+0x35/0x54 [c010450a] show_trace+0x2c/0x2e [c0104e6d] dump_stack+0x84/0x8a [c01ded7c] spin_bug+0xa7/0xae [c01def14] _raw_spin_lock+0x45/0xfa [c02a02b1] _spin_lock_irqsave+0x68/0x7a [c01087e7] pit_read+0x14/0x99 [c0130ee9] get_monotonic_cycles+0xf/0x2d ah. You should mark pit_read() function as notrace. PIT clocksource is rare. (add the 'notrace' word to the function prototype) Hardly a change at all. Apart from some offsets, this dump is identical. stopped custom tracer. BUG: spinlock recursion on CPU#0, sh/954 lock: c030f280, .magic: dead4ead, .owner: sh/954, .owner_cpu: 0 Pid: 954, comm: sh Not tainted 2.6.24-rc3-ge1cca7e8-dirty #3 [c0103a04] show_trace_log_lvl+0x35/0x54 [c010450a] show_trace+0x2c/0x2e [c0104e6d] dump_stack+0x84/0x8a [c01ded7c] spin_bug+0xa7/0xae [c01def14] _raw_spin_lock+0x45/0xfa [c02a02b1] _spin_lock_irqsave+0x68/0x7a [c01087e2] pit_read+0xf/0x91 [c0130ee1] get_monotonic_cycles+0xf/0x2d [c013c0e7] now+0x2a/0x7c [c013c333] trace+0x4d/0x1e8 [c013dbeb] __mcount+0x95/0xa6 [c010d354] mcount+0x14/0x18 [c0135a3c] lock_acquired+0xe/0x1d7 [c02a02b9] _spin_lock_irqsave+0x70/0x7a [c01087e2] pit_read+0xf/0x91 [c0130789] update_wall_time+0x23/0x692 [c012174e] do_timer+0x24/0xb1 [c01331f6] tick_periodic+0x49/0x84 [c0133253] tick_handle_periodic+0x22/0x73 [c0106315] timer_interrupt+0x4f/0x56 [c013e2bf] handle_IRQ_event+0x24/0x4f [c013f442] handle_edge_irq+0xb8/0x125 [c01054ee] do_IRQ+0x89/0xa3 [c01033df] common_interrupt+0x23/0x28 [c010d354] mcount+0x14/0x18 [c0120130] sysctl_head_finish+0xc/0x33 [c0192d64] proc_sys_write+0x96/0xa0 [c015d91c] vfs_write+0xa6/0x14c [c015df66] sys_write+0x4c/0x70 [c0102a1f] syscall_call+0x7/0xb === Jörn -- Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats. -- Howard Aiken quoted by Ken Iverson quoted by Jim Horning quoted by Raph Levien, 1979 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Mark Lord wrote: Ingo Molnar wrote: * Ingo Molnar <[EMAIL PROTECTED]> wrote: pick up the latest latency tracer patch from: sorry, wrong URLs, the correct links are: http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c .. Is there a version of these that works with 2.6.23.1 ? I'm not using 2.6.24-* on this machine because: (a) behaviour may be different, and (b) something broke VMware compatibility again. .. An update: Ingo's 2.6.23 version of the latency-tracing-patches only lock-up on resume here. But now that I've hacked vmware to work on 2.6.24, my notebook is now running the newer kernels. So now to see if the "strange 1-second pauses" ever happen here, and if they do I'll patch in Ingo's stuff to try and find out why. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel <[EMAIL PROTECTED]> wrote: > On Fri, 30 November 2007 14:43:12 +0100, Ingo Molnar wrote: > > > > > > http://redhat.com/~mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch > > > > does it work any better? > > It compiles. It boots with a 512M RAM (384M was too little with all > the other debug options on). But it seems to lock up when running > trace-cmd. On a rerun it locks up again, but with different output. hm, you should decrease MAX_TRACE in kernel/latency_tracing.c from 1 million to 16K or so. 1 million entries probably depletes lowmem quite seriously. > Rerun was captured: > http://logfs.org/~joern/trace1.jpg hm, that looks weird. if you disable CONFIG_PROVE_LOCKING, does that improve things? (or just turns a noisy lockup into a silent lockup?) > I should do a couple of runs, but my girlfriend claims realtime > priority for the evening. yeah, SCHED_IDLE is not generally well received by them. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Fri, 30 November 2007 14:43:12 +0100, Ingo Molnar wrote: > > > http://redhat.com/~mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch > > does it work any better? It compiles. It boots with a 512M RAM (384M was too little with all the other debug options on). But it seems to lock up when running trace-cmd. On a rerun it locks up again, but with different output. Rerun was captured: http://logfs.org/~joern/trace1.jpg I should do a couple of runs, but my girlfriend claims realtime priority for the evening. Jörn -- Chance favors only the prepared mind. -- Louis Pasteur - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Fri, 30 November 2007 14:35:46 +0100, Ingo Molnar wrote: > * Jörn Engel <[EMAIL PROTECTED]> wrote: > > > > kernel/sched.c:3384: warning: ‘struct prio_array’ declared inside parameter > > list > > kernel/sched.c:3384: warning: its scope is only this definition or > > declaration, which is probably not what you want > > kernel/sched.c: In function ‘trace_array’: > > kernel/sched.c:3391: error: dereferencing pointer to incomplete type > > kernel/sched.c:3393: error: dereferencing pointer to incomplete type > > kernel/sched.c:3393: error: dereferencing pointer to incomplete type > > kernel/sched.c:3396: error: dereferencing pointer to incomplete type > > kernel/sched.c:3396: error: dereferencing pointer to incomplete type > > kernel/sched.c: In function ‘trace_all_runnable_tasks’: > > kernel/sched.c:3407: error: ‘struct rq’ has no member named ‘active’ > > make[1]: *** [kernel/sched.o] Error 1 > > > > And I cannot find a definition of struct prio_array in current git > > either. Is another patch needed? > > change that to rt_prio_array in the code. Solves the prio_array problem, but leaves the non-existing member active. I've upgraded to -rc3 and will give your latest patch a whirl. Jörn -- Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface. -- Doug MacIlroy - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > sorry, wrong URLs, the correct links are: > > > > > > > > > http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch > > >http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c > > > > Don't seem to work with plain 2.6.23: could you try this updated version: http://redhat.com/~mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch does it work any better? Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel <[EMAIL PROTECTED]> wrote: > On Thu, 15 November 2007 20:36:12 +0100, Ingo Molnar wrote: > > * Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > > > pick up the latest latency tracer patch from: > > > > sorry, wrong URLs, the correct links are: > > > > > > http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch > >http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c > > Don't seem to work with plain 2.6.23: > > kernel/sched.c:3384: warning: ‘struct prio_array’ declared inside parameter > list > kernel/sched.c:3384: warning: its scope is only this definition or > declaration, which is probably not what you want > kernel/sched.c: In function ‘trace_array’: > kernel/sched.c:3391: error: dereferencing pointer to incomplete type > kernel/sched.c:3393: error: dereferencing pointer to incomplete type > kernel/sched.c:3393: error: dereferencing pointer to incomplete type > kernel/sched.c:3396: error: dereferencing pointer to incomplete type > kernel/sched.c:3396: error: dereferencing pointer to incomplete type > kernel/sched.c: In function ‘trace_all_runnable_tasks’: > kernel/sched.c:3407: error: ‘struct rq’ has no member named ‘active’ > make[1]: *** [kernel/sched.o] Error 1 > > And I cannot find a definition of struct prio_array in current git > either. Is another patch needed? change that to rt_prio_array in the code. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Thu, 15 November 2007 20:36:12 +0100, Ingo Molnar wrote: > * Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > pick up the latest latency tracer patch from: > > sorry, wrong URLs, the correct links are: > > > http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch >http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c Don't seem to work with plain 2.6.23: kernel/sched.c:3384: warning: ‘struct prio_array’ declared inside parameter list kernel/sched.c:3384: warning: its scope is only this definition or declaration, which is probably not what you want kernel/sched.c: In function ‘trace_array’: kernel/sched.c:3391: error: dereferencing pointer to incomplete type kernel/sched.c:3393: error: dereferencing pointer to incomplete type kernel/sched.c:3393: error: dereferencing pointer to incomplete type kernel/sched.c:3396: error: dereferencing pointer to incomplete type kernel/sched.c:3396: error: dereferencing pointer to incomplete type kernel/sched.c: In function ‘trace_all_runnable_tasks’: kernel/sched.c:3407: error: ‘struct rq’ has no member named ‘active’ make[1]: *** [kernel/sched.o] Error 1 And I cannot find a definition of struct prio_array in current git either. Is another patch needed? Jörn -- Time? What's that? Time is only worth what you do with it. -- Theo de Raadt - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel [EMAIL PROTECTED] wrote: On Fri, 30 November 2007 14:43:12 +0100, Ingo Molnar wrote: http://redhat.com/~mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch does it work any better? It compiles. It boots with a 512M RAM (384M was too little with all the other debug options on). But it seems to lock up when running trace-cmd. On a rerun it locks up again, but with different output. hm, you should decrease MAX_TRACE in kernel/latency_tracing.c from 1 million to 16K or so. 1 million entries probably depletes lowmem quite seriously. Rerun was captured: http://logfs.org/~joern/trace1.jpg hm, that looks weird. if you disable CONFIG_PROVE_LOCKING, does that improve things? (or just turns a noisy lockup into a silent lockup?) I should do a couple of runs, but my girlfriend claims realtime priority for the evening. yeah, SCHED_IDLE is not generally well received by them. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Fri, 30 November 2007 14:43:12 +0100, Ingo Molnar wrote: http://redhat.com/~mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch does it work any better? It compiles. It boots with a 512M RAM (384M was too little with all the other debug options on). But it seems to lock up when running trace-cmd. On a rerun it locks up again, but with different output. Rerun was captured: http://logfs.org/~joern/trace1.jpg I should do a couple of runs, but my girlfriend claims realtime priority for the evening. Jörn -- Chance favors only the prepared mind. -- Louis Pasteur - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Ingo Molnar [EMAIL PROTECTED] wrote: sorry, wrong URLs, the correct links are: http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c Don't seem to work with plain 2.6.23: could you try this updated version: http://redhat.com/~mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch does it work any better? Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Thu, 15 November 2007 20:36:12 +0100, Ingo Molnar wrote: * Ingo Molnar [EMAIL PROTECTED] wrote: pick up the latest latency tracer patch from: sorry, wrong URLs, the correct links are: http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c Don't seem to work with plain 2.6.23: kernel/sched.c:3384: warning: ‘struct prio_array’ declared inside parameter list kernel/sched.c:3384: warning: its scope is only this definition or declaration, which is probably not what you want kernel/sched.c: In function ‘trace_array’: kernel/sched.c:3391: error: dereferencing pointer to incomplete type kernel/sched.c:3393: error: dereferencing pointer to incomplete type kernel/sched.c:3393: error: dereferencing pointer to incomplete type kernel/sched.c:3396: error: dereferencing pointer to incomplete type kernel/sched.c:3396: error: dereferencing pointer to incomplete type kernel/sched.c: In function ‘trace_all_runnable_tasks’: kernel/sched.c:3407: error: ‘struct rq’ has no member named ‘active’ make[1]: *** [kernel/sched.o] Error 1 And I cannot find a definition of struct prio_array in current git either. Is another patch needed? Jörn -- Time? What's that? Time is only worth what you do with it. -- Theo de Raadt - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Fri, 30 November 2007 14:35:46 +0100, Ingo Molnar wrote: * Jörn Engel [EMAIL PROTECTED] wrote: kernel/sched.c:3384: warning: ‘struct prio_array’ declared inside parameter list kernel/sched.c:3384: warning: its scope is only this definition or declaration, which is probably not what you want kernel/sched.c: In function ‘trace_array’: kernel/sched.c:3391: error: dereferencing pointer to incomplete type kernel/sched.c:3393: error: dereferencing pointer to incomplete type kernel/sched.c:3393: error: dereferencing pointer to incomplete type kernel/sched.c:3396: error: dereferencing pointer to incomplete type kernel/sched.c:3396: error: dereferencing pointer to incomplete type kernel/sched.c: In function ‘trace_all_runnable_tasks’: kernel/sched.c:3407: error: ‘struct rq’ has no member named ‘active’ make[1]: *** [kernel/sched.o] Error 1 And I cannot find a definition of struct prio_array in current git either. Is another patch needed? change that to rt_prio_array in the code. Solves the prio_array problem, but leaves the non-existing member active. I've upgraded to -rc3 and will give your latest patch a whirl. Jörn -- Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface. -- Doug MacIlroy - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Jörn Engel [EMAIL PROTECTED] wrote: On Thu, 15 November 2007 20:36:12 +0100, Ingo Molnar wrote: * Ingo Molnar [EMAIL PROTECTED] wrote: pick up the latest latency tracer patch from: sorry, wrong URLs, the correct links are: http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c Don't seem to work with plain 2.6.23: kernel/sched.c:3384: warning: ‘struct prio_array’ declared inside parameter list kernel/sched.c:3384: warning: its scope is only this definition or declaration, which is probably not what you want kernel/sched.c: In function ‘trace_array’: kernel/sched.c:3391: error: dereferencing pointer to incomplete type kernel/sched.c:3393: error: dereferencing pointer to incomplete type kernel/sched.c:3393: error: dereferencing pointer to incomplete type kernel/sched.c:3396: error: dereferencing pointer to incomplete type kernel/sched.c:3396: error: dereferencing pointer to incomplete type kernel/sched.c: In function ‘trace_all_runnable_tasks’: kernel/sched.c:3407: error: ‘struct rq’ has no member named ‘active’ make[1]: *** [kernel/sched.o] Error 1 And I cannot find a definition of struct prio_array in current git either. Is another patch needed? change that to rt_prio_array in the code. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Mark Lord wrote: Ingo Molnar wrote: * Ingo Molnar [EMAIL PROTECTED] wrote: pick up the latest latency tracer patch from: sorry, wrong URLs, the correct links are: http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c .. Is there a version of these that works with 2.6.23.1 ? I'm not using 2.6.24-* on this machine because: (a) behaviour may be different, and (b) something broke VMware compatibility again. .. An update: Ingo's 2.6.23 version of the latency-tracing-patches only lock-up on resume here. But now that I've hacked vmware to work on 2.6.24, my notebook is now running the newer kernels. So now to see if the strange 1-second pauses ever happen here, and if they do I'll patch in Ingo's stuff to try and find out why. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Ingo Molnar wrote: * Mark Lord <[EMAIL PROTECTED]> wrote: Since Ingo's latency trace patches lock up the machine on resume, the next thing I'll try instead is to re-enable CONFIG_IRQBALANCE=y. hm, which patch did you try? Could you check whether all chunks from the patch below are applied? (these are the fixed i did when i was doing cross-suspend traces - this is not something i've done before, so the tracer had to be adjusted) .. Hi Ingo! I was using your 2.6.23.1 version of the patches, plus the fix you posted. The new patch you gave just now is for 2.6.24, which won't apply to the older kernel. I'm not switching kernels yet, as doing so might mask the problem without actually resolving it for good. Cheers - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Mark Lord <[EMAIL PROTECTED]> wrote: > Since Ingo's latency trace patches lock up the machine on resume, the > next thing I'll try instead is to re-enable CONFIG_IRQBALANCE=y. hm, which patch did you try? Could you check whether all chunks from the patch below are applied? (these are the fixed i did when i was doing cross-suspend traces - this is not something i've done before, so the tracer had to be adjusted) i suspect if you turn off CONFIG_FUNCTION_TRACING then you wont get any hung resume - and the resulting trace would still be pretty useful. (it will show scheduling and irq activities, etc.) Ingo --- arch/x86/kernel/stacktrace.c |2 +- arch/x86/power/cpu.c |3 ++- drivers/acpi/namespace/nsutils.c |2 +- drivers/acpi/namespace/nswalk.c |2 +- include/linux/sched.h|2 ++ kernel/latency_trace.c | 26 +++--- kernel/softirq.c |6 +++--- 7 files changed, 33 insertions(+), 10 deletions(-) Index: linux/arch/x86/kernel/stacktrace.c === --- linux.orig/arch/x86/kernel/stacktrace.c +++ linux/arch/x86/kernel/stacktrace.c @@ -22,7 +22,7 @@ static int save_stack_stack(void *data, return -1; } -static void save_stack_address(void *data, unsigned long addr) +static void notrace save_stack_address(void *data, unsigned long addr) { struct stack_trace *trace = (struct stack_trace *)data; if (trace->skip > 0) { Index: linux/arch/x86/power/cpu.c === --- linux.orig/arch/x86/power/cpu.c +++ linux/arch/x86/power/cpu.c @@ -123,8 +123,9 @@ void __restore_processor_state(struct sa mcheck_init(_cpu_data); } -void restore_processor_state(void) +void notrace restore_processor_state(void) { + trace_resume(); __restore_processor_state(_context); } Index: linux/drivers/acpi/namespace/nsutils.c === --- linux.orig/drivers/acpi/namespace/nsutils.c +++ linux/drivers/acpi/namespace/nsutils.c @@ -923,7 +923,7 @@ struct acpi_namespace_node *acpi_ns_get_ * **/ -struct acpi_namespace_node *acpi_ns_get_next_valid_node(struct +struct acpi_namespace_node * notrace acpi_ns_get_next_valid_node(struct acpi_namespace_node *node) { Index: linux/drivers/acpi/namespace/nswalk.c === --- linux.orig/drivers/acpi/namespace/nswalk.c +++ linux/drivers/acpi/namespace/nswalk.c @@ -65,7 +65,7 @@ ACPI_MODULE_NAME("nswalk") * within Scope is returned. * **/ -struct acpi_namespace_node *acpi_ns_get_next_node(acpi_object_type type, struct acpi_namespace_node +struct acpi_namespace_node * notrace acpi_ns_get_next_node(acpi_object_type type, struct acpi_namespace_node *parent_node, struct acpi_namespace_node *child_node) { Index: linux/include/linux/sched.h === --- linux.orig/include/linux/sched.h +++ linux/include/linux/sched.h @@ -337,6 +337,7 @@ static inline void touch_all_softlockup_ extern long user_trace_stop(void); extern void trace_cmdline(void); extern void init_tracer(void); + extern void trace_resume(void); #else # define mcount_enabled0 # define trace_enabled 0 @@ -358,6 +359,7 @@ static inline void touch_all_softlockup_ # define user_trace_stop() do { } while (0) # define trace_cmdline() do { } while (0) # define init_tracer() do { } while (0) +# define trace_resume()do { } while (0) #endif #ifdef CONFIG_WAKEUP_TIMING Index: linux/kernel/latency_trace.c === --- linux.orig/kernel/latency_trace.c +++ linux/kernel/latency_trace.c @@ -258,19 +258,28 @@ static struct cpu_trace cpu_traces[NR_CP #endif } }; -static notrace cycle_t now(struct cpu_trace *tr, int monotonic) +static inline notrace cycle_t __now(int monotonic) { - cycles_t now, delta, last = tr->last_cycles; + cycles_t now; if (trace_use_raw_cycles && !monotonic) now = get_cycles(); else now = get_monotonic_cycles(); + return now; +} + +static notrace cycle_t now(struct cpu_trace *tr, int monotonic) +{ + cycles_t now, delta, last = tr->last_cycles; + + now = __now(monotonic); + /*
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Mark Lord wrote: I have been reporting this off and on since 2.6.23 was released. This problem was not apparent up to perhaps 2.6.23-rc8, but definitely became common in 2.6.23 and 2.6.23.1. Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds of kernel time to complete. Once in a while, it takes *much* longer, in the 14-20 second range. These long events *seem* to be mostly after the notebook has been in suspend for a longish time, but there's really nothing consistent here. So git-bisect isn't going to work for this one. I recently rebuilt the kernel to include printk timestamps, and then it went 2 days without the issue happening, until this morning (after an overnight suspend) finally. The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB DDR2. PCIe express chipset, ATI graphics, SATA hard drive. 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express Memory Controller Hub (rev 03) 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express PCI Express Root Port (rev 03) 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01) 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01) 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 01) 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01) 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01) 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01) 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01) 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01) 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01) 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1) 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 01) 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial ATA Storage Controller IDE (rev 01) 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01) 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility X1400 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02) 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19) 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01) 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 0a) 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05) 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02) There's also a USB2 hub connected, with the Logitech mouse being the only thing plugged into it at present. The best clue I have of what is happening, other than that it was first seen during the late 2.6.23-rc* series, are the following two sets of kernel logs. The first set is from a "normal" fast wake-up, and the second set is from the "slow" wake-up seen this morning. Both logs show the same information, in pretty much the same order. The BIG difference is a bunch of unexplained pauses of exactly 1-second each that occur during the slow wakeup. The only other difference is that the SATA disk messages are placed differently within the two logs. I'm thinking on that one but it doesn't look significant to me. The real question is, why the 1-sec pauses? * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * HERE IS THE NORMAL RESUME LOG. [ 820.147914] CPU1 is down [0.355669] Back to C! [0.356120] Enabling non-boot CPUs ... [0.377167] SMP alternatives: switching to SMP code [0.378565] Booting processor 1/1 eip 3000 [0.378568] CPU 1 irqstacks, hard=c0375000 soft=c0373000 [0.389374] Initializing CPU#1 [0.470225] Calibrating delay using timer specific routine.. 4324.85 BogoMIPS (lpj=7204571) [0.470235] CPU: After generic identify, caps: bfebfbff 2010 e3bd 0001 [0.470244] monitor/mwait feature present. [0.470248] CPU: L1 I cache: 32K, L1 D cache: 32K [0.470252] CPU: L2 cache: 4096K [0.470255] CPU: Physical Processor ID: 0 [0.470257] CPU: Processor Core ID: 1 [0.470260] CPU: After all inits, caps: bfebfbff 2010 3940 e3bd 0001 [0.471330] CPU1: Intel(R) Core(TM)2 CPU T7400 @ 2.16GHz stepping 06 [0.471724] CPU1 is up [0.474787] Switched to high resolution mode on CPU 1 [0.483415] PM: Writing back config space on device :00:01.0 at offset a (was f, writing 0) [0.483422] PM: Writing back config space on device :00:01.0 at offset 7 (was
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Mark Lord [EMAIL PROTECTED] wrote: Since Ingo's latency trace patches lock up the machine on resume, the next thing I'll try instead is to re-enable CONFIG_IRQBALANCE=y. hm, which patch did you try? Could you check whether all chunks from the patch below are applied? (these are the fixed i did when i was doing cross-suspend traces - this is not something i've done before, so the tracer had to be adjusted) i suspect if you turn off CONFIG_FUNCTION_TRACING then you wont get any hung resume - and the resulting trace would still be pretty useful. (it will show scheduling and irq activities, etc.) Ingo --- arch/x86/kernel/stacktrace.c |2 +- arch/x86/power/cpu.c |3 ++- drivers/acpi/namespace/nsutils.c |2 +- drivers/acpi/namespace/nswalk.c |2 +- include/linux/sched.h|2 ++ kernel/latency_trace.c | 26 +++--- kernel/softirq.c |6 +++--- 7 files changed, 33 insertions(+), 10 deletions(-) Index: linux/arch/x86/kernel/stacktrace.c === --- linux.orig/arch/x86/kernel/stacktrace.c +++ linux/arch/x86/kernel/stacktrace.c @@ -22,7 +22,7 @@ static int save_stack_stack(void *data, return -1; } -static void save_stack_address(void *data, unsigned long addr) +static void notrace save_stack_address(void *data, unsigned long addr) { struct stack_trace *trace = (struct stack_trace *)data; if (trace-skip 0) { Index: linux/arch/x86/power/cpu.c === --- linux.orig/arch/x86/power/cpu.c +++ linux/arch/x86/power/cpu.c @@ -123,8 +123,9 @@ void __restore_processor_state(struct sa mcheck_init(boot_cpu_data); } -void restore_processor_state(void) +void notrace restore_processor_state(void) { + trace_resume(); __restore_processor_state(saved_context); } Index: linux/drivers/acpi/namespace/nsutils.c === --- linux.orig/drivers/acpi/namespace/nsutils.c +++ linux/drivers/acpi/namespace/nsutils.c @@ -923,7 +923,7 @@ struct acpi_namespace_node *acpi_ns_get_ * **/ -struct acpi_namespace_node *acpi_ns_get_next_valid_node(struct +struct acpi_namespace_node * notrace acpi_ns_get_next_valid_node(struct acpi_namespace_node *node) { Index: linux/drivers/acpi/namespace/nswalk.c === --- linux.orig/drivers/acpi/namespace/nswalk.c +++ linux/drivers/acpi/namespace/nswalk.c @@ -65,7 +65,7 @@ ACPI_MODULE_NAME(nswalk) * within Scope is returned. * **/ -struct acpi_namespace_node *acpi_ns_get_next_node(acpi_object_type type, struct acpi_namespace_node +struct acpi_namespace_node * notrace acpi_ns_get_next_node(acpi_object_type type, struct acpi_namespace_node *parent_node, struct acpi_namespace_node *child_node) { Index: linux/include/linux/sched.h === --- linux.orig/include/linux/sched.h +++ linux/include/linux/sched.h @@ -337,6 +337,7 @@ static inline void touch_all_softlockup_ extern long user_trace_stop(void); extern void trace_cmdline(void); extern void init_tracer(void); + extern void trace_resume(void); #else # define mcount_enabled0 # define trace_enabled 0 @@ -358,6 +359,7 @@ static inline void touch_all_softlockup_ # define user_trace_stop() do { } while (0) # define trace_cmdline() do { } while (0) # define init_tracer() do { } while (0) +# define trace_resume()do { } while (0) #endif #ifdef CONFIG_WAKEUP_TIMING Index: linux/kernel/latency_trace.c === --- linux.orig/kernel/latency_trace.c +++ linux/kernel/latency_trace.c @@ -258,19 +258,28 @@ static struct cpu_trace cpu_traces[NR_CP #endif } }; -static notrace cycle_t now(struct cpu_trace *tr, int monotonic) +static inline notrace cycle_t __now(int monotonic) { - cycles_t now, delta, last = tr-last_cycles; + cycles_t now; if (trace_use_raw_cycles !monotonic) now = get_cycles(); else now = get_monotonic_cycles(); + return now; +} + +static notrace cycle_t now(struct cpu_trace *tr, int monotonic) +{ + cycles_t now, delta, last = tr-last_cycles; + + now = __now(monotonic); + /*
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Mark Lord wrote: I have been reporting this off and on since 2.6.23 was released. This problem was not apparent up to perhaps 2.6.23-rc8, but definitely became common in 2.6.23 and 2.6.23.1. Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds of kernel time to complete. Once in a while, it takes *much* longer, in the 14-20 second range. These long events *seem* to be mostly after the notebook has been in suspend for a longish time, but there's really nothing consistent here. So git-bisect isn't going to work for this one. I recently rebuilt the kernel to include printk timestamps, and then it went 2 days without the issue happening, until this morning (after an overnight suspend) finally. The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB DDR2. PCIe express chipset, ATI graphics, SATA hard drive. 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express Memory Controller Hub (rev 03) 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express PCI Express Root Port (rev 03) 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01) 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01) 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 01) 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01) 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01) 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01) 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01) 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01) 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01) 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1) 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 01) 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial ATA Storage Controller IDE (rev 01) 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01) 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility X1400 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02) 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19) 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01) 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 0a) 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05) 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02) There's also a USB2 hub connected, with the Logitech mouse being the only thing plugged into it at present. The best clue I have of what is happening, other than that it was first seen during the late 2.6.23-rc* series, are the following two sets of kernel logs. The first set is from a normal fast wake-up, and the second set is from the slow wake-up seen this morning. Both logs show the same information, in pretty much the same order. The BIG difference is a bunch of unexplained pauses of exactly 1-second each that occur during the slow wakeup. The only other difference is that the SATA disk messages are placed differently within the two logs. I'm thinking on that one but it doesn't look significant to me. The real question is, why the 1-sec pauses? * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * HERE IS THE NORMAL RESUME LOG. [ 820.147914] CPU1 is down [0.355669] Back to C! [0.356120] Enabling non-boot CPUs ... [0.377167] SMP alternatives: switching to SMP code [0.378565] Booting processor 1/1 eip 3000 [0.378568] CPU 1 irqstacks, hard=c0375000 soft=c0373000 [0.389374] Initializing CPU#1 [0.470225] Calibrating delay using timer specific routine.. 4324.85 BogoMIPS (lpj=7204571) [0.470235] CPU: After generic identify, caps: bfebfbff 2010 e3bd 0001 [0.470244] monitor/mwait feature present. [0.470248] CPU: L1 I cache: 32K, L1 D cache: 32K [0.470252] CPU: L2 cache: 4096K [0.470255] CPU: Physical Processor ID: 0 [0.470257] CPU: Processor Core ID: 1 [0.470260] CPU: After all inits, caps: bfebfbff 2010 3940 e3bd 0001 [0.471330] CPU1: Intel(R) Core(TM)2 CPU T7400 @ 2.16GHz stepping 06 [0.471724] CPU1 is up [0.474787] Switched to high resolution mode on CPU 1 [0.483415] PM: Writing back config space on device :00:01.0 at offset a (was f, writing 0) [0.483422] PM: Writing back config space on device :00:01.0 at offset 7 (was e0e0,
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Ingo Molnar wrote: * Mark Lord [EMAIL PROTECTED] wrote: Since Ingo's latency trace patches lock up the machine on resume, the next thing I'll try instead is to re-enable CONFIG_IRQBALANCE=y. hm, which patch did you try? Could you check whether all chunks from the patch below are applied? (these are the fixed i did when i was doing cross-suspend traces - this is not something i've done before, so the tracer had to be adjusted) .. Hi Ingo! I was using your 2.6.23.1 version of the patches, plus the fix you posted. The new patch you gave just now is for 2.6.24, which won't apply to the older kernel. I'm not switching kernels yet, as doing so might mask the problem without actually resolving it for good. Cheers - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Ingo Molnar wrote: * Ingo Molnar <[EMAIL PROTECTED]> wrote: Is there a version of these that works with 2.6.23.1 ? yes, i've backported it and have uploaded the v2.6.23 version to: http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.23.1-combo.patch .. ok, i experimented around with the latency tracer, trying to capture the trace of a full suspend+resume cycle, and it needed the tracer fix below (GTOD clocksource suspend/resume would otherwise confuse the tracer and you'd get no trace output as a result). once that tracer bug was fixed, the best method to generate a trace was to do this: echo 1 > /proc/sys/kernel/stackframe_tracing echo 1 > /proc/sys/kernel/syscall_tracing ./trace-cmd bash -c "echo mem > /sys/power/state" > trace.txt .. I've applied the tracer for 2.6.23.1 patch, plus the bugfix, and now I just get blinking-LEDs (black screen of death) on resume. Cheers - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Mark Lord wrote: Ingo Molnar wrote: * Ingo Molnar <[EMAIL PROTECTED]> wrote: pick up the latest latency tracer patch from: sorry, wrong URLs, the correct links are: http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c .. make trace-cmd cc -Wall -O2 -strace-cmd.c -o trace-cmd trace-cmd.c: In function ‘main’: trace-cmd.c:65: warning: label ‘usage’ defined but not used - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Fri, 2007-11-16 at 12:23 +0100, Ingo Molnar wrote: > * Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > once that tracer bug was fixed, the best method to generate a trace > > was to do this: > > > >echo 1 > /proc/sys/kernel/stackframe_tracing > >echo 1 > /proc/sys/kernel/syscall_tracing > >./trace-cmd bash -c "echo mem > /sys/power/state" > trace.txt > > so here's an UP suspend+resume trace i did: > > > http://redhat.com/~mingo/latency-tracing-patches/misc/trace-suspend-long.txt.bz2 > > tons of detail - which might be interesting to other folks as well. Fact > is, our suspend-to-RAM+resume cycle is very, very slow, even on fast > hardware - and this trace shows all the reasons why. > > This was a fully cached system - i.e. i've done a suspend+resume before > to warm up the caches. (not that suspend+resume does much IO normally.) > > The trace shows that a suspend+resume cycle is 7.95 seconds long > (without counting the time the box spent suspended) - ouch! This was a > T60 with Core2Duo 1.83GHz. Ouch? That's an order of magnitude faster than my 3GHz P4 :) -Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > so here's an UP suspend+resume trace i did: > > > http://redhat.com/~mingo/latency-tracing-patches/misc/trace-suspend-long.txt.bz2 > > tons of detail - which might be interesting to other folks as well. > Fact is, our suspend-to-RAM+resume cycle is very, very slow, even on > fast hardware - and this trace shows all the reasons why. > > This was a fully cached system - i.e. i've done a suspend+resume > before to warm up the caches. (not that suspend+resume does much IO > normally.) > > The trace shows that a suspend+resume cycle is 7.95 seconds long > (without counting the time the box spent suspended) - ouch! This was a > T60 with Core2Duo 1.83GHz. and the amount of time spent executing on the CPU was only 70 msecs! So we spent 99% of that 7.9 seconds with just waiting around. Here are the top 10 sleep reasons: 864 schedule()<-schedule_timeout()<-ps2_sendbyte()<-ps2_command() 183 schedule()<-vt_waitactive()<-vt_ioctl()<-tty_ioctl() 164 schedule()<-schedule_timeout()<-acpi_ec_wait()<-acpi_ec_transaction() 157 schedule()<-refrigerator()<-get_signal_to_deliver()<-do_notify_resume() 118 schedule()<-worker_thread()<-kthread()<-kernel_thread_helper() 80 schedule()<-do_msleep()<-msleep()<-sata_link_debounce() 64 schedule()<-schedule_timeout()<-inet_csk_accept()<-inet_accept() 37 schedule()<-__mutex_lock_slowpath()<-mutex_lock()<-acpi_ec_transaction() 20 schedule()<-schedule_timeout()<-do_select()<-core_sys_select() 20 schedule()<-io_schedule()<-sync_buffer()<-__wait_on_bit() what's weird are all those ps2 related sleeps - they make up for much of the delay. This how such a sleep point looks like: bash 3500 0D... 8641415us : schedule()<-schedule_timeout()<-ps2_sendbyte()<-ps2_command() bash 3500 0D... 8641417us : psmouse_sliced_command()<-synaptics_pt_write()<-ps2_sendbyte()<-ps2_command() it starts somewhere here: bash 3500 0 5302376us : serio_reconnect_driver (serio_resume) and ends: kseriod 208 0D... 9182560us : synaptics_query_hardware()<-synaptics_reconnect()<-psmouse_reconnect()<-serio_reconnect_driver() so this section (serio_resume()) took almost 4 seconds. the main delay seems to be dpm_resume(): bash 3500 0.N.. 3061040us : dpm_resume (device_resume) ... bash 3500 0 9105017us : mutex_unlock (dpm_resume) bash 3500 0 9105018us : mutex_unlock (device_resume) 6.1 seconds! Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > once that tracer bug was fixed, the best method to generate a trace > was to do this: > >echo 1 > /proc/sys/kernel/stackframe_tracing >echo 1 > /proc/sys/kernel/syscall_tracing >./trace-cmd bash -c "echo mem > /sys/power/state" > trace.txt so here's an UP suspend+resume trace i did: http://redhat.com/~mingo/latency-tracing-patches/misc/trace-suspend-long.txt.bz2 tons of detail - which might be interesting to other folks as well. Fact is, our suspend-to-RAM+resume cycle is very, very slow, even on fast hardware - and this trace shows all the reasons why. This was a fully cached system - i.e. i've done a suspend+resume before to warm up the caches. (not that suspend+resume does much IO normally.) The trace shows that a suspend+resume cycle is 7.95 seconds long (without counting the time the box spent suspended) - ouch! This was a T60 with Core2Duo 1.83GHz. For example here is where freezing starts: bash-2397 0 31686us : remove_wait_queue (vt_waitactive) bash-2397 0 31688us : freeze_processes (enter_state) bash-2397 0 31689us : printk (freeze_processes) here is where the ACPI code triggers the suspend: bash-2397 0D... 1904138us : acpi_hw_low_level_write (acpi_hw_register_write) but this is a whopping 1.9 seconds into the trace already! first sign of life after i opened the laptop lid again: bash-2397 0D... 1904138us : __restore_processor_state (restore_processor_state) bash-2397 0D... 1904138us : enable_sep_cpu (__restore_processor_state) (in the trace there's no delay visible - the period of time spent suspended is not visible to the tracer.) One good way to start looking at such traces is to filter out rescheduling events alone: grep ': schedule <' trace-suspend-long.txt that gives a rough outline of what's going on: -0 0D... 1776566us : schedule (0 20) bash-2397 0D... 1786748us : schedule <-0> (20 0) scsi_eh_-419 0D... 1786814us : schedule (0 -5) bash-2397 0D... 1786960us : schedule (-5 0) scsi_eh_-421 0D... 1787020us : schedule (0 -5) bash-2397 0D... 1787125us : schedule (-5 0) so you can zoom in on the real area of interest by searching for the timestamp. Hope this helps, Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
ok, i experimented around with the latency tracer, trying to capture the trace of a full suspend+resume cycle, and it needed the tracer fix below (GTOD clocksource suspend/resume would otherwise confuse the tracer and you'd get no trace output as a result). once that tracer bug was fixed, the best method to generate a trace was to do this: echo 1 > /proc/sys/kernel/stackframe_tracing echo 1 > /proc/sys/kernel/syscall_tracing ./trace-cmd bash -c "echo mem > /sys/power/state" > trace.txt Ingo --- kernel/latency_trace.c | 53 + 1 file changed, 36 insertions(+), 17 deletions(-) Index: linux/kernel/latency_trace.c === --- linux.orig/kernel/latency_trace.c +++ linux/kernel/latency_trace.c @@ -67,13 +67,6 @@ static unsigned long notrace cycles_to_u } #endif -static notrace inline cycle_t now(void) -{ - if (trace_use_raw_cycles) - return get_cycles(); - return get_monotonic_cycles(); -} - #ifndef irqs_off # define irqs_off()0 #endif @@ -237,6 +230,8 @@ struct cpu_trace { unsigned long critical_sequence; atomic_t underrun; atomic_t overrun; + cycle_t cycles; + cycle_t last_cycles; int early_warning; int latency_type; int cpu; @@ -263,6 +258,30 @@ static struct cpu_trace cpu_traces[NR_CP #endif } }; +static notrace cycle_t now(struct cpu_trace *tr, int monotonic) +{ + cycles_t now, delta, last = tr->last_cycles; + + if (trace_use_raw_cycles && !monotonic) + now = get_cycles(); + else + now = get_monotonic_cycles(); + + /* +* Protect against time warps: +*/ + if (unlikely(now < last)) + delta = 1; + else + delta = now - last; + + tr->last_cycles = now; + tr->cycles += delta; + + return tr->cycles; +} + + #ifdef CONFIG_EVENT_TRACE int trace_enabled = 0; @@ -615,7 +634,7 @@ trace(int cpu, enum trace_type type, again: idx = tr->trace_idx; idx_next = idx + 1; - timestamp = now(); + timestamp = now(tr, 0); if (unlikely((trace_freerunning || print_functions || atomic_read(>underrun)) && (idx_next >= MAX_TRACE) && !atomic_read(>overrun))) { @@ -1743,7 +1762,7 @@ check_critical_timing(int cpu, struct cp * as long as possible: */ T0 = tr->preempt_timestamp; - T1 = get_monotonic_cycles(); + T1 = now(tr, 1); delta = T1-T0; local_save_flags(flags); @@ -1757,7 +1776,7 @@ check_critical_timing(int cpu, struct cp * might change it (it can only get larger so the latency * is fair to be reported): */ - T2 = get_monotonic_cycles(); + T2 = now(tr, 1); delta = T2-T0; @@ -1807,7 +1826,7 @@ check_critical_timing(int cpu, struct cp printk(" => ended at timestamp %lu: ", t1); print_symbol("<%s>\n", tr->critical_end); dump_stack(); - t1 = cycles_to_usecs(get_monotonic_cycles()); + t1 = cycles_to_usecs(now(tr, 1)); printk(" => dump-end timestamp %lu\n\n", t1); #endif @@ -1817,7 +1836,7 @@ check_critical_timing(int cpu, struct cp out: tr->critical_sequence = max_sequence; - tr->preempt_timestamp = get_monotonic_cycles(); + tr->preempt_timestamp = now(tr, 1); tr->early_warning = 0; reset_trace_idx(cpu, tr); _trace_cmdline(cpu, tr); @@ -1866,7 +1885,7 @@ __start_critical_timing(unsigned long ei atomic_inc(>disabled); tr->critical_sequence = max_sequence; - tr->preempt_timestamp = get_monotonic_cycles(); + tr->preempt_timestamp = now(tr, 1); tr->critical_start = eip; reset_trace_idx(cpu, tr); tr->latency_type = latency_type; @@ -2150,7 +2169,7 @@ check_wakeup_timing(struct cpu_trace *tr goto out; T0 = tr->preempt_timestamp; - T1 = get_monotonic_cycles(); + T1 = now(tr, 1); /* * Any wraparound or time warp and we are out: */ @@ -2255,7 +2274,7 @@ void __trace_start_sched_wakeup(struct t // if (!atomic_read(>disabled)) { atomic_inc(>disabled); tr->critical_sequence = max_sequence; - tr->preempt_timestamp = get_monotonic_cycles(); + tr->preempt_timestamp = now(tr, 1); tr->latency_type = WAKEUP_LATENCY; tr->critical_start = CALLER_ADDR0; _trace_cmdline(raw_smp_processor_id(), tr); @@ -2366,7 +2385,7 @@ long user_trace_start(void) reset_trace_idx(cpu, tr); tr->critical_sequence = max_sequence; - tr->preempt_timestamp = get_monotonic_cycles(); + tr->preempt_timestamp = now(tr, 1); tr->critical_start = CALLER_ADDR0;
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Ingo Molnar [EMAIL PROTECTED] wrote: once that tracer bug was fixed, the best method to generate a trace was to do this: echo 1 /proc/sys/kernel/stackframe_tracing echo 1 /proc/sys/kernel/syscall_tracing ./trace-cmd bash -c echo mem /sys/power/state trace.txt so here's an UP suspend+resume trace i did: http://redhat.com/~mingo/latency-tracing-patches/misc/trace-suspend-long.txt.bz2 tons of detail - which might be interesting to other folks as well. Fact is, our suspend-to-RAM+resume cycle is very, very slow, even on fast hardware - and this trace shows all the reasons why. This was a fully cached system - i.e. i've done a suspend+resume before to warm up the caches. (not that suspend+resume does much IO normally.) The trace shows that a suspend+resume cycle is 7.95 seconds long (without counting the time the box spent suspended) - ouch! This was a T60 with Core2Duo 1.83GHz. For example here is where freezing starts: bash-2397 0 31686us : remove_wait_queue (vt_waitactive) bash-2397 0 31688us : freeze_processes (enter_state) bash-2397 0 31689us : printk (freeze_processes) here is where the ACPI code triggers the suspend: bash-2397 0D... 1904138us : acpi_hw_low_level_write (acpi_hw_register_write) but this is a whopping 1.9 seconds into the trace already! first sign of life after i opened the laptop lid again: bash-2397 0D... 1904138us : __restore_processor_state (restore_processor_state) bash-2397 0D... 1904138us : enable_sep_cpu (__restore_processor_state) (in the trace there's no delay visible - the period of time spent suspended is not visible to the tracer.) One good way to start looking at such traces is to filter out rescheduling events alone: grep ': schedule ' trace-suspend-long.txt that gives a rough outline of what's going on: idle-0 0D... 1776566us : schedule bash-2397 (0 20) bash-2397 0D... 1786748us : schedule idle-0 (20 0) scsi_eh_-419 0D... 1786814us : schedule bash-2397 (0 -5) bash-2397 0D... 1786960us : schedule scsi_eh_-419 (-5 0) scsi_eh_-421 0D... 1787020us : schedule bash-2397 (0 -5) bash-2397 0D... 1787125us : schedule scsi_eh_-421 (-5 0) so you can zoom in on the real area of interest by searching for the timestamp. Hope this helps, Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Fri, 2007-11-16 at 12:23 +0100, Ingo Molnar wrote: * Ingo Molnar [EMAIL PROTECTED] wrote: once that tracer bug was fixed, the best method to generate a trace was to do this: echo 1 /proc/sys/kernel/stackframe_tracing echo 1 /proc/sys/kernel/syscall_tracing ./trace-cmd bash -c echo mem /sys/power/state trace.txt so here's an UP suspend+resume trace i did: http://redhat.com/~mingo/latency-tracing-patches/misc/trace-suspend-long.txt.bz2 tons of detail - which might be interesting to other folks as well. Fact is, our suspend-to-RAM+resume cycle is very, very slow, even on fast hardware - and this trace shows all the reasons why. This was a fully cached system - i.e. i've done a suspend+resume before to warm up the caches. (not that suspend+resume does much IO normally.) The trace shows that a suspend+resume cycle is 7.95 seconds long (without counting the time the box spent suspended) - ouch! This was a T60 with Core2Duo 1.83GHz. Ouch? That's an order of magnitude faster than my 3GHz P4 :) -Mike - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Ingo Molnar [EMAIL PROTECTED] wrote: so here's an UP suspend+resume trace i did: http://redhat.com/~mingo/latency-tracing-patches/misc/trace-suspend-long.txt.bz2 tons of detail - which might be interesting to other folks as well. Fact is, our suspend-to-RAM+resume cycle is very, very slow, even on fast hardware - and this trace shows all the reasons why. This was a fully cached system - i.e. i've done a suspend+resume before to warm up the caches. (not that suspend+resume does much IO normally.) The trace shows that a suspend+resume cycle is 7.95 seconds long (without counting the time the box spent suspended) - ouch! This was a T60 with Core2Duo 1.83GHz. and the amount of time spent executing on the CPU was only 70 msecs! So we spent 99% of that 7.9 seconds with just waiting around. Here are the top 10 sleep reasons: 864 schedule()-schedule_timeout()-ps2_sendbyte()-ps2_command() 183 schedule()-vt_waitactive()-vt_ioctl()-tty_ioctl() 164 schedule()-schedule_timeout()-acpi_ec_wait()-acpi_ec_transaction() 157 schedule()-refrigerator()-get_signal_to_deliver()-do_notify_resume() 118 schedule()-worker_thread()-kthread()-kernel_thread_helper() 80 schedule()-do_msleep()-msleep()-sata_link_debounce() 64 schedule()-schedule_timeout()-inet_csk_accept()-inet_accept() 37 schedule()-__mutex_lock_slowpath()-mutex_lock()-acpi_ec_transaction() 20 schedule()-schedule_timeout()-do_select()-core_sys_select() 20 schedule()-io_schedule()-sync_buffer()-__wait_on_bit() what's weird are all those ps2 related sleeps - they make up for much of the delay. This how such a sleep point looks like: bash 3500 0D... 8641415us : schedule()-schedule_timeout()-ps2_sendbyte()-ps2_command() bash 3500 0D... 8641417us : psmouse_sliced_command()-synaptics_pt_write()-ps2_sendbyte()-ps2_command() it starts somewhere here: bash 3500 0 5302376us : serio_reconnect_driver (serio_resume) and ends: kseriod 208 0D... 9182560us : synaptics_query_hardware()-synaptics_reconnect()-psmouse_reconnect()-serio_reconnect_driver() so this section (serio_resume()) took almost 4 seconds. the main delay seems to be dpm_resume(): bash 3500 0.N.. 3061040us : dpm_resume (device_resume) ... bash 3500 0 9105017us : mutex_unlock (dpm_resume) bash 3500 0 9105018us : mutex_unlock (device_resume) 6.1 seconds! Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
ok, i experimented around with the latency tracer, trying to capture the trace of a full suspend+resume cycle, and it needed the tracer fix below (GTOD clocksource suspend/resume would otherwise confuse the tracer and you'd get no trace output as a result). once that tracer bug was fixed, the best method to generate a trace was to do this: echo 1 /proc/sys/kernel/stackframe_tracing echo 1 /proc/sys/kernel/syscall_tracing ./trace-cmd bash -c echo mem /sys/power/state trace.txt Ingo --- kernel/latency_trace.c | 53 + 1 file changed, 36 insertions(+), 17 deletions(-) Index: linux/kernel/latency_trace.c === --- linux.orig/kernel/latency_trace.c +++ linux/kernel/latency_trace.c @@ -67,13 +67,6 @@ static unsigned long notrace cycles_to_u } #endif -static notrace inline cycle_t now(void) -{ - if (trace_use_raw_cycles) - return get_cycles(); - return get_monotonic_cycles(); -} - #ifndef irqs_off # define irqs_off()0 #endif @@ -237,6 +230,8 @@ struct cpu_trace { unsigned long critical_sequence; atomic_t underrun; atomic_t overrun; + cycle_t cycles; + cycle_t last_cycles; int early_warning; int latency_type; int cpu; @@ -263,6 +258,30 @@ static struct cpu_trace cpu_traces[NR_CP #endif } }; +static notrace cycle_t now(struct cpu_trace *tr, int monotonic) +{ + cycles_t now, delta, last = tr-last_cycles; + + if (trace_use_raw_cycles !monotonic) + now = get_cycles(); + else + now = get_monotonic_cycles(); + + /* +* Protect against time warps: +*/ + if (unlikely(now last)) + delta = 1; + else + delta = now - last; + + tr-last_cycles = now; + tr-cycles += delta; + + return tr-cycles; +} + + #ifdef CONFIG_EVENT_TRACE int trace_enabled = 0; @@ -615,7 +634,7 @@ trace(int cpu, enum trace_type type, again: idx = tr-trace_idx; idx_next = idx + 1; - timestamp = now(); + timestamp = now(tr, 0); if (unlikely((trace_freerunning || print_functions || atomic_read(tr-underrun)) (idx_next = MAX_TRACE) !atomic_read(tr-overrun))) { @@ -1743,7 +1762,7 @@ check_critical_timing(int cpu, struct cp * as long as possible: */ T0 = tr-preempt_timestamp; - T1 = get_monotonic_cycles(); + T1 = now(tr, 1); delta = T1-T0; local_save_flags(flags); @@ -1757,7 +1776,7 @@ check_critical_timing(int cpu, struct cp * might change it (it can only get larger so the latency * is fair to be reported): */ - T2 = get_monotonic_cycles(); + T2 = now(tr, 1); delta = T2-T0; @@ -1807,7 +1826,7 @@ check_critical_timing(int cpu, struct cp printk( = ended at timestamp %lu: , t1); print_symbol(%s\n, tr-critical_end); dump_stack(); - t1 = cycles_to_usecs(get_monotonic_cycles()); + t1 = cycles_to_usecs(now(tr, 1)); printk( = dump-end timestamp %lu\n\n, t1); #endif @@ -1817,7 +1836,7 @@ check_critical_timing(int cpu, struct cp out: tr-critical_sequence = max_sequence; - tr-preempt_timestamp = get_monotonic_cycles(); + tr-preempt_timestamp = now(tr, 1); tr-early_warning = 0; reset_trace_idx(cpu, tr); _trace_cmdline(cpu, tr); @@ -1866,7 +1885,7 @@ __start_critical_timing(unsigned long ei atomic_inc(tr-disabled); tr-critical_sequence = max_sequence; - tr-preempt_timestamp = get_monotonic_cycles(); + tr-preempt_timestamp = now(tr, 1); tr-critical_start = eip; reset_trace_idx(cpu, tr); tr-latency_type = latency_type; @@ -2150,7 +2169,7 @@ check_wakeup_timing(struct cpu_trace *tr goto out; T0 = tr-preempt_timestamp; - T1 = get_monotonic_cycles(); + T1 = now(tr, 1); /* * Any wraparound or time warp and we are out: */ @@ -2255,7 +2274,7 @@ void __trace_start_sched_wakeup(struct t // if (!atomic_read(tr-disabled)) { atomic_inc(tr-disabled); tr-critical_sequence = max_sequence; - tr-preempt_timestamp = get_monotonic_cycles(); + tr-preempt_timestamp = now(tr, 1); tr-latency_type = WAKEUP_LATENCY; tr-critical_start = CALLER_ADDR0; _trace_cmdline(raw_smp_processor_id(), tr); @@ -2366,7 +2385,7 @@ long user_trace_start(void) reset_trace_idx(cpu, tr); tr-critical_sequence = max_sequence; - tr-preempt_timestamp = get_monotonic_cycles(); + tr-preempt_timestamp = now(tr, 1); tr-critical_start = CALLER_ADDR0; _trace_cmdline(cpu, tr); mcount(); @@ -2425,7
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Ingo Molnar wrote: * Ingo Molnar [EMAIL PROTECTED] wrote: Is there a version of these that works with 2.6.23.1 ? yes, i've backported it and have uploaded the v2.6.23 version to: http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.23.1-combo.patch .. ok, i experimented around with the latency tracer, trying to capture the trace of a full suspend+resume cycle, and it needed the tracer fix below (GTOD clocksource suspend/resume would otherwise confuse the tracer and you'd get no trace output as a result). once that tracer bug was fixed, the best method to generate a trace was to do this: echo 1 /proc/sys/kernel/stackframe_tracing echo 1 /proc/sys/kernel/syscall_tracing ./trace-cmd bash -c echo mem /sys/power/state trace.txt .. I've applied the tracer for 2.6.23.1 patch, plus the bugfix, and now I just get blinking-LEDs (black screen of death) on resume. Cheers - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Mark Lord wrote: Ingo Molnar wrote: * Ingo Molnar [EMAIL PROTECTED] wrote: pick up the latest latency tracer patch from: sorry, wrong URLs, the correct links are: http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c .. make trace-cmd cc -Wall -O2 -strace-cmd.c -o trace-cmd trace-cmd.c: In function ‘main’: trace-cmd.c:65: warning: label ‘usage’ defined but not used - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > > Is there a version of these that works with 2.6.23.1 ? > > yes, i've backported it and have uploaded the v2.6.23 version to: > > > http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.23.1-combo.patch btw., if the trace is too large and the interesting section of suspend does not fit into it then you can narrow it down to the most important events only by changing this in trace-cmd.c: system("echo 1 > /proc/sys/kernel/mcount_enabled"); to: system("echo 0 > /proc/sys/kernel/mcount_enabled"); that way we'll still trace IRQs and scheduling events, which is enough to see roughly where the delay is happening. To include symbolic backtraces in the trace, do this: echo 1 > /proc/sys/kernel/stackframe_tracing to get such trace entries: ls 3688 1D..3 4642us+: deactivate_task (0 0) ls 3688 1D..3 4644us+: schedule()<-do_exit()<-sys_exit_group()<-sys_exit_group() ls 3688 1D..3 4647us+: sysenter_past_esp()<-( -1)()<-(0)()<-( 0)() 0 1D..2 4652us : schedule (0 20) this way we can see why a task goes to sleep and which exact kernel codepath is waking it up. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Mark Lord <[EMAIL PROTECTED]> wrote: >> sorry, wrong URLs, the correct links are: >> >> >> http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch >>http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c > .. > > Is there a version of these that works with 2.6.23.1 ? yes, i've backported it and have uploaded the v2.6.23 version to: http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.23.1-combo.patch Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Ingo Molnar wrote: * Ingo Molnar <[EMAIL PROTECTED]> wrote: pick up the latest latency tracer patch from: sorry, wrong URLs, the correct links are: http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c .. Is there a version of these that works with 2.6.23.1 ? I'm not using 2.6.24-* on this machine because: (a) behaviour may be different, and (b) something broke VMware compatibility again. Thanks - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Thursday, 15 of November 2007, Mark Lord wrote: > > Hi! > > > >> > I have been reporting this off and on since 2.6.23 was released. > >> > This problem was not apparent up to perhaps 2.6.23-rc8, > >> > but definitely became common in 2.6.23 and 2.6.23.1. > >> > > >> > Most of the time, a resume-from-RAM on my notebook takes about 2.1 > >> > seconds > >> > of kernel time to complete. > >> > > >> > Once in a while, it takes *much* longer, in the 14-20 second range. > >> > These long events *seem* to be mostly after the notebook has been > >> > in suspend for a longish time, but there's really nothing consistent > >> > here. > >> > > >> > So git-bisect isn't going to work for this one. I recently rebuilt the > >> > kernel > >> > to include printk timestamps, and then it went 2 days without the issue > >> > happening, > >> > until this morning (after an overnight suspend) finally. > >> > > >> > The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ > >> > w/3GB > >> > DDR2. > >> > PCIe express chipset, ATI graphics, SATA hard drive. > >> > > >> > 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and > >> > 945GT > >> > Express Memory Controller Hub (rev 03) > >> > 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and > >> > 945GT > >> > Express PCI Express Root Port (rev 03) > >> > 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High > >> > Definition Audio Controller (rev 01) > >> > 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express > >> > Port > >> > 1 (rev 01) > >> > 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express > >> > Port > >> > 2 (rev 01) > >> > 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express > >> > Port > >> > 4 (rev 01) > >> > 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI > >> > #1 > >> > (rev 01) > >> > 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI > >> > #2 > >> > (rev 01) > >> > 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI > >> > #3 > >> > (rev 01) > >> > 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI > >> > #4 > >> > (rev 01) > >> > 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI > >> > Controller (rev 01) > >> > 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1) > >> > 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface > >> > Bridge (rev 01) > >> > 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) > >> > Serial > >> > ATA Storage Controller IDE (rev 01) > >> > 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller > >> > (rev > >> > 01) > >> > 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility > >> > X1400 > >> > 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX > >> > (rev 02) > >> > 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832 > >> > 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 > >> > SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19) > >> > 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01) > >> > 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host > >> > Adapter (rev 0a) > >> > 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev > >> > 05) > >> > 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG > >> > Network > >> > Connection (rev 02) > >> > > >> > There's also a USB2 hub connected, with the Logitech mouse being the > >> > only thing plugged into it at present. > >> > > >> > The best clue I have of what is happening, other than that it was first > >> > seen during the late 2.6.23-rc* series, are the following two sets of > >> > kernel logs. The first set is from a "normal" fast wake-up, > >> > and the second set is from the "slow" wake-up seen this morning. > >> > > >> > Both logs show the same information, in pretty much the same order. > >> > The BIG difference is a bunch of unexplained pauses of exactly 1-second > >> > each that occur during the slow wakeup. > >> > > >> > The only other difference is that the SATA disk messages are placed > >> > differently within the two logs. I'm thinking on that one but it > >> > doesn't look significant to me. > >> > > >> > The real question is, why the 1-sec pauses? > > > > Can you try nohz=off highres=off? Strange stuff is happening with > > nohz. > .. > > (added Ingo to CC: list: maybe this is some weird interaction with CFS > and jiffies being reset to 0 on resume ??) > > I can try it, but it won't help debug the problem much. Well, if you could verify that it doesn't happen at all with NO_HZ unset, that would be a valuable data point, IMO. > Remember, this happens very inconsistently, maybe 3-4 times a day, > or not at all for 3-4 days. > > But if somebody has a specific bug-fix patch that could explain this, > then I'll
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > pick up the latest latency tracer patch from: sorry, wrong URLs, the correct links are: http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Mark Lord <[EMAIL PROTECTED]> wrote: >> Can you try nohz=off highres=off? Strange stuff is happening with >> nohz. > > (added Ingo to CC: list: maybe this is some weird interaction with CFS > and jiffies being reset to 0 on resume ??) hm, CFS should have no impact here. To see what's happening you could try to use the latency tracer of the -rt patch and do a cross-resume trace. pick up the latest latency tracer patch from: http://redhat.com/~mingo/private/latency-tracer-v2.6.24-rc2-git5-combo.patch apply it and enable CONFIG_FUNCTION_TRACING, then pick up trace-cmd.c: http://redhat.com/~mingo/private/trace-cmd.c and do something like: ./trace-cmd pm-suspend > trace.txt or: ./trace-cmd /bin/bash -c "echo ram > /sys/power/state" > trace.txt this should trigger suspend - then you should do the resume. If everything goes well then trace.txt should contain a pretty large trace of all the stuff we do during a suspend+resume. and wait for such a pause and send us the resulting trace.txt. if it's an SMP box then first do: echo 1 > /proc/sys/kernel/trace_all_cpus to get a global trace. Let me know if something doesnt work with this scheme. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Hi! > I have been reporting this off and on since 2.6.23 was released. > This problem was not apparent up to perhaps 2.6.23-rc8, > but definitely became common in 2.6.23 and 2.6.23.1. > > Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds > of kernel time to complete. > > Once in a while, it takes *much* longer, in the 14-20 second range. > These long events *seem* to be mostly after the notebook has been > in suspend for a longish time, but there's really nothing consistent here. > > So git-bisect isn't going to work for this one. I recently rebuilt the > kernel > to include printk timestamps, and then it went 2 days without the issue > happening, > until this morning (after an overnight suspend) finally. > > The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB > DDR2. > PCIe express chipset, ATI graphics, SATA hard drive. > > 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT > Express Memory Controller Hub (rev 03) > 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT > Express PCI Express Root Port (rev 03) > 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High > Definition Audio Controller (rev 01) > 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port > 1 (rev 01) > 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port > 2 (rev 01) > 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port > 4 (rev 01) > 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 > (rev 01) > 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 > (rev 01) > 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 > (rev 01) > 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 > (rev 01) > 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI > Controller (rev 01) > 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1) > 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface > Bridge (rev 01) > 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial > ATA Storage Controller IDE (rev 01) > 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev > 01) > 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility > X1400 > 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX > (rev 02) > 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832 > 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 > SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19) > 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01) > 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host > Adapter (rev 0a) > 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05) > 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network > Connection (rev 02) > > There's also a USB2 hub connected, with the Logitech mouse being the > only thing plugged into it at present. > > The best clue I have of what is happening, other than that it was first > seen during the late 2.6.23-rc* series, are the following two sets of > kernel logs. The first set is from a "normal" fast wake-up, > and the second set is from the "slow" wake-up seen this morning. > > Both logs show the same information, in pretty much the same order. > The BIG difference is a bunch of unexplained pauses of exactly 1-second > each that occur during the slow wakeup. > > The only other difference is that the SATA disk messages are placed > differently within the two logs. I'm thinking on that one but it > doesn't look significant to me. > > The real question is, why the 1-sec pauses? Can you try nohz=off highres=off? Strange stuff is happening with nohz. .. (added Ingo to CC: list: maybe this is some weird interaction with CFS and jiffies being reset to 0 on resume ??) I can try it, but it won't help debug the problem much. Remember, this happens very inconsistently, maybe 3-4 times a day, or not at all for 3-4 days. But if somebody has a specific bug-fix patch that could explain this, then I'll happily apply it here. Cheers - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Hi! > I have been reporting this off and on since 2.6.23 was released. > This problem was not apparent up to perhaps 2.6.23-rc8, > but definitely became common in 2.6.23 and 2.6.23.1. > > Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds > of kernel time to complete. > > Once in a while, it takes *much* longer, in the 14-20 second range. > These long events *seem* to be mostly after the notebook has been > in suspend for a longish time, but there's really nothing consistent here. > > So git-bisect isn't going to work for this one. I recently rebuilt the > kernel > to include printk timestamps, and then it went 2 days without the issue > happening, > until this morning (after an overnight suspend) finally. > > The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB > DDR2. > PCIe express chipset, ATI graphics, SATA hard drive. > > 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT > Express Memory Controller Hub (rev 03) > 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT > Express PCI Express Root Port (rev 03) > 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High > Definition Audio Controller (rev 01) > 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port > 1 (rev 01) > 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port > 2 (rev 01) > 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port > 4 (rev 01) > 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 > (rev 01) > 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 > (rev 01) > 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 > (rev 01) > 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 > (rev 01) > 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI > Controller (rev 01) > 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1) > 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface > Bridge (rev 01) > 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial > ATA Storage Controller IDE (rev 01) > 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev > 01) > 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility > X1400 > 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX > (rev 02) > 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832 > 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 > SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19) > 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01) > 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host > Adapter (rev 0a) > 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05) > 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network > Connection (rev 02) > > There's also a USB2 hub connected, with the Logitech mouse being the > only thing plugged into it at present. > > The best clue I have of what is happening, other than that it was first > seen during the late 2.6.23-rc* series, are the following two sets of > kernel logs. The first set is from a "normal" fast wake-up, > and the second set is from the "slow" wake-up seen this morning. > > Both logs show the same information, in pretty much the same order. > The BIG difference is a bunch of unexplained pauses of exactly 1-second > each that occur during the slow wakeup. > > The only other difference is that the SATA disk messages are placed > differently within the two logs. I'm thinking on that one but it > doesn't look significant to me. > > The real question is, why the 1-sec pauses? Can you try nohz=off highres=off? Strange stuff is happening with nohz. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Mark Lord wrote: Ray Lee wrote: .. The real question is, why the 1-sec pauses? Well, and why the 1-second pauses eventually stop, too. Seems interesting that they don't continue. Also, they're pretty much dead-on one- and two-second pauses, with HZ accuracy. Is this with a NO_HZ kernel? .. Yes, A NO_HZ kernel. Full config follows: .. Blah.. that was for the wrong kernel. Here is the correct .config that was in use for both suspend logs: # # Automatically generated make config: don't edit # Linux kernel version: 2.6.23.1 # Wed Nov 14 09:41:54 2007 # CONFIG_X86_32=y CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_X86=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_QUICKLIST=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_DMI=y CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" # # General setup # CONFIG_EXPERIMENTAL=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # CONFIG_USER_NS is not set # CONFIG_AUDIT is not set CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=16 # CONFIG_CPUSETS is not set CONFIG_SYSFS_DEPRECATED=y # CONFIG_RELAY is not set CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="" CONFIG_CC_OPTIMIZE_FOR_SIZE=y CONFIG_SYSCTL=y CONFIG_EMBEDDED=y CONFIG_UID16=y CONFIG_SYSCTL_SYSCALL=y CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_ALL is not set # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_ANON_INODES=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_SLAB=y # CONFIG_SLUB is not set # CONFIG_SLOB is not set CONFIG_RT_MUTEXES=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y CONFIG_MODVERSIONS=y CONFIG_MODULE_SRCVERSION_ALL=y CONFIG_KMOD=y CONFIG_STOP_MACHINE=y CONFIG_BLOCK=y # CONFIG_LBD is not set # CONFIG_BLK_DEV_IO_TRACE is not set # CONFIG_LSF is not set CONFIG_BLK_DEV_BSG=y # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y # CONFIG_IOSCHED_DEADLINE is not set CONFIG_IOSCHED_CFQ=y # CONFIG_DEFAULT_AS is not set # CONFIG_DEFAULT_DEADLINE is not set CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED="cfq" # # Processor type and features # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y CONFIG_SMP=y CONFIG_X86_PC=y # CONFIG_X86_ELAN is not set # CONFIG_X86_VOYAGER is not set # CONFIG_X86_NUMAQ is not set # CONFIG_X86_SUMMIT is not set # CONFIG_X86_BIGSMP is not set # CONFIG_X86_VISWS is not set # CONFIG_X86_GENERICARCH is not set # CONFIG_X86_ES7000 is not set # CONFIG_PARAVIRT is not set # CONFIG_M386 is not set # CONFIG_M486 is not set # CONFIG_M586 is not set # CONFIG_M586TSC is not set # CONFIG_M586MMX is not set # CONFIG_M686 is not set # CONFIG_MPENTIUMII is not set # CONFIG_MPENTIUMIII is not set # CONFIG_MPENTIUMM is not set CONFIG_MCORE2=y # CONFIG_MPENTIUM4 is not set # CONFIG_MK6 is not set # CONFIG_MK7 is not set # CONFIG_MK8 is not set # CONFIG_MCRUSOE is not set # CONFIG_MEFFICEON is not set # CONFIG_MWINCHIPC6 is not set # CONFIG_MWINCHIP2 is not set # CONFIG_MWINCHIP3D is not set # CONFIG_MGEODEGX1 is not set # CONFIG_MGEODE_LX is not set # CONFIG_MCYRIXIII is not set # CONFIG_MVIAC3_2 is not set # CONFIG_MVIAC7 is not set # CONFIG_X86_GENERIC is not set CONFIG_X86_CMPXCHG=y CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_X86_XADD=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y # CONFIG_ARCH_HAS_ILOG2_U32 is not set # CONFIG_ARCH_HAS_ILOG2_U64 is not set CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y CONFIG_X86_BSWAP=y CONFIG_X86_POPAD_OK=y CONFIG_X86_GOOD_APIC=y CONFIG_X86_INTEL_USERCOPY=y CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_X86_TSC=y CONFIG_X86_MINIMUM_CPU_FAMILY=4 CONFIG_HPET_TIMER=y CONFIG_NR_CPUS=2 # CONFIG_SCHED_SMT is not set CONFIG_SCHED_MC=y # CONFIG_PREEMPT_NONE is not set # CONFIG_PREEMPT_VOLUNTARY is not set CONFIG_PREEMPT=y CONFIG_PREEMPT_BKL=y CONFIG_X86_LOCAL_APIC=y CONFIG_X86_IO_APIC=y # CONFIG_X86_MCE is not set CONFIG_VM86=y # CONFIG_TOSHIBA is not set CONFIG_I8K=m CONFIG_X86_REBOOTFIXUPS=y CONFIG_MICROCODE=m CONFIG_MICROCODE_OLD_INTERFACE=y CONFIG_X86_MSR=m CONFIG_X86_CPUID=m # # Firmware Drivers # CONFIG_EDD=m CONFIG_DELL_RBU=m CONFIG_DCDBAS=m CONFIG_DMIID=y # CONFIG_NOHIGHMEM is not set CONFIG_HIGHMEM4G=y # CONFIG_HIGHMEM64G is not set CONFIG_VMSPLIT_3G=y # CONFIG_VMSPLIT_3G_OPT is not set # CONFIG_VMSPLIT_2G is not set # CONFIG_VMSPLIT_2G_OPT is not set # CONFIG_VMSPLIT_1G is not set
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Ray Lee wrote: On Nov 15, 2007 8:32 AM, Mark Lord <[EMAIL PROTECTED]> wrote: I have been reporting this off and on since 2.6.23 was released. This problem was not apparent up to perhaps 2.6.23-rc8, but definitely became common in 2.6.23 and 2.6.23.1. Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds of kernel time to complete. Once in a while, it takes *much* longer, in the 14-20 second range. These long events *seem* to be mostly after the notebook has been in suspend for a longish time, but there's really nothing consistent here. So git-bisect isn't going to work for this one. I recently rebuilt the kernel to include printk timestamps, and then it went 2 days without the issue happening, until this morning (after an overnight suspend) finally. The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB DDR2. PCIe express chipset, ATI graphics, SATA hard drive. 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express Memory Controller Hub (rev 03) 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express PCI Express Root Port (rev 03) 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01) 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01) 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 01) 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01) 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01) 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01) 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01) 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01) 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01) 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1) 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 01) 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial ATA Storage Controller IDE (rev 01) 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01) 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility X1400 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02) 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19) 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01) 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 0a) 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05) 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02) There's also a USB2 hub connected, with the Logitech mouse being the only thing plugged into it at present. The best clue I have of what is happening, other than that it was first seen during the late 2.6.23-rc* series, are the following two sets of kernel logs. The first set is from a "normal" fast wake-up, and the second set is from the "slow" wake-up seen this morning. Both logs show the same information, in pretty much the same order. The BIG difference is a bunch of unexplained pauses of exactly 1-second each that occur during the slow wakeup. The only other difference is that the SATA disk messages are placed differently within the two logs. I'm thinking on that one but it doesn't look significant to me. The real question is, why the 1-sec pauses? Well, and why the 1-second pauses eventually stop, too. Seems interesting that they don't continue. Also, they're pretty much dead-on one- and two-second pauses, with HZ accuracy. Is this with a NO_HZ kernel? .. Yes, A NO_HZ kernel. Full config follows: # # Automatically generated make config: don't edit # Linux kernel version: 2.6.24-rc2-git4 # Wed Nov 14 14:19:45 2007 # CONFIG_X86_32=y CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_X86=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_QUICKLIST=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_DMI=y CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" # # General setup # CONFIG_EXPERIMENTAL=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # CONFIG_USER_NS is not set # CONFIG_AUDIT is not set CONFIG_IKCONFIG=y
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Nov 15, 2007 8:32 AM, Mark Lord <[EMAIL PROTECTED]> wrote: > I have been reporting this off and on since 2.6.23 was released. > This problem was not apparent up to perhaps 2.6.23-rc8, > but definitely became common in 2.6.23 and 2.6.23.1. > > Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds > of kernel time to complete. > > Once in a while, it takes *much* longer, in the 14-20 second range. > These long events *seem* to be mostly after the notebook has been > in suspend for a longish time, but there's really nothing consistent here. > > So git-bisect isn't going to work for this one. I recently rebuilt the > kernel > to include printk timestamps, and then it went 2 days without the issue > happening, > until this morning (after an overnight suspend) finally. > > The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB > DDR2. > PCIe express chipset, ATI graphics, SATA hard drive. > > 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT > Express Memory Controller Hub (rev 03) > 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT > Express PCI Express Root Port (rev 03) > 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition > Audio Controller (rev 01) > 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 > (rev 01) > 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 > (rev 01) > 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 > (rev 01) > 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 > (rev 01) > 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 > (rev 01) > 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 > (rev 01) > 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 > (rev 01) > 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI > Controller (rev 01) > 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1) > 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge > (rev 01) > 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial > ATA Storage Controller IDE (rev 01) > 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev > 01) > 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility X1400 > 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev > 02) > 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832 > 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 > SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19) > 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01) > 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter > (rev 0a) > 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05) > 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network > Connection (rev 02) > > There's also a USB2 hub connected, with the Logitech mouse being the > only thing plugged into it at present. > > The best clue I have of what is happening, other than that it was first > seen during the late 2.6.23-rc* series, are the following two sets of > kernel logs. The first set is from a "normal" fast wake-up, > and the second set is from the "slow" wake-up seen this morning. > > Both logs show the same information, in pretty much the same order. > The BIG difference is a bunch of unexplained pauses of exactly 1-second > each that occur during the slow wakeup. > > The only other difference is that the SATA disk messages are placed > differently within the two logs. I'm thinking on that one but it > doesn't look significant to me. > > The real question is, why the 1-sec pauses? Well, and why the 1-second pauses eventually stop, too. Seems interesting that they don't continue. Also, they're pretty much dead-on one- and two-second pauses, with HZ accuracy. Is this with a NO_HZ kernel? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Hi! I have been reporting this off and on since 2.6.23 was released. This problem was not apparent up to perhaps 2.6.23-rc8, but definitely became common in 2.6.23 and 2.6.23.1. Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds of kernel time to complete. Once in a while, it takes *much* longer, in the 14-20 second range. These long events *seem* to be mostly after the notebook has been in suspend for a longish time, but there's really nothing consistent here. So git-bisect isn't going to work for this one. I recently rebuilt the kernel to include printk timestamps, and then it went 2 days without the issue happening, until this morning (after an overnight suspend) finally. The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB DDR2. PCIe express chipset, ATI graphics, SATA hard drive. 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express Memory Controller Hub (rev 03) 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express PCI Express Root Port (rev 03) 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01) 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01) 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 01) 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01) 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01) 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01) 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01) 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01) 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01) 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1) 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 01) 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial ATA Storage Controller IDE (rev 01) 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01) 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility X1400 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02) 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19) 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01) 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 0a) 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05) 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02) There's also a USB2 hub connected, with the Logitech mouse being the only thing plugged into it at present. The best clue I have of what is happening, other than that it was first seen during the late 2.6.23-rc* series, are the following two sets of kernel logs. The first set is from a normal fast wake-up, and the second set is from the slow wake-up seen this morning. Both logs show the same information, in pretty much the same order. The BIG difference is a bunch of unexplained pauses of exactly 1-second each that occur during the slow wakeup. The only other difference is that the SATA disk messages are placed differently within the two logs. I'm thinking on that one but it doesn't look significant to me. The real question is, why the 1-sec pauses? Can you try nohz=off highres=off? Strange stuff is happening with nohz. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Nov 15, 2007 8:32 AM, Mark Lord [EMAIL PROTECTED] wrote: I have been reporting this off and on since 2.6.23 was released. This problem was not apparent up to perhaps 2.6.23-rc8, but definitely became common in 2.6.23 and 2.6.23.1. Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds of kernel time to complete. Once in a while, it takes *much* longer, in the 14-20 second range. These long events *seem* to be mostly after the notebook has been in suspend for a longish time, but there's really nothing consistent here. So git-bisect isn't going to work for this one. I recently rebuilt the kernel to include printk timestamps, and then it went 2 days without the issue happening, until this morning (after an overnight suspend) finally. The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB DDR2. PCIe express chipset, ATI graphics, SATA hard drive. 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express Memory Controller Hub (rev 03) 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express PCI Express Root Port (rev 03) 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01) 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01) 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 01) 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01) 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01) 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01) 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01) 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01) 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01) 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1) 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 01) 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial ATA Storage Controller IDE (rev 01) 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01) 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility X1400 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02) 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19) 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01) 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 0a) 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05) 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02) There's also a USB2 hub connected, with the Logitech mouse being the only thing plugged into it at present. The best clue I have of what is happening, other than that it was first seen during the late 2.6.23-rc* series, are the following two sets of kernel logs. The first set is from a normal fast wake-up, and the second set is from the slow wake-up seen this morning. Both logs show the same information, in pretty much the same order. The BIG difference is a bunch of unexplained pauses of exactly 1-second each that occur during the slow wakeup. The only other difference is that the SATA disk messages are placed differently within the two logs. I'm thinking on that one but it doesn't look significant to me. The real question is, why the 1-sec pauses? Well, and why the 1-second pauses eventually stop, too. Seems interesting that they don't continue. Also, they're pretty much dead-on one- and two-second pauses, with HZ accuracy. Is this with a NO_HZ kernel? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Ray Lee wrote: On Nov 15, 2007 8:32 AM, Mark Lord [EMAIL PROTECTED] wrote: I have been reporting this off and on since 2.6.23 was released. This problem was not apparent up to perhaps 2.6.23-rc8, but definitely became common in 2.6.23 and 2.6.23.1. Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds of kernel time to complete. Once in a while, it takes *much* longer, in the 14-20 second range. These long events *seem* to be mostly after the notebook has been in suspend for a longish time, but there's really nothing consistent here. So git-bisect isn't going to work for this one. I recently rebuilt the kernel to include printk timestamps, and then it went 2 days without the issue happening, until this morning (after an overnight suspend) finally. The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB DDR2. PCIe express chipset, ATI graphics, SATA hard drive. 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express Memory Controller Hub (rev 03) 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express PCI Express Root Port (rev 03) 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01) 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01) 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 01) 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01) 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01) 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01) 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01) 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01) 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01) 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1) 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 01) 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial ATA Storage Controller IDE (rev 01) 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01) 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility X1400 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02) 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19) 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01) 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 0a) 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05) 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02) There's also a USB2 hub connected, with the Logitech mouse being the only thing plugged into it at present. The best clue I have of what is happening, other than that it was first seen during the late 2.6.23-rc* series, are the following two sets of kernel logs. The first set is from a normal fast wake-up, and the second set is from the slow wake-up seen this morning. Both logs show the same information, in pretty much the same order. The BIG difference is a bunch of unexplained pauses of exactly 1-second each that occur during the slow wakeup. The only other difference is that the SATA disk messages are placed differently within the two logs. I'm thinking on that one but it doesn't look significant to me. The real question is, why the 1-sec pauses? Well, and why the 1-second pauses eventually stop, too. Seems interesting that they don't continue. Also, they're pretty much dead-on one- and two-second pauses, with HZ accuracy. Is this with a NO_HZ kernel? .. Yes, A NO_HZ kernel. Full config follows: # # Automatically generated make config: don't edit # Linux kernel version: 2.6.24-rc2-git4 # Wed Nov 14 14:19:45 2007 # CONFIG_X86_32=y CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_X86=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_QUICKLIST=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_DMI=y CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config # # General setup # CONFIG_EXPERIMENTAL=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_LOCALVERSION= # CONFIG_LOCALVERSION_AUTO is not set CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # CONFIG_USER_NS is not set # CONFIG_AUDIT is not set CONFIG_IKCONFIG=y
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Mark Lord wrote: Ray Lee wrote: .. The real question is, why the 1-sec pauses? Well, and why the 1-second pauses eventually stop, too. Seems interesting that they don't continue. Also, they're pretty much dead-on one- and two-second pauses, with HZ accuracy. Is this with a NO_HZ kernel? .. Yes, A NO_HZ kernel. Full config follows: .. Blah.. that was for the wrong kernel. Here is the correct .config that was in use for both suspend logs: # # Automatically generated make config: don't edit # Linux kernel version: 2.6.23.1 # Wed Nov 14 09:41:54 2007 # CONFIG_X86_32=y CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_X86=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_QUICKLIST=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_DMI=y CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config # # General setup # CONFIG_EXPERIMENTAL=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_LOCALVERSION= # CONFIG_LOCALVERSION_AUTO is not set CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # CONFIG_USER_NS is not set # CONFIG_AUDIT is not set CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=16 # CONFIG_CPUSETS is not set CONFIG_SYSFS_DEPRECATED=y # CONFIG_RELAY is not set CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE= CONFIG_CC_OPTIMIZE_FOR_SIZE=y CONFIG_SYSCTL=y CONFIG_EMBEDDED=y CONFIG_UID16=y CONFIG_SYSCTL_SYSCALL=y CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_ALL is not set # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_ANON_INODES=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_SLAB=y # CONFIG_SLUB is not set # CONFIG_SLOB is not set CONFIG_RT_MUTEXES=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y CONFIG_MODVERSIONS=y CONFIG_MODULE_SRCVERSION_ALL=y CONFIG_KMOD=y CONFIG_STOP_MACHINE=y CONFIG_BLOCK=y # CONFIG_LBD is not set # CONFIG_BLK_DEV_IO_TRACE is not set # CONFIG_LSF is not set CONFIG_BLK_DEV_BSG=y # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y # CONFIG_IOSCHED_DEADLINE is not set CONFIG_IOSCHED_CFQ=y # CONFIG_DEFAULT_AS is not set # CONFIG_DEFAULT_DEADLINE is not set CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED=cfq # # Processor type and features # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y CONFIG_SMP=y CONFIG_X86_PC=y # CONFIG_X86_ELAN is not set # CONFIG_X86_VOYAGER is not set # CONFIG_X86_NUMAQ is not set # CONFIG_X86_SUMMIT is not set # CONFIG_X86_BIGSMP is not set # CONFIG_X86_VISWS is not set # CONFIG_X86_GENERICARCH is not set # CONFIG_X86_ES7000 is not set # CONFIG_PARAVIRT is not set # CONFIG_M386 is not set # CONFIG_M486 is not set # CONFIG_M586 is not set # CONFIG_M586TSC is not set # CONFIG_M586MMX is not set # CONFIG_M686 is not set # CONFIG_MPENTIUMII is not set # CONFIG_MPENTIUMIII is not set # CONFIG_MPENTIUMM is not set CONFIG_MCORE2=y # CONFIG_MPENTIUM4 is not set # CONFIG_MK6 is not set # CONFIG_MK7 is not set # CONFIG_MK8 is not set # CONFIG_MCRUSOE is not set # CONFIG_MEFFICEON is not set # CONFIG_MWINCHIPC6 is not set # CONFIG_MWINCHIP2 is not set # CONFIG_MWINCHIP3D is not set # CONFIG_MGEODEGX1 is not set # CONFIG_MGEODE_LX is not set # CONFIG_MCYRIXIII is not set # CONFIG_MVIAC3_2 is not set # CONFIG_MVIAC7 is not set # CONFIG_X86_GENERIC is not set CONFIG_X86_CMPXCHG=y CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_X86_XADD=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y # CONFIG_ARCH_HAS_ILOG2_U32 is not set # CONFIG_ARCH_HAS_ILOG2_U64 is not set CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y CONFIG_X86_BSWAP=y CONFIG_X86_POPAD_OK=y CONFIG_X86_GOOD_APIC=y CONFIG_X86_INTEL_USERCOPY=y CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_X86_TSC=y CONFIG_X86_MINIMUM_CPU_FAMILY=4 CONFIG_HPET_TIMER=y CONFIG_NR_CPUS=2 # CONFIG_SCHED_SMT is not set CONFIG_SCHED_MC=y # CONFIG_PREEMPT_NONE is not set # CONFIG_PREEMPT_VOLUNTARY is not set CONFIG_PREEMPT=y CONFIG_PREEMPT_BKL=y CONFIG_X86_LOCAL_APIC=y CONFIG_X86_IO_APIC=y # CONFIG_X86_MCE is not set CONFIG_VM86=y # CONFIG_TOSHIBA is not set CONFIG_I8K=m CONFIG_X86_REBOOTFIXUPS=y CONFIG_MICROCODE=m CONFIG_MICROCODE_OLD_INTERFACE=y CONFIG_X86_MSR=m CONFIG_X86_CPUID=m # # Firmware Drivers # CONFIG_EDD=m CONFIG_DELL_RBU=m CONFIG_DCDBAS=m CONFIG_DMIID=y # CONFIG_NOHIGHMEM is not set CONFIG_HIGHMEM4G=y # CONFIG_HIGHMEM64G is not set CONFIG_VMSPLIT_3G=y # CONFIG_VMSPLIT_3G_OPT is not set # CONFIG_VMSPLIT_2G is not set # CONFIG_VMSPLIT_2G_OPT is not set # CONFIG_VMSPLIT_1G is not set
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Hi! I have been reporting this off and on since 2.6.23 was released. This problem was not apparent up to perhaps 2.6.23-rc8, but definitely became common in 2.6.23 and 2.6.23.1. Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds of kernel time to complete. Once in a while, it takes *much* longer, in the 14-20 second range. These long events *seem* to be mostly after the notebook has been in suspend for a longish time, but there's really nothing consistent here. So git-bisect isn't going to work for this one. I recently rebuilt the kernel to include printk timestamps, and then it went 2 days without the issue happening, until this morning (after an overnight suspend) finally. The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB DDR2. PCIe express chipset, ATI graphics, SATA hard drive. 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express Memory Controller Hub (rev 03) 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express PCI Express Root Port (rev 03) 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01) 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01) 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 01) 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01) 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01) 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01) 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01) 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01) 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01) 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1) 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 01) 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial ATA Storage Controller IDE (rev 01) 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01) 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility X1400 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02) 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19) 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01) 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 0a) 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05) 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02) There's also a USB2 hub connected, with the Logitech mouse being the only thing plugged into it at present. The best clue I have of what is happening, other than that it was first seen during the late 2.6.23-rc* series, are the following two sets of kernel logs. The first set is from a normal fast wake-up, and the second set is from the slow wake-up seen this morning. Both logs show the same information, in pretty much the same order. The BIG difference is a bunch of unexplained pauses of exactly 1-second each that occur during the slow wakeup. The only other difference is that the SATA disk messages are placed differently within the two logs. I'm thinking on that one but it doesn't look significant to me. The real question is, why the 1-sec pauses? Can you try nohz=off highres=off? Strange stuff is happening with nohz. .. (added Ingo to CC: list: maybe this is some weird interaction with CFS and jiffies being reset to 0 on resume ??) I can try it, but it won't help debug the problem much. Remember, this happens very inconsistently, maybe 3-4 times a day, or not at all for 3-4 days. But if somebody has a specific bug-fix patch that could explain this, then I'll happily apply it here. Cheers - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Ingo Molnar [EMAIL PROTECTED] wrote: pick up the latest latency tracer patch from: sorry, wrong URLs, the correct links are: http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Mark Lord [EMAIL PROTECTED] wrote: Can you try nohz=off highres=off? Strange stuff is happening with nohz. (added Ingo to CC: list: maybe this is some weird interaction with CFS and jiffies being reset to 0 on resume ??) hm, CFS should have no impact here. To see what's happening you could try to use the latency tracer of the -rt patch and do a cross-resume trace. pick up the latest latency tracer patch from: http://redhat.com/~mingo/private/latency-tracer-v2.6.24-rc2-git5-combo.patch apply it and enable CONFIG_FUNCTION_TRACING, then pick up trace-cmd.c: http://redhat.com/~mingo/private/trace-cmd.c and do something like: ./trace-cmd pm-suspend trace.txt or: ./trace-cmd /bin/bash -c echo ram /sys/power/state trace.txt this should trigger suspend - then you should do the resume. If everything goes well then trace.txt should contain a pretty large trace of all the stuff we do during a suspend+resume. and wait for such a pause and send us the resulting trace.txt. if it's an SMP box then first do: echo 1 /proc/sys/kernel/trace_all_cpus to get a global trace. Let me know if something doesnt work with this scheme. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
On Thursday, 15 of November 2007, Mark Lord wrote: Hi! I have been reporting this off and on since 2.6.23 was released. This problem was not apparent up to perhaps 2.6.23-rc8, but definitely became common in 2.6.23 and 2.6.23.1. Most of the time, a resume-from-RAM on my notebook takes about 2.1 seconds of kernel time to complete. Once in a while, it takes *much* longer, in the 14-20 second range. These long events *seem* to be mostly after the notebook has been in suspend for a longish time, but there's really nothing consistent here. So git-bisect isn't going to work for this one. I recently rebuilt the kernel to include printk timestamps, and then it went 2 days without the issue happening, until this morning (after an overnight suspend) finally. The machine is a Dell Inspiron 9400, Intel chipset + Core2Duo 2.1GHZ w/3GB DDR2. PCIe express chipset, ATI graphics, SATA hard drive. 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express Memory Controller Hub (rev 03) 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express PCI Express Root Port (rev 03) 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01) 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01) 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 01) 00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01) 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01) 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01) 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01) 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01) 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01) 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1) 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 01) 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial ATA Storage Controller IDE (rev 01) 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01) 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility X1400 03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02) 03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832 03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19) 03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01) 03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 0a) 03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05) 0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02) There's also a USB2 hub connected, with the Logitech mouse being the only thing plugged into it at present. The best clue I have of what is happening, other than that it was first seen during the late 2.6.23-rc* series, are the following two sets of kernel logs. The first set is from a normal fast wake-up, and the second set is from the slow wake-up seen this morning. Both logs show the same information, in pretty much the same order. The BIG difference is a bunch of unexplained pauses of exactly 1-second each that occur during the slow wakeup. The only other difference is that the SATA disk messages are placed differently within the two logs. I'm thinking on that one but it doesn't look significant to me. The real question is, why the 1-sec pauses? Can you try nohz=off highres=off? Strange stuff is happening with nohz. .. (added Ingo to CC: list: maybe this is some weird interaction with CFS and jiffies being reset to 0 on resume ??) I can try it, but it won't help debug the problem much. Well, if you could verify that it doesn't happen at all with NO_HZ unset, that would be a valuable data point, IMO. Remember, this happens very inconsistently, maybe 3-4 times a day, or not at all for 3-4 days. But if somebody has a specific bug-fix patch that could explain this, then I'll happily apply it here. I'm not aware of any fix related to the symptoms that you observe. Greetings, Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
Ingo Molnar wrote: * Ingo Molnar [EMAIL PROTECTED] wrote: pick up the latest latency tracer patch from: sorry, wrong URLs, the correct links are: http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c .. Is there a version of these that works with 2.6.23.1 ? I'm not using 2.6.24-* on this machine because: (a) behaviour may be different, and (b) something broke VMware compatibility again. Thanks - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Mark Lord [EMAIL PROTECTED] wrote: sorry, wrong URLs, the correct links are: http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.24-rc2-git5-combo.patch http://redhat.com/~mingo/latency-tracing-patches/trace-cmd.c .. Is there a version of these that works with 2.6.23.1 ? yes, i've backported it and have uploaded the v2.6.23 version to: http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.23.1-combo.patch Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Strange 1-second pauses during Resume-from-RAM
* Ingo Molnar [EMAIL PROTECTED] wrote: Is there a version of these that works with 2.6.23.1 ? yes, i've backported it and have uploaded the v2.6.23 version to: http://redhat.com/~mingo/latency-tracing-patches/latency-tracer-v2.6.23.1-combo.patch btw., if the trace is too large and the interesting section of suspend does not fit into it then you can narrow it down to the most important events only by changing this in trace-cmd.c: system(echo 1 /proc/sys/kernel/mcount_enabled); to: system(echo 0 /proc/sys/kernel/mcount_enabled); that way we'll still trace IRQs and scheduling events, which is enough to see roughly where the delay is happening. To include symbolic backtraces in the trace, do this: echo 1 /proc/sys/kernel/stackframe_tracing to get such trace entries: ls 3688 1D..3 4642us+: deactivate_task ls 3688 (0 0) ls 3688 1D..3 4644us+: schedule()-do_exit()-sys_exit_group()-sys_exit_group() ls 3688 1D..3 4647us+: sysenter_past_esp()-( -1)()-(0)()-( 0)() idle 0 1D..2 4652us : schedule ls 3688 (0 20) this way we can see why a task goes to sleep and which exact kernel codepath is waking it up. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/