Bug#700333: Stack trace

2013-04-30 Thread vitalif

I merged a slightly better fix, you all were on cc. It's going into
3.10 and it's tagged stable, so it will show up in stable kernels
soon.


Thanks for the fix!
But where did you post it - on LKML?
(Did I miss it because I'm not subscribed to LKML?)


--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/9eab1d6a826bbd58d0b074f045a1c...@yourcmc.ru



Bug#700333: Stack trace

2013-04-28 Thread vitalif

When you do a suspend/resume cycle.


OK, yes, I've found it there.

The bug says "The photo shows a BUG in hrtimer_interrupt() after making
the hibernation image and while resuming the non-boot CPUs", so I'm
guessing with Thomas' patch it suspends fine now?


Yeah, now I'm using a patched kernel and it's OK.

So, does this mean the problem is fixed by this patch, or is it just 
confirmed and still to be fixed by another one?






Bug#700333: Stack trace

2013-04-27 Thread vitalif

Looks like we can't do anything about that in the HPET code itself.

Vitaliy, could you try that patch ?


Thanks, I tried it several days ago (and I'm still using the patched 
kernel :)) - the box survives.

But at which point should I check dmesg for "Spurious interrupt"?
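
For reference, a minimal Python sketch of such a check. It assumes the message of interest contains the word "spurious" in some capitalisation; the exact wording printed by the patched kernel is not quoted in this thread, and the function names are illustrative only.

```python
import subprocess


def find_matches(log_text, needle="spurious"):
    """Return the log lines that contain `needle`, ignoring case."""
    needle = needle.lower()
    return [line for line in log_text.splitlines() if needle in line.lower()]


def read_dmesg():
    """Return the kernel ring buffer as text.

    Runs the standard `dmesg` tool; on some systems this needs root.
    """
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


# Typical use, e.g. right after a suspend/resume cycle:
#     for line in find_matches(read_dmesg()):
#         print(line)
```

An empty result only means no line matched the assumed keyword, not that the interrupt never fired.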





Bug#700333: Stack trace

2013-04-20 Thread vitalif

Stack trace picture is here:
http://vmx.yourcmc.ru/var/pics/IMG_20130306_141045.jpg


Vitaliy reported that his system crashes when suspending to disk.  This
was a regression from 3.2 to 3.7, and remains in 3.8.  Some details of
this system are in the bug log at http://bugs.debian.org/700333.

The photo shows a BUG in hrtimer_interrupt() after making the
hibernation image and while resuming the non-boot CPUs.  The HPET
interrupt handler was called immediately after it was registered for
CPU 2 (?), before the corresponding clock_event_device was registered.

Seems like an obvious race condition, but then shouldn't the HPET have
been stopped while the CPU was previously offlined?  And it's strange
that this system apparently hits the race quite reliably.


Anyone?





Bug#700333: Stack trace

2013-03-07 Thread vitalif

Hi Ben!

Did the stack trace help you identify anything?

The "Enabling non-boot CPUs" message seems suspicious to me - does it 
mean that instead of writing an image to disk and hibernating, it's 
trying to resume?






Bug#700333: Stack trace

2013-03-06 Thread vitalif

No, but I think this kernel parameter will help:

pause_on_oops=
        Halt all CPUs after the first oops has been printed for
        the specified number of seconds.  This is to be used if
        your oopses keep scrolling off the screen.

(How have I not noticed this in all the years I've been crashing
kernels?!)


Thanks, it helped :)
By the way, this crash happens with init=/bin/bash

Stack trace picture is here: 
http://vmx.yourcmc.ru/var/pics/IMG_20130306_141045.jpg
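
To confirm that options such as pause_on_oops= or init=/bin/bash actually reached the running kernel, something like the following Python sketch could be used. It assumes the usual /proc/cmdline file; the function names and the sample values (the root device and the 30 second pause) are made up for illustration.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (for example no_console_suspend) map to None.
    Quoted values containing spaces are not handled in this sketch.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_running_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()


# Hypothetical example values, not taken from the affected machine:
example = "root=/dev/sda1 pause_on_oops=30 no_console_suspend init=/bin/bash"
params = parse_cmdline(example)
print(params["pause_on_oops"])          # value given to pause_on_oops=
print("no_console_suspend" in params)   # whether the flag was passed at all
```

On a live system, parse_cmdline(read_running_cmdline()) gives the same view for the booted kernel.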






Bug#700333: Stack trace

2013-03-05 Thread vitalif

Hello

I've booted with no_console_suspend and got the stack trace; however, 
it's from the 3.8-aptosid kernel. The problem with 3.8 is the same as 
with 3.7.


Can someone please help me - what does this stack mean?

Kernel panic - not syncing: Fatal exception in interrupt
------------[ cut here ]------------
WARNING: at /tmp/buildd/linux-aptosid-3.8/debian/build/source_amd64_none/arch/kernel/smp.c:123 update_process_times+0x55/0x61()

Hardware name: Studio XPS 1645
Modules linked in: dm_mirror dm_region_hash dm_log dm_mod ext4 crc16 
jbd2 mbcache sd_mod crc_t10dif thermal ahci libahci libata scsi_mod fan

Pid: 17, comm: kworker/1:0 Tainted: G D 3.8-1.slh.2-aptosid-amd64 #1
Call Trace:
IRQ warn_slowpath_common+0x76/0x8a
update_process_times+0x55/0x61
tick_periodic+0x60/0x6b
tick_handle_periodic+0x18/0x52
smp_apic_timer_interrupt+0x6e/0x81
apic_timer_interrupt+0x6d/0x80
up+0xc/0x35
panic+0x18b/0x1c7
panic+0xfd/0x1c7
oops_end+0x9c/0xa9
do_invalid_op+0x87/0x91
hrtimer_interrupt+0x24/0x1a4
load_balance+0xc3/0x62a
run_posix_cpu_timers+0x25/0x57a
invalid_op+0x1e/0x30
request_threaded_irq+0x84/0xf5
hrtimer_get_next_event+0x92/0x92
hrtimer_interrupt+0x24/0x1a4
tick_notify+0x216/0x378
hpet_interrupt_handler+0x23/0x2b
request_threaded_irq+0x84/0xf5
handle_irq_event_percpu+0x24/0x124
handle_irq_event+0x37/0x57
handle_edge_irq+0x98/0xbb
handle_irq+0x15/0x1d
do_IRQ+0x41/0x97
common_interrupt+0x6d/0x6d
request_threaded_irq+0x84/0xf5
vsnprintf+0x187/0x439
vsnprintf+0x70/0x439
snprintf+0x39/0x3e
register_handler_proc+0xd8/0x114
__setup_irq+0x334/0x3d4
hpet_set_periodic_freq+0x5f/0x5f
request_threaded_irq+0xba/0xf5
hpet_work+0xe7/0x1a6
process_one_work+0x15d/0x252
worker_thread+0x117/0x1b2
rescuer_thread+0x187/0x187
kthread+0x81/0x89
__kthread_parkme+0x5b/0x5b
ret_from_fork+0x7c/0xb0
__kthread_parkme+0x5b/0x5b
---[ end trace e6f760295bda327e ]---
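
When reading a trace like the one above, it can help to pull the entries apart mechanically. In the symbol+0xOFFSET/0xSIZE notation, the first hex number is the offset into the function and the second is the size of the function. Below is a minimal Python sketch of such a parser; the regular expression and function names are illustrative assumptions and only cover the simple form seen in this message (no addresses, no module suffixes).

```python
import re

# Matches entries such as "hrtimer_interrupt+0x24/0x1a4".
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(trace_text):
    """Return a list of (symbol, offset, size) tuples, one per matching line.

    Lines without a symbol+0x.../0x... entry (headers, blank lines) are
    skipped. Offsets and sizes are converted from hex to int.
    """
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (
                    match.group("symbol"),
                    int(match.group("offset"), 16),
                    int(match.group("size"), 16),
                )
            )
    return frames


def positions(frames, symbol):
    """Return the indexes at which `symbol` appears in the parsed trace."""
    return [i for i, frame in enumerate(frames) if frame[0] == symbol]
```

With the trace saved to a text file, positions(parse_frames(text), "hrtimer_interrupt") shows where that function appears relative to hpet_interrupt_handler and request_threaded_irq. Note that not every entry in such a trace is necessarily a live call frame.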





Bug#700333: Stack trace

2013-03-05 Thread vitalif

It means nothing very much.  How about the stack trace *before* this
line:

The problem is that the maximum available VESA mode is 1400x1050 on
my laptop and the stack is very long, and obviously I can't scroll
it after a kernel panic :-)
How can I get to previous lines of it? :-)


There is netconsole:
https://www.kernel.org/doc/Documentation/networking/netconsole.txt

Although that might not work while suspending.  Serial console would
probably work if the computer has a serial port.  If neither of those
works then you might be able to use a video recording and
freeze-frame.


Yeah, netconsole doesn't work during suspend - I've just checked, and 
the last line it prints is "Freezing remaining freezable tasks ... 
(elapsed 0.01 seconds) done."
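
For anyone repeating the netconsole test: the receiving side only has to listen for UDP datagrams. Below is a minimal Python sketch of a receiver. It assumes netconsole's documented default target port 6666 and plain text payloads; the function names are illustrative, and the setup actually used for the test above is not shown in this thread.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket on which netconsole messages can be received.

    6666 is assumed here as the default netconsole target port; adjust it
    to whatever the sending kernel was configured with.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_lines(sock, count, timeout=None):
    """Collect `count` datagrams from `sock` and return them as strings.

    Raises socket.timeout if `timeout` seconds pass without a datagram.
    """
    sock.settimeout(timeout)
    lines = []
    for _ in range(count):
        data, _sender = sock.recvfrom(4096)
        lines.append(data.decode("utf-8", errors="replace").rstrip("\n"))
    return lines


# Typical use on the receiving machine:
#     listener = open_listener()
#     while True:
#         print(receive_lines(listener, 1)[0])
```

This only captures what the sending kernel manages to transmit, so it does not get around the problem that output stops once the network device is suspended.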


However, the first time I tried netconsole, suspend surprisingly 
worked with 3.8 :-) The second time, the crash came back. So it seems 
the bug isn't 100% reproducible either.


The computer has no serial port.

Video is not an option either - I tried filming it with a ContourHD at 
60 fps, but the stack trace is printed too fast.


It would be good to have some delay after each line of the stack trace 
is printed by the kernel - is there such an option?






Bug#700333: Anyone?

2013-02-15 Thread vitalif

Anyone?

The bug still persists in 3.7.8-1~experimental.1.

