Bug#584314: base: System freezes at random time after Resume from Suspend (Regression)

2012-06-17 Thread Jonathan Nieder
tags 584314 - moreinfo + unreproducible
quit

Andreas Berger wrote:

> unfortunately, the spare harddrive that i've been using for this died on me 
> and i can't use my remaining (productive) system for this.

Marking accordingly.  Thanks again for the update.



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#584314: base: System freezes at random time after Resume from Suspend (Regression)

2012-05-29 Thread Jonathan Nieder
Andreas Berger wrote:

> out of curiosity, is the scope of this still to make a patch for squeeze?

Yep, squeeze still has at least a year of life in it yet.

Thanks,
Jonathan






Bug#584314: base: System freezes at random time after Resume from Suspend (Regression)

2012-05-29 Thread Andreas Berger
On Sunday, May 20, 2012 07:32:10 Jonathan Nieder wrote:
> Hi Andreas,
> 
> Jonathan Nieder wrote:
> > Andreas Berger wrote:
> >> ok, i narrowed it down, but it is:
> >>
> >> found: linux-image-2.6.36-trunk-686, version 2.6.36-1~experimental.1
> >> not found: linux-image-2.6.37-rc4-686,   version 2.6.37~rc4-1~experimental.1
> >
> > Unfortunately there are a lot of interesting patches in that range,
> > so we will probably need a little more data to track this down.  So
> > I suggested:
> [...]
>
> >  - suspending from single-user mode (kernel params "single debug")
> >    or from an initramfs shell (kernel param "break=top") to see if
> >    the same problem occurs even if the i915 driver is not loaded
> >    yet when the suspend/hibernate happens
> 
> Thanks again for all your help narrowing the bug down this far.
> 
> Did you get a chance to try this?
> 
> Curious,
> Jonathan

i'm sorry i didn't respond in quite a while.

unfortunately, the spare harddrive that i've been using for this died on me 
and i can't use my remaining (productive) system for this. also, the laptop 
itself will probably fail soon (having random power blackouts and the case is 
coming apart everywhere).

out of curiosity, is the scope of this still to make a patch for squeeze?
seeing as i seem to be the only one affected by this...

greetings,
andreas




Bug#584314: base: System freezes at random time after Resume from Suspend (Regression)

2012-05-19 Thread Jonathan Nieder
Hi Andreas,

Jonathan Nieder wrote:
> Andreas Berger wrote:

>> ok, i narrowed it down, but it is:
>>
>> found: linux-image-2.6.36-trunk-686, version 2.6.36-1~experimental.1
>> not found: linux-image-2.6.37-rc4-686,   version 2.6.37~rc4-1~experimental.1
>
> Unfortunately there are a lot of interesting patches in that range,
> so we will probably need a little more data to track this down.  So
> I suggested:
[...]
>  - suspending from single-user mode (kernel params "single debug")
>    or from an initramfs shell (kernel param "break=top") to see if
>    the same problem occurs even if the i915 driver is not loaded
>    yet when the suspend/hibernate happens

Thanks again for all your help narrowing the bug down this far.

Did you get a chance to try this?

Curious,
Jonathan






Bug#584314: base: System freezes at random time after Resume from Suspend (Regression)

2012-03-16 Thread Jonathan Nieder
Jonathan Nieder wrote:

> To recap, this bug is about symptoms of memory corruption after
> suspending to disk on an Acer Aspire 5610, which uses (I think)
> the 945GM express chipset.

This should have read "after suspending to RAM".  Sorry for the
nonsense.






Bug#584314: base: System freezes at random time after Resume from Suspend (Regression)

2012-03-16 Thread Jonathan Nieder
affects 584314 + xserver-xorg-video-intel
quit

Hi,

Thanks again for your work on this bug so far.

To recap, this bug is about symptoms of memory corruption after
suspending to disk on an Acer Aspire 5610, which uses (I think)
the 945GM express chipset.

Lenny and wheezy worked fine; it is only the squeeze kernel that
has this problem.

Searching through kernels from snapshot.debian.org, you found
that it was introduced between 2.6.36 and 2.6.37-rc4.  (Nicely
done.)

Andreas Berger wrote:

> ok, i narrowed it down, but it is:
>
> found: linux-image-2.6.36-trunk-686, version 2.6.36-1~experimental.1
> not found: linux-image-2.6.37-rc4-686,   version 2.6.37~rc4-1~experimental.1

Unfortunately there are a lot of interesting patches in that range,
so we will probably need a little more data to track this down.  So
I suggested:

 - trying suspend-to-disk (with

       echo disk >/sys/power/state

   ) and seeing if that reproduces the same trouble

 - suspending from single-user mode (kernel params "single debug")
   or from an initramfs shell (kernel param "break=top") to see if
   the same problem occurs even if the i915 driver is not loaded
   yet when the suspend/hibernate happens
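
Before trying either kind of sleep by hand, it can help to confirm that the running kernel offers it: /sys/power/state lists the supported states, and writing one of them back into the file triggers it. A minimal sketch follows, assuming a POSIX shell; the function name supports_sleep_state and its optional file argument are made up for illustration and are not part of any existing tool.

```shell
#!/bin/sh
# supports_sleep_state STATE [FILE]
# Succeeds if STATE ("mem", "disk", ...) is listed in FILE.
# FILE defaults to /sys/power/state; the optional second argument is a
# hypothetical convenience so the check can be tried on an ordinary file.
supports_sleep_state() {
    state=$1
    file=${2:-/sys/power/state}
    [ -r "$file" ] || return 1
    for s in $(cat "$file"); do
        [ "$s" = "$state" ] && return 0
    done
    return 1
}

# Only report what would be done; the real write needs root and
# suspends the machine immediately.
for st in mem disk; do
    if supports_sleep_state "$st"; then
        echo "would run: echo $st >/sys/power/state"
    else
        echo "$st: not offered by this kernel"
    fi
done
```

Here "mem" corresponds to suspend to RAM and "disk" to hibernation, matching the two commands used in this thread.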

Other ideas would be welcome, too.  I'd be happy to get this fixed in
squeeze.

Jonathan






Bug#584314: base: System freezes at random time after Resume from Suspend (Regression)

2011-11-23 Thread Jonathan Nieder
Hi again,

Jonathan Nieder wrote:
> Andreas Berger wrote:

>> ok, i narrowed it down, but it is:
>>
>> found: linux-image-2.6.36-trunk-686, version 2.6.36-1~experimental.1
>> not found: linux-image-2.6.37-rc4-686,   version 2.6.37~rc4-1~experimental.1
>>
>> and this time i think i got a complete call trace, is attached
>
> Nice.
[...]
>  - could you try suspending in single-user mode (i.e., kernel
>    parameters "single debug"), to rule out a problem in the i915
>    driver?

Did you get a chance to try this?

Even simpler can be to suspend from an initramfs rescue shell,
prepared as described at [1]:

    echo mem >/sys/power/state

By the way, does trouble only happen after suspend (suspend to RAM),
or does hibernation (suspend to disk) trigger it, too?

[1] http://wiki.debian.org/InitramfsDebug






Bug#584314: base: System freezes at random time after Resume from Suspend (Regression)

2011-09-11 Thread Jonathan Nieder
Andreas Berger wrote:

> ok, i narrowed it down, but it is:
>
> found: linux-image-2.6.36-trunk-686, version 2.6.36-1~experimental.1
> not found: linux-image-2.6.37-rc4-686,   version 2.6.37~rc4-1~experimental.1
>
> and this time i think i got a complete call trace, is attached

Nice.  Alas, after looking at the Debian changelog and "git shortlog
v2.6.36..v2.6.37-rc4" output, no particular change jumps out as likely
to have fixed this corruption (and the places the kernel panicked
don't give any obvious clue).

Some ideas for narrowing it down:

 - could you try suspending in single-user mode (i.e., kernel
   parameters "single debug"), to rule out a problem in the i915
   driver?

 - likewise, does unloading other modules before suspend help?

 - if nothing else gives a hint: can you bisect to find the fix?  It
   works like this:

1. Reproduce the bug with the unpatched kernel.

    # apt-get install git-core build-essential
    $ git clone git://github.com/torvalds/linux.git; # kernel.org is down
    $ cd linux
    $ git checkout v2.6.36
    $ make localmodconfig; # minimal configuration
    $ make deb-pkg; # with -j for parallel build if wanted
    # dpkg -i ../
    # reboot
    ... test test test ...

Hopefully it reproduces the bug.  Otherwise, declare victory and we
can figure out how Debian-specific changes screwed it up.

2. Reproduce the fix.

    $ cd ~/src/linux
    $ git checkout v2.6.37-rc4
    $ yes "" | make silentoldconfig; # reuse configuration
    $ make deb-pkg
    # dpkg -i ../
    # reboot
    ... test test test ...

Hopefully it does _not_ reproduce the bug.  If not, try again after
copying Debian's config-2.6.37-rc4-686 as ~/src/linux/.config and
rebuild --- if that fixes it, declare victory and we can figure out
which configuration change fixed it, and if that doesn't fix it, we
can look for a relevant Debian-specific patch.

3. Great --- so v2.6.36 reproduces the bug and v2.6.37-rc4 reproduces
the fix.  Tell git:

    $ cd ~/src/linux
    $ git bisect start v2.6.37-rc4 v2.6.36

Git checks out a revision halfway between to test.

    $ yes "" | make silentoldconfig; # reuse configuration
    $ make deb-pkg
    # dpkg -i ../
    # reboot
    ... test test test ...
    $ cd ~/src/linux
    $ git bisect good; # if it crashes
    $ git bisect bad; # if it is stable
    $ git bisect skip; # if some other bug makes it hard to test

Yes, "good" means "successfully demonstrates the bug".  The naming is
a little confusing because git bisect is usually used to find changes
introducing bugs rather than changes fixing them.

4. Repeat until bored:

    $ make silentoldconfig
    $ make deb-pkg
    # dpkg -i ../
    # reboot
    ... test test test ...
    $ cd ~/src/linux
    $ git bisect good / bad / skip

Eventually it will tell the "first bad commit" (i.e., the fix), which
was what was wanted.  If you get bored before then, that's still
useful --- "git bisect log" will tell the results so far.  (Even a
few rounds can narrow things down a lot.)  If the gitk package is
installed, you can run "git bisect visualize" at any time to watch the
range of changes potentially containing the fix narrowing.

"man git-bisect" and /usr/share/doc/git-doc/git-bisect-lk2009.html
from the git-doc package have details.
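
For readers on a newer git than the one current when this was written: "git bisect start" accepts --term-old and --term-new, which sidesteps the inverted good/bad naming when hunting for a fix. The following is only a toy sketch of that idea. The throwaway repository, the file named "status", and the choice of commit 7 as the fix are all invented for illustration, and "git bisect run" stands in for the build/install/reboot/test cycle, which cannot be scripted this way for a real suspend bug.

```shell
#!/bin/sh
# Toy demonstration: bisecting to find the commit that FIXED a bug.
# Nothing here touches a kernel tree; the repository is a scratch one.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "bisect-demo@example.invalid"
git config user.name "Bisect Demo"

# Ten commits; from commit 7 onward the pretend bug is fixed.
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -ge 7 ]; then echo fixed >status; else echo broken >status; fi
    echo "$i" >counter
    git add status counter
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# "broken" takes the place of good/old, "fixed" the place of bad/new.
git bisect start --term-old broken --term-new fixed >/dev/null
git bisect fixed HEAD >/dev/null
git bisect broken HEAD~9 >/dev/null

# The test command exits 0 while the bug is still present (old term)
# and 1 once it is gone (new term).
git bisect run sh -c 'grep -q broken status' >/dev/null

# refs/bisect/fixed now names the first commit carrying the fix.
first_fixed=$(git log -1 --format=%s refs/bisect/fixed)
echo "first fixed commit: $first_fixed"

git bisect reset >/dev/null 2>&1
```

With these terms, the manual equivalent of the loop in step 4 would be "git bisect broken" after a crash and "git bisect fixed" after a stable run.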

Thanks much for your help so far!
Jonathan






Bug#584314: base: System freezes at random time after Resume from Suspend (Regression)

2011-09-07 Thread Andreas Berger
On Thursday, September 01, 2011 21:09:38 Jonathan Nieder wrote:
> Hi Andreas,
> 
> Andreas Berger wrote:
> > On Tuesday, August 30, 2011 05:10:47 you wrote:
> >> I suspect memory corruption.  Maybe v2.6.37-rc5~3^2 (PM / Hibernate:
> >> Fix memory corruption related to swap, 2010-12-03) fixes it.  Could
> >> you test 2.6.37-rc5 and 2.6.37-rc4?
> > 
> > um, maybe a stupid question, but where do i get these kernels? are they
> > in some debian repository or do i have to build them?
> 
> http://snapshot.debian.org/, source package linux-2.6. :)
> 
> Thanks,
> Jonathan

ok, i narrowed it down, but it is:

found: linux-image-2.6.36-trunk-686, version 2.6.36-1~experimental.1
not found: linux-image-2.6.37-rc4-686,   version 2.6.37~rc4-1~experimental.1

and this time i think i got a complete call trace, is attached


greetings,
andreas
[  186.878224] BUG: unable to handle kernel paging request at f76ff01c
[  186.878300] IP: [] dx_probe+0x3a/0x287 [ext3]
[  186.878366] *pde = 7067 *pte = f0001212
[  186.878415] Oops:  [#1] SMP
[  186.878454] last sysfs file: 
/sys/devices/pci:00/:00:1c.3/:05:00.0/ieee80211/phy0/rfkill0/state
[  186.878540] Modules linked in: acpi_cpufreq mperf cpufreq_conservative 
cpufreq_stats cpufreq_userspace cpufreq_powersave parport_pc ppdev lp parport 
sco bridge stp bnep rfcomm l2cap crc16 bluetooth uinput fuse loop firewire_sbp2 
firewire_core crc_itu_t snd_hda_codec_realtek arc4 ecb snd_hda_intel iwl3945 
snd_hda_codec iwlcore i915 snd_hwdep snd_pcm mac80211 yenta_socket snd_seq 
drm_kms_helper i2c_i801 snd_timer snd_seq_device cfg80211 pcmcia_rsrc drm 
i2c_algo_bit rng_core tpm_tis joydev rfkill tpm i2c_core tpm_bios snd shpchp 
container ac pci_hotplug soundcore wmi battery video pcspkr snd_page_alloc 
button serio_raw psmouse evdev processor output ext3 jbd mbcache sg sr_mod 
cdrom sd_mod crc_t10dif b44 ata_generic ssb ata_piix uhci_hcd pcmcia libata 
scsi_mod sdhci_pci ehci_hcd usbcore sdhci mmc_core thermal pcmcia_core 
led_class mii thermal_sys nls_base [last unloaded: scsi_wait_scan]
[  186.879586]
[  186.879604] Pid: 1643, comm: NetworkManager Not tainted 2.6.36-trunk-686 #1 
Grapevine/Aspire 5610
[  186.879604] EIP: 0060:[] EFLAGS: 00010282 CPU: 0
[  186.879743] EIP is at dx_probe+0x3a/0x287 [ext3]
[  186.879785] EAX: f6dee8f8 EBX: f6dbedf4 ECX:  EDX: f6dee8f8
[  186.879840] ESI: f6dee8f8 EDI: f76ff000 EBP: f6ba9e00 ESP: f6ba9d50
[  186.879895]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  186.879943] Process NetworkManager (pid: 1643, ti=f6ba8000 task=f68ad4a0 
task.ti=f6ba8000)
[  186.880014] Stack:
[  186.880034]  f6ba9e14  f6ba9df0 f6f902c8 c12b2ef0 f6ba9da0 000f 
f6ba9f54 f6f902a8
[  186.880130] <0> f6dbedf4 f685f200 f8618d39 f6ba9dd8 f6ba9e00 000f 
f6ba9f54 f6e1ca80
[  186.880233] <0> f6f90304 f6f902c8 1000  003c  
f6ba9e1c 0001
[  186.880376] Call Trace:
[  186.880376]  [] ? ext3_find_entry+0x85/0x49a [ext3]
[  186.880432]  [] ? d_alloc+0x1b/0x142
[  186.880482]  [] ? ext3_lookup+0x24/0xa8 [ext3]
[  186.880535]  [] ? d_alloc_and_lookup+0x3c/0x52
[  186.880584]  [] ? do_lookup+0x92/0xcb
[  186.880627]  [] ? link_path_walk+0x242/0x372
[  186.880674]  [] ? path_walk+0x4f/0xae
[  186.880717]  [] ? do_path_lookup+0x1f/0x69
[  186.880762]  [] ? user_path_at+0x37/0x5f
[  186.880808]  [] ? vfs_fstatat+0x2a/0x50
[  186.880851]  [] ? vfs_lstat+0x13/0x15
[  186.880892]  [] ? sys_lstat64+0xf/0x23
[  186.880937]  [] ? sysenter_do_call+0x12/0x28
[  186.880983] Code: 24 30 8b 44 24 2c 89 4c 24 08 31 c9 c7 00 00 00 00 00 31 
c0 55 6a 00 e8 50 f6 ff ff 59 5f 85 c0 89 c6 0f 84 20 02 00 00 8b 78 18 <8a> 47 
1c 3c 02 76 0b 0f b6 c0 50 68 f2 33 62 f8 eb 64 8b 54 24
[  186.881401] EIP: [] dx_probe+0x3a/0x287 [ext3] SS:ESP 0068:f6ba9d50
[  186.881483] CR2: f76ff01c
[  186.881645] ---[ end trace 4938385b8da477eb ]---


Bug#584314: base: System freezes at random time after Resume from Suspend (Regression)

2011-09-01 Thread Jonathan Nieder
Hi Andreas,

Andreas Berger wrote:
> On Tuesday, August 30, 2011 05:10:47 you wrote:

>> I suspect memory corruption.  Maybe v2.6.37-rc5~3^2 (PM / Hibernate:
>> Fix memory corruption related to swap, 2010-12-03) fixes it.  Could
>> you test 2.6.37-rc5 and 2.6.37-rc4?
>
> um, maybe a stupid question, but where do i get these kernels? are they in 
> some debian repository or do i have to build them?

http://snapshot.debian.org/, source package linux-2.6. :)

Thanks,
Jonathan






Bug#584314: base: System freezes at random time after Resume from Suspend (Regression)

2011-09-01 Thread Andreas Berger
On Tuesday, August 30, 2011 05:10:47 you wrote:
> Unfortunately what you typed doesn't include the call trace (or maybe
> there was none).  


ah, ok, then there was no call trace, i definitely typed off everything there 
was


> I suspect memory corruption.  Maybe v2.6.37-rc5~3^2 (PM / Hibernate:
> Fix memory corruption related to swap, 2010-12-03) fixes it.  Could
> you test 2.6.37-rc5 and 2.6.37-rc4?


um, maybe a stupid question, but where do i get these kernels? are they in 
some debian repository or do i have to build them?


> 
> Thanks,
> Jonathan






Bug#584314: base: System freezes at random time after Resume from Suspend (Regression)

2011-08-29 Thread Jonathan Nieder
notfound 584314 linux-2.6/2.6.32-32
notfixed 584314 linux-2.6/2.6.38-3
found 584314 linux-2.6/2.6.32-30
fixed 584314 linux-2.6/2.6.38-5
quit

Andreas Berger wrote:

> in linux-image-2.6.32-5-686, version 2.6.32-30, the bug was still there,
> in linux-image-2.6.38-2-686, version 2.6.38-5, the bug was no longer there,
>
> in between the two, i don't know, but if it helps, i can narrow it down as 
> soon as i get home to a spare hard drive.

Sure, it would help to narrow the search for the fix (but see below to
save some time).

> On Thursday, July 28, 2011 04:19:39 Jonathan Nieder wrote:

>>  - could you send a photo of the screen during the oops, so we can read
>>the backtrace?
>
> i typed it off the screen and included it in my previous mail here: 
> http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=22;bug=584314
>
> is that not what you mean?

Unfortunately what you typed doesn't include the call trace (or maybe
there was none).  It does include the code, which when passed through
scripts/decodecode looks like this:

| kernel:[ 496.263433] Code: 04 01 00 00 00 66 83 7c 24 28 00 79 37 89 f5 31 db 
eb 2b ba 03 00 00 00 89 e8 e8 ee 73 fa ff b9 00 04 00 00 89 04 24 89 c7 31 c0 
 ab 8b 04 24 ba 03 00 00 00 43 83 c5 20 e8 20 72 fa ff 3b 5c
[...]
|   11: eb 2b                   jmp    0x3e
|   13: ba 03 00 00 00          mov    $0x3,%edx
|   18: 89 e8                   mov    %ebp,%eax
|   1a: e8 ee 73 fa ff          callq  0xfffa740d
|   1f: b9 00 04 00 00          mov    $0x400,%ecx
|   24: 89 04 24                mov    %eax,(%rsp)
|   27: 89 c7                   mov    %eax,%edi
|   29: 31 c0                   xor    %eax,%eax
|   2b:* f3 ab                  rep stos %eax,%es:(%rdi)     <-- trapping instruction
|   2d: 8b 04 24                mov    (%rsp),%eax

Building mm/page_alloc.s and comparing, we see that this is in
"clear_highpage"; the function call starting on line 13 is to
kmap_atomic and the trapping rep stos is memset(page, 0, PAGE_SIZE).

Unwinding a little: clear_highpage is called by prep_zero_page,
which is called by prep_new_page, which is called by buffered_rmqueue,
which is called by get_page_from_freelist for each potentially
free page.
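
As background for the disassembly above: in an oops "Code:" line the kernel marks the byte at the faulting instruction pointer with angle brackets, and scripts/decodecode (which reads the oops text on standard input) turns that mark into the "*" next to the trapping instruction. A small sketch of locating that byte by hand follows. The helper name trap_byte is hypothetical, and the sample input is a made-up four-byte line rather than data from this report.

```shell
#!/bin/sh
# trap_byte: read an oops "Code:" line (possibly wrapped over several
# lines) on stdin and print the hex offset and value of the byte that
# the kernel marked with <..>.  Illustrative helper, not a kernel tool.
trap_byte() {
    tr '\n' ' ' | awk '{
        seen = 0; n = 0
        for (i = 1; i <= NF; i++) {
            # Skip timestamps and anything else before "Code:".
            if (!seen) { if ($i == "Code:") seen = 1; continue }
            if ($i ~ /^<[0-9a-f][0-9a-f]>$/) {
                printf "0x%x %s\n", n, substr($i, 2, 2)
                exit
            }
            n++
        }
    }'
}

# Synthetic example: two ordinary bytes, then the marked one.
printf 'Code: 31 c0 <f3> ab\n' | trap_byte
```

The offset is counted from the first byte listed after "Code:", which is the same origin decodecode uses for the left-hand column.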

I suspect memory corruption.  Maybe v2.6.37-rc5~3^2 (PM / Hibernate:
Fix memory corruption related to swap, 2010-12-03) fixes it.  Could
you test 2.6.37-rc5 and 2.6.37-rc4?

Thanks,
Jonathan






Bug#584314: base: System freezes at random time after Resume from Suspend (Regression)

2011-07-27 Thread Jonathan Nieder
found 584314 linux-2.6/2.6.32-32
fixed 584314 linux-2.6/2.6.38-3
tags 584314 = upstream
quit

Hi Andreas,

Andreas Berger wrote:

> i can no longer reproduce this bug with kernel 2.6.38
>
> to be sure that it's not due to some other change in testing, i did:
> -clean install of debian 6 (kernel 2.6.32-5), suspend, resume, kerneloops
> -add kernel 2.6.38-2 (from testing), suspend, resume, everything goes fine

Thanks!  Quick questions:

 - when you say "kernel 2.6.32-5", I assume you mean the package
   linux-image-2.6.32-5-686 or linux-image-2.6.32-5-amd64.  What version
   did you use?  (The number after the dash should be around 30; you can
   get it with "dpkg -l 'linux-image-*'".)

 - same question for 2.6.38.

 - could you send a photo of the screen during the oops, so we can read
   the backtrace?

 - could you send the full output of "dmesg" after booting with a
   working version?

Regards,
Jonathan






Bug#584314: base: System freezes at random time after Resume from Suspend (Regression)

2010-06-02 Thread Andreas Berger
Package: base
Severity: important
Tags: squeeze

Steps to Reproduce:
1: Suspend Laptop to RAM
2: Resume from Suspend
3: Wait and see, preferably monitoring top:
At some random time, ranging from immediately (black screen after resume) to 
several hours later, the system will become unresponsive. Switching to tty1 or 
killing xorg with Alt+Print+K does not work, Alt+Print+REISUB does work. Each 
freeze is anticipated by a random process (this time it was mandb, was 
installing something) hogging 100% of CPU, then the System becomes gradually 
unresponsive within a minute or so (panel, metacity, finally mouse cursor 
freezes too). Additionally, i don't know if this is related, i noticed one 
process using % of CPU according to top, just thought i'd mention it.
This bug constitutes a regression, suspend does work flawlessly on this Laptop 
in Lenny.
Also, i encountered this bug in Ubuntu 9.10 (ironically, this was the one that 
pushed me over the edge to switch to debian), the corresponding bug report is 
here: https://bugs.launchpad.net/ubuntu/+bug/480850
Hardware is an Acer Aspire 5610 Laptop, please advise me on what more specific 
information to gather and what else to do, I'm happy to try out anything you 
suggest.

I assigned this bug to base because reportbug forced me to choose something, 
but i can only guess about the package, please reassign it.

-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-trunk-686 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash


