Re: Suspend to RAM generates oops and general protection fault
Hi,

Sorry I haven't replied recently about that bug, but I have to admit I have no idea where to start. There actually seem to be much more fundamental problems with the kernel on my machines. I initially realised that even without using suspend to RAM, I was still getting crashes when docking. So I stopped docking and realised my machine would sometimes just crash when I plug/unplug the AC adaptor. Just to give an idea, I've experienced about 10-15 crashes in the past two months -- I don't think I've even done a single clean shutdown during that period.

To make things worse, the behaviour is always different. Sometimes I get a panic with keyboard LEDs flashing. Sometimes I get nothing at all and the machine is just frozen (doesn't respond to pings or to Alt-SysRq commands). Sometimes, I just lose my keyboard and/or mouse but the machine stays up.

I'm running a vanilla 2.6.20 kernel (not tainted) with the following configuration: http://jmspeex.livejournal.com/1090.html

Jean-Marc

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Suspend to RAM generates oops and general protection fault
Luming Yu wrote:
> what about removing psmouse module?

Trying that now. Any particular reason you suspect that one?

Jean-Marc

> On 1/23/07, Jean-Marc Valin <[EMAIL PROTECTED]> wrote:
>> >>> will be a device driver. Common causes of suspend/resume problems from
>> >>> the list you give below are acpi modules, bluetooth and usb. I'd also
>> >>> consider pcmcia, drm and fuse possibilities. But again, go for unloading
>> >>> everything possible in the first instance.
>> >> Actually, the reason I sent this is that when I showed the oops/gpf to
>> >> Matthew Garrett at linux.conf.au, he said it looked like a CPU hotplug
>> >> problem and suggested I send it to lkml. BTW, with 2.6.20-rc5, the
>> >> suspend to RAM now works ~95% of the time.
>> >
>> > Try a kernel without CONFIG_SMP... that will verify if it is SMP
>> > related.
>>
>> Well, this happens to be my main work machine, which I'm not willing to
>> have running at half speed for several weeks. Anything else you can
>> suggest?
>>
>> Jean-Marc
Re: Suspend to RAM generates oops and general protection fault
On 1/23/07, Jean-Marc Valin <[EMAIL PROTECTED]> wrote:
> Luming Yu wrote:
> > what about removing psmouse module?
>
> Trying that now. Any particular reason you suspect that one?

I suspect it is due to broken modules. If it's not psmouse, please try booting with a minimal set of modules loaded, and re-test.

Thanks,
Luming
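One way to act on the "minimal modules" suggestion is to start from the loaded modules that nothing else is currently using, since those are the first candidates for `rmmod`. This is an illustrative sketch only: it assumes the usual /proc/modules layout, in which the first three whitespace-separated fields are the module name, its size and its use count. The helper names `module_is_unused` and `list_unused_modules` are hypothetical, and the sample line in the comment is made up.

```c
#include <stdio.h>

/* Hypothetical helper: parse one line in /proc/modules format, e.g. the
 * made-up line "psmouse 39952 0 - Live 0xffffffff88123000", whose first
 * three fields are the module name, its size in bytes and its use count.
 * Copies the name into name[] (at most 63 characters plus terminator).
 * Returns 1 if the use count is 0, 0 if the module is in use, and -1 if
 * the line does not start with those three fields. */
int module_is_unused(const char *line, char name[64])
{
    unsigned long size;
    unsigned int refcnt;

    if (sscanf(line, "%63s %lu %u", name, &size, &refcnt) != 3)
        return -1;
    return refcnt == 0;
}

/* Print the name of every module in a /proc/modules-style stream whose
 * use count is 0, i.e. the current candidates for unloading. */
void list_unused_modules(FILE *in)
{
    char line[512];
    char name[64];

    while (fgets(line, sizeof line, in) != NULL) {
        if (module_is_unused(line, name) == 1)
            printf("%s\n", name);
    }
}
```

Run on a stream opened on /proc/modules, `list_unused_modules` prints one candidate per line; unloading those and running it again walks the stack down towards a minimal set. A use count of 0 does not guarantee that `rmmod` will succeed (some modules cannot be unloaded at all), so treat the output as candidates only.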
Re: Suspend to RAM generates oops and general protection fault
what about removing psmouse module?

On 1/23/07, Jean-Marc Valin <[EMAIL PROTECTED]> wrote:
> >>> will be a device driver. Common causes of suspend/resume problems from
> >>> the list you give below are acpi modules, bluetooth and usb. I'd also
> >>> consider pcmcia, drm and fuse possibilities. But again, go for unloading
> >>> everything possible in the first instance.
> >> Actually, the reason I sent this is that when I showed the oops/gpf to
> >> Matthew Garrett at linux.conf.au, he said it looked like a CPU hotplug
> >> problem and suggested I send it to lkml. BTW, with 2.6.20-rc5, the
> >> suspend to RAM now works ~95% of the time.
> >
> > Try a kernel without CONFIG_SMP... that will verify if it is SMP
> > related.
>
> Well, this happens to be my main work machine, which I'm not willing to
> have running at half speed for several weeks. Anything else you can
> suggest?
>
> Jean-Marc
Re: Suspend to RAM generates oops and general protection fault
>>> will be a device driver. Common causes of suspend/resume problems from
>>> the list you give below are acpi modules, bluetooth and usb. I'd also
>>> consider pcmcia, drm and fuse possibilities. But again, go for unloading
>>> everything possible in the first instance.
>> Actually, the reason I sent this is that when I showed the oops/gpf to
>> Matthew Garrett at linux.conf.au, he said it looked like a CPU hotplug
>> problem and suggested I send it to lkml. BTW, with 2.6.20-rc5, the
>> suspend to RAM now works ~95% of the time.
>
> Try a kernel without CONFIG_SMP... that will verify if it is SMP
> related.

Well, this happens to be my main work machine, which I'm not willing to
have running at half speed for several weeks. Anything else you can
suggest?

Jean-Marc
Re: Suspend to RAM generates oops and general protection fault
>> I just encountered the following oops and general protection fault
>> trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2
>> GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The
>> relevant errors are below but the full dmesg log is at
>> http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in
>> http://people.xiph.org/~jm/config-2.6.20-rc5.txt
>>
>> This happens when I'm running 2.6.20-rc5. The previous kernel version I
>> was using is 2.6.19-rc6 and was much more broken (second attempt
>> *always* failed), so it's probably not a regression.
>
> This is a shot against the odds, but could you please check if the attached
> patch has any effect?

Thanks, I'll try that. It may take a while because the problem only happened once in dozens of suspend/resume cycles.

Jean-Marc

> Rafael
>
>
> Both process_zones() and drain_node_pages() check for populated zones before
> touching pagesets. However, __drain_pages does not do so.
>
> This may result in a NULL pointer dereference for pagesets in unpopulated
> zones if a NUMA setup is combined with cpu hotplug.
>
> Initially the unpopulated zone has the pcp pointers pointing to the boot
> pagesets. Since the zone is not populated the boot pageset pointers will
> not be changed during page allocator and slab bootstrap.
>
> If a cpu is later brought down (first call to __drain_pages()) then the pcp
> pointers for cpus in unpopulated zones are set to NULL since __drain_pages
> does not first check for an unpopulated zone.
>
> If the cpu is then brought up again then we call process_zones() which will
> ignore the unpopulated zone. So the pageset pointers will still be NULL.
>
> If the cpu is then again brought down then __drain_pages will attempt to drain
> pages by following the NULL pageset pointer for unpopulated zones.
>
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
>
> ---
>  mm/page_alloc.c |    3 +++
>  1 file changed, 3 insertions(+)
>
> Index: linux-2.6.20-rc4/mm/page_alloc.c
> ===
> --- linux-2.6.20-rc4.orig/mm/page_alloc.c
> +++ linux-2.6.20-rc4/mm/page_alloc.c
> @@ -714,6 +714,9 @@ static void __drain_pages(unsigned int c
>  	for_each_zone(zone) {
>  		struct per_cpu_pageset *pset;
>  
> +		if (!populated_zone(zone))
> +			continue;
> +
>  		pset = zone_pcp(zone, cpu);
>  		for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) {
>  			struct per_cpu_pages *pcp;
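Because the failure shows up only occasionally, it helps to estimate how many clean cycles are needed before a patch can be judged. A minimal sketch, assuming every suspend/resume cycle fails independently with the same probability p (an assumption; the thread only gives rough figures such as "~95% of the time" and "once in dozens of suspend/resume cycles"): if the bug were still present, n clean cycles in a row would occur with probability (1 - p)^n, so for a still-present bug to have shown up at least once with probability c, n must be the smallest value with (1 - p)^n <= 1 - c, i.e. n = ceil(ln(1 - c) / ln(1 - p)). The function name `cycles_needed` is hypothetical; it iterates instead of calling `log()` so that it needs no libm.

```c
/* Hypothetical helper: the smallest number of cycles n such that an
 * intermittent failure with per-cycle probability p would have appeared
 * at least once with probability >= c, i.e. the smallest n with
 * (1 - p)^n <= 1 - c. Assumes independent cycles, 0 < p < 1 and
 * 0 <= c < 1. Iterates instead of using log() so no libm is needed. */
unsigned int cycles_needed(double p, double c)
{
    double all_clean = 1.0;   /* probability that n cycles all succeed */
    unsigned int n = 0;

    while (all_clean > 1.0 - c) {
        all_clean *= 1.0 - p;
        n++;
    }
    return n;
}
```

For p = 0.05 and c = 0.95 this gives 59 cycles; for p = 0.1 and c = 0.9 it gives 22.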
Re: Suspend to RAM generates oops and general protection fault
Hi!

> > will be a device driver. Common causes of suspend/resume problems from
> > the list you give below are acpi modules, bluetooth and usb. I'd also
> > consider pcmcia, drm and fuse possibilities. But again, go for unloading
> > everything possible in the first instance.
>
> Actually, the reason I sent this is that when I showed the oops/gpf to
> Matthew Garrett at linux.conf.au, he said it looked like a CPU hotplug
> problem and suggested I send it to lkml. BTW, with 2.6.20-rc5, the
> suspend to RAM now works ~95% of the time.

Try a kernel without CONFIG_SMP... that will verify if it is SMP
related.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: Suspend to RAM generates oops and general protection fault
Hi,

On Monday, 22 January 2007 03:34, Jean-Marc Valin wrote:
> Hi,
>
> I just encountered the following oops and general protection fault
> trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2
> GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The
> relevant errors are below but the full dmesg log is at
> http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in
> http://people.xiph.org/~jm/config-2.6.20-rc5.txt
>
> This happens when I'm running 2.6.20-rc5. The previous kernel version I
> was using is 2.6.19-rc6 and was much more broken (second attempt
> *always* failed), so it's probably not a regression.

This is a shot against the odds, but could you please check if the attached
patch has any effect?

Rafael


-- 
If you don't have the time to read, you don't have the time or the tools to
write.
		- Stephen King

Both process_zones() and drain_node_pages() check for populated zones before
touching pagesets. However, __drain_pages does not do so.

This may result in a NULL pointer dereference for pagesets in unpopulated
zones if a NUMA setup is combined with cpu hotplug.

Initially the unpopulated zone has the pcp pointers pointing to the boot
pagesets. Since the zone is not populated the boot pageset pointers will
not be changed during page allocator and slab bootstrap.

If a cpu is later brought down (first call to __drain_pages()) then the pcp
pointers for cpus in unpopulated zones are set to NULL since __drain_pages
does not first check for an unpopulated zone.

If the cpu is then brought up again then we call process_zones() which will
ignore the unpopulated zone. So the pageset pointers will still be NULL.

If the cpu is then again brought down then __drain_pages will attempt to drain
pages by following the NULL pageset pointer for unpopulated zones.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/page_alloc.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux-2.6.20-rc4/mm/page_alloc.c
===
--- linux-2.6.20-rc4.orig/mm/page_alloc.c
+++ linux-2.6.20-rc4/mm/page_alloc.c
@@ -714,6 +714,9 @@ static void __drain_pages(unsigned int c
 	for_each_zone(zone) {
 		struct per_cpu_pageset *pset;
 
+		if (!populated_zone(zone))
+			continue;
+
 		pset = zone_pcp(zone, cpu);
 		for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) {
 			struct per_cpu_pages *pcp;
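The down/up/down sequence in the patch description can be replayed in a small user-space model. This is a sketch under heavy simplification (one cpu, two zones), and the names `model_zone`, `model_cpu_down`, `model_cpu_up` and `down_up_down` are hypothetical stand-ins for, not copies of, the kernel's `struct zone`, `__drain_pages()` and `process_zones()`. It counts how many NULL pageset pointers the cpu-down steps would follow, with and without the populated-zone check the patch adds.

```c
#include <stddef.h>

#define NR_MODEL_ZONES 2

/* Hypothetical, heavily simplified model of the sequence in the patch
 * description. None of these are the kernel's real types or functions. */
struct model_pageset {
    int count;                      /* pages cached for this cpu */
};

struct model_zone {
    int populated;                  /* does the zone contain any memory? */
    struct model_pageset *pcp;      /* this cpu's pageset pointer */
};

static struct model_pageset boot_pageset;                  /* boot pagesets */
static struct model_pageset runtime_pagesets[NR_MODEL_ZONES];

/* cpu down: drain each zone's pageset and drop the pointer. Returns how
 * many NULL pageset pointers would have been followed (an oops in the
 * real kernel). With check_populated set, unpopulated zones are skipped
 * entirely, mirroring the check the patch adds. */
static int model_cpu_down(struct model_zone *z, int check_populated)
{
    int null_derefs = 0;

    for (int i = 0; i < NR_MODEL_ZONES; i++) {
        if (check_populated && !z[i].populated)
            continue;
        if (z[i].pcp == NULL)
            null_derefs++;
        else
            z[i].pcp->count = 0;    /* "drain" the cached pages */
        z[i].pcp = NULL;
    }
    return null_derefs;
}

/* cpu up: like process_zones() in the description, only populated
 * zones get a new pageset; unpopulated ones are ignored. */
static void model_cpu_up(struct model_zone *z)
{
    for (int i = 0; i < NR_MODEL_ZONES; i++)
        if (z[i].populated)
            z[i].pcp = &runtime_pagesets[i];
}

/* Run the down/up/down sequence and return the number of NULL derefs. */
int down_up_down(int check_populated)
{
    struct model_zone z[NR_MODEL_ZONES] = {
        { 1, &boot_pageset },       /* populated zone */
        { 0, &boot_pageset },       /* unpopulated zone, still on boot pageset */
    };
    int bad = model_cpu_down(z, check_populated);

    model_cpu_up(z);
    bad += model_cpu_down(z, check_populated);
    return bad;
}
```

In this model `down_up_down(0)` returns 1 (the unpopulated zone's NULL pointer is followed on the second cpu-down) and `down_up_down(1)` returns 0.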
Re: Suspend to RAM generates oops and general protection fault
Hi.

On Mon, 2007-01-22 at 16:16 +1100, Jean-Marc Valin wrote:
> >> I just encountered the following oops and general protection fault
> >> trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2
> >> GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The
> >> relevant errors are below but the full dmesg log is at
> >> http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in
> >> http://people.xiph.org/~jm/config-2.6.20-rc5.txt
> ...
> > It looks like something is stomping on memory it shouldn't be touching,
> > so I would suggest testing multiple cycles with a minimal (preferably
> > zero) number of modules loaded. If that looks good and reliable, add
> > modules & processes until you can say 'If I do X, it breaks.'. If having
> > a minimal number of modules loaded doesn't help, I would then suggest
> > reviewing your kernel config to see if other things can be built as
> > modules and the same logic applied. You can be reasonably sure that it
> > will be a device driver. Common causes of suspend/resume problems from
> > the list you give below are acpi modules, bluetooth and usb. I'd also
> > consider pcmcia, drm and fuse possibilities. But again, go for unloading
> > everything possible in the first instance.
>
> Actually, the reason I sent this is that when I showed the oops/gpf to
> Matthew Garrett at linux.conf.au, he said it looked like a CPU hotplug
> problem and suggested I send it to lkml. BTW, with 2.6.20-rc5, the
> suspend to RAM now works ~95% of the time.

I agree that the second is cpu hotplug, but the first is something else, hence my recommendations above.

Regards,

Nigel
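Nigel's "add modules until you can say 'If I do X, it breaks'" procedure can need many rounds when dozens of modules are loaded. The sketch below swaps in a bisection over prefixes of the load order instead of adding one module at a time. It assumes a single culprit module, that any prefix of the list can be loaded on its own (dependencies are ignored), and that each test run is trustworthy, which for an intermittent failure means many cycles per step. The callback type `suspend_test_fn`, the function `find_culprit` and the stand-in `simulated_test` are hypothetical; the module names used with it are taken from the oops's module list purely as examples.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical test callback: load only the first `count` entries of
 * `mods`, run enough suspend/resume cycles to trust the answer, and
 * return nonzero if every cycle succeeded. */
typedef int (*suspend_test_fn)(const char *const *mods, size_t count, void *ctx);

/* Bisect over prefixes of a module list kept in load order. Assumes the
 * test passes with no modules loaded and that a single module is to
 * blame (adding later modules never hides the failure). Returns the
 * index of the first module whose addition breaks suspend, or n if the
 * full list passes. */
size_t find_culprit(const char *const *mods, size_t n,
                    suspend_test_fn test, void *ctx)
{
    size_t good = 0;   /* longest prefix known (or assumed) to pass */
    size_t bad = n;    /* shortest prefix known to fail */

    if (test(mods, n, ctx))
        return n;

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (test(mods, mid, ctx))
            good = mid;
        else
            bad = mid;
    }
    return bad - 1;    /* prefix of length bad fails, bad - 1 passes */
}

/* Stand-in test for illustration only: "suspend fails" whenever the
 * module named by ctx is among the loaded ones. */
int simulated_test(const char *const *mods, size_t count, void *ctx)
{
    const char *culprit = ctx;

    for (size_t i = 0; i < count; i++)
        if (strcmp(mods[i], culprit) == 0)
            return 0;
    return 1;
}
```

For a list of n modules this needs roughly log2(n) test runs rather than up to n, at the cost of relying on the single-culprit assumption; if two modules only fail in combination, the linear approach Nigel describes is the safer one.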
Re: Suspend to RAM generates oops and general protection fault
>> I just encountered the following oops and general protection fault
>> trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2
>> GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The
>> relevant errors are below but the full dmesg log is at
>> http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in
>> http://people.xiph.org/~jm/config-2.6.20-rc5.txt
...
> It looks like something is stomping on memory it shouldn't be touching,
> so I would suggest testing multiple cycles with a minimal (preferably
> zero) number of modules loaded. If that looks good and reliable, add
> modules & processes until you can say 'If I do X, it breaks.'. If having
> a minimal number of modules loaded doesn't help, I would then suggest
> reviewing your kernel config to see if other things can be built as
> modules and the same logic applied. You can be reasonably sure that it
> will be a device driver. Common causes of suspend/resume problems from
> the list you give below are acpi modules, bluetooth and usb. I'd also
> consider pcmcia, drm and fuse possibilities. But again, go for unloading
> everything possible in the first instance.

Actually, the reason I sent this is that when I showed the oops/gpf to
Matthew Garrett at linux.conf.au, he said it looked like a CPU hotplug
problem and suggested I send it to lkml. BTW, with 2.6.20-rc5, the
suspend to RAM now works ~95% of the time.

Jean-Marc

> Regards,
>
> Nigel
>
>> Cheers,
>>
>> Jean-Marc
>>
>> P.S. This is the same laptop I had at LCA for which Linus told me to
>> disable preemption and try the newest rc version.
>>
>> [10746.449071] Unable to handle kernel NULL pointer dereference at
>> 0038 RIP:
>> [10746.449080] [] iput+0x18/0x80
>> [10746.449092] PGD 3a607067 PUD 27b20067 PMD 0
>> [10746.449099] Oops: [1] SMP
>> [10746.449104] CPU 0
>> [10746.449107] Modules linked in: psmouse battery ac thermal fan button
>> ipw3945 ieee80211 tg3 arc4 ecb blkcipher ieee80211_crypt_wep
>> ieee80211_crypt binfmt_misc rfcomm l2cap bluetooth i915 drm
>> speedstep_centrino cpufreq_userspace cpufreq_powersave cpufreq_ondemand
>> cpufreq_stats freq_table cpufreq_conservative video sbs i2c_ec dock
>> asus_acpi backlight container ipv6 fuse sbp2 af_packet parport_pc lp
>> parport sg sr_mod cdrom snd_hda_intel snd_hda_codec tsdev snd_pcm_oss
>> snd_mixer_oss pcmcia snd_pcm snd_timer ata_generic snd shpchp
>> pci_hotplug soundcore snd_page_alloc serio_raw yenta_socket
>> rsrc_nonstatic pcmcia_core pcspkr evdev ext3 jbd mbcache ohci1394
>> ehci_hcd ieee1394 ide_generic uhci_hcd usbcore generic sd_mod processor
>> [10746.449190] Pid: 218, comm: kswapd0 Not tainted 2.6.20-rc5-x86-64 #1
>> [10746.449196] RIP: 0010:[] []
>> iput+0x18/0x80
>> [10746.449206] RSP: :810037f2dd50 EFLAGS: 00010283
>> [10746.449212] RAX: RBX: 8103fcf0 RCX:
>> 8103fd20
>> [10746.449219] RDX: 0001 RSI: 0286 RDI:
>> 8103fcf0
>> [10746.449225] RBP: 0042 R08: R09:
>>
>> [10746.449232] R10: 28f5c28f5c28f5c3 R11: 8023ae90 R12:
>>
>> [10746.449239] R13: 810075721c70 R14: 805fa940 R15:
>>
>> [10746.449246] FS: () GS:8058e000()
>> knlGS:
>> [10746.449253] CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
>> [10746.449259] CR2: 0038 CR3: 1207f000 CR4:
>> 06e0
>> [10746.449265] Process kswapd0 (pid: 218, threadinfo 810037f2c000,
>> task 810037a1b760)
>> [10746.449269] Stack: 811ce2f0 802ddaf8
>> 811ce3c0 811ce2f0
>> [10746.449280] 0042 8022f645 810037f2dd80
>> 0001cb60
>> [10746.449288] 0090 81007daa0e00 00d0
>> 802ddb49
>> [10746.449296] Call Trace:
>> [10746.449305] [] prune_one_dentry+0x68/0xa0
>> [10746.449314] [] prune_dcache+0x145/0x1e0
>> [10746.449323] [] shrink_dcache_memory+0x19/0x50
>> [10746.449331] [] shrink_slab+0x117/0x190
>> [10746.449342] [] kswapd+0x382/0x4e0
>> [10746.449356] [] autoremove_wake_function+0x0/0x30
>> [10746.449370] [] kswapd+0x0/0x4e0
>> [10746.449376] [] keventd_create_kthread+0x0/0x90
>> [10746.449383] [] kthread+0xd9/0x120
>> [10746.449394] [] child_rip+0xa/0x12
>> [10746.449401] [] keventd_create_kthread+0x0/0x90
>> [10746.449414] [] kthread+0x0/0x120
>> [10746.449421] [] child_rip+0x0/0x12
>> [10746.449426]
>> [10746.449429]
>> [10746.449430] Code: 48 8b 40 38 75 04 0f 0b eb fe 48 85 c0 74 0b 48 8b
>> 40 28 48
>> [10746.449449] RIP [] iput+0x18/0x80
>> [10746.449456] RSP
>> [10746.449460] CR2: 0038
>> [10746.449463] ACPI Exception (pci_bind-0299): AE_NOT_FOUND, Unable to
>> get data from device DCKS [20060707]
>>
>>
>> and later:
>>
>>
>> [3.668009] SMP alternatives: switching to SMP code
>> [3.668168] Booting
Re: Suspend to RAM generates oops and general protection fault
Hi. On Mon, 2007-01-22 at 13:34 +1100, Jean-Marc Valin wrote: > Hi, > > I just encountered the following oops and general protection fault > trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2 > GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The > relevant errors are below but the full dmesg log is at > http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in > http://people.xiph.org/~jm/config-2.6.20-rc5.txt > > This happens when I'm running 2.6.20-rc5. The previous kernel version I > was using is 2.6.19-rc6 and was much more broken (second attempt > *always* failed), so it's probably not a regression. A second attempt always failing usually indicates that a driver was dazed and confused after the first cycle and properly killed by the second attempt, usually because of a lack of [proper] power management code. Between any two versions, some things can be fixed, some things can be broken and some things can become broken in different ways, so your different experience with 2.6.20-rc5 doesn't necessarily mean that this is a different issue. It looks like something is stomping on memory it shouldn't be touching, so I would suggest testing multiple cycles with a minimal (preferably zero) number of modules loaded. If that looks good and reliable, add modules & processes until you can say 'If I do X, it breaks.'. If having a minimal number of modules loaded doesn't help, I would then suggest reviewing your kernel config to see if other things can be built as modules and the same logic applied. You can be reasonably sure that it will be a device driver. Common causes of suspend/resume problems from the list you give below are acpi modules, bluetooth and usb. I'd also be consider pcmcia, drm and fuse possibilities. But again, go for unloading everything possible in the first instance. Regards, Nigel > Cheers, > > Jean-Marc > > P.S. 
This is the same laptop I had at LCA for which Linus told me to > disable preemption and try the newest rc version. > > [10746.449071] Unable to handle kernel NULL pointer dereference at > 0038 RIP: > [10746.449080] [] iput+0x18/0x80 > [10746.449092] PGD 3a607067 PUD 27b20067 PMD 0 > [10746.449099] Oops: [1] SMP > [10746.449104] CPU 0 > [10746.449107] Modules linked in: psmouse battery ac thermal fan button > ipw3945 ieee80211 tg3 arc4 ecb blkcipher ieee80211_crypt_wep > ieee80211_crypt binfmt_misc rfcomm l2cap bluetooth i915 drm > speedstep_centrino cpufreq_userspace cpufreq_powersave cpufreq_ondemand > cpufreq_stats freq_table cpufreq_conservative video sbs i2c_ec dock > asus_acpi backlight container ipv6 fuse sbp2 af_packet parport_pc lp > parport sg sr_mod cdrom snd_hda_intel snd_hda_codec tsdev snd_pcm_oss > snd_mixer_oss pcmcia snd_pcm snd_timer ata_generic snd shpchp > pci_hotplug soundcore snd_page_alloc serio_raw yenta_socket > rsrc_nonstatic pcmcia_core pcspkr evdev ext3 jbd mbcache ohci1394 > ehci_hcd ieee1394 ide_generic uhci_hcd usbcore generic sd_mod processor > [10746.449190] Pid: 218, comm: kswapd0 Not tainted 2.6.20-rc5-x86-64 #1 > [10746.449196] RIP: 0010:[] [] > iput+0x18/0x80 > [10746.449206] RSP: :810037f2dd50 EFLAGS: 00010283 > [10746.449212] RAX: RBX: 8103fcf0 RCX: > 8103fd20 > [10746.449219] RDX: 0001 RSI: 0286 RDI: > 8103fcf0 > [10746.449225] RBP: 0042 R08: R09: > > [10746.449232] R10: 28f5c28f5c28f5c3 R11: 8023ae90 R12: > > [10746.449239] R13: 810075721c70 R14: 805fa940 R15: > > [10746.449246] FS: () GS:8058e000() > knlGS: > [10746.449253] CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b > [10746.449259] CR2: 0038 CR3: 1207f000 CR4: > 06e0 > [10746.449265] Process kswapd0 (pid: 218, threadinfo 810037f2c000, > task 810037a1b760) > [10746.449269] Stack: 811ce2f0 802ddaf8 > 811ce3c0 811ce2f0 > [10746.449280] 0042 8022f645 810037f2dd80 > 0001cb60 > [10746.449288] 0090 81007daa0e00 00d0 > 802ddb49 > [10746.449296] Call Trace: > [10746.449305] [] 
prune_one_dentry+0x68/0xa0 > [10746.449314] [] prune_dcache+0x145/0x1e0 > [10746.449323] [] shrink_dcache_memory+0x19/0x50 > [10746.449331] [] shrink_slab+0x117/0x190 > [10746.449342] [] kswapd+0x382/0x4e0 > [10746.449356] [] autoremove_wake_function+0x0/0x30 > [10746.449370] [] kswapd+0x0/0x4e0 > [10746.449376] [] keventd_create_kthread+0x0/0x90 > [10746.449383] [] kthread+0xd9/0x120 > [10746.449394] [] child_rip+0xa/0x12 > [10746.449401] [] keventd_create_kthread+0x0/0x90 > [10746.449414] [] kthread+0x0/0x120 > [10746.449421] [] child_rip+0x0/0x12 > [10746.449426] > [10746.449429] > [10746.449430] Code: 48 8b 40 38 75 04 0f 0b eb fe 48 85 c0 74 0b 48 8b >
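The "start minimal, then add modules until you can say 'If I do X, it breaks'" procedure suggested above is essentially an incremental search. The sketch below is a hypothetical helper, not a kernel or distribution tool: `breaks` is an assumed callback that would, for example, load the given set of modules and run several suspend/resume cycles, returning True if any cycle fails. It returns the first module whose addition turns a passing set into a failing one, which may only be the trigger of an interaction with modules loaded earlier.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Sequence


def first_breaking_module(
    candidates: Sequence[str],
    breaks: Callable[[FrozenSet[str]], bool],
    baseline: Iterable[str] = (),
) -> Optional[str]:
    """Add candidate modules one at a time on top of a baseline set and
    return the first one whose addition makes `breaks` report a failure.

    `breaks` is a hypothetical test callback (e.g. "load these modules,
    suspend/resume N times, report whether anything went wrong").
    """
    loaded = set(baseline)
    # The minimal configuration must be reliable before adding anything.
    if breaks(frozenset(loaded)):
        raise RuntimeError("baseline already fails; reduce it further first")
    for module in candidates:
        loaded.add(module)
        if breaks(frozenset(loaded)):
            return module  # adding this module made the test fail
    return None  # every candidate loaded without reproducing the failure
```

Because the failures reported in this thread are intermittent, a single passing run of `breaks` says little; in practice each call would need to cover many cycles.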
Suspend to RAM generates oops and general protection fault
Hi, I just encountered the following oops and general protection fault trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2 GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The relevant errors are below but the full dmesg log is at http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in http://people.xiph.org/~jm/config-2.6.20-rc5.txt This happens when I'm running 2.6.20-rc5. The previous kernel version I was using is 2.6.19-rc6 and was much more broken (second attempt *always* failed), so it's probably not a regression. Cheers, Jean-Marc P.S. This is the same laptop I had at LCA for which Linus told me to disable preemption and try the newest rc version. [10746.449071] Unable to handle kernel NULL pointer dereference at 0038 RIP: [10746.449080] [] iput+0x18/0x80 [10746.449092] PGD 3a607067 PUD 27b20067 PMD 0 [10746.449099] Oops: [1] SMP [10746.449104] CPU 0 [10746.449107] Modules linked in: psmouse battery ac thermal fan button ipw3945 ieee80211 tg3 arc4 ecb blkcipher ieee80211_crypt_wep ieee80211_crypt binfmt_misc rfcomm l2cap bluetooth i915 drm speedstep_centrino cpufreq_userspace cpufreq_powersave cpufreq_ondemand cpufreq_stats freq_table cpufreq_conservative video sbs i2c_ec dock asus_acpi backlight container ipv6 fuse sbp2 af_packet parport_pc lp parport sg sr_mod cdrom snd_hda_intel snd_hda_codec tsdev snd_pcm_oss snd_mixer_oss pcmcia snd_pcm snd_timer ata_generic snd shpchp pci_hotplug soundcore snd_page_alloc serio_raw yenta_socket rsrc_nonstatic pcmcia_core pcspkr evdev ext3 jbd mbcache ohci1394 ehci_hcd ieee1394 ide_generic uhci_hcd usbcore generic sd_mod processor [10746.449190] Pid: 218, comm: kswapd0 Not tainted 2.6.20-rc5-x86-64 #1 [10746.449196] RIP: 0010:[] [] iput+0x18/0x80 [10746.449206] RSP: :810037f2dd50 EFLAGS: 00010283 [10746.449212] RAX: RBX: 8103fcf0 RCX: 8103fd20 [10746.449219] RDX: 0001 RSI: 0286 RDI: 8103fcf0 [10746.449225] RBP: 0042 R08: R09: [10746.449232] R10: 28f5c28f5c28f5c3 
R11: 8023ae90 R12: [10746.449239] R13: 810075721c70 R14: 805fa940 R15: [10746.449246] FS: () GS:8058e000() knlGS: [10746.449253] CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b [10746.449259] CR2: 0038 CR3: 1207f000 CR4: 06e0 [10746.449265] Process kswapd0 (pid: 218, threadinfo 810037f2c000, task 810037a1b760) [10746.449269] Stack: 811ce2f0 802ddaf8 811ce3c0 811ce2f0 [10746.449280] 0042 8022f645 810037f2dd80 0001cb60 [10746.449288] 0090 81007daa0e00 00d0 802ddb49 [10746.449296] Call Trace: [10746.449305] [] prune_one_dentry+0x68/0xa0 [10746.449314] [] prune_dcache+0x145/0x1e0 [10746.449323] [] shrink_dcache_memory+0x19/0x50 [10746.449331] [] shrink_slab+0x117/0x190 [10746.449342] [] kswapd+0x382/0x4e0 [10746.449356] [] autoremove_wake_function+0x0/0x30 [10746.449370] [] kswapd+0x0/0x4e0 [10746.449376] [] keventd_create_kthread+0x0/0x90 [10746.449383] [] kthread+0xd9/0x120 [10746.449394] [] child_rip+0xa/0x12 [10746.449401] [] keventd_create_kthread+0x0/0x90 [10746.449414] [] kthread+0x0/0x120 [10746.449421] [] child_rip+0x0/0x12 [10746.449426] [10746.449429] [10746.449430] Code: 48 8b 40 38 75 04 0f 0b eb fe 48 85 c0 74 0b 48 8b 40 28 48 [10746.449449] RIP [] iput+0x18/0x80 [10746.449456] RSP [10746.449460] CR2: 0038 [10746.449463] ACPI Exception (pci_bind-0299): AE_NOT_FOUND, Unable to get data from device DCKS [20060707] and later: [3.668009] SMP alternatives: switching to SMP code [3.668168] Booting processor 1/2 APIC 0x1 [4.149691] Initializing CPU#1 [4.229595] Calibrating delay using timer specific routine.. 3990.32 BogoMIPS (lpj=7980654) [4.229602] CPU: L1 I cache: 32K, L1 D cache: 32K [4.229604] CPU: L2 cache: 4096K [4.229606] CPU 1/1 -> Node 0 [4.229608] CPU: Physical Processor ID: 0 [4.229609] CPU: Processor Core ID: 1 [4.230107] Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz stepping 06 [4.233607] CPU 1: Syncing TSC to CPU 0. 
[3.762970] CPU 1: synchronized TSC with CPU 0 (last diff 0 cycles, maxerr 960 cycles) [3.764689] general protection fault: [2] SMP [3.764963] CPU 1 [3.764983] Modules linked in: psmouse battery ac thermal fan button arc4 ecb blkcipher ieee80211_crypt_wep ieee80211_crypt binfmt_misc rfcomm l2cap bluetooth i915 drm speedstep_centrino cpufreq_userspace cpufreq_powersave cpufreq_ondemand cpufreq_stats freq_table cpufreq_conservative video sbs i2c_ec dock asus_acpi backlight container ipv6 fuse sbp2 af_packet parport_pc lp parport sg sr_mod cdrom snd_hda_intel snd_hda_codec tsdev snd_pcm_oss snd_mixer_oss pcmcia snd_pcm snd_timer
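The first oops reads "Unable to handle kernel NULL pointer dereference at 0038" with "CR2: 0038" in iput() (this archive appears to have dropped the leading zeros). On x86-64 the kernel's page-fault handler uses the "NULL pointer dereference" wording when the faulting address lies in the first page, so a small value like 0x38 is typically the offset of a structure member read through a NULL pointer. Which member sits at that offset is not determined here. A minimal sketch of that check, assuming a 4096-byte page:

```python
from typing import Optional

PAGE_SIZE = 4096  # assumed x86-64 base page size


def null_deref_offset(fault_addr_hex: str) -> Optional[int]:
    """Return the faulting address as a byte offset if it falls in the
    first page (consistent with a field access through a NULL pointer),
    otherwise None."""
    addr = int(fault_addr_hex, 16)
    return addr if addr < PAGE_SIZE else None
```

For the CR2 value above, `null_deref_offset("0038")` gives 56 (0x38), whereas a normal kernel address such as the RSP value 810037f2dd50 gives None.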
Re: Suspend to RAM generates oops and general protection fault
I just encountered the following oops and general protection fault trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2 GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The relevant errors are below but the full dmesg log is at http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in http://people.xiph.org/~jm/config-2.6.20-rc5.txt ... It looks like something is stomping on memory it shouldn't be touching, so I would suggest testing multiple cycles with a minimal (preferably zero) number of modules loaded. If that looks good and reliable, add modules & processes until you can say 'If I do X, it breaks.' If having a minimal number of modules loaded doesn't help, I would then suggest reviewing your kernel config to see if other things can be built as modules and the same logic applied. You can be reasonably sure that it will be a device driver. Common causes of suspend/resume problems from the list you give below are acpi modules, bluetooth and usb. I'd also consider pcmcia, drm and fuse possibilities. But again, go for unloading everything possible in the first instance. Actually, the reason I sent this is that when I showed the oops/gpf to Matthew Garrett at linux.conf.au, he said it looked like a CPU hotplug problem and suggested I send it to lkml. BTW, with 2.6.20-rc5, the suspend to RAM now works ~95% of the time. Jean-Marc Regards, Nigel Cheers, Jean-Marc P.S. This is the same laptop I had at LCA for which Linus told me to disable preemption and try the newest rc version.
[10746.449071] Unable to handle kernel NULL pointer dereference at 0038 RIP: [10746.449080] [8022b9c8] iput+0x18/0x80 [10746.449092] PGD 3a607067 PUD 27b20067 PMD 0 [10746.449099] Oops: [1] SMP [10746.449104] CPU 0 [10746.449107] Modules linked in: psmouse battery ac thermal fan button ipw3945 ieee80211 tg3 arc4 ecb blkcipher ieee80211_crypt_wep ieee80211_crypt binfmt_misc rfcomm l2cap bluetooth i915 drm speedstep_centrino cpufreq_userspace cpufreq_powersave cpufreq_ondemand cpufreq_stats freq_table cpufreq_conservative video sbs i2c_ec dock asus_acpi backlight container ipv6 fuse sbp2 af_packet parport_pc lp parport sg sr_mod cdrom snd_hda_intel snd_hda_codec tsdev snd_pcm_oss snd_mixer_oss pcmcia snd_pcm snd_timer ata_generic snd shpchp pci_hotplug soundcore snd_page_alloc serio_raw yenta_socket rsrc_nonstatic pcmcia_core pcspkr evdev ext3 jbd mbcache ohci1394 ehci_hcd ieee1394 ide_generic uhci_hcd usbcore generic sd_mod processor [10746.449190] Pid: 218, comm: kswapd0 Not tainted 2.6.20-rc5-x86-64 #1 [10746.449196] RIP: 0010:[8022b9c8] [8022b9c8] iput+0x18/0x80 [10746.449206] RSP: :810037f2dd50 EFLAGS: 00010283 [10746.449212] RAX: RBX: 8103fcf0 RCX: 8103fd20 [10746.449219] RDX: 0001 RSI: 0286 RDI: 8103fcf0 [10746.449225] RBP: 0042 R08: R09: [10746.449232] R10: 28f5c28f5c28f5c3 R11: 8023ae90 R12: [10746.449239] R13: 810075721c70 R14: 805fa940 R15: [10746.449246] FS: () GS:8058e000() knlGS: [10746.449253] CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b [10746.449259] CR2: 0038 CR3: 1207f000 CR4: 06e0 [10746.449265] Process kswapd0 (pid: 218, threadinfo 810037f2c000, task 810037a1b760) [10746.449269] Stack: 811ce2f0 802ddaf8 811ce3c0 811ce2f0 [10746.449280] 0042 8022f645 810037f2dd80 0001cb60 [10746.449288] 0090 81007daa0e00 00d0 802ddb49 [10746.449296] Call Trace: [10746.449305] [802ddaf8] prune_one_dentry+0x68/0xa0 [10746.449314] [8022f645] prune_dcache+0x145/0x1e0 [10746.449323] [802ddb49] shrink_dcache_memory+0x19/0x50 [10746.449331] [802418a7] shrink_slab+0x117/0x190 
[10746.449342] [8025a392] kswapd+0x382/0x4e0 [10746.449356] [802a13b0] autoremove_wake_function+0x0/0x30 [10746.449370] [8025a010] kswapd+0x0/0x4e0 [10746.449376] [802a11d0] keventd_create_kthread+0x0/0x90 [10746.449383] [802335a9] kthread+0xd9/0x120 [10746.449394] [80260ec8] child_rip+0xa/0x12 [10746.449401] [802a11d0] keventd_create_kthread+0x0/0x90 [10746.449414] [802334d0] kthread+0x0/0x120 [10746.449421] [80260ebe] child_rip+0x0/0x12 [10746.449426] [10746.449429] [10746.449430] Code: 48 8b 40 38 75 04 0f 0b eb fe 48 85 c0 74 0b 48 8b 40 28 48 [10746.449449] RIP [8022b9c8] iput+0x18/0x80 [10746.449456] RSP 810037f2dd50 [10746.449460] CR2: 0038 [10746.449463] ACPI Exception (pci_bind-0299): AE_NOT_FOUND, Unable to get data from device DCKS [20060707]
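With suspend to RAM working "~95% of the time", a handful of clean cycles says little when testing a module set as suggested in the thread. Assuming each cycle fails independently with the same probability (a simplification), the number of cycles n needed to see at least one failure with a given confidence satisfies 1 - (1 - p)^n >= confidence. A small sketch of that arithmetic:

```python
import math


def cycles_needed(p_fail: float, confidence: float) -> int:
    """Smallest n such that n independent cycles, each failing with
    probability p_fail, show at least one failure with probability
    >= confidence, i.e. 1 - (1 - p_fail) ** n >= confidence."""
    if not (0 < p_fail < 1 and 0 < confidence < 1):
        raise ValueError("p_fail and confidence must be in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_fail))
```

With a 5% failure rate, `cycles_needed(0.05, 0.95)` returns 59: after 59 clean cycles there is under a 5% chance (0.95^59 is about 0.048) that a configuration with that failure rate would have gone unnoticed.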
Re: Suspend to RAM generates oops and general protection fault
Hi. On Mon, 2007-01-22 at 16:16 +1100, Jean-Marc Valin wrote: I just encountered the following oops and general protection fault trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2 GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The relevant errors are below but the full dmesg log is at http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in http://people.xiph.org/~jm/config-2.6.20-rc5.txt ... It looks like something is stomping on memory it shouldn't be touching, so I would suggest testing multiple cycles with a minimal (preferably zero) number of modules loaded. If that looks good and reliable, add modules & processes until you can say 'If I do X, it breaks.' If having a minimal number of modules loaded doesn't help, I would then suggest reviewing your kernel config to see if other things can be built as modules and the same logic applied. You can be reasonably sure that it will be a device driver. Common causes of suspend/resume problems from the list you give below are acpi modules, bluetooth and usb. I'd also consider pcmcia, drm and fuse possibilities. But again, go for unloading everything possible in the first instance. Actually, the reason I sent this is that when I showed the oops/gpf to Matthew Garrett at linux.conf.au, he said it looked like a CPU hotplug problem and suggested I send it to lkml. BTW, with 2.6.20-rc5, the suspend to RAM now works ~95% of the time. I agree that the second is cpu hotplug, but the first is something else, hence my recommendations above. Regards, Nigel
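The distinction drawn above can be seen in the logs: on SMP machines, suspend takes the non-boot CPUs offline and resume brings them back through the CPU hotplug code. The general protection fault is reported on CPU 1 right after "Booting processor 1/2", while the NULL pointer oops hit kswapd0 on CPU 0 with no CPU bring-up nearby. The sketch below is a crude, hypothetical log heuristic (the regular expressions are assumptions based on the message formats quoted in this thread, not a kernel tool): it flags faults whose "CPU N" line names the CPU most recently seen being booted.

```python
import re
from typing import List, Optional, Sequence, Tuple

BOOT_RE = re.compile(r"Booting processor (\d+)/")
FAULT_RE = re.compile(r"general protection fault|Unable to handle kernel")
# A bare "CPU N" line, as printed right after the fault header.
CPU_RE = re.compile(r"^\[\s*[\d.]+\]\s+CPU (\d+)\s*$")


def faults_during_cpu_bringup(lines: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (fault_line_index, cpu) for faults reported on the CPU that
    the log most recently showed being booted. Heuristic only: it does not
    check how long ago that CPU came up."""
    booting: Optional[int] = None
    pending_fault: Optional[int] = None
    results: List[Tuple[int, int]] = []
    for i, line in enumerate(lines):
        m = BOOT_RE.search(line)
        if m:
            booting = int(m.group(1))
            continue
        if FAULT_RE.search(line):
            pending_fault = i  # wait for the "CPU N" line that follows
            continue
        m = CPU_RE.match(line)
        if m and pending_fault is not None:
            if int(m.group(1)) == booting:
                results.append((pending_fault, booting))
            pending_fault = None
    return results
```

Fed the resume excerpt ("Booting processor 1/2 APIC 0x1" ... "general protection fault: [2] SMP", "CPU 1"), it flags the GPF. Fed the kswapd0 oops alone, it flags nothing.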