Re: AMD graphics performance regression in 4.15 and later
On 04/11/2018 05:37 AM, Christian König wrote:
>> With your patches my EPYC box is unusable with 4.15++ kernels.
>> The whole Desktop is acting weird. This one is using
>> a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>>
>> Box is 2 * EPYC 7281 with 128 GB ECC RAM
>>
>> Also a 14C Xeon box with a HD7700 is broken same way.
>
> The hardware is irrelevant for this. We need to know what software stack
> you use on top of it.

Well, the hardware appears to be part of the issue too. I don't think
it's a coincidence that Gabriel has the problem on 2xEPYC, I have it on
2xXeon and the previous reporter had it on a Core 2 Quad that internally
has two dies.

I haven't tested your "disable CONFIG_SWIOTLB" fix yet -- might try it
over the weekend and report what happens.

Cheers,

Jean-Marc
Re: AMD graphics performance regression in 4.15 and later
On 04/09/2018 05:42 AM, Christian König wrote:
> Backporting all the detection logic is too invasive, but you could just
> go into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefully use the
> other code path.
>
> Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.

Do you mean just taking the 4.15 code as is and replacing
"#ifdef CONFIG_SWIOTLB" with "#if 0" in
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c, or are you talking about using
a different version of drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c?

Jean-Marc
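For anyone wanting to try the first reading of that suggestion, here is a rough sketch of the text substitution it would amount to. This is only an illustration: the helper name `disable_swiotlb_checks` is made up, it only catches the literal `#ifdef CONFIG_SWIOTLB` form named above (not `#if defined(...)` or similar variants), and nothing in the thread at this point confirms that this is what Christian meant.

```python
import re
from typing import Tuple

# Matches a whole "#ifdef CONFIG_SWIOTLB" directive line (optional
# indentation, optional spaces after '#'). Longer symbols such as
# CONFIG_SWIOTLB_FOO do not match because nothing may follow the name.
SWIOTLB_IFDEF = re.compile(
    r"^([ \t]*)#[ \t]*ifdef[ \t]+CONFIG_SWIOTLB[ \t]*$", re.MULTILINE
)


def disable_swiotlb_checks(source: str) -> Tuple[str, int]:
    """Replace each '#ifdef CONFIG_SWIOTLB' line with '#if 0'.

    Hypothetical helper: returns the patched text and the number of
    directives that were rewritten, so the result can be reviewed.
    """
    return SWIOTLB_IFDEF.subn(r"\1#if 0", source)


if __name__ == "__main__":
    # Made-up sample text, NOT the real contents of amdgpu_ttm.c.
    sample = (
        "setup();\n"
        "#ifdef CONFIG_SWIOTLB\n"
        "use_dma_path();\n"
        "#endif\n"
    )
    patched, count = disable_swiotlb_checks(sample)
    print(count)
    print(patched)
```

Reviewing the count and the resulting diff by hand before building is advisable, since a blanket substitution may also touch checks that should stay.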
Re: AMD graphics performance regression in 4.15 and later
Hi Christian,

Thanks for the info. FYI, I've also opened a Firefox bug for that at:
https://bugzilla.mozilla.org/show_bug.cgi?id=1448778
Feel free to comment since you have a better understanding of what's
going on.

One last question: right now I'm running 4.15.0 with the "offending"
patch reverted. Is that safe to run or are there possible bad
interactions with other changes?

Cheers,

Jean-Marc

On 04/06/2018 01:20 PM, Christian König wrote:
> On 06.04.2018 at 18:42, Jean-Marc Valin wrote:
>> Hi Christian,
>>
>> On 04/09/2018 07:48 AM, Christian König wrote:
>>> On 06.04.2018 at 17:30, Jean-Marc Valin wrote:
>>>> Hi Christian,
>>>>
>>>> Is there a way to turn off these huge pages at boot-time/run-time?
>>> Only at compile time by not setting CONFIG_TRANSPARENT_HUGEPAGE.
>> Any reason why
>> echo never > /sys/kernel/mm/transparent_hugepage/enabled
>> doesn't solve the problem?
>
> Because we unfortunately try to allocate huge pages anyway, we
> unfortunately just fail in 100% of all cases.
>
> That basically gives you both: the extra allocation overhead and the
> still bad throughput.
>
>> Also, I assume that disabling CONFIG_TRANSPARENT_HUGEPAGE will disable
>> them for everything and not just what your patch added, right?
>
> Correct, that's why I wrote that disabling SWIOTLBs might be better.
>
>>>> I'm not sure what you mean by "We mitigated the problem by avoiding the
>>>> slow coherent DMA code path on almost all platforms on newer
>>>> kernels". I tested up to 4.16 and the performance regression is
>>>> just as bad as it is for 4.15.
>>> Indeed 4.16 still doesn't have that. You could use the
>>> amd-staging-drm-next branch or wait for 4.17.
>> Is there a way to pull just that change or are there too many
>> interactions with other changes?
>
> It adds new detection of whether memory allocation needs to be coherent
> or not; that is not something you can easily pull into older versions.
>
>>> That isn't related to the GFX hardware, but to your CPU/motherboard and
>>> whatever else you have in the system.
>> Well, I have an nvidia GPU in the same system (normally only used for
>> CUDA) and if I use it instead of my RX 560 then I'm not seeing any
>> performance issue with 4.15.
>
> That's because you are probably using the Nvidia binary driver which has
> a completely separate code base.
>
>>> Some part of your system needs SWIOTLB and that makes allocating memory
>>> much slower.
>> What would that part be? FTR, I have a complete description of my system
>> at https://jmvalin.dreamwidth.org/15583.html
>>
>> I don't know if it's related, but I can maybe see one thing in common
>> between my machine and the Core 2 Quad from the other bug report and
>> that's the "NUMA part". I have a dual-socket Xeon and (AFAIK) the Core 2
>> Quad is made of two two-core CPUs glued together with little
>> communication between them.
>
> Yeah, that is probably the reason.
>
>>> Intel doesn't use TTM because they don't have dedicated VRAM, but the
>>> open source nvidia driver should be affected as well.
>> I'm using the proprietary nvidia driver (because CUDA). Is that supposed
>> to be affected as well?
>
> No.
>
>>> We already mitigated that problem and I don't see any solution which
>>> will arrive faster than 4.17.
>> Is that supposed to make the slowdown unnoticeable or just slightly
>> better?
>
> It completely goes away. The issue with the coherent path is that it
> always tries to allocate the lowest possible memory to make sure that it
> fits into the DMA constraints of all devices in the system.
>
> But since AMD GPUs can handle 40 bits of addresses you would need at least
> 1 TB of memory in the system to trigger that (or a NUMA system where some
> memory is in a low area and some in a high area).
>
> Christian.
>
>>> The only quick workaround I can see is to avoid firefox, chrome for
>>> example is reported to work perfectly fine.
>> Or use an unaffected GPU/driver ;-)
>>
>> Cheers,
>>
>> Jean-Marc
>>
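As a quick sanity check of the "40 bits means 1 TB" figure from Christian's reply, the arithmetic can be sketched as follows. The function names are made up for illustration and this is not how the kernel performs the check; it only shows why a machine with 128 GB of contiguous low memory would stay within a 40-bit limit, while an address at or above 2^40 (for example on a machine whose memory is mapped high) would not.

```python
GPU_ADDRESS_BITS = 40  # figure quoted in the thread for AMD GPUs

GIB = 1 << 30
TIB = 1 << 40


def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 1 << bits


def fits_dma_mask(phys_addr: int, bits: int = GPU_ADDRESS_BITS) -> bool:
    """True if a physical address is directly reachable with `bits` bits.

    Hypothetical helper for illustration only.
    """
    return 0 <= phys_addr < addressable_bytes(bits)


if __name__ == "__main__":
    # 2**40 bytes is exactly 1 TiB.
    print(addressable_bytes(GPU_ADDRESS_BITS) // TIB, "TiB")
    # Last byte of 128 GiB of RAM mapped from address 0.
    print(fits_dma_mask(128 * GIB - 1))
    # First address beyond the 40-bit range.
    print(fits_dma_mask(1 << 40))
```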
Re: AMD graphics performance regression in 4.15 and later
Hi Christian,

On 04/09/2018 07:48 AM, Christian König wrote:
> On 06.04.2018 at 17:30, Jean-Marc Valin wrote:
>> Hi Christian,
>>
>> Is there a way to turn off these huge pages at boot-time/run-time?
>
> Only at compile time by not setting CONFIG_TRANSPARENT_HUGEPAGE.

Any reason why
echo never > /sys/kernel/mm/transparent_hugepage/enabled
doesn't solve the problem?

Also, I assume that disabling CONFIG_TRANSPARENT_HUGEPAGE will disable
them for everything and not just what your patch added, right?

>> I'm not sure what you mean by "We mitigated the problem by avoiding the
>> slow coherent DMA code path on almost all platforms on newer kernels". I
>> tested up to 4.16 and the performance regression is just as bad as it is
>> for 4.15.
>
> Indeed 4.16 still doesn't have that. You could use the
> amd-staging-drm-next branch or wait for 4.17.

Is there a way to pull just that change or are there too many
interactions with other changes?

> That isn't related to the GFX hardware, but to your CPU/motherboard and
> whatever else you have in the system.

Well, I have an nvidia GPU in the same system (normally only used for
CUDA) and if I use it instead of my RX 560 then I'm not seeing any
performance issue with 4.15.

> Some part of your system needs SWIOTLB and that makes allocating memory
> much slower.

What would that part be? FTR, I have a complete description of my system
at https://jmvalin.dreamwidth.org/15583.html

I don't know if it's related, but I can maybe see one thing in common
between my machine and the Core 2 Quad from the other bug report and
that's the "NUMA part". I have a dual-socket Xeon and (AFAIK) the Core 2
Quad is made of two two-core CPUs glued together with little
communication between them.

> Intel doesn't use TTM because they don't have dedicated VRAM, but the
> open source nvidia driver should be affected as well.

I'm using the proprietary nvidia driver (because CUDA). Is that supposed
to be affected as well?

> We already mitigated that problem and I don't see any solution which
> will arrive faster than 4.17.

Is that supposed to make the slowdown unnoticeable or just slightly
better?

> The only quick workaround I can see is to avoid firefox, chrome for
> example is reported to work perfectly fine.

Or use an unaffected GPU/driver ;-)

Cheers,

Jean-Marc
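To confirm what the `echo never` command in the message above actually set, the sysfs file can be read back; the active mode is the one shown in square brackets (for example `always madvise [never]`). A minimal sketch, with made-up helper names; it only reports the system-wide setting and says nothing about whether a given driver honours it.

```python
from pathlib import Path
from typing import Optional

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active mode, which sysfs marks with square brackets."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def current_thp_mode(path: Path = THP_ENABLED) -> Optional[str]:
    """Read the active THP mode, or None if the file does not exist
    (for instance on a kernel built without CONFIG_TRANSPARENT_HUGEPAGE)."""
    try:
        return parse_thp_mode(path.read_text())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print(current_thp_mode())
```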
Re: AMD graphics performance regression in 4.15 and later
Hi Christian,

Is there a way to turn off these huge pages at boot-time/run-time? Right
now the recent kernels are making Firefox pretty much unusable for me.
I've been able to revert the patch from 4.15 but it's not really a
long-term solution.

You mention that the purpose of the patch is to improve performance, but
I haven't actually noticed anything running faster on my system. Is
there any particular test where I'm supposed to see an improvement
compared to 4.14?

I'm not sure what you mean by "We mitigated the problem by avoiding the
slow coherent DMA code path on almost all platforms on newer kernels". I
tested up to 4.16 and the performance regression is just as bad as it is
for 4.15.

Unlike the older hardware reported on kernel bug 198511, the hardware I
have is quite recent (RX 560) and still being sold. I've also confirmed
that neither nvidia (on the same machine) nor intel GPUs (on a less
powerful machine) are affected, so it seems like there's a way to avoid
that slow performance. I'm not saying that what Firefox is doing is
ideal (I don't know what it does and why), but it still seems like
something that should be avoided in the kernel.

Cheers,

Jean-Marc

On 04/06/2018 04:03 AM, Christian König wrote:
> Hi Jean,
>
> yeah, that is a known problem. Using huge pages improves the performance
> because of better TLB usage, but at the cost of higher allocation
> overhead.
>
> What we found is that firefox is doing something rather strange by
> allocating large textures and then just throwing them away again
> immediately.
>
> We mitigated the problem by avoiding the slow coherent DMA code path on
> almost all platforms on newer kernels, but essentially somebody needs to
> figure out why firefox and/or the user space stack is doing this
> constant allocation/freeing of memory.
>
> There is also a bug tracker on bugs.kernel.org about this, but I can't
> find it any more offhand.
>
> Regards,
> Christian.
>
> On 06.04.2018 at 02:30, Jean-Marc Valin wrote:
>> Hi,
>>
>> I noticed a serious graphics performance regression between 4.14 and
>> 4.15. It is most noticeable with Firefox (tried FF57 through FF60) and
>> causes scrolling to be really choppy/sluggish. I've confirmed that the
>> problem is also there on 4.16, while 4.13 works fine.
>>
>> After a bisection, I've narrowed the regression down to this commit:
>>
>> commit 648bc3574716400acc06f99915815f80d9563783
>> Author: Christian König <christian.koe...@amd.com>
>> Date: Thu Jul 6 09:59:43 2017 +0200
>>
>> drm/ttm: add transparent huge page support for DMA allocations v2
>>
>>
>> Some details about my system:
>> Distro: Fedora 27 (up-to-date)
>> Video: MSI Radeon RX 560 AERO
>> CPU: Dual-socket Xeon E5-2640 v4 (20 cores total)
>> RAM: 128 GB ECC
>>
>>
>> As a comparison, when running Firefox with 4.15 on a Lenovo W540 laptop
>> (with Intel graphics only) the responsiveness is much better than what
>> I'm getting on the Xeon machine above with the Radeon card, so this
>> really seems to be an AMD-only issue.
>>
>> Any way to fix the issue?
>>
>> Thanks,
>>
>> Jean-Marc
>> ___
>> dri-devel mailing list
>> dri-de...@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
AMD graphics performance regression in 4.15 and later
Hi,

I noticed a serious graphics performance regression between 4.14 and
4.15. It is most noticeable with Firefox (tried FF57 through FF60) and
causes scrolling to be really choppy/sluggish. I've confirmed that the
problem is also there on 4.16, while 4.13 works fine.

After a bisection, I've narrowed the regression down to this commit:

commit 648bc3574716400acc06f99915815f80d9563783
Author: Christian König
Date: Thu Jul 6 09:59:43 2017 +0200

drm/ttm: add transparent huge page support for DMA allocations v2


Some details about my system:
Distro: Fedora 27 (up-to-date)
Video: MSI Radeon RX 560 AERO
CPU: Dual-socket Xeon E5-2640 v4 (20 cores total)
RAM: 128 GB ECC


As a comparison, when running Firefox with 4.15 on a Lenovo W540 laptop
(with Intel graphics only) the responsiveness is much better than what
I'm getting on the Xeon machine above with the Radeon card, so this
really seems to be an AMD-only issue.

Any way to fix the issue?

Thanks,

Jean-Marc
Re: Suspend to RAM generates oops and general protection fault
Hi,

Sorry I haven't replied recently about that bug, but I have to admit I
have no idea where to start. There actually seem to be much more
fundamental problems with the kernel on my machines. I initially
realised that even without using suspend to RAM, I was still getting
crashes when docking. So I stopped docking and realised my machine would
sometimes just crash when I plug/unplug the AC adaptor. Just to give an
idea, I've experienced about 10-15 crashes in the past two months -- I
don't think I've even done a single clean shutdown during that period.

To make things worse, the behaviour is always different. Sometimes I get
a panic with keyboard LEDs flashing. Sometimes I get nothing at all and
the machine is just frozen (doesn't respond to pings or to Alt-SysRq
commands). Sometimes, I just lose my keyboard and/or mouse but the
machine stays up.

I'm running a vanilla 2.6.20 kernel (not tainted) with the following
configuration: http://jmspeex.livejournal.com/1090.html

Jean-Marc
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Suspend to RAM generates oops and general protection fault
Luming Yu wrote:
> what about removing psmouse module?

Trying that now. Any particular reason you suspect that one?

Jean-Marc

> On 1/23/07, Jean-Marc Valin <[EMAIL PROTECTED]> wrote:
>> >>> will be a device driver. Common causes of suspend/resume problems from
>> >>> the list you give below are acpi modules, bluetooth and usb. I'd also
>> >>> consider pcmcia, drm and fuse possibilities. But again, go for unloading
>> >>> everything possible in the first instance.
>> >> Actually, the reason I sent this is that when I showed the oops/gpf to
>> >> Matthew Garrett at linux.conf.au, he said it looked like a CPU hotplug
>> >> problem and suggested I send it to lkml. BTW, with 2.6.20-rc5, the
>> >> suspend to RAM now works ~95% of the time.
>> >
>> > Try a kernel without CONFIG_SMP... that will verify if it is SMP
>> > related.
>>
>> Well, this happens to be my main work machine, which I'm not willing to
>> have running at half speed for several weeks. Anything else you can
>> suggest?
>>
>> Jean-Marc
Re: Suspend to RAM generates oops and general protection fault
>>> will be a device driver. Common causes of suspend/resume problems from
>>> the list you give below are acpi modules, bluetooth and usb. I'd also
>>> consider pcmcia, drm and fuse possibilities. But again, go for unloading
>>> everything possible in the first instance.
>> Actually, the reason I sent this is that when I showed the oops/gpf to
>> Matthew Garrett at linux.conf.au, he said it looked like a CPU hotplug
>> problem and suggested I send it to lkml. BTW, with 2.6.20-rc5, the
>> suspend to RAM now works ~95% of the time.
>
> Try a kernel without CONFIG_SMP... that will verify if it is SMP
> related.

Well, this happens to be my main work machine, which I'm not willing to
have running at half speed for several weeks. Anything else you can
suggest?

Jean-Marc
Re: Suspend to RAM generates oops and general protection fault
>> I just encountered the following oops and general protection fault
>> trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2
>> GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The
>> relevant errors are below but the full dmesg log is at
>> http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in
>> http://people.xiph.org/~jm/config-2.6.20-rc5.txt
>>
>> This happens when I'm running 2.6.20-rc5. The previous kernel version I
>> was using is 2.6.19-rc6 and was much more broken (second attempt
>> *always* failed), so it's probably not a regression.
>
> This is a shot against the odds, but could you please check if the attached
> patch has any effect?

Thanks, I'll try that. It may take a while because the problem only
happened once in dozens of suspend/resume cycles.

	Jean-Marc

> Rafael
>
>
> Both process_zones() and drain_node_pages() check for populated zones before
> touching pagesets. However, __drain_pages does not do so.
>
> This may result in a NULL pointer dereference for pagesets in unpopulated
> zones if a NUMA setup is combined with cpu hotplug.
>
> Initially the unpopulated zone has the pcp pointers pointing to the boot
> pagesets. Since the zone is not populated the boot pageset pointers will
> not be changed during page allocator and slab bootstrap.
>
> If a cpu is later brought down (first call to __drain_pages()) then the pcp
> pointers for cpus in unpopulated zones are set to NULL since __drain_pages
> does not first check for an unpopulated zone.
>
> If the cpu is then brought up again then we call process_zones() which will
> ignore the unpopulated zone. So the pageset pointers will still be NULL.
>
> If the cpu is then again brought down then __drain_pages will attempt to drain
> pages by following the NULL pageset pointer for unpopulated zones.
>
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
>
> ---
>  mm/page_alloc.c |    3 +++
>  1 file changed, 3 insertions(+)
>
> Index: linux-2.6.20-rc4/mm/page_alloc.c
> ===
> --- linux-2.6.20-rc4.orig/mm/page_alloc.c
> +++ linux-2.6.20-rc4/mm/page_alloc.c
> @@ -714,6 +714,9 @@ static void __drain_pages(unsigned int c
>  		if (!populated_zone(zone))
>  			continue;
>
> +		if (!populated_zone(zone))
> +			continue;
> +
>  		pset = zone_pcp(zone, cpu);
>  		for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) {
>  			struct per_cpu_pages *pcp;
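The failure sequence in the patch description can be traced with a toy model. This is a sketch that follows the wording of the description only; it is not kernel code, and the names `Zone`, `process_zones`, `drain_pages` and `hotplug_cycle` are illustrative stand-ins. A `None` value stands in for a NULL pageset pointer, and the model assumes that draining a zone always clears that zone's pointer for the CPU.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

BOOT_PAGESET = "boot_pageset"


@dataclass
class Zone:
    name: str
    populated: bool
    # Per-CPU pageset "pointer"; None models NULL.
    pageset: Dict[int, Optional[str]] = field(default_factory=dict)


def process_zones(zones: List[Zone], cpu: int) -> None:
    """CPU coming up: set up pagesets, skipping unpopulated zones."""
    for zone in zones:
        if not zone.populated:
            continue
        zone.pageset[cpu] = f"pageset({zone.name}, cpu{cpu})"


def drain_pages(zones: List[Zone], cpu: int, check_populated: bool) -> None:
    """CPU going down: follow the pageset pointer, then clear it."""
    for zone in zones:
        if check_populated and not zone.populated:
            continue  # the check the patch adds
        if zone.pageset[cpu] is None:
            raise RuntimeError(f"NULL pageset dereference in zone {zone.name}")
        zone.pageset[cpu] = None


def hotplug_cycle(check_populated: bool) -> str:
    """Bootstrap, CPU down, CPU up, CPU down again; report what happens."""
    cpu = 1
    zones = [Zone("populated", True), Zone("unpopulated", False)]
    for zone in zones:
        zone.pageset[cpu] = BOOT_PAGESET  # initial state for every zone
    process_zones(zones, cpu)  # bootstrap leaves the unpopulated zone alone
    try:
        drain_pages(zones, cpu, check_populated)  # first CPU down
        process_zones(zones, cpu)                 # CPU up again
        drain_pages(zones, cpu, check_populated)  # second CPU down
    except RuntimeError as exc:
        return str(exc)
    return "ok"


if __name__ == "__main__":
    print("without check:", hotplug_cycle(check_populated=False))
    print("with check:   ", hotplug_cycle(check_populated=True))
```

In this model the first CPU-down succeeds either way; only the second one trips over the cleared pointer, which matches the report that the first suspend/resume worked and later ones failed.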
Re: Suspend to RAM generates oops and general protection fault
>> I just encountered the following oops and general protection fault
>> trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2
>> GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The
>> relevant errors are below but the full dmesg log is at
>> http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in
>> http://people.xiph.org/~jm/config-2.6.20-rc5.txt

...

> It looks like something is stomping on memory it shouldn't be touching,
> so I would suggest testing multiple cycles with a minimal (preferably
> zero) number of modules loaded. If that looks good and reliable, add
> modules & processes until you can say 'If I do X, it breaks.'. If having
> a minimal number of modules loaded doesn't help, I would then suggest
> reviewing your kernel config to see if other things can be built as
> modules and the same logic applied. You can be reasonably sure that it
> will be a device driver. Common causes of suspend/resume problems from
> the list you give below are acpi modules, bluetooth and usb. I'd also
> consider pcmcia, drm and fuse possibilities. But again, go for unloading
> everything possible in the first instance.

Actually, the reason I sent this is that when I showed the oops/gpf to
Matthew Garrett at linux.conf.au, he said it looked like a CPU hotplug
problem and suggested I send it to lkml. BTW, with 2.6.20-rc5, the
suspend to RAM now works ~95% of the time.

	Jean-Marc

> Regards,
>
> Nigel
>
>> Cheers,
>>
>> Jean-Marc
>>
>> P.S. This is the same laptop I had at LCA for which Linus told me to
>> disable preemption and try the newest rc version.
>>
>> [10746.449071] Unable to handle kernel NULL pointer dereference at 0038 RIP:
>> [10746.449080] [8022b9c8] iput+0x18/0x80
>> [10746.449092] PGD 3a607067 PUD 27b20067 PMD 0
>> [10746.449099] Oops: [1] SMP
>> [10746.449104] CPU 0
>> [10746.449107] Modules linked in: psmouse battery ac thermal fan button
>> ipw3945 ieee80211 tg3 arc4 ecb blkcipher ieee80211_crypt_wep
>> ieee80211_crypt binfmt_misc rfcomm l2cap bluetooth i915 drm
>> speedstep_centrino cpufreq_userspace cpufreq_powersave cpufreq_ondemand
>> cpufreq_stats freq_table cpufreq_conservative video sbs i2c_ec dock
>> asus_acpi backlight container ipv6 fuse sbp2 af_packet parport_pc lp
>> parport sg sr_mod cdrom snd_hda_intel snd_hda_codec tsdev snd_pcm_oss
>> snd_mixer_oss pcmcia snd_pcm snd_timer ata_generic snd shpchp
>> pci_hotplug soundcore snd_page_alloc serio_raw yenta_socket
>> rsrc_nonstatic pcmcia_core pcspkr evdev ext3 jbd mbcache ohci1394
>> ehci_hcd ieee1394 ide_generic uhci_hcd usbcore generic sd_mod processor
>> [10746.449190] Pid: 218, comm: kswapd0 Not tainted 2.6.20-rc5-x86-64 #1
>> [10746.449196] RIP: 0010:[8022b9c8] [8022b9c8] iput+0x18/0x80
>> [10746.449206] RSP: :810037f2dd50 EFLAGS: 00010283
>> [10746.449212] RAX: RBX: 8103fcf0 RCX: 8103fd20
>> [10746.449219] RDX: 0001 RSI: 0286 RDI: 8103fcf0
>> [10746.449225] RBP: 0042 R08: R09:
>> [10746.449232] R10: 28f5c28f5c28f5c3 R11: 8023ae90 R12:
>> [10746.449239] R13: 810075721c70 R14: 805fa940 R15:
>> [10746.449246] FS: () GS:8058e000() knlGS:
>> [10746.449253] CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
>> [10746.449259] CR2: 0038 CR3: 1207f000 CR4: 06e0
>> [10746.449265] Process kswapd0 (pid: 218, threadinfo 810037f2c000, task 810037a1b760)
>> [10746.449269] Stack: 811ce2f0 802ddaf8 811ce3c0 811ce2f0
>> [10746.449280] 0042 8022f645 810037f2dd80 0001cb60
>> [10746.449288] 0090 81007daa0e00 00d0 802ddb49
>> [10746.449296] Call Trace:
>> [10746.449305] [802ddaf8] prune_one_dentry+0x68/0xa0
>> [10746.449314] [8022f645] prune_dcache+0x145/0x1e0
>> [10746.449323] [802ddb49] shrink_dcache_memory+0x19/0x50
>> [10746.449331] [802418a7] shrink_slab+0x117/0x190
>> [10746.449342] [8025a392] kswapd+0x382/0x4e0
>> [10746.449356] [802a13b0] autoremove_wake_function+0x0/0x30
>> [10746.449370] [8025a010] kswapd+0x0/0x4e0
>> [10746.449376] [802a11d0] keventd_create_kthread+0x0/0x90
>> [10746.449383] [802335a9] kthread+0xd9/0x120
>> [10746.449394] [80260ec8] child_rip+0xa/0x12
>> [10746.449401] [802a11d0] keventd_create_kthread+0x0/0x90
>> [10746.449414] [802334d0] kthread+0x0/0x120
>> [10746.449421] [80260ebe] child_rip+0x0/0x12
>> [10746.449426]
>> [10746.449429]
>> [10746.449430] Code: 48 8b 40 38 75 04 0f 0b eb fe 48 85 c0 74 0b 48 8b 40 28 48
>> [10746.449449] RIP [8022b9c8] iput+0x18/0x80
>> [10746.449456] RSP 810037f2dd50
>> [10746.449460] CR2: 0038
>> [10746.449463] ACPI Exception (pci_bind-0299): AE_NOT_FOUND, Unable to
>> get data from device DCKS [20060707]
>>
>>
>> and later:
>>
>>
>> [3.668009] SMP alternatives: switching to SMP code
>> [3.668168] Booting
Suspend to RAM generates oops and general protection fault
Hi,

I just encountered the following oops and general protection fault
trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2
GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The
relevant errors are below but the full dmesg log is at
http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in
http://people.xiph.org/~jm/config-2.6.20-rc5.txt

This happens when I'm running 2.6.20-rc5. The previous kernel version I
was using is 2.6.19-rc6 and was much more broken (second attempt
*always* failed), so it's probably not a regression.

Cheers,

	Jean-Marc

P.S. This is the same laptop I had at LCA for which Linus told me to
disable preemption and try the newest rc version.

[10746.449071] Unable to handle kernel NULL pointer dereference at 0038 RIP:
[10746.449080] [8022b9c8] iput+0x18/0x80
[10746.449092] PGD 3a607067 PUD 27b20067 PMD 0
[10746.449099] Oops: [1] SMP
[10746.449104] CPU 0
[10746.449107] Modules linked in: psmouse battery ac thermal fan button
ipw3945 ieee80211 tg3 arc4 ecb blkcipher ieee80211_crypt_wep
ieee80211_crypt binfmt_misc rfcomm l2cap bluetooth i915 drm
speedstep_centrino cpufreq_userspace cpufreq_powersave cpufreq_ondemand
cpufreq_stats freq_table cpufreq_conservative video sbs i2c_ec dock
asus_acpi backlight container ipv6 fuse sbp2 af_packet parport_pc lp
parport sg sr_mod cdrom snd_hda_intel snd_hda_codec tsdev snd_pcm_oss
snd_mixer_oss pcmcia snd_pcm snd_timer ata_generic snd shpchp
pci_hotplug soundcore snd_page_alloc serio_raw yenta_socket
rsrc_nonstatic pcmcia_core pcspkr evdev ext3 jbd mbcache ohci1394
ehci_hcd ieee1394 ide_generic uhci_hcd usbcore generic sd_mod processor
[10746.449190] Pid: 218, comm: kswapd0 Not tainted 2.6.20-rc5-x86-64 #1
[10746.449196] RIP: 0010:[8022b9c8] [8022b9c8] iput+0x18/0x80
[10746.449206] RSP: :810037f2dd50 EFLAGS: 00010283
[10746.449212] RAX: RBX: 8103fcf0 RCX: 8103fd20
[10746.449219] RDX: 0001 RSI: 0286 RDI: 8103fcf0
[10746.449225] RBP: 0042 R08: R09:
[10746.449232] R10: 28f5c28f5c28f5c3 R11: 8023ae90 R12:
[10746.449239] R13: 810075721c70 R14: 805fa940 R15:
[10746.449246] FS: () GS:8058e000() knlGS:
[10746.449253] CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
[10746.449259] CR2: 0038 CR3: 1207f000 CR4: 06e0
[10746.449265] Process kswapd0 (pid: 218, threadinfo 810037f2c000, task 810037a1b760)
[10746.449269] Stack: 811ce2f0 802ddaf8 811ce3c0 811ce2f0
[10746.449280] 0042 8022f645 810037f2dd80 0001cb60
[10746.449288] 0090 81007daa0e00 00d0 802ddb49
[10746.449296] Call Trace:
[10746.449305] [802ddaf8] prune_one_dentry+0x68/0xa0
[10746.449314] [8022f645] prune_dcache+0x145/0x1e0
[10746.449323] [802ddb49] shrink_dcache_memory+0x19/0x50
[10746.449331] [802418a7] shrink_slab+0x117/0x190
[10746.449342] [8025a392] kswapd+0x382/0x4e0
[10746.449356] [802a13b0] autoremove_wake_function+0x0/0x30
[10746.449370] [8025a010] kswapd+0x0/0x4e0
[10746.449376] [802a11d0] keventd_create_kthread+0x0/0x90
[10746.449383] [802335a9] kthread+0xd9/0x120
[10746.449394] [80260ec8] child_rip+0xa/0x12
[10746.449401] [802a11d0] keventd_create_kthread+0x0/0x90
[10746.449414] [802334d0] kthread+0x0/0x120
[10746.449421] [80260ebe] child_rip+0x0/0x12
[10746.449426]
[10746.449429]
[10746.449430] Code: 48 8b 40 38 75 04 0f 0b eb fe 48 85 c0 74 0b 48 8b 40 28 48
[10746.449449] RIP [8022b9c8] iput+0x18/0x80
[10746.449456] RSP 810037f2dd50
[10746.449460] CR2: 0038
[10746.449463] ACPI Exception (pci_bind-0299): AE_NOT_FOUND, Unable to get data from device DCKS [20060707]

and later:

[3.668009] SMP alternatives: switching to SMP code
[3.668168] Booting processor 1/2 APIC 0x1
[4.149691] Initializing CPU#1
[4.229595] Calibrating delay using timer specific routine.. 3990.32 BogoMIPS (lpj=7980654)
[4.229602] CPU: L1 I cache: 32K, L1 D cache: 32K
[4.229604] CPU: L2 cache: 4096K
[4.229606] CPU 1/1 -> Node 0
[4.229608] CPU: Physical Processor ID: 0
[4.229609] CPU: Processor Core ID: 1
[4.230107] Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz stepping 06
[4.233607] CPU 1: Syncing TSC to CPU 0.
[3.762970] CPU 1: synchronized TSC with CPU 0 (last diff 0 cycles, maxerr 960 cycles)
[3.764689] general protection fault: [2] SMP
[3.764963] CPU 1
[3.764983] Modules linked in: psmouse battery ac thermal fan button
arc4 ecb blkcipher ieee80211_crypt_wep ieee80211_crypt binfmt_misc
rfcomm l2cap bluetooth i915 drm speedstep_centrino cpufreq_userspace
cpufreq_powersave cpufreq_ondemand cpufreq_stats freq_table
cpufreq_conservative video sbs i2c_ec dock asus_acpi backlight
container ipv6 fuse sbp2 af_packet parport_pc lp parport sg sr_mod
cdrom snd_hda_intel snd_hda_codec tsdev snd_pcm_oss snd_mixer_oss
pcmcia snd_pcm snd_timer
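One way to read the first oops is to compare the faulting address (`CR2: 0038`) with the start of the `Code:` line (`48 8b 40 38`). The sketch below decodes just that one instruction form by hand. Two assumptions are made here and are not confirmed by the thread: that the first bytes shown on the `Code:` line are the faulting instruction (the usual marker did not survive in this archive), and that the base register held zero (the RAX value is blank in the archived dump). `decode_mov_load` and `fault_address` are hypothetical helper names.

```python
from typing import Tuple

# Low eight x86-64 general-purpose registers, indexed by their 3-bit encoding.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load(code: bytes) -> Tuple[str, str, int]:
    """Decode 'REX.W 8B /r' with an 8-bit displacement: mov disp8(%base),%dest.

    Only this single encoding is handled; anything else raises ValueError.
    Returns (dest, base, displacement).
    """
    if len(code) < 4:
        raise ValueError("need at least 4 bytes")
    rex, opcode, modrm, disp = code[0], code[1], code[2], code[3]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W MOV r64, r/m64 instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only the [base + disp8] form without SIB is handled")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REG64[reg], REG64[rm], disp


def fault_address(base_value: int, disp: int) -> int:
    """Effective address touched by the load, wrapped to 64 bits."""
    return (base_value + disp) & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    code_line = "48 8b 40 38 75 04 0f 0b eb fe 48 85 c0 74 0b 48 8b 40 28 48"
    dest, base, disp = decode_mov_load(bytes.fromhex(code_line))
    print(f"mov {disp:#x}(%{base}),%{dest}")
    # Assumption: the base register held 0, i.e. a NULL pointer.
    print(f"fault address with NULL base: {fault_address(0, disp):#06x}")
```

The decoded displacement is 0x38, the same value as CR2, which is what one would expect from a load through a NULL structure pointer at offset 0x38. Which structure and field that is cannot be determined from the dump alone.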
Low latency patches
Hi,

I've recently come across Con Kolivas' isochronous scheduler and Ingo's
RLIMIT_RT_CPU patch. I cannot comment on Ingo's patch, but I've been
using Con's scheduler for a few days and I only have good things to say
about it (latency is as good as running the process as root). The only
thing missing is perhaps a way to enable the feature on a per-user basis
(e.g. enable only for owner of the console), though I'm not sure whether
it goes in kernel or user space.

Are there any plans on merging some of that work? I think it would
really help everyone doing audio (or other real-time stuff) on Linux.

	Jean-Marc

P.S. Please include me in CC, I'm not subscribed.

-- 
Jean-Marc Valin <[EMAIL PROTECTED]>
Université de Sherbrooke
Re: ext3 bug
On Monday, 28 February 2005 at 08:31 -0700, jmerkey wrote:
> I see this problem infrequently on systems that have low memory
> conditions and with heavy swapping. I have not seen it on 2.6.9 but I
> have seen it on 2.6.10.

My machine has 1 GB RAM and I wasn't using much of it at that time (2 GB
free on the swap), so I doubt that's the problem in my case.

	Jean-Marc

-- 
Jean-Marc Valin <[EMAIL PROTECTED]>
Université de Sherbrooke
Re: ext3 bug
> Hmm.. So that error is not FC3 specific, it is present in stock 2.6.10 as
> well. Also - This is on a USB disk, right? If so, the error may re-surface.
> Try upgrading to latest kernel if possible.

It's a USB disk (3.5" IDE + IDE to USB). What has been changed in
2.6.11-rcX?

	Jean-Marc

-- 
Jean-Marc Valin <[EMAIL PROTECTED]>
Université de Sherbrooke
Re: ext3 bug
> Please try stock kernel. 2.6.11-rc3 onwards should be fine. - I saw a similar
> problem while running 2.6.10 kernel from Fedora Core 3. It doesn't happen
> with stock kernels.

I did use a stock 2.6.10 kernel (I said custom in the sense that it wasn't
a Debian kernel). After a reboot, I was able to run fsck on the disk (many,
many errors) and it went fine after.

	Jean-Marc

-- 
Jean-Marc Valin <[EMAIL PROTECTED]>
Université de Sherbrooke
ext3 bug
Hi,

Looks like I ran into an ext3 bug (or at least the log says so). I got a
bunch of messages like:

ext3_free_blocks_sb: aborting transaction: Journal has aborted in
__ext3_journal_get_undo_access<2>EXT3-fs error (device sda2) in
ext3_free_blocks_sb: Journal has aborted
EXT3-fs error (device sda2): ext3_free_blocks: Freeing blocks in system
zones - Block = 228, count = 1

It happened while I was doing an "rm -rf" on a directory. The "rm" gave
a segfault and now I can't unmount the filesystem: umount says "device
is busy", even though lsof reports nothing. The filesystem is on a USB
hard disk. The actual dump is attached. I'm running Debian unstable with
a custom 2.6.10 kernel on a 1.6 GHz Pentium-M.

	Jean-Marc

-- 
Jean-Marc Valin <[EMAIL PROTECTED]>
Université de Sherbrooke

Feb 27 01:15:48 idefix kernel: ------------[ cut here ]------------
Feb 27 01:15:48 idefix kernel: PREEMPT
Feb 27 01:15:48 idefix kernel: Modules linked in: msdos sd_mod udf isofs sr_mod usb_storage scsi_mod joydev usbhid appletalk ax25 ipx radeon ipt_state iptable_filter iptable_mangle iptable_nat ip_conntrack ip_tables ipv6 orinoco_cs orinoco hermes pcmcia lp binfmt_misc af_packet parport_pc parport uhci_hcd pci_hotplug intel_agp agpgart yenta_socket pcmcia_core tg3 snd_intel8x0 snd_ac97_codec ehci_hcd usbcore nls_iso8859_1 nls_cp437 vfat fat ppp_async ppp_generic slhc crc_ccitt snd_pcm_oss tsdev evdev snd_pcm snd_timer snd_page_alloc snd_mixer_oss snd soundcore psmouse thermal fan button ac battery cpufreq_ondemand cpufreq_powersave speedstep_centrino freq_table processor
Feb 27 01:15:48 idefix kernel: CPU: 0
Feb 27 01:15:48 idefix kernel: EIP: 0060:[b01af540] Not tainted VLI
Feb 27 01:15:48 idefix kernel: EFLAGS: 00210286 (2.6.10)
Feb 27 01:15:48 idefix kernel: EIP is at journal_forget+0x1d0/0x220
Feb 27 01:15:48 idefix kernel: eax: 005f ebx: d1f1c000 ecx: b032c7cc edx: b032c7cc
Feb 27 01:15:48 idefix kernel: esi: b8932d48 edi: bb2ad41c ebp: dd668080 esp: d1f1dda0
Feb 27 01:15:48 idefix kernel: ds: 007b es: 007b ss: 0068
Feb 27 01:15:48 idefix kernel: Process rm (pid: 10370, threadinfo=d1f1c000 task=c97f49e0)
Feb 27 01:15:48 idefix kernel: Stack: b02f67e0 b02e1027 b02f445b 04ca b02f4571 be0a5aac b8932d48
Feb 27 01:15:48 idefix kernel:        dfc002b8 b019c940 dfc002b8 b8932d48 e73d7980 b275f400 b8932d48 0006
Feb 27 01:15:48 idefix kernel:        b0aeb448 dfc002b8 be0a5aac b019f028 dfc002b8 be0a5aac b8932d48
Feb 27 01:15:48 idefix kernel: Call Trace:
Feb 27 01:15:48 idefix kernel: [] ext3_forget+0xf0/0x100
Feb 27 01:15:48 idefix kernel: [] ext3_clear_blocks+0x118/0x170
Feb 27 01:15:48 idefix kernel: [] ext3_free_data+0x98/0x150
Feb 27 01:15:48 idefix kernel: [] ext3_free_branches+0xec/0x270
Feb 27 01:15:48 idefix kernel: [] ext3_truncate+0x46b/0x5d0
Feb 27 01:15:48 idefix kernel: [] ext3_mark_iloc_dirty+0x28/0x40
Feb 27 01:15:48 idefix kernel: [] journal_start+0xad/0xe0
Feb 27 01:15:48 idefix kernel: [] __ext3_journal_stop+0x24/0x50
Feb 27 01:15:48 idefix kernel: [] start_transaction+0x29/0x70
Feb 27 01:15:48 idefix kernel: [] ext3_delete_inode+0xc8/0x100
Feb 27 01:15:48 idefix kernel: [] ext3_delete_inode+0x0/0x100
Feb 27 01:15:48 idefix kernel: [] generic_delete_inode+0xa5/0x170
Feb 27 01:15:48 idefix kernel: [] iput+0x63/0x90
Feb 27 01:15:48 idefix kernel: [] sys_unlink+0xd7/0x150
Feb 27 01:15:48 idefix kernel: [] sys_getdents64+0xa0/0xaa
Feb 27 01:15:48 idefix kernel: [] filldir64+0x0/0x100
Feb 27 01:15:48 idefix kernel: [] syscall_call+0x7/0xb
Feb 27 01:15:48 idefix kernel: Code: 2f b0 b8 71 45 2f b0 89 44 24 10 b8 ca 04 00 00 89 44 24 0c b8 5b 44 2f b0 89 44 24 08 b8 27 10 2e b0 89 44 24 04 e8 c0 a6 f6 ff <0f> 0b ca 04 5b 44 2f b0 e9 4d ff ff ff c7 04 24 e0 67 2f b0 b8
Feb 27 01:15:48 idefix kernel: <6>note: rm[10370] exited with preempt_count 2
Feb 27 01:15:48 idefix kernel: [] schedule+0x532/0x540
Feb 27 01:15:48 idefix kernel: [] unmap_page_range+0x53/0x80
Feb 27 01:15:48 idefix kernel: [] unmap_vmas+0x1b6/0x1d0
Feb 27 01:15:48 idefix kernel: [] exit_mmap+0x7d/0x160
Feb 27 01:15:48 idefix kernel: [] mmput+0x37/0xa0
Feb 27 01:15:48 idefix kernel: [] do_exit+0x16f/0x470
Feb 27 01:15:48 idefix kernel: [] do_invalid_op+0x0/0xd0
Feb 27 01:15:48 idefix kernel: [] die+0x18b/0x190
Feb 27 01:15:48 idefix kernel: [] do_invalid_op+0xb2/0xd0
Feb 27 01:15:48 idefix kernel: [] journal_forget+0x1d0/0x220
Feb 27 01:15:48 idefix kernel: [] __wake_up_common+0x41/0x70
Feb 27 01:15:48 idefix kernel: [] release_console_sem+0xbf/0xd0
Feb 27 01:15:48 idefix kernel: [] error_code+0x2b/0x30
Feb 27 01:15:48 idefix kernel: [] journal_forget+0x1d0/0x220
Feb 27 01:15:48 idefix kernel: [] ext3_forget+0xf0/0x100
Feb 27 01:15:48 idefix kernel: [] ext3_clear_blocks+0x118/0x170
Feb 27 01:15:48 idefix kernel: [] ext3_free_data+0x98/0x150
Feb 27 01:15:48 idefix kernel: [] ext3_
ext3 bug
Hi,

Looks like I ran into an ext3 bug (or at least the log says so). I got a bunch of messages like:

ext3_free_blocks_sb: aborting transaction: Journal has aborted in __ext3_journal_get_undo_access<2>EXT3-fs error (device sda2) in ext3_free_blocks_sb: Journal has aborted
EXT3-fs error (device sda2): ext3_free_blocks: Freeing blocks in system zones - Block = 228, count = 1

It happened while I was doing an rm -rf on a directory. The rm segfaulted and now I can't unmount the filesystem: umount says the device is busy, even though lsof reports nothing. The filesystem is on a USB hard disk. The actual dump is attached. I'm running Debian unstable with a custom 2.6.10 kernel on a 1.6 GHz Pentium-M.

Jean-Marc

--
Jean-Marc Valin [EMAIL PROTECTED]
Université de Sherbrooke

Feb 27 01:15:48 idefix kernel: ------------[ cut here ]------------
Feb 27 01:15:48 idefix kernel: PREEMPT
Feb 27 01:15:48 idefix kernel: Modules linked in: msdos sd_mod udf isofs sr_mod usb_storage scsi_mod joydev usbhid appletalk ax25 ipx radeon ipt_state iptable_filter iptable_mangle iptable_nat ip_conntrack ip_tables ipv6 orinoco_cs orinoco hermes pcmcia lp binfmt_misc af_packet parport_pc parport uhci_hcd pci_hotplug intel_agp agpgart yenta_socket pcmcia_core tg3 snd_intel8x0 snd_ac97_codec ehci_hcd usbcore nls_iso8859_1 nls_cp437 vfat fat ppp_async ppp_generic slhc crc_ccitt snd_pcm_oss tsdev evdev snd_pcm snd_timer snd_page_alloc snd_mixer_oss snd soundcore psmouse thermal fan button ac battery cpufreq_ondemand cpufreq_powersave speedstep_centrino freq_table processor
Feb 27 01:15:48 idefix kernel: CPU:    0
Feb 27 01:15:48 idefix kernel: EIP:    0060:[<b01af540>]    Not tainted VLI
Feb 27 01:15:48 idefix kernel: EFLAGS: 00210286   (2.6.10)
Feb 27 01:15:48 idefix kernel: EIP is at journal_forget+0x1d0/0x220
Feb 27 01:15:48 idefix kernel: eax: 005f ebx: d1f1c000 ecx: b032c7cc edx: b032c7cc
Feb 27 01:15:48 idefix kernel: esi: b8932d48 edi: bb2ad41c ebp: dd668080 esp: d1f1dda0
Feb 27 01:15:48 idefix kernel: ds: 007b es: 007b ss: 0068
Feb 27 01:15:48 idefix kernel: Process rm (pid: 10370, threadinfo=d1f1c000 task=c97f49e0)
Feb 27 01:15:48 idefix kernel: Stack: b02f67e0 b02e1027 b02f445b 04ca b02f4571 be0a5aac b8932d48
Feb 27 01:15:48 idefix kernel:        dfc002b8 b019c940 dfc002b8 b8932d48 e73d7980 b275f400 b8932d48 0006
Feb 27 01:15:48 idefix kernel:        b0aeb448 dfc002b8 be0a5aac b019f028 dfc002b8 be0a5aac b8932d48
Feb 27 01:15:48 idefix kernel: Call Trace:
Feb 27 01:15:48 idefix kernel: [<b019c940>] ext3_forget+0xf0/0x100
Feb 27 01:15:48 idefix kernel: [<b019f028>] ext3_clear_blocks+0x118/0x170
Feb 27 01:15:48 idefix kernel: [<b019f118>] ext3_free_data+0x98/0x150
Feb 27 01:15:48 idefix kernel: [<b019f2bc>] ext3_free_branches+0xec/0x270
Feb 27 01:15:48 idefix kernel: [<b019f8ab>] ext3_truncate+0x46b/0x5d0
Feb 27 01:15:48 idefix kernel: [<b01a08b8>] ext3_mark_iloc_dirty+0x28/0x40
Feb 27 01:15:48 idefix kernel: [<b01ae12d>] journal_start+0xad/0xe0
Feb 27 01:15:48 idefix kernel: [<b01a5234>] __ext3_journal_stop+0x24/0x50
Feb 27 01:15:48 idefix kernel: [<b019c9a9>] start_transaction+0x29/0x70
Feb 27 01:15:48 idefix kernel: [<b019cb28>] ext3_delete_inode+0xc8/0x100
Feb 27 01:15:48 idefix kernel: [<b019ca60>] ext3_delete_inode+0x0/0x100
Feb 27 01:15:48 idefix kernel: [<b01726f5>] generic_delete_inode+0xa5/0x170
Feb 27 01:15:48 idefix kernel: [<b01729a3>] iput+0x63/0x90
Feb 27 01:15:48 idefix kernel: [<b0167f27>] sys_unlink+0xd7/0x150
Feb 27 01:15:48 idefix kernel: [<b016ad40>] sys_getdents64+0xa0/0xaa
Feb 27 01:15:48 idefix kernel: [<b016aba0>] filldir64+0x0/0x100
Feb 27 01:15:48 idefix kernel: [<b01030df>] syscall_call+0x7/0xb
Feb 27 01:15:48 idefix kernel: Code: 2f b0 b8 71 45 2f b0 89 44 24 10 b8 ca 04 00 00 89 44 24 0c b8 5b 44 2f b0 89 44 24 08 b8 27 10 2e b0 89 44 24 04 e8 c0 a6 f6 ff <0f> 0b ca 04 5b 44 2f b0 e9 4d ff ff ff c7 04 24 e0 67 2f b0 b8
Feb 27 01:15:48 idefix kernel: <6>note: rm[10370] exited with preempt_count 2
Feb 27 01:15:48 idefix kernel: [<b02d8772>] schedule+0x532/0x540
Feb 27 01:15:48 idefix kernel: [<b0146c53>] unmap_page_range+0x53/0x80
Feb 27 01:15:48 idefix kernel: [<b0146e36>] unmap_vmas+0x1b6/0x1d0
Feb 27 01:15:48 idefix kernel: [<b014b53d>] exit_mmap+0x7d/0x160
Feb 27 01:15:48 idefix kernel: [<b0117617>] mmput+0x37/0xa0
Feb 27 01:15:48 idefix kernel: [<b011c06f>] do_exit+0x16f/0x470
Feb 27 01:15:48 idefix kernel: [<b01046a0>] do_invalid_op+0x0/0xd0
Feb 27 01:15:48 idefix kernel: [<b01042cb>] die+0x18b/0x190
Feb 27 01:15:48 idefix kernel: [<b0104752>] do_invalid_op+0xb2/0xd0
Feb 27 01:15:48 idefix kernel: [<b01af540>] journal_forget+0x1d0/0x220
Feb 27 01:15:48 idefix kernel: [<b01162d1>] __wake_up_common+0x41/0x70
Feb 27 01:15:48 idefix kernel: [<b0119e9f>] release_console_sem+0xbf/0xd0
Feb 27 01:15:48 idefix kernel: [<b0103b17>] error_code+0x2b/0x30
Feb 27 01:15:48 idefix kernel: [<b01af540>] journal_forget+0x1d0/0x220
Feb 27 01:15:48 idefix kernel: [<b019c940
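
For anyone who wants to process the trace above with a script: each frame has the form [address] symbol+offset/size, where offset is the distance into the function and size is the function's length. What follows is only a minimal Python sketch and not part of the original report. The names FRAME_RE and parse_frames are made up for illustration, and the pattern assumes the address may appear bare, wrapped in <...>, or be missing, depending on how the log was archived.

```python
import re

# Matches one oops frame such as "[b019c940] ext3_forget+0xf0/0x100".
# The address may be bare, wrapped in <...>, or absent ("[]").
FRAME_RE = re.compile(
    r"\[<?(?P<addr>[0-9a-f]*)>?\]\s+"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return a list of (address, symbol, offset, size) tuples.

    address is None when the log has lost it; offset and size are ints.
    """
    frames = []
    for m in FRAME_RE.finditer(log_text):
        addr = int(m.group("addr"), 16) if m.group("addr") else None
        frames.append(
            (
                addr,
                m.group("symbol"),
                int(m.group("offset"), 16),
                int(m.group("size"), 16),
            )
        )
    return frames


if __name__ == "__main__":
    # Two frames copied from the call trace in the report.
    sample = (
        "Feb 27 01:15:48 idefix kernel: [b019c940] ext3_forget+0xf0/0x100 "
        "Feb 27 01:15:48 idefix kernel: [b019f028] ext3_clear_blocks+0x118/0x170"
    )
    for addr, symbol, offset, size in parse_frames(sample):
        print(f"{symbol}: offset {offset:#x} into a {size:#x}-byte function")
```

The "EIP is at ..." line has no bracketed address, so the sketch skips it on purpose; only the bracketed call-trace frames are returned.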