Re: Kernel locks up after calling kernel_execve()
Original message
Date: Thu, 15 Nov 2007 08:54:32 +1100
From: Paul Mackerras [EMAIL PROTECTED]
To: Gerhard Pircher [EMAIL PROTECTED]
CC: [EMAIL PROTECTED], linuxppc-dev@ozlabs.org
Subject: Re: Kernel locks up after calling kernel_execve()
> No it's not just for ppc64. We had a patch that went in some time ago to ensure that the M bit was set on various 32-bit platforms because otherwise we got data corruption (due to a small cache in the northbridge not being kept coherent with the processor cache).
Ah, I thought this was due to a CPU erratum, where L2 cache prefetching causes data corruption with the coherent bit set to 0.
Gerhard
___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: Kernel locks up after calling kernel_execve()
On Wed, 2007-11-14 at 10:39 +0100, Gerhard Pircher wrote:
> Yeah, the northbridge hates the M bit! Thus the AmigaOne platform code masks out the CPU_FTR_NEED_COHERENT flag and disables the L2 cache prefetch engines (I don't care about the performance loss). I couldn't find any other code that sets the M bit, except for huge TLB page support, but isn't that only for PPC64?
Right, it's only 64 bits. You've double checked nothing broke the M bit thing? In which case, I don't know what else...
Ben.
Re: Kernel locks up after calling kernel_execve()
Original message
Date: Wed, 14 Nov 2007 10:37:52 +1100
From: Benjamin Herrenschmidt [EMAIL PROTECTED]
To: Gerhard Pircher [EMAIL PROTECTED]
CC: linuxppc-dev@ozlabs.org
Subject: Re: Kernel locks up after calling kernel_execve()
> Add printk's to things :-) It's a UP kernel so there should be no spinlocks anyway. Best is to try to get a 100% repro case and printk your way toward the origin of the problem if you don't have a HW debugger. Unless you manage to sneak in an irq to xmon but if you are totally locked up, you probably can't.
Also, xmon seems to lock up the machine. I was able to activate it through the magic sysrq key, but the machine died afterwards.
> Could also be something you do that your buggy northbridge doesn't like. For example, maybe it dislikes M bit in the hash table and you end up with it set due to other reasons (I know we had changes in this area).
Yeah, the northbridge hates the M bit! Thus the AmigaOne platform code masks out the CPU_FTR_NEED_COHERENT flag and disables the L2 cache prefetch engines (I don't care about the performance loss). I couldn't find any other code that sets the M bit, except for huge TLB page support, but isn't that only for PPC64?
Gerhard
Re: Kernel locks up after calling kernel_execve()
Original message
Date: Wed, 14 Nov 2007 21:04:57 +1100
From: Benjamin Herrenschmidt [EMAIL PROTECTED]
To: Gerhard Pircher [EMAIL PROTECTED]
CC: linuxppc-dev@ozlabs.org
Subject: Re: Kernel locks up after calling kernel_execve()
> On Wed, 2007-11-14 at 10:39 +0100, Gerhard Pircher wrote:
> > Yeah, the northbridge hates the M bit! Thus the AmigaOne platform code masks out the CPU_FTR_NEED_COHERENT flag and disables the L2 cache prefetch engines (I don't care about the performance loss). I couldn't find any other code that sets the M bit, except for huge TLB page support, but isn't that only for PPC64?
> Right, it's only 64 bits. You've double checked nothing broke the M bit thing? In which case, I don't know what else...
Yes, I did. Otherwise the machine dies much earlier in the boot process.
Gerhard
Re: Kernel locks up after calling kernel_execve()
Gerhard Pircher writes:
> Yeah, the northbridge hates the M bit! Thus the AmigaOne platform code
Wow.
> masks out the CPU_FTR_NEED_COHERENT flag and disables the L2 cache prefetch engines (I don't care about the performance loss). I couldn't find any other code that sets the M bit, except for huge TLB page support, but isn't that only for PPC64?
No it's not just for ppc64. We had a patch that went in some time ago to ensure that the M bit was set on various 32-bit platforms because otherwise we got data corruption (due to a small cache in the northbridge not being kept coherent with the processor cache). Look for CPU_FTR_NEED_COHERENT in include/asm-powerpc/cputable.h and arch/powerpc/mm/*.
Paul.
Re: Kernel locks up after calling kernel_execve()
Original message
Date: Sun, 11 Nov 2007 14:55:40 +1100
From: Benjamin Herrenschmidt [EMAIL PROTECTED]
To: Gerhard Pircher [EMAIL PROTECTED]
CC: linuxppc-dev@ozlabs.org
Subject: Re: Kernel locks up after calling kernel_execve()
> On Sat, 2007-11-10 at 18:11 +0100, Gerhard Pircher wrote:
> > Original message. Date: Fri, 09 Nov 2007 18:50:29 +1100 From: Benjamin Herrenschmidt [EMAIL PROTECTED] To: Gerhard Pircher [EMAIL PROTECTED] CC: linuxppc-dev@ozlabs.org Subject: Re: Kernel locks up after calling kernel_execve()
> > Is there a way to debug it without a hardware debugger or can you recommend a cheap hardware debugger?
> There are ways, sure, which probably involve adding printk's all over the place to figure it out... could be some DMA issue for example, could be pretty much anything. Have you tried booting an initrd with no disk access?
I tried to boot with a ramdisk, but that didn't help much. It still locks up while loading an init program or after entering some commands in the sh shell. Looks like the problem is hidden deep in the kernel.
Thanks!
Gerhard
Re: Kernel locks up after calling kernel_execve()
On Tue, 2007-11-13 at 22:23 +0100, Gerhard Pircher wrote:
> > There are ways, sure, which probably involve adding printk's all over the place to figure it out... could be some DMA issue for example, could be pretty much anything. Have you tried booting an initrd with no disk access?
> I tried to boot with a ramdisk, but that didn't help much. It still locks up while loading an init program or after entering some commands in the sh shell. Looks like the problem is hidden deep in the kernel.
Well, at least the above tells us it's not DMA related. I don't know of any deeply hidden problem, so you are probably hitting something else... if you have disabled idle, then it may be useful to try instrumenting locks or irq enable/disable.
Also, did you try booting with all kernel debug options enabled?
Finally, since the problem seems to have started around a specific kernel version, can you try to bisect the patch that causes it?
Ben.
Re: Kernel locks up after calling kernel_execve()
Original message
Date: Wed, 14 Nov 2007 08:43:38 +1100
From: Benjamin Herrenschmidt [EMAIL PROTECTED]
To: Gerhard Pircher [EMAIL PROTECTED]
CC: linuxppc-dev@ozlabs.org
Subject: Re: Kernel locks up after calling kernel_execve()
> Well, at least the above tells us it's not DMA related. I don't know of any deeply hidden problem, so you are probably hitting something else... if you have disabled idle, then it may be useful to try instrumenting locks or irq enable/disable.
Well, I only disabled power saving with powersave=off. Are there any other ways to disable idle? What do you mean by instrumenting locks or irq enable/disable?
> Also, did you try booting with all kernel debug options enabled?
I compiled in almost all kernel debugging options and booted the kernel with driver_debug, initcall_debug and debug. I didn't notice any serious error messages so far. I'm not sure, however, whether I missed a debug option.
> Finally, since the problem seems to have started around a specific kernel version, can you try to bisect the patch that causes it?
Hmm, I'm not sure how to do this (I have only worked on platform code so far). I guess you mean checking out a kernel version from the git repository that doesn't contain the patch for kernel_execve(). I still suspect the kernel_execve() function (which was introduced in 2.6.17) because the kernel locks up after starting the first user program. AFAIK kernel threads should be running much earlier.
Thanks!
regards,
Gerhard
Re: Kernel locks up after calling kernel_execve()
On Tue, 2007-11-13 at 23:06 +0100, Gerhard Pircher wrote:
> Well, I only disabled power saving with powersave=off. Are there any other ways to disable idle? What do you mean by instrumenting locks or irq enable/disable?
Add printk's to things :-) It's a UP kernel so there should be no spinlocks anyway. Best is to try to get a 100% repro case and printk your way toward the origin of the problem if you don't have a HW debugger. Unless you manage to sneak in an irq to xmon but if you are totally locked up, you probably can't.
Could also be something you do that your buggy northbridge doesn't like. For example, maybe it dislikes M bit in the hash table and you end up with it set due to other reasons (I know we had changes in this area).
> > Also, did you try booting with all kernel debug options enabled?
> I compiled in almost all kernel debugging options and booted the kernel with driver_debug, initcall_debug and debug. I didn't notice any serious error messages so far. I'm not sure, however, whether I missed a debug option.
> > Finally, since the problem seems to have started around a specific kernel version, can you try to bisect the patch that causes it?
> Hmm, I'm not sure how to do this (I have only worked on platform code so far). I guess you mean checking out a kernel version from the git repository that doesn't contain the patch for kernel_execve(). I still suspect the kernel_execve() function (which was introduced in 2.6.17) because the kernel locks up after starting the first user program. AFAIK kernel threads should be running much earlier.
They are, but they cause a lot less MMU pressure; that could be an indication...
Ben.
Re: Kernel locks up after calling kernel_execve()
Original message
Date: Fri, 09 Nov 2007 18:50:29 +1100
From: Benjamin Herrenschmidt [EMAIL PROTECTED]
To: Gerhard Pircher [EMAIL PROTECTED]
CC: linuxppc-dev@ozlabs.org
Subject: Re: Kernel locks up after calling kernel_execve()
> > I tried to use /bin/sh as init program and was able to enter a command, but then the machine locked up, too. Could that be a problem with the CPU sleeping/idle code?
> That's possibly an issue, try disabling power save if any for that CPU type. If it worked and broke, you may have to bisect tho.
I disabled the power-saving code by adding powersave=off to the kernel's command line, but it didn't help. It seems to lock up whenever it tries to access a filesystem.
Is there a way to debug it without a hardware debugger or can you recommend a cheap hardware debugger?
Gerhard
Re: Kernel locks up after calling kernel_execve()
On Sat, 2007-11-10 at 18:11 +0100, Gerhard Pircher wrote:
> Original message. Date: Fri, 09 Nov 2007 18:50:29 +1100 From: Benjamin Herrenschmidt [EMAIL PROTECTED] To: Gerhard Pircher [EMAIL PROTECTED] CC: linuxppc-dev@ozlabs.org Subject: Re: Kernel locks up after calling kernel_execve()
> > > I tried to use /bin/sh as init program and was able to enter a command, but then the machine locked up, too. Could that be a problem with the CPU sleeping/idle code?
> > That's possibly an issue, try disabling power save if any for that CPU type. If it worked and broke, you may have to bisect tho.
> I disabled the power-saving code by adding powersave=off to the kernel's command line, but it didn't help. It seems to lock up whenever it tries to access a filesystem. Is there a way to debug it without a hardware debugger or can you recommend a cheap hardware debugger?
There are ways, sure, which probably involve adding printk's all over the place to figure it out... could be some DMA issue for example, could be pretty much anything. Have you tried booting an initrd with no disk access?
Ben.
Re: Kernel locks up after calling kernel_execve()
On Thu, 2007-11-08 at 22:47 +0100, Gerhard Pircher wrote:
> Hi, I tested my patches for the AmigaOne platform with the latest 2.6.24-rc2 kernel snapshot. The kernel runs through all initcalls, but locks up completely after calling INIT (/sbin/init) by kernel_execve(). Thus I couldn't capture any kernel oops or panic output. Also the magic sysrq key doesn't work. Enabling debug code for soft lockups and spinlock debugging didn't reveal any information. I'm not sure, but I think it is the same problem I had with all kernels >= 2.6.17. All of these kernels lock up shortly before or right at calling the init program (resp. as soon as the kernel forks some kernel threads). Any suggestions on how to track down this problem?
You don't have a HW debugger or anything like that?
Ben.
Kernel locks up after calling kernel_execve()
Hi,
I tested my patches for the AmigaOne platform with the latest 2.6.24-rc2 kernel snapshot. The kernel runs through all initcalls, but locks up completely after calling INIT (/sbin/init) by kernel_execve(). Thus I couldn't capture any kernel oops or panic output. Also the magic sysrq key doesn't work. Enabling debug code for soft lockups and spinlock debugging didn't reveal any information.
I'm not sure, but I think it is the same problem I had with all kernels >= 2.6.17. All of these kernels lock up shortly before or right at calling the init program (resp. as soon as the kernel forks some kernel threads).
Any suggestions on how to track down this problem?
regards,
Gerhard
Re: Kernel locks up after calling kernel_execve()
Original message
Date: Fri, 09 Nov 2007 10:20:17 +1100
From: Benjamin Herrenschmidt [EMAIL PROTECTED]
To: Gerhard Pircher [EMAIL PROTECTED]
CC: linuxppc-dev@ozlabs.org
Subject: Re: Kernel locks up after calling kernel_execve()
> On Thu, 2007-11-08 at 22:47 +0100, Gerhard Pircher wrote:
> > Hi, I tested my patches for the AmigaOne platform with the latest 2.6.24-rc2 kernel snapshot. The kernel runs through all initcalls, but locks up completely after calling INIT (/sbin/init) by kernel_execve(). Thus I couldn't capture any kernel oops or panic output. Also the magic sysrq key doesn't work. Enabling debug code for soft lockups and spinlock debugging didn't reveal any information. I'm not sure, but I think it is the same problem I had with all kernels >= 2.6.17. All of these kernels lock up shortly before or right at calling the init program (resp. as soon as the kernel forks some kernel threads). Any suggestions on how to track down this problem?
> You don't have a HW debugger or anything like that? Ben.
Unfortunately, no. A BDI2000 is too costly for me.
I tried to use /bin/sh as init program and was able to enter a command, but then the machine locked up, too. Could that be a problem with the CPU sleeping/idle code?
Gerhard
Re: Kernel locks up after calling kernel_execve()
> I tried to use /bin/sh as init program and was able to enter a command, but then the machine locked up, too. Could that be a problem with a CPU sleeping/idle code?
That's possibly an issue, try disabling power save if any for that CPU type. If it worked and broke, you may have to bisect tho.
Ben.