Re: Kernel locks up after calling kernel_execve()

2007-11-15 Thread Gerhard Pircher

 Original Message 
 Date: Thu, 15 Nov 2007 08:54:32 +1100
 From: Paul Mackerras [EMAIL PROTECTED]
 To: Gerhard Pircher [EMAIL PROTECTED]
 CC: [EMAIL PROTECTED], linuxppc-dev@ozlabs.org
 Subject: Re: Kernel locks up after calling kernel_execve()

 No it's not just for ppc64.  We had a patch that went in some time ago
 to ensure that the M bit was set on various 32-bit platforms because
 otherwise we got data corruption (due to a small cache in the
 northbridge not being kept coherent with the processor cache).
Ah, I thought this was due to a CPU erratum, where L2 cache prefetching
causes data corruption when the coherent bit is set to 0.

Gerhard
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: Kernel locks up after calling kernel_execve()

2007-11-14 Thread Benjamin Herrenschmidt

On Wed, 2007-11-14 at 10:39 +0100, Gerhard Pircher wrote:
 Yeah, the northbridge hates the M bit! Thus the AmigaOne platform code
 masks out the CPU_FTR_NEED_COHERENT flag and disables the L2 cache
 prefetch engines (I don't care about the performance loss).
 I couldn't find any other code that sets the M bit, except for huge TLB
 page support, but isn't that only for PPC64?

Right, it's only 64-bit. You've double-checked that nothing broke the M bit
thing? In which case, I don't know what else...

Ben.




Re: Kernel locks up after calling kernel_execve()

2007-11-14 Thread Gerhard Pircher

 Original Message 
 Date: Wed, 14 Nov 2007 10:37:52 +1100
 From: Benjamin Herrenschmidt [EMAIL PROTECTED]
 To: Gerhard Pircher [EMAIL PROTECTED]
 CC: linuxppc-dev@ozlabs.org
 Subject: Re: Kernel locks up after calling kernel_execve()

 Add printk's to things :-) It's a UP kernel so there should be no
 spinlocks anyway.
 
 Best is to try to get a 100% repro case and printk your way toward the
 origin of the problem if you don't have a HW debugger. You could also try
 to sneak in an irq to drop into xmon, but if you are totally locked up,
 you probably can't.
Also, xmon seems to lock up the machine. I was able to activate it through
the magic SysRq key, but the machine died afterwards.

 Could also be something you do that your buggy northbridge doesn't like.
 For example, maybe it dislikes the M bit in the hash table and you end up
 with it set due to other reasons (I know we had changes in this area).
Yeah, the northbridge hates the M bit! Thus the AmigaOne platform code
masks out the CPU_FTR_NEED_COHERENT flag and disables the L2 cache
prefetch engines (I don't care about the performance loss).
I couldn't find any other code that sets the M bit, except for huge TLB
page support, but isn't that only for PPC64?

Gerhard


Re: Kernel locks up after calling kernel_execve()

2007-11-14 Thread Gerhard Pircher

 Original Message 
 Date: Wed, 14 Nov 2007 21:04:57 +1100
 From: Benjamin Herrenschmidt [EMAIL PROTECTED]
 To: Gerhard Pircher [EMAIL PROTECTED]
 CC: linuxppc-dev@ozlabs.org
 Subject: Re: Kernel locks up after calling kernel_execve()

 On Wed, 2007-11-14 at 10:39 +0100, Gerhard Pircher wrote:
  Yeah, the northbridge hates the M bit! Thus the AmigaOne platform code
  masks out the CPU_FTR_NEED_COHERENT flag and disables the L2 cache
  prefetch engines (I don't care about the performance loss).
  I couldn't find any other code that sets the M bit, except for huge TLB
  page support, but isn't that only for PPC64?
 
 Right, it's only 64-bit. You've double-checked that nothing broke the M bit
 thing? In which case, I don't know what else...
Yes, I did. Otherwise the machine dies much earlier in the boot process.

Gerhard


Re: Kernel locks up after calling kernel_execve()

2007-11-14 Thread Paul Mackerras
Gerhard Pircher writes:

 Yeah, the northbridge hates the M bit! Thus the AmigaOne platform code

Wow.

 masks out the CPU_FTR_NEED_COHERENT flag and disables the L2 cache
 prefetch engines (I don't care about the performance loss).
 I couldn't find any other code that sets the M bit, except for huge TLB
 page support, but isn't that only for PPC64?

No it's not just for ppc64.  We had a patch that went in some time ago
to ensure that the M bit was set on various 32-bit platforms because
otherwise we got data corruption (due to a small cache in the
northbridge not being kept coherent with the processor cache).

Look for CPU_FTR_NEED_COHERENT in include/asm-powerpc/cputable.h and
arch/powerpc/mm/*.

Paul.


Re: Kernel locks up after calling kernel_execve()

2007-11-13 Thread Gerhard Pircher

 Original Message 
 Date: Sun, 11 Nov 2007 14:55:40 +1100
 From: Benjamin Herrenschmidt [EMAIL PROTECTED]
 To: Gerhard Pircher [EMAIL PROTECTED]
 CC: linuxppc-dev@ozlabs.org
 Subject: Re: Kernel locks up after calling kernel_execve()

 
 On Sat, 2007-11-10 at 18:11 +0100, Gerhard Pircher wrote:
  Original Message 
  Date: Fri, 09 Nov 2007 18:50:29 +1100
  From: Benjamin Herrenschmidt [EMAIL PROTECTED]
  To: Gerhard Pircher [EMAIL PROTECTED]
  CC: linuxppc-dev@ozlabs.org
  Subject: Re: Kernel locks up after calling kernel_execve()
 
  Is there a way to debug it without a hardware debugger or can you
  recommend a cheap hardware debugger?
 
 There are ways, sure, which probably involve adding printk's all over the
 place to figure it out... could be some DMA issue for example, could be
 pretty much anything. Have you tried booting an initrd with no disk
 access?
I tried to boot with a ramdisk, but that didn't help much. It still locks up
while loading an init program or after entering some commands in the
sh shell. Looks like the problem is hidden deep in the kernel.

Thanks!

Gerhard



Re: Kernel locks up after calling kernel_execve()

2007-11-13 Thread Benjamin Herrenschmidt

On Tue, 2007-11-13 at 22:23 +0100, Gerhard Pircher wrote:
  There are ways, sure, which probably involve adding printk's all over
  the place to figure it out... could be some DMA issue for example, could
  be pretty much anything. Have you tried booting an initrd with no disk
  access?
 I tried to boot with a ramdisk, but that didn't help much. It still
 locks up while loading an init program or after entering some commands
 in the sh shell. Looks like the problem is hidden deep in the kernel.

Well, at least the above tells us it's not DMA related.

I don't know of any deeply hidden problem, so you are probably hitting
something else ... if you have disabled idle, then it may be useful to
try instrumenting locks or irq enable/disable.

Also, did you try booting with all kernel debug options enabled?

Finally, since the problem seems to have started around a specific kernel
version, can you try to bisect the patch that causes it?

Ben.




Re: Kernel locks up after calling kernel_execve()

2007-11-13 Thread Gerhard Pircher

 Original Message 
 Date: Wed, 14 Nov 2007 08:43:38 +1100
 From: Benjamin Herrenschmidt [EMAIL PROTECTED]
 To: Gerhard Pircher [EMAIL PROTECTED]
 CC: linuxppc-dev@ozlabs.org
 Subject: Re: Kernel locks up after calling kernel_execve()

 Well, at least the above tells us it's not DMA related.
 
 I don't know of any deeply hidden problem, so you are probably hitting
 something else ... if you have disabled idle, then it may be useful to
 try instrumenting locks or irq enable/disable.
Well, I only disabled power saving with powersave=off. Are there any other
ways to disable idle? What do you mean by instrumenting locks or
irq enable/disable?

 Also, did you try booting with all kernel debug options enabled?
I compiled in almost all kernel debugging options and booted the kernel
with driver_debug, initcall_debug and debug. I didn't notice any serious
error messages so far. Not sure, however, if I missed a debug option.

 Finally, since the problem seems to have started around a specific kernel
 version, can you try to bisect the patch that causes it?
Hmm, I'm not sure how to do this (I've only worked on platform code so far).
I guess you are thinking of checking out a kernel version from the git
repository that doesn't contain the patch for kernel_execve().
I still suspect the kernel_execve() function (which was introduced in
2.6.17) because the kernel locks up after starting the first user program.
AFAIK kernel threads should be running much earlier.

Thanks!

regards,

Gerhard


Re: Kernel locks up after calling kernel_execve()

2007-11-13 Thread Benjamin Herrenschmidt

On Tue, 2007-11-13 at 23:06 +0100, Gerhard Pircher wrote:
 Well, I only disabled power saving with powersave=off. Are there any
 other ways to disable idle? What do you mean by instrumenting locks or
 irq enable/disable?

Add printk's to things :-) It's a UP kernel so there should be no
spinlocks anyway.

Best is to try to get a 100% repro case and printk your way toward the
origin of the problem if you don't have a HW debugger. You could also try
to sneak in an irq to drop into xmon, but if you are totally locked up,
you probably can't.

Could also be something you do that your buggy northbridge doesn't like.
For example, maybe it dislikes the M bit in the hash table and you end up
with it set due to other reasons (I know we had changes in this area).

  Also, did you try booting with all kernel debug options enabled?
 I compiled in almost all kernel debugging options and booted the kernel
 with driver_debug, initcall_debug and debug. I didn't notice any serious
 error messages so far. Not sure, however, if I missed a debug option.
 
  Finally, since the problem seems to have started around a specific
  kernel version, can you try to bisect the patch that causes it?
 Hmm, I'm not sure how to do this (I've only worked on platform code so
 far). I guess you are thinking of checking out a kernel version from the
 git repository that doesn't contain the patch for kernel_execve().
 I still suspect the kernel_execve() function (which was introduced in
 2.6.17) because the kernel locks up after starting the first user
 program. AFAIK kernel threads should be running much earlier.

They are, but they cause a lot less MMU pressure; that could be an
indication...

Ben.




Re: Kernel locks up after calling kernel_execve()

2007-11-10 Thread Gerhard Pircher

 Original Message 
 Date: Fri, 09 Nov 2007 18:50:29 +1100
 From: Benjamin Herrenschmidt [EMAIL PROTECTED]
 To: Gerhard Pircher [EMAIL PROTECTED]
 CC: linuxppc-dev@ozlabs.org
 Subject: Re: Kernel locks up after calling kernel_execve()

 
  I tried to use /bin/sh as the init program and was able to enter a command,
  but then the machine locked up, too.
  Could that be a problem with the CPU sleep/idle code?
 
 That's possibly an issue, try disabling power save if any for that CPU
 type. If it worked and broke, you may have to bisect tho.
I disabled the power-saving code by adding powersave=off to the kernel's
command line, but it didn't help. It seems to lock up whenever it tries to
access a filesystem.
Is there a way to debug it without a hardware debugger or can you
recommend a cheap hardware debugger?

Gerhard


Re: Kernel locks up after calling kernel_execve()

2007-11-10 Thread Benjamin Herrenschmidt

On Sat, 2007-11-10 at 18:11 +0100, Gerhard Pircher wrote:
  Original Message 
  Date: Fri, 09 Nov 2007 18:50:29 +1100
  From: Benjamin Herrenschmidt [EMAIL PROTECTED]
  To: Gerhard Pircher [EMAIL PROTECTED]
  CC: linuxppc-dev@ozlabs.org
  Subject: Re: Kernel locks up after calling kernel_execve()
 
  
   I tried to use /bin/sh as the init program and was able to enter a command,
   but then the machine locked up, too.
   Could that be a problem with the CPU sleep/idle code?
  
  That's possibly an issue, try disabling power save if any for that CPU
  type. If it worked and broke, you may have to bisect tho.
 I disabled the power-saving code by adding powersave=off to the kernel's
 command line, but it didn't help. It seems to lock up whenever it tries to
 access a filesystem.
 Is there a way to debug it without a hardware debugger or can you
 recommend a cheap hardware debugger?

There are ways, sure, which probably involve adding printk's all over the
place to figure it out... could be some DMA issue for example, could be
pretty much anything. Have you tried booting an initrd with no disk
access?

Ben.




Re: Kernel locks up after calling kernel_execve()

2007-11-08 Thread Benjamin Herrenschmidt

On Thu, 2007-11-08 at 22:47 +0100, Gerhard Pircher wrote:
 Hi,
 
 I tested my patches for the AmigaOne platform with the latest 2.6.24-rc2
 kernel snapshot. The kernel runs through all initcalls, but locks up
 completely after calling INIT (/sbin/init) via kernel_execve(). Thus I
 couldn't capture any kernel oops or panic output. Also, the magic SysRq
 key doesn't work. Enabling debug code for soft lockups and spinlock
 debugging didn't reveal any information.
 I'm not sure, but I think it is the same problem I had with all kernels
 2.6.17 and later. All of these kernels lock up shortly before or right at
 calling the init program (or rather, as soon as the kernel forks some
 kernel threads).
 Any suggestions on how to track down this problem?

You don't have a HW debugger or anything like that ?

Ben.




Kernel locks up after calling kernel_execve()

2007-11-08 Thread Gerhard Pircher
Hi,

I tested my patches for the AmigaOne platform with the latest 2.6.24-rc2
kernel snapshot. The kernel runs through all initcalls, but locks up
completely after calling INIT (/sbin/init) via kernel_execve(). Thus I
couldn't capture any kernel oops or panic output. Also, the magic SysRq
key doesn't work. Enabling debug code for soft lockups and spinlock
debugging didn't reveal any information.
I'm not sure, but I think it is the same problem I had with all kernels
2.6.17 and later. All of these kernels lock up shortly before or right at
calling the init program (or rather, as soon as the kernel forks some
kernel threads).
Any suggestions on how to track down this problem?

regards,

Gerhard



Re: Kernel locks up after calling kernel_execve()

2007-11-08 Thread Gerhard Pircher

 Original Message 
 Date: Fri, 09 Nov 2007 10:20:17 +1100
 From: Benjamin Herrenschmidt [EMAIL PROTECTED]
 To: Gerhard Pircher [EMAIL PROTECTED]
 CC: linuxppc-dev@ozlabs.org
 Subject: Re: Kernel locks up after calling kernel_execve()

 
 On Thu, 2007-11-08 at 22:47 +0100, Gerhard Pircher wrote:
  Hi,
  
  I tested my patches for the AmigaOne platform with the latest
  2.6.24-rc2 kernel snapshot. The kernel runs through all initcalls, but
  locks up completely after calling INIT (/sbin/init) via kernel_execve().
  Thus I couldn't capture any kernel oops or panic output. Also, the magic
  SysRq key doesn't work. Enabling debug code for soft lockups and
  spinlock debugging didn't reveal any information.
  I'm not sure, but I think it is the same problem I had with all kernels
  2.6.17 and later. All of these kernels lock up shortly before or right at
  calling the init program (or rather, as soon as the kernel forks some
  kernel threads).
  Any suggestions on how to track down this problem?
 
 You don't have a HW debugger or anything like that ?
 
 Ben.
Unfortunately, no. A BDI2000 is too costly for me.

I tried to use /bin/sh as the init program and was able to enter a command,
but then the machine locked up, too.
Could that be a problem with the CPU sleep/idle code?

Gerhard



Re: Kernel locks up after calling kernel_execve()

2007-11-08 Thread Benjamin Herrenschmidt

 I tried to use /bin/sh as the init program and was able to enter a command,
 but then the machine locked up, too.
 Could that be a problem with the CPU sleep/idle code?

That's possibly an issue, try disabling power save if any for that CPU
type. If it worked and broke, you may have to bisect tho.

Ben.

