Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

2007-10-08 Thread Helge Hafting

Thomas Gleixner wrote:

On Sat, 29 Sep 2007, Helge Hafting wrote:
  

Thomas Gleixner wrote:


I have gone back to 2.6.22rc4, which seems to work.

This is a single opteron, although on a dual-slot board.



Can you switch to serial console, so we can get some information out of
that box? Sysrq-B is working, so we can get info from other sysrq
functions as well.
  
  

I didn't need the serial - it crashes during console work too.
I think a "make clean" was in progress at the time. There must be work going
on in order to crash.

This time 2.6.22rc4 died on me with a general protection fault

I got two reports, the first one scrolled partially off screen but
the whole trace was there:



That's why I asked for a serial console. That way we can get all the
information from the reports including the register dumps 
  

I got another crash - with a full dump.  I have also discovered
files with lots of single-bit errors, so this is probably just some kind
of hw problem. :-(

Replace memory or the motherboard with everything on it . . . :-(
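
For readers who want to check a suspect file the same way, here is a minimal sketch of how single-bit errors could be counted against a known-good copy. The function name bit_flip_report and the byte-string inputs are illustrative assumptions, not something used in this thread; it only compares the overlapping length of the two inputs.

```python
def bit_flip_report(good: bytes, suspect: bytes):
    """Compare two byte strings position by position.

    Returns (diffs, single_bit) where diffs is a list of
    (offset, xor_mask) for every differing byte and single_bit is the
    number of those bytes that differ in exactly one bit.
    Only the overlapping length is compared (zip stops at the shorter input).
    """
    diffs = []
    single_bit = 0
    for offset, (a, b) in enumerate(zip(good, suspect)):
        mask = a ^ b              # bits that differ in this byte
        if mask:
            diffs.append((offset, mask))
            if mask & (mask - 1) == 0:   # power of two: exactly one bit set
                single_bit += 1
    return diffs, single_bit
```

Many differing bytes whose mask has exactly one bit set would be consistent with flaky RAM or a bad memory path rather than a software bug, though it does not prove it.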

Helge Hafting

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

2007-10-05 Thread Helge Hafting

Andi Kleen wrote:

Helge Hafting <[EMAIL PROTECTED]> writes:
  

shrink_dcache_memory



That usually means random memory corruption from somewhere -- dcache
tends to use a lot of memory and when it is corrupted anywhere these 
functions tend to crash while walking the lists. 


Unfortunately memory corruption is hard to track down because
the messenger is usually not the one to blame.

Perhaps enable slab debugging and see if it turns
something up. Could be also broken hardware. Does an older kernel
run stable? If yes and if it can be reproduced bisecting would
be good.
  

I attempted bisecting - and failed. The problem is that
2.6.23rc7 seems very unstable, but 2.6.22rc4 is much better
but not perfect. 2.6.22rc4 only crashed once - it can compile for
hours and swap lots and keep running. But it died at least once.
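
A sketch of why a sporadic crash defeats bisection: a single clean run does not prove a version good, so each step has to repeat the test several times, and even then a "good" verdict is only probable. The names bisect_first_bad, is_bad_once and trials are hypothetical and only illustrate the idea; this is not how git bisect itself is implemented.

```python
def bisect_first_bad(versions, is_bad_once, trials=5):
    """Find the first bad version in an ordered list.

    versions: ordered oldest to newest; versions[0] is assumed good and
              versions[-1] is assumed bad.
    is_bad_once(v): returns True if one test run of version v crashed.
    A version is treated as bad if any of `trials` runs crashes; with a
    sporadic failure, a bad version can still be misjudged as good.
    """
    def is_bad(v):
        return any(is_bad_once(v) for _ in range(trials))

    lo, hi = 0, len(versions) - 1   # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]
```

If even the assumed-good end crashes now and then, as 2.6.22rc4 did here, the starting assumption is wrong and the search cannot converge on a meaningful commit.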

I'll try running recent kernels with more debugging instead.
I think I used SLUB instead of SLAB - either way I can switch
that over to see if it changes things.

Helge Hafting



Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

2007-10-01 Thread Helge Hafting

Andi Kleen wrote:

Helge Hafting <[EMAIL PROTECTED]> writes:
  

shrink_dcache_memory



That usually means random memory corruption from somewhere -- dcache
tends to use a lot of memory and when it is corrupted anywhere these 
functions tend to crash while walking the lists. 


Unfortunately memory corruption is hard to track down because
the messenger is usually not the one to blame.

Perhaps enable slab debugging and see if it turns
something up. Could be also broken hardware. Does an older kernel
run stable? If yes and if it can be reproduced bisecting would
be good.
  

2.6.18 had no problem compiling stuff without crashing.
Looks like I have some work to do then.

Helge Hafting


Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

2007-09-30 Thread Andi Kleen
Helge Hafting <[EMAIL PROTECTED]> writes:
> 
> shrink_dcache_memory

That usually means random memory corruption from somewhere -- dcache
tends to use a lot of memory and when it is corrupted anywhere these 
functions tend to crash while walking the lists. 

Unfortunately memory corruption is hard to track down because
the messenger is usually not the one to blame.

Perhaps enable slab debugging and see if it turns
something up. Could be also broken hardware. Does an older kernel
run stable? If yes and if it can be reproduced bisecting would
be good.

-Andi


Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

2007-09-30 Thread Helge Hafting

Thomas Gleixner wrote:

On Sat, 29 Sep 2007, Helge Hafting wrote:
  

Thomas Gleixner wrote:


I have gone back to 2.6.22rc4, which seems to work.

This is a single opteron, although on a dual-slot board.



Can you switch to serial console, so we can get some information out of
that box? Sysrq-B is working, so we can get info from other sysrq
functions as well.
  
  

I didn't need the serial - it crashes during console work too.
I think a "make clean" was in progress at the time. There must be work going
on in order to crash.

This time 2.6.22rc4 died on me with a general protection fault

I got two reports, the first one scrolled partially off screen but
the whole trace was there:



That's why I asked for a serial console. That way we can get all the
information from the reports including the register dumps ...
  

Sure. But I can't get a cable right now. Were the registers necessary
in this case? Often, the trace turns out to be enough.

Helge Hafting



Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

2007-09-30 Thread Thomas Gleixner
On Sat, 29 Sep 2007, Helge Hafting wrote:
> Thomas Gleixner wrote:
> > > I have gone back to 2.6.22rc4, which seems to work.
> > > 
> > > This is a single opteron, although on a dual-slot board.
> > > 
> > 
> > Can you switch to serial console, so we can get some information out of
> > that box? Sysrq-B is working, so we can get info from other sysrq
> > functions as well.
> >   
> I didn't need the serial - it crashes during console work too.
> I think a "make clean" was in progress at the time. There must be work going
> on in order to crash.
> 
> This time 2.6.22rc4 died on me with a general protection fault
> 
> I got two reports, the first one scrolled partially off screen but
> the whole trace was there:

That's why I asked for a serial console. That way we can get all the
information from the reports including the register dumps 

> Then I got:
> spinlock lockup on cpu #0, kswapd 0/212

That's probably caused by the previous one.

   tglx


Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

2007-09-29 Thread Helge Hafting

Thomas Gleixner wrote:

On Mon, 2007-09-24 at 23:08 +0200, Helge Hafting wrote:
  

The two kernels mentioned hang occasionally.
Typically when I compile something and pass the time
by surfing the web.

A few minutes and then I notice that the mouse (and everything else in X)
stops.  kbd LEDs do not react to numlock/capslock.
The only thing that still works is sysrq+B
So far this has happened while running X, so no messages.

I have gone back to 2.6.22rc4, which seems to work.

This is a single opteron, although on a dual-slot board.



Can you switch to serial console, so we can get some information out of
that box? Sysrq-B is working, so we can get info from other sysrq
functions as well.
  

I didn't need the serial - it crashes during console work too.
I think a "make clean" was in progress at the time. There must be work going
on in order to crash.

This time 2.6.22rc4 died on me with a general protection fault

I got two reports, the first one scrolled partially off screen but
the whole trace was there:

shrink_dcache_memory
shrink_slab
kswapd
autoremove_wake_function
thread_return
trace_hardirqs_on
kswapd
kswapd
kthread
child_rip
restore_args
kthread
child_rip

Then I got:
spinlock lockup on cpu #0, kswapd 0/212
_raw_spin_lock
shrink_dcache_parent
shrink_dcache_parent
proc_flush_task
release_task
do_exit
die
error_exit
prune_dcache
[From here on, it continues exactly like the first report:]
shrink_dcache_memory
shrink_slab
kswapd
autoremove_wake_function
thread_return
trace_hardirqs_on
kswapd
kswapd
kthread
child_rip
restore_args
kthread
child_rip


sysrq P says:
cpu 0
pid 212 comm: kswapd0  not tainted 2.6.22-rc4 #18
RIP: __delay

I took a picture of the screen, in case the register dumps are interesting.
Wonder what this is - dcache trouble? swap trouble?
Helge Hafting


Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

2007-09-24 Thread Thomas Gleixner

On Mon, 2007-09-24 at 23:08 +0200, Helge Hafting wrote:
> The two kernels mentioned hang occasionally.
> Typically when I compile something and pass the time
> by surfing the web.
>
> A few minutes and then I notice that the mouse (and everything else in X)
> stops.  kbd LEDs do not react to numlock/capslock.
> The only thing that still works is sysrq+B
> So far this has happened while running X, so no messages.
> 
> I have gone back to 2.6.22rc4, which seems to work.
> 
> This is a single opteron, although on a dual-slot board.

Can you switch to serial console, so we can get some information out of
that box? Sysrq-B is working, so we can get info from other sysrq
functions as well.

tglx




x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

2007-09-24 Thread Helge Hafting

The two kernels mentioned hang occasionally.
Typically when I compile something and pass the time
by surfing the web.

A few minutes and then I notice that the mouse (and everything else in X)
stops.  kbd LEDs do not react to numlock/capslock.
The only thing that still works is sysrq+B
So far this has happened while running X, so no messages.

I have gone back to 2.6.22rc4, which seems to work.

This is a single opteron, although on a dual-slot board.


Helge Hafting