For many months now, I've been dealing with host machine instability. Essentially, I'm rebooting at least one host a week (see http://www.linode.com/forums/viewforum.php?f=25 for details). The affected machines are all running a kernel patched with skas3 version 8 or later. Only my more recent boxes seem to be affected by this bug; the only difference between the boxes that crash and the ones that don't is processor speed (the faster ones crash). The hardware is otherwise identical. Slower boxes running the identical kernel have great uptimes (> 120 days). Boxes running skas3 v7 or earlier have uptimes of over 400 days (!).

Yesterday another box crashed. Because my crappy remote console unit only captures a panic if I'm connected at the time it happens, I had my datacenter plug in a monitor and write down some of the panic output by hand. There's not much here, but this is what they provided me:

<quote>
(hoangvo-08/10/2005 10:28:22):
Your server has been rebooted and verified to respond to SSH requests. The error messages I recorded from the console are as follows:

========================================
EFLAGS: 00010292 (2.6.11.11-1-bigmem)
EIP is at 0x0
========================================

I skipped over the information here as I did not feel it would be useful to you. This is more useful, however:

========================================
Call Trace
[<c0106fd7>] do_syscall_trace+0x97/0x10e
[<c0104934>] math_state_restore+0x24/0x40
[<c0102639>] syscall_trace_entry_+0x11/0x2a

Code: Bad EIP value.
</quote>

Source tree and vmlinux file:
http://www.theshore.net/~caker/uml/2.6.11.11-1-bigmem.tar.bz2 (38M)
http://www.theshore.net/~caker/uml/vmlinux.bz2 (2M)

Built source tree:
http://www.theshore.net/~caker/uml/2.6.11.11-1-bigmem.tar.gz (77M)
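
For anyone who wants to line the trace up against the vmlinux above: each call trace entry has the form [<address>] symbol+offset/size, so subtracting the offset from the address should give the address where that function starts. Here is a minimal Python sketch of that arithmetic. The names TRACE_RE and parse_trace are just mine for illustration, not part of any existing tool, and the input is the hand-copied trace exactly as the datacenter recorded it, so any transcription errors carry through.

```python
import re

# Matches entries such as "[<c0106fd7>] do_syscall_trace+0x97/0x10e".
TRACE_RE = re.compile(
    r"\[<([0-9a-fA-F]+)>\]\s+(\w+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)"
)


def parse_trace(text):
    """Return one dict per call trace entry found in text."""
    entries = []
    for match in TRACE_RE.finditer(text):
        address = int(match.group(1), 16)
        offset = int(match.group(3), 16)
        size = int(match.group(4), 16)
        entries.append({
            "symbol": match.group(2),
            "address": address,
            "offset": offset,
            "size": size,
            # Function start = reported address minus offset into the function.
            "start": address - offset,
        })
    return entries


# The trace as written down by hand from the console.
PANIC = """
[<c0106fd7>] do_syscall_trace+0x97/0x10e
[<c0104934>] math_state_restore+0x24/0x40
[<c0102639>] syscall_trace_entry_+0x11/0x2a
"""

for entry in parse_trace(PANIC):
    print("%-24s start=%08x offset=0x%x size=0x%x" % (
        entry["symbol"], entry["start"], entry["offset"], entry["size"]))
```

The computed start addresses can then be compared with the symbol table of the vmlinux file (for instance the output of nm), which should show fairly quickly whether the hand-copied addresses are plausible.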

Jeff took a look at this yesterday, but I didn't really expect him to get very far with so little information. I just wanted to get this out into the open in case anyone else has experienced something similar. The next time a host panics, I'll make sure that the datacenter copies down the entire panic output.

-Chris


_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
