Host lockup

OS is:
Linux seanl64.xxxxxx.com 2.6.32.14-1.2.107.xendom0.fc12.x86_64 #1 SMP Wed Jun 16 19:26:35 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

xen-4.0.1-0.1.rc3.fc12.x86_64

The livelock appears to be related to the network driver interrupt (going by
what I found scanning the newsgroups). The network LED is flashing
continuously, but the console is locked up and we can no longer SSH into the
box.

Could this be related to interrupt priorities? (I don't know the hardware ins
and outs of that.)
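
If someone wants to test the interrupt theory, one rough way is to take two
snapshots of /proc/interrupts in dom0 a few seconds apart, while the box still
responds, and see which line is climbing fastest. A minimal sketch in Python,
assuming the usual layout (a header row of CPU names, then one row per
interrupt). The sample snapshots and the helper names are made up for
illustration and are not from our machine.

```python
# Hypothetical snapshots in the usual /proc/interrupts layout (not real data).
BEFORE = """\
           CPU0       CPU1
  0:        100          0   IO-APIC-edge      timer
 16:       2000       1000   IO-APIC-fasteoi   eth0
ERR:          0
"""
AFTER = """\
           CPU0       CPU1
  0:        150          0   IO-APIC-edge      timer
 16:      52000       1000   IO-APIC-fasteoi   eth0
ERR:          0
"""


def parse_interrupts(text):
    """Return {label: (count summed over CPUs, description)}."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an interrupt row
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        table[label.strip()] = (sum(counts), " ".join(fields[len(counts):]))
    return table


def busiest(before, after, top=3):
    """Return the rows whose counts grew most between two snapshots."""
    old = parse_interrupts(before)
    new = parse_interrupts(after)
    deltas = [
        (label, total - old.get(label, (0, ""))[0], desc)
        for label, (total, desc) in new.items()
    ]
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas[:top]


for label, delta, desc in busiest(BEFORE, AFTER):
    print(f"{label:>4} +{delta:<8} {desc}")
```

On a live dom0 the two strings would instead come from reading
/proc/interrupts twice with a short sleep in between.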

We have a serial capture of the Xen output!!

Here is the suspicious piece:

(XEN) MCE: MSR 417 is not MCA MSR
(XEN) traps.c:2854: GPF (0000): ffff82c4801ae806 -> ffff82c4801f7cb3
(XEN) vlapic.c:702:d13 Local APIC Write to read-only register 0x30
(XEN) vlapic.c:702:d13 Local APIC Write to read-only register 0x20
(XEN) vlapic.c:702:d13 Local APIC Write to read-only register 0x20
(XEN) irq.c:243: Dom13 PCI link 0 changed 5 -> 0
(XEN) irq.c:243: Dom13 PCI link 1 changed 10 -> 0
(XEN) irq.c:243: Dom13 PCI link 2 changed 11 -> 0
(XEN) irq.c:243: Dom13 PCI link 3 changed 5 -> 0
(XEN) rtc.c:296: HVM RTC: dom 2 skipping 1683748570 seconds
(XEN) rtc.c:296: HVM RTC: dom 13 skipping 1683749071 seconds
(XEN) rtc.c:296: HVM RTC: dom 2 skipping 86402 seconds
(XEN) rtc.c:296: HVM RTC: dom 13 skipping 91586 seconds
(XEN) rtc.c:296: HVM RTC: dom 2 skipping 86408 seconds
(XEN) rtc.c:296: HVM RTC: dom 13 skipping 92211 seconds
(XEN) rtc.c:296: HVM RTC: dom 13 skipping 86456 seconds
(XEN) rtc.c:296: HVM RTC: dom 2 skipping 94532 seconds
(XEN) rtc.c:296: HVM RTC: dom 2 skipping 86402 seconds
(XEN) rtc.c:296: HVM RTC: dom 13 skipping 94742 seconds


Note the "skipping 1683748570 seconds" which is more than 53 years.

But we haven't had the computer switched on for that long :-)
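
For anyone who wants to check the arithmetic, here is a minimal sketch in
Python. It pulls the "skipping N seconds" values out of a few of the lines
above and converts them. The regex and the helper name rtc_skips are my own,
and a year is assumed to be 365.25 days. Note that the smaller skips each come
out at a little over one day (86402 seconds is one day plus two seconds).

```python
import re

# A few lines copied from the serial capture above.
LOG = """\
(XEN) rtc.c:296: HVM RTC: dom 2 skipping 1683748570 seconds
(XEN) rtc.c:296: HVM RTC: dom 13 skipping 1683749071 seconds
(XEN) rtc.c:296: HVM RTC: dom 2 skipping 86402 seconds
(XEN) rtc.c:296: HVM RTC: dom 13 skipping 91586 seconds
"""

SKIP_RE = re.compile(r"HVM RTC: dom (\d+) skipping (\d+) seconds")
SECONDS_PER_DAY = 86400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumption: Julian year


def rtc_skips(log_text):
    """Return a list of (domain id, skipped seconds) found in the log."""
    return [(int(dom), int(secs)) for dom, secs in SKIP_RE.findall(log_text)]


for dom, secs in rtc_skips(LOG):
    days, rem = divmod(secs, SECONDS_PER_DAY)
    years = secs / SECONDS_PER_YEAR
    print(f"dom {dom}: {secs} s = {days} days + {rem} s (~{years:.2f} years)")
```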

Cheers
V

On Thu, 1 Jul 2010 09:25:48 am Virgil wrote:
> Update:
> 
> Completely stable with 7 VMs. Hasn't missed a beat for several days now.
> 
> Will now run the pings again in the two 64PV guests from the virtual
> consoles (no graphics, and disconnected) and see if the host clock tick
> dies again.
> 
> Cheers
> V
> 
> On Thu, 24 Jun 2010 03:17:51 pm Virgil wrote:
> > Quick update:
> > 
> > Added 3 more VMs. Total of 7 now on this "desktop" computer.
> > 
> > 3x32PCFC6
> > 1x32HVFC12
> > 1x32HVwinXP-pro
> > 2x64PVFC12
> > 
> > Host is maxed out now.
> > 
> > Everything going well when the pings are not running in the 64PVFC12
> > machines.
> > 
> > Will leave it going for another couple of days.
> > 
> > Cheers
> > V
> > 
> > p.s. FYI Samba4-Alpha12 Active Directory controller is working well on
> > 64PVFC12.
> > 
> > On Wed, 23 Jun 2010 05:18:56 pm Virgil wrote:
> > > On Wed, 23 Jun 2010 03:30:52 pm Pasi Kärkkäinen wrote:
> > > > On Wed, Jun 23, 2010 at 11:11:58AM +1000, Virgil wrote:
> > > > > Hi Pasi,
> > > > > 
> > > > > > Had a hiccup overnight:
> > > > > 
> > > > > The host became unresponsive in a weird way. The time stopped
> > > > > incrementing.
> > > > > 
> > > > > Turns out the clock stopped ticking (which I put down to the
> > > > > interrupts being disconnected).
> > > > > 
> > > > > > Anyway I decided I'd reset the time using 'date -s 10:41:30'.
> > > > > 
> > > > > Kaboom, or actually deathly silence. The machine fully stopped dead
> > > > > in its tracks.
> > > > > 
> > > > > Just prior to this I connected to the console of one of the 64PV
> > > > > machines which was just running a ping from yesterday. Anyway,
> > > > > 60,000 or so lines of pings went to the console zipping up the
> > > > > screen. Then it was dead. I did a CTRL-C and eventually it
> > > > > returned to the prompt.
> > > > > 
> > > > > > So I looked at the other 64PV machine, which was also pinging, and
> > > > > > found the identical situation.
> > > > > 
> > > > > > So I reckon there's some kind of buffer overflow going on when
> > > > > > you're not connected via "xm console MACHINE". Once you pass
> > > > > > 60,000 lines of text, this buffer overflow somehow causes the
> > > > > > RTC to hang.
> > > > 
> > > > Do you have xenconsoled running?
> > > > 
> > > > I've noticed PV guests that print a lot to the console will stall if
> > > > xenconsoled is not running.. xenconsoled needs to clear the guest
> > > > console buffer..
> > > > 
> > > > -- Pasi
> > > 
> > > Seems to be now. Pretty sure it was then too.
> > > 
> > > udev-post       0:off   1:on    2:on    3:on    4:on    5:on    6:off
> > > wpa_supplicant  0:off   1:off   2:off   3:off   4:off   5:off   6:off
> > > xenconsoled     0:off   1:off   2:off   3:on    4:on    5:on    6:off
> > > xend            0:off   1:off   2:off   3:on    4:on    5:on    6:off
> > > xendomains      0:off   1:off   2:off   3:on    4:on    5:on    6:off
> > > xenstored       0:off   1:off   2:off   3:on    4:on    5:on    6:off
> > > ypbind          0:off   1:off   2:off   3:off   4:off   5:off   6:off
> > > [r...@seanl64 ~]# ps -ef | grep xenconso
> > > > root      1508     1  0 10:19 ?        00:00:00 /usr/sbin/xenconsoled --log=none --log-dir=/var/log/xen/console
> > > > root      7815  7732  0 17:07 pts/5    00:00:00 grep xenconso
> > > 
> > > Cheers
> > > V
> > > 
> > > > > I pressed the reset button, but this time the 2 64PV machines are
> > > > > not logged in. I'll just let it go and see if it keeps going.
> > > > > 
> > > > > Cheers
> > > > > V
> > > > > 
> > > > > On Tue, 22 Jun 2010 04:29:06 pm Pasi Kärkkäinen wrote:
> > > > > > On Tue, Jun 22, 2010 at 12:03:53PM +1000, Virgil wrote:
> > > > > > > Hi Pasi,
> > > > > > > 
> > > > > > > On Mon, 21 Jun 2010 08:57:55 pm Pasi Kärkkäinen wrote:
> > > > > > > > > On Mon, Jun 21, 2010 at 01:56:36PM +0300, Pasi Kärkkäinen wrote:
> > > > > > > > > On Mon, Jun 21, 2010 at 02:28:15PM +1000, Virgil wrote:
> > > > > > > > > > Another quick update....
> > > > > > > > > > 
> > > > > > > > > > xen-4.0.1-0.1.rc3.fc13.src.rpm just compiled this under
> > > > > > > > > > fc12.
> > > > > > > > > > 
> > > > > > > > > > Identical results with this too (i.e. it's probably in
> > > > > > > > > > the kernel).
> > > > > > > > > > 
> > > > > > > > > > I have a (silly) idea for the serial console. The wiki
> > > > > > > > > > page recommends using a phone camera to capture the
> > > > > > > > > > screen....
> > > > > > > > > > 
> > > > > > > > > > Well my idea is to add an n-millisecond delay every time
> > > > > > > > > > the output stream in Xen sees a \n. This would delay the
> > > > > > > > > > screen updates enough for the camera to see them. The n
> > > > > > > > > > should be configurable on the kernel boot command line.
> > > > > > > > > > It's set to 0 right now.
> > > > > > > > > 
> > > > > > > > > Yeah, we really need to get a log somehow to troubleshoot
> > > > > > > > > your problem.
> > > > > > > > > 
> > > > > > > > > Serial console log would be the best:
> > > > > > > > > http://wiki.xensource.com/xenwiki/XenSerialConsole
> > > > > > > > 
> > > > > > > > Btw are you running the latest kernel:
> > > > > > > > http://koji.fedoraproject.org/koji/taskinfo?taskID=2254110
> > > > > > > > 
> > > > > > > > Or are you running custom/self compiled kernel?
> > > > > > > 
> > > > > > > Everything is working with:
> > > > > > > xen-4.0.1-0.1.rc3 compiled from source on fc12 machine and
> > > > > > > 2.6.32.14-1.2.107.xendom0.fc12.x86_64 from the  myoung repo.
> > > > > > > 
> > > > > > > All fixed.
> > > > > > 
> > > > > > Good to hear it works!
> > > > > > 
> > > > > > > We also now have a "null modem" cable to another old computer
> > > > > > > with a COM port. Turns out I was the only old man that could
> > > > > > > remember what a null modem cable is. The young guy said "wtf"?
> > > > > > > Also turns out I'm the only one who knows what minicom is and
> > > > > > > what 8N1 means
> > > > > > > 
> > > > > > > :-)
> > > > > > 
> > > > > > Hehe.. yeah I guess young people don't get to play with serial
> > > > > > consoles nowadays, until they're doing networking stuff..
> > > > > > 
> > > > > > So I guess most SOL devices in servers go unused.. :)
> > > > > > 
> > > > > > -- Pasi
> > > > > > 
> > > > > > > All VMs are now running concurrently.
> > > > > > > 
> > > > > > > Very happy again. Thanks.
> > > > > > > V
> > > > > > > 
> > > > > > > > -- Pasi
> > > > > > > > 
> > > > > > > > > > Cheers
> > > > > > > > > > V
> > > > > > > > > > 
> > > > > > > > > > On Mon, 21 Jun 2010 12:10:17 pm Virgil wrote:
> > > > > > > > > > > Just a quick update:
> > > > > > > > > > > 
> > > > > > > > > > > Just tried xen-4.0.0-2. Recompile from source on
> > > > > > > > > > > fc12.x86_64.
> > > > > > > > > > > 
> > > > > > > > > > > identical behaviour.
> > > > > > > > > > > 
> > > > > > > > > > > Cheers
> > > > > > > > > > > V
> > > > > > > > > > > 
> > > > > > > > > > > On Fri, 18 Jun 2010 03:17:19 pm Virgil wrote:
> > > > > > > > > > > > On Sat, 29 May 2010 11:26:50 pm M A Young wrote:
> > > > > > > > > > > > > If anyone wants to test xen 3.4.3, I have put up a
> > > > > > > > > > > > > source RPM at
> > > > > > > > > > > > > > http://myoung.fedorapeople.org/dom0/src/xen-3.4.3-0.91.fc13.src.rpm
> > > > > > > > > > > > > 
> > > > > > > > > > > > >       Michael Young
> > > > > > > > > > > > > 
> > > > > > > > > > > > > --
> > > > > > > > > > > > > xen mailing list
> > > > > > > > > > > > > xen@lists.fedoraproject.org
> > > > > > > > > > > > > > https://admin.fedoraproject.org/mailman/listinfo/xen
> > > > > > > > > > > > 
> > > > > > > > > > > > Hi list,
> > > > > > > > > > > > 
> > > > > > > > > > > > Host crashing on 64FC12 kernel -105 dom0 when 2 PV64
> > > > > > > > > > > > machines are run.
> > > > > > > > > > > > 
> > > > > > > > > > > > I can run HV32WinXP and HV32FC12 and 1 PV64FC12 all
> > > > > > > > > > > > at the same time.
> > > > > > > > > > > > 
> > > > > > > > > > > > However, when any combination involves 2 PV64FC12
> > > > > > > > > > > > (kernel version doesn't matter) the host crashes.
> > > > > > > > > > > > 
> > > > > > > > > > > > Running on the -97 dom0 everything works in all
> > > > > > > > > > > > combos.
> > > > > > > > > > > > 
> > > > > > > > > > > > Using Xen 3.4.3.
> > > > > > > > > > > > 
> > > > > > > > > > > > Turning off the virt network cards in the PV64FC12
> > > > > > > > > > > > machines makes things go (obviously not much use
> > > > > > > > > > > > though).
> > > > > > > > > > > > 
> > > > > > > > > > > > Tried disabling IPV6, firewall stuff etc. etc.
> > > > > > > > > > > > 
> > > > > > > > > > > > Sometimes it would fire up and go but whichever
> > > > > > > > > > > > machine is started second gets really long ping
> > > > > > > > > > > > times like it's not receiving unless it sends
> > > > > > > > > > > > something (if that makes sense). Sooner or later the
> > > > > > > > > > > > host crashes.
> > > > > > > > > > > > 
> > > > > > > > > > > > Strangely a PV64FC12 and a PV64FC10 machine coexist
> > > > > > > > > > > > happily. It's only when a second PV64FC12 machine
> > > > > > > > > > > > starts up.
> > > > > > > > > > > > 
> > > > > > > > > > > > V