On 11/10/2014 0:51, Jan Beulich wrote:
On 10.11.14 at 09:03, <sfl...@ihonk.com> wrote:
Sorry for the delay, took some debugging on another computer to get
serial logging working. Due to its size, I've posted the entire log of a
crashed session here: http://pastebin.com/AiPHUZRH In this case I used a
3.0 gig memory size for the Windows domU.
As I mentioned before, sometimes it's the SATA that goes first, other
times the tg3 ethernet driver. Also note that between 4.4.1 and 4.5rc1,
the kernel I'm using (stock Debian Jessie) has not changed.
Please let me know if you need any other information. Thanks!
Raising the kernel log level to maximum too would have helped.
Okay, I've done that and the output is here, let me know if you have any
preferred logging flags instead:
http://pastebin.com/M3yvWNTT
Regardless of that, the first device showing anomalies here appears
to be the UHCI controller:
[ 147.415713] usb 7-1: reset low-speed USB device number 2 using uhci_hcd
while booting the guest.
I assume this is related to the USB device (a keyboard) I'm passing
through to the domU.
And these
[ 199.775209] pcieport 0000:00:03.0: AER: Multiple Corrected error received: id=0018
[ 199.775238] pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0018(Transmitter ID)
[ 199.775251] pcieport 0000:00:03.0: device [8086:340a] error status/mask=00001100/00002000
[ 199.775255] pcieport 0000:00:03.0: [ 8] RELAY_NUM Rollover
[ 199.775258] pcieport 0000:00:03.0: [12] Replay Timer Timeout
hint at a problem in the system's design. 00:03.0 is the parent bridge
of 02:00.0 (and from what I can tell that's the only device behind that
bridge), and hence the above messages can only reasonably have
their origin at the passed through VGA device.
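For readers following the AER lines above, here is a minimal sketch of how the status/mask=00001100/00002000 field maps onto the two bracketed bit numbers in the log. The bit names are taken from the PCIe Correctable Error Status register layout as I understand it (what the kernel prints as "RELAY_NUM Rollover" is bit 8, REPLAY_NUM Rollover, in the specification). The helper names decode_aer_correctable and parse_status_mask are made up for this illustration and are not part of any kernel or Xen tooling.

```python
# Bit positions in the PCIe AER Correctable Error Status register.
# Assumption: names follow the PCIe specification; the table is illustrative.
AER_CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_aer_correctable(value):
    """Return (bit, name) pairs for every bit set in a 32-bit register value."""
    return [
        (bit, AER_CORRECTABLE_BITS.get(bit, "reserved/unknown"))
        for bit in range(32)
        if value & (1 << bit)
    ]


def parse_status_mask(field):
    """Split a 'status/mask' field such as '00001100/00002000' into two ints."""
    status_hex, mask_hex = field.split("/")
    return int(status_hex, 16), int(mask_hex, 16)


# Hypothetical usage with the value from the log line above.
status, mask = parse_status_mask("00001100/00002000")
print("status:", decode_aer_correctable(status))
print("mask:  ", decode_aer_correctable(mask))
```

Decoded this way, the status word has exactly bits 8 and 12 set, which matches the two bracketed lines in the log; both are Data Link Layer replay events on the link between 00:03.0 and the device behind it.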
You are correct that the VGA card is the only device on 03.0:
root@g2:~# lspci -tv
-[0000:00]-+-00.0 Intel Corporation 5520 I/O Hub to ESI Port
           +-01.0-[01]----00.0 Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B
           +-03.0-[02]----00.0 NVIDIA Corporation GT200GL [Quadro FX 4800]
           +-07.0-[03]--
           +-14.0 Intel Corporation 7500/5520/5500/X58 I/O Hub System Management Registers
           +-14.1 Intel Corporation 7500/5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers
           +-14.2 Intel Corporation 7500/5520/5500/X58 I/O Hub Control Status and RAS Registers
           +-16.0 Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.1 Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.2 Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.3 Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.4 Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.5 Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.6 Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.7 Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-1a.0 Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
           +-1a.1 Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
           +-1a.7 Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
           +-1b.0 Intel Corporation 82801JI (ICH10 Family) HD Audio Controller
           +-1c.0-[04]--
           +-1c.4-[05]----00.0 Broadcom Corporation NetXtreme BCM5755 Gigabit Ethernet PCI Express
           +-1c.5-[06-09]----00.0-[07-09]--+-02.0-[08]--
           |                               \-03.0-[09]----00.0 Broadcom Corporation NetXtreme BCM5754 Gigabit Ethernet PCI Express
           +-1d.0 Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
           +-1d.1 Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
           +-1d.2 Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
           +-1d.3 Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
           +-1d.7 Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
           +-1e.0-[0a]----0e.0 Advanced Micro Devices, Inc. [AMD/ATI] RV100 [Radeon 7000 / Radeon VE]
           +-1f.0 Intel Corporation 82801JIB (ICH10) LPC Interface Controller
           +-1f.2 Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
           \-1f.3 Intel Corporation 82801JI (ICH10 Family) SMBus Controller
What problem in the system's design does this hint at?
IOW it may well be that
you were just lucky that things worked earlier on.
Certainly possible, but this is a very common machine in the corporate
world -- a Lenovo ThinkStation D20 running the X58 chipset. If it's an
inherent defect in the machine and somebody else hasn't already tripped
over it, I would be very surprised.
And btw - the title saying "host crash" seems to not match the provided
log, as there's no sign of a crash anywhere (the host may be hung from
what is visible). Was that just badly worded, or have you actually seen
crashes too?
Only seen hanging. Sorry for the lack of technical rigor on the title,
but from the other end of the ethernet cable, it might as well have crashed.
If the expanded logging doesn't tell you anything useful, I'll see if I
can bisect the problem.
Thanks very much for your time.
Steve
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel