On Thu, Oct 6, 2011 at 2:39 AM, Vladimir Mosgalin
<[email protected]> wrote:

> Hi James Kelly!
>
>  On 2011.10.05 at 22:31:18 +0100, James Kelly wrote next:
>
> > I lost contact with my Scientific Linux 6.1 KVM host earlier today.
> >
> > The machine is headless and I don't have any IPMI stuff on the machine so I
> > had to plug a monitor into it. However, there was no life from the monitor
> > and I pressed the reset button.
> >
> > It seems to me that the networking died. The machine is booted first thing
> > every morning (so the 9:00am start was missed by two minutes!) and the
> > networking error seems to have occurred about 27 minutes after
> > the initial boot.
>
> It's unclear to me whether the tg3 driver errors in the second half of
> the message are the cause or a symptom of this situation; however, if
> they are the cause, you might be interested in a recent update that Red
> Hat has released:
> http://rhn.redhat.com/errata/RHEA-2011-1348.html
>
> Try installing kmod-tg3 from the sl-fastbugs repo and rebooting; it should
> make your system use a newer version of the network driver that's mentioned
> in these messages. I have no idea if it will really help, but it probably
> won't hurt to try.
>
>
> A frequent cause of similar problems with network drivers is interrupt
> setup: network cards generate lots of interrupts under load and use
> various advanced features to ease that a bit. I have seen situations
> where kernel panics and warnings appeared under load because of the
> hardware interrupt setup or buggy interrupt code in the network driver.
> Just in case, you might want to find the eth entries in /proc/interrupts
> to make sure the card uses MSI-X (shown as PCI-MSI-edge or PCI-MSI-X) and
> not IO-APIC-level or something like that. However, I don't think this
> kind of problem should arise on such hardware.
>
> In the worst case, if these problems keep appearing, consider
> installing a separate Intel-based network card; these work most reliably
> under Linux in my (and some other people's) experience. It's kind of sad,
> but Marvell, Broadcom and NVIDIA products are a bit like second-class
> citizens and don't always work flawlessly under load. It might be more of
> a driver problem, who knows, but that's just my experience from past
> years.
> (Also, I'd definitely stay away from NICs based on other manufacturers'
> chips; apart from these four, nothing else should probably be allowed in
> the server market. YMMV.)
>
> These messages could also indicate something other than network
> problems, but people with deeper kernel knowledge than me should answer
> that. All I can say is that the combination of NIC, network driver and
> interrupt settings *can* be a real source of problems, up to kernel
> panics under some conditions; it's not that rare at all to find that
> such problems are caused by the network driver.
>
> --
>
> Vladimir
>

Thanks for your reply Vladimir.

PCI-MSI-edge was visible in the output of cat /proc/interrupts.
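
For anyone who wants to script the same check rather than read the file by
eye, here is a minimal Python sketch. It is only an illustration: the
function name nic_interrupt_modes and the "msi" / "io-apic" / "unknown"
labels are made up for this example, and it assumes the NIC shows up in
/proc/interrupts under a name starting with "eth" and that the interrupt
type column contains either "MSI" or "IO-APIC". The exact layout of that
file differs between kernel versions, so treat the result as a hint only.

```python
import re


def nic_interrupt_modes(interrupts_text, prefix="eth"):
    """Return {device name: set of interrupt kinds} for matching NICs.

    interrupts_text is the full text of /proc/interrupts. The kinds are
    rough labels chosen for this sketch, not kernel terminology.
    """
    # Match device names such as eth0, including eth0-TxRx-0 style queues.
    name_re = re.compile(r"\b(%s\d+)" % re.escape(prefix))
    modes = {}
    for line in interrupts_text.splitlines():
        names = name_re.findall(line)
        if not names:
            continue  # header line or an interrupt not used by a NIC
        if "MSI" in line:
            mode = "msi"        # e.g. PCI-MSI-edge
        elif "IO-APIC" in line:
            mode = "io-apic"    # e.g. IO-APIC-fasteoi, IO-APIC-level
        else:
            mode = "unknown"
        for name in names:
            modes.setdefault(name, set()).add(mode)
    return modes
```

On the host itself this could be called as
nic_interrupt_modes(open("/proc/interrupts").read()); a NIC that reports
"io-apic" instead of "msi" would be the case Vladimir suggested looking
out for.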

One of the guests on the server started to give me a few headaches so I
have reverted to a Debian KVM setup for the time being.

I have not had any similar issues with Debian Squeeze and KVM on the same
hardware. However, I want to switch to SL6 for KVM so I am going to look for
some newer hardware with Intel NICs.

James
