On Thu, Oct 6, 2011 at 2:39 AM, Vladimir Mosgalin <[email protected]> wrote:
> Hi James Kelly!
>
> On 2011.10.05 at 22:31:18 +0100, James Kelly wrote next:
>
> > I lost contact with my Scientific Linux 6.1 KVM host earlier today.
> >
> > The machine is headless and I don't have any IPMI stuff on the machine so I
> > had to plug a monitor into it. However, there was no life from the monitor
> > and I pressed the reset button.
> >
> > It seems to me that the networking died. The machine is booted first thing
> > every morning (so the 9:00am start was missed by two minutes!) and the
> > networking error seems to have occurred about 27 minutes after
> > the initial boot.
>
> It's unclear to me if tg3 driver errors in the second half of message
> are source or cause of this situation, however if they are source, you
> might be interested in recent update that Red Hat has released:
> http://rhn.redhat.com/errata/RHEA-2011-1348.html
>
> Try installing kmod-tg3 from sl-fastbugs repo and rebooting, it should
> make your system use newer version of network driver that's mentioned in
> these messages. I have no idea if it will really help, but it probably
> won't hurt to try.
>
>
> The often cause of similar problems with network drivers could be
> interrupt setup - network cards generate lots of interrupts under load
> and use various advanced features to ease it a bit, I saw some
> situations where panics and warnings in kernel appeared due to hardware
> interrupt setup or buggy interrupt code in network driver under load.
> Just in case, you might want to find mention of eth in /proc/interrupts
> to make sure that it uses MSI-X (shown as PCI-MSI-edge or PCI-MSI-X) and
> not IO-APIC-level or something like that. However, I don't think these
> kind of problems should arrive on such hardware.
>
> In the worst case, if these problems will keep appearing, consider
> installing external intel-based network card, these work most flawlessly
> under Linux in my (and some other people) experience. It's kind of sad,
> but marvell, broadcom and nvidia products are a bit of second class
> citizens and don't always work flawlessly under load - might be more of
> a driver problem, who knows, but that's just my experience from past
> years.
> (also, I'd definitely stay away from NICs based on other manufacturer's
> chips, except for these 4 nothing else should probably be allowed in
> server market. YMMV)
>
> These messages also can be indicating something else than network
> problems but people with deeper kernel knowledge than me should answer
> this. All I can say is that NICs+network drivers+interrupt settings
> combination *can* be real source of problems, up to kernel panics under
> some conditions, it's not that rare at all to find out that such
> problems are caused by network driver.
>
> --
>
> Vladimir
>

Thanks for your reply, Vladimir.

PCI-MSI-edge was visible from a cat /proc/interrupts.

One of the guests on the server started to give me a few headaches, so I
have reverted to a Debian KVM setup for the time being. I have not had any
similar issues with Debian Squeeze and KVM on the same hardware.

However, I want to switch to SL6 for KVM, so I am going to look for some
newer hardware with Intel NICs.

James
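
For anyone who would rather script the /proc/interrupts check discussed above than read the file by eye, here is a minimal Python sketch. It is only an illustration: the function names are made up for this example, and it assumes the older /proc/interrupts layout (IRQ number, per-CPU counters, interrupt type such as PCI-MSI-edge, then device names), so it may need adjusting on other kernels.

```python
import re


def nic_interrupt_modes(interrupts_text, prefix="eth"):
    """Map each device whose name starts with `prefix` to its interrupt type.

    Assumes rows of the form:
        " 45:   12345   6789   PCI-MSI-edge   eth0"
    (hypothetical helper, not part of any library).
    """
    modes = {}
    for line in interrupts_text.splitlines():
        # Only numbered IRQ rows are of interest; this skips the CPU header
        # and named rows such as NMI or LOC.
        match = re.match(r"\s*\d+:\s+(.*)", line)
        if match is None:
            continue
        fields = match.group(1).split()
        # Drop the leading per-CPU interrupt counters.
        while fields and fields[0].isdigit():
            fields.pop(0)
        if len(fields) < 2:
            continue
        irq_type, devices = fields[0], fields[1:]
        # Shared IRQs list several devices separated by commas.
        for device in devices:
            name = device.rstrip(",")
            if name.startswith(prefix):
                modes[name] = irq_type
    return modes


def uses_msi(irq_type):
    """Return True when the interrupt type string mentions MSI (or MSI-X)."""
    return "MSI" in irq_type


def report(path="/proc/interrupts"):
    """Print the interrupt type of every matching NIC found in `path`."""
    with open(path) as handle:
        modes = nic_interrupt_modes(handle.read())
    for name, irq_type in sorted(modes.items()):
        status = "MSI" if uses_msi(irq_type) else "not MSI"
        print(f"{name}: {irq_type} ({status})")
```

Calling report() on the host prints one line per eth device; anything reported as "not MSI" (for example an IO-APIC type) is the case the advice above suggests looking into.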
