Chris Knowles wrote:
Got a weird one.

(Oh, regarding that crashing box, further investigation pointed at the motherboard as a culprit.)

I've got a Nagios server in place that's been happily warning us of doom and gloom for over a year. It's one of the great success stories for Linux at our company.

Until now.

Starting this morning, it has been randomly unable to ping various boxes on our network. That is, until you ping the Nagios server from the "unpingable" server. Then Nagios can ping that server all it wants.

This is all local network, no routing involved.

Any ideas as to what could be causing this? (This is a simple switched network, and other than this it seems to be working fine.)

Any help is appreciated.

CJK

As Jason mentions, this is an ARP problem. The Nagios box is either not sending ARP requests or not listening to the replies. When another box sends an ARP request for the Nagios box, the Nagios box hears that request and replies, populating its own ARP cache with the requester's address at the same time, so from then on it can send packets to that box. (I see this kind of one-way pingability a lot in my day job debugging switch/routers.)
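To make that sequence concrete, here is a toy model of the ARP cache behaviour described above. It is only a sketch: the `Host` class, the `sends_arp` flag (standing in for whatever is actually broken on the Nagios box), and the 192.0.2.x addresses are all made up for illustration, and it ignores cache timeouts entirely.

```python
class Host:
    """Toy host with an ARP cache. Hypothetical model, not real networking code."""

    def __init__(self, name, ip, mac, sends_arp=True):
        self.name = name
        self.ip = ip
        self.mac = mac
        self.sends_arp = sends_arp  # False models the suspected fault
        self.arp_cache = {}         # ip -> mac

    def handle_arp_request(self, sender_ip, sender_mac, target_ip):
        """Receive a broadcast request; if it is for us, learn the sender and reply."""
        if target_ip != self.ip:
            return None
        self.arp_cache[sender_ip] = sender_mac
        return self.mac

    def resolve(self, target, network):
        """Return True if we know (or can learn) the target's MAC address."""
        if target.ip in self.arp_cache:
            return True
        if not self.sends_arp:
            return False  # the request never makes it onto the wire
        for host in network:
            if host is self:
                continue
            reply = host.handle_arp_request(self.ip, self.mac, target.ip)
            if reply is not None:
                self.arp_cache[target.ip] = reply
        return target.ip in self.arp_cache

    def ping(self, target, network):
        """A ping works only if the request and the echo reply can both be addressed."""
        return self.resolve(target, network) and target.resolve(self, network)


def demo():
    nagios = Host("nagios", "192.0.2.10", "02:00:00:00:00:01", sends_arp=False)
    other = Host("other", "192.0.2.20", "02:00:00:00:00:02")
    network = [nagios, other]
    first = nagios.ping(other, network)   # fails: nagios cannot resolve other
    second = other.ping(nagios, network)  # works, and nagios learns other's MAC
    third = nagios.ping(other, network)   # now works from the cached entry
    return first, second, third


if __name__ == "__main__":
    print(demo())  # (False, True, True)
```

The middle step is the interesting one: the other box's request is what fills in the Nagios box's cache, which matches the "ping it from the other side and it starts working" symptom.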

The best bet is to run the all-seeing, all-knowing Ethereal on both the Nagios box and the 'other' box, or its command-line cousin tcpdump (you needn't even put them in promiscuous mode, as you're tracing packets destined to the boxes in question). Then you can see what's going wrong with those ARP packets.
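If it helps when reading the capture, the fields to look at in each ARP packet are the opcode (1 is a request, 2 is a reply) and the sender and target addresses. Below is a small sketch that decodes the 28-byte ARP payload used for IPv4 over Ethernet. The function name `parse_arp` and the sample addresses are my own inventions for illustration; Ethereal and tcpdump do this decoding for you.

```python
import socket
import struct

# ARP payload layout for IPv4 over Ethernet (28 bytes, network byte order):
# hardware type, protocol type, hardware length, protocol length, opcode,
# sender MAC, sender IP, target MAC, target IP
ARP_FORMAT = "!HHBBH6s4s6s4s"
ARP_OPCODES = {1: "request", 2: "reply"}


def format_mac(raw):
    """Render 6 raw bytes as aa:bb:cc:dd:ee:ff."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_arp(payload):
    """Decode an ARP payload (the bytes after the 14-byte Ethernet header)."""
    size = struct.calcsize(ARP_FORMAT)
    if len(payload) < size:
        raise ValueError(f"ARP payload too short: {len(payload)} bytes")
    (_htype, _ptype, _hlen, _plen, oper,
     sha, spa, tha, tpa) = struct.unpack(ARP_FORMAT, payload[:size])
    return {
        "op": ARP_OPCODES.get(oper, f"unknown({oper})"),
        "sender_mac": format_mac(sha),
        "sender_ip": socket.inet_ntoa(spa),
        "target_mac": format_mac(tha),
        "target_ip": socket.inet_ntoa(tpa),
    }


# Made-up sample: 192.0.2.10 asking who has 192.0.2.20.
SAMPLE_REQUEST = struct.pack(
    ARP_FORMAT, 1, 0x0800, 6, 4, 1,
    bytes.fromhex("020000000001"), socket.inet_aton("192.0.2.10"),
    bytes(6), socket.inet_aton("192.0.2.20"),
)

if __name__ == "__main__":
    print(parse_arp(SAMPLE_REQUEST))
```

In the trace, a request from the Nagios box with no reply coming back points one way; no request leaving the Nagios box at all points the other.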

Corey
--
TriLUG mailing list        : http://www.trilug.org/mailman/listinfo/trilug
TriLUG Organizational FAQ  : http://trilug.org/faq/
TriLUG Member Services FAQ : http://members.trilug.org/services_faq/
TriLUG PGP Keyring         : http://trilug.org/~chrish/trilug.asc
