Contegix Customer:

Please do not reply to this email. If you have any questions, please submit a support request to [email protected].

At approximately 11:39 AM on July 2nd, our NOC engineers began receiving several monitoring alarms indicating a potential network issue. We found that our core switches were dropping packets for both internal and external traffic.

We began to investigate and found abnormal traffic indicator lights on one of our intrusion prevention systems. Believing this to be the cause, we physically bypassed the units. We quickly determined that this was not the root cause, because the problem persisted. We then began troubleshooting our core switching.

At approximately 11:59 AM, we determined that there was a multicast packet storm on our network. The high volume of packets drove the CPUs in both core switches to maximum capacity, which caused packet loss. After further debugging, we found that the storm was directed at a routing protocol (VRRP-E) multicast IP address and was originating from a specific customer port on our core switches. The customer connected to this port had experienced a switch malfunction a few minutes before the network issue began, and we determined this could be the cause. At approximately 12:05 PM, we disabled the customer port, and the CPUs on our core switches began to stabilize.

Network availability to internal and external destinations was restored, but we found that we still could not reach a few external destinations. Traffic on our network was also increasing but had not yet returned to normal utilization. After further troubleshooting, we found that we could not route traffic out through Level(3)'s network. Based on our observations and data, we could not determine the reason for the Level(3) issues. At approximately 12:19 PM, we disabled BGP with Level(3). Once this was disabled, our network returned to normal and outbound traffic was routed correctly.

While the issue started when a customer replaced a switch, we do not believe this was the direct cause. Rather, we suspect that it triggered a bug in our core switch software despite all engineered precautions. We are working closely with the hardware manufacturer to determine the exact cause, and we will forward any new information about this issue and its long-term resolution. In the interim, we have placed a moratorium on connecting new customer switching equipment to our core switches. In addition, we restored our BGP session with Level(3) once we determined it was safe to do so.

We apologize for any inconvenience this may have caused you or your customers. Our network's reliability is one of our greatest assets, and we place a great deal of emphasis on making sure it is working optimally. As mentioned above, we are working closely with the switch manufacturer to identify and fix this bug so that this does not occur again.


Sincerely,
Contegix Support

---
Contegix
900 Walnut Street
Suite 700
Saint Louis, MO  63102
Phone: 314.622.6200 ext. 3
Toll Free: 877.4.CONTEGIX ext. 3
Fax: 314.621.4422
E-mail: [email protected]
Beyond Managed Hosting(r) for Your Enterprise
Favorite Linux-Friendly Hosting Company - Linux Journal
http://www.contegix.com/linuxjournal
