Re: [Users] OpenVZ on EL6 - weird network issue
Hi Scott,

> I'd like to hear more once you get it figured out.

We had two more outages in between. I then also learned that the same class C network holds not only the three OpenVZ nodes, but also 50 other (physical) Linux servers. So there is a certain level of noise and congestion in that subnet, which on average carries around 16 Mbit/s of traffic.

The pfSense in front of it used to be clustered, but the clustering had been disabled for troubleshooting. Still, the remaining pfSense flooded the network with VRRP requests, which made up the majority of the broadcast-related traffic in that class C. That might (or might not) play a role here. You'd only see those VRRP requests (or this level of them) in a datacenter with a similar setup.

Further segmentation of that network seems prudent, but can't be done at the moment as it would be too disruptive. The client is considering it, though.

Forcing the OpenVZ nodes to do hourly arpsends looked like it had helped to improve the situation, but there were still two recorded outages. And like before, two of the three nodes would usually lose their network connection at nearly the same time, just a few minutes apart. Not always the same nodes, but always two of three.

We had set up some monitoring to dump the ARP table and routes every minute and to diff them against the previous run. On the nodes, that diff didn't show any changes while the failures happened. The ARP table on the pfSense also remained the same.

We did run a ping from inside a VPS to the outside world and from the outside to a VPS, doing a tcpdump on both br0 and venet0 on the node, and another tcpdump on venet0 inside the VPS. If I recall correctly, the ICMP packets arrived at the node's br0 and were routed to venet0, where the VPS could see them. But on the way back the ICMP response got lost between venet0 and br0.

We were kinda running out of options there (and patience, as far as the client's clients were concerned).

So we ditched the bridges entirely and re-configured all nodes to use eth0 directly. As we don't (at least at the moment) use KVM, we can make do without the bridges.

Open vSwitch sounds like a *very* interesting alternative, but we couldn't yet find the time to experiment with it. So we chose the devil we know, which is eth0. :p

It'll take a few days to see if that solves our issues, but so far it's looking good. /knocking on wood

Many thanks to all who offered suggestions. Much appreciated!

--
With best regards
Michael Stauber
___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users
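A minimal sketch of the kind of per-minute monitoring described above (dump the ARP table and routes, then diff against the previous run). This is not the script that was actually used on the nodes; the state directory, the function name and the choice of the ip commands are assumptions made for illustration. Note that the neighbour state column (REACHABLE, STALE, ...) changes frequently, so the ARP output may need filtering before it is diffed.

```shell
#!/bin/sh
# Hypothetical state directory; not taken from the thread.
STATE_DIR="${STATE_DIR:-/var/tmp/netwatch}"

# snapshot_and_diff NAME COMMAND...
# Runs COMMAND, stores its sorted output as NAME.cur and, if an earlier
# snapshot exists, prints a unified diff against it when the two differ.
snapshot_and_diff() {
    name="$1"; shift
    mkdir -p "$STATE_DIR" || return 1
    cur="$STATE_DIR/$name.cur"
    prev="$STATE_DIR/$name.prev"
    # Keep the previous snapshot around for comparison.
    if [ -f "$cur" ]; then
        mv "$cur" "$prev"
    fi
    "$@" 2>&1 | sort > "$cur"
    if [ -f "$prev" ] && ! diff -u "$prev" "$cur" > "$STATE_DIR/$name.diff"; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') $name changed:"
        cat "$STATE_DIR/$name.diff"
    fi
}

# Example calls to append when running this from a once-a-minute cron job
# (the exact commands are assumptions):
#   snapshot_and_diff arp    ip -4 neigh show
#   snapshot_and_diff routes ip -4 route show
```

Nothing is printed while the snapshots stay identical, so cron only sends mail when something actually changed.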
Re: [Users] OpenVZ on EL6 - weird network issue
Greetings,

----- Original Message -----
> Symptoms:
> =========
>
> All nodes and VPS's sporadically get unreachable from the outside. From
> within the private network (from another box for example) one can still
> SSH in. Pings to public IPs then still work from the nodes, but no longer
> work from inside the VPS's.

I have seen this reported in the #openvz IRC channel a number of times over the years and almost always it was inside of a commercial datacenter. The cause of the problem seemed to be some upstream routing device that was periodically dropping ARP table entries for the containers. I'm not much of a networking person so forgive me for a lack of precision of language.

Anyway, one workaround to fix it was to periodically (via cron or whatever) do an arpsend from the host (or maybe each container?) to keep the ARP entries refreshed. I don't remember the specifics of the arpsend method but in a pinch you can probably look at how vzctl (in the C source code) sets up a container's IP address... and mimic that.

I'd like to hear more once you get it figured out.

TYL,
--
Scott Dowdle
704 Church Street
Belgrade, MT 59714
(406)388-0827 [home]
(406)994-3931 [work]
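A rough sketch of the periodic arpsend workaround mentioned above, for anyone who wants a starting point. Everything here is an assumption rather than something confirmed in this thread: the interface name, the tool path, the function name, and in particular the arpsend flags, which follow my reading of arpsend(8) (-U for an unsolicited update, -i for the source IP, -c for the count). Please check them against the man page and against what vzctl's own scripts do on your node.

```shell
#!/bin/sh
# Assumed defaults; override via the environment to match the node.
IFACE="${IFACE:-eth0}"
ARPSEND="${ARPSEND:-/usr/sbin/arpsend}"

# refresh_arp reads whitespace-separated addresses on stdin and sends one
# unsolicited ARP update per IPv4 address out of $IFACE.
refresh_arp() {
    tr -s ' \t' '\n' | while read -r ip; do
        case "$ip" in
            ""|-|*:*) continue ;;   # skip blanks, "-" (no IP set) and IPv6
        esac
        "$ARPSEND" -U -i "$ip" -c 1 "$IFACE"
    done
}

# Possible use from an hourly cron job on the hardware node, feeding in
# the container IPs reported by vzlist:
#   vzlist -H -o ip | refresh_arp
```

Reading the addresses from stdin keeps the function independent of where the IP list comes from, so it can be dry-run with a harmless command in place of arpsend first.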
Re: [Users] OpenVZ on EL6 - weird network issue
Hello, folks!

Have you tried Open vSwitch instead of standard Linux bridges? I am running Open vSwitch on many servers and everything works perfectly now.

On Sun, Dec 28, 2014 at 12:21 AM, Michael Stauber mstau...@blueonyx.it wrote:
> Hi Scott,
>
>> [...] almost always it was inside of a commercial datacenter.
>
> Yeah, it's in a large datacenter as well.
>
>> The cause of the problem seemed to be some upstream routing device that
>> was periodically dropping ARP table entries for the containers.
>
> That is indeed interesting. Well, I guess there is only one way to find
> out. I'll set up a cronjob as suggested to do periodic arpsends. It sure
> won't hurt.
>
> So far we tried this here:
>
> http://forum.proxmox.com/threads/8301-OpenVZ-Containers-lose-internet-connection-%28VLAN-venet%29
>
> TL;DR:
>
> echo 2 > /proc/sys/net/ipv4/conf/br0/rp_filter
>
> That sounded good on paper, but just 20 hours later we had the next
> outage.
>
> The OpenVZ wiki also pointed out this here (Bridge doesn't forward
> packets):
>
> http://openvz.org/Bridge_doesn%27t_forward_packets
>
> Yet I'm not sure how applicable that might be to us. I noted down the
> current status of /proc/sys/net/bridge/* and will compare it with what
> it'll be during the next loss of connectivity.
>
>> I'd like to hear more once you get it figured out.
>
> Sure, I'll post follow ups then. Likewise: If you have any other ideas or
> suggestions, then I'd love to hear them.
>
> --
> With best regards
> Michael Stauber

--
Sincerely yours, Pavel Odintsov
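For the plan of noting down the state of /proc/sys/net/bridge/* and comparing it during the next outage, a small helper along these lines might do. It is only a sketch: the function name and the output file paths are made up for illustration and are not from the thread.

```shell
#!/bin/sh
# dump_settings DIR
# Prints one "name=value" line for every regular file directly inside DIR,
# e.g. the tunables under /proc/sys/net/bridge.
dump_settings() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        printf '%s=%s\n' "${f##*/}" "$(cat "$f" 2>/dev/null)"
    done
}

# Possible use (file names are hypothetical):
#   dump_settings /proc/sys/net/bridge > /root/bridge-baseline.txt
# and later, while the outage is ongoing:
#   dump_settings /proc/sys/net/bridge | diff /root/bridge-baseline.txt -
# The same works for /proc/sys/net/ipv4/conf/br0, which holds rp_filter.
```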
Re: [Users] OpenVZ on EL6 - weird network issue
Hi Pavel,

> Have you tried Open vSwitch instead of standard Linux bridges?

I actually wasn't aware of that one before you mentioned it. I'm just starting to read the docs and specs of it and it looks very interesting indeed.

Do you by chance have any specific pointers or docs on how to set it up for use with OpenVZ? I'm willing to give it a shot, but with these client systems I can't afford much in the way of experimenting. Taking the bridge stack apart and replacing it with something else might be a little too extreme, but I'll consider it as a last resort.

In any case I'd like to avoid the usual beginner mistakes one could make when switching to Open vSwitch, so any tips and hints (or URLs to reading material) would be appreciated.

As for Scott's suggestion of a cronjob with arpsends: I did some digging, and the network-related shell scripts of OpenVZ are quite enlightening there:

/etc/sysconfig/network-scripts/ifup-venet
/usr/libexec/vzctl/scripts/vps-functions

It appears that if I set up a cronjob that periodically runs ...

/etc/sysconfig/network-scripts/ifup-venet /etc/sysconfig/network-scripts/ifcfg-br0

... then it'll handle the arpsends as well via the function vzarp() as provided by /usr/libexec/vzctl/scripts/vps-functions.

I tried it on a test box and it didn't appear to have any negative impact. Some debugging lines thrown in by me also showed me the exact commands that these scripts use to do the arpsends. I can work with that.

I'll implement that on the three affected nodes and will see if it helps.

--
With best regards
Michael Stauber
Re: [Users] OpenVZ on EL6 - weird network issue
Hello!

You can read this ticket https://bugzilla.openvz.org/show_bug.cgi?id=2896 and find all the answers there.

On Sun, Dec 28, 2014 at 1:18 AM, Michael Stauber mstau...@blueonyx.it wrote:
> [...]

--
Sincerely yours, Pavel Odintsov
Re: [Users] OpenVZ on EL6 - weird network issue
----- Original message -----
From: Michael Stauber mstau...@blueonyx.it
Subject: Re: [Users] OpenVZ on EL6 - weird network issue
To: OpenVZ users users@openvz.org
Sent: Sunday, 28 December 2014 00:18:17 EET

> [...]

From what you are experiencing, I'd check ARP on the switch...

If I understood correctly, your firewall cannot reach the nodes (they cannot be reached from outside...) - can you ssh to them from the FW?

Good luck...
Re: [Users] OpenVZ on EL6 - weird network issue
Hi Hristo,

> From what you are experiencing, I'd check ARP on the switch...

We'll do that the next time it happens.

> If I understood correctly, your firewall cannot reach the nodes (they
> cannot be reached from outside...) - can you ssh to them from the FW?

When the problem happens, we cannot SSH in to the affected node from outside the private network. But from unaffected hosts inside the private network we can SSH to the affected node(s).

So far we didn't try to SSH from the firewall to the affected node while we had the problem. But you're right: This was probably an oversight and we'll definitely check that next time around.

Thank you!

--
With best regards
Michael Stauber