Re: [Users] OpenVZ on EL6 - weird network issue

2015-01-04 Thread Michael Stauber
Hi Scott,

> I'd like to hear more once you get it figured out.

We had two more outages in between. I then also learned that there are
not only the three OpenVZ nodes in the same class C network, but also 50
other (physical) Linux servers. So there is a certain level of noise and
congestion in that subnet, which on average carries around 16 Mbit/s of
traffic.

The pfSense in front of it used to be clustered, but the clustering had
been disabled for troubleshooting. Still, the remaining pfSense flooded
the network with VRRP requests, which made up the majority of the
broadcast-related traffic in that class C. That might (or might not)
play a role here. You'd only see those VRRP requests (or this level of
them) in a datacenter with a similar setup.

Further segmentation of that network seems prudent, but can't be done at
the moment as it would be too disruptive. The client is considering it,
though.

Forcing the OpenVZ nodes to do hourly arpsends seemed to improve the
situation, but there were still two recorded outages. And like before,
two of the three nodes would usually lose their network connection at
about the same time - just a few minutes apart. Not always the same
nodes, but always two of three.

We had set up some monitoring to dump the ARP table and routes every
minute and to diff them against the previous run. On the nodes, that diff
didn't indicate any changes around the times the failures happened. The
ARP table on the pfSense also remained the same.
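
A minimal sketch of what such once-a-minute snapshot-and-diff monitoring
could look like. This is not the script that ran on these nodes: the state
directory, the function name, and the choice of `arp -an` and
`ip route show` as data sources are all assumptions for illustration.

```sh
#!/bin/sh
# Hypothetical sketch: record a command's output and report changes between runs.
# STATE_DIR and the function name are illustrative assumptions.

STATE_DIR="${STATE_DIR:-/var/tmp/netwatch}"

# snapshot_and_diff NAME COMMAND [ARGS...]
# Runs COMMAND, stores its sorted output as NAME.cur, and prints a timestamped
# unified diff against the previous run's output if the two differ.
snapshot_and_diff() {
    name=$1
    shift
    mkdir -p "$STATE_DIR" || return 1
    cur="$STATE_DIR/$name.cur"
    prev="$STATE_DIR/$name.prev"

    # Keep the last snapshot for comparison.
    [ -f "$cur" ] && mv "$cur" "$prev"

    # Sort so that mere reordering of entries does not show up as a change.
    "$@" | sort > "$cur"

    if [ -f "$prev" ] && ! diff -u "$prev" "$cur" > "$STATE_DIR/$name.diff"; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') $name changed:"
        cat "$STATE_DIR/$name.diff"
    fi
    return 0
}

# Example use from a once-a-minute cron job (not executed here):
#   snapshot_and_diff arp    arp -an
#   snapshot_and_diff routes ip route show
```

The first run only records a baseline. Later runs print nothing unless the
output changed, so the cron job's mail or log stays empty while the tables
are stable.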

We ran a ping from inside a VPS to the outside world and from the
outside to a VPS, while running tcpdump on both br0 and venet0 on the
node and another tcpdump on venet0 inside the VPS. If I recall correctly,
the ICMP packets arrived at the node's br0 and were routed to venet0,
where the VPS could see them. But on the way back the ICMP response got
lost between venet0 and br0.
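
A sketch of how parallel captures like these could be scripted on the node,
so the br0 and venet0 traces cover the same time window and can be compared
afterwards. The interface names come from the description above. The
function name, the output directory, the fixed capture duration, and the
placeholder address 192.0.2.10 are assumptions.

```sh
#!/bin/sh
# Hypothetical sketch: capture ICMP for one container IP on several interfaces
# at once. TCPDUMP can be overridden, e.g. for a dry run.

TCPDUMP="${TCPDUMP:-tcpdump}"

# capture_icmp SECONDS IP OUTDIR IFACE...
# Starts one tcpdump per interface in the background, each writing its own
# pcap file, lets them run for SECONDS, then stops them with SIGTERM.
capture_icmp() {
    duration=$1
    ip=$2
    outdir=$3
    shift 3
    pids=""
    for iface in "$@"; do
        # -n: no name resolution; the filter matches ICMP to or from $ip
        "$TCPDUMP" -n -i "$iface" -w "$outdir/icmp-$iface.pcap" icmp and host "$ip" &
        pids="$pids $!"
    done
    sleep "$duration"
    # tcpdump finishes its capture file when it receives SIGTERM.
    kill $pids 2>/dev/null || true
    wait
}

# Example (not executed here); 192.0.2.10 stands in for the container's IP:
#   capture_icmp 60 192.0.2.10 /var/tmp br0 venet0
#   tcpdump -n -r /var/tmp/icmp-br0.pcap
#   tcpdump -n -r /var/tmp/icmp-venet0.pcap
```

An echo reply that shows up in the venet0 trace but not in the br0 trace for
the same sequence number would match the loss described above.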

We were kinda running out of options there (and patience as far as the
client's clients were concerned). So we ditched the bridges entirely and
re-configured all nodes to use eth0 directly. As we don't (at least at
the moment) use KVM we can make do without the bridges.

Open vSwitch sounds like a *very* interesting alternative, but we
couldn't yet find the time to experiment with it. So we chose the devil
we know, which is eth0. :p

It'll take a few days to see if that solves our issues, but so far it's
looking good. /knocking on wood

Many thanks to all who offered suggestions. Much appreciated!

-- 
With best regards

Michael Stauber
___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] OpenVZ on EL6 - weird network issue

2014-12-27 Thread Scott Dowdle
Greetings,

- Original Message -
> Symptoms:
> =========
>
> All nodes and VPS's sporadically get unreachable from the outside. From
> within the private network (from another box for example) one can still
> SSH in. Pings to public IPs then still work from the nodes, but no
> longer work from inside the VPS's.

I have seen this reported in the #openvz IRC channel a number of times over the 
years and almost always it was inside of a commercial datacenter.  The cause of 
the problem seemed to be some upstream routing device that was periodically 
dropping ARP table entries for the containers.  I'm not much of a networking 
person so forgive me for a lack of precision of language.  Anyway, one 
workaround to fix it was to periodically (via cron or whatever) do an arpsend 
from the host (or maybe each container?) to keep the ARP entries refreshed.  I 
don't remember the specifics of the arpsend method but in a pinch you can 
probably look at how vzctl (in the C source code) sets up a container's IP 
address... and mimic that.
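
A rough sketch of such a periodic refresh, assuming the arpsend utility
shipped with vzctl and its -U (unsolicited update) mode. The function name,
the default path, the device name, and the use of `vzlist -H -o ip` to list
container addresses are assumptions, not something confirmed in this thread.
Check arpsend(8) and vzlist(8) on the node before relying on it.

```sh
#!/bin/sh
# Hypothetical sketch: re-announce container IPs with unsolicited ARP updates.
# Assumes vzctl's arpsend is installed; path and device are illustrative.

ARPSEND="${ARPSEND:-/usr/sbin/arpsend}"

# announce_ips DEVICE
# Reads whitespace-separated IPv4 addresses from stdin and sends one
# ARP update per address on DEVICE.
announce_ips() {
    dev=$1
    tr -s ' \t' '\n\n' | while read -r ip; do
        case $ip in
            ''|-|*:*) continue ;;   # skip blanks, "-" (no IP), and IPv6
        esac
        "$ARPSEND" -c 1 -w 1 -U -i "$ip" "$dev" || echo "arpsend failed for $ip" >&2
    done
}

# Example use from cron (not executed here):
#   vzlist -H -o ip | announce_ips eth0
```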

I'd like to hear more once you get it figured out.

TYL,
-- 
Scott Dowdle
704 Church Street
Belgrade, MT 59714
(406)388-0827 [home]
(406)994-3931 [work]


Re: [Users] OpenVZ on EL6 - weird network issue

2014-12-27 Thread Pavel Odintsov
Hello, folks!

Have you tried Open vSwitch instead of the standard Linux bridges?

I am working with Open vSwitch on many servers, and everything is working
perfectly now.

On Sun, Dec 28, 2014 at 12:21 AM, Michael Stauber mstau...@blueonyx.it wrote:
> Hi Scott,
>
>> [...] almost always it was inside of a commercial datacenter.
>
> Yeah, it's in a large datacenter as well.
>
>> The cause of the problem seemed to be some upstream routing
>> device that was periodically dropping ARP table entries for
>> the containers.
>
> That is indeed interesting. Well, I guess there is only one way to find
> out. I'll set up a cronjob as suggested to do periodic arpsends. It sure
> won't hurt.
>
> So far we tried this here:
>
> http://forum.proxmox.com/threads/8301-OpenVZ-Containers-lose-internet-connection-%28VLAN-venet%29
>
> TL;DR:
>
> echo 2 > /proc/sys/net/ipv4/conf/br0/rp_filter
>
> That sounded good on paper, but just 20 hours later we had the next outage.
>
> The OpenVZ wiki also pointed out this here (Bridge doesn't forward
> packets):
>
> http://openvz.org/Bridge_doesn%27t_forward_packets
>
> Yet I'm not sure how applicable that might be to us. I noted down the
> current status of /proc/sys/net/bridge/* and will compare it with what
> it'll be during the next loss of connectivity.
>
>> I'd like to hear more once you get it figured out.
>
> Sure, I'll post follow-ups then. Likewise: If you have any other ideas
> or suggestions, then I'd love to hear them.
>
> --
> With best regards
>
> Michael Stauber



-- 
Sincerely yours, Pavel Odintsov


Re: [Users] OpenVZ on EL6 - weird network issue

2014-12-27 Thread Michael Stauber
Hi Pavel,

> Have you tried Open vSwitch instead of the standard Linux bridges?

I actually wasn't aware of that one before you mentioned it.

I'm just starting to read the docs and specs of it and it does indeed
look very interesting.

Do you by chance have any specific pointers or docs on how to set it up
for use with OpenVZ? I'm willing to give it a shot, but with these client
systems I can't afford much in the way of experimenting. Taking the
bridge stack apart and replacing it with something else might be a
little too extreme, but I'll consider it as a last resort. In any case I'd
like to avoid the usual beginner mistakes one could make when switching
to Open vSwitch, so any tips and hints (or URLs to reading material)
would be appreciated.

As for Scott's suggestion of a cronjob doing arpsends:

I did some digging and the network related shell scripts of OpenVZ are
quite enlightening there:

/etc/sysconfig/network-scripts/ifup-venet
/usr/libexec/vzctl/scripts/vps-functions

It appears that if I set up a cronjob that periodically runs ...

/etc/sysconfig/network-scripts/ifup-venet
/etc/sysconfig/network-scripts/ifcfg-br0

... then it'll handle the arpsends as well via the function vzarp() as
provided by /usr/libexec/vzctl/scripts/vps-functions.

I tried it on a test box and it didn't appear to have any negative
impact. Some debugging lines thrown in by me also showed me the exact
commands that these scripts use to do the arpsends. I can work with that.

I'll implement that on the three affected nodes and will see if it helps.

-- 
With best regards

Michael Stauber


Re: [Users] OpenVZ on EL6 - weird network issue

2014-12-27 Thread Pavel Odintsov
Hello!

You can read this ticket,
https://bugzilla.openvz.org/show_bug.cgi?id=2896, and find all the
answers there.

-- 
Sincerely yours, Pavel Odintsov


Re: [Users] OpenVZ on EL6 - weird network issue

2014-12-27 Thread Hristo Benev
 Original message
 From: Michael Stauber mstau...@blueonyx.it
 Subject: Re: [Users] OpenVZ on EL6 - weird network issue
 To: OpenVZ users users@openvz.org
 Sent: Sunday, 2014, December 28 00:18:17 EET


From what you are experiencing, I'd check ARP on the switch...

If I understood correctly, your firewall cannot reach the nodes (they
cannot be reached from outside...) - can you ssh to them from the FW?

Good luck...


Re: [Users] OpenVZ on EL6 - weird network issue

2014-12-27 Thread Michael Stauber
Hi Hristo,

> From what you are experiencing, I'd check ARP on the switch...

We'll do that the next time it happens.

> If I understood correctly, your firewall cannot reach the nodes
> (cannot be reached from outside...) - can you ssh to them from FW?

When the problem happens we cannot SSH in to the affected node from
outside the private network. But from unaffected hosts inside the
private network we can SSH to the affected node(s). So far we haven't
tried to SSH from the firewall to the affected node while we had the
problem. But you're right: This was probably an oversight and we'll
definitely check that next time around.

Thank you!

-- 
With best regards

Michael Stauber