Hi, in general yes, it may make sense. A few years ago I hit the same issue with 10GbE Intel NICs, but with KVM on Ubuntu 14, due to issues with either the kernel or the Intel driver itself - TSO and LRO should be disabled when routing or bridging, since they are incompatible in those scenarios (see https://downloadmirror.intel.com/14687/eng/readme.txt and search for "incompatible", or http://ehaselwanter.com/en/blog/2014/11/02/mtu-issue--nope-it-is-lro-with-bridge-and-bond/ ). LRO/TSO should be turned off automatically when you add a NIC to a bridge (CentOS 6 was fine, but Ubuntu 14 had issues)...
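As a minimal sketch of what I mean (assuming a Linux host with ethtool; the interface name eth0 is only illustrative - substitute your bridge/bond member):

```shell
# Check whether TSO and LRO are currently enabled on the interface
ethtool -k eth0 | grep -E 'tcp-segmentation-offload|large-receive-offload'

# Disable TSO and LRO on the bridge/bond member
# (note: lowercase -k only *shows* settings; uppercase -K sets them)
ethtool -K eth0 tso off lro off
```

Keep in mind ethtool changes don't persist across reboots, so you'd normally put this in the distro's network scripts or a udev rule.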
Cheers
Andrija

On Sun, 24 Feb 2019 at 08:41, Haijiao <18602198...@163.com> wrote:

> Hi, Dag and All
>
> Yes, we are using active-active (mode 7) for the bond.
>
> VM A1 ---> VR A (Isolated Network A) ----> VR B (Isolated Network B) ----> VM B1
>
> After rounds of isolation testing, based on packet analysis, it seems to us:
> - the traffic between VM A1 and VR A is normal
> - however, between VR A and VM B1, VR A receives acknowledgements from VM B1 for packets which VR A thinks it has not yet sent through
> - VR A then resets the session, causing the traffic to drop
>
> For testing purposes, we turned off TSO (tcp-segmentation-offload) on the XenServer network adapters with the command 'ethtool -K eth0 tso off', and the issue is just gone; we can run iperf without any drop for a couple of hours.
>
> Does it make sense? Can any improvement be implemented on the ACS side?
>
> Thanks!
>
> On 22 Feb 2019 at 23:20, "Haijiao" <18602198...@163.com> wrote:
>
> Thanks Dag, you are always helpful!
>
> We will look into what you shared and come back.
>
> On 22 Feb 2019 at 17:26, "Dag Sonstebo" <dag.sonst...@shapeblue.com> wrote:
>
> Hi Haijiao,
>
> We've come across similar things in the past. In short - what is your XenServer bond mode? Is it active-active (mode 7) or LACP (mode 4)? (See https://support.citrix.com/article/CTX137599.)
>
> In short, if your switches don't keep up with MAC address changes on the XS hosts, you will get traffic flapping with intermittent loss of connectivity (the root cause is that a MAC address moves to another uplink, but the switch only checks for changes every X seconds, so it takes a while to catch up). LACP (mode 4) has a much more robust mechanism for this but obviously needs to be configured on both the XS and switch ends. Plain active-active (mode 7) seems to always cause problems.
> My general advice would be to simplify and just go active-passive (mode 1) - unless you really need the bandwidth, this gives you a much more stable network backend.
>
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
>
> On 22/02/2019, 07:14, "Haijiao" <18602198...@163.com> wrote:
>
> Hi, Devs and Community Users
>
> To be more specific, our environment is built with:
> * 2 Dell R740XD servers + Dell Compellent storage w/ iSCSI
> * Each server equipped with two Mellanox ConnectX-4 Lx 25GbE network adapters, configured in bond mode (active-active) in XenServer
> * CloudStack 4.11.2 LTS + XenServer 7.1 CU2 (LTS) Enterprise
>
> Everything goes fine with a shared network, but the weird thing is: if we set up 2 isolated networks and use 'iperf', 'wget' or 'scp' to test network performance between two VMs located in these 2 isolated networks, the traffic drops to zero in about 200-300 seconds, even though we are still able to ping or SSH VM B1 from A1 and vice versa.
>
> VM A1 ---> VR A (Isolated Network A) ----> VR B (Isolated Network B) ----> VM B1
>
> ----------------------------------------------------------------------
> We have checked the configuration on the switches and upgraded the Mellanox driver for XenServer, but no luck.
> Meanwhile, we cannot reproduce this issue in another environment (XenServer 7.1 CU2 + ACS 4.11.2 + Intel Gb network).
>
> It seems it might be related to the Mellanox adapter, but we have no idea what we could possibly be missing in this case.
>
> Any advice would be highly appreciated! Thank you!
>
> On 22 Feb 2019 at 13:09, "gu haven" <gumin...@hotmail.com> wrote:
>
> Hi, all
> I tried iperf, wget and scp; the connection breaks after 200 seconds. Is any optimization needed in the VR?
> Environment information below:
> cloudstack 4.11.2
> xenserver 7.1 CU2 Enterprise
> NIC: MLNX 25GbE 2P ConnectX4LX
> bond mode in xenserver: active-active
>
> dag.sonst...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London WC2E 9DP, UK
> @shapeblue

--
Andrija Panić
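For reference, Dag's suggestion above (drop active-active in favour of active-passive) can be sketched with the XenServer CLI roughly as follows (the UUID is an illustrative placeholder - use the value your own `xe bond-list` returns):

```shell
# List existing bonds and their current modes
xe bond-list params=uuid,mode

# Switch an existing bond to active-passive
# ("active-backup" in xe terminology)
xe bond-set-mode uuid=<bond-uuid> mode=active-backup
```

Changing the bond mode briefly interrupts connectivity on that bond, so this is best done from the host console or out-of-band management rather than over the bonded link itself.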