Greetings, community. Recently I've noticed packet loss in my IP over IB (IPoIB) network environment. I'm using servers equipped with MCX354A-QCBT under Debian 7.6 with MLNX_OFED_LINUX-2.4-1.0.4, and with MCX353A-QCBT under Windows 2012 Server with WinOF 4.90. All cards are flashed with firmware 2.33.5000. The hosts are connected to an InfiniScale IS5023 switch.
Packet loss happens when the tx rate is above 1000 p/s and the packet size is 2002 bytes or less. As a test I ran several ping series between the Linux and Windows hosts. They show the following:

IPoIB, rate = 10k p/s, over 10k packets are sent. Packet size > 2002. No loss.
ping -q -i 0.0001 -c 100000 -s 2003 192.168.*
100000 packets transmitted, 100000 received, 0% packet loss, time 5310ms
rtt min/avg/max/mdev = 0.035/0.046/3.765/0.098 ms, ipg/ewma 0.053/0.043 ms

IPoIB, rate = 10k p/s, over 10k packets are sent. Packet size = 2002. I see loss.
ping -q -i 0.0001 -c 100000 -s 2002 192.168.*
100000 packets transmitted, 98795 received, 1% packet loss, time 18688ms
rtt min/avg/max/mdev = 0.023/0.034/8.920/0.106 ms, ipg/ewma 0.186/0.156 ms

IPoIB, rate = 10k p/s, over 10k packets are sent. Packet size = 2002. I see loss.
ping -q -i 0.0001 -c 100000 -s 2002 192.168.*
100000 packets transmitted, 99255 received, 0% packet loss, time 13153ms
rtt min/avg/max/mdev = 0.024/0.035/8.636/0.103 ms, ipg/ewma 0.131/0.025 ms

IPoIB, rate = 10k p/s, over 10k packets are sent. Packet size > 2002. No loss.
ping -q -i 0.0001 -c 100000 -s 2003 192.168.*
100000 packets transmitted, 100000 received, 0% packet loss, time 9278ms
rtt min/avg/max/mdev = 0.074/0.085/9.890/0.108 ms, ipg/ewma 0.092/0.076 ms

IPoIB, rate is 10k p/s, but fewer than 10k packets are sent. No loss.
ping -i 0.0001 192.168.* -q -c 1000
1000 packets transmitted, 1000 received, 0% packet loss, time 45ms
rtt min/avg/max/mdev = 0.031/0.040/1.065/0.086 ms, ipg/ewma 0.045/0.033 ms

IPoIB, rate is 10k p/s, 10k packets are sent. Loss happens again.
ping -i 0.0001 192.168.* -q -c 10000
10000 packets transmitted, 9842 received, 1% packet loss, time 2334ms
rtt min/avg/max/mdev = 0.020/0.038/1.072/0.088 ms, ipg/ewma 0.233/0.023 ms

IPoIB, rate is 10k p/s, over 10k packets are sent. Packet size is standard. Loss detected.
ping -q -i 0.0001 -c 100000 192.168.*
100000 packets transmitted, 98167 received, 1% packet loss, time 25631ms

IPoIB, rate is 1k p/s, 10k packets are sent. Standard packet size. No loss.
ping -i 0.001 -c 10000 -q 192.168.*
10000 packets transmitted, 10000 received, 0% packet loss, time 9998ms
rtt min/avg/max/mdev = 0.021/0.025/0.334/0.010 ms

IPoIB, rate is higher than 1k p/s, 10k packets are sent. Standard packet size. Loss again.
ping -i 0.0009 -c 10000 -q 192.168.*
10000 packets transmitted, 9865 received, 1% packet loss, time 2126ms
rtt min/avg/max/mdev = 0.021/0.036/1.070/0.088 ms, ipg/ewma 0.212/0.022 ms

I also have a twin Linux server. I see no packet loss in tests between Linux and Linux:

IPoIB, rate is higher than 1k p/s, 10k packets are sent.
ping -i 0.0009 -c 10000 -q 192.168.*
10000 packets transmitted, 10000 received, 0% packet loss, time 311ms
rtt min/avg/max/mdev = 0.015/0.026/1.398/0.096 ms, ipg/ewma 0.031/0.017 ms

All of these servers are also connected to the same 1G Ethernet switch. When I run the same test over Ethernet, I see no packet loss at all, regardless of OS:

Ethernet, rate > 1k p/s, 10k packets are sent, standard packet size.
ping -i 0.0009 -c 10000 -q 192.168.**
10000 packets transmitted, 10000 received, 0% packet loss, time 750ms
rtt min/avg/max/mdev = 0.039/0.065/0.632/0.013 ms, ipg/ewma 0.075/0.066 ms

Please feel free to share ideas about this issue.
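
A side note on reading the numbers: the ping summary seems to print the loss as a whole percent rounded down, so one of the -s 2002 runs reports "0% packet loss" although only 99255 of 100000 replies came back. Below is a minimal Python sketch that recomputes the exact loss from the summary lines quoted above. The helper name loss_percent is my own and is not part of ping or any other tool.

```python
import re

# Matches the counters in a ping summary line such as
# "100000 packets transmitted, 99255 received, 0% packet loss, time 13153ms"
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received")


def loss_percent(summary: str) -> float:
    """Return the exact loss percentage from a ping summary line.

    Hypothetical helper for illustration; it ignores the rounded
    percentage that ping prints and uses the raw counters instead.
    """
    match = SUMMARY.search(summary)
    if match is None:
        raise ValueError("no ping summary found in input")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent


if __name__ == "__main__":
    # Summary lines copied from the Linux <-> Windows IPoIB runs above.
    runs = [
        "100000 packets transmitted, 98795 received, 1% packet loss, time 18688ms",
        "100000 packets transmitted, 99255 received, 0% packet loss, time 13153ms",
        "10000 packets transmitted, 9842 received, 1% packet loss, time 2334ms",
        "100000 packets transmitted, 98167 received, 1% packet loss, time 25631ms",
        "10000 packets transmitted, 9865 received, 1% packet loss, time 2126ms",
    ]
    for line in runs:
        print(f"{loss_percent(line):.3f}%  <-  {line}")
```

By the raw counters, the losing runs fall between roughly 0.7% and 1.8%, so the "0%" run is not actually loss-free.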
