As the subject says, I am seeing rate spikes in long-duration tests using tcpreplay. The spike occurred after roughly 12 hours when transmitting at a rate of 425Mb/s.
Question: Is this a known problem? Is it possible I am doing something wrong in my testing? I've never heard of such a thing before, and none of my colleagues who have used tcpreplay extensively have seen such a thing either; we are all at a loss to explain it.

I have a work-around so that I can complete my testing. Essentially I will run tcpreplay on the file once at the given rate ("-l 1" instead of "-l 0") and wrap *that* in a script so it repeats indefinitely. That should get me the same results, but I would like to know if the behaviour I am seeing is expected or anomalous. Thanks for any time you have to look into this.

~~~~~

Setup: I am using tcpreplay version 4.1.0 on a CentOS 7.2.1511 operating system. My goal is to send traffic at a steady rate for a long duration (24 hours). In my two attempts, I have seen that after about 12 hours the rate that I specified was no longer enforced.

I have two servers directly connected by ethernet cable. Conveniently, the connected interfaces are both named p1p3. In my first attempt, I was sending traffic from one server to the other at 415Mb/s:

# tcpreplay -i p1p3 -M 415 -l 0 ethernet_all.dmp

I started this test at around 18:00. I verified the speed on both ends using a script which basically takes the delta of

# cat /sys/class/net/${INTERFACE}/statistics/[tr]x_bytes

periodically to compute the speed. I divided the result by 1024^2 to get it in MB/s instead, and was seeing 49MB/s consistently. The source machine reported 49MB/s tx and the destination machine reported 49MB/s rx.

Things got strange at 05:26 the next day: the rate on both machines jumped from 49MB/s to 114MB/s. By the time I saw it, it had been running at that rate for many hours. When I stopped it, the script I used to compute the rate reported 0MB/s, and when I restarted (only a few seconds later), it was back to the normal 49MB/s. I retried the same test the next day with the same results.
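For reference, the rate-check script is essentially the following sketch (the default interface and sample count here are placeholders; I actually run it against p1p3 until interrupted):

```shell
#!/bin/sh
# Minimal sketch of the rate-check script described above.
# Defaults are placeholders for illustration; in my tests INTERFACE=p1p3
# and the loop runs indefinitely.
INTERFACE=${1:-lo}
SAMPLES=${2:-1}
STATS=/sys/class/net/${INTERFACE}/statistics

prev_rx=$(cat "${STATS}/rx_bytes")
prev_tx=$(cat "${STATS}/tx_bytes")
i=0
while [ "$i" -lt "$SAMPLES" ]; do
    sleep 1
    rx=$(cat "${STATS}/rx_bytes")
    tx=$(cat "${STATS}/tx_bytes")
    # Delta of the byte counters over one second, divided by 1024^2,
    # so a 415Mb/s stream shows up as about 49MB/s.
    echo "rx: $(( (rx - prev_rx) / 1048576 ))MB/s  tx: $(( (tx - prev_tx) / 1048576 ))MB/s"
    prev_rx=$rx
    prev_tx=$tx
    i=$(( i + 1 ))
done
```

(For what it is worth, 114MB/s measured this way works out to about 956Mb/s on the wire, i.e. roughly gigabit line rate.)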
This time I was using a traffic rate of 425Mb/s (50MB/s), but the same result was seen. I started the traffic around 16:00 and the spike occurred at 6:50 the next morning (sorry, I don't have exact times). Again, it began transmitting at 114MB/s.

In an effort to isolate the problem, I repeated the procedure without my application: I used tcpreplay on one end and tcpdump on the other. I set a rate of "-M 425" and left it over night. About 12 hours later (1:45) the spike again occurred, ramping traffic up to 114MB/s.

Here are some additional data points:

- If I run at top speed (-t), then I also see 114MB/s, so it would seem that after a while tcpreplay begins transmitting at top speed.

- Originally I was accessing my pcap file over an NFS mount. In order to rule out the possibility that the replay was somehow affected by this, I made a local copy of the file.

- There is no other data on that interface. When I stop tcpreplay, the traffic rate drops to 0, and when I restart, it goes back to the normal (pre-spike) value. If there were another source of traffic, I would expect stopping tcpreplay to decrease the rate only by the amount it contributed (e.g. 49MB/s), not all the way to 0. This suggests that tcpreplay is the only source of traffic.

Since it takes about 12 hours for this to happen, it is a little slow to get results. However, my next steps are to test the following:

- Reverse the flow. See if this same problem occurs if I replay traffic from the destination back to the source. It should have the same problem, but if not, then I can start investigating HW/config on the two supposedly "identical" servers.

- Try with a different pcap file. I am not sure why the pcap file should have an effect, but I have no experience with the tcpreplay source code, so perhaps it does.

- Try with higher/lower rates. For example, try with "-M 200" or "-M 600" instead. See if the problem occurs sooner/later/at all with the different rates.
If it is a memory problem, perhaps higher rates will cause it to happen sooner. Also, if the spikes occur, see whether they all spike to the same 114MB/s value.

I am not sure what else to try, so I am open to suggestions. Like I said, I have a work-around so I don't need to investigate this; however, I would like to know if it is a bug or perhaps a user error. If there is any additional information I can get or things I can try, I am open to doing so. Thanks again for your time!

Chris

TCPREPLAY version:

# tcpreplay -V
tcpreplay version: 4.1.0 (build git:v4.1.0)
Copyright 2013-2014 by Fred Klassen <tcpreplay at appneta dot com> - AppNeta Inc.
Copyright 2000-2012 by Aaron Turner <aturner at synfin dot net>
The entire Tcpreplay Suite is licensed under the GPLv3
Cache file supported: 04
Not compiled with libdnet.
Compiled against libpcap: 1.5.3
64 bit packet counters: enabled
Verbose printing via tcpdump: enabled
Packet editing: disabled
Fragroute engine: disabled
Injection method: PF_PACKET send()
Not compiled with netmap

Sample command line:

# tcpreplay -i p1p3 -M 425 -l 0 ethernet_all.dmp

Platform:

# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)

Network information (identical on both servers):

# ethtool p1p3
Settings for p1p3:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: off (auto)
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes

# ethtool -i p1p3
driver: igb
version: 5.2.15-k
firmware-version: 1.67, 0x80000d66, 16.5.20
bus-info: 0000:01:00.2
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

# lshw -class network
  *-network:2
       description: Ethernet interface
       product: I350 Gigabit Network Connection
       vendor: Intel Corporation
       physical id: 0.2
       bus info: pci@0000:01:00.2
       logical name: p1p3
       version: 01
       serial: a0:36:9f:83:79:52 (ends in 78:bb on the other machine, for what it is worth)
       size: 1Gbit/s
       capacity: 1Gbit/s
       width: 32 bits
       clock: 33MHz
       capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=igb driverversion=5.2.15-k duplex=full firmware=1.67, 0x80000d66, 16.5.20 latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s
       resources: irq:18 memory:a2c00000-a2cfffff memory:a2f04000-a2f07fff memory:a0180000-a01fffff memory:a2f50000-a2f6ffff

_______________________________________________
Tcpreplay-users mailing list
Tcpreplay-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tcpreplay-users
Support Information: http://tcpreplay.synfin.net/trac/wiki/Support