As the subject says, I am seeing rate spikes in long-duration tests using
tcpreplay. The spike occurred after roughly 12 hours of transmitting at a
rate of 425Mb/s.
Question: Is this a known problem, or is it possible I am doing something
wrong in my testing? I've never heard of such a thing before, and neither
have any of my colleagues who have used tcpreplay extensively; we are all
at a loss to explain it.
I have a work-around so that I can complete my testing: essentially, I
run tcpreplay on the file once at the given rate ("-l 1" instead of
"-l 0") and wrap *that* in a script that repeats indefinitely. That
should give me the same results, but I would like to know whether the
behaviour I am seeing is expected or anomalous.
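For reference, the work-around is just a shell loop around a single-pass
replay; this is a minimal sketch (interface, rate, and file name taken
from the test described below):

```shell
#!/bin/sh
# Work-around sketch: replay the capture once per iteration ("-l 1")
# at the target rate, and restart tcpreplay between passes so any
# long-lived internal state is reset each time.
while true; do
    tcpreplay -i p1p3 -M 425 -l 1 ethernet_all.dmp || break
done
```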
Thanks for any time you have to look into this.
~~~~~
Setup:
I am using tcpreplay version 4.1.0 on a CentOS 7.2.1511 operating
system. My goal is to send traffic at a steady rate for a long duration
(24 hours). In my two attempts, I have seen that after about 12 hours
the rate that I specified was no longer enforced.
I have two servers directly connected by ethernet cable. Conveniently,
the connected interfaces are both named p1p3.
In my first attempt, I was sending traffic from one server to the other
at 415Mb/s.
# tcpreplay -i p1p3 -M 415 -l 0 ethernet_all.dmp
I started this test at around 18:00. I verified the speed on both ends
using a script which basically takes the delta of
# cat /sys/class/net/${INTERFACE}/statistics/[tr]x_bytes
periodically to compute the speed. I divided the result by 1024^2 to
get it in MB/s instead and was seeing 49MB/s consistently. The source
machine reported 49MB/s tx and the destination machine reported 49MB/s rx.
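For completeness, the measurement script is essentially the following
sketch (the sysfs path is from the command above and the 1024^2 divisor
matches the MB/s figures I quoted; the variable names and the 5-second
interval are my own illustration):

```shell
#!/bin/sh
# Rate-measurement sketch: take the delta of tx_bytes (or rx_bytes)
# over a short interval and report the result in MB/s (bytes / 1024^2).
IFACE=${IFACE:-p1p3}
DIR=${DIR:-tx}              # tx on the sender, rx on the receiver
INTERVAL=${INTERVAL:-5}
STAT="/sys/class/net/$IFACE/statistics/${DIR}_bytes"

# rate_mbs <bytes_before> <bytes_after> <seconds> -> whole MB/s
rate_mbs() { echo $(( ($2 - $1) / $3 / 1048576 )); }

# Sample once; run this in a loop (or under watch) for a running view.
if [ -r "$STAT" ]; then
    before=$(cat "$STAT")
    sleep "$INTERVAL"
    after=$(cat "$STAT")
    echo "$(date '+%H:%M:%S')  $(rate_mbs "$before" "$after" "$INTERVAL") MB/s"
fi
```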
Things got strange at 05:26 the next day; the rate on both machines
jumped from 49MB/s to 114MB/s. By the time I saw it, it had been
running at that rate for many hours. When I stopped it, the script I
used to compute the rate reported 0MB/s, and when I restarted (only a
few seconds later), it was back to the normal 49MB/s.
I retried the same test the next day, this time with a traffic rate of
425Mb/s (50MB/s), and saw the same result. I started the traffic around
16:00 and the spike occurred at about 6:50 the next morning (sorry, I
don't have exact times). Again, it began transmitting at 114MB/s.
In an effort to isolate the problem, I repeated the procedure without my
application, using tcpreplay on one end and tcpdump on the other. I set
a rate of "-M 425" and left it overnight. About 12 hours later (1:45)
the spike occurred again, ramping traffic up to 114MB/s.
Here are some additional data points:
- If I run at top speed (-t), then I also see 114MB/s, so it would seem
that after a while tcpreplay begins transmitting at top speed.
- Originally I was accessing my pcap file over an NFS mount. In order
to rule out the possibility that it was somehow affected by this, I made
a local copy of the file.
- There is no other data on that interface. When I stop tcpreplay, the
traffic rate drops to 0 and when I restart, it goes to the normal
(pre-spike) value. If there was another source of traffic, I would
expect stopping tcpreplay would decrease the rate only by the amount
contributed (e.g. 49MB/s), but not all the way to 0. This suggests that
tcpreplay is the only source of traffic.
Since it takes about 12 hours for this to happen, results come slowly.
However, my next steps are to test the following:
- Reverse the flow. See if the same problem occurs if I replay traffic
from the destination back to the source. It should show the same
problem; if not, I can start investigating hardware/configuration
differences between the two supposedly "identical" servers.
- Try a different pcap file. I am not sure why the pcap file should
have an effect, but I have no experience with the tcpreplay source code,
so perhaps it does.
- Try higher/lower rates, for example "-M 200" or "-M 600". See whether
the problem occurs sooner, later, or at all at the different rates. If
it is a memory problem, perhaps higher rates will cause it to happen
sooner. Also, if spikes do occur, see whether they all spike to the same
114MB/s value.
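If it helps anyone reproduce this, the rate sweep could be driven by a
harness along these lines (a sketch only; the 24-hour timeout, log file
name, and rate list are my own choices):

```shell
#!/bin/sh
# Rate-sweep sketch: run tcpreplay at each rate for up to 24 hours,
# with timestamped log lines so the log shows which -M value was
# active when a spike occurred.
: > rate_sweep.log
for RATE in 200 425 600; do
    echo "$(date '+%F %T') starting -M $RATE" >> rate_sweep.log
    timeout 86400 tcpreplay -i p1p3 -M "$RATE" -l 0 ethernet_all.dmp
    echo "$(date '+%F %T') finished -M $RATE" >> rate_sweep.log
done
```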
I am not sure what else to try, so I am open to suggestions. As I said,
I have a work-around, so I don't need to investigate this; however, I
would like to know whether it is a bug or perhaps a user error.
If there is any additional information I can get or things I can try, I
am open to doing so.
Thanks again for your time!
Chris
TCPREPLAY version:
# tcpreplay -V
tcpreplay version: 4.1.0 (build git:v4.1.0)
Copyright 2013-2014 by Fred Klassen <tcpreplay at appneta dot com> -
AppNeta Inc.
Copyright 2000-2012 by Aaron Turner <aturner at synfin dot net>
The entire Tcpreplay Suite is licensed under the GPLv3
Cache file supported: 04
Not compiled with libdnet.
Compiled against libpcap: 1.5.3
64 bit packet counters: enabled
Verbose printing via tcpdump: enabled
Packet editing: disabled
Fragroute engine: disabled
Injection method: PF_PACKET send()
Not compiled with netmap
Sample command line:
# tcpreplay -i p1p3 -M 425 -l 0 ethernet_all.dmp
Platform:
# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
Network information (identical on both servers):
# ethtool p1p3
Settings for p1p3:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: off (auto)
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes
# ethtool -i p1p3
driver: igb
version: 5.2.15-k
firmware-version: 1.67, 0x80000d66, 16.5.20
bus-info: 0000:01:00.2
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
# lshw -class network
*-network:2
description: Ethernet interface
product: I350 Gigabit Network Connection
vendor: Intel Corporation
physical id: 0.2
bus info: pci@0000:01:00.2
logical name: p1p3
version: 01
serial: a0:36:9f:83:79:52 (ends in 78:bb on the other machine
for what it is worth)
size: 1Gbit/s
capacity: 1Gbit/s
width: 32 bits
clock: 33MHz
capabilities: pm msi msix pciexpress vpd bus_master cap_list rom
ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
configuration: autonegotiation=on broadcast=yes driver=igb
driverversion=5.2.15-k duplex=full firmware=1.67, 0x80000d66, 16.5.20
latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s
resources: irq:18 memory:a2c00000-a2cfffff
memory:a2f04000-a2f07fff memory:a0180000-a01fffff memory:a2f50000-a2f6ffff
_______________________________________________
Tcpreplay-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tcpreplay-users
Support Information: http://tcpreplay.synfin.net/trac/wiki/Support