On 17.06.19 at 15:32, Alistair Evans wrote:
> Hi,
>
> I'm encountering a strange problem when running our JMeter test plan.
>
> I effectively get a halt to all JMeter activity for up to 60 seconds (a
> strangely round number) in the middle of the test. JMeter treats this as
> elapsed time when it processes the HTTP responses after the delay, so we get
> a spike in our response times graph (screenshot available here:
> https://imgur.com/ayKTzOU).
Are you sure it is JMeter that is hanging for 60 seconds? Have you had a look
at the system under test? Maybe use another JMeter instance with less load to
check whether you see the same 60 second hang there (an example command is at
the end of this mail). Is this halt reproducible? If so, take a few thread
dumps at those times, both on the JMeter side and on the server side (example
commands are at the end of this mail as well).

> Test Description: A typical test contains 1000 users, over 2 hours, with a
> 15 minute ramp time on requests. Randomised delays are present between all
> requests.
>
> Platform Details from JMeter Log:
> Version 5.1.1 r1855137
> java.version=1.8.0_211
> java.vm.name=Java HotSpot(TM) 64-Bit Server VM
> os.name=Windows 10
> os.arch=amd64
> os.version=10.0
> file.encoding=Cp1252
> java.awt.headless=true
> Max memory =7654604800
> Available Processors =4
> Default Locale=English (EN)
> JMeter Locale=English (EN)
>
> During this window of zero activity (in these examples between 11:38:44 and
> 11:39:44), I have determined the following:
>
> - During this window there are zero HTTP requests being processed by the
>   application servers.

As written above, have a look at the state of the server, too.

> - After the delay, JMeter issues any requests whose delay timers have
>   passed (no timers are longer than 1 or 2 seconds), causing a spike in
>   requests.

That is expected. Handling this kind of overload gracefully, and notifying the
user about such an overload period, is a JMeter feature that is still waiting
to be implemented.

> - Processor usage on the load source box where JMeter is running drops to
>   almost zero during this period (you can see the perfmon capture for the
>   test on the JMeter box here: https://imgur.com/vgVhB1X).

Not unusual; threads that are waiting on locks or on the server don't need
much CPU.

> - Disk queue length is also zero, which suggests it isn't a case of
>   Windows blocking a process when it opens a file handle.
>
> - I've turned on GC verbose logging, because my initial suspicion was GC
>   STW pauses, but weirdly no GC happens at all during that period; the GC
>   log reports 60 seconds of application time, when the normal time between
>   GCs is 1 second or less. Here's an extract of the log:
>
>  Metaspace used 53031K, capacity 83842K, committed 101364K, reserved 1112064K
>   class space used 8006K, capacity 20674K, committed 38132K, reserved 1048576K
> }
> [Times: user=0.19 sys=0.00, real=0.05 secs]
> 2019-06-17T11:38:44.243+0100: 1550.963: Total time for which application threads were stopped: 0.0541356 seconds, Stopping threads took: 0.0002249 seconds
> 2019-06-17T11:38:44.246+0100: 1550.967: [GC concurrent-string-deduplication, 2702.9K->1178.3K(1524.6K), avg 54.2%, 0.0038746 secs]
> 2019-06-17T11:39:44.846+0100: 1611.566: Application time: 60.6031019 seconds
> {Heap before GC invocations=1196 (full 1):
>  garbage-first heap   total 7475200K, used 3054884K [0x0000000628400000, 0x0000000628607210, 0x00000007f0800000)
>   region size 2048K, 256 young (524288K), 32 survivors (65536K)
>  Metaspace used 53616K, capacity 85276K, committed 102388K, reserved 1114112K

Nice to know that GC -- or rather memory usage -- is not a problem here.

> - If I turn on some of the debug logging in JMeter (for the network
>   activity), then JMeter does still write to the log during this period, but
>   at a drastically reduced rate; for example, typically the
>   PoolingHttpClientConnectionManager generates a 'Closing connections idle
>   longer than 1 MICROSECONDS' log up to 10 times per second, but during this
>   period it writes that message zero times.
Probably because all connections are still in use, waiting for the server.

> - Changing GC settings does have some effect on the frequency of this
>   issue happening. With the default GC settings it happens quite often
>   (multiple times per test). I'm currently running the following custom GC
>   options, which reduces the frequency of the problem to about a 50%
>   probability of it happening once during a test:
>
>   -XX:+HeapDumpOnOutOfMemoryError
>   -Xms7299m
>   -Xmx7299m
>   -XX:MaxMetaspaceSize=256m
>   -XX:+UseG1GC
>   -XX:MaxGCPauseMillis=100
>   -XX:G1ReservePercent=20
>   -XX:+AlwaysPreTouch
>   -XX:+ParallelRefProcEnabled
>   -XX:+UseStringDeduplication
>   -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses
>   -XX:NewSize=512m
>   -XX:MaxNewSize=3g
>   -XX:NewRatio=2
>   -XX:MaxMetaspaceExpansion=64M

Wow, those are a lot of GC options. I tend to start without any GC options
(except Xmx/Xms) and only add more if there are problems with GC (a minimal
sketch is at the end of this mail).

> Any clues as to additional logging to further investigate, or JMeter
> settings to change, would be greatly appreciated.

Take thread dumps when the halt is happening (see the examples below).

Felix

> Thanks,
>
> Alistair
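
For the thread dumps, the plain JDK tools are enough. A minimal sketch,
assuming a full JDK (not just a JRE) is available on both the JMeter box and
the server and its bin directory is on the PATH; <pid> is the process id of
the JVM you want to look at:

    jps -l                                    (lists running JVMs; the JMeter one shows ApacheJMeter.jar)
    jcmd <pid> Thread.print > threads-1.txt   (writes one thread dump to a file)
    jstack -l <pid> > threads-2.txt           (alternative tool, same information)

Take three or four dumps a few seconds apart while the halt is happening and
compare where the JMeter threads are sitting (RUNNABLE in a socket read,
BLOCKED on a lock, parked in a timer, and so on).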
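
For the second, lightly loaded JMeter instance, non-GUI mode on a separate
machine is the simplest setup. The plan and file names here are only
placeholders; the probe plan would contain just a handful of threads hitting
the same URLs as the real test:

    jmeter -n -t probe-plan.jmx -l probe-results.jtl

If the probe shows the same 60 second gap in its response times, the problem
is on the server (or network) side; if it keeps getting normal response times
while the big instance stalls, the problem is local to the loaded JMeter
process.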
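
And as a starting point for the JVM options, setting only the heap is usually
enough. One way to do that on Windows, assuming the jmeter.bat you start with
picks up the JVM_ARGS environment variable (otherwise adjust the HEAP line in
jmeter.bat instead; plan and result file names are placeholders again):

    set JVM_ARGS=-Xms7299m -Xmx7299m
    jmeter -n -t yourplan.jmx -l results.jtl

Then add further GC flags one at a time, and only when the GC log shows an
actual problem.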