Re: Usage of SynchronizedStack/Queue
On Fri, Jun 5, 2020 at 6:29 PM Mark Thomas wrote: > On 05/06/2020 14:08, Martin Grigorov wrote: > > For load testing it I use wrk: wrk -c96 -t8 -d60s https://host:port/test > > The GC JVM arguments are: -Xloggc:./gc.log -XX:+PrintGCDetails > -verbose:gc > > JDK 1.8.0 b252 > > > > The GC logs are: > > - Tomcat 9.0.x (uses > > SynchronizedQueue/Stack): > https://gist.github.com/martin-g/d2570e7a6896e4d094ce548ceea3adb6 > > - Tomcat 9.0.x with my > > changes: > https://gist.github.com/martin-g/52c7d3a883b37e9bcd11ad6430800852 > > I've uploaded them to https://gceasy.io/ and the charts are similar. > > lock-free has marginally higher throughput. > > It has a longer GC pause but that might be a result of running longer > (~x2 longer). > > Average creation rate and average promotion rate are lower. > > Given this was with Java 8 I'm +1 to switching Tomcat 10 and also +1 to > switching Tomcat 9 if we can do it without impacting the API. > I've noticed org.apache.tomcat.util.collections.TesterPerformanceSynchronizedStack and org.apache.tomcat.util.collections.TesterPerformanceSynchronizedQueue in test/ folder. Those give 1.5-3 times better results for SynchronizedQueue/Stack than Java Collections impls. I am not so sure my proposal is a good one anymore. > > I'm neutral on making the change in Tomcat 8.5 and -0.5 on Tomcat 7. > > Mark > > - > To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org > For additional commands, e-mail: dev-h...@tomcat.apache.org > >
Re: Usage of SynchronizedStack/Queue
On 05/06/2020 14:08, Martin Grigorov wrote: > For load testing it I use wrk: wrk -c96 -t8 -d60s https://host:port/test > The GC JVM arguments are: -Xloggc:./gc.log -XX:+PrintGCDetails -verbose:gc > JDK 1.8.0 b252 > > The GC logs are: > - Tomcat 9.0.x (uses > SynchronizedQueue/Stack): > https://gist.github.com/martin-g/d2570e7a6896e4d094ce548ceea3adb6 > - Tomcat 9.0.x with my > changes: https://gist.github.com/martin-g/52c7d3a883b37e9bcd11ad6430800852 > I've uploaded them to https://gceasy.io/ and the charts are similar. lock-free has marginally higher throughput. It has a longer GC pause but that might be a result of running longer (~x2 longer). Average creation rate and average promotion rate are lower. Given this was with Java 8 I'm +1 to switching Tomcat 10 and also +1 to switching Tomcat 9 if we can do it without impacting the API. I'm neutral on making the change in Tomcat 8.5 and -0.5 on Tomcat 7. Mark - To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org
Re: Usage of SynchronizedStack/Queue
On Fri, Jun 5, 2020 at 3:08 PM Martin Grigorov wrote: > Hi, > > On Thu, Jun 4, 2020 at 9:28 PM Rémy Maucherat wrote: > >> On Thu, Jun 4, 2020 at 6:16 PM Mark Thomas wrote: >> >>> On 04/06/2020 16:33, Martin Grigorov wrote: >>> > Hi team, >>> > >>> > I am profiling our application and I've found that Tomcat uses its own >>> > Synchronized Queue/Stack implementations which synchronize on instance >>> > level, i.e. their methods are 'synchronized'. >>> >> >> Can you work off 10 (master) first ? >> > > The application is based on Spring Boot and my attempts to > "jakartaee migrate it" were not successful. > > But I've created a very simple Servlet and I will try it on Tomcat 10.x > soon: > > @WebServlet(urlPatterns = "/test") > public class PlainTextServlet extends HttpServlet { > > public static final String CONTENT = "Hello world!"; > public static final int CONTENT_LENGTH = CONTENT.length(); > > @Override > protected void doGet(final HttpServletRequest req, final > HttpServletResponse resp) throws IOException { > resp.setContentType("text/plain"); > resp.setContentLength(CONTENT_LENGTH); > resp.getWriter().println(CONTENT); > } > } > > For load testing it I use wrk: wrk -c96 -t8 -d60s https://host:port/test > The GC JVM arguments are: -Xloggc:./gc.log -XX:+PrintGCDetails -verbose:gc > JDK 1.8.0 b252 > > The GC logs are: > - Tomcat 9.0.x (uses SynchronizedQueue/Stack): > https://gist.github.com/martin-g/d2570e7a6896e4d094ce548ceea3adb6 > - Tomcat 9.0.x with my changes: > https://gist.github.com/martin-g/52c7d3a883b37e9bcd11ad6430800852 > I've uploaded them to https://gceasy.io/ and the charts are similar. > > The summary from wrk with SynchronizedQueue/Stack is: > > Running 1m test @ http://192.168.0.206:8080 > 8 threads and 96 connections > Thread Stats Avg Stdev Max +/- Stdev > Latency 1.24ms 788.15us 19.46ms 91.44% > Req/Sec10.04k 1.64k 12.45k79.48% > 4797193 requests in 1.00m, 600.19MB read > Requests/sec: 79834.45 > Transfer/sec: 9.99MB > > I didn't preserve the summary from the lock-free run but it was similar - > the throughput is around 80K/sec +- few thousands. > I.e. the throughput does not improve after my changes! > Only the locking profiling results become better (Async Profiler: > ./profile.sh -e lock -d50s -f lock-results.txt PID) > I rarely got any gains when making an optimization lately. But this sounds good [or not bad at least]. Rémy > > Martin > > >> >>> >>> ~8 years ago when I was profiling Tomcat and wrote those having >>> something targetted to Tomcat's specific use case made a measurable >>> improvement. >>> >> >> Yes, the SyncStack was the go to structure for all reuse. Not sure if >> it's the best. >> >> I already removed some of the usage in 10: >> - No more block poller and its pooling/recycling [I had to pull its >> removal out of 9 due to an unfortunate regression, this could be brought >> back now that it seems to work] >> - No recycling of trivial objects >> >> The main use should now be the recycled processors and the recycled >> channels (it would be bad to not reuse them since they may use direct >> buffers ...). >> >> >>> >>> I am more than prepared to believe that things have moved on since then >>> but I'd be interested in seeing the same numbers with the latest Java 8 >>> from Adopt OpenJDK rather than Java 14. >>> >> >> +1, firnally there could be some differences between JDK, like now they >> started to actually improve NIO 1 again. >> >> Rémy >> >> >>> >>> I'd also like to see overall throughput numbers and the GC frequency for >>> each test scenario. >>> >>> Mark >>> >>> >>> >>> > >>> > The javadoc for those two classes say that they are preferred because >>> > they are GC-friendlier than >>> > java.uti.concurrent.ConcurrentLinkedQueue. >>> > >>> > I use Async Profiler >>> > (https://github.com/jvm-profiling-tools/async-profiler) to profile >>> CPU, >>> > memory, >>> > locks and few lower level types (from perf). >>> > Issuing "./profiler.sh -e lock -d 60 PID" against 9.0.x shows such >>> reports: >>> > >>> > Started [lock] profiling >>> > >>> > --- Execution profile --- >>> > >>> > Total samples : 713834 >>> >>> > >>> > >>> > >>> > Frame buffer usage : 0.0288% >>> >>> > >>> > >>> > --- >>> > 1106165618177 ns (58.97%), 395609 samples >>> > >>> > [ 0] >>> > org.apache.tomcat.util.collections.SynchronizedStack >>> >>> > >>> > [ 1] >>> > org.apache.tomcat.util.collections.SynchronizedStack.push >>> > >>> > [ 2] >>> > org.apache.tomcat.util.net.NioBlockingSelector.write >>> >>> > >>> > [ 3] >>> > org.apache.tomcat.util.net.NioSelectorPool.write >>> >>> >
Re: Usage of SynchronizedStack/Queue
Hi, On Thu, Jun 4, 2020 at 9:28 PM Rémy Maucherat wrote: > On Thu, Jun 4, 2020 at 6:16 PM Mark Thomas wrote: > >> On 04/06/2020 16:33, Martin Grigorov wrote: >> > Hi team, >> > >> > I am profiling our application and I've found that Tomcat uses its own >> > Synchronized Queue/Stack implementations which synchronize on instance >> > level, i.e. their methods are 'synchronized'. >> > > Can you work off 10 (master) first ? > The application is based on Spring Boot and my attempts to "jakartaee migrate it" were not successful. But I've created a very simple Servlet and I will try it on Tomcat 10.x soon: @WebServlet(urlPatterns = "/test") public class PlainTextServlet extends HttpServlet { public static final String CONTENT = "Hello world!"; public static final int CONTENT_LENGTH = CONTENT.length(); @Override protected void doGet(final HttpServletRequest req, final HttpServletResponse resp) throws IOException { resp.setContentType("text/plain"); resp.setContentLength(CONTENT_LENGTH); resp.getWriter().println(CONTENT); } } For load testing it I use wrk: wrk -c96 -t8 -d60s https://host:port/test The GC JVM arguments are: -Xloggc:./gc.log -XX:+PrintGCDetails -verbose:gc JDK 1.8.0 b252 The GC logs are: - Tomcat 9.0.x (uses SynchronizedQueue/Stack): https://gist.github.com/martin-g/d2570e7a6896e4d094ce548ceea3adb6 - Tomcat 9.0.x with my changes: https://gist.github.com/martin-g/52c7d3a883b37e9bcd11ad6430800852 I've uploaded them to https://gceasy.io/ and the charts are similar. The summary from wrk with SynchronizedQueue/Stack is: Running 1m test @ http://192.168.0.206:8080 8 threads and 96 connections Thread Stats Avg Stdev Max +/- Stdev Latency 1.24ms 788.15us 19.46ms 91.44% Req/Sec10.04k 1.64k 12.45k79.48% 4797193 requests in 1.00m, 600.19MB read Requests/sec: 79834.45 Transfer/sec: 9.99MB I didn't preserve the summary from the lock-free run but it was similar - the throughput is around 80K/sec +- few thousands. I.e. the throughput does not improve after my changes! Only the locking profiling results become better (Async Profiler: ./profile.sh -e lock -d50s -f lock-results.txt PID) Martin > >> >> ~8 years ago when I was profiling Tomcat and wrote those having >> something targetted to Tomcat's specific use case made a measurable >> improvement. >> > > Yes, the SyncStack was the go to structure for all reuse. Not sure if it's > the best. > > I already removed some of the usage in 10: > - No more block poller and its pooling/recycling [I had to pull its > removal out of 9 due to an unfortunate regression, this could be brought > back now that it seems to work] > - No recycling of trivial objects > > The main use should now be the recycled processors and the recycled > channels (it would be bad to not reuse them since they may use direct > buffers ...). > > >> >> I am more than prepared to believe that things have moved on since then >> but I'd be interested in seeing the same numbers with the latest Java 8 >> from Adopt OpenJDK rather than Java 14. >> > > +1, firnally there could be some differences between JDK, like now they > started to actually improve NIO 1 again. > > Rémy > > >> >> I'd also like to see overall throughput numbers and the GC frequency for >> each test scenario. >> >> Mark >> >> >> >> > >> > The javadoc for those two classes say that they are preferred because >> > they are GC-friendlier than >> > java.uti.concurrent.ConcurrentLinkedQueue. >> > >> > I use Async Profiler >> > (https://github.com/jvm-profiling-tools/async-profiler) to profile CPU, >> > memory, >> > locks and few lower level types (from perf). >> > Issuing "./profiler.sh -e lock -d 60 PID" against 9.0.x shows such >> reports: >> > >> > Started [lock] profiling >> > >> > --- Execution profile --- >> > >> > Total samples : 713834 >> > >> > >> > >> > Frame buffer usage : 0.0288% >> > >> > >> > --- >> > 1106165618177 ns (58.97%), 395609 samples >> > >> > [ 0] >> > org.apache.tomcat.util.collections.SynchronizedStack >> > >> > [ 1] >> > org.apache.tomcat.util.collections.SynchronizedStack.push >> > >> > [ 2] >> > org.apache.tomcat.util.net.NioBlockingSelector.write >> >> > >> > [ 3] >> > org.apache.tomcat.util.net.NioSelectorPool.write >> >> > >> > [ 4] >> > org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite >> > >> > [ 5] >> > org.apache.tomcat.util.net.SocketWrapperBase.doWrite >> >> > >> > [ 6] >> > org.apache.tomcat.util.net.Sock
Re: Usage of SynchronizedStack/Queue
On Thu, Jun 4, 2020 at 6:16 PM Mark Thomas wrote: > On 04/06/2020 16:33, Martin Grigorov wrote: > > Hi team, > > > > I am profiling our application and I've found that Tomcat uses its own > > Synchronized Queue/Stack implementations which synchronize on instance > > level, i.e. their methods are 'synchronized'. > Can you work off 10 (master) first ? > > ~8 years ago when I was profiling Tomcat and wrote those having > something targetted to Tomcat's specific use case made a measurable > improvement. > Yes, the SyncStack was the go to structure for all reuse. Not sure if it's the best. I already removed some of the usage in 10: - No more block poller and its pooling/recycling [I had to pull its removal out of 9 due to an unfortunate regression, this could be brought back now that it seems to work] - No recycling of trivial objects The main use should now be the recycled processors and the recycled channels (it would be bad to not reuse them since they may use direct buffers ...). > > I am more than prepared to believe that things have moved on since then > but I'd be interested in seeing the same numbers with the latest Java 8 > from Adopt OpenJDK rather than Java 14. > +1, firnally there could be some differences between JDK, like now they started to actually improve NIO 1 again. Rémy > > I'd also like to see overall throughput numbers and the GC frequency for > each test scenario. > > Mark > > > > > > > The javadoc for those two classes say that they are preferred because > > they are GC-friendlier than > > java.uti.concurrent.ConcurrentLinkedQueue. > > > > I use Async Profiler > > (https://github.com/jvm-profiling-tools/async-profiler) to profile CPU, > > memory, > > locks and few lower level types (from perf). > > Issuing "./profiler.sh -e lock -d 60 PID" against 9.0.x shows such > reports: > > > > Started [lock] profiling > > > > --- Execution profile --- > > > > Total samples : 713834 > > > > > > > > Frame buffer usage : 0.0288% > > > > > > --- > > 1106165618177 ns (58.97%), 395609 samples > > > > [ 0] > > org.apache.tomcat.util.collections.SynchronizedStack > > > > [ 1] > > org.apache.tomcat.util.collections.SynchronizedStack.push > > > > [ 2] > > org.apache.tomcat.util.net.NioBlockingSelector.write > > > > [ 3] > > org.apache.tomcat.util.net.NioSelectorPool.write > > > > [ 4] > > org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite > > > > [ 5] > > org.apache.tomcat.util.net.SocketWrapperBase.doWrite > > > > [ 6] > > org.apache.tomcat.util.net.SocketWrapperBase.flushBlocking > > > > [ 7] > > org.apache.tomcat.util.net.SocketWrapperBase.flush > > > > [ 8] > > org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.end > > [ 9] org.apache.coyote.http11.filters.IdentityOutputFilter.end > > > > [10] org.apache.coyote.http11.Http11OutputBuffer.end > > [11] org.apache.coyote.http11.Http11Processor.finishResponse > > [12] org.apache.coyote.AbstractProcessor.action > > [13] org.apache.coyote.Response.action > > [14] org.apache.catalina.connector.OutputBuffer.close > > [15] org.apache.catalina.connector.Response.finishResponse > > [16] org.apache.catalina.connector.CoyoteAdapter.service > > [17] org.apache.coyote.http11.Http11Processor.service > > [18] org.apache.coyote.AbstractProcessorLight.process > > [19] org.apache.coyote.AbstractProtocol$ConnectionHandler.process > > [20] org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun > > [21] org.apache.tomcat.util.net.SocketProcessorBase.run > > [22] java.util.concurrent.ThreadPoolExecutor.runWorker > > [23] java.util.concurrent.ThreadPoolExecutor$Worker.run > > [24] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run > > [25] java.lang.Thread.run > > > > --- 769519847169 ns (41.03%), 314384 samples > > [ 0] org.apache.tomcat.util.collections.SynchronizedStack > > [ 1] org.apache.tomcat.util.collections.SynchronizedStack.pop > > [ 2] org.apache.tomcat.util.net.NioBlockingSelector.write > > [ 3] org.apache.tomcat.util.net.NioSelectorPool.write > > [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite > > [ 5] org.apache.tomcat.util.net.SocketWrapperBase.doWrite > > [ 6] org.apache.tomcat.util.net.SocketWrapperBase.flushBlocking > > [ 7] org.apache.tomcat.util.net.SocketWrapperBase.flush > > [ 8] org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.en
Re: Usage of SynchronizedStack/Queue
On 04/06/2020 16:33, Martin Grigorov wrote: > Hi team, > > I am profiling our application and I've found that Tomcat uses its own > Synchronized Queue/Stack implementations which synchronize on instance > level, i.e. their methods are 'synchronized'. ~8 years ago when I was profiling Tomcat and wrote those having something targetted to Tomcat's specific use case made a measurable improvement. I am more than prepared to believe that things have moved on since then but I'd be interested in seeing the same numbers with the latest Java 8 from Adopt OpenJDK rather than Java 14. I'd also like to see overall throughput numbers and the GC frequency for each test scenario. Mark > > The javadoc for those two classes say that they are preferred because > they are GC-friendlier than > java.uti.concurrent.ConcurrentLinkedQueue. > > I use Async Profiler > (https://github.com/jvm-profiling-tools/async-profiler) to profile CPU, > memory, > locks and few lower level types (from perf). > Issuing "./profiler.sh -e lock -d 60 PID" against 9.0.x shows such reports: > > Started [lock] profiling > > --- Execution profile --- > > Total samples : 713834 > > > > Frame buffer usage : 0.0288% > > > --- > 1106165618177 ns (58.97%), 395609 samples > > [ 0] > org.apache.tomcat.util.collections.SynchronizedStack > > [ 1] > org.apache.tomcat.util.collections.SynchronizedStack.push > > [ 2] > org.apache.tomcat.util.net.NioBlockingSelector.write > > [ 3] > org.apache.tomcat.util.net.NioSelectorPool.write > > [ 4] > org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite > > [ 5] > org.apache.tomcat.util.net.SocketWrapperBase.doWrite > > [ 6] > org.apache.tomcat.util.net.SocketWrapperBase.flushBlocking > > [ 7] > org.apache.tomcat.util.net.SocketWrapperBase.flush > > [ 8] > org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.end > [ 9] org.apache.coyote.http11.filters.IdentityOutputFilter.end > > [10] org.apache.coyote.http11.Http11OutputBuffer.end > [11] org.apache.coyote.http11.Http11Processor.finishResponse > [12] org.apache.coyote.AbstractProcessor.action > [13] org.apache.coyote.Response.action > [14] org.apache.catalina.connector.OutputBuffer.close > [15] org.apache.catalina.connector.Response.finishResponse > [16] org.apache.catalina.connector.CoyoteAdapter.service > [17] org.apache.coyote.http11.Http11Processor.service > [18] org.apache.coyote.AbstractProcessorLight.process > [19] org.apache.coyote.AbstractProtocol$ConnectionHandler.process > [20] org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun > [21] org.apache.tomcat.util.net.SocketProcessorBase.run > [22] java.util.concurrent.ThreadPoolExecutor.runWorker > [23] java.util.concurrent.ThreadPoolExecutor$Worker.run > [24] org.apache.tomcat.util.threads.TaskThread$WrappingRu
Usage of SynchronizedStack/Queue
Hi team, I am profiling our application and I've found that Tomcat uses its own Synchronized Queue/Stack implementations which synchronize on instance level, i.e. their methods are 'synchronized'. The javadoc for those two classes say that they are preferred because they are GC-friendlier than java.uti.concurrent.ConcurrentLinkedQueue. I use Async Profiler (https://github.com/jvm-profiling-tools/async-profiler) to profile CPU, memory, locks and few lower level types (from perf). Issuing "./profiler.sh -e lock -d 60 PID" against 9.0.x shows such reports: Started [lock] profiling --- Execution profile --- Total samples : 713834 Frame buffer usage : 0.0288% --- 1106165618177 ns (58.97%), 395609 samples [ 0] org.apache.tomcat.util.collections.SynchronizedStack [ 1] org.apache.tomcat.util.collections.SynchronizedStack.push [ 2] org.apache.tomcat.util.net.NioBlockingSelector.write [ 3] org.apache.tomcat.util.net.NioSelectorPool.write [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite [ 5] org.apache.tomcat.util.net.SocketWrapperBase.doWrite [ 6] org.apache.tomcat.util.net.SocketWrapperBase.flushBlocking [ 7] org.apache.tomcat.util.net.SocketWrapperBase.flush [ 8] org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.end [ 9] org.apache.coyote.http11.filters.IdentityOutputFilter.end [10] org.apache.coyote.http11.Http11OutputBuffer.end [11] org.apache.coyote.http11.Http11Processor.finishResponse [12] org.apache.coyote.AbstractProcessor.action [13] org.apache.coyote.Response.action [14] org.apache.catalina.connector.OutputBuffer.close [15] org.apache.catalina.connector.Response.finishResponse [16] org.apache.catalina.connector.CoyoteAdapter.service [17] org.apache.coyote.http11.Http11Processor.service [18] org.apache.coyote.AbstractProcessorLight.process [19] org.apache.coyote.AbstractProtocol$ConnectionHandler.process [20] org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun [21] org.apache.tomcat.util.net.SocketProcessorBase.run [22] java.util.concurrent.ThreadPoolExecutor.runWorker [23] java.util.concurrent.ThreadPoolExecutor$Worker.run [24] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run [25] java.lang.Thread.run --- 769519847169 ns (41.03%), 314384 samples [ 0] org.apache.tomcat.util.collections.SynchronizedStack [ 1] org.apache.tomcat.util.collections.SynchronizedStack.pop [ 2] org.apache.tomcat.util.net.NioBlockingSelector.write [ 3] org.apache.tomcat.util.net.NioSelectorPool.write [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite [ 5] org.apache.tomcat.util.net.SocketWrapperBase.doWrite [ 6] org.apache.tomcat.util.net.SocketWrapperBase.flushBlocking [ 7] org.apache.tomcat.util.net.SocketWrapperBase.flush [ 8] org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.end [ 9] org.apache.coyote.http11.filters.IdentityOutputFilter.end [10] org.apache.coyote.http11.Http11OutputBuffer.end [11] org.apache.coyote.http11.Http11Processor.finishResponse I've made the following changes - https://github.com/apache/tomcat/compare/9.0.x...martin-g:feature/lock-free-queue-and-stack and ran the test suite - all green! Now the output from the locks profiling looks like: --- 72354435197 ns (47.98%), 4008 samples [ 0] java.util.concurrent.locks.ReentrantLock$NonfairSync [ 1] jdk.internal.misc.Unsafe.park [ 2] java.util.concurrent.locks.LockSupport.park [ 3] java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire [ 4] java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly [ 5] java.util.concurrent.locks.ReentrantLock$Sync.lockInterruptibly [ 6] java.util.concurrent.locks.ReentrantLock.lockInterruptibly [ 7] java.util.concurrent.LinkedBlockingQueue.poll [ 8] org.apache.tomcat.util.threads.TaskQueue.poll [ 9] org.apache.tomcat.util.threads.TaskQueue.poll [10] java.util.concurrent.ThreadPoolExecutor.getTask [11] java.util.concurrent.ThreadPoolExecutor.runWorker [12] java.util.concurrent.ThreadPoolExecutor$Worker.run [13] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run [14] java.lang.Thread.run --- 58413748017 ns (38.74%), 10179 samples [ 0] java.lang.Object [ 1] org.apache.catalina.core.StandardContext$ContextFilterMaps.asArray [ 2] org.apache.catalin