I've been chasing a slowdown in our application for a couple of months now. I have what I believe is a solution (no slowdowns for 4 days now), but I'm having difficulty understanding why it works.
Symptoms: At random intervals and at random times, our web servers go from serving responses in the 300 ms range to taking 30 seconds or more. Sometimes the servers recover on their own; sometimes they require a restart of the web server (Spring Boot/Tomcat).

When the application slows down, we always see the Tomcat thread pool hit its maximum size. Every single thread in the pool is in the RUNNABLE state but appears to be making no progress. Successive thread dumps show that the stacks are changing, but VERY slowly. The top of the stack is always this method:

at java.lang.invoke.MethodHandleNatives.setCallSiteTargetNormal(Native Method)

The other common condition is that whatever application code is on the stack is always dynamically compiled. Code that is @CompileStatic is NEVER on the stack when we see these slowdowns. The thread dumps also show that the application code is never waiting on locks, socket reads, DB connections, etc.

Solution: The fix was to disable Indy compilation and return to non-Indy compilation. However, I don't think Indy itself is the problem here. I noticed that our Spring Boot executable jar contained BOTH groovy-all-2.4.5.jar AND groovy-all-indy-2.4.5.jar. Someone forgot to exclude the non-indy jars.

My theory: Having both the indy and non-indy jars on the classpath is confusing the JIT compiler. Code is continuously JIT-ed as different methods fight over which class files to compile: those loaded from the groovy-all jar or those loaded from the groovy-all-indy jar. If this is true, the compiler threads would be running continuously and taking native locks, which are invisible to tools like VisualVM. The slowdowns would appear random because only certain combinations of code paths would trigger them. Application code would also run very slowly, stuck mostly waiting for JIT operations to complete as invalidated code is continuously removed and replaced.

For now I will be leaving Indy disabled until we can do more accurate load testing in non-production environments.

My question: Is this theory plausible? Am I heading in a direction that is possible or likely?
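To confirm the duplicate-jar condition at runtime, rather than by inspecting the executable jar by hand, something like the following could be used. It is only a minimal sketch: the class name DuplicateClassCheck and the helper locationsOf are hypothetical, and it assumes groovy/lang/GroovyObject.class is present in both Groovy jars. It asks the class loader for every location that provides one class file; more than one hit means the same class is on the classpath twice.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Groovy or Spring Boot.
class DuplicateClassCheck {

    // Returns every classpath location that provides the given resource.
    static List<URL> locationsOf(String resourcePath, ClassLoader loader) throws IOException {
        List<URL> found = new ArrayList<>();
        Enumeration<URL> urls = loader.getResources(resourcePath);
        while (urls.hasMoreElements()) {
            found.add(urls.nextElement());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: this class file ships in both the indy and non-indy jars.
        String resource = args.length > 0 ? args[0] : "groovy/lang/GroovyObject.class";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();

        List<URL> hits = locationsOf(resource, loader);
        for (URL url : hits) {
            System.out.println(url);
        }
        if (hits.size() > 1) {
            System.out.println("WARNING: " + resource + " is provided by "
                    + hits.size() + " classpath entries");
        }
    }
}
```

Inside a Spring Boot executable jar the nested jars are only visible through the application's own class loader, so the check would have to run from within the application (for example at startup) rather than as a standalone program.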

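The theory also makes a testable prediction: if the JIT compiler really is recompiling continuously, the time the JVM spends compiling should keep climbing during a slowdown instead of levelling off after warm-up. A minimal sketch of how that could be sampled with the standard java.lang.management API follows; the class name JitTimeProbe and the 5-second interval are arbitrary choices for illustration.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe, intended to run inside the affected JVM.
class JitTimeProbe {

    // Cumulative milliseconds spent in JIT compilation, or -1 if the JVM
    // has no compiler or does not support compilation time monitoring.
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1L;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMs = 5000; // arbitrary sampling interval
        long previous = totalCompilationMillis();
        while (previous >= 0) {
            Thread.sleep(intervalMs);
            long current = totalCompilationMillis();
            // With several compiler threads this delta can exceed the interval.
            System.out.printf("JIT time in last %d ms: %d ms%n", intervalMs, current - previous);
            previous = current;
        }
    }
}
```

A delta that stays near zero while the request threads are stuck would count against the theory; a delta that stays high would be consistent with it, though it would not prove the duplicate jars are the cause.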