Re: Our webapp is running very slowly on one particular customer box
James,

On 10/28/20 16:40, James H. H. Lampert wrote:
> First, thanks once again, Mr. Schultz, for getting back to me.
>
> I noticed something rather promising: it seems that maxThreads for the
> Port 443 connector was set at 150 for System "A" (problem box), but 400
> for System "J" (box that's quite happy). I've restarted Tomcat with the
> maxThreads bumped up to 400, and so far, it seems much happier than it
> was. That could have been the problem all along.

Hmm. That doesn't sound very satisfying to me, honestly. Allowing *more* load uses *less* GC and/or fewer page-faults? Seems fishy.

> My colleagues and I also observed that yesterday, when we did *not*
> shut down and restart, the slowdown and the nearly-full "tenured-SOA"
> portion of the heap eventually resolved itself, which suggests that it
> wasn't a memory leak in any even remotely conventional sense of the
> term.

That's a Good Thing, but also not very satisfying when you just want it to stop sucking and let your users get work done :)

> The page-faulting is a virtual-memory term: on an AS/400, the entire
> combined total of main storage and disk is addressable (the concept is
> called "Single-Level Store"), and virtual storage paging is built into
> the OS at a very low level; a "page fault" is when a process tries to
> access something that's been paged out to disk.

Yes, this is the common definition of a page-fault, not just an AS/400 thing. Good to know for sure that AS/400 doesn't re-define that term, though :)

How long has the process on System J been running? How about System A (before you restarted the JVM)?

> As to the private memory pool, it's not that the subsystem is
> restricted to its private pool; rather, everything else is kept *out*
> of that private pool. It still has full access to the "Machine" and
> "Base" shared pools.

Okay, so it's like a guaranteed-minimum memory space?

-chris

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
Re: Our webapp is running very slowly on one particular customer box
First, thanks once again, Mr. Schultz, for getting back to me.

I noticed something rather promising: it seems that maxThreads for the Port 443 connector was set at 150 for System "A" (problem box), but 400 for System "J" (box that's quite happy). I've restarted Tomcat with the maxThreads bumped up to 400, and so far, it seems much happier than it was. That could have been the problem all along.

My colleagues and I also observed that yesterday, when we did *not* shut down and restart, the slowdown and the nearly-full "tenured-SOA" portion of the heap eventually resolved itself, which suggests that it wasn't a memory leak in any even remotely conventional sense of the term.

The page-faulting is a virtual-memory term: on an AS/400, the entire combined total of main storage and disk is addressable (the concept is called "Single-Level Store"), and virtual storage paging is built into the OS at a very low level; a "page fault" is when a process tries to access something that's been paged out to disk.

As to the private memory pool, it's not that the subsystem is restricted to its private pool; rather, everything else is kept *out* of that private pool. It still has full access to the "Machine" and "Base" shared pools.

--
JHHL
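For reference, the connector setting being compared here lives in Tomcat's conf/server.xml. A sketch of what the bumped-up HTTPS connector might look like (the protocol and acceptCount values are illustrative assumptions, not taken from either system's actual configuration):

```xml
<!-- conf/server.xml: HTTPS connector with the larger thread pool.
     maxThreads caps concurrent request-processing threads; once all
     of them are busy, further connections queue up to acceptCount. -->
<Connector port="443"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="400"
           acceptCount="100"
           SSLEnabled="true"
           scheme="https"
           secure="true" />
```

With only 150 threads, requests beyond that limit wait in the accept queue, which can look like an across-the-board slowdown even when the JVM itself is healthy.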
Re: Our webapp is running very slowly on one particular customer box
James,

On 10/27/20 16:20, James H. H. Lampert wrote:
> This is related to my query (thanks, Mr. Gregg) about "Tenured SOA."
>
> It seems that on one of our customer installations, our webapp gets
> into a state of running very slowly, and the dedicated subsystem it's
> running in is showing massive levels of page-faulting.
>
> I've compared the GC stats of the "problem" system with one that's
> actually got more users connected, but isn't experiencing performance
> issues. It seems that they're both going to GC about every 30-50
> seconds, but GC on the "problem" system appears to be somewhat less
> effective.
>
> Also, I've looked at the threads on both. On the system that is
> behaving normally, the "GC Slave" threads (7 of them) are showing
> total CPU (at this moment) of around 150 seconds each, and Aux I/O of
> mostly zero, with one showing 1 and one showing 3. Conversely, on the
> "problem" system, I'm seeing 15(!) GC Slave threads, each with total
> CPU under 6 seconds each, but Aux I/O ranging from 5800 to over 8000.
> I'm not sure what to make of this.
>
> In both cases, Tomcat's JVM is running in a subsystem of its own, with
> a private memory pool of around 7G, and they're both running with
> -Xms4096m -Xmx5120m.

If you expect the service to be long-running, definitely set Xms=Xmx. There's no reason to artificially restrict the heap "early" in the process's lifetime only to completely re-size and re-organize the heap over time. You may as well allocate the maximum right up front and leave it that way.

The problem system certainly appears to be thrashing its GC. Are there any environmental differences that you notice about the two systems?

For a JVM with a maximum heap of ~5GiB, I think that a 7GiB private memory space (this is an AS/400 thing, isn't it?) isn't large enough. The heap space is just the "Java heap," and there are other things that need memory, sometimes ~= to the heap size. It's sometimes surprising how much "native" memory a JVM needs.

Is the kernel+userspace running in that "subsystem" as well? Or just the JVM process?

I'm guessing that your comment about page-faulting and "Aux I/O rang[es] from 5800 - 8000 [sec]" means that you are actually paging the heap to the disk.

What happens if you shrink your max-heap to 2GiB and change nothing else? This should make sure that your heap + native memory fits into physical memory and that the thrashing should stop.

Maybe you *do* need a 5GiB heap, though. In that case, if the heap-shrink works but you get OOMEs under load, then I think that simply increasing the memory allocated to the "subsystem" should help a lot.

How much (real) memory does the system report is being used by the JVM process? I think you'll find it much larger than 5GiB.

-chris
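The Xms=Xmx advice above is usually applied via Tomcat's bin/setenv.sh, which catalina.sh sources at startup if present. A minimal sketch, assuming the 5120m figure from the thread (pick whatever single value actually fits inside the subsystem's pool alongside the JVM's native memory):

```shell
# bin/setenv.sh -- sourced by catalina.sh on startup.
# Pinning -Xms to -Xmx allocates the full heap once, up front,
# instead of growing and re-organizing it over the process's lifetime.
CATALINA_OPTS="$CATALINA_OPTS -Xms5120m -Xmx5120m"
export CATALINA_OPTS
```

On IBM i, where Tomcat may be launched differently, the same -Xms/-Xmx pair would go wherever the JVM arguments are configured for the subsystem's job.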
Our webapp is running very slowly on one particular customer box
This is related to my query (thanks, Mr. Gregg) about "Tenured SOA."

It seems that on one of our customer installations, our webapp gets into a state of running very slowly, and the dedicated subsystem it's running in is showing massive levels of page-faulting.

I've compared the GC stats of the "problem" system with one that's actually got more users connected, but isn't experiencing performance issues. It seems that they're both going to GC about every 30-50 seconds, but GC on the "problem" system appears to be somewhat less effective.

Also, I've looked at the threads on both. On the system that is behaving normally, the "GC Slave" threads (7 of them) are showing total CPU (at this moment) of around 150 seconds each, and Aux I/O of mostly zero, with one showing 1 and one showing 3. Conversely, on the "problem" system, I'm seeing 15(!) GC Slave threads, each with total CPU under 6 seconds each, but Aux I/O ranging from 5800 to over 8000. I'm not sure what to make of this.

In both cases, Tomcat's JVM is running in a subsystem of its own, with a private memory pool of around 7G, and they're both running with -Xms4096m -Xmx5120m.

--
JHHL
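One platform-neutral way to capture the kind of GC comparison described above, without depending on OS-specific tooling, is the standard java.lang.management API, which both HotSpot and IBM J9 JVMs implement. A minimal sketch (the class name and the idea of running it periodically from the webapp are assumptions, not anything from the thread):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Prints each collector's cumulative collection count and time, plus
// current heap occupancy. Sampling this on both boxes over the same
// interval makes "GC every 30-50 seconds but less effective" concrete:
// compare how much heap each collection actually reclaims.
public class GcStats {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", totalTimeMs=" + gc.getCollectionTime());
        }
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        System.out.println("heapUsedMiB=" + heap.getUsed() / (1024 * 1024)
                + " of maxMiB=" + heap.getMax() / (1024 * 1024));
    }
}
```

Note that this only sees the Java heap; the page-faulting and Aux I/O figures above come from the OS side and include the JVM's native memory as well.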