Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?
Hi Mark, For reproducing, if you're making the requests locally to the server that might be why you're not seeing the issue occur. I had been testing between servers (client on server A, tomcat on server B); I tried this morning doing a wget locally to tomcat and do not see the issue occur (no WSAECONNABORTED status) while doing the wget remotely does get the aborted status. in sslnetwork.c:ssl_socket_recv adding the following debug statements; if (s == 0 && (SSL_get_shutdown(con->ssl) & SSL_RECEIVED_SHUTDOWN)) { con->shutdown_type = SSL_SHUTDOWN_TYPE_STANDARD; return APR_EOF; } printf("ssl_socket_recv:: Have value for i: %d\n", i); printf("ssl_socket_recv:: Have value for s: %d\n", s); printf("ssl_socket_recv:: Have value for rv: %d\n", rv); printf("ssl_socket_recv:: con->shutdown_type: %d\n", con->shutdown_type); fflush(stdout); switch (i) { case SSL_ERROR_WANT_READ: case SSL_ERROR_WANT_WRITE: Outputs: Local server (wget --no-check-certificate https://localhost:8443/examples ) ssl_socket_recv:: Have value for i: 5 ssl_socket_recv:: Have value for s: -1 ssl_socket_recv:: Have value for rv: 730054 ssl_socket_recv:: con->shutdown_type: 2 While on remote server performing the same request (wget --no-check-certificate https://serverB:8443/examples) ssl_socket_recv:: Have value for i: 5 ssl_socket_recv:: Have value for s: -1 ssl_socket_recv:: Have value for rv: 730053 ssl_socket_recv:: con->shutdown_type: 2 Thanks, On Fri, Jan 6, 2017 at 5:43 PM, David Oswellwrote: > Yep that would, Originally I had just turned the return 0 to throw an > IOException - which as it would fall through to the else would be the same > effect. > > I'll see if a java client would do the same, I'm using a golang > client/console app for testing. > I just tried with wget and see the WSAECONNABORTED status within APR. > It also seems a bit time dependent, might be a case of whether APR is > getting to the socket read before or after the FIN packet coming though ?? > > > > On Fri, Jan 6, 2017 at 5:11 PM, Mark Thomas wrote: > >> On 06/01/2017 21:50, David Oswell wrote: >> > I've somehow gotten the build for tcnative working here (more shocked I >> > finally got openssl to build!) >> > >> > There seems to be a slight difference in how the shutdown occurs. When >> the >> > APR_EGENERAL is returned its due to falling through the >> SSL_ERROR_SYSCALL >> > block in sslnetwork.c:ssl_socket_recv, >> > This seems to be due to a difference in the value returned by; >> > sslnetwork.c:324 :rv = >> > apr_get_netos_error(); >> > >> > on the bad case (quick socket close), rv is (730053) which >> > is WSAECONNABORTED - APR_STATUS_IS_ECONNABORTED >> > on a good disconnect case (slower socket close) rv is (730054) >> > = WSAECONNRESET - APR_STATUS_IS_ECONNRESET >> > >> > I suspect a check with APR_STATUS_IS_ECONNABORTED(rv) might be needed to >> > capture this scenario (WSAECONNABORTED), however I'm not sure how else >> this >> > status might occur, and if any of those cases should not flag it as >> closed >> > - although reading on WSAECONNABORTED it sounds like this is a close >> case. >> > Not sure if it's an exception case or just normal EOF though. >> >> Thanks. That is really useful information. >> >> I've been trying to re-create the original issue that led to this odd >> 'treat an error as eagain' code but without success. I have found a >> couple of other bugs (now fixed) so it wasn't a complete waste of time. >> >> When I added this hack I was fairly sure I was missing something and it >> is looking increasingly like this code was fixing a symptom rather than >> the root cause. Given that I can't re-create the original problem, I'm >> leaning towards removing the special handling for EGENERAL and letting >> it trigger a close. >> >> If you remove the >> else if (-result == Status.APR_EGENERAL && isSecure()) { >> ... >> } >> >> block, does that fix the problem you are seeing? >> >> Mark >> >> >> - >> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org >> For additional commands, e-mail: users-h...@tomcat.apache.org >> >> >
Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?
Yep that would, Originally I had just turned the return 0 to throw an IOException - which as it would fall through to the else would be the same effect. I'll see if a java client would do the same, I'm using a golang client/console app for testing. I just tried with wget and see the WSAECONNABORTED status within APR. It also seems a bit time dependent, might be a case of whether APR is getting to the socket read before or after the FIN packet coming though ?? On Fri, Jan 6, 2017 at 5:11 PM, Mark Thomaswrote: > On 06/01/2017 21:50, David Oswell wrote: > > I've somehow gotten the build for tcnative working here (more shocked I > > finally got openssl to build!) > > > > There seems to be a slight difference in how the shutdown occurs. When > the > > APR_EGENERAL is returned its due to falling through the SSL_ERROR_SYSCALL > > block in sslnetwork.c:ssl_socket_recv, > > This seems to be due to a difference in the value returned by; > > sslnetwork.c:324 :rv = > > apr_get_netos_error(); > > > > on the bad case (quick socket close), rv is (730053) which > > is WSAECONNABORTED - APR_STATUS_IS_ECONNABORTED > > on a good disconnect case (slower socket close) rv is (730054) > > = WSAECONNRESET - APR_STATUS_IS_ECONNRESET > > > > I suspect a check with APR_STATUS_IS_ECONNABORTED(rv) might be needed to > > capture this scenario (WSAECONNABORTED), however I'm not sure how else > this > > status might occur, and if any of those cases should not flag it as > closed > > - although reading on WSAECONNABORTED it sounds like this is a close > case. > > Not sure if it's an exception case or just normal EOF though. > > Thanks. That is really useful information. > > I've been trying to re-create the original issue that led to this odd > 'treat an error as eagain' code but without success. I have found a > couple of other bugs (now fixed) so it wasn't a complete waste of time. > > When I added this hack I was fairly sure I was missing something and it > is looking increasingly like this code was fixing a symptom rather than > the root cause. Given that I can't re-create the original problem, I'm > leaning towards removing the special handling for EGENERAL and letting > it trigger a close. > > If you remove the > else if (-result == Status.APR_EGENERAL && isSecure()) { > ... > } > > block, does that fix the problem you are seeing? > > Mark > > > - > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org > For additional commands, e-mail: users-h...@tomcat.apache.org > >
Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?
On 06/01/2017 21:50, David Oswell wrote: > I've somehow gotten the build for tcnative working here (more shocked I > finally got openssl to build!) > > There seems to be a slight difference in how the shutdown occurs. When the > APR_EGENERAL is returned its due to falling through the SSL_ERROR_SYSCALL > block in sslnetwork.c:ssl_socket_recv, > This seems to be due to a difference in the value returned by; > sslnetwork.c:324 :rv = > apr_get_netos_error(); > > on the bad case (quick socket close), rv is (730053) which > is WSAECONNABORTED - APR_STATUS_IS_ECONNABORTED > on a good disconnect case (slower socket close) rv is (730054) > = WSAECONNRESET - APR_STATUS_IS_ECONNRESET > > I suspect a check with APR_STATUS_IS_ECONNABORTED(rv) might be needed to > capture this scenario (WSAECONNABORTED), however I'm not sure how else this > status might occur, and if any of those cases should not flag it as closed > - although reading on WSAECONNABORTED it sounds like this is a close case. > Not sure if it's an exception case or just normal EOF though. Thanks. That is really useful information. I've been trying to re-create the original issue that led to this odd 'treat an error as eagain' code but without success. I have found a couple of other bugs (now fixed) so it wasn't a complete waste of time. When I added this hack I was fairly sure I was missing something and it is looking increasingly like this code was fixing a symptom rather than the root cause. Given that I can't re-create the original problem, I'm leaning towards removing the special handling for EGENERAL and letting it trigger a close. If you remove the else if (-result == Status.APR_EGENERAL && isSecure()) { ... } block, does that fix the problem you are seeing? Mark - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?
I've somehow gotten the build for tcnative working here (more shocked I finally got openssl to build!) There seems to be a slight difference in how the shutdown occurs. When the APR_EGENERAL is returned its due to falling through the SSL_ERROR_SYSCALL block in sslnetwork.c:ssl_socket_recv, This seems to be due to a difference in the value returned by; sslnetwork.c:324 :rv = apr_get_netos_error(); on the bad case (quick socket close), rv is (730053) which is WSAECONNABORTED - APR_STATUS_IS_ECONNABORTED on a good disconnect case (slower socket close) rv is (730054) = WSAECONNRESET - APR_STATUS_IS_ECONNRESET I suspect a check with APR_STATUS_IS_ECONNABORTED(rv) might be needed to capture this scenario (WSAECONNABORTED), however I'm not sure how else this status might occur, and if any of those cases should not flag it as closed - although reading on WSAECONNABORTED it sounds like this is a close case. Not sure if it's an exception case or just normal EOF though. On Fri, Jan 6, 2017 at 10:00 AM, Mark Thomaswrote: > On 06/01/2017 10:52, Mark Thomas wrote: > > > I can reproduce the (new?) loop error with the following: > > - clean trunk (9.0.x) build > > - enable debug logging for AprEndpoint > > - APR/native 1.2.10 > > - WebSocket drawboard example > > - hold down F5 > > > > I see some error messages as per the original problem but fairly quickly > > Tomcat enters the infinite loop. > > > > Next steps are digging into the APR/native code. > > It was a different loop related specifically to WebSocket that is now > fixed. > > I'm still looking into this. I'm am currently leaning towards removing > the suspect APR_EGENERAL handling from Tomcat but I want to see if I can > re-create the problem that caused it to be added in the first place. I > haven't had any success so far which leads me to suspect that the root > cause may have been a Tomcat bug that was fixed in the 8.5.x/trunk > refactoring. Proving that is going to be tricky. > > Mark > > - > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org > For additional commands, e-mail: users-h...@tomcat.apache.org > >
Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?
On 06/01/2017 10:52, Mark Thomas wrote: > I can reproduce the (new?) loop error with the following: > - clean trunk (9.0.x) build > - enable debug logging for AprEndpoint > - APR/native 1.2.10 > - WebSocket drawboard example > - hold down F5 > > I see some error messages as per the original problem but fairly quickly > Tomcat enters the infinite loop. > > Next steps are digging into the APR/native code. It was a different loop related specifically to WebSocket that is now fixed. I'm still looking into this. I'm am currently leaning towards removing the suspect APR_EGENERAL handling from Tomcat but I want to see if I can re-create the problem that caused it to be added in the first place. I haven't had any success so far which leads me to suspect that the root cause may have been a Tomcat bug that was fixed in the 8.5.x/trunk refactoring. Proving that is going to be tricky. Mark - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?
On 05/01/2017 22:30, Mark Thomas wrote: > On 05/01/2017 22:01, David Oswell wrote: >> After some more digging I've been able to further narrow down the problem >> somewhat, but still not able to pin point the exact cause; >> The issue is not load related, but rather seems to be related to the timing >> of the TCP connection being closed. >> Depending on the timing the poller and exec appear to get into a loop - >> drilled thsi down to an error status returned in AprEndpoint - which might >> need to be thrown as an exception rather than return 0 from >> the fillReadBuffer in AprEndpoint. >> >> Poller thread - AprEndpoint:1573 - Poller(ID 55) wakes ups >> Poller thread - AprEndpoint:1652 - Poller adds socket to timeout >> Poller thread - AprEndpoint:1675 - Poller gets rv = 1 >> Poller thread - AprEndpoint:1694 - Poller gets connection (socket >> Id 565911936 ) >> Poller thread - AprEndpoint:1731 - Poller processesSocket as socket event >> OPEN_READ >> Poller thread - AbstractEndpoint:903 - executor executes >> AprEndpoint$SocketProcessor (id 63) -> No exception thrown. >> >> Exec thread - AprEndpoint:2403 - Socket.recvb result = -20014 >> Interesting comment ? at AprEndpoint:2445 : >> } else if (-result == Status.APR_EGENERAL && isSecure()) { >> //Status.APR_EGENERAL=20014 >> // Not entirely sure why this is necessary. Testing to date >> has not >> // identified any issues with this but log it so it can be >> tracked >> // if it is suspected of causing issues in the future. >> if (log.isDebugEnabled()) { >> >> log.debug(sm.getString("socket.apr.read.sslGeneralError", getSocket(), >> this)); >> } >> return 0; >> Does this need to throw an exception to get caught higher up as an error? > > Oh great. That code. > > It originates here: > http://svn.apache.org/viewvc?view=revision=1534619 > > For the background see this thread: > http://tomcat.markmail.org/thread/4vspjutd4kzqkc5q > > As far as I could tell, something was happening in the TLS layer that > APR/native was reporting as an error that wasn't really an error. > Therefore, I changed Tomcat to ignore the report of an error and carried on. > > What I suspect is happening is that you are seeing a real error that > Tomcat now isn't treating as an error. > > I do have a working build environment for tc-native on Windows now so > this is probably worth a re-visit. > > I'll put this at the top of my TODO list for after the 9.0.x and 8.5.x > releases I've been meaning to start for the last couple of days. > > Given how far you've got with this in a short time, if you wanted to > continue digging that would be great. My suggestion for a way forward > would be: > - enable debug logging for AprEndpoint > - try and recreate the 20014 error with the WebSocket drawing board > example (as described in the links above) > - trace back into APR/native to figure out a) what is generating that > 20014 error code and b) what it should really be generating. > > If you need to build APR/native, the instructions are here: > https://cwiki.apache.org/confluence/display/TOMCAT/Building+the+Tomcat+Native+Connector+binaries+for+Windows I can reproduce the (new?) loop error with the following: - clean trunk (9.0.x) build - enable debug logging for AprEndpoint - APR/native 1.2.10 - WebSocket drawboard example - hold down F5 I see some error messages as per the original problem but fairly quickly Tomcat enters the infinite loop. Next steps are digging into the APR/native code. Mark - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?
On 05/01/2017 22:01, David Oswell wrote: > After some more digging I've been able to further narrow down the problem > somewhat, but still not able to pin point the exact cause; > The issue is not load related, but rather seems to be related to the timing > of the TCP connection being closed. > Depending on the timing the poller and exec appear to get into a loop - > drilled thsi down to an error status returned in AprEndpoint - which might > need to be thrown as an exception rather than return 0 from > the fillReadBuffer in AprEndpoint. > > Poller thread - AprEndpoint:1573 - Poller(ID 55) wakes ups > Poller thread - AprEndpoint:1652 - Poller adds socket to timeout > Poller thread - AprEndpoint:1675 - Poller gets rv = 1 > Poller thread - AprEndpoint:1694 - Poller gets connection (socket > Id 565911936 ) > Poller thread - AprEndpoint:1731 - Poller processesSocket as socket event > OPEN_READ > Poller thread - AbstractEndpoint:903 - executor executes > AprEndpoint$SocketProcessor (id 63) -> No exception thrown. > > Exec thread - AprEndpoint:2403 - Socket.recvb result = -20014 > Interesting comment ? at AprEndpoint:2445 : > } else if (-result == Status.APR_EGENERAL && isSecure()) { > //Status.APR_EGENERAL=20014 > // Not entirely sure why this is necessary. Testing to date > has not > // identified any issues with this but log it so it can be > tracked > // if it is suspected of causing issues in the future. > if (log.isDebugEnabled()) { > > log.debug(sm.getString("socket.apr.read.sslGeneralError", getSocket(), > this)); > } > return 0; > Does this need to throw an exception to get caught higher up as an error? Oh great. That code. It originates here: http://svn.apache.org/viewvc?view=revision=1534619 For the background see this thread: http://tomcat.markmail.org/thread/4vspjutd4kzqkc5q As far as I could tell, something was happening in the TLS layer that APR/native was reporting as an error that wasn't really an error. Therefore, I changed Tomcat to ignore the report of an error and carried on. What I suspect is happening is that you are seeing a real error that Tomcat now isn't treating as an error. I do have a working build environment for tc-native on Windows now so this is probably worth a re-visit. I'll put this at the top of my TODO list for after the 9.0.x and 8.5.x releases I've been meaning to start for the last couple of days. Given how far you've got with this in a short time, if you wanted to continue digging that would be great. My suggestion for a way forward would be: - enable debug logging for AprEndpoint - try and recreate the 20014 error with the WebSocket drawing board example (as described in the links above) - trace back into APR/native to figure out a) what is generating that 20014 error code and b) what it should really be generating. If you need to build APR/native, the instructions are here: https://cwiki.apache.org/confluence/display/TOMCAT/Building+the+Tomcat+Native+Connector+binaries+for+Windows HTH, Mark - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?
After some more digging I've been able to further narrow down the problem somewhat, but still not able to pin point the exact cause; The issue is not load related, but rather seems to be related to the timing of the TCP connection being closed. Depending on the timing the poller and exec appear to get into a loop - drilled thsi down to an error status returned in AprEndpoint - which might need to be thrown as an exception rather than return 0 from the fillReadBuffer in AprEndpoint. Poller thread - AprEndpoint:1573 - Poller(ID 55) wakes ups Poller thread - AprEndpoint:1652 - Poller adds socket to timeout Poller thread - AprEndpoint:1675 - Poller gets rv = 1 Poller thread - AprEndpoint:1694 - Poller gets connection (socket Id 565911936 ) Poller thread - AprEndpoint:1731 - Poller processesSocket as socket event OPEN_READ Poller thread - AbstractEndpoint:903 - executor executes AprEndpoint$SocketProcessor (id 63) -> No exception thrown. Exec thread - AprEndpoint:2403 - Socket.recvb result = -20014 Interesting comment ? at AprEndpoint:2445 : } else if (-result == Status.APR_EGENERAL && isSecure()) { //Status.APR_EGENERAL=20014 // Not entirely sure why this is necessary. Testing to date has not // identified any issues with this but log it so it can be tracked // if it is suspected of causing issues in the future. if (log.isDebugEnabled()) { log.debug(sm.getString("socket.apr.read.sslGeneralError", getSocket(), this)); } return 0; Does this need to throw an exception to get caught higher up as an error? Exec thread - HJttp11InputBuffer:708 - Reads from buffer, returns nRead=0 Exec thread - AbstractProtocol:789 - process reuqest ; state returns OPEN Exec thread - AbstractProtocol:2678 - Exec adds socket (id 565911936) Exec thread - AprEndpoint:1435 - Calls notify() on AprEndpoint$Poller (id 55) It seems to just bounce between those two operations, the poller adding the executor for OPEN_READ, but the execute just does a add of the socket for APR_POLLIN, the poller then adds it back as OPEN_READ again, and exec adds back to poller for APR_POLLIN. desc in AprEndpoint contains; [48, 565911936, 1, 565903728, 0, 0, 0, 0, 0, ] desc[0] looks to indicate Poll.APR_POLLHUP wrapper.pollerFlags = 1 On Wed, Jan 4, 2017 at 5:54 PM, David Oswellwrote: > Ok the issue appears to be related to the state the connection handler > thinks it is in, after the connection is disconnected the state returned > by org.apache.coyote.http11.Http11Processor.process remains as OPEN, > which causes the underlying poll thread to get notified and repeats. > > Breakpointing on all of the CLOSED states does not get triggered, unless > at low requests per second, on my server at around 200 requests/second on > one connection the disconnect doesn't get triggered, but at 1-2 it does. > Because it doesn't get flagged as disconnect it just constantly gets > processed and always shows a state of OPEN_READ. > > As a side note, this only seems to occur on the servers (have 2 where the > same behaviour occurs); running it on my local machine the issue doesn't > appear. So not sure if somehow related to Win2012R2 interaction??? - > Yes I know the obvious fix is to not use Win2012 :) > > > > On Wed, Jan 4, 2017 at 12:29 PM, David Oswell wrote: > >> Hi Mark, >> >> Thanks for the response - So far I've narrowed down to this; >> 0 - ntoskrnl.exe!KeSynchronizeExecution+0x2246 >> 1 - ntoskrnl.exe!KeWaitForMultipleObjects+0x135e >> 2 - ntoskrnl.exe!KeWaitForMultipleObjects+0xdd9 >> 3 - ntoskrnl.exe!KeWaitForSingleObject+0x373 >> 4 - ntoskrnl.exe!KeStallWhileFrozen+0x1977 >> 5 - ntoskrnl.exe!_misaligned_access+0x13f9 >> 6 - ntoskrnl.exe!KeSynchronizeExecution+0x4133 >> 7 - ntdll.dll!RtlCopyMemory+0x3 >> 8 - WS2_32.dll!WSAPoll+0xb9 >> 9 - tcnative-1.dll!Java_org_apache_tomcat_jni_Proc_detach+0xad10 >> 10 - tcnative-1.dll!Java_org_apache_tomcat_jni_Poll_poll+0xac >> >> From what I can see there is one thread which is the main cpu user when >> idle, it cycles through the above and; >> jvm.dll!JVM_FindSignal+0x2cba >> jvm.dll!JVM_ResolveClass+0x371c7 >> jvm.dll!JVM_ResolveClass+0x571d7 >> jvm.dll!JVM_MonitorWait+0xfe >> jvm.dll!JVM_FindSignal+0x73f >> >> Which looks to correspond to https-openssl-apr-8443-poller thread, which >> I see cycling through RUNNABLE/TIMED_WAITING states (below are from when >> there are zero requests occuring); >> Name: https-openssl-apr-8443-Poller >> State: RUNNABLE >> Total blocked: 11,222,146 Total waited: 11,222,507 >> >> Stack trace: >> org.apache.tomcat.jni.Poll.poll(Native Method) >> org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpo >> int.java:1675) >> java.lang.Thread.run(Thread.java:745) >> >> - >> >> Name: https-openssl-apr-8443-Poller >>
Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?
Ok the issue appears to be related to the state the connection handler thinks it is in, after the connection is disconnected the state returned by org.apache.coyote.http11.Http11Processor.process remains as OPEN, which causes the underlying poll thread to get notified and repeats. Breakpointing on all of the CLOSED states does not get triggered, unless at low requests per second, on my server at around 200 requests/second on one connection the disconnect doesn't get triggered, but at 1-2 it does. Because it doesn't get flagged as disconnect it just constantly gets processed and always shows a state of OPEN_READ. As a side note, this only seems to occur on the servers (have 2 where the same behaviour occurs); running it on my local machine the issue doesn't appear. So not sure if somehow related to Win2012R2 interaction??? - Yes I know the obvious fix is to not use Win2012 :) On Wed, Jan 4, 2017 at 12:29 PM, David Oswellwrote: > Hi Mark, > > Thanks for the response - So far I've narrowed down to this; > 0 - ntoskrnl.exe!KeSynchronizeExecution+0x2246 > 1 - ntoskrnl.exe!KeWaitForMultipleObjects+0x135e > 2 - ntoskrnl.exe!KeWaitForMultipleObjects+0xdd9 > 3 - ntoskrnl.exe!KeWaitForSingleObject+0x373 > 4 - ntoskrnl.exe!KeStallWhileFrozen+0x1977 > 5 - ntoskrnl.exe!_misaligned_access+0x13f9 > 6 - ntoskrnl.exe!KeSynchronizeExecution+0x4133 > 7 - ntdll.dll!RtlCopyMemory+0x3 > 8 - WS2_32.dll!WSAPoll+0xb9 > 9 - tcnative-1.dll!Java_org_apache_tomcat_jni_Proc_detach+0xad10 > 10 - tcnative-1.dll!Java_org_apache_tomcat_jni_Poll_poll+0xac > > From what I can see there is one thread which is the main cpu user when > idle, it cycles through the above and; > jvm.dll!JVM_FindSignal+0x2cba > jvm.dll!JVM_ResolveClass+0x371c7 > jvm.dll!JVM_ResolveClass+0x571d7 > jvm.dll!JVM_MonitorWait+0xfe > jvm.dll!JVM_FindSignal+0x73f > > Which looks to correspond to https-openssl-apr-8443-poller thread, which I > see cycling through RUNNABLE/TIMED_WAITING states (below are from when > there are zero requests occuring); > Name: https-openssl-apr-8443-Poller > State: RUNNABLE > Total blocked: 11,222,146 Total waited: 11,222,507 > > Stack trace: > org.apache.tomcat.jni.Poll.poll(Native Method) > org.apache.tomcat.util.net.AprEndpoint$Poller.run( > AprEndpoint.java:1675) > java.lang.Thread.run(Thread.java:745) > > - > > Name: https-openssl-apr-8443-Poller > State: RUNNABLE > Total blocked: 11,624,292 Total waited: 11,624,681 > > Stack trace: > org.apache.tomcat.jni.Poll.add(Native Method) > org.apache.tomcat.util.net.AprEndpoint$Poller. > addToPoller(AprEndpoint.java:1449) > org.apache.tomcat.util.net.AprEndpoint$Poller.run( > AprEndpoint.java:1649) > java.lang.Thread.run(Thread.java:745) > > -- > > Name: https-openssl-apr-8443-Poller > State: TIMED_WAITING on org.apache.tomcat.util.net. > AprEndpoint$Poller@4f5e5841 > Total blocked: 12,441,180 Total waited: 12,441,614 > > Stack trace: > java.lang.Object.wait(Native Method) > org.apache.tomcat.util.net.AprEndpoint$Poller.run( > AprEndpoint.java:1573) > java.lang.Thread.run(Thread.java:745) > > - > > Name: https-openssl-apr-8443-Poller > State: RUNNABLE > Total blocked: 13,278,363 Total waited: 13,278,853 > > Stack trace: > sun.misc.Unsafe.unpark(Native Method) > java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:141) > java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor( > AbstractQueuedSynchronizer.java:662) > java.util.concurrent.locks.AbstractQueuedSynchronizer.release( > AbstractQueuedSynchronizer.java:1264) > java.util.concurrent.locks.ReentrantLock.unlock( > ReentrantLock.java:457) > java.util.concurrent.LinkedBlockingQueue.signalNotEmpty( > LinkedBlockingQueue.java:176) > java.util.concurrent.LinkedBlockingQueue.offer( > LinkedBlockingQueue.java:430) > org.apache.tomcat.util.threads.TaskQueue.offer(TaskQueue.java:74) > org.apache.tomcat.util.threads.TaskQueue.offer(TaskQueue.java:31) > java.util.concurrent.ThreadPoolExecutor.execute( > ThreadPoolExecutor.java:1361) > org.apache.tomcat.util.threads.ThreadPoolExecutor. > execute(ThreadPoolExecutor.java:163) > org.apache.tomcat.util.threads.ThreadPoolExecutor. > execute(ThreadPoolExecutor.java:141) > org.apache.tomcat.util.net.AbstractEndpoint.processSocket( > AbstractEndpoint.java:903) > org.apache.tomcat.util.net.AprEndpoint.processSocket( > AprEndpoint.java:866) > org.apache.tomcat.util.net.AprEndpoint$Poller.run( > AprEndpoint.java:1731) > java.lang.Thread.run(Thread.java:745) > > > If I look at the 8080 poller thread (after load), it's waiting > appropriately. > I'll try and get the remote debugger connected now and see what time is > being passed for waiting. > > > > On Wed, Jan 4, 2017 at 11:30 AM,
Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?
Hi Mark, Thanks for the response - So far I've narrowed down to this; 0 - ntoskrnl.exe!KeSynchronizeExecution+0x2246 1 - ntoskrnl.exe!KeWaitForMultipleObjects+0x135e 2 - ntoskrnl.exe!KeWaitForMultipleObjects+0xdd9 3 - ntoskrnl.exe!KeWaitForSingleObject+0x373 4 - ntoskrnl.exe!KeStallWhileFrozen+0x1977 5 - ntoskrnl.exe!_misaligned_access+0x13f9 6 - ntoskrnl.exe!KeSynchronizeExecution+0x4133 7 - ntdll.dll!RtlCopyMemory+0x3 8 - WS2_32.dll!WSAPoll+0xb9 9 - tcnative-1.dll!Java_org_apache_tomcat_jni_Proc_detach+0xad10 10 - tcnative-1.dll!Java_org_apache_tomcat_jni_Poll_poll+0xac >From what I can see there is one thread which is the main cpu user when idle, it cycles through the above and; jvm.dll!JVM_FindSignal+0x2cba jvm.dll!JVM_ResolveClass+0x371c7 jvm.dll!JVM_ResolveClass+0x571d7 jvm.dll!JVM_MonitorWait+0xfe jvm.dll!JVM_FindSignal+0x73f Which looks to correspond to https-openssl-apr-8443-poller thread, which I see cycling through RUNNABLE/TIMED_WAITING states (below are from when there are zero requests occuring); Name: https-openssl-apr-8443-Poller State: RUNNABLE Total blocked: 11,222,146 Total waited: 11,222,507 Stack trace: org.apache.tomcat.jni.Poll.poll(Native Method) org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1675) java.lang.Thread.run(Thread.java:745) - Name: https-openssl-apr-8443-Poller State: RUNNABLE Total blocked: 11,624,292 Total waited: 11,624,681 Stack trace: org.apache.tomcat.jni.Poll.add(Native Method) org.apache.tomcat.util.net.AprEndpoint$Poller.addToPoller(AprEndpoint.java:1449) org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1649) java.lang.Thread.run(Thread.java:745) -- Name: https-openssl-apr-8443-Poller State: TIMED_WAITING on org.apache.tomcat.util.net.AprEndpoint$Poller@4f5e5841 Total blocked: 12,441,180 Total waited: 12,441,614 Stack trace: java.lang.Object.wait(Native Method) org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1573) java.lang.Thread.run(Thread.java:745) - Name: https-openssl-apr-8443-Poller State: RUNNABLE Total blocked: 13,278,363 Total waited: 13,278,853 Stack trace: sun.misc.Unsafe.unpark(Native Method) java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:141) java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer.java:662) java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1264) java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457) java.util.concurrent.LinkedBlockingQueue.signalNotEmpty(LinkedBlockingQueue.java:176) java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:430) org.apache.tomcat.util.threads.TaskQueue.offer(TaskQueue.java:74) org.apache.tomcat.util.threads.TaskQueue.offer(TaskQueue.java:31) java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1361) org.apache.tomcat.util.threads.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:163) org.apache.tomcat.util.threads.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:141) org.apache.tomcat.util.net.AbstractEndpoint.processSocket(AbstractEndpoint.java:903) org.apache.tomcat.util.net.AprEndpoint.processSocket(AprEndpoint.java:866) org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1731) java.lang.Thread.run(Thread.java:745) If I look at the 8080 poller thread (after load), it's waiting appropriately. I'll try and get the remote debugger connected now and see what time is being passed for waiting. On Wed, Jan 4, 2017 at 11:30 AM, Mark Thomaswrote: > On 04/01/2017 16:02, David Oswell wrote: > > Hello all, > > > > wondering if anyone has seen similar behaviour - sorry if already > reported, > > googling hasn't found much other than a somewhat similar older issue on > APR > > going to 100% CPU when idle due to a poll loop/timeout issue - however > this > > seems a bit different as this is only occurring with SSL; > > > > Running (ESX Based): > > Server version:Apache Tomcat/8.5.9 > > Server built: Dec 5 2016 20:18:12 UTC > > Server number: 8.5.9.0 > > OS Name: Windows Server 2012 R2 > > OS Version:6.3 > > Architecture: amd64 > > JVM Version: 1.8.0_112-b15 > > JVM Vendor:Oracle Corporation > > Apache Tomcat Native library 1.2.10 using APR version 1.5.2. > > > > APR capabilities: IPv6 [true], sendfile [true], accept filters [false], > > random [true]. > > APR/OpenSSL configuration: useAprConnector [false], useOpenSSL [true] > > OpenSSL successfully initialized (OpenSSL 1.0.2j 26 Sep 2016) > > > > Server setup as 4 vCPU / 8GB mem > > > > Problem: > > After running load on the server (this is a QA
Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?
On 04/01/2017 16:02, David Oswell wrote: > Hello all, > > wondering if anyone has seen similar behaviour - sorry if already reported, > googling hasn't found much other than a somewhat similar older issue on APR > going to 100% CPU when idle due to a poll loop/timeout issue - however this > seems a bit different as this is only occurring with SSL; > > Running (ESX Based): > Server version:Apache Tomcat/8.5.9 > Server built: Dec 5 2016 20:18:12 UTC > Server number: 8.5.9.0 > OS Name: Windows Server 2012 R2 > OS Version:6.3 > Architecture: amd64 > JVM Version: 1.8.0_112-b15 > JVM Vendor:Oracle Corporation > Apache Tomcat Native library 1.2.10 using APR version 1.5.2. > > APR capabilities: IPv6 [true], sendfile [true], accept filters [false], > random [true]. > APR/OpenSSL configuration: useAprConnector [false], useOpenSSL [true] > OpenSSL successfully initialized (OpenSSL 1.0.2j 26 Sep 2016) > > Server setup as 4 vCPU / 8GB mem > > Problem: > After running load on the server (this is a QA system, so just load tests) > Tomcat goes to 70-100% CPU usage when it should be idle. > This only occurs when putting load on the SSL connectors w/APR. When > re-applying load (to SSL connector), the CPU usage drops to appropriate > levels for the load, but then returns to the higher idle CPU usage. > If applying load back to the non-SSL connector, CPU usage stays high > (higher than after a restart and sending to non-SSL connector first). > > Performing the same load tests on non-SSL (still with APR), or non-APR > connectors CPU returns to idle after the load run is completed. > > The load applied is 8 persistent connections, which send basic POST > requests, the service does some basic operations and responds, generally - > if not maxed out - in 1-2ms. > Even when using 1 connection, the CPU usage stays constant when the > requests stop. > Again this only occurs when using the SSL/APR connector, if using a non-APR > connector, or the APR-non-SSL connector it's fine. > > It seems to be a problem with just how APR-SSL works or possibly OpenSSL ? An issue with the Tomcat code is more likely than with the OpenSSL code. It sounds like one or more threads is entering a fairly tight loop. Use a combination of top to find out which thread(s) is(are) using the CPU and then a thread dump to find where in the code it is happening. Then add a breakpoint at that point in the code, run your test and once the high CPU starts, attach your IDE and step through the code to see where/how the loop is happening. Instructions for running Tomcat under a debugger are on the Wiki. The more of the above you are able to complete, the simpler it is going to be to get this fixed. Mark > Not sure how to further isolate to see which one is the problem. > > > SSL Connector setup: > protocol="org.apache.coyote.http11.Http11AprProtocol" > maxThreads="150" > SSLEnabled="true" > maxKeepAliveRequests="-1"> > > certificateFile="conf/localhost-rsa-cert.pem" > certificateKeyFile="conf/localhost-rsa-key.pem" > certificateKeyPassword="password" > type="RSA" /> > > > > Non-SSL Connector: >protocol="org.apache.coyote.http11.Http11AprProtocol" >connectionTimeout="2" >maxKeepAliveRequests="-1" >redirectPort="8443" > > > > Initializing ProtocolHandler ["http-apr-8080"] > Initializing ProtocolHandler ["https-openssl-apr-8443"] > - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?
Hello all, wondering if anyone has seen similar behaviour - sorry if already reported, googling hasn't found much other than a somewhat similar older issue on APR going to 100% CPU when idle due to a poll loop/timeout issue - however this seems a bit different as this is only occurring with SSL; Running (ESX Based): Server version:Apache Tomcat/8.5.9 Server built: Dec 5 2016 20:18:12 UTC Server number: 8.5.9.0 OS Name: Windows Server 2012 R2 OS Version:6.3 Architecture: amd64 JVM Version: 1.8.0_112-b15 JVM Vendor:Oracle Corporation Apache Tomcat Native library 1.2.10 using APR version 1.5.2. APR capabilities: IPv6 [true], sendfile [true], accept filters [false], random [true]. APR/OpenSSL configuration: useAprConnector [false], useOpenSSL [true] OpenSSL successfully initialized (OpenSSL 1.0.2j 26 Sep 2016) Server setup as 4 vCPU / 8GB mem Problem: After running load on the server (this is a QA system, so just load tests) Tomcat goes to 70-100% CPU usage when it should be idle. This only occurs when putting load on the SSL connectors w/APR. When re-applying load (to SSL connector), the CPU usage drops to appropriate levels for the load, but then returns to the higher idle CPU usage. If applying load back to the non-SSL connector, CPU usage stays high (higher than after a restart and sending to non-SSL connector first). Performing the same load tests on non-SSL (still with APR), or non-APR connectors CPU returns to idle after the load run is completed. The load applied is 8 persistent connections, which send basic POST requests, the service does some basic operations and responds, generally - if not maxed out - in 1-2ms. Even when using 1 connection, the CPU usage stays constant when the requests stop. Again this only occurs when using the SSL/APR connector, if using a non-APR connector, or the APR-non-SSL connector it's fine. It seems to be a problem with just how APR-SSL works or possibly OpenSSL ? Not sure how to further isolate to see which one is the problem. SSL Connector setup: Non-SSL Connector: Initializing ProtocolHandler ["http-apr-8080"] Initializing ProtocolHandler ["https-openssl-apr-8443"]