Re: Stabilizing the trunk (9.0.x) build
On 27/02/2015 15:01, Mark Thomas wrote:
> On 27/02/2015 14:42, Christopher Schultz wrote:
>> On 2/27/15 7:00 AM, Mark Thomas wrote:

<snip/>

>>> There is also an issue with APR on Linux that I can reproduce (with some code changes) that triggers a crash every couple of runs.
>>
>> Next time it happens, can you give me the backtrace and register details (basically, the top of the Java hs_* file)? From my perspective, it should not be possible to crash tcnative if we can help it -- even if the Java code is all kinds of wrong. Throwing exceptions is fine, but taking-down the JVM is obnoxious :)
>
> I should be able to do this fairly easily. I'll open a BZ item with the info you requested when I have it.

As requested:
https://bz.apache.org/bugzilla/show_bug.cgi?id=57653

Mark

- To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org
Re: Stabilizing the trunk (9.0.x) build
Mark,

On 2/27/15 7:00 AM, Mark Thomas wrote:
> Another update. I think I am getting close to being able to commit these changes[1]. The current status is:
> - NIO appears to pass on Windows, OSX and Linux
> - APR appears to pass on OSX and Linux
> - APR unknown on Windows
> - NIO2 appears to pass on OSX and Linux
> - NIO2 hanging on Windows
>
> I say appears to pass since with timing issues one can never be sure.
>
> There is also an issue with APR on Linux that I can reproduce (with some code changes) that triggers a crash every couple of runs.

Next time it happens, can you give me the backtrace and register details (basically, the top of the Java hs_* file)? From my perspective, it should not be possible to crash tcnative if we can help it -- even if the Java code is all kinds of wrong. Throwing exceptions is fine, but taking-down the JVM is obnoxious :)

> I'm not sure if it is possible to trigger the error with the current code. I plan to look at this some more once the unit tests are passing.

-chris

signature.asc
Description: OpenPGP digital signature
Re: Stabilizing the trunk (9.0.x) build
On 27/02/2015 14:42, Christopher Schultz wrote:
> Mark,
>
> On 2/27/15 7:00 AM, Mark Thomas wrote:
>> Another update. I think I am getting close to being able to commit these changes[1]. The current status is:
>> - NIO appears to pass on Windows, OSX and Linux
>> - APR appears to pass on OSX and Linux
>> - APR unknown on Windows
>> - NIO2 appears to pass on OSX and Linux
>> - NIO2 hanging on Windows
>>
>> I say appears to pass since with timing issues one can never be sure.

Cracked it (I think). Unit tests pass for all three connectors on all three platforms.

>> There is also an issue with APR on Linux that I can reproduce (with some code changes) that triggers a crash every couple of runs.
>
> Next time it happens, can you give me the backtrace and register details (basically, the top of the Java hs_* file)? From my perspective, it should not be possible to crash tcnative if we can help it -- even if the Java code is all kinds of wrong. Throwing exceptions is fine, but taking-down the JVM is obnoxious :)

I should be able to do this fairly easily. I'll open a BZ item with the info you requested when I have it.

Mark
Re: Stabilizing the trunk (9.0.x) build
2015-02-27 13:00 GMT+01:00 Mark Thomas ma...@apache.org:
> Another update. I think I am getting close to being able to commit these changes[1]. The current status is:
> - NIO appears to pass on Windows, OSX and Linux
> - APR appears to pass on OSX and Linux
> - APR unknown on Windows
> - NIO2 appears to pass on OSX and Linux
> - NIO2 hanging on Windows

The testsuite passes for me on Windows (with failures that are not connector or websocket related) and on Linux (NIO2). Do I need something really slow, like the CI system, to run into issues?

It's not related, but there's a glitch with some testsuites and CI systems: the websocket client needs a lot of entropy if each test is run in a separate JVM (this does not happen with the Tomcat testsuite).

Rémy
Re: Stabilizing the trunk (9.0.x) build
Another update. I think I am getting close to being able to commit these changes[1]. The current status is:
- NIO appears to pass on Windows, OSX and Linux
- APR appears to pass on OSX and Linux
- APR unknown on Windows
- NIO2 appears to pass on OSX and Linux
- NIO2 hanging on Windows

I say appears to pass since with timing issues one can never be sure.

There is also an issue with APR on Linux that I can reproduce (with some code changes) that triggers a crash every couple of runs. I'm not sure if it is possible to trigger the error with the current code. I plan to look at this some more once the unit tests are passing.

Mark

[1] https://github.com/markt-asf/tomcat/tree/markt-trunk
Re: Stabilizing the trunk (9.0.x) build
On 26/02/2015 21:58, Christopher Schultz wrote:
> Mark,
>
> On 2/23/15 4:16 AM, Mark Thomas wrote:
>> Given that it is my changes that have triggered the problems I think I have a responsibility to fix them (and intend to do so over) but I'm not going to say no if anyone wants to pitch in. Therefore, I'm starting this thread so that we can co-ordinate work on fixing the various failures being reported.
>>
>> I'm going to start with why TestWsWebSocketContainer.testMaxMessageSize04() hangs on Windows.
>
> I'd like to commit Ognjen's patch for https://bz.apache.org/bugzilla/show_bug.cgi?id=55988 (patch is https://bz.apache.org/bugzilla/attachment.cgi?id=32407&action=diff).
>
> It's fairly innocuous, but since it will change the AbstractEndpoint class and you guys are trying to track-down irritating issues in there, would you prefer that I hold-off?

No objections to committing the patch from me.

Mark
Re: Stabilizing the trunk (9.0.x) build
On 26/02/2015 12:25, Rémy Maucherat wrote:
> 2015-02-26 11:42 GMT+01:00 Mark Thomas ma...@apache.org:
>> What I have at the moment is at:
>> https://github.com/markt-asf/tomcat/tree/markt-trunk
>>
>> I'm currently running the unit tests.
>
> Looking good.

Better, certainly. NIO tests pass on Windows, Linux and OSX.

I've found a bug in NIO2 + SSL that is fairly common on Linux/OSX that I have fixed and am re-running the tests.

I haven't really looked at APR/native yet but there did appear to be some unexpectedly long running tests on Windows (the only platform to get to APR/native so far) so I suspect there is still more to do.

Overall I think things are heading in the right direction.

Mark
Re: Stabilizing the trunk (9.0.x) build
2015-02-26 17:40 GMT+01:00 Mark Thomas ma...@apache.org:
> Better, certainly. NIO tests pass on Windows, Linux and OSX.

Very good!

> I've found a bug in NIO2 + SSL that is fairly common on Linux/OSX that I have fixed and am re-running the tests.

Aw, *another* one?

> I haven't really looked at APR/native yet but there did appear to be some unexpectedly long running tests on Windows (the only platform to get to APR/native so far) so I suspect there is still more to do.
>
> Overall I think things are heading in the right direction.

Rémy
Re: Stabilizing the trunk (9.0.x) build
On 26/02/2015 17:34, Rémy Maucherat wrote:
> 2015-02-26 17:40 GMT+01:00 Mark Thomas ma...@apache.org:
>> Better, certainly. NIO tests pass on Windows, Linux and OSX.
>
> Very good!
>
>> I've found a bug in NIO2 + SSL that is fairly common on Linux/OSX that I have fixed and am re-running the tests.
>
> Aw, *another* one?

Yes. Looks like it affects 8.0.x as well. This fixes it:
https://github.com/markt-asf/tomcat/commit/f8eda8da61751b0b224d59dbd93ed9f5f1fa9441

>> I haven't really looked at APR/native yet but there did appear to be some unexpectedly long running tests on Windows (the only platform to get to APR/native so far) so I suspect there is still more to do.

There was another issue but it looked to be a fairly simple one: one of the concurrent read/write fixes was breaking a bunch of stuff. Most likely the fix wasn't right but, since we don't need it, removing it was the simplest solution.

Mark
Re: Stabilizing the trunk (9.0.x) build
On 25/02/2015 19:32, Rémy Maucherat wrote:
> 2015-02-25 19:36 GMT+01:00 Mark Thomas ma...@apache.org:
>> I was planning on waiting until the build was stable but given that:
>> - read/write concurrency is at the root of a lot of these issues
>> - only WebSocket should be using it now in trunk
>> - the plan is to refactor WebSocket to remove it
>>
>> I'm going to go back to what I have in git, rebase it to current trunk and see where we are. If the unit tests pass on the usual platforms I'd be tempted to commit it. WDYT?
>
> Ok.

What I have at the moment is at:
https://github.com/markt-asf/tomcat/tree/markt-trunk

I'm currently running the unit tests.

Mark
Re: Stabilizing the trunk (9.0.x) build
2015-02-26 11:42 GMT+01:00 Mark Thomas ma...@apache.org:
> What I have at the moment is at:
> https://github.com/markt-asf/tomcat/tree/markt-trunk
>
> I'm currently running the unit tests.

Looking good.

Rémy
Re: Stabilizing the trunk (9.0.x) build
Mark,

On 2/23/15 4:16 AM, Mark Thomas wrote:
> Given that it is my changes that have triggered the problems I think I have a responsibility to fix them (and intend to do so over) but I'm not going to say no if anyone wants to pitch in. Therefore, I'm starting this thread so that we can co-ordinate work on fixing the various failures being reported.
>
> I'm going to start with why TestWsWebSocketContainer.testMaxMessageSize04() hangs on Windows.

I'd like to commit Ognjen's patch for https://bz.apache.org/bugzilla/show_bug.cgi?id=55988 (patch is https://bz.apache.org/bugzilla/attachment.cgi?id=32407&action=diff).

It's fairly innocuous, but since it will change the AbstractEndpoint class and you guys are trying to track-down irritating issues in there, would you prefer that I hold-off?

Thanks,
-chris
Re: Stabilizing the trunk (9.0.x) build
2015-02-24 16:33 GMT+01:00 Mark Thomas ma...@apache.org:
> On 24/02/2015 13:10, Rémy Maucherat wrote:
>> I'm having issues with the write timeout tests in TestWsWebSocketContainer, which made me do some changes since there are still things I don't understand:
>
> These appear to be OK for me at the moment with NIO and NIO2 but the very nature of timing issues means that doesn't count for much. I am seeing failures or crashes with APR/native so there is still work to be done there.
>
>> - In WsRemoteEndpointImplServer, onWritePossible appears to be able to be invoked concurrently (doWrite calls it directly and changes the buffers). I think it should be synced.
>
> Those calls should be nested. If you are seeing concurrent calls then there is probably still an issue around write registration.

I still think there is concurrency there, at least with the first write notification (which is concurrent if the first read does write immediately, just like our big failing test does). Without the read/write concurrency, I think there wouldn't be any issue.

With the TestWebSocketFrameClient failure, the contending traces look like (I used a semaphore to isolate them):

[junit] java.lang.Exception: Stack trace
[junit]     at java.lang.Thread.dumpStack(Thread.java:1329)
[junit]     at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.onWritePossible(WsRemoteEndpointImplServer.java:146)
[junit]     at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doWrite(WsRemoteEndpointImplServer.java:87)
[junit]     at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$OutputBufferSendHandler.write(WsRemoteEndpointImplBase.java:822)
[junit]     at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.writeMessagePart(WsRemoteEndpointImplBase.java:447)
[junit]     at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessage(WsRemoteEndpointImplBase.java:338)
[junit]     at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$TextMessageSendHandler.write(WsRemoteEndpointImplBase.java:730)
[junit]     at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialString(WsRemoteEndpointImplBase.java:250)
[junit]     at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendString(WsRemoteEndpointImplBase.java:193)
[junit]     at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:37)
[junit]     at org.apache.tomcat.websocket.TesterFirehoseServer$Endpoint.onMessage(TesterFirehoseServer.java:121)
[junit]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[junit]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit]     at java.lang.reflect.Method.invoke(Method.java:483)
[junit]     at org.apache.tomcat.websocket.pojo.PojoMessageHandlerWholeBase.onMessage(PojoMessageHandlerWholeBase.java:80)
[junit]     at org.apache.tomcat.websocket.WsFrameBase.sendMessageText(WsFrameBase.java:393)
[junit]     at org.apache.tomcat.websocket.WsFrameBase.processDataText(WsFrameBase.java:494)
[junit]     at org.apache.tomcat.websocket.WsFrameBase.processData(WsFrameBase.java:289)
[junit]     at org.apache.tomcat.websocket.WsFrameBase.processInputBuffer(WsFrameBase.java:130)
[junit]     at org.apache.tomcat.websocket.server.WsFrameServer.onDataAvailable(WsFrameServer.java:56)
[junit]     at org.apache.tomcat.websocket.server.WsHttpUpgradeHandler$WsReadListener.onDataAvailable(WsHttpUpgradeHandler.java:207)
[junit]     at org.apache.coyote.http11.upgrade.UpgradeServletInputStream.onDataAvailable(UpgradeServletInputStream.java:213)
[junit]     at org.apache.coyote.http11.upgrade.UpgradeProcessor.upgradeDispatch(UpgradeProcessor.java:108)
[junit]     at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:658)
[junit]     at org.apache.coyote.http11.Http11Nio2Protocol$Http11ConnectionHandler.process(Http11Nio2Protocol.java:130)
[junit]     at org.apache.tomcat.util.net.Nio2Endpoint$SocketProcessor.doRun(Nio2Endpoint.java:1694)
[junit]     at org.apache.tomcat.util.net.Nio2Endpoint$SocketProcessor.run(Nio2Endpoint.java:1653)
[junit]     at org.apache.tomcat.util.net.Nio2Endpoint.processSocket0(Nio2Endpoint.java:578)
[junit]     at org.apache.tomcat.util.net.Nio2Endpoint.processSocket(Nio2Endpoint.java:563)
[junit]     at org.apache.tomcat.util.net.Nio2Endpoint$Nio2SocketWrapper$3.completed(Nio2Endpoint.java:794)
[junit]     at org.apache.tomcat.util.net.Nio2Endpoint$Nio2SocketWrapper$3.completed(Nio2Endpoint.java:775)
[junit]     at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
[junit]     at sun.nio.ch.Invoker$2.run(Invoker.java:218)
[junit]     at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
Re: Stabilizing the trunk (9.0.x) build
On 25/02/2015 14:31, Rémy Maucherat wrote:
> 2015-02-24 16:33 GMT+01:00 Mark Thomas ma...@apache.org:
>> On 24/02/2015 13:10, Rémy Maucherat wrote:
>>> I'm having issues with the write timeout tests in TestWsWebSocketContainer, which made me do some changes since there are still things I don't understand:
>>
>> These appear to be OK for me at the moment with NIO and NIO2 but the very nature of timing issues means that doesn't count for much. I am seeing failures or crashes with APR/native so there is still work to be done there.
>>
>>> - In WsRemoteEndpointImplServer, onWritePossible appears to be able to be invoked concurrently (doWrite calls it directly and changes the buffers). I think it should be synced.
>>
>> Those calls should be nested. If you are seeing concurrent calls then there is probably still an issue around write registration.
>
> I still think there is concurrency there, at least with the first write notification (which is concurrent if the first read does write immediately, just like our big failing test does). Without the read/write concurrency, I think there wouldn't be any issue.
>
> With the TestWebSocketFrameClient failure, the contending traces look like (I used a semaphore to isolate them):
>
> [junit] java.lang.Exception: Stack trace
> [junit]     at java.lang.Thread.dumpStack(Thread.java:1329)
> [junit]     at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.onWritePossible(WsRemoteEndpointImplServer.java:146)
> [junit]     at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doWrite(WsRemoteEndpointImplServer.java:87)
> [junit]     at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$OutputBufferSendHandler.write(WsRemoteEndpointImplBase.java:822)
> [junit]     at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.writeMessagePart(WsRemoteEndpointImplBase.java:447)
> [junit]     at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessage(WsRemoteEndpointImplBase.java:338)
> [junit]     at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$TextMessageSendHandler.write(WsRemoteEndpointImplBase.java:730)
> [junit]     at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialString(WsRemoteEndpointImplBase.java:250)
> [junit]     at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendString(WsRemoteEndpointImplBase.java:193)
> [junit]     at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:37)
> [junit]     at org.apache.tomcat.websocket.TesterFirehoseServer$Endpoint.onMessage(TesterFirehoseServer.java:121)
> [junit]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [junit]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [junit]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit]     at java.lang.reflect.Method.invoke(Method.java:483)
> [junit]     at org.apache.tomcat.websocket.pojo.PojoMessageHandlerWholeBase.onMessage(PojoMessageHandlerWholeBase.java:80)
> [junit]     at org.apache.tomcat.websocket.WsFrameBase.sendMessageText(WsFrameBase.java:393)
> [junit]     at org.apache.tomcat.websocket.WsFrameBase.processDataText(WsFrameBase.java:494)
> [junit]     at org.apache.tomcat.websocket.WsFrameBase.processData(WsFrameBase.java:289)
> [junit]     at org.apache.tomcat.websocket.WsFrameBase.processInputBuffer(WsFrameBase.java:130)
> [junit]     at org.apache.tomcat.websocket.server.WsFrameServer.onDataAvailable(WsFrameServer.java:56)
> [junit]     at org.apache.tomcat.websocket.server.WsHttpUpgradeHandler$WsReadListener.onDataAvailable(WsHttpUpgradeHandler.java:207)
> [junit]     at org.apache.coyote.http11.upgrade.UpgradeServletInputStream.onDataAvailable(UpgradeServletInputStream.java:213)
> [junit]     at org.apache.coyote.http11.upgrade.UpgradeProcessor.upgradeDispatch(UpgradeProcessor.java:108)
> [junit]     at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:658)
> [junit]     at org.apache.coyote.http11.Http11Nio2Protocol$Http11ConnectionHandler.process(Http11Nio2Protocol.java:130)
> [junit]     at org.apache.tomcat.util.net.Nio2Endpoint$SocketProcessor.doRun(Nio2Endpoint.java:1694)
> [junit]     at org.apache.tomcat.util.net.Nio2Endpoint$SocketProcessor.run(Nio2Endpoint.java:1653)
> [junit]     at org.apache.tomcat.util.net.Nio2Endpoint.processSocket0(Nio2Endpoint.java:578)
> [junit]     at org.apache.tomcat.util.net.Nio2Endpoint.processSocket(Nio2Endpoint.java:563)
> [junit]     at org.apache.tomcat.util.net.Nio2Endpoint$Nio2SocketWrapper$3.completed(Nio2Endpoint.java:794)
> [junit]     at org.apache.tomcat.util.net.Nio2Endpoint$Nio2SocketWrapper$3.completed(Nio2Endpoint.java:775)
> [junit]     at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
> [junit]     at sun.nio.ch.Invoker$2.run(Invoker.java:218)
Re: Stabilizing the trunk (9.0.x) build
2015-02-25 19:36 GMT+01:00 Mark Thomas ma...@apache.org:
> I was planning on waiting until the build was stable but given that:
> - read/write concurrency is at the root of a lot of these issues
> - only WebSocket should be using it now in trunk
> - the plan is to refactor WebSocket to remove it
>
> I'm going to go back to what I have in git, rebase it to current trunk and see where we are. If the unit tests pass on the usual platforms I'd be tempted to commit it. WDYT?

Ok.

Rémy
Re: Stabilizing the trunk (9.0.x) build
Progress is being made. TestWsWebSocketContainer.testMaxMessageSize04() is fixed. I do want to come back to exactly how/if flushing is performed on ServletOutputStream.close() but I plan on parking that until the other failures are fixed.

Next on my list is TestUpgrade.testMessagesBlocking(). I am seeing failures on most runs on Linux and Windows (command line only - not IDE). The symptom is that the connection to the client is closed before the second message is received. I've tried - without success so far - to reproduce this in a debugger. I'll be working on this today.

On a related topic the Gump OpenSSL tests are still failing. They pass when run directly from the command line on vmgump.a.o. I can't come up with a better idea than adding some debugging to the tests.

Mark
Re: Stabilizing the trunk (9.0.x) build
On 24.02.2015 at 10:01, Mark Thomas wrote:
> On a related topic the Gump OpenSSL tests are still failing. They pass when run directly from the command line on vmgump.a.o. I can't come up with a better idea than adding some debugging to the tests.

I installed OpenSSL master (current snapshot) locally and ran the TestOpenSSLCipherConfigurationParser test against our trunk. I get failures as well, although I can confirm that the correct OpenSSL version 1.1.0-dev was used.

Looking at the simplest failure example, SSLv2: OpenSSL 1.1.0 no longer supports SSLv2, so "openssl ciphers -v SSLv2" returns an empty result and that is what the test expects. OTOH in TestOpenSSLCipherConfigurationParser there are about 6 ciphers which are defined for SSLv2 and those show up in the failed tests (plus some of their aliases).

Not sure how to handle OpenSSL version compatibility in the tests and in the Tomcat runtime code. Which version of OpenSSL is java/org/apache/tomcat/util/net/jsse/openssl/ supposed to reflect? Any specific version, or any cipher existing in some OpenSSL version? That code, I think, does not actually use OpenSSL and is only a translation mechanism from OpenSSL syntax to JSSE syntax, correct? The tests OTOH actually use OpenSSL and compare results, so they would never be compatible with an extended cipher list.

Maybe for testing we need to mark the ciphers in the list that actually exist in the OpenSSL version that's supposed to be used during the tests? I don't have a convincing idea...

Regards,

Rainer
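One way to picture the "mark the ciphers" idea is a test-side filter that drops entries the OpenSSL build under test cannot report. The sketch below is only an illustration under stated assumptions: KnownCipher and ExpectedCiphers are hypothetical names, not classes from Tomcat, the two cipher entries are illustrative, and the set of dropped protocols is supplied by hand rather than detected from the openssl binary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical test-side record: an OpenSSL cipher name and the protocol it is defined for. */
final class KnownCipher {
    final String openSslName;
    final String protocol;

    KnownCipher(String openSslName, String protocol) {
        this.openSslName = openSslName;
        this.protocol = protocol;
    }
}

final class ExpectedCiphers {
    /**
     * Returns the cipher names a test should expect, leaving out every cipher
     * whose protocol the OpenSSL build under test no longer supports.
     */
    static List<String> forBuild(List<KnownCipher> known, Set<String> droppedProtocols) {
        List<String> expected = new ArrayList<>();
        for (KnownCipher c : known) {
            if (!droppedProtocols.contains(c.protocol)) {
                expected.add(c.openSslName);
            }
        }
        return expected;
    }

    public static void main(String[] args) {
        // Illustrative entries only; a real test would start from the full cipher list.
        List<KnownCipher> known = Arrays.asList(
                new KnownCipher("DES-CBC3-MD5", "SSLv2"),
                new KnownCipher("AES128-SHA", "SSLv3"));
        // Assumption: the build under test (for example 1.1.0-dev) has dropped SSLv2.
        System.out.println(forBuild(known, Collections.singleton("SSLv2")));
    }
}
```

With a filter like this the comparison against the openssl output only covers ciphers that build can actually return, which leaves open the separate question of which OpenSSL version the translation code itself should reflect.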
Re: Stabilizing the trunk (9.0.x) build
On 24/02/2015 09:01, Mark Thomas wrote:
> Progress is being made. TestWsWebSocketContainer.testMaxMessageSize04() is fixed. I do want to come back to exactly how/if flushing is performed on ServletOutputStream.close() but I plan on parking that until the other failures are fixed.
>
> Next on my list is TestUpgrade.testMessagesBlocking(). I am seeing failures on most runs on Linux and Windows (command line only - not IDE). The symptom is that the connection to the client is closed before the second message is received. I've tried - without success so far - to reproduce this in a debugger.

I've tracked down and fixed one possible cause of this failure. Unfortunately, the test still fails. It looks like the same problem exists in NioSelectorPool. I'm investigating possible fixes.

Mark
Re: Stabilizing the trunk (9.0.x) build
2015-02-24 10:01 GMT+01:00 Mark Thomas ma...@apache.org:
> Progress is being made. TestWsWebSocketContainer.testMaxMessageSize04() is fixed. I do want to come back to exactly how/if flushing is performed on ServletOutputStream.close() but I plan on parking that until the other failures are fixed.

I'm having issues with the write timeout tests in TestWsWebSocketContainer, which made me do some changes since there are still things I don't understand:
- In WsRemoteEndpointImplServer, onWritePossible appears to be able to be invoked concurrently (doWrite calls it directly and changes the buffers). I think it should be synced.
- In Nio2Endpoint, the socket wrapper uses nestedWriteCompletionCount instead of the inline flag that was used in 8. If the write completes inline, then isReady should already be set back to true, and writing could continue. So the change was IMO adding more write notifications which could hide some issues. I tried changing that many times after the refactoring started, but this is the first time I can do it without obviously breaking the testsuite (where some of the non blocking write tests would hang due to missing write notifications).
- NPE guards in the NIO connector socket processor for concurrent closing [NIO2 has them, somehow it wasn't needed earlier in NIO, which is also an odd thing; I actually feel better having to add them].

So this could improve on some possible timing related problems. I'll keep on investigating though before committing anything.

Rémy
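The "synced" suggestion for onWritePossible can be sketched roughly as follows. This is not the WsRemoteEndpointImplServer code: SyncedWriter, its fields and the StringBuilder standing in for the socket are all hypothetical, and the only point shown is that one lock covers both the direct call from doWrite and a callback arriving on another thread.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for an endpoint whose write callback can be reached from two threads. */
final class SyncedWriter {
    private final Object writeLock = new Object();
    private final Deque<ByteBuffer> pending = new ArrayDeque<>();
    private final StringBuilder sink = new StringBuilder(); // stand-in for the socket

    /** Queues the buffers, then calls the write callback directly, as doWrite is described as doing. */
    void doWrite(ByteBuffer... buffers) {
        synchronized (writeLock) {
            for (ByteBuffer b : buffers) {
                pending.add(b);
            }
        }
        onWritePossible();
    }

    /** May also be invoked by a container thread; the lock keeps the buffer handling single-threaded. */
    void onWritePossible() {
        synchronized (writeLock) {
            ByteBuffer b;
            while ((b = pending.poll()) != null) {
                while (b.hasRemaining()) {
                    sink.append((char) b.get());
                }
            }
        }
    }

    String written() {
        synchronized (writeLock) {
            return sink.toString();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SyncedWriter w = new SyncedWriter();
        Runnable r = () -> {
            for (int i = 0; i < 1000; i++) {
                w.doWrite(ByteBuffer.wrap(new byte[] {'x'}));
            }
        };
        Thread a = new Thread(r);
        Thread b = new Thread(r);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(w.written().length());
    }
}
```

Java monitors are reentrant, so a nested call on the same thread still passes through the lock; only a second thread is made to wait. A lock of this kind would mask rather than explain any unexpected concurrent invocation, so it does not settle whether the write registration itself is correct.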
Re: Stabilizing the trunk (9.0.x) build
On 24/02/2015 13:10, Rémy Maucherat wrote:
> I'm having issues with the write timeout tests in TestWsWebSocketContainer, which made me do some changes since there are still things I don't understand:

These appear to be OK for me at the moment with NIO and NIO2 but the very nature of timing issues means that doesn't count for much. I am seeing failures or crashes with APR/native so there is still work to be done there.

> - In WsRemoteEndpointImplServer, onWritePossible appears to be able to be invoked concurrently (doWrite calls it directly and changes the buffers). I think it should be synced.

Those calls should be nested. If you are seeing concurrent calls then there is probably still an issue around write registration.

> - In Nio2Endpoint, the socket wrapper uses nestedWriteCompletionCount instead of the inline flag that was used in 8. If the write completes inline, then isReady should already be set back to true, and writing could continue. So the change was IMO adding more write notifications which could hide some issues. I tried changing that many times after the refactoring started, but this is the first time I can do it without obviously breaking the testsuite (where some of the non blocking write tests would hang due to missing write notifications).

This change was to prevent multiple write threads being triggered if there were multiple levels of nesting with the write completion handler. It was a fairly rare event but it did happen.

> - NPE guards in the NIO connector socket processor for concurrent closing [NIO2 has them, somehow it wasn't needed earlier in NIO, which is also an odd thing; I actually feel better having to add them].
>
> So this could improve on some possible timing related problems. I'll keep on investigating though before committing anything.

One thing to keep in mind that may simplify some of these issues is that once WebSocket moves to using the Tomcat I/O layer directly the requirement for one container thread reading and one container thread writing concurrently will go away. A number of the concurrency issues we have observed are triggered by these concurrent threads so switching back to a single thread should help.

Mark
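The nesting problem described above (a write that completes inline re-enters the completion handler on the same thread, and each level could trigger its own write thread) can be illustrated with a per-thread depth counter. This is a simplified sketch and not the Nio2Endpoint implementation: NestedCompletionSketch, its fields and the dispatched counter are hypothetical, and the recursion merely simulates writes that complete inline.

```java
/** Hypothetical model of a completion handler that can be re-entered on the same thread. */
final class NestedCompletionSketch {
    // Per-thread nesting depth of completed() calls.
    private static final ThreadLocal<int[]> DEPTH = ThreadLocal.withInitial(() -> new int[1]);

    int dispatched = 0;                 // stand-in for "a write thread was triggered"
    private int remainingInlineWrites;  // how many further writes will complete inline

    NestedCompletionSketch(int inlineWrites) {
        this.remainingInlineWrites = inlineWrites;
    }

    void completed() {
        int[] depth = DEPTH.get();
        depth[0]++;
        try {
            if (remainingInlineWrites > 0) {
                remainingInlineWrites--;
                // Simulates the next write completing inline: the handler is
                // invoked again before this invocation has returned.
                completed();
            }
        } finally {
            depth[0]--;
        }
        // Only the outermost invocation triggers a notification.
        if (depth[0] == 0) {
            dispatched++;
        }
    }

    public static void main(String[] args) {
        NestedCompletionSketch s = new NestedCompletionSketch(3);
        s.completed();
        System.out.println(s.dispatched);
    }
}
```

Without the depth check, the four nested invocations in this run would each count as a dispatch; with it, only the outermost one does.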
Re: Stabilizing the trunk (9.0.x) build
On 23/02/2015 10:40, Rémy Maucherat wrote:
> 2015-02-23 10:16 GMT+01:00 Mark Thomas ma...@apache.org:
>> Given that it is my changes that have triggered the problems I think I have a responsibility to fix them (and intend to do so over) but I'm not going to say no if anyone wants to pitch in. Therefore, I'm starting this thread so that we can co-ordinate work on fixing the various failures being reported.
>>
>> I'm going to start with why TestWsWebSocketContainer.testMaxMessageSize04() hangs on Windows.
>
> I'll try to help and get up to speed with the changes.

Thanks. Much appreciated.

I've made progress in that the test now fails rather than hangs. I'm waiting for the various CI systems to see if the issues I've fixed were the only causes of the hangs or if there are others still to fix. In the meantime, I'm going to look at fixing this particular test.

Mark
Re: Stabilizing the trunk (9.0.x) build
2015-02-23 10:16 GMT+01:00 Mark Thomas ma...@apache.org:
> Given that it is my changes that have triggered the problems I think I have a responsibility to fix them (and intend to do so over) but I'm not going to say no if anyone wants to pitch in. Therefore, I'm starting this thread so that we can co-ordinate work on fixing the various failures being reported.
>
> I'm going to start with why TestWsWebSocketContainer.testMaxMessageSize04() hangs on Windows.

I'll try to help and get up to speed with the changes.

Rémy