Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
On 28/06/2021 06:14, Deshmukh, Kedar wrote:
> Any tentative time line when the fix will be available in a 9.0.x release?

Releases are typically made every month. The release usually happens some time in the second week of the month. The July releases currently look like they will be comparatively early - possibly as soon as the end of this week. If you want to track release progress then I'd recommend following the dev@ list.

Mark

Thanks,
Kedar

-----Original Message-----
From: Mark Thomas
Sent: Friday, June 18, 2021 2:50 AM
To: users@tomcat.apache.org
Subject: Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

On 17/06/2021 09:26, Mark Thomas wrote:
> I think I might have found one contributing factor to this bug. I need to run a series of tests to determine whether I am seeing random variation in test results or a genuine effect.

It was random effects but I believe I have now found the bug.

Consider two threads, T1 and T2, writing HTTP/2 response bodies concurrently in the same HTTP/2 connection.
You'll need to have the code in front of you to follow what is going on.

The write:

https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1364

and the associated completion handler:

https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1044

The detail of the code is fairly complex but all you really need to keep in mind is the following:

- the writePending semaphore ensures only one thread can write at a time
- the state of the write is maintained in an OperationState instance that is stored in SocketWrapperBase.writeOperation (L1390)
- the completion handler clears this state (L1050) and releases the semaphore (L1046)

The sequence of events for a failure is as follows:

- T1 obtains the write semaphore (L1366)
- T1 creates an OperationState and sets writeOperation (L1390)
- the async write for T1 completes and the completion handler is called
- T1's completion handler releases the semaphore (L1046)
- T2 obtains the write semaphore (L1366)
- T2 creates an OperationState and sets writeOperation (L1390)
- T1's completion handler clears writeOperation (L1050)
- the async write for T2 does not complete and the socket is added to the Poller
- the Poller signals the socket is ready for write
- the Poller finds writeOperation is null so performs a normal dispatch for write
- the async write times out as it never receives the notification from the Poller

The fix is to swap the order of clearing writeOperation and releasing the semaphore. Concurrent reads will have the same problem and will be fixed by the same solution.

Fix will be applied shortly.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
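[Editor's note] The ordering bug described above can be replayed in isolation. The sketch below is an illustrative model, not Tomcat's actual SocketWrapperBase code; the names writePending and writeOperation simply mirror the discussion, and the interleaving is replayed deterministically in one thread so the effect is reproducible.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative model of the race described above. NOT Tomcat's code.
class WriteRace {
    static final Semaphore writePending = new Semaphore(1);
    static final AtomicReference<String> writeOperation = new AtomicReference<>();

    // Deterministically replay the failing interleaving. With fixed == false
    // the completion handler releases the semaphore before clearing the state
    // (the bug); with fixed == true the order is swapped (the fix).
    static String replay(boolean fixed) {
        try {
            writePending.acquire();             // T1 obtains the write semaphore
            writeOperation.set("T1-state");     // T1 sets writeOperation
            // T1's async write completes; its completion handler runs,
            // interleaved with T2 starting a new write:
            if (fixed) {
                writeOperation.set(null);       // handler clears the state first
                writePending.release();         // then releases the semaphore
                writePending.acquire();         // T2 obtains the semaphore
                writeOperation.set("T2-state"); // T2 sets writeOperation
            } else {
                writePending.release();         // handler releases the semaphore first
                writePending.acquire();         // T2 obtains the semaphore
                writeOperation.set("T2-state"); // T2 sets writeOperation
                writeOperation.set(null);       // handler now wipes T2's state!
            }
            String seenByPoller = writeOperation.get();
            writeOperation.set(null);           // tidy up for the next replay
            writePending.release();
            return seenByPoller;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Buggy order: the Poller sees writeOperation == null, performs a
        // normal dispatch, and T2's async write times out unnotified.
        System.out.println("buggy order, Poller sees: " + replay(false));
        // Fixed order: T2's state survives until its own write completes.
        System.out.println("fixed order, Poller sees: " + replay(true));
    }
}
```

With the buggy ordering the Poller observes null and T2's state is lost; with the fixed ordering it observes T2's state, which is why swapping the two statements in the completion handler resolves the hang.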
RE: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
Any tentative time line when the fix will be available in a 9.0.x release?

Thanks,
Kedar

-----Original Message-----
From: Mark Thomas
Sent: Friday, June 18, 2021 2:50 AM
To: users@tomcat.apache.org
Subject: Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

On 17/06/2021 09:26, Mark Thomas wrote:
> I think I might have found one contributing factor to this bug. I need
> to run a series of tests to determine whether I am seeing random
> variation in test results or a genuine effect.

It was random effects but I believe I have now found the bug.

Consider two threads, T1 and T2, writing HTTP/2 response bodies concurrently in the same HTTP/2 connection.

You'll need to have the code in front of you to follow what is going on.

The write:

https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1364

and the associated completion handler:

https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1044

The detail of the code is fairly complex but all you really need to keep in mind is the following:

- the writePending semaphore ensures only one thread can write at a time
- the state of the write is maintained in an OperationState instance that is stored in SocketWrapperBase.writeOperation (L1390)
- the completion handler clears this state (L1050) and releases the semaphore (L1046)

The sequence of events for a failure is as follows:

- T1 obtains the write semaphore (L1366)
- T1 creates an OperationState and sets writeOperation (L1390)
- the async write for T1 completes and the completion handler is called
- T1's completion handler releases the semaphore (L1046)
- T2 obtains the write semaphore (L1366)
- T2 creates an OperationState and sets writeOperation (L1390)
- T1's completion handler clears writeOperation (L1050)
- the async write for T2 does not complete and the socket is added to the Poller
- the Poller signals the socket is ready for write
- the Poller finds writeOperation is null so performs a normal dispatch for write
- the async write times out as it never receives the notification from the Poller

The fix is to swap the order of clearing writeOperation and releasing the semaphore. Concurrent reads will have the same problem and will be fixed by the same solution.

Fix will be applied shortly.

Mark
Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
Magic Mark,

> On 17.06.2021 at 23:20, Mark Thomas wrote:
>
> On 17/06/2021 09:26, Mark Thomas wrote:
>
>> I think I might have found one contributing factor to this bug. I need to
>> run a series of tests to determine whether I am seeing random variation in
>> test results or a genuine effect.
>
> It was random effects but I believe I have now found the bug.
>
> Consider two threads, T1 and T2 writing HTTP/2 response bodies concurrently
> in the same HTTP/2 connection.
>
> You'll need to have the code in front of you to follow what is going on.
>
> The write:
>
> https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1364
>
> and the associated completion handler
>
> https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1044
>
> The detail of the code is fairly complex but all you really need to keep in
> mind is the following:
>
> - the writePending semaphore ensures only one thread can write at a time
> - the state of the write is maintained in an OperationState instance that is
>   stored in SocketWrapperBase.writeOperation (L1390)
> - the completion handler clears this state (L1050) and releases the
>   semaphore (L1046)
>
> The sequence of events for a failure is as follows:
>
> - T1 obtains the write semaphore (L1366)
> - T1 creates an OperationState and sets writeOperation (L1390)
> - the async write for T1 completes and the completion handler is called
> - T1's completion handler releases the semaphore (L1046)
> - T2 obtains the write semaphore (L1366)
> - T2 creates an OperationState and sets writeOperation (L1390)
> - T1's completion handler clears writeOperation (L1050)
> - the async write for T2 does not complete and the socket is added to
>   the Poller
> - The Poller signals the socket is ready for write
> - The Poller finds writeOperation is null so performs a normal dispatch
>   for write
> - The async write times out as it never receives the notification from
>   the Poller
>
> The fix is to swap the order of clearing writeOperation and releasing the
> semaphore.
>
> Concurrent reads will have the same problem and will be fixed by the same
> solution.

Thread handling and synchronizing at the max! Thanks for the insight and your hard work finding this!

> Fix will be applied shortly.
>
> Mark
Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
On 17/06/2021 09:26, Mark Thomas wrote:
> I think I might have found one contributing factor to this bug. I need to run a series of tests to determine whether I am seeing random variation in test results or a genuine effect.

It was random effects but I believe I have now found the bug.

Consider two threads, T1 and T2, writing HTTP/2 response bodies concurrently in the same HTTP/2 connection.

You'll need to have the code in front of you to follow what is going on.

The write:

https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1364

and the associated completion handler:

https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1044

The detail of the code is fairly complex but all you really need to keep in mind is the following:

- the writePending semaphore ensures only one thread can write at a time
- the state of the write is maintained in an OperationState instance that is stored in SocketWrapperBase.writeOperation (L1390)
- the completion handler clears this state (L1050) and releases the semaphore (L1046)

The sequence of events for a failure is as follows:

- T1 obtains the write semaphore (L1366)
- T1 creates an OperationState and sets writeOperation (L1390)
- the async write for T1 completes and the completion handler is called
- T1's completion handler releases the semaphore (L1046)
- T2 obtains the write semaphore (L1366)
- T2 creates an OperationState and sets writeOperation (L1390)
- T1's completion handler clears writeOperation (L1050)
- the async write for T2 does not complete and the socket is added to the Poller
- the Poller signals the socket is ready for write
- the Poller finds writeOperation is null so performs a normal dispatch for write
- the async write times out as it never receives the notification from the Poller

The fix is to swap the order of clearing writeOperation and releasing the semaphore.

Concurrent reads will have the same problem and will be fixed by the same solution.

Fix will be applied shortly.

Mark
Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
On 17/06/2021 08:44, Rémy Maucherat wrote:
> On Thu, Jun 17, 2021 at 9:27 AM Mark Thomas wrote:
>> On 17/06/2021 07:56, Rémy Maucherat wrote:
>>> The main benefit is that it removes some blocking IO which is a good idea. NIO2 is worth testing with your new test, BTW.
>>
>> NIO2 works. The issue appears to be limited to the NIO connector.
>
> Interesting. At least it's another workaround then. Are you testing NIO on Java 11+?

Yes, as the test case uses the HTTP client from Java 11.

I think I might have found one contributing factor to this bug. I need to run a series of tests to determine whether I am seeing random variation in test results or a genuine effect.

Mark
Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
On Thu, Jun 17, 2021 at 9:27 AM Mark Thomas wrote:
> On 17/06/2021 07:56, Rémy Maucherat wrote:
>
>> The main benefit is that it removes some blocking IO which is a good
>> idea. NIO2 is worth testing with your new test, BTW.
>
> NIO2 works. The issue appears to be limited to the NIO connector.

Interesting. At least it's another workaround then. Are you testing NIO on Java 11+?

Rémy

> Mark
>
>> Transferring large files with HTTP/2 is a bad idea though, it's always
>> going to be inefficient compared to HTTP/1.1. And if the idea is to
>> multiplex, then it will become terrible, that's a given and it should not
>> be done. HTTP/2 is very good only if you have tons of small entities to
>> transfer (that's what sites have these days, so that's good).
>>
>> Rémy
Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
On 17/06/2021 07:56, Rémy Maucherat wrote:
> The main benefit is that it removes some blocking IO which is a good idea. NIO2 is worth testing with your new test, BTW.

NIO2 works. The issue appears to be limited to the NIO connector.

Mark

> Transferring large files with HTTP/2 is a bad idea though, it's always going to be inefficient compared to HTTP/1.1. And if the idea is to multiplex, then it will become terrible, that's a given and it should not be done. HTTP/2 is very good only if you have tons of small entities to transfer (that's what sites have these days, so that's good).
>
> Rémy
Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
On Wed, Jun 16, 2021 at 11:02 PM Mark Thomas wrote:
> On 16/06/2021 19:42, Deshmukh, Kedar wrote:
>> Thanks Mark for the quick update.
>>
>> Can you please provide how useAsyncIO="false" makes an impact in terms of performance, scalability (number of connections to the server) and reliability?
>
> Well, if you set useAsyncIO="false" it works. If you set useAsyncIO="true" it fails 90% of the time.
>
> I'd suggest that, with that failure rate, discussion of performance and scalability are somewhat irrelevant.
>
> With test cases that do not trigger this issue the difference between the two is marginal. Remy is more familiar with the details than me but from memory useAsyncIO="true" is a little better on older JREs. On modern JREs there isn't much in it. It is also likely to depend on application usage patterns, OS, etc. In short, you'd need to test the difference on your application with your hardware etc. I'd expect the difference to be hard to measure.

The main benefit is that it removes some blocking IO which is a good idea. NIO2 is worth testing with your new test, BTW.

Transferring large files with HTTP/2 is a bad idea though, it's always going to be inefficient compared to HTTP/1.1. And if the idea is to multiplex, then it will become terrible, that's a given and it should not be done. HTTP/2 is very good only if you have tons of small entities to transfer (that's what sites have these days, so that's good).

Rémy

> Mark
>
>> Regards,
>> Kedar
>>
>> -----Original Message-----
>> From: Mark Thomas
>> Sent: Wednesday, June 16, 2021 11:41 PM
>> To: users@tomcat.apache.org
>> Subject: Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
>>
>> On 16/06/2021 18:47, Rémy Maucherat wrote:
>>> On Wed, Jun 16, 2021 at 7:36 PM Mark Thomas wrote:
>>>> On 16/06/2021 18:01, Deshmukh, Kedar wrote:
>>>>> I have one additional question at this point. How easy is this issue to reproduce? Does it happen every time? In 10% of requests? 1%?
>>>>>
>>>>> [Kedar] It is reproducible 9/10 times in my environment. So 90% of the time it is reproducible when concurrency is 5 or more and file sizes are between 1GB-5GB.
>>>>
>>>> Thanks for the confirmation. I have converted your test classes into a Tomcat unit test (easy for me to work with) and the issue looks to be repeatable on Linux with the latest 10.1.x code.
>>>>
>>>> I'm starting to look at this now. I'll post again when I have more information. I'm not expecting this to be quick.
>>
>> Kedar,
>>
>> If you set useAsyncIO="false" on the Connector that should work around the problem for now. The Servlet Async API will still be available. Tomcat just uses a different code path to write to the network.
>>
>>> I did not expect it would be so easy to reproduce. Can you commit the test?
>>
>> It is a bit of a hack at the moment. The code isn't particularly clean and I have hard-coded some file paths for my system (I have a bunch of 5GB Windows MSDN ISOs I am using for the large files). I also don't think we want test cases that use multi-GB files running on every test run.
>>
>> If I clean things up a bit, parameterise the hard-coded path bits and disable the test by default it should be in a reasonable state to commit.
>>
>> It looks very much like the vectoredOperation and the associated semaphore is where things are going wrong.
>>
>> I'm aiming to work on this some more tomorrow.
>>
>> Mark
RE: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
You are right. Actually, my concern is: if useAsyncIO="true" gives better results in general, would it be a good idea to wait for the fix, as we are still assessing the HTTP/2 behaviour in the context of our application?

Thanks,
Kedar

-----Original Message-----
From: Mark Thomas
Sent: Thursday, June 17, 2021 2:32 AM
To: users@tomcat.apache.org
Subject: Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

On 16/06/2021 19:42, Deshmukh, Kedar wrote:
> Thanks Mark for the quick update.
>
> Can you please provide how useAsyncIO="false" makes an impact in terms of performance, scalability (number of connections to the server) and reliability?

Well, if you set useAsyncIO="false" it works. If you set useAsyncIO="true" it fails 90% of the time.

I'd suggest that, with that failure rate, discussion of performance and scalability are somewhat irrelevant.

With test cases that do not trigger this issue the difference between the two is marginal. Remy is more familiar with the details than me but from memory useAsyncIO="true" is a little better on older JREs. On modern JREs there isn't much in it. It is also likely to depend on application usage patterns, OS, etc. In short, you'd need to test the difference on your application with your hardware etc. I'd expect the difference to be hard to measure.

Mark

> Regards,
> Kedar
>
> -----Original Message-----
> From: Mark Thomas
> Sent: Wednesday, June 16, 2021 11:41 PM
> To: users@tomcat.apache.org
> Subject: Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
>
> On 16/06/2021 18:47, Rémy Maucherat wrote:
>> On Wed, Jun 16, 2021 at 7:36 PM Mark Thomas wrote:
>>> On 16/06/2021 18:01, Deshmukh, Kedar wrote:
>>>> I have one additional question at this point. How easy is this issue to reproduce? Does it happen every time? In 10% of requests? 1%?
>>>>
>>>> [Kedar] It is reproducible 9/10 times in my environment. So 90% of the time it is reproducible when concurrency is 5 or more and file sizes are between 1GB-5GB.
>>>
>>> Thanks for the confirmation. I have converted your test classes into a Tomcat unit test (easy for me to work with) and the issue looks to be repeatable on Linux with the latest 10.1.x code.
>>>
>>> I'm starting to look at this now. I'll post again when I have more information. I'm not expecting this to be quick.
>
> Kedar,
>
> If you set useAsyncIO="false" on the Connector that should work around the problem for now. The Servlet Async API will still be available. Tomcat just uses a different code path to write to the network.
>
>> I did not expect it would be so easy to reproduce. Can you commit the test?
>
> It is a bit of a hack at the moment. The code isn't particularly clean and I have hard-coded some file paths for my system (I have a bunch of 5GB Windows MSDN ISOs I am using for the large files). I also don't think we want test cases that use multi-GB files running on every test run.
>
> If I clean things up a bit, parameterise the hard-coded path bits and disable the test by default it should be in a reasonable state to commit.
>
> It looks very much like the vectoredOperation and the associated semaphore is where things are going wrong.
>
> I'm aiming to work on this some more tomorrow.
>
> Mark
Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
On 16/06/2021 19:42, Deshmukh, Kedar wrote:
> Thanks Mark for the quick update.
>
> Can you please provide how useAsyncIO="false" makes an impact in terms of performance, scalability (number of connections to the server) and reliability?

Well, if you set useAsyncIO="false" it works. If you set useAsyncIO="true" it fails 90% of the time.

I'd suggest that, with that failure rate, discussion of performance and scalability are somewhat irrelevant.

With test cases that do not trigger this issue the difference between the two is marginal. Remy is more familiar with the details than me but from memory useAsyncIO="true" is a little better on older JREs. On modern JREs there isn't much in it. It is also likely to depend on application usage patterns, OS, etc. In short, you'd need to test the difference on your application with your hardware etc. I'd expect the difference to be hard to measure.

Mark

> Regards,
> Kedar
>
> -----Original Message-----
> From: Mark Thomas
> Sent: Wednesday, June 16, 2021 11:41 PM
> To: users@tomcat.apache.org
> Subject: Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
>
> On 16/06/2021 18:47, Rémy Maucherat wrote:
>> On Wed, Jun 16, 2021 at 7:36 PM Mark Thomas wrote:
>>> On 16/06/2021 18:01, Deshmukh, Kedar wrote:
>>>> I have one additional question at this point. How easy is this issue to reproduce? Does it happen every time? In 10% of requests? 1%?
>>>>
>>>> [Kedar] It is reproducible 9/10 times in my environment. So 90% of the time it is reproducible when concurrency is 5 or more and file sizes are between 1GB-5GB.
>>>
>>> Thanks for the confirmation. I have converted your test classes into a Tomcat unit test (easy for me to work with) and the issue looks to be repeatable on Linux with the latest 10.1.x code.
>>>
>>> I'm starting to look at this now. I'll post again when I have more information. I'm not expecting this to be quick.
>
> Kedar,
>
> If you set useAsyncIO="false" on the Connector that should work around the problem for now. The Servlet Async API will still be available. Tomcat just uses a different code path to write to the network.
>
>> I did not expect it would be so easy to reproduce. Can you commit the test?
>
> It is a bit of a hack at the moment. The code isn't particularly clean and I have hard-coded some file paths for my system (I have a bunch of 5GB Windows MSDN ISOs I am using for the large files). I also don't think we want test cases that use multi-GB files running on every test run.
>
> If I clean things up a bit, parameterise the hard-coded path bits and disable the test by default it should be in a reasonable state to commit.
>
> It looks very much like the vectoredOperation and the associated semaphore is where things are going wrong.
>
> I'm aiming to work on this some more tomorrow.
>
> Mark
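[Editor's note] For reference, the useAsyncIO workaround is applied on the HTTP/2-enabled Connector in server.xml. A minimal sketch follows; the port, thread count, and keystore details are placeholders for whatever the real configuration uses, and only the useAsyncIO="false" attribute is the workaround itself:

```xml
<!-- server.xml: NIO connector with HTTP/2 enabled and async IO disabled
     as a temporary workaround. Port/keystore values are placeholders. -->
<Connector port="8443"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="150"
           SSLEnabled="true"
           useAsyncIO="false">
    <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol" />
    <SSLHostConfig>
        <Certificate certificateKeystoreFile="conf/localhost.jks"
                     certificateKeystorePassword="changeit" />
    </SSLHostConfig>
</Connector>
```

As noted above, this only changes the code path Tomcat uses to write to the network; the Servlet Async API remains available to applications.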
Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
On 16/06/2021 21:52, Christopher Schultz wrote:
> Mark,
>
> On 6/16/21 14:10, Mark Thomas wrote:
>> On 16/06/2021 18:47, Rémy Maucherat wrote:
>>> On Wed, Jun 16, 2021 at 7:36 PM Mark Thomas wrote:
>>>> On 16/06/2021 18:01, Deshmukh, Kedar wrote:
>>>>> I have one additional question at this point. How easy is this issue to reproduce? Does it happen every time? In 10% of requests? 1%?
>>>>>
>>>>> [Kedar] It is reproducible 9/10 times in my environment. So 90% of the time it is reproducible when concurrency is 5 or more and file sizes are between 1GB-5GB.
>>>>
>>>> Thanks for the confirmation. I have converted your test classes into a Tomcat unit test (easy for me to work with) and the issue looks to be repeatable on Linux with the latest 10.1.x code.
>>>>
>>>> I'm starting to look at this now. I'll post again when I have more information. I'm not expecting this to be quick.
>>
>> Kedar,
>>
>> If you set useAsyncIO="false" on the Connector that should work around the problem for now. The Servlet Async API will still be available. Tomcat just uses a different code path to write to the network.
>>
>>> I did not expect it would be so easy to reproduce. Can you commit the test?
>>
>> It is a bit of a hack at the moment. The code isn't particularly clean and I have hard-coded some file paths for my system (I have a bunch of 5GB Windows MSDN ISOs I am using for the large files). I also don't think we want test cases that use multi-GB files running on every test run.
>>
>> If I clean things up a bit, parameterise the hard-coded path bits and disable the test by default it should be in a reasonable state to commit.
>>
>> It looks very much like the vectoredOperation and the associated semaphore is where things are going wrong.
>>
>> I'm aiming to work on this some more tomorrow.
>
> Is it inadvisable to use a trivial JSP or Servlet that just generates X bytes? Or does this require the use of the DefaultServlet at the moment?

Right now, I don't know.

> Is it still possible to reproduce using smaller window sizes and smaller total resource sizes? It would be nice if a unit test didn't have to transfer 5GiB even through the loopback interface.

I may have a better answer after I do some further investigation tomorrow.

Mark
Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
Mark,

On 6/16/21 14:10, Mark Thomas wrote:
> On 16/06/2021 18:47, Rémy Maucherat wrote:
>> On Wed, Jun 16, 2021 at 7:36 PM Mark Thomas wrote:
>>> On 16/06/2021 18:01, Deshmukh, Kedar wrote:
>>>> I have one additional question at this point. How easy is this issue to reproduce? Does it happen every time? In 10% of requests? 1%?
>>>>
>>>> [Kedar] It is reproducible 9/10 times in my environment. So 90% of the time it is reproducible when concurrency is 5 or more and file sizes are between 1GB-5GB.
>>>
>>> Thanks for the confirmation. I have converted your test classes into a Tomcat unit test (easy for me to work with) and the issue looks to be repeatable on Linux with the latest 10.1.x code.
>>>
>>> I'm starting to look at this now. I'll post again when I have more information. I'm not expecting this to be quick.
>
> Kedar,
>
> If you set useAsyncIO="false" on the Connector that should work around the problem for now. The Servlet Async API will still be available. Tomcat just uses a different code path to write to the network.
>
>> I did not expect it would be so easy to reproduce. Can you commit the test?
>
> It is a bit of a hack at the moment. The code isn't particularly clean and I have hard-coded some file paths for my system (I have a bunch of 5GB Windows MSDN ISOs I am using for the large files). I also don't think we want test cases that use multi-GB files running on every test run.
>
> If I clean things up a bit, parameterise the hard-coded path bits and disable the test by default it should be in a reasonable state to commit.
>
> It looks very much like the vectoredOperation and the associated semaphore is where things are going wrong.
>
> I'm aiming to work on this some more tomorrow.

Is it inadvisable to use a trivial JSP or Servlet that just generates X bytes? Or does this require the use of the DefaultServlet at the moment?

Is it still possible to reproduce using smaller window sizes and smaller total resource sizes? It would be nice if a unit test didn't have to transfer 5GiB even through the loopback interface.

-chris
RE: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
Thanks Mark for the quick update. Can you please provide how useAsyncIO="false" makes impact in terms of performance, scalability (number of connections to the server) and reliability ? Regards, Kedar -Original Message- From: Mark Thomas Sent: Wednesday, June 16, 2021 11:41 PM To: users@tomcat.apache.org Subject: Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client) On 16/06/2021 18:47, Rémy Maucherat wrote: > On Wed, Jun 16, 2021 at 7:36 PM Mark Thomas wrote: > >> On 16/06/2021 18:01, Deshmukh, Kedar wrote: >> >>> I have one additional question at this point. How easy is this issue >>> to >> reproduce? Does it happen every time? In 10% of requests? 1% ? >>> >>> [Kedar] It is reproducible 9/10 times in my environment. So 90% time >>> it >> is reproducible when concurrency is 5 or more and file sizes are >> between 1GB-5GB. >> >> Thanks for the confirmation. I have converted your test classes into >> a Tomcat unit test (easy for me to work with) and the issue looks to >> be repeatable on Linux with the latest 10.1.x code. >> >> I'm starting to look at this now. I'll post again when I have more >> information. I'm not expecting this to be quick. Kedar, If you set useAsyncIO="false" on the Connector that should work around the problem for now. The Servlet Async API will still be available. Tomcat just uses a different code path to write to the network. > I did not expect it would be so easy to reproduce. Can you commit the test ? It is a bit of a hack at the moment. The code isn't particularly clean and I have hard-coded some file paths for my system (I have a bunch of 5GB Windows MSDN ISOs I am using for the large files. I also don't think we want test cases that using multi-GB files running on every test run. If I clean things up a bit, parameterise the hard-coded paths bits and disable the test by default it should be in a reasonable state to commit. 
It looks very much like the vectoredOperation and the associated semaphore are where things are going wrong. I'm aiming to work on this some more tomorrow.

Mark
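The useAsyncIO="false" workaround Mark describes above is a Connector attribute in server.xml. A minimal sketch (the port, protocol and TLS details here are illustrative placeholders; only useAsyncIO="false" is the workaround itself):

```xml
<!-- NIO connector with the async I/O write path disabled. HTTP/2 and the
     Servlet Async API keep working; Tomcat just uses a different (non-async)
     code path to write to the network. Keystore values are placeholders. -->
<Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
           SSLEnabled="true" useAsyncIO="false">
    <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol" />
    <SSLHostConfig>
        <Certificate certificateKeystoreFile="conf/keystore.p12"
                     certificateKeystorePassword="changeit" />
    </SSLHostConfig>
</Connector>
```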
Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
On 16/06/2021 18:47, Rémy Maucherat wrote: On Wed, Jun 16, 2021 at 7:36 PM Mark Thomas wrote: On 16/06/2021 18:01, Deshmukh, Kedar wrote: I have one additional question at this point. How easy is this issue to reproduce? Does it happen every time? In 10% of requests? 1%?

[Kedar] It is reproducible 9/10 times in my environment. So 90% of the time it is reproducible when concurrency is 5 or more and file sizes are between 1GB-5GB.

Thanks for the confirmation. I have converted your test classes into a Tomcat unit test (easy for me to work with) and the issue looks to be repeatable on Linux with the latest 10.1.x code. I'm starting to look at this now. I'll post again when I have more information. I'm not expecting this to be quick.

Kedar, if you set useAsyncIO="false" on the Connector that should work around the problem for now. The Servlet Async API will still be available. Tomcat just uses a different code path to write to the network.

I did not expect it would be so easy to reproduce. Can you commit the test?

It is a bit of a hack at the moment. The code isn't particularly clean and I have hard-coded some file paths for my system (I have a bunch of 5GB Windows MSDN ISOs I am using for the large files). I also don't think we want test cases that use multi-GB files running on every test run. If I clean things up a bit, parameterise the hard-coded paths and disable the test by default it should be in a reasonable state to commit. It looks very much like the vectoredOperation and the associated semaphore are where things are going wrong. I'm aiming to work on this some more tomorrow.

Mark
Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
On Wed, Jun 16, 2021 at 7:36 PM Mark Thomas wrote:
> On 16/06/2021 18:01, Deshmukh, Kedar wrote:
> > I have one additional question at this point. How easy is this issue to reproduce? Does it happen every time? In 10% of requests? 1%?
> >
> > [Kedar] It is reproducible 9/10 times in my environment. So 90% of the time it is reproducible when concurrency is 5 or more and file sizes are between 1GB-5GB.
>
> Thanks for the confirmation. I have converted your test classes into a Tomcat unit test (easy for me to work with) and the issue looks to be repeatable on Linux with the latest 10.1.x code.
>
> I'm starting to look at this now. I'll post again when I have more information. I'm not expecting this to be quick.

I did not expect it would be so easy to reproduce. Can you commit the test?

Rémy
Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
On 16/06/2021 18:01, Deshmukh, Kedar wrote: I have one additional question at this point. How easy is this issue to reproduce? Does it happen every time? In 10% of requests? 1%?

[Kedar] It is reproducible 9/10 times in my environment. So 90% of the time it is reproducible when concurrency is 5 or more and file sizes are between 1GB-5GB.

Thanks for the confirmation. I have converted your test classes into a Tomcat unit test (easy for me to work with) and the issue looks to be repeatable on Linux with the latest 10.1.x code. I'm starting to look at this now. I'll post again when I have more information. I'm not expecting this to be quick.

Mark
RE: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
-Original Message- From: Mark Thomas Sent: Wednesday, June 16, 2021 9:29 PM To: users@tomcat.apache.org Subject: Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

On 16/06/2021 15:05, Deshmukh, Kedar wrote:
> Dear Tomcat users/dev team,
>
> We are evaluating the impact of HTTP/2 in our application, as HTTP/2 provides better throughput and performance.

I'd be wary of making such sweeping statements. HTTP/2 has some advantages and some disadvantages. Generally, the advantages will outweigh the disadvantages but that will not always be the case.

[Kedar] - Okay.

> Before directly tuning HTTP/2 in the application, we thought of analyzing certain use cases which our application demands in a standalone environment.
>
> Our use case is very simple. A Java based standalone client makes a simple POST request to the server; the server reads the file name from the request and pushes the requested file to the client. Here, the client can request multiple files at the same time by sending multiple requests concurrently, so the server should be able to send multiple files concurrently depending on the configured concurrency level.
>
> Currently, in this test only a single client is making requests to the server with concurrency 5. The server is not overloaded and is not performing any other tasks. The machine has more than 500GB of empty space and is not running any heavy applications.
>
> Test:
>
> We used different sets of files for this test: files with sizes between 1GB - 5GB and concurrency > 5. We are using the traditional connector protocol HTTP11NIOProtocol with HTTP/2 turned on.
>
> Observations:
>
> HTTP/1.1 - With HTTP/1.1 the given sample code works fine. The only drawback here is that it opens multiple TCP connections to satisfy HTTP/1.1.
>
> HTTP/2 - With HTTP/2, there is expected to be only one TCP connection with multiple streams handling the traffic. Tomcat HTTP/2 debug logs suggest that only one connection is being used and multiple streams are spawned as expected.
> So far everything is fine. But the sample code does not work consistently with higher concurrency (> 3). We captured the stack trace of the tomcat process which is attached here. A couple of tomcat threads are waiting to acquire the semaphore for the socket write operation. When the write operation is stuck the servlet is not able to push any data to the client and the client is also stuck waiting for more data. I don't see any error/exception at the client/server.

That looks / sounds like there is a code path - probably an error condition? - where the semaphore isn't released.

[Kedar] The semaphore is never released. As mentioned, the concurrency level in the sample code is 5, so if you observe the attached tomcat stack trace you will find the following four very similar threads which are waiting on a semaphore that was acquired by another thread, "Thread-9".

"Thread-7" #58 daemon prio=5 os_prio=0 cpu=218.75ms elapsed=105.35s tid=0x01844997a800 nid=0x4bc0 waiting on condition [0x0027cc7fe000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@11.0.10/Native Method)
        - parking to wait for <0x0006158518b0> (a java.util.concurrent.Semaphore$NonfairSync)
        at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.10/LockSupport.java:234)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.10/AbstractQueuedSynchronizer.java:1079)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.10/AbstractQueuedSynchronizer.java:1369)
        at java.util.concurrent.Semaphore.tryAcquire(java.base@11.0.10/Semaphore.java:415)
        at org.apache.tomcat.util.net.SocketWrapperBase.vectoredOperation(SocketWrapperBase.java:1426)

"Thread-9" #59 daemon prio=5 os_prio=0 cpu=187.50ms elapsed=105.35s tid=0x01844997b800 nid=0x4b48 in Object.wait() [0x0027cc9fe000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(java.base@11.0.10/Native Method)
        - waiting on <0x00061ed36de0> (a org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper$NioOperationState)
        at org.apache.tomcat.util.net.SocketWrapperBase.vectoredOperation(SocketWrapperBase.java:1457)

The other possibility is related to the HTTP/2 flow control windows. If something goes wrong with the management of the flow control window for the connection it would block everything.

[Kedar] - Okay. But I did not see anything in the logs or WINDOW_UPDATE frame related errors.

> streamReadTimeout and streamWriteTimeout are configured as -1 so they are infinitely waiting for the write semaphore.

That is generally a bad idea. By all means set it high but an infinite timeout is going to cause problems.
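The stalled state in the stack traces above boils down to a single-permit semaphore whose holder never releases it, combined with an effectively infinite acquire timeout. A standalone sketch of that failure shape (not Tomcat's actual code; class and method names here are made up for illustration):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class StuckWriteDemo {

    // One permit, like the write semaphore guarding socket writes in
    // SocketWrapperBase. The first writer takes it and, as in the reported
    // bug, never gives it back.
    static boolean secondWriterCanStart(long timeoutMillis) {
        Semaphore writePending = new Semaphore(1);
        writePending.acquireUninterruptibly(); // "Thread-9" analogue: holds the permit

        // "Thread-7" analogue: blocks in tryAcquire. With a finite timeout it
        // can at least give up and report an error; with streamWriteTimeout=-1
        // the real code keeps waiting, which is the parked TIMED_WAITING state
        // visible in the stack trace.
        try {
            return writePending.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // prints "second writer can start: false" - the permit is never released
        System.out.println("second writer can start: " + secondWriterCanStart(100));
    }
}
```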
Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)
On 16/06/2021 15:05, Deshmukh, Kedar wrote: Dear Tomcat users/dev team, We are evaluating the impact of HTTP/2 in our application, as HTTP/2 provides better throughput and performance.

I'd be wary of making such sweeping statements. HTTP/2 has some advantages and some disadvantages. Generally, the advantages will outweigh the disadvantages but that will not always be the case.

Before directly tuning HTTP/2 in the application, we thought of analyzing certain use cases which our application demands in a standalone environment. Our use case is very simple. A Java based standalone client makes a simple POST request to the server; the server reads the file name from the request and pushes the requested file to the client. Here, the client can request multiple files at the same time by sending multiple requests concurrently, so the server should be able to send multiple files concurrently depending on the configured concurrency level. Currently, in this test only a single client is making requests to the server with concurrency 5. The server is not overloaded and is not performing any other tasks. The machine has more than 500GB of empty space and is not running any heavy applications.

Test: We used different sets of files for this test: files with sizes between 1GB – 5GB and concurrency > 5. We are using the traditional connector protocol HTTP11NIOProtocol with HTTP/2 turned on.

Observations:

HTTP/1.1 - With HTTP/1.1 the given sample code works fine. The only drawback here is that it opens multiple TCP connections to satisfy HTTP/1.1.

HTTP/2 - With HTTP/2, there is expected to be only one TCP connection with multiple streams handling the traffic. Tomcat HTTP/2 debug logs suggest that only one connection is being used and multiple streams are spawned as expected. So far everything is fine. But the sample code does not work consistently with higher concurrency (> 3). We captured the stack trace of the tomcat process which is attached here. A couple of tomcat threads are waiting to acquire the semaphore for the socket write operation.
When the write operation is stuck the servlet is not able to push any data to the client and the client is also stuck waiting for more data. I don't see any error/exception at the client/server.

That looks / sounds like there is a code path - probably an error condition? - where the semaphore isn't released. The other possibility is related to the HTTP/2 flow control windows. If something goes wrong with the management of the flow control window for the connection it would block everything.

streamReadTimeout and streamWriteTimeout are configured as -1 so they are infinitely waiting for the write semaphore.

That is generally a bad idea. By all means set it high but an infinite timeout is going to cause problems - particularly if clients just drop off the network.

The outcome of this is that the client is able to receive only partial data from the server and at some point the server gets stuck and cannot send any more data. We also tried IOUtils file-transfer-related APIs; it didn't help. I have also tried with async non-blocking IO but the observations are the same.

Generally, the simpler you keep the test case, the easier it is for us to work with. Non-async and no external IO libraries is better.

Our actual requirement is very similar, where a Java based HTTP client would request bulk data concurrently from the server and the server should push that without any trouble. But it is not limited to files only. The server can push serialized bulk Java objects over the stream concurrently.

The content type should not make any difference to Tomcat. Static files vs dynamic content would make a difference.

Note that the sample code works fine most of the time if I enable HTTP/2 logs either in the client or tomcat. So I would suggest not turning on HTTP/2 debug logs before concluding anything.

That suggests a timing issue of some sort.
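Following Mark's advice that a high-but-finite timeout beats an infinite one, the stream timeouts live on the Http2Protocol upgrade element in server.xml. A sketch (values are in milliseconds; the five-minute figures are illustrative, not a recommendation from this thread):

```xml
<!-- Finite stream timeouts instead of -1: a stalled stream eventually
     errors out rather than parking its thread forever, e.g. when a
     client silently drops off the network. -->
<UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol"
                 streamReadTimeout="300000"
                 streamWriteTimeout="300000" />
```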
HTTP/2 is significantly more complex than HTTP/1.1 because you have multiple independent application threads all trying to write to the same socket, and Tomcat has to track and allocate flow control window allocations both for individual streams and for the overall connection.

The following components are used in the sample code for the test:

1. Client - Java 11.0.10 httpclient - (client\Client.java)
2. Server - Tomcat 9.0.46
3. Servlet - AsyncServlet - (server\Server.java)
4. Operating system - Windows 10
5. Machine specifications - 32GB RAM and 500GB open space
6. Latency - None, client and server are running on the same machine
7. Set of files - You can use any random files whose sizes are between 1GB-5GB to reproduce the issue.

Refer to the attachment for: 1. Client side code 2. Server side servlet 3. server.xml 4. Tomcat stack trace 5. Tomcat server logs

Thanks. That is all useful information.

Could you please go through the sample code along with server.xml. Here are my few questions:

1. Why is HTTP/2 failing for such a use case where large files are concurrently pushed to the client? I believe this is a very common use case must have be
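The per-stream plus per-connection window accounting Mark mentions above can be sketched as a toy ledger (a simplified illustration only, not Tomcat's implementation; the class and method names are made up):

```java
import java.util.HashMap;
import java.util.Map;

// Toy HTTP/2 flow-control ledger: every DATA frame spends bytes from both
// the sending stream's window and the shared connection window.
public class FlowControl {
    private int connectionWindow;
    private final Map<Integer, Integer> streamWindows = new HashMap<>();

    public FlowControl(int initialConnectionWindow) {
        this.connectionWindow = initialConnectionWindow;
    }

    public void openStream(int streamId, int initialStreamWindow) {
        streamWindows.put(streamId, initialStreamWindow);
    }

    // A writer may send at most min(requested, its stream's window, the
    // connection window). A result of 0 means it must block until a
    // WINDOW_UPDATE arrives - which is why mismanaging the connection
    // window stalls every stream on the connection, not just one.
    public synchronized int allocate(int streamId, int requested) {
        int allowed = Math.min(requested,
                Math.min(streamWindows.get(streamId), connectionWindow));
        connectionWindow -= allowed;
        streamWindows.merge(streamId, -allowed, Integer::sum);
        return allowed;
    }

    // WINDOW_UPDATE frame credited against the whole connection.
    public synchronized void connectionWindowUpdate(int increment) {
        connectionWindow += increment;
    }

    public static void main(String[] args) {
        FlowControl fc = new FlowControl(100);  // tiny window for illustration
        fc.openStream(1, 65535);
        System.out.println(fc.allocate(1, 200)); // prints 100 (capped by connection window)
        System.out.println(fc.allocate(1, 10));  // prints 0 (connection window exhausted)
        fc.connectionWindowUpdate(50);
        System.out.println(fc.allocate(1, 10));  // prints 10
    }
}
```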