Is there any tentative timeline for when the fix will be available in a 9.0.x release?

Thanks,
Kedar
-----Original Message-----
From: Mark Thomas <ma...@apache.org>
Sent: Friday, June 18, 2021 2:50 AM
To: users@tomcat.apache.org
Subject: Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

On 17/06/2021 09:26, Mark Thomas wrote:
> I think I might have found one contributing factor to this bug. I need
> to run a series of tests to determine whether I am seeing random
> variation in test results or a genuine effect.

It was random effects, but I believe I have now found the bug.

Consider two threads, T1 and T2, writing HTTP/2 response bodies concurrently in the same HTTP/2 connection.

You'll need to have the code in front of you to follow what is going on.

The write:
https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1364

and the associated completion handler:
https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1044

The detail of the code is fairly complex, but all you really need to keep in mind is the following:

- the writePending semaphore ensures only one thread can write at a time
- the state of the write is maintained in an OperationState instance that is stored in SocketWrapperBase.writeOperation (L1390)
- the completion handler clears this state (L1050) and releases the semaphore (L1046)

The sequence of events for a failure is as follows:

- T1 obtains the write semaphore (L1366)
- T1 creates an OperationState and sets writeOperation (L1390)
- the async write for T1 completes and the completion handler is called
- T1's completion handler releases the semaphore (L1046)
- T2 obtains the write semaphore (L1366)
- T2 creates an OperationState and sets writeOperation (L1390)
- T1's completion handler clears writeOperation (L1050)
- the async write for T2 does not complete and the socket is added to the Poller
- the Poller signals that the socket is ready for write
- the Poller finds writeOperation is null, so it performs a normal dispatch for write
- the async write times out, as it never receives the notification from the Poller

The fix is to swap the order of clearing writeOperation and releasing the semaphore. Concurrent reads have the same problem and will be fixed by the same solution.

Fix will be applied shortly.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
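For readers without the Tomcat source to hand, the failure sequence Mark describes can be modelled in a few lines. This is a hypothetical, heavily simplified sketch, not Tomcat's `SocketWrapperBase`: only the names `writePending`, `writeOperation` and `OperationState` are borrowed from the thread, while `startWrite`, `clearWriteOperation`, `releaseSemaphore`, `pollerSignalsWriteReady` and `replay` are invented for illustration. Instead of racing real threads (which would be nondeterministic), it replays the problematic interleaving step by step on one thread and compares the original "release, then clear" ordering with the swapped "clear, then release" ordering.

```java
import java.util.concurrent.Semaphore;

/**
 * Hypothetical, heavily simplified model of the race described above.
 * It is NOT Tomcat's SocketWrapperBase; only the names writePending,
 * writeOperation and OperationState are borrowed from the thread.
 */
public class WriteRaceModel {

    /** Stand-in for Tomcat's OperationState: just records which thread owns the write. */
    static final class OperationState {
        final String owner;
        OperationState(String owner) { this.owner = owner; }
    }

    private final Semaphore writePending = new Semaphore(1);
    private volatile OperationState writeOperation;

    /** A thread tries to obtain the write semaphore and, if it succeeds, records its write state. */
    boolean startWrite(String owner) {
        if (!writePending.tryAcquire()) {
            return false; // another write still holds the semaphore
        }
        writeOperation = new OperationState(owner);
        return true;
    }

    /** One half of the completion handler's clean-up. */
    void clearWriteOperation() { writeOperation = null; }

    /** The other half of the completion handler's clean-up. */
    void releaseSemaphore() { writePending.release(); }

    /** Poller signals write-ready: returns the owner it can notify, or null for a "normal dispatch". */
    String pollerSignalsWriteReady() {
        OperationState state = writeOperation;
        return state == null ? null : state.owner;
    }

    /**
     * Replays the failure interleaving on a single thread so the result is deterministic.
     * clearBeforeRelease=false models the original ordering; true models the swapped ordering.
     */
    static String replay(boolean clearBeforeRelease) {
        WriteRaceModel m = new WriteRaceModel();
        m.startWrite("T1"); // T1 obtains the semaphore and sets writeOperation

        // T1's async write completes; its completion handler runs its first step.
        if (clearBeforeRelease) {
            m.clearWriteOperation();
        } else {
            m.releaseSemaphore();
        }

        // T2 tries to start its write at exactly this point.
        boolean t2Started = m.startWrite("T2");

        // T1's completion handler runs its second step.
        if (clearBeforeRelease) {
            m.releaseSemaphore();
        } else {
            m.clearWriteOperation(); // with the original ordering this wipes T2's state
        }

        // With the swapped ordering T2 could not get in early; it starts once the semaphore is free.
        if (!t2Started) {
            m.startWrite("T2");
        }

        // T2's write does not complete immediately; the Poller later signals write-ready.
        return m.pollerSignalsWriteReady();
    }

    public static void main(String[] args) {
        System.out.println("release then clear: " + replay(false)); // prints "release then clear: null"
        System.out.println("clear then release: " + replay(true));  // prints "clear then release: T2"
    }
}
```

With the original ordering, the Poller finds no pending write state, which corresponds to the "normal dispatch" that leaves T2's write waiting until it times out. With the swapped ordering, T2 cannot obtain the semaphore until T1's state has already been cleared, so T2's own state survives and the Poller can notify it. The same reasoning would apply to the read path mentioned above.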