Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-28 Thread Mark Thomas

On 28/06/2021 06:14, Deshmukh, Kedar wrote:

Any tentative timeline for when the fix will be available in a 9.0.x release?


Releases are typically made every month. The release usually happens 
some time in the second week of the month. The July releases currently 
look like they will be comparatively early - possibly as soon as the end 
of this week.


If you want to track release progress then I'd recommend following the 
dev@ list.


Mark




Thanks,
Kedar

-Original Message-
From: Mark Thomas 
Sent: Friday, June 18, 2021 2:50 AM
To: users@tomcat.apache.org
Subject: Re: Trouble with HTTP/2 during concurrent bulk data transfer (server 
-> client)

On 17/06/2021 09:26, Mark Thomas wrote:


I think I might have found one contributing factor to this bug. I need
to run a series of tests to determine whether I am seeing random
variation in test results or a genuine effect.


It was random effects but I believe I have now found the bug.

Consider two threads, T1 and T2 writing HTTP/2 response bodies concurrently in 
the same HTTP/2 Connection.

You'll need to have the code in front of you to follow what is going on

The write:

https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1364

and the associated completion handler

https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1044


The detail of the code is fairly complex but all you really need to keep in 
mind is the following:

- the writePending semaphore ensures only one thread can write at a time

- the state of the write is maintained in an OperationState instance that is 
stored in SocketWrapperBase.writeOperation (L1390)

- the completion handler clears this state (L1050) and releases the
semaphore (L1046)


The sequence of events for a failure is as follows:

- T1 obtains the write semaphore (L1366)
- T1 creates an OperationState and sets writeOperation (L1390)
- the async write for T1 completes and the completion handler is called
- T1's completion handler releases the semaphore (L1046)
- T2 obtains the write semaphore (L1366)
- T2 creates an OperationState and sets writeOperation (L1390)
- T1's completion handler clears writeOperation (L1050)
- the async write for T2 does not complete and the socket is added to
the Poller
- The Poller signals the socket is ready for write
- The Poller finds writeOperation is null so performs a normal dispatch
for write
- The async write times out as it never receives the notification from
the Poller

The fix is to swap the order of clearing writeOperation and releasing the 
semaphore.
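
For illustration only, here is a minimal sketch of that ordering problem and 
the fix. This is not the actual SocketWrapperBase code - the class and field 
names below are simplified stand-ins for the writePending semaphore and the 
writeOperation state described above.

import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicReference;

class WriteGateSketch {
    private final Semaphore writePending = new Semaphore(1);
    private final AtomicReference<Object> writeOperation = new AtomicReference<>();

    void startWrite(Object operationState) throws InterruptedException {
        writePending.acquire();              // only one in-flight write per socket
        writeOperation.set(operationState);  // the Poller uses this to route the completion
    }

    // Buggy order: between release() and set(null) another thread can acquire
    // the semaphore and install its own writeOperation, which is then cleared.
    void completeBuggy() {
        writePending.release();
        writeOperation.set(null);
    }

    // Fixed order: clear the state first, then let the next writer in.
    void completeFixed() {
        writeOperation.set(null);
        writePending.release();
    }
}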

Concurrent reads will have the same problem and will be fixed by the same 
solution.

Fix will be applied shortly.

Mark





-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



RE: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-27 Thread Deshmukh, Kedar
Any tentative timeline for when the fix will be available in a 9.0.x release?

Thanks,
Kedar

-Original Message-
From: Mark Thomas  
Sent: Friday, June 18, 2021 2:50 AM
To: users@tomcat.apache.org
Subject: Re: Trouble with HTTP/2 during concurrent bulk data transfer (server 
-> client)

On 17/06/2021 09:26, Mark Thomas wrote:

> I think I might have found one contributing factor to this bug. I need 
> to run a series of tests to determine whether I am seeing random 
> variation in test results or a genuine effect.

It was random effects but I believe I have now found the bug.

Consider two threads, T1 and T2 writing HTTP/2 response bodies concurrently in 
the same HTTP/2 Connection.

You'll need to have the code in front of you to follow what is going on

The write:

https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1364

and the associated completion handler

https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1044


The detail of the code is fairly complex but all you really need to keep in 
mind is the following:

- the writePending semaphore ensures only one thread can write at a time

- the state of the write is maintained in an OperationState instance that is 
stored in SocketWrapperBase.writeOperation (L1390)

- the completion handler clears this state (L1050) and releases the
   semaphore (L1046)


The sequence of events for a failure is as follows:

- T1 obtains the write semaphore (L1366)
- T1 creates an OperationState and sets writeOperation (L1390)
- the async write for T1 completes and the completion handler is called
- T1's completion handler releases the semaphore (L1046)
- T2 obtains the write semaphore (L1366)
- T2 creates an OperationState and sets writeOperation (L1390)
- T1's completion handler clears writeOperation (L1050)
- the async write for T2 does not complete and the socket is added to
   the Poller
- The Poller signals the socket is ready for write
- The Poller finds writeOperation is null so performs a normal dispatch
   for write
- The async write times out as it never receives the notification from
   the Poller

The fix is to swap the order of clearing writeOperation and releasing the 
semaphore.

Concurrent reads will have the same problem and will be fixed by the same 
solution.

Fix will be applied shortly.

Mark


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-17 Thread logo
Magic Mark,

> Am 17.06.2021 um 23:20 schrieb Mark Thomas :
> 
> On 17/06/2021 09:26, Mark Thomas wrote:
> 
>> I think I might have found one contributing factor to this bug. I need to 
>> run a series of tests to determine whether I am seeing random variation in 
>> test results or a genuine effect.
> 
> It was random effects but I believe I have now found the bug.
> 
> Consider two threads, T1 and T2 writing HTTP/2 response bodies concurrently 
> in the same HTTP/2 Connection.
> 
> You'll need to have the code in front of you to follow what is going on
> 
> The write:
> 
> https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1364
> 
> and the associated completion handler
> 
> https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1044
> 
> 
> The detail of the code is fairly complex but all you really need to keep in 
> mind is the following:
> 
> - the writePending semaphore ensures only one thread can write at a time
> 
> - the state of the write is maintained in a OperationState instance that is 
> stored in SocketWrapperBase.writeOperation (L1390)
> 
> - the completion handler clears this state (L1050) and releases the
>  semaphore (L1046)
> 
> 
> The sequence of events for a failure is as follows:
> 
> - T1 obtains the write semaphore (L1366)
> - T1 creates an OperationState and sets writeOperation (L1390)
> - the async write for T1 completes and the completion handler is called
> - T1's completion handler releases the semaphore (L1046)
> - T2 obtains the write semaphore (L1366)
> - T2 creates an OperationState and sets writeOperation (L1390)
> - T1's completion handler clears writeOperation (L1050)
> - the async write for T2 does not complete and the socket is added to
>  the Poller
> - The Poller signals the socket is ready for write
> - The Poller finds writeOperation is null so performs a normal dispatch
>  for write
> - The async write times out as it never receives the notification from
>  the Poller
> 
> The fix is to swap the order of clearing writeOperation and releasing the 
> semaphore.
> 
> Concurrent reads will have the same problem and will be fixed by the same 
> solution.
> 

Thread handling and synchronizing at the max!

Thanks for the insight and your hard work finding this!


> Fix will be applied shortly.
> 
> Mark
> 


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-17 Thread Mark Thomas

On 17/06/2021 09:26, Mark Thomas wrote:

I think I might have found one contributing factor to this bug. I need 
to run a series of tests to determine whether I am seeing random 
variation in test results or a genuine effect.


It was random effects but I believe I have now found the bug.

Consider two threads, T1 and T2 writing HTTP/2 response bodies 
concurrently in the same HTTP/2 Connection.


You'll need to have the code in front of you to follow what is going on

The write:

https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1364

and the associated completion handler

https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1044


The detail of the code is fairly complex but all you really need to keep 
in mind is the following:


- the writePending semaphore ensures only one thread can write at a time

- the state of the write is maintained in an OperationState instance that 
is stored in SocketWrapperBase.writeOperation (L1390)


- the completion handler clears this state (L1050) and releases the
  semaphore (L1046)


The sequence of events for a failure is as follows:

- T1 obtains the write semaphore (L1366)
- T1 creates an OperationState and sets writeOperation (L1390)
- the async write for T1 completes and the completion handler is called
- T1's completion handler releases the semaphore (L1046)
- T2 obtains the write semaphore (L1366)
- T2 creates an OperationState and sets writeOperation (L1390)
- T1's completion handler clears writeOperation (L1050)
- the async write for T2 does not complete and the socket is added to
  the Poller
- The Poller signals the socket is ready for write
- The Poller finds writeOperation is null so performs a normal dispatch
  for write
- The async write times out as it never receives the notification from
  the Poller

The fix is to swap the order of clearing writeOperation and releasing 
the semaphore.


Concurrent reads will have the same problem and will be fixed by the 
same solution.


Fix will be applied shortly.

Mark

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-17 Thread Mark Thomas

On 17/06/2021 08:44, Rémy Maucherat wrote:

On Thu, Jun 17, 2021 at 9:27 AM Mark Thomas  wrote:


On 17/06/2021 07:56, Rémy Maucherat wrote:


The main benefit is that it removes some blocking IO which is a good

idea.

NIO2 is worth testing with your new test, BTW.


NIO2 works. The issue appears to be limited to the NIO connector.



Interesting. At least it's another workaround then. Are you testing NIO on
Java 11+ ?


Yes, as the test case uses the HTTP client from Java 11
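
For anyone wanting to reproduce something similar, a minimal sketch of that 
kind of Java 11 client is below. This is not the actual test code: the URL 
and request body are placeholders, and the TLS trust setup needed for a 
local self-signed certificate is omitted.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class H2DownloadSketch {
    public static void main(String[] args) throws Exception {
        // Ask for HTTP/2; the client falls back to HTTP/1.1 if ALPN negotiation fails.
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .build();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://localhost:8443/download"))           // placeholder URL
                .POST(HttpRequest.BodyPublishers.ofString("file=test-1.bin")) // placeholder body
                .build();
        // Stream the response body straight to disk rather than buffering it in memory.
        HttpResponse<Path> response =
                client.send(request, HttpResponse.BodyHandlers.ofFile(Path.of("test-1.bin")));
        System.out.println("HTTP " + response.statusCode() + " via " + response.version());
    }
}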

I think I might have found one contributing factor to this bug. I need 
to run a series of tests to determine whether I am seeing random 
variation in test results or a genuine effect.


Mark

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-17 Thread Rémy Maucherat
On Thu, Jun 17, 2021 at 9:27 AM Mark Thomas  wrote:

> On 17/06/2021 07:56, Rémy Maucherat wrote:
>
> > The main benefit is that it removes some blocking IO which is a good
> idea.
> > NIO2 is worth testing with your new test, BTW.
>
> NIO2 works. The issue appears to be limited to the NIO connector.
>

Interesting. At least it's another workaround then. Are you testing NIO on
Java 11+ ?

Rémy


>
> Mark
>
> > Transferring large files with HTTP/2 is a bad idea though, it's always
> > going to be inefficient compared to HTTP/1.1. And if the idea is to
> > multiplex, then it will become terrible, that's a given and it should not
> > be done. HTTP/2 is very good only if you have tons of small entities to
> > transfer (that's what sites have these days, so that's good).
> >
> > Rémy


Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-17 Thread Mark Thomas

On 17/06/2021 07:56, Rémy Maucherat wrote:


The main benefit is that it removes some blocking IO which is a good idea.
NIO2 is worth testing with your new test, BTW.


NIO2 works. The issue appears to be limited to the NIO connector.

Mark


Transferring large files with HTTP/2 is a bad idea though, it's always
going to be inefficient compared to HTTP/1.1. And if the idea is to
multiplex, then it will become terrible, that's a given and it should not
be done. HTTP/2 is very good only if you have tons of small entities to
transfer (that's what sites have these days, so that's good).

Rémy


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-17 Thread Rémy Maucherat
On Wed, Jun 16, 2021 at 11:02 PM Mark Thomas  wrote:

> On 16/06/2021 19:42, Deshmukh, Kedar wrote:
> > Thanks Mark for the quick update.
> >
> > Can you please provide how useAsyncIO="false" makes impact in terms of
> performance, scalability (number of connections to the server) and
> reliability ?
>
> Well, if you set useAsyncIO="false" it works. If you set
> useAsyncIO="true" it fails 90% of the time.
>
> I'd suggest that, with that failure rate, discussion of performance and
> scalability are somewhat irrelevant.
>
> With test cases that do not trigger this issue the difference between
> the two is marginal. Remy is more familiar with the details than me but
> from memory useAsyncIO="true" is a little better on older JREs. On
> modern JREs there isn't much in it. It is also likely to depend on
> application usage patterns, OS, etc. In short, you'd need to test the
> difference on your application with your hardware etc. I'd expect the
> difference to be hard to measure.
>

The main benefit is that it removes some blocking IO which is a good idea.
NIO2 is worth testing with your new test, BTW.

Transferring large files with HTTP/2 is a bad idea though, it's always
going to be inefficient compared to HTTP/1.1. And if the idea is to
multiplex, then it will become terrible, that's a given and it should not
be done. HTTP/2 is very good only if you have tons of small entities to
transfer (that's what sites have these days, so that's good).

Rémy


>
> Mark
>
>
> >
> > Regards,
> > Kedar
> >
> >
> > -----Original Message-
> > From: Mark Thomas 
> > Sent: Wednesday, June 16, 2021 11:41 PM
> > To: users@tomcat.apache.org
> > Subject: Re: Trouble with HTTP/2 during concurrent bulk data transfer
> (server -> client)
> >
> > On 16/06/2021 18:47, Rémy Maucherat wrote:
> >> On Wed, Jun 16, 2021 at 7:36 PM Mark Thomas  wrote:
> >>
> >>> On 16/06/2021 18:01, Deshmukh, Kedar wrote:
> >>>
> >>>> I have one additional question at this point. How easy is this issue
> >>>> to
> >>> reproduce? Does it happen every time? In 10% of requests? 1% ?
> >>>>
> >>>> [Kedar] It is reproducible 9/10 times in my environment. So 90% time
> >>>> it
> >>> is reproducible when concurrency is 5 or more and file sizes are
> >>> between 1GB-5GB.
> >>>
> >>> Thanks for the confirmation. I have converted your test classes into
> >>> a Tomcat unit test (easy for me to work with) and the issue looks to
> >>> be repeatable on Linux with the latest 10.1.x code.
> >>>
> >>> I'm starting to look at this now. I'll post again when I have more
> >>> information. I'm not expecting this to be quick.
> >
> > Kedar,
> >
> > If you set useAsyncIO="false" on the Connector that should work around
> the problem for now. The Servlet Async API will still be available.
> > Tomcat just uses a different code path to write to the network.
> >
> >> I did not expect it would be so easy to reproduce. Can you commit the
> test ?
> >
> > It is a bit of a hack at the moment. The code isn't particularly clean
> and I have hard-coded some file paths for my system (I have a bunch of 5GB
> Windows MSDN ISOs I am using for the large files. I also don't think we
> want test cases that using multi-GB files running on every test run.
> >
> > If I clean things up a bit, parameterise the hard-coded paths bits and
> disable the test by default it should be in a reasonable state to commit.
> >
> > It looks very much like the vectoredOperation and the associated
> semaphore is where things are going wrong.
> >
> > I'm aiming to work on this some more tomorrow.
> >
> > Mark
> >
>
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
>
>


RE: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-16 Thread Deshmukh, Kedar
You are right. Actually, my concern is: if useAsyncIO="true" gives better 
results in general, would it be a good idea to wait for the fix, as we are 
still assessing the HTTP/2 behavior in the context of our application?

Thanks,
Kedar

-Original Message-
From: Mark Thomas  
Sent: Thursday, June 17, 2021 2:32 AM
To: users@tomcat.apache.org
Subject: Re: Trouble with HTTP/2 during concurrent bulk data transfer (server 
-> client)

On 16/06/2021 19:42, Deshmukh, Kedar wrote:
> Thanks Mark for the quick update.
> 
> Can you please provide how useAsyncIO="false" makes impact in terms of 
> performance, scalability (number of connections to the server) and 
> reliability ?

Well, if you set useAsyncIO="false" it works. If you set useAsyncIO="true" it 
fails 90% of the time.

I'd suggest that, with that failure rate, discussion of performance and 
scalability are somewhat irrelevant.

With test cases that do not trigger this issue the difference between the two 
is marginal. Remy is more familiar with the details than me but from memory 
useAsyncIO="true" is a little better on older JREs. On modern JREs there isn't 
much in it. It is also likely to depend on application usage patterns, OS, etc. 
In short, you'd need to test the difference on your application with your 
hardware etc. I'd expect the difference to be hard to measure.

Mark


> 
> Regards,
> Kedar
> 
> 
> -Original Message-
> From: Mark Thomas 
> Sent: Wednesday, June 16, 2021 11:41 PM
> To: users@tomcat.apache.org
> Subject: Re: Trouble with HTTP/2 during concurrent bulk data transfer 
> (server -> client)
> 
> On 16/06/2021 18:47, Rémy Maucherat wrote:
>> On Wed, Jun 16, 2021 at 7:36 PM Mark Thomas  wrote:
>>
>>> On 16/06/2021 18:01, Deshmukh, Kedar wrote:
>>>
>>>> I have one additional question at this point. How easy is this 
>>>> issue to
>>> reproduce? Does it happen every time? In 10% of requests? 1% ?
>>>>
>>>> [Kedar] It is reproducible 9/10 times in my environment. So 90% 
>>>> time it
>>> is reproducible when concurrency is 5 or more and file sizes are 
>>> between 1GB-5GB.
>>>
>>> Thanks for the confirmation. I have converted your test classes into 
>>> a Tomcat unit test (easy for me to work with) and the issue looks to 
>>> be repeatable on Linux with the latest 10.1.x code.
>>>
>>> I'm starting to look at this now. I'll post again when I have more 
>>> information. I'm not expecting this to be quick.
> 
> Kedar,
> 
> If you set useAsyncIO="false" on the Connector that should work around the 
> problem for now. The Servlet Async API will still be available.
> Tomcat just uses a different code path to write to the network.
> 
>> I did not expect it would be so easy to reproduce. Can you commit the test ?
> 
> It is a bit of a hack at the moment. The code isn't particularly clean and I 
> have hard-coded some file paths for my system (I have a bunch of 5GB Windows 
> MSDN ISOs I am using for the large files. I also don't think we want test 
> cases that using multi-GB files running on every test run.
> 
> If I clean things up a bit, parameterise the hard-coded paths bits and 
> disable the test by default it should be in a reasonable state to commit.
> 
> It looks very much like the vectoredOperation and the associated semaphore is 
> where things are going wrong.
> 
> I'm aiming to work on this some more tomorrow.
> 
> Mark
> 


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-16 Thread Mark Thomas

On 16/06/2021 19:42, Deshmukh, Kedar wrote:

Thanks Mark for the quick update.

Can you please explain what impact useAsyncIO="false" has in terms of 
performance, scalability (number of connections to the server) and reliability?


Well, if you set useAsyncIO="false" it works. If you set 
useAsyncIO="true" it fails 90% of the time.


I'd suggest that, with that failure rate, discussion of performance and 
scalability are somewhat irrelevant.


With test cases that do not trigger this issue the difference between 
the two is marginal. Remy is more familiar with the details than me but 
from memory useAsyncIO="true" is a little better on older JREs. On 
modern JREs there isn't much in it. It is also likely to depend on 
application usage patterns, OS, etc. In short, you'd need to test the 
difference on your application with your hardware etc. I'd expect the 
difference to be hard to measure.


Mark




Regards,
Kedar


-Original Message-
From: Mark Thomas 
Sent: Wednesday, June 16, 2021 11:41 PM
To: users@tomcat.apache.org
Subject: Re: Trouble with HTTP/2 during concurrent bulk data transfer (server 
-> client)

On 16/06/2021 18:47, Rémy Maucherat wrote:

On Wed, Jun 16, 2021 at 7:36 PM Mark Thomas  wrote:


On 16/06/2021 18:01, Deshmukh, Kedar wrote:


I have one additional question at this point. How easy is this issue
to

reproduce? Does it happen every time? In 10% of requests? 1% ?


[Kedar] It is reproducible 9/10 times in my environment. So 90% time
it

is reproducible when concurrency is 5 or more and file sizes are
between 1GB-5GB.

Thanks for the confirmation. I have converted your test classes into
a Tomcat unit test (easy for me to work with) and the issue looks to
be repeatable on Linux with the latest 10.1.x code.

I'm starting to look at this now. I'll post again when I have more
information. I'm not expecting this to be quick.


Kedar,

If you set useAsyncIO="false" on the Connector that should work around the 
problem for now. The Servlet Async API will still be available.
Tomcat just uses a different code path to write to the network.


I did not expect it would be so easy to reproduce. Can you commit the test ?


It is a bit of a hack at the moment. The code isn't particularly clean and I 
have hard-coded some file paths for my system (I have a bunch of 5GB Windows 
MSDN ISOs I am using for the large files). I also don't think we want test 
cases that use multi-GB files running on every test run.

If I clean things up a bit, parameterise the hard-coded paths bits and disable 
the test by default it should be in a reasonable state to commit.

It looks very much like the vectoredOperation and the associated semaphore is 
where things are going wrong.

I'm aiming to work on this some more tomorrow.

Mark





-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-16 Thread Mark Thomas

On 16/06/2021 21:52, Christopher Schultz wrote:

Mark,

On 6/16/21 14:10, Mark Thomas wrote:

On 16/06/2021 18:47, Rémy Maucherat wrote:

On Wed, Jun 16, 2021 at 7:36 PM Mark Thomas  wrote:


On 16/06/2021 18:01, Deshmukh, Kedar wrote:

I have one additional question at this point. How easy is this 
issue to

reproduce? Does it happen every time? In 10% of requests? 1% ?


[Kedar] It is reproducible 9/10 times in my environment. So 90% 
time it
is reproducible when concurrency is 5 or more and file sizes are 
between

1GB-5GB.

Thanks for the confirmation. I have converted your test classes into a
Tomcat unit test (easy for me to work with) and the issue looks to be
repeatable on Linux with the latest 10.1.x code.

I'm starting to look at this now. I'll post again when I have more
information. I'm not expecting this to be quick.


Kedar,

If you set useAsyncIO="false" on the Connector that should work around 
the problem for now. The Servlet Async API will still be available. 
Tomcat just uses a different code path to write to the network.


I did not expect it would be so easy to reproduce. Can you commit the 
test ?


It is a bit of a hack at the moment. The code isn't particularly clean 
and I have hard-coded some file paths for my system (I have a bunch of 
5GB Windows MSDN ISOs I am using for the large files). I also don't 
think we want test cases that use multi-GB files running on every 
test run.


If I clean things up a bit, parameterise the hard-coded paths bits and 
disable the test by default it should be in a reasonable state to commit.


It looks very much like the vectoredOperation and the associated 
semaphore is where things are going wrong.


I'm aiming to work on this some more tomorrow.


Is it inadvisable to use a trivial JSP or Servlet that just generates X 
bytes? Or does this require the use of the DefaultServlet at the moment?


Right now, I don't know.

Is it still possible to reproduce using smaller window sizes and smaller 
total resource sizes? It would be nice if a unit-test didn't have to 
transfer 5GiB even through the loopback interface.


I may have a better answer after I do some further investigation tomorrow.

Mark

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-16 Thread Christopher Schultz

Mark,

On 6/16/21 14:10, Mark Thomas wrote:

On 16/06/2021 18:47, Rémy Maucherat wrote:

On Wed, Jun 16, 2021 at 7:36 PM Mark Thomas  wrote:


On 16/06/2021 18:01, Deshmukh, Kedar wrote:


I have one additional question at this point. How easy is this issue to

reproduce? Does it happen every time? In 10% of requests? 1% ?


[Kedar] It is reproducible 9/10 times in my environment. So 90% time it

is reproducible when concurrency is 5 or more and file sizes are between
1GB-5GB.

Thanks for the confirmation. I have converted your test classes into a
Tomcat unit test (easy for me to work with) and the issue looks to be
repeatable on Linux with the latest 10.1.x code.

I'm starting to look at this now. I'll post again when I have more
information. I'm not expecting this to be quick.


Kedar,

If you set useAsyncIO="false" on the Connector that should work around 
the problem for now. The Servlet Async API will still be available. 
Tomcat just uses a different code path to write to the network.


I did not expect it would be so easy to reproduce. Can you commit the 
test ?


It is a bit of a hack at the moment. The code isn't particularly clean 
and I have hard-coded some file paths for my system (I have a bunch of 
5GB Windows MSDN ISOs I am using for the large files). I also don't think 
we want test cases that use multi-GB files running on every test run.


If I clean things up a bit, parameterise the hard-coded paths bits and 
disable the test by default it should be in a reasonable state to commit.


It looks very much like the vectoredOperation and the associated 
semaphore is where things are going wrong.


I'm aiming to work on this some more tomorrow.


Is it inadvisable to use a trivial JSP or Servlet that just generates X 
bytes? Or does this require the use of the DefaultServlet at the moment?
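
(For reference, such a trivial servlet might look like the sketch below. This 
is hypothetical and not part of the actual test - it just streams N zero-filled 
bytes without touching disk. The javax.servlet imports match Tomcat 9; Tomcat 
10.x would use jakarta.servlet.)

import java.io.IOException;
import java.io.OutputStream;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/bytes")
public class ByteGeneratorServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // e.g. /bytes?size=1073741824 for 1GiB of generated data
        long total = Long.parseLong(req.getParameter("size"));
        byte[] chunk = new byte[8192];                 // reused zero-filled buffer
        resp.setContentType("application/octet-stream");
        OutputStream out = resp.getOutputStream();
        for (long sent = 0; sent < total; sent += chunk.length) {
            out.write(chunk, 0, (int) Math.min(chunk.length, total - sent));
        }
    }
}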


Is it still possible to reproduce using smaller window sizes and smaller 
total resource sizes? It would be nice if a unit-test didn't have to 
transfer 5GiB even through the loopback interface.


-chris

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



RE: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-16 Thread Deshmukh, Kedar
Thanks Mark for the quick update.

Can you please explain what impact useAsyncIO="false" has in terms of 
performance, scalability (number of connections to the server) and reliability?

Regards,
Kedar


-Original Message-
From: Mark Thomas  
Sent: Wednesday, June 16, 2021 11:41 PM
To: users@tomcat.apache.org
Subject: Re: Trouble with HTTP/2 during concurrent bulk data transfer (server 
-> client)

On 16/06/2021 18:47, Rémy Maucherat wrote:
> On Wed, Jun 16, 2021 at 7:36 PM Mark Thomas  wrote:
> 
>> On 16/06/2021 18:01, Deshmukh, Kedar wrote:
>>
>>> I have one additional question at this point. How easy is this issue 
>>> to
>> reproduce? Does it happen every time? In 10% of requests? 1% ?
>>>
>>> [Kedar] It is reproducible 9/10 times in my environment. So 90% time 
>>> it
>> is reproducible when concurrency is 5 or more and file sizes are 
>> between 1GB-5GB.
>>
>> Thanks for the confirmation. I have converted your test classes into 
>> a Tomcat unit test (easy for me to work with) and the issue looks to 
>> be repeatable on Linux with the latest 10.1.x code.
>>
>> I'm starting to look at this now. I'll post again when I have more 
>> information. I'm not expecting this to be quick.

Kedar,

If you set useAsyncIO="false" on the Connector that should work around the 
problem for now. The Servlet Async API will still be available. 
Tomcat just uses a different code path to write to the network.

> I did not expect it would be so easy to reproduce. Can you commit the test ?

It is a bit of a hack at the moment. The code isn't particularly clean and I 
have hard-coded some file paths for my system (I have a bunch of 5GB Windows 
MSDN ISOs I am using for the large files). I also don't think we want test 
cases that use multi-GB files running on every test run.

If I clean things up a bit, parameterise the hard-coded paths bits and disable 
the test by default it should be in a reasonable state to commit.

It looks very much like the vectoredOperation and the associated semaphore is 
where things are going wrong.

I'm aiming to work on this some more tomorrow.

Mark



-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-16 Thread Mark Thomas

On 16/06/2021 18:47, Rémy Maucherat wrote:

On Wed, Jun 16, 2021 at 7:36 PM Mark Thomas  wrote:


On 16/06/2021 18:01, Deshmukh, Kedar wrote:


I have one additional question at this point. How easy is this issue to

reproduce? Does it happen every time? In 10% of requests? 1% ?


[Kedar] It is reproducible 9/10 times in my environment. So 90% time it

is reproducible when concurrency is 5 or more and file sizes are between
1GB-5GB.

Thanks for the confirmation. I have converted your test classes into a
Tomcat unit test (easy for me to work with) and the issue looks to be
repeatable on Linux with the latest 10.1.x code.

I'm starting to look at this now. I'll post again when I have more
information. I'm not expecting this to be quick.


Kedar,

If you set useAsyncIO="false" on the Connector that should work around 
the problem for now. The Servlet Async API will still be available. 
Tomcat just uses a different code path to write to the network.
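
For anyone following along, a hedged sketch of what that looks like in 
server.xml is below; the port, thread count and keystore values are 
placeholders, not taken from the configuration discussed in this thread.

<Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="200" SSLEnabled="true" useAsyncIO="false">
    <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol"/>
    <SSLHostConfig>
        <Certificate certificateKeystoreFile="conf/keystore.jks"
                     certificateKeystorePassword="changeit"/>
    </SSLHostConfig>
</Connector>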



I did not expect it would be so easy to reproduce. Can you commit the test ?


It is a bit of a hack at the moment. The code isn't particularly clean 
and I have hard-coded some file paths for my system (I have a bunch of 
5GB Windows MSDN ISOs I am using for the large files). I also don't think 
we want test cases that use multi-GB files running on every test run.


If I clean things up a bit, parameterise the hard-coded paths bits and 
disable the test by default it should be in a reasonable state to commit.


It looks very much like the vectoredOperation and the associated 
semaphore is where things are going wrong.


I'm aiming to work on this some more tomorrow.

Mark

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-16 Thread Rémy Maucherat
On Wed, Jun 16, 2021 at 7:36 PM Mark Thomas  wrote:

> On 16/06/2021 18:01, Deshmukh, Kedar wrote:
>
> > I have one additional question at this point. How easy is this issue to
> reproduce? Does it happen every time? In 10% of requests? 1% ?
> >
> > [Kedar] It is reproducible 9/10 times in my environment. So 90% time it
> is reproducible when concurrency is 5 or more and file sizes are between
> 1GB-5GB.
>
> Thanks for the confirmation. I have converted your test classes into a
> Tomcat unit test (easy for me to work with) and the issue looks to be
> repeatable on Linux with the latest 10.1.x code.
>
> I'm starting to look at this now. I'll post again when I have more
> information. I'm not expecting this to be quick.
>

I did not expect it would be so easy to reproduce. Can you commit the test ?

Rémy


Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-16 Thread Mark Thomas

On 16/06/2021 18:01, Deshmukh, Kedar wrote:


I have one additional question at this point. How easy is this issue to 
reproduce? Does it happen every time? In 10% of requests? 1% ?

[Kedar] It is reproducible 9/10 times in my environment. So 90% time it is 
reproducible when concurrency is 5 or more and file sizes are between 1GB-5GB.


Thanks for the confirmation. I have converted your test classes into a 
Tomcat unit test (easy for me to work with) and the issue looks to be 
repeatable on Linux with the latest 10.1.x code.


I'm starting to look at this now. I'll post again when I have more 
information. I'm not expecting this to be quick.


Mark

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



RE: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-16 Thread Deshmukh, Kedar



-Original Message-
From: Mark Thomas  
Sent: Wednesday, June 16, 2021 9:29 PM
To: users@tomcat.apache.org
Subject: Re: Trouble with HTTP/2 during concurrent bulk data transfer (server 
-> client)

On 16/06/2021 15:05, Deshmukh, Kedar wrote:
> Dear Tomcat users/dev team,
> 
> We are understanding the impact of HTTP/2 in our application as HTTP/2 
> provides better throughput and performance.

I'd be wary of making such sweeping statements. HTTP/2 has some advantages and 
some disadvantages. Generally, the advantages will outweigh the disadvantages 
but that will not always be the case.

[Kedar] - Okay.

> Before directly tuning
> HTTP/2 in application, we thought of analyzing certain use cases which 
> our application demands in standalone environment.
> 
> Our use case is very simple. Java based standalone client is making 
> simple POST request to the server and server read the file name from 
> the request and push the requested file to client. Here, client can 
> request multiple files same time by sending multiple requests 
> concurrently, so server should be able to send multiple files 
> concurrently depending on configured concurrency level.
> 
> Currently, in this test only single client is making requests to the 
> server with concurrency 5. Server is not overloaded and not performing 
> any other tasks. Machine has more than 500GB empty space and not 
> running any heavy applications.
> 
> Test:
> 
> We used different set of files for this test. Files with sizes between 
> 1GB - 5GB and concurrency > 5. We are using traditional connector 
> protocol HTTP11NIOProtocol with HTTP/2 is turned on.
> 
> Observations:
> 
> HTTP/1.1 - With HTTP/1.1 given sample code works fine. Only drawback 
> here is it opens multiple TCP connections to satisfy HTTP/1.1
> 
> HTTP/2 - With HTTP/2, it is expected to be only one TCP connection and 
> multiple streams to handle the traffic. Tomcat HTTP/2 debug logs 
> suggest that only one connection being used and multiple streams are 
> spawned as expected. So far everything is fine. But sample code does 
> not work consistently with higher concurrency (> 3). We captured the 
> stack trace of tomcat process which is attached here. Couple of tomcat 
> threads are waiting to acquire semaphore for socket write operation. 
> When write operation is stuck servlet is not able to push any data to 
> client and client is also stuck waiting for more data. I don't see any 
> error/exception at the client/server.

That looks / sounds like there is a code path - probably an error condition ? - 
where the semaphore isn't released.

[Kedar] The semaphore is never released. As mentioned, the concurrency level in 
the sample code is 5, so if you observe the attached Tomcat stack trace you will 
find the following four very similar threads, all waiting on a semaphore that 
was acquired by another thread, "Thread-9".

"Thread-7" #58 daemon prio=5 os_prio=0 cpu=218.75ms elapsed=105.35s 
tid=0x01844997a800 nid=0x4bc0 waiting on condition  [0x0027cc7fe000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.10/Native Method)
- parking to wait for  <0x0006158518b0> (a 
java.util.concurrent.Semaphore$NonfairSync)
at 
java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.10/LockSupport.java:234)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.10/AbstractQueuedSynchronizer.java:1079)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.10/AbstractQueuedSynchronizer.java:1369)
at 
java.util.concurrent.Semaphore.tryAcquire(java.base@11.0.10/Semaphore.java:415)
at 
org.apache.tomcat.util.net.SocketWrapperBase.vectoredOperation(SocketWrapperBase.java:1426)

"Thread-9" #59 daemon prio=5 os_prio=0 cpu=187.50ms elapsed=105.35s 
tid=0x01844997b800 nid=0x4b48 in Object.wait()  [0x0027cc9fe000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(java.base@11.0.10/Native Method)
- waiting on <0x00061ed36de0> (a 
org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper$NioOperationState)
at 
org.apache.tomcat.util.net.SocketWrapperBase.vectoredOperation(SocketWrapperBase.java:1457)

 
The other possibility is related to the HTTP/2 flow control windows. If 
something goes wrong with the management of the flow control window for the 
connection it would block everything.

[Kedar] - Okay. But I did not see anything in the logs, nor any WINDOW_UPDATE 
frame related errors.

> streamReadTimeout and streamWriteTimeout are configured as -1 so they 
> are infinitely waiting for the write semaphore.

That is generally a bad idea. By all means set it high but an infinite 
timeout is going to cause problems - particularly if clients just drop 
off the network.

Re: Trouble with HTTP/2 during concurrent bulk data transfer (server -> client)

2021-06-16 Thread Mark Thomas

On 16/06/2021 15:05, Deshmukh, Kedar wrote:

Dear Tomcat users/dev team,

We are assessing the impact of HTTP/2 on our application, as HTTP/2 
provides better throughput and performance.


I'd be wary of making such sweeping statements. HTTP/2 has some 
advantages and some disadvantages. Generally, the advantages will 
outweigh the disadvantages but that will not always be the case.


Before tuning HTTP/2 directly in the application, we thought of analyzing 
certain use cases which our application demands in a standalone environment.


Our use case is very simple. A standalone Java client makes a simple POST 
request to the server, and the server reads the file name from the request 
and pushes the requested file to the client. The client can request multiple 
files at the same time by sending multiple requests concurrently, so the 
server should be able to send multiple files concurrently depending on the 
configured concurrency level.


Currently, in this test only a single client is making requests to the 
server with concurrency 5. The server is not overloaded and is not 
performing any other tasks. The machine has more than 500GB of free disk 
space and is not running any heavy applications.


Test:

We used a different set of files for this test: files with sizes between 
1GB and 5GB, and concurrency > 5. We are using the traditional connector 
protocol HTTP11NIOProtocol with HTTP/2 turned on.


Observations:

HTTP/1.1 - With HTTP/1.1 the given sample code works fine. The only 
drawback is that it opens multiple TCP connections to satisfy HTTP/1.1.


HTTP/2 - With HTTP/2, we expect only one TCP connection with multiple 
streams to handle the traffic. Tomcat's HTTP/2 debug logs suggest that 
only one connection is being used and multiple streams are spawned as 
expected. So far everything is fine. But the sample code does not work 
consistently with higher concurrency (> 3). We captured the stack trace 
of the Tomcat process, which is attached here. A couple of Tomcat threads 
are waiting to acquire the semaphore for the socket write operation. When 
the write operation is stuck, the servlet is not able to push any data to 
the client and the client is also stuck waiting for more data. I don't see 
any error/exception at the client or server.


That looks / sounds like there is a code path - probably an error 
condition ? - where the semaphore isn't released.


The other possibility is related to the HTTP/2 flow control windows. If 
something goes wrong with the management of the flow control window for 
the connection it would block everything.
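
As an illustrative sketch only (not Tomcat's actual Http2UpgradeHandler): each 
DATA frame is limited by both the per-stream window and the shared connection 
window, so a single accounting error on the connection window eventually 
starves every stream.

class FlowControlSketch {
    private int connectionWindow = 65535;   // shared by ALL streams on the connection

    // How much of a DATA frame may be sent right now for one stream.
    synchronized int reserve(int requested, int streamWindow) {
        int allowed = Math.min(requested, Math.min(streamWindow, connectionWindow));
        connectionWindow -= allowed;        // debit the shared window
        return allowed;                     // 0 means the writer has to wait
    }

    // Credited back when the client sends a connection-level WINDOW_UPDATE. If an
    // update is ever lost or mis-applied, reserve() ends up returning 0 for every
    // stream and all writers on the connection block together.
    synchronized void windowUpdate(int increment) {
        connectionWindow += increment;
        notifyAll();
    }
}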


streamReadTimeout and streamWriteTimeout are configured as -1, so the 
threads wait indefinitely for the write semaphore.


That is generally a bad idea. By all means set it high but an infinite 
timeout is going to cause problems  - particularly if clients just drop 
off the network.
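
For reference, a hedged sketch of setting finite stream timeouts on the HTTP/2 
upgrade protocol in server.xml; the values below are illustrative only 
(milliseconds), not a recommendation.

<UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol"
                 streamReadTimeout="300000"
                 streamWriteTimeout="300000"/>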


The outcome is that the client receives only partial data from the server, 
and at some point the server gets stuck and cannot send any more data.


We also tried the IOUtils file-transfer APIs but it didn't help. I have 
also tried async non-blocking IO but the observations are the same.


Generally, the simpler you keep the test case, the easier it is for us 
to work with. Non-async and no external IO libraries is better.


Our actual requirement is very similar: a Java-based HTTP client requests 
bulk data concurrently from the server, and the server should push it 
without any trouble. However, it is not limited to files; the server can 
also push bulk serialized Java objects over the stream concurrently.


The content type should not make any difference to Tomcat. Static files 
vs dynamic content would make a difference.


Note that the sample code works fine most of the time if I enable HTTP/2 
logs in either the client or Tomcat, so I would suggest not turning on 
HTTP/2 debug logs when drawing conclusions.


That suggests a timing issue of some sort.

HTTP/2 is significantly more complex than HTTP/1.1 because you have 
multiple independent application threads all trying to write to the same 
socket and Tomcat has to track and allocate flow control window 
allocations both for individual streams and the overall connection.



The following components are used in the sample code for the test:

1. Client - Java 11.0.10 httpclient - (client\Client.java)

2. Server - Tomcat 9.0.46

3. Servlet - AsyncServlet - (server\Server.java)

4. Operating system - Windows 10

5. Machine specifications – 32GB RAM and 500GB of free disk space.

6. Latency - None, client and server are running on the same machine.

7. Set of files - You can use any random files whose sizes are between 
1GB-5GB to reproduce the issue.


Refer to the attachments for:

1. Client side code

2. Server side servlet

3. server.xml

4. Tomcat Stacktrace

5. Tomcat server logs


Thanks. That is all useful information.

Could you please go through the sample code along with server.xml. Here 
are my questions:


1. Why is HTTP/2 failing for such a use case, where large files are 
concurrently pushed to the client? I believe this is a very common use 
case must have be