Re: 7.0.22+ fd leak with APR/native

2012-01-13 Thread Mike Wertheim
Has a bug been logged for this issue (what seems to be a file descriptor leak)?

On Tue, Jan 3, 2012 at 1:17 PM, Mark Thomas ma...@apache.org wrote:
 I am trying to bring together all the information I have gleaned on this
 so far from the multiple threads to try and find the common factors.

 So far I have:
 - 7.0.21 is OK
 - 7.0.22 has an fd leak
 - 7.0.23 has an fd leak and may leak faster than 7.0.22
 - occurs with APR/native
 - does not occur with BIO
 - has been observed in HTTP  HTTPS
 - use of Comet does not trigger it
 - use of compression does not trigger it
 - separate connection and keep-alive timeouts does not trigger it

 It may be related to POST processing.

 I have tried (and so far failed) to reproduce this. I'll be looking at
 POST processing next. In the meantime, here are some further questions
 to try and narrow things down:

 1. Does the application where this is observed make use of Servlet 3.0
 async requests?

 2. Does this leak occur when the NIO connector is used?

 3. Are there any exceptions in the logs that weren't present in 7.0.21
 or earlier?

 4. Does the leak occur if sendfile is disabled?

 I also have reviewing the 7.0.21 to 7.0.22 changes on my todo list but
 there are quite a few as I was refactoring the connectors to reduce code
 duplication and ironically, reduce maintenance requirements, at the time.

 -
 To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
 For additional commands, e-mail: users-h...@tomcat.apache.org


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 7.0.22+ fd leak with APR/native

2012-01-13 Thread Arijit Ganguly
I am curious to know if this leak is related to unix sockets, or the IPv6
file handles. I have seen a similar issue with the NIO HTTP handler, where
it does not close some connections properly and they incarnate as file
handles corresponding to unix sockets (all pointing to same inode number).
Even 7.0.21 has problems.

Thanks,
Arijit

On Fri, Jan 13, 2012 at 4:24 PM, Mike Wertheim m...@hyperreal.org wrote:

 Has a bug been logged for this issue (what seems to be a file descriptor
 leak)?

 On Tue, Jan 3, 2012 at 1:17 PM, Mark Thomas ma...@apache.org wrote:
  I am trying to bring together all the information I have gleaned on this
  so far from the multiple threads to try and find the common factors.
 
  So far I have:
  - 7.0.21 is OK
  - 7.0.22 has an fd leak
  - 7.0.23 has an fd leak and may leak faster than 7.0.22
  - occurs with APR/native
  - does not occur with BIO
  - has been observed in HTTP  HTTPS
  - use of Comet does not trigger it
  - use of compression does not trigger it
  - separate connection and keep-alive timeouts does not trigger it
 
  It may be related to POST processing.
 
  I have tried (and so far failed) to reproduce this. I'll be looking at
  POST processing next. In the meantime, here are some further questions
  to try and narrow things down:
 
  1. Does the application where this is observed make use of Servlet 3.0
  async requests?
 
  2. Does this leak occur when the NIO connector is used?
 
  3. Are there any exceptions in the logs that weren't present in 7.0.21
  or earlier?
 
  4. Does the leak occur if sendfile is disabled?
 
  I also have reviewing the 7.0.21 to 7.0.22 changes on my todo list but
  there are quite a few as I was refactoring the connectors to reduce code
  duplication and ironically, reduce maintenance requirements, at the time.
 
  -
  To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
  For additional commands, e-mail: users-h...@tomcat.apache.org
 

 -
 To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
 For additional commands, e-mail: users-h...@tomcat.apache.org




Re: 7.0.22+ fd leak with APR/native

2012-01-06 Thread Mark Thomas
On 04/01/2012 13:59, Konstantin Kolinko wrote:
 2012/1/4 Konstantin Kolinko knst.koli...@gmail.com:

 2. The
 processSocket(desc[n*2+1], SocketStatus.DISCONNECT);
 call just above the fixed line.

 It looks like a NOOP, because processSocket(long,SocketStatus) has an
 if() that does not mention SocketStatus.DISCONNECT.

 I think it is one more way to leak sockets. This method is used only
 in comet requests.
 
 and I think that if() might be a cause of missing some ERROR events in comet.

Agreed.

 Can the if() condition in processSocket(long,SocketStatus) be removed?
 Besides SocketStatus.DISCONNECT. there is one more status that is not
 mentioned in that if() and can lead to similar leaks:
 SocketStatus.ERROR.

 The if() was added in
 http://svn.apache.org/viewvc?view=revisionrevision=944518

 I do not see a reason for this if(). The r944518 says about running
 with a SecurityManager so it should have just added a
 PrivilegedAction there.

 
 One more oddity with r944518 in AprEndpoint:
 1. I do not understand why one has to set TCCL before calling
 getExecutor().execute(). We do not do this in other places. E.g. we do
 not do it in AprEndpoint#processSocket(long).

I suspect because the Servlet 3 Async code was re-using bits of the
Comet code (inconsistently in different ways on different connectors).

 2. r944518 says about TCK failures, but TCK does not test comet and so
 it could not be a reason for the change in this method which is used
 only by comet.

It is only used by Comet now but r944518 pre-dates r1001698 where there
was a major refactoring of the connectors.

 So it looks that not only the if(), but the whole r944518 change in
 AprEndpoint has to be reverted.

That looks OK to me. I'll do that but run the TCKs as a double check.

Mark

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 7.0.22+ fd leak with APR/native

2012-01-06 Thread Konstantin Kolinko
2012/1/6 Mark Thomas ma...@apache.org:
 So it looks that not only the if(), but the whole r944518 change in
 AprEndpoint has to be reverted.

 That looks OK to me. I'll do that but run the TCKs as a double check.


I am running TestCometProcessor tests with this very change just now. ;)

(My plan is to reenable TestCometProcessor - that is revert
http://svn.apache.org/viewvc?rev=1225632view=rev
but with the following change in testCometConnectorStop():

- Do not ask reader thread whether END event message was received
over the wire, but ask the servlet directly whether the last CometEvent
that it processed was EventType.END.  I think it would be more reliable.
)

The NIO connector is OK, but I observe an issue with APR one:

In testCometConnectorStop() the test fails:

[[[
junit.framework.AssertionFailedError: No exception in writing thread
at 
org.apache.catalina.comet.TestCometProcessor.testCometConnectorStop(TestCometProcessor.java:289)
]]]

So no SocketException happens in the WriterThread.  At the same time
when I ask SimpleCometServlet for the last event that it received it
is EventType.END.

My thought is that the WriterThread does not see that the socket that
it writes to has been closed.

(It probably is not of much concern because it happens at shutdown,
but it might be that we are still missing something).

I thought that maybe during shutdown the
#processSocket(long,SocketStatus) that uses an executor to run the
event did not have chance to run,  but because the END event was
received by servlet I think that it did run.

Best regards,
Konstantin Kolinko

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 7.0.22+ fd leak with APR/native

2012-01-06 Thread Mark Thomas
On 06/01/2012 20:13, Konstantin Kolinko wrote:
 2012/1/6 Mark Thomas ma...@apache.org:
 So it looks that not only the if(), but the whole r944518 change in
 AprEndpoint has to be reverted.

 That looks OK to me. I'll do that but run the TCKs as a double check.

 
 I am running TestCometProcessor tests with this very change just now. ;)
 
 (My plan is to reenable TestCometProcessor - that is revert
 http://svn.apache.org/viewvc?rev=1225632view=rev
 but with the following change in testCometConnectorStop():
 
 - Do not ask reader thread whether END event message was received
 over the wire, but ask the servlet directly whether the last CometEvent
 that it processed was EventType.END.  I think it would be more reliable.
 )

I'm not keen on that idea since the point of the test was to ensure that
the END event was received by the client, not that it was processed by
the Server.

 The NIO connector is OK, but I observe an issue with APR one:
 
 In testCometConnectorStop() the test fails:
 
 [[[
 junit.framework.AssertionFailedError: No exception in writing thread
   at 
 org.apache.catalina.comet.TestCometProcessor.testCometConnectorStop(TestCometProcessor.java:289)
 ]]]
 
 So no SocketException happens in the WriterThread.  At the same time
 when I ask SimpleCometServlet for the last event that it received it
 is EventType.END.
 
 My thought is that the WriterThread does not see that the socket that
 it writes to has been closed.
 
 (It probably is not of much concern because it happens at shutdown,
 but it might be that we are still missing something).

I'd be happy for the lack of a WriterThread exception to trigger a log
message rather than a test failure.

Mark

 I thought that maybe during shutdown the
 #processSocket(long,SocketStatus) that uses an executor to run the
 event did not have chance to run,  but because the END event was
 received by servlet I think that it did run.
 
 Best regards,
 Konstantin Kolinko
 
 -
 To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
 For additional commands, e-mail: users-h...@tomcat.apache.org
 


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 7.0.22+ fd leak with APR/native

2012-01-06 Thread Konstantin Kolinko
2012/1/7 Mark Thomas ma...@apache.org:
 On 06/01/2012 20:13, Konstantin Kolinko wrote:
 2012/1/6 Mark Thomas ma...@apache.org:
 So it looks that not only the if(), but the whole r944518 change in
 AprEndpoint has to be reverted.

 That looks OK to me. I'll do that but run the TCKs as a double check.


 I am running TestCometProcessor tests with this very change just now. ;)

 (My plan is to reenable TestCometProcessor - that is revert
 http://svn.apache.org/viewvc?rev=1225632view=rev
 but with the following change in testCometConnectorStop():

 - Do not ask reader thread whether END event message was received
 over the wire, but ask the servlet directly whether the last CometEvent
 that it processed was EventType.END.  I think it would be more reliable.
 )

 I'm not keen on that idea since the point of the test was to ensure that
 the END event was received by the client, not that it was processed by
 the Server.

Is there any promise that it should happen? Won't the client see the
same from closed socket?

(Anyway ci failures showed that it does not happen reliably).


 The NIO connector is OK, but I observe an issue with APR one:

 In testCometConnectorStop() the test fails:

 [[[
 junit.framework.AssertionFailedError: No exception in writing thread
       at 
 org.apache.catalina.comet.TestCometProcessor.testCometConnectorStop(TestCometProcessor.java:289)
 ]]]

 So no SocketException happens in the WriterThread.  At the same time
 when I ask SimpleCometServlet for the last event that it received it
 is EventType.END.

 My thought is that the WriterThread does not see that the socket that
 it writes to has been closed.

 (It probably is not of much concern because it happens at shutdown,
 but it might be that we are still missing something).

 I'd be happy for the lack of a WriterThread exception to trigger a log
 message rather than a test failure.

 Mark

 I thought that maybe during shutdown the
 #processSocket(long,SocketStatus) that uses an executor to run the
 event did not have chance to run,  but because the END event was
 received by servlet I think that it did run.

Best regards,
Konstantin Kolinko

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 7.0.22+ fd leak with APR/native

2012-01-04 Thread Konstantin Kolinko
2012/1/4 Mark Thomas ma...@apache.org:
 On 03/01/2012 21:32, Mark Thomas wrote:
 On 03/01/2012 21:26, André Warnier wrote:
 Mark Thomas wrote:
 I am trying to bring together all the information I have gleaned on this
 so far from the multiple threads to try and find the common factors.

 snip/

 Suggestion: the large POST requests mentioned by a couple of posters
 suggest file uploads, which may imply temporary files used to buffer the
 files being uploaded, no ?

 Probably not. That code is connector agnostic. There might be something
 in the APR InputBuffer though.

 The APR InputBuffer is untouched between 7.0.21 and 7.0.22. However, I
 did find an error in r1166072 that might explain the fd leak. I am
 trying to reproduce it at the moment. In the meantime, if someone could
 try the following patch that would be great:

 Index: java/org/apache/tomcat/util/net/AprEndpoint.java
 ===
 --- java/org/apache/tomcat/util/net/AprEndpoint.java    (revision 1226551)
 +++ java/org/apache/tomcat/util/net/AprEndpoint.java    (working copy)
 @@ -1364,7 +1364,6 @@
                         } else {
                             destroySocket(desc[n*2+1]);
                         }
 -                        return true;
                     }
                 }
             } else if (rv  0) {



Good catch.

1. I think to reproduce the leak here the Poll.pol() has to return
several sockets where one has Poll.APR_POLLHUP flag set. The sockets
after that one will be lost.

(I do not see an easy way to get processSocket() to return false, so I
think APR_POLLHUP is a more easy target to reproduce this.).

2. The
processSocket(desc[n*2+1], SocketStatus.DISCONNECT);
call just above the fixed line.

It looks like a NOOP, because processSocket(long,SocketStatus) has an
if() that does not mention SocketStatus.DISCONNECT.

I think it is one more way to leak sockets. This method is used only
in comet requests.

Can the if() condition in processSocket(long,SocketStatus) be removed?
Besides SocketStatus.DISCONNECT. there is one more status that is not
mentioned in that if() and can lead to similar leaks:
SocketStatus.ERROR.

The if() was added in
http://svn.apache.org/viewvc?view=revisionrevision=944518

I do not see a reason for this if(). The r944518 says about running
with a SecurityManager so it should have just added a
PrivilegedAction there.


Best regards,
Konstantin Kolinko

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 7.0.22+ fd leak with APR/native

2012-01-04 Thread Konstantin Kolinko
2012/1/4 Konstantin Kolinko knst.koli...@gmail.com:

 2. The
 processSocket(desc[n*2+1], SocketStatus.DISCONNECT);
 call just above the fixed line.

 It looks like a NOOP, because processSocket(long,SocketStatus) has an
 if() that does not mention SocketStatus.DISCONNECT.

 I think it is one more way to leak sockets. This method is used only
 in comet requests.

and I think that if() might be a cause of missing some ERROR events in comet.


 Can the if() condition in processSocket(long,SocketStatus) be removed?
 Besides SocketStatus.DISCONNECT. there is one more status that is not
 mentioned in that if() and can lead to similar leaks:
 SocketStatus.ERROR.

 The if() was added in
 http://svn.apache.org/viewvc?view=revisionrevision=944518

 I do not see a reason for this if(). The r944518 says about running
 with a SecurityManager so it should have just added a
 PrivilegedAction there.


One more oddity with r944518 in AprEndpoint:
1. I do not understand why one has to set TCCL before calling
getExecutor().execute(). We do not do this in other places. E.g. we do
not do it in AprEndpoint#processSocket(long).

2. r944518 says about TCK failures, but TCK does not test comet and so
it could not be a reason for the change in this method which is used
only by comet.

So it looks that not only the if(), but the whole r944518 change in
AprEndpoint has to be reverted.

Best regards,
Konstantin Kolinko

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 7.0.22+ fd leak with APR/native

2012-01-04 Thread Mike Wertheim
On Tue, Jan 3, 2012 at 1:17 PM, Mark Thomas ma...@apache.org wrote:
 I am trying to bring together all the information I have gleaned on this
 so far from the multiple threads to try and find the common factors.

 So far I have:
 - 7.0.21 is OK
 - 7.0.22 has an fd leak
 - 7.0.23 has an fd leak and may leak faster than 7.0.22
 - occurs with APR/native
 - does not occur with BIO
 - has been observed in HTTP  HTTPS
 - use of Comet does not trigger it
 - use of compression does not trigger it
 - separate connection and keep-alive timeouts does not trigger it

 It may be related to POST processing.

 I have tried (and so far failed) to reproduce this. I'll be looking at
 POST processing next. In the meantime, here are some further questions
 to try and narrow things down:

 1. Does the application where this is observed make use of Servlet 3.0
 async requests?

No

 2. Does this leak occur when the NIO connector is used?

I haven't tried NIO.  I only tried APR.

 3. Are there any exceptions in the logs that weren't present in 7.0.21
 or earlier?

No

 4. Does the leak occur if sendfile is disabled?

Yes.  I just disabled sendfile and the behavior is the same,


 I also have reviewing the 7.0.21 to 7.0.22 changes on my todo list but
 there are quite a few as I was refactoring the connectors to reduce code
 duplication and ironically, reduce maintenance requirements, at the time.

 -
 To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
 For additional commands, e-mail: users-h...@tomcat.apache.org


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 7.0.22+ fd leak with APR/native

2012-01-03 Thread André Warnier

Mark Thomas wrote:

I am trying to bring together all the information I have gleaned on this
so far from the multiple threads to try and find the common factors.

So far I have:
- 7.0.21 is OK
- 7.0.22 has an fd leak
- 7.0.23 has an fd leak and may leak faster than 7.0.22
- occurs with APR/native
- does not occur with BIO
- has been observed in HTTP  HTTPS
- use of Comet does not trigger it
- use of compression does not trigger it
- separate connection and keep-alive timeouts does not trigger it

It may be related to POST processing.

I have tried (and so far failed) to reproduce this. I'll be looking at
POST processing next. In the meantime, here are some further questions
to try and narrow things down:

1. Does the application where this is observed make use of Servlet 3.0
async requests?

2. Does this leak occur when the NIO connector is used?

3. Are there any exceptions in the logs that weren't present in 7.0.21
or earlier?

4. Does the leak occur if sendfile is disabled?

I also have reviewing the 7.0.21 to 7.0.22 changes on my todo list but
there are quite a few as I was refactoring the connectors to reduce code
duplication and ironically, reduce maintenance requirements, at the time.



Suggestion: the large POST requests mentioned by a couple of posters suggest file 
uploads, which may imply temporary files used to buffer the files being uploaded, no ?



-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 7.0.22+ fd leak with APR/native

2012-01-03 Thread Mark Thomas
On 03/01/2012 21:26, André Warnier wrote:
 Mark Thomas wrote:
 I am trying to bring together all the information I have gleaned on this
 so far from the multiple threads to try and find the common factors.

snip/

 Suggestion: the large POST requests mentioned by a couple of posters
 suggest file uploads, which may imply temporary files used to buffer the
 files being uploaded, no ?

Probably not. That code is connector agnostic. There might be something
in the APR InputBuffer though.

Mark

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 7.0.22+ fd leak with APR/native

2012-01-03 Thread Mark Thomas
On 03/01/2012 21:32, Mark Thomas wrote:
 On 03/01/2012 21:26, André Warnier wrote:
 Mark Thomas wrote:
 I am trying to bring together all the information I have gleaned on this
 so far from the multiple threads to try and find the common factors.
 
 snip/
 
 Suggestion: the large POST requests mentioned by a couple of posters
 suggest file uploads, which may imply temporary files used to buffer the
 files being uploaded, no ?
 
 Probably not. That code is connector agnostic. There might be something
 in the APR InputBuffer though.

The APR InputBuffer is untouched between 7.0.21 and 7.0.22. However, I
did find an error in r1166072 that might explain the fd leak. I am
trying to reproduce it at the moment. In the meantime, if someone could
try the following patch that would be great:

Index: java/org/apache/tomcat/util/net/AprEndpoint.java
===
--- java/org/apache/tomcat/util/net/AprEndpoint.java(revision 1226551)
+++ java/org/apache/tomcat/util/net/AprEndpoint.java(working copy)
@@ -1364,7 +1364,6 @@
 } else {
 destroySocket(desc[n*2+1]);
 }
-return true;
 }
 }
 } else if (rv  0) {


Cheers,

Mark

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org