Re: [OT] Re: jvm exits without trace
Here's an hs_err file from a crash I had yesterday. We turned off some things in our code without restarting, and the crashes have virtually stopped, but we still get the odd one here and there on machines where the application has not been restarted. It could be that the problem lingers and builds up over time; who knows.

It's a SIGSEGV in a GCTaskThread. From the occupancy of eden, it looks like it happened during a scavenge (ParNew). Maybe an expert in some dark cave could shed some more light on it.

On Tue, 2010-03-16 at 22:00 +0100, André Warnier wrote:
Carl wrote:
My approach is to get something (a JVM) that works and then gradually change it until it breaks. Then I know what is causing the problem. To date, I haven't been able to get a JVM that works.

I think we understand that, and agree. Our remarks were tongue in cheek, if that is the right expression. At the bottom of things, finding a bug in the most recent JVM would be much more globally important than finding it in your applications, particularly a bug that can cause the JVM to segfault.

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
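When crashes are spread over many machines and days, as in this thread, it can help to collect the key lines from every HotSpot error file in one pass. The following is a minimal sketch, not anything posted on the list: the class name `HsErrScan` and the method `summarize` are hypothetical, and it assumes the default file name pattern `hs_err_pid<pid>.log`, a header line that mentions `SIGSEGV`, and a line beginning with `Current thread`. If your JVM writes the file elsewhere or lays it out differently, the filters need adjusting.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Tomcat or the JDK.
class HsErrScan {

    /** Returns one summary string per hs_err_pid<pid>.log found under root. */
    static List<String> summarize(Path root) throws IOException {
        List<Path> logs;
        // Walk the tree and keep only files that match the assumed default name.
        try (Stream<Path> paths = Files.walk(root)) {
            logs = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().matches("hs_err_pid\\d+\\.log"))
                    .sorted()
                    .collect(Collectors.toList());
        }

        List<String> summary = new ArrayList<>();
        for (Path log : logs) {
            String signal = "(no SIGSEGV line found)";
            String thread = "(no current-thread line found)";
            boolean haveSignal = false;
            boolean haveThread = false;
            // ISO-8859-1 decodes any byte sequence, so odd bytes cannot abort the scan.
            for (String line : Files.readAllLines(log, StandardCharsets.ISO_8859_1)) {
                if (!haveSignal && line.contains("SIGSEGV")) {
                    signal = line.trim();
                    haveSignal = true;
                }
                if (!haveThread && line.startsWith("Current thread")) {
                    thread = line.trim();
                    haveThread = true;
                }
            }
            summary.add(log + " | " + signal + " | " + thread);
        }
        return summary;
    }

    public static void main(String[] args) throws IOException {
        // Scan the directory given on the command line, or the working directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String entry : summarize(root)) {
            System.out.println(entry);
        }
    }
}
```

Run against the Tomcat working directory of each machine, this would show at a glance whether every crash names the same kind of thread (for example a GCTaskThread, as reported above) or whether the crashes differ.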
Re: jvm exits without trace
By "parent" I meant the main JVM process as opposed to forked processes or threads; sorry to confuse you there. Stracing the threads generates too much data to store, so I had to settle for the parent process.

To answer your other question: the code is 100% pure Java. Why it causes this messy crash is still unclear, but development is working to figure it out. I'll follow up when we find out more, but I'm not sure we're likely to dig into the root cause; working around it is more of a priority right now than debugging the JVM.

On Mon, 2010-03-15 at 17:08 +0100, Christopher Schultz wrote:
Taylan,

On 3/15/2010 10:19 AM, Taylan Develioglu wrote:
The cause for the crashes was in our own application code, we're currently investigating the exact reason.

Yeah, I'd like to second Chuck's question: was it native code?

A strace of the parent process shows killed by sigsegv, why or how this can happen is still unclear.

So, the parent was being killed? What was the parent of the JVM?

Thanks to everyone that gave their assistance.

Definitely follow up to let us all know what you've uncovered... this was certainly a weird situation.

-chris
Re: jvm exits without trace
The cause of the crashes was in our own application code; we're currently investigating the exact reason. An strace of the parent process shows it was killed by SIGSEGV; why or how this can happen is still unclear.

Thanks to everyone who gave their assistance.

On Thu, 2010-03-11 at 15:40 +0100, Taylan Develioglu wrote:
Hi Carl, thanks for the suggestion. I am going to try JVM 1.6.0_7 regardless of what I said before. Funny coincidence: I tried the IBM JVM as well and ran into a similar issue (part of our SSL implementation uses Sun-specific libraries).

On Thu, 2010-03-11 at 12:38 +0100, Carl wrote:
Taylan,

I am currently trying JVM 1.6.0_7 per Chuck's suggestion and, so far (4 days), it is working. I started down the IBM JVM path but have abandoned that for now due to difficulties with the SSL implementation (some browsers would work and some wouldn't with seemingly the same setup).

Thanks,
Carl

- Original Message -
From: Taylan Develioglu tdevelio...@ebuddy.com
To: Tomcat Users List users@tomcat.apache.org
Sent: Thursday, March 11, 2010 6:13 AM
Subject: Re: jvm exits without trace

A different kernel did not help either...

On Thu, 2010-03-11 at 11:37 +0100, Taylan Develioglu wrote:
Changing to JIO didn't help; the silent crashes continue. I'm changing kernel versions now.

On Fri, 2010-03-05 at 10:45 +0100, Taylan Develioglu wrote:
It's performing rather poorly compared to the APR connector. The number of threads required to handle the requests has gone up significantly across the board. Stability-wise, I don't have complaints yet. I'm keeping my fingers crossed.

On Fri, 2010-03-05 at 10:09 +0100, Pid wrote:
On 05/03/2010 08:41, Taylan Develioglu wrote:
Pid, that would assume we had a working 1.6.10 version before that we replaced.

That it would.

We've run 1.6.10 upwards successfully for a very long time. So I don't see the point in doing this.

I must have missed that. How is the HTTP connector performing?
Re: jvm exits without trace
Changing to JIO didn't help; the silent crashes continue. I'm changing kernel versions now.

On Fri, 2010-03-05 at 10:45 +0100, Taylan Develioglu wrote:
It's performing rather poorly compared to the APR connector. The number of threads required to handle the requests has gone up significantly across the board. Stability-wise, I don't have complaints yet. I'm keeping my fingers crossed.

On Fri, 2010-03-05 at 10:09 +0100, Pid wrote:
On 05/03/2010 08:41, Taylan Develioglu wrote:
Pid, that would assume we had a working 1.6.10 version before that we replaced.

That it would.

We've run 1.6.10 upwards successfully for a very long time. So I don't see the point in doing this.

I must have missed that. How is the HTTP connector performing?

p

On Wed, 2010-03-03 at 12:00 +0100, Pid wrote:
On 03/03/2010 09:11, Taylan Develioglu wrote:
Downgrading to 1.6.0_16 did not help. I'm replacing the APR connector with HTTP now.

As Chuck mentioned in the other thread, significant changes occurred at 1.6.10, so trying the release before (1.6.7) might be necessary to establish a better determination.

p

On Wed, 2010-02-24 at 14:52 +0100, Carl wrote:
Taylan,

The failures we've seen occur anywhere between 8 hours and a week into runtime.

The timing of the failures seems similar.

We have also had failures with hotspot error files (hs_err) present, and the cause specified was indeed SIGSEGV indicating a page fault.

I have never seen any hs_* files but have seen core files where strace showed the JVM stopped on a seg fault.

We also use JDK 1.6.0_18; I'm downgrading the machines to 1.6.0_16 when the situation allows (during regular updates of the application, or a crash) to see if that helps.

I have used JDK 1.6.0_17 and 1.6.0_18 with the same results... have not tried 1.6.0_16. Please post your results of this trial.

Running Tomcat in the foreground might show something, but then again I could be waiting for a month for it to happen.
Re: jvm exits without trace
A different kernel did not help either...

On Thu, 2010-03-11 at 11:37 +0100, Taylan Develioglu wrote:
Changing to JIO didn't help; the silent crashes continue. I'm changing kernel versions now.

On Fri, 2010-03-05 at 10:45 +0100, Taylan Develioglu wrote:
It's performing rather poorly compared to the APR connector. The number of threads required to handle the requests has gone up significantly across the board. Stability-wise, I don't have complaints yet. I'm keeping my fingers crossed.

On Fri, 2010-03-05 at 10:09 +0100, Pid wrote:
On 05/03/2010 08:41, Taylan Develioglu wrote:
Pid, that would assume we had a working 1.6.10 version before that we replaced.

That it would.

We've run 1.6.10 upwards successfully for a very long time. So I don't see the point in doing this.

I must have missed that. How is the HTTP connector performing?

p

On Wed, 2010-03-03 at 12:00 +0100, Pid wrote:
On 03/03/2010 09:11, Taylan Develioglu wrote:
Downgrading to 1.6.0_16 did not help. I'm replacing the APR connector with HTTP now.

As Chuck mentioned in the other thread, significant changes occurred at 1.6.10, so trying the release before (1.6.7) might be necessary to establish a better determination.

p

On Wed, 2010-02-24 at 14:52 +0100, Carl wrote:
Taylan,

The failures we've seen occur anywhere between 8 hours and a week into runtime.

The timing of the failures seems similar.

We have also had failures with hotspot error files (hs_err) present, and the cause specified was indeed SIGSEGV indicating a page fault.

I have never seen any hs_* files but have seen core files where strace showed the JVM stopped on a seg fault.

We also use JDK 1.6.0_18; I'm downgrading the machines to 1.6.0_16 when the situation allows (during regular updates of the application, or a crash) to see if that helps.

I have used JDK 1.6.0_17 and 1.6.0_18 with the same results... have not tried 1.6.0_16. Please post your results of this trial.
Re: jvm exits without trace
Hi Carl, thanks for the suggestion. I am going to try JVM 1.6.0_7 regardless of what I said before. Funny coincidence: I tried the IBM JVM as well and ran into a similar issue (part of our SSL implementation uses Sun-specific libraries).

On Thu, 2010-03-11 at 12:38 +0100, Carl wrote:
Taylan,

I am currently trying JVM 1.6.0_7 per Chuck's suggestion and, so far (4 days), it is working. I started down the IBM JVM path but have abandoned that for now due to difficulties with the SSL implementation (some browsers would work and some wouldn't with seemingly the same setup).

Thanks,
Carl

- Original Message -
From: Taylan Develioglu tdevelio...@ebuddy.com
To: Tomcat Users List users@tomcat.apache.org
Sent: Thursday, March 11, 2010 6:13 AM
Subject: Re: jvm exits without trace

A different kernel did not help either...

On Thu, 2010-03-11 at 11:37 +0100, Taylan Develioglu wrote:
Changing to JIO didn't help; the silent crashes continue. I'm changing kernel versions now.

On Fri, 2010-03-05 at 10:45 +0100, Taylan Develioglu wrote:
It's performing rather poorly compared to the APR connector. The number of threads required to handle the requests has gone up significantly across the board. Stability-wise, I don't have complaints yet. I'm keeping my fingers crossed.

On Fri, 2010-03-05 at 10:09 +0100, Pid wrote:
On 05/03/2010 08:41, Taylan Develioglu wrote:
Pid, that would assume we had a working 1.6.10 version before that we replaced.

That it would.

We've run 1.6.10 upwards successfully for a very long time. So I don't see the point in doing this.

I must have missed that. How is the HTTP connector performing?

p

On Wed, 2010-03-03 at 12:00 +0100, Pid wrote:
On 03/03/2010 09:11, Taylan Develioglu wrote:
Downgrading to 1.6.0_16 did not help. I'm replacing the APR connector with HTTP now.

As Chuck mentioned in the other thread, significant changes occurred at 1.6.10, so trying the release before (1.6.7) might be necessary to establish a better determination.
Re: jvm exits without trace
Sorry I wasn't clear. I didn't mean 2172 concurrent requests, just sessions. It hadn't occurred to me that the number of sessions does not necessarily equal the number of connections (duh). The number of established connections does indeed equal the number of threads, so what Chuck said was true.

On Tue, 2010-03-09 at 19:29 +0100, André Warnier wrote:
Taylan Develioglu wrote:
Chuck, if that is true how can we explain I see only 637 busy threads on a server that is serving 2172 clients?

Whoa! Can you give us your trick?

If every connection requires its own thread there should be 2172 threads.

Seriously now: when a thread is finished serving a request, there is still some time during which the response bytes are cascading through the network to the clients. I think you need to define "serving 2172 clients" a bit more precisely before you can say this, no?

On Tue, 2010-03-09 at 16:40 +0100, Caldarale, Charles R wrote:
From: Taylan Develioglu [mailto:tdevelio...@ebuddy.com]
Subject: RE: jvm exits without trace

where peak busy-threads used to be ~50 with APR, now it has become ~200 with JIO.

To be expected when you have unlimited keep-alives configured. Each HTTP connection requires a separate thread with JIO, whereas the NIO and APR connectors use a single poller thread.

- Chuck

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.

__
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email
__
RE: jvm exits without trace
The switch is from APR to JIO. SSL is practically unused. Almost all pages served are JSP or Java; very few static files are served, and keep-alive is on. Where peak busy threads used to be ~50 with APR, it has now become ~200 with JIO.

Here are the connector definitions for reference (no executor is used):

- APR:
<Connector port="80" protocol="org.apache.coyote.http11.Http11AprProtocol" compression="1024" keepAliveTimeout="6" maxKeepAliveRequests="-1" enableLookups="false" redirectPort="443" maxThreads="150" pollerSize="32768" />

- JIO:
<Connector port="80" protocol="org.apache.coyote.http11.Http11Protocol" compression="1024" connectionTimeout="1" keepAliveTimeout="6" maxKeepAliveRequests="-1" enableLookups="false" redirectPort="443" maxThreads="720" />

On Fri, 2010-03-05 at 19:13 +0100, Caldarale, Charles R wrote:
From: Christopher Schultz [mailto:ch...@christopherschultz.net]
Subject: Re: jvm exits without trace

I thought he said he was using APR, not NIO.

He was, but IIRC, switched away from it to see if that would affect the outages. What we don't know is what was switched to: JIO or NIO. If it's JIO, there may be a lot of threads tied up handling persistent HTTP connections, possibly causing heap or other resource problems.

- Chuck
RE: jvm exits without trace
Chuck, if that is true, how do we explain that I see only 637 busy threads on a server that is serving 2172 clients? If every connection requires its own thread, there should be 2172 threads.

On Tue, 2010-03-09 at 16:40 +0100, Caldarale, Charles R wrote:
From: Taylan Develioglu [mailto:tdevelio...@ebuddy.com]
Subject: RE: jvm exits without trace

where peak busy-threads used to be ~50 with APR, now it has become ~200 with JIO.

To be expected when you have unlimited keep-alives configured. Each HTTP connection requires a separate thread with JIO, whereas the NIO and APR connectors use a single poller thread.

- Chuck
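Chuck's point about blocking I/O can be reproduced outside Tomcat. The sketch below is an illustration only, not Tomcat's connector code: the class `BlockingKeepAliveDemo` and its methods are hypothetical, and it assumes a plain thread-per-connection server in which each handler blocks in `read()` while its client keeps the connection open without sending anything, which is roughly what an idle keep-alive connection looks like to a blocking connector.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; it models a blocking connector and is not Tomcat code.
class BlockingKeepAliveDemo {

    /** Opens idleClients idle connections and returns how many handler threads they tie up. */
    static int busyThreadsFor(int idleClients) throws Exception {
        AtomicInteger busy = new AtomicInteger();
        CountDownLatch allPickedUp = new CountDownLatch(idleClients);
        ExecutorService workers = Executors.newCachedThreadPool();
        List<Socket> clients = new ArrayList<>();
        InetAddress loopback = InetAddress.getLoopbackAddress();

        try (ServerSocket server = new ServerSocket(0, idleClients + 1, loopback)) {
            // Acceptor: hands every accepted connection to its own worker thread.
            Thread acceptor = new Thread(() -> {
                try {
                    while (true) {
                        Socket connection = server.accept();
                        workers.execute(() -> serve(connection, busy, allPickedUp));
                    }
                } catch (IOException serverClosed) {
                    // accept() fails once the server socket is closed; stop accepting.
                }
            });
            acceptor.setDaemon(true);
            acceptor.start();

            // Clients connect and then stay silent, like idle keep-alive connections.
            for (int i = 0; i < idleClients; i++) {
                clients.add(new Socket(loopback, server.getLocalPort()));
            }
            if (!allPickedUp.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("handlers did not start in time");
            }
            return busy.get();
        } finally {
            // Closing the clients makes each blocked read() return, freeing the workers.
            for (Socket client : clients) {
                client.close();
            }
            workers.shutdown();
        }
    }

    private static void serve(Socket connection, AtomicInteger busy, CountDownLatch pickedUp) {
        busy.incrementAndGet();
        pickedUp.countDown();
        try (Socket c = connection) {
            c.getInputStream().read(); // blocks until the client sends data or disconnects
        } catch (IOException ignored) {
            // a reset connection simply ends this handler
        } finally {
            busy.decrementAndGet();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("busy threads for 20 idle connections: " + busyThreadsFor(20));
    }
}
```

In this model the busy-thread count tracks open connections, not active requests. That matches the later observation in the thread that established connections, rather than sessions, were what equalled the thread count, and it is why unlimited keep-alives push the count up under JIO. A poller-based connector would instead park idle connections on a single selector thread.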
Re: jvm exits without trace
Pid, that would assume we had a working 1.6.10 version before that we replaced. We've run 1.6.10 upwards successfully for a very long time, so I don't see the point in doing this.

On Wed, 2010-03-03 at 12:00 +0100, Pid wrote:
On 03/03/2010 09:11, Taylan Develioglu wrote:
Downgrading to 1.6.0_16 did not help. I'm replacing the APR connector with HTTP now.

As Chuck mentioned in the other thread, significant changes occurred at 1.6.10, so trying the release before (1.6.7) might be necessary to establish a better determination.

p

On Wed, 2010-02-24 at 14:52 +0100, Carl wrote:
Taylan,

The failures we've seen occur anywhere between 8 hours and a week into runtime.

The timing of the failures seems similar.

We have also had failures with hotspot error files (hs_err) present, and the cause specified was indeed SIGSEGV indicating a page fault.

I have never seen any hs_* files but have seen core files where strace showed the JVM stopped on a seg fault.

We also use JDK 1.6.0_18; I'm downgrading the machines to 1.6.0_16 when the situation allows (during regular updates of the application, or a crash) to see if that helps.

I have used JDK 1.6.0_17 and 1.6.0_18 with the same results... have not tried 1.6.0_16. Please post your results of this trial.

Running Tomcat in the foreground might show something, but then again I could be waiting for a month for it to happen.

Yes, this has been part of my problem, as anytime we change something, we have to wait a week for the server to fail. In one sense, I am fortunate that I have a little more flexibility than you. I have two servers (different hardware) but only need one in service at a time. Therefore, I always have one server I can test ideas on, although I have never been able to develop a meaningful stress test, i.e., the only way I can test a change is to put it in production.
Thanks,
Carl

- Original Message -
From: Taylan Develioglu tdevelio...@ebuddy.com
To: Tomcat Users List users@tomcat.apache.org
Sent: Wednesday, February 24, 2010 8:31 AM
Subject: Re: jvm exits without trace

Hello Carl,

The failures we've seen occur anywhere between 8 hours and a week into runtime. Most of the machines have (still) been running for almost a month without failure. There are ~100 machines. Off the top of my head, I think we've had about 10+ failures now.

We have also had failures with hotspot error files (hs_err) present, and the cause specified was indeed SIGSEGV indicating a page fault. But I don't know if the two are related.

We also use JDK 1.6.0_18; I'm downgrading the machines to 1.6.0_16 when the situation allows (during regular updates of the application, or a crash) to see if that helps.

It might be useful to note that the failures happen with Tomcat 6.0.20 as well as 6.0.24.

As far as load is concerned, I haven't had a failure on an idle machine. The machines are well loaded, but only at a fraction of their limit with regard to load and CPU utilization. Most memory is committed to Tomcat: a 24G machine would have 18G allocated to heap, 128M to permgen, and some unspecified amount used by JNI for APR. About 4G remains free after taking the JVM itself into account. A 16G machine would have 12G allocated to the heap.

Besides the fact that our apps heavily use NIO and MINA, I wouldn't say there's anything else noteworthy. There can be anywhere up to 1 concurrents on one machine.

I had searched for core dumps, but no luck. Running Tomcat in the foreground might show something, but then again I could be waiting for a month for it to happen.

On Wed, 2010-02-24 at 12:42 +0100, Carl wrote:
Taylan,

I am the person who started the "Tomcat dies suddenly" thread, which I still haven't resolved. I am curious about the pattern of failures you are experiencing because they may provide some clues to my problem.
In my case, the system will run for 15 minutes to 10 days before failing (most of the time it is several days to a week). It appears to die from a seg fault in the JVM (I am using Sun 1.6.0_18 but have tried previous versions)... you may be able to see the cause of the failure from the core file (the core files on my systems were in several directories, so you may have to do a 'find' to locate them).

Load may be a factor, but the failures generally come after the load has been heavy for a while. I am running a couple of applications, and it seems the failures are more frequent when people are hitting the additional apps (the primary app is always used; the remaining apps are used sporadically).

How does this compare to what you are experiencing?

Thanks,
Carl

- Original Message -
From: Taylan Develioglu tdevelio...@ebuddy.com
To: Tomcat Users List users@tomcat.apache.org; p...@pidster.com
Sent: Wednesday, February 24, 2010 5
Re: jvm exits without trace
It's performing rather poorly compared to the APR connector. The number of threads required to handle the requests has gone up significantly across the board. Stability-wise, I don't have complaints yet. I'm keeping my fingers crossed.

On Fri, 2010-03-05 at 10:09 +0100, Pid wrote:
On 05/03/2010 08:41, Taylan Develioglu wrote:
Pid, that would assume we had a working 1.6.10 version before that we replaced.

That it would.

We've run 1.6.10 upwards successfully for a very long time. So I don't see the point in doing this.

I must have missed that. How is the HTTP connector performing?

p

On Wed, 2010-03-03 at 12:00 +0100, Pid wrote:
On 03/03/2010 09:11, Taylan Develioglu wrote:
Downgrading to 1.6.0_16 did not help. I'm replacing the APR connector with HTTP now.

As Chuck mentioned in the other thread, significant changes occurred at 1.6.10, so trying the release before (1.6.7) might be necessary to establish a better determination.

p

On Wed, 2010-02-24 at 14:52 +0100, Carl wrote:
Taylan,

The failures we've seen occur anywhere between 8 hours and a week into runtime.

The timing of the failures seems similar.

We have also had failures with hotspot error files (hs_err) present, and the cause specified was indeed SIGSEGV indicating a page fault.

I have never seen any hs_* files but have seen core files where strace showed the JVM stopped on a seg fault.

We also use JDK 1.6.0_18; I'm downgrading the machines to 1.6.0_16 when the situation allows (during regular updates of the application, or a crash) to see if that helps.

I have used JDK 1.6.0_17 and 1.6.0_18 with the same results... have not tried 1.6.0_16. Please post your results of this trial.

Running Tomcat in the foreground might show something, but then again I could be waiting for a month for it to happen.

Yes, this has been part of my problem, as anytime we change something, we have to wait a week for the server to fail. In one sense, I am fortunate that I have a little more flexibility than you.
I have two servers (different hardware) but only need one in service at a time. Therefore, I always have one server I can test ideas on although I have never been able to develop a meaningful stress test, i.e., the only way I can test a change is to put it in production. Thanks, Carl - Original Message - From: Taylan Develioglutdevelio...@ebuddy.com To: Tomcat Users Listusers@tomcat.apache.org Sent: Wednesday, February 24, 2010 8:31 AM Subject: Re: jvm exits without trace Hello Carl, The failures we've seen are in anywhere between 8 hours to a week of runtime. Most of them have (still) been running for almost a month without failure. There are ~100 machines. From the top of my head, I think we've had about 10+ failures now. We have also had failures with hotspot error files (hs_err) present, and the cause specified was indeed SIGSEGV indicating a page fault. But I don't know if the two are related. We also use jdk 1.6.0_18, I'm downgrading the machines to 1.6.0_16 when the situation allows (during regular updates of the application, or a crash) to see if that helps. It might be useful to note that the failures happen with tomcat 6.0.20 as well as 6.0.24. As far as load concerns, I haven't had a failure on an idle machines. The machines are well loaded, but only at a fraction limit in regards to load and cpu utilization. Most memory is commited to tomcat, where a 24G machine would have 18G allocated to heap, 128M to permgen and some unspecified amount would get used by jni for apr. About 4G remains free after calculating taking into account the jvm itsself. A 16G machine would have 12G allocated to the heap. Besides the fact that our apps heavily use nio and mina I wouldn't say there's anything else noteworthy. There can be anywhere up to 1 concurrents on one machine. I had searched for coredumps, but no luck. Running tomcat on the foreground might show something, but then again I could be waiting for a month for it to happen. 
On Wed, 2010-02-24 at 12:42 +0100, Carl wrote: Taylan, I am the person who started the Tomcat dies suddenly thread which I still haven't resolved. I am curious about the pattern of failures you are experiencing because they may provide some clues to my problem. In my case, the system will run for 15 minutes to 10 days before failing (most of the time it is several days to a week.) It appears to die from a seg fault in the JVM (I am using Sun 1.6.0_18 but have tried previous versions)... you may be able to see the cause of the failure from the core file (the core files on my systems were in several directories so you may have to do a 'find' to locate them.) Load may be a factor but the failures generally come after the load has been heavy for a while
Re: jvm exits without trace
Downgrading to 1.6.0_16 did not help. I'm replacing the apr connector with http now.

On Wed, 2010-02-24 at 14:52 +0100, Carl wrote: Taylan, The failures we've seen are in anywhere between 8 hours to a week of runtime. The timing of the failures seems similar. [...]
Re: [OT] jvm exits without trace
Chuck, I am aware. A SIGSEGV is a signal sent by the kernel, not a violation itself. It is sent when a process in userspace attempts an invalid memory access; in other words, a page fault occurs where the page is actually present in physical memory but cannot be accessed by the program. When this kind of violation occurs, the kernel sends a SIGSEGV to the violating program.

At least that's what 'Understanding the Linux Kernel' leads me to believe (chapter on the process address space, page fault exception handler, p. 376-378).

On Thu, 2010-02-25 at 22:36 +0100, Christopher Schultz wrote:

Taylan,

On 2/24/2010 8:31 AM, Taylan Develioglu wrote: We have also had failures with hotspot error files (hs_err) present, and the cause specified was indeed SIGSEGV indicating a page fault. But I don't know if the two are related.

Just to be clear, a SIGSEGV is a segmentation violation (memory read outside process space), not a page fault, which is a perfectly normal thing to occur during execution. The latter is a virtual memory matter handled by the operating system and should be transparent (other than a delay) to the application.

http://en.wikipedia.org/wiki/Segmentation_violation
http://en.wikipedia.org/wiki/Page_fault

The Wikipedia page for Page fault does indicate that Invalid page fault is a term that essentially means null pointer dereference, but I've never heard that term used, ever.

-chris
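For anyone wanting to see how a SIGSEGV death looks from the outside, here is a minimal sketch, assuming a POSIX shell on Linux. It does not involve the JVM at all: a throwaway child shell sends the signal to itself and the parent inspects the exit status, which is how a launcher script could tell "killed by signal" apart from a normal exit. The variable names are only illustrative.

```shell
#!/bin/sh
# Start a throwaway child shell that sends SIGSEGV to itself, and capture
# the exit status the parent sees (written so it also works under set -e).
status=0
sh -c 'kill -s SEGV $$' || status=$?

# A shell reports "killed by signal N" as an exit status above 128.
# On Linux SIGSEGV is signal 11, so the status here is 128 + 11 = 139.
if [ "$status" -gt 128 ]; then
    signal_number=$((status - 128))
    echo "child was killed by signal $signal_number ($(kill -l "$signal_number"))"
else
    echo "child exited normally with status $status"
fi
```

A wrapper that starts the JVM in the foreground could log the same status when the process disappears.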
Re: jvm exits without trace
Hi Chris,

There's no doubt about it. The amount free is what's left after everything is taken into account: heap, jvm, jni, permgen. And trust me, I'd like it to be the oom killer, but it's not.

They could survive, but then I could throw away half of my ram. I don't see any point in doing that (it doesn't fix the problem).

On Thu, 2010-02-25 at 22:38 +0100, Christopher Schultz wrote:

Taylan,

On 2/24/2010 8:31 AM, Taylan Develioglu wrote: Most memory is committed to tomcat, where a 24G machine would have 18G allocated to heap, 128M to permgen and some unspecified amount would get used by jni for apr. About 4G remains free after taking the jvm itself into account. A 16G machine would have 12G allocated to the heap.

Are you sure the rest of the JVM can fit into this space? I've heard of JVMs (particularly on Windows) that take a significant chunk of memory on top of the heap space requested on the command-line. Definitely check your system logs for OOM killer, here.

What happens if you cut your heap in half? Can each machine in your (probably) cluster survive with less heap space?

-chris
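The "check your system logs for OOM killer" step can be sketched as a small script. This is only an illustration: the function name count_oom_events is made up, the search patterns are the phrases the Linux kernel commonly logs when the OOM killer acts, and the default log path is an assumption (on some distributions kernel messages land in a different file).

```shell
#!/bin/sh
# count_oom_events FILE: print how many lines in FILE look like Linux
# OOM-killer activity (case-insensitive match on common kernel phrases).
count_oom_events() {
    grep -ciE 'oom-killer|out of memory|killed process' "$1"
}

# Default path is an assumption; pass another log file as the first argument.
log_file=${1:-/var/log/messages}
if [ -r "$log_file" ]; then
    echo "$log_file: $(count_oom_events "$log_file") OOM-related line(s)"
else
    echo "$log_file is not readable" >&2
fi
```

A count of zero across all rotated logs supports the conclusion above that the OOM killer is not involved.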
Re: [OT] jvm exits without trace
For some reason I keep calling you Chuck... I hope I'm not offending anyone :O

On Fri, 2010-02-26 at 13:55 +0100, Taylan Develioglu wrote: Chuck, I am aware. A SIGSEGV is a signal sent by the kernel, not a violation itself. [...]
jvm exits without trace
Hi,

I have JVMs, running tomcat and our application, exiting mysteriously, and was wondering if anyone could give me some advice on how to debug this thing.

There is nothing in catalina.out, nor our application logs, and no hotspot error file. GC log looks normal. No trace in system logs.

I am left completely clueless :(, has anyone dealt with a problem like this before? Any help appreciated.

- Tomcat 6.0.24
- TC native 1.1.18
- APR 1.3.9
- Sun JDK 6u18
- Debian Lenny, 2.6.31.10-amd64

2 servlets, one as ROOT. 2 HTTP connectors that use TCNative/APR.

JAVA_OPTS ( ):

-verbose:gc
-Djava.awt.headless=true
-Dsun.net.inetaddr.ttl=60
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=$TMP_DIR
-Djava.library.path=/usr/local/lib
-Djava.endorsed.dirs=$CATALINA_BASE/endorsed
-Dcatalina.base=$CATALINA_BASE
-Dcatalina.home=$CATALINA_HOME
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.util.logging.config.file=$CATALINA_BASE/conf/logging.properties
-XX:+PrintGCDetails
-Xloggc:$CATALINA_BASE/logs/gc.log
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=70
-Xms$JAVA_HEAP_SIZE
-Xmx$JAVA_HEAP_SIZE
-XX:NewSize=$JAVA_EDEN_SIZE
-XX:MaxNewSize=$JAVA_EDEN_SIZE
-XX:PermSize=$JAVA_PERM_SIZE
-XX:MaxPermSize=$JAVA_PERM_SIZE
-Xss$JAVA_STCK_SIZE
-XX:+UseLargePages
Re: jvm exits without trace
I thought I'd add the connector definitions too:

<Connector port="80" protocol="org.apache.coyote.http11.Http11AprProtocol" compression="1024" keepAliveTimeout="6" maxKeepAliveRequests="-1" enableLookups="false" redirectPort="443" maxThreads="150" pollerSize="32768" pollerThreadCount="4"/>

<Connector port="443" protocol="org.apache.coyote.http11.Http11AprProtocol" SSLEnabled="true" enableLookups="false" maxThreads="10" scheme="https" secure="true" SSLCertificateFile="/etc/ssl/private/something.crt" SSLCertificateKeyFile="/etc/ssl/private/something.key" SSLCACertificateFile="/etc/ssl/certs/ca.crt"/>

On Wed, 2010-02-24 at 10:23 +0100, Taylan Develioglu wrote: Hi, I have jvm's, running tomcat and our application, exiting mysteriously, and was wondering if anyone could give me some advice on how to debug this thing. [...]
Re: jvm exits without trace
The GC log shows plenty of heap space left in all the spaces.

I purposely didn't bother replacing the variables because I figured they would not be relevant. But if you think they might provide clues they're as follows:

JAVA_HEAP_SIZE=18432M
JAVA_EDEN_SIZE=$(($(echo $JAVA_HEAP_SIZE|sed 's/M$\|G$//')/6))M
JAVA_PERM_SIZE=128M
JAVA_STCK_SIZE=128K

EDEN_SIZE is 1/6th of total heap.

And I said there was nothing in the system logs. But you get a couple of points for trying.

On Wed, 2010-02-24 at 10:44 +0100, Pid wrote:

On 24/02/2010 09:36, Taylan Develioglu wrote: I thought I'd add the connector definitions too [...]

There are no actual heap size settings in the above. But you get a couple of points for trying.

Google Linux Out Of Memory killer or OOM Killer and then check the server logs carefully (e.g. /var/log/messages).

p
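The sizing variables quoted above can be evaluated on their own to see what the JVM actually receives. The following is a small sketch, not the poster's startup script: it reuses the JAVA_HEAP_SIZE value from the message, while the heap_number helper variable and the portable sed pattern are my own substitutions (the `\|` alternation in the original expression is a GNU sed extension).

```shell
#!/bin/sh
JAVA_HEAP_SIZE=18432M

# Strip a trailing M or G so the value can be used in shell arithmetic.
heap_number=$(echo "$JAVA_HEAP_SIZE" | sed 's/[MG]$//')

# The new generation is sized at one sixth of the heap; the unit is always M,
# exactly as in the quoted definition.
JAVA_EDEN_SIZE=$((heap_number / 6))M

echo "heap=$JAVA_HEAP_SIZE eden=$JAVA_EDEN_SIZE"
```

For 18432M this gives 3072M. One caveat that follows from the expression itself: because the suffix is stripped but M is always appended, a heap written as 18G would produce 3M rather than 3G.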
Re: jvm exits without trace
Thank you Konstantin, I've read the thread you mentioned.

I should have mentioned that the mysterious exit happens on several different servers with different hardware and configuration, so it's very unlikely to be caused by a hardware issue. It's also not the oom killer; as I mentioned before, I already investigated those possibilities.

I'm suspecting jni with tomcat native and apr now. I believe native code outside the jvm could very well cause a crash like this, but my ignorance on the subject isn't helping.

I've had strange behavior with libapr 1.3 and apache on machines with debian 5.0 that synchronize their clock using clock slew (ntpdate), and I decreased the ntpdate frequency to see if that helps (as you can tell I'm getting a bit desperate).

On Wed, 2010-02-24 at 11:28 +0100, Konstantin Kolinko wrote:

2010/2/24 Taylan Develioglu tdevelio...@ebuddy.com: Hi, I have jvm's, running tomcat and our application, exiting mysteriously, and was wondering if anyone could give me some advice on how to debug this thing. There is nothing in catalina.out, nor our application logs, and no hotspot error file. GC log looks normal. No trace in system logs. I am left completely clueless :(, has anyone dealt with a problem like this before?

There is currently a thread named Tomcat dies suddenly. Look there for starters. While that is unlikely your case, most ideas of diagnosing such an issue are mentioned in the first dozen of messages of that thread.

http://marc.info/?t=12632496092r=1w=2
http://marc.info/?t=12633901125r=1w=2
http://marc.info/?t=12647949758r=6w=2
http://marc.info/?t=12660960545r=1w=2

Best regards, Konstantin Kolinko
Re: jvm exits without trace
Hello Carl,

The failures we've seen happen anywhere between 8 hours and a week of runtime. Most of them have (still) been running for almost a month without failure. There are ~100 machines. Off the top of my head, I think we've had about 10+ failures now.

We have also had failures with hotspot error files (hs_err) present, and the cause specified was indeed SIGSEGV indicating a page fault. But I don't know if the two are related.

We also use jdk 1.6.0_18. I'm downgrading the machines to 1.6.0_16 when the situation allows (during regular updates of the application, or a crash) to see if that helps.

It might be useful to note that the failures happen with tomcat 6.0.20 as well as 6.0.24.

As far as load is concerned, I haven't had a failure on an idle machine. The machines are well loaded, but only at a fraction of their limit with regard to load and cpu utilization.

Most memory is committed to tomcat, where a 24G machine would have 18G allocated to heap, 128M to permgen and some unspecified amount would get used by jni for apr. About 4G remains free after taking the jvm itself into account. A 16G machine would have 12G allocated to the heap.

Besides the fact that our apps heavily use nio and mina I wouldn't say there's anything else noteworthy. There can be anywhere up to 1 concurrents on one machine.

I had searched for coredumps, but no luck. Running tomcat in the foreground might show something, but then again I could be waiting for a month for it to happen.

On Wed, 2010-02-24 at 12:42 +0100, Carl wrote:

Taylan,

I am the person who started the Tomcat dies suddenly thread which I still haven't resolved. I am curious about the pattern of failures you are experiencing because they may provide some clues to my problem.

In my case, the system will run for 15 minutes to 10 days before failing (most of the time it is several days to a week.) It appears to die from a seg fault in the JVM (I am using Sun 1.6.0_18 but have tried previous versions)... you may be able to see the cause of the failure from the core file (the core files on my systems were in several directories so you may have to do a 'find' to locate them.)

Load may be a factor but the failures generally come after the load has been heavy for a while. I am running a couple of applications and it seems the failures are more frequent when people are hitting the additional apps (the primary app is always used, the remaining apps are used sporadically.)

How does this compare to what you are experiencing?

Thanks, Carl
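Carl's suggestion of using find to locate core files could look roughly like the sketch below. The function name find_core_files is made up, and the file names searched for ("core" and "core.*") are an assumption about the local core pattern. The script also prints the core size limit of the current shell, since a limit of 0 means no core file is written at all.

```shell
#!/bin/sh
# find_core_files DIR: list files named "core" or "core.*" under DIR that
# were modified within the last 7 days, without crossing filesystems.
find_core_files() {
    find "$1" -xdev -type f \( -name core -o -name 'core.*' \) -mtime -7 2>/dev/null
}

# A limit of 0 would explain a search that turns up nothing.
echo "core file size limit: $(ulimit -c)"

# Search the directory given as the first argument, if there is one.
if [ -n "${1:-}" ]; then
    find_core_files "$1"
fi
```

The limit that matters is the one in effect for the tomcat process, so it would have to be checked or raised in the script that starts it.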
Re: jvm exits without trace
It's possible, I'm going to try an earlier jvm first. u16 was the previous one running in production, so I will try moving back to u16. If that fails, removing APR is the next thing to try out. After that I'm going to try beating the dev team with a stick (I know you're reading this!).

This is incredibly frustrating, thanks for all the help.

Can you disable APR, use the alternative SSL configuration or is that not possible? Also, would it be possible to use an earlier 1.6 JVM* or perhaps even a completely different one? I can't remember, offhand, what (if any) results Carl had with other JVMs.

p

* Perhaps there's a subtle bug in recent releases of the JVM.
Re: jvm exits without trace
I'll be sure to post an update if u16 resolves it. Or any other progress for that matter. In the meantime don't be shy either :)

On Wed, 2010-02-24 at 14:52 +0100, Carl wrote:

Taylan,

The failures we've seen are in anywhere between 8 hours to a week of runtime.

The timing of the failures seems similar.

We have also had failures with hotspot error files (hs_err) present, and the cause specified was indeed SIGSEGV indicating a page fault.

I have never seen any hs_* files but have seen core files where strace showed the jvm stopped on a seg fault.

We also use jdk 1.6.0_18, I'm downgrading the machines to 1.6.0_16 when the situation allows (during regular updates of the application, or a crash) to see if that helps.

I have used jdk 1.6.0_17 and 1.6.0_18 with the same results... have not tried 1.6.0_16. Please post your results of this trial.

Running tomcat on the foreground might show something, but then again I could be waiting for a month for it to happen.

Yes, this has been part of my problem as anytime we change something, we have to wait a week for the server to fail. In one sense, I am fortunate that I have a little more flexibility than you. I have two servers (different hardware) but only need one in service at a time. Therefore, I always have one server I can test ideas on although I have never been able to develop a meaningful stress test, i.e., the only way I can test a change is to put it in production.

Thanks, Carl
Re: CLOSE_WAIT and what to do about it
Skimmed quickly through your post there while working, so forgive me if this is irrelevant.

CLOSE_WAIT is a state where the remote side has closed the connection at the tcp/ip level, but the local application (in this case java) has not closed the socket descriptor yet.

As a coincidence we just fixed this very same issue in our application, which uses the httpclient library. There is a known issue with the httpclient library where sockets are not closed after the connection ends (issue or feature, you be the judge). We worked around this by explicitly calling a close ourselves. If httpclient is used, that could be the culprit.

See http://www.nabble.com/tcp-connections-left-with-CLOSE_WAIT-td13757202.html for a better description.

Rgds, Taylan

André Warnier wrote:

Hi. As a follow-up on another thread originally entitled apache/tomcat communication issues (502 response), I'd like to pursue the CLOSE_WAIT subject. Sorry if this post is a bit long, I want to make sure that I do provide all the necessary information.

Like the original poster, I am seeing on my systems a fair number of sockets apparently stuck for a long time in the CLOSE_WAIT state. (Sometimes several hundreds of them). They seem to predominantly concern Tomcat and other java processes, but as Alan pointed out previously and I confirm, my perspective is slanted, because we use a lot of common java programs and webapps on our servers, and the ones mostly affected talk to each other and come from the same vendor. Unfortunately also, I do not have the sources of these programs/webapps available, and will not get them, and I can't do without these programs.

It has been previously established that a socket in a long-time-lingering CLOSE_WAIT status is due to one or the other side of a TCP connection not properly closing its side of the connection when it is done with it. I also surmise (without having a definite proof of this) that this is essentially bad, as it ties up some resources that could be otherwise freed.
> I have also been told or discovered that, our servers being Linux Debian servers, programs such as ps, netstat and lsof can help in determining precisely how many such lingering sockets there are, and who the culprit processes are (to some extent).
>
> In our case, we know which programs are involved, because we know which ones open a listening socket and on what fixed port, and we also know which other processes talk to them. But, as mentioned previously, we do not have the source of these programs and will not get it, and we cannot practically do without them for now. We do, however, have full root control of the Linux servers where these programs are running.
>
> So my question is: considering the situation above, is there something I can do locally to free these lingering CLOSE_WAIT sockets, and under which conditions? (I must admit that I am a bit lost among the myriad options of lsof.)
>
> For example, suppose I start with a "netstat -pan" command and I see the display below (sorry for the line-wrapping). I see a number of sockets in the CLOSE_WAIT state, and for those I have a process-id, which I can associate with a particular process. For example, I see this line:
>
>   tcp6 12 0 :::127.0.0.1:41764 :::127.0.0.1:11002 CLOSE_WAIT 29649/java
>
> which tells me that there is a local process 29649/java, with a local socket on port 41764 in the CLOSE_WAIT state, related to another socket, #11002, on the same host. On the other hand, I see this line:
>
>   tcp 0 0 127.0.0.1:11002 127.0.0.1:41764 FIN_WAIT2 -
>
> which shows a local socket on port 11002, related to this other local socket on port #41764, with no process-id/program displayed. What does that tell me?
>
> I also know that the process-id 29649 corresponds to a local Java process, of the daemon variety, multi-threaded.
> That program talks to another known server program, written in C, of which instances are started on an ad-hoc basis by inetd, and which listens on port 11002 (in fact it is inetd that listens, and it passes this socket on to the process it forks; I understand that).
>
> (The link with Tomcat is that I also frequently see the same situation where the process owning the CLOSE_WAIT socket is Tomcat, more specifically one webapp running inside it. It's just that in this particular snapshot it isn't.)
>
> What it looks like to me in this case is that at some point one of the threads of process #29649 opened a client socket #41764 to the local inetd port #11002; that inetd then started the underlying server process (the C program); that the underlying C program then at some point exited; but that process #29649 never closed its side of the connection with port #11002.
>
> Can I somehow detect this condition, and force the offending thread of process #29649 to close that socket (or just force this thread to exit)?
>
> I realise this may be a complex question, and that the answers may be
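As a rough illustration of the detection half of the question above, the following sketch counts CLOSE_WAIT sockets per owning process from "netstat -pan"-style output. It is only a sketch under assumptions: the class name CloseWaitCounter and its method are made up for this example, and the column layout (proto, recv-q, send-q, local, foreign, state, pid/program) is taken from the two sample lines quoted in the post. It does not close anything; closing still has to happen inside the owning application.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Tomcat or any tool mentioned in the thread.
public class CloseWaitCounter {

    /** Counts CLOSE_WAIT sockets per "pid/program" owner in netstat -pan style text. */
    public static Map<String, Integer> countCloseWait(String netstatOutput) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatOutput.split("\\R")) {
            String[] fields = line.trim().split("\\s+");
            // Assumed columns: proto recv-q send-q local foreign state pid/program
            if (fields.length < 7 || !fields[0].startsWith("tcp")) {
                continue; // header lines, UDP, unix sockets, etc.
            }
            if (!"CLOSE_WAIT".equals(fields[5])) {
                continue; // only interested in sockets the local side has not closed
            }
            counts.merge(fields[6], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The two lines quoted in the post above.
        String sample =
              "tcp6 12 0 :::127.0.0.1:41764 :::127.0.0.1:11002 CLOSE_WAIT 29649/java\n"
            + "tcp 0 0 127.0.0.1:11002 127.0.0.1:41764 FIN_WAIT2 -\n";
        System.out.println(countCloseWait(sample)); // prints {29649/java=1}
    }
}
```

Feeding it the real output of netstat (run as root so the pid/program column is filled in) would show which processes accumulate such sockets over time.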
Re: CPU usage with APR and connectionTimeout impact
Funny, according to the documentation there is no connectionTimeout attribute for the APR connector. Setting the value to '0' could mean all sorts of behavior; there is no way to know for sure short of checking the code (it could mean the connector will not wait for the URI line at all).

I can't comment on a correct value for your application. Setting it to a low value will have the connector thread return to the pool faster on connections where the peer has gone to lunch after the initial connection. This only matters if you have a large number of such peers.

I'm sure one of the veterans here can clear this up for you.

> Hello,
>
> In my project, we are using Tomcat 6.0.18, with APR 1.2.12 and tc native 1.1.14 on a Redhat OS (Linux kernel 2.6.18).
>
> There is a behavior that I can't explain:
> - with connectionTimeout=0, the tomcat process uses a huge percentage of CPU, even if there is no traffic, but we don't observe any problem and the response time is good.
> - with connectionTimeout=5000, the tomcat process uses a normal percentage of CPU when there is no traffic.
> - without APR and with connectionTimeout=0, the tomcat process uses a normal percentage of CPU when there is no traffic.
>
> After different searches on the web, the Tomcat manual and the mailing lists, I can't find the reason for the link between CPU usage and connectionTimeout/keepAliveTimeout with APR.
>
> With the previous release of Tomcat (5.5) and APR, we have a similar CPU usage (without traffic, high CPU load), and when we modify another parameter (firstReadTimeout), the behavior also changes in the same way.
>
> I know there is no real trouble, but I'm curious and prudent: I don't like to do something when I don't understand what is hidden behind it.
>
> Could somebody explain to me why Tomcat/APR has these behaviors? Is there a performance risk in setting connectionTimeout to 5000?
>
> Thank you for your answers.
>
> Yann
Re: CPU usage with APR and connectionTimeout impact
You're right. I missed it. APR has the same attributes as the HTTP connector. I think a separate overview of attributes per connector would be clearer.

The HTTP connectionTimeout description states:

  The number of milliseconds this Connector will wait, after accepting a connection, for the request URI line to be presented. The default value is 60000 (i.e. 60 seconds).

'0' is not explicitly defined as a special value. According to the description it would mean a wait period of 0 milliseconds for the URI to be presented. This would make the connector practically useless.

Caldarale, Charles R wrote:
> From: Taylan Develioglu [mailto:tdevelio...@ebuddy.com]
> Subject: Re: CPU usage with APR and connectionTimeout impact
>
> > according to the documentation there exists no connectionTimeout attribute for the apr connector.
>
> Which documentation is that? Note that the HTTP connector attributes apply when running in APR mode. Quoting from the APR-specific doc:
>
>   "The following attributes are supported in the HTTP APR connector in addition to the ones supported in the regular HTTP connector:"
>
> What's not clear in the doc is that many of the HTTP attributes also apply to the NIO version of the protocol handler.
>
> - Chuck
Re: very off topic marketing question
> and it pre-compiles all of its code before it runs each script

For starters, I'd point out that the JSP page compiler does this as well... Then redirect the person to this thread to get lynched.

(Seriously, aside from the lynching, this is a good idea.)
Re: tomcat w/apr data lost in http post request?
Possibly IE writes to the socket buffer in separate steps for the header info and the POST parameters. This would cause the data to be sent out in separate packets if Nagle's algorithm is off.

Caldarale, Charles R wrote:
> From: Christopher Schultz [mailto:ch...@christopherschultz.net]
> Subject: Re: tomcat w/apr data lost in http post request?
>
> > Can MSIE even control which data goes in which packet?
>
> TCP/IP APIs on most platforms allow the Nagle algorithm to be disabled, which will cause data to be sent out on each call. Most TCP/IP stacks also set the push flag on the last packet of a sequence to force the peer stack to deliver the data to the receiver without delay. That's probably all that IE is doing (but I don't know the MS APIs).
>
> - Chuck
Re: tomcat w/apr data lost in http post request?
Hi Chris,

Raising the keepalive-timeout value on the connector definitely improves the situation. From what I've gathered from what people posted here (thanks guys) and from dumping packets, I believe the situation to be roughly as follows:

With Nagle's algorithm off, IE sends the HTTP request in two separate packets. Somewhere between Tomcat's receipt of packet 1 (headers) and packet 2 (body/parameters) a timeout occurs, causing the contents of the second packet to be ignored. Raising keepalive-timeout alleviates the problem by reducing the chance of a timeout occurring.

Christopher Schultz wrote:
> Taylan,
>
> On 3/6/2009 4:05 AM, Taylan Develioglu wrote:
> > James, thank you very much. I suspected IE to be guilty because it was happening only with IE clients. Chris, I guess we don't need to try and reproduce this anymore now we know the cause?
>
> Well, you might want to figure out how to handle this situation. You can't simply ignore 80% of the potential clients out there :)
>
> -chris
Re: tomcat w/apr data lost in http post request?
Christopher Schultz wrote:
> Taylan,
>
> No, you'd need to modify the source. It's not particularly useful in most scenarios to intentionally stall an HTTP conversation, so it's not a built-in feature :)
>
> No, I'm saying that you should send exactly the right amount of data, but you should stall in the middle. For instance, set the Content-Length to 10 bytes, then send 5 bytes, then wait 10 or 20 seconds, and send the rest.

Ah, of course, I understand what you're saying now. We basically wait for the timeout to occur before we send the POST parameters. Could be done by socket.setTcpNoDelay(true), then writing to the socket and closing, then waiting and writing the other half, I think.

> Wow, I didn't realize that browsers would keep an HTTP connection open to a web server for 10 idle seconds. That seems like a really long time.

Actually, the default for IE is even 60 seconds (idle) on a keepalive connection.
Re: tomcat w/apr data lost in http post request?
Hi Andre,

I meant to stop writing, not closing the socket. Poor choice of words, apologies.

André Warnier wrote:
> Taylan Develioglu wrote:
> > Christopher Schultz wrote:
> > > No, you'd need to modify the source. It's not particularly useful in most scenarios to intentionally stall an HTTP conversation, so it's not a built-in feature :)
> > >
> > > No, I'm saying that you should send exactly the right amount of data, but you should stall in the middle. For instance, set the Content-Length to 10 bytes, then send 5 bytes, then wait 10 or 20 seconds, and send the rest.
> >
> > Ah, of course, I understand what you're saying now. We basically wait for the timeout to occur before we send the POST parameters. Could be done by socket.setTcpNoDelay(true), then writing to the socket and closing, then waiting and writing the other half, I think.
>
> No, I don't think you want to close. Send the first 5 bytes, then wait, then send the rest. Then maybe close (but only the sending side of the socket). If you close the connection totally (including the receiving side), you will provoke an error at the server side.
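For anyone wanting to try the reproduction discussed in this thread (declare a Content-Length, send part of the body, stall, then send the rest without closing in between), a minimal sketch could look like the one below. The class name StalledPost, the host, the path and the body are all made-up placeholders; the method writes to any OutputStream, so it can be pointed at a real socket with Nagle's algorithm turned off via Socket.setTcpNoDelay(true), or at an in-memory buffer as in main.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical test client; none of these names come from the thread.
public class StalledPost {

    /** Writes a POST request whose body is sent in two halves with a pause in between. */
    public static void sendStalled(OutputStream out, String host, String path,
                                   String body, long stallMillis)
            throws IOException, InterruptedException {
        byte[] payload = body.getBytes(StandardCharsets.US_ASCII);
        String head = "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: " + payload.length + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
        int half = payload.length / 2;

        out.write(head.getBytes(StandardCharsets.US_ASCII));
        out.write(payload, 0, half);           // first part of the body
        out.flush();

        Thread.sleep(stallMillis);             // stall; the socket stays open

        out.write(payload, half, payload.length - half); // remainder of the body
        out.flush();
    }

    public static void main(String[] args) throws Exception {
        // Against a real server one would do something like:
        //   try (java.net.Socket s = new java.net.Socket("localhost", 8080)) {
        //       s.setTcpNoDelay(true);
        //       sendStalled(s.getOutputStream(), "localhost", "/test", "name=value", 20000);
        //   }
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        sendStalled(sink, "localhost", "/test", "name=value", 10);
        System.out.println(new String(sink.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```

Choosing a stall longer than the connector's timeout should show whether the second half of the body is dropped, which is the behavior suspected earlier in the thread.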
Re: Effect of Heap Size on Performance?
Chris,

We have 100+ application servers in a load-balancing setup (application based, not Tomcat). If servers are removed from the load-balancing pool, the others need to be able to pick up the load, so the number of concurrent users is highly dynamic. You can imagine the problem if we keep the heap size to a minimum on every server. I'm talking about a fixed size here, of course.

I also don't think the relationship between number of objects and young GC duration is a linear one. Increasing the young generation leads to longer GCs. Increasing young to 682M on a 4G heap from its default size increased GC time approx. 3-4x (47ms average to 154ms average on one server), but it also decreased the number of GCs performed by 15-20x. So eventually, a larger heap size saved cycles, and subsequently increased throughput. At least for us.

I think it also gives short-lived objects (for example short sessions) a longer time to 'die out', so they won't be moved to the tenured generation, because survivor space is increased and GC frequency is decreased (can anyone confirm this?).

Christopher Schultz wrote:
> Taylan,
>
> On 3/5/2009 5:11 AM, Taylan Develioglu wrote:
> > I always hold this as a ground rule: Increase heapsize as much as possible as long as:
>
> My rule has always been to run with the smallest heap you can get away with. We ran our main production app in 64MB of heap (the default for our platform) for 4 years before we got our first OOME. Now we run it with a 192MB heap.
>
> A smaller heap means that you'll catch even small memory leaks faster. At least, that's my position.
>
> Surprisingly, Chuck hasn't responded (he usually has something to say about GC/heap myths), but I suspect he'd say something like "heap size itself has little effect on the GC's performance... it's really the number of objects that affect the performance."
>
> Granted, a larger heap invites more objects into it, but generational garbage collection is decent enough that the generations rarely grow to such a size that the app stalls while the GC runs.
>
> -chris
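The trade-off described above (each young collection takes longer, but there are far fewer of them) can be checked with a little arithmetic. The sketch below uses only the figures quoted in the message (47ms and 154ms average pauses, 15-20x fewer collections); the class and method names are made up, and it assumes total pause time is simply average pause multiplied by the number of collections.

```java
import java.util.Locale;

// Hypothetical helper for the back-of-the-envelope calculation.
public class GcPauseBudget {

    /**
     * Ratio of total young-GC pause time after the change to total pause time before it,
     * assuming total = average pause * number of collections.
     */
    public static double totalPauseRatio(double oldAvgMs, double newAvgMs,
                                         double frequencyReduction) {
        return (newAvgMs / oldAvgMs) / frequencyReduction;
    }

    public static void main(String[] args) {
        // Figures from the message: 47ms -> 154ms average, 15x to 20x fewer collections.
        System.out.println(String.format(Locale.ROOT, "%.2f", totalPauseRatio(47, 154, 15)));
        System.out.println(String.format(Locale.ROOT, "%.2f", totalPauseRatio(47, 154, 20)));
    }
}
```

Under that assumption the total time spent in young collections drops to roughly a fifth of what it was, even though each individual pause is about three times longer.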
Re: tomcat w/apr data lost in http post request?
> I would like to correct this, it seems to only happen with IE6/7.. maybe old firefox 2.0

It happens with different clients indeed.
RE: Record and simulate a web app
Guys,

I've been following this thread for a while now, but doesn't JMeter already do what you're trying to accomplish here? I've used JMeter's proxy to record and replay HTTP requests/responses before with success. Or am I missing something here?

Here's a link to some instructions:
http://jakarta.apache.org/jmeter/usermanual/jmeter_proxy_step_by_step.pdf

Rgds,
T

-----Original Message-----
From: Christopher Schultz [mailto:ch...@christopherschultz.net]
Sent: vrijdag 20 februari 2009 17:07
To: Tomcat Users List
Subject: Re: Record and simulate a web app

Youssef,

On 2/20/2009 10:45 AM, Youssef Mohammed wrote:
> Yeah I was thinking that the capture code would perfectly fit in some HTTP tunnel so that we can capture the whole thing coming out of the web server, what do you think?

Okay, I took a stack trace of my servlet's code right in the middle of the request (TC 5.5.26) and found this at the top (it's really my first delve into exactly what code gets executed before application code, so forgive me if you already knew this):

...blah...blah...blah...
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:874)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)

The blah blah blah part is all application code from there on up. ApplicationFilterChain.internalDoFilter calls the configured filters in order, starting with mine (which I of course configured first).

Note that I'm looking at the source for Tomcat 6.0.16 yet running 5.5.26. Stupid, I know, but the architecture hasn't changed /that/ much.

I started at the /bottom/, ignoring the socket stuff, and right there in Http11Processor.prepareResponse I find this:

    headers.setValue("Date").setString(date);

So, Tomcat post-processes the HTTP headers at the Connector level. Without writing your own /Connector/, you aren't going to be able to intercept the response properly. I was hoping to get away with a valve.

I suppose you could subclass, say, Http11Processor and, in your constructor, replace the outputBuffer class with a wrapper for InternalOutputBuffer. But this is getting a little messy for me. Since I don't need it, I'm not too concerned about getting it done. :(

If you figure out a way to capture the response, and determine how to uniquely identify requests to match the response you want to return, I can show you how to write a servlet that can re-play the previously-saved responses.

Good luck,
-chris
RE: Fatal error: Cleaner terminated abnormally
I wanted to let you know it worked. The System.exit() does get trapped.

java.lang.Error: Cleaner terminated abnormally
    at sun.misc.Cleaner$1.run(Cleaner.java:130)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.misc.Cleaner.clean(Cleaner.java:127)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:124)
Caused by: java.lang.Error: java.io.IOException: Broken pipe
    at sun.nio.ch.Util$SelectorWrapper$Closer.run(Util.java:97)
    at sun.misc.Cleaner.clean(Cleaner.java:125)
    ... 1 more
Caused by: java.io.IOException: Broken pipe
    at sun.nio.ch.EPollArrayWrapper.interrupt(Native Method)
    at sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:242)
    at sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:170)
    at sun.nio.ch.SelectorImpl.implCloseSelector(SelectorImpl.java:92)
    at java.nio.channels.spi.AbstractSelector.close(AbstractSelector.java:91)
    at sun.nio.ch.Util$SelectorWrapper$Closer.run(Util.java:95)
    ... 2 more
Exception in thread "Reference Handler" java.lang.SecurityException: Can't call System.exit()
    at com.emessenger.web.CustomSecurityManager.checkExit(CustomSecurityManager.java:22)
    at java.lang.Runtime.exit(Runtime.java:88)
    at java.lang.System.exit(System.java:906)
    at sun.misc.Cleaner$1.run(Cleaner.java:132)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.misc.Cleaner.clean(Cleaner.java:127)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:124)

What we see is that GC breaks and class unloading starts after the first full GC:

[Unloading class sun.reflect.GeneratedMethodAccessor100]
[Unloading class sun.reflect.GeneratedConstructorAccessor76]
[Unloading class sun.reflect.GeneratedConstructorAccessor80]
[Unloading class sun.reflect.GeneratedConstructorAccessor77]
[Unloading class sun.reflect.GeneratedMethodAccessor95]
[Unloading class sun.reflect.GeneratedMethodAccessor98]
[Unloading class sun.reflect.GeneratedConstructorAccessor78]
[Unloading class sun.reflect.GeneratedMethodAccessor106]
[Unloading class sun.reflect.GeneratedMethodAccessor91]
[Unloading class sun.reflect.GeneratedMethodAccessor105]
[Unloading class sun.reflect.GeneratedMethodAccessor85]

And then the Full GC madness begins:

[GC 4080947K(4177280K), 0.0603370 secs]
[GC 4090738K(4177280K), 0.0683390 secs]
[Full GC 4177280K->4081390K(4177280K), 16.2954960 secs]
[GC 4081673K(4177280K), 0.0607990 secs]
[GC 4097803K(4177280K), 0.0739870 secs]
[Full GC 4177279K->4081951K(4177280K), 16.2857450 secs]
[GC 4082100K(4177280K), 0.0614000 secs]
[GC 4101581K(4177280K), 0.0814330 secs]
[Full GC 4177279K->4082845K(4177280K), 16.2079870 secs]
[GC 4084452K(4177280K), 0.0628080 secs]
[GC 4106928K(4177280K), 0.0835720 secs]
[Full GC 4177279K->4083187K(4177280K), 16.3403530 secs]
[GC 4084203K(4177280K), 0.0627750 secs]
[GC 4101856K(4177280K), 0.0737540 secs]
[Full GC 4177278K->4083998K(4177280K), 16.2605530 secs]
[GC 4084493K(4177280K), 0.0632620 secs]
[GC 4107486K(4177280K), 0.0804700 secs]
[Full GC 4177278K->4084298K(4177280K), 16.3931240 secs]
[GC 4084500K(4177280K), 0.0633480 secs]
[Full GC 4177279K->4085842K(4177280K), 16.4017970 secs]
[GC 409K(4177280K), 0.0702090 secs]
[GC 4127816K(4177280K), 0.1089220 secs]

But it's still better than an instant shutdown.

-----Original Message-----
From: Caldarale, Charles R [mailto:chuck.caldar...@unisys.com]
Sent: donderdag 19 februari 2009 16:22
To: Tomcat Users List
Subject: RE: Fatal error: Cleaner terminated abnormally

> From: Taylan Develioglu [mailto:tdevelio...@ebuddy.com]
> Subject: Re: Fatal error: Cleaner terminated abnormally
>
> By trapping the exit call using security manager we hope to prevent Tomcat from closing down on a cleaner termination.

This is not likely to work, since the Cleaner is running this code as a privileged operation; if regular applications could trap those, I think there would be some serious security holes.

> Not sure what the side effects would be to keep running after a cleaner terminates (any idea).

The thread doing the System.exit() call is the reference handler; the JVM will not function properly if it's not running. The exception should have been logged and ignored, not result in JVM termination, but I suspect it will be difficult to convince Sun of that at this point.

> I forgot to say thanks for the response guys. Especially yours Chris, it was very helpful.

Odd, because Chris didn't participate in this thread...

- Chuck
Re: Fatal error: Cleaner terminated abnormally
We're trying a workaround now. By trapping the exit call using a security manager, we hope to prevent Tomcat from closing down on a cleaner termination.

Not sure what the side effects of continuing to run after a cleaner terminates would be (any idea?). Keeping fingers crossed.

I forgot to say thanks for the responses, guys. Especially yours, Chris, it was very helpful.

Rgds,
T
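For readers wondering what "trapping the exit call" amounts to, here is a minimal sketch. It is not the poster's actual CustomSecurityManager (only its name and exception message appear in this thread); the class name ExitTrappingSecurityManager is made up, and the allow-everything permission checks are an assumption made to keep the example short. On the Sun JDK 6 discussed here such a manager would be installed with System.setSecurityManager(...); note that SecurityManager is deprecated on current JDKs and installation is refused on recent releases, so the sketch only calls checkExit directly.

```java
import java.security.Permission;

// Hypothetical stand-in for the CustomSecurityManager seen in the stack traces.
@SuppressWarnings("removal")
public class ExitTrappingSecurityManager extends SecurityManager {

    /** Vetoes every System.exit() / Runtime.exit() call. */
    @Override
    public void checkExit(int status) {
        throw new SecurityException("Can't call System.exit()");
    }

    /** Assumption for this sketch: everything other than exit is allowed. */
    @Override
    public void checkPermission(Permission perm) {
        // no restriction
    }

    @Override
    public void checkPermission(Permission perm, Object context) {
        // no restriction
    }

    public static void main(String[] args) {
        ExitTrappingSecurityManager manager = new ExitTrappingSecurityManager();
        try {
            manager.checkExit(1); // what Runtime.exit() would consult once installed
        } catch (SecurityException e) {
            System.out.println("Exit trapped: " + e.getMessage());
        }
    }
}
```

As reported elsewhere in this thread, vetoing the exit keeps the process alive but leaves the Reference Handler thread dead, so this is at best a stopgap.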
Re: Fatal error: Cleaner terminated abnormally
This is bad news, but it was a long shot to begin with. I submitted a bug report, which is under review now.

And apologies for the name mix-up. Chuck is obviously a much prettier name :)

Caldarale, Charles R wrote:
> From: Taylan Develioglu [mailto:tdevelio...@ebuddy.com]
> Subject: Re: Fatal error: Cleaner terminated abnormally
>
> > By trapping the exit call using security manager we hope to prevent Tomcat from closing down on a cleaner termination.
>
> This is not likely to work, since the Cleaner is running this code as a privileged operation; if regular applications could trap those, I think there would be some serious security holes.
>
> > Not sure what the side effects would be to keep running after a cleaner terminates (any idea).
>
> The thread doing the System.exit() call is the reference handler; the JVM will not function properly if it's not running. The exception should have been logged and ignored, not result in JVM termination, but I suspect it will be difficult to convince Sun of that at this point.
>
> > I forgot to say thanks for the response guys. Especially yours Chris, it was very helpful.
>
> Odd, because Chris didn't participate in this thread...
>
> - Chuck
Fatal error: Cleaner terminated abnormally
Hi Guys,

Our application is a servlet running in a container in Tomcat standalone. It uses the following NIO connector definition:

<Connector port="80" protocol="org.apache.coyote.http11.Http11NioProtocol" connectionTimeout="65000" keepAliveTimeout="1" maxKeepAliveRequests="1000" redirectPort="443" maxThreads="2000"/>

Lately we've been experiencing a fatal error with Tomcat, related to GC, that causes it to stop and unload, which I hoped you could give some advice on.

I'm still unclear on what is causing the cleaner to terminate, but I guess that's more of a question for the Java forums (I cannot find anything related to Tomcat when I cross-reference).

Following the GC trail, it looks like an OOM situation (maybe a memory leak in our application; our heap size is 4GB). Is it normal behavior for Tomcat to destroy itself like this? Has anyone experienced a similar problem before? What are the usual causes for Tomcat to stop like this?

*Any* advice or feedback is welcome. Either way, thanks in advance.

Debian 4.0
Tomcat 6.0.18
Sun jdk 1.6.0.11

We use the following java options:

OPTS=
  -verbose:gc
  -Dsun.net.inetaddr.ttl=60
  -Dfile.encoding=UTF-8
  -Djava.io.tmpdir=$TMP_DIR
  -Djava.library.path=/usr/local/lib
  -Djava.endorsed.dirs=$CATALINA_BASE/endorsed
  -Dcatalina.base=$CATALINA_BASE
  -Dcatalina.home=$CATALINA_HOME
  -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
  -Djava.util.logging.config.file=$CATALINA_BASE/conf/logging.properties
  -XX:+UseConcMarkSweepGC
  -XX:+UseParNewGC
  -XX:+CMSIncrementalMode
  -Xms4096M
  -Xmx4096M
  -Xss128k
  -XX:PermSize=256M
  -XX:MaxPermSize=256M

--- catalina.out snippet

[GC 4052829K->3924296K(4177280K), 0.0519680 secs]
[GC 4060616K->3924100K(4177280K), 0.1517880 secs]
[GC 4060420K->3926867K(4177280K), 0.0883940 secs]
[GC 4062488K->3931589K(4177280K), 0.1008470 secs]
[GC 4067906K->3935097K(4177280K), 0.0931530 secs]
[GC 4071417K->3934946K(4177280K), 0.0787300 secs]
[GC 4029027K(4177280K), 0.1941170 secs]
java.lang.Error: Cleaner terminated abnormally
    at sun.misc.Cleaner$1.run(Cleaner.java:130)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.misc.Cleaner.clean(Cleaner.java:127)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:124)
Caused by: java.lang.Error: java.io.IOException: Broken pipe
    at sun.nio.ch.Util$SelectorWrapper$Closer.run(Util.java:97)
    at sun.misc.Cleaner.clean(Cleaner.java:125)
    ... 1 more
Caused by: java.io.IOException: Broken pipe
    at sun.nio.ch.EPollArrayWrapper.interrupt(Native Method)
    at sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:242)
    at sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:170)
    at sun.nio.ch.SelectorImpl.implCloseSelector(SelectorImpl.java:92)
    at java.nio.channels.spi.AbstractSelector.close(AbstractSelector.java:91)
    at sun.nio.ch.Util$SelectorWrapper$Closer.run(Util.java:95)
    ... 2 more
Feb 17, 2009 12:10:38 AM org.apache.coyote.http11.Http11NioProtocol pause
INFO: Pausing Coyote HTTP/1.1 on http-80
Feb 17, 2009 12:10:38 AM org.apache.coyote.http11.Http11AprProtocol pause
INFO: Pausing Coyote HTTP/1.1 on http-443
Feb 17, 2009 12:10:38 AM org.apache.coyote.ajp.AjpAprProtocol pause
INFO: Pausing Coyote AJP/1.3 on ajp-8009
[GC 4071265K->3937784K(4177280K), 0.0921220 secs]
Feb 17, 2009 12:10:39 AM org.apache.catalina.core.StandardService stop
INFO: Stopping service Catalina
Feb 17, 2009 12:10:39 AM org.apache.catalina.core.StandardWrapper unload
INFO: Waiting for 28,017 instance(s) to be deallocated
Feb 17, 2009 12:10:41 AM org.apache.catalina.core.StandardWrapper unload
INFO: Waiting for 27,669 instance(s) to be deallocated
Feb 17, 2009 12:10:42 AM org.apache.catalina.core.StandardWrapper unload
INFO: Waiting for 27,666 instance(s) to be deallocated
Feb 17, 2009 12:10:43 AM org.apache.catalina.core.StandardWrapper unload
INFO: Waiting for 3 instance(s) to be deallocated
Feb 17, 2009 12:10:44 AM org.apache.catalina.core.StandardWrapper unload
INFO: Waiting for 3 instance(s) to be deallocated
Feb 17, 2009 12:10:45 AM org.apache.catalina.core.StandardWrapper unload
INFO: Waiting for 3 instance(s) to be deallocated
360358820 [SocketConnectorIoProcessor-0.0] null org.apache.mina.common.support.DefaultExceptionMonitor - Unexpected exception.
java.lang.NullPointerException
    at org.apache.mina.common.ByteBuffer.allocate(ByteBuffer.java:225)
    at org.apache.mina.common.ByteBuffer.allocate(ByteBuffer.java:208)
    at org.apache.mina.transport.socket.nio.SocketIoProcessor.read(SocketIoProcessor.java:210)
    at org.apache.mina.transport.socket.nio.SocketIoProcessor.process(SocketIoProcessor.java:198)
    at org.apache.mina.transport.socket.nio.SocketIoProcessor.access$400(SocketIoProcessor.java:45)
    at org.apache.mina.transport.socket.nio.SocketIoProcessor$Worker.run(SocketIoProcessor.java:485)
    at
RE: Fatal error: Cleaner terminated abnormally
Sadly there is no mention of a fix related to NIO in the 6u12 release notes. This comes as kind of a bummer, as we were hoping to make a comet implementation soon.

The native/APR connector looks like it could be a replacement for NIO for us, but after searching I could not find anything conclusive about its scalability and performance compared to NIO. Opinions on native vs. NIO in the discussions I have found seem to be divided. I'm also not sure whether the native/APR implementation is completely separate from the NIO API.

Does anyone know of any downsides/pitfalls I should look out for when using native/APR?

As always, any comment is appreciated.

- Taylan

-----Original Message-----
From: Caldarale, Charles R [mailto:chuck.caldar...@unisys.com]
Sent: dinsdag 17 februari 2009 16:36
To: Tomcat Users List
Subject: RE: Fatal error: Cleaner terminated abnormally

> From: Taylan Develioglu [mailto:tdevelio...@ebuddy.com]
> Subject: Fatal error: Cleaner terminated abnormally
>
> Lately we've been experiencing a fatal error, related to gc, with Tomcat that causes it to stop and unload

It's not really a GC problem - rather a silly bug in NIO. You might try the standard HTTP connector to avoid the problem. Sun seems to be continually fixing NIO, so there may be something for this in 6u12, if you want to keep using the NIO connector.

> I'm still unclear on what is causing the cleaner to terminate

The Cleaner terminates if the run() method of the registered object throws *any* kind of exception - and then takes the entire JVM down with it, via a System.exit() call (bloody brilliant, that one). In this case, the NIO Selector Closer object didn't like the fact that its peer had gone away, and puked. Not quite as robust as one might hope.

- Chuck
RE: Fatal error: Cleaner terminated abnormally
Yes, 64-bit HotSpot server VM.

-----Original Message-----
From: Mark Thomas [mailto:ma...@apache.org]
Sent: Tuesday, 17 February 2009 16:23
To: Tomcat Users List
Subject: Re: Fatal error: Cleaner terminated abnormally

Taylan Develioglu wrote:
> Following the gc trail, it looks like an OOM situation (maybe a
> memory leak in our application; our heap size is 4GB). Is it normal
> behavior for Tomcat to destroy itself like this?

Are you on a 64-bit JVM? If not, the process heap is limited to 4GB, so the Java object heap (set with -Xmx) needs to allow for this. I would use 3.5GB as a starting point.

Mark
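For anyone who wants to check Mark's question on their own install, the following is a small sketch that prints the JVM's architecture and its configured maximum heap. The class name HeapInfo is hypothetical. Note that sun.arch.data.model is a HotSpot-specific system property and may be absent on other JVMs, in which case the sketch falls back to "unknown". os.arch and Runtime.maxMemory() are standard.

```java
/** Hypothetical helper for this thread; prints JVM bitness and max heap. */
class HeapInfo {
    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Best-effort data model ("32" or "64" on HotSpot), else "unknown". */
    static String dataModel() {
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    public static void main(String[] args) {
        System.out.println("os.arch       = " + System.getProperty("os.arch"));
        System.out.println("data model    = " + dataModel());
        System.out.println("max heap (MB) = " + maxHeapMegabytes());
    }
}
```

Running this with the same -Xmx setting as the Tomcat instance shows whether the heap size requested actually fits the process limits Mark mentions.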
RE: Fatal error: Cleaner terminated abnormally
I found bug 4938372, but it didn't seem related to me at the time. There's a post dated 2007 from Alan Bateman indicating they'd try putting the fix in a Java 6 update.

I'll submit a bug report and, in the meantime, explore other options such as native/APR.

-----Original Message-----
From: Caldarale, Charles R [mailto:chuck.caldar...@unisys.com]
Sent: Tuesday, 17 February 2009 23:46
To: Tomcat Users List
Subject: RE: Fatal error: Cleaner terminated abnormally

> From: Filip Hanik - Dev Lists [mailto:devli...@hanik.com]
> Subject: Re: Fatal error: Cleaner terminated abnormally
>
> search the sun database, some results there
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6521677

It's somewhat related, but I don't think it will cover the case reported here, which looked like a simple socket closure rather than anything to do with memory mapping of files. I think a new bug submission is in order (preferably by the OP).

- Chuck