On Jun 7, 2013, at 12:30 PM, John Bellassai <[email protected]> wrote:
> Hi,
>
> We've been seeing an issue in production for the past few months where after
> running smoothly for a couple weeks, our application hangs and stops
> responding to new requests until we bounce the container (Tomcat 7/CXF 2.5.3).
>
> Some thread/heap dump analysis for a few of these hang events have shown a
> running theme. It seems that all of Tomcat's HTTPS handler threads except
> one are waiting on a specific lock to become available. The problem is in
> the org.apache.cxf.frontend.WSDLGetInterceptor class, specifically in the
> synchronized block in the handleMessage method.
>
> What we are experiencing seems to not technically be a deadlock because one
> thread is legitimately holding the lock and is still runnable and writing the
> WSDL to the client, but for some reason (network issues perhaps), it is not
> making very quick progress while other threads continue to pile up.
> Eventually all of Tomcat's handler threads are in use and are waiting for
> this lock so from the outside, the server does not respond to new requests.
>
> In examining the code for this interceptor I wonder if the synchronized block
> needs to be as coarse as it is currently. In other words, would it be
> possible to lock only while creating the WSDL Document object, then actually
> write to the XMLStreamWriter outside of the synchronized block such that all
> threads can make progress even if one client happens to be slow or
> experiencing network issues?

I don't think that would be possible for two reasons:

1) Another thread could then modify the document while it's being written
out. I'm not exactly sure what would happen in that case.

2) Traversing a DOM object is also not thread safe:
http://xerces.apache.org/xerces2-j/faq-dom.html#faq-1
Thus, you don't want 2 threads traversing it at the same time.

One potential fix that might work would be to write it to a
CachedOutputStream within the lock. That should be fairly fast. Then outside
the lock, write that to the network stream.

Would you care to give that a try and maybe submit a patch?

-- 
Daniel Kulp
[email protected] - http://dankulp.com/blog
Talend Community Coder - http://coders.talend.com
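
For illustration, the buffer-inside-the-lock idea might look roughly like the
sketch below. This is not the actual WSDLGetInterceptor code: the class
BufferedWsdlWriter and its methods are hypothetical names, a plain
ByteArrayOutputStream stands in for CXF's CachedOutputStream, and a JAXP
Transformer stands in for the XMLStreamWriter path. Only the fast in-memory
serialization holds the lock; the potentially slow network write happens after
the lock is released.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical stand-in for the interceptor; not part of CXF.
class BufferedWsdlWriter {
    private final Object lock = new Object();
    private final Document wsdl; // shared DOM, not safe for concurrent traversal

    BufferedWsdlWriter(Document wsdl) {
        this.wsdl = wsdl;
    }

    void writeTo(OutputStream client) throws IOException, TransformerException {
        byte[] snapshot;
        synchronized (lock) {
            // Only the in-memory serialization runs under the lock, so the
            // lock is held for a short time regardless of client speed.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.transform(new DOMSource(wsdl), new StreamResult(buffer));
            snapshot = buffer.toByteArray();
        }
        // The slow part (a stalled or slow client) no longer blocks other
        // threads, because the DOM is not touched outside the lock.
        client.write(snapshot);
        client.flush();
    }

    // Builds a tiny placeholder document so the sketch can be exercised.
    static Document sampleDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("definitions"));
        return doc;
    }
}
```

A caller would construct one BufferedWsdlWriter per shared Document and call
writeTo(out) from each request thread. The trade-off is one extra in-memory
copy of the WSDL per request; CachedOutputStream would presumably help there
for large documents, since it can spill to disk.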
