[ 
https://issues.apache.org/jira/browse/XMLRPC-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704036#action_12704036
 ] 

Alan Burlison commented on XMLRPC-168:
--------------------------------------

The traces come from jstack.  When the server has deadlocked, use jps to get 
the pid of its process, then run 'jstack <pid>'.
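
If attaching jstack is awkward, the JVM can run a similar check on itself.  The sketch below is only an illustration: the class and method names (DeadlockCheckSketch, countDeadlockedThreads) are made up for this example and are not part of XML-RPC.  It uses the standard java.lang.management.ThreadMXBean, which reports threads that are deadlocked on object monitors, the kind of deadlock shown in these traces.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of XML-RPC: asks the running JVM whether
// any threads are deadlocked on object monitors.
public class DeadlockCheckSketch {

    // Returns the number of threads currently in a monitor deadlock.
    static int countDeadlockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // The JDK returns null, not an empty array, when nothing is deadlocked.
        long[] ids = threads.findMonitorDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

A server could call something like this from a watchdog thread and log the result, but jstack remains the simpler way to get the full stack traces.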

In my testing the new code isn't any different.  The place where it deadlocks 
hasn't been touched by your changes, so that's not too surprising.  The problem 
is the ordering of the locks.  If you look in the traces quoted below, the weblistener 
lock ordering is (ThreadPool$Poolable, ThreadPool$Poolable$1) and the worker 
lock ordering is (ThreadPool$Poolable$1, ThreadPool$Poolable).  The $1 object 
is the anonymous Thread object created inside the Poolable constructor.

The only way of fixing this is to change things so that both the weblistener 
and the worker thread acquire those locks in the same order.  As the locks are 
in part taken out by synchronized methods (e.g. isShuttingDown), the only way 
of doing *that* is to change the order in which the calls are made, and that 
basically means a major rewrite.

The nub of the problem is that the weblistener thread is locking 'down' into 
the Poolable and the Poolable thread is locking 'up' into the weblistener, and 
if those two things happen at the same time, you'll get a deadlock.  Taking out 
locks via the use of synchronized methods means that the *lock* ordering is 
going to be determined by the *call* ordering, and if object A locks and then 
calls synchronized methods on object B whilst object B simultaneously locks and 
calls synchronized methods on object A the resultant lock ordering is (A, B) 
and (B, A) - i.e. deadlock.  That's an architectural issue, and I suspect the 
only way of fixing it is to rewrite most of the code.
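
To make the ordering rule concrete, here is a minimal sketch of the safe case, where every thread takes the two locks in the same order.  It is not the ThreadPool code and not a proposed patch: POOLABLE, WORKER, handOff and run are hypothetical stand-ins for the two monitors and the code paths that take them.  Because both threads always lock (POOLABLE, WORKER), neither can end up holding one lock while waiting for the other.

```java
// Hypothetical illustration of consistent lock ordering; the names do not
// correspond to anything in org.apache.xmlrpc.util.ThreadPool.
public class LockOrderingSketch {
    private static final Object POOLABLE = new Object();
    private static final Object WORKER = new Object();
    private static int handOffs;

    // Every caller takes POOLABLE first and WORKER second.  Swapping the
    // order in just one caller would reintroduce the (A, B) / (B, A) cycle.
    static void handOff() {
        synchronized (POOLABLE) {
            synchronized (WORKER) {
                handOffs++;
            }
        }
    }

    // Runs two threads that each perform perThread hand-offs, then returns
    // the total number of hand-offs counted.
    static int run(final int perThread) {
        handOffs = 0;
        Runnable loop = () -> {
            for (int i = 0; i < perThread; i++) {
                handOff();
            }
        };
        Thread listener = new Thread(loop, "listener");
        Thread worker = new Thread(loop, "worker");
        listener.start();
        worker.start();
        try {
            listener.join();
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handOffs;
    }

    public static void main(String[] args) {
        System.out.println(run(100_000)); // prints 200000
    }
}
```

The difficulty in the real code is that the locks are taken implicitly by synchronized methods, so there is no single place like handOff() where the order can be enforced.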

That's why I suggested using the java.util.concurrent thread pool - that code 
will have been tested, and will scale.  There's a trade-off between reinventing 
the wheel and losing JDK 1.4 compatibility as the j.u.concurrent stuff didn't 
come in until JDK 1.5.  However there is a backport of the j.u.concurrent stuff 
- see http://backport-jsr166.sourceforge.net/ 
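
For illustration, this is roughly what handing work to a java.util.concurrent pool looks like.  It is a sketch under assumptions, not a drop-in replacement for ThreadPool: the class ExecutorPoolSketch, the method handleAll and the trivial "request" (doubling an integer) are invented for the example, and it is written for a current JDK rather than 1.4 or the backport.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: a fixed-size pool from java.util.concurrent doing
// the job of a hand-written worker pool.
public class ExecutorPoolSketch {

    // Submits 'requests' dummy tasks to a pool of 'workers' threads and
    // returns how many of them completed.
    static int handleAll(int requests, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                // Stand-in for handling one XML-RPC request.
                Callable<Integer> task = () -> id * 2;
                results.add(pool.submit(task));
            }
            int completed = 0;
            for (Future<Integer> result : results) {
                result.get(); // blocks until that task has finished
                completed++;
            }
            return completed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // stop accepting work and let the workers exit
        }
    }

    public static void main(String[] args) {
        System.out.println(handleAll(50, 4)); // prints 50
    }
}
```

The caller never touches the worker threads or their locks directly; queuing, hand-off and shutdown are all handled inside the executor.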

> XML-RPC server deadlocks under heavy load
> -----------------------------------------
>
>                 Key: XMLRPC-168
>                 URL: https://issues.apache.org/jira/browse/XMLRPC-168
>             Project: XML-RPC
>          Issue Type: Bug
>          Components: Source
>    Affects Versions: 3.1.2
>         Environment: Solaris
>            Reporter: Alan Burlison
>         Attachments: Client.java, Server.java, ThreadPool.java
>
>
> When running an XML-RPC server under heavy load, it eventually deadlocks 
> inside the thread pool that manages the 'worker' threads which handle the 
> individual XML-RPC requests - the classes involved are 
> org.apache.xmlrpc.util.ThreadPool and 
> org.apache.xmlrpc.util.ThreadPool$Poolable.  jstack on the hung process shows:
> ----------
> Found one Java-level deadlock:
> =============================
> "XML-RPC-13":
>   waiting to lock monitor 0x08d10bec (object 0xbb6959c0, a 
> org.apache.xmlrpc.util.ThreadPool),
>   which is held by "XML-RPC Weblistener"
> "XML-RPC Weblistener":
>   waiting to lock monitor 0x08d1186c (object 0xbd2ed340, a 
> org.apache.xmlrpc.util.ThreadPool$Poolable$1),
>   which is held by "XML-RPC-2"
> "XML-RPC-2":
>   waiting to lock monitor 0x08d112f4 (object 0xbd2ed570, a 
> org.apache.xmlrpc.util.ThreadPool$Poolable),
>   which is held by "XML-RPC Weblistener"
> Java stack information for the threads listed above:
> ===================================================
> "XML-RPC-13":
>       at org.apache.xmlrpc.util.ThreadPool.repool(Unknown Source)
>       - waiting to lock <0xbb6959c0> (a org.apache.xmlrpc.util.ThreadPool)
>       at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(Unknown Source)
> "XML-RPC Weblistener":
>       at org.apache.xmlrpc.util.ThreadPool$Poolable.start(Unknown Source)
>       - waiting to lock <0xbd2ed340> (a 
> org.apache.xmlrpc.util.ThreadPool$Poolable$1)
>       - locked <0xbd2ed570> (a org.apache.xmlrpc.util.ThreadPool$Poolable)
>       at org.apache.xmlrpc.util.ThreadPool.startTask(Unknown Source)
>       - locked <0xbb6959c0> (a org.apache.xmlrpc.util.ThreadPool)
>       at org.apache.xmlrpc.webserver.WebServer.run(Unknown Source)
>       at java.lang.Thread.run(Thread.java:619)
> "XML-RPC-2":
>       at org.apache.xmlrpc.util.ThreadPool$Poolable.isShuttingDown(Unknown 
> Source)
>       - waiting to lock <0xbd2ed570> (a 
> org.apache.xmlrpc.util.ThreadPool$Poolable)
>       at org.apache.xmlrpc.util.ThreadPool$Poolable.access$000(Unknown Source)
>       at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(Unknown Source)
>       - locked <0xbd2ed340> (a org.apache.xmlrpc.util.ThreadPool$Poolable$1)
> Found 1 deadlock.
> ----------
> And another slight variant on the above, from a debug build of revision 
> 769436.
> ----------
> Found one Java-level deadlock:
> =============================
> "XML-RPC-6":
>   waiting to lock monitor 0x0870d8ec (object 0xbac020f8, a 
> org.apache.xmlrpc.util.ThreadPool),
>   which is held by "XML-RPC Weblistener"
> "XML-RPC Weblistener":
>   waiting to lock monitor 0x0814de4c (object 0xbad73820, a 
> org.apache.xmlrpc.util.ThreadPool$Poolable$1),
>   which is held by "XML-RPC-5"
> "XML-RPC-5":
>   waiting to lock monitor 0x0814eacc (object 0xbad73b48, a 
> org.apache.xmlrpc.util.ThreadPool$Poolable),
>   which is held by "XML-RPC Weblistener"
> Java stack information for the threads listed above:
> ===================================================
> "XML-RPC-6":
>       at org.apache.xmlrpc.util.ThreadPool.repool(ThreadPool.java:136)
>       - waiting to lock <0xbac020f8> (a org.apache.xmlrpc.util.ThreadPool)
>       at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(ThreadPool.java:70)
> "XML-RPC Weblistener":
>       at org.apache.xmlrpc.util.ThreadPool$Poolable.start(ThreadPool.java:106)
>       - waiting to lock <0xbad73820> (a 
> org.apache.xmlrpc.util.ThreadPool$Poolable$1)
>       - locked <0xbad73b48> (a org.apache.xmlrpc.util.ThreadPool$Poolable)
>       at org.apache.xmlrpc.util.ThreadPool.startTask(ThreadPool.java:168)
>       - locked <0xbac020f8> (a org.apache.xmlrpc.util.ThreadPool)
>       at org.apache.xmlrpc.webserver.WebServer.run(WebServer.java:338)
>       at java.lang.Thread.run(Thread.java:619)
> "XML-RPC-5":
>       at 
> org.apache.xmlrpc.util.ThreadPool$Poolable.getTask(ThreadPool.java:99)
>       - waiting to lock <0xbad73b48> (a 
> org.apache.xmlrpc.util.ThreadPool$Poolable)
>       at 
> org.apache.xmlrpc.util.ThreadPool$Poolable.access$100(ThreadPool.java:47)
>       at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(ThreadPool.java:59)
>       - locked <0xbad73820> (a org.apache.xmlrpc.util.ThreadPool$Poolable$1)
> Found 1 deadlock.
> ----------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
