Peter Firmstone wrote:
I started thinking about a replacement, it's in org.apache.river.imp.util.

Excellent. I'll take a look, and pick up code and ideas as appropriate.


The name spaces are as follows:

net.jini.* API - must be backward compatible, or breakage minimal for if there's a good reason.
com.sun.jini.* - implementation, subject to change.
com.artima.* - implementation, subject to change.
org.apache.river.api.* - new API under review, please feel free to look for ways to improve before its frozen in November / December. org.apache.river.imp.* - new implementation, subject to change, eventually the com.* namespace will move there too.

Best bet would be to search for the classes that depend on the Task interface, see how they use it and come up with something better.

I've looked at several of them. Most runAfter implementations unconditionally return false. I'm tracking down the ones that don't.

Agreed, the Task interface is less than optimum, best to replace it. TaskManager has been identified as a performance bottleneck.

Are there any benchmarks demonstrating the bottleneck? I'll need some to see if my changes make matters significantly better. My guess from reading TaskManager code is that it is fine for small numbers of tasks, but does not scale well. However, measurements always beat guesses.

My current idea is to use a long sequence number assigned when a Task is added to the TaskManager. That would work for a TaskManager that operates continuously for over 290 years given one new Task every nanosecond. If more is needed, I could go to BigInteger depending on what the benchmarks tell me.

This requires wrapping the caller's Task object in another data structure, which has other uses.

Given a sequence number, I could separate the tasks into separate data structures depending status. Each data structure would be designed for the operations TaskManager needs to do on tasks in that state. In particular, tasks that are waiting to be assigned a thread would live in a BlockingPriorityQueue whose Comparator goes by sequence number.

I am considering changing the runAfter method to take a single Task, rather than a List. It moves a method call from outside to inside a loop, but I think it is worth it for the additional information it would give TaskManager. It simplifies the interface, which is usually a good thing to do for maintainability. It decouples the TaskManager callers from how TaskManager organizes its data structures, important not just for this round, but for any future tuning of TaskManager.

TaskManager, given a new Task, would begin by identifying the highest sequence number existing task on which it depends, using a backwards scan. If the Task depends on nothing, it goes in the priority queue for dispatch to a thread. If Task Y is the youngest incomplete task on which Task X depends, add X to a Set associated with Y, and put X in a pending data structure.

(To reduce object creation and destruction in simple cases, Y's Set would be created the first time an object is added to it.)

When Task Y leaves the TaskManager, through removal or completion, reexamine X, looking to see if it depends on an incomplete Task with a lower sequence number than Y. If it does, put it in that Task's waiting Set and leave it pending. If not, put it in the priority queue.

In this design, each question of the form "Does X depend on Y" would be asked at most once. A Task that is waiting on runAfter is kept out of the dispatch data structure.

This is just a first thought, and will almost certainly change after I have read more and done some experiments.

Patricia

Reply via email to