addProxyReg is called from the LookupCacheImpl constructor and from the
discover method of a DiscoveryListener attached to a
LookupDiscoveryManager. I don't see any specific rules about thread
choice for the discover call, but as far as I can tell the lookup
discovery manager does not know anything about the service discovery
manager's cache, and so is unlikely to use its TaskManager.
A LookupTask is created in the run method of a RegisterListenerTask,
which is the class of task that addProxyReg adds to the cacheTaskMgr
TaskManager. RegisterListenerTask inherits the CacheTask runAfter method
that just returns false, so there is nothing to stop it from running in
parallel with any older task.
Looks like two or more different threads to me. I think a simpler
explanation for the lack of problem reports lies in two effects:
1. Almost all the time, addProxyReg will win the race. It has a
significant running start. Immediately after the end of its synchronized
block, it drops straight into calling the TaskManager add method.
Meanwhile, another thread that was at the start of its synchronized
block would have to go through creating a new Task object before
attempting the add call. For addProxyReg to lose the race for the add
method's TaskManager synchronization would require a cache miss or an
interrupt in a window a few instructions long.
2. The symptom, if any, would be cache confusion that would be very hard
to distinguish from a variation on
https://issues.apache.org/jira/browse/RIVER-324. If there were any
observations of this problem before RIVER-324 was fixed, they would have
been conflated with it and presumed fixed.
I am a strong believer in closing even the tiniest timing windows,
because they can collectively lead to general flakiness even if there
are no reproducible bug reports.
Patricia
On 8/5/2010 6:13 AM, Gregg Wonderly wrote:
I haven't looked yet, but a quick thought I had was, are the other
dependent instances (higher sequence numbers) actually created on a
separate thread, so that ordering could be compromised, or are they just
created in later in the same threads execution path?
Gregg Wonderly
Peter Firmstone wrote:
Thanks Patricia, sharp eyes!
Cheers,
Peter.
Patricia Shanahan wrote:
...
ServiceDiscoveryManager.LookupCacheImpl.taskSeqN is used to
initialize sequence number fields in some CacheTask subclasses that
are then used in runAfter methods.
ServiceDiscoveryManager.LookupCacheImpl.addProxyReg creates a task
with incremented taskSeqN inside a serviceIdMap synchronized block,
but adds it to the cacheTaskMgr outside the block.
public void addProxyReg(ProxyReg reg) {
RegisterListenerTask treg;
synchronized(serviceIdMap) {
treg = new RegisterListenerTask(reg, taskSeqN++);
}//end sync(serviceIdMap)
cacheTaskMgr.add(treg);
}//end LookupCacheImpl.addProxyReg
In the remaining sequence numbered CacheTask subclass cases, the
cacheTaskMgr.add call is done inside the same
synchronized(serviceIdMap) block as the taskSeqN increment.
There is a window in addProxyReg during which a LookupTask or
NotifyEventTask with a higher sequence number could be added before
the RegisterListenerTask task addProxyReg is adding. The LookupTask
and NotifyEventTask runAfter methods both wait for any
RegisterEventTask with a lower sequence number.
I believe this is a bug, but it will be hard to construct a test case
because it involves such a narrow timing window. I do recommend
moving the cacheTaskMgr.add(treg) statement inside the synchronized
block.