Thanks Chris, I'll action your recommendations. It would be nice to try
to track down where the problem is, it's a shame DiscoveryEvent isn't
immutable.
Does anyone need access to the protected fields in DiscoveryEvent?
What are the ramifications of making DiscoveryEvent immutable? How much
breakage of application code?.
Or all Event's for that matter.
Cheers,
Peter.
Chris Dolan (JIRA) wrote:
Attempted discard of unknown registrar kills LookupLocatorDiscovery thread
--------------------------------------------------------------------------
Key: RIVER-337
URL: https://issues.apache.org/jira/browse/RIVER-337
Project: River
Issue Type: Bug
Components: net_jini_discovery, net_jini_lookup
Affects Versions: AR1, jtsk_2.1
Reporter: Chris Dolan
The method
net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(DiscoveryEvent)
has the following code that throws a RuntimeException (the code comment
suggests that it is supposed to be impossible, but it's not).
ProxyReg reg = findReg(proxys[i]);
if(reg != null ) { // this check can be removed.
proxyRegSet.remove(proxyRegSet.indexOf(reg));
drops.add(reg);
} else {
throw new RuntimeException("discard error");
}//endif
Our QA does failover testing with two servers, each with a Reggie, where we
deliberately crash and reboot server 1 then server 2 every 30 minutes
continuously. In one case, we hit that RuntimeException. I don't know why we
got a null reg (that's a problem for another defect, maybe an undiagnosed race
of two discards put on a task queue? Maybe related to RIVER-37?). But it
caused a catastrophic chain of events because the RuntimeException is not
caught anywhere up the stack. In our case, it killed the
LookupLocatorDiscovery$Notifier thread.
java.lang.RuntimeException: discard error
at
net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(2639)
at net.jini.discovery.LookupDiscoveryManager.notifyListener(1375)
at net.jini.discovery.LookupDiscoveryManager.notifyListener(1356)
at net.jini.discovery.LookupDiscoveryManager.access$500(92)
at
net.jini.discovery.LookupDiscoveryManager$LocatorDiscoveryListener.discarded(543)
at net.jini.discovery.LookupLocatorDiscovery$Notifier.run(650)
I propose three changes:
1) change the discarded() method above to simply warn instead of throwing
2) put a try/catch(Throwable) around the listener invocation in
LookupLocatorDiscovery$Notifier.run()
3) put a similar try/catch around listener invocation in
LookupDiscoveryManager.notifyListener
The idea behind #2 and #3 is that misbehaving listeners should not be allowed
to derail the discovery process.