There's a related problem on busy networks when reggies disappear; see RIVER-52. Although it's (probably) not the cause of Chris' trouble, it's in a related area... I think.
I'm sure that the problem described in RIVER-52 exists because I've encountered it in the wild, but I'm still having trouble reproducing it at will. I can take a look at both of these when time allows.

Peter, I don't know about DiscoveryEvent's fields. I can't think of any reason off the top of my head as to why they can't be made protected.

I remember a little while ago you were talking about MarshalledInstance for some reason. (Or did I just make that up?) What are your thoughts on RIVER-29?

Cheers,

Tom

On Wed, Apr 21, 2010 at 11:52 AM, Peter Firmstone <[email protected]> wrote:
> Thanks Chris, I'll action your recommendations. It would be nice to try to
> track down where the problem is; it's a shame DiscoveryEvent isn't
> immutable.
>
> Does anyone need access to the protected fields in DiscoveryEvent?
>
> What are the ramifications of making DiscoveryEvent immutable? How much
> breakage of application code?
>
> Or all Events, for that matter.
>
> Cheers,
>
> Peter.
>
>
> Chris Dolan (JIRA) wrote:
>
>> Attempted discard of unknown registrar kills LookupLocatorDiscovery thread
>> --------------------------------------------------------------------------
>>
>>                  Key: RIVER-337
>>                  URL: https://issues.apache.org/jira/browse/RIVER-337
>>              Project: River
>>           Issue Type: Bug
>>           Components: net_jini_discovery, net_jini_lookup
>>     Affects Versions: AR1, jtsk_2.1
>>             Reporter: Chris Dolan
>>
>>
>> The method
>>
>> net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(DiscoveryEvent)
>> has the following code that throws a RuntimeException (the code comment
>> suggests that it is supposed to be impossible, but it's not).
>>
>>     ProxyReg reg = findReg(proxys[i]);
>>     if(reg != null ) { // this check can be removed.
>>         proxyRegSet.remove(proxyRegSet.indexOf(reg));
>>         drops.add(reg);
>>     } else {
>>         throw new RuntimeException("discard error");
>>     }//endif
>>
>> Our QA does failover testing with two servers, each with a Reggie, where
>> we deliberately crash and reboot server 1 then server 2 every 30 minutes
>> continuously. In one case, we hit that RuntimeException. I don't know why
>> we got a null reg (that's a problem for another defect, maybe an undiagnosed
>> race of two discards put on a task queue? Maybe related to RIVER-37?). But
>> it caused a catastrophic chain of events because the RuntimeException is not
>> caught anywhere up the stack. In our case, it killed the
>> LookupLocatorDiscovery$Notifier thread.
>>
>> java.lang.RuntimeException: discard error
>>     at net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(2639)
>>     at net.jini.discovery.LookupDiscoveryManager.notifyListener(1375)
>>     at net.jini.discovery.LookupDiscoveryManager.notifyListener(1356)
>>     at net.jini.discovery.LookupDiscoveryManager.access$500(92)
>>     at net.jini.discovery.LookupDiscoveryManager$LocatorDiscoveryListener.discarded(543)
>>     at net.jini.discovery.LookupLocatorDiscovery$Notifier.run(650)
>>
>> I propose three changes:
>>
>> 1) change the discarded() method above to simply warn instead of throwing
>> 2) put a try/catch(Throwable) around the listener invocation in
>>    LookupLocatorDiscovery$Notifier.run()
>> 3) put a similar try/catch around listener invocation in
>>    LookupDiscoveryManager.notifyListener
>>
>> The idea behind #2 and #3 is that misbehaving listeners should not be
>> allowed to derail the discovery process.
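For reference, here is a minimal sketch of the shape of the three changes proposed in RIVER-337. It is not River's code: `RegistrarListener`, `TolerantListener` and `GuardedNotifier` are hypothetical names, and registrars are identified by a plain `String` instead of a `ServiceRegistrar` carried in a `DiscoveryEvent`. `TolerantListener` shows #1 (warn on an unknown registrar instead of throwing), and `GuardedNotifier` shows #2/#3 (each listener call wrapped in its own try/catch so one bad listener can't kill the notifier thread or skip the remaining listeners).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a discovery listener; NOT River's
// net.jini.discovery.DiscoveryListener, which receives a DiscoveryEvent.
interface RegistrarListener {
    void discarded(String registrarId);
}

// Proposal #1 (sketch): warn about an unknown registrar instead of
// throwing RuntimeException("discard error").
class TolerantListener implements RegistrarListener {
    private static final Logger LOG = Logger.getLogger(TolerantListener.class.getName());
    private final List<String> known = new ArrayList<>();

    synchronized void registered(String registrarId) {
        known.add(registrarId);
    }

    @Override
    public synchronized void discarded(String registrarId) {
        // List.remove(Object) returns false when the element was not present.
        if (!known.remove(registrarId)) {
            LOG.warning("discard of unknown registrar " + registrarId + "; ignoring");
        }
    }

    synchronized int knownCount() {
        return known.size();
    }
}

// Proposals #2 and #3 (sketch): isolate each listener invocation so a
// misbehaving listener cannot derail notification of the others.
class GuardedNotifier {
    private static final Logger LOG = Logger.getLogger(GuardedNotifier.class.getName());
    // Copy-on-write so listeners may be added while a notification is running.
    private final List<RegistrarListener> listeners = new CopyOnWriteArrayList<>();

    void addListener(RegistrarListener l) {
        listeners.add(l);
    }

    // Notifies every listener in registration order; returns how many threw.
    int notifyDiscarded(String registrarId) {
        int failures = 0;
        for (RegistrarListener l : listeners) {
            try {
                l.discarded(registrarId);
            } catch (Throwable t) {
                failures++;
                LOG.log(Level.WARNING, "listener failed on discard of " + registrarId, t);
            }
        }
        return failures;
    }
}
```

One design caveat: catching `Throwable`, as proposed, also swallows `Error`s such as `OutOfMemoryError`. Catching `RuntimeException` and letting `Error`s propagate is a narrower alternative, at the cost of still losing the notifier thread in those cases.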
