There's a related problem on busy networks when reggies disappear; see
RIVER-52.  Although (probably) not the cause of Chris' trouble it's in a
related area... I think.

I sure that the problem described in RIVER-52 exists because I've
encountered it in the wild, but I'm still having trouble reproducing it at
will.  I can take a look at both of these when time allows.

Peter; I don't know about DiscoveryEvent's fields.  I can't think of any
reason off the top of my head as to why they can't be made protected.  I
remember a little while ago you were talking about MarshalledInstance for
some reason.  (Or did I just make that up?)  What are your thoughts on
RIVER-29?

Cheers,

Tom



On Wed, Apr 21, 2010 at 11:52 AM, Peter Firmstone <[email protected]> wrote:

> Thanks Chris, I'll action your recommendations.  It would be nice to try to
> track down where the problem is, it's a shame DiscoveryEvent isn't
> immutable.
>
> Does anyone need access to the protected fields in DiscoveryEvent?
>
> What are the ramifications of making DiscoveryEvent immutable? How much
> breakage of application code?.
>
> Or all Event's for that matter.
>
> Cheers,
>
> Peter.
>
>
> Chris Dolan (JIRA) wrote:
>
>> Attempted discard of unknown registrar kills LookupLocatorDiscovery thread
>> --------------------------------------------------------------------------
>>
>>                 Key: RIVER-337
>>                 URL: https://issues.apache.org/jira/browse/RIVER-337
>>             Project: River
>>          Issue Type: Bug
>>          Components: net_jini_discovery, net_jini_lookup
>>    Affects Versions: AR1, jtsk_2.1
>>            Reporter: Chris Dolan
>>
>>
>> The method
>>
>> net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(DiscoveryEvent)
>> has the following code that throws a RuntimeException (the code comment
>> suggests that it is supposed to be impossible, but it's not).
>>
>>        ProxyReg reg = findReg(proxys[i]);
>>        if(reg != null ) { // this check can be removed.
>>            proxyRegSet.remove(proxyRegSet.indexOf(reg));
>>            drops.add(reg);
>>        } else {
>>            throw new RuntimeException("discard error");
>>        }//endif
>>
>> Our QA does failover testing with two servers, each with a Reggie, where
>> we deliberately crash and reboot server 1 then server 2 every 30 minutes
>> continuously.  In one case, we hit that RuntimeException.  I don't know why
>> we got a null reg (that's a problem for another defect, maybe an undiagnosed
>> race of two discards put on a task queue?  Maybe related to RIVER-37?).  But
>> it caused a catastrophic chain of events because the RuntimeException is not
>> caught anywhere up the stack.  In our case, it killed the
>> LookupLocatorDiscovery$Notifier thread.
>>
>> java.lang.RuntimeException: discard error
>>        at
>> net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(2639)
>>        at net.jini.discovery.LookupDiscoveryManager.notifyListener(1375)
>>        at net.jini.discovery.LookupDiscoveryManager.notifyListener(1356)
>>        at net.jini.discovery.LookupDiscoveryManager.access$500(92)
>>        at
>> net.jini.discovery.LookupDiscoveryManager$LocatorDiscoveryListener.discarded(543)
>>        at net.jini.discovery.LookupLocatorDiscovery$Notifier.run(650)
>>
>> I propose three changes:
>>
>>  1) change the discarded() method above to simply warn instead of throwing
>>  2) put a try/catch(Throwable) around the listener invocation in
>>  LookupLocatorDiscovery$Notifier.run()
>>  3) put a similar try/catch around listener invocation in
>> LookupDiscoveryManager.notifyListener
>>
>> The idea behind #2 and #3 is that misbehaving listeners should not be
>> allowed to derail the discovery process.
>>
>>
>>
>>
>
>

Reply via email to