Unless I've misunderstood RIVER-52, I can't see how it could be related
to my bug because that's on the Reggie side.

My scenario is that we have a Reggie on Server#1 and a Reggie on
Server#2 and when we shut down Server#1, then a *client* also running on
Server#2 gets this error.  That client is still in contact with the
Reggie on Server#2, but appears to be executing two discards for
Reggie#1.

I just had a new thought: could it be that we're getting one discard
from LookupDiscovery and one from LookupLocatorDiscovery that are
coincidentally firing at the same time when Reggie#1 shuts down?  I know
for a fact that my two servers are on the same subnet and that they also
have locators pointing at each other, so both discovery mechanisms are
active in the client.
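
To make the idea concrete, here is a rough sketch of the failure mode I
have in mind. It is not River code: the class, the `known` list and
`strictDiscard` are made-up stand-ins for the listener's bookkeeping, and
the assumption that both discovery mechanisms deliver a discard for the
same registrar is exactly the part I have not confirmed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the suspected double discard; not actual River code.
class DoubleDiscardSketch {
    // Stand-in for the set of registrars the listener currently knows about.
    static final List<String> known = new ArrayList<>();

    // Mirrors the strict behaviour: an unknown registrar is treated as an error.
    static void strictDiscard(String registrar) {
        if (!known.remove(registrar)) {
            throw new RuntimeException("discard error");
        }
    }

    public static void main(String[] args) {
        known.add("Reggie#1");
        known.add("Reggie#2");

        // First discard, e.g. delivered via multicast discovery (assumption).
        strictDiscard("Reggie#1");

        // Second discard for the same registrar, e.g. via locator discovery.
        try {
            strictDiscard("Reggie#1");
        } catch (RuntimeException e) {
            System.out.println("second discard failed: " + e.getMessage());
        }
        System.out.println("still known: " + known);
    }
}
```

If that is what happens, the first discard removes the entry and the
second one finds nothing, which would match the null reg I saw.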

Thanks for your attention.
Chris

-----Original Message-----
From: Tom Hobbs [mailto:[email protected]] 
Sent: Wednesday, April 21, 2010 6:22 AM
To: [email protected]
Subject: Re: [jira] Created: (RIVER-337) Attempted discard of unknown
registrar kills LookupLocatorDiscovery thread

There's a related problem on busy networks when reggies disappear; see
RIVER-52.  Although (probably) not the cause of Chris' trouble it's in a
related area... I think.

I'm sure that the problem described in RIVER-52 exists because I've
encountered it in the wild, but I'm still having trouble reproducing it
at will.  I can take a look at both of these when time allows.

Peter; I don't know about DiscoveryEvent's fields.  I can't think of any
reason off the top of my head as to why they can't be made protected.  I
remember a little while ago you were talking about MarshalledInstance
for some reason.  (Or did I just make that up?)  What are your thoughts
on RIVER-29?

Cheers,

Tom



On Wed, Apr 21, 2010 at 11:52 AM, Peter Firmstone <[email protected]>
wrote:

> Thanks Chris, I'll action your recommendations.  It would be nice to
> try to track down where the problem is, it's a shame DiscoveryEvent
> isn't immutable.
>
> Does anyone need access to the protected fields in DiscoveryEvent?
>
> What are the ramifications of making DiscoveryEvent immutable? How
> much application code would break?
>
> Or all Events, for that matter.
>
> Cheers,
>
> Peter.
>
>
> Chris Dolan (JIRA) wrote:
>
>> Attempted discard of unknown registrar kills LookupLocatorDiscovery thread
>> --------------------------------------------------------------------------
>>
>>                 Key: RIVER-337
>>                 URL: https://issues.apache.org/jira/browse/RIVER-337
>>             Project: River
>>          Issue Type: Bug
>>          Components: net_jini_discovery, net_jini_lookup
>>    Affects Versions: AR1, jtsk_2.1
>>            Reporter: Chris Dolan
>>
>>
>> The method
>> net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(DiscoveryEvent)
>> has the following code that throws a RuntimeException (the code comment
>> suggests that it is supposed to be impossible, but it's not).
>>
>>        ProxyReg reg = findReg(proxys[i]);
>>        if(reg != null ) { // this check can be removed.
>>            proxyRegSet.remove(proxyRegSet.indexOf(reg));
>>            drops.add(reg);
>>        } else {
>>            throw new RuntimeException("discard error");
>>        }//endif
>>
>> Our QA does failover testing with two servers, each with a Reggie, where
>> we deliberately crash and reboot server 1 then server 2 every 30 minutes
>> continuously.  In one case, we hit that RuntimeException.  I don't know why
>> we got a null reg (that's a problem for another defect, maybe an undiagnosed
>> race of two discards put on a task queue?  Maybe related to RIVER-37?).  But
>> it caused a catastrophic chain of events because the RuntimeException is not
>> caught anywhere up the stack.  In our case, it killed the
>> LookupLocatorDiscovery$Notifier thread.
>>
>> java.lang.RuntimeException: discard error
>>        at net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(2639)
>>        at net.jini.discovery.LookupDiscoveryManager.notifyListener(1375)
>>        at net.jini.discovery.LookupDiscoveryManager.notifyListener(1356)
>>        at net.jini.discovery.LookupDiscoveryManager.access$500(92)
>>        at net.jini.discovery.LookupDiscoveryManager$LocatorDiscoveryListener.discarded(543)
>>        at net.jini.discovery.LookupLocatorDiscovery$Notifier.run(650)
>>
>> I propose three changes:
>>
>>  1) change the discarded() method above to simply warn instead of throwing
>>  2) put a try/catch(Throwable) around the listener invocation in
>>     LookupLocatorDiscovery$Notifier.run()
>>  3) put a similar try/catch around listener invocation in
>>     LookupDiscoveryManager.notifyListener
>>
>> The idea behind #2 and #3 is that misbehaving listeners should not be
>> allowed to derail the discovery process.
>>
>>
>>
>>
>
>
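
For illustration, changes #2 and #3 from the quoted report (guarding each
listener invocation so one failure cannot kill the notifier thread) could
look roughly like the sketch below. This is not the River source:
`DiscardListener` and `notifyDiscarded` are hypothetical names, and the
real listener interface takes a DiscoveryEvent rather than a String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of guarded listener notification; not actual River code.
class GuardedNotifySketch {
    private static final Logger LOG = Logger.getLogger("GuardedNotifySketch");

    /** Stand-in for a discovery listener; the real one receives a DiscoveryEvent. */
    interface DiscardListener {
        void discarded(String registrar);
    }

    /** Notifies every listener and returns how many completed normally. */
    static int notifyDiscarded(List<DiscardListener> listeners, String registrar) {
        int ok = 0;
        for (DiscardListener listener : listeners) {
            try {
                listener.discarded(registrar);
                ok++;
            } catch (Throwable t) {
                // Log and carry on so later listeners are still notified.
                LOG.log(Level.WARNING, "listener failed on discard of " + registrar, t);
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        List<DiscardListener> listeners = new ArrayList<>();
        // A misbehaving listener, like the one that threw "discard error".
        listeners.add(r -> { throw new RuntimeException("discard error"); });
        // A well-behaved listener that must still be reached.
        listeners.add(seen::add);

        int ok = notifyDiscarded(listeners, "Reggie#1");
        System.out.println(ok + " listener(s) succeeded, seen = " + seen);
    }
}
```

Catching Throwable follows the wording of the proposal; catching only
Exception would be a narrower alternative that lets Errors propagate.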
