Yes, you're right.  I got the wrong end of the stick - wouldn't be the first
time.

Your problem is that you're getting too many events.  Whereas River-52 is
the opposite, a disappearing reggie (or slow-to respond listener) results in
all lookup events timing out.  Is that correct?

In your reggie configs, are you setting the initialLookupLocators array for
instance to point to the other?  I've experienced problems when doing this,
in this situation it seems like each reggie chats to the other which can
cause problems when one of them disappears.

Maybe someone who knows more about the ins and outs of reggie can comment?



On Wed, Apr 21, 2010 at 3:18 PM, Christopher Dolan <
[email protected]> wrote:

> Unless I've misunderstood RIVER-52, I can't see how it could be related
> to my bug because that's on the Reggie side.
>
> My scenario is that we have a Reggie on Server#1 and a Reggie on
> Server#2 and when we shut down Server#1, then a *client* also running on
> Server#2 gets this error.  That client is still in contact with the
> Reggie on Server#2, but appears to be executing two discards for
> Reggie#1.
>
> I just had a new thought: could it be that we're getting one discard
> from LookupDiscovery and one from LookupLocatorDiscovery that are
> coincidentally firing at the same time when Reggie#1 shuts down?  I know
> for a fact that my two servers are on the same subnet and that they also
> have locators pointing at each other, so both discovery mechanisms are
> active in the client.
>
> Thanks for your attention.
> Chris
>
> -----Original Message-----
> From: Tom Hobbs [mailto:[email protected]]
> Sent: Wednesday, April 21, 2010 6:22 AM
> To: [email protected]
> Subject: Re: [jira] Created: (RIVER-337) Attempted discard of unknown
> registrar kills LookupLocatorDiscovery thread
>
> There's a related problem on busy networks when reggies disappear; see
> RIVER-52.  Although (probably) not the cause of Chris' trouble it's in a
> related area... I think.
>
> I sure that the problem described in RIVER-52 exists because I've
> encountered it in the wild, but I'm still having trouble reproducing it
> at
> will.  I can take a look at both of these when time allows.
>
> Peter; I don't know about DiscoveryEvent's fields.  I can't think of any
> reason off the top of my head as to why they can't be made protected.  I
> remember a little while ago you were talking about MarshalledInstance
> for
> some reason.  (Or did I just make that up?)  What are your thoughts on
> RIVER-29?
>
> Cheers,
>
> Tom
>
>
>
> On Wed, Apr 21, 2010 at 11:52 AM, Peter Firmstone <[email protected]>
> wrote:
>
> > Thanks Chris, I'll action your recommendations.  It would be nice to
> try to
> > track down where the problem is, it's a shame DiscoveryEvent isn't
> > immutable.
> >
> > Does anyone need access to the protected fields in DiscoveryEvent?
> >
> > What are the ramifications of making DiscoveryEvent immutable? How
> much
> > breakage of application code?.
> >
> > Or all Event's for that matter.
> >
> > Cheers,
> >
> > Peter.
> >
> >
> > Chris Dolan (JIRA) wrote:
> >
> >> Attempted discard of unknown registrar kills LookupLocatorDiscovery
> thread
> >>
> ------------------------------------------------------------------------
> --
> >>
> >>                 Key: RIVER-337
> >>                 URL: https://issues.apache.org/jira/browse/RIVER-337
> >>             Project: River
> >>          Issue Type: Bug
> >>          Components: net_jini_discovery, net_jini_lookup
> >>    Affects Versions: AR1, jtsk_2.1
> >>            Reporter: Chris Dolan
> >>
> >>
> >> The method
> >>
> >>
> net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(Discov
> eryEvent)
> >> has the following code that throws a RuntimeException (the code
> comment
> >> suggests that it is supposed to be impossible, but it's not).
> >>
> >>        ProxyReg reg = findReg(proxys[i]);
> >>        if(reg != null ) { // this check can be removed.
> >>            proxyRegSet.remove(proxyRegSet.indexOf(reg));
> >>            drops.add(reg);
> >>        } else {
> >>            throw new RuntimeException("discard error");
> >>        }//endif
> >>
> >> Our QA does failover testing with two servers, each with a Reggie,
> where
> >> we deliberately crash and reboot server 1 then server 2 every 30
> minutes
> >> continuously.  In one case, we hit that RuntimeException.  I don't
> know why
> >> we got a null reg (that's a problem for another defect, maybe an
> undiagnosed
> >> race of two discards put on a task queue?  Maybe related to
> RIVER-37?).  But
> >> it caused a catastrophic chain of events because the RuntimeException
> is not
> >> caught anywhere up the stack.  In our case, it killed the
> >> LookupLocatorDiscovery$Notifier thread.
> >>
> >> java.lang.RuntimeException: discard error
> >>        at
> >>
> net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(2639)
> >>        at
> net.jini.discovery.LookupDiscoveryManager.notifyListener(1375)
> >>        at
> net.jini.discovery.LookupDiscoveryManager.notifyListener(1356)
> >>        at net.jini.discovery.LookupDiscoveryManager.access$500(92)
> >>        at
> >>
> net.jini.discovery.LookupDiscoveryManager$LocatorDiscoveryListener.disca
> rded(543)
> >>        at net.jini.discovery.LookupLocatorDiscovery$Notifier.run(650)
> >>
> >> I propose three changes:
> >>
> >>  1) change the discarded() method above to simply warn instead of
> throwing
> >>  2) put a try/catch(Throwable) around the listener invocation in
> >>  LookupLocatorDiscovery$Notifier.run()
> >>  3) put a similar try/catch around listener invocation in
> >> LookupDiscoveryManager.notifyListener
> >>
> >> The idea behind #2 and #3 is that misbehaving listeners should not be
> >> allowed to derail the discovery process.
> >>
> >>
> >>
> >>
> >
> >
>

Reply via email to