Re: [Openslp-devel] Problem with uncontrolled loss of DAs

Nick Wagner Thu, 20 Sep 2007 07:23:46 -0700

You're right about my alternate suggestion.  I had a different system setup
in mind.


I like option C the best.  On my systems it's important that DAs are
discovered quickly, so my DA advert interval is already configured at 20
seconds.  I'm therefore not too worried about the advert overhead, and
recovery from a premature drop would be quick with this mechanism.

I don't know what the other developers think, but I say go for it.

--Nick

On 9/20/07, Morrell Richard <[EMAIL PROTECTED]> wrote:
>
>  Thanks for the feedback.
>
> Ideally, we would like to detect invalid DAs within a minute or so.  I
> know that sounds ambitious, but our base networking is Gigabit Ethernet, our
> backbone is 10Gbit, and our systems have fewer than 30 DAs (although we have
> a couple of hundred "slave" SAs), so the additional load of querying for
> option B is acceptable (option A has the same number of replies, but fewer
> request messages as it would use multicast).  We use very aggressive
> timeouts, and our searches usually complete within a few milliseconds anyway
> (or a few tens of milliseconds in the worst case).
>
> In option A, I had originally thought that the normal multicast algorithm
> was used, in which case subsequent messages would include an increasing
> responders list.  However, on more detailed inspection of the code, I
> realise that active discovery is a single-shot multicast send with no
> retransmits, so it is not really suitable for our purposes.
>
> I'm not clear how your alternate suggestion would work.  My understanding
> from looking at the code is that the library sends registration requests
> only to the local SLP daemon.  (Although the comments for
> NetworkConnectToSA suggest that the cached socket, handle->sasock, can be
> connected directly to a DA/SA, the only place I can find where
> handle->sasock is set up is a call to NetworkConnectToSlpd.)  As I
> understand it, the local SLP daemon will reject registrations that are not
> in its scope.
>
> I have also had a thought about an option C.  If each DA periodically sent
> out a multicast DAAdvert (a heartbeat) at a rate of, say, three times the
> inactive DA check rate, the daemon could remember whether it had received an
> advert for each known DA since the last check, and remove those DAs it
> hadn't heard from.  This would involve the least additional traffic, and
> probably the least additional load on the daemons.  The risk of losing a DA
> incorrectly would be higher than the risk from the unicast option
> (which does up to five retransmits), or similar if we used a rate of five
> times the check rate.  This mechanism would also be likely to recover from
> an incorrect removal more quickly than option B, which would have to rely on
> active DA discovery to re-find the DA (although this could probably be
> tuned).  The periodic sending would be dependent on having a configured
> check period, which would default to disabled.
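
A minimal sketch of the bookkeeping that option C implies, written in Python
for brevity rather than in slpd's C.  The names HeartbeatTracker,
on_da_advert and run_check are hypothetical and are not part of OpenSLP; the
sketch only assumes that the daemon can call one hook per received DAAdvert
and another on each inactive-DA check.

```python
class HeartbeatTracker:
    """Remembers which known DAs sent an advert since the last check.

    Hypothetical illustration only; not OpenSLP code.
    """

    def __init__(self, known_das):
        # Map DA URL -> True if an advert has arrived since the last check.
        self.heard = {da: False for da in known_das}

    def on_da_advert(self, da_url):
        # Assumption: an advert from an unknown (or previously removed) DA
        # simply adds it back, which is what gives quick recovery.
        self.heard[da_url] = True

    def run_check(self):
        """Remove DAs with no advert since the previous check; return them."""
        dropped = sorted(da for da, seen in self.heard.items() if not seen)
        for da in dropped:
            del self.heard[da]
        # Clear the flags so the next check period starts fresh.
        for da in self.heard:
            self.heard[da] = False
        return dropped

    def known_das(self):
        return sorted(self.heard)
```

With adverts sent at three times the check rate, a DA is only dropped when
every advert in a check period is missed, and the next advert that does
arrive puts it straight back.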
>
> Thinking about it, this option seems to have a lot going for it.  What do
> you think?
>
> --Richard
>
> -----Original Message-----
> *From:* Nick Wagner [mailto:[EMAIL PROTECTED]
> *Sent:* 19 September 2007 17:47
> *To:* Morrell Richard
> *Cc:* openslp-devel@lists.sourceforge.net
> *Subject:* Re: [Openslp-devel] Problem with uncontrolled loss of DAs
>
> In my systems everyone is on the same scope, so I haven't run into this
> problem (which is also why I would prefer that any added mechanism be
> disabled by default).  I'm a little curious as to how often multiple scopes
> are actually used, and whether they are used in the same manner as in your
> system.  Within what time period do you need to detect invalid DAs?
>
> You are correct that the issue here is not just a FindScopes one, it's the
> fact that DAs don't expire in slpd.  I ran into the same issue when moving
> slpd unicast to UDP, which is why I added the timeout on the service
> registration (following the protocol, of course :).  If FindScopes were a
> protocol-level command, I'd suggest a similar solution, but FindScopes just
> queries the internal database as given to libslp by the connected slpd.
>
>  As an alternative to either option, the app could make a fake
> registration on each scope it knows about through a previous SLPFindScopes,
> which should help keep the knownDAs in sync, with no change to openslp.
> If multiple scopes aren't widely used, or aren't used in the way you use
> them, this may be the preferred option.  Or it could act as a quick proof of
> concept.
>
> I'm a little worried about removing the answer suppression in option A.
> You are never guaranteed to receive a response from a DA in a particular
> FindSrvs, and if there are a lot of DAs on the system the likelihood of
> seeing that DA could decrease because you are processing all the DA adverts
> on each request.  And I'm assuming you would keep some sort of list of
> potential drops, and not just drop a DA if it doesn't respond to one request.
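
The "list of potential drops" could look roughly like the sketch below: a DA
is only dropped after it has missed several discovery rounds in a row.  This
is an illustration in Python, not OpenSLP code; the MissCounter class, its
record_round method and the max_misses threshold are all hypothetical.

```python
class MissCounter:
    """Counts consecutive discovery rounds in which a known DA was silent.

    Hypothetical illustration only; not OpenSLP code.
    """

    def __init__(self, max_misses=3):
        self.max_misses = max_misses
        self.misses = {}  # DA URL -> consecutive missed rounds

    def record_round(self, known_das, responders):
        """Update the counts after one round; return the DAs to drop."""
        responders = set(responders)
        to_drop = []
        for da in known_das:
            if da in responders:
                # Any reply clears the DA's record.
                self.misses[da] = 0
            else:
                self.misses[da] = self.misses.get(da, 0) + 1
                if self.misses[da] >= self.max_misses:
                    to_drop.append(da)
        for da in to_drop:
            del self.misses[da]
        return to_drop
```

A single lost reply then costs nothing, at the price of taking max_misses
rounds to notice a DA that really has gone away.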
>
> Option B has potential.  Slpd could periodically do a unicast request and
> time out in the same way that registration requests currently time out.  If
> this period wasn't too small there wouldn't be that much of an impact on the
> system (and if disabled by default there would be even less :).  Of the two
> options, I think I prefer this one.
>
> Just my two cents.
>
> --Nick
>
>
> On 9/19/07, Morrell Richard <[EMAIL PROTECTED]> wrote:
> >
> > I have a problem with uncontrolled loss of DAs, i.e. where DAs can drop
> > off the network without sending out a corresponding DA advert, for example
> > through power loss or network device failure.
> >
> > All the DAs in our system have unique scopes, and we perform unicast
> > searches of each scope (I have a patch to the 1.2.1 library that does
> > parallel unicast to multiple DAs, which I haven't yet had time to port to
> > the latest trunk for submission).
> >
> > We get the list of scopes using the SLPFindScopes call, which queries the
> > local daemon.  The problem is that when a DA goes down in an uncontrolled
> > fashion, its scope never seems to get removed from the list of scopes, so
> > we get timeouts for all subsequent searches until the DA comes back up
> > again, which is unacceptable in our application (we can cope with a short
> > period where this occurs, provided the situation is not permanent).
> >
> > We have tried setting the active DA discovery parameters to their most
> > aggressive, in the hope that this would flag up the lost DAs, but this
> > makes no difference.
> >
> > I have looked at the code, both version 1.2.1, which we are using, and the
> > latest trunk, and I believe the problem is in both.  It arises because the
> > active DA discovery only adds new DAs to the DA cache, and does not remove
> > them.  DAs ARE removed from the cache if a unicast request to a DA fails,
> > but such requests seem to be made only for service registration and
> > deregistration and, since each DA has a unique scope, there is no
> > requirement to perform these operations between DAs.
> >
> > The two approaches I was considering were:
> >
> > a) Change the active discovery mechanism to query for all DAs (use an
> > empty previous responders list), and construct a list of those known DAs
> > that don't reply, removing them from the cache after a time.  This
> > behaviour could be enabled/disabled using a new property.
> >
> > b) Perform a regular unicast (DA request?) to each of the known DAs, e.g.
> > on a round robin basis, so that all DAs are polled within a time period
> > controlled by a new property (could be set to zero to disable this
> > behaviour).
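
The round-robin polling in (b) can be sketched as below, in Python rather
than slpd's C.  Everything here is hypothetical (RoundRobinPoller,
poll_interval, next_target, on_poll_failed); it only shows how a single
period setting would spread the polls so that every known DA is visited once
per period, with zero disabling the behaviour.

```python
def poll_interval(num_das, period_seconds):
    """Seconds between successive unicast polls, or None when disabled."""
    if period_seconds <= 0 or num_das == 0:
        return None
    return period_seconds / num_das


class RoundRobinPoller:
    """Hypothetical illustration only; not OpenSLP code."""

    def __init__(self, known_das, period_seconds):
        self.das = list(known_das)
        self.period = period_seconds
        self.next_index = 0

    def next_target(self):
        """Return (da, delay_until_next_poll), or None when disabled."""
        interval = poll_interval(len(self.das), self.period)
        if interval is None:
            return None
        da = self.das[self.next_index % len(self.das)]
        self.next_index = (self.next_index + 1) % len(self.das)
        return da, interval

    def on_poll_failed(self, da):
        """Drop a DA whose unicast request failed after all retransmits."""
        if da in self.das:
            idx = self.das.index(da)
            self.das.remove(da)
            # Keep the cursor on the DA that was due to be polled next.
            if idx < self.next_index:
                self.next_index -= 1
```

For example, with three known DAs and a 60 second period the polls are 20
seconds apart; once one DA is dropped, the remaining two are polled every 30
seconds.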
> >
> > Obviously, I would like to feed any changes back into the project, so I
> > am looking for feedback as to which approach would be preferable, or if
> > there is another approach that would be better, or if someone else was
> > working on the problem already.
> >
> > Thanks.
> >
> > Richard Morrell
> >
> >
> >
> > Software Architecture & Technologies
> > THALES UNDERWATER SYSTEMS LTD
> >
> > This email, including any attachment, is a confidential communication
> > intended solely for the use of the individual or entity to whom it is
> > addressed. It contains information which is private and may be
> > proprietary or covered by legal professional privilege. If you have
> > received this email in error, please notify the sender upon receipt, and
> > immediately delete it from your system.
> >
> > Anything contained in this email that is not connected with the
> > businesses of this company is neither endorsed by nor is the liability of
> > this company.
> >
> > Whilst we have taken reasonable precautions to ensure that any attachment
> > to this email has been swept for viruses, we cannot accept liability for
> > any damage sustained as a result of software viruses, and would advise
> > that you carry out your own virus checks before opening any attachment.
> >
> >
> >
> > _______________________________________________
> > Openslp-devel mailing list
> > Openslp-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/openslp-devel
> >
>
