1. Like I said, decorrelation via flooding delays may be good enough already 
(or may actually be so bad that SPF synchronization isn't even possible [I am 
pretty sure that will ultimately be the case under heavy load/bigger failures]; 
but under stable conditions/single link failures we are talking today, based 
on recent data, about very low-msec end-to-end flooding delays). Based on 
experience with LSPs synchronizing into strange attractors, I suggest a 
jittered configurable timer before a run nevertheless. It can be set to 0 or 
made optional for implementations.
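For illustration, a minimal sketch of such a jittered pre-SPF timer (all names and default values below are hypothetical, chosen for the example, not taken from the draft):

```python
import random
import threading

# Hypothetical defaults for illustration only -- not from the draft.
SPF_INITIAL_DELAY_MS = 50   # base hold-down before the first SPF run
SPF_JITTER_MS = 10          # configurable jitter; 0 disables it

def schedule_spf(run_spf):
    """Arm a timer that fires run_spf after the base delay plus a small
    random jitter, so routers that received the same LSP/LSA do not all
    compute at precisely the same instant."""
    jitter_ms = random.uniform(0, SPF_JITTER_MS)
    timer = threading.Timer((SPF_INITIAL_DELAY_MS + jitter_ms) / 1000.0,
                            run_spf)
    timer.start()
    return timer
```

Setting SPF_JITTER_MS to 0 recovers a deterministic delay, which matches the "can be set to 0 or made optional" suggestion above.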
2. Yes, Bruno, agreed, CAP is _not_ easy and so far unproven (but so is LS 
flooding, as far as I remember ;-). I do however think that my point has been 
missed: the draft tries to synchronize not a single node but a whole network, 
i.e. a massively distributed, loosely coupled system. And by modifying its own 
LSA, a node modifies the state in all nodes via flooding.  Anyway, there are 
no clear conclusions as to how this draft should be modified, except maybe 
adding a section highlighting the trade-offs between faster initial 
computation, better synchronization, and the issues I pointed out.  In case I 
find a minute, I'll read it & come back with more specific suggestions. 

Again, I'm supportive of this work and looking forward to practical deployment 
experiences once the exponential/batching/whatever mechanism with enough 
parameters has been suggested as BCP. 
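For concreteness, the exponential-backoff flavor of such a mechanism could be sketched roughly as follows (class, method, and parameter names are my own invention for the example, not the draft's actual state machine):

```python
class SpfBackoff:
    """Illustrative exponential SPF-delay backoff: the first trigger
    after a quiet period gets a short delay; rapid subsequent triggers
    see the delay double up to a cap, and the delay resets once the
    network has been stable for hold_down_s seconds."""

    def __init__(self, initial_ms=50, max_ms=5000, hold_down_s=2.0):
        self.initial_ms = initial_ms
        self.max_ms = max_ms
        self.hold_down_s = hold_down_s
        self.current_ms = initial_ms
        self.last_trigger = None

    def next_delay_ms(self, now_s):
        # Reset to the initial delay after a quiet period.
        if (self.last_trigger is not None
                and now_s - self.last_trigger > self.hold_down_s):
            self.current_ms = self.initial_ms
        delay = self.current_ms
        # Double the delay for the next trigger, up to the cap.
        self.current_ms = min(self.current_ms * 2, self.max_ms)
        self.last_trigger = now_s
        return delay
```

The cap (`max_ms`) is what distinguishes this from the fixed 500-msec batching alternative discussed later in the thread: under sustained churn the backoff converges to its cap rather than a constant period.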

--- tony

> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Monday, July 28, 2014 3:42 AM
> To: Antoni Przygienda
> Cc: [email protected]
> Subject: RE: MINIMIZE BACKOFF SPF two technical points
> 
> Hi Antoni,
> 
> Thanks for your feedback. More inlined.
> 
> > From: rtgwg [mailto:[email protected]] On Behalf Of Antoni Przygienda
> > Sent: Thursday, July 24, 2014 11:06 PM
> >
> > First off, I'm supportive of the work & I think it's of solid
> > applicable value
> 
> Thank you.
> 
> > albeit it's strictly not IETF territory (it's not necessary for
> > interop, strictly speaking).
> >
> > First is very blunt: if you manage to really make all the routers in
> > the area compute @ precisely the same time, you may not be doing
> > yourself the favor you seek ;-)  What I mean is that generating
> > perfectly synchronized peaks in a network tends to generate strange
> > attractors; a good example was the synchronization of the HELLOs on
> > all links over time, which had to be jittered.
> > Peaks can stress infra unexpectedly & lead to e.g. synchronized
> > re-advertisement of LSAs (or anything that SPF can trigger, now and
> > in the future).  Given, on top of that, that an SPF in the future is
> > not necessarily the 2-3 msec SPF seen today (rLFA & such runs seem to
> > become the new flavor of SPF), I suggest including a small
> > configurable jitter before the first SPF is triggered (a couple of
> > msecs should do the trick, but I'm willing to hear the argument that
> > flooding de-syncs the SPF runs enough already).
> 
> Interesting feedback. We'll try to keep it in mind.
> FYI, note that so far we have mainly had the opposite feedback: "You'll
> never manage to have perfect synchronization (e.g. CPU scheduler delays
> between routers)."
> 
> 
> > The other issue is far more subtle but may merit a section in the
> > draft.  This work is pushing the protocol in a very specific direction
> > along the CAP paradigm, i.e. a link-state routing protocol is roughly
> >
> > 1. Always 100% P (partition tolerant)
> > 2. Basically 100% available A (tad hard to define given FIBs)
> > 3. _eventually_ consistent C
> >
> > Now, it is fairly well understood that having all 3 is not possible
> > across a very wide set of CS problems, and we are not exempt from
> > that.  We cannot move P, so pushing on the C will cause A to move
> > toward the negative. Now, what do I mean by that?
> 
> Well, I'm not an expert in theoretical computer science but:
> - there seems to be a debate on CAP itself
> - I'm not sure it's applicable to SPF delay. In particular, SPF delay
> has no influence on the LSDB, and hence in particular on its
> Consistency, Availability, and tolerance to Partition. Also, the proof
> of the CAP theorem seems to be limited to a replicated distributed
> system, while in a LS IGP only a single node is allowed to modify a
> given piece of data (LSP/LSA), so we don't have the issue of a
> distributed system which gets partitioned and where we have 2
> simultaneous conflicting requests (asking for different changes).
> 
> > Triggering the SPFs more aggressively will give you better
> > consistency & availability in the scenario of a single link failure,
> > if things go well.
> > Now, compared to e.g. a batching algorithm that computes every 500
> > msecs without backing off and will show linear consistency &
> > availability even in case of fast link flapping, many links failing
> > consecutively and so on, exponential backoff will cause massively
> > lower consistency after several link failures, and this network-wide,
> > so certain people may lose big time when using that.
> > Besides that, the quick SPFs can block lots of other things in the
> > protocol that are not parallelized, or block other protocols waiting
> > for SPFs to finish, or the next SPF (2nd failure) can get stuck on a
> > running FIB download (all hypothetical, but availability in the
> > widest sense will go down if you see more consistency). Again, the
> > work is good, but the section will show people that it's not a
> > 'universal' improvement but something targeted at an ideally
> > seldom-occurring 1- or 2-link failure.
> 
> Let's step back a little bit on these 2 comments and in particular the
> CAP one.
> The draft proposes to specify a common SPF delay specification. (Full
> stop.)
> At most (best IMHO, worst related to your CAP comment):
> - only the SPF delay is changed
> - all nodes use the same spec.
> 
> I don't think it's possible to claim that there is a theoretical
> issue/risk, because a mono-vendor AS (using the same implementation on
> all nodes) goes much further in terms of consistent behavior. And I
> don't think that any vendor will say that in such circumstances its
> implementation does not work or is more risky, and that the AS
> MUST/SHOULD introduce another vendor.
> Link State IGPs do work even in a mono-vendor network with all nodes
> running the same code and hence the same spec.
> 
> Thanks,
> Bruno
> 
> 
> > Thanks
> >
> > --- tony
> >
> >
> >
> > "FUTURE, n.
> > That period of time in which our affairs prosper, our friends are true
> > and our happiness is assured."
> > ― Ambrose Bierce, The Unabridged Devil's Dictionary
> >
> 

_______________________________________________
rtgwg mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/rtgwg
