On Thu, Nov 29, 2018 at 05:28:53AM +0000, Tom Smyth wrote:
> Hello,
> before I begin... I'm just a sysadmin, not a programmer.
> I appreciate the work you are doing on OpenBGPD :) I use it and I'm
> very happy with it.
> 
> 
> I saw Claudio's presentation on OpenBGPD recently and how there was
> some work on MP for bgpd, and I was wondering about an idea.
> (I thought it was a simple thing, which may be too simple... please
> bear in mind I'm not a professional programmer.)
> 
> Basically, I was wondering whether there could be a simple way of
> removing the risk of having an MP RDE.
> 
> From my understanding, one of the concerns about the RDE being MP is
> a withdraw being received by a process running on a busy processor
> and a subsequent announce being received by a process running on a
> lightly loaded processor: the announce gets processed first and the
> withdraw afterwards, thereby withdrawing the route even though the
> announce was received after the withdraw. There are probably other
> concerns too.
> 
> To get around the concern above, could the IP space be carved up
> between different processes, so that it is always the same process
> dealing with all messages to do with a given address space?
> 
> 
> Let's say, with a 2 core and a 4 core box for example:
> 
> Could you split the RDE into 2 RDE processes, each of which would
> only process half of the IP space?
> So for IPv4:
> 
> 0.0.0.0/0 (route decision engine on a single core, the current setup)
> 
> on a 2 core system
> 0.0.0.0/1 would be managed by RDE0
> 128.0.0.0/1 would be managed by RDE1
> 
> on a 4 core system
> 
> 0.0.0.0/2      RDE0
> 64.0.0.0/2     RDE1
> 128.0.0.0/2    RDE2
> 192.0.0.0/2    RDE3
> 
> on an 8 core system
> 
> 0.0.0.0/3      RDE0
> 32.0.0.0/3     RDE1
> 64.0.0.0/3     RDE2
> 96.0.0.0/3     RDE3
> 128.0.0.0/3    RDE4
> 160.0.0.0/3    RDE5
> 192.0.0.0/3    RDE6
> 224.0.0.0/3    RDE7 ???? you can use the spare cycles on this one to
> generate your favourite concurrency or help SETI :P or something :)
> 
> So each process may not be equally loaded, but they would be faster
> overall with the increased parallelism. And because each RDE is
> operating on prefixes in a certain range that doesn't overlap the
> work of the other RDE processes, there would never be a race
> condition or nasty MP-introduced bugs.
> 
> Also, there would need to be some work on the session engine to pass
> the BGP messages to the correct RDE process easily (the decision
> process would be similar to a routing decision in the kernel: LPM
> with 2 or 4 or 8 routes).
> 
> IPv6, I gather, would need a more nuanced approach, as in splitting
> the subset 2000::/3 as opposed to the whole address space (::/0).
> 
> It would be a long time before I would be able to submit a patch
> worthy of consideration... but is this idea worth pursuing? Is
> there a fundamental flaw in this idea / approach?
> 
> Thanks for your time and consideration...

While it sounds trivial, it is not that easy. First of all, RIB
maintenance is not the compute-intensive work; the filtering is.
I did consider this some time ago, but for now I see no reason to go
down the road of sharding the RIB.
My current plan is to decouple input and output processing. Without
that, even sharding will be hard to implement.

-- 
:wq Claudio
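
For illustration only, the prefix-to-RDE mapping proposed in the quoted
message could be sketched roughly as below. This is not OpenBGPD code:
the function name rde_shard_v4 and the shard_bits parameter are
hypothetical, and the sketch assumes the number of RDE processes is a
power of two (shard_bits = log2 of that number) and that the IPv4
network address is given in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of OpenBGPD: pick the RDE shard for an
 * IPv4 prefix from the top shard_bits bits of its network address.
 * addr is in host byte order; shard_bits is log2(number of RDEs).
 */
static unsigned int
rde_shard_v4(uint32_t addr, unsigned int shard_bits)
{
	/* Assumption for this sketch: at most 256 shards. */
	assert(shard_bits <= 8);

	if (shard_bits == 0)
		return 0;	/* single RDE: everything goes to RDE0 */

	/* The top bits select 0.0.0.0/N, the next block, and so on. */
	return addr >> (32 - shard_bits);
}
```

With shard_bits = 2 this reproduces the 4 core table from the quoted
message, e.g. 192.0.0.0 maps to RDE3. A prefix shorter than shard_bits
(such as 0.0.0.0/0) still maps to exactly one shard, the one selected
by its network address, so every update for that exact prefix lands in
the same process. As noted in the reply, this covers only the RIB
side; it does nothing for the filtering cost.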
