Re: [pmacct-discussion] Starving BGP sessions

2015-07-25 Thread Paolo Lucente
Hi Markus,

Thanks for the feedback, good to know you are using it in production
and i see how it can help after a session is established. To confirm,
processing the patch is on my todo list; let me thank you a lot for
the contribution :)

Cheers,
Paolo


On Fri, Jul 24, 2015 at 01:16:31PM +0200, Markus Weber wrote:
> On 24.07.2015 04:34, Paolo Lucente wrote:
> 
> > Thanks for the patch; makes sense to me and i see the benefit but
> > i need some tests in the lab before committing as it has its
> > potential danger ;-)
> 
> Just FYI - works here like a charm so far. No more starving sessions.
> 
> > Btw, did you have a look at the config directives bgp_daemon_batch
> > and bgp_daemon_batch_interval? They allow to re-establish the BGP
> > peerings in a more "controlled" fashion, i.e. a definable few at a
> > time, precisely to tackle such scenarios.
> 
> Those help to throttle new incoming BGP sessions (startup scenario).
> However, those settings will not prevent already established
> sessions from starving when there are heavy routing updates.
> 
> In the worst case: if the first connected peer keeps on sending updates
> for longer than the holdtime of any other session, those other
> sessions might starve. True, unlikely - and here the cut-off point is
> ~23 sessions on startup - but we've indeed seen "later connected"
> sessions starve because of ongoing heavy routing updates.
> 
> And a dropped session will delete the RIB for that peer - so I'd
> rather have slower updates than reload the full table. Or am I missing
> something? My understanding (though I only skimmed the code) is that if
> there's no RIB for the peer, no BGP information can be attached (in
> particular, it's not looked up in any other peer's RIB).
> 
> Cheers,
> Markus

___
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists


Re: [pmacct-discussion] Starving BGP sessions

2015-07-24 Thread Markus Weber

On 24.07.2015 04:34, Paolo Lucente wrote:

> Thanks for the patch; makes sense to me and i see the benefit but
> i need some tests in the lab before committing as it has its
> potential danger ;-)

Just FYI - works here like a charm so far. No more starving sessions.

> Btw, did you have a look at the config directives bgp_daemon_batch
> and bgp_daemon_batch_interval? They allow to re-establish the BGP
> peerings in a more "controlled" fashion, i.e. a definable few at a
> time, precisely to tackle such scenarios.


Those help to throttle new incoming BGP sessions (startup scenario).
However, those settings will not prevent already established
sessions from starving when there are heavy routing updates.

In the worst case: if the first connected peer keeps on sending updates
for longer than the holdtime of any other session, those other
sessions might starve. True, unlikely - and here the cut-off point is
~23 sessions on startup - but we've indeed seen "later connected"
sessions starve because of ongoing heavy routing updates.

And a dropped session will delete the RIB for that peer - so I'd
rather have slower updates than reload the full table. Or am I missing
something? My understanding (though I only skimmed the code) is that if
there's no RIB for the peer, no BGP information can be attached (in
particular, it's not looked up in any other peer's RIB).

Cheers,
Markus



Re: [pmacct-discussion] Starving BGP sessions

2015-07-23 Thread Paolo Lucente
Hi Markus,

Thanks for the patch; makes sense to me and i see the benefit but
i need some tests in the lab before committing as it has its
potential danger ;-)

Btw, did you have a look at the config directives bgp_daemon_batch
and bgp_daemon_batch_interval? They allow to re-establish the BGP
peerings in a more "controlled" fashion, i.e. a definable few at a
time, precisely to tackle such scenarios.
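For reference, a minimal sketch of how those two directives might look in a pmacct daemon config (the values are illustrative only, not a recommendation):

```
! accept at most this many new/re-establishing BGP peerings per interval
bgp_daemon_batch: 5
! length of each interval, in seconds
bgp_daemon_batch_interval: 10
```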

Cheers,
Paolo

On Thu, Jul 23, 2015 at 01:25:45PM +0200, Markus Weber wrote:
> 
> In heavily loaded environments [Paolo knows ours probably better
> than me ;-] BGP sessions might run into timeouts during heavy
> routing updates (on startup it takes here - thanks to the fast
> box it's running on - 30 mins to load all tables).
> 
> The reason seems to be the processing loop in bgp.c, which handles
> incoming updates from the peers that connected first, until there
> are no more updates available from them, and only then gets to
> handle the later-connected peers.
> 
> Attached is a small quick-and-dirty hack (against 1.5.0), which
> should fix this (still not in a real round-robin fashion, unless
> you have as many peers as the configured maximum; so it still
> gives a preference to the first ones, but does not forget the
> later ones; more rr-like would be to set peers_idx_rr to
> peers_idx+1 before the break ...).
> 
> Please let me know if you think this breaks anything ... if not,
> maybe it's worth merging into the official release.
> 
> Cheers,
> Markus
> 

> *** /home/ertools/source/pmacct-1.5.0/src/bgp/bgp.c-orig	Thu Jul 23 10:17:34 2015
> --- /home/ertools/source/pmacct-1.5.0/src/bgp/bgp.c	Thu Jul 23 10:24:18 2015
> ***************
> *** 57,62 ****
> --- 57,63 ----
>   void skinny_bgp_daemon()
>   {
>     int slen, ret, rc, peers_idx, allowed;
> +   int peers_idx_rr = 0;
>     struct host_addr addr;
>     struct bgp_header bhdr;
>     struct bgp_peer *peer;
> ***************
> *** 507,514 ****
>     /* We have something coming in: let's lookup which peer is that
>        XXX: to be optimized */
>     for (peer = NULL, peers_idx = 0; peers_idx < config.nfacctd_bgp_max_peers; peers_idx++) {
> !     if (peers[peers_idx].fd && FD_ISSET(peers[peers_idx].fd, &read_descs)) {
> !       peer = &peers[peers_idx];
>         break;
>       }
>     }
> --- 508,516 ----
>     /* We have something coming in: let's lookup which peer is that
>        XXX: to be optimized */
>     for (peer = NULL, peers_idx = 0; peers_idx < config.nfacctd_bgp_max_peers; peers_idx++) {
> !     if (peers[(peers_idx+peers_idx_rr)%config.nfacctd_bgp_max_peers].fd && FD_ISSET(peers[(peers_idx+peers_idx_rr)%config.nfacctd_bgp_max_peers].fd, &read_descs)) {
> !       peer = &peers[(peers_idx+peers_idx_rr)%config.nfacctd_bgp_max_peers];
> !       peers_idx_rr = (peers_idx_rr+1)%config.nfacctd_bgp_max_peers;
>         break;
>       }
>     }





[pmacct-discussion] Starving BGP sessions

2015-07-23 Thread Markus Weber


In heavily loaded environments [Paolo knows ours probably better
than me ;-] BGP sessions might run into timeouts during heavy
routing updates (on startup it takes here - thanks to the fast
box it's running on - 30 mins to load all tables).

The reason seems to be the processing loop in bgp.c, which handles
incoming updates from the peers that connected first, until there
are no more updates available from them, and only then gets to
handle the later-connected peers.

Attached is a small quick-and-dirty hack (against 1.5.0), which
should fix this (still not in a real round-robin fashion, unless
you have as many peers as the configured maximum; so it still
gives a preference to the first ones, but does not forget the
later ones; more rr-like would be to set peers_idx_rr to
peers_idx+1 before the break ...).

Please let me know if you think this breaks anything ... if not,
maybe it's worth merging into the official release.

Cheers,
Markus

*** /home/ertools/source/pmacct-1.5.0/src/bgp/bgp.c-orig	Thu Jul 23 10:17:34 2015
--- /home/ertools/source/pmacct-1.5.0/src/bgp/bgp.c	Thu Jul 23 10:24:18 2015
***************
*** 57,62 ****
--- 57,63 ----
  void skinny_bgp_daemon()
  {
    int slen, ret, rc, peers_idx, allowed;
+   int peers_idx_rr = 0;
    struct host_addr addr;
    struct bgp_header bhdr;
    struct bgp_peer *peer;
***************
*** 507,514 ****
    /* We have something coming in: let's lookup which peer is that
       XXX: to be optimized */
    for (peer = NULL, peers_idx = 0; peers_idx < config.nfacctd_bgp_max_peers; peers_idx++) {
!     if (peers[peers_idx].fd && FD_ISSET(peers[peers_idx].fd, &read_descs)) {
!       peer = &peers[peers_idx];
        break;
      }
    }
--- 508,516 ----
    /* We have something coming in: let's lookup which peer is that
       XXX: to be optimized */
    for (peer = NULL, peers_idx = 0; peers_idx < config.nfacctd_bgp_max_peers; peers_idx++) {
!     if (peers[(peers_idx+peers_idx_rr)%config.nfacctd_bgp_max_peers].fd && FD_ISSET(peers[(peers_idx+peers_idx_rr)%config.nfacctd_bgp_max_peers].fd, &read_descs)) {
!       peer = &peers[(peers_idx+peers_idx_rr)%config.nfacctd_bgp_max_peers];
!       peers_idx_rr = (peers_idx_rr+1)%config.nfacctd_bgp_max_peers;
        break;
      }
    }