Re: OpenBGPd: fatal in RDE: aspath_get: Cannot allocate memory

2022-04-11 Thread Laurent CARON

On 04/04/2022 at 15:43, Claudio Jeker wrote:


You should really use as-set for this:

as-set ru-set { 2148 2585 2587 ... }

And also don't match on 'any' (at least I think you don't really want that to
match on ibgp sessions):

match from ebgp AS as-set ru-set set { localpref 250 nexthop blackhole }

If done right you can replace all your rules with a single one.
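Spelled out, the consolidated form might look like this (a sketch; the member ASNs are taken from the head of the per-ASN file quoted elsewhere in the thread, and the "..." is kept from the original message as a stand-in for the remaining entries):

```
# hedged sketch of the consolidated filter: one as-set, one rule,
# matching only ebgp sessions so ibgp-learned routes stay untouched
as-set ru-set { 2148 2585 2587 2599 2766 2848 2854 2875 2878 2895 ... }
match from ebgp AS as-set ru-set set { localpref 250 nexthop blackhole }
```

The "..." would be replaced by the rest of the ASN list; the result can be checked with "bgpd -n -f /etc/bgpd.conf" before reloading.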



Hi Claudio,

I followed your advice and everything is stable now. No need to increase 
memory limits.


Thanks for the hint.

Laurent



Re: OpenBGPd: fatal in RDE: aspath_get: Cannot allocate memory

2022-04-06 Thread Laurent CARON



On 04/04/2022 at 15:43, Claudio Jeker wrote:

On Tue, Mar 29, 2022 at 09:53:56AM +0200, Laurent CARON wrote:

Hi,

I'm happily running several OpenBGPd routers (OpenBSD 7.0).

After having applied the following filters (to blackhole traffic from
certain countries):

include "/etc/bgpd/deny-asn.ru.bgpd"
include "/etc/bgpd/deny-asn.by.bgpd"
include "/etc/bgpd/deny-asn.ua.bgpd"


# head /etc/bgpd/deny-asn.ru.bgpd
match from any AS 2148 set { localpref 250 nexthop blackhole }
match from any AS 2585 set { localpref 250 nexthop blackhole }
match from any AS 2587 set { localpref 250 nexthop blackhole }
match from any AS 2599 set { localpref 250 nexthop blackhole }
match from any AS 2766 set { localpref 250 nexthop blackhole }
match from any AS 2848 set { localpref 250 nexthop blackhole }
match from any AS 2854 set { localpref 250 nexthop blackhole }
match from any AS 2875 set { localpref 250 nexthop blackhole }
match from any AS 2878 set { localpref 250 nexthop blackhole }
match from any AS 2895 set { localpref 250 nexthop blackhole }


You should really use as-set for this:

as-set ru-set { 2148 2585 2587 ... }

And also don't match on 'any' (at least I think you don't really want that to
match on ibgp sessions):

match from ebgp AS as-set ru-set set { localpref 250 nexthop blackhole }

If done right you can replace all your rules with a single one.


Hi Claudio,

Thanks for the hints.

Will change the config accordingly and report back.

Cheers,

Laurent



Re: OpenBGPd: fatal in RDE: aspath_get: Cannot allocate memory

2022-04-04 Thread Claudio Jeker
On Tue, Mar 29, 2022 at 09:53:56AM +0200, Laurent CARON wrote:
> Hi,
> 
> I'm happily running several OpenBGPd routers (OpenBSD 7.0).
> 
> After having applied the following filters (to blackhole traffic from
> certain countries):
> 
> include "/etc/bgpd/deny-asn.ru.bgpd"
> include "/etc/bgpd/deny-asn.by.bgpd"
> include "/etc/bgpd/deny-asn.ua.bgpd"
> 
> 
> # head /etc/bgpd/deny-asn.ru.bgpd
> match from any AS 2148 set { localpref 250 nexthop blackhole }
> match from any AS 2585 set { localpref 250 nexthop blackhole }
> match from any AS 2587 set { localpref 250 nexthop blackhole }
> match from any AS 2599 set { localpref 250 nexthop blackhole }
> match from any AS 2766 set { localpref 250 nexthop blackhole }
> match from any AS 2848 set { localpref 250 nexthop blackhole }
> match from any AS 2854 set { localpref 250 nexthop blackhole }
> match from any AS 2875 set { localpref 250 nexthop blackhole }
> match from any AS 2878 set { localpref 250 nexthop blackhole }
> match from any AS 2895 set { localpref 250 nexthop blackhole }
> 

You should really use as-set for this:

as-set ru-set { 2148 2585 2587 ... }

And also don't match on 'any' (at least I think you don't really want that to
match on ibgp sessions):

match from ebgp AS as-set ru-set set { localpref 250 nexthop blackhole }

If done right you can replace all your rules with a single one.

-- 
:wq Claudio



Re: OpenBGPd: fatal in RDE: aspath_get: Cannot allocate memory

2022-04-04 Thread Claudio Jeker
On Mon, Apr 04, 2022 at 03:14:35PM +0200, Laurent CARON wrote:
> 
> On 01/04/2022 at 14:38, Claudio Jeker wrote:
> > 
> > The numbers look reasonable with maybe the exception of prefix and BGP
> > path attrs. Unless this system is pushing or pulling lots of full feeds to
> > peers I would not expect such a high number of prefixes. Also the number
> > of path attributes is high but that could again be reasonable if many
> > different full feeds are involved.
> 
> Hi Claudio,
> 
> This box is terminating 3 full IPv4 + 3 full IPv6 feeds + a few dozen IX
> sessions in addition to 5 IPv4 + 5 IPv6 iBGP connections.

3GB is not enough for such a busy system. You need to increase your limit;
5GB is probably enough.
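On OpenBSD that limit comes from the login class bgpd is started under, so raising it is a login.conf(5) change. A sketch, with the class name "bgpd" and the sizes as examples only (confirm the class actually in use via the "su -c bgpd -" check from Stuart's mail):

```
# /etc/login.conf -- hedged sketch: a dedicated class with a larger
# data segment for the RDE (values are illustrative, not recommendations)
bgpd:\
	:datasize-max=8192M:\
	:datasize-cur=8192M:\
	:tc=daemon:
```

If /etc/login.conf.db exists, rebuild it with "cap_mkdb /etc/login.conf", then restart bgpd so the new limit takes effect.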
 
> > > I'm not sure why the process gets killed at around 3GB. Feels like you
> > > hit the ulimit. See Stuart's mail about how to look into that.
> > > So looking at this output I feel like you somehow created a BGP update
> > > loop where one or more systems are constantly sending UPDATEs to each
> > > other because the moment the update is processed the route decision
> > > changes and flaps back resulting in a withdraw or update.
> 
> I sincerely think it is not related to a BGP update loop because the issue
> is only triggered when adding the following filters:
> 
> include "/etc/bgpd/deny-asn.ru.bgpd"
> include "/etc/bgpd/deny-asn.by.bgpd"
> include "/etc/bgpd/deny-asn.ua.bgpd"
> 
> for a total of 8265 rules
> 
> I'll try to dig further.

If you deny asns then please use an as-set instead of individual rules.

-- 
:wq Claudio



Re: OpenBGPd: fatal in RDE: aspath_get: Cannot allocate memory

2022-04-04 Thread Laurent CARON



On 01/04/2022 at 14:38, Claudio Jeker wrote:


The numbers look reasonable with maybe the exception of prefix and BGP
path attrs. Unless this system is pushing or pulling lots of full feeds to
peers I would not expect such a high number of prefixes. Also the number
of path attributes is high but that could again be reasonable if many
different full feeds are involved.


Hi Claudio,

This box is terminating 3 full IPv4 + 3 full IPv6 feeds + a few dozen IX 
sessions in addition to 5 IPv4 + 5 IPv6 iBGP connections.



I'm not sure why the process gets killed at around 3GB. Feels like you
hit the ulimit. See Stuart's mail about how to look into that.

So looking at this output I feel like you somehow created a BGP update
loop where one or more systems are constantly sending UPDATEs to each
other because the moment the update is processed the route decision
changes and flaps back resulting in a withdraw or update.


I sincerely think it is not related to a BGP update loop because the 
issue is only triggered when adding the following filters:


include "/etc/bgpd/deny-asn.ru.bgpd"
include "/etc/bgpd/deny-asn.by.bgpd"
include "/etc/bgpd/deny-asn.ua.bgpd"

for a total of 8265 rules

I'll try to dig further.


Thanks



Re: OpenBGPd: fatal in RDE: aspath_get: Cannot allocate memory

2022-04-04 Thread Laurent CARON



On 29/03/2022 at 14:50, Stuart Henderson wrote:


Also: check the values for bgpd's login class (as root, "su -c bgpd -"
then "ulimit -a"), and are you starting bgpd from the rc-script or by hand?




Hi Stuart,

# ulimit -a
time(cpu-seconds)    unlimited
file(blocks) unlimited
coredump(blocks) unlimited
data(kbytes) 33554432
stack(kbytes)    8192
lockedmem(kbytes)    21502949
memory(kbytes)   64498548
nofiles(descriptors) 512
processes    1310


I'm starting bgpd through "rcctl start bgpd"


Thanks

Laurent



Re: OpenBGPd: fatal in RDE: aspath_get: Cannot allocate memory

2022-04-01 Thread Claudio Jeker
On Thu, Mar 31, 2022 at 09:06:05PM +0200, Laurent CARON wrote:
> On 29/03/2022 at 12:10, Claudio Jeker wrote:
> > I doubt it is the filters. You run into some sort of memory leak. Please
> > monitor 'bgpctl show rib mem' output. Also check ps aux | grep bgpd output
> > to see why and when the memory starts to go up.
> > With that information it may be possible to figure out where this leak
> > sits and how to fix it.
> > 
> > Cheers
> 
> 
> Hi Claudio,
> 
> Please find the output of 'bgpctl show rib mem' just 1 minute before the
> crash:
> 
> cat 2022-03-30::15:07:01.mem
> RDE memory statistics
> 909685 IPv4 unicast network entries using 34.7M of memory
> 272248 IPv6 unicast network entries using 14.5M of memory
>2363169 rib entries using 144M of memory
>   14616410 prefix entries using 1.7G of memory
>1539060 BGP path attribute entries using 106M of memory
>and holding 14616410 references
> 635275 BGP AS-PATH attribute entries using 33.7M of memory
>and holding 1539060 references
>  47399 entries for 681150 BGP communities using 15.1M of memory
>and holding 14616410 references
>  22139 BGP attributes entries using 865K of memory
>and holding 3436885 references
>  22138 BGP attributes using 175K of memory
> 270121 as-set elements in 249193 tables using 9.7M of memory
> 452138 prefix-set elements using 19.0M of memory
> RIB using 2.1G of memory
> Sets using 28.7M of memory
> 
> RDE hash statistics
> path hash: size 131072, 1539060 entries
> min 0 max 31 avg/std-dev = 11.742/3.623
> aspath hash: size 131072, 635275 entries
> min 0 max 16 avg/std-dev = 4.847/2.123
> comm hash: size 16384, 47399 entries
> min 0 max 12 avg/std-dev = 2.893/1.622
> attr hash: size 16384, 22139 entries
> min 0 max 8 avg/std-dev = 1.351/1.084

The numbers look reasonable with maybe the exception of prefix and BGP
path attrs. Unless this system is pushing or pulling lots of full feeds to
peers I would not expect such a high number of prefixes. Also the number
of path attributes is high but that could again be reasonable if many
different full feeds are involved.
 
> Here is the output of 'ps aux | grep bgp' one minute before the crash:
> 
> _bgpd25479 100.1 40.1 33547416 33620192 ??  Rp/2   Tue09AM 1755:38.49
> bgpd: route
> _bgpd 8696 31.6  0.0 15800 13240 ??  Sp Tue09AM  626:35.66 bgpd:
> sessio
> _bgpd46603  0.0  0.0 22728 25876 ??  Ip Tue09AM1:29.11 bgpd: rtr
> en
> root 94644  0.0  0.0   196   916 ??  Rp/33:07PM0:00.00 grep bgpd
 
Interesting, the size is around 3GB, which is somewhat reasonable.
What surprises me is the high CPU load and time spent in both the RDE and
SE. One of my core routers running since last September has about the same
CPU usage that your box collected in a few days. It seems that there is a
lot of churn.

I'm not sure why the process gets killed at around 3GB. Feels like you
hit the ulimit. See Stuart's mail about how to look into that.
 
So looking at this output I feel like you somehow created a BGP update
loop where one or more systems are constantly sending UPDATEs to each
other because the moment the update is processed the route decision
changes and flaps back resulting in a withdraw or update.

You can check the 'bgpctl show' and 'bgpctl show nei <peer>' output to
see between which peers many messages are sent. From there on you need to
see which prefixes cause this update storm. Probably some filter rule
causes this.
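Concretely, the checks suggested above amount to something like the following (the neighbor address is a placeholder):

```
bgpctl show summary              # per-neighbor state and message
                                 # counters; look for ones climbing fast
bgpctl show neighbor 192.0.2.1   # full detail for one suspect peer,
                                 # including its message statistics
```

Running 'bgpctl show summary' twice a minute apart and comparing the counters makes an UPDATE storm between two peers stand out quickly.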

My assumption is that because of this UPDATE loop the systems slowly kill
each other by pushing more and more updates into various buffers along the
way.

> During the crash, bgpctl show rib mem doesn't work.
> Here is the ps aux | grep bgp output during the crash:
> 
> _bgpd25479  0.0  0.0 0 0 ??  Zp -  0:00.00 (bgpd)
> _bgpd46603  0.0  0.0 0 0 ??  Zp -  0:00.00 (bgpd)
> _bgpd 8696  0.0  0.0 0 0 ??  Zp -  0:00.00 (bgpd)
> root 76428  0.0  0.0   180   772 ??  R/2 3:08PM0:00.00 grep bgpd
> 
> 
> Please note /var/log/messages output:
> 
> Mar 30 15:07:27 bgpgw-004 bgpd[17103]: peer closed imsg connection
> Mar 30 15:07:27 bgpgw-004 bgpd[17103]: main: Lost connection to RDE
> Mar 30 15:07:27 bgpgw-004 bgpd[46603]: peer closed imsg connection
> Mar 30 15:07:27 bgpgw-004 bgpd[46603]: RTR: Lost connection to RDE
> Mar 30 15:07:27 bgpgw-004 bgpd[46603]: peer closed imsg connection
> Mar 30 15:07:27 bgpgw-004 bgpd[46603]: fatal in RTR: Lost connection to
> parent
> Mar 30 15:07:27 bgpgw-004 bgpd[8696]: peer closed imsg connection
> Mar 30 15:07:27 bgpgw-004 bgpd[8696]: SE: Lost connection to RDE
> Mar 30 15:07:27 bgpgw-004 bgpd[8696]: peer closed imsg connection
> Mar 30 15:07:27 bgpgw-004 bgpd[8696]: SE: Lost connection to RDE control
> Mar 30 15:07:27 bgpgw-004 bgpd[8696]: peer closed imsg connection
> Mar 

Re: OpenBGPd: fatal in RDE: aspath_get: Cannot allocate memory

2022-03-31 Thread Laurent CARON

On 29/03/2022 at 12:10, Claudio Jeker wrote:

I doubt it is the filters. You run into some sort of memory leak. Please
monitor 'bgpctl show rib mem' output. Also check ps aux | grep bgpd output
to see why and when the memory starts to go up.
With that information it may be possible to figure out where this leak
sits and how to fix it.

Cheers



Hi Claudio,

Please find the output of 'bgpctl show rib mem' just 1 minute before the 
crash:


cat 2022-03-30::15:07:01.mem
RDE memory statistics
909685 IPv4 unicast network entries using 34.7M of memory
272248 IPv6 unicast network entries using 14.5M of memory
   2363169 rib entries using 144M of memory
  14616410 prefix entries using 1.7G of memory
   1539060 BGP path attribute entries using 106M of memory
   and holding 14616410 references
635275 BGP AS-PATH attribute entries using 33.7M of memory
   and holding 1539060 references
 47399 entries for 681150 BGP communities using 15.1M of memory
   and holding 14616410 references
 22139 BGP attributes entries using 865K of memory
   and holding 3436885 references
 22138 BGP attributes using 175K of memory
270121 as-set elements in 249193 tables using 9.7M of memory
452138 prefix-set elements using 19.0M of memory
RIB using 2.1G of memory
Sets using 28.7M of memory

RDE hash statistics
path hash: size 131072, 1539060 entries
min 0 max 31 avg/std-dev = 11.742/3.623
aspath hash: size 131072, 635275 entries
min 0 max 16 avg/std-dev = 4.847/2.123
comm hash: size 16384, 47399 entries
min 0 max 12 avg/std-dev = 2.893/1.622
attr hash: size 16384, 22139 entries
min 0 max 8 avg/std-dev = 1.351/1.084
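For snapshots like the one above, a small helper can cut each saved file down to its headline totals so successive snapshots are easy to trend or diff (a sketch; the patterns track the labels in the output shown here):

```shell
# Hedged sketch: reduce a saved 'bgpctl show rib mem' snapshot to its
# headline numbers (prefix entries, RIB total, Sets total).
extract_totals() {
    awk '/prefix entries using/ || /^RIB using/ || /^Sets using/ { print }' "$1"
}
```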


Here is the output of 'ps aux | grep bgp' one minute before the crash:

_bgpd25479 100.1 40.1 33547416 33620192 ??  Rp/2   Tue09AM 
1755:38.49 bgpd: route
_bgpd 8696 31.6  0.0 15800 13240 ??  Sp Tue09AM  626:35.66 bgpd: 
sessio
_bgpd46603  0.0  0.0 22728 25876 ??  Ip Tue09AM1:29.11 bgpd: 
rtr en

root 94644  0.0  0.0   196   916 ??  Rp/33:07PM0:00.00 grep bgpd


During the crash, bgpctl show rib mem doesn't work.
Here is the ps aux | grep bgp output during the crash:

_bgpd25479  0.0  0.0 0 0 ??  Zp -  0:00.00 (bgpd)
_bgpd46603  0.0  0.0 0 0 ??  Zp -  0:00.00 (bgpd)
_bgpd 8696  0.0  0.0 0 0 ??  Zp -  0:00.00 (bgpd)
root 76428  0.0  0.0   180   772 ??  R/2 3:08PM0:00.00 grep bgpd


Please note /var/log/messages output:

Mar 30 15:07:27 bgpgw-004 bgpd[17103]: peer closed imsg connection
Mar 30 15:07:27 bgpgw-004 bgpd[17103]: main: Lost connection to RDE
Mar 30 15:07:27 bgpgw-004 bgpd[46603]: peer closed imsg connection
Mar 30 15:07:27 bgpgw-004 bgpd[46603]: RTR: Lost connection to RDE
Mar 30 15:07:27 bgpgw-004 bgpd[46603]: peer closed imsg connection
Mar 30 15:07:27 bgpgw-004 bgpd[46603]: fatal in RTR: Lost connection to 
parent

Mar 30 15:07:27 bgpgw-004 bgpd[8696]: peer closed imsg connection
Mar 30 15:07:27 bgpgw-004 bgpd[8696]: SE: Lost connection to RDE
Mar 30 15:07:27 bgpgw-004 bgpd[8696]: peer closed imsg connection
Mar 30 15:07:27 bgpgw-004 bgpd[8696]: SE: Lost connection to RDE control
Mar 30 15:07:27 bgpgw-004 bgpd[8696]: peer closed imsg connection
Mar 30 15:07:27 bgpgw-004 bgpd[8696]: SE: Lost connection to parent


Thanks,

Laurent



Re: OpenBGPd: fatal in RDE: aspath_get: Cannot allocate memory

2022-03-29 Thread Laurent CARON

On 29/03/2022 at 12:10, Claudio Jeker wrote:

I doubt it is the filters. You run into some sort of memory leak. Please
monitor 'bgpctl show rib mem' output. Also check ps aux | grep bgpd output
to see why and when the memory starts to go up.
With that information it may be possible to figure out where this leak
sits and how to fix it.

Cheers


Thanks Claudio, will do and report.



Re: OpenBGPd: fatal in RDE: aspath_get: Cannot allocate memory

2022-03-29 Thread Stuart Henderson
On 2022-03-29, Claudio Jeker wrote:
> On Tue, Mar 29, 2022 at 09:53:56AM +0200, Laurent CARON wrote:
>> Hi,
>> 
>> I'm happily running several OpenBGPd routers (OpenBSD 7.0).
>> 
>> After having applied the following filters (to blackhole traffic from
>> certain countries):
>> 
>> include "/etc/bgpd/deny-asn.ru.bgpd"
>> include "/etc/bgpd/deny-asn.by.bgpd"
>> include "/etc/bgpd/deny-asn.ua.bgpd"
>> 
>> 
>> # head /etc/bgpd/deny-asn.ru.bgpd
>> match from any AS 2148 set { localpref 250 nexthop blackhole }
>> match from any AS 2585 set { localpref 250 nexthop blackhole }
>> match from any AS 2587 set { localpref 250 nexthop blackhole }
>> match from any AS 2599 set { localpref 250 nexthop blackhole }
>> match from any AS 2766 set { localpref 250 nexthop blackhole }
>> match from any AS 2848 set { localpref 250 nexthop blackhole }
>> match from any AS 2854 set { localpref 250 nexthop blackhole }
>> match from any AS 2875 set { localpref 250 nexthop blackhole }
>> match from any AS 2878 set { localpref 250 nexthop blackhole }
>> match from any AS 2895 set { localpref 250 nexthop blackhole }
>> 
>> The bgpd daemon crashes every few days with the following:
>> 
>> Mar 21 11:36:54 bgpgw-004 bgpd[76476]: 338 roa-set entries expired
>> Mar 21 12:06:54 bgpgw-004 bgpd[76476]: 36 roa-set entries expired
>> Mar 21 12:11:54 bgpgw-004 bgpd[76476]: 82 roa-set entries expired
>> Mar 21 12:22:36 bgpgw-004 bgpd[99215]: fatal in RDE: prefix_alloc: Cannot
>> allocate memory
>> Mar 21 12:22:36 bgpgw-004 bgpd[65049]: peer closed imsg connection
>> Mar 21 12:22:36 bgpgw-004 bgpd[65049]: main: Lost connection to RDE
>> Mar 21 12:22:36 bgpgw-004 bgpd[76476]: peer closed imsg connection
>> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: peer closed imsg connection
>> Mar 21 12:22:36 bgpgw-004 bgpd[76476]: RTR: Lost connection to RDE
>> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: SE: Lost connection to RDE
>> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: peer closed imsg connection
>> Mar 21 12:22:36 bgpgw-004 bgpd[76476]: peer closed imsg connection
>> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: SE: Lost connection to RDE control
>> Mar 21 12:22:36 bgpgw-004 bgpd[76476]: fatal in RTR: Lost connection to
>> parent
>> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: Can't send message 61 to RDE, pipe
>> closed
>> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: peer closed imsg connection
>> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: SE: Lost connection to parent
>> ...
>> 
>> Mar 24 06:34:17 bgpgw-004 bgpd[83062]: 17 roa-set entries expired
>> Mar 24 06:54:47 bgpgw-004 bgpd[82782]: fatal in RDE: communities_copy:
>> Cannot allocate memory
>> Mar 24 06:54:47 bgpgw-004 bgpd[99753]: peer closed imsg connection
>> Mar 24 06:54:47 bgpgw-004 bgpd[83062]: peer closed imsg connection
>> Mar 24 06:54:47 bgpgw-004 bgpd[99753]: main: Lost connection to RDE
>> Mar 24 06:54:47 bgpgw-004 bgpd[83062]: RTR: Lost connection to RDE
>> Mar 24 06:54:47 bgpgw-004 bgpd[83062]: peer closed imsg connection
>> Mar 24 06:54:47 bgpgw-004 bgpd[83062]: fatal in RTR: Lost connection to
>> parent
>> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: peer closed imsg connection
>> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: SE: Lost connection to RDE
>> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: peer closed imsg connection
>> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: SE: Lost connection to RDE control
>> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: Can't send message 61 to RDE, pipe
>> closed
>> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: peer closed imsg connection
>> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: SE: Lost connection to parent
>> ...
>> 
>> Mar 27 13:07:56 bgpgw-004 bgpd[95001]: fatal in RDE: aspath_get: Cannot
>> allocate memory
>> Mar 27 13:07:56 bgpgw-004 bgpd[84816]: peer closed imsg connection
>> Mar 27 13:07:56 bgpgw-004 bgpd[84816]: main: Lost connection to RDE
>> Mar 27 13:07:56 bgpgw-004 bgpd[3118]: peer closed imsg connection
>> Mar 27 13:07:56 bgpgw-004 bgpd[3118]: RTR: Lost connection to RDE
>> Mar 27 13:07:56 bgpgw-004 bgpd[3118]: peer closed imsg connection
>> Mar 27 13:07:56 bgpgw-004 bgpd[3118]: fatal in RTR: Lost connection to
>> parent
>> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: peer closed imsg connection
>> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: SE: Lost connection to RDE
>> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: peer closed imsg connection
>> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: SE: Lost connection to RDE control
>> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: peer closed imsg connection
>> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: SE: Lost connection to parent
>> 
>> Is my filter too aggressive for bgpd? Is there a more efficient way to
>> write it?
>  
> I doubt it is the filters. You run into some sort of memory leak. Please
> monitor 'bgpctl show rib mem' output. Also check ps aux | grep bgpd output 
> to see why and when the memory starts to go up.
> With that information it may be possible to figure out where this leak
> sits and how to fix it.
>
> Cheers

Also: check the values for bgpd's login class (as root, "su -c bgpd -"
then "ulimit -a"), and are you starting bgpd from the rc-script or by hand?
Re: OpenBGPd: fatal in RDE: aspath_get: Cannot allocate memory

2022-03-29 Thread Claudio Jeker
On Tue, Mar 29, 2022 at 09:53:56AM +0200, Laurent CARON wrote:
> Hi,
> 
> I'm happily running several OpenBGPd routers (OpenBSD 7.0).
> 
> After having applied the following filters (to blackhole traffic from
> certain countries):
> 
> include "/etc/bgpd/deny-asn.ru.bgpd"
> include "/etc/bgpd/deny-asn.by.bgpd"
> include "/etc/bgpd/deny-asn.ua.bgpd"
> 
> 
> # head /etc/bgpd/deny-asn.ru.bgpd
> match from any AS 2148 set { localpref 250 nexthop blackhole }
> match from any AS 2585 set { localpref 250 nexthop blackhole }
> match from any AS 2587 set { localpref 250 nexthop blackhole }
> match from any AS 2599 set { localpref 250 nexthop blackhole }
> match from any AS 2766 set { localpref 250 nexthop blackhole }
> match from any AS 2848 set { localpref 250 nexthop blackhole }
> match from any AS 2854 set { localpref 250 nexthop blackhole }
> match from any AS 2875 set { localpref 250 nexthop blackhole }
> match from any AS 2878 set { localpref 250 nexthop blackhole }
> match from any AS 2895 set { localpref 250 nexthop blackhole }
> 
> The bgpd daemon crashes every few days with the following:
> 
> Mar 21 11:36:54 bgpgw-004 bgpd[76476]: 338 roa-set entries expired
> Mar 21 12:06:54 bgpgw-004 bgpd[76476]: 36 roa-set entries expired
> Mar 21 12:11:54 bgpgw-004 bgpd[76476]: 82 roa-set entries expired
> Mar 21 12:22:36 bgpgw-004 bgpd[99215]: fatal in RDE: prefix_alloc: Cannot
> allocate memory
> Mar 21 12:22:36 bgpgw-004 bgpd[65049]: peer closed imsg connection
> Mar 21 12:22:36 bgpgw-004 bgpd[65049]: main: Lost connection to RDE
> Mar 21 12:22:36 bgpgw-004 bgpd[76476]: peer closed imsg connection
> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: peer closed imsg connection
> Mar 21 12:22:36 bgpgw-004 bgpd[76476]: RTR: Lost connection to RDE
> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: SE: Lost connection to RDE
> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: peer closed imsg connection
> Mar 21 12:22:36 bgpgw-004 bgpd[76476]: peer closed imsg connection
> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: SE: Lost connection to RDE control
> Mar 21 12:22:36 bgpgw-004 bgpd[76476]: fatal in RTR: Lost connection to
> parent
> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: Can't send message 61 to RDE, pipe
> closed
> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: peer closed imsg connection
> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: SE: Lost connection to parent
> ...
> 
> Mar 24 06:34:17 bgpgw-004 bgpd[83062]: 17 roa-set entries expired
> Mar 24 06:54:47 bgpgw-004 bgpd[82782]: fatal in RDE: communities_copy:
> Cannot allocate memory
> Mar 24 06:54:47 bgpgw-004 bgpd[99753]: peer closed imsg connection
> Mar 24 06:54:47 bgpgw-004 bgpd[83062]: peer closed imsg connection
> Mar 24 06:54:47 bgpgw-004 bgpd[99753]: main: Lost connection to RDE
> Mar 24 06:54:47 bgpgw-004 bgpd[83062]: RTR: Lost connection to RDE
> Mar 24 06:54:47 bgpgw-004 bgpd[83062]: peer closed imsg connection
> Mar 24 06:54:47 bgpgw-004 bgpd[83062]: fatal in RTR: Lost connection to
> parent
> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: peer closed imsg connection
> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: SE: Lost connection to RDE
> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: peer closed imsg connection
> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: SE: Lost connection to RDE control
> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: Can't send message 61 to RDE, pipe
> closed
> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: peer closed imsg connection
> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: SE: Lost connection to parent
> ...
> 
> Mar 27 13:07:56 bgpgw-004 bgpd[95001]: fatal in RDE: aspath_get: Cannot
> allocate memory
> Mar 27 13:07:56 bgpgw-004 bgpd[84816]: peer closed imsg connection
> Mar 27 13:07:56 bgpgw-004 bgpd[84816]: main: Lost connection to RDE
> Mar 27 13:07:56 bgpgw-004 bgpd[3118]: peer closed imsg connection
> Mar 27 13:07:56 bgpgw-004 bgpd[3118]: RTR: Lost connection to RDE
> Mar 27 13:07:56 bgpgw-004 bgpd[3118]: peer closed imsg connection
> Mar 27 13:07:56 bgpgw-004 bgpd[3118]: fatal in RTR: Lost connection to
> parent
> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: peer closed imsg connection
> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: SE: Lost connection to RDE
> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: peer closed imsg connection
> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: SE: Lost connection to RDE control
> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: peer closed imsg connection
> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: SE: Lost connection to parent
> 
> Is my filter too aggressive for bgpd? Is there a more efficient way to
> write it?
 
I doubt it is the filters. You run into some sort of memory leak. Please
monitor 'bgpctl show rib mem' output. Also check ps aux | grep bgpd output 
to see why and when the memory starts to go up.
With that information it may be possible to figure out where this leak
sits and how to fix it.
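One minimal way to wire up that monitoring is a pair of root crontab entries that keep timestamped snapshots, so the state just before a crash survives (a sketch; the log directory is illustrative and must exist, and '%' has to be escaped inside crontab commands):

```
# root crontab -- hedged sketch: snapshot RDE memory stats and the
# bgpd process list once a minute under timestamped names
* * * * * bgpctl show rib mem > /var/log/bgpd-mem/$(date +\%F::\%T).mem 2>&1
* * * * * ps aux | grep '[b]gpd' > /var/log/bgpd-mem/$(date +\%F::\%T).ps
```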

Cheers
-- 
:wq Claudio



OpenBGPd: fatal in RDE: aspath_get: Cannot allocate memory

2022-03-29 Thread Laurent CARON

Hi,

I'm happily running several OpenBGPd routers (OpenBSD 7.0).

After having applied the following filters (to blackhole traffic from
certain countries):


include "/etc/bgpd/deny-asn.ru.bgpd"
include "/etc/bgpd/deny-asn.by.bgpd"
include "/etc/bgpd/deny-asn.ua.bgpd"


# head /etc/bgpd/deny-asn.ru.bgpd
match from any AS 2148 set { localpref 250 nexthop blackhole }
match from any AS 2585 set { localpref 250 nexthop blackhole }
match from any AS 2587 set { localpref 250 nexthop blackhole }
match from any AS 2599 set { localpref 250 nexthop blackhole }
match from any AS 2766 set { localpref 250 nexthop blackhole }
match from any AS 2848 set { localpref 250 nexthop blackhole }
match from any AS 2854 set { localpref 250 nexthop blackhole }
match from any AS 2875 set { localpref 250 nexthop blackhole }
match from any AS 2878 set { localpref 250 nexthop blackhole }
match from any AS 2895 set { localpref 250 nexthop blackhole }

The bgpd daemon crashes every few days with the following:

Mar 21 11:36:54 bgpgw-004 bgpd[76476]: 338 roa-set entries expired
Mar 21 12:06:54 bgpgw-004 bgpd[76476]: 36 roa-set entries expired
Mar 21 12:11:54 bgpgw-004 bgpd[76476]: 82 roa-set entries expired
Mar 21 12:22:36 bgpgw-004 bgpd[99215]: fatal in RDE: prefix_alloc: 
Cannot allocate memory

Mar 21 12:22:36 bgpgw-004 bgpd[65049]: peer closed imsg connection
Mar 21 12:22:36 bgpgw-004 bgpd[65049]: main: Lost connection to RDE
Mar 21 12:22:36 bgpgw-004 bgpd[76476]: peer closed imsg connection
Mar 21 12:22:36 bgpgw-004 bgpd[58155]: peer closed imsg connection
Mar 21 12:22:36 bgpgw-004 bgpd[76476]: RTR: Lost connection to RDE
Mar 21 12:22:36 bgpgw-004 bgpd[58155]: SE: Lost connection to RDE
Mar 21 12:22:36 bgpgw-004 bgpd[58155]: peer closed imsg connection
Mar 21 12:22:36 bgpgw-004 bgpd[76476]: peer closed imsg connection
Mar 21 12:22:36 bgpgw-004 bgpd[58155]: SE: Lost connection to RDE control
Mar 21 12:22:36 bgpgw-004 bgpd[76476]: fatal in RTR: Lost connection to 
parent
Mar 21 12:22:36 bgpgw-004 bgpd[58155]: Can't send message 61 to RDE, 
pipe closed

Mar 21 12:22:36 bgpgw-004 bgpd[58155]: peer closed imsg connection
Mar 21 12:22:36 bgpgw-004 bgpd[58155]: SE: Lost connection to parent
...

Mar 24 06:34:17 bgpgw-004 bgpd[83062]: 17 roa-set entries expired
Mar 24 06:54:47 bgpgw-004 bgpd[82782]: fatal in RDE: communities_copy: 
Cannot allocate memory

Mar 24 06:54:47 bgpgw-004 bgpd[99753]: peer closed imsg connection
Mar 24 06:54:47 bgpgw-004 bgpd[83062]: peer closed imsg connection
Mar 24 06:54:47 bgpgw-004 bgpd[99753]: main: Lost connection to RDE
Mar 24 06:54:47 bgpgw-004 bgpd[83062]: RTR: Lost connection to RDE
Mar 24 06:54:47 bgpgw-004 bgpd[83062]: peer closed imsg connection
Mar 24 06:54:47 bgpgw-004 bgpd[83062]: fatal in RTR: Lost connection to 
parent

Mar 24 06:54:47 bgpgw-004 bgpd[40748]: peer closed imsg connection
Mar 24 06:54:47 bgpgw-004 bgpd[40748]: SE: Lost connection to RDE
Mar 24 06:54:47 bgpgw-004 bgpd[40748]: peer closed imsg connection
Mar 24 06:54:47 bgpgw-004 bgpd[40748]: SE: Lost connection to RDE control
Mar 24 06:54:47 bgpgw-004 bgpd[40748]: Can't send message 61 to RDE, 
pipe closed

Mar 24 06:54:47 bgpgw-004 bgpd[40748]: peer closed imsg connection
Mar 24 06:54:47 bgpgw-004 bgpd[40748]: SE: Lost connection to parent
...

Mar 27 13:07:56 bgpgw-004 bgpd[95001]: fatal in RDE: aspath_get: Cannot 
allocate memory

Mar 27 13:07:56 bgpgw-004 bgpd[84816]: peer closed imsg connection
Mar 27 13:07:56 bgpgw-004 bgpd[84816]: main: Lost connection to RDE
Mar 27 13:07:56 bgpgw-004 bgpd[3118]: peer closed imsg connection
Mar 27 13:07:56 bgpgw-004 bgpd[3118]: RTR: Lost connection to RDE
Mar 27 13:07:56 bgpgw-004 bgpd[3118]: peer closed imsg connection
Mar 27 13:07:56 bgpgw-004 bgpd[3118]: fatal in RTR: Lost connection to 
parent

Mar 27 13:07:56 bgpgw-004 bgpd[60695]: peer closed imsg connection
Mar 27 13:07:56 bgpgw-004 bgpd[60695]: SE: Lost connection to RDE
Mar 27 13:07:56 bgpgw-004 bgpd[60695]: peer closed imsg connection
Mar 27 13:07:56 bgpgw-004 bgpd[60695]: SE: Lost connection to RDE control
Mar 27 13:07:56 bgpgw-004 bgpd[60695]: peer closed imsg connection
Mar 27 13:07:56 bgpgw-004 bgpd[60695]: SE: Lost connection to parent

Is my filter too aggressive for bgpd? Is there a more efficient way to
write it?


Thanks

Laurent