Re: [c-nsp] ASR9001 BGP scaling and memory shortage

2020-05-25 Thread Vladimir Troitskiy
Hello everyone,

Other list members report significantly lower memory usage for the BGP
process and shmwin on ASR9001 routers that carry more sessions and routes
in the GRT.

Saku Ytti gave me some useful notes, which I would like to share as a
summary of this thread:
- one can use 'hw-module profile scale l3xl' in admin mode to raise the
RLIMIT for the BGP process, even on Typhoon-based platforms (not only on
Trident-based ones, as I had thought);
- the shmwin shortage is probably caused by per-prefix label mode; per-ce
mode should be much more scalable. We use per-prefix mode because of BGP
PIC limitations, but maybe it's time to reconsider the feature set we use.
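
As a rough illustration of why the label mode matters with "Internet in a
VRF": per-prefix mode allocates one local label per VRF prefix advertised
into VPNv4, while per-ce mode allocates one per CE next hop. The sketch
below only counts labels; the prefix and next-hop counts are hypothetical
round numbers, not measurements from this thread, and the function name is
made up for the example.

```python
def labels_needed(mode: str, vrf_prefixes: int, ce_next_hops: int) -> int:
    """Rough count of local labels allocated for one VRF.

    per-prefix: one label per VRF prefix
    per-ce:     one label per CE next hop
    per-vrf:    a single aggregate label for the whole VRF
    """
    if mode == "per-prefix":
        return vrf_prefixes
    if mode == "per-ce":
        return ce_next_hops
    if mode == "per-vrf":
        return 1
    raise ValueError(f"unknown label mode: {mode}")


# Hypothetical inputs: a full Internet table in one VRF, learned from
# a handful of upstream/RR next hops.
VRF_PREFIXES = 800_000
CE_NEXT_HOPS = 4

for mode in ("per-prefix", "per-ce", "per-vrf"):
    count = labels_needed(mode, VRF_PREFIXES, CE_NEXT_HOPS)
    print(f"{mode:>10}: {count} labels")
```

The label count is only a proxy: it says nothing about how many bytes of
shmwin each label consumes on a given release.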

On Tue, 19 May 2020 at 20:09, Vladimir Troitskiy wrote:

> Hello everyone,
>
> ASR9001 has some memory usage limits:
> - 1658M for a BGP process on an RSP
> - 1536M for a shared memory window on an LC
> Those limits seem to be unconfigurable.
>
> Has anybody run into these limits on heavily loaded ASR9001 boxes?
> We see surprisingly high memory usage even though the typical router setup
> is pretty lightweight: 4-5 full feeds (a couple of upstreams and RRs). The
> only thing that is probably uncommon is that we use the "Internet in a VRF"
> approach.
>
> #show processes memory detail location 0/RSP0/CPU0
>> Tue May 19 19:39:12.592 Ural
>> JID   Text  Data  Stack  Dynamic  Dyn-Limit  Shm-Tot  Phy-Tot  Process
>> ----  ----  ----  -----  -------  ---------  -------  -------  -------
>> 1054  1M    5M    516K   1485M    1658M      76M      1491M    bgp
>>
>
> #show memory summary location 0/0/CPU0
>>
> node:  node0_0_CPU0
>> --
>> Physical Memory: 8192M total
>>  Application Memory : 7988M (3811M available)
>>  Image: 75M (bootram: 75M)
>>  Reserved: 128M, IOMem: 0, flashfsys: 0
>>  Total shared window: 1327M
>>
>
> We have already had FIB inconsistency issues due to SHMWIN exhaustion,
> even though the total prefix count was far below the platform limit
> (4M):
>
>> fib_mgr[184]: %OS-SHMWIN-3-ALLOC_ARENA_FAILED : SHMWIN: Failed to
>> allocate new arena from the server : 'SHMWIN_SVR' detected the 'fatal'
>> condition 'VM is exhausted or totally fragmented'
>> fib_mgr[184]: %ROUTING-FIB-3-ASSERT_RL : FIB internal inconsistency
>> detected
>> fib_mgr[184]: %ROUTING-FIB-3-PD_FAIL : FIB platform error:
>> fib_leaf_insert 5204 Cannot insert leaf
>>
>
> What are the practical limits for BGP scaling on ASR9001 boxes? Could
> anyone share their memory usage stats?
> --
> Best regards,
> Vladimir Troitsky
>
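
For reference, the headroom implied by the figures in the quoted message can
be worked out as below. This is a minimal sketch: the helper names are made
up for the example, and comparing "Total shared window" against the 1536M
shmwin limit is an assumption about how those two numbers relate.

```python
def to_mib(value: str) -> float:
    """Convert a size string such as '516K' or '1485M' to MiB."""
    factors = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}
    return float(value[:-1]) * factors[value[-1]]


def utilisation(used: str, limit: str) -> float:
    """Return used/limit as a percentage."""
    return 100.0 * to_mib(used) / to_mib(limit)


# Figures taken from the quoted 'show' output and limits.
bgp_pct = utilisation("1485M", "1658M")     # Dynamic vs Dyn-Limit for bgp
shmwin_pct = utilisation("1327M", "1536M")  # Total shared window vs LC limit

print(f"bgp dynamic memory: {bgp_pct:.1f}% of limit")
print(f"shared window:      {shmwin_pct:.1f}% of limit")
```

On these numbers both the BGP process and the shared window are close to
90% of their limits.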

-- 
Best regards,
Vladimir Troitsky
___
cisco-nsp mailing list  cisco-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/


Re: [c-nsp] BGP router process using way more memory on one system

2020-05-25 Thread Nick Hilliard

Drew Weaver wrote on 24/05/2020 19:20:

> We have two routers that have a mirrored configuration. Peers, BGP
> configuration, everything. Exactly the same [except for IP
> addresses]
>
> One router's BGP router process is holding 617576024. The other
> is holding 577596716.
>
> The one that is holding more appears to be suffering from an out of
> memory condition.


There were a couple of releases where the ipv4_rib process had a 
persistent memory leak.  Try this:


Router# admin process restart ipv4_rib

This is not service-affecting: restarting the process temporarily stops
FIB reprogramming, then triggers a full RIB reload from all RIB sources,
followed by a FIB check across the device. In other words, it is safer to
do this than to hobble along with OOM errors.
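
For scale, the gap between the two "holding" figures quoted above can be
computed as follows. This is only arithmetic on the quoted numbers; treating
them as bytes is an assumption, since the original message gives no unit.

```python
# "Holding" values quoted from the two routers (unit assumed to be bytes).
holding_a = 617_576_024
holding_b = 577_596_716

delta = holding_a - holding_b
delta_pct = 100.0 * delta / holding_b
delta_mib = delta / (1024 * 1024)

print(f"difference: {delta} ({delta_mib:.1f} MiB if bytes, {delta_pct:.1f}% more)")
```

The difference is roughly 7%, so the router that is out of memory is not
holding dramatically more in its BGP process than its twin.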


Nick