Re: [j-nsp] Junos 20 - slow RPD

2022-03-29 Thread Luca Salvatore via juniper-nsp
I've been down the path of very slow RPD with JTAC recently.  In our case
it was due to some mildly complex BGP community matching that we do, which
was exhausting RPD's memory limits.
A good fix for us was to bump up the memory allocation using these hidden
commands:

set policy-options as-path-match memory-limit 16m
set policy-options community-match memory-limit 16m

The default memory limit is 2097152 bytes (2 MiB), so very small.  You can
see some interesting numbers with some other hidden commands:

show policy community-match
show policy as-path-match
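
If it helps, the whole sequence we ran looked roughly like this (a sketch
only; these are hidden knobs, so behaviour may differ between releases):

configure
set policy-options as-path-match memory-limit 16m
set policy-options community-match memory-limit 16m
commit and-quit
show policy community-match
show policy as-path-match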

Also, if you're running EVPN, check out this PR, which is a whole world of
fun:
https://prsearch.juniper.net/InfoCenter/index?page=prcontent&id=PR1616167



On Fri, Mar 25, 2022 at 6:27 AM Mark Tinka via juniper-nsp <
juniper-nsp@puck.nether.net> wrote:

>
>
> On 3/25/22 11:21, Mihai via juniper-nsp wrote:
>
> > In my case I just upgraded one MX204 in the lab to 21.2R2, enabled
> > rib-sharding and increased the JunosVM memory to 24G and things look
> > better now.
>
> Glad to hear!
>
> Mark.


Re: [j-nsp] Junos 20 - slow RPD

2022-03-25 Thread Mark Tinka via juniper-nsp




On 3/25/22 11:21, Mihai via juniper-nsp wrote:

In my case I just upgraded one MX204 in the lab to 21.2R2, enabled 
rib-sharding and increased the JunosVM memory to 24G and things look 
better now.


Glad to hear!

Mark.


Re: [j-nsp] Junos 20 - slow RPD

2022-03-25 Thread Mihai via juniper-nsp
In my case I just upgraded one MX204 in the lab to 21.2R2, enabled 
rib-sharding and increased the JunosVM memory to 24G and things look 
better now.



On 25/03/2022 00:58, Gustavo Santos via juniper-nsp wrote:

Hi,
I think that I was the only one with this issue.

Even with an RE-S-X6-64G, we have very slow outbound updates. Sending a
lot of full routing tables to customers may take up to 60 minutes or more
when you have a lot of BGP groups (for instance, one group per customer),
and if we have an issue with the preferred upstream provider, the customer
routers may be offline until all updates are sent.

We got new routers and we are going to try the latest Junos 20.4R3 service
release with update threading and rib-sharding to see if we get some
improvement. It is better to lose NSR than to blackhole traffic for over
an hour.



On Wed, Mar 23, 2022 at 06:41, Mark Tinka via juniper-nsp <
juniper-nsp@puck.nether.net> wrote:




On 3/22/22 22:42, Mihai via juniper-nsp wrote:



Hi Saku,

The routes are in a VRF, so no support for rib-sharding, unfortunately.
This MX204 is running 20.2R3-S3, so probably the only option is to try
another version.


We've had some terrible experiences with RPD due to NSR synchronization to
re1 for BGP, on an RE-S-1800 running Junos 20.4R3.8. It turns out the code
can't deal with grouping outbound updates to eBGP neighbors at scale on
that RE, which crashes RPD on re1.

The options were to either disable NSR, rewrite our outbound policies
and combine multiple customers in the same outbound group, or get more
memory. We went for the last option.

No more problems on the RE-S-X6-64G.

Juniper have some work to do to optimize the code in these use cases.

Mark.


Re: [j-nsp] Junos 20 - slow RPD

2022-03-25 Thread Mark Tinka via juniper-nsp




On 3/25/22 02:58, Gustavo Santos via juniper-nsp wrote:


Hi,
I think that I was the only one with this issue.


From their feedback, it seems the issue of scaling outbound updates of 
full tables to eBGP neighbors is known within Juniper, because they told 
us they have had to come up with all manner of hacks for many of their 
large-scale customers as well.


So it's a fundamental problem, one I'm not sure they are addressing very 
well.


We can't keep throwing hardware at the problem.



it is better to lose NSR than to blackhole
traffic for over an hour.


Agreed - we had gotten to the point where we were willing to give up NSR 
until we figure this out.


Mark.


Re: [j-nsp] Junos 20 - slow RPD

2022-03-24 Thread Gustavo Santos via juniper-nsp
Hi,
I think that I was the only one with this issue.

Even with an RE-S-X6-64G, we have very slow outbound updates. Sending a
lot of full routing tables to customers may take up to 60 minutes or more
when you have a lot of BGP groups (for instance, one group per customer),
and if we have an issue with the preferred upstream provider, the customer
routers may be offline until all updates are sent.

We got new routers and we are going to try the latest Junos 20.4R3 service
release with update threading and rib-sharding to see if we get some
improvement. It is better to lose NSR than to blackhole traffic for over
an hour.
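
As far as we can tell, rib-sharding is mutually exclusive with NSR, so the
change would be something like this (a sketch only; we still need to verify
the details against the release notes):

deactivate routing-options nonstop-routing
set system processes routing bgp rib-sharding number-of-shards 4
set system processes routing bgp update-threading
commit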



On Wed, Mar 23, 2022 at 06:41, Mark Tinka via juniper-nsp <
juniper-nsp@puck.nether.net> wrote:

>
>
> On 3/22/22 22:42, Mihai via juniper-nsp wrote:
>
> >
> > Hi Saku,
> >
> > The routes are in a VRF, so no support for rib-sharding, unfortunately.
> > This MX204 is running 20.2R3-S3, so probably the only option is to try
> > another version.
>
> We've had some terrible experiences with RPD due to NSR synchronization to
> re1 for BGP, on an RE-S-1800 running Junos 20.4R3.8. It turns out the code
> can't deal with grouping outbound updates to eBGP neighbors at scale on
> that RE, which crashes RPD on re1.
>
> The options were to either disable NSR, rewrite our outbound policies
> and combine multiple customers in the same outbound group, or get more
> memory. We went for the last option.
>
> No more problems on the RE-S-X6-64G.
>
> Juniper have some work to do to optimize the code in these use cases.
>
> Mark.


Re: [j-nsp] Junos 20 - slow RPD

2022-03-23 Thread Mark Tinka via juniper-nsp




On 3/22/22 22:42, Mihai via juniper-nsp wrote:



Hi Saku,

The routes are in a VRF, so no support for rib-sharding, unfortunately.
This MX204 is running 20.2R3-S3, so probably the only option is to try
another version.


We've had some terrible experiences with RPD due to NSR synchronization to
re1 for BGP, on an RE-S-1800 running Junos 20.4R3.8. It turns out the code
can't deal with grouping outbound updates to eBGP neighbors at scale on
that RE, which crashes RPD on re1.


The options were to either disable NSR, rewrite our outbound policies 
and combine multiple customers in the same outbound group, or get more 
memory. We went for the last option.


No more problems on the RE-S-X6-64G.
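
For the curious, the middle option would have looked something like this
sketch (group and policy names, addresses and ASNs all hypothetical). Junos
generates outbound updates per group, so neighbors sharing a group and
export policy get replicated updates rather than separate per-peer policy
evaluation:

set protocols bgp group CUSTOMERS-FULL type external
set protocols bgp group CUSTOMERS-FULL export FULL-TABLE-OUT
set protocols bgp group CUSTOMERS-FULL neighbor 192.0.2.1 peer-as 64500
set protocols bgp group CUSTOMERS-FULL neighbor 192.0.2.2 peer-as 64501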

Juniper have some work to do to optimize the code in these use cases.

Mark.


Re: [j-nsp] Junos 20 - slow RPD

2022-03-22 Thread Mihai via juniper-nsp



Hi Saku,

The routes are in a VRF, so no support for rib-sharding, unfortunately.
This MX204 is running 20.2R3-S3, so probably the only option is to try
another version.


Thank you for your time and info, very useful as always.

On 22/03/2022 17:58, Saku Ytti wrote:

Hey,


On an MX204 with ~4M routes, after upgrading from 18.2 to 20.2, RPD is
way slower in processing BGP policies and sending the routes to neighbors.
For example, on a BGP group with one neighbor and an export policy
containing 5 terms, each matching a community, it takes ~1 min (100% RPD
utilisation) to send 1k routes to the neighbor in 20.2, compared to 15s
in 18.2.
Disabling terms will reduce the time.

Has anyone experienced something similar?


I don't recognise this problem specifically. It seems like a rather
terrible regression, so you should probably either open a JTAC case or do
the Junos dance. If you have a large RIB/FIB ratio, allowing more than one
core to work on BGP will produce an improvement:

set system processes routing bgp rib-sharding number-of-shards 4
set system processes routing bgp update-threading

This is a disruptive change. JNPR wanted us on 20.3 (we are on
20.3R3-S2) for rib-sharding, but we did run it previously on 20.2R3-S3
with success. We are currently targeting 21.4R1-S1.

If you have memory pressure, you can expand the default 16GB DRAM to
24GB DRAM via a configuration toggle (post-21.2R1). If you are
comfortable hacking the QEMU/KVM config manually, you can do it on any
release and entertain other sizes.




Re: [j-nsp] Junos 20 - slow RPD

2022-03-22 Thread Saku Ytti via juniper-nsp
Hey,

> On an MX204 with ~4M routes, after upgrading from 18.2 to 20.2, RPD is
> way slower in processing BGP policies and sending the routes to neighbors.
> For example, on a BGP group with one neighbor and an export policy
> containing 5 terms, each matching a community, it takes ~1 min (100% RPD
> utilisation) to send 1k routes to the neighbor in 20.2, compared to 15s
> in 18.2.
> Disabling terms will reduce the time.
>
> Has anyone experienced something similar?

I don't recognise this problem specifically. It seems like a rather
terrible regression, so you should probably either open a JTAC case or do
the Junos dance. If you have a large RIB/FIB ratio, allowing more than one
core to work on BGP will produce an improvement:

set system processes routing bgp rib-sharding number-of-shards 4
set system processes routing bgp update-threading

This is a disruptive change. JNPR wanted us on 20.3 (we are on
20.3R3-S2) for rib-sharding, but we did run it previously on 20.2R3-S3
with success. We are currently targeting 21.4R1-S1.
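
To gauge your RIB/FIB ratio, the standard counters should do (route counts
on the RIB side, forwarding-table entries on the FIB side):

show route summary
show route forwarding-table summary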

If you have memory pressure, you can expand the default 16GB DRAM to
24GB DRAM via a configuration toggle (post-21.2R1). If you are
comfortable hacking the QEMU/KVM config manually, you can do it on any
release and entertain other sizes.
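
And to confirm you actually have memory pressure before resizing, something
like this (exact output varies by platform and release):

show chassis routing-engine
show task memory detail

The former shows DRAM fitted and utilisation, the latter rpd's own
allocator view.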

-- 
  ++ytti