[j-nsp] Setting IGP metrics in BGP import policies

2013-02-28 Thread Rob Foehl
I'm looking for an equivalent of 'then metric igp' that would actually 
work in a BGP import policy, specifically such that the resulting MEDs 
would be redistributed to iBGP neighbors.  'path-selection med-plus-igp' 
seems to be purely local, and 'then metric igp' is explicitly documented 
as only working in eBGP export policies, so I doubt even the combination 
would accomplish what I'm after.


The obvious answer would be to just set the metric explicitly in separate 
import policies, but that results in a lot of extra config.  Are there any 
clever ways to do this, or am I crazy for even trying?
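

For reference, the workaround I'm trying to avoid looks roughly like this 
(policy/group names and the metric value are placeholders), repeated for 
every neighbor or group that needs its own MED:

set policy-options policy-statement import-from-peer-a term med then metric 150
set policy-options policy-statement import-from-peer-a term med then accept
set protocols bgp group peer-a import import-from-peer-a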


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] KRT queue stalls fixed in 11.4R8?

2013-06-24 Thread Rob Foehl
According to the release notes for 11.4R8, the KRT queue stall issue 
(PR836197) has been marked as resolved.  Has anyone had a chance to 
confirm this on a suitably session-heavy MX?


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] Matching specific OSPF routes in aggregate policy

2013-09-25 Thread Rob Foehl

Hey folks,

Another OSPF issue for the day: I have a somewhat specific need to match a 
route from a particular OSPF speaker in an aggregate policy, and I'm not 
having much luck coming up with a straightforward way to do so.


The route in question is injected via a type 5 LSA from a (dumb) source in 
an isolated area; the ABR is an EX, the upstream box where I'd prefer to 
handle the aggregation is an MX.  If there were a 'from router-id x.x.x.x' 
match condition, it'd solve this pretty easily...


OSPF tag values are another option, but the source isn't smart enough to 
set them, and I haven't found any way to apply them to inter-area routes 
on the ABR (at least not external LSAs, anyway).


Matching next-hop values in aggregate policy works fine, but only at the 
ABR, of course.  If I do things this way, I'm stuck layering and 
redistributing aggregates within the network, which seems like more 
trouble than it ought to be.
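

For reference, the version that works at the ABR is just an aggregate with a 
contributor policy keyed on next hop, roughly (prefixes and names are 
placeholders):

set policy-options policy-statement agg-contrib term from-src from protocol ospf
set policy-options policy-statement agg-contrib term from-src from next-hop 192.0.2.1
set policy-options policy-statement agg-contrib term from-src then accept
set policy-options policy-statement agg-contrib then reject
set routing-options aggregate route 198.51.100.0/22 policy agg-contrib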


As long as I'm willing to carry an extra route around in parallel, it'd 
likely be easiest to just redistribute the route into BGP at the boundary 
and build the aggregate from that (optionally via community match).


Am I missing any other possibilities?

-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Matching specific OSPF routes in aggregate policy

2013-09-26 Thread Rob Foehl

On Thu, 26 Sep 2013, Phil Fagan wrote:


Is your aggregate policy already on the MX and is its purpose to export into
BGP from OSPF?


That's the idea...  There are disparate contributing routes, and they tend 
to come and go on a fairly regular basis.  Generating like aggregates 
elsewhere and letting BGP multipath do its thing leads to a flurry of BGP 
updates, even though the aggregate prefix is stable, hence trying to do 
the aggregate generation in a common place.


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Matching specific OSPF routes in aggregate policy

2013-09-30 Thread Rob Foehl

On Fri, 27 Sep 2013, Phil Fagan wrote:


could you use BGP multi-hop and simply peer directly to the MX bypassing the
need to redist routes in through your OSPF core?


That's basically what I'm considering from the perspective of 
readvertising into BGP on the ABR...  The source is going to be OSPF 
either way (and I need ECMP to behave itself, as well).  I'm probably 
going to wind up with some sort of hybrid along these lines, unless I'm 
still missing something either really clever or really obvious... :)


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] RSVP path messages and loopback firewall filters

2014-04-21 Thread Rob Foehl
A quick question: how are folks handling RSVP path messages in loopback 
firewall filters, particularly on MX?  prefix-lists covering all RSVP 
speakers?  Explicit IP options match?  Ignoring them entirely and hoping 
the policer on a default accept term won't step on them too hard? ;)
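

The prefix-list variant I've been sketching looks something like this (filter 
and list names are placeholders), ordered ahead of the default accept term:

set policy-options prefix-list rsvp-speakers 192.0.2.0/24
set firewall family inet filter protect-re term rsvp from source-prefix-list rsvp-speakers
set firewall family inet filter protect-re term rsvp from protocol rsvp
set firewall family inet filter protect-re term rsvp then accept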


Thanks,

-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX80 Sampling - High CPU

2015-01-15 Thread Rob Foehl

On Thu, 15 Jan 2015, Mark Tees wrote:


For me on an MX80 running 11.4R13 with sampling, that 10 minutes equates to:

- around 3mins of rpd + sampling taking turns to smash the routing
engine CPU whilst seemingly allowing other things to still be scheduled
in (phew).
- another 7mins of sampling chewing the CPU


I get similar behavior across several MX80s on various 11.4 builds with 
sampling enabled.  Pretty much anything that causes rpd to walk the entire 
RIB leads to this, including policy updates that produce no changes toward 
the PFEs.  I watched sampled take well over 20 minutes to settle down 
after one of those today, and the RE was basically useless for the 
duration...


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] Commit script portability between ELS and non-ELS platforms

2016-06-07 Thread Rob Foehl
Does anyone have any clever methods for probing Enhanced Layer 2 Software 
support from a commit script on QFX/EX in order to generate changes 
appropriate to the platform?  Specifically looking for something beyond 
checking hardware and version numbers, or for pieces of config hierarchy 
that might not be present on any given box either way.


Thanks!

-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Fate sharing between BGP and RSVP

2016-09-13 Thread Rob Foehl

On Tue, 13 Sep 2016, Chuck Anderson wrote:


Could you just use a strict MPLS path with an ERO?


Hmm, doesn't look like it...  I just tried configuring an explicit path 
LSP to nowhere on a lab box, and it didn't install anything into the 
routing table without the LSP up.  Either way, a strict path would 
probably cause more trouble than it'd be worth.


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Fate sharing between BGP and RSVP

2016-09-13 Thread Rob Foehl

On Tue, 13 Sep 2016, Chuck Anderson wrote:


I guess I don't understand what you are trying to accomplish then.
Traffic engineering specific routes are exactly what RSVP is used for.
The MPLS path should be torn down if there is no available
RSVP-capable route.  Did you try just not configuring RSVP on the
interfaces that can't support MPLS?


The LSP is torn down; the BGP session carrying a bunch of routes for which 
that LSP is the only viable forwarding path survives.  (RSVP is only 
configured where it ought to be, no "interfaces all" or anything like 
that.)


The goal is for the BGP-learned routes to disappear along with the LSP -- 
preferably by just tearing the session down with it, but otherwise 
invalidating the next-hops would suffice.


I have another idea in my back pocket if this isn't workable, but that 
involves turning a bunch of P routers into full BGP RRs...


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] Fate sharing between BGP and RSVP

2016-09-13 Thread Rob Foehl
Assuming a typical IBGP session built between loopbacks, is there any 
relatively clean way to tie that session state to RSVP-signaled LSPs 
between the same pair of routers?


I'm trying to work around a case where the IGP knows about another path 
between the two that doesn't carry any MPLS traffic, thus keeping the BGP 
session alive without a valid forwarding path for any of the received 
routes in the event of a failure along the MPLS-enabled path.


Thanks!

-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Commit script portability between ELS and non-ELS platforms

2016-09-13 Thread Rob Foehl

On Wed, 8 Jun 2016, Phil Mayers wrote:


On 07/06/16 21:51, Rob Foehl wrote:

 Does anyone have any clever methods for probing Enhanced Layer 2
 Software support from a commit script on QFX/EX in order to generate
 changes appropriate to the platform?  Specifically looking for something
 beyond checking hardware and version numbers, or for pieces of config
 hierarchy that might not be present on any given box either way.






...returns substantially different XML b/w ELS and non-ELS IIRC.


Thanks, Phil.  Just realizing I'd never followed up on this...

That RPC does return completely different XML, and was easy enough to 
wedge into a commit script to detect ELS vs. non-ELS, but we hit a snag 
with VCs in that the backup RE fails to execute the RPC and thus blows up 
during a commit sync.  This should still work fine over netconf for 
off-box differentiation, of course.


We wound up falling back to platform detection, with explicit pattern 
matching for the models we've tested the script on.  There's a small 
benefit there, in that the script will refuse to run on a new box where 
nobody's tested it before, but that's still a maintenance trade-off.


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] PR1097749 and default-address-selection

2016-11-03 Thread Rob Foehl

For the Juniper engineering folks on the list:

PR1097749 was opened in June 2015 concerning a default-address-selection 
regression in 12.1 and later releases, which was eventually fixed on SRX, 
but we're still running into it on EX in certain circumstances.


The PR remains non-public, and I've reached something of an impasse in 
trying to get this resolved without being able to keep an affected box 
available for TAC to play with.  Would any of you mind looking at the fix 
that landed for SRX and at least determining whether the same fix is 
applicable to other platforms, EX in particular?


Off-list is fine, and I'll gladly get you any additional information I can 
in order to get this moving again.  Thanks!


-Rob

___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Ex8208 TRAP

2018-05-21 Thread Rob Foehl

On Mon, 21 May 2018, Chris Kawchuk wrote:


Your dates are all over the place

May 19, then Jun 14, then back to May 19th...


That's what happens when a card boots, before it figures out the current 
time...  Why the RE accepts logs like this at face value is another 
question, but this behavior is not uncommon.



Your SFP lost optics. Low power.


To be expected when the card resets, as indicated by the first log entry:


May 19 20:42:52  Core-SW-8208 chassisd[1338]: CHASSISD_SNMP_TRAP10: SNMP trap 
generated: FRU power on (jnxFruContentsIndex 8, jnxFruL1Index 1, jnxFruL2Index 
1, jnxFruL3Index 0, jnxFruName PIC: 8x 10GE SFP+ @ 0/0/*, jnxFruType 11, 
jnxFruSlot 0, jnxFruOfflineReason 2, jnxFruLastPowerOff 20675705, 
jnxFruLastPowerOn 20686688)



So.. what have you done to troubleshoot this w/your optical carrier or fibre 
provider, besides post on j-nsp?


OP needs to figure out why the card reset, which their optical carrier or 
fibre provider probably can't tell them.  Earlier logs might help.


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Mixing v4/v6 neighbors in BGP groups

2018-06-29 Thread Rob Foehl

On Fri, 29 Jun 2018, Mark Tinka wrote:


I prefer not to find out whether walking on hot coal will kill all
feeling in my feet, or just numb them for 2hrs :-).


So...  Is that a vote for or against, and which one? ;)

On Fri, 29 Jun 2018, Job Snijders wrote:


For the purpose of inter-domain routing I'd advise against mixing warm
mayonnaise and jagermeister. uh.. i mean IPv4 and IPv6.

Keeping things separate maybe makes debugging easier.


I may have been insufficiently specific...  I'm referring to:

group example {
    neighbor 192.0.2.0;
    neighbor 2001:db8::;
}

vs.

group example-ipv4 {
    neighbor 192.0.2.0;
}

group example-ipv6 {
    neighbor 2001:db8::;
}


The former is (operationally) simpler to deal with, until it isn't -- 
think "deactivate group example", etc.  I'm tempted to just be explicit 
about the split everywhere, but I already spend enough time explaining 
that there are two of everything and it's been that way for a while now...


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Router for full routes

2018-06-27 Thread Rob Foehl

On Wed, 27 Jun 2018, Mark Tinka wrote:


At this stage, I'd say the cheapest MX router you should go for that is
decent is the MX204.


Any thoughts on MX204s replacing ancient MX240s, assuming one can make the 
interface mix work?


I'm looking at the replacement option vs. in-place upgrades of a mixed bag 
of old RE/SCB/DPC/MPC parts...  Seems like an obvious win in cases with 
only a handful of 10G ports, less so otherwise.


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] Mixing v4/v6 in other places (was Re: Mixing v4/v6 neighbors in BGP groups)

2018-08-15 Thread Rob Foehl

On Fri, 29 Jun 2018, Rob Foehl wrote:


I may have been insufficiently specific...  I'm referring to:

group example {
    neighbor 192.0.2.0;
    neighbor 2001:db8::;
}


vs.

group example-ipv4 {
    neighbor 192.0.2.0;
}


group example-ipv6 {
    neighbor 2001:db8::;
}


Hey folks,

Appreciate all the responses here, even those which apparently assumed I'd 
been bored and daydreaming about trying to wedge both address families 
into a single session -- hopefully the six week late reply is sufficient 
proof otherwise... :)


I have a few follow up questions on related topics, especially for those 
in favor of separate peer groups:


Are you doing anything differently in your IGP?  (Okay, a bit of a loaded 
question, I know...)  What about LDP and/or RSVP transport?


How are you handling v6 TE -- nothing yet, 6PE, inet6 shortcuts, native v6 
LSPs, waiting for SR to take over the world, something else?  Which Junos 
release(s) were necessary to get there?


I'm still carrying v6 traffic around natively, though not sure how much 
longer that'll be workable.  6PE seems like a step backward when the 
network is already 100% v6 enabled, but then I haven't quite wrapped my 
head around all of the options here...  Thanks in advance for any clue.


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] LSP's with IPV6 on Juniper

2018-08-28 Thread Rob Foehl

On Tue, 28 Aug 2018, adamv0...@netconsultings.com wrote:


Just out of curiosity is there a business problem/requirement/limitation you're 
trying to solve by not changing the next hop to v6 mapped v4 address and using 
native v6 NHs instead please?


I'd asked a similar question as the OP two weeks ago in the thread about 
mixing v4 and v6 in the same BGP peer groups, after several responses 
extolling the virtues of avoiding any conflation between the two.  If 
that's the case for routing, but forwarding v6 in an entirely v4-dependent 
manner on a 100% dual stack network is tolerable, then this inconsistency 
is... inconsistent.


By all outward appearances, v6 is still a second class citizen when it 
comes to TE, and it doesn't seem unreasonable to ask why this is the way 
it is in 2018.  There are plenty of valid reasons for wanting parity.



On contrary 6PE/6VPE is such a well-trodden path.


The world is covered with well-trodden paths that have fallen into disuse 
with the arrival of newer, better, more convenient infrastructure.


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] mx960 crashed

2018-04-04 Thread Rob Foehl

On Wed, 4 Apr 2018, Aaron Gould wrote:


Any idea why this happened and how do I tshoot cause ?



login: root

Password:SysRq : Trigger a crash


Looks like you're running a RE-S-X6-64G, and somehow sent it SysRq c -- 
which is a break followed by c within 5 seconds on a serial console -- and 
the hypervisor dutifully crashed and wrote out a dump.  Can't really blame 
it for doing what it's told.


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] Mixing v4/v6 neighbors in BGP groups

2018-06-29 Thread Rob Foehl
Wondering aloud a bit...  I've seen plenty of cases where wedging parallel 
v4/v6 sessions into the same BGP group and letting the router sort out 
which AFI it's supposed to be using on each session works fine, and nearly 
as many where configuring anything family-specific starts to get ugly 
without splitting them into separate v4/v6 groups.  Are there any 
particularly compelling reasons to prefer one over the other?


I can think of a bunch of reasons for and against on both sides, and 
several ways to handle it with apply-groups or commit scripts.  Curious 
what others are doing here.
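

As an example of the apply-groups angle, something like this keeps the shared 
session settings in one place while still splitting the AFIs (names and ASN 
are placeholders):

set groups bgp-common protocols bgp group <example-*> type external
set groups bgp-common protocols bgp group <example-*> peer-as 64511
set apply-groups bgp-common
set protocols bgp group example-ipv4 family inet unicast
set protocols bgp group example-ipv4 neighbor 192.0.2.0
set protocols bgp group example-ipv6 family inet6 unicast
set protocols bgp group example-ipv6 neighbor 2001:db8::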


Thanks!

-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Router for full routes

2018-06-29 Thread Rob Foehl

On Wed, 27 Jun 2018, Mark Tinka wrote:


But to your question, there is nothing ancient about the MX240. It's just
small. Look at your future needs and consider whether having those 2 line
card slots running the latest-generation Trio chip will scale better than
migrating to the MX204, and that should answer your question.


Thanks for the detailed reply, Mark.

By "ancient", I mean boxes still running RE-S-1300s, original SCBs, and 
either DPCs or older MPC2s -- basically, everything EOL except the 
chassis, and running a mix of 1G and 10G interfaces.  The limited slot 
count isn't much of an issue, especially with the possibility of moving to 
at least MPC3Es with 10x10G MICs.


The REs are the biggest issue, stuck on old code and not nearly enough 
memory.  1G interfaces are also a problem, but switches are cheap...


I do like the idea of the MX204 as an edge box, currently have some MX80s 
in that role that wouldn't be missed.


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] "set routing-options protect core" breaks local-preference

2018-09-27 Thread Rob Foehl

On Thu, 27 Sep 2018, Karl Gerhard wrote:


thanks for sharing, seems like all Junos versions above 17.3R3 are affected.


I'd been hoping this was specific to 18.x, but no such luck.  We'd just 
settled on 17.4 in large part due to the number of times we've heard that 
it's received "extra QA", although issues like this make me wonder.


PIC isn't critical -- wasn't available on the platforms we're replacing -- 
but I'd be more forgiving if it was a new feature in this code, rather 
than a regression.



All of the changes and work that is being done on the RPD code and other
parts of Junos is completely worthless to me if folks at Juniper don't start
writing regression tests.


Agreed.  Worse still is the near impossibility of getting regressions 
fixed in a timely fashion, especially without starting over with a new 
case for each affected platform / device.  I've all but given up trying.


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] "set routing-options protect core" breaks local-preference

2018-09-26 Thread Rob Foehl

On Mon, 10 Sep 2018, Ivan Malyarchuk wrote:


Hi. We also find something wrong with "protect core".
Seems like Junos 18.1 and 18.2 (running on MX204 in our case) makes one 
#Multipath  equal-cost group with ALL paths except one worst AND one with worst 
path - as backup.

I think it must create  #Multipath forwarding-only route with one best (active, with 
weight of nexthop 0x1, find in detail output) and one "second best" path as 
backup (with weight 0x4000). However I cant find any public PR or Known Issues references 
in 18.x release notes nor in prsearch.


Just hit this issue with 17.4R2 on a newly installed RE-S-X6-64G.  PIC 
works fine as long as all routes are equivalent local preference; as soon 
as one arrives with a lower preference, the #Multipath entry appears with 
all higher preference routes installed as equivalent active routes and the 
lower preference as the sole inactive:


#Multipath Preference: 255
    [...]
    Protocol next hop: [X]
    Indirect next hop: 0x940dc00 1048575 INH Session ID: 0x143 Weight 0x1
    Protocol next hop: [Y]
    Indirect next hop: 0x940da00 1048574 INH Session ID: 0x142 Weight 0x1
    Protocol next hop: [Z]
    Indirect next hop: 0x9411800 1048624 INH Session ID: 0x161 Weight 0x4000
    State: 
    Inactive reason: Forwarding use only

In this case, [X] has a greater IGP cost than [Y], and the BGP decision 
is correct, but [X] is still installed in the forwarding table as an 
active route.


Anyone have any PRs or existing tickets they'd be willing to share 
(off-list is fine) for purposes of sending JTAC in the right direction?


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] EVPN/VXLAN experience

2019-03-22 Thread Rob Foehl

On Fri, 22 Mar 2019, Vincent Bernat wrote:


❦ 22 mars 2019 13:39 -04, Rob Foehl :


I've got a few really large layer 2 domains that I'm looking to start
breaking up and stitching back together with EVPN+VXLAN in the middle,
on the order of a few thousand VLANs apiece.  Trying to plan around
any likely limitations, but specifics have been hard to come by...


You can find a bit more here:

- 
<https://www.juniper.net/documentation/en_US/junos/topics/reference/configuration-statement/interface-num-edit-forwarding-options.html>
- 
<https://www.juniper.net/documentation/en_US/junos/topics/reference/configuration-statement/next-hop-edit-forwarding-options-vxlan-routing.html>


Noted, thanks.  Raises even more questions, though...  Are these really 
QFX5110 specific, and if so, are there static limitations on the 5100 
chipset?


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] EVPN/VXLAN experience (was: EX4600 or QFX5110)

2019-03-22 Thread Rob Foehl

On Fri, 22 Mar 2019, Sebastian Wiesinger wrote:


What did bother us was that you are limited (at least on QFX5100) in
the amount of "VLANs" (VNIs). We were testing with 30 client
full-trunk ports per leaf and with that amount you can only provision
around 500 VLANs before you get errors and basically it seems you run
out of memory for bridge domains on the switch. This seems to be a
limitation by the chips used in the QFX5100, at least that's what I
got when I asked about it.

You can check if you know where:

root@SW-A:RE:0% ifsmon -Id | grep IFBD
IFBD   :12884  0

root@SW-A:RE:0% ifsmon -Id | grep Bridge
Bridge Domain  : 3502   0

These numbers combined need to be <= 16382.

And if you get over the limit these nice errors occur:

dcf_ng_get_vxlan_ifbd_hw_token: Max vxlan ifbd hw token reached 16382
ifbd_create_node: VXLAN IFBD hw token couldn't be allocated for 

Workaround is to decrease VLANs or trunk config.


Huh, that's potentially bad...  Can you elaborate on the config a bit 
more?  Are you hitting a limit around ~16k bridge domains total?


I've got a few really large layer 2 domains that I'm looking to start 
breaking up and stitching back together with EVPN+VXLAN in the middle, on 
the order of a few thousand VLANs apiece.  Trying to plan around any 
likely limitations, but specifics have been hard to come by...


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] EVPN all-active toward large layer 2?

2019-04-17 Thread Rob Foehl
I've been experimenting with EVPN all-active multihoming toward some large 
legacy layer 2 domains, and running into some fairly bizarre behavior...


First and foremost, is a topology like this even a valid use case?

EVPN PE <-> switch <-> switch <-> EVPN PE

...where both switches are STP root bridges and have a pile of VLANs and 
other switches behind them.  All of the documentation seems to hint at 
LACP toward a single CE device being the expected config here -- is that 
accurate?  If so, are there any options to make the above work?


If I turn up EVPN virtual-switch routing instances on both PEs as above 
with config on both roughly equivalent to the following:


interfaces {
    xe-0/1/2 {
        flexible-vlan-tagging;
        encapsulation flexible-ethernet-services;
        esi {
            00:11:11:11:11:11:11:11:11:11;
            all-active;
        }
        unit 12 {
            encapsulation vlan-bridge;
            vlan-id 12;
        }
    }
}
routing-instances {
    test {
        instance-type virtual-switch;
        vrf-target target:65000:1;
        protocols {
            evpn {
                extended-vlan-list 12;
            }
        }
        bridge-domains {
            test-vlan12 {
                vlan-id 12;
                interface xe-0/1/2.12;
            }
        }
    }
}

Everything works fine for a few minutes -- exact time varies -- then what 
appears to be thousands of packets of unknown unicast traffic starts 
flowing between the PEs, and doesn't stop until one or the other is 
disabled.  Same behavior on this particular segment with or without any 
remote PEs connected.


Both PEs are MX204s running 18.1R3-S4, automatic route distinguishers, 
full mesh RSVP LSPs between, direct BGP with family evpn allowed, no LDP.


I'm going to try a few more tests with single-active and enabling MAC 
accounting to try to nail down what this traffic actually is, but figure 
I'd better first ask whether I'm nuts for trying this at all...


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] EVPN all-active toward large layer 2?

2019-04-18 Thread Rob Foehl

On Thu, 18 Apr 2019, Krzysztof Szarkowicz wrote:


Hi Rob,
RFC 7432, Section 8.5:

   If a bridged network is multihomed to more than one PE in an EVPN
   network via switches, then the support of All-Active redundancy mode
   requires the bridged network to be connected to two or more PEs using
   a LAG.


So, have you MC-LAG (facing EVPN PEs) configured on your switches?


No, hence the question...  I'd have expected ESI-LAG to be relevant for 
EVPN, and in this case it's not a single "CE" device but rather an entire 
layer 2 domain.  For a few of those, Juniper-flavored MC-LAG isn't an 
option, anyway.  In any case, it's not clear what 8.5 means by "must be 
connected using a LAG" -- from only one device in said bridged network?


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] EVPN all-active toward large layer 2?

2019-04-18 Thread Rob Foehl

On Thu, 18 Apr 2019, Wojciech Janiszewski wrote:


You have effectively created L2 loop over EVPN, so to cut it you need a
link between bridged network and EVPN to be a single link. There is no STP
in EVPN.
If you need two physical connections to between those networks, then LAG is
a way to go. MC-LAG or virtual chassis can be configured on legacy switches
to maintain that connection. ESI will handle that on EVPN side.


On Thu, 18 Apr 2019, Krzysztof Szarkowicz wrote:


As per RFC, bridges must appear to EVPN PEs as a LAG. In essence, you need to 
configure MC-LAG (facing EVPN PEs) on the switches facing EVPN PEs, if you have 
multiple switches facing EVPN-PEs. Switches doesn’t need to be from Juniper, so 
MC-LAG on the switches doesn’t need to be Juniper-flavored. If you have single 
switch facing EVPN PEs -> simple LAG (with members towards different EVPN PEs) 
on that single switch is OK.


Got it.  Insufficiently careful reading of the RFC vs. Juniper example 
documentation.  I really ought to know better by now...


Unfortunately, doing MC-LAG of any flavor toward the PEs from some of 
these switches is easier said than done.  Assuming incredibly dumb layer 2 
only, and re-reading RFC 7432 8.5 more carefully this time...  Is 
single-active a viable option here?  If so, is there any support on the MX 
for what the RFC is calling service carving for VLAN-aware bundles for 
basic load balancing between the PEs?


Thanks for setting me straight!

-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] prsearch missing in inaction

2019-05-08 Thread Rob Foehl

On Tue, 7 May 2019, Nathan Ward wrote:


Is it actually coming back? Hard to believe the “technical issue” given how 
long it’s been, seems like a pretty big systemic issue rather than a technical 
one. “Actively worked on” seems pretty inactive, to me.


Maybe it runs on Space, and they're just waiting for the web service to 
restart...

Asked the account team about it today.  Doesn't sound like there are any 
definitive answers.


-Rob

___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX204 vs. MX240??

2019-11-09 Thread Rob Foehl
I'll preface this by saying that the MX204 is a great box, and fits many a 
niche quite well...  However:


On Fri, 8 Nov 2019, Clarke Morledge wrote:

My understanding is that the MX204 is a 1 RU MPC7, but with a few 
modifications.


More or less -- it's an RE glued to the non-fabric-facing parts of the 
MPC7, which tends to tickle some "interesting" corner cases in code that 
assumes there's a fabric chip present.


I understand that the eight 10Gig ports have been modified to 
allow for 1 Gig transceivers as well, and perhaps that the QSFP ports can 
accommodate a pigtail for providing a bunch of 1 Gig connections, if 
necessary.


You can run 1G optical transceivers in the 8x SFP+ slots, if necessary...

Don't.

Seriously, don't.  The initial code in 17.4 refused to light them at all, 
and seems to have been haphazardly gutted of all config/op statements 
related to 1G optics.  18.1R3 is necessary to support them at all, and 
they show up as xe- interfaces only, half the config is hidden and the 
other half refuses to commit, they have a lot of weird problems with rate 
negotiation, and they don't work in bundles unless you really beat on the 
other end to convince it to bring up the aggregate.


Just pair the 204 up with a cheap switch...  Whether you want to get crazy 
and run Fusion is another matter.


Also, I understand that the MX204 CPU and other resources are a vast 
improvement over the MX80, and that the MX204 can handle multiple full 
Internet route BGP feeds, just as well as the MX240 REs can, without 
compromise in performance.


Yup, plenty of memory and CPU to play with, it'll do 10M routes without 
batting an eye.


The newer VM support inside the RE makes the requirements for an additional 
RE less important now, according to my understanding.


Again more or less -- ISSU between VMs works reasonably well, but Juniper 
has walked back the original claims of "never needing to upgrade the 
hypervisor" quite a bit since these were released.  I've been doing full 
vmhost upgrades every time to minimize surprises.  Need a pair for real 
redundancy, anyway...


So, if you do not need a lot of speeds and feeds, and can live without a 
physical backup RE, the MX204 would be a good alternative to a MX240.


You'll also need to be willing to run relatively recent software if you 
want to do anything beyond basic layer 3.  I had 3 MX204-specific PRs on 
18.1R3 that led to running 18.4R1-S4 now -- and have 5 new SRs open 
against that code.  Your mileage may vary...


-Rob




___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] BPDUs over EVPN?

2019-10-18 Thread Rob Foehl

On Fri, 18 Oct 2019, Gert Doering wrote:

On Thu, Oct 17, 2019 at 05:37:16PM -0400, Rob Foehl wrote:

Is EVPN expected to be forwarding BPDUs at all, intact or otherwise?


The way understand "how things are meant to be plugged together", you
should not see forwarded BPDUs - "containing layer2 madness to one
attachment site" is the whole point, isn't it?

I could see very special cases where it would be necessary, but that
would need to be a non-default-enabled switch.


Right, I'd only expect this to work in very specific "make it entirely 
transparent" cases...  It looks like both Cisco and Huawei document 
options related to BPDU tunneling, and neither enable any of it by 
default.  Not coming up with much else for comparison.


Juniper is now telling me that this is occurring by design, but can't point 
to any documentation or standards which support that, nor explain why it 
suddenly changed post-upgrade.  I'm... not convinced.


-Rob


___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] BPDUs over EVPN?

2019-10-21 Thread Rob Foehl

On Fri, 18 Oct 2019, Rob Foehl wrote:

Juniper is now telling me that this is occurring by design, but can't point to 
any documentation or standards which support that, nor explain why it 
suddenly changed post-upgrade.  I'm... not convinced.


Plot twist: the BPDUs in question turned out to be PVST+ BPDUs from an 
unknown source, that were helpfully converted to MSTP BPDUs by some 
on-by-default compatibility feature on an ancient pair of 6509s -- which 
then started complaining loudly about and attempting to block ports based 
on the nonsensical MSTP BPDUs which they'd made up.


Ugh.  This is why we can't have nice things.

Anyway...  This still leaves the questions of why this became apparent 
only after an upgrade, why there are now multiple disagreements between 
stated and observed behavior regarding regular BPDUs, and whether any of 
this is correct.  Case is still open, I'll follow up when I know more...


-Rob


___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] BPDUs over EVPN?

2019-10-18 Thread Rob Foehl

On Fri, 18 Oct 2019, Gert Doering wrote:


On Fri, Oct 18, 2019 at 01:37:21PM +0200, Daniel Verlouw wrote:

On Fri, Oct 18, 2019 at 11:45 AM Gert Doering  wrote:

If yes, is this something people do over EVPN?


as an extension to 'plain' EVPN, yes. It's called EVPN-VPWS, RFC 8214.
Basically EVPN without the MAC learning.


Thanks, this is enlightening.

Are there vendor implementations?


For the Juniper flavor:

https://www.juniper.net/documentation/en_US/junos/topics/concept/evpn-vpws-signaling-mechanisms-overview.html

Has some nice advantages over traditional pseudowire/CCC setups, although 
I have yet to replace one with full EVPN-VPWS since nearly all wound up 
being used as "really long cable between layer 3 interfaces" in practice.
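

The Juniper-side config is pretty compact, too -- a rough sketch, with the 
interface, RD, target, and service IDs all as placeholders:

set routing-instances vpws-example instance-type evpn-vpws
set routing-instances vpws-example interface ge-0/0/1.100
set routing-instances vpws-example route-distinguisher 192.0.2.1:100
set routing-instances vpws-example vrf-target target:65000:100
set routing-instances vpws-example protocols evpn interface ge-0/0/1.100 vpws-service-id local 100
set routing-instances vpws-example protocols evpn interface ge-0/0/1.100 vpws-service-id remote 200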


Incidentally, RFC 8214 doesn't mention BPDUs anywhere, either... ;)

-Rob



___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] BPDUs over EVPN?

2019-10-17 Thread Rob Foehl
Seeing something "interesting" after an 18.1R3 to 18.4R1 upgrade on some 
EVPN PEs: the 18.4 boxes are now emitting BPDUs toward the CE interfaces 
containing pre-translation VLAN IDs from the CEs attached to remote PEs, 
which as far as I can tell are originating from the remote CE.


Is EVPN expected to be forwarding BPDUs at all, intact or otherwise?

If yes, is that dependent on how it's configured?  In this case, it's 
VLAN-aware virtual switch instances everywhere, rewriting tags for 
multiple VLANs.  Does PBB change things?  We hit another bug where 18.1 
was convinced these are PBB configs, when they're not...


Given the discrepancy between releases, which one is wrong?  I'm two weeks 
into a TAC case that's been passed around several times and still have no 
answers to any of these questions, would appreciate hearing from anyone 
who actually knows.  Thanks!


-Rob




___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] EVPN all-active vs. layer 3

2019-11-11 Thread Rob Foehl
Given a pair of EX uplinked to a pair of MX, with various downstream CE 
that may be single devices or their own layer 2 topologies, as in this 
terrible diagram:


  MX1 - MX2
  |  / \  |
  EX1   EX2
     \ /
     CEs

...and a need to deliver various EVPN services to access ports on the EX, 
which currently run a bunch of layer 3 toward the CE devices (sometimes 
including VRRP across the CEs, or eBGP to them, or both):


Are there any "good" options for running all-active toward the EX while 
also moving existing layer 3 up to the MX, or am I stuck with 
single-active and carving out IFLs for each use case?


-Rob



___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX204 vs. MX240??

2019-11-11 Thread Rob Foehl

On Sun, 10 Nov 2019, Saku Ytti wrote:


More or less -- it's an RE glued to the non-fabric-facing parts of the
MPC7, which tends to tickle some "interesting" corner cases in code that
assumes there's a fabric chip present.


I don't think RE connects atypically in MX204. RE is ETH+PCI
connected, not fabric.


Atypical only in that there's no chassis switch involved on the Ethernet 
link...  The comment above was about software catching up to what's 
(functionally) missing from the line card, in this case.


PR1444186 -- GRE packets which are larger than MTU get dropped on MX204 
platforms when sampling is enabled on the egress interface -- was a fun 
one, and as it happens an MX80 was used to demonstrate that the issue was 
specific to MX204.  There's a not-yet-public follow up for that, and I've 
also got a half dozen 204s sporadically timing out BFD sessions, with 
platform as the only commonality.


New platform, new bugs...  My only real complaint is how long it takes to 
get fixes turned around these days.


-Rob




___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] ACX5448 & ACX710

2020-01-22 Thread Rob Foehl

On Wed, 22 Jan 2020, Giuliano C. Medalha wrote:


TE / TE++ and auto-bandwidth


Still broken?  Been hearing excuses about why these don't work on merchant 
silicon boxes since the EX3200...


-Rob


___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] ACX5448 & ACX710

2020-01-22 Thread Rob Foehl

On Wed, 22 Jan 2020, Saku Ytti wrote:


On Wed, 22 Jan 2020 at 11:48, Rob Foehl  wrote:


TE / TE++ and auto-bandwidth


Still broken?  Been hearing excuses about why these don't work on merchant
silicon boxes since the EX3200...


Excuses seems strong word, it implies you know what merchant silicon
EX3200 has and it implies you know it can push two and swap, which it
can't.


autobw never worked on EX3200 and similar vintage because they'd 
periodically dump impossible values into the statistics files and then try 
to do reservations near integer-width limits.  Who implied anything about 
needing more than one label?


-Rob


___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX subscriber captive portal - redirect and rewrite simultaneously

2020-09-01 Thread Rob Foehl

I'll preface this by saying I don't have anything constructive to add...

On Fri, 28 Aug 2020, Nathan Ward wrote:


I’ve tried JTAC on this, twice. First on 16.1, and again now on 19.4. Both 
times JTAC have either not understood and stalled for months and refused to 
escalate to someone who does understand, or, more recently, claim that the docs 
are incorrect and are being fixed.


Just got this exact response late last week, on the issue with BGP output 
queue priorities -- they figured out why it doesn't work, and will update 
the documentation to remove any promises that it should.


Sigh.  I hope this isn't a trend.

-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-07-29 Thread Rob Foehl

On Tue, 28 Jul 2020, Jeffrey Haas wrote:


- "show bgp output-scheduler" is empty without top-level "protocols bgp
 output-queue-priority" config, regardless of anything else

- Top-level "protocols bgp family evpn signaling" priority config -- and
 nothing else within that stanza -- broke every v6 session on the box,
 even with family inet6 explicitly configured under those groups


If you're simply trying to prioritize evpn differently than inet unicast, 
simply having a separate priority for that address family should have been 
sufficient.


Right, that's what I took away from the docs...  No luck in any case, 
starting from the "simplest" of just adding this:


set protocols bgp group X family evpn signaling output-queue-priority expedited

That'll produce this in "show bgp group output-queues" for that group:

  NLRI evpn:
    OutQ: expedited RRQ: priority 1 WDQ: priority 1

...but that's it, and no change in behavior.  Same config for family inet 
in the same group would show NLRI inet: output, and no more evpn if both 
were configured.  Still no change.



Can you clarify what you mean "broke every v6 session"?


For that one, it shut down every session on the box that didn't explicitly 
have family inet / family evpn configured at the group/neighbor level, 
refused all the incoming family inet sessions with NLRI mismatch (trying 
to send evpn only), and made no attempt to reestablish any of the family 
inet6 sessions.



I think what you're running into is one of the generally gross things about the 
address-family stanza and the inheritance model global => group => neighbor.  
If you specify ANY address-family configuration at a given scope level, it doesn't 
treat it as inheriting the less specific scopes; it overrides it.


In that specific case, yes; maybe I didn't wait long enough, but this was 
only an experiment to see whether setting something under global family 
evpn would do anything different -- and had about the expected result, 
given the way inheritance works.  (This was the least surprising result 
out of everything I tried.  I have logs, if you want 'em.)
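

In hindsight, the safe pattern seems to be spelling out every family the group 
needs whenever any one of them gets touched, along the lines of:

set protocols bgp group X family inet unicast
set protocols bgp group X family inet6 unicast
set protocols bgp group X family evpn signaling output-queue-priority expedited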



FWIW, the use case of "prioritize a family different" is one of the things this 
was intended to address.  Once you have a working config you may find that you want to do 
policy driven config and use the route-type policy to prioritize the DF related routes in 
its own queue.  That way you're not dealing with the swarm of ARP related routes.


Eventually, yes -- same for certain classes of inet routes -- but for now 
I'd have been happy with "just shove everything EVPN into the expedited 
queue".  I couldn't get them ahead of inet, and it was a many-minute wait 
for anything else to arrive, so pretty easy to observe...


-Rob



- Per-group family evpn priority config would show up under "show bgp
 group output-queues" and similar, but adding family inet would cause the
 NLRI evpn priority output to disappear

- Policy-level adjustments to any of the above had no effect between NLRIs

- "show bgp neighbor output-queue" output always looks like this:

 Peer: x.x.x.x+179 AS 20021 Local: y.y.y.y+52199 AS n
   Output Queue[1]: 0(inet.0, inet-unicast)

 Peer: x.x.x.x+179 AS 20021 Local: y.y.y.y+52199 AS n
   Output Queue[2]: 0(bgp.evpn.0, evpn)

 ...which seems to fit the default per-RIB behavior as described.

___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-07-27 Thread Rob Foehl
Anyone know the secret to getting BGP output queue priorities working 
across multiple NLRIs?


Had trouble with EVPN routes getting stuck behind full refreshes of the v4 
RIB, often for minutes at a time, which causes havoc with the default DF 
election hold timer of 3 seconds.  Bumping those timers up to tens of 
minutes solves this, but... poorly.
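

(For reference, the knob in question is the per-instance DF election hold 
timer, default 3 seconds -- the value and instance name below are just 
illustrative:)

set routing-instances test protocols evpn designated-forwarder-election-hold-time 300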


The documentation[1] says:

"In the default configuration, that is, when no output-queue-priority 
configuration or policy that overrides priority exists, the routing 
protocol process (rpd) enqueues BGP routes into the output queue per 
routing information base (RIB). [...] While processing output queues, the 
BGP update code flushes the output queue for the current RIB before moving 
on to the next RIB that has a non-empty output queue."


I've tried about a dozen combinations of options, and cannot get any other 
result with inet/evpn routes in the same session -- inet.0 routes always 
arrive ahead of *.evpn.0.  Am I missing something[2], or is that text not 
quite accurate?


-Rob


[1] 
https://www.juniper.net/documentation/en_US/junos/topics/topic-map/bgp-route-prioritization.html

[2] Highlight reel of failed attempts, all on 19.2R2 thus far:

- "show bgp output-scheduler" is empty without top-level "protocols bgp
  output-queue-priority" config, regardless of anything else

- Top-level "protocols bgp family evpn signaling" priority config -- and
  nothing else within that stanza -- broke every v6 session on the box,
  even with family inet6 explicitly configured under those groups

- Per-group family evpn priority config would show up under "show bgp
  group output-queues" and similar, but adding family inet would cause the
  NLRI evpn priority output to disappear

- Policy-level adjustments to any of the above had no effect between NLRIs

- "show bgp neighbor output-queue" output always looks like this:

  Peer: x.x.x.x+179 AS 20021 Local: y.y.y.y+52199 AS n
Output Queue[1]: 0(inet.0, inet-unicast)

  Peer: x.x.x.x+179 AS 20021 Local: y.y.y.y+52199 AS n
Output Queue[2]: 0(bgp.evpn.0, evpn)

  ...which seems to fit the default per-RIB behavior as described.

___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-11-09 Thread Rob Foehl

On Mon, 9 Nov 2020, Jeffrey Haas wrote:


As the source of this particular bit of difficulty, a bit of explanation for 
why it simply wasn't done when the initial feature was authored.


Much appreciated -- the explanation, anyway ;)


An immense amount of work in the BGP code is built around the need to not have 
to keep full state on EVERYTHING.  We're already one of the most stateful BGP 
implementations on the planet.  Many times that helps us, sometimes it doesn't.

But as a result of such designs, for certain kinds of large work it is 
necessary to have a consistent work list and build a simple iterator on that.  
One of the more common patterns that is impacted by this is the walk of the 
various routing tables.  As noted, we start roughly at inet.0 and go forward 
based on internal table order.


Makes sense, but also erases the utility of output queue priorities when 
multiple tables are involved.  Is there any feasibility of moving the RIB 
walking in the direction of more parallelism, or at least something like 
round robin between tables, without incurring too much overhead / bug 
surface / et cetera?



The primary challenge for populating the route queues in user desired orders is 
to move that code out of the pattern that is used for quite a few other things. 
 While you may want your evpn routes to go first, you likely don't want route 
resolution which is using earlier tables to be negatively impacted.  Decoupling 
the iterators for the overlapping table impacts is challenging, at best.  Once 
we're able to achieve that, the user configuration becomes a small thing.


I'm actually worried that if the open ER goes anywhere, it'll result in 
the ability to specify a table order only, and that's an awfully big 
hammer when what's really needed is the equivalent of the output queue 
priorities covering the entire process.  Some of these animals are more 
equal than others.



I don't recall seeing the question about the route refreshes, but I can offer a 
small bit of commentary: The CLI for our route refresh isn't as fine-grained as 
it could be.  The BGP extension for route refresh permits per afi/safi 
refreshing and honestly, we should expose that to the user.  I know I flagged 
this for PLM at one point in the past.


The route refresh issue mostly causes trouble when bringing new PEs into 
existing instances, and is presumably a consequence of the same behavior: 
the refresh message includes the correct AFI/SAFI, but the remote winds up 
walking every RIB before it starts emitting routes for the requested 
family (and no others).  The open case for the output queue issue has a 
note from 9/2 wherein TAC was able to reproduce this behavior and collect 
packet captures of both the specific refresh message and the long period 
of silence before any routes were sent.


-Rob

___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-11-10 Thread Rob Foehl

On Tue, 10 Nov 2020, Gert Doering wrote:


Can you do the EVPN routes on a separate session (different loopback on
both ends, dedicated to EVPN-afi-only BGP)?  Or separate RRs?

Yes, this is not what you're asking, just a wild idea to make life
better :-)


Not that wild -- I've already been pinning up EVPN-only sessions between 
adjacent PEs to smooth out the DF elections where possible.  Discrete 
sessions over multiple loopback addresses also work, at the cost of extra 
complexity.
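

(The pinned-up sessions are nothing fancy -- addresses and group name are 
placeholders:)

set protocols bgp group evpn-direct type internal
set protocols bgp group evpn-direct local-address 192.0.2.11
set protocols bgp group evpn-direct family evpn signaling
set protocols bgp group evpn-direct neighbor 192.0.2.12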


At some point that starts to look like giving up on RRs, though -- which 
I'd rather avoid, they're kinda useful :)


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-11-10 Thread Rob Foehl

On Tue, 10 Nov 2020, Robert Raszuk wrote:


But what seems weird is the last statement: 

"This has problems with blackholing traffic for long periods in several
cases,..." 

We as the industry have solved this problem many years ago, by clearly
decoupling connectivity restoration term from protocol convergence term. 


Fundamentally, yes -- but not for EVPN DF elections.  Each PE making its 
own decisions about who wins without any round-trip handshake agreement is 
the root of the problem, at least when coupled with all of the fun that 
comes with layer 2 flooding.


There's also no binding between whether a PE has actually converged and 
when it brings up IRBs and starts announcing those routes, which leads to 
a different sort of blackholing.  Or in the single-active case, whether 
the IRB should even be brought up at all, which leads to some really dumb 
traffic paths.  (Think layer 3 via P -> inactive PE -> same P, different 
encapsulation -> active PE -> layer 2 segment, for an example.)



I think this would be a recommended direction not so much to mangle BGP code
to optimize here and in the same time cause new maybe more severe issues
somewhere else. Sure per SAFI refresh should be the norm, but I don't think
this is the main issue here. 


Absolutely.  The reason for the concern here is that the output queue 
priorities would be sufficient to work around the more fundamental flaws, 
if not for the fact that they're largely ineffective in this exact case.


-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-11-10 Thread Rob Foehl

On Tue, 10 Nov 2020, Jeffrey Haas wrote:


The thing to remember is that even though you're not getting a given afi/safi 
as front-loaded as you want (absolute front of queue), as soon as we have 
routes for that priority they're dispatched accordingly.


Right, that turns out to be the essential issue -- the output queues 
actually are working as configured, but the AFI/SAFI routes relevant to a 
higher priority queue arrive so late in the process that it's basically 
irrelevant whether they get to cut in line at that point.  Certainly 
wasn't observable to human eyes, had to capture the traffic to verify.



Full table walks to populate the queues take some seconds to several minutes 
depending on the scale of the router.  In the absence of prioritization, 
something like the evpn routes might not go out for most of a minute rather 
than getting delayed some number of seconds until the rib walker has reached 
that table.


Ah, maybe this is the sticking point: on a route reflector with an 
RE-S-X6-64 carrying ~10M inet routes and ~10K evpn routes, a new session 
toward an RR client PE needing to be sent ~1.6M inet routes (full table, 
add-path 2) and maybe ~3K evpn routes takes between 11-17 minutes to get 
through the initial batch.  The evpn routes only arrive at the tail end of 
that, and may only preempt around 1000 inet routes in the output queues, 
as confirmed by TAC.


I have some RRs that tend toward the low end of that range and some that 
tend toward the high end -- and not entirely sure why in either case -- 
but that timing is pretty consistent overall, and pretty horrifying.  I 
could almost live with "most of a minute", but this is not that.


This has problems with blackholing traffic for long periods in several 
cases, but the consequences for DF elections are particularly disastrous, 
given that they make up their own minds based on received state without 
any affirmative handshake: the only possible behaviors are discarding or 
looping traffic for every ethernet segment involved until the routes 
settle, depending on whether the PE involved believes it's going to win 
the election and how soon.  Setting extremely long 20 minute DF election 
hold timers is currently the least worst "solution", as losing traffic for 
up to 20 minutes is preferable to flooding a segment into oblivion -- but 
only just.


I wouldn't be nearly as concerned with this if we weren't taking 15-20 
minute outages every time anything changes on one of the PEs involved...



[on the topic of route refreshes]


The intent of the code is to issue the minimum set of refreshes for new 
configuration.  If it's provably not minimum for a given config, there should 
be a PR on that.


I'm pretty sure that much is working as intended, given what is actually 
sent -- this issue is the time spent walking other RIBs that have no 
bearing on what's being refreshed.



The cost of the refresh in getting routes sent to you is another artifact of "we 
don't keep that state" - at least in that configuration.  This is a circumstance 
where family route-target (RT-Constrain) may help.  You should find when using that 
feature that adding a new VRF with support for that feature results in the missing routes 
arriving quite fast - we keep the state.


I'd briefly looked at RT-Constrain, but wasn't convinced it'd be useful 
here since disinterested PEs only have to discard at most ~10K EVPN routes 
at present.  Worth revisiting that assessment?
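

(For reference, the RT-Constrain knob amounts to adding the family on the 
RR-client sessions -- group name is a placeholder, and like any family change 
it bounces the sessions:)

set protocols bgp group rr-clients family route-target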


-Rob


___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-11-09 Thread Rob Foehl

On Mon, 27 Jul 2020, Rob Foehl wrote:

Anyone know the secret to getting BGP output queue priorities working across 
multiple NLRIs?

[...]
I've tried about a dozen combinations of options, and cannot get any other 
result with inet/evpn routes in the same session -- inet.0 routes always 
arrive ahead of *.evpn.0.


Following up on this for posterity:

That last part turns out to not be entirely true.  It appears that the 
output queue priorities do work as intended, but route generation walks 
through the RIBs in a static order, always starting with inet.0 -- so 
maybe the last ~1000 inet routes wind up in the output queues at the same 
time as evpn routes.


This was declared to be working as designed, and the issue is now stuck in 
ER hell; best estimate for a real solution is "maybe next year".  Route 
refresh for EVPN routes triggering a full walk of all RIBs was also 
confirmed, but remains unexplained.


-Rob


___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] RSVP path constraints for transit LSPs

2021-02-08 Thread Rob Foehl

On Sat, 6 Feb 2021, Robert Huey wrote:


Have you looked into IGP Overload?   I think it will do the trick without ever 
getting into TE constraints.


In this case, it's OSPF, so overload is just max metric.  The path metric 
already exceeds any other through the network under ordinary conditions, 
which is why it's only a problem on occasion, and with bypass LSPs in 
particular.


IGP metric isn't enough when "best path" is the same answer as "only 
available path", and it looks like switch-away-lsps goes too far in the 
opposite direction.


-Rob


___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] RSVP path constraints for transit LSPs

2021-02-05 Thread Rob Foehl
Possibly-missing-something-obvious question: are there any less-involved 
alternatives to link coloring to preclude RSVP from signaling LSPs through 
specific nodes?


I've got some traffic occasionally wandering off where it shouldn't be -- 
mostly due to bypass LSPs landing on some "temporary" links -- and in this 
case, it'd be handy to just say "this box is never allowed to be a P 
router" and call it solved.


-Rob


___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Thanks for all the fish

2024-01-09 Thread Rob Foehl via juniper-nsp
On Tue, 2024-01-09 at 10:55 +0200, Saku Ytti via juniper-nsp wrote:
> What do we think of HPE acquiring JNPR?

I'm just hoping the port checker gets updated to show which slots will
accept a fresh magenta cartridge in order to bring BGP back up...

(Just kidding -- but only because it's the wrong HP for that particular
manifestation of the current rent-seeking licensing trajectory.)

-Rob
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp