Re: [j-nsp] rib-sharding and NSR update

2024-06-02 Thread Saku Ytti via juniper-nsp
On Mon, 3 Jun 2024 at 05:26, Gustavo Santos via juniper-nsp
 wrote:

> We will try it again later this year. If update threading / rib-sharding
> works as expected it will be better than having non stop routing running.

I think you need to contact support and work with them. NOS software
quality is terrible, and whatever problem you're seeing might be some
corner case that happens just to you; it will never get fixed if
you're not proactive about it.

> Last time we had an issue caused by a bgp routing update, it took about 50
> minutes to advertise all needed routes to one of the transit providers,
> because of the time it takes to send full routing table feeds to remote peers.

Could be a plethora of things, but by default the TCP window won't
grow past 16kB, so if you have any latency at all, performance is
destroyed. You can raise this to 64kB, but window scaling is currently
not supported. And in my own testing we were able to fill the entire
window, so convergence was gated by the 64kB window.
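
To put rough numbers on this (my arithmetic, not from the thread):
maximum TCP throughput is about window/RTT, so at e.g. 50ms RTT:

  16kB window: 16,384 B * 8 / 0.05 s ~=  2.6 Mbps
  64kB window: 65,536 B * 8 / 0.05 s ~= 10.5 Mbps

A full feed of ~1M routes at ~50 B of wire per NLRI is ~50 MB, i.e.
roughly 2.5 minutes per peer at 2.6 Mbps, before any control-plane
policing is even considered.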

But it's not necessarily trivial to make BGP perform well. You may
have a 10Mbps static policer on the queue towards the control plane,
shared by BGP, all IGPs, etc., so if you make BGP perform, you could
potentially kill your IGP. The problem may recurse into a complex one,
and the DE who made the design choices may no longer be employed, so
there may not be anyone left who can make an informed decision on how
to change the behaviour.


-- 
  ++ytti


Re: [j-nsp] JunOS forwarding IPv6 packets with link-local source

2024-05-17 Thread Saku Ytti via juniper-nsp
On Fri, 17 May 2024 at 10:36, Antti Ristimäki  wrote:

> iACL design becomes a bit more challenging if you want to keep the
> link-local things link local (e.g. there are legit ND packets with
> link-local srcaddr and GUA dstaddr). It is doable, though.

Not disagreeing, but what are these packets? And can you drop
link-local in two forwarding-filter terms?

I know ND can be any permutation, but those can be handled in earlier
iACL terms without matching addresses, by matching ICMPv6 types and
hop-limit 255.
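
Something like this is the shape I mean; a minimal sketch with
invented term names and icmp-type keywords from memory (verify against
your release):

family inet6 {
    filter iACL {
        /* ND in earlier terms: any addresses, but hop-limit must be 255 */
        term nd-accept {
            from {
                next-header icmp6;
                icmp-type [ neighbor-solicit neighbor-advertisement router-solicit router-advertisement redirect ];
                hop-limit 255;
            }
            then accept;
        }
        /* then one term dropping link-local either way; 'address'
           matches both source and destination */
        term link-local-drop {
            from {
                address {
                    fe80::/10;
                }
            }
            then {
                count link-local-drop;
                discard;
            }
        }
        /* ... rest of iACL ... */
    }
}
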
-- 
  ++ytti


Re: [j-nsp] JunOS forwarding IPv6 packets with link-local source

2024-05-17 Thread Saku Ytti via juniper-nsp
On Thu, 16 May 2024 at 21:23, Antti Ristimäki via juniper-nsp
 wrote:

> Does anyone have any insight into this? This issue was discussed on
> this list already over 10 years ago, for example:
> https://puck.nether.net/pipermail/juniper-nsp/2012-April/023134.html

Personally I'm not convinced I'd even want this fixed, as it likely
comes with significant per-packet cost. Reality is always some
pragmatic version of the standard. But I'm pretty sure that if you
press it, Juniper will accept it as a PR.

If I read the IPv6 standard correctly, nodes /have to/ join the ND
multicast group, which they don't, which is good, because the whole
thing is dumb, fragile and expensive.
ICMPv6 ND forwarding is weird: most implementations happily forward it
in all cases; some, like SR OS, punt all ICMPv6 ND with TTL 255,
transit or not, and forward all with TTL 254 or less.

-- 
  ++ytti


Re: [j-nsp] ACL for lo0 template/example comprehensive list of 'things to think about'?

2024-05-13 Thread Saku Ytti via juniper-nsp
How IP options work is platform specific.

It used to be that _transited_ IP options were not subject to the lo0
filter, while still being a risk to the RE, so you'd implement a
forwarding filter where you'd police IP options or drop them outright.

In more recent Junipers this behaviour has changed so that transited
IP options are subject to the lo0 filter, which makes it in practice
impossible to determine whether an IP-options packet is transit or
punt, especially if you run L3 MPLS VPN.

I tried to argue that the new behaviour is a bug worth a PR, but
didn't have enough patience to ram it through.

So basically no one knows what their policy regarding transited
IP-options packets is, and most accidentally change the policy from
'transit all, unlimited' to 'transit none' by upgrading devices. Of
course this is generally the case for most things.
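
The old-style mitigation looked something like this; a sketch from
memory, with an invented filter name and arbitrary policer values:

firewall {
    policer ip-options-police {
        if-exceeding {
            bandwidth-limit 1m;
            burst-size-limit 15k;
        }
        then discard;
    }
    family inet {
        filter transit-edge {
            /* police transited IP-options packets before they can hurt the RE */
            term ip-options {
                from {
                    ip-options any;
                }
                then {
                    policer ip-options-police;
                    count ip-options;
                    accept;
                }
            }
            /* ... normal transit terms ... */
        }
    }
}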


On Mon, 13 May 2024 at 13:36, Martin Tonusoo via juniper-nsp
 wrote:
>
> Michael,
>
> got it, thanks.
>
>
> Lee,
>
> the README of your repository provides an excellent introduction to RE
> filtering. Based on your filters, I moved the processing of the IP
> Options from edge filters to RE filters:
>
> https://gist.github.com/tonusoo/efd9ab4fcf2bb5a45d34d5af5e3f3e0c#file-junos-re-filters-L574:L585
>
>
> Martin



-- 
  ++ytti


Re: [j-nsp] BGP timer

2024-04-29 Thread Saku Ytti via juniper-nsp
On Mon, 29 Apr 2024 at 10:13, Mark Tinka via juniper-nsp
 wrote:

> It comes down to how you classify stable (well-behaved) vs. unstable
> (misbehaving) interfaces.

You are making this unnecessarily complicated.

You could simply configure it so that the first down event doesn't add
enough points to damp, but the second does. And you are wildly better off.

Perfect is the enemy of done and kills all movement towards better.
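
In Junos terms, something like this; a sketch assuming a per-flap
penalty of 1000 points (as on IOS; check your platform's defaults and
ranges):

interfaces {
    xe-0/0/0 {
        damping {
            enable;
            half-life 10;     /* seconds for the penalty to halve */
            suppress 1500;    /* one flap (~1000) stays up, two flaps damp */
            reuse 800;        /* un-suppress once penalty decays below this */
            max-suppress 60;  /* never suppress longer than this, seconds */
        }
    }
}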

-- 
  ++ytti


Re: [j-nsp] BGP timer

2024-04-29 Thread Saku Ytti via juniper-nsp
On Mon, 29 Apr 2024 at 10:07, Gert Doering via juniper-nsp
 wrote:

> The interesting question is "how to react when underlay seems to be stable
> again"?  "bring up upper layers right away, with exponential decay flap
> dampening" or "always wait 15 minutes to be SURE it's stable!!!"...

100%. What Mark implied was not what I was trying to communicate.
Sure, go ahead and damp flapping interfaces, but penalising the first
down event, when most of them are just that, one event, is to me just
bad policy made by people who don't feel the cost.

-- 
  ++ytti


Re: [j-nsp] BGP timer

2024-04-29 Thread Saku Ytti via juniper-nsp
On Sun, 28 Apr 2024 at 21:20, Jeff Haas via juniper-nsp
 wrote:

> BFD holddown is the right feature for this.
> WARNING: BFD holddown is known to be problematic between Juniper and Cisco 
> implementations due to where each start their state machines for BFD vs. BGP.
>
> It was a partial motivation for BGP BFD strict:
> https://datatracker.ietf.org/doc/html/draft-ietf-idr-bgp-bfd-strict-mode
>
> BGP BFD strict was added in 23.2R1.

But why is this desirable? Why do I want to prioritise stability
always, instead of prioritising convergence on well-behaved interfaces
and stability on poorly behaved interfaces?

If I can pick just one, I'll prioritise convergence every time for both.

That is, if I cannot have exponential back-off, I won't kill
convergence 'just in case', because it's not me who will feel the pain
of my decisions, it's my customers. Netengs, and particularly infosec
people, are quite often unnecessarily conservative in their policies
because they don't have skin in the game: they feel the upside, but
not the downside.

-- 
  ++ytti


Re: [j-nsp] ACL for lo0 template/example comprehensive list of 'things to think about'?

2024-04-28 Thread Saku Ytti via juniper-nsp
Some comments from a quick read of just the IPv4 part.

- I don't like the level of abstraction; it seems to just ensure no
one will bother reading it, and reuse of the filters and terms won't
happen anyhow. It feels like first learning an OO language and making
everything modular, adding overhead and abstraction for no value.
Instead of having a flat list, you have multiple filters in a list
(which is internally concatenated in SW into a single fat list anyhow,
so no HW benefit), and not just that, but the filters themselves refer
to other filters.

1) You should have two terms for TCP services like BGP, inbound and
outbound, instead of just allowing the far end to connect and handling
self-connect by flags. The latter allows the far end to hit any port
they want; even though such packets won't have the SYN bit, it's still
not safe. You could improve it by defining the DPORT in the connected
check as the ephemeral range the NOS uses (see the sketch after this
list).

2) OSPF can be TTL==1; not very important for security, though.

3) Traceroute and ping won't work if the router is the target DADDR
and TTL > 1.

4) Useless use of 'router-v4': if it hit lo0, it was for us. You'd
need something like this in the edge filter, not the lo0 filter. And
in the edge filter it's still broken, because this is all LANs, not
host /32s.

5) The use of 'port' in NTP and others allows the far end to hit any
port by setting the SPORT to ntp.

6) No dport in DNS. Every term should have a DPORT: if we are
connecting, it'll be the ephemeral range; otherwise the far end can
hit any dport by setting the sport.




Some of these mistakes are straight from the book, like the useless
level of abstraction without actual reuse and the insecure use of
'port'. But unlike the book, at least you have ultimate permit and
then ultimate deny, which is important.
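
For point 1, the shape I mean; a sketch with invented names and an
assumed ephemeral range (check what your NOS actually uses):

term bgp-in {
    from {
        source-prefix-list {
            ebgp-peers;
        }
        protocol tcp;
        destination-port bgp;
    }
    then accept;
}
/* replies to sessions we initiated: far end can only reach our ephemeral range */
term bgp-out-replies {
    from {
        source-prefix-list {
            ebgp-peers;
        }
        protocol tcp;
        source-port bgp;
        destination-port 49152-65535;
        tcp-established;
    }
    then accept;
}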


On Sun, 28 Apr 2024 at 12:21, Martin Tonusoo  wrote:
>
> Hi.
>
> > In practical life IOS-XR control-plane is better protected than JunOS,
> > as configuring JunOS securely is very involved, considering that MX
> > book gets it wrong, offering horrible lo0 filter as does Cymru, what
> > chance the rest of us have?
>
> I recently worked on a RE protection filter based on the examples
> given in the "Juniper MX Series" book:
> https://gist.github.com/tonusoo/efd9ab4fcf2bb5a45d34d5af5e3f3e0c
>
> It's a tight filter for a simple network, e.g MPLS is not in use and
> thus there are no filters for signaling protocols or MPLS LSP
> ping/traceroute, routing instances are not in use, authentication for
> VRRPv3 or OSPF is not in use, etc.
>
> Few differences compared to filters in the MX book:
>
> * "ttl-except 1" in "accept-icmp" filter was avoided by simply moving
> the traceroute related filters in front of "accept-icmp" filter
>
> * "discard-extension-headers" filter in the book discards certain Next
> Header values and allows the rest. I changed it in a way that only
> specified Next Header values are accepted and rest are discarded. Idea
> is to discard unneeded extension headers as early as possible.
>
> * in term "neighbor-discovery-accept" in filter "accept-icmp6-misc"
> only the packets with Hop Limit value of 255 should be accepted
>
> * the "accept-bgp-v6" filter or any other IPv6 related RE filter in
> the book does not allow the router to initiate BGP sessions with other
> routers. I added a term named "accept-established-bgp-v6" in filter
> "accept-established-v6" which addresses this issue.
>
> * for the sake of readability and simplicity, I used names instead of
> numbers if possible. For example "icmp-type router-solicit" instead of
> "icmp-type 133".
>
> * in all occurrences, if it was not possible to match on the source IP
> address, then I strictly policed the traffic
>
> * traffic from management networks is not sharing policers with
> traffic from untrusted networks
>
>
> The overall structure of the RE filters in "Juniper MX Series" book is
> in my opinion very good. List of small filters which accept specific
> traffic and finally discard all the rest.
>
> Reason for having separate v4 and v6 prefix-lists is a Junos property
> to ignore the prefix-list altogether if it's used in a family inet
> filter while the prefix-list contains only the inet6 networks. Same is
> true if the prefix-list is used in family inet6 filter and the
> prefix-list contains only inet networks. For example, if only IPv4
> name servers addresses are defined under [edit system name-server] and
> prefix-list with apply-path "system name-server <*>" is used as a
> source prefix-list in some family inet6 filter, then actually no
> source address related restrictions apply. This can be checked with
> "show filter index  program" on a PFE CLI.
>
>
> Martin



-- 
  ++ytti


Re: [j-nsp] BGP timer

2024-04-28 Thread Saku Ytti via juniper-nsp
On Sat, 27 Apr 2024 at 14:29, Rolf Hanßen via juniper-nsp
 wrote:

> at least for link flapping issues (but not other session flapping reasons) 
> you could set the hold-time:
> set interfaces xy hold-time up 30

Since Junos 14.1 it has caught up with Cisco and implemented
exponential back-off for interface damping. So you don't have to
impose a static penalty as above, but can penalise actually flapping
interfaces, instead of killing convergence on the first transition.

But indeed this doesn't really address what the OP is asking, and I
don't think, outside scripting, there is a direct solution to what the
OP wants. Clearly any vendor could implement exponential back-off
damping for any protocol which has up and down states, and they could
write the code once and reuse it for everything, so it's not a tall
order at all.

-- 
  ++ytti


Re: [j-nsp] L3VPNs and on-prem DDoS scrubbing architecture

2024-04-03 Thread Saku Ytti via juniper-nsp
This might be grounds for a feature request to Juniper, if there isn't
already some magic toggle to MakeItGo.

But yeah, the forwarding table looks suspect, as if it'll do a table
lookup, then fail to discover the more-specific host route and
discard, as the ARP entries are not copied. And yeah, Alexandre's
workaround seems like a cute way to force the host route into the VRF,
if provisioning-intensive.

I think two features would be nice to have:

a) one to copy the ARP/ND entries from inet to the VRF (if not already
possible)
b) one to assign a label to each ARP/ND host route, to avoid the
egress PE IP lookup (this labeled route would only be imported to the
interface facing the scrubber's clean side; the rest of the network
sees the unlabeled direct aggregate)

On Wed, 3 Apr 2024 at 17:04, Michael Hare  wrote:
>
> Saku, Mark-
>
> Thanks for the responses.  Unless I'm mistaken, short of specifying a 
> selective import policy, I think I'm already doing what Saku suggests, see 
> relevant config snippet below.  Our clean VRF is L3VPN-4205.  But after I saw 
> the lack of mac based next hops I started searching to see if there was a 
> protocol other than direct that I wasn't aware of.  I intend to take a look 
> at Alexandre's workaround to understand/test, just haven't gotten there yet.
>
> I was able to get FBF via dirtyVRF working quickly in the meantime while I 
> figure out how to salvage the longest-prefix approach.
>
> -Michael
>
> ==/==
>
> @ # show routing-options | display inheritance no-comments
> ...
> interface-routes {
> rib-group {
> inet rib-interface-routes-v4;
> inet6 rib-interface-routes-v6;
> }
> }
> rib-groups {
> rib-interface-routes-v4 {
> import-rib [ inet.0 L3VPN-4205.inet.0 ];
> }
> ...
> rib-interface-routes-v6 {
> import-rib [ inet6.0 L3VPN-4205.inet6.0 ];
>     }
> ...
> }
>
> > -Original Message-
> > From: juniper-nsp  On Behalf Of
> > Saku Ytti via juniper-nsp
> > Sent: Wednesday, April 3, 2024 1:58 AM
> > To: Mark Tinka 
> > Cc: juniper-nsp@puck.nether.net
> > Subject: Re: [j-nsp] L3VPNs and on-prem DDoS scrubbing architecture
> >
> > On Wed, 3 Apr 2024 at 09:45, Saku Ytti  wrote:
> >
> > > Actually I think I'm confused. I think it will just work. Because even
> > > as the EgressPE does IP lookup due to table-label, the IP lookup still
> > > points to egressMAC, instead looping back, because it's doing it in
> > > the CleanVRF.
> > > So I think it just works.
> >
> > > routing-options {
> > >   interface-routes {
> > > rib-groups {
> > >   cleanVRF {
> > > import-rib [ inet.0 cleanVRF.inet.0 ];
> > > import-policy cleanVRF:EXPORT;
> > >  
> >
> > This isn't exactly correct. You need to put the cleanVRF rib-group in
> > interface-routes and close it.
> >
> > Anyhow I'm 90% sure this will just work and pretty sure I've done it.
> > The confusion I had was about the scrubbing route that on the
> > clean-side is already host/32. For this, I can't figure out a cleanVRF
> > solution, but a BGP-LU solution exists even for this problem.
> >
> >
> > --
> >   ++ytti



-- 
  ++ytti


Re: [j-nsp] L3VPNs and on-prem DDoS scrubbing architecture

2024-04-03 Thread Saku Ytti via juniper-nsp
On Wed, 3 Apr 2024 at 09:45, Saku Ytti  wrote:

> Actually I think I'm confused. I think it will just work. Because even
> as the EgressPE does IP lookup due to table-label, the IP lookup still
> points to egressMAC, instead looping back, because it's doing it in
> the CleanVRF.
> So I think it just works.

> routing-options {
>   interface-routes {
> rib-groups {
>   cleanVRF {
> import-rib [ inet.0 cleanVRF.inet.0 ];
> import-policy cleanVRF:EXPORT;
>  

This isn't exactly correct. You need to put the cleanVRF rib-group in
interface-routes and close it.

Anyhow I'm 90% sure this will just work and pretty sure I've done it.
The confusion I had was about the scrubbing route that on the
clean-side is already host/32. For this, I can't figure out a cleanVRF
solution, but a BGP-LU solution exists even for this problem.
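
Spelled out, the corrected sketch would be something like (policy name
as in the earlier mail):

routing-options {
    interface-routes {
        rib-group inet cleanVRF;
    }
    rib-groups {
        cleanVRF {
            import-rib [ inet.0 cleanVRF.inet.0 ];
            import-policy cleanVRF:EXPORT;
        }
    }
}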


-- 
  ++ytti


Re: [j-nsp] L3VPNs and on-prem DDoS scrubbing architecture

2024-04-03 Thread Saku Ytti via juniper-nsp
On Wed, 3 Apr 2024 at 09:37, Mark Tinka via juniper-nsp
 wrote:

> At old job, we managed to do this with a virtual-router VRF that carried
> traffic between the scrubbing PE and the egress PE via MPLS, to avoid
> the IP loop.

Actually I think I'm confused. I think it will just work. Because even
as the EgressPE does IP lookup due to table-label, the IP lookup still
points to egressMAC, instead looping back, because it's doing it in
the CleanVRF.
So I think it just works.

So the OP just needs to copy the direct route as-is, not as host /32s,
into cleanVRF, with something like this:

routing-options {
  interface-routes {
rib-groups {
  cleanVRF {
import-rib [ inet.0 cleanVRF.inet.0 ];
import-policy cleanVRF:EXPORT;
 

Now cleanVRF.inet.0 has the connected TableLabel, and as lookup is
done in the cleanVRF, without the Scrubber/32 route, it'll be sent to
the correct egress CE, despite doing egress IP lookup.
-- 
  ++ytti


Re: [j-nsp] L3VPNs and on-prem DDoS scrubbing architecture

2024-04-03 Thread Saku Ytti via juniper-nsp
On Tue, 2 Apr 2024 at 18:25, Michael Hare via juniper-nsp
 wrote:

> We're a US research and education ISP and we've been tasked for coming up 
> with an architecture to allow on premise DDoS scrubbing with an appliance.   
> As a first pass I've created an cleanL3VPN routing-instance to function as a 
> clean VRF that uses rib-groups to mirror the relevant parts of inet.0.   It 
> is in production and is working great for customer learned BGP routes.  It 
> falls apart when I try to protect a directly attached destination that has a 
> mac address in inet.0.  I think I understand why and the purpose of this 
> message is to see if anyone has been in a similar situation and has 
> thoughts/advice/warnings about alternative designs.
>
> To explain what I see, I noticed that mac address based nexthops don't seem 
> to be copied from inet.0 into cleanL3VPN.inet.0.  I assume this means that 
> mac-address based forwarding must be referencing inet.0 [see far below].   
> This obviously creates a loop once the best path in inet.0 becomes a BGP /32. 
>  For example when I'm announcing a /32 for 1.2.3.4 out of a locally attached 
> 1.2.3.0/26, traceroute implies the packet enters inet.0, is sent to 5.6.7.8 
> as the nexthop correctly, arrives in cleanL3VPN which decides to forward to 
> 5.6.7.8 in a loop, even though the BGP /32 isn't part of cleanL3VPN [see 
> below], cleanL3VPN Is dependent on inet.0 for resolution.  Even if I could 
> copy inet.0 mac addresses into cleanL3VPN, eventually the mac address would 
> age out of inet.0 because the /32 would no longer be directly connected.  If 
> I want to be able to protect locally attached destinations so I think my 
> design is unworkable, I think my solutions are

If I understand you correctly, the problem is not that you can't copy
direct into CleanVRF; the problem is that the ScrubberPE doing the
clean lookup in CleanVRF has a label stack of [EgressPE TableLabel]
instead of [EgressPE EgressCE]. This causes the EgressPE to do an IP
lookup, which will then see the Direct/32 advertised by the scrubber,
causing a loop. What you want is an end-to-end MPLS lookup, so that
the egressPE MPLS lookup yields the egressMAC.

I believe you could fix this with BGP-LU, without actually paying for
duplicate RIB/FIB and without opportunistically copying routes to
CleanVRF; every prefix would be scrubbable by default. You'd have
per-CE labels for the rest, but per-prefix labels for connected
routes. I believe you would then have an [EgressPE EgressMAC_CE] label
for connected routes, so each host route would have its own label,
allowing MAC rewrite without an additional local IP lookup.

I'm not sure if this is the only way, and I'm not sure if there would
be a way in CleanVRF to force each direct/32 to have a label as well,
avoiding the egress IP lookup loops. One doesn't immediately spring to
mind, but technically an implementation could certainly allow such a
mode.

-- 
  ++ytti


Re: [j-nsp] BGP route announcements and Blackholes

2024-03-19 Thread Saku Ytti via juniper-nsp
On Tue, 19 Mar 2024 at 19:44, Lee Starnes via juniper-nsp
 wrote:

> The blackhole peer does receive the /32 announcement, but the aggregate
> route also becomes discarded and thus routes to the other peers stop
> working.

I couldn't follow this, and the output you shared didn't support it,
so it is not clear to me what the actual problem is.

Of course if you want a blackhole, you want an internal blackhole too,
so internally you are going to add some route to discard;
this is then the route you'd leak to your upstream.

How this would impact the next-hop type or readvertisability of the
aggregate is unclear to me, unless you're blackholing the next-hop of
some route.




-- 
  ++ytti


Re: [j-nsp] MX204 OSPF default route injection

2024-03-06 Thread Saku Ytti via juniper-nsp
On Thu, 7 Mar 2024 at 03:08, Lee Starnes via juniper-nsp
 wrote:

> Any tips or help on the best practice implementation would be greatly
> appreciated.

While what you want is obviously possible to accomplish, is it
something you actually need? I don't personally see any need to ever
carry a default route in dynamic routing protocols; for static routes
there are obviously use cases.

Why not have a floating static default pointing to a dynamic recursive
next-hop at the CE? This solves quite a few problems in a rather
elegant way. One example: if you generate a static route at the edge,
you have no idea about the quality of the route; the edge device may
be entirely isolated and just blackholing all traffic. Whereas if your
candidate route is originated only from the backbone devices' anycast
loopback, and the edge device is simply passing that host route
towards the CEs, the CE will recurse its default route to whichever
edge device it happens to be connected to that is passing the default
route along.
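
On the CE, a minimal sketch of what I mean (addresses invented; the
'resolve' knob lets the next-hop recurse through the dynamically
learned host route):

routing-options {
    static {
        route 0.0.0.0/0 {
            next-hop 192.0.2.1;   /* backbone anycast loopback */
            resolve;              /* allow recursive resolution */
            preference 250;       /* floating: loses to any dynamic default */
        }
    }
}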

There are other examples of problems this addresses, discussed in this
and other lists in previous years.

-- 
  ++ytti


Re: [j-nsp] [c-nsp] Strange issue on 100 Gbs interconnection Juniper - Cisco

2024-02-12 Thread Saku Ytti via juniper-nsp
On Mon, 12 Feb 2024 at 09:44, james list  wrote:

> I'd like to test with LACP slow, then can see if physical interface still 
> flaps...

I don't think that's a good idea; what would we learn? Would we have
to wait 30 times longer, so one to three months, to hit whatever it
is, before we have confidence?

I would suggest:
 - turn on debugging, to see Cisco emitting LACP PDUs and Juniper
receiving LACP PDUs
 - do a packet capture if at all reasonable, ideally a tap, but in the
absence of a tap, a mirror (see the sketch after this list)
 - turn off LACP distributed handling on Junos
 - ping on the link, ideally at 0.2-0.5s interval, to record how ping
stops in relation to the first syslog emitted about LACP going down
 - wait for 4 days
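
For the capture on the Junos side, something like this should show
LACP PDUs reaching the control plane (a sketch; 0x8809 is the
slow-protocols ethertype):

monitor traffic interface et-0/1/5 no-resolve matching "ether proto 0x8809"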


-- 
  ++ytti


Re: [j-nsp] [c-nsp] Strange issue on 100 Gbs interconnection Juniper - Cisco

2024-02-11 Thread Saku Ytti via juniper-nsp
On Sun, 11 Feb 2024 at 17:52, james list  wrote:

> - why physical interface flaps in DC1 if it is related to lacp ?

16:39:35.813 Juniper reports LACP timeout (so the problem started at
16:39:32; was traffic passing at 32, 33, 34 seconds?)
16:39:36.xxx Cisco reports interface down, long after the problem has
already started

Why Cisco reports the physical interface down, I'm not sure. But
clearly the problem was already happening before the interface went
down, and the first log entry is the LACP timeout, which occurs 3s
after the problem starts. Perhaps Juniper asserts RFI for some reason?
Perhaps Cisco resets the physical interface once it is removed from
the LACP bundle?

> - why the same setup in DC2 do not report issues ?

If this is an LACP-related software issue, it could be a difference
not yet identified. You need to gather more information, like how ping
looks throughout this event, particularly before the syslog entries.
If ping still works up until the syslog, you almost certainly have a
software issue with LACP inject at Cisco, or more likely LACP punt at
Juniper.

-- 
  ++ytti


Re: [j-nsp] [c-nsp] Strange issue on 100 Gbs interconnection Juniper - Cisco

2024-02-11 Thread Saku Ytti via juniper-nsp
On Sun, 11 Feb 2024 at 15:24, james list  wrote:

> While on Juniper when the issue happens I always see:
>
> show log messages | last 440 | match LACPD_TIMEOUT
> Jan 25 21:32:27.948 2024  MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp 
> current while timer expired current Receive State: CURRENT

> Feb  9 16:39:35.813 2024  MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp 
> current while timer expired current Receive State: CURRENT

OK, so the problem always starts with Juniper seeing 3 seconds without
an LACP PDU, i.e. missing 3 consecutive LACP PDUs. It would be good to
ping while this problem is happening, to see whether ping stops 3s
before the syslog lines or at the same time as the syslog lines.
If ping stops 3s before, it's a link problem from Cisco to Juniper.
If ping stops at syslog time (my guess), it's a software problem.

There is unfortunately a lot of bug surface here, both on the inject
and on the punt path. You could be hitting PR1541056 on the Juniper
end. You could test for this by removing distributed LACP handling
with 'set routing-options ppm no-delegate-processing'.
You could also do a packet capture for LACP on both ends, to try to
see whether LACP was sent by Cisco and received by the capture, but
not by the system.


-- 
  ++ytti


Re: [j-nsp] [c-nsp] Strange issue on 100 Gbs interconnection Juniper - Cisco

2024-02-11 Thread Saku Ytti via juniper-nsp
Hey James,

You shared this off-list, I think it's sufficiently material to share.

2024 Feb  9 16:39:36 NEXUS1
%ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface
port-channel101 is down (No operational members)
2024 Feb  9 16:39:36 NEXUS1 %ETH_PORT_CHANNEL-5-PORT_DOWN:
port-channel101: Ethernet1/44 is down
Feb  9 16:39:35.813 2024  MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5:
lacp current while timer expired current Receive State: CURRENT
Feb  9 16:39:35.813 2024  MX1 lacpd[31632]: LACP_INTF_DOWN: ae49:
Interface marked down due to lacp timeout on member et-0/1/5

We can't know the order of events here, due to no subsecond precision
being enabled on the Cisco end.

But if the failure had started from the interface going down, it would
take 3 seconds for Juniper to realise the LACP failure. However, we
can see that it happens in less than 1s, so we can determine the
interface was not down first; the first problem was Juniper not
receiving 3 consecutive LACP PDUs, 1s apart, prior to noticing any
interface-state-related problems.

Is this always the order of events? Does it always happen with Juniper
noticing problems receiving LACP PDU first?


On Sun, 11 Feb 2024 at 14:55, james list via juniper-nsp
 wrote:
>
> Hi
>
> 1) cable has been replaced with a brand new one, they said that to check an
> MPO 100 Gbs cable is not that easy
>
> 3) no errors reported on both side
>
> 2) here the output of cisco and juniper
>
> NEXUS1# sh interface eth1/44 transceiver details
> Ethernet1/44
> transceiver is present
> type is QSFP-100G-SR4
> name is CISCO-INNOLIGHT
> part number is TR-FC85S-NC3
> revision is 2C
> serial number is INL27050TVT
> nominal bitrate is 25500 MBit/sec
> Link length supported for 50/125um OM3 fiber is 70 m
> cisco id is 17
> cisco extended id number is 220
> cisco part number is 10-3142-03
> cisco product id is QSFP-100G-SR4-S
> cisco version id is V03
>
> Lane Number:1 Network Lane
>   SFP Detail Diagnostics Information (internal calibration)
>
>   Measurement    Current     Alarm High   Alarm Low    Warn High   Warn Low
>   Temperature    30.51 C     75.00 C      -5.00 C      70.00 C     0.00 C
>   Voltage        3.28 V      3.63 V       2.97 V       3.46 V      3.13 V
>   Current        6.40 mA     12.45 mA     3.25 mA      12.45 mA    3.25 mA
>   Tx Power       0.98 dBm    5.39 dBm     -12.44 dBm   2.39 dBm    -8.41 dBm
>   Rx Power       -1.60 dBm   5.39 dBm     -14.31 dBm   2.39 dBm    -10.31 dBm
>   Transmit Fault Count = 0
>   Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
>
> Lane Number:2 Network Lane
>   SFP Detail Diagnostics Information (internal calibration)
>
>   Measurement    Current     Alarm High   Alarm Low    Warn High   Warn Low
>   Temperature    30.51 C     75.00 C      -5.00 C      70.00 C     0.00 C
>   Voltage        3.28 V      3.63 V       2.97 V       3.46 V      3.13 V
>   Current        6.40 mA     12.45 mA     3.25 mA      12.45 mA    3.25 mA
>   Tx Power       0.62 dBm    5.39 dBm     -12.44 dBm   2.39 dBm    -8.41 dBm
>   Rx Power       -1.18 dBm   5.39 dBm     -14.31 dBm   2.39 dBm    -10.31 dBm
>   Transmit Fault Count = 0
>   Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
>
> Lane Number:3 Network Lane
>   SFP Detail Diagnostics Information (internal calibration)
>
>   Measurement    Current     Alarm High   Alarm Low    Warn High   Warn Low
>   Temperature    30.51 C     75.00 C      -5.00 C      70.00 C     0.00 C
>   Voltage        3.28 V      3.63 V       2.97 V       3.46 V      3.13 V
>   Current        6.40 mA     12.45 mA     3.25 mA      12.45 mA    3.25 mA
>   Tx Power       0.87 dBm    5.39 dBm     -12.44 dBm   2.39 dBm    -8.41 dBm
>   Rx Power       0.01 dBm    5.39 dBm     -14.31 dBm   2.39 dBm    -10.31 dBm
>   Transmit Fault Count = 0
>   Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
>
> Lane Number:4 Network Lane
>SFP Detail Diagnostics 

Re: [j-nsp] [c-nsp] Strange issue on 100 Gbs interconnection Juniper - Cisco

2024-02-11 Thread Saku Ytti via juniper-nsp
I want to clarify: I meant this in the context of the original question.

That is, if you have a BGP-specific problem and no FCS errors, then
you can't have link problems.

But in this case the problem is not BGP-specific; in fact it has
nothing to do with BGP, since the problem begins with an observed link
flap.

On Sun, 11 Feb 2024 at 14:14, Saku Ytti  wrote:
>
> I don't think any of these matter. You'd see FCS failure on any
> link-related issue causing the BGP packet to drop.
>
> If you're not seeing FCS failures, you can ignore all link related
> problems in this case.
>
>
> On Sun, 11 Feb 2024 at 14:13, Havard Eidnes via juniper-nsp
>  wrote:
> >
> > > DC technicians states cable are the same in both DCs and
> > > direct, no patch panel
> >
> > Things I would look at:
> >
> >  * Has all the connectors been verified clean via microscope?
> >
> >  * Optical levels relative to threshold values (may relate to the
> >first).
> >
> >  * Any end seeing any input errors?  (May relate to the above
> >two.)  On the Juniper you can see some of this via PCS
> >("Physical Coding Sublayer") unexpected events independently
> >of whether you have payload traffic, not sure you can do the
> >same on the Nexus boxes.
> >
> > Regards,
> >
> > - Håvard
>
>
>
> --
>   ++ytti



-- 
  ++ytti


Re: [j-nsp] [c-nsp] Strange issue on 100 Gbs interconnection Juniper - Cisco

2024-02-11 Thread Saku Ytti via juniper-nsp
I don't think any of these matter. You'd see FCS failures on any
link-related issue causing the BGP packets to drop.

If you're not seeing FCS failures, you can ignore all link-related
problems in this case.


On Sun, 11 Feb 2024 at 14:13, Havard Eidnes via juniper-nsp
 wrote:
>
> > DC technicians states cable are the same in both DCs and
> > direct, no patch panel
>
> Things I would look at:
>
>  * Has all the connectors been verified clean via microscope?
>
>  * Optical levels relative to threshold values (may relate to the
>first).
>
>  * Any end seeing any input errors?  (May relate to the above
>two.)  On the Juniper you can see some of this via PCS
>("Physical Coding Sublayer") unexpected events independently
>of whether you have payload traffic, not sure you can do the
>same on the Nexus boxes.
>
> Regards,
>
> - Håvard



-- 
  ++ytti


Re: [j-nsp] [c-nsp] Strange issue on 100 Gbs interconnection Juniper - Cisco

2024-02-11 Thread Saku Ytti via juniper-nsp
On Sun, 11 Feb 2024 at 13:51, james list via juniper-nsp
 wrote:

> One thing I've omitted to say is that BGP runs over a LACP bundle with
> currently just one 100 Gbs interface.
>
> I see that the issue is triggered on Cisco when eth interface seems to go
> in Initializing state:

OK, so we can forget BGP entirely and focus on why LACP is going down.

Is the LACP bundle a single port, eth1/44?

When LACP fails, does the Juniper end emit any syslog? Does Juniper
see the interface facing eth1/44 flapping?
--
  ++ytti


Re: [j-nsp] Strange issue on 100 Gbs interconnection Juniper - Cisco

2024-02-11 Thread Saku Ytti via juniper-nsp
Open JTAC and CTAC cases.

The amount of information provided is wildly insufficient.

'BGP flaps': what does that mean? Is it always the same direction? If
so, which direction thinks it's not seeing keepalives? Do you also
observe loss in 'ping' between the links during the period?

Purely stabbing in the dark, I'd say you always observe it in a single
direction, because in that direction you are reliably losing every nth
keepalive, and statistically it takes 1-3 days to lose 3 in a row with
the probability you're seeing. Now why exactly that is: is one end not
sending to the wire, or is one end not receiving from the wire? Again
stabbing in the dark, it is more likely that the problem is in the
punt path than the inject path, so I would focus my investigation on
the party tearing down the session due to lack of keepalives, on the
thesis that this device has a problem in its punt path and is for some
reason dropping BGP packets from the wire at a reliable probability.

On Sun, 11 Feb 2024 at 12:09, james list via juniper-nsp
 wrote:
>
> Dear experts
> we have a couple of BGP peers over a 100 Gbs interconnection between
> Juniper (MX10003) and Cisco (Nexus N9K-C9364C) in two different datacenters
> like this:
>
> DC1
> MX1 -- bgp -- NEXUS1
> MX2 -- bgp -- NEXUS2
>
> DC2
> MX3 -- bgp -- NEXUS3
> MX4 -- bgp -- NEXUS4
>
> The issue we see is that sporadically (i.e. every 1 to 3 days) we notice BGP
> flaps only in DC1 on both interconnections (not at the same time); there is
> still no traffic, since once we noticed the flaps we blocked deployment to
> production.
>
> We've already changed SFPs (we moved the ones from DC2 to DC1 and viceversa)
> and cables on both the interconnections at DC1 without any solution.
>
> SFP we use in both DCs:
>
> Juniper - QSFP-100G-SR4-T2
> Cisco - QSFP-100G-SR4
>
> over MPO cable OM4.
>
> Distance is 70 m in DC1 and 80 m in DC2, hence it is shorter where we see the issue.
>
> Any idea or suggestion what to check or to do ?
>
> Thanks in advance
> Cheers
> James



-- 
  ++ytti


Re: [j-nsp] Hardware configuration for cRPD as RR

2024-02-09 Thread Saku Ytti via juniper-nsp
On Fri, 9 Feb 2024 at 17:50, Tom Beecher  wrote:

> Completely fair, yes. My comments were mostly aimed at a vMX/cRPD comparison. 
> I probably wasn't clear about that. Completely agree that it doesn't make 
> much sense to move from an existing vRR to cRPD just because. For a 
> greenfield thing I'd certainly lean cRPD over VRR at least in planning. Newer 
> cRPD has definitely come a long way relative to older. ( Although I haven't 
> had reason or cycles to really ride it hard and see where I can break it 
> yet. :) )

Agreed on greenfield going straight to cRPD today, with fallback to
vRR if needed. Just because it is clear that the vendor focus is there
and they want to see you there.

-- 
  ++ytti


Re: [j-nsp] Hardware configuration for cRPD as RR

2024-02-09 Thread Saku Ytti via juniper-nsp
On Thu, 8 Feb 2024 at 17:11, Tom Beecher via juniper-nsp
 wrote:

> For any use cases that you want protocol interaction, but not substantive
> traffic forwarding capabilities , cRPD is by far the better option.

No one is saying that cRPD isn't the future, just that there are a lot
of existing deployments with vRR, which are run with some success, and
the entire stability of the network depends on them. Whereas cRPD is a
newer entrant, and back when I tested it early on, it was very feature
incomplete in comparison.
So for those who are already running vRR and are happy with it,
changing to cRPD just to change to cRPD is simply bad risk. Many of us
don't care about DRAM or vCPU, because you only need a small number of
RRs, and DRAM/vCPU grow on trees. But we live in constant fear of the
entire RR setup blowing up, so the motivation for change needs to be
solid and ideally backed by examples of success in a similar role in
your circle of people.


-- 
  ++ytti


Re: [j-nsp] MX204 and IPv6 BGP announcements

2024-02-08 Thread Saku Ytti via juniper-nsp
On Thu, 8 Feb 2024 at 16:07, Mark Tinka via juniper-nsp
 wrote:

> So internally, if it attracts any traffic for non-specific destinations,
> does Junos send it /dev/null in hardware? I'd guess so...

In the absence of more specifics, Junos by default doesn't discard but
rejects: there is essentially an implied 0/0 static route pointing to
a reject adjacency. This can be changed to discard, or you can just
nail down a default discard.
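
Nailing it down is just (a sketch):

routing-options {
    static {
        route 0.0.0.0/0 discard;
    }
    rib inet6.0 {
        static {
            route ::/0 discard;
        }
    }
}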


-- 
  ++ytti


Re: [j-nsp] Hardware configuration for cRPD as RR

2024-02-08 Thread Saku Ytti via juniper-nsp
On Thu, 8 Feb 2024 at 10:16, Mark Tinka  wrote:

> Is the MX150 still a current product? My understanding is it's an x86 
> platform running vMX.

No longer orderable.

-- 
  ++ytti


Re: [j-nsp] Hardware configuration for cRPD as RR

2024-02-07 Thread Saku Ytti via juniper-nsp
On Thu, 8 Feb 2024 at 09:51, Roger Wiklund via juniper-nsp
 wrote:


> I'm curious, when moving from vRR to cRPD, how do you plan to manage/setup
> the infrastructure that cRPD runs on?

Same concerns. I would just push it back and be a late adopter: rock
the existing vRR while it's supported, don't pre-empt into cRPD
because the vendor says that's the future. Let someone else work with
the vendor to ensure feature parity, and indeed perhaps get some
appliance from the vendor.

With HPE, I feel like there is a lot more incentive to sell integrated
appliances to you than before.



-- 
  ++ytti


Re: [j-nsp] Hardware configuration for cRPD as RR

2024-02-06 Thread Saku Ytti via juniper-nsp
On Tue, 6 Feb 2024 at 18:35, Mark Tinka  wrote:

> IME, when we got all available paths, ORR was irrelevant.
>
> But yes, at the cost of some control plane resources.

Not just opinion, fact. If you see everything, ORR does nothing but add cost.

You only need AddPath and ORR when seeing everything is too expensive,
but you still need good choices.

But even if you have the resources to see everything, you may not
actually want a lot of useless signalling and overhead, as it'll add
convergence time and risk encouraging rare bugs to surface. In the
case where I deployed it, having everything was not realistically
possible, in that having it all would mean the network upgrade cycle
is determined by when enough peers are added, RIB scale demanding a
full upgrade cycle despite not having sold the ports already paid for.
You shouldn't need to upgrade your boxes because your RIB/FIB doesn't
scale; you should only need to upgrade your boxes if you don't have
holes left to stick paying fiber into.


-- 
  ++ytti


[j-nsp] Thanks for all the fish

2024-01-09 Thread Saku Ytti via juniper-nsp
What do we think of HPE acquiring JNPR?


I guess it was a given that something's gotta give; JNPR has lost to
the dollar as an investment for more than 2 decades, which is not
sustainable in the way we model our economy.
Out of all possible outcomes:
   - JNPR suddenly starts to grow (how?)
   - JNPR defaults
   - JNPR gets acquired

It's not the worst outcome, and of those who might acquire them, HPE
isn't the worst option, nor the best. I guess the best option would
have been several large telcos buying it through a co-owned sister
company, which would then be less interested in profits and more
interested in having a device that works for them. The worst would
probably have been Cisco, Nokia or Huawei.

I think the main concern is that the SP business is a kinda shitty
business: long sales cycles, low sales volumes, high requirements. But
that's also the side of JNPR that has a USP.

What is the future of the NPU (Trio) and the pipeline chips
(Paradise/Triton)? Why would I, as an HPE exec, keep them alive? I
need JNPR to put QFX in my DC RFPs, I don't really care about SP
markets, and I can realise some savings by axing chip design and
support. I think Trio is the best NPU on the market, and I think we
run a real risk of losing it, with no mechanism that would guarantee
new players surfacing to replace it.

I do wish that JNPR had been more serious about how unsustainable it
is to lose to the dollar, and had tried harder to capture markets. I
always suggested: why not try a Trio PCI card on Newegg? The long tail
is long; maybe if you could buy it for 2-3k, there would be a new
market of Linux PCI users who want wire-rate programmable features for
multiple ports? Maybe ESXi server integration for various pre-VPC
protection features at wire rate? I think there might be a lot of
potential in NPU-PCI, perhaps even FAB-PCI, to have more ports than a
single NPU-PCI.

-- 
  ++ytti


Re: [j-nsp] Hardware configuration for cRPD as RR

2023-12-08 Thread Saku Ytti via juniper-nsp
I tried to advocate for both, sorry if I was unclear.

ORR for good options, add-path for redundancy and/or ECMPability.

On Fri, 8 Dec 2023 at 19:13, Thomas Scott  wrote:
>
> Why not both add-path + ORR?
> --
>
> Thomas Scott
> Sr. Network Engineer
> +1-480-241-7422
> tsc...@digitalocean.com
>
>
> On Fri, Dec 8, 2023 at 11:57 AM Saku Ytti via juniper-nsp 
>  wrote:
>>
>> On Fri, 8 Dec 2023 at 18:42, Vincent Bernat via juniper-nsp
>>  wrote:
>>
>> > On 2023-12-07 15:21, Michael Hare via juniper-nsp wrote:
>> > > I recognize Saku's recommendation of rib sharding is a practical one at 
>> > > 20M routes, I'm curious if anyone is willing to admit to using it in 
>> > > production and on what version of JunOS.  I admit to have not played 
>> > > with this in the lab yet, we are much smaller [3.5M RIB] worst case at 
>> > > this point.
>> >
>> > About the scale, I said routes, but they are paths. We plan to use add
>> > path to ensure optimal routing (ORR could be another option, but it is
>> > less common).
>>
>> Given a sufficient count of path options, they're not really
>> alternatives, but you need both. Like you can't do add-path , as
>> the clients won't scale. And you probably don't want only ORR, because
>> of the convergence cost of clients not having a backup option or the
>> lack of ECMP opportunity.
>>
>> --
>>   ++ytti



-- 
  ++ytti


Re: [j-nsp] Hardware configuration for cRPD as RR

2023-12-08 Thread Saku Ytti via juniper-nsp
On Fri, 8 Dec 2023 at 18:42, Vincent Bernat via juniper-nsp
 wrote:

> On 2023-12-07 15:21, Michael Hare via juniper-nsp wrote:
> > I recognize Saku's recommendation of rib sharding is a practical one at 20M 
> > routes, I'm curious if anyone is willing to admit to using it in production 
> > and on what version of JunOS.  I admit to have not played with this in the 
> > lab yet, we are much smaller [3.5M RIB] worst case at this point.
>
> About the scale, I said routes, but they are paths. We plan to use add
> path to ensure optimal routing (ORR could be another option, but it is
> less common).

Given a sufficient count of path options, they're not really
alternatives, but you need both. Like you can't do add-path , as
the clients won't scale. And you probably don't want only ORR, because
of the convergence cost of clients not having a backup option or the
lack of ECMP opportunity.

-- 
  ++ytti


Re: [j-nsp] Hardware configuration for cRPD as RR

2023-12-07 Thread Saku Ytti via juniper-nsp
On Thu, 7 Dec 2023 at 16:22, Michael Hare via juniper-nsp
 wrote:

> I recognize Saku's recommendation of rib sharding is a practical one at 20M 
> routes, I'm curious if anyone is willing to admit to using it in production 
> and on what version of JunOS.  I admit to have not played with this in the 
> lab yet, we are much smaller [3.5M RIB] worst case at this point.

2914 uses it, not out of desire (too new, too rare), but out of
necessity at the scale 2914 needs. It is surprisingly mature/robust
for what it is, given how rare it is for routing suites to support any
type of multithreading.

Of course the design is a relatively conservative and clever
compromise between building a truly multithreaded routing suite and
delivering something practical on a legacy codebase. It wouldn't help
with every RIB, but probably helps with every practical RIB. If you
have a low amount of duplicate RIB entries it might not be very
useful, as the final collation of unique entries will be more or less
single-threaded anyhow. But I believe anyone with a truly large RIB,
like 20M, will have massive duplication and will see significant
benefit.

-- 
  ++ytti


Re: [j-nsp] Hardware configuration for cRPD as RR

2023-12-06 Thread Saku Ytti via juniper-nsp
From an RPD, not cRPD, perspective:

- 64GB is certainly fine; you might be able to do with 32GB
- Unless the RRs are physically next to the clients, you want to bump
the default 16kB TCP window to the maximum 64kB window, and probably
ask your account team for window scaling support (unsure if this is
true for cRPD, or if cRPD lets the underlying kernel do this right,
but you need to do the same on the client end anyhow)
- You absolutely need sharding to put work on more than 1 core.

Sharding goes up to 31, but 31 is very likely too much, and the
overhead of sharding will make it slower than running lower counts
like 4-8. Your core count likely shouldn't be higher than shards+1.

The sharding count and DRAM amount are not specifically answerable, as
they depend on the contents of the RIB. Do a binary search on both and
measure convergence time to find a good-enough number; I think 64/32GB
and 4-8 cores are likely good picks.
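
For reference, the knobs I mean look roughly like this on recent
Junos; syntax from memory, verify against your release (the shard and
thread counts are the tunables discussed above):

set system processes routing bgp rib-sharding number-of-shards 4
set system processes routing bgp update-threading number-of-threads 5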

On Wed, 6 Dec 2023 at 22:30, Thomas Scott via juniper-nsp
 wrote:
>
> Also very curious in this regard.
>
> Best Regards,
> -Thomas Scott
>
>
> On Wed, Dec 6, 2023 at 12:58 PM Vincent Bernat via juniper-nsp <
> juniper-nsp@puck.nether.net> wrote:
>
> > Hey!
> >
> > cRPD documentation is quite terse about resource requirements:
> >
> > https://www.juniper.net/documentation/us/en/software/crpd/crpd-deployment/topics/concept/crpd-hardware-requirements.html
> >
> > When used as a route reflector with about 20 million routes, what kind
> > of hardware should we use? Documentation says about 64 GB of memory, but
> > for everything else? Notably, should we have many cores but lower boost
> > frequency, or not too many cores but higher boost frequency?
> >
> > There is a Day One book about cRPD, but they show a very outdated
> > processor (Sandy Lake, 10 years old).
> >
> > Is anyone using cRPD as RR with a similar scale and can share the
> > hardware configuration they use? Did you also optimize the underlying OS
> > in some way or just use a stock configuration?
> >
> > Thanks.



-- 
  ++ytti


Re: [j-nsp] backup routing engine authente from in-band interface

2023-11-09 Thread Saku Ytti via juniper-nsp
On Thu, 9 Nov 2023 at 10:38, Chen Jiang via juniper-nsp
 wrote:

> Just want to confirm if Juniper backup routing engine could authenticate
> users from in-band interface like ge-0/0/0 to the AAA server?
>
> If not, do we have a solution? The scenario is MX960 with dual RE and no
> OOB network. But need to authenticate users login backup RE from AAA.

No solution. Well, sort of a hacky solution, if you route the AAA
server statically over fxp/em. But generally speaking, hard no: only
local authentication on the backup RE.

But luckily they've fixed this awkward mismatch: on EVO there is no
remote authentication on either console at all. Another thing that
might surprise people is that the lo0 filter no longer applies to the
em/fxp ports on EVO.

Ideally we'd all be asking vendors to implement true lights-out
Ethernet ports with dedicated control planes, like Cisco's CMP, so we
could get rid of the problematic RS232 and the useless in-band MGMT
ports (em/fxp are actively dangerous).
  ++ytti


Re: [j-nsp] MX304 - Edge Router

2023-10-26 Thread Saku Ytti via juniper-nsp
On Thu, 26 Oct 2023 at 16:40, Mark Tinka via juniper-nsp
 wrote:

> I'd suggest staying very close to our SE's for the desired outcome we
> want for this development. As we have seen before, Juniper appear
> reasonably open to operator feedback, but we would need to give it to
> them to begin with.

I urge everyone to give them the same message as I've given.

Any type of license, even a timed license, must not cause an outage
after it expires. Enforcement would be 'call home' via an 'http(s)'
proxy, which reports the license-use data to Juniper sales, making it
a commercial problem between Juniper and you.
Proxy, so that you don't need Internet access on the device.
Potentially you could ask for encryption-less mode, if you want to log
on the proxy what is actually being sent to the vendor. I don't give
flying or any other method of locomotion fuck about leaking
information.

I believe this is a very reasonable give/take compromise which is
marketable, but if we start trying to punch holes through it with
esoteric concerns, we'll get boxes which die periodically because
someone forgot to re-up. That is a real future that may happen, unless
we demand it must not.

-- 
  ++ytti


Re: [j-nsp] MX304 - Edge Router

2023-10-26 Thread Saku Ytti via juniper-nsp
On Thu, 26 Oct 2023 at 07:45, Mark Tinka via juniper-nsp
 wrote:

> While there are some women who enjoy engineering, and some men who enjoy
> nursing, most women don't enjoy engineering, and most men don't enjoy
> nursing. I think we would move much farther ahead if we accepted this,

> If you look at the data, on average, 70% of new enrollments at
> university are women, and 60% of all graduands are women. And yet, 90%
> of all STEM students are men, while 80% of all psychology students are
> women. Perhaps there is a clue in there :-)...

Even if you believe/think this, it is not in your best interest to
communicate anything like it: there is nothing you can win, and there
is significant downside potential.

I believe the question is not about what the data says; the question
is why the data says that. And the thesis/belief is that the data
should not say that, that there is no fundamental reason why the data
would say so. The question is whether the culture reinforces this from
day 0, causing people to believe it is somehow inherent/natural.

From a scientific POV, we currently don't have any real reason to
believe there are unplastic differences in the brain from birth which
cause this. There might be, but science doesn't know that.
Scientifically we should today expect a very even distribution, unless
it is culturally biased.


But of course inequality and inequitability are everywhere; this is
not hyperbole, as you can't compare anything about how we choose who
does what and come up with anything that resembles a fair
distribution. Zip code has a lot of predictive power over where you'll
end up in life, and that is hardly your fault or merit. Top-level
managers are not just disproportionately men; they are
disproportionately men with +1.5SD height, and there is no scientific
reason to believe zip code or height suggests stronger ability.

It is just a really unfair world to live in, but luckily I am on the
beneficiary side of the unfairness, which I am strong enough to
accept.

I have a curious anecdote about discriminatory outcomes without any
active discrimination. I think it's easier to discuss as it doesn't
really involve any differences between the groups of people. In
Finland a minority natively speaks Swedish, the majority Finnish.
After 1000 years, the minority continues to statistically have better
education, live longer, and have more savings and higher salaries. The
only rationale I've come up with which could explain this is that the
Swedish-speaking minority choose other Swedish speakers as their
peers, so they feel a lower sense of accomplishment when performing at
the Finnish-speaker mean level, which pushes them a little further to
achieve the same satisfaction a Finnish speaker would feel at a lower
level of accomplishment. This causes the gap to perpetuate
indefinitely, despite all active discriminatory biases having been
'fixed' since forever. That is, if you ever create, through any
mechanism at all, some bias between groups, this bias will never
completely go away.


-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX304 - Edge Router

2023-10-25 Thread Saku Ytti via juniper-nsp
On Wed, 25 Oct 2023 at 15:26, Aaron1 via juniper-nsp
 wrote:

> Years ago I had to get a license to make my 10g interfaces work on my MX104

I think we need to be careful about what we are saying.

We can't reject licences outright; that's not a fair ask and it won't happen.

But we can reject licenses that expire in operation and cause an
outage. That, I think, is a very reasonable ask. I know that IOS XE
for example will do this: you run out of license and your box breaks.
I swapped out from CSR1k to ASR1k because I knew the organisation
would eventually fail to renew the license ahead of expiry.

I'm happy if the device calls home via an https proxy and reports my
license use, and the sales droid tells me I'm not compliant with the
terms. Making it a commercial problem is fine; making it an acute
technical problem is not.


In your specific case, the ports never worked, you had to procure a
license, and the license never dies. So from my POV, this is fine. And
being absolutist here will not help, as then you can't even achieve a
reasonable compromise.

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX304 - Edge Router

2023-10-25 Thread Saku Ytti via juniper-nsp
On Tue, 24 Oct 2023 at 22:21, Aaron Gould via juniper-nsp
 wrote:

> My MX304 trial license expired last night, after rebooting the MX304,
> various protocols no longer work.  This seems more than just
> honor-based... ospf, ldp, etc, no longer function.  This is new to me;
> that Juniper is making protocols and technologies tied to license.  I
> need to understand more about this, as I'm considering buying MX304's.

Juniper had assured me multiple times that they have strategically
decided to NEVER do this; that it's an actual decision they've
considered at the highest level, that they will not downgrade devices
in operation. I guess 'reboot' is not 'in operation'?

The notion that operators are able to keep licenses up to date and
valid is naive; we can't keep SSL certificates valid, and we've had
decades of time to learn. It won't happen. You will learn about the
problem when shit breaks.

The right solution would be a phone-home, and a vendor sales rep
calling you: 'hey, you have expired licenses, let's solve this'. Not
breaking the boxes. Or: 'your phone-home hasn't worked, you need to
fix it before we can re-up your support contract'.
-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Q. Is anyone deploying TCP Authentication Option (TCP-AO) on their BGP peering Sessions?

2023-09-26 Thread Saku Ytti via juniper-nsp
On Wed, 27 Sept 2023 at 03:50, Barry Greene via juniper-nsp
 wrote:

> Q. Is anyone deploying TCP Authentication Option (TCP-AO) on their BGP 
> peering Sessions?
>
> I’m not touching routers right now. I’m wondering if anyone has deployed, 
> your experiences, and thoughts?

For the longest time (close to a decade) no one supported it at all,
not even Juniper, because the Juniper implementation was pre-RFC and
incompatible with the RFC.

To my understanding there is support today in Junos, IOS-XE, IOS-XR,
SROS, EOS and VRP. I have no operational experience to share.

--
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Junos 21+ Killing Finger Muscle Memory...

2023-07-17 Thread Saku Ytti via juniper-nsp
On Sun, 16 Jul 2023 at 19:47, Tim Franklin via juniper-nsp
 wrote:


> You missed the fun part where you have to explain *again* every few
> months to the CISO and their minions why you can't adhere to the
> written-by/for-Windows-admins "Patching Policy" that says everything in
> the company is upgraded to "the latest release" within 14 days, no
> software version is ever "more than three months old", and similar
> messages of joy ;)

What is the explanation? Is it that NOSes are closed-source software
with proprietary or difficult-to-integrate hardware? And that revenue
comes from support contracts, which creates a moral hazard; that does
not suggest intent in the outcome, but it does suggest a bias in the
organic outcome towards bad software.

--
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX304 Port Layout

2023-07-05 Thread Saku Ytti via juniper-nsp
On Wed, 5 Jul 2023 at 04:45, Mark Tinka  wrote:

> This is one of the reasons I prefer to use Ethernet switches to
> interconnect devices in large data centre deployments.
>
> Connecting stuff directly into the core routers or directly together
> eats up a bunch of ports, without necessarily using all the available
> capacity.
>
> But to be fair, at the scale AWS run, I'm not exactly sure how I'd do
> things.

I'm sure it's perfectly reasonable, with some upsides and some
downsides compared to hiding the overhead ports inside a chassis
fabric instead of exposing them on the front-plate.

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX304 Port Layout

2023-07-04 Thread Saku Ytti via juniper-nsp
On Tue, 4 Jul 2023 at 08:34, Mark Tinka  wrote:

> Yes, I watched this NANOG session and was also quite surprised when they
> mentioned that they only plan for 25% usage of the deployed capacity.
> Are they giving themselves room to peak before they move to another chip
> (considering that they are likely in a never-ending installation/upgrade
> cycle), or trying to maintain line-rate across a vast number of packet
> sizes? Or both?

You must have misunderstood. When they fully scale the current design,
it offers 100T of capacity, but they've bought 400T of ports. 3/4 of
the ports are overhead needed to build the design, to connect the
pizzaboxes together. All ports are used, but only 1/4 are revenue.

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX304 Port Layout

2023-07-02 Thread Saku Ytti via juniper-nsp
On Sun, 2 Jul 2023 at 17:15, Mark Tinka  wrote:

> Technically, do we not think that an oversubscribed Juniper box with a
> single Trio 6 chip with no fabric is feasible? And is it not being built
> because Juniper don't want to cannibalize their other distributed
> compact boxes?
>
> The MX204, for example, is a single Trio 3 chip that is oversubscribed
> by an extra 240Gbps. So we know they can do it. The issue with the MX204
> is that most customers will run out of ports before they run out of
> bandwidth.

Not disagreeing here, but how do we define oversubscribed? Is every
box oversubscribed which can't do a) 100% at max packet size, b) 100%
at min packet size and c) 100% of packets to delay buffer? I think
this would be a quite reasonable definition, but as far as I know no
current device of non-modest scale satisfies all three; almost all of
them satisfy only a).

Let's consider first-gen Trio serdes:
1) 2/4 go to fabric (btree replication)
2) 1/4 goes to delay buffer
3) 1/4 goes to WAN ports
(and actually around another 0.2 goes to the lookup engine)

So you're selling less than 1/4th of the serdes you ship; more than
3/4 are 'overhead'. Compare to, say, Silicon1, which is partially
buffered, where they're selling almost 1/2 of the serdes they ship.
You could in theory put ports on all of these serdes in bps terms, but
not in pps terms, at least not with off-chip memory.

And in the pizza box case, you could sell those fabric ports, as
there is no fabric. So a given NPU always has ~2x the bps in pizza box
format (but usually no more pps). In the MX80/MX104 Juniper did just
this: they sell 80G of WAN ports, when in linecard mode the same Trio
is a 40G WAN-port device. I don't consider it oversubscribed, even
though the minimum packet size went up, because the lookup capacity
didn't increase.
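
To put rough numbers on the two paragraphs above, a back-of-envelope
sketch (mine; it only restates the 2/4 + 1/4 + 1/4 serdes split and
the ~0.2 lookup overhead quoted above):

# first-gen Trio serdes split from above
fabric, delay_buffer, wan, lookup = 0.50, 0.25, 0.25, 0.20
total = fabric + delay_buffer + wan + lookup

# linecard mode: only the WAN serdes are revenue
print(wan / total)             # ~0.21, i.e. 'less than 1/4th'

# pizzabox mode: no fabric, so the fabric serdes face WAN ports too,
# roughly doubling sellable bps (the PPS budget is unchanged)
print((wan + fabric) / total)  # ~0.62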

Curiously, AMZN told NANOG their ratio: when the design is fully
scaled to 100T, it is 1/4, i.e. 400T of bought ports for 100T of
useful ports. Unclear how long 100T was going to scale, but obviously
they wouldn't launch an architecture which needs to be redone next
year, so when they decided on the 100T cap for the scale, they didn't
have a 100T need yet. This design was with 112Gx128 chips, and the
boxes were single-chip, so all serdes connect to ports, no fabrics,
i.e. true pizzaboxes.
I found this very interesting, because the 100T design was, I think, 3
racks? And last year 50T ASICs shipped; next year we'd likely get 100T
ASICs (224Gx512? or 112Gx1024?). So even hyperscalers are growing
slower than silicon, and can soon basically put their dc-in-a-chip,
greatly reducing cost (both CAPEX and OPEX), as there is no need to
waste 3/4 of the investment on overhead.
The scale also surprised me, even though perhaps it should not have:
they quoted +1M network devices, and considering they quote +20M Nitro
systems shipped, that's like <20 revenue-generating compute nodes per
network device. Depending on the refresh cycle, this means Amazon is
buying 15-30k network devices per month, which I expect is
significantly more than Cisco+Juniper+Nokia ship combined to SP infra,
so no wonder SPs get little love.

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX304 Port Layout

2023-07-02 Thread Saku Ytti via juniper-nsp
On Sun, 2 Jul 2023 at 15:53, Mark Tinka via juniper-nsp
 wrote:

> Well, by your definition, the ASR9903, for example, is a distributed
> platform, which has a fabric ASIC via the RP, with 4x NPU's on the fixed
> line card, 2x NPU's on the 800Gbps PEC and 4x NPU's on the 2Tbps PEC.

Right, as is the MX304.

I don't think this is 'my definition'; everything was centralised
originally, until the Cisco 7500 came out, which had distributed
forwarding capabilities.

Now, does centralisation truly mean a BOM benefit to vendors? Probably
not, but it may allow addressing a lower-margin market which has lower
per-port performance needs, without cannibalising the higher-margin
market.



-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX304 Port Layout

2023-07-02 Thread Saku Ytti via juniper-nsp
On Sun, 2 Jul 2023 at 12:11, Mark Tinka  wrote:

> Well, for data centre aggregation, especially for 100Gbps transit ports
> to customers, centralized routers make sense (MX304, MX10003, ASR9903,
> e.t.c.). But those boxes don't make sense as Metro-E routers... they can
> aggregate Metro-E routers, but can't be Metro-E routers due to their cost.

In this context, these are all distributed platforms: they have
multiple NPUs and a fabric. A centralised platform has a single
forwarding chip, and significantly more ports than bandwidth.

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX304 Port Layout

2023-07-02 Thread Saku Ytti via juniper-nsp
On Sun, 2 Jul 2023 at 11:38, Mark Tinka  wrote:

> So all the above sounds to me like scenarios where Metro-E rings are
> built on 802.1Q/Q-in-Q/REP/STP/e.t.c., rather than IP/MPLS.

Yes. Satellite is basically VLAN aggregation, but a little bit less
broken. Both are much inferior to MPLS. But usually that's not the
comparison, due to real or perceived cost reasons. So in the absence
of a vendor selling you the front-plate you need, the option space
often considered is satellite or VLAN aggregation, instead of
connecting some smaller MPLS edge boxes to bigger MPLS aggregation
boxes, which would be, in my opinion, obviously better.

But as discussed, centralised chassis boxes are appearing as a new
option to the option space.

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX304 Port Layout

2023-06-28 Thread Saku Ytti via juniper-nsp
On Tue, 27 Jun 2023 at 19:47, Tarko Tikan via juniper-nsp
 wrote:


> Single NPU doesn't mean non-redundant - those devices run two (or 4 in
> ACX case) BCM NPUs and switch "linecards" over to backup NPU when
> required. All without true fabric and distributed NPUs to keep the cost
> down.

This of course makes it more redundant than a distributed box, because
distributed boxes don't have NPU redundancy.

Somewhat analogous to how an RR makes your network more redundant than
a full mesh: in a full mesh every iBGP flap is an outage, whereas in
RR a single iBGP flap has no impact. Of course the parallel extends to
the scope of the outage: in a full mesh losing a single iBGP session
isn't a big outage, while with an RR it's binary, either nothing is
broken or everything is.


-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX304 Port Layout

2023-06-28 Thread Saku Ytti via juniper-nsp
On Tue, 27 Jun 2023 at 19:32, Mark Tinka  wrote:

> > Yes.
>
> How?

Apart from obvious stuff like QoS getting difficult, the lack of full
feature parity between VLANs and main interfaces, or counters becoming
less useful (many are port-level, so identifying the true source port
may not be easy), there are things that you'll just discover over time
that don't even come to mind now, and I don't know what those will be
in your deployment. I can give anecdotes:

2*VXR termination of a metro L2 ring
- everything is 'ok'
- ethernet pseudowire service is introduced to customers
- occasionally there are loops now
- well, the VXR goes into promisc mode when you add an ethernet
pseudowire, because while it has VLAN local significance, it doesn't
have a per-VLAN MAC filter
- now an unrelated L3 VLAN, which is redundantly terminated on both
VXRs, has its customer CE down in the L2 metro
- because the ARP timeout is 4h and the MAC timeout is 300s, the
metro will forget the MAC fast, L3 slowly
- so the primary PE gets a packet off the internet and sends it to
the metro; the metro floods it to all ports, including the secondary
PE
- the secondary PE sends the packet back to the primary PE, over
the WAN
- now you have learned 'oh yeah, I should have ensured there is a
per-VLAN MAC filter' and 'oh yeah, my MAC/ARP timeouts are
misconfigured'
- but these are probably not the examples you'll learn; they'll be
something different
- when you do satellite, you can solve a lot of the problem scope in
software, as you control L2 and L3 and can run proprietary code

L2 transparency
- You do QinQ in the L2 aggregation, to pass customer frames to the
aggregation termination
- You do MAC rewrite in/out of the L2 aggregation (customer MAC
addresses get rewritten coming in from the customer, and mangled back
to the legitimate MACs going out to termination). You need this to
pass STP and such in pseudowires from customer to termination
- At termination, the hardware physically doesn't consider VLAN+ISIS
a legitimate packet and will kill it, so you have no way of supporting
ISIS inside a pseudowire when you have L2 aggregation towards the
customer. Technically it's not valid; ISIS isn't EthernetII, and 802.3
doesn't have VLANs. But technically correct rarely reduces the red hue
in customers' faces when they inform you about the issues they are
experiencing.
- Even if this works, there are plenty of other ways pseudowire
transparency suffers with L2 aggregation, as you are experiencing the
sets of limitations of two boxes instead of one when it comes to
transparency, and these sets won't be identical
- You will introduce a MAC limit to your point-to-point martini
product which didn't previously exist, because your L2 ring is
redundant and you need MAC learning. If it's just a single switch, you
can turn off MAC learning per VLAN and be closer to the satellite
solution

Convergence
- Your termination no longer observes hardware liveness detection,
so you need some solution to transfer L2 port state to the VLAN. This
will occasionally break, as it's new complexity.

> > Like cat6500/7600 linecards without DFC, so SP gear with centralised
> > logic, and dumb 'low performance' linecards. Given low performance
> > these days is multi Tbps chips.
>
> While I'm not sure operators want that, they will take a look if the
> lower price does not impact performance.
>
> There is more to just raw speed.

I mean, of course it affects performance, as you now have all ports
in a single chip instead of many chips. But when it comes to PPS,
people are confused about performance; no one* (well, maybe 1 in 100k
running some esoteric application) cares about wire-rate.
If you are running a card like a 4x100 ASR9k, you absolutely want
wire-speed, because there is 1 chip per port, and you want the pool
the port draws from to have 1 port's worth of wire rate free, to
ingest a DoS on a mostly idle interface. But if you have 128x100GE in
a chip, you're easily happy with 1/3 of the PPS, probably much, much
less, because you're not gonna exhaust that massive pool in any
practical scenario, and several interfaces can ingest a DoS
simultaneously.




-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX304 Port Layout

2023-06-27 Thread Saku Ytti via juniper-nsp
On Tue, 27 Jun 2023 at 17:40, Mark Tinka  wrote:

> Would that be high-density face-plate solutions for access aggregation
> in the data centre, that they are
>
> Are you suggesting standard 802.1Q/Q-in-Q
> trunking from a switch into a
> "pricey" router line card that support locally-significant VLAN's per
> port is problematic?

Yes.

> I'm still a bit unclear on what you mean by "centralized"... in the
> context of satellite, or standalone?

Like cat6500/7600 linecards without DFC: SP gear with centralised
logic and dumb 'low performance' linecards, given that 'low
performance' these days means multi-Tbps chips.

--
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX304 Port Layout

2023-06-27 Thread Saku Ytti via juniper-nsp
On Tue, 27 Jun 2023 at 06:02, Mark Tinka via juniper-nsp
 wrote:

> > Similar use case here but we use a QFX as a fusion satellite if port 
> > expansion is required.
> > Works well as an small site start up option.
>
> Are vendors still pushing their satellite switches :-)?
>
> That technology looked dodgy to me when Cisco first proposed it with
> 9000v, and then Juniper and Nokia followed with their own implementations.

Juniper messaging seems to be geo-specific; in the EU their sales
seem to sell them more willingly than in the US. My understanding is
that Fusion is basically dead, but they don't actually have a solution
for the access/SP market front-plate, so some sales channels are still
pitching it as the solution.

Nokia seems very committed to it.

I think the solution space is:
   a) centralised lookup engines, so you have cheap(er) line cards
for high-density, low-pps/bps use
   b) satellite
   c) VLAN aggregation

Satellite is basically a specific scenario of c), but it does bring
significant derisking compared to VLAN aggregation, as a single entity
designs it and can solve some problems better than vendor-agnostic
VLAN aggregation can. VLAN aggregation looks very simple on the
surface but is fraught with problems, many of which are slightly
better solved in satellites, and these problems will not be identified
ahead of time but during the next two decades of operation.

Centralised boxes haven't been available for quite a few years, but
hopefully Cisco is changing that; I think it's the right compromise
for SPs.

But in reality I'm not sure if centralised actually makes sense,
since I don't think we can axiomatically assume it costs the vendor
less: even though there is less BOM, the centralised design adds more
engineering cost. It might basically be a way to sell boxes to some
market at lower margins, while ensuring that hyperscalers don't buy
them, instead of directly benefiting from the cost reduction.





-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX80 watchdog

2023-06-12 Thread Saku Ytti via juniper-nsp
Do you monitor RPD task memory use and FreeBSD process memory use?
Is it possible you are leaking memory over time, and getting DRAM
pressure at the 1500-day mark?

It might be this:
https://prsearch.juniper.net/problemreport/PR108

Initially, as you said it happens during strenuous SSD access, I was
thinking of the RE failover limits Junos has on disk-io read/write
latency, which cause false-positive RE switchovers now and again (more
people have hit them than are aware of hitting them). But in your case
this can't possibly be it, because the MX80 doesn't have two REs. For
completeness:
https://www.juniper.net/documentation/us/en/software/junos/high-availability/topics/ref/statement/not-on-disk-underperform-edit-chassis.html

On Mon, 12 Jun 2023 at 18:35, Tom Bird via juniper-nsp
 wrote:
>
> Afternoon,
>
> I've been upgrading some MX80 routers to from 15.1, consistently they
> seem to fall over during periods of strenuous SSD access, or indeed once
> during a "commit check".
>
> We thought this might be due to the uptime (~1500 days) so have been
> rebooting them prior to the upgrade which has mostly stopped the problem
> from happening.  Not completely, however - they get stuck for about an
> hour doing this, after which they reboot and continue to work.
>
>
> watchdog: scheduling fairness gone for 3540 seconds now.
> (da1:umass-sim1:1:0:0): Synchronize cache failed, status == 0x34, scsi
> status == 0x0
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
>
>
> I'd like it if they waited a bit less than an hour and see the watchdog
> can be configured but I can't find any useful documentation about
> exactly what conditions it would fire and what the defaults are.
>
> Currently there is no configuration under "system processes watchdog",
> and it looks like it can be enabled, disabled and the timeout set up to
> 3600 seconds.
>
> So my question is, is it this watchdog that is resetting the thing after
> an hour and would it be reasonable to set the timeout to say 300 seconds
> so there was less down time if it went wrong.
>
> Thanks,
> --
> Tom
>
> :: www.portfast.co.uk / @portfast
> :: hosted services, domains, virtual machines, consultancy
> ___
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp



-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Unknown Attribute 28 in BGP

2023-06-12 Thread Saku Ytti via juniper-nsp
Either will help; configure either or both and you're good.

An actual fixed release will behave the same as if drop-path-attributes
28 had been configured: read T, read L, seek past V, without parsing.
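
For illustration, a minimal sketch of that read-T/read-L/seek-past-V
handling in Python (my own toy code, not anything from Junos; it
assumes only the standard BGP path-attribute framing of a flags byte,
a type byte, and a 1- or 2-byte length):

def strip_attribute(attrs: bytes, drop_type: int = 28) -> bytes:
    out, i = b"", 0
    while i < len(attrs):
        flags, atype = attrs[i], attrs[i + 1]
        if flags & 0x10:  # Extended Length bit set: 2-byte length
            length, hdr = int.from_bytes(attrs[i + 2:i + 4], "big"), 4
        else:             # otherwise a 1-byte length
            length, hdr = attrs[i + 2], 3
        if atype != drop_type:            # keep every other attribute
            out += attrs[i:i + hdr + length]
        i += hdr + length                 # seek past V without parsing it
    return out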

On Sun, 11 Jun 2023 at 19:36, Einar Bjarni Halldórsson  wrote:
>
> On 6/11/23 15:24, Saku Ytti wrote:
> > set protocols bgp drop-path-attributes 28 works if your release is too
> > old for set protocols bgp bgp-error-tolerance, and is preferable in
> > some ways, as it will protect your downstream as well.
> >
>
> 18.2R3-S3.11 supports protocols bgp bgp-error-tolerance, but reading
> through the docs, I see:
>
> > The bgp-error-tolerance statement overrides this behavior so that the 
> > following BGP error handling is in effect:
> >
> > For fatal errors, Junos OS sends a notification message titled Error 
> > Code Update Message and resets the BGP session. An error in the 
> > MP_{UN}REACH attribute is considered to be fatal. The presence of multiple 
> > MP_{UN}REACH attributes in one BGP update is also considered to be a fatal 
> > error. Junos OS resets the BGP session if it cannot parse the NLRI field or 
> > the BGP update correctly. Failure to parse the BGP update packet can happen 
> > when the attribute length does not match the length of the attribute value.
>
> I read this section so that even if I configure bgp-error-tolerance, it
> won't make a difference since junos considers this a fatal error and
> resets the BGP session.
>
> .einar



-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Unknown Attribute 28 in BGP

2023-06-11 Thread Saku Ytti via juniper-nsp
set protocols bgp drop-path-attributes 28 works if your release is too
old for set protocols bgp bgp-error-tolerance, and is preferable in
some ways, as it will protect your downstream as well.
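
For reference, the two knobs side by side (both statements appear
elsewhere in this thread; which of them exists depends on your
release):

set protocols bgp drop-path-attributes 28
set protocols bgp bgp-error-tolerance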

On Sun, 11 Jun 2023 at 17:25, Einar Bjarni Halldórsson via juniper-nsp
 wrote:
>
> Hi,
>
> We have two MX204 edge routers, each with a connection to a different
> upstream provider (and some IXP peerings on both).
>
> Last week the IPv6 transit session on one of them starting flapping. It
> turns out that we got hit with
> https://labs.ripe.net/author/emileaben/unknown-attribute-28-a-source-of-entropy-in-interdomain-routing/
>
> It only happened on one of our edge routers, so I assume for now that
> either our other transit provider filtered the affected route updates,
> or stripped the attribute.
>
> The post from RIPE links to
> https://www.juniper.net/documentation/us/en/software/junos/bgp/topics/topic-map/bgp-error-messages.html
> but I can't see that bgp-error-tolerance helps us, since this type of
> malformed update is always fatal.
>
> Our edge routers are both running Junos 18.2R3-S3.11. I was planning on
> upgrading to 22.2R3 regardless of this error, but it would be nice to
> know that this problem has been fixed in later version, or mitigations
> introduced that can be used.
>
> Anybody know about this problem in particular, or have ideas on
> mitigating malformed BGP updates?
>
> .einar
> ISNIC
> ___
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp



-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX304 Port Layout

2023-06-09 Thread Saku Ytti via juniper-nsp
On Fri, 9 Jun 2023 at 20:37, Andrey Kostin  wrote:

> Sounds more like a datacenter setup, and for DC operator it could be
> attractive to do at scale. For a traditional ISP with relatively small
> PoPs spread across the country it may be not the case.

Sure, not suggesting everyone is in the target market, but the target
market includes people who are not developers and have no interest in
becoming one. For a typical access network with multiple PoPs, it may
be the wrong optimisation point.

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX304 Port Layout

2023-06-09 Thread Saku Ytti via juniper-nsp
On Fri, 9 Jun 2023 at 19:15, Andrey Kostin  wrote:

> Can anything else be inserted in this socket? If not, then what's the
> point? For server CPUs there are many models with different clocking and
> number of cores, so socket provides a flexibility. If there is only one
> chip that fits the socket, then the socket is a redundant part.

Not that I know of. I think the point may be decoupling. BRCM doesn't
want to do business with just anyone. This allows someone to build
the switch without providing the chips; customers can then buy the
switch from this vendor and the chip directly from BRCM.
I could imagine some big players like FB and AMZN designing their own
switch and having some random shop actually build it, but Broadcom
saying 'no, we don't do business with you'. This way they could get
the switch from anywhere, while having a direct chip relationship with
BRCM.



-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX304 Port Layout

2023-06-09 Thread Saku Ytti via juniper-nsp
On Fri, 9 Jun 2023 at 18:46, Andrey Kostin  wrote:

> I'm not in this market, have no qualification and resources for
> development. The demand in such devices should be really massive to
> justify a process like this.

Are you not? You use a lot of open-source software, because someone
else did the hard work, and you have something practical.

The same would be the thesis here: you order the PCI NPU from Newegg,
and you have an ecosystem of practical software to pull from various
sources. Maybe you'll contribute something back, maybe not.

A very typical network is a border router or two, which need features
and performance, and then switches to connect the compute. People who
have no resources or competence to write software could still be users
in this market.
-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX304 Port Layout

2023-06-09 Thread Saku Ytti via juniper-nsp
On Fri, 9 Jun 2023 at 17:26, Mark Tinka  wrote:

> Well, the story is that Cisco are doing this with Meta and Microsoft on
> their C8000 platform, and apparently, doing billions of US$ in business
> on the back of that.

I'm not convinced at all that Leaba is being sold. I think it's sold
conditionally, when customers would otherwise be lost.

I am reminded of this:
https://www.servethehome.com/this-is-a-broadcom-tomahawk-4-64-port-400gbe-switch-chip-lga8371-intel-amd-ampere/

An LGA8371-socketed BRCM TH4. Ostensibly this allows a lot more
switches to appear on the market, as the switch maker doesn't need to
be friendly with BRCM: they make the switch, the customer buys the
chip and sockets it. It wouldn't surprise me if FB, AMZN and the like
had pressed for something like this, so they could use cheaper sources
to make the rest of the switch, sources which BRCM didn't want to play
ball with.

But an NPU from Newegg, with a community writing the code, doesn't
exist. I think it should, and there would be volume in it, just no
large volume to any single customer.

--
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX304 Port Layout

2023-06-09 Thread Saku Ytti via juniper-nsp
On Fri, 9 Jun 2023 at 16:58, Andrey Kostin via juniper-nsp
 wrote:

> Not sure why it's eye-watering. The price of fully populated MX304 is
> basically the same as it's predecessor MX10003 but it provides 3.2T BW
> capacity vs 2.4T. If you compare with MX204, then MX304 is about 20%
> expensive for the same total BW, but MX204 doesn't have redundant RE and
> if you use it in redundant chassis configuration you will have to spend
> some BW on "fabric" links, effectively leveling the price if calculated
> for the same BW. I'm just comparing numbers, not considering any real

That's not it; the RE doesn't attach to fabric serdes.

You are right that the MX304 is the successor of the MX10003, not the
MX204.

MX80, MX104 and MX204 are unique in that they are true pizzabox Trios.
They have exactly 1 Trio, and both the WAN and FAB sides connect to
WAN ports (not sure if the MX204 just leaves them unconnected).
Therefore a 40G Trio in linecard mode is an 80G Trio in pizza mode
(albeit PPS stays the same), as you're not wasting capacity on
non-revenue fabric ports. This single-Trio design makes the box very
cost effective, as not only do you have just one Trio with double the
capacity per Trio, but you also don't have any fabric chip or fabric
serdes.

The MX304 however has Trios in the linecards, so it really is very
much a normal chassis box. And having multiple Trios, it needs a
fabric.

I do think Juniper and the rest of the vendors keep struggling to
identify 'few to many' markets, and are only good at identifying 'many
to few' markets. The MX304 and the ever denser 512x112G serdes chips
represent this.

I expect many people on this list have no need for more performance
than a single Trio YT in any PoP at all, yet they need ports. They are
not adequately addressed by vendors. But they do need the deep
features of an NPU.

I keep hoping that someone is disruptive enough to take the nvidia/GPU
approach to NPUs. That is, you can buy a Trio PCI card from Newegg for
2 grand, and program it as you wish. I think this market remains
unidentified, and even adjusting for cannibalisation it would increase
the market size.
I can't understand why JNPR is not trying this; they've lost for 20
years to inflation in valuation, what do they have to lose?

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] JunOS RPKI/ROA database in non-default routing instance, but require an eBGP import policy in inet.0 (default:default LI:RI) to reference it.

2023-06-06 Thread Saku Ytti via juniper-nsp
On Tue, 6 Jun 2023 at 06:54, Mark Tinka via juniper-nsp
 wrote:

> > While I have a lot of sympathy for Saku's pragmatism, I prefer to file off 
> > the ugly edges of old justifications when I can... but it's done one commit 
> > at a time.
>>
> Going back to re-do the implementation from scratch would be a
> non-starter. There is simply too much water under this bridge.

I am not implying it is pragmatic or possible, just correct from a
design point of view.

Commercial software deals with competing requirements, and these
requirements are not constructive towards producing maintainable clean
code. Over time commercial software becomes illiquid with its
technical debt.

There is no real personal reward for paying down technical debt,
because almost invariably it takes a lot of time, brings no new
revenue, and a non-coder observing your work only sees the outages the
debt repayment caused. Meanwhile the person who creates this debt,
shipping new invoiceable features and bug fixes in a ra[pb]id manner,
is a star to the non-coder observers.

Not to say our open-source networking is always great either; Linux
developers are notorious for not asking SMEs 'how has this problem
been solved in other software?'. There are plenty of anecdotes to
choose from, but I'll give one.

- In the 3.6 kernel, a FIB was introduced to replace the flow cache;
of course anyone dealing with networking could have told the kernel
developers on day 1 why a flow cache was a poor idea, and what a FIB
is, how it is done, and why it is a better idea.
- In the 3.6 FIB implementation, ECMP was solved by essentially
randomly choosing 1 option of many, per packet. Again, they could have
asked even junior network engineers 'how does ECMP work, how should it
be done, I'm thinking of doing it like this, why do you think they've
not done this in other software?' But they didn't.
- In 4.4, random ECMP was changed to hashed ECMP.

I still continue to catch discussions about poor TCP performance in
Linux ECMP environments; I first ask what kernel they have, then I
explain to them why per-packet + cubic will never ever perform. So for
4 years ECMP was completely broken, and reading the ECMP release notes
in 4.4, not even the developers had completely understood just how bad
the problem was, so we can safely assume people were not running ECMP.
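
A toy contrast of the two selection strategies (my illustration, not
kernel code; the next-hop list and 5-tuple arguments are made up):

import random

def next_hop_per_packet(next_hops):
    # 3.6-to-4.4 style: independent random choice per packet, so a
    # single TCP flow sprays across paths and reorders, and cubic
    # interprets the reordering as loss
    return random.choice(next_hops)

def next_hop_hashed(next_hops, src, dst, sport, dport, proto=6):
    # 4.4+ style: a stable per-flow choice, no intra-flow reordering
    flow = hash((src, dst, sport, dport, proto))
    return next_hops[flow % len(next_hops)]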

Another example was when I tried to explain to the OpenSSH mailing
list that 'TOS' isn't a thing, and got a confident reply that TOS
absolutely is a thing and prec/DSCP are not. Luckily, a few years
later Job fixed OpenSSH packet classification.

But these examples are everywhere, so it seems you either choose
software written by people who understand the problem but are forced
to write unmaintainable code, or you choose software written by people
who are just now learning about the problem and then solve it without
discovering prior art, usually wrongly.


-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] JunOS RPKI/ROA database in non-default routing instance, but require an eBGP import policy in inet.0 (default:default LI:RI) to reference it.

2023-06-05 Thread Saku Ytti via juniper-nsp
On Mon, 5 Jun 2023 at 11:13, Lukas Tribus via juniper-nsp
 wrote:

> in Cisco land I worked around VRF or source interface selection
> limitations for RTR by using SSH as a transport method, which then
> used SSH client source-vrf/source-interface configurations.
>
> I don't know if JunOS supports SSH transported RTR though.

It is immaterial; it wouldn't work.

If someone actually needed to make it work, they'd leak between the
VRF and the Internet, so that RTR configured on the Internet side
actually goes via the NMS VRF. This could be accomplished in a
multitude of poor ways: egress could be a next-table static route,
ingress could be a firewall filter with 'from source-address <rtr>'
then 'routing-instance default'. Or it could be an LT interface
between the VRF and the default instance.
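
A rough sketch of the static-route/filter variant, with hypothetical
names and addresses (NMS as the management VRF, 192.0.2.1 as the
validator); this is the shape of the 'poor way' described above, not
verified syntax, and definitely not a recommendation:

## egress: main instance reaches the validator via the NMS VRF
set routing-options static route 192.0.2.1/32 next-table NMS.inet.0
set routing-options static route 192.0.2.1/32 no-readvertise
## ingress: return traffic from the validator is pushed back to inet.0
set firewall family inet filter RTR-RETURN term rtr from source-address 192.0.2.1/32
set firewall family inet filter RTR-RETURN term rtr then routing-instance default
set firewall family inet filter RTR-RETURN term rest then accept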


-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] JunOS RPKI/ROA database in non-default routing instance, but require an eBGP import policy in inet.0 (default:default LI:RI) to reference it.

2023-06-05 Thread Saku Ytti via juniper-nsp
I totally agree this should work, and it is unfortunate that you are
struggling to make it work.

Having said that, it is asking for trouble to manage your devices in
a VRF; you will continue to find issues and spend time/money working
with vendors to solve them.

It is safer to put the service (internet) in a VRF, and leave the main
table for signalling and NMS, if you want to create this distinction.
It will also make it a lot more convenient to leak between instances
and create subInternets, like a peeringInternet, to keep peers from
default-routing to you.


https://www.juniper.net/documentation/us/en/software/junos/bgp/topics/topic-map/bgp_origin_validation.html
NOTE: RPKI validation is available only in the primary instance. If
you configure RPKI validation for a routing instance, then the RPKI
validation fails with the following error message RV instance is not
running.


I consider it a design error that a NOS conceptually has two types of
instances, the main instance and VRF instances. This distinction makes
the NOS more expensive to write and maintain, and it makes it more
fragile.

The general principle applies: do boring things, get boring results.
It's the same reason why IPv6 continues to be a 2nd-class citizen and
is a bad candidate for your NMS/signalling AFI; people don't use it,
so if you do, you will be responsible for driving vendors to fix it.
Time which you likely should be spending doing something else.

On Mon, 5 Jun 2023 at 06:52, Chris Kawchuk via juniper-nsp
 wrote:
>
> Great idea! but no dice. :( didn't work.
>
> Seems the whole "VRF -> back to base table" operations that we'd all love to 
> do easily in JunOS rears its ugly head yet again ;)
>
> FWIW - Some friends in the industry *do* use that knob, but they're "going 
> the other way". i.e. The RPKI RV Database  is in inet.0 && Internet is in a 
> VRF. Apparently that does work AOK for them; however it's "fiddly" as you say.
>
> FWIW - here's the VRF's config... pretty darned basic.
>
> routing-instances {
> admin {
> routing-options {
> validation {
> notification-rib [ inet.0 inet6.0 ];   ## << No impact on the 
> default:default LI:RI RV database
> group routinator {
> session 10.x.x.x {
> refresh-time 120;
> port 3323;
> }
> }
> }
> }
> description "Dummy admin vrf - to test RPKI inside a 
> routing-instance";
> instance-type vrf;
> interface xe-0/0/3.0;   ## << the RPKI server is setting on the 
> other end of this /30
> vrf-target target::xxx;
> }
> }
>
> FWIW using a vMX for testing - running JunOS 20.4R3-S4.8.
>
> Basically i'm asking "is there a way to do this without having to stick the 
> validator DB config into the basec onfig [routing-options validation {}] 
> stanza?
>
> .If it must, then yeah, it's easy enough to do a rib group, a static /32 
> next-table/no-readvertise/no-install, lt-x/x/x stitch, route leak, etc...to 
> get the default:default instance to "use the VRF" to reach the RPKI server.  
> I just don't want to go down that road (yet) if I don't have to; as the 
> 'technical elegance' (read: OCD) portion of my brain wants to avoid that if 
> it can.
>
> - CK.
>
>
>
>
> > On Jun 5, 2023, at 13:12, David Sinn  wrote:
> >
> > I'd try the 'notification-rib' chunk in the 'validation' stanza of the 
> > routing-instance and see if setting inet.0 there pushes the DB the way you 
> > need. Certain versions of JunOS are quite broken going the other way, so 
> > I've had to enumerate all of the routing-instances that I want to be sure 
> > have a copy of the validation DB to get them to work correctly. Maybe the 
> > other way will work in your case.
> >
> > David
> >
> >> On Jun 4, 2023, at 7:52 PM, Chris Kawchuk via juniper-nsp 
> >>  wrote:
> >>
> >> Hi All
> >>
> >> Been scratching my head today. As per Juniper's documentation, you can 
> >> indeed setup RPKI/ROA validation session inside a routing-instance. You 
> >> can also have it query against that instance on an import policy for that 
> >> VRF specifically, and if there's no session, it will revert to the default 
> >> RPKI RV database (if configured) under the main routing-options {} stanza 
> >> to check for valid/invalid, etc...
> >>
> >> https://www.juniper.net/documentation/us/en/software/junos/bgp/topics/topic-map/bgp_origin_validation.html
> >>
> >> Thats all fine and dandy, but it seems that JNPR's implementation of 
> >> RPKI/ROA *assumes* that your RV database is always configured in the main 
> >> routing instance (i.e. the main routing-options validation {} stanza, thus 
> >> your RPKI server MUST be available via inet.0 ).
> >>
> >> Unfortunately, the situation I am faced with is the opposite.
> >>
> >> My RPKI/ROA server is only available via our "admin" VRF (which is how we 
> >> manage the device) - Our inet.0 contains the 

Re: [j-nsp] QFX DDOS Violations

2022-11-30 Thread Saku Ytti via juniper-nsp
Heh,

That makes sense. So in the QFX5k, the 'VXLAN' classifier can match
anything inside the VXLAN, like ARP? Instead of it being classified as
ARP, they all share the VXLAN classifier?

So this could also be VXLAN TTL exceeded? Which would happen every
time you have some kind of convergence event and get a microloop. Then
all of these are competing for access to the same VXLAN policer?

This is very broken behaviour :(

22.3R1 appears to address this:
New ARP and NDP packet classification: We've introduced two CP
classes for ARP and NDP packets received over the VTEP interface. When
your device identifies a packet as ARP or NDP, it performs an ingress
port check which verifies whether the VTEP interface receives these
packets. If the VTEP interface receives the packet, the datapath
re-writes the CP class to the newly defined values. Based on this new
CP class, the system performs the remaining packet processing and
forwards the packets toward the host path. The system adds a separate
DDoS policer to this ARP traffic, which ensures that the ARP traffic
is not triggering underlay ARP DDoS violation.


I think, in order of least to most broken:
a) have separate underlay + overlay policers for every protocol (ttl,
reject, arp, nd, resolve)
b) use a single overlay policer for all cases of (ttl, reject, arp,
nd, resolve), so in and out of VXLAN share the same ARP policer
c) don't have any policer
d) collapse multiple different punt reasons under a single VXLAN
policer



On Wed, 30 Nov 2022 at 15:44, Roger Wiklund  wrote:
>
> Hi John
>
> The default DDoS values on QFX5k for EVPN-VXLAN is way too low.
> I recommend these values + very tight storm-control on each applicable port.
>
> RSVP and LDP are not used but share the same queue as BGP so you will see 
> strange triggers if you omit these.
>
> set system ddos-protection protocols rsvp aggregate bandwidth 1
> set system ddos-protection protocols rsvp aggregate burst 1000
> set system ddos-protection protocols ldp aggregate bandwidth 1
> set system ddos-protection protocols ldp aggregate burst 1000
> set system ddos-protection protocols bgp aggregate bandwidth 1
> set system ddos-protection protocols bgp aggregate burst 1000
> set system ddos-protection protocols arp aggregate bandwidth 5
> set system ddos-protection protocols arp aggregate burst 5000
> set system ddos-protection protocols vxlan aggregate bandwidth 5
> set system ddos-protection protocols vxlan aggregate burst 5000
>
> The reason you're seeing VXLAN violation is because of EVPN arp/nd 
> suppression.
> Have a look at the VXLAN queue here:
> Detailed information about DDOS queues on QFX5K switches (juniper.net)
>
> Every single ARP/ND packet is proxied by the QFX. If you have an ARP storm 
> the VXLAN DDoS will kick in and rate limit this.
> If this is sustained, arp cache will timeout on each client and eventually 
> break connectivity.
>
> You mitigate this by having a very tight storm-control on each L2 interface.
> We use 1000kbps which translates roughly into 2000pps ARP
>
> set forwarding-options storm-control-profiles sc-1000kbps all bandwidth-level 
> 1000
> set forwarding-options storm-control enhanced
>
> Because storm-control is distributed and mitigated on each port, while the 
> DDoS is aggregated to the RE, this combo works fine and VXLAN is never 
> triggered.
>
> Hope this helps.
>
> Regards
>
> On Wed, Nov 30, 2022 at 2:43 PM Cristian Cardoso via juniper-nsp 
>  wrote:
>>
>> Hi Johan
>>
>> I experienced a similar issue in my evpn-vxlan environment on QFX5120-48y
>> switches. The DDOS alert occurred whenever a large number of VM migrations
>> occurred simultaneously in my environment, some times there were 20 VM's in
>> simultaneous migration and the DDOS alarmed.
>>
>> To solve this, I set the following value in the configuration:
>>
>> qfx5120> show configuration system ddos-protection protocols
>> vxlan {
>> aggregate {
>> bandwidth 1;
>> burst 12000;
>> }
>> }
>>
>>
>>
>> Em qua., 30 de nov. de 2022 às 07:16, john doe via juniper-nsp <
>> juniper-nsp@puck.nether.net> escreveu:
>>
>> > Hi!
>> >
>> > The leaf switches are QFX5k and it seems to be lacking some of the command
>> > you mentioned. We don't have any problem with bgp sessions going down, the
>> > impact is only the payload inside vxlan.
>> >
>> > Protocol Group: VXLAN
>> >
>> >   Packet type: aggregate (Aggregate for vxlan control packets)
>> > Aggregate policer configuration:
>> >   Bandwidth:500 pps
>> >   Burst:200 packets
>> >   Recover time: 300 seconds
>> >   Enabled:  Yes
>> > Flow detection configuration:
>> >   Flow detection system is off
>> >   Detection mode: Automatic  Detect time:  0 seconds
>> >   Log flows:  YesRecover time: 0 seconds
>> >   Timeout flows:  No Timeout time: 0 seconds
>> >   Flow aggregation level configuration:
>> > Aggregation level   Detection mode  Control mode  Flow rate
>> >   

Re: [j-nsp] QFX DDOS Violations

2022-11-30 Thread Saku Ytti via juniper-nsp
The 'max arrival rate' is pre-policer, not the admitted rate.

I don't use VXLAN, and I can't begin to guess what VXLAN traffic needs
to punt. But this is not your transit VXLAN traffic; this is some
VXLAN traffic that the platform thought it needed to process in
software.

I would personally tcpdump the punted traffic classified as VXLAN and
investigate what exactly it is.

On Wed, 30 Nov 2022 at 12:15, john doe  wrote:
>
> Hi!
>
> The leaf switches are QFX5k and it seems to be lacking some of the command 
> you mentioned. We don't have any problem with bgp sessions going down, the 
> impact is only the payload inside vxlan.
>
> Protocol Group: VXLAN
>
>   Packet type: aggregate (Aggregate for vxlan control packets)
> Aggregate policer configuration:
>   Bandwidth:500 pps
>   Burst:200 packets
>   Recover time: 300 seconds
>   Enabled:  Yes
> Flow detection configuration:
>   Flow detection system is off
>   Detection mode: Automatic  Detect time:  0 seconds
>   Log flows:  YesRecover time: 0 seconds
>   Timeout flows:  No Timeout time: 0 seconds
>   Flow aggregation level configuration:
> Aggregation level   Detection mode  Control mode  Flow rate
> Subscriber  Automatic   Drop  0  pps
> Logical interface   Automatic   Drop  0  pps
> Physical interface  Automatic   Drop  500 pps
> System-wide information:
>   Aggregate bandwidth is no longer being violated
> No. of FPCs that have received excess traffic: 1
> Last violation started at: 2022-11-30 09:08:02 CET
> Last violation ended at:   2022-11-30 09:09:32 CET
> Duration of last violation: 00:01:40 Number of violations: 1508
>   Received:  3548252144  Arrival rate: 201 pps
>   Dropped:   49294329Max arrival rate: 160189 pps
> Routing Engine information:
>   Bandwidth: 500 pps, Burst: 200 packets, enabled
>   Aggregate policer is never violated
>   Received:  0   Arrival rate: 0 pps
>   Dropped:   0   Max arrival rate: 0 pps
> Dropped by individual policers: 0
> FPC slot 0 information:
>   Bandwidth: 100% (500 pps), Burst: 100% (200 packets), enabled
>   Hostbound queue 255
>   Aggregate policer is no longer being violated
> Last violation started at: 2022-11-30 09:08:02 CET
> Last violation ended at:   2022-11-30 09:09:32 CET
> Duration of last violation: 00:01:40 Number of violations: 1508
>   Received:  3548252144  Arrival rate: 201 pps
>   Dropped:   49294329Max arrival rate: 160189 pps
> Dropped by individual policers: 0
> Dropped by aggregate policer:   50294227
> Dropped by flow suppression:0
>   Flow counts:
> Aggregation level Current   Total detected   State
> Subscriber0 0Active
>
> vty)# show ddos scfd proto-states vxlan
> (sub|ifl|ifd)-cfg: op-mode:fc-mode:bwidth(pps)
> op-mode: a=automatic, o=always-on, x=disabled
> fc-mode: d=drop-all, k=keep-all, p=police
> d-t: detect time, r-t: recover time, t-t: timeout time
> aggr-t: last aggregated/deaggreagated time
> idx prot   groupproto mode detect agg flags state   sub-cfg   
> ifl-cfg   ifd-cfg  d-t  r-t  t-t   aggr-t
> ---    -- --- - - - 
> - -  ---  ---  ---   --
>  23 6400   vxlanaggregate auto no   1 2 0 a:d:0 a:d:  
>   0 a:d:  5000000
>
>
> Johan
>
> On Wed, Nov 30, 2022 at 8:53 AM Saku Ytti  wrote:
>>
>> Hey,
>>
>> Before any potential trashing, I'd like to say that as far as I am
>> aware Juniper (MX) is the only platform on the market which isn't
>> trivial to DoS off the network, despite any protection users may have
>> tried to configure.
>>
>> > How do you identify the source problem of DDOS violations that junos logs
>> > for QFX? For example what interface that is causing the problem?
>>
>> I assume you are talking about QFX10k with Paradise (PE) chipset. I'm
>> not very familiar with it, but I know something about it when sold in
>> PTX10k quise, but there are significant differences. Answers are from
>> the PTX10k perspective. If you are talking about QFX5k many of the
>> answers won't apply, but the ukern side answers should help
>> troubleshoot it further, certainly with QFX5k the situation is worse
>> than it would be on QFX10k.
>>
>> > DDOS_PROTOCOL_VIOLATION_SET: Warning: Host-bound traffic for
>> > protocol/exception  VXLAN:aggregate exceeded its allowed bandwidth at fpc 0
>> > for 30 times, started at...
>> >
>> > The configured rate for VXLAN is 500pps, ddos protection is seeing rates
>> > over 150 000pps
>>
>> Do you mean you've configured:
>> 'set system ddos-protection 

Re: [j-nsp] QFX DDOS Violations

2022-11-29 Thread Saku Ytti via juniper-nsp
Hey,

Before any potential trashing, I'd like to say that as far as I am
aware Juniper (MX) is the only platform on the market which isn't
trivial to DoS off the network, despite any protection users may have
tried to configure.

> How do you identify the source problem of DDOS violations that junos logs
> for QFX? For example what interface that is causing the problem?

I assume you are talking about the QFX10k with the Paradise (PE)
chipset. I'm not very familiar with it, but I know something about it
as sold in the PTX10k guise, though there are significant differences;
answers are from the PTX10k perspective. If you are talking about the
QFX5k, many of the answers won't apply, but the ukern-side answers
should help troubleshoot it further; certainly with the QFX5k the
situation is worse than it would be on the QFX10k.

> DDOS_PROTOCOL_VIOLATION_SET: Warning: Host-bound traffic for
> protocol/exception  VXLAN:aggregate exceeded its allowed bandwidth at fpc 0
> for 30 times, started at...
>
> The configured rate for VXLAN is 500pps, ddos protection is seeing rates
> over 150 000pps

Do you mean you've configured:
'set system ddos-protection protocols vxlan aggregate bandwidth 500'?
What exactly are you seeing? What does 'show ddos-protection protocols
vxlan' say? Also 'start shell pfe network fpcX' + 'show ddos scfd
proto-states vxlan'.

Paradise (unlike Triton and Trio) does not support PPS policing at
all. So when you configure a PPS policer, what actually gets
programmed is 500pps*1500B in bps. I've tried to argue this is a poor
default, 64B being the superior choice.
In Paradise, 500pps would thus admit 500*(1500/64), or about 12kpps,
per Paradise chip if those VXLAN packets were small. These would then
be policed by the LC CPU ukern down to 500 pps for all the Paradise
chips living under that LC CPU, before being sent to the RE over bme0.
After DDoS, but before Paradise admits a packet to the LC CPU, it goes
through a VoQ, where most packets are classified into VoQ#2, which is
10Mbps wide with no burstability (classification, width and
burstability are being changed in later images). So extremely trivial
rates will cause congestion on VoQ#2, and a lot of protocols will be
competing for the 10Mbps of access to the LC CPU, like BGP, ISIS,
OSPF, LDP, ND, ARP.
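
Spelled out (my arithmetic, using only the numbers above):

configured_pps, assumed_size, small_packet = 500, 1500, 64
programmed_bps = configured_pps * assumed_size * 8   # what hardware enforces
admitted_pps = programmed_bps / (small_packet * 8)   # ~11719, 'about 12kpps'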

> This is an spine/leaf setup, one theory is that the vxlan traffic that most
> of our QFX boxes are activation ddos protection for is actually vxlan
> services running inside the vxlans, for example we have kubernetes clusters
> using vxlan. Is that a sane theory?

Not enough information to speculate.
In many cases the DDoS classification is wrong. You can review it in
the PFE: 'show filter' => HOSTBOND_IPv4_FILTER, then 'show filter
index X program'. You can also capture punted packets on the interface
where the RE meets the FPC (I think bme0 here); on the bme0 interface,
TNP headers sit on top of the punted packets, and in the TNP headers
you will see which DDoS classification was used. You can turn the
number into a name by looking at 'show ddos scfd proto-states'.


I naively wish I could set my ddos-protection classification and VoQ
classification manually in the 'lo0 filter', because the
infrastructure allows for great protection; but particularly when
choosing which VoQ packets share, there is no obvious single best
solution, it depends on the environment. For example, I could put
RSVP, ISIS and LDP in a single VoQ, as they never compete with
customers, and BGP in another, as there it competes with customers and
operators, and so forth. But of course this wish is naive, as the
solution the vendor offers is already too complex for customers to
use, and giving more rope would just make the mean config worse.

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Collapse spine EVPN type 5 routes issue

2022-11-15 Thread Saku Ytti via juniper-nsp
I would still consider as-override, or at least I would figure out the
reason why it is not a good solution.

On Tue, 15 Nov 2022 at 15:40, niklas rehnberg via juniper-nsp
 wrote:
>
> Hi,
> Thanks for the quick reply, I hope following very simple picture may help
>
>   ClientsClients
>
>   |  |
>   |   EVPN/VXLAN|
>   |  Overlay AS 6555 |
>   spine1 --- type 5--- spine2
>  vrf WAN AS X   |  |   vrf WAN AS X
>eBGP  |  |   eBGP
>   |  |
>  PE  AS Y   PE   AS Y
>   |  |
>
>   Core Network---
>
> route example when loop occur
> show route hidden table bgp.evpn extensive
>
> bgp.evpn.0: 156 destinations, 156 routes (153 active, 0 holddown, 3 hidden)
> 5:10.254.0.2:100::0::5.0.0.0::16/248 (1 entry, 0 announced)
>  BGP /-101
> Route Distinguisher: 10.254.0.2:100
> Next hop type: Indirect, Next hop index: 0
> Address: 0x55a1fd2d2cdc
> Next-hop reference count: 108, key opaque handle: (nil),
> non-key opaque handle: (nil)
> Source: 10.254.0.2
> Protocol next hop: 10.254.0.2
> Indirect next hop: 0x2 no-forward INH Session ID: 0
> State: 
> Peer AS: 6
> Age: 1:14   Metric2: 0
> Validation State: unverified
> Task: BGP_6_6.10.254.0.2
> AS path: 65263 xxx I  (Looped: 65263)
> Communities: target:10:100 encapsulation:vxlan(0x8)
> router-mac:34:11:8e:16:52:b2
> Import
> Route Label: 99100
> Overlay gateway address: 0.0.0.0
> ESI 00:00:00:00:00:00:00:00:00:00
> Localpref: 100
> Router ID: 10.254.0.2
> Hidden reason: AS path loop
> Secondary Tables: WAN.evpn.0
> Thread: junos-main
> Indirect next hops: 1
> Protocol next hop: 10.254.0.2
> Indirect next hop: 0x2 no-forward INH Session ID: 0
> Indirect path forwarding next hops: 2
> Next hop type: Router
> Next hop: 10.0.0.1 via et-0/0/46.1000
> Session Id: 0
> Next hop: 10.0.0.11 via et-0/0/45.1000
> Session Id: 0
> 10.254.0.2/32 Originating RIB: inet.0
>   Node path count: 1
>   Forwarding nexthops: 2
> Next hop type: Router
> Next hop: 10.0.0.1 via
> et-0/0/46.1000
> Session Id: 0
> Next hop: 10.0.0.11 via
> et-0/0/45.1000
> Session Id: 0
>
>
> // Niklas
>
>
>
>
> Den tis 15 nov. 2022 kl 13:58 skrev Saku Ytti :
>
> > Hey Niklas,
> >
> > My apologies, I do not understand your topology or what you are trying
> > to do, and would need a lot more context.
> >
> > In my ignorance I would still ask, have you considered 'as-override' -
> >
> > https://www.juniper.net/documentation/us/en/software/junos/bgp/topics/ref/statement/as-override-edit-protocols-bgp.html
> > this is somewhat common in another use-case, which may or may not be
> > near to yours. Say you want to connect arbitrarily many CE routers to
> > MPLS VPN cloud with BGP, but you don't want to get unique ASNs to
> > them, you'd use a single ASN on every CE and use 'as-override' on the
> > core side.
> >
> > Another point I'd like to make, not all implementations even verify AS
> > loops in iBGP, for example Cisco does not, while Juniper does. This
> > implementation detail creates bias on what people consider 'clean' and
> > 'dirty' solution, as in Cisco network it's enough to allow loop at the
> > edge interfaces it feels more 'clean' while in Juniper network you'd
> > have to allow them in all iBGP sessions too, which suddenly makes the
> > solution appear somehow more 'dirty'.
> >
> >
> > On Tue, 15 Nov 2022 at 12:48, niklas rehnberg via juniper-nsp
> >  wrote:
> > >
> > > Hi all,
> > > I have the following setup and need to know the best practices to solve
> > > EVPN type 5 issues.
> > >
> > > Setup:
> > > Two ACX7100 as collapse spine with EVPN/VXLAN
> > > Using type 5 routes between the spines so iBGP can be avoided in
> > > 

Re: [j-nsp] Collapse spine EVPN type 5 routes issue

2022-11-15 Thread Saku Ytti via juniper-nsp
Hey Niklas,

My apologies, I do not understand your topology or what you are trying
to do, and would need a lot more context.

In my ignorance I would still ask, have you considered 'as-override' -
https://www.juniper.net/documentation/us/en/software/junos/bgp/topics/ref/statement/as-override-edit-protocols-bgp.html
this is somewhat common in another use-case, which may or may not be
near to yours. Say you want to connect arbitrarily many CE routers to
MPLS VPN cloud with BGP, but you don't want to get unique ASNs to
them, you'd use a single ASN on every CE and use 'as-override' on the
core side.
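
A minimal sketch of that use-case on the PE side (group name and ASN are
assumptions):

set protocols bgp group CE type external
set protocols bgp group CE peer-as 65001
set protocols bgp group CE as-override

With as-override, the PE replaces occurrences of the CE's ASN (65001
here) in the AS path with its own when advertising to the CE, so the
same ASN can be reused on every CE without tripping loop detection.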

Another point I'd like to make: not all implementations even verify AS
loops in iBGP; for example Cisco does not, while Juniper does. This
implementation detail creates bias in what people consider a 'clean' and
a 'dirty' solution. In a Cisco network it's enough to allow the loop at the
edge interfaces, so it feels more 'clean', while in a Juniper network you'd
have to allow it in all iBGP sessions too, which suddenly makes the
solution appear somehow more 'dirty'.


On Tue, 15 Nov 2022 at 12:48, niklas rehnberg via juniper-nsp
 wrote:
>
> Hi all,
> I have the following setup and need to know the best practices to solve
> EVPN type 5 issues.
>
> Setup:
> Two ACX7100 as collapse spine with EVPN/VXLAN
> Using type 5 routes between the spines so iBGP can be avoided in
> routing-instance.
> Both spines has same bgp as number in the routing-instance WAN
> See below for a part of configuration
>
> Problem:
> Incoming routes from WAN router into spine1 will be advertised to spine2 as
> type 5 routes
> spine2 will not accept them due to AS number exit in the as-path already.
>
> Solution:
> I can easily fix it with "loop 2" config in the routing-options part, but
> is this the right way?
> Does there exist any command to change the EVPN Type 5 behavior from eBGP
> to iBGP?
> Different AS number in routing-instance?
> What are the best practices?
>
> Config part:
> show routing-instances WAN protocols evpn
> ip-prefix-routes {
> advertise direct-nexthop;
> encapsulation vxlan;
> reject-asymmetric-vni;
> vni 99100;
> export EXPORT-T5-WAN;
> }
> policy-statement EXPORT-T5-WAN {
> term 1 {
> from protocol direct;
> then accept;
> }
> term 2 {
> from protocol bgp;
> then accept;
> }
> }
> ___
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp



-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Cannot program filter pfe-cos-cl-631-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries

2022-10-21 Thread Saku Ytti via juniper-nsp
On Fri, 21 Oct 2022 at 16:39, Chuck Anderson  wrote:

> Also, it appears that when Junos was changed to support DHCP Snooping,
> Dynamic ARP Inspection, and IP Source Guard on trunk ports, even
> though trunk ports are in "trusted" mode by default, the switch is
> learning bindings on the trusted trunk ports (i.e. the uplink) and
> then *programming them into TCAM* at least for IPSG.  If this is true,
> then Junos has created a situation where one cannot deploy IPSG
> effectively unless the switch can scale to the number of entries
> needed for an entire *VLAN* which may have thousands of hosts, rather
> than just the access ports on a single switch stack which would
> normally have only hundreds of hosts or less.

Thank you for the update, and it sounds plausible to me. Features that
cause ingress TCAM consumption can quickly kill EX/QFX scale. It will
be very challenging to run most of the EX/QFX devices in an L3 role, due
to the very modest TCAM, at least if there is any care at all in the lo0
and edge filters.
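
A quick way to watch the relevant partitions from the CLI (a sketch;
group names and sizes vary per platform and filter config):

> show pfe filter hw summary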

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Cannot program filter pfe-cos-cl-631-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries

2022-10-13 Thread Saku Ytti via juniper-nsp
I think you're gonna need JTAC.

My first guess would be that this is not a supported config on the
platform, but it also may be actual TCAM starvation. I'd be curious to
learn what the problem was.

On Thu, 13 Oct 2022 at 14:41, Chuck Anderson  wrote:
>
> It's an internal filter created by class-of-service.  The one I chose does 
> have a complaint, I just didn't paste the entire log originally.  Here are a 
> few lines earlier:
>
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Filter : 
> "pfe-cos-cl-623-5-1" is NOT programmed in HW
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter 
> pfe-cos-cl-624-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Filter : 
> "pfe-cos-cl-624-5-1" is NOT programmed in HW
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Filter : 
> "pfe-cos-cl-626-5-1" is NOT programmed in HW
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter 
> pfe-cos-cl-631-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
>
> VFP apparently means:
>
> VFP groups: VLAN Filter Processor - pre-ingress Content Aware processor (the 
> first thing in the Broadcom Ingress pipeline). It has maximum 1024 entries. 
> FIP snooping filters for example, belong to this group.
>
> Apparently my QoS config is too complex, which is amusing since it is 
> basically the ezqos-voip template config provided by Juniper:
>
> groups {
> ezqos-voip {
> class-of-service {
> classifiers {
> dscp ezqos-dscp-classifier {
> import default;
> forwarding-class ezqos-voice-fc {
> loss-priority low code-points 101110;
> }
> forwarding-class ezqos-control-fc {
> loss-priority low code-points [ 11 011000 011010 
> 111000 ];
> }
> forwarding-class ezqos-video-fc {
> loss-priority low code-points 100010;
> }
> }
> }
> forwarding-classes {
> class ezqos-best-effort queue-num 0;
> class ezqos-video-fc queue-num 4;
> class ezqos-voice-fc queue-num 5;
> class ezqos-control-fc queue-num 7;
> }
> scheduler-maps {
> ezqos-voip-sched-maps {
> forwarding-class ezqos-voice-fc scheduler 
> ezqos-voice-scheduler;
> forwarding-class ezqos-control-fc scheduler 
> ezqos-control-scheduler;
> forwarding-class ezqos-video-fc scheduler 
> ezqos-video-scheduler;
> forwarding-class ezqos-best-effort scheduler 
> ezqos-data-scheduler;
> }
> }
> schedulers {
> ezqos-voice-scheduler {
> buffer-size percent 20;
> priority strict-high;
> }
> ezqos-control-scheduler {
> buffer-size percent 10;
> priority strict-high;
> }
> ezqos-video-scheduler {
> transmit-rate percent 70;
> buffer-size percent 20;
> priority low;
> }
> ezqos-data-scheduler {
> transmit-rate {
> remainder;
> }
> buffer-size {
> remainder;
> }
> priority low;
> }
> }
> }
> }
> }
> apply-groups ezqos-voip;
> class-of-service {
> interfaces {
> ge-* {
> scheduler-map ezqos-voip-sched-maps;
> unit 0 {
> classifiers {
> dscp ezqos-dscp-classifier;
> }
> }
> }
> mge-* {
> scheduler-map ezqos-voip-sched-maps;
> unit 0 {
> classifiers {
> dscp ezqos-dscp-classifier;
> }
> }
> }
> ae* {
> unit 0 {
> rewrite-rules {
> dscp ezqos-dscp-rewrite;
> }
> }
> }
> }
> rewrite-rules {
> dscp ezqos-dscp-rewrite {
> forwarding-class ezqos-voice-fc {
> loss-priority low code-point 101110;
> }
> forwarding-class ezqos-video-fc {
> loss-priority low code-point 100010;
> }
> }
> }
> }
>
>
> On Thu, Oct 13, 2022 at 09:39:42AM +0300, Saku Ytti wrote:
> > You chose a filter which doesn't seem to complain about TCAM in the
> > initial post. Two filters just state 'not programmed' Others about
> > TCAM? Could you choose another filter which complains about TCAM?
> >
> > But 

Re: [j-nsp] Cannot program filter pfe-cos-cl-631-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries

2022-10-13 Thread Saku Ytti via juniper-nsp
You chose a filter which doesn't seem to complain about TCAM in the
initial post. Two filters just state 'not programmed', the others complain
about TCAM. Could you choose another filter, one which does complain about TCAM?

But certainly that output confirms 'Programmed: NO', it's just not entirely
clear why. Maybe a TCAM issue, maybe an invalid bind-point, an invalid
match, or an invalid action. I am not familiar with the VFP_IL2L3_COS
filter type.

Is this filter you created? What are the terms you expect it to have?
Single term to accept ether-type 0x8100? What actions? What is the
bind point?



On Wed, 12 Oct 2022 at 21:36, Chuck Anderson  wrote:
>
> On Wed, Oct 12, 2022 at 08:40:46AM +0300, Saku Ytti wrote:
> >   - show filter dram
> >   - show filter hw X
> >   - show filter hw X show_term_info
> >
> > I lost a fight with JTAC about whether the TCAM exhausting filter
> > should be a commit failure or not. Argument was along the line 'well
> > you can keep adding routes even if you exhaust TCAM, so this should be
> > the same'.
> > I'm absolutely certain there are many QFX and EX networks out there
> > with wildly different filters programmed than what they believe they
> > have.
>
> Switching platform (2199 Mhz Pentium processor, 511MB memory, 0KB flash)
>
> FPC0(ex4300-48mp vty)# show filter dram
> Name   BytesAllocs Frees  Failures
> ---
> filter  62940  1198   395  0
> filter-halp 0 0 0  0
> ---
>
> Total DFW Dram Usage obtained from global handle:
> Total DFW Dram Usage:   78680 bytes
> Total DFW allocs:   740
> Total DFW frees:0
> Outstanding DFW allocs: 740
>
> Total DFW Dram Usage obtained from all filters:
> Total DFW Dram Usage:   78704 bytes
> Total DFW allocs:   740
> Total DFW frees:0
> Outstanding DFW allocs: 740
>
> FPC0(ex4300-48mp vty)#
> FPC0(ex4300-48mp vty)# show filter
> Program Filters:
> ---
>Index Dir CntText Bss  Name
>   --  --  --  --  
>
> Term Filters:
> 
>IndexSemanticName
>   
>1  Classic   ROUTING-ENGINE
>2  Classic   ROUTING-ENGINE6
>3  Classic   ACCESS-FILTER
>17000  Classic   __default_arp_policer__
>57006  Classic   __jdhcpd__
>57007  Classic   __dhcpv6__
>65008  Classic   __jdhcpd_l2_snoop_filter__
> 16777216  Classic   fnp-filter-level-all
> 46137360  Classic   pfe-cos-cl-610-5-1
> 46137361  Classic   pfe-cos-cl-611-5-1
> 46137362  Classic   pfe-cos-cl-612-5-1
> 46137363  Classic   pfe-cos-cl-613-5-1
> 46137364  Classic   pfe-cos-cl-614-5-1
> 46137365  Classic   pfe-cos-cl-615-5-1
> 46137366  Classic   pfe-cos-cl-616-5-1
> 46137367  Classic   pfe-cos-cl-617-5-1
> 46137368  Classic   pfe-cos-cl-618-5-1
> 46137369  Classic   pfe-cos-cl-619-5-1
> 46137370  Classic   pfe-cos-cl-620-5-1
> 46137371  Classic   pfe-cos-cl-621-5-1
> 46137372  Classic   pfe-cos-cl-622-5-1
> 46137373  Classic   pfe-cos-cl-623-5-1
> 46137374  Classic   pfe-cos-cl-624-5-1
> 46137375  Classic   pfe-cos-cl-625-5-1
> 46137376  Classic   pfe-cos-cl-626-5-1
> 46137377  Classic   pfe-cos-cl-627-5-1
> 46137378  Classic   pfe-cos-cl-628-5-1
> 46137379  Classic   pfe-cos-cl-629-5-1
> 46137380  Classic   pfe-cos-cl-630-5-1
> 46137381  Classic   pfe-cos-cl-631-5-1
> 46137382  Classic   pfe-cos-cl-632-5-1
> 46137383  Classic   pfe-cos-cl-633-5-1
> 46137384  Classic   pfe-cos-cl-634-5-1
> 46137385  Classic   pfe-cos-cl-635-5-1
> 46137386  Classic   pfe-cos-cl-636-5-1
> 46137387  Classic   pfe-cos-cl-637-5-1
> 46137388  Classic   pfe-cos-cl-638-5-1
> 46137389  Classic   pfe-cos-cl-639-5-1
> 46137390  Classic   pfe-cos-cl-640-5-1
> 46137391  Classic   pfe-cos-cl-641-5-1
> 46137392  Classic   pfe-cos-cl-642-5-1
> 46137393  Classic   pfe-cos-cl-643-5-1
> 46137394  Classic   pfe-cos-cl-644-5-1
> 46137395  Classic   pfe-cos-cl-645-5-1
> 46137396  Classic   pfe-cos-cl-646-5-1
> 46137397  Classic   pfe-cos-cl-647-5-1
> 46137398  Classic   pfe-cos-cl-648-5-1
> 46137399  Classic   pfe-cos-cl-649-5-1
> 46137400  Classic   pfe-cos-cl-656-5-1
> 46137401  Classic   pfe-cos-cl-657-5-1
> 46137402  Classic   pfe-cos-cl-658-5-1
> 46137403  Classic   pfe-cos-cl-655-5-1
> 46137404  Classic   pfe-cos-cl-650-5-1
> 46137405  Classic   pfe-cos-cl-651-5-1
> 46137406  Classic   pfe-cos-cl-652-5-1
> 46137407  Classic   pfe-cos-cl-653-5-1
> 46137408  Classic   pfe-cos-cl-654-5-1
> 92274688  Classic   __jdhcpd_l2_dai_filter__
> 125829120  Classic   __jdhcpd_security_dhcpv6_l2_snoop_filter__
> 130023424  Classic   __jdhcpd_security_icmpv6_l2_snoop_filter__
> 142606336  Classic   __jdhcpd_l3_tag__
> 142606337  Classic   __dhcpv6_l3_tag__
>
> Resolve Filters:
> ---
>Index
> 
>
> FPC0(ex4300-48mp vty)#  

Re: [j-nsp] Cannot program filter pfe-cos-cl-631-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries

2022-10-11 Thread Saku Ytti via juniper-nsp
Hey,

Can you please provide
  - show filter dram
  - show filter hw X
  - show filter hw X show_term_info

I lost a fight with JTAC about whether a TCAM-exhausting filter
should be a commit failure or not. The argument was along the lines of 'well,
you can keep adding routes even if you exhaust TCAM, so this should be
the same'.
I'm absolutely certain there are many QFX and EX networks out there
with wildly different filters programmed than what they believe they
have.



On Wed, 12 Oct 2022 at 05:33, Chuck Anderson via juniper-nsp
 wrote:
>
> Has anyone seen these errors and know what the cause is?
>
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Filter : 
> "pfe-cos-cl-624-5-1" is NOT programmed in HW
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Filter : 
> "pfe-cos-cl-626-5-1" is NOT programmed in HW
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter 
> pfe-cos-cl-631-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Filter : 
> "pfe-cos-cl-631-5-1" is NOT programmed in HW
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter 
> pfe-cos-cl-632-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Filter : 
> "pfe-cos-cl-632-5-1" is NOT programmed in HW
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter 
> pfe-cos-cl-633-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Filter : 
> "pfe-cos-cl-633-5-1" is NOT programmed in HW
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter 
> pfe-cos-cl-634-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Filter : 
> "pfe-cos-cl-634-5-1" is NOT programmed in HW
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter 
> pfe-cos-cl-638-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Filter : 
> "pfe-cos-cl-638-5-1" is NOT programmed in HW
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter 
> pfe-cos-cl-647-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Filter : 
> "pfe-cos-cl-647-5-1" is NOT programmed in HW
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter 
> pfe-cos-cl-656-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Filter : 
> "pfe-cos-cl-656-5-1" is NOT programmed in HW
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter 
> pfe-cos-cl-657-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Filter : 
> "pfe-cos-cl-657-5-1" is NOT programmed in HW
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter 
> pfe-cos-cl-655-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter 
> pfe-cos-cl-652-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Filter : 
> "pfe-cos-cl-652-5-1" is NOT programmed in HW
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter 
> pfe-cos-cl-653-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Filter : 
> "pfe-cos-cl-653-5-1" is NOT programmed in HW
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Cannot program filter 
> pfe-cos-cl-654-5-1 (type VFP_IL2L3_COS) -TCAM has 0 free entries
> Oct 11 21:41:02  ex4300-48mp fpc0 DFWE ERROR DFW: Filter : 
> "pfe-cos-cl-654-5-1" is NOT programmed in HW
>
> There is plenty of TCAM space for IRACL/IPACL entries, so this seems to be 
> some issue with a different TCAM partition?
>
> ex4300-48mp> show pfe filter hw summary
>
> Slot 0
>
> Unit:0:
> GroupGroup-ID   Allocated  Used   Free
> ---
> > Ingress filter groups:
>   iRACL group33 2048   1148   900
>   iPACL group25 51212 500
> > Egress filter groups:
>
> Slot 1
>
> Unit:0:
> GroupGroup-ID   Allocated  Used   Free
> ---
> > Ingress filter groups:
>   iRACL group33 2048   1148   900
>   iPACL group25 51212 500
> > Egress filter groups:
>
> ___
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp



-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Flowspec not filtering traffic.

2022-09-19 Thread Saku Ytti via juniper-nsp
I can't blame the port=0 match, even though I agree with the explanation
that you shouldn't rely on it for identifying fragmentation. Looking
at the program, whenever the counter you mentioned
(1x8.2x8.84.34,*,proto=17,port=0) is punched, the packet is also discarded.
And you can observe the counter being punched; therefore, to my
understanding of the PFE programming, the traffic should be discarded.

The first term surprises me a little bit. It basically seems to be 'if
interface-group is 0 or 2-255, permit, otherwise check the next term', and it
doesn't seem to offer any help to you, just a curious detail. Unless
the explanation is that the counter and discard are working as intended,
and the interface you are interested in belongs to interface-group 1,
whose traffic is therefore not being filtered.

For the next step, I would reduce the complexity of your test to a
single term, with just a src or dst IP address match on some test
address, and review.

On Sun, 18 Sept 2022 at 22:05, Gustavo Santos  wrote:
>
> Hi Alexandre,
>
> The detection system throws for example  port 123 and port 0  rules at the 
> same time.
>
> But I got the logic but for example on our flow monitoring system we got 
> 30Gbps of udp flood towards a customer, 25Gbps are from source port 123 and 
> 5gbps are from port 0.
>
> What we get here is that All of the traffic is forwarded to the customer ( 
> 30gbps) instead of being filtered or not being forwarded to the customer´s 
> interface.
>
> I think I can set the detection system to change its behavior from port 0 to 
> udp fragment.
>
> Thanks for your input.
>
> Em dom., 18 de set. de 2022 às 14:25, Alexandre Snarskii  
> escreveu:
>>
>> On Sat, Sep 17, 2022 at 11:41:58AM -0300, Gustavo Santos via juniper-nsp 
>> wrote:
>> > Hi Saku,
>> >
>> > PS: Real ASN was changed to 65000 on the configuration snippet.
>> >
>> >
>> >
>> > show route table inetflow.0 extensive
>> >
>> > 1x8.2x8.84.34,*,proto=17,port=0/term:7 (1 entry, 1 announced)
>>
>> port=0 seems to be poor choice when trying to shut down NTP reflection,
>> with this rule your router filters only small fraction of DDoS traffic..
>>
>> Background:
>> - udp reflection attacks try to generate as much traffic as possible,
>> so amplification attacks usually carry lots of fragmented traffic.
>> - when a non-first fragment enters your router it does not contain a
>> UDP header, so it's reported by netflow as having source and destination
>> ports of zero.
>> - your detection system generates and injects flowspec matching port=0,
>> - now when your router sees the first fragment of an amplified packet, it does
>> not match this rule (source port is 123 and destination port is usually
>> non-zero too), so your router passes this packet.
>> - when your router sees a non-first fragment of an amplified packet,
>> it understands that it knows neither the source nor the destination
>> ports, so it can't compare against this rule, so this packet is
>> not matched and passed too.
>> - so, what is filtered is only those (rare) packets that are
>> first fragments and have a destination port of zero.
>>
>> What you can try here: replace port matching with is-fragment matching.
>> In JunOS syntax it will be
>>
>> set routing-options flow route NTP-AMP match destination 1x8.2x8.84.34/32
>> set routing-options flow route NTP-AMP match protocol udp fragment 
>> is-fragment
>> set routing-options flow route NTP-AMP then discard
>>
>> > TSI:
>> > KRT in dfwd;
>> > Action(s): discard,count
>> > Page 0 idx 0, (group KENTIK_FS type Internal) Type 1 val 0x63b7c098
>> > (adv_entry)
>> >Advertised metrics:
>> >  Flags: NoNexthop
>> >  Localpref: 100
>> >  AS path: [65000 I
>> >  Communities: traffic-rate:52873:0
>> > Advertise: 0001
>> > Path 1x8.2x8.84.34,*,proto=17,port=0
>> > Vector len 4.  Val: 0
>> > *Flow   Preference: 5
>> > Next hop type: Fictitious, Next hop index: 0
>> > Address: 0x5214bfc
>> > Next-hop reference count: 22
>> > Next hop:
>> > State: 
>> > Local AS: 52873
>> > Age: 8w0d 20:30:33
>> > Validation State: unverified
>> > Task: RT Flow
>> > Announcement bits (2): 0-Flow 1-BGP_RT_Background
>> > AS path: I
>> > Communities: traffic-rate:65000:0
>> >
>> > show firewall
>> >
>> > Filter: __flowspec_default_inet__
>> > Counters:
>> > NameBytes
>> >  Packets
>> > 1x8.2x8.84.34,*,proto=17,port=0   19897391083
>> >  510189535
>> >
>> >
>> > BGP Group
>> >
>> > {master}[edit protocols bgp group KENTIK_FS]
>> > type internal;
>> > hold-time 720;
>> > mtu-discovery;
>> > family inet {
>> > unicast;
>> > flow {
>> > no-validate flowspec-import;
>> > }
>> > }
>> > }
>> >
>> >
>> >
>> > Import policy
>> > {master}[edit]
>> > gustavo@MX10K3# edit policy-options policy-statement 

Re: [j-nsp] Flowspec not filtering traffic.

2022-09-18 Thread Saku Ytti via juniper-nsp
Actually I think I'm confused. I'm just not accustomed to seeing anything other
than 0:0 as the rate, but it may be that the first value doesn't matter.

I would verify 'show route flow validation detail', as well as the
presence of policers, if any (in the PFE: 'show filter counters').

I'd also look at the filter more closely at PFE:
- show filter (get the index)
- show filter index X program



On Sun, 18 Sept 2022 at 09:39, Saku Ytti  wrote:
>
> Are you exceeding the configured rate for the policer? Did you expect
> to drop at any rate? The rule sets a non-0 policing rate.
>
> On Sat, 17 Sept 2022 at 17:42, Gustavo Santos  wrote:
> >
> > Hi Saku,
> >
> > PS: Real ASN was changed to 65000 on the configuration snippet.
> >
> >
> >
> > show route table inetflow.0 extensive
> >
> > 1x8.2x8.84.34,*,proto=17,port=0/term:7 (1 entry, 1 announced)
> > TSI:
> > KRT in dfwd;
> > Action(s): discard,count
> > Page 0 idx 0, (group KENTIK_FS type Internal) Type 1 val 0x63b7c098 
> > (adv_entry)
> >Advertised metrics:
> >  Flags: NoNexthop
> >  Localpref: 100
> >  AS path: [65000 I
> >  Communities: traffic-rate:52873:0
> > Advertise: 0001
> > Path 1x8.2x8.84.34,*,proto=17,port=0
> > Vector len 4.  Val: 0
> > *Flow   Preference: 5
> > Next hop type: Fictitious, Next hop index: 0
> > Address: 0x5214bfc
> > Next-hop reference count: 22
> > Next hop:
> > State: 
> > Local AS: 52873
> > Age: 8w0d 20:30:33
> > Validation State: unverified
> > Task: RT Flow
> > Announcement bits (2): 0-Flow 1-BGP_RT_Background
> > AS path: I
> > Communities: traffic-rate:65000:0
> >
> > show firewall
> >
> > Filter: __flowspec_default_inet__
> > Counters:
> > NameBytes  
> > Packets
> > 1x8.2x8.84.34,*,proto=17,port=0   19897391083
> > 510189535
> >
> >
> > BGP Group
> >
> > {master}[edit protocols bgp group KENTIK_FS]
> > type internal;
> > hold-time 720;
> > mtu-discovery;
> > family inet {
> > unicast;
> > flow {
> > no-validate flowspec-import;
> > }
> > }
> > }
> >
> >
> >
> > Import policy
> > {master}[edit]
> > gustavo@MX10K3# edit policy-options policy-statement flowspec-import
> >
> > {master}[edit policy-options policy-statement flowspec-import]
> > gustavo@MX10K3# show
> > term 1 {
> > then accept;
> > }
> >
> > IP transit interface
> >
> > {master}[edit interfaces ae0 unit 10]
> > gustavo@MX10K3# show
> > vlan-id 10;
> > family inet {
> > mtu 1500;
> > filter {
> > inactive: input ddos;
> > }
> > sampling {
> > input;
> > }
> > address x.x.x.x.x/31;
> > }
> >
> >
> > Em sáb., 17 de set. de 2022 às 03:00, Saku Ytti  escreveu:
> >>
> >> Can you provide some output.
> >>
> >> Like 'show route table inetflow.0 extensive' and config.
> >>
> >> On Sat, 17 Sept 2022 at 05:05, Gustavo Santos via juniper-nsp
> >>  wrote:
> >> >
> >> > Hi,
> >> >
> >> > We have noticed that flowspec is not working or filtering as expected.
> >> > Trying a DDoS detection and rule generator tool, and we noticed that the
> >> > flowspec rule is installed,
> >> > the filter counter is increasing , but no filtering at all.
> >> >
> >> > For example DDoS traffic from source port UDP port 123 is coming from an
> >> > Internet Transit
> >> > facing interface AE0.
> >> > The destination of this traffic is to a customer Interface ET-0/0/10.
> >> >
> >> > Even with all information and "show" commands confirming that the traffic
> >> > has been filtered, customer and snmp and netflow from the customer facing
> >> > interface is showing that the "filtered" traffic is hitting the 
> >> > destination.
> >> >
> >> > Is there any caveat or limitation or anyone hit this issue? I tried this
> >> > with two MX10003 routers one with 19.R3-xxx and the other one with 20.4R3
> >> > junos branch.
> >> >
> >> > Regards.
> >> > ___
> >> > juniper-nsp mailing list juniper-nsp@puck.nether.net
> >> > https://puck.nether.net/mailman/listinfo/juniper-nsp
> >>
> >>
> >>
> >> --
> >>   ++ytti
>
>
>
> --
>   ++ytti



-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Flowspec not filtering traffic.

2022-09-18 Thread Saku Ytti via juniper-nsp
Are you exceeding the configured rate for the policer? Did you expect
to drop at any rate? The rule sets a non-0 policing rate.

On Sat, 17 Sept 2022 at 17:42, Gustavo Santos  wrote:
>
> Hi Saku,
>
> PS: Real ASN was changed to 65000 on the configuration snippet.
>
>
>
> show route table inetflow.0 extensive
>
> 1x8.2x8.84.34,*,proto=17,port=0/term:7 (1 entry, 1 announced)
> TSI:
> KRT in dfwd;
> Action(s): discard,count
> Page 0 idx 0, (group KENTIK_FS type Internal) Type 1 val 0x63b7c098 
> (adv_entry)
>Advertised metrics:
>  Flags: NoNexthop
>  Localpref: 100
>  AS path: [65000 I
>  Communities: traffic-rate:52873:0
> Advertise: 0001
> Path 1x8.2x8.84.34,*,proto=17,port=0
> Vector len 4.  Val: 0
> *Flow   Preference: 5
> Next hop type: Fictitious, Next hop index: 0
> Address: 0x5214bfc
> Next-hop reference count: 22
> Next hop:
> State: 
> Local AS: 52873
> Age: 8w0d 20:30:33
> Validation State: unverified
> Task: RT Flow
> Announcement bits (2): 0-Flow 1-BGP_RT_Background
> AS path: I
> Communities: traffic-rate:65000:0
>
> show firewall
>
> Filter: __flowspec_default_inet__
> Counters:
> NameBytes  Packets
> 1x8.2x8.84.34,*,proto=17,port=0   19897391083510189535
>
>
> BGP Group
>
> {master}[edit protocols bgp group KENTIK_FS]
> type internal;
> hold-time 720;
> mtu-discovery;
> family inet {
> unicast;
> flow {
> no-validate flowspec-import;
> }
> }
> }
>
>
>
> Import policy
> {master}[edit]
> gustavo@MX10K3# edit policy-options policy-statement flowspec-import
>
> {master}[edit policy-options policy-statement flowspec-import]
> gustavo@MX10K3# show
> term 1 {
> then accept;
> }
>
> IP transit interface
>
> {master}[edit interfaces ae0 unit 10]
> gustavo@MX10K3# show
> vlan-id 10;
> family inet {
> mtu 1500;
> filter {
> inactive: input ddos;
> }
> sampling {
> input;
> }
> address x.x.x.x.x/31;
> }
>
>
> Em sáb., 17 de set. de 2022 às 03:00, Saku Ytti  escreveu:
>>
>> Can you provide some output.
>>
>> Like 'show route table inetflow.0 extensive' and config.
>>
>> On Sat, 17 Sept 2022 at 05:05, Gustavo Santos via juniper-nsp
>>  wrote:
>> >
>> > Hi,
>> >
>> > We have noticed that flowspec is not working or filtering as expected.
>> > Trying a DDoS detection and rule generator tool, and we noticed that the
>> > flowspec rule is installed,
>> > the filter counter is increasing , but no filtering at all.
>> >
>> > For example DDoS traffic from source port UDP port 123 is coming from an
>> > Internet Transit
>> > facing interface AE0.
>> > The destination of this traffic is to a customer Interface ET-0/0/10.
>> >
>> > Even with all information and "show" commands confirming that the traffic
>> > has been filtered, customer and snmp and netflow from the customer facing
>> > interface is showing that the "filtered" traffic is hitting the 
>> > destination.
>> >
>> > Is there any caveat or limitation or anyone hit this issue? I tried this
>> > with two MX10003 routers one with 19.R3-xxx and the other one with 20.4R3
>> > junos branch.
>> >
>> > Regards.
>> > ___
>> > juniper-nsp mailing list juniper-nsp@puck.nether.net
>> > https://puck.nether.net/mailman/listinfo/juniper-nsp
>>
>>
>>
>> --
>>   ++ytti



-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Flowspec not filtering traffic.

2022-09-17 Thread Saku Ytti via juniper-nsp
Can you provide some output.

Like 'show route table inetflow.0 extensive' and config.

On Sat, 17 Sept 2022 at 05:05, Gustavo Santos via juniper-nsp
 wrote:
>
> Hi,
>
> We have noticed that flowspec is not working or filtering as expected.
> Trying a DDoS detection and rule generator tool, and we noticed that the
> flowspec rule is installed,
> the filter counter is increasing , but no filtering at all.
>
> For example DDoS traffic from source port UDP port 123 is coming from an
> Internet Transit
> facing interface AE0.
> The destination of this traffic is to a customer Interface ET-0/0/10.
>
> Even with all information and "show" commands confirming that the traffic
> has been filtered, customer and snmp and netflow from the customer facing
> interface is showing that the "filtered" traffic is hitting the destination.
>
> Is there any caveat or limitation or anyone hit this issue? I tried this
> with two MX10003 routers one with 19.R3-xxx and the other one with 20.4R3
> junos branch.
>
> Regards.
> ___
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp



-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Outgrowing a QFX5100

2022-09-16 Thread Saku Ytti via juniper-nsp
On Fri, 16 Sept 2022 at 22:12, Jason Healy via juniper-nsp
 wrote:

Hey Jason,

> My question is, what would be the logical "step up" from the qfx on a small 
> network?  I'm thinking the MX240 as it's the smallest router that has 
> redundant REs.  However, I have no experience with the router family (we're 
> all EX/QFX).  I'd consider a newer member of the QFX family, but I'd need to 
> know I'm not going to bump into a bunch of weird "unsupported on this 
> platform" issues.

Yes. I cannot immediately think of any feature that is supported on
EX/QFX but isn't supported on MX.

Broadly speaking, if you are not cost-sensitive and you don't need the
density, always buy an NPU box such as the MX, because it's inherently
more feature-complete.

Pipeline boxes like EX/QFX make sense if you are cost-sensitive or
need high density, can state your requirements ahead of
time, and can run a field trial against those specific requirements. In my
experience, for access providers the requirements are not a knowable
variable, because you will introduce new products during the life
cycle of a device, so you will be carrying additional risk with
pipeline compared to NPU. If you're a cloudy shop or an incumbent telco
you likely can have a frozen set of requirements that are knowable
a priori, which supports the pipeline use-case.

> I'm fine with EOL/aftermarket equipment; we've got a pretty traditional 
> layer-2 spoke-and-hub setup with layer-3 for IRB and a default route to our 
> ISP (no VXLAN, tunneling, etc).  Our campus isn't growing so capacity isn't a 
> huge issue (we're 1g/10g uplinks everywhere, and the 10g aren't close to 
> saturation).  I *might* want 40g as a handoff to an aggregation layer, but 
> that's about it.  Thus, I'm OK with a relative lack of new features.

Your problem is the low-rate interfaces and getting reasonable
support for them. With MX, if you are buying chassis
boxes from the channel, you should only be buying the LC9600, which is 24x400GE;
another alternative is the fixed-config MX304. Both may be highly unsatisfactory
to you in terms of front-plate. The ACX portfolio may offer some middle
ground for you.


-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Tacacs command authorization not working as intended

2022-07-04 Thread Saku Ytti via juniper-nsp
I believe this is the best you can do:

y...@a03.labxtx03.us.bb-re0# show|display set |match deny
set system login class tacacs-user deny-commands "clear pppoe
sessions($| no-confirm$)"

y...@a03.labxtx03.us.bb-re0> clear pppoe sessions ?
Possible completions:
Name of PPPoE logical interface
y...@a03.labxtx03.us.bb-re0> clear pppoe sessions

You can't clear all, but you can clear any.
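
To spell out what that deny regexp matches (a sketch; the interface name
below is hypothetical):

clear pppoe sessions              <- denied ('sessions$')
clear pppoe sessions no-confirm   <- denied (' no-confirm$')
clear pppoe sessions pp0.1234     <- allowed (no match)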


On Mon, 4 Jul 2022 at 17:43, Saku Ytti  wrote:
>
> I don't believe what you're doing is tacacs command authorization, that is 
> junos is not asking the tacacs server if or not it can execute the command, 
> something IOS and SROS can do, but which makes things like loading config 
> very brutal (except SROS has way to skip authorization for config loads).
>
> You are shipping config to the router for its allow-commands/deny-commands. 
> And I further believe behaviour you see is because there is distinction 
> between key and values, and you cannot include values in it. Similar problem 
> with 'apply-groups', because the parser doesn't know about values and you're 
> just telling what exists in the parser tree and what does not.
>
>
>
> On Mon, 4 Jul 2022 at 17:25, Pierre Emeriaud  wrote:
>>
>> Le lun. 4 juil. 2022 à 16:18, Saku Ytti  a écrit :
>> >
>> > I don't believe Junos has tacacs command authorization.
>>
>> it has. This sorta works, I've been able to allow some commands like
>> 'clear network-access aaa subscriber username.*' and 'monitor
>> traffic'. The issue I have is with 'clear pppoe sessions pp0'.
>>
>> When providing 'clear' to the user I can make it work, but I also have
>> to forbid all other clear commands I don't want.
>>
>> foo@bar> show cli authorization
>> Current user: 'GEN-USR-N' login: 'foo' class 'GEN-PROF-N'
>> Permissions:
>> clear   -- Can clear learned network info
>> (...)
>> Individual command authorization:
>> Allow regular expression: (clear pppoe sessions pp0.*|clear
>> network-access aaa subscriber username.*|monitor traffic.*)
>> Deny regular expression: (request .*|file .*|save .*|clear
>> [a-o].*|clear [q-z].*|clear p[^p].*)
>>
>>
>> foo@bar> clear ?
>> Possible completions:
>>   network-access   Clear network-access related information
>>   ppp  Clear PPP information
>>   pppoeClear PPP over Ethernet information
>>
>> And one can reset all pppoe sessions while I only allowed 'pppoe
>> session pp0.*' :
>> foo@bar> clear pppoe sessions ?
>> Possible completions:
>>   <[Enter]>Execute this command
>> Name of PPPoE logical interface
>>
>> login configuration for your information:
>> foo@bar> show configuration system login
>> class GEN-PROF-N {
>> idle-timeout 15;
>> }
>> user GEN-USR-N {
>> full-name "TACACS centralized command authorization";
>> uid 2006;
>> class GEN-PROF-N;
>> }
>
>
>
> --
>   ++ytti



-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Tacacs command authorization not working as intended

2022-07-04 Thread Saku Ytti via juniper-nsp
I don't believe what you're doing is tacacs command authorization; that is,
Junos is not asking the tacacs server whether or not it can execute the command,
something IOS and SROS can do, but which makes things like loading config
very brutal (except SROS has a way to skip authorization for config loads).

You are shipping config to the router for its allow-commands/deny-commands.
And I further believe the behaviour you see is because there is a distinction
between keys and values, and you cannot include values in it. It's a similar
problem with 'apply-groups', because the parser doesn't know about values
and you're just telling it what exists in the parser tree and what does not.



On Mon, 4 Jul 2022 at 17:25, Pierre Emeriaud  wrote:

> Le lun. 4 juil. 2022 à 16:18, Saku Ytti  a écrit :
> >
> > I don't believe Junos has tacacs command authorization.
>
> it has. This sorta works, I've been able to allow some commands like
> 'clear network-access aaa subscriber username.*' and 'monitor
> traffic'. The issue I have is with 'clear pppoe sessions pp0'.
>
> When providing 'clear' to the user I can make it work, but I also have
> to forbid all other clear commands I don't want.
>
> foo@bar> show cli authorization
> Current user: 'GEN-USR-N' login: 'foo' class 'GEN-PROF-N'
> Permissions:
> clear   -- Can clear learned network info
> (...)
> Individual command authorization:
> Allow regular expression: (clear pppoe sessions pp0.*|clear
> network-access aaa subscriber username.*|monitor traffic.*)
> Deny regular expression: (request .*|file .*|save .*|clear
> [a-o].*|clear [q-z].*|clear p[^p].*)
>
>
> foo@bar> clear ?
> Possible completions:
>   network-access   Clear network-access related information
>   ppp  Clear PPP information
>   pppoeClear PPP over Ethernet information
>
> And one can reset all pppoe sessions while I only allowed 'pppoe
> session pp0.*' :
> foo@bar> clear pppoe sessions ?
> Possible completions:
>   <[Enter]>Execute this command
> Name of PPPoE logical interface
>
> login configuration for your information:
> foo@bar> show configuration system login
> class GEN-PROF-N {
> idle-timeout 15;
> }
> user GEN-USR-N {
> full-name "TACACS centralized command authorization";
> uid 2006;
> class GEN-PROF-N;
> }
>


-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Tacacs command authorization not working as intended

2022-07-04 Thread Saku Ytti via juniper-nsp
I don't believe Junos has tacacs command authorization.

You can do allow/deny-commands regexps in the user class to achieve the
same without introducing the RTT lag.

On Mon, 4 Jul 2022 at 15:52, Pierre Emeriaud via juniper-nsp <
juniper-nsp@puck.nether.net> wrote:

> Hi
>
> i've been trying to authorize 'clear pppoe session pp0.*' for some of
> our users. They already have some allowed commands such as 'monitor
> traffic' and 'clear network-access aaa subscriber username' that
> works, but 'clear pppoe' is refused.
>
> foo@bar> clear ppp?
> No valid completions
>
> foo@bar> clear pppoe
>^
> syntax error, expecting .
>
>
> Here are their rights on the box. They don't have 'clear' permissions
> as I'd rather allow one command than refuse all the others.
>
> foo@bar> show cli authorization
> Current user: 'GEN-USR-N' login: 'foo' class 'GEN-PROF-N'
> Permissions:
> configure   -- Can enter configuration mode
> interface   -- Can view interface configuration
> network -- Can access the network
> routing -- Can view routing configuration
> trace   -- Can view trace file settings
> trace-control-- Can modify trace file settings
> view-- Can view current values and statistics
> view-configuration-- Can view all configuration (not including secrets)
> Individual command authorization:
> Allow regular expression: (clear pppoe sessions pp0.*|clear
> network-access aaa subscriber username.*|monitor traffic.*)
> Deny regular expression: (request .*|file .*|save .*|clear log .*)
> Allow configuration regular expression: (protocols pppoe
> traceoptions|system processes smg-service traceoptions|system
> processes general-authentication-service traceoptions|protocols
> ppp-service traceoptions|services l2tp traceoptions)
> Deny configuration regular expression: none
>
> And the tacacs configuration:
>
>   match = @RouterBNG {
> # ReadOnlyDebug
> service = junos-exec {
> local-user-name = GEN-USR-N
> user-permissions = "configure interface network routing trace
> trace-control view view-configuration"
> deny-commands = "request .*|file .*|save .*|clear log .*"
> allow-commands = "clear pppoe sessions pp0.*|clear network-access
> aaa subscriber username.*|monitor traffic.*"
> allow-configuration = "(protocols pppoe traceoptions|system
> processes smg-service traceoptions|system processes
> general-authentication-service traceoptions|protocols ppp-service
> traceoptions|services l2tp traceoptions)"
> }
>   }
>
> options I've tried:
> allow-commands = "(monitor traffic.*)|(clear pppoe sessions
> pp0\..*)|(clear network-access aaa subscriber username.*)"
> allow-commands = "monitor traffic.*|clear pppoe sessions pp0.*|clear
> network-access aaa subscriber username.*"
> allow-commands = "monitor traffic|clear pppoe sessions pp0\..*|clear
> network-access aaa subscriber username"
> allow-commands = "clear pppoe sessions pp0.*|clear network-access aaa
> subscriber username.*|monitor traffic.*"
>
>
> Is there a way without providing 'clear' permission? 'clear
> network-access' works even without it...
>
> thanks,
> pierre
> ___
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp
>


-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] GRE tunnels on a QFX10002-60C

2022-06-24 Thread Saku Ytti via juniper-nsp
On Fri, 24 Jun 2022 at 10:54, Mark Tinka via juniper-nsp
 wrote:
> After failing to get Netscout to
> natively support IS-IS, we came up with

> a rather convoluted - but elegant - way to transport on-ramp/off-ramp
> traffic into and out of our scrubbers.
>
> Basically, we use lt-* (logical tunnel) interfaces that sit both in the
> global table and a VRF. We loop them to each other, and use IS-IS + BGP
> + LDP to tunnel traffic natively using MPLS-based LSP's signaled by LDP
> (as opposed to GRE), so that traffic an always follow the best IS-IS +
> iBGP path, without the hassle of needing to run GRE between routers and
> scrubbers.

Many ways to skin the cat. If you can dedicate a small router to the
scrubber (or a routing-instance if you can't) and you run BGP-LU, you
avoid the useless egress IP lookup: you just ensure that the scrubber PE
or scrubber instance doesn't have the more-specific routes, and then traffic
will follow the BGP-LU path to the egress CE.
You can scrub any and all prefixes without any scale implications, as
you never need to touch the network to handle clean traffic.

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] GRE tunnels on a QFX10002-60C

2022-06-24 Thread Saku Ytti via juniper-nsp
Tunnel interfaces are not supported on PE/Paradise, and I don't think this
changed in BT/Triton either.

However you can decapsulate/encapsulate on ingress firewall filter, e.g.:

term cleanPipe:xe-0-4-1-1 {
from {
source-address {
a.b.c.d/32;
}
destination-address {
e.f.g.h/30;
}
protocol gre;
}
then {
count cleanPipe:xe-0-4-1-1;
decapsulate gre routing-instance xe-0-4-1-1;
}
}

Here traffic coming from a specific source address, going to a
specific destination link using IP protocol 'GRE' is being counted,
accepted and decapsulated into a routing-instance.
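
The encapsulation direction works similarly, via a tunnel template (a
sketch from memory, on platforms that support filter-based encap, e.g.
Trio; names and addresses are assumptions):

firewall {
    tunnel-end-point toScrubber {
        ipv4 {
            source-address a.b.c.d;
            destination-address e.f.g.h;
        }
        gre;
    }
    family inet {
        filter divertDirty {
            term dirty {
                then encapsulate toScrubber;
            }
        }
    }
}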


In many ways filter-based decapsulation is actually preferable to
interface-based, so I have no large qualms here. What I'd actually want is
egress filter decap, instead of ingress. Then I could point my GRE
tunnels to random addresses in the customer network, and have in my edge
filters a static decap statement which is never updated, like 'from
scrubber/32 to anywhere, protocol gre, decap'. This way my scrubber
would launch GRE tunnels to any address at the customer site, routing
would follow the best BGP path to egress, and just at the last moment the
packet would get decapped.

On Fri, 24 Jun 2022 at 00:24, Jon Lewis via juniper-nsp
 wrote:
>
> I've got an open support case with Juniper, but as it's gotten nowhere
> since opening it last night, I figured I'd try some crowdsourcing :)
>
> Does anyone have working GRE tunnels terminated to a QFX10002-60C?  We've
> got a GRE tunnel mesh of several dozen sites, using a mix of Arista 7280s
> and Juniper QFX5120s to terminate the tunnels.  We're trying to add a
> couple of new sites to the mesh where the tunnels will live on
> QFX10002-60C.  What we're seeing with the QFX10002-60C is, locally
> generated traffic (i.e. ping from the QFX10002-60C to an IP reachable via
> a gr-0/0/0.XX interface) works, but traffic from another device in the POP
> that needs to transit a QFX10002-60C which should then route the traffic
> via a gr-0/0/0.XX interface is dropped.
>
> I'm trying to figure out if there's something special about the
> QFX10002-60C that requires some config knob not needed on QFX5120 or if
> GRE is just broken on the QFX10002-60C.  The QFX10002-60C are running
> 20.4R3.8.
>
> --
>   Jon Lewis, MCP :)   |  I route
>   StackPath, Sr. Neteng   |  therefore you are
> _ http://www.lewis.org/~jlewis/pgp for PGP public key_
> ___
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp



-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] Request for data from PTX PE/Paradise (e.g. PTX1000, LC1101) operators

2022-06-17 Thread Saku Ytti via juniper-nsp
Hi,

I'd like to return to this topic. I was confused earlier,
misattributing the issue I'm seeing to MX. And now it is clear it must
be on PTX.

I'd again solicit input from anyone seeing the following in their syslogs:
  a) junos: 'received pdu - length mismatch for lacp'
  b) iosxr: 'ROUTING-ISIS-4-ERR_BAD_PDU_LENGTH' or
'ROUTING-ISIS-4-ERR_BAD_PDU_FORMAT'
  c) or any other NOS which might log this, when another end is PTX

The issue is very rare, so ideally you'd look at syslog for all
periods you've had Paradise in the network.

Thanks again,
-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] Request for data from Trio EA (e.g. LC2101, MX204, etc) operators

2022-05-19 Thread Saku Ytti via juniper-nsp
Hey,

I'd like Trio EA operators to verify two things for me

a) Bad LACP PDU on both sides of EA link
- in syslog something like this:
- kernel: et-0/0/0: received pdu - length mismatch for lacp : len 143, pdu 124

b) L3 incompletes increasing on backbone facing interface on both
sides of EA link (local and egress PE if MPLS encapped by EA when
sending out)
- visible also in standard IF-MIB errors counter if you poll that

Do you see either/both coinciding with the introduction or upgrade of
EA devices? If so I'd like to look a bit deeper into that, but I'd also
appreciate it if you just tell me 'we see it, unfortunately we do not at
this time have time to investigate'.

Thanks!
-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Junos 20 - slow RPD

2022-03-22 Thread Saku Ytti via juniper-nsp
Hey,

> On MX204 with ~4M routes, after upgrading from 18.2 to 20.2 the RPD is
> way slower in processing BGP policies and sending the routes to neighbors.
> For example, on a BGP group with one neighbor and an export policy
> containing 5 terms each matching a community it takes ~1min ( 100% RPD
> utilisation ) to send 1k routes to the neighbor in 20.2 compared to 15s
> in 18.2.
> Disabling terms will reduce the time.
>
> Anyone experienced something similar?

I don't recognise this problem specifically. It seems a rather terrible
regression, so you should probably either open a JTAC case or do the
Junos dance. If you have a large RIB/FIB ratio, allowing more than one
core to work on BGP will produce an improvement:

set system processes routing bgp rib-sharding number-of-shards 4
set system processes routing bgp update-threading

This is a disruptive change. JNPR wanted us on 20.3 (we are on
20.3R3-S2) for rib-sharding, but we did run it previously on 20.2R3-S3
with success. We are currently targeting 21.4R1-S1.

If you have memory pressure, you can expand the default 16GB DRAM to
24GB DRAM via a configuration toggle (post 21.2R1). If you are
comfortable hacking the QEMU/KVM config manually, you can do it on any
release and entertain other sizes.

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Juniper CoS - Classifiers specifically

2022-03-16 Thread Saku Ytti via juniper-nsp
Hey Aaron,

> I'm wondering if the BA classifier stops working once an MFC is applied.  It
> sure seems to in testing.  I feel like I've seen a diagram at some point or
> document stating that MFC comes before BA in the CoS process chain. but I'm
> not sure.  If anyone has that link/doc please send it.  I'd like to know for
> sure.

The implied default classifier is there until something else is
configured. As you say, you can review what is currently applied with
'show class-of-service interface'. And yes, firewall-based
classification is done after the CoS classifier, so firewall-based
classification overrides what the CoS configuration classified the packet
to. You can use this to accomplish QPPB: for example, instead of BGP-based
blackholing, you'd have a BGP-based class downgrade for some
specifically selected SADDR or DADDR, signalled by BGP.
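
A minimal sketch of such a firewall-based (multifield) classifier
overriding the BA result (filter, prefix-list and class names are
assumptions):

set firewall family inet filter MFC-IN term downgrade from source-prefix-list DOWNGRADE
set firewall family inet filter MFC-IN term downgrade then forwarding-class scavenger
set firewall family inet filter MFC-IN term downgrade then loss-priority high
set firewall family inet filter MFC-IN term downgrade then accept
set firewall family inet filter MFC-IN term default then accept

Packets hitting the 'downgrade' term end up in the scavenger class no
matter what the BA classifier decided a step earlier.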

> Oh, btw, were in the world is all this default CoS stuff derived from?  I'd
> like to think it's in a file somewhere that I can see in shell perhaps.  But
> maybe not.  Maybe it's actually compiled into the Junos operating systems
> itself.  Or is there a way to see "show configuration" with a special option
> that shows automatic/default stuff like all this CoS info?

I believe they are compiled in. Juniper also has a more
appropriate way to inject defaults, visible via 'show configuration groups
junos-defaults', but that is not being used here. This is
the common case: for any NOS vendor, defaults are typically compiled
in, not injected via some common configuration scheme. In many cases
this is mandatory, because having no default is impossible; you
cannot not have an MTU, for example.

The standard QoS config in Junos allows any internet user to have
their own protected 5% via class selectors 6 and 7, potentially
disrupting your signalling protocols. I consider all Junos devices
misconfigured if the QoS policy for edge interfaces is not explicitly
defined by the operator.
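
One way to close that hole is an explicit classifier on untrusted edges
that remaps CS6/CS7 into best-effort (a sketch; names and the interface
are assumptions):

set class-of-service classifiers dscp UNTRUSTED-EDGE import default
set class-of-service classifiers dscp UNTRUSTED-EDGE forwarding-class best-effort loss-priority low code-points [ cs6 cs7 ]
set class-of-service interfaces ge-0/0/0 unit 0 classifiers dscp UNTRUSTED-EDGE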

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Marking/shaping UDP reflection traffic

2022-03-09 Thread Saku Ytti via juniper-nsp
On Wed, 9 Mar 2022 at 19:48, Gert Doering via juniper-nsp
 wrote:

> We use different classes for UDP/123, UDP/53 (exclude well-known
> recursives), fragments, ... and are currently using between 20 and 100
> mbit/s for these classes.  What is the right number for you depends
> on "how much can your customers stomach?" and "how much do you see
> under normal conditions?".

We do the same, but we classify protocols into two classes, 'important'
and 'unimportant'. Unimportant being protocols we deem not to be used
in reality for anything but abuse, and important being dual-use.
'Unimportant' gets policed at port level outright, and 'important'
gets two-coloured at port level, so that exceeding traffic gets downgraded
below BE.
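
A sketch of the 'important' leg of that scheme (rates, names and the
scavenger class are assumptions):

set firewall policer IMPORTANT-2C if-exceeding bandwidth-limit 50m burst-size-limit 625k
set firewall policer IMPORTANT-2C then forwarding-class scavenger
set firewall policer IMPORTANT-2C then loss-priority high
set firewall family inet filter EDGE-IN term dns-in from protocol udp
set firewall family inet filter EDGE-IN term dns-in from source-port 53
set firewall family inet filter EDGE-IN term dns-in then policer IMPORTANT-2C
set firewall family inet filter EDGE-IN term dns-in then accept

In-contract traffic keeps whatever class the BA classifier gave it;
out-of-contract traffic is downgraded below BE instead of dropped.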

Answering 'what rate is right' is difficult without better understanding
how you are policing, where, and what your access ports usually
look like. Do remember that JNPR policers are per-NPU by
default, unlike CSCO's, which are per-interface, where per-NPU
is not even a configurable option.

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Cut through and buffer questions

2021-11-19 Thread Saku Ytti via juniper-nsp
On Fri, 19 Nov 2021 at 17:12, Thomas Bellman via juniper-nsp
 wrote:

> Cut-through actually *can* help a little bit.  The buffer space in
> the Trident and Tomahawk chips is mostly shared between all ports;
> only a small portion of it is dedicated per port[1].  If you have
> lots of traffic on some ports, with little or no congestion,
> enabling cut-through will leave more buffer space available for
> the congested ports, as the packets will leave the switch/router
> quicker.

Correct, you can save packetSize * egressInts of buffer with
cut-through. So if you have 48 ports and we assume 1500B frames, you
can save 72kB of buffer space.

> One should note though that these chips will fall back to store-
> and-forward if the ingress port and egress port run at different

I had hoped this was obvious, when I mentioned the percentage of
frames getting cut-through. And strictly speaking, it is not 'these
chips', you cannot implement cut-through without store-and-forward.
You'd end up dropping most of the traffic in all but very esoteric
topology/scenario.

--
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Cut through and buffer questions

2021-11-19 Thread Saku Ytti via juniper-nsp
On Fri, 19 Nov 2021 at 11:50, james list  wrote:

> I also understood cut through cannot help but obviously I cannot change QFX 
> switches because we loss few udp packets for a single application, the idea 
> could be to change shared buffers for unused queues and add to used one, 
> correct ?

Yes. Anything you can do to
  a) increase buffer (traditionally in Catalyst and EX you can win quite a
bit more buffer by removing queues)
  b) increase egress rate (LACP to the host may help)

will help a little bit.

> Based on the output provided what you suggest to change ?
> I also understand this kind of change is traffic affecting.

I'm not familiar with QFX tuning, but it should be fairly easy to find
and test how you can increase buffers. I think your goal #1 should be
to move to a single BE queue and try to assign everything there; the
secondary goal is to add another high-priority class and give it a
little bit of buffer. Something like the sketch below.
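
A minimal sketch of that direction, assuming MX/EX-style CoS syntax
(class names and percentages are illustrative; QFX shared-buffer knobs
differ, so verify against your platform's CoS guide):

class-of-service {
    schedulers {
        BE-ALL {
            transmit-rate percent 95;
            buffer-size percent 95;
        }
        NC-SMALL {
            transmit-rate percent 5;
            buffer-size percent 5;
        }
    }
    scheduler-maps {
        SINGLE-BE {
            forwarding-class best-effort scheduler BE-ALL;
            forwarding-class network-control scheduler NC-SMALL;
        }
    }
}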

> I also need to understand how shared buffer queues on QFX are attached to COS 
> queues.

Yes. I also don't know this, and I'm not sure how much room for
tinkering there is. I know that in Catalyst and EX some gains over the
default config can be made, which yield a significant improvement when
boxes have been deployed in the wrong application.

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Cut through and buffer questions

2021-11-19 Thread Saku Ytti via juniper-nsp
On Fri, 19 Nov 2021 at 10:49, james list  wrote:

Hey,

> I'll try to rephrase the question you did not understand: if I enable 
> cut-through or change buffers, is it traffic-affecting?

There is no cut-through, and I was hoping that after reading the
previous email you'd understand why it won't help you at all and why
it isn't even desirable. Changing the QoS config may be
traffic-affecting, but you likely do not have the monitoring
capability to observe it.

> Regarding the drops here the outputs (15h after clear statistics):

You talked about MX, so I answered from an MX perspective. But your
output is not from an MX.

The device you actually show has exceedingly tiny buffers and is not
meant for Internet WAN use; that is, it does not expect the sender
rate to be significantly higher than the receiver rate with high RTT.
It is meant for datacenter use, where RTT is low and the speed delta
is small.

In real life Internet you need larger buffers because of this
senderPC => internets => receiverPC

Let's imagine an RTT of 200ms, a 10GE receiver and a 100GE sender.
- 10Gbps * 200ms = 250MB TCP window needed to fill the pipe
- as TCP windows grow exponentially in the absence of loss, you could
see a 128MB => 250MB growth step
- this means the senderPC might serialise 128MB of data at 100Gbps
- this 128MB you can only send on at 10Gbps, the rest you have to take
into the buffers
- intentionally pathological example
- the 'easy' fix is that the sender doesn't burst the data at its own
rate, but estimates the receiver rate and sends the window growth at
that rate; this practically removes the buffering need entirely
- the 'easy' fix is not standard behaviour, but some cloudyshops
thankfully configure their Linux like this (Linux already does
bandwidth estimation, and you can ask 'tc' to pace the session to the
estimated bandwidth)
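
A hedged sketch of that host-side fix on Linux (interface name is
illustrative): fq paces flows at the kernel's per-socket pacing rate,
and BBR sets that rate from its own bandwidth estimate.

tc qdisc replace dev eth0 root fq
sysctl -w net.ipv4.tcp_congestion_control=bbr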

What you need to do is change the device to one that is intended for
the application you have.
If you can do anything at all, what you can do is ensure that you have
the minimum number of QoS classes and that those classes have the
maximum amount of buffer, so that unused queues aren't holding empty
memory while a used queue is starving. But even this will have only a
marginal benefit.

Cut-through does nothing here, because your egress is congested; you
can only get cut-through when the egress is not congested.

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Cut through and buffer questions

2021-11-18 Thread Saku Ytti via juniper-nsp
On Thu, 18 Nov 2021 at 23:20, james list via juniper-nsp
 wrote:

> 1) is the MX family switching by default in cut-through or store-and-forward
> mode? I was not able to find clear information

Store and forward.

> 2) is in general (on MX or QFX) jeopardizing the traffic the action to
> enable cut through or change buffer allocation?

I don't understand the question.

> I have some output discard on an interface (class best effort) and some UDP
> packets are lost hence I am tuning to find a solution.

I don't see how this relates to cut-through at all.

Cut-through works when ingress can start writing the frame to egress
while still reading it; this is ~never the case in multistage
ingress+egress buffered devices. And even in devices where it is the
case, it only works if the egress interface happens not to be
serialising a packet at that moment, so the percentage of frames
actually getting cut-through behaviour in cut-through devices is low
in typical applications; applications where it is high could likely
have been replaced by a direct connection.
Modern multistage devices have low single-digit microseconds of
internal latency and nanoseconds of jitter. One microsecond is about
200m in fiber, so that gives you the scale of how much distance you
can compensate for by reducing the delay incurred by a multistage
device.

Now, having said that, what actually is the problem? What are 'output
discards', i.e. which counter are you looking at? Have you modified
the QoS configuration, and can you share it? By default JNPR is 95%
BE, 5% NC (unlike Cisco, which is 100% BE, which I think is the better
default), and the buffer allocation is split the same way. So if you
are actually QoS tail-dropping in the default JNPR configuration,
you're creating massive delays, because the buffer allocation is huge;
your problem is rather simply that you're offering too much to the
egress, and the best you can do is reduce the buffer allocation to
lower the collateral damage.
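
If you do go down that path, a hedged sketch (values illustrative):
capping the queue's buffer temporally bounds the worst-case queueing
delay on the congested port.

class-of-service {
    schedulers {
        BE-SHALLOW {
            transmit-rate percent 95;
            buffer-size temporal 25k;
        }
    }
}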

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] ISSU offlined mpc - why?

2021-09-01 Thread Saku Ytti via juniper-nsp
On Wed, 1 Sept 2021 at 20:35, Chuck Anderson via juniper-nsp
 wrote:

> Eventually during the ISSU process, the line card software needs to be
> upgraded.  During that part, each line card goes offline one at a
> time.  If you have multiple line cards, design your network such that
> redundant network paths are connected across different cards to
> prevent a total outage when each line card is upgraded one-by-one.

Disclaimer: I do not use ISSU nor do I plan to use it; it is complex,
and I've been hurt before by non-obvious failure modes after a box was
left in an unknown state. This is a vendor-agnostic fear I have, and I
am currently not seeking help to overcome it.

https://www.juniper.net/documentation/us/en/software/junos/high-availability/topics/topic-map/unified-issu-enhanced-mode.html
Enhanced mode is an in-service software upgrade (ISSU) option
available on MPC8E, MPC9E, and MPC11E line cards that eliminates
packet loss during the unified ISSU process

What actually changes MPC behaviour is mostly the microcode that the
lookup engines run; in Trio these are collections of identical cores
called PPEs (packet processing engines). In theory we could take one
PPE at a time out of service and restart it with the new ucode until
we're all done, keeping (N-1)/N of nominal PPS capacity during the
upgrade (I think EA has 96 PPEs, so ~99% during the upgrade).
And in fact this is what 'hyper mode' does. Trio has a ucode-compatible
history of over a decade; without hyper mode the PPEs run the old,
decade-plus accumulation of code, with hyper mode they run a new
rewrite of the ucode. Newer hardware is hyper-mode only; on older
hardware it is the operator's choice which generation of ucode to run.

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Looking for Hints: Best Practices to PUSH prefix-list on MX platform with 16.x and UP

2021-08-13 Thread Saku Ytti via juniper-nsp
You could have something like this:

groups {
  IRR {
 ...
   }
}

Then always generate the complete new prefix lists in your NMS into a single file.

And have script do:

edit groups
delete IRR
load merge https://nms/irr.junos
commit and-quit
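
(Assumptions in the sketch above: irr.junos contains the full
'groups { IRR { ... } }' hierarchy, since a plain 'load merge' applies
at the top of the configuration; and the group is referenced
somewhere, e.g. 'set apply-groups IRR', so the prefix-lists are
actually visible to policy.)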


On Thu, 12 Aug 2021 at 21:47, Alain Hebert via juniper-nsp
 wrote:
>
> Context
>
>  I'm looking for a *simple* & safe way to manage daily IRR changes
> from my customers...
>
>  Right now it's a simple script that pushes changes using command lines
> thru SSH...
>
>  While it is working adequately, I wonder how long it will be
> feasible =D with the current growth.
>
>
> Solution
>
>  As for their REST API, I remember someone having some issues where
> the RE kept rebooting and took down their entire OP for a few hours...
>
>  . Can anyone testify to the solidity of their RESTful API?
>
>  . Should we bump up the production version to something newer?
>
>  PS: Security-wise we're fine, anything related to management is
> tightly pinned to an OOB with MFA and high encryption =D.
>
>
>  Thanks for your time.
>
> --
>
> -
> Alain Hebert    aheb...@pubnix.net
> PubNIX Inc.
> 50 boul. St-Charles
> P.O. Box 26770 Beaconsfield, Quebec H9W 6G7
> Tel: 514-990-5911    http://www.pubnix.net    Fax: 514-990-9443
>
> ___
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp



-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] How many bits/bytes of a packet can be matched in a firewall rule on Juniper MX-series?

2021-07-09 Thread Saku Ytti via juniper-nsp
On Fri, 9 Jul 2021 at 13:24, embolist  wrote:

> So, I can match a bit pattern within the first 256 bytes from the start of 
> the IP header, is that correct?
> How many bits can I match within that first 256 bytes?

You can set the match-start at L3, L4 or payload and take a 256-byte
offset from that. The documents say 128 bits:
    Length of integer input (1..32 bits), optional length of string
input (1..128 bits)
https://kb.juniper.net/InfoCenter/index?page=content&id=KB34222&cat=MX_SERIES&actp=LIST

What have you done thus far? This seems eminently testable.
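
If it helps, a hedged starting point (all values are illustrative):
match 32 bits at byte offset 4 from the start of the L4 header.

firewall {
    family inet {
        filter FLEX-TEST {
            term t1 {
                from {
                    flexible-match-mask {
                        match-start layer-4;
                        byte-offset 4;
                        bit-length 32;
                        mask-in-hex 0xffffffff;
                        prefix 0xdeadbeef;
                    }
                }
                then count flex-hit;
            }
            term rest {
                then accept;
            }
        }
    }
}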


--
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] How many bits/bytes of a packet can be matched in a firewall rule on Juniper MX-series?

2021-07-08 Thread Saku Ytti via juniper-nsp
Hey,

I'm not sure I can parse what you are asking. I thought you were
asking how far into the packet you can match with flexible-match-mask;
I can commit up to a 255-byte offset, but didn't test it. I know the
original Trio gets about 320B of the packet in the LU, but newer Trios
get a little bit less.

Whenever MQ sends a packet to LU for lookup, if it is able to send the
entire packet, it sets the parcel type to M2L_Packet; if it cannot
send the entire packet, it sends the first N bytes and sets the parcel
type to M2L_PacketHead.
Therefore if you ping through a quiet Trio and increase the packet
size byte by byte, once you see the counter shift from M2L_Packet to
M2L_PacketHead, you've found the value of N (a sketch of such a probe
follows after the counter output below).

You can review these counters on modern Trio via 'show mqss N lo
stats', such as:
IMPC2(r33.labxtx01.us.bb-re0 vty)# show mqss 0 lo stats
LO Block  Parcel Name     Counter Name          Total        Rate
------------------------------------------------------------------
0         M2L_Packet      Parcels sent to LUSS  8194632996   3479 pps
0         M2L_PacketHead  Parcels sent to LUSS  22929007899  7559 pps
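
A hedged sketch of the probe itself from the Junos CLI (target address
and sizes are illustrative); run it while watching which of the two
parcel counters above is incrementing:

ping 192.0.2.1 rapid count 1000 size 300
ping 192.0.2.1 rapid count 1000 size 301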


But seeing you included a question about filter chaining, I'm not sure
I understood your question right.


On Fri, 9 Jul 2021 at 03:21, embolist via juniper-nsp
 wrote:
>
>
>
>
> -- Forwarded message --
> From: embolist 
> To: "juniper-nsp@puck.nether.net" 
> Cc:
> Bcc:
> Date: Fri, 09 Jul 2021 00:15:11 +
> Subject: How many bits/bytes of a packet can be matched in a firewall rule on 
> Juniper MX-series?
> I'm trying to figure out how many bits/bytes of a packet I can match on in a 
> firewall rule for a Juniper MX router. A lot of the documentation talks about 
> a 128-bit match criteria, but then I see some examples which seem to imply 
> that I can do multi-term matching, chaining match criteria together.
>
> Am I understanding this correctly? If so, how many 128-bit matching criteria 
> can I chain together? Or am I totally misunderstanding?
>
> I'm a Juniper n00b (as if you couldn't tell), and would really appreciate any 
> pointers. The documentation just doesn't seem to contain any information on 
> how much of a packet I can match.
>
> ___
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp



-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp