Re: [j-nsp] VMX integrated FPC

2020-12-21 Thread James Bensley
On Mon, 21 Dec 2020 at 09:56, Mark Tees  wrote:
>
> Hello
>
> I remember when I originally got my mittens on VMX there was a boot
> flag to tell it to use an integrated FPC or integrated RIOT without a
> separate VM running forwarding. I can't find my notes on that.
>
> Does anyone know if that's still possible? I just want a pretend/low
> performance/fake FPC ideally.

Hi Mark,

Are you thinking of this?

"set chassis fpc 0 lite-mode" - requires a reboot to take effect.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] LAG/ECMP hash performance

2019-11-26 Thread James Bensley
On Wed, 28 Aug 2019 at 08:21, Saku Ytti  wrote:
> SRC: (single 100GE interface, single unit 0)
>   23.A.B.20 .. 23.A.B.46
>   TCP/80
> DST: (N*10GE LACP)
>   157.C.D.20 .. 157.C.D.35
>   TCP 2074..65470 (RANDOM, this alone, everything else static, should
> have guaranteed fair balancing)
>
> I'm running this through IXIA and my results are:
>
> 3*10GE Egress:
>   port1 10766516pps
>   port2 10766543pps
>   port3  7536578pps
> after (set forwarding-options enhanced-hash-key family inet
> incoming-interface-index)
>   port1 9689881pps
>   port2 11791986pps
>   port3 5383270pps
> after removing s-int-index and setting adaptive
>   port1 9689889pps
>   port2 9689892pps
>   port3 9689884pps
>
> I think this supports that the hash function diffuses poorly. It
> should be noted that 2nd step adds entirely _static_ bits to the input
> of the hash, source interface does not change. And it's perfectly
> repeatable. This is to be expected, the most affected weakness bits
> shift, either making the problem worse or better.
> I.e. flows are 100% perfectly hashable, but not without biasing the
> hash results. There aren't any elephants.
>
>
> 4*10GE Egress:
>   port1 4306757pps
>   port2 8612807pps
>   port3 9689893pps
>   port4 6459931pps
> after adding incoming-interface-index)
>   port1 6459922pps
>   port2 8613236pps
>   port3 9691485pps
>   port4 4306620pps
> after removing s-index and adding adaptive:
>   port1 7536562pps
>   port2 7536593pps
>   port3 6459928pps
>   port4 7536566pps
> after removing adaptive and adding no-destination-port + no-source-port
>   port1: 5383279pps
>   port2: 9689886pps
>   port3: 7536588pps
>   port4: 6459922pps
> after removing no-source-port (i.e. destination port is used for hash)
>   port1: 8613235pps
>   port2: 5383272pps
>   port3: 5383274pps
>   port4: 9689884pps
>
> It is curious that it actually balances more fairly, without using TCP
> ports at all! Even though there is _tons_ of entropy there due to
> random DPORT.

Better late than never

100G link from Ixia to ASR9K Hu0/1/0/3, with a pseudowire attachment
interface configured on Hu0/1/0/3.4001, 3x100G core facing LAG links
(Hu0/0/0/0, Hu0/0/0/5, Hu0/0/0/6).

The packet stream sent from Ixia has an Ethernet header with random
dest MAC, random src MAC, VLAN ID 4001 to match into pseudowire AC,
IPv4 headers are next with random dest IP and random src IP, TCP
headers follow with random dest port and random src port. Payload is
random data. Frame size is 1522 bytes.

Everything is re-randomised every frame. Sending ~100Mbps of traffic...
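
(As an aside, not part of the original test: a rough way to reproduce a similar randomised stream without an Ixia is Scapy. The interface name and frame count below are placeholders, and a software generator obviously won't reach hardware tester rates.)

from scapy.all import (Ether, Dot1Q, IP, TCP, Raw, RandMAC, RandIP,
                       RandShort, RandString, sendp)

# Random MACs, IPs, TCP ports and payload on every frame, VLAN 4001 to
# match the pseudowire attachment circuit described above.
frames = [
    Ether(src=RandMAC(), dst=RandMAC()) /
    Dot1Q(vlan=4001) /
    IP(src=RandIP(), dst=RandIP()) /
    TCP(sport=RandShort(), dport=RandShort()) /
    Raw(load=RandString(1400))
    for _ in range(10000)
]
sendp(frames, iface="eth1", verbose=False)  # "eth1" is a placeholder interface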

The default load-balancing method on ASR9K for L2VPNs is
per-pseudowire so initially everything falls onto one core facing LAG
member:

ar0-ws.bllab Monitor Time: 00:15:42  SysUptime: 312:06:24
 Last Clear:   00:10:36
Protocol:General
Interface In(bps)  Out(bps) InBytes/Delta  OutBytes/Delta
Hu0/1/0/3 99.1M/  0%0/  0% 2.7G/24.8M 0/0
Hu0/0/0/0 11000/  0%15000/  0%   495110/3226 639642/4198
Hu0/0/0/5 12000/  0%   100.3M/  0%   467544/2958   2.7G/25.1M
Hu0/0/0/6 13000/  0%12000/  0%   523510/3328 483334/3020


Switch to src+dst MAC load-balancing and we get a more or less perfect
distribution:
!
l2vpn
 load-balancing flow src-dst-mac
!

ar0-ws.bllab Monitor Time: 00:20:56  SysUptime: 312:11:38
 Last Clear:   00:17:02
Protocol:General
Interface In(bps)  Out(bps) InBytes/Delta  OutBytes/Delta
Hu0/1/0/3 99.7M/  0%0/  0% 2.9G/24.9M 0/0
Hu0/0/0/0 12000/  0%31.7M/  0%   371774/2972 993.0M/8.6M
Hu0/0/0/5 12000/  0%33.4M/  0%   366524/2958 980.9M/8.1M
Hu0/0/0/6 12000/  0%33.3M/  0%   373604/3442 979.3M/8.4M


When switching to src+dst IP load-balancing we get basically the same
distribution:
!
l2vpn
 load-balancing flow src-dst-ip
!

ar0-ws.bllab Monitor Time: 00:23:22  SysUptime: 312:14:04
 Last Clear:   00:21:58
Protocol:General
Interface In(bps)  Out(bps) InBytes/Delta  OutBytes/Delta
Hu0/1/0/3 99.7M/  0%0/  0% 1.0G/24.9M 0/0
Hu0/0/0/0 11000/  0%31.2M/  0%   135550/2888 355.8M/8.4M
Hu0/0/0/5 12000/  0%33.6M/  0%   131840/3396 353.1M/8.4M
Hu0/0/0/6 12000/  0%33.4M/  0%   134639/3091 351.1M/8.3M

The Tomahawk NPU is using CRC32 for load-balancing, so I'm not sure why
the MX2020 box you tested was so uneven if it is also using CRC32. It
could be implementation specific, as you mentioned with the Nokia one
that added a 32b static value. Despite having TCP headers on top of the
IP headers, if I remove TCP, or set the TCP ports to be static, random,
incrementing etc., it has no impact on the above, so the ASR9K isn't
feeding layer 4 keys into the CRC32 (which is exactly as the Cisco

Re: [j-nsp] LAG/ECMP hash performance

2019-08-28 Thread James Bensley
On Sat, 24 Aug 2019 at 10:06, Saku Ytti  wrote:

Hi Saku,

> Has anyone ran into a set of flows where ostensibly you have enough
> entropy to balance fairly, but you end up seeing significant imbalance
> anyhow? Can you share the story? What platform? How did you
> troubleshoot? How did you fix?

No. Out of curiosity, have you, which is what led you to post this?
If yes, what platform?

> It looks like many/most vendors are still using CRC for LAG/ECMP,
> which historically makes sense, as you could piggyback on EthernetFCS
> transistors for 0cost implementation. Today likely the transistors are
> different anyhow as PHY and lookup engine are too separate, so CRC may
> not be very good choice for the problem.

Yeah, I more or less agree. It's a bit computationally expensive if the
lookup engine is not something "modern" (i.e. a typical modern Intel
x86_64 chip) with a native CRC32 instruction. In the case of, say, an
Intel chip (or any ASIC with CRC32 built in), generating a CRC32 sum
for load-balancing wouldn't be much of an overhead. But even with a
native CRC32 instruction it seems like overkill. If "speed is
everything", a CRC32 instruction might not complete in a single CPU
cycle, so other methods could be faster, especially given that most
people don't need the 32 bits of entropy produced by CRC32 (as in, they
don't have 2^32 links in a single LAG bundle or that many ECMP
routes).

> If I read this right (thanks David)
> https://github.com/rurban/smhasher/blob/master/doc/crc32 - CRC32
> appears to have less than perfect 'diffusion' quality, which would
> communicate that there are scenarios where poor balancing is by design
> and where another hash implementation with good diffusion quality
> would balance fairly.

That is my understanding of CRC32 also, although I didn't know it was
being widely used for load-balancing, so I had never thought of it as an
actual practical issue. One thing to consider is that not all CRC32
sums are the same: the polynomial used varies, so $box1
doing CRC32 for load-balancing might produce different results to
$box2 if they use different polynomials. I have recorded some common
ones here: 
https://null.53bits.co.uk/index.php?page=crc-and-checksum-error-detection#polynomial

It looks like the standard IEEE 802.3 value 0x04C11DB7 is being used
for these tests, here
https://github.com/jwbensley/Ethernet-CRC32/blob/master/crc32.c

Other polys are used though, e.g. for larger packets. When using jumbo
frames and stretching the amount of data the CRC has to protect
with the same sized sum (32 bits), other polynomials can be
more effective. It's probably a safe bet that most implementations
that use CRC32 for hashing use the same standard poly value, but I'm
keen to hear more about this.
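
As a concrete illustration (mine, not from the thread): Python's zlib.crc32 uses the standard IEEE 802.3 polynomial, so you can quickly check how a given set of flow keys spreads across N LAG members. The member count, addresses and packing format below are made up for illustration.

import struct
import zlib
from collections import Counter
from random import randint

N_LINKS = 3  # hypothetical LAG member count
counts = Counter()

for _ in range(100000):
    # Synthetic flow key: fixed src/dst IPs and src port, random dst port,
    # roughly mirroring the flow set Saku described earlier in the thread.
    key = struct.pack(
        "!IIHH",
        0x17000014,            # 23.0.0.20, example source IP
        0x9D000014,            # 157.0.0.20, example destination IP
        80,                    # source port
        randint(2074, 65470),  # random destination port
    )
    counts[zlib.crc32(key) % N_LINKS] += 1

for member in sorted(counts):
    print("link", member, counts[member])

Running something like this over the real flow set shows how evenly a plain CRC32-plus-modulo would spread it, which makes it easier to separate hash diffusion problems from vendor-specific input selection (static key material, which header fields are fed in, and so on).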

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] What exactly causes inconsistent RTT seen using ping utility in Junos?

2019-05-02 Thread James Bensley
On Thu, 25 Apr 2019 at 08:49, Tarko Tikan  wrote:
>
> hey,
>
> > Please let me know if anything was unclear or if someone has other
> > ideas or theories.
>
> Been following this thread and do not have anything to contribute at
> this point but wanted to say I (and I hope many others) appreciate this
> type of proper debugging given the tools we have available these days.

I agree, this has been interesting to read and a commendable level of
debugging has been carried out, however I'm not sure why.

There are 100 different reasons why packets could be delayed when
passing through the RE as opposed to the forwarding plane.

Trying to debug which one is causing this specific issue seems like a
dog chasing his tail. Once the cause has been identified, and
resolved, after the next software update something else will trigger
the same unwanted behaviour :)

The transit forwarding path through the router has a much more
strictly bounded packet processing loop than the RE, so I wouldn't
really rely on RE-generated ping RTTs for anything.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] OSPF reference-bandwidth 1T

2019-02-07 Thread James Bensley
TL;DR: metrics aren't a purely design/academic decision; they are
operational too.

On Thu, 24 Jan 2019 at 09:27, Saku Ytti  wrote:
> I don't disagree, I just disagree that there are common case where
> bandwidth is most indicative of good SPT.

If by "good" you mean "shortest" (least number of hops) then I
disagree with you, bandwidth is usually indicative of shortest number
of hops (not always but usually). In any reasonable hierarchical
design northbound links aren't going to be of a lower speed than
southbound links. Taking Adams example of a folded Clos network as a
theoretical utopian text-book example, you also wouldn't have
east-west links between leaves and if you did they wouldn't be as fast
or faster than your northbound links. The problem is that in reality
no SP network looks as neat and tidy or simply as a Clos network, see
below

> Consider I have
>
> 10GE-1:
> PE1 - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - PE2
>
> 10GE-2:
> PE1 - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - P9 - PE2
>
> 10GE-3:
> PE1 - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - PE2
>
> 1GE:
> PE1 - PE2
>
> In which realistic topology

> a) in 10GE-1 + 1GE, I want to prefer the 10GE between PE?

As soon as you have 1.1Gbps of traffic to shift (see my
previous email). And this is where reality kicks in - why would you
have a PE with a 10G and 1G uplink? In the hypothetical Clos design
you simply wouldn't have mixed speed links facing northbound; in the
real SP networking world you wouldn't have a 10G uplink if you didn't
have >1Gbps of provisioned downstream connectivity, otherwise you're
wasting capex/opex (except for rare circumstances like a carrier
promotion selling 10G for the price of 1G or something, but you
probably hadn't planned for that). So, assuming there is a reason you
have bandwidth-asymmetrical uplinks in your topology, it's probably
downstream bandwidth related. It could also be upstream related though;
upstream link upgrades don't happen in a fixed time or perfectly
symmetrically, maybe the road closure is delayed, route planning
changes, PoP closure, transmission equipment upgrade, you end up
upgrading one northbound circuit in 3 months and the other takes 12
months. To go full circle to your original point, bandwidth is
dictating the "best" SPT here, where "best" means "to avoid congestion
during normal operations, not times of exceptional operations, which is
when we look to QoS for help".

This is what happens in the "real world" and not Clos networks. We
might want diverse connections to a remote PoP and only one carrier
has 10G of capacity there, so our backup link has to be 1G. We
actually have more than 1G of provisioned downstream connectivity but
that is all we can get unless we want 2x10G from the same carrier and
no resilience. Maybe we can bond a few 1G links from the 2nd carrier
and have 10G + 5G backup. To be clear I don't approve of such a
design, my point is that in the real world, where things aren't
simple, circuit costs are higher than expected, we don't have enough
100G or 10G ports, the project has been under budgeted, the lead time
on the new router from vendor is 12 months not the promised 3, we end
up with these kinds of weird asymmetrical topologies and we have to
use a bandwidth based metric to route traffic.

> b) in 10GE-2 + 1GE, I want to balance between the paths

So, from a purely technical perspective, if you did per flow load
balancing it would work. Should you do it? I'd say Hell no. But not
because of anything to do with IGPs. The operational complexity of
troubleshooting such a topology is too high in this scenario; imagine
if each one of those 10G links between P nodes was from a different
carrier - it would be a case of service credits lining up, ready to be
given away.

> c) in 10GE-3 + 1GE, I want to prefer the 1GE

When you actually have some bandwidth-critical services which are <= 1Gbps.

> All these seem nonsensical, what actually is meant '1GE has role Z,
> 10GE has role X, have higher metric for role Z', regardless what the
> actual bandwidth is. I just happens that bandwidth approximates role
> in that topology, but desired topology is likely achieved with
> distance vector or simple role topology and bandwidth is not relevant
> information.

To me they aren't nonsensical, they are "not ideal" for a specific
purpose, i.e. sub-optimal for latency, or operationally more complex.
Going right back to basics: the reason we have a metric at all in the
IGP is because there is some reason why the shortest path (number of
hops) from A to B isn't the most optimal path, so we're using the
metric as a weight to influence the SPT calculation. So the question
is why isn't the SPT optimal for you? In the hypothetical Clos model
it is, in real life it isn't, so we're always trying to get as close
to that as we can. Metrics aren't just a purely design/academic
decision (function based or role based), they are operational too;
e.g. breaking up a failure domain or breaking up a 

Re: [j-nsp] OSPF reference-bandwidth 1T

2019-01-23 Thread James Bensley
On Thu, 17 Jan 2019 at 18:09, Saku Ytti  wrote:
> It boggles my mind which network has _common case_ where
> bandwidth is most indicative of best SPT.

Hi Saku,

I've worked on several small networks where you don't have equal
bandwidth links in the network. I don't mean U/ECMP, I mean a ring
topology for example, where some links might be 10G and some 1G etc.
Maybe the top half of the ring from 9 o'clock moving clockwise round
to 3 o'clock is 10Gbps or 20Gbps, and the bottom half from 3 o'clock
moving clockwise round to 9 o'clock is 10Gbps or 1Gbps. I want traffic
from the 3 o'clock PE to always go anticlockwise to get to 8 o'clock,
despite it being one hop further, to reduce the traffic across the bottom
half of the ring.
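
As a concrete illustration of that ring (hop counts and speeds below are mine, purely to show the effect of a bandwidth-derived metric):

REF_BW = 100 * 10**9  # 100G reference bandwidth, illustrative only

def ospf_cost(link_bps):
    """Classic OSPF-style cost: reference bandwidth / link bandwidth, floored at 1."""
    return max(1, REF_BW // link_bps)

# 3 o'clock to 8 o'clock around the ring, with assumed hop counts:
clockwise = 5 * ospf_cost(1 * 10**9)       # shorter path over the 1G bottom half
anticlockwise = 7 * ospf_cost(10 * 10**9)  # longer path over the 10G top half

print(clockwise, anticlockwise)  # 500 vs 70: the longer 10G path wins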

Previously you said:

On Wed, 16 Jan 2019 at 15:42, Saku Ytti  wrote:
...
> No one should be using bandwidth based metrics, it's quite
> non-sensical.

But for any link between PoP A and PoP B the bandwidth is directly
related to the cost, i.e. 1Gbps from A to B costs less than 10Gbps, and
10Gbps from A to B costs less than 100Gbps, etc. Having worked on very
small ISPs with only 2 or 3 PEs and a lucky 8-ball to get by, cost is
everything and you end up with both: links of varying speeds and
links of varying MTUs (oh the joy!).

> P-P-country etc. If you have many egress options for given prefix
> latency based metric might be better bet.

Yeah, for larger networks with more money this works well. $dayjob has
a lot of realtime voice and video flying around so we use latency-based
metrics and it works well, but we also have our own transmission
infrastructure, meaning that bandwidth isn't a factor for us. Not
everyone has that luxury.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] OSPF reference-bandwidth 1T

2019-01-23 Thread James Bensley
On Wed, 16 Jan 2019 at 15:06, Event Script  wrote:
>
> In the process of adding 100G, LAGs with multiple 100G, and to be prepared
> for 400G, looking for feedback on setting ospf reference-bandwidth to 1T.
>
> Please let me know if you have had any issues with this, or if it has been
> a smooth transition for you .
>
> Thanks in advance!

Hi there,

I have worked on networks with the OSPF reference bandwidth set to
1Tbps for the same reason as you: we were planning to deploy 100G
links within the next 12 months. As we built the network we used a
reference bandwidth of 1Tbps from the start so that we wouldn't have
to change it in the future. We also expected to deploy 100G LAGs, so a
reference bandwidth of 100G wasn't enough. I've worked on networks
where it has been set to 100Gbps; that just seems silly to me as
you'll just have to increase it at some point. Setting it to 1Tbps
makes good sense to me and in my experience it works fine (tested on
both Cisco and Juniper).
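
To illustrate the point with numbers (mine, not from the thread; exact rounding behaviour varies by vendor): with a 100G reference everything at 100G and above collapses to cost 1, whereas 1T still differentiates 100G links and faster LAGs.

G = 10**9

def ospf_cost(ref_bw_bps, link_bps):
    """OSPF interface cost: reference bandwidth / interface bandwidth, minimum 1."""
    return max(1, ref_bw_bps // link_bps)

for name, bw in [("1G", G), ("10G", 10 * G), ("100G", 100 * G), ("4x100G LAG", 400 * G)]:
    print(name, "ref 100G ->", ospf_cost(100 * G, bw), "| ref 1T ->", ospf_cost(10**12, bw))
# 1G: 100 vs 1000, 10G: 10 vs 100, 100G: 1 vs 10, 4x100G LAG: 1 vs 2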

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Finding drops

2019-01-23 Thread James Bensley
On Mon, 21 Jan 2019 at 20:09, Jason Lixfeld  wrote:
>
> Hi all,
>
> I’m doing some RFC2544 tests through an MX204.  The tester is connected to 
> et-0/0/2, and the test destination is somewhere out there via et-0/0/0.  64 
> byte packets seem to be getting dropped, and I’m trying to find where on the 
> box those drops are being recorded.
>
> I’ve distilled the test down to generating 100 million 64 byte (UDP) packets 
> to the destination, but the counters on et-0/0/2 read as though they’ve only 
> received about 76.6% of those packets.
>
> If I change the test to send 100 million 100 byte packets, the counters on 
> et-0/0/2 account for all packets.
>
> I’ve tried looking at various output to find a counter that registers the 
> missing packets, but I’m not having any luck.
>
> Aside from 'show interface et-0/0/2 extensive’, I’ve looked here with no luck:
>
> show interface queue et-0/0/2
> show pfe statistics traffic detail
> show pfe statistics exceptions
> show pfe statistics error
>
> Somewhere else I should be looking?

Hi Jason,

To me there are two variables here: speed and packet size. You are
sending 64B packets and experiencing issues, but at what rate? 40Gbps
line rate (circa 59.5Mpps)? When you are sending 100B packets you are
not experiencing issues; at what rate is this, 40Gbps (circa 41.7Mpps)
too?

When sending larger packets (100B) the PPS rate would be lower to
achieve line rate @ 40Gbps, so two things have changed between the
working and non-working tests. To try and reduce this to a single
variable (packet size only), does your tester support any sort of packet
pacing, i.e. can you send 64B packets at less than line rate?
For example, can you transmit at a rate of 10Gbps of 64B packets
(circa 14.9Mpps) instead of 40Gbps of 64B packets (circa 59.5Mpps) and
see if all the packets are counted on your ingress interface?
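
For reference, the rough pps figures above come from this calculation (my helper; the extra 20 bytes per frame are preamble, SFD and inter-frame gap on the wire):

def line_rate_pps(link_bps, frame_bytes):
    """Maximum frames per second at a given frame size, including 20B wire overhead."""
    return link_bps / ((frame_bytes + 20) * 8)

for bps, size in [(40 * 10**9, 64), (40 * 10**9, 100), (10 * 10**9, 64)]:
    print(bps // 10**9, "G @", size, "B:", round(line_rate_pps(bps, size) / 1e6, 2), "Mpps")
# 40G @ 64B  -> 59.52 Mpps
# 40G @ 100B -> 41.67 Mpps
# 10G @ 64B  -> 14.88 Mpps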

On a side note, when you say that the packet counters on the MX
receiving interface are missing packets, what about the transmitting
interface and the receiving RFC2544 tester? Do they also show the
packets as missing (so they were dropped) or do they show the correct
number of packets received and it's just the MX receiving interface
that isn't accounting for all packets BUT all packets are forwarded?

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] vMX questions - vCPU math

2018-12-31 Thread James Bensley



On 30 December 2018 21:54:17 CET, Robert Hass  wrote:
...
>My confusion is related to HT setting, as you wrote to disable it.
>
>But vMX Getting Started Guide for KVM says:
>
>"CPU pinning with flow caching enabled (performance mode) is different
>than
>with flow
>caching disabled (lite mode). For both modes, you must enable
>hyperthreading"

Hmm, strange. I've not done any performance testing with vMX, I don't use it 
outside of the lab, so I'm not up to date with vMX best practices, but for DPDK-
powered applications the recommendation is always to disable HT, and vMX is DPDK-
powered, so I made the same recommendation here.

I'm curious to know Juniper's reason for advising HT be turned on. The case for 
disabling it seems clear to me: DPDK will lock the tx/rx cores at 99%, and any 
hyper-threading on those cores would then cause a high number of context 
switches, which degrades performance (latency more than throughput). To what 
extent depends on various factors but, if you're tuning for performance 
(isolating cores, CPU pinning, CPU power management, NUMA affinity, hugepages 
etc.), it seems reasonable to me you wouldn't want to use something like HT that 
can degrade performance.

I'm all ears on this one.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] vMX questions - vCPU math

2018-12-30 Thread James Bensley



On 30 December 2018 18:40:50 CET, James Bensley  wrote:
> Text with lots of typos

^ Sorry about that, on a mobile.

>I often make notes and never get around to publishing them online
>anywhere. Nearly 2 years ago (where did the time go?) I was testing
>CSR1000v performance. This link might have some useful info under the
>HugePages and Virtualization sections:
>https://docs.google.com/document/d/1YUwU3T5GNgmi6e2JwgViFRO_QoyUXiaDGnA-cixAaRY/edit?usp=drivesdk

I forgot to mention that if you want high performance you might also want to 
isolate all except one CPU from Linux so that they can't be used for anything 
except your virtual machines (see the CPU tuning > Isolating section in those 
same notes).

Also you might want to disable any power saving modes in your CPU if you hate 
trees and want lower latency (see the CPU tuning > frequency section in those 
same notes).

There are also some notes on NUMA and PCI affinity.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] vMX questions - vCPU math

2018-12-30 Thread James Bensley


On 30 December 2018 18:12:43 CET, Aaron1  wrote:
>With vMX I understand that as more performance is needed, more vcpu,
>network card(s) and memory are needed.  As you scale up, a single vcpu
>is still used for control plane, any additional vcpu‘s are used for
>forwarding plane.  The assignment of resources is automatic and not
>configurable.

I think this depends on which kind of vMX you use; the nested version allocates 
a fixed amount of vCPU and memory to the vCP machine, so more doesn't help 
there. If using separate vCP and vFP machines then more resources to the vCP 
help in an RR scenario and more resources to the vFP help in the vPE scenario 
(I don't know what the ceiling is of either scenario though; can Junos make 
"good" use of 20 BGP threads, for example?).


>> On Dec 30, 2018, at 2:53 AM, Robert Hass  wrote:
>> 
>> Hi
>> I have few questions regarding vMX deployed on platform:
>> - KVM+Ubuntu as Host/Hypervisor
>> - server with 2 CPUs, 8 core each, HT enabled

If you want high throughput in a vPE style, disabling HT is recommended.

>> - DualPort (2x10G) Intel X520 NIC (SR-IOV mode)
>> - DualPort Intel i350 NIC
>> - vMX performance-mode (SR-IOV only)
>> - 64GB RAM (4GB Ubuntu, 8GB vCP, 52GB vFPC)

I use vMX in the lab but haven't tried to test its performance. You might need 
to enable hugepages on your KVM host if you haven't already; KVM has support 
for hugepages (somehow, I haven't used it).

>> - JunOS 18.2R1-S1.5 (but I can upgrade to 18.3 or even 18.4)
>> 
>> 1) vMX is using CPU-pinning technique. Can vMX use two CPUs for vFPC

Technically yes. If you manually assign cores to the vFP machine from 
both physical CPUs, it will look like one multicore CPU to the vFP if you 
configure that VM in KVM to only have one CPU. However, this is generally bad. 
When you have 2 physical CPUs with cores from both assigned to the same VM you 
will have some NUMA locality performance penalty (cache misses). You should 
place all the vCPUs of the same VM on the same NUMA node (because you're not 
dealing with TBs of memory). High performance VMs should use cores from the 
same CPU if possible, and HT should be disabled in the case of vMX. The vFP uses 
DPDK to continuously poll the NIC queues for new packets, so cores allocated to 
NIC queue processing are locked at 99% CPU usage all the time, whether you have 
1pps of traffic or 1Mpps. HT doesn't work very well when you want to lock the 
cores like this and often adds a performance penalty due to the high rate of 
context switches.
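
On the NUMA point, a quick (and purely illustrative) way to see which host CPUs belong to which NUMA node before pinning all of a VM's vCPUs to one node is to read the standard Linux sysfs topology:

import glob
import os

# Print the CPU list per NUMA node so all vCPUs of a VM can be pinned
# to cores on one node (standard Linux sysfs paths, nothing vMX specific).
for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    with open(os.path.join(node, "cpulist")) as cpulist:
        print(os.path.basename(node), "->", cpulist.read().strip())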

>>   Eg. machine with two CPUs, 6 cores each. Total 12 cores. Will vMX
>>   use secondary CPU for packet processing ?

As above, depending on how you configure the VM it might see all cores as the 
same CPU.

>> 2) Performance mode for VFP requires cores=(4*number-of-ports)+3.
>>   So in my case (2x10GE SR-IOV) it's (4*2)+3=11. Will vMX count the
>>   cores resulting from HT (not physical) in that case?

If you have 4 physical cores, 8 with HT, and allocate 6 to a VM, it just sees 6 
cores and doesn't differentiate between "real" cores or HT cores, but as above, 
for high performance VMs HT is generally disabled. You can oversubscribe with 
KVM: if your host has 4 cores, with or without HT, you can have two VMs with 4 
vCPUs each, but then they'll be fighting for physical CPU resources. In the case 
of vMX you don't want to oversubscribe because, as above, DPDK locks the NIC 
queue cores at 99%.

>> 3) How JunOS Upgrade process looks like on vMX ? Is it regular
>>   request system software add ...

Sorry don't know :)

I often make notes and never get around to publishing them online anywhere. 
Nearly 2 years ago (where did the time go?) I was testing CSR1000v performance. 
This link might have some useful info under the HugePages and Virtualization 
sections: 
https://docs.google.com/document/d/1YUwU3T5GNgmi6e2JwgViFRO_QoyUXiaDGnA-cixAaRY/edit?usp=drivesdk

This page has some notes on NUMA affinity; it's important (IMO) to understand 
why it causes problems: 
https://null.53bits.co.uk/index.php?page=numa-and-queue-affinity


Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Opinions on fusion provider edge

2018-11-08 Thread James Bensley



On 8 November 2018 14:23:02 GMT, Tarko Tikan  wrote:
>hey,
>
>> There is
>> nothing wrong with layer 2 aggregation switches in my opinion, the
>> only technical advantage in my opinion to using SP Fusion for a layer
>> 1 extension to a router compared to a layer 2 switch is that SP
>Fusion
>> is one device to configure and monitor instead of two.
>
>Except that it's not L1. It's still L2 with 802.1BR (or vendor 
>proprietary version of that).

Yep, Juniper told us at the time that Fusion was based on open standards 
(802.1BR) and not proprietary in any way. Funny how they don't support the use 
of any other 802.1BR-compliant device, and I doubt it would work. They must 
have some proprietary gubbins in there, like pushing the Fusion firmware blob 
from the aggregation device to the satellite device. If the Fusion firmware 
wasn't on the QFX the MX and QFX wouldn't "bond". Not sure how the MX detects 
that (LLDP?) - I had an (albeit quick) look at the standard back then and 
couldn't see anything related, so I presume an MX AD would reject a random 
802.1BR-compatible device.

>You highlight the exact reasons why one should stay away from 
>fusion/fex/satellite - features must explicitly be 
>ported/accommodated/tested for them. Not all performance data is 
>available, OAM/CFM is a struggle etc.

Agreed.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Opinions on fusion provider edge

2018-11-08 Thread James Bensley
On Wed, 7 Nov 2018 at 13:03, Antti Ristimäki  wrote:
> Wrt the original question about possible issues with Fusion, we have faced 
> quite a many. Currently one of the biggest pains is to get CoS configured 
> properly on Fusion ports. We have a case open, where any CoS scheduler change 
> stops traffic forwarding out from the cascade port, if one has explicitly 
> configured schedulers for the cascade port logical control (32769) and data 
> (32770) units. This is pretty irritating, as traffic between the extended 
> customer facing port and the CE device works just fine, keeping e.g. BGP up 
> and running, but traffic to/from the core does not work.
>
> I'm also somewhat concerned about the fact that the whole Fusion thing is 
> more or less a black box and as such much more difficult to debug than 
> traditional technologies.
>
> From monitoring point of view it also a bit challenge that not all 
> information related to satellite ports is not available via SNMP. E.g. queue 
> specific counters are not available but have to be queried via CLI command, 
> and IIRC also ifOutDiscards is not recorded for the extended ports.

My experiences with SP Fusion so far have been to add more ports to
routers in a cost effective manner. Filling a chassis with 1G/10G
ports isn't super cost efficient as line cards are expensive and ports
are rarely run near 100% utilisation for an extended period of time,
so it makes for a poor ROI. At $job-1 we went down the road of using
SP Fusion as a layer 1 extension to PEs, using a 40G uplinks to the
router as the aggregation link. Operators have been doing this for
years using dumb layer 2 switches as a layer 2 extension. There is
nothing wrong with layer 2 aggregation switches in my opinion, the
only technical advantage in my opinion to using SP Fusion for a layer
1 extension to a router compared to a layer 2 switch is that SP Fusion
is one device to configure and monitor instead of two. Unless we had
deployed thousands of aggregation + satellite devices it's not really
having any major positive impact on my monitoring licensing costs
though. Equally, when using a typical router + layer 2 switch
extension,  the config that goes into the layer 2 switch is so basic,
touching two devices instead of one again seems like a negligible
disadvantage to me.

The benefit we had from SP Fusion, and I'm guessing here, is
that Juniper wanted guinea pigs; they sold us QFX5100s as SP Fusion
devices plus line cards for the MX's for cheaper than we could buy
line cards + EXs, and guinea pigs we were. It took quite a bit of
effort to get the QFXs onto the correct code version in stand alone
mode. We also had to upgrade our MXs to 17.1 (this was not long after
it's release) and use the then new RE-64s because we needed HQoS over
Fusion and this was the only way it was supported. It was then more
hassle to get the QFXs to migrate into Fusion mode and download their
special firmware blob from the MXs. We had to get JTAC to help us and
even they struggled. Another issue is that we were a heavy users of
Inter-AS MPLS option B's and they aren't supported over SP Fusion
links. There is technically no reason why it wouldn't work, as Fusion
is a layer 1 extension, however, Inter-AS Opt B isn't one of the
features they test when releasing new Fusion code versions, so it's
officially unsupported, so we still had to deploy EX's for Opt B
links.

A colleague of mine worked on a separate project which was a DC Fusion
deployment and similar issues and it took him a lot of headache and
JTAC assistance to get that deployment working.

In my current $job we have/had a QFX switch stack in the office (so
nothing to do with Fusion) that has been very troublesome. As per
some of the other threads on this list we've had lots of problems with
QFX switches and certain optics not working, either in stacked mode or
on certain code versions. Again, this went to JTAC, they couldn't fix
it, eventually we fixed it by trying various different code versions
and breaking the stack out.

So overall, not impressed with the QFX5100s at all.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Juniper Buffer Bloat

2018-10-18 Thread James Bensley
On Thu, 18 Oct 2018 at 10:36, Benny Lyne Amorsen
 wrote:
>
> James Bensley  writes:
>
> > If customers have WAN links that are slower than their LAN links -
> > that is where fq-codel was designed to be implemented and that is why
> > it should be implemented on the CPE. Let's be clear because to me
> > there is no place for it in the core:
> >
> > If your customers have a WAN link that is slower than their LAN
> > connectivity the congestion occurs on the WAN link and this is why
> > fq-codel works best on the CPE WAN interface (you would normally shape
> > the CPE WAN interface to just below the WAN link speed and use
> > fq-codel for queuing so that it kicks in before you hit the WAN link
> > speed and start dropping packets, the CPE has visibility of the WAN
> > interface usage and buffer usage).
>
> This is great for upstream traffic, as seen from the customer. It is the
> wrong solution for the downstream. Downstream you need the PE or the
> DSLAM to do the right thing, or you will be shaping traffic AFTER the
> bottleneck, with all the usual problems of that approach.

Hi Benny,

This is not correct. Again we are being unclear here (maybe my
fault!). What I was saying above was:

> > If your customers have a WAN link that is slower than their LAN
> > connectivity the congestion

Should have been bufferbloat ^

> > occurs on the WAN link and this is why
> > fq-codel works best on the CPE WAN interface (you would normally shape
> > the CPE WAN interface to just below the WAN link speed and use
> > fq-codel for queuing so that it kicks in before you hit the WAN link
> > speed and start dropping packets, the CPE has visibility of the WAN
> > interface usage and buffer usage).

The OP asked about fq-codel in Juniper tin for bufferbloat. When using
fq-codel (in my experience, which was my home ADSL line with an OpenWRT
router) one has to tell fq-codel the link speed (so, as I said above,
you set it to slightly lower than your link speed so that it kicks in
before packets are dropped). It is actually enough to use fq-codel on
the CPE only to fix bufferbloat; nothing is required on the provider
side. That is the whole point: the CPE buffers are too large, so
fq-codel in the core doesn't make sense as there should be minimal
bufferbloat there. As Saku mentioned, buffers should be kept small
inside your provider network; this is a great short video on the
subject: https://www.youtube.com/watch?v=y3mkhUq-cO4

A by-product of using fq-codel on the CPE (because it is both AQM and
packet scheduling combined!) is that when you configure it to ~99% of
the actual speed UP and DOWN it will actually have positive effects on
both upload and download before either your PE policer kicks in or the
WAN link drops packets due to congestion; the PE doesn't need to do
anything. Unless my understanding is wrong, fq-codel schedules the
outbound packets and will allow you to fill your link in the downstream
with multiple concurrent IP flows without any one of them being starved
by another. My own tests confirmed this on my home ADSL: I could smash
the link with torrents and still watch Netflix without issue. There are
some examples you can see here:
https://www.bufferbloat.net/projects/codel/wiki/RRUL_Rogues_Gallery/

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Juniper Buffer Bloat

2018-10-18 Thread James Bensley
On Wed, 17 Oct 2018 at 21:48, Colton Conor  wrote:
>
> James,
>
> Thanks for the response. However, if you are just shaping at the CPE, then 
> there could be bottlenecks between the CPE and core router causing the 
> bufferbloat right? Example, if you have a wireless network, DSL network, CMTS 
> network, etc you bottleneck is going to be that wireless accesses point, that 
> DSLAM, or that CMTS, and not the CPE. Packets are going to build up on the 
> access device not the CPE right?
...
On Thu, 18 Oct 2018 at 00:19, Colton Conor  wrote:
>
> Benny,
>
> Great information! So what would you do to avoid congest on access
> networks? Example DSLAM's fed by 10G links. The 10G link is no where near
> capacity, but customers are individually maxing out their 20Mbps by 1Mbps
> DSL connection.

These are two separate issues. I'm confused as to why you are asking
about fq-codel in the core - I think you may have misunderstood
something.

If your DSLAM/access-switch/MSAN has run out of backplane or backhaul
capacity it needs upgrading. fq-codel isn't the solution to that
problem.

If customers have WAN links that are slower than their LAN links -
that is where fq-codel was designed to be implemented and that is why
it should be implemented on the CPE. Let's be clear because to me
there is no place for it in the core:

If your customers have a WAN link that is slower than their LAN
connectivity the congestion occurs on the WAN link and this is why
fq-codel works best on the CPE WAN interface (you would normally shape
the CPE WAN interface to just below the WAN link speed and use
fq-codel for queuing so that it kicks in before you hit the WAN link
speed and start dropping packets, the CPE has visibility of the WAN
interface usage and buffer usage).

If your customers have high speed WAN links (100M/1G/10G etc.) that are
the same speed or faster than the LAN connectivity AND you've
implemented the policing/shaping on your edge device (or worse -
deeper into your network) then the CPE has no visibility of this. Now
you have the problem that you are transporting traffic which will be
dropped. This is a bigger issue than fq-codel because customers could
be congesting important shared links (access-switch/DSLAM/MSAN
uplinks) with traffic that will eventually be dropped - so the
congestion shouldn't have occurred in the first place.

Taking this full circle - fq-codel is aimed at addressing bufferbloat,
not congested up-links on the devices in your network; please note
that these are two separate problems. A port/link/line card upgrade
fixes the congested core/backhaul link problem.

On Thu, 18 Oct 2018 at 06:59, Mark Tinka  wrote:
>
> > On 17/Oct/18 22:06, James Bensley wrote:
> > "words about policing on the CPE and not in the core"
>
> This wouldn't be practical if you do not manage the CPE.

Yes it would - the general rule of thumb still applies: police on your
access-switch/DSLAM/MSAN/whatever rather than backhauling traffic that
is later going to be dropped - surely you're not disputing the
concept of saving resources by dropping "doomed" traffic earlier?

There are reasons to do the policing on the BNG/LNS/subscriber
management gateway - e.g. complex QoS - but Colton hasn't mentioned any
QoS so I can only comment on the info provided.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Juniper Buffer Bloat

2018-10-17 Thread James Bensley



On 17 October 2018 20:53:44 CEST, Colton Conor  wrote:
>I was wondering if Juniper supports anything like fq-codel to prevent
>buffer bloat? Specifically we would like to do rate shaping and
>subscriber
>management on core Juniper MX's. However, most network devices do
>simple
>buffers and queuing that does not work well compared to fq-codel and
>newer
>algorithms. Does anyone have advice when it comes to Juniper?

Hi Colton,

I'm pretty sure Juniper don't have anything like fq-codel in any of their 
products; however, the place to do that would be on the CPE or edge access 
switch/router, not your core.

Normally one would rate limit the customer on CPE LAN ingress and egress if 
you're doing multi-colour policing, or CPE WAN ingress and egress if just hard 
policing, or CPE WAN egress and PE egress if doing shaping. You don't want your 
customer to blast traffic into your network and then have to carry those 
packets across your core only to drop them on your expensive subscriber 
management box.

One normally wants to drop excess packets as close to the source as possible. 
So for that reason you want fq-codel on your CPE.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Traffic delayed

2018-10-05 Thread James Bensley



On 4 October 2018 19:34:01 BST, james list  wrote:
>Due to the fact that access switch are QFX5100 in virtual chassis, does
>anybody know if IS-IS managing virtual- chassis has something happening
>every 30 minutes which could cause delay?
>
>Cheers

As per my previous message, you should see such an event in the logs. Have you 
enabled verbose logging in the IS-IS trace options (and any other services 
running on the devices)?

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] BFD Distributed Mode for IPv6

2018-10-03 Thread James Bensley
On Wed, 3 Oct 2018 at 10:13, Mark Tinka  wrote:
> On 3/Oct/18 11:09, adamv0...@netconsultings.com wrote:
>
> If you'd have separate ISIS process for v6 would it be possible to spin up a
> separate/dedicated BFD process for that ISIS?

Unless I'm mistaken, BFD isn't "multi-tenant", so only one set of BFD
packet exchanges can exist per interface; there is no support for
multiple BFD sessions on the same interface (by which I mean layer 2
broadcast domain in the case of sub-interfaces). This kind of makes
sense as BFD is supposed to test for unidirectional communication
failure of the physical link (the fact that it runs in both directions
gives bidirectional failure detection). So we can run BFDv4 and/or
BFDv6 on an interface, but only one instance per interface, otherwise
you'd need to negotiate different port numbers for the different
instances on the same interface?

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Use cases for IntServ in MPLS backbones

2018-10-03 Thread James Bensley
On Tue, 2 Oct 2018 at 15:11, Mark Tinka  wrote:
> Of course, in the real world,
> it was soon obvious that your Windows laptop or your iPhone XS sending
> RSVP messages to the network will not scale well.

A point I was trying to make way back in this thread, was that IntServ
doesn't scale well for multi-stakeholder networks, which has been my
background, ISP and managed WAN operations, so I've never deployed it.
If you have a single tenant WAN with control over the WAN *and* all
end devices you can manage the scale.

On Tue, 2 Oct 2018 at 14:38,  wrote:
> And besides, I'm not sure I'd ever want to be in a position where I allow my
> core links to max out and have TE to try and shuffle flows around so that I
> can squeeze all traffic in.
> - sure this would probably not be the case of day to day operation but most
> likely only employed during link failures,

So tying this to my point above about a single-tenant WAN, this is
something that Google does (any Googlers on-list please correct me where
I am wrong). They have two WANs, B2 and B4. One is public facing for
peering and transit (B2?) and the other is internal, e.g. DC to DC
(B4?). The DC to DC WAN tries to sweat its own assets as much as
possible and runs some links in the high-90s percent utilisation.
Nx100G LAGs between DCs aren't cheap, even for Google. With a single
tenant WAN you can run your links much hotter (higher average
throughput) with the aim to reduce the time spent transmitting (lower
average utilisation).

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] BFD Distributed Mode for IPv6

2018-10-03 Thread James Bensley
On Tue, 2 Oct 2018 at 21:46, Mark Tinka  wrote:
> On 2/Oct/18 21:13, James Bensley wrote:
>
> I presume that if one were to run MT-ISIS there would be no impact to IPv4?
>
>
> We already run MT for IS-IS. I consider this as basic a requirement as "Wide 
> Metrics".

I'm not sure about Junos but IOS-XR can run wide metrics in ST-ISIS so
I wasn't going to assume MT :)

> However, the issue here is BFD sees the whole of IS-IS as a client. So if BFD 
> has a moment, it will signal its client (IS-IS), regardless of whether the 
> moment was for IPv4 or IPv6.

Ah, yeah, that is a bummer :(

> I imagine that re-running adjacencies and SPF just for the IPv6 topology 
> would be a vendor-specific solution to the problem. However, wouldn't it just 
> be easier to support BFD for IPv6 in the PFE as Juniper already does for IPv4?
>
> I'd be interested to know if BFD works OK if you use public IPv6
> addresses for IS-IS adjacencies (although it's a waste of IPs, I'd
> still be curious).
>
>
> Interesting.
>
> What I do know is that if you are running BFD for static IPv6 routes, it runs 
> in the PFE. But if the routes are learned via an IGP (IS-IS or OSPFv3), it 
> can only run in the RE.

That is interesting.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Traffic delayed

2018-10-02 Thread James Bensley
On Tue, 2 Oct 2018 at 19:59, james list  wrote:
>
> Can you elaborate?
> Why just every 30 minutes the issue?

Seeing as you have an all Juniper set up I don't think there is a need
to cross-post to two lists simultaneously. If you feel there is a
need, please post to the two lists separately as not all subscribers
will be subscribed to both lists.

What basic troubleshooting have you done so far? What have you ruled out?

The very first thing you should have done is try to replicate the
issue in the lab - can you replicate it?

If yes, have you tried a code upgrade to see if this fixes anything?
Or changing any settings?

If not and you've only got the issue in production, can you enable
some logging to see if there is anything in the logs when the issue
happens? Do you see any packet drops on interfaces when the issue
happens? CPU spikes? Anything?

So far you haven't provided any data at all on the problem or what you
have tried to do to resolve it, before coming to the list.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] BFD Distributed Mode for IPv6

2018-10-02 Thread James Bensley
On Tue, 2 Oct 2018 at 16:39, Mark Tinka  wrote:
> The real-world problem we are seeing is when, for whatever reason, the
> RE CPU spikes and BFD for IPv6 sneezes, we also lose IPv4 because, well,
> IS-IS integrates both IP protocols.

I presume that if one were to run MT-ISIS there would be no impact to IPv4?

> On 2/Oct/18 15:30, Виталий Венгловский wrote:
>
> > Mark,
> >
> > Not exactly your scenario but we had the same problems with eBGP with
> > IPv6 link-local addresses on QFX10K platform.
> > Dev Team had replied that rather than hardware limitation it's more of
> > a "design decision" to not distribute IPv6 LL BFD sessions on PFEs,
> > it's the same behaviour across the MX/QFX/PTX portfolio and there are
> > no plans to change it.

I'd be interested to know if BFD works OK if you use public IPv6
addresses for IS-IS adjacencies (although it's a waste of IPs, I'd
still be curious).

Cheers,
James (not currently near the lab).
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Use cases for IntServ in MPLS backbones

2018-10-02 Thread James Bensley
On Tue, 2 Oct 2018 at 14:03, Tim Cooper  wrote:
> The QoS obligations has been pretty much cut/paste from PSN into HSCN 
> obligations, if you haven’t come across that yet. So look forward to that... 
> ;)
>
> Tim C

Unfortunately yes 

James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Use cases for IntServ in MPLS backbones

2018-10-02 Thread James Bensley
On Tue, 2 Oct 2018 at 11:23, Mark Tinka  wrote:
> If you are a large network (such as yourselves, Saku) where it's very likely 
> that the majority your customers are talking to each other directly across 
> your backbone, then I could see the case. But when you have customers 
> transiting multiple unique networks before they can talk to each other or to 
> servers, there is really no way you can guarantee that DSCP=1 from source 
> will remain DSCP=1 at destination.

+1. This is what we did in some parts. The way the Internet
connectivity was sold to customers was as a best effort service so
they had no issues with the DSCP being scrubbed to 0 in these places.

Also lots of customers didn't like packets coming in from the Internet
and hitting their section of the WAN with a DSCP marking on it. Some
explicitly asked us to scrub to 0.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Use cases for IntServ in MPLS backbones

2018-10-02 Thread James Bensley
On Tue, 2 Oct 2018 at 10:57, Mark Tinka  wrote:
> I've never quite understood it when customers ask for 8 or even 16 classes. 
> When it comes down to it, I've not been able to distill the queues to more 
> than 3. Simply put, High, Medium and Low. The 4th queue is for the network 
> itself.

I'm not saying I agree with these 8 classes - just stating what it was
:) I also agree that most people genuinely don't need more than 3-4.
We often "helped" (nudged) customers to design their traffic into just
a few classes.

Here in the land of Her Majesty and cups of tea, if you want to
operate as part of the Public Services Network (a national effort to
provide unified services to the public sector across multiple
providers to stamp out any monopoly) you must comply with their 6
class model [1]:
https://www.gov.uk/government/publications/psn-quality-of-service-qos-specification/psn-quality-of-service-qos-specification

So with those 6 classes, we split voice signalling and media into two, with
the media being an LLQ, and had a separate class to guarantee control
and MGMT plane traffic (e.g. we can still SSH to our
routers when a customer DoS is filling the pipes), so we ended up with 8.
Yay :(

Cheers,
James.

[1] As is customary with any tech savvy government, they've since
sacked off various PSN standards without providing any replacement so
everyone is just sticking to the same expired standards for now

___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Use cases for IntServ in MPLS backbones

2018-10-02 Thread James Bensley
On Tue, 2 Oct 2018 at 10:10, Saku Ytti  wrote:
>
> Hey James,

Hi Saku

> > Yeah so not already using RSVP means that we're not going to deploy it
> > just to deploy an IntServ QoS model. We also use DSCP and scrub it off
> > of dirty Internet packets.
>
> Have you considered full or short pipe? It would be nice if we'd
> transport DSCP bits as-is, when we don't have to care about them.
> Far-ends may be able to utilise them for improved UX if we don't strip
> them.

Yeah in the short pipe model we can map traffic from a transit/peering
port into a best-effort class across the core and leave the DSCP
markings as-is. Caveat: this depends on your tin. If I remember
correctly we had some old tin that gave you the choice of queuing
based on received DSCP and then pushing on our core DSCP/EXP at egress
but not queuing based upon them, or scrubbing the incoming DSCP value
and then queuing on ingress DSCP (which would now be 0). So in the
former case, Internet traffic could freely drop into and congest an
LLQ on these old boxen. So yeah, tin permitting, short pipe FTW in my
opinion. Also most people run multi-vendor networks, I think short
pipe is an easy way forward for multi-vendor QoS.

That is what I was alluding to here (perhaps not very clearly):

> which you
> can simplify in your core using pipe mode or short pipe QoS models. We
> could offer multiple QoS levels to customers, but simplify the classes
> down to like 3-4 in the core, and without any need for RSVP.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Use cases for IntServ in MPLS backbones

2018-10-02 Thread James Bensley
> On 1/Oct/18 12:16, adamv0...@netconsultings.com wrote:
>
> > Hi folks,

Hi Adam,

On Mon, 1 Oct 2018 at 12:00, Mark Tinka  wrote:
>
> So we don't do any label signaling via RSVP-TE at all.
>
> We use DSCP, but really only for on-net traffic.
>
> Off-net traffic (like the Internet) is really treated as best-effort.
> You can't prioritize what you can't control.

Yeah so not already using RSVP means that we're not going to deploy it
just to deploy an IntServ QoS model. We also use DSCP and scrub it off
of dirty Internet packets.

Like with many things, it depends on your requirements. Having worked
for managed WAN providers, where you have an infrastructure shared
amongst multiple customers / stakeholders and provide WAN connectivity
with over the top services like Internet and VoIP, QoS is a product
most customers expect. In this scenario you typically have a set of
queues and customers can access either all of them or a subset for a
cost. In a former life we had up to 8 classes for customers, which you
can simplify in your core using pipe mode or short pipe QoS models. We
could offer multiple QoS levels to customers, but simplify the classes
down to like 3-4 in the core, and without any need for RSVP. I feel
this is a good balance between complexity/simplicity and scalability.
If you don't have multiple stakeholders then IntServ becomes more
appealing due to the granularity on offer, but in the shared
infrastructure scenario my experience is that mapping multiple
customer queues down to fewer core queues helps to protect the
control plane and LLQ traffic in a simple way that covers all
stakeholders, with no need for the additional signalling complexity that
RSVP brings.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] help with routing bypassing bgp path selection

2018-10-01 Thread James Bensley
On Mon, 1 Oct 2018 at 06:49, tim tiriche  wrote:
>
> hello,
>
> i have 5 PE routers running with full iBGP/RSVP-TE MPLS Mesh.
>
> There is a CE connected to PE5 and PE4.
>
> Based on BGP Path selection all of the PE {1,2,3,4} are preferring route to
> PE5 due to BGP Path selection based on AS PATH tiebreaker.
>
> However, i would like PE1 to prefer PE4 and the rest to PE5.  What is the
> best way to go about doing this?

Hi Tim,

I'd say that the preferred way to control routing protocol behavior
dynamically (e.g. per site or CPE) is to use BGP communities. So in
this case you could set up some communities and have the CE advertise
it's routes with a specific community attached.

E.g. To have a per-PE preference policy you can have the CPE advertise
it's prefixes with a specific community "X". Either PE4 could match
that community and increase it's local-pref for the route before
sending it on to it's iBGP neighbors (this would affect all iBGP
neighbours making them all prefer the prefix via PE4) or have PE1
match the community on ingress (this would only affect PE1). The
problem with the later is that if PE4 doesn't have the best path he
won't advertise it without some BGP add-path / best-external knobs
being twiddled.

You can use an iBGP outbound policy on PE4 e.g. PE4 matches community
X which sets a higher local pref only on advertisements to PE1 but
per-PE outbound policies don't scale well.

Communities work great for manipulating eBGP policy but with iBGP
policy manipulation it quickly becomes messy so you need to plan with
care.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] IS-IS POI

2018-09-28 Thread James Bensley
Hi All,

Has anyone used this feature? Did it actually help you pinpoint the
source of an IGP issue?

https://www.juniper.net/documentation/en_US/junos/topics/concept/isis-poi-tlv-overview.html

https://www.juniper.net/documentation/en_US/junos/topics/reference/configuration-statement/purge-originator-edit-protocols-isis.html

As always in a mixed vendor network it can only add a limited amount
of mileage, as not all of our other vendors support it, but that is
where the "empty" knob could help.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] PyEZ - variable into rpc.get request

2018-09-04 Thread James Bensley
On Tue, 4 Sep 2018 at 01:33, Jason Taranto  wrote:
>
> Hi All,
>
> After a while of my head colliding with the wall beside me, would anyone know 
> how to get a variable into an rpc command via pyez.
>
> My latest attempt is below.
>
>   r2check = raw_input("Route to check e.g XXX.XXX.XXX.XXX : ")
>   print ("Checking if route is in the table..")
>   time.sleep(2)
>   r2response = dev.rpc.get_route_information(table='inet.0', 
> destination='r2check')
>   time.sleep(2)
>   
> print("")
>   print(" 
>")
>   print("  Response from router below...  
>")
>   print (etree.tostring(r2response))
>
>
> The error message I get is:
> jnpr.junos.exception.RpcError: RpcError(severity: error, bad_element: 
> r2check, message: could not resolve name: r2check)

Hi Jason,

Someone with more knowledge than I has already provided some info to
you, I just wanted to point out you could also join this Slack
channel: https://networktocode.slack.com

There are rooms for Python, Juniper, Cisco, Ansible, Salt and many
others all relating to network coding and automation. It's a good
place to get help with your network coding issues if you didn't know
about it already.
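
For what it's worth, the RpcError in your output looks like the
classic quoting mistake: destination='r2check' sends the literal string
"r2check" to the router instead of the value the user typed. A minimal
sketch of the fix (host/user details below are obviously made up):

from jnpr.junos import Device
from lxml import etree

dev = Device(host='192.0.2.1', user='lab', password='lab123')  # hypothetical box
dev.open()

r2check = raw_input("Route to check e.g XXX.XXX.XXX.XXX : ")
# Pass the variable itself, not the string literal 'r2check'
r2response = dev.rpc.get_route_information(table='inet.0', destination=r2check)
print(etree.tostring(r2response, pretty_print=True))

dev.close()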

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] spring/sr ospf - opaque lsa's

2018-08-13 Thread James Bensley
Hi Aaron,

I'm not 100% sure what you're asking here. Opaque LSAs are used in SR
to advertise the SID for a prefix/node/adjacency within the IGP:

https://tools.ietf.org/html/draft-ietf-ospf-segment-routing-extensions-25

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Longest Match for LDP (RFC5283)

2018-08-01 Thread James Bensley
On 31 July 2018 at 15:29,   wrote:
> One follow up question,
> What about the case, where the minimum set of /32 loopback routes and 
> associated labels is simply beyond the capabilities of an access node.
> Is there a possibility for such access node to rely on default route + label 
> -where originator of such a labelled default-route is the local ABR(s) in 
> "opt-B" role doing full IP lookup and then repackaging packets towards the 
> actual NH please?

Hi Adam,

In the Seamless MPLS design the access nodes have a single default
route or a single summary prefix for your loopback range (say
192.0.2.0/24) and use LDP Downstream on Demand to request the
transport labels from the aggregation nodes only for the remote PEs
the access node actually needs (i.e. where you have configured a
pseudowire/L2 VPN towards, an iBGP neighbour address for L3 VPN, etc.).
So the access node should *only* have exactly the labels it needs,
with a single route (when using RFC5283).
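
On the Junos side the relevant access-node knobs are roughly the
following (a sketch only - check your release; a dod-request-policy
can also be added to control exactly which FECs get requested):

set protocols ldp longest-match
set protocols ldp downstream-on-demand

combined with the default/summary route pointing at the AGN so the /32
FECs learned via DoD have something to resolve against.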

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Longest Match for LDP (RFC5283)

2018-07-30 Thread James Bensley
On 30 July 2018 at 15:22, Krzysztof Szarkowicz  wrote:
> James,
>
> As mentioned in my earlier mail, you can use it even with DU. If ABR has
> 1 /32 LDP FECs, you can configure LDP export policy on ABR to send only
> subset (e. g. 20 /32 FECs) to access.
>
> Saying that, typical deployment is with DoD, since typically access PEs (and
> not ABRs) have better knowledge which loop backs are needed. So, basically
> access PEs send the info to ABR, which loop backs are needed, and which loop
> backs are not needed via LDP DoD machinery.

Hi Krzysztof,

This was exactly my point:

> On Mon, Jul 30, 2018, 11:15 James Bensley  wrote:
>> unless we create horrible per-LDP
>> neighbour policies on the agg node that only allow the labels for the
>> exact loopbacks that access node needs to reach.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Longest Match for LDP (RFC5283)

2018-07-30 Thread James Bensley
Hi Krasimir, Krzysztof,

On 24 July 2018 at 17:25, Krasimir Avramski  wrote:
> It is used in Access Nodes(default route to AGN) with
> LDP-DOD(Downstream-on-Demand) Seamless MPLS architectures - RFC7032
> A sample with LDP->BGP-LU redistribution on AGN is here.

Thanks Krasimir. Sorry for the delay, I read
https://tools.ietf.org/html/rfc7032,
https://tools.ietf.org/html/rfc5283 and
https://tools.ietf.org/html/draft-ietf-mpls-seamless-mpls-07 before
responding.

On 25 July 2018 at 09:14, Krzysztof Szarkowicz  wrote:
> The purpose of “Longest Match for LDP” is to be able to distribute /32 LDP
> FECs, if corresponding /32 routes are not available in IGP.
> So, on ABR you inject e.g. default route into access IGP domain. ABR has /32
> LDP FECs, and advertises this /32 FECs in LDP (but not in IGP) downstream
> into access domain. In access domain, LDP readvertises hop-by-hop these /32
> LDP FECs, assigning the labels.
>
> It is typically used with LDP DoD. On the other hand, however, nothing
> prevents you from having LDP policy on ABR to inject into access domain only
> specific /32 LDP FECs.

Thanks Krzysztof, that was my understanding from the Juniper link I
provided and the RFC, but it's still nice to have my understanding
clarified by someone else.

After reading the above RFCs I see that the specific use case for this
feature is when using LDP in Downstream on Demand mode, although that
isn't actually called out anywhere in RFC5283 or the Juniper
documentation. I was thinking in DU mode in my head :)

In DU mode, an agg node will advertise all labels to the access node.
If the access node has, say, a 10.0.0.0/22 summary route (an example
range that loopback IPs are assigned from) and RFC5283 enabled, and the
agg node advertises ~1000 /32 IPv4 FEC labels (one for each loopback,
assuming ~1000 PEs exist), the access node will keep all of those
labels even if it only needs a few of them, matching them against the
summary route. This is the default LDP DU behaviour unless we create
horrible per-LDP-neighbour policies on the agg node that only allow the
labels for the exact loopbacks that access node needs to reach. So
relaxing the LDP exact match rules is kind of useless for LDP DU. In
LDP DoD mode, the access nodes only request the label mappings for the
labels they need, so there is no need for per-LDP-neighbour policies,
but we would still need per-neighbour IGP routing policies to only
advertise the /32 loopback IPs that neighbour needs, unless we use
RFC5283 and advertise a summary route (or install a static summary
route).
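
To illustrate the "horrible" filtering alternative for DU mode that
Krzysztof mentions, the agg/ABR-side export policy would be something
along these lines (a sketch, loopback addresses made up):

set policy-options policy-statement LDP-FEC-OUT term WANTED from route-filter 192.0.2.11/32 exact
set policy-options policy-statement LDP-FEC-OUT term WANTED from route-filter 192.0.2.12/32 exact
set policy-options policy-statement LDP-FEC-OUT term WANTED then accept
set policy-options policy-statement LDP-FEC-OUT term OTHER then reject
set protocols ldp export LDP-FEC-OUT

i.e. only the label bindings for the listed /32s are advertised towards
the access side, which is exactly the sort of manual book-keeping that
DoD avoids.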

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Longest Match for LDP (RFC5283)

2018-07-25 Thread James Bensley
On 24 July 2018 at 14:35,   wrote:
> Hi James

Hi Adam,

> Suppose I have ABR advertising default-route + label down to a stub area,
> And suppose PE-3 in this stub area wants to send packets to PE1 and PE2 in
> area 0 or some other area.
> Now I guess the whole purpose of "Longest Match for LDP"  is to save
> resources on PE-3 so that all it has in its RIB/FIB is this default-route +
> LDP label pointing at the ABR.
> So it encapsulates packets destined to PE1 and PE2 with the only transport
> label it has and put the VPN label it learned via BGP from PE1 and PE2 on
> top and send the packets to ABR,
> When ABR receives these two packets -how is it going to know that these are
> not destined to it and that it needs to stitch this LSP further to LSPs
> toward PE1 and PE2 and also how would it know which of the two packets it
> just received is supposed to be forwarded to PE1 and which to PE2?
> This seem to defeat the purpose of an end-to-end LSPs principle where the
> labels stack has to uniquely identify the label-switch-path's end-point (or
> group of end-points)
> The only way out is if ABR indeed thinks these packets are destined for it
> and it also happens to host both VRFs and actually is advertised VPN
> prefixes for these VRFs to our PE-3 so that PE-3 sends packets to PE1 and
> PE2 these will land on ABR ain their respective VRFs and will be send
> further by ABR to PE1 and PE2.

^ This is exactly my problem with this feature. It only works if
directly above the transport label is the IP payload (e.g. in your
topology, PE3 is sending traffic inside the global routing table /
inet.0), then we need to store fewer prefixes + labels for transport
of GRT traffic. For MPLS VPN traffic, as you say, the ABR needs all the
routes (for L3 VPNs), must be IPv6 capable in the case of IPv6 VPNs,
and must be able to do L2 VPN stitching to support inter-area L2 VPNs.
This is quite a lot of extra work for the ABR just to save TCAM/FIB
space on PE-3.

> In the old world the PE-3 would need to have a route + transport label for
> PE1 and PE2.
> Options:
> a) In single area for the whole core approach, PE3 would have to hold these
> routes + transport labels for all other PEs in the backbone -same LSDB on
> each host requirement.
> b) In multi-area with BGP-LU (hierarchical MPLS) we could have ABR to
> advertise only subset of routes + labels to PE-3 (or have PE-3 to only
> accept routes it actually needs) -this reduction might suffice or not, note:
> no VPN routes at the ABR.
> c) I guess this new approach then further reduces the FIB size requirements
> on PE-3 by allowing it to have just one prefix and transport label  (or two
> in case of redundant ABRs), but it increase requirements on ABRs as now they
> need to hold all VPN routes -just like RRs (i.e. require much more FIB than
> a regular PE).

^ Agree with all of the above.
Opt1 doesn't scale well.
Opt2 scales better: you could only accept the /32s you need on each PE,
but now you need per-PE loopback filters :(
Opt3 doesn't scale well either. If your topology is AREA_1 -- AREA_0
-- AREA_2, then the ABR on the area 1/0 border must carry all the
service VRFs/prefixes/labels for all PEs inside areas 1 and 2, so that
an LSP can stretch from a PE inside area 1 to a PE inside area 2 and
that ABR (and the area 0/2 ABR) can perform the service label swap.
This goes for any area 0 ABR, and the more areas you have the worse it
gets; those area x/0 ABRs must carry all service prefixes/labels from
all areas. So this obviously isn't a scalable approach.

So what is the use case of this feature?

All I can see is label switching a default route inside the GRT/inet.0
from an access PE to an access PE in another area. Similar to the
Cisco IOS command "mpls ip default-route", which allocates an LDP label
for the default route (by default no label is allocated for it).

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] Longest Match for LDP (RFC5283)

2018-07-24 Thread James Bensley
Hi All,

Like my other post about Egress Protection on Juniper, is anyone using
what Juniper call "Longest Match for LDP" - their implementation of
RFC5283 LDP Extension for Inter-Area Label Switched Paths (LSPs) ?

The Juniper documentation is available here:

https://www.juniper.net/documentation/en_US/junos/topics/concept/longest-match-support-for-ldp-overview.html

https://www.juniper.net/documentation/en_US/junos/topics/task/configuration/configuring-longest-match-ldp.html

As before, as far as I can tell only Juniper have implemented this:
- Is anyone use this?
- Are you using it in a mixed vendor network?
- What is your use case for using it?

I'm looking at IGP/MPLS scaling issues on some smaller access layer
boxes that run MPLS (e.g. Cisco ME3600X, ASR920 etc.) and have limited
TCAM. We do see TCAM exhaustion issues with these boxes, however the
biggest culprit is Inter-AS MPLS Option B connections. This is because
Inter-AS Opt B double allocates labels (the ASBR allocates a new label
for each VPN prefix it re-advertises), which means label TCAM can run
out before we run out of IPv4/v6 TCAM due to the ~2x growth of labels
vs prefixes.

I'm struggling to see the use case for the feature linked above that
has been implemented by Juniper. When running LDP the label TCAM usage
increments pretty much linearly with IP prefix TCAM usage. If you're
running the BGP VPNv4/VPNv6 address families and per-prefix labelling
(the default on Cisco IOS/IOS-XE) then again label TCAM usage increases
pretty much linearly with IP prefix TCAM usage. If you're using
per-vrf/per-table labels or per-CE labels then label TCAM usage grows
far more slowly than IP prefix usage (roughly with the number of VRFs
or CEs rather than the number of prefixes), and in this scenario we run
out of IP prefix TCAM long before we run out of label TCAM.
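
(For context, the per-vrf mode I mean on the Cisco access boxes is the
"MPLS VPN per-VRF label" feature; from memory the knob is along the
lines of the following, but double check the exact syntax for your
platform/release:

mpls label mode all-vrfs protocol bgp-vpnv4 per-vrf

which allocates one aggregate label per VRF/table instead of one per
prefix.)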

My point here is that label TCAM runs out because of BGP/RSVP/SR
usage, not because of LDP usage.

So who is using this feature/RFC on low end MPLS access boxes (QFX5100
or ACX5048 etc.)?
How is it helping you?
Who's running out of MPLS TCAM space (on a Juniper device) before they
run out of IP prefix space when using LDP (and not RSVP/SR/BGP)?

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Egress Protection/Service Mirroring

2018-07-19 Thread James Bensley
On 15 July 2018 at 19:20, Krzysztof Szarkowicz  wrote:
...
>>> https://pc.nanog.org/static/published/meetings/NANOG71/1451/20171004_Szarkowicz_Fast_Egress_Protection_v1.pdf
>>> https://www.youtube.com/watch?v=MoZn4qq3FcU=69=0s=UUIvcN8QNgRGNW9osYLGsjQQ
...
>> I was originally refering to
>> draft-minto-2547-egress-node-fast-protection-03, is
>> draft-shen-mpls-egress-protection-framework-07 majorly different off
>> the top of your head? I'll read that draft during the week as well as
>> your slides and check for my self, looking at the table of contents
>> though there seems to be clear overlap.
>
> [Krzysztof] In the meantime, these drafts migrated to 
> draft-ietf-mpls-egress-protection-framework, so please look just at  
> draft-ietf-mpls-egress-protection-framework. The current version is 
> draft-ietf-mpls-egress-protection-framework-01 (expiring Dec 2018).
...
>> So which draft is
>> implemented in the Juniper.net documents I linked, anyone know?
>
> [Krzysztof] Juniper implements draft-ietf-mpls-egress-protection-framework. I 
> am not in the position to comment on what is implemented by other 
> vendors/operators. Deutsche Telekom (DT) is co-authoring the 
> draft-ietf-mpls-egress-protection-framework,  and the world-wide first ever 
> deployment of MPLS egress protection for L3VPNs (as mentioned in NANOG71 
> slide deck) was implemented at DT couple of years ago. It works perfectly 
> since then, giving ~50 ms failover during PE failures.

[JB] Thanks for all the info Krzysztof. I've read the draft and
everything is clear to me now. It turns out I already had it under the
name "draft-ietf-mpls-egress-protection-framework" in my inbox from
the IETF WG mailing list and hadn't gotten round to reading it yet. I
might have some feedback on the draft, in which case, I will post back
to the WG mailing list.

I have found PR1278535 with Juniper so I can see that bugs are being
fixed for this feature which is good to know.

I'll speak to Cisco to see if they plan on adopting the draft too.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Egress Protection/Service Mirroring

2018-07-15 Thread James Bensley
On 15 July 2018 at 11:12,   wrote:
> @James is on my todo list so maybe we can exchange notes, (I plan on using
> it in RSVP-TE environment so the added complexity will be only marginal).
> Yes I've been waiting for this feature for quite some time in cisco (got
> promises that maybe on SR) -maybe you can dig some of the old threads I had
> with Oliver Boehmer on this

Thanks Adam. Interesting - I'm in two minds. It looks helpful in
certain scenarios where HA/FRR is required end-to-end (i.e. CE-PE,
PE-P, P-P, PE-PE etc.), and there are still "black spots" in existing
FRR mechanisms. However, this feature seems too complex for widespread
deployment to me (i.e. using it for every L3 VPN customer seems like
too much additional complexity), but if, like us, you provide VoIP to
the emergency services for example, then in those specific cases it
could be a reasonable exception worth some added complexity, to fill
in the end-to-end FRR black spots.

Interesting what you say about Cisco - I'll reach out to our SE during
the week to see if he can shed any light on this. Seeing as only
Juniper seem to support draft-minto and we're a mixed C/J network it's
a definite no go without multi-vendor support. Having said that
though...

On 15 July 2018 at 12:55, Krzysztof Szarkowicz  wrote:
> Hi,
>
> Egress protection was presented at NANOG71:
>
> https://pc.nanog.org/static/published/meetings/NANOG71/1451/20171004_Szarkowicz_Fast_Egress_Protection_v1.pdf
> https://www.youtube.com/watch?v=MoZn4qq3FcU=69=0s=UUIvcN8QNgRGNW9osYLGsjQQ
>
> Another big name, who implemented it in the network, was mentioned at NANOG
> (check the preso). They are using it for L3VPN protection.

Thanks for the info Krzysztof. I will read through the slides during
the week - eating cake and sun bathing right now...

I was originally referring to
draft-minto-2547-egress-node-fast-protection-03; is
draft-shen-mpls-egress-protection-framework-07 majorly different off
the top of your head? I'll read that draft during the week as well as
your slides and check for myself, although looking at the table of
contents there seems to be clear overlap.

So we have Juniper + France Telecom on draft-minto and Juniper +
Huawei + Orange + RtBrick + T-Systems/DTAG on the draft-shen document,
all using this/these feature(s) on Juniper. So which draft is
implemented in the Juniper.net documents I linked, anyone know?

It makes me feel very confident that opening these features up to
testing in our lab wouldn't be a waste of time; those operators are all
an order of magnitude larger than us, maybe even two. However, without
Cisco support it's a no go - we can't have vendor specific technologies
so we definitely need to give Cisco a bump.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Egress Protection/Service Mirroring

2018-07-15 Thread James Bensley
On 15 July 2018 at 12:47, Saku Ytti  wrote:
> On Sun, 15 Jul 2018 at 13:12,  wrote:

Hi Saku, Adam,

>> > a) If P2->PE2 goes down, we have to wait for PE1 to experience it, after
>> PE1
>> > experiences it, it can immediately redirect to PE3
>> > b) If PE2->CE2 goes down, PE2 should  be able to redirect to PE3
>> >
>
>> But this feature solves only egress PE node failure protection (no need for
>> PIC "core" or IGP tuning), that is BGP-PIC Edge is still needed to protect
>> for egress PE-CE link failures.
>
> Are you sure? To me it looks like this feature also fixes PE=>CE,
> because the primary egress PE also knows the backup egress PE, so if
> it gets packet it cannot forward to 'LAN', it should be able to bypass
> the packet to backup egress PE.
> It's not obvious to me how the P can send it to backup egress PE, if P
> cannot reach primary egress PE.

Saku, it fixes your scenario A. The PLR (P2 in your topology)
advertises a secondary new loopback IP into the IGP. PE2 also
advertises this same additional loopback IP and sets next-hop-self to
that IP for VPN prefixes. So prefixes are floating around the iBGP
with a next-hop of PE2's secondary loopback IP. The same IP is
advertised from PLR/P2 but with a worse IGP metric, so VPN traffic is
sent to PE2 as normal.

When PE2 goes down P2 is the first to know (before PE1) and P2 is now
the best source of this additional loopback IP within the IGP. It
becomes the "collector" (collecting VPN traffic for PE2's secondary
loopback IP). No one knows PE2 is down so VPN traffic heads to P2
because it's originating the same loopback IP ("context ID"). In this
topology P2 would have to have BGP running and receive prefixes from
PE2 and PE3. P2 then performs a label swap; it swaps the service label
PE1 is using for traffic sent towards PE2's secondary loopback IP to
the service labels used on the backup "protector" egress PE, PE3, then
forwards the traffic towards PE3 (the "protector").

However, P2 is not a great PLR "collector". PE3 in this topology would
be a better PLR/collector because it already runs BGP for VPN
signaling. PE3 could be a joint collector and protector. I think this
is what Krzysztof is getting at, PE2 and PE3 would be good "collector"
and "protector" nodes for each other:

On 15 July 2018 at 12:55, Krzysztof Szarkowicz  wrote:
> MPLS egress protection is simple, if you have ‘standardized’ PE pairs. I.e.,
> you have CEs connected to PE1/PE2, CEs connected to PE3/PE4, CEs connected
> to PE5/PE6, and so on, but no CEs connected to PE1/PE4

For your scenario B Saku, that is a separate feature, that is
basically PIC Edge Link Protection / BGP advertise best-external - PE3
will never advertise its PE-CE prefixes if it sees better ones via
PE2.

On 15 July 2018 at 11:12,   wrote:
> But this feature solves only egress PE node failure protection (no need for
> PIC "core" or IGP tuning),

I would slightly disagree with you there mate; in Saku's topology,
where PE3 is the collector and protector for PE2 (not P2, to keep the
Ps BGP free), both advertise the same loopback IP (context ID) inside
the IGP with PE3 less preferred, so that PE3 can "collect" the VPN
traffic destined for PE2 when PE2 is down. I'd like that alternate
path to PE3 to be pre-installed in my P node FIBs/HW a-la PIC/FRR.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] Egress Protection/Service Mirroring

2018-07-15 Thread James Bensley
Hi All,

Has anyone used Egress Protection/Service Mirroring, anyone got any
stories they can share good or bad?

To clarify, I'm talking about:
https://tools.ietf.org/html/draft-minto-2547-egress-node-fast-protection-03
https://www.juniper.net/documentation/en_US/junos/topics/task/configuration/Edge-node-failure-protection-BGP-signalled-PWs.html
https://www.juniper.net/documentation/en_US/junos/topics/example/Edge-node-failure-protection-BGP-signalled-PWs.html

Even though the Juniper articles are about L2 VPN protection I'm
actually interested in L3 VPN protection. I'm not interested in
providing this to any of our L3 VPN customers, my use case would be
hosted internal platforms e.g. hosted voice, which needs rapid
recovery times end-to-end.

Was it worth the added complexity?

Did it work for you as expected?

Do you have any other vendors that support it?

Looking at the draft it seems Juniper probably implemented it
specially for France Telecom. I can't find any other vendor (searching
on Google) that is supporting it (or a variation of it).

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Segment Routing Real World Deployment (was: VPC mc-lag)

2018-07-09 Thread James Bensley
On 8 July 2018 21:35:36 BST, adamv0...@netconsultings.com wrote:
>Hold on gents,
>You are still talking about multi-hop TCP sessions, right? Sessions
>that
>carry information that is ephemeral to the underlying transport network
>-why
>would you want those session ever go down as a result of anything going
>on
>in the underlying transport network -that's a leaky abstraction , not
>good
>in my opinion.
>You just reroute the multi-hop control-plane TCP session around the
>failed
>link and move on, failed/flapping link should remain solely a
>data-plane
>problem right?
>So in this particular case the VC label remains the same no matter that
>transport labels change in reaction to failed link.
>The PW should go down only in case any of the entities it's bound to
>goes
>down be it a interface or a bridge-domain at either end (or a whole PE
>for
>that matter) -and not because there's a problem somewhere in the core.

I was having the exact same thoughts. LDP or BGP signalled - it should
be independent of IGP link flaps. Saku raises a good point that with
BGP signalling we can have multiple RRs, meaning that losing one
doesn't mean the signalling state is lost from the network (so the
service stays up); however, if there is only one ingress PE, that SPoF
undermines the multiple RRs. With LDP we can signal backup pseudowires
(haven't tried with BGP?) - there is a service disruption whilst the
LDP session is detected as dead - but it does work if you have two
ingress PEs and two egress PEs and set up a crisscross topology of
pseudowires/backup-pseudowires.
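
For reference, the LDP-signalled backup pseudowire config on Junos is
along the lines of the following (addresses/interface made up, sketch
only):

set protocols l2circuit neighbor 192.0.2.2 interface ge-0/0/1.100 virtual-circuit-id 100
set protocols l2circuit neighbor 192.0.2.2 interface ge-0/0/1.100 backup-neighbor 192.0.2.3 standby

so the attachment circuit fails over to the pseudowire towards
192.0.2.3 once the primary neighbour is declared dead.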

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Segment Routing Real World Deployment (was: VPC mc-lag)

2018-07-07 Thread James Bensley
On 5 July 2018 at 09:40, Mark Tinka  wrote:
>
> In our case, we have different boxes from Cisco, each with varying support
> for SR. This makes things very tricky, and then we need to also throw in our
> Juniper gear. For me, the potential pain isn't worth the hassle, as we are
> not suffering in any way that makes the move to SR overly compelling.

Previously I mentioned that we build out greenfield regional networks
but the core that links them is of course brownfield. We have the same
problem there, mixed Cisco Juniper and a reasonable amount of variance
then within those two vendor selections. As previously mentioned,
there are no requirements that we can only fix with SR and the
benefits aren't worth the truckroll to get SR capable kit and code
everywhere.

> - Go IPv6 native: If using ISIS as the IGP we should be able to go
> IPv4 free (untested and I haven't research that much!).
>
>
> For me, this is the #1 use-case I was going for; to be able to natively
> forward IPv6 packets inside MPLS, and remove BGPv6 from within my core.
>
> I had a discussion about this with Saku on NANOG:
>
> http://seclists.org/nanog/2018/May/257
>
> Where we left things was that while the spec allows for signaling of IPv6 in
> the IGP, there is no clear definition and/or implementation of MPLSv6 in the
> data plane today.

Ah, I remember that thread. It became quite long and I was very busy
so I lost track of it. Just read through it. I also looked at LDPv6 a
while back, saw it was not well supported, and so passed on it. For us
6PE (and eventually 6vPE as we move to Internet in a VRF) "just works".
IPv6 native in SR isn't actually enough of a reason for me to migrate
to it, I don't think.

You mentioned in the NANOG thread that you wanted to remove BGP from
your core - are you using 6PE or BGP IPv6-LU on every hop in the path?
I know you are a happy user of BGP-SD so I guess it's Internet in the
GRT for you?

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Segment Routing Real World Deployment (was: VPC mc-lag)

2018-07-06 Thread James Bensley



On 5 July 2018 09:56:40 BST, adamv0...@netconsultings.com wrote:
>> Of James Bensley
>> Sent: Thursday, July 05, 2018 9:15 AM
>> 
>> - 100% rFLA coverage: TI-LA covers the "black spots" we currently
>have.
>> 
>Yeah that's an interesting use case you mentioned, that I haven't
>considered, that is no TE need but FRR need.
>But I guess if it was business critical to get those blind spots
>FRR-protected then you would have done something about it already
>right?

Hi Adam,

Yeah correct, no mission critical services are affected by this for us, so the 
business obviously hasn't allocated resource to do anything about it. If it was 
a major issue, it should be as simple as adding an extra backhaul link to a 
node or shifting existing ones around (to reshape the P space and Q space to 
"please" the FRR algorithm).

>So I guess it's more like it would be nice to have,  now is it enough
>to
>expose the business to additional risk? 
>Like for instance yes you'd test the feature to death to make sure it
>works
>under any circumstances (it's the very heart of the network after all
>if
>that breaks everything breaks), but the problem I see is then going to
>a
>next release couple of years later -since SR is a new thing it would
>have a
>ton of new stuff added to it by then resulting in higher potential for
>regression bugs with comparison to LDP or RSVP which have been around
>since
>ever and every new release to these two is basically just bug fixes.   

Good point, I think it's worth breaking that down into two separate 
points/concerns:

Initial deployment bugs:
We've done stuff like pay for a CPoC with Cisco, then deployed, then had it all 
blow up, then paid Cisco AS to assess the situation only to be told it's not a 
good design :D So we just assume a default/safe view now that no amount of 
testing will protect us. We ensure we have backout plans if something 
immediately blows up, heightened reporting for issues that take 72 hours to 
show up, and change freezes to cover issues that take a week to show up etc. 
So I think as far as an initial SR deployment goes, all we can do is our best 
with regards to being cautious, just as we would with any major core change. 
So I don't see the initial deployment as any more risky than other core 
projects we've undertaken, like changing vendors, entire chassis replacements, 
code upgrades between major versions etc.

Regression bugs:
My opinion is that in the case of something like SR, which is being deployed 
based on early drafts, regression bugs are potentially a bigger issue than the 
initial deployment. I hadn't considered this. Again though, I think it's 
something we can reasonably prepare for. Depending on the potential impact to 
the business you could go as far as standing up a new chassis next to an 
existing one, but on the newer code version, run them in parallel, migrate 
services over slowly, and keep the old one up for a while before you take it 
down. You could do something as simple as physically replacing the routing 
engine and keeping the old one on site for a bit so you can quickly swap back. 
Or just drain the links in the IGP, downgrade the code, and then un-drain the 
links, if you've got some single homed services on there. If you have OOB 
access and plan all the rollback config in advance, we can operationally 
support the risks, no differently to any other major core change.

Probably the hardest part is assessing what the risk actually is, and knowing 
what level of additional support, monitoring and people you will need. If you 
under-resource a rollback of a major failure, and fuck the rollback too, you 
might need some new pants :)

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Segment Routing Real World Deployment (was: VPC mc-lag)

2018-07-06 Thread James Bensley



On 5 July 2018 14:08:02 BST, Aaron Gould  wrote:
>I really like the simplicity of my ldp-based l2vpn's... eline and elan 
>
>You just made me realize how that would change if I turned off ldp.
>
>So, SR isn't able to signal those l2circuits, and manual vpls instances
>?
>... I would have to do all that with bgp ?  I use bgp in some cases for
>rfc4762, but not for simple martini l2circuits.
>
>My entire cell backhaul environment is based on ldp based pseudowires.

Hi Aaron,

Yes that would be a change in your existing setup but only if you turned off 
LDP. SR fully supports (on paper at least!) running LDP and SR simultaneously 
so you wouldn't need a big bang approach and have to hard switch if you were to 
move to BGP signalled services and/or SR. However, I don't think SR is designed 
to be run along side LDP long term either. I'm sure bugs will pop up, if you 
can use LDP for only signalling L2 VPNs somehow and SR for transport LSP 
signalling you wouldn't need to migrate. I think on Juniper you might be able 
to raise the "preference" (Administrative Distance is Cisco parlance) of LDP 
separate from the IGP but I don't think you can do that on Cisco?
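
The Junos knob I'm thinking of is simply something like:

set protocols ldp preference 115

(LDP routes land in inet.3 with preference 9 by default - I haven't 
checked what preference the SR/labelled-IGP routes install with, so 
treat the value above as illustrative only.)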

I'm ranting a bit here, but I'd personally look to move to all BGP signalled 
services if I was moving to SR. You have one protocol for IGP transport (SR 
extended OSPF or SR extended IS-IS) and one protocol for all service transport 
signalling (BGP). We (the industry) have our lovely L3 VPNs already, with 
standard BGP communities, RTs and RDs and then a bunch of policies and route 
reflectors to efficiently control route distribution and label allocation. We 
also have high-availability of that information through RR clusters and 
features like BGP Add-Path and PIC. We also have good scalability from 
signalled services using FAT and Entropy labels.

Now with BGP signalled EVPN using MPLS for transport instead of VXLAN, we have 
again RTs and RDs and communities et al. This means we can use similar policies 
on the same RR's to control route (MAC or GW) and label distribution 
efficiently and only to those who exactly need to carry the extra state. We get 
to use the same HA and scalability benefits too. Even with BGP signalled and 
BGP based auto discovery for ELINE services, we control who has that AFI/SAFI 
combo enabled cleanly. With LDP, the configuration and control are both fully 
distributed to the PEs. Not a major issue, but "BGP for everything" helps to 
keep the design, implementation and limitations of all our services more 
closely aligned.

If you're also using FlowSpec, BMP, BGP-LS, BGP-MDT etc, it makes sense to me 
to keep capitalising on that single signaling protocol for all services.

Cheers,
James.

P.s. sorry, on a plane so I've got time to kill.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Segment Routing Real World Deployment (was: VPC mc-lag)

2018-07-05 Thread James Bensley
On 4 July 2018 at 18:13, Mark Tinka  wrote:
>
>
> On 4/Jul/18 18:28, James Bensley wrote:
>
> Also
>
> Clarence Filsfils from Cisco lists some of their customers who are
> happy to be publicly named as running SR:
>
> https://www.youtube.com/watch?v=NJxtvNssgA8=youtu.be=11m50s
>
>
> We've been struggling to get vendors to present deployments from their
> customers when they submit talks around SR. So the SR talks end up becoming
> updates on where SR is from a protocol development standpoint, recaps for
> those that are new to SR, e.t.c.
>
> Perhaps those willing to talk about SR from the vendor community do not have
> the in with their customers like folk like Clarence might, but I'm not sure.
>
> I'll reach out to Clarence and see if we can get him to talk about this with
> one or two of his customers at an upcoming meeting.

Hi Mark,

If you get any feedback you can publicly share I'm all ears!

As far as a greenfield deployment goes I'm fairly convinced that SR
would be a good idea now; it would future-proof that deployment and
for our use case it does actually bring some benefits. To explain
further: we don't have one large contiguous AS or IGP, we build
regional MPLS networks, each on a private AS (and with a standalone
IGP+LDP), and use Inter-AS services to provide end-to-end services
over the core AS network between the regional networks.

If we built a new regional network tomorrow these are the benefits I
see from SR over our existing IGP+LDP design:

- Obviously remove LDP which is one less protocol in the network: This
means less to configure, less for CoPPs/Lo0 filter, less for inter-op
testing as we're mixed vendor, less for operations to support.

- Easier to support: Now that labels are transported in the IGP I hope
it would be easier to train support staff and to troubleshoot MPLS
related issues. They don't need to check LDP is up, they should see
the SID for a prefix inside the IGP along with the prefix. No prefix,
then no SID, etc. (see the config sketch after this list). I would
ideally move all services onto BGP signalling (so no more LDP signalled
pseudowires; BGP signalled services only, to unify everything [L3 VPN,
L2 VPN VPWS/EVPN/VPLS etc.]).

- Go IPv6 native: If using ISIS as the IGP we should be able to go
IPv4 free (untested and I haven't researched that much!).

- Bring label mapping into the IGP: No microloops during
re-convergence as we heavily use IP FRR rLFA.

- 100% rFLA coverage: TI-LA covers the "black spots" we currently have.

- Remove LACP from the network: SR has some nice ECMP features. I'm
not going to start an ECMP vs LAG discussion (war?), but ECMP means we
don't need LACP, which again is one less protocol for inter-op testing,
less to configure, less to support etc. It also keeps our p-t-p links
all the same instead of two kinds, p-t-p L3 or LAG bundle (also fewer
config templates).

- Remove microBFD sessions: In the worst case LAG scenario we would
have LACP, uBFD, IGP, LDP and BGP running over a set of links between
PEs; with SR we can chop that down to just BFD, IGP and BGP. If we
wish, we can still have visibility of the ECMP paths, or we can use
prefix-suppression and hide them (this goes against my IPv6-only item
above as I think IS-IS is missing this feature?).
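
To put the first couple of items above into context, enabling SR in
the IGP on Junos is only a couple of lines - roughly the following
(SRGB range and index values are made up, and the exact statements may
vary by release):

set protocols isis source-packet-routing srgb start-label 800000 index-range 4096
set protocols isis source-packet-routing node-segment ipv4-index 101

After that the node SID is flooded in IS-IS along with the prefix,
which is the "no prefix, no SID" troubleshooting simplicity I mean
above.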


The downsides that I know of are;

- Need to up-skill staff: For NOC staff it should be easy, use this
command "X" to check for prefix/label, this command "Y" to check for
label neighborship. For design and senior engineers since we don't use
MPLS-TE it shouldn't be difficult, we're typically deploying
set-and-forget LDP regional networks so they don't need to know every
single detail of SR (he said, naively).

- New code: Obviously plenty of bugs exist, in the weekly emails I
receive from Cisco and Juniper with the latest bug reports many relate
to SR. But again, any established operator should have good testing
procedures in place for new hardware and software, this is no
different to all those times Sales sold something we don't actually
do. We should all be well versed in testing new code and working out
when it's low risk enough for us to deploy. Due to our lack of MPLS-TE
I see SR as fairly low risk.


I'd be very interested to hear yours or anyone else's views on the
pros and cons of SR in a greenfield network (I don't really care about
brownfield right now because we have no problems in that existing
networks that only SR can fix).

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Segment Routing Real World Deployment (was: VPC mc-lag)

2018-07-04 Thread James Bensley
On 4 July 2018 at 17:09, James Bensley  wrote:
> On 4 July 2018 at 10:09, Mark Tinka  wrote:
>>
>>
>> On 4/Jul/18 10:58, Niall Donaghy wrote:
>>> Hi Mark,
>>>
>>> As for segment routing, several of our NREN partners have SR up and running 
>>> in their backbones.
>>> We in GÉANT (the backbone that connects these NRENs) are looking toward 
>>> deploying SR across our entire backbone in the medium term.
>>
>> Thanks, Niall. This will probably be the first deployment of SR in the
>> wild that I've heard of (I'm on the PC for several NOG's, and getting a
>> submission on SR from anyone other than a vendor has been the bane of my
>> PC existence since 2013).
>
> Hi Mark,
>
> Walmart, Microsoft and Comcast all claim to have been running SR since 2016:
>
> http://www.segment-routing.net/conferences/2016-sr-strategy-and-deployment-experiences/

Also

Clarence Filsfils from Cisco lists some of their customers who are
happy to be publicly named as running SR:

https://www.youtube.com/watch?v=NJxtvNssgA8=youtu.be=11m50s

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Segment Routing Real World Deployment (was: VPC mc-lag)

2018-07-04 Thread James Bensley
On 4 July 2018 at 10:09, Mark Tinka  wrote:
>
>
> On 4/Jul/18 10:58, Niall Donaghy wrote:
>> Hi Mark,
>>
>> As for segment routing, several of our NREN partners have SR up and running 
>> in their backbones.
>> We in GÉANT (the backbone that connects these NRENs) are looking toward 
>> deploying SR across our entire backbone in the medium term.
>
> Thanks, Niall. This will probably be the first deployment of SR in the
> wild that I've heard of (I'm on the PC for several NOG's, and getting a
> submission on SR from anyone other than a vendor has been the bane of my
> PC existence since 2013).

Hi Mark,

Walmart, Microsoft and Comcast all claim to have been running SR since 2016:

http://www.segment-routing.net/conferences/2016-sr-strategy-and-deployment-experiences/

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] [c-nsp] Leaked Video or Not (Linux and Cisco for internal Sales folks)

2018-06-29 Thread James Bensley
On 29 June 2018 at 13:55, Gert Doering  wrote:
> Hi,
>
> On Fri, Jun 29, 2018 at 01:49:46PM +0100, adamv0...@netconsultings.com wrote:
>> Just wondering what's the latest on the GPU for packet forwarding front (or 
>> is that deemed legacy now)?
>
> Last I've heard is that pixel shaders do not map really nicely to the
> work needed for packet forwarding - so it works, but the performance gain
> is not what you'd expect to see.

Which is to be expected, right? Typical GPU instruction sets and ALUs
are great for floating point operations, which we don't need for packet
processing. Packets typically need low complexity tasks performed at
high rates. Various high end DC switches like Nexus boxes use GDDR5
RAM just as a graphics card would, but the processing is done by an
ASIC, which makes sense to me - this is a specific task, and not the
place for general purpose x86 compute chips.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] How does internal communication between vMX virtual control plane and virtual forwarding plane work?

2018-06-06 Thread James Bensley
On 4 June 2018 at 13:46, Martin T  wrote:
> Hi!

Hi!

> When I deploy a vMX using orchestration scripts, then I end up with
> following virtualized topology:
>
> https://i.imgur.com/bBTXGM0.png
>
> Now when I execute "file copy root@192.168.122.1:/tmp/1G_file
> /dev/zero" in vMX, then I can see that traffic traverses
> virbr0[ge-0.0.0-vmx1] <-> [ge-0/0/0]vcp-vmx1[em1] <->
> [vcp-int-vmx1]br-int-vmx1[vfp-int-vmx1] <-> [int]vfp-vmx1. Am I
> misunderstaning this? Or does it really work in a way that first the
> VM running Junos receives the traffic, then forwards it to VM running
> virtualized Trio and then the traffic is forwarded back to Junos VM?

Have I missed something in relation to your topology/config? ge-0/0/0
- is that meant to provide you with management access to the VCP, or is
it supposed to be a forwarding plane interface? If the latter,
shouldn't it be connected to the VFP VM and not the VCP VM (assuming
you are trying to access the control plane over an in-band/forwarding
plane interface)?

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Force a reboot from the serial console?

2018-06-01 Thread James Bensley
On 31 May 2018 at 18:04, Chris Adams  wrote:
> I had an MX80 crash (insert sad face here) - worse problem was that it
> did a crash dump and then did NOT reboot.  I have out-of-band serial
> access to the console, so I could see that, after the dump completed, it
> just printed:
>
> watchdog: scheduling fairness gone for 240 seconds now.
>
> every 20 seconds (with the seconds count increasing).  I had to open a
> remote-hands ticket to get it power cycled.
>
> Is there any way to for a reboot at that point?  Like, on old Sun
> servers, sending a BREAK followed by certain other keys could get to the
> firmware and hard-reboot the system, even with the OS was fubar.  Is
> there anything like that on a Juniper?

Some network devices don't even have console ports or OOB IP MGMT
ports etc., so for this reason we deploy PDUs with remote power
cycling; we can then hard power cycle a crashed device. I strongly
recommend this.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] "show ip cef exact-route"

2018-05-18 Thread James Bensley
On 18 May 2018 at 09:37,  <adamv0...@netconsultings.com> wrote:
> So to clarify couple of things,
>
> First thing first,
> The "show cef exact-route" and "show mpls forwarding exact-route" is just a 
> simulation, so it's not the real thing as I thought unfortunately.
>
> Secondly,
> One can use [ location node-id ] with the above commands to instruct the 
> simulation what hash function to use in an environment with multiple NPU 
> versions (Trident, Typhoon, Tomahawk, etc...).
>
> Oh and there's a bug as well, CSCug36061 (affected 4.2.3. and 4.3.2)

As previously mentioned

On 15 May 2018 at 09:47, James Bensley <jwbens...@gmail.com> wrote:
> With regards to ECMP/LAG you need to run some extra commands that are
> often platform specific. For ASR9K for example use the "bundle-hash"
> command - I haven't used it in a while but I think that is what you're
> looking for (e.g. "bundle-hash bundle-ether 1 location 0/0/CPU0").

"bundle-hash" example here:
https://null.53bits.co.uk/index.php?page=asr9000-load-balancing#hash-fields

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] "show ip cef exact-route"

2018-05-18 Thread James Bensley
On 15 May 2018 at 10:20, Zsolt Hegyi  wrote:
> In case you haven't read it yet, there is a free book called This Week: An
> Expert Packet Walkthrough on the MX Series 3D by David Roy, it has a bunch
> of examples on using jsim and other FPC/PFE commands, including what I think
> might be your exact use-case. Might come in handy.
>
> Zsolt

Oh man! Facepalm!

I remember now that I read that back when it came out - I skipped over
the JSIM part as I didn't need it then and was more interested in some
other details relevant to a project at the time, making a mental note
to come back to the JSIM part... 3 years later I've obviously forgotten
to read it and now I need it :)

Thanks,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] "show ip cef exact-route"

2018-05-15 Thread James Bensley
On 15 May 2018 at 02:51, Nikolas Geyer  wrote:
> Someone at Juniper has kindly reached out and advised that a similar command 
> was added in 17.1R1 for the MX;
>
> https://www.juniper.net/documentation/en_US/junos/topics/reference/command-summary/show-forwarding-options-load-balance-ingress-interface.html

I should have been clearer I guess - although the example link I
provided was to find the outgoing interface when using ECMP, I simply
wanted to supply ingress details and find the egress details, for a
single link. This is what I can do with "show ip cef exact-route...".
I can specify the ingress/input details, as opposed to commands like
"show route " which show egress information without considering
any ingress information.


> On 14 May 2018, at 7:32 pm, Saku Ytti > 
> wrote:
>
> Have you found cef exact-route to be correct?
>
> Last time I used this (ASR9000), it was giving wrong results to me. I
> think there is entirely separate piece of code for LAG result in
> software code and the CSCO EZChip microcode, and different people code
> IOS-XR than ezchip, so I think there is failure mode where one code is
> updated, and another is not, I hope I'm wrong.

In the case of non-ECMP/LAG, which is what I was referring to - "what
is the egress interface AND egress encapsulation details when a
packet ingresses with details X?". That is what I get from "show ip
cef exact-route ..." on IOS/IOS-XE/IOS-XR. In this single egress/path
case - yes, "show ip cef " works fine for me.

With regards to ECMP/LAG you need to run some extra commands that are
often platform specific. For ASR9K for example use the "bundle-hash"
command - I haven't used it in a while but I think that is what you're
looking for (e.g. "bundle-hash bundle-ether 1 location 0/0/CPU0").

> But if I'm right, then the only way to do this, is actually ask the
> microcode 'hey i have this packet, do a lookup for it', or like in
> CAT7600/ELAM, get lookup results for real traffic.

Yes, 7600 ELAM is great - all platforms should have this. This is
exactly what I want. The reason is that I can match any ingress packet
(by specifying a mask in hex to match the packet headers) and it will
not only tell me the egress details for "good" paths, but for bad
paths it also shows me the egress drop interface, which might be a
drop/null interface ID, a hardware policer, or that the packet was
recirculated, etc. So when packets aren't passing through the device
as expected I can see what's happening in hardware - this is what I'm
interested in: when things don't work properly, how can I ask the
hardware what it's doing with the packet in Junos?


> On 15 May 2018 at 02:27, Nikolas Geyer 
> > wrote:
> Unless it’s changed in newer releases there is no equivalent which is 
> annoying.
>
> I believe you can drop to the FPC vty and extract the information card by 
> card similar to the link you shared, but it’s not exactly a workable 
> solution, nor “officially supported” by Juniper.
>
> The lack of this command is literally my biggest frustration with Juniper.

I believe it is there - it's just a private/internal tool. JSIM is
like 7600 ELAM: we can build a parcel, send it to the PFE/PPE and
get the response. JTAC closed my case asking how to use it and refused
to provide any documentation or explanation, which is very unhelpful.

NPC0(abr2-ld5slo.core vty)# set jsim input-port wan

NPC0(abr2-ld5slo.core vty)# set jsim ipsrc 10.0.0.50

NPC0(abr2-ld5slo.core vty)# set jsim ipdst 10.0.0.20

NPC0(abr2-ld5slo.core vty)# set jsim ip-protocol 1

NPC0(abr2-ld5slo.core vty)# show jsim packet


08004532
000120018685
0a320a14






NPC0(abr2-ld5slo.core vty)# jsim run 1
[May 11 12:22:10.114 LOG: Info] TTRACE PPE14 Context2 now in-active.


^ After running "jsim run" I receive lots of output that is going to
take a long time to reverse engineer and work out what it all means. I
haven't got the time right now so I was hoping JTAC would help but I
guess I'll just have to work out it for myself.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] "show ip cef exact-route"

2018-05-11 Thread James Bensley
Hi All,

Does anyone know of a command like the Cisco CEF "exact-route" command
on Juniper?

I've seen this older thread: https://lists.gt.net/nsp/juniper/50645

Which links to a post on using JSIM but for DPC cards, but I'm
interested in MPC cards:
http://junosandme.net/article-junos-load-balancing-part-3-troubleshooting-109382234.html

Does anyone have any details on using JSIM on MPC cards on MX
platforms? Or is there "another way" ?

I'm going to open a JTAC case as well as asking here; however, in the
past they have rejected requests to explain PFE commands to me or to
provide documentation for them. I have managed it a few times, but only
after a couple of weeks of non-stop screaming, so I'm not holding my
breath for that option.

Kind regards,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] mpls.0 doesn't show LSI as the next hop

2018-05-02 Thread James Bensley
Hi Arie,

On 1 May 2018 at 23:21, Arie Vayner  wrote:
> user@MX104> show route table mpls.0
> 16 *[VPN/0] 00:25:28
>   to table vpn_public_vrf.inet.0, Pop
>
>
> While if we do the same on our MX240 it looks like this:
> 18 *[VPN/0] 4d 21:24:17
> > via *lsi.2048* (vpn_public_vrf), Pop

Just checked a lab MX104, it shows the same output as your MX240
output (it's running 16.1R6.7).

Routing instance type vrf && vrf-table-label yeah?
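
For reference, the config I'm asking about is just:

set routing-instances vpn_public_vrf instance-type vrf
set routing-instances vpn_public_vrf vrf-table-label

(instance name taken from your output). It would be worth diffing that
part of the config and the Junos versions between the two boxes - I'd
also compare "show route table mpls.0 extensive" for that label on
each.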

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] migration from cisco VRF+Vrrp to the juniper ACX

2018-05-02 Thread James Bensley
On 1 May 2018 at 17:07, A. Camci  wrote:
> does anyone have an idea why it does not work on Acx( vrf+ vrrp).
>
> Br ap

How have you tried to debug this set up?

From your original email: "maybe vrf + vrrp doesn't work on an ACX" -
have you confirmed this? Are you trying something that isn't
supported/supposed to work?

"after migration to the ACX has customer no connection from the VRF.
if we switch back to cisco, everything works fine" - How are you
verifying this?

When ACX is master have you checked ARP and MAC tables to see that
physically traffic is being forwarded to the ACX?

Is there a switch in between, and are the correct VLANs allowed on the
correct ports? Are the MAC tables updated after the master switch-over?

Have you checked the interface counters on the ACX to see that traffic
is coming into the ACX?

Can you run a packet capture on the ACX to see it is the correct
traffic? Can you mirror the switch ports to check there too?

Do you see any drop counters on the ACX?

What have you actually done to debug this issue?

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] mx960 to mx960 via ciena 6500 - mtu smaller in the middle

2018-04-18 Thread James Bensley
On 17 April 2018 at 11:57, Gert Doering <g...@greenie.muc.de> wrote:
> Hi,
>
> On Tue, Apr 17, 2018 at 12:34:18PM +0300, Saku Ytti wrote:
>> On 17 April 2018 at 11:25, James Bensley <jwbens...@gmail.com> wrote:
>>
>> > Also you say you have OSPF and LDP up but if you bring up BGP over
>> > this link you may have issues. BGP packs UPDATE messages up to the TCP
>> > MSS (derived from the link MTU). If you are carrying the full table
>>
>> BGP messages are limited to 4096.
>
> Which does not stop TCP from coalescing two messages into one 8 kbyte packet.
>
> (As in "I have had a link that was transported over someone else's packet
> network, and when they re-routed in an outage, I lost 8 bytes MTU,
> resulting in TCP failing across a 9200-minus-8-byte-link...")

This is what I have observed on IOS: UPDATEs are packed up to the TCP
MSS. Not sure if it's the case for Junos too, but I simply wouldn't
risk it.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] mx960 to mx960 via ciena 6500 - mtu smaller in the middle

2018-04-17 Thread James Bensley
On 16 April 2018 at 20:58, Aaron Gould  wrote:
> See juniper interface MTU is set to max 16000 bytes. but when I ping I can
> only get 9584 bytes through to the other side of the link.  This mx960 is
> linked to another mx960, but Ciena 6500 dwdm is in between the mx960's.

Hi Aaron,

I think MTU consistency is important. It will make for a support
issues if the two devices has 16,000 byte MTU configured and that
can't actually be achieved. Will the Level1 support/NOC guys know to
check the Ciene device too and will they be able to work out the
problem? You don't issue being escalated for what is the "normal"
behavior.

Also you say you have OSPF and LDP up, but if you bring up BGP over
this link you may have issues. BGP packs UPDATE messages up to the TCP
MSS (derived from the link MTU). If you are carrying the full table,
for example, then you could end up with BGP UPDATE messages 16,000
bytes long that won't cross the link. You end up with BGP establishing
and then flapping after $timeout because no UPDATEs were received, and
this process loops round indefinitely until you bodge the TCP MSS, use
PMTUD or (preferred choice) correct the MTU issue.
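
As a quick sanity check of what the path will actually carry
end-to-end, something like this from one MX to the other is usually
enough (remember the size here is the ICMP payload, so add 28 bytes of
IP+ICMP headers to get the IP MTU being tested - address made up):

ping 10.0.0.2 size 9556 do-not-fragment rapid count 5

then walk the size up/down until you find where it stops, and set the
interface/protocol MTUs to something everything in the path can
honour.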

Another issue you may face is that it's best to have consistent MTUs
across your entire network, not just this point-to-point link. If other
parts of your network don't have this size MTU you may have some
hard-to-debug issues further down the road when a customer connects to
two different parts of the network with disparate MTU sizes.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Going Juniper

2018-04-12 Thread James Bensley
>>> On 11 April 2018 at 13:43, Ola Thoresen  wrote:
>>> Granted at least JNPR offering allows you to run same device as pure
>>> L2, with Cisco offering it is satellite-only box, cannot be used as
>>> L2.
>>
>> I know what you mean, but I must say that this time it seems like they have 
>> more or less managed to do it right.  They use pretty standard protocols for 
>> everything, it is just "packaged" so you don't need to think about it.
>>
>> But as you say, you can easily use the exact same hardware in a regular L2 
>> setup, the thing you gain from the satellite setup is central management of 
>> only the routers, which can be a time and management saver.

I have to say, I completely disagree.

On 11 April 2018 at 13:12, Alexandre Guimaraes
 wrote:
> Last notice that I have about Junos fusion, some features doesn’t work in to 
> satellite ports, like ethernet ccc.

^ This is the reason why.

Fusion is a layer 1 extension, my 1st question to Juniper is why is
there a DC version and a Service Provider version?

https://www.juniper.net/assets/us/en/local/pdf/datasheets/1000523-en.pdf:

"Junos Fusion Provider Edge is a technology enabler that overcomes
optical pluggable physical limitations by delegating low-speed
optical interfaces to a cost-appropriate switch, virtually expanding
connectivity to thousands of ports from a single Juniper Networks MX
Series 3D Universal Edge Router."

"Junos Fusion Data Center provides automated network configuration and
operational simplicity for medium to large data centers with up to
four QFX1 switches deployed in the spine layer and any combination
of up to 64 QFX5100, QFX5110-48S, QFX5200-32C, and EX4300 top-of-rack
switches deployed as satellite devices."

Basically the same justification for both technology variants - more
ports for less money with zero touch deployment. Both scenarios can
benefit from those proposed advantages, so there is no need to make them
separate products. Obviously they do it to make more money, but it
looks bad if you have an Ethernet-based layer 1 extension technology
and there is more than one variant of it; in 2018 we're only using one
kind of Ethernet here.

OK, 2nd question, and this is my real annoyance, and why I believe
Juniper haven't done it right; why are certain features, e.g. Inter-AS
MPLS Option B connections, not supported over the Service Provider
version? We're heavy users of OptB interconnects and these devices are
supposed to be layer 1 extensions. Can they pass an Ethernet frame,
yes or no? The answer is yes, so anything higher level should be
ignored in my opinion.

After pressing Juniper they said that certain traffic types aren't
supported because when they release a new Fusion version they test the
most common traffic types and they "can't test everything", and MPLS
OptBs are on the "no time for testing" list. It is possible that it
would work just fine, but if any issues arise they won't support it.
Surely they should test that Ethernet frames pass OK and then support
anything that runs over Ethernet?

If you're selling a layer 1 extension service for Ethernet and a
certain kind of traffic that runs over the top of Ethernet isn't
supported - that is a great big red flag to me. We only have Fusion
because they wanted more people to run it - they sold it to us for
cheaper than vanilla layer 2 switches.

I feel dirty.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Going Juniper

2018-04-11 Thread James Bensley
On 11 April 2018 at 10:31, Saku Ytti  wrote:
> New RE for MX104 was on the table early on in the MX104 history, but
> then JNPR changed tracks, citing XEON not being thermally possible on
> it.

I had heard (more or less from the horse's mouth) that the MX104s were
initially developed for an Indian telco - they basically wanted MX80s
but with a higher temperature rating and dual REs, allowing them to be
used in mobile base station sites. These would be hard to reach so
they needed reliability (dual REs) and sometimes "external" deployment
(as in not in an air-cooled DC but maybe a roof-top cabinet etc.).
Upgraded REs would have been great. I have seen some that run with low
CPU "normally" but spike to 100% when they are being polled by SNMP.

> At least Nokia and Huawei seem to think there are other addressable
> markets still, markets which still want to use 1GE and 10GE, with
> various pluggable optics.

We're now working to introduce Huawei into the mix. I would agree with
you that low port count, good, and reasonably priced mixed 1G/10G
devices aren't plentiful in choice from vendors. We open a lot of
small PoPs so stuff like ME3600X/ASR920s, ASR9001, MX104 are great for
us, but each with their own caveats.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] maximum-prefixes not enforced on option B gateways

2018-03-28 Thread James Bensley
On 28 March 2018 at 11:55, Pierre Emeriaud  wrote:
> Gents,
>
> I just noticed an issue on a couple of option B gateways in our
> network. The max-prefix within routing-instances is not enforced. It's
> although taken into account.
>
> This is on M120 running 12.3R6-S3 (yes I know, ancient. No, can't upgrade).

Do you have any other Junos versions that exhibit the same behavior?
Specifically, do you see this on any newer Junos versions you may be
running?

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Juniper EX4550 load balancing of MPLS over LAG

2018-03-09 Thread James Bensley
On 8 March 2018 at 21:45, Erdal Rasid  wrote:
> Now this works great in the majority scenarios, because hey let's be honest, 
> MAC addresses for the longest time started with 00-0..
>
> This fools the system to believe that the inner packet is IP, while it is an 
> Ether header in reality.
>
> Bottom line is that if your DMAC starts with a 4 or 6 you have a situation.
>
>  <>
> Solution
> Use the MPLS control word.

Now you have a new problem: if you do have an Ethernet payload directly
after the MPLS stack, with a MAC address that starts with a 4 or 6, and
you add the control word to put a 0 there, your actual Ethernet
payload is now offset by 4 bytes (the control word is usually 0x00
0x00 0x00 0x00 unless you're using sequence numbers). The information
that was going to be used to hash against (the Ethernet SRC/DST or IP
SRC/DST or TCP/UDP port numbers) is misaligned by 4 bytes and your
hashing is now unpredictable.

It seems the most optimal solution here is FAT (flow labels) or
entropy labels (if your devices support either of them).
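
For example, on the Junos side a minimal sketch for FAT on an
LDP-signalled pseudowire might look like the below (assuming your
release supports the flow-label knobs; the neighbour, interface and VC
ID are made up):

set protocols l2circuit neighbor 192.0.2.1 interface ge-0/0/1.100 virtual-circuit-id 100
set protocols l2circuit neighbor 192.0.2.1 interface ge-0/0/1.100 flow-label-transmit
set protocols l2circuit neighbor 192.0.2.1 interface ge-0/0/1.100 flow-label-receive

Both PEs need to agree on transmit/receive, and the P routers then hash
on the bottom-of-stack flow label rather than guessing at the payload.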

This issue has been discussed at length:

https://mailarchive.ietf.org/arch/msg/pals/ZTpJ_NEL5j6gv11NnwDW8guDDGQ

https://datatracker.ietf.org/doc/draft-ietf-pals-ethernet-cw/

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] DDoS to core interface - mitigation

2018-03-09 Thread James Bensley
On 8 March 2018 at 20:35, Saku Ytti  wrote:
> Hey Daniel,
>
> Apologies for not answering your question, but generally this is not a
> problem, because:
>
> a) have edgeACL which polices ICMP and UDP high ports to your links
> and drops rest
> b) don't advertise your links in IGP or iBGP
>
>
>
> On 8 March 2018 at 22:17, Dan Římal  wrote:
>> Hi all,
>>
>> I would like to discuss, how do you handle ddos attack pointing to IP 
>> address of any router core interface, if your UPLINK/ISP support RTBH and 
>> you would like to drop traffic at ISP level because of congested links.
>>
>> I have tried to implement "classic" BGP signalized RTBH, via changing 
>> next-hop to discard route. It works good for customers IPs, but applied to 
>> core-interface IP address, it drops routing protocol running on this 
>> interfaces between routers (because /32 discard route is more specific than, 
>> at least, /31 p2p). I tried to implement export filter between RIB and FIB 
>> (routing-options forwarding-table export) to not to install this routes to 
>> FIB. It looks better, it doesn't drop BGP/BFD/... anymore, but it works just 
>> by half. Try to explain:
>>
>> I have two routers, both have transit operator (UPLINK-A, UPLINK-B) and they 
>> are connected to each other. Routers interconnect is let's say 
>> 192.168.72.248/31 (248 router-A, 249 router-B). I will start to propagate 
>> via iBGP discard route 192.168.72.248/32 from ddos detection appliance to 
>> both routers. Router-B get RTBH route as the best, skip install to FIB 
>> because of export filter between RIB and FIB and will start to propagate 
>> appropriate route with blackhole community to UPLINK-B. UPLINK-B drops dst 
>> at their edge. Good.
>>
>> But, router A get the same blackhole route, but not as the best, because it 
>> has the same route (/32) as a local route with lower route preference:
>>
>> 192.168.72.248/32  *[Local/0] 34w1d 07:59:10
>>   Local via ae2.3900
>> [BGP/170] 07:43:20, localpref 2000
>>   AS path: I, validation-state: unverified
>> > to 10.110.0.12 via ae1.405
>>
>> So, router-A doesn't start propagate blackhole route to UPLINK-A (because it 
>> is not the best, i guess) and DDOS still came from UPLINK-A.
>>
>> How can i handle this situation? Maybe set lower route preference from 
>> detection appliance than default 170? But "Directly connected network" has 
>> preference 0 and i cannot go lower and cannot get more specific than local 
>> /32. Or maybe use bgp advertise-inactive toward my UPLINKs? Will this help?
>>
>> Thanks!
>>
>> Daniel

In addition to the above, try to avoid using public IPs on internal
links if you can; they don't need to be reachable from the Internet
and it saves on IPv4 address space :)

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp

Re: [j-nsp] Upgrading from RE-S-2000-4096-S/SCB-MX960-S to RE-S-1800X4-32G-S/SCBE2-MX-S

2018-01-17 Thread James Bensley
On 17 January 2018 at 09:32, Niall Donaghy  wrote:
> Hi Craig,
>
> Indeed Misak's recommendation is the one I would follow.
>
> We are in the process of upgrading SCBEs to SCBE2s and indeed you must power 
> off for this.
>
> As for the RE upgrades, JNPR states here 
> https://www.juniper.net/documentation/en_US/release-independent/junos/topics/reference/specifications/routing-engine-m-mx-t-series-specifications-by-model.html
>  that: "On routers that accept two Routing Engines, you cannot mix Routing 
> Engine types except for a brief period (one minute or so) during an upgrade 
> or downgrade to two Routing Engines of the same type." I suggest that if you 
> try to sync in-chassis rather than use flash, it might work, but is 
> technically unsupported.
>
> Br,
> Niall
>
> Niall Donaghy
> Senior Network Engineer
> GÉANT
> T: +44 (0)1223 371393
> M: +44 (0) 7557770303
> Skype: niall.donaghy-dante
> PGP Key ID: 0x77680027
> nic-hdl: NGD-RIPE
>
> Networks • Services • People
> Learn more at www.geant.org
> GÉANT is the collective trading name of the GÉANT Association and GEANT 
> Limited.
>
> GÉANT Vereniging (Association) is registered in the Netherlands with the 
> Chamber of Commerce in Amsterdam. Registration number: 40535155. Registered 
> office: Hoekenrode 3, 1102BR Amsterdam, The Netherlands.
> GEANT Limited  is registered in England & Wales. Registration number: 
> 2806796. Registered office: City House, 126-130 Hills Road, Cambridge CB2 
> 1PQ, UK.
>
>
>
> -Original Message-
> From: juniper-nsp [mailto:juniper-nsp-boun...@puck.nether.net] On Behalf Of 
> Misak Khachatryan
> Sent: 16 January 2018 18:30
> To: craig washington 
> Cc: juniper-nsp@puck.nether.net
> Subject: Re: [j-nsp] Upgrading from RE-S-2000-4096-S/SCB-MX960-S to 
> RE-S-1800X4-32G-S/SCBE2-MX-S
>
> Hi,
>
> As i remember, you can't mix REs and/or SCBs. Better to save your 
> configuration to flash, replace all cards and power it up. It's relatively 
> easy to predict port number changes from DPCE to MPC, so you can edit 
> configuration accordingly before loading it to updated router.
>
> Best regards,
> Misak Khachatryan,
> Network Administration and
> Monitoring Department Manager,
>
> GNC- ALFA CJSC
> 1 Khaghaghutyan str., Abovyan, 2201 Armenia
> Tel: +374 60 46 99 70 (9670),
> Mob.: +374 55 19 98 40
> URL:www.rtarmenia.am
>
>
> On Tue, Jan 16, 2018 at 5:52 PM, craig washington 
>  wrote:
>> Hello smart people.
>>
>>
>> We are in the process of upgrading from the aforementioned in the title.
>>
>> My question is does anyone have a procedure they have used in the past for 
>> doing this or any type of gotchas.
>>
>> We are also replacing the old DPC's for MPC's.
>>
>> I know the port numbers will change and that the MPC's aren't compatible 
>> with SCB.
>>
>> So I am looking for something like what's the best process to follow, for 
>> instance should I:
>>
>>
>>   1.  Swap out RE-2000 for RE-1800 one at a time (leaving SCB in place) to 
>> get the configuration over and sync and then once that's up swap out the SCB 
>> for SCBE2 and also swap out the DPC's at the same time?
>>
>>
>> Juniper documentation suggest powering off the entire router when going from 
>> SCB to SCBE2 so figured I would make sure the RE's are good and code is up 
>> to date before powering off and swapping out SCBE2 and DPCs for MPC's.
>>
>>
>> Thanks again and all feedback is welcome.
>> ___
>> juniper-nsp mailing list juniper-nsp@puck.nether.net
>> https://puck.nether.net/mailman/listinfo/juniper-nsp
> ___
> juniper-nsp mailing list juniper-nsp@puck.nether.net 
> https://puck.nether.net/mailman/listinfo/juniper-nsp
>
> ___
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp

We are going through a similar process, upgrading to SCBE2s and RE-NGs.

Seeing as the chassis have to be powered down I think one of these two
approaches is best:

- Use a lab chassis (or get a vendor loan) and insert all the same
hardware you have in production, load up the same config, then
upgrade the Junos and the SCBs. You can then remove those REs, which
will have the correct Junos code and working config on them, and the
SCBEs, take them to the DC, and the upgrade process on the live network
is "only" a case of shutting down the chassis, swapping the RE/SCB cards
and booting up.

- Again, seeing as you are doing a power down, build the final
chassis / config / software in the lab and take a pre-built chassis to
the DC and replace the chassis. If this is a P node with only a few
connections this is easy and clean. If it's a fully loaded PE with
every port (pre)patched then this isn't viable and option 1 is possibly
better.

Cheers,
James.
___
juniper-nsp mailing list 

Re: [j-nsp] Understanding limitations of various MX104 bundles

2018-01-10 Thread James Bensley
On 4 January 2018 at 18:34, Josh Baird  wrote:
> Hi all,
>
> Given the MX104-MX5-AC bundle which comes with 1 20x 1GE MIC pre-installed
> (and none of the onboard 10Gbps interfaces enabled), is this box actually
> limited to 20Gbps overall throughput?
>
> Can I install another MIC (say the MIC-3D-2XGE-XFP) in an additional slot
> to gain 2 10Gbps interfaces without purchasing any additional licensing?
> If I do this, is overall throughput of the chassis still locked to 20Gbps
> (due to the original bundle)?
>
> I can't find anything (ie "show system license") that states there is an
> overall capacity restriction, but I'm hearing mixed things from various
> sources.

The bundle "MX104-MX5-XX" includes:

- 2 MIC slots blocked
- 1 slot reserved for Service MIC
- 20x1GE with MACSEC MIC

So I don't think you can use it for normal line cards like the 2XGE or
another 20x1GE etc.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] [c-nsp] LACP between router VMs (James Bensley)

2017-12-04 Thread James Bensley
Amazon recently announced bare metal instances which provide access to
the CPU virtual instruction set such as Intel VT-x, so although I
haven't had a chance to look into it much just yet, I'm hoping one can
spin up vMX/ASR9Kv etc. using these instances in Amazon and just
offload the scaling issue to them:

https://aws.amazon.com/about-aws/whats-new/2017/11/announcing-amazon-ec2-bare-metal-instances-preview/


Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] [c-nsp] LACP between router VMs

2017-11-08 Thread James Bensley
On 8 November 2017 at 16:30, Jeff Meyers  wrote:
> Hi Adam,
>
> LACP packets (so called slow packets in the linux kernel) are never
> forwarded by the bridging code. If it didn't change in the last ~2 years,
> you will have to hack your kernel in order to let them pass. It's actually
> only a minor change in a couple of lines. The same applies btw to STP
> packets.
>
>
> Jeff
>
>
> Am 08.11.2017 um 17:09 schrieb adamv0...@netconsultings.com:
>>
>> Hi folks,
>>
>>
>> Slightly off topic but I'm gonna give it a shot anyways.
>>
>> Would anyone know how can I make linux bridges or better OVS to forward
>> LACP
>> PDUs instead of swallowing 'em?
>>
>> Basically "l2protocol forward lacp" equivalent.
>>
>> Couldn't find a single article on this and just can't believe I'm the
>> first
>> person that ever tried this.
>>
>> Aren't linux bridges or OVS MEF compliant :-) ?
>>
>>
>> Thanks
>>
>>
>> adam

If you have NIC(s) that support SR-IOV in your hypervisor boxes then
binding a virtual-function from the PF to your VMs should allow the
LACP frames to pass.
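
A rough sketch of what I mean on a Linux host with an Intel NIC (the
interface name and VF count are just examples, and the exact knobs vary
by driver/kernel version):

# create 4 VFs on the physical function
echo 4 > /sys/class/net/ens1f0/device/sriov_numvfs
# let the VF handed to the VM send/receive LACP and other "slow" frames
ip link set ens1f0 vf 0 spoofchk off
ip link set ens1f0 vf 0 trust on

Then attach the VF (a PCI device such as 0000:03:10.0) to the VM via PCI
passthrough instead of a vNIC on the bridge/OVS.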

Cheers,
James,
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] LDP VPLS - Multi-homing

2017-10-10 Thread James Bensley
On 10 October 2017 at 01:45, Aaron Gould  wrote:
> Ah, I see what you are asking.  I don't know, perhaps someone on list knows
> the particulars.
>
> About the multiple active fwd'ing paths for mhome'd pe-ce... I think someone
> told me that is a benefit that evpn brings to the table... but I heard it
> has something to do with per-vlan load sharing across those active/active
> mhomed sites.
>
> Don't know yet, since I'm just diving into evpn, and am already discouraged
> that I read that evpn isn't supported in lsys, ... lsys is the basis for all
> my lab testing.  Oh well, perhaps I'll pull a another acx5048 from the
> warehouse and give it a whirl

If you want to practice with EVPN in a non-Juniper environment I think
it works on the Cumulus Linux free/demo VM and probably some others
like Cisco xrv9k I believe, maybe the latest vMX supports it too?

Yeah EVPN has control-plane level MAC learning; from a high level
imagine it like a typical layer 3 VPN with IP prefixes being sent in
BGP UPDATES just that MACs are sent instead.
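
If it helps, a minimal (untested, names and addresses made up) Junos-style
sketch of a VLAN-based EVPN instance looks something like the below - the
point being that MAC reachability is signalled in the BGP EVPN address
family rather than flood-and-learned in the data plane:

set protocols bgp group IBGP family evpn signaling
set routing-instances EVPN-100 instance-type evpn
set routing-instances EVPN-100 vlan-id 100
set routing-instances EVPN-100 interface ge-0/0/1.100
set routing-instances EVPN-100 route-distinguisher 192.0.2.1:100
set routing-instances EVPN-100 vrf-target target:65000:100
set routing-instances EVPN-100 protocols evpn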

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] LDP VPLS - Multi-homing

2017-10-09 Thread James Bensley
On 9 Oct 2017 18:52, "Aaron Gould"  wrote:

Thanks James, What exactly are you trying to figure out ?  you
mentioned " I was trying to work out the mechanism for signalling the
non-designated-forwarding PE-CE link to go operationally "down"...


-Aaron


I was wondering how it works.

The RFC doesn't cater for multiple active forwarding paths to a CE/site,
some form of STP is recommended. I wonder how Juniper have implemented this
designated forwarding link signalling to remove the layer 2 loop; an
additional attribute in BGP?

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] LDP VPLS - Multi-homing

2017-10-09 Thread James Bensley
On 9 October 2017 at 12:49, Aaron Gould  wrote:
> Ah.  I think I might be on to something.  I see that when I do a BGP VPLS
> (fec 128, rfc 4761) style config, then I do NOT see the pw's active between
> the non-designated-forwarding multi-homed pe's...  and, this seems to be
> automatic. (no mhoming config needed in my lab)  It seems in that link there
> is some typos with explaining BGP VPLS and LDP VPLS.  Seems that it explains
> on BGP Signals PW's under the section of the document pertaining to LDP
> VPLS, so it confuses the reader.  It seems that someone should correct that
> document.

I tried reading through the links and it wasn't making sense to me,
now that you say that, it does make more sense. At first I thought you
were confusing LDP signalled VPLS with BGP signalled VPLS but yeah I
see now that the Juniper documentation is in fact incorrect :)

I was trying to work out the mechanism for signalling the
non-designated-forwarding PE-CE link to go operationally "down" (I'm
used to VPLS on Cisco kit). This isn’t in RFC4761 so is this a Juniper
custom extension?

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp

Re: [j-nsp] MX Forwarding Scale/Monitoring

2017-09-14 Thread James Bensley
On 5 September 2017 at 17:29, Harry Reynolds  wrote:
> The memory is dynamic and can be allocated as needed. I think the key is that 
> NH is limited to a maximum of 11 double mega words. What matters is overall 
> utilization, and then how close NH is to having 11DMW, and if so, how much of 
> that is used. To survive negative events we suggest that no more than 80-85% 
> of the maximum 11 DMW be used.
>
> Here is OP from a box that is getting full. It has near 80% of the 11 DMW 
> maximum allocated and of that, its 84% full.
>
> Regards
>
>
>
> NPC7(gallon vty)# sho jnh 0 poo usage
> EDMEM overall usage:
> [NH//|FW|CNTR///|HASH|ENCAPS|]
> 08.012.018.0 24.7   
> 28.8 32.0M
>
> Next Hop
> [***|]
>  8.0M (84% | 16%)
>
> Firewall
> [*|--|] 4.0M (3% | 97%)
>
> Counters
> [***|] 6.0M (38% | 
> 62%)
>
> HASH
> [***] 6.7M 
> (100% | 0%)
>
> ENCAPS
> [] 4.1M (100% | 0%)
>
> Shared Memory - NH/FW/CNTR/HASH/ENCAPS
> [**|--]
>  7.2M (48% | 52%)
>
> DMEM overall usage:
> [-]
> 0 0.0M


Hi Harry,

Sorry for the delay in my response!

I am still plugging away with JTAC with only moderate success so
thankyou kindly for this info it is very useful.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] MX Forwarding Scale/Monitoring

2017-09-05 Thread James Bensley
Hi All,

I’ve searched juniper.net and JTAC has also failed me, so I turn to
j-nsp for help;

$dayjob has some MX chassis which are running 13.something (basically
pre 15.1 which is when “show system resource-monitor fpc” was added).

We want to monitor FIB usage on our MX platforms. “show system
resource-monitor fpc” shows PFE memory NH and FW memory utilisation
percentage. How can we achieve the same thing pre-15.1?

 Looking at the output from the following command raises more
questions than they answer:

$ request pfe execute target fpc0 command "show jnh 0 pool usage"

This does actually show EDMEM usage however not all EDMEM is allocated
for NH entries and there is free EDMEM space not allocated for any
purpose (yet), so it’s not clear what our NH utilisation percentage is
(I guess we could add the current NH allocation with the unused space
and use that as a guestimate?).


$ request pfe execute target fpc0 command "show jnh 0 pool summary”

This again shows overall EDMEM usage but not NH utilisation.


$ request pfe execute target fpc0 command "show luchip 0”

This shows “RLDRAM:  576 Mb by 4 devices at 533 MHz” but I was under
the impression only 32 or 64MBs of RDLRAM is used by the LUchip/PPEs
for EDMEM entries, so looking at the percentage of RLDRAM used is no
good right? Also what are these “4” devices? All our cards are MPC2E
or MPC3E with mostly 2 PFEs off the top of my head.


Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp

Re: [j-nsp] Why JUNOS need re-establish neighbour relationship when configuring advertise-inactive

2017-07-17 Thread James Bensley
On 16 July 2017 at 12:23, Daniel Roesen  wrote:
> On Sat, Jul 15, 2017 at 12:33:16PM +0300, Saku Ytti wrote:
>> Usually JunOS (like other platforms) resets session when you have to
>> change update group. If you'd have multiple neighbours under TO-VRx
>> sharing same export-policy and you'd add 'advertise-inactive' under
>> one of the neighbours, not under the group directly. I would expect to
>> see reset.
>
> Which is technically unnecessary, so a bug. Or extremely sloppy
> programming causing operational impact for no good reason. I can't
> accept this behaviour as "industry standard".

This thread jogged my memory of an issue we had a while back that is
similar; we bounced an eBGP inet-vpn session (Inter-AS Opt B) and the
sessions from that PE to both of its RRs bounced too.

It's basically this:
http://www.juniper.net/documentation/en_US/junos/topics/example/bgp-vpn-session-flap-prevention.html

The recommendation is to set up a passive BGP session to a
non-existent peer so that there is always a configured eBGP peer (even
though the BGP FSM is not == "Established"). In that instance there was
one single eBGP inet-vpn peer, so bouncing the BGP session meant the
last (only) eBGP inet-vpn session went down, toggling the
instance.inet.0 <> bgp.l3vpn.0 redistribution.
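
The workaround from that doc boils down to something like the below
(the group name, AS number and neighbour address are dummies - the peer
never needs to exist, "passive" just stops us initiating to it):

set protocols bgp group OPTB-DUMMY type external
set protocols bgp group OPTB-DUMMY peer-as 64999
set protocols bgp group OPTB-DUMMY passive
set protocols bgp group OPTB-DUMMY family inet-vpn unicast
set protocols bgp group OPTB-DUMMY neighbor 192.0.2.254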

An archaic feature of the Junos BGP implementation, not really carrier
grade stuff if you ask me. Just like some of their run-to-completion
code.

We have seen on IOS many times when activating a new
NLRI/address-family between two peers the router will immediately
bounce the session so that the new NLRI can be negotiated and used,
rather than waiting for the session to be manually cleared for
example. I find that quite annoying. Does anyone know if Junos has the
same behaviour?

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] improving global unicast convergence (with or without BGP-PIC)

2017-05-02 Thread James Bensley
On 2 May 2017 at 11:30,  <adamv0...@netconsultings.com> wrote:
>> James Bensley
>> Sent: Tuesday, May 02, 2017 9:28 AM
>>
>> Just to clarify, one doesn't need to enable indirect-next-hop because it
> is
>> enabled by default, but if it were turned off for any reason, I presume it
> is a
>> requirement for PIC Edge? Or is it really not required at all, if not, how
> is the
>> Juniper equivilent working?
>>
> It's a requirement for PIC Edge (Egress PE or Egress PE-CE link failure) as
> well as for PIC Core (Ingress PE core link failure).
> To be precise it is required for in-place modification of the forwarding
> object to the backup/alternate node.
> So in a sense that applies for ECMP/LACP NHs a well the only difference is
> that both NH are in use in those cases -but you still need to be able to
> update all FIB records using them at once in case one of the NHs goes down.
>
>
>
>> Looking on juniper.net it looks like one exports multiple routes from the
> RIB
>> to FIB however assuming the weight == 0x4000 those additional paths won't
>> be used during "normal" operations, only during a failure, so we won't
>> actually get any per-packet load balancing (which would be undesirable for
>> us), is that correct?
> That precise.
>
> adam

Thanks!
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] improving global unicast convergence (with or without BGP-PIC)

2017-05-02 Thread James Bensley
On 27 April 2017 at 14:41,  <adamv0...@netconsultings.com> wrote:
>> James Bensley
>> Sent: Thursday, April 27, 2017 9:13 AM
>>
>> It might be worth pointing out that on Cisco you need to enable PIC Core for
>> PIC Edge to work at its best.

> So it's either Core or Core+Edge.

That's pretty much the point I was trying to make, albeit unclearly.

>> For you VPNv4/VPNv6 stuff one must enable PIC Edge with advertise best
>> external or add path etc. However enabling PIC Edge without PIC Core means
>> that backup paths will be pre-computed but not programmed into hardware.
> Again, not sure how can you enable PIC Edge but not PIC Core on Cisco?

We use PIC Core + PIC Edge, however we still have some 7600s in the mix
which don't support a hierarchical FIB for labelled prefixes without
recirculating all packets (PIC Edge is basically not supported for
VPNv4/VPNv6 prefixes without halving your pps rate). So you end up
with BGP computing a backup path in the BGP RIB (a random prefix from the
Internet table shown below as an example) but there is no backup path in
CEF/FIB:

#show bgp ipv4 unicast 1.0.4.0/24
BGP routing table entry for 1.0.4.0/24, version 326263390
BGP Bestpath: compare-routerid
Paths: (3 available, best #3, table default)
  Advertise-best-external
  Advertised to update-groups:
 2  4  10
  Refresh Epoch 3
  3356 174 4826 38803 56203
x.x.x.254 (metric 2) from x.x.x.254 (x.x.x.254)
  Origin incomplete, metric 0, localpref 100, valid, internal
  Community: x:200 x:210
  rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  6453 3257 4826 38803 56203
195.219.83.137 from 195.219.83.137 (66.110.10.38)
  Origin incomplete, metric 0, localpref 100, valid, external,
backup/repair, advertise-best-external<< PIC backup path
  Community: x:200 x:211 , recursive-via-connected
  rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  174 4826 38803 56203
10.0.0.7 (metric 1001) from 10.0.0.7 (10.0.0.7)
  Origin incomplete, metric 0, localpref 100, valid, internal,
best<< best path
  Community: x:200 x:212
  rx pathid: 0, tx pathid: 0x0


So one ends up having the next best path learned and computed but not
installed into the FIB. Bit of a corner case I know, but Cisco knows we
love to juggle more items than we have hands!

>> In Juniper land, does one need to activate indirect-next-hop before you can
>> provide PIC Edge for eBGP vpn4/vpn6 routes?
>>
> Nope, just load-balancing.
> And then protection under neighbour stanza.
>
>> Is indirect-next-hop enabled by default on newer MX devices / Junos
>> versions?
>>
> Yes.

Just to clarify, one doesn't need to enable indirect-next-hop because
it is enabled by default, but if it were turned off for any reason, I
presume it is a requirement for PIC Edge? Or is it really not required
at all; if not, how does the Juniper equivalent work?

Looking on juniper.net it looks like one exports multiple routes from
the RIB to the FIB; however, assuming the weight == 0x4000, those
additional paths won't be used during "normal" operations, only during
a failure, so we won't actually get any per-packet load balancing
(which would be undesirable for us), is that correct?
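
For anyone following along, my reading of the replies above is that the
Junos side boils down to something like this (a rough sketch, group and
policy names made up, not labbed):

set policy-options policy-statement PFE-LB then load-balance per-packet
set routing-options forwarding-table export PFE-LB
set protocols bgp group IBGP-RR family inet-vpn unicast
set protocols bgp group IBGP-RR neighbor 10.0.0.1 family inet-vpn unicast protection

i.e. the "per-packet" export policy is what allows the additional next
hops into the FIB, and with the backup weighted at 0x4000 it only
carries traffic after a failure.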

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] improving global unicast convergence (with or without BGP-PIC)

2017-04-27 Thread James Bensley
On 19 April 2017 at 17:20, Dragan Jovicic  wrote:
> What Cisco originally calls "PIC Core" is simply indirect-next-hop feature
> on routers, same on Juniper. On "flat" architectures without indirect
> next-hop, a failure of an uplink (a core link) on a PE router would require
> this PE to reprogram all BGP prefixes to new directly connected next-hop.
> Depending on your router and number of prefixes this very well may be an
> upward of several dozen of seconds, if not a few minutes. With
> indirect-next-hop feature, a PE router simply updates a pointer from BGP
> next-hop to new interface, making this almost an instantaneous excursion.
> On older routers without it, you may resort to using multiple equal-cost
> uplinks (or LAG interfaces) since in this case you already have a backup
> next-hop in your forwarding table.
>
> What Cisco originally calls "PIC Edge" is ability to install already
> present backup route from another BGP routers into the forwarding table.
> For this you need to:
>
> 1) already have backup route from control plane into RIB (using add path,
> iBGP, additional RR, advertise external, etc),
> 2) install these route into forwarding table ( this is main part as this
> FIB update is largest piece of convergence cake).
> On Juniper, the part of importing routes into FT is, for some reason,
> called "protect core" (and available for inet.0 table post-15.1), and
> 3) the PE router need to detect failure of upstream BGP router or its link.
> One of the ways is to passively include upstream link in IGP, but there are
> others.
>
> Note the difference - in first case BGP next-hop is unchanged, in the
> second, you have a new BGP next-hop altogether.
>
> What Juniper calls "BGP Edge Link Protection" is something different. It
> allows Edge ASBR router to reroute/tunnel traffic from failed CE link over
> core to another ASBR. For this to work the router must not look at IP
> packet (still pointing to failed PE-CE links), hence per-prefix labels are
> used. Juniper very well mentions this. Also this is available only for
> labeled inet/inet6 traffic, not family inet - at least I don't see it
> available in recent versions.
>
> There is also another technology called "Egress Protection", which is
> something different but quite cool.
>
> @OP, depending on how your topology looks like you may benefit from simple
> indirect-nh (aka PIC Core) as this might not need an upgrade. For link
> failure detection on ASBR, you might use BFD, smaller times, even
> scripting, if LOS is not a viable option. But this still means BGP
> convergence. LoS opens some cool options like using same bgp next-hop
> pointing over multiple rsvp tunnels ending on multiple routers.
>
> As for default route, if its installed in FT, I don't see why the router
> wouldn't use this entry in the absence of more specific (bearing all other
> issues with such setup).
> If you use labeled internet traffic you can resolve remote next-hop of
> static route to get a label for it.
>
> BR
>
> -Dragan
> ccie/jncie

Hi,

It might be worth pointing out that on Cisco you need to enable PIC
Core for PIC Edge to work at its best. PIC Core as already mentioned
is just enabling the hierarchical FIB. So for your IGP / global
routing table prefixes they will be covered by backup paths if they
exist (backup path computed and installed into hardware FIB).

For your VPNv4/VPNv6 stuff one must enable PIC Edge with advertise best
external or add path etc. However, enabling PIC Edge without PIC Core
means that backup paths will be pre-computed but not programmed into
hardware. With PIC Core enabled, the FIB is arranged hierarchically to
support prefix indirection, AND your IGP (for example), which has
visibility of multiple paths without the need for any additional
features (unlike eBGP which only sees the best paths by default), lets a
PE both calculate AND program the backup path into the FIB. With BGP
PIC Edge and no PIC Core, eBGP backup paths can be received and
computed but the backup path is not pre-programmed into the FIB. There
is still some speed up to this, but really if you are using BGP PIC
Edge, PIC Core should be enabled too.
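
On IOS/IOS-XE the BGP side of that is roughly the below (a sketch only,
with an example AS number; the hierarchical FIB / PIC Core knob itself
varies by platform):

router bgp 65000
 address-family vpnv4
  bgp advertise-best-external
  bgp additional-paths install

advertise-best-external gets the backup path advertised/learned, and
additional-paths install is what pushes the pre-computed repair path
down towards the FIB.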

There are caveats in the Cisco world, like 7600s support PIC Core but
to support PIC Edge they have to recirculate all packets so you halve
your pps rate for VPNv4/VPNv6 packets. ASR9000s have the hierarchical
FIB enabled by default and I don't think it can be disabled.
ME3600/ME3800 don't have the H-FIB enabled by default but it can be
enabled and it supports VPNv4/VPNv6 prefixes, and so on.

In Juniper land, does one need to activate indirect-next-hop before
you can provide PIC Edge for eBGP vpn4/vpn6 routes?

Is indirect-next-hop enabled by default on newer MX devices / Junos versions?

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp

Re: [j-nsp] VMX 17.1 experiencing high latency/packet loss with SRIOV

2017-04-05 Thread James Bensley
On 5 April 2017 at 15:38,   wrote:
> The NIC is an Intel XL710 running at 10Gbps.

I don't know about vMX for Junos 17, is the i40evf driver supported
(for X710 Intel NICs)?

We are having a similar issue with Cisco's CSR1000v on CentOS with KVM
and X710 NICs. The i40evf driver isn't supported by the CSR1000v at
present and we have one-way packet loss (inbound to the VM). We have
to wait for IOS-XE/CSR1000v 16.7, coming in Q3 2017, to support X710
NICs/the i40evf driver. Are you sure it's supported for vMX 17?

Having a quick scan of juniper.net says: For single root I/O
virtualization (SR-IOV) NIC type, use Intel 82599-based PCI-Express
cards (10 Gbps) and Ivy Bridge processors.

https://www.juniper.net/techpubs/en_US/vmx17.1/topics/reference/general/vmx-hw-sw-minimums.html

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] vMX SR-IOV

2017-03-23 Thread James Bensley
It's not supported by the Intel drivers.

Travelling, I'll try and find you a link tomorrow.

Cheers,
James.


On 23 Mar 2017 21:18, "Stefan Stoyanov"  wrote:

> Hi everyone,
>
> Does anyone have an idea, why vMX SR-IOV vlan-tagging isn't working?
> If I use "unit 0" without any VLANs configured on the interface everything
> is okay.
>
> Is it possible to be something related to the NIC which I am using?
> Also, do I need to disable some of the following feature ( or all of them
> )?
>
> rx-vlan-offload: on
> tx-vlan-offload: on
> rx-vlan-filter: on
>
>
>
> *0a:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+
> Network Connection (rev 01)*
> *Subsystem: Intel Corporation Ethernet Server Adapter X520-2*
> *Physical Slot: 2*
> *Flags: bus master, fast devsel, latency 0, IRQ 68*
> *Memory at e708 (64-bit, non-prefetchable) [size=512K]*
> *I/O ports at 5020 [size=32]*
> *Memory at e7304000 (64-bit, non-prefetchable) [size=16K]*
> *Capabilities: [40] Power Management version 3*
> *Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+*
> *Capabilities: [70] MSI-X: Enable+ Count=64 Masked-*
> *Capabilities: [a0] Express Endpoint, MSI 00*
> *Capabilities: [100] Advanced Error Reporting*
> *Capabilities: [140] Device Serial Number 00-1b-21-ff-ff-89-48-f4*
> *Capabilities: [150] Alternative Routing-ID Interpretation (ARI)*
> *Capabilities: [160] Single Root I/O Virtualization (SR-IOV)*
> *Kernel driver in use: ixgbe*
> *Kernel modules: ixgbe*
> ___
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp
>
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] l2circuit/xconnect between MX104 and a ME3600X

2016-09-30 Thread James Bensley
You need to share the configs, I think, if you want much help with this.

Also from the Cisco side can you give the full output from...

show xconnect interface Gi0/1 detail

show mpls l2transport vc 2 detail

show mpls l2transport binding 2

show mpls forwarding-table labels 18 detail


Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Service Activation Testing

2016-09-27 Thread James Bensley
> On 22/09/2016 14:41, Joe Freeman wrote:
>> I've been asked to put together a solution that allows us to do SAT on
>> every new turnup. These are all Ethernet services.
>>
>> I've been trying to figure out how to do it in the MX platform since that's
>> what we predominately have in our CO's, but JTAC has recently told me that
>> RFC2544 or Y.1564 service testing won't be available until 17.1 at the
>> earliest, contrary to all the published documentation for 16.1.
>>
>> What solutions have others used?

Sadly, the highest overhead but most reliable option: get a
field engineer on site and have them perform testing as part of the
service activation.


On 26 September 2016 at 00:29, James Harrison  wrote:
> RFC2544 isn't great for non-lab testing - Y.1564 is the way to go
> (unless you want to validate TCP throughput rather than or in addition
> to L2/L3, in which case RFC6349 is the right tool). It's worth digging
> into what you're actually looking to test and assure - CIR? EIR? Do you
> need to prove burst characteristics, QoS/CoS etc? VLANs, multicast/IGMP?
> Some of this will determine how you'll have to test.

+1 for this, as mentioned RFC2544 and Y.1564 aren't the same. Also if
you are running MPLS down to the CPE you can activate OAM between PE
and CE, this works more or less in the little testing I've done with
it, but scaling to every MPLS enabled CPE rules it out for us.

There are multiple things to test. In the case of jitter and latency
for example, we have lots of small Cisco routers dotted around that are
configured in a full mesh of IP SLA tests, and we poll the test results
from the NMS and create a matrix. SAT for a new PE would mean
adding it as a client to the mesh of IP SLA probes, not for CPEs
though. My point is, the OP might need to break this down into multiple
tools or platforms etc.
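
For reference, each probe in that mesh is a UDP jitter operation along
these lines (a sketch; the address, port and timers are just examples):

ip sla 10
 udp-jitter 192.0.2.2 17001 num-packets 20 interval 20
 frequency 60
ip sla schedule 10 life forever start-time now

with "ip sla responder" on the far end; the NMS then polls the
RTT/jitter/loss stats from the IP SLA MIB.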

> Realistically there's a lot of kit out there for multiprotocol Ethernet
> service assurance, and while the MXes can do TWAMP and such just fine
> I'd be looking at dedicated hardware for Y.1564 et al. Apart from
> anything else, having performance measurement endpoints/devices
> dedicated to just that makes isolation of variables easier when
> diagnosing performance faults and gives you a bit more flexibility in
> how you deploy test endpoints.

If OP is offering a wires only service (we usually do managed CPE but
we do wires only from time to time) a hardware tester is the way
forward. Get a field engineer on each end of the link/DCI/whatever and
give it a thorough testing, save the results somewhere safe and get
the customer to approve the results before handover (assuming there
are no issues).


> For 1G and up, EXFO, Viavi and VeEX all have products worth looking at,
> though only the former two have "service assurance" platforms
> specifically aimed at turn-up testing (AFAIK - VeEX has a lot of stuff
> in the HFC world, less so on pure Ethernet). Below 500M or so there's
> more scope for cheap and cheerful options like perfSonar/iperf/bwctl and
> friends, but I'd really avoid those if you can - we see a huge amount of
> variation in test performance above even a few hundred megs.

On 26 September 2016 at 18:04, Joe Freeman  wrote:
> Primarily, I need to valid throughput and frame loss at the moment. having
> the ability to do L2/L3 with CoS/QoS is icing on the cake.

A shameless self plug here: I have written a tool that has been used
by me and a few others at other ISPs to test layer 2 services and
pseudowires and VPLS services: https://github.com/jwbensley/Etherate

Similar to the above where I said we send an FE to each end of a link
for testing with hardware testers, send two guys with laptops with
1G NICs and Etherate will easily saturate the link with 1Gbps of traffic
and let you calculate the delay, frame loss, detect out-of-order
frames etc. Stuff like iPerf is great but I've had problems with
varying iPerf results when all I wanted was a basic speed test in UDP
mode. Etherate is a very simple and lightweight program so it can
easily generate 1Gbps and 10Gbps, but most laptops don't have 10Gbps
NICs (yet!).

We do also have iPerf servers dotted around in a few PoPs, customers
and engineers can run tests back to them anytime after the service
goes live which is helpful as it allows customers to perform some
diagnostics before they come to our support desk.
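
As a rough example of the sort of test run against those servers
(iperf3 syntax shown; the address, rate and duration are just examples):

server$ iperf3 -s
client$ iperf3 -c 192.0.2.10 -u -b 500M -t 30

Results can vary a lot with CPU, NIC offloads and socket buffer sizes
though, which is exactly the inconsistency mentioned above.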

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Juniper vMX

2016-09-12 Thread James Bensley
> From: Josh Reynolds 
> Sent: Friday, September 9, 2016 7:41:13 PM
> To: Alex Valo
> Cc: Juniper List
> Subject: Re: [j-nsp] Juniper vMX
>
>
> Disclaimer: I have not used vMX.
>
> You might be better off going with something like VyOS/Vyatta. A quad core 
> high clock xeon with 8+ GB of RAM and a PCIE ASIC like a Chelsio T420 (or 
> better) would probably serve you well, and will have a very similar syntax.

...

What makes you say he'd be better off?


On 9 September 2016 at 18:48, Alex Valo  wrote:
> Thank you for your sharing your thought.
>
>
> We also consider a similar product, the Brocade vRouter 5600, which is 
> essentially Vyatta with Intel DPDK support and a lot of bug fixes, however, 
> we do like Juniper product,


vMX does use DPDK so I am quite curious to see how fast it will run, I
would expect line rate to be honest (I don't know if there are any
throughput licensing requirements like a Cisco CSR1000v).


Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Inter-AS MPLS OptB Endianness Issue?

2016-09-01 Thread James Bensley
On 18 Jul 2016 11:58, "James Bensley" <jwbens...@gmail.com> wrote:
>
> I have had an off list resposne confirming that others have seen this
> issue, so I'm not totally mad.
>
> Will speak to JTAC if I get the chance.
>
> Cheers,
> James.

PR 1211567

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Limit on the number of BGP communities a route can be tagged with?

2016-08-23 Thread James Bensley
Thanks all :)


Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Limit on the number of BGP communities a route can be tagged with?

2016-08-23 Thread James Bensley
On 23 August 2016 at 13:40, Olivier Benghozi
 wrote:
> And about a limitation to 10 communities:
> I've seen that on SEOS (Redback/Ericsson OS for SmartEdge routers) when using 
> "set community" in a route-map. This is a ridiculous arbitrary limitation, of 
> course.
>
> Hopefully the limitation was only in the CLI, not in the BGP code itself. So 
> the workaround was to use the route-map "continue" command like in a BASIC 
> GOTO structure to add more communities in additional route-map entries (with 
> set community additive - these are Cisco-like commands).
>
>> Le 23 août 2016 à 14:03, Alexander Arseniev  a 
>> écrit :
>>
>> In BGP messages, a regular community is encoded in 7 bytes, and extended one 
>> in 11 bytes.
>>
>> Max BGP message size is 4096 bytes - this sets a limit for regular 
>> communities number to about 4K/7=570, and for extended communities to about 
>> 4K/11=360, if You consider the minimal mandatory information that has to be 
>> there apart from communities.
>>
>>
>> On 23/08/2016 03:18, Huan Pham wrote:
>>>
>>> I remember hitting a limit on a number of communities (something like 10 or
>>> so) on a platform (can not remember which one from which vendor). So I
>>> believe that there is a hard limit a platform or OS can support.
>>>
>>> I test this in the lab and found no problem with tagging 100 communities.
>>>
>>> Is there a maximum number of communities that Junos can tag to a route? If
>>> yes, then what it is?  Thanks.


Hi,

Hopefully not completely hijacking this thread; I'm interested to know
if there is a way I can limit a peer to a maximum number of
communities?


Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp

Re: [j-nsp] Inter-AS MPLS OptB Endianness Issue?

2016-07-18 Thread James Bensley
I have had an off list resposne confirming that others have seen this
issue, so I'm not totally mad.

Will speak to JTAC if I get the chance.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] Inter-AS MPLS OptB Endianness Issue?

2016-07-08 Thread James Bensley
Hi all,

Just noticed this in the lab between an ASR9K (5.3.3) and an MX480
(13.3R8.7); nothing "major" as it works, but I'm just curious if anyone
else has seen this (especially between two MXs), I haven't.

On the MX the RD for the ABC VRF was set to "route-distinguisher
196744L:200;" and the RT is "196744:200". On the 9K it shows this:

show bgp vpnv4 unicast nei 172.31.98.77 routes

Route Distinguisher: 2281702144:200
*> 2.2.2.2/32 172.31.98.77   0 196744 i

When I change the RD on the ABC VRF to "route-distinguisher
172.31.96.7:200; ", the ASR 9K shows this...

show bgp vpnv4 unicast nei 172.31.98.77 routes

Route Distinguisher: 172.31.96.7:200
*> 2.2.2.2/32 172.31.98.77   0 196744 i


In either case it "works", the routes are received and installed as
expected (ultimately the RTs match up on either side), so perhaps not
really an issue, but a bit weird. It looks to me like a bug on the
Juniper when using a 32-bit ASN for RDs when you specify the L option in
the RD [type 2] (or in the way the 9K interprets it?).


196744 in binary is:
0000 0000 0000 0011 0000 0000 1000 1000

2281702144 in binary is:
1000 1000 0000 0000 0000 0011 0000 0000

When sending a 32 bit administrator value for the RD, instead of the
16 bit one, the last two bytes have come first and are swapped, and the
first two bytes have come last, also swapped - i.e. the four bytes of
the administrator field appear in reverse order. So is this a
big-endian / little-endian, network-to-host / host-to-network mistake
or am I reading into that too much? :)
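
A quick way to see it on any Linux/BSD shell is to print both values in
hex - the four bytes are simply in reverse order:

$ printf '%08X\n' 196744
00030088
$ printf '%08X\n' 2281702144
88000300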

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] BFD/IS-IS wait to re-establish adjacency after failure tweak knob?

2016-05-19 Thread James Bensley
On 19 May 2016 at 10:53, Mark Tinka <mark.ti...@seacom.mu> wrote:
>
>
> On 19/May/16 11:49, James Bensley wrote:
>
>> In Cisco land we have the interface command "carrier-delay", for Junos
>> (this scenario) can the OP not use some variant of "set interfaces
>> xe-0/0/1 hold-time up 5000" ?
>
> OP says the issue is remote.
>
> Local link to provider's switch does not fail, and appears to have no
> way of relaying upstream outages to the OP's port.

Ah OK, I misread; although in the case where the physical interface is
up to the carrier's switch but the far end is down, BFD should keep the
OP's session down.

>From the original post:

> However, the IS-IS adjacency is coming up more quickly than desired.
> On average, it is coming up 7-8 seconds later. Unfortunately, the L2
> link is still unstable, so the BFD causes the session to drop again
> fairly quickly. This causes a lot of flapping that I do not need.

OK so assuming BFD is detecting the link as recovered then I think the
only options are dampening or increasing the carrier delay / hold
time.

Cheers,
James,
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] BFD/IS-IS wait to re-establish adjacency after failure tweak knob?

2016-05-19 Thread James Bensley
In Cisco land we have the interface command "carrier-delay", for Junos
(this scenario) can the OP not use some variant of "set interfaces
xe-0/0/1 hold-time up 5000" ?
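
A rough sketch of that (values are examples, in milliseconds):

set interfaces xe-0/0/1 hold-time up 5000 down 0

i.e. delay reporting link-up for 5 seconds but still report link-down
immediately, which is roughly what "carrier-delay" gives you on
IOS/IOS-XR.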

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] SNMP walk on JunOS from inside a routing instance

2016-04-28 Thread James Bensley
On 28 April 2016 at 17:16, Hugo Slabbert  wrote:
> Use a community of simply "@SecretCommunity", *WITHOUT* the actual RI
> specified.  That will pull everything.  It's a little weird, but it works.

Yeah I had someone point that out to me offlist. I can confirm it's
now working as desired. Weird indeed, but hey, it works! :)
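
For the archives, the working form is simply (no RI name in front of
the "@"):

snmpwalk -v 2c -c @SecretCommunity 10.254.242.1 .iso | grep ifDesc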

Thanks for the help all.

James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] SNMP walk on JunOS from inside a routing instance

2016-04-28 Thread James Bensley
On 28 April 2016 at 12:50, Dale Shaw  wrote:
> Hi James,
> My memory's a bit hazy on this, but do you see everything you want to see if
> you prefix the community string with a "@" in your cacti config?



Hi Dale,

As per my original email, I am prefixing the routing-instance name on
the SNMP gets:

snmpwalk -v 2c -c TEST-SNMP@SecretCommunity 10.254.242.1 .iso | grep ifDesc

Without the routing-instance name the SNMP gets time out. I can prefix
it as default@SecretCommunity which will, for example, bring back all
the interfaces on the MX not in any VRF/routing-instance.

So it seems I have to specify a routing instance when using the config
from my original post. I can specify "default@" to see interfaces in
the default table, and I can also specify
A.Nother.Routing-Instance.Name@SecretCommunity and see interfaces in
that RI too, but nothing I can do seems to pull all interfaces when
making the SNMP get from within the RI, compared to making the get
from a host in the default inet.0.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] SNMP walk on JunOS from inside a routing instance

2016-04-28 Thread James Bensley
On 27 April 2016 at 17:10, Phil Mayers  wrote:
> On 27/04/16 16:58, Per Westerlund wrote:
>>
>> That is default behavior, but you can access other RI's interfaces by
>> explicitly using the RI name. No way to reach all IFs at once via a RI.
>
>
> I'm a bit confused now.
>
> I just tested (SRX240H running 12.3X48-D15.4) and I can see all interfaces
> when hitting an IP inside a routing-instance, as well as in inet.0.
>
> We do *not* have "routing-instance-access" under the "snmp" block, but can
> still make SNMP queries to a routing instance; the docs suggest this should
> not work, so I'm not sure what's going on.

Yes, I would expect it to NOT work, in line with Per's comments, and
that is what's happening for us. From the old Cacti box which is in
inet.0 (no routing instance) we can hit that community string and get
all interfaces returned.

On 27 April 2016 at 17:01, Phil Mayers  wrote:
> You've configured this community string to map to a routing-instance. Try
> removing it this config item, and just putting the "clients" directly under
> the community.

The problem is that the new Cacti box is only routable to/from the
MXs inside the routing-instance; we want it to be "securely" (take
that with a pinch of salt!) separated from other traffic and routing.
So this is going to be a problem if the MXs have to be polled from
within inet.0. All Cisco boxes are polled inside a management VRF; I
would expect Junos to be able to do this, as it seems to me like it
would be a fairly common requirement (to have SNMP traffic separated
into its own routing instance).

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] SNMP walk on JunOS from inside a routing instance

2016-04-27 Thread James Bensley
Hi All,

I am migrating from one Cacti box to another, the new one polls some
MX boxes inside a routing instance but the old one polls in inet0 in
no routing instance.

When I snmpwalk the MX boxes from the new Cacti box I am only returned
the interfaces which are inside that routing instance the poll comes
in on. On the old Cacti box SNMP returns all interfaces, including
those inside all routing instances.

Does Junos restrict the SNMP output to that which relates to the
routing instance only, when polling in a routing instance?

username@mxrouter> show configuration snmp
community SecretCommunity {
authorization read-only;
routing-instance SNMP-TEST {
clients {
10.0.0.0/8;
}
}
}

username@mxrouter> show interfaces terse routing-instance TEST-SNMP
Interface   Admin Link ProtoLocal Remote
vt-0/0/10.1054  upup   inet
ge-0/3/7.2012   upup   inet 172.21.18.53/30
   multiservice
ge-2/3/7.2013   upup   inet 172.21.18.57/30
   multiservice
ae0.2047upup   inet 10.254.240.1/24
   multiservice
lo0.2047upup   inet 10.254.242.1--> 0/0


[root@cacti ~]#  snmpwalk -v 2c -c TEST-SNMP@SecretCommunity
10.254.242.1 .iso | grep ifDesc
IF-MIB::ifDescr.6 = STRING: lo0
IF-MIB::ifDescr.556 = STRING: ge-0/3/7
IF-MIB::ifDescr.571 = STRING: vt-0/0/10
IF-MIB::ifDescr.581 = STRING: ae0
IF-MIB::ifDescr.1220 = STRING: lo0.2047
IF-MIB::ifDescr.1242 = STRING: ae0.2047
IF-MIB::ifDescr.1342 = STRING: ge-0/3/7.2012
IF-MIB::ifDescr.1343 = STRING: ge-2/3/7.2013
IF-MIB::ifDescr.2936 = STRING: ge-2/3/7
IF-MIB::ifDescr.43020 = STRING: vt-0/0/10.1054

This example system is on 11.4R6.5, but we have a range of Junos
versions across MX480s & MX960s and it's the same behaviour for all of
them.

Any info and help with getting all interfaces returned when polling
from within a routing instance would be appreciated.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Redistribute Connected in Junos

2015-12-09 Thread James Bensley
On 17 November 2015 at 15:49, Dave Bell  wrote:
> Hi James,
>
> Your export policy isn't adding on your community.
>
> Try:
> term 10 {
>     from {
>         protocol direct;
>         interface [ ge-0/1/0.89 fe-1/1/3.89 ];
>     }
>     community add 0089_VRF;
>     then accept;
> }

Ah, I'm so glad it was that simple :)

Thank you very much for pointing that out; even if it wasn't
technically difficult, your own mistakes are by far (IMO) the most
difficult to spot and fix.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Cisco ASR 9001 vs Juniper MX104

2015-12-02 Thread James Bensley
On 1 December 2015 at 14:14, Mark Tinka  wrote:
>
>
> On 1/Dec/15 15:03, john doe wrote:
>
>>
>>
>> I think price wise MX is a better deal. ASR fully loaded with cards and 
>> licences for various services gets expensive fast.
>
> Depends what cards you are loading in there.
>
> If you're packing an ASR1000 with Ethernet line cards, then you get what
> you deserve.
>
> If you need dense Ethernet aggregation, the ASR9000 and MX are better
> than the ASR1000.
>
> If you need a mix-and-match, the ASR1000 is better than the ASR9000 or MX.

With the exception of LAGs (IMO), as the ASR1000 series does not
support QoS on port-channels very well at all:

http://www.cisco.com/c/en/us/td/docs/ios-xml/ios/qos_mqc/configuration/xe-3s/qos-mqc-xe-3s-book/qos-eth-int.html#GUID-95630B2A-986E-4063-848B-BC0AB7456C44
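
One workaround, if per-member rather than per-bundle QoS is acceptable,
is to hang the policy off each physical member link instead of the
Port-channel itself. A sketch only; the interface numbers, channel-group
and shaper value below are made up for illustration:

policy-map PM-EGRESS
 class class-default
  shape average 500000000
!
interface GigabitEthernet0/0/0
 channel-group 1 mode active
 service-policy output PM-EGRESS
!
interface GigabitEthernet0/0/1
 channel-group 1 mode active
 service-policy output PM-EGRESS

Obviously that only shapes each member link independently, so it is no
substitute for proper aggregate QoS on the bundle.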


Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Cisco ASR 9001 vs Juniper MX104

2015-12-02 Thread James Bensley
On 1 December 2015 at 17:29, Stepan Kucherenko  wrote:
> My biggest gripe with ASR9k (or IOS XR in particular) is that Cisco stopped
> grouping BGP prefixes in one update if they have same attributes so it's one
> prefix per update now (or sometimes two).
>
> Transit ISP we tested it with pinged TAC and got a response that it's
> "software/hardware limitation" and nothing can be done.
>
> I don't know when this regression happened but now taking full feed from
> ASR9k is almost twice as slow as taking it from 7600 with weak RE and 3-4
> times slower than taking it from MX.
>
> I'm not joking, test it yourself. Just look at the traffic dump. As I
> understand it, it's not an edge case so you must see it as well.
>
> In my case it was 450k updates per 514k prefixes for full feed from ASR9k,
> 89k updates per 510k prefixes from 7600 and 85k updates per 516k prefixes
> from MX480. Huge difference.
>
> It's not a show stopper but I'm sure it must be a significant impact on
> convergence time.

How long, time-wise, is it taking you to converge?

Last time I bounced a BGP session to a full-table provider it took
under a minute to take in all the routes. I wasn't actually timing it,
so I don't know exactly how long.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Cisco ASR 9001 vs Juniper MX104

2015-12-02 Thread James Bensley
On 2 December 2015 at 09:17, Mark Tinka  wrote:
>
>
> On 1/Dec/15 17:49, john doe wrote:
>
>>
>>
>>
>> Yeah, I was just referring to cli experience. commits, rollback, hierarchy 
>> within. Prior XR IOS was wall of text, no?
>
> Still is, but you get used to working with what you have :-).

IOS does support configuration reverting and rollbacks, not in exactly
the same way as IOS-XR/Junos, but I always use it when working on the
production network. Just enable configuration archiving:

conf t
 archive
  path sup-bootdisk:/config-backup-
  maximum 10
  write-memory
  end
wr



conf term lock revert timer 20

%ARCHIVE_DIFF-5-ROLLBK_CNFMD_CHG_BACKUP: Backing up current running
config to sup-bootdisk:/config-backup-Nov-25-2015-23-04-57.804-UTC-166

%ARCHIVE_DIFF-5-ROLLBK_CNFMD_CHG_START_ABSTIMER: User: james.bensley:
Scheduled to rollback to config
sup-bootdisk:/config-backup-Nov-25-2015-23-04-57.804-UTC-166 in 20
minutes


! config changes goes here

end


! Check everything is OK, then confirm the changes to cancel the rollback timer.
! If I make a big boo-boo that cuts me off, the config will roll back
! after 20 mins (as above) without me confirming it.

configure confirm


! Oh no, I haven't made such a big mistake that I've been disconnected,
! but actually I do need to roll back

configure replace
sup-bootdisk:/config-backup-Nov-25-2015-23-04-57.804-UTC-166 list

Nov 25 2015 23:25:17.479 UTC:
%ARCHIVE_DIFF-5-ROLLBK_CNFMD_CHG_ROLLBACK_START: Start rolling to:
sup-bootdisk:/config-backup-Nov-25-2015-23-04-57.804-UTC-166



Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


[j-nsp] Redistribute Connected in Junos

2015-11-17 Thread James Bensley
Hi All,

I'm much more of a Cisco head; I'm trying to redistribute the connected
subnets into an MPLS L3VPN from PE2, up to some RRs and then down to
PE1. Not sure what I've missed here, can anyone help me out?

bensley@PE2> show route table 0089.inet.0
172.31.253.100/31  *[Direct/0] 3w4d 12:42:16
                    > via fe-1/1/3.89
172.31.253.100/32  *[Local/0] 3w4d 12:42:16
                      Local via fe-1/1/3.89
172.31.253.102/31  *[Direct/0] 3w4d 12:42:16
                    > via ge-0/1/0.89
172.31.253.102/32  *[Local/0] 3w4d 12:42:16
                      Local via ge-0/1/0.89
PE2.Lo0.IP.80/32   *[Direct/0] 3w4d 12:42:16
                    > via lo0.89

bensley@PE2> show configuration policy-options community 0089_VRF
members target:12345:89;

bensley@PE2> show configuration routing-instances 0089
instance-type vrf;
interface lo0.89;
interface ge-0/1/0.89;
interface fe-1/1/3.89;
route-distinguisher PE2.Lo0.IP.80:89;
vrf-import plc-VRF-0089-Import;
vrf-export plc-VRF-0089-Export;
vrf-table-label;


bensley@PE2> show configuration policy-options policy-statement
plc-VRF-0089-Export
term 10 {
    from {
        protocol direct;
        interface [ ge-0/1/0.89 fe-1/1/3.89 ];
    }
    then accept;
}

bensley@PE2> show route advertising-protocol bgp RR1.Lo0.IP.165 table
bgp.l3vpn.0 | match 89
  PE2.Lo0.IP.80:89:172.31.253.100/31
  PE2.Lo0.IP.80:89:172.31.253.102/31


So that all looks good to my layman eyes, however over on PE1 we don't
receive the routes:

bensley@PE1> show route receive-protocol bgp RR1.Lo0.IP.165 table
bgp.l3vpn.0 | match 89
  PE9.Lo0.IP.9:1067:10.1.89.0/24
  PE9.Lo0.IP.9:1067:10.2.89.0/24
  RR1.Lo0.IP.165:511:10.89.55.0/28



The RR also doesn't show the routes in "show route receive-protocol
bgp ..." (it doesn't have the routing instance configured either, but I
don't believe that should make a difference in Junos; I think I should
at least see the routes in the BGP RIB?). PE1 is sending some
eBGP-learnt routes inside this VRF to PE2 via the RR, and PE2 is
successfully receiving them, so I'm just trying to get some directly
connected return routes back from PE2 via the RR to PE1.

Many thanks,
James.


bensley@RR1> show configuration protocols bgp group Core-MX480
type internal;
local-address RR1.Lo0.IP.165;
family inet {
    unicast;
}
family inet-vpn {
    unicast;
}
family inet6 {
    unicast;
}
family l2vpn {
    signaling;
}
export [ export-ibgp-ipv4-default-route export-ibgp-ipv4-client-routes export-ibgp-ipv4-no-transit ];
cluster RR1.Lo0.IP.165;
neighbor PE1.Lo0.IP.85 {
    description "PE1";
}
bensley@RR1> show configuration protocols bgp group Core-Others
type internal;
local-address RR1.Lo0.IP.165;
family inet {
    unicast;
}
family inet-vpn {
    unicast;
}
family inet6 {
    unicast;
}
family l2vpn {
    signaling;
}
export [ export-ibgp-ipv4-default-route export-ibgp-ipv4-client-routes export-ibgp-ipv4-no-transit ];
cluster RR1.Lo0.IP.165;
local-as 12345;
neighbor PE2.Lo0.IP.80 {
    description " PE2";
}

# There are no import statements, so iBGP should advertise all routes;
only the export statements could potentially filter the routes, but
they *seem* to allow them.

bensley@RR1> show configuration policy-options policy-statement
export-ibgp-ipv4-client-routes
term downstream-transit {
    from {
        protocol bgp;
        community [ downstream-transit lpsn-ipv4-route ];
    }
    then accept;
}
term vpn-routes {
    from {
        protocol bgp;
        rib bgp.l3vpn.0;
    }
    then accept;
}
term l2vpn-routes {
    from {
        protocol bgp;
        rib bgp.l2vpn.0;
    }
    then accept;
}


bensley@PE1> show configuration protocols bgp group core-mx480-rr
type internal;
local-address PE1.85;
family inet {
    unicast;
}
family inet-vpn {
    unicast;
}
family inet6 {
    unicast;
}
family l2vpn {
    signaling;
}
export [ export-bgp-default export-bgp-ipv4-transit export-bgp-ipv4-downstream-routes export-bgp-vrf-all export-bgp-ipv4-deny-all export-bgp-ipv6-deny-all ];
neighbor RR1.Lo0.IP.165 {
    description "RR1";
}
neighbor RR2.Lo0.IP.166 {
    description "RR2";
}
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] l2circuit between ASR9k and MX80

2015-08-24 Thread James Bensley
On 31 July 2015 at 11:41, Marcin Kurek not...@marcinkurek.com wrote:
 Hello,

 I'm doing some interoperability tests between Cisco and Juniper routers and
 I wanted to ask about a particular piece of config.
 I would expect that it shouldn't work, but it works perfectly, so I'm a bit
 confused.

 There is a l2vpn circuit between ASR9k and MX80 running in VLAN mode (VC
 Type 4).

 --- ASR9k 

 interface TenGigE0/0/1/5.3101 l2transport
  encapsulation dot1q 3101
  rewrite ingress tag pop 1 symmetric

 l2vpn
 !
  pw-class PW_TYPE_VLAN
   encapsulation mpls
transport-mode vlan
   !
  !
 !
  xconnect group XCONNECT_GROUP_2
   p2p PW2
interface TenGigE0/0/1/5.3101
neighbor ipv4 3.3.3.3 pw-id 101
 pw-class PW_TYPE_VLAN
!
   !
  !
 --- MX80 -

 ge-1/0/1 {
     description "ESXi 2 VMNIC3";
     flexible-vlan-tagging;
     encapsulation flexible-ethernet-services;
     unit 3101 {
         encapsulation vlan-ccc;
         vlan-id 3101;
     }
 }

 protocols {
     l2circuit {
         neighbor 4.4.4.4 {
             interface ge-1/0/1.3101 {
                 virtual-circuit-id 101;
                 no-control-word;
                 encapsulation-type ethernet-vlan;
             }
         }
     }
 }

 In my understanding following things should happen:
 - a frame coming to ASR9k from a CE is tagged with VLAN 3101
 - we are doing VLAN tag manipulation on ASR9k, the tag is popped
 - since we are using VC type 4, the PW forwarder adds a dummy tag 0 before
 forwarding the frame
 - MX80 should get the frame with tag 0 imposed

 If above is true, how does the MX handle the dummy tag?


Hi Marcin,

My understanding of how this should be configured and what would happen:

 - a frame coming to ASR9k from a CE is tagged with VLAN 3101
 - we are doing VLAN tag manipulation on ASR9k, the tag is popped
 - since we are using VC type 4, the PW forwarder adds a dummy tag 0 before
 forwarding the frame
 - MX80 should get the frame with tag 0 imposed
 - THE MX80 SHOULD POP THE DUMMY VLAN 0 TAG
 - THE MX80 SHOULD PUSH THE TAG 3101


However, the MX80 hasn't got output-vlan-map configured, as Saku pointed out.

Even without that configured, my understanding is that it should at
least pop the dummy VLAN, unless it doesn't support VC type 4
properly? So the 3101 tag should be being popped, unless it somehow
knows that isn't the dummy VLAN tag (maybe because it's not 0?).
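
If the dummy tag really does need handling explicitly on the MX side,
my guess (completely untested, so treat it as a sketch of the knobs
rather than a known-good config) is that it would be done with explicit
VLAN maps on the AC unit, e.g.:

set interfaces ge-1/0/1 unit 3101 input-vlan-map pop
set interfaces ge-1/0/1 unit 3101 output-vlan-map push

i.e. pop the service tag on frames coming in from the CE before they
hit the pseudowire, and push it back on towards the CE on the way out,
rather than relying on the default vlan-ccc behaviour.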


On 10 August 2015 at 11:29, Marcin Kurek not...@marcinkurek.com wrote:
 Hi Saku,

 I wanted to follow-up with the results of further testing in the lab.

  Quote from cisco docs:
  In order to address this possibility, the EVC platforms insert a

 dummy VLAN

  tag 0 on top of the frame for type 4 PWs


 I did a packet capture on one of the core links and it clearly shows that
 above isn't true.
 When setting VC Type 4 (VLAN mode) between ASR9k w/ Typhoon LC and MX80 the
 tag is stripped (rewrite ingress tag pop 1 symmetric) and then it is pushed
 again, so the packet is traversing the core with tag 3101.

 Also I tested this configuration:

 A
 interface GigabitEthernet0/1/0/19.100 l2transport
 encapsulation dot1q 100


 B
 interface GigabitEthernet0/1/0/19.200 l2transport
 encapsulation dot1q 200
 rewrite ingress tag translate 1-to-1 dot1q 100 symmetric

 And it seems to be working, although I understand your point.



Makes sense (albeit a little naughty of Cisco); as per your testing
above, the ASR9K is pushing the SVLAN tag back on before transmission
over the pseudowire.



On 2 August 2015 at 10:16, Marcin Kurek not...@marcinkurek.com wrote:
 Regarding VC Type 4 and Type 5 discussion, there was another thread on
 cisco-nsp a few days ago.
 Waris Sagheer wrote:

 Little bit history, VC type 4 came into being since some hardware did not
 have the capability to manipulate the tag at egress hence the concept of
 dummy tag. VC type 4 is pretty much becoming non existent and is being
 supported on legacy platforms only. VC type 5 will be the default signaling
 moving forward where users will have the flexibility to add/remove tags
 based on the EVC configuration.



It was me that started that thread. I was confused because I had made
a mistake in my testing which I hadn't spotted (classic!), so my
results weren't adding up. I was about to begin some interop
testing like you, for pseudowires between a mixture of Cisco and
Juniper devices that use different configurations and AC
encapsulations: MX480s & MX960s running pseudowires to 7600s,
ME3600/ME3800s, ASR1000s and ASR9001/9006s, so a full mix of IOS types
and configuration types using EVC and port-based ACs etc.

I've done some more testing, and everything works using VC type 5, more
or less (that is the main takeaway here). I don't see any need to be
using VC type 4 anywhere. VC type 5 can transport untagged, tagged and
double-tagged frames, so just use VC type 5.
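
For reference, my understanding of what the VC type 5 versions of the
earlier configs should look like (a sketch from memory, so please
verify the exact keywords on your platforms): on the ASR9K, drop the
"transport-mode vlan" line from the pw-class so the PW defaults to
ethernet/type 5, and on the MX explicitly set the encapsulation type on
the vlan-ccc AC:

ASR9K:
l2vpn
 pw-class PW_TYPE_ETH
  encapsulation mpls
 !

MX:
set protocols l2circuit neighbor 4.4.4.4 interface ge-1/0/1.3101 virtual-circuit-id 101
set protocols l2circuit neighbor 4.4.4.4 interface ge-1/0/1.3101 encapsulation-type ethernet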


Re: [j-nsp] MPLS LDP router-id

2015-08-10 Thread James Bensley
Hi Mohammad,

I think you are looking for the following commands on your Cisco
device (if I have understood the problem correctly),

IOS:
interface x/y/z
 mpls ldp discovery transport-address interface

IOS-XR:
mpls ldp
 vrf ABC123
  interface GigabitEthernet0/0/0/1
   address-family ipv4
    discovery transport-address interface
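
For completeness, on the Junos side I believe the equivalent knob lives
under [edit protocols ldp]; a sketch only (double-check whether
per-interface transport addresses are supported on your release):

set protocols ldp transport-address interface

as opposed to the default behaviour of sourcing the LDP transport
address from the router ID:

set protocols ldp transport-address router-id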


Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Cisco ME3600 migration to something with more 10 gigports

2015-07-14 Thread James Bensley
On 14 July 2015 at 16:13, Aaron aar...@gvtc.com wrote:
 Thanks everyone for your input.



 Does the mx80 support all the mpls L3vpn and L2vpn things I mentioned ?


It does do all this:

 I'm needing more 10 gig ports in my CO's for purposes of upgrading my FTTH
 OLT shelves with 10 gig.  I currently use Cisco ME3600's and do a lot of
 core ospf, and MP-iBGP over that for MPLS L2VPN's (eline, elan, etree) and
 L3VPN's (VPNv4 and testing VPNv6)

However, if you're roughly after an ME3600X but with, say, 6x10G, then
you will need to chuck in a 20x1G MIC and a 2x10G MIC. That is likely
going to be a lot more than you budgeted for. It's also 2 RU.

Cheers,
James.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp

