Re: [j-nsp] BGP timer

2024-04-28 Thread Thomas Bellman via juniper-nsp
On 2024-04-27 09:44, Lee Starnes via juniper-nsp wrote:

> Having difficulty finding a way to prevent BGP from re-establishing after a
> BFD down detect. I am looking for a way to keep the session from
> re-establishing for a configured amount of time (say 5 minutes) to ensure
> we don't have a flapping session for a link having issues.

Isn't that what the holddown-interval setting does?  It is limited
to 255 seconds (4 minutes 15 seconds), though, and for BGP it is
only allowed for EBGP sessions, not iBGP sessions.

The documentation also says that you need to set holddown-interval
on *both* ends.  I'm guessing that the holddown only prevents your
end from initiating the BGP session, but that it will still accept
a connection initiated from the other end.

https://www.juniper.net/documentation/us/en/software/junos/cli-reference/topics/ref/statement/bfd-liveness-detection-edit-protocols-bgp.html
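
For reference, a minimal sketch of what that looks like under BGP
(the neighbor address is just an example; note that holddown-interval
is given in milliseconds, so 255000 is the 255-second maximum):

    protocols {
        bgp {
            group ebgp-peers {
                neighbor 192.0.2.1 {
                    bfd-liveness-detection {
                        minimum-interval 300;      /* ms between BFD packets */
                        holddown-interval 255000;  /* BFD must stay up this long
                                                      before the session is
                                                      considered up again */
                    }
                }
            }
        }
    }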

I haven't used BFD for BGP myself, though, only for static routes
on a couple of links.  But there I do use holddown-interval, and
at least when I set it up several years ago, it seemed to do what
I expected: after the link and the BFD session came up again, it
waited (in my case) 15 seconds before enabling my static route
again.
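
Reconstructed from memory, that static-route setup looks roughly like
this (the prefix and next-hop are examples, not our real addresses):

    routing-options {
        static {
            route 0.0.0.0/0 {
                next-hop 192.0.2.1;
                bfd-liveness-detection {
                    minimum-interval 300;     /* milliseconds */
                    holddown-interval 15000;  /* the 15 seconds mentioned above */
                }
            }
        }
    }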


-- 
Thomas Bellman,  National Supercomputer Centre,  Linköping Univ., Sweden
"We don't understand the software, and sometimes we don't understand
 the hardware, but we can *see* the blinking lights!"





[j-nsp] QSA28 adapter in QFX5120-32C

2023-11-20 Thread Thomas Bellman via juniper-nsp
Has anyone managed to get a 25G transceiver to work in a QFX5120-32C
switch with a QSA28 adapter, i.e, an adapter from QSFP28 to SFP28?  If
so, what brand and model, what Junos version, and what configuration
magic did you need?


We have tried both an adapter from SmartOptics, and one from Dell,
but the switch doesn't seem to recognize it as a 25G transceiver at
all.  The channelized interfaces (et-0/0/0:[0-3]) don't show up, only
a non-channelized interface (et-0/0/0), and the switch says it has a
speed of 40 Gbit/s.

The transceivers themselves (also from SmartOptics) seem to work, or
are at least *recognized*, when we put them in QFX5120-48Y switches,
but those have native SFP28 ports, so we are not using an adapter
there.

QSA adapters with a 10G transceiver in them (i.e. not QSA28) work
without any problems in the switch, and so do break-out DAC cables
giving us 4×25G from each QSFP28 port.

We have configured the ports with

chassis {
    fpc 0 {
        pic 0 {
            port 0 { channel-speed 25g; }
            port 1 { channel-speed 25g; }
            port 2 { channel-speed 25g; }
        }
    }
}

We have also tried "speed 100g" instead of "channel-speed 25g", as
well as no speed configuration at all, to no avail.

We have tested the QSA28 adapter in a Dell Z9100 switch as well, and
it seems to be OK with the adapter, and recognizes the transceiver as
25Gbase-LR (but then thinks it is not Dell qualified, and refuses to
use it, but the adapter itself seems OK).

We are running Junos version 23.2R1-S1.6, but have also tried
22.2R2-S1.5 (which the switch arrived pre-installed with) and
22.4R2-S2.6 (which seems to be the most recently released one).

SmartOptics says they get their adapter to work in a PTX10001-36MR,
if you configure it with

interfaces {
    et-0/1/4 {
        number-of-sub-ports 4;
        speed 25g;
    }
}

but that syntax is not accepted by Junos on QFX5120 (I suppose it is
specific to Junos Evolved).


-- 
Thomas Bellman,  National Supercomputer Centre,  Linköping Univ., Sweden
"We don't understand the software, and sometimes we don't understand
 the hardware, but we can *see* the blinking lights!"





Re: [j-nsp] MX304 Port Layout

2023-06-08 Thread Thomas Bellman via juniper-nsp
On 2023-06-08 17:18, Kevin Shymkiw via juniper-nsp wrote:

> Along with this - I would suggest looking at Port Checker (
> https://apps.juniper.net/home/port-checker/index.html ) to make sure
> your port combinations are valid.

The port checker claims an interesting "feature": if you have
anything in port 3, then *all* the other ports in that port group
must also be occupied.  So if you use all those four ports for
e.g. 100GE, everything is fine, but if you then want to stop using
any of ports 0, 1 or 2, the configuration becomes invalid...

(And similarly for ports 5, 8 and 14 in their respective groups.)

I hope that's a bug in the port checker, not actual behaviour by
the MX304...


-- 
Thomas Bellman,  National Supercomputer Centre,  Linköping Univ., Sweden
"We don't understand the software, and sometimes we don't understand
 the hardware, but we can *see* the blinking lights!"





Re: [j-nsp] VRRP for IPv6

2022-01-25 Thread Thomas Bellman via juniper-nsp
On 2022-01-25 22:53, Chris Adams via juniper-nsp wrote:

> I wasn't planning to use a virtual link-local address, so I didn't put
> one.  The JUNOS VRRP for v6 example doesn't include one, although then
> the JUNOS documentation for virtual-link-local-address is oddly
> confusing:

For IPv6, the VRRP protocol requires that the link-local address is
virtual; it *must* be present in the list of virtual addresses a VRRP
node announces.

But Junos does indeed generate one automatically for you; you don't
need to add a virtual-link-local-address stanza.  And the link-local
address should show up when you run 'show vrrp':

  bellman@Bluegrass2> show vrrp
  Interface  State  Group  VR state  VR Mode  Timer    Type  Address
  irb.214    up     1      master    Active   A 0.461  lcl   2001:6b0:17:180::3
                                                       vip   fe80::200:5eff:fe00:201
                                                       vip   2001:6b0:17:180::1
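
For reference, a minimal sketch of the configuration behind that output
(reconstructed from the addresses above, not pasted from my actual
config; the commented-out statement is what Junos generates for you):

    interfaces {
        irb {
            unit 214 {
                family inet6 {
                    address 2001:6b0:17:180::3/64 {
                        vrrp-inet6-group 1 {
                            virtual-inet6-address 2001:6b0:17:180::1;
                            /* virtual-link-local-address fe80::200:5eff:fe00:201; */
                        }
                    }
                }
            }
        }
    }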


(Unfortunately I don't have any immediate ideas of why VRRP for IPv6
doesn't work for you, or why you don't see the outgoing packets using
'monitor traffic'.  When I test on a couple of QFX:es and EX4600:s, I
can see both outgoing and incoming VRRP packets.)


/Bellman





Re: [j-nsp] Cut through and buffer questions

2021-11-19 Thread Thomas Bellman via juniper-nsp
On 2021-11-19 10:07, Saku Ytti via juniper-nsp wrote:

> Cut-through does nothing, because your egress is congested, you can
> only use cut-through if egress is not congested.

Cut-through actually *can* help a little bit.  The buffer space in
the Trident and Tomahawk chips is mostly shared between all ports;
only a small portion of it is dedicated per port[1].  If you have
lots of traffic on some ports, with little or no congestion,
enabling cut-through will leave more buffer space available for
the congested ports, as the packets will leave the switch/router
quicker.

One should note though that these chips will fall back to store-
and-forward if the ingress port and egress port run at different
speeds.  (In theory, it should be possible to do cut-through as long
as the egress port is not faster than the ingress port, but as far
as I know, any speed mismatch causes store-and-forward to be used).
Also, if you have rate limiting or shaping enabled on the ingress
or egress port, the chips will fall back to store-and-forward.

Whether this helps *enough*, is another question. :-)  I believe
in general, it will only make a pretty small difference in buffer
usage.  I enabled cut-through forwarding on our QFX5xxx:es and
EX4600:s a few years ago, and any change in packet drop rates or
TCP performance (both local and long-distance) was lost way down
in the noise.  But I have seen reports from others that saw a
meaningful, if not exactly huge, difference; that was several
years ago, though, and I didn't save any reference to the report,
so you might want to classify that as hearsay...  (I have kept
cut-through enabled on our devices, since I don't know of any
practical disadvantages, and it *might* help a tiny little bit
in some cases.)


[1] Of the 12 Mbyte buffer space in Trident 2, which is used in
QFX5100 and EX4600, 3 Mbyte is used for per-port dedicated
buffers, and 9 Mbyte is shared between all ports.  I believe
on later chips an even larger percentage is shared.


-- 
Thomas Bellman,  National Supercomputer Centre,  Linköping Univ., Sweden
"We don't understand the software, and sometimes we don't understand
 the hardware, but we can *see* the blinking lights!"





Re: [j-nsp] Cut through and buffer questions

2021-11-19 Thread Thomas Bellman via juniper-nsp
On 2021-11-19 09:49, james list via juniper-nsp wrote:

> I try to rephrase the question you do not understand: if I enable cut
> through or change buffer is it traffic affecting ?

On the QFX 5xxx series and (at least) EX 46xx series, the forwarding
ASIC needs to reset in order to change between store-and-forward and
cut-through, and traffic will be lost until the reprogramming has been
completed.  Likewise, changing buffer config will need to reset the
ASIC.  When I have tested it, this has taken at most one second, though,
so for many people it will be a non-event.

One thing to remember when using cut-through forwarding is that packets
that have suffered bit errors or truncation, making the CRC checksum
incorrect, will still be forwarded rather than discarded by the switch.
This is usually not a problem in itself, but if you are not aware of it,
it is easy to get confused when troubleshooting bit errors (you see
ingress errors on one switch and think it is the link to that switch
that has problems, but in reality it might just be that the switch on
the other end is forwarding broken packets *it* received).


> Regarding the drops here the outputs (15h after clear statistics):
[...abbreviated...]
> Queue: 0, Forwarding classes: best-effort
>   Transmitted:
> Packets  :6929684309190446 pps
> Bytes: 4259968408584 761960360 bps
> Total-dropped packets:  1592 0 pps
> Total-dropped bytes  :   2244862 0 bps
[...]
> Queue: 7, Forwarding classes: network-control
>   Transmitted:
> Packets  : 59234 0 pps
> Bytes:   4532824   504 bps
> Total-dropped packets: 0 0 pps
> Total-dropped bytes  : 0 0 bps
> Queue: 8, Forwarding classes: mcast
>   Transmitted:
> Packets  :   655370488 pps
> Bytes:5102847425663112 bps
> Total-dropped packets:   279 0 pps
> Total-dropped bytes  :423522 0 bps

These drop figures don't immediately strike me as excessive.  We
certainly have much higher drop percentages, and don't see much
practical performance problems.  But it will very much depend on
your application.  The one thing I note is that you have much
more multicast than we do, and you see drops in that forwarding
class.

I didn't quite understand if you see actual application or
performance problems.


> show class-of-service shared-buffer
> Ingress:
>   Total Buffer :  12480.00 KB
>   Dedicated Buffer :  2912.81 KB
>   Shared Buffer:  9567.19 KB
> Lossless  :  861.05 KB
> Lossless Headroom :  4305.23 KB
> Lossy :  4400.91 KB

This looks like a QFX5100 or EX4600, with the 12 Mbyte buffer in the
Broadcom Trident 2 chip.  You probably want to read this page, to
understand how to configure buffer allocation for your needs:


https://www.juniper.net/documentation/us/en/software/junos/traffic-mgmt-qfx/topics/concept/cos-qfx-series-buffer-configuration-understanding.html

In my network, we only have best-effort traffic, and very little
multi- or broadcast traffic (basically just ARP/Neighbour discovery,
DHCP, and OSPF), so we use these settings on our QFX5100 and EX4600
switches:

forwarding-options {
cut-through;
}
class-of-service {
/* Max buffers to best-effort traffic, minimum for lossless ethernet */
shared-buffer {
ingress {
percent 100;
buffer-partition lossless { percent 5; }
buffer-partition lossless-headroom { percent 0; }
buffer-partition lossy { percent 95; }
}
egress {
percent 100;
buffer-partition lossless { percent 5; }
buffer-partition lossy { percent 75; }
buffer-partition multicast { percent 20; }
}
}
}

(On our QFX5120 switches, I have moved even more buffer space to
the "lossy" classes.)  But you need to tune to *your* needs; the
above is for our needs.


/Bellman





Re: [j-nsp] QFX5100-48S-AFI/AFO vs QFX5100-48S-3AFI/AFO

2021-10-26 Thread Thomas Bellman via juniper-nsp
On 2021-10-26 23:27, Han Hwei Woo via juniper-nsp wrote:

> Does anyone know if there are any differences between the
> QFX5100-48S versions with or without the '3'? 

The normal version of QFX5100 has two management ethernet ports, one
SFP port and one twisted pair ("RJ45") port, while the 3AFI/3AFO
versions have three management ports, two SFP ports and one twisted
pair port.

I have never seen or used the 3AFI/3AFO version in the real world
myself, but from reading the hardware guide
(https://www.juniper.net/documentation/us/en/hardware/qfx5100/qfx5100.pdf)
it looks like the TP port and one of the SFP ports are actually
shared, so you can use *either*, but not both at the same time, and
that becomes the em0 interface.  (That's what some vendors call a dual
personality port.)  The other SFP port is then the em1 interface.

Otherwise they appear to be identical.


More modern Juniper models appear to only have twisted pair ports,
and it seems pretty random which ones have one port and which ones
have two ports.  For example, in the QFX5120 line, the -48Y and
-48YM modules have two management ports, while the -48T and -32C
models have one.  Weird.

(And hey, Juniper, how about making those management ports actually
useful, and connect them to an IPMI controller with support for Serial
Over LAN?  That would be super helpful, especially when you are e.g.
upgrading Junos on them.)


/Bellman





Re: [j-nsp] QFX3500 and... multicast forward (VRRP related)

2021-09-15 Thread Thomas Bellman via juniper-nsp
On 2021-09-13 13:56, Xavier Beaudouin wrote:

> I have a strange clue with an QFX3500-48S4Q, and with "simple" VRRP
> setup.
> 
> On port xe-0/0/6.0 I have a infrastructure (cisco switches) with a
> VLAN 3016 who want to be VRRP with an MX204 on et-0/1/0.0.
> 
> Current config of the switch :
> 
> [...]
> 
> Pretty "simple" configuration.
> 
> When I monitor traffic interface et-0/1/0.0 match vrrp I see the VRRP
> comming MX204 and on xe-0/0/6.0 I see also VRRP comming TO the QFX...
> Is there any reason why VRRP on the VLAN is not forwarded between
> et-0/1/0.0 and xe-0/0/6.0 ?
> 
> Did I missed something ?

The 'monitor traffic' command will only show packets that actually go
to or from the routing engine ("supervisor" in Cisco speak).  Traffic
that is just forwarded by the packet forwarding engine (the line cards
on an MX, or the ASIC on the QFX3500) doesn't show up.  Traffic that is
generated or handled by the PFE itself, e.g. BFD on some platforms, is
*also* not visible to 'monitor traffic'.  (Also worth noting is that
incoming traffic that *is* destined for the routing engine, but is
discarded by a firewall rule executing on the PFE, won't be seen by
'monitor traffic' either.)

The fact that you see VRRP packets from both the Cisco and the MX,
is probably because it is multicast, so it is forwarded both to the
other port, *and* to the routing-engine.  But since the QFX3500 is
not itself doing any VRRP, you won't see any outgoing VRRP packets.

The interesting question is whether the MX and the Cisco see each other's
VRRP packets.  Do they both think they are master?  (It seems like the
command to use is 'show vrrp' both in Junos and IOS.)  If both sides
believe they are master, then you have a real forwarding problem, but
if one side thinks it is backup, and says the other is master, then
the VRRP packets are forwarded as they should be.

This is what it looks like on a Juniper:

  bellman@...> show vrrp
  Interface  State  Group  VR state  VR Mode  Timer    Type  Address
  irb.13     up     1      backup    Active   D 2.949  lcl   192.168.13.18
                                                       vip   192.168.13.19
                                                       mas   192.168.13.20

This says that my router's own IP address ("lcl") is 192.168.13.18,
the virtual address the routers are battling for ("vip") is
192.168.13.19, and the current master ("mas") is 192.168.13.20.
(I don't have a Cisco at hand to check what the output looks like
there.)

Note though, that if you run 'show vrrp' on the VRRP master, you
won't see any information about which nodes are backups.  That's
because in the VRRP protocol, only the master, or those that want
to take over the master role, sends any packets; those that are
backup and happy with being so, are silent.  (This is unlike
Cisco's older HSRP protocol, where both master and backup were
sending, allowing you to see all participating nodes by running
'show hsrp' on any of the nodes.)

If you run the 'monitor traffic' command on the MX, you should be
able to see the VRRP traffic it receives and/or sends.  (I don't
know if there is any equivalent command on Cisco's operating
systems.)


/Bellman





Re: [j-nsp] QSFP+ to SFP+ adapters

2020-03-16 Thread Thomas Bellman
On 2020-03-16 21:06, Chuck Anderson wrote:

> Has anyone tried using QSFP+ to SFP+ adapters such as this one?  What
> software versions have you tried?
> 
> https://www.fs.com/products/72587.html
> 
> I'm testing these on QFX10002-36Q with 17.3R3-S7.2 and SFP+ 10G-LR modules.
> The links come up and pass LLDP and IP traffic, but DOM doesn't work:

We have some equivalent adapters from Mellanox.  On a QFX5120 (running Junos
19.2R1-S3) where we use those adapters, I don't get any DOM either, but on
Dell Z9100 switches (running DNOS 9.14(2.4)) I do get DOM.  However, it might
be that the Dell switches got some newer version of the adapters; I'm fairly
certain the adapters in the Dells are ones we received in 2018, while I suspect
the adapters in the Juniper are ones we got in the 2012-2014 timespan.

I might be able to check and test sometime later this week, but no promises.


-- 
Thomas Bellman,  National Supercomputer Centre,  Linköping Univ., Sweden
"We don't understand the software, and sometimes we don't understand
 the hardware, but we can *see* the blinking lights!"





Re: [j-nsp] LAG/ECMP hash performance

2019-08-29 Thread Thomas Bellman
On 2019-08-29 17:31 +0200, Robert Raszuk wrote:

> You are very correct. I was very highly surprised to read Saku mentioning
> use of CRC for hashing but then quick google revealed this link:
> 
> https://www.juniper.net/documentation/en_US/junos/topics/reference/configuration-statement/hash-parameters-edit-forwarding-options.html
> 
> Looks like ECMP and LAG hashing may seriously spread your flows as clearly
> CRC includes payload and payload is likely to be different with every
> packet.

On what basis do you figure CRC "clearly" includes payload?  I see
no indication on that page, or a few other pages close by, that
anything but select layer 2 or layer 3/4 headers are used in the
hashes for LAG and ECMP.

Are you perhaps mislead by the 'forwarding-options enhanced-hash-key
hash-mode layer2-payload' setting?  My understanding is that its
meaning is to use select L3 and/or L4 headers, as opposed to using
select L2 headers, as input to the CRC function.  A better name for
that setting would probably be 'layer2/3-headers'.

https://www.juniper.net/documentation/en_US/junos/topics/reference/configuration-statement/hash-mode-edit-forwarding-options-ex-series.html
says:

If the hash mode is set to layer2-payload, you can set the fields
used by the hashing algorithm to hash IPv4 traffic using the set
forwarding-options enhanced-hash-key inet statement. You can set
the fields used by the hashing algorithm to hash IPv6 traffic using
the set forwarding-options enhanced-hash-key inet6 statement.

The fields you can select/deselect are:

  - Source IPv4/IPv6 address
  - Destination IPv4/IPv6 address
  - Source L4 port
  - Destination L4 port
  - IPv4 protocol / IPv6 NextHdr
  - VLAN-id  (on EX and QFX 5k)
  - Incoming port  (on QFX 10k)
  - IPv6 flow label  (on QFX 10k)
  - GPRS Tunneling Protocol endpoint id
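
As a rough sketch of how you deselect fields (statement names as I
remember them from a QFX5100; treat the exact flags as an assumption
on my part and verify against your platform's schema):

    forwarding-options {
        enhanced-hash-key {
            hash-mode layer2-payload;
            inet {
                no-l4-source-port;       /* drop L4 source port from the hash */
                no-l4-destination-port;  /* drop L4 destination port as well */
            }
        }
    }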


> Good that this is only for QFX though :-)

The 'hash-parameter' settings are not even valid on all QFX:es.  At
least Trident II (QFX 51x0) uses a Broadcom-proprietary hash called
RTAG7.  I'm guessing that using CRC16 or CRC32 for LAG/ECMP hasing
is just used on QFX 10k, not any of the Trident- or Tomahawk-based
routers/switches.


> For MX I recall that the hash is not computed with entire packet. The
> specific packet's fields are taken as input (per configuration) and CRC
> functions are used to mangle them - which is very different from saying
> that packet's CRC is used as input.

I don't think anyone has said that any product use the ethernet
packet's CRC for LAG/ECMP hashing.  Just that they might reuse
the CRC circuitry in the NPU/ASIC for calculating this hash, but
based on different inputs.


-- 
Thomas Bellman,  National Supercomputer Centre,  Linköping Univ., Sweden
"We don't understand the software, and sometimes we don't understand
 the hardware, but we can *see* the blinking lights!"





Re: [j-nsp] Junos 18.X on QFX5100

2019-05-26 Thread Thomas Bellman
On 2019-05-25 22:38 -0400, Philippe Girard wrote:

> Anyone running production Junos 18.X on QFX5100?
> 
> JTAC recommended still says 17.3R3-S4 but I'd really like to jump to 18 for
> some new features it offers.

We have one QFX5100-48S running 18.3R1.9 (the other QFX5100 we have
runs 17.3R3.10).  We also have a few EX4600, which is basically the
same as QFX5100, running a mix of 18.2, 18.3, 18.4 and 19.1.

So far, the only problem I have seen is that the Jet Service Daemon
(jsd) and the na-grpc-daemon start eating 100% CPU after a few weeks
on 18.3, but not on the other versions.  Restarting them helps for a
few weeks; then they suddenly eat CPU again.  It should also be possible
to disable them if you don't use them (I haven't gotten around to doing
that myself, though).

We don't use a huge amount of features, though.  OSPF (v2 and v3),
Spanning Tree (MSTP), multiple VRFs, some route-leaking between VRFs,
and DHCP relay.  A couple of static routes with BFD as well.  No BGP,
no VXLAN, no EVPN, no MPLS.


/Bellman





Re: [j-nsp] JunOS 16.2R2.8 High CPU caused by python

2019-03-27 Thread Thomas Bellman
On 2019-03-26 21:11 -0400, Jason Lixfeld wrote:

> Not a solution, but an ignorant question - Is there a function to
> kill (and/or restart) the process in this type of scenario?  On
> IOS-XR, there were specific XR CLI wrappers for restarting a process
> as a means to fix stuff like processes run amok without having to
> reboot the box (or RE/RSP/LC/whatever was misbehaving).

There is a restart command in Junos, which does exactly that.  E.g:

bellman@Chili4> restart jsd   
JET Services Daemon started, pid 62402

However, it can only restart certain processes (on my switches, I see
64 possible daemons in the help when I press "?"), and ICMD does not
seem to be one of them.  (But that's on an EX4600 running 18.4R1; and
/usr/libexec/icmd doesn't even exist on it.)

Also, sometimes the name of the process binary does not match exactly
with the argument you are supposed to give to the restart command, so
you may need to think a little bit to figure that out.

(On 18.3, we had similar problems, but with jsd and na-grpc; after a
few weeks, they started using 100% CPU.  Restarting them helped, but
after another couple of weeks they ran amok again.  This doesn't happen
in 18.2 or 18.4, though.)


/Bellman





Re: [j-nsp] 400G is coming?

2019-03-18 Thread Thomas Bellman
On 2019-03-18 23:24 +0200, Saku Ytti wrote:

> Cheaper is subjective. To a small and dynamic shop CAPEX may represent
> majority of cost. To an incumbent CAPEX may be entirely irrelevant,
> money is cheap, but approving hardware to network may be massive
> multiyear project. This is why platforms like GSR had so long tail.

That kind of argument is just too often used even when the customer *is*
capex sensitive, to fool them into believing they are saving money on the
hardware.

"Buy this chassis based switch!  It costs twice as much as the fixed-
config datacenter switch, gives you half the number of ports [and the
ports are heavily oversubscribed], but four years later you can just
buy more and newer linecards [each costing as much as an entire fixed-
config switch] instead of replacing the entire switch!  [Oops, we
forgot to tell you that the new linecards will require a new super-
visor card as well.]  Protecting your investment [putting *your*
money in *our* coffers] is something we are good at!"

I've seen that kind of marketing and sales techniques, concentrating
on the capex (and being misleading, if not outright lying, about it),
too many times, and people falling for it.  I've kind of become allergic
to that wording...

Sorry for the rant...


/Bellman





Re: [j-nsp] 400G is coming?

2019-03-18 Thread Thomas Bellman
On 2019-03-18 21:05 UTC, Tim Rayner wrote:

> As I understand it, when a 400G port is enabled, 3 of the 100G ports
> are made un-available (not sure whether there is an option for sub-rate
> on the 400G port keeping more of the 100G ports available), hence there
> will be a limit of 1.5 Tbps per slot with no over-subscription.  It is
> actually a 15x100G card... each group of 5 ports can enable one of its
> ports as 400G, thereby disabling three of its other 100G ports... there
> is 500Gbps of capacity per port group = available as 5 x 100G, or 1 x
> 400G + 1 x 100G.

That makes sense.  And I'm not opposed to oversubscribed linecards,
as long as it's clear that they are, and how the oversubscription
works (port groups, etc.).  Presumably Juniper will put up a page
with more details soon, as they have for other linecards.

/Bellman





Re: [j-nsp] 400G is coming?

2019-03-18 Thread Thomas Bellman
On 2019-03-14 13:40 -0400, Andrey Kostin wrote:

> Accidentally found that MX series datasheet now mentions MPC-10E with
> 400G ports
> https://www.juniper.net/assets/us/en/local/pdf/datasheets/1000597-en.pdf
[...]
> the MPC-10E protects existing investments

Gah, I hate that wording.  To me it sounds like "sunk cost fallacy"
and "throwing good money after bad"...  (I'm not necessarily saying
that applies to these cards.  It's just that I have heard the words
"protect your investment" too many times when it would be much
cheaper and better to throw out and replace the old stuff entirely.
Seeing that in advertisments thus trigger my bullsh*t klaxons.)


> MPC10E-10C
> Modular port concentrator with 8xQSFP28 multirate
> ports (10/40/100GbE) plus 2xQSFP56-DD multirate
> ports (10/40/100/400GbE)
> MPC10E-15C
> Modular port concentrator with 12xQSFP28 multirate
> ports (10/40/100GbE) plus 3xQSFP56-DD multirate
> ports (10/40/100/400GbE)

It seems these are oversubscribed to the backplane.  8×100G + 2×400G
is 1.6 Tbit/s, and 12×100G + 3×400G is 2.4 Tbit/s, but all three of
MX240, MX480 and MX960 are listed as having 1.5 Tbit/s max per slot.
(And is that 1.5 Tbit/s in *and* out, or is that just 750 Gbit/s per
direction?)

And nothing for 400G DWDM/coherent, that I can see.  I expect service
providers would like that, to run 400G on their long distance links
without having to have external transponders.

For our own use, I'm also hoping for linecards supporting 50G ports
(specifically, 50Gbase-LR) soonish.  We have two MX480s as our border
routers (provided by our ISP) and currently have 100G uplinks to the
ISP, and are connecting our datacenter to the MX480s using multiple
10G links to our DC spines.  We are kind of hoping to be able to up-
grade to 50G connections next year, or possibly the year after that.


/Bellman





Re: [j-nsp] OSPF reference-bandwidth 1T

2019-01-22 Thread Thomas Bellman
On 2019-01-22 12:02 MET, Pavel Lunin wrote:

>> (I am myself running a mostly DC network, with a little bit of campus
>> network on the side, and we use bandwidth-based metrics in our OSPF.
>> But we have standardized on using 3 Tbit/s as our "reference bandwidth",
>> and Junos doesn't allow us to set that, so we set explicit metrics.)

> As Adam has already mentioned, DC networks are becoming more and more
> Clos-based, so you basically don't need OSPF at all for this.
> 
> Fabric uplinks, Backbone/DCI and legacy still exist though, however in
> the DC we tend to ECMP it all, so you normally don't want to have unequal
> bandwidth links in parallel in the DC.

Our network is roughly spine-and-leaf.  But we have a fairly small net
(two spines, around twenty leafs, split over two computer rooms a couple
of hundred meters apart the way the fiber goes), and it doesn't make
economical sense to make it a perfectly pure folded Clos network.  So,
there are a couple of leaf switches that are just layer 2 with spanning
tree, and the WAN connections to our partner in the neighbouring city
goes directly into our spines instead of into "peering leafs".  (The
border routers for our normal Internet connectivity are connected as
leafs to our spines, though, but they are really our ISP's CPE routers,
not ours.)

Also, the leaves have wildly different bandwidth needs.  Our DNS, email
and web servers don't need as much bandwidth as a 2000 node HPC cluster,
which in turn needs less bandwidth than the storage cluster for LHC
data.  Most leaves have 10G uplinks (one to each spine), but we also
have leafs with 1G and with 40G uplinks.

I don't want a leaf with 1G uplinks becoming a "transit" node for traffic
between two other leafs in (some) failure cases, because an elephant flow
could easily saturate those 1G links.  Thus, I want higher costs for those
links than for the 10G and 40G links.  Of course, the costs don't have to
be exactly <reference bandwidth> / <link bandwidth>, but there needs to be
some relation to the bandwidth.

> Workarounds happen, sometimes you have no more 100G ports available and
> need to plug, let's say, 4x40G "temporarily" in addition to two existing
> 100G which are starting to be saturated. In such a case you'd rather
> consciously decide whether you want to ECMP these 200 Gigs among six
> links (2x100 + 4x40) or use 40GB links as a backup only (might be not
> the best idea in this scenario).

Right.  I actually have one leaf switch with unequal bandwidth uplinks.
On one side, it uses 2×10G link aggregation, but on the other side, I
could use an old Infiniband AOC cable giving us a 40G uplink.  In that
case, I have explicitly set the two uplinks to have the same costs.


/Bellman, NSC





Re: [j-nsp] OSPF reference-bandwidth 1T

2019-01-17 Thread Thomas Bellman
On 2019-01-16 16:41 MET, Saku Ytti wrote:

> No one should be using bandwidth based metrics, it's quite
> non-sensical. I would recommend that if you have only few egress
> points for given prefix, adopt role based metric P-PE, P-P-city,
> P-P-country etc. If you have many egress options for given prefix
> latency based metric might be better bet.

You are obviously talking about a service-provider perspective here,
since you are talking about P and PE.  Not an unreasonable assumption
on this list of course, but I don't see any indication of what kind of
network Event Script is running.

Would you advise avoiding bandwidth-based metrics in e.g. datacenter
or campus networks as well?

(I am myself running a mostly DC network, with a little bit of campus
network on the side, and we use bandwidth-based metrics in our OSPF.
But we have standardized on using 3 Tbit/s as our "reference bandwidth",
and Junos doesn't allow us to set that, so we set explicit metrics.)
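
With a 3 Tbit/s reference, the metric is simply 3000 divided by the
link speed in Gbit/s: 3000 for a 1G link, 300 for 10G, and 75 for 40G.
Setting that explicitly looks like this (interface names are examples):

    protocols {
        ospf {
            area 0.0.0.0 {
                interface xe-0/0/17.1 { metric 300; }  /* 10G uplink */
                interface et-0/0/48.1 { metric 75; }   /* 40G uplink */
            }
        }
    }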


/Bellman





Re: [j-nsp] interface-range inheritance change in 14.1X53 and 15.1

2018-12-21 Thread Thomas Bellman
On 2018-12-21 23:22 UTC, Anderson, Charles R wrote:

> Can anyone shed some light on WHY this change was made?  I much prefer
> the old behavior.
> 
> From PR1281947:
> 
> "The behavior of the "interface-range" configuration statement changed
> in 14.1X53 and 15.1. Prior to 14.1X53 and 15.1, more specific configuration
> on the interface would supersede what was configured in the interface-range.
> In 14.1X53 or 15.1, if there was specific interface configuration, the
> configuration in "interface-range" will append to it. For example, if there
> is a mismatch of VLAN assignment between the interface-range and the direct
> interface configuration, it will append the configured VLAN and cause an
> invalid configuration in 14.1X53 or 15.1."

I don't know Juniper's motivations, but the new behaviour *allows* you
to append, which could be useful for trunk mode interfaces.  The old
behaviour didn't.  (I don't use this myself for interface ranges, but I
do use it for groups/apply-groups; I don't know if groups also
changed their behaviour.)

To, kind of, emulate the old behaviour, you could use multiple ranges:

interface-range all-hosts {
    member-range xe-0/0/4 to xe-0/0/23;
    mtu 9216;
    unit 0 {
        family ethernet-switching {
            interface-mode access;
        }
    }
}
interface-range most-hosts {
    member-range xe-0/0/4 to xe-0/0/14;
    member-range xe-0/0/16 to xe-0/0/23;
    unit 0 {
        family ethernet-switching {
            vlan { members HostVLAN; }
        }
    }
}
xe-0/0/15 {
    unit 0 {
        family ethernet-switching {
            vlan { members OtherVLAN; }
        }
    }
}

Admittedly, a fair bit more clumsy, but it *is* possible to express.

Maybe there should have been an 'except-interface-range <range-name>'
statement, similar to 'apply-groups-except'?
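
For comparison, this is roughly how the same exception can be expressed
with groups today (a sketch using a wildcard group; the names are
examples, not my actual config):

    groups {
        host-ports {
            interfaces {
                <xe-0/0/*> {
                    unit 0 {
                        family ethernet-switching {
                            vlan { members HostVLAN; }
                        }
                    }
                }
            }
        }
    }
    apply-groups host-ports;
    interfaces {
        xe-0/0/15 {
            apply-groups-except host-ports;
            unit 0 {
                family ethernet-switching {
                    vlan { members OtherVLAN; }
                }
            }
        }
    }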


(A nicer way of expressing VLAN assignments would be to throw out
the 'interface-mode' statement, and replace 'vlan members' with two
statements: 'untagged-vlan <vlan-name>' and 'tagged-vlans <vlan-list>'.
I suppose that would also have solved your practical problems, since
singletons are still overridden by more specific declarations, and I
guess you want this for access-mode interfaces.)


/Bellman





Re: [j-nsp] Silly command?

2018-12-13 Thread Thomas Bellman
On 2018-12-13 16:10, Chris Adams wrote:

> While configuring a new MX204, I noticed this:
> 
> admin@newrouter> request vmhost power-o?
> Possible completions:
>   power-offPower off the software on RE
>   power-on Power on the system
> 
> Umm, why is there a CLI command to turn the router ON?

The command is to power on the *other* routing-engine:

> request vmhost power-on ?
Possible completions:
  other-routing-engine  Power on other Routing Engine

Power off, on the other hand, seems like it can be done either on
the current RE, or on the other RE:

> request vmhost power-off ?
Possible completions:
  <[Enter]>Execute this command
  other-routing-engine  Power off other Routing Engine
  |Pipe through a command

I have never actually tried executing those commands, just checked
what is available with completion.  And reading the documentation:


https://www.juniper.net/documentation/en_US/junos/topics/reference/command-summary/request-vmhost-power-on.html

Why that command is available even on an MX204, which I believe
only has a single RE, I don't know.  (The only MX:es I have access
to are MX480s.)


/Bellman





Re: [j-nsp] Network automation vs. manual config

2018-08-19 Thread Thomas Bellman
On 2018-08-19 08:11, Nathan Ward wrote:

> I would be interested in a way to build a command alias with
> `| display inheritance | display commit-scripts | display omit | exclude #`
> or something - `exclude #` isn’t the best either, as # is often in int
> description etc.

Slightly aside: instead of using exclude to get rid of the inheritance
comments, why not use the existing variants of 'display inheritance'?
Add one of the keywords 'brief', 'terse' or 'no-comments' to get less
and less amount of comments about where things were inherited from.
I.e, 'show configuration | display inheritance no-comments'.

/Bellman





Re: [j-nsp] EX4550 or QFX5100 for Core

2018-08-07 Thread Thomas Bellman
On 2018-08-07 14:21, Giovanni Bellac via juniper-nsp wrote:

> Sorry, my first email was not clear enough that I require Base-T
> (copper) ports.
> QFX5110 etc. are looking great on paper, but with copper optics the
> docs are saying:
> ###
> Caution
> Do not place a copper transceiver in an access port directly above or below
> another copper transceiver. Internal damage to the access ports and switch
> can occur. For copper transceivers, we recommend either using the top port
> row exclusively, or the bottom port row exclusively.
> ###

Some TP transceivers are small enough that it isn't a problem.  But be
careful, and verify, if you go down that route.

More importantly, 10 Gbit/s TP transceivers are not supported.  You can
buy such transceivers from some third-party vendors, but they *are*
violating the specifications for SFP+ ports, drawing more power than
SFP+ ports are required to deliver.  It might work, as many switches
can deliver more power than the spec requires, or it might not.  Or it
might work for a few transceivers, but not if you fill all ports with
such transceivers.

So, I agree, if you need 10Gbase-T, then the QFX5110 or EX4600 is not
what you should look at.

> So the options are limited with EX4550-32T and QFX5100-48T...
> Kind regards, Giovanni

There is also the EX4300-48MP, with 24 TP ports that do 10/100/1000
Mbit/s, and 24 TP ports that do 1/2.5/5/10 Gbit/s.  There is also a
slot for a module where you can get four SFP+ ports, two QSFP ports,
or one QSFP28 ports, if you need a couple of fiber connections.

I'm assuming that the EX4300-48MP is cheaper than a QFX5100-48T, but
I have never priced one, or used one.  (Note that I believe you need
to buy an extra license to run OSPF, or use VRFs, on the EX4300, while
that is included in the base license for QFX5100.  Both require an
extra license to run BGP or IS-IS.)


/Bellman





Re: [j-nsp] EX4550 or QFX5100 for Core

2018-08-03 Thread Thomas Bellman
On 2018-08-03 16:39, Giovanni Bellac via juniper-nsp wrote:

> So, we want something new with JTAC support. We need (1/10G)-Base-T,
> VLAN, L3, nothing fancy, but stable. We have 3k ARP entries.
>
> Option 1) 2x EX4550
>
> Option 2) 2x QFX5100
>
> We want to keep simplicity in and therefore want to use VC. We are
> pushing some Gbit/s from Rack-to-Rack (backups) and to our two
> upstreams around 500-600Mbit/s.
> QFX5100 hardware seems to be MUCH better than EX4550 hardware. The ARP
> table size, hash table size etc. on EX4550 is relatively small.

We have two QFX5100-48S in our datacenter core/spine, and have been
pretty happy with them.  They have been running for four years, and
I currently expect that we will replace them in another three years
(i.e. sometime in 2021).

We are however *not* running virtual chassis, but run them as stand-
alone units, with a mix of OSPF, Spanning Tree and VRRP to connect
them, and the leaf switches/routers, together.

We also have much smaller ARP (and MAC) tables than you.  Partly
because we mostly run L3 (OSPF) out to our leafs.  Currently we
have about 150 ARP entries, 70 IPv6 neighbour entries and 150 MAC
addresses in each.  We have about 1200 IPv4 routes and 450 IPv6
routes in our routing tables (that includes four VRFs).


We are currently not running BGP on them.  (The border routers are
MX480s, provided by our ISP, the Swedish NREN, and they talk BGP to
the NREN core routers, but we only talk OSPF to the border routers.)


We have almost only fiber, and mostly single-mode.  Just a couple of
1000Base-T connections.

If you don't need 10Gbase-T, and don't need all of the ports in a
QFX5100, you might want to take a look at EX4600.  The same hardware
as in QFX5100 (Broadcom Trident II), but fewer ports, and cheaper.
I think there are a few MPLS features that are disabled in the
EX4600.  We have recently bought a few of those to have as leaf
routers/switches in our datacenter, but I'm in the middle of taking
the first of them into production, so don't have much experience.
Given the similarities to QFX5100, I don't *expect* problems, though.

(If there had been a 10GBase-T version of the EX4600, I think we
would have bought that for some of the leafs instead, since TP cables
are much easier to handle in a rack than fiber or DAC, and cheaper
as well.)



There are several reasons for not running virtual chassis:

 - Standards-based protocols (OSPFv2, OSPFv3, Spanning Tree, VRRP).
   If we want to change vendors, or just generations with the same
   vendor, that is much easier than if they run a proprietary VC
   protocol.  Just add the new spines, move connections one by one
   to the new hardware, and finally turn off the old ones.

   With a virtual chassis, you would need to add routing protocols,
   STP, VRRP, et.c first between the old and new VC.  And that is
   not always possible to do without (short) downtimes.

 - Independent control planes.
   A shared control plane can cause a bug to take out both routers.

 - Loose coupling
   Virtual chassis requires much more state to be shared between the
   units, which makes the implementation more complicated.  Being a
   former programmer, I'm averse to that. :-)  I have also heard too
   many stories (about many vendors) about limitations or problems
   with virtual chassis to feel comfortable with that.


-- 
Thomas Bellman,  National Supercomputer Centre,  Linköping Univ., Sweden
"We don't understand the software, and sometimes we don't understand
 the hardware, but we can *see* the blinking lights!"





Re: [j-nsp] Spine & leaf

2018-06-27 Thread Thomas Bellman
On 2018-06-26 21:38, David Sinn wrote:

> OSPF scales well to many multiples of 1000's of devices.

Is that true even for Clos (spine & leaf) networks, and in a single area?

My understanding, solely based on what others have told me, is that
the flooding of LSAs in a Clos network can start to overwhelm routers
already at a few hundred devices, as each time e.g. a spine sends out
an LSA, all of the other spines will hear that from each of the leaves,
all more or less simultaneously.  And since the OSPF protocol limits
lifetimes of LSAs to 1 hour, you will get a constant stream of updates.

(And if you have multiple VRFs, the load might be multiplied by some
factor, depending on how many devices those VRFs exist on.)

The topologies of classical core-distribution-access networks would
not suffer as bad, nor would most ISP networks.

My own experience is only with pretty small networks (we currently have
2 spines and 11 leaves in our area, the rest of the university have a
couple dozen routers, and the OSPF database in our area contains ~1600
LSAs).  Thus, I can only repeat what others have told me, but I'm
curious to hear real-world experience from people running larger
OSPF networks.


-- 
Thomas Bellman,  National Supercomputer Centre,  Linköping Univ., Sweden
"We don't understand the software, and sometimes we don't understand
 the hardware, but we can *see* the blinking lights!"





Re: [j-nsp] Spine & leaf

2018-06-25 Thread Thomas Bellman
On 2018-06-25 18:22, Scott Whyte wrote:

> BGP, as you say, provides excellent filtering capabilities.  What
> does OSPF/ISIS bring to the table?

Automatic discovery of peers, and thus less unique configuration.  You
don't need to configure each peer individually, just the interface.  If
you do unnumbered links, you don't even need to allocate link networks
for your routing links, giving even less unique configuration.  Just

  set interfaces xe-0/0/17.1 family inet unnumbered-address lo0.1
  set interfaces xe-0/0/17.1 family inet6
  set protocols ospf area A.B.C.D interface xe-0/0/17.1 interface-type p2p
  set protocols ospf3 area A.B.C.D interface xe-0/0/17.1 interface-type p2p

and you're done.  The nice thing is that the only unique piece of
configuration is the interface name.

Doing unnumbered links for BGP seems to at least be more complicated,
but Cumulus Linux is supposed to have support for it, making it as easy
to configure as OSPF.
(https://blog.ipspace.net/2015/02/bgp-configuration-made-simple-with.html;
I've never used Cumulus, just read about it.)


/Bellman





[j-nsp] OSPFv3 monitoring using SNMP

2018-06-08 Thread Thomas Bellman
I'm trying to understand the SNMP information my Junipers give
me for OSPF v3, specifically the indices for ospfv3NbrTable and
ospfv3IfTable.  (I want to write a Nagios plugin for checking
OSPFv3 neighbours.)

RFC 5643 says that there is an interface index in there, e.g. in the
ospfv3NbrTable the index includes ospfv3NbrIfIndex.  According to
section 2.1, those references the "IPv6 Interface Table", which
ought to be IP-MIB::ipv6InterfaceTable (defined in RFC 4293).  Junos
doesn't seem to implement that, though, and instead implements RFC
2465, where we instead have IPV6-MIB::ipv6IfTable.

IPV6-MIB::ipv6IfTable contains the column ipv6IfLowerLayer, which
points directly into IF-MIB::ifTable.

However, the data that I get from my Junipers don't match this:

   $ snmptable -Ci -v2c -c... lo-nsc5 OSPFV3-MIB::ospfv3NbrTable
   SNMP table: OSPFV3-MIB::ospfv3NbrTable

 index  ospfv3NbrAddressType  ospfv3NbrAddress  [...]
2.0.2196529913  ipv6  [...]
6.0.2196507131  ipv6  [...]
7.0.2196507132  ipv6  [...]
8.0.2196529907  ipv6  [...]
9.0.2196529910  ipv6  [...]
   10.0.2196529911  ipv6  [...]
   11.0.2196529908  ipv6  [...]
   13.0.2196529909  ipv6  [...]

The index here is the three-tuple <ospfv3NbrIfIndex, ospfv3NbrIfInstId,
ospfv3NbrRtrId>.
Then looking at the ipv6IfTable:

   $ snmptable -Ci -v2c -c... lo-nsc5 IPV6-MIB::ipv6IfTable
   SNMP table: IPV6-MIB::ipv6IfTable

index ipv6IfDescr ipv6IfLowerLayer ipv6IfEffectiveMtu   [...]
   16   lo0.0   IF-MIB::ifIndex.16  4294967295 octets   [...]
  511 xe-0/0/25.0  IF-MIB::ifIndex.5111500 octets   [...]
  512 pfe-0/0/0.16383  IF-MIB::ifIndex.512  4294967295 octets   [...]
  571 irb.121  IF-MIB::ifIndex.5711500 octets   [...]
  650 irb.110  IF-MIB::ifIndex.6501500 octets   [...]
  659 irb.101  IF-MIB::ifIndex.6591500 octets   [...]
  662 lo0.104  IF-MIB::ifIndex.662  4294967295 octets   [...]
  664 xe-0/0/18.0  IF-MIB::ifIndex.6641500 octets   [...]
  672 xe-0/0/16.0  IF-MIB::ifIndex.6721500 octets   [...]
  675 et-0/0/48.1  IF-MIB::ifIndex.6751500 octets   [...]
  678   et-0/0/48.104  IF-MIB::ifIndex.6781500 octets   [...]
  680 irb.103  IF-MIB::ifIndex.6801500 octets   [...]
  686ge-0/0/30.23  IF-MIB::ifIndex.6861500 octets   [...]
  694xe-0/0/4.104  IF-MIB::ifIndex.6941500 octets   [...]
  698xe-0/0/5.104  IF-MIB::ifIndex.6981500 octets   [...]
  702 xe-0/0/12.1  IF-MIB::ifIndex.7021500 octets   [...]
  719  xe-0/0/1.1  IF-MIB::ifIndex.7191500 octets   [...]
  721  xe-0/0/0.1  IF-MIB::ifIndex.7211500 octets   [...]

It doesn't have any interface with index 2, 6, 7 and so on.
For a short while, I thought that those numbers were *positions*
in the ipv6IfTable, but that doesn't match either; e.g. neighbour
6.0.2196507131 actually is on the other end of xe-0/0/0.1 (the
last entry in ipv6IfTable), while 7.0.2196507132 is on the other
end of xe-0/0/1.1 (the second to last entry in ipv6IfTable).

Our HP ProCurve and HPE FlexFabric switches don't behave like this.
The interface indices in ospfv3NbrTable and ospfv3IfTable matches
the indices in ipv6InterfaceTable, which matches the indices in
ifTable.

Am I totally misunderstanding the OSPFv3 MIB, or is the Junos
implementation broken?  Does Junos have some other SNMP table I
should look in to be able to map neighbours to interfaces?

(I see the same behaviour on Junos 17.2R1.13, 17.3R1-S3 and
18.1R1.9.)


-- 
Thomas Bellman,  National Supercomputer Centre,  Linköping Univ., Sweden
"Life IS pain, highness.  Anyone who tells   !  bellman @ nsc . liu . se
 differently is selling something."  !  Make Love -- Nicht Wahr!





Re: [j-nsp] QFX5100 buffer allocation

2018-05-17 Thread Thomas Bellman
On 2018-05-17 02:41, Brian Rak wrote:

> We're not even doing 10gbit of traffic, so the buffers should last at
> least a little bit.

And you're not hitting 10 Gbit/s even under very short bursts of a few
milliseconds?  Microbursts like that don't show up in "normal" usage
graphs where you only poll your switches/routers every minute or so.

> Thanks for the tip about cut-through, we didn't have that enabled.
> Do you happen to know if it works from a 10g port to a broken out
> 4x10g port?

Should do.  From the perspective of the Trident II chip, they are
not any different from normal 10G ports.  Cut-through doesn't work
between ports of different speed, and the ports involved must not
have any rate limiting or shaping, but other than that I don't know
of any limitations.  (And if you receive broken packets, those will
be forwarded instead of thrown away; that is the only disadvantage
of cut-through mode that I have heard of.)

> It's annoying to be dropping packets with a bunch of unused buffer
> space.

Just make sure you don't fill your buffers so much that you get a long
(measured in time) standing queue, since that will just turn into a
long delay for the packets without helping anything (search for
"bufferbloat" for more information).  Not a big problem on Trident II-
based hardware, but if you have equipment that boasts about gigabytes
of buffer space, you may need to watch out.

Oh, and both changing buffer allocation and enabling/disabling cut-
through mode reset the Trident chip, causing a short period (less
than one second, I believe) where traffic is lost.


/Bellman





Re: [j-nsp] QFX5100 buffer allocation

2018-05-16 Thread Thomas Bellman
On 2018-05-16 18:06, Brian Rak wrote:

> We've been trying to track down why our 5100's are dropping traffic
> due to lack of buffer space, even with very low link utilization.

There's only 12 Mbyte of buffer space on the Trident II chip.  If you
get 10 Gbit/s bursts simultaneously on two ports, contending for the
same output port, it will only take 10 ms to fill 12 Mbyte.  (And of
those 12 Mbyte, 3 Mbyte is used for dedicated per-port buffers, so you
really only have ~9 Mbyte, so you would actually fill your buffers in
7.5-8 ms.)

Do you see any actual problems due to the dropped packets?  Some people
would have you believe that TCP suffers horribly from a single dropped
packet, but reality is not quite that bad.  So don't chase problems
that aren't there.

Our busiest ports have drop rates at about 1 in every 15'000 packets
(average over a few months), and so far we haven't noticed any TCP
performance problems related to that.  (But I should note that most
of our traffic is long-distance, to and from sites at least several
milliseconds away from us, and often a 10-20 ms away.)

That said, for Trident II / Tomahawk level of buffer sizes, I think
it makes sense to configure them to have it all actually used, and
not wasted on the lossless queues.

You should probably also consider enabling cut-through forwarding, if
you haven't already done so.  That should decrease the amount of buffer
space used, leaving more available for when contention happens.


/Bellman





Re: [j-nsp] MX204 and copper SFP?

2018-04-05 Thread Thomas Bellman
On 2018-04-04 21:09, Niall Donaghy wrote:

> Even more sad to see that 1G ports retain their xe- naming rather than
> changing to ge- as you would hope and expect.

I have never understood the reason for having different names for
ports depending on the speed of the transceiver.  To me, it just
makes things more confusing.

Can someone enlighten me on the benefits of that?


/Bellman





Re: [j-nsp] STP in spine leaf architecture

2017-10-27 Thread Thomas Bellman
On 2017-10-26 18:11 (CEST), Hugo Slabbert wrote:

> [...] in a general a spine & leaf setup should be L3 for interswitch
> links, so any STP should be local to a given switch.  [...]
> Here I'm just talking about a vanilla spine & leaf setup, not anything
> Juniper-specific e.g. QFabric or VCF or whatnot.

You can also build a spine & leaf setup using TRILL or Shortest Path
Bridging (SPB), in which case you have a single large layer 2-domain.
Not using Juniper equipment, though, since Juniper supports neither
TRILL nor SPB...

> I'd be curious about more specific details from folks running QFX in
> prod in this type of setup.

You are generally correct though.  Configure your switch-to-switch
links as L3 ports (i.e. 'interface ... unit ... family inet/inet6',
not 'family ethernet-switching'), and some routing protocol like
OSPF, IS-IS or BGP.  BGP is fairly popular in datacenter settings,
but OSPF works fine as well, as should IS-IS.

Layer 2 domains should be kept to a single leaf switch, and thus you
don't need to run Spanning Tree at all.  And definitely not on your
links between spines and leafs, since that would block all but one of
the uplinks, and give you all the pains of Spanning Tree without any
of the benefits.  (You *might* want to run STP on your client ports and
configure them as edge ports with bpdu-block-on-edge, to protect against
someone misadvertently connecting two L2 client ports togethere.)
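
That edge-port protection is just a few lines (a sketch; adjust the
interface list to your client-facing ports):

    protocols {
        rstp {
            interface xe-0/0/10 {
                edge;
            }
            bpdu-block-on-edge;
        }
    }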

(I don't run a pure spine-and-leaf network myself.  I am trying to
migrate towards one, but we still have several "impurities", and
have STP running in several places.)


-- 
Thomas Bellman <bell...@nsc.liu.se>
National Supercomputer Centre, Linköping University, Sweden




Re: [j-nsp] Many contributing routes

2017-08-09 Thread Thomas Bellman
On 2017-08-09 09:05, Vincent Bernat wrote:

> I am generating a default route to distribute with a policy statement
> like that:
> 
> #v+
> policy-statement v4-DEFAULT-ROUTE-GENERATE {
[...]
> }
> #v-
> 
> This works just fine but there are a lot of contributing routes (about
> 400k) to the generated route. Is it harmless or will I run into trouble
> for that?

That is a pretty common thing to do when you inject a default
route into your IGP (OSPF or ISIS) from your BGP-talking border
routers.  At least for those of us who are end-users, and not
ISPs ourselves, since our internal routers often do not handle
a full Internet table.  If you have a full Internet BGP feed
in your border router, you will of course then get hundreds of
thousands of contributing routes.  If this was problematic,
lots of people would complain...

Usually, you would use an 'aggregate' route for that (i.e. 'edit
routing-options aggregate'), or a 'generate' route ('edit routing-
options generate') with an explicit discard or reject target, not
retaining the nexthop from any of the contributing routes.  My
understanding is that if you use a generate route that keeps the
nexthop from some contributing route, you usually have fairly few
contributors, but I would not expect it to be a problem having
400k contributing routes.
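
As a sketch, reusing your policy name (adjust to taste):

routing-options {
    aggregate {
        route 0.0.0.0/0 {
            policy v4-DEFAULT-ROUTE-GENERATE;   /* optional: limit the contributors */
            discard;                            /* explicit target; no nexthop retained */
        }
    }
}

The 'generate' form is the same, just under 'routing-options
generate'; with an explicit 'discard' it likewise does not try to
pick up a nexthop from any contributing route.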


/Bellman




[j-nsp] Cut-through forwarding statistics on QFX5100

2017-06-21 Thread Thomas Bellman

We have a QFX5100 on which we have enabled cut-through forwarding
('set forwarding-options cut-through').  Now I am curious about
how effective that is.  Is there any statistics one can dig out
about how often the switch is able to do cut-through, and how
often it has to fall back to store-and-forward?  I haven't found
anything in 'show interfaces extensive', nor any other promising
'show' command.

The dream would of course be to get counters for every combination
of ingress and egress interface, but I would be happy to get just
counters for just ingress or egress, or even just a total for the
entire system.

The switch is running Junos 17.2R1 if that matters.


-- 
Thomas Bellman,  National Supercomputer Centre,  Linköping Univ., Sweden
"We don't understand the software, and sometimes we don't understand
 the hardware, but we can *see* the blinking lights!"

Re: [j-nsp] flowspec in logical-systems

2017-04-09 Thread Thomas Bellman

On 2017-04-07 20:43, Aaron Gould wrote:

> Do you all use logical-systems in your operational network?  How pleased are
> you with them?  I have an MX104 with about 8 lsys's and I am using it for a
> study lab and love it.

Our ISP uses logical systems on their CPE routers to provide their
customers (us) access to them, so we don't have to e.g. buy routers
with BGP licenses.  We can also use them as part of our core network,
if we so wish; just pay for the extra linecards if what they provide
by default is not enough.

(Actually, since they upgraded from MX80 to MX480 as CPE routers last
year, we customers get to use the main instance, and our ISP has a
logical system which they use for their purposes.  But they are the
NREN, and we are universities, so they trust us to not abuse it.  A
commercial ISP might be more reluctant of doing it that way...)

At our site, we have also created a logical system for managing what
is essentially just a VRF.  The intent was to be able to let some
persons manage that without being able to affect the rest of the
configuration and screw up the entire university's network, and also
to separate out that part, so it doesn't clutter up the rest of the
configuration.


> I envision being able to cleanly separate router functions in my network for
> P or PE type things... and uplink PE to P using a lt-0/0/0 interface with
> mpls on it.

You should be aware that there are some limitations to logical systems,
and they aren't quite as independent and isolated as one might expect.
I believe for example that you can't do netflow or multi-chassis LAG in
an lsys.  And SNMP monitoring is configured in the main instance, not
per logical system.  (You can limit SNMP communities to specific logical
systems, but it can break SNMP monitoring in other ways; I don't remember
the details about this, though.)

Also, traffic over logical tunnel interfaces has to go via the
backplane, which may limit the bandwidth you can use.  At least with
the linecards we have in "our" MX480, we are limited to 65 Gbit/s for
such traffic.  Thus, we as a customer talk BGP with the ISP's core router
in their POP elsewhere in the city, not with the ISP's logical system
in the CPE router over a logical tunnel interface.  (We don't use
enough bandwidth for this to be a practical problem at the moment,
though.)
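
For completeness, a sketch of such a logical-tunnel hookup between
the main instance and an lsys (unit numbers, lsys name and addresses
are made up):

interfaces {
    lt-0/0/0 {
        unit 0 {
            encapsulation ethernet;
            peer-unit 1;                   /* paired with unit 1 below */
            family inet {
                address 10.255.1.0/31;
            }
        }
    }
}
logical-systems {
    CUSTOMER {
        interfaces {
            lt-0/0/0 {
                unit 1 {
                    encapsulation ethernet;
                    peer-unit 0;
                    family inet {
                        address 10.255.1.1/31;
                    }
                }
            }
        }
    }
}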

If you just want to separate configuration into related chunks, then
using groups might be a viable alternative to logical systems.  And
you don't need MX class hardware. :-)  I use that on the QFX5100s I
have as core router/switches at my department.  Then I can use
'show configuration groups FOO' to see everything concerning FOO,
without having to wade through everything that concerns FIE or FUM.
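
A minimal sketch of the groups approach (the group name and its
contents are made up):

groups {
    FOO {
        interfaces {
            xe-0/0/1 {
                unit 0 {
                    family inet {
                        address 192.0.2.1/24;
                    }
                }
            }
        }
    }
}
apply-groups FOO;    /* merge the group into the main configuration */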


-- 
Thomas Bellman,  National Supercomputer Centre,  Linköping Univ., Sweden
"Life IS pain, highness.  Anyone who tells
 differently is selling something."
