Re: Updated prefix filtering

2015-05-10 Thread Mark Andrews

In message , Dave Taht writes:
> On Fri, May 8, 2015 at 3:41 PM, Chaim Rieger wrote:
> >
> > Best example I've found is located at http://jonsblog.lewis.org/
> >
> > I too ran out of space, Brocade, not Cisco though, and am looking to
> > filter prefixes. Did anybody do a more recent or updated filter list
> > since 2008?
> >
> > Offlist is fine.
> >
> > Oh and happy friday to all.
> 
> I have had a piece on the spike for a long time about how we implemented
> bcp38 for linux (openwrt) devices using the ipset facility.
> 
> We had a different use case (preventing all possible internal rfc1918
> network addresses from escaping, while still allowing punching through
> one layer of nat), but the underlying ipset facility was easily
> extendible to actually do bcp38 and fast to use, so that is what we
> ended up calling the openwrt package. Please contact me offlist if you
> would like a peek at that piece, because the article had some
> structural problems and we never got around to finishing/publishing
> it, and I would like to
> 
> has there been a bcp38 equivalent published for ipv6?

Yes, BCP 38.  BCP 38 is address-family agnostic.  Just because the
examples use IPv4 addresses doesn't mean the concepts don't map
straight over onto IPv6.
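
For a concrete Linux flavour of that, a minimal egress-filter sketch - not
Dave's openwrt package, just the same idea with ipset/ip6tables, assuming
eth0 is the WAN interface and 2001:db8:1234::/48 is the prefix delegated
to the site (both placeholders):

    ipset create bcp38-v6 hash:net family inet6
    ipset add bcp38-v6 2001:db8:1234::/48
    # drop anything forwarded out the WAN with a source address we don't own
    ip6tables -A FORWARD -o eth0 -m set ! --match-set bcp38-v6 src -j DROP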

Source-based routing is really only needed because BCP 38 filtering
is being poorly implemented.  Rather than collecting the full set
of legitimate source addresses, ISPs are only accepting the set of
source addresses that they have allocated to the customer.
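
As a concrete illustration of the source-based routing Dave shows below,
iproute2 can install per-source default routes with the "from" selector;
the prefixes and next hops here are illustrative only:

    # traffic sourced from ISP A's prefix goes to ISP A, and B's to B
    ip -6 route add default from 2001:db8:aaaa::/48 via fe80::1 dev eth0
    ip -6 route add default from 2001:db8:bbbb::/48 via fe80::2 dev eth1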

With SIDR it should be possible to pass certs to the other ISPs
that say "I am a legitimate source of these addresses" and do this
all automatically.

> Along the way source specific routing showed up for ipv6 and we ended
> up obsoleting the concept of an ipv6 global default route entirely on
> a linux based CPE router.
> 
> see: http://arxiv.org/pdf/1403.0445.pdf and some relevant homenet wg stuff.
> 
> d@nuc-client:~/babeld-1.6.0 $ ip -6 route
> 
> default from 2001:558:6045:e9:251a:738a:ac86:eaf6 via
> fe80::28c6:8eff:febb:9ff0 dev eth0  proto babel  metric 1024
> default from 2601:9:4e00:4cb0::/60 via fe80::28c6:8eff:febb:9ff0 dev
> eth0  proto babel  metric 1024
> default from fde5:dfb9:df90:fff0::/60 via fe80::225:90ff:fef4:a5c5 dev
> eth0  proto babel  metric 1024
> 
> So this box will not forward any ipv6 not in the from(src) table.
> 
> -- 
> Dave Täht
> https://plus.google.com/u/0/explore/makewififast
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org


Re: Updated prefix filtering

2015-05-10 Thread Frederik Kriewitz
Hello Dave,

On Sun, May 10, 2015 at 1:49 AM, Dave Taht  wrote:
> I have had a piece on the spike for a long time about how we implemented
> bcp38 for linux (openwrt) devices using the ipset facility.
>
> We had a different use case (preventing all possible internal rfc1918
> network addresses from escaping, while still allowing punching through
> one layer of nat), but the underlying ipset facility was easily
> extendible to actually do bcp38 and fast to use, so that is what we
> ended up calling the openwrt package. Please contact me offlist if you
> would like a peek at that piece, because the article had some
> structural problems and we never got around to finishing/publishing
> it, and I would like to
>
> has there been a bcp38 equivalent published for ipv6?

I don't see how this is related to the OP's problem.
But there's the rpfilter iptables module, which can be used for BCP 38
IPv4 and IPv6 implementations on Linux routers.
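
A minimal sketch of that, using the strict reverse-path check (the module
also has --loose and --accept-local options for asymmetric setups):

    # drop packets whose source address fails the reverse path check
    ip6tables -t raw -A PREROUTING -m rpfilter --invert -j DROP
    iptables  -t raw -A PREROUTING -m rpfilter --invert -j DROP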


Re: Thousands of hosts on a gigabit LAN, maybe not

2015-05-10 Thread Nick Hilliard
On 10/05/2015 00:33, Karl Auer wrote:
> Would be interesting to see how IPv6 performed, since that is one of the
> things it was supposed to be able to deliver - massively scalable links
> (equivalent to an IPv4 broadcast domain) via massively reduced protocol
> chatter (IPv6 multicast groups vs IPv4 broadcast), plus fully automated
> L3 address assignment.

It will perform badly because putting large numbers of hosts in a single
broadcast domain is a bad idea, no matter what the protocol.

If you have a very large L2 domain and if you use router advertisements to
handle your default gateway announcement, you'll probably end up trashing
your routers due to periodic neighbor solicitation messages.  If you don't
use tight timers, your failover convergence time will be trash.  On the
other hand, the tighter the timers, the more you'll trash your routers,
particularly if there is a failover event - in other words, exactly when
you don't want to stress the network.

In the best case, the gateway unavailability mttr will be around 5-10
seconds and it will be non-deterministic.  This means that if you want
router failover which actually works, you will need to use a first-hop
routing protocol like vrrp or similar.
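
If the first hops are Linux boxes, keepalived is the usual way to run VRRP.
A minimal sketch with illustrative addresses (for IPv6 instances keepalived
generally wants a link-local address as the first virtual IP):

    vrrp_instance V6_GW {
        state MASTER
        interface eth0
        virtual_router_id 51
        priority 150
        advert_int 1
        virtual_ipaddress {
            fe80::1/64
            2001:db8:0:1::1/64
        }
    }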

You will probably want to disable all multicast snooping on your network
because of ipv6 chatter.  Pushing state requirements into the L2 forwarding
mechanism turns out not to be a good idea especially at scale - see the
bimajority.org url that someone else posted on this thread, which is as
much about poor switch implementation as it is about poor protocol design
and solving problems that are a lot less relevant on today's networks.
This will mean that you will also need to manually prune the scope of your
dot1q network domain because otherwise the multicast chatter will be
spammed network-wide across all vlans on which it's defined.
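
On a hardware switch that's a vendor-specific knob; on a Linux bridge, for
what it's worth, it looks like this (br0 is a placeholder bridge name):

    ip link set dev br0 type bridge mcast_snooping 0
    # or, on older kernels:
    echo 0 > /sys/class/net/br0/bridge/multicast_snooping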

RA gives the operator no way of controlling which IP address is assigned to
which host, which means that the operator of the large l2 domain is likely
to want to disable SLAAC if they plan to have any input on what IP address
is assigned to what host.  This may or may not be important to the
operator.  If it's hosts on a hot-seated corporate lan, it probably doesn't
matter too much.  If it's a service provider selling ipv6 services, it
matters a lot.
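
If the routers happen to be Linux boxes running radvd, the usual way to do
that is to keep announcing the default route but turn off the autonomous
flag and point hosts at DHCPv6 instead; a sketch, with an illustrative
prefix:

    interface eth0
    {
        AdvSendAdvert on;
        AdvManagedFlag on;        # get addresses via DHCPv6
        AdvOtherConfigFlag on;    # get other config (DNS etc.) via DHCPv6
        prefix 2001:db8:0:1::/64
        {
            AdvOnLink on;
            AdvAutonomous off;    # no SLAAC from this prefix
        };
    };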

Regardless of whether this is the case, RA guard on each end-point is a
necessity, and if you don't have it, your control plane will be compromised.
RA guard is more complicated than ARP / DHCP guard and is not well
supported on a lot of hardware.
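
Where the hardware can't do it, a rough approximation on a Linux bridge is
to filter RAs by ingress port; this needs br_netfilter so bridged traffic
traverses ip6tables, and uplink0 below is a placeholder for the one port
allowed to originate RAs:

    sysctl net.bridge.bridge-nf-call-ip6tables=1
    # drop router advertisements arriving on any port other than the uplink
    ip6tables -A FORWARD -p icmpv6 --icmpv6-type router-advertisement \
        -m physdev ! --physdev-in uplink0 -j DROP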

Finally, if you have a connectivity problem with your large l2 domain, your
problem surface area is much greater than if you segment your network into
smaller chunks, so the scope of any outage can be a lot larger.

Nick



RE: Thousands of hosts on a gigabit LAN, maybe not

2015-05-10 Thread John R. Levine
Also, do you need line rate forwarding? Having 1,000 devices with 1Gb 
uplinks doesn't necessarily mean that full throughput is required... the 
clustering and the applications may be sporadic and bursty?


It's definitely sporadic and bursty.  There's another network for high
speed traffic among the nodes.  The Ethernet is for stuff like program
loading from NFS servers.


And... what support do you need? Just one spare on the shelf or full 
vendor support on every switch?


Spare on the shelf, definitely.

R's,
John


RE: Thousands of hosts on a gigabit LAN, maybe not

2015-05-10 Thread c b
If you need that kind of density, I recommend a Clos fabric. Arista, Juniper, 
Brocade, Big Switch BCF and Cisco all have solutions that would allow you to 
build a high-density leaf/spine. You can build the Cisco solution with NXOS or 
ACI, depending which models you choose. The prices on these solutions are all 
somewhat in the same ballpark based on list pricing I've seen... even Cisco 
(the Nexus 9k is surprisingly in the same range as branded whitebox). There is 
also Pluribus which offers a fabric, but their niche is having server procs on 
board the switches and it seems like your project involves physical rather than 
virtual servers. Still, the Pluribus could be used without taking advantage of 
the on board server compute I suppose.
I also recommend looking into a solution that supports VXLAN (or GENEVE, or 
whatever overlay works for your needs), simply because MAC reachability is 
carried over Layer 3, so you won't have to deal with spanning tree or 
monstrous MAC tables. But you don't need to do an overlay if you just segment 
with traditional VLANs.
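
The fabric vendors do this in hardware, usually with EVPN as the control
plane; purely for illustration, a software VTEP on Linux looks roughly like
this (VNI, addresses and interface names made up):

    # VNI 100 over UDP 4789 between two VTEP addresses
    ip link add vxlan100 type vxlan id 100 dstport 4789 \
        local 192.0.2.1 remote 192.0.2.2 dev eth0
    ip link add br100 type bridge
    ip link set vxlan100 master br100
    ip link set eth1 master br100     # server-facing port
    ip link set vxlan100 up && ip link set br100 up && ip link set eth1 up
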
I'm guessing you don't need HA (A/B uplinks utilizing LACP) for these servers?
Also, do you need line rate forwarding? Having 1,000 devices with 1Gb uplinks 
doesn't necessarily mean that full throughput is required... the clustering and 
the applications may be sporadic and bursty? I have seen load-testing clusters, 
hadoop and data warehousing pushing high volumes but the individual NICs in the 
clusters never actually hit capacity... If you need line-rate, then you need to 
do a deep dive with several of the vendors because there are significant 
differences in buffers on some models.
And... what support do you need? Just one spare on the shelf or full vendor 
support on every switch? That will impact which vendor you choose.
I'd like to hear more about this effort once you get it going. Which vendor you 
went with, how you tuned it, and why you selected who you did. Also, how it 
works.
LFoD
> Date: Sun, 10 May 2015 01:17:07 +
> From: jo...@iecc.com
> To: nanog@nanog.org
> Subject: Re: Thousands of hosts on a gigabit LAN, maybe not
> 
> In article  you write:
> >Juniper OCX1100 has 72 ports in 1U.
> 
> Yeah, too bad it costs $32,000.  Other than that it'd be perfect.
> 
> R's,
> John