Re: Open source Netflow analysis for monitoring AS-to-AS traffic
On 27/03/24 01:04, Brian Knight via NANOG wrote:

> What's presently the most commonly used open source toolset for monitoring
> AS-to-AS traffic? I want to see with which ASes I am exchanging the most
> traffic across my transits and IX links. I want to look for opportunities
> to peer so I can better sell expansion of peering to upper management.
>
> …
>
> pmacct seems to be good at gathering Netflow, but doesn't seem to analyze
> data. I don't see any concise howto guides for setting this up for my
> purpose, however.

pmacct will do what you want and it's not particularly difficult to set it up. For example, you can aggregate data into a database using:

aggregate[in]: src_as,src_net,src_mask
aggregate[out]: dst_as,dst_net,dst_mask

Now you can issue SQL queries that tell you which ASes or prefixes you send/receive the most bits or packets to/from.

Tore
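To illustrate the kind of query this enables, here is a sketch. The table and column names (`acct`, `as_dst`, `bytes`) follow a commonly used pmacct SQL schema, but they are assumptions; check your own `sql_table` settings. SQLite stands in for the real database so the example is self-contained.

```python
# Rank peer ASes by traffic volume, the way you might against a
# pmacct SQL database. Table/column names are assumed, not canonical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acct (as_dst INTEGER, bytes INTEGER)")
# Fake accounting rows: (destination AS, bytes sent)
conn.executemany(
    "INSERT INTO acct VALUES (?, ?)",
    [(64500, 10_000), (64500, 5_000), (64501, 2_000), (64502, 20_000)],
)

top_talkers = conn.execute(
    """
    SELECT as_dst, SUM(bytes) AS total_bytes
    FROM acct
    GROUP BY as_dst
    ORDER BY total_bytes DESC
    LIMIT 10
    """
).fetchall()

print(top_talkers)  # → [(64502, 20000), (64500, 15000), (64501, 2000)]
```

The same `GROUP BY`/`ORDER BY` shape works for packets, or for `as_src` on the ingress table.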
Re: Reverse Traceroute
* Rolf Winter

> If you would like to play with reverse traceroute, the easiest option
> is to work with the client and use one of the public server instances
> (https://github.com/HSAnet/reverse-traceroute/blob/main/ENDPOINTS).
> If you would be willing to host a public server instance yourself,
> please reach out to us.

I suggest you get in touch with the fine folks at NLNOG RING and ask if they would be interested in setting this up on the 600+ RING nodes all over the world. See https://ring.nlnog.net/.

Tore
Re: Rack rails on network equipment
* Andrey Khomyakov

> Interesting tidbit is that we actually used to manufacture custom rails for
> our Juniper EX4500 switches so the switch can be actually inserted from the
> back of the rack (you know, where most of your server ports are...) and not
> be blocked by the zero-U PDUs and all the cabling in the rack. Stock rails
> didn't work at all for us unless we used wider racks, which then, in turn,
> reduced floor capacity.
>
> As far as I know, Dell is the only switch vendor doing toolless rails so it's
> a bit of a hardware lock-in from that point of view.

Amen. I suspect that Dell is pretty much alone in realising that rack mount kits that require insertion/removal from the hot aisle are pure idiocy, since the rear of the rack tends to be crowded with cables, PDUs, and so forth.

This might be due to Dell starting out as a server manufacturer. *All* rack-mount servers on the market are inserted into (and removed from) the cold aisle of the rack, after all. The reasons that make this the only sensible thing for servers apply even more so for data centre switches.

I got so frustrated with this after having to remove a couple of decommissioned switches that I wrote a post about it a few years back: https://www.redpill-linpro.com/techblog/2019/08/06/rack-switch-removal.html

Nowadays I employ various strategies to facilitate cold aisle installation/removal, such as: reversing the rails if possible, attaching only a single rack ear (for four-post mounted equipment) or installing rivet nuts directly in the rack ears (for shallow two-post mounted equipment).

(Another lesson the data centre switch manufacturers could learn from the server manufacturers is to always include a BMC. I would *much rather* spend my serial console infrastructure budget on switches with built-in BMCs. That way I would get remote power control, IPMI Serial-Over-LAN and so on – all through a *single* Ethernet management cable.)

Tore
Re: Scanning activity from 2620:96:a000::/48
* Dobbins, Roland

> Scanning is part of the ‘background radiation’ of the Internet, and it’s
> performed by various parties with varying motivations. Of necessity, IPv6
> scanning is likely to be more targeted (were your able to discern any rhyme
> or reason behind the observed scanning patterns?).

The pattern appears to be a bunch of ICMPv6 pings sent to random addresses within the same /104; that is, the last 24 bits of each destination address appear randomised in each ping request. I don't know if they move on to another /104 after they are done with the first one, and so forth.

> iACLs, tACLs, CoPP, selective QoS for various ICMPv6 types/codes, et. al.
> should be configured in such a manner that 600pps of anything can’t cause an
> adverse impact to any network functions. Because actual bad actors are
> unlikely to voluntarily stop, even when requested to do so.

Clearly, and in this particular case my CP protections did their job successfully, fortunately, but that is kind of beside the point.

What I am wondering, though, is if it really should be considered okay for a good actor to launch what essentially amounts to a neighbour cache exhaustion DoS attack towards unrelated network operators (without asking first), just because bad actors might do the same.

Tore
Scanning activity from 2620:96:a000::/48
A couple of hours after midnight UTC, the control plane policers for unresolved traffic on a couple of our CE routers started being clogged with ping-scanning activity from 2620:96:a000::/48, which belongs to «Internet Measurement Research (SIXMA)» according to ARIN. Excerpt of this traffic (anonymised on our end):

11:21:05.016914 IP6 2620:96:a000::10 > 2001:db8:1234::f5:7a69: ICMP6, echo request, seq 0, length 16
11:21:05.016929 IP6 2620:96:a000::10 > 2001:db8:1234::12:ba74: ICMP6, echo request, seq 0, length 16
11:21:05.060045 IP6 2001:db8:1234::3 > 2620:96:a000::10: ICMP6, destination unreachable, unreachable address 2001:db8:1234::e7:f473, length 64
11:21:05.060060 IP6 2001:db8:1234::3 > 2620:96:a000::7: ICMP6, destination unreachable, unreachable address 2001:db8:1234::d4:c4a3, length 64
11:21:05.060419 IP6 2001:db8:1234::3 > 2620:96:a000::7: ICMP6, destination unreachable, unreachable address 2001:db8:1234::42:198a, length 64
11:21:05.064464 IP6 2620:96:a000::10 > 2001:db8:1234::4a:d4cd: ICMP6, echo request, seq 0, length 16
11:21:05.079645 IP6 2620:96:a000::10 > 2001:db8:1234::63:b58d: ICMP6, echo request, seq 0, length 16
11:21:05.097337 IP6 2620:96:a000::10 > 2001:db8:1234::24:1038: ICMP6, echo request, seq 0, length 16
11:21:05.111091 IP6 2620:96:a000::7 > 2001:db8:1234::8f:a126: ICMP6, echo request, seq 0, length 16
11:21:05.124112 IP6 2001:db8:1234::3 > 2620:96:a000::7: ICMP6, destination unreachable, unreachable address 2001:db8:1234::e6:70fc, length 64
11:21:05.124417 IP6 2001:db8:1234::3 > 2620:96:a000::10: ICMP6, destination unreachable, unreachable address 2001:db8:1234::bf:ca18, length 64
11:21:05.137509 IP6 2620:96:a000::10 > 2001:db8:1234::12:f0df: ICMP6, echo request, seq 0, length 16
11:21:05.142614 IP6 2620:96:a000::7 > 2001:db8:1234::8f:9ec6: ICMP6, echo request, seq 0, length 16

While the CP policer did its job and prevented any significant operational impact, the traffic did possibly prevent/delay legitimate address resolution attempts as
well as trigger loads of pointless address resolution attempts (ICMPv6 Neighbour Solicitations) towards the customer LAN. We just blocked the prefix at our AS border to get rid of that noise. Those ACLs are currently dropping packets at a rate of around 600 pps. I was just curious to hear if anyone else is seeing the same thing, and also whether or not people feel that this is an okay thing for this «Internet Measurement Research (SIXMA)» to do (assuming they are white-hats)? Tore
Re: Partial vs Full tables
* Michael Hare

> I'm considering an approach similar to Tore's blog where at some
> point I keep the full RIB but selectively populate the FIB. Tore,
> care to comment on why you decided to filter the RIB as well?

Not «as well», «instead». In the end I felt that running in production with the RIB and the FIB perpetually out of sync was too much of a hack, something that I would likely come to regret at a later point in time. That approach never made it out of the lab.

For example, simple RIB lookups like «show route $dest» would not have given truthful answers, which would likely have confused colleagues.

Even though we filter on the BGP sessions towards our transits, we still get all the routes in our RIB and can look them up explicitly if we need to (e.g., in JunOS: «show route hidden $dest»).

Tore
Re: Partial vs Full tables
* Saku Ytti

> On Fri, 5 Jun 2020 at 11:23, Tore Anderson wrote:
>
> > Sure you can, you just ask them. (We did.)
>
> And is it the same now? Some Ytti didn't 'fix' the config last night?
> Or NOS change which doesn't do conditional routes? Or they
> misunderstood their implementation and it doesn't actually work like
> they think it does. I personally always design my reliance to other
> people's clue to be as little as operationally feasible.

The way they answered the question showed that they had already considered this particular failure case and engineered their implementation accordingly. That is good enough for us.

Incorrect origination of a default route is, after all, just one of the essentially infinite ways our transit providers can screw up our services. Therefore it would make no sense to me to entrust the delivery of our business critical packets to a transit provider, yet at the same time not trust them to originate a default route reliably.

If we did not feel we could trust a transit provider, we would simply find another one. There are plenty to choose from.

Tore
Re: Partial vs Full tables
* Saku Ytti > On Fri, 5 Jun 2020 at 10:48, Tore Anderson wrote: > > > We started taking defaults from our transits and filtering most of the > > DFZ over three years ago. No regrets, it's one of the best decisions we > > ever made. Vastly reduced both convergence time and CapEx. > > Is this verbatim? I do not understand this question, sorry. > you cannot know how the operator originates default Sure you can, you just ask them. (We did.) Tore
Re: Partial vs Full tables
* James Breeden

> I come to NANOG to get feedback from others who may be doing this. We
> have 3 upstream transit providers and PNI and public peers in 2
> locations. It'd obviously be easy to transition to doing partial
> routes for just the peers, etc, but I'm not sure where to draw the
> line on the transit providers. I've thought of straight preferencing
> one over another. I've thought of using BGP filtering and community
> magic to basically allow Transit AS + 1 additional AS (Transit direct
> customer) as specific routes, with summarization to default for the
> rest. I'm sure there are other thoughts that I haven't had about this
> as well

We started taking defaults from our transits and filtering most of the DFZ over three years ago. No regrets, it's one of the best decisions we ever made. Vastly reduced both convergence time and CapEx.

Transit providers worth their salt typically include BGP communities you can use to selectively accept more-specific routes that you are interested in. You could, for example, accept routes learned by your transits from IX-es in your geographic vicinity.

Here's a PoC where we used communities to filter out all routes except those learned by our primary transit provider anywhere in Scandinavia, while using defaults for everything else: https://www.redpill-linpro.com/sysadvent/2016/12/09/slimming-routing-table.html

(Note that we went away from the RIB->FIB filtering approach described in the post, what we have in production is traditional filtering on the BGP sessions.)

Tore
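The import-policy logic described above can be sketched in plain Python (not router configuration). The community value 65000:1000, meaning «learned regionally», is invented for illustration; real transit providers document their own informational community values.

```python
# Sketch: keep community-tagged regional more-specifics, drop the
# rest of the DFZ, and rely on default routes for everything else.
# The community value 65000:1000 is hypothetical.
REGION_COMMUNITY = "65000:1000"
DEFAULTS = {"0.0.0.0/0", "::/0"}

def accept(prefix: str, communities: set[str]) -> bool:
    """Return True if the route should be kept."""
    if prefix in DEFAULTS:
        return True                         # always keep the defaults
    return REGION_COMMUNITY in communities  # keep regional specifics

routes = [
    ("0.0.0.0/0", set()),
    ("192.0.2.0/24", {"65000:1000"}),     # regional: kept
    ("198.51.100.0/24", {"65000:3000"}),  # elsewhere: dropped
]
kept = [p for p, c in routes if accept(p, c)]
print(kept)  # → ['0.0.0.0/0', '192.0.2.0/24']
```

On a real router the same decision would be an import policy matching on the provider's communities, with a catch-all reject.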
Re: [EXT] Re: rack rails
* Cummings, Chris

> Now that you say that, I think you're right. I am referring specifically to
> the EX4650 and they are the cheesy type where the rear half of the rail stays
> screwed in to the rack and the front half of the rail is attached to the
> switch. I assume it is the same on the QFX since they are very similar
> platforms. Basically they are that annoying type between rack ears and
> sliding rails where the device can separate completely from the rails.

Looking at the documentation (linked below), it would appear the EX4650 has the exact same rack-mount kit as the EX4500 and EX4600 do.

They all share the fundamental problem I'm talking about, namely that there are fixed mounting ears on the port side of the switch, which prevent removal through the cold aisle (assuming data centre/PSU-to-port airflow). The "sliders" are really just there to prevent the PSU end of the switch from sagging.

Tore

https://www.juniper.net/documentation/en_US/release-independent/junos/information-products/topic-collections/hardware/ex-series/ex4650/quick-start-ex4650.pdf (page 6)
Re: [EXT] Re: rack rails
* Chuck Anderson > The point is that the switches need to be removable without empty > space above/below, and ideally from the rear side of the rack. By > having extending/sliding rails, you can lift out or drop in the switch > after you slide it out. Then you can remove the rails. > > With fixed rails, you can't get the switch out without bending the ear > part of the rails when there are PDUs and other stuff in the way. Not necessarily. Even sliding rails must be constructed in a way that facilitates removal through the cold aisle side of the rack. That's not a given. One example of sliding rails that unfortunately do *not* allow for removal that way is the Edge-Core RKIT-100G-SLIDE: https://www.redpill-linpro.com/techblog/2020/01/17/new-routers.html (Ctrl+F Bonus) Tore
Re: rack rails
* Luke Guillory > I've had gear that came with a small rear support shelf that didn't had to > the height, RGB Networks BNPs for example. I'm pretty sure we've used these > with the BNPs one on top of the other. > > Page 16 in this PDF shows the shelf. > > http://www.konturm.ru/catalogy/df/bnp2xr_installation_guide_3.7.1_20160222.pdf Interesting, thanks! Such a shelf would do the trick if it is thin enough to fit in the tiny space between two devices mounted in adjacent rack units. Do you know if it is possible to buy this kind of shelf from somewhere (without an accompanying device)? Tore
Re: rack rails
* David Funderburk

> 2 - Do you know of any universal rail kits for 1U, 2U and 3U servers,
> routers, switches that work well? The brand names are nice but expensive.
> Thought I'd explore some cheaper options first. We use a lot of MikroTik, HP,
> Dell and some CISCO with a few other things here and there.

When it comes to network equipment meant for mounting in four-post data centre racks with PSU-to-port airflow, the included kits are usually anything but nice. The problem is that they typically only allow for insertion/removal through the rear of the rack (unlike servers, which are almost exclusively mounted through the front of the rack).

When a rack has been filled up, removal/insertion through the rear will often be essentially impossible due to cables, vertical PDUs and other stuff that gets in the way. Explained in pictures here: https://www.redpill-linpro.com/techblog/2019/08/06/rack-switch-removal.html

If someone knows of a generic rack mount kit for data centre switches that allows for insertion/removal through the front of the rack, i.e. from/to the cold aisle, I'd be very grateful. Best thing I've come up with so far is to use shelves, but that doubles the number of rack units I need to use (1U switch sitting on top of a 1U shelf...)

Tore
Re: Dual Homed BGP
* Baldur Norddahl

> If you join any peering exchanges, full tables will be mandatory. Some
> parties will export prefixes and then expect a more specific prefix received
> from your transit to override a part of the space received via the peering.

That would be a fundamentally flawed expectation, in my opinion. An AS that advertises a prefix to its peers must be prepared to carry traffic to that entire prefix via that peering circuit. There is simply no guarantee that a more-specific prefix advertised somewhere else will make it into the RIBs and FIBs of all the peers of that AS.

The AS might of course opt to do so anyway for traffic engineering purposes, but there is no assurance that it will actually work 100% of the time. When it doesn't, the AS in question would need to carry the traffic from the peering circuit across their own backbone. If the AS in question for some reason cannot do so, it would need to adjust its advertisements across the peering circuit so as to avoid falsely advertising reachability to unreachable destinations.

Tore
Re: FYI - Suspension of Cogent access to ARIN Whois
* David Guo via NANOG > Good News! But we still received several spams from Cogent for our RIPE and > APNIC ASNs. If you are an EU/EEA citizen, you may object to their use of your personal information for marketing purposes (or for any purpose at all), as well as request erasure. (Note: these rights do not extend to impersonal role addresses like n...@example.com or hostmas...@example.com.) According to https://www.cogentco.com/en/cogent-gdpr/data-privacy, this should be done by sending e-mail to datapriv...@cogentco.com. There is no circumstance in which a company can legally refuse an objection to processing of personal information for marketing purposes. Therefore, should they refuse (or claim compliance but continue to spam you), you have standing to file a complaint with your national data protection agency. A DPA is competent to levy fines for violations of the GDPR of up to €20M or 4% of annual global revenue, so there is a certain incentive to respect such objections. (It might be that citizens of California have similar rights under the CCPA, which came into force last week.) Tore
Re: ECN
* Saku Ytti

> Not true. Hash result should indicate discreet flow, more importantly
> discreet flow should not result into two unique hash numbers. Using
> whole TOS byte breaks this promise and thus breaks ECMP.
>
> Platforms allow you to configure which bytes are part of hash
> calculation, whole TOS byte should not be used as discreet flow SHOULD
> have unique ECN bits during congestion. Toke has diagnosed the problem
> correctly, solution is to remove TOS from ECMP hash calculation.

Agreed. This goes for the other bits too, so the whole byte must be excluded. For example, the OpenSSH client will by default change the code point from zero (during authentication) to af21 or cs1 (when it enters an interactive or non-interactive session, respectively).

I have experienced this breaking IPv6 SSH sessions to an anycasted SSH server instance that was reached through old Juniper DPC cards with ECMP enabled. The symptom was that authentication went fine, only for the connection to be reset immediately afterwards (unless the default IPQoS config was changed).

The «solution» was to simply disable ECMP for all IPv6 traffic, since I could not figure out how to make the Juniper exclude the DiffServ byte from the ECMP hash calculation.

Tore
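A toy illustration of the failure mode being discussed: if the ECMP hash covers the ToS byte, a flow whose ECN codepoint flips mid-connection (e.g. from ECT(0) to CE under congestion) can hash onto a different path. The CRC-based hash and 4-way ECMP here are stand-ins for whatever a real ASIC does.

```python
# Illustrative ECMP hash; real hardware uses vendor-specific hashes.
import zlib

def ecmp_hash(src, dst, proto, sport, dport, tos=None):
    key = f"{src}|{dst}|{proto}|{sport}|{dport}"
    if tos is not None:          # the broken variant includes the ToS byte
        key += f"|{tos}"
    return zlib.crc32(key.encode()) % 4  # pick one of 4 equal-cost paths

flow = ("2001:db8::1", "2001:db8::2", 6, 40000, 22)

# Same 5-tuple, ECN codepoint changes from ECT(0) (0b10) to CE (0b11):
broken = {ecmp_hash(*flow, tos=0b10), ecmp_hash(*flow, tos=0b11)}
fixed = {ecmp_hash(*flow), ecmp_hash(*flow)}

print(len(fixed))   # → 1: excluding ToS keeps the flow on one path
print(len(broken))  # may be 2: including ToS can split the flow
```

The same split happens when the DSCP bits change mid-flow, as in the OpenSSH example above.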
Re: Couple of questions about "baremetal/ONIE" networking equipment sellers
* Nick ten Cate

> We also have lots of experience with FS.com switches; however.. One thing we
> noticed really quick is that its better to order 1 and to find the actual
> supplier and order with them directly. FS.com is a reseller; and they will
> switch (no pun intended) supplier almost yearly. Real technical support is
> nonexistent (even though they claim it is great) and I have yet to have a
> single bug fixed; packet dumps and steps to reproduce included. I have
> removed all of our *N*5850-48S6Q due to bugs in software lockups.

Hi Nick,

FS.com did indeed replace their N5850-48S6Q supplier a while back. It is rather idiotic of them not to change their SKU when they do so. Anyway, before it was manufactured by Celestica, I think; now it is the Edge-Core AS5812-54X. The latter is very well supported by Cumulus, the former is not.

You can see it is the Edge-Core by comparing the pictures:

https://www.fs.com/de-en/products/69226.html
https://www.edge-core.com/productsInfo.php?cls=1=8=59=119

We bought a few of them. I did mail our AM before placing the order to ascertain that they would indeed deliver the AS5812-54X and to make it crystal clear that no other model would be accepted. No problem. They will also sell other Edge-Core models that are not (yet) in their website catalogue if you ask (we ordered a few AS7326-56Xes).

I do not believe Edge-Core will sell direct to end-users, so resellers like FS, Cumulus Networks or HPE are your best bet if you want those.

Tore
Re: BGP router question
* Art Stephens > Hope this is not too off topic but can any one advise if a Dell S4048-ON can > support full ebgp routes? As others have mentioned, you won't be able to program them all in the forwarding plane, but the control plane can receive them all just fine (it has more than enough RAM). If your use case allows for accepting a default route from your IP transit providers along with the full feed, you can easily implement control plane policies that ensure that what gets installed to the forwarding plane is only the routes to the destinations you care the most about + the default route to cover the long tail of traffic to the rest of the world. You can use the S4048-ON (or any equivalent layer-3 capable data centre switch) as a border router this way, at a fraction of what a big C or J router would cost you. We started doing this a few years back and we're not regretting it. https://labs.spotify.com/2016/01/27/sdn-internet-router-part-2/ https://www.redpill-linpro.com/sysadvent/2016/12/09/slimming-routing-table.html Tore
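The selective FIB-install idea described above can be sketched as follows; the prefixes and traffic shares are invented for illustration, and a real implementation would do this via routing policy rather than a script.

```python
# Sketch: program specifics only for the heaviest destinations and
# let a default route cover the long tail. Numbers are invented.
FIB_LIMIT = 2  # specifics we can afford to program besides the default

traffic = {                 # % of egress traffic per prefix
    "192.0.2.0/24": 40.0,
    "198.51.100.0/24": 35.0,
    "203.0.113.0/24": 15.0,
    "100.64.0.0/24": 10.0,
}
top = sorted(traffic, key=traffic.get, reverse=True)[:FIB_LIMIT]
fib = ["0.0.0.0/0"] + top   # the default covers everything else
print(fib)       # → ['0.0.0.0/0', '192.0.2.0/24', '198.51.100.0/24']
covered = sum(traffic[p] for p in top)
print(covered)   # → 75.0 (% of traffic following a specific route)
```

Even a small number of well-chosen specifics typically covers the bulk of the traffic; everything else takes the default towards transit.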
Re: Gi Firewall for mobile subscribers
* Mark Milhollan > On Thu, 11 Apr 2019, Tore Anderson wrote: > >> We've been wanting to replace our all of our ad-hoc OOB links with a >> standardised setup based on LTE connectivity to an embedded >> login/console server at each PoP. IPv6 would be perfect due to no >> CGNAT and infinitesimal levels of background scanning. >> >> Unfortunately Telenor has decided to deploy a central firewall that >> drops all inbound connections, making their service totally unusable >> for our use case. I guess they don't want our money. > > Sounds like the console server will need to "phone home". That a workaround > might be possible doesn't make a firewall which the user cannot control to > some degree less annoying. Though it might be that Telenor just needs to be > notified/reminded that power users and business customers exist. Phoning home is not an option here, as the whole point is to have an OOB backdoor that works even if «home» is totally FUBAR. For that reason it needs to be completely independent of the production network. Standard Internet connections are perfect, IFF they are bi-directional. Tore
Re: Gi Firewall for mobile subscribers
* Owen DeLong

> What would be the process for a subscriber who wishes to allow inbound
> connections?
>
> If you are simply saying that as a customer of your ISP you simply can’t
> allow inbound IPv6 connections at all, then you are becoming a very poor
> substitute for an ISP IMHO.

I have to agree with this.

We've been wanting to replace all of our ad-hoc OOB links with a standardised setup based on LTE connectivity to an embedded login/console server at each PoP. IPv6 would be perfect due to no CGNAT and infinitesimal levels of background scanning.

Unfortunately Telenor has decided to deploy a central firewall that drops all inbound connections, making their service totally unusable for our use case. I guess they don't want our money.

Maybe with EU RLAH I could simply find another more suitable provider abroad. Maybe I'd even get vPLMN redundancy that way. Hmm...

Tore
Re: ICMPv6 "too-big" packets ignored (filtered ?) by Cloudflare farms
* Jean-Daniel Pauget

> I confess using IPv6 behind a 6in4 tunnel because the "Business-Class"
> service
> of the concerned operator doesn't handle IPv6 yet.
>
> as such, I realised that, as far as I can figure, ICMPv6 packet "too-big"
> (rfc 4443)
> seem to be ignored or filtered at ~60% of ClouFlare's http farms
>
> as a result, random sites such as http://nanog.org/ or
> https://www.ansible.com/
> are badly reachable whenever small mtu are involved ...

Hi Jean-Daniel.

If you're using tunnels, you'll want to have your tunnel endpoint adjust down the TCP MSS value to match the MTU of the tunnel interface. That way, you'll avoid problems with Path MTU Discovery.

Even in those situations where PMTUD does work fine, doing TCP MSS adjustment will improve performance, as the server does not need to spend an RTT to discover your reduced MTU. (This isn't really an IPv6 issue, by the way - ISPs using PPPoE will typically perform MSS adjustment for IPv4 packets too.)

If you're using Linux as your tunnel endpoint, try:

ip6tables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

Tore
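For reference, the clamped MSS is just the tunnel MTU minus the fixed IPv6 and TCP header sizes (ignoring TCP options, which shave off a bit more). A 6in4 tunnel over a 1500-byte link typically has a 1480-byte MTU:

```python
# Back-of-the-envelope MSS calculation for an IPv6-over-tunnel path.
IPV6_HEADER = 40  # fixed IPv6 header, bytes
TCP_HEADER = 20   # TCP header without options, bytes

def clamped_mss(tunnel_mtu: int) -> int:
    """Largest TCP segment that fits in one packet on the tunnel."""
    return tunnel_mtu - IPV6_HEADER - TCP_HEADER

print(clamped_mss(1480))  # → 1420
```

This is the value `--clamp-mss-to-pmtu` effectively writes into the SYN when the route's MTU is 1480.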
Re: BGP Experiment
* Job Snijders > Given the severity of the bug, there is a strong incentive for people to > upgrade ASAP. The buggy code path can also be disabled without upgrading, by building FRR with the --disable-bgp-vnc configure option, as I understand it. I've been told that this is the default in Cumulus Linux. Tore
Re: Most peered AS per country
* Mehmet Akcin

> I am noticing provider A enters market X saying they are tier 1 network but
> they do not have a single peering session in country and they backhaul
> everything back to market Z where they deliver traffic to the peer via high
> latency and low performance method. This is causing market to receive pricing
> targets which are unrealistic and hurting telecoms who are genuinely trying
> to do right thing and establish in country direct peering with peers.

Yeah, don't fall for the marketing hyperbole. A transit provider's «tier» is an extremely poor indicator of its interconnectedness and quality, especially if your traffic is regional in nature.

In most cases you'll be much better off buying your IP transit from a regional «tier-2» provider, which tends to give you much better connectivity to other networks in your region - in addition to all the global connectivity that the «tier-2»'s upstream(s) provide, of course.

Tore
Re: China ’s Maxim – Leave No Access Point Unexploited: The Hidden Story of China Telecom’ s BGP Hijacking
* Harley H

> Curious to hear others' thoughts on this.
> https://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=1050=mca
>
> This paper presents the view that several BGP hijacks performed by China
> Telecom had malicious intent. The incidents are:
> * Canada to Korea - 2016
> * US to Italy - Oct 2016
> * Scandinavia to Japan - April-May 2017
> * Italy to Thailand - April-July 2017
>
> The authors claim this is enabled by China Telecom's presence in North
> America.

Hi,

I looked a bit into the Scandinavia to Japan claim last week for a Norwegian journalist, who obviously found this rather sensational claim very intriguing. The article (Norwegian, but Google Translate does a decent job) is found at https://www.digi.no/artikler/internettrafikk-fra-norge-og-sverige-ble-kapret-og-omdirigert-til-kina/449797?key=vS1EOiG1 in case you're interested.

From what I can tell from looking at routeviews data from the period, what happened was that SK Broadband (AS9318) was leaking a bunch of routes to China Telecom (AS4134). The leak included the transit routes from SKB's upstream Verizon (AS703) and customers of theirs in turn, including well-known organisations such as Bloomberg (AS10361) and Time Warner (AS36032), which I suppose might be the ones the paper is referring to.

The routes in question then propagated from CT to Telia Carrier (AS1299), probably in North America somewhere. Scandinavia is TC's home turf, so it makes sense that the detour via CT was easily observed from here.

If you want to see for yourself, look for «1299 4134 9318 703» in http://archive.routeviews.org/route-views.linx/bgpdata/2017.04/RIBS/rib.20170430.2200.bz2

Anyway, in my opinion the data for this particular incident (I haven't looked into the other three) does not indicate foul play on CT's behalf, but rather a pretty standard leak by SKB followed by sloppy filtering by CT and TC both.

Tore
Re: Cloudflare 1.1.1.1 public DNS different as path info for 1.0.0.1 and 1.1.1.1 london
* Marty Strong via NANOG

> Routing from ~150 locations, plenty of redundancy.

Any plans to support NSID and/or "hostname.bind" to allow clients to identify which node is serving their requests? For example:

$ dig @nsb.dnsnode.net. hostname.bind. CH TXT +nsid
[...]
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; NSID: 73 34 2e 6f 73 6c ("s4.osl")

;; QUESTION SECTION:
;hostname.bind.			CH	TXT

;; ANSWER SECTION:
hostname.bind.		0	CH	TXT	"s4.osl"
[...]

Tore
Re: Xbox Live and Teredo
* Martin List-Petersen

> Your best bet: set up a Terredo gateway and facilitate these Xboxes as
> long as you don't give them native IPv6.

This is unlikely to help, as the XB1 doesn't use Teredo relays at all.

The XB1 uses Teredo to facilitate direct p2p communication between IPv4 consoles only. Essentially it is used as an IPv4 NAT traversal mechanism. Its Teredo implementation does not allow communication between IPv4 and IPv6 peers. This is the only communication pattern which would normally require a third-party Teredo relay.

This unfortunately means that provisioning IPv6 is also unlikely to help, unless you're in a position to provision it to both peers. See: https://www.ietf.org/proceedings/88/slides/slides-88-v6ops-0.pdf

Personally I'd start out by verifying the connectivity to and functionality of Microsoft's Teredo servers, which are used for NAT address discovery and port mapping during tunnel setup (unlike Teredo relays, Teredo servers aren't part of the Teredo «forwarding plane»).

Tore
Re: BGP peering question
* craig washington > Newbie question, what criteria do you look for when you decide that > you want to peer with someone or if you will accept peering with > someone from an ISP point of view. Routing hygiene. I expect the would-be peer to keep the number of advertised routes that are either 1) not registered in RIPE/RADB, 2) disaggregated, or 3) redundant (i.e., more-specifics of larger advertisements) to an absolute minimum. Tore
Re: difference with caching when connected to telia internet
Hi Aaron, > What happened was, when I turned up my new 10 gig Telia Internet > connection a few days ago, I needed to balance out my (4) 10 gig > internet connections so I chopped up a /17 into (4) /19's. When I > did this, I was still advertising the /17 to my local caches, but I > was advertising the (4) /19's , one on each of my (4) 10 gig internet > connections. So the caches out on the public internet were learning > more specific prefixes (longer masks) then my local caches were > learning... so the caches on the internet were being used instead of > my local caches. Once google and Netflix tech support helped to make > me aware of this, I correctly sent the additional (4) /19's to my > caches and now all is well. Please instead advertise the /17 on all your Telia uplinks. You should *additionally* advertise the four /19s to the different links, but make sure to tag them with the NO_EXPORT community so they don't propagate outside Telia. That way you get the traffic engineering you want (i.e., load balancing of ingress traffic from Telia), while at the same time avoiding coming across as a self-serving jerk to everyone else on the Internet by not polluting their routing tables/FIBs with four entirely superfluous /19s. Tore
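For the record, a /17 carves into exactly four /19s (prefix length +2), which is easy to sanity-check with Python's ipaddress module. The prefix below is from the RFC 2544 benchmarking range, standing in for Aaron's real aggregate; the well-known NO_EXPORT community, should you need to set it numerically, is 65535:65281.

```python
# Split an aggregate /17 into the four /19s to be advertised (tagged
# NO_EXPORT) on the individual uplinks. Prefix is a stand-in.
import ipaddress

aggregate = ipaddress.ip_network("198.18.0.0/17")
subnets = [str(s) for s in aggregate.subnets(prefixlen_diff=2)]
print(subnets)
# → ['198.18.0.0/19', '198.18.32.0/19', '198.18.64.0/19', '198.18.96.0/19']
```

The /17 itself goes out everywhere untagged, so the rest of the Internet only ever sees one route.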
Re: External BGP Controller for L3 Switch BGP routing
* Saku Ytti

> On Fri, 5 Jun 2020 at 11:23, Tore Anderson wrote:

> Imagine content network having 40Gbps connection, and client having
> 10Gbps connection, and network between them is lossless and has RTT of
> 200ms. To achieve 10Gbps rate receiver needs 10Gbps*200ms = 250MB
> window, in worst case 125MB window could grow into 250MB window, and
> sender could send the 125MB at 40Gbps burst.
> This means the port receiver is attached to, needs to store the 125MB,
> as it's only serialising it at 10Gbps. If it cannot store it, window
> will shrink and receiver cannot get 10Gbps.
>
> This is quite pathological example, but you can try with much less
> pathological numbers, remembering TridentII has 12MB of buffers.

I totally get why the receiver needs bigger buffers if he's going to shuffle that data out another interface with a slower speed. But when you're a data centre operator you're (usually anyway) mostly transmitting data. And you can easily ensure the interface speed facing the servers can be the same as the interface speed facing the ISP.

So if you consider this typical spine/leaf data centre network topology (essentially the same one I posted earlier this morning):

(Server) --10GE--> (T2 leaf X) --40GE--> (T2 spine) --40GE--> (T2 leaf Y) --10GE--> (IP-transit/"the Internet") --10GE--> (Client)

If I understand you correctly you're saying this is a "suspect" topology that cannot achieve 10G transmission rate from server to client (or from client to server for that matter) because of small buffers on my "T2 leaf Y" switch (i.e., the one which has the Internet-facing interface)?
If so would it solve the problem just replacing "T2 leaf Y" with, say, a Juniper MX or something else with deeper buffers? Or would it help to use (4x)10GE instead of 40GE for the links between the leaf and spine layers too, so there was no change in interface speeds along the path through the data centre towards the handoff to the IPT provider? Tore
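Saku's window arithmetic above is easy to verify. A quick sketch (the 250 MB, 125 MB and 12 MB figures are the ones from his example; `bdp_bytes` is just my helper name):

```python
# Bandwidth-delay product: the window needed to keep a path "full".
# 10 Gb/s at 200 ms RTT needs a 250 MB window; a half-grown 125 MB
# window can double and be burst out at 40 Gb/s, and the 10GE egress
# port has to absorb whatever it cannot serialise in time.

def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Window size in bytes required to sustain rate_bps over rtt_s."""
    return rate_bps * rtt_s / 8

window = bdp_bytes(10e9, 0.200)   # 250 MB, as in Saku's example
burst = window / 2                # the half-grown 125 MB window
trident2 = 12e6                   # ~12 MB of on-chip buffer, per the thread

print(window / 1e6, burst / 1e6)  # 250.0 125.0
print(burst > trident2)           # True: far more than the leaf can buffer
```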
Re: External BGP Controller for L3 Switch BGP routing
* Saku Ytti > Why I said it won't be a problem inside DC, is because low RTT, which > means small bursts. I'm talking about backend network infra in DC, not > Internet facing. Anywhere where you'll see large RTT and > speed/availability step-down you'll need buffers (unless we change TCP > to pace window-growth, unlike burst what it does now, AFAIK, you could > already configure your Linux server to do pacing at estimate BW, but > then you'd lose in congested links, as more aggressive TCP stack would > beat you to oblivion). But here you're talking about the RTT of each individual link, right, not the RTT of the entire path through the Internet for any given flow? Put it another way, my «Internet facing» interfaces are typically 10GEs with a few (kilo)metres of dark fibre that x-connects into my IP-transit providers' routers sitting in nearby rooms or racks (worst case somewhere else in the same metro area). Is there any reason why I should need deep buffers on those interfaces? The IP-transit providers might need the deep buffers somewhere in their networks, sure. But if so I'm thinking that's a problem I'm paying them to not have to worry about. BTW, in my experience the buffering and tail-dropping is actually a bigger problem inside the data centre because of distributed applications causing incast. So we get workarounds like DCTCP and BBR, which are apparently cheaper than using deep-buffer switches everywhere. Tore
Re: External BGP Controller for L3 Switch BGP routing
Hi Saku, > > https://www.redpill-linpro.com/sysadvent/2016/12/09/slimming-routing-table.html > > --- > As described in a previous post, we’re testing a HPE Altoline 6920 in > our lab. The Altoline 6920 is, like other switches based on the > Broadcom Trident II chipset, able to handle up to 720 Gbps of > throughput, packing 48x10GbE + 6x40GbE ports in a compact 1RU chassis. > Its price is in all likelihood a single-digit percentage of the price > of a traditional Internet router with a comparable throughput rating. > --- > > This makes it sound like small-FIB router is single-digit percentage > cost of full-FIB. Do you know of any traditional «Internet scale» router that can do ~720 Gbps of throughput for less than 10x the price of a Trident II box? Or even <100kUSD? (Disregarding any volume discounts.) > Also having Trident in Internet facing interface may be suspect, > especially if you need to go from fast interface to slow or busy > interface, due to very minor packet buffers. This obviously won't be > much of a problem in inside-DC traffic. Quite the opposite, changing between different interface speeds happens very commonly inside the data centre (and most of the time it's done by shallow-buffered switches using Trident II or similar chips). One ubiquitous configuration has the servers and any external uplinks attached with 10GE to leaf switches, which in turn connect to a 40GE spine layer. In this config server<->server and server<->Internet packets will need to change speed twice: [server]-10GE-(leafX)-40GE-(spine)-40GE-(leafY)-10GE-[server/internet] I suppose you could for example use a couple of MX240s or something as a special-purpose leaf layer for external connectivity. MPC5E-40G10G-IRB or something towards the 40GE spines and any regular 10GE MPC towards the exits. That way you'd only have one shallow-buffered speed conversion remaining. But I'm very sceptical that something like this makes sense after taking the cost/benefit ratio into account. Tore
Re: Advertising rented IPv4 prefix from a different ASN.
* Mark Tinka > On 5/Aug/16 15:40, Soon Keat Neo wrote: > > > If you are just announcing more specific address space that you've > > obtained legitimately off their assigned address space, it should > > be no problem, just obtain an LoA and register it on the different > > databases and you should be set to ask your upstreams to allow the > > announcements. > > Do people actually do this? Just as an example: There are hundreds of more-specifics coming out of 8/8 that have a different origin AS than 8/8 itself, so yes, people do. Tore
Re: MTU
* Baldur Norddahl > I did not say we were doing internet peering... Uhm. When you say that you peer with another ISP (and keep in mind what the "I" in ISP stands for), while giving no further details, then folks are going to assume that you're talking about a standard eBGP peering with inet/inet6 unicast NLRIs. > In case you are wondering, we are actually running L2VPN tunnels over > MPLS. Okay. Well, I see no reason why using GRE tunnels for this purpose shouldn't work, it does for us (using mostly VPLS and Martini tunnels). That said, I've never tried extending our MPLS backbone outside of our own administrative domain or autonomous system. That sounds like a really scary prospect to me, but I'll admit I've never given serious consideration to such an arrangement before. Hopefully you know what you're doing. Tore
Re: MTU
* Baldur Norddahl > What is best practice regarding choosing MTU on transit links? > > Until now we have used the default of 1500 bytes. I now have a > project where we peer directly with another small ISP. However we need > a backup so we figured a GRE tunnel on a common IP transit carrier > would work. We want to avoid the troubles you get by having an > effective MTU smaller than 1500 inside the tunnel, so the IP transit > carrier agreed to configure a MTU of 9216. Your use case as described above puzzles me. You should already see your peer's routes being advertised to you via the transit provider and vice versa. If your direct peering fails, the traffic should start flowing via the transit provider automatically. So unless there's something else going on here that you're not telling us, there should be no need for the GRE tunnel. That said, it should work, as long as the MTU is increased at both ends and the transit network guarantees it will transport the jumbos. We're doing something similar, actually. We have multiple sites connected with either dark fibre or DWDM, but not always in a redundant fashion. So instead we run GRE tunnels through transit (with increased MTU) between selected sites to achieve full redundancy. This has worked perfectly so far. It's only used for our intra-AS IP/MPLS traffic though, not for eBGP like you're considering. > Obviously I only need to increase my MTU by the size of the GRE > header. But I am thinking is there any reason not to go all in and > ask every peer to go to whatever max MTU they can support? My own > equipment will do MTU of 9600 bytes. I'd say it's not worth the trouble unless you know you're going to use it for anything. If I was your peer I'd certainly need you to give me a good reason why I should deviate from my standard templates first... > On the other hand, none of my customers will see any actual difference > because they are end users with CPE equipment that expects a 1500 > byte MTU.
Trying to deliver jumbo frames to the end users is probably > going to end badly. Depends on the end user, I guess. Residential? Agreed. Business? Who knows - maybe they would like to run fat GRE tunnels through your network? In any case: 1500 by default, other values only by request. Tore
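For what it's worth, the arithmetic behind "increase my MTU by the size of the GRE header" is trivial to sketch, assuming plain GRE over IPv4 with none of the optional header fields enabled:

```python
# Plain GRE over IPv4 adds 24 bytes: a 20-byte outer IPv4 header plus
# the 4-byte base GRE header (the key/checksum/sequence options, if
# enabled, each add more). To carry untruncated 1500-byte packets
# inside the tunnel, the carrier path must deliver at least 1524 bytes.

OUTER_IPV4 = 20   # outer delivery header
GRE_BASE = 4      # GRE header without optional fields

def required_carrier_mtu(inner_mtu: int) -> int:
    return inner_mtu + OUTER_IPV4 + GRE_BASE

print(required_carrier_mtu(1500))   # 1524, comfortably within the 9216 offered
```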
Re: IPv6 Deployment for Mobile Subscribers
* Baldur Norddahl > Den 22. jul. 2016 20.25 skrev "Ca By": > > > Phones, as in 3gpp? If so, each phone alway gets a /64, there is > > no choice. > > > > https://tools.ietf.org/html/rfc6459 > > Here the cell companies are marketing their 4G LTE as an alternative > to DSL, Coax and fiber for internet access in your home with a 4G > wifi router. If they can not do prefix delegation it is no > alternative! Actually, that /64 prefix is delegated, after a fashion. RFC 7278. That said, according to RFC 6459 section 5.3, full DHCPv6-PD support was specified in 3GPP Rel-10. Not sure if there are production deployments of that yet though, and if not how far off they are. But at least it looks like it's coming. Tore
Re: IPv6 deployment excuses
* Mark Tinka > What I was trying to get to is that, yes, running a single-stack is > cheaper (depending on what "cheaper" means to you) than running > dual-stack. Wholeheartedly agreed. > That said, running IPv4-only means you put yourself at a disadvantage > as IPv6 is now where the world is going. Also wholeheartedly agreed. > Similarly, running IPv6-only means you still need to support access to > the IPv4-only Internet anyway, if you want to have paying customers or > happy users. > > So the bottom line is that for better or worse, any progressive > network in 2016 is going to have to run dual-stack in some form or > other for the foreseeable future. So the argument on whether it is > cheaper or more costly to run single- or dual-stack does not change > that fact if you are interested in remaining a going concern. My point is that as a content provider, I only need a dual-stacked façade. That can easily be achieved using, e.g., protocol translation at the outer border of my network. The inside of my network, where 99.99% of all the complexity, devices, applications and so on reside, can be single stack IPv6-only today. Thus I get all the benefits of running a single stack network, minus some fraction of a percent needed to operate the translation system. (I could in theory get rid of that too by outsourcing it somewhere.) Tore
Re: IPv6 deployment excuses
* Mark Tinka > I understand your points - to your comment, my question is around > whether it is cheaper (for you) to just run IPv6 in lieu of IPv6 and > IPv4. We've found that it is. IPv6-only greatly reduces complexity compared to dual stack. This means higher reliability, lower OpEx, shorter recovery time when something does go wrong anyway, fewer SLA violations, happier customers, and so on - the list goes on and on. Single stack is essentially the KISS option. It also means that we'll essentially never have to perform IPv4 renumbering exercises in order to accommodate growth. Those tend to be very costly due to the man-hours required for planning and implementation. Besides, it means we don't need IPv4 to number customer infrastructure. As you probably know, IPv4 numbers have a real cost these days. My point of view is ASP/MSP/data centre stuff. I know I'm not alone in going down the IPv6 road here, though. Facebook is another prominent example. Other operators in different market segments are also doing IPv6-only. Kabel Deutschland and T-Mobile US, for example. I'm guessing they have similar motivations. Tore
Re: Netflix VPN detection - actual engineer needed
* Davide Davini> On 04/06/2016 20:46, Owen DeLong wrote: > > Get your own /48 and advertise to HE Tunnel via BGP. Problem > > solved. > > Even though that sounds like an awesome idea it does not seem trivial > to me to obtain your own /48. Which is a good thing, as every new PI /48 advertised to the DFZ will bloat the routing tables of thousands upon thousands of routers world wide. It might solve the Netflix problem, but what has actually happened is that you've split the original problem into a thousand small bits and thrown one piece into each of your neighbours' gardens. I'd encourage everyone to try to fix their Netflix problem a more proper way before deciding to litter everyone else's routing tables with another PI prefix. Blocking access to Netflix via the tunnel seems like an obvious solution to me, for what it's worth. I wonder if anyone has attempted to estimate approx. how much RIB/FIB space a single DFZ route requires in total across the entire internet... Tore
Re: Netflix VPN detection - actual engineer needed
* Spencer Ryan > As an addendum to this and what someone said earlier about the > tunnels not being anonymous: From Netflix's perspective they are. Yes > HE knows who controls which tunnel, but if Netflix went to HE and > said "Tell me what user has x/48" HE would say "No". Thus, making > them an effective anonymous VPN service from Netflix's perspective. Every ISP would say «No» to that question. In sane jurisdictions only law enforcement has any chance of getting that answer (hopefully only if they have a valid mandate from some kind of court). But Netflix shouldn't have any need to ask in the first place. Their customers need to log in to their own personal accounts in order to access any content; when they do, Netflix can discover their addresses. Tore
Re: Public DNS64
* Mark Andrews > In message <20160601103707.7de9d...@envy.e5.y.home>, Tore Anderson writes: > > Or you could simply accept that active sessions are torn down > > whenever the routing topology changes enough to flip traffic to the > > anycast prefix to another NAT64 instance in a different region. > > > > It would be no different from any other anycasted service. > > But some services are inherently short lived. NAT64 has no such > property. Well, yes - it depends on the service/application, right? That is, anycasted_${service} will work pretty much the same as ${service}_via_anycasted_nat64 for most values of ${service}. Assuming that: 1) most of your customer's sessions are short-lived and/or their applications can handle failures reasonably gracefully, and/or 2) you have a stable and well-designed network where you can be reasonably certain that the traffic from clients in city/region/country X is going to consistently be routed to the NAT64 instance in city/region/country X: ...you will have very little to gain by setting up some complicated NAT64 session replication scheme to city/region/country Y, Z, and so on. KISS: Just use different IPv4 source address pools in each location and accept that any long-lived sessions are interrupted when your routing turns really wonky once in a blue moon. If on the other hand you cannot under any circumstance accept disruption to existing sessions, you probably don't want to be using any form of NAT in the first place. It's not like anycast routes flipping is the only reason why sessions through a NAT can be disrupted. In that case, native IPv6 is probably better, or possibly MAP if you have no control over the (presumably IPv4-only) remote ends of those sessions. Tore
Re: Public DNS64
* Baldur Norddahl > It goes to the USA and back again. They would need NAT64 servers in > every region and then let the DNS64 service decide which one is close > to you by encoding the region information in the returned IPv6 > address. Such as 2001:470:64:[region number]::/96. > > An anycast solution would need a distributed NAT64 implementation, > such that the NAT64 servers could somehow synchronize state. Or you could simply accept that active sessions are torn down whenever the routing topology changes enough to flip traffic to the anycast prefix to another NAT64 instance in a different region. It would be no different from any other anycasted service. Tore
Re: IPV6 planning
* Saku Ytti > Yes, SLAAC, 4862 clearly does not forbid it, and there is no > technical reason. But as you state, 2464 does not specify other > behaviour. Writing new draft which specifies behaviour for arbitrary > size wouldn't be a challenge, marketing it might be. FYI: RFC 7421 is an in-depth discussion of the fixed 64-bit boundary. Tore
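For reference, the fixed /64 boundary is what makes the RFC 2464 Modified EUI-64 derivation work: the 48-bit MAC is split, 0xfffe is wedged in the middle, and the universal/local bit is flipped. A sketch (function name and example MAC are mine):

```python
# Modified EUI-64: derive the 64-bit SLAAC interface identifier from a
# 48-bit MAC address (RFC 2464 / RFC 4291 Appendix A): insert ff:fe
# between the OUI and the NIC-specific bytes, and flip bit 0x02 (the
# universal/local bit) of the first byte.

def eui64_iid(mac: str) -> str:
    b = bytes(int(octet, 16) for octet in mac.split(":"))
    iid = bytes([b[0] ^ 0x02]) + b[1:3] + b"\xff\xfe" + b[3:6]
    # Render as the four 16-bit groups of the lower half of the address.
    return ":".join(f"{(iid[i] << 8) | iid[i + 1]:x}" for i in range(0, 8, 2))

print(eui64_iid("00:25:90:aa:bb:cc"))   # 225:90ff:feaa:bbcc
```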
Re: The IPv6 Travesty that is Cogent's refusal to peer Hurricane Electric - and how to solve it
William, > Don't get me wrong. You can cure this fraud without going to extremes. > An open peering policy doesn't require you to buy hardware for the > other guy's convenience. Let him reimburse you or procure the hardware > you spec out if he wants to peer. Nor do you have to extend your > network to a location convenient for the other guy. Pick neutral > locations where you're willing to peer and let the other guy build to > them or pay you to build from there to him. Nor does an open peering > policy require you to give the other guy a free ride on your > international backbone: you can swap packets for just the regions of > your network in which he's willing to establish a connection. But not > ratios and traffic minimums -- those are not egalitarian, they're > designed only to exclude the powerless. > > Taken in this context, the Cogent/HE IPv6 peering spat is very simple: > Cogent is -the- bad actor. 100%. I'm curious: How do you know that Cogent didn't offer to peer under terms such as the ones you mention, but that those were refused by HE? Tore
Re: The IPv6 Travesty that is Cogent's refusal to peer Hurricane Electric - and how to solve it
* Ca By> Selling a service that is considered internet but does not deliver > full internet access is generally considered properly bad. > > I would not do business with either company, since neither of them > provide a full view. +1 Both networks are in a position to easily remedy the situation if they were pragmatically inclined. For example, Cogent could simply accept HE's offer to peer; HE could simply pick up Cogent's IPv6 routes from their existing transit provider TSIC. Instead they both choose to continue their game of chicken to the detriment of both of their customer bases. Fortunately there's no shortage of competitors to HE and Cogent who prioritise providing connectivity higher than engaging in such nonsense. Vote with your wallets, folks. Tore
Re: Another Big day for IPv6 - 10% native penetration
* Sander Steffann > > We just need Google to announce that IPv6 enabled sites will get a > > slight bonus in search rankings. And just like that, there will > > suddenly be a business reason to implement IPv6. > > I already discussed that with them a long time ago, but they weren't > convinced. Maybe now is the time to discuss it again :) I've mentioned this in other forums before, but I might as well repeat it here too: I can understand that Google (or Netflix for that matter) are reluctant to engage in pure IPv6 activism by providing different or improved content to users who have no IPv6 connectivity. However, maybe they'd be more open to the idea if it was limited to IPv6 clients only? That is, IFF the Google user submitting the search is doing it using IPv6, then consider the result entries' IPv6 availability when sorting the result set. My reasoning is that there would be an objective technical reason for doing it. The client is demonstrably capable of using IPv6 and prefers to do so, and as it has been shown that IPv6 performs better than IPv4 (see e.g. https://youtu.be/_7rcAIbvzVY), giving priority to IPv6-enabled results seems a logical thing to do. Much in the same way that it makes sense to rank mobile-optimised sites high in result sets returned to mobile clients. I'd imagine that the promise of improved Google ratings for 10%/25% of global/U.S. users will still be a significant enough business reason for web site operators to seriously consider implementing IPv6. Tore
Re: Production-scale NAT64
* Mark Tinka mark.ti...@seacom.mu On 27/Aug/15 07:16, Mark Andrews wrote: Or why you are looking at NAT64 instead of DS-Lite, MAP-E, or MAP-T all of which are better solutions than NAT64. NAT64 + DNS64 which breaks DNSSEC. Because with NAT64/DNS64/464XLAT, there isn't any undo work after the dust settles. Hi Mark, There's not much difference between 464XLAT and MAP-*/DS-Lite/lw4o6 in this respect, the way I see it. In all cases you need four things: 0) Native IPv6. 1) A central component connected to the IPv4 internet and the IPv6 access network (464XLAT: PLAT, MAP-*: BR, DS-Lite/lw4o6: AFTR). 2) Signalling to the client that #1 exists and can be used (464XLAT: DNS64, others: DHCPv6 options). 3) A distributed component at the customer premise/nodes that acts on #2 and connects an isolated IPv4 network to the IPv6 access network (464XLAT: CLAT, MAP-*: CE, DS-Lite/lw4o6: B4). The necessary undo work in all cases is to disable #2. At that point components #1 and #3 will become un-used and can be removed if you care. My guess is that you'll care about removing #1 because it probably uses power and space in your PoP, but that you won't care about #3 because that's just an unused software function residing in a customer device you might not even have management access to. I'll grant you that with NAT64/DNS64 *without* 464XLAT there is no #3 to remove as part of your undo work, but as I mentioned above I doubt you'll care about that particular distinction. Besides, since a CLAT is included by default in multiple client platforms, you can't really prevent your users from using 464XLAT if you're providing NAT64/DNS64 to begin with, unless you're doing something really weird like disabling DNS64 for the ipv4only.arpa. hostname specifically. Tore
Re: Production-scale NAT64
Hi Mark, * Mark Tinka mark.ti...@seacom.mu In our deployment, we do not offer customers private IPv4 addresses. I suppose we can afford to do this because a) we still have lots of public IPv4, b) we are not a mobile carrier. So any of our customers with IPv4 will never hit the NAT64 gateway. When we do run out of public IPv4 addresses (and cannot get anymore from AFRINIC), all new customers will be assigned IPv6 addresses. Why wait until then? Any particular reason why you cannot already today provide IPv6 addresses to your [new] customers in parallel with IPv4? Tore
Re: Production-scale NAT64
* William Herrin On Thu, Aug 20, 2015 at 1:22 PM, Ca By cb.li...@gmail.com wrote: On Thu, Aug 20, 2015 at 9:36 AM, William Herrin b...@herrin.us wrote: Seriously though, if you want to run a v6-only network and still support access to IPv4 Internet resources, consider 464XLAT or DS-Lite. NAT64 is a required component of 464XLAT. Sort of, technically, but not really. Yes really. See below. 464XLAT does not require DNS64 and provides client software with an IPv4 interface. IPv4 software that has no idea IPv6 exists sends IPv4 packets which get translated to IPv6 packets. Those packets are routed to the carrier NAT box which then translates these specially crafted IPv6 packets back to IPv4 packets. What do you think the «carrier NAT box» in 464XLAT is, exactly? No need to guess, we can check the 464XLAT specification: http://tools.ietf.org/html/rfc6877#section-2 PLAT: PLAT is provider-side translator (XLAT) that complies with [RFC6146]. It translates N:1 global IPv6 addresses to global IPv4 addresses, and vice versa. Let's check that reference: http://tools.ietf.org/html/rfc6146#section-1 This document specifies stateful NAT64, a mechanism for IPv4-IPv6 transition and IPv4-IPv6 coexistence. Lo and behold! Your 464XLAT «carrier NAT box» (a.k.a. «PLAT») *is* a NAT64 box. Thus, if you intend to deploy 464XLAT in production, you're going to need a production-scale NAT64 implementation. To answer Jawaid's original question, I'm very happy with Jool (http://jool.mx) for my NAT64 (and SIIT) needs, which is an open-source Linux-based software solution. It has no problems handling several Gb/s of traffic using an x86 server that's a couple of years old without any tuning, so if the capacity required is moderate this might be a cost-effective alternative to dedicated boxes from one of the router/network appliance vendors. Tore
Re: Remember Internet-In-A-Box?
* Owen DeLong o...@delong.com On Jul 15, 2015, at 08:57 , Matthew Kaufman matt...@matthew.at wrote: This is only true for dual-stacked networks. I just tried to set up an IPv6-only WiFi network at my house recently, and it was a total fail due to non-implementation of relatively new standards... starting with the fact that my Juniper SRX doesn't run a load new enough to include RDNSS information in RAs, and some of the devices I wanted to test with (Android tablets) won't do DHCPv6. That’s a pretty old load then, as I’ve had RDNSS on my SRX-100 for several years now. Interesting. Which JUNOS version are you running, exactly? According to Juniper's web site, RDNSS support showed up in JUNOS 14.1, which isn't available for the SRX series (nor is any later version). http://www.juniper.net/techpubs/en_US/junos15.1/topics/reference/configuration-statement/dns-server-address-edit-protocols-router-advertisement.html Tore
Re: NTP versions in production use?
* Julien Goodwin Juniper have recently (15.1, still not out for all platforms) rebased JunOS on a slightly less ancient FreeBSD release, and nothing I have in my lab has it released yet, and I can't be bothered to go spelunking in the install image for what version of NTP it's running. FWIW: root@lab-ex4200:RE:1% ntpq -c rv status=06f4 leap_none, sync_ntp, 15 events, event_peer/strat_chg, version=ntpd 4.2.0-a Fri May 29 07:45:35 2015 (1), processor=powerpc, system=JUNOS15.1R1.8, leap=00, stratum=3, precision=-18, rootdelay=8.087, rootdispersion=52.195, peer=32436, refid=87.238.33.2, reftime=d94c85fa.7b317b80 Sun, Jul 12 2015 8:21:46.481, poll=10, clock=d94c8669.9b6e8a47 Sun, Jul 12 2015 8:23:37.607, state=4, offset=-1.039, frequency=-32.350, jitter=0.445, stability=0.040 It seems they've pulled the 15.1 release though, at least I can't download it anymore. Tore
Re: NTT-HE earlier today (~10am EDT)
* Mike Leber I was thinking that when I posted yesterday. These were announcements from a peer, not customer routes. We are lowering our max prefix limits on many peers as a result of this. We are also going towards more prefix filtering on peers beyond bogons and martians. Hi Mike, You're not mentioning RPKI here. Any particular reason why not? If I understand correctly, in today's leak the origin AS was changed/reset, so RPKI ought to have saved the day. (At least Grzegorz' day, considering that 33 of AS43996's prefixes are covered by ROAs.) Tore At Tue, 30 Jun 2015 10:27:21 +0200, Grzegorz Janoszka wrote: We have just received alert from bgpmon that AS58587 Fiber @ Home Limited has hijacked most of our (AS43996) prefixes and Hurricane Electric gladly accepted them.
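For readers unfamiliar with how origin validation would have caught this: RFC 6811 compares each route's origin AS and prefix length against the covering ROAs. A minimal sketch (the ROA entry and the function are illustrative, not real published data):

```python
import ipaddress

# RFC 6811 route origin validation, minimally: a route is "valid" if a
# covering ROA authorises its origin AS at that prefix length, "invalid"
# if ROAs cover the prefix but none authorises it, and "not-found" if no
# ROA covers it at all. A leak that resets the origin AS turns
# ROA-covered prefixes "invalid", which import filters can then drop.

def validate(prefix: str, origin_as: int, roas) -> str:
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, roa_as, max_length in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        if net.version == roa_net.version and net.subnet_of(roa_net):
            covered = True
            if origin_as == roa_as and net.prefixlen <= max_length:
                return "valid"
    return "invalid" if covered else "not-found"

roas = [("193.0.0.0/21", 3333, 21)]              # hypothetical ROA for illustration
print(validate("193.0.0.0/21", 3333, roas))      # valid
print(validate("193.0.0.0/21", 58587, roas))     # invalid: origin AS was reset
print(validate("198.51.100.0/24", 64496, roas))  # not-found: no covering ROA
```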
Re: REMINDER: LEAP SECOND
* Stefan Schlesinger s...@ono.at On 25 Jun 2015, at 03:14, Damian Menscher via NANOG nanog@nanog.org wrote: http://googleblog.blogspot.com/2011/09/time-technology-and-leaping-seconds.html comes dangerously close to your modest proposal. I wonder why Google hasn't published the patch yet. Leap smear sounds like the sane way to do leap seconds, and it wouldn't break software at all, because time adjustments in the sub-second area are proven to work quite well. It's implemented in chronyd versions 2.0 and up, for what it's worth. The required config directive is leapsecmode slew. There's a nice blog post explaining how this feature, as well as some other approaches on how to deal with the leap second, work here: http://developerblog.redhat.com/2015/06/01/five-different-ways-handle-leap-seconds-ntp/ Tore
Re: REMINDER: LEAP SECOND
* Harlan Stenn st...@ntp.org Matthew Huff writes: A backward step is a known issue and something that people are more comfortable dealing with as it can happen on any machine with a noisy clock crystal. A clock crystal has to be REALLY bad for ntpd to need to step the clock. Having 61 seconds in a minute or 86401 seconds in a day is a different story. Yeah, leap years suck too. And those jumps around daylight savings time. Hi Harlan, Leap years and DST adjustments have never caused us any major issues. It seems these code paths are well tested and work fine. The leap second in 2012 however ... total and utter carnage. Application servers, databases, etc. falling over like dominoes. All hands on deck in the middle of the night to clean up. It took days before we stopped finding broken stuff. Maybe all the bugs from 2012 have been fixed. Maybe they haven't. Maybe new ones have been introduced. I'm not terribly optimistic. One example I'm aware of: Cisco Nexus 5010/5020 switches need software that was released as late as the 29th of April this year in order to be immune to the crashburn leap second bug CSCub38654. The official «Cisco Suggested release based on software quality, stability and longevity» is older. Go figure. In any case, we're certainly not going to risk it. So our plan is to disconnect our local stratum-2s from their upstreams on June 29th so they (and more crucially, their downstream clients) remain oblivious to the leap second. Come July 1st, we'll reconnect them. The clients' clocks will be 1s (plus any drift) off at that point, but as we're running ntpd with the -x option, that shouldn't cause backwards stepping. Running with slightly incorrect clocks for a few days is a small price to pay to avoid a repeat of 2012's mayhem. Tore
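As an aside, the reason ntpd's -x option guarantees no backwards step is that corrections up to 600 s are slewed instead, and the slew is capped at 500 ppm. A back-of-envelope sketch of how long absorbing a 1 s post-reconnect offset would take (helper name is mine):

```python
# ntpd -x slews rather than steps (for offsets up to 600 s), and the
# slew is capped at 500 ppm, i.e. 0.5 ms of correction per second of
# real time. Absorbing a 1 s offset therefore takes about 33 minutes.

MAX_SLEW = 500e-6   # ntpd's maximum slew rate, 500 ppm

def slew_duration_s(offset_s: float) -> float:
    """Seconds of wall-clock time needed to slew away offset_s."""
    return offset_s / MAX_SLEW

print(slew_duration_s(1.0))   # 2000.0 seconds, roughly 33 minutes
```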
Re: REMINDER: LEAP SECOND
* Majdi S. Abbas On Wed, Jun 24, 2015 at 08:33:14AM +0200, Tore Anderson wrote: Leap years and DST adjustments have never caused us any major issues. It seems these code paths are well tested and work fine. I've seen quite a few people that for whatever reason insist on running systems in local time zones struggle with the DST reverse step. It's not nearly as much of a non-issue as you claim. Read again, and note the word «us». I am describing my and my employer's experience with past DST changes and leap years, and those have indeed been completely uneventful. YMMV. The leap second in 2012 however ... total and utter carnage. Application servers, databases, etc. falling over like dominoes. All hands on deck in the middle of the night to clean up. It took days before we stopped finding broken stuff. Total and utter carnage is a bit of a stretch. As above, I am speaking only about how the 2012 leap second went down in our infrastructure. I stand by how I described the event. Again, YMMV. If you plan to let your infrastructure deal with the upcoming leap second head-on, I wish you the best of luck. Hopefully all the bugs from 2012 have been fixed. I, however, certainly have no intention of being the one to find out otherwise. Tore
Re: REMINDER: LEAP SECOND
* Matthew Huff Does anyone know what the latest that we can run our NTP servers and not distribute the LEAP_SECOND flag to the NTP clients? From http://support.ntp.org/bin/view/Support/NTPRelatedDefinitions: Leap Indicator This is a two-bit code warning of an impending leap second to be inserted in the NTP timescale. The bits are set before 23:59 on the day of insertion and reset after 00:00 on the following day. This causes the number of seconds (rollover interval) in the day of insertion to be increased or decreased by one. So the answer to your question is, AIUI, 2015-06-29 23:59:59. Tore
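The Leap Indicator itself is just the top two bits of the first byte of the NTP packet header, so it is easy to spot on the wire. A sketch (names are mine):

```python
# The NTP header's first byte packs LI (2 bits), version (3 bits) and
# mode (3 bits). LI values: 0 = no warning, 1 = last minute of the day
# has 61 seconds, 2 = last minute has 59 seconds, 3 = unsynchronised.

LI_NAMES = {0: "no warning", 1: "insert (61s)", 2: "delete (59s)", 3: "unsynchronised"}

def leap_indicator(first_byte: int) -> str:
    return LI_NAMES[(first_byte >> 6) & 0x3]

# 0x64 = LI 1, version 4, mode 4 (server): a response carrying the warning.
print(leap_indicator(0x64))   # insert (61s)
print(leap_indicator(0x24))   # no warning
```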
Re: REMINDER: LEAP SECOND
* Matthew Huff I saw that, but it says the bits are set before 23:59 on the day of insertion, but I was hoping that I could shut it down later than 23:59:59 of the previous day (8pm EST). The reason is FINRA regulations. We have to have the time synced once per trading day before the open according to the regulations. Again AIUI, and I'm no NTP expert so I hope someone corrects me if I'm wrong: If you don't configure the leapfile ntpd option, the Leap Indicator flag will flow down to your servers from the stratum-1 servers you're synchronising from (directly or indirectly). So what I think you could do is, on the 29th, remove all your upstream servers from your NTP server's config, and set fudge 127.127.1.0 stratum 3 or something like that so that clients will still want to sync to it. At that point, your NTP server's clock chip will be the reference clock, which might be drift-prone. To work around that, you could at 8pm on the 30th stop ntpd, manually sync the system clock with ntpdate, and start ntpd again. That should keep your NTP server's clock reasonably synchronised while providing your clients with (Leap Indicator-free) NTP service. I make no guarantees that the above will work the way I think it will, though... Try it at your own risk. Tore
Re: REMINDER: LEAP SECOND
* Matthew Huff That won't work. Being internally sync'ed isn't good enough for FINRA. All the machines must be synced to an external accurate source at least once per trading day. That was why I proposed running ntpdate on your (upstream-free since the 29th) NTP server(s) sometime on the 30th. That would synchronise its local clock with an external accurate source, without learning the Leap Indicator. Our plan is to disable our two stratum 1 servers, and our 3 stratum 2 servers before the leap second turnover, but to be 100% safe we would need to do that 24 hours before, but that would be a violation of FINRA regulations. If you run your own stratum-1 servers, can't you just opt not to configure leapfile? Assuming your own organisation is the only user of those servers, that is (certainly don't do that if it's a public server). After the leap second has passed, you can proceed to correct things. Your clients will then be 1s ahead of correct time, and will need to step/slew their clocks to get in sync. But maybe that's OK as far as FINRA's concerned... It looks like the safest thing for us to do is to keep our NTP servers running and deal with any crashes/issues. That's better than having to deal with FINRA. Maybe. I have no experience with FINRA. :-) Tore
AS4788 Telecom Malaysia major route leak?
I see tons of bogus routes show up with AS4788 in the path, and at least AS3549 is accepting them. E.g. for the RIPE NCC (193.0.0.0/21):

        *[BGP/170] 00:20:29, MED 1000, localpref 150
           AS path: 3549 4788 12859 I, validation-state: valid
         > to 64.210.69.85 via xe-1/1/0.0

Tore
Re: AS4788 Telecom Malaysia major route leak?
* Marty Strong via NANOG nanog@nanog.org

> It *looks* like GBLX stopped accepting the leak.

If so, it's a partial fix at best, I still see plenty of leaked routes, both via 3356 and 3549, e.g.:

tore@cr1-osl3> show route 195.24.168.98 all
Jun 12 12:03:54 +0200

inet.0: 544405 destinations, 1591203 routes (543086 active, 3 holddown, 526626 hidden)
+ = Active Route, - = Last Active, * = Both

195.24.160.0/19    *[BGP/170] 00:03:59, MED 2000, localpref 50, from 87.238.63.5
                      AS path: 3356 3549 4788 6939 39648 I, validation-state: unverified
                    > to 87.238.63.56 via ae0.0
                    [BGP/170] 00:05:24, MED 0, localpref 50, from 87.238.63.2
                      AS path: 3356 3549 4788 6939 39648 I, validation-state: unverified
                    > to 87.238.63.56 via ae0.0
                    [BGP ] 01:16:00, MED 25245, localpref 100
                      AS path: 3549 4788 6939 39648 I, validation-state: unverified
                    > to 64.210.69.85 via xe-1/1/0.0

It seems to have started around 08:47 UTC, that's when I got my first alarm from ring-sqa at least.

Tore
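A toy illustration of the kind of check a route-leak monitor performs: flag routes where an ASN known to be a stub/regional network shows up transiting in the middle of the AS path. The function is hypothetical and works on path strings for simplicity; real monitoring (e.g. ring-sqa) works from live BGP feeds:

```python
# Toy leak check: flag AS paths where a given ASN appears in the middle
# of the path (i.e. it is transiting), even though it is known to be a
# stub/regional network. Illustrative only; the paths below are the
# ones quoted in the post.

def transits_through(as_path: str, asn: int) -> bool:
    path = [int(a) for a in as_path.split()]
    # ASN is "in the middle": present, but neither first hop nor origin.
    return asn in path[1:-1]

print(transits_through("3356 3549 4788 6939 39648", 4788))  # True: 4788 transiting
print(transits_through("3549 4788", 4788))                  # False: 4788 is origin
```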
Re: Greenfield 464XLAT (In January)
* Baldur Norddahl baldur.nordd...@gmail.com

> The high tech solution is stuff like MAP where you move the cost out to the CPE. But then you need to control the CPE - if you have that then great. You would still want to sell a non-NAT (and MAP is NAT) to users that require a public IPv4 address, so you still need to go dual stack or use some tunnelling for that.

Hi Baldur,

MAP is *not* NAT; that's what's so neat about it. The users do get a public IPv4 address (or prefix!) routed to their CPE's WAN interface, towards which they can accept inbound unsolicited connections. The public IPv4 address could be port-restricted if the operator wants address sharing, but it does not have to be. You could do both at the same time, e.g., giving your premium users a /32 or /28, while the standard subscription includes a /32 with 4k ports.

I will grant you that MAP-T performs NAT (i.e., protocol translation) internally, but the translations that happen when a packet enters the MAP domain are reversed when it exits. So the IPv4 addresses are transparent end-to-end. MAP-E (and lw4o6 for that matter), on the other hand, has no form of NAT anywhere. (Unless you count the NAPT44 that sits between the subscriber's RFC1918 LAN segment and the CPE's WAN interface, but that's not exactly something that's unique to MAP.)

Nicholas: If I were you, before going down the 464XLAT route, I'd first look closely at these technologies, in the order given:

1) MAP (because it is fully stateless)
2) lw4o6 (because it is mostly stateless, i.e., no session tracking)
3) DS-Lite (which, like 464XLAT, is stateful, but you'll have way more CPEs to choose from than with 464XLAT, which is mostly for mobile)

Tore
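The "/32 with 4k ports" example above corresponds to MAP address sharing with a 4-bit PSID. A simplified sketch of the arithmetic, assuming equal power-of-two port slices and ignoring the port-set offset that RFC 7597 uses to exclude the well-known ports:

```python
# Simplified MAP address-sharing arithmetic: with a PSID of psid_len
# bits, 2**psid_len subscribers share one IPv4 address, each getting an
# equal slice of the 65536 ports. (Real MAP, RFC 7597, also applies a
# port-set offset so the well-known ports are excluded; omitted here.)

def sharing_ratio(psid_len: int) -> int:
    return 2 ** psid_len

def ports_per_subscriber(psid_len: int) -> int:
    return 65536 // sharing_ratio(psid_len)

# The "/32 with 4k ports" standard subscription from the post:
print(sharing_ratio(4), ports_per_subscriber(4))  # -> 16 4096
```

A premium "/32, no port restriction" subscription is simply the degenerate case psid_len=0: one subscriber, all 65536 ports.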
Re: Android (lack of) support for DHCPv6
* Lorenzo Colitti

> Remember, what I'm trying to do is avoid user-visible regressions while getting rid of NAT. Today in IPv4, tethering just works, period. No ifs, no buts, no requests to the network. The user turns it on, and it works.

*cough* https://code.google.com/p/android/issues/detail?id=38563

In particular comment 105 is illuminating. Android is apparently fully on board with mobile carriers' desire to break tethering, even going so far as to implement a feature whose *sole purpose* is to break tethering. Yet, at the same time, you refuse to implement DHCPv6 on WiFi because it *might*, as a *side effect*, break tethering. This does not strike me as very consistent.

If Android had instead simply refused to establish a mobile data connection to the mobile carriers that break tethering, then the refusal to implement DHCPv6 would make much more sense.

Tore
Re: Android (lack of) support for DHCPv6
* Lorenzo Colitti

> Tethering is just one example that we know about today. Another example is 464xlat.

You can't do 464XLAT without the network operator's help anyway (unless you/Google is planning on hosting a public NAT64 service?). If the network operator actively wants 464XLAT to be used, by providing DNS64/NAT64 service, then it seems fairly reasonable to assume that they're not going to deploy an IPv6/DHCPv6-only network that limits the number of IA_NA per attached node to 1.

> And that's not counting future applications that can take advantage of multiple IP addresses that we haven't thought of yet, and that we will have if we get stuck with there-are-more-IPv6-addresses-in-this-subnet-than-grains-of-sand-but-you-only-get-one-because-that's-how-we-did-it-in-IPv4 networks.

Of course. Hard to argue against imaginary things. :-)

On the other hand, there exist applications *today* that do require DHCPv6. One such example would be MAP, which IMHO is superior to 464XLAT both for the network operator (statelessness ftw) as well as for the end user (unsolicited inbound packets work, no NAT traversal required). MAP is provisioned with DHCPv6 (I-D.ietf-softwire-map-dhcp), so without DHCPv6 support in Android, MAP support in Android is a non-starter.

Tore
Re: Android (lack of) support for DHCPv6
* Lorenzo Colitti

>> On the other hand, there exist applications *today* that do require DHCPv6. One such example would be MAP, which IMHO is superior to 464XLAT both for the network operator (statelessness ftw) as well as for the end user (unsolicited inbound packets work, no NAT traversal required). MAP is provisioned with DHCPv6 (I-D.ietf-softwire-map-dhcp), so without DHCPv6 support in Android, MAP support in Android is a non-starter.
>
> Support for the DHCPv6 protocol, or support for assigning addresses from IA_NA?

I'm not 100% certain, but you can possibly run MAP without IA_NA. But I think you'll need the CE to be configured with a predictable IPv6 address so that the BR knows where to send the IPv6-encapsulated or -translated IPv4 packets. I don't see how that would work with SLAAC. But I'm not a MAP expert, so I'm open to being educated otherwise.

Anyway, here's a (hopefully constructive) suggestion on a way forward:

* Implement DHCPv6 client support (IA_NA, IA_TA, IA_PD .. the works)
* Upon network connection, request 2x IA_NA and 1x IA_PD (in addition to SLAAC):
** If you get addressing from SLAAC and/or IA_PD, accept the configuration and connect to the network.
*** If apps/services require additional addresses, self-assign them from the on-link/delegated prefix as needed.
** If you get 2x IA_NA, accept the configuration and connect to the network.
*** If apps/services require additional addresses, request additional IA_NA as needed. If additional IA_NAs are declined, either warn the user or trigger Android's already existing «avoided bad network» functionality.
** If you get no SLAAC or IA_PD, and IA_NA <= 1, then refuse to connect to the network (or, for a dual-stack network, connect IPv4-only). (I.e., same behaviour as on a DHCPv6-only network today.)

Why N=2? Because it's >1, and what you seem to be worried about is operators using N=1 without thought (because that's what we did in IPv4).
N=2 will confirm that's not the case for the given network, so I think confirming N=2 gives a much stronger indication that the network allows N=something reasonable than confirming N=1 would.

That said, I doubt that you can rely on the network accepting N=hundreds or more, neither for DHCPv6 IA_NA *nor* SLAAC, due to neighbour table limitations and DAD overhead (both delay and packets). If the future applications we're imagining need IPv6 addresses in that ballpark (which isn't *that* far-fetched - say a new address per connection, process, app, whatever), IA_PD is the only mechanism we have today that will work. If you start supporting IA_PD, my bet is that networks are going to start offering it - just like when you added 464XLAT.

Tore
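The connection policy proposed above can be sketched as a decision function. The function name and signature are hypothetical, not Android code; it only transcribes the bullet logic:

```python
# Sketch of the proposed connection policy (hypothetical, not Android
# code): connect if SLAAC or IA_PD provides addressing, or if the
# network grants at least two IA_NAs; otherwise refuse (or, on a
# dual-stack network, fall back to IPv4-only).

def should_connect_ipv6(slaac: bool, ia_pd: bool, ia_na_granted: int) -> bool:
    if slaac or ia_pd:
        return True   # can self-assign further addresses as needed
    if ia_na_granted >= 2:
        return True   # network demonstrably allows N > 1
    return False      # N <= 1: likely an IPv4-thinking deployment

print(should_connect_ipv6(slaac=False, ia_pd=False, ia_na_granted=2))  # True
print(should_connect_ipv6(slaac=False, ia_pd=False, ia_na_granted=1))  # False
print(should_connect_ipv6(slaac=True, ia_pd=False, ia_na_granted=0))   # True
```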
Re: Android (lack of) support for DHCPv6
* Dave Taht I am told that well over 50% of all android development comes from volunteer developers so rather than kvetching about this it seems plausible for an outside effort to get the needed features for tethering and using dhcpv6-pd into it. If someone wanted to do the work. https://android-review.googlesource.com/#/c/78857/ Tore
Re: Peering and Network Cost
* Mark Tinka mark.ti...@seacom.mu

> On 16/Apr/15 07:25, Tore Anderson wrote:
>
>> We're in a similar situation here; transit prices have come down so much in recent years (while IX fees are indeed stagnant) that I am certain that if I were to cut all peering and buy everything from a regional tier-2 instead, I'd be lowering my total MRC somewhat, without really reducing connectivity quality to my (former) peers.
>
> I wouldn't say exchange point prices are stagnant, per se. They may remain the same, but what goes up is the port bandwidth. It's not directly linear, but you get my point. Again, the burden is on the peering members to extract the most out of their peering links by having as much peering as possible.

You appear to be assuming that an IP transit port is more expensive than an IXP port with the same speed. That doesn't seem to always be the case anymore, at least not in all parts of the world, and I expect this trend to continue - transit prices seem to go down almost on a monthly basis, while the price lists of the two closest IXPs to where I'm sitting are dated 2011 and 2013, respectively.

Even if the transit port itself remains slightly more expensive than the IXP port like in the example Baldur showed, the no-peering alternative might still be cheaper overall, because even if you're peering away most of your traffic you'll still need to pay a nonzero amount for a (smaller or less utilised) transit port anyway.

Tore
Re: Peering and Network Cost
* Baldur Norddahl baldur.nordd...@gmail.com

> Transit cost is down but IX cost remains the same. Therefore IX is no longer cost effective for a small ISP.
>
> As an (non US) example, here in Copenhagen, Denmark we have two internet exchanges DIX and Netnod. We also have many major transit providers, including Hurricane Electric and Cogent.
>
> Netnod price for a 1 Gbps port is 4 SEK = 4500 USD / year http://www.netnod.se/ix/join/prices. DIX is 4 DKK = 5700 USD / year http://dix.dk/serviceinformation/
>
> HE.net is offering 1 Gbps flatrate for 450 USD / month list price = 5400 USD / year. Cogent can match that. So why would a small ISP pay 4500 USD for a service with no guarantee of how much traffic they will be able to peer away?

We're in a similar situation here; transit prices have come down so much in recent years (while IX fees are indeed stagnant) that I am certain that if I were to cut all peering and buy everything from a regional tier-2 instead, I'd be lowering my total MRC somewhat, without really reducing connectivity quality to my (former) peers.

For us, the primary reason that keeps us peering is DDoS prevention. Our traffic is mostly regional, so if a customer of mine gets hit with a volumetric DDoS attack that would saturate my IP transit lines and cause collateral damage, that's no big deal, as we can just RTBH the customer's prefix towards our transit providers. The customer is only mildly inconvenienced by this as, say, 90% of his traffic goes to our peers. Without peering the attack would succeed, because my RTBH would completely offline my customer.

Tore
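The figures quoted in the post can be plugged into a back-of-the-envelope MRC comparison. This is a deliberately crude model (the functions are hypothetical): it divides annual fees by 12, ignores cross-connects, and assumes the transit bill scales with the unpeered fraction of traffic, which the flat-rate offer quoted would not actually do:

```python
# Back-of-the-envelope MRC comparison using figures from the post.
# Crude model for illustration only: annual fees / 12, no cross-connect
# costs, and transit assumed to scale with the unpeered traffic share.

IX_PORT_ANNUAL_USD = 4500        # Netnod 1 Gbps port, per the post
TRANSIT_FLAT_ANNUAL_USD = 5400   # HE.net 1 Gbps flat rate, per the post

def monthly_cost_ix_plus_transit(peered_fraction: float) -> float:
    # Even with peering, a nonzero transit bill remains for the rest.
    return (IX_PORT_ANNUAL_USD
            + TRANSIT_FLAT_ANNUAL_USD * (1 - peered_fraction)) / 12

def monthly_cost_transit_only() -> float:
    return TRANSIT_FLAT_ANNUAL_USD / 12

# Even peering away 50% of traffic, transit-only wins in this model:
print(round(monthly_cost_ix_plus_transit(0.5)))  # -> 600
print(round(monthly_cost_transit_only()))        # -> 450
```

The model supports the point being made: since some transit is needed anyway, the IX port has to peer away a very large share of traffic before it pays for itself.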
Re: IPv6 allocation plan, security, and 6-to-4 conversion
* William Herrin

> T-Mobile uses something called 464XLAT. Don't let the translation part fool you: it's a tunnel. IPv4 in one side, IPv4 out the other.

464XLAT is not a tunnel. Protocol translation is substantially different from tunneling. With tunneling, the original layer-3 header is kept intact as it is encapsulated inside another layer-3 header. With translation, the original layer-3 header is removed and replaced with another layer-3 header. They come with a different set of trade-offs, such as:

- Protocol translation may be lossy (e.g., exotic IPv4 options may not survive the translation to IPv6 and would therefore not reappear after translation back to IPv4). Tunneling, OTOH, is not lossy.
- Tunneling moves the original layer-4 header into another encapsulation layer, so e.g. an ACL attempting to match an IPv6 HTTP packet using something like next-header tcp, dst port 80 will not work. With translation, it will.

> Kabel Deutschland uses something called Dual Stack Lite. It's also a tunnel: the Kabel-owned CPE encapsulates the customer's IPv4 packets within IPv6 and delivers them to Kabel's IPv4 carrier NAT box.

Yep. DS-Lite is indeed tunneling.

> So sure, if you don't mind dissembling a little bit you can say that they moved their infrastructure to IPv6-only. In my mind, tunnelling IPv4 over IPv6 where it both enters and exits the carrier's area of control as an IPv4 packet doesn't count as IPv6-only.

I guess we disagree about the definitions, then. In my view, a dual-stack network is one where IPv4 and IPv6 are running side-by-side like ships in the night with no fate sharing. You might be running two different IGP protocols (like OSPFv2 and OSPFv3) and a duplicated set of iBGP sessions. ACLs and the like must exist both for IPv4 and IPv6. And so on. If you turn off one protocol, the other one keeps on running just like before.

This is in contrast with a single-stack network; turn off that single stack, and nothing works. That doesn't mean that it cannot simultaneously transport other layer-3 protocols across that single-stack network; just that there is a clear distinction between the main layer-3 protocol and the others being transported across it. You might very well simultaneously transport IPv6, AppleTalk, and IPX/SPX across an IPv4-only network - but that doesn't mean that the network is quad-stack - IMHO, it's still single-stack IPv4.

> On Fri, Jan 30, 2015 at 11:44 AM, Tore Anderson t...@fud.no wrote:
>
>> If everyone could just dual-stack their networks, they might as well single-stack them on IPv4 instead; there would be no point whatsoever in transitioning to IPv6 for anyone.
>
> What do you mean if? Carrier NAT means we *can* single-stack on IPv4 for the next 20 to 30 years, if we're so inclined.

I suppose that's true - if you ignore that a number of other folks are deploying IPv6 to deal with their IPv4 exhaustion, and that products and services are being put to market that recommend the use of IPv6 connectivity above NATed IPv4 (e.g., Xbox One). So much earlier than 30 years from now you'll be wanting to have IPv6 in your network anyway, and once you come to that realisation you might also realise that operating a dual-stack network for those 30 years is not going to be any fun at all due to the increased complexity it causes. Especially if the IPv4 part of that dual-stack network is in itself getting increasingly complex due to more and more NAT being added to deal with growth.

So IMHO dual-stack is a bad recommendation, or at least it is rather shortsighted. If you're in a position to do single-stack IPv6-only with IPv4 as a service (like T-Mobile USA or Kabel Deutschland), you'll end up with a much simpler network that will be much easier to maintain over the years. This also facilitates the use of IPv4 address sharing solutions like lw4o6 and MAP, whose stateless nature makes them vastly superior to traditional stateful Carrier Grade NAT44 boxes.

YMMV, of course.

Tore
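One concrete way translation-based mechanisms (NAT64/SIIT) keep IPv4 addresses visible end-to-end, as argued above, is by embedding the IPv4 address directly into the IPv6 address (RFC 6052). A quick sketch using Python's standard ipaddress module and the well-known prefix 64:ff9b::/96:

```python
import ipaddress

# RFC 6052 address embedding: with a /96 translation prefix, the IPv4
# address simply becomes the low 32 bits of the IPv6 address. This is
# what lets translation preserve addresses transparently end-to-end.

def embed_ipv4(prefix: str, v4: str) -> str:
    p = int(ipaddress.IPv6Address(prefix))
    v = int(ipaddress.IPv4Address(v4))
    return str(ipaddress.IPv6Address(p | v))

print(embed_ipv4("64:ff9b::", "192.0.2.1"))  # -> 64:ff9b::c000:201
```

With tunneling, by contrast, the IPv4 packet rides intact inside an IPv6 payload; nothing about the outer IPv6 addresses needs to encode the inner IPv4 ones.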
Re: IPv6 allocation plan, security, and 6-to-4 conversion
Hi Baldur,

* Baldur Norddahl baldur.nordd...@gmail.com

> On 1 February 2015 at 20:10, Tore Anderson t...@fud.no wrote:
>
>> - Tunneling moves the original layer-4 header into another encapsulation layer, so e.g. an ACL attempting to match an IPv6 HTTP packet using something like next-header tcp, dst port 80 will not work. With translation, it will.
>
> But on the other hand you will mess up with the routing of the network. In our network both IPv4 and IPv6 are routed to different transit points depending on the destination. With translation you need to ensure that the traffic passes a translation point before it leaves the network.

Sure, but you could scatter these translation points all over your network, so that the flow of traffic remains optimal. You could enable the translation functionality on your aggregation and/or your border routers, for example. The traffic would need to pass those anyway, so there's no real change to how traffic is being routed.

> If that translation involves NAT, then you also need to ensure that the return traffic hits the same translation device.

No, with stateless solutions like MAP and lw4o6, there is no such requirement. Anycast them or use ECMP towards them however way you like. This is in my view one of the great advantages of such solutions over IPv4 CGN. To the best of my knowledge, there exists no stateless IPv4 sharing mechanism. So the CGN-ed traffic must flow bidirectionally across the same translation device, which then could easily become a choke point. Also, should the CGN device fail, all the existing sessions it was handling would be disrupted.

>> In my view, a dual-stack network is one where IPv4 and IPv6 are running side-by-side like ships in the night with no fate sharing. You might be running two different IGP protocols (like OSPFv2 and OSPFv3) and a duplicated set of iBGP sessions. ACLs and the like must exist both for IPv4 and IPv6. And so on. If you turn off one protocol, the other one keeps on running just like before.
>
> By that definition my dual stack network is single stack: kill ipv4 and MPLS goes down = everything is down. On the other hand there are actually two IPv4 networks, since the IPv4 network under MPLS does not carry internet traffic directly. BOTH IPv4 and IPv6 can be said to be tunneled through the MPLS network.

While MPLS certainly blurs the lines a bit, based on your description I think that your network could reasonably be described as single-stack MPLS/IPv4-only at its core, while IPv6 (using 6PE I guess?) and another instance of IPv4 (distinct from the one used for MPLS signaling) is being transported as a service across that single-stack network.

> I do not see the point in making this mess even bigger by adding another layer by shoehorning v4 traffic into v6 packets.

Agreed, considering that you seem to already be enjoying the benefits of having a single-stack network. That is after all what I am saying folks should be considering, rather than automatically going down the dual-stack road. While you're using MPLS instead of IPv6, the principle is similar.

> I fail to see the complexity. You are advocating that I should have spent money on more equipment and force my users to use a ISP supplied CPE (currently my users can use any CPE they want).

I'm just advocating that people should seriously *consider* it, especially if they're building something new. I'm not saying it's for everyone everywhere, nor for you specifically. For a provider that controls the user equipment, going IPv6-only is certainly a possibility, as demonstrated by T-Mobile USA and Kabel Deutschland. If OTOH there is a requirement to support legacy IPv4-only CPEs, then clearly IPv6-only isn't going to work out too well.

Tore
Re: IPv6 allocation plan, security, and 6-to-4 conversion
* William Herrin

> nat64/nat46 - allows an IPv6-only host to interact in limited ways with IPv4-only hosts. Don't go down this rabbit hole. This will probably be useful in the waning days of IPv4 when folks are dismantling their IPv4 networks but for now the corner cases will drive you nuts. Plan on dual-stacking any network which requires access to IPv4 resources such as the public Internet.

For many folks, that's easier said than done. Think about it: If everyone could just dual-stack their networks, they might as well single-stack them on IPv4 instead; there would be no point whatsoever in transitioning to IPv6 for anyone.

Tore
Re: IPv6 allocation plan, security, and 6-to-4 conversion
* Mel Beckman

> Um, haven't you heard that we are out of IPv4 addresses? The point of IPv6 is to expand address space so that the Internet can keep growing. Maybe you don't want to grow with it, but most people do. Eventually IPv4 will be dropped and the Internet will be IPv6-only. Dual-stack is just a convenient transition mechanism.

Mel,

Dual-stack was positioned to be a convenient transition mechanism 15 years ago (to take the year when RFC 2893 was published). However, that train left the platform mostly empty years ago, when the first RIRs started to run out of IPv4 addresses. After all, we were supposed to have dual-stack everywhere *before* we ran out of IPv4. That didn't happen.

The key point is: In order to run dual-stack, you need as many IPv4 addresses as you do to run IPv4-only. Or to put it another way: If you don't have enough IPv4 addresses to run IPv4-only, then you don't have enough IPv4 addresses to run dual-stack either.

Sure, you can squeeze some more lifetime out of IPv4 by adding more NAT (something which is completely orthogonal to deploying IPv6 simultaneously). However, if you're already out of IPv4, and you already see no way forward except adding NAT, then you should seriously consider doing the NAT (or whatever backwards compat mechanism you prefer) between the residual IPv4 internet and your IPv6 infrastructure, instead of doing it between IPv4 and IPv4. Running single-stack is simply much easier and less complex than dual-stack, and once your infrastructure is based on an IPv6-only foundation, you don't have to bother with any IPv4-IPv6 transition project ever again.

Tore
Re: IPv6 allocation plan, security, and 6-to-4 conversion
* Baldur Norddahl

> Single stacking on IPv6 is nice in theory. In practice it just doesn't work yet. If you as an ISP tried to force all your customers to be IPv6 single stack, you would go bust.

Kabel Deutschland, T-Mobile USA, and Facebook are examples of companies that have already moved, or are in the process of moving, their network infrastructure to IPv6-only. Without going bust.

What you *do* need is some form of connectivity to the IPv4 internet. But there are smarter ways to do that than dual stack. Seriously, if you're building a network today, consider making IPv4 a legacy app or service running on top of an otherwise IPv6-only infrastructure. Five years down the road you'll thank me for the tip. :-)

Tore
Re: DDOS solution recommendation
* Roland Dobbins rdobb...@arbor.net

> On 12 Jan 2015, at 16:19, Tore Anderson wrote:
>
>> I'd love to use flowspec over D/RTBH, but to me it seems like vapourware.
>
> I meant on your own infrastructure, apologies for the confusion.

Right. So if I first need to accept the traffic onto my infrastructure before I can discard it, I'm dead in the water anyway: My uplinks will sit there at 100% ingress utilisation, dropping legitimate traffic. /32 or /128 D/RTBH announcements towards my transits are my only real option at this point. That helps protect against collateral damage, and if the customer's audience is local, it can also restore full operation for the attacked customer's primary markets (which are usually reached via peers instead of transits).

For attacks that are conveniently sized smaller than my upstream capacity, I could see that flowspec could be useful, but not in a unique way, as inside my own network I can easily distribute targeted stateless discard ACLs in many other ways too (I use Netconf currently).

> Transit providers utilizing Juniper aggregation edge routers could do it now - why they don't, I don't know.

I'd definitely be willing to pay a premium for such a feature.

Tore
Re: DDOS solution recommendation
* Roland Dobbins rdobb...@arbor.net

> On 11 Jan 2015, at 20:52, Ca By wrote:
>
>> 3. Have RTBH ready for some special case.
>
> S/RTBH and/or flowspec are better (S/RTBH does D/RTBH, too).

But are there any transit providers that support flowspec these days? As I understand it, only GTT used to, but they stopped. I'd love to use flowspec over D/RTBH, but to me it seems like vapourware.

Tore
Re: Charging fee for BGP prefix per /24?!
* Yucong Sun

> My recent inquiry to some network provider reveals that they are charging fee for per /24 announced. Obvious that would means they get to charge a lot with little to none efforts on their side. In a world we are charging total bytes transferred instead of bps on uplinks, i can't say I'm surprised that much. But does anyone else had same experience? Did you pay? Is this the new status quo now?

Haven't encountered this myself, but putting a price on DFZ routing slots seems like a Good Thing to me.

Tore
Re: IPv6 Default Allocation - What size allocation are you giving out
* Baldur Norddahl

> Why do people assign addresses to point-to-point links at all? You can just use a host /128 route to the loopback address of the peer. Saves you the hassle of coming up with new addresses for every link.

Why do you need those host routes? Most IPv6 IGPs work just fine without global addresses or host routes.

https://tools.ietf.org/html/draft-ietf-opsec-lla-only-11

Tore
Re: What Net Neutrality should and should not cover
* William Herrin

> On Sun, Apr 27, 2014 at 2:05 AM, Rick Astley jna...@gmail.com wrote:
>
>> #3 On paid peering: I think this is where people start to disagree but I don't see what should be criminal about paid peering agreements. More specifically, I see serious problems once you outlaw paid peering and then look at the potential repercussions that would have.
>
> Double-billing Rick. It's just that simple. Paid peering means you're deliberately billing two customers for the same byte -- the peer and the downstream. And not merely incidental to ordinary service - the peer specifically connects to gain access to customers who already pay you and no one else. Where those two customers have divergent interests, you have to pick which one you'll serve even as you continue to bill both. That's a corrupt practice.

It's not just that simple. If, for example, you ask for a peering with me, the first thing I'll do is to take a close look at how the traffic between our two networks is currently being routed. If I see that I have no monetary or technical gain from setting up that peering with you, perhaps because the traffic is currently flowing via an already existing peering of mine (with your upstream, say), or via a transit port of mine that's not exceeding its CDR, then I'd probably want you to cover my costs of setting up that peering before accepting, at the very least.

Even if I was exceeding the CDR on my transit ports, it's not at all certain that accepting a peering with you would even be a break-even proposition for me. Keep in mind that unlike routers and line cards, IP transit service *is* dirt cheap these days.

So no, refusing a peering or requiring the would-be peer to pay for the privilege isn't *necessarily* corrupt practice. It Depends.

Tore
Re: misunderstanding scale
* William Herrin

> On Sat, Mar 22, 2014 at 8:19 PM, Randy Bush ra...@psg.com wrote:
>>> don't believe for a moment that v6 to v4 protocol translation is any less ugly than CGN.
>> it can be stateless
>
> You're smarter than that.

https://tools.ietf.org/html/rfc6145
https://tools.ietf.org/html/draft-ietf-softwire-map-t-05
https://tools.ietf.org/html/draft-anderson-siit-dc-00

Tore
Re: misunderstanding scale (was: Ipv4 end, its fake.)
* John Levine

> Also, although it is fashionable to say how awful CGN is, the users don't seem to mind it at all.

You might just be looking in the wrong places. Try searching for «playstation nat type 3» or «xbox strict nat».

Tore
Re: misunderstanding scale
* Nick Hilliard

> the level of pain associated with continued deployment of ipv4-only services is still nowhere near the point that ipv6 can be considered a viable alternative.

This depends on who you're asking; as a blanket statement it's demonstrably false: For the likes of T-Mobile USA¹ and Facebook², or even myself³, IPv6-only isn't just an «alternative». It's «happening».

[1] http://www.dslreports.com/shownews/TMobile-Goes-IPv6-Only-on-Android-44-Devices-126506
[2] https://www.dropbox.com/s/doazzo5ygu3idna/WorldIPv6Congress-IPv6_LH%20v2.pdf
[3] http://www.ipspace.net/IPv6-Only_Data_Centers

Tore
Re: BGP multihoming
* Tore Anderson

> * Baldur Norddahl
>
>> Is assigning a /24 from my own PA space for the purpose of BGP multihoming considered sufficient need?
>
> Not with current policies, no.

That was then. With current policies: yes.

To elaborate a bit, the RIPE Community just reached consensus on a policy change that makes the size and the purpose of an assignment entirely a local decision. That means that if you and your customer agree that a /X is needed for purpose Y, and you as the LIR have the available space and the willingness to make that assignment, you are now free to make it. The new IPv4 policy does not mandate any limits to what X and Y might be, except for the fact that Y must somehow involve «operating a network» (your use case certainly qualifies).

Tore
Re: Updated ARIN allocation information
* Owen DeLong

> In answer to Tore's statement, this block does not apply the standard justification criteria and I think you would actually be quite hard pressed to justify a /24 from this prefix. In most cases, it is expected that these would be the IPv4 address pool for the public facing IPv4 side of a NAT64 or 464xlat service. Most organizations probably only need one or two addresses and so would receive a /28. It is expected that each of these addresses likely supports several thousand customers in a service provider environment.

This latter expectation of over-subscription is not echoed by the policy text itself. One of the valid usage examples mentioned («key dual stack DNS servers») would also be fundamentally incompatible with a requirement of over-subscription.

If you look at the common transitional technologies you'll see that not all of them even support over-subscription. In alphabetical order:

- 6RD: No over-subscription possible, would require at least one IPv4 address per subscriber plus additional addressing required for the transport/access network.
- 6PE/6VPE: No over-subscription possible, the infrastructure must be numbered normally with IPv4.
- DS-Lite (AFTR): Over-subscription possible, but it's entirely reasonable to want to make the ratio as low as possible, in order to provide as many source ports as possible to the subscriber, to ease abuse handling, and so on.
- MAP: Similar to DS-Lite, but is less flexible with regards to over-subscription, as all users in a MAP domain must get the same amount of ports. Thus the maximum over-subscription you can achieve is limited by your most active subscriber in his peak period of use, i.e., if you have a subscriber whose usage peaks at 20k ports, then that MAP domain can only support a 2:1 over-subscription ratio. MAP can also be configured in a not over-subscribed 1:1 mode.
- NAT64: Same as DS-Lite.
- SIIT: No over-subscription possible, as it's by design a 1:1 mapping.

That said, the policy language does say «ARIN staff will use their discretion when evaluating justifications». So I suppose it is theoretically possible that the ARIN staff will do their best Dr. Evil impression, coming up with a big number N, and require requestors to have an N:1 over-subscription ratio to qualify. However, that would be better described as indiscretion, not discretion, IMHO. After all, the RIRs are book-keepers, not network operators; if a network operator makes a reasonable request, it isn't the RIR's place to second-guess their network deployment. If ARIN is doing that, they're overstepping.

So in summary, it seems to me that it is pretty easy to make a reasonable request for a /24 under this particular policy. And especially considering the immense routing benefit the /24 will have over all the other possible prefix lengths that can be requested (persuading providers/peers to accept /28s might be done on a small scale, but just won't work if you need global connectivity, and global connectivity is what end users expect), the only realistic outcome I can see is that [almost] all the requestors will go ahead and ask for the /24.

We'll just have to wait and see, I guess.

Tore
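The 20k-port MAP example above can be checked with a small calculation. Since every subscriber in a MAP domain gets the same power-of-two share of the 65536 ports, the busiest subscriber caps the sharing ratio for the whole domain (function hypothetical, port-set offset/excluded ports ignored for simplicity):

```python
# Checking the 20k-port example: in MAP every subscriber in a domain
# gets the same power-of-two share of the 65536 ports, so the busiest
# subscriber determines the maximum over-subscription ratio.

def max_sharing_ratio(peak_ports: int) -> int:
    ratio = 1
    while 65536 // (ratio * 2) >= peak_ports:
        ratio *= 2
    return ratio

print(max_sharing_ratio(20000))  # -> 2  (32768 ports each; 4:1 gives only 16384)
print(max_sharing_ratio(4000))   # -> 16 (4096 ports each)
```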
Re: Updated ARIN allocation information
* Mark Andrews
> I understand this but this block changes the status quo. It is a
> policy changer. AFAIK ARIN hasn't done allocations to the /28 level
> like this in the past. This is all new territory.

It's not exactly new. Like I've mentioned earlier in this thread, the
RIPE NCC has granted assignments smaller than /24 to requestors since,
well, forever. There are currently 238 such assignments listed in
delegated-ripencc-extended-latest.txt. However, these microscopic
assignments have proven hugely unpopular, accounting for only a
fraction of a percent of the total (there are 27733 assignments equal
to or larger than /24 in the same file).

What I fail to understand from this thread is the apparent expectation
that these smaller-than-/24 delegations from ARIN will be popular. As
I read the policy in question, the requestors may get a /24 instead.
That's a pretty small block to begin with and trivial to justify, and
given the human tendency to grab as much of something as you can
(especially when you in all likelihood cannot get nearly as much as
you actually need), coupled with the fact that a /24 is likely to be
immensely more useful than anything smaller...well, I just don't see
why we shouldn't realistically expect that pretty much all of the
assignments made from this block will be exactly /24, and that the
exceptions that prove the rule will account for less than 1% of the
total - just like we've seen happen in the RIPE region.

Oh well. Time will tell, I suppose.

Tore
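The RIPE figures quoted above work out as follows (my own quick
calculation, not part of the original message):

```python
# Share of RIPE region assignments smaller than /24, using the two
# counts from delegated-ripencc-extended-latest.txt quoted above.
smaller_than_24 = 238
slash24_or_larger = 27733

share = 100 * smaller_than_24 / (smaller_than_24 + slash24_or_larger)
print(f"{share:.2f}%")  # 0.85% - indeed only a fraction of a percent
```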
Re: FW: Updated ARIN allocation information
* Justin M. Streiner
> In the worst case, this would add another 262,144 routes (/10 fully
> assigned, and all assignments are /28s) to the global IPv4 route
> view. Realistically, the number will be a good bit smaller than
> that, but only time will tell for sure exactly how much smaller.
> Wash/rinse/repeat for any other RIR that adopts a similar policy.

I wouldn't worry if I were you. I'll wager you $100 that pretty much
all of the people requesting a block from ARIN under this policy (or
any other) are going to go for a /24 (or larger). There is some
precedent; RIPE policy has not mandated a minimum assignment size for
IPv4 PI, at least not in the last decade, yet the NCC has made almost
no assignments smaller than /24.

Tore
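The quoted worst-case figure is straightforward to verify (my own
arithmetic, not part of the original exchange):

```python
# Route count for a /10 fully carved into /28s vs. into /24s.
all_28s = 2 ** (28 - 10)  # number of /28s in a /10
all_24s = 2 ** (24 - 10)  # number of /24s in a /10
print(all_28s)  # 262144 - the worst case quoted above
print(all_24s)  # 16384  - if every requestor takes a /24 instead
```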
Re: Are specific route objects in RIR databases needed?
* Job Snijders
> On Thu, Jan 30, 2014 at 06:51:59PM +0200, Martin T wrote:
>> for example there is a small company with /22 IPv4 allocation from
>> RIPE in European region. This company is dual-homed and would like
>> to announce 4x /24 prefixes to both ISPs. Both ISPs update their
>> prefix-lists automatically based on records in RIPE database. For
>> example Level3 uses this practice at least in Europe. If this small
>> company creates a route object for its /22 allocation, then is it
>> enough? Theoretically this would cover all four /24 networks. Or in
>> which situation is it useful/needed to have a route object for each
>> /24 prefix?
> You should create a route object for each route that you announce;
> if you announce 4 x /24 you should create a route: object for each
> /24.

+1

> ps. Can you please send 20 dollarcent per /24 to my paypal account
> (j...@instituut.net) with the reference deaggregation fee?

Indeed.

Martin, I'd suggest announcing the 4 x /24s to each ISP tagged with
the no-export community in order to achieve whatever you are trying
to do, *in addition* to the covering /22. That way you're not
polluting Job's, my, and everyone else's routing tables more than
necessary, only your own ISPs' - but then again you're actually
paying them for the privilege.

Tore
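For illustration, a RIPE database route object for one of the four
/24s might look like this (a sketch with placeholder prefix, origin AS
and maintainer, not taken from the original thread):

```
route:      192.0.2.0/24
descr:      One of the four announced /24s
origin:     AS64500
mnt-by:     EXAMPLE-MNT
source:     RIPE
```

One such object is needed per announced route, plus one for the
covering /22 if that is announced as well.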
Re: BGP multihoming
* Baldur Norddahl
> Apologies for a RIPE question on NANOG, although I believe this
> issue will soon enough be relevant for the ARIN region as well.

Relevant perhaps, but as the policies differ, so may the correct
answers...

> I had a customer ask if we could provide him with BGP such that he
> could be multihomed. He already has 128 IP addresses from another
> ISP. Obviously a /25 is a no-go for multihoming as everyone is going
> to ignore his route. I would then need to help him with acquiring a
> /24 PI. Which appears to be impossible as RIPE no longer assigns PI
> space and PI can not be reassigned and thus be bought.

There is another option: if your customer becomes a RIPE NCC member
(i.e., an LIR), he'll get a PA /22. (Of course, you could offer to
perform all the administrative work needed to start and operate an
LIR on your customer's behalf, for a reasonable fee.)

> Is assigning a /24 from my own PA space for the purpose of BGP
> multihoming considered sufficient need?

Not with current policies, no, as the multihoming clause applies
specifically to PI assignments, not PA. However, if your customer can
show that he'll be using at least 128 addresses (i.e., 50% of a /24)
within a year, he does qualify for an assignment of a /24. Plans to
renumber out of his current /25 would count towards that.

Tore
Re: Will a single /27 get fully routed these days?
* Sander Steffann
> But more important: which /10 is set aside for this? It is not
> listed on https://www.arin.net/knowledge/ip_blocks.html

Probably 23.128/10:

arin||ipv4|23.128.0.0|4194304||reserved|

Tore
Re: ddos attacks
* James Braunegg
> Of course for any form of Anti DDoS hardware to be functional you
> need to make sure your network can route and pass the traffic so you
> can absorb the bad traffic to give you a chance cleaning the
> traffic.

So in order for an Anti-DDoS appliance to be functional, the network
needs to be able to withstand the DDoS on its own. How terribly
useful.

Tore
Re: ddos attacks
* Dobbins, Roland
> Once again, nothing in my post said or referred to bandwidth;

The post of mine, to which you replied, did.

Perhaps if you had taken your own advice quoted below when replying to
me, Nick wouldn't have been contextually confused.

Tore

> In future, it might be a good idea to ensure that the points one
> attempts to make actually apply to the specific post to which one is
> replying.
Re: IP Fragmentation - Not reliable over the Internet?
* Owen DeLong
> On Aug 27, 2013, at 07:33 , valdis.kletni...@vt.edu wrote:
>> Saku Ytti and Emile Aben have numbers that say otherwise. And there
>> must be a significantly bigger percentage of failures than pretty
>> close to 0, or Path MTU Discovery wouldn't have a reputation of
>> being next to useless.
> No, their numbers describe what happens to single packets of
> differing sizes. Nothing they did describes results of actually
> fragmented packets.

Yes, it did. Hint: 1473 + 8 + 20

Tore
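The hint spelled out (my own arithmetic): a ping with a 1473-byte
payload plus the 8-byte ICMP header and the 20-byte IPv4 header is one
byte larger than a standard 1500-byte MTU, so such a probe does leave
the sender as two fragments:

```python
ICMP_HEADER = 8
IPV4_HEADER = 20
MTU = 1500

total = 1473 + ICMP_HEADER + IPV4_HEADER
print(total)  # 1501 - one byte over a 1500-byte MTU, so fragmentation occurs

# The first fragment fills the MTU; its L3 payload (1480 bytes) is a
# multiple of 8, as fragment offsets require. The second fragment
# carries the single remaining byte.
first_l3_payload = MTU - IPV4_HEADER
second_l3_payload = (total - IPV4_HEADER) - first_l3_payload
print(first_l3_payload, second_l3_payload)  # 1480 1
```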
Re: Evaluating Tier 1 Internet providers
* Richard Hesse
> On Tue, Aug 27, 2013 at 12:14 PM, Joe Abley jab...@hopcount.ca wrote:
>> - response you can expect when you call one day and say our 10GE is
>> maxed out with inbound traffic from apparently everywhere, it has
>> been going on for an hour, please help
> That was good for a laugh. If it's a DoS, you know what the answer
> already is. We no longer offer filtering for any of our customers.
> You must upgrade to the DDoS prevention service. We've actually made
> a list of other companies that share our providers' downstream links
> in each facility and reached out to them. We get them to call up and
> complain to said tier1 provider that something is affecting our
> traffic. That usually gets filters installed - otherwise no dice.

Several providers have a self-service blackholing functionality which
may alleviate DDoS attacks. Typically you announce the attacked /32 or
/128 to your upstreams, tagged with some special blackhole community,
and/or to a special multihop BGP session dedicated for blackholing
purposes. Doing so will cause your upstreams to automatically drop the
attack traffic within their network, *before* it gets to saturate
your uplinks.

Clearly, this is a blunt and last-resort type of tool which will
cement the efficiency of the attack from a global perspective, but
that may be an acceptable trade-off depending on the circumstances;
you may prevent collateral damage from impacting your other customers,
and cutting off the global attack traffic might enable the attacked
customer to serve his primary markets just fine through local peering
sessions, regional transits, and so forth.

I'm not buying transit from a network that doesn't give me such
blackholing functionality, FWIW.

Tore
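A sketch of what such an announcement could look like on the customer
side, here in BIRD-style configuration (purely illustrative; the
prefix, filter name and community value are placeholders - each
provider documents its own blackhole community and session setup):

```
# Hypothetical export filter towards an upstream's blackhole-enabled
# BGP session: announce only the attacked /32, tagged with a
# blackhole community, and nothing else.
filter upstream_blackhole
{
    if net = 192.0.2.66/32 then {
        bgp_community.add((65535, 666));  # placeholder community value
        accept;
    }
    reject;
}
```

The upstream then drops all traffic towards the tagged /32 at its own
borders, sparing the customer's uplinks.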
Re: IANA Reference to hopopt as a protocol
* David Edelman
> Does anyone have an explanation for the IPv6 hopopt appearing as
> protocol value 0 in
> http://www.iana.org/assignments/protocol-numbers?

It's defined in RFC 2460, section 4.3. Which is linked to from the
reference column of the page you linked to...

Tore
Re: IP4 address conservation method
* Blake Hudson
> One thing not mentioned so far in this discussion is using PPPoE or
> some other tunnel/VPN technology for efficient IP utilization. The
> result could be zero wasted IP addresses without the need to resort
> to non-routable IP addresses in a customer's path (as the pdf
> suggested) and without some of the quirkiness or vendor lock-in of
> using ip unnumbered. PPPoE (and other VPNs) have many of the same
> downsides as mentioned above though, they require routing cost and
> increase the complexity of the network. The question becomes which
> deployment has more cost: the simple, yet wasteful, design or the
> efficient, but complex, design.

shameless plug alert

Or, simply just use IPv6, and use a stateless translation service
located in the core network to provide IPv4 connectivity to the
public Internet services. This allows for 100% efficient utilisation
of whatever IPv4 addresses you have left - nothing needs to go to
waste due to router interfaces, power-of-2 subnet overhead, internal
servers/services that have no Internet-available services, etc. - all
without requiring you to do anything special on the server/application
stacks to support it (like setting up tunnel endpoints), adding
dual-stack complexity into your network, or introducing any form of
stateful translation or VPN service into your network.

Here are some more resources:

http://fud.no/talks/20130321-V6_World_Congress-The_Case_for_IPv6_Only_Data_Centres.pdf
http://tools.ietf.org/html/draft-anderson-siit-dc-00

In case you're interested in more, Ivan Pepelnjak and I will host a
(free) webinar about the approach next week. Feel free to join!
http://www.ipspace.net/IPv6-Only_Data_Centers

BTW: I hear Cisco has implemented support for this approach in their
latest ASR1K code, although I haven't confirmed this myself yet.

Tore
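The stateless translation approach maps each public IPv4 address into
IPv6 by embedding it in a translation prefix, per RFC 6052. A minimal
sketch in Python (the prefix and addresses are examples; a real
deployment would typically use its own network-specific prefix):

```python
import ipaddress

def embed_ipv4(prefix96: str, v4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the last 32 bits of a /96 translation
    prefix (the simplest RFC 6052 case)."""
    prefix = ipaddress.IPv6Network(prefix96)
    assert prefix.prefixlen == 96
    return ipaddress.IPv6Address(
        int(prefix.network_address) | int(ipaddress.IPv4Address(v4))
    )

# 64:ff9b::/96 is the well-known translation prefix; 198.51.100.7 is
# a documentation address standing in for a server's public IPv4
# address.
print(embed_ipv4("64:ff9b::/96", "198.51.100.7"))  # 64:ff9b::c633:6407
```

The translator statelessly rewrites headers between the IPv4 form and
the embedded IPv6 form in both directions, which is why no per-flow
state (and no address waste) is needed.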
Re: It's the end of the world as we know it -- REM
* Owen DeLong
> Quite the contrary… I personally think that the abysmal rate of IPv6
> adoption among some content providers (Are you listening, Amazon,
> Xbox, BING?) is just plain shameful.

FWIW, www.bing.com resolves to IPv6 addresses from where I'm sitting
(Oslo), and the page seems to load over IPv6 as well.

Also, Amazon provides some form of IPv6 (I believe it's based on 6RD
or something similar, though). At least, the NLNOG RING has six
Amazon-hosted nodes, all with IPv6 enabled
(amazon0{1..6}.ring.nlnog.net). All of them respond to ICMPv6 pings
from here. Whether or not the average Amazon customer chooses to
enable IPv6 is another story, though...

Tore
Re: It's the end of the world as we know it -- REM
* Andrew Latham
> I have sadly witnessed a growing number of businesses with /24s
> moving to colocation/aws networks and not giving up their unused
> network space. I assume this will come into play soon.

A couple of /24s being returned wouldn't make a significant
difference when it comes to IPv4 depletion. Heck, not even a couple
of /8s would. Trying to reclaim and redistribute unused space would
be a tremendous waste of effort.

> I have already read the news of blackmarket sales of network
> allocations in Europe.

Interesting. Do you have a link or some other kind of reference?

Tore
Re: It's the end of the world as we know it -- REM
* Mikael Abrahamsson
> On Wed, 24 Apr 2013, Tore Anderson wrote:
>>> I have already read the news of blackmarket sales of network
>>> allocations in Europe.
>> Interesting. Do you have a link or some other kind of reference?
> http://www.ripe.net/lir-services/resource-management/listing is a
> white market sales place. Perhaps that's what the previous poster
> meant. Searching for IPv4 broker yields a lot of results as well,
> that might be the black market though.

White market transfers have been allowed in the RIPE region since
late 2008, cf. http://www.ripe.net/ripe/policies/proposals/2007-08.
There's no requirement that the transferred space is put on the NCC's
listing service first - you can use a broker to arrange it if you
want, or do it completely in private.

For a transfer not to be white, the transaction would need to happen
without the NCC's knowledge and blessing. That blessing implies
validation of the receiver's operational need for the allocation, and
an update of the registry/database to reflect the new holder. I'm
genuinely interested in reading articles or other research
documenting that such black market transfers are happening (or not).

Tore
Re: It's the end of the world as we know it -- REM
* Chris Grundemann
> Nope, you are correct Geoff. There is a /10 reserved for transition
> technologies (e.g. outside addresses on a CGN) and there is a
> critical infrastructure reserve, but no general purpose reserve like
> in RIPE and APNIC.

One interesting thing is that this /10 is dedicated specifically to
transition to/deployment of *IPv6*. So the way I understand it, you
won't get any space from this block to number the outside of a
NAT444-style CGN, while you would for a NAT64-style CGN.

https://www.arin.net/policy/nrpm.html#four10

Tore
Re: It's the end of the world as we know it -- REM
* Andrew Latham
> If I can walk around a smallish town and point at 5 businesses like
> this its a possible solution. I am not claiming a few /24s will do,
> I am claiming that there are many (for larger values of many)
> companies like this.

There are certainly thousands or even millions of unused IPv4
addresses in existence. But reclaiming and redistributing them, which
would be a colossal undertaking, would only push back IPv4 depletion
by a few months. It's simply not worth the effort.

>>> I have already read the news of blackmarket sales of network
>>> allocations in Europe.
>> Interesting. Do you have a link or some other kind of reference?
> I did a quick search and they are easy to find. Many news articles
> about Microsoft buying network allocations at auction to set a price
> of ~$11USD per IP. One tangent article that I liked was
> http://www.datacenterknowledge.com/archives/2012/07/16/ipv4-addresses-now-driving-hosting-deals/

Sure, there's a market all right. However, the well publicised
Microsoft/Nortel transfer wasn't a black market transfer, it was done
in accordance with the ARIN community's policies. Straight from the
horse's mouth: https://www.arin.net/about_us/media/releases/20110415.html

Such transfers are also permitted by the community's policies in the
RIPE region, and the NCC maintains a public list of all such
legit/white transfers that have taken place:
https://www.ripe.net/lir-services/resource-management/ipv4-transfers/table-of-transfers

That article mentions a black market, but it falls short of providing
any tangible evidence that it really exists, or to what extent - it
appears to me to be more speculation and conjecture than anything
else.

That said - such speculation may well turn out to be correct, of
course, and being involved in the RIPE community I'm genuinely
interested in the topic. Therefore I was hoping you'd point me in the
direction of the news of blackmarket sales of network allocations in
Europe you mentioned you have read.

Tore
Re: RPKI Support on the Juniper SRX line
* Carlos M. martinez
> the partner insists that Junos 12.3 / 13.1 supports RPKI on the SRX
> line.

JUNOS 12.3 and 13.1 aren't supported on SRX at all. From e.g.
http://www.juniper.net/support/downloads/?p=srx5600 :

«High: Junos OS Release 12.2, 12.3 and 13.1 are not supported On SRX
Series, J Series, LN1000 and WXC-ISM-200 (PSN-2012-09-707).»

Tore
Re: Verizon DSL moving to CGN
* Owen DeLong
> The need for CGN is not divorced from the failure to deploy IPv6, it
> is caused by it.

In a historical context, this is true enough. If we had accomplished
ubiquitous IPv6 deployment ten years ago, there would be no IPv4
depletion, and there would be no CGN. However, that ship sailed long
ago. You're using the present tense where you should have used the
past.

I was responding to Mikael's claim that pushing content providers to
deploy IPv6 is orthogonal to the need for CGN. If we put down the
history books and focus on today's operational realities, it *is*
orthogonal. If you're an ISP fresh out of IPv4 addresses today,
pushing content providers to deploy IPv6 is simply not a realistic
strategy to deal with it. CGN is.

> Clearly your statement here indicates that you see my point that it
> is NOT orthogonal, but, in fact the failure of content providers to
> deploy IPv6 _IS_ the driving cause for CGN.

I'm not sure why you are singling out content providers, BTW. There is
no shortage of other things out there that have an absolute hard
requirement on IPv4 to function properly. Gaming consoles, Android
phones and tablets, iOS phones and tablets[1], home gateways, software
and apps, embedded devices, ... - the list goes on and on.

If the only missing piece of the puzzle was the lack of IPv6 support
on the content providers' side, IPv6+NAT64 would constitute a
perfectly viable residential/cellular internet service. As far as I
know, however, not a single provider is seriously considering this
strategy going forward. That's telling.

Tore

[1] From what I hear, anyway. They used to work fine on IPv6-only
wireless networks, I've seen it myself, but I've been told that it's
taken a turn for the worse over the course of the last year.
Re: Verizon DSL moving to CGN
* Mikael Abrahamsson
> On Mon, 8 Apr 2013, Rajiv Asati (rajiva) wrote:
>> MAP is all about stateless (NAT64 or Encapsulation) and IPv6
>> enabled access. MAP makes much more sense in any SP network having
>> its internet customers do IPv4 address sharing and embrace IPv6.
> It's still NAT.

AIUI, the standards-track flavour of MAP, MAP-E, is *not* NAT - it is
tunneling: pure encap/decap plus a clever way to calculate the outer
IPv6 src/dst addresses from the inner IPv4 addresses and ports. The
inner IPv4 packets are not modified by the centralised MAP tunneling
routers, so there is no Network Address Translation being performed.

The tunnel endpoint will in 99.99% of cases be a CPE with a NAPT44
component though, so there is some NAT involved in the overall
solution, but it's pretty much the same as what we have in today's
CPEs/HGWs. The only significant difference is that a MAP CPE must be
prepared to not be able to use all 65536 source ports.

Tore
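The restricted port sets mentioned above come from the MAP
port-mapping algorithm (later published as RFC 7597). A small Python
sketch (my own illustration; the PSID length and offset are example
values - a real MAP domain's parameters come from its mapping rules):

```python
def map_ce_ports(psid: int, psid_len: int, offset: int = 6):
    """Enumerate the source ports a MAP CE with the given PSID may
    use, following the MAP port-mapping algorithm with the default
    6-bit port offset."""
    m = 16 - offset - psid_len  # contiguous low-order port bits
    ports = []
    # a = 0 is skipped, which excludes the well-known ports 0-1023
    # from every CE's port set.
    for a in range(1, 2 ** offset):
        base = (a << (16 - offset)) | (psid << m)
        ports.extend(range(base, base + 2 ** m))
    return ports

# A 2-bit PSID means a 4:1 sharing ratio; each CE gets roughly a
# quarter of the usable ports, scattered across 63 ranges.
ports = map_ce_ports(psid=3, psid_len=2)
print(len(ports))  # 16128 ports out of 65536
print(min(ports))  # 1792 - nothing below 1024 is ever assigned
```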