RE: What are these Google IPs hammering on my DNS server?

2023-12-05 Thread Michael Hare via NANOG
Damian-

Not Google's or ISC's fault; our customers have made some decisions that have 
exacerbated the issues.  Far and away the biggest problem facing my customers is 
that they have chosen a stateful border firewall that collapses due to session 
exhaustion, and they put everything, including authoritative DNS, behind said 
firewall.  “If it hurts, don’t do it” comes to mind, but it's out of my hands.

At a quick glance, following the ISC link I didn't see the compute infrastructure 
[core count] needed to reach 1Mpps.  There is an obvious difference between a 
99th-percentile load of ~500rps and 1Mpps, so maybe the advice is to not 
undersize authoritative DNS if that's an issue.

I'm an ISP engineer and am generally not the directly affected party, so I 
don't get to pick these implementation details for my customers.  I appreciate 
the background and suggestions from you and others on this thread like Mark.  
That's an interesting comment about DNSSEC that I hadn't considered.

-Michael

From: Damian Menscher 
Sent: Monday, December 4, 2023 12:21 PM
To: Michael Hare 
Cc: John R. Levine ; nanog@nanog.org
Subject: Re: What are these Google IPs hammering on my DNS server?

Google Public DNS (8.8.8.8) attempts to identify and filter abuse, and while we 
think we're fairly effective for large attacks (e.g., those above 1Mpps), it gets 
more challenging (due to risk of false positives) to adequately filter small 
attacks.  I should note that we generally see the attack traffic coming from 
botnets, or forwarding resolvers that blend the attack traffic with legitimate 
traffic.

Based on ISC BIND load-tests [0], a single DNS server can handle O(1Mpps).  
Also, no domain should be served by a single DNS server, so O(1Mpps) seems like 
a safe lower-bound for small administrative domains (larger ones will have more 
redundancy/capacity).  Based on these estimates, we haven't treated mitigation 
of small attacks as a high priority.  If O(25Kpps) attacks are causing real 
problems for the community, I'd appreciate that feedback and some hints as to 
why your experience differs from the ISC BIND load-tests.  With a better 
understanding of the pain-points, we may be able to improve our filtering a 
bit, though I suspect we're nearing the limits of what is attainable.

Since it was mentioned up-thread, I'd caution against dropping queries from 
likely-legitimate recursives, as that will lead to a retry storm that you won't 
like (based on a few reports of authoritatives who suffered outages, the retry 
storm increased demand by 30x and they initially misdiagnosed the root cause as 
a DDoS).  The technically correct (if not entirely practical) mitigation for a 
DNS cache-busting attack laundered through open recursives is to deploy DNSSEC 
and issue NSEC/NSEC3 responses to allow the recursives to cache the 
non-existence of the randomized labels.
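
For illustration, a minimal sketch of the authoritative side using BIND 9.16+ 
automatic signing (the zone name, file, and options below are placeholders, not 
a recommendation for any specific deployment); once the zone is signed, 
resolvers that implement RFC 8198 aggressive NSEC caching can synthesize 
NXDOMAIN for the random labels from cached NSEC/NSEC3 records instead of 
re-querying the authoritative:

  zone "example.com" {
      type primary;
      file "db.example.com";
      dnssec-policy default;   // automatic key management and signing (NSEC by default)
      inline-signing yes;      // sign in place without requiring dynamic updates
  };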

[0] https://www.isc.org/blogs/bind-performance-september-2023/

Damian
--
Damian Menscher :: Security Reliability Engineer :: Google :: AS15169

On Sun, Dec 3, 2023 at 1:22 PM Michael Hare via NANOG <nanog@nanog.org> wrote:
John-

This is little consolation, but at AS3128 I see the same thing toward our 
downstream at times, claiming to come from both 13335 and 15169, often 
simultaneously, to the tune of 25Kpps, "assuming it's not spoofed", which is 
pragmatically impossible for me to prove given our indirect relationships with 
these companies.  When I see these events, I typically also see a wide variety 
of country codes participating simultaneously.  Again, assuming it's not 
spoofed.  To me it just looks like effective harassment with 13335/15169 
helping out.  I pine for the internet of the 1990s.

Recent events in GMT for us were the following; curious if you see the same:
~ Nov 26 05:40
~ Nov 30 00:40
~ Nov 30 05:55

Application-agnostic and on the low-$ end for "fixes": if it's either do something 
or face an outage, I've found some utility in short-term automated DSCP 
coloring on ingress paired with light-touch policing as close to the end host 
as possible, which at least keeps things mostly working as long as traffic 
conforms.  Cheap/fast and working ... most of the time.  Definitely not 
great or complete at all, and a role I'd rather not play as an educational 
ISP/enterprise.

So what are most folks doing to survive crap like this?  Nothing/waiting it 
out?  Outsourcing DNS?  A scrubbing appliance?  Poor man's stuff like I mention 
above?

-Michael

> -Original Message-
> From: NANOG  On
> Behalf Of John R. Levine
> Sent: Sunday, December 3, 2023 1:18 PM
> To: Peter Potvin <peter.pot...@accuristechnologies.ca>
> Cc: nanog@nanog.org
> Subject: Re: What are these Google IPs hammering on my DNS server?
>
> > Did a bit of digging on Google's developer site and came across this:
> > https://developers.google.com/speed/public-dns/faq#locations_of_ip_address_ranges_google_public_dns_uses_to_send_queries

RE: What are these Google IPs hammering on my DNS server?

2023-12-03 Thread Michael Hare via NANOG
John-

This is little consolation, but at AS3128 I see the same thing toward our 
downstream at times, claiming to come from both 13335 and 15169, often 
simultaneously, to the tune of 25Kpps, "assuming it's not spoofed", which is 
pragmatically impossible for me to prove given our indirect relationships with 
these companies.  When I see these events, I typically also see a wide variety 
of country codes participating simultaneously.  Again, assuming it's not 
spoofed.  To me it just looks like effective harassment with 13335/15169 
helping out.  I pine for the internet of the 1990s.

Recent events in GMT for us were the following; curious if you see the same:
~ Nov 26 05:40
~ Nov 30 00:40
~ Nov 30 05:55

Application-agnostic and on the low-$ end for "fixes": if it's either do something 
or face an outage, I've found some utility in short-term automated DSCP 
coloring on ingress paired with light-touch policing as close to the end host 
as possible, which at least keeps things mostly working as long as traffic 
conforms.  Cheap/fast and working ... most of the time.  Definitely not 
great or complete at all, and a role I'd rather not play as an educational 
ISP/enterprise.
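
For what it's worth, a rough sketch of the general idea in Junos-style config 
(filter, policer, and prefix-list names and the rates are placeholders, and the 
automation that populates the prefix list is separate): color suspect queries 
at the network ingress, then police on that DSCP value as close to the end host 
as possible so conforming traffic keeps flowing.

  firewall {
      policer LIGHT-TOUCH-5M {
          if-exceeding {
              bandwidth-limit 5m;
              burst-size-limit 128k;
          }
          then discard;
      }
      family inet {
          /* applied inbound on the border interface during an event */
          filter COLOR-SUSPECT-DNS {
              term suspect {
                  from {
                      source-prefix-list SUSPECT-RESOLVERS;  /* defined under policy-options, fed by automation */
                      protocol udp;
                      destination-port 53;
                  }
                  then {
                      dscp 8;          /* CS1 / scavenger */
                      accept;
                  }
              }
              term everything-else {
                  then accept;
              }
          }
          /* applied outbound toward the protected host */
          filter POLICE-SCAVENGER {
              term scavenger {
                  from {
                      dscp 8;
                  }
                  then {
                      policer LIGHT-TOUCH-5M;
                      accept;
                  }
              }
              term everything-else {
                  then accept;
              }
          }
      }
  }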

So what are most folks doing to survive crap like this?  Nothing/waiting it 
out?  Outsourcing DNS?  A scrubbing appliance?  Poor man's stuff like I mention 
above?

-Michael 

> -Original Message-
> From: NANOG  On
> Behalf Of John R. Levine
> Sent: Sunday, December 3, 2023 1:18 PM
> To: Peter Potvin 
> Cc: nanog@nanog.org
> Subject: Re: What are these Google IPs hammering on my DNS server?
> 
> > Did a bit of digging on Google's developer site and came across this:
> > https://developers.google.com/speed/public-dns/faq#locations_of_ip_address_ranges_google_public_dns_uses_to_send_queries
> >
> > Looks like the IPs you mentioned belong to Google's public DNS resolver
> > based on that list on their site. They could also be spoofed, though, as
> > part of a DNS amplification attack, so keep that in mind.
> 
> Per my recent message, the replies are tiny so if it's an amplification
> attack, it's a very incompetent one.  The queries are case randomized so I
> guess it's really Google.  Sigh.
> 
> If anyone is wondering, I have a passive aggressive countermeasure against
> some overqueriers that returns ten NS referral names, and then 25 random
> IP addresses for each of those names, but I don't do that to Google.
> 
> R's,
> John
> 
> > --
> > *Accuris Technologies Ltd.*
> >
> >
> > On Sun, Dec 3, 2023 at 1:51 PM John Levine  wrote:
> >
> >> At contacts.abuse.net, I have a little stunt DNS server that provides
> >> domain contact info, e.g.:
> >>
> >> $ host -t txt comcast.net.contacts.abuse.net
> >> comcast.net.contacts.abuse.net descriptive text "ab...@comcast.net"
> >>
> >> $ host -t hinfo comcast.net.contacts.abuse.net
> >> comcast.net.contacts.abuse.net host information "lookup" "comcast.net"
> >>
> >> Every once in a while someone decides to look up every domain in the
> >> world and DoS'es it until I update my packet filters. This week it's
> >> been this set of IPs that belong to Google. I don't think they're
> >> 8.8.8.8. Any idea what they are? Random Google Cloud customers? A
> >> secret DNS mapping project?
> >>
> >>  172.253.1.133
> >>  172.253.206.36
> >>  172.253.1.130
> >>  172.253.206.37
> >>  172.253.13.196
> >>  172.253.255.36
> >>  172.253.13.197
> >>  172.253.1.131
> >>  172.253.255.35
> >>  172.253.255.37
> >>  172.253.1.132
> >>  172.253.13.193
> >>  172.253.1.129
> >>  172.253.255.33
> >>  172.253.206.35
> >>  172.253.255.34
> >>  172.253.206.33
> >>  172.253.206.34
> >>  172.253.13.194
> >>  172.253.13.195
> >>  172.71.125.63
> >>  172.71.117.60
> >>  172.71.133.51
> >>
> >> R's,
> >> John
> >>
> >
> 
> Regards,
> John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for
> Dummies",
> Please consider the environment before reading this e-mail. https://jl.ly


RE: Long hops on international paths

2022-01-18 Thread Michael Hare via NANOG
Paul-

You said: "... would decide to configure MPLS paths between Chicago and distant 
international locations ..."

AS3128 runs MPLS, and someone will probably correct me here, but for an IGP 
backbone area I think it's common for there to be a full mesh of LSPs via 
LDP, RSVP, SR, etc.  AS3128 is a small regional and we operate that 
way across 60+ nodes.  I don't know if it's common for someone with a global 
footprint like 1299 to have a contiguous global MPLS backbone, but the point of 
my reply was to say it's not impossible to think 1299 has a global MPLS mesh 
between major POPs.
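
As a rough sketch in Junos-style config (interface scope is a placeholder, and 
an RSVP-TE or SR-MPLS design would look different): with LDP enabled on every 
core-facing interface, each PE learns a labeled path to every other PE loopback 
via the IGP, i.e. a full mesh of LSPs with no per-LSP configuration.

  protocols {
      mpls {
          interface all;    /* family mpls must also be enabled on these interfaces */
      }
      ldp {
          interface all;
      }
  }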

-Michael

> -Original Message-
> From: NANOG  On
> Behalf Of PAUL R BARFORD
> Sent: Tuesday, January 18, 2022 8:16 AM
> To: Saku Ytti 
> Cc: Esteban Carisimo ;
> nanog@nanog.org; Fabian E. Bustamante 
> Subject: Re: Long hops on international paths
> 
> Hello Saku,
> 
> Thank you for the summary.  We're clear about the fact that what we're
> seeing are MPLS paths - that was not in question.  What we are not clear
> about and the reason for the post is why the provider - zayo.telia in this 
> case
> - would decide to configure MPLS paths between Chicago and distant
> international locations.  We assumed we would see hops in traceroute
> between Chicago and coastal locations and then hops that transited
> submarine infrastructure followed by hops to large population centers.
> 
> Regards, PB
> 
> 
> From: Saku Ytti 
> Sent: Tuesday, January 18, 2022 12:50 AM
> To: PAUL R BARFORD 
> Cc: Lukas Tribus ; Esteban Carisimo
> ; nanog@nanog.org
> ; Fabian E. Bustamante
> 
> Subject: Re: Long hops on international paths
> 
> 1) all (meaning all hitting the zayo.telia) your traceroutes originate
> from University in Chicago
> 2) the zayo.telia device is physically close to the university
> 3) we should expect physically close-by backbone device to be present
> in disproportionate amount of traceroutes
> 4) almost certainly zayo.telia is imposing the MPLS label with TTL 255,
> _NOT_ copying the IP TTL, therefore until the MPLS label is popped, the TTL
> does not expire. I.e. you are seeing the ingress PE and egress PE of Telia;
> you are not seeing any P routers.
> 
> This is not esoteric knowledge, but a fairly basic Internet concept. I
> am worried you are missing too much context to produce actionable
> output from your work. It might be interesting to see your curriculum and
> why this confusion arose - why it seemed logical that the reason must be
> that almost all waves are terminated there - because it would not seem
> logical to people practising in the field with even a cursory
> understanding; this implies problems in the curriculum.
> 
> On Tue, 18 Jan 2022 at 07:21, PAUL R BARFORD  wrote:
> >
> > Please find the examples for the case of Telia below.
> >
> > FROM jfk-us (jfk-us.team-probing.c008820.20201002.warts.gz)
> >
> >
> >
> > traceroute from 216.66.30.102 (Ark probe hosted in New York City, NY, US.
> No AS info found) to 223.114.235.32 (MAXMIXD: Turpan, CN)
> >
> > 1  216.66.30.101  0.365 ms
> >
> > 2  62.115.49.173  3.182 ms
> >
> > 3  *
> >
> > 4  62.115.137.59  17.453 ms [x] (chi-b23-link.ip.twelve99.net., CAIDA-
> GEOLOC -> Chicago, IL, US)
> >
> > 5  62.115.117.48  59.921 ms [x] (sea-b2-link.ip.twelve99.net., RIPE-IPMAP ->
> Seattle, WA, US)
> >
> > 6  62.115.171.221  69.993 ms
> >
> > 7  223.120.6.53  69.378 ms
> >
> > 8  223.120.12.34  226.225 ms
> >
> > 9  221.183.55.110  237.475 ms
> >
> > 10  221.183.25.201  238.697 ms
> >
> > 11  221.176.16.213  242.296 ms
> >
> > 12  221.183.36.62  352.695 ms
> >
> > 13  221.183.39.2  300.166 ms
> >
> > 14  117.191.8.118  316.270 ms
> >
> > 15  *
> >
> > 16  *
> >
> > 17  *
> >
> > 18  *
> >
> > 19  *
> >
> >
> >
> >
> >
> > FROM ord-us (ord-us.team-probing.c008820.20201002.warts.gz)
> >
> >
> >
> > traceroute from 140.192.218.138 (Ark probe hosted in Chicago, IL, US at
> Depaul University-AS20120) to 109.25.215.237 (237.215.25.109.rev.sfr.net.,
> MAXMIXD: La Crau, FR)
> >
> > 1  140.192.218.129  0.795 ms
> >
> > 2  140.192.9.124  0.603 ms
> >
> > 3  64.124.44.158  1.099 ms
> >
> > 4  64.125.31.172  3.047 ms
> >
> > 5  *
> >
> > 6  64.125.15.65  1.895 ms  [x] (zayo.telia.ter1.ord7.us.zip.zayo.com.,
> CAIDA-GEOLOC -> Chicago, IL, US)
> >
> > 7  62.115.118.59  99.242 ms[x] (prs-b3-link.ip.twelve99.net., CAIDA-
> GEOLOC -> Paris, FR)
> >
> > 8  62.115.154.23  105.214 ms
> >
> > 9  77.136.10.6  119.021 ms
> >
> > 10  77.136.10.6  118.830 ms
> >
> > 11  80.118.89.202  118.690 ms
> >
> > 12  80.118.89.234  118.986 ms
> >
> > 13  109.24.108.66  119.159 ms
> >
> > 14  109.25.215.237  126.085 ms
> >
> >
> >
> >
> >
> > traceroute from 140.192.218.138 (Ark probe hosted in Chicago, IL, US at
> Depaul University-AS20120) to 84.249.89.93 (dsl-tkubng12-54f959-
> 93.dhcp.inet.fi., MAXMIXD: Turku, FI)
> >
> > 1  140.192.218.129  0.243 ms
> >
> > 2  140.192.9.124  0.326 ms
> >
> > 3  64.124.44.158  0.600 ms
> >
> > 4  *
> >
> > 5  *
> >
> > 6  

RE: BGP Route Monitoring

2022-01-06 Thread Michael Hare via NANOG
Re: Adam's advice about IOS XR SNMP access to a VRF: while this experience may be 
a bit dated [IOS XR 5.x], in production we have used "snmp-server community-map 
$x context $y".  I will say we weren't pleased; we noticed that context 
switching didn't work well.  For example, if our poller tried to poll the global 
community and the VRF community at the same time, the results were 
non-deterministic.  Maybe this has been improved in later versions, or if 
you always have a single poller that polls strictly sequentially, you may never 
see this.

Although it is more work, BMP is probably the better/safer approach.
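
As a rough sketch of the IOS XR side (station address, port, ASN, and neighbor 
below are placeholders, and the exact knobs vary by release, so treat this as a 
starting point rather than a recipe): the router streams its per-neighbor 
Adj-RIB-In to a BMP collector, which can then alert when a given prefix 
disappears from a given peer, with no SNMP polling or screen-scraping involved.

  bmp server 1
   host 192.0.2.10 port 11019
   update-source Loopback0
  !
  router bgp 64512
   neighbor 198.51.100.1
    bmp-activate server 1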

-Michael

From: NANOG  On Behalf Of Adam 
Thompson
Sent: Thursday, January 6, 2022 12:41 PM
To: Sandoiu Mihai ; nanog@nanog.org
Subject: RE: BGP Route Monitoring

Most monitoring products allow you to monitor custom SNMP OIDs, and your entire 
BGP RIB is - usually - exposed via SNMP.
Most monitoring products also treat "missing" OIDs specially, and can alert on 
that fact.
At least, that's how I would start doing it.
We use Observium here, and it can do what you want, albeit with a little bit of 
futzing around in the Custom OID and Alerts sections.
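
As a hedged illustration of the kind of polling this implies (the community, 
router, prefix, and peer below are placeholders, and not every platform fully 
populates this table, especially for non-default VRFs): the classic BGP4-MIB 
bgp4PathAttrTable is indexed by {prefix, prefix length, peer}, so a monitoring 
system can walk it and alert when the instance for a given route/peer stops 
being returned.

  # Walk received BGP paths (1.3.6.1.2.1.15.6 = bgp4PathAttrTable, numeric output)
  # and check that 203.0.113.0/24 is still present from peer 192.0.2.1.
  snmpwalk -v2c -On -c "$COMMUNITY" "$ROUTER" 1.3.6.1.2.1.15.6 \
      | grep '203\.0\.113\.0\.24\.192\.0\.2\.1'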

Cisco does weird things with getting SNMP data from VRFs, though, so... YMMV.  
I know there used to be a Cisco-proprietary way to select which VRF you were 
polling common OIDs from, but don't remember the details.
-Adam

Adam Thompson
Consultant, Infrastructure Services
100 - 135 Innovation Drive
Winnipeg, MB, R3T 6A8
(204) 977-6824 or 1-800-430-6404 (MB only)
athomp...@merlin.mb.ca
www.merlin.mb.ca

From: NANOG <nanog-bounces+athompson=merlin.mb...@nanog.org> On Behalf Of Sandoiu Mihai
Sent: Thursday, January 6, 2022 4:35 AM
To: nanog@nanog.org
Subject: BGP Route Monitoring

Hi

I am looking for a route-monitoring product that does the following:
-checks whether a specific BGP route from a specific neighbor is present in the 
BGP table (in some VRF, not necessarily the internet routing VRF) of an ASR9K 
running IOS XR
-sends a syslog message or an alarm if the route goes missing

The use case is the following: we are receiving the same routes over 2 or more 
BGP peerings, and due to best-route selection we cannot currently tell whether 
one of the routes has ceased to be received over a certain peering.

Alternative approach: a product that measures the number of BGP prefixes 
received from a certain peer.

Do you know of such a product that is readily available and does not require SSH 
sessions to the routers and parsing of the outputs?
I am trying to find a solution that does not require much scripting or 
customization.

Many thanks.

Regards
Mihai



RE: Partial vs Full tables

2020-06-11 Thread Michael Hare via NANOG
Mark (and others),

I used to run loose uRPF on peering/transit links for AS3128 because I used to 
think that tightening the screws was always the "right thing to do".   

I instrumented at 60s granularity with vendor J uRPF drop counters on these 
links.  Drops during steady state [BGP converged] were few [Kbps].  Drops 
during planned maintenance were at much higher rates for a few minutes.
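
For context, the sort of vendor J configuration I mean is roughly the following 
(the interface and filter names are placeholders); the fail-filter is just 
there so the drops land on a named counter that can be graphed:

  firewall {
      family inet {
          filter RPF-FAIL {
              term count-and-drop {
                  then {
                      count rpf-fail;
                      discard;
                  }
              }
          }
      }
  }
  interfaces {
      ae0 {
          unit 0 {
              family inet {
                  rpf-check {
                      fail-filter RPF-FAIL;
                      mode loose;
                  }
              }
          }
      }
  }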

What was happening: I advertise a handful of routes to transit/peers from 
multiple ASBRs.  Typically my ASBR sees 800K FIB and a few million RIB routes.  
We all know this takes a good amount of time to churn.

For planned maintenance of ASBR A [cold boot upgrades], if recovery didn't 
include converging my inbound routes before resuming eBGP advertisements, I'd be 
tossing packets due to loose uRPF.

Remember, during this time 'ASBR B' in my AS is happily egressing traffic.  As 
soon as 'ASBR A' advertises my dozen or so prefixes via eBGP, I start to see 
return traffic arrive well before 'ASBR A' has converged.  No more-specific 
return route yet, other than maybe default for a few minutes if unlucky.  The 
result is a bit bucket network-wide despite ASBR B functioning just fine.

Maybe everyone already converges inbound before advertising via eBGP and I made 
a rookie mistake, but what about unplanned events?

For me the summary is that I was causing more collateral damage than good 
[verified by time-series data], so I turned off loose uRPF.  YMMV.

-Michael

> -Original Message-
> From: NANOG  On Behalf Of Mark Tinka
> Sent: Thursday, June 11, 2020 12:14 PM
> To: nanog@nanog.org
> Subject: Re: Partial vs Full tables
> 
> 
> 
> On 10/Jun/20 19:31, William Herrin wrote:
> 
> >
> > Sorry, it'd be pre-coffee if I drank coffee and I was overly harsh
> > here. Let me back up:
> >
> > The most basic spoofing protection is: don't accept remote packets
> > pretending to be from my IP address.
> >
> > Strict mode URPF extends this to networks: don't accept packets on
> > interfaces where I know for sure the source host isn't in that
> > direction. It works fine in network segments whose structure requires
> > routes to be perfectly symmetrical: on every interface, the packet for
> > every source can only have been from one particular next hop, the same
> > one that advertises acceptance of packets with that destination. The
> > use of BGP breaks the symmetry requirement so close to always that you
> > may as well think of it as always. Even with a single transit or a
> > partial table. Don't use strict mode URPF on BGP speakers.
> >
> > Loose mode URPF is... broken. It was a valiant attempt to extend
> > reverse path filtering into networks with asymmetry but I've yet to
> > discover a use where there wasn't some faulty corner case. If you
> > think you want to use loose mode RPF, trust me: you've already passed
> > the point where any RPF was going to be helpful to you. Time to set it
> > aside and solve the problem a different way.
> 
> We don't run Loose Mode on peering routers because they don't carry a
> full table. If anyone sent the wrong packets that way, they wouldn't be
> able to leave the box anyway.
> 
> We do run Loose Mode on transit routers, no issues thus far.
> 
> We do run Strict Mode on customer-facing links that are stub-homed to us
> (DIA). We also run Loose Mode on customer-facing links that buy transit
> (BGP).
> 
> But mostly, BCP-38 deployed at the edge (peering, transit and customer
> routers) also goes a long way in protecting the network.
> 
> Mark.


RE: Partial vs Full tables

2020-06-05 Thread Michael Hare via NANOG
Saku-

> In the internal network, instead of having a default route in iBGP or the IGP,
> you should have the same loopback address on every full-DFZ router and
> advertise that loopback in the IGP. Then non-full-DFZ routers should static
> route default to that loopback, always reaching the IGP-closest full-DFZ
> router.

Just because a DFZ-role device can advertise the loopback unconditionally in the 
IGP doesn't mean that DFZ router actually has a valid eBGP or iBGP session to 
another DFZ router.  It may be contrived, but could this not be a possible way 
to blackhole nearby PEs?

We currently take a full RIB and I am doing a full FIB.  I'm choosing to create 
a default aggregate for downstream default-only connectors based on something 
like:

 from {
protocol bgp;
as-path-group transit-providers;
route-filter 0.0.0.0/0 prefix-length-range /8-/10;
route-type external;
}

Of course there is something functionally equivalent for v6.  I have time-series 
data on the count of routes contributing to the aggregate, which helps a bit 
with peace of mind about default being pulled when it shouldn't be.  Like all 
tricks of this type, I recognize this is susceptible to default being 
synthesized when it shouldn't be.
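
Roughly, the from-clause above hangs off a contributor policy on a generated 
(or aggregate) route, along these lines (the policy name is a placeholder and 
this is a sketch rather than our literal config); the default only exists while 
at least one qualifying transit-learned route is contributing:

  policy-options {
      policy-statement DEFAULT-CONTRIBUTORS {
          term transit-learned {
              from {
                  protocol bgp;
                  as-path-group transit-providers;
                  route-filter 0.0.0.0/0 prefix-length-range /8-/10;
                  route-type external;
              }
              then accept;
          }
          then reject;
      }
  }
  routing-options {
      generate {
          route 0.0.0.0/0 policy DEFAULT-CONTRIBUTORS;
      }
  }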

I'm considering an approach similar to Tore's blog where at some point I keep 
the full RIB but selectively populate the FIB.  Tore, care to comment on why 
you decided to filter the RIB as well?

-Michael