Re: Famous operational issues

2021-02-18 Thread Suresh Ramasubramanian
Did you at least hire the janitor?

From: NANOG on behalf of Mark Tinka
Date: Friday, 19 February 2021 at 10:20 AM
To: nanog@nanog.org 
Subject: Re: Famous operational issues

On 2/19/21 00:37, Warren Kumari wrote:

5: Another one. In the early 2000s I was working for a dot-com boom company. We 
are building out our first datacenter, and I'm installing a pair of Cisco 7206s 
in 811 10th Ave. These will run basically the entire company, we have some 
transit, we have some peering to configure, we have an AS, etc. I'm going to be 
configuring all of this; clearly I'm a router-god...
Anyway, while I'm getting things configured, this janitor comes past, wheeling 
a garbage bin. He stops outside the cage and says "Whatcha doin'?". I go into 
this long explanation of how these "routers" will connect to "the Internet" to 
allow my "servers" to talk to other "computers" on "the Internet" <with the 
waving of the hands>. He pauses for a second, and says "'K. So, you doing a 
full iBGP mesh, or confeds?". I really hadn't intended to be a condescending 
ass, but I think of that every time I realize I might be assuming something 
about someone based on their attire/job/etc.

:-), cute.

Mark.


Re: Famous operational issues

2021-02-18 Thread George Herbert
Northridge quake.  I was #2 and on call at CRL.  That One Guy on dialup in 
Atlanta playing MUDs 23x7 pages that things are down.  I wander out to my 
computer to dial in and see what’s up, turned on TV walking past it, sat down 
and turned computer on, as it was booting on comes a live helicopter shot over 
Northridge showing the 1.5 remaining floors of the 3-story Cable and Wireless 
building our east coast connector went through.

Took a second to listen and make sure I understood what was happening, changed 
channels to verify it wasn’t a stunt, logged on and pinged our router there to 
confirm nothing there, call & wake up Jim: “East coast’s down because 
earthquake in Northridge and the C&W center fell down.”

“oh.”

And then there was the Sidekick outage...


-George 

Sent from my iPhone

> On Feb 18, 2021, at 4:37 PM, Patrick W. Gilmore  wrote:
> 
> On Feb 18, 2021, at 6:10 PM, Karl Auer  wrote:
>> 
>> I think it was Machiavelli who said that one should not ascribe to
>> malice anything adequately explained by incompetence…
> 
> https://en.wikipedia.org/wiki/Hanlon%27s_razor
>Never attribute to malice that which is adequately explained by stupidity.
> 
> I personally prefer this version from Robert A. Heinlein:
>Never underestimate the power of human stupidity.
> 
> And to put it on topic, cover your EPOs
> 
> In 1994, there was a major earthquake near the city of Los Angeles. City hall 
> had to be evacuated and it would take over a year to reinforce the building 
> to make it habitable again. My company moved all the systems in the basement 
> of city hall to a new datacenter a mile or so away. After the install, we 
> spent more than a week coaxing their ancient (even for 1994) machines back 
> online, such as a Prime Computer and an AS400 with tons of DASD. Well, tons 
> of cabinets, certainly less storage than my watch has now.
> 
> I was in the DC going over something with the lady in charge when someone 
> walked in to ask her something. She said “just a second”. That person took 
> one step to the side of the door and leaned against the wall - right on an 
> EPO which had no cover.
> 
> Have you ever heard an entire row of DASD spin down instantly? Or taken 40 
> minutes to IPL an AS400? In the middle of the business day? For the second 
> most populous city in the country?
> 
>Me: Maybe you should get a cover for that?
>Her: Good idea.
> 
> Couple weeks later, in the same DC, going over final checklist. A fedex guy 
> walks in. (To this day, no idea how he got in a supposedly locked DC.) She 
> says “just a second”, and I get a very strong deja vu feeling. He takes one 
> step to the side and leans against the wall.
> 
>Me: Did you order that EPO cover?
>Her: Nope.
> 
> -- 
> TTFN,
> patrick
> 


Re: Famous operational issues

2021-02-18 Thread bzs


One day I got called into the office supplies area because there was a
smell of something burning. Uh-oh.

To make a long story short there was a stainless steel bowl which was
focusing the sun from a window such that it was igniting a cardboard
box.

Talk about SMH and random bad luck which could have been a lot worse,
nothing really happened other than some smoke and char.

On February 18, 2021 at 01:07 eric.kuh...@gmail.com (Eric Kuhnke) wrote:
 > On that note, I'd be very interested in hearing stories of actual incidents
 > that are the cause of why cardboard boxes are banned in many facilities, due
 > to loose particulate matter getting into the air and setting off very
 > sensitive fire detection systems.
 > 
 > Or maybe it's more mundane and 99% of the reason is people unpack stuff and
 > don't always clean up properly after themselves.
 > 
 > On Wed, Feb 17, 2021, 6:21 PM Owen DeLong  wrote:
 > 
 > Stolen isn’t nearly as exciting as what happens when your (used) 6509
 > arrives and gets installed and operational before anyone realizes that
 > the conductive packing peanuts that it was packed in have managed to
 > work their way into various midplane connectors. Several hours later
 > someone notices that the box is quite literally smoldering in the colo
 > and the resulting combination of panic, fire drill, and management
 > antics that ensue.
 > 
 > Owen
 > 
 > 
 > > On Feb 16, 2021, at 2:08 PM, Jared Mauch  wrote:
 > >
 > > I was thinking about how we need a war stories nanog track. My favorite
 > > was being on call when the router was stolen.
 > >
 > > Sent from my TI-99/4a
 > >
 > >> On Feb 16, 2021, at 2:40 PM, John Kristoff  wrote:
 > >>
 > >> Friends,
 > >>
 > >> I'd like to start a thread about the most famous and widespread
 > >> Internet operational issues, outages or implementation
 > >> incompatibilities you have seen.
 > >>
 > >> Which examples would make up your top three?
 > >>
 > >> To get things started, I'd suggest the AS 7007 event is perhaps the
 > >> most notorious and likely to top many lists including mine.  So if
 > >> that is one for you I'm asking for just two more.
 > >>
 > >> I'm particularly interested in this as the first step in developing a
 > >> future NANOG session.  I'd be particularly interested in any issues
 > >> that also identify key individuals that might still be around and
 > >> interested in participating in a retrospective.  I already have
 > >> someone that is willing to talk about AS 7007, which shouldn't be
 > >> hard to guess who.
 > >>
 > >> Thanks in advance for your suggestions,
 > >>
 > >> John
 > 
 > 

-- 
-Barry Shein

Software Tool & Die| b...@theworld.com | http://www.TheWorld.com
Purveyors to the Trade | Voice: +1 617-STD-WRLD   | 800-THE-WRLD
The World: Since 1989  | A Public Information Utility | *oo*


Re: Famous operational issues

2021-02-18 Thread Mark Tinka



On 2/19/21 00:37, Warren Kumari wrote:



5: Another one. In the early 2000s I was working for a dot-com boom 
company. We are building out our first datacenter, and I'm installing 
a pair of Cisco 7206s in 811 10th Ave. These will run basically the 
entire company, we have some transit, we have some peering to 
configure, we have an AS, etc. I'm going to be configuring all of 
this; clearly I'm a router-god...
Anyway, while I'm getting things configured, this janitor comes past, 
wheeling a garbage bin. He stops outside the cage and says "Whatcha 
doin'?". I go into this long explanation of how these "routers" will 
connect to "the Internet" to allow my "servers" to talk to other 
"computers" on "the Internet" <with the waving of the hands>. He 
pauses for a second, and says "'K. So, you doing a full iBGP mesh, or 
confeds?". I really hadn't intended to be a condescending ass, but I 
think of that every time I realize I might be assuming something 
about someone based on their attire/job/etc.


:-), cute.

Mark.


Re: Texas internet connectivity declining due to blackouts

2021-02-18 Thread Mark Tinka




On 2/17/21 16:09, Ben Cannon wrote:

https://www.dallasnews.com/business/energy/2021/02/16/electricity-retailer-griddys-unusual-plea-to-texas-customers-leave-now-before-you-get-a-big-bill/ 
 



The power market in Texas has utterly failed.


Griddy aren't greedy. Pity about the grid.

Mark.


Re: Carrier Neutral Site - Freetown, Sierra Leone?

2021-02-18 Thread Mark Tinka



On 2/18/21 19:45, Rod Beck wrote:

Every time I try to bring a circuit into Africa it is like a complete 
tour of Dante's Hell.


A broad brush for such a large place.

Mark.


Re: Famous operational issues

2021-02-18 Thread Andy Ringsmuth


> On Feb 18, 2021, at 4:37 PM, Warren Kumari  wrote:
> 
> 4: Not too long after I started doing networking (and for the same small ISP 
> in Yonkers), I'm flying off to install a new customer. I (of course) think 
> that I'm hot stuff because I'm going to do the install, configure the router, 
> whee, look at me! Anyway, I don't want to check a bag, and so I stuff the 
> Cisco 2501 in a carryon bag, along with tools, etc (this was all pre-9/11!). 
> I'm going through security and the TSA[0] person opens my bag and pulls the 
> router out. "What's this?!" he asks. I politely tell him that it's a router. 
> He says it's not. I'm still thinking that I'm the new hotness, and so I tell 
> him in a somewhat condescending way that it is, and I know what I'm talking 
> about. He tells me that it's not a router, and is starting to get annoyed. I 
> explain using my "talking to a 5 year old" voice that it most certainly is a 
> router. He tells me that lying to airport security is a federal offense, and 
> starts looming at me. I adjust my attitude and start explaining that it's 
> like a computer and makes the Internet work. He gruffly hands me back the 
> router, I put it in my bag and scurry away. As I do so, I hear him telling 
> his colleague that it wasn't a router, and that he certainly knows what a 
> router is, because he does woodwork… 

Well, in his defense, he wasn’t wrong…   :-)




Andy Ringsmuth
5609 Harding Drive
Lincoln, NE 68521-5831
(402) 304-0083
a...@andyring.com

“Better even die free, than to live slaves.” - Frederick Douglass, 1863



Re: Famous operational issues

2021-02-18 Thread Randy Bush
when employer had shipped 2xJ to london, had the circuits up, ...
the local office sat on their hands.  for weeks.  i finally was
pissed enough to throw my toolbag over my shoulder, get on a
plane, and fly over.  i walked into the fancy office and said
"hi, i am randy, vp eng, here to help you turn up the routers."
they managed to turn them up pretty quickly.


Re: Google Fiber abuse address does not exist

2021-02-18 Thread Chris Boyd



> On Feb 18, 2021, at 5:19 PM, Louie Lee  wrote:
> 
> Hey Chris,
> 
> Thanks for reporting this. We had an issue that caused emails to addresses in 
> that domain to not be recognized.
> 
> The email is no longer bouncing back, and emails to other googlefiber.net 
> addresses are confirmed working.
> 
> Louie

Thanks Warren and Louie for looking into it and getting it fixed. My abuse 
report has been received by the giant brain.

I’m waiting for $DAYJOB to wise up and make me the DMR at ARIN. Coming soon….

—Chris

Re: Famous operational issues

2021-02-18 Thread Patrick W. Gilmore
On Feb 18, 2021, at 6:10 PM, Karl Auer  wrote:
> 
> I think it was Machiavelli who said that one should not ascribe to
> malice anything adequately explained by incompetence…

https://en.wikipedia.org/wiki/Hanlon%27s_razor
Never attribute to malice that which is adequately explained by 
stupidity.

I personally prefer this version from Robert A. Heinlein:
Never underestimate the power of human stupidity.

And to put it on topic, cover your EPOs

In 1994, there was a major earthquake near the city of Los Angeles. City hall 
had to be evacuated and it would take over a year to reinforce the building to 
make it habitable again. My company moved all the systems in the basement of 
city hall to a new datacenter a mile or so away. After the install, we spent 
more than a week coaxing their ancient (even for 1994) machines back online, 
such as a Prime Computer and an AS400 with tons of DASD. Well, tons of 
cabinets, certainly less storage than my watch has now.

I was in the DC going over something with the lady in charge when someone 
walked in to ask her something. She said “just a second”. That person took one 
step to the side of the door and leaned against the wall - right on an EPO 
which had no cover.

Have you ever heard an entire row of DASD spin down instantly? Or taken 40 
minutes to IPL an AS400? In the middle of the business day? For the second most 
populous city in the country?

Me: Maybe you should get a cover for that?
Her: Good idea.

Couple weeks later, in the same DC, going over final checklist. A fedex guy 
walks in. (To this day, no idea how he got in a supposedly locked DC.) She says 
“just a second”, and I get a very strong deja vu feeling. He takes one step to 
the side and leans against the wall.

Me: Did you order that EPO cover?
Her: Nope.

-- 
TTFN,
patrick



Re: Famous operational issues

2021-02-18 Thread Brian Knight via NANOG

On 2021-02-17 13:28, John Kristoff wrote:

On Wed, 17 Feb 2021 14:07:54 -0500
John Curran  wrote:


I have no idea what outages were most memorable for others, but the
Stanford transfer switch explosion in October 1996 resulted in much
of the Internet in the Bay Area simply not being reachable for
several days.


Thanks John.

This reminds me of two I've not seen anyone mention yet.  Both were
coincidentally in the Chicago area, and I learned of them before my
entry into netops full time.  One was a flood:

  

The other, at the dawn of an earlier era:

  



I wouldn't necessarily put those two in the top 3, but by some standard
for many they were certainly very significant and noteworthy.

John


Thanks for sharing these links John.  I was personally affected by the 
Hinsdale CO fire when I was a kid.  At the time, my family lived on the 
southern border of Hinsdale in the adjacent town of Burr Ridge.  It was 
weird like a power outage: you're reminded of the loss of service every 
time you perform the simple act of requesting service, picking up the 
phone or toggling a light switch.  But it lasted a lot longer than any 
loss of power: It was six or seven weeks that, to this day, felt a lot 
longer.


Anytime we needed to talk to someone long-distance, we had to drive to a 
cousin's house to make the call.  To talk to anyone local, you'd have to 
physically go and show up unannounced.  At 11 years old, I was the 
bicycle messenger between our house and my great-grandmother, who lived 
about two blocks away.  My mother and father kept the cars gassed up and 
extra fuel on hand in case there was an emergency.


Dad ran a home improvement business out of the house, so new business 
ground to a halt.  Mom worked for a publishing company, so their release 
dates were impacted.  The local grocery store's scanners wouldn't work, 
so they had to punch the orders into the register by hand, using the 
paper sticker prices on the items.


I clearly remember from the local papers that they had to special-order 
the replacement 5ESS at enormous cost.  I saw the big brick building 
after the fire with the burn marks around the front door.  In late May 
and early June, the Greyhound buses with the workers were parked around 
the block, power plants outside with huge cables snaking in right 
through the wide open front door.


When we heard that dial tone at last, everyone was happier than an 
iPhone with full bars. Lol


We're spoiled for choice in telecom networks these days.  Also, 
facilities management have learned plenty of lessons since then.  Like, 
install and maintain an FM-200 fire suppression system.  But 
nevertheless, sometimes when I step into a colo, I think of that outage 
and the impact it had.


-Brian


Re: Carrier Neutral Site - Freetown, Sierra Leone?

2021-02-18 Thread Eric Kuhnke
There is really no such thing since there is just the one cable landing
station. I've previously spent months working in network infrastructure and
telecom in Sierra Leone, contact me off-list if you're serious about
getting something done there.




On Thu, Feb 18, 2021 at 9:46 AM Rod Beck 
wrote:

> Every time I try to bring a circuit into Africa it is like a complete tour
> of Dante's Hell.
>
> 😃
>
> Regards,
>
> Roderick.
>
> Roderick Beck
> Global Network Capacity Procurement
>
> United Cable Company
> www.unitedcablecompany.com
> https://unitedcablecompany.com/video/
> New York City & Budapest
>
> rod.b...@unitedcablecompany.com
>
> Budapest: 36-70-605-5144
>
> NJ: 908-452-8183
>
>
>
>


Re: Famous operational issues

2021-02-18 Thread Paul Ebersman
warren> 2: A somewhat similar thing would happen with the Ascend TNT
warren> Max, which had side-to-side airflow. These were dial termination
warren> boxes, and so people would install racks and racks of them. The
warren> first one would draw in cool air on the left, heat it up and
warren> ship it out the right. The next one over would draw in warm air
warren> on the left, heat it up further, and ship it out the
warren> right... Somewhere there is a fairly famous photo of a rack of
warren> TNT Maxes, with the final one literally on fire, and still
warren> passing packets.

The Ascend MAX (TNT was the T3 version, MAX took 2 T1s) was originally
an ISDN device. We got the first v.34 Rockwell modem version for
testing. An individual card had 4 daughter boards. They were burned in
for 24 hours at Ascend, then shipped to us. We were doing stress testing
in Fairfax VA. Turns out that the boards started to overheat at about 30
hours and caught fire a few hours after that... Completely melted the
daughterboards. They did fix that issue and upped the burn-in test period
to 48 hours.

And yeah, they vented side to side. They were designed for enclosed
racks where air flow was forced up. We were colocating at telco POPs so
we had to use center mount open relay racks. The air flow was as you
describe. Good time. Had by all...

Both we (UUNET, for MSN and Earthlink) and AOL were using these for
dialup access. 80k ports before we switched to the TNTs, 3+ million
ports on TNTs by the time I stopped paying attention.
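
The side-to-side airflow problem described above comes down to simple arithmetic: each chassis takes in the exhaust of its left-hand neighbour, so inlet temperature climbs along the row. A minimal sketch of that cascade follows. It assumes a fixed temperature rise per chassis and no mixing with room air. The function names and all of the numbers are hypothetical, chosen only for illustration, and are not measurements of any Ascend hardware.

```python
def inlet_temperatures(ambient_c, rise_per_chassis_c, chassis_count):
    """Inlet air temperature seen by each chassis, left to right.

    Assumes every chassis adds the same rise and that exhaust feeds
    straight into the next inlet (a deliberately crude model).
    """
    return [ambient_c + i * rise_per_chassis_c for i in range(chassis_count)]


def first_overheated(inlets, max_inlet_c):
    """Index of the first chassis whose inlet exceeds the limit, or None."""
    for index, temperature in enumerate(inlets):
        if temperature > max_inlet_c:
            return index
    return None


# Hypothetical figures: 20 C room, 8 C rise per box, six boxes side by side.
inlets = inlet_temperatures(ambient_c=20.0, rise_per_chassis_c=8.0, chassis_count=6)
print(inlets)
print(first_overheated(inlets, max_inlet_c=40.0))
```

With these made-up figures the fourth box in the row is already breathing air above the assumed 40 C limit, which is roughly why forced vertical airflow in an enclosed rack matters for side-venting gear.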


Re: Google Fiber abuse address does not exist

2021-02-18 Thread Louie Lee via NANOG
Hey Chris,

Thanks for reporting this. We had an issue that caused emails to addresses
in that domain to not be recognized.

The email is no longer bouncing back, and emails to other googlefiber.net
addresses are confirmed working.

Louie


On Thu, Feb 18, 2021 at 1:58 PM Chris Boyd  wrote:

> Can someone at ARIN tell them they need to fix this?
>
> From whois 136.32.164.64:
> OrgAbuseHandle: GFA32-ARIN
> OrgAbuseName:   Google Fiber Abuse
> OrgAbusePhone:  +1-650-253-
> OrgAbuseEmail:  ab...@googlefiber.net
> OrgAbuseRef:    https://rdap.arin.net/registry/entity/GFA32-ARIN
>
> Email response:
>   - The following addresses had permanent fatal errors -
> 
>(reason: 550-5.1.1 The email account that you tried to reach does not
> exist. Please try)
>
>   - Transcript of session follows -
> ... while talking to gmr-smtp-in.l.google.com.:
> >>> DATA
> <<< 550-5.1.1 The email account that you tried to reach does not exist.
> Please try
> <<< 550-5.1.1 double-checking the recipient's email address for typos or
> <<< 550-5.1.1 unnecessary spaces. Learn more at
> <<< 550 5.1.1  https://support.google.com/mail/?p=NoSuchUser
> kk5si203161pjb.1 - gsmtp
> 550 5.1.1 ... User unknown
> <<< 503 5.5.1 RCPT first. kk5si203161pjb.1 - gsmtp
> Reporting-MTA: dns; lenny.gizmopartners.com
> Received-From-MTA: DNS; 136-49-160-191.googlefiber.net
> Arrival-Date: Thu, 18 Feb 2021 21:52:38 GMT
>
> Final-Recipient: RFC822; ab...@googlefiber.net
> Action: failed
> Status: 5.1.1
> Remote-MTA: DNS; gmr-smtp-in.l.google.com
> Diagnostic-Code: SMTP; 550-5.1.1 The email account that you tried to reach
> does not exist. Please try
> Last-Attempt-Date: Thu, 18 Feb 2021 21:52:39 GMT
>
>


Re: Famous operational issues

2021-02-18 Thread Karl Auer
On Thu, 2021-02-18 at 17:37 -0500, Warren Kumari wrote:
> Anyway, the subcontractor who made the power supplies for the vendor
> realized that they could save a few cents by not installing the
> little metal clip that held the heatsink to the MOSFET

I think it was Machiavelli who said that one should not ascribe to
malice anything adequately explained by incompetence...

> 3: I used to work for a small ISP in Yonkers, NY.

There is actually a place called "Yonkers"?!? I always thought it was a
joke placename. We don't really need joke placenames in Oz, since we
have real ones like Woolloomooloo, Burpengary and Humpty Doo. My
favourite is Numbugga (closely followed by Wonglepong).

> I cannot remember what we used to call airport security pre-TSA...

"Useful"?

Regards, K.

-- 
~~~
Karl Auer (ka...@biplane.com.au)
http://www.biplane.com.au/kauer

GPG fingerprint: 2561 E9EC D868 E73C 8AF1 49CF EE50 4B1D CCA1 5170
Old fingerprint: 8D08 9CAA 649A AFEF E862 062A 2E97 42D4 A2A0 616D





FCC outage report for Texas (central US) winter storm

2021-02-18 Thread Sean Donelan




The Federal Communications Commission has posted a summary report on 
outage reports due to the winter storm impacting Texas and central US.


It looks incomplete to me compared to the detail collected during a 
typical hurricane. It is based on reports from telecommunication 
companies.  If companies don't self-report, the FCC doesn't know.


Through February 18, 2021

8 Oklahoma outage reports
208 Texas outage reports

https://docs.fcc.gov/public/attachments/DOC-370095A1.pdf

CAIDA and NetBlocks data show more outages in the central United States 
(Texas), but I don't know how accurate their sub-county measurements are.



According to news reports FEMA brought in 60 generators and diesel 
fuel into Texas, but local authorities haven't dispatched orders for 
their deployment yet.


Re: Google Fiber abuse address does not exist

2021-02-18 Thread Warren Kumari
Whoops.

Thank you for reporting this, it’s being looked into.

W



On Thu, Feb 18, 2021 at 5:01 PM Chris Boyd  wrote:

> Can someone at ARIN tell them they need to fix this?
>
> From whois 136.32.164.64:
> OrgAbuseHandle: GFA32-ARIN
> OrgAbuseName:   Google Fiber Abuse
> OrgAbusePhone:  +1-650-253-
> OrgAbuseEmail:  ab...@googlefiber.net
> OrgAbuseRef:    https://rdap.arin.net/registry/entity/GFA32-ARIN
>
> Email response:
>   - The following addresses had permanent fatal errors -
> 
>(reason: 550-5.1.1 The email account that you tried to reach does not
> exist. Please try)
>
>   - Transcript of session follows -
> ... while talking to gmr-smtp-in.l.google.com.:
> >>> DATA
> <<< 550-5.1.1 The email account that you tried to reach does not exist.
> Please try
> <<< 550-5.1.1 double-checking the recipient's email address for typos or
> <<< 550-5.1.1 unnecessary spaces. Learn more at
> <<< 550 5.1.1  https://support.google.com/mail/?p=NoSuchUser
> kk5si203161pjb.1 - gsmtp
> 550 5.1.1 ... User unknown
> <<< 503 5.5.1 RCPT first. kk5si203161pjb.1 - gsmtp
> Reporting-MTA: dns; lenny.gizmopartners.com
> Received-From-MTA: DNS; 136-49-160-191.googlefiber.net
> Arrival-Date: Thu, 18 Feb 2021 21:52:38 GMT
>
> Final-Recipient: RFC822; ab...@googlefiber.net
> Action: failed
> Status: 5.1.1
> Remote-MTA: DNS; gmr-smtp-in.l.google.com
> Diagnostic-Code: SMTP; 550-5.1.1 The email account that you tried to reach
> does not exist. Please try
> Last-Attempt-Date: Thu, 18 Feb 2021 21:52:39 GMT
>
> --
Perhaps they really do strive for incomprehensibility in their specs.
After all, when the liturgy was in Latin, the laity knew their place.
-- Michael Padlipsky


Re: Famous operational issues

2021-02-18 Thread Warren Kumari
On Thu, Feb 18, 2021 at 8:31 AM Jared Mauch  wrote:

> On Thu, Feb 18, 2021 at 01:07:01AM -0800, Eric Kuhnke wrote:
> > On that note, I'd be very interested in hearing stories of actual
> > incidents that are the cause of why cardboard boxes are banned in many
> > facilities, due to loose particulate matter getting into the air and
> > setting off very sensitive fire detection systems.
> >
> > Or maybe it's more mundane and 99% of the reason is people unpack stuff
> > and don't always clean up properly after themselves.
>
> We had a plastic bag sucked into the intake of a router in a
> datacenter once that caused it to overheat and take the site down.  We
> had cameras in our cage and I remember seeing the photo from the site of
> the colo (I'll protect their name just because) taken as the tech was on
> the phone and pulled the bag out of the router.
>
> The time from the thermal warning syslog that it's getting warm
> to overheat and shutdown is short enough you can't really get a tech to
> the cage in time to prevent it.
>


1: A previous employer was a large customer of a (now defunct) L3 switch
vendor. The AC power inputs were along the bottom of the power supply, and
the big aluminium heatsinks in the power supplies were just above the AC
socket.
Anyway, the subcontractor who made the power supplies for the vendor
realized that they could save a few cents by not installing the little
metal clip that held the heatsink to the MOSFET, and instead relying on the
thermal adhesive to hold it...
This worked fine, until a certain number of hours had passed, at which
point the goop would dry out and the heatsink would fall down, directly
across the AC socket. This would A: trip the circuit that this was on,
and, more excitingly, B: set the aluminum on fire, which would then ignite
the other heatsinks in the PSU, leading to much fire...

2: A somewhat similar thing would happen with the Ascend TNT Max, which had
side-to-side airflow. These were dial termination boxes, and so people
would install racks and racks of them. The first one would draw in cool air
on the left, heat it up and ship it out the right. The next one over would
draw in warm air on the left, heat it up further, and ship it out the
right... Somewhere there is a fairly famous photo of a rack of TNT Maxes,
with the final one literally on fire, and still passing packets.
There is a related (and probably apocryphal) story regarding the launch of
the TNT. It was being shipped for a major trade-show, but got stuck in
customs. After many bizarre calls with the customs folk, someone goes to
the customs office to try and sort it out, and gets greeted by customs
agents with guns. They all walk into the warehouse, and discover that there
is a large empty area around the crate, which is a wooden cube, with "TNT"
stencilled in big red letters...

3: I used to work for a small ISP in Yonkers, NY. We had a customer in
Florida, and on a Friday morning their site goes down. We (of course) have
not paid for Cisco 4 hour support (or, honestly, any support) and they have
a strict SLA, so we are a little stuck.
We end up driving to JFK, and lugging a fully loaded Cisco 7507 to the
check in counter. It was just before the last flight of the day, so we
shrugged and said it was my checked bag. The excess baggage charges were
eye-watering, but it rode the conveyor belt with the rest of the luggage
onto the plane. It arrived with just a bent ejector handle, and the rest
was fine.

4: Not too long after I started doing networking (and for the same small
ISP in Yonkers), I'm flying off to install a new customer. I (of course)
think that I'm hot stuff because I'm going to do the install, configure the
router, whee, look at me! Anyway, I don't want to check a bag, and so I
stuff the Cisco 2501 in a carryon bag, along with tools, etc (this was all
pre-9/11!). I'm going through security and the TSA[0] person opens my bag
and pulls the router out. "What's this?!" he asks. I politely tell him that
it's a router. He says it's not. I'm still thinking that I'm the new
hotness, and so I tell him in a somewhat condescending way that it is, and
I know what I'm talking about. He tells me that it's not a router, and is
starting to get annoyed. I explain using my "talking to a 5 year old" voice
that it most certainly is a router. He tells me that lying to airport
security is a federal offense, and starts looming at me. I adjust my
attitude and start explaining that it's like a computer and makes the
Internet work. He gruffly hands me back the router, I put it in my bag and
scurry away. As I do so, I hear him telling his colleague that it wasn't a
router, and that he certainly knows what a router is, because he does
woodwork...

5: Another one. In the early 2000s I was working for a dot-com boom
company. We are building out our first datacenter, and I'm installing a
pair of Cisco 7206s in 811 10th Ave. These will run basically the entire
company, we have some transit, we h

Re: Google Fiber abuse address does not exist

2021-02-18 Thread Mark Seiden
i forwarded this to a colleague who has just taken a job that looks like he’s 
running abuse and security at g fiber.
(not sure that he’s started work yet, it’s that new.)

> On Feb 18, 2021, at 2:24 PM, TJ Trout  wrote:
> 
> Did you try opening a ticket with arin?
> 
> On Thu, Feb 18, 2021 at 2:00 PM Chris Boyd wrote:
> Can someone at ARIN tell them they need to fix this?
> 
> From whois 136.32.164.64:
> OrgAbuseHandle: GFA32-ARIN
> OrgAbuseName:   Google Fiber Abuse
> OrgAbusePhone:  +1-650-253-
> OrgAbuseEmail:  ab...@googlefiber.net
> OrgAbuseRef:    https://rdap.arin.net/registry/entity/GFA32-ARIN
> 
> Email response:
>   - The following addresses had permanent fatal errors -
> <ab...@googlefiber.net>
>    (reason: 550-5.1.1 The email account that you tried to reach does not
> exist. Please try)
> 
>   - Transcript of session follows -
> ... while talking to gmr-smtp-in.l.google.com.:
> >>> DATA
> <<< 550-5.1.1 The email account that you tried to reach does not exist.
> Please try
> <<< 550-5.1.1 double-checking the recipient's email address for typos or
> <<< 550-5.1.1 unnecessary spaces. Learn more at
> <<< 550 5.1.1  https://support.google.com/mail/?p=NoSuchUser
> kk5si203161pjb.1 - gsmtp
> 550 5.1.1 <ab...@googlefiber.net>... User unknown
> <<< 503 5.5.1 RCPT first. kk5si203161pjb.1 - gsmtp
> Reporting-MTA: dns; lenny.gizmopartners.com
> Received-From-MTA: DNS; 136-49-160-191.googlefiber.net
> Arrival-Date: Thu, 18 Feb 2021 21:52:38 GMT
> 
> Final-Recipient: RFC822; ab...@googlefiber.net
> Action: failed
> Status: 5.1.1
> Remote-MTA: DNS; gmr-smtp-in.l.google.com
> Diagnostic-Code: SMTP; 550-5.1.1 The email account that you tried to reach
> does not exist. Please try
> Last-Attempt-Date: Thu, 18 Feb 2021 21:52:39 GMT
> 



Re: Google Fiber abuse address does not exist

2021-02-18 Thread TJ Trout
Did you try opening a ticket with arin?

On Thu, Feb 18, 2021 at 2:00 PM Chris Boyd  wrote:

> Can someone at ARIN tell them they need to fix this?
>
> From whois 136.32.164.64:
> OrgAbuseHandle: GFA32-ARIN
> OrgAbuseName:   Google Fiber Abuse
> OrgAbusePhone:  +1-650-253-
> OrgAbuseEmail:  ab...@googlefiber.net
> OrgAbuseRef:    https://rdap.arin.net/registry/entity/GFA32-ARIN
>
> Email response:
>   - The following addresses had permanent fatal errors -
> 
>(reason: 550-5.1.1 The email account that you tried to reach does not
> exist. Please try)
>
>   - Transcript of session follows -
> ... while talking to gmr-smtp-in.l.google.com.:
> >>> DATA
> <<< 550-5.1.1 The email account that you tried to reach does not exist.
> Please try
> <<< 550-5.1.1 double-checking the recipient's email address for typos or
> <<< 550-5.1.1 unnecessary spaces. Learn more at
> <<< 550 5.1.1  https://support.google.com/mail/?p=NoSuchUser
> kk5si203161pjb.1 - gsmtp
> 550 5.1.1 ... User unknown
> <<< 503 5.5.1 RCPT first. kk5si203161pjb.1 - gsmtp
> Reporting-MTA: dns; lenny.gizmopartners.com
> Received-From-MTA: DNS; 136-49-160-191.googlefiber.net
> Arrival-Date: Thu, 18 Feb 2021 21:52:38 GMT
>
> Final-Recipient: RFC822; ab...@googlefiber.net
> Action: failed
> Status: 5.1.1
> Remote-MTA: DNS; gmr-smtp-in.l.google.com
> Diagnostic-Code: SMTP; 550-5.1.1 The email account that you tried to reach
> does not exist. Please try
> Last-Attempt-Date: Thu, 18 Feb 2021 21:52:39 GMT
>
>


Google Fiber abuse address does not exist

2021-02-18 Thread Chris Boyd
Can someone at ARIN tell them they need to fix this?

From whois 136.32.164.64:
OrgAbuseHandle: GFA32-ARIN
OrgAbuseName:   Google Fiber Abuse
OrgAbusePhone:  +1-650-253- 
OrgAbuseEmail:  ab...@googlefiber.net
OrgAbuseRef:    https://rdap.arin.net/registry/entity/GFA32-ARIN

Email response:
  - The following addresses had permanent fatal errors -

   (reason: 550-5.1.1 The email account that you tried to reach does not exist. Please try)

  - Transcript of session follows -
... while talking to gmr-smtp-in.l.google.com.:
>>> DATA
<<< 550-5.1.1 The email account that you tried to reach does not exist. Please try
<<< 550-5.1.1 double-checking the recipient's email address for typos or
<<< 550-5.1.1 unnecessary spaces. Learn more at
<<< 550 5.1.1  https://support.google.com/mail/?p=NoSuchUser kk5si203161pjb.1 - gsmtp
550 5.1.1 ... User unknown
<<< 503 5.5.1 RCPT first. kk5si203161pjb.1 - gsmtp
Reporting-MTA: dns; lenny.gizmopartners.com
Received-From-MTA: DNS; 136-49-160-191.googlefiber.net
Arrival-Date: Thu, 18 Feb 2021 21:52:38 GMT

Final-Recipient: RFC822; ab...@googlefiber.net
Action: failed
Status: 5.1.1
Remote-MTA: DNS; gmr-smtp-in.l.google.com
Diagnostic-Code: SMTP; 550-5.1.1 The email account that you tried to reach does not exist. Please try
Last-Attempt-Date: Thu, 18 Feb 2021 21:52:39 GMT
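
For anyone who wants to triage bounces like this one automatically, here is a 
minimal sketch in Python. It copies the per-recipient fields from the bounce 
above into a string and pulls them apart. The helper names parse_dsn_fields and 
is_permanent_failure are made up for illustration and are not part of any 
library. It also assumes that wrapped lines arrive indented, as header folding 
normally produces; the pasted copy above lost that indentation. The address is 
left obfuscated exactly as the archive shows it.

```python
import re

# Per-recipient fields pasted from the bounce above (RFC 3464 style).
# Assumption: the wrapped Diagnostic-Code line is indented (folded).
DSN_TEXT = """\
Final-Recipient: RFC822; ab...@googlefiber.net
Action: failed
Status: 5.1.1
Remote-MTA: DNS; gmr-smtp-in.l.google.com
Diagnostic-Code: SMTP; 550-5.1.1 The email account that you tried to reach does
 not exist. Please try
Last-Attempt-Date: Thu, 18 Feb 2021 21:52:39 GMT
"""

FIELD_RE = re.compile(r"^([A-Za-z-]+):\s*(.*)$")


def parse_dsn_fields(text):
    """Return a dict of lowercased field name -> value, joining folded lines."""
    fields = {}
    last = None
    for line in text.splitlines():
        # A line starting with whitespace continues the previous field.
        if line[:1] in (" ", "\t") and last is not None:
            fields[last] += " " + line.strip()
            continue
        match = FIELD_RE.match(line)
        if match:
            last = match.group(1).lower()
            fields[last] = match.group(2).strip()
    return fields


def is_permanent_failure(fields):
    """Enhanced status codes in class 5 (5.x.x) mean a permanent failure."""
    return (fields.get("action") == "failed"
            and fields.get("status", "").startswith("5."))


fields = parse_dsn_fields(DSN_TEXT)
print(fields["status"], is_permanent_failure(fields))  # prints "5.1.1 True"
```

Status 5.1.1 is the enhanced status code for a bad destination mailbox, which 
matches the "User unknown" text in the transcript, so retrying the same 
address is unlikely to help until the whois record is corrected.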



Re: Famous operational issues

2021-02-18 Thread Henry Yen
On Thu, Feb 18, 2021 at 01:07:01AM -0800, Eric Kuhnke wrote:
> On that note, I'd be very interested in hearing stories of actual incidents
> that are the cause of why cardboard boxes are banned in many facilities,

the datacenter manager's daughter's cat.

-- 
Henry Yen   Aegis Information Systems, Inc.
Senior Systems Programmer   Hicksville, New York


Re: Famous operational issues

2021-02-18 Thread Alain Hebert

A few I remember:

    . Some monitoring server SCSI drive failed (we're talking 
State/Province level govt)...  Got a reply back stating there would be a 
6-month delay to get a replacement...


        Ended up choosing to use my own drive instead of leaving 
something that could have been deadly, unmonitored.


    . Metro interruption during rush hour (for a pop of 4M) due to an 
overloaded power bar in a MMR (Meet Me Room) during an unplanned deployment;


    . Cherry red and very angry looking 520-600V bus bar =D;

    . Fire fighters hitting the building generator emergency STOP 
button because some neighbor reported smoke on top of the building 
during a black out...

    ( not their fault, local gov failure as usual )

    . Some idiots poured gasoline into a large pipe under a bridge...  
ended up demonstrating the lack of diversity to the DCs on that urban 
island;


    . Underground transformer blew up in downtown Mtl and took out the 
entire fiber bundle, demonstrating to those customers that their 
diversity was actually real =D.


        (took them a year to get that fixed)

and

    . Obviously: Any rack cabling I do...

-
Alain Hebert                                aheb...@pubnix.net
PubNIX Inc.
50 boul. St-Charles
P.O. Box 26770 Beaconsfield, Quebec H9W 6G7
Tel: 514-990-5911  http://www.pubnix.net    Fax: 514-990-9443

On 2/18/21 2:37 PM, t...@pelican.org wrote:
> On Thursday, 18 February, 2021 16:23, "Seth Mattinen"  said:
>
> > I had a customer that tried to stack their servers - no rails except the
> > bottom most one - using 2x4's between each server. Up until then I
> > hadn't imagined anyone would want to fill their cabinet with wood, so I
> > made a rule to ban wood and anything tangentially related (cardboard,
> > paper, plastic, etc.). Easier to just ban all things. Fire reasons too
> > but mainly I thought a cabinet full of wood was too stupid to allow.
>
> On the "stupid racking" front, I give you most of a rack dedicated to a single
> server.  Not all that high a server, maybe 2U or so, but *way* too deep for the
> rack, so it had been installed vertically.  By looping some fairly hefty chain
> through the handles on either side of the front of the chassis, and then
> bolting the four chain ends to the four rack posts.  I wish I'd kept pictures
> of that one.  Not flammable, but a serious WTF moment.
>
> Cheers,
> Tim.






Re: Famous operational issues

2021-02-18 Thread George Metz
Normally I reference this as an example of terrible government
bureaucracy, but in this case it's also how said bureaucracy can delay
operational changes.

I was a contractor for one of the many branches of the DoD in charge
of the network at a moderate-sized site. I'd been there about 4
months, and it was my first job with FedGov. I was sent a pair of
Cisco 6509-E routers, with all supervisors and blades needed, along
with a small mountain of SFPs, to replace the non-E 6509s we had
installed that were still using GBICs for their downlinks. These were
the distro switches for approximately half the site.

Problem was, we needed 84 new SC-LC fiber jumpers to replace the SC-SC
we had in place for the existing switch - GBICs to SFPs remember. We
hadn't received any with the shipment. So I reached out to the project
manager to ask about getting the fiber jumpers. "Oh, that should be
coming from the server farm folks, since it's being installed in a
server farm." Okay, that seems stupid to me, but $FedGov, who knows. I
tell him we're stalled out until we get those cables - we have the
routers configured and ready to go, just need the jumpers, can he get
them from the server farm folks? He'll do that.

It took FIFTEEN MONTHS to hash out who was going to pay for and order
the fiber jumpers. Any number of times as the months dragged on, I
seriously considered ordering them on Amazon Prime using my corporate
card. We had them installed a week and a half after we got them. Why
that long? Because we had to completely reconfigure them, and after 15
months, the urgency just wasn't there.

By the way, the project ended up buying them, not the server farm team.

On Tue, Feb 16, 2021 at 2:38 PM John Kristoff  wrote:
>
> Friends,
>
> I'd like to start a thread about the most famous and widespread Internet
> operational issues, outages or implementation incompatibilities you
> have seen.
>
> Which examples would make up your top three?
>
> To get things started, I'd suggest the AS 7007 event is perhaps  the
> most notorious and likely to top many lists including mine.  So if
> that is one for you I'm asking for just two more.
>
> I'm particularly interested in this as the first step in developing a
> future NANOG session.  I'd be particularly interested in any issues
> that also identify key individuals that might still be around and
> interested in participating in a retrospective.  I already have someone
> that is willing to talk about AS 7007, which shouldn't be hard to guess
> who.
>
> Thanks in advance for your suggestions,
>
> John


Re: Famous operational issues

2021-02-18 Thread t...@pelican.org
On Thursday, 18 February, 2021 16:23, "Seth Mattinen"  said:

> I had a customer that tried to stack their servers - no rails except the
> bottom most one - using 2x4's between each server. Up until then I
> hadn't imagined anyone would want to fill their cabinet with wood, so I
> made a rule to ban wood and anything tangentially related (cardboard,
> paper, plastic, etc.). Easier to just ban all things. Fire reasons too
> but mainly I thought a cabinet full of wood was too stupid to allow.

On the "stupid racking" front, I give you most of a rack dedicated to a single 
server.  Not all that high a server, maybe 2U or so, but *way* too deep for the 
rack, so it had been installed vertically.  By looping some fairly hefty chain 
through the handles on either side of the front of the chassis, and then 
bolting the four chain ends to the four rack posts.  I wish I'd kept pictures 
of that one.  Not flammable, but a serious WTF moment.

Cheers,
Tim.




Re: Famous operational issues

2021-02-18 Thread Erik Sundberg
Worked a chronic support call where their internet would bounce at noon every 
workday. The Cisco 1601 or 1700 router that had their T1 in it ended up being on 
top of a microwave. Weeks of troubleshooting and shipping new routers on this one.

Also had another one where the router was plugged in to an outlet that was 
controlled by a light switch, discovered this after shipping them two new 
routers.

Customer had their building remodeled and the techs couldn't find the T1 
Smartjack for the building. The contractor who did the remodel job decided it 
would be a good idea to cut out the section of wall where the telco equipment 
was and mount it to the ceiling. Its new location was in the ladies' bathroom, 
above the drop ceiling, mounted to the building's rafters 10' in the air.

Customer needed a new router, because the first one died. It was a machine shop 
and they mounted the router to the wall next to a lathe or drill press that 
used oil to cool the bit while it was cutting. It looked like someone had dumped 
the router in a bucket of oil when we got it back.

Arrived at another large colo for a buildout, only to find that our ASR9K that 
had arrived 2 weeks earlier was stored outside on the loading dock, which has no 
roof or locked gate. I guess that's why Cisco puts the plastic bag over the 
chassis when they're shipped.

Colo techs at another large colo decided to unpack our router, which was a 
fully loaded 1/2 rack chassis. Since they couldn't lift it, they tipped the 
router on its side and walked it back by shifting the weight from one corner of 
the chassis to another, bending the chassis. I could see the scrape marks in the 
floor from it.

We had colo space on the top floor of an ATT CO where we put a Cisco 7513 to 
terminate about a dozen CHDS3's. The roof was leaking, and instead of fixing the 
roof, the fix was to put a sheet of plastic over our cabinet. It was more like 
a tent over the cabinet.  A pool of water formed in a divot at the top and it 
was 120+ degrees under the plastic tarp.

Our office was in a work loft of an older building and they had the AC units 
mounted to the ceiling with a drip pan underneath them. Well, the AC on the 2nd 
floor had the pump for the drip pan die. Whoever installed the drip pan 
didn't secure it or center it under the AC unit. It filled up with water and, 
since it was not secured and was off center, the drip pan came crashing down 
with a few gallons of water. The water worked its way over to the wall and 
traveled down one story in the building. The floor below had all the telco 
equipment mounted to that same wall and the water flowed down right through a 
couple of ATT's Cienas mounted to the wall, shorting them out. I was at the 
Chicago NANOG Hackathon on Sunday and was called out to work that one 😕

Was working in the back of a cabinet that had -48 VDC power for a Cisco router 
when a screw fell and shorted out the power. My coworker, who was standing in 
front of the rack, wasn't happy because the ADC PowerWorx fuse panel was about 
6" from his face where he was working. It had those little black alarm fuses 
that have the spring-loaded arm. When it tripped, a nice shower of sparks flew 
right at his face. Luckily he wore glasses.

I was 18 at my first IT job and it was a brand-new building. I was plugging in 
a 208VAC 30A APC UPS in the server room; the electrician had just energized and 
checked the circuit. I plugged in the APC UPS and gave it a good turn for the 
twist lock plug to catch and KA BAMB!!! Sparks came shooting out of the outlet 
at me. I think I pooped myself that day. Turns out the electricians decided that 
a single-gang electrical box was good enough for a 208 VAC 30A outlet that 
barely fit in the box, and didn't put any tape around the wire terminals. When 
they energized the circuit there was enough of an air gap that the hot screw 
didn't ground out. When I gave it that good old twist while plugging in the APC, 
I grounded the hot screw to the side of the electrical box.







From: NANOG on behalf of Seth Mattinen
Sent: Thursday, February 18, 2021 10:23 AM
To: nanog@nanog.org 
Subject: Re: Famous operational issues

On 2/18/21 1:07 AM, Eric Kuhnke wrote:
> On that note, I'd be very interested in hearing stories of actual
> incidents that are the cause of why cardboard boxes are banned in many
> facilities, due to loose particulate matter getting into the air and
> setting off very sensitive fire detection systems.
>


I had a customer that tried to stack their servers - no rails except the
bottom most one - using 2x4's between each server. Up until then I
hadn't imagined anyone would want to fill their cabinet with wood, so I
made a rule to ban wood and anything tangentially related (cardboard,
paper, plastic, etc.). Easier to just ban all things. Fire reasons too
but mainly I thought a cabinet full of wood was too stupid to allow.

The "no wood" rule has become a fun story to tell everyone who asks how
that ended up being

Carrier Neutral Site - Freetown, Sierra Leone?

2021-02-18 Thread Rod Beck
Every time I try to bring a circuit into Africa it is like a complete tour of 
Dante's Hell.

😃

Regards,

Roderick.


Roderick Beck

Global Network Capacity Procurement

United Cable Company

www.unitedcablecompany.com
https://unitedcablecompany.com/video/
New York City & Budapest

rod.b...@unitedcablecompany.com

Budapest: 36-70-605-5144

NJ: 908-452-8183





Re: Famous operational issues

2021-02-18 Thread Seth Mattinen

On 2/18/21 1:07 AM, Eric Kuhnke wrote:
> On that note, I'd be very interested in hearing stories of actual
> incidents that are the cause of why cardboard boxes are banned in many
> facilities, due to loose particulate matter getting into the air and
> setting off very sensitive fire detection systems.





I had a customer that tried to stack their servers - no rails except the 
bottom most one - using 2x4's between each server. Up until then I 
hadn't imagined anyone would want to fill their cabinet with wood, so I 
made a rule to ban wood and anything tangentially related (cardboard, 
paper, plastic, etc.). Easier to just ban all things. Fire reasons too 
but mainly I thought a cabinet full of wood was too stupid to allow.


The "no wood" rule has become a fun story to tell everyone who asks how 
that ended up being a rule. The wood customer turned out to be a 
complete a-hole anyway, wood was just the tip of the iceberg.


Re: Famous operational issues

2021-02-18 Thread Jared Mauch
On Thu, Feb 18, 2021 at 01:07:01AM -0800, Eric Kuhnke wrote:
> On that note, I'd be very interested in hearing stories of actual incidents
> that are the cause of why cardboard boxes are banned in many facilities,
> due to loose particulate matter getting into the air and setting off very
> sensitive fire detection systems.
> 
> Or maybe it's more mundane and 99% of the reason is people unpack stuff and
> don't always clean up properly after themselves.

We had a plastic bag sucked into the intake of a router in a
datacenter once that caused it to overheat and take the site down.  We
had cameras in our cage and I remember seeing the photo from the site of
the colo (I'll protect their name just because) taken as the tech was on
the phone and pulled the bag out of the router.

The time from the thermal warning syslog that it's getting warm
to overheat and shutdown is short enough that you can't really get a tech
to the cage in time to prevent it.

I also assume the latter above, which is that people have varying
definitions of clean.

- Jared

-- 
Jared Mauch  | pgp key available via finger from ja...@puck.nether.net
clue++;  | http://puck.nether.net/~jared/  My statements are only mine.


Re: bgp.he.net?

2021-02-18 Thread Niels Bakker

* h...@interall.co.il (Hank Nussbacher) [Thu 18 Feb 2021, 14:10 CET]:

  Is it down?

  -Hank


I can access https://bgp.he.net/contact/ just fine from here.

Also, it's 2021, please stop posting in HTML.


-- Niels.


Re: bgp.he.net?

2021-02-18 Thread Hank Nussbacher

  
  
On 18/02/2021 15:08, Hank Nussbacher wrote:
> Is it down?
>
> -Hank

Back up.


-Hank

  



bgp.he.net?

2021-02-18 Thread Hank Nussbacher

  
  
Is it down?


-Hank

  



Re: Famous operational issues

2021-02-18 Thread Eric Kuhnke
On that note, I'd be very interested in hearing stories of actual incidents
that are the cause of why cardboard boxes are banned in many facilities,
due to loose particulate matter getting into the air and setting off very
sensitive fire detection systems.

Or maybe it's more mundane and 99% of the reason is people unpack stuff and
don't always clean up properly after themselves.

On Wed, Feb 17, 2021, 6:21 PM Owen DeLong  wrote:

> Stolen isn’t nearly as exciting as what happens when your (used) 6509
> arrives and gets installed and operational before anyone realizes that the
> conductive packing peanuts that it was packed in have managed to work their
> way into various midplane connectors. Several hours later someone notices
> that the box is quite literally smoldering in the colo and the resulting
> combination of panic, fire drill, and management antics that ensue.
>
> Owen
>
>
> > On Feb 16, 2021, at 2:08 PM, Jared Mauch  wrote:
> >
> > I was thinking about how we need a war stories nanog track. My favorite
> > was being on call when the router was stolen.
> >
> > Sent from my TI-99/4a
> >
> >> On Feb 16, 2021, at 2:40 PM, John Kristoff  wrote:
> >>
> >> Friends,
> >>
> >> I'd like to start a thread about the most famous and widespread Internet
> >> operational issues, outages or implementation incompatibilities you
> >> have seen.
> >>
> >> Which examples would make up your top three?
> >>
> >> To get things started, I'd suggest the AS 7007 event is perhaps  the
> >> most notorious and likely to top many lists including mine.  So if
> >> that is one for you I'm asking for just two more.
> >>
> >> I'm particularly interested in this as the first step in developing a
> >> future NANOG session.  I'd be particularly interested in any issues
> >> that also identify key individuals that might still be around and
> >> interested in participating in a retrospective.  I already have someone
> >> that is willing to talk about AS 7007, which shouldn't be hard to guess
> >> who.
> >>
> >> Thanks in advance for your suggestions,
> >>
> >> John
>
>