Russell, that's a great story! Nice job troubleshooting.
On 5/25/24 03:10, Russell Senior wrote:
On 3/7/24 00:15, Russell Senior wrote:
On 3/1/24 17:40, Russell Senior wrote:
Portland Linux/Unix Group General Meeting Announcement
Who: Russell Senior
What: Part 1: A Network Relay via Cloud Instance ; Part 2: Retro
Linux Tape Recovery Show and Tell
Where: 5500 SW Dosch Rd, Portland
When: Thursday, March 7, 2024 at 7pm (Help with chairs a few minutes
early is always appreciated)
Why: The pursuit of technology freedom
https://pdxlinux.org
This is going to be a two-part talk, because each of the parts alone
isn't enough to fill an hour (let's hope).
The first part is going to be a description of how I relay network
connections from the Internet to my low-volume home-based email
server to evade potential ISP blockages.
Earlier this week, you might have heard about a large Pacific Power
outage in Northeast Portland. It only lasted for 30 or 40 minutes, but
it affected a wide area and reportedly on the order of 30k customers.
I was one of those affected. I was at home at the time. When all the
UPSes started screaming at me, and the power didn't come back on
immediately, I thought I'd better be pro-active and start shutting
down machines. And it's good I did because the batteries I have
wouldn't have sustained the load for that long. I wandered around the
neighborhood, chatting with neighbors, exchanging information, and
began contemplating building a soapbox racer or possibly pushing
wheels with sticks in the dark time with no internet connections. It
turns out, local journalists report, a beaver up near the Columbia
Slough had chewed through a tree that fell into some transmission
lines and (I'm guessing now) caused a brief fault and opened up a
circuit breaker. There must not have been any significant damage to
the line because power was restored pretty much as soon as they'd
identified the cause.
Lights come back on, and with some relief I commenced to go around and
turn back on the machines I'd turned off. Some of the machines had
been up for a long time, sustaining long running sessions, so the
downage was a chance to catch up on the deferred maintenance. For
example, I'd purchased a Core i7 975 on ebay to replace a first-gen i7
920, to max out the CPU in one of my desktop boxes. This was a chance
to replace the CPU, which I did and got that box powered back up. I
also powered my mailserver back on, which had been running without a
reboot for nearly a year. It gets regular updates, but I hadn't
rebooted into a new kernel. You might recall, I gave a PLUG talk in
March describing the cloud-based tunnel I used to connect the internet
to the mailserver in my house, bypassing any obstructions my ISP might
employ. It has been working great. Life seemingly returned to normal.
Then, Friday morning, I caught wind in a meeting that some mail (turns
out, it was just mail being forwarded from gmail to my home server)
was bouncing and the senders were seeing this odd domain they hadn't
emailed. Uh-oh. But I had plans today and was away from home most of
the day. This evening, I remembered about the mailserver and decided
I'd better figure out what was going wrong. I had also recently
updated my letsencrypt certificate, and that sometimes causes trouble
if the mailserver doesn't use the new certificate, and I need to
restart or reload the service.
Oh, and the machine runs Arch. And yes, I don't mind that it sometimes
gives me paper cuts. We're coming to that.
So I look at my cloud hosted relay. If you recall the talk, the relay
is just relaying packets, there's no server there other than the vpn I
use to do the tunneling. And some tricky port forwarding,
masquerading, ip rules, etc. At first, I'm just looking at the postfix
logs on my mailserver, and I'm not seeing anything inbound. I look in
iptables on the mail server to see if I'm dropping anything
overzealously. Not that I can tell. I run tcpdump on the cloud-based
relay, and I see TCP connections coming in but no answers. Weird. And
then I run tcpdump on my mail server and I see TCP connection attempts
there as well, but nothing going back over the tunnel interface, as
they should. And then I think: "Hey, wait a minute, didn't someone
just talk about this? And, hey, wait a minute, wasn't that person
ME??? Where the hell are my slides?" and I go and find them, and flip
through until I find the relevant bits. I had annoyingly obfuscated
some of the addresses for the audience, so I had to translate the
examples in my slides back to my actual context. And I start checking
things like, are the fwmark rules intact (they were) and how about
that ip rule? What? No ip rule? So I type in my translation, guessing
a little at the table name. And ip tells me, "no such table". What? I
remind myself that the table names are listed in a file called
/etc/iproute2/rt_tables. I look in my /etc/iproute2 directory and I
find rt_tables.pacsave, but no rt_tables. The pacsave version has my
table name in it. Where did my file go? Well, I can just copy it back,
which I do, and then run the ip command again with the table name and
it works. And pretty much instantly, emails start flowing again.
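
The check described above (is the table name that the ip rule refers to
actually defined?) can be sketched in a few lines of Python. This is
only an illustration, not the procedure used in the story: the table
name "mailrelay" and the ID 100 are hypothetical placeholders, and the
parser assumes the common "decimal-ID name" line format with "#"
comments.

```python
from pathlib import Path


def parse_rt_tables(text):
    """Return {name: id} for each 'ID NAME' line, skipping comments and blanks."""
    tables = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        parts = line.split()
        # Assumption: IDs are written in decimal, as in the stock file.
        if len(parts) >= 2 and parts[0].isdigit():
            tables[parts[1]] = int(parts[0])
    return tables


def table_is_defined(name, path="/etc/iproute2/rt_tables"):
    """True if `name` is listed in the rt_tables file at `path`."""
    p = Path(path)
    if not p.is_file():
        return False  # the situation in the story: the file itself is gone
    return name in parse_rt_tables(p.read_text())


if __name__ == "__main__":
    # "mailrelay" is a made-up table name used only for this example.
    sample = "# reserved values\n255\tlocal\n254\tmain\n100 mailrelay # custom\n"
    print(parse_rt_tables(sample))
    print(table_is_defined("mailrelay"))
```

Running something like this after a reboot or a package upgrade would
flag a missing table name before mail starts bouncing.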
So, where the hell did someone get the idea that they should remove my
custom rt_tables file? I look in /var/log/pacman.log and notice that
iproute2 was updated recently and I go look at its commit and don't
see anything particularly guilty-looking. Then I realize the box has
been up since late June 2023, nearly a year, and that my file could
have disappeared anytime since then and things probably would have
continued working until the reboot. So, I hop on the #archlinux
channel, describe my observations, and ask what might have caused this
kind of rude file move that broke my perfectly working network. After
10 minutes or so,
someone pipes up with the commit from a number of versions ago. It
appears that the files in /etc/iproute2 are mostly commented examples,
and that the modern place for such examples is in /usr/share/iproute2/
and that the transition had, through an oversight, moved (thankfully
not deleted) the file I was depending on.
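
A leftover like rt_tables.pacsave with no rt_tables beside it is easy
to look for mechanically. The following is a minimal sketch, assuming
only that a saved copy carries a ".pacsave" suffix appended to the
original file name (as in the story); the function name is made up for
this example.

```python
from pathlib import Path


def orphaned_pacsave(directory):
    """List *.pacsave files in `directory` whose original file is missing."""
    orphans = []
    for saved in sorted(Path(directory).glob("*.pacsave")):
        original = saved.with_suffix("")  # rt_tables.pacsave -> rt_tables
        if not original.exists():
            orphans.append(saved)
    return orphans


if __name__ == "__main__":
    for path in orphaned_pacsave("/etc/iproute2"):
        print(f"{path} has no live counterpart")
```

This only scans one directory and does not recurse; pointing it at
other directories under /etc would be a straightforward extension.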
So, my mail wasn't being delivered correctly for a few days. And I
spent an hour or so puzzling out what had gone wrong. And I'm reminded
that people can be sloppy (whether paid or volunteers), and they can
make mistakes, and distributions like Arch are particularly prone
to moving fast and occasionally breaking things, but I got a nice
puzzle out of it and was reminded of some things that I might have
otherwise forgotten, and I didn't even need to pay for a subscription
to the nytimes puzzle service.