On 3/7/24 00:15, Russell Senior wrote:


On 3/1/24 17:40, Russell Senior wrote:
Portland Linux/Unix Group General Meeting Announcement

Who: Russell Senior
What: Part 1: A Network Relay via Cloud Instance ; Part 2: Retro Linux Tape Recovery Show and Tell
Where: 5500 SW Dosch Rd, Portland
When: Thursday, March 7, 2024 at 7pm (Help with chairs a few minutes early is always appreciated)
Why: The pursuit of technology freedom

https://pdxlinux.org

This is going to be a two-part talk, because each of the parts alone isn't enough to fill an hour (let's hope).

The first part is going to be a description of how I relay network connections from the Internet to my low-volume home-based email server to evade potential ISP blockages.


Earlier this week, you might have heard about a large Pacific Power outage in Northeast Portland. It only lasted for 30 or 40 minutes, but it affected a wide area and reportedly on the order of 30k customers. I was one of those affected. I was at home at the time. When all the UPSes started screaming at me, and the power didn't come back on immediately, I thought I'd better be proactive and start shutting down machines. And it's good I did, because the batteries I have wouldn't have sustained the load for that long. I wandered around the neighborhood, chatting with neighbors, exchanging information, and began contemplating building a soapbox racer or possibly pushing wheels with sticks in the dark time with no internet connections. It turns out, local journalists report, that a beaver up near the Columbia Slough had chewed through a tree, which fell into some transmission lines and (I'm guessing now) caused a brief fault and opened up a circuit breaker. There must not have been any significant damage to the line, because power was restored pretty much as soon as they'd identified the cause.

The lights came back on, and with some relief I went around turning the machines back on. Some of them had been up for a long time, sustaining long-running sessions, so the outage was a chance to catch up on deferred maintenance. For example, I'd purchased a Core i7 975 on eBay to replace a first-gen i7 920, to max out the CPU in one of my desktop boxes. This was a chance to swap the CPU, which I did, and I got that box powered back up. I also powered my mailserver back on, which had been running without a reboot for nearly a year. It gets regular updates, but I hadn't rebooted into a new kernel. You might recall, I gave a PLUG talk in March describing the cloud-based tunnel I use to connect the internet to the mailserver in my house, bypassing any obstructions my ISP might employ. It has been working great. Life seemingly returned to normal.

Then, Friday morning, I caught wind in a meeting that some mail (turns out, it was just mail being forwarded from gmail to my home server) was bouncing and the senders were seeing this odd domain they hadn't emailed. Uh-oh. But I had plans today and was away from home most of the day. This evening, I remembered about the mailserver and decided I'd better figure out what was going wrong. I had also recently updated my Let's Encrypt certificate, and that sometimes causes trouble: the mailserver doesn't pick up the new certificate until I restart or reload the service.

Oh, and the machine runs Arch. And yes, I don't mind that it sometimes gives me paper cuts. We're coming to that.

So I look at my cloud-hosted relay. If you recall the talk, the relay is just relaying packets; there's no server there other than the VPN I use to do the tunneling. And some tricky port forwarding, masquerading, ip rules, etc. At first, I'm just looking at the postfix logs on my mailserver, and I'm not seeing anything inbound. I look in iptables on the mail server to see if I'm dropping anything overzealously. Not that I can tell. I run tcpdump on the cloud-based relay, and I see TCP connections coming in but no answers. Weird. And then I run tcpdump on my mail server and I see TCP connection attempts there as well, but nothing going back over the tunnel interface, as it should. And then I think: "Hey, wait a minute, didn't someone just talk about this? And, hey, wait a minute, wasn't that person ME??? Where the hell are my slides?" and I go and find them, and flip through until I find the relevant bits.

I had annoyingly obfuscated some of the addresses for the audience, so I had to translate the examples in my slides back to my actual context. And I start checking things like, are the fwmark rules intact (they were) and how about that ip rule? What? No ip rule? So I type in my translation, guessing a little at the table name. And ip tells me, "no such table". What? I remind myself that the table names are listed in a file called /etc/iproute2/rt_tables. I look in my /etc/iproute2 directory and I find rt_tables.pacsave, but no rt_tables. The pacsave version has my table name in it. Where did my file go? Well, I can just copy it back, which I do, and then run the ip command again with the table name and it works. And pretty much instantly, emails start flowing again.
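
A check along these lines could flag the missing table name before mail starts bouncing. This is a minimal Python sketch, not the setup from the talk: the table name `mailrelay` and its id are hypothetical placeholders, and the list of files consulted (including /usr/share/iproute2/rt_tables, and the order in which one overrides the other) is an assumption that may differ between iproute2 versions.

```python
from pathlib import Path

# Assumed locations, packaged defaults first so local /etc entries override.
# The exact lookup order used by iproute2 may differ by version.
RT_TABLES_PATHS = [
    Path("/usr/share/iproute2/rt_tables"),
    Path("/etc/iproute2/rt_tables"),
]


def parse_rt_tables(text):
    """Return {table_name: table_id} from rt_tables-style text.

    Each useful line is '<id> <name>'; '#' starts a comment.
    """
    tables = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            table_id = int(fields[0], 0)  # accepts decimal or 0x-prefixed ids
        except ValueError:
            continue  # skip lines that do not start with a number
        tables[fields[1]] = table_id
    return tables


def known_tables(paths=RT_TABLES_PATHS):
    """Merge the table names found in whichever of the files exist."""
    tables = {}
    for path in paths:
        if path.is_file():
            tables.update(parse_rt_tables(path.read_text()))
    return tables


if __name__ == "__main__":
    wanted = "mailrelay"  # hypothetical table name, substitute your own
    if wanted in known_tables():
        print(f"routing table {wanted!r} is defined")
    else:
        print(f"routing table {wanted!r} is NOT defined; ip rules naming it will fail")
```

Run from cron or a systemd timer, a script like this would complain as soon as the name stops resolving, rather than at the next reboot when the ip rule has to be re-added.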

So, where the hell did someone get the idea that they should remove my custom rt_tables file? I look in /var/log/pacman.log and notice that iproute2 was updated recently, and I go look at its commit and don't see anything particularly guilty looking. Then I realize the box has been up since late June 2023, nearly a year, so my file could have disappeared anytime since then and things probably would have continued working until the reboot. So, I hop on the #archlinux channel, describe my observations, and ask what might have caused this kind of rude file move that broke my perfectly working network. After 10 minutes or so, someone pipes up with the commit from a number of versions ago. It appears that the files in /etc/iproute2 are mostly commented examples, and that the modern place for such examples is in /usr/share/iproute2/ and that the transition had, through an oversight, moved (thankfully not deleted) the file I was depending on.
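
Since the file sat there as rt_tables.pacsave for months without anyone noticing, one way to spot this sooner is to look for .pacsave files whose original no longer exists. The following is a small Python sketch under that assumption; the function name `orphaned_pacsave` is made up for illustration, and it only inspects the one directory it is given.

```python
from pathlib import Path


def orphaned_pacsave(directory):
    """Return the *.pacsave files in `directory` whose original file is gone.

    For example, rt_tables.pacsave is reported when rt_tables is missing.
    """
    orphans = []
    for saved in sorted(Path(directory).glob("*.pacsave")):
        original = saved.with_suffix("")  # strip the trailing .pacsave
        if not original.exists():
            orphans.append(saved)
    return orphans


if __name__ == "__main__":
    for path in orphaned_pacsave("/etc/iproute2"):
        print(f"{path} has no live counterpart; was a config file moved aside?")
```

A .pacsave file that still has a live counterpart is usually harmless, which is why the sketch reports only the ones that stand alone.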

So, my mail wasn't being delivered correctly for a few days. And I spent an hour or so puzzling out what had gone wrong. And I'm reminded that people can be sloppy (whether paid or volunteers), and they can make mistakes and distributions like Arch are particularly susceptible to moving fast and occasionally breaking things, but I got a nice puzzle out of it and was reminded of some things that I might have otherwise forgotten, and I didn't even need to pay for a subscription to the nytimes puzzle service.

--
Russell Senior
[email protected]
