On Saturday, March 2nd, 2024 at 10:50 AM, Ted Mittelstaedt <[email protected]> wrote:
> Are these 800 servers virtual or physical?

Physical.

> Are the physical servers home-built or commercial from a major brand (HP
> Proliant, etc.)

Home-built... but often with parts from major brands. Or copycat brands.

> Are the servers all the same brand and model or are they a mishmash of pieces
> from different makers?

Uhh... ever seen a graphics card with a Gigabyte logo and EVGA silkscreened onto the PCB?

> Are the servers yours or owned by customers? That is, if they are virtual
> servers owned by remote customers do you have any responsibility to monitor
> them?

We own them. And the racks, cabinets, PDUs.

> For "emergency notifications" the go-to for FOSS is "Big Sister"
> https://bigsister.ch/ Set that up to ping the server interface and if it
> trips a breaker and goes offline then have Big Sister email a text-to-SMS
> gateway for your cell phone number
>
> For monitoring power consumption you have to configure the PDUs for that.
> I've yet to see one of these that supports current monitoring but does not
> support SNMP, so once you get that going you can monitor power consumption
> with mrtg or, if you want to get fancy, https://www.cacti.net/ Cacti is based
> on RRDtool which is the successor to MRTG https://oss.oetiker.ch/rrdtool/

The PDUs have SNMP so I may have to take a look at those.

I've used RT in the past and it's a bit on the excessive side. IIRC it uses Perl and I know next to nothing about Perl.

As of right now, it basically is a one-man show; I am the only one regularly on site for the physical hardware. That said, they want to hire a second person, which is where these tools will start to come in handy.

Creating a custom tool to manage all this stuff is not outside the realm of possibility, but that might end up meaning that I spend all my time maintaining said tool. My instinct is to start setting up some sort of relational database and build it up piece by piece, simply because there is literally NOTHING in place to manage this stuff, especially since the servers are already installed and running. But like anything else, the first step is to list all options and make my list of pros and cons. ;)
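
To make the "relational database, built up piece by piece" idea a bit more concrete, here is a rough sketch of a starting point. Everything in it is an assumption for illustration only: SQLite via Python's standard library, and made-up table, column, and function names (racks, pdus, servers, open_inventory, and so on). The real schema would have to follow whatever the hardware actually looks like.

```python
import sqlite3

# Hypothetical starting schema: racks hold PDUs, servers hang off a PDU outlet.
SCHEMA = """
CREATE TABLE IF NOT EXISTS racks (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS pdus (
    id           INTEGER PRIMARY KEY,
    rack_id      INTEGER NOT NULL REFERENCES racks(id),
    label        TEXT NOT NULL,
    mgmt_address TEXT              -- where the PDU's SNMP agent would live
);
CREATE TABLE IF NOT EXISTS servers (
    id         INTEGER PRIMARY KEY,
    hostname   TEXT NOT NULL UNIQUE,
    rack_id    INTEGER REFERENCES racks(id),
    pdu_id     INTEGER REFERENCES pdus(id),
    pdu_outlet INTEGER,
    notes      TEXT                -- free text for the odd mixed-brand parts
);
"""


def open_inventory(path=":memory:"):
    """Open (or create) the inventory database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def add_rack(conn, label):
    cur = conn.execute("INSERT INTO racks (label) VALUES (?)", (label,))
    conn.commit()
    return cur.lastrowid


def add_pdu(conn, rack_id, label, mgmt_address=None):
    cur = conn.execute(
        "INSERT INTO pdus (rack_id, label, mgmt_address) VALUES (?, ?, ?)",
        (rack_id, label, mgmt_address),
    )
    conn.commit()
    return cur.lastrowid


def add_server(conn, hostname, rack_id=None, pdu_id=None, pdu_outlet=None, notes=None):
    cur = conn.execute(
        "INSERT INTO servers (hostname, rack_id, pdu_id, pdu_outlet, notes) "
        "VALUES (?, ?, ?, ?, ?)",
        (hostname, rack_id, pdu_id, pdu_outlet, notes),
    )
    conn.commit()
    return cur.lastrowid


def servers_on_pdu(conn, pdu_id):
    """List (hostname, outlet) for every server fed by one PDU, ordered by outlet."""
    rows = conn.execute(
        "SELECT hostname, pdu_outlet FROM servers WHERE pdu_id = ? ORDER BY pdu_outlet",
        (pdu_id,),
    )
    return rows.fetchall()
```

Since the servers are already installed and running, the nullable rack and PDU columns let hostnames go in first and the physical details get filled in later. A query like servers_on_pdu() is the sort of thing that would answer "what goes dark if this breaker trips".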
