Relax, everyone here has offered a good set of options to consider. My goal was to brainstorm possible solutions, since I haven't worked directly with this kind of software in a Linux environment. The first step is always to list all your options, even if some of them may not be the best, and I've got plenty to think about here. I'm currently in the "there are no bad ideas here" stage.

On Sunday, March 3rd, 2024 at 8:22 AM, Ted Mittelstaedt <[email protected]> wrote:

> ...
> But all his hardware is DIFFERENT. That's why the past of this company has
> been littered with burned out system admins who quit.
> ...

Not exactly. The previous admin did not quit, he was FIRED. He had no concept of Linux, IT, or even basic PC troubleshooting, but acted as if he understood why everything was broken and blamed the remote software team for pretty much everything.

A lot of the jankiness right now comes from the fact that the onsite technician was a complete doofus who needed his hand held when replacing a bad GPU, because he wasn't able to verify that it was actually working via lspci/nvidia-smi. "Training" the remote team to give me the PCIe ID for a bad GPU was more about building confidence that I could actually handle that info.

They were also having issues with basic inventory management. Just getting this guy to write down a tracking number or count server rails was a massive undertaking, so you can see why they might not want to splurge for nice parts.

I ran into a similar problem last year at a big corporation - lower tier support technicians were mad at management for making bad decisions, but management makes those decisions based on ticket data. So I looked at the ticket data and noticed that people weren't creating their SNOW tickets properly, which resulted in all of their work being massively under-reported. So of course management is going to assume you have free time... that's what your own ticket data says. Why are we blaming the managers if we didn't do our job correctly?

Similar problem here, just on a much smaller scale. A tech who didn't properly manage the physical location ended up making it difficult for decision makers to choose a path forward.

The short-term solution might be to modularize the different tasks and follow a one-application-per-task mindset.
If it gets to the point where everyone else wants to integrate these tools together, that could become the opportunity to suggest a more robust system.

-Ben
