Dzahn added a comment.
As of right now, both wcqs1002 and wcqs2001 seem to be running normally: blazegraph is active and all Icinga checks are green/OK. It's not obvious what the issue was, but it seems to be gone now.

Re: "various NRPE timeouts": these happen when the nagios-nrpe-server service dies. An issue we have seen repeatedly on other hosts goes like this: the host runs out of memory for whatever reason, the OOM killer picks the nagios-nrpe-server process as its victim, and all the standard Icinga checks that are executed on the host via NRPE (check_disk, check_cpu and so on) start failing. Someone restarts nagios-nrpe-server and everything recovers. So maybe it was that here as well?

TASK DETAIL
https://phabricator.wikimedia.org/T294865
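
One way to test the OOM-killer theory would be to scan the kernel log for kill messages. The following is only a minimal sketch, not something that was run on these hosts. It assumes the kernel logs its usual "Killed process <pid> (<name>)" line and that the NRPE daemon shows up under the process name "nrpe"; both the helper names and that process name are assumptions made for illustration.

```python
import re

# Assumption: the kernel reports OOM kills in the common form
# "... Killed process 4321 (nrpe) total-vm:...".
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_victims(log_lines):
    """Return a list of (pid, process_name) for every OOM kill found."""
    victims = []
    for line in log_lines:
        match = OOM_KILL_RE.search(line)
        if match:
            victims.append((int(match.group(1)), match.group(2)))
    return victims


def nrpe_was_oom_killed(log_lines, process_name="nrpe"):
    """True if any OOM kill in the log hit the given process name.

    "nrpe" is an assumed default; adjust it to whatever name the
    nagios-nrpe-server process actually has on the host.
    """
    return any(name == process_name for _, name in find_oom_victims(log_lines))
```

The functions take any iterable of log lines, so they can be fed the saved output of `journalctl -k` or `dmesg`. An empty result does not rule out the theory if the kernel log has already rotated.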
