Dzahn added a comment.

  As of right now both wcqs1002 and wcqs2001 seem to be running normally:
blazegraph is active and all Icinga checks are green/OK. It's not obvious what
the issue was, but it seems to be gone now.
  
  re: "various NRPE timeouts": these happen when the nagios-nrpe-server service
dies. A pattern we have seen repeatedly on other hosts is: the host runs out of
memory for whatever reason, the OOM killer picks the nagios-nrpe-server process
as its victim, and all the standard Icinga checks for that host that are
executed on the host via NRPE (check_disk, check_cpu and so on) start failing.
Then someone restarts nagios-nrpe-server and everything recovers. So maybe that
is what happened here as well?
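
  One way to check that theory would be to look in the kernel log (e.g. the
output of `journalctl -k` or `dmesg`) for an OOM-killer entry naming the NRPE
daemon. Below is a minimal sketch, not an existing tool. It assumes the kernel
message formats matched by the regex, and it assumes the daemon shows up under
the process name `nrpe`, which I believe is the binary name used by the Debian
nagios-nrpe-server package. The helper names `oom_victims` and
`nrpe_was_oom_killed` are made up for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Assumed kernel message formats (case-insensitive, so the memcg variant
# "Memory cgroup out of memory: Killed process ..." also matches):
#   older: "Out of memory: Kill process 1234 (nrpe) score 5 or sacrifice child"
#   newer: "Out of memory: Killed process 1234 (nrpe) total-vm:..."
OOM_RE = re.compile(r"out of memory: kill(?:ed)? process (\d+) \(([^)]+)\)",
                    re.IGNORECASE)


def oom_victims(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, process name) for every OOM-killer victim in the lines."""
    victims = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            victims.append((int(match.group(1)), match.group(2)))
    return victims


def nrpe_was_oom_killed(lines: Iterable[str], names=("nrpe",)) -> bool:
    """True if any OOM victim's name is in `names` (assumed default: 'nrpe')."""
    return any(name in names for _, name in oom_victims(lines))


# Hypothetical usage on a host (not run here):
#   import subprocess
#   log = subprocess.run(["journalctl", "-k", "--no-pager"],
#                        capture_output=True, text=True).stdout
#   print(nrpe_was_oom_killed(log.splitlines()))
```

  If this finds a hit around the time the NRPE timeouts started, that would
support the OOM explanation. If it finds nothing, the cause was probably
something else.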

TASK DETAIL
  https://phabricator.wikimedia.org/T294865


To: Dzahn
Cc: Dzahn, RKemper, Gehel, Aklapper, 786, Suran38, Biggs657, joanna_borun, 
Invadibot, Lalamarie69, MPhamWMF, Devnull, maantietaja, lmata, Juan90264, 
Muchiri124, Alter-paule, Beast1978, CBogen, Un1tY, Akuckartz, Hook696, 
Kent7301, RhinosF1, joker88john, Legado_Shulgin, ReaperDawn, CucyNoiD, Nandana, 
Namenlos314, Gaboe420, Giuliamocci, Davinaclare77, Cpaulf30, Techguru.pc, Lahi, 
Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Hfbn0, 
QZanden, EBjune, merbst, LawExplorer, Lewizho99, Zppix, Maathavan, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, Wong128hk, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, faidon, Addshore, Mbch331, Jay8g, 
fgiunchedi
_______________________________________________
Wikidata-bugs mailing list