[Wikidata-bugs] [Maniphest] T301147: The WDQS streaming updater went unstable for several hours (2022-02-06T23:00:00 - 2022-02-07T06:20:00)

JMeybohm Tue, 08 Feb 2022 01:57:16 -0800

JMeybohm added a comment.


  In T301147#7689837 <https://phabricator.wikimedia.org/T301147#7689837>, 
@dcausse wrote:
  
  > @JMeybohm we're still investigating why the application did not properly 
recover while kubernetes1014 went down but if you have ideas on the two 
questions in the ticket description this would be very helpful, thanks!
  
  Unfortunately I'm not exactly sure what happened to the node. What I know is 
that the system load surged (potentially due to high iowait), leaving running 
processes practically starved, but the system was still responding to ICMP and 
kubernetes status heartbeats still (mostly) worked. This left the node 
flipping between the Ready and NotReady states.
  That means the node was not actually down from the k8s POV, which is why no 
new Pods were created until I drained the node, or rather until I powercycled 
it (evicting pods was hanging as well, since k8s tries to be nice and the node 
was still in its overloaded state).
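
  As a rough illustration of why a flapping node never gets its Pods 
rescheduled, here is a minimal sketch. It is a toy model, not Kubernetes code: 
the function names, the observation format, and the 300-second tolerance are 
assumptions made for the example, not values taken from this cluster.

```python
from typing import Iterable, Tuple

# (timestamp in seconds, node reported Ready?) -- hypothetical observation format
Observation = Tuple[float, bool]


def longest_not_ready_stretch(observations: Iterable[Observation], end_time: float) -> float:
    """Return the longest continuous NotReady period, in seconds."""
    longest = 0.0
    not_ready_since = None
    for timestamp, is_ready in observations:
        if is_ready:
            if not_ready_since is not None:
                # The node recovered, so the NotReady stretch ends here.
                longest = max(longest, timestamp - not_ready_since)
                not_ready_since = None
        elif not_ready_since is None:
            # First NotReady observation of a new stretch.
            not_ready_since = timestamp
    if not_ready_since is not None:
        # Still NotReady at the end of the observation window.
        longest = max(longest, end_time - not_ready_since)
    return longest


def would_evict(observations: Iterable[Observation], end_time: float,
                toleration_seconds: float = 300.0) -> bool:
    """True if the node stayed NotReady long enough for Pods to be evicted.

    toleration_seconds is an assumed value for this sketch.
    """
    return longest_not_ready_stretch(observations, end_time) >= toleration_seconds


# A node that flips state every 120 seconds for an hour (like the overloaded node).
flapping = [(float(t), (t // 120) % 2 == 0) for t in range(0, 3600, 120)]
# A node that goes NotReady after 120 seconds and never comes back.
dead = [(0.0, True), (120.0, False)]

flapping_evicted = would_evict(flapping, end_time=3600.0)
dead_evicted = would_evict(dead, end_time=3600.0)
```

  In this model the flapping node is never NotReady for longer than 120 
seconds at a time, so the tolerance is never exceeded and nothing is 
rescheduled, while the node that stays down crosses it and its Pods would be 
replaced.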

TASK DETAIL
  https://phabricator.wikimedia.org/T301147

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JMeybohm
Cc: Addshore, JMeybohm, Michael, Aklapper, dcausse, Invadibot, MPhamWMF, 
maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331

_______________________________________________
Wikidata-bugs mailing list -- [email protected]
To unsubscribe send an email to [email protected]
