JMeybohm added a comment.

  In T301147#7689837 <https://phabricator.wikimedia.org/T301147#7689837>, 
@dcausse wrote:
  
  > @JMeybohm we're still investigating why the application did not properly 
recover while kubernetes1014 went down but if you have ideas on the two 
questions in the ticket description this would be very helpful, thanks!
  
  Unfortunately I'm not exactly sure what happened to the node. What I know is 
that the system load surged (potentially due to high iowait), leaving running 
processes practically starved, while the system still responded to ICMP and 
the Kubernetes status heartbeats still (mostly) worked. This left the node 
flipping between the Ready and NotReady states.
  That means the node was not actually down from the k8s point of view, which 
is why no new Pods were created until I drained the node, or rather until I 
power-cycled it (evicting pods was hanging as well, since k8s tries to be nice 
and the node was still in its overloaded state).
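
  To illustrate the Ready/NotReady flipping described above, here is a minimal 
sketch of how such flapping could be detected from a series of node status 
samples. It is not something that runs on the cluster: the sample format, the 
function names and the thresholds are all assumptions made for this example, 
and it does not talk to the Kubernetes API.

```python
from typing import List, Tuple

# Hypothetical sample type: (unix_timestamp_seconds, node_reported_ready).
Sample = Tuple[float, bool]


def count_ready_flips(samples: List[Sample]) -> int:
    """Count how often the Ready state changed between consecutive samples."""
    ordered = sorted(samples, key=lambda s: s[0])
    return sum(1 for prev, cur in zip(ordered, ordered[1:]) if prev[1] != cur[1])


def is_flapping(samples: List[Sample], window_seconds: float, min_flips: int) -> bool:
    """Return True if the state changed at least min_flips times within the
    most recent window_seconds of samples (both thresholds are assumptions)."""
    if not samples:
        return False
    latest = max(t for t, _ in samples)
    recent = [s for s in samples if latest - s[0] <= window_seconds]
    return count_ready_flips(recent) >= min_flips


# Made-up data: a node alternating between Ready and NotReady every minute.
samples = [(0, True), (60, False), (120, True), (180, False), (240, True)]
flapping = is_flapping(samples, window_seconds=300, min_flips=3)
```

  A node that keeps reporting Ready often enough is never treated as 
down for long, so a check along these lines would flag the situation even 
though no single sample looks like a hard failure.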

TASK DETAIL
  https://phabricator.wikimedia.org/T301147

To: JMeybohm
Cc: Addshore, JMeybohm, Michael, Aklapper, dcausse, Invadibot, MPhamWMF, 
maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331