Hi,

I recently started the resiliency testing on Honolulu. The test case can be summarized as follows:

1) run basic_vm test
2) stop a k8s worker (I shut down the VM hosting the worker in my case)
3) run basic_vm
4) restart the k8s worker (I restart the VM)
5) run basic_vm

I gave more details and logs in https://jira.onap.org/browse/TEST-308

On my first attempt, I had the impression that the pods were properly restarted on the other healthy workers, but on the last attempt it was clearly not the case. I can see this in the events of a pod that was running on the stopped worker:

Events:
  Type     Reason        Age    From             Message
  ----     ------        ----   ----             -------
  Warning  NodeNotReady  2m45s  node-controller  Node is not ready

It seems that I just did not wait long enough in the second case: the default value of pod-eviction-timeout is 5m (see https://github.com/kubernetes/kubernetes/issues/55713). I will redo a test before reinstalling a weekly, and reduce the pod-eviction-timeout.

Anyway, in both cases the test was failing while the worker was down, apparently due to network issues. More surprisingly, my test reported that I was no longer able to contact the SDC (first step of the basic_vm test), even though there were no SDC pods at all on the faulty worker. I did not change anything on the controller.

What could explain the timeout on requests to the SDC? Note that the requests are launched from a docker container outside the k8s cluster.

When I restart the worker, the pods are restarted properly and the test is PASS again.

/Morgan
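PS: to illustrate the timing argument above (NodeNotReady for 2m45s against a 5m default), here is a minimal Python sketch. It assumes the eviction delay is simply the 5m default mentioned above; the helper names parse_age, eviction_expected and seconds_to_wait are hypothetical and not part of any ONAP or Kubernetes tooling.

```python
import re

# Assumption: the eviction delay is the 5m default mentioned above.
DEFAULT_EVICTION_TIMEOUT_S = 5 * 60

_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_age(age: str) -> int:
    """Convert a kubectl-style age such as '2m45s' into seconds."""
    parts = re.findall(r"(\d+)([dhms])", age)
    # Reject strings that contain anything other than <number><unit> groups.
    if not parts or "".join(n + u for n, u in parts) != age:
        raise ValueError(f"unrecognised age: {age!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


def eviction_expected(age: str, timeout_s: int = DEFAULT_EVICTION_TIMEOUT_S) -> bool:
    """True if the node has been NotReady for at least the eviction timeout."""
    return parse_age(age) >= timeout_s


def seconds_to_wait(age: str, timeout_s: int = DEFAULT_EVICTION_TIMEOUT_S) -> int:
    """Remaining seconds before pods on the NotReady node should be evicted."""
    return max(0, timeout_s - parse_age(age))


if __name__ == "__main__":
    age = "2m45s"  # Age column of the NodeNotReady event above
    print(eviction_expected(age), seconds_to_wait(age))
```

With the age from the event above, the check says eviction is not yet due, which is consistent with the pods not having been rescheduled when I looked.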
