Hi,

We re-enabled gating today (on master). There is apparently a problem in OOM: we systematically get the same error (see Sylvain's mail). Something bad has probably been merged.
Sylvain has probably found the root cause of our errors. We were getting lots of error/warning messages from etcd complaining that it was too slow. The installation now requires SSD disks for the etcd VMs (previously, no constraint was specified). Since this change was applied, we no longer get these error messages. The deployment of ONAP also seems faster (only DMaaP and SO remain a bit long) and, in the end, more deterministic. Our migration to 3 etcd VMs is probably what caused the issues on our different chains.

We tested on our daily master chain and on Openlab (the community lab, based on Dublin). I sent a mail to our active Openlab users; they should be able to connect and try Dublin.

As discussed, we will be off next week and back in August. We have disabled our daily chains during our PTO; we only keep Openlab + gating.

/Morgan

________________________________________
From: Lefevre, Catherine [[email protected]]
Sent: Thursday, July 11, 2019 12:00
To: RICHOMME Morgan TGI/OLN; FREEMAN, BRIAN D; [email protected]; [email protected]; [email protected]; [email protected]
Cc: DEBEAU Eric TGI/OLN; DESBUREAUX Sylvain TGI/OLN
Subject: RE: [ONAP][Gating] Gating chains momentarily disabled

Thank you, Morgan, for the update. I will keep today's sync-up call so you can share your latest updates.

KR,
Catherine

-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Thursday, July 11, 2019 11:18 AM
To: FREEMAN, BRIAN D <[email protected]>; [email protected]; Lefevre, Catherine <[email protected]>; [email protected]; [email protected]; [email protected]
Cc: DEBEAU Eric TGI/OLN <[email protected]>; DESBUREAUX Sylvain TGI/OLN <[email protected]>
Subject: [ONAP][Gating] Gating chains momentarily disabled

Hi,

We have temporarily stopped the gating chain on Orange Openlab. We have been facing weird behavior for several days. Initially we believed it could be due to ONAP, but the latest investigations seem to indicate a Kubernetes issue.
Our gating chains use a configuration of 3 controllers + 12 compute nodes. For reasons we still do not understand, some compute nodes (usually 1 or 2 of the 12) are sometimes badly configured: their IPVS rules (similar to iptables, used to manage routing within the k8s cluster) are incomplete. Apparently this is not due to the CNI (we tried both Weave and Flannel). As a consequence, some ONAP components cannot contact other ONAP components when they run on a wrongly configured compute node. Deleting the pod may restore service if, by chance, the pod is rescheduled on a healthy node. This could explain the intermittent problems we reported, and why the issues were hard to reproduce.

For gating and our daily chains, the consequence is an abnormal number of failed pods: the pods should be OK, but because they cannot contact the pods they depend on, their init fails.

We need to understand the root cause of the problem and see what changed over the last few days. By default we run a simple healthcheck test suite after the Kubernetes installation; it is probably not enough.

Meanwhile, to avoid any misleading gating reports, we have disabled the listener on Gerrit.

Sorry for the inconvenience,
Morgan & Sylvain

_________________________________________________________________________________________________________________________
This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you.
