Jobs are used in ONAP primarily to perform one-time database initialization. A limitation of using a Job is that it runs to completion and will not be restarted. If there are Pods that depend on the completion of a Job, those Pods will become stuck indefinitely, as was observed when infrastructure failures occur. Unfortunately, the only course of action is to reinstall the Helm Charts that apply the above-mentioned Job/Dependency pattern.
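To make the stuck-Pod symptom easier to spot, here is a minimal Python sketch that scans Pod status data for containers still waiting with the reason PodInitializing. It assumes the input has the shape of `kubectl get pods -o json` output (a list object with an `items` array). The function name `find_stuck_pods`, the `SAMPLE` data, and the second pod name are hypothetical and only for illustration; this is not part of OOM.

```python
def find_stuck_pods(pod_list):
    """Return names of pods with a container waiting on PodInitializing.

    pod_list is assumed to look like the parsed output of
    `kubectl get pods -o json` (a dict with an "items" list).
    """
    stuck = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for container in statuses:
            waiting = container.get("state", {}).get("waiting")
            if waiting and waiting.get("reason") == "PodInitializing":
                stuck.append(pod["metadata"]["name"])
                break  # one waiting container is enough to flag the pod
    return stuck


# Hypothetical sample data: one pod modelled on the error reported in this
# thread, and one made-up healthy pod for contrast.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "onap-portal-db-config-n6lrn"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "portal-db-job",
                        "state": {"waiting": {"reason": "PodInitializing"}},
                    }
                ]
            },
        },
        {
            "metadata": {"name": "example-healthy-pod"},  # hypothetical
            "status": {
                "containerStatuses": [
                    {"name": "app", "state": {"running": {}}}
                ]
            },
        },
    ]
}

print(find_stuck_pods(SAMPLE))
```

Against a live cluster, the same function could be fed with `json.load(sys.stdin)` from `kubectl get pods -n <namespace> -o json`, where the namespace depends on your deployment. It only reports the symptom; it does not restart the Job or its dependent Pods.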
Some of this behavior may have been corrected in Casablanca, but I suspect there still may be ONAP components that suffer from this. I encourage you to raise defects against the ONAP components that failed to restart after the power outage. This will help drive the need for better resiliency testing in future releases.

A production-grade platform is a priority for the OOM team. In the Dublin release our team will use feedback like this to further push this agenda.

Thanks,
Mike

--
Mike Elliott
ONAP OOM PTL
Amdocs Senior Architect

From: <onap-discuss@lists.onap.org> on behalf of Syed Atif Husain <syed...@infosys.com>
Reply-To: "onap-discuss@lists.onap.org" <onap-discuss@lists.onap.org>, "syed...@infosys.com" <syed...@infosys.com>
Date: Tuesday, November 27, 2018 at 12:03 AM
To: "onap-discuss@lists.onap.org" <onap-discuss@lists.onap.org>, "bw3...@att.com" <bw3...@att.com>
Subject: Re: [onap-discuss] Has anyone had success in restarting the #kubernetes cluster after a power outage with an #OOM #Beijing ONAP

I have faced the same issue, but haven't found a solution so far except for reinstalling ONAP.

Regards,
Atif

From: onap-discuss@lists.onap.org <onap-discuss@lists.onap.org> On Behalf Of bw3...@att.com
Sent: Tuesday, November 27, 2018 12:10 AM
To: onap-discuss@lists.onap.org
Subject: [onap-discuss] Has anyone had success in restarting the #kubernetes cluster after a power outage with an #OOM #Beijing ONAP

After a power outage about 75% of the pods come back, and for the most part the functionality is not working.
Seeing a bunch of errors for pods that look like this:

container "portal-db-job" in pod "onap-portal-db-config-n6lrn" is waiting to start: PodInitializing