Hey there,

Maybe this can help with this topic: I found that Kured[1] drains the workers
before they're restarted. It is installed by applying Kubernetes objects, and
it supports several K8s versions.
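For context, the drain-before-reboot sequence that Kured automates can be
spelled out with standard kubectl commands. The sketch below is not Kured's
code. It is a minimal, hypothetical Python helper (`reboot_plan` and
`as_shell` are illustrative names) that only builds the command lists, so
nothing touches a cluster unless you run the printed commands yourself.

```python
import shlex
from typing import Dict, List


def reboot_plan(node: str) -> Dict[str, List[List[str]]]:
    """Build (but do not run) the kubectl steps for rebooting one node safely.

    Hypothetical helper illustrating the idea behind Kured; it is not Kured's
    implementation, and the reboot itself happens outside this sketch.
    """
    return {
        # `kubectl drain` first cordons the node (marks it unschedulable),
        # then evicts its pods; DaemonSet-managed pods are ignored.
        "before_reboot": [["kubectl", "drain", node, "--ignore-daemonsets"]],
        # Run only after the node reports Ready again following the reboot.
        "after_node_ready": [["kubectl", "uncordon", node]],
    }


def as_shell(cmd: List[str]) -> str:
    """Render a command list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    # Only prints the commands; it does not execute them.
    for phase, cmds in reboot_plan("worker-1").items():
        for cmd in cmds:
            print(f"{phase}: {as_shell(cmd)}")
```

Depending on what runs on the node, `kubectl drain` may need additional flags
(for example for pods using local storage); check `kubectl drain --help` for
your kubectl version.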

Regards,
Victor Morales

[1] https://github.com/weaveworks/kured

From: <[email protected]> on behalf of Mike Elliott 
<[email protected]>
Reply-To: "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>
Date: Tuesday, November 27, 2018 at 7:13 AM
To: "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: [onap-discuss] Has anyone had success in restarting the 
#kubernetes cluster after a power outage with an #OOM #Beijing ONAP

Jobs are used in ONAP primarily to perform one-time database initialization. A
limitation of a Job is that it runs to completion and is not restarted. If
Pods depend on the completion of a Job, those Pods will remain stuck
indefinitely, as has been observed after infrastructure failures.
Unfortunately, the only course of action is to reinstall the Helm charts that
apply the above-mentioned Job/Dependency pattern.
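To make the failure mode concrete: in a Job/Dependency pattern, the dependent
Pod gates its startup on the Job having succeeded. A minimal sketch of such a
gate follows, assuming the Job's state is read from the `status.succeeded`
field of the Job object (e.g. parsed from `kubectl get job <name> -o json`).
The names `job_succeeded` and `wait_for_job` are hypothetical; this is not the
actual ONAP readiness-check implementation. With no timeout, the loop waits
forever, which is the "stuck indefinitely" behavior described above.

```python
import time
from typing import Any, Callable, Dict, Optional


def job_succeeded(job: Dict[str, Any]) -> bool:
    """True if a Job object (parsed from JSON) reports at least one success."""
    return (job.get("status", {}).get("succeeded") or 0) >= 1


def wait_for_job(
    is_done: Callable[[], bool],
    poll_seconds: float = 5.0,
    timeout_seconds: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_done() until it returns True.

    Returns True once the Job is done, or False if timeout_seconds elapses.
    With timeout_seconds=None there is no deadline: if the Job never
    completes (and is never re-run), the caller blocks forever.
    """
    deadline = None if timeout_seconds is None else clock() + timeout_seconds
    while not is_done():
        if deadline is not None and clock() >= deadline:
            return False
        sleep(poll_seconds)
    return True
```

`sleep` and `clock` are injectable only so the behavior can be exercised
without real waiting.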

Some of this behavior may have been corrected in Casablanca, but I suspect
there may still be ONAP components that suffer from it. I encourage you to
raise defects against the ONAP components that failed to restart after the
power outage; this will help drive the need for better resiliency testing in
future releases. A production-grade platform is a priority for the OOM team,
and in the Dublin release our team will use feedback like this to push that
agenda further.

Thanks,
Mike

--
Mike Elliott
ONAP OOM PTL
Amdocs Senior Architect


From: <[email protected]> on behalf of Syed Atif Husain 
<[email protected]>
Reply-To: "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>
Date: Tuesday, November 27, 2018 at 12:03 AM
To: "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>
Subject: Re: [onap-discuss] Has anyone had success in restarting the 
#kubernetes cluster after a power outage with an #OOM #Beijing ONAP

I have faced the same issue, but haven't found a solution so far other than
reinstalling ONAP.

Regards,
Atif

From: [email protected] <[email protected]> On Behalf Of 
[email protected]
Sent: Tuesday, November 27, 2018 12:10 AM
To: [email protected]
Subject: [onap-discuss] Has anyone had success in restarting the #kubernetes 
cluster after a power outage with an #OOM #Beijing ONAP

After a power outage, about 75% of the pods come back, but for the most part
the functionality is not working. I'm seeing a bunch of pod errors that look
like this: container "portal-db-job" in pod "onap-portal-db-config-n6lrn" is
waiting to start: PodInitializing
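A quick way to see how widespread the error quoted above is: list every
container whose waiting reason is `PodInitializing`, together with the init
containers in the same pod that have not yet terminated (those are what it is
waiting on). This is a sketch assuming `kubectl` is on the PATH and pointed at
the affected cluster; the function names are hypothetical, and it only reads
pod status.

```python
import json
import subprocess
from typing import Any, Dict, List, Tuple


def load_pods_from_cluster() -> Dict[str, Any]:
    """Fetch all pods as JSON via kubectl (read-only)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def pods_waiting_on_init(pods: Dict[str, Any]) -> List[Tuple[str, str, str, List[str]]]:
    """Return (namespace, pod, container, unfinished init containers) tuples
    for every container waiting with reason PodInitializing."""
    stuck = []
    for pod in pods.get("items", []):
        meta = pod.get("metadata", {})
        status = pod.get("status", {})
        # Init containers that have not reached the terminated state yet.
        pending_init = [
            cs.get("name", "")
            for cs in status.get("initContainerStatuses") or []
            if "terminated" not in (cs.get("state") or {})
        ]
        for cs in status.get("containerStatuses") or []:
            waiting = (cs.get("state") or {}).get("waiting") or {}
            if waiting.get("reason") == "PodInitializing":
                stuck.append((meta.get("namespace", ""), meta.get("name", ""),
                              cs.get("name", ""), pending_init))
    return stuck
```

Usage: `for row in pods_waiting_on_init(load_pods_from_cluster()): print(row)`.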