Gary, Team,

I would like to share some findings from the CLAMP team in case other teams 
encounter a similar issue.
It seems that the problem occurs when the container takes some time to start 
(around 60s in the Windriver OOM-Daily lab).

The liveness probe setting initialDelaySeconds must be adapted. It was 
initially set to 30s, which is too low; we will increase it to 120s.
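
To illustrate why 30s fails and 120s works for a container that needs about 60s to start, here is a rough timing sketch. It assumes the Kubernetes defaults periodSeconds=10 and failureThreshold=3 (the CLAMP chart may override them), ignores kubelet scheduling jitter, and uses a hypothetical helper name, first_kill_time. It is a simplified model, not the kubelet's actual logic.

```python
def first_kill_time(startup_s, initial_delay_s, period_s=10, failure_threshold=3):
    """Return the time (seconds after container start) at which the liveness
    probe would trigger a restart, or None if a probe succeeds first.

    Simplified model: the first probe runs at initial_delay_s, later probes
    run every period_s, and any probe before startup_s fails.
    """
    for attempt in range(failure_threshold):
        probe_time = initial_delay_s + attempt * period_s
        if probe_time >= startup_s:
            return None  # container is up before the failure threshold is hit
    # Every probe within the threshold failed; the last one triggers the restart.
    return initial_delay_s + (failure_threshold - 1) * period_s


print(first_kill_time(startup_s=60, initial_delay_s=30))   # 50
print(first_kill_time(startup_s=60, initial_delay_s=120))  # None
```

Under these assumptions, a 30s delay leads to a restart at about 50s, before the 60s startup completes, while a 120s delay leaves enough margin.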

Hope it helps.

Best regards
Catherine

From: Lefevre, Catherine
Sent: Wednesday, August 08, 2018 5:49 AM
To: 'Gary Wu' <gary.i...@huawei.com>; 'onap-discuss@lists.onap.org' 
<onap-discuss@lists.onap.org>; 'roger.maitl...@amdocs.com' 
<roger.maitl...@amdocs.com>; 'onap-rele...@lists.onap.org' 
<onap-rele...@lists.onap.org>
Subject: RE: [onap-discuss] [oom] OOM container restarts

Thank you Gary for the additional information.
I raised this request earlier because some PTLs could not find the logs from 
previous OOM container restarts.

If PTLs check at the beginning of their day then they should be able to capture 
the logs if the restart reoccurs.

I looked at the different environments yesterday (Aug 7th).
I have asked the CLAMP, SDNC, DCAEGEN2, MUSIC, EXTAPI, and AAF teams to 
investigate since their containers restart repeatedly.
I did not contact the VFC team; I assumed that you were already 
working with them.
If not then let me know.

Finally, great job on the Grafana dashboard!

Best regards
Catherine

From: Gary Wu [mailto:gary.i...@huawei.com]
Sent: Tuesday, August 07, 2018 5:05 PM
To: Lefevre, Catherine <cl6...@intl.att.com>; onap-discuss@lists.onap.org; 
roger.maitl...@amdocs.com; onap-rele...@lists.onap.org
Subject: RE: [onap-discuss] [oom] OOM container restarts

Hi Catherine,

Generally speaking, a container restart means that:

1. The liveness probe initial delay is too low, and

2. The liveness probe, when triggered, is returning the wrong status while 
the container is still initializing (i.e. not dead).

The default liveness probe in OOM is a TCP probe with a short (10 seconds?) 
initial delay.  This means that if the TCP port specified is not responding at 
10s after container start, then the container will be killed and restarted.

There are two ways to remedy this:

1. Make the initial delay long enough to guarantee that by then the TCP 
port will be up, keeping in mind that ONAP might be running on arbitrarily slow 
hardware.

2. Change the liveness probe to something else, perhaps to a shell script 
probe inside the container that can do more sophisticated checks like process 
status, etc.
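
As a rough illustration of the second option, the sketch below combines a TCP port check with a process check and maps the result to an exit code, since an exec-style liveness probe treats exit code 0 as healthy. It is written in Python rather than shell, which assumes the container image ships a Python interpreter. The function names and the idea of passing a list of checks are hypothetical, not taken from any OOM chart.

```python
import os
import socket


def port_is_open(host, port, timeout_s=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def process_is_alive(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence; nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def liveness_exit_code(checks):
    """Return 0 if every check passes, 1 otherwise."""
    return 0 if all(check() for check in checks) else 1
```

A probe script could then end with something like sys.exit(liveness_exit_code([...])), passing in the port and pid that apply to the service in question.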

I can look into the feasibility of saving the container logs.  In the meantime, 
it might actually be easier to debug the restart issues directly in the 
environments.  The deployments are done daily at midnight Pacific so anyone who 
is actively working on a restart issue should have plenty of time to gather 
information and try out liveness probe changes.

Thanks,
Gary

From: Lefevre, Catherine [mailto:cl6...@intl.att.com]
Sent: Tuesday, August 07, 2018 4:48 AM
To: onap-discuss@lists.onap.org; roger.maitl...@amdocs.com; Gary Wu 
<gary.i...@huawei.com>; onap-rele...@lists.onap.org
Subject: RE: [onap-discuss] [oom] OOM container restarts

Good morning Gary,

Would it be possible to save the logs before a new deployment is performed so 
we can investigate what could be causing these restarts?

Many thanks & regards
Catherine

From: onap-discuss@lists.onap.org [mailto:onap-discuss@lists.onap.org] On 
Behalf Of Roger Maitland
Sent: Thursday, August 02, 2018 2:32 PM
To: onap-discuss@lists.onap.org; gary.i...@huawei.com; 
onap-rele...@lists.onap.org
Subject: Re: [onap-discuss] [oom] OOM container restarts

Thanks for setting this up Gary.  Having good data will allow us to observe and 
fix these problems.

Cheers,
Roger

From: <onap-discuss@lists.onap.org> on behalf of Gary Wu <gary.i...@huawei.com>
Reply-To: "onap-discuss@lists.onap.org" <onap-discuss@lists.onap.org>, 
"gary.i...@huawei.com" <gary.i...@huawei.com>
Date: Wednesday, August 1, 2018 at 6:09 PM
To: "onap-discuss@lists.onap.org" <onap-discuss@lists.onap.org>, 
"onap-rele...@lists.onap.org" <onap-rele...@lists.onap.org>
Subject: [onap-discuss] [oom] OOM container restarts

Hi PTLs,

We have started to log the number of container restarts in OOM daily deployment 
tests:

http://onapci.org/grafana/d/kRvfoqKmz/oom-container-restarts?orgId=1

Please review for your respective projects and see if your containers appear in 
these charts.

A restart count > 0 means that your container got killed by Kubernetes while it 
was initializing.  If your docker container has a non-zero restart count, this 
means that the liveness probe configuration in your respective helm charts 
needs to be fixed so that Kubernetes doesn’t kill your containers during a slow 
startup.
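
For teams that want to check restart counts directly in an environment rather than on the dashboard, a minimal sketch follows. It assumes the input is the parsed JSON of a pod list (for example the output of kubectl get pods -o json) and reads the standard status.containerStatuses[].restartCount field. The function name and the sample pod and container names are made up for illustration.

```python
import json


def restarted_containers(pod_list):
    """Return (pod name, container name, restart count) for every container
    in a parsed pod list whose restartCount is greater than zero."""
    found = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        # containerStatuses may be absent while a pod is still pending
        for status in pod.get("status", {}).get("containerStatuses", []):
            if status.get("restartCount", 0) > 0:
                found.append((pod_name, status["name"], status["restartCount"]))
    return found


# Hand-written sample data; the names below are hypothetical.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "example-clamp-pod"},
   "status": {"containerStatuses": [{"name": "clamp", "restartCount": 2}]}},
  {"metadata": {"name": "example-other-pod"},
   "status": {"containerStatuses": [{"name": "other", "restartCount": 0}]}}
]}
""")

print(restarted_containers(sample))  # [('example-clamp-pod', 'clamp', 2)]
```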

Our goal for Casablanca is to have zero restarts for all ONAP containers.  If 
you don’t know what a k8s liveness probe is or what to do about it, please 
contact me or the OOM team for assistance.

Thanks,
Gary
