Gary, Team,

I would like to share some findings made by the CLAMP team in case a similar issue is encountered by other teams. It seems that the problem arises when the container takes some time to start (around 60s in the Windriver OOM-Daily lab).
The liveness probe setting initialDelaySeconds must be adapted. It was initially set to 30s, which is too low; we will increase it to 120s.

Hope it helps.

Best regards
Catherine

From: Lefevre, Catherine
Sent: Wednesday, August 08, 2018 5:49 AM
To: 'Gary Wu' <gary.i...@huawei.com>; 'onap-discuss@lists.onap.org' <onap-discuss@lists.onap.org>; 'roger.maitl...@amdocs.com' <roger.maitl...@amdocs.com>; 'onap-rele...@lists.onap.org' <onap-rele...@lists.onap.org>
Subject: RE: [onap-discuss] [oom] OOM container restarts

Thank you Gary for the additional information.

I raised this request previously because some PTLs could not find the logs from previous OOM container restarts. If PTLs check at the beginning of their day, then they should be able to capture the logs if the restart reoccurs.

I looked at the different environments yesterday (Aug 7th). I have asked the CLAMP, SDNC, DCAEGEN2, MUSIC, EXTAPI and AAF teams to investigate, since their containers restart recurrently. I did not contact the VFC team, as I assumed that you were already working with them. If not, then let me know.

Finally, great job on the Grafana dashboard!

Best regards
Catherine

From: Gary Wu [mailto:gary.i...@huawei.com]
Sent: Tuesday, August 07, 2018 5:05 PM
To: Lefevre, Catherine <cl6...@intl.att.com>; onap-discuss@lists.onap.org; roger.maitl...@amdocs.com; onap-rele...@lists.onap.org
Subject: RE: [onap-discuss] [oom] OOM container restarts

Hi Catherine,

Generally speaking, a container restart means that:

1. The liveness probe initial delay is too low, and
2. The liveness probe, when triggered, is returning the wrong status while the container is still initializing (i.e. not dead).

The default liveness probe in OOM is a TCP probe with a short (10 seconds?) initial delay.
This means that if the specified TCP port is not responding 10s after container start, then the container will be killed and restarted. There are two ways to remedy this:

1. Make the initial delay long enough to guarantee that the TCP port will be up by then, keeping in mind that ONAP might be running on arbitrarily slow hardware.
2. Change the liveness probe to something else, perhaps a shell script probe inside the container that can do more sophisticated checks such as process status.

I can look into the feasibility of saving the container logs. In the meantime, it might actually be easier to debug the restart issues directly in the environments. The deployments are done daily at midnight Pacific, so anyone who is actively working on a restart issue should have plenty of time to gather information and try out liveness probe changes.

Thanks,
Gary

From: Lefevre, Catherine [mailto:cl6...@intl.att.com]
Sent: Tuesday, August 07, 2018 4:48 AM
To: onap-discuss@lists.onap.org; roger.maitl...@amdocs.com; Gary Wu <gary.i...@huawei.com>; onap-rele...@lists.onap.org
Subject: RE: [onap-discuss] [oom] OOM container restarts

Good morning Gary,

Would it be possible to save the logs before a new deployment is performed, so that we can investigate the reason for these restarts?

Many thanks & regards
Catherine

From: onap-discuss@lists.onap.org [mailto:onap-discuss@lists.onap.org] On Behalf Of Roger Maitland
Sent: Thursday, August 02, 2018 2:32 PM
To: onap-discuss@lists.onap.org; gary.i...@huawei.com; onap-rele...@lists.onap.org
Subject: Re: [onap-discuss] [oom] OOM container restarts

Thanks for setting this up Gary. Having good data will allow us to observe and fix these problems.
Cheers,
Roger

From: <onap-discuss@lists.onap.org> on behalf of Gary Wu <gary.i...@huawei.com>
Reply-To: "onap-discuss@lists.onap.org" <onap-discuss@lists.onap.org>, "gary.i...@huawei.com" <gary.i...@huawei.com>
Date: Wednesday, August 1, 2018 at 6:09 PM
To: "onap-discuss@lists.onap.org" <onap-discuss@lists.onap.org>, "onap-rele...@lists.onap.org" <onap-rele...@lists.onap.org>
Subject: [onap-discuss] [oom] OOM container restarts

Hi PTLs,

We have started to log the number of container restarts in the OOM daily deployment tests:

http://onapci.org/grafana/d/kRvfoqKmz/oom-container-restarts?orgId=1

Please review for your respective projects and see if your containers appear in these charts. A restart count > 0 means that your container got killed by Kubernetes while it was initializing.

If your docker container has a non-zero restart count, this means that the liveness probe configuration in your respective helm charts needs to be fixed so that Kubernetes doesn’t kill your containers during a slow startup.

Our goal for Casablanca is to have zero restarts for all ONAP containers. If you don’t know what a k8s liveness probe is or what to do about it, please contact me or the OOM team for assistance.
Thanks,
Gary

View/Reply Online (#11742): https://lists.onap.org/g/onap-discuss/message/11742
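
As an illustration of the timing discussed in this thread, here is a minimal Python sketch (not OOM or Kubernetes code) of why an initialDelaySeconds of 30s can restart a container that needs about 60s to start, while 120s does not. The names LivenessProbe and restart_time are hypothetical. The model is simplified: the first probe runs at the initial delay, later probes run every periodSeconds, and the container is killed after failureThreshold consecutive failures. The values periodSeconds=10 and failureThreshold=3 are the Kubernetes defaults and are an assumption here, since the thread does not state what the charts actually use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LivenessProbe:
    # Field names mirror the Kubernetes probe fields. The defaults for
    # period_seconds and failure_threshold are the Kubernetes defaults
    # (an assumption; the thread does not give the chart values).
    initial_delay_seconds: int
    period_seconds: int = 10
    failure_threshold: int = 3


def restart_time(probe: LivenessProbe, startup_seconds: int) -> Optional[int]:
    """Return the time (seconds after container start) at which the container
    would be killed, or None if it survives its startup phase.

    Simplified model: a probe fails while the container is still starting
    (t < startup_seconds) and succeeds afterwards. Real kubelet timing
    includes jitter that is ignored here.
    """
    failures = 0
    t = probe.initial_delay_seconds
    while t < startup_seconds:
        failures += 1
        if failures >= probe.failure_threshold:
            return t  # killed after this many consecutive failures
        t += probe.period_seconds
    return None  # first successful probe happens once the container is up


if __name__ == "__main__":
    startup = 60  # approximate CLAMP startup time reported in the thread
    for delay in (30, 120):
        killed_at = restart_time(LivenessProbe(delay), startup)
        print(f"initialDelaySeconds={delay}: killed_at={killed_at}")
```

Under these assumptions, a 30s initial delay leads to three failed probes (at 30s, 40s and 50s) and a kill at 50s, before the 60s startup completes; with 120s the first probe only runs after the container is already up. The same model suggests that raising failureThreshold or periodSeconds is another way to buy startup time, although the thread itself only discusses the initial delay and switching probe type.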