Hi Yan,

No worries.
I do not know whether your container restart issue is related to a timeout. The CLAMP team has increased initialDelaySeconds to 120 in the values.yaml file:

# probe configuration parameters
liveness:
  initialDelaySeconds: 120

Hope it helps.

Best regards
Catherine

From: Yan Yang [mailto:[email protected]]
Sent: Wednesday, August 08, 2018 6:09 AM
To: [email protected]; Lefevre, Catherine <[email protected]>; 'Gary Wu' <[email protected]>; [email protected]; [email protected]
Subject: Re: [onap-discuss] [oom] OOM container restarts

Hi Catherine and Gary,

Sorry for seeing this email so late. The VF-C team will keep an eye on this; we may need some help from the OOM team.

Best Regards,
Yan

From: [email protected] [mailto:[email protected]] On Behalf Of Catherine LEFEVRE
Sent: August 8, 2018 11:49
To: Gary Wu; [email protected]; [email protected]; [email protected]
Subject: Re: [onap-discuss] [oom] OOM container restarts

Thank you Gary for the additional information.

I previously raised this request because some PTLs could not find the logs from previous OOM container restarts. If PTLs check at the beginning of their day, they should be able to capture the logs if the restart recurs.

I looked at the different environments yesterday (Aug 7th). I have asked the CLAMP, SDNC, DCAEGEN2, MUSIC, EXTAPI, and AAF teams to investigate, since their containers restart repeatedly.

I did not contact the VFC team; I assumed that you were already working with them. If not, let me know.

Finally, great job on the Grafana dashboard!
Best regards
Catherine

From: Gary Wu [mailto:[email protected]]
Sent: Tuesday, August 07, 2018 5:05 PM
To: Lefevre, Catherine <[email protected]>; [email protected]; [email protected]; [email protected]
Subject: RE: [onap-discuss] [oom] OOM container restarts

Hi Catherine,

Generally speaking, a container restart means that:

1. The liveness probe initial delay is too low, and
2. The liveness probe, when triggered, is returning the wrong status while the container is still initializing (i.e. not dead).

The default liveness probe in OOM is a TCP probe with a short (10 seconds?) initial delay. This means that if the specified TCP port is not responding 10s after container start, the container will be killed and restarted.

There are two ways to remedy this:

1. Make the initial delay long enough to guarantee that by then the TCP port will be up, keeping in mind that ONAP might be running on arbitrarily slow hardware.
2. Change the liveness probe to something else, perhaps to a shell script probe inside the container that can do more sophisticated checks like process status, etc.

I can look into the feasibility of saving the container logs. In the meantime, it might actually be easier to debug the restart issues directly in the environments. The deployments are done daily at midnight Pacific, so anyone who is actively working on a restart issue should have plenty of time to gather information and try out liveness probe changes.
Thanks,
Gary

From: Lefevre, Catherine [mailto:[email protected]]
Sent: Tuesday, August 07, 2018 4:48 AM
To: [email protected]; [email protected]; Gary Wu <[email protected]>; [email protected]
Subject: RE: [onap-discuss] [oom] OOM container restarts

Good morning Gary,

Would it be possible to save logs before a new deployment is performed, so we can investigate the reason for these restarts?

Many thanks & regards
Catherine

From: [email protected] [mailto:[email protected]] On Behalf Of Roger Maitland
Sent: Thursday, August 02, 2018 2:32 PM
To: [email protected]; [email protected]; [email protected]
Subject: Re: [onap-discuss] [oom] OOM container restarts

Thanks for setting this up Gary. Having good data will allow us to observe and fix these problems.

Cheers,
Roger

From: <[email protected]> on behalf of Gary Wu <[email protected]>
Reply-To: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
Date: Wednesday, August 1, 2018 at 6:09 PM
To: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
Subject: [onap-discuss] [oom] OOM container restarts

Hi PTLs,

We have started to log the number of container restarts in OOM daily deployment tests:
http://onapci.org/grafana/d/kRvfoqKmz/oom-container-restarts?orgId=1

Please review for your respective projects and see if your containers appear in these charts. A restart count > 0 means that your container got killed by Kubernetes while it was initializing. If your docker container has a non-zero restart count, this means that the liveness probe configuration in your respective helm charts needs to be fixed so that Kubernetes doesn’t kill your containers during a slow startup.

Our goal for Casablanca is to have zero restarts for all ONAP containers. If you don’t know what a k8s liveness probe is or what to do about it, please contact me or the OOM team for assistance.

Thanks,
Gary

View/Reply Online (#11744): https://lists.onap.org/g/onap-discuss/message/11744
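
For readers who have not worked with a k8s liveness probe, the two remedies Gary describes in this thread might look roughly like the fragment below. This is only a sketch of a plain Kubernetes pod spec, not the actual OOM helm template: the names example-component and example/component:latest, port 8080, and the script path /opt/app/bin/healthcheck.sh are all hypothetical, and the 120-second delay is simply the value the CLAMP team is reported to use above.

```yaml
# Sketch only: pod name, image, port, and script path are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: example-component
spec:
  containers:
    - name: example-component
      image: example/component:latest
      ports:
        - containerPort: 8080
      # Remedy 1: keep the TCP probe but give the process more time to start.
      livenessProbe:
        tcpSocket:
          port: 8080
        initialDelaySeconds: 120   # first probe runs 120s after container start
        periodSeconds: 10          # then probe every 10s
      # Remedy 2 (alternative): replace the TCP check with a script that runs
      # inside the container and exits non-zero when the process is unhealthy.
      # livenessProbe:
      #   exec:
      #     command: ["/bin/sh", "-c", "/opt/app/bin/healthcheck.sh"]
      #   initialDelaySeconds: 120
      #   periodSeconds: 10
```

Which remedy fits depends on the component: a longer initial delay is the smaller change, while a script probe can tell a slow start from a dead process, at the cost of maintaining the script.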
