Hi A. Seaudi,

I’m really only speculating here, but if there were a resource leak (memory or storage) that consumed everything on a node (physical machine or VM), the containers on that node would fail and Kubernetes would try to restart them. If there are insufficient resources, the pod(s) will remain in “init” status until sufficient resources are available (say, by adding a new node to the Kubernetes cluster).
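One way to watch for the symptom described above is to scan the output of `kubectl get pods --all-namespaces` for pods whose STATUS column starts with `Init`. The following is only a minimal sketch and is not part of OOM: the function name `pods_stuck_in_init` and the sample listing are hypothetical, and it assumes the default column layout (NAMESPACE NAME READY STATUS RESTARTS AGE).

```python
def pods_stuck_in_init(listing):
    """Return (namespace, name, status) for pods whose STATUS starts with 'Init'.

    `listing` is the text printed by `kubectl get pods --all-namespaces`,
    assumed to use the default columns:
    NAMESPACE NAME READY STATUS RESTARTS AGE
    """
    stuck = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything too short to parse.
        if len(fields) < 4 or fields[0] == "NAMESPACE":
            continue
        namespace, name, _ready, status = fields[:4]
        if status.startswith("Init"):
            stuck.append((namespace, name, status))
    return stuck


# Hypothetical sample listing, for illustration only (not real cluster output).
SAMPLE = """\
NAMESPACE    NAME                 READY  STATUS    RESTARTS  AGE
onap-aai     aai-service-example  0/1    Init:0/1  0         2d
onap-portal  portal-example       1/1    Running   0         2d
"""
```

In practice the listing could be captured with `subprocess.run(["kubectl", "get", "pods", "--all-namespaces"], capture_output=True, text=True).stdout` and passed to the function on a schedule, so that pods falling back to an Init state are noticed before a full redeploy is needed.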
The Beijing release is in the integration phase now, and one of the major goals of the release was to achieve stability. It would be great if you were willing to switch your testing to Beijing so we can identify problems like resource leaks quickly. We want Beijing to be production ready, so the more hardening we do the better.

Cheers,
Roger

From: "abdelmuhaimen.sea...@orange.com" <abdelmuhaimen.sea...@orange.com>
Date: Saturday, April 14, 2018 at 1:00 PM
To: Michael O'Brien <frank.obr...@amdocs.com>, Roger Maitland <roger.maitl...@amdocs.com>, "onap-discuss@lists.onap.org" <onap-discuss@lists.onap.org>
Subject: RE: [onap-discuss] ONAP on Kubernetes

Hi,

I started with another clean VM, and now OOM is running fine. However, I am interested to know why many pods return to "init" status after one or two days of running.

Thanks
A. Seaudi

________________________________
From: SEAUDI Abdelmuhaimen OBS/CSO
Sent: Saturday, April 14, 2018 3:11 PM
To: Michael O'Brien; Roger Maitland; onap-discuss@lists.onap.org
Subject: RE: [onap-discuss] ONAP on Kubernetes

Hi,

I was able to clean my system and close down port 10250, and the CPU issue is now resolved. However, after a day or two of running OOM, I find that many of the pods are in init status, and I need to run "./deleteAll -n onap -a aai", for example, and then "./createAll -n onap -a aai" to get the pods working again. I worked like that for a while, but now I am facing errors similar to the one below, and I cannot get the pods to run again.
onap-message-router dmaap-3126594942-cfm9n 0/1 rpc error: code = 2 desc = failed to start container "74b8d5d04b693c432a4b21a03c272418451eccdc54305330d998bfee0d532a59": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:53: mounting \\\\\\\\\\\\\\\"/dockerdata-nfs/onap/message-router/dmaap/cadi.properties\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/7376ab454562d2c60dccf7a0a24f3a14b7ee33461f50487b8da6e1b358f5511b\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/7376ab454562d2c60dccf7a0a24f3a14b7ee33461f50487b8da6e1b358f5511b/appl/dmaapMR1/etc/cadi.properties\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"not a directory\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""} 3 1m

Any idea what this error means? I tried using a clean VM and pulling https://jira.onap.org/secure/attachment/11421/oom_rancher_setup.sh and https://jira.onap.org/secure/attachment/11413/cd.sh, but I get the same result.

Thanks
A. Seaudi

________________________________
From: Michael O'Brien [frank.obr...@amdocs.com]
Sent: Monday, March 26, 2018 8:34 PM
To: Roger Maitland; SEAUDI Abdelmuhaimen OBS/CSO; onap-discuss@lists.onap.org
Subject: RE: [onap-discuss] ONAP on Kubernetes

Abdul,

Hi, if you are at 95-100% CPU constantly, that will not be ONAP. Even on an 8 vCore you will see a max of 50-90%. If you see 95-100%, then it is https://jira.onap.org/browse/OOM-806. If your network is public facing, as Roger says, lock down ports 10249-10255. Do a “top” first, hit c, and verify whether you are compromised. Use the port lockdown ACL in the Azure template below as a guide:
https://gerrit.onap.org/r/#/c/33527/9/install/azure/arm_deploy_ons_sut.json

For the full HD: you should be able to run for weeks. I have a 120G drive. For reference, Amsterdam will not breach 64G; you should hover between 51 and 55.
Beijing has this issue periodically because the odd container will consume 10G. We are working on limiting the profile for an individual container; out of the box, each container thinks it has the whole VM.

thank you
/michael

From: onap-discuss-boun...@lists.onap.org [mailto:onap-discuss-boun...@lists.onap.org] On Behalf Of Roger Maitland
Sent: Monday, March 26, 2018 09:55
To: abdelmuhaimen.sea...@orange.com; onap-discuss@lists.onap.org
Subject: Re: [onap-discuss] ONAP on Kubernetes

I suspect you’ve come across two, possibly three, different issues:

* Memory leaks within the containers of (potentially) several projects: There are already several Jira bugs open against suspected memory leaks: SDC-1092<https://jira.onap.org/browse/SDC-1092>, DCAEGEN2-198<https://jira.onap.org/browse/DCAEGEN2-198>, PORTAL-211<https://jira.onap.org/browse/PORTAL-211>, and VID-196<https://jira.onap.org/browse/VID-196>. If you’ve found memory leaks outside of these components, please raise a Jira bug against the appropriate project. Note that stability is an important part of the Beijing release and many of the teams have reworked their components, so hopefully the leaks have been addressed. Please keep watch as the new code is made available.
* Log files are likely consuming all of the available storage. Log rotation needs to be implemented across all of ONAP to ensure that the storage volumes aren’t consumed over time. Again, if you can track the problem down to a specific component, please raise a Jira bug.
* If your system is reachable from the internet, you may find that it is being used to mine bitcoin. Unfortunately, some Kubernetes systems are being attacked this way, so you’ll want to watch your system closely. The ONAP team is working to address this issue.

Thank you for your observations. This is how we make ONAP production grade.
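To help track storage growth down to a specific component, as suggested in the log-file item above, one option is to list the largest files under a suspect directory. This is a rough sketch under stated assumptions, not an ONAP tool: the function name `largest_files` is hypothetical, and the directory to scan is up to you (for example `/dockerdata-nfs` or `/var/lib/docker`, both of which appear in the error message earlier in this thread).

```python
import os


def largest_files(root, top_n=10):
    """Return up to top_n (size_in_bytes, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # lstat reports a symlink's own size instead of following it,
                # so a linked file is not counted twice.
                sizes.append((os.lstat(path).st_size, path))
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    sizes.sort(reverse=True)
    return sizes[:top_n]
```

Calling `largest_files("/dockerdata-nfs", top_n=20)` once a day and comparing the results should show which component's files are growing, which is the detail a Jira bug would need.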
Cheers,
Roger

From: <onap-discuss-boun...@lists.onap.org> on behalf of "abdelmuhaimen.sea...@orange.com" <abdelmuhaimen.sea...@orange.com>
Date: Saturday, March 24, 2018 at 5:54 AM
To: "onap-discuss@lists.onap.org" <onap-discuss@lists.onap.org>
Subject: [onap-discuss] ONAP on Kubernetes

Hi,

I deploy a minimum ONAP Amsterdam on Kubernetes, using a VM with 64 GB RAM and 8 or 16 vCPUs. The starting RAM utilization is around 30-40 GB. After one or two days, all the CPU cores reach 100%, and I reboot the VM to get a responsive system again. Sometimes the HDD is full, and I have to reset the VM and redeploy from scratch.

What is the reason for saturating all the CPU cores, or the RAM? Is there something I can do to minimize this behaviour? Should I use the master branch instead of Amsterdam?

Thanks.
A. Seaudi

________________________________
This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you.
This message and the information contained herein is proprietary and confidential and subject to the Amdocs policy statement, you may review at https://www.amdocs.com/about/email-disclaimer
_______________________________________________ onap-discuss mailing list onap-discuss@lists.onap.org https://lists.onap.org/mailman/listinfo/onap-discuss