Hi All,


I have also been testing the stability of ONAP using the K8S/OOM/Rancher 
approach in TLAB, and have been seeing the same problems as described below 
when running a complete ONAP instance for more than 1 day. We have been mainly 
using the Portal application (going into the VNC console, then into the Portal 
GUI).



Some observations made from the testing so far:



- After running SAR Linux stats for some days on a complete ONAP instance 
(Amsterdam), it seems that a full ONAP instance in a single VM should have at 
least 120-180 GB of RAM, along with some swap memory allocated, for the 
cluster not to fail.



- Scaling up the services in Rancher's K8S environment beyond the default of 1 
container (especially the rancher-ingress-controller) is important, since 
this service receives all API/curl requests exposed from the K8S cluster.



I have also seen a more reliable ONAP instance when K8S is deployed in a 
multi-node/HA environment consisting of 1 Rancher VM, 1 MySQL VM as its 
external DB, and 7 K8S VMs (3 data plane nodes, 2 orch nodes, and 2 compute 
nodes, although all of them can serve as compute nodes).







I wanted to confirm that heap size does seem to be the most important thing to 
troubleshoot. It appears (from the JIRA issues below) that this needs to be 
fixed at the application level, along with limiting resources (or should we 
just stop trying to spawn k8s deployments that fail because of a resource 
shortage)?



Thanks,

Hector







Hi A. Seaudi,



I'm really only speculating here, but if there were a resource leak (memory or 
storage) that consumes everything on a node (physical machine or VM), the 
containers on that node will fail and Kubernetes will try to restart them. If 
there are insufficient resources, the pod(s) will remain in "init" status until 
sufficient resources become available (say, by adding a new node to the 
Kubernetes cluster).
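
To see which pods are stuck at a given moment, a minimal Python sketch along 
the following lines may help. It assumes kubectl is installed and configured 
on the machine where it runs; the function names are made up for this 
illustration. Pods that are still waiting on init containers report the phase 
"Pending", so the check simply lists every pod that is neither Running nor 
Succeeded.

```python
import json
import subprocess


def stuck_pods(pod_list):
    """Return (namespace, name, phase) for pods that are not Running or Succeeded.

    pod_list is the parsed JSON from `kubectl get pods --all-namespaces -o json`.
    """
    stuck = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in ("Running", "Succeeded"):
            meta = pod.get("metadata", {})
            stuck.append((meta.get("namespace"), meta.get("name"), phase))
    return stuck


def fetch_pods():
    """Ask kubectl for all pods as JSON (assumes kubectl is on PATH and configured)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Example use on a machine with cluster access:
#   for namespace, name, phase in stuck_pods(fetch_pods()):
#       print(f"{namespace}/{name}: {phase}")
```

Running this periodically (for example from cron) and keeping the output would 
show roughly when pods start falling back to "init", which can then be compared 
with memory and disk usage on the node at that time.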



The Beijing release is in the integration phase now and one of the major goals 
of the release was to achieve stability.  It would be great if you were willing 
to switch your testing to Beijing so we can identify problems like resource 
leaks quickly.  We want Beijing to be production ready so the more hardening we 
do the better.



Cheers,

Roger





From: "abdelmuhaimen.seaudi at orange.com" <abdelmuhaimen.seaudi at orange.com>
Date: Saturday, April 14, 2018 at 1:00 PM
To: Michael O'Brien <Frank.Obrien at amdocs.com>, Roger Maitland 
<Roger.Maitland at amdocs.com>, "onap-discuss at lists.onap.org" 
<onap-discuss at lists.onap.org>
Subject: RE: [onap-discuss] ONAP on Kubernetes



Hi, I started with another clean vm, and now OOM is running fine.



However, I am interested to know why many pods return to "init" status after 
one or two days running.



Thanks



A. Seaudi

________________________________

From: SEAUDI Abdelmuhaimen OBS/CSO

Sent: Saturday, April 14, 2018 3:11 PM

To: Michael O'Brien; Roger Maitland; onap-discuss at lists.onap.org

Subject: RE: [onap-discuss] ONAP on Kubernetes

Hi,



I was able to clean my system and close down port 10250, and the CPU issue is 
now resolved.

However, after a day or 2 running OOM, I find many of the pods are now in init 
status, and I need to run "./deleteAll -n onap -a aai", for example, and then 
"./createAll -n onap -a aai" to get the pods working again.

I worked like that for a while, but now I am facing errors similar to the one 
below, and I cannot get the pods to run again.



onap-message-router   dmaap-3126594942-cfm9n                 0/1       rpc error: code = 2 desc = failed to start container
"74b8d5d04b693c432a4b21a03c272418451eccdc54305330d998bfee0d532a59": Error response from daemon:
{"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused
\\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:53: mounting
\\\\\\\\\\\\\\\"/dockerdata-nfs/onap/message-router/dmaap/cadi.properties\\\\\\\\\\\\\\\" to rootfs
\\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/7376ab454562d2c60dccf7a0a24f3a14b7ee33461f50487b8da6e1b358f5511b\\\\\\\\\\\\\\\" at
\\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/7376ab454562d2c60dccf7a0a24f3a14b7ee33461f50487b8da6e1b358f5511b/appl/dmaapMR1/etc/cadi.properties\\\\\\\\\\\\\\\" caused
\\\\\\\\\\\\\\\"not a directory\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""}   3          1m



Any idea what this error means?
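
This is not a diagnosis of this system, but a "not a directory" error during a 
bind mount usually means the source and the target disagree about type, for 
example the host path is a directory while the path inside the container is a 
regular file. One way this can happen (an assumption here) is that the config 
file did not exist on the host when the container was first started, so Docker 
created a directory with that name instead. A small Python sketch to check what 
the host path currently is; the function name is hypothetical:

```python
import os


def classify_mount_source(path):
    """Report whether a bind-mount source on the host is a file, a directory, or missing."""
    if os.path.isfile(path):
        return "file"
    if os.path.isdir(path):
        return "directory"
    return "missing"


# Example, using the host path from the error message above:
#   classify_mount_source("/dockerdata-nfs/onap/message-router/dmaap/cadi.properties")
# A result of "directory" for something that should be a properties file would
# be consistent with the assumption described above.
```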



I tried using a clean VM and pulling 
https://jira.onap.org/secure/attachment/11421/oom_rancher_setup.sh and 
https://jira.onap.org/secure/attachment/11413/cd.sh but I get the same result.



Thanks



A. Seaudi

________________________________

From: Michael O'Brien [Frank.Obrien at amdocs.com]

Sent: Monday, March 26, 2018 8:34 PM

To: Roger Maitland; SEAUDI Abdelmuhaimen OBS/CSO; onap-discuss at lists.onap.org

Subject: RE: [onap-discuss] ONAP on Kubernetes

Abdul,

  Hi, if you are at 95-100% constantly, that will not be ONAP. Even on an 8 
vCore VM you will see a max of 50-90%. If you see 95-100%, then it is 
https://jira.onap.org/browse/OOM-806. If your network is public facing, lock 
down ports 10249-10255 as Roger says, but do a "top" first, hit c, and verify 
whether you are compromised.

  Use the port lockdown ACL in the Azure template as a guide below

https://gerrit.onap.org/r/#/c/33527/9/install/azure/arm_deploy_ons_sut.json
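
After applying an ACL, one rough way to confirm that the ports are closed is 
to attempt TCP connections from a machine outside the network. The Python 
sketch below is only an illustration; the function name is made up and the 
address in the usage comment is a placeholder, not a real ONAP host.

```python
import socket


def open_ports(host, ports, timeout=1.0):
    """Return the ports on host that accept a TCP connection within timeout seconds."""
    reachable = []
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or no route
            with socket.create_connection((host, port), timeout=timeout):
                reachable.append(port)
        except OSError:
            pass
    return reachable


# Example, run from outside the network (placeholder address):
#   print(open_ports("203.0.113.10", range(10249, 10256)))
# An empty list suggests ports 10249-10255 are not reachable from that vantage point.
```

A filtered port and a closed port both show up as "not reachable" here, which 
is enough for this purpose but is not a substitute for reviewing the ACL itself.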



   For the full HD: you should be able to run for weeks. I have a 120G drive.

   For reference, Amsterdam will not breach 64G; you should hover between 51 
and 55. Beijing has this issue periodically because the odd container will 
consume 10G. We are working on limiting the profile for an individual 
container; out of the box each container thinks it has the whole VM.
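
Until per-container limits are in place, it can be useful to know which 
containers declare no memory limit at all. A minimal Python sketch, assuming 
the input is the parsed JSON from `kubectl get pods --all-namespaces -o json` 
(the function name is hypothetical):

```python
def containers_without_memory_limit(pod_list):
    """Return (namespace, pod, container) for containers with no resources.limits.memory."""
    missing = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        for container in pod.get("spec", {}).get("containers", []):
            # "resources" or "limits" may be absent or empty when nothing is declared
            limits = (container.get("resources") or {}).get("limits") or {}
            if "memory" not in limits:
                missing.append(
                    (meta.get("namespace"), meta.get("name"), container.get("name"))
                )
    return missing
```

A container listed here is only bounded by what the node itself can provide, 
which matches the "thinks it has the whole VM" behaviour described above.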



   thank you

   /michael



From: onap-discuss-bounces at lists.onap.org 
[mailto:onap-discuss-bounces at lists.onap.org] On Behalf Of Roger Maitland
Sent: Monday, March 26, 2018 09:55
To: abdelmuhaimen.seaudi at orange.com; onap-discuss at lists.onap.org
Subject: Re: [onap-discuss] ONAP on Kubernetes



I suspect you've come across two, possibly three, different issues:



  *   Memory leaks within the containers of (potentially) several projects:  
There already are several Jira bugs open against suspected memory leaks: 
SDC-1092<https://jira.onap.org/browse/SDC-1092>, 
DCAEGEN2-198<https://jira.onap.org/browse/DCAEGEN2-198>, 
PORTAL-211<https://jira.onap.org/browse/PORTAL-211>, and 
VID-196<https://jira.onap.org/browse/VID-196>.  If you've found memory leaks 
outside of these components please raise a Jira bug against the appropriate 
project.  Note that stability is an important part of the Beijing release and 
many of the teams have reworked their components so hopefully the leaks have 
been addressed.  Please keep watch as the new code is made available.

  *   Likely log files are consuming all of the available storage.  Log 
rotation needs to be implemented across all of ONAP to ensure that the storage 
volumes aren't consumed over time.  Again, if you can track the problem down to 
a specific component please raise a Jira bug.

  *   If your system is reachable from the internet, you may find that it is 
being used to mine bitcoin.  Unfortunately, some Kubernetes systems are being 
attacked this way, so you'll want to watch your system closely.  The ONAP team 
is working to address this issue.
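
To help track storage growth down to a specific component, a small Python 
sketch like the one below can list the largest files under a directory. The 
function name is made up for illustration, and which directory to scan is an 
assumption that depends on where your deployment keeps logs and container 
data.

```python
import os


def largest_files(root, top_n=10):
    """Return up to top_n (size_bytes, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # lstat measures a symlink itself rather than following it
                sizes.append((os.lstat(path).st_size, path))
            except OSError:
                # The file vanished or is unreadable; skip it
                continue
    sizes.sort(reverse=True)
    return sizes[:top_n]


# Example (the directory is an assumption; adjust to your setup):
#   for size, path in largest_files("/dockerdata-nfs", top_n=20):
#       print(f"{size / 1e9:8.2f} GB  {path}")
```

Comparing two runs a day apart should show which files are growing, and the 
path usually identifies the component to raise the Jira bug against.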



Thank you for your observations - this is how we make ONAP production grade.



Cheers,

Roger



From: <onap-discuss-bounces at lists.onap.org> on behalf of 
"abdelmuhaimen.seaudi at orange.com" <abdelmuhaimen.seaudi at orange.com>
Date: Saturday, March 24, 2018 at 5:54 AM
To: "onap-discuss at lists.onap.org" <onap-discuss at lists.onap.org>
Subject: [onap-discuss] ONAP on Kubernetes



Hi,



I deploy a minimum ONAP Amsterdam on Kubernetes, using a VM with 64 GB RAM and 
8 or 16 vCPUs.

The starting RAM utilization is around 30-40 GB.

After one or two days, all the CPU cores reach 100%, and I have to reboot the 
VM to get a responsive system again.

Sometimes the HDD is full, and I have to reset the VM and redeploy from 
scratch.

What is the reason for saturating all the CPU cores, or the RAM?

Is there something I can do to minimize this behaviour? Should I use the 
Master Branch instead of Amsterdam?



Thanks.



A. Seaudi



_________________________________________________________________________________________________________________________







This message and its attachments may contain confidential or privileged 
information that may be protected by law;



they should not be distributed, used or copied without authorisation.



If you have received this email in error, please notify the sender and delete 
this message and its attachments.



As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.



Thank you.

This message and the information contained herein is proprietary and 
confidential and subject to the Amdocs policy statement,

you may review at https://www.amdocs.com/about/email-disclaimer



_______________________________________________
onap-discuss mailing list
[email protected]
https://lists.onap.org/mailman/listinfo/onap-discuss
