Hi,

The AAF problem is usually related to NFS (/dockerdata-nfs) not working between the nodes. Please check that the NFS share is working.
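Borislav's NFS check can be scripted. A minimal sketch, with the helper name and grep pattern being my own assumptions (/dockerdata-nfs is the default OOM share path); the helper takes the mount table as text so it can be exercised offline:

```shell
#!/bin/sh
# Sketch: verify that the shared directory is actually an NFS mount on a
# k8s node. On a live node you would pass in "$(mount)".
is_nfs_mounted() {
  # $1: output of `mount`; $2: directory that should be NFS-backed
  echo "$1" | grep -q " on $2 type nfs"
}

# On each node you would run something like:
#   is_nfs_mounted "$(mount)" /dockerdata-nfs && echo "NFS OK" || echo "NFS missing"
# A write from one node followed by a read from another, e.g.
#   touch /dockerdata-nfs/test-$(hostname)
# confirms the share actually works between the nodes.
```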
Thanks,
Borislav Glozman
O:+972.9.776.1988 M:+972.52.2835726
Amdocs, a Platinum member of ONAP <https://www.amdocs.com/open-network/nfv-powered-by-onap>

From: Michael O'Brien
Sent: Thursday, July 19, 2018 5:49 PM
To: Vidhu Shekhar Pandey - ERS, HCL Tech <[email protected]>; Borislav Glozman <[email protected]>; Mike Elliott <[email protected]>
Cc: [email protected]
Subject: RE: facing problem with portal app

Hi,

For the aaf pod - this looks like one of the known-issue pods. We need a reference page for these so that teams like AAF can keep the status up to date (along with checking the status of the CI/CD reference servers).

For your portal issue - I don't remember if it was your team or another we were working with, but I mentioned that only the Rancher RI comes with a default LoadBalancer service. If you are using an alternative Kubernetes setup based on kubectl, make sure you set up your own native LoadBalancer - or switch to using Rancher.

Thank you
/michael

From: Vidhu Shekhar Pandey - ERS, HCL Tech <[email protected]>
Sent: Thursday, July 19, 2018 10:08 AM
To: Michael O'Brien <[email protected]>; Borislav Glozman <[email protected]>; Mike Elliott <[email protected]>
Cc: [email protected]
Subject: RE: facing problem with portal app

Hi Mike/Borislav,

As discussed in yesterday's meeting, I am sharing the configuration I have set up in my lab. I have one Rancher node and a cluster of 4 Kubernetes nodes for the Beijing release. The Rancher VM is 12 vCPU, 15GB RAM, 80GB disk. Each k8s node is 12 vCPUs, 40GB RAM, 160GB disk.
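Michael's point above about non-Rancher clusters lacking a default LoadBalancer can be spot-checked from `kubectl get svc` output: LoadBalancer-type services stuck at `<pending>` in the EXTERNAL-IP column mean no provider is answering. A sketch (the function name is mine; column positions assume kubectl's default NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE table for a single namespace):

```shell
#!/bin/sh
# Sketch: count LoadBalancer services whose EXTERNAL-IP is still <pending>.
# Takes the table text as an argument so it can be tested offline; on a
# live cluster: pending_lb_count "$(kubectl get svc -n onap --no-headers)"
pending_lb_count() {
  echo "$1" | awk '$2 == "LoadBalancer" && $4 == "<pending>" {n++} END {print n+0}'
}
```

A non-zero count on a kubectl-built cluster is the symptom Michael describes: the service was requested but nothing on the platform fulfils it.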
I am setting this up in OpenStack following the steps in http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_setup_kubernetes_rancher.html. I had shared this with Michael O'Brien earlier and got the setup verified, as in the mail below. But even after 2-3 attempts some of the pods did not come up. The failures are intermittent and the failed pods vary each time. In the most recent run I had problems with:

onap-aaf-cm (CrashLoopBackOff)
onap-dbc-pg-0 (readiness probe failure)
onap-dbc-pg-1 (this pod used to come up earlier, but not this time)
onap-dmaap-bus-controller (CrashLoopBackOff)
portal-app (stuck in Init state every time due to a failed init container)

Just wanted to know: would having 6 k8s nodes with 32 GB RAM each, as suggested by Borislav in recent posts, improve the chances? Michael had indicated that Docker image download problems can lead to pod syncing issues. What Internet speed is recommended for pulling the images; could fluctuating bandwidth be a problem?

Thanks,
Vidhu

From: Michael O'Brien [mailto:[email protected]]
Sent: 05 July 2018 04:57
To: Vidhu Shekhar Pandey - ERS, HCL Tech <[email protected]>
Cc: [email protected]
Subject: RE: facing problem with portal app

Adding the ONAP community for reference and input.

Your setup looks fine - docker downloads will be 40G per VM, the master will only run the rancher/kubernetes system, and ONAP will sort of come up in 96G but will expand past 120G if you provide a large enough cluster - you are running 160G, so OK there as well.
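On the bandwidth question: one way to take download speed and flakiness mostly out of the picture is to warm each node's docker cache before deploying, which is the idea behind running the preload script off the manifest. A minimal sketch, assuming a plain-text list of images one per line (the file name and helper are hypothetical; the second argument lets tests substitute a stub for `docker pull`):

```shell
#!/bin/sh
# Sketch: pre-pull images listed in a manifest file so the first
# helm deploy doesn't stall on downloads.
pre_pull() {
  manifest="$1"
  puller="${2:-docker pull}"
  while IFS= read -r image; do
    # skip blank lines and comments
    case "$image" in ""|\#*) continue ;; esac
    $puller "$image" || echo "pull failed: $image" >&2
  done < "$manifest"
}

# Usage on each k8s node (manifest file name is an assumption):
#   pre_pull images-manifest.txt
```

Run once per node (or behind a local nexus3 proxy, as Michael suggests below) and the second and subsequent deploys hit the local cache instead of the network.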
Deploying ONAP on a clean set of VMs will be problematic, mostly because of the docker download issue - until we retrofit the preload script to run off the manifest. The 2nd time you deploy it will come up faster for that day, as your 4 docker caches (one per VM) are filled. This is less of a problem on some cloud providers, and if you run behind your own local nexus3 proxy.

The NFS share is recommended, but not required until pods start getting rescheduled across cluster VMs.

The system does have dependency tracking in the form of readiness checks - but the number of retries, durations, and start times are unfortunately finite - to be fine-tuned as we go. Therefore you are subject to a bit of random start order for now, as we have not yet allocated CPU/RAM resources where they are required low in the dependency tree - except for DaemonSets.

For the past 2 weeks portal has also been failing for the CD system when it runs every 4 hours, along with clamp, appc, nbi, oof, policy, sdc (db init container related retries), sdnc, so, and intermittently dcae, aaf, and a couple of aai pods - see kibana.onap.info:5601. I have not looked into all of these failures; a couple were docker image tag version flips that were required, and a couple were images removed from nexus3 that have since been fixed.

My highest healthcheck was 40/43 on June 20th on a clean cluster - the only issues were with clamp, sdc and sdnc, most of them timing related and not particular to these apps - they were just unlucky to be starved of resources. This was on a 4-node cluster (64 cores/256G RAM), each node 16 cores/64G RAM/120G SSD/20Gbps network, with an EFS/NFS share.

As you can see, failed containers do not necessarily match failed healthchecks. Health can still fail on 1/1 or 2/2 running containers if these are still initializing - which is good - or not fail if the container is not part of the healthcheck - which is sometimes by design, because the container is an optional one.
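The "report on non-running containers" numbers in the CD console output below are straightforward to reproduce: filter `kubectl get pods` output for pods that are not fully up. A sketch (the helper name is mine; columns assume kubectl's default NAME READY STATUS RESTARTS AGE layout with `--no-headers`), which also catches the "1/2 Running" case where a pod is nominally Running but a container is not ready:

```shell
#!/bin/sh
# Sketch: list pods that are not healthy - either not in Running/Completed
# status, or Running with fewer ready containers than expected (e.g. 1/2).
not_ready_pods() {
  echo "$1" | awk '
    $3 != "Running" && $3 != "Completed" { print $1; next }
    { split($2, r, "/"); if (r[1] != r[2]) print $1 }
  '
}

# Live usage: not_ready_pods "$(kubectl get pods -n onap --no-headers)"
```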
http://jenkins.onap.info/job/oom-cd-master/3185/console

22:41:11 report on non-running containers
22:41:12 down-aai=1
22:41:15 down-sdc=1
22:41:18 down-clamp=1
22:41:20 pending containers=3
22:41:20 onap onap-aai-champ-68ff644d85-mnf79 0/1 Running 0 2h
22:41:20 onap onap-clamp-7d69d4cdd7-vlkkk 1/2 CrashLoopBackOff 31 2h
22:41:20 onap onap-sdc-be-8447b4d544-7tn5w 1/2 Running 0 2h
23:02:46 Basic SDC Health Check | FAIL |
23:02:46 500 != 200
23:02:46 ------------------------------------------------------------------------------
23:02:46 Basic SDNC Health Check | FAIL |
23:02:46 Resolving variable '${resp.json()['output']['response-code']}' failed: JSONDecodeError: Expecting value: line 1 column 1 (char 0)
23:02:34 Basic CLAMP Health Check | FAIL |
23:02:34 Test timeout 1 minute exceeded.

We will be having the next OOM meeting on Wednesday - we can discuss there as well.

Thank you
/michael

From: Vidhu Shekhar Pandey - ERS, HCL Tech <[email protected]>
Sent: Tuesday, July 3, 2018 2:27 PM
To: Michael O'Brien <[email protected]>
Subject: facing problem with portal app

Hi Michael,

I am trying to install ONAP OOM Beijing. I have set up one Rancher VM and a cluster of 4 Kubernetes nodes. The Rancher VM is 12 vCPU, 15GB RAM, 80GB disk. Each k8s node is 12 vCPUs, 40GB RAM, 160GB disk.
I am setting this up in OpenStack following the steps in http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_setup_kubernetes_rancher.html. I tried the cd.sh script many times earlier, but each time most of the pods did not come up in a running state and were stuck. So I resorted to bringing the pods up one by one (in smaller groups) using "helm install local/onap -n onap --namespace onap -f values.yaml" and then adding more components using "helm upgrade onap local/onap -f values.yaml". This way I am able to get almost all pods running, except a few. One is the portal app pod, which is stuck in Init state forever. Is this due to some dependency on other pods? Is there a sequence I should follow while bringing the pods up? Is there a dependency diagram of components for the Beijing release?

Earlier I also faced problems getting the policy nexus (sonatype) pod up, but after the recent bug fixes in the liveness and readiness time delays I have it working now.

Would really appreciate your suggestions.

Thanks,
Vidhu
View/Reply Online (#11282): https://lists.onap.org/g/onap-discuss/message/11282
