Hi,
For the aaf pod - this looks like one of the known-issue pods. We need a page
for these for reference though, which teams like AAF can keep up to date
(along with checking the status of the CI/CD reference servers).
For your portal issue - I don't remember if it was your team or another we
were working with, but I mentioned that only the Rancher RI (reference
implementation) comes with a default LoadBalancer service. If you are using an
alternative Kubernetes setup driven directly by kubectl, make sure you set up
your own native LoadBalancer, or switch to using Rancher.
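As a rough sketch only - the deployment name and ports below are placeholders,
not the actual portal chart values, and you still need a real load balancer
implementation behind the service (for example the OpenStack cloud provider
LBaaS or MetalLB) - exposing an app through a native LoadBalancer looks
something like:

    kubectl -n onap expose deployment portal-app --name=portal-app-lb \
      --type=LoadBalancer --port=8989 --target-port=8080
    kubectl -n onap get svc portal-app-lb   # EXTERNAL-IP stays <pending> until an LB provider answers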
Thank you
/michael
From: Vidhu Shekhar Pandey - ERS, HCL Tech <[email protected]>
Sent: Thursday, July 19, 2018 10:08 AM
To: Michael O'Brien <[email protected]>; Borislav Glozman
<[email protected]>; Mike Elliott <[email protected]>
Cc: [email protected]
Subject: RE: facing problem with portal app
Hi Mike/Borislav,
As discussed in yesterday's meeting, I am sharing the configuration I have set
up in my lab. I have one Rancher node and a cluster of 4 Kubernetes nodes for
the Beijing release. The Rancher VM is 12 vCPU, 15GB RAM, 80GB disk. Each k8s
VM node is 12 vCPUs, 40GB RAM, 160GB disk. I am setting this up in OpenStack
following the steps in
http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_setup_kubernetes_rancher.html.
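Before deploying I verify the cluster with the standard commands below -
nothing ONAP-specific, just to confirm the nodes and tiller are healthy:

    kubectl get nodes -o wide        # all 4 k8s nodes should show Ready
    kubectl -n kube-system get pods  # cluster add-ons and tiller should be Running
    helm version                     # helm client and tiller server versions should match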
I had shared this with Michael O'Brien earlier and got the setup verified, as
in the mail below. But even after 2-3 attempts some of the pods did not come
up. The failures are intermittent and the failed pods vary each time. In a
recent run I had problems with the pods below (the kubectl commands I am using
to investigate them follow the list):
onap-aaf-cm (crashloopbackoff)
onap-dbc-pg-0 (this shows readiness probe failure)
onap-dbc-pg-1 (this pod used to come up earlier but not this time)
onap-dmaap-bus-controller (crashloopbackoff)
portal-app (this remains in init state every time due to a failed init
container)
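These are the commands I am using to investigate the failing pods - the pod IDs
and the init container name are placeholders that I take from kubectl get pods
and kubectl describe:

    kubectl -n onap get pods | grep -v Running                       # anything not fully up
    kubectl -n onap describe pod onap-aaf-cm-<pod-id>                # events, probe failures, image pull errors
    kubectl -n onap logs onap-aaf-cm-<pod-id> --previous             # logs from the last crashed container
    kubectl -n onap logs <portal-app-pod> -c <init-container-name>   # init container logs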
Just wanted to know: would having 6 k8s nodes with 32 GB RAM, as suggested by
Borislav in recent posts, improve the chances?
Michael had indicated that there are Docker image downloading problems which
can lead to pod syncing issues. What Internet speed is recommended for pulling
the images; could fluctuating bandwidth be a problem?
Thanks,
Vidhu
From: Michael O'Brien [mailto:[email protected]]
Sent: 05 July 2018 04:57
To: Vidhu Shekhar Pandey - ERS, HCL Tech <[email protected]>
Cc: [email protected]
Subject: RE: facing problem with portal app
Adding ONAP community for reference and input
Your setup looks fine - docker downloads will be about 40G per VM, the master
will only run the rancher/kubernetes system, and ONAP will mostly come up in
96G but will expand past 120G if you provide a large enough cluster - you are
running 160G total, so you are OK there as well.
Deploying ONAP on a clean set of VMs will be problematic, mostly because of the
docker download issue - until we retrofit the preload script to run off the
manifest. The 2nd time you deploy that day it will come up faster, as the
docker cache on each of your 4 VMs will already be filled. This is less of a
problem on some cloud providers, and if you run behind your own local nexus3
proxy.
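As a rough sketch only - the proxy hostname below is a placeholder for whatever
local nexus3 docker proxy you stand up, and the exact values key may differ per
release - the OOM charts let you redirect image pulls through your own proxy
via the global repository override:

    # pull the ONAP images through a local nexus3 docker proxy instead of nexus3.onap.org
    helm upgrade onap local/onap -f values.yaml \
      --set global.repository=nexus3.example.local:10001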
The NFS share is recommended but not required until pods start getting
rescheduled across cluster VMs.
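When you do add the share, the usual pattern - the server address, export
options and Ubuntu packages below are only an example - is to export
/dockerdata-nfs from one node and mount it on the others:

    # on the node acting as the NFS server
    sudo mkdir -p /dockerdata-nfs
    sudo apt-get install -y nfs-kernel-server
    echo "/dockerdata-nfs *(rw,no_root_squash,no_subtree_check)" | sudo tee -a /etc/exports
    sudo systemctl restart nfs-kernel-server
    # on each of the other k8s nodes
    sudo apt-get install -y nfs-common
    sudo mkdir -p /dockerdata-nfs
    sudo mount <nfs-server-ip>:/dockerdata-nfs /dockerdata-nfs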
The system does have dependency tracking via readiness checks - but the retry
counts, durations, and start delays are unfortunately finite and still need to
be fine-tuned as we go. Therefore you are subject to a bit of random start
order for now, as we have not yet allocated cpu/ram resources where they are
required low in the dependency tree - except for DaemonSets. For the past 2
weeks portal has also been failing for the CD system (which runs every 4
hours), along with clamp, appc, nbi, oof, policy, sdc (db init container
related retries), sdnc, so, and intermittently dcae, aaf, and a couple of aai
failures - see kibana.onap.info:5601
I have not looked into all of these failures; a couple were required docker
image tag version flips, and a couple were images removed from nexus3 that have
since been fixed.
My highest healthcheck was 40/43 on June 20th on a clean cluster, where there
were only issues with clamp, sdc and sdnc - most of them timing related and not
particular to those apps; they were just unlucky enough to be starved of
resources. This was on a 4-node cluster (64 cores/256G RAM total), each node
16 cores/64G RAM/120G SSD/20Gbps network, with an EFS/NFS share.
As you can see, failed containers do not necessarily match failed healthchecks.
Health can still fail on 1/1 or 2/2 running containers if they are still
initializing - which is expected - or not fail if the container is not part of
the healthcheck, which is sometimes by design because the container is an
optional one.
http://jenkins.onap.info/job/oom-cd-master/3185/console
22:41:11 report on non-running containers
22:41:12 down-aai=1
22:41:15 down-sdc=1
22:41:18 down-clamp=1
22:41:20 pending containers=3
22:41:20 onap onap-aai-champ-68ff644d85-mnf79   0/1   Running            0    2h
22:41:20 onap onap-clamp-7d69d4cdd7-vlkkk       1/2   CrashLoopBackOff   31   2h
22:41:20 onap onap-sdc-be-8447b4d544-7tn5w      1/2   Running            0    2h
23:02:46 Basic SDC Health Check | FAIL |
23:02:46 500 != 200
23:02:46 ------------------------------------------------------------------------------
23:02:46 Basic SDNC Health Check | FAIL |
23:02:46 Resolving variable '${resp.json()['output']['response-code']}' failed: JSONDecodeError: Expecting value: line 1 column 1 (char 0)
23:02:34 Basic CLAMP Health Check | FAIL |
23:02:34 Test timeout 1 minute exceeded.
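For reference, the health check results above come from the robot health suite,
which you can run against your own deployment roughly like this (assuming your
release/namespace is onap and the oom repo is checked out on the machine
running kubectl):

    cd oom/kubernetes/robot
    ./ete-k8s.sh onap health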
We will be having the next OOM meeting on Wednesday - we can discuss there as well.
Thank you
/michael
From: Vidhu Shekhar Pandey - ERS, HCL Tech <[email protected]>
Sent: Tuesday, July 3, 2018 2:27 PM
To: Michael O'Brien <[email protected]>
Subject: facing problem with portal app
Hi Michael,
I am trying to install ONAP OOM Beijing. I have set up one Rancher VM and a
cluster of 4 Kubernetes nodes. The Rancher VM is 12 vCPU, 15GB RAM, 80GB disk.
Each k8s node is 12 vCPUs, 40GB RAM, 160GB disk. I am setting this up in
OpenStack following the steps in
http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_setup_kubernetes_rancher.html.
I tried the cd.sh script earlier many times, but each time most of the pods did
not come up in the running state and were stuck. So I resorted to bringing the
pods up one by one (in smaller groups) using "helm install local/onap -n onap
--namespace onap -f values.yaml" and then adding more components using "helm
upgrade onap local/onap -f values.yaml" - see the sketch after this paragraph.
This way I am able to get almost all pods running except a few. One is the
portal-app pod, which is stuck in the init state waiting forever. Is this due
to some dependency on other pods? Is there a sequence I should follow while
bringing the pods up? Is there a dependency diagram of the components for the
Beijing release?
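The sketch below shows the pattern I am following - the component names and
enabled flags are just examples of the overrides I keep in my values.yaml:

    # first pass: only a small group of charts enabled in values.yaml
    helm install local/onap -n onap --namespace onap -f values.yaml
    # later passes: flip more components on and upgrade the same release
    helm upgrade onap local/onap -f values.yaml --set portal.enabled=true --set sdc.enabled=true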
Earlier I also faced problems getting the policy nexus (sonatype) pod up, but
after the recent bug fixes to the liveness and readiness time delays it is
working for me now.
Would really appreciate your suggestions.
Thanks,
Vidhu