Hi,
For the aaf pod - this looks like one of the known-issue pods. We need a page
for these for reference though, which teams like AAF can keep up to date
(along with checking the status of the CI/CD reference servers).
For your portal issue - I don't remember if it was your team or another we
were working with, but I mentioned that only the Rancher RI (reference
implementation) comes with a default LoadBalancer service. If you are using an
alternative Kubernetes setup driven directly by kubectl, make sure you set up
your own native LoadBalancer, or switch to using Rancher.
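As a rough sketch only - the deployment name and ports below are placeholders,
not the actual portal chart values, and you still need a real load balancer
implementation behind the service (for example the OpenStack cloud provider
LBaaS or MetalLB) - exposing an app through a native LoadBalancer looks
something like:

    kubectl -n onap expose deployment portal-app --name=portal-app-lb \
      --type=LoadBalancer --port=8989 --target-port=8080
    kubectl -n onap get svc portal-app-lb   # EXTERNAL-IP stays <pending> until an LB provider answers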
Thank you
/michael
From: Vidhu Shekhar Pandey - ERS, HCL Tech <[email protected]>
Sent: Thursday, July 19, 2018 10:08 AM
To: Michael O'Brien <[email protected]>; Borislav Glozman
<[email protected]>; Mike Elliott <[email protected]>
Cc: [email protected]
Subject: RE: facing problem with portal app
Hi Mike/Borislav,
As discussed in yesterday's meeting, I am sharing the configuration I have set
up in my lab. I have one Rancher node and a cluster of 4 Kubernetes nodes for
the Beijing release. The Rancher VM is 12 vCPU, 15GB RAM, 80GB disk. Each k8s
VM node is 12 vCPUs, 40GB RAM, 160GB disk. I am setting this up in OpenStack
following the steps in
http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_setup_kubernetes_rancher.html.
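Before deploying I verify the cluster with the standard commands below -
nothing ONAP-specific, just to confirm the nodes and tiller are healthy:

    kubectl get nodes -o wide        # all 4 k8s nodes should show Ready
    kubectl -n kube-system get pods  # cluster add-ons and tiller should be Running
    helm version                     # helm client and tiller server versions should match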
I had shared this with Michael O'Brien earlier and got the setup verified, as
in the mail below. But even after 2-3 attempts some of the pods did not come
up. The failures are intermittent and the failed pods vary each time. In a
recent run I had problems with the pods below (the kubectl commands I am using
to investigate them follow the list):
onap-aaf-cm (crashloopbackoff)
onap-dbc-pg-0 (this shows readiness probe failure)
onap-dbc-pg-1 (this pod used to come up earlier but not this time)
onap-dmaap-bus-controller (crashloopbackoff)
portal-app (this remains in init state every time due to a failed init
container)
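These are the commands I am using to investigate the failing pods - the pod IDs
and the init container name are placeholders that I take from kubectl get pods
and kubectl describe:

    kubectl -n onap get pods | grep -v Running                       # anything not fully up
    kubectl -n onap describe pod onap-aaf-cm-<pod-id>                # events, probe failures, image pull errors
    kubectl -n onap logs onap-aaf-cm-<pod-id> --previous             # logs from the last crashed container
    kubectl -n onap logs <portal-app-pod> -c <init-container-name>   # init container logs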
Just wanted to know: would having 6 k8s nodes with 32 GB RAM, as suggested by
Borislav in recent posts, improve the chances?
Michael had indicated that there are Docker image downloading problems which
can lead to pod syncing issues. What Internet speed is recommended for pulling
the images; could fluctuating bandwidth be a problem?
Thanks,
Vidhu
From: Michael O'Brien [mailto:[email protected]]
Sent: 05 July 2018 04:57
To: Vidhu Shekhar Pandey - ERS, HCL Tech <[email protected]>
Cc: [email protected]
Subject: RE: facing problem with portal app
Adding ONAP community for reference and input
Your setup looks fine - docker downloads will be about 40G per VM, the master
will only run the rancher/kubernetes system, and ONAP will mostly come up in
96G but will expand past 120G if you provide a large enough cluster - you are
running 160G total, so you are OK there as well.
Deploying ONAP on a clean set of VMs will be problematic, mostly because of the
docker download issue - until we retrofit the preload script to run off the
manifest. The 2nd time you deploy that day it will come up faster, as the
docker cache on each of your 4 VMs will already be filled. This is less of a
problem on some cloud providers, and if you run behind your own local nexus3
proxy.
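As a rough sketch only - the proxy hostname below is a placeholder for whatever
local nexus3 docker proxy you stand up, and the exact values key may differ per
release - the OOM charts let you redirect image pulls through your own proxy
via the global repository override:

    # pull the ONAP images through a local nexus3 docker proxy instead of nexus3.onap.org
    helm upgrade onap local/onap -f values.yaml \
      --set global.repository=nexus3.example.local:10001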
The NFS share is recommended but not required until pods start getting
rescheduled across cluster VMs.
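When you do add the share, the usual pattern - the server address, export
options and Ubuntu packages below are only an example - is to export
/dockerdata-nfs from one node and mount it on the others:

    # on the node acting as the NFS server
    sudo mkdir -p /dockerdata-nfs
    sudo apt-get install -y nfs-kernel-server
    echo "/dockerdata-nfs *(rw,no_root_squash,no_subtree_check)" | sudo tee -a /etc/exports
    sudo systemctl restart nfs-kernel-server
    # on each of the other k8s nodes
    sudo apt-get install -y nfs-common
    sudo mkdir -p /dockerdata-nfs
    sudo mount <nfs-server-ip>:/dockerdata-nfs /dockerdata-nfs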
The system does have dependency tracking via readiness checks - but the retry
counts, durations, and start delays are unfortunately finite and still need to
be fine-tuned as we go. Therefore you are subject to a bit of random start
order for now, as we have not yet allocated cpu/ram resources where they are
required low in the dependency tree - except for DaemonSets. For the past 2
weeks portal has also been failing for the CD system (which runs every 4
hours), along with clamp, appc, nbi, oof, policy, sdc (db init container
related retries), sdnc, so, and intermittently dcae, aaf, and a couple of aai
failures - see kibana.onap.info:5601
I have not looked into all of these failures; a couple were required docker
image tag version flips, and a couple were images removed from nexus3 that have
since been fixed.
My highest healthcheck was 40/43 on June 20th on a clean cluster, where there
were only issues with clamp, sdc and sdnc - most of them timing related and not
particular to those apps; they were just unlucky enough to be starved of
resources. This was on a 4-node cluster (64 cores/256G RAM total), each node
16 cores/64G RAM/120G SSD/20Gbps network, with an EFS/NFS share.
As you can see, failed containers do not necessarily match failed healthchecks.
Health can still fail on 1/1 or 2/2 running containers if they are still
initializing - which is expected - or not fail if the container is not part of
the healthcheck, which is sometimes by design because the container is an
optional one.
http://jenkins.onap.info/job/oom-cd-master/3185/console
22:41:11 report on non-running containers
22:41:12 down-aai=1
22:41:15 down-sdc=1
22:41:18 down-clamp=1
22:41:20 pending containers=3
22:41:20 onap onap-aai-champ-68ff644d85-mnf79   0/1   Running            0    2h
22:41:20 onap onap-clamp-7d69d4cdd7-vlkkk       1/2   CrashLoopBackOff   31   2h
22:41:20 onap onap-sdc-be-8447b4d544-7tn5w      1/2   Running            0    2h
23:02:46 Basic SDC Health Check | FAIL |
23:02:46 500 != 200
23:02:46 ------------------------------------------------------------------------------
23:02:46 Basic SDNC Health Check | FAIL |
23:02:46 Resolving variable '${resp.json()['output']['response-code']}' failed: JSONDecodeError: Expecting value: line 1 column 1 (char 0)
23:02:34 Basic CLAMP Health Check | FAIL |
23:02:34 Test timeout 1 minute exceeded.
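For reference, the health check results above come from the robot health suite,
which you can run against your own deployment roughly like this (assuming your
release/namespace is onap and the oom repo is checked out on the machine
running kubectl):

    cd oom/kubernetes/robot
    ./ete-k8s.sh onap health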
We will be having the next OOM meeting on Wednesday - we can discuss there as well.
Thank you
/michael
From: Vidhu Shekhar Pandey - ERS, HCL Tech <[email protected]>
Sent: Tuesday, July 3, 2018 2:27 PM
To: Michael O'Brien <[email protected]>
Subject: facing problem with portal app
Hi Michael,
I am trying to install ONAP OOM Beijing. I have set up one Rancher VM and a
cluster of 4 Kubernetes nodes. The Rancher VM is 12 vCPU, 15GB RAM, 80GB disk.
Each k8s node is 12 vCPUs, 40GB RAM, 160GB disk. I am setting this up in
OpenStack following the steps in
http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_setup_kubernetes_rancher.html.
I tried the cd.sh script earlier many times, but each time most of the pods did
not come up in the running state and were stuck. So I resorted to bringing the
pods up one by one (in smaller groups) using "helm install local/onap -n onap
--namespace onap -f values.yaml" and then adding more components using "helm
upgrade onap local/onap -f values.yaml" - see the sketch after this paragraph.
This way I am able to get almost all pods running except a few. One is the
portal-app pod, which is stuck in the init state waiting forever. Is this due
to some dependency on other pods? Is there a sequence I should follow while
bringing the pods up? Is there a dependency diagram of the components for the
Beijing release?
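The sketch below shows the pattern I am following - the component names and
enabled flags are just examples of the overrides I keep in my values.yaml:

    # first pass: only a small group of charts enabled in values.yaml
    helm install local/onap -n onap --namespace onap -f values.yaml
    # later passes: flip more components on and upgrade the same release
    helm upgrade onap local/onap -f values.yaml --set portal.enabled=true --set sdc.enabled=true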
Earlier I also faced problems getting the policy nexus (sonatype) pod up, but
after the recent bug fixes to the liveness and readiness time delays it is
working for me now.
Would really appreciate your suggestions.
Thanks,
Vidhu