Guys,
The AWS-hosted master/Beijing hourly OOM Kubernetes deploy CD test server
has been fixed.
cd.sh has been retrofitted to use the new helm install/delete work done by
the OOM team along with the project teams (thank you).
The new OOM helm-based orchestration system is a lot cleaner: essentially a
single helm delete/install line (the rest of cd.sh is monitoring and
healthcheck shell code).
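
To illustrate the "single helm delete/install line" idea, here is a minimal
dry-run sketch in Python. It is not taken from cd.sh: the release name
"onap", the chart "local/onap", the namespace "onap" and the Helm 2 style
flags are all assumptions for illustration, and nothing is executed; the
commands are only built and printed.

```python
import shlex


def redeploy_commands(release="onap", chart="local/onap", namespace="onap"):
    """Return the helm delete/install pair as argument lists (nothing is run)."""
    # Assumption: Helm 2 style flags; release, chart and namespace are placeholders.
    delete_cmd = ["helm", "delete", "--purge", release]
    install_cmd = ["helm", "install", chart, "--name", release,
                   "--namespace", namespace]
    return [delete_cmd, install_cmd]


def render(commands):
    """Join each argument list into a shell-safe string for a dry run or a log."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]


if __name__ == "__main__":
    for line in render(redeploy_commands()):
        print(line)
```

Everything else in a script like this (waiting for pods, running
healthchecks, pushing results to Kibana) would sit around these two commands.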
Healthcheck results are available again in Kibana every 2 hours for master;
each run is a full delete/install of the 102 containers and the config dir.
https://jenkins.onap.org/job/oom-cd-master
http://kibana.onap.info:5601/app/kibana#/dashboard/AWAtvpS63NTXK5mX2kuS?_g=()&_a=(description:'',filters:!(),options:(darkTheme:!f),panels:!((col:1,id:AWAts77k3NTXK5mX2kuM,panelIndex:1,row:1,size_x:8,size_y:3,type:visualization),(col:9,id:AWAtuTVI3NTXK5mX2kuP,panelIndex:2,row:1,size_x:4,size_y:3,type:visualization),(col:1,id:AWAtuBTY3NTXK5mX2kuO,panelIndex:3,row:7,size_x:6,size_y:3,type:visualization),(col:1,id:AWAttmqB3NTXK5mX2kuN,panelIndex:4,row:4,size_x:6,size_y:3,type:visualization),(col:7,id:AWAtvHtY3NTXK5mX2kuR,panelIndex:6,row:4,size_x:6,size_y:6,type:visualization)),query:(match_all:()),timeRestore:!f,title:'CD%20Health%20Check',uiState:(),viewMode:view)
The first retrofitted build since the switch (just before ONS on the 20th)
shows a 19/33 healthcheck result:
http://jenkins.onap.info/job/oom-cd-master/2628/console
I am merging the cd.sh changes under https://jira.onap.org/browse/OOM-716
Note: we are getting about 24 container failures out of the full 94. About
half of them are in the normal hierarchical pending state, waiting on
dependencies to finish; those dependencies are the subset of the 24 that is
actually having issues. This is being addressed by everyone as we speak.
Note: we are still good with Rancher 1.6.14 and Kubernetes 1.8.9+ (that
regression 4 weeks ago with Kubernetes 1.8.9 was fixed).
Stats: on a 16 vCore, 122G VM it takes about 18 min to delete ONAP, and the
system comes up in about 35 min (this includes a pull policy of always and
accounts for configuration tgz extraction to /dockerdata-nfs/onap).
At around the 20 min mark we start to saturate all vCores. The last check
on a 64 vCore VM was 55; with 94 containers I expect higher now, so as before
a larger VM or more cluster hosts will be better.
At 25 min 60/94 containers are up
At 36 min 79/94 containers are up (steady state for now) - up from 27
failures or 75/94 up yesterday
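
The two checkpoints above can be turned into percentages with a few lines of
Python. This is only a sketch: the sample list is copied from the numbers in
this mail, the total of 94 is taken from the same lines, and the function
name summarize is just illustrative.

```python
def summarize(samples, total=94):
    """Return (minute, up, not_up, percent_up) for each (minute, up) sample."""
    rows = []
    for minute, up in samples:
        not_up = total - up
        percent_up = round(100.0 * up / total, 1)
        rows.append((minute, up, not_up, percent_up))
    return rows


# Checkpoints quoted in this mail: (minutes since install, containers up).
SAMPLES = [(25, 60), (36, 79)]

if __name__ == "__main__":
    for minute, up, not_up, pct in summarize(SAMPLES):
        print(f"{minute} min: {up}/94 up ({pct}%), {not_up} not yet up")
```

With these inputs the run at 25 min is at 63.8% (34 not yet up) and the run
at 36 min is at 84.0% (15 not yet up).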
I am working on the Azure one next at onap.cloud, but the Kibana
configuration was wiped just after ONS (no security on the ELK stack), so use
the AWS-based onap.info for now.
Reference page
https://wiki.onap.org/display/DW/Auto+Continuous+Deployment+via+Jenkins+and+Kibana#AutoContinuousDeploymentviaJenkinsandKibana-AutomatedONAPCDInfrastructure
Other LF CD system
The last run showed 17/30, which aligns with the current 19/30 if 2
components were fixed in the last 6 hours or if we are seeing intermittent
healthcheck timing issues.
https://jenkins.onap.org/view/External%20Labs/job/lab-tlab-beijing-oom-deploy/240/console
thank you
/michael