[onap-discuss] [OOM] CD master auto hourly/2 deploy job retrofitted - seeing 19/33 HC pass

Michael O'Brien Tue, 10 Apr 2018 13:46:44 -0700

Guys,
    The Amazon AWS master/Beijing hourly OOM Kubernetes deploy CD test server 
has been fixed.
    cd.sh has been retrofitted to use the new helm install/delete work done by 
the OOM team along with the project teams (thank you).
    The new OOM helm based orchestration system is a lot cleaner/better - 
essentially a single helm delete/install line (the rest of cd.sh is monitoring 
and healthcheck shell code).
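
    For anyone who wants to watch a deploy come up outside of cd.sh, the 
monitoring part can be sketched roughly as below. This is not the cd.sh code, 
only a minimal Python sketch that counts ready pods from the text of 
`kubectl get pods --all-namespaces`. The function names, the sample rows and 
the pod names in them are made up for illustration.

```python
import subprocess
from typing import Tuple


def fetch_pods() -> str:
    """Return the raw text of `kubectl get pods --all-namespaces`."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def count_ready(kubectl_output: str) -> Tuple[int, int]:
    """Return (ready_pods, total_pods) parsed from the kubectl text output."""
    ready = total = 0
    for line in kubectl_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be a pod row.
        if len(fields) < 4 or fields[0] == "NAMESPACE":
            continue
        # With --all-namespaces the columns are NAMESPACE NAME READY STATUS ...
        up, _, wanted = fields[2].partition("/")
        total += 1
        # A pod counts as ready only when it is Running and every
        # container in it reports ready (for example 1/1 or 2/2).
        if fields[3] == "Running" and up == wanted:
            ready += 1
    return ready, total


# Hypothetical sample rows, not taken from a real run.
SAMPLE = """\
NAMESPACE   NAME      READY   STATUS     RESTARTS   AGE
onap        pod-a-0   1/1     Running    0          20m
onap        pod-b-0   0/1     Init:0/1   0          20m
onap        pod-c-0   1/2     Running    3          20m
"""

ready, total = count_ready(SAMPLE)
print(f"{ready}/{total} pods ready")
```

    On a live cluster, `count_ready(fetch_pods())` run in a loop with a sleep 
gives the same kind of up/total numbers quoted in the stats further down. Pods 
in an Init state are counted as not ready here, which is how pods waiting on 
their dependencies show up.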


- healthcheck results available again in kibana every 2 hours for master - full 
delete/install of the 102 containers and the config dir.

https://jenkins.onap.org/job/oom-cd-master

http://kibana.onap.info:5601/app/kibana#/dashboard/AWAtvpS63NTXK5mX2kuS?_g=()&_a=(description:'',filters:!(),options:(darkTheme:!f),panels:!((col:1,id:AWAts77k3NTXK5mX2kuM,panelIndex:1,row:1,size_x:8,size_y:3,type:visualization),(col:9,id:AWAtuTVI3NTXK5mX2kuP,panelIndex:2,row:1,size_x:4,size_y:3,type:visualization),(col:1,id:AWAtuBTY3NTXK5mX2kuO,panelIndex:3,row:7,size_x:6,size_y:3,type:visualization),(col:1,id:AWAttmqB3NTXK5mX2kuN,panelIndex:4,row:4,size_x:6,size_y:3,type:visualization),(col:7,id:AWAtvHtY3NTXK5mX2kuR,panelIndex:6,row:4,size_x:6,size_y:6,type:visualization)),query:(match_all:()),timeRestore:!f,title:'CD%20Health%20Check',uiState:(),viewMode:view)


    First retrofitted build showing 19/33 since the switch just before ONS on 
the 20th http://jenkins.onap.info/job/oom-cd-master/2628/console
    Merging cd.sh changes to https://jira.onap.org/browse/OOM-716

    Note: about 24 of the full 94 containers are failing. About half of those 
are in the normal pending state, waiting on dependencies earlier in the 
hierarchy to finish; only the remaining subset is actually having issues. This 
is being addressed by everyone as we speak.
    Note: we are still good with Rancher 1.6.14 and Kubernetes 1.8.9+ (that 
regression 4 weeks ago with Kubernetes 1.8.9 was fixed).
    Stats: on a 16 vCore 122G vm it takes about 18 min to delete ONAP, the 
system comes up in about 35 min (this includes a pull policy of always and 
accounting for configuration tgz extraction to /dockerdata-nfs/onap)
    At around the 20 min mark we start to saturate all vCores - last check 
on a 64 vCore vm was 55 - with 94 containers I expect higher now - so as before 
a larger VM or more cluster hosts will be better.
    At 25 min 60/94 containers are up
    At 36 min 79/94 containers are up (steady state for now) - up from 27 
failures or 75/94 up yesterday
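
    To put the two checkpoints above on the same scale, here is a small 
Python sketch that turns the up/total counts into percentages. The helper name 
is made up; the numbers are the ones quoted above.

```python
def percent_up(up: int, total: int) -> float:
    """Share of containers that are up, as a percentage with one decimal."""
    return round(100.0 * up / total, 1)


# (minutes after install, containers up, containers total) from the stats above.
MILESTONES = [(25, 60, 94), (36, 79, 94)]

for minute, up, total in MILESTONES:
    print(f"{minute} min: {up}/{total} up "
          f"({percent_up(up, total)}%), {total - up} not up")
```

    That works out to roughly 64% of containers up at 25 min and 84% at 
36 min.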

   I am working on the Azure one next at onap.cloud - but the kibana 
configuration was wiped just after ONS (no security on the ELK stack) - so use 
AWS based onap.info for now.

   Reference page
https://wiki.onap.org/display/DW/Auto+Continuous+Deployment+via+Jenkins+and+Kibana#AutoContinuousDeploymentviaJenkinsandKibana-AutomatedONAPCDInfrastructure
   Other LF CD system
    The last run showed 17/30 which aligns with the current 19/30 if 2 
components were fixed in the last 6 hours or we are seeing intermittent 
healthcheck timing issues.
https://jenkins.onap.org/view/External%20Labs/job/lab-tlab-beijing-oom-deploy/240/console
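
    Two runs alone cannot tell a real fix from an intermittent healthcheck. 
With three or more runs per check it becomes possible, roughly as in the 
Python sketch below. The check names and the pass/fail histories are 
hypothetical placeholders, not results from either CD system.

```python
from typing import Dict, List


def classify(history: List[bool]) -> str:
    """Classify one healthcheck from its pass/fail history, oldest run first."""
    if not history:
        raise ValueError("history must contain at least one run")
    # Count how many times the result changed between consecutive runs.
    flips = sum(1 for a, b in zip(history, history[1:]) if a != b)
    if flips == 0:
        return "stable pass" if history[0] else "stable fail"
    if flips == 1:
        return "fixed" if history[-1] else "regressed"
    return "intermittent"


# Hypothetical histories: True = pass, False = fail.
RUNS: Dict[str, List[bool]] = {
    "check-a": [False, True, True],
    "check-b": [False, True, False],
    "check-c": [True, True, True],
}

for name, history in RUNS.items():
    print(f"{name}: {classify(history)}")
```

    A check that flips once and stays passing looks fixed; one that flips 
back and forth points at timing issues in the healthcheck itself.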

    thank you
    /michael



[onap-discuss] [OOM] CD master auto hourly/2 deploy job retrofitted - seeing 19/33 HC pass
