The unofficial workaround is to revert to Rancher 1.6.12 under Docker 1.12, which installs Kubernetes 1.8.3.

It looks like when Rancher 1.6.15 was released 4 days ago, their team also retroactively upgraded 1.6.14 and 1.6.13 from Kubernetes 1.8.5 to 1.8.9, and we seem to have a persistent volume issue specific to Kubernetes 1.8.9+. Vendors do not usually upgrade already-released software in place - it would be as if we forced Amsterdam to use the versions from Beijing - so there must be a good reason for this.
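For anyone applying the workaround by hand, a minimal sketch of the pinned install follows. The Rancher run command and the kubectl/Helm download URLs are the standard ones for these versions; the exact name of the Docker install script on releases.rancher.com is an assumption, so treat this as a sketch rather than the verified procedure:

  # Sketch only - pins the workaround versions on a clean Ubuntu host.
  # Docker 1.12 via the Rancher-hosted convenience script (script name assumed):
  curl https://releases.rancher.com/install-docker/1.12.sh | sh

  # Rancher server pinned to v1.6.12 (deploys Kubernetes 1.8.3):
  sudo docker run -d --restart=unless-stopped -p 8080:8080 rancher/server:v1.6.12

  # kubectl 1.8.3 client to match the server:
  curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.8.3/bin/linux/amd64/kubectl
  chmod +x kubectl && sudo mv kubectl /usr/local/bin/

  # Helm 2.6.1 client:
  curl -LO https://storage.googleapis.com/kubernetes-helm/helm-v2.6.1-linux-amd64.tar.gz
  tar -xzf helm-v2.6.1-linux-amd64.tar.gz && sudo mv linux-amd64/helm /usr/local/bin/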
Also tested 1.6.15 running Kubernetes 1.9.2 - same PV issue. So I will adjust cd.sh - please use the following versions for now, until the testing/integration team verifies them (https://jira.onap.org/browse/OOM-716):

Rancher v1.6.12
Docker 1.12 (downgrade from 17.03 required)
Kubectl 1.8.3 to 1.8.6 (didn't test a downgrade)
Helm 2.6.1 (didn't test a downgrade)

Rancher issue below - I have a new contact at Rancher and will set up a meeting this week to go over their release details, to which ONAP is very sensitive - especially since their release usually comes a couple of weeks before our milestones.
https://jira.onap.org/browse/OOM-813
https://github.com/rancher/rancher/issues/12178

root@ip-172-31-12-163:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces | grep 0/
onap aaf-6c64db8fdd-zk9qv 0/1 Running 0 40m
onap sdnc-dgbuilder-794d686f78-jmd8x 0/1 Init:0/1 0 4m
onap sdnc-dmaap-listener-8595c8f6c-kfpmp 0/1 Init:0/1 0 4m
onap sdnc-portal-69b79b6646-t42lv 0/1 Init:0/1 0 4m
onap sdnc-ueb-listener-6897f6dd55-2nrzz 0/1 Init:0/1 0 4m
onap vfc-ztevnfmdriver-fcf4ddf68-dr2jw 0/1 ImagePullBackOff 0 40m
onap vnfsdk-refrepo-55f544c5f5-stm45 0/1 ImagePullBackOff 0 40m
root@ip-172-31-12-163:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces | grep 0/
onap aaf-6c64db8fdd-zk9qv 0/1 Running 0 43m
onap vfc-ztevnfmdriver-fcf4ddf68-dr2jw 0/1 ImagePullBackOff 0 43m
onap vnfsdk-refrepo-55f544c5f5-stm45 0/1 ImagePullBackOff 0 43m
root@ip-172-31-12-163:~/oom/kubernetes/oneclick# kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:34:11Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.3-rancher3", GitCommit:"772c4c54e1f4ae7fc6f63a8e1ecd9fe616268e16", GitTreeState:"clean", BuildDate:"2017-11-27T19:51:43Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

/michael
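The two grep snapshots above are the kind of thing the CD job polls for; a minimal sketch of such a loop, assuming the 15-second interval seen in the cd.sh output quoted later in this thread (the real script lives in the OOM repo and is not reproduced here):

  # Sketch - poll until no pod reports 0/N ready containers; interval assumed.
  while true; do
    PENDING=$(kubectl get pods --all-namespaces | grep -c '0/')
    echo "$PENDING pending > 0"
    [ "$PENDING" -eq 0 ] && break
    sleep 15
  done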
GitVersion:"v1.8.9-rancher1", GitCommit:"68595e18f25e24125244e9966b1e5468a98c1cd4", GitTreeState:"clean", BuildDate:"2018-03-13T04:37:53Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"} root@ons-auto-master-201803191429z:/var/lib/waagent/custom-script/download/0/oom/kubernetes/oneclick# helm version Client: &version.Version{SemVer:"v2.6.1", GitCommit:"bbc1f71dc03afc5f00c6ac84b9308f8ecb4f39ac", GitTreeState:"clean"} Server: &version.Version{SemVer:"v2.6.1", GitCommit:"bbc1f71dc03afc5f00c6ac84b9308f8ecb4f39ac", GitT Retesting on a clean AWS system now /michael From: Gary Wu [mailto:gary.i...@huawei.com] Sent: Monday, March 19, 2018 17:07 To: Michael O'Brien <frank.obr...@amdocs.com<mailto:frank.obr...@amdocs.com>>; onap-discuss@lists.onap.org<mailto:onap-discuss@lists.onap.org> Subject: RE: New OOM deployment issues Hi Michael, My versions are locked down and are the same as the ones you specified for master branch. But, it seems like Rancher v1.6.14 decided to deploy a different version of Kubernetes since Friday. Maybe this is a bug in Rancher? Thanks, Gary From: Michael O'Brien [mailto:frank.obr...@amdocs.com] Sent: Monday, March 19, 2018 1:34 PM To: Gary Wu <gary.i...@huawei.com<mailto:gary.i...@huawei.com>>; onap-discuss@lists.onap.org<mailto:onap-discuss@lists.onap.org> Subject: RE: New OOM deployment issues Checking. There is documentation that the OOM and Integration team keeps up to date on this below https://wiki.onap.org/display/DW/ONAP+on+Kubernetes#ONAPonKubernetes-SoftwareRequirements You should be ok with Kubernetes 1.8.x in master - but we need to verify post 1.8.6 Since Rancher 1.6.14 is still at 1.8.5 (it should not move) 1.8.6 is the closest. I am running 2.6.1 helm server/client and K8s 1.8.5 server, 1.8.6 client. Normally for the CD you should be on a locked down version of Kubernetes, Rancher, Helm and (not so much docker). My script has these hardcoded for each branch https://gerrit.onap.org/r/#/c/32019/11/install/rancher/oom_rancher_setup.sh https://jira.onap.org/browse/OOM-716 if [ "$BRANCH" == "amsterdam" ]; then RANCHER_VERSION=1.6.10 KUBECTL_VERSION=1.7.7 HELM_VERSION=2.3.0 DOCKER_VERSION=1.12 else RANCHER_VERSION=1.6.14 KUBECTL_VERSION=1.8.6 HELM_VERSION=2.6.1 DOCKER_VERSION=17.03 fi These versions are what I run for everything AWS, Azure, Openstack, VMWare Unfortunately AWS had a resource issue on the 17th so all spot VMs were reset when the market rose to peak - I lost a week of runs and only have hourly master traffic from 17 Mar at 1400h. When I get some time I will retest a couple more environments to narrow it down - as I also need to get master working in azure (currently only amsterdam deploys there). /michael From: Gary Wu [mailto:gary.i...@huawei.com] Sent: Monday, March 19, 2018 15:52 To: Michael O'Brien <frank.obr...@amdocs.com<mailto:frank.obr...@amdocs.com>>; onap-discuss@lists.onap.org<mailto:onap-discuss@lists.onap.org> Subject: RE: New OOM deployment issues Hi Michael, For reference, both of my environments (Wind River / TLAB) are running OOM as the root user, and they seem to be failing the same error as your Azure/master/ubuntu environment, so it may not be an issue with root user vs. ubuntu user. The failures started on 3/16 between noon and 6 Pacific time. The only thing new that happened in my environments during that time seems to be the docker image rancher/k8s:v1.8.9-rancher-1.2. For comparison, another environment I deployed a week ago is on rancher/k8s:v1.8.5-rancher4 which was working fine. 
From: Michael O'Brien [mailto:frank.obr...@amdocs.com]
Sent: Monday, March 19, 2018 11:36 AM
To: Gary Wu <gary.i...@huawei.com>; onap-discuss@lists.onap.org
Subject: RE: New OOM deployment issues

Gary,
Adding onap-discuss, as we should always discuss ONAP health in public - it may also catch the attention of whoever made these changes.
Yes, while working on OOM-710 over the weekend I noticed this issue, specific only to my Azure instances running in Ubuntu (I was working mainly in master, so did not check amsterdam for a while - just checked, and it is OK). I assumed it was my ARM template, as I am testing the entrypoint script in the script extension point. I say this because I have always had this problem in Azure specific only to the ubuntu user - since I started running as ubuntu instead of just root (around Friday).
The undercloud seems to be the issue here, mixed with some config in master (Azure/OpenStack have issues, AWS does not):
- Running the install as root did not have the issue on either AWS:EBS or Azure before the 15th; after that it shows up only in Azure/OpenStack : ubuntu : master.
- Running on AWS EBS also does not have the issue as either ubuntu or root.
So it looks like a permissions change on the config files that is sensitive to the file system. There were only 3 commits to master since the 15th, so it does not look like any of those 3 would cause it:
https://gerrit.onap.org/r/#/q/status:merged+oom
Raised the following just for tracking - but until we pin down the exact start of this change we won't know which PV code change did it, if any did. You don't give specifics, but in my JIRA 39 pods are failing (half of these are normal hierarchy failures until the ones actually busted get fixed):
https://jira.onap.org/browse/OOM-813
Remember, the job of both of our CD systems is specifically to catch these and eventually mark the commit causing it with ONAPCDBuilder -1 - so it is good we are catching them manually for now, as long as they are not config issues or red herrings - hence the need for more than one type of undercloud.
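A minimal sketch of the permissions check implied here, assuming the standard OOM shared mount at /dockerdata-nfs - ownership differences between the root-run and ubuntu-run installs should show up directly:

  # Sketch - compare ownership/permissions on the shared config mount:
  ls -la /dockerdata-nfs
  # Run as the ubuntu user (no sudo): files a root-run install wrote
  # that the ubuntu user cannot read:
  find /dockerdata-nfs -maxdepth 2 ! -readable -ls 2>/dev/null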
Current state:
AWS, amsterdam, ubuntu user = ?
AWS, beijing, ubuntu user = OK (20180319)
AWS, beijing, root user = ?
Azure, amsterdam, ubuntu user = OK (20180319) http://jenkins.onap.cloud/job/oom_azure_deployment/13/console
Azure, beijing, ubuntu user = BUSTED (20180319)
Azure, beijing, root user = in progress now (ete 75 min) - but a previous instance before the 14th is OK

AWS is fine: http://jenkins.onap.info/job/oom-cd/2410/console
Azure has issues on master, not amsterdam, on the ubuntu user: http://jenkins.onap.cloud/job/oom-cd-master/13/console
When my next run comes up I will get the error directly from the K8S console (these are deleted by now).

master pending containers=39
onap aaf-6c64db8fdd-fgwxb 0/1 Running 0 27m
onap aai-data-router-6fbb8695d4-9s6w2 0/1 CreateContainerConfigError 0 27m
onap aai-elasticsearch-7f66545fdf-q7gnh 0/1 CreateContainerConfigError 0 27m
onap aai-model-loader-service-7768db4744-lj9bg 0/2 CreateContainerConfigError 0 27m
onap aai-resources-9f95b9b6d-qrhs5 0/2 CreateContainerConfigError 0 27m
onap aai-search-data-service-99dff479c-fr8bh 0/2 CreateContainerConfigError 0 27m
onap aai-service-5698ddc455-npsm6 0/1 Init:0/1 2 27m
onap aai-sparky-be-57bd9944b5-cmqvc 0/2 CreateContainerConfigError 0 27m
onap aai-traversal-df4b45c4-sjtlx 0/2 Init:0/1 0 27m
onap appc-67c6b9d477-n64mk 0/2 CreateContainerConfigError 0 27m
onap appc-dgbuilder-68c68ff84b-x6dst 0/1 Init:0/1 0 27m
onap clamp-6889598c4-76mww 0/1 Init:0/1 2 27m
onap clamp-mariadb-78c46967b8-2w922 0/1 CreateContainerConfigError 0 27m
onap log-elasticsearch-6ff5b5459d-2zq2b 0/1 CreateContainerConfigError 0 27m
onap log-kibana-54c978c5fc-457gb 0/1 Init:0/1 2 27m
onap log-logstash-5f6fbc4dff-t2hh9 0/1 Init:0/1 2 27m
onap mso-555464596b-t5fc2 0/2 Init:0/1 2 28m
onap mso-mariadb-5448666ccc-kddh6 0/1 CreateContainerConfigError 0 28m
onap multicloud-framework-57687dc8c-nf7pk 0/2 CreateContainerConfigError 0 27m
onap multicloud-vio-5bfb9f68db-g6j7h 0/2 CreateContainerConfigError 0 27m
onap policy-brmsgw-5f445cfcfb-wzb88 0/1 Init:0/1 2 27m
onap policy-drools-5b67c475d6-pv6kt 0/2 CreateContainerConfigError 0 27m
onap policy-pap-79577c6947-fhfxb 0/2 Init:CrashLoopBackOff 8 27m
onap policy-pdp-7d5c76bf8d-st7js 0/2 Init:0/1 2 27m
onap portal-apps-7ddfc4b6bd-g7nhk 0/2 Init:CreateContainerConfigError 0 27m
onap portal-vnc-7dcbf79f66-7c6p6 0/1 Init:0/4 2 27m
onap portal-widgets-6979b47c48-5kr86 0/1 CreateContainerConfigError 0 27m
onap robot-f6d55cc87-t2wgd 0/1 CreateContainerConfigError 0 27m
onap sdc-fe-6d4b87c978-2v5x2 0/2 CreateContainerConfigError 0 27m
onap sdnc-0 0/2 Init:0/1 2 28m
onap sdnc-dbhost-0 0/2 Pending 0 28m
onap sdnc-dgbuilder-794d686f78-296zf 0/1 Init:0/1 2 28m
onap sdnc-dmaap-listener-8595c8f6c-vgzxt 0/1 Init:0/1 2 28m
onap sdnc-portal-69b79b6646-p4x8k 0/1 Init:0/1 2 28m
onap sdnc-ueb-listener-6897f6dd55-fq9j5 0/1 Init:0/1 2 28m
onap vfc-ztevnfmdriver-fcf4ddf68-65pb5 0/1 ImagePullBackOff 0 27m
onap vid-mariadb-6788c598fb-kbfnw 0/1 CreateContainerConfigError 0 28m
onap vid-server-87d5d87cf-9rbx4 0/2 Init:0/1 2 28m
onap vnfsdk-refrepo-55f544c5f5-9b6jj 0/1 ImagePullBackOff 0 27m

http://beijing.onap.cloud:8880/r/projects/1a7/kubernetes-dashboard:9090/#!/pod?namespace=_all

Checking amsterdam on Azure running via the ubuntu user = OK:
root@ons-auto-201803191109z:/var/lib/waagent/custom-script/download/0# tail -f stdout
root@ons-auto-201803191109z:/var/lib/waagent/custom-script/download/0/oom# git status
On branch amsterdam
Your branch is up-to-date with 'origin/amsterdam'.
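A one-liner sketch for capturing exactly what each environment deployed when filling in the state table above - plain git, nothing assumed beyond the oom checkout:

  # Sketch - record branch and last commit for the state table:
  cd oom && git rev-parse --abbrev-ref HEAD && git log -1 --format='%h %ci %s'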
4 pending > 0 at the 62th 15 sec interval
onap-aaf aaf-1993711932-3lnwd 0/1 Running 0 19m
onap-vnfsdk refrepo-1924147637-1x10v 0/1 ErrImagePull 0 19m
3 pending > 0 at the 63th 15 sec interval

/michael

From: Gary Wu [mailto:gary.i...@huawei.com]
Sent: Monday, March 19, 2018 11:04
To: Michael O'Brien <frank.obr...@amdocs.com>
Cc: Yunxia Chen <helen.c...@huawei.com>; PLATANIA, MARCO <plata...@research.att.com>; FREEMAN, BRIAN D <bf1...@att.com>
Subject: New OOM deployment issues

Hi Michael,
Since some time Friday afternoon, my daily OOM deployments have been failing with "Error: failed to prepare subPath for volumeMount ..." for various ONAP pods. Has anything changed recently that may be causing this issue? It also looks like the robot logs directory is still not there under /dockerdata-nfs/. Do we have a ticket tracking this issue?
Thanks,
Gary
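A quick sketch for pulling that subPath error straight from the cluster - the pod name is one of the CreateContainerConfigError pods from the listing earlier in the thread, and the commands are standard kubectl:

  # Sketch - surface the 'failed to prepare subPath' message for a stuck pod:
  kubectl -n onap describe pod robot-f6d55cc87-t2wgd | grep -B1 -A2 subPath
  # Or scan recent events cluster-wide:
  kubectl get events --all-namespaces | grep -i subpath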