Rancher has closed the issue, as they have added 1.8.10 to 1.6.14 - retesting. Unfortunately Rancher 1.6.14, which was released months ago, has gone through 3 versions of Kubernetes in 7 days (1.8.5, 1.8.9 and 1.8.10) - we need to verify that we are compatible and also that Helm 2.6.1 is OK with 1.8.9.
https://github.com/rancher/rancher/issues/12178
via http://jenkins.onap.info/job/oom-cd/2457/console
Note kubectl is now 1.8.10. We are in serious need of a DevOps testing team at ONAP.
/michael

From: onap-discuss-boun...@lists.onap.org [mailto:onap-discuss-boun...@lists.onap.org] On Behalf Of Michael O'Brien
Sent: Monday, March 19, 2018 17:24
To: Gary Wu <gary.i...@huawei.com>; onap-discuss@lists.onap.org
Subject: Re: [onap-discuss] New OOM deployment issues

Gary,
Good one - didn't see that. Rancher 1.6.14 should not be changing from K8S 1.8.5 to 1.8.6 - but it looks like they backported a change and upgraded from 1.8.5 to 1.8.9. Nice catch - testing this to see if this is the issue. One thing I'll also test is an upgrade of the client to 1.8.9 (v1.8.9-rancher1).
Makes sense, since my AWS server is static (ONAP up/down on the same server - delete oom repo, delete/create pods - over and over), but my Azure system is dynamic (completely new VM + docker pull + rancher + oom install every 2 hours).
Good catch Gary - Rancher 1.6.14 is now running 1.8.9 as of 3 days ago, instead of 1.8.5. From an Azure VM today:

Server:
 Version: 17.03.2-ce
 API version: 1.27 (minimum version 1.12)
 Go version: go1.7.5
 Git commit: f5ec1e2
 Built: Tue Jun 27 03:35:14 2017
 OS/Arch: linux/amd64
 Experimental: false
root@ons-auto-master-201803191429z:/var/lib/waagent/custom-script/download/0/oom/kubernetes/oneclick# kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:34:11Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.9-rancher1", GitCommit:"68595e18f25e24125244e9966b1e5468a98c1cd4", GitTreeState:"clean", BuildDate:"2018-03-13T04:37:53Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
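The kubectl output above shows a v1.8.6 client against a v1.8.9-rancher1 server. A minimal sketch of flagging such a skew automatically (this is a hypothetical helper, not part of any ONAP script; the version strings are hard-coded from the output above, where a CD job would read them from `kubectl version` itself):

```shell
#!/bin/sh
# Hypothetical sketch: flag a kubectl client/server minor-version skew.
minor_of() {
  # v1.8.9-rancher1 -> 1.8 (strip leading 'v' and any vendor suffix)
  echo "$1" | sed 's/^v//; s/-.*$//' | cut -d. -f1,2
}
CLIENT=v1.8.6
SERVER=v1.8.9-rancher1
if [ "$(minor_of "$CLIENT")" = "$(minor_of "$SERVER")" ]; then
  echo "no minor-version skew: client $CLIENT, server $SERVER"
else
  echo "minor-version skew: client $CLIENT vs server $SERVER"
fi
```

Patch-level drift within a minor version (1.8.5 -> 1.8.9) would still pass this check, which matches the kubectl support policy of one minor version of skew.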
root@ons-auto-master-201803191429z:/var/lib/waagent/custom-script/download/0/oom/kubernetes/oneclick# helm version
Client: &version.Version{SemVer:"v2.6.1", GitCommit:"bbc1f71dc03afc5f00c6ac84b9308f8ecb4f39ac", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.6.1", GitCommit:"bbc1f71dc03afc5f00c6ac84b9308f8ecb4f39ac", GitT

Retesting on a clean AWS system now.
/michael

From: Gary Wu [mailto:gary.i...@huawei.com]
Sent: Monday, March 19, 2018 17:07
To: Michael O'Brien <frank.obr...@amdocs.com>; onap-discuss@lists.onap.org
Subject: RE: New OOM deployment issues

Hi Michael,
My versions are locked down and are the same as the ones you specified for the master branch. But it seems like Rancher v1.6.14 decided to deploy a different version of Kubernetes since Friday. Maybe this is a bug in Rancher?
Thanks,
Gary

From: Michael O'Brien [mailto:frank.obr...@amdocs.com]
Sent: Monday, March 19, 2018 1:34 PM
To: Gary Wu <gary.i...@huawei.com>; onap-discuss@lists.onap.org
Subject: RE: New OOM deployment issues

Checking. There is documentation that the OOM and Integration teams keep up to date on this:
https://wiki.onap.org/display/DW/ONAP+on+Kubernetes#ONAPonKubernetes-SoftwareRequirements
You should be OK with Kubernetes 1.8.x in master - but we need to verify post 1.8.6, since Rancher 1.6.14 is still at 1.8.5 (it should not move); 1.8.6 is the closest. I am running the 2.6.1 helm server/client and K8s 1.8.5 server, 1.8.6 client. Normally for the CD you should be on a locked-down version of Kubernetes, Rancher, Helm and (not so much) docker.
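Locked-down versions can also be sanity-checked at run time. A minimal sketch (a hypothetical helper, not part of the OOM scripts; the "actual" values are hard-coded here so the sketch runs without the CLIs installed, where a real CD job would query the tools themselves):

```shell
#!/bin/sh
# Hypothetical version-pin check; pinned values match the beijing/master
# pins quoted in this thread.
KUBECTL_PIN=1.8.6
HELM_PIN=2.6.1
# In a live environment these would come from the CLIs, e.g. parsing
# `kubectl version --client` output; hard-coded here for illustration.
actual_kubectl=1.8.6
actual_helm=2.6.1
check() {
  # check <tool> <pinned> <actual>
  if [ "$2" = "$3" ]; then
    echo "$1 OK ($2)"
  else
    echo "$1 MISMATCH: pinned $2, found $3"
  fi
}
check kubectl "$KUBECTL_PIN" "$actual_kubectl"
check helm "$HELM_PIN" "$actual_helm"
```

Failing the CD run fast on a MISMATCH line would have caught the silent Rancher-side upgrade discussed below, instead of letting it surface as pod failures an hour later.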
My script has these hardcoded for each branch:
https://gerrit.onap.org/r/#/c/32019/11/install/rancher/oom_rancher_setup.sh
https://jira.onap.org/browse/OOM-716

if [ "$BRANCH" == "amsterdam" ]; then
  RANCHER_VERSION=1.6.10
  KUBECTL_VERSION=1.7.7
  HELM_VERSION=2.3.0
  DOCKER_VERSION=1.12
else
  RANCHER_VERSION=1.6.14
  KUBECTL_VERSION=1.8.6
  HELM_VERSION=2.6.1
  DOCKER_VERSION=17.03
fi

These versions are what I run for everything: AWS, Azure, Openstack, VMWare. Unfortunately AWS had a resource issue on the 17th, so all spot VMs were reset when the market rose to peak - I lost a week of runs and only have hourly master traffic from 17 Mar at 1400h. When I get some time I will retest a couple more environments to narrow it down - as I also need to get master working in Azure (currently only amsterdam deploys there).
/michael

From: Gary Wu [mailto:gary.i...@huawei.com]
Sent: Monday, March 19, 2018 15:52
To: Michael O'Brien <frank.obr...@amdocs.com>; onap-discuss@lists.onap.org
Subject: RE: New OOM deployment issues

Hi Michael,
For reference, both of my environments (Wind River / TLAB) are running OOM as the root user, and they seem to be failing with the same error as your Azure/master/ubuntu environment, so it may not be an issue with root user vs. ubuntu user. The failures started on 3/16 between noon and 6 Pacific time. The only new thing that happened in my environments during that time seems to be the docker image rancher/k8s:v1.8.9-rancher-1.2. For comparison, another environment I deployed a week ago is on rancher/k8s:v1.8.5-rancher4, which was working fine. This is without me updating any rancher-specific configuration between the two, so maybe Rancher itself has changed? Can you check your various OOM environments and see what versions of rancher/k8s they're on?
Thanks,
Gary

From: Michael O'Brien [mailto:frank.obr...@amdocs.com]
Sent: Monday, March 19, 2018 11:36 AM
To: Gary Wu <gary.i...@huawei.com>; onap-discuss@lists.onap.org
Subject: RE: New OOM deployment issues

Gary,
Adding onap-discuss, as we should always discuss ONAP health in public - it may also catch the attention of whoever made these changes.
Yes, while working on OOM-710 over the weekend I noticed this issue, specific only to my Azure instances running in Ubuntu (I was working mainly in master so did not check amsterdam for a while - just checked and it is OK). I assumed it was my ARM template, as I am testing the entrypoint script in the script extension point. I say this because I have always had this problem in Azure, specific only to the ubuntu user - since I started running as ubuntu instead of just root (around Friday).
The undercloud seems to be the issue here, mixed with some config in master (Azure/Openstack have issues, AWS does not):
- Running the install as root did not have the issue on either AWS:EBS or Azure before the 15th, and only azure/openstack:ubuntu:master has it after.
- Running on AWS EBS also does not have the issue as either the ubuntu or root user.
So it looks like a permissions change on the config files, sensitive to the file system. There were only 3 commits to master since the 15th, so it does not look like any of those 3 would cause it: https://gerrit.onap.org/r/#/q/status:merged+oom
Raised the following just for tracking - but until we go through the exact start of this change we won't know which PV code change did it, if any did.
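One quick way to compare the suspected permissions difference across underclouds is to record the ownership and mode of the shared persistent-volume mount on each host. A sketch, assuming /dockerdata-nfs (the path referenced later in this thread) is the PV root:

```shell
#!/bin/sh
# Hedged sketch: report owner/group/mode of the shared PV mount so a
# root-vs-ubuntu permissions difference shows up when run on each host.
DIR=/dockerdata-nfs   # assumption: PV root as referenced in the thread
if [ -d "$DIR" ]; then
  # -ld: show the directory entry itself, not its contents
  ls -ld "$DIR"
else
  echo "$DIR not present on this host"
fi
```

Running this as both root and ubuntu on a failing and a working host would show whether the CreateContainerConfigError pods correlate with a mount owned by the wrong user.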
You don't give specifics, but in my Jira 39 pods are failing (half of these are normal hierarchy failures until the ones actually busted get fixed): https://jira.onap.org/browse/OOM-813
Remember the job of both of our CD systems is specifically to catch these and eventually mark the commit causing it with ONAPCDBuilder -1 - so it is good we are catching them manually for now, as long as they are not config issues or red herrings - hence the need for more than one type of undercloud.

State:
AWS, amsterdam, ubuntu user = ?
AWS, beijing, ubuntu user = OK (20180319)
AWS, beijing, root user = ?
Azure, amsterdam, ubuntu user = OK (20180319) http://jenkins.onap.cloud/job/oom_azure_deployment/13/console
Azure, beijing, ubuntu user = BUSTED (20180319)
Azure, beijing, root user = in progress now (ete 75 min) - but a previous instance before the 14th is OK

AWS is fine: http://jenkins.onap.info/job/oom-cd/2410/console
Azure has issues on master (not amsterdam) as the ubuntu user: http://jenkins.onap.cloud/job/oom-cd-master/13/console
When my next run comes up I will get the error directly from the k8s console (these are deleted by now).

master pending containers=39
onap aaf-6c64db8fdd-fgwxb 0/1 Running 0 27m
onap aai-data-router-6fbb8695d4-9s6w2 0/1 CreateContainerConfigError 0 27m
onap aai-elasticsearch-7f66545fdf-q7gnh 0/1 CreateContainerConfigError 0 27m
onap aai-model-loader-service-7768db4744-lj9bg 0/2 CreateContainerConfigError 0 27m
onap aai-resources-9f95b9b6d-qrhs5 0/2 CreateContainerConfigError 0 27m
onap aai-search-data-service-99dff479c-fr8bh 0/2 CreateContainerConfigError 0 27m
onap aai-service-5698ddc455-npsm6 0/1 Init:0/1 2 27m
onap aai-sparky-be-57bd9944b5-cmqvc 0/2 CreateContainerConfigError 0 27m
onap aai-traversal-df4b45c4-sjtlx 0/2 Init:0/1 0 27m
onap appc-67c6b9d477-n64mk 0/2 CreateContainerConfigError 0 27m
onap appc-dgbuilder-68c68ff84b-x6dst 0/1 Init:0/1 0 27m
onap clamp-6889598c4-76mww 0/1 Init:0/1 2 27m
onap clamp-mariadb-78c46967b8-2w922 0/1 CreateContainerConfigError 0 27m
onap log-elasticsearch-6ff5b5459d-2zq2b 0/1 CreateContainerConfigError 0 27m
onap log-kibana-54c978c5fc-457gb 0/1 Init:0/1 2 27m
onap log-logstash-5f6fbc4dff-t2hh9 0/1 Init:0/1 2 27m
onap mso-555464596b-t5fc2 0/2 Init:0/1 2 28m
onap mso-mariadb-5448666ccc-kddh6 0/1 CreateContainerConfigError 0 28m
onap multicloud-framework-57687dc8c-nf7pk 0/2 CreateContainerConfigError 0 27m
onap multicloud-vio-5bfb9f68db-g6j7h 0/2 CreateContainerConfigError 0 27m
onap policy-brmsgw-5f445cfcfb-wzb88 0/1 Init:0/1 2 27m
onap policy-drools-5b67c475d6-pv6kt 0/2 CreateContainerConfigError 0 27m
onap policy-pap-79577c6947-fhfxb 0/2 Init:CrashLoopBackOff 8 27m
onap policy-pdp-7d5c76bf8d-st7js 0/2 Init:0/1 2 27m
onap portal-apps-7ddfc4b6bd-g7nhk 0/2 Init:CreateContainerConfigError 0 27m
onap portal-vnc-7dcbf79f66-7c6p6 0/1 Init:0/4 2 27m
onap portal-widgets-6979b47c48-5kr86 0/1 CreateContainerConfigError 0 27m
onap robot-f6d55cc87-t2wgd 0/1 CreateContainerConfigError 0 27m
onap sdc-fe-6d4b87c978-2v5x2 0/2 CreateContainerConfigError 0 27m
onap sdnc-0 0/2 Init:0/1 2 28m
onap sdnc-dbhost-0 0/2 Pending 0 28m
onap sdnc-dgbuilder-794d686f78-296zf 0/1 Init:0/1 2 28m
onap sdnc-dmaap-listener-8595c8f6c-vgzxt 0/1 Init:0/1 2 28m
onap sdnc-portal-69b79b6646-p4x8k 0/1 Init:0/1 2 28m
onap sdnc-ueb-listener-6897f6dd55-fq9j5 0/1 Init:0/1 2 28m
onap vfc-ztevnfmdriver-fcf4ddf68-65pb5 0/1 ImagePullBackOff 0 27m
onap vid-mariadb-6788c598fb-kbfnw 0/1 CreateContainerConfigError 0 28m
onap vid-server-87d5d87cf-9rbx4 0/2 Init:0/1 2 28m
onap vnfsdk-refrepo-55f544c5f5-9b6jj 0/1 ImagePullBackOff 0 27m
http://beijing.onap.cloud:8880/r/projects/1a7/kubernetes-dashboard:9090/#!/pod?namespace=_all

Checking amsterdam on Azure running as the Ubuntu user = OK
root@ons-auto-201803191109z:/var/lib/waagent/custom-script/download/0# tail -f stdout
root@ons-auto-201803191109z:/var/lib/waagent/custom-script/download/0/oom# git status
On branch amsterdam
Your branch is up-to-date with
'origin/amsterdam'.
4 pending > 0 at the 62th 15 sec interval
onap-aaf aaf-1993711932-3lnwd 0/1 Running 0 19m
onap-vnfsdk refrepo-1924147637-1x10v 0/1 ErrImagePull 0 19m
3 pending > 0 at the 63th 15 sec interval
/michael

From: Gary Wu [mailto:gary.i...@huawei.com]
Sent: Monday, March 19, 2018 11:04
To: Michael O'Brien <frank.obr...@amdocs.com>
Cc: Yunxia Chen <helen.c...@huawei.com>; PLATANIA, MARCO <plata...@research.att.com>; FREEMAN, BRIAN D <bf1...@att.com>
Subject: New OOM deployment issues

Hi Michael,
Since some time Friday afternoon, my daily OOM deployments have been failing with "Error: failed to prepare subPath for volumeMount ..." for various ONAP pods. Has anything changed recently that may be causing this issue? It also looks like the robot logs directory is still not there under /dockerdata-nfs/. Do we have a ticket tracking this issue?
Thanks,
Gary

This message and the information contained herein is proprietary and confidential and subject to the Amdocs policy statement, which you may review at https://www.amdocs.com/about/email-disclaimer
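The "N pending > 0 at the Mth 15 sec interval" lines quoted above come from a simple polling loop over not-yet-running pods. A minimal sketch of that loop, with the kubectl call stubbed out (and the sleep commented) so it runs without a cluster:

```shell
#!/bin/sh
# Sketch of the CD wait loop. In a live cluster PENDING would be refreshed
# on each pass, e.g.:
#   PENDING=$(kubectl get pods --all-namespaces --no-headers | grep -cv Running)
INTERVAL=0
PENDING=3   # stubbed starting value so the sketch terminates
while [ "$PENDING" -gt 0 ]; do
  INTERVAL=$((INTERVAL + 1))
  echo "$PENDING pending > 0 at the ${INTERVAL}th 15 sec interval"
  # sleep 15
  PENDING=$((PENDING - 1))   # stub: one pod comes up per pass
done
echo "all pods running after $INTERVAL intervals"
```

A real loop would also want a timeout cap, since pods stuck in CreateContainerConfigError or ImagePullBackOff never leave the pending count on their own.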
_______________________________________________
onap-discuss mailing list
onap-discuss@lists.onap.org
https://lists.onap.org/mailman/listinfo/onap-discuss