The unofficial workaround is to revert to Rancher 1.6.12 under Docker 1.12, which installs Kubernetes 1.8.3.
It looks like when Rancher 1.6.15 was released 4 days ago, their team also upgraded 1.6.14 and 1.6.13 from Kubernetes 1.8.5 to 1.8.9.  We seem to have a persistent volume issue specific to Kubernetes 1.8.9+.
Usually a vendor does not retroactively upgrade already-released software - it would be as if we forced Amsterdam to use the versions from Beijing - so there must be a good reason for this.

Also tested 1.6.15 running Kubernetes 1.9.2 - same PV issue.

So I will adjust cd.sh - please use the following versions for now, until the testing/integration team verifies them (a rough install sketch follows the list):
https://jira.onap.org/browse/OOM-716

Rancher v1.6.12
Docker 1.12 (downgrade from 17.03 required)
Kubectl 1.8.3 to 1.8.6 (didn't test a downgrade)
Helm 2.6.1 (didn't test a downgrade)
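
A rough install sketch of that combination follows - the oom_rancher_setup.sh script tracked in OOM-716 remains the source of truth; the download URLs and steps below are assumptions and may need adjusting per distro:

  curl https://releases.rancher.com/install-docker/1.12.sh | sh                  # downgrade Docker to 1.12
  docker run -d --restart=unless-stopped -p 8880:8080 rancher/server:v1.6.12     # Rancher 1.6.12 server (deploys K8s 1.8.3)
  curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.8.3/bin/linux/amd64/kubectl
  chmod +x kubectl && sudo mv kubectl /usr/local/bin/kubectl                      # kubectl client 1.8.3
  wget -q https://storage.googleapis.com/kubernetes-helm/helm-v2.6.1-linux-amd64.tar.gz
  tar -xzf helm-v2.6.1-linux-amd64.tar.gz && sudo mv linux-amd64/helm /usr/local/bin/helm   # Helm 2.6.1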

Rancher issue below - I have a new contact at Rancher and will set up a meeting this week to go over their release details, to which ONAP is very sensitive - especially since their releases usually come a couple of weeks before our milestones.

https://jira.onap.org/browse/OOM-813
https://github.com/rancher/rancher/issues/12178


root@ip-172-31-12-163:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces | grep 0/
onap          aaf-6c64db8fdd-zk9qv                           0/1       Running            0          40m
onap          sdnc-dgbuilder-794d686f78-jmd8x                0/1       Init:0/1           0          4m
onap          sdnc-dmaap-listener-8595c8f6c-kfpmp            0/1       Init:0/1           0          4m
onap          sdnc-portal-69b79b6646-t42lv                   0/1       Init:0/1           0          4m
onap          sdnc-ueb-listener-6897f6dd55-2nrzz             0/1       Init:0/1           0          4m
onap          vfc-ztevnfmdriver-fcf4ddf68-dr2jw              0/1       ImagePullBackOff   0          40m
onap          vnfsdk-refrepo-55f544c5f5-stm45                0/1       ImagePullBackOff   0          40m
root@ip-172-31-12-163:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces | grep 0/
onap          aaf-6c64db8fdd-zk9qv                           0/1       Running            0          43m
onap          vfc-ztevnfmdriver-fcf4ddf68-dr2jw              0/1       ImagePullBackOff   0          43m
onap          vnfsdk-refrepo-55f544c5f5-stm45                0/1       ImagePullBackOff   0          43m
root@ip-172-31-12-163:~/oom/kubernetes/oneclick# kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:34:11Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.3-rancher3", GitCommit:"772c4c54e1f4ae7fc6f63a8e1ecd9fe616268e16", GitTreeState:"clean", BuildDate:"2017-11-27T19:51:43Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

/michael

From: onap-discuss-boun...@lists.onap.org 
[mailto:onap-discuss-boun...@lists.onap.org] On Behalf Of Michael O'Brien
Sent: Monday, March 19, 2018 17:24
To: Gary Wu <gary.i...@huawei.com>; onap-discuss@lists.onap.org
Subject: Re: [onap-discuss] New OOM deployment issues

Gary, good one - I didn't see that. Rancher 1.6.14 should not be changing from K8S 1.8.5 to 1.8.6 - but it looks like they backported a change and upgraded from 1.8.5 to 1.8.9.
Nice catch - testing this now to see if it is the issue.
One thing I'll also test is an upgrade of the client to 1.8.9 (see the sketch below).

v1.8.9-rancher1
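
A quick way to bump the kubectl client to match - a sketch only, assuming the standard Kubernetes release download location:

  curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.8.9/bin/linux/amd64/kubectl
  chmod +x kubectl && sudo mv kubectl /usr/local/bin/kubectl
  kubectl version    # client should now report v1.8.9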

That makes sense, since my AWS server is static (ONAP goes up/down on the same server - delete the oom repo, delete/create the pods - over and over), but my Azure system is dynamic (a completely new VM + docker pull + rancher + oom install every 2 hours).

Good catch Gary - Rancher 1.6.14 is now running 1.8.9 as of 3 days ago, instead of 1.8.5.
Output from an Azure VM today:
Server:
Version:      17.03.2-ce
API version:  1.27 (minimum version 1.12)
Go version:   go1.7.5
Git commit:   f5ec1e2
Built:        Tue Jun 27 03:35:14 2017
OS/Arch:      linux/amd64
Experimental: false
root@ons-auto-master-201803191429z:/var/lib/waagent/custom-script/download/0/oom/kubernetes/oneclick# kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:34:11Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.9-rancher1", GitCommit:"68595e18f25e24125244e9966b1e5468a98c1cd4", GitTreeState:"clean", BuildDate:"2018-03-13T04:37:53Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
root@ons-auto-master-201803191429z:/var/lib/waagent/custom-script/download/0/oom/kubernetes/oneclick# helm version
Client: &version.Version{SemVer:"v2.6.1", GitCommit:"bbc1f71dc03afc5f00c6ac84b9308f8ecb4f39ac", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.6.1", GitCommit:"bbc1f71dc03afc5f00c6ac84b9308f8ecb4f39ac", GitTreeState:"clean"}


Retesting on a clean AWS system now
/michael

From: Gary Wu [mailto:gary.i...@huawei.com]
Sent: Monday, March 19, 2018 17:07
To: Michael O'Brien <frank.obr...@amdocs.com>; onap-discuss@lists.onap.org
Subject: RE: New OOM deployment issues

Hi Michael,

My versions are locked down and are the same as the ones you specified for the master branch.  But it seems like Rancher v1.6.14 decided to deploy a different version of Kubernetes since Friday.  Maybe this is a bug in Rancher?

Thanks,
Gary


From: Michael O'Brien [mailto:frank.obr...@amdocs.com]
Sent: Monday, March 19, 2018 1:34 PM
To: Gary Wu <gary.i...@huawei.com>; onap-discuss@lists.onap.org
Subject: RE: New OOM deployment issues

Checking.
   There is documentation on this that the OOM and Integration teams keep up to date, below:
https://wiki.onap.org/display/DW/ONAP+on+Kubernetes#ONAPonKubernetes-SoftwareRequirements

   You should be OK with Kubernetes 1.8.x in master - but we need to verify versions past 1.8.6.
   Since Rancher 1.6.14 is still at 1.8.5 (it should not move), 1.8.6 is the closest.
   I am running Helm 2.6.1 server/client and K8s 1.8.5 server, 1.8.6 client.

   Normally for CD you should be on locked-down versions of Kubernetes, Rancher and Helm (and, less critically, Docker).
   My script hardcodes these for each branch:
https://gerrit.onap.org/r/#/c/32019/11/install/rancher/oom_rancher_setup.sh
https://jira.onap.org/browse/OOM-716

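  # version pins per ONAP branch (from oom_rancher_setup.sh): amsterdam keeps the older
  # Rancher/K8s/Helm/Docker set; all other branches (master/beijing) use the newer set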
  if [ "$BRANCH" == "amsterdam" ]; then
    RANCHER_VERSION=1.6.10
    KUBECTL_VERSION=1.7.7
    HELM_VERSION=2.3.0
    DOCKER_VERSION=1.12
  else
    RANCHER_VERSION=1.6.14
    KUBECTL_VERSION=1.8.6
    HELM_VERSION=2.6.1
    DOCKER_VERSION=17.03
  fi

  These versions are what I run for everything: AWS, Azure, OpenStack, VMware.
  Unfortunately AWS had a resource issue on the 17th, so all spot VMs were reset when the market rose to peak - I lost a week of runs and only have hourly master traffic from 17 Mar at 1400h.

  When I get some time I will retest a couple more environments to narrow it down - I also need to get master working in Azure (currently only amsterdam deploys there).

  /michael




From: Gary Wu [mailto:gary.i...@huawei.com]
Sent: Monday, March 19, 2018 15:52
To: Michael O'Brien <frank.obr...@amdocs.com>; onap-discuss@lists.onap.org
Subject: RE: New OOM deployment issues

Hi Michael,

For reference, both of my environments (Wind River / TLAB) are running OOM as the root user, and they seem to be failing with the same error as your Azure/master/ubuntu environment, so it may not be an issue of root user vs. ubuntu user.

The failures started on 3/16 between noon and 6 p.m. Pacific time.  The only thing new that happened in my environments during that window seems to be the docker image rancher/k8s:v1.8.9-rancher-1.2.  For comparison, another environment I deployed a week ago is on rancher/k8s:v1.8.5-rancher4, which was working fine.  This is without me updating any Rancher-specific configuration between the two, so maybe Rancher itself has changed?

Can you check your various OOM environments and see what versions of 
rancher/k8s they're on?
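
(For anyone checking their own hosts, a rough sketch - the image tag shows which K8s version Rancher deployed; exact host and kubeconfig details will differ per setup:)

  # on a Kubernetes host managed by Rancher
  docker images --format '{{.Repository}}:{{.Tag}}' | grep rancher/k8s
  # or, from any machine with a kubeconfig for the cluster
  kubectl version | grep Server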

Thanks,
Gary


From: Michael O'Brien [mailto:frank.obr...@amdocs.com]
Sent: Monday, March 19, 2018 11:36 AM
To: Gary Wu <gary.i...@huawei.com>; onap-discuss@lists.onap.org
Subject: RE: New OOM deployment issues

Gary,
  Adding onap-discuss, as we should always discuss ONAP health in public - it may also catch the attention of whoever made these changes.

  Yes, while working on OOM-710 over the weekend I noticed this issue, specific only to my Azure instances running as the ubuntu user (I was working mainly in master, so I did not check amsterdam for a while - I just checked and it is OK).  I assumed it was my ARM template, since I am testing the entrypoint script in the script extension point.  I say this because I have only ever had this problem in Azure with the ubuntu user - since I started running as ubuntu instead of just root (around Friday).

  The undercloud seems to be the issue here, mixed with some config in master (Azure/OpenStack have issues, AWS does not).
  Running the install as root did not show the issue on either AWS EBS or Azure before the 15th; after that it shows up only on azure/openstack:ubuntu:master.
  Running on AWS EBS also does not show the issue as either the ubuntu or the root user.

  So it looks like a permissions change on the config files that is sensitive to the file system.
  There were only 3 commits to master since the 15th, and it does not look like any of those 3 would cause it:
https://gerrit.onap.org/r/#/q/status:merged+oom
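
  (Locally, the equivalent check is roughly the following - a sketch, assuming a current clone of the oom repo:)
    cd oom && git checkout master && git pull
    git log --since=2018-03-15 --oneline    # lists the handful of commits merged since the 15th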

   Raised the following just for tracking - but until we go through the exact start of this change we won't know which PV code change did it, if any did.  You don't give specifics, but in my Jira 39 pods are failing (half of these are normal hierarchy failures until the ones actually busted get fixed):
https://jira.onap.org/browse/OOM-813

   Remember, the job of both of our CD systems is specifically to catch these and eventually mark the offending commit with ONAPCDBuilder -1 - so it is good that we are catching them manually for now, as long as they are not config issues or red herrings - hence the need for more than one type of undercloud.
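
   (For context, the CD health gate is essentially a wait loop over not-ready pods - a minimal sketch only; the real cd.sh differs in detail:)

  # poll every 15 s until no pod reports 0/N ready containers
  INTERVAL=0
  while true; do
    PENDING=$(kubectl get pods --all-namespaces | grep -c '0/')
    [ "$PENDING" -eq 0 ] && break
    INTERVAL=$((INTERVAL+1))
    echo "$PENDING pending > 0 at the ${INTERVAL}th 15 sec interval"
    sleep 15
  done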


State:
AWS,   amsterdam, ubuntu user = ?
AWS,   beijing,   ubuntu user = OK (20180319)
AWS,   beijing,   root user   = ?
Azure, amsterdam, ubuntu user = OK (20180319)  http://jenkins.onap.cloud/job/oom_azure_deployment/13/console
Azure, beijing,   ubuntu user = BUSTED (20180319)
Azure, beijing,   root user   = in progress now (ete 75 min) - a previous instance before the 14th is OK

AWS is fine:
http://jenkins.onap.info/job/oom-cd/2410/console
Azure has issues on master (not amsterdam) with the ubuntu user:
http://jenkins.onap.cloud/job/oom-cd-master/13/console



When my next run comes up I will get the errors directly from the K8s console (those pods are deleted by now).

master

pending containers=39

onap          aaf-6c64db8fdd-fgwxb                           0/1       Running                           0          27m
onap          aai-data-router-6fbb8695d4-9s6w2               0/1       CreateContainerConfigError        0          27m
onap          aai-elasticsearch-7f66545fdf-q7gnh             0/1       CreateContainerConfigError        0          27m
onap          aai-model-loader-service-7768db4744-lj9bg      0/2       CreateContainerConfigError        0          27m
onap          aai-resources-9f95b9b6d-qrhs5                  0/2       CreateContainerConfigError        0          27m
onap          aai-search-data-service-99dff479c-fr8bh        0/2       CreateContainerConfigError        0          27m
onap          aai-service-5698ddc455-npsm6                   0/1       Init:0/1                          2          27m
onap          aai-sparky-be-57bd9944b5-cmqvc                 0/2       CreateContainerConfigError        0          27m
onap          aai-traversal-df4b45c4-sjtlx                   0/2       Init:0/1                          0          27m
onap          appc-67c6b9d477-n64mk                          0/2       CreateContainerConfigError        0          27m
onap          appc-dgbuilder-68c68ff84b-x6dst                0/1       Init:0/1                          0          27m
onap          clamp-6889598c4-76mww                          0/1       Init:0/1                          2          27m
onap          clamp-mariadb-78c46967b8-2w922                 0/1       CreateContainerConfigError        0          27m
onap          log-elasticsearch-6ff5b5459d-2zq2b             0/1       CreateContainerConfigError        0          27m
onap          log-kibana-54c978c5fc-457gb                    0/1       Init:0/1                          2          27m
onap          log-logstash-5f6fbc4dff-t2hh9                  0/1       Init:0/1                          2          27m
onap          mso-555464596b-t5fc2                           0/2       Init:0/1                          2          28m
onap          mso-mariadb-5448666ccc-kddh6                   0/1       CreateContainerConfigError        0          28m
onap          multicloud-framework-57687dc8c-nf7pk           0/2       CreateContainerConfigError        0          27m
onap          multicloud-vio-5bfb9f68db-g6j7h                0/2       CreateContainerConfigError        0          27m
onap          policy-brmsgw-5f445cfcfb-wzb88                 0/1       Init:0/1                          2          27m
onap          policy-drools-5b67c475d6-pv6kt                 0/2       CreateContainerConfigError        0          27m
onap          policy-pap-79577c6947-fhfxb                    0/2       Init:CrashLoopBackOff             8          27m
onap          policy-pdp-7d5c76bf8d-st7js                    0/2       Init:0/1                          2          27m
onap          portal-apps-7ddfc4b6bd-g7nhk                   0/2       Init:CreateContainerConfigError   0          27m
onap          portal-vnc-7dcbf79f66-7c6p6                    0/1       Init:0/4                          2          27m
onap          portal-widgets-6979b47c48-5kr86                0/1       CreateContainerConfigError        0          27m
onap          robot-f6d55cc87-t2wgd                          0/1       CreateContainerConfigError        0          27m
onap          sdc-fe-6d4b87c978-2v5x2                        0/2       CreateContainerConfigError        0          27m
onap          sdnc-0                                         0/2       Init:0/1                          2          28m
onap          sdnc-dbhost-0                                  0/2       Pending                           0          28m
onap          sdnc-dgbuilder-794d686f78-296zf                0/1       Init:0/1                          2          28m
onap          sdnc-dmaap-listener-8595c8f6c-vgzxt            0/1       Init:0/1                          2          28m
onap          sdnc-portal-69b79b6646-p4x8k                   0/1       Init:0/1                          2          28m
onap          sdnc-ueb-listener-6897f6dd55-fq9j5             0/1       Init:0/1                          2          28m
onap          vfc-ztevnfmdriver-fcf4ddf68-65pb5              0/1       ImagePullBackOff                  0          27m
onap          vid-mariadb-6788c598fb-kbfnw                   0/1       CreateContainerConfigError        0          28m
onap          vid-server-87d5d87cf-9rbx4                     0/2       Init:0/1                          2          28m
onap          vnfsdk-refrepo-55f544c5f5-9b6jj                0/1       ImagePullBackOff                  0          27m

http://beijing.onap.cloud:8880/r/projects/1a7/kubernetes-dashboard:9090/#!/pod?namespace=_all

Checking amsterdam on Azure, running as the ubuntu user = OK
root@ons-auto-201803191109z:/var/lib/waagent/custom-script/download/0# tail -f stdout
root@ons-auto-201803191109z:/var/lib/waagent/custom-script/download/0/oom# git status
On branch amsterdam
Your branch is up-to-date with 'origin/amsterdam'.
4 pending > 0 at the 62th 15 sec interval
onap-aaf aaf-1993711932-3lnwd 0/1 Running 0 19m
onap-vnfsdk refrepo-1924147637-1x10v 0/1 ErrImagePull 0 19m
3 pending > 0 at the 63th 15 sec interval


/michael


From: Gary Wu [mailto:gary.i...@huawei.com]
Sent: Monday, March 19, 2018 11:04
To: Michael O'Brien <frank.obr...@amdocs.com>
Cc: Yunxia Chen <helen.c...@huawei.com>; PLATANIA, MARCO <plata...@research.att.com>; FREEMAN, BRIAN D <bf1...@att.com>
Subject: New OOM deployment issues

Hi Michael,

Since some time Friday afternoon, my daily OOM deployments have been failing with "Error: failed to prepare subPath for volumeMount ..." for various ONAP pods.  Has anything changed recently that may be causing this issue?
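
(A quick way to confirm a pod is hitting the subPath failure - a sketch only; pod names vary per deployment:)

  kubectl get events -n onap --sort-by=.lastTimestamp | grep -i subpath
  kubectl describe pod <failing-pod> -n onap | tail -20    # the failed mount shows up in the Events section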

It also looks like the robot logs directory is still not there under 
/dockerdata-nfs/.  Do we have a ticket tracking this issue?

Thanks,
Gary

