Re: [onap-discuss] OOF integration testing

2018-05-20 Thread Yunxia Chen
Hi, Shankar,
Thanks for the update. Please discuss schedule and priority with Kang, who has
been added to this mail list.

Regards,

Helen Chen

From: Shankaranarayanan P N 
Date: Friday, May 18, 2018 at 10:21 AM
To: Helen Chen 00725961 , Gildas Lanilis 
, "PATEL, ANKITKUMAR N (ANKITKUMAR N)" 
, "ritu.s...@intel.com" , 
"LEFEVRE, CATHERINE" , "FREEMAN, BRIAN D" 
, "sa...@research.att.com" , 
onap-discuss 
Subject: OOF integration testing

Hi Helen, Gildas,

Following up on our earlier meeting regarding integration testing, I wanted to
reach out with the update on our progress that you requested.

We have completed all the pairwise testing with our downstream dependencies
(AAI, MultiCloud, Policy), and expect to complete the northbound testing with
SO shortly. To progress in parallel with the integration testing effort, we
have primarily been using the vCPE workflows for our pairwise testing.

Since OOF is called by SO for homing in the vCPEResCust instantiation workflow,
I wanted to check whether the following test plan would work:

1. Test the basic homing workflow for vCPE with minimal constraints
(https://wiki.onap.org/display/DW/HAS+%28R2%29+Beijing+Release+planning). This
would, in effect, replace the SNIRO emulator stub in the R1 flows with the OOF
homing workflow (a rough sketch of such a homing request appears below).
2. Add the HPA and MultiCloud capacity check policies, and test with the
functional test cases.
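
For reference, a minimal sketch of what step 1 could look like when exercised
directly against the OOF-HAS API, outside of SO. The host, port, endpoint path,
and template fields here are assumptions based on the HAS (R2) wiki page linked
above, so please verify them against the deployed API:

# Hypothetical direct homing request to the HAS plans endpoint
curl -s -X POST "http://<oof-has-host>:<port>/v1/plans" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "vcpe-homing-sanity",
        "template": {
          "homing_template_version": "2017-10-10",
          "demands": {},
          "constraints": {}
        }
      }'
# demands/constraints are left as empty placeholders here - fill them in
# per the homing template spec on the wiki page above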

This way, we can start with a simple but meaningful test and add more complex
policies incrementally. Please let us know if this sounds reasonable.

Thank you so much for your help with the testing efforts!

Thanks,
Shankar.



___
onap-discuss mailing list
onap-discuss@lists.onap.org
https://lists.onap.org/mailman/listinfo/onap-discuss


[onap-discuss] [CD]: Both systems in sync at 41/43 today

2018-05-20 Thread Michael O'Brien
Guys,
   Update: The tlab and the AWS CD systems were in agreement today.
   Note the AWS system is now running 4 x 64G VMs (32 cores and 256G RAM); I am
also bringing up a second 9 x 16G VM mirror system.
   For those requiring AWS: the details/script for configuring the only
cloud-native part of the cluster – the EFS wrapper on the NFS file share, which
is hosted outside of all the VMs as a service – are detailed below. You can run
the EFS script after the normal cloud-agnostic oom_rancher_install.sh script;
see
https://jira.onap.org/browse/LOG-325
https://wiki.onap.org/display/DW/Cloud+Native+Deployment#CloudNativeDeployment-EFS/NFSProvisioningScriptforAWS
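
As a rough sketch of what the EFS-over-NFS mount amounts to on each cluster VM
(the filesystem ID, region, and the /dockerdata-nfs mount point are
assumptions; the linked script is the authoritative version):

# Mount the EFS share via NFS 4.1 on a cluster VM (hypothetical fs id/region)
sudo mkdir -p /dockerdata-nfs
sudo mount -t nfs4 \
  -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
  fs-12345678.efs.us-east-2.amazonaws.com:/ /dockerdata-nfs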

There were only 2 failures in a good build earlier today on both systems, at
41/43 – one likely due to intermittent timing. Both systems failed at the same
time and on the same 2 pods, which is rare.
4 x 64G – AWS
http://jenkins.onap.info/job/oom-cd-master/2971/consoleFull
16:00:18 43 critical tests, 41 passed, 2 failed
9 x 16G - TLAB
https://jenkins.onap.org/view/External%20Labs/job/lab-tlab-beijing-oom-deploy/321/console
21:37:46 14:19:40-0700  43 critical tests, 41 passed, 2 failed

http://kibana.onap.info:5601/app/kibana#/dashboard/AWAtvpS63NTXK5mX2kuS?_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:'2018-05-19T14:46:38.216Z',mode:absolute,to:'2018-05-19T17:22:18.756Z'))&_a=(description:'',filters:!(),options:(darkTheme:!f),panels:!((col:1,id:AWAts77k3NTXK5mX2kuM,panelIndex:1,row:1,size_x:8,size_y:3,type:visualization),(col:9,id:AWAtuTVI3NTXK5mX2kuP,panelIndex:2,row:1,size_x:4,size_y:3,type:visualization),(col:1,id:AWAtuBTY3NTXK5mX2kuO,panelIndex:3,row:7,size_x:6,size_y:3,type:visualization),(col:1,id:AWAttmqB3NTXK5mX2kuN,panelIndex:4,row:4,size_x:6,size_y:3,type:visualization),(col:7,id:AWAtvHtY3NTXK5mX2kuR,panelIndex:6,row:4,size_x:6,size_y:6,type:visualization)),query:(match_all:()),timeRestore:!f,title:'CD%20Health%20Check',uiState:(),viewMode:view)

   ONAP-wide resource allocations:
   The healthcheck, even on this system (and the tlab system), is very
sensitive to un-optimized rogue containers (logstash being one), the order of
pods, and readiness/liveness timing. As we fine-tune and prioritize container
resources we should get better. I am scheduling a performance meeting for 11:30
Thursday to go over a couple of the containers causing issues (the ELK stack
under a 30 logs/sec idle load) – such as the high indexing behavior on the
logstash DaemonSet, and whether/how elasticsearch should also be a DaemonSet.
Mike and Mandeep have work scheduled to do most of this for all of OOM in
general; the meeting will be public and will start with the logging pods.
A lot of this will involve things like ReplicaSet sizes: which pods to switch
to DaemonSets (1 container per VM), which to put under autoscalers, CPU limits
(in cores – sorry, no %), RAM limits, collocation rules (which pods are
affected by others on the same VM), and the cluster VM granularity sweet spot
(32/16/8G VMs and the tradeoff of limiting VM-local effects against reduced
collocation). For example, a rogue container hogging 6 of 32 cores on a 128G VM
in a cluster is less disruptive than one on an 8-core VM, but an 8G VM may only
have enough room for 1 or 2 pods that need huge heaps. However, all of these
optimizations should be done together with all the PTLs, because if we
arbitrarily set priorities on RAM/CPU limits for some pods, others will be
effectively downgraded – we need a hierarchy, which Mike mentioned.
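
To make the knobs concrete, here is a hedged sketch of the kinds of kubectl
commands this tuning maps to (the deployment names and numbers are
placeholders, not tuned values – the real figures should come out of the PTL
review):

# Cap a known-hungry deployment (placeholder name and limits)
kubectl set resources deployment log-logstash -n onap \
  --limits=cpu=2,memory=4Gi --requests=cpu=500m,memory=1Gi

# Replace a fixed ReplicaSet size with an autoscaler on a stateless deployment
kubectl autoscale deployment robot -n onap --min=1 --max=3 --cpu-percent=80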

 I recommend the windriver/tlab system report the pod list just before the
final healthcheck, like below, so we can separate healthchecks failing due to
failed pods from healthchecks failing on running pods – as well as healthchecks
passing on failing pods (false positives).
 If we use -o wide we can also determine the deployment distribution
architecture of that particular install – which cluster VM each pod, especially
the non-DaemonSet ReplicaSet ones, is running on. A command sketch follows the
example output below.

Example: a couple of running pods – the list is 125 of 150+
16:00:01 List of ONAP Modules
16:00:01 NAMESPACE  NAME                                          READY  STATUS   RESTARTS  AGE  IP             NODE
16:00:01 onap       dep-config-binding-service-68b4695cb4-l4tst  2/2    Running  0         3h   10.42.233.237  ip-10-0-0-80.us-east-2.compute.internal
16:00:01 onap       dep-dcae-tca-analytics-68d749cb4c-7mzjg      2/2    Running  0         3h   10.42.174.250  ip-10-0-0-111.us-east-2.compute.internal
…..
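
A minimal sketch of the pre-healthcheck report suggested above (the onap
namespace matches the default OOM install; the grep filter is just one way to
surface the non-running pods):

# Full pod-to-VM distribution of the install
kubectl get pods --all-namespaces -o wide

# Surface failed/pending pods just before the healthcheck runs
kubectl get pods -n onap -o wide | grep -v ' Running '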

For robot logs – I think it will be beneficial to add a filebeat sidecar to the
robot pod, so we can query the ELK stack on 30253 for any robot healthcheck and
ete logs as well.
https://jira.onap.org/browse/LOG-414
This was mentioned in a previous request:
https://lists.onap.org/pipermail/onap-discuss/2018-April/009199.html
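
Once the sidecar ships those logs, querying them could look something like this
(the node address is a placeholder and the query field is an assumption; 30253
is the elasticsearch NodePort mentioned above):

# Ask elasticsearch (NodePort 30253) for recent robot healthcheck log lines
curl -s "http://<any-cluster-node>:30253/_search?q=message:healthcheck&size=5&pretty"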

thank you
/michael

