Thanks Morgan.
Is there any insight into the SDNC limit issue?

Best regards
Catherine

From: [email protected] <[email protected]> On Behalf Of Morgan Richomme via lists.onap.org
Sent: Monday, December 7, 2020 7:26 PM
To: [email protected]; [email protected]; [email protected]
Subject: [onap-tsc] [ONAP] [Guilin] [Integration] stability test first results

Hi

I planned to work on an official documentation page for the stability testing tomorrow.
I just want to share some results (I already shared some during the last PTL meeting).

We run 2 stability tests in addition to the daily Guilin.

1) Daily Guilin
****************
src: https://logs.onap.org/onap-integration/daily/onap_daily_pod4_guilin/

The test evolution can be described as follows:
Infra-Healthcheck: 50% - release criteria met
Healthcheck: 100% today (data result sync error in the dashboard) - release criteria met
Smoke: 60% - errors could be due to parallel tests in SDC (got a 500 sometimes in case of parallel testing)
Security: 66% - criteria met (limits: only 1 remaining, onap-ejbca-675cf9b46f-rz45g)

We should wait for additional days. Feedback from the installation using Helm3 (performed by DT) is also reasonably positive.

2) Simple HC during 24h
**************************
On a lab, we run the healthcheck every 10 minutes.
The idea was to see if at some stage AAF-SMS failed, which could explain why the DCAE HC sometimes becomes FAIL after some time.

Test 100% OK
src: http://testresults.opnfv.org/onap/api/v1/results?pod_name=onap_daily_pod4_master-ONAP-oom&case_name=full&period=2
99 OK/99 tests run (6/12 7h35 => 7/12 06h23)

3) Simple basic_vm over 7 days
********************************
This test consists in continuously running the basic_vm test (https://logs.onap.org/onap-integration/daily/onap_daily_pod4_guilin/12-07-2020_12-12/smoke-usecases/basic_vm/reporting.html): onboard once, instantiate multiple times.
Note: even after onboarding, the test calls the SDC and AAI APIs to verify that the service does exist in the SDC.
Components impacted: SDC, AAI, SDNC, SO, DMAAP
SO BPMN 'a la carte' used

Globally the test is stable over the last 7 days.
The only issue is that the SDNC reached its limit after ~24h of continuous testing.
Once this limit is reached, it does not work anymore, and a manual restart (no OOM killing) of the SDNC is needed.
Performing a
kubectl delete pod -n onap onap-sdnc-0
is enough; the tests are PASS after the SDNC restart.

The raw results are:
PASS: 490
FAIL: 165
Raw Success Rate: 75%

But as the saturation occurs at the beginning of the night, I restarted the SDNC only in the morning (my time), so we have long runs of consecutive FAIL tests.
If I correct and remove the failures due to the fact that we reached the limits:
PASS: 490
FAIL: 87
Corrected Success Rate: 85%

Among the PASS tests:
Min: 188s
Max: 2094s
Average: 550s
Median: 260s

The duration of the PASS tests has also been evaluated.
We can see that after the first restart the duration is much more variable than during the initial phase.
I created a histogram to see the distribution of the duration for this test.

Conclusions
**************
Limited Stability looks OK (daily + stability tests); only the SDNC limit being reached triggers an issue.
We might have expected a restart as the limit was reached; I am not sure I fully understand how k8s limits work - several components are above their limits/requests.
BUT the stability tests are very light and few components have been tested (no feedback on Policy/DCAE/CLAMP; in Dublin or El Alto, Brian was using the vFWCL as the test baseline for the tests, here the test is simpler, mainly focused on SO instantiation).
We can see that the average duration of a simple test seems to increase over time.
We have lots of figures and graphs, I need to spend more time to dive into them...
We shall be able to identify the components that require memory/cpu over the last 7 days.

top 5 CPU
Note: robot is used only to launch an HC, showing that the pressure on CPU is very low...

top 5 Memory

For Honolulu, I hope we could create benchmarking xtesting dockers that will automatically replay such tests on the weekly master, providing feedback continuously.

/Morgan
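
As a cross-check of the basic_vm figures reported in the message, here is a minimal Python sketch that recomputes the raw and corrected success rates from the PASS/FAIL counts. It is not part of the original test tooling; the function name `success_rate` is an assumption made for illustration, and only the counts come from the message.

```python
def success_rate(passed: int, failed: int) -> float:
    """Return the percentage of passed runs among all runs."""
    total = passed + failed
    if total == 0:
        raise ValueError("no test runs recorded")
    return 100.0 * passed / total


# Counts reported in the message for the 7-day basic_vm run.
raw = success_rate(passed=490, failed=165)       # every run counted
corrected = success_rate(passed=490, failed=87)  # SDNC-saturation failures removed

print(f"Raw success rate: {raw:.1f}%")
print(f"Corrected success rate: {corrected:.1f}%")
```

The two values come out at about 74.8% and 84.9%, which round to the 75% and 85% quoted in the message. The gap between the two FAIL counts (165 - 87 = 78) is the number of runs attributed to the periods when the SDNC had reached its limit and had not yet been restarted.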
