Re: [opnfv-tech-discuss] Interpretation of yardstick test results
Hi,

Thanks all for the great questions and suggestions, and sorry for the delay in responding; I was on National Day vacation.

The yardstick.out file stores the raw data of each test case you ran. The raw data consists of the test details and the raw test output; all result data are stored under the "data" section. The results in yardstick.out are not easy to read directly, but it is much easier to visualize them in Grafana. All CI job data can be found on Yardstick's community dashboard. If you want to analyze your local test data, you will need to set up a local InfluxDB and Grafana: https://wiki.opnfv.org/display/yardstick/How+to+deploy+InfluxDB+and+Grafana+locally In the Danube release, we plan to support automatic local deployment of InfluxDB and Grafana.

For details of the metrics measured and the relevant specifications, we currently only have the test case descriptions in the user guide. In the C (Colorado) release we did some test result analysis for the passed scenarios, which can be found at http://artifacts.opnfv.org/yardstick/colorado/docs/results/index.html

For test result criteria, it is probably complex to define thresholds, but the good news is that Yardstick's Grafana does support thresholds; once a threshold is defined, you can use it to evaluate the test results. For example, here we set two threshold levels for the ping test:

[inline image: Grafana threshold settings]

If the round-trip time is smaller than 1.5 ms, it may be considered "good"; if the RTT is between 1.5 ms and 3 ms, it may be considered "good enough". One thing to mention: although Yardstick's Grafana can set a threshold for each test case, it is up to the standardization side and the whole test community to define these thresholds.

[inline image: ping test graph with thresholds]

For comparative tests, so far you can compare different scenarios on the same pod; we have a dedicated graph for each pod.
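The two-level RTT threshold described above can be sketched as a tiny script. This is a minimal illustration only: the 1.5 ms and 3 ms levels come from the Grafana example in this mail, while the sample values and the "not good" label for anything above the upper level are hypothetical.

```python
def classify_rtt(rtt_ms: float) -> str:
    """Map a ping round-trip time (ms) to the informal levels used above."""
    if rtt_ms < 1.5:
        return "good"          # below the lower threshold
    if rtt_ms <= 3.0:
        return "good enough"   # between the two thresholds
    return "not good"          # above the upper threshold (assumed label)

# Hypothetical RTT samples in milliseconds
samples = [0.8, 1.2, 2.4, 3.7]
for s in samples:
    print(f"{s} ms -> {classify_rtt(s)}")
```

As the mail notes, the actual threshold values are for the standardization side and the test community to define; this sketch only shows how a defined threshold turns raw numbers into an evaluation.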
[inline image: per-pod comparison graph]

To compare test results of the same scenario on different pods, you will need to choose the scenario and pod from the pull-down menu, like below:

[inline image: scenario/pod pull-down menu]

We are planning to improve this in Danube; one basic idea is to support comparing different pods per scenario.

Regards,
Kubi

From: MORTON, ALFRED C (AL) [mailto:acmor...@att.com]
Sent: Saturday, October 08, 2016 1:03
To: Cooper, Trevor; morgan.richo...@orange.com; Frank Brockners (fbrockne); Juraj Linkes -X (jlinkes - PANTHEON TECHNOLOGIES at Cisco); Gaoliang (kubi); limingjiang
Cc: Andrej Vanko -X (avanko - PANTHEON TECHNOLOGIES at Cisco); opnfv-tech-discuss@lists.opnfv.org
Subject: RE: [opnfv-tech-discuss] Interpretation of yardstick test results
Re: [opnfv-tech-discuss] Interpretation of yardstick test results
Hi Morgan,

you wrote:
> We got an update on Grafana last week but it was more on the capabilities of the tools than on the interpretation of the results. I think we should clearly have a discussion on this topic. It is probably complex to define thresholds = f(pod, hardware, network config, ...) but it would be helpful. Is there any activity on the standardization side in this area?

I think there is agreement (in some standards bodies) that the benchmarking results we collect for NFVI and VNFs should support operator engineering and capacity planning better than we have done in the past (for physical NFs). In other words, truly fundamental metrics should lead to additive system models, and VNF workloads expressed in the same units could be matched with system capabilities. I think Trevor's list below is kind of a prerequisite for this... It's different from (more useful than?) setting thresholds for specific tests that can be mapped to different platforms, and a related process in my mind. If you can perform comparative tests (A vs B), then the relative test results should be useful without thresholds.

One side topic that has come up recently: I don't know if there are "standard" definitions for processor utilization and interface metrics (packet and byte counts) that can be expressed at the various levels of physical and virtual infrastructure, but it would certainly help to have these (standard) metrics available to support operations (and avoid calculation differences per system).
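The A-vs-B point above can be illustrated with a short sketch: when two configurations are measured under the same conditions, the relative difference alone says which one did better, with no absolute threshold needed. Everything here (the sample values, the median-based summary) is a hypothetical illustration, not part of Yardstick.

```python
from statistics import median

def relative_change(a_samples, b_samples):
    """Median-based relative change of B vs A.
    Negative means B's metric is lower (e.g. faster RTT)."""
    a, b = median(a_samples), median(b_samples)
    return (b - a) / a

# Hypothetical ping RTTs (ms) from two comparable runs
rtt_a = [1.1, 1.2, 1.0, 1.3]  # configuration A
rtt_b = [0.9, 1.0, 0.8, 1.1]  # configuration B
print(f"B vs A median RTT change: {relative_change(rtt_a, rtt_b):+.0%}")
```

The median is used here only as one robust choice of summary statistic; the point is that the comparison is meaningful even before anyone agrees on what an absolute "good" RTT is.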
Al

From: opnfv-tech-discuss-boun...@lists.opnfv.org [mailto:opnfv-tech-discuss-boun...@lists.opnfv.org] On Behalf Of Cooper, Trevor
Sent: Friday, October 07, 2016 10:57 AM
To: morgan.richo...@orange.com; Frank Brockners (fbrockne); Juraj Linkes -X (jlinkes - PANTHEON TECHNOLOGIES at Cisco); Gaoliang (kubi); limingjiang
Cc: Andrej Vanko -X (avanko - PANTHEON TECHNOLOGIES at Cisco); opnfv-tech-discuss@lists.opnfv.org
Subject: Re: [opnfv-tech-discuss] Interpretation of yardstick test results
Re: [opnfv-tech-discuss] Interpretation of yardstick test results
I don't think Yardstick tests are being interpreted much today, at least not as performance metrics. To reach a level of maturity that would make interpretation easy/useful and help the industry, IMO we need:

- Analysis of test coverage -> a catalog with views per metric / project / scenario ... and ultimately also per workload/VNF
- Tools / traffic generators -> features, suitability and limitations
- Test cases -> accurate description of what is being tested, details of the metrics measured, relevant specifications / references (what part of the spec is actually implemented)
- Any requirements from CVP ... we cannot have a separate set of tests/tools (who would work on that?)

Agree with Morgan that we should discuss this as a test community and have a strategy for Danube.

/Trevor

From: opnfv-tech-discuss-boun...@lists.opnfv.org [mailto:opnfv-tech-discuss-boun...@lists.opnfv.org] On Behalf Of morgan.richo...@orange.com
Sent: Thursday, October 06, 2016 9:10 AM
To: Frank Brockners (fbrockne) <fbroc...@cisco.com>; Juraj Linkes -X (jlinkes - PANTHEON TECHNOLOGIES at Cisco) <jlin...@cisco.com>; Gaoliang (kubi) <jean.gaoli...@huawei.com>; limingjiang <limingji...@huawei.com>
Cc: Andrej Vanko -X (avanko - PANTHEON TECHNOLOGIES at Cisco) <ava...@cisco.com>; opnfv-tech-discuss@lists.opnfv.org
Subject: Re: [opnfv-tech-discuss] Interpretation of yardstick test results
Re: [opnfv-tech-discuss] Interpretation of yardstick test results
Hi,

I think the question was already asked in Brahmaputra :)

We got an update on Grafana last week but it was more on the capabilities of the tools than on the interpretation of the results. I think we should clearly have a discussion on this topic. It is probably complex to define thresholds = f(pod, hardware, network config, ...) but it would be helpful. Is there any activity on the standardization side in this area?

I put several possible future discussions on the Testing community page https://wiki.opnfv.org/display/meetings/TestPerf Please note that I postponed all of today's agenda to next week as the quorum was not reached. I put up the catalog mentioned by Myriam last week, but also the question of test coverage (discussions initiated months ago, but it could be interesting to reinitiate them for Danube) and performance/stress tests. I was recently asked about the stress tests done in OPNFV, and as far as I know we do not really try to stress the system (except vsperf and storperf). We have the tools and the frameworks (Yardstick, Rally, ... and some proprietary loaders) to do it, but no real strategy on performance/stress tests. Danube is maybe a good time to try to elaborate something. I think we also need to organize a sync with the CVS group to avoid any misunderstanding.

/Morgan

Le 06/10/2016 à 09:59, Frank Brockners (fbrockne) a écrit :
> Hi folks,
>
> is there anyone around who can help with interpreting Yardstick's test results? I.e. what do all the numbers that we see created and submitted into InfluxDB mean, i.e. how do I know whether a number is "good", "good enough", "not good"? In Grafana you see some nice graphs, but how do you interpret them? I scanned the user guide but did not find any guidance, and from talking to other folks, I don't seem to be alone in struggling to understand the results.
>
> Would greatly appreciate it if someone could either explain the results (see e.g. Juraj's email below) or point us to a document that does so.
>
> Many thanks!
> Frank
>
> From: Juraj Linkes -X (jlinkes - PANTHEON TECHNOLOGIES at Cisco)
> Sent: Tuesday, 4 October 2016 16:23
> To: Gaoliang (kubi); limingjiang
> Cc: opnfv-tech-discuss@lists.opnfv.org; Frank Brockners (fbrockne); Andrej Vanko -X (avanko - PANTHEON TECHNOLOGIES at Cisco)
> Subject: Interpretation of yardstick test results
>
> Hi Kubi,
>
> Can you help us with interpreting yardstick results? I've attached data from four runs produced by yardstick, but I have no idea what they mean. How do I know what is a good result and what is not?
>
> Thanks,
> Juraj

--
Morgan Richomme
Orange/ IMT/ OLN/ CNC/ NCA/ SINA
Network architect for innovative services
Future of the Network community member
Open source Orange community manager
tel. +33 (0) 296 072 106
mob. +33 (0) 637 753 326
morgan.richo...@orange.com