Re: [opnfv-tech-discuss] Interpretation of yardstick test results

2016-10-08 Thread Gaoliang (kubi)
Hi,

Thanks all for the great questions and suggestions, and sorry for the delay in 
responding; I was on National Day vacation.

The yardstick.out file stores the raw data of each test case you ran. The raw data 
consists of test details and raw test output, and all result data are stored under 
the “data” section.
It is not easy to understand the results in yardstick.out directly, but it is 
much easier to visualize the data in Grafana to help you understand the 
results.
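
If you want a quick look at the raw data without Grafana, a small Python sketch 
like the one below can pull the records out; it assumes yardstick.out holds one 
JSON record per line with the measurements under a "data" key, and the exact 
field names vary per test case:

import json

# Minimal sketch, assuming yardstick.out contains one JSON record per line
# and that each record keeps its measurements under a "data" key.
def read_results(path="yardstick.out"):
    records = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # fall back to the whole record if there is no "data" section
            records.append(record.get("data", record))
    return records

if __name__ == "__main__":
    for data in read_results():
        print(data)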

All CI job data can be found on Yardstick’s community dashboard.
If you want to analyze your local test data, you will need to set up a local 
InfluxDB and Grafana:
https://wiki.opnfv.org/display/yardstick/How+to+deploy+InfluxDB+and+Grafana+locally
In the Danube release, we plan to support automatic local deployment of InfluxDB 
and Grafana.
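
Once the local InfluxDB is running, you can also push result points into it 
yourself and graph them from Grafana. The sketch below uses the influxdb Python 
client; the database, measurement and tag names are only examples, not the exact 
schema Yardstick writes:

from influxdb import InfluxDBClient  # pip install influxdb

# Example only: push one ping RTT sample into a local InfluxDB so that a
# Grafana dashboard pointed at the "yardstick" database can graph it.
client = InfluxDBClient(host="localhost", port=8086, database="yardstick")
client.create_database("yardstick")  # no-op if the database already exists

point = {
    "measurement": "opnfv_yardstick_tc002",  # assumed measurement name
    "tags": {"pod_name": "local-pod", "scenario": "os-nosdn-nofeature-ha"},
    "fields": {"rtt": 0.82},                 # round-trip time in milliseconds
}
client.write_points([point])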

For details of the metrics measured and the relevant specifications, currently we 
only have the test case descriptions in the user guide.
In the Colorado release, we did some test result analysis for the passed 
scenarios; this can be found at 
http://artifacts.opnfv.org/yardstick/colorado/docs/results/index.html

For test result criteria, it is probably complex to define thresholds. The good 
news is that Yardstick’s Grafana dashboards do support thresholds, and defined 
thresholds would be useful.
Once a threshold is defined, you can use it to evaluate the test results. For 
example:

Here we set two threshold levels for ping test.
[Inline image: Grafana ping test panel with two threshold levels]

If the round-trip time (RTT) is smaller than 1.5 ms, it may be considered “good”. 
If the RTT is between 1.5 ms and 3 ms, it may be considered “good enough”.
One thing to mention is that although thresholds can be set per test case in 
Yardstick’s Grafana, it is up to the standardization side and the whole test 
community to define these thresholds.
[Inline image: Grafana threshold settings]
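
If you want to apply the same judgement outside Grafana, the two example levels 
translate into a trivial check. A small sketch (the 1.5 ms / 3 ms values are only 
the illustration above, not community-agreed limits):

# Sketch of the two-level ping threshold shown above:
# RTT below 1.5 ms -> "good", 1.5 ms to 3 ms -> "good enough", else "not good".
def classify_rtt(rtt_ms):
    if rtt_ms < 1.5:
        return "good"
    if rtt_ms <= 3.0:
        return "good enough"
    return "not good"

for sample in (0.9, 2.1, 4.7):
    print("RTT %.1f ms -> %s" % (sample, classify_rtt(sample)))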


For comparative tests, so far, you can compare different scenarios on the same 
pod.
We have a dedicated graph for each pod.
[Inline image: dedicated Grafana graph for one pod]

To compare test results of the same scenario on different pods, you will need to 
choose the scenario and pod from the pull-down menu, like below:
[Inline image: Grafana pull-down menu for selecting scenario and pod]
We are planning to improve this in Danube. One basic idea is to support comparing 
different pods per scenario.
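
If you prefer to compare outside Grafana, the same data can also be queried per 
pod directly from InfluxDB. A rough sketch, again with assumed measurement and 
tag names rather than Yardstick's exact schema:

from influxdb import InfluxDBClient

# Example only: compare the same scenario on different pods by grouping an
# InfluxQL query by the pod tag.
client = InfluxDBClient(host="localhost", port=8086, database="yardstick")

query = (
    "SELECT mean(rtt) FROM opnfv_yardstick_tc002 "
    "WHERE scenario = 'os-nosdn-nofeature-ha' "
    "GROUP BY pod_name"
)
for (measurement, tags), points in client.query(query).items():
    print(tags["pod_name"], list(points))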

Regards.

Kubi

From: MORTON, ALFRED C (AL) [mailto:acmor...@att.com]
Sent: October 8, 2016 1:03
To: Cooper, Trevor; morgan.richo...@orange.com; Frank Brockners (fbrockne); 
Juraj Linkes -X (jlinkes - PANTHEON TECHNOLOGIES at Cisco); Gaoliang (kubi); 
limingjiang
Cc: Andrej Vanko -X (avanko - PANTHEON TECHNOLOGIES at Cisco); 
opnfv-tech-discuss@lists.opnfv.org
Subject: RE: [opnfv-tech-discuss] Interpretation of yardstick test results

Hi Morgan,

you wrote:
We got an update on grafana last week but it was more on the capabilities of 
the tools than on the interpretation of the results.
I think we should clearly have a discussion on this topic.
It is probably complex to define thresholds = f(pod, hardware, network 
config,..) but it would be helpful.
Is there any activity on standardization side on this area?

I think there is agreement (in some stds bodies) that the benchmarking results
we collect for NFVI and VNFs should support operator engineering and capacity 
planning
in a better way than we have done in the past (for physical NF).
In other words, truly fundamental metrics should lead to additive system models,
and VNF workload expressed in the same units could be matched with system 
capabilities.
I think Trevor’s list below is kind-of a pre-requisite for this...

It’s different from (more useful than?) setting thresholds for specific tests
that can be mapped to different platforms, and a related process in my mind.
If you can perform comparative tests (A vs B), then the relative test results
should be useful without thresholds.

One side topic that has come up recently: I don’t know if there are
“standard” definitions for processor utilization and interface metrics
(packet and byte counts) that can be expressed at various levels of physical
and virtualization, but it would certainly help to have these (std) metrics
available to support operations (and avoid calculation differences per system).

Al


From: opnfv-tech-discuss-boun...@lists.opnfv.org 
[mailto:opnfv-tech-discuss-boun...@lists.opnfv.org] On Behalf Of Cooper, Trevor
Sent: Friday, October 07, 2016 10:57 AM
To: morgan.richo...@orange.com; Frank Brockners (fbrockne); Juraj Linkes -X 
(jlinkes - PANTHEON TECHNOLOGIES at Cisco); Gaoliang (kubi); limingjiang
Cc: Andrej Vanko -X (avanko - PANTHEON TECHNOLOGIES at Cisco); 
opnfv-tech-discuss@lists.opnfv.org
Subject: Re: [opnfv-tech-discuss] Interpretation of yardstick test results

I don’t think Yardstick tests are being interpreted much today, at least not as 
performance metrics. To reach a level of maturity that would make interpretation 
easy/useful and help the industry, IMO we need ...

Re: [opnfv-tech-discuss] Interpretation of yardstick test results

2016-10-07 Thread MORTON, ALFRED C (AL)
Hi Morgan,

you wrote:
We got an update on grafana last week but it was more on the capabilities of 
the tools than on the interpretation of the results.
I think we should clearly have a discussion on this topic.
It is probably complex to define thresholds = f(pod, hardware, network 
config,..) but it would be helpful.
Is there any activity on standardization side on this area?

I think there is agreement (in some stds bodies) that the benchmarking results
we collect for NFVI and VNFs should support operator engineering and capacity 
planning
in a better way than we have done in the past (for physical NF).
In other words, truly fundamental metrics should lead to additive system models,
and VNF workload expressed in the same units could be matched with system 
capabilities.
I think Trevor's list below is kind-of a pre-requisite for this...
It's different from (more useful than?) setting thresholds for specific tests
that can be mapped to different platforms, and a related process in my mind.
If you can perform comparative tests (A vs B), then the relative test results
should be useful without thresholds.

One side topic that has come up recently: I don't know if there are
"standard" definitions for processor utilization and interface metrics
(packet and byte counts) that can be expressed at various levels of physical
and virtualization, but it would certainly help to have these (std) metrics
available to support operations (and avoid calculation differences per system).

Al


From: opnfv-tech-discuss-boun...@lists.opnfv.org 
[mailto:opnfv-tech-discuss-boun...@lists.opnfv.org] On Behalf Of Cooper, Trevor
Sent: Friday, October 07, 2016 10:57 AM
To: morgan.richo...@orange.com; Frank Brockners (fbrockne); Juraj Linkes -X 
(jlinkes - PANTHEON TECHNOLOGIES at Cisco); Gaoliang (kubi); limingjiang
Cc: Andrej Vanko -X (avanko - PANTHEON TECHNOLOGIES at Cisco); 
opnfv-tech-discuss@lists.opnfv.org
Subject: Re: [opnfv-tech-discuss] Interpretation of yardstick test results

I don't think Yardstick tests are being interpreted much today, at least not as 
performance metrics. To reach a level of maturity that would make 
interpretation easy/useful and help the industry, IMO we need

- Analysis of test coverage -> catalog with views per metric / project / 
  scenarios ... and ultimately also workload/VNF
- Tools / traffic generators -> features, suitability and limitations
- Test cases -> accurate description of what is being tested, details of metrics 
  measured, relevant specifications / references (what part of spec is actually 
  implemented)
- Any requirements from CVP ... cannot have a separate set of tests/tools (who 
  would work on that?)

Agree with Morgan we should discuss as a test community and have a strategy for 
Danube.

/Trevor


From: opnfv-tech-discuss-boun...@lists.opnfv.org 
[mailto:opnfv-tech-discuss-boun...@lists.opnfv.org] On Behalf Of 
morgan.richo...@orange.com
Sent: Thursday, October 06, 2016 9:10 AM
To: Frank Brockners (fbrockne) <fbroc...@cisco.com>; Juraj Linkes -X (jlinkes - 
PANTHEON TECHNOLOGIES at Cisco) <jlin...@cisco.com>; Gaoliang (kubi) 
<jean.gaoli...@huawei.com>; limingjiang <limingji...@huawei.com>
Cc: Andrej Vanko -X (avanko - PANTHEON TECHNOLOGIES at Cisco) 
<ava...@cisco.com>; opnfv-tech-discuss@lists.opnfv.org
Subject: Re: [opnfv-tech-discuss] Interpretation of yardstick test results

Hi,

I think the question was already asked in Brahmaputra :)

We got an update on grafana last week but it was more on the capabilities of 
the tools than on the interpretation of the results.
I think we should clearly have a discussion on this topic.
It is probably complex to define thresholds = f(pod, hardware, network 
config,..) but it would be helpful.
Is there any activity on standardization side on this area?

I put several possible future discussions on the Testing community page: 
https://wiki.opnfv.org/display/meetings/TestPerf
Please note that I postponed all of today's agenda to next week as the quorum 
was not reached.
I put the catalog mentioned by Myriam last week, but also the question of test 
coverage (discussions initiated months ago, but it could be interesting to 
reinitiate them for Danube) and performance/stress tests.
I was recently asked about the stress tests done in OPNFV, and as far as I know 
we do not really try to stress the system (except vsperf and storperf).
We have the tools and the framework (Yardstick, Rally, ... and some proprietary 
loaders) to do it, but not a real strategy on performance/stress tests.
Danube is maybe a good time to try to elaborate something.
I think we also need to organize a sync with the CVS group to avoid any 
misunderstanding.

Re: [opnfv-tech-discuss] Interpretation of yardstick test results

2016-10-07 Thread Cooper, Trevor
I don't think Yardstick tests are being interpreted much today, at least not as 
performance metrics. To reach a level of maturity that would make 
interpretation easy/useful and help the industry, IMO we need

- Analysis of test coverage -> catalog with views per metric / project / 
  scenarios ... and ultimately also workload/VNF
- Tools / traffic generators -> features, suitability and limitations
- Test cases -> accurate description of what is being tested, details of metrics 
  measured, relevant specifications / references (what part of spec is actually 
  implemented)
- Any requirements from CVP ... cannot have a separate set of tests/tools (who 
  would work on that?)

Agree with Morgan we should discuss as a test community and have a strategy for 
Danube.

/Trevor


From: opnfv-tech-discuss-boun...@lists.opnfv.org 
[mailto:opnfv-tech-discuss-boun...@lists.opnfv.org] On Behalf Of 
morgan.richo...@orange.com
Sent: Thursday, October 06, 2016 9:10 AM
To: Frank Brockners (fbrockne) <fbroc...@cisco.com>; Juraj Linkes -X (jlinkes - 
PANTHEON TECHNOLOGIES at Cisco) <jlin...@cisco.com>; Gaoliang (kubi) 
<jean.gaoli...@huawei.com>; limingjiang <limingji...@huawei.com>
Cc: Andrej Vanko -X (avanko - PANTHEON TECHNOLOGIES at Cisco) 
<ava...@cisco.com>; opnfv-tech-discuss@lists.opnfv.org
Subject: Re: [opnfv-tech-discuss] Interpretation of yardstick test results

Hi,

I think the question was already asked in Brahmaputra :)

We got an update on grafana last week but it was more on the capabilities of 
the tools than on the interpretation of the results.
I think we should clearly have a discussion on this topic.
It is probably complex to define thresholds = f(pod, hardware, network 
config,..) but it would be helpful.
Is there any activity on standardization side on this area?

I put several possible future discussions on the Testing community page: 
https://wiki.opnfv.org/display/meetings/TestPerf
Please note that I postponed all of today's agenda to next week as the quorum 
was not reached.
I put the catalog mentioned by Myriam last week, but also the question of test 
coverage (discussions initiated months ago, but it could be interesting to 
reinitiate them for Danube) and performance/stress tests.
I was recently asked about the stress tests done in OPNFV, and as far as I know 
we do not really try to stress the system (except vsperf and storperf).
We have the tools and the framework (Yardstick, Rally, ... and some proprietary 
loaders) to do it, but not a real strategy on performance/stress tests.
Danube is maybe a good time to try to elaborate something.
I think we also need to organize a sync with the CVS group to avoid any 
misunderstanding.

/Morgan



On 06/10/2016 09:59, Frank Brockners (fbrockne) wrote:
Hi folks,

is there anyone around who can help with interpreting Yardstick's test results? 
I.e. what do all the numbers that we see created and submitted into the 
InfluxDB mean - i.e. how do I know whether a number is "good", "good enough", 
"not good"? In Grafana you see some nice graphs - but how do you interpret 
them? I scanned the user-guide but did not find any guidance - and from talking 
to other folks, I don't seem to be alone in struggling to understand the 
results.

Would greatly appreciate if someone could either explain the results (see e.g. 
Juraj's email below) or point us to a document that does so.

Many thanks!

Frank

From: Juraj Linkes -X (jlinkes - PANTHEON TECHNOLOGIES at Cisco)
Sent: Tuesday, 4 October 2016 16:23
To: Gaoliang (kubi) <jean.gaoli...@huawei.com>; limingjiang 
<limingji...@huawei.com>
Cc: opnfv-tech-discuss@lists.opnfv.org; Frank Brockners (fbrockne) 
<fbroc...@cisco.com>; Andrej Vanko -X (avanko - PANTHEON TECHNOLOGIES at Cisco) 
<ava...@cisco.com>
Subject: Interpretation of yardstick test results

Hi Kubi,

Can you help us with interpreting yardstick results? I've attached data from 
four runs produced by yardstick, but I have no idea what they mean - how do I 
know what is a good result and what is not?

Thanks,
Juraj




--
Morgan Richomme
Orange/ IMT/ OLN/ CNC/ NCA/ SINA

Network architect for innovative services
Future of the Network community member
Open source Orange community manager

tel. +33 (0) 296 072 106
mob. +33 (0) 637 753 326
morgan.richo...@orange.com


Re: [opnfv-tech-discuss] Interpretation of yardstick test results

2016-10-06 Thread morgan.richomme
Hi,

I think the question was already asked in Brahmaputra :)

We got an update on grafana last week but it was more on the
capabilities of the tools than on the interpretation of the results.
I think we should clearly have a discussion on this topic.
It is probably complex to define thresholds = f(pod, hardware, network
config,..) but it would be helpful.
Is there any activity on standardization side on this area?

I put several possible future discussions on the Testing community page:
https://wiki.opnfv.org/display/meetings/TestPerf
Please note that I postponed all of today's agenda to next week as the
quorum was not reached.
I put the catalog mentioned by Myriam last week, but also the question
of test coverage (discussions initiated months ago, but it could be
interesting to reinitiate them for Danube) and performance/stress tests.
I was recently asked about the stress tests done in OPNFV, and as far as
I know we do not really try to stress the system (except vsperf and
storperf).
We have the tools and the framework (Yardstick, Rally, ... and some
proprietary loaders) to do it, but not a real strategy on
performance/stress tests.
Danube is maybe a good time to try to elaborate something.
I think we also need to organize a sync with the CVS group to avoid any
misunderstanding.

/Morgan
 


On 06/10/2016 09:59, Frank Brockners (fbrockne) wrote:
> Hi folks,
>
> is there anyone around who can help with interpreting Yardstick’s test
> results? I.e. what do all the numbers that we see created and
> submitted into the InfluxDB mean – i.e. how do I know whether a number
> is “good”, “good enough”, “not good”? In Grafana you see some nice
> graphs – but how do you interpret them? I scanned the user-guide but
> did not find any guidance – and from talking to other folks, I don’t
> seem to be alone in struggling to understand the results.
>
> Would greatly appreciate if someone could either explain the results
> (see e.g. Juraj’s email below) or point us to a document that does so.
>
> Many thanks!
>
> Frank
>
> *From:* Juraj Linkes -X (jlinkes - PANTHEON TECHNOLOGIES at Cisco)
> *Sent:* Tuesday, 4 October 2016 16:23
> *To:* Gaoliang (kubi); limingjiang
> *Cc:* opnfv-tech-discuss@lists.opnfv.org; Frank Brockners (fbrockne);
> Andrej Vanko -X (avanko - PANTHEON TECHNOLOGIES at Cisco)
> *Subject:* Interpretation of yardstick test results
>
> Hi Kubi,
>
> Can you help us with interpreting yardstick results? I've attached
> data from four runs produced by yardstick, but I have no idea what
> they mean – how do I know what is a good result and what is not?
>
> Thanks,
> Juraj


-- 
Morgan Richomme
Orange/ IMT/ OLN/ CNC/ NCA/ SINA 

Network architect for innovative services
Future of the Network community member
Open source Orange community manager


tel. +33 (0) 296 072 106
mob. +33 (0) 637 753 326
morgan.richo...@orange.com



___
opnfv-tech-discuss mailing list
opnfv-tech-discuss@lists.opnfv.org
https://lists.opnfv.org/mailman/listinfo/opnfv-tech-discuss