Re: [opnfv-tech-discuss] [test-wg] Monitoring dashboard for long duration test

2017-11-28 Thread Rutuja Surve
Hi Julien,
Thanks a lot for your review.
Yes, I'll make sure that there is POD-level support for metrics. The data
duration is configurable, so we can set it to show daily data instead of
instantaneous traffic. I'll use a Jinja2 template for the dashboard file;
thanks for pointing it out.
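
To illustrate why a daily window is less sensitive than an instantaneous one, here is a minimal Python sketch. The sample values and the average_rate helper are hypothetical and not part of the dashboard; unlike the Prometheus rate() function, the sketch does not handle counter resets.

```python
from typing import List, Tuple

# (unix timestamp in seconds, cumulative byte counter); values are made up.
Sample = Tuple[float, float]


def average_rate(samples: List[Sample], window_s: float) -> float:
    """Average bytes/second over the last `window_s` seconds of samples."""
    end_t, end_v = samples[-1]
    # Keep only the samples inside the window that ends at the newest sample.
    in_window = [(t, v) for t, v in samples if t >= end_t - window_s]
    if len(in_window) < 2:
        raise ValueError("need at least two samples inside the window")
    start_t, start_v = in_window[0]
    return (end_v - start_v) / (end_t - start_t)


# Steady 1000 B/s for most of a day, then a 60-second burst at 100000 B/s.
samples = [(0.0, 0.0), (86_340.0, 86_340_000.0), (86_400.0, 92_340_000.0)]

short_window = average_rate(samples, 60.0)      # 100000.0, dominated by the burst
daily_window = average_rate(samples, 86_400.0)  # 1068.75, burst barely visible
```

The same counter gives very different pictures depending on the window, which is the effect of choosing a longer data duration.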

Thanks,
Rutuja

___
opnfv-tech-discuss mailing list
opnfv-tech-discuss@lists.opnfv.org
https://lists.opnfv.org/mailman/listinfo/opnfv-tech-discuss


Re: [opnfv-tech-discuss] [test-wg] Monitoring dashboard for long duration test

2017-11-27 Thread Julien
It's really cool.

As Kubi mentioned, it is useful to support POD-level metrics, including
containers, nodes, and disk/memory usage.
For network traffic, the instantaneous rate is too sensitive; can we provide
daily data instead?
The config file "prototype_prometheus_dashboard" in review is 2000 lines
long. I suggest using a simple Jinja2 template to produce this file. It
will be easier to use and maintain.
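
A rough sketch of the idea: generate the repetitive panel definitions from a short list instead of maintaining them by hand. Jinja2 is a third-party package, so this sketch swaps in plain Python dicts and the standard json module. The panel fields, titles and the build_dashboard helper are simplified assumptions, not the real Grafana schema or the contents of the file in review.

```python
import json

# Hypothetical panel list: titles and expressions are illustrative only.
PANELS = [
    ("Load (1m)", 'node_load1{instance=~"$server:.*"}'),
    (
        "Container CPU %",
        'sum(rate(container_cpu_usage_seconds_total{name=~".+"}[$interval]))'
        " by (name) * 100",
    ),
]


def build_dashboard(title, panels):
    """Return a dashboard as a dict, one graph panel per (title, expr) pair."""
    return {
        "title": title,
        "panels": [
            {"id": i, "type": "graph", "title": name, "targets": [{"expr": expr}]}
            for i, (name, expr) in enumerate(panels, start=1)
        ],
    }


dashboard = build_dashboard("Long duration test", PANELS)
# json.dumps escapes the inner double quotes of each query as \" automatically.
dashboard_json = json.dumps(dashboard, indent=2)
```

Adding a panel then means adding one entry to the list; a Jinja2 template with a loop over the same list would give the same benefit.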

BR/Julien


Re: [opnfv-tech-discuss] [test-wg] Monitoring dashboard for long duration test

2017-11-26 Thread Rutuja Surve
Hi Kubi,
Thanks for reviewing the dashboard.
It is possible to monitor multiple hosts (the jump server and its
corresponding compute and controller nodes) with this dashboard. The
'instance' label of every metric carries the IP address of the node,
hence it's possible to filter by node IP. The whole physical
deployment is configured in the pod.yaml file, where we can see information
regarding the compute and controller nodes.
We have scripts for installing the statistics-collecting daemons (cAdvisor
and collectd) on the jump server and on the client nodes (compute and
controller), which send their metrics to the jump server.
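
As an illustration of the filtering by node IP described above, this Python sketch mimics how a matcher of the form instance=~"$server:.*" selects series. The addresses, ports and the select_instances helper are hypothetical.

```python
import re


def select_instances(instances, server):
    """Return the instance labels matched by the filter `<server>:.*`."""
    # PromQL regex matchers are fully anchored, so fullmatch mirrors `=~`.
    # The server value is inserted unescaped, so each "." in the IP acts as
    # a regex wildcard here.
    pattern = re.compile(f"{server}:.*")
    return [inst for inst in instances if pattern.fullmatch(inst)]


# Hypothetical instance labels in IP:port form.
instances = [
    "10.0.0.1:9100",
    "10.0.0.1:8080",
    "10.0.0.10:9100",
    "10.0.0.2:9100",
]

selected = select_instances(instances, "10.0.0.1")
# selected == ["10.0.0.1:9100", "10.0.0.1:8080"]
```

The ":" after the server value is what keeps 10.0.0.10 from being picked up when 10.0.0.1 is selected.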
The 'Load' panel shows the CPU load: the 1-minute load average divided by
the number of CPUs, as the Prometheus query used for collecting it shows:

node_load1{instance=~"$server:.*"} / count by(job, instance)(count by(job, instance, cpu)(node_cpu{instance=~"$server:.*"}))
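
The arithmetic of that query can be sketched in Python as follows. The series data and both helper names are hypothetical; the sketch only assumes that node_cpu exposes one series per CPU and mode, which the nested count by() collapses into a CPU count.

```python
from collections import defaultdict


def cores_per_instance(node_cpu_series):
    """Count distinct `cpu` labels per (job, instance), like the nested count by()."""
    cpus = defaultdict(set)
    for labels in node_cpu_series:
        cpus[(labels["job"], labels["instance"])].add(labels["cpu"])
    return {key: len(names) for key, names in cpus.items()}


def load_per_core(node_load1, node_cpu_series):
    """Divide each 1-minute load average by the CPU count of the same instance."""
    cores = cores_per_instance(node_cpu_series)
    return {key: load / cores[key] for key, load in node_load1.items() if key in cores}


# Hypothetical node: 4 CPUs, two modes each, so 8 node_cpu series.
node_cpu_series = [
    {"job": "node", "instance": "10.0.0.1:9100", "cpu": f"cpu{n}", "mode": mode}
    for n in range(4)
    for mode in ("idle", "user")
]
node_load1 = {("node", "10.0.0.1:9100"): 6.0}

result = load_per_core(node_load1, node_cpu_series)
# result == {("node", "10.0.0.1:9100"): 1.5}
```

A value above 1 means more runnable work than CPUs on that node.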

CPU usage per container corresponds to:

sum(rate(container_cpu_usage_seconds_total{name=~".+"}[$interval])) by (name) * 100

rate() yields the CPU-seconds consumed per second, so for a container that
keeps more than one core busy the sum exceeds 1 and the panel crosses 100%;
four fully busy cores, for example, show as 400%. It's more about how the
query is framed.
The network traffic panel apparently monitors the HTTP port of the host.
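
A small worked example of how the CPU usage figure can exceed 100%, written as a Python sketch. The counter values and the container_cpu_percent helper are hypothetical, and the sketch assumes one cumulative CPU-seconds counter per core.

```python
def container_cpu_percent(per_core_start, per_core_end, interval_s):
    """Sum the per-core rates (CPU-seconds per second) and scale to percent."""
    rates = [
        (end - start) / interval_s
        for start, end in zip(per_core_start, per_core_end)
    ]
    return sum(rates) * 100


# Four cores, each fully busy for the whole 60-second interval.
four_busy = container_cpu_percent([0.0] * 4, [60.0] * 4, 60.0)  # 400.0

# One core, busy for half of the interval.
half_busy = container_cpu_percent([0.0], [30.0], 60.0)  # 50.0
```

Dividing the sum by the number of cores would bring the scale back to 0-100% if that reads better on the dashboard.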

Do let me know if you have more questions.

Thanks,

Rutuja


Re: [opnfv-tech-discuss] [test-wg] Monitoring dashboard for long duration test

2017-11-21 Thread Gaoliang (kubi)
Hi Rutuja,

The dashboard looks pretty good ☺

Just a few questions about the dashboard.

What kind of SUT can you monitor? A single host or 5 hosts (OPNFV physical HA 
deployment)? It seems that it can be filtered by node IP. Do we have a whole 
view for a physical deployment POD?

What does the “Load” mean? CPU load? Why can “CPU usage” go to 400%? Does the 
“Network Traffic” monitor one port or all ports of the host?

Regards,

Kubi




From: test-wg-boun...@lists.opnfv.org [mailto:test-wg-boun...@lists.opnfv.org] 
On Behalf Of Rutuja Surve
Sent: Tuesday, November 21, 2017 4:42 PM
To: opnfv-tech-discuss@lists.opnfv.org; test...@lists.opnfv.org
Subject: [test-wg] Monitoring dashboard for long duration test

Hi,

I am currently working on the Bottlenecks intern project, focusing on 
monitoring/dashboarding for long duration tests.
We are close to a prototype. We need your opinions/comments on how to 
organize the dashboard, what metrics/plugins should be included, etc.
The screenshots/details for the pre-prototype dashboard are provided below. 
Please comment on them.
Currently, we do not have public access to the dashboard. If you'd like to 
know more details/operations, please refer to the gerrit patch:
https://gerrit.opnfv.org/gerrit/#/c/47567/
and give your review there, or attend the Bottlenecks meeting tomorrow 
(Wednesday) at 8:30 am IST, where I will provide regular reports on the 
progress and show customization of the dashboard.

Please also find the screenshot PDF of the dashboard attached to this e-mail. 
We are using Prometheus for querying and as the data source, the cAdvisor and 
collectd plugins for collecting system metrics, and Grafana for displaying 
the dashboard.

Thanks,
Rutuja
