Re: [3.x]: openshift router and its own metrics

2019-08-16 Thread Clayton Coleman
On Aug 16, 2019, at 4:55 AM, Daniel Comnea  wrote:



On Thu, Aug 15, 2019 at 7:46 PM Clayton Coleman  wrote:

>
>
> On Aug 15, 2019, at 12:25 PM, Daniel Comnea  wrote:
>
> Hi Clayton,
>
> Certainly some of the metrics should be preserved across reloads, e.g.
> metrics like *haproxy_server_http_responses_total* should be preserved
> across reload (though to an extent, Prometheus can handle resets correctly
> with its native support).
>
> However, the metric
> *haproxy_server_http_average_response_latency_milliseconds* appears also
> to be accumulating when we wouldn't expect it to. (According to the
> haproxy stats, I think that's a rolling average over the last 1024 calls --
> so it goes up and down, or should.)
>
>
> File a bug with more details, can’t say off the top of my head
> [DC]: thank you, do you have a preference/ suggestion where i should open
> it for OKD ? i guess BZ is not suitable for OKD, or am i wrong ?
>

There should be BZ components for origin


> Thoughts?
>
>
> Cheers,
> Dani
>
>
> On Thu, Aug 15, 2019 at 3:59 PM Clayton Coleman 
> wrote:
>
>> Metrics memory use in the router should be proportional to number of
>> services, endpoints, and routes.  I doubt it's leaking there and if it were
>> it'd be really slow since we don't restart the router monitor process
>> ever.  Stats should definitely be preserved across reloads, but will not be
>> preserved across the pod being restarted.
>>
>> On Thu, Aug 15, 2019 at 10:30 AM Dan Mace  wrote:
>>
>>>
>>>
>>> On Thu, Aug 15, 2019 at 10:03 AM Daniel Comnea 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Would appreciate if anyone can please confirm that my understanding is
>>>> correct w.r.t the way the router haproxy image [1] is built.
>>>> Am i right to assume that the image [1] is built as it's seen
>>>> without any other layer being added to include [2] ?
>>>> Also am i right to say the haproxy metrics [2] is part of the origin
>>>> package ?
>>>>
>>>>
>>>> A bit of background/ context:
>>>>
>>>> a while back on OKD 3.7 we had to swap the openshift 3.7.2 router image
>>>> with 3.10 because we were seeing some problems with the reload and so we
>>>> wanted to take the benefit of the native haproxy 1.8 reload feature to stop
>>>> affecting the traffic.
>>>>
>>>> While everything was nice and working okay we've noticed recently that
>>>> the haproxy stats do slowly increase and we do wonder if this is an
>>>> accumulation or not caused (maybe?) by the reloads. Now i'm aware of a
>>>> change made [3] however i suspect that is not part of the 3.10 image hence
>>>> my question to double check if my understanding is wrong or not.
>>>>
>>>>
>>>> Cheers,
>>>> Dani
>>>>
>>>> [1]
>>>> https://github.com/openshift/origin/tree/release-3.10/images/router/haproxy
>>>> [2]
>>>> https://github.com/openshift/origin/tree/release-3.10/pkg/router/metrics
>>>> [3]
>>>> https://github.com/openshift/origin/commit/8f0119bdd9c3b679cdfdf2962143435a95e08eae#diff-58216897083787e1c87c90955aabceff
>>>> ___
>>>> dev mailing list
>>>> dev@lists.openshift.redhat.com
>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>>>>
>>>
>>> I think Clayton (copied) has the history here, but the nature of the
>>> metrics commit you referenced is that many of the exposed metrics points
>>> are counters which were being reset across reloads. The patch was (I think)
>>> to enable counter metrics to correctly accumulate across reloads.
>>>
>>> As to how the image itself is built, the pkg directory is part of the
>>> router controller code included with the image. Not sure if that answers
>>> your question.
>>>
>>> --
>>>
>>> Dan Mace
>>>
>>> Principal Software Engineer, OpenShift
>>>
>>> Red Hat
>>>
>>> dm...@redhat.com
>>>
>>>
>>>


Re: [3.x]: openshift router and its own metrics

2019-08-16 Thread Daniel Comnea
On Thu, Aug 15, 2019 at 7:46 PM Clayton Coleman  wrote:

>
>
> On Aug 15, 2019, at 12:25 PM, Daniel Comnea  wrote:
>
> Hi Clayton,
>
> Certainly some of the metrics should be preserved across reloads, e.g.
> metrics like *haproxy_server_http_responses_total* should be preserved
> across reload (though to an extent, Prometheus can handle resets correctly
> with its native support).
>
> However, the metric
> *haproxy_server_http_average_response_latency_milliseconds* appears also
> to be accumulating when we wouldn't expect it to. (According to the
> haproxy stats, I think that's a rolling average over the last 1024 calls --
> so it goes up and down, or should.)
>
>
> File a bug with more details, can’t say off the top of my head
> [DC]: thank you, do you have a preference/ suggestion where i should open
> it for OKD ? i guess BZ is not suitable for OKD, or am i wrong ?
>
>
> Thoughts?
>
>
> Cheers,
> Dani
>
>
> On Thu, Aug 15, 2019 at 3:59 PM Clayton Coleman 
> wrote:
>
>> Metrics memory use in the router should be proportional to number of
>> services, endpoints, and routes.  I doubt it's leaking there and if it were
>> it'd be really slow since we don't restart the router monitor process
>> ever.  Stats should definitely be preserved across reloads, but will not be
>> preserved across the pod being restarted.
>>
>> On Thu, Aug 15, 2019 at 10:30 AM Dan Mace  wrote:
>>
>>>
>>>
>>> On Thu, Aug 15, 2019 at 10:03 AM Daniel Comnea 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Would appreciate if anyone can please confirm that my understanding is
>>>> correct w.r.t the way the router haproxy image [1] is built.
>>>> Am i right to assume that the image [1] is built as it's seen
>>>> without any other layer being added to include [2] ?
>>>> Also am i right to say the haproxy metrics [2] is part of the origin
>>>> package ?
>>>>
>>>>
>>>> A bit of background/ context:
>>>>
>>>> a while back on OKD 3.7 we had to swap the openshift 3.7.2 router image
>>>> with 3.10 because we were seeing some problems with the reload and so we
>>>> wanted to take the benefit of the native haproxy 1.8 reload feature to stop
>>>> affecting the traffic.
>>>>
>>>> While everything was nice and working okay we've noticed recently that
>>>> the haproxy stats do slowly increase and we do wonder if this is an
>>>> accumulation or not caused (maybe?) by the reloads. Now i'm aware of a
>>>> change made [3] however i suspect that is not part of the 3.10 image hence
>>>> my question to double check if my understanding is wrong or not.
>>>>
>>>>
>>>> Cheers,
>>>> Dani
>>>>
>>>> [1]
>>>> https://github.com/openshift/origin/tree/release-3.10/images/router/haproxy
>>>> [2]
>>>> https://github.com/openshift/origin/tree/release-3.10/pkg/router/metrics
>>>> [3]
>>>> https://github.com/openshift/origin/commit/8f0119bdd9c3b679cdfdf2962143435a95e08eae#diff-58216897083787e1c87c90955aabceff

>>>
>>> I think Clayton (copied) has the history here, but the nature of the
>>> metrics commit you referenced is that many of the exposed metrics points
>>> are counters which were being reset across reloads. The patch was (I think)
>>> to enable counter metrics to correctly accumulate across reloads.
>>>
>>> As to how the image itself is built, the pkg directory is part of the
>>> router controller code included with the image. Not sure if that answers
>>> your question.
>>>
>>> --
>>>
>>> Dan Mace
>>>
>>> Principal Software Engineer, OpenShift
>>>
>>> Red Hat
>>>
>>> dm...@redhat.com
>>>
>>>
>>>


Re: [3.x]: openshift router and its own metrics

2019-08-15 Thread Clayton Coleman
On Aug 15, 2019, at 12:25 PM, Daniel Comnea  wrote:

Hi Clayton,

Certainly some of the metrics should be preserved across reloads, e.g.
metrics like *haproxy_server_http_responses_total* should be preserved
across reload (though to an extent, Prometheus can handle resets correctly
with its native support).

However, the metric
*haproxy_server_http_average_response_latency_milliseconds* appears also to
be accumulating when we wouldn't expect it to. (According to the haproxy
stats, I think that's a rolling average over the last 1024 calls -- so it
goes up and down, or should.)


File a bug with more details, can’t say off the top of my head


Thoughts?


Cheers,
Dani


On Thu, Aug 15, 2019 at 3:59 PM Clayton Coleman  wrote:

> Metrics memory use in the router should be proportional to number of
> services, endpoints, and routes.  I doubt it's leaking there and if it were
> it'd be really slow since we don't restart the router monitor process
> ever.  Stats should definitely be preserved across reloads, but will not be
> preserved across the pod being restarted.
>
> On Thu, Aug 15, 2019 at 10:30 AM Dan Mace  wrote:
>
>>
>>
>> On Thu, Aug 15, 2019 at 10:03 AM Daniel Comnea 
>> wrote:
>>
>>> Hi,
>>>
>>> Would appreciate if anyone can please confirm that my understanding is
>>> correct w.r.t the way the router haproxy image [1] is built.
>>> Am i right to assume that the image [1] is built as it's seen without
>>> any other layer being added to include [2] ?
>>> Also am i right to say the haproxy metrics [2] is part of the origin
>>> package ?
>>>
>>>
>>> A bit of background/ context:
>>>
>>> a while back on OKD 3.7 we had to swap the openshift 3.7.2 router image
>>> with 3.10 because we were seeing some problems with the reload and so we
>>> wanted to take the benefit of the native haproxy 1.8 reload feature to stop
>>> affecting the traffic.
>>>
>>> While everything was nice and working okay we've noticed recently that
>>> the haproxy stats do slowly increase and we do wonder if this is an
>>> accumulation or not caused (maybe?) by the reloads. Now i'm aware of a
>>> change made [3] however i suspect that is not part of the 3.10 image hence
>>> my question to double check if my understanding is wrong or not.
>>>
>>>
>>> Cheers,
>>> Dani
>>>
>>> [1]
>>> https://github.com/openshift/origin/tree/release-3.10/images/router/haproxy
>>> [2]
>>> https://github.com/openshift/origin/tree/release-3.10/pkg/router/metrics
>>> [3]
>>> https://github.com/openshift/origin/commit/8f0119bdd9c3b679cdfdf2962143435a95e08eae#diff-58216897083787e1c87c90955aabceff
>>>
>>
>> I think Clayton (copied) has the history here, but the nature of the
>> metrics commit you referenced is that many of the exposed metrics points
>> are counters which were being reset across reloads. The patch was (I think)
>> to enable counter metrics to correctly accumulate across reloads.
>>
>> As to how the image itself is built, the pkg directory is part of the
>> router controller code included with the image. Not sure if that answers
>> your question.
>>
>> --
>>
>> Dan Mace
>>
>> Principal Software Engineer, OpenShift
>>
>> Red Hat
>>
>> dm...@redhat.com
>>
>>
>>


Re: [3.x]: openshift router and its own metrics

2019-08-15 Thread Daniel Comnea
Hi Clayton,

Certainly some of the metrics should be preserved across reloads, e.g.
metrics like *haproxy_server_http_responses_total* should be preserved
across reload (though to an extent, Prometheus can handle resets correctly
with its native support).
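
As a rough illustration of that reset handling, here is a minimal Python sketch of the idea behind Prometheus functions such as rate() and increase(): any drop in a counter's value between two samples is treated as a reset, and the post-reset value is counted as new increase. The function name and the sample values are made up for this example, and the real functions also extrapolate to the window boundaries, which is left out here.

```python
def increase_with_resets(samples):
    """Total increase of a counter over a series of samples.

    Any decrease between consecutive samples is assumed to be a counter
    reset (e.g. a process restart), so the new value is counted in full.
    """
    total = 0.0
    prev = None
    for value in samples:
        if prev is not None:
            if value >= prev:
                total += value - prev   # normal monotonic growth
            else:
                total += value          # reset: counter restarted from zero
        prev = value
    return total


# Hypothetical scrape values; the drop from 200 to 10 is a reset.
print(increase_with_resets([100, 150, 200, 10, 40]))  # 140.0
```

So even if the exported counter drops back to zero, a query based on rate() or increase() still sees the correct growth, as long as the reset happens between scrapes and is not masked by the counter overtaking its old value.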

However, the metric
*haproxy_server_http_average_response_latency_milliseconds* appears also to
be accumulating when we wouldn't expect it to. (According to the haproxy
stats, I think that's a rolling average over the last 1024 calls -- so it
goes up and down, or should.)
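
To make the expectation concrete, here is a minimal Python sketch of a gauge that averages only the most recent samples, so it can fall as well as rise. It assumes a plain sliding window; HAProxy's internal computation may well differ, and the class name and the small window used in the demo are hypothetical.

```python
from collections import deque


class RollingAverage:
    """Average over the most recent `window` samples (1024 by default)."""

    def __init__(self, window=1024):
        self.samples = deque(maxlen=window)
        self.total = 0.0

    def add(self, value):
        # When the window is full, the oldest sample is about to be evicted
        # by append(), so remove it from the running total first.
        if len(self.samples) == self.samples.maxlen:
            self.total -= self.samples[0]
        self.samples.append(value)
        self.total += value

    def value(self):
        return self.total / len(self.samples) if self.samples else 0.0


# Small window for demonstration: three slow responses, then a fast one.
avg = RollingAverage(window=3)
for latency_ms in (100, 100, 100):
    avg.add(latency_ms)
print(avg.value())  # 100.0
avg.add(10)
print(avg.value())  # 70.0
```

A value computed this way should never grow without bound; if the exported metric only ever increases, it is probably being summed somewhere rather than reported as a gauge.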

Thoughts?


Cheers,
Dani


On Thu, Aug 15, 2019 at 3:59 PM Clayton Coleman  wrote:

> Metrics memory use in the router should be proportional to number of
> services, endpoints, and routes.  I doubt it's leaking there and if it were
> it'd be really slow since we don't restart the router monitor process
> ever.  Stats should definitely be preserved across reloads, but will not be
> preserved across the pod being restarted.
>
> On Thu, Aug 15, 2019 at 10:30 AM Dan Mace  wrote:
>
>>
>>
>> On Thu, Aug 15, 2019 at 10:03 AM Daniel Comnea 
>> wrote:
>>
>>> Hi,
>>>
>>> Would appreciate if anyone can please confirm that my understanding is
>>> correct w.r.t the way the router haproxy image [1] is built.
>>> Am i right to assume that the image [1] is built as it's seen without
>>> any other layer being added to include [2] ?
>>> Also am i right to say the haproxy metrics [2] is part of the origin
>>> package ?
>>>
>>>
>>> A bit of background/ context:
>>>
>>> a while back on OKD 3.7 we had to swap the openshift 3.7.2 router image
>>> with 3.10 because we were seeing some problems with the reload and so we
>>> wanted to take the benefit of the native haproxy 1.8 reload feature to stop
>>> affecting the traffic.
>>>
>>> While everything was nice and working okay we've noticed recently that
>>> the haproxy stats do slowly increase and we do wonder if this is an
>>> accumulation or not caused (maybe?) by the reloads. Now i'm aware of a
>>> change made [3] however i suspect that is not part of the 3.10 image hence
>>> my question to double check if my understanding is wrong or not.
>>>
>>>
>>> Cheers,
>>> Dani
>>>
>>> [1]
>>> https://github.com/openshift/origin/tree/release-3.10/images/router/haproxy
>>> [2]
>>> https://github.com/openshift/origin/tree/release-3.10/pkg/router/metrics
>>> [3]
>>> https://github.com/openshift/origin/commit/8f0119bdd9c3b679cdfdf2962143435a95e08eae#diff-58216897083787e1c87c90955aabceff
>>>
>>
>> I think Clayton (copied) has the history here, but the nature of the
>> metrics commit you referenced is that many of the exposed metrics points
>> are counters which were being reset across reloads. The patch was (I think)
>> to enable counter metrics to correctly accumulate across reloads.
>>
>> As to how the image itself is built, the pkg directory is part of the
>> router controller code included with the image. Not sure if that answers
>> your question.
>>
>> --
>>
>> Dan Mace
>>
>> Principal Software Engineer, OpenShift
>>
>> Red Hat
>>
>> dm...@redhat.com
>>
>>
>>


Re: [3.x]: openshift router and its own metrics

2019-08-15 Thread Clayton Coleman
Metrics memory use in the router should be proportional to number of
services, endpoints, and routes.  I doubt it's leaking there and if it were
it'd be really slow since we don't restart the router monitor process
ever.  Stats should definitely be preserved across reloads, but will not be
preserved across the pod being restarted.

On Thu, Aug 15, 2019 at 10:30 AM Dan Mace  wrote:

>
>
> On Thu, Aug 15, 2019 at 10:03 AM Daniel Comnea 
> wrote:
>
>> Hi,
>>
>> Would appreciate if anyone can please confirm that my understanding is
>> correct w.r.t the way the router haproxy image [1] is built.
>> Am i right to assume that the image [1] is built as it's seen without
>> any other layer being added to include [2] ?
>> Also am i right to say the haproxy metrics [2] is part of the origin
>> package ?
>>
>>
>> A bit of background/ context:
>>
>> a while back on OKD 3.7 we had to swap the openshift 3.7.2 router image
>> with 3.10 because we were seeing some problems with the reload and so we
>> wanted to take the benefit of the native haproxy 1.8 reload feature to stop
>> affecting the traffic.
>>
>> While everything was nice and working okay we've noticed recently that
>> the haproxy stats do slowly increase and we do wonder if this is an
>> accumulation or not caused (maybe?) by the reloads. Now i'm aware of a
>> change made [3] however i suspect that is not part of the 3.10 image hence
>> my question to double check if my understanding is wrong or not.
>>
>>
>> Cheers,
>> Dani
>>
>> [1]
>> https://github.com/openshift/origin/tree/release-3.10/images/router/haproxy
>> [2]
>> https://github.com/openshift/origin/tree/release-3.10/pkg/router/metrics
>> [3]
>> https://github.com/openshift/origin/commit/8f0119bdd9c3b679cdfdf2962143435a95e08eae#diff-58216897083787e1c87c90955aabceff
>>
>
> I think Clayton (copied) has the history here, but the nature of the
> metrics commit you referenced is that many of the exposed metrics points
> are counters which were being reset across reloads. The patch was (I think)
> to enable counter metrics to correctly accumulate across reloads.
>
> As to how the image itself is built, the pkg directory is part of the
> router controller code included with the image. Not sure if that answers
> your question.
>
> --
>
> Dan Mace
>
> Principal Software Engineer, OpenShift
>
> Red Hat
>
> dm...@redhat.com
>
>
>


Re: [3.x]: openshift router and its own metrics

2019-08-15 Thread Dan Mace
On Thu, Aug 15, 2019 at 10:03 AM Daniel Comnea 
wrote:

> Hi,
>
> Would appreciate if anyone can please confirm that my understanding is
> correct w.r.t the way the router haproxy image [1] is built.
> Am i right to assume that the image [1] is built as it's seen without
> any other layer being added to include [2] ?
> Also am i right to say the haproxy metrics [2] is part of the origin
> package ?
>
>
> A bit of background/ context:
>
> a while back on OKD 3.7 we had to swap the openshift 3.7.2 router image
> with 3.10 because we were seeing some problems with the reload and so we
> wanted to take the benefit of the native haproxy 1.8 reload feature to stop
> affecting the traffic.
>
> While everything was nice and working okay we've noticed recently that the
> haproxy stats do slowly increase and we do wonder if this is an
> accumulation or not caused (maybe?) by the reloads. Now i'm aware of a
> change made [3] however i suspect that is not part of the 3.10 image hence
> my question to double check if my understanding is wrong or not.
>
>
> Cheers,
> Dani
>
> [1]
> https://github.com/openshift/origin/tree/release-3.10/images/router/haproxy
> [2]
> https://github.com/openshift/origin/tree/release-3.10/pkg/router/metrics
> [3]
> https://github.com/openshift/origin/commit/8f0119bdd9c3b679cdfdf2962143435a95e08eae#diff-58216897083787e1c87c90955aabceff
>

I think Clayton (copied) has the history here, but the nature of the
metrics commit you referenced is that many of the exposed metrics points
are counters which were being reset across reloads. The patch was (I think)
to enable counter metrics to correctly accumulate across reloads.
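
As a rough sketch of what accumulating across reloads can look like (this is not the code from the commit in [3]; the class and method names are hypothetical, and Python is used only for brevity): the exporter remembers the last raw value it read from haproxy and, when the raw value drops, folds the old value into an offset so the exposed counter keeps growing.

```python
class ReloadSafeCounter:
    """Expose a monotonic counter on top of a raw value that may reset."""

    def __init__(self):
        self.offset = 0     # sum of the last raw values seen before each reset
        self.last_raw = 0   # most recent raw value read from the backend

    def observe(self, raw):
        # A raw value lower than the previous one is assumed to mean the
        # underlying process was reloaded and started counting from zero.
        if raw < self.last_raw:
            self.offset += self.last_raw
        self.last_raw = raw
        return self.offset + raw


# Hypothetical raw readings; the drop from 12 to 3 stands in for a reload.
counter = ReloadSafeCounter()
print([counter.observe(raw) for raw in (5, 12, 3, 7)])  # [5, 12, 15, 19]
```

The same caveat applies as with any drop-based detection: a reload is missed if the new raw value has already passed the old one by the time it is next read.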

As to how the image itself is built, the pkg directory is part of the router
controller code included with the image. Not sure if that answers your
question.

-- 

Dan Mace

Principal Software Engineer, OpenShift

Red Hat

dm...@redhat.com