Re: [3.x]: openshift router and its own metrics
On Aug 16, 2019, at 4:55 AM, Daniel Comnea wrote:
> [DC]: Thank you. Do you have a preference or suggestion for where I should
> open it for OKD? I guess BZ is not suitable for OKD, or am I wrong?

There should be BZ components for origin.

___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
Re: [3.x]: openshift router and its own metrics
On Thu, Aug 15, 2019 at 7:46 PM Clayton Coleman wrote:
> File a bug with more details, can’t say off the top of my head

[DC]: Thank you. Do you have a preference or suggestion for where I should open it for OKD? I guess BZ is not suitable for OKD, or am I wrong?
Re: [3.x]: openshift router and its own metrics
On Aug 15, 2019, at 12:25 PM, Daniel Comnea wrote:
> However, the metric *haproxy_server_http_average_response_latency_milliseconds*
> appears also to be accumulating when we wouldn't expect it to.

File a bug with more details; I can't say off the top of my head.
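Daniel's point about the latency metric is that a rolling average behaves like a gauge: it should rise and fall with recent traffic rather than grow steadily. A minimal Python sketch, assuming an exact window over the last N response times; HAProxy may compute its own average differently internally, and the `RollingAverage` class is purely illustrative.

```python
from collections import deque


class RollingAverage:
    """Mean of the most recent `window` samples (illustrative, not HAProxy code)."""

    def __init__(self, window: int = 1024) -> None:
        # deque(maxlen=...) silently drops the oldest sample once it is full.
        self.samples = deque(maxlen=window)
        self.total = 0.0

    def add(self, latency_ms: float) -> None:
        if len(self.samples) == self.samples.maxlen:
            # The oldest sample is about to be evicted; remove it from the sum.
            self.total -= self.samples[0]
        self.samples.append(latency_ms)
        self.total += latency_ms

    def value(self) -> float:
        # Average over whatever is currently in the window (0.0 if empty).
        return self.total / len(self.samples) if self.samples else 0.0


if __name__ == "__main__":
    avg = RollingAverage(window=3)
    for ms in (10, 20, 30):
        avg.add(ms)
    print(avg.value())  # 20.0
    for ms in (1, 1, 1):
        avg.add(ms)
    print(avg.value())  # 1.0: the average dropped once the slow samples aged out
```

A value like this is expected to drop as well as rise, which is why steady growth looked suspicious.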
Re: [3.x]: openshift router and its own metrics
Hi Clayton,

Certainly some of the metrics should be preserved across reloads, e.g.
*haproxy_server_http_responses_total* (though to an extent, Prometheus can
handle resets correctly with its native support).

However, the metric *haproxy_server_http_average_response_latency_milliseconds*
also appears to be accumulating when we wouldn't expect it to. (According to
the haproxy stats, I think that's a rolling average over the last 1024 calls,
so it goes up and down, or should.)

Thoughts?

Cheers,
Dani
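Daniel's remark that Prometheus can handle resets refers to how functions such as `rate()` and `increase()` treat any decrease in a counter's value as a counter reset. A simplified Python sketch of that rule; the real functions also extrapolate to the edges of the query window, which is omitted here, and `counter_increase` is a hypothetical name.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Total growth of a counter across consecutive scrapes, treating any
    decrease as a reset to zero (simplified: no window extrapolation)."""
    increase = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            # The counter went backwards, e.g. after a process restart:
            # assume it restarted from zero, so it has counted `cur` since.
            increase += cur
    return increase


if __name__ == "__main__":
    # Scraped values of a responses counter that reset between 150 and 20.
    print(counter_increase([100, 150, 20, 60]))  # 110.0
    # Only meaningful for counters: on a gauge that dips from 50 to 40 and
    # recovers to 45, the dip is misread as a reset.
    print(counter_increase([50, 40, 45]))  # 45.0, although the net change is -5
```

This is why the rule is only applied to counters; a gauge such as a rolling average legitimately goes down.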
Re: [3.x]: openshift router and its own metrics
Metrics memory use in the router should be proportional to the number of
services, endpoints, and routes. I doubt it's leaking there, and if it were,
it'd be really slow, since we never restart the router monitor process.
Stats should definitely be preserved across reloads, but will not be
preserved across pod restarts.
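Clayton's point about memory can be illustrated with a registry that keys each series by its labels: the number of entries tracks the number of distinct routes and endpoints, not the number of reloads, and because the state lives in the long-running process it disappears only when that process (the pod) restarts. A hypothetical Python sketch; `SeriesRegistry` and the route/server names are invented for illustration and are not the router's actual types.

```python
from typing import Dict, Tuple


class SeriesRegistry:
    """Latest value per (backend, server) label pair (hypothetical sketch)."""

    def __init__(self) -> None:
        # Lives in the monitor process's memory. Per the thread, that process
        # is not restarted on HAProxy reloads, so this state is lost only when
        # the process (or pod) itself restarts.
        self.series: Dict[Tuple[str, str], float] = {}

    def observe(self, backend: str, server: str, value: float) -> None:
        # Re-reporting an existing label pair overwrites it; only a new
        # route or endpoint adds an entry.
        self.series[(backend, server)] = value

    def __len__(self) -> int:
        return len(self.series)


if __name__ == "__main__":
    registry = SeriesRegistry()
    for reload_count in range(3):  # the same backends are reported after every reload
        for backend in ("route-a", "route-b"):
            for server in ("pod-1", "pod-2"):
                registry.observe(backend, server, float(reload_count))
    print(len(registry))          # 4: routes x endpoints, independent of reloads
    print(len(SeriesRegistry()))  # 0: a fresh process starts with no state
```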
Re: [3.x]: openshift router and its own metrics
On Thu, Aug 15, 2019 at 10:03 AM Daniel Comnea wrote:
> Hi,
>
> Would appreciate it if anyone could confirm that my understanding is
> correct w.r.t. the way the router haproxy image [1] is built.
> Am I right to assume that image [1] is built exactly as it appears there,
> without any other layer being added to include [2]?
> Also, am I right to say the haproxy metrics [2] are part of the origin
> package?
>
> A bit of background/context:
>
> A while back, on OKD 3.7, we had to swap the OpenShift 3.7.2 router image
> for the 3.10 one because we were seeing some problems with reloads, and we
> wanted to benefit from the native HAProxy 1.8 reload feature so that
> reloads would stop affecting traffic.
>
> While everything was working nicely, we've recently noticed that the
> haproxy stats slowly increase, and we wonder whether or not this is an
> accumulation caused (maybe?) by the reloads. I'm aware of a change made
> in [3]; however, I suspect it is not part of the 3.10 image, hence my
> question to double-check whether my understanding is wrong.
>
> Cheers,
> Dani
>
> [1] https://github.com/openshift/origin/tree/release-3.10/images/router/haproxy
> [2] https://github.com/openshift/origin/tree/release-3.10/pkg/router/metrics
> [3] https://github.com/openshift/origin/commit/8f0119bdd9c3b679cdfdf2962143435a95e08eae#diff-58216897083787e1c87c90955aabceff

I think Clayton (copied) has the history here, but the nature of the
metrics commit you referenced is that many of the exposed metric points
are counters which were being reset across reloads. The patch was (I think)
to enable counter metrics to correctly accumulate across reloads.

As to how the image itself is built, the pkg directory is part of the
router controller code included with the image. Not sure if that answers
your question.
--
Dan Mace
Principal Software Engineer, OpenShift
Red Hat
dm...@redhat.com
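Dan's description of the patch, letting counter metrics keep accumulating even though the underlying HAProxy counters were being reset across reloads, can be sketched as carrying an offset forward whenever the raw value drops. This is a hypothetical Python illustration of the general technique, not the code from the referenced commit; `ReloadSafeCounter` is an invented name.

```python
class ReloadSafeCounter:
    """Monotonic total built from a raw counter that restarts at zero on
    each reload (hypothetical sketch, not the router's implementation)."""

    def __init__(self) -> None:
        self.offset = 0.0    # total carried over from earlier HAProxy processes
        self.last_raw = 0.0  # last raw value read from the current process

    def update(self, raw: float) -> float:
        if raw < self.last_raw:
            # The raw value went backwards: assume a reload restarted the
            # counter from zero, and bank what the old process had counted.
            self.offset += self.last_raw
        self.last_raw = raw
        return self.offset + raw


if __name__ == "__main__":
    counter = ReloadSafeCounter()
    # Raw scrapes: a reload happens between 150 and 20.
    print([counter.update(raw) for raw in (100, 150, 20, 60)])
    # [100.0, 150.0, 170.0, 210.0]
```

One limitation: if the new process has already counted past the old value by the next scrape, the drop is never seen and the reset is missed. The correction also only makes sense for counters; a gauge such as the rolling-average latency should be exported unchanged.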