Hey Brian,

Thanks for the answer. If I understand correctly, Prometheus can be used for this 
use case, but I need to integrate a new component (statsd_exporter) to 
clean up the metric and remove the "non-static" label values. 

Your approach lets me go from several time series, e.g.:

   - import_process_total{pod="pod-sdsdsf"}
   - import_process_total{pod="pod-zzze"}
   - import_process_total{pod="pod-fdssfdf"}
   - ...


to a single series:

   - import_process_total{}

And then I can use the classic "sum" operator (or increase) to get the total 
number of processed imports.
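
For example, something like this (assuming the aggregated series keeps the 
name import_process_total):

sum(increase(import_process_total[24h]))

which would give the number of imports processed over the last 24 hours.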

Is that the idea?


On Friday, 20 December 2024 at 12:39:35 UTC+1, Brian Candler wrote:

I see at least two distinct issues there.

1. "Is Prometheus and PromQL suitable for working on a metric that doesn't 
change much?" - quite simply, "yes". Prometheus uses delta compression, so 
adjacent identical values compress extremely well.  Indeed, Prometheus is 
often used for metrics which *never* change, so long as the labels are 
static, for example:
node_os_info{id="ubuntu", id_like="debian", name="Ubuntu", 
pretty_name="Ubuntu 22.04.5 LTS", version="22.04.5 LTS (Jammy Jellyfish)", 
version_codename="jammy", version_id="22.04"} 1
The overhead of scraping this repeatedly is tiny.

2. You have a specific issue with distributed counters. Ideally you'd use 
sum(import_processed_total) to get the total amount of work done over all 
pods, but that's not reliable because parts of the counter will *vanish* 
when the pod terminates, and you don't want the total counter to go down.

I think the best solution is to accumulate your counters in some other 
external process, such as statsd_exporter. Send a '+1' whenever you do some 
work. The value scraped from statsd_exporter will be the total amount of 
work done, independently of which pod has performed the work. That is fine 
for both total work done and for calculating the overall rate of processing 
work.
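
For example, once statsd_exporter is accumulating the counter (each pod 
sending a statsd-style increment such as import_processed_total:1|c), the 
queries become straightforward (assuming the exported counter keeps that name):

increase(import_processed_total[24h])
rate(import_processed_total[5m])

The first gives the work done over the last day, the second the overall 
processing rate, regardless of which pods did the work.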

On Friday, 20 December 2024 at 10:52:37 UTC Florian Luce wrote:

Hi everyone,

I have a use case where I'm trying to track the use of a feature that isn't 
often used, and I've decided to use a counter. 

To give you some numbers: at the moment this counter is incremented about 50 
times per 24 hours on average.

This functionality is implemented in a service that is deployed and 
replicated across 10 to 20 pods (Kubernetes), with metrics scraped at a 
regular interval (30 s). We add a label to the metric to identify the pod and 
avoid collisions, so this metric changes very little and is spread over a 
number of time series.

Here's a small example of how "flat" this metric is:

[image: Capture d’écran 2024-12-20 à 09.26.03.png]
 
The first problem we had to solve was losing the 0-to-1 transition (we 
tested the beta feature "created timestamps zero injection" 
<https://prometheus.io/docs/prometheus/latest/feature_flags/#created-timestamps-zero-injection>, 
but it caused a significant CPU overhead, so we didn't enable it). 
So we went with a query like this:

clamp_min(
  sum(max_over_time(import_processed_total{}[1m]) or vector(0))
  - sum(max_over_time(import_processed_total{}[1m] offset 1m) or vector(0)),
  0
)

And I set the "Min interval" query option in Grafana to 1m.

It's still imperfect at the ends of the time series, but it gives a result 
close to reality when I analyze it over 24/48-hour windows.

However, it becomes unusable when I apply this approach over 30 days.

The questions I have are the following:

   - Is there a different approach (PromQL query) to exploit this metric 
   without losing precision?
   - Is Prometheus suitable for this kind of use case?
   - Couldn't an "adaptive metrics" approach 
   <https://grafana.com/blog/2023/05/09/adaptive-metrics-grafana-cloud-announcement/> 
   be a solution for cleaning up this metric and generating a synthetic 
   version for one day, which can then be analyzed over 30 days?

Thanks for reading, and thanks in advance for your answers.
