On Monday, 12 October 2020 08:34:55 UTC+1, Samuel Stanley wrote:
>
> Here's my use case:
>
> I use PushGateway to allow Prometheus to scrape a custom component in our
> service. Here the metric is the number of notifications sent for a given
> service instance. So every time a Push Notification is sent via
> the service instance, I go ahead and increment the Gauge by 1 and send it
> to PushGateway. Prometheus then scrapes this and sends it to Sysdig.
> Currently what happens is that if the user sends 5 notifications at, say,
> 12:58 PM and then does not send any more notifications, what I see is
> 5 being sent by the Prometheus server even after 12:58 PM, and that
> continues until the next time a notification is sent by the user and
> the value changes.
>
That is absolutely the correct way to use a counter in Prometheus. This is
how it should work. The repeated value of 5 confirms that no additional
events occurred between one scrape and the next, which is a valid and
important piece of information.

Compare the following two timeseries:

(a) 1 2 4 5 5 5 5 5 5 5 6 6 6
(b) 1 2 4 5 . . . . . . 6 . .

In case (a) you know exactly when the counter went from 5 to 6. In case
(b) you don't know anything about the counter value where there is a dot.
What it's actually saying is that the metric has gone away. In that case,
it's impossible to calculate the rate:

(b) 1 2 4 5 . . . . . . 6 . .
            <vals unknown>

*Maybe* the counter just went from 5 to 6. But maybe it went from 5 to 7,
and then the counter was reset to zero, and then incremented back up to 6,
all during that period where there is no data. After the gap, it is
effectively a completely new time series, which just happens to start at
value 6.
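
To make that ambiguity concrete, here is a minimal Python sketch. It is a
deliberate simplification and not how Prometheus itself computes rate() or
increase(): the function name, the use of None for a missing sample, and
the "hidden" series are all invented for illustration. It sums the increase
over a run of samples, treats any drop as a reset to zero, and refuses to
answer when samples are missing.

```python
from typing import Optional, Sequence


def increase(samples: Sequence[Optional[int]]) -> Optional[int]:
    """Total increase over a run of counter samples.

    Any drop in value is treated as a reset to zero. Returns None when a
    sample is missing, because resets hidden inside the gap cannot be
    ruled out. (Illustrative simplification, not Prometheus's algorithm.)
    """
    if any(s is None for s in samples):
        return None
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        # After a reset the counter restarted from zero, so the whole
        # current value counts as new increments.
        total += cur - prev if cur >= prev else cur
    return total


# Series (a) and (b) from above; None stands in for a dot.
series_a = [1, 2, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6]
series_b = [1, 2, 4, 5, None, None, None, None, None, None, 6, None, None]

# One hypothetical history that matches every sample visible in (b):
# the counter went 5 -> 7, was reset, and climbed back up to 6.
hidden = [1, 2, 4, 5, 6, 7, 0, 1, 2, 4, 6, 6, 6]

print(increase(series_a))  # 5
print(increase(series_b))  # None
print(increase(hidden))    # 12
```

Series (a) gives an unambiguous answer of 5. The hypothetical history
agrees with every sample that (b) actually reports, yet its true increase
is 12, so nothing trustworthy can be said about (b) across the gap.
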
> So the way I would want it to work is at 12:58 PM it should spike up
> showing that the value is 5 indicating that there were 5 notifications that
> were sent and then drop back to 0 until the next value comes in.
>
That's an abuse of the data model. Prometheus *can* work with counters which
occasionally reset, because this happens in real life when services are
restarted and have no way to persist their counters, but you should not
have counters resetting frequently as a matter of course. At that point
they are no longer counters, but they are not gauges either, and the data
is useless.
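
For what it's worth, the "spike, then back to 0" picture can normally be
derived at query time from a counter that only ever goes up, rather than by
resetting the stored value. A minimal Python sketch of the idea follows; the
sample values and the function name are invented for illustration. In PromQL
the closest built-in is increase() over a range, though whether the
downstream system can evaluate that is an assumption you would need to
check.

```python
from typing import List, Sequence


def per_interval(samples: Sequence[int]) -> List[int]:
    """Change between consecutive scrapes of a cumulative counter.

    A drop is treated as a restart from zero, so the new value itself is
    the number of events since the restart.
    """
    return [
        cur - prev if cur >= prev else cur
        for prev, cur in zip(samples, samples[1:])
    ]


# Hypothetical scrapes: 5 notifications arrive at once, later 1 more.
counter = [0, 0, 5, 5, 5, 5, 6, 6]
print(per_interval(counter))  # [0, 5, 0, 0, 0, 1, 0]
```

The stored series stays monotonic, and the derived series shows a spike of
5 followed by zeros, which is the shape being asked for.
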
If this were a local Prometheus server there would be no issue with the
repeated counter values. So I presume the driver here is that you are
trying to micro-optimise sending to Sysdig - maybe to do with the way
Sysdig charges you? I'm sorry, but Prometheus doesn't support this use
case, certainly not with remote_write.
It might be possible to do something with recording rules - that is, create
a new timeseries with a stale gap where the data is not changing. But I'm
not going to help you with that, because in the long term it will bite you.