On Mon, Feb 22, 2021 at 3:48 AM Marcelo Magallón <
[email protected]> wrote:

> Thanks Ben.
>
> On Thu, Feb 18, 2021 at 1:37 PM Ben Kochie <[email protected]> wrote:
>
>> The problem with what you're proposing is that you get a misleading
>> picture of the data over time. This is the problem with the original smokeping
>> program that the smokeping_prober is trying to solve.
>>
>> The original smokeping software does exactly what you're talking about.
>> It sends out a burst of 10 packets at the configured interval (in your
>> example, 1 minute). The problem is this does not give you a real picture,
>> because the packets are not evenly spaced.
>>
>> This is why I made the smokeping_prober work the way it does. It sends a
>> regular stream, but captures the data in a smarter way, as a histogram.
>>
>> From the histogram data you only need to collect the metrics every minute,
>> and you can generate the same "min / max / avg / dev / loss" values that you're
>> looking for. But the actual values are much more statistically valid, as
>> it's measuring evenly over time.
>>
>
> That's fair. I do understand the argument for preferring continuous
> observations.
>
> The problem I have with the histogram approach (and this is partly due to
> the current way histograms work in Prometheus) is that I don't know the
> distribution a priori.
>
> I let smokeping_prober run for a few days against several IP addresses.
> For a particular one, after 250+ thousand observations, it's telling me
> that the round trip time is somewhere between 51.2 ms and 102.4 ms. Using
> the sum and the count from histogram data I can derive an average (not
> mean) over a short window and it's giving me ~ 60 ms. I happen to know
> (from the individual observations) that the 95th percentile is also ~ 60
> ms, and that's pretty much the 50th percentile (the spread of the
> observations is very small). The actual min/max/avg from observations is
> something like 59.1 / 59.7 / 59.4 ms. If I use the data from the histogram
> the 50th percentile comes out as ~ 77 ms and the 95th percentile as ~ 100
> ms. I must be missing something, because I don't see how I would extract
> the min / max / dev from the available data. I do understand that the
> standard deviation for this data is unusually small (compared to what you'd
> expect to see in the wild), but still...
>

The default histogram buckets in the smokeping_prober cover latency
durations from localhost to the moon and back. The buckets are easy to
adjust, so you can narrow them to the range you actually expect on your
network.
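
For example, something along these lines (adjust to taste, and check --help
for the exact flag name and the current defaults) would concentrate the
buckets around the 50-100 ms range you're seeing:

    smokeping_prober --buckets="0.01,0.025,0.05,0.055,0.06,0.065,0.07,0.1,0.25" some.host.example.com

With buckets that tight, histogram_quantile has much less room to smear the
estimate around.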

Without knowing exactly what queries you're running, it's hard to say where
the discrepancy comes from. Dividing the histogram sum by the count gives
you the mean value.
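
For example, assuming the default metric name (smokeping_response_duration_seconds),
the mean RTT over a 5-minute window would be something like:

    rate(smokeping_response_duration_seconds_sum[5m])
      /
    rate(smokeping_response_duration_seconds_count[5m])

and the quantile estimate would be something like:

    histogram_quantile(0.95, rate(smokeping_response_duration_seconds_bucket[5m]))

That last one is probably also why your percentiles look off: histogram_quantile
interpolates linearly inside the matching bucket, so when every observation lands
in the 51.2-102.4 ms bucket you get roughly 51.2 + 0.5 * 51.2 ≈ 77 ms for the
50th percentile and 51.2 + 0.95 * 51.2 ≈ 100 ms for the 95th, which matches what
you're seeing. Narrower buckets fix that.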

There is one known issue with the smokeping_prober right now that I need to
fix: the ping library's handling of sequence numbers is broken and doesn't
wrap correctly.


>
> I also have to think about the data size. For 1 ICMP packet every 1 second,
> I'm at (order of magnitude) 100 MB of data per target per month. Reducing
> this to 5 packets every 60 seconds, I'm down to 10 MB (order of magnitude).
> This doesn't sound like much for a single target, but it does add up.
>

Yes, this is going to be an issue no matter what you do. I don't see how
this relates to any mode of operation.
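
For what it's worth, the stored volume is driven by the scrape interval rather
than the probe rate: Prometheus records one sample per series per scrape, no
matter how many pings happened in between. Rough arithmetic, assuming a
60-second scrape interval: about 60 * 24 * 30 ≈ 43,200 scrapes per target per
month, times the number of series the histogram exposes for that target,
whether the prober sends 1 packet per second or a burst per minute.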


>
> As a side note, I noticed that smokeping_prober resolves the IP address
> once. With BBE this happens every time the probe runs, so I don't have to do
> anything if I'm monitoring a host where IP addresses might change every now
> and then.
>

Yes, this is currently intentional, but re-resolving is something I'm
planning to do eventually.


>
> Thanks again,
>
> Marcelo
>
