Thanks Ben.

On Thu, Feb 18, 2021 at 1:37 PM Ben Kochie <[email protected]> wrote:
> The problem with what you're proposing is you're getting an invalid
> picture of data over time. This is the problem with the original smokeping
> program that the smokeping prober is trying to solve.
>
> The original smokeping software does exactly what you're talking about. It
> sends out a burst of 10 packets at the configured interval (in your
> example, 1 minute). The problem is this does not give you a real picture,
> because the packets are not evenly spaced.
>
> This is why I made the smokeping_prober work the way it does. It sends a
> regular stream, but captures the data in a smarter way, as a histogram.
>
> From the histogram data you can only collect the metrics every minute, and
> generate the same "min / max / avg / dev / loss" values that you're looking
> for. But the actual values are much more statistically valid, as it's
> measuring evenly over time.
>

That's fair. I do understand the argument for preferring continuous observations.

The problem I have with the histogram approach (and this is partly due to the current way histograms work in Prometheus) is that I don't know the distribution a priori. I let smokeping_prober run for a few days against several IP addresses. For one of them, after 250+ thousand observations, all it tells me is that the round trip time is somewhere between 51.2 ms and 102.4 ms. Using the sum and the count from the histogram data I can derive an average (the mean, not the median) over a short window, and that gives me ~ 60 ms. I happen to know (from the individual observations) that the 95th percentile is also ~ 60 ms, and so is the 50th percentile (the spread of the observations is very small). The actual min/max/avg from the observations is something like 59.1 / 59.7 / 59.4 ms. If I use the data from the histogram, the 50th percentile comes out as ~ 77 ms and the 95th percentile as ~ 100 ms.

I must be missing something, because I don't see how I would extract the min / max / dev from the available data.
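To make the ~ 77 ms and ~ 100 ms figures concrete, here is a minimal sketch of quantile estimation by linear interpolation inside a bucket, which is how Prometheus's histogram_quantile() estimates values. Only the 51.2 ms and 102.4 ms bounds come from the data above; the other bucket bounds, the round count of 250000, and the function name bucket_quantile are assumptions made for illustration.

```python
def bucket_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound_seconds, count) pairs.

    Assumes observations are spread evenly inside each bucket and that the
    lowest bucket starts at 0, so the result is a linear interpolation
    between the bounds of the bucket that contains the target rank.
    """
    total = buckets[-1][1]
    rank = q * total
    lower, prev_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            if in_bucket == 0:
                return upper
            return lower + (upper - lower) * (rank - prev_count) / in_bucket
        lower, prev_count = upper, count
    return buckets[-1][0]


# Hypothetical cumulative buckets: every observation (~59.4 ms) lands in the
# single bucket with bounds 51.2 ms .. 102.4 ms.
buckets = [
    (0.0128, 0),
    (0.0256, 0),
    (0.0512, 0),
    (0.1024, 250000),
    (0.2048, 250000),
]

p50_ms = bucket_quantile(0.50, buckets) * 1000  # about 76.8 ms
p95_ms = bucket_quantile(0.95, buckets) * 1000  # about 99.8 ms

# The mean from _sum / _count is not affected by bucket width.
# The sum here is illustrative: 250000 observations averaging 59.4 ms.
mean_ms = (250000 * 0.0594) / 250000 * 1000  # about 59.4 ms

print(round(p50_ms, 1), round(p95_ms, 1), round(mean_ms, 1))
```

With every observation in one bucket, the estimate depends only on the bucket bounds, so a tight cluster around 59.4 ms is indistinguishable from data spread across the whole 51.2 to 102.4 ms range.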
I do understand that the standard deviation for this data is unusually small (compared to what you'd expect to see in the wild), but still...

I also have to think of the data size. For 1 ICMP packet every second, I'm at (order of magnitude) 100 MB of data per target per month. Reducing this to 5 packets every 60 seconds, I'm down to 10 MB (order of magnitude). This doesn't sound like much for a single target, but it does add up.

As a side note, I noticed that smokeping_prober resolves the IP address once. With BBE this happens every time the probe runs, so I don't have to do anything if I'm monitoring a host whose IP address might change every now and then.

Thanks again,

Marcelo

