On Mon, Sep 08, 2014 at 05:09:38PM +0200, Paolo Bonzini wrote:
> On 08/09/2014 16:49, Benoît Canet wrote:
> >> > - create two windows, with twice the suggested expiration period, and
> >> >   return min/avg/max from the oldest window.  Example
> >> >
> >> >       t=0          |t=1          |t=2          |t=3          |t=4
> >> >       wnd0: [0,1)  |wnd0: [1,3)  |             |wnd0: [3,5)  |
> >> >       wnd1: [0,2)  |             |wnd1: [2,4)  |             |
> >> >
> >> >   Values are returned from:
> >> >
> >> >       wnd0---------|wnd1---------|wnd0---------|wnd1---------|
> >
> > This is neat.
>
> Alternatively, you can make it probabilistically correct:
>
>     t=0            |t=0.66           |t=1.33             |t=2              |t=2.66
>                    |wnd0: [0.66,2)   |                   |wnd0: [2,3.33)   |
>     wnd1: [0,0.66) |                 |wnd1: [1.33,2.66)  |                 |
>
> Return from:
>
>     wnd1-----------|wnd1-------------|wnd0---------------|wnd1-------------|wnd0
>
> So you always have 2/3 seconds worth of data, and on average exactly 1 second
> worth of data.
>
> The problem is the delay in getting data, which can be big for the minute-
> and hour-based statistics.  Suppose you have a spike that lasts 10 seconds,
> it might not show in the minute-based statistics for as much as 30 seconds
> after it ends (the window switches every 40 seconds).
>
> For min/max you could return min(min0, min1) and max(max0, max1).  Only the
> average has this problem.
>
> Exponential smoothing doesn't have this problem.  IIRC uptime uses that.

I am writing this so cloud end users can programmatically get information
about their VMs' disk statistics.

Cloud end users are known to use their cloud API to script the elasticity of
their architecture. Some code will poll system statistics to decide whether
new instances must be launched or existing instances must be pruned.

This means that introducing a delay in the accounting code would slow down
their decisions.

min and max are also useful to know, since they give an idea of the deviation.

So I think the first method you suggested would be the best for a cloud VM.

Best regards

Benoît

>
> Paolo
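
P.S. To make the first method concrete, here is a minimal sketch in Python
(illustration only, not QEMU code). The names TwoWindowStats, account() and
stats() are hypothetical, and timestamps are passed in explicitly so the
behaviour is easy to follow. It assumes the layout from the diagram: wnd0
starts as [0,1), wnd1 as [0,2), every later window is two periods long, and
values are always read from the window that started first.

```python
import math
from dataclasses import dataclass


@dataclass
class Window:
    """Running min/avg/max accumulator for the half-open interval [start, end)."""
    start: float
    end: float
    count: int = 0
    total: float = 0.0
    lo: float = math.inf
    hi: float = -math.inf

    def add(self, value):
        self.count += 1
        self.total += value
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)


class TwoWindowStats:
    """Two overlapping windows, each twice the period, offset by one period."""

    def __init__(self, period, now=0.0):
        self.period = period
        # Initial layout from the diagram: wnd0 = [0,1), wnd1 = [0,2).
        self.windows = [Window(now, now + period),
                        Window(now, now + 2 * period)]

    def _expire(self, now):
        # Restart any window whose interval is over; the loop also covers
        # long gaps during which nothing was accounted or queried.
        for i, w in enumerate(self.windows):
            while now >= w.end:
                w = Window(w.end, w.end + 2 * self.period)
                self.windows[i] = w

    def account(self, value, now):
        """Record one sample (e.g. a request latency) in both windows."""
        self._expire(now)
        for w in self.windows:
            w.add(value)

    def stats(self, now):
        """Return (min, avg, max) from the oldest window, or None if empty."""
        self._expire(now)
        oldest = min(self.windows, key=lambda w: w.start)
        if oldest.count == 0:
            return None
        return oldest.lo, oldest.total / oldest.count, oldest.hi
```

After the initial period, the window being read always holds between one and
two periods of samples, and a new sample is visible to the caller as soon as
it is accounted, so there is no reporting delay.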