Xavier Simonart <[email protected]> writes: > Hi Aaron > > We might for instance want to have some counters giving statistical data on > flow_mods added per ovn iteration. > So, for instance, we would be able to compare those statistics with existing > stopwatches indicating the time spent for > those iterations, and see whether a high time stopwatch is linked to a high > value of the maximum flow_mod.... > > Coverage counters would not be sufficient. They provide very useful info > (average over a few seconds, minute, hour), > but they do not provide (as far as I know at least) this statistical view > (max, percentile, ...) per iteration - e.g. the > maximum. Adding this statistical view to the coverage counter does not seem > to make too much sense for the existing > coverage counters, so I would not modify the coverage counters.
Okay. > As I mentioned, we could make the stopwatch (slightly) more generic by making > two small modifications: > - Add an empty unit, so the stopwatch could cover all the "counts" > statistics, and not only the time related ones (as the > existing stopwatches). > - Add to the stopwatch printout the "Sum" of all items (and not only some > averages). This addition might be useful as > well for the existing stopwatches. > With those two minor changes, OVN could add its own statistics. I guess it becomes more than just a stopwatch at that point. It's more like a stats window. Maybe it makes sense to decouple stopwatch from that, and have a generic set of stats window functions, and then build stopwatch on top. > Do you think that such changes to the OVS stopwatch could be accepted > upstream? Seems like there are some use cases that you have considered - although they do seem a bit more like OVN specific use cases. It make sense since OVS has this functionality to expose it. Mark, any thoughts on how best to proceed? > Thanks > Xavier > > On Thu, Oct 14, 2021 at 9:39 PM Aaron Conole <[email protected]> wrote: > > Xavier Simonart <[email protected]> writes: > > > Hi Aaron > > > > Thanks for looking into this. > > > > You are right when saying that OVN uses the stopwatch API just fine. > > The goal of the proposed modification is to be able provide extra > counters/statistics from OVN. > > For instance, there was some interest about how many flows (or groups) are > added (removed, updated, ...) by > > ovn-controller. > > In addition to the raw count, we'd like to provide some statistical view > per iteration - hence the need of being > able to > > calculate things like percentiles (in addition to average, max, ...). > > > > The stopwatch API provides most of what would be needed, but would have > required some changes: > > - stopwatch always uses some "units" (msec, ...) while we would use it > here to report non time-based statistics > => we > > would need to add some other "units". We could add an "empty" unit, to > avoid having to require OVS changes > > everytime we need different statistics (today flows, tomorrow something > else). > > - more important, stopwatch does not provide the total count we are > looking for (i.e. the sum of all samples). We > could > > add a "Total" or a "Sum" to stopwatch, but this would have changed the > stopwatch outputs for all existing > counters. I > > felt that this might be perceived as an API change or a compatibility > issue. > > > > Hence I was proposing to only export the percentile related function, to > avoid any change to OVS which might > impact > > backward compatibility. > > But I would be happy to modify the stopwatch, adding the "Sum" and the > empty unit. This would simplify OVN > code as > > well compared to what I was initially proposing. > > Adding stopwatch_add_sample(...) would not change much in this case. > > I see - what you're looking for is some kind of generic statistics - > would the coverage counters work instead? We do keep some flow > statistics, so I guess it's useful to understand what you're looking > to track. Maybe 'stopwatch' might not work perfectly, or maybe we can > make some changes to make it seem more generic. > > > Thanks > > Xavier > > > > On Wed, Oct 13, 2021 at 6:42 PM Aaron Conole <[email protected]> wrote: > > > > Xavier Simonart <[email protected]> writes: > > > > > export calc_percentile function (and percentile struct) so that > > > it can be used in other libraries such as OVN. > > > > > > Signed-off-by: Xavier Simonart <[email protected]> > > > --- > > > > What is the intent to use this in other libraries? It would be nice to > > understand why just running the existing stopwatch API doesn't work > > (maybe you cannot start samples). AFAIK, OVN uses the stopwatch API > > just fine? > > > > Perhaps it could make sense to expose the functionality in a different > > fashion like: > > > > void stopwatch_add_sample(const char *, unsigned long long); > > > > This seems a useful API and doesn't expose all of the internal > > information, but I don't know if it really is needed. Can you expand > > why you need calc_percentile exposed? > > > > > lib/stopwatch.c | 24 +----------------------- > > > lib/stopwatch.h | 26 ++++++++++++++++++++++++++ > > > 2 files changed, 27 insertions(+), 23 deletions(-) > > > > > > diff --git a/lib/stopwatch.c b/lib/stopwatch.c > > > index f5602163b..003c3a05f 100644 > > > --- a/lib/stopwatch.c > > > +++ b/lib/stopwatch.c > > > @@ -35,25 +35,6 @@ struct average { > > > double alpha; /* Weight given to new samples */ > > > }; > > > > > > -#define MARKERS 5 > > > - > > > -/* Number of samples to collect before reporting P-square calculated > > > - * percentile > > > - */ > > > -#define P_SQUARE_MIN 50 > > > - > > > -/* The naming of these fields is based on the naming used in the > > > - * P-square algorithm paper. > > > - */ > > > -struct percentile { > > > - int n[MARKERS]; > > > - double n_prime[MARKERS]; > > > - double q[MARKERS]; > > > - double dn[MARKERS]; > > > - unsigned long long samples[P_SQUARE_MIN]; > > > - double percentile; > > > -}; > > > - > > > struct stopwatch { > > > enum stopwatch_units units; > > > unsigned long long n_samples; > > > @@ -107,10 +88,7 @@ comp_samples(const void *left, const void *right) > > > return *right_d > *left_d ? -1 : *right_d < *left_d; > > > } > > > > > > -/* Calculate the percentile using the P-square algorithm. For more > > > - * information, see > https://www1.cse.wustl.edu/~jain/papers/ftp/psqr.pdf > > > - */ > > > -static void > > > +void > > > calc_percentile(unsigned long long n_samples, struct percentile *pctl, > > > unsigned long long new_sample) > > > { > > > diff --git a/lib/stopwatch.h b/lib/stopwatch.h > > > index 91abd64e4..efb9a9e8a 100644 > > > --- a/lib/stopwatch.h > > > +++ b/lib/stopwatch.h > > > @@ -36,6 +36,32 @@ struct stopwatch_stats { > > > (alpha 0.01). */ > > > }; > > > > > > +#define MARKERS 5 > > > + > > > +/* Number of samples to collect before reporting P-square calculated > > > + * percentile > > > + */ > > > +#define P_SQUARE_MIN 50 > > > + > > > +/* The naming of these fields is based on the naming used in the > > > + * P-square algorithm paper. > > > + */ > > > +struct percentile { > > > + int n[MARKERS]; > > > + double n_prime[MARKERS]; > > > + double q[MARKERS]; > > > + double dn[MARKERS]; > > > + unsigned long long samples[P_SQUARE_MIN]; > > > + double percentile; > > > +}; > > > + > > > +/* Calculate the percentile using the P-square algorithm. For more > > > + * information, see > https://www1.cse.wustl.edu/~jain/papers/ftp/psqr.pdf > > > + */ > > > +void > > > +calc_percentile(unsigned long long n_samples, struct percentile *pctl, > > > + unsigned long long new_sample); > > > + > > > /* Create a new stopwatch. > > > * The "units" are not used for any calculations but are printed when > > > * statistics are requested. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
