Hi Aaron

We might for instance want to have some counters giving statistical data on
flow_mods added per ovn iteration.
So, for instance, we would be able to compare those statistics with
existing stopwatches indicating the time spent for those iterations, and
see whether a high time stopwatch is linked to a high value of the maximum
flow_mod....

Coverage counters would not be sufficient. They provide very useful info
(average over a few seconds, minute, hour), but they do not provide (as far
as I know at least) this statistical view (max, percentile, ...) per
iteration - e.g. the maximum. Adding this statistical view to the coverage
counter does not seem to make too much sense for the existing coverage
counters, so I would not modify the coverage counters.

As I mentioned, we could make the stopwatch (slightly) more generic by
making two small modifications:
- Add an empty unit, so the stopwatch could cover all the "counts"
statistics, and not only the time related ones (as the existing
stopwatches).
- Add to the stopwatch printout the "Sum" of all items (and not only some
averages). This addition might be useful as well for the existing
stopwatches.
With those two minor changes, OVN could add its own statistics.

Do you think that such changes to the OVS stopwatch could be accepted
upstream?

Thanks
Xavier




On Thu, Oct 14, 2021 at 9:39 PM Aaron Conole <[email protected]> wrote:

> Xavier Simonart <[email protected]> writes:
>
> > Hi Aaron
> >
> > Thanks for looking into this.
> >
> > You are right when saying that OVN uses the stopwatch API just fine.
> > The goal of the proposed modification is to be able provide extra
> counters/statistics from OVN.
> > For instance, there was some interest about how many flows (or groups)
> are added (removed, updated, ...) by
> > ovn-controller.
> > In addition to the raw count, we'd like to provide some statistical view
> per iteration - hence the need of being able to
> > calculate things like percentiles (in addition to average, max, ...).
> >
> > The stopwatch API provides most of what would be needed, but would have
> required some changes:
> > - stopwatch always uses some "units" (msec, ...) while we would use it
> here to report non time-based statistics => we
> > would need to add some other "units". We could add an "empty" unit, to
> avoid having to require OVS changes
> > everytime we need different statistics (today flows, tomorrow something
> else).
> > - more important, stopwatch does not provide the total count we are
> looking for (i.e. the sum of all samples). We could
> > add a "Total" or a "Sum" to stopwatch, but this would have changed the
> stopwatch outputs for all existing counters. I
> > felt that this might be perceived as an API change or a compatibility
> issue.
> >
> > Hence I was proposing to only export the percentile related function, to
> avoid any change to OVS which might impact
> > backward compatibility.
> > But I would be happy to modify the stopwatch, adding the "Sum" and the
> empty unit. This would simplify OVN code as
> > well compared to what I was initially proposing.
> > Adding stopwatch_add_sample(...) would not change much in this case.
>
> I see - what you're looking for is some kind of generic statistics -
> would the coverage counters work instead?  We do keep some flow
> statistics, so I guess it's useful to understand what you're looking
> to track.  Maybe 'stopwatch' might not work perfectly, or maybe we can
> make some changes to make it seem more generic.
>
> > Thanks
> > Xavier
> >
> > On Wed, Oct 13, 2021 at 6:42 PM Aaron Conole <[email protected]> wrote:
> >
> >  Xavier Simonart <[email protected]> writes:
> >
> >  > export calc_percentile function (and percentile struct) so that
> >  > it can be used in other libraries such as OVN.
> >  >
> >  > Signed-off-by: Xavier Simonart <[email protected]>
> >  > ---
> >
> >  What is the intent to use this in other libraries?  It would be nice to
> >  understand why just running the existing stopwatch API doesn't work
> >  (maybe you cannot start samples).  AFAIK, OVN uses the stopwatch API
> >  just fine?
> >
> >  Perhaps it could make sense to expose the functionality in a different
> >  fashion like:
> >
> >    void stopwatch_add_sample(const char *, unsigned long long);
> >
> >  This seems a useful API and doesn't expose all of the internal
> >  information, but I don't know if it really is needed.  Can you expand
> >  why you need calc_percentile exposed?
> >
> >  >  lib/stopwatch.c | 24 +-----------------------
> >  >  lib/stopwatch.h | 26 ++++++++++++++++++++++++++
> >  >  2 files changed, 27 insertions(+), 23 deletions(-)
> >  >
> >  > diff --git a/lib/stopwatch.c b/lib/stopwatch.c
> >  > index f5602163b..003c3a05f 100644
> >  > --- a/lib/stopwatch.c
> >  > +++ b/lib/stopwatch.c
> >  > @@ -35,25 +35,6 @@ struct average {
> >  >      double alpha;   /* Weight given to new samples */
> >  >  };
> >  >
> >  > -#define MARKERS 5
> >  > -
> >  > -/* Number of samples to collect before reporting P-square calculated
> >  > - * percentile
> >  > - */
> >  > -#define P_SQUARE_MIN 50
> >  > -
> >  > -/* The naming of these fields is based on the naming used in the
> >  > - * P-square algorithm paper.
> >  > - */
> >  > -struct percentile {
> >  > -    int n[MARKERS];
> >  > -    double n_prime[MARKERS];
> >  > -    double q[MARKERS];
> >  > -    double dn[MARKERS];
> >  > -    unsigned long long samples[P_SQUARE_MIN];
> >  > -    double percentile;
> >  > -};
> >  > -
> >  >  struct stopwatch {
> >  >      enum stopwatch_units units;
> >  >      unsigned long long n_samples;
> >  > @@ -107,10 +88,7 @@ comp_samples(const void *left, const void *right)
> >  >      return *right_d > *left_d ? -1 : *right_d < *left_d;
> >  >  }
> >  >
> >  > -/* Calculate the percentile using the P-square algorithm. For more
> >  > - * information, see
> https://www1.cse.wustl.edu/~jain/papers/ftp/psqr.pdf
> >  > - */
> >  > -static void
> >  > +void
> >  >  calc_percentile(unsigned long long n_samples, struct percentile
> *pctl,
> >  >                  unsigned long long new_sample)
> >  >  {
> >  > diff --git a/lib/stopwatch.h b/lib/stopwatch.h
> >  > index 91abd64e4..efb9a9e8a 100644
> >  > --- a/lib/stopwatch.h
> >  > +++ b/lib/stopwatch.h
> >  > @@ -36,6 +36,32 @@ struct stopwatch_stats {
> >  >                                      (alpha 0.01). */
> >  >  };
> >  >
> >  > +#define MARKERS 5
> >  > +
> >  > +/* Number of samples to collect before reporting P-square calculated
> >  > + * percentile
> >  > + */
> >  > +#define P_SQUARE_MIN 50
> >  > +
> >  > +/* The naming of these fields is based on the naming used in the
> >  > + * P-square algorithm paper.
> >  > + */
> >  > +struct percentile {
> >  > +    int n[MARKERS];
> >  > +    double n_prime[MARKERS];
> >  > +    double q[MARKERS];
> >  > +    double dn[MARKERS];
> >  > +    unsigned long long samples[P_SQUARE_MIN];
> >  > +    double percentile;
> >  > +};
> >  > +
> >  > +/* Calculate the percentile using the P-square algorithm. For more
> >  > + * information, see
> https://www1.cse.wustl.edu/~jain/papers/ftp/psqr.pdf
> >  > + */
> >  > +void
> >  > +calc_percentile(unsigned long long n_samples, struct percentile
> *pctl,
> >  > +                unsigned long long new_sample);
> >  > +
> >  >  /* Create a new stopwatch.
> >  >   * The "units" are not used for any calculations but are printed when
> >  >   * statistics are requested.
>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to