Re: [ovs-dev] [PATCH v2 1/2] Add performance measuring API

Aaron Conole Thu, 18 Jan 2018 08:21:52 -0800

Hi Mark,

Mark Michelson <[email protected]> writes:


> This is similar to the existing coverage and perf-counter APIs in OVS.
> However, rather than keeping counters, this is aimed at timing how long
> operations take to perform. "Operations" in this case can be anything
> from a loop iteration, to a function, to something more complex.

It's important to note that the granularity on this is measured in 'ms',
which means this library is more suited towards measuring larger groups
of operations (those which take longer than the clock resolution on
time_msec).  I guess one advantage of 

I might suggest the ability to pick the time resolution.  For instance,
sometimes picking msec is useful when clocking response time in a
multi-message system.  Sometimes a tighter granularity (using finer
controls such as 'clock_gettime', 'rdtsc', 'read register 9 of
coprocessor 0') may be desired.  A function pointer passed to the
performance 'bucket' ie:
   performance_create("something", perf_time_msec)
that can have a function signature that makes sense for most cases (ull
return, no args, etc.) might be good to allow users to customize the
performance system to suit their needs.

OTOH, since it won't have much for initial users, maybe it's okay.

Also, apologies if I'm really picky or seem overly verbose with moving
average filters.  Once upon a time, I worked on an embedded system that
was a giant... :)

> The library will keep track of how long it takes to perform the
> particular operations and will maintain statistics of those running
> times.
>
> Statistics for a particular operation can be queried from the command
> line by using ovs-appctl -t <target> performance/show <operation name>.

I noticed that you'll have a global lock preventing statistics updates
from happening.  No problem - except the larger issue is while that lock
is held no other forward progress for the thread tracking the statistics
can happen.

Maybe it makes sense to think about an alternate set of data structures
that will accommodate multiple threads contending on performance
monitoring.  I'm guessing in your use case there isn't any contention,
but perhaps in the future someone will use this in, for example, a
netdev area to profile and the results would get skewed.

> The API is designed to be pretty small. The expected steps are as
> follows:
>
> 1) Create a performance measurement, providing a unique name, using
> performance_create()
> 2) Add calls to start_sample() and end_sample() to mark the start and
> stop of the operation you wish to measure.
> 3) Periodically call performance_run() in order to compile statistics.
> 4) Upon completion (likely program shutdown), call performance_destroy()
> to clean up.

You could register an atexit() handler to automatically destroy these.
Then the framework user doesn't have to deal with that.  There's also a
fatal_signal_add_hook() function where you can register deletion
(although I'd advise against it - usually a fatal signal means 'leave
memory alone' and cleanup here would require deletion).

> Signed-off-by: Mark Michelson <[email protected]>
> ---
>  lib/automake.mk   |   2 +
>  lib/performance.c | 387 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  lib/performance.h |  58 ++++++++
>  3 files changed, 447 insertions(+)
>  create mode 100644 lib/performance.c
>  create mode 100644 lib/performance.h
>
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 4b38a1163..fef2c4e1a 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -205,6 +205,8 @@ lib_libopenvswitch_la_SOURCES = \
>       lib/pcap-file.h \
>       lib/perf-counter.h \
>       lib/perf-counter.c \
> +     lib/performance.h \
> +     lib/performance.c \
>       lib/poll-loop.c \
>       lib/process.c \
>       lib/process.h \
> diff --git a/lib/performance.c b/lib/performance.c
> new file mode 100644
> index 000000000..5b54cfc29
> --- /dev/null
> +++ b/lib/performance.c
> @@ -0,0 +1,387 @@
> +/* Copyright (c) 2017 Red Hat, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include <config.h>
> +
> +#include "performance.h"
> +#include "timer.h"
> +#include "timeval.h"
> +#include "openvswitch/shash.h"
> +#include "openvswitch/vlog.h"
> +#include "unixctl.h"
> +#include "openvswitch/dynamic-string.h"
> +#include "openvswitch/poll-loop.h"
> +
> +VLOG_DEFINE_THIS_MODULE(performance);
> +
> +struct sample {
> +    long long int start_time; /* Time when we started this sample */
> +    long long int end_time;   /* Time when we ended this sample */
> +    long long int elapsed;    /* Elapsed time: end_time - start_time */
> +};
> +
> +struct sample_vec {
> +    struct sample *samples; /* Dynamic array of samples */
> +    size_t n_samples;       /* Number of samples */
> +    size_t capacity;        /* Number of allocated samples */
> +};
> +
> +struct stats {
> +    long long int min;         /* Minimum measurement (ms) */
> +    long long int max;         /* Maximum measurement (ms) */
> +    long long int average;     /* Average measurement (ms) */

I think signed values doesn't make sense.  I don't know that time moves
backwards (unless you're using "wall clock", instead of a monotonic time
source).  I guess this is to keep the return type correct with
time_msec(), but I don't understand why that should be signed either
(since it says it will be monotonic).  In practice, it generally *won't*
matter at msec granularity - I 'napkin math' think it takes something
like 6.2bn years to rollover from epoch.  I just worry about the
rollover to negative case (and maybe it isn't important).

Additionally, I think integers aren't a good way of storing the average,
since it will truncate, and 1ms vs 1.9ms are very different when it
comes to measurements.

> +    long long int percentile;  /* 95th percentile measurement (ms) */
> +    long long int num_samples; /* Total number of measurements */
> +};
> +
> +struct performance {
> +    struct sample_vec vec;
> +    long long int sample_rate;
> +    struct timer timer;
> +    struct sample cur_sample;
> +};

It looks like you're keeping around the samples because you want to 'age
out' from the moving average window.  Otherwise I think it can be
implemented using an IIR filter (psuedo-code below):

       current_avg -= (current_avg/n_window_samples)
       current_avg += (sample / n_window_samples)

BUT - if you're going to keep around the samples, you might as well
include a calculation of additional statistics (variance comes to mind).

> +static struct shash performances = SHASH_INITIALIZER(&performances);
> +static struct ovs_mutex performances_lock = OVS_MUTEX_INITIALIZER;

As I wrote before, having this global mutex means statistics
accumulation blocks for everyone.  You might think it doesn't matter,
but imagine this scenario:

  thread 0                      |    thread 1

  call perf_start()             |  .............
  .... do things                |  call perf_start() (get slept)
  call perf_end() (block...)    |  wake up and complete
     (unblock)                  |

In the above, the perf_end() will intermittently include lock calls from
the perf_start(), which may smooth out over time but will introduce
variance.

> +#define PERFORMANCE_INTERVAL 10000
> +#define PERFORMANCE_OLDEST 600000
> +
> +static void
> +add_sample(struct sample_vec *vec, struct sample *new_sample)
> +{
> +    if (vec->capacity == vec->n_samples) {
> +        vec->samples = x2nrealloc(vec->samples, &vec->capacity,
> +            sizeof *vec->samples);

This resize operation can grow very quickly.  Assuming the samples come
very fast, you'll run out of memory very quickly.  Is there a reason not
to have just roll samples in the vector?  I guess it makes your
'selection' for various time periods more difficult, but you can also
track those with independent counters as well (just set a different time
interval... :).  That also means no allocations when creating samples,
and no deallocations when removing samples, freeing up additional memory
pressure.

The "disadvantage" is you need to determine your sampling window sizes,
but that may be something you expose as a tune-able setting?  Also,
tracking your 1m,5m,10m counters would be a bit different (but you could
just treat them as additional 'performance counters' in your framework,
and every time the performance_run() decides it's time to take an
average, if the time interval makes sense, it can update "name",
"name_1m", "name_5m", "name_10m", etc).

> +    }
> +
> +    vec->samples[vec->n_samples++] = *new_sample;
> +}
> +
> +static int
> +find_earliest(const struct sample_vec *vec, long long int age_ms)
> +{
> +    long long int cutoff = time_msec() - age_ms;
> +
> +    for (int i = 0; i < vec->n_samples; i++) {
> +        if (vec->samples[i].end_time >= cutoff) {
> +            return i;
> +        }
> +    }
> +
> +    /* Either the vector is empty or all times are
> +     * older than the cutoff.
> +     */
> +    return -1;
> +}
> +
> +static long long int
> +average(const struct sample *samples, size_t num_samples)
> +{
> +    /* Avoid division by zero */
> +    if (num_samples == 0) {
> +        return 0;
> +    }
> +
> +    long long int sum = 0;
> +    for (size_t i = 0; i < num_samples; i++) {
> +        sum += samples[i].elapsed;
> +    }
> +
> +    return sum / num_samples;
> +}
> +
> +/* Calculate a specific percentile value.
> + * Valid percentile values range from 0 to 99
> + */
> +static long long int
> +percentile(const struct sample *samples, size_t num_samples, int percentile)
> +{
> +    if (num_samples == 0) {
> +        return 0;
> +    }
> +
> +    size_t pctl = num_samples * percentile / 100;
> +    return samples[pctl].elapsed;
> +}
> +
> +static void
> +cull_old_times(struct sample_vec *vec, long long int age_ms)
> +{
> +    int i = find_earliest(vec, age_ms);
> +
> +    if (i <= 0) {
> +        return;
> +    }
> +
> +    size_t new_size = vec->n_samples - i;
> +    memmove(vec->samples, &vec->samples[i], new_size * sizeof *vec->samples);
> +    vec->n_samples = new_size;
> +}
> +
> +static void
> +format_stats(struct ds *s, const char *prefix, const struct stats *stats)
> +{
> +    if (stats->num_samples) {
> +        ds_put_format(s, "\t%s samples: %lld\n", prefix, stats->num_samples);
> +        ds_put_format(s, "\t%s minimum: %lld\n", prefix, stats->min);
> +        ds_put_format(s, "\t%s maximum: %lld\n", prefix, stats->max);
> +        ds_put_format(s, "\t%s average: %lld\n", prefix, stats->average);
> +        ds_put_format(s, "\t%s 95th percentile: %lld\n\n", prefix,
> +                      stats->percentile);
> +    } else {
> +        ds_put_format(s, "\t%s samples: 0\n", prefix);
> +        ds_put_format(s, "\t%s minimum: N/A\n", prefix);
> +        ds_put_format(s, "\t%s maximum: N/A\n", prefix);
> +        ds_put_format(s, "\t%s average: N/A\n", prefix);
> +        ds_put_format(s, "\t%s 95th percentile: N/A\n\n", prefix);
> +    }
> +}
> +
> +static int
> +cmp_times(const void *left_, const void *right_)
> +{
> +    const struct sample *left = left_;
> +    const struct sample *right = right_;
> +
> +    return left->elapsed == right->elapsed ? 0
> +        : left->elapsed > right->elapsed;
> +}
> +
> +static struct sample *
> +sort_times(const struct sample *by_time, size_t vec_size)

I got confused reading through this a few times, because the name
doesn't convey the 'copy' operation that happens.  It would be good to
call this something like:

  sorted_copy_times()

which would make reasoning about the get_stats block a little bit better
(I read through it twice forgetting that sort_times wasn't an in-place
sort).

> +{
> +    struct sample *copy = xmemdup(by_time, vec_size * sizeof *copy);
> +    qsort(copy, vec_size, sizeof *copy, cmp_times);
> +
> +    return copy;
> +}
> +
> +static void
> +get_stats(const struct sample_vec *vec, long long int age_ms,
> +          struct stats *stats)
> +{
> +    int start_idx = find_earliest(vec, age_ms);
> +    if (start_idx < 0) {
> +        memset(stats, 0, sizeof *stats);
> +        return;
> +    }
> +    size_t vec_size = vec->n_samples - start_idx;
> +
> +    struct sample *by_time = &vec->samples[start_idx];
> +    struct sample *by_elapsed = sort_times(by_time, vec_size);
> +
> +    stats->min = by_elapsed[0].elapsed;
> +    stats->max = by_elapsed[vec_size - 1].elapsed;
> +    stats->average = average(by_time, vec_size);
> +    stats->percentile = percentile(by_elapsed, vec_size, 95);
> +    stats->num_samples = vec_size;
> +
> +    free(by_elapsed);
> +}
> +
> +static void
> +performance_print(struct performance *perf, const char *name,
> +                  struct ds *s)
> +{
> +    struct stats one_min;
> +    struct stats five_min;
> +    struct stats ten_min;
> +    ds_put_format(s, "Statistics for '%s'\n", name);
> +
> +    get_stats(&perf->vec, 60000, &one_min);
> +    get_stats(&perf->vec, 300000, &five_min);
> +    get_stats(&perf->vec, 600000, &ten_min);
> +
> +    format_stats(s, "1 minute", &one_min);
> +    format_stats(s, "5 minute", &five_min);
> +    format_stats(s, "10 minute", &ten_min);
> +}
> +
> +static bool
> +performance_show_protected(int argc, const char *argv[], struct ds *s)
> +{
> +    struct performance *perf;
> +
> +    if (argc > 1) {
> +        perf = shash_find_data(&performances, argv[1]);
> +        if (!perf) {
> +            ds_put_cstr(s, "No such performance");
> +            return false;
> +        }
> +        performance_print(perf, argv[1], s);
> +    } else {
> +        struct shash_node *node;
> +        SHASH_FOR_EACH (node, &performances) {
> +            perf = node->data;
> +            performance_print(perf, node->name, s);
> +        }
> +    }
> +
> +    return true;
> +}
> +
> +static void
> +performance_show(struct unixctl_conn *conn, int argc OVS_UNUSED,
> +        const char *argv[], void *ignore OVS_UNUSED)
> +{
> +    struct ds s = DS_EMPTY_INITIALIZER;
> +    bool success;
> +
> +    ovs_mutex_lock(&performances_lock);
> +    success = performance_show_protected(argc, argv, &s);
> +    ovs_mutex_unlock(&performances_lock);
> +
> +    if (success) {
> +        unixctl_command_reply(conn, ds_cstr(&s));
> +    } else {
> +        unixctl_command_reply_error(conn, ds_cstr(&s));
> +    }
> +    ds_destroy(&s);
> +}
> +
> +static void
> +do_init_performance(void)
> +{
> +    unixctl_command_register("performance/show", "[NAME]", 0, 1,
> +                             performance_show, NULL);
> +}
> +
> +static void
> +performance_init(void)
> +{
> +    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
> +    if (ovsthread_once_start(&once)) {
> +        do_init_performance();
> +        ovsthread_once_done(&once);
> +    }
> +}
> +
> +void
> +performance_create(const char *name)
> +{
> +    performance_init();
> +
> +    struct performance *perf = xzalloc(sizeof *perf);
> +    timer_set_duration(&perf->timer, PERFORMANCE_INTERVAL);
> +    ovs_mutex_lock(&performances_lock);
> +    shash_add(&performances, name, perf);
> +    ovs_mutex_unlock(&performances_lock);
> +}
> +
> +void
> +performance_destroy(const char *name)
> +{
> +    struct performance *perf;
> +
> +    ovs_mutex_lock(&performances_lock);
> +    perf = shash_find_and_delete(&performances, name);
> +    ovs_mutex_unlock(&performances_lock);
> +
> +    if (perf) {
> +        free(perf->vec.samples);
> +    }
> +
> +    free(perf);
> +}
> +
> +static bool
> +performance_start_sample_protected(const char *name)
> +{
> +    struct performance *perf = shash_find_data(&performances, name);
> +    if (!perf) {
> +        return false;
> +    }
> +
> +    /* We already started sampling. Need an end before
> +     * we start another sample
> +     */
> +    if (perf->cur_sample.start_time) {
> +        return false;
> +    }
> +
> +    perf->cur_sample.start_time = time_msec();
> +    return true;
> +}
> +
> +bool
> +performance_start_sample(const char *name)
> +{
> +    ovs_mutex_lock(&performances_lock);
> +    bool ret = performance_start_sample_protected(name);
> +    ovs_mutex_unlock(&performances_lock);
> +
> +    return ret;
> +}
> +
> +static bool
> +performance_end_sample_protected(const char *name)
> +{
> +    struct performance *perf = shash_find_data(&performances, name);
> +    if (!perf) {
> +        return false;
> +    }
> +
> +    /* We can't end a sample if we haven't started one */
> +    if (!perf->cur_sample.start_time) {
> +        return false;
> +    }
> +
> +    perf->cur_sample.end_time = time_msec();
> +    perf->cur_sample.elapsed = perf->cur_sample.end_time
> +        - perf->cur_sample.start_time;
> +
> +    add_sample(&perf->vec, &perf->cur_sample);
> +
> +    memset(&perf->cur_sample, 0, sizeof perf->cur_sample);
> +    return true;
> +}
> +
> +bool
> +performance_end_sample(const char *name)
> +{
> +    ovs_mutex_lock(&performances_lock);
> +    bool ret = performance_end_sample_protected(name);
> +    ovs_mutex_unlock(&performances_lock);
> +
> +    return ret;
> +}
> +
> +static void
> +performance_run_protected(void)
> +{
> +    struct shash_node *node;
> +    SHASH_FOR_EACH (node, &performances) {
> +        struct performance *perf = node->data;
> +        if (!timer_expired(&perf->timer)) {
> +            timer_wait(&perf->timer);
> +            continue;
> +        }
> +
> +        cull_old_times(&perf->vec, PERFORMANCE_OLDEST);
> +
> +        timer_set_duration(&perf->timer, perf->sample_rate);
> +        timer_wait(&perf->timer);
> +    }
> +}
> +
> +void
> +performance_run(void)
> +{
> +    ovs_mutex_lock(&performances_lock);
> +    performance_run_protected();
> +    ovs_mutex_unlock(&performances_lock);
> +}
> diff --git a/lib/performance.h b/lib/performance.h
> new file mode 100644
> index 000000000..ec115a88c
> --- /dev/null
> +++ b/lib/performance.h
> @@ -0,0 +1,58 @@
> +/* Copyright (c) 2017 Red Hat, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifndef PERFORMANCE_H
> +#define PERFORMANCE_H 1
> +
> +#include <stdbool.h>
> +
> +/* This file provides a method for timing operations in OVS.
> + *
> + * The expected operation is as follows:
> + * 1) Call performance_create(), supplying a unique name for the
> + *    operation that is being measured.
> + * 2) Place calls to start_sample() and end_sample() at the beginning
> + *    and end of the operation you wish to measure.
> + * 3) Periodically call performance_run() so that statistics can be
> + *    gathered based on the recorded times. If you are running a loop,
> + *    calling this once per loop is a good practice.
> + * 4) When complete, call performance_destroy() to clean up.
> + *
> + * This implementation is thread-safe.
> + */
> +
> +/* Create a new performance measurement tracker*/
> +void performance_create(const char *name);
> +
> +/* Destroy a performance measurement tracker */
> +void performance_destroy(const char *name);
> +
> +/* Indicate that a performance measurement is beginning. */
> +bool performance_start_sample(const char *name);
> +
> +/* Indicate that a performance measurement has ended. The
> + * sample will be added to the history of performance
> + * measurements for this tracker
> + */
> +bool performance_end_sample(const char *name);
> +
> +/* Perform periodic duties on all performance measurement
> + * trackers. Currently, this just takes care of culling old
> + * measurements. It is a good practice to call this periodically
> + * from a daemon's main loop
> + */
> +void performance_run(void);
> +
> +#endif /* performance.h */
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2 1/2] Add performance measuring API

Reply via email to