Thanks for the detailed feedback, Ben. I'll work up a new revision that addresses what you've found here. There are a couple of in-line comments below.

On 01/04/2018 05:58 PM, Ben Pfaff wrote:
On Mon, Dec 18, 2017 at 04:51:37PM -0600, Mark Michelson wrote:
This is similar to the existing coverage and perf-counter APIs in OVS.
However, rather than keeping counters, this is aimed at timing how long
operations take to perform. "Operations" in this case can be anything
from a loop iteration, to a function, to something more complex.

The library will keep track of how long it takes to perform the
particular operations and will maintain statistics of those running
times.

Statistics for a particular operation can be queried from the command
line by using ovs-appctl -t <target> performance/show <operation name>.

The API is designed to be pretty small. The expected steps are as
follows:

1) Create a performance measurement, providing a unique name, using
performance_create()
2) Add calls to start_sample() and end_sample() to mark the start and
stop of the operation you wish to measure.
3) Periodically call performance_run() in order to compile statistics.
4) Upon completion (likely program shutdown), call performance_destroy()
to clean up.

Signed-off-by: Mark Michelson <[email protected]>

Thanks.  I guess that this will be useful from time to time.

It would be helpful to have a comment on each public function explaining
how to use it.  In particular, the meaning of parameters isn't
necessarily obvious (e.g. I guess that sample_rate is in msec?).

In performance_init(), usually we would use ovsthread_once instead of
raw pthread_once().

In performance_create(), you can use xzalloc() instead of xcalloc().

Usually we put struct and data definitions at the top of a file, unless
there's a good reason otherwise.  In this case, I'd move struct stats
and struct performance and 'performances' to the top of the file.

This doesn't look thread-safe; that would require a mutex (etc.) to
protect 'performances'.  Or it could be thread-local.

I think that 'performances' should be static.

Usually, in simple cases like this we put comments next to struct
members, e.g.:

struct sample {
     long long int start_time;   /* Time when we started this sample. */
     long long int end_time;     /* Time when we ended this sample. */
     long long int elapsed;      /* Elapsed time: end_time - start_time. */
};

It bothers me a little to see find_earliest use a size_t to iterate
through the samples and then return -1 to indicate failure.

There should probably be documentation on the precision here.  I think
it's 1 ms, so only fairly long kinds of activities can be timed with any
accuracy.

Maybe percentile() should document the valid range for 'percentile'.  In
particular, a 'percentile' of 100 will read past the end of the array.

I'd prefer if the names of start_sample() and end_sample() started with
performance_, for consistency.

sort_times() could use xmemdup().

Usually we line up function declarations more like this:
     static int
     get_stats(const struct sample_vec *vec, long long int age_ms,
               struct stats *stats)
with the second line of parameters indented just past the (.

In many cases, we've found that for "show" kinds of commands, it is
helpful to allow an empty set of parameters to show all of the entities
of that type.

unixctl_command_register() should pass "NAME" as the second argument, to
let the user know what parameter is expected (or "[NAME]", if you make
it optional).

Is there value in calculating stats periodically in performance_run()?
It looks to me like nothing is lost if they are calculated only when
requested via performance/show.

In this patch alone, no, there is no value in calculating stats in performance_run(). However, the next patch in the series stores the statistics in the database, meaning they could be queried outside of the performance/show command. (Note: I have not yet read your response to that patch.)


I think that running cull_old_times() in each call to performance_run()
means that we will normally cull off only a single element, which is
expensive due to the copying.  Maybe there should be a strategy to cull
less frequently so that on average maybe half of the elements are
culled at once.

It actually won't usually cull just one element. The thing to remember is that even though you may call performance_run() often, cull_old_times() runs only when the sample rate timer has expired. So if you're measuring the performance of something that runs constantly and typically finishes in under a second, and your sample rate is ten seconds, then old times are culled only every ten seconds, so a single cull can remove a whole batch of elements.

That all being said, I will still look at potentially optimizing this.


Thanks,

Ben.


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
