Sorry for the delayed response on this, but things are a bit busy/weird
for me right now.
On 03/21/2018 04:33 PM, Han Zhou wrote:
(Add the dev list back)
On Wed, Mar 21, 2018 at 1:48 PM, Mark Michelson <mmich...@redhat.com
<mailto:mmich...@redhat.com>> wrote:
>
> On 03/15/2018 12:04 PM, Han Zhou wrote:
>>
>>
>>
>> On Wed, Mar 7, 2018 at 9:01 AM, Mark Michelson <mmich...@redhat.com
<mailto:mmich...@redhat.com> <mailto:mmich...@redhat.com
<mailto:mmich...@redhat.com>>> wrote:
>> >
>> > This modifies ovn-controller to measure the amount of time it
takes to
>> > detect a change in the southbound database and generate the resulting
>> > flow table. This may require multiple iterations of the
ovn-controller
>> > loop.
>> >
>> > The statistics can be queried using:
>> >
>> > ovs-appctl -t ovn-controller stopwatch/show ovn-controller-loop
>> >
>> > The statistics can be reset using:
>> >
>> > ovs-appctl -t ovn-controller stopwatch/reset ovn-controller-loop
>> >
>> If the purpose of this patch is for measuring *SB changes* handling
time only, the statistics may be renamed to something like
ovn-controller-loop-sb. Otherwise it is a little bit misleading since
the controller loop handles many other things, such as local interface
changes, Packet-ins, OF messages, etc.
>
>
> Sure, this sounds reasonable. I'll talk about this a bit more down
below, though.
>
>>
>> > Signed-off-by: Mark Michelson <mmich...@redhat.com
<mailto:mmich...@redhat.com> <mailto:mmich...@redhat.com
<mailto:mmich...@redhat.com>>>
>>
>> > ---
>> > ovn/controller/ovn-controller.c | 17 +++++++++++++++++
>> > 1 file changed, 17 insertions(+)
>> >
>> > diff --git a/ovn/controller/ovn-controller.c
b/ovn/controller/ovn-controller.c
>> > index 7592bda25..abd253fca 100644
>> > --- a/ovn/controller/ovn-controller.c
>> > +++ b/ovn/controller/ovn-controller.c
>> > @@ -57,6 +57,9 @@
>> > #include "stream.h"
>> > #include "unixctl.h"
>> > #include "util.h"
>> > +#include "timeval.h"
>> > +#include "timer.h"
>> > +#include "stopwatch.h"
>> >
>> > VLOG_DEFINE_THIS_MODULE(main);
>> >
>> > @@ -67,6 +70,8 @@ static unixctl_cb_func inject_pkt;
>> > #define DEFAULT_BRIDGE_NAME "br-int"
>> > #define DEFAULT_PROBE_INTERVAL_MSEC 5000
>> >
>> > +#define CONTROLLER_LOOP_STOPWATCH_NAME "ovn-controller-loop"
>> > +
>> > static void update_probe_interval(struct controller_ctx *,
>> > const char *ovnsb_remote);
>> > static void parse_options(int argc, char *argv[]);
>> > @@ -639,8 +644,10 @@ main(int argc, char *argv[])
>> > unixctl_command_register("inject-pkt", "MICROFLOW", 1, 1,
inject_pkt,
>> > &pending_pkt);
>> >
>> > + stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
>> > /* Main loop. */
>> > exiting = false;
>> > + unsigned int our_seqno = 0;
>> > while (!exiting) {
>> > /* Check OVN SB database. */
>> > char *new_ovnsb_remote = get_ovnsb_remote(ovs_idl_loop.idl);
>> > @@ -659,6 +666,12 @@ main(int argc, char *argv[])
>> > .ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
>> > };
>> >
>> > + if (our_seqno != ovsdb_idl_get_seqno(ctx.ovnsb_idl)) {
>> > + stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
>> > + time_msec());
>> When multiple iterations are needed for one SB change handling, the
stopwatch can be started before the previous one is stopped, if seqno
changes in between. Could you explain the consideration here?
>
>
> That actually cannot happen. Check out
stopwatch_start_sample_protected() in lib/stopwatch.c. If the stopwatch
currently has a sample in progress, then the start operation immediately
returns and will not overwrite the previously recorded start time. The
only way that the start time can be recorded is if there is no sample
currently in progress.
Agree. So some of the SB change processing time are silently ignored,
but it is totally fine since it is just for statistics, and it is ok to
ignore some samples in rare situation :)
>
>> > + our_seqno = ovsdb_idl_get_seqno(ctx.ovnsb_idl);
>> > + }
>> > +
>> > update_probe_interval(&ctx, ovnsb_remote);
>> >
>> > update_ssl_config(ctx.ovs_idl);
>> > @@ -728,6 +741,9 @@ main(int argc, char *argv[])
>> > ofctrl_put(&flow_table, &pending_ct_zones,
>> > get_nb_cfg(ctx.ovnsb_idl));
>> >
>> > + stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,
>> > + time_msec());
>> > +
>>
>> Two issues here:
>>
>> Firstly, we start the stopwatch only when seqno changes, but we will
always call the stop, which may result in many unnecessary messages to
the stopwatch thread. Maybe it is not a big issue, but it would be
better if we have a variable to control it to stop only when necessary.
>
>
> This is a good point. I had been viewing the extra stopwatch_stop()
calls as being inconsequential since they would be eventual no-ops.
However, you're definitely right that it leads to extra processing that
we can easily avoid by keeping track of whether we have a sample in
progress.
>
>>
>> Secondly, when multiple iterations are needed for one SB change
handling, the iterations will be like:
>> - iter1:
>> - stopwatch_start
>> - ofctrl_put
>> - stopwatch stop
>> - iter2:
>> - ofctrl_run
>> - (ofctrl_put skipped)
>> ...
>> - iterN:
>> - ofctrl_run
>> - ofctrl_put (will not send to OVS, since desired and installed
flows should be same now)
>> - stopwatch stop
>>
>> So putting stopwatch_stop here still measures only the first
iteration. I had a suggestion earlier about checking ofctrl.cur_cfg, but
I was wrong since I realized that nb_cfg changes only upon user
requesting waiting for HV. To really measure the multi-iteration
processing, we need a mechanism that ofctrl takes current SB seqno as
input and maintains the cur_seq_no just like how cur_cfg is maintained.
>
>
> Thanks for the detailed explanation. I think I understand better the
position you are coming from. Let me first explain what I had been thinking.
>
> - iter1:
> - SB seqno has changed
> - stopwatch_start()
> - ofctrl_can_put() returns false, so no stopwatch_stop call
ofctrl_can_put() return false means the SB change detected in this
iteration can't even be started to get added to ovs.
> ...
> - iterN:
> - ofctrl_can_put() returns true
ofctrl_can_put() returns true means the last SB change now can get added
to OVS, but it may just get started, and it is not completed yet when
multiple iterations are needed.
> - stopwatch_stop()
>
> On iterations 2 through N, it does not matter if the SB seqno has
changed or not. The stopwatch_start() will be a no-op since there is
already a sample being measured. The change to the southbound database
that we observed in iteration 1 gets timed until iteration N.
>
> The question here is whether my poor naming of the measurement has
created confusion. In other words, am I expecting a measurement of one
thing while you expect the measurement of something else? Or is the
logic being used here completely incorrect?
>
> What I was intending to measure was the time it takes to see a change
in the southbound database, create the resultant flows, and send those
to vswitchd. I believe what you're thinking about is just the latter two
steps.
To measure the time it takes to "see a change in the southbound
database, create the resultant flows, and send those to vswitchd", I
think we need to consider the iterations in my previous description.
ofctrl_can_put() returns true means the flow sending to vswitchd starts
in this iteration for sure, but it may or may not finish in this same
iteration. If oftrl_put() finishes sending the flow in this same
iteration, your current logic works perfectly. Otherwise, if multiple
iterations are needed, it is ofctrl_run() in the next iterations that
completes the flow sendings to vswitchd. The iterations you described is
not a start-to-end cycle of
SB change handling (please see my comment inlined).
As you see, to measure a real complete cycle of "SB change - flow
compute - adding to ovs" is a little bit tricky, and ofctrl_can_put()
itself doesn't provide all the information, so that's why I proposed to
change the scope/rename the statistics of this patch just to measure the
controller loop time which is straighforward, and then have the *current
seq_no* mechanism implemented in ofctrl later so that we can measure the
"SB change - flow compute - adding to ovs" cycle. Hope I didn't make it
more confusing :)
This makes a great amount of sense and is not confusing. Thank you.
I think the main flaw between what I actually coded and what I said is
the part about adding the flows to OVS. In reality, I'm way more
concerned about the time it takes to generate flows than anything else,
since that happens 100% in ovn-controller and is something that is good
to have measurements for. I can think of two things I can do here that
might be good:
1) Change the scope of the stopwatch (as well as the name) so that it
only operates within the ofctrl_can_put() conditional. This way, all it
is measuring is flow creation. I could call the stopwatch something like
"ovn-controller-flow-process" or something. This way, I'm zeroing in on
just the operation I think is a good one to be measuring. The southbound
database doesn't play into it, nor does passing the flows down to OVS.
2) Keep the stopwatch measurements basically where they are and change
the description of what I'm measuring. What it currently measures is the
flow creation for a specific change to the southbound database, plus
potentially passing it down to OVS. If I move the stopwatch_stop() call
to before ofctrl_put(), then I think I'll be measuring the time it takes
to process a specific change to the southbound database, but not taking
into account sending the flows to OVS.
Another option is to implement both (1) and (2). (1) can give a general
idea of how long it takes to generate flows, no matter the circumstance.
(2) can give a general idea of how long it takes to create flows based
on changes seen in the southbound database.
In either case, I think you're right that putting off the ofctrl seqno
changes until later.
What do you think?
Thanks,
Han
>
>> > hmap_destroy(&flow_table);
>> > }
>> > if (ctx.ovnsb_idl_txn) {
>> > @@ -792,6 +808,7 @@ main(int argc, char *argv[])
>> > ofctrl_wait();
>> > pinctrl_wait(&ctx);
>> > }
>> > +
>> > ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop);
>> >
>> > if (ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop) == 1) {
>> > --
>> > 2.14.3
>> >
>> > _______________________________________________
>> > dev mailing list
>> > d...@openvswitch.org <mailto:d...@openvswitch.org>
<mailto:d...@openvswitch.org <mailto:d...@openvswitch.org>>
>> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
<https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
>>
>> Acked-by: Han Zhou <hzh...@ebay.com <mailto:hzh...@ebay.com>
<mailto:hzh...@ebay.com <mailto:hzh...@ebay.com>>>
>
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev