Re: [ovs-dev] [PATCH v6 2/5] Measure timing of ovn-controller loops.

Han Zhou Wed, 21 Mar 2018 14:34:02 -0700

(Add the dev list back)

On Wed, Mar 21, 2018 at 1:48 PM, Mark Michelson <[email protected]> wrote:
>
> On 03/15/2018 12:04 PM, Han Zhou wrote:
>>
>>
>>
>> On Wed, Mar 7, 2018 at 9:01 AM, Mark Michelson <[email protected]
<mailto:[email protected]>> wrote:
>>  >
>>  > This modifies ovn-controller to measure the amount of time it takes to
>>  > detect a change in the southbound database and generate the resulting
>>  > flow table. This may require multiple iterations of the ovn-controller
>>  > loop.
>>  >
>>  > The statistics can be queried using:
>>  >
>>  > ovs-appctl -t ovn-controller stopwatch/show ovn-controller-loop
>>  >
>>  > The statistics can be reset using:
>>  >
>>  > ovs-appctl -t ovn-controller stopwatch/reset ovn-controller-loop
>>  >
>> If the purpose of this patch is for measuring *SB changes* handling time
only, the statistics may be renamed to something like
ovn-controller-loop-sb. Otherwise it is a little bit misleading since the
controller loop handles many other things, such as local interface changes,
Packet-ins, OF messages, etc.
>
>
> Sure, this sounds reasonable. I'll talk about this a bit more down below,
though.
>
>>
>>  > Signed-off-by: Mark Michelson <[email protected] <mailto:
[email protected]>>
>>
>>  > ---
>>  >  ovn/controller/ovn-controller.c | 17 +++++++++++++++++
>>  >  1 file changed, 17 insertions(+)
>>  >
>>  > diff --git a/ovn/controller/ovn-controller.c
b/ovn/controller/ovn-controller.c
>>  > index 7592bda25..abd253fca 100644
>>  > --- a/ovn/controller/ovn-controller.c
>>  > +++ b/ovn/controller/ovn-controller.c
>>  > @@ -57,6 +57,9 @@
>>  >  #include "stream.h"
>>  >  #include "unixctl.h"
>>  >  #include "util.h"
>>  > +#include "timeval.h"
>>  > +#include "timer.h"
>>  > +#include "stopwatch.h"
>>  >
>>  >  VLOG_DEFINE_THIS_MODULE(main);
>>  >
>>  > @@ -67,6 +70,8 @@ static unixctl_cb_func inject_pkt;
>>  >  #define DEFAULT_BRIDGE_NAME "br-int"
>>  >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
>>  >
>>  > +#define CONTROLLER_LOOP_STOPWATCH_NAME "ovn-controller-loop"
>>  > +
>>  >  static void update_probe_interval(struct controller_ctx *,
>>  >                                    const char *ovnsb_remote);
>>  >  static void parse_options(int argc, char *argv[]);
>>  > @@ -639,8 +644,10 @@ main(int argc, char *argv[])
>>  >      unixctl_command_register("inject-pkt", "MICROFLOW", 1, 1,
inject_pkt,
>>  >                               &pending_pkt);
>>  >
>>  > +    stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
>>  >      /* Main loop. */
>>  >      exiting = false;
>>  > +    unsigned int our_seqno = 0;
>>  >      while (!exiting) {
>>  >          /* Check OVN SB database. */
>>  >          char *new_ovnsb_remote = get_ovnsb_remote(ovs_idl_loop.idl);
>>  > @@ -659,6 +666,12 @@ main(int argc, char *argv[])
>>  >              .ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
>>  >          };
>>  >
>>  > +        if (our_seqno != ovsdb_idl_get_seqno(ctx.ovnsb_idl)) {
>>  > +            stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
>>  > +                            time_msec());
>> When multiple iterations are needed for one SB change handling, the
stopwatch can be started before the previous one is stopped, if seqno
changes in between. Could you explain the consideration here?
>
>
> That actually cannot happen. Check out stopwatch_start_sample_protected()
in lib/stopwatch.c. If the stopwatch currently has a sample in progress,
then the start operation immediately returns and will not overwrite the
previously recorded start time. The only way that the start time can be
recorded is if there is no sample currently in progress.


Agree. So some of the SB change processing time are silently ignored, but
it is totally fine since it is just for statistics, and it is ok to ignore
some samples in rare situation :)
>
>>  > +            our_seqno = ovsdb_idl_get_seqno(ctx.ovnsb_idl);
>>  > +        }
>>  > +
>>  >          update_probe_interval(&ctx, ovnsb_remote);
>>  >
>>  >          update_ssl_config(ctx.ovs_idl);
>>  > @@ -728,6 +741,9 @@ main(int argc, char *argv[])
>>  >                      ofctrl_put(&flow_table, &pending_ct_zones,
>>  >                                 get_nb_cfg(ctx.ovnsb_idl));
>>  >
>>  > +                    stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,
>>  > +                                   time_msec());
>>  > +
>>
>> Two issues here:
>>
>> Firstly, we start the stopwatch only when seqno changes, but we will
always call the stop, which may result in many unnecessary messages to the
stopwatch thread. Maybe it is not a big issue, but it would be better if we
have a variable to control it to stop only when necessary.
>
>
> This is a good point. I had been viewing the extra stopwatch_stop() calls
as being inconsequential since they would be eventual no-ops. However,
you're definitely right that it leads to extra processing that we can
easily avoid by keeping track of whether we have a sample in progress.
>
>>
>> Secondly, when multiple iterations are needed for one SB change
handling, the iterations will be like:
>> - iter1:
>>      - stopwatch_start
>>      - ofctrl_put
>>      - stopwatch stop
>> - iter2:
>>      - ofctrl_run
>>      - (ofctrl_put skipped)
>> ...
>> - iterN:
>>      - ofctrl_run
>>      - ofctrl_put (will not send to OVS, since desired and installed
flows should be same now)
>>      - stopwatch stop
>>
>> So putting stopwatch_stop here still measures only the first iteration.
I had a suggestion earlier about checking ofctrl.cur_cfg, but I was wrong
since I realized that nb_cfg changes only upon user requesting waiting for
HV. To really measure the multi-iteration processing, we need a mechanism
that ofctrl takes current SB seqno as input and maintains the cur_seq_no
just like how cur_cfg is maintained.
>
>
> Thanks for the detailed explanation. I think I understand better the
position you are coming from. Let me first explain what I had been thinking.
>
> - iter1:
>     - SB seqno has changed
>     - stopwatch_start()
>     - ofctrl_can_put() returns false, so no stopwatch_stop call
ofctrl_can_put() return false means the SB change detected in this
iteration can't even be started to get added to ovs.
> ...
> - iterN:
>     - ofctrl_can_put() returns true
ofctrl_can_put() returns true means the last SB change now can get added to
OVS, but it may just get started, and it is not completed yet when multiple
iterations are needed.
>     - stopwatch_stop()
>
> On iterations 2 through N, it does not matter if the SB seqno has changed
or not. The stopwatch_start() will be a no-op since there is already a
sample being measured. The change to the southbound database that we
observed in iteration 1 gets timed until iteration N.
>
> The question here is whether my poor naming of the measurement has
created confusion. In other words, am I expecting a measurement of one
thing while you expect the measurement of something else? Or is the logic
being used here completely incorrect?
>
> What I was intending to measure was the time it takes to see a change in
the southbound database, create the resultant flows, and send those to
vswitchd. I believe what you're thinking about is just the latter two steps.

To measure the time it takes to "see a change in the southbound database,
create the resultant flows, and send those to vswitchd", I think we need to
consider the iterations in my previous description. ofctrl_can_put()
returns true means the flow sending to vswitchd starts in this iteration
for sure, but it may or may not finish in this same iteration. If
oftrl_put() finishes sending the flow in this same iteration, your current
logic works perfectly. Otherwise, if multiple iterations are needed, it is
ofctrl_run() in the next iterations that completes the flow sendings to
vswitchd. The iterations you described is not a start-to-end cycle of
SB change handling (please see my comment inlined).

As you see, to measure a real complete cycle of "SB change - flow compute -
adding to ovs" is a little bit tricky, and ofctrl_can_put() itself doesn't
provide all the information, so that's why I proposed to change the
scope/rename the statistics of this patch just to measure the controller
loop time which is straighforward, and then have the *current seq_no*
mechanism implemented in ofctrl later so that we can measure the "SB change
- flow compute - adding to ovs" cycle. Hope I didn't make it more confusing
:)

Thanks,
Han

>
>>  >                      hmap_destroy(&flow_table);
>>  >                  }
>>  >                  if (ctx.ovnsb_idl_txn) {
>>  > @@ -792,6 +808,7 @@ main(int argc, char *argv[])
>>  >              ofctrl_wait();
>>  >              pinctrl_wait(&ctx);
>>  >          }
>>  > +
>>  >          ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop);
>>  >
>>  >          if (ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop) == 1) {
>>  > --
>>  > 2.14.3
>>  >
>>  > _______________________________________________
>>  > dev mailing list
>>  > [email protected] <mailto:[email protected]>
>>  > https://mail.openvswitch.org/mailman/listinfo/ovs-dev <
https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
>>
>> Acked-by: Han Zhou <[email protected] <mailto:[email protected]>>
>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v6 2/5] Measure timing of ovn-controller loops.

Reply via email to