On Fri, Mar 23, 2018 at 3:22 PM, Mark Michelson <[email protected]> wrote:
>
> Sorry for the delayed response on this, but things are a bit busy/weird
for me right now.
>
> On 03/21/2018 04:33 PM, Han Zhou wrote:
>>
>> (Add the dev list back)
>>
>> On Wed, Mar 21, 2018 at 1:48 PM, Mark Michelson <[email protected]
<mailto:[email protected]>> wrote:
>>  >
>>  > On 03/15/2018 12:04 PM, Han Zhou wrote:
>>  >>
>>  >>
>>  >>
>>  >> On Wed, Mar 7, 2018 at 9:01 AM, Mark Michelson <[email protected]
<mailto:[email protected]> <mailto:[email protected] <mailto:
[email protected]>>> wrote:
>>  >>  >
>>  >>  > This modifies ovn-controller to measure the amount of time it
takes to
>>  >>  > detect a change in the southbound database and generate the
resulting
>>  >>  > flow table. This may require multiple iterations of the
ovn-controller
>>  >>  > loop.
>>  >>  >
>>  >>  > The statistics can be queried using:
>>  >>  >
>>  >>  > ovs-appctl -t ovn-controller stopwatch/show ovn-controller-loop
>>  >>  >
>>  >>  > The statistics can be reset using:
>>  >>  >
>>  >>  > ovs-appctl -t ovn-controller stopwatch/reset ovn-controller-loop
>>  >>  >
>>  >> If the purpose of this patch is for measuring *SB changes* handling
time only, the statistics may be renamed to something like
ovn-controller-loop-sb. Otherwise it is a little bit misleading since the
controller loop handles many other things, such as local interface changes,
Packet-ins, OF messages, etc.
>>  >
>>  >
>>  > Sure, this sounds reasonable. I'll talk about this a bit more down
below, though.
>>  >
>>  >>
>>  >>  > Signed-off-by: Mark Michelson <[email protected] <mailto:
[email protected]> <mailto:[email protected] <mailto:[email protected]
>>>
>>
>>  >>
>>  >>  > ---
>>  >>  >  ovn/controller/ovn-controller.c | 17 +++++++++++++++++
>>  >>  >  1 file changed, 17 insertions(+)
>>  >>  >
>>  >>  > diff --git a/ovn/controller/ovn-controller.c
b/ovn/controller/ovn-controller.c
>>  >>  > index 7592bda25..abd253fca 100644
>>  >>  > --- a/ovn/controller/ovn-controller.c
>>  >>  > +++ b/ovn/controller/ovn-controller.c
>>  >>  > @@ -57,6 +57,9 @@
>>  >>  >  #include "stream.h"
>>  >>  >  #include "unixctl.h"
>>  >>  >  #include "util.h"
>>  >>  > +#include "timeval.h"
>>  >>  > +#include "timer.h"
>>  >>  > +#include "stopwatch.h"
>>  >>  >
>>  >>  >  VLOG_DEFINE_THIS_MODULE(main);
>>  >>  >
>>  >>  > @@ -67,6 +70,8 @@ static unixctl_cb_func inject_pkt;
>>  >>  >  #define DEFAULT_BRIDGE_NAME "br-int"
>>  >>  >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
>>  >>  >
>>  >>  > +#define CONTROLLER_LOOP_STOPWATCH_NAME "ovn-controller-loop"
>>  >>  > +
>>  >>  >  static void update_probe_interval(struct controller_ctx *,
>>  >>  >                                    const char *ovnsb_remote);
>>  >>  >  static void parse_options(int argc, char *argv[]);
>>  >>  > @@ -639,8 +644,10 @@ main(int argc, char *argv[])
>>  >>  >      unixctl_command_register("inject-pkt", "MICROFLOW", 1, 1,
inject_pkt,
>>  >>  >                               &pending_pkt);
>>  >>  >
>>  >>  > +    stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
>>  >>  >      /* Main loop. */
>>  >>  >      exiting = false;
>>  >>  > +    unsigned int our_seqno = 0;
>>  >>  >      while (!exiting) {
>>  >>  >          /* Check OVN SB database. */
>>  >>  >          char *new_ovnsb_remote =
get_ovnsb_remote(ovs_idl_loop.idl);
>>  >>  > @@ -659,6 +666,12 @@ main(int argc, char *argv[])
>>  >>  >              .ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
>>  >>  >          };
>>  >>  >
>>  >>  > +        if (our_seqno != ovsdb_idl_get_seqno(ctx.ovnsb_idl)) {
>>  >>  > +            stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
>>  >>  > +                            time_msec());
>>  >> When multiple iterations are needed for one SB change handling, the
stopwatch can be started before the previous one is stopped, if seqno
changes in between. Could you explain the consideration here?
>>  >
>>  >
>>  > That actually cannot happen. Check out
stopwatch_start_sample_protected() in lib/stopwatch.c. If the stopwatch
currently has a sample in progress, then the start operation immediately
returns and will not overwrite the previously recorded start time. The only
way that the start time can be recorded is if there is no sample currently
in progress.
>>
>> Agree. So some of the SB change processing time are silently ignored,
but it is totally fine since it is just for statistics, and it is ok to
ignore some samples in rare situation :)
>>  >
>>  >>  > +            our_seqno = ovsdb_idl_get_seqno(ctx.ovnsb_idl);
>>  >>  > +        }
>>  >>  > +
>>  >>  >          update_probe_interval(&ctx, ovnsb_remote);
>>  >>  >
>>  >>  >          update_ssl_config(ctx.ovs_idl);
>>  >>  > @@ -728,6 +741,9 @@ main(int argc, char *argv[])
>>  >>  >                      ofctrl_put(&flow_table, &pending_ct_zones,
>>  >>  >                                 get_nb_cfg(ctx.ovnsb_idl));
>>  >>  >
>>  >>  > +
 stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,
>>  >>  > +                                   time_msec());
>>  >>  > +
>>  >>
>>  >> Two issues here:
>>  >>
>>  >> Firstly, we start the stopwatch only when seqno changes, but we will
always call the stop, which may result in many unnecessary messages to the
stopwatch thread. Maybe it is not a big issue, but it would be better if we
have a variable to control it to stop only when necessary.
>>  >
>>  >
>>  > This is a good point. I had been viewing the extra stopwatch_stop()
calls as being inconsequential since they would be eventual no-ops.
However, you're definitely right that it leads to extra processing that we
can easily avoid by keeping track of whether we have a sample in progress.
>>  >
>>  >>
>>  >> Secondly, when multiple iterations are needed for one SB change
handling, the iterations will be like:
>>  >> - iter1:
>>  >>      - stopwatch_start
>>  >>      - ofctrl_put
>>  >>      - stopwatch stop
>>  >> - iter2:
>>  >>      - ofctrl_run
>>  >>      - (ofctrl_put skipped)
>>  >> ...
>>  >> - iterN:
>>  >>      - ofctrl_run
>>  >>      - ofctrl_put (will not send to OVS, since desired and installed
flows should be same now)
>>  >>      - stopwatch stop
>>  >>
>>  >> So putting stopwatch_stop here still measures only the first
iteration. I had a suggestion earlier about checking ofctrl.cur_cfg, but I
was wrong since I realized that nb_cfg changes only upon user requesting
waiting for HV. To really measure the multi-iteration processing, we need a
mechanism that ofctrl takes current SB seqno as input and maintains the
cur_seq_no just like how cur_cfg is maintained.
>>  >
>>  >
>>  > Thanks for the detailed explanation. I think I understand better the
position you are coming from. Let me first explain what I had been thinking.
>>  >
>>  > - iter1:
>>  >     - SB seqno has changed
>>  >     - stopwatch_start()
>>  >     - ofctrl_can_put() returns false, so no stopwatch_stop call
>> ofctrl_can_put() return false means the SB change detected in this
iteration can't even be started to get added to ovs.
>>  > ...
>>  > - iterN:
>>  >     - ofctrl_can_put() returns true
>> ofctrl_can_put() returns true means the last SB change now can get added
to OVS, but it may just get started, and it is not completed yet when
multiple iterations are needed.
>>  >     - stopwatch_stop()
>>  >
>>  > On iterations 2 through N, it does not matter if the SB seqno has
changed or not. The stopwatch_start() will be a no-op since there is
already a sample being measured. The change to the southbound database that
we observed in iteration 1 gets timed until iteration N.
>>  >
>>  > The question here is whether my poor naming of the measurement has
created confusion. In other words, am I expecting a measurement of one
thing while you expect the measurement of something else? Or is the logic
being used here completely incorrect?
>>  >
>>  > What I was intending to measure was the time it takes to see a change
in the southbound database, create the resultant flows, and send those to
vswitchd. I believe what you're thinking about is just the latter two steps.
>>
>> To measure the time it takes to "see a change in the southbound
database, create the resultant flows, and send those to vswitchd", I think
we need to consider the iterations in my previous description.
ofctrl_can_put() returns true means the flow sending to vswitchd starts in
this iteration for sure, but it may or may not finish in this same
iteration. If oftrl_put() finishes sending the flow in this same iteration,
your current logic works perfectly. Otherwise, if multiple iterations are
needed, it is ofctrl_run() in the next iterations that completes the flow
sendings to vswitchd. The iterations you described is not a start-to-end
cycle of
>> SB change handling (please see my comment inlined).
>>
>> As you see, to measure a real complete cycle of "SB change - flow
compute - adding to ovs" is a little bit tricky, and ofctrl_can_put()
itself doesn't provide all the information, so that's why I proposed to
change the scope/rename the statistics of this patch just to measure the
controller loop time which is straighforward, and then have the *current
seq_no* mechanism implemented in ofctrl later so that we can measure the
"SB change - flow compute - adding to ovs" cycle. Hope I didn't make it
more confusing :)
>
>
> This makes a great amount of sense and is not confusing. Thank you.
>
> I think the main flaw between what I actually coded and what I said is
the part about adding the flows to OVS. In reality, I'm way more concerned
about the time it takes to generate flows than anything else, since that
happens 100% in ovn-controller and is something that is good to have
measurements for. I can think of two things I can do here that might be
good:
>
> 1) Change the scope of the stopwatch (as well as the name) so that it
only operates within the ofctrl_can_put() conditional. This way, all it is
measuring is flow creation. I could call the stopwatch something like
"ovn-controller-flow-process" or something. This way, I'm zeroing in on
just the operation I think is a good one to be measuring. The southbound
database doesn't play into it, nor does passing the flows down to OVS.
>
> 2) Keep the stopwatch measurements basically where they are and change
the description of what I'm measuring. What it currently measures is the
flow creation for a specific change to the southbound database, plus
potentially passing it down to OVS. If I move the stopwatch_stop() call to
before ofctrl_put(), then I think I'll be measuring the time it takes to
process a specific change to the southbound database, but not taking into
account sending the flows to OVS.
>
> Another option is to implement both (1) and (2). (1) can give a general
idea of how long it takes to generate flows, no matter the circumstance.
(2) can give a general idea of how long it takes to create flows based on
changes seen in the southbound database.
>
> In either case, I think you're right that putting off the ofctrl seqno
changes until later.
>
> What do you think?
>
Thanks Mark. I think 1) sounds better because it will be more clear for
what we are measuring. We may add more measures later on for other
scenarios.

Han
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to