On Tue, Mar 12, 2019 at 9:22 AM Numan Siddique <[email protected]> wrote:
>
>
>
> On Tue, Mar 12, 2019 at 6:16 PM Mark Michelson <[email protected]> wrote:
>>
>> On 3/11/19 6:11 PM, Han Zhou wrote:
>> > On Mon, Mar 11, 2019 at 9:29 AM <[email protected]> wrote:
>> >>
>> >> From: Numan Siddique <[email protected]>
>> >>
>> >> Prior to this patch, ovn-controller was single threaded and everytime the
>> >> poll_block() at the end of the main while() loop wakes up, it  processes
>> >> the whole SB DB and translates the logical flows to OF flows.
>> >>
>> >> There are few issues with this -
>> >>
>> >>    * For every packet-in received, ovn-controller does this translation
>> >>      resulting in unnecessary CPU cycles.
>> >>
>> >>    * If the translation takes a lot of time, then the packet-in handling
>> >>      would get delayed. The delay in responses to DHCP requests could
>> >>      result in resending of these requests.
>> >>
>> >> This patch addresses these issues by creating a new pthread in pinctrl 
>> >> module
>> >> to handle packet-ins. This thread doesn't access the Southbound DB IDL 
>> >> object.
>> >>
>> >> Since some of the OVN actions - like dns_lookup, arp, put_arp, put_nd
>> >> require access to the Southbound DB contents and gARPs, periodic IPv6 RA
>> >> generation also requires the DB access, pinctrl_run() called by the main
>> >> ovn-controller thread accesses the Southbound DB IDL and builds the local
>> >> datastructures. pinctrl_handler thread accesses these data structures
>> >> in handling such requests. An ovs_mutex is used between the pinctr_run() 
>> >> and
>> >> the pinctrl_handler thread to protect these data structures.
>> >>
>> >> Signed-off-by: Numan Siddique <[email protected]>
>> >> ---
>> >>   ovn/controller/pinctrl.c | 681 ++++++++++++++++++++++++++++++---------
>> >>   1 file changed, 532 insertions(+), 149 deletions(-)
>> >>
>> >> diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
>> >> index 100a20ff2..857cb5d51 100644
>> >> --- a/ovn/controller/pinctrl.c
>> >> +++ b/ovn/controller/pinctrl.c
>> >> @@ -27,6 +27,7 @@
>> >>   #include "lport.h"
>> >>   #include "nx-match.h"
>> >>   #include "ovn-controller.h"
>> >> +#include "latch.h"
>> >>   #include "lib/packets.h"
>> >>   #include "lib/sset.h"
>> >>   #include "openvswitch/ofp-actions.h"
>> >> @@ -48,19 +49,13 @@
>> >>   #include "openvswitch/poll-loop.h"
>> >>   #include "openvswitch/rconn.h"
>> >>   #include "socket-util.h"
>> >> +#include "seq.h"
>> >>   #include "timeval.h"
>> >>   #include "vswitch-idl.h"
>> >>   #include "lflow.h"
>> >>
>> >>   VLOG_DEFINE_THIS_MODULE(pinctrl);
>> >>
>> >> -/* OpenFlow connection to the switch. */
>> >> -static struct rconn *swconn;
>> >> -
>> >> -/* Last seen sequence number for 'swconn'.  When this differs from
>> >> - * rconn_get_connection_seqno(rconn), 'swconn' has reconnected. */
>> >> -static unsigned int conn_seq_no;
>> >> -
>> >>   static void init_buffered_packets_map(void);
>> >>   static void destroy_buffered_packets_map(void);
>> >>
>> >> @@ -76,11 +71,14 @@ static void run_put_mac_bindings(
>> >>       struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip);
>> >>   static void wait_put_mac_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn);
>> >>   static void flush_put_mac_bindings(void);
>> >> +static void buffer_put_mac_bindings(void);
>> >> +static void destroy_buffered_mac_bindings(void);
>> >> +static void send_mac_binding_buffered_pkts(struct rconn *swconn);
>> >>
>> >>   static void init_send_garps(void);
>> >>   static void destroy_send_garps(void);
>> >>   static void send_garp_wait(void);
>> >> -static void send_garp_run(
>> >> +static void send_garp_prepare(
>> >>       struct ovsdb_idl_index *sbrec_chassis_by_name,
>> >>       struct ovsdb_idl_index *sbrec_port_binding_by_datapath,
>> >>       struct ovsdb_idl_index *sbrec_port_binding_by_name,
>> >> @@ -88,45 +86,144 @@ static void send_garp_run(
>> >>       const struct sbrec_chassis *,
>> >>       const struct hmap *local_datapaths,
>> >>       const struct sset *active_tunnels);
>> >> -static void pinctrl_handle_nd_na(const struct flow *ip_flow,
>> >> +static void send_garp_run(struct rconn *swconn);
>> >> +static void pinctrl_handle_nd_na(struct rconn *swconn,
>> >> +                                 const struct flow *ip_flow,
>> >>                                    const struct match *md,
>> >>                                    struct ofpbuf *userdata,
>> >>                                    bool is_router);
>> >>   static void reload_metadata(struct ofpbuf *ofpacts,
>> >>                               const struct match *md);
>> >>   static void pinctrl_handle_put_nd_ra_opts(
>> >> +    struct rconn *swconn,
>> >>       const struct flow *ip_flow, struct dp_packet *pkt_in,
>> >>       struct ofputil_packet_in *pin, struct ofpbuf *userdata,
>> >>       struct ofpbuf *continuation);
>> >> -static void pinctrl_handle_nd_ns(const struct flow *ip_flow,
>> >> +static void pinctrl_handle_nd_ns(struct rconn *swconn,
>> >> +                                 const struct flow *ip_flow,
>> >>                                    struct dp_packet *pkt_in,
>> >>                                    const struct match *md,
>> >>                                    struct ofpbuf *userdata);
>> >>   static void init_ipv6_ras(void);
>> >>   static void destroy_ipv6_ras(void);
>> >>   static void ipv6_ra_wait(void);
>> >> -static void send_ipv6_ras(
>> >> +static void prepare_ipv6_ras(
>> >>       struct ovsdb_idl_index *sbrec_port_binding_by_datapath,
>> >>       struct ovsdb_idl_index *sbrec_port_binding_by_name,
>> >>       const struct hmap *local_datapaths);
>> >> -;
>> >> +static void send_ipv6_ras(struct rconn *swconn);
>> >>
>> >>   COVERAGE_DEFINE(pinctrl_drop_put_mac_binding);
>> >>   COVERAGE_DEFINE(pinctrl_drop_buffered_packets_map);
>> >>
>> >> +/* pinctrl module creates a thread - pinctrl_handler to handle
>> >> + * the packet-ins from ovs-vswitchd. Some of the OVN actions
>> >> + * are translated to OF 'controller' actions. See include/ovn/actions.h
>> >> + * for more details.
>> >> + *
>> >> + * pinctrl_handler thread doesn't access the Southbound IDL object. But
>> >> + * some of the OVN actions which gets translated to 'controller'
>> >> + * OF action, require data from Southbound DB.  Below are the details
>> >> + * on how these actions are implemented.
>> >> + *
>> >> + * pinctrl_run() function is called by ovn-controller main thread.
>> >> + * A Mutex - 'pinctrl_mutex' is used between the pinctrl_handler() thread
>> >> + * and pinctrl_run().
>> >> + *
>> >> + *   - dns_lookup -     In order to do a DNS lookup, this action needs
>> >> + *                      to access the 'DNS' table. pinctrl_run() builds a
>> >> + *                      local DNS cache - 'dns_cache'. See 
>> >> sync_dns_cache()
>> >> + *                      for more details.
>> >> + *                      The function 'pinctrl_handle_dns_lookup()' 
>> >> (which is
>> >> + *                      called with in the pinctrl_handler thread) looks 
>> >> into
>> >> + *                      the local DNS cache to resolve the DNS requests.
>> >> + *
>> >> + *   - put_arp/put_nd - These actions stores the IPv4/IPv6 and MAC 
>> >> addresses
>> >> + *                      in the 'MAC_Binding' table.
>> >> + *                      The function 'pinctrl_handle_put_mac_binding()' 
>> >> (which
>> >> + *                      is called with in the pinctrl_handler thread), 
>> >> stores
>> >> + *                      the IPv4/IPv6 and MAC addresses in the
>> >> + *                      hmap - put_mac_bindings.
>> >> + *
>> >> + *                      pinctrl_run(), reads these mac bindings from the 
>> >> hmap
>> >> + *                      'put_mac_bindings' and writes to the 
>> >> 'MAC_Binding'
>> >> + *                      table in the Southbound DB.
>> >> + *
>> >> + *   - arp/nd_ns      - These actions generate an ARP/IPv6 Neighbor 
>> >> solicit
>> >> + *                      requests. The original packets are buffered and
>> >> + *                      injected back when put_arp/put_nd actions are 
>> >> called.
>> >> + *                      When pinctrl_run(), writes the mac bindings from 
>> >> the
>> >> + *                      'put_mac_bindings' hmap, it moves these mac 
>> >> bindings
>> >> + *                      to another hmap - 'buffered_mac_bindings'.
>> >> + *
>> >> + *                      The pinctrl_handler thread calls the function -
>> >> + *                      send_mac_binding_buffered_pkts(), which uses
>> >> + *                      the hmap - 'buffered_mac_bindings' and reinjects 
>> >> the
>> >> + *                      buffered packets.
>> >> + *
>> >> + * pinctrl module also periodically sends IPv6 Router Solicitation 
>> >> requests
>> >> + * and gARPs (for the router gateway IPs and configured NAT addresses).
>> >> + *
>> >> + * IPv6 RA handling - pinctrl_run() prepares the IPv6 RA information
>> >> + *                    (see prepare_ipv6_ras()) in the shash 'ipv6_ras' by
>> >> + *                    looking into the Southbound DB table - 
>> >> Port_Binding.
>> >> + *
>> >> + *                    pinctrl_handler thread sends the periodic IPv6 RAs 
>> >> using
>> >> + *                    the shash - 'ipv6_ras'
>> >> + *
>> >> + * gARP handling    - pinctrl_run() prepares the gARP information
>> >> + *                    (see send_garp_prepare()) in the shash 
>> >> 'send_garp_data'
>> >> + *                    by looking into the Southbound DB table 
>> >> Port_Binding.
>> >> + *
>> >> + *                    pinctrl_handler() thread sends these gARPs using 
>> >> the
>> >> + *                    shash 'send_garp_data'.
>> >> + *
>> >> + * Notification between pinctrl_handler() and pinctrl_run()
>> >> + * -------------------------------------------------------
>> >> + * 'struct seq' is used for notification between pinctrl_handler() thread
>> >> + *  and pinctrl_run().
>> >> + *  'pinctrl_handler_seq' is used by pinctrl_run() to
>> >> + *  wake up pinctrl_handler thread from poll_block() if any changes 
>> >> happened
>> >> + *  in 'send_garp_data', 'ipv6_ras' and 'buffered_mac_bindings' 
>> >> structures.
>> >> + *
>> >> + *  'pinctrl_main_seq' is used by pinctrl_handler() thread to wake up
>> >> + *  the main thread from poll_block() when mac bindings needs to be 
>> >> updated
>> >> + *  in the Southboubd DB.
>> >> + * */
>> >> +
>> >
>> > Thanks Numan for addressing this problem. It is very good that the
>> > implementation doesn't require a separate IDL for SB access. Here are
>> > some generic comments.
>> >
>> > For operations that doesn't rely on SB DB, such as OVS probe echo, ACL
>> > logging and DHCP, in theory, they can be handled completely in
>> > parallel. However, in this patch the pinctrl_mutex locks the whole
>> > pinctrl_handler. I think the lock can be relaxed to exclude the
>> > operations that doesn't need SB DB access.
>
>
> I did think of it earlier, but then went with this approach to take a safe 
> approach.
> I will address this in v2  as per your suggestion.
>
>>
>> >
>> > For tasks that need READONLY access to SB DB, the main thread reads
>> > from DB, and notifies pinctrl thread to process the changed data. This
>> > synchronization is definitely required and the implementation looks
>> > good to me.
>> >
>> > For tasks that requires WRITE access to SB DB, such as put_arp, it
>> > will still trigger recompute in main thread with this patch, and the
>> > two thread would still block each other.
>>
>> > up main thread with pinctrl_main_seq, the main thread will recompute
>> > flows again, even if there is no change in SB DB. In fact this can be
>> > solved by the first patches of the incremental processing
>> > (https://github.com/hzhou8/ovs/commit/2bc0e2423fdcff21db4d607f3164b0e0c62cd660).
>> > In that patch there is no real incremental processing yet, but since
>> > it is abstracted with dependency model, it can avoid flow computing
>> > when it is not needed. Do you think it is good to be combined with
>> > this multi-threading patch to avoid the problem of put_arp triggering
>> > flow computing which blocks put_arp operation?
>>
>> I think the idea of not recomputing flows if SB DB has not changed is a
>> fantastic idea and should be implemented whether pinctrl is made
>> multithreaded or not. If the goal is *only* not to recompute flows if SB
>> DB has not changed, then it potentially could be done with a simpler set
>> of patches, since there is no goal of implementing an I-P engine.
>>

It is not only about SB DB change, but about all input changes that
can trigger flow computation. The first patches of I-P engine just
accomplish that, but it is true that it can be simpler, at least the
flow table persistent is not necessary for this purpose. I will try to
remove the unnecessary part from that patch.

>
> Thanks Han and Mark for the comments.
> I agree with Mark.  I think we should address that and it would definitely
> help in reducing ovn-controller's  CPU usage. But I think that can be
> addressed as a separate patch since this patch is not changing that
> behavior. I will look into your patch you pointed out and see if I can test
> it out.
>
> Does this sound good ?

Agree. I can send a separate one.


>
> Thanks
> Numan
>
>
>>
>> I was going to suggest that it may be possible to avoid such a patch by
>> having the pinctrl_handle_mac_binding() function detect if local data
>> has changed before notifying the main thread. However, I don't think
>> this will work if MAC bindings move between hypervisors.
>>
>> Sorry for bringing the
>> > I-P series back if it is annoying. I think the major reason that was
>> > rejected earlier was the concern of maintenance effort for dependency
>> > graph and change-handler implementations. However, since the first
>> > patches of I-P only build the coarse grained dependency without any
>> > change-handler, we can just utilize the benefit without worrying about
>> > maintenance, although I might be biased. (p.s. I am still eager to
>> > work on DDlog for ovn-controller, but I guess it will just take quite
>> > some time to be ready).
>> >
>> > Thanks,
>> > Han
>> > _______________________________________________
>> > dev mailing list
>> > [email protected]
>> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>> >
>>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to