On Tue, Mar 12, 2019 at 6:16 PM Mark Michelson <[email protected]> wrote:
> On 3/11/19 6:11 PM, Han Zhou wrote: > > On Mon, Mar 11, 2019 at 9:29 AM <[email protected]> wrote: > >> > >> From: Numan Siddique <[email protected]> > >> > >> Prior to this patch, ovn-controller was single threaded and everytime > the > >> poll_block() at the end of the main while() loop wakes up, it processes > >> the whole SB DB and translates the logical flows to OF flows. > >> > >> There are few issues with this - > >> > >> * For every packet-in received, ovn-controller does this translation > >> resulting in unnecessary CPU cycles. > >> > >> * If the translation takes a lot of time, then the packet-in handling > >> would get delayed. The delay in responses to DHCP requests could > >> result in resending of these requests. > >> > >> This patch addresses these issues by creating a new pthread in pinctrl > module > >> to handle packet-ins. This thread doesn't access the Southbound DB IDL > object. > >> > >> Since some of the OVN actions - like dns_lookup, arp, put_arp, put_nd > >> require access to the Southbound DB contents and gARPs, periodic IPv6 RA > >> generation also requires the DB access, pinctrl_run() called by the main > >> ovn-controller thread accesses the Southbound DB IDL and builds the > local > >> datastructures. pinctrl_handler thread accesses these data structures > >> in handling such requests. An ovs_mutex is used between the > pinctr_run() and > >> the pinctrl_handler thread to protect these data structures. > >> > >> Signed-off-by: Numan Siddique <[email protected]> > >> --- > >> ovn/controller/pinctrl.c | 681 ++++++++++++++++++++++++++++++--------- > >> 1 file changed, 532 insertions(+), 149 deletions(-) > >> > >> diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c > >> index 100a20ff2..857cb5d51 100644 > >> --- a/ovn/controller/pinctrl.c > >> +++ b/ovn/controller/pinctrl.c > >> @@ -27,6 +27,7 @@ > >> #include "lport.h" > >> #include "nx-match.h" > >> #include "ovn-controller.h" > >> +#include "latch.h" > >> #include "lib/packets.h" > >> #include "lib/sset.h" > >> #include "openvswitch/ofp-actions.h" > >> @@ -48,19 +49,13 @@ > >> #include "openvswitch/poll-loop.h" > >> #include "openvswitch/rconn.h" > >> #include "socket-util.h" > >> +#include "seq.h" > >> #include "timeval.h" > >> #include "vswitch-idl.h" > >> #include "lflow.h" > >> > >> VLOG_DEFINE_THIS_MODULE(pinctrl); > >> > >> -/* OpenFlow connection to the switch. */ > >> -static struct rconn *swconn; > >> - > >> -/* Last seen sequence number for 'swconn'. When this differs from > >> - * rconn_get_connection_seqno(rconn), 'swconn' has reconnected. */ > >> -static unsigned int conn_seq_no; > >> - > >> static void init_buffered_packets_map(void); > >> static void destroy_buffered_packets_map(void); > >> > >> @@ -76,11 +71,14 @@ static void run_put_mac_bindings( > >> struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip); > >> static void wait_put_mac_bindings(struct ovsdb_idl_txn > *ovnsb_idl_txn); > >> static void flush_put_mac_bindings(void); > >> +static void buffer_put_mac_bindings(void); > >> +static void destroy_buffered_mac_bindings(void); > >> +static void send_mac_binding_buffered_pkts(struct rconn *swconn); > >> > >> static void init_send_garps(void); > >> static void destroy_send_garps(void); > >> static void send_garp_wait(void); > >> -static void send_garp_run( > >> +static void send_garp_prepare( > >> struct ovsdb_idl_index *sbrec_chassis_by_name, > >> struct ovsdb_idl_index *sbrec_port_binding_by_datapath, > >> struct ovsdb_idl_index *sbrec_port_binding_by_name, > >> @@ -88,45 +86,144 @@ static void send_garp_run( > >> const struct sbrec_chassis *, > >> const struct hmap *local_datapaths, > >> const struct sset *active_tunnels); > >> -static void pinctrl_handle_nd_na(const struct flow *ip_flow, > >> +static void send_garp_run(struct rconn *swconn); > >> +static void pinctrl_handle_nd_na(struct rconn *swconn, > >> + const struct flow *ip_flow, > >> const struct match *md, > >> struct ofpbuf *userdata, > >> bool is_router); > >> static void reload_metadata(struct ofpbuf *ofpacts, > >> const struct match *md); > >> static void pinctrl_handle_put_nd_ra_opts( > >> + struct rconn *swconn, > >> const struct flow *ip_flow, struct dp_packet *pkt_in, > >> struct ofputil_packet_in *pin, struct ofpbuf *userdata, > >> struct ofpbuf *continuation); > >> -static void pinctrl_handle_nd_ns(const struct flow *ip_flow, > >> +static void pinctrl_handle_nd_ns(struct rconn *swconn, > >> + const struct flow *ip_flow, > >> struct dp_packet *pkt_in, > >> const struct match *md, > >> struct ofpbuf *userdata); > >> static void init_ipv6_ras(void); > >> static void destroy_ipv6_ras(void); > >> static void ipv6_ra_wait(void); > >> -static void send_ipv6_ras( > >> +static void prepare_ipv6_ras( > >> struct ovsdb_idl_index *sbrec_port_binding_by_datapath, > >> struct ovsdb_idl_index *sbrec_port_binding_by_name, > >> const struct hmap *local_datapaths); > >> -; > >> +static void send_ipv6_ras(struct rconn *swconn); > >> > >> COVERAGE_DEFINE(pinctrl_drop_put_mac_binding); > >> COVERAGE_DEFINE(pinctrl_drop_buffered_packets_map); > >> > >> +/* pinctrl module creates a thread - pinctrl_handler to handle > >> + * the packet-ins from ovs-vswitchd. Some of the OVN actions > >> + * are translated to OF 'controller' actions. See include/ovn/actions.h > >> + * for more details. > >> + * > >> + * pinctrl_handler thread doesn't access the Southbound IDL object. But > >> + * some of the OVN actions which gets translated to 'controller' > >> + * OF action, require data from Southbound DB. Below are the details > >> + * on how these actions are implemented. > >> + * > >> + * pinctrl_run() function is called by ovn-controller main thread. > >> + * A Mutex - 'pinctrl_mutex' is used between the pinctrl_handler() > thread > >> + * and pinctrl_run(). > >> + * > >> + * - dns_lookup - In order to do a DNS lookup, this action needs > >> + * to access the 'DNS' table. pinctrl_run() > builds a > >> + * local DNS cache - 'dns_cache'. See > sync_dns_cache() > >> + * for more details. > >> + * The function 'pinctrl_handle_dns_lookup()' > (which is > >> + * called with in the pinctrl_handler thread) > looks into > >> + * the local DNS cache to resolve the DNS > requests. > >> + * > >> + * - put_arp/put_nd - These actions stores the IPv4/IPv6 and MAC > addresses > >> + * in the 'MAC_Binding' table. > >> + * The function > 'pinctrl_handle_put_mac_binding()' (which > >> + * is called with in the pinctrl_handler thread), > stores > >> + * the IPv4/IPv6 and MAC addresses in the > >> + * hmap - put_mac_bindings. > >> + * > >> + * pinctrl_run(), reads these mac bindings from > the hmap > >> + * 'put_mac_bindings' and writes to the > 'MAC_Binding' > >> + * table in the Southbound DB. > >> + * > >> + * - arp/nd_ns - These actions generate an ARP/IPv6 Neighbor > solicit > >> + * requests. The original packets are buffered and > >> + * injected back when put_arp/put_nd actions are > called. > >> + * When pinctrl_run(), writes the mac bindings > from the > >> + * 'put_mac_bindings' hmap, it moves these mac > bindings > >> + * to another hmap - 'buffered_mac_bindings'. > >> + * > >> + * The pinctrl_handler thread calls the function - > >> + * send_mac_binding_buffered_pkts(), which uses > >> + * the hmap - 'buffered_mac_bindings' and > reinjects the > >> + * buffered packets. > >> + * > >> + * pinctrl module also periodically sends IPv6 Router Solicitation > requests > >> + * and gARPs (for the router gateway IPs and configured NAT addresses). > >> + * > >> + * IPv6 RA handling - pinctrl_run() prepares the IPv6 RA information > >> + * (see prepare_ipv6_ras()) in the shash 'ipv6_ras' > by > >> + * looking into the Southbound DB table - > Port_Binding. > >> + * > >> + * pinctrl_handler thread sends the periodic IPv6 > RAs using > >> + * the shash - 'ipv6_ras' > >> + * > >> + * gARP handling - pinctrl_run() prepares the gARP information > >> + * (see send_garp_prepare()) in the shash > 'send_garp_data' > >> + * by looking into the Southbound DB table > Port_Binding. > >> + * > >> + * pinctrl_handler() thread sends these gARPs using > the > >> + * shash 'send_garp_data'. > >> + * > >> + * Notification between pinctrl_handler() and pinctrl_run() > >> + * ------------------------------------------------------- > >> + * 'struct seq' is used for notification between pinctrl_handler() > thread > >> + * and pinctrl_run(). > >> + * 'pinctrl_handler_seq' is used by pinctrl_run() to > >> + * wake up pinctrl_handler thread from poll_block() if any changes > happened > >> + * in 'send_garp_data', 'ipv6_ras' and 'buffered_mac_bindings' > structures. > >> + * > >> + * 'pinctrl_main_seq' is used by pinctrl_handler() thread to wake up > >> + * the main thread from poll_block() when mac bindings needs to be > updated > >> + * in the Southboubd DB. > >> + * */ > >> + > > > > Thanks Numan for addressing this problem. It is very good that the > > implementation doesn't require a separate IDL for SB access. Here are > > some generic comments. > > > > For operations that doesn't rely on SB DB, such as OVS probe echo, ACL > > logging and DHCP, in theory, they can be handled completely in > > parallel. However, in this patch the pinctrl_mutex locks the whole > > pinctrl_handler. I think the lock can be relaxed to exclude the > > operations that doesn't need SB DB access. > I did think of it earlier, but then went with this approach to take a safe approach. I will address this in v2 as per your suggestion. > > > > For tasks that need READONLY access to SB DB, the main thread reads > > from DB, and notifies pinctrl thread to process the changed data. This > > synchronization is definitely required and the implementation looks > > good to me. > > > > For tasks that requires WRITE access to SB DB, such as put_arp, it > > will still trigger recompute in main thread with this patch, and the > > two thread would still block each other. > up main thread with pinctrl_main_seq, the main thread will recompute > > flows again, even if there is no change in SB DB. In fact this can be > > solved by the first patches of the incremental processing > > ( > https://github.com/hzhou8/ovs/commit/2bc0e2423fdcff21db4d607f3164b0e0c62cd660 > ). > > In that patch there is no real incremental processing yet, but since > > it is abstracted with dependency model, it can avoid flow computing > > when it is not needed. Do you think it is good to be combined with > > this multi-threading patch to avoid the problem of put_arp triggering > > flow computing which blocks put_arp operation? > > I think the idea of not recomputing flows if SB DB has not changed is a > fantastic idea and should be implemented whether pinctrl is made > multithreaded or not. If the goal is *only* not to recompute flows if SB > DB has not changed, then it potentially could be done with a simpler set > of patches, since there is no goal of implementing an I-P engine. > > Thanks Han and Mark for the comments. I agree with Mark. I think we should address that and it would definitely help in reducing ovn-controller's CPU usage. But I think that can be addressed as a separate patch since this patch is not changing that behavior. I will look into your patch you pointed out and see if I can test it out. Does this sound good ? Thanks Numan > I was going to suggest that it may be possible to avoid such a patch by > having the pinctrl_handle_mac_binding() function detect if local data > has changed before notifying the main thread. However, I don't think > this will work if MAC bindings move between hypervisors. > > Sorry for bringing the > > I-P series back if it is annoying. I think the major reason that was > > rejected earlier was the concern of maintenance effort for dependency > > graph and change-handler implementations. However, since the first > > patches of I-P only build the coarse grained dependency without any > > change-handler, we can just utilize the benefit without worrying about > > maintenance, although I might be biased. (p.s. I am still eager to > > work on DDlog for ovn-controller, but I guess it will just take quite > > some time to be ready). > > > > Thanks, > > Han > > _______________________________________________ > > dev mailing list > > [email protected] > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
