On Tue, Mar 12, 2019 at 9:22 AM Numan Siddique <[email protected]> wrote: > > > > On Tue, Mar 12, 2019 at 6:16 PM Mark Michelson <[email protected]> wrote: >> >> On 3/11/19 6:11 PM, Han Zhou wrote: >> > On Mon, Mar 11, 2019 at 9:29 AM <[email protected]> wrote: >> >> >> >> From: Numan Siddique <[email protected]> >> >> >> >> Prior to this patch, ovn-controller was single threaded and everytime the >> >> poll_block() at the end of the main while() loop wakes up, it processes >> >> the whole SB DB and translates the logical flows to OF flows. >> >> >> >> There are few issues with this - >> >> >> >> * For every packet-in received, ovn-controller does this translation >> >> resulting in unnecessary CPU cycles. >> >> >> >> * If the translation takes a lot of time, then the packet-in handling >> >> would get delayed. The delay in responses to DHCP requests could >> >> result in resending of these requests. >> >> >> >> This patch addresses these issues by creating a new pthread in pinctrl >> >> module >> >> to handle packet-ins. This thread doesn't access the Southbound DB IDL >> >> object. >> >> >> >> Since some of the OVN actions - like dns_lookup, arp, put_arp, put_nd >> >> require access to the Southbound DB contents and gARPs, periodic IPv6 RA >> >> generation also requires the DB access, pinctrl_run() called by the main >> >> ovn-controller thread accesses the Southbound DB IDL and builds the local >> >> datastructures. pinctrl_handler thread accesses these data structures >> >> in handling such requests. An ovs_mutex is used between the pinctr_run() >> >> and >> >> the pinctrl_handler thread to protect these data structures. >> >> >> >> Signed-off-by: Numan Siddique <[email protected]> >> >> --- >> >> ovn/controller/pinctrl.c | 681 ++++++++++++++++++++++++++++++--------- >> >> 1 file changed, 532 insertions(+), 149 deletions(-) >> >> >> >> diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c >> >> index 100a20ff2..857cb5d51 100644 >> >> --- a/ovn/controller/pinctrl.c >> >> +++ b/ovn/controller/pinctrl.c >> >> @@ -27,6 +27,7 @@ >> >> #include "lport.h" >> >> #include "nx-match.h" >> >> #include "ovn-controller.h" >> >> +#include "latch.h" >> >> #include "lib/packets.h" >> >> #include "lib/sset.h" >> >> #include "openvswitch/ofp-actions.h" >> >> @@ -48,19 +49,13 @@ >> >> #include "openvswitch/poll-loop.h" >> >> #include "openvswitch/rconn.h" >> >> #include "socket-util.h" >> >> +#include "seq.h" >> >> #include "timeval.h" >> >> #include "vswitch-idl.h" >> >> #include "lflow.h" >> >> >> >> VLOG_DEFINE_THIS_MODULE(pinctrl); >> >> >> >> -/* OpenFlow connection to the switch. */ >> >> -static struct rconn *swconn; >> >> - >> >> -/* Last seen sequence number for 'swconn'. When this differs from >> >> - * rconn_get_connection_seqno(rconn), 'swconn' has reconnected. */ >> >> -static unsigned int conn_seq_no; >> >> - >> >> static void init_buffered_packets_map(void); >> >> static void destroy_buffered_packets_map(void); >> >> >> >> @@ -76,11 +71,14 @@ static void run_put_mac_bindings( >> >> struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip); >> >> static void wait_put_mac_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn); >> >> static void flush_put_mac_bindings(void); >> >> +static void buffer_put_mac_bindings(void); >> >> +static void destroy_buffered_mac_bindings(void); >> >> +static void send_mac_binding_buffered_pkts(struct rconn *swconn); >> >> >> >> static void init_send_garps(void); >> >> static void destroy_send_garps(void); >> >> static void send_garp_wait(void); >> >> -static void send_garp_run( >> >> +static void send_garp_prepare( >> >> struct ovsdb_idl_index *sbrec_chassis_by_name, >> >> struct ovsdb_idl_index *sbrec_port_binding_by_datapath, >> >> struct ovsdb_idl_index *sbrec_port_binding_by_name, >> >> @@ -88,45 +86,144 @@ static void send_garp_run( >> >> const struct sbrec_chassis *, >> >> const struct hmap *local_datapaths, >> >> const struct sset *active_tunnels); >> >> -static void pinctrl_handle_nd_na(const struct flow *ip_flow, >> >> +static void send_garp_run(struct rconn *swconn); >> >> +static void pinctrl_handle_nd_na(struct rconn *swconn, >> >> + const struct flow *ip_flow, >> >> const struct match *md, >> >> struct ofpbuf *userdata, >> >> bool is_router); >> >> static void reload_metadata(struct ofpbuf *ofpacts, >> >> const struct match *md); >> >> static void pinctrl_handle_put_nd_ra_opts( >> >> + struct rconn *swconn, >> >> const struct flow *ip_flow, struct dp_packet *pkt_in, >> >> struct ofputil_packet_in *pin, struct ofpbuf *userdata, >> >> struct ofpbuf *continuation); >> >> -static void pinctrl_handle_nd_ns(const struct flow *ip_flow, >> >> +static void pinctrl_handle_nd_ns(struct rconn *swconn, >> >> + const struct flow *ip_flow, >> >> struct dp_packet *pkt_in, >> >> const struct match *md, >> >> struct ofpbuf *userdata); >> >> static void init_ipv6_ras(void); >> >> static void destroy_ipv6_ras(void); >> >> static void ipv6_ra_wait(void); >> >> -static void send_ipv6_ras( >> >> +static void prepare_ipv6_ras( >> >> struct ovsdb_idl_index *sbrec_port_binding_by_datapath, >> >> struct ovsdb_idl_index *sbrec_port_binding_by_name, >> >> const struct hmap *local_datapaths); >> >> -; >> >> +static void send_ipv6_ras(struct rconn *swconn); >> >> >> >> COVERAGE_DEFINE(pinctrl_drop_put_mac_binding); >> >> COVERAGE_DEFINE(pinctrl_drop_buffered_packets_map); >> >> >> >> +/* pinctrl module creates a thread - pinctrl_handler to handle >> >> + * the packet-ins from ovs-vswitchd. Some of the OVN actions >> >> + * are translated to OF 'controller' actions. See include/ovn/actions.h >> >> + * for more details. >> >> + * >> >> + * pinctrl_handler thread doesn't access the Southbound IDL object. But >> >> + * some of the OVN actions which gets translated to 'controller' >> >> + * OF action, require data from Southbound DB. Below are the details >> >> + * on how these actions are implemented. >> >> + * >> >> + * pinctrl_run() function is called by ovn-controller main thread. >> >> + * A Mutex - 'pinctrl_mutex' is used between the pinctrl_handler() thread >> >> + * and pinctrl_run(). >> >> + * >> >> + * - dns_lookup - In order to do a DNS lookup, this action needs >> >> + * to access the 'DNS' table. pinctrl_run() builds a >> >> + * local DNS cache - 'dns_cache'. See >> >> sync_dns_cache() >> >> + * for more details. >> >> + * The function 'pinctrl_handle_dns_lookup()' >> >> (which is >> >> + * called with in the pinctrl_handler thread) looks >> >> into >> >> + * the local DNS cache to resolve the DNS requests. >> >> + * >> >> + * - put_arp/put_nd - These actions stores the IPv4/IPv6 and MAC >> >> addresses >> >> + * in the 'MAC_Binding' table. >> >> + * The function 'pinctrl_handle_put_mac_binding()' >> >> (which >> >> + * is called with in the pinctrl_handler thread), >> >> stores >> >> + * the IPv4/IPv6 and MAC addresses in the >> >> + * hmap - put_mac_bindings. >> >> + * >> >> + * pinctrl_run(), reads these mac bindings from the >> >> hmap >> >> + * 'put_mac_bindings' and writes to the >> >> 'MAC_Binding' >> >> + * table in the Southbound DB. >> >> + * >> >> + * - arp/nd_ns - These actions generate an ARP/IPv6 Neighbor >> >> solicit >> >> + * requests. The original packets are buffered and >> >> + * injected back when put_arp/put_nd actions are >> >> called. >> >> + * When pinctrl_run(), writes the mac bindings from >> >> the >> >> + * 'put_mac_bindings' hmap, it moves these mac >> >> bindings >> >> + * to another hmap - 'buffered_mac_bindings'. >> >> + * >> >> + * The pinctrl_handler thread calls the function - >> >> + * send_mac_binding_buffered_pkts(), which uses >> >> + * the hmap - 'buffered_mac_bindings' and reinjects >> >> the >> >> + * buffered packets. >> >> + * >> >> + * pinctrl module also periodically sends IPv6 Router Solicitation >> >> requests >> >> + * and gARPs (for the router gateway IPs and configured NAT addresses). >> >> + * >> >> + * IPv6 RA handling - pinctrl_run() prepares the IPv6 RA information >> >> + * (see prepare_ipv6_ras()) in the shash 'ipv6_ras' by >> >> + * looking into the Southbound DB table - >> >> Port_Binding. >> >> + * >> >> + * pinctrl_handler thread sends the periodic IPv6 RAs >> >> using >> >> + * the shash - 'ipv6_ras' >> >> + * >> >> + * gARP handling - pinctrl_run() prepares the gARP information >> >> + * (see send_garp_prepare()) in the shash >> >> 'send_garp_data' >> >> + * by looking into the Southbound DB table >> >> Port_Binding. >> >> + * >> >> + * pinctrl_handler() thread sends these gARPs using >> >> the >> >> + * shash 'send_garp_data'. >> >> + * >> >> + * Notification between pinctrl_handler() and pinctrl_run() >> >> + * ------------------------------------------------------- >> >> + * 'struct seq' is used for notification between pinctrl_handler() thread >> >> + * and pinctrl_run(). >> >> + * 'pinctrl_handler_seq' is used by pinctrl_run() to >> >> + * wake up pinctrl_handler thread from poll_block() if any changes >> >> happened >> >> + * in 'send_garp_data', 'ipv6_ras' and 'buffered_mac_bindings' >> >> structures. >> >> + * >> >> + * 'pinctrl_main_seq' is used by pinctrl_handler() thread to wake up >> >> + * the main thread from poll_block() when mac bindings needs to be >> >> updated >> >> + * in the Southboubd DB. >> >> + * */ >> >> + >> > >> > Thanks Numan for addressing this problem. It is very good that the >> > implementation doesn't require a separate IDL for SB access. Here are >> > some generic comments. >> > >> > For operations that doesn't rely on SB DB, such as OVS probe echo, ACL >> > logging and DHCP, in theory, they can be handled completely in >> > parallel. However, in this patch the pinctrl_mutex locks the whole >> > pinctrl_handler. I think the lock can be relaxed to exclude the >> > operations that doesn't need SB DB access. > > > I did think of it earlier, but then went with this approach to take a safe > approach. > I will address this in v2 as per your suggestion. > >> >> > >> > For tasks that need READONLY access to SB DB, the main thread reads >> > from DB, and notifies pinctrl thread to process the changed data. This >> > synchronization is definitely required and the implementation looks >> > good to me. >> > >> > For tasks that requires WRITE access to SB DB, such as put_arp, it >> > will still trigger recompute in main thread with this patch, and the >> > two thread would still block each other. >> >> > up main thread with pinctrl_main_seq, the main thread will recompute >> > flows again, even if there is no change in SB DB. In fact this can be >> > solved by the first patches of the incremental processing >> > (https://github.com/hzhou8/ovs/commit/2bc0e2423fdcff21db4d607f3164b0e0c62cd660). >> > In that patch there is no real incremental processing yet, but since >> > it is abstracted with dependency model, it can avoid flow computing >> > when it is not needed. Do you think it is good to be combined with >> > this multi-threading patch to avoid the problem of put_arp triggering >> > flow computing which blocks put_arp operation? >> >> I think the idea of not recomputing flows if SB DB has not changed is a >> fantastic idea and should be implemented whether pinctrl is made >> multithreaded or not. If the goal is *only* not to recompute flows if SB >> DB has not changed, then it potentially could be done with a simpler set >> of patches, since there is no goal of implementing an I-P engine. >>
It is not only about SB DB change, but about all input changes that can trigger flow computation. The first patches of I-P engine just accomplish that, but it is true that it can be simpler, at least the flow table persistent is not necessary for this purpose. I will try to remove the unnecessary part from that patch. > > Thanks Han and Mark for the comments. > I agree with Mark. I think we should address that and it would definitely > help in reducing ovn-controller's CPU usage. But I think that can be > addressed as a separate patch since this patch is not changing that > behavior. I will look into your patch you pointed out and see if I can test > it out. > > Does this sound good ? Agree. I can send a separate one. > > Thanks > Numan > > >> >> I was going to suggest that it may be possible to avoid such a patch by >> having the pinctrl_handle_mac_binding() function detect if local data >> has changed before notifying the main thread. However, I don't think >> this will work if MAC bindings move between hypervisors. >> >> Sorry for bringing the >> > I-P series back if it is annoying. I think the major reason that was >> > rejected earlier was the concern of maintenance effort for dependency >> > graph and change-handler implementations. However, since the first >> > patches of I-P only build the coarse grained dependency without any >> > change-handler, we can just utilize the benefit without worrying about >> > maintenance, although I might be biased. (p.s. I am still eager to >> > work on DDlog for ovn-controller, but I guess it will just take quite >> > some time to be ready). >> > >> > Thanks, >> > Han >> > _______________________________________________ >> > dev mailing list >> > [email protected] >> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > >> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
