Re: [ovs-dev] [PATCH v2 1/2] netdev-dpdk: Use intermediate queue during packet transmission.
Not a complete review. This code is full of races. See details inline.

Best regards, Ilya Maximets.

On 17.01.2017 18:37, antonio.fische...@intel.com wrote:
> This patch implements the intermediate Tx queues on 'dpdk' type ports.
>
> Test results:
> * In the worst-case scenario, with fewer packets per batch, a significant
>   bottleneck is observed in the netdev_dpdk_eth_send() function due to
>   expensive MMIO writes.
>
> * It is also observed that the CPI (cycles per instruction) rate for the
>   function stood between 3.15 and 4.1, which is significantly higher than
>   the acceptable limit of 1.0 for HPC applications and the theoretical limit
>   of 0.25 (as the backend pipeline can retire 4 micro-operations per cycle).
>
> * With this patch, CPI for netdev_dpdk_eth_send() is at 0.55 and the overall
>   throughput improved significantly.
>
> Signed-off-by: Antonio Fischetti
> Signed-off-by: Bhanuprakash Bodireddy
> Co-authored-by: Bhanuprakash Bodireddy
> Signed-off-by: Markus Magnusson
> Co-authored-by: Markus Magnusson
> ---
>  lib/dpif-netdev.c     | 53 +++--
>  lib/netdev-bsd.c      |  1 +
>  lib/netdev-dpdk.c     | 82 ++-
>  lib/netdev-dummy.c    |  1 +
>  lib/netdev-linux.c    |  1 +
>  lib/netdev-provider.h |  8 +
>  lib/netdev-vport.c    |  3 +-
>  lib/netdev.c          |  9 ++
>  lib/netdev.h          |  1 +
>  9 files changed, 149 insertions(+), 10 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 3901129..58ac429 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -289,6 +289,8 @@ struct dp_netdev_rxq {
>      struct dp_netdev_pmd_thread *pmd;  /* pmd thread that will poll this
>                                            queue. */
>  };
>
> +#define LAST_USED_QID_NONE -1
> +
>  /* A port in a netdev-based datapath. */
>  struct dp_netdev_port {
>      odp_port_t port_no;
> @@ -303,6 +305,8 @@ struct dp_netdev_port {
>      char *type;                 /* Port type as requested by user. */
>      char *rxq_affinity_list;    /* Requested affinity of rx queues. */
>      bool need_reconfigure;      /* True if we should reconfigure netdev. */
> +    int last_used_qid;          /* Last queue id where packets could be
> +                                   enqueued. */
>  };
>
>  /* Contained by struct dp_netdev_flow's 'stats' member. */
> @@ -619,6 +623,9 @@ static int dpif_netdev_xps_get_tx_qid(const struct dp_netdev_pmd_thread *pmd,
>  static inline bool emc_entry_alive(struct emc_entry *ce);
>  static void emc_clear_entry(struct emc_entry *ce);
>
> +static struct tx_port *pmd_send_port_cache_lookup
> +    (const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
> +
>  static void
>  emc_cache_init(struct emc_cache *flow_cache)
>  {
> @@ -3507,15 +3514,19 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread *pmd,
>      return i;
>  }
>
> +enum { DRAIN_TSC = 2ULL };
> +
>  static void *
>  pmd_thread_main(void *f_)
>  {
>      struct dp_netdev_pmd_thread *pmd = f_;
> -    unsigned int lc = 0;
> +    unsigned int lc = 0, lc_drain = 0;
>      struct polled_queue *poll_list;
>      bool exiting;
>      int poll_cnt;
>      int i;
> +    uint64_t prev = 0, now = 0;
> +    struct tx_port *tx_port;
>
>      poll_list = NULL;
>
> @@ -3548,6 +3559,26 @@ reload:
>                                             poll_list[i].port_no);
>          }
>
> +#define MAX_LOOP_TO_DRAIN 128
> +        if (lc_drain++ > MAX_LOOP_TO_DRAIN) {
> +            lc_drain = 0;
> +            prev = now;
> +            now = pmd->last_cycles;
> +            if ((now - prev) > DRAIN_TSC) {
> +                HMAP_FOR_EACH (tx_port, node, &pmd->tx_ports) {

'pmd->tx_ports' must be protected by 'pmd->port_mutex'. Also it can be changed
while pmd still working. I think you wanted something like
'pmd->send_port_cache'.

> +                    if (tx_port->port->last_used_qid != LAST_USED_QID_NONE) {
> +                        /* This queue may contain some buffered packets
> +                         * waiting to be sent out. */
> +                        netdev_txq_drain(tx_port->port->netdev,
> +                                         tx_port->port->last_used_qid,
> +                                         tx_port->port->dynamic_txqs);
> +                        /* Mark it as empty. */
> +                        tx_port->port->last_used_qid = LAST_USED_QID_NONE;

'port' is a pointer to the common structure --> 'port->last_used_qid' will be
concurrently updated by all threads --> total mess.

> +                    }
> +                }
> +            }
> +        }
> +
>          if (lc++ > 1024) {
>              bool reload;
>
> @@ -3883,6 +3914,7 @@ dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread *pmd,
>
>      tx->port = port;
>      tx->qid = -1;
> +
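One way out of the race flagged above (a hedged sketch, not a patch): keep the last-used qid in the per-thread 'struct tx_port' cache instead of the shared 'struct dp_netdev_port', so each PMD thread only ever reads and writes its own copy. A toy model with simplified, hypothetical types (not the actual OVS structures):

```c
#include <assert.h>
#include <stddef.h>

#define LAST_USED_QID_NONE -1

/* Simplified stand-ins for the real OVS structures. */
struct port {
    int id;
};

struct tx_port {
    struct port *port;
    int last_used_qid;   /* Per-PMD-thread copy: no cross-thread writes. */
};

/* Record the queue this thread last enqueued on, for later draining. */
static void
tx_port_record_qid(struct tx_port *tx, int qid)
{
    tx->last_used_qid = qid;
}

/* Drain every queue this thread used; returns the number drained.
 * The real code would call netdev_txq_drain() per used queue. */
static int
tx_ports_drain(struct tx_port *txs, size_t n)
{
    int drained = 0;
    for (size_t i = 0; i < n; i++) {
        if (txs[i].last_used_qid != LAST_USED_QID_NONE) {
            txs[i].last_used_qid = LAST_USED_QID_NONE;
            drained++;
        }
    }
    return drained;
}
```

Because the state lives only in the per-thread cache, no mutex is needed on the drain path.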
[ovs-dev] [PATCH] configuration.rst: Update the example of DPDK port's configuration
After the hotplug of DPDK ports, a valid dpdk-devargs must be specified.
Otherwise, the DPDK device can't be available.

Signed-off-by: Binbin Xu
---
 Documentation/faq/configuration.rst | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/Documentation/faq/configuration.rst b/Documentation/faq/configuration.rst
index c03d069..8bd0e11 100644
--- a/Documentation/faq/configuration.rst
+++ b/Documentation/faq/configuration.rst
@@ -107,12 +107,11 @@ Q: How do I configure a DPDK port as an access port?
    startup when other_config:dpdk-init is set to 'true'.

    Secondly, when adding a DPDK port, unlike a system port, the type for the
-   interface must be specified. For example::
+   interface and valid dpdk-devargs must be specified. For example::

        $ ovs-vsctl add-br br0
-       $ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
-
-   Finally, it is required that DPDK port names begin with ``dpdk``.
+       $ ovs-vsctl add-port br0 myportname -- set Interface myportname \
+           type=dpdk options:dpdk-devargs=0000:06:00.0

    Refer to :doc:`/intro/install/dpdk` for more information on enabling and
    using DPDK with Open vSwitch.
--
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v2 0/2] netdev-dpdk: Use intermediate queue during packet transmission.
After packet classification, packets are queued into batches depending on the
matching netdev flow. Thereafter each batch is processed to execute the
related actions. This becomes particularly inefficient if there are few
packets in each batch, as rte_eth_tx_burst() incurs expensive MMIO writes.

This patchset adds back the intermediate queue implementation. Packets are
queued and burst when their count is >= NETDEV_MAX_BURST. Also the drain
logic is refactored to handle packets hanging in the tx queues. Testing shows
significant performance gains with this implementation.

Fixes: b59cc14e032d ("netdev-dpdk: Use instant sending instead of queueing of
packets.")

V1 => V2:
* xps_get_tx_qid() is no longer called twice. The last used qid is stored so
  the drain function will flush the right queue also when XPS is enabled.
* netdev_txq_drain() is called unconditionally and not just for dpdk ports.
* txq_drain() takes the 'tx_lock' for queue in case of dynamic tx queues.
* Restored counting of dropped packets.
* Changed scheduling of drain function.
* Updated comments in netdev-provider.h.
* Fixed a comment in dp-packet.h.

Fischetti, Antonio (2):
  netdev-dpdk: Use intermediate queue during packet transmission.
  dp-packet: Fix comment on DPDK buffer data.

 lib/dp-packet.c       |  4 +--
 lib/dp-packet.h       | 10 +++
 lib/dpif-netdev.c     | 53 +++--
 lib/netdev-bsd.c      |  1 +
 lib/netdev-dpdk.c     | 82 ++-
 lib/netdev-dummy.c    |  1 +
 lib/netdev-linux.c    |  1 +
 lib/netdev-provider.h |  8 +
 lib/netdev-vport.c    |  3 +-
 lib/netdev.c          |  9 ++
 lib/netdev.h          |  1 +
 11 files changed, 156 insertions(+), 17 deletions(-)

--
2.4.11
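For readers skimming the thread, the queue-and-burst idea in the cover letter can be sketched in isolation: buffer outgoing packets until NETDEV_MAX_BURST accumulate, then flush them in one burst. This is a simplified toy model with hypothetical helper names; the real code buffers 'struct dp_packet' pointers and flushes through rte_eth_tx_burst():

```c
#include <assert.h>
#include <stddef.h>

#define NETDEV_MAX_BURST 32

/* Hypothetical intermediate tx queue: collects packet pointers and
 * "bursts" them once NETDEV_MAX_BURST have accumulated, amortizing the
 * per-burst MMIO cost over many packets. */
struct itx_queue {
    void *pkts[NETDEV_MAX_BURST];
    int count;
    int bursts;          /* How many times we flushed (illustration only). */
};

static void
itx_flush(struct itx_queue *q)
{
    if (q->count) {
        /* Real code: rte_eth_tx_burst(port_id, queue_id, q->pkts, q->count); */
        q->count = 0;
        q->bursts++;
    }
}

/* Enqueue one packet; flush automatically when the queue fills up. */
static void
itx_enqueue(struct itx_queue *q, void *pkt)
{
    q->pkts[q->count++] = pkt;
    if (q->count >= NETDEV_MAX_BURST) {
        itx_flush(q);
    }
}
```

The drain logic discussed in the thread exists precisely because packets can sit in this buffer below the burst threshold; something must eventually call the flush for partially filled queues.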
[ovs-dev] [PATCH v2 1/2] netdev-dpdk: Use intermediate queue during packet transmission.
This patch implements the intermediate Tx queues on 'dpdk' type ports.

Test results:

* In the worst-case scenario, with fewer packets per batch, a significant
  bottleneck is observed in the netdev_dpdk_eth_send() function due to
  expensive MMIO writes.

* It is also observed that the CPI (cycles per instruction) rate for the
  function stood between 3.15 and 4.1, which is significantly higher than the
  acceptable limit of 1.0 for HPC applications and the theoretical limit of
  0.25 (as the backend pipeline can retire 4 micro-operations per cycle).

* With this patch, CPI for netdev_dpdk_eth_send() is at 0.55 and the overall
  throughput improved significantly.

Signed-off-by: Antonio Fischetti
Signed-off-by: Bhanuprakash Bodireddy
Co-authored-by: Bhanuprakash Bodireddy
Signed-off-by: Markus Magnusson
Co-authored-by: Markus Magnusson
---
 lib/dpif-netdev.c     | 53 +++--
 lib/netdev-bsd.c      |  1 +
 lib/netdev-dpdk.c     | 82 ++-
 lib/netdev-dummy.c    |  1 +
 lib/netdev-linux.c    |  1 +
 lib/netdev-provider.h |  8 +
 lib/netdev-vport.c    |  3 +-
 lib/netdev.c          |  9 ++
 lib/netdev.h          |  1 +
 9 files changed, 149 insertions(+), 10 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 3901129..58ac429 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -289,6 +289,8 @@ struct dp_netdev_rxq {
     struct dp_netdev_pmd_thread *pmd;  /* pmd thread that will poll this
                                           queue. */
 };

+#define LAST_USED_QID_NONE -1
+
 /* A port in a netdev-based datapath. */
 struct dp_netdev_port {
     odp_port_t port_no;
@@ -303,6 +305,8 @@ struct dp_netdev_port {
     char *type;                 /* Port type as requested by user. */
     char *rxq_affinity_list;    /* Requested affinity of rx queues. */
     bool need_reconfigure;      /* True if we should reconfigure netdev. */
+    int last_used_qid;          /* Last queue id where packets could be
+                                   enqueued. */
 };

 /* Contained by struct dp_netdev_flow's 'stats' member. */
@@ -619,6 +623,9 @@ static int dpif_netdev_xps_get_tx_qid(const struct dp_netdev_pmd_thread *pmd,
 static inline bool emc_entry_alive(struct emc_entry *ce);
 static void emc_clear_entry(struct emc_entry *ce);

+static struct tx_port *pmd_send_port_cache_lookup
+    (const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
+
 static void
 emc_cache_init(struct emc_cache *flow_cache)
 {
@@ -3507,15 +3514,19 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread *pmd,
     return i;
 }

+enum { DRAIN_TSC = 2ULL };
+
 static void *
 pmd_thread_main(void *f_)
 {
     struct dp_netdev_pmd_thread *pmd = f_;
-    unsigned int lc = 0;
+    unsigned int lc = 0, lc_drain = 0;
     struct polled_queue *poll_list;
     bool exiting;
     int poll_cnt;
     int i;
+    uint64_t prev = 0, now = 0;
+    struct tx_port *tx_port;

     poll_list = NULL;

@@ -3548,6 +3559,26 @@ reload:
                                            poll_list[i].port_no);
         }

+#define MAX_LOOP_TO_DRAIN 128
+        if (lc_drain++ > MAX_LOOP_TO_DRAIN) {
+            lc_drain = 0;
+            prev = now;
+            now = pmd->last_cycles;
+            if ((now - prev) > DRAIN_TSC) {
+                HMAP_FOR_EACH (tx_port, node, &pmd->tx_ports) {
+                    if (tx_port->port->last_used_qid != LAST_USED_QID_NONE) {
+                        /* This queue may contain some buffered packets
+                         * waiting to be sent out. */
+                        netdev_txq_drain(tx_port->port->netdev,
+                                         tx_port->port->last_used_qid,
+                                         tx_port->port->dynamic_txqs);
+                        /* Mark it as empty. */
+                        tx_port->port->last_used_qid = LAST_USED_QID_NONE;
+                    }
+                }
+            }
+        }
+
         if (lc++ > 1024) {
             bool reload;

@@ -3883,6 +3914,7 @@ dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread *pmd,

     tx->port = port;
     tx->qid = -1;
+    port->last_used_qid = LAST_USED_QID_NONE;

     hmap_insert(&pmd->tx_ports, &tx->node, hash_port_no(tx->port->port_no));
     pmd->need_reload = true;
@@ -4538,7 +4570,24 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
     } else {
         tx_qid = pmd->static_tx_qid;
     }
+    //TODO Add UNLIKELY to the 1st condition?
+    /* Is the current qid the same as the last one we used? */
+    if ((p->port->last_used_qid != LAST_USED_QID_NONE) &&
+        (p->port->last_used_qid != tx_qid)) {
+        /* The current assigned queue was changed, we need to drain
+         * packets from the previous queue. */
+
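The qid-change handling in the dp_execute_cb() hunk above reduces to a small state machine: if a queue was used before and XPS now assigns a different one, drain the old queue before switching. A toy model (hypothetical 'sim_port' type; the 'drains' counter exists only to make the behavior observable):

```c
#include <assert.h>

#define LAST_USED_QID_NONE -1

/* Hypothetical model of the qid-change path: when the assigned tx queue
 * changes, packets buffered on the previously used queue must be drained
 * before enqueueing on the new one. */
struct sim_port {
    int last_used_qid;
    int drains;              /* Count of drain operations (illustration). */
};

static void
send_on_qid(struct sim_port *p, int tx_qid)
{
    if (p->last_used_qid != LAST_USED_QID_NONE
        && p->last_used_qid != tx_qid) {
        /* Queue assignment changed: drain the old queue first.
         * Real code: netdev_txq_drain(netdev, p->last_used_qid, ...). */
        p->drains++;
    }
    p->last_used_qid = tx_qid;
    /* ... enqueue the batch on tx_qid ... */
}
```

Sending repeatedly on the same qid costs nothing extra; only an actual reassignment triggers a drain.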
Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND
It would be more helpful to have a simple reproduction case. Why haven't you tried a newer version from branch-2.5? On Tue, Jan 17, 2017 at 07:59:05AM -0800, Vidyasagara Guntaka wrote: > Hi Ben, > > Here i is more debug information related to this incident (still using > version 2.5.0): > > Summary : > > We think that there is some race condition involved in processing OF > Controller connections and Packet miss processing in ovs-vswitchd. > > Reasoning : > > Please consider the following GDB Debug Session: > > Breakpoint 1, ofconn_set_protocol (ofconn=0x16d5810, > protocol=OFPUTIL_P_OF10_STD) at ofproto/connmgr.c:999 > (gdb) f 2 > #2 0x0045f586 in connmgr_wants_packet_in_on_miss (mgr=0x16a6de0) at > ofproto/connmgr.c:1613 > 1613 enum ofputil_protocol protocol = ofconn_get_protocol(ofconn); > (gdb) p *ofconn > $2 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = 0, > next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = OFCONN_SERVICE, > band = OFPROTO_IN_BAND, enable_async_msgs = true, > role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, > packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, > schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, > controller_id = 0, reply_counter = 0x1673190, master_async_config = {3, 7, > 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, n_delete = > 0, n_modify = 0, > first_op = -9223372036854775808, last_op = -9223372036854775808, > next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, > monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, > n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = {prev > = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = > {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}} > (gdb) bt > #0 ofconn_set_protocol (ofconn=0x16d5810, protocol=OFPUTIL_P_OF10_STD) at > ofproto/connmgr.c:999 > #1 0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at > ofproto/connmgr.c:982 > #2 
0x0045f586 in connmgr_wants_packet_in_on_miss (mgr=0x16a6de0) at > ofproto/connmgr.c:1613 > #3 0x00435261 in rule_dpif_lookup_from_table (ofproto=0x16a6880, > version=323, flow=0x7f2ace7f86e8, wc=0x7f2ace7f84b0, stats=0x0, > table_id=0x7f2ace7f7eda "", in_port=28, may_packet_in=true, > honor_table_miss=true) at ofproto/ofproto-dpif.c:3973 > #4 0x00457ecf in xlate_actions (xin=0x7f2ace7f86e0, > xout=0x7f2ace7f8010) at ofproto/ofproto-dpif-xlate.c:5188 > #5 0x004481b1 in revalidate_ukey (udpif=0x16a7300, > ukey=0x7f2ab80060e0, stats=0x7f2ace7f94e0, odp_actions=0x7f2ace7f8a40, > reval_seq=585728, recircs=0x7f2ace7f8a30) > at ofproto/ofproto-dpif-upcall.c:1866 > #6 0x00448fb2 in revalidate (revalidator=0x1691990) at > ofproto/ofproto-dpif-upcall.c:2186 > #7 0x0044593e in udpif_revalidator (arg=0x1691990) at > ofproto/ofproto-dpif-upcall.c:862 > #8 0x0050b93d in ovsthread_wrapper (aux_=0x16f4560) at > lib/ovs-thread.c:340 > #9 0x7f2ad75c2184 in start_thread () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #10 0x7f2ad6de137d in clone () from /lib/x86_64-linux-gnu/libc.so.6 > (gdb) f 1 > #1 0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at > ofproto/connmgr.c:982 > 982 ofconn_set_protocol(CONST_CAST(struct ofconn *, ofconn), > (gdb) l > 977 { > 978 if (ofconn->protocol == OFPUTIL_P_NONE && > 979 rconn_is_connected(ofconn->rconn)) { > 980 int version = rconn_get_version(ofconn->rconn); > 981 if (version > 0) { > 982 ofconn_set_protocol(CONST_CAST(struct ofconn *, ofconn), > 983 > ofputil_protocol_from_ofp_version(version)); > 984 } > 985 } > 986 > (gdb) p *ofconn > $3 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = 0, > next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = OFCONN_SERVICE, > band = OFPROTO_IN_BAND, enable_async_msgs = true, > role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, > packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, > schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, > 
controller_id = 0, reply_counter = 0x1673190, master_async_config = {3, 7, > 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, n_delete = > 0, n_modify = 0, > first_op = -9223372036854775808, last_op = -9223372036854775808, > next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, > monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, > n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = {prev > = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = > {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}} > (gdb) p ofconn > $4 = (const struct ofconn *) 0x16d5810 > (gdb) c > Continuing. > [Thread 0x7f2ad79f5980 (LWP 20165) exited] > >
Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND
This issue happened on our in-use systems, and we were trying to find a way to
move forward avoiding this issue so that we do not have to upgrade OVS on
thousands of our hypervisors, causing downtime. Our debugging did help us
avoid the issue for now by installing an explicit rule to drop packets when
there is no match, and the issue is not seen over many hours of test runs.

We will definitely run this test with the latest version, but we will need
more time since we are busy with our release-related activities.

Regards,
Sagar.

On Tue, Jan 17, 2017 at 8:42 AM, Ben Pfaff wrote:
> It would be more helpful to have a simple reproduction case.
>
> Why haven't you tried a newer version from branch-2.5?
>
> On Tue, Jan 17, 2017 at 07:59:05AM -0800, Vidyasagara Guntaka wrote:
> > Hi Ben,
> >
> > Here is more debug information related to this incident (still using
> > version 2.5.0):
> >
> > Summary:
> >
> > We think that there is some race condition involved in processing OF
> > Controller connections and Packet miss processing in ovs-vswitchd.
> > > > Reasoning : > > > > Please consider the following GDB Debug Session: > > > > Breakpoint 1, ofconn_set_protocol (ofconn=0x16d5810, > protocol=OFPUTIL_P_OF10_STD) at ofproto/connmgr.c:999 > > (gdb) f 2 > > #2 0x0045f586 in connmgr_wants_packet_in_on_miss > (mgr=0x16a6de0) at ofproto/connmgr.c:1613 > > 1613 enum ofputil_protocol protocol = > ofconn_get_protocol(ofconn); > > (gdb) p *ofconn > > $2 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = > 0, next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = > OFCONN_SERVICE, band = OFPROTO_IN_BAND, enable_async_msgs = true, > > role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, > packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, > schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, > > controller_id = 0, reply_counter = 0x1673190, master_async_config = > {3, 7, 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, > n_delete = 0, n_modify = 0, > > first_op = -9223372036854775808, last_op = -9223372036854775808, > next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, > monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, > > n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = > {prev = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = > {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}} > > (gdb) bt > > #0 ofconn_set_protocol (ofconn=0x16d5810, protocol=OFPUTIL_P_OF10_STD) > at ofproto/connmgr.c:999 > > #1 0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at > ofproto/connmgr.c:982 > > #2 0x0045f586 in connmgr_wants_packet_in_on_miss > (mgr=0x16a6de0) at ofproto/connmgr.c:1613 > > #3 0x00435261 in rule_dpif_lookup_from_table > (ofproto=0x16a6880, version=323, flow=0x7f2ace7f86e8, wc=0x7f2ace7f84b0, > stats=0x0, table_id=0x7f2ace7f7eda "", in_port=28, may_packet_in=true, > > honor_table_miss=true) at ofproto/ofproto-dpif.c:3973 > > #4 0x00457ecf in xlate_actions (xin=0x7f2ace7f86e0, > 
xout=0x7f2ace7f8010) at ofproto/ofproto-dpif-xlate.c:5188 > > #5 0x004481b1 in revalidate_ukey (udpif=0x16a7300, > ukey=0x7f2ab80060e0, stats=0x7f2ace7f94e0, odp_actions=0x7f2ace7f8a40, > reval_seq=585728, recircs=0x7f2ace7f8a30) > > at ofproto/ofproto-dpif-upcall.c:1866 > > #6 0x00448fb2 in revalidate (revalidator=0x1691990) at > ofproto/ofproto-dpif-upcall.c:2186 > > #7 0x0044593e in udpif_revalidator (arg=0x1691990) at > ofproto/ofproto-dpif-upcall.c:862 > > #8 0x0050b93d in ovsthread_wrapper (aux_=0x16f4560) at > lib/ovs-thread.c:340 > > #9 0x7f2ad75c2184 in start_thread () from /lib/x86_64-linux-gnu/ > libpthread.so.0 > > #10 0x7f2ad6de137d in clone () from /lib/x86_64-linux-gnu/libc.so.6 > > (gdb) f 1 > > #1 0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at > ofproto/connmgr.c:982 > > 982 ofconn_set_protocol(CONST_CAST(struct ofconn *, > ofconn), > > (gdb) l > > 977 { > > 978 if (ofconn->protocol == OFPUTIL_P_NONE && > > 979 rconn_is_connected(ofconn->rconn)) { > > 980 int version = rconn_get_version(ofconn->rconn); > > 981 if (version > 0) { > > 982 ofconn_set_protocol(CONST_CAST(struct ofconn *, > ofconn), > > 983 ofputil_protocol_from_ofp_ > version(version)); > > 984 } > > 985 } > > 986 > > (gdb) p *ofconn > > $3 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = > 0, next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = > OFCONN_SERVICE, band = OFPROTO_IN_BAND, enable_async_msgs = true, > > role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, > packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, > schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, > > controller_id = 0, reply_counter =
Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during packet transmission.
Will respin a patch v2 where we addressed your comments. So please let's continue our discussion on v2. Thanks, Antonio > -Original Message- > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- > boun...@openvswitch.org] On Behalf Of Fischetti, Antonio > Sent: Thursday, January 12, 2017 5:40 PM > To: Ilya Maximets; Bodireddy, Bhanuprakash > ; Aaron Conole > Cc: d...@openvswitch.org; Daniele Di Proietto > Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during > packet transmission. > > Hi Ilya, > thanks for your detailed feedback. I've added a couple of replies > inline about when the instant send happens and the may_steal param. > > > Regards, > Antonio > > > -Original Message- > > From: Ilya Maximets [mailto:i.maxim...@samsung.com] > > Sent: Tuesday, December 20, 2016 12:47 PM > > To: Bodireddy, Bhanuprakash ; Aaron > > Conole > > Cc: d...@openvswitch.org; Daniele Di Proietto ; > > Thadeu Lima de Souza Cascardo ; Fischetti, Antonio > > ; Markus Magnusson > > > > Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue > during > > packet transmission. > > > > On 20.12.2016 15:19, Bodireddy, Bhanuprakash wrote: > > >> -Original Message- > > >> From: Ilya Maximets [mailto:i.maxim...@samsung.com] > > >> Sent: Tuesday, December 20, 2016 8:09 AM > > >> To: Bodireddy, Bhanuprakash ; Aaron > > >> Conole > > >> Cc: d...@openvswitch.org; Daniele Di Proietto > ; > > >> Thadeu Lima de Souza Cascardo ; Fischetti, > Antonio > > >> ; Markus Magnusson > > >> > > >> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue > > during > > >> packet transmission. > > >> > > >> On 19.12.2016 21:05, Bodireddy, Bhanuprakash wrote: > > >>> Thanks Ilya and Aaron for reviewing this patch and providing your > > >> comments, my reply inline. 
> > >>> > > -Original Message- > > From: Ilya Maximets [mailto:i.maxim...@samsung.com] > > Sent: Monday, December 19, 2016 8:41 AM > > To: Aaron Conole ; Bodireddy, Bhanuprakash > > > > Cc: d...@openvswitch.org; Daniele Di Proietto > > ; Thadeu Lima de Souza Cascardo > > ; Fischetti, Antonio > > ; Markus Magnusson > > > > Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue > > during packet transmission. > > > > Hi, > > > > I didn't test this patch yet. Bhanu, could you please describe your > > test scenario and performance results in more details. > > >>> > > >>> During the recent performance analysis improvements for classifier, > we > > >> found that bottleneck was also observed at flow batching. > > >>> This was due to fewer packets in batch. To reproduce this, a simple > > P2P test > > >> case can be used with 30 IXIA streams and matching IP flow rules. > > >>> Eg: For an IXIA stream with src Ip: 2.2.2.1, dst tip: 5.5.5.1 > > >>> ovs-ofctl add-flow br0 > > >>> dl_type=0x0800,nw_src=2.2.2.1,actions=output:2 > > >>> > > >>> For an IXIA stream with src Ip: 4.4.4.1, dst tip: 15.15.15.1 > > >>> ovs-ofctl add-flow br0 > > >>> dl_type=0x0800,nw_src=4.4.4.1,actions=output:2 > > >>> > > >>> This leaves fewer packets in batches and > > packet_batch_per_flow_execute() > > >> shall be invoked for every batch. > > >>> With 30 flows, I see the throughput drops to ~7.0 Mpps in PHY2PHY > case > > for > > >> 64 byte udp packets. > > >>> > > It'll be nice if you provide throughput and latency measurement > > results for different scenarios and packet sizes. Latency is > > important here. > > >>> We are yet to do latency measurements in this case. With 30 IXIA > > >>> streams comprising of 64 byte udp packets there was an throughput > > >>> improvement of 30% in P2P case and 13-15% in PVP case(single queue). > > we > > >> will try to get the latency stats with and without this patch. > > >>> > > > > About the patch itself: > > > > 1. 
'drain' called only for PMD threads. This will lead to > > broken connection with non-PMD ports. > > >>> I see that non-PMD ports are handled with vswitchd thread. > > >>> Tested PVP loopback case with tap ports and found to be working as > > >>> expected. Can you let me know the specific case you are referring > here > > so > > >> that I can verify if the patch breaks it. > > >> > > >> I meant something like this: > > >> > > >> > > >>
[ovs-dev] [PATCH v2] dpif-netdev: Change definitions of 'idle' & 'processing' cycles
Instead of counting all polling cycles as processing cycles, only count
the cycles where packets were received from the polling.

Signed-off-by: Georg Schmuecking
Signed-off-by: Ciara Loftus
Co-authored-by: Ciara Loftus
---
v2:
- Rebase
---
 lib/dpif-netdev.c | 57 ++-
 1 file changed, 44 insertions(+), 13 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 3901129..3854c79 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -272,7 +272,10 @@ enum dp_stat_type {

 enum pmd_cycles_counter_type {
     PMD_CYCLES_POLLING,         /* Cycles spent polling NICs. */
-    PMD_CYCLES_PROCESSING,      /* Cycles spent processing packets */
+    PMD_CYCLES_IDLE,            /* Cycles spent idle or unsuccessful polling */
+    PMD_CYCLES_PROCESSING,      /* Cycles spent successfully polling and
+                                 * processing polled packets */
+
     PMD_N_CYCLES
 };

@@ -747,10 +750,10 @@ pmd_info_show_stats(struct ds *reply,
     }

     ds_put_format(reply,
-                  "\tpolling cycles:%"PRIu64" (%.02f%%)\n"
+                  "\tidle cycles:%"PRIu64" (%.02f%%)\n"
                   "\tprocessing cycles:%"PRIu64" (%.02f%%)\n",
-                  cycles[PMD_CYCLES_POLLING],
-                  cycles[PMD_CYCLES_POLLING] / (double)total_cycles * 100,
+                  cycles[PMD_CYCLES_IDLE],
+                  cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100,
                   cycles[PMD_CYCLES_PROCESSING],
                   cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100);

@@ -2892,30 +2895,43 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd,
     non_atomic_ullong_add(&pmd->cycles.n[type], interval);
 }

-static void
+/* Calculate the intermediate cycle result and add to the counter 'type' */
+static inline void
+cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
+                          enum pmd_cycles_counter_type type)
+    OVS_NO_THREAD_SAFETY_ANALYSIS
+{
+    unsigned long long new_cycles = cycles_counter();
+    unsigned long long interval = new_cycles - pmd->last_cycles;
+    pmd->last_cycles = new_cycles;
+
+    non_atomic_ullong_add(&pmd->cycles.n[type], interval);
+}
+
+static int
 dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
                            struct netdev_rxq *rx,
                            odp_port_t port_no)
 {
     struct dp_packet_batch batch;
     int error;
+    int batch_cnt = 0;

     dp_packet_batch_init(&batch);
-    cycles_count_start(pmd);
     error = netdev_rxq_recv(rx, &batch);
-    cycles_count_end(pmd, PMD_CYCLES_POLLING);
     if (!error) {
         *recirc_depth_get() = 0;

-        cycles_count_start(pmd);
+        batch_cnt = batch.count;
         dp_netdev_input(pmd, &batch, port_no);
-        cycles_count_end(pmd, PMD_CYCLES_PROCESSING);
     } else if (error != EAGAIN && error != EOPNOTSUPP) {
         static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);

         VLOG_ERR_RL(&rl, "error receiving data from %s: %s",
                     netdev_rxq_get_name(rx), ovs_strerror(error));
     }
+
+    return batch_cnt;
 }

 static struct tx_port *
@@ -3377,21 +3393,27 @@ dpif_netdev_run(struct dpif *dpif)
     struct dp_netdev *dp = get_dp_netdev(dpif);
     struct dp_netdev_pmd_thread *non_pmd;
     uint64_t new_tnl_seq;
+    int process_packets = 0;

     ovs_mutex_lock(&dp->port_mutex);
     non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
     if (non_pmd) {
         ovs_mutex_lock(&dp->non_pmd_mutex);
+        cycles_count_start(non_pmd);
         HMAP_FOR_EACH (port, node, &dp->ports) {
             if (!netdev_is_pmd(port->netdev)) {
                 int i;

                 for (i = 0; i < port->n_rxq; i++) {
-                    dp_netdev_process_rxq_port(non_pmd, port->rxqs[i].rx,
-                                               port->port_no);
+                    process_packets +=
+                        dp_netdev_process_rxq_port(non_pmd,
+                                                   port->rxqs[i].rx,
+                                                   port->port_no);
                 }
             }
         }
+        cycles_count_end(non_pmd, process_packets ? PMD_CYCLES_PROCESSING
+                                                  : PMD_CYCLES_IDLE);
         dpif_netdev_xps_revalidate_pmd(non_pmd, time_msec(), false);
         ovs_mutex_unlock(&dp->non_pmd_mutex);

@@ -3516,6 +3538,7 @@ pmd_thread_main(void *f_)
     bool exiting;
     int poll_cnt;
     int i;
+    int process_packets = 0;

     poll_list = NULL;

@@ -3542,10 +3565,12 @@ reload:
         lc = UINT_MAX;
     }

+    cycles_count_start(pmd);
     for (;;) {
         for (i = 0; i < poll_cnt; i++) {
-            dp_netdev_process_rxq_port(pmd, poll_list[i].rx,
-                                       poll_list[i].port_no);
+            process_packets +=
+
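The accounting change is easy to model in isolation: each poll interval is booked as "idle" when the rx burst came back empty, and as "processing" otherwise. A simplified sketch (hypothetical names; the real code reads the TSC via cycles_counter() and accumulates with non-atomic 64-bit adds):

```c
#include <assert.h>
#include <stdint.h>

enum cycles_type { CYCLES_IDLE, CYCLES_PROCESSING, CYCLES_N };

/* Hypothetical model of the patch's accounting: every polling interval is
 * attributed to IDLE when the poll returned no packets, or to PROCESSING
 * when packets were received and processed. */
struct cycle_stats {
    uint64_t n[CYCLES_N];
    uint64_t last_cycles;
};

/* 'now' stands in for the TSC value returned by cycles_counter();
 * 'batch_cnt' is the number of packets the poll returned. */
static void
count_interval(struct cycle_stats *s, uint64_t now, int batch_cnt)
{
    uint64_t interval = now - s->last_cycles;
    s->last_cycles = now;
    s->n[batch_cnt ? CYCLES_PROCESSING : CYCLES_IDLE] += interval;
}
```

Every cycle lands in exactly one bucket, so "idle" plus "processing" still sums to the total, which is what keeps the percentage display in pmd_info_show_stats() meaningful.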
Re: [ovs-dev] [PATCH v10 1/8] ovn: document logical routers and logical patch ports in ovn-architecture
On Tue, Jan 17, 2017 at 01:45:02AM -0800, Mickey Spiegel wrote:
> This patch adds a description of logical routers and logical patch ports,
> including gateway routers, to ovn/ovn-architecture.7.xml.
>
> Signed-off-by: Mickey Spiegel

Thanks for working on the documentation! I applied this to master, folding in
just one change (because I wanted to emphasize that the behavior isn't
different, just the implementation):

diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index d92f878..c0ceb65 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -1096,8 +1096,8 @@
     When the packet reaches table 65, the logical egress port is a logical
-    patch port. The behavior at table 65 differs depending on the OVS
-    version:
+    patch port. The implementation in table 65 differs depending on the OVS
+    version, although the observed behavior is meant to be the same:

Thanks,

Ben.
Re: [ovs-dev] [PATCH v2] dpif-netdev: Change definitions of 'idle' & 'processing' cycles
On 01/17/2017 05:43 PM, Ciara Loftus wrote: > Instead of counting all polling cycles as processing cycles, only count > the cycles where packets were received from the polling. This makes these stats much clearer. One minor comment below, other than that Acked-by: Kevin Traynor > > Signed-off-by: Georg Schmuecking > Signed-off-by: Ciara Loftus > Co-authored-by: Ciara Loftus > --- > v2: > - Rebase > --- > lib/dpif-netdev.c | 57 > ++- > 1 file changed, 44 insertions(+), 13 deletions(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index 3901129..3854c79 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -272,7 +272,10 @@ enum dp_stat_type { > > enum pmd_cycles_counter_type { > PMD_CYCLES_POLLING, /* Cycles spent polling NICs. */ this is not used anymore and can be removed > -PMD_CYCLES_PROCESSING, /* Cycles spent processing packets */ > +PMD_CYCLES_IDLE,/* Cycles spent idle or unsuccessful polling > */ > +PMD_CYCLES_PROCESSING, /* Cycles spent successfully polling and > + * processing polled packets */ > + > PMD_N_CYCLES > }; > > @@ -747,10 +750,10 @@ pmd_info_show_stats(struct ds *reply, > } > > ds_put_format(reply, > - "\tpolling cycles:%"PRIu64" (%.02f%%)\n" > + "\tidle cycles:%"PRIu64" (%.02f%%)\n" >"\tprocessing cycles:%"PRIu64" (%.02f%%)\n", > - cycles[PMD_CYCLES_POLLING], > - cycles[PMD_CYCLES_POLLING] / (double)total_cycles * 100, > + cycles[PMD_CYCLES_IDLE], > + cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100, >cycles[PMD_CYCLES_PROCESSING], >cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * > 100); > > @@ -2892,30 +2895,43 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd, > non_atomic_ullong_add(&pmd->cycles.n[type], interval); > } > > -static void > +/* Calculate the intermediate cycle result and add to the counter 'type' */ > +static inline void > +cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd, > + enum pmd_cycles_counter_type type) > +OVS_NO_THREAD_SAFETY_ANALYSIS > +{ > +unsigned long long new_cycles = cycles_counter(); > +unsigned long long interval = new_cycles - pmd->last_cycles; > +pmd->last_cycles = new_cycles; > + > +non_atomic_ullong_add(&pmd->cycles.n[type], interval); > +} > + > +static int > dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, > struct netdev_rxq *rx, > odp_port_t port_no) > { > struct dp_packet_batch batch; > int error; > +int batch_cnt = 0; > > dp_packet_batch_init(&batch); > -cycles_count_start(pmd); > error = netdev_rxq_recv(rx, &batch); > -cycles_count_end(pmd, PMD_CYCLES_POLLING); > if (!error) { > *recirc_depth_get() = 0; > > -cycles_count_start(pmd); > +batch_cnt = batch.count; > dp_netdev_input(pmd, &batch, port_no); > -cycles_count_end(pmd, PMD_CYCLES_PROCESSING); > } else if (error != EAGAIN && error != EOPNOTSUPP) { > static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); > > VLOG_ERR_RL(&rl, "error receiving data from %s: %s", > netdev_rxq_get_name(rx), ovs_strerror(error)); > } > + > +return batch_cnt; > } > > static struct tx_port * > @@ -3377,21 +3393,27 @@ dpif_netdev_run(struct dpif *dpif) > struct dp_netdev *dp = get_dp_netdev(dpif); > struct dp_netdev_pmd_thread *non_pmd; > uint64_t new_tnl_seq; > +int process_packets = 0; > > ovs_mutex_lock(&dp->port_mutex); > non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID); > if (non_pmd) { > ovs_mutex_lock(&non_pmd->non_pmd_mutex); > +cycles_count_start(non_pmd); > HMAP_FOR_EACH (port, node, &dp->ports) { > if (!netdev_is_pmd(port->netdev)) { > int i; > > for (i = 0; i < port->n_rxq; i++) { > -dp_netdev_process_rxq_port(non_pmd, port->rxqs[i].rx, > - port->port_no); > +process_packets += > +dp_netdev_process_rxq_port(non_pmd, > + port->rxqs[i].rx, > + port->port_no); > } > } > } > +cycles_count_end(non_pmd, process_packets ? PMD_CYCLES_PROCESSING > + : PMD_CYCLES_IDLE); > dpif_netdev_xps_revalidate_pmd(non_pmd, time_msec(), false); > ovs_mutex_unlock(&non_pmd->non_pmd_mutex); > > @@ -3516,6
Re: [ovs-dev] [PATCH] dp-packet: Fix comment on DPDK buffer data.
Self-NACK. Will include this change with another patch. Antonio > -Original Message- > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- > boun...@openvswitch.org] On Behalf Of antonio.fische...@intel.com > Sent: Wednesday, January 11, 2017 5:11 PM > To: d...@openvswitch.org > Subject: [ovs-dev] [PATCH] dp-packet: Fix comment on DPDK buffer data. > > Small fix on the comments about the DPDK buffer data to refer > to dp_packet_init_dpdk() function. > > Signed-off-by: Antonio Fischetti> --- > lib/dp-packet.c | 4 ++-- > lib/dp-packet.h | 2 +- > 2 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/lib/dp-packet.c b/lib/dp-packet.c > index 793b54f..14ab15e 100644 > --- a/lib/dp-packet.c > +++ b/lib/dp-packet.c > @@ -91,8 +91,8 @@ dp_packet_use_const(struct dp_packet *b, const void > *data, size_t size) > > /* Initializes 'b' as an empty dp_packet that contains the 'allocated' > bytes of > * memory starting at 'base'. DPDK allocated dp_packet and *data is > allocated > - * from one continous memory region, so in memory data start right after > - * dp_packet. Therefore there is special method to free this type of > + * from one continuous memory region, so in memory data start right after > + * dp_packet. Therefore there is a special method to free this type of > * buffer. dp_packet base, data and size are initialized by dpdk rcv() > so no > * need to initialize those fields. */ > void > diff --git a/lib/dp-packet.h b/lib/dp-packet.h > index cf7d247..b941540 100644 > --- a/lib/dp-packet.h > +++ b/lib/dp-packet.h > @@ -39,7 +39,7 @@ enum OVS_PACKED_ENUM dp_packet_source { > DPBUF_STACK, /* Un-movable stack space or static > buffer. */ > DPBUF_STUB,/* Starts on stack, may expand into heap. > */ > DPBUF_DPDK,/* buffer data is from DPDK allocated > memory. > -* ref to build_dp_packet() in netdev- > dpdk. */ > +* Ref to dp_packet_init_dpdk() in dp- > packet. 
*/ > }; > > #define DP_PACKET_CONTEXT_SIZE 64 > -- > 2.4.11 > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND
Hi Ben, Here is more debug information related to this incident (still using version 2.5.0): Summary : We think that there is some race condition involved in processing OF Controller connections and Packet miss processing in ovs-vswitchd. Reasoning : Please consider the following GDB Debug Session: Breakpoint 1, ofconn_set_protocol (ofconn=0x16d5810, protocol=OFPUTIL_P_OF10_STD) at ofproto/connmgr.c:999 (gdb) f 2 #2 0x0045f586 in connmgr_wants_packet_in_on_miss (mgr=0x16a6de0) at ofproto/connmgr.c:1613 1613 enum ofputil_protocol protocol = ofconn_get_protocol(ofconn); (gdb) p *ofconn $2 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = 0, next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = OFCONN_SERVICE, band = OFPROTO_IN_BAND, enable_async_msgs = true, role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, controller_id = 0, reply_counter = 0x1673190, master_async_config = {3, 7, 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, n_delete = 0, n_modify = 0, first_op = -9223372036854775808, last_op = -9223372036854775808, next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = {prev = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}} (gdb) bt #0 ofconn_set_protocol (ofconn=0x16d5810, protocol=OFPUTIL_P_OF10_STD) at ofproto/connmgr.c:999 #1 0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at ofproto/connmgr.c:982 #2 0x0045f586 in connmgr_wants_packet_in_on_miss (mgr=0x16a6de0) at ofproto/connmgr.c:1613 #3 0x00435261 in rule_dpif_lookup_from_table (ofproto=0x16a6880, version=323, flow=0x7f2ace7f86e8, wc=0x7f2ace7f84b0, stats=0x0, table_id=0x7f2ace7f7eda "", in_port=28, may_packet_in=true,
honor_table_miss=true) at ofproto/ofproto-dpif.c:3973 #4 0x00457ecf in xlate_actions (xin=0x7f2ace7f86e0, xout=0x7f2ace7f8010) at ofproto/ofproto-dpif-xlate.c:5188 #5 0x004481b1 in revalidate_ukey (udpif=0x16a7300, ukey=0x7f2ab80060e0, stats=0x7f2ace7f94e0, odp_actions=0x7f2ace7f8a40, reval_seq=585728, recircs=0x7f2ace7f8a30) at ofproto/ofproto-dpif-upcall.c:1866 #6 0x00448fb2 in revalidate (revalidator=0x1691990) at ofproto/ofproto-dpif-upcall.c:2186 #7 0x0044593e in udpif_revalidator (arg=0x1691990) at ofproto/ofproto-dpif-upcall.c:862 #8 0x0050b93d in ovsthread_wrapper (aux_=0x16f4560) at lib/ovs-thread.c:340 #9 0x7f2ad75c2184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #10 0x7f2ad6de137d in clone () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) f 1 #1 0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at ofproto/connmgr.c:982 982 ofconn_set_protocol(CONST_CAST(struct ofconn *, ofconn), (gdb) l 977 { 978 if (ofconn->protocol == OFPUTIL_P_NONE && 979 rconn_is_connected(ofconn->rconn)) { 980 int version = rconn_get_version(ofconn->rconn); 981 if (version > 0) { 982 ofconn_set_protocol(CONST_CAST(struct ofconn *, ofconn), 983 ofputil_protocol_from_ofp_version(version)); 984 } 985 } 986 (gdb) p *ofconn $3 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = 0, next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = OFCONN_SERVICE, band = OFPROTO_IN_BAND, enable_async_msgs = true, role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, controller_id = 0, reply_counter = 0x1673190, master_async_config = {3, 7, 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, n_delete = 0, n_modify = 0, first_op = -9223372036854775808, last_op = -9223372036854775808, next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, n = 0}, 
monitor_paused = 0, monitor_counter = 0x16759f0, updates = {prev = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}} (gdb) p ofconn $4 = (const struct ofconn *) 0x16d5810 (gdb) c Continuing. [Thread 0x7f2ad79f5980 (LWP 20165) exited] From the above GDB session, ovs-vswitchd is in the middle of processing a packet miss that was read from the datapath. The breakpoint was set inside ofconn_set_protocol so that we hit it if the protocol was already set to something other than OFPUTIL_P_NONE and is now being set to OFPUTIL_P_OF10_STD. Yes, we modified the code in ofconn_set_protocol with this if
Re: [ovs-dev] [PATCH 3/6] datapath-windows: Add support for OVS_TUNNEL_KEY_ATTR_TP_DST
We should consolidate this logic into one function for retrieving the tunKey from attr. This can be done separately. Acked-by: Sairam Venugopal On 1/10/17, 8:48 AM, "ovs-dev-boun...@openvswitch.org on behalf of Alin Serdean" wrote: >Add support for netlink attribute OVS_TUNNEL_KEY_ATTR_TP_DST get/set >flow functions. > >Signed-off-by: Alin Gabriel Serdean >--- > datapath-windows/ovsext/Flow.c | 8 > 1 file changed, 8 insertions(+) > >diff --git a/datapath-windows/ovsext/Flow.c b/datapath-windows/ovsext/Flow.c >index 749d83a..96ff9fa 100644 >--- a/datapath-windows/ovsext/Flow.c >+++ b/datapath-windows/ovsext/Flow.c >@@ -1737,6 +1737,9 @@ OvsTunnelAttrToIPv4TunnelKey(PNL_ATTR attr, > case OVS_TUNNEL_KEY_ATTR_OAM: > tunKey->flags |= OVS_TNL_F_OAM; > break; >+case OVS_TUNNEL_KEY_ATTR_TP_DST: >+tunKey->dst_port = NlAttrGetBe16(a); >+break; > case OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS: > if (hasOpt) { > /* Duplicate options attribute is not allowed. */ >@@ -1812,6 +1815,11 @@ MapTunAttrToFlowPut(PNL_ATTR *keyAttrs, > destKey->tunKey.flags |= OVS_TNL_F_OAM; > } > >+if (tunAttrs[OVS_TUNNEL_KEY_ATTR_TP_DST]) { >+destKey->tunKey.dst_port = >+NlAttrGetU16(tunAttrs[OVS_TUNNEL_KEY_ATTR_TP_DST]); >+} >+ > if (tunAttrs[OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS]) { > NTSTATUS status = OvsTunnelAttrToGeneveOptions( > tunAttrs[OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS], >-- >2.10.2.windows.1
Re: [ovs-dev] [PATCH 2/6] datapath-windows: Allow tunnel action to modify destination port
Patch 2 and 3 could’ve been combined into the same patch since they are about dst_port. Acked-by: Sairam Venugopal On 1/10/17, 8:48 AM, "ovs-dev-boun...@openvswitch.org on behalf of Alin Serdean" wrote: >'OvsTunnelAttrToIPv4TunnelKey' modifies 'tunkey' with the received netlink >attributes (i.e. OVS_TUNNEL_KEY_ATTR_IPV4_DST). > >Change the order of the value assignment to reflect the values received via >userspace. > >Signed-off-by: Alin Gabriel Serdean >--- > datapath-windows/ovsext/Actions.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > >diff --git a/datapath-windows/ovsext/Actions.c >b/datapath-windows/ovsext/Actions.c >index b4a286b..1530d8b 100644 >--- a/datapath-windows/ovsext/Actions.c >+++ b/datapath-windows/ovsext/Actions.c >@@ -1492,11 +1492,11 @@ OvsExecuteSetAction(OvsForwardingContext *ovsFwdCtx, > case OVS_KEY_ATTR_TUNNEL: > { > OvsIPv4TunnelKey tunKey; >+tunKey.flow_hash = (uint16)(hash ? *hash : OvsHashFlow(key)); >+tunKey.dst_port = key->ipKey.l4.tpDst; > NTSTATUS convertStatus = OvsTunnelAttrToIPv4TunnelKey((PNL_ATTR)a, &tunKey); > status = SUCCEEDED(convertStatus) ? NDIS_STATUS_SUCCESS : > NDIS_STATUS_FAILURE; > ASSERT(status == NDIS_STATUS_SUCCESS); >-tunKey.flow_hash = (uint16)(hash ? *hash : OvsHashFlow(key)); >-tunKey.dst_port = key->ipKey.l4.tpDst; > RtlCopyMemory(&ovsFwdCtx->tunKey, &tunKey, sizeof ovsFwdCtx->tunKey); > break; > } >-- >2.10.2.windows.1
Re: [ovs-dev] [PATCH 4/6] datapath-windows: VXLAN Check for flow destination port
Do we need to convert the tunKey->dst_port to network byte order? On 1/10/17, 8:48 AM, "ovs-dev-boun...@openvswitch.org on behalf of Alin Serdean" wrote: >Change the UDP destination port (VXLAN header) to check if it was set by >the userspace, and use it if it was set. >If the userspace did not specify a destination port, use the configured >vport destination port. > >Signed-off-by: Alin Gabriel Serdean >--- > datapath-windows/ovsext/Vxlan.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > >diff --git a/datapath-windows/ovsext/Vxlan.c b/datapath-windows/ovsext/Vxlan.c >index 949e069..84c2f2f 100644 >--- a/datapath-windows/ovsext/Vxlan.c >+++ b/datapath-windows/ovsext/Vxlan.c >@@ -289,7 +289,8 @@ OvsDoEncapVxlan(POVS_VPORT_ENTRY vport, > /* UDP header */ > udpHdr = (UDPHdr *)((PCHAR)ipHdr + sizeof *ipHdr); > udpHdr->source = htons(tunKey->flow_hash | MAXINT16); >-udpHdr->dest = htons(vportVxlan->dstPort); >+udpHdr->dest = tunKey->dst_port ? tunKey->dst_port : >+ htons(vportVxlan->dstPort); > udpHdr->len = htons(NET_BUFFER_DATA_LENGTH(curNb) - headRoom + > sizeof *udpHdr + sizeof *vxlanHdr); > >-- >2.10.2.windows.1
Re: [ovs-dev] [PATCH v2 15/17] datapath: enable vxlangpe creation in compat mode
Makes sense, will change it in v3. -Original Message- From: Jiri Benc [mailto:jb...@redhat.com] Sent: Wednesday, January 18, 2017 12:37 AM To: Yang, Yi Y Cc: d...@openvswitch.org Subject: Re: [ovs-dev] [PATCH v2 15/17] datapath: enable vxlangpe creation in compat mode On Wed, 28 Dec 2016 20:26:28 +0800, Yi Yang wrote: > --- a/datapath/linux/compat/include/linux/openvswitch.h > +++ b/datapath/linux/compat/include/linux/openvswitch.h > @@ -291,6 +291,7 @@ enum ovs_vport_attr { enum { > OVS_VXLAN_EXT_UNSPEC, > OVS_VXLAN_EXT_GBP, /* Flag or __u32 */ > + OVS_VXLAN_EXT_GPE, /* Flag or __u32 */ Please leave several free slots between OVS_VXLAN_EXT_GBP and OVS_VXLAN_EXT_GPE. Although it's not expected that we add more values to this enum upstream, we should not rule it out completely. Having a buffer here helps to prevent nasty problems with attribute conflicts between the upstream kernel and the compat code. Jiri
Re: [ovs-dev] [PATCH v2] dpif-netdev: Change definitions of 'idle' & 'processing' cycles
2017-01-17 11:43 GMT-08:00 Kevin Traynor: > On 01/17/2017 05:43 PM, Ciara Loftus wrote: >> Instead of counting all polling cycles as processing cycles, only count >> the cycles where packets were received from the polling. > > This makes these stats much clearer. One minor comment below, other than > that > > Acked-by: Kevin Traynor > >> >> Signed-off-by: Georg Schmuecking >> Signed-off-by: Ciara Loftus >> Co-authored-by: Ciara Loftus Minor: the co-authored-by tag should be different from the main author. This makes it easier to understand how busy a pmd thread is, a valid question that a sysadmin might have. The counters were originally introduced to help developers understand how cycles are spent between drivers (netdev rx) and datapath processing (dpif). Do you think it's ok to lose this type of information? Perhaps it is, since a developer can also use a profiler, I'm not sure. Maybe we could keep 'last_cycles' as it is and introduce a separate counter to get the idle/busy ratio. I'm not 100% sure this is the best way. What do you guys think? Thanks, Daniele >> --- >> v2: >> - Rebase >> --- >> lib/dpif-netdev.c | 57 >> ++- >> 1 file changed, 44 insertions(+), 13 deletions(-) >> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index 3901129..3854c79 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -272,7 +272,10 @@ enum dp_stat_type { >> >> enum pmd_cycles_counter_type { >> PMD_CYCLES_POLLING, /* Cycles spent polling NICs. */ > > this is not used anymore and can be removed > >> -PMD_CYCLES_PROCESSING, /* Cycles spent processing packets */ >> +PMD_CYCLES_IDLE,/* Cycles spent idle or unsuccessful >> polling */ >> +PMD_CYCLES_PROCESSING, /* Cycles spent successfully polling and >> + * processing polled packets */ >> + >> PMD_N_CYCLES >> }; >> >> @@ -747,10 +750,10 @@ pmd_info_show_stats(struct ds *reply, >> } >> >> ds_put_format(reply, >> - "\tpolling cycles:%"PRIu64" (%.02f%%)\n" >> + "\tidle cycles:%"PRIu64" (%.02f%%)\n" >>"\tprocessing cycles:%"PRIu64" (%.02f%%)\n", >> - cycles[PMD_CYCLES_POLLING], >> - cycles[PMD_CYCLES_POLLING] / (double)total_cycles * 100, >> + cycles[PMD_CYCLES_IDLE], >> + cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100, >>cycles[PMD_CYCLES_PROCESSING], >>cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * >> 100); >> >> @@ -2892,30 +2895,43 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd, >> non_atomic_ullong_add(&pmd->cycles.n[type], interval); >> } >> >> -static void >> +/* Calculate the intermediate cycle result and add to the counter 'type' */ >> +static inline void >> +cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd, >> + enum pmd_cycles_counter_type type) I'd add an OVS_REQUIRES(&cycles_counter_fake_mutex) >> +OVS_NO_THREAD_SAFETY_ANALYSIS >> +{ >> +unsigned long long new_cycles = cycles_counter(); >> +unsigned long long interval = new_cycles - pmd->last_cycles; >> +pmd->last_cycles = new_cycles; >> + >> +non_atomic_ullong_add(&pmd->cycles.n[type], interval); >> +} >> + >> +static int >> dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, >> struct netdev_rxq *rx, >> odp_port_t port_no) >> { >> struct dp_packet_batch batch; >> int error; >> +int batch_cnt = 0; >> >> dp_packet_batch_init(&batch); >> -cycles_count_start(pmd); >> error = netdev_rxq_recv(rx, &batch); >> -cycles_count_end(pmd, PMD_CYCLES_POLLING); >> if (!error) { >> *recirc_depth_get() = 0; >> >> -cycles_count_start(pmd); >> +batch_cnt = batch.count; >> dp_netdev_input(pmd, &batch, port_no); >> -cycles_count_end(pmd, PMD_CYCLES_PROCESSING); >> } else if (error != EAGAIN && error != EOPNOTSUPP) { >> static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); >> >> VLOG_ERR_RL(&rl, "error receiving data from %s: %s", >> netdev_rxq_get_name(rx), ovs_strerror(error)); >> } >> + >> +return batch_cnt; >> } >> >> static struct tx_port * >> @@ -3377,21 +3393,27 @@ dpif_netdev_run(struct dpif *dpif) >> struct dp_netdev *dp = get_dp_netdev(dpif); >> struct dp_netdev_pmd_thread *non_pmd; >> uint64_t new_tnl_seq; >> +int process_packets = 0; >> >> ovs_mutex_lock(&dp->port_mutex); >> non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID); >> if (non_pmd) { >> ovs_mutex_lock(&non_pmd->non_pmd_mutex); >> +cycles_count_start(non_pmd); >> HMAP_FOR_EACH (port, node, &dp->ports) { >>
[ovs-dev] [RFC] ofproto/bond: operational vs administratively disabled bond interface
Currently OVS does not distinguish between a bond slave being operationally disabled, i.e. the link being down, and administratively disabled. Take the example where the administrator disabled a link in a bond, "ovs-appctl bond/disable-slave bond0 enp129s0f0": it's automatically enabled again due to the fact that the link is up. I would like to change this behavior such that when disabled through appctl the slave is no longer used until explicitly enabled again via appctl. Any comments? If not, I will go ahead and prepare a patch set. Cheers, Eelco
[ovs-dev] [PATCH v10 4/8] ovn: add egress_loopback action
This patch adds an action that loops a clone of the packet back to the beginning of the ingress pipeline with logical inport equal to the value of the current logical outport. The following actions are executed on the clone:
clears the connection tracking state
in_port = 0
inport = outport
outport = 0
flags = 0
reg0 ... reg9 = 0
nested actions from inside "{ ... }", for example "reg9[1] = 1" to indicate that egress loopback has occurred
executes the ingress pipeline as a subroutine
This action is expected to be executed in the egress pipeline. No changes are made to the logical datapath or to the connection tracking zones, which will continue to be correct when carrying out loopback from the egress pipeline to the ingress pipeline. This capability is needed in order to implement some of the east/west distributed NAT flows. Signed-off-by: Mickey Spiegel --- include/ovn/actions.h | 5 +++- ovn/controller/lflow.c| 1 + ovn/lib/actions.c | 71 +-- ovn/ovn-sb.xml| 35 +++ ovn/utilities/ovn-trace.c | 49 tests/ovn.at | 4 +++ tests/test-ovn.c | 1 + 7 files changed, 163 insertions(+), 3 deletions(-) diff --git a/include/ovn/actions.h b/include/ovn/actions.h index 0bf6145..5f9d203 100644 --- a/include/ovn/actions.h +++ b/include/ovn/actions.h @@ -68,7 +68,8 @@ struct simap; OVNACT(PUT_ND,ovnact_put_mac_bind) \ OVNACT(PUT_DHCPV4_OPTS, ovnact_put_dhcp_opts) \ OVNACT(PUT_DHCPV6_OPTS, ovnact_put_dhcp_opts) \ -OVNACT(SET_QUEUE, ovnact_set_queue) +OVNACT(SET_QUEUE, ovnact_set_queue) \ +OVNACT(EGRESS_LOOPBACK, ovnact_nest) /* enum ovnact_type, with a member OVNACT_ for each action. */ enum OVS_PACKED_ENUM ovnact_type { @@ -444,6 +445,8 @@ struct ovnact_encode_params { uint8_t output_ptable; /* OpenFlow table for 'output' to resubmit. */ uint8_t mac_bind_ptable;/* OpenFlow table for 'get_arp'/'get_nd' to resubmit. */ +uint8_t ingress_ptable; /* OpenFlow table for 'egress_loopback' to + resubmit.
*/ }; void ovnacts_encode(const struct ovnact[], size_t ovnacts_len, diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index 3d7633e..c41368f 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -226,6 +226,7 @@ consider_logical_flow(const struct lport_index *lports, .first_ptable = first_ptable, .output_ptable = output_ptable, .mac_bind_ptable = OFTABLE_MAC_BINDING, +.ingress_ptable = OFTABLE_LOG_INGRESS_PIPELINE, }; ovnacts_encode(ovnacts.data, ovnacts.size, &ep, &ofpacts); ovnacts_free(ovnacts.data, ovnacts.size); diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c index 686ecc5..dda675b 100644 --- a/ovn/lib/actions.c +++ b/ovn/lib/actions.c @@ -169,6 +169,21 @@ put_load(uint64_t value, enum mf_field_id dst, int ofs, int n_bits, bitwise_copy(&n_value, 8, 0, sf->value, sf->field->n_bytes, ofs, n_bits); bitwise_one(ofpact_set_field_mask(sf), sf->field->n_bytes, ofs, n_bits); } + +static void +put_move(enum mf_field_id src, int src_ofs, + enum mf_field_id dst, int dst_ofs, + int n_bits, + struct ofpbuf *ofpacts) +{ +struct ofpact_reg_move *move = ofpact_put_REG_MOVE(ofpacts); +move->src.field = mf_from_id(src); +move->src.ofs = src_ofs; +move->src.n_bits = n_bits; +move->dst.field = mf_from_id(dst); +move->dst.ofs = dst_ofs; +move->dst.n_bits = n_bits; +} /* Context maintained during ovnacts_parse(). */ struct action_context { @@ -1021,7 +1036,10 @@ free_CT_LB(struct ovnact_ct_lb *ct_lb) } /* Implements the "arp" and "nd_na" actions, which execute nested actions on a - * packet derived from the one being processed. */ + * packet derived from the one being processed. Also implements the + * "egress_loopback" action, which executes nested actions after clearing + * registers and connection state, then loops the packet back to the + * beginning of the ingress pipeline.
*/ static void parse_nested_action(struct action_context *ctx, enum ovnact_type type, const char *prereq) @@ -1055,7 +1073,9 @@ parse_nested_action(struct action_context *ctx, enum ovnact_type type, return; } -add_prerequisite(ctx, prereq); +if (prereq) { +add_prerequisite(ctx, prereq); +} struct ovnact_nest *on = ovnact_put(ctx->ovnacts, type, sizeof *on); on->nested_len = nested.size; @@ -1075,6 +1095,12 @@ parse_ND_NA(struct action_context *ctx) } static void +parse_EGRESS_LOOPBACK(struct action_context *ctx) +{ +parse_nested_action(ctx, OVNACT_EGRESS_LOOPBACK, NULL); +} + +static void format_nested_action(const struct ovnact_nest *on, const char *name,
[ovs-dev] [PATCH v10 8/8] ovn: ovn-nbctl commands for distributed NAT
This patch adds the new optional arguments "logical_port" and "external_mac" to lr-nat-add, and displays that information in lr-nat-list. Signed-off-by: Mickey Spiegel--- ovn/utilities/ovn-nbctl.8.xml | 27 +++--- ovn/utilities/ovn-nbctl.c | 54 +-- tests/ovn-nbctl.at| 47 + tests/system-ovn.at | 30 +--- 4 files changed, 119 insertions(+), 39 deletions(-) diff --git a/ovn/utilities/ovn-nbctl.8.xml b/ovn/utilities/ovn-nbctl.8.xml index 4911c6a..c408484 100644 --- a/ovn/utilities/ovn-nbctl.8.xml +++ b/ovn/utilities/ovn-nbctl.8.xml @@ -444,7 +444,7 @@ NAT Commands - [--may-exist] lr-nat-add router type external_ip logical_ip + [--may-exist] lr-nat-add router type external_ip logical_ip [logical_port external_mac] Adds the specified NAT to router. @@ -453,6 +453,13 @@ The external_ip is an IPv4 address. The logical_ip is an IPv4 network (e.g 192.168.1.0/24) or an IPv4 address. + The logical_port and external_mac are only + accepted when router is a distributed router (rather + than a gateway router) and type is + dnat_and_snat. + The logical_port is the name of an existing logical + switch port where the logical_ip resides. + The external_mac is an Ethernet address. When type is dnat, the externally @@ -475,8 +482,22 @@ the IP address in external_ip. - It is an error if a NAT already exists, - unless --may-exist is specified. + When the logical_port and external_mac + are specified, the NAT rule will be programmed on the chassis + where the logical_port resides. This includes + ARP replies for the external_ip, which return the + value of external_mac. All packets transmitted + with source IP address equal to external_ip will + be sent using the external_mac. + + + It is an error if a NAT already exists with the same values + of router, type, external_ip, + and logical_ip, unless --may-exist is + specified. When --may-exist, + logical_port, and external_mac are all + specified, the existing values of logical_port and + external_mac are overwritten. 
diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c index 4397daf..661f7de 100644 --- a/ovn/utilities/ovn-nbctl.c +++ b/ovn/utilities/ovn-nbctl.c @@ -384,7 +384,7 @@ Route commands:\n\ lr-route-list ROUTER print routes for ROUTER\n\ \n\ NAT commands:\n\ - lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP\n\ + lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP [LOGICAL_PORT EXTERNAL_MAC]\n\ add a NAT to ROUTER\n\ lr-nat-del ROUTER [TYPE [IP]]\n\ remove NATs from ROUTER\n\ @@ -2233,6 +2233,30 @@ nbctl_lr_nat_add(struct ctl_context *ctx) new_logical_ip = normalize_ipv4_prefix(ipv4, plen); } +const char *logical_port; +const char *external_mac; +if (ctx->argc == 6) { +ctl_fatal("lr-nat-add with logical_port " + "must also specify external_mac."); +} else if (ctx->argc == 7) { +if (strcmp(nat_type, "dnat_and_snat")) { +ctl_fatal("logical_port and external_mac are only valid when " + "type is \"dnat_and_snat\"."); +} + +logical_port = ctx->argv[5]; +lsp_by_name_or_uuid(ctx, logical_port, true); + +external_mac = ctx->argv[6]; +struct eth_addr ea; +if (!eth_addr_from_string(external_mac, &ea)) { +ctl_fatal("invalid mac address %s.", external_mac); +} +} else { +logical_port = NULL; +external_mac = NULL; +} + bool may_exist = shash_find(&ctx->options, "--may-exist") != NULL; int is_snat = !strcmp("snat", nat_type); for (size_t i = 0; i < lr->n_nat; i++) { @@ -2243,6 +2267,10 @@ nbctl_lr_nat_add(struct ctl_context *ctx) if (!strcmp(is_snat ? external_ip : new_logical_ip, is_snat ? nat->external_ip : nat->logical_ip)) { if (may_exist) { +nbrec_nat_verify_logical_port(nat); +nbrec_nat_verify_external_mac(nat); +nbrec_nat_set_logical_port(nat, logical_port); +nbrec_nat_set_external_mac(nat, external_mac); free(new_logical_ip); return; } @@ -2265,6 +2293,10 @@ nbctl_lr_nat_add(struct ctl_context *ctx) nbrec_nat_set_type(nat, nat_type); nbrec_nat_set_external_ip(nat, external_ip);
[ovs-dev] [PATCH v10 6/8] ovn: avoid snat recirc only on gateway routers
Currently, for performance reasons on gateway routers, ct_snat that does not specify an IP address does not immediately trigger recirculation. On gateway routers, ct_snat that does not specify an IP address happens in the UNSNAT pipeline stage, which is followed by the DNAT pipeline stage that triggers recirculation for all packets. This DNAT pipeline stage recirculation takes care of the recirculation needs of UNSNAT as well as other cases such as UNDNAT. On distributed routers, UNDNAT is handled in the egress pipeline stage, separately from DNAT in the ingress pipeline stages. The DNAT pipeline stage only triggers recirculation for some packets. Due to this difference in design, UNSNAT needs to trigger its own recirculation. This patch restricts the logic that avoids recirculation for ct_snat, so that it only applies to datapaths representing gateway routers. Signed-off-by: Mickey Spiegel--- include/ovn/actions.h | 3 +++ ovn/controller/lflow.c | 10 ++ ovn/lib/actions.c | 15 +-- ovn/ovn-sb.xml | 23 +++ tests/ovn.at | 2 +- 5 files changed, 42 insertions(+), 11 deletions(-) diff --git a/include/ovn/actions.h b/include/ovn/actions.h index 5f9d203..810b901 100644 --- a/include/ovn/actions.h +++ b/include/ovn/actions.h @@ -418,6 +418,9 @@ struct ovnact_encode_params { /* 'true' if the flow is for a switch. */ bool is_switch; +/* 'true' if the flow is for a gateway router. */ +bool is_gateway_router; + /* A map from a port name to its connection tracking zone. */ const struct simap *ct_zones; diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index c41368f..b011109 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -107,6 +107,15 @@ is_switch(const struct sbrec_datapath_binding *ldp) } +static bool +is_gateway_router(const struct sbrec_datapath_binding *ldp, + const struct hmap *local_datapaths) +{ +struct local_datapath *ld = +get_local_datapath(local_datapaths, ldp->tunnel_key); +return ld ? 
ld->has_local_l3gateway : false; +} + /* Adds the logical flows from the Logical_Flow table to flow tables. */ static void add_logical_flows(struct controller_ctx *ctx, const struct lport_index *lports, @@ -220,6 +229,7 @@ consider_logical_flow(const struct lport_index *lports, .lookup_port = lookup_port_cb, .aux = &aux, .is_switch = is_switch(ldp), +.is_gateway_router = is_gateway_router(ldp, local_datapaths), .ct_zones = ct_zones, .group_table = group_table, diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c index dda675b..5c57ab7 100644 --- a/ovn/lib/actions.c +++ b/ovn/lib/actions.c @@ -803,12 +803,15 @@ encode_ct_nat(const struct ovnact_ct_nat *cn, ct = ofpacts->header; if (cn->ip) { ct->flags |= NX_CT_F_COMMIT; -} else if (snat) { -/* XXX: For performance reasons, we try to prevent additional - * recirculations. So far, ct_snat which is used in a gateway router - * does not need a recirculation. ct_snat(IP) does need a - * recirculation. Should we consider a method to let the actions - * specify whether an action needs recirculation if there more use +} else if (snat && ep->is_gateway_router) { +/* For performance reasons, we try to prevent additional + * recirculations. ct_snat which is used in a gateway router + * does not need a recirculation. ct_snat(IP) does need a + * recirculation. ct_snat in a distributed router needs + * recirculation regardless of whether an IP address is + * specified. + * XXX Should we consider a method to let the actions specify + * whether an action needs recirculation if there are more use * cases?. */ ct->recirc_table = NX_CT_RECIRC_NONE; } diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml index a7c29c3..8fe0e2b 100644 --- a/ovn/ovn-sb.xml +++ b/ovn/ovn-sb.xml @@ -1122,11 +1122,26 @@ ct_snat sends the packet through the SNAT zone to -unSNAT any packet that was SNATed in the opposite direction. If -the packet needs to be sent to the next tables, then it should be -followed by a next; action.
The next tables will not -see the changes in the packet caused by the connection tracker. +unSNAT any packet that was SNATed in the opposite direction. The +behavior on gateway routers differs from the behavior on a +distributed router: + + + On a gateway router, if the packet needs to be sent to the next + tables, then it should be followed by a next; + action. The next tables will not see the changes in the packet + caused by the connection tracker. + + +
[ovs-dev] [PATCH v10 5/8] ovn: move load balancing flows after NAT flows
This will make it easy for distributed NAT to reuse some of the existing code for NAT flows, while leaving load balancing and defrag as functionality specific to gateway routers. There is no intent to change any functionality in this patch. Signed-off-by: Mickey Spiegel Acked-by: Gurucharan Shetty --- ovn/northd/ovn-northd.c | 140 1 file changed, 70 insertions(+), 70 deletions(-) diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index 87c80d1..219a69c 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -4099,76 +4099,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, const char *lb_force_snat_ip = get_force_snat_ip(od, "lb", &snat_ip); -/* A set to hold all ips that need defragmentation and tracking. */ -struct sset all_ips = SSET_INITIALIZER(&all_ips); - -for (int i = 0; i < od->nbr->n_load_balancer; i++) { -struct nbrec_load_balancer *lb = od->nbr->load_balancer[i]; -struct smap *vips = &lb->vips; -struct smap_node *node; - -SMAP_FOR_EACH (node, vips) { -uint16_t port = 0; - -/* node->key contains IP:port or just IP. */ -char *ip_address = NULL; -ip_address_and_port_from_lb_key(node->key, &ip_address, &port); -if (!ip_address) { -continue; -} - -if (!sset_contains(&all_ips, ip_address)) { -sset_add(&all_ips, ip_address); -} - -/* Higher priority rules are added for load-balancing in DNAT - * table. For every match (on a VIP[:port]), we add two flows - * via add_router_lb_flow(). One flow is for specific matching - * on ct.new with an action of "ct_lb($targets);". The other - * flow is for ct.est with an action of "ct_dnat;".
*/ -ds_clear(&actions); -ds_put_format(&actions, "ct_lb(%s);", node->value); - -ds_clear(&match); -ds_put_format(&match, "ip && ip4.dst == %s", - ip_address); -free(ip_address); - -if (port) { -if (lb->protocol && !strcmp(lb->protocol, "udp")) { -ds_put_format(&match, " && udp && udp.dst == %d", - port); -} else { -ds_put_format(&match, " && tcp && tcp.dst == %d", - port); -} -add_router_lb_flow(lflows, od, &match, &actions, 120, - lb_force_snat_ip); -} else { -add_router_lb_flow(lflows, od, &match, &actions, 110, - lb_force_snat_ip); -} -} -} - -/* If there are any load balancing rules, we should send the - * packet to conntrack for defragmentation and tracking. This helps - * with two things. - * - * 1. With tracking, we can send only new connections to pick a -*DNAT ip address from a group. - * 2. If there are L4 ports in load balancing rules, we need the -*defragmentation to match on L4 ports. */ -const char *ip_address; -SSET_FOR_EACH (ip_address, &all_ips) { -ds_clear(&match); -ds_put_format(&match, "ip && ip4.dst == %s", ip_address); -ovn_lflow_add(lflows, od, S_ROUTER_IN_DEFRAG, - 100, ds_cstr(&match), "ct_next;"); -} - -sset_destroy(&all_ips); - for (int i = 0; i < od->nbr->n_nat; i++) { const struct nbrec_nat *nat; @@ -4323,6 +4253,76 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, * routing in the openflow pipeline. */ ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50, "ip", "flags.loopback = 1; ct_dnat;"); + +/* A set to hold all ips that need defragmentation and tracking. */ +struct sset all_ips = SSET_INITIALIZER(&all_ips); + +for (int i = 0; i < od->nbr->n_load_balancer; i++) { +struct nbrec_load_balancer *lb = od->nbr->load_balancer[i]; +struct smap *vips = &lb->vips; +struct smap_node *node; + +SMAP_FOR_EACH (node, vips) { +uint16_t port = 0; + +/* node->key contains IP:port or just IP. */ +char *ip_address = NULL; +ip_address_and_port_from_lb_key(node->key, &ip_address, &port); +if (!ip_address) { +continue; +} + +if (!sset_contains(&all_ips, ip_address)) { +sset_add(&all_ips, ip_address); +} + +/* Higher priority rules are
Re: [ovs-dev] [PATCH] ovn-controller: Provide the option to set Encap.options:csum
On Mon, Jan 16, 2017 at 11:46 PM, Russell Bryant wrote: > > > On Mon, Jan 16, 2017 at 12:01 PM, Ben Pfaff wrote: > >> On Mon, Jan 16, 2017 at 09:28:28AM -0500, Russell Bryant wrote: >> > On Sat, Jan 14, 2017 at 11:29 AM, Ben Pfaff wrote: >> > >> > > On Sat, Jan 14, 2017 at 07:37:53PM +0530, Numan Siddique wrote: >> > > > On Sat, Jan 14, 2017 at 3:18 AM, Ben Pfaff wrote: >> > > > >> > > > > On Tue, Jan 10, 2017 at 11:34:42AM +0530, Numan Siddique wrote: >> > > > > > ovn-controller by default enables UDP checksums for geneve >> > > > > > tunnels. With this patch user can set the desired value in >> > > > > > Open_vSwitch.external_ids:ovn_encap_csum. >> > > > > > >> > > > > > Signed-off-by: Numan Siddique >> > > > > >> > > > > I don't see technical problems with this, but I also don't know >> why a >> > > > > user would want to disable checksums. Can you send a v2 that >> adds this >> > > > > rationale to the documentation and to the commit message? >> > > > > >> > > > >> > > > Thanks for the review. Sure I will do that. The reason for this >> patch is >> > > - >> > > > we are seeing significant performance increase (more than 100%) in >> our >> > > > testing when tunnel checksum is disabled. >> > > > >> > > > The lab servers have nics with geneve offload support ( >> > > > tx-udp_tnl-segmentation >> > > > ) >> > > > . >> > > >> > > OK, that sounds like a good reason to document. >> > >> > >> > In particular, it looks like the NICs we have, Intel X710, will do TCP >> > Segmentation Offload (TSO) with geneve or vxlan, but only if udp >> checksums >> > are turned off. Once they're on, TCP throughput gets cut to less than >> half. >> > >> > This is going to be painful to document well if it's hardware dependent. >> > I'm not sure what the better default is, since checksums should actually >> > improve performance for NICs without geneve offload support.
>> >> Well, it's at least helpful to document that performance is the reason, >> and that offloads factor into the issue. Otherwise readers will have no >> idea why they'd want to turn off checksums. At worst, users can >> benchmark both cases in their environments. >> > > Totally agreed. I was just adding some more detail about what we've seen, > and then reflecting on how we might want to document this new option for > users. > > I have submitted v2 of the patch with updated documentation - https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327660.html Thanks Numan > -- > Russell Bryant > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v10 3/8] ovn: Introduce distributed gateway port and "chassisredirect" port binding
Currently OVN distributed logical routers achieve reachability to physical networks by passing through a "join" logical switch to a centralized gateway router, which then connects to another logical switch that has a localnet port connecting to the physical network. This patch adds logical port and port binding abstractions that allow an OVN distributed logical router to connect directly to a logical switch that has a localnet port connecting to the physical network. In this patch, this logical router port is called a "distributed gateway port". The primary design goal of distributed gateway ports is to allow as much traffic as possible to be handled locally on the hypervisor where a VM or container resides. Whenever possible, packets from the VM or container to the outside world should be processed completely on that VM's or container's hypervisor, eventually traversing a localnet port instance on that hypervisor to the physical network. Whenever possible, packets from the outside world to a VM or container should be directed through the physical network directly to the VM's or container's hypervisor, where the packet will enter the integration bridge through a localnet port. However, due to the implications of the use of L2 learning in the physical network, as well as the need to support advanced features such as one-to-many NAT (aka IP masquerading), where multiple logical IP addresses spread across multiple chassis are mapped to one external IP address, it will be necessary to handle some of the logical router processing on a specific chassis in a centralized manner. For this reason, the user must associate a chassis with each distributed gateway port. In order to allow for the distributed processing of some packets, distributed gateway ports need to be logical patch ports that effectively reside on every hypervisor, rather than "l3gateway" ports that are bound to a particular chassis. 
However, the flows associated with distributed gateway ports often need to be associated with physical locations. This is implemented in this patch (and subsequent patches) by adding "is_chassis_resident()" match conditions to several logical router flows. While most of the physical location dependent aspects of distributed gateway ports can be handled by restricting some flows to specific chassis, one additional mechanism is required. When a packet leaves the ingress pipeline and the logical egress port is the distributed gateway port, one of two different sets of actions is required at table 32: - If the packet can be handled locally on the sender's hypervisor (e.g. one-to-one NAT traffic), then the packet should just be resubmitted locally to table 33, in the normal manner for distributed logical patch ports. - However, if the packet needs to be handled on the chassis associated with the distributed gateway port (e.g. one-to-many SNAT traffic or non-NAT traffic), then table 32 must send the packet on a tunnel port to that chassis. In order to trigger the second set of actions, the "chassisredirect" type of southbound port_binding is introduced. Setting the logical egress port to the type "chassisredirect" logical port is simply a way to indicate that although the packet is destined for the distributed gateway port, it needs to be redirected to a different chassis. At table 32, packets with this logical egress port are sent to a specific chassis, in the same way that table 32 directs packets whose logical egress port is a VIF or a type "l3gateway" port to different chassis. Once the packet arrives at that chassis, table 33 resets the logical egress port to the value representing the distributed gateway port. For each distributed gateway port, there is one type "chassisredirect" port, in addition to the distributed logical patch port representing the distributed gateway port. 
A "chassisredirect" port represents a particular instance, bound to a specific chassis, of an otherwise distributed port. A "chassisredirect" port is associated with a chassis in the same manner as a "l3gateway" port. However, unlike "l3gateway" ports, "chassisredirect" ports have no associated IP or MAC addresses, and "chassisredirect" ports should never be used as the "inport". Any pipeline stages that depend on port specific IP or MAC addresses should be carried out in the context of the distributed gateway port's logical patch port. Although the abstraction represented by the "chassisredirect" port binding is generalized, in this patch the "chassisredirect" port binding is only created for NB logical router ports that specify the new "redirect-chassis" option. There is no explicit notion of a "chassisredirect" port in the NB database. The expectation is when capabilities are implemented that take advantage of "chassisredirect" ports (e.g. distributed gateway ports), flows specifying a "chassisredirect" port as the outport will be added as part of that capability. Signed-off-by: Mickey Spiegel
[ovs-dev] [PATCH v10 2/8] ovn: add is_chassis_resident match expression component
This patch introduces a new match expression component is_chassis_resident(). Unlike match expression comparisons, is_chassis_resident is not pushed down to OpenFlow. It is a conditional that is evaluated in the controller during expr_simplify(), when it is replaced by a boolean expression. The is_chassis_resident conditional evaluates to "true" when the specified string identifies a port name that is resident on this controller chassis, i.e., the corresponding southbound database Port_Binding has a chassis column that matches this chassis. Otherwise it evaluates to "false". This allows higher level features to specify flows that are only installed on some chassis rather than on all chassis with the corresponding datapath. Suggested-by: Ben Pfaff Signed-off-by: Mickey Spiegel Acked-by: Ben Pfaff --- include/ovn/expr.h | 22 +- ovn/controller/lflow.c | 31 ++-- ovn/controller/lflow.h | 5 +- ovn/controller/ovn-controller.c | 5 +- ovn/lib/expr.c | 160 ++-- ovn/ovn-sb.xml | 14 ovn/utilities/ovn-trace.c | 21 +- tests/ovn.at| 14 tests/test-ovn.c| 24 +- 9 files changed, 279 insertions(+), 17 deletions(-) diff --git a/include/ovn/expr.h b/include/ovn/expr.h index 2169a8c..711713e 100644 --- a/include/ovn/expr.h +++ b/include/ovn/expr.h @@ -292,6 +292,15 @@ enum expr_type { EXPR_T_AND, /* Logical AND of 2 or more subexpressions. */ EXPR_T_OR, /* Logical OR of 2 or more subexpressions. */ EXPR_T_BOOLEAN, /* True or false constant. */ +EXPR_T_CONDITION, /* Conditional to be evaluated in the + * controller during expr_simplify(), + * prior to constructing OpenFlow matches. */ +}; + +/* Expression condition type. */ +enum expr_cond_type { +EXPR_COND_CHASSIS_RESIDENT, /* Check if specified logical port name is + * resident on the controller chassis. */ }; /* Relational operator. */ @@ -349,6 +358,14 @@ struct expr { /* EXPR_T_BOOLEAN. */ bool boolean; + +/* EXPR_T_CONDITION. */ +struct { +enum expr_cond_type type; +bool not; +/* XXX Should arguments for conditions be generic?
*/ +char *string; +} cond; }; }; @@ -375,7 +392,10 @@ void expr_destroy(struct expr *); struct expr *expr_annotate(struct expr *, const struct shash *symtab, char **errorp); -struct expr *expr_simplify(struct expr *); +struct expr *expr_simplify(struct expr *, + bool (*is_chassis_resident)(const void *c_aux, + const char *port_name), + const void *c_aux); struct expr *expr_normalize(struct expr *); bool expr_honors_invariants(const struct expr *); diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index 71d8c59..3d7633e 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -50,12 +50,18 @@ struct lookup_port_aux { const struct sbrec_datapath_binding *dp; }; +struct condition_aux { +const struct lport_index *lports; +const struct sbrec_chassis *chassis; +}; + static void consider_logical_flow(const struct lport_index *lports, const struct mcgroup_index *mcgroups, const struct sbrec_logical_flow *lflow, const struct hmap *local_datapaths, struct group_table *group_table, const struct simap *ct_zones, + const struct sbrec_chassis *chassis, struct hmap *dhcp_opts, struct hmap *dhcpv6_opts, uint32_t *conj_id_ofs, @@ -85,6 +91,16 @@ lookup_port_cb(const void *aux_, const char *port_name, unsigned int *portp) } static bool +is_chassis_resident_cb(const void *c_aux_, const char *port_name) +{ +const struct condition_aux *c_aux = c_aux_; + +const struct sbrec_port_binding *pb += lport_lookup_by_name(c_aux->lports, port_name); +return pb && pb->chassis && pb->chassis == c_aux->chassis; +} + +static bool is_switch(const struct sbrec_datapath_binding *ldp) { return smap_get(&ldp->external_ids, "logical-switch") != NULL; @@ -98,6 +114,7 @@ add_logical_flows(struct controller_ctx *ctx, const struct lport_index *lports, const struct hmap *local_datapaths, struct group_table *group_table, const struct simap *ct_zones, + const struct sbrec_chassis *chassis,
[ovs-dev] [PATCH v10 1/8] ovn: document logical routers and logical patch ports in ovn-architecture
This patch adds a description of logical routers and logical patch ports, including gateway routers, to ovn/ovn-architecture.7.xml. Signed-off-by: Mickey Spiegel --- ovn/ovn-architecture.7.xml | 148 ++--- 1 file changed, 140 insertions(+), 8 deletions(-) diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml index b049e51..d92f878 100644 --- a/ovn/ovn-architecture.7.xml +++ b/ovn/ovn-architecture.7.xml @@ -381,6 +381,36 @@ switch. Logical switches and routers are both implemented as logical datapaths. + + + +Logical ports represent the points of connectivity in and +out of logical switches and logical routers. Some common types of +logical ports are: + + + + + Logical ports representing VIFs. + + + + Localnet ports represent the points of connectivity + between logical switches and the physical network. They are + implemented as OVS patch ports between the integration bridge + and the separate Open vSwitch bridge that underlay physical + ports attach to. + + + + Logical patch ports represent the points of + connectivity between logical switches and logical routers, and + in some cases between peer logical routers. There is a pair of + logical patch ports at each such point of connectivity, one on + each side. + + + Life Cycle of a VIF @@ -1040,17 +1070,119 @@ is a container nested with a VM, then before sending the packet the actions push on a VLAN header with an appropriate VLAN ID. - - -If the logical egress port is a logical patch port, then table 65 -outputs to an OVS patch port that represents the logical patch port. -The packet re-enters the OpenFlow flow table from the OVS patch port's -peer in table 0, which identifies the logical datapath and logical -input port based on the OVS patch port's OpenFlow port number. - + Logical Routers and Logical Patch Ports + + +Typically logical routers and logical patch ports do not have a +physical location and effectively reside on every hypervisor.
This is +the case for logical patch ports between logical routers and logical +switches behind those logical routers, to which VMs (and VIFs) attach. + + + +Consider a packet sent from one virtual machine or container to another +VM or container that resides on a different subnet. The packet will +traverse tables 0 to 65 as described in the previous section +Architectural Physical Life Cycle of a Packet, using the +logical datapath representing the logical switch that the sender is +attached to. At table 32, the packet will use the fallback flow that +resubmits locally to table 33 on the same hypervisor. In this case, +all of the processing from table 0 to table 65 occurs on the hypervisor +where the sender resides. + + + +When the packet reaches table 65, the logical egress port is a logical +patch port. The behavior at table 65 differs depending on the OVS +version: + + + + + In OVS versions 2.6 and earlier, table 65 outputs to an OVS patch + port that represents the logical patch port. The packet re-enters + the OpenFlow flow table from the OVS patch port's peer in table 0, + which identifies the logical datapath and logical input port based + on the OVS patch port's OpenFlow port number. + + + + In OVS versions 2.7 and later, the packet is cloned and resubmitted + directly to OpenFlow flow table 16, setting the logical ingress + port to the peer logical patch port, and using the peer logical + patch port's logical datapath (that represents the logical router). + + + + +The packet re-enters the ingress pipeline in order to traverse tables +16 to 65 again, this time using the logical datapath representing the +logical router. The processing continues as described in the previous +section Architectural Physical Life Cycle of a Packet. +When the packet reaches table 65, the logical egress port will once +again be a logical patch port.
In the same manner as described above, +this logical patch port will cause the packet to be resubmitted to +OpenFlow tables 16 to 65, this time using the logical datapath +representing the logical switch that the destination VM or container +is attached to. + + + +The packet traverses tables 16 to 65 a third and final time. If the +destination VM or container resides on a remote hypervisor, then table +32 will send the packet on a tunnel port from the sender's hypervisor +to the remote hypervisor. Finally table 65 will output the packet +directly to the destination VM or
Re: [ovs-dev] [PATCH] vlan.rst: Strip leftover HTML.
On Mon, Jan 16, 2017 at 5:14 PM, Ben Pfaff wrote: > On Mon, Jan 16, 2017 at 04:46:58PM -0500, Russell Bryant wrote: > > String a couple of closing HTML tags that were left over from when this > doc > > s/String/Strip/ > Oops, fixed. > > > was converted from the web site to RST. > > > > Signed-off-by: Russell Bryant > > Acked-by: Ben Pfaff > Thanks, applied to master. -- Russell Bryant
Re: [ovs-dev] [PATCH] netdev-dpdk: Create separate memory pool for each port
On Sun, Jan 15, 2017 at 06:14:47PM +, Stokes, Ian wrote: > > Since it's possible to delete memory pool in DPDK we can try to estimate > > better required memory size when port is reconfigured, e.g. with different > > number of rx queues. > > Thanks for the patch. > > It will need to be rebased for it to apply cleanly to master. Yes, this patch waits here for quite some time. I need to rebase it and send once again. > I've applied the changes manually just to test and from what I can see there > are no issues. > It also resolves issue reported with having 64 queues for dpdk physical ports > which causes ovs to crash. Thanks for testing. > > A few other comments below. > > > > Signed-off-by: Robert Wojciechowicz > > --- > > lib/netdev-dpdk.c | 85 +++--- > > - > > 1 file changed, 41 insertions(+), 44 deletions(-) > > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 7c1523e..4c9e1e8 > > 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c [...] > > > > static struct dpdk_mp * > > -dpdk_mp_get(int socket_id, int mtu) > > +dpdk_mp_get(struct netdev_dpdk *dev, int mtu) > > { > > struct dpdk_mp *dmp; > > > > ovs_mutex_lock(&dpdk_mp_mutex); > > -LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) { > > -if (dmp->socket_id == socket_id && dmp->mtu == mtu) { > > -dmp->refcount++; > > -goto out; > > -} > > -} > > - > As you remove the use of dpdk_mp_list above, I wonder is there a need to > maintain the global list? > With mempools now set on a per netdev basis, perhaps the global list could be > removed also. Thoughts? > I was thinking about that when I prepared this patch. In this implementation memory pool will be always used only by one port, so I could get rid of `refcount` and `dpdk_mp_list`. I didn't do that because I didn't want to make huge revolution in the implementation without even knowing if this idea of separated memory pools makes sense for the community. Now I have positive feedback from you, so maybe it's worth it to work on this patch further.
Br, Robert