Re: [ovs-dev] [PATCH] datapath-windows: Do not modify port field for ICMP during SNAT/DNAT
From: ovs-dev-boun...@openvswitch.org on behalf of Anand Kumar
Sent: Thursday, August 10, 2017 8:59 PM
To: d...@openvswitch.org
Subject: [ovs-dev] [PATCH] datapath-windows: Do not modify port field for ICMP during SNAT/DNAT

During SNAT/DNAT, we should not be updating the port field of the ct_endpoint struct, as ICMP packets do not have port information. Since port and icmp_id overlap in the ct_endpoint struct, icmp_id gets changed. As a result, NAT lookup fails to find a matching entry. This patch addresses the issue by not modifying the icmp_id field during SNAT/DNAT for ICMP traffic.

The current NAT module doesn't take the ICMP type/id/code into account during lookups. Fix this to make it consistent with the rest of the conntrack module.

Signed-off-by: Anand Kumar

Acked-by: Shashank Ram

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] datapath-windows: Do not modify port field for ICMP during SNAT/DNAT
During SNAT/DNAT, we should not be updating the port field of the ct_endpoint struct, as ICMP packets do not have port information. Since port and icmp_id overlap in the ct_endpoint struct, icmp_id gets changed. As a result, NAT lookup fails to find a matching entry. This patch addresses the issue by not modifying the icmp_id field during SNAT/DNAT for ICMP traffic.

The current NAT module doesn't take the ICMP type/id/code into account during lookups. Fix this to make it consistent with the rest of the conntrack module.

Signed-off-by: Anand Kumar
---
 datapath-windows/ovsext/Conntrack-nat.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/datapath-windows/ovsext/Conntrack-nat.c b/datapath-windows/ovsext/Conntrack-nat.c
index ae6b92c..eb6f9db 100644
--- a/datapath-windows/ovsext/Conntrack-nat.c
+++ b/datapath-windows/ovsext/Conntrack-nat.c
@@ -22,6 +22,12 @@ OvsHashNatKey(const OVS_CT_KEY *key)
     HASH_ADD(src.port);
     HASH_ADD(dst.port);
     HASH_ADD(zone);
+    /* icmp_id and port overlap in the union */
+    HASH_ADD(src.icmp_type);
+    HASH_ADD(dst.icmp_type);
+    HASH_ADD(src.icmp_code);
+    HASH_ADD(dst.icmp_code);
+
 #undef HASH_ADD
     return hash;
 }
@@ -44,6 +50,12 @@ OvsNatKeyAreSame(const OVS_CT_KEY *key1, const OVS_CT_KEY *key2)
     FIELD_COMPARE(src.port);
     FIELD_COMPARE(dst.port);
     FIELD_COMPARE(zone);
+    FIELD_COMPARE(src.icmp_id);
+    FIELD_COMPARE(dst.icmp_id);
+    FIELD_COMPARE(src.icmp_type);
+    FIELD_COMPARE(dst.icmp_type);
+    FIELD_COMPARE(src.icmp_code);
+    FIELD_COMPARE(dst.icmp_code);
     return TRUE;
 #undef FIELD_COMPARE
 }
@@ -253,6 +265,7 @@ OvsNatAddEntry(OVS_NAT_ENTRY* entry)
  * Update an Conntrack entry with NAT information. Translated address and
  * port will be generated and write back to the conntrack entry as a
  * result.
+ * Note: For ICMP, only address is translated.
  *
  */
 BOOLEAN
@@ -271,7 +284,7 @@ OvsNatTranslateCtEntry(OVS_CT_ENTRY *entry)
     BOOLEAN allPortsTried;
     BOOLEAN originalPortsTried;
     struct ct_addr firstAddr;
-
+
     uint32_t hash = OvsNatHashRange(entry, 0);

     if ((entry->natInfo.natAction & NAT_ACTION_SRC) &&
@@ -310,10 +323,14 @@ OvsNatTranslateCtEntry(OVS_CT_ENTRY *entry)
     for (;;) {
         if (entry->natInfo.natAction & NAT_ACTION_SRC) {
             entry->rev_key.dst.addr = ctAddr;
-            entry->rev_key.dst.port = htons(port);
+            if (entry->rev_key.nw_proto != IPPROTO_ICMP) {
+                entry->rev_key.dst.port = htons(port);
+            }
         } else {
             entry->rev_key.src.addr = ctAddr;
-            entry->rev_key.src.port = htons(port);
+            if (entry->rev_key.nw_proto != IPPROTO_ICMP) {
+                entry->rev_key.src.port = htons(port);
+            }
         }
         OVS_NAT_ENTRY *natEntry = OvsNatLookup(&entry->rev_key, TRUE);
--
2.9.3.windows.1
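The bug described above hinges on the port and ICMP id sharing storage. A minimal stand-alone sketch of that overlap (simplified field names, not the actual datapath-windows ct_endpoint definition) shows how an unconditional port write during NAT clobbers the id the reverse-direction lookup later needs:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature of the overlap at the heart of the bug: the L4
 * port and the ICMP id share storage in a union, so writing one
 * overwrites the other.  Names are illustrative only. */
struct mini_ct_endpoint {
    union {
        uint16_t port;     /* TCP/UDP */
        uint16_t icmp_id;  /* ICMP */
    };
};

/* Models the buggy path: returns the icmp_id observed after NAT rewrote
 * 'port' without checking the protocol. */
uint16_t
clobbered_icmp_id(uint16_t icmp_id, uint16_t nat_port)
{
    struct mini_ct_endpoint ep = { .icmp_id = icmp_id };
    ep.port = nat_port;    /* what the pre-patch SNAT/DNAT code did */
    return ep.icmp_id;     /* the original id is gone */
}
```

With the patch applied, the port write is skipped for IPPROTO_ICMP, so the id survives and the NAT lookup can match.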
Re: [ovs-dev] [PATCHv2 1/4] ovsdb-idl: Avoid class declaration.
On Thu, Aug 10, 2017 at 02:31:43PM -0700, Joe Stringer wrote:
> On 10 August 2017 at 11:41, Ben Pfaff wrote:
> > On Thu, Aug 10, 2017 at 01:01:32PM +0800, Gao Zhenyu wrote:
> >> Besides of that, I see many places consume the table class.
> >> Do you mind to make a macro helps to fetch the class?
> >> Like:
> >>
> >> #define OVSDB_GET_TABLE_CLASS(row) \
> >>     ((row)->table->class)
> >
> > Ugh.
>
> Just in case the context isn't obvious... ;-)
>
> I think that in general, we try not to obscure what's going on in OVS
> code. This particular suggestion makes it less clear when reading the
> code exactly where the class comes from, because the pointer
> dereference is then hidden behind the macro. It also makes lines
> referencing the class longer, and some of these lines are already
> fairly long.

That's a much better way of explaining what I meant. Thanks, Joe.
[ovs-dev] How to make OVS working with IVSHMEM ?
Hello,

I have encountered a problem running OVS with IVSHMEM. I followed the INSTALL.DPDK.md guide from ovs-2.5.0 to use IVSHMEM.

First, I started ovs-vswitchd using "./sbin/ovs-vswitchd --dpdk -c 0x1 -n 4 --proc-type=primary -- --pidfile --detach", and I added DPDK ports (dpdk0, dpdk1) and a DPDK ring (dpdkr0) to OVS.

Then I started ./openvswitch-2.5.0/tests/test-dpdkr using "./test-dpdkr -c 1 -n 4 --proc-type=secondary -- -n 0", but the EAL initialization of this program got stuck after printing "EAL: Detected 16 lcore(s)" to stdout.

Does anybody have successful experience running OVS with IVSHMEM? Please share with me. Many thanks!
Re: [ovs-dev] [PATCH v4 5/5] dpif-netdev: Flush the packets in intermediate queue.
Hi Bhanu,

Given that you ultimately intend changes beyond those in this patch, would it make sense just to fold the follow-up series (at least the key elements) into this series, essentially expanding on this patch 5?

Thanks
Darrell

-Original Message-
From: on behalf of Bhanuprakash Bodireddy
Date: Tuesday, August 8, 2017 at 10:06 AM
To: "d...@openvswitch.org"
Subject: [ovs-dev] [PATCH v4 5/5] dpif-netdev: Flush the packets in intermediate queue.

Under low rate traffic conditions, there can be 2 issues.
  (1) Packets potentially can get stuck in the intermediate queue.
  (2) Latency of the packets can increase significantly due to buffering
      in the intermediate queue.

This commit handles issue (1) by flushing the tx port queues using dp_netdev_flush_txq_ports() as part of the PMD packet processing loop. It also addresses issue (2) by flushing the tx queues after every rxq port processing. This reduces the latency without impacting the forwarding throughput.

   MASTER
  --------
   Pkt size   min(ns)   avg(ns)   max(ns)
    512        4,631     5,022    309,914
   1024        5,545     5,749    104,294
   1280        5,978     6,159     45,306
   1518        6,419     6,774    946,850

   MASTER + COMMIT
  -----------------
   Pkt size   min(ns)   avg(ns)   max(ns)
    512        4,711     5,064    182,477
   1024        5,601     5,888    701,654
   1280        6,018     6,491    533,037
   1518        6,467     6,734    312,471

PMDs can be torn down and spawned at runtime, so the rxq and txq mapping of the PMD threads can change. In a few cases packets can get stuck in the queue due to reconfiguration, and this commit helps flush the queues.
Suggested-by: Eelco Chaudron
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-April/331039.html
Signed-off-by: Bhanuprakash Bodireddy
Signed-off-by: Antonio Fischetti
Co-authored-by: Antonio Fischetti
Signed-off-by: Markus Magnusson
Co-authored-by: Markus Magnusson
Acked-by: Eelco Chaudron
---
 lib/dpif-netdev.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e2cd931..bfb9650 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -340,6 +340,7 @@ enum pmd_cycles_counter_type {
 };

 #define XPS_TIMEOUT_MS 500LL
+#define LAST_USED_QID_NONE -1

 /* Contained by struct dp_netdev_port's 'rxqs' member. */
 struct dp_netdev_rxq {
@@ -500,7 +501,13 @@ struct rxq_poll {
 struct tx_port {
     struct dp_netdev_port *port;
     int qid;
-    long long last_used;
+    int last_used_qid;        /* Last queue id where packets got
+                                 enqueued. */
+    long long last_used;      /* In case XPS is enabled, it contains the
+                               * timestamp of the last time the port was
+                               * used by the thread to send data. After
+                               * XPS_TIMEOUT_MS elapses the qid will be
+                               * marked as -1. */
     struct hmap_node node;
 };
@@ -3101,6 +3108,25 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
     non_atomic_ullong_add(&pmd->cycles.n[type], interval);
 }

+static void
+dp_netdev_flush_txq_ports(struct dp_netdev_pmd_thread *pmd)
+{
+    struct tx_port *cached_tx_port;
+    int tx_qid;
+
+    HMAP_FOR_EACH (cached_tx_port, node, &pmd->send_port_cache) {
+        tx_qid = cached_tx_port->last_used_qid;
+
+        if (tx_qid != LAST_USED_QID_NONE) {
+            netdev_txq_flush(cached_tx_port->port->netdev, tx_qid,
+                             cached_tx_port->port->dynamic_txqs);
+
+            /* Queue flushed and mark it empty. */
+            cached_tx_port->last_used_qid = LAST_USED_QID_NONE;
+        }
+    }
+}
+
 static int
 dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
                            struct netdev_rxq *rx,
@@ -3667,6 +3693,9 @@ dpif_netdev_run(struct dpif *dpif)
                     dp_netdev_process_rxq_port(non_pmd,
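The flushing logic the patch adds can be sketched in isolation. This is a simplified model, not the OVS code: plain arrays replace the hmap of cached tx ports, and a counter stands in for netdev_txq_flush():

```c
#include <assert.h>

/* Sketch of dp_netdev_flush_txq_ports(): walk the cached tx ports and
 * flush any queue that was used since the last flush, then mark it
 * empty with LAST_USED_QID_NONE so it is not flushed again needlessly. */
#define LAST_USED_QID_NONE -1

struct tx_port_sketch {
    int last_used_qid;   /* queue id used since last flush, or NONE */
    int flushes;         /* stands in for calls to netdev_txq_flush() */
};

void
flush_txq_ports(struct tx_port_sketch *ports, int n)
{
    for (int i = 0; i < n; i++) {
        if (ports[i].last_used_qid != LAST_USED_QID_NONE) {
            ports[i].flushes++;  /* real code: netdev_txq_flush(...) */
            ports[i].last_used_qid = LAST_USED_QID_NONE;
        }
    }
}
```

Running this after each rxq is processed bounds how long a packet can sit buffered in an intermediate tx queue, which is exactly the latency issue (2) the commit message describes.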
[ovs-dev] DPDK Merge Repo Location
Hi All,

As discussed in the fortnightly DPDK meeting, I am using a repo for DPDK patch merging. The repo is here: https://github.com/darball/ovs/

The branch is dpdk_merge, based on ovs master.

Nothing for this week, as reviews are not completed yet for some features.

Thanks
Darrell
Re: [ovs-dev] [PATCH v4 6/6] dpif-netdev: Add ovs-appctl dpif-netdev/pmd-rxq-rebalance.
On 08/09/2017 08:45 AM, Kevin Traynor wrote:

The processing cycles consumed by rxqs are used to improve the balance of how rxqs are assigned to pmds. Currently some reconfiguration is needed to perform a reassignment. Add an ovs-appctl command to perform a new assignment in order to balance based on the latest rxq processing cycle information.

Note: Jan requested this for testing purposes.

Suggested-by: Jan Scheurich
Signed-off-by: Kevin Traynor
---
 Documentation/howto/dpdk.rst |  5 ++++-
 lib/dpif-netdev.c            | 35 +++++++++++++++++++++++++++++++++++
 vswitchd/ovs-vswitchd.8.in   |  2 ++
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index 493e215..5d009bd 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -140,5 +140,8 @@
 Core 7: Q4 (70%) | Q5 (10%)
 core 8: Q0 (60%) | Q0 (30%)

-Rxq to pmds assignment takes place whenever there are configuration changes.
+Rxq to pmds assignment takes place whenever there are configuration changes
+or can be triggered by using::
+
+    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance

 QoS
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index b0f4010..8857489 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -708,4 +708,6 @@
 static inline bool emc_entry_alive(struct emc_entry *ce);
 static void emc_clear_entry(struct emc_entry *ce);

+static void dp_netdev_request_reconfigure(struct dp_netdev *dp);
+
 static void
 emc_cache_init(struct emc_cache *flow_cache)
@@ -1001,4 +1003,34 @@ sorted_poll_thread_list(struct dp_netdev *dp,

 static void
+dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc,
+                          const char *argv[], void *aux OVS_UNUSED)
+{
+    struct ds reply = DS_EMPTY_INITIALIZER;
+    struct dp_netdev *dp = NULL;
+
+    ovs_mutex_lock(&dp_netdev_mutex);
+
+    if (argc == 2) {
+        dp = shash_find_data(&dp_netdevs, argv[1]);
+    } else if (shash_count(&dp_netdevs) == 1) {
+        /* There's only one datapath */
+        dp = shash_first(&dp_netdevs)->data;
+    }
+
+    if (!dp) {
+        ovs_mutex_unlock(&dp_netdev_mutex);
+        unixctl_command_reply_error(conn,
+                                    "please specify an existing datapath");
+        return;
+    }
+
+    dp_netdev_request_reconfigure(dp);
+    ovs_mutex_unlock(&dp_netdev_mutex);
+    ds_put_cstr(&reply, "pmd rxq rebalance requested.\n");
+    unixctl_command_reply(conn, ds_cstr(&reply));
+    ds_destroy(&reply);
+}
+
+static void
 dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
                      void *aux)
@@ -1078,4 +1110,7 @@ dpif_netdev_init(void)
                              0, 1, dpif_netdev_pmd_info,
                              (void *)&poll_aux);
+    unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
+                             0, 1, dpif_netdev_pmd_rebalance,
+                             NULL);
     return 0;
 }
diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in
index dfd209e..c18baf6 100644
--- a/vswitchd/ovs-vswitchd.8.in
+++ b/vswitchd/ovs-vswitchd.8.in
@@ -281,4 +281,6 @@
 bridge statistics, only the values shown by the above command.
 For each pmd thread of the datapath \fIdp\fR shows list of queue-ids with
 port names, which this thread polls.
+.IP "\fBdpif-netdev/pmd-rxq-rebalance\fR [\fIdp\fR]"
+Reassigns rxqs to pmds in the datapath \fIdp\fR based on their current usage.
 .
 .so ofproto/ofproto-dpif-unixctl.man

Tested-by: Greg Rose
Reviewed-by: Greg Rose
Re: [ovs-dev] [PATCH v4 5/6] dpif-netdev: Change pmd selection order.
On 08/09/2017 08:45 AM, Kevin Traynor wrote:

Up to this point rxqs are sorted by the processing cycles they consumed and assigned to pmds in a round robin manner. Ian pointed out that on wraparound the most loaded pmd would be the next one to be assigned an additional rxq, and that it would be better to reverse the pmd order when wraparound occurs. In other words, change from assigning by plain round robin to assigning in a forward and reverse cycle through pmds.

Also, now that the algorithm has been finalised, document an example.

Suggested-by: Ian Stokes
Signed-off-by: Kevin Traynor
---
 Documentation/howto/dpdk.rst | 16 ++++++++++++++++
 lib/dpif-netdev.c            | 21 ++++++++++++++++++++-
 tests/pmd.at                 |  2 +-
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index 44737e4..493e215 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -124,4 +124,20 @@
 will be used where known to assign rxqs with the highest consumption of
 processing cycles to different pmds.

+For example, in the case where here there are 5 rxqs and 3 cores (e.g. 3,7,8)
+available, and the measured usage of core cycles per rxq over the last
+interval is seen to be:
+
+- Queue #0: 30%
+- Queue #1: 80%
+- Queue #3: 60%
+- Queue #4: 70%
+- Queue #5: 10%
+
+The rxqs will be assigned to cores 3,7,8 in the following order:
+
+Core 3: Q1 (80%) |
+Core 7: Q4 (70%) | Q5 (10%)
+core 8: Q0 (60%) | Q0 (30%)
+
 Rxq to pmds assignment takes place whenever there are configuration changes.
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index b4663ab..b0f4010 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3251,4 +3251,5 @@ struct rr_numa {
     int cur_index;
+    bool idx_inc;
 };
@@ -3307,4 +3308,7 @@ rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
         numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa->pmds);
         numa->pmds[numa->n_pmds - 1] = pmd;
+        /* At least one pmd so initialise curr_idx and idx_inc. */
+        numa->cur_index = 0;
+        numa->idx_inc = true;
     }
 }
@@ -3313,5 +3317,20 @@
 static struct dp_netdev_pmd_thread *
 rr_numa_get_pmd(struct rr_numa *numa)
 {
-    return numa->pmds[numa->cur_index++ % numa->n_pmds];
+    int numa_idx = numa->cur_index;
+
+    if (numa->idx_inc == true) {
+        if (numa->cur_index == numa->n_pmds-1) {
+            numa->idx_inc = false;
+        } else {
+            numa->cur_index++;
+        }
+    } else {
+        if (numa->cur_index == 0) {
+            numa->idx_inc = true;
+        } else {
+            numa->cur_index--;
+        }
+    }
+    return numa->pmds[numa_idx];
 }
diff --git a/tests/pmd.at b/tests/pmd.at
index b6732ea..e39a23a 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -54,5 +54,5 @@ m4_define([CHECK_PMD_THREADS_CREATED], [
 m4_define([SED_NUMA_CORE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id \)[[0-9]]*:/\1\2:/"])
-m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id \)[[0-9]]*:/\1\2:/;s/\(queue-id: \)0 2 4 6/\1/;s/\(queue-id: \)1 3 5 7/\1/"])
+m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id \)[[0-9]]*:/\1\2:/;s/\(queue-id: \)1 2 5 6/\1/;s/\(queue-id: \)0 3 4 7/\1/"])
 m4_define([DUMMY_NUMA], [--dummy-numa="0,0,0,0"])

Tested-by: Greg Rose
Reviewed-by: Greg Rose
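The forward-and-reverse cycling this patch introduces can be modeled with a small self-contained function mirroring rr_numa_get_pmd()'s index logic (a simplified state struct; illustrative only). Note that the boundary index is handed out twice in a row by design, which matches the documented example where the last core in the cycle receives two queues back to back:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal re-implementation of the forward-then-reverse cycling:
 * indices run 0..n-1, then back n-1..0, so on wraparound the most
 * recently used pmd is the next to be reused, and the least recently
 * used pmd is reached last in each direction. */
struct rr_state {
    int n;      /* number of pmds */
    int cur;    /* index to hand out next */
    bool inc;   /* currently walking forward? */
};

int
rr_next(struct rr_state *s)
{
    int idx = s->cur;

    if (s->inc) {
        if (s->cur == s->n - 1) {
            s->inc = false;      /* hit the end: reverse direction */
        } else {
            s->cur++;
        }
    } else {
        if (s->cur == 0) {
            s->inc = true;       /* hit the start: go forward again */
        } else {
            s->cur--;
        }
    }
    return idx;
}
```

For n = 3 the sequence is 0, 1, 2, 2, 1, 0, 0, 1, 2, ... — i.e. the pmd that just received the heaviest leftover rxq is not immediately handed the next one after wraparound, which is the improvement Ian suggested.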
Re: [ovs-dev] [PATCH v4 4/6] dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
On 08/09/2017 08:45 AM, Kevin Traynor wrote:

Previously rxqs were assigned to pmds by round robin in port/queue order. Now that we have the processing cycles used for existing rxqs, use that information to try and produce a better balanced distribution of rxqs across pmds. i.e. given multiple pmds, the rxqs which have consumed the largest amount of processing cycles will be placed on different pmds.

The rxqs are sorted by their processing cycles and assigned (in sorted order) round robin across pmds.

Signed-off-by: Kevin Traynor
---
 Documentation/howto/dpdk.rst |   7 +++++++
 lib/dpif-netdev.c            | 105 ++++++++++++++++++++++++++++---------------
 2 files changed, 81 insertions(+), 31 deletions(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index d7f6610..44737e4 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -119,4 +119,11 @@
 After that PMD threads on cores where RX queues was pinned will become
 thread.

+If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores)
+automatically. The processing cycles that have been required for each rxq
+will be used where known to assign rxqs with the highest consumption of
+processing cycles to different pmds.
+
+Rxq to pmds assignment takes place whenever there are configuration changes.
+
 QoS
 ---
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e344063..b4663ab 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3328,8 +3328,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
 }

+/* Sort Rx Queues by the processing cycles they are consuming. */
+static int
+rxq_cycle_sort(const void *a, const void *b)
+{
+    struct dp_netdev_rxq * qa;
+    struct dp_netdev_rxq * qb;
+
+    qa = *(struct dp_netdev_rxq **) a;
+    qb = *(struct dp_netdev_rxq **) b;
+
+    if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >=
+        dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) {
+        return -1;
+    }
+
+    return 1;
+}
+
 /* Assign pmds to queues.  If 'pinned' is true, assign pmds to pinned
  * queues and marks the pmds as isolated.  Otherwise, assign non isolated
  * pmds to unpinned queues.
  *
+ * If 'pinned' is false queues will be sorted by processing cycles they are
+ * consuming and then assigned to pmds in round robin order.
+ *
  * The function doesn't touch the pmd threads, it just stores the assignment
  * in the 'pmd' member of each rxq. */
@@ -3340,18 +3361,14 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
     struct rr_numa_list rr;
     struct rr_numa *non_local_numa = NULL;
-
-    rr_numa_list_populate(dp, &rr);
+    struct dp_netdev_rxq ** rxqs = NULL;
+    int i, n_rxqs = 0;
+    struct rr_numa *numa = NULL;
+    int numa_id;

     HMAP_FOR_EACH (port, node, &dp->ports) {
-        struct rr_numa *numa;
-        int numa_id;
-
         if (!netdev_is_pmd(port->netdev)) {
             continue;
         }
-        numa_id = netdev_get_numa_id(port->netdev);
-        numa = rr_numa_list_lookup(&rr, numa_id);
-
         for (int qid = 0; qid < port->n_rxq; qid++) {
             struct dp_netdev_rxq *q = &port->rxqs[qid];
@@ -3371,34 +3388,60 @@
             }
         } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
-            if (!numa) {
-                /* There are no pmds on the queue's local NUMA node.
-                   Round-robin on the NUMA nodes that do have pmds. */
-                non_local_numa = rr_numa_list_next(&rr, non_local_numa);
-                if (!non_local_numa) {
-                    VLOG_ERR("There is no available (non-isolated) pmd "
-                             "thread for port \'%s\' queue %d. This queue "
-                             "will not be polled. Is pmd-cpu-mask set to "
-                             "zero? Or are all PMDs isolated to other "
-                             "queues?", netdev_get_name(port->netdev),
-                             qid);
-                    continue;
-                }
-                q->pmd = rr_numa_get_pmd(non_local_numa);
-                VLOG_WARN("There's no available (non-isolated) pmd thread "
-                          "on numa node %d. Queue %d on port \'%s\' will "
-                          "be assigned to the pmd on core %d "
-                          "(numa node %d). Expect reduced performance.",
-                          numa_id, qid, netdev_get_name(port->netdev),
-                          q->pmd->core_id, q->pmd->numa_id);
+            if (n_rxqs == 0) {
+                rxqs = xmalloc(sizeof *rxqs);
             } else {
-                /* Assign queue to the next (round-robin) PMD on it's local
-                   NUMA node. */
-                q->pmd = rr_numa_get_pmd(numa);
+
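The scheduling idea in this patch — sort rxqs by measured processing cycles, then hand them out round robin — can be sketched with plain integers in place of the real dp_netdev_rxq structures. This is an assumption-laden toy, not the OVS implementation (the real code works per NUMA node and handles pinned/isolated pmds); it only shows why the heaviest queues end up on different pmds:

```c
#include <assert.h>
#include <stdlib.h>

/* Descending comparator, mirroring the shape of rxq_cycle_sort() in the
 * patch.  Like the original, it never returns 0, so ties are ordered
 * arbitrarily; use distinct values when the order matters. */
static int
cycle_sort_desc(const void *a, const void *b)
{
    unsigned long long ca = *(const unsigned long long *) a;
    unsigned long long cb = *(const unsigned long long *) b;

    return ca >= cb ? -1 : 1;
}

/* Sort the cycle counts in place (heaviest first) and record, for the
 * i-th heaviest queue, which of n_pmds it would be assigned to under a
 * plain round robin. */
void
schedule_rxqs(unsigned long long *cycles, int n_rxqs, int n_pmds,
              int *out_pmd)
{
    qsort(cycles, n_rxqs, sizeof cycles[0], cycle_sort_desc);
    for (int i = 0; i < n_rxqs; i++) {
        out_pmd[i] = i % n_pmds;
    }
}
```

With the documentation example's loads (30, 80, 60, 70, 10) and 3 pmds, the two heaviest queues (80% and 70%) land on different pmds, which is the whole point of sorting before assigning.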
Re: [ovs-dev] [PATCH v4 3/6] dpif-netdev: Count the rxq processing cycles for an rxq.
On 08/09/2017 08:45 AM, Kevin Traynor wrote:

Count the cycles used for processing an rxq during the pmd rxq interval. As this is an in-flight counter and pmds run independently, also store the total cycles used during the last full interval.

Signed-off-by: Kevin Traynor
---
 lib/dpif-netdev.c | 65 ++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 60 insertions(+), 5 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 41f16b2..e344063 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -182,4 +182,8 @@ struct emc_cache {
 #define DPCLS_OPTIMIZATION_INTERVAL 1000

+/* Time in ms of the interval in which rxq processing cycles used in
+ * rxq to pmd assignments is measured and stored. */
+#define PMD_RXQ_INTERVAL 1000
+
 struct dpcls {
     struct cmap_node node;      /* Within dp_netdev_pmd_thread.classifiers */
@@ -558,4 +562,6 @@ struct dp_netdev_pmd_thread {
     /* Periodically sort subtable vectors according to hit frequencies */
     long long int next_optimization;
+    /* Periodically store the processing cycles used for each rxq. */
+    long long int rxq_interval;

     /* Statistics. */
@@ -684,5 +690,13 @@ static void pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd)
     OVS_REQUIRES(pmd->port_mutex);
 static inline void
-dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd);
+dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
+                           struct polled_queue *poll_list, int poll_cnt);
+static void
+dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx,
+                         enum rxq_cycles_counter_type type,
+                         unsigned long long cycles);
+static uint64_t
+dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx,
+                         enum rxq_cycles_counter_type type);
 static void
 dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
@@ -3114,4 +3128,21 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
 }

+static void
+dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx,
+                         enum rxq_cycles_counter_type type,
+                         unsigned long long cycles)
+{
+    atomic_store_relaxed(&rx->cycles[type], cycles);
+}
+
+static uint64_t
+dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx,
+                         enum rxq_cycles_counter_type type)
+{
+    unsigned long long tmp;
+    atomic_read_relaxed(&rx->cycles[type], &tmp);
+    return tmp;
+}
+
 static int
 dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
@@ -3158,5 +3189,5 @@ port_reconfigure(struct dp_netdev_port *port)
 {
     struct netdev *netdev = port->netdev;
-    int i, err;
+    int i, err, last_nrxq;

     port->need_reconfigure = false;
@@ -3167,4 +3198,5 @@
         port->rxqs[i].rx = NULL;
     }
+    last_nrxq = port->n_rxq;
     port->n_rxq = 0;
@@ -3187,4 +3219,9 @@
     for (i = 0; i < netdev_n_rxq(netdev); i++) {
         port->rxqs[i].port = port;
+        if (i >= last_nrxq) {
+            /* Only reset cycle stats for new queues */
+            dp_netdev_rxq_set_cycles(&port->rxqs[i], RXQ_CYCLES_PROC_CURR, 0);
+            dp_netdev_rxq_set_cycles(&port->rxqs[i], RXQ_CYCLES_PROC_LAST, 0);
+        }
         err = netdev_rxq_open(netdev, &port->rxqs[i].rx, i);
         if (err) {
@@ -3867,5 +3904,5 @@ reload:
             dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
                                        poll_list[i].port_no);
-        cycles_count_intermediate(pmd, NULL,
+        cycles_count_intermediate(pmd, poll_list[i].rxq,
                                   process_packets ? PMD_CYCLES_PROCESSING
                                                   : PMD_CYCLES_IDLE);
@@ -3878,5 +3915,5 @@ reload:
         coverage_try_clear();
-        dp_netdev_pmd_try_optimize(pmd);
+        dp_netdev_pmd_try_optimize(pmd, poll_list, poll_cnt);
         if (!ovsrcu_try_quiesce()) {
             emc_cache_slow_sweep(&pmd->flow_cache);
@@ -4322,4 +4359,5 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
     cmap_init(&pmd->classifiers);
     pmd->next_optimization = time_msec() + DPCLS_OPTIMIZATION_INTERVAL;
+    pmd->rxq_interval = time_msec() + PMD_RXQ_INTERVAL;
     hmap_init(&pmd->poll_list);
     hmap_init(&pmd->tx_ports);
@@ -5759,8 +5797,25 @@
 static inline void
-dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd)
+dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
+                           struct polled_queue *poll_list, int poll_cnt)
 {
     struct dpcls *cls;
     long long int now = time_msec();
+    int i;
+    uint64_t rxq_cyc_curr;
+
+    if (now > pmd->rxq_interval) {
+        /* Get the cycles that were used to process each queue and store. */
+        for (i = 0; i < poll_cnt;
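The interval bookkeeping this series builds up — a "current" counter that rolls over into a "last full interval" counter every PMD_RXQ_INTERVAL — can be modeled in a few lines. The real code uses relaxed atomics because pmds run concurrently; this single-threaded sketch only illustrates the two-slot scheme:

```c
#include <assert.h>

/* Two-slot cycle accounting per rxq: cycles accumulate in PROC_CURR,
 * and when the interval expires the total is copied to PROC_LAST (the
 * value the scheduler reads) and PROC_CURR is reset for the next
 * interval. */
enum { PROC_CURR, PROC_LAST, N_SLOTS };

struct rxq_cycles {
    unsigned long long c[N_SLOTS];
};

void
rxq_add_cycles(struct rxq_cycles *q, unsigned long long n)
{
    q->c[PROC_CURR] += n;   /* called from the polling loop */
}

void
rxq_interval_expired(struct rxq_cycles *q)
{
    q->c[PROC_LAST] = q->c[PROC_CURR];  /* publish the full interval */
    q->c[PROC_CURR] = 0;                /* start the next one */
}
```

Reading only the completed interval (PROC_LAST) means the rebalancing decision is never based on a partially accumulated, and therefore misleadingly small, cycle count.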
Re: [ovs-dev] [PATCH v4 2/6] dpif-netdev: Add rxq processing cycle counters.
On 08/09/2017 08:45 AM, Kevin Traynor wrote:

Add two counters to dp_netdev_rxq which will be used for storing the processing cycles of an rxq. Processing cycles will be stored in reference to a defined interval. One counter is used for storing cycles during the current in-progress interval, while the other is used to store the cycles of the last fully complete interval.

cycles_count_intermediate was used to count cycles for a pmd. With some small additions we can also use it to count the cycles used for processing an rxq.

Signed-off-by: Kevin Traynor
---
 lib/dpif-netdev.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f35c079..41f16b2 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -340,4 +340,11 @@ enum pmd_cycles_counter_type {
 };

+enum rxq_cycles_counter_type {
+    RXQ_CYCLES_PROC_CURR,       /* Cycles spent successfully polling and
+                                   processing polled packets */
+    RXQ_CYCLES_PROC_LAST,
+    RXQ_N_CYCLES
+};
+
 #define XPS_TIMEOUT_MS 500LL
@@ -351,4 +358,5 @@ struct dp_netdev_rxq {
                                           particular core. */
     struct dp_netdev_pmd_thread *pmd;  /* pmd thread that polls this queue. */
+    atomic_ullong cycles[RXQ_N_CYCLES];
 };
@@ -677,5 +685,4 @@ static void pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd)
 static inline void
 dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd);
-
 static void
 dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
@@ -3092,4 +3099,5 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd,
 static inline void
 cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
+                          struct dp_netdev_rxq *rxq,
                           enum pmd_cycles_counter_type type)
     OVS_NO_THREAD_SAFETY_ANALYSIS
@@ -3100,4 +3108,8 @@
     non_atomic_ullong_add(&pmd->cycles.n[type], interval);

+    if (rxq && (type == PMD_CYCLES_PROCESSING)) {
+        /* Add to the amount of current processing cycles. */
+        non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval);
+    }
 }
@@ -3668,5 +3680,5 @@ dpif_netdev_run(struct dpif *dpif)
                                        port->rxqs[i].rx, port->port_no);
-        cycles_count_intermediate(non_pmd, process_packets ?
+        cycles_count_intermediate(non_pmd, NULL, process_packets ?
                                   PMD_CYCLES_PROCESSING : PMD_CYCLES_IDLE);
@@ -3855,5 +3867,5 @@ reload:
             dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
                                        poll_list[i].port_no);
-        cycles_count_intermediate(pmd,
+        cycles_count_intermediate(pmd, NULL,
                                   process_packets ? PMD_CYCLES_PROCESSING
                                                   : PMD_CYCLES_IDLE);

Tested-by: Greg Rose
Reviewed-by: Greg Rose
Re: [ovs-dev] [PATCH v4 1/6] dpif-netdev: Change polled_queue to use dp_netdev_rxq.
On 08/09/2017 08:45 AM, Kevin Traynor wrote:

Soon we will want to store processing cycle counts in the dp_netdev_rxq, so use that as a basis for the polled_queue that pmd_thread_main uses.

Signed-off-by: Kevin Traynor
---
 lib/dpif-netdev.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e2cd931..f35c079 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -486,5 +486,5 @@ struct dp_netdev_pmd_cycles {

 struct polled_queue {
-    struct netdev_rxq *rx;
+    struct dp_netdev_rxq *rxq;
     odp_port_t port_no;
 };
@@ -3799,5 +3799,5 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread *pmd,
     i = 0;
     HMAP_FOR_EACH (poll, node, &pmd->poll_list) {
-        poll_list[i].rx = poll->rxq->rx;
+        poll_list[i].rxq = poll->rxq;
         poll_list[i].port_no = poll->rxq->port->port_no;
         i++;
@@ -3837,6 +3837,6 @@ reload:
     for (i = 0; i < poll_cnt; i++) {
         VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
-                 pmd->core_id, netdev_rxq_get_name(poll_list[i].rx),
-                 netdev_rxq_get_queue_id(poll_list[i].rx));
+                 pmd->core_id, netdev_rxq_get_name(poll_list[i].rxq->rx),
+                 netdev_rxq_get_queue_id(poll_list[i].rxq->rx));
     }
@@ -3853,5 +3853,5 @@ reload:
     for (i = 0; i < poll_cnt; i++) {
         process_packets =
-            dp_netdev_process_rxq_port(pmd, poll_list[i].rx,
+            dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
                                        poll_list[i].port_no);
         cycles_count_intermediate(pmd,

Tested-by: Greg Rose
Reviewed-by: Greg Rose
Re: [ovs-dev] [PATCHv2 1/4] ovsdb-idl: Avoid class declaration.
On 10 August 2017 at 11:41, Ben Pfaff wrote:
> On Thu, Aug 10, 2017 at 01:01:32PM +0800, Gao Zhenyu wrote:
>> How about:
>> struct ovsdb_idl_table {
>>     ...
>>     const struct ovsdb_idl_table_class *table_class
>> }
>>
>> struct ovsdb_idl {
>>
>>     const struct ovsdb_idl_class *idl_class;
>
> Why make it longer?
>
>> Besides of that, I see many places consume the table class.
>> Do you mind to make a macro helps to fetch the class?
>> Like:
>>
>> #define OVSDB_GET_TABLE_CLASS(row) \
>>     ((row)->table->class)
>
> Ugh.

Just in case the context isn't obvious... ;-)

I think that in general, we try not to obscure what's going on in OVS code. This particular suggestion makes it less clear when reading the code exactly where the class comes from, because the pointer dereference is then hidden behind the macro. It also makes lines referencing the class longer, and some of these lines are already fairly long.
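For readers skimming the thread, here is a compilable toy version of the two styles under discussion. The struct names are simplified stand-ins for the real ovsdb-idl types (and the member is spelled `class_` rather than `class` to stay C++-friendly); both forms compute the same pointer, so the argument is purely about readability:

```c
#include <assert.h>

/* Simplified stand-ins for the ovsdb-idl row/table/class hierarchy. */
struct idl_table_class { int n_columns; };
struct idl_table { const struct idl_table_class *class_; };
struct idl_row { const struct idl_table *table; };

/* The proposed macro: shorter at the call site, but it hides where
 * 'class' comes from. */
#define OVSDB_GET_TABLE_CLASS(row) ((row)->table->class_)

/* The explicit style the reviewers prefer: the double dereference is
 * visible exactly where it happens. */
const struct idl_table_class *
get_class_explicit(const struct idl_row *row)
{
    return row->table->class_;
}
```

Since the two are equivalent, the review feedback comes down to the reviewers' point above: keeping the dereference visible makes the data flow easier to follow when reading unfamiliar code.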
Re: [ovs-dev] [PATCH] sandbox: Add ports to br-int in ovn-setup.
On Thu, Aug 10, 2017 at 4:43 PM, Ben Pfaff wrote:
> On Thu, Aug 10, 2017 at 04:20:23PM -0400, Russell Bryant wrote:
>> ovs-sandbox comes with a script to quickly set up a simple
>> OVN configuration, ovn-setup.sh. This script set up config in the OVN
>> northbound database, but didn't create the corresponding ports on
>> br-int. Add that to save another step in provisioning this simple
>> environment.
>>
>> Add "ovn-sbctl show" output as well, to follow the existing "ovn-nbctl
>> show" output.
>>
>> Signed-off-by: Russell Bryant
>
> I haven't used this script, and I didn't test it, but it seems
> reasonable to me. Feel free to ask or wait for a more thorough review,
> if you prefer.
>
> Acked-by: Ben Pfaff

Thanks. It works for me. I'm not too worried, so I applied this to master.

FWIW, here's what it gives you: 2 logical switches connected by a router with a port on each switch.

=== ovn-nbctl show ===
switch 34bfbfec-423e-4a8c-9aef-31e1a847c601 (sw1)
    port sw1-port1
        addresses: ["50:54:00:00:00:03 11.0.0.2"]
    port lrp1-attachment
        type: router
        addresses: ["00:00:00:00:ff:02"]
        router-port: lrp1
switch f33fce42-66dc-4439-9070-c2b09623133d (sw0)
    port sw0-port1
        addresses: ["50:54:00:00:00:01 192.168.0.2"]
    port lrp0-attachment
        type: router
        addresses: ["00:00:00:00:ff:01"]
        router-port: lrp0
router 84e3352f-8fed-4b3b-99d4-92ccd2e4310f (lr0)
    port lrp1
        mac: "00:00:00:00:ff:02"
        networks: ["11.0.0.1/24"]
    port lrp0
        mac: "00:00:00:00:ff:01"
        networks: ["192.168.0.1/24"]

=== ovn-sbctl show ===
Chassis "chassis-1"
    hostname: sandbox
    Encap geneve
        ip: "127.0.0.1"
        options: {csum="true"}
    Port_Binding "sw0-port1"
    Port_Binding "sw1-port1"
Re: [ovs-dev] [PATCH] travis: parallel builds and tests
On Thu, Aug 10, 2017 at 04:41:19PM -0400, Lance Richardson wrote: > Some recent travis builds have failed due to having exceeded the > per-job time limit of 50 minutes. This change enables parallel > builds and parallel test execution in order to reduce overall > execution time, and will hopefully allow this class of build > failures to be avoided. > > Since the travis build environment is provisioned with two CPUs, > use -j2 for builds and -j4 for tests. Testing in a cloned repository > shows slightly more than a 50% reduction in overall test time. > > Signed-off-by: Lance Richardson Thanks, applied to master. If this makes the master build finish reliably then I'll gladly apply it to branch-2.8 also.
Re: [ovs-dev] [PATCH] netdev-dpdk: include dpdk PCI header directly
On Thu, Aug 10, 2017 at 03:33:55PM -0400, Aaron Conole wrote: > Ben Pfaff writes: > > > On Wed, Aug 09, 2017 at 04:00:53PM -0400, Aaron Conole wrote: > >> As part of a devargs rework in DPDK, the PCI header file was removed, and > >> needs to be directly included. This isn't required to build with 17.05 or > >> earlier, but will be required should a future update happen. > >> > >> Signed-off-by: Aaron Conole > > > > Just to be sure, this patch doesn't cause a build failure with 17.05 or > > earlier, right? If it does not, then it's a no-brainer to apply it. > > Correct. Great. I applied this to master and branch-2.8.
Re: [ovs-dev] [PATCH] sandbox: Add ports to br-int in ovn-setup.
On Thu, Aug 10, 2017 at 04:20:23PM -0400, Russell Bryant wrote: > ovs-sandbox comes with a script to quickly set up a simple > OVN configuration, ovn-setup.sh. This script set up config in the OVN > northbound database, but didn't create the corresponding ports on > br-int. Add that to save another step in provisioning this simple > environment. > > Add "ovn-sbctl show" output as well, to follow the existing "ovn-nbctl > show" output. > > Signed-off-by: Russell Bryant I haven't used this script, and I didn't test it, but it seems reasonable to me. Feel free to ask or wait for a more thorough review, if you prefer. Acked-by: Ben Pfaff
Re: [ovs-dev] [patch_v6] dp-packet: Reset DPDK hwol flags on init.
On Thu, Aug 10, 2017 at 01:22:16PM -0700, Darrell Ball wrote: > Reset the DPDK hwol flags in dp_packet_init_. The new hwol bad checksum > flag is uninitialized for non-dpdk ports and this is noticed as test > failures using netdev-dummy ports, when built with the --with-dpdk > flag set. Hence, in this case, packets may be falsely marked as having a > bad checksum. The existing APIs are simplified at the same time by > making them specific to either DPDK or otherwise; they also now > manage a single field. > > Reported-at: > https://mail.openvswitch.org/pipermail/ovs-discuss/2017-August/045081.html > Fixes: 7451af618e0d ("dp-packet : Update DPDK rx checksum validation > functions.") > CC: Sugesh Chandran > Signed-off-by: Darrell Ball Thanks a lot! I applied this to master and branch-2.8.
[ovs-dev] [PATCH] travis: parallel builds and tests
Some recent travis builds have failed due to having exceeded the per-job time limit of 50 minutes. This change enables parallel builds and parallel test execution in order to reduce overall execution time, and will hopefully allow this class of build failures to be avoided. Since the travis build environment is provisioned with two CPUs, use -j2 for builds and -j4 for tests. Testing in a cloned repository shows slightly more than a 50% reduction in overall test time. Signed-off-by: Lance Richardson --- .travis/linux-build.sh | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh index d6f610ea1..bc7c7087d 100755 --- a/.travis/linux-build.sh +++ b/.travis/linux-build.sh @@ -102,16 +102,16 @@ if [ "$KERNEL" ] && [ ! "$TESTSUITE" ] && [ ! "$DPDK" ]; then fi if [ "$CC" = "clang" ]; then -make CFLAGS="$CFLAGS -Wno-error=unused-command-line-argument" +make -j2 CFLAGS="$CFLAGS -Wno-error=unused-command-line-argument" elif [[ $BUILD_ENV =~ "-m32" ]]; then # Disable sparse for 32bit builds on 64bit machine -make CFLAGS="$CFLAGS $BUILD_ENV" +make -j2 CFLAGS="$CFLAGS $BUILD_ENV" else -make CFLAGS="$CFLAGS $BUILD_ENV $SPARSE_FLAGS" C=1 +make -j2 CFLAGS="$CFLAGS $BUILD_ENV $SPARSE_FLAGS" C=1 fi if [ "$TESTSUITE" ] && [ "$CC" != "clang" ]; then -if ! make distcheck RECHECK=yes; then +if ! make distcheck TESTSUITEFLAGS=-j4 RECHECK=yes; then # testsuite.log is necessary for debugging. cat */_build/tests/testsuite.log exit 1 -- 2.13.4
Re: [ovs-dev] [PATCHv2 4/4] ovsdb-idl: Rename 'old' to 'old_datum'.
On Wed, Aug 09, 2017 at 03:27:42PM -0700, Joe Stringer wrote: > Now that the 'new' datum is named 'new_datum', be more consistent by > renaming 'old' to 'old_datum' to match. > > Signed-off-by: Joe Stringer > --- > v2: New patch. Oh, it's an additional patch. OK ;-) Acked-by: Ben Pfaff
Re: [ovs-dev] [PATCHv2 3/4] ovsdb-idl: Avoid new expression.
On Wed, Aug 09, 2017 at 03:27:41PM -0700, Joe Stringer wrote: > In C++, 'new' is a keyword. If this is used as the name for a field, > then C++ compilers can get confused about the context and fail to > compile references to such fields. Rename the field to 'new_datum' to > avoid this issue. > > Signed-off-by: Joe Stringer > --- > v2: Rebase. > Rename 'new_' to 'new_datum'. > Update comments above 'struct ovsdb_idl_row'. Should we rename 'old' to 'old_datum' for consistency? Acked-by: Ben Pfaff
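As a hedged illustration of why the rename matters: 'new' (like 'mutable') is a reserved keyword in C++, so a C header using it as a field name cannot be included from C++ translation units. The types below are hypothetical miniatures, not the real struct ovsdb_idl_row; they only sketch the renamed-field pattern and the "inserted row" idiom that appears in ovsdb-idl.c as `row->new && !row->old` (after this series, `new_datum`/`old_datum`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the renamed fields; not the real
 * struct ovsdb_idl_row.  In C, 'new' would be a legal field name, but a
 * C++ compiler rejects 'row->new', so the header-visible fields become
 * 'old_datum' and 'new_datum'. */
struct datum { int n_values; };

struct idl_row_sketch {
    const struct datum *old_datum;  /* committed data, or NULL */
    const struct datum *new_datum;  /* in-transaction data, or NULL */
};

/* Mirrors the idiom used in ovsdb-idl.c: a row with new data and no
 * committed data was inserted in the current transaction. */
static bool
row_is_inserted(const struct idl_row_sketch *row)
{
    return row->new_datum && !row->old_datum;
}
```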
[ovs-dev] [patch_v6] dp-packet: Reset DPDK hwol flags on init.
Reset the DPDK hwol flags in dp_packet_init_. The new hwol bad checksum flag is uninitialized for non-dpdk ports and this is noticed as test failures using netdev-dummy ports, when built with the --with-dpdk flag set. Hence, in this case, packets may be falsely marked as having a bad checksum. The existing APIs are simplified at the same time by making them specific to either DPDK or otherwise; they also now manage a single field. Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-August/045081.html Fixes: 7451af618e0d ("dp-packet : Update DPDK rx checksum validation functions.") CC: Sugesh Chandran Signed-off-by: Darrell Ball --- v5->v6: Refactor some existing apis and incorporate review comments. v3->v5: Update the commit message with more context. v2->v3: Use existed API to reset both the DPDK HWOL flags. v1->v2: Fix build failure for without --with-dpdk. lib/dp-packet.c | 1 + lib/dp-packet.h | 24 +--- lib/netdev-dpdk.c | 2 +- 3 files changed, 23 insertions(+), 4 deletions(-) diff --git a/lib/dp-packet.c b/lib/dp-packet.c index 67aa406..c1f43f3 100644 --- a/lib/dp-packet.c +++ b/lib/dp-packet.c @@ -31,6 +31,7 @@ dp_packet_init__(struct dp_packet *b, size_t allocated, enum dp_packet_source so dp_packet_reset_offsets(b); pkt_metadata_init(&b->md, 0); dp_packet_rss_invalidate(b); +dp_packet_mbuf_init(b); dp_packet_reset_cutlen(b); /* By default assume the packet type to be Ethernet. */ b->packet_type = htonl(PT_ETH); diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 8f52fee..bb3f9db 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -605,12 +605,30 @@ dp_packet_rss_valid(struct dp_packet *p) } static inline void -dp_packet_rss_invalidate(struct dp_packet *p) +dp_packet_rss_invalidate(struct dp_packet *p OVS_UNUSED) +{ +#ifndef DPDK_NETDEV +p->rss_hash_valid = false; +#endif +} + +static inline void +dp_packet_mbuf_rss_flag_reset(struct dp_packet *p OVS_UNUSED) { #ifdef DPDK_NETDEV p->mbuf.ol_flags &= ~PKT_RX_RSS_HASH; -#else -p->rss_hash_valid = false; +#endif +} + +/* This initialization is needed for packets that do not come + * from DPDK interfaces, when vswitchd is built with --with-dpdk. + * The DPDK rte library will still otherwise manage the mbuf. + * We only need to initialize the mbuf ol_flags. */ +static inline void +dp_packet_mbuf_init(struct dp_packet *p OVS_UNUSED) +{ +#ifdef DPDK_NETDEV +p->mbuf.ol_flags = 0; #endif } diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 1d82bca..9fd98af 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2916,7 +2916,7 @@ netdev_dpdk_ring_send(struct netdev *netdev, int qid, * modified by the consumer of the ring and return into the datapath * without recalculating the RSS hash. */ for (i = 0; i < batch->count; i++) { -dp_packet_rss_invalidate(batch->packets[i]); +dp_packet_mbuf_rss_flag_reset(batch->packets[i]); } netdev_dpdk_send__(dev, qid, batch, may_steal, concurrent_txq); -- 1.9.1
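A minimal sketch of the bug class this patch fixes, using fake types rather than the real dp_packet/rte_mbuf: if the flags word inside the embedded mbuf is never initialized on the non-DPDK allocation path, whatever garbage happens to be in that memory can read as a "bad checksum" bit; zeroing it once in the common init path makes the later check trustworthy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fake types for illustration only -- not the real dp_packet or rte_mbuf.
 * The flag value is likewise a stand-in for DPDK's PKT_RX_L4_CKSUM_BAD. */
#define SKETCH_CKSUM_BAD (UINT32_C(1) << 3)

struct sketch_mbuf { uint32_t ol_flags; };
struct sketch_dp_packet { struct sketch_mbuf mbuf; };

/* Mirrors what dp_packet_mbuf_init() does under #ifdef DPDK_NETDEV:
 * packets that never traverse the DPDK rx path must not inherit stale
 * offload flags from uninitialized memory. */
static void
sketch_mbuf_init(struct sketch_dp_packet *p)
{
    p->mbuf.ol_flags = 0;
}

static int
sketch_cksum_looks_bad(const struct sketch_dp_packet *p)
{
    return (p->mbuf.ol_flags & SKETCH_CKSUM_BAD) != 0;
}
```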
[ovs-dev] [PATCH] sandbox: Add ports to br-int in ovn-setup.
ovs-sandbox comes with a script to quickly set up a simple OVN configuration, ovn-setup.sh. This script set up config in the OVN northbound database, but didn't create the corresponding ports on br-int. Add that to save another step in provisioning this simple environment. Add "ovn-sbctl show" output as well, to follow the existing "ovn-nbctl show" output. Signed-off-by: Russell Bryant --- tutorial/ovn-setup.sh | 8 1 file changed, 8 insertions(+) diff --git a/tutorial/ovn-setup.sh b/tutorial/ovn-setup.sh index dc4c813b1..943ca58ed 100755 --- a/tutorial/ovn-setup.sh +++ b/tutorial/ovn-setup.sh @@ -23,5 +23,13 @@ ovn-nbctl lsp-set-type lrp1-attachment router ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02 ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1 +ovs-vsctl add-port br-int p1 -- \ +set Interface p1 external_ids:iface-id=sw0-port1 +ovs-vsctl add-port br-int p2 -- \ +set Interface p2 external_ids:iface-id=sw1-port1 + # View a summary of the configuration +printf "\n=== ovn-nbctl show ===\n\n" ovn-nbctl show +printf "\n=== ovn-sbctl show ===\n\n" +ovn-sbctl show -- 2.13.4
Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL checksum flags on init.
Darrell Ballwrites: > -Original Message- > From: Aaron Conole > Date: Thursday, August 10, 2017 at 10:13 AM > To: Darrell Ball > Cc: ovs dev > Subject: Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL checksum flags > on init. > > Darrell Ball writes: > > > -Original Message- > > From: Aaron Conole > > Date: Wednesday, August 9, 2017 at 12:51 PM > > To: Darrell Ball > > Cc: Joe Stringer , Darrell Ball , ovs > > dev > > Subject: Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL > > checksum flags on init. > > > > Darrell Ball writes: > > > > > Thanks Joe > > > I forgot to add your Tested-by to V5; I have been testing this > myself; > > > but let me know if you would like it added – I can send a V6. > > > > It will automatically be added by patchwork. It is sufficient to > > download (ex: > https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.ozlabs.org_patch_799499_mbox=DwIFaQ=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=LK8s59ajLx5MbmH5Ng8SG1F1nAgKZJ_0JBs7phbl6C8=HRG5wrQrEXCl_AA8uAGbroVSvYGUIQjQkF8HrMRvNYI= > ) > > > > I usually use patchwork when working on a series - if I download it > from > > patchwork, I can be confident that all the tags are applied > properly, so > > I won't forget. Plus, all the discussion happens there, so I can > > quickly browse it. > > > > Thanks Aaron > > In this case, the Tested-by was applied to V4 and I sent a V5 which did > not have the Tested-by. > > The easiest solution would have been to ask Joe to respond to V5 – > well, it is moot now anyways with V6 on the way. > > Yep, that happens. 
> > > The full list is available at: > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.ozlabs.org_project_openvswitch_list_=DwIFaQ=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=LK8s59ajLx5MbmH5Ng8SG1F1nAgKZJ_0JBs7phbl6C8=rqaCq5Jr-Vy-cfQ_t5px-zGtdb55CQn9cIAZCC1PvfE= > > > > > Thanks, I am on this page most of the day ( > > > > It would actually be cool to have a few more admins to troll > patchwork > > and do things like ping for status (ex: > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.ozlabs.org_patch_719492_=DwIFaQ=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=LK8s59ajLx5MbmH5Ng8SG1F1nAgKZJ_0JBs7phbl6C8=y0RPF47IRVw8BujWHzB9oc7IUihgLuCenLfteZ2vUr4= > > > > > I don’t have any context on this patch. > > Aaron, would you like to resurrect this patch with rebase ?; it is RHEL > related. > > It doesn't need a rebase; it applies cleanly. The version of patchwork > running doesn't recognize my ack, so someone will have to add it > manually (or omit it... doesn't matter too much, since Ben ack'd as > well). > > But it would be cool to get it applied; maybe even to get back as far > as branch-2.7 (I think it was posted in time for 2.7 branch). > > > One thing is I don’t know if Daniele had a chance to test this, although it > looks simple enough. > Did you get a chance to test it ? if so, maybe a tested-by ? I believe I tested it at the time I acked it. I'll set up a test for it tomorrow, though. > > is this still needed? it > > does seem to mostly apply). Then we could make sure we don't miss > > things. > > > > > Darrell > > > > > > -Original Message- > > > From: on behalf of Joe > > > Stringer > > > Date: Tuesday, August 8, 2017 at 4:51 PM > > > To: Darrell Ball > > > Cc: ovs dev > > > Subject: Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL > checksum > > > flags on init. > > > > > > On 8 August 2017 at 16:39, Darrell Ball > wrote: > > > > Reset the DPDK HWOL checksum flags in dp_packet_init_. 
> > > > The new HWOL bad checksum flag is uninitialized on non-dpdk > ports and > > > > this is noticed as test failures using netdev-dummy > > > > ports where the bad > > > > checksum flag is checked. > > > > > > > > Fixes: 7451af618e0d ("dp-packet : Update DPDK rx checksum > > > > validation functions.") > > > > CC: Sugesh Chandran > > > > Signed-off-by: Darrell Ball > > > > --- > > > > > > Tested-by: Joe Stringer > > > Tested-at:
Re: [ovs-dev] [PATCH] netdev-dpdk: include dpdk PCI header directly
Ben Pfaff writes: > On Wed, Aug 09, 2017 at 04:00:53PM -0400, Aaron Conole wrote: >> As part of a devargs rework in DPDK, the PCI header file was removed, and >> needs to be directly included. This isn't required to build with 17.05 or >> earlier, but will be required should a future update happen. >> >> Signed-off-by: Aaron Conole > > Just to be sure, this patch doesn't cause a build failure with 17.05 or > earlier, right? If it does not, then it's a no-brainer to apply it. Correct. > Thanks, > > Ben.
Re: [ovs-dev] [PATCH] ovn-controller: Refactor function of consider_port_binding
On Fri, Aug 04, 2017 at 10:31:14AM +0800, wang.qia...@zte.com.cn wrote: > The function of consider_port_binding is redundant. This patch split the > function to some sub-function by the port type. > > Signed-off-by: wang qianyu Can you explain how it is redundant, and how this patch reduces it? This patch makes two identical copies of the code for handling L3 gateway ports and patch ports, as patch_port_handle() and l3gateway_port_handle(), so it's hard for me to see what redundancy it eliminates.
Re: [ovs-dev] OVN question: Transmission of link-local IPv6 multicast
On Thu, Aug 10, 2017 at 1:14 PM Ben Pfaff wrote: > On Thu, Aug 10, 2017 at 02:41:30PM +0000, Mark Michelson wrote: > > I'm curious about the current behavior when ovn-controller originates a > > packet and transmits it over a link-local IPv6 multicast address. Will > the > > packet be delivered only to nodes on the same chassis, or can the packet > > also reach on-link nodes on other chassis? > > When an OVN logical switch receives an Ethernet broadcast or multicast > packet from a given port, it forwards it to all of its other logical > switch ports, regardless of chassis. There are special cases for ARP > and IPv6 ND, to avoid forwarding them across chassis for known IP > addresses. I don't think that we have any special cases for IPv6 > link-local multicast. Limiting these within a chassis would probably > require new mechanisms, because logical flows, by design, do not respect > physical boundaries. > > Interesting you mention IPv6 ND here. RAs fall under the umbrella of IPv6 ND. So depending on how IPv6 ND is detected, it may be that RAs will be limited from being sent to other chassis. The interesting bit here is the "for known IP addresses" part. I'll take a closer look at the flows to see what that means. If the all-nodes link-local multicast address is one of these known IP addresses, then things might just work without having to do much extra. > > The reason I ask is that I'm working towards expanding OVN's IPv6 > neighbor > > discovery capabilities, and the first step on that path is being able to > > send periodic router advertisements (RAs). In a traditional physical > > network, each interface on a router may send periodic RAs to its > link-local > > all-nodes address (ff02::1). This way all on-link nodes receive the RA, > and > > the world is a happy place.
> > > > OVN is not a traditional physical router, though :) > > > > With OVN, each hypervisor has an ovn-controller instance running, > blurring > > the lines about what exactly constitutes a "router interface". From an > > ovn-nb point of view, a logical router port constitutes a router > interface. > > However, in practice, I'm not sure if this holds true once you get down > to > > the ovn-controller level. Let's say that a single link (via a logical > > switch) is spread across three hypervisors. If the ovn-controller on each > > of those hypervisors sends an RA to the link-local all-nodes address, > would > > that result in nodes on each hypervisor receiving three RAs? Or would > each > > node only receive one RA since it only receives the RA from the local > > ovn-controller? > > I believe that this would result in 3x advertisements. > That's what I was afraid of. This could end up being a bit spammy if all ovn-controllers are sending RAs. But as I mentioned earlier, this may actually be taken care of since RAs are part of IPv6 ND. > > > I suspect the answer to this all comes down to how openflow flows get > > installed on each OVS instance. What I'm curious about is the current > > behavior of OVN in this regard. > > I guess that it would be more efficient for each chassis to individually > generate these router advertisements and ensure that they do not > propagate beyond the physical chassis boundary? I can think of ways to > do that, especially if ovn-controller were generating them. This is > because when ovn-controller generates packets, it can tag them with > additional metadata that indicate the purpose. For example, to allow it > to send a chassis-limited packet, we could implement a new metadata bit > that suppresses output through tunnels, and ovn-controller could set > that bit. > Cool, thanks for the pointer. If the current IPv6 ND-related rules don't already do what I need, I'll look into the metadata idea. > > > P.S. 
Someone please tell me if ovs-discuss would have been a more > suitable > > list for this question. I figured since it was related to a feature I was > > developing that this would be the proper list, though. > > Seems like a legit dev discussion to me. > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4 2/5] netdev-dpdk: Add netdev_dpdk_vhost_txq_flush function.
>> } else { +/* If the queue is disabled in the guest, the corresponding qid + * map shall be set to OVS_VHOST_QUEUE_DISABLED(-2). + * + * The packets that were queued in 'qid' could be potentially + * stuck and needs to be dropped. + * + * XXX: The queues may be already disabled in the guest so + * flush function in this case only helps in updating stats + * and freeing memory. + */ +netdev_dpdk_vhost_txq_flush(&dev->up, qid, 0); dev->tx_q[qid].map = OVS_VHOST_QUEUE_DISABLED; } netdev_dpdk_remap_txqs(dev); >> >> 'netdev_dpdk_remap_txqs()', actually, is able to change mapping for >> all the disabled in guest queues. So, we need to flush all of them >> while remapping somewhere inside the function. >> One other thing is that there is a race window between flush and >> mapping update where another process able to enqueue more packets in >> just flushed queue. The order of operations should be changed, or both >> of them should be done under the same tx_lock. I think, it's required >> to make tx_q[].map field atomic to fix the race condition, because >> send function takes the 'map' and then locks the corresponding queue. >> It wasn't an issue before, because packets in case of race was just >> dropped on attempt to send to disabled queue, but with this patch >> applied they will be enqueued to the intermediate queue and stuck there. > >Making 'map' atomic will not help. To solve the race we should make 'reading >of map + enqueue' an atomic operation by some spinlock. >Like this: > >vhost_send: > >qid = qid % netdev->n_txq; >rte_spinlock_lock(&dev->tx_q[qid].tx_lock); > >mapped_qid = dev->tx_q[qid].map; > >if (qid != mapped_qid) { >rte_spinlock_lock(&dev->tx_q[mapped_qid].tx_lock); >} > >tx_enqueue(mapped_qid, pkts, cnt); > >if (qid != mapped_qid) { >rte_spinlock_unlock(&dev->tx_q[mapped_qid].tx_lock); >} > >rte_spinlock_unlock(&dev->tx_q[qid].tx_lock); > > >txq remapping inside 'netdev_dpdk_remap_txqs()' or >'vring_state_changed()': > >qid - queue we need to remap.
>new_qid - queue we need to remap to. > >rte_spinlock_lock(&dev->tx_q[qid].tx_lock); > >mapped_qid = dev->tx_q[qid].map; >if (qid != mapped_qid) { >rte_spinlock_lock(&dev->tx_q[mapped_qid].tx_lock); >} > >tx_flush(mapped_qid) > >if (qid != mapped_qid) { >rte_spinlock_unlock(&dev->tx_q[mapped_qid].tx_lock); >} > >dev->tx_q[qid].map = new_qid; > >rte_spinlock_unlock(&dev->tx_q[qid].tx_lock); > > >Above schema should work without races, but looks kind of ugly and requires >taking of additional spinlock on each send. > >P.S. Sorry for talking with myself. Just want to share my thoughts. Hi Ilya, Can you please review the below changes based on what you suggested above. As the problem only happens when the queues are enabled/disabled in the guest, I did some preliminary testing with the below changes by sending some traffic in to the VM and enabling and disabling the queues inside the guest the same time. Vhost_send() - qid = qid % netdev->n_txq; /* Acquire tx_lock before reading tx_q[qid].map and enqueueing packets. * tx_q[].map gets updated in vring_state_changed() when vrings are * enabled/disabled in the guest.
*/ rte_spinlock_lock(&dev->tx_q[qid].tx_lock); mapped_qid = dev->tx_q[qid].map; if (OVS_UNLIKELY(qid != mapped_qid)) { rte_spinlock_lock(&dev->tx_q[mapped_qid].tx_lock); } if (OVS_UNLIKELY(!is_vhost_running(dev) || mapped_qid < 0 || !(dev->flags & NETDEV_UP))) { rte_spinlock_lock(&dev->stats_lock); dev->stats.tx_dropped += cnt; rte_spinlock_unlock(&dev->stats_lock); for (i = 0; i < total_pkts; i++) { dp_packet_delete(pkts[i]); } if (OVS_UNLIKELY(qid != mapped_qid)) { rte_spinlock_unlock(&dev->tx_q[mapped_qid].tx_lock); } rte_spinlock_unlock(&dev->tx_q[qid].tx_lock); return; } cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt); /* Check has QoS has been configured for the netdev */ cnt = netdev_dpdk_qos_run(dev, cur_pkts, cnt); dropped = total_pkts - cnt; int idx = 0; struct dpdk_tx_queue *txq = &dev->tx_q[mapped_qid]; while (idx < cnt) { txq->vhost_burst_pkts[txq->vhost_pkt_cnt++] = pkts[idx++]; if (txq->vhost_pkt_cnt >=
Re: [ovs-dev] [PATCHv2 2/4] ovsdb-idl: Avoid mutable type specifier.
I like that. On Thu, Aug 10, 2017 at 11:54:11AM +0800, Gao Zhenyu wrote: > How about mutable --> is_mutable ? > > > Thanks > Zhenyu Gao > > 2017-08-10 6:27 GMT+08:00 Joe Stringer: > > > In C++, 'mutable' is a keyword. If this is used as the name for a field, > > then C++ compilers can get confused about the context and fail to > > compile references to such fields. Rename the field to 'mutable_' to > > avoid this issue. > > > > Signed-off-by: Joe Stringer > > --- > > v2: Rebase. > > --- > > lib/ovsdb-idl-provider.h | 2 +- > > lib/ovsdb-idl.c | 2 +- > > ovsdb/ovsdb-idlc.in | 2 +- > > 3 files changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/lib/ovsdb-idl-provider.h b/lib/ovsdb-idl-provider.h > > index a2eb8cac67d7..aa1fa9390572 100644 > > --- a/lib/ovsdb-idl-provider.h > > +++ b/lib/ovsdb-idl-provider.h > > @@ -89,7 +89,7 @@ struct ovsdb_idl_row { > > struct ovsdb_idl_column { > > char *name; > > struct ovsdb_type type; > > -bool mutable; > > +bool mutable_; > > void (*parse)(struct ovsdb_idl_row *, const struct ovsdb_datum *); > > void (*unparse)(struct ovsdb_idl_row *); > > }; > > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c > > index 227aa5fbfcb2..d474075fa990 100644 > > --- a/lib/ovsdb-idl.c > > +++ b/lib/ovsdb-idl.c > > @@ -2871,7 +2871,7 @@ bool > > ovsdb_idl_is_mutable(const struct ovsdb_idl_row *row, > > const struct ovsdb_idl_column *column) > > { > > -return column->mutable || (row->new && !row->old); > > +return column->mutable_ || (row->new && !row->old); > > } > > > > /* Returns false if 'row' was obtained from the IDL, true if it was > > initialized > > diff --git a/ovsdb/ovsdb-idlc.in b/ovsdb/ovsdb-idlc.in > > index f065ef1c68fe..36b7d2700bb6 100755 > > --- a/ovsdb/ovsdb-idlc.in > > +++ b/ovsdb/ovsdb-idlc.in > > @@ -1268,7 +1268,7 @@ void > > .type = { > > %(type)s > > }, > > - .mutable = %(mutable)s, > > + .mutable_ = %(mutable)s, > > .parse = %(s)s_parse_%(c)s, > > .unparse = %(s)s_unparse_%(c)s, > > },\n""" % {'P': prefix.upper(), > > 
-- > > 2.13.3
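The same keyword issue as with 'new' applies here: 'mutable' is reserved in C++. Below is a hedged miniature of the check in ovsdb_idl_is_mutable() after the rename, using simplified stand-in types rather than the real IDL structures ('mutable_' as in the patch; 'is_mutable', as suggested above, would work the same way).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins, not the real IDL structures. */
struct sketch_column { bool mutable_; };      /* was 'mutable' */
struct sketch_row {
    const void *old_datum;                    /* committed data, or NULL */
    const void *new_datum;                    /* in-transaction data, or NULL */
};

/* Mirrors the logic of ovsdb_idl_is_mutable(): a column may be written if
 * it is declared mutable, or if the row itself was inserted in the current
 * transaction (new data, no committed data). */
static bool
sketch_is_mutable(const struct sketch_row *row,
                  const struct sketch_column *column)
{
    return column->mutable_ || (row->new_datum && !row->old_datum);
}
```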
Re: [ovs-dev] [PATCHv2 1/4] ovsdb-idl: Avoid class declaration.
On Thu, Aug 10, 2017 at 01:01:32PM +0800, Gao Zhenyu wrote: > How about: > struct ovsdb_idl_table { > ... > const struct ovsdb_idl_table_class *table_class > > } > > struct ovsdb_idl { > > const struct ovsdb_idl_class *idl_class; > Why make it longer? > Besides of that, I see many places consume the table class. > Do you mind to make a macro helps to fetch the class? > Like: > > #define OVSDB_GET_TABLE_CLASS(row) \ > ((row)->table->class) Ugh. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] netdev-dpdk: include dpdk PCI header directly
On Wed, Aug 09, 2017 at 04:00:53PM -0400, Aaron Conole wrote: > As part of a devargs rework in DPDK, the PCI header file was removed, and > needs to be directly included. This isn't required to build with 17.05 or > earlier, but will be required should a future update happen. > > Signed-off-by: Aaron Conole Just to be sure, this patch doesn't cause a build failure with 17.05 or earlier, right? If it does not, then it's a no-brainer to apply it. Thanks, Ben.
Re: [ovs-dev] ovsdb-server replication: Possible to exclude syncing specific column of a table??
OK. I see what you're talking about. It's not a bug, then. OVSDB is just doing its job ensuring referential integrity. Please feel free to submit a patch to implement the feature that you want. On Thu, Aug 10, 2017 at 06:07:09AM +0530, Arunkumar Rg wrote: > Hi Ben, > > Thanks for your response! > > I have reported this to b...@openvswitch.org. > > Below is what I have shared to b...@openvswitch.org. > > Issue summary: > -- > On setting, "ovsdb-server/set-sync-exclude-tables" for certain tables, > ovsdb-server replication does not work. > i.e the ovsdb-server which connects to the active ovsdb-server doesn't > become 'backup'. > > Expected: > - > ovsdb-server which connects to active ovsdb-server should become 'backup' > and recieve sync for all the tables except for the tables specified in the > ctl command "ovsdb-server/set-sync-exclude-tables". > > Reproduction steps: > --- > ["db:Hardware_vtep"---OVSDB-SERVER(Active)-(ptcp:3.1.1.2:7000 > )-]--[-OVSDB-SERVER(Backup)---"db:Hardware_vtep"] > > 1. Start two ovsdb-server instances with db schema as "Hardware_vtep". Lets > call them instance-1 and instance-2. > > ps -ef | grep ovsdb-server > root 2704 2093 0 19:56 ?00:00:03 ovsdb-server > /opt/ovsdb/db/vtep.db --remote=punix:/opt/ovsdb/db/db.sock > --unixctl=/opt/ovsdb/ovsdb-server.ctl > > 2. At instance-1: Add an entry in 'Manager' table. > vtep-ctl --db=unix:/opt/ovsdb/db/db.sock set-manager ssl: > 10.133.130.126:6640 > > ovsdb-client dump unix:/opt/ovsdb/db/db.sock<<---showing only the > tables which is relevant to this issue. > Global table > _uuidmanagers > other_config switches > > -- > 4121ba6c-a729-45b1-8f69-806b32584d04 > [73affd9e-07f0-422f-a394-47ae2a1ab499] {} > > Manager table > _uuidinactivity_probe is_connected > max_backoff other_config status target > > --- --- - > 73affd9e-07f0-422f-a394-47ae2a1ab499 [] false [] > {} "ssl:10.133.130.126:6640" > > > 3. Now do ovsdb replication related configs: > 3.1. 
At instance-1: > ovs-appctl -t /opt/ovsdb/ovsdb-server.ctl ovsdb-server/add-remote > ptcp:7000:3.1.1.2 > > 3.2 At instance-2: > ovs-appctl -t /opt/ovsdb/ovsdb-server.ctl > ovsdb-server/set-active-ovsdb-server tcp:3.1.1.2:7000 > ovs-appctl -t /opt/ovsdb/ovsdb-server.ctl > ovsdb-server/set-sync-exclude-tables hardware_vtep:Manager > ovs-appctl -t /opt/ovsdb/ovsdb-server.ctl > ovsdb-server/connect-active-ovsdb-server > > 4. Issue reproduced now!! > Instance-2 will not become 'Backup' and it'll remain in 'Active' state. > > This seem to be due to the following error, while processing the > notification for 'Global' table: > > "Aug 9 12:26:38 Pollux daemon.err ovsdb-server: > ovs|00070|ovsdb_error|ERR|unexpected ovsdb error: referential integrity > violation: Table Global column managers row > 4121ba6c-a729-45b1-8f69-806b32584d04 references nonexistent row > 5dbedec1-8221-435f-89b3-503867d0e987 in t" > > Note: In the hardware_vtep schema, 'Global' table has a reference to > 'Manager' table. > > Additional inputs: > -- > bash-3.2# ovsdb-server --version > ovsdb-server (Open vSwitch) 2.7.0 > > Linux version 4.1.14 > > Thanks, > Arun. > > > On Wed, Aug 9, 2017 at 9:27 PM, Ben Pfaffwrote: > > > On Wed, Aug 09, 2017 at 05:34:16PM +0530, Arunkumar Rg wrote: > > > Hi All, > > > > > > I need a clarification on ovsdb-server replication's > > > set-sync-exclude-tables. > > > > > > *Is there a way, by which we can say exclude syncing specific column of a > > > table??* > > > > > > The use case I'm looking at is: > > > > > >1. The replication is configured for DB "hardware_vtep". > > >2. I'm trying to use the "ovsdb-server/set-sync-exclude-tables" > > unixctl > > >command for table "hardware_vtep:Manager" on the backup ovsdb-server. > > >3. After this the backup ovsdb-server gets replica(notification) from > > >the active's db except for 'Manager' table. > > >4. At this poiint, while processing the notification for 'Global' > > table, > > >I see the below error: > > >5. 
"Aug 9 12:26:38 Pollux daemon.err ovsdb-server: > > >ovs|00070|ovsdb_error|ERR|unexpected ovsdb error: referential > > integrity > > >violation: Table Global column managers row > > >4121ba6c-a729-45b1-8f69-806b32584d04 references nonexistent row > > >5dbedec1-8221-435f-89b3-503867d0e987 in t"" > > >6. I think this error could be avoided if we can say exclude syncing > > >'Manager' column in 'Global' table also. > > > > It sounds like a bug. Probably, you
Re: [ovs-dev] [PATCH] redhat: add vfio udev rules
On Wed, Aug 9, 2017 at 4:36 PM, Aaron Conole wrote: > This commit builds on the non-root ovs work and adds a udev rule which will > automatically set the group permissions of vfio devices. > > Signed-off-by: Aaron Conole > --- > Systemd folks say that this is not something that should be a part of systemd, > but should be part of openvswitch. I can make a case for either, but it seems > it's probably an uphill battle to get it in through systemd. This should be > applied on branch-2.8, as it _could_ be considered a "fix" for > commit e3e738a3d058 ("redhat: allow dpdk to also run as non-root user"). > > rhel/automake.mk| 1 + > rhel/openvswitch-fedora.spec.in | 8 > rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 + > 3 files changed, 10 insertions(+) > create mode 100644 rhel/usr_lib_udev_rules.d_91-vfio.rules Thanks, I added a fixes header and applied this to master and branch-2.8. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v5 1/2] OF support and translation of generic encap and decap
I'm inclined to make the action name specific to the header, e.g. encap_nsh, decap_nsh. There doesn't have to be a one-to-one correspondence between syntax and OpenFlow encoding. On Thu, Aug 10, 2017 at 03:25:20PM +, Jan Scheurich wrote: > The generic code today in function parse_ENCAP() uses the encap header string > also as string for the property class. I am afraid that implementer of > subsequent encap headers might not realize that this is a temporary > implementation shortcut that should have been generalized. > > The minimum we should add now is a comment explaining this, such as, for > example: > > diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c > index bfc8a80..84522e0 100644 > --- a/lib/ofp-actions.c > +++ b/lib/ofp-actions.c > @@ -4134,9 +4134,19 @@ parse_ed_props(const uint16_t prop_class, char **arg, > int *n_props, struct ofpbu > return NULL; > } > > -/* The string representation of the encap action is > - * encap(header_type(prop=,tlv(,,),...)) > - */ > +/* The generic string representation of the encap action is > + * encap() > + * encap((=,(),...)) > + * > + * TODO: The current implementation only supports the simple case that all > + * encap parameters for a given encap header type are in a single property > + * class, identified by the same keyword as the encap header type. > + * To represent different property classes allowed by the OF action, the > + * syntax should be generalized as follows: > + * > + * encap((=,(),...), > + * (=,(),...),...) > +*/ > > static char * OVS_WARN_UNUSED_RESULT > parse_ENCAP(char *arg, > > BR, Jan > > > -Original Message- > > From: Ben Pfaff [mailto:b...@ovn.org] > > Sent: Tuesday, 08 August, 2017 21:36 > > To: Jan Scheurich> > Cc: Yi Yang ; d...@openvswitch.org; Zoltán Balogh > > > > Subject: Re: [PATCH v5 1/2] OF support and translation of generic encap and > > decap > > > > We can add the additional syntax at the same time we add something that > > has the need for it. 
> > > > On Tue, Aug 08, 2017 at 03:52:32PM +, Jan Scheurich wrote: > > > I know this comment is late as the patch has been merged already, but I > > > just returned from vacation and only found today: > > > > > > I agree that the new simplified ovs-ofctl syntax for encap properties > > > looks quite neat for the NSH use case at hand: > > > > > > encap(nsh(md_type=1)) > > > encap(nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210))) > > > > > > But unfortunately we have fallen into the trap of oversimplifying. The > > > generic encap action (drafted in EXT-382) distinguishes between > > the encapsulation packet type (in this case (1,0x894f) for NSH) and the > > encap property class (in this case 0x4 for NSH), which may be > > different for every property in an encap action. > > > > > > Using the "nsh" keyword both as shorthand for the packet type and the > > > property class selector works here because the NSH encap action > > only supports NSH properties so far, and in many simple applications of the > > generic encap action there may well be 1-1 relation between > > the encap packet type and the class of all supported properties. But the > > ONF draft specifically decouples the packet type from property > > classes for a richer set of use cases. > > > > > > For example there could be a more powerful generic encap operation to add > > > multiple protocol layers in one action. In this case the > > encap properties would likely belong to several property classes. Or an > > encapsulation header re-uses a general encap property already > > defined elsewhere, which should not be duplicated. > > > > > > In order not to restrict the generality of the ovs-ofctl syntax, I'd > > > suggest to separate packet type and the property class as follows: > > > > > > encap() > > > encap((,(=,()),...) 
> > > > > > For NSH the syntax would look like: > > > > > > encap(nsh,nsh(md_type=1)) > > > encap(nsh,nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210))) > > > > > > To be user friendly we can also allow the above more concise syntax in > > > the special case of a 1-1 relation between packet type and > > property class. In that case we could re-add the generalization of the > > syntax after the OVS 2.8 release without breaking backward > > compatibility. > > > > > > BR, Jan > > > > > > > -Original Message- > > > > From: Yi Yang [mailto:yi.y.y...@intel.com] > > > > Sent: Wednesday, 02 August, 2017 10:04 > > > > To: d...@openvswitch.org > > > > Cc: b...@ovn.org; Jan Scheurich ; Yi Yang > > > > ; Zoltán Balogh > > > > > > > > Subject: [PATCH v5 1/2] OF support and translation of generic encap and > > > > decap > > > > > > > > From: Jan Scheurich > > > > > > > > This commit adds support for the OpenFlow actions generic
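For readers skimming the thread, the two syntax variants under discussion can be put side by side (an illustrative sketch of the proposal as stated in this thread, not necessarily the syntax that was ultimately merged):

```
encap(nsh(md_type=1))        # merged shorthand: 'nsh' names both the packet type and the property class
encap(nsh,nsh(md_type=1))    # proposed general form: explicit packet type first, then per-class property lists
```

The second form would allow one encap action to carry properties from several property classes, which the shorthand cannot express.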
Re: [ovs-dev] OVN question: Transmission of link-local IPv6 multicast
On Thu, Aug 10, 2017 at 02:41:30PM +, Mark Michelson wrote: > I'm curious about the current behavior when ovn-controller originates a > packet and transmits it over a link-local IPv6 multicast address. Will the > packet be delivered only to nodes on the same chassis, or can the packet > also reach on-link nodes on other chassis? When an OVN logical switch receives an Ethernet broadcast or multicast packet from a given port, it forwards it to all of its other logical switch ports, regardless of chassis. There are special cases for ARP and IPv6 ND, to avoid forwarding them across chassis for known IP addresses. I don't think that we have any special cases for IPv6 link-local multicast. Limiting these within a chassis would probably require new mechanisms, because logical flows, by design, do not respect physical boundaries. > The reason I ask is that I'm working towards expanding OVN's IPv6 neighbor > discovery capabilities, and the first step on that path is being able to > send periodic router advertisements (RAs). In a traditional physical > network, each interface on a router may send periodic RAs to its link-local > all-nodes address (ff02::1). This way all on-link nodes receive the RA, and > the world is a happy place. > > OVN is not a traditional physical router, though :) > > With OVN, each hypervisor has an ovn-controller instance running, blurring > the lines about what exactly constitutes a "router interface". From an > ovn-nb point of view, a logical router port constitutes a router interface. > However, in practice, I'm not sure if this holds true once you get down to > the ovn-controller level. Let's say that a single link (via a logical > switch) is spread across three hypervisors. If the ovn-controller on each > of those hypervisors sends an RA to the link-local all-nodes address, would > that result in nodes on each hypervisor receiving three RAs? Or would each > node only receive one RA since it only receives the RA from the local > ovn-controller? 
I believe that this would result in 3x advertisements. > I suspect the answer to this all comes down to how openflow flows get > installed on each OVS instance. What I'm curious about is the current > behavior of OVN in this regard. I guess that it would be more efficient for each chassis to individually generate these router advertisements and ensure that they do not propagate beyond the physical chassis boundary? I can think of ways to do that, especially if ovn-controller were generating them. This is because when ovn-controller generates packets, it can tag them with additional metadata that indicate the purpose. For example, to allow it to send a chassis-limited packet, we could implement a new metadata bit that suppresses output through tunnels, and ovn-controller could set that bit. > P.S. Someone please tell me if ovs-discuss would have been a more suitable > list for this question. I figured since it was related to a feature I was > developing that this would be the proper list, though. Seems like a legit dev discussion to me. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL checksum flags on init.
-Original Message- From: Aaron Conole Date: Thursday, August 10, 2017 at 10:13 AM To: Darrell Ball Cc: ovs dev Subject: Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL checksum flags on init. Darrell Ball writes: > -Original Message- > From: Aaron Conole > Date: Wednesday, August 9, 2017 at 12:51 PM > To: Darrell Ball > Cc: Joe Stringer , Darrell Ball , ovs > dev > Subject: Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL checksum flags on init. > > Darrell Ball writes: > > > Thanks Joe > > I forgot to add your Tested-by to V5; I have been testing this myself; > > but let me know if you would like it added – I can send a V6. > > It will automatically be added by patchwork. It is sufficient to > download (ex: https://patchwork.ozlabs.org/patch/799499/mbox ) > > I usually use patchwork when working on a series - if I download it from > patchwork, I can be confident that all the tags are applied properly, so > I won't forget. Plus, all the discussion happens there, so I can > quickly browse it. > > Thanks Aaron > In this case, the Tested-by was applied to V4 and I sent a V5 which did not have the Tested-by. > The easiest solution would have been to ask Joe to respond to V5 – well, it is moot now anyways with V6 on the way. Yep, that happens.
> The full list is available at: > https://patchwork.ozlabs.org/project/openvswitch/list/ > > Thanks, I am on this page most of the day ( > > It would actually be cool to have a few more admins to troll patchwork > and do things like ping for status (ex: > https://patchwork.ozlabs.org/patch/719492/ > > I don’t have any context on this patch. > Aaron, would you like to resurrect this patch with rebase ?; it is RHEL related. It doesn't need a rebase; it applies cleanly. The version of patchwork running doesn't recognize my ack, so someone will have to add it manually (or omit it... doesn't matter too much, since Ben ack'd as well). But it would be cool to get it applied; maybe even to get back as far as branch-2.7 (I think it was posted in time for 2.7 branch). One thing is I don’t know if Daniele had a chance to test this, although it looks simple enough. Did you get a chance to test it ? if so, maybe a tested-by ? > is this still needed? it > does seem to mostly apply). Then we could make sure we don't miss > things. > > > Darrell > > > > -Original Message- > > From: on behalf of Joe Stringer > > Date: Tuesday, August 8, 2017 at 4:51 PM > > To: Darrell Ball > > Cc: ovs dev > > Subject: Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL checksum > > flags on init. > > > > On 8 August 2017 at 16:39, Darrell Ball wrote: > > > Reset the DPDK HWOL checksum flags in dp_packet_init_. > > > The new HWOL bad checksum flag is uninitialized on non-dpdk ports and > > > this is noticed as test failures using netdev-dummy ports where the bad > > > checksum flag is checked.
> > > > > > Fixes: 7451af618e0d ("dp-packet : Update DPDK rx checksum > > > validation functions.") > > > CC: Sugesh Chandran > > > Signed-off-by: Darrell Ball > > > --- > > > > Tested-by: Joe Stringer > > Tested-at: https://travis-ci.org/joestringer/openvswitch/jobs/262464859 > > > > I'll let those more familiar with this code provide the review. > >
Re: [ovs-dev] OVS-DPDK public meeting
9th August 2017 ATTENDEES: Aaron C, Antonio F, Ciara L, Darrell B, Olga A, Sugesh C, Mark K, Michael L, Johan T, Peter S, Ian S, Kevin T, Yipeng W, Billy O'M, Finn C, Georg S, Jan S, Simon H, Ben P, Ori, (may have missed some) === GENERAL === - OVS 2.8 -- OVS 2.8 branched -- Looking at end of August for release -- Bug fixes at this point - DPDK 17.05.2 -- Discussion on whether this would be out in time for OVS 2.8 -- Probably too late for validation with OVS 2.8 - DPDK 17.08 -- Released this week - Staging branch -- Darrell gave us an update about this -- No serious problems reported so far -- Plan to submit merge requests on regular basis (~1 week) -- Request for usual "applied" email reply to merged patches - Feature proposals for OVS 2.9 -- Ian will send list of features Intel are planning to work on - Conferences fyi -- DPDK Userspace - September 26-27 Dublin -- DPDK Summit - November 14-15 San Jose -- OVS Conference - TBC, may align with DPDK summit -- Openstack Summit - November 6-8 Sydney PERFORMANCE/FEATURES - Tx batching -- Won't make OVS 2.8 -- Bhanu mentioned "Additional changes" on ML. Request to ensure all required patches are submitted. - Cuckoo distributor -- Antonio requested some testing for this. HARDWARE OFFLOAD - Mellanox will submit patches based on collaboration with Napatech - Suggestion/Request to try and make higher layers follow same model as tc offload - Request for Joe to review On 07/25/2017 02:25 PM, Kevin Traynor wrote: > Hi All, > > The OVS-DPDK public meetings have moved to Wednesday's at the same time. > Everyone is welcome, it's a chance to share status/plans etc. > > It's scheduled for every 2 weeks starting 26th July. Meeting time is > currently @ 4pm UTC. 
> > You can connect through Bluejeans or through dial in: > > https://bluejeans.com/139318596 > > US: +1.408.740.7256 > UK: +44.203.608.5256 > Germany: +49.32.221.091256 > Ireland: +353.1.697.1256 > Other numbers at https://www.bluejeans.com/numbers > Meeting ID: 139318596 > > thanks, > Kevin. > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCHv2 1/2] netdev: Refactor destruction of netdev_ports.
On 08/08/2017 11:23 AM, Joe Stringer wrote: An upcoming patch will reuse this from elsewhere. Signed-off-by: Joe Stringer --- lib/netdev.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/lib/netdev.c b/lib/netdev.c index 7e9896b82928..3e8b211857d7 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -2255,6 +2255,15 @@ netdev_ports_get(odp_port_t port_no, const struct dpif_class *dpif_class) return ret; } +static void +netdev_port_data_destroy(struct port_to_netdev_data *data) +{ +dpif_port_destroy(&data->dpif_port); +netdev_close(data->netdev); /* unref and possibly close */ +hmap_remove(&port_to_netdev, &data->node); +free(data); +} + int netdev_ports_remove(odp_port_t port_no, const struct dpif_class *dpif_class) { @@ -2266,10 +2275,7 @@ netdev_ports_remove(odp_port_t port_no, const struct dpif_class *dpif_class) data = netdev_ports_lookup(port_no, dpif_class); if (data) { -dpif_port_destroy(&data->dpif_port); -netdev_close(data->netdev); /* unref and possibly close */ -hmap_remove(&port_to_netdev, &data->node); -free(data); +netdev_port_data_destroy(data); ret = 0; } LGTM Reviewed-by: Greg Rose ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 2/2] dpif-netdev: Per-port conditional EMC insert.
Conditional EMC insert helps a lot in scenarios with high numbers of parallel flows, but in the current implementation this option affects all the threads and ports at once. There are scenarios where we have different numbers of flows on different ports. For example, if one of the VMs encapsulates traffic using additional headers, it will receive a large number of flows but only a few flows will come out of this VM. In this scenario it's much faster to use EMC instead of classifier for traffic from the VM, but it's better to disable EMC for the traffic which flows to the VM. To handle the above issue 'emc-insert-inv-prob' was converted to a per-port option. Default value and behaviour kept as is. For example, the following command sets the insertion probability for packets that came from port 'dpdk0' to ~1/20, i.e. ~5%: ovs-vsctl set interface dpdk0 other_config:emc-insert-inv-prob=20 Signed-off-by: Ilya Maximets --- Documentation/howto/dpdk.rst | 4 +- NEWS | 2 +- lib/dpif-netdev.c| 106 +-- tests/pmd.at | 7 ++- vswitchd/vswitch.xml | 42 ++--- 5 files changed, 106 insertions(+), 55 deletions(-) diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst index d7f6610..c620961 100644 --- a/Documentation/howto/dpdk.rst +++ b/Documentation/howto/dpdk.rst @@ -389,9 +389,9 @@ EMC Insertion Probability - By default 1 in every 100 flows are inserted into the Exact Match Cache (EMC). It is possible to change this insertion probability by setting the -``emc-insert-inv-prob`` option:: +``emc-insert-inv-prob`` option for the desired interface:: -$ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-inv-prob=N +$ ovs-vsctl set interface <iface> other_config:emc-insert-inv-prob=N where: diff --git a/NEWS b/NEWS index 66eb936..a7bfdaf 100644 --- a/NEWS +++ b/NEWS @@ -1,6 +1,6 @@ Post-v2.8.0 - - Nothing yet. + - EMC insertion probability turned to per-port other_config.
v2.8.0 - xx xxx - diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 6d42393..94e7bc4 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -266,9 +266,6 @@ struct dp_netdev { struct ovs_mutex meter_locks[N_METER_LOCKS]; struct dp_meter *meters[MAX_METERS]; /* Meter bands. */ -/* Probability of EMC insertions is a factor of 'emc_insert_min'.*/ -OVS_ALIGNED_VAR(CACHE_LINE_SIZE) atomic_uint32_t emc_insert_min; - /* Protects access to ofproto-dpif-upcall interface during revalidator * thread synchronization. */ struct fat_rwlock upcall_rwlock; @@ -364,6 +361,8 @@ struct dp_netdev_port { unsigned n_rxq; /* Number of elements in 'rxqs' */ unsigned *txq_used; /* Number of threads that use each tx queue. */ struct ovs_mutex txq_used_mutex; +uint32_t emc_insert_min;/* Probability of EMC insertions is a factor + * of 'emc_insert_min'. */ char *type; /* Port type as requested by user. */ char *rxq_affinity_list;/* Requested affinity of rx queues. */ }; @@ -487,6 +486,7 @@ struct dp_netdev_pmd_cycles { struct polled_queue { struct netdev_rxq *rx; odp_port_t port_no; +uint32_t emc_insert_min; }; /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */ @@ -508,6 +508,8 @@ struct tx_port { struct dp_netdev_pmd_thread_ctx { /* Latest measured time. */ long long now; +/* EMC insertion probability context for the current processing cycle. */ +uint32_t emc_insert_min; /* Used to count cycles. 
See 'cycles_count_end()' */ unsigned long long last_cycles; }; @@ -1202,8 +1204,6 @@ create_dp_netdev(const char *name, const struct dpif_class *class, conntrack_init(&dp->conntrack); -atomic_init(&dp->emc_insert_min, DEFAULT_EM_FLOW_INSERT_MIN); - cmap_init(&dp->poll_threads); ovs_mutex_init(&dp->tx_qid_pool_mutex); @@ -1485,6 +1485,7 @@ port_create(const char *devname, const char *type, port->netdev = netdev; port->type = xstrdup(type); port->sf = sf; +port->emc_insert_min = DEFAULT_EM_FLOW_INSERT_MIN; port->need_reconfigure = true; ovs_mutex_init(&port->txq_used_mutex); @@ -2104,8 +2105,7 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd, * default the value is UINT32_MAX / 100 which yields an insertion * probability of 1/100 ie. 1% */ -uint32_t min; -atomic_read_relaxed(&pmd->dp->emc_insert_min, &min); +uint32_t min = pmd->ctx.emc_insert_min; if (min && random_uint32() <= min) { emc_insert(&pmd->flow_cache, key, flow); @@ -2914,10 +2914,6 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config) { struct dp_netdev *dp = get_dp_netdev(dpif); const char *cmask = smap_get(other_config, "pmd-cpu-mask"); -unsigned long long
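As a side note for readers, the probability arithmetic behind 'emc-insert-inv-prob' can be sketched standalone as below. This is not the OVS source and the function names are illustrative, but the mapping (threshold = UINT32_MAX / N, with N = 0 disabling insertion and N = 1 inserting every flow) follows the emc_probabilistic_insert() logic quoted in the diff above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch (not the OVS code): map the per-port
 * other_config:emc-insert-inv-prob value N to the 32-bit threshold
 * compared against a random value on each candidate insertion.
 * N == 0 disables EMC insertion; N == 1 inserts every flow. */
static uint32_t
emc_insert_min_from_inv_prob(uint32_t inv_prob)
{
    return inv_prob ? UINT32_MAX / inv_prob : 0;
}

/* Mirrors the patch's check: insert only when the random 32-bit
 * value 'rnd' falls at or below the threshold. */
static bool
emc_should_insert(uint32_t min, uint32_t rnd)
{
    return min && rnd <= min;
}
```

So inv_prob=20 yields a threshold of UINT32_MAX/20, i.e. roughly a 5% insertion probability, matching the 'dpdk0' example in the commit message.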
[ovs-dev] [PATCH 1/2] dpif-netdev: Keep latest measured time for PMD thread.
In the current implementation the 'now' variable is updated once on each receive cycle and passed through the whole datapath via function arguments. It'll be better to keep this variable inside the PMD thread structure to be able to get it at any time. Such a solution will save stack memory and simplify possible modifications in current logic. This patch introduces new structure 'dp_netdev_pmd_thread_ctx' contained by 'struct dp_netdev_pmd_thread' to store any processing context of this PMD thread. For now, only time and cycles moved to that structure. Can be extended in the future. Signed-off-by: Ilya Maximets --- Will be used in the next patch to not pass the probability through the whole datapath. lib/dpif-netdev.c | 117 ++ 1 file changed, 65 insertions(+), 52 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index e2cd931..6d42393 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -504,6 +504,16 @@ struct tx_port { struct hmap_node node; }; +/* Contained by struct dp_netdev_pmd_thread's 'ctx' member. */ +struct dp_netdev_pmd_thread_ctx { +/* Latest measured time. */ +long long now; +/* Used to count cycles. See 'cycles_count_end()' */ +unsigned long long last_cycles; +}; + +static void pmd_thread_ctx_time_update(struct dp_netdev_pmd_thread *); + /* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate * the performance overhead of interrupt processing. Therefore netdev can * not implement rx-wait for these devices. dpif-netdev needs to poll @@ -556,8 +566,8 @@ struct dp_netdev_pmd_thread { /* Cycles counters */ struct dp_netdev_pmd_cycles cycles; -/* Used to count cicles. See 'cycles_counter_end()' */ -unsigned long long last_cycles; +/* Current context of the PMD thread. */ +struct dp_netdev_pmd_thread_ctx ctx; struct latch exit_latch;/* For terminating the pmd thread.
*/ struct seq *reload_seq; @@ -630,8 +640,7 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd, struct dp_packet_batch *, bool may_steal, const struct flow *flow, const struct nlattr *actions, - size_t actions_len, - long long now); + size_t actions_len); static void dp_netdev_input(struct dp_netdev_pmd_thread *, struct dp_packet_batch *, odp_port_t port_no); static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *, @@ -679,9 +688,9 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd); static void dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd, - long long now, bool purge); + bool purge); static int dpif_netdev_xps_get_tx_qid(const struct dp_netdev_pmd_thread *pmd, - struct tx_port *tx, long long now); + struct tx_port *tx); static inline bool emc_entry_alive(struct emc_entry *ce); static void emc_clear_entry(struct emc_entry *ce); @@ -723,6 +732,12 @@ emc_cache_slow_sweep(struct emc_cache *flow_cache) flow_cache->sweep_idx = (flow_cache->sweep_idx + 1) & EM_FLOW_HASH_MASK; } +static inline void +pmd_thread_ctx_time_update(struct dp_netdev_pmd_thread *pmd) +{ +pmd->ctx.now = time_msec(); +} + /* Returns true if 'dpif' is a netdev or dummy dpif, false otherwise. */ bool dpif_is_netdev(const struct dpif *dpif) @@ -2839,6 +2854,9 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute) ovs_mutex_lock(&dp->non_pmd_mutex); } +/* Update current time in PMD context. */ +pmd_thread_ctx_time_update(pmd); + /* The action processing expects the RSS hash to be valid, because * it's always initialized at the beginning of datapath processing.
* In this case, though, 'execute->packet' may not have gone through @@ -2851,8 +2869,7 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute) dp_packet_batch_init_packet(&pp, execute->packet); dp_netdev_execute_actions(pmd, &pp, false, execute->flow, - execute->actions, execute->actions_len, - time_msec()); + execute->actions, execute->actions_len); if (pmd->core_id == NON_PMD_CORE_ID) { ovs_mutex_unlock(&dp->non_pmd_mutex); @@ -3073,7 +3090,7 @@ cycles_count_start(struct dp_netdev_pmd_thread *pmd) OVS_ACQUIRES(&cycles_counter_fake_mutex) OVS_NO_THREAD_SAFETY_ANALYSIS { -pmd->ctx.last_cycles = cycles_counter(); +pmd->ctx.last_cycles = cycles_counter(); } /* Stop counting cycles and add them to the counter 'type' */ @@ -3083,7 +3100,7 @@ cycles_count_end(struct
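The essence of this refactor, reduced to a standalone sketch (struct and function names are shortened stand-ins, and in the real patch the time source is time_msec(); it is parameterized here only so the sketch is self-contained):

```c
/* Per-thread processing context: cache the current time once per
 * polling iteration so downstream calls read pmd->ctx.now instead of
 * taking a 'long long now' argument through every function. */
struct pmd_thread_ctx {
    long long now;                   /* Latest measured time, ms. */
    unsigned long long last_cycles;  /* Cycle-counter snapshot. */
};

struct pmd_thread {
    struct pmd_thread_ctx ctx;
};

/* Called once at the top of each iteration (the patch's
 * pmd_thread_ctx_time_update() does this with time_msec()). */
static void
pmd_thread_ctx_time_update(struct pmd_thread *pmd, long long now_ms)
{
    pmd->ctx.now = now_ms;
}
```

The payoff is visible in the diff: several function signatures drop their 'long long now' parameter, and the follow-up patch can stash more per-iteration state (the EMC insertion threshold) in the same context struct.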
[ovs-dev] [PATCH 0/2] Per-port EMC insertion probability.
Ilya Maximets (2): dpif-netdev: Keep latest measured time for PMD thread. dpif-netdev: Per-port conditional EMC insert. Documentation/howto/dpdk.rst | 4 +- NEWS | 2 +- lib/dpif-netdev.c| 223 ++- tests/pmd.at | 7 +- vswitchd/vswitch.xml | 42 5 files changed, 171 insertions(+), 107 deletions(-) -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [patch_v5] dp-packet: Reset DPDK HWOL checksum flags on init.
-Original Message- From: Darrell Ball Date: Wednesday, August 9, 2017 at 1:38 PM To: "Chandran, Sugesh" , Ben Pfaff Cc: "d...@openvswitch.org" Subject: Re: [ovs-dev] [patch_v5] dp-packet: Reset DPDK HWOL checksum flags on init. -Original Message- From: "Chandran, Sugesh" Date: Wednesday, August 9, 2017 at 12:55 PM To: Darrell Ball , Ben Pfaff Cc: "d...@openvswitch.org" Subject: RE: [ovs-dev] [patch_v5] dp-packet: Reset DPDK HWOL checksum flags on init. Regards _Sugesh > > > > > > Correct, I reused reset_dp_packet_checksum_ol_flags() to do the > > initialization as well > > > I could also have created a separate function. > > > > > > In case a DPDK dev is used, those flags will be managed by DPDK. > > > > > > That sounds like a > > > bug in itself--is there a missing call to initialize the mbuf > somewhere? > > > > > > Are you suggesting to initialize the whole mbuf for each packet ? > > > > The issue that I'm raising is that it's unusual to take an > > uninitialized, indeterminate field, and then initialize it by clearing a > > few bits. It's much more conventional to initialize it by setting it to > > zero, like this: > > > > p->mbuf.ol_flags = 0; > > > > > > That is better; I will create a separate function then. > > I will resend > > Thanks > [Sugesh] I also agree with Ben here. > Currently OVS uses only checksum offload flags from mbuf (As I am aware > of). > But there are other flag bits that may get used in future like TSO. > So its better to initialize the mbuf properly before using. > > Here is the mbuf reset function in DPDK that gets called when an existing > memory is mapped to > Mbuf. > I believe only the ol_flags are relevant for now in OVS. > > There is no higher cost associated with initializing all the ol_flags vs some > flags, so that is fine. > It will be done twice in the case of a packet received from a dpdk device, but > it is a small cost.
> I was more concerned about the dual responsibility conflict when packets are > received from a DPDK device and this is why I wanted to limit the scope of > the flag management in OVS; it should be ok, though. > Hence, I mentioned that I will initialize all the ol_flags. > > JTBC, are you suggesting to initialize all the fields below ? > This would mean when packets are received from a DPDK dev, both the rte > library and OVS would separately initialize all the fields below – meaning, it > would be done twice for each packet. > [Sugesh] No, we don’t have to initialize all the fields. But we can have a generic function to init all mbuf fields that are relevant in OVS. For now its only ol_flags. In the future this function must be updated when we use more fields from rte_mbuf. We are on the same page then. I plan for the function to have a generic name, so this is fine. Sorry, I really didn’t get the case you mentioned above, where the init gets called twice. What I can see in the code is, during the dpdk mp init, the dp_packet_init is called for rte_mbuf initialization. This happens at port initiation. On packet reception, there won't be any call to dp_packet_init. I meant for the memory related to packets that are received, not during the reception time itself. However, the memory will be reused for subsequent packets, right. You can’t use a piece of memory once and never use it again – this is called a memory leak. This means each mbuf will need to be initialized again and again. I did some background investigation around this Sugesh. I originally expected that an init function ptr for the OVS-only portion of dp-packet be passed and saved with the dpdk library to be used to opaquely init when rte calls priv memory, which is ovs common memory, in this case. The func ptr is used but only for init of the dpdk memory pools (which confirms what you mentioned) and not later when mbuf memory is reused for subsequent packets. Some OVS common memory (i.e.
non-mbuf portion) of dp-packet must still be initialized for every packet; the transient fields and seem to be
[ovs-dev] [PATCH RFC v3 4/4] dpif-netdev: Time based output batching.
This allows collecting packets from more than one RX burst and sending them together, with a configurable maximum latency. 'other_config:output-max-latency' can be used to configure the time that a packet can wait in an output batch before sending. Signed-off-by: Ilya Maximets --- Notes: * This is an RFC and should not be used for performance testing. * Millisecond granularity is used for now. Can be easily switched to use microseconds instead. lib/dpif-netdev.c| 121 ++- vswitchd/vswitch.xml | 15 +++ 2 files changed, 115 insertions(+), 21 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index dcf55f3..0d78ae4 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -85,6 +85,9 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev); #define MAX_RECIRC_DEPTH 5 DEFINE_STATIC_PER_THREAD_DATA(uint32_t, recirc_depth, 0) +/* Use instant packet send by default. */ +#define DEFAULT_OUTPUT_MAX_LATENCY 0 + /* Configuration parameters. */ enum { MAX_FLOWS = 65536 }; /* Maximum number of flows in flow table. */ enum { MAX_METERS = 65536 };/* Maximum number of meters. */ @@ -262,6 +265,9 @@ struct dp_netdev { struct hmap ports; struct seq *port_seq; /* Incremented whenever a port changes. */ +/* The time that a packet can wait in output batch for sending. */ +atomic_uint32_t output_max_latency; + /* Meters. */ struct ovs_mutex meter_locks[N_METER_LOCKS]; struct dp_meter *meters[MAX_METERS]; /* Meter bands. */ @@ -502,6 +508,7 @@ struct tx_port { int qid; long long last_used; struct hmap_node node; +long long output_time; struct dp_packet_batch output_pkts; }; @@ -574,6 +581,9 @@ struct dp_netdev_pmd_thread { * than 'cmap_count(dp->poll_threads)'. */ uint32_t static_tx_qid; +/* Number of filled output batches. */ +int n_output_batches; + struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */ /* List of rx queues to poll.
*/ struct hmap poll_list OVS_GUARDED; @@ -669,9 +679,9 @@ static void dp_netdev_add_rxq_to_pmd(struct dp_netdev_pmd_thread *pmd, static void dp_netdev_del_rxq_from_pmd(struct dp_netdev_pmd_thread *pmd, struct rxq_poll *poll) OVS_REQUIRES(pmd->port_mutex); -static void +static int dp_netdev_pmd_flush_output_packets(struct dp_netdev_pmd_thread *pmd, - long long now); + long long now, bool force); static void reconfigure_datapath(struct dp_netdev *dp) OVS_REQUIRES(dp->port_mutex); static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd); @@ -1193,6 +1203,7 @@ create_dp_netdev(const char *name, const struct dpif_class *class, conntrack_init(&dp->conntrack); atomic_init(&dp->emc_insert_min, DEFAULT_EM_FLOW_INSERT_MIN); +atomic_init(&dp->output_max_latency, DEFAULT_OUTPUT_MAX_LATENCY); cmap_init(&dp->poll_threads); @@ -2858,7 +2869,7 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute) dp_packet_batch_init_packet(&pp, execute->packet); dp_netdev_execute_actions(pmd, &pp, false, execute->flow, execute->actions, execute->actions_len, now); -dp_netdev_pmd_flush_output_packets(pmd, now); +dp_netdev_pmd_flush_output_packets(pmd, now, true); if (pmd->core_id == NON_PMD_CORE_ID) { ovs_mutex_unlock(&pmd->non_pmd_mutex); @@ -2907,6 +2918,16 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config) smap_get_ullong(other_config, "emc-insert-inv-prob", DEFAULT_EM_FLOW_INSERT_INV_PROB); uint32_t insert_min, cur_min; +uint32_t output_max_latency, cur_max_latency; + +output_max_latency = smap_get_int(other_config, "output-max-latency", + DEFAULT_OUTPUT_MAX_LATENCY); +atomic_read_relaxed(&dp->output_max_latency, &cur_max_latency); +if (output_max_latency != cur_max_latency) { +atomic_store_relaxed(&dp->output_max_latency, output_max_latency); +VLOG_INFO("Output maximum latency set to %"PRIu32" ms", + output_max_latency); +} if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) { free(dp->pmd_cmask); @@ -3107,11 +3128,12 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread
*pmd, non_atomic_ullong_add(&pmd->cycles.n[type], interval); } -static void +static int dp_netdev_pmd_flush_output_on_port(struct dp_netdev_pmd_thread *pmd, struct tx_port *p, long long now) { int tx_qid; +int output_cnt; bool dynamic_txqs; dynamic_txqs = p->port->dynamic_txqs; @@ -3121,21 +3143,39 @@ dp_netdev_pmd_flush_output_on_port(struct dp_netdev_pmd_thread *pmd, tx_qid = pmd->static_tx_qid; } +output_cnt = dp_packet_batch_size(&p->output_pkts);
[ovs-dev] [PATCH v3 3/4] netdev-dpdk: Remove useless cutlen.
Cutlen is already applied while processing OVS_ACTION_ATTR_OUTPUT. Signed-off-by: Ilya Maximets --- lib/netdev-dpdk.c | 5 - 1 file changed, 5 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 8e3158f..ddcc574 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -1819,8 +1819,6 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) int newcnt = 0; int i; -dp_packet_batch_apply_cutlen(batch); - for (i = 0; i < batch->count; i++) { int size = dp_packet_size(batch->packets[i]); @@ -1879,7 +1877,6 @@ netdev_dpdk_vhost_send(struct netdev *netdev, int qid, dpdk_do_tx_copy(netdev, qid, batch); dp_packet_delete_batch(batch, true); } else { -dp_packet_batch_apply_cutlen(batch); __netdev_dpdk_vhost_send(netdev, qid, batch->packets, batch->count); } return 0; @@ -1910,8 +1907,6 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid, int cnt = batch->count; struct rte_mbuf **pkts = (struct rte_mbuf **) batch->packets; -dp_packet_batch_apply_cutlen(batch); - cnt = netdev_dpdk_filter_packet_len(dev, pkts, cnt); cnt = netdev_dpdk_qos_run(dev, pkts, cnt); dropped = batch->count - cnt; -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 1/4] dpif-netdev: Output packet batching.
While processing an incoming batch of packets, they are scattered across many per-flow batches and sent separately. This becomes an issue when using more than a few flows. For example, with balanced-tcp OvS bonding over 2 ports there will be 256 datapath internal flows for each dp_hash pattern. A single received batch is then scattered across all of those 256 per-flow batches, invoking a send for each packet separately. This behaviour greatly degrades the overall performance of netdev_send because it cannot take advantage of vectorized transmit functions. However, half of the datapath flows (with 2 ports in bonding) will have the same output actions. This means we can collect those packets back in a single place and send them at once using a single call to netdev_send. This patch introduces a per-port packet batch for output packets for that purpose. The 'output_pkts' batch is thread-local and located in the send port cache. Signed-off-by: Ilya Maximets --- lib/dpif-netdev.c | 104 ++ 1 file changed, 82 insertions(+), 22 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index e2cd931..a2a25be 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -502,6 +502,7 @@ struct tx_port { int qid; long long last_used; struct hmap_node node; +struct dp_packet_batch output_pkts; }; /* PMD: Poll modes drivers.
PMD accesses devices via polling to eliminate @@ -633,9 +634,10 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd, size_t actions_len, long long now); static void dp_netdev_input(struct dp_netdev_pmd_thread *, -struct dp_packet_batch *, odp_port_t port_no); +struct dp_packet_batch *, odp_port_t port_no, +long long now); static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *, - struct dp_packet_batch *); + struct dp_packet_batch *, long long now); static void dp_netdev_disable_upcall(struct dp_netdev *); static void dp_netdev_pmd_reload_done(struct dp_netdev_pmd_thread *pmd); @@ -667,6 +669,9 @@ static void dp_netdev_add_rxq_to_pmd(struct dp_netdev_pmd_thread *pmd, static void dp_netdev_del_rxq_from_pmd(struct dp_netdev_pmd_thread *pmd, struct rxq_poll *poll) OVS_REQUIRES(pmd->port_mutex); +static void +dp_netdev_pmd_flush_output_packets(struct dp_netdev_pmd_thread *pmd, + long long now); static void reconfigure_datapath(struct dp_netdev *dp) OVS_REQUIRES(dp->port_mutex); static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd); @@ -2809,6 +2814,7 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute) struct dp_netdev *dp = get_dp_netdev(dpif); struct dp_netdev_pmd_thread *pmd; struct dp_packet_batch pp; +long long now = time_msec(); if (dp_packet_size(execute->packet) < ETH_HEADER_LEN || dp_packet_size(execute->packet) > UINT16_MAX) { @@ -2851,8 +2857,8 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute) dp_packet_batch_init_packet(&pp, execute->packet); dp_netdev_execute_actions(pmd, &pp, false, execute->flow, - execute->actions, execute->actions_len, - time_msec()); + execute->actions, execute->actions_len, now); +dp_netdev_pmd_flush_output_packets(pmd, now); if (pmd->core_id == NON_PMD_CORE_ID) { ovs_mutex_unlock(&pmd->non_pmd_mutex); @@ -3101,6 +3107,37 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd, non_atomic_ullong_add(&pmd->cycles.n[type], interval); } +static void
+dp_netdev_pmd_flush_output_on_port(struct dp_netdev_pmd_thread *pmd, + struct tx_port *p, long long now) +{ +int tx_qid; +bool dynamic_txqs; + +dynamic_txqs = p->port->dynamic_txqs; +if (dynamic_txqs) { +tx_qid = dpif_netdev_xps_get_tx_qid(pmd, p, now); +} else { +tx_qid = pmd->static_tx_qid; +} + +netdev_send(p->port->netdev, tx_qid, &p->output_pkts, true, dynamic_txqs); +dp_packet_batch_init(&p->output_pkts); +} + +static void +dp_netdev_pmd_flush_output_packets(struct dp_netdev_pmd_thread *pmd, + long long now) +{ +struct tx_port *p; + +HMAP_FOR_EACH (p, node, &pmd->send_port_cache) { +if (!dp_packet_batch_is_empty(&p->output_pkts)) { +dp_netdev_pmd_flush_output_on_port(pmd, p, now); +} +} +} + static int dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, struct netdev_rxq *rx, @@ -3113,10 +3150,13 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, dp_packet_batch_init(&batch);
[ovs-dev] [PATCH v3 0/4] Output packet batching.
This patch set was inspired by [1] from Bhanuprakash Bodireddy. The implementation of [1] looks very complex and introduces many pitfalls [2] for later code modifications, like possible stuck packets. This version is targeted at simple and flexible output packet batching at a higher level, without touching the netdev layer (and even simplifying it). The patch set consists of 3 patches. All the functionality is introduced in the first patch. The two others are just cleanups of netdevs to avoid unnecessary work. The 4th patch is just an RFC with a possible time-based implementation and should not be considered for performance testing. Basic testing of the 'PVP with OVS bonding on phy ports' scenario shows a significant performance improvement. More accurate and intensive testing is required. [1] [PATCH v4 0/5] netdev-dpdk: Use intermediate queue during packet transmission. https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337019.html [2] For example: https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337133.html Version 3: * Rebased on current master. * Time based RFC: fixed assert on n_output_batches <= 0. Version 2: * Rebased on current master. * Added time based batching RFC patch. * Fixed mixing packets with different sources in same batch. Ilya Maximets (4): dpif-netdev: Output packet batching. netdev: Remove unused may_steal. netdev-dpdk: Remove useless cutlen. dpif-netdev: Time based output batching. lib/dpif-netdev.c | 197 ++ lib/netdev-bsd.c | 4 +- lib/netdev-dpdk.c | 30 +++- lib/netdev-dummy.c| 4 +- lib/netdev-linux.c| 4 +- lib/netdev-provider.h | 7 +- lib/netdev.c | 12 +-- lib/netdev.h | 2 +- vswitchd/vswitch.xml | 15 9 files changed, 208 insertions(+), 67 deletions(-) -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v5 1/2] OF support and translation of generic encap and decap
The generic code today in function parse_ENCAP() uses the encap header string also as string for the property class. I am afraid that implementer of subsequent encap headers might not realize that this is a temporary implementation shortcut that should have been generalized. The minimum we should add now is a comment explaining this, such as, for example: diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index bfc8a80..84522e0 100644 --- a/lib/ofp-actions.c +++ b/lib/ofp-actions.c @@ -4134,9 +4134,19 @@ parse_ed_props(const uint16_t prop_class, char **arg, int *n_props, struct ofpbu return NULL; } -/* The string representation of the encap action is - * encap(header_type(prop=,tlv(,,),...)) - */ +/* The generic string representation of the encap action is + * encap() + * encap((=,(),...)) + * + * TODO: The current implementation only supports the simple case that all + * encap parameters for a given encap header type are in a single property + * class, identified by the same keyword as the encap header type. + * To represent different property classes allowed by the OF action, the + * syntax should be generalized as follows: + * + * encap((=,(),...), + * (=,(),...),...) +*/ static char * OVS_WARN_UNUSED_RESULT parse_ENCAP(char *arg, BR, Jan > -Original Message- > From: Ben Pfaff [mailto:b...@ovn.org] > Sent: Tuesday, 08 August, 2017 21:36 > To: Jan Scheurich> Cc: Yi Yang ; d...@openvswitch.org; Zoltán Balogh > > Subject: Re: [PATCH v5 1/2] OF support and translation of generic encap and > decap > > We can add the additional syntax at the same time we add something that > has the need for it. 
> > On Tue, Aug 08, 2017 at 03:52:32PM +, Jan Scheurich wrote: > > I know this comment is late as the patch has been merged already, but I > > just returned from vacation and only found today: > > > > I agree that the new simplified ovs-ofctl syntax for encap properties looks > > quite neat for the NSH use case at hand: > > > > encap(nsh(md_type=1)) > > encap(nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210))) > > > > But unfortunately we have fallen into the trap of oversimplifying. The > > generic encap action (drafted in EXT-382) distinguishes between > the encapsulation packet type (in this case (1,0x894f) for NSH) and the encap > property class (in this case 0x4 for NSH), which may be > different for every property in an encap action. > > > > Using the "nsh" keyword both as shorthand for the packet type and the > > property class selector works here because the NSH encap action > only supports NSH properties so far, and in many simple applications of the > generic encap action there may well be 1-1 relation between > the encap packet type and the class of all supported properties. But the ONF > draft specifically decouples the packet type from property > classes for a richer set of use cases. > > > > For example there could be a more powerful generic encap operation to add > > multiple protocol layers in one action. In this case the > encap properties would likely belong to several property classes. Or an > encapsulation header re-uses a general encap property already > defined elsewhere, which should not be duplicated. > > > > In order not to restrict the generality of the ovs-ofctl syntax, I'd > > suggest to separate packet type and the property class as follows: > > > > encap() > > encap((,(=,()),...) 
> > > > For NSH the syntax would look like: > > > > encap(nsh,nsh(md_type=1)) > > encap(nsh,nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210))) > > > > To be user friendly we can also allow the above more concise syntax in the > > special case of a 1-1 relation between packet type and > property class. In that case we could re-add the generalization of the syntax > after the OVS 2.8 release without breaking backward > compatibility. > > > > BR, Jan > > > > > -Original Message- > > > From: Yi Yang [mailto:yi.y.y...@intel.com] > > > Sent: Wednesday, 02 August, 2017 10:04 > > > To: d...@openvswitch.org > > > Cc: b...@ovn.org; Jan Scheurich ; Yi Yang > > > ; Zoltán Balogh > > > > > > Subject: [PATCH v5 1/2] OF support and translation of generic encap and > > > decap > > > > > > From: Jan Scheurich > > > > > > This commit adds support for the OpenFlow actions generic encap > > > and decap (as specified in ONF EXT-382) to the OVS control plane. > > > > > > CLI syntax for encap action with properties: > > > encap() > > > encap((=,(,,),...)) > > > > > > For example: > > > encap(ethernet) > > > encap(nsh(md_type=1)) > > > > > > encap(nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210))) > > > > > > CLI syntax for decap action: > > > decap() > > > decap(packet_type(ns=,type=)) > > > > > > For
[ovs-dev] OVN question: Transmission of link-local IPv6 multicast
Hi, I'm curious about the current behavior when ovn-controller originates a packet and transmits it over a link-local IPv6 multicast address. Will the packet be delivered only to nodes on the same chassis, or can the packet also reach on-link nodes on other chassis? The reason I ask is that I'm working towards expanding OVN's IPv6 neighbor discovery capabilities, and the first step on that path is being able to send periodic router advertisements (RAs). In a traditional physical network, each interface on a router may send periodic RAs to its link-local all-nodes address (ff02::1). This way all on-link nodes receive the RA, and the world is a happy place. OVN is not a traditional physical router, though :) With OVN, each hypervisor has an ovn-controller instance running, blurring the lines about what exactly constitutes a "router interface". From an ovn-nb point of view, a logical router port constitutes a router interface. However, in practice, I'm not sure if this holds true once you get down to the ovn-controller level. Let's say that a single link (via a logical switch) is spread across three hypervisors. If the ovn-controller on each of those hypervisors sends an RA to the link-local all-nodes address, would that result in nodes on each hypervisor receiving three RAs? Or would each node only receive one RA since it only receives the RA from the local ovn-controller? I suspect the answer to this all comes down to how openflow flows get installed on each OVS instance. What I'm curious about is the current behavior of OVN in this regard. Thanks, Mark Michelson P.S. Someone please tell me if ovs-discuss would have been a more suitable list for this question. I figured since it was related to a feature I was developing that this would be the proper list, though. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] netdev-dpdk: include dpdk PCI header directly
> > On 08/09/2017 10:00 PM, Aaron Conole wrote: > > As part of a devargs rework in DPDK, the PCI header file was removed, and > > needs to be directly included. This isn't required to build with 17.05 or > > earlier, but will be required should a future update happen. > > > > Signed-off-by: Aaron Conole> > Tried to build with dpdk 17.08 > > Tested-By: Timothy Redaelli +1 for forwards compatibility Acked-by: Ciara Loftus > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] netdev-dpdk: include dpdk PCI header directly
On 08/09/2017 10:00 PM, Aaron Conole wrote: > As part of a devargs rework in DPDK, the PCI header file was removed, and > needs to be directly included. This isn't required to build with 17.05 or > earlier, but will be required should a future update happen. > > Signed-off-by: Aaron ConoleTried to build with dpdk 17.08 Tested-By: Timothy Redaelli ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support
OVS master and 2.8 branch has merged NSH userspace patch series, this patch is to enable NSH support in kernel data path in order that OVS can support NSH in 2.8 release in compat mode by porting this. Signed-off-by: Yi Yang--- drivers/net/vxlan.c | 7 ++ include/net/nsh.h| 126 ++ include/uapi/linux/openvswitch.h | 33 net/openvswitch/actions.c| 165 +++ net/openvswitch/flow.c | 41 ++ net/openvswitch/flow.h | 1 + net/openvswitch/flow_netlink.c | 54 - 7 files changed, 426 insertions(+), 1 deletion(-) create mode 100644 include/net/nsh.h diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index dbca067..843714c 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -27,6 +27,7 @@ #include #include #include +#include #if IS_ENABLED(CONFIG_IPV6) #include @@ -1267,6 +1268,9 @@ static bool vxlan_parse_gpe_hdr(struct vxlanhdr *unparsed, case VXLAN_GPE_NP_IPV6: *protocol = htons(ETH_P_IPV6); break; + case VXLAN_GPE_NP_NSH: + *protocol = htons(ETH_P_NSH); + break; case VXLAN_GPE_NP_ETHERNET: *protocol = htons(ETH_P_TEB); break; @@ -1806,6 +1810,9 @@ static int vxlan_build_gpe_hdr(struct vxlanhdr *vxh, u32 vxflags, case htons(ETH_P_IPV6): gpe->next_protocol = VXLAN_GPE_NP_IPV6; return 0; + case htons(ETH_P_NSH): + gpe->next_protocol = VXLAN_GPE_NP_NSH; + return 0; case htons(ETH_P_TEB): gpe->next_protocol = VXLAN_GPE_NP_ETHERNET; return 0; diff --git a/include/net/nsh.h b/include/net/nsh.h new file mode 100644 index 000..96477a1 --- /dev/null +++ b/include/net/nsh.h @@ -0,0 +1,126 @@ +#ifndef __NET_NSH_H +#define __NET_NSH_H 1 + + +/* + * Network Service Header: + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * |Ver|O|C|R|R|R|R|R|R|Length | MD Type | Next Proto | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * |Service Path ID| Service Index | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * | | + * ~ Mandatory/Optional Context Header ~ + * | | + * 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * Ver = The version field is used to ensure backward compatibility + * going forward with future NSH updates. It MUST be set to 0x0 + * by the sender, in this first revision of NSH. + * + * O = OAM. when set to 0x1 indicates that this packet is an operations + * and management (OAM) packet. The receiving SFF and SFs nodes + * MUST examine the payload and take appropriate action. + * + * C = context. Indicates that a critical metadata TLV is present. + * + * Length : total length, in 4-byte words, of NSH including the Base + * Header, the Service Path Header and the optional variable + * TLVs. + * MD Type: indicates the format of NSH beyond the mandatory Base Header + * and the Service Path Header. + * + * Next Protocol: indicates the protocol type of the original packet. A + * new IANA registry will be created for protocol type. + * + * Service Path Identifier (SPI): identifies a service path. + * Participating nodes MUST use this identifier for Service + * Function Path selection. + * + * Service Index (SI): provides location within the SFP. + * + * [0] https://tools.ietf.org/html/draft-ietf-sfc-nsh-13 + */ + +/** + * struct nsh_md1_ctx - Keeps track of NSH context data + * @nshc<1-4>: NSH Contexts. + */ +struct nsh_md1_ctx { + __be32 c[4]; +}; + +struct nsh_md2_tlv { + __be16 md_class; + u8 type; + u8 length; + u8 md_value[]; +}; + +struct nsh_hdr { + __be16 ver_flags_len; + u8 md_type; + u8 next_proto; + __be32 path_hdr; + union { + struct nsh_md1_ctx md1; + struct nsh_md2_tlv md2[0]; + }; +}; + +/* Masking NSH header fields. */ +#define NSH_VER_MASK 0xc000 +#define NSH_VER_SHIFT 14 +#define NSH_FLAGS_MASK 0x3fc0 +#define NSH_FLAGS_SHIFT6 +#define NSH_LEN_MASK 0x003f +#define NSH_LEN_SHIFT 0 + +#define NSH_SPI_MASK 0xff00 +#define NSH_SPI_SHIFT 8 +#define NSH_SI_MASK0x00ff +#define NSH_SI_SHIFT 0 + +#define NSH_DST_PORT4790 /* UDP Port for NSH on VXLAN. 
*/ +#define ETH_P_NSH 0x894F /* Ethertype for NSH. */ + +/* NSH Base Header Next Protocol. */ +#define NSH_P_IPV40x01 +#define NSH_P_IPV60x02 +#define NSH_P_ETHERNET0x03 +#define NSH_P_NSH