date:20170825

[ovs-dev] [PATCH] dpif: Remove duplicated word in comment for dpif_recv()

2017-08-25 Thread fukaige

From: Kaige Fu 

Signed-off-by: Kaige Fu 
---
 lib/dpif.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/dpif.c b/lib/dpif.c
index 79b2e6c..a4471d5 100644
--- a/lib/dpif.c
+++ b/lib/dpif.c
@@ -1547,11 +1547,11 @@ dpif_set_config(struct dpif *dpif, const struct smap 
*cfg)
 return error;
 }
 
-/* Polls for an upcall from 'dpif' for an upcall handler.  Since there
- * there can be multiple poll loops, 'handler_id' is needed as index to
- * identify the corresponding poll loop.  If successful, stores the upcall
- * into '*upcall', using 'buf' for storage.  Should only be called if
- * 'recv_set' has been used to enable receiving packets from 'dpif'.
+/* Polls for an upcall from 'dpif' for an upcall handler.  Since there can
+ * be multiple poll loops, 'handler_id' is needed as index to identify the
+ * corresponding poll loop.  If successful, stores the upcall into '*upcall',
+ * using 'buf' for storage.  Should only be called if 'recv_set' has been used
+ * to enable receiving packets from 'dpif'.
  *
  * 'upcall->key' and 'upcall->userdata' point into data in the caller-provided
  * 'buf', so their memory cannot be freed separately from 'buf'.
-- 
1.8.3.1


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4] dpif-netdev: Avoid reading RSS hash when EMC is disabled

2017-08-25 Thread Darrell Ball



On 8/25/17, 3:21 PM, "Fischetti, Antonio"  wrote:



> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Friday, August 25, 2017 6:19 PM
> To: Fischetti, Antonio ; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v4] dpif-netdev: Avoid reading RSS hash 
when EMC
> is disabled
> 
> Hi Antonio
> 
> On 8/25/17, 6:56 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com"  antonio.fische...@intel.com> wrote:
> 
> When EMC is disabled the reading of RSS hash is skipped.
> Also, for packets that are not recirculated it retrieves
> the hash value without considering the recirc id.
> 
> Signed-off-by: Antonio Fischetti 
> ---
> 
>  V4
>   - reworked to remove dependencies from other patches in
> patchset "Skip EMC for recirc pkts and other optimizations."
> https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2017-
> 2DAugust_337320.html=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-
> 
uZnsw=7UGZunJeutRugFmidRAykhZFRYscZ2rUUmhF7Ys9IM8=hzVGT1bI3zOj_x9J5cu18Q2D5
> WLHdEC1qhQ-Gu--TEE=
> 
>   - measurements were repeated with the latest head of master.
> 
>  Port-to-Port Test
>  =
> Software built with "-O2 -march=native -g".
> 
> I measured the Rx rate regardless of pkt loss by sending 1 UDP
> flow,
> 64B packets, at the line-rate.
> 
> 2 PMDs with 3 Tx queues.
> 
> Flows setup:
>   in_port=dpdk0 actions=output:dpdk1
>   in_port=dpdk1 actions=output:dpdk0
> 
> Results
> ---
> Values are for the Rx rate in Mpps, regardless of packet loss.
> 
> RSS column:
>Yes: RSS hash is provided by the NIC
>No: RSS is disabled and the 5-tuple hash must be
>computed in software.
> 
>Note: to simulate RSS disabled I added the line
>dp_packet_rss_valid(struct dp_packet *p)
>{
>+return false;
>#ifdef DPDK_NETDEV
> 
> EMC column:
>Yes: default probability insertion,
>No: EMC disabled.
> 
> Orig means Commit ID:
>   75f9e007e7f7eb91461e238f882d1c539c56bb8d
> 
> [Darrell]
> It looks like the main benefit (+6% throughput) occurs
> when RSS is not calculated by the NIC and emc is disabled. What are the 
common
> cases when RSS is not generated by the NIC ?

[Antonio]
A common case could be when packets are coming in from a VM.  

Well, there is no pnic in this case.
I am thinking gateway usage is predominate in importance, but the RSS cases are 
about equal
based on the sampling below, so we could almost ‘remove them from the 
equation’, as it were.

For non-RSS:
Using emc, there is about 0.7 % degradation, but it is more common to use emc
So overall, it is not clear; what do you think ?

Also when the NIC does not provide a hash value, but this shouldn't be 
considered a common case I guess.

I agree.


> 
> 
> 
> +--++
> +-+-+Orig  | Orig + this|
> | RSS | EMC |  |patch   |
> +-+-+--++
> | Yes | Yes | 12.02, 11.38 | 12.18, 11.32   |
> | Yes |  No |  8.35,  8.36 |  8.36,  8.38   |
> +-+-+--++
> |  No | Yes |  9.91, 11.15 |  9.84, 11.08   |
> |  No |  No |  7.83,  7.87 |  8.33,  8.35   |
> +-+-+--++
> 
> ---
>  lib/dpif-netdev.c | 32 
>  1 file changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e2cd931..1157998 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4567,6 +4567,22 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread 
*pmd,
> struct dp_packet *packet_,
>  }
> 
>  static inline uint32_t
> +dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet,
> +const struct miniflow *mf)
> +{
> +uint32_t hash;
> +
> +if

Re: [ovs-dev] [PATCH net-next v6 2/3] net: gso: Add GSO support for NSH

2017-08-25 Thread Jiri Benc

On Fri, 25 Aug 2017 18:25:14 +0200, Jiri Benc wrote:
> While looking at this, I realized that GSO for VXLAN-GPE is broken,
> too. Let me fix it by implementing what I described above which will
> make your patch much easier.

Okay, it's not really broken and we don't need that complexity. At
least not immediately. Hw offloading in the VXLAN-GPE case probably does
not work correctly and would benefit from that change but that's a
different beast to tackle at a different time. Software segmentation
works fine for VXLAN-GPE.

There should not be much problems with NSH segmentation, either, if we
carefully store and set mac_header, mac_len and skb->protocol around
calls to skb_mac_gso_segment. Note that with zero mac_len (and correct
skb->protocol), skb_mac_gso_segment behaves in the same way that you
tried to achieve with find_gso_segment_by_type, which is thus completely
unnecessary.

More on Monday.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [RFC PATCH] daemon-windows: Set default error mode for services

2017-08-25 Thread Anand Kumar

Thanks for the patch. 

Acked-by: Anand Kumar 

Regards,
Anand Kumar

On 8/25/17, 10:48 AM, "ovs-dev-boun...@openvswitch.org on behalf of Alin 
Gabriel Serdean"  wrote:

Microsoft allows default process memory dumps via WER (Windows Error
Reporting).
WER can be set to collect dumps using in general using:

https://urldefense.proofpoint.com/v2/url?u=https-3A__msdn.microsoft.com_en-2Dus_library_windows_desktop_bb787181-28v-3Dvs.85-29.aspx=DwICAg=uilaK90D4TOVoH58JNXRgQ=Q5z9tBe-nAOpE7LIHSPV8uy5-437agMXvkeHHMkR8Us=T39QXMGqF3LDTBL68mucqjQoP0yQWcnSJvQ_W7YUs58=0OQvKfDEugpenlU5dZ80JElhDzBjd6Xys9JGa1K5lmw=
 
If a normal application crashes, the user will receive a pop-up dialog,
in which he will be asked of his consent on what will be the steps
after the dump was created (debug, close or send the dump to MSFT).
We disable the dump creation via WER in:

https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_openvswitch_ovs_blob_master_lib_util.c-23L492=DwICAg=uilaK90D4TOVoH58JNXRgQ=Q5z9tBe-nAOpE7LIHSPV8uy5-437agMXvkeHHMkR8Us=T39QXMGqF3LDTBL68mucqjQoP0yQWcnSJvQ_W7YUs58=0-3zOm3jVe_9clvGkQRY-o0Yag_Jwt_wtH_HBLq2Qdk=
 
because the idea is we don't want a pop-up if a OVS process (eg. ovs-vsctl)
has crashed.
(more information on the subject:

https://urldefense.proofpoint.com/v2/url?u=https-3A__blogs.msdn.microsoft.com_oldnewthing_20040727-2D00_-3Fp-3D38323=DwICAg=uilaK90D4TOVoH58JNXRgQ=Q5z9tBe-nAOpE7LIHSPV8uy5-437agMXvkeHHMkR8Us=T39QXMGqF3LDTBL68mucqjQoP0yQWcnSJvQ_W7YUs58=hwo9aq2eCDjXH_ce8gNn3KLpDv6IjFL92fEAzM_rgac=
 )

Until we implement our own dump collection mechanism, we can set the default
error mode for services because there are no pop-ups allowed in that case.

Signed-off-by: Alin Gabriel Serdean 
Requested-by: Anand Kumar 
---
 lib/daemon-windows.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/lib/daemon-windows.c b/lib/daemon-windows.c
index 1ba714d..2ba62d2 100644
--- a/lib/daemon-windows.c
+++ b/lib/daemon-windows.c
@@ -138,6 +138,12 @@ service_start(int *argcp, char **argvp[])
 *argcp = sargc;
 *argvp = *sargvp;
 
+/* Enable default error mode so we can take advantage of WER
+ * (Windows Error Reporting) crash dumps.
+ * Being a service it does not allow for WER window pop-up.
+ * XXX implement our on crash dump collection mechanism. */
+SetErrorMode(0);
+
 return;
 }
 
-- 
2.10.2.windows.1

___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=Q5z9tBe-nAOpE7LIHSPV8uy5-437agMXvkeHHMkR8Us=T39QXMGqF3LDTBL68mucqjQoP0yQWcnSJvQ_W7YUs58=w2X6PPKmfvsGZHK3CFa5CxHXwUSNAUnjEkL0qpJN_b8=
 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH 9/9] ofproto/trace: Support ct_mark and ct_label in --ct-next

2017-08-25 Thread Yi-Hung Wei

Previously, --ct-next option in ofproto/trace only supports specifying
ct_state. This patch adds support of ct_mark and ct_label.

Signed-off-by: Yi-Hung Wei 
---
 lib/ct-dpif.c|  7 
 lib/ct-dpif.h|  3 ++
 lib/netlink-conntrack.c  |  5 +++
 lib/odp-util.c   |  3 +-
 lib/odp-util.h   |  1 +
 ofproto/ofproto-dpif-trace.c | 98 
 ofproto/ofproto-dpif-trace.h |  7 ++--
 ofproto/ofproto-unixctl.man  | 16 +---
 tests/ofproto-dpif.at| 12 +++---
 tests/system-traffic.at  |  6 +--
 10 files changed, 103 insertions(+), 55 deletions(-)

diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
index 91388dd6681b..07c549707303 100644
--- a/lib/ct-dpif.c
+++ b/lib/ct-dpif.c
@@ -144,6 +144,13 @@ ct_dpif_format_info(const struct ct_dpif_info *info, 
struct ds *output)
 format_flags(, ct_state_to_string, info->ct_state, '|');
 ds_put_format(output, "ct_state=%s", ds_cstr());
 ds_destroy();
+
+ds_put_format(output, ",ct_mark=%#"PRIx32, info->ct_mark);
+if (info->have_label) {
+ovs_be128 value = hton128(info->ct_label);
+ds_put_cstr(output, ",ct_label=");
+ds_put_hex(output, , sizeof value);
+}
 }
 
 void
diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
index 9122e8cd752b..9d50db920e6e 100644
--- a/lib/ct-dpif.h
+++ b/lib/ct-dpif.h
@@ -175,6 +175,9 @@ struct ct_dpif_entry {
 
 struct ct_dpif_info {
 uint32_t ct_state;
+uint32_t ct_mark;
+bool have_label;
+ovs_u128 ct_label;
 };
 
 struct ct_dpif_exp_entry {
diff --git a/lib/netlink-conntrack.c b/lib/netlink-conntrack.c
index de1e467dc61a..57da43f77135 100644
--- a/lib/netlink-conntrack.c
+++ b/lib/netlink-conntrack.c
@@ -523,6 +523,11 @@ nl_ct_get_info(struct ct_dpif_tuple *tuple, uint16_t zone,
 
 nl_ct_get_ct_state(tuple, , >ct_state);
 info->ct_state |= CS_TRACKED;
+info->ct_mark = entry.mark;
+if (entry.have_labels == true) {
+info->have_label = true;
+memcpy(>ct_label, , sizeof(info->ct_label));
+}
 
 ofpbuf_delete(reply);
 return 0;
diff --git a/lib/odp-util.c b/lib/odp-util.c
index 4f1499ed4a47..c9669ca4572a 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -88,7 +88,6 @@ static struct nlattr *generate_all_wildcard_mask(const struct 
attr_len_tbl tbl[]
  const struct nlattr *key);
 static void format_u128(struct ds *d, const ovs_32aligned_u128 *key,
 const ovs_32aligned_u128 *mask, bool verbose);
-static int scan_u128(const char *s, ovs_u128 *value, ovs_u128 *mask);
 
 static int parse_odp_action(const char *s, const struct simap *port_names,
 struct ofpbuf *actions);
@@ -3527,7 +3526,7 @@ format_u128(struct ds *ds, const ovs_32aligned_u128 *key,
  * If either the value or mask is larger than 64 bits, the string must
  * be in hexadecimal.
  */
-static int
+int
 scan_u128(const char *s_, ovs_u128 *value, ovs_u128 *mask)
 {
 char *s = CONST_CAST(char *, s_);
diff --git a/lib/odp-util.h b/lib/odp-util.h
index 27c2ab4f0111..c2633797284b 100644
--- a/lib/odp-util.h
+++ b/lib/odp-util.h
@@ -344,4 +344,5 @@ void odp_put_push_eth_action(struct ofpbuf *odp_actions,
  const struct eth_addr *eth_src,
  const struct eth_addr *eth_dst);
 
+int scan_u128(const char *s, ovs_u128 *value, ovs_u128 *mask);
 #endif /* odp-util.h */
diff --git a/ofproto/ofproto-dpif-trace.c b/ofproto/ofproto-dpif-trace.c
index d00d41a89cab..a6844b205f56 100644
--- a/ofproto/ofproto-dpif-trace.c
+++ b/ofproto/ofproto-dpif-trace.c
@@ -29,7 +29,7 @@
 static void ofproto_trace(struct ofproto_dpif *, const struct flow *,
   const struct dp_packet *packet,
   const struct ofpact[], size_t ofpacts_len,
-  struct ovs_list *next_ct_states,
+  struct ovs_list *next_cts,
   struct ds *);
 static void oftrace_node_destroy(struct oftrace_node *);
 
@@ -124,20 +124,19 @@ oftrace_recirc_node_destroy(struct oftrace_recirc_node 
*node)
 }
 
 static void
-oftrace_push_ct_state(struct ovs_list *next_ct_states, uint32_t ct_state)
+oftrace_push_ct_info(struct ovs_list *next_cts, struct ct_dpif_info info)
 {
-struct oftrace_next_ct_state *next_ct_state =
-xmalloc(sizeof *next_ct_state);
-next_ct_state->state = ct_state;
-ovs_list_push_back(next_ct_states, _ct_state->node);
+struct oftrace_next_ct *next_ct = xmalloc(sizeof *next_ct);
+next_ct->info = info;
+ovs_list_push_back(next_cts, _ct->node);
 }
 
 static void
-oftrace_pop_ct_state(struct ovs_list *next_ct_states, uint32_t *ct_state)
+oftrace_pop_ct_info(struct ovs_list *next_cts, struct ct_dpif_info *info)
 {
-struct oftrace_next_ct_state *s;
-LIST_FOR_EACH_POP (s, node, next_ct_states) {
-*ct_state = s->state;
+

[ovs-dev] [PATCH 8/9] ofproto/trace: Change delimiter of ct_state in --ct-next

2017-08-25 Thread Yi-Hung Wei

This patch changes the delimiter of ct_state in --ct-next from comma
or space to vertical bar. The new format will be easier to parse when
the --ct-next options support more ct_fields such as ct_mark and ct_label.

Signed-off-by: Yi-Hung Wei 
---
 ofproto/ofproto-dpif-trace.c | 2 +-
 ofproto/ofproto-unixctl.man  | 6 +++---
 tests/ofproto-dpif.at| 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/ofproto/ofproto-dpif-trace.c b/ofproto/ofproto-dpif-trace.c
index 9303ea18c237..d00d41a89cab 100644
--- a/ofproto/ofproto-dpif-trace.c
+++ b/ofproto/ofproto-dpif-trace.c
@@ -199,7 +199,7 @@ parse_oftrace_options(int argc, const char *argv[],
 }
 
 uint32_t ct_state;
-if (!parse_ct_state(argv[++k], 0, ", ", _state, )) {
+if (!parse_ct_state(argv[++k], 0, "|", _state, )) {
 return ds_steal_cstr();
 }
 if (!validate_ct_state(ct_state, )) {
diff --git a/ofproto/ofproto-unixctl.man b/ofproto/ofproto-unixctl.man
index f511c392b548..abc9481f0faf 100644
--- a/ofproto/ofproto-unixctl.man
+++ b/ofproto/ofproto-unixctl.man
@@ -55,8 +55,8 @@ wildcards.)  \fIbridge\fR names of the bridge through which
 When the traced flow triggers conntrack actions, \fBofproto/trace\fR
 will automatically trace the forked packet processing pipeline with
 user specified ct_state.  This option sets the ct_state flags that the
-conntrack module will report. The \fIflags\fR must be a comma- or
-space-separated list of the following connection tracking flags:
+conntrack module will report. The \fIflags\fR must be a vertical
+bar separated list of the following connection tracking flags:
 .
 .RS
 .IP \(bu
@@ -89,7 +89,7 @@ changed.
 .
 .IP
 When --ct-next is unspecified, or when there are fewer --ct-next options than
-ct actions, the \fIflags\fR default to trk,new.
+ct actions, the \fIflags\fR default to trk|new.
 .
 .RE
 .
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index c76ea4eee1cc..7353cbd6289a 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -9862,7 +9862,7 @@ AT_CHECK([tail -1 stdout], [0],
   [Datapath actions: ct(commit,zone=2),4
 ])
 
-AT_CHECK([ovs-appctl ofproto/trace br0 'in_port=1,tcp' --ct-next 'trk,est' 
--ct-next 'trk,est' ], [0], [stdout])
+AT_CHECK([ovs-appctl ofproto/trace br0 'in_port=1,tcp' --ct-next 'trk|est' 
--ct-next 'trk|est' ], [0], [stdout])
 AT_CHECK([tail -1 stdout], [0],
   [Datapath actions: 3
 ])
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH 6/9] netlink-conntrack: Add support for querying conntrack exp

2017-08-25 Thread Yi-Hung Wei

The ct_state of an uncommmited new flow is marked as related if the flow
is in the conntrack expectation table. In order for ofproto/trace to
identify the ct_state of a new related flow, this patch utilizes
NFNL_SUBSYS_CTNETLINK_EXP netlink subsystem to query the conntrack
expectation table, therefore, we can mark the ct_state of a related flow
correctly.

Signed-off-by: Yi-Hung Wei 
---
 lib/ct-dpif.h   |  13 
 lib/netlink-conntrack.c | 203 
 lib/netlink-conntrack.h |   2 +
 tests/atlocal.in|   3 +
 tests/system-traffic.at | 137 
 5 files changed, 358 insertions(+)

diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
index f4ca07b5e776..9122e8cd752b 100644
--- a/lib/ct-dpif.h
+++ b/lib/ct-dpif.h
@@ -177,6 +177,19 @@ struct ct_dpif_info {
 uint32_t ct_state;
 };
 
+struct ct_dpif_exp_entry {
+struct ct_dpif_tuple tuple;
+struct ct_dpif_tuple tuple_mask;
+struct ct_dpif_tuple tuple_master;
+struct ct_dpif_tuple tuple_nat;
+struct ct_dpif_helper helper;
+uint32_t timeout;
+uint32_t id;
+uint32_t nat_dir;
+uint32_t exp_flags;
+uint32_t exp_class;
+};
+
 enum {
 CT_STATS_UDP,
 CT_STATS_TCP,
diff --git a/lib/netlink-conntrack.c b/lib/netlink-conntrack.c
index 93cd0ac2c34f..de1e467dc61a 100644
--- a/lib/netlink-conntrack.c
+++ b/lib/netlink-conntrack.c
@@ -70,6 +70,15 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 
5);
 #define CTA_TIMESTAMP_START 1
 #define CTA_TIMESTAMP_STOP  2
 
+#define CTA_EXPECT_ZONE   (CTA_EXPECT_HELP_NAME + 1)
+#define CTA_EXPECT_FLAGS  (CTA_EXPECT_HELP_NAME + 2)
+#define CTA_EXPECT_CLASS  (CTA_EXPECT_HELP_NAME + 3)
+#define CTA_EXPECT_NAT(CTA_EXPECT_HELP_NAME + 4)
+#define CTA_EXPECT_FN (CTA_EXPECT_HELP_NAME + 5)
+
+#define CTA_EXPECT_NAT_DIR  1
+#define CTA_EXPECT_NAT_TUPLE2
+
 #define IPS_TEMPLATE_BIT 11
 #define IPS_TEMPLATE (1 << IPS_TEMPLATE_BIT)
 
@@ -99,6 +108,20 @@ static const struct nl_policy nfnlgrp_conntrack_policy[] = {
  * CTA_LABELS_MASK are not received from kernel. */
 };
 
+static const struct nl_policy nfnlgrp_conntrack_exp_policy[] = {
+[CTA_EXPECT_MASTER] = { .type = NL_A_NESTED, .optional = false },
+[CTA_EXPECT_TUPLE] = { .type = NL_A_NESTED, .optional = false },
+[CTA_EXPECT_MASK] = { .type = NL_A_NESTED, .optional = false },
+[CTA_EXPECT_TIMEOUT] = { .type = NL_A_U32, .optional = false },
+[CTA_EXPECT_ID] = { .type = NL_A_U32, .optional = false },
+[CTA_EXPECT_HELP_NAME] = { .type = NL_A_STRING, .optional = true },
+[CTA_EXPECT_ZONE] = { .type = NL_A_U16, .optional = true },
+[CTA_EXPECT_FLAGS] = { .type = NL_A_U32, .optional = false },
+[CTA_EXPECT_CLASS] = { .type = NL_A_U32, .optional = false },
+[CTA_EXPECT_NAT] = { .type = NL_A_NESTED, .optional = true },
+[CTA_EXPECT_FN] = { .type = NL_A_STRING, .optional = true },
+};
+
 /* Declarations for conntrack netlink dumping. */
 static void nl_msg_put_nfgenmsg(struct ofpbuf *msg, size_t expected_payload,
 int family, uint8_t subsystem, uint8_t cmd,
@@ -108,13 +131,22 @@ static bool nl_ct_parse_header_policy(struct ofpbuf *buf,
 enum nl_ct_event_type *event_type,
 uint8_t *nfgen_family,
 struct nlattr *attrs[ARRAY_SIZE(nfnlgrp_conntrack_policy)]);
+static bool nl_ct_parse_ct_exp_policy(struct ofpbuf *buf,
+uint8_t *nfgen_family,
+struct nlattr *attrs[ARRAY_SIZE(nfnlgrp_conntrack_exp_policy)]);
 
 static bool nl_ct_attrs_to_ct_dpif_entry(struct ct_dpif_entry *entry,
 struct nlattr *attrs[ARRAY_SIZE(nfnlgrp_conntrack_policy)],
 uint8_t nfgen_family);
+static bool nl_ct_exp_attrs_to_ct_dpif_exp_entry(
+struct ct_dpif_exp_entry *entry,
+struct nlattr *attrs[ARRAY_SIZE(nfnlgrp_conntrack_exp_policy)],
+uint8_t nfgen_family);
 
 static bool nl_ct_put_ct_tuple(struct ofpbuf *, struct ct_dpif_tuple *,
enum ctattr_type);
+static bool nl_ct_put_ct_exp_tuple(struct ofpbuf *, struct ct_dpif_tuple *,
+   enum ctattr_expect);
 
 struct nl_ct_dump_state {
 struct nl_dump dump;
@@ -385,6 +417,51 @@ nl_ct_get_ct_state(struct ct_dpif_tuple *tuple, struct 
ct_dpif_entry *entry,
 }
 }
 
+/* Searches nf_conntrack's expectation table for 'tuple' in 'zone'.
+ * Sets 'exp' and returns 0 if we successfully find a match. Otherwise,
+ * returns non-zero errno. */
+int
+nl_ct_get_expect(struct ct_dpif_tuple *tuple, uint16_t zone,
+struct ct_dpif_exp_entry *exp)
+{
+struct nlattr *attrs[ARRAY_SIZE(nfnlgrp_conntrack_exp_policy)];
+struct ofpbuf request, *reply;
+uint8_t nfgen_family;
+int err;
+
+ofpbuf_init(, NL_DUMP_BUFSIZE);
+
+nl_msg_put_nfgenmsg(, 0, tuple->l3_type, NFNL_SUBSYS_CTNETLINK_EXP,
+   IPCTNL_MSG_EXP_GET, NLM_F_REQUEST);
+nl_msg_put_be16(,

[ovs-dev] [PATCH 7/9] flow: Refactor parse_ct_state()

2017-08-25 Thread Yi-Hung Wei

Refactor parse_ct_state() to support different delimiters.

Signed-off-by: Yi-Hung Wei 
---
 lib/flow.c   | 6 +++---
 lib/flow.h   | 2 +-
 ofproto/ofproto-dpif-trace.c | 2 +-
 ovn/utilities/ovn-trace.c| 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/flow.c b/lib/flow.c
index b2b10aa488be..34bc176e8b6e 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -1129,14 +1129,14 @@ ct_state_from_string(const char *s)
  * returns false, and reports error message in 'ds'. */
 bool
 parse_ct_state(const char *state_str, uint32_t default_state,
-   uint32_t *ct_state, struct ds *ds)
+   const char *delimiters, uint32_t *ct_state, struct ds *ds)
 {
 uint32_t state = default_state;
 char *state_s = xstrdup(state_str);
 char *save_ptr = NULL;
 
-for (char *cs = strtok_r(state_s, ", ", _ptr); cs;
- cs = strtok_r(NULL, ", ", _ptr)) {
+for (char *cs = strtok_r(state_s, delimiters, _ptr); cs;
+ cs = strtok_r(NULL, delimiters, _ptr)) {
 uint32_t bit = ct_state_from_string(cs);
 if (!bit) {
 ds_put_format(ds, "%s: unknown connection tracking state flag",
diff --git a/lib/flow.h b/lib/flow.h
index 6ae5a674df46..d1aba31290f7 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -76,7 +76,7 @@ void flow_get_metadata(const struct flow *, struct match 
*flow_metadata);
 const char *ct_state_to_string(uint32_t state);
 uint32_t ct_state_from_string(const char *);
 bool parse_ct_state(const char *state_str, uint32_t default_state,
-uint32_t *ct_state, struct ds *);
+const char *delimiters, uint32_t *ct_state, struct ds *);
 bool validate_ct_state(uint32_t state, struct ds *);
 void flow_clear_conntrack(struct flow *);
 
diff --git a/ofproto/ofproto-dpif-trace.c b/ofproto/ofproto-dpif-trace.c
index a86cf211803e..9303ea18c237 100644
--- a/ofproto/ofproto-dpif-trace.c
+++ b/ofproto/ofproto-dpif-trace.c
@@ -199,7 +199,7 @@ parse_oftrace_options(int argc, const char *argv[],
 }
 
 uint32_t ct_state;
-if (!parse_ct_state(argv[++k], 0, _state, )) {
+if (!parse_ct_state(argv[++k], 0, ", ", _state, )) {
 return ds_steal_cstr();
 }
 if (!validate_ct_state(ct_state, )) {
diff --git a/ovn/utilities/ovn-trace.c b/ovn/utilities/ovn-trace.c
index 59083eebe270..9f4cbf58d3e6 100644
--- a/ovn/utilities/ovn-trace.c
+++ b/ovn/utilities/ovn-trace.c
@@ -173,7 +173,7 @@ parse_ct_option(const char *state_s_)
 uint32_t state;
 struct ds ds = DS_EMPTY_INITIALIZER;
 
-if (!parse_ct_state(state_s_, CS_TRACKED, , )) {
+if (!parse_ct_state(state_s_, CS_TRACKED, ", ", , )) {
 ovs_fatal(0, "%s", ds_cstr());
 }
 if (!validate_ct_state(state, )) {
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH 4/9] ofproto/trace: Query ct_state for conntrack recirc from DP

2017-08-25 Thread Yi-Hung Wei

Instead of using fixed default conntrack state 'trk|new' in
ofproto/trace for conntrack recirculation, this patch queries the
conntrack state from datapath using ct_dpif_get_info().

Signed-off-by: Yi-Hung Wei 
---
 lib/ct-dpif.c| 42 ++
 lib/ct-dpif.h|  2 ++
 ofproto/ofproto-dpif-trace.c | 48 ++--
 3 files changed, 81 insertions(+), 11 deletions(-)

diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
index 3e6f4cbe9be2..91388dd6681b 100644
--- a/lib/ct-dpif.c
+++ b/lib/ct-dpif.c
@@ -442,3 +442,45 @@ ct_dpif_format_tcp_stat(struct ds * ds, int tcp_state, int 
conn_per_state)
 ds_put_cstr(ds, "]");
 ds_put_format(ds, "=%u", conn_per_state);
 }
+
+/* Converts a given 'flow' to conntrack 'tuple'. Returns true if the
+ * conversion is valid. Returns false and reports error in 'ds', if
+ * the type of the connection is not supported. */
+bool
+ct_dpif_flow_to_tuple(struct flow *flow, struct ct_dpif_tuple *tuple,
+  struct ds *ds)
+{
+memset(tuple, 0, sizeof *tuple);
+
+if (flow->dl_type == htons(ETH_TYPE_IP)) {
+tuple->l3_type = AF_INET;
+memcpy(>src.in, >nw_src, sizeof tuple->src.in);
+memcpy(>dst.in, >nw_dst, sizeof tuple->dst.in);
+} else if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
+tuple->l3_type = AF_INET6;
+memcpy(>src.in6, >ipv6_src, sizeof tuple->src.in6);
+memcpy(>dst.in6, >ipv6_dst, sizeof tuple->dst.in6);
+} else {
+ds_put_format(ds,
+  "Failed to convert flow to conntrack tuple "
+  "(Unsupported dl_type: %"PRIu16").",
+  ntohs(flow->dl_type));
+return false;
+}
+
+tuple->ip_proto = flow->nw_proto;
+
+/* Conntrack does support ICMP, however, we can not get ICMP id
+ * from 'flow'. */
+if (flow->nw_proto == IPPROTO_TCP || flow->nw_proto == IPPROTO_UDP) {
+tuple->src_port = flow->tp_src;
+tuple->dst_port = flow->tp_dst;
+} else {
+ds_put_format(ds,
+  "Failed to convert flow to conntrack tuple "
+  "(Unsupported ip_proto: %"PRIu8").", flow->nw_proto);
+return false;
+}
+
+return true;
+}
diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
index 0c82fb022f2b..f4ca07b5e776 100644
--- a/lib/ct-dpif.h
+++ b/lib/ct-dpif.h
@@ -17,6 +17,7 @@
 #ifndef CT_DPIF_H
 #define CT_DPIF_H
 
+#include "flow.h"
 #include "openvswitch/types.h"
 #include "packets.h"
 
@@ -209,5 +210,6 @@ void ct_dpif_format_entry(const struct ct_dpif_entry *, 
struct ds *,
 void ct_dpif_format_tuple(struct ds *, const struct ct_dpif_tuple *);
 uint8_t ct_dpif_coalesce_tcp_state(uint8_t state);
 void ct_dpif_format_tcp_stat(struct ds *, int, int);
+bool ct_dpif_flow_to_tuple(struct flow *, struct ct_dpif_tuple *, struct ds *);
 
 #endif /* CT_DPIF_H */
diff --git a/ofproto/ofproto-dpif-trace.c b/ofproto/ofproto-dpif-trace.c
index c3c929520a2d..a86cf211803e 100644
--- a/ofproto/ofproto-dpif-trace.c
+++ b/ofproto/ofproto-dpif-trace.c
@@ -18,7 +18,9 @@
 
 #include "ofproto-dpif-trace.h"
 
+#include 
 #include "conntrack.h"
+#include "ct-dpif.h"
 #include "dpif.h"
 #include "ofproto-dpif-xlate.h"
 #include "openvswitch/ofp-parse.h"
@@ -645,20 +647,44 @@ ofproto_trace(struct ofproto_dpif *ofproto, const struct 
flow *flow,
   recirc_node->recirc_id);
 
 if (recirc_node->type == OFT_RECIRC_CONNTRACK) {
-uint32_t ct_state;
+struct ct_dpif_info ct_info;
+memset(_info, 0, sizeof(ct_info));
+
 if (ovs_list_is_empty(next_ct_states)) {
-ct_state = CS_TRACKED | CS_NEW;
-ds_put_cstr(output, " - resume conntrack with default "
-"ct_state=trk|new (use --ct-next to customize)");
+struct ds errs = DS_EMPTY_INITIALIZER;
+struct ct_dpif_tuple tuple;
+int err, indent_size;
+
+indent_size = 14 + recirc_node->recirc_id / 16;
+ds_put_cstr(output, " - resume conntrack with conntrack info"
+"from datapath\n");
+ds_put_char_multiple(output, ' ', indent_size);
+
+if (ct_dpif_flow_to_tuple(_node->flow, , )) {
+err = ct_dpif_get_info(ofproto->backer->dpif, ,
+recirc_node->flow.ct_zone,
+_info);
+if (err) {
+ds_put_format(, "%s", ovs_strerror(err));
+}
+}
+
+if (errs.length) {
+ct_info.ct_state = CS_TRACKED | CS_NEW;
+ds_put_format(output, "Failed to query ct_state from "
+  "datapath (%s).\n", ds_cstr());
+

[ovs-dev] [PATCH 5/9] dpif-netlink: Implement ct_get_info

2017-08-25 Thread Yi-Hung Wei

This patch implements ct_get_info() in dpif-netlink. It uses
NFNL_SUBSYS_CTNETLINK netlink subsystem to query conntrack info from
kernel datapath. Then, ofproto/trace can use the ct_get_info() to derive
the ct_state of the traced flow. System traffic tests are provided to verify
that ofproto/trace can get the correct ct_state for IPv4 TCP/UDP, and NAT
traffic.

Signed-off-by: Yi-Hung Wei 
---
 lib/dpif-netlink.c   |  10 +-
 lib/netlink-conntrack.c  | 204 +++
 lib/netlink-conntrack.h  |   3 +
 tests/system-kmod-macros.at  |   6 ++
 tests/system-traffic.at  | 181 ++
 tests/system-userspace-macros.at |   9 ++
 6 files changed, 412 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 122c59614f5a..cc22cc6068c2 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2901,6 +2901,14 @@ dpif_netlink_ct_flush(struct dpif *dpif OVS_UNUSED, 
const uint16_t *zone)
 }
 }
 
+static int
+dpif_netlink_ct_get_info(struct dpif *dpif OVS_UNUSED,
+ struct ct_dpif_tuple *tuple, uint16_t zone,
+ struct ct_dpif_info *info)
+{
+return nl_ct_get_info(tuple, zone, info);
+}
+
 
 /* Meters */
 static void
@@ -2986,7 +2994,7 @@ const struct dpif_class dpif_netlink_class = {
 dpif_netlink_ct_dump_next,
 dpif_netlink_ct_dump_done,
 dpif_netlink_ct_flush,
-NULL,   /* ct_get_info */
+dpif_netlink_ct_get_info,
 dpif_netlink_meter_get_features,
 dpif_netlink_meter_set,
 dpif_netlink_meter_get,
diff --git a/lib/netlink-conntrack.c b/lib/netlink-conntrack.c
index ac48b15bde90..93cd0ac2c34f 100644
--- a/lib/netlink-conntrack.c
+++ b/lib/netlink-conntrack.c
@@ -18,6 +18,7 @@
 
 #include "netlink-conntrack.h"
 
+#include 
 #include 
 #include 
 #include 
@@ -112,6 +113,9 @@ static bool nl_ct_attrs_to_ct_dpif_entry(struct 
ct_dpif_entry *entry,
 struct nlattr *attrs[ARRAY_SIZE(nfnlgrp_conntrack_policy)],
 uint8_t nfgen_family);
 
+static bool nl_ct_put_ct_tuple(struct ofpbuf *, struct ct_dpif_tuple *,
+   enum ctattr_type);
+
 struct nl_ct_dump_state {
 struct nl_dump dump;
 struct ofpbuf buf;
@@ -325,6 +329,125 @@ nl_ct_flush_zone(uint16_t flush_zone)
 }
 #endif
 
+static bool
+nl_ct_cmp_ct_dpif_tuple(struct ct_dpif_tuple *t1, struct ct_dpif_tuple *t2)
+{
+if (t1->l3_type != t2->l3_type || t1->ip_proto != t2->ip_proto) {
+return false;
+}
+
+if (t1->l3_type == AF_INET) {
+if (t1->src.ip != t2->src.ip || t1->dst.ip != t2->dst.ip) {
+return false;
+}
+} else {
+if (memcmp(>src.in6, >src.in6, sizeof t1->src.in6) ||
+memcpy(>dst.in6, >dst.in6, sizeof t1->dst.in6)) {
+return false;
+}
+}
+
+if (t1->ip_proto == IPPROTO_ICMP || t1->ip_proto == IPPROTO_ICMPV6) {
+if (t1->icmp_id != t2->icmp_id || t1->icmp_type != t2->icmp_type ||
+t1->icmp_code != t2->icmp_code) {
+return false;
+}
+} else {
+if (t1->src_port != t2->src_port || t1->dst_port != t2->dst_port) {
+return false;
+}
+}
+
+return true;
+}
+
+static void
+nl_ct_get_ct_state(struct ct_dpif_tuple *tuple, struct ct_dpif_entry *entry,
+   uint32_t *ct_state)
+{
+if (entry->status & IPS_SEEN_REPLY) {
+*ct_state |= CS_ESTABLISHED;
+}
+if (!nl_ct_cmp_ct_dpif_tuple(tuple, >tuple_orig)) {
+*ct_state |= CS_REPLY_DIR;
+}
+if (entry->tuple_master.l3_type) {
+*ct_state |= CS_RELATED;
+}
+if (entry->status & IPS_SRC_NAT_DONE) {
+*ct_state |= *ct_state & CS_REPLY_DIR ? CS_DST_NAT : CS_SRC_NAT;
+}
+if (entry->status & IPS_DST_NAT_DONE) {
+*ct_state |= *ct_state & CS_REPLY_DIR ? CS_SRC_NAT : CS_DST_NAT;
+}
+if (*ct_state == 0) {
+*ct_state = CS_NEW;
+}
+}
+
+int
+nl_ct_get_info(struct ct_dpif_tuple *tuple, uint16_t zone,
+   struct ct_dpif_info *info)
+{
+struct nlattr *attrs[ARRAY_SIZE(nfnlgrp_conntrack_policy)];
+struct ofpbuf request, *reply;
+struct ct_dpif_entry entry;
+enum nl_ct_event_type type;
+uint8_t nfgen_family;
+int err;
+
+ofpbuf_init(, NL_DUMP_BUFSIZE);
+nl_msg_put_nfgenmsg(, 0, tuple->l3_type, NFNL_SUBSYS_CTNETLINK,
+IPCTNL_MSG_CT_GET, NLM_F_REQUEST);
+nl_msg_put_be16(, CTA_ZONE, htons(zone));
+if (!nl_ct_put_ct_tuple(, tuple, CTA_TUPLE_ORIG)) {
+ofpbuf_uninit();
+return EOPNOTSUPP;
+}
+
+err = nl_transact(NETLINK_NETFILTER, , );
+ofpbuf_uninit();
+
+if (err == ENOENT) {
+info->ct_state = CS_TRACKED | CS_NEW;
+return 0;
+} else if (err) {
+struct ds s = DS_EMPTY_INITIALIZER;
+
+ct_dpif_format_tuple(, tuple);
+VLOG_WARN_RL(, "Could

[ovs-dev] [PATCH 3/9] ct-dpif: Add ct_dpif_get_info()

2017-08-25 Thread Yi-Hung Wei

This patch adds ct_dpif_get_info() to dpif_class for querying conntrack
info from datapath. Later patches will use this function to query ct_fields
such as ct_state, ct_mark, and ct_label for ofproto/trace command.
The following commits will provide implementation of ct_dpif_get_info()
on dpif-netlink and dpif-netdev.

Signed-off-by: Yi-Hung Wei 
---
 lib/ct-dpif.c   | 19 +++
 lib/ct-dpif.h   |  7 +++
 lib/dpif-netdev.c   |  1 +
 lib/dpif-netlink.c  |  1 +
 lib/dpif-provider.h |  8 
 5 files changed, 36 insertions(+)

diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
index c79e69e23970..3e6f4cbe9be2 100644
--- a/lib/ct-dpif.c
+++ b/lib/ct-dpif.c
@@ -127,6 +127,25 @@ ct_dpif_flush(struct dpif *dpif, const uint16_t *zone)
 : EOPNOTSUPP);
 }
 
+int
+ct_dpif_get_info(struct dpif *dpif, struct ct_dpif_tuple *tuple,
+ const uint16_t zone, struct ct_dpif_info *info)
+{
+return (dpif->dpif_class->ct_get_info
+? dpif->dpif_class->ct_get_info(dpif, tuple, zone, info)
+: EOPNOTSUPP);
+}
+
+void
+ct_dpif_format_info(const struct ct_dpif_info *info, struct ds *output)
+{
+struct ds s = DS_EMPTY_INITIALIZER;
+
+format_flags(, ct_state_to_string, info->ct_state, '|');
+ds_put_format(output, "ct_state=%s", ds_cstr());
+ds_destroy();
+}
+
 void
 ct_dpif_entry_uninit(struct ct_dpif_entry *entry)
 {
diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
index d5f966175bc0..0c82fb022f2b 100644
--- a/lib/ct-dpif.h
+++ b/lib/ct-dpif.h
@@ -172,6 +172,10 @@ struct ct_dpif_entry {
 uint32_t bkt;   /* CT bucket number. */
 };
 
+struct ct_dpif_info {
+uint32_t ct_state;
+};
+
 enum {
 CT_STATS_UDP,
 CT_STATS_TCP,
@@ -196,6 +200,9 @@ int ct_dpif_dump_start(struct dpif *, struct 
ct_dpif_dump_state **,
 int ct_dpif_dump_next(struct ct_dpif_dump_state *, struct ct_dpif_entry *);
 int ct_dpif_dump_done(struct ct_dpif_dump_state *);
 int ct_dpif_flush(struct dpif *, const uint16_t *zone);
+int ct_dpif_get_info(struct dpif *, struct ct_dpif_tuple *,
+ const uint16_t zone, struct ct_dpif_info *);
+void ct_dpif_format_info(const struct ct_dpif_info *, struct ds *);
 void ct_dpif_entry_uninit(struct ct_dpif_entry *);
 void ct_dpif_format_entry(const struct ct_dpif_entry *, struct ds *,
   bool verbose, bool print_stats);
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e2cd9310d51f..e9586bc46627 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -5535,6 +5535,7 @@ const struct dpif_class dpif_netdev_class = {
 dpif_netdev_ct_dump_next,
 dpif_netdev_ct_dump_done,
 dpif_netdev_ct_flush,
+NULL,   /* ct_get_info */
 dpif_netdev_meter_get_features,
 dpif_netdev_meter_set,
 dpif_netdev_meter_get,
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 29001fbe4a54..122c59614f5a 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2986,6 +2986,7 @@ const struct dpif_class dpif_netlink_class = {
 dpif_netlink_ct_dump_next,
 dpif_netlink_ct_dump_done,
 dpif_netlink_ct_flush,
+NULL,   /* ct_get_info */
 dpif_netlink_meter_get_features,
 dpif_netlink_meter_set,
 dpif_netlink_meter_get,
diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
index 1d82a0939aa1..829738739c6b 100644
--- a/lib/dpif-provider.h
+++ b/lib/dpif-provider.h
@@ -75,6 +75,8 @@ dpif_flow_dump_thread_init(struct dpif_flow_dump_thread 
*thread,
 
 struct ct_dpif_dump_state;
 struct ct_dpif_entry;
+struct ct_dpif_tuple;
+struct ct_dpif_info;
 
 /* Datapath interface class structure, to be defined by each implementation of
  * a datapath interface.
@@ -428,6 +430,12 @@ struct dpif_class {
  * only deletes connections in '*zone'. */
 int (*ct_flush)(struct dpif *, const uint16_t *zone);
 
+/* Queries 'dpif' for the connection tracking info of a connection
+ * specified by 'tuple' in 'zone'. On success, returns 0 and sets
+ * 'info', otherwise returns nonzero value. */
+int (*ct_get_info)(struct dpif *, struct ct_dpif_tuple *tuple,
+   const uint16_t zone, struct ct_dpif_info *info);
+
 /* Meters */
 
 /* Queries 'dpif' for supported meter features.
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH 2/9] ofproto/trace: Propagate ct_zone in recirculation

2017-08-25 Thread Yi-Hung Wei

This patch propagates ct_zone when ofproto/trace automatically runs
through the recirculation process.

Fixes: e6bc8e749381 ("ofproto/trace: Add support for tracing conntrack 
recirculation")
Signed-off-by: Yi-Hung Wei 
---
 ofproto/ofproto-dpif-trace.c |  4 +++-
 ofproto/ofproto-dpif-trace.h |  3 ++-
 ofproto/ofproto-dpif-xlate.c |  7 ---
 tests/ofproto-dpif.at| 14 +++---
 4 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/ofproto/ofproto-dpif-trace.c b/ofproto/ofproto-dpif-trace.c
index a45c9cfd9619..c3c929520a2d 100644
--- a/ofproto/ofproto-dpif-trace.c
+++ b/ofproto/ofproto-dpif-trace.c
@@ -91,7 +91,8 @@ oftrace_node_destroy(struct oftrace_node *node)
 bool
 oftrace_add_recirc_node(struct ovs_list *recirc_queue,
 enum oftrace_recirc_type type, const struct flow *flow,
-const struct dp_packet *packet, uint32_t recirc_id)
+const struct dp_packet *packet, uint32_t recirc_id,
+const uint16_t zone)
 {
 if (!recirc_id_node_find_and_ref(recirc_id)) {
 return false;
@@ -104,6 +105,7 @@ oftrace_add_recirc_node(struct ovs_list *recirc_queue,
 node->recirc_id = recirc_id;
 node->flow = *flow;
 node->flow.recirc_id = recirc_id;
+node->flow.ct_zone = zone;
 node->packet = packet ? dp_packet_clone(packet) : NULL;
 
 return true;
diff --git a/ofproto/ofproto-dpif-trace.h b/ofproto/ofproto-dpif-trace.h
index 5178d227ed07..5e51771b19b0 100644
--- a/ofproto/ofproto-dpif-trace.h
+++ b/ofproto/ofproto-dpif-trace.h
@@ -84,6 +84,7 @@ struct oftrace_node *oftrace_report(struct ovs_list *, enum 
oftrace_node_type,
 const char *text);
 bool oftrace_add_recirc_node(struct ovs_list *recirc_queue,
  enum oftrace_recirc_type, const struct flow *,
- const struct dp_packet *, uint32_t recirc_id);
+ const struct dp_packet *, uint32_t recirc_id,
+ const uint16_t zone);
 
 #endif /* ofproto-dpif-trace.h */
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 9e1f837cb23e..95c4fef0d9b0 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -4685,7 +4685,8 @@ finish_freezing(struct xlate_ctx *ctx)
  * the remainder of the current action list and asynchronously resume pipeline
  * processing in 'table' with the current metadata and action set. */
 static void
-compose_recirculate_and_fork(struct xlate_ctx *ctx, uint8_t table)
+compose_recirculate_and_fork(struct xlate_ctx *ctx, uint8_t table,
+ const uint16_t zone)
 {
 uint32_t recirc_id;
 ctx->freezing = true;
@@ -4694,7 +4695,7 @@ compose_recirculate_and_fork(struct xlate_ctx *ctx, 
uint8_t table)
 if (OVS_UNLIKELY(ctx->xin->trace) && recirc_id) {
 if (oftrace_add_recirc_node(ctx->xin->recirc_queue,
 OFT_RECIRC_CONNTRACK, >xin->flow,
-ctx->xin->packet, recirc_id)) {
+ctx->xin->packet, recirc_id, zone)) {
 xlate_report(ctx, OFT_DETAIL, "A clone of the packet is forked to "
  "recirculate. The forked pipeline will be resumed at "
  "table %u.", table);
@@ -5764,7 +5765,7 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct 
ofpact_conntrack *ofc)
 
 if (ofc->recirc_table != NX_CT_RECIRC_NONE) {
 ctx->conntracked = true;
-compose_recirculate_and_fork(ctx, ofc->recirc_table);
+compose_recirculate_and_fork(ctx, ofc->recirc_table, zone);
 }
 
 /* The ct_* fields are only available in the scope of the 'recirc_table'
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 28a7e827cac2..c76ea4eee1cc 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -9827,21 +9827,21 @@ table=0,priority=1,action=drop
 dnl
 dnl Table 1
 dnl
-table=1,priority=10,in_port=1,ct_state=+trk+new,udp,action=ct(commit,zone=0),2
-table=1,priority=10,in_port=1,ct_state=+trk+est,udp,action=2
-table=1,priority=10,in_port=2,ct_state=+trk+est,udp,action=1
+table=1,priority=10,in_port=1,ct_zone=0,ct_state=+trk+new,udp,action=ct(commit,zone=0),2
+table=1,priority=10,in_port=1,ct_zone=0,ct_state=+trk+est,udp,action=2
+table=1,priority=10,in_port=2,ct_zone=0,ct_state=+trk+est,udp,action=1
 table=1,priority=1,action=drop
 dnl
 dnl Table 2
 dnl
-table=2,priority=10,in_port=1,tcp,ct_state=+trk+new,tcp,action=ct(commit,zone=1),ct(table=3,zone=2)
-table=2,priority=10,in_port=1,tcp,ct_state=+trk+est,tcp,action=ct(table=3,zone=2)
+table=2,priority=10,in_port=1,tcp,ct_zone=1,ct_state=+trk+new,tcp,action=ct(commit,zone=1),ct(table=3,zone=2)
+table=2,priority=10,in_port=1,tcp,ct_zone=1,ct_state=+trk+est,tcp,action=ct(table=3,zone=2)
 table=2,priority=1,action=drop
 dnl
 dnl Table 3

Re: [ovs-dev] [PATCH v4] dpif-netdev: Avoid reading RSS hash when EMC is disabled

2017-08-25 Thread Fischetti, Antonio



> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Friday, August 25, 2017 6:19 PM
> To: Fischetti, Antonio ; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v4] dpif-netdev: Avoid reading RSS hash when EMC
> is disabled
> 
> Hi Antonio
> 
> On 8/25/17, 6:56 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com"  antonio.fische...@intel.com> wrote:
> 
> When EMC is disabled the reading of RSS hash is skipped.
> Also, for packets that are not recirculated it retrieves
> the hash value without considering the recirc id.
> 
> Signed-off-by: Antonio Fischetti 
> ---
> 
>  V4
>   - reworked to remove dependencies from other patches in
> patchset "Skip EMC for recirc pkts and other optimizations."
> https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2017-
> 2DAugust_337320.html=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-
> uZnsw=7UGZunJeutRugFmidRAykhZFRYscZ2rUUmhF7Ys9IM8=hzVGT1bI3zOj_x9J5cu18Q2D5
> WLHdEC1qhQ-Gu--TEE=
> 
>   - measurements were repeated with the latest head of master.
> 
>  Port-to-Port Test
>  =
> Software built with "-O2 -march=native -g".
> 
> I measured the Rx rate regardless of pkt loss by sending 1 UDP
> flow,
> 64B packets, at the line-rate.
> 
> 2 PMDs with 3 Tx queues.
> 
> Flows setup:
>   in_port=dpdk0 actions=output:dpdk1
>   in_port=dpdk1 actions=output:dpdk0
> 
> Results
> ---
> Values are for the Rx rate in Mpps, regardless of packet loss.
> 
> RSS column:
>Yes: RSS hash is provided by the NIC
>No: RSS is disabled and the 5-tuple hash must be
>computed in software.
> 
>Note: to simulate RSS disabled I added the line
>dp_packet_rss_valid(struct dp_packet *p)
>{
>+return false;
>#ifdef DPDK_NETDEV
> 
> EMC column:
>Yes: default probability insertion,
>No: EMC disabled.
> 
> Orig means Commit ID:
>   75f9e007e7f7eb91461e238f882d1c539c56bb8d
> 
> [Darrell]
> It looks like the main benefit (+6% throughput) occurs
> when RSS is not calculated by the NIC and emc is disabled. What are the common
> cases when RSS is not generated by the NIC ?

[Antonio]
A common case could be when packets are coming in from a VM.  
Also when the NIC does not provide a hash value, but this shouldn't be 
considered a common case I guess.


> 
> 
> 
> +--++
> +-+-+Orig  | Orig + this|
> | RSS | EMC |  |patch   |
> +-+-+--++
> | Yes | Yes | 12.02, 11.38 | 12.18, 11.32   |
> | Yes |  No |  8.35,  8.36 |  8.36,  8.38   |
> +-+-+--++
> |  No | Yes |  9.91, 11.15 |  9.84, 11.08   |
> |  No |  No |  7.83,  7.87 |  8.33,  8.35   |
> +-+-+--++
> 
> ---
>  lib/dpif-netdev.c | 32 
>  1 file changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e2cd931..1157998 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4567,6 +4567,22 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd,
> struct dp_packet *packet_,
>  }
> 
>  static inline uint32_t
> +dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet,
> +const struct miniflow *mf)
> +{
> +uint32_t hash;
> +
> +if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
> +hash = dp_packet_get_rss_hash(packet);
> +} else {
> +hash = miniflow_hash_5tuple(mf, 0);
> +dp_packet_set_rss_hash(packet, hash);
> +}
> +
> +return hash;
> +}
> +
> +static inline uint32_t
>  dpif_netdev_packet_get_rss_hash(struct dp_packet *packet,
>  const struct miniflow *mf)
>  {
> @@ -4701,10 +4717,18 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>  }
>  miniflow_extract(packet, >mf);
>  key->len = 0; /* Not computed yet. */
> -key->hash = dpif_netdev_packet_get_rss_hash(packet, >mf);
> -
> -/* If EMC is disabled skip emc_lookup */
> -flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key);
> +/* If EMC is disabled skip hash computation and

Re: [ovs-dev] [PATCH v2] conntrack: Fix ct-clean thread crash bug

2017-08-25 Thread Darrell Ball

Thanks for the fix Lili

I also updated AUTHORS.rst, since I did not find your name there.

I also added a Fixes tag and more detail to the commit message.

and applied the patch to dpdk_merge here

https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_darball_ovs_commits_dpdk-5Fmerge=DwIGaQ=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=A2_FCacqbp2moAo3HGFlTuxsjONUGhlN42OBcAuQQ6w=b6btPKhgvOFr2GOUYvktND6kaC6jc3fXI-mXfvNgXOU=

This will need to go to 2.8 as well



On 8/24/17, 11:25 PM, "ovs-dev-boun...@openvswitch.org on behalf of huanglili" 
 wrote:

From: Lili Huang 

Conn should be removed from the list before freed.

This crash will be triggered when a established flow do ct(nat)
again, like
"ip,actions=ct(table=1)
 table=1,in_port=1,ip,actions=ct(commit,nat(dst=5.5.5.5)),2
 table=1,in_port=2,ip,ct_state=+est,actions=1
 table=1,in_port=1,ip,ct_state=+est,actions=2"

Signed-off-by: Lili Huang 
---
 lib/conntrack.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 1c0e023..4918aaf 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -805,6 +805,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet 
*pkt,
  * against with firewall rules or a separate firewall.
  * Also using zone partitioning can limit DoS impact. */
 nat_res_exhaustion:
+ovs_list_remove(>exp_node);
 delete_conn(nc);
 /* conn_for_un_nat_copy is a local variable in process_one; this
  * memset() serves to document that conn_for_un_nat_copy is from
-- 
1.8.3.1


___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=vuMxVczeFb3Q6uRGj0RiEHsArdAc_4l8bUJACE8IyPc=mpsnR_2mI9eB1-58_lKZlnjYscN74wQWSe_are1qZjs=
 






___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] github: Add Appveyor image and link to Readme

2017-08-25 Thread Alin Gabriel Serdean

Just a small nit to see current build status of appveyor.
Also add a link so one could easily reach the history of the builds.

Signed-off-by: Alin Gabriel Serdean 
---
 README.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.rst b/README.rst
index 28f8590..251e2bb 100644
--- a/README.rst
+++ b/README.rst
@@ -7,6 +7,8 @@ Open vSwitch
 
 .. image:: https://travis-ci.org/openvswitch/ovs.png
 :target: https://travis-ci.org/openvswitch/ovs
+.. image:: 
https://ci.appveyor.com/api/projects/status/github/openvswitch/ovs?branch=master=true=true
+:target: https://ci.appveyor.com/project/blp/ovs/history
 
 What is Open vSwitch?
 -
-- 
2.10.2.windows.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3] ovn: Check for known logical switch port types.

2017-08-25 Thread Mark Michelson

OVN is lenient with the types of logical switch ports. Maybe too
lenient. This patch attempts to solve this problem on two fronts:

1) In ovn-nbctl, if you attempt to set the port type to an unknown
type, the command will not end up setting the type.
2) In northd, when copying the port type from the northbound database to
the corresponding port-binding in the southbound database, a warning
will be issued if the port is of an unknown type.

Signed-off-by: Mark Michelson 
---
v3:
  * OVN_NB_LSP_TYPES declaration is static
  * northd will not copy unknown port types to southbound DB
  * re-ordered port types in OVN_NB_LSP_TYPES
v2:
  * Used ARRAY_SIZE to calculate length of OVN_NB_LSP_TYPES
v1:
  * Initial patch
---
 ovn/lib/ovn-util.c| 30 +++
 ovn/lib/ovn-util.h|  2 ++
 ovn/northd/ovn-northd.c   |  7 -
 ovn/utilities/ovn-nbctl.c |  7 -
 tests/ovn-nbctl.at| 73 +++
 5 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/ovn/lib/ovn-util.c b/ovn/lib/ovn-util.c
index 037d0798a..a554c23f5 100644
--- a/ovn/lib/ovn-util.c
+++ b/ovn/lib/ovn-util.c
@@ -299,3 +299,33 @@ default_sb_db(void)
 }
 return def;
 }
+
+/* l3gateway, chassisredirect, and patch
+ * are not in this list since they are
+ * only set in the SB DB by northd
+ */
+static const char *OVN_NB_LSP_TYPES[] = {
+"l2gateway",
+"localnet",
+"localport",
+"router",
+"vtep",
+};
+
+bool
+ovn_is_known_nb_lsp_type(const char *type)
+{
+int i;
+
+if (!type || !type[0]) {
+return true;
+}
+
+for (i = 0; i < ARRAY_SIZE(OVN_NB_LSP_TYPES); ++i) {
+if (!strcmp(OVN_NB_LSP_TYPES[i], type)) {
+return true;
+}
+}
+
+return false;
+}
diff --git a/ovn/lib/ovn-util.h b/ovn/lib/ovn-util.h
index b3d2125a3..9b456426d 100644
--- a/ovn/lib/ovn-util.h
+++ b/ovn/lib/ovn-util.h
@@ -67,4 +67,6 @@ char *alloc_nat_zone_key(const struct uuid *key, const char 
*type);
 const char *default_nb_db(void);
 const char *default_sb_db(void);
 
+bool ovn_is_known_nb_lsp_type(const char *type);
+
 #endif
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 49e4ac338..9510a7bdb 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -1962,7 +1962,12 @@ ovn_port_update_sbrec(struct northd_context *ctx,
 }
 sbrec_port_binding_set_options(op->sb, );
 smap_destroy();
-sbrec_port_binding_set_type(op->sb, op->nbsp->type);
+if (ovn_is_known_nb_lsp_type(op->nbsp->type)) {
+sbrec_port_binding_set_type(op->sb, op->nbsp->type);
+} else {
+VLOG_WARN("Unknown port type '%s' set on logical switch '%s'.",
+op->nbsp->type, op->nbsp->name);
+}
 } else {
 const char *chassis = NULL;
 if (op->peer && op->peer->od && op->peer->od->nbr) {
diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
index 46ede4e50..d00bd6ec9 100644
--- a/ovn/utilities/ovn-nbctl.c
+++ b/ovn/utilities/ovn-nbctl.c
@@ -1173,7 +1173,12 @@ nbctl_lsp_set_type(struct ctl_context *ctx)
 const struct nbrec_logical_switch_port *lsp;
 
 lsp = lsp_by_name_or_uuid(ctx, id, true);
-nbrec_logical_switch_port_set_type(lsp, type);
+if (ovn_is_known_nb_lsp_type(type)) {
+nbrec_logical_switch_port_set_type(lsp, type);
+} else {
+ctl_fatal("Logical switch port type '%s' is unrecognized. "
+  "Not setting type.", type);
+}
 }
 
 static void
diff --git a/tests/ovn-nbctl.at b/tests/ovn-nbctl.at
index 354b8df96..e1c4b4473 100644
--- a/tests/ovn-nbctl.at
+++ b/tests/ovn-nbctl.at
@@ -951,3 +951,76 @@ IPv6 Routes
 
 OVN_NBCTL_TEST_STOP
 AT_CLEANUP
+
+dnl -
+
+AT_SETUP([ovn-nbctl - lsp types])
+OVN_NBCTL_TEST_START
+
+AT_CHECK([ovn-nbctl ls-add ls0])
+AT_CHECK([ovn-nbctl lsp-add ls0 lp0])
+
+dnl switchport type defaults to empty
+AT_CHECK([ovn-nbctl lsp-get-type lp0], [0], [dnl
+
+])
+
+dnl The following are the valid entries for
+dnl switchport type
+AT_CHECK([ovn-nbctl lsp-set-type lp0 l2gateway])
+AT_CHECK([ovn-nbctl lsp-get-type lp0], [0], [dnl
+l2gateway
+])
+
+AT_CHECK([ovn-nbctl lsp-set-type lp0 router])
+AT_CHECK([ovn-nbctl lsp-get-type lp0], [0], [dnl
+router
+])
+
+AT_CHECK([ovn-nbctl lsp-set-type lp0 localnet])
+AT_CHECK([ovn-nbctl lsp-get-type lp0], [0], [dnl
+localnet
+])
+
+AT_CHECK([ovn-nbctl lsp-set-type lp0 localport])
+AT_CHECK([ovn-nbctl lsp-get-type lp0], [0], [dnl
+localport
+])
+
+AT_CHECK([ovn-nbctl lsp-set-type lp0 vtep])
+AT_CHECK([ovn-nbctl lsp-get-type lp0], [0], [dnl
+vtep
+])
+
+dnl All of these are valid southbound port types but
+dnl should be rejected for northbound logical switch
+dnl ports.
+AT_CHECK([ovn-nbctl lsp-set-type lp0 l3gateway], [1], [], [dnl
+ovn-nbctl: Logical switch port type

Re: [ovs-dev] [PATCH 00/40] Fix static code analysis warnings.

2017-08-25 Thread Alin Serdean



> -Original Message-
> From: Shashank Ram [mailto:r...@vmware.com]
> Sent: Friday, August 25, 2017 2:56 AM
> To: Ben Pfaff ; Alin Serdean
> 
> Cc: d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 00/40] Fix static code analysis warnings.
> 
> 
> 
> 
> From: ovs-dev-boun...@openvswitch.org  boun...@openvswitch.org> on behalf of Ben Pfaff 
> Sent: Wednesday, August 2, 2017 11:35 AM
> To: Alin Serdean
> Cc: d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 00/40] Fix static code analysis warnings.
> 
> On Fri, Jul 14, 2017 at 04:40:51AM +, Alin Serdean wrote:
> > The following patches are fixes found with the WDK 8.1 and 10 static
> > code analysis.
> 
> I applied most of this series to master.  The patches that I didn't apply are 
> the
> ones where Shashank had some requests.  (Alin, you should be able to find
> the remaining patches by simply rebasing the series against current master--
> the patches that were applied will disappear on
> rebase.)
> 
> Thank you very much!
> ___
> 
> Hi Ben, all the 40 patches in this series have been applied to master. Some of
> the patches have issues such as memory leaks as I pointed out in my review.
> Were these applied by mistake?
> ___
> >>> My bad, there are some that aren't applied. Alin, do you plan to
> >>> respin the ones from this series that haven't been applied yet?
[Alin Serdean] Yup. I'll try to send them out tonight/tomorrow worst case. I 
got sidetracked. Sorry!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [RFC PATCH] daemon-windows: Set default error mode for services

2017-08-25 Thread Alin Gabriel Serdean

Microsoft allows default process memory dumps via WER (Windows Error
Reporting).
WER can be set to collect dumps using in general using:
https://msdn.microsoft.com/en-us/library/windows/desktop/bb787181(v=vs.85).aspx
If a normal application crashes, the user will receive a pop-up dialog,
in which he will be asked of his consent on what will be the steps
after the dump was created (debug, close or send the dump to MSFT).
We disable the dump creation via WER in:
https://github.com/openvswitch/ovs/blob/master/lib/util.c#L492
because the idea is we don't want a pop-up if a OVS process (eg. ovs-vsctl)
has crashed.
(more information on the subject:
https://blogs.msdn.microsoft.com/oldnewthing/20040727-00/?p=38323)

Until we implement our own dump collection mechanism, we can set the default
error mode for services because there are no pop-ups allowed in that case.

Signed-off-by: Alin Gabriel Serdean 
Requested-by: Anand Kumar 
---
 lib/daemon-windows.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/lib/daemon-windows.c b/lib/daemon-windows.c
index 1ba714d..2ba62d2 100644
--- a/lib/daemon-windows.c
+++ b/lib/daemon-windows.c
@@ -138,6 +138,12 @@ service_start(int *argcp, char **argvp[])
 *argcp = sargc;
 *argvp = *sargvp;
 
+/* Enable default error mode so we can take advantage of WER
+ * (Windows Error Reporting) crash dumps.
+ * Being a service it does not allow for WER window pop-up.
+ * XXX implement our on crash dump collection mechanism. */
+SetErrorMode(0);
+
 return;
 }
 
-- 
2.10.2.windows.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4] dpif-netdev: Avoid reading RSS hash when EMC is disabled

2017-08-25 Thread Darrell Ball

Hi Antonio

On 8/25/17, 6:56 AM, "ovs-dev-boun...@openvswitch.org on behalf of 
antonio.fische...@intel.com"  wrote:

When EMC is disabled the reading of RSS hash is skipped.
Also, for packets that are not recirculated it retrieves
the hash value without considering the recirc id.

Signed-off-by: Antonio Fischetti 
---

 V4
  - reworked to remove dependencies from other patches in
patchset "Skip EMC for recirc pkts and other optimizations."

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2017-2DAugust_337320.html=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=7UGZunJeutRugFmidRAykhZFRYscZ2rUUmhF7Ys9IM8=hzVGT1bI3zOj_x9J5cu18Q2D5WLHdEC1qhQ-Gu--TEE=
 

  - measurements were repeated with the latest head of master.

 Port-to-Port Test
 =
Software built with "-O2 -march=native -g".

I measured the Rx rate regardless of pkt loss by sending 1 UDP flow,
64B packets, at the line-rate.

2 PMDs with 3 Tx queues.

Flows setup:
  in_port=dpdk0 actions=output:dpdk1
  in_port=dpdk1 actions=output:dpdk0

Results
---
Values are for the Rx rate in Mpps, regardless of packet loss.

RSS column:
   Yes: RSS hash is provided by the NIC
   No: RSS is disabled and the 5-tuple hash must be
   computed in software.

   Note: to simulate RSS disabled I added the line
   dp_packet_rss_valid(struct dp_packet *p)
   {
   +return false;
   #ifdef DPDK_NETDEV

EMC column:
   Yes: default probability insertion,
   No: EMC disabled.

Orig means Commit ID:
  75f9e007e7f7eb91461e238f882d1c539c56bb8d

[Darrell]
It looks like the main benefit (+6% throughput) occurs
when RSS is not calculated by the NIC and emc is disabled. What are the common
cases when RSS is not generated by the NIC ?



+--++
+-+-+Orig  | Orig + this|
| RSS | EMC |  |patch   |
+-+-+--++
| Yes | Yes | 12.02, 11.38 | 12.18, 11.32   |
| Yes |  No |  8.35,  8.36 |  8.36,  8.38   |
+-+-+--++
|  No | Yes |  9.91, 11.15 |  9.84, 11.08   |
|  No |  No |  7.83,  7.87 |  8.33,  8.35   |
+-+-+--++

---
 lib/dpif-netdev.c | 32 
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e2cd931..1157998 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4567,6 +4567,22 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, 
struct dp_packet *packet_,
 }
 
 static inline uint32_t
+dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet,
+const struct miniflow *mf)
+{
+uint32_t hash;
+
+if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
+hash = dp_packet_get_rss_hash(packet);
+} else {
+hash = miniflow_hash_5tuple(mf, 0);
+dp_packet_set_rss_hash(packet, hash);
+}
+
+return hash;
+}
+
+static inline uint32_t
 dpif_netdev_packet_get_rss_hash(struct dp_packet *packet,
 const struct miniflow *mf)
 {
@@ -4701,10 +4717,18 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
 }
 miniflow_extract(packet, >mf);
 key->len = 0; /* Not computed yet. */
-key->hash = dpif_netdev_packet_get_rss_hash(packet, >mf);
-
-/* If EMC is disabled skip emc_lookup */
-flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key);
+/* If EMC is disabled skip hash computation and emc_lookup */
+if (OVS_LIKELY(cur_min)) {
+if (!md_is_valid) {
+key->hash = 
dpif_netdev_packet_get_rss_hash_orig_pkt(packet,
+>mf);
+} else {
+key->hash = dpif_netdev_packet_get_rss_hash(packet, 
>mf);
+}
+flow = emc_lookup(flow_cache, key);
+} else {
+flow = NULL;
+}
 if (OVS_LIKELY(flow)) {
 dp_netdev_queue_batches(packet, flow, >mf, batches,
 n_batches);
-- 
2.4.11

[ovs-dev] [RFC PATCH v2 09/10] Docs: Add userspace-ipsec how to guide.

2017-08-25 Thread Ian Stokes

This commit adds a how to guide for using the proposed vxlanipsec
userspace interface. It is not intended to be upstreamed but simply
seeks to solicit feed back by providing an example of the proposed
vxlanipsec interface design setup and usage.

The how to usecase deals with securing vxlan traffic between 2 VMs as
described in the userspace-vxlan how to guide. It provides an example of
how the proposed ipsec interface is configured with an SPD, creation of
SAs including IPsec protocol, mode, crypto/authentication algorithms/keys,
assigning SPD entires to SAs for inbound/outbound traffic with traffic
selectors and finally updating the SA keys.

Signed-off-by: Ian Stokes 
---
 Documentation/automake.mk   |1 +
 Documentation/howto/index.rst   |1 +
 Documentation/howto/userspace-ipsec.rst |  187 +++
 3 files changed, 189 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/howto/userspace-ipsec.rst

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 24fe63d..a8f2a01 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -59,6 +59,7 @@ DOC_SOURCE = \
Documentation/howto/tunneling.png \
Documentation/howto/tunneling.rst \
Documentation/howto/userspace-tunneling.rst \
+   Documentation/howto/userspace-ipsec.rst \
Documentation/howto/vlan.png \
Documentation/howto/vlan.rst \
Documentation/howto/vtep.rst \
diff --git a/Documentation/howto/index.rst b/Documentation/howto/index.rst
index 5859a33..97d690a 100644
--- a/Documentation/howto/index.rst
+++ b/Documentation/howto/index.rst
@@ -43,6 +43,7 @@ OVS
lisp
tunneling
userspace-tunneling
+   userspace-ipsec
vlan
qos
vtep
diff --git a/Documentation/howto/userspace-ipsec.rst 
b/Documentation/howto/userspace-ipsec.rst
new file mode 100644
index 000..2ae2bd8
--- /dev/null
+++ b/Documentation/howto/userspace-ipsec.rst
@@ -0,0 +1,187 @@
+..
+  Licensed under the Apache License, Version 2.0 (the "License"); you may
+  not use this file except in compliance with the License. You may obtain
+  a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+  License for the specific language governing permissions and limitations
+  under the License.
+
+  Convention for heading levels in Open vSwitch documentation:
+
+  ===  Heading 0 (reserved for the title in a document)
+  ---  Heading 1
+  ~~~  Heading 2
+  +++  Heading 3
+  '''  Heading 4
+
+  Avoid deeper levels because they do not render well.
+
+==
+Securing VXLAN traffic between VMs Using IPsec (Userspace)
+==
+
+This document describes how to use IPsec in Open vSwitch to secure traffic
+between VMs on two different hosts communicating over VXLAN tunnels. This
+solution works entirely in userspace.
+
+.. note::
+
+   This guide covers the steps required to configure an IPsec interface to
+   secure VXLAN tunneling traffic. It does not cover the steps to configure
+   the vxlan tunnels in userspace. For these configuration steps please refer
+   to :doc:`userspace-tunneling`.
+
+.. TODO(stephenfin): Convert this to a (prettier) PNG with same styling as the
+   rest of the document
+
+::
+
++--+  +--+
+| vm0  | 192.168.1.1/24192.168.1.2/24 | vm1  |
++--+  +--+
+   (vm_port0)(vm_port1)
+   | |
+   | |
+   | |
++--+  +--+
+|br-int|  |br-int|
++--+  +--+
+| vxlanipsec0  | 172.168.1.1/24172.168.1.2/24 | vxlanipsec0  |
++--+  +--+
+   | |
+   | |
+   | |
++--+  +--+
+|br-phy|  |br-phy|
++--+  +--+
+|  dpdk0/eth1  |--|  dpdk0/eth1  |
+

[ovs-dev] [RFC PATCH v2 08/10] vswitch.xml: Detail vxlanipsec user interface.

2017-08-25 Thread Ian Stokes

This commit adds details to the vswitch xml regarding the use of the
vxlanipsec interface type. This patch is not intended for upstreaming
and simply seeks to solicit feedback on the user interface design of
the vxlanipsec port type as described in the vswitch.xml.

This modifies the vswitch.xml with a proposed vxlanipsec interface.
It also provides details for the proposed interface options such as
SPD creation, SA creation and modification, Policy entries for the
SPD as well as traffic selector options for the policy.

Signed-off-by: Ian Stokes 
---
 vswitchd/vswitch.xml |  225 ++
 1 files changed, 225 insertions(+), 0 deletions(-)

diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 074535b..27c3c54 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -2227,6 +2227,13 @@
 A pair of virtual devices that act as a patch cable.
   
 
+  vxlanipsec
+  
+An interface type to provide IPsec RFC4301 functionality for
+traffic at the IP layer with vxlan, initially for IPv4
+environments.
+  
+
   null
   An ignored interface. Deprecated and slated for removal in
   February 2013.
@@ -2644,6 +2651,224 @@
   
 
 
+
+  
+Only vxlanipsec interfaces support these options.
+  
+
+  
+  
+Must be an identifier for the SPD that is to be used by this IPsec
+interface. If no SPD exists with this ID then it will be created.
+  
+  
+
+  
+  
+An identifier representing the ID of a Security Association.
+If no SA with this ID exists it will be created.
+  
+  
+
+  
+  
+A 32 bit number representing the security policy index for
+the SA.
+  
+  
+
+  
+  
+The IPsec mode that applies to the SA, one of:
+  
+
+  
+transport: Provide protection primarily for next
+layer protocols.
+  
+  
+tunnel: Provide protection to IP layer also (applied
+to tunneled IP packets).
+  
+
+  
+Initially only support for transport mode will be implemented.
+  
+  
+
+  
+  
+The security protocol used for IPsec, one of the following:
+  
+
+  
+ESP: Encapsulating Security Payload.
+  
+  
+AH: Authentication header
+  
+
+  
+Initially only ESP is supported, users can implement authentication
+communication only by setting the encryption algorithm to NULL for ESP
+but specifying the integrity algorithm. In this way there is no need
+to support AH. If this is acceptable to the OVS community then this
+option could be removed as it will always be ESP.
+  
+  
+
+  
+  
+The encryption algorithm used for IPsec, one of the following:
+  
+
+  
+NULL: No encryption. Note NULL is required for the
+use of ESP with authentication only which is preferred over AH
+due to NAT traversal.
+  
+  
+AES_CBC: AES_CBC is a non-AEAD algorithm. Note users
+MUST specify an authentication algorithm to check integrity.
+  
+  
+AES_GCM_16: An AEAD algorithm. This allows the use of
+ESP with a combined mode cipher that handles encryption/decryption
+and authentication in a single step. Recommended for performance.
+  
+
+  
+
+  
+  
+A crypto key specifies the transformation of plaintext into cipher
+text for the encryption algorithm. Alphanumeric user generated string.
+IPsec sessions may have very long life time, and carry multiple
+packets, where possible 128-bit or 256-bit keys should be used.
+  
+  
+
+  
+  
+Optional. Represents the authentication algorithm to be used for
+IPsec. Depending on the encryption algorithm that is chosen a user
+may have to specify a separate authentication algorithm.  This
+typically occurs when communications are encrypted with non-AEAD
+algorithm which MUST be combined with an authentication algorithm.
+Can be one of the following:
+  
+
+  
+NULL: No authentication algorithm. Must only be used
+when the encryption is done with an AEAD algorithm.
+  
+  
+HMAC_SHA2_256_128: Should be selected when a non AEAD
+algorithm is used for encryption.
+  
+
+  
+
+  
+  
+Required if an auth_alg has been specificed. An authentication key
+specifies the transformation of plaintext into cipher text for the
+authentication algorithm.

[ovs-dev] [RFC PATCH v2 03/10] packets: Add ESP header and trailer.

2017-08-25 Thread Ian Stokes

This patch introduces structs for both ESP headers and ESP trailers
along with expected size assertions.

Signed-off-by: Ian Stokes 
---
 lib/packets.h |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/lib/packets.h b/lib/packets.h
index 705d0b2..057935c 100644
--- a/lib/packets.h
+++ b/lib/packets.h
@@ -799,6 +799,20 @@ struct udp_header {
 };
 BUILD_ASSERT_DECL(UDP_HEADER_LEN == sizeof(struct udp_header));
 
+#define ESP_HEADER_LEN 8
+struct esp_header {
+ovs_be32 spi;
+ovs_be32 seq_no;
+};
+BUILD_ASSERT_DECL(ESP_HEADER_LEN == sizeof(struct esp_header));
+
+#define ESP_TRAILER_LEN 2
+struct esp_trailer {
+uint8_t pad_len;
+uint8_t next_hdr;
+};
+BUILD_ASSERT_DECL(ESP_TRAILER_LEN == sizeof(struct esp_trailer));
+
 #define TCP_FIN 0x001
 #define TCP_SYN 0x002
 #define TCP_RST 0x004
-- 
1.7.0.7

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [RFC PATCH v2 04/10] flow: Add ESP spi value to flow struct.

2017-08-25 Thread Ian Stokes

This patch adds a field to the flow struct to represent the ESP security
parameter index of a packet.

Signed-off-by: Ian Stokes 
---
 include/openvswitch/flow.h |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/include/openvswitch/flow.h b/include/openvswitch/flow.h
index a658a58..d986929 100644
--- a/include/openvswitch/flow.h
+++ b/include/openvswitch/flow.h
@@ -156,6 +156,11 @@ struct flow {
 ovs_be32 igmp_group_ip4;/* IGMP group IPv4 address.
  * Keep last for BUILD_ASSERT_DECL below. */
 ovs_be32 pad3;  /* Pad to 64 bits. */
+
+/* SPI for ESP (64-bit aligned) */
+/* XXX TO DO: move this to the l3 layer */
+ovs_be32 spi;
+ovs_be32 pad4;
 };
 BUILD_ASSERT_DECL(sizeof(struct flow) % sizeof(uint64_t) == 0);
 BUILD_ASSERT_DECL(sizeof(struct flow_tnl) % sizeof(uint64_t) == 0);
-- 
1.7.0.7

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [RFC PATCH v2 01/10] acinclude.m4: Support compilation of libIPsec.

2017-08-25 Thread Ian Stokes

LibIpsecMB is required to enable the use of vdev cryptodev devices in
DPDK. This patch adds a condition to check for the library when it is
detected that ONFIG_RTE_LIBRTE_PMD_AESNI_MB=y is enabled in the DPDK
config.

Signed-off-by: Ian Stokes 
---
 acinclude.m4 |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index aeb594a..8c14367 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -271,6 +271,19 @@ AC_DEFUN([OVS_CHECK_DPDK], [
], [],
[AC_DEFINE([DPDK_PDUMP], [1], [DPDK pdump enabled in OVS.])])
  ])
+AC_COMPILE_IFELSE([
+  AC_LANG_PROGRAM(
+[
+  #include 
+#if RTE_LIBRTE_PMD_AESNI_MB
+#error
+#endif
+], [])
+  ], [],
+  [AC_SEARCH_LIBS([init_mb_mgr_sse],[IPSec_MB],[],[AC_MSG_ERROR([unable to 
find lib_IPSec_MB in ${LDFLAGS}, install the dependency package])])
+   DPDK_EXTRA_LIB="-lIPSec_MB"
+   AC_DEFINE([PMD_AESNI_MB], [1], [PMD_AESNI_MB support detected in 
DPDK.])])
+
 
 # On some systems we have to add -ldl to link with dpdk
 #
-- 
1.7.0.7

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [RFC PATCH v2 02/10] openvswitch.h: add vport to ovs_action_push_tnl.

2017-08-25 Thread Ian Stokes

Upon callback for building/pushing a packet header when encapsulating
for tunneling a reference to the vport in question is required to access
associated devices such as cryptodevs. This patch adds this pointer and
will enable future work with cryptodevs that are associated with a
vport.

Signed-off-by: Ian Stokes 
---
 datapath/linux/compat/include/linux/openvswitch.h |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index bc6c94b..afa7faf 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -723,6 +723,8 @@ struct ovs_action_hash {
  * @tnl_port: To identify tunnel port to pass header info.
  * @out_port: Physical port to send encapsulated packet.
  * @header_len: Length of the header to be pushed.
+ * @dev: Pointer to vport so that the cryptodev parameters associated with the
+ * vport can be accessed at the callback function.
  * @tnl_type: This is only required to format this header.  Otherwise
  * ODP layer can not parse %header.
  * @header: Partial header for the tunnel. Tunnel push action can use
@@ -732,6 +734,7 @@ struct ovs_action_push_tnl {
odp_port_t tnl_port;
odp_port_t out_port;
uint32_t header_len;
+struct netdev_vport *dev;
uint32_t tnl_type; /* For logging. */
uint32_t header[TNL_PUSH_HEADER_SIZE / 4];
 };
-- 
1.7.0.7

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [RFC PATCH v2 00/10] RFC for Userspace IPsec Interface

2017-08-25 Thread Ian Stokes

Hi all,

I've started to work on enabling IPsec for userspace in OVS.

This RFC patchset provides a very basic implementation allowing users to test
a simple VM to VM setup. The patch set is far from complete and will require
a lot more work before I consider submitting from upstream but I'm sharing
what I have to date to gather feedback on the approach and some of the key
considerations I've come across.

I'd like to discuss 2 areas in this cover letter which are not directly
covered in the accompanying RFC patches.

(i) Usecase: Securing isolated tenant VM traffic.
(ii) Known Issues 

(i) Usecase: Securing isolated tenant VM traffic.

Users can isolate traffic between VMs in a data center by the use of an
appropriate tunneling protocol i.e. VXLAN.

However although the traffic is isolated in terms of its source and
destination, it is not encrypted.

A rogue entity with access to the network could listen and examine the clear
text payload of this traffic between VMs.

The aim of this work is to introduce IPsec in userspace to secure the traffic
payloads.

As such, the malicious entity could still see the traffic as it traverses the
network but the payload of the traffic will be secured via encryption and
authentication provided by IPsec.

The steps involved in securing the vxlan payload would be done with VXLAN over
IPsec (transport Mode). This would look as follows

1.) Original header | Payload ! before VXLAN

2.) Outer header | UDP |VXLAN | Original header | Payload ! after VXLAN

3.) Outer header | ESP | Encrypt ( UDP | VXLAN | Original header | Payload )
! after IPsec transport mode

A more detailed description of the expected packet format is given below.

   Outer Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Outer Destination MAC Address |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Outer Destination MAC Address | Outer Source MAC Address  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Outer Source MAC Address   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |OptnlEthtype = C-Tag 802.1Q| Outer.VLAN Tag Information|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Ethertype = 0x0800|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer IPv4 Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|  Total Length |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Identification|Flags|  Fragment Offset|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live | Protocol = 50 (ESP) |   Header Checksum   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Outer Source IPv4 Address   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Outer Destination IPv4 Address  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   
   ESP Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |SPI (Security Parameter Index) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Sequence Number  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer UDP Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Source Port |   Dest Port = VXLAN Port  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   UDP Length  |UDP Checksum   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   VXLAN Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|R|R|R|I|R|R|R|Reserved   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |VXLAN Network Identifier (VNI) |   Reserved|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   
   Inner Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Inner Destination MAC Address |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Inner Destination MAC Address | Inner Source MAC Address  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Inner Source MAC Address   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |OptnlEthtype = C-Tag 802.1Q| Inner.VLAN Tag Information|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Payload:

Re: [ovs-dev] [PATCH net-next v6 2/3] net: gso: Add GSO support for NSH

2017-08-25 Thread Jiri Benc

On Fri, 25 Aug 2017 22:20:04 +0800, Yi Yang wrote:
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -766,7 +766,7 @@ struct sk_buff {
>   __u8ndisc_nodetype:2;
>  #endif
>   __u8ipvs_property:1;
> - __u8inner_protocol_type:1;
> + __u8inner_protocol_type:2;

Adding anything to sk_buff is pretty much forbidden. You can't add more
bytes to it and there are no more free bits to use, either.

Luckily, we still have one byte hole next to inner_ipproto that we can
use. What is needed is renaming of ENCAP_TYPE_IPPROTO to ENCAP_TYPE_L3
and storing the L3 type in the unused byte. It's not beautiful (would
be better to use ethertype than a custom enum) but it will work.

While looking at this, I realized that GSO for VXLAN-GPE is broken,
too. Let me fix it by implementing what I described above which will
make your patch much easier.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-08-25 Thread Darrell Ball

Thank you for this contribution
It is a big step forward for HWOL; even partial offload for these patches have 
a significant benefit.
I had done a first pass on Wednesday when the patches came out but I’ll send 
comments out on Monday 

On 8/22/17, 11:24 PM, "Yuanhan Liu"  wrote:

Hi,

Here is a joint work from Mellanox and Napatech, to enable the flow hw
offload with the DPDK generic flow interface (rte_flow).

The basic idea is to associate the flow with a mark id (a unit32_t number).
Later, we then get the flow directly from the mark id, bypassing the heavy
emc processing, including miniflow_extract.

The association is done with CMAP in patch 1. It also resues the flow
APIs introduced while adding the tc offloads. The emc bypassing is done
in patch 2. The flow offload is done in patch 4, which mainly does two
things:

- translate the ovs match to DPDK rte flow patterns
- bind those patterns with a MARK action.

Afterwards, the NIC will set the mark id in every pkt's mbuf when it
matches the flow. That's basically how we could get the flow directly
from the received mbuf.

While testing with PHY-PHY forwarding with one core and one queue, I got
almost 80% performance boost. For PHY-vhost forwarding, I got about 50%
performance boost.


Though that being said, this patchset still has issues unresolved. The
major issue is that maybe most NIC (for instance, Mellanox and Intel)
can not support a pure MARK action. It has to be used together with a
QUEUE action, which in turn needs a queue index. That comes to the issue:
the queue index is not given in the flow context. To make it work, patch
5 just set the queue index to 0, which is obviously wrong. One possible
solution is to record the rxq and pass it down to the flow creation
stage. It would be much better, but it's still far away from being perfect.
Because it might have changed the steering rules stealthily, which may
break the default RSS setup by OVS-DPDK.


The reason I still want to send it out is to get more comments/thoughts
from community on this whole patchset. Meanwhile, I will try to resolve
the QUEUE action issue.

Note that it's disabled by default, which can be enabled by:

$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

Thanks.

--yliu

---
Finn Christensen (3):
  netdev-dpdk: implement flow put with rte flow
  netdev-dpdk: retry with queue action
  netdev-dpdk: set FDIR config

Yuanhan Liu (4):
  dpif-netdev: associate flow with a mark id
  dpif-netdev: retrieve flow directly from the flow mark
  netdev-dpdk: convert ufid to dpdk flow
  netdev-dpdk: remove offloaded flow on deletion

 lib/dp-packet.h   |  13 ++
 lib/dpif-netdev.c | 105 +++
 lib/netdev-dpdk.c | 511 
+-
 lib/netdev.h  |   6 +
 4 files changed, 634 insertions(+), 1 deletion(-)

-- 
2.7.4



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2] ovn: Check for known logical switch port types.

2017-08-25 Thread Mark Michelson

On Thu, Aug 24, 2017 at 2:42 PM Russell Bryant  wrote:

> On Thu, Aug 24, 2017 at 2:13 PM, Mark Michelson 
> wrote:
> > OVN is lenient with the types of logical switch ports. Maybe too
> > lenient. This patch attempts to solve this problem on two fronts:
> >
> > 1) In ovn-nbctl, if you attempt to set the port type to an unknown
> > type, the command will not end up setting the type.
> > 2) In northd, when copying the port type from the northbound database to
> > the corresponding port-binding in the southbound database, a warning
> > will be issued if the port is of an unknown type.
> >
> > Signed-off-by: Mark Michelson 
> > ---
> >  ovn/lib/ovn-util.c| 30 +++
> >  ovn/lib/ovn-util.h|  2 ++
> >  ovn/northd/ovn-northd.c   |  4 +++
> >  ovn/utilities/ovn-nbctl.c |  7 -
> >  tests/ovn-nbctl.at| 73
> +++
> >  5 files changed, 115 insertions(+), 1 deletion(-)
> >
>
> Did you look into whether it was possible to use an enum in the db
> schema?  In particular, what would happen on an upgrade of an existing
> database?  and what happens if the existing db has an invalid type in
> it already?
>
> I just want to make sure we clearly rule that out first.
>

I just did a couple of experiments. I updated the DB schema to change the
logical switch port type to be an enum instead of a string.

I created an environment where I had clean values in the DB. I performed an
upgrade and the northbound DB stayed intact and seemed fine.

I then created an environment where there were logical switch ports with
dirty values in the northbound DB. When I ran the upgrade this time, the
entire northbound DB was empty afterwards. It wasn't just the offending
switch ports that got removed; it was literally everything.

My opinion on this is that the schema change is too risky. If it only
removed offending switch ports, it might not be so bad. But a single switch
port with a bad type results in the entire DB being empty. What do you
think?

Mark Michelson
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v6 1/3] net: add NSH header structures and helpers

2017-08-25 Thread Jiri Benc

On Fri, 25 Aug 2017 22:20:03 +0800, Yi Yang wrote:
> --- a/include/uapi/linux/if_ether.h
> +++ b/include/uapi/linux/if_ether.h
> @@ -138,6 +138,7 @@
>  #define ETH_P_IEEE802154 0x00F6  /* IEEE802.15.4 frame   
> */
>  #define ETH_P_CAIF   0x00F7  /* ST-Ericsson CAIF protocol*/
>  #define ETH_P_XDSA   0x00F8  /* Multiplexed DSA protocol */
> +#define ETH_P_NSH0x894F  /* Ethertype for NSH. */

This is in a wrong section. It belongs to Ethernet protocols a few lines
higher in the file (after ETH_P_HSR).

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] windows,python: remove dead code in send_wait

2017-08-25 Thread Alin Balutoiu

Found while looking over the code.

Signed-off-by: Alin Balutoiu 
Suggested-by: Alin Gabriel Serdean 
---
 python/ovs/stream.py | 2 --
 1 file changed, 2 deletions(-)

diff --git a/python/ovs/stream.py b/python/ovs/stream.py
index 717ea18..d4c1edb 100644
--- a/python/ovs/stream.py
+++ b/python/ovs/stream.py
@@ -494,8 +494,6 @@ class Stream(object):
 self.wait(poller, Stream.W_RECV)
 
 def send_wait(self, poller):
-if sys.platform == 'win32':
-poller.fd_wait(self.connect.hEvent, ovs.poller.POLLIN)
 self.wait(poller, Stream.W_SEND)
 
 def __del__(self):
-- 
2.10.0.windows.1
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH 1/2] windows, python: create a different event for sockets

2017-08-25 Thread Alin Balutoiu

At the moment the sockets on Windows use the same events
that are being created for the pipes.

This is not correct because they should be different events.

This patch introduces a new event which should be used for sockets.
The new event needs to be set on automatic reset with its initial
state not signaled.

Signed-off-by: Alin Balutoiu 
Co-authored-by: Alin Gabriel Serdean 
Signed-off-by: Alin Gabriel Serdean 
---
 python/ovs/stream.py | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/python/ovs/stream.py b/python/ovs/stream.py
index 717ea18..9d0536d 100644
--- a/python/ovs/stream.py
+++ b/python/ovs/stream.py
@@ -102,10 +102,6 @@ class Stream(object):
 self.socket = socket
 self.pipe = pipe
 if sys.platform == 'win32':
-self._read = pywintypes.OVERLAPPED()
-self._read.hEvent = winutils.get_new_event()
-self._write = pywintypes.OVERLAPPED()
-self._write.hEvent = winutils.get_new_event()
 if pipe is not None:
 # Flag to check if fd is a server HANDLE.  In the case of a
 # server handle we have to issue a disconnect before closing
@@ -114,6 +110,13 @@ class Stream(object):
 suffix = name.split(":", 1)[1]
 suffix = ovs.util.abs_file_name(ovs.dirs.RUNDIR, suffix)
 self._pipename = winutils.get_pipe_name(suffix)
+self._read = pywintypes.OVERLAPPED()
+self._read.hEvent = winutils.get_new_event()
+self._write = pywintypes.OVERLAPPED()
+self._write.hEvent = winutils.get_new_event()
+else:
+self._wevent = winutils.get_new_event(bManualReset=False,
+  bInitialState=False)
 
 self.name = name
 if status == errno.EAGAIN:
@@ -459,24 +462,24 @@ class Stream(object):
   win32file.FD_CLOSE)
 try:
 win32file.WSAEventSelect(self.socket,
- self._read.hEvent,
+ self._wevent,
  read_flags)
 except pywintypes.error as e:
 vlog.err("failed to associate events with socket: %s"
  % e.strerror)
-poller.fd_wait(self._read.hEvent, ovs.poller.POLLIN)
+poller.fd_wait(self._wevent, ovs.poller.POLLIN)
 else:
 write_flags = (win32file.FD_WRITE |
win32file.FD_CONNECT |
win32file.FD_CLOSE)
 try:
 win32file.WSAEventSelect(self.socket,
- self._write.hEvent,
+ self._wevent,
  write_flags)
 except pywintypes.error as e:
 vlog.err("failed to associate events with socket: %s"
  % e.strerror)
-poller.fd_wait(self._write.hEvent, ovs.poller.POLLOUT)
+poller.fd_wait(self._wevent, ovs.poller.POLLOUT)
 else:
 if wait == Stream.W_RECV:
 if self._read:
-- 
2.10.0.windows.1
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable NSH support

2017-08-25 Thread Yi Yang

OVS master and 2.8 branch has merged NSH userspace
patch series, this patch is to enable NSH support
in kernel data path in order that OVS can support
NSH in compat mode by porting this.

Signed-off-by: Yi Yang 
---
 drivers/net/vxlan.c  |   7 +
 include/uapi/linux/openvswitch.h |  28 +++
 net/openvswitch/actions.c| 178 +
 net/openvswitch/flow.c   |  55 ++
 net/openvswitch/flow.h   |  11 ++
 net/openvswitch/flow_netlink.c   | 404 ++-
 net/openvswitch/flow_netlink.h   |   4 +
 7 files changed, 686 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ae3a1da..a36c41e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #if IS_ENABLED(CONFIG_IPV6)
 #include 
@@ -1268,6 +1269,9 @@ static bool vxlan_parse_gpe_hdr(struct vxlanhdr *unparsed,
case VXLAN_GPE_NP_IPV6:
*protocol = htons(ETH_P_IPV6);
break;
+   case VXLAN_GPE_NP_NSH:
+   *protocol = htons(ETH_P_NSH);
+   break;
case VXLAN_GPE_NP_ETHERNET:
*protocol = htons(ETH_P_TEB);
break;
@@ -1807,6 +1811,9 @@ static int vxlan_build_gpe_hdr(struct vxlanhdr *vxh, u32 
vxflags,
case htons(ETH_P_IPV6):
gpe->next_protocol = VXLAN_GPE_NP_IPV6;
return 0;
+   case htons(ETH_P_NSH):
+   gpe->next_protocol = VXLAN_GPE_NP_NSH;
+   return 0;
case htons(ETH_P_TEB):
gpe->next_protocol = VXLAN_GPE_NP_ETHERNET;
return 0;
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 156ee4c..91dee5b 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -333,6 +333,7 @@ enum ovs_key_attr {
OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking label */
OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct ovs_key_ct_tuple_ipv4 */
OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct ovs_key_ct_tuple_ipv6 */
+   OVS_KEY_ATTR_NSH,   /* Nested set of ovs_nsh_key_* */
 
 #ifdef __KERNEL__
OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
@@ -491,6 +492,29 @@ struct ovs_key_ct_tuple_ipv6 {
__u8   ipv6_proto;
 };
 
+enum ovs_nsh_key_attr {
+   OVS_NSH_KEY_ATTR_BASE,  /* struct ovs_nsh_key_base. */
+   OVS_NSH_KEY_ATTR_MD1,   /* struct ovs_nsh_key_md1. */
+   OVS_NSH_KEY_ATTR_MD2,   /* variable-length octets for MD type 2. */
+   __OVS_NSH_KEY_ATTR_MAX
+};
+
+#define OVS_NSH_KEY_ATTR_MAX (__OVS_NSH_KEY_ATTR_MAX - 1)
+
+struct ovs_nsh_key_base {
+   __u8 flags;
+   __u8 ttl;
+   __u8 mdtype;
+   __u8 np;
+   __be32 path_hdr;
+};
+
+#define NSH_MD1_CONTEXT_SIZE 4
+
+struct ovs_nsh_key_md1 {
+   __be32 context[NSH_MD1_CONTEXT_SIZE];
+};
+
 /**
  * enum ovs_flow_attr - attributes for %OVS_FLOW_* commands.
  * @OVS_FLOW_ATTR_KEY: Nested %OVS_KEY_ATTR_* attributes specifying the flow
@@ -806,6 +830,8 @@ struct ovs_action_push_eth {
  * packet.
  * @OVS_ACTION_ATTR_POP_ETH: Pop the outermost Ethernet header off the
  * packet.
+ * @OVS_ACTION_ATTR_PUSH_NSH: push NSH header to the packet.
+ * @OVS_ACTION_ATTR_POP_NSH: pop the outermost NSH header off the packet.
  *
  * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
  * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
@@ -835,6 +861,8 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_TRUNC,/* u32 struct ovs_action_trunc. */
OVS_ACTION_ATTR_PUSH_ETH, /* struct ovs_action_push_eth. */
OVS_ACTION_ATTR_POP_ETH,  /* No argument. */
+   OVS_ACTION_ATTR_PUSH_NSH, /* Nested OVS_NSH_KEY_ATTR_*. */
+   OVS_ACTION_ATTR_POP_NSH,  /* No argument. */
 
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index a54a556..9ca1a84 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -43,6 +43,7 @@
 #include "flow.h"
 #include "conntrack.h"
 #include "vport.h"
+#include "flow_netlink.h"
 
 struct deferred_action {
struct sk_buff *skb;
@@ -380,6 +381,95 @@ static int push_eth(struct sk_buff *skb, struct 
sw_flow_key *key,
return 0;
 }
 
+static int push_nsh(struct sk_buff *skb, struct sw_flow_key *key,
+   const struct nsh_hdr *nsh_src)
+{
+   struct nsh_hdr *nsh;
+   size_t length = nsh_hdr_len(nsh_src);
+   u8 next_proto;
+
+   if (key->mac_proto == MAC_PROTO_ETHERNET) {
+   next_proto = NSH_P_ETHERNET;
+   } else {
+   switch (ntohs(skb->protocol)) {
+   case ETH_P_IP:
+   next_proto = NSH_P_IPV4;
+

[ovs-dev] [PATCH net-next v6 2/3] net: gso: Add GSO support for NSH

2017-08-25 Thread Yi Yang

NSH (Network Service Header)[1] is a new protocol for service
function chaining, it can be handled as a L3 protocol like
IPv4 and IPv6, Eth + NSH + Inner packet or VxLAN-gpe + NSH +
Inner packet are two typical use cases.

We need to enbale Open vSwitch to support NSH, this patch
is to make Linux network infrastructure able to support
NSH GSO for big packet.

[1] https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/

Signed-off-by: Yi Yang 
---
 include/linux/netdevice.h |   1 +
 include/linux/skbuff.h|   8 +++-
 net/Kconfig   |   1 +
 net/Makefile  |   1 +
 net/core/dev.c|  14 ++
 net/ipv4/udp_offload.c|   7 +++
 net/nsh/Kconfig   |  11 +
 net/nsh/Makefile  |   4 ++
 net/nsh/nsh_gso.c | 106 ++
 9 files changed, 151 insertions(+), 2 deletions(-)
 create mode 100644 net/nsh/Kconfig
 create mode 100644 net/nsh/Makefile
 create mode 100644 net/nsh/nsh_gso.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c5475b3..b017418 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3253,6 +3253,7 @@ struct sk_buff *napi_get_frags(struct napi_struct *napi);
 gro_result_t napi_gro_frags(struct napi_struct *napi);
 struct packet_offload *gro_find_receive_by_type(__be16 type);
 struct packet_offload *gro_find_complete_by_type(__be16 type);
+struct packet_offload *find_gso_segment_by_type(__be16 type);
 
 static inline void napi_free_frags(struct napi_struct *napi)
 {
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7594e19..aafc8ff 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -766,7 +766,7 @@ struct sk_buff {
__u8ndisc_nodetype:2;
 #endif
__u8ipvs_property:1;
-   __u8inner_protocol_type:1;
+   __u8inner_protocol_type:2;
__u8remcsum_offload:1;
 #ifdef CONFIG_NET_SWITCHDEV
__u8offload_fwd_mark:1;
@@ -2174,12 +2174,16 @@ static inline void skb_tailroom_reserve(struct sk_buff 
*skb, unsigned int mtu,
 
 #define ENCAP_TYPE_ETHER   0
 #define ENCAP_TYPE_IPPROTO 1
+#define ENCAP_TYPE_NSH 2
 
 static inline void skb_set_inner_protocol(struct sk_buff *skb,
  __be16 protocol)
 {
skb->inner_protocol = protocol;
-   skb->inner_protocol_type = ENCAP_TYPE_ETHER;
+   if (skb->inner_protocol == htons(ETH_P_NSH))
+   skb->inner_protocol_type = ENCAP_TYPE_NSH;
+   else
+   skb->inner_protocol_type = ENCAP_TYPE_ETHER;
 }
 
 static inline void skb_set_inner_ipproto(struct sk_buff *skb,
diff --git a/net/Kconfig b/net/Kconfig
index 7d57ef3..818df7a 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -240,6 +240,7 @@ source "net/switchdev/Kconfig"
 source "net/l3mdev/Kconfig"
 source "net/qrtr/Kconfig"
 source "net/ncsi/Kconfig"
+source "net/nsh/Kconfig"
 
 config RPS
bool
diff --git a/net/Makefile b/net/Makefile
index bed80fa..82bfac6 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -85,3 +85,4 @@ obj-y += l3mdev/
 endif
 obj-$(CONFIG_QRTR) += qrtr/
 obj-$(CONFIG_NET_NCSI) += ncsi/
+obj-$(CONFIG_NET_NSH_GSO)  += nsh/
diff --git a/net/core/dev.c b/net/core/dev.c
index 270b547..02a988d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4874,6 +4874,20 @@ struct packet_offload *gro_find_complete_by_type(__be16 
type)
 }
 EXPORT_SYMBOL(gro_find_complete_by_type);
 
+struct packet_offload *find_gso_segment_by_type(__be16 type)
+{
+   struct list_head *offload_head = _base;
+   struct packet_offload *ptype;
+
+   list_for_each_entry_rcu(ptype, offload_head, list) {
+   if (ptype->type != type || !ptype->callbacks.gso_segment)
+   continue;
+   return ptype;
+   }
+   return NULL;
+}
+EXPORT_SYMBOL(find_gso_segment_by_type);
+
 static void napi_skb_free_stolen_head(struct sk_buff *skb)
 {
skb_dst_drop(skb);
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 97658bf..31f9383 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -155,6 +155,7 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
__be16 protocol = skb->protocol;
const struct net_offload **offloads;
const struct net_offload *ops;
+   const struct packet_offload *po;
struct sk_buff *segs = ERR_PTR(-EINVAL);
struct sk_buff *(*gso_inner_segment)(struct sk_buff *skb,
 netdev_features_t features);
@@ -173,6 +174,12 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
goto out_unlock;
gso_inner_segment = ops->callbacks.gso_segment;
break;
+   case ENCAP_TYPE_NSH:
+   protocol =

[ovs-dev] [PATCH net-next v6 1/3] net: add NSH header structures and helpers

2017-08-25 Thread Yi Yang

NSH (Network Service Header)[1] is a new protocol for service
function chaining, it can be handled as a L3 protocol like
IPv4 and IPv6, Eth + NSH + Inner packet or VxLAN-gpe + NSH +
Inner packet are two typical use cases.

This patch adds NSH header structures and helpers for NSH GSO
support and Open vSwitch NSH support.

[1] https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/

Signed-off-by: Yi Yang 
---
 include/net/nsh.h | 307 ++
 include/uapi/linux/if_ether.h |   1 +
 2 files changed, 308 insertions(+)
 create mode 100644 include/net/nsh.h

diff --git a/include/net/nsh.h b/include/net/nsh.h
new file mode 100644
index 000..72d62df
--- /dev/null
+++ b/include/net/nsh.h
@@ -0,0 +1,307 @@
+#ifndef __NET_NSH_H
+#define __NET_NSH_H 1
+
+/*
+ * Network Service Header:
+ *  0   1   2   3
+ *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |Ver|O|U|TTL|   Length  |U|U|U|U|MD Type| Next Protocol |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |  Service Path Identifier (SPI)| Service Index |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |   |
+ * ~   Mandatory/Optional Context Headers  ~
+ * |   |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * Version: The version field is used to ensure backward compatibility
+ * going forward with future NSH specification updates.  It MUST be set
+ * to 0x0 by the sender, in this first revision of NSH.  Given the
+ * widespread implementation of existing hardware that uses the first
+ * nibble after an MPLS label stack for ECMP decision processing, this
+ * document reserves version 01b and this value MUST NOT be used in
+ * future versions of the protocol.  Please see [RFC7325] for further
+ * discussion of MPLS-related forwarding requirements.
+ *
+ * O bit: Setting this bit indicates an Operations, Administration, and
+ * Maintenance (OAM) packet.  The actual format and processing of SFC
+ * OAM packets is outside the scope of this specification (see for
+ * example [I-D.ietf-sfc-oam-framework] for one approach).
+ *
+ * The O bit MUST be set for OAM packets and MUST NOT be set for non-OAM
+ * packets.  The O bit MUST NOT be modified along the SFP.
+ *
+ * SF/SFF/SFC Proxy/Classifier implementations that do not support SFC
+ * OAM procedures SHOULD discard packets with O bit set, but MAY support
+ * a configurable parameter to enable forwarding received SFC OAM
+ * packets unmodified to the next element in the chain.  Forwarding OAM
+ * packets unmodified by SFC elements that do not support SFC OAM
+ * procedures may be acceptable for a subset of OAM functions, but can
+ * result in unexpected outcomes for others, thus it is recommended to
+ * analyze the impact of forwarding an OAM packet for all OAM functions
+ * prior to enabling this behavior.  The configurable parameter MUST be
+ * disabled by default.
+ *
+ * TTL: Indicates the maximum SFF hops for an SFP.  This field is used
+ * for service plane loop detection.  The initial TTL value SHOULD be
+ * configurable via the control plane; the configured initial value can
+ * be specific to one or more SFPs.  If no initial value is explicitly
+ * provided, the default initial TTL value of 63 MUST be used.  Each SFF
+ * involved in forwarding an NSH packet MUST decrement the TTL value by
+ * 1 prior to NSH forwarding lookup.  Decrementing by 1 from an incoming
+ * value of 0 shall result in a TTL value of 63.  The packet MUST NOT be
+ * forwarded if TTL is, after decrement, 0.
+ *
+ * All other flag fields, marked U, are unassigned and available for
+ * future use, see Section 11.2.1.  Unassigned bits MUST be set to zero
+ * upon origination, and MUST be ignored and preserved unmodified by
+ * other NSH supporting elements.  Elements which do not understand the
+ * meaning of any of these bits MUST NOT modify their actions based on
+ * those unknown bits.
+ *
+ * Length: The total length, in 4-byte words, of NSH including the Base
+ * Header, the Service Path Header, the Fixed Length Context Header or
+ * Variable Length Context Header(s).  The length MUST be 0x6 for MD
+ * Type equal to 0x1, and MUST be 0x2 or greater for MD Type equal to
+ * 0x2.  The length of the NSH header MUST be an integer multiple of 4
+ * bytes, thus variable length metadata is always padded out to a
+ * multiple of 4 bytes.
+ *
+ * MD Type: Indicates the format of NSH beyond the mandatory Base Header
+ * and the Service Path Header.  MD Type defines the format of the
+ * metadata being carried.
+ *
+ * 0x0 - This is a reserved value.  Implementations SHOULD silently
+ * discard

[ovs-dev] [PATCH net-next v6 0/3] openvswitch: add NSH support

2017-08-25 Thread Yi Yang

v5->v6
 - Fix the rest comments for v4.
 - Add NSH GSO support for VxLAN-gpe + NSH and
   Eth + NSH.

v4->v5
 - Fix many comments by Jiri Benc and Eric Garver
   for v4.

v3->v4
 - Add new NSH match field ttl
 - Update NSH header to the latest format
   which will be final format and won't change
   per its author's confirmation.
 - Fix comments for v3.

v2->v3
 - Change OVS_KEY_ATTR_NSH to nested key to handle
   length-fixed attributes and length-variable
   attriubte more flexibly.
 - Remove struct ovs_action_push_nsh completely
 - Add code to handle nested attribute for SET_MASKED
 - Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH
   to transfer NSH header data.
 - Fix comments and coding style issues by Jiri and Eric

v1->v2
 - Change encap_nsh and decap_nsh to push_nsh and pop_nsh
 - Dynamically allocate struct ovs_action_push_nsh for
   length-variable metadata.

This patch series is to enable NSH support in OVS kernel
data path, it also adds NSH GSO support for big packet.

Yi Yang (3):
  net: add NSH header structures and helpers
  net: gso: Add GSO support for NSH
  openvswitch: enable NSH support

 drivers/net/vxlan.c  |   7 +
 include/linux/netdevice.h|   1 +
 include/linux/skbuff.h   |   8 +-
 include/net/nsh.h| 307 +
 include/uapi/linux/if_ether.h|   1 +
 include/uapi/linux/openvswitch.h |  28 +++
 net/Kconfig  |   1 +
 net/Makefile |   1 +
 net/core/dev.c   |  14 ++
 net/ipv4/udp_offload.c   |   7 +
 net/nsh/Kconfig  |  11 ++
 net/nsh/Makefile |   4 +
 net/nsh/nsh_gso.c| 106 ++
 net/openvswitch/actions.c| 178 +
 net/openvswitch/flow.c   |  55 ++
 net/openvswitch/flow.h   |  11 ++
 net/openvswitch/flow_netlink.c   | 404 ++-
 net/openvswitch/flow_netlink.h   |   4 +
 18 files changed, 1145 insertions(+), 3 deletions(-)
 create mode 100644 include/net/nsh.h
 create mode 100644 net/nsh/Kconfig
 create mode 100644 net/nsh/Makefile
 create mode 100644 net/nsh/nsh_gso.c

-- 
2.5.5

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v4] dpif-netdev: Avoid reading RSS hash when EMC is disabled

2017-08-25 Thread antonio . fischetti

When EMC is disabled the reading of RSS hash is skipped.
Also, for packets that are not recirculated it retrieves
the hash value without considering the recirc id.

Signed-off-by: Antonio Fischetti 
---

 V4
  - reworked to remove dependencies from other patches in
patchset "Skip EMC for recirc pkts and other optimizations."
https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337320.html

  - measurements were repeated with the latest head of master.

 Port-to-Port Test
 =
Software built with "-O2 -march=native -g".

I measured the Rx rate regardless of pkt loss by sending 1 UDP flow,
64B packets, at the line-rate.

2 PMDs with 3 Tx queues.

Flows setup:
  in_port=dpdk0 actions=output:dpdk1
  in_port=dpdk1 actions=output:dpdk0

Results
---
Values are for the Rx rate in Mpps, regardless of packet loss.

RSS column:
   Yes: RSS hash is provided by the NIC
   No: RSS is disabled and the 5-tuple hash must be
   computed in software.

   Note: to simulate RSS disabled I added the line
   dp_packet_rss_valid(struct dp_packet *p)
   {
   +return false;
   #ifdef DPDK_NETDEV

EMC column:
   Yes: default probability insertion,
   No: EMC disabled.

Orig means Commit ID:
  75f9e007e7f7eb91461e238f882d1c539c56bb8d

+--++
+-+-+Orig  | Orig + this|
| RSS | EMC |  |patch   |
+-+-+--++
| Yes | Yes | 12.02, 11.38 | 12.18, 11.32   |
| Yes |  No |  8.35,  8.36 |  8.36,  8.38   |
+-+-+--++
|  No | Yes |  9.91, 11.15 |  9.84, 11.08   |
|  No |  No |  7.83,  7.87 |  8.33,  8.35   |
+-+-+--++
---
 lib/dpif-netdev.c | 32 
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e2cd931..1157998 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4567,6 +4567,22 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, 
struct dp_packet *packet_,
 }
 
 static inline uint32_t
+dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet,
+const struct miniflow *mf)
+{
+uint32_t hash;
+
+if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
+hash = dp_packet_get_rss_hash(packet);
+} else {
+hash = miniflow_hash_5tuple(mf, 0);
+dp_packet_set_rss_hash(packet, hash);
+}
+
+return hash;
+}
+
+static inline uint32_t
 dpif_netdev_packet_get_rss_hash(struct dp_packet *packet,
 const struct miniflow *mf)
 {
@@ -4701,10 +4717,18 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
 }
 miniflow_extract(packet, >mf);
 key->len = 0; /* Not computed yet. */
-key->hash = dpif_netdev_packet_get_rss_hash(packet, >mf);
-
-/* If EMC is disabled skip emc_lookup */
-flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key);
+/* If EMC is disabled skip hash computation and emc_lookup */
+if (OVS_LIKELY(cur_min)) {
+if (!md_is_valid) {
+key->hash = dpif_netdev_packet_get_rss_hash_orig_pkt(packet,
+>mf);
+} else {
+key->hash = dpif_netdev_packet_get_rss_hash(packet, >mf);
+}
+flow = emc_lookup(flow_cache, key);
+} else {
+flow = NULL;
+}
 if (OVS_LIKELY(flow)) {
 dp_netdev_queue_batches(packet, flow, >mf, batches,
 n_batches);
-- 
2.4.11

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 2/4] dpif-netdev: Avoid reading RSS hash when EMC is disabled

2017-08-25 Thread Fischetti, Antonio

Sure Darrell, I'll do that.
Thanks!

Antonio

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Friday, August 25, 2017 9:59 AM
> To: Fischetti, Antonio ; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v3 2/4] dpif-netdev: Avoid reading RSS hash when
> EMC is disabled
> 
> Hi Antonio
> 
> Can the dependency of this Patch 2 on Patch 1 be removed, while Patch 1 is
> being discussed ?
> 
> Thanks Darrell
> 
> On 8/13/17, 11:32 PM, "ovs-dev-boun...@openvswitch.org on behalf of Darrell
> Ball"  wrote:
> 
> I did not try it yet, but seems reasonable
> If the hash is needed for something else, it will be read at that point.
> 
> -Original Message-
> From:  on behalf of
> "antonio.fische...@intel.com" 
> Date: Friday, August 11, 2017 at 8:52 AM
> To: "d...@openvswitch.org" 
> Subject: [ovs-dev] [PATCH v3 2/4] dpif-netdev: Avoid reading RSS hash when
>   EMC is disabled
> 
> When EMC is disabled the reading of RSS hash is skipped.
> Also, for packets that are not recirculated it retrieves the hash
> value without considering the recirc id.
> 
> Signed-off-by: Antonio Fischetti 
> ---
> This patch depends on previous patch in this series.
> 
> Port-to-Port Test
> =
> Software built with "-O2 -march=native -g".
> 
> I measured the Rx rate regardless of pkt loss by sending 1 UDP flow,
> 64B packets, at the line-rate.
> 
> 2 PMDs with 3 Tx queues.
> 
> Flows setup:
>   in_port=dpdk0 actions=output:dpdk1
>   in_port=dpdk1 actions=output:dpdk0
> 
> Results
> ---
> Values are for the Rx rate in Mpps, regardless of packet loss.
> 
> RSS column:
>Yes: RSS hash is provided by the NIC
>No: RSS is disabled and the 5-tuple hash must be
>computed in software.
> 
> EMC column:
>Yes: default probability insertion,
>No: EMC disabled.
> 
> Orig OvS-DPDK means Commit ID:
>   6b1babacc3ca0488e07596bf822fe356c9bab646
> 
> +--+++
> +-+-+Orig  | Orig + patch 1 | Orig + patch 1 |
> | RSS | EMC |  ||  this patch|
> +-+-+--+++
> | Yes | Yes | 11.99, 11.41 | 12.20, 11.31   | 12.20, 11.31   |
> | Yes |  No |  8.32,  8.42 |  8.35,  8.39   |  8.62, 8.62|
> +-+-+--+++
> |  No | Yes |  9.87, 11.15 |  9.79, 11.20   |  9.85, 11.09   |
> |  No |  No |  7.82,  7.84 |  7.84,  7.93   |  8.40,  8.38   |
> +-+-+--+++
> ---
>  lib/dpif-netdev.c | 20 +++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 8f6b96b..0db6f83 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4578,6 +4578,22 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread
> *pmd, struct dp_packet *packet_,
>  }
> 
>  static inline uint32_t
> +dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet,
> +const struct miniflow *mf)
> +{
> +uint32_t hash;
> +
> +if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
> +hash = dp_packet_get_rss_hash(packet);
> +} else {
> +hash = miniflow_hash_5tuple(mf, 0);
> +dp_packet_set_rss_hash(packet, hash);
> +}
> +
> +return hash;
> +}
> +
> +static inline uint32_t
>  dpif_netdev_packet_get_rss_hash(struct dp_packet *packet,
>  const struct miniflow *mf)
>  {
> @@ -4715,7 +4731,6 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>  }
>  miniflow_extract(packet, >mf);
>  key->len = 0; /* Not computed yet. */
> -key->hash = dpif_netdev_packet_get_rss_hash(packet, 
> >mf);
> 
>  /*
>   * EMC lookup is skipped when one or both of the following
> @@ -4732,9 +4747,12 @@ emc_processing(struct dp_netdev_pmd_thread 
> *pmd,
>   */
>  if (OVS_LIKELY(cur_min)) {
>  if (!md_is_valid) {
> +key->hash =
> dpif_netdev_packet_get_rss_hash_orig_pkt(packet,
> +

[ovs-dev] [PATCH 0/2] vHost Dequeue Zero Copy

2017-08-25 Thread Ciara Loftus

This patch enables optional dequeue zero copy for vHost ports.
This gives a performance increase for some use cases. I'm using
the cover letter to report my results.

vhost (vm1) -> vhost (vm2)
Using testpmd to source (txonly) in vm1 and sink (rxonly) in vm2.
4C1Q 64B packets: 5.05Mpps -> 5.52Mpps = 9.2% improvement

vhost (virtio_user backend 1) -> vhost (virtio_user backend 2)
Using 2 instances of testpmd, each with a virtio_user backend
connected to one of the two vhost ports created in OVS.
2C1Q 1518B packets: 2.59Mpps -> 3.09Mpps = 19.3% improvement

vhost -> phy
Using testpmd to source (txonly) and sink in the NIC
1C1Q 64B packets: 6.81Mpps -> 7.76Mpps = 13.9% improvement

phy -> vhost -> phy
No improvement measured

Ciara Loftus (2):
  netdev-dpdk: Helper function for vHost device setup
  netdev-dpdk: Enable optional dequeue zero copy for vHost User

 Documentation/topics/dpdk/vhost-user.rst |  24 
 NEWS |   4 +-
 lib/netdev-dpdk.c| 202 +--
 vswitchd/vswitch.xml |  11 ++
 4 files changed, 178 insertions(+), 63 deletions(-)

-- 
2.7.5

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] OVS-DPDK public meeting

2017-08-25 Thread Kevin Traynor

23rd August 2017

ATTENDEES: Aaron, Ben, Flavio, Darrell, Antonio, Ian, Georg, Michael,
Ori, Rohit, Sammeh, Yippeng, Frikkie, Kevin, Billy, Zoltan (Missed some)

===
GENERAL
===
- OVS 2.8
-- Finalizing NSH commits for 2.8
-- Aiming for release next week
-- RH testing (Flavio)
--- Found issue related to non-root user and selinux - Aaron sent fix
--- Possible issue on gre mtu? Will send note when testing finished

- OVS 2.9 features list (Ian)
-- Would like to forge ahead and put in docs in about a week
-- no objections on call

OVS Conf
- Nov 16/17. Not fully confirmed
- Probably won't have a dedicated OVS-DPDK day as 2 days of DPDK summit
and 2 days OVS conference should be enough time to for OVS-DPDK discussions
- OVS-DPDK submissions welcome for OVS conference


PERFORMANCE/FEATURES

- Improved Packet Drop Statistics in OVS (Rohit)
-- Proposal to improve packet counts around dropped packets
-- Will submit more info the ML

- Tx batching
-- Bhanu requested not to proceed with netdev implementation
-- dpif solution is preferred now

- non root user (Aaron)
-- Possible now on rhel/fedora with Aaron's patches and DPDK 17.05

- rxq pmd assignment (Kevin)
-- Will track cycles used by rxqs and assign to pmds based on this


HARDWARE OFFLOAD

- Classification offload (Ori/Michael)
-- Patches submitted By Yuanhan/Finn
-- Reworked to be more like to how tc offload is done
-- An unresolved issue is that MARK action cannot be supported without QUEUE


On 07/25/2017 02:25 PM, Kevin Traynor wrote:
> Hi All,
> 
> The OVS-DPDK public meetings have moved to Wednesday's at the same time.
> Everyone is welcome, it's a chance to share status/plans etc.
> 
> It's scheduled for every 2 weeks starting 26th July. Meeting time is
> currently @ 4pm UTC.
> 
> You can connect through Bluejeans or through dial in:
> 
> https://bluejeans.com/139318596
> 
> US: +1.408.740.7256
> UK: +44.203.608.5256
> Germany: +49.32.221.091256
> Ireland: +353.1.697.1256
> Other numbers at https://www.bluejeans.com/numbers
> Meeting ID: 139318596
> 
> thanks,
> Kevin.
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v4] netdev-dpdk: Create separate memory pool for each port.

2017-08-25 Thread antonio . fischetti

Since it's possible to delete memory pool in DPDK
we can try to estimate better required memory size
when port is reconfigured, e.g. with different number
of rx queues.

CC: Kevin Traynor 
CC: Aaron Conole 
Signed-off-by: Robert Wojciechowicz 
Co-authored-by: Antonio Fischetti 
Signed-off-by: Antonio Fischetti 
Acked-by: Ian Stokes 
---
v4:
 - code rebased
 - manage EEXIST err code after rte_pktmbuf_pool_create
 - replaced strncpy with ovs_strzcpy
 - added some VLOG_DBG msgs

v3:
 - adding memory for corner cases

v2:
 - removing mempool reference counter
 - making sure mempool name isn't longer than RTE_MEMPOOL_NAMESIZE
---
 lib/netdev-dpdk.c | 135 ++
 1 file changed, 76 insertions(+), 59 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 1aaf6f7..35da76a 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -303,14 +303,12 @@ static struct ovs_list dpdk_list 
OVS_GUARDED_BY(dpdk_mutex)
 static struct ovs_mutex dpdk_mp_mutex OVS_ACQ_AFTER(dpdk_mutex)
 = OVS_MUTEX_INITIALIZER;
 
-static struct ovs_list dpdk_mp_list OVS_GUARDED_BY(dpdk_mp_mutex)
-= OVS_LIST_INITIALIZER(_mp_list);
-
 struct dpdk_mp {
 struct rte_mempool *mp;
 int mtu;
 int socket_id;
-int refcount;
+char if_name[IFNAMSIZ];
+unsigned mp_size;
 struct ovs_list list_node OVS_GUARDED_BY(dpdk_mp_mutex);
 };
 
@@ -492,45 +490,81 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp OVS_UNUSED,
 dp_packet_init_dpdk((struct dp_packet *) pkt, pkt->buf_len);
 }
 
+/*
+ * Full DPDK memory pool name must be unique
+ * and cannot be longer than RTE_MEMPOOL_NAMESIZE
+ */
+static char *
+dpdk_mp_name(struct dpdk_mp *dmp)
+{
+uint32_t h = hash_string(dmp->if_name, 0);
+char *mp_name = xcalloc(RTE_MEMPOOL_NAMESIZE, sizeof(char));
+int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%u",
+   h, dmp->mtu, dmp->mp_size);
+if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
+return NULL;
+}
+return mp_name;
+}
+
 static struct dpdk_mp *
-dpdk_mp_create(int socket_id, int mtu)
+dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
 {
 struct dpdk_mp *dmp;
-unsigned mp_size;
 char *mp_name;
+bool mp_exists = false;
 
 dmp = dpdk_rte_mzalloc(sizeof *dmp);
 if (!dmp) {
 return NULL;
 }
-dmp->socket_id = socket_id;
+dmp->socket_id = dev->requested_socket_id;
 dmp->mtu = mtu;
-dmp->refcount = 1;
-/* XXX: this is a really rough method of provisioning memory.
- * It's impossible to determine what the exact memory requirements are
- * when the number of ports and rxqs that utilize a particular mempool can
- * change dynamically at runtime. For now, use this rough heurisitic.
+ovs_strzcpy(dmp->if_name, dev->up.name, IFNAMSIZ);
+
+/*
+ * XXX: rough estimation of memory required for port:
+ * 
+ * + 
+ * + 
+ * + 
  */
-if (mtu >= ETHER_MTU) {
-mp_size = MAX_NB_MBUF;
-} else {
-mp_size = MIN_NB_MBUF;
-}
+dmp->mp_size = dev->requested_n_rxq * dev->requested_rxq_size
++ dev->requested_n_txq * dev->requested_txq_size
++ MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
++ MIN_NB_MBUF;
 
 do {
-mp_name = xasprintf("ovs_mp_%d_%d_%u", dmp->mtu, dmp->socket_id,
-mp_size);
+mp_name = dpdk_mp_name(dmp);
+
+VLOG_DBG("Requesting a mempool of %u mbufs for netdev %s "
+ "with %d Rx and %d Tx queues.",
+ dmp->mp_size, dev->up.name,
+ dev->requested_n_rxq, dev->requested_n_txq);
 
-dmp->mp = rte_pktmbuf_pool_create(mp_name, mp_size,
+dmp->mp = rte_pktmbuf_pool_create(mp_name, dmp->mp_size,
   MP_CACHE_SZ,
   sizeof (struct dp_packet)
  - sizeof (struct rte_mbuf),
   MBUF_SIZE(mtu)
  - sizeof(struct dp_packet),
-  socket_id);
+  dmp->socket_id);
 if (dmp->mp) {
 VLOG_DBG("Allocated \"%s\" mempool with %u mbufs",
- mp_name, mp_size);
+ mp_name, dmp->mp_size);
+} else if (rte_errno == EEXIST) {
+/* A mempool with the same name already exists.  We just
+ * retrieve its pointer to be returned to the caller. */
+dmp->mp = rte_mempool_lookup(mp_name);
+VLOG_DBG("A mempool with name %s already exists at %p.",
+mp_name, dmp->mp);
+/* As the mempool create

Re: [ovs-dev] [PATCH v3 2/4] dpif-netdev: Avoid reading RSS hash when EMC is disabled

2017-08-25 Thread Darrell Ball

Hi Antonio

Can the dependency of this Patch 2 on Patch 1 be removed, while Patch 1 is 
being discussed ?

Thanks Darrell 

On 8/13/17, 11:32 PM, "ovs-dev-boun...@openvswitch.org on behalf of Darrell 
Ball"  wrote:

I did not try it yet, but seems reasonable
If the hash is needed for something else, it will be read at that point. 

-Original Message-
From:  on behalf of 
"antonio.fische...@intel.com" 
Date: Friday, August 11, 2017 at 8:52 AM
To: "d...@openvswitch.org" 
Subject: [ovs-dev] [PATCH v3 2/4] dpif-netdev: Avoid reading RSS hash when  
EMC is disabled

When EMC is disabled the reading of RSS hash is skipped.
Also, for packets that are not recirculated it retrieves the hash
value without considering the recirc id.

Signed-off-by: Antonio Fischetti 
---
This patch depends on previous patch in this series.

Port-to-Port Test
=
Software built with "-O2 -march=native -g".

I measured the Rx rate regardless of pkt loss by sending 1 UDP flow,
64B packets, at the line-rate.

2 PMDs with 3 Tx queues.

Flows setup:
  in_port=dpdk0 actions=output:dpdk1
  in_port=dpdk1 actions=output:dpdk0

Results
---
Values are for the Rx rate in Mpps, regardless of packet loss.

RSS column:
   Yes: RSS hash is provided by the NIC
   No: RSS is disabled and the 5-tuple hash must be
   computed in software.

EMC column:
   Yes: default probability insertion,
   No: EMC disabled.

Orig OvS-DPDK means Commit ID:
  6b1babacc3ca0488e07596bf822fe356c9bab646

+--+++
+-+-+Orig  | Orig + patch 1 | Orig + patch 1 |
| RSS | EMC |  ||  this patch|
+-+-+--+++
| Yes | Yes | 11.99, 11.41 | 12.20, 11.31   | 12.20, 11.31   |
| Yes |  No |  8.32,  8.42 |  8.35,  8.39   |  8.62, 8.62|
+-+-+--+++
|  No | Yes |  9.87, 11.15 |  9.79, 11.20   |  9.85, 11.09   |
|  No |  No |  7.82,  7.84 |  7.84,  7.93   |  8.40,  8.38   |
+-+-+--+++
---
 lib/dpif-netdev.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 8f6b96b..0db6f83 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4578,6 +4578,22 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread 
*pmd, struct dp_packet *packet_,
 }
 
 static inline uint32_t
+dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet,
+const struct miniflow *mf)
+{
+uint32_t hash;
+
+if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
+hash = dp_packet_get_rss_hash(packet);
+} else {
+hash = miniflow_hash_5tuple(mf, 0);
+dp_packet_set_rss_hash(packet, hash);
+}
+
+return hash;
+}
+
+static inline uint32_t
 dpif_netdev_packet_get_rss_hash(struct dp_packet *packet,
 const struct miniflow *mf)
 {
@@ -4715,7 +4731,6 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
 }
 miniflow_extract(packet, >mf);
 key->len = 0; /* Not computed yet. */
-key->hash = dpif_netdev_packet_get_rss_hash(packet, >mf);
 
 /*
  * EMC lookup is skipped when one or both of the following
@@ -4732,9 +4747,12 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
  */
 if (OVS_LIKELY(cur_min)) {
 if (!md_is_valid) {
+key->hash = 
dpif_netdev_packet_get_rss_hash_orig_pkt(packet,
+>mf);
 flow = emc_lookup(flow_cache, key);
 } else {
 /* Recirculated packet. */
+key->hash = dpif_netdev_packet_get_rss_hash(packet, 
>mf);
 if (flow_cache->n_entries & 
EMC_RECIRCT_NO_INSERT_THRESHOLD) {
 /* EMC occupancy is over the threshold.  We skip 
EMC
  * lookup for

Re: [ovs-dev] [PATCH v6 0/6] OVS-DPDK rxq to pmd assignment improvements.

2017-08-25 Thread Darrell Ball

I applied the series to https://github.com/darball/ovs/commits/dpdk_merge



On 8/24/17, 4:37 PM, "Kevin Traynor"  wrote:

For the DPDK datapath, by default rxqs are assigned to available pmds
in round robin order with no weight or priority.

It can happen that some very busy queues are handled by one pmd which
does not have enough cycles to prevent packets being dropped on them.
While at the same time another pmd which handles queues with no traffic
on them is essentially idle.

Rxq to pmd assignment happens as a result of a number of events and
when it does, the same unweighted round robin approach is applied
each time.

This patchset proposes to improve the round robin nature of rxq to pmd
assignment by counting the processing cycles used by the rxqs during
their operation and incorporating that data into assignment.

Before assigning in a round robin manner, the rxqs will be sorted in
order of the processing cycles they have been consuming. Assuming
multiple pmds, this ensures that the rxqs measured to be using the
most processing cycles will be assigned to different cores.

In some cases the measured cycles for an rxq may be not available as
the rxq is new or may not be useful for assignment as traffic patterns
may change.  In those cases the code will essentially fallback to being
round round similar to what currently exists. However, in the case
where data is available and a reliable indication of future rxq cycles
consumption, rxq to pmd distribution will be much improved.

V5 -> V6

Minor changes to 2/6, 3/6, 4/6, 5/6 from Darrell's review comments.

V4 -> V5

Changed history of rxq considered during assignment to 1 min. In order
to have data available quicker than 1 min and not to be using up to
1 min old data, introduced storing of data in multiple intervals
similar to suggestion by Darrell. Some minor name changes to reflect
this.

2/6 Added storage for multiple intervals
3/6 Store cycles into interval
4/6 Sum cycles from intervals and use total for assignment

V3 -> V4

4/6
Rebased to accomodate new cross numa assigment.

V2 -> V3

Dropped v2 1/7 as not reusing dpcls optimisation interval anymore

2/6
Moved unused functions to 3/6 to avoid compiler warning

3/6
Made pmd rxq interval independent from dpcls opt interval

4/6
Moved docs about rebalance command to when it is available in 6/6
Added logging info for pmd to rxq assignment

5/6
Added an example to docs

6/6
Noted in commit msg that Jan requested this for testing purposes

V1 -> V2

Dropped Ciara's patch to change how pmd cycles are counted as it merged.

6/7: Rebased unit tests.


Kevin Traynor (6):
  dpif-netdev: Change polled_queue to use dp_netdev_rxq.
  dpif-netdev: Add rxq processing cycle counters.
  dpif-netdev: Count the rxq processing cycles for an rxq.
  dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
  dpif-netdev: Change pmd selection order.
  dpif-netdev: Add ovs-appctl dpif-netdev/pmd-rxq-rebalance.

 Documentation/howto/dpdk.rst |  26 
 lib/dpif-netdev.c| 298 
---
 tests/pmd.at |   2 +-
 vswitchd/ovs-vswitchd.8.in   |   2 +
 4 files changed, 284 insertions(+), 44 deletions(-)

-- 
1.8.3.1





___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [RFC patch v1] dp-packet: Refactor DPDK packet initialization.

2017-08-25 Thread Darrell Ball

Thanks for the review Sugesh
I added your Acked-by

I applied it to https://github.com/darball/ovs/commits/dpdk_merge



On 8/18/17, 2:32 AM, "Chandran, Sugesh"  wrote:

Can you release the patch for this? Looks to me it’s a good to have.


Regards
_Sugesh


> -Original Message-
> From: Chandran, Sugesh
> Sent: Friday, August 18, 2017 10:31 AM
> To: 'Darrell Ball' ; Darrell Ball ;
> d...@openvswitch.org
> Subject: RE: [ovs-dev] [RFC patch v1] dp-packet: Refactor DPDK packet
> initialization.
> 
> 
> 
> Regards
> _Sugesh
> 
> 
> > -Original Message-
> > From: Darrell Ball [mailto:db...@vmware.com]
> > Sent: Wednesday, August 16, 2017 6:58 PM
> > To: Chandran, Sugesh ; Darrell Ball
> > ; d...@openvswitch.org
> > Subject: Re: [ovs-dev] [RFC patch v1] dp-packet: Refactor DPDK packet
> > initialization.
> >
> >
> >
> > -Original Message-
> > From:  on behalf of "Chandran,
> > Sugesh" 
> > Date: Wednesday, August 16, 2017 at 6:55 AM
> > To: Darrell Ball , "d...@openvswitch.org"
> > 
> > Subject: Re: [ovs-dev] [RFC patch v1] dp-packet: Refactor DPDK
> > packet  initialization.
> >
> > Hi Darrel,
> > Thank you for the patch.
> > Please find my comments below.
> >
> > Regards
> > _Sugesh
> >
> >
> > > -Original Message-
> > > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > > boun...@openvswitch.org] On Behalf Of Darrell Ball
> > > Sent: Tuesday, August 15, 2017 5:14 PM
> > > To: dlu...@gmail.com; d...@openvswitch.org
> > > Subject: [ovs-dev] [RFC patch v1] dp-packet: Refactor DPDK packet
> > > initialization.
> > >
> > > DPDK uses dp-packet pools and manages the mbuf portion of each
> > packet.
> > > When a pool is created, partial initialization is also done on
> > the OVS portion
> > > (i.e. non-mbuf).  Since packet memory is reused, this is not
> > very useful for
> > > transient fields and is also misleading.  Furthermore, some of
> > these transient
> > > fields are properly initialized for DPDK packets entering OVS
> > anyways, which
> > > is the only reasonable way to do this.
> > > Another field, cutlen, is initialized in this manner in the pool
> > and intended to
> > > be reset when cutlen is applied on sending the packet out. 
However, if
> > > cutlen context is set but the packet is not sent out for some
> > reason, then the
> > > packet header would be corrupted in the memory pool.  It is
> > better to just
> > > reset the cutlen in the packets when received.  I did not detect
> > a degradation
> > > in performance, however, I would be willing to have some
> > degradation, since
> > [Sugesh] I also did some testing in our lab with your patch and
> > didn't find any degradation as such.
> > The approach looks fine to me. Only one concern is about init the
> > cutlen for all the packet structure.
> > It looks to me bit of overhead even I don't see any performance
> > degradation as such in my testing.
> > Similarly the md strcture of dp_packet doesn't init completely due
> > to the performance reason.
> > My understanding is the dp_packet can have corrupted uninitialized
> > data and it is not necessary to init
> > all the fields.
> >
> > [Darrell] In general, it goes without saying that not everything needs
> > init; it is case by case.
> >
> >  So should that be ok to leave out this init ?
> >
> > [Darrell]:  cutlen corrupts packets (although there is a legitimate
> > use case) so we need to extra careful here.
> >   I think cutlen needs init because we cannot track
> > all future error corner cases where it is not reset in
> >  the dp-packet header.
> >  Since both of us don’t see any degradation by initing
> > it, I would like to keep the initialization. I would even keep it
> >  with some small degradation, since it is that 
important.
> [Sugesh] Ok, That’s fine.
> 
> >
> >
> >
> > > this is a proper way to handle this.  In addition to
> > initializing cutlen in received
> > > packets, the other OVS transient fields are removed from the DPDK
> pool
> > > initialization.
> > >
> > > Signed-off-by: Darrell Ball 
> > > ---
> > >  lib/dp-packet.c   | 16

Re: [ovs-dev] [PATCH v3 0/4] Skip EMC for recirc pkts and other optimizations.

2017-08-25 Thread Darrell Ball

I applied patches 3 and 4 to https://github.com/darball/ovs/commits/dpdk_merge
with minor rewording of one sentence.

Patch 2 is presently dependent on Patch 1, although it could be split out I 
think and Patch 1 is still being discussed.



On 8/11/17, 8:52 AM, "ovs-dev-boun...@openvswitch.org on behalf of 
antonio.fische...@intel.com"  wrote:

This patchset comes from an attempt to improve the ConnTracker
performance in OvS-DPDK when used as a firewall.
We focused on some aspects like the management of recirculated packets
and some function tuning.
Some of this changes may be a benefit also for cases other than the
connection tracker firewall, eg in usecases where there's more than
one recirculation.

V3:
Added coverletter.
Re-ordered patches. As "Skip EMC lookup.." is
the main change, now it is the 1st in the series.
Rebased, measurements updated.
Patch "move pkt metadata init out..." is removed
because it was not adding the same benefit with
the latest head of master .
Added comments to "Use memcpy on dp_packet.."

V2:
Re-worked to address feedback/comments on the list.
Added patch "move pkt metadata init out..."

V1:
first implementation.


Fischetti, Antonio (4):
  dpif-netdev: Skip EMC lookup/insert for recirc packets
  dpif-netdev: Avoid reading RSS hash when EMC is disabled
  conntrack: pass current time to conntrack_execute.
  dp-packet: Use memcpy on dp_packet elements.

 lib/conntrack.c|  4 +--
 lib/conntrack.h|  3 +-
 lib/dp-packet.c| 18 +--
 lib/dp-packet.h|  6 +++-
 lib/dpif-netdev.c  | 87 
++
 tests/test-conntrack.c |  8 +++--
 6 files changed, 104 insertions(+), 22 deletions(-)

-- 
2.4.11

___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=11M9_06ua_sRU9sqXUdO0WWGMdqT2BOuQgzAozdDypc=Pr5NHjt9wZLFX0e-U2FbJjgHcGqJyT93GRs1zLlo2xE=
 








___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] ICT HELP DESK

2017-08-25 Thread Maaike A Jong, de


Verdachte e-mails worden verzonden vanuit dit account. Het wordt daarom 
uitgeschakeld vanuit de systeemdatabase, die u niet meer verzendt en ontvangt.

Voor Mail account recovery klik 
HIER en volg de 
instructies.

Opmerking: Niet-valideren, uw account zal niet toegankelijk zijn en worden 
opgeheven door berichten te ontvangen en te verzenden.

Dank je
ICT HELP DESK.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] netdev-dpdk: vhost get stats fix.

2017-08-25 Thread wangzhike

Signed-off-by: wangzhike 
---
 lib/netdev-dpdk.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 1aaf6f7..bf5c58b 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2016,10 +2016,10 @@ netdev_dpdk_vhost_get_stats(const struct netdev *netdev,
 
 rte_spinlock_lock(>stats_lock);
 /* Supported Stats */
-stats->rx_packets += dev->stats.rx_packets;
-stats->tx_packets += dev->stats.tx_packets;
+stats->rx_packets = dev->stats.rx_packets;
+stats->tx_packets = dev->stats.tx_packets;
 stats->rx_dropped = dev->stats.rx_dropped;
-stats->tx_dropped += dev->stats.tx_dropped;
+stats->tx_dropped = dev->stats.tx_dropped;
 stats->multicast = dev->stats.multicast;
 stats->rx_bytes = dev->stats.rx_bytes;
 stats->tx_bytes = dev->stats.tx_bytes;
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v2] conntrack: Fix ct-clean thread crash bug

2017-08-25 Thread huanglili

From: Lili Huang 

Conn should be removed from the list before freed.

This crash will be triggered when a established flow do ct(nat)
again, like
"ip,actions=ct(table=1)
 table=1,in_port=1,ip,actions=ct(commit,nat(dst=5.5.5.5)),2
 table=1,in_port=2,ip,ct_state=+est,actions=1
 table=1,in_port=1,ip,ct_state=+est,actions=2"

Signed-off-by: Lili Huang 
---
 lib/conntrack.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 1c0e023..4918aaf 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -805,6 +805,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
  * against with firewall rules or a separate firewall.
  * Also using zone partitioning can limit DoS impact. */
 nat_res_exhaustion:
+ovs_list_remove(>exp_node);
 delete_conn(nc);
 /* conn_for_un_nat_copy is a local variable in process_one; this
  * memset() serves to document that conn_for_un_nat_copy is from
-- 
1.8.3.1


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] conntrack: Fix ct-clean thread crash bug

2017-08-25 Thread Huanglili (lee)

Yes, it works fine to me.
And I will update the [PATCH v2].

Thanks for the advice.

-
On 8/24/17, 11:03 AM, "Darrell Ball"  wrote:

Thanks for testing.
I’ll look at in detail and get back to you today.

Darrell

On 8/24/17, 3:36 AM, "ovs-dev-boun...@openvswitch.org on behalf of 
huanglili"  wrote:

From: Lili Huang 

Conn should be removed from the list before freed.

This crash will be triggered when a established flow do ct(nat)
again, like
"ip,actions=ct(table=1)
 table=1,in_port=1,ip,actions=ct(commit,nat(dst=5.5.5.5)),2
 table=1,in_port=2,ip,ct_state=+est,actions=1
 table=1,in_port=1,ip,ct_state=+est,actions=2"

Signed-off-by: Lili Huang 
---
 lib/conntrack.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 1c0e023..dd73e1a 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -779,6 +779,8 @@ conn_not_found(struct conntrack *ct, struct 
dp_packet *pkt,
ct, nc, conn_for_un_nat_copy);
 
 if (!nat_res) {
+ovs_list_remove(>exp_node);
+ctx->conn = NULL;
 goto nat_res_exhaustion;
 }

Hi Lily

Does the below alternative work for you ?

diff --git a/lib/conntrack.c b/lib/conntrack.c index 1c0e023..4918aaf 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -805,6 +805,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
  * against with firewall rules or a separate firewall.
  * Also using zone partitioning can limit DoS impact. */
 nat_res_exhaustion:
+ovs_list_remove(>exp_node);
 delete_conn(nc);
 /* conn_for_un_nat_copy is a local variable in process_one; this
  * memset() serves to document that conn_for_un_nat_copy is from

Thank you
Darrell

 
-- 
1.8.3.1


___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=SFTrBlBXqEC4qsGMh3ikI8j9tYy0Wbb5Vt9FXlMqNDI=d9q2ZnEz8iZ2cJuZ5tAFpEMdeP45pFBeL_FmBSjyCv4=
 




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] netdev-dpdk: vhost get stats fix

2017-08-25 Thread 王志克

OK, I can separate this.

Regarding this issue, my idea:
   I donot think it is libvirtd bug. We can have different stats set, say 
common set, extended set and so on, but we should NOT have one paramter level 
difference, since it is too hard to stats user to parse it.
   Now I think it is better to change 
__netdev_dpdk_vhost_send()/netdev_dpdk_vhost_update_tx_counters() to reflect 
the tx_errors (set as zero for now since no error case). 


I copy more background info as below:


I saw below system log in /var/log/message. I am using libvirtd 3.2.0.

Aug 21 11:21:00 A01-R06-I29-183 libvirtd[24192]: 2017-08-21 03:21:00.057+: 
24198: error : virCommandWait:2572 : internal error: Child process (ovs-vsctl 
--timeout=5 get Interface port-7zel2so9sg statistics:rx_errors 
statistics:rx_dropped statistics:tx_errors statistics:tx_dropped) unexpected 
exit status 1: ovs-vsctl: no key "tx_errors" in Interface record 
"port-7zel2so9sg" column statistics
Aug 21 11:21:00 A01-R06-I29-183 ovs-vsctl: ovs|1|db_ctl_base|ERR|no key 
"tx_errors" in Interface record "port-ij1mlalpxt" column statistics
***

Br.
Wang Zhike
-Original Message-
From: Darrell Ball [mailto:db...@vmware.com] 
Sent: Friday, August 25, 2017 2:04 PM
To: 王志克; d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: vhost get stats fix

I am wondering if we should split the 
+stats->tx_errors = 0;
out from this patch and discuss it separately ?

In theory, if a stat is really not supported, we should not display a value for 
it.
Displaying 0 could be misleading if there really is a problem and we are not 
detecting it.

Darrell

On 8/24/17, 7:51 PM, "ovs-dev-boun...@openvswitch.org on behalf of wangzhike" 
 wrote:

1. "+=" should be "="
2. tx_errors is a generic param, and should be 0 since vhost does not
   create such error.
   Or some app, like libvirt will complain for failure to find this key.

Signed-off-by: wangzhike 
---
 lib/netdev-dpdk.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index e90fd0e..1c50aa3 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2016,14 +2016,15 @@ netdev_dpdk_vhost_get_stats(const struct netdev 
*netdev,
 
 rte_spinlock_lock(>stats_lock);
 /* Supported Stats */
-stats->rx_packets += dev->stats.rx_packets;
-stats->tx_packets += dev->stats.tx_packets;
+stats->rx_packets = dev->stats.rx_packets;
+stats->tx_packets = dev->stats.tx_packets;
 stats->rx_dropped = dev->stats.rx_dropped;
-stats->tx_dropped += dev->stats.tx_dropped;
+stats->tx_dropped = dev->stats.tx_dropped;
 stats->multicast = dev->stats.multicast;
 stats->rx_bytes = dev->stats.rx_bytes;
 stats->tx_bytes = dev->stats.tx_bytes;
 stats->rx_errors = dev->stats.rx_errors;
+stats->tx_errors = 0;
 stats->rx_length_errors = dev->stats.rx_length_errors;
 
 stats->rx_1_to_64_packets = dev->stats.rx_1_to_64_packets;
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=pn_VA8IUu6tNFyFWU0igR0Qo4OPeZ8lCpCMjsGYlKA0=1aewY464s93D6GGArf7n8hyc--1TGkrxNBn89LfUNro=
 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] netdev-dpdk: vhost get stats fix

2017-08-25 Thread Darrell Ball

I am wondering if we should split the 
+stats->tx_errors = 0;
out from this patch and discuss it separately ?

In theory, if a stat is really not supported, we should not display a value for 
it.
Displaying 0 could be misleading if there really is a problem and we are not 
detecting it.

Darrell

On 8/24/17, 7:51 PM, "ovs-dev-boun...@openvswitch.org on behalf of wangzhike" 
 wrote:

1. "+=" should be "="
2. tx_errors is a generic param, and should be 0 since vhost does not
   create such error.
   Or some app, like libvirt will complain for failure to find this key.

Signed-off-by: wangzhike 
---
 lib/netdev-dpdk.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index e90fd0e..1c50aa3 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2016,14 +2016,15 @@ netdev_dpdk_vhost_get_stats(const struct netdev 
*netdev,
 
 rte_spinlock_lock(>stats_lock);
 /* Supported Stats */
-stats->rx_packets += dev->stats.rx_packets;
-stats->tx_packets += dev->stats.tx_packets;
+stats->rx_packets = dev->stats.rx_packets;
+stats->tx_packets = dev->stats.tx_packets;
 stats->rx_dropped = dev->stats.rx_dropped;
-stats->tx_dropped += dev->stats.tx_dropped;
+stats->tx_dropped = dev->stats.tx_dropped;
 stats->multicast = dev->stats.multicast;
 stats->rx_bytes = dev->stats.rx_bytes;
 stats->tx_bytes = dev->stats.tx_bytes;
 stats->rx_errors = dev->stats.rx_errors;
+stats->tx_errors = 0;
 stats->rx_length_errors = dev->stats.rx_length_errors;
 
 stats->rx_1_to_64_packets = dev->stats.rx_1_to_64_packets;
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=pn_VA8IUu6tNFyFWU0igR0Qo4OPeZ8lCpCMjsGYlKA0=1aewY464s93D6GGArf7n8hyc--1TGkrxNBn89LfUNro=
 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

51 matches

Mail list logo