Re: [ovs-dev] [PATCH] dpif-netdev: Refactor datapath flow cache

2018-01-03 Thread Fischetti, Antonio
Thanks William for your comments.
Some replies inline.

-Antonio

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of William Tu
> Sent: Wednesday, January 3, 2018 1:32 AM
> To: Wang, Yipeng1 
> Cc: d...@openvswitch.org; Tai, Charlie 
> Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Refactor datapath flow cache
> 
> Hi Yipeng,
> 
> Thanks for the reply.
> 
> On Tue, Jan 2, 2018 at 10:45 AM, Wang, Yipeng1 
> wrote:
> > Hi, William,
> >
> > Thanks for the comment. You are right. If the RSS hash does not
> > consider the fields that matter, the situation you mentioned could
> > happen.
> >
> > There are two design options for CD, as you may have found: when
> > signatures collide, we could either replace the existing entry (the
> > current design) or still insert the new entry into the cache. If we
> > chose the second design, I think the situation you mentioned could be
> > alleviated. We chose the first one mostly because of its simplicity
> > and its speed in the hit case. For example, if we allow multiple
> > entries with the same signature to stay in one bucket, then the lookup
> > function (in the scalar version) needs to iterate over all the entries
> > in a bucket to find all the matches, and additional loops and
> > variables are required to iterate over those matches. We expected this
> > to cost some percentage of throughput in cache-hit cases.
> 
> I think the cost of having multiple entries with the same signature is
> too high: the CD lookup time basically increases from O(1) to O(n),
> where n is the bucket size.
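
The replace-on-collision design discussed above can be sketched roughly as follows. This is only an illustration, not the code from the patch: the bucket width, the struct layout, the "0 means empty" convention, the slot-0 eviction, and the names cd_entry, cd_bucket, cd_lookup and cd_insert are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define CD_BUCKET_ENTRIES 4     /* Hypothetical bucket width. */

struct cd_entry {
    uint16_t sig;               /* Short signature derived from the hash. */
    uint16_t subtable;          /* Subtable index to probe; 0 means empty. */
};

struct cd_bucket {
    struct cd_entry e[CD_BUCKET_ENTRIES];
};

/* Returns the subtable stored for 'sig', or 0 on a miss.  Insertion never
 * leaves two entries with the same signature, so the first match is the
 * only match and the scan can stop there. */
uint16_t
cd_lookup(const struct cd_bucket *b, uint16_t sig)
{
    for (size_t i = 0; i < CD_BUCKET_ENTRIES; i++) {
        if (b->e[i].subtable && b->e[i].sig == sig) {
            return b->e[i].subtable;
        }
    }
    return 0;
}

/* Replace-on-collision insert: overwrite an entry with the same signature,
 * else take the first empty slot, else evict slot 0 (arbitrary policy). */
void
cd_insert(struct cd_bucket *b, uint16_t sig, uint16_t subtable)
{
    size_t slot = CD_BUCKET_ENTRIES;

    for (size_t i = 0; i < CD_BUCKET_ENTRIES; i++) {
        if (b->e[i].subtable && b->e[i].sig == sig) {
            slot = i;
            break;
        }
    }
    for (size_t i = 0; slot == CD_BUCKET_ENTRIES && i < CD_BUCKET_ENTRIES;
         i++) {
        if (!b->e[i].subtable) {
            slot = i;
        }
    }
    if (slot == CD_BUCKET_ENTRIES) {
        slot = 0;
    }
    b->e[slot].sig = sig;
    b->e[slot].subtable = subtable;
}
```

With the alternative design, cd_lookup could not return at the first match; it would have to collect every entry whose signature matches and let the caller probe each candidate subtable, which is the extra per-hit work the thread is weighing.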
> 
> >
> > But as you suggested, if the situation you mentioned is very common
> > in real use cases, and RSS does not consider the VLAN ID, we could
> > choose not to overwrite. Another option is to reduce the insertion
> > rate (or turn off CD) when CD's miss ratio is high (this is similar to
> > the EMC conditional insertion). Then the 100% miss ratio case can be
> > alleviated. This is an easy change for CD. Or we could use a software
> > hash together with RSS to take the VLAN tag into account; this could
> > benefit EMC too, I guess.

[Antonio] Hmm, I expect that adding some software hash computation would
kill the performance.
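
For reference, "software hash together with RSS" could be as small as folding one extra word into the NIC-provided hash. The sketch below is an assumption about what such a step might look like, not code from OVS: hash_mix32 and flow_hash_with_vlan are hypothetical names, and the constants are taken from the 32-bit MurmurHash3 mixing step. Whether even this much per-packet work is affordable is exactly the concern raised above.

```c
#include <stdint.h>

/* Folds one 32-bit word into an existing hash, using the multiply/rotate
 * constants of the MurmurHash3 32-bit mixing step. */
uint32_t
hash_mix32(uint32_t hash, uint32_t data)
{
    data *= 0xcc9e2d51;
    data = (data << 15) | (data >> 17);
    data *= 0x1b873593;
    hash ^= data;
    hash = (hash << 13) | (hash >> 19);
    return hash * 5 + 0xe6546b64;
}

/* Hypothetical helper: combines the RSS hash (outer 5-tuple only) with the
 * VLAN TCI so that flows differing only in VLAN get different hashes. */
uint32_t
flow_hash_with_vlan(uint32_t rss_hash, uint16_t vlan_tci)
{
    return hash_mix32(rss_hash, vlan_tci);
}
```

For a fixed RSS hash every step is invertible in the mixed-in word, so two different VLAN TCIs never map to the same result; collisions across different RSS hashes remain possible, as with any 32-bit hash.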


> >
> > There are many design options and trade-offs, but we eventually want
> > to have a design that works for most use cases.
> 
> I don't have any traffic dataset, but I would assume it's pretty
> common that multiple tunneling protocols are deployed. That said, the
> RSS hash, which is based on the outer-header 5-tuple, might differ
> little between flows and cause a high collision rate when flows try to
> match fields such as the VXLAN VNI or the Geneve metadata field.
> Matching the inner packets requires recirculation, so the RSS hash of
> the inner 5-tuple comes from the CPU, and I guess CD's hit rate is
> higher for inner packets.
> 
> The DFC (datapath flow cache) patch seems to have similar drawbacks?

[Antonio] This is a general issue due to the fact that RSS hash is 
computed on a pre-defined 5-tuple of the outer header. So it's not
specific to the CD or the DFC implementations. For sure it affects
the EMC.


> The fundamental issue seems to be the choice of hash function (RSS),
> which only covers the 5-tuple. Can we configure the RSS hash to cover
> more fields when subtables use more than the 5-tuple?

[Antonio] 
> 
> Regards,
> William
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 2/5] conntrack: add commands to r/w CT parameters.

2017-12-15 Thread Fischetti, Antonio
Sure, that's perfect.

Thanks, Antonio

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Friday, December 15, 2017 7:23 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>;
> d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v3 2/5] conntrack: add commands to r/w CT
> parameters.
> 
> Hi Antonio
> 
> Alternatively, given that most of the code is rewritten, I was planning
> on just submitting the patches, which are in process, and adding you as
> co-author.
> Will that work for you?
> 
> Thanks Darrell
> 
> 
> 
> On 12/15/17, 11:05 AM, "Fischetti, Antonio"
> <antonio.fische...@intel.com> wrote:
> 
> Thanks Darrell for your review.
> I agree with your comments: if we have to consider R/W commands with
> more than one parameter, it is much better to have specific functions,
> as you explained and showed.
> 
> I'll rework accordingly and post a v4.
> 
> Regards,
> -Antonio
> 
> > -Original Message-
> > From: Darrell Ball [mailto:db...@vmware.com]
> > Sent: Monday, December 11, 2017 6:44 PM
> > To: Fischetti, Antonio <antonio.fische...@intel.com>;
> > d...@openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH v3 2/5] conntrack: add commands to r/w CT
> > parameters.
> >
> > One extra note inline
> >
> > Thanks Darrell
> >
> > On 12/11/17, 8:35 AM, "Darrell Ball" <db...@vmware.com> wrote:
> >
> > Thanks Antonio for doing all this and pushing it forward.
> >
> > Regarding patches 2-4:
> >
> > I understand you want to save some code for various possible set and
> > get operations.
> > The prior art for these commands is however not generic set and get
> > commands.
> > Sometimes, we have specific commands that can take different numbers
> > of arguments but those are specific and the context is clear.
> > If we take some examples, we might see there is an issue with the
> > approach here.
> > Take per zone limits as one example, which is an internal requirement
> > that we are working on across datapaths.
> > This is a set operation with 2 arguments plus the datapath = 3
> > arguments.
> > This won’t work with the generic set functions here.
> > The corresponding get will not work either.
> > Trying to make this work generically will get pretty messy and the
> > parameter validation error prone.
> >
> > I would like to propose an alternative (a simple one; also with code
> > consolidation and more error checking) below:
> > Please let me know what you think.
> >
> > Thanks Darrell
> >
> >
> ///
> >
> > diff --git a/lib/conntrack.c b/lib/conntrack.c
> > index f5a3aa9..c8ad548 100644
> > --- a/lib/conntrack.c
> > +++ b/lib/conntrack.c
> > @@ -2485,6 +2485,27 @@ conntrack_flush(struct conntrack *ct, const uint16_t *zone)
> >      return 0;
> >  }
> >
> > +int
> > +conntrack_set_maxconns(struct conntrack *ct, uint32_t maxconns)
> > +{
> > +    atomic_init(&ct->n_conn_limit, maxconns);
> > +    return 0;
> > +}
> > +
> > +int
> > +conntrack_get_maxconns(struct conntrack *ct, uint32_t *maxconns)
> > +{
> > +    atomic_read_relaxed(&ct->n_conn_limit, maxconns);
> > +    return 0;
> > +}
> > +
> > +int
> > +conntrack_get_nconns(struct conntrack *ct, uint32_t *nconns)
> > +{
> > +    *nconns = atomic_count_get(&ct->n_conn);
> > +    return 0;
> > +}
> > +
> >  /* This function must be called with the ct->resources read lock taken. */
> >  static struct alg_exp_node *
> >  expectation_lookup(struct hmap *alg_expectations,
> > diff --git a/lib/conntrack.h b/lib/conntrack.h
> > index fbeef1c..8652724 100644
> > --- a/lib/conntrack.h
> > +++ b/lib/conntrack.h
> > @@ -114,6 +114,9 @@ int conntrack_dump_next(struct
> conntrack_dump *,
> > struc

Re: [ovs-dev] [PATCH v3 5/5] doc: ConnTracker cfg parameters.

2017-12-15 Thread Fischetti, Antonio
Thanks Darrell and Stephen for your suggestions. 
I'll rework accordingly in v4.

Antonio

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Monday, December 11, 2017 6:02 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>;
> d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v3 5/5] doc: ConnTracker cfg parameters.
> 
> Thanks Antonio for doing this.
> 
> 1/ Given the comments on patches 2-4, I think the documentation in
>    dpctl.man would change to be attribute specific, if we go that route.
>    I did not write it up yet, but most of it would be obvious.
>    One exception is the case where a limit is set while it is already
>    exceeded; this needs documentation.
>    I think the simple and robust approach is to set the attribute
>    regardless, without affecting existing connections. When existing
>    connections time out, the limit would be enforced. This is what the
>    proposed code does.
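
The "set the limit regardless" behaviour described in point 1/ can be illustrated with a small sketch. It uses C11 atomics rather than the OVS atomic wrappers, and struct ct_limits and the three functions are hypothetical names for this example only; the real conntrack code is structured differently.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct ct_limits {
    atomic_uint n_conn;         /* Connections currently tracked. */
    atomic_uint n_conn_limit;   /* Maximum allowed; may change at any time. */
};

/* Stores the new limit unconditionally, even if 'n_conn' already exceeds
 * it.  Existing connections are left alone. */
void
ct_set_maxconns(struct ct_limits *ct, uint32_t maxconns)
{
    atomic_store_explicit(&ct->n_conn_limit, maxconns, memory_order_relaxed);
}

/* Admits a new connection only while the count is below the limit.  The
 * check and the increment are separate steps, so concurrent callers may
 * overshoot slightly; a sketch-level simplification. */
bool
ct_try_add_conn(struct ct_limits *ct)
{
    unsigned int limit = atomic_load_explicit(&ct->n_conn_limit,
                                              memory_order_relaxed);

    if (atomic_load_explicit(&ct->n_conn, memory_order_relaxed) >= limit) {
        return false;
    }
    atomic_fetch_add_explicit(&ct->n_conn, 1, memory_order_relaxed);
    return true;
}

/* Called when a tracked connection times out. */
void
ct_conn_expired(struct ct_limits *ct)
{
    atomic_fetch_sub_explicit(&ct->n_conn, 1, memory_order_relaxed);
}
```

Lowering the limit below the current count therefore only blocks new connections; the count drifts down to the new limit as existing entries expire.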
> 
> 2/ I also think the userspace connection tracker documentation does not
>    belong in the DPDK documentation.
>    Part of the content in intro/install/dpdk.rst could be moved to
>    dpctl.man.
>    dpctl.man is pulled into ovs-vswitchd.8.pdf.
> 
> 3/ The documentation in dpctl.man would mention that support is
> presently only in the userspace connection tracker.
> 
> Thanks Darrell
> 
> 
> 
> On 10/13/17, 1:28 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf
> of antonio.fische...@intel.com> wrote:
> 
> Update documentation with the new commands to Read/Write
> ConnTracker configuration parameters.
> 
> CC: Kevin Traynor <ktray...@redhat.com>
> CC: Darrell Ball <dlu...@gmail.com>
> Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> ---
>  Documentation/intro/install/dpdk.rst | 25 +
>  lib/dpctl.man| 10 ++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
> index bb69ae5..a1f259c 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -568,6 +568,31 @@ not needed i.e. jumbo frames are not needed, it can be forced off by adding
>  chains of descriptors it will make more individual virtio descriptors available
>  for rx to the guest using dpdkvhost ports and this can improve performance.
> 
> +Connection Tracker
> +~~~~~~~~~~~~~~~~~~
> +
> +When the Connection Tracker is enabled the overall performance can be deeply
> +affected, even with simple firewall rules and with stateless protocols like
> +UDP.  In order to find a better tuning, commands like
> +
> +::
> +
> +    $ ovs-appctl dpctl/ct-get-glbl-cfg 
> +    $ ovs-appctl dpctl/ct-set-glbl-cfg =
> +
> +allow respectively to read the current value, or set a new value to a
> +configuration parameter.
> +For example, to reduce the impact of the Connection Tracker load on the
> +system performance, the maximum number of tracked connections can be
> +reduced.
> +
> +The available configuration parameters are:
> +
> +- maxconn: Maximum number of connections managed by the Connection Tracker
> +  module. It's both readable and writeable.
> +- totconn: Total number of connections currently managed by the Connection
> +  Tracker module. Readable only.
> +
>  Limitations
>  -----------
> 
> diff --git a/lib/dpctl.man b/lib/dpctl.man
> index 675fe5a..64ad105 100644
> --- a/lib/dpctl.man
> +++ b/lib/dpctl.man
> @@ -235,3 +235,13 @@ For each ConnTracker bucket, displays the number of connections used
>  by \fIdp\fR.
>  If \fBgt=\fIThreshold\fR is specified, bucket numbers are displayed when
>  the number of connections in a bucket is greater than \fIThreshold\fR.
> +.
> +.TP
> +\*(DX\fBct\-get\-glbl\-cfg\fR [\fIdp\fR] \fBparam\fR
> +Read the current value of the specified ConnTracker parameter used
> +by \fIdp\fR.
> +.
> +.TP
> +\*(DX\fBct\-set\-glbl\-cfg\fR [\fIdp\fR] \fBparam=\fI..\fR
> +Set a value to the specified ConnTracker parameter used
> +by \fIdp\fR.
> --
> 2.4.11
> 


Re: [ovs-dev] [PATCH v3 2/5] conntrack: add commands to r/w CT parameters.

2017-12-15 Thread Fischetti, Antonio
Thanks Darrell for your review.
I agree with your comments: if we have to consider R/W commands with
more than one parameter, it is much better to have specific functions,
as you explained and showed.

I'll rework accordingly and post a v4.

Regards,
-Antonio

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Monday, December 11, 2017 6:44 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>;
> d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v3 2/5] conntrack: add commands to r/w CT
> parameters.
> 
> One extra note inline
> 
> Thanks Darrell
> 
> On 12/11/17, 8:35 AM, "Darrell Ball" <db...@vmware.com> wrote:
> 
> Thanks Antonio for doing all this and pushing it forward.
> 
> Regarding patches 2-4:
> 
> I understand you want to save some code for various possible set and
> get operations.
> The prior art for these commands is however not generic set and get
> commands.
> Sometimes, we have specific commands that can take different numbers
> of arguments but
> those are specific and the context is clear.
> If we take some examples, we might see there is an issue with the
> approach here.
> Take per zone limits as one example, which is an internal
> requirement that we are working on
> across datapaths.
> This is a set operation with 2 arguments plus the datapath = 3
> arguments.
> This won’t work with the generic set functions here.
> The corresponding get will not work either.
> Trying to make this work generically will get pretty messy and the
> parameter validation error prone.
> 
> I would like to propose an alternative (a simple one; also with code
> consolidation and more error checking) below:
> Please let me know what you think.
> 
> Thanks Darrell
> 
> ///
> 
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index f5a3aa9..c8ad548 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -2485,6 +2485,27 @@ conntrack_flush(struct conntrack *ct, const uint16_t *zone)
>      return 0;
>  }
> 
> +int
> +conntrack_set_maxconns(struct conntrack *ct, uint32_t maxconns)
> +{
> +    atomic_init(&ct->n_conn_limit, maxconns);
> +    return 0;
> +}
> +
> +int
> +conntrack_get_maxconns(struct conntrack *ct, uint32_t *maxconns)
> +{
> +    atomic_read_relaxed(&ct->n_conn_limit, maxconns);
> +    return 0;
> +}
> +
> +int
> +conntrack_get_nconns(struct conntrack *ct, uint32_t *nconns)
> +{
> +    *nconns = atomic_count_get(&ct->n_conn);
> +    return 0;
> +}
> +
>  /* This function must be called with the ct->resources read lock taken. */
>  static struct alg_exp_node *
>  expectation_lookup(struct hmap *alg_expectations,
> diff --git a/lib/conntrack.h b/lib/conntrack.h
> index fbeef1c..8652724 100644
> --- a/lib/conntrack.h
> +++ b/lib/conntrack.h
> @@ -114,6 +114,9 @@ int conntrack_dump_next(struct conntrack_dump *, struct ct_dpif_entry *);
>  int conntrack_dump_done(struct conntrack_dump *);
> 
>  int conntrack_flush(struct conntrack *, const uint16_t *zone);
> +int conntrack_set_maxconns(struct conntrack *ct, uint32_t maxconns);
> +int conntrack_get_maxconns(struct conntrack *ct, uint32_t *maxconns);
> +int conntrack_get_nconns(struct conntrack *ct, uint32_t *nconns);
>  ^L
>  /* 'struct ct_lock' is a wrapper for an adaptive mutex.  It's useful to try
>   * different types of locks (e.g. spinlocks) */
> diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
> index 239c848..5fa3a97 100644
> --- a/lib/ct-dpif.c
> +++ b/lib/ct-dpif.c
> @@ -140,6 +140,30 @@ ct_dpif_flush(struct dpif *dpif, const uint16_t *zone,
>              : EOPNOTSUPP);
>  }
> 
> +int
> +ct_dpif_set_maxconns(struct dpif *dpif, uint32_t maxconns)
> +{
> +    return (dpif->dpif_class->ct_set_maxconns
> +            ? dpif->dpif_class->ct_set_maxconns(dpif, maxconns)
> +            : EOPNOTSUPP);
> +}
> +
> +int
> +ct_dpif_get_maxconns(struct dpif *dpif, uint32_t *maxconns)
> +{
> +    return (dpif->dpif_class->ct_get_maxconns
> +            ? dpif->dpif_class->ct_get_maxconns(dpif, maxconns)
> +            : EOPNOTSUPP);
> +}
> +
> +int
> +ct_dpif_get_nconns(struct dpif *dpif, uint32_t *nconns)
> +{
> +    return (dpif->dpif_class->ct_get_nconns
> +            ?
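
The ct-dpif wrappers in the proposal all follow one pattern: call the datapath class's callback if it is provided, otherwise report EOPNOTSUPP. A stripped-down sketch of that pattern, with stand-in types (struct dp, struct dp_class and the two example classes are hypothetical and much simpler than the real dpif structures), might look like this:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct dp;

/* Per-datapath-type operations; a NULL pointer means "not supported". */
struct dp_class {
    int (*ct_set_maxconns)(struct dp *, uint32_t maxconns);
    int (*ct_get_maxconns)(struct dp *, uint32_t *maxconns);
};

struct dp {
    const struct dp_class *class_;
    uint32_t maxconns;          /* Stand-in for real conntrack state. */
};

int
dp_ct_set_maxconns(struct dp *dp, uint32_t maxconns)
{
    return (dp->class_->ct_set_maxconns
            ? dp->class_->ct_set_maxconns(dp, maxconns)
            : EOPNOTSUPP);
}

int
dp_ct_get_maxconns(struct dp *dp, uint32_t *maxconns)
{
    return (dp->class_->ct_get_maxconns
            ? dp->class_->ct_get_maxconns(dp, maxconns)
            : EOPNOTSUPP);
}

/* Example implementation for a class that supports the operations. */
static int
example_set_maxconns(struct dp *dp, uint32_t maxconns)
{
    dp->maxconns = maxconns;
    return 0;
}

static int
example_get_maxconns(struct dp *dp, uint32_t *maxconns)
{
    *maxconns = dp->maxconns;
    return 0;
}

const struct dp_class class_with_ct = {
    example_set_maxconns, example_get_maxconns
};
const struct dp_class class_without_ct = { NULL, NULL };
```

A caller such as a dpctl command can then treat EOPNOTSUPP as "this datapath type has no such setting" without knowing which datapath it is talking to.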

Re: [ovs-dev] [PATCH v6 8/8] NEWS: Add keepalive support information in NEWS.

2017-12-15 Thread Fischetti, Antonio
I think this needs a trivial rebase, other than that it's ok.

Acked-by: Antonio Fischetti 


> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Bhanuprakash Bodireddy
> Sent: Friday, December 8, 2017 12:04 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH v6 8/8] NEWS: Add keepalive support
> information in NEWS.
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  NEWS | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/NEWS b/NEWS
> index 188a075..6fa69ed 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -15,6 +15,8 @@ Post-v2.8.0
>       * Add support for compiling OVS with the latest Linux 4.13 kernel
>     - "flush-conntrack" in ovs-dpctl and ovs-appctl now accept a 5-tuple to
>       delete a specific connection tracking entry.
> +   - Userspace Datapath:
> +     * Added Keepalive support for userspace datapath.
> 
>  v2.8.0 - 31 Aug 2017
>  
> --
> 2.4.11
> 


Re: [ovs-dev] [PATCH v6 6/8] keepalive: Add support to query keepalive status and statistics.

2017-12-15 Thread Fischetti, Antonio
LGTM,

Tested-by: Antonio Fischetti 
Acked-by: Antonio Fischetti 


> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Bhanuprakash Bodireddy
> Sent: Friday, December 8, 2017 12:04 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH v6 6/8] keepalive: Add support to query
> keepalive status and statistics.
> 
> This commit adds support to query keepalive status and statistics.
> 
>   $ ovs-appctl keepalive/status
>   keepAlive Status: Enabled
> 
>   $ ovs-appctl keepalive/pmd-health-show
> 
>           Keepalive status
> 
>   keepalive status   : Enabled
>   keepalive interval : 1000 ms
>   keepalive init time: 21 Aug 2017 16:20:31
>   PMD threads: 4
> 
>    PMD     CORE    STATE   LAST SEEN TIMESTAMP(UTC)
>   pmd62     0      ALIVE   21 Aug 2017 16:29:31
>   pmd63     1      ALIVE   21 Aug 2017 16:29:31
>   pmd64     2      ALIVE   21 Aug 2017 16:29:31
>   pmd65     3      GONE    21 Aug 2017 16:26:31
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  lib/keepalive.c | 101
> 
>  1 file changed, 101 insertions(+)
> 
> diff --git a/lib/keepalive.c b/lib/keepalive.c
> index 14ac093..75c0884 100644
> --- a/lib/keepalive.c
> +++ b/lib/keepalive.c
> @@ -18,11 +18,13 @@
> 
>  #include "keepalive.h"
>  #include "lib/vswitch-idl.h"
> +#include "openvswitch/dynamic-string.h"
>  #include "openvswitch/vlog.h"
>  #include "ovs-thread.h"
>  #include "process.h"
>  #include "seq.h"
>  #include "timeval.h"
> +#include "unixctl.h"
> 
>  VLOG_DEFINE_THIS_MODULE(keepalive);
> 
> @@ -362,6 +364,99 @@ ka_stats_run(void)
>      return ka_stats;
>  }
> 
> +static void
> +ka_unixctl_status(struct unixctl_conn *conn, int argc OVS_UNUSED,
> +                  const char *argv[] OVS_UNUSED, void *aux OVS_UNUSED)
> +{
> +    struct ds ds = DS_EMPTY_INITIALIZER;
> +
> +    ds_put_format(&ds, "keepAlive Status: %s",
> +                  ka_is_enabled() ? "Enabled" : "Disabled");
> +
> +    unixctl_command_reply(conn, ds_cstr(&ds));
> +    ds_destroy(&ds);
> +}
> +
> +static void
> +ka_unixctl_pmd_health_show(struct unixctl_conn *conn, int argc OVS_UNUSED,
> +                           const char *argv[] OVS_UNUSED, void *ka_info_)
> +{
> +    struct ds ds = DS_EMPTY_INITIALIZER;
> +    ds_put_format(&ds,
> +                  "\n\t\tKeepalive status\n\n");
> +
> +    ds_put_format(&ds, "keepalive status   : %s\n",
> +                  ka_is_enabled() ? "Enabled" : "Disabled");
> +
> +    if (!ka_is_enabled()) {
> +        goto out;
> +    }
> +
> +    ds_put_format(&ds, "keepalive interval : %"PRIu32" ms\n",
> +                  get_ka_interval());
> +
> +    char *utc = xastrftime_msec("%d %b %Y %H:%M:%S",
> +                                ka_info.init_time, true);
> +    ds_put_format(&ds, "keepalive init time: %s \n", utc);
> +
> +    struct keepalive_info *ka_info = (struct keepalive_info *)ka_info_;
> +    if (OVS_UNLIKELY(!ka_info)) {
> +        goto out;
> +    }
> +
> +    ds_put_format(&ds, "PMD threads: %"PRIu32" \n",
> +                  ka_info->thread_cnt);
> +    ds_put_format(&ds,
> +                  "\n PMD\tCORE\tSTATE\tLAST SEEN TIMESTAMP(UTC)\n");
> +
> +    struct ka_process_info *pinfo, *pinfo_next;
> +
> +    ovs_mutex_lock(&ka_info->proclist_mutex);
> +    HMAP_FOR_EACH_SAFE (pinfo, pinfo_next, node, &ka_info->process_list) {
> +        char *state = NULL;
> +
> +        if (pinfo->state == KA_STATE_UNUSED) {
> +            continue;
> +        }
> +
> +        switch (pinfo->state) {
> +        case KA_STATE_ALIVE:
> +            state = "ALIVE";
> +            break;
> +        case KA_STATE_MISSING:
> +            state = "MISSING";
> +            break;
> +        case KA_STATE_DEAD:
> +            state = "DEAD";
> +            break;
> +        case KA_STATE_GONE:
> +            state = "GONE";
> +            break;
> +        case KA_STATE_SLEEP:
> +            state = "SLEEP";
> +            break;
> +        case KA_STATE_UNUSED:
> +            break;
> +        default:
> +            OVS_NOT_REACHED();
> +        }
> +
> +        utc = xastrftime_msec("%d %b %Y %H:%M:%S",
> +                              pinfo->last_seen_time, true);
> +
> +        ds_put_format(&ds, "%s\t%2d\t%s\t%s\n",
> +                      pinfo->name, pinfo->core_id, state, utc);
> +
> +        free(utc);
> +    }
> +    ovs_mutex_unlock(&ka_info->proclist_mutex);
> +
> +    ds_put_format(&ds, "\n");
> +out:
> +    unixctl_command_reply(conn, ds_cstr(&ds));
> +    ds_destroy(&ds);
> +}
> +
>  /* Dispatch heartbeats from 'ovs_keepalive' thread. */
>  void
>  dispatch_heartbeats(void)
> @@ -424,6 +519,12 @@ ka_init(const struct smap *ovs_other_config)
> 
>      ka_info.init_time = time_wall_msec();
> 
> +    unixctl_command_register("keepalive/status", "", 0, 0,
> + 
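
The per-PMD rows printed by keepalive/pmd-health-show come from a state-to-string mapping plus one tab-separated format string. A standalone sketch of just that part is shown below; the enum ordering, the "UNKNOWN" fallback and the helper names ka_state_name and ka_format_row are assumptions for illustration (the patch itself builds the output with ds_put_format inside the unixctl handler).

```c
#include <stddef.h>
#include <stdio.h>

/* State names follow the patch; the numeric order is an assumption. */
enum ka_state {
    KA_STATE_UNUSED,
    KA_STATE_ALIVE,
    KA_STATE_MISSING,
    KA_STATE_DEAD,
    KA_STATE_GONE,
    KA_STATE_SLEEP
};

const char *
ka_state_name(enum ka_state state)
{
    switch (state) {
    case KA_STATE_ALIVE:   return "ALIVE";
    case KA_STATE_MISSING: return "MISSING";
    case KA_STATE_DEAD:    return "DEAD";
    case KA_STATE_GONE:    return "GONE";
    case KA_STATE_SLEEP:   return "SLEEP";
    case KA_STATE_UNUSED:  return "UNUSED";
    }
    return "UNKNOWN";
}

/* Formats one table row with the same "%s\t%2d\t%s\t%s\n" layout as the
 * patch.  Returns the snprintf() result. */
int
ka_format_row(char *buf, size_t len, const char *name, int core_id,
              enum ka_state state, const char *utc)
{
    return snprintf(buf, len, "%s\t%2d\t%s\t%s\n",
                    name, core_id, ka_state_name(state), utc);
}
```

Calling ka_format_row(buf, sizeof buf, "pmd62", 0, KA_STATE_ALIVE, "21 Aug 2017 16:29:31") yields one line of the table shown in the commit message.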

Re: [ovs-dev] [PATCH v6 5/8] bridge: Update keepalive status in OVSDB.

2017-12-15 Thread Fischetti, Antonio
LGTM,

Tested-by: Antonio Fischetti 
Acked-by: Antonio Fischetti 


> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Bhanuprakash Bodireddy
> Sent: Friday, December 8, 2017 12:04 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH v6 5/8] bridge: Update keepalive status in
> OVSDB.
> 
> This commit allows vswitchd thread to update the OVSDB with the
> status of all registered PMD threads. The status can be monitored
> using ovsdb-client and the sample output is below.
> 
> $ ovsdb-client monitor Open_vSwitch Open_vSwitch keepalive
> 
> row                                  action keepalive
> 7b746190-ee71-4dcc-becf-f8cb9c7cb909 old    {"pmd62"="ALIVE,0,9226457935188922"
>                                              "pmd63"="ALIVE,1,150678618"
>                                              "pmd64"="ALIVE,2,150678618"
>                                              "pmd65"="ALIVE,3,150678618"}
> 
>                                      new    {"pmd62"="ALIVE,0,9226460230167364"
>                                              "pmd63"="ALIVE,1,150679619"
>                                              "pmd64"="ALIVE,2,150679619"
>                                              "pmd65"="ALIVE,3,150679619"}
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  lib/keepalive.c   | 15 +++
>  lib/keepalive.h   |  1 +
>  vswitchd/bridge.c | 26 ++
>  3 files changed, 42 insertions(+)
> 
> diff --git a/lib/keepalive.c b/lib/keepalive.c
> index 7d3dbad..14ac093 100644
> --- a/lib/keepalive.c
> +++ b/lib/keepalive.c
> @@ -347,6 +347,21 @@ get_ka_stats(void)
>      ovs_mutex_unlock(&mutex);
>  }
> 
> +struct smap *
> +ka_stats_run(void)
> +{
> +    struct smap *ka_stats = NULL;
> +
> +    ovs_mutex_lock(&mutex);
> +    if (keepalive_stats) {
> +        ka_stats = keepalive_stats;
> +        keepalive_stats = NULL;
> +    }
> +    ovs_mutex_unlock(&mutex);
> +
> +    return ka_stats;
> +}
> +
>  /* Dispatch heartbeats from 'ovs_keepalive' thread. */
>  void
>  dispatch_heartbeats(void)
> diff --git a/lib/keepalive.h b/lib/keepalive.h
> index 2bae8f1..e84646a 100644
> --- a/lib/keepalive.h
> +++ b/lib/keepalive.h
> @@ -101,6 +101,7 @@ void ka_cache_registered_threads(void);
>  void ka_mark_pmd_thread_alive(int);
>  void ka_mark_pmd_thread_sleep(int);
>  void get_ka_stats(void);
> +struct smap *ka_stats_run(void);
>  void dispatch_heartbeats(void);
>  void ka_init(const struct smap *);
>  void ka_destroy(void);
> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> index f70407f..55c925e 100644
> --- a/vswitchd/bridge.c
> +++ b/vswitchd/bridge.c
> @@ -286,6 +286,7 @@ static bool port_is_synthetic(const struct port *);
> 
>  static void reconfigure_system_stats(const struct ovsrec_open_vswitch *);
>  static void run_system_stats(void);
> +static void run_keepalive_stats(void);
> 
>  static void bridge_configure_mirrors(struct bridge *);
>  static struct mirror *mirror_create(struct bridge *,
> @@ -403,6 +404,7 @@ bridge_init(const char *remote)
> 
>      ovsdb_idl_omit_alert(idl, &ovsrec_open_vswitch_col_cur_cfg);
>      ovsdb_idl_omit_alert(idl, &ovsrec_open_vswitch_col_statistics);
> +    ovsdb_idl_omit_alert(idl, &ovsrec_open_vswitch_col_keepalive);
>      ovsdb_idl_omit_alert(idl, &ovsrec_open_vswitch_col_datapath_types);
>      ovsdb_idl_omit_alert(idl, &ovsrec_open_vswitch_col_iface_types);
>      ovsdb_idl_omit(idl, &ovsrec_open_vswitch_col_external_ids);
> @@ -2686,6 +2688,29 @@ run_system_stats(void)
>      }
>  }
> 
> +void
> +run_keepalive_stats(void)
> +{
> +    struct smap *ka_stats;
> +    const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(idl);
> +
> +    ka_stats = ka_stats_run();
> +    if (ka_stats && cfg) {
> +        struct ovsdb_idl_txn *txn;
> +        struct ovsdb_datum datum;
> +
> +        txn = ovsdb_idl_txn_create(idl);
> +        ovsdb_datum_from_smap(&datum, ka_stats);
> +        smap_destroy(ka_stats);
> +        ovsdb_idl_txn_write(&cfg->header_, &ovsrec_open_vswitch_col_keepalive,
> +                            &datum);
> +        ovsdb_idl_txn_commit(txn);
> +        ovsdb_idl_txn_destroy(txn);
> +
> +        free(ka_stats);
> +    }
> +}
> +
>  static const char *
>  ofp12_controller_role_to_str(enum ofp12_controller_role role)
>  {
> @@ -3039,6 +3064,7 @@ bridge_run(void)
>      run_stats_update();
>      run_status_update();
>      run_system_stats();
> +    run_keepalive_stats();
>  }
> 
>  void
> --
> 2.4.11
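
The hand-off between the keepalive thread (get_ka_stats) and the vswitchd thread (ka_stats_run) is a single pointer exchanged under a mutex: the producer replaces any unconsumed snapshot, and the consumer takes ownership and clears the slot. A minimal sketch of that pattern with plain pthreads follows; struct stats is a stand-in for the smap, and stats_publish and stats_take are hypothetical names, not functions from the patch.

```c
#include <pthread.h>
#include <stdlib.h>

/* Stand-in for the 'struct smap' snapshot used in the patch. */
struct stats {
    int n_alive;
};

static pthread_mutex_t stats_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct stats *pending_stats;     /* Guarded by 'stats_mutex'. */

/* Producer side: publishes a heap-allocated snapshot, freeing any earlier
 * snapshot that was never consumed.  Ownership of 'snapshot' passes to
 * the slot. */
void
stats_publish(struct stats *snapshot)
{
    pthread_mutex_lock(&stats_mutex);
    free(pending_stats);
    pending_stats = snapshot;
    pthread_mutex_unlock(&stats_mutex);
}

/* Consumer side: takes ownership of the pending snapshot, or returns NULL
 * if nothing new was published.  The caller frees the result. */
struct stats *
stats_take(void)
{
    struct stats *taken;

    pthread_mutex_lock(&stats_mutex);
    taken = pending_stats;
    pending_stats = NULL;
    pthread_mutex_unlock(&stats_mutex);

    return taken;
}
```

Because the consumer clears the slot, a snapshot is written to the database at most once, and a slow consumer simply sees the most recent state rather than a backlog.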
> 


Re: [ovs-dev] [PATCH v6 4/8] keepalive: Retrieve PMD status periodically.

2017-12-15 Thread Fischetti, Antonio
LGTM,

Tested-by: Antonio Fischetti 
Acked-by: Antonio Fischetti 



> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Bhanuprakash Bodireddy
> Sent: Friday, December 8, 2017 12:04 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH v6 4/8] keepalive: Retrieve PMD status
> periodically.
> 
> This commit implements APIs to retrieve the PMD thread status and return
> the status in the below format for each PMD thread.
> 
>   Format: pmdid="status,core id,last_seen_timestamp(epoch)"
>   eg: pmd62="ALIVE,2,150332575"
>       pmd63="GONE,3,150332525"
> 
> The status is periodically retrieved by the keepalive thread and stored in
> the keepalive_stats struct, which is later retrieved by the vswitchd thread.
> In case of four PMD threads the status is as below:
> 
>"pmd62"="ALIVE,0,150332575"
>"pmd63"="ALIVE,1,150332575"
>"pmd64"="ALIVE,2,150332575"
>"pmd65"="ALIVE,3,150332575"
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  lib/dpif-netdev.c |  1 +
>  lib/keepalive.c   | 63
> +++
>  lib/keepalive.h   |  1 +
>  3 files changed, 65 insertions(+)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 9021906..e9fa3c1 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -1039,6 +1039,7 @@ ovs_keepalive(void *f_)
>          /* Dispatch heartbeats only if pmd[s] exist. */
>          if (hb_enable) {
>              dispatch_heartbeats();
> +            get_ka_stats();
>          }
> 
>          xnanosleep(interval * 1000 * 1000);
> diff --git a/lib/keepalive.c b/lib/keepalive.c
> index 0e4b2b6..7d3dbad 100644
> --- a/lib/keepalive.c
> +++ b/lib/keepalive.c
> @@ -19,6 +19,7 @@
>  #include "keepalive.h"
>  #include "lib/vswitch-idl.h"
>  #include "openvswitch/vlog.h"
> +#include "ovs-thread.h"
>  #include "process.h"
>  #include "seq.h"
>  #include "timeval.h"
> @@ -29,6 +30,9 @@ static bool keepalive_enable = false;  /* Keepalive disabled by default. */
>  static uint32_t keepalive_timer_interval;  /* keepalive timer interval. */
>  static struct keepalive_info ka_info;
> 
> +static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER;
> +static struct smap *keepalive_stats OVS_GUARDED_BY(mutex);
> +
>  /* Returns true if state update is allowed, false otherwise. */
>  static bool
>  ka_can_update_state(void)
> @@ -284,6 +288,65 @@ ka_mark_pmd_thread_sleep(int tid)
>      }
>  }
> 
> +static void
> +get_pmd_status(struct smap *ka_pmd_stats)
> +    OVS_REQUIRES(ka_info.proclist_mutex)
> +{
> +    struct ka_process_info *pinfo, *pinfo_next;
> +    HMAP_FOR_EACH_SAFE (pinfo, pinfo_next, node, &ka_info.process_list) {
> +        char *state = NULL;
> +        if (pinfo->state == KA_STATE_UNUSED) {
> +            continue;
> +        }
> +
> +        switch (pinfo->state) {
> +        case KA_STATE_ALIVE:
> +            state = "ALIVE";
> +            break;
> +        case KA_STATE_MISSING:
> +            state = "MISSING";
> +            break;
> +        case KA_STATE_DEAD:
> +            state = "DEAD";
> +            break;
> +        case KA_STATE_GONE:
> +            state = "GONE";
> +            break;
> +        case KA_STATE_SLEEP:
> +            state = "SLEEP";
> +            break;
> +        case KA_STATE_UNUSED:
> +            break;
> +        default:
> +            OVS_NOT_REACHED();
> +        }
> +
> +        smap_add_format(ka_pmd_stats, pinfo->name, "%s,%d,%ld",
> +                        state, pinfo->core_id, pinfo->last_seen_time);
> +    }
> +}
> +
> +void
> +get_ka_stats(void)
> +{
> +    struct smap *ka_pmd_stats;
> +    ka_pmd_stats = xmalloc(sizeof *ka_pmd_stats);
> +    smap_init(ka_pmd_stats);
> +
> +    ovs_mutex_lock(&ka_info.proclist_mutex);
> +    get_pmd_status(ka_pmd_stats);
> +    ovs_mutex_unlock(&ka_info.proclist_mutex);
> +
> +    ovs_mutex_lock(&mutex);
> +    if (keepalive_stats) {
> +        smap_destroy(keepalive_stats);
> +        free(keepalive_stats);
> +        keepalive_stats = NULL;
> +    }
> +    keepalive_stats = ka_pmd_stats;
> +    ovs_mutex_unlock(&mutex);
> +}
> +
>  /* Dispatch heartbeats from 'ovs_keepalive' thread. */
>  void
>  dispatch_heartbeats(void)
> diff --git a/lib/keepalive.h b/lib/keepalive.h
> index cbc2387..2bae8f1 100644
> --- a/lib/keepalive.h
> +++ b/lib/keepalive.h
> @@ -100,6 +100,7 @@ void ka_free_cached_threads(void);
>  void ka_cache_registered_threads(void);
>  void ka_mark_pmd_thread_alive(int);
>  void ka_mark_pmd_thread_sleep(int);
> +void get_ka_stats(void);
>  void dispatch_heartbeats(void);
>  void ka_init(const struct smap *);
>  void ka_destroy(void);
> --
> 2.4.11
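
The per-PMD value stored by this patch is a comma-separated triple, "status,core id,last_seen_timestamp". A consumer reading the column back needs to split it again; the sketch below shows one way to format and parse such a value in plain C. The helper names ka_format_status and ka_parse_status and the 16-byte state buffer are assumptions for this example, and it uses the <inttypes.h> macros for the 64-bit timestamp where the patch uses "%ld".

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Writes "state,core_id,last_seen" into 'buf'.  Returns the snprintf()
 * result. */
int
ka_format_status(char *buf, size_t len, const char *state, int core_id,
                 uint64_t last_seen)
{
    return snprintf(buf, len, "%s,%d,%" PRIu64, state, core_id, last_seen);
}

/* Splits a "state,core_id,last_seen" value.  'state' must have room for
 * 16 bytes.  Returns 0 on success, -1 if the value is malformed. */
int
ka_parse_status(const char *value, char *state, int *core_id,
                uint64_t *last_seen)
{
    int n = sscanf(value, "%15[^,],%d,%" SCNu64, state, core_id, last_seen);

    return n == 3 ? 0 : -1;
}
```

For example, parsing "ALIVE,2,150332575" yields state "ALIVE", core 2 and timestamp 150332575, matching the format line in the commit message.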
> 

Re: [ovs-dev] [PATCH v6 3/8] dpif-netdev: Enable heartbeats for DPDK datapath.

2017-12-15 Thread Fischetti, Antonio
LGTM,

Tested-by: Antonio Fischetti 
Acked-by: Antonio Fischetti 


> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Bhanuprakash Bodireddy
> Sent: Friday, December 8, 2017 12:04 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH v6 3/8] dpif-netdev: Enable heartbeats for
> DPDK datapath.
> 
> This commit adds heartbeat mechanism support for DPDK datapath.
> Heartbeats are sent to registered PMD threads at predefined intervals
> (as set in ovsdb with 'keepalive-interval').
> 
> The heartbeats are only enabled when there is at least one port added to
> the bridge and an active PMD thread is polling the port.
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  lib/dpif-netdev.c | 14 +-
>  lib/keepalive.c   | 42 ++
>  lib/keepalive.h   |  1 +
>  3 files changed, 56 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index c978a76..9021906 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -1021,14 +1021,26 @@ sorted_poll_thread_list(struct dp_netdev *dp,
>  }
> 
>  static void *
> -ovs_keepalive(void *f_ OVS_UNUSED)
> +ovs_keepalive(void *f_)
>  {
> +    struct dp_netdev *dp = f_;
> +
>      pthread_detach(pthread_self());
> 
>      for (;;) {
> +        bool hb_enable;
> +        int n_pmds;
>          uint64_t interval;
> 
>          interval = get_ka_interval();
> +        n_pmds = cmap_count(&dp->poll_threads) - 1;
> +        hb_enable = (n_pmds > 0) ? true : false;
> +
> +        /* Dispatch heartbeats only if pmd[s] exist. */
> +        if (hb_enable) {
> +            dispatch_heartbeats();
> +        }
> +
>          xnanosleep(interval * 1000 * 1000);
>      }
> 
> diff --git a/lib/keepalive.c b/lib/keepalive.c
> index b04877f..0e4b2b6 100644
> --- a/lib/keepalive.c
> +++ b/lib/keepalive.c
> @@ -284,6 +284,48 @@ ka_mark_pmd_thread_sleep(int tid)
>      }
>  }
> 
> +/* Dispatch heartbeats from 'ovs_keepalive' thread. */
> +void
> +dispatch_heartbeats(void)
> +{
> +    struct ka_process_info *pinfo, *pinfo_next;
> +
> +    /* Iterates over the list of processes in 'cached_process_list' map. */
> +    HMAP_FOR_EACH_SAFE (pinfo, pinfo_next, node,
> +                        &ka_info.cached_process_list) {
> +        if (pinfo->state == KA_STATE_UNUSED) {
> +            continue;
> +        }
> +
> +        switch (pinfo->state) {
> +        case KA_STATE_UNUSED:
> +            break;
> +        case KA_STATE_ALIVE:
> +            pinfo->state = KA_STATE_MISSING;
> +            pinfo->last_seen_time = time_wall_msec();
> +            break;
> +        case KA_STATE_MISSING:
> +            pinfo->state = KA_STATE_DEAD;
> +            break;
> +        case KA_STATE_DEAD:
> +            pinfo->state = KA_STATE_GONE;
> +            break;
> +        case KA_STATE_GONE:
> +            break;
> +        case KA_STATE_SLEEP:
> +            pinfo->state = KA_STATE_SLEEP;
> +            pinfo->last_seen_time = time_wall_msec();
> +            break;
> +        default:
> +            OVS_NOT_REACHED();
> +        }
> +
> +        /* Invoke 'ka_update_thread_state' cb function to update state info
> +         * in to 'ka_info.process_list' map. */
> +        ka_info.relay_cb(pinfo->tid, pinfo->state, pinfo->last_seen_time);
> +    }
> +}
> +
>  void
>  ka_init(const struct smap *ovs_other_config)
>  {
> diff --git a/lib/keepalive.h b/lib/keepalive.h
> index 7674ea3..cbc2387 100644
> --- a/lib/keepalive.h
> +++ b/lib/keepalive.h
> @@ -100,6 +100,7 @@ void ka_free_cached_threads(void);
>  void ka_cache_registered_threads(void);
>  void ka_mark_pmd_thread_alive(int);
>  void ka_mark_pmd_thread_sleep(int);
> +void dispatch_heartbeats(void);
>  void ka_init(const struct smap *);
>  void ka_destroy(void);
> 
> --
> 2.4.11
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
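
A minimal sketch of the state progression in the patch above, assuming that a
thread which stops answering is demoted one step per heartbeat (ALIVE,
MISSING, DEAD, GONE) and that a thread which does answer is set back to ALIVE
before the next heartbeat. The state names follow the quoted patch;
ka_next_state() and ka_simulate() are hypothetical helpers written for
illustration and are not functions from the series.

```c
#include <assert.h>
#include <stdbool.h>

/* State names mirror the quoted patch; everything else is illustrative. */
enum ka_state {
    KA_STATE_UNUSED,
    KA_STATE_ALIVE,
    KA_STATE_MISSING,
    KA_STATE_DEAD,
    KA_STATE_GONE,
    KA_STATE_SLEEP
};

/* One heartbeat: demote the thread one step.  UNUSED, GONE and SLEEP are
 * left unchanged, as in the switch of dispatch_heartbeats(). */
static enum ka_state
ka_next_state(enum ka_state s)
{
    switch (s) {
    case KA_STATE_ALIVE:
        return KA_STATE_MISSING;
    case KA_STATE_MISSING:
        return KA_STATE_DEAD;
    case KA_STATE_DEAD:
        return KA_STATE_GONE;
    case KA_STATE_UNUSED:
    case KA_STATE_GONE:
    case KA_STATE_SLEEP:
        return s;
    }
    assert(false);
    return s;
}

/* Simulate 'ticks' heartbeats for a thread that answers only the first
 * 'responsive_ticks' of them (assumption: answering resets it to ALIVE). */
static enum ka_state
ka_simulate(int ticks, int responsive_ticks)
{
    enum ka_state s = KA_STATE_ALIVE;

    for (int i = 0; i < ticks; i++) {
        s = ka_next_state(s);
        if (i < responsive_ticks) {
            s = KA_STATE_ALIVE;   /* Thread marked itself alive again. */
        }
    }
    return s;
}
```

Under these assumptions a PMD that is blocked from the start is reported GONE
after three missed heartbeats, for example ka_simulate(3, 0).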


Re: [ovs-dev] [PATCH v6 2/8] dpif-netdev: Register packet processing cores to KA framework.

2017-12-15 Thread Fischetti, Antonio
LGTM,

Tested-by: Antonio Fischetti 
Acked-by: Antonio Fischetti 


> -----Original Message-----
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Bhanuprakash Bodireddy
> Sent: Friday, December 8, 2017 12:04 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH v6 2/8] dpif-netdev: Register packet
> processing cores to KA framework.
> 
> This commit registers the packet processing PMD threads to keepalive
> framework. Only PMDs that have rxqs mapped will be registered and
> actively monitored by KA framework.
> 
> This commit spawns a keepalive thread that will dispatch heartbeats to
> PMD threads. The pmd threads respond to heartbeats by marking themselves
> alive. As long as PMD responds to heartbeats it is considered 'healthy'.
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  lib/dpif-netdev.c |  79 ++
>  lib/keepalive.c   | 194
> --
>  lib/keepalive.h   |  20 ++
>  lib/ovs-thread.c  |   6 ++
>  lib/ovs-thread.h  |   1 +
>  lib/util.c|  22 +++
>  lib/util.h|   1 +
>  7 files changed, 318 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 07f6113..c978a76 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -49,6 +49,7 @@
>  #include "flow.h"
>  #include "hmapx.h"
>  #include "id-pool.h"
> +#include "keepalive.h"
>  #include "latch.h"
>  #include "netdev.h"
>  #include "netdev-vport.h"
> @@ -592,6 +593,7 @@ struct dp_netdev_pmd_thread {
>      atomic_bool reload;             /* Do we need to reload ports? */
>      pthread_t thread;
>      unsigned core_id;               /* CPU core id of this pmd thread. */
> +    pid_t tid;                      /* PMD thread tid. */
>      int numa_id;                    /* numa node id of this pmd thread. */
>      bool isolated;
> 
> @@ -1018,6 +1020,72 @@ sorted_poll_thread_list(struct dp_netdev *dp,
>  *n = k;
>  }
> 
> +static void *
> +ovs_keepalive(void *f_ OVS_UNUSED)
> +{
> +    pthread_detach(pthread_self());
> +
> +    for (;;) {
> +        uint64_t interval;
> +
> +        interval = get_ka_interval();
> +        xnanosleep(interval * 1000 * 1000);
> +    }
> +
> +    return NULL;
> +}
> +
> +/* Kickstart 'ovs_keepalive' thread. */
> +static void
> +ka_thread_start(struct dp_netdev *dp)
> +{
> +    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
> +
> +    if (ovsthread_once_start(&once)) {
> +        ovs_thread_create("ovs_keepalive", ovs_keepalive, dp);
> +
> +        ovsthread_once_done(&once);
> +    }
> +}
> +
> +/* Register the datapath threads. This gets invoked on every datapath
> + * reconfiguration. The pmd thread[s] having rxq[s] mapped will be
> + * registered to KA framework.
> + */
> +static void
> +ka_register_datapath_threads(struct dp_netdev *dp)
> +{
> +    if (!ka_is_enabled()) {
> +        return;
> +    }
> +
> +    ka_thread_start(dp);
> +
> +    ka_reload_datapath_threads_begin();
> +
> +    struct dp_netdev_pmd_thread *pmd;
> +    CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> +        /*  Register only PMD threads. */
> +        if (pmd->core_id != NON_PMD_CORE_ID) {
> +            /* Skip PMD thread with no rxqs mapping. */
> +            if (OVS_UNLIKELY(!hmap_count(&pmd->poll_list))) {
> +                /* Rxq mapping changes due to datapath reconfiguration.
> +                 * If no rxqs mapped to PMD now due to reconfiguration,
> +                 * unregister the pmd thread. */
> +                ka_unregister_thread(pmd->tid);
> +                continue;
> +            }
> +
> +            ka_register_thread(pmd->tid);
> +            VLOG_INFO("Registered PMD thread [%d] on Core[%d] to KA framework",
> +                      pmd->tid, pmd->core_id);
> +        }
> +    }
> +    ka_cache_registered_threads();
> +
> +    ka_reload_datapath_threads_end();
> +}
> +
>  static void
>  dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc,
>const char *argv[], void *aux OVS_UNUSED)
> @@ -3819,6 +3887,9 @@ reconfigure_datapath(struct dp_netdev *dp)
> 
>      /* Reload affected pmd threads. */
>      reload_affected_pmds(dp);
> +
> +    /* Register datapath threads to KA monitoring. */
> +    ka_register_datapath_threads(dp);
>  }
> 
>  /* Returns true if one of the netdevs in 'dp' requires a reconfiguration */
> @@ -4023,6 +4094,8 @@ pmd_thread_main(void *f_)
> 
>      /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
>      ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
> +    /* Stores tid in to 'pmd->tid'. */
> +    ovsthread_set_tid(&pmd->tid);
>      ovs_numa_thread_setaffinity_core(pmd->core_id);
>      dpdk_set_lcore_id(pmd->core_id);
>      poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
> @@ -4056,6 +4129,9 @@ reload:
>:
> 

Re: [ovs-dev] [PATCH v6 1/8] Keepalive: Add initial keepalive configuration.

2017-12-15 Thread Fischetti, Antonio
LGTM,

Tested-by: Antonio Fischetti 
Acked-by: Antonio Fischetti 


> -----Original Message-----
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Bhanuprakash Bodireddy
> Sent: Friday, December 8, 2017 12:04 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH v6 1/8] Keepalive: Add initial keepalive
> configuration.
> 
> This commit introduces the keepalive configuration by adding
> 'keepalive' module and also helper and initialization functions
> that will be invoked by later commits.
> 
> This commit adds new ovsdb column "keepalive" that shows the status
> of the datapath threads. This is implemented for DPDK datapath and
> only status of PMD threads is reported.
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  lib/automake.mk|   2 +
>  lib/keepalive.c| 147
> +
>  lib/keepalive.h|  86 ++
>  vswitchd/bridge.c  |   3 +
>  vswitchd/vswitch.ovsschema |   8 ++-
>  vswitchd/vswitch.xml   |  49 +++
>  6 files changed, 293 insertions(+), 2 deletions(-)
>  create mode 100644 lib/keepalive.c
>  create mode 100644 lib/keepalive.h
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index effe5b5..91d65be 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -110,6 +110,8 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/json.c \
>   lib/jsonrpc.c \
>   lib/jsonrpc.h \
> + lib/keepalive.c \
> + lib/keepalive.h \
>   lib/lacp.c \
>   lib/lacp.h \
>   lib/latch.h \
> diff --git a/lib/keepalive.c b/lib/keepalive.c
> new file mode 100644
> index 000..ca8dccb
> --- /dev/null
> +++ b/lib/keepalive.c
> @@ -0,0 +1,147 @@
> +/*
> + * Copyright (c) 2017 Intel, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include <config.h>
> +
> +#include "keepalive.h"
> +#include "lib/vswitch-idl.h"
> +#include "openvswitch/vlog.h"
> +#include "seq.h"
> +#include "timeval.h"
> +
> +VLOG_DEFINE_THIS_MODULE(keepalive);
> +
> +static bool keepalive_enable = false;  /* Keepalive disabled by default. */
> +static uint32_t keepalive_timer_interval;  /* keepalive timer interval. */
> +static struct keepalive_info ka_info;
> +
> +/* Returns true if keepalive is enabled, false otherwise. */
> +bool
> +ka_is_enabled(void)
> +{
> +    return keepalive_enable;
> +}
> +
> +/* Finds the thread by 'tid' in 'process_list' map and update
> + * the thread state and last_seen_time stamp.  This is invoked
> + * periodically(based on keepalive-interval) as part of callback
> + * function in the context of keepalive thread.
> + */
> +static void
> +ka_set_thread_state_ts(pid_t tid, enum keepalive_state state,
> +                       uint64_t last_alive)
> +{
> +    struct ka_process_info *pinfo;
> +
> +    ovs_mutex_lock(&ka_info.proclist_mutex);
> +    HMAP_FOR_EACH_WITH_HASH (pinfo, node, hash_int(tid, 0),
> +                             &ka_info.process_list) {
> +        if (pinfo->tid == tid) {
> +            pinfo->state = state;
> +            pinfo->last_seen_time = last_alive;
> +        }
> +    }
> +    ovs_mutex_unlock(&ka_info.proclist_mutex);
> +}
> +
> +/* Retrieve and return the keepalive timer interval from OVSDB. */
> +static uint32_t
> +ka_get_timer_interval(const struct smap *ovs_other_config)
> +{
> +    uint32_t ka_interval;
> +
> +    /* Timer granularity in milliseconds
> +     * Defaults to OVS_KEEPALIVE_TIMEOUT(ms) if not set */
> +    ka_interval = smap_get_int(ovs_other_config, "keepalive-interval",
> +                               OVS_KEEPALIVE_DEFAULT_TIMEOUT);
> +
> +    VLOG_INFO("Keepalive timer interval set to %"PRIu32" (ms)\n", ka_interval);
> +    return ka_interval;
> +}
> +
> +/* Invoke periodically to update the status and last seen timestamp
> + * of the thread in to 'process_list' map. Runs in the context of
> + * keepalive thread.
> + */
> +static void
> +ka_update_thread_state(pid_t tid, const enum keepalive_state state,
> +                       uint64_t last_alive)
> +{
> +    switch (state) {
> +    case KA_STATE_ALIVE:
> +    case KA_STATE_MISSING:
> +        ka_set_thread_state_ts(tid, KA_STATE_ALIVE, last_alive);
> +        break;
> +    case KA_STATE_UNUSED:
> +    case KA_STATE_SLEEP:
> +    case KA_STATE_DEAD:
> +    case KA_STATE_GONE:
> +  

Re: [ovs-dev] [PATCH v6 0/8] Add OVS DPDK keep-alive functionality.

2017-12-15 Thread Fischetti, Antonio
Hi Bhanu,
I've tested v6 and LGTM, works as expected.

To check the behavior I blocked one or more of the PMDs by attaching cgdb.
Then, with

   utilities/ovs-appctl keepalive/pmd-health-show

I could see, for example, 2 of the 3 PMDs (those I intentionally blocked)
correctly reported as DEAD first, then GONE, like

Keepalive status

keepalive status   : Enabled
keepalive interval : 1000 ms
keepalive init time: 15 Dec 2017 13:20:54
PMD threads        : 3

 PMD     CORE    STATE   LAST SEEN TIMESTAMP(UTC)
pmd62    5       GONE    15 Dec 2017 13:22:39
pmd65    7       ALIVE   15 Dec 2017 13:22:50
pmd63    4       GONE    15 Dec 2017 13:22:14


Thanks,
Antonio

> -----Original Message-----
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Bhanuprakash Bodireddy
> Sent: Friday, December 8, 2017 12:04 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH v6 0/8] Add OVS DPDK keep-alive functionality.
> 
> Keepalive feature is aimed at achieving Fastpath Service Assurance
> in OVS-DPDK deployments. It adds support for monitoring the packet
> processing threads by dispatching heartbeats at regular intervals.
> 
> keepalive feature can be enabled through below OVSDB settings.
> 
> enable-keepalive=true
>   - Keepalive feature is disabled by default and should be enabled
> at startup before ovs-vswitchd daemon is started.
> 
> keepalive-interval="5000"
>   - Timer interval in milliseconds for monitoring the packet
> processing cores.
> 
> TESTING:
> The testing of keepalive is done using stress cmd (simulating the stalls).
>   - pmd-cpu-mask=0xf [MQ enabled on DPDK ports]
>   - stress -c 1 &  [tid is usually the __tid + 1 of the output]
>   - chrt -r -p 99 [set realtime priority for stress thread]
>   - taskset -p 0x8 [Pin the stress thread to the core PMD is running]
>   - PMD thread will be descheduled due to its normal priority and yields
>     core to stress thread.
> 
>   - ovs-appctl keepalive/pmd-health-show   [Display that the thread is GONE]
>   - ./ovsdb/ovsdb-client monitor Open_vSwitch  [Should update the status]
> 
>   - taskset -p 0x10   [This brings back pmd thread to life as stress thread
>     is moved to idle core]
> 
>   (watch out for stress threads, and carefully pin them to core not to hang
>    your DUTs during testing).
> 
> v5 -> v6
>   * Remove 2 patches from series
>      - xnanosleep was applied to master as part of high resolution
>        timeout support.
>      - Extend get_process_info() API was also applied to master earlier.
>   * Remove KA_STATE_DOZING as it was initially meant to handle Core C
>     states, not needed for now.
>   * Fixed ka_destroy(), to fix unit test cases 536, 537.
>   * A minor performance degradation(0.5%) is observed with Keepalive
>     enabled.
>     [Tested with loopback case using 1000 IXIA streams/64 byte udp pkts and
>     1 PMD thread(9.239 vs 9.177Mpps) at 10ms ka-interval timeout]
>   * Verified with sparse, MSVC compilers(appveyor).
> 
> v4 -> v5
>   * Add 3 more patches to the series
>  - xnanosleep()
>  - Documentation
>  - Update to NEWS
>   * Remove all references to core_id and instead implemented thread
> based tracking.
>   * Addressed most of the comments in v4.
> 
> v3 -> v4
>   * Split the functionality in to 2 parts. This patch series only updates
>     PMD status to OVSDB. The incremental patch series to handle false
>     positives, negatives and more checking and stats.
>   * Remove code from netdev layer and dependency on rte_keepalive lib.
>   * Merged few patches and simplified the patch series.
>   * Timestamp in human readable form.
> 
> v2 -> v3
>   * Rebase.
>   * Verified with dpdk-stable-17.05.1 release.
>   * Fixed build issues with MSVC and cross checked with appveyor.
> 
> v1 -> v2
>   * Rebase
>   * Drop 01/20 Patch "Consolidate process related APIs" of V1 as it
> is already applied as separate patch.
> 
> RFCv3 -> v1
>   * Made changes to fix failures in some unit test cases.
>   * some more code cleanup w.r.t process related APIs.
> 
> RFCv2 -> RFCv3
>   * Remove POSIX shared memory block implementation (suggested by Aaron).
>   * Rework the logic to register and track threads instead of cores. This
>     way in the future any thread can be registered to KA framework. For now
>     only PMD threads are tracked (suggested by Aaron).
>   * Refactor few APIs and further clean up the code.
> 
> RFCv1 -> RFCv2
>   * Merged the xml and schema commits to later commit where the actual
>     implementation is done(suggested by Ben).
>   * Fix ovs-appctl keepalive/* hang issue when KA disabled.
>   * Fixed memory leaks with appctl commands for keepalive/pmd-health-show,
>     pmd-xstats-show.
>   * Refactor code and fixed APIs dealing with PMD health monitoring.
> 
> 
> Bhanuprakash Bodireddy 
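
As a quick check of the throughput figures quoted in the v5 -> v6 notes, the
relative drop can be worked out directly. This is only a sketch: it assumes
9.239 Mpps is the rate without keepalive and 9.177 Mpps the rate with it
(the cover letter does not say which is which), and relative_drop_percent()
is a hypothetical helper, not part of the series.

```c
#include <assert.h>

/* Relative throughput drop, in percent, between a baseline rate and the
 * rate measured with the feature enabled. */
static double
relative_drop_percent(double baseline_mpps, double enabled_mpps)
{
    assert(baseline_mpps > 0.0);
    return (baseline_mpps - enabled_mpps) / baseline_mpps * 100.0;
}
```

With the two quoted rates this gives a drop of a little under 0.7%, which is
of the same order as the roughly 0.5% mentioned in the changelog.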

Re: [ovs-dev] [PATCH v4 2/2] netdev-dpdk: Add debug appctl to get mempool information.

2017-12-11 Thread Fischetti, Antonio
Still LGTM, please add

Acked-by: Antonio Fischetti <antonio.fische...@intel.com>


> -----Original Message-----
> From: Kavanagh, Mark B
> Sent: Monday, December 11, 2017 4:37 PM
> To: Ilya Maximets <i.maxim...@samsung.com>; ovs-dev@openvswitch.org
> Cc: Heetae Ahn <heetae82@samsung.com>; Fischetti, Antonio
> <antonio.fische...@intel.com>; Loftus, Ciara <ciara.lof...@intel.com>;
> Stokes, Ian <ian.sto...@intel.com>; Wojciechowicz, RobertX
> <robertx.wojciechow...@intel.com>; Flavio Leitner <f...@redhat.com>
> Subject: RE: [PATCH v4 2/2] netdev-dpdk: Add debug appctl to get mempool
> information.
> 
> >From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> >Sent: Monday, December 11, 2017 1:19 PM
> >To: ovs-dev@openvswitch.org
> >Cc: Heetae Ahn <heetae82@samsung.com>; Fischetti, Antonio
> ><antonio.fische...@intel.com>; Loftus, Ciara <ciara.lof...@intel.com>;
> >Kavanagh, Mark B <mark.b.kavan...@intel.com>; Stokes, Ian
> ><ian.sto...@intel.com>; Wojciechowicz, RobertX
> ><robertx.wojciechow...@intel.com>; Flavio Leitner <f...@redhat.com>;
> Ilya
> >Maximets <i.maxim...@samsung.com>
> >Subject: [PATCH v4 2/2] netdev-dpdk: Add debug appctl to get mempool
> >information.
> >
> >New appctl 'netdev-dpdk/get-mempool-info' implemented to get result
> >of 'rte_mempool_list_dump()' function if no arguments passed and
> >'rte_mempool_dump()' if DPDK netdev passed as argument.
> >
> >Could be used for debugging mbuf leaks and other mempool related
> >issues. Most useful in pair with `grep -v "cache_count.*=0"`.
> >
> >Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
> >---
> > NEWS|  1 +
> > lib/netdev-dpdk-unixctl.man |  5 +
> > lib/netdev-dpdk.c   | 54
> >+
> > 3 files changed, 60 insertions(+)
> >
> >diff --git a/NEWS b/NEWS
> >index 69d5dab..e60514e 100644
> >--- a/NEWS
> >+++ b/NEWS
> >@@ -18,6 +18,7 @@ Post-v2.8.0
> >- DPDK:
> >  * Add support for DPDK v17.11
> >  * Add support for vHost IOMMU
> >+ * New debug appctl command 'netdev-dpdk/get-mempool-info'.
> >  * All the netdev-dpdk appctl commands described in ovs-vswitchd
> man
> >page.
> >
> > v2.8.0 - 31 Aug 2017
> >diff --git a/lib/netdev-dpdk-unixctl.man b/lib/netdev-dpdk-unixctl.man
> >index 5af6eca..ac274cd 100644
> >--- a/lib/netdev-dpdk-unixctl.man
> >+++ b/lib/netdev-dpdk-unixctl.man
> >@@ -7,3 +7,8 @@ If \fIinterface\fR is not specified, then it applies to
> all
> >DPDK ports.
> > Detaches device with corresponding \fIpci-address\fR from DPDK.  This
> command
> > can be used to detach device if it wasn't detached automatically after
> port
> > deletion. Refer to the documentation for details and instructions.
> 
> Hi Ilya,
> 
> I would still prefer if the pointer to documentation were more specific;
> however, I won't block on that basis alone.
> 
> Acked-by: Mark Kavanagh <mark.b.kavan...@intel.com>
> Tested-by: Mark Kavanagh <mark.b.kavan...@intel.com>
> 
> Thanks for the series,
> Mark
> 
> >+.IP "\fBnetdev-dpdk/get-mempool-info\fR [\fIinterface\fR]"
> >+Prints the debug information about memory pool used by DPDK
> \fIinterface\fR.
> >+If called without arguments, information of all the available mempools
> will
> >+be printed. For additional mempool statistics enable
> >+\fBCONFIG_RTE_LIBRTE_MEMPOOL_DEBUG\fR while building DPDK.
> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >index 8f22264..3bf461b 100644
> >--- a/lib/netdev-dpdk.c
> >+++ b/lib/netdev-dpdk.c
> >@@ -2586,6 +2586,56 @@ error:
> > free(response);
> > }
> >
> >+static void
> >+netdev_dpdk_get_mempool_info(struct unixctl_conn *conn,
> >+                             int argc, const char *argv[],
> >+                             void *aux OVS_UNUSED)
> >+{
> >+    size_t size;
> >+    FILE *stream;
> >+    char *response = NULL;
> >+    struct netdev *netdev = NULL;
> >+
> >+    if (argc == 2) {
> >+        netdev = netdev_from_name(argv[1]);
> >+        if (!netdev || !is_dpdk_class(netdev->netdev_class)) {
> >+            unixctl_command_reply_error(conn, "Not a DPDK Interface");
> >+            goto out;
> >+        }
> >+    }
> >+
> >+    stream = open_memstream(&response, &size);
> >+    if (!stream) {
> >+response = xasprintf("Unab

Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Fischetti, Antonio
I just checked there's no issue if we use dpdk v17.11 on the host
and dpdk 17.05.2 in the guest.

Antonio

> -----Original Message-----
> From: Kavanagh, Mark B
> Sent: Thursday, December 7, 2017 12:28 PM
> To: Ilya Maximets <i.maxim...@samsung.com>; Stokes, Ian
> <ian.sto...@intel.com>; d...@openvswitch.org
> Cc: maxime.coque...@redhat.com; jan.scheur...@ericsson.com; Mooney, Sean
> K <sean.k.moo...@intel.com>; Fischetti, Antonio
> <antonio.fische...@intel.com>; Bie, Tiwei <tiwei@intel.com>;
> Mcnamara, John <john.mcnam...@intel.com>; Guoshuai Li
> <l...@dtdream.com>; ktray...@redhat.com; Loftus, Ciara
> <ciara.lof...@intel.com>; Yuanhan Liu <y...@fridaylinux.org>
> Subject: RE: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
> 
> >From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> >Sent: Thursday, December 7, 2017 12:19 PM
> >To: Kavanagh, Mark B <mark.b.kavan...@intel.com>; Stokes, Ian
> ><ian.sto...@intel.com>; d...@openvswitch.org
> >Cc: maxime.coque...@redhat.com; jan.scheur...@ericsson.com; Mooney,
> Sean K
> ><sean.k.moo...@intel.com>; Fischetti, Antonio
> <antonio.fische...@intel.com>;
> >Bie, Tiwei <tiwei@intel.com>; Mcnamara, John
> <john.mcnam...@intel.com>;
> >Guoshuai Li <l...@dtdream.com>; ktray...@redhat.com; Loftus, Ciara
> ><ciara.lof...@intel.com>; Yuanhan Liu <y...@fridaylinux.org>
> >Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
> >
> >On 07.12.2017 14:32, Kavanagh, Mark B wrote:
> >> Yuanhan's old email address was used in the previous mail - updated
> >correctly here.
> >> -Mark
> >>
> >>> -Original Message-
> >>> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> >boun...@openvswitch.org]
> >>> On Behalf Of Kavanagh, Mark B
> >>> Sent: Thursday, December 7, 2017 11:24 AM
> >>> To: Stokes, Ian <ian.sto...@intel.com>; d...@openvswitch.org
> >>> Cc: Liu, Yuanhan <yuanhan@intel.com>; Bie, Tiwei
> <tiwei@intel.com>;
> >>> Mcnamara, John <john.mcnam...@intel.com>;
> maxime.coque...@redhat.com;
> >>> i.maxim...@samsung.com
> >>> Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU
> support
> >>>
> >>> Hi folks,
> >>>
> >>> Thanks for all of your respective reviews and testing of this patchset.
> >>>
> >>> It seems, however, that an issue in DPDK v17.11 has come to light that
> >>> affects guests which use testpmd.
> >>> Specifically, traffic does not reach a guest when traffic is live prior
> >>> to kicking off the testpmd app within said guest, and at least two
> >>> forwarding cores are used. [1]
> >>>
> >>> This is explained in additional detail in [2], an extract of which is
> >>> reproduced below:
> >>>
> >>>   "the vector Rx could be broken if backend has consumed all the avail
> >>>    descs before the device is started. Because in current implementation,
> >>>    the vector Rx will return immediately without refilling the avail ring
> >>>    if the used ring is empty. So we have to refill the avail ring after
> >>>    flushing the elements in the used ring."
> >>>
> >>> This issue was initially uncovered by Antonio Fischetti, as part of the
> >>> 17.11 patchset validation, and has since been localized to DPDK, rather
> >>> than OvS.
> >>> As a result, it seems now that we should not move to 17.11, but seek an
> >>> out-of-tree 17.11.1 stable/bugfix release. I'm interested in opinions on
> >>> this - should we:
> >>>   a) move to 17.11 now, note the issue above in the 'errata' section of
> >>>      the documentation, and move to 17.11.1 when it becomes available in
> >>>      February of next year
> >>>   b) request the early release of the 17.11.1 stable branch, which
> >>>      incorporates a fix for this issue (the possibility, and
> >>>      availability, of which are both TBD).
> >>>
> >>> Thanks in advance,
> >>> Mark
> >
> >Hmm. Isn't it a guest driver issue? Do we need to care so much about
> running
> >OVS inside the VM? If I assumed right, if we're running OVS not inside
> the VM,
> >there will be no issues on the OVS side. The issue h

Re: [ovs-dev] [PATCH 3/2] vswitchd: Document netdev-dpdk commands.

2017-10-31 Thread Fischetti, Antonio
Thanks, LGTM.
-Antonio

Acked-by: Antonio Fischetti <antonio.fische...@intel.com>

> -----Original Message-----
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Tuesday, October 31, 2017 3:18 PM
> To: ovs-dev@openvswitch.org
> Cc: Heetae Ahn <heetae82@samsung.com>; Fischetti, Antonio
> <antonio.fische...@intel.com>; Loftus, Ciara <ciara.lof...@intel.com>;
> Kavanagh, Mark B <mark.b.kavan...@intel.com>; Stokes, Ian
> <ian.sto...@intel.com>; Wojciechowicz, RobertX
> <robertx.wojciechow...@intel.com>; Ilya Maximets <i.maxim...@samsung.com>
> Subject: [PATCH 3/2] vswitchd: Document netdev-dpdk commands.
> 
> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
> ---
>  NEWS|  3 +++
>  lib/automake.mk |  1 +
>  lib/netdev-dpdk-unixctl.man | 13 +
>  manpages.mk |  2 ++
>  vswitchd/ovs-vswitchd.8.in  |  1 +
>  5 files changed, 20 insertions(+)
>  create mode 100644 lib/netdev-dpdk-unixctl.man
> 
> diff --git a/NEWS b/NEWS
> index 1325d31..6c09d71 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -5,6 +5,9 @@ Post-v2.8.0
> chassis "hostname" in addition to a chassis "name".
> - Linux kernel 4.13
>   * Add support for compiling OVS with the latest Linux 4.13 kernel
> +   - DPDK:
> +     * New debug appctl command 'netdev-dpdk/get-mempool-info'.
> +     * All the netdev-dpdk appctl commands described in ovs-vswitchd man page.
> 
>  v2.8.0 - 31 Aug 2017
>  
> diff --git a/lib/automake.mk b/lib/automake.mk
> index ca1cf5d..f6a82d5 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -467,6 +467,7 @@ MAN_FRAGMENTS += \
>   lib/db-ctl-base.man \
>   lib/dpctl.man \
>   lib/memory-unixctl.man \
> + lib/netdev-dpdk-unixctl.man \
>   lib/ofp-version.man \
>   lib/ovs.tmac \
>   lib/service.man \
> diff --git a/lib/netdev-dpdk-unixctl.man b/lib/netdev-dpdk-unixctl.man
> new file mode 100644
> index 000..73b2e10
> --- /dev/null
> +++ b/lib/netdev-dpdk-unixctl.man
> @@ -0,0 +1,13 @@
> +.SS "NETDEV-DPDK COMMANDS"
> +These commands manage DPDK related ports (\fItype=dpdk*\fR).
> +.IP "\fBnetdev-dpdk/set-admin-state\fR [\fIinterface\fR] \fIstate\fR"
> +Sets admin state for DPDK \fIinterface\fR (or all interfaces if none is given)
> +to \fIstate\fR.  \fIstate\fR can be "up" or "down".
> +.IP "\fBnetdev-dpdk/detach\fR \fIpci-address\fR"
> +Detaches device with corresponding \fIpci-address\fR from DPDK.  This command
> +can be used to detach device if it wasn't detached automatically after port
> +deletion. Refer to the documentation for details and instructions.
> +.IP "\fBnetdev-dpdk/get-mempool-info\fR [\fIinterface\fR]"
> +Prints the debug information about memory pool used by DPDK \fIinterface\fR.
> +If called without arguments, information of all the available mempools will
> +be printed.
> diff --git a/manpages.mk b/manpages.mk
> index d610d88..c89bc45 100644
> --- a/manpages.mk
> +++ b/manpages.mk
> @@ -279,6 +279,7 @@ vswitchd/ovs-vswitchd.8: \
>   lib/daemon.man \
>   lib/dpctl.man \
>   lib/memory-unixctl.man \
> + lib/netdev-dpdk-unixctl.man \
>   lib/service.man \
>   lib/ssl-bootstrap.man \
>   lib/ssl.man \
> @@ -296,6 +297,7 @@ lib/coverage-unixctl.man:
>  lib/daemon.man:
>  lib/dpctl.man:
>  lib/memory-unixctl.man:
> +lib/netdev-dpdk-unixctl.man:
>  lib/service.man:
>  lib/ssl-bootstrap.man:
>  lib/ssl.man:
> diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in
> index c18baf6..76ccfcb 100644
> --- a/vswitchd/ovs-vswitchd.8.in
> +++ b/vswitchd/ovs-vswitchd.8.in
> @@ -283,6 +283,7 @@ port names, which this thread polls.
>  .IP "\fBdpif-netdev/pmd-rxq-rebalance\fR [\fIdp\fR]"
>  Reassigns rxqs to pmds in the datapath \fIdp\fR based on their current usage.
>  .
> +.so lib/netdev-dpdk-unixctl.man
>  .so ofproto/ofproto-dpif-unixctl.man
>  .so ofproto/ofproto-unixctl.man
>  .so lib/vlog-unixctl.man
> --
> 2.7.4



Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Add debug appctl to get mempool information.

2017-10-31 Thread Fischetti, Antonio
Thanks Ilya, this looks like a useful debugging command; I gave it a try.
I agree with Raymond that it should be referenced somewhere in the
documentation. Besides that, LGTM.

Acked-by: Antonio Fischetti <antonio.fische...@intel.com>

> -----Original Message-----
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Tuesday, October 31, 2017 11:35 AM
> To: ovs-dev@openvswitch.org
> Cc: Heetae Ahn <heetae82@samsung.com>; Fischetti, Antonio
> <antonio.fische...@intel.com>; Loftus, Ciara <ciara.lof...@intel.com>;
> Kavanagh, Mark B <mark.b.kavan...@intel.com>; Stokes, Ian
> <ian.sto...@intel.com>; Wojciechowicz, RobertX
> <robertx.wojciechow...@intel.com>; Ilya Maximets <i.maxim...@samsung.com>
> Subject: [PATCH 2/2] netdev-dpdk: Add debug appctl to get mempool information.
> 
> New appctl 'netdev-dpdk/get-mempool-info' implemented to get result
> of 'rte_mempool_list_dump()' function if no arguments passed and
> 'rte_mempool_dump()' if DPDK netdev passed as argument.
> 
> Could be used for debugging mbuf leaks and other mempool related
> issues. Most useful in pair with `grep -v "cache_count.*=0"`.
> 
> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
> ---
>  lib/netdev-dpdk.c | 54 ++
>  1 file changed, 54 insertions(+)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 4ec536d..0e4a08c 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2550,6 +2550,56 @@ error:
>  free(response);
>  }
> 
> +static void
> +netdev_dpdk_get_mempool_info(struct unixctl_conn *conn,
> +                             int argc, const char *argv[],
> +                             void *aux OVS_UNUSED)
> +{
> +    size_t size;
> +    FILE *stream;
> +    char *response = NULL;
> +    struct netdev *netdev = NULL;
> +
> +    if (argc == 2) {
> +        netdev = netdev_from_name(argv[1]);
> +        if (!netdev || !is_dpdk_class(netdev->netdev_class)) {
> +            unixctl_command_reply_error(conn, "Not a DPDK Interface");
> +            goto out;
> +        }
> +    }
> +
> +    stream = open_memstream(&response, &size);
> +    if (!stream) {
> +        response = xasprintf("Unable to open memstream: %s.",
> +                             ovs_strerror(errno));
> +        unixctl_command_reply_error(conn, response);
> +        goto out;
> +    }
> +
> +    if (netdev) {
> +        struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +
> +        ovs_mutex_lock(&dev->mutex);
> +        ovs_mutex_lock(&dpdk_mp_mutex);
> +
> +        rte_mempool_dump(stream, dev->mp);
> +
> +        ovs_mutex_unlock(&dpdk_mp_mutex);
> +        ovs_mutex_unlock(&dev->mutex);
> +    } else {
> +        ovs_mutex_lock(&dpdk_mp_mutex);
> +        rte_mempool_list_dump(stream);
> +        ovs_mutex_unlock(&dpdk_mp_mutex);
> +    }
> +
> +    fclose(stream);
> +
> +    unixctl_command_reply(conn, response);
> +out:
> +    free(response);
> +    netdev_close(netdev);
> +}
> +
>  /*
>   * Set virtqueue flags so that we do not receive interrupts.
>   */
> @@ -2806,6 +2856,10 @@ netdev_dpdk_class_init(void)
>                                   "pci address of device", 1, 1,
>                                   netdev_dpdk_detach, NULL);
> 
> +        unixctl_command_register("netdev-dpdk/get-mempool-info",
> +                                 "[netdev]", 0, 1,
> +                                 netdev_dpdk_get_mempool_info, NULL);
> +
>          ovsthread_once_done(&once);
>      }
> 
> --
> 2.7.4
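
The patch above relies on open_memstream() to turn a routine that can only
print to a FILE * into a string that can be sent as a command reply. A
minimal standalone sketch of that technique follows. dump_pool() is a
made-up stand-in for rte_mempool_dump(), and capture_dump() is a hypothetical
helper; neither name comes from OVS or DPDK.

```c
#define _POSIX_C_SOURCE 200809L   /* For open_memstream(). */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a library routine that can only print to a FILE *,
 * playing the role rte_mempool_dump() has in the quoted patch. */
static void
dump_pool(FILE *f, const char *name, unsigned int count)
{
    fprintf(f, "mempool <%s>\n  count=%u\n", name, count);
}

/* Capture the dump in a heap-allocated string.  Returns NULL on failure;
 * the caller owns the result and must free() it. */
static char *
capture_dump(const char *name, unsigned int count)
{
    char *buf = NULL;
    size_t size = 0;
    FILE *stream;

    assert(name != NULL);

    stream = open_memstream(&buf, &size);
    if (!stream) {
        return NULL;
    }

    dump_pool(stream, name, count);

    /* fclose() flushes the stream and finalises 'buf' and 'size'. */
    if (fclose(stream) != 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

The buffer is only guaranteed to be up to date after fflush() or fclose(),
which is why the patch closes the stream before replying.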



Re: [ovs-dev] [PATCH 4/4] netdev-dpdk: Remove unused MAX_NB_MBUF.

2017-10-31 Thread Fischetti, Antonio
Hi Ilya, I've tested this whole patch series by
 - running some PVP tests,
 - checking that NUMA-awareness works, and
 - making small/big MTU changes that cause reuse of an existing mempool or
   creation of a new one.

It works fine.

LGTM

Acked-by: Antonio Fischetti <antonio.fische...@intel.com>

> -----Original Message-----
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Monday, October 30, 2017 12:53 PM
> To: ovs-dev@openvswitch.org
> Cc: Heetae Ahn <heetae82@samsung.com>; Fischetti, Antonio
> <antonio.fische...@intel.com>; Loftus, Ciara <ciara.lof...@intel.com>;
> Kavanagh, Mark B <mark.b.kavan...@intel.com>; Stokes, Ian
> <ian.sto...@intel.com>; Wojciechowicz, RobertX
> <robertx.wojciechow...@intel.com>; Ilya Maximets <i.maxim...@samsung.com>
> Subject: [PATCH 4/4] netdev-dpdk: Remove unused MAX_NB_MBUF.
> 
> CC: Robert Wojciechowicz <robertx.wojciechow...@intel.com>
> Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
> ---
>  lib/netdev-dpdk.c | 18 --
>  1 file changed, 4 insertions(+), 14 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index cdb3244..0b40966 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -89,23 +89,13 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>  #define NETDEV_DPDK_MBUF_ALIGN  1024
>  #define NETDEV_DPDK_MAX_PKT_LEN 9728
> 
> -/* Max and min number of packets in the mempool.  OVS tries to allocate a
> - * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have
> - * enough hugepages) we keep halving the number until the allocation succeeds
> - * or we reach MIN_NB_MBUF */
> -
> -#define MAX_NB_MBUF  (4096 * 64)
> +/* Min number of packets in the mempool.  OVS tries to allocate a mempool with
> + * roughly estimated number of mbufs: if this fails (because the system doesn't
> + * have enough hugepages) we keep halving the number until the allocation
> + * succeeds or we reach MIN_NB_MBUF */
>  #define MIN_NB_MBUF  (4096 * 4)
>  #define MP_CACHE_SZ  RTE_MEMPOOL_CACHE_MAX_SIZE
> 
> -/* MAX_NB_MBUF can be divided by 2 many times, until MIN_NB_MBUF */
> -BUILD_ASSERT_DECL(MAX_NB_MBUF % ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF) == 0);
> -
> -/* The smallest possible NB_MBUF that we're going to try should be a multiple
> - * of MP_CACHE_SZ. This is advised by DPDK documentation. */
> -BUILD_ASSERT_DECL((MAX_NB_MBUF / ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
> -  % MP_CACHE_SZ == 0);
> -
>  /*
>   * DPDK XSTATS Counter names definition
>   */
> --
> 2.7.4
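
The "keep halving until the allocation succeeds or we reach MIN_NB_MBUF" behaviour described in the updated comment can be sketched roughly as follows. This is only an illustration, not the code in lib/netdev-dpdk.c: the callback type `try_alloc_fn`, the helper `alloc_with_halving()` and the toy allocator `budget_alloc()` are hypothetical names, and the real code calls into DPDK rather than a callback.

```c
#include <stddef.h>
#include <stdint.h>

#define MIN_NB_MBUF (4096 * 4)

/* Hypothetical allocator callback: returns nonzero on success. */
typedef int (*try_alloc_fn)(uint32_t n_mbufs, void *aux);

/* Starts from 'estimate' and halves the request until the allocation
 * succeeds or the request would drop below MIN_NB_MBUF.
 * Returns the number of mbufs that was allocated, or 0 on failure. */
static uint32_t
alloc_with_halving(uint32_t estimate, try_alloc_fn try_alloc, void *aux)
{
    uint32_t n = estimate;

    while (n >= MIN_NB_MBUF) {
        if (try_alloc(n, aux)) {
            return n;
        }
        n /= 2;
    }
    return 0;
}

/* Toy allocator for illustration only: succeeds when the request fits
 * within the budget pointed to by 'aux'. */
static int
budget_alloc(uint32_t n_mbufs, void *aux)
{
    const uint32_t *budget = aux;

    return n_mbufs <= *budget;
}
```

With a starting estimate of 4096 * 64 and room for only 70000 mbufs, the sketch would settle on 65536; if not even MIN_NB_MBUF fits, it gives up and returns 0.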



Re: [ovs-dev] [PATCH 3/4] netdev-dpdk: Factor out struct dpdk_mp.

2017-10-31 Thread Fischetti, Antonio
Thanks Ilya, it's a good rework, especially for the
netdev_dpdk_mempool_configure() function.

LGTM

Acked-by: Antonio Fischetti 





Re: [ovs-dev] [PATCH 2/4] netdev-dpdk: Fix dpdk_mp leak in case of EEXIST.

2017-10-31 Thread Fischetti, Antonio
LGTM 
Acked-by: Antonio Fischetti <antonio.fische...@intel.com>

> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Monday, October 30, 2017 12:53 PM
> To: ovs-dev@openvswitch.org
> Cc: Heetae Ahn <heetae82@samsung.com>; Fischetti, Antonio
> <antonio.fische...@intel.com>; Loftus, Ciara <ciara.lof...@intel.com>;
> Kavanagh, Mark B <mark.b.kavan...@intel.com>; Stokes, Ian
> <ian.sto...@intel.com>; Wojciechowicz, RobertX
> <robertx.wojciechow...@intel.com>; Ilya Maximets <i.maxim...@samsung.com>
> Subject: [PATCH 2/4] netdev-dpdk: Fix dpdk_mp leak in case of EEXIST.
> 
> CC: Robert Wojciechowicz <robertx.wojciechow...@intel.com>
> CC: Antonio Fischetti <antonio.fische...@intel.com>
> Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each 
> port.")
> Fixes: b6b26021d2e2 ("netdev-dpdk: fix management of pre-existing mempools.")
> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
> ---
>  lib/netdev-dpdk.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 1e9d78f..ba6add2 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -649,6 +649,12 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
>   * Update dev with the new values. */
>  dev->mtu = dev->requested_mtu;
>  dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> +/* 'mp' should contain pointer to the mempool already owned by netdev.
> + * Otherwise something went completely wrong. */
> +ovs_assert(dev->dpdk_mp);
> +ovs_assert(dev->dpdk_mp->mp == mp->mp);
> +/* Free the returned struct dpdk_mp because it will not be used. */
> +rte_free(mp);
>  return EEXIST;
>  } else {
>  /* A new mempool was created, release the previous one. */
> --
> 2.7.4
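
The leak fixed here follows a common ownership pattern: the lookup hands back a freshly allocated wrapper even when the underlying pool is one the device already owns, so the caller has to free that redundant wrapper before returning EEXIST. A minimal sketch of the idea, using plain malloc/free and hypothetical types (`struct pool`, `struct pool_ref`, `struct device`) in place of the real DPDK and OVS structures:

```c
#include <errno.h>
#include <stdlib.h>

struct pool { int id; };                /* Stands in for struct rte_mempool. */
struct pool_ref { struct pool *mp; };   /* Stands in for struct dpdk_mp. */
struct device { struct pool_ref *ref; };

/* Allocates a new wrapper around 'mp'.  Returns NULL if malloc fails. */
static struct pool_ref *
pool_ref_new(struct pool *mp)
{
    struct pool_ref *ref = malloc(sizeof *ref);

    if (ref) {
        ref->mp = mp;
    }
    return ref;
}

/* Applies 'candidate' to 'dev'.  If it wraps the pool that the device
 * already owns, the candidate wrapper is redundant: it is freed here and
 * EEXIST is returned.  Otherwise the previous wrapper is released and
 * replaced, and 0 is returned. */
static int
device_set_pool(struct device *dev, struct pool_ref *candidate)
{
    if (dev->ref && dev->ref->mp == candidate->mp) {
        free(candidate);
        return EEXIST;
    }
    free(dev->ref);
    dev->ref = candidate;
    return 0;
}
```

Unlike this sketch, the patch learns that the mempool already exists from the lookup itself and only asserts that the pointers match; the pointer comparison above is a simplification.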



Re: [ovs-dev] [PATCH 1/3] dpif-netdev: Rename rxq_interval.

2017-10-24 Thread Fischetti, Antonio
LGTM, makes the code more readable.

Acked-by: Antonio Fischetti 


> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Kevin Traynor
> Sent: Friday, October 20, 2017 11:37 AM
> To: Darrell Ball ; d...@openvswitch.org;
> i.maxim...@samsung.com; Stokes, Ian 
> Subject: Re: [ovs-dev] [PATCH 1/3] dpif-netdev: Rename rxq_interval.
> 
> Ping again. This is a simple variable rename that was requested.
> 
> On 09/22/2017 08:22 PM, Darrell Ball wrote:
> > Are there any other comments?
> >
> >
> >
> > On 8/30/17, 10:49 AM, "Darrell Ball"  wrote:
> >
> > Thanks Kevin
> >
> > Naming is hard.
> > The name looks a bit more intuitive and matches closely with the
> description previously added.
> >
> > Darrell
> >
> > On 8/30/17, 10:45 AM, "Kevin Traynor"  wrote:
> >
> > rxq_interval was added before there were other #defines
> > and code related to rxq intervals.
> >
> > Rename to rxq_next_cycle_store in order to make it more intuitive.
> >
> > Reported-by: Ilya Maximets 
> > Signed-off-by: Kevin Traynor 
> > ---
> >  lib/dpif-netdev.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 071ec14..55d5656 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -576,5 +576,5 @@ struct dp_netdev_pmd_thread {
> >  /* End of the next time interval for which processing cycles
> > are stored for each polled rxq. */
> > -long long int rxq_interval;
> > +long long int rxq_next_cycle_store;
> >
> >  /* Statistics. */
> > @@ -4507,5 +4507,5 @@ dp_netdev_configure_pmd(struct
> dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
> >  cmap_init(&pmd->classifiers);
> >  pmd->next_optimization = time_msec() +
> DPCLS_OPTIMIZATION_INTERVAL;
> > -pmd->rxq_interval = time_msec() + PMD_RXQ_INTERVAL_LEN;
> > +pmd->rxq_next_cycle_store = time_msec() + PMD_RXQ_INTERVAL_LEN;
> >  hmap_init(&pmd->poll_list);
> >  hmap_init(&pmd->tx_ports);
> > @@ -5951,5 +5951,5 @@ dp_netdev_pmd_try_optimize(struct
> dp_netdev_pmd_thread *pmd,
> >  long long int now = time_msec();
> >
> > -if (now > pmd->rxq_interval) {
> > +if (now > pmd->rxq_next_cycle_store) {
> >  /* Get the cycles that were used to process each queue and
> store. */
> >  for (unsigned i = 0; i < poll_cnt; i++) {
> > @@ -5961,5 +5961,5 @@ dp_netdev_pmd_try_optimize(struct
> dp_netdev_pmd_thread *pmd,
> >  }
> >  /* Start new measuring interval */
> > -pmd->rxq_interval = now + PMD_RXQ_INTERVAL_LEN;
> > +pmd->rxq_next_cycle_store = now + PMD_RXQ_INTERVAL_LEN;
> >  }
> >
> > --
> > 1.8.3.1
> >
> >
> >
> >
> >
> 
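
The renamed field holds a deadline ("end of the next time interval"), which is why a name like rxq_next_cycle_store reads better than rxq_interval. The pattern in the quoted hunks can be sketched in isolation as below. The struct and function names are hypothetical, and the interval length of 10000 ms is an assumption made for the example, not something taken from the patch.

```c
#include <stdbool.h>

#define INTERVAL_LEN_MS 10000LL   /* Assumed interval length, in ms. */

struct cycle_store {
    long long int next_cycle_store;   /* End of the current interval, ms. */
    unsigned int n_stored;            /* Number of intervals stored so far. */
};

/* Starts the first measuring interval at time 'now' (in ms). */
static void
cycle_store_init(struct cycle_store *cs, long long int now)
{
    cs->next_cycle_store = now + INTERVAL_LEN_MS;
    cs->n_stored = 0;
}

/* Returns true if the current interval has ended at 'now'.  In that case
 * one more interval is counted as stored and a new interval is started. */
static bool
cycle_store_tick(struct cycle_store *cs, long long int now)
{
    if (now > cs->next_cycle_store) {
        cs->n_stored++;
        cs->next_cycle_store = now + INTERVAL_LEN_MS;
        return true;
    }
    return false;
}
```

Note that the comparison is strict, as in the patch: a tick at exactly the deadline does not yet close the interval.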


Re: [ovs-dev] [PATCH v8 6/6] netdev-dpdk: Rename dpdk_mp_put as dpdk_mp_free.

2017-10-20 Thread Fischetti, Antonio


> -Original Message-
> From: Kavanagh, Mark B
> Sent: Friday, October 20, 2017 9:44 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Cc: Darrell Ball <dlu...@gmail.com>; Loftus, Ciara <ciara.lof...@intel.com>;
> Kevin Traynor <ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>
> Subject: RE: [PATCH v8 6/6] netdev-dpdk: Rename dpdk_mp_put as dpdk_mp_free.
> 
> >From: Fischetti, Antonio
> >Sent: Thursday, October 19, 2017 5:54 PM
> >To: d...@openvswitch.org
> >Cc: Kavanagh, Mark B <mark.b.kavan...@intel.com>; Darrell Ball
> ><dlu...@gmail.com>; Loftus, Ciara <ciara.lof...@intel.com>; Kevin Traynor
> ><ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>; Fischetti, Antonio
> ><antonio.fische...@intel.com>
> >Subject: [PATCH v8 6/6] netdev-dpdk: Rename dpdk_mp_put as dpdk_mp_free.
> >
> >For readability purposes dpdk_mp_put is renamed as dpdk_mp_free.
> >
> >CC: Mark B Kavanagh <mark.b.kavan...@intel.com>
> >CC: Darrell Ball <dlu...@gmail.com>
> >CC: Ciara Loftus <ciara.lof...@intel.com>
> >CC: Kevin Traynor <ktray...@redhat.com>
> >CC: Aaron Conole <acon...@redhat.com>
> >Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> >---
> 
> 
> Hi Antonio,
> 
> I already Acked this patch in v6 (it was patch 4/5 of that series) - in future,
> please carry forward Acks ;)

[Antonio] My bad, sorry for that.

> 
> Acked-by: Mark Kavanagh <mark.b.kavan...@intel.com>
> 
> Cheers,
> Mark
> 
> 


Re: [ovs-dev] [PATCH v8 4/6] netdev-dpdk: manage failure in mempool name creation.

2017-10-20 Thread Fischetti, Antonio


> -Original Message-
> From: Kavanagh, Mark B
> Sent: Friday, October 20, 2017 10:28 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Cc: Darrell Ball <dlu...@gmail.com>; Loftus, Ciara <ciara.lof...@intel.com>;
> Kevin Traynor <ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>
> Subject: RE: [PATCH v8 4/6] netdev-dpdk: manage failure in mempool name
> creation.
> 
> >From: Fischetti, Antonio
> >Sent: Thursday, October 19, 2017 5:54 PM
> >To: d...@openvswitch.org
> >Cc: Kavanagh, Mark B <mark.b.kavan...@intel.com>; Darrell Ball
> ><dlu...@gmail.com>; Loftus, Ciara <ciara.lof...@intel.com>; Kevin Traynor
> ><ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>; Fischetti, Antonio
> ><antonio.fische...@intel.com>
> >Subject: [PATCH v8 4/6] netdev-dpdk: manage failure in mempool name creation.
> >
> >In case a mempool name could not be generated log a message
> >and return a null mempool pointer to the caller.
> >
> >CC: Mark B Kavanagh <mark.b.kavan...@intel.com>
> >CC: Darrell Ball <dlu...@gmail.com>
> >CC: Ciara Loftus <ciara.lof...@intel.com>
> >CC: Kevin Traynor <ktray...@redhat.com>
> >CC: Aaron Conole <acon...@redhat.com>
> >Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each
> >port.")
> >Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> >---
> > lib/netdev-dpdk.c | 7 +++
> > 1 file changed, 7 insertions(+)
> >
> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >index dc1e9c3..6fc6e1b 100644
> >--- a/lib/netdev-dpdk.c
> >+++ b/lib/netdev-dpdk.c
> >@@ -502,6 +502,9 @@ dpdk_mp_name(struct dpdk_mp *dmp)
> > int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%d_%u",
> >h, dmp->socket_id, dmp->mtu, dmp->mp_size);
> > if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
> >+VLOG_DBG("snprintf returned %d. Failed to generate a mempool "
> >+"name for \"%s\". Hash:0x%x, mtu:%d, mbufs:%u.",
> >+ret, dmp->if_name, h, dmp->mtu, dmp->mp_size);
> > return NULL;
> > }
> > return mp_name;
> >@@ -533,6 +536,10 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool
> >*mp_exists)
> >
> > do {
> > char *mp_name = dpdk_mp_name(dmp);
> >+if (!mp_name) {
> >+rte_free(dmp);
> >+return NULL;
> >+}
> 
> 
> This is a fix, and as such, needs to include a 'Fixes:' tag in the commit
> message.
> 
> I believe that Kevin made the same comment on this patch in v7.

[Antonio] Actually I added 
   Fixes: d555d9bded5f ("netdev-dpdk: Create separate...
right before the Signed-off-by line.
Is that what you meant?


> 
> Thanks,
> Mark
> 
> >
> > VLOG_DBG("Port %s: Requesting a mempool of %u mbufs "
> >   "on socket %d for %d Rx and %d Tx queues.",
> >--
> >2.4.11



Re: [ovs-dev] [PATCH v8 0/6] netdev-dpdk: Fix mempool management and other cleanup.

2017-10-19 Thread Fischetti, Antonio


> -Original Message-
> From: Stokes, Ian
> Sent: Thursday, October 19, 2017 6:22 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v8 0/6] netdev-dpdk: Fix mempool management and
> other cleanup.
> 
> > Patch #1, #2 and #4 contain the fixes.

[Antonio] Further to previous line: patches #1 and #2 do fix the issues we saw: 
 - issue with vhostuserclient in a PVP test
 - issue of new MTU not displayed when an existing mempool is returned.
 - issue with the NUMA-Aware usecase 

All the other patches (#3, #4, #5 and #6) are small improvements or just
clean-up that could even be skipped altogether.
Actually, patch #4 is not just a clean-up: it handles an unlikely event that
'might' happen (I've never seen it), which is why I listed it as a 'fix' in the
line above.

So just the first 2 patches are needed to fix the mempool issues.


> > All other patches in this series are a clean up for code readability or
> > small improvements.
> >
> 
> Hi Antonio, if the fixes are in patches 1, 2 and 4, is there a reason they have
> not been grouped at the start of the patchset as patches 1, 2 and 3?
> 
> The other patches could be applied separately afterwards?

[Antonio] Patches #1 and #2 are the ones needed to fix the issues.
All other patches are clean-up or small improvements that could optionally be
applied. I think it's better to apply them in the given order.


> 
> > List of versions:
> >  - v8:
> >- Debug message rephrased in patch #2.
> >- Reworked patch #4 for snprintf error code.
> >- Comments in patch #6 moved into patch #1.
> >
> >  - v7:
> >- Restored 2 separate patches for the 2 fixes.
> >- patch #1: detect when previous mempools must be released.
> >- patch #2: mempool name generation for NUMA-awareness test case.
> >- patch "netdev-dpdk: manage empty mempool names." renamed as
> >  "netdev-dpdk: manage failure in mempool name creation."
> >- Various rework based on comments.
> >
> >  - v6:
> >- patches #1 and #2 squashed into one.
> >- Reworked to consider the latest comments.
> >- tested the release of pre-existing mempools (reported by Ciara L.)
> >- tested the change of MTU when an existing mempool is returned
> >  (reported by Robert M.)
> >- tested the NUMA-Awareness usecase (reported by Ciara L.)
> >  - v5: manage new MTU value when a pre-existing mempool is returned.
> >  - v4: fix NUMA awareness usecase
> >  - v3: avoid deletion of pre-existing mempools
> >  - v2: rework to accommodate code changes for dpdk ports too
> >  - v1: 1st implementation.
> >
> > Fischetti, Antonio (6):
> >   netdev-dpdk: fix management of pre-existing mempools.
> >   Fix mempool names to reflect socket id.
> >   netdev-dpdk: skip init for existing mempools.
> >   netdev-dpdk: manage failure in mempool name creation.
> >   netdev-dpdk: Reword mp_size as n_mbufs.
> >   netdev-dpdk: Rename dpdk_mp_put as dpdk_mp_free.
> >
> >  lib/netdev-dpdk.c | 96 ++
> > -
> >  1 file changed, 59 insertions(+), 37 deletions(-)
> >
> > --
> > 2.4.11
> >


Re: [ovs-dev] [PATCH v8 4/6] netdev-dpdk: manage failure in mempool name creation.

2017-10-19 Thread Fischetti, Antonio


> -Original Message-
> From: Stokes, Ian
> Sent: Thursday, October 19, 2017 6:23 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v8 4/6] netdev-dpdk: manage failure in mempool
> name creation.
> 
> > In case a mempool name could not be generated log a message and return a
> > null mempool pointer to the caller.
> >
> 
> Now that I look at this, is it the case that the issues have been resolved in
> patches 1 and 2 but this is a clean up patch for something that might happen?

[Antonio] Exactly. All the issues we saw: 
 - issue with vhostuserclient in a PVP test
 - issue of new MTU not displayed when an existing mempool is returned.
 - issue with the NUMA-Aware usecase
get resolved by patches #1 and #2.

All remaining patches are small improvements or just cosmetics.

This patch would be a small improvement to manage the unlikely event of a name
creation failure that 'might' happen.
So this has nothing to do with the issues listed above. I added this patch to
the series because, after looking at the code, I thought it would be good to
track and manage an empty mempool name, just to be extra-safe.
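
For reference, the failure being guarded against is snprintf() either reporting an error (negative return) or truncating the name (return value greater than or equal to the buffer size). A standalone sketch of that check, using the same format string as the patch; the function name `format_pool_name()` and its return convention are made up for this example:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Builds a mempool-style name into 'buf' of 'size' bytes.
 * Returns 0 on success, or -1 if snprintf() failed or the name did not
 * fit (in which case the contents of 'buf' must not be used). */
static int
format_pool_name(char *buf, size_t size, uint32_t hash, int socket_id,
                 int mtu, unsigned int n_mbufs)
{
    int ret = snprintf(buf, size, "ovs_%x_%d_%d_%u",
                       (unsigned int) hash, socket_id, mtu, n_mbufs);

    if (ret < 0 || (size_t) ret >= size) {
        return -1;
    }
    return 0;
}
```

With a 32-byte buffer, hash 0x4adb057e, socket 1, MTU 2030 and 20512 mbufs this produces "ovs_4adb057e_1_2030_20512"; with an 8-byte buffer the same inputs are reported as a failure instead of silently yielding a truncated name.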


> 
> 
> > CC: Mark B Kavanagh <mark.b.kavan...@intel.com>
> > CC: Darrell Ball <dlu...@gmail.com>
> > CC: Ciara Loftus <ciara.lof...@intel.com>
> > CC: Kevin Traynor <ktray...@redhat.com>
> > CC: Aaron Conole <acon...@redhat.com>
> > Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each
> > port.")
> > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> > ---
> >  lib/netdev-dpdk.c | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index dc1e9c3..6fc6e1b
> > 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -502,6 +502,9 @@ dpdk_mp_name(struct dpdk_mp *dmp)
> >  int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%d_%u",
> > h, dmp->socket_id, dmp->mtu, dmp->mp_size);
> >  if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
> > +VLOG_DBG("snprintf returned %d. Failed to generate a mempool "
> > +"name for \"%s\". Hash:0x%x, mtu:%d, mbufs:%u.",
> > +ret, dmp->if_name, h, dmp->mtu, dmp->mp_size);
> >  return NULL;
> >  }
> >  return mp_name;
> > @@ -533,6 +536,10 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool
> > *mp_exists)
> >
> >  do {
> >  char *mp_name = dpdk_mp_name(dmp);
> > +if (!mp_name) {
> > +rte_free(dmp);
> > +return NULL;
> > +}
> >
> >  VLOG_DBG("Port %s: Requesting a mempool of %u mbufs "
> >"on socket %d for %d Rx and %d Tx queues.",
> > --
> > 2.4.11
> >


Re: [ovs-dev] [PATCH v7 2/6] netdev-dpdk: Fix mempool names to reflect socket id.

2017-10-19 Thread Fischetti, Antonio
Thanks Kevin for your review.
I could re-spin a new version to take into account your comments on patch #6.
I was also wondering about a better way to phrase the debug message in this
patch. Would something like this be ok:

  VLOG_DBG("Port %s: Requesting a mempool of %u mbufs "
  "on socket %d for %d Rx and %d Tx queues.",
  dev->up.name, dmp->mp_size, 
   dev->requested_socket_id,
  dev->requested_n_rxq, dev->requested_n_txq);

Any suggestion, let me know.

-Antonio

> -Original Message-
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Thursday, October 19, 2017 11:33 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Cc: Kavanagh, Mark B <mark.b.kavan...@intel.com>; Aaron Conole
> <acon...@redhat.com>
> Subject: Re: [PATCH v7 2/6] netdev-dpdk: Fix mempool names to reflect socket
> id.
> 
> On 10/18/2017 05:01 PM, antonio.fische...@intel.com wrote:
> > Create mempool names by considering also the NUMA socket number.
> > So a name reflects what socket the mempool is allocated on.
> > This change is needed for the NUMA-awareness feature.
> >
> > CC: Mark B Kavanagh <mark.b.kavan...@intel.com>
> > CC: Kevin Traynor <ktray...@redhat.com>
> > CC: Aaron Conole <acon...@redhat.com>
> > Reported-by: Ciara Loftus <ciara.lof...@intel.com>
> > Tested-by: Ciara Loftus <ciara.lof...@intel.com>
> > Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each
> port.")
> > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> 
> If you have to re-spin anyway I'd make the debug sentence a little more
> natural sounding, but it's ok as is and I don't think it's worth
> re-spinning just for that.
> 
> Otherwise LGTM.
> Acked-by: Kevin Traynor <ktray...@redhat.com>
> 
> 
> > ---
> > Mempool names now contain the requested socket id and look like:
> > "ovs_4adb057e_1_2030_20512".
> >
> > Tested with DPDK 17.05.2 (from dpdk-stable branch).
> > NUMA-awareness feature enabled (DPDK/config/common_base).
> >
> > Created 1 single dpdkvhostuser port type.
> > OvS pmd-cpu-mask=FF3 # enable cores on both numa nodes
> > QEMU core mask = 0xFC000 # cores for qemu on numa node 1 only
> >
> >  Before launching the VM:
> >  
> > ovs-appctl dpif-netdev/pmd-rxq-show
> > shows core #1 is serving the vhu port.
> >
> > pmd thread numa_id 0 core_id 1:
> > isolated : false
> > port: dpdkvhostuser0  queue-id: 0
> >
> >  After launching the VM:
> >  ---
> > the vhu port is now managed by core #27
> > pmd thread numa_id 1 core_id 27:
> > isolated : false
> > port: dpdkvhostuser0  queue-id: 0
> >
> > and the log shows a new mempool is allocated on NUMA node 1, while
> > the previous one is deleted:
> >
> > 2017-10-06T14:04:55Z|00105|netdev_dpdk|DBG|Allocated "ovs_4adb057e_1_2030_20512" mempool with 20512 mbufs
> > 2017-10-06T14:04:55Z|00106|netdev_dpdk|DBG|Releasing "ovs_4adb057e_0_2030_20512" mempool
> > ---
> > ---
> >  lib/netdev-dpdk.c | 9 +
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index d49afd8..3155505 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -499,8 +499,8 @@ dpdk_mp_name(struct dpdk_mp *dmp)
> >  {
> >  uint32_t h = hash_string(dmp->if_name, 0);
> >  char *mp_name = xcalloc(RTE_MEMPOOL_NAMESIZE, sizeof *mp_name);
> > -int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%u",
> > -   h, dmp->mtu, dmp->mp_size);
> > +int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%d_%u",
> > +   h, dmp->socket_id, dmp->mtu, dmp->mp_size);
> >  if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
> >  return NULL;
> >  }
> > @@ -535,9 +535,10 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool
> *mp_exists)
> >  char *mp_name = dpdk_mp_name(dmp);
> >
> >  VLOG_DBG("Requesting a mempool of %u mbufs for netdev %s "
> > - "with %d Rx and %d Tx queues.",
> > + "with %d Rx and %d Tx queues, socket id:%d.",
> >   dmp->mp_size, dev->up.name,
> > - dev->requested_n_rxq, dev->requested_n_txq);
> > + dev->requested_n_rxq, dev->requested_n_txq,
> > + dev->requested_socket_id);
> >
> >  dmp->mp = rte_pktmbuf_pool_create(mp_name, dmp->mp_size,
> >MP_CACHE_SZ,
> >



Re: [ovs-dev] [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing mempools.

2017-10-18 Thread Fischetti, Antonio


> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Fischetti, Antonio
> Sent: Wednesday, October 18, 2017 10:43 AM
> To: Kavanagh, Mark B <mark.b.kavan...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v6 1/5] netdev-dpdk: fix management of pre-
> existing mempools.
> 
> 
> 
> > -Original Message-
> > From: Kavanagh, Mark B
> > Sent: Tuesday, October 17, 2017 10:14 PM
> > To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> > Cc: Kevin Traynor <ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>;
> > Darrell Ball <dlu...@gmail.com>
> > Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing
> > mempools.
> >
> > >From: Fischetti, Antonio
> > >Sent: Tuesday, October 17, 2017 6:04 PM
> > >To: Kavanagh, Mark B <mark.b.kavan...@intel.com>; d...@openvswitch.org
> > >Cc: Kevin Traynor <ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>;
> > >Darrell Ball <dlu...@gmail.com>
> > >Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing
> > >mempools.
> > >
> > >Thanks Mark, comments inline.
> > >
> > >-Antonio
> > >
> > >> -Original Message-
> > >> From: Kavanagh, Mark B
> > >> Sent: Tuesday, October 17, 2017 2:34 PM
> > >> To: Fischetti, Antonio <antonio.fische...@intel.com>; 
> > >> d...@openvswitch.org
> > >> Cc: Kevin Traynor <ktray...@redhat.com>; Aaron Conole
> <acon...@redhat.com>;
> > >> Darrell Ball <dlu...@gmail.com>
> > >> Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing
> > >> mempools.
> > >>
> > >> >From: Fischetti, Antonio
> > >> >Sent: Monday, October 16, 2017 2:15 PM
> > >> >To: d...@openvswitch.org
> > >> >Cc: Kavanagh, Mark B <mark.b.kavan...@intel.com>; Kevin Traynor
> > >> ><ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>; Darrell Ball
> > >> ><dlu...@gmail.com>; Fischetti, Antonio <antonio.fische...@intel.com>
> > >> >Subject: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing
> > >mempools.
> > >> >
> > >> >Fix issues on reconfiguration of pre-existing mempools.
> > >> >This patch avoids calling dpdk_mp_put() - and erroneously
> > >> >releasing the mempool - when it already exists.
> > >> >Create mempool names by considering also the NUMA socket number.
> > >> >So a name reflects what socket the mempool is allocated on.
> > >> >This change is needed for the NUMA-awareness feature.
> > >>
> > >> Hi Antonio,
> > >>
> > >> Is there any particular reason why you've combined patches 1 and 2 of the
> > >> previous series in a single patch here?
> > >>
> > >> I would have thought that these two separate issues would warrant two
> > >> individual patches (particularly with respect to the reported-by, tested-by
> > >> tags).
> > >
> > >[Antonio]
> > >I guess I misunderstood your previous review where you asked to squash patches
> > >1 and 3 into one patch.
> >
> > Hi Antonio,
> >
> > I figured as much ;)
> >
> > >I understood instead to squash the first 2 patches because they were both
> > >bug-fixes.
> > >In the next version v7 I'll restore the 2 separate patches.
> >
> > Thanks - I think that's a much cleaner approach.
> >
> > >
> > >>
> > >> Maybe it's not a big deal, but noted here nonetheless.
> > >>
> > >> Apart from that, there are some comments inline.
> > >>
> > >> Thanks again,
> > >> Mark
> > >>
> > >> >
> > >> >CC: Mark B Kavanagh <mark.b.kavan...@intel.com>
> > >> >CC: Kevin Traynor <ktray...@redhat.com>
> > >> >CC: Aaron Conole <acon...@redhat.com>
> > >> >CC: Darrell Ball <dlu...@gmail.com>
> > >> >Reported-by: Ciara Loftus <ciara.lof...@intel.com>
> > >> >Tested-by: Ciara Loftus <ciara.lof...@intel.com>
> > >> >Reported-by: Róbert Mulik <robert.mu...@ericsson.com>
> > >> >Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory 

Re: [ovs-dev] [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing mempools.

2017-10-18 Thread Fischetti, Antonio


> -Original Message-
> From: Kavanagh, Mark B
> Sent: Tuesday, October 17, 2017 10:14 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Cc: Kevin Traynor <ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>;
> Darrell Ball <dlu...@gmail.com>
> Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing
> mempools.
> 
> >From: Fischetti, Antonio
> >Sent: Tuesday, October 17, 2017 6:04 PM
> >To: Kavanagh, Mark B <mark.b.kavan...@intel.com>; d...@openvswitch.org
> >Cc: Kevin Traynor <ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>;
> >Darrell Ball <dlu...@gmail.com>
> >Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing
> >mempools.
> >
> >Thanks Mark, comments inline.
> >
> >-Antonio
> >
> >> -Original Message-
> >> From: Kavanagh, Mark B
> >> Sent: Tuesday, October 17, 2017 2:34 PM
> >> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> >> Cc: Kevin Traynor <ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>;
> >> Darrell Ball <dlu...@gmail.com>
> >> Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing
> >> mempools.
> >>
> >> >From: Fischetti, Antonio
> >> >Sent: Monday, October 16, 2017 2:15 PM
> >> >To: d...@openvswitch.org
> >> >Cc: Kavanagh, Mark B <mark.b.kavan...@intel.com>; Kevin Traynor
> >> ><ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>; Darrell Ball
> >> ><dlu...@gmail.com>; Fischetti, Antonio <antonio.fische...@intel.com>
> >> >Subject: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing
> >mempools.
> >> >
> >> >Fix issues on reconfiguration of pre-existing mempools.
> >> >This patch avoids calling dpdk_mp_put() - and erroneously
> >> >releasing the mempool - when it already exists.
> >> >Create mempool names by considering also the NUMA socket number.
> >> >So a name reflects what socket the mempool is allocated on.
> >> >This change is needed for the NUMA-awareness feature.
> >>
> >> Hi Antonio,
> >>
> >> Is there any particular reason why you've combined patches 1 and 2 of the
> >> previous series in a single patch here?
> >>
> >> I would have thought that these two separate issues would warrant two
> >> individual patches (particularly with respect to the reported-by, tested-by
> >> tags).
> >
> >[Antonio]
> >I guess I misunderstood your previous review where you asked to squash patches
> >1 and 3 into one patch.
> 
> Hi Antonio,
> 
> I figured as much ;)
> 
> >I understood instead to squash the first 2 patches because they were both
> >bug-fixes.
> >In the next version v7 I'll restore the 2 separate patches.
> 
> Thanks - I think that's a much cleaner approach.
> 
> >
> >>
> >> Maybe it's not a big deal, but noted here nonetheless.
> >>
> >> Apart from that, there are some comments inline.
> >>
> >> Thanks again,
> >> Mark
> >>
> >> >
> >> >CC: Mark B Kavanagh <mark.b.kavan...@intel.com>
> >> >CC: Kevin Traynor <ktray...@redhat.com>
> >> >CC: Aaron Conole <acon...@redhat.com>
> >> >CC: Darrell Ball <dlu...@gmail.com>
> >> >Reported-by: Ciara Loftus <ciara.lof...@intel.com>
> >> >Tested-by: Ciara Loftus <ciara.lof...@intel.com>
> >> >Reported-by: Róbert Mulik <robert.mu...@ericsson.com>
> >> >Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each
> >> >port.")
> >> >Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> >> >---
> >> > Test on releasing pre-existing mempools
> >> > ===
> >> >I've tested this patch by
> >> >  - changing at run-time the number of Rx queues:
> >> >  ovs-vsctl set Interface dpdk0 type=dpdk options:n_rxq=4
> >> >
> >> >  - reducing the MTU of the dpdk ports by 1 byte to force
> >> >the configuration of an existing mempool:
> >> >  ovs-vsctl set Interface dpdk0 mtu_request=1499
> >> >
> >> >This issue was observed in a PVP test topology with dpdkvhostuserclient
> >> >ports. I

Re: [ovs-dev] [PATCH v6 5/5] netdev-dpdk: Rename dpdk_mp_put as dpdk_mp_free.

2017-10-17 Thread Fischetti, Antonio
Thanks Mark, comments inline.

-Antonio

> -Original Message-
> From: Kavanagh, Mark B
> Sent: Tuesday, October 17, 2017 2:36 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Cc: Darrell Ball <dlu...@gmail.com>; Loftus, Ciara <ciara.lof...@intel.com>;
> Kevin Traynor <ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>
> Subject: RE: [PATCH v6 5/5] netdev-dpdk: Rename dpdk_mp_put as dpdk_mp_free.
> 
> >From: Fischetti, Antonio
> >Sent: Monday, October 16, 2017 2:15 PM
> >To: d...@openvswitch.org
> >Cc: Kavanagh, Mark B <mark.b.kavan...@intel.com>; Darrell Ball
> ><dlu...@gmail.com>; Loftus, Ciara <ciara.lof...@intel.com>; Kevin Traynor
> ><ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>; Fischetti, Antonio
> ><antonio.fische...@intel.com>
> >Subject: [PATCH v6 5/5] netdev-dpdk: Rename dpdk_mp_put as dpdk_mp_free.
> >
> >For readability purposes dpdk_mp_put is renamed as dpdk_mp_free.
> >Some other comments are also added to mempool functions.
> 
> Hey Antonio,
> 
> Some minor comments inline.
> 
> Thanks,
> Mark
> 
> >
> >CC: Mark B Kavanagh <mark.b.kavan...@intel.com>
> >CC: Darrell Ball <dlu...@gmail.com>
> >CC: Ciara Loftus <ciara.lof...@intel.com>
> >CC: Kevin Traynor <ktray...@redhat.com>
> >CC: Aaron Conole <acon...@redhat.com>
> >Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> >---
> > lib/netdev-dpdk.c | 15 ++-
> > 1 file changed, 10 insertions(+), 5 deletions(-)
> >
> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >index 5cf1392..ca07918 100644
> >--- a/lib/netdev-dpdk.c
> >+++ b/lib/netdev-dpdk.c
> >@@ -587,6 +587,9 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool
> >*mp_exists)
> > return NULL;
> > }
> >
> >+/* Returns a valid pointer when either of the two cases occur:
> >+ * a new mempool was just created or the requested mempool is already
> >+ * existing. */
> 
> It may be clearer to re-write the above as follows:
> 
> "
> Returns a valid pointer when either of the following is true:
>  - a new mempool was just created
>  - a matching mempool already exists
> "
> 

[Antonio]
Thanks, will do that.


> > static struct dpdk_mp *
> > dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool *mp_exists)
> > {
> >@@ -599,8 +602,9 @@ dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool
> >*mp_exists)
> > return dmp;
> > }
> >
> >+/* Release an existing mempool. */
> > static void
> >-dpdk_mp_put(struct dpdk_mp *dmp)
> >+dpdk_mp_free(struct dpdk_mp *dmp)
> > {
> > char *mp_name;
> >
> >@@ -617,8 +621,8 @@ dpdk_mp_put(struct dpdk_mp *dmp)
> > ovs_mutex_unlock(&dpdk_mp_mutex);
> > }
> >
> >-/* Tries to allocate new mempool on requested_socket_id with
> >- * mbuf size corresponding to requested_mtu.
> >+/* Tries to allocate a new mempool on requested_socket_id with a size
> >+ * determined by requested_mtu and requested Rx/Tx queues.
> 
> This comment needs to be updated, as previously described in the review
> comments on patch 1 of the series.

[Antonio] ok, will do.


> 
> 
> >  * On success new configuration will be applied.
> >  * On error, device will be left unchanged. */
> > static int
> >@@ -644,7 +648,8 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
> > dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> > return EEXIST;
> > } else {
> >-dpdk_mp_put(dev->dpdk_mp);
> >+/* A new mempool was created, release the previous one. */
> >+dpdk_mp_free(dev->dpdk_mp);
> > dev->dpdk_mp = mp;
> > dev->mtu = dev->requested_mtu;
> > dev->socket_id = dev->requested_socket_id;
> >@@ -1089,7 +1094,7 @@ common_destruct(struct netdev_dpdk *dev)
> > OVS_EXCLUDED(dev->mutex)
> > {
> > rte_free(dev->tx_q);
> >-dpdk_mp_put(dev->dpdk_mp);
> >+dpdk_mp_free(dev->dpdk_mp);
> >
> > ovs_list_remove(&dev->list_node);
> > free(ovsrcu_get_protected(struct ingress_policer *,
> >--
> >2.4.11

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6 3/5] netdev-dpdk: manage empty mempool names.

2017-10-17 Thread Fischetti, Antonio
Thanks Mark for your suggestions, I'll rework accordingly.

-Antonio

> -Original Message-
> From: Kavanagh, Mark B
> Sent: Tuesday, October 17, 2017 2:35 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Cc: Darrell Ball <dlu...@gmail.com>; Loftus, Ciara <ciara.lof...@intel.com>;
> Kevin Traynor <ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>
> Subject: RE: [PATCH v6 3/5] netdev-dpdk: manage empty mempool names.
> 
> >From: Fischetti, Antonio
> >Sent: Monday, October 16, 2017 2:15 PM
> >To: d...@openvswitch.org
> >Cc: Kavanagh, Mark B <mark.b.kavan...@intel.com>; Darrell Ball
> ><dlu...@gmail.com>; Loftus, Ciara <ciara.lof...@intel.com>; Kevin Traynor
> ><ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>; Fischetti, Antonio
> ><antonio.fische...@intel.com>
> >Subject: [PATCH v6 3/5] netdev-dpdk: manage empty mempool names.
> 
> It's not just empty names - the name could also be too long. Probably best to
> rephrase the commit name accordingly.
> 
> >
> >In case a mempool name could not be generated log a message
> >and return a null mempool pointer to the caller.
> >
> >CC: Mark B Kavanagh <mark.b.kavan...@intel.com>
> >CC: Darrell Ball <dlu...@gmail.com>
> >CC: Ciara Loftus <ciara.lof...@intel.com>
> >CC: Kevin Traynor <ktray...@redhat.com>
> >CC: Aaron Conole <acon...@redhat.com>
> >Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> >---
> > lib/netdev-dpdk.c | 7 +++
> > 1 file changed, 7 insertions(+)
> >
> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >index 07c438a..dd5759b 100644
> >--- a/lib/netdev-dpdk.c
> >+++ b/lib/netdev-dpdk.c
> >@@ -502,6 +502,9 @@ dpdk_mp_name(struct dpdk_mp *dmp)
> > int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%d_%u",
> >h, dmp->socket_id, dmp->mtu, dmp->mp_size);
> > if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
> >+VLOG_ERR("Failed to generate a mempool name for \"%s\". "
> >+"Hash:0x%x, mtu:%d, mbufs:%u, ret:%d",
> >+dmp->if_name, h, dmp->mtu, dmp->mp_size, ret);
> 
> A string from ovs_strerror(ret) would probably be more useful than the return
> value itself here.
> I'm not sure how useful the individual values themselves are in an ERR log
> either (more suited to a DBG log).
> -Mark
> 
> > return NULL;
> > }
> > return mp_name;
> >@@ -533,6 +536,10 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool
> >*mp_exists)
> >
> > do {
> > char *mp_name = dpdk_mp_name(dmp);
> >+if (!mp_name) {
> >+rte_free(dmp);
> >+return NULL;
> >+}
> >
> > VLOG_DBG("Requesting a mempool of %u mbufs for netdev %s "
> >  "with %d Rx and %d Tx queues, socket id:%d.",
> >--
> >2.4.11

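A minimal sketch of the name-length check discussed above, written as standalone C rather than as the actual OVS code. NAME_SIZE stands in for RTE_MEMPOOL_NAMESIZE (the value 32 is an assumption), and mp_name_build() is a hypothetical helper, not an OVS or DPDK function. It shows how snprintf() signals truncation through its return value, which is a character count rather than an errno code, and how the helper hands NULL back to its caller. Freeing the buffer on the failure path is a choice made for this standalone example.

```c
#include <stdio.h>
#include <stdlib.h>

#define NAME_SIZE 32   /* Stand-in for RTE_MEMPOOL_NAMESIZE; value assumed. */

/* Hypothetical helper: returns a heap-allocated name, or NULL if the
 * formatted name does not fit in NAME_SIZE bytes (including the NUL). */
static char *
mp_name_build(unsigned int hash, int socket_id, int mtu, unsigned int n_mbufs)
{
    char *name = calloc(NAME_SIZE, sizeof *name);
    int ret;

    if (!name) {
        return NULL;
    }
    /* snprintf() returns the length the full string would have had, so
     * ret >= NAME_SIZE means the output was truncated. */
    ret = snprintf(name, NAME_SIZE, "ovs_%x_%d_%d_%u",
                   hash, socket_id, mtu, n_mbufs);
    if (ret < 0 || ret >= NAME_SIZE) {
        free(name);
        return NULL;
    }
    return name;
}
```

A caller would check the result for NULL before passing the name on to the pool creation call, and free() the name once it is no longer needed.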


Re: [ovs-dev] [PATCH v6 2/5] netdev-dpdk: skip init for existing mempools.

2017-10-17 Thread Fischetti, Antonio


> -Original Message-
> From: Kavanagh, Mark B
> Sent: Tuesday, October 17, 2017 2:34 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Cc: Darrell Ball <dlu...@gmail.com>; Loftus, Ciara <ciara.lof...@intel.com>;
> Kevin Traynor <ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>
> Subject: RE: [PATCH v6 2/5] netdev-dpdk: skip init for existing mempools.
> 
> >From: Fischetti, Antonio
> >Sent: Monday, October 16, 2017 2:15 PM
> >To: d...@openvswitch.org
> >Cc: Kavanagh, Mark B <mark.b.kavan...@intel.com>; Darrell Ball
> ><dlu...@gmail.com>; Loftus, Ciara <ciara.lof...@intel.com>; Kevin Traynor
> ><ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>; Fischetti, Antonio
> ><antonio.fische...@intel.com>
> >Subject: [PATCH v6 2/5] netdev-dpdk: skip init for existing mempools.
> >
> >Skip initialization of mempool packet areas if this was already
> >done in a previous call to dpdk_mp_create.
> 
> Hi Antonio,
> 
> As stated in my previous review, I believe that this could probably be folded
> into patch 1 of the series (it was patch 3 of v5).
> However, I don't object strongly to this patch, so I'll leave it to your
> discretion.

[Antonio]
I'm keeping this change in a separate patch because it is not related
to the fixes for the mempool management.
It's a small improvement: less about saving CPU cycles than about
cleaning up the code.
 
> 
> Other than that, LGTM.
> 
> Thanks,
> Mark
> 
> >
> >CC: Mark B Kavanagh <mark.b.kavan...@intel.com>
> >CC: Darrell Ball <dlu...@gmail.com>
> >CC: Ciara Loftus <ciara.lof...@intel.com>
> >CC: Kevin Traynor <ktray...@redhat.com>
> >CC: Aaron Conole <acon...@redhat.com>
> >Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> >---
> > lib/netdev-dpdk.c | 10 +-
> > 1 file changed, 5 insertions(+), 5 deletions(-)
> >
> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >index 7f2d7ed..07c438a 100644
> >--- a/lib/netdev-dpdk.c
> >+++ b/lib/netdev-dpdk.c
> >@@ -550,6 +550,11 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool
> >*mp_exists)
> > if (dmp->mp) {
> > VLOG_DBG("Allocated \"%s\" mempool with %u mbufs", mp_name,
> >  dmp->mp_size);
> >+/* rte_pktmbuf_pool_create has done some initialization of the
> >+ * rte_mbuf part of each dp_packet. Some OvS specific fields
> >+ * of the packet still need to be initialized by
> >+ * ovs_rte_pktmbuf_init. */
> >+rte_mempool_obj_iter(dmp->mp, ovs_rte_pktmbuf_init, NULL);
> > } else if (rte_errno == EEXIST) {
> > /* A mempool with the same name already exists.  We just
> >  * retrieve its pointer to be returned to the caller. */
> >@@ -566,11 +571,6 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool
> >*mp_exists)
> > }
> > free(mp_name);
> > if (dmp->mp) {
> >-/* rte_pktmbuf_pool_create has done some initialization of the
> >- * rte_mbuf part of each dp_packet, while ovs_rte_pktmbuf_init
> >- * initializes some OVS specific fields of dp_packet.
> >- */
> >-rte_mempool_obj_iter(dmp->mp, ovs_rte_pktmbuf_init, NULL);
> > return dmp;
> > }
> > } while (!(*mp_exists) &&
> >--
> >2.4.11

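The intent of the hunk above can be sketched with a toy pool registry: per-object initialization runs only in the branch where the pool was freshly created, and is skipped when a pool with the same name already exists. Everything below (toy_pool, toy_pool_get(), the fixed-size registry) is hypothetical and only illustrates the control flow; it does not use the DPDK API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_POOLS 4
#define N_OBJS 8
#define NAME_LEN 32

struct toy_pool {
    bool in_use;
    char name[NAME_LEN];
    int objs[N_OBJS];
    int n_inits;          /* How many times the objects were initialized. */
};

static struct toy_pool registry[MAX_POOLS];

/* Stand-in for the per-object init callback. */
static void
toy_obj_init(struct toy_pool *p)
{
    for (int i = 0; i < N_OBJS; i++) {
        p->objs[i] = -1;
    }
    p->n_inits++;
}

/* Returns the pool called 'name', creating it if needed.  '*existed' tells
 * the caller which of the two cases occurred.  Returns NULL if the registry
 * is full or the name is too long. */
static struct toy_pool *
toy_pool_get(const char *name, bool *existed)
{
    struct toy_pool *free_slot = NULL;

    *existed = false;
    if (strlen(name) >= NAME_LEN) {
        return NULL;
    }
    for (int i = 0; i < MAX_POOLS; i++) {
        if (registry[i].in_use && !strcmp(registry[i].name, name)) {
            *existed = true;
            return &registry[i];      /* Already initialized: skip init. */
        }
        if (!registry[i].in_use && !free_slot) {
            free_slot = &registry[i];
        }
    }
    if (!free_slot) {
        return NULL;
    }
    free_slot->in_use = true;
    strcpy(free_slot->name, name);
    toy_obj_init(free_slot);          /* Only for a newly created pool. */
    return free_slot;
}
```

Requesting the same name twice returns the same pool both times, with the init counter still at 1 after the second call.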


Re: [ovs-dev] [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing mempools.

2017-10-17 Thread Fischetti, Antonio
Thanks Mark, comments inline.

-Antonio

> -Original Message-
> From: Kavanagh, Mark B
> Sent: Tuesday, October 17, 2017 2:34 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Cc: Kevin Traynor <ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>;
> Darrell Ball <dlu...@gmail.com>
> Subject: RE: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing
> mempools.
> 
> >From: Fischetti, Antonio
> >Sent: Monday, October 16, 2017 2:15 PM
> >To: d...@openvswitch.org
> >Cc: Kavanagh, Mark B <mark.b.kavan...@intel.com>; Kevin Traynor
> ><ktray...@redhat.com>; Aaron Conole <acon...@redhat.com>; Darrell Ball
> ><dlu...@gmail.com>; Fischetti, Antonio <antonio.fische...@intel.com>
> >Subject: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing mempools.
> >
> >Fix issues with the reconfiguration of pre-existing mempools.
> >This patch avoids calling dpdk_mp_put() - and erroneously
> >releasing the mempool - when it already exists.
> >Also take the NUMA socket number into account when creating mempool
> >names, so that a name reflects which socket the mempool is allocated on.
> >This change is needed for the NUMA-awareness feature.
> 
> Hi Antonio,
> 
> Is there any particular reason why you've combined patches 1 and 2 of the
> previous series in a single patch here?
> 
> I would have thought that these two separate issues would warrant two
> individual patches (particularly with respect to the reported-by, tested-by
> tags).

[Antonio]
I guess I misunderstood your previous review, where you asked to squash
patches 1 and 3 into one patch.
I understood it instead as a request to squash the first 2 patches, because
they were both bug fixes.
In the next version, v7, I'll restore the 2 separate patches.

> 
> Maybe it's not a big deal, but noted here nonetheless.
> 
> Apart from that, there are some comments inline.
> 
> Thanks again,
> Mark
> 
> >
> >CC: Mark B Kavanagh <mark.b.kavan...@intel.com>
> >CC: Kevin Traynor <ktray...@redhat.com>
> >CC: Aaron Conole <acon...@redhat.com>
> >CC: Darrell Ball <dlu...@gmail.com>
> >Reported-by: Ciara Loftus <ciara.lof...@intel.com>
> >Tested-by: Ciara Loftus <ciara.lof...@intel.com>
> >Reported-by: Róbert Mulik <robert.mu...@ericsson.com>
> >Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each
> >port.")
> >Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> >---
> > Test on releasing pre-existing mempools
> > ===
> >I've tested this patch by
> >  - changing at run-time the number of Rx queues:
> >  ovs-vsctl set Interface dpdk0 type=dpdk options:n_rxq=4
> >
> >  - reducing the MTU of the dpdk ports by 1 byte to force
> >the configuration of an existing mempool:
> >  ovs-vsctl set Interface dpdk0 mtu_request=1499
> >
> >This issue was observed in a PVP test topology with dpdkvhostuserclient
> >ports. It can also happen with dpdk type ports, e.g. by reducing the MTU
> >by 1 byte.
> >
> >To replicate the bug scenario in the PVP case it's sufficient to
> >set up 1 dpdkvhostuserclient port, and just boot the VM.
> >
> >Below are some more details on my own test setup.
> >
> > PVP test setup
> > --
> >CLIENT_SOCK_DIR=/tmp
> >SOCK0=dpdkvhostuser0
> >SOCK1=dpdkvhostuser1
> >
> >1 PMD
> >Add 2 dpdk ports, n_rxq=1
> >Add 2 vhu ports both of type dpdkvhostuserclient and specify 
> >vhost-server-path
> > ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-
> >path="$CLIENT_SOCK_DIR/$SOCK0"
> > ovs-vsctl set Interface dpdkvhostuser1 options:vhost-server-
> >path="$CLIENT_SOCK_DIR/$SOCK1"
> >
> >Set port-based rules: dpdk0 <--> vhu0 and dpdk1 <--> vhu1
> > add-flow br0 in_port=1,action=output:3
> > add-flow br0 in_port=3,action=output:1
> > add-flow br0 in_port=4,action=output:2
> > add-flow br0 in_port=2,action=output:4
> >
> > Launch QEMU
> > ---
> >As OvS vhu ports are acting as clients, we must specify 'server' in the next
> >command.
> >VM_IMAGE=
> >
> > sudo -E taskset 0x3F00 $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 -name us-
> >vhost-vm1 -cpu host -enable-kvm -m 4096M -object memory-backend-
> >file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on -numa node,memdev=mem
> >-mem-prealloc -smp 4 -drive file=$VM_IMAGE -chardev
> >socket,id=char0,path=$CLIENT_SOCK_DIR/$SOCK0,server -netdev type=vhost-
&

Re: [ovs-dev] [PATCH v2] dpctl: manage ret value when dumping CT entries.

2017-10-16 Thread Fischetti, Antonio
Any comment on v2?

Thanks,
Antonio

> -Original Message-
> From: Fischetti, Antonio
> Sent: Tuesday, September 26, 2017 10:45 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v2] dpctl: manage ret value when dumping CT
> entries.
> 
> Actually I forgot to add in the commit message
> 
> Reviewed-by: Greg Rose <gvrose8...@gmail.com>
> 
> Can this please be added later?
> 
> Thanks, Antonio
> 
> > -Original Message-
> > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org]
> > On Behalf Of antonio.fische...@intel.com
> > Sent: Tuesday, September 26, 2017 10:37 AM
> > To: d...@openvswitch.org
> > Subject: [ovs-dev] [PATCH v2] dpctl: manage ret value when dumping CT
> entries.
> >
> > Manage error value returned by ct_dpif_dump_next.
> >
> > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> > ---
> >  lib/dpctl.c | 27 ---
> >  1 file changed, 24 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/dpctl.c b/lib/dpctl.c
> > index 8951d6e..d229c97 100644
> > --- a/lib/dpctl.c
> > +++ b/lib/dpctl.c
> > @@ -1286,7 +1286,7 @@ dpctl_dump_conntrack(int argc, const char *argv[],
> >  return error;
> >  }
> >
> > -while (!ct_dpif_dump_next(dump, &cte)) {
> > +while (!(error = ct_dpif_dump_next(dump, &cte))) {
> >  struct ds s = DS_EMPTY_INITIALIZER;
> >
> >  ct_dpif_format_entry(&cte, &s, dpctl_p->verbosity,
> > @@ -1296,6 +1296,13 @@ dpctl_dump_conntrack(int argc, const char *argv[],
> >  dpctl_print(dpctl_p, "%s\n", ds_cstr(&s));
> >  ds_destroy(&s);
> >  }
> > +if (error == EOF) {
> > +/* All CT entries were dumped with no issue. */
> > +error = 0;
> > +} else if (error) {
> > +dpctl_error(dpctl_p, error, "dumping conntrack entry");
> > +}
> > +
> >  ct_dpif_dump_done(dump);
> >  dpif_close(dpif);
> >  return error;
> > @@ -1384,7 +1391,7 @@ dpctl_ct_stats_show(int argc, const char *argv[],
> >  }
> >
> >  int tot_conn = 0;
> > -while (!ct_dpif_dump_next(dump, &cte)) {
> > +while (!(error = ct_dpif_dump_next(dump, &cte))) {
> >  ct_dpif_entry_uninit(&cte);
> >  tot_conn++;
> >  switch (cte.tuple_orig.ip_proto) {
> > @@ -1425,6 +1432,13 @@ dpctl_ct_stats_show(int argc, const char *argv[],
> >  break;
> >  }
> >  }
> > +if (error == EOF) {
> > +/* All CT entries were dumped with no issue.  */
> > +error = 0;
> > +} else if (error) {
> > +dpctl_error(dpctl_p, error, "dumping conntrack entry");
> > +/* Fall through to show any other info we collected. */
> > +}
> >
> >  dpctl_print(dpctl_p, "Connections Stats:\nTotal: %d\n", tot_conn);
> >  if (proto_stats[CT_STATS_TCP]) {
> > @@ -1521,7 +1535,7 @@ dpctl_ct_bkts(int argc, const char *argv[],
> >  int tot_conn = 0;
> >  uint32_t *conn_per_bkts = xzalloc(tot_bkts * sizeof(uint32_t));
> >
> > -while (!ct_dpif_dump_next(dump, &cte)) {
> > +while (!(error = ct_dpif_dump_next(dump, &cte))) {
> >  ct_dpif_entry_uninit(&cte);
> >  tot_conn++;
> >  if (tot_bkts > 0) {
> > @@ -1533,6 +1547,13 @@ dpctl_ct_bkts(int argc, const char *argv[],
> >  }
> >  }
> >  }
> > +if (error == EOF) {
> > +/* All CT entries were dumped with no issue.  */
> > +error = 0;
> > +} else if (error) {
> > +dpctl_error(dpctl_p, error, "dumping conntrack entry");
> > +/* Fall through and display all the collected info.  */
> > +}
> >
> >  dpctl_print(dpctl_p, "Current Connections: %d\n", tot_conn);
> >  dpctl_print(dpctl_p, "\n");
> > --
> > 2.4.11
> >

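The loop pattern used in the three hunks above can be tried out in isolation. In this sketch fake_dump_next() is a hypothetical stand-in for ct_dpif_dump_next(); it is assumed, as in the patch, to return 0 for each entry, EOF once the dump is complete, and a positive errno value on failure.

```c
#include <errno.h>
#include <stdio.h>

struct fake_dump {
    int pos;        /* Next entry to return. */
    int n_entries;  /* Total number of entries. */
    int fail_at;    /* Position at which to report EIO, or -1 for never. */
};

static int
fake_dump_next(struct fake_dump *d, int *entry)
{
    if (d->pos == d->fail_at) {
        return EIO;
    }
    if (d->pos >= d->n_entries) {
        return EOF;
    }
    *entry = d->pos++;
    return 0;
}

/* Walks the whole dump.  Returns 0 on a clean end-of-dump, otherwise the
 * positive errno value that stopped the walk.  '*n_dumped' always holds the
 * number of entries seen before the loop ended. */
static int
dump_all(struct fake_dump *d, int *n_dumped)
{
    int error;
    int entry;

    *n_dumped = 0;
    while (!(error = fake_dump_next(d, &entry))) {
        (*n_dumped)++;
    }
    if (error == EOF) {
        /* End of dump is not a failure. */
        error = 0;
    } else {
        fprintf(stderr, "dumping entry failed (errno %d)\n", error);
    }
    return error;
}
```

Because the entry count is kept even when the walk stops early, the caller can still print whatever was collected, which matches the "fall through" comments in the patch.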

Re: [ovs-dev] [PATCH v5 0/6] netdev-dpdk: Fix management of pre-existing mempools.

2017-10-13 Thread Fischetti, Antonio
Thanks Mark for your review, some comments inline.

-Antonio

> -Original Message-
> From: Kavanagh, Mark B
> Sent: Friday, October 13, 2017 3:46 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v5 0/6] netdev-dpdk: Fix management of pre-
> existing mempools.
> 
> 
> 
> >From: ovs-dev-boun...@openvswitch.org 
> >[mailto:ovs-dev-boun...@openvswitch.org]
> >On Behalf Of antonio.fische...@intel.com
> >Sent: Wednesday, October 11, 2017 5:01 PM
> >To: d...@openvswitch.org
> >Subject: [ovs-dev] [PATCH v5 0/6] netdev-dpdk: Fix management of pre-existing
> >mempools.
> >
> >List of versions:
> > - v5: manage new MTU value when a pre-existing mempool is returned.
> > - v4: fix NUMA awareness usecase
> > - v3: avoid deletion of pre-existing mempools
> > - v2: rework to accommodate code changes for dpdk ports too
> > - v1: 1st implementation.
> >
> >Fischetti, Antonio (6):
> >  netdev-dpdk: fix management of pre-existing mempools.
> >  netdev-dpdk: Fix mempool names to reflect socket id.
> >  netdev-dpdk: skip init for existing mempools.
> >  netdev-dpdk: assert mempool names.
> >  netdev-dpdk: Reword mp_size as n_mbufs.
> >  netdev-dpdk: Rename dpdk_mp_put as dpdk_mp_free.
> 
> Hi Antonio,
> 
> Some general comments on this patchset:
> - patches 1 and 3 of the series should be squashed into one.

[Antonio] 
The fixes are for two different issues:
 - Issue #1: detecting when previous mempools must be released
 - Issue #2: mempool name generation for the NUMA-awareness test case

Issue #1 came up first, from a PVP test case. Issue #2, for the
NUMA-awareness use case, came up later, while the fix for issue #1
was being tested. I kept the two fixes in two separate patches
because vHost Zero-copy development was blocked by issue #1, so
keeping two independent patches would speed up upstreaming patch #1
and unblock vHost Zero-copy development.


> - patches 3-6 are purely cosmetic, and do not contribute to fixing the mempool
> reconfiguration mechanism;

[Antonio] Agreed, the name of the series says "Fix management..",
but the real fixes are in patches #1 and #2.


>   as such, they should be part of a separate patchset.
> - initial testing has been successful; I'll send on final results as soon as
> they are available.
> 
> Thanks,
> Mark
> 
> 
> >
> > lib/netdev-dpdk.c | 88 +++---
> > 1 file changed, 50 insertions(+), 38 deletions(-)
> >
> >--
> >2.4.11
> >


Re: [ovs-dev] [PATCH v5 5/6] netdev-dpdk: Reword mp_size as n_mbufs.

2017-10-13 Thread Fischetti, Antonio


> -Original Message-
> From: Kavanagh, Mark B
> Sent: Friday, October 13, 2017 3:48 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v5 5/6] netdev-dpdk: Reword mp_size as n_mbufs.
> 
> >From: ovs-dev-boun...@openvswitch.org 
> >[mailto:ovs-dev-boun...@openvswitch.org]
> >On Behalf Of antonio.fische...@intel.com
> >Sent: Wednesday, October 11, 2017 5:01 PM
> >To: d...@openvswitch.org
> >Subject: [ovs-dev] [PATCH v5 5/6] netdev-dpdk: Reword mp_size as n_mbufs.
> >
> >Rename mp_size as n_mbufs in dpdk_mp structure.
> >This parameter is passed to rte mempool creation functions
> >and is meant to contain the number of elements inside
> >the requested mempool.
> 
> As previously mentioned, I don't believe that this patch should be part of the
> 'fix management of pre-existing mempools' patchset, since it's a cosmetic
> change, rather than an actual fix.
> 
> Apart from that, I don't really see a need for this change - I think 'mp_size'
> is sufficiently descriptive.

[Antonio] I'm proposing to rename mp_size because it reads as
"mempool size" to me, whereas it actually stores a number of mbufs.
So it's a change similar to adding comments: the purpose is to make
the code more readable.


> 
> Thanks,
> Mark
> 
> >
> >CC: Ciara Loftus <ciara.lof...@intel.com>
> >CC: Kevin Traynor <ktray...@redhat.com>
> >CC: Aaron Conole <acon...@redhat.com>
> >Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> >---
> > lib/netdev-dpdk.c | 18 +-
> > 1 file changed, 9 insertions(+), 9 deletions(-)
> >
> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >index c451bc9..f67c311 100644
> >--- a/lib/netdev-dpdk.c
> >+++ b/lib/netdev-dpdk.c
> >@@ -308,7 +308,7 @@ struct dpdk_mp {
> > int mtu;
> > int socket_id;
> > char if_name[IFNAMSIZ];
> >-unsigned mp_size;
> >+unsigned n_mbufs;   /* Number of mbufs inside the mempool. */
> > struct ovs_list list_node OVS_GUARDED_BY(dpdk_mp_mutex);
> > };
> >
> >@@ -500,7 +500,7 @@ dpdk_mp_name(struct dpdk_mp *dmp)
> > uint32_t h = hash_string(dmp->if_name, 0);
> > char *mp_name = xcalloc(RTE_MEMPOOL_NAMESIZE, sizeof *mp_name);
> > int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%d_%u",
> >-   h, dmp->socket_id, dmp->mtu, dmp->mp_size);
> >+   h, dmp->socket_id, dmp->mtu, dmp->n_mbufs);
> > ovs_assert(ret >= 0 && ret < RTE_MEMPOOL_NAMESIZE);
> > return mp_name;
> > }
> >@@ -518,13 +518,13 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool
> >*mp_exists)
> > ovs_strzcpy(dmp->if_name, dev->up.name, IFNAMSIZ);
> >
> > /*
> >- * XXX: rough estimation of memory required for port:
> >+ * XXX: rough estimation of number of mbufs required for this port:
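> >  * <packets required to fill the device rxqs>
> >  * + <packets that could be stuck on other ports txqs>
> >  * + <packets in the pmd threads>
> >  * + <additional memory for corner cases>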
> >  */
> >-dmp->mp_size = dev->requested_n_rxq * dev->requested_rxq_size
> >+dmp->n_mbufs = dev->requested_n_rxq * dev->requested_rxq_size
> > + dev->requested_n_txq * dev->requested_txq_size
> > + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
> > + MIN_NB_MBUF;
> >@@ -534,11 +534,11 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool
> >*mp_exists)
> >
> > VLOG_DBG("Requesting a mempool of %u mbufs for netdev %s "
> >  "with %d Rx and %d Tx queues, socket id:%d.",
> >- dmp->mp_size, dev->up.name,
> >+ dmp->n_mbufs, dev->up.name,
> >  dev->requested_n_rxq, dev->requested_n_txq,
> >  dev->requested_socket_id);
> >
> >-dmp->mp = rte_pktmbuf_pool_create(mp_name, dmp->mp_size,
> >+dmp->mp = rte_pktmbuf_pool_create(mp_name, dmp->n_mbufs,
> >   MP_CACHE_SZ,
> >   sizeof (struct dp_packet)
> >  - sizeof (struct rte_mbuf),
> >@@ -547,7 +547,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool
> >*mp_exists)
> >   dmp->socket_id);
> > if (dmp->mp) {
> > VLOG_DBG("Allocated \"%s\" mempool with %u mbufs", mp_name

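For a feel of what the renamed field holds, here is the estimate from the hunk above as a standalone function. The three constants are placeholders whose values are assumptions made for this sketch (the real ones come from the DPDK and OVS headers), and estimate_n_mbufs() is a hypothetical name.

```c
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Placeholder values, assumed for illustration only. */
#define EX_MAX_LCORE   128u    /* Stand-in for RTE_MAX_LCORE. */
#define EX_MAX_BURST   32u     /* Stand-in for NETDEV_MAX_BURST. */
#define EX_MIN_NB_MBUF 16384u  /* Stand-in for MIN_NB_MBUF. */

/* Rough number of mbufs needed by one port: enough to fill its Rx queues,
 * plus packets that may sit in Tx queues, plus one burst per polling thread,
 * plus a fixed margin. */
static unsigned int
estimate_n_mbufs(unsigned int n_rxq, unsigned int rxq_size,
                 unsigned int n_txq, unsigned int txq_size)
{
    return n_rxq * rxq_size
           + n_txq * txq_size
           + MIN(EX_MAX_LCORE, n_rxq) * EX_MAX_BURST
           + EX_MIN_NB_MBUF;
}
```

With 1 Rx queue and 2 Tx queues of 2048 descriptors each, and the placeholder constants above, this gives 2048 + 4096 + 32 + 16384 = 22560, which is a count of buffers and not a size in bytes.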
Re: [ovs-dev] [PATCH v5 1/6] netdev-dpdk: fix management of pre-existing mempools.

2017-10-13 Thread Fischetti, Antonio
Thanks Mark for your review, I've added my answers inline.

-Antonio


> -Original Message-
> From: Kavanagh, Mark B
> Sent: Friday, October 13, 2017 4:56 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v5 1/6] netdev-dpdk: fix management of pre-
> existing mempools.
> 
> >From: ovs-dev-boun...@openvswitch.org 
> >[mailto:ovs-dev-boun...@openvswitch.org]
> >On Behalf Of antonio.fische...@intel.com
> >Sent: Wednesday, October 11, 2017 5:01 PM
> >To: d...@openvswitch.org
> >Subject: [ovs-dev] [PATCH v5 1/6] netdev-dpdk: fix management of pre-existing
> >mempools.
> 
> Hi Antonio,
> 
> IMO, "Fix reconfiguration of pre-existing mempools" is a more
> appropriate/descriptive name for this commit.
> 
> Also, patch 3 of the series should be combined with this one.
> 
> Apart from that, some comments inline.
> 
> Thanks,
> Mark
> 
> >
> >Fix an issue with the reconfiguration of pre-existing mempools.
> >This patch avoids calling dpdk_mp_put() - and erroneously
> >releasing the mempool - when it already exists.
> >
> >CC: Kevin Traynor <ktray...@redhat.com>
> >CC: Aaron Conole <acon...@redhat.com>
> >CC: Darrell Ball <dlu...@gmail.com>
> >Reported-by: Ciara Loftus <ciara.lof...@intel.com>
> >Tested-by: Ciara Loftus <ciara.lof...@intel.com>
> >Reported-by: Róbert Mulik <robert.mu...@ericsson.com>
> >Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each
> >port.")
> >Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> >---
> >I've tested this patch by
> >  - changing at run-time the number of Rx queues:
> >  ovs-vsctl set Interface dpdk0 type=dpdk options:n_rxq=4
> >
> >  - reducing the MTU of the dpdk ports by 1 byte to force
> >the configuration of an existing mempool:
> >  ovs-vsctl set Interface dpdk0 mtu_request=1499
> >
> >This issue was observed in a PVP test topology with dpdkvhostuserclient
> >ports. It can also happen with dpdk type ports, e.g. by reducing the MTU
> >by 1 byte.
> >
> >To replicate the bug scenario in the PVP case it's sufficient to
> >set up 1 dpdkvhostuserclient port, and just boot the VM.
> >
> >Below are some more details on my own test setup.
> >
> > PVP test setup
> > --
> >CLIENT_SOCK_DIR=/tmp
> >SOCK0=dpdkvhostuser0
> >SOCK1=dpdkvhostuser1
> >
> >1 PMD
> >Add 2 dpdk ports, n_rxq=1
> >Add 2 vhu ports both of type dpdkvhostuserclient and specify 
> >vhost-server-path
> > ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-
> >path="$CLIENT_SOCK_DIR/$SOCK0"
> > ovs-vsctl set Interface dpdkvhostuser1 options:vhost-server-
> >path="$CLIENT_SOCK_DIR/$SOCK1"
> >
> >Set port-based rules: dpdk0 <--> vhu0 and dpdk1 <--> vhu1
> > add-flow br0 in_port=1,action=output:3
> > add-flow br0 in_port=3,action=output:1
> > add-flow br0 in_port=4,action=output:2
> > add-flow br0 in_port=2,action=output:4
> >
> > Launch QEMU
> > ---
> >As OvS vhu ports are acting as clients, we must specify 'server' in the next
> >command.
> >VM_IMAGE=
> >
> > sudo -E taskset 0x3F00 $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 -name us-
> >vhost-vm1 -cpu host -enable-kvm -m 4096M -object memory-backend-
> >file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on -numa node,memdev=mem
> >-mem-prealloc -smp 4 -drive file=$VM_IMAGE -chardev
> >socket,id=char0,path=$CLIENT_SOCK_DIR/$SOCK0,server -netdev type=vhost-
> >user,id=mynet1,chardev=char0,vhostforce -device virtio-net-
> >pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev
> >socket,id=char1,path=$CLIENT_SOCK_DIR/$SOCK1,server -netdev type=vhost-
> >user,id=mynet2,chardev=char1,vhostforce -device virtio-net-
> >pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off --nographic
> >
> > Expected behavior
> > -
> >With this fix OvS shouldn't crash.
> >---
> > lib/netdev-dpdk.c | 34 +-
> > 1 file changed, 21 insertions(+), 13 deletions(-)
> >
> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >index c60f46f..e6f3ca4 100644
> >--- a/lib/netdev-dpdk.c
> >+++ b/lib/netdev-dpdk.c
> >@@ -508,12 +508,13 @@ dpdk_mp_name(struct dpdk_mp *dmp)
> > }
> >
> > static struct dpdk_mp *
> >-dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
> >+dpdk_mp_create(struct netdev_dpdk *dev, int mtu

Re: [ovs-dev] [PATCH v5 4/6] netdev-dpdk: assert mempool names.

2017-10-13 Thread Fischetti, Antonio
+ CC Darrell, adding him to the loop because we have already discussed this.

> -Original Message-
> From: Kavanagh, Mark B
> Sent: Friday, October 13, 2017 5:07 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v5 4/6] netdev-dpdk: assert mempool names.
> 
> >From: Fischetti, Antonio
> >Sent: Friday, October 13, 2017 4:47 PM
> >To: Kavanagh, Mark B <mark.b.kavan...@intel.com>; d...@openvswitch.org
> >Subject: RE: [ovs-dev] [PATCH v5 4/6] netdev-dpdk: assert mempool names.
> >
> >
> >
> >> -Original Message-
> >> From: Kavanagh, Mark B
> >> Sent: Friday, October 13, 2017 3:47 PM
> >> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> >> Subject: RE: [ovs-dev] [PATCH v5 4/6] netdev-dpdk: assert mempool names.
> >>
> >> >From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> >boun...@openvswitch.org]
> >> >On Behalf Of antonio.fische...@intel.com
> >> >Sent: Wednesday, October 11, 2017 5:01 PM
> >> >To: d...@openvswitch.org
> >> >Subject: [ovs-dev] [PATCH v5 4/6] netdev-dpdk: assert mempool names.
> >> >
> >> >Replace if statement with an assert.
> >> >
> >> >CC: Darrell Ball <dlu...@gmail.com>
> >> >CC: Ciara Loftus <ciara.lof...@intel.com>
> >> >CC: Kevin Traynor <ktray...@redhat.com>
> >> >CC: Aaron Conole <acon...@redhat.com>
> >> >Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> >>
> >> Hey Antonio,
> >>
> >> As per my previous comments, I believe that this patch should not be
> >> included in the patchset.
> >>
> >> Other than that, two comments inline below.
> >>
> >> Thanks,
> >> Mark
> >>
> >>
> >> >---
> >> > lib/netdev-dpdk.c | 4 +---
> >> > 1 file changed, 1 insertion(+), 3 deletions(-)
> >> >
> >> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >> >index 2aa4a55..c451bc9 100644
> >> >--- a/lib/netdev-dpdk.c
> >> >+++ b/lib/netdev-dpdk.c
> >> >@@ -501,9 +501,7 @@ dpdk_mp_name(struct dpdk_mp *dmp)
> >> > char *mp_name = xcalloc(RTE_MEMPOOL_NAMESIZE, sizeof *mp_name);
> >> > int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%d_%u",
> >> >h, dmp->socket_id, dmp->mtu, dmp->mp_size);
> >> >-if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
> >> >-return NULL;
> >> >-}
> >> >+ovs_assert(ret >= 0 && ret < RTE_MEMPOOL_NAMESIZE);
> >>
> >> The behavior of ovs_assert in the event of failure is to abort execution -
> >> are you sure that's what you want to happen here?
> >>
> >> In the previous implementation, NULL was returned, and execution continued;
> >> are there some undesired side effects of same that you're trying to avoid
> >> with this change?
> >
> >[Antonio] Actually I had originally just added a VLOG_ERR before returning
> >NULL like:
> >
> > if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
> >+VLOG_ERR("Failed to generate a mempool name for \"%s\". "
> >+"Hash:0x%x, mtu:%d, mbufs:%u",
> >+dmp->if_name, h, dmp->mtu, dmp->mp_size);
> > return NULL;
> > }
> >
> >Then this change to use an assert came after the discussion with Darrell and
> >makes sense to me because if the mempool name is not generated - for sure
> >it's an unlikely event - we cannot call the rte_pktmbuf_pool_create() which
> >uses that name as a unique identifier for the mempool. That's why we agreed
> >to use an assert. What do you think?
> 
> I'd approach it as follows:
> - [dpdk_mp_name]   return NULL if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE);
>   don't log failure here, as that's handled by netdev_dpdk_mempool_create()
> - [dpdk_mp_create] check for a NULL return value from dpdk_mp_name; return
>   same to caller
> - [dpdk_mp_get]    return NULL (no changes needed here)
> - [netdev_dpdk_mempool_create] return value of NULL yields an error log and
>   returns rte_errno to the caller (no changes needed here).
> 
> At least with this approach, execution can continue, instead of shutting the
> application down abnormally.

[Antonio] Yes, m

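The layering suggested above can be sketched as follows: the innermost helper returns NULL without logging, each caller passes the NULL up, and only the outermost function logs and converts the failure into an error code, so the process keeps running. All names here (ex_name(), ex_create(), ex_configure()) are hypothetical, and the choice of ENOMEM as the error code is an assumption of the sketch.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

struct ex_pool {
    char *name;
};

/* Innermost step: returns NULL, without logging, if the name does not fit
 * in 'cap' bytes. */
static char *
ex_name(const char *if_name, size_t cap)
{
    char *name = malloc(cap);
    int ret;

    if (!name) {
        return NULL;
    }
    ret = snprintf(name, cap, "ovs_%s", if_name);
    if (ret < 0 || (size_t) ret >= cap) {
        free(name);
        return NULL;
    }
    return name;
}

/* Middle step: passes a NULL name up as a NULL pool. */
static struct ex_pool *
ex_create(const char *if_name, size_t cap)
{
    struct ex_pool *p = malloc(sizeof *p);

    if (!p) {
        return NULL;
    }
    p->name = ex_name(if_name, cap);
    if (!p->name) {
        free(p);
        return NULL;
    }
    return p;
}

/* Outermost step: the only place that logs and returns an error code. */
static int
ex_configure(const char *if_name, size_t cap, struct ex_pool **poolp)
{
    struct ex_pool *p = ex_create(if_name, cap);

    if (!p) {
        fprintf(stderr, "%s: failed to create pool\n", if_name);
        return ENOMEM;
    }
    *poolp = p;
    return 0;
}
```

Compared with an assert in the innermost helper, a failure here leaves the caller's previous state untouched and reports a single error at the top level.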
Re: [ovs-dev] [PATCH v5 4/6] netdev-dpdk: assert mempool names.

2017-10-13 Thread Fischetti, Antonio


> -Original Message-
> From: Kavanagh, Mark B
> Sent: Friday, October 13, 2017 3:47 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v5 4/6] netdev-dpdk: assert mempool names.
> 
> >From: ovs-dev-boun...@openvswitch.org 
> >[mailto:ovs-dev-boun...@openvswitch.org]
> >On Behalf Of antonio.fische...@intel.com
> >Sent: Wednesday, October 11, 2017 5:01 PM
> >To: d...@openvswitch.org
> >Subject: [ovs-dev] [PATCH v5 4/6] netdev-dpdk: assert mempool names.
> >
> >Replace if statement with an assert.
> >
> >CC: Darrell Ball <dlu...@gmail.com>
> >CC: Ciara Loftus <ciara.lof...@intel.com>
> >CC: Kevin Traynor <ktray...@redhat.com>
> >CC: Aaron Conole <acon...@redhat.com>
> >Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> 
> Hey Antonio,
> 
> As per my previous comments, I believe that this patch should not be included
> in the patchset.
> 
> Other than that, two comments inline below.
> 
> Thanks,
> Mark
> 
> 
> >---
> > lib/netdev-dpdk.c | 4 +---
> > 1 file changed, 1 insertion(+), 3 deletions(-)
> >
> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >index 2aa4a55..c451bc9 100644
> >--- a/lib/netdev-dpdk.c
> >+++ b/lib/netdev-dpdk.c
> >@@ -501,9 +501,7 @@ dpdk_mp_name(struct dpdk_mp *dmp)
> > char *mp_name = xcalloc(RTE_MEMPOOL_NAMESIZE, sizeof *mp_name);
> > int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%d_%u",
> >h, dmp->socket_id, dmp->mtu, dmp->mp_size);
> >-if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
> >-return NULL;
> >-}
> >+ovs_assert(ret >= 0 && ret < RTE_MEMPOOL_NAMESIZE);
> 
> The behavior of ovs_assert in the event of failure is to abort execution - are
> you sure that's what you want to happen here?
> 
> In the previous implementation, NULL was returned, and execution continued; are
> there some undesired side effects of same that you're trying to avoid with this
> change?

[Antonio] Actually, I had originally just added a VLOG_ERR before returning NULL,
like:

 if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
+VLOG_ERR("Failed to generate a mempool name for \"%s\". "
+"Hash:0x%x, mtu:%d, mbufs:%u",
+dmp->if_name, h, dmp->mtu, dmp->mp_size);
 return NULL;
 }

The change to an assert came after the discussion with Darrell, and it makes
sense to me: if the mempool name cannot be generated (surely an unlikely
event), we cannot call rte_pktmbuf_pool_create(), which uses that name as a
unique identifier for the mempool. That's why we agreed to use an assert.
What do you think?
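
For reference, the two options being weighed (abort via assert, or return NULL
and let the callers report the error) hinge on the same snprintf() check. Below
is a minimal standalone sketch of the "return NULL" variant. It is not the
netdev-dpdk code: sketch_mp_name() and SKETCH_NAMESIZE are hypothetical
stand-ins (the real limit is RTE_MEMPOOL_NAMESIZE from the DPDK headers), and
plain calloc() is used in place of xcalloc().

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for RTE_MEMPOOL_NAMESIZE; the real value comes
 * from the DPDK headers. */
#define SKETCH_NAMESIZE 32

/* Builds a pool name in a freshly allocated buffer.  Returns NULL when
 * snprintf() fails or the name would be truncated, so that the caller can
 * log the failure and carry on instead of aborting the process. */
static char *
sketch_mp_name(uint32_t hash, int socket_id, int mtu, unsigned int mp_size)
{
    char *name = calloc(SKETCH_NAMESIZE, sizeof *name);
    if (!name) {
        return NULL;
    }

    int ret = snprintf(name, SKETCH_NAMESIZE, "ovs_%x_%d_%d_%u",
                       (unsigned int) hash, socket_id, mtu, mp_size);
    if (ret < 0 || ret >= SKETCH_NAMESIZE) {
        free(name);     /* Do not leak the buffer on the error path. */
        return NULL;
    }
    return name;
}
```

With this shape the caller only needs a NULL check before handing the name to
the pool-creation call. Note that the sketch frees the buffer on the error
path, which the quoted if-block does not do.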

> 
> 
> > return mp_name;
> > }
> >
> >--
> >2.4.11
> >
> >___
> >dev mailing list
> >d...@openvswitch.org
> >https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5 02/10] Keepalive: Add initial keepalive support.

2017-10-13 Thread Fischetti, Antonio
Hi Bhanu,
a couple of minor comments below.

-Antonio

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Bhanuprakash Bodireddy
> Sent: Friday, September 15, 2017 5:40 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH v5 02/10] Keepalive: Add initial keepalive support.
> 
> This commit introduces the initial keepalive support by adding
> 'keepalive' module and also helper and initialization functions
> that will be invoked by later commits.
> 
> This commit adds new ovsdb column "keepalive" that shows the status
> of the datapath threads. This is implemented for DPDK datapath and
> only status of PMD threads is reported.
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  lib/automake.mk|   2 +
>  lib/keepalive.c| 145 +
>  lib/keepalive.h|  88 +++
>  vswitchd/bridge.c  |   3 +
>  vswitchd/vswitch.ovsschema |   8 ++-
>  vswitchd/vswitch.xml   |  49 +++
>  6 files changed, 293 insertions(+), 2 deletions(-)
>  create mode 100644 lib/keepalive.c
>  create mode 100644 lib/keepalive.h
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 2415f4c..0d99f0a 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -110,6 +110,8 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/json.c \
>   lib/jsonrpc.c \
>   lib/jsonrpc.h \
> + lib/keepalive.c \
> + lib/keepalive.h \
>   lib/lacp.c \
>   lib/lacp.h \
>   lib/latch.h \
> diff --git a/lib/keepalive.c b/lib/keepalive.c
> new file mode 100644
> index 000..1f151f6
> --- /dev/null
> +++ b/lib/keepalive.c
> @@ -0,0 +1,145 @@
> +/*
> + * Copyright (c) 2017 Intel, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include 
> +
> +#include "keepalive.h"
> +#include "lib/vswitch-idl.h"
> +#include "openvswitch/vlog.h"
> +#include "seq.h"
> +#include "timeval.h"
> +
> +VLOG_DEFINE_THIS_MODULE(keepalive);
> +
> +static bool keepalive_enable = false;  /* Keepalive disabled by default. */
> +static uint32_t keepalive_timer_interval;  /* keepalive timer interval. */
> +static struct keepalive_info ka_info;
> +
> +/* Returns true if keepalive is enabled, false otherwise. */
> +bool
> +ka_is_enabled(void)
> +{
> +return keepalive_enable;
> +}
> +
> +/* Finds the thread by 'tid' in 'process_list' map and update
> + * the thread state and last_seen_time stamp.  This is invoked
> + * periodically(based on keepalive-interval) as part of callback
> + * function in the context of keepalive thread.
> + */
> +static void
> +ka_set_thread_state_ts(pid_t tid, enum keepalive_state state,
> +   uint64_t last_alive)
> +{
> +struct ka_process_info *pinfo;
> +
> +ovs_mutex_lock(&ka_info.proclist_mutex);
> +HMAP_FOR_EACH_WITH_HASH (pinfo, node, hash_int(tid, 0),
> + &ka_info.process_list) {
> +if (pinfo->tid == tid) {
> +pinfo->state = state;
> +pinfo->last_seen_time = last_alive;
> +}
> +}
> +ovs_mutex_unlock(&ka_info.proclist_mutex);
> +}
> +
> +/* Retrieve and return the keepalive timer interval from OVSDB. */
> +static uint32_t
> +ka_get_timer_interval(const struct smap *ovs_other_config)
> +{
> +uint32_t ka_interval;
> +
> +/* Timer granularity in milliseconds
> + * Defaults to OVS_KEEPALIVE_TIMEOUT(ms) if not set */

[Antonio] typo
* Defaults to OVS_KEEPALIVE_DEFAULT_TIMEOUT (ms) if not set. */

> +ka_interval = smap_get_int(ovs_other_config, "keepalive-interval",
> +   OVS_KEEPALIVE_DEFAULT_TIMEOUT);
> +
> +VLOG_INFO("Keepalive timer interval set to %"PRIu32" (ms)\n", ka_interval);
> +return ka_interval;
> +}
> +
> +/*
> + * This function is invoked periodically to write the status and
> + * last seen timestamp of the thread in to 'process_list' map.
> + */
> +static void
> +ka_update_thread_state(pid_t tid, const enum keepalive_state state,
> +   uint64_t last_alive)
> +{
> +switch (state) {
> +case KA_STATE_ALIVE:
> +case KA_STATE_MISSING:
> +ka_set_thread_state_ts(tid, KA_STATE_ALIVE, last_alive);
> +break;
> +case KA_STATE_UNUSED:
> +case KA_STATE_DOZING:
> +case KA_STATE_SLEEP:
> +case KA_STATE_DEAD:
> +case 

Re: [ovs-dev] [PATCH v5 05/10] dpif-netdev: Enable heartbeats for DPDK datapath.

2017-10-13 Thread Fischetti, Antonio
A couple of minor comments inline.

-Antonio

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Bhanuprakash Bodireddy
> Sent: Friday, September 15, 2017 5:40 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH v5 05/10] dpif-netdev: Enable heartbeats for DPDK
> datapath.
> 
> This commit adds heartbeat mechanism support for DPDK datapath. Heartbeats
> are sent to registered PMD threads at predefined intervals (as set in ovsdb
> with 'keepalive-interval').
> 
> The heartbeats are only enabled when there is at least one port added to
> the bridge and with active PMD thread polling the port.
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  lib/dpif-netdev.c | 15 +--
>  lib/keepalive.c   | 44 
>  lib/keepalive.h   |  1 +
>  3 files changed, 58 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index da419d5..fd0ce61 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -1021,14 +1021,25 @@ sorted_poll_thread_list(struct dp_netdev *dp,
>  }
> 
>  static void *
> -ovs_keepalive(void *f_ OVS_UNUSED)
> +ovs_keepalive(void *f_)
>  {
> +struct dp_netdev *dp = f_;
> +
>  pthread_detach(pthread_self());
> 
>  for (;;) {
> -int interval;
> +int interval, n_pmds;
> +bool hb_enable;
> 
>  interval = get_ka_interval();
> +n_pmds = cmap_count(&dp->poll_threads) - 1;
> +hb_enable = (n_pmds > 0) ? true : false;
> +
> +/* Dispatch heartbeats only if pmd[s] exist. */
> +if (hb_enable) {
> +dispatch_heartbeats();
> +}
> +
>  xnanosleep(interval);
>  }
> 
> diff --git a/lib/keepalive.c b/lib/keepalive.c
> index da4defd..3067e73 100644
> --- a/lib/keepalive.c
> +++ b/lib/keepalive.c
> @@ -284,6 +284,50 @@ ka_mark_pmd_thread_sleep(int tid)
>  }
>  }
> 
> +/* Dispatch heartbeats from 'ovs_keepalive' thread. */
> +void
> +dispatch_heartbeats(void)
> +{

[Antonio] I'd rather call this function ka_state_update() or similar?

 
> +struct ka_process_info *pinfo, *pinfo_next;
> +
> +/* Iterates over the list of processes in 'cached_process_list' map. */
> +HMAP_FOR_EACH_SAFE (pinfo, pinfo_next, node,
> +&ka_info.cached_process_list) {
> +if (pinfo->state == KA_STATE_UNUSED) {
> +continue;
> +}
> +
> +switch (pinfo->state) {
> +case KA_STATE_UNUSED:

[Antonio] This case could be removed, as KA_STATE_UNUSED is already handled by the 'continue' above.


> +break;
> +case KA_STATE_ALIVE:
> +pinfo->state = KA_STATE_MISSING;
> +pinfo->last_seen_time = time_wall_msec();
> +break;
> +case KA_STATE_MISSING:
> +pinfo->state = KA_STATE_DEAD;
> +break;
> +case KA_STATE_DEAD:
> +pinfo->state = KA_STATE_GONE;
> +break;
> +case KA_STATE_GONE:
> +break;
> +case KA_STATE_DOZING:
> +pinfo->state = KA_STATE_SLEEP;
> +pinfo->last_seen_time = time_wall_msec();
> +break;
> +case KA_STATE_SLEEP:
> +break;
> +default:
> +OVS_NOT_REACHED();
> +}
> +
> +/* Invoke 'ka_update_thread_state' cb function to update state info
> + * in to 'ka_info.process_list' map. */
> +ka_info.relay_cb(pinfo->tid, pinfo->state, pinfo->last_seen_time);
> +}
> +}
> +
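
As a side note, the transitions performed by the switch in
dispatch_heartbeats() above can be sketched in isolation as a pure function.
This is only an illustration under stated assumptions: the sketch_/SK_ names
are hypothetical and merely mirror the KA_STATE_* values used in the patch.
The real code also updates last_seen_time and relays the result through
ka_info.relay_cb, which is left out here.

```c
/* Hypothetical mirror of the keepalive states named in the patch. */
enum sketch_ka_state {
    SK_UNUSED,
    SK_ALIVE,
    SK_MISSING,
    SK_DEAD,
    SK_GONE,
    SK_DOZING,
    SK_SLEEP
};

/* Returns the state an entry is moved to when one heartbeat interval
 * elapses without the thread having marked itself alive again. */
static enum sketch_ka_state
sketch_next_state(enum sketch_ka_state state)
{
    switch (state) {
    case SK_ALIVE:
        return SK_MISSING;
    case SK_MISSING:
        return SK_DEAD;
    case SK_DEAD:
        return SK_GONE;
    case SK_DOZING:
        return SK_SLEEP;
    case SK_UNUSED:
    case SK_GONE:
    case SK_SLEEP:
        return state;   /* These states are left unchanged. */
    }
    return state;
}
```

Read this way, a PMD that keeps responding bounces between ALIVE and MISSING,
while one that stops responding walks ALIVE, MISSING, DEAD, GONE over
successive intervals.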
>  void
>  ka_init(const struct smap *ovs_other_config)
>  {
> diff --git a/lib/keepalive.h b/lib/keepalive.h
> index 9e8bfdf..392a701 100644
> --- a/lib/keepalive.h
> +++ b/lib/keepalive.h
> @@ -102,6 +102,7 @@ void ka_free_cached_threads(void);
>  void ka_cache_registered_threads(void);
>  void ka_mark_pmd_thread_alive(int);
>  void ka_mark_pmd_thread_sleep(int);
> +void dispatch_heartbeats(void);
>  void ka_init(const struct smap *);
>  void ka_destroy(void);
> 
> --
> 2.4.11
> 


Re: [ovs-dev] [PATCH v5 04/10] dpif-netdev: Register packet processing cores to KA framework.

2017-10-13 Thread Fischetti, Antonio
Hi Bhanu,
a couple of minor comments inline.

-Antonio

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Bhanuprakash Bodireddy
> Sent: Friday, September 15, 2017 5:40 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH v5 04/10] dpif-netdev: Register packet processing
> cores to KA framework.
> 
> This commit registers the packet processing PMD threads to keepalive
> framework. Only PMDs that have rxqs mapped will be registered and
> actively monitored by KA framework.
> 
> This commit spawns a keepalive thread that will dispatch heartbeats to
> PMD threads. The pmd threads respond to heartbeats by marking themselves
> alive. As long as PMD responds to heartbeats it is considered 'healthy'.
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  lib/dpif-netdev.c |  79 ++
>  lib/keepalive.c   | 191 --
>  lib/keepalive.h   |  20 ++
>  lib/ovs-thread.c  |   6 ++
>  lib/ovs-thread.h  |   1 +
>  lib/util.c|  22 +++
>  lib/util.h|   1 +
>  7 files changed, 316 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index ca74df8..da419d5 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -49,6 +49,7 @@
>  #include "flow.h"
>  #include "hmapx.h"
>  #include "id-pool.h"
> +#include "keepalive.h"
>  #include "latch.h"
>  #include "netdev.h"
>  #include "netdev-vport.h"
> @@ -591,6 +592,7 @@ struct dp_netdev_pmd_thread {
>  uint64_t last_reload_seq;
>  atomic_bool reload; /* Do we need to reload ports? */
>  pthread_t thread;
> +pid_t tid;  /* Thread id of this pmd thread. */
>  unsigned core_id;   /* CPU core id of this pmd thread. */
>  int numa_id;/* numa node id of this pmd thread. */
>  bool isolated;
> @@ -1018,6 +1020,72 @@ sorted_poll_thread_list(struct dp_netdev *dp,
>  *n = k;
>  }
> 
> +static void *
> +ovs_keepalive(void *f_ OVS_UNUSED)
> +{
> +pthread_detach(pthread_self());
> +
> +for (;;) {
> +int interval;

[Antonio] Shouldn't we move the declaration of 'interval' outside the for (;;) loop?
   int interval;
   for (;;) {
   interval =...
   xnanosleep(..
   }

> +
> +interval = get_ka_interval();
> +xnanosleep(interval);
> +}
> +
> +return NULL;
> +}
> +
> +/* Kickstart 'ovs_keepalive' thread. */
> +static void
> +ka_thread_start(struct dp_netdev *dp)
> +{
> +static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
> +
> +if (ovsthread_once_start(&once)) {
> +ovs_thread_create("ovs_keepalive", ovs_keepalive, dp);
> +
> +ovsthread_once_done(&once);
> +}
> +}
> +
> +/* Register the datapath threads. This gets invoked on every datapath
> + * reconfiguration. The pmd thread[s] having rxq[s] mapped will be
> + * registered to KA framework.
> + */
> +static void
> +ka_register_datapath_threads(struct dp_netdev *dp)
> +{
> +if (!ka_is_enabled()) {
> +return;
> +}
> +
> +ka_thread_start(dp);
> +
> +ka_reload_datapath_threads_begin();
> +
> +struct dp_netdev_pmd_thread *pmd;
> +CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> +/*  Register only PMD threads. */
> +if (pmd->core_id != NON_PMD_CORE_ID) {
> +/* Skip PMD thread with no rxqs mapping. */
> +if (OVS_UNLIKELY(!hmap_count(&pmd->poll_list))) {
> +/* Rxq mapping changes due to datapath reconfiguration.
> + * If no rxqs mapped to PMD now due to reconfiguration,
> + * unregister the pmd thread. */
> +ka_unregister_thread(pmd->tid);
> +continue;
> +}
> +
> +ka_register_thread(pmd->tid);
> +VLOG_INFO("Registered PMD thread [%d] on Core[%d] to KA framework",
> +  pmd->tid, pmd->core_id);
> +}
> +}
> +ka_cache_registered_threads();
> +
> +ka_reload_datapath_threads_end();
> +}
> +
>  static void
>  dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc,
>const char *argv[], void *aux OVS_UNUSED)
> @@ -3821,6 +3889,9 @@ reconfigure_datapath(struct dp_netdev *dp)
> 
>  /* Reload affected pmd threads. */
>  reload_affected_pmds(dp);
> +
> +/* Register datapath threads to KA monitoring. */
> +ka_register_datapath_threads(dp);
>  }
> 
>  /* Returns true if one of the netdevs in 'dp' requires a reconfiguration */
> @@ -4023,6 +4094,8 @@ pmd_thread_main(void *f_)
> 
>  /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
>  ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
> +/* Stores tid in to 'pmd->tid'. */
> +ovsthread_settid(&pmd->tid);
>  ovs_numa_thread_setaffinity_core(pmd->core_id);
>  dpdk_set_lcore_id(pmd->core_id);
>  poll_cnt = 

Re: [ovs-dev] [PATCH v3 1/5] netdev-dpdk: fix mempool management with vhu client.

2017-10-11 Thread Fischetti, Antonio


> -Original Message-
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Wednesday, October 11, 2017 2:37 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; Róbert Mulik
> <robert.mu...@ericsson.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v3 1/5] netdev-dpdk: fix mempool management with
> vhu client.
> 
> On 10/11/2017 02:04 PM, Fischetti, Antonio wrote:
> >
> >> -Original Message-
> >> From: Kevin Traynor [mailto:ktray...@redhat.com]
> >> Sent: Wednesday, October 11, 2017 11:40 AM
> >> To: Fischetti, Antonio <antonio.fische...@intel.com>; Róbert Mulik
> >> <robert.mu...@ericsson.com>; d...@openvswitch.org
> >> Subject: Re: [ovs-dev] [PATCH v3 1/5] netdev-dpdk: fix mempool management
> with
> >> vhu client.
> >>
> >> On 10/11/2017 11:28 AM, Fischetti, Antonio wrote:
> >>> Hi Robert,
> >>> that's happening because the requested MTU is rounded up to the current
> >> boundary.
> >>> So if the current upper boundary is 2500 and we request 2000 =>
> >>> 2000 is rounded up to 2500 and the same mempool is returned.
> >>>
> >>> I may be wrong but this seems the wanted behavior, maybe Kevin can shed
> some
> >> light?
> >>> I may have missed some detail as I didn't follow this implementation since
> >> the
> >>> very beginning.
> >>>
> >>
> >> I think it's related to review comments I sent earlier today. mtu is
> >> mtu, but a value that is rounded from it is used in calculating the size
> >> of the mbuf. I suspect in this case, when the new mtu size results in
> >> the same rounded value, the current mempool is being reused (which is
> >> fine)
> >
> > [Antonio] exactly.
> >
> >> but the EEXISTS error value returned from reconfigure means that
> >> the change is not seen as successful and the old mtu value is restored
> >> to ovsdb.
> >
> > [Antonio]
> > I think this can be fixed by the following changes.
> > The new mtu value is stored when a pre-existing mempool is returned:
> > __
> > @@ -631,6 +631,11 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
> >   rte_strerror(rte_errno));
> >  return rte_errno;
> >  } else if (mp_exists) {
> > +/* If a new MTU was requested and its rounded value is the same
> > + * that is currently used, then the existing mempool was returned.
> > + * Update the new MTU value. */
> > +dev->mtu = dev->requested_mtu;
> > +dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> 
> What about requested_socket_id, isn't that needed also?

Yes, I will add it.
Thanks Kevin and Robert, I'll rework it and post a v5. I'll also take into
account Kevin's other comments from the v4 review.

-Antonio

> 
> >  return EEXIST;
> >  } else {
> >  dpdk_mp_put(dev->dpdk_mp);
> > __
> >
> > Instead, inside netdev_dpdk_reconfigure() I think it should be correct to
> have
> >
> > err = netdev_dpdk_mempool_configure(dev);
> > if (err && err != EEXIST) {
> > goto out;
> > }
> >
> > because, both when a new mempool has just been created and in the EEXIST (=17)
> > case, we don't want to execute "goto out"; we want to fall through to do
> >
> > dev->rxq_size = dev->requested_rxq_size;
> > dev->txq_size = dev->requested_txq_size;
> > ...
> >
> > With these code changes it seems to work, at the beginning I have MTU=1500
> and I set it to 1000:
> >
> > # ovs-vsctl list interface dpdk0 | grep mtu
> > mtu : 1500
> > mtu_request : []
> >
> > # ovs-vsctl set interface dpdk0 mtu_request=1000
> > my log says
> > netdev_dpdk|DBG|Requesting a mempool of 40992 mbufs for netdev dpdk0 with 1
> Rx and 11 Tx queues, socket id:0.
> > netdev_dpdk|DBG|A mempool with name ovs_62a2ca2f_0_2030_40992 already exists
> at 0x7fad9d77dd00.
> > dpdk|INFO|PMD: ixgbe_dev_link_status_print(): Port 0: Link Up - speed 0 Mbps
> - half-duplex
> >
> > # ovs-vsctl list interface dpdk0 | grep mtu
> > mtu : 1000
> > mtu_request : 1000
> >
> > Does that make sense?
> >
> 
> Looks ok to me.
> 
> Kevin.
> 
> > Thanks,
> > -Anton

Re: [ovs-dev] [PATCH v3 1/5] netdev-dpdk: fix mempool management with vhu client.

2017-10-11 Thread Fischetti, Antonio

> -Original Message-
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Wednesday, October 11, 2017 11:40 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; Róbert Mulik
> <robert.mu...@ericsson.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v3 1/5] netdev-dpdk: fix mempool management with
> vhu client.
> 
> On 10/11/2017 11:28 AM, Fischetti, Antonio wrote:
> > Hi Robert,
> > that's happening because the requested MTU is rounded up to the current
> boundary.
> > So if the current upper boundary is 2500 and we request 2000 =>
> > 2000 is rounded up to 2500 and the same mempool is returned.
> >
> > I may be wrong but this seems the wanted behavior, maybe Kevin can shed some
> light?
> > I may have missed some detail as I didn't follow this implementation since
> the
> > very beginning.
> >
> 
> I think it's related to review comments I sent earlier today. mtu is
> mtu, but a value that is rounded from it is used in calculating the size
> of the mbuf. I suspect in this case, when the new mtu size results in
> the same rounded value, the current mempool is being reused (which is
> fine) 

[Antonio] exactly.

> but the EEXISTS error value returned from reconfigure means that
> the change is not seen as successful and the old mtu value is restored
> to ovsdb.

[Antonio] 
I think this can be fixed by the following changes. 
The new mtu value is stored when a pre-existing mempool is returned:
__
@@ -631,6 +631,11 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
  rte_strerror(rte_errno));
 return rte_errno;
 } else if (mp_exists) {
+/* If a new MTU was requested and its rounded value is the same
+ * that is currently used, then the existing mempool was returned.
+ * Update the new MTU value. */
+dev->mtu = dev->requested_mtu;
+dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
 return EEXIST;
 } else {
 dpdk_mp_put(dev->dpdk_mp);
__

Instead, inside netdev_dpdk_reconfigure() I think it should be correct to have

err = netdev_dpdk_mempool_configure(dev);
if (err && err != EEXIST) {
goto out;
}

because, both when a new mempool has just been created and in the EEXIST (=17)
case, we don't want to execute "goto out"; we want to fall through to do

dev->rxq_size = dev->requested_rxq_size;
dev->txq_size = dev->requested_txq_size;
...
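
The control flow described above can be sketched standalone as follows. This
is a rough illustration, not the netdev-dpdk code: struct sketch_dev and both
sketch_ functions are hypothetical stand-ins, the two bool parameters simulate
the pool lookup and an allocation failure, and returning 0 after the EEXIST
case is an assumption about how the rest of the reconfigure path treats it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down device state; field names follow the patch. */
struct sketch_dev {
    int mtu;
    int requested_mtu;
    int rxq_size;
    int requested_rxq_size;
};

/* Stand-in for netdev_dpdk_mempool_configure(): 'pool_exists' plays the
 * role of 'mp_exists', 'create_fails' simulates an allocation failure. */
static int
sketch_mempool_configure(struct sketch_dev *dev, bool pool_exists,
                         bool create_fails)
{
    if (create_fails) {
        return ENOMEM;
    }
    dev->mtu = dev->requested_mtu;  /* Stored for new and reused pools. */
    return pool_exists ? EEXIST : 0;
}

/* Stand-in for the relevant part of netdev_dpdk_reconfigure(). */
static int
sketch_reconfigure(struct sketch_dev *dev, bool pool_exists,
                   bool create_fails)
{
    int err = sketch_mempool_configure(dev, pool_exists, create_fails);
    if (err && err != EEXIST) {
        return err;                 /* Real failure: skip the updates. */
    }
    dev->rxq_size = dev->requested_rxq_size;
    return 0;
}
```

The point of the sketch is that EEXIST is handled as "pool reused" rather than
as a failure, so the requested MTU and queue sizes are still applied.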

With these code changes it seems to work. Initially I have MTU=1500 and
I set it to 1000:

# ovs-vsctl list interface dpdk0 | grep mtu
mtu : 1500
mtu_request : []

# ovs-vsctl set interface dpdk0 mtu_request=1000
my log says
netdev_dpdk|DBG|Requesting a mempool of 40992 mbufs for netdev dpdk0 with 1 Rx and 11 Tx queues, socket id:0.
netdev_dpdk|DBG|A mempool with name ovs_62a2ca2f_0_2030_40992 already exists at 0x7fad9d77dd00.
dpdk|INFO|PMD: ixgbe_dev_link_status_print(): Port 0: Link Up - speed 0 Mbps - half-duplex

# ovs-vsctl list interface dpdk0 | grep mtu
mtu : 1000
mtu_request : 1000

Does that make sense?

Thanks,
-Antonio

> 
> Kevin.
> 
> > Antonio
> >
> >> -Original Message-
> >> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org]
> >> On Behalf Of Fischetti, Antonio
> >> Sent: Wednesday, October 11, 2017 9:04 AM
> >> To: Róbert Mulik <robert.mu...@ericsson.com>; d...@openvswitch.org
> >> Subject: Re: [ovs-dev] [PATCH v3 1/5] netdev-dpdk: fix mempool management
> with
> >> vhu client.
> >>
> >> Thanks Robert for reporting this and for all the clear details you 
> >> provided.
> >> I'll look into this and get back to you.
> >>
> >> Antonio
> >>
> >>> -Original Message-
> >>> From: Róbert Mulik [mailto:robert.mu...@ericsson.com]
> >>> Sent: Tuesday, October 10, 2017 4:19 PM
> >>> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> >>> Subject: RE: [ovs-dev] [PATCH v3 1/5] netdev-dpdk: fix mempool management
> >> with
> >>> vhu client.
> >>>
> >>> Hi Antonio,
> >>>
> >>> Last week I run into this mempool issue during the development of a new
> >>> feature. I have made a bugfix, but then we saw yours too, so I tested if 
> >>> it
> >>> solves my problem. It did, but I realized another problem with it. The
> >> mempool
> >>> name generation 

Re: [ovs-dev] [PATCH v3 1/5] netdev-dpdk: fix mempool management with vhu client.

2017-10-11 Thread Fischetti, Antonio
Hi Robert,
that's happening because the requested MTU is rounded up to the current
boundary. So if the current upper boundary is 2500 and we request 2000,
then 2000 is rounded up to 2500 and the same mempool is returned.

I may be wrong, but this seems to be the intended behavior; maybe Kevin can
shed some light? I may have missed some details, as I haven't followed this
implementation from the very beginning.
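
To make the rounding concrete, here is a small standalone sketch. The
1024-byte granularity is the one reported in this thread; the per-packet
overhead is a made-up constant chosen only so that the ranges match the
reported behavior, and sketch_buf_size() is a hypothetical helper, not the
dpdk_buf_size() used in netdev-dpdk.

```c
#include <stdint.h>

/* Assumed values for illustration only: the thread mentions 1024-byte
 * ranges, while the per-packet overhead here is a made-up constant. */
#define SKETCH_ALIGN     1024u
#define SKETCH_OVERHEAD  512u

/* Rounds 'mtu' plus the assumed overhead up to the next multiple of
 * SKETCH_ALIGN.  Two MTUs that give the same result would map to the
 * same mempool. */
static uint32_t
sketch_buf_size(uint32_t mtu)
{
    uint32_t len = mtu + SKETCH_OVERHEAD;

    return (len + SKETCH_ALIGN - 1) / SKETCH_ALIGN * SKETCH_ALIGN;
}
```

Under these assumptions MTU 1000 and 1500 land in one range and 2000 and 2500
in the next, which is why a request inside the current range hands back the
pool that is already in use.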

Antonio

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Fischetti, Antonio
> Sent: Wednesday, October 11, 2017 9:04 AM
> To: Róbert Mulik <robert.mu...@ericsson.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v3 1/5] netdev-dpdk: fix mempool management with
> vhu client.
> 
> Thanks Robert for reporting this and for all the clear details you provided.
> I'll look into this and get back to you.
> 
> Antonio
> 
> > -Original Message-
> > From: Róbert Mulik [mailto:robert.mu...@ericsson.com]
> > Sent: Tuesday, October 10, 2017 4:19 PM
> > To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> > Subject: RE: [ovs-dev] [PATCH v3 1/5] netdev-dpdk: fix mempool management
> with
> > vhu client.
> >
> > Hi Antonio,
> >
> > Last week I ran into this mempool issue during the development of a new
> > feature. I had made a bugfix, but then we saw yours too, so I tested whether
> > it solves my problem. It did, but I noticed another problem with it. The
> > mempool name generation is partly based on the MTU size, which is handled in
> > 1024-byte ranges. For example, MTU 1000 and 1500 are in the same range, while
> > 2000 and 2500 are in a different range. So I tested this patch and got the
> > following.
> >
> > # ovs-vsctl list interface dpdk0 |grep mtu
> > mtu : 2500
> > mtu_request : 2500
> > # ovs-vsctl set interface dpdk0 mtu_request=2000
> > # ovs-vsctl list interface dpdk0 |grep mtu
> > mtu : 2500
> > mtu_request : 2000
> > # ovs-vsctl set interface dpdk0 mtu_request=1500
> > # ovs-vsctl list interface dpdk0 |grep mtu
> > mtu : 1500
> > mtu_request : 1500
> > # ovs-vsctl set interface dpdk0 mtu_request=1000
> > # ovs-vsctl list interface dpdk0 |grep mtu
> > mtu : 1500
> > mtu_request : 1000
> > # ovs-vsctl set interface dpdk0 mtu_request=2000
> > # ovs-vsctl list interface dpdk0 |grep mtu
> > mtu : 2000
> > mtu_request : 2000
> > # ovs-vsctl set interface dpdk0 mtu_request=1000
> > # ovs-vsctl list interface dpdk0 |grep mtu
> > mtu : 1000
> > mtu_request : 1000
> > # ovs-vsctl set interface dpdk0 mtu_request=1500
> > # ovs-vsctl list interface dpdk0 |grep mtu
> > mtu : 1000
> > mtu_request : 1500
> > # service openvswitch-switch restart
> > # ovs-vsctl list interface dpdk0 |grep mtu
> > mtu : 1500
> > mtu_request : 1500
> >
> >
> > This was my setup:
> > Bridge br-prv
> > Port bond-prv
> > Interface "dpdk0"
> > type: dpdk
> > options: {dpdk-devargs=":05:00.0", n_rxq_desc="1024",
> > n_txq_desc="1024"}
> > ovs_version: "2.8.90"
> >
> > And I used DPDK v17.08.
> >
> >
> > So, as can be seen from the example above, with the patch applied, a new
> > mtu_request that is in the same range as the previously set MTU has no
> > effect until the service is restarted. The mtu_request takes effect
> > immediately when it is in a different range from the previously set MTU. Or
> > did I miss something during the testing?
> >
> > The patch I used last week does the following. During reconfiguration the
> > mempool is always deleted before a new one is created. It solved the problem
> > without side effects, but it is not optimized (it always recreates the
> > mempool when this function is called).
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index c60f46f..de38f95 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -621,6 +621,7 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
> >  uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);
> >  struct dpdk_mp *mp;
> >
> > +dpdk_mp_put(dev->dpdk_mp);
> >  mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size));
> >  if (!mp) {
> > 

Re: [ovs-dev] [PATCH v3 1/5] netdev-dpdk: fix mempool management with vhu client.

2017-10-11 Thread Fischetti, Antonio
Thanks Robert for reporting this and for all the clear details you provided.
I'll look into this and get back to you.

Antonio

> -Original Message-
> From: Róbert Mulik [mailto:robert.mu...@ericsson.com]
> Sent: Tuesday, October 10, 2017 4:19 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v3 1/5] netdev-dpdk: fix mempool management with
> vhu client.
> 
> Hi Antonio,
> 
> Last week I ran into this mempool issue during the development of a new
> feature. I had made a bugfix, but then we saw yours too, so I tested whether it
> solves my problem. It did, but I noticed another problem with it. The mempool
> name generation is partly based on the MTU size, which is handled in
> 1024-byte ranges. For example, MTU 1000 and 1500 are in the same range, while
> 2000 and 2500 are in a different range. So I tested this patch and got the
> following.
> 
> # ovs-vsctl list interface dpdk0 |grep mtu
> mtu : 2500
> mtu_request : 2500
> # ovs-vsctl set interface dpdk0 mtu_request=2000
> # ovs-vsctl list interface dpdk0 |grep mtu
> mtu : 2500
> mtu_request : 2000
> # ovs-vsctl set interface dpdk0 mtu_request=1500
> # ovs-vsctl list interface dpdk0 |grep mtu
> mtu : 1500
> mtu_request : 1500
> # ovs-vsctl set interface dpdk0 mtu_request=1000
> # ovs-vsctl list interface dpdk0 |grep mtu
> mtu : 1500
> mtu_request : 1000
> # ovs-vsctl set interface dpdk0 mtu_request=2000
> # ovs-vsctl list interface dpdk0 |grep mtu
> mtu : 2000
> mtu_request : 2000
> # ovs-vsctl set interface dpdk0 mtu_request=1000
> # ovs-vsctl list interface dpdk0 |grep mtu
> mtu : 1000
> mtu_request : 1000
> # ovs-vsctl set interface dpdk0 mtu_request=1500
> # ovs-vsctl list interface dpdk0 |grep mtu
> mtu : 1000
> mtu_request : 1500
> # service openvswitch-switch restart
> # ovs-vsctl list interface dpdk0 |grep mtu
> mtu : 1500
> mtu_request : 1500
> 
> 
> This was my setup:
> Bridge br-prv
> Port bond-prv
> Interface "dpdk0"
> type: dpdk
> options: {dpdk-devargs=":05:00.0", n_rxq_desc="1024",
> n_txq_desc="1024"}
> ovs_version: "2.8.90"
> 
> And I used DPDK v17.08.
> 
> 
> So, as can be seen from the example above, with the patch applied, a new
> mtu_request that is in the same range as the previously set MTU has no
> effect until the service is restarted. The mtu_request takes effect immediately
> when it is in a different range from the previously set MTU. Or did I miss
> something during the testing?
> 
> The patch I used last week does the following. During reconfiguration the
> mempool is always deleted before a new one is created. It solved the problem
> without side effects, but it is not optimized (it always recreates the mempool
> when this function is called).
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index c60f46f..de38f95 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -621,6 +621,7 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
>  uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);
>  struct dpdk_mp *mp;
> 
> +dpdk_mp_put(dev->dpdk_mp);
>  mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size));
>  if (!mp) {
>  VLOG_ERR("Failed to create memory pool for netdev "
> @@ -629,7 +630,6 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
>   rte_strerror(rte_errno));
>  return rte_errno;
>  } else {
> -dpdk_mp_put(dev->dpdk_mp);
>  dev->dpdk_mp = mp;
>  dev->mtu = dev->requested_mtu;
>  dev->socket_id = dev->requested_socket_id;
> 
> 
> What do you think about this solution?
> 
> Regards,
> Robert


Re: [ovs-dev] [PATCH v2 0/4] Conntrack: add commands to r/w CT parameters.

2017-10-09 Thread Fischetti, Antonio
Thanks Kevin, I'll rework a v3.

Antonio

> -Original Message-
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Tuesday, October 3, 2017 11:11 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v2 0/4] Conntrack: add commands to r/w CT
> parameters.
> 
> On 10/03/2017 10:11 AM, Fischetti, Antonio wrote:
> > Thanks Kevin, comments inline.
> >
> > -Antonio
> >
> >> -Original Message-
> >> From: Kevin Traynor [mailto:ktray...@redhat.com]
> >> Sent: Monday, October 2, 2017 11:46 AM
> >> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> >> Subject: Re: [ovs-dev] [PATCH v2 0/4] Conntrack: add commands to r/w CT
> >> parameters.
> >>
> >> On 09/26/2017 01:35 PM, antonio.fische...@intel.com wrote:
> >>> This series adds two new commands to allow read/write of
> >>> some of the CT configuration parameters. This could be
> >>> used for maintenance purposes or to find a better tuning
> >>> of the current setup.
> >>>
> >>
> >> Hi Antonio. I don't think that helps people not too familiar with
> >> conntrack understand why the commands are needed and what cases they
> >> will help with.
> >
> > [Antonio]
> > I can rephrase it like:
> > This change comes from the consideration that when the CT is enabled
> > the overall performance can be deeply affected, even with simple
> > firewall rules and with stateless protocols like UDP.
> > This implementation adds a basic infrastructure that allows the user
> > to adjust the CT configuration parameters at run-time in order to
> > find a better tuning.
> > For example - depending on the traffic profile - the user could decrease
> > at run-time the maximum number of tracked connections, so to mitigate
> > the impact on performance.
> >
> 
> Sounds much better, thanks.
> 
> >
> >> Also, I think there should be some documentation to
> >> guide the user on when to use the new commands.
> >
> > [Antonio]
> > Sure, I'll update the dpctl.man and possibly other docs too, like some
> > new doc inside Documentation/howto/ ?
> > If you think other docs should be updated/added please let me know.
> >
> 
> You could add to the 'performance tuning' section if it's just about
> getting better performance. I don't really mind where, just that user
> has enough info to know what they are and why they would use them.
> 
> thanks,
> Kevin.
> 
> >> I'm not making comment
> >> on the usefulness or not of the commands but there's a need to explain
> >> why you are making the changes and guide the user on them.
> >>
> >> thanks,
> >> Kevin.
> >>
> >>> V2: Reworked based on comments.
> >>> V1: First implementation.
> >>>
> >>> Fischetti, Antonio (4):
> >>>   dpctl: Add a comment to functions retrieving the datapath name.
> >>>   conntrack: add commands to r/w CT parameters.
> >>>   conntrack: r/w upper limit connection value.
> >>>   conntrack: read current nr of connections.
> >>>
> >>>  lib/conntrack.c |  90 +
> >>>  lib/conntrack.h |   3 ++
> >>>  lib/ct-dpif.c   |  28 ++
> >>>  lib/ct-dpif.h   |   2 +
> >>>  lib/dpctl.c | 104 +++-
> >>>  lib/dpif-netdev.c   |  19 ++
> >>>  lib/dpif-netlink.c  |   2 +
> >>>  lib/dpif-provider.h |   4 ++
> >>>  8 files changed, 251 insertions(+), 1 deletion(-)
> >>>
> >



Re: [ovs-dev] [PATCH v3 1/5] netdev-dpdk: fix mempool management with vhu client.

2017-10-06 Thread Fischetti, Antonio
Thanks Ciara, will respin a v4. Comments inline.

Antonio

> -Original Message-
> From: Loftus, Ciara
> Sent: Friday, October 6, 2017 11:40 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v3 1/5] netdev-dpdk: fix mempool management with
> vhu client.
> 
> >
> > In a PVP test where vhostuser ports are configured as
> > clients, OvS crashes when QEMU is launched.
> > This patch avoids calling dpdk_mp_put() - and erroneously
> > releasing the mempool - when it already exists.
> 
> Thanks for investigating this issue and for the patch.
> I think the commit message could be made more generic since the freeing of the
> pre-existing mempool could potentially happen for other port types and
> topologies, not just vhostuserclient & PVP.

[Antonio] ok.


> 
> >
> > CC: Kevin Traynor <ktray...@redhat.com>
> > CC: Aaron Conole <acon...@redhat.com>
> > Reported-by: Ciara Loftus <ciara.lof...@intel.com>
> > Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each
> > port.")
> > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> > ---
> > I've tested this patch by
> >   - changing at run-time the number of Rx queues:
> >   ovs-vsctl set Interface dpdk0 type=dpdk options:n_rxq=4
> >
> >   - reducing the MTU of the dpdk ports by 1 byte to force
> > the configuration of an existing mempool:
> >   ovs-vsctl set Interface dpdk0 mtu_request=1499
> >
> > To replicate the bug scenario:
> >
> >  PVP test setup
> >  --
> > CLIENT_SOCK_DIR=/tmp
> > SOCK0=dpdkvhostuser0
> > SOCK1=dpdkvhostuser1
> >
> > 1 PMD
> > Add 2 dpdk ports, n_rxq=1
> > Add 2 vhu ports both of type dpdkvhostuserclient and specify vhost-server-path
> >  ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-path="$CLIENT_SOCK_DIR/$SOCK0"
> >  ovs-vsctl set Interface dpdkvhostuser1 options:vhost-server-path="$CLIENT_SOCK_DIR/$SOCK1"
> >
> > Set port-based rules: dpdk0 <--> vhu0 and dpdk1 <--> vhu1
> >  add-flow br0 in_port=1,action=output:3
> >  add-flow br0 in_port=3,action=output:1
> >  add-flow br0 in_port=4,action=output:2
> >  add-flow br0 in_port=2,action=output:4
> 
> Nit - the steps to reproduce the bug are over-complicated. One only needs 1
> vhostuserclient port (no dpdk ports, no flows), and just boot the VM = crash.

[Antonio] ok, will change this description.

> 
> >
> >  Launch QEMU
> >  ---
> > As OvS vhu ports are acting as clients, we must specify 'server' in the next
> > command.
> > VM_IMAGE=
> >
> >  sudo -E taskset 0x3F00 $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \
> >    -name us-vhost-vm1 -cpu host -enable-kvm -m 4096M \
> >    -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on \
> >    -numa node,memdev=mem -mem-prealloc -smp 4 -drive file=$VM_IMAGE \
> >    -chardev socket,id=char0,path=$CLIENT_SOCK_DIR/$SOCK0,server \
> >    -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
> >    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off \
> >    -chardev socket,id=char1,path=$CLIENT_SOCK_DIR/$SOCK1,server \
> >    -netdev type=vhost-user,id=mynet2,chardev=char1,vhostforce \
> >    -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off \
> >    --nographic
> >
> >  Expected behavior
> >  -
> > With this fix OvS shouldn't crash.
> > ---
> >  lib/netdev-dpdk.c | 27 ++-
> >  1 file changed, 14 insertions(+), 13 deletions(-)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index c60f46f..80a6ff3 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -508,7 +508,7 @@ dpdk_mp_name(struct dpdk_mp *dmp)
> >  }
> >
> >  static struct dpdk_mp *
> > -dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
> > +dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool *mp_exists)
> >  {
> >  struct dpdk_mp *dmp = dpdk_rte_mzalloc(sizeof *dmp);
> >  if (!dmp) {
> > @@ -530,8 +530,6 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
> >  + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) *
> > NETDEV_MAX_BURST
> >  + MIN_NB_MBUF;
> >
> > -bool mp_exists = false;
> > -
> >  do {
> >  char *mp_name = dpdk_mp_name(dmp);
> 
> Slightly unrelated to this patch but another issue with the d555d9bded5f
> &

Re: [ovs-dev] [PATCH v2 1/4] netdev-dpdk: fix mempool management with vhu client.

2017-10-05 Thread Fischetti, Antonio
I'll rework this patch and post a V3.

Thanks,
Antonio

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Fischetti, Antonio
> Sent: Tuesday, October 3, 2017 4:25 PM
> To: Kevin Traynor <ktray...@redhat.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v2 1/4] netdev-dpdk: fix mempool management with
> vhu client.
> 
> Thanks Kevin for your feedback.
> Below some details on what happens, how to replicate the issue and
> some comments inline.
> 
> -Antonio
> 
> > -Original Message-
> > From: Kevin Traynor [mailto:ktray...@redhat.com]
> > Sent: Monday, October 2, 2017 6:38 PM
> > To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> > Cc: Aaron Conole <acon...@redhat.com>
> > Subject: Re: [PATCH v2 1/4] netdev-dpdk: fix mempool management with vhu
> > client.
> >
> > On 09/28/2017 03:28 PM, antonio.fische...@intel.com wrote:
> > > From: Antonio Fischetti <antonio.fische...@intel.com>
> > >
> > > In a PVP test where vhostuser ports are configured as
> > > clients, OvS crashes when QEMU is launched.
> > > This patch avoids the repeated calls to netdev_change_seq_changed
> > > after the requested mempool is already acquired.
> > >
> >
> > Can you explain what is happening in this bug? I can't reproduce it
> 
> [Antonio]
> When QEMU is being launched, ovs crashes with the following stacktrace:
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-September/339343.html
> 
> In case the requested mempool already exists, netdev_dpdk_mempool_configure
> returns 0 => netdev_change_seq_changed is called.
> 
> The issue happens with vhostuser 'client' ports:
>  - the vhu ports must be of dpdkvhostuserclient type
>  - so the QEMU command must contain 'server' like
>qemu-system-x86_64  path=$CLIENT_SOCK_DIR/$SOCK0,server
> 
> Below other details on my setup.
> 
>   1 PMD
>   -
> ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=8
> 
>   Ports
>   -
> ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=$NIC0
> ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk options:dpdk-devargs=$NIC1
> 
> ovs-vsctl add-port br0 dpdkvhostuser0 -- set Interface dpdkvhostuser0 type=dpdkvhostuserclient
> ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-path="/tmp/dpdkvhostuser0"
> 
> ovs-vsctl add-port br0 dpdkvhostuser1 -- set Interface dpdkvhostuser1 type=dpdkvhostuserclient
> ovs-vsctl set Interface dpdkvhostuser1 options:vhost-server-path="/tmp/dpdkvhostuser1"
> 
> 
> I'm using DPDK v17.05 and QEMU v2.7.0.
> 
> 
> Other details are below right after the patch description.
> 
> 
> 
> > and
> > the mempool for vhost ports should only be reconfigured if the number of
> > queues or socket has changed.
> >
> > > CC: Kevin Traynor <ktray...@redhat.com>
> > > CC: Aaron Conole <acon...@redhat.com>
> > > Reported-by: Ciara Loftus <ciara.lof...@intel.com>
> > > Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each
> > port.")
> > > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> > > ---
> > > To replicate the bug scenario:
> > >
> > >  PVP test setup
> > >  --
> > > CLIENT_SOCK_DIR=/tmp
> > > SOCK0=dpdkvhostuser0
> > > SOCK1=dpdkvhostuser1
> > >
> > > 1 PMD
> > > Add 2 dpdk ports, n_rxq=1
> > > Add 2 vhu ports both of type dpdkvhostuserclient and specify vhost-server-path
> > >  ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-path="$CLIENT_SOCK_DIR/$SOCK0"
> > >  ovs-vsctl set Interface dpdkvhostuser1 options:vhost-server-path="$CLIENT_SOCK_DIR/$SOCK1"
> > >
> > > Set port-based rules: dpdk0 <--> vhu0 and dpdk1 <--> vhu1
> > >  add-flow br0 in_port=1,action=output:3
> > >  add-flow br0 in_port=3,action=output:1
> > >  add-flow br0 in_port=4,action=output:2
> > >  add-flow br0 in_port=2,action=output:4
> > >
> > >  Launch QEMU
> > >  ---
> > > As OvS vhu ports are acting as clients, we must specify 'server' in the
> next
> > command.
> > > VM_IMAGE=
> > >
> > >  sudo -E taskset 0x3F00 $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 -name
> us-
> > vhost-vm1 -cpu host -enable-kvm -m 4096M -object 

Re: [ovs-dev] [PATCH v2 1/4] netdev-dpdk: fix mempool management with vhu client.

2017-10-03 Thread Fischetti, Antonio
Thanks Kevin for your feedback.
Below are some details on what happens, how to replicate the issue, and
some comments inline.

-Antonio

> -Original Message-
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Monday, October 2, 2017 6:38 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Cc: Aaron Conole <acon...@redhat.com>
> Subject: Re: [PATCH v2 1/4] netdev-dpdk: fix mempool management with vhu
> client.
> 
> On 09/28/2017 03:28 PM, antonio.fische...@intel.com wrote:
> > From: Antonio Fischetti <antonio.fische...@intel.com>
> >
> > In a PVP test where vhostuser ports are configured as
> > clients, OvS crashes when QEMU is launched.
> > This patch avoids the repeated calls to netdev_change_seq_changed
> > after the requested mempool is already acquired.
> >
> 
> Can you explain what is happening in this bug? I can't reproduce it 

[Antonio]
When QEMU is being launched, ovs crashes with the following stacktrace:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-September/339343.html

In case the requested mempool already exists, netdev_dpdk_mempool_configure
returns 0 => netdev_change_seq_changed is called.

The issue happens with vhostuser 'client' ports:
 - the vhu ports must be of dpdkvhostuserclient type
 - so the QEMU command must contain 'server' like
   qemu-system-x86_64  path=$CLIENT_SOCK_DIR/$SOCK0,server

Below are other details on my setup.

  1 PMD
  -
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=8

  Ports
  -
ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=$NIC0
ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk options:dpdk-devargs=$NIC1

ovs-vsctl add-port br0 dpdkvhostuser0 -- set Interface dpdkvhostuser0 type=dpdkvhostuserclient
ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-path="/tmp/dpdkvhostuser0"

ovs-vsctl add-port br0 dpdkvhostuser1 -- set Interface dpdkvhostuser1 type=dpdkvhostuserclient
ovs-vsctl set Interface dpdkvhostuser1 options:vhost-server-path="/tmp/dpdkvhostuser1"


I'm using DPDK v17.05 and QEMU v2.7.0.


Other details are below right after the patch description.



> and
> the mempool for vhost ports should only be reconfigured if the number of
> queues or socket has changed.
> 
> > CC: Kevin Traynor <ktray...@redhat.com>
> > CC: Aaron Conole <acon...@redhat.com>
> > Reported-by: Ciara Loftus <ciara.lof...@intel.com>
> > Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each
> port.")
> > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> > ---
> > To replicate the bug scenario:
> >
> >  PVP test setup
> >  --
> > CLIENT_SOCK_DIR=/tmp
> > SOCK0=dpdkvhostuser0
> > SOCK1=dpdkvhostuser1
> >
> > 1 PMD
> > Add 2 dpdk ports, n_rxq=1
> > Add 2 vhu ports both of type dpdkvhostuserclient and specify vhost-server-path
> >  ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-path="$CLIENT_SOCK_DIR/$SOCK0"
> >  ovs-vsctl set Interface dpdkvhostuser1 options:vhost-server-path="$CLIENT_SOCK_DIR/$SOCK1"
> >
> > Set port-based rules: dpdk0 <--> vhu0 and dpdk1 <--> vhu1
> >  add-flow br0 in_port=1,action=output:3
> >  add-flow br0 in_port=3,action=output:1
> >  add-flow br0 in_port=4,action=output:2
> >  add-flow br0 in_port=2,action=output:4
> >
> >  Launch QEMU
> >  ---
> > As OvS vhu ports are acting as clients, we must specify 'server' in the next command.
> > VM_IMAGE=
> >
> >  sudo -E taskset 0x3F00 $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \
> >    -name us-vhost-vm1 -cpu host -enable-kvm -m 4096M \
> >    -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on \
> >    -numa node,memdev=mem -mem-prealloc -smp 4 -drive file=$VM_IMAGE \
> >    -chardev socket,id=char0,path=$CLIENT_SOCK_DIR/$SOCK0,server \
> >    -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
> >    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off \
> >    -chardev socket,id=char1,path=$CLIENT_SOCK_DIR/$SOCK1,server \
> >    -netdev type=vhost-user,id=mynet2,chardev=char1,vhostforce \
> >    -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off \
> >    --nographic
> >
> >  Expected behavior
> >  -
> > With this fix OvS shouldn't crash.
> > ---
> >  lib/netdev-dpdk.c | 12 
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index c60f46f..dda3771 100644
> > --- a/lib/netdev-dpdk.c
> >

Re: [ovs-dev] [PATCH v2 0/4] Conntrack: add commands to r/w CT parameters.

2017-10-03 Thread Fischetti, Antonio
Thanks Kevin, comments inline.

-Antonio

> -Original Message-
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Monday, October 2, 2017 11:46 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v2 0/4] Conntrack: add commands to r/w CT
> parameters.
> 
> On 09/26/2017 01:35 PM, antonio.fische...@intel.com wrote:
> > This series adds two new commands to allow read/write of
> > some of the CT configuration parameters. This could be
> > used for maintenance purposes or to find a better tuning
> > of the current setup.
> >
> 
> Hi Antonio. I don't think that helps people not too familiar with
> conntrack understand why the commands are needed and what cases they
> will help with. 

[Antonio]
I can rephrase it like:
This change comes from the consideration that when the CT is enabled 
the overall performance can be deeply affected, even with simple 
firewall rules and with stateless protocols like UDP. 
This implementation adds a basic infrastructure that allows the user 
to adjust the CT configuration parameters at run-time in order to 
find a better tuning.
For example - depending on the traffic profile - the user could decrease
the maximum number of tracked connections at run-time, so as to mitigate
the impact on performance.
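
As a rough illustration of what "adjust at run-time" means here, the sketch below keeps a connection counter and an upper limit as atomics, so the limit can be lowered or raised while connections are being tracked. It is only a sketch under my own assumptions: struct ct_limits and the ct_*() helpers are hypothetical names and not the interface added by this series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical container for the two values discussed in this series. */
struct ct_limits {
    atomic_uint n_conn;         /* Connections currently tracked. */
    atomic_uint n_conn_limit;   /* Upper bound, adjustable at run-time. */
};

static void
ct_limits_init(struct ct_limits *l, unsigned int limit)
{
    atomic_init(&l->n_conn, 0);
    atomic_init(&l->n_conn_limit, limit);
}

/* "Write" command: change the upper limit while the datapath is running. */
static void
ct_set_maxconns(struct ct_limits *l, unsigned int limit)
{
    atomic_store(&l->n_conn_limit, limit);
}

/* "Read" command: report the current number of tracked connections. */
static unsigned int
ct_get_nconns(struct ct_limits *l)
{
    return atomic_load(&l->n_conn);
}

/* Accounts for one new connection unless the limit has been reached. */
static bool
ct_try_track(struct ct_limits *l)
{
    unsigned int cur = atomic_load(&l->n_conn);
    unsigned int limit = atomic_load(&l->n_conn_limit);

    while (cur < limit) {
        /* On failure 'cur' is reloaded with the latest value and rechecked. */
        if (atomic_compare_exchange_weak(&l->n_conn, &cur, cur + 1)) {
            return true;
        }
    }
    return false;
}
```

Lowering the limit below the current count does not evict anything in this sketch; it only stops new connections from being tracked until the count falls.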


> Also, I think there should be some documentation to
> guide the user on when to use the new commands. 

[Antonio]
Sure, I'll update the dpctl.man and possibly other docs too, like some 
new doc inside Documentation/howto/ ?
If you think other docs should be updated/added please let me know.

> I'm not making comment
> on the usefulness or not of the commands but there's a need to explain
> why you are making the changes and guide the user on them.
> 
> thanks,
> Kevin.
> 
> > V2: Reworked based on comments.
> > V1: First implementation.
> >
> > Fischetti, Antonio (4):
> >   dpctl: Add a comment to functions retrieving the datapath name.
> >   conntrack: add commands to r/w CT parameters.
> >   conntrack: r/w upper limit connection value.
> >   conntrack: read current nr of connections.
> >
> >  lib/conntrack.c |  90 +
> >  lib/conntrack.h |   3 ++
> >  lib/ct-dpif.c   |  28 ++
> >  lib/ct-dpif.h   |   2 +
> >  lib/dpctl.c | 104 +++-
> >  lib/dpif-netdev.c   |  19 ++
> >  lib/dpif-netlink.c  |   2 +
> >  lib/dpif-provider.h |   4 ++
> >  8 files changed, 251 insertions(+), 1 deletion(-)
> >



Re: [ovs-dev] [PATCH 3/4] netdev-dpdk: log an err message when a mempool name is empty.

2017-09-28 Thread Fischetti, Antonio

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Tuesday, September 26, 2017 9:22 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 3/4] netdev-dpdk: log an err message when a
> mempool name is empty.
> 
> 
> 
> On 9/26/17, 8:06 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com> wrote:
> 
> Log an error message when the creation of a name for a
> new mempool fails.
> 
> CC: Ciara Loftus <ciara.lof...@intel.com>
> CC: Kevin Traynor <ktray...@redhat.com>
> CC: Aaron Conole <acon...@redhat.com>
> Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> ---
>  lib/netdev-dpdk.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index f3f42ee..7c673ec 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -502,6 +502,9 @@ dpdk_mp_name(struct dpdk_mp *dmp)
>  int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%u",
> h, dmp->mtu, dmp->mp_size);
>  if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
> +VLOG_ERR("Failed to generate a mempool name for \"%s\". "
> +"Hash:0x%x, mtu:%d, mbufs:%u",
> +dmp->if_name, h, dmp->mtu, dmp->mp_size);
> 
> [Darrell] “Unlikely” to fail but this could be an ovs_assert(…)

[Antonio] ok will replace with
 ovs_assert(ret >= 0 && ret < RTE_MEMPOOL_NAMESIZE);
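
For reference, the condition in that assertion is the usual snprintf() truncation check. A minimal standalone sketch follows, assuming a made-up bound NAME_SIZE in place of RTE_MEMPOOL_NAMESIZE and a hypothetical helper format_mp_name(); it reports the failure instead of asserting so that both outcomes can be exercised.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_SIZE 32   /* Assumed bound; stands in for RTE_MEMPOOL_NAMESIZE. */

/* Formats a pool name into 'buf'.  Returns 1 if the whole name fit,
 * 0 on an encoding error or if the output was truncated. */
static int
format_mp_name(char *buf, size_t size, unsigned int h, int mtu,
               unsigned int mp_size)
{
    int ret = snprintf(buf, size, "ovs_%x_%d_%u", h, mtu, mp_size);

    /* snprintf() returns the length it wanted to write, excluding the
     * terminating NUL, so 'ret >= size' means the name was cut short. */
    return ret >= 0 && (size_t) ret < size;
}
```

With an assertion in place of the return value, a name that does not fit would abort the process rather than take the 'return NULL' path.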

> 
> 
>  return NULL;
>  }
>  return mp_name;
> --
> 2.4.11
> 
> 



Re: [ovs-dev] [PATCH 2/4] netdev-dpdk: if mempool already exists don't reinit packet areas.

2017-09-28 Thread Fischetti, Antonio

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Tuesday, September 26, 2017 9:02 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 2/4] netdev-dpdk: if mempool already exists 
> don't
> reinit packet areas.
> 
> 
> 
> On 9/26/17, 8:05 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com> wrote:
> 
> Skip initialization of mempool objects if this was already
> done in a previous call to dpdk_mp_create.
> 
> CC: Ciara Loftus <ciara.lof...@intel.com>
> CC: Kevin Traynor <ktray...@redhat.com>
> CC: Aaron Conole <acon...@redhat.com>
> Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> ---
>  lib/netdev-dpdk.c | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 2f5ec71..f3f42ee 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -566,12 +566,15 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
>  }
>  free(mp_name);
>  if (dmp->mp) {
> -/* rte_pktmbuf_pool_create has done some initialization of the
> - * rte_mbuf part of each dp_packet, while ovs_rte_pktmbuf_init
> - * initializes some OVS specific fields of dp_packet.
> - */
> -rte_mempool_obj_iter(dmp->mp, ovs_rte_pktmbuf_init, NULL);
> -
> +/* If the current mp was already created by a previous call
> + * we don't need to init again all its elements. */
> +if (!mp_exists) {
> +/* rte_pktmbuf_pool_create has done some initialization of the
> + * rte_mbuf part of each dp_packet, while ovs_rte_pktmbuf_init
> + * initializes some OVS specific fields of dp_packet.
> + */
> +rte_mempool_obj_iter(dmp->mp, ovs_rte_pktmbuf_init, NULL);
> 
> 
> [Darrell] Can this be moved inside
> if (dmp->mp) {
> VLOG_DBG("Allocated \"%s\" mempool with %u mbufs", mp_name,
>  dmp->mp_size);
> }…..
> 
> for clarity?

[Antonio] Thanks, that makes the code more readable too.
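
The idea of the patch, initialize the elements only when the pool was freshly created, can be sketched on its own as below. This is a simplified model and not the dpdk_mp_create() code: the static registry, struct pool, pool_create_or_lookup() and pool_get_initialized() are all hypothetical, and a per-object counter stands in for the work done by the init callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_POOLS 4
#define POOL_OBJS 8

/* Hypothetical named pool; init_count[] records how often each object
 * has been run through the init step. */
struct pool {
    char name[32];
    bool in_use;
    int init_count[POOL_OBJS];
};

static struct pool registry[MAX_POOLS];

/* Returns the pool called 'name', creating it if needed.  '*exists' tells
 * the caller whether the pool was already there before this call. */
static struct pool *
pool_create_or_lookup(const char *name, bool *exists)
{
    struct pool *free_slot = NULL;

    *exists = false;
    for (int i = 0; i < MAX_POOLS; i++) {
        if (registry[i].in_use && !strcmp(registry[i].name, name)) {
            *exists = true;
            return &registry[i];
        }
        if (!registry[i].in_use && !free_slot) {
            free_slot = &registry[i];
        }
    }
    if (free_slot) {
        memset(free_slot, 0, sizeof *free_slot);
        strncpy(free_slot->name, name, sizeof free_slot->name - 1);
        free_slot->in_use = true;
    }
    return free_slot;   /* NULL when the registry is full. */
}

/* Runs the per-object init only for a pool created by this very call. */
static struct pool *
pool_get_initialized(const char *name)
{
    bool exists;
    struct pool *p = pool_create_or_lookup(name, &exists);

    if (p && !exists) {
        for (int i = 0; i < POOL_OBJS; i++) {
            p->init_count[i]++;
        }
    }
    return p;
}
```

Asking twice for the same name returns the same pool with every object initialized exactly once, which is the behaviour the '!mp_exists' check is meant to guarantee.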


> 
> 
> +}
>  return dmp;
>  }
>  } while (!mp_exists &&
> --
> 2.4.11
> 
> 



Re: [ovs-dev] [PATCH 1/4] netdev-dpdk: fix mempool management with vhu client.

2017-09-28 Thread Fischetti, Antonio
Thanks Darrell, replies inline.

-Antonio

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Tuesday, September 26, 2017 9:04 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 1/4] netdev-dpdk: fix mempool management with 
> vhu
> client.
> 
> 
> 
> On 9/26/17, 12:58 PM, "Darrell Ball" <db...@vmware.com> wrote:
> 
> 
> 
> On 9/26/17, 8:04 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com> wrote:
> 
> In a PVP test where vhostuser ports are configured as
> clients, OvS crashes when QEMU is launched.
> This patch avoids the repeated calls to netdev_change_seq_changed
> after the requested mempool is already acquired.
> 
> CC: Ciara Loftus <ciara.lof...@intel.com>
> CC: Kevin Traynor <ktray...@redhat.com>
> CC: Aaron Conole <acon...@redhat.com>
> Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for 
> each
> port.")
> Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> ---
> To replicate the bug scenario:
> 
> [Darrell] Just curious, but is this reproducible with the steps below? What
> about using libvirt?
>    Do we have the stacktrace?

[Antonio] 
Actually I didn't try with libvirt. I also saw that it didn't crash when the
vhostuser ports were set with type=dpdkvhostuser.
 
Below is the stacktrace on a crash, ie when setting type=dpdkvhostuserclient:

Thread 26 "pmd124" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f73ca7fc700 (LWP 20176)]
0x00410fa9 in stack_dequeue ()
(gdb) bt
#0  0x00410fa9 in stack_dequeue ()
#1  0x005cdc17 in rte_mempool_ops_dequeue_bulk (mp=0x7f72fb83c940, obj_table=0x7f73ca7fb258, n=1) at /home/user/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:474
#2  0x005cdff9 in __mempool_generic_get (cache=0x0, n=1, obj_table=0x7f73ca7fb258, mp=0x7f72fb83c940) at /home/user/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1218
#3  rte_mempool_generic_get (flags=0, cache=0x0, n=1, obj_table=0x7f73ca7fb258, mp=0x7f72fb83c940) at /home/user/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1256
#4  rte_mempool_get_bulk (n=1, obj_table=0x7f73ca7fb258, mp=0x7f72fb83c940) at /home/user/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1289
#5  rte_mempool_get (obj_p=0x7f73ca7fb258, mp=0x7f72fb83c940) at /home/user/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1315
#6  rte_mbuf_raw_alloc (mp=0x7f72fb83c940) at /home/user/dpdk/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:822
#7  0x005ce14b in rte_pktmbuf_alloc (mp=0x7f72fb83c940) at /home/user/dpdk/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:1122
#8  0x005d283a in rte_vhost_dequeue_burst (vid=0, queue_id=1, mbuf_pool=0x7f72fb83c940, pkts=0x7f73ca7fb830, count=1) at /home/user/dpdk/lib/librte_vhost/virtio_net.c:1116
#9  0x007b4025 in netdev_dpdk_vhost_rxq_recv (rxq=0x7f72ffdab080, batch=0x7f73ca7fb820) at lib/netdev-dpdk.c:1650
#10 0x0070d331 in netdev_rxq_recv ()
#11 0x006ea8ce in dp_netdev_process_rxq_port ()
#12 0x006eaba0 in pmd_thread_main ()
#13 0x0075cb34 in ovsthread_wrapper ()
#14 0x7f742a0e65ca in start_thread () from /lib64/libpthread.so.0
#15 0x7f742990f0cd in clone () from /lib64/libc.so.6

> 
> 
> 
>  PVP test setup
>  --
> CLIENT_SOCK_DIR=/tmp
> SOCK0=dpdkvhostuser0
> SOCK1=dpdkvhostuser1
> 
> 1 PMD
> Add 2 dpdk ports, n_rxq=1
> Add 2 vhu ports both of type dpdkvhostuserclient and specify vhost-server-path
>  ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-path="$CLIENT_SOCK_DIR/$SOCK0"
>  ovs-vsctl set Interface dpdkvhostuser1 options:vhost-server-path="$CLIENT_SOCK_DIR/$SOCK1"
> 
> Set port-based rules: dpdk0 <--> vhu0 and dpdk1 <--> vhu1
>  add-flow br0 in_port=1,action=output:3
>  add-flow br0 in_port=3,action=output:1
>  add-flow br0 in_port=4,action=output:2
>  add-flow br0 in_port=2,action=output:4
> 
>  Launch QEMU
>  ---
> As OvS vhu ports are acting as clients, we must specify 'server' in the
> next command.
> VM_IMAGE=
> 
>  sudo -E taskset 0x3F00 $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 -
> name us-vhost-vm1 -cpu host -enable-kvm -m 4096M -object memory-backend-
> file,id=mem,size=4096M,mem-path=/de

Re: [ovs-dev] [PATCH v2] dpctl: manage ret value when dumping CT entries.

2017-09-26 Thread Fischetti, Antonio
Actually I forgot to add in the commit message

Reviewed-by: Greg Rose 

Can this please be added later?

Thanks, Antonio

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of antonio.fische...@intel.com
> Sent: Tuesday, September 26, 2017 10:37 AM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH v2] dpctl: manage ret value when dumping CT entries.
> 
> Manage error value returned by ct_dpif_dump_next.
> 
> Signed-off-by: Antonio Fischetti 
> ---
>  lib/dpctl.c | 27 ---
>  1 file changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/dpctl.c b/lib/dpctl.c
> index 8951d6e..d229c97 100644
> --- a/lib/dpctl.c
> +++ b/lib/dpctl.c
> @@ -1286,7 +1286,7 @@ dpctl_dump_conntrack(int argc, const char *argv[],
>  return error;
>  }
> 
> -while (!ct_dpif_dump_next(dump, &cte)) {
> +while (!(error = ct_dpif_dump_next(dump, &cte))) {
>  struct ds s = DS_EMPTY_INITIALIZER;
> 
>  ct_dpif_format_entry(&cte, &s, dpctl_p->verbosity,
> @@ -1296,6 +1296,13 @@ dpctl_dump_conntrack(int argc, const char *argv[],
>  dpctl_print(dpctl_p, "%s\n", ds_cstr(&s));
>  ds_destroy(&s);
>  }
> +if (error == EOF) {
> +/* All CT entries were dumped with no issue. */
> +error = 0;
> +} else if (error) {
> +dpctl_error(dpctl_p, error, "dumping conntrack entry");
> +}
> +
>  ct_dpif_dump_done(dump);
>  dpif_close(dpif);
>  return error;
> @@ -1384,7 +1391,7 @@ dpctl_ct_stats_show(int argc, const char *argv[],
>  }
> 
>  int tot_conn = 0;
> -while (!ct_dpif_dump_next(dump, &cte)) {
> +while (!(error = ct_dpif_dump_next(dump, &cte))) {
>  ct_dpif_entry_uninit(&cte);
>  tot_conn++;
>  switch (cte.tuple_orig.ip_proto) {
> @@ -1425,6 +1432,13 @@ dpctl_ct_stats_show(int argc, const char *argv[],
>  break;
>  }
>  }
> +if (error == EOF) {
> +/* All CT entries were dumped with no issue.  */
> +error = 0;
> +} else if (error) {
> +dpctl_error(dpctl_p, error, "dumping conntrack entry");
> +/* Fall through to show any other info we collected. */
> +}
> 
>  dpctl_print(dpctl_p, "Connections Stats:\nTotal: %d\n", tot_conn);
>  if (proto_stats[CT_STATS_TCP]) {
> @@ -1521,7 +1535,7 @@ dpctl_ct_bkts(int argc, const char *argv[],
>  int tot_conn = 0;
>  uint32_t *conn_per_bkts = xzalloc(tot_bkts * sizeof(uint32_t));
> 
> -while (!ct_dpif_dump_next(dump, &cte)) {
> +while (!(error = ct_dpif_dump_next(dump, &cte))) {
>  ct_dpif_entry_uninit(&cte);
>  tot_conn++;
>  if (tot_bkts > 0) {
> @@ -1533,6 +1547,13 @@ dpctl_ct_bkts(int argc, const char *argv[],
>  }
>  }
>  }
> +if (error == EOF) {
> +/* All CT entries were dumped with no issue.  */
> +error = 0;
> +} else if (error) {
> +dpctl_error(dpctl_p, error, "dumping conntrack entry");
> +/* Fall through and display all the collected info.  */
> +}
> 
>  dpctl_print(dpctl_p, "Current Connections: %d\n", tot_conn);
>  dpctl_print(dpctl_p, "\n");
> --
> 2.4.11
> 


Re: [ovs-dev] [PATCH 1/2] dpctl: manage ret value when dumping CT entries.

2017-09-26 Thread Fischetti, Antonio
Thanks Darrell, comments inline. I'll post a v2 based on your suggestions.

-Antonio

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Monday, September 25, 2017 8:31 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 1/2] dpctl: manage ret value when dumping CT
> entries.
> 
> 
> 
> On 9/25/17, 2:27 AM, "Fischetti, Antonio" <antonio.fische...@intel.com> wrote:
> 
> Hi Darrell,
> I agree with your suggestion in keeping 'error'
> as the only variable to manage return values.
> 
> In this case - as I'm assuming we shouldn't return an EOF to the
> caller - I should manage error as below?
> 
> if (error == EOF) {
> error = 0;  << EOF is not an issue, so return 0 to the caller
> } else if (error) {
> dpctl_error(dpctl_p, error, "dumping conntrack entry");
> ct_dpif_dump_done(dump);
> dpif_close(dpif);
> return error;
> }
> 
> 
> [Darrell] For sure - EOF should not be returned to the user since it is not
> an error.
>    The other point I wanted to make is:
>    I think you can trivially fall through for dpctl_dump_conntrack()
>    after doing just dpctl_error(dpctl_p, error, "dumping conntrack entry");
>    (comments inline).
>    And in the other 2 cases, I think you can still try to print out
>    whatever is known after breaking out of the while loop and use the
>    existing cleanup code.
>    You would print the error in those cases as well.
>    What do you think?

[Antonio] Good idea, thanks. That way the CT protocol stats, or anything
else collected so far, can still be displayed.
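
The fall-through handling discussed above can be sketched in isolation.
This is only a minimal illustration under stated assumptions: fake_dump,
fake_dump_next() and count_entries() are hypothetical stand-ins, not OVS
functions, and the return convention (0 while entries remain, EOF at the
end of the dump, a positive errno value on failure) is assumed from this
thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a conntrack dump (not an OVS type). */
struct fake_dump {
    int remaining;   /* Entries left before the dump ends. */
    int fail_with;   /* 0 for a clean end, else the errno to report. */
};

/* Hypothetical stand-in for ct_dpif_dump_next(): returns 0 while entries
 * remain, then EOF on a clean end or a positive errno value on failure. */
static int
fake_dump_next(struct fake_dump *dump)
{
    assert(dump != NULL);
    if (dump->remaining > 0) {
        dump->remaining--;
        return 0;
    }
    return dump->fail_with ? dump->fail_with : EOF;
}

/* Counts the entries, maps EOF to success, reports a real error and
 * still falls through, so the caller keeps the partial total. */
static int
count_entries(struct fake_dump *dump, int *tot_conn)
{
    int error;

    *tot_conn = 0;
    while (!(error = fake_dump_next(dump))) {
        (*tot_conn)++;
    }
    if (error == EOF) {
        /* All entries were dumped with no issue. */
        error = 0;
    } else if (error) {
        fprintf(stderr, "dumping conntrack entry: error %d\n", error);
        /* Fall through: the info collected so far is still usable. */
    }
    /* Shared cleanup (dump done, dpif close) would run here. */
    return error;
}
```

With a single 'error' variable and a single exit path, the cleanup code
is written once and the caller never sees EOF.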


> 
> 
> 
> 
> > -Original Message-
> > From: Darrell Ball [mailto:db...@vmware.com]
> > Sent: Friday, September 22, 2017 8:42 AM
> > To: Fischetti, Antonio <antonio.fische...@intel.com>; 
> d...@openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH 1/2] dpctl: manage ret value when dumping
> CT
> > entries.
> >
> > Few comments Antonio
> >
> > Darrell
> >
> > On 9/13/17, 5:37 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> > antonio.fische...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf
> of
> > antonio.fische...@intel.com> wrote:
> >
> > Manage error value returned by ct_dpif_dump_next.
> >
> > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> > ---
> >  lib/dpctl.c | 28 +---
> >  1 file changed, 25 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/dpctl.c b/lib/dpctl.c
> > index 8951d6e..86d0f90 100644
> > --- a/lib/dpctl.c
> > +++ b/lib/dpctl.c
> > @@ -1263,6 +1263,7 @@ dpctl_dump_conntrack(int argc, const char
> *argv[],
> >  struct dpif *dpif;
> >  char *name;
> >  int error;
> > +int ret;
> >
> > >  if (argc > 1 && ovs_scan(argv[argc - 1], "zone=%"SCNu16, &zone)) {
> > >  pzone = &zone;
> > @@ -1286,7 +1287,7 @@ dpctl_dump_conntrack(int argc, const char
> *argv[],
> >  return error;
> >  }
> >
> > > -while (!ct_dpif_dump_next(dump, &cte)) {
> > > +while (!(ret = ct_dpif_dump_next(dump, &cte))) {
> > >  struct ds s = DS_EMPTY_INITIALIZER;
> > >
> > >  ct_dpif_format_entry(&cte, &s, dpctl_p->verbosity,
> > > @@ -1296,6 +1297,13 @@ dpctl_dump_conntrack(int argc, const char *argv[],
> > >  dpctl_print(dpctl_p, "%s\n", ds_cstr(&s));
> > >  ds_destroy(&s);
> >  }
> > +if (ret && ret != EOF) {
> > +dpctl_error(dpctl_p, ret, "dumping conntrack entry");
> > +ct_dpif_dump_done(dump);
> > +dpif_close(dpif);
> > +return ret;
> > +}
> > +
> >
> > [Darrell] Maybe we can reuse ‘error’ ?
> > if (error && error != EOF) {
> >and just do
> >dpctl_error(dpctl_p, error, "dumping conntrack entry");
> >  and then fall 

Re: [ovs-dev] [PATCH 2/2] dpctl: init CT entry variable.

2017-09-26 Thread Fischetti, Antonio


> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Monday, September 25, 2017 8:35 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 2/2] dpctl: init CT entry variable.
> 
> 
> 
> On 9/25/17, 2:51 AM, "Fischetti, Antonio" <antonio.fische...@intel.com> wrote:
> 
> > -Original Message-
> > From: Darrell Ball [mailto:db...@vmware.com]
>     > Sent: Friday, September 22, 2017 9:26 AM
> > To: Fischetti, Antonio <antonio.fische...@intel.com>; 
> d...@openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH 2/2] dpctl: init CT entry variable.
> >
> >
> >
> > On 9/13/17, 5:37 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> > antonio.fische...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf
> of
> > antonio.fische...@intel.com> wrote:
> >
> > ct_dpif_entry_uninit could potentially be called even if
> > ct_dpif_dump_next failed. As ct_dpif_entry_uninit receives
> > a pointer to a CT entry - and just checks it is not null -
> > it's safer to init to zero any instantiated ct_dpif_entry
> > variable before its usage.
> >
> > [Darrell] I took a look and did not see a particular problem.
> >Was there an issue that we are trying to address?; if so,
> this
> > may hide it ?
> 
> [Antonio]
> This change is more a matter of keeping safe habits for future
> code additions.
> In a new CT function that could be added down the line, one could
> potentially call ct_dpif_entry_uninit without checking what was
> returned by ct_dpif_dump_next.
> 
> As this is not in the hotpath, I added a memset to be extra-careful
> when initializing the local CT entry variable.
> 
> Maybe also a comment on top of the fn definition could help on this,
> something like?
> 
> +/* This function must be called when the returned
> +   value from ct_dpif_dump_next is 0. */
> void
> ct_dpif_entry_uninit(struct ct_dpif_entry *entry)
> {
> if (entry) {
> if (entry->helper.name) {
> 
> 
> [Darrell] It is usually better to wait for the new code that might need this
> and
>associate those patches as part of the same series.
>Can we do that ?

[Antonio] Yes, makes sense. I'll keep this one in my backlog.


> 
> 
> 
> >
> >
> > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> > ---
> >  lib/dpctl.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/lib/dpctl.c b/lib/dpctl.c
> > index 86d0f90..77d4e58 100644
> > --- a/lib/dpctl.c
> > +++ b/lib/dpctl.c
> > @@ -1287,6 +1287,7 @@ dpctl_dump_conntrack(int argc, const char
> *argv[],
> >  return error;
> >  }
> >
> > > +memset(&cte, 0, sizeof(cte));
> > >  while (!(ret = ct_dpif_dump_next(dump, &cte))) {
> >  struct ds s = DS_EMPTY_INITIALIZER;
> >
> > @@ -1392,6 +1393,7 @@ dpctl_ct_stats_show(int argc, const char
> *argv[],
> >  return error;
> >  }
> >
> > > +memset(&cte, 0, sizeof(cte));
> > >  int tot_conn = 0;
> > >  while (!(ret = ct_dpif_dump_next(dump, &cte))) {
> > >  ct_dpif_entry_uninit(&cte);
> > @@ -1532,6 +1534,7 @@ dpctl_ct_bkts(int argc, const char *argv[],
> >   return 0;
> >  }
> >
> > > +memset(&cte, 0, sizeof(cte));
> >  dpctl_print(dpctl_p, "Total Buckets: %d\n", tot_bkts);
> >
> >  int tot_conn = 0;
> > --
> > 2.4.11
> >
> 
> 



Re: [ovs-dev] [PATCH 2/2] dpctl: init CT entry variable.

2017-09-25 Thread Fischetti, Antonio
> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Friday, September 22, 2017 9:26 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 2/2] dpctl: init CT entry variable.
> 
> 
> 
> On 9/13/17, 5:37 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com> wrote:
> 
> ct_dpif_entry_uninit could potentially be called even if
> ct_dpif_dump_next failed. As ct_dpif_entry_uninit receives
> a pointer to a CT entry - and just checks it is not null -
> it's safer to init to zero any instantiated ct_dpif_entry
> variable before its usage.
> 
> [Darrell] I took a look and did not see a particular problem.
>Was there an issue that we are trying to address?; if so, this
> may hide it ?

[Antonio]
This change is more a matter of keeping safe habits for future
code additions. 
In a new CT function that could be added down the line, one could
potentially call ct_dpif_entry_uninit without checking what was
returned by ct_dpif_dump_next.

As this is not in the hotpath, I added a memset to be extra-careful 
when initializing the local CT entry variable.
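
To illustrate the concern, here is a minimal sketch of why zeroing the
entry first makes an unconditional uninit safe. Everything in it is
hypothetical: fake_entry, fake_dump_next() and fake_entry_uninit() only
mimic the shape of the real CT entry code, under the assumption that
uninit frees pointers stored inside the entry.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry with one heap-allocated member. */
struct fake_entry {
    char *helper_name;
};

/* Hypothetical dump step: fills 'entry' on success, leaves it untouched
 * on failure (so its contents are whatever the caller put there). */
static int
fake_dump_next(struct fake_entry *entry, int should_fail)
{
    assert(entry != NULL);
    if (should_fail) {
        return EIO;
    }
    entry->helper_name = malloc(4);
    if (!entry->helper_name) {
        return ENOMEM;
    }
    strcpy(entry->helper_name, "ftp");
    return 0;
}

/* Hypothetical uninit: only checks that 'entry' is non-null, then frees
 * the member.  Safe on a zeroed entry because free(NULL) is a no-op. */
static void
fake_entry_uninit(struct fake_entry *entry)
{
    if (entry) {
        free(entry->helper_name);
        entry->helper_name = NULL;
    }
}
```

Without the memset, a failed dump step followed by uninit would free an
indeterminate pointer; with it, the same sequence is harmless.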

A comment above the function definition might also help here,
something like the following?

+/* This function must be called when the returned
+   value from ct_dpif_dump_next is 0. */
void
ct_dpif_entry_uninit(struct ct_dpif_entry *entry)
{
if (entry) {
if (entry->helper.name) {



> 
> 
> Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> ---
>  lib/dpctl.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/lib/dpctl.c b/lib/dpctl.c
> index 86d0f90..77d4e58 100644
> --- a/lib/dpctl.c
> +++ b/lib/dpctl.c
> @@ -1287,6 +1287,7 @@ dpctl_dump_conntrack(int argc, const char *argv[],
>  return error;
>  }
> 
> +memset(&cte, 0, sizeof(cte));
>  while (!(ret = ct_dpif_dump_next(dump, &cte))) {
>  struct ds s = DS_EMPTY_INITIALIZER;
> 
> @@ -1392,6 +1393,7 @@ dpctl_ct_stats_show(int argc, const char *argv[],
>  return error;
>  }
> 
> +memset(&cte, 0, sizeof(cte));
>  int tot_conn = 0;
>  while (!(ret = ct_dpif_dump_next(dump, &cte))) {
>  ct_dpif_entry_uninit(&cte);
> @@ -1532,6 +1534,7 @@ dpctl_ct_bkts(int argc, const char *argv[],
>   return 0;
>  }
> 
> +memset(&cte, 0, sizeof(cte));
>  dpctl_print(dpctl_p, "Total Buckets: %d\n", tot_bkts);
> 
>  int tot_conn = 0;
> --
> 2.4.11
> 
> 



Re: [ovs-dev] [PATCH 1/2] dpctl: manage ret value when dumping CT entries.

2017-09-25 Thread Fischetti, Antonio
Hi Darrell,
I agree with your suggestion to keep 'error' as the only variable
for managing return values.

In that case, assuming we shouldn't return EOF to the caller,
should I handle error as below?

    if (error == EOF) {
        error = 0;  /* EOF is not an issue, so return 0 to the caller. */
    } else if (error) {
        dpctl_error(dpctl_p, error, "dumping conntrack entry");
        ct_dpif_dump_done(dump);
        dpif_close(dpif);
        return error;
    }


> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Friday, September 22, 2017 8:42 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 1/2] dpctl: manage ret value when dumping CT
> entries.
> 
> Few comments Antonio
> 
> Darrell
> 
> On 9/13/17, 5:37 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com> wrote:
> 
> Manage error value returned by ct_dpif_dump_next.
> 
> Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> ---
>  lib/dpctl.c | 28 +---
>  1 file changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/dpctl.c b/lib/dpctl.c
> index 8951d6e..86d0f90 100644
> --- a/lib/dpctl.c
> +++ b/lib/dpctl.c
> @@ -1263,6 +1263,7 @@ dpctl_dump_conntrack(int argc, const char *argv[],
>  struct dpif *dpif;
>  char *name;
>  int error;
> +int ret;
> 
>  if (argc > 1 && ovs_scan(argv[argc - 1], "zone=%"SCNu16, &zone)) {
>  pzone = &zone;
> @@ -1286,7 +1287,7 @@ dpctl_dump_conntrack(int argc, const char *argv[],
>  return error;
>  }
> 
> -while (!ct_dpif_dump_next(dump, &cte)) {
> +while (!(ret = ct_dpif_dump_next(dump, &cte))) {
>  struct ds s = DS_EMPTY_INITIALIZER;
> 
>  ct_dpif_format_entry(&cte, &s, dpctl_p->verbosity,
> @@ -1296,6 +1297,13 @@ dpctl_dump_conntrack(int argc, const char *argv[],
>  dpctl_print(dpctl_p, "%s\n", ds_cstr(&s));
>  ds_destroy(&s);
>  }
> +if (ret && ret != EOF) {
> +dpctl_error(dpctl_p, ret, "dumping conntrack entry");
> +ct_dpif_dump_done(dump);
> +dpif_close(dpif);
> +return ret;
> +}
> +
> 
> [Darrell] Maybe we can reuse ‘error’ ?
> if (error && error != EOF) {
>and just do
>dpctl_error(dpctl_p, error, "dumping conntrack entry");
>  and then fall thru for cleanup ?
> 
> 
>  ct_dpif_dump_done(dump);
>  dpif_close(dpif);
>  return error;
> @@ -1348,6 +1356,7 @@ dpctl_ct_stats_show(int argc, const char *argv[],
>  int proto_stats[CT_STATS_MAX];
>  int tcp_conn_per_states[CT_DPIF_TCPS_MAX_NUM];
>  int error;
> +int ret;
> 
>  while (argc > 1 && lastargc != argc) {
>  lastargc = argc;
> @@ -1384,7 +1393,7 @@ dpctl_ct_stats_show(int argc, const char *argv[],
>  }
> 
>  int tot_conn = 0;
> -while (!ct_dpif_dump_next(dump, &cte)) {
> +while (!(ret = ct_dpif_dump_next(dump, &cte))) {
>  ct_dpif_entry_uninit(&cte);
>  tot_conn++;
>  switch (cte.tuple_orig.ip_proto) {
> @@ -1425,6 +1434,12 @@ dpctl_ct_stats_show(int argc, const char *argv[],
>  break;
>  }
>  }
> +if (ret && ret != EOF) {
> +dpctl_error(dpctl_p, ret, "dumping conntrack entry");
> +ct_dpif_dump_done(dump);
> +dpif_close(dpif);
> +return ret;
> +}
> 
> [Darrell]
>  Can we reuse ‘error’, just print error and fall thru. ?
> It looks like it is safe to print whatever we know, which could be useful.
> Otherwise, if we have an error in dump_next, we may never be able to see any
> useful info.
> for debugging the same error or something else.
> 
> 
>  dpctl_print(dpctl_p, "Connections Stats:\nTotal: %d\n", 
> tot_conn);
>  if (proto_stats[CT_STATS_TCP]) {
> @@ -1482,6 +1497,7 @@ dpctl_ct_bkts(int argc, const char *argv[],
>  uint16_t *pzone = NULL;
>  int tot_bkts = 0;
>  int error;
> +int ret;
> 
>  if (argc > 1 && !strncmp(argv[argc - 1], CT_BKTS_GT,
> strlen(CT_BKTS_GT))) {
>  if (o

Re: [ovs-dev] [patch v2 4/5] conntrack: Add function print_conn_info().

2017-09-21 Thread Fischetti, Antonio
This looks like a good improvement for reporting an intentional exploit
or other issues.
LGTM, just one comment. For now the new print function is called only
from create_un_nat_conn, but in the future it could be called from any
other CT function. As the call could affect performance, especially
the memcpy calls when dl_type != ETH_TYPE_IP, I was wondering whether
print_conn_info could be limited to run only above a certain log level.
Could something like the following help?
 
 void
 print_conn_info(struct conn *c, char *log_msg)
 {
+    if (!VLOG_IS_DBG_ENABLED()) {
+        return;
+    }

     VLOG_INFO("%s", log_msg);
     if (c->key.dl_type == htons(ETH_TYPE_IP)) {
         VLOG_INFO("src addr "IP_FMT " dst addr "IP_FMT
..
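
The idea behind the guard can be shown with a small standalone sketch.
It does not use the OVS vlog API: the sk_level enum, current_level,
format_conn() and print_conn_sketch() are all hypothetical names, and
the check on current_level only plays the role that
VLOG_IS_DBG_ENABLED() plays in the snippet above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical log levels; a higher value means more verbose. */
enum sk_level { SK_OFF, SK_INFO, SK_DBG };

static enum sk_level current_level = SK_INFO;
static int format_calls;   /* Counts how often the costly path runs. */

/* Stands in for the expensive formatting work (the memcpy calls and
 * address conversions in print_conn_info). */
static int
format_conn(char *buf, size_t size, unsigned src_port, unsigned dst_port)
{
    format_calls++;
    return snprintf(buf, size, "src/dst ports %u/%u", src_port, dst_port);
}

/* Returns 1 if a message was logged, 0 if the call was skipped. */
static int
print_conn_sketch(const char *log_msg, unsigned src_port, unsigned dst_port)
{
    char buf[64];

    assert(log_msg != NULL);
    if (current_level < SK_DBG) {
        return 0;   /* Early exit: no formatting cost at all. */
    }
    format_conn(buf, sizeof buf, src_port, dst_port);
    fprintf(stderr, "%s: %s\n", log_msg, buf);
    return 1;
}
```

The early return means callers on a busy path pay only for one
comparison when debug logging is off. One open point in the proposal as
written is that the guard tests the debug level while the body logs at
INFO level, so the two levels may need to be aligned.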


Acked-by: Antonio Fischetti 


> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Darrell Ball
> Sent: Thursday, September 21, 2017 8:12 AM
> To: dlu...@gmail.com; d...@openvswitch.org
> Subject: [ovs-dev] [patch v2 4/5] conntrack: Add function print_conn_info().
> 
> A new debug function is added and used in a
> subsequent patch.
> 
> Signed-off-by: Darrell Ball 
> ---
>  lib/conntrack.c | 53 +
>  1 file changed, 53 insertions(+)
> 
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 2eca38d..8deeec9 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -67,6 +67,8 @@ enum ct_alg_mode {
>  CT_TFTP_MODE,
>  };
> 
> +void print_conn_info(struct conn *c, char *log_msg);
> +
>  static bool conn_key_extract(struct conntrack *, struct dp_packet *,
>   ovs_be16 dl_type, struct conn_lookup_ctx *,
>   uint16_t zone);
> @@ -223,6 +225,57 @@ conn_key_cmp(const struct conn_key *key1, const struct
> conn_key *key2)
>  return 1;
>  }
> 
> +void
> +print_conn_info(struct conn *c, char *log_msg)
> +{
> +VLOG_INFO("%s", log_msg);
> +if (c->key.dl_type == htons(ETH_TYPE_IP)) {
> +VLOG_INFO("src addr "IP_FMT " dst addr "IP_FMT
> +  " rev src addr "IP_FMT " rev dst addr "IP_FMT,
> +  IP_ARGS(c->key.src.addr.ipv4_aligned),
> +  IP_ARGS(c->key.dst.addr.ipv4_aligned),
> +  IP_ARGS(c->rev_key.src.addr.ipv4_aligned),
> +  IP_ARGS(c->rev_key.dst.addr.ipv4_aligned));
> +} else {
> +ovs_be32 a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13;
> +ovs_be32 a14, a15;
> +memcpy(&a0, &c->key.src.addr.ipv6_aligned.s6_addr[0], sizeof a0);
> +memcpy(&a1, &c->key.src.addr.ipv6_aligned.s6_addr[4], sizeof a1);
> +memcpy(&a2, &c->key.src.addr.ipv6_aligned.s6_addr[8], sizeof a2);
> +memcpy(&a3, &c->key.src.addr.ipv6_aligned.s6_addr[12], sizeof a3);
> +memcpy(&a4, &c->key.dst.addr.ipv6_aligned.s6_addr[0], sizeof a4);
> +memcpy(&a5, &c->key.dst.addr.ipv6_aligned.s6_addr[4], sizeof a5);
> +memcpy(&a6, &c->key.dst.addr.ipv6_aligned.s6_addr[8], sizeof a6);
> +memcpy(&a7, &c->key.dst.addr.ipv6_aligned.s6_addr[12], sizeof a7);
> +memcpy(&a8, &c->rev_key.src.addr.ipv6_aligned.s6_addr[0],
> +   sizeof a8);
> +memcpy(&a9, &c->rev_key.src.addr.ipv6_aligned.s6_addr[4],
> +   sizeof a9);
> +memcpy(&a10, &c->rev_key.src.addr.ipv6_aligned.s6_addr[8],
> +   sizeof a10);
> +memcpy(&a11, &c->rev_key.src.addr.ipv6_aligned.s6_addr[12],
> +   sizeof a11);
> +memcpy(&a12, &c->rev_key.dst.addr.ipv6_aligned.s6_addr[0],
> +   sizeof a12);
> +memcpy(&a13, &c->rev_key.dst.addr.ipv6_aligned.s6_addr[4],
> +   sizeof a13);
> +memcpy(&a14, &c->rev_key.dst.addr.ipv6_aligned.s6_addr[8],
> +   sizeof a14);
> +memcpy(&a15, &c->rev_key.dst.addr.ipv6_aligned.s6_addr[12],
> +   sizeof a15);
> +
> +VLOG_INFO("src addr 0x%08x:%08x:%08x:%08x; "
> +  "dst addr 0x%08x:%08x:%08x:%08x; "
> +  "rev src addr 0x%08x:%08x:%08x:%08x; "
> +  "rev dst addr 0x%08x:%08x:%08x:%08x", ntohl(a0), ntohl(a1),
> +  ntohl(a2), ntohl(a3), ntohl(a4), ntohl(a5), ntohl(a6),
> +  ntohl(a7), ntohl(a8), ntohl(a9), ntohl(a10), ntohl(a11),
> +  ntohl(a12), ntohl(a13), ntohl(a14), ntohl(a15));
> +}
> +VLOG_INFO("src/dst ports %d/%d rev src/dst ports %d/%d", c->key.src.port,
> +  c->key.dst.port, c->rev_key.src.port, c->rev_key.dst.port);
> +}
> +
>  /* Initializes the connection tracker 'ct'.  The caller is responsible for
>   * calling 'conntrack_destroy()', when the instance is not needed anymore */
>  void
> --
> 1.9.1
> 

Re: [ovs-dev] [PATCH 1/5] conntrack: add commands to r/w conntrack parameters.

2017-09-20 Thread Fischetti, Antonio
Sure Kevin, I'll add a cover letter to the new version I'm reworking.
Basically these are two new commands that let the user read and adjust
some of the CT configuration parameters, so that he/she can look for a
better tuning for his/her setup.
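
As a rough picture of how the two commands are meant to work, here is a
minimal table-driven sketch. It is not the patch code: fake_ct,
ct_cfg_param, cfg_params, sketch_set_param() and sketch_get_param() are
hypothetical names, and parsing is done with strtoul() instead of
ovs_scan() so that the example stays self-contained. It matches the
parameter name exactly and returns straight from the loop, so no
separate "valid parameter" flag is needed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the conntrack instance. */
struct fake_ct {
    uint32_t n_conn_limit;
};

/* One row per tunable: CLI name plus optional write/read handlers. */
struct ct_cfg_param {
    const char *cli;
    int (*wr)(struct fake_ct *, uint32_t);
    int (*rd)(struct fake_ct *, uint32_t *);
};

static int
wr_max_conn(struct fake_ct *ct, uint32_t new_val)
{
    ct->n_conn_limit = new_val;
    return 0;
}

static int
rd_max_conn(struct fake_ct *ct, uint32_t *cur_val)
{
    *cur_val = ct->n_conn_limit;
    return 0;
}

static const struct ct_cfg_param cfg_params[] = {
    { "maxconn", wr_max_conn, rd_max_conn },
};

/* Handles a "NAME=NUMBER" argument, e.g. "maxconn=100". */
static int
sketch_set_param(struct fake_ct *ct, const char *arg)
{
    const char *eq = strchr(arg, '=');
    size_t n_params = sizeof cfg_params / sizeof cfg_params[0];

    assert(ct != NULL);
    if (!eq || eq[1] < '0' || eq[1] > '9') {
        return EINVAL;
    }
    for (size_t i = 0; i < n_params; i++) {
        const struct ct_cfg_param *p = &cfg_params[i];
        size_t name_len = (size_t) (eq - arg);

        if (strlen(p->cli) == name_len && !strncmp(arg, p->cli, name_len)) {
            char *end;
            unsigned long v;

            errno = 0;
            v = strtoul(eq + 1, &end, 10);
            if (*end != '\0' || errno || v > UINT32_MAX) {
                return EINVAL;
            }
            return p->wr ? p->wr(ct, (uint32_t) v) : EOPNOTSUPP;
        }
    }
    return EINVAL;   /* Unknown parameter name. */
}

/* Handles a bare "NAME" argument, e.g. "maxconn". */
static int
sketch_get_param(struct fake_ct *ct, const char *name, uint32_t *val)
{
    size_t n_params = sizeof cfg_params / sizeof cfg_params[0];

    assert(ct != NULL);
    for (size_t i = 0; i < n_params; i++) {
        if (!strcmp(name, cfg_params[i].cli)) {
            return cfg_params[i].rd ? cfg_params[i].rd(ct, val)
                                    : EOPNOTSUPP;
        }
    }
    return EINVAL;
}
```

Adding a new tunable then only means adding one row to the table, and a
read-only parameter can simply leave its write handler as NULL.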

Thanks,
Antonio


> -Original Message-
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Tuesday, September 19, 2017 2:11 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 1/5] conntrack: add commands to r/w conntrack
> parameters.
> 
> On 09/18/2017 11:21 AM, antonio.fische...@intel.com wrote:
> > Add infrastructure to implement:
> >  - dpctl/ct-get to read a current value of available
> >conntrack parameters.
> >  - dpctl/ct-set to set a value to the available conntrack
> >parameters.
> >
> > Add dpctl/ct-get to read current values of conntrack
> > parameters.
> > Add dpctl/ct-set to set a value to conntrack parameters.
> >
> 
> Hi Antonio - The commit message doesn't tell why these are needed or
> what use cases they will help with etc. Can you add something in a cover
> letter or the commits?
> 
> thanks,
> Kevin.
> 
> > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> > ---
> >  lib/conntrack.c | 67 +
> >  lib/conntrack.h |  3 ++
> >  lib/ct-dpif.c   | 28 ++
> >  lib/ct-dpif.h   |  2 ++
> >  lib/dpctl.c | 85
> +
> >  lib/dpif-netdev.c   | 19 
> >  lib/dpif-netlink.c  |  2 ++
> >  lib/dpif-provider.h |  4 +++
> >  8 files changed, 210 insertions(+)
> >
> > diff --git a/lib/conntrack.c b/lib/conntrack.c
> > index 419cb1d..0642cc8 100644
> > --- a/lib/conntrack.c
> > +++ b/lib/conntrack.c
> > @@ -67,6 +67,13 @@ enum ct_alg_mode {
> >  CT_TFTP_MODE,
> >  };
> >
> > +/* Variable to manage read/write on CT parameters. */
> > +struct ct_wk_params {
> > +char *cli;  /* Parameter name in human format. */
> > +int (*wr)(struct conntrack *, uint32_t);
> > +int (*rd)(struct conntrack *, uint32_t *);
> > +};
> > +
> >  static bool conn_key_extract(struct conntrack *, struct dp_packet *,
> >   ovs_be16 dl_type, struct conn_lookup_ctx *,
> >   uint16_t zone);
> > @@ -2391,6 +2398,66 @@ conntrack_flush(struct conntrack *ct, const uint16_t
> *zone)
> >  return 0;
> >  }
> >
> > +/* List of parameters that can be read/written at run-time. */
> > +struct ct_wk_params wk_params[] = {};
> > +
> > +int
> > +conntrack_set_param(struct conntrack *ct,
> > +const char *set_param)
> > +{
> > +bool valid_param = false;
> > +uint32_t max_conn;
> > +char bfr[16] = "";
> > +
> > +/* Check if the specified param can be managed. */
> > +for (int i = 0; i < sizeof(wk_params) / sizeof(struct ct_wk_params);
> i++) {
> > +if (!strncmp(set_param, wk_params[i].cli,
> > +strlen(wk_params[i].cli))) {
> > +valid_param = true;
> > +ovs_strzcpy(bfr, wk_params[i].cli, sizeof(bfr) - 1);
> > +strncat(bfr, "=%"SCNu32, sizeof(bfr) - 1 - strlen(bfr));
> > +if (ovs_scan(set_param, bfr, &max_conn)) {
> > +return (wk_params[i].wr
> > +? wk_params[i].wr(ct, max_conn)
> > +: EOPNOTSUPP);
> > +} else {
> > +return EINVAL;
> > +}
> > +}
> > +}
> > +if (!valid_param) {
> > +VLOG_DBG("%s: expected valid PARAM=NUMBER", set_param);
> > +return EINVAL;
> > +}
> > +
> > +return 0;
> > +}
> > +
> > +int
> > +conntrack_get_param(struct conntrack *ct,
> > +const char *get_param, uint32_t *val)
> > +{
> > +bool valid_param = false;
> > +
> > +/* Check if the specified param can be managed. */
> > +for (int i = 0; i < sizeof(wk_params) / sizeof(struct ct_wk_params);
> i++) {
> > +if (!strncmp(get_param, wk_params[i].cli,
> > +strlen(wk_params[i].cli))) {
> > +valid_param = true;
> > +
> > +return (wk_params[i].rd
> > +? wk_params[i].rd(ct, val)
> > +: EOPNOTSUPP);
> > +}
> > +}
&

Re: [ovs-dev] [PATCH 3/5] conntrack: r/w clean-up interval.

2017-09-20 Thread Fischetti, Antonio

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Tuesday, September 19, 2017 9:28 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 3/5] conntrack: r/w clean-up interval.
> 
> Hi Antonio
> 
> What is the motivation for this ?
> I don’t think this is a good idea, as it should not be needed under normal
> usage and has the potential to create unnecessary issues for the user and
> also maintenance issues.

[Antonio]
Agreed, this would be more for debugging and experimenting.
I will remove this patch from the series.

BTW, let me know if you think some other CT configuration parameters
could be read/written, or just read, by this ct-set-glbl-cfg command.
For example, the ALG expectation timeout?

Thanks,
Antonio

> 
> Thanks Darrell
> 
> On 9/18/17, 3:23 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com> wrote:
> 
> Read/Write conntrack clean-up interval used by
> the clean_thread_main() thread.
> 
> Example:
>ovs-appctl dpctl/ct-set cleanup=4000  # Set a new value
>ovs-appctl dpctl/ct-get cleanup   # Read
> 
> Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> ---
>  lib/conntrack.c | 27 ---
>  lib/conntrack.h |  2 ++
>  2 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 6d86625..60eb376 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -225,6 +225,9 @@ conn_key_cmp(const struct conn_key *key1, const struct
> conn_key *key2)
>  return 1;
>  }
> 
> +#define CT_CLEAN_INTERVAL 5000 /* 5 seconds */
> +#define CT_CLEAN_MIN_INTERVAL 200  /* 0.2 seconds */
> +
>  /* Initializes the connection tracker 'ct'.  The caller is responsible 
> for
>   * calling 'conntrack_destroy()', when the instance is not needed anymore
> */
>  void
> @@ -258,6 +261,7 @@ conntrack_init(struct conntrack *ct)
>  ct->hash_basis = random_uint32();
>  atomic_count_init(&ct->n_conn, 0);
>  atomic_init(&ct->n_conn_limit, DEFAULT_N_CONN_LIMIT);
> +ct->clean_interval = CT_CLEAN_INTERVAL;
>  latch_init(&ct->clean_thread_exit);
>  ct->clean_thread = ovs_thread_create("ct_clean", clean_thread_main,
> ct);
>  }
> @@ -1327,8 +1331,6 @@ next_bucket:
>   *   behind, there is at least some 200ms blocks of time when buckets 
> will
> be
>   *   left alone, so the datapath can operate unhindered.
>   */
> -#define CT_CLEAN_INTERVAL 5000 /* 5 seconds */
> -#define CT_CLEAN_MIN_INTERVAL 200  /* 0.2 seconds */
> 
>  static void *
>  clean_thread_main(void *f_)
> @@ -1344,7 +1346,7 @@ clean_thread_main(void *f_)
>  if (next_wake < now) {
>  poll_timer_wait_until(now + CT_CLEAN_MIN_INTERVAL);
>  } else {
> -poll_timer_wait_until(MAX(next_wake, now +
> CT_CLEAN_INTERVAL));
> +poll_timer_wait_until(MAX(next_wake, now + ct-
> >clean_interval));
>  }
>  latch_wait(&ct->clean_thread_exit);
>  poll_block();
> @@ -2398,6 +2400,21 @@ conntrack_flush(struct conntrack *ct, const 
> uint16_t
> *zone)
>  return 0;
>  }
> 
> +/* Set an interval value to be used by clean_thread_main. */
> +static int
> +wr_clean_int(struct conntrack *ct, uint32_t new_val) {
> +ct->clean_interval = new_val;
> +VLOG_DBG("Set clean interval to %d", new_val);
> +return 0;
> +}
> +
> +/* Read current clean-up interval used by clean_thread_main. */
> +static int
> +rd_clean_int(struct conntrack *ct, uint32_t *cur_val) {
> +*cur_val = ct->clean_interval;
> +return 0;
> +}
> +
>  /* Set a new value for the upper limit of connections. */
>  static int
>  wr_max_conn(struct conntrack *ct, uint32_t new_val) {
> @@ -2414,11 +2431,15 @@ rd_max_conn(struct conntrack *ct, uint32_t
> *cur_val) {
>  }
> 
>  /* List of managed parameters. */
> +/* Max nr of connections managed by CT module. */
>  #define CT_RW_MAX_CONN "maxconn"
> +/* Clean-up interval used by clean_thread_main() thread. */
> +#define CT_RW_CLEAN_INTERVAL "cleanup"
> 
>  /* List of parameters that can be read/written at run-time. */
>  struct ct_wk_params wk_params[

Re: [ovs-dev] [PATCH 2/5] conntrack: r/w upper limit value for connections.

2017-09-20 Thread Fischetti, Antonio

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Tuesday, September 19, 2017 9:05 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 2/5] conntrack: r/w upper limit value for
> connections.
> 
> 
> 
> On 9/18/17, 3:22 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com> wrote:
> 
> Read/Write the upper limit value for connections.
> 
> Example:
># set a new upper limit
>ovs-appctl dpctl/ct-set maxconn=100
> 
># display cur upper limit
>ovs-appctl dpctl/ct-get maxconn
> 
> Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> ---
>  lib/conntrack.c | 22 +-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 0642cc8..6d86625 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -2398,8 +2398,28 @@ conntrack_flush(struct conntrack *ct, const 
> uint16_t
> *zone)
>  return 0;
>  }
> 
> +/* Set a new value for the upper limit of connections. */
> +static int
> +wr_max_conn(struct conntrack *ct, uint32_t new_val) {
> +atomic_init(&ct->n_conn_limit, new_val);
> +VLOG_DBG("Set conn upper limit to %d", new_val);
> 
> [Darrell] really needed ?

[Antonio] ok will remove


> 
> +return 0;
> +}
> +
> +/* Read the current upper limit of connections. */
> +static int
> +rd_max_conn(struct conntrack *ct, uint32_t *cur_val) {
> +atomic_read_relaxed(&ct->n_conn_limit, cur_val);
> +return 0;
> +}
> 
> [Darrell] I realize you are trying to generalize the function pointer, but I
> think
> it may be nicer to use different function pointers in these cases ?
> Name probably should have ct_ ?

[Antonio] ok will add ct_ prefix

> 
> +
> +/* List of managed parameters. */
> +#define CT_RW_MAX_CONN "maxconn"
> +
>  /* List of parameters that can be read/written at run-time. */
> -struct ct_wk_params wk_params[] = {};
> +struct ct_wk_params wk_params[] = {
> +{CT_RW_MAX_CONN, wr_max_conn, rd_max_conn},
> +};
> 
>  int
>  conntrack_set_param(struct conntrack *ct,
> --
> 2.4.11
> 
> 



Re: [ovs-dev] [PATCH 1/5] conntrack: add commands to r/w conntrack parameters.

2017-09-20 Thread Fischetti, Antonio
Thanks Darrell for your comments, I'll re-spin a V2 based on your feedback.
Other details inline below.

-Antonio

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Tuesday, September 19, 2017 8:50 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 1/5] conntrack: add commands to r/w conntrack
> parameters.
> 
> Thanks for working on this Antonio
> 
> Few initial comments; in some cases, I did not repeat the same comment.
> 
> 
> On 9/18/17, 3:22 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com> wrote:
> 
> Add infrastructure to implement:
>  - dpctl/ct-get to read a current value of available
>conntrack parameters.
>  - dpctl/ct-set to set a value to the available conntrack
>parameters.
> 
> Add dpctl/ct-get to read current values of conntrack
> parameters.
> Add dpctl/ct-set to set a value to conntrack parameters.
> 
> Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> ---
>  lib/conntrack.c | 67 +
>  lib/conntrack.h |  3 ++
>  lib/ct-dpif.c   | 28 ++
>  lib/ct-dpif.h   |  2 ++
>  lib/dpctl.c | 85
> +
>  lib/dpif-netdev.c   | 19 
>  lib/dpif-netlink.c  |  2 ++
>  lib/dpif-provider.h |  4 +++
>  8 files changed, 210 insertions(+)
> 
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 419cb1d..0642cc8 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -67,6 +67,13 @@ enum ct_alg_mode {
>  CT_TFTP_MODE,
>  };
> 
> +/* Variable to manage read/write on CT parameters. */
> +struct ct_wk_params {
> 
> [Darrell]
> Somehow, I don’t get the ‘_wk_’
> How about ‘cfg’

[Antonio] Agreed, ct_cfg_params sounds better.
I was using 'wk' to mean 'working'.

> 
> +char *cli;  /* Parameter name in human format. */
> +int (*wr)(struct conntrack *, uint32_t);
> +int (*rd)(struct conntrack *, uint32_t *);
> +};
> +
>  static bool conn_key_extract(struct conntrack *, struct dp_packet *,
>   ovs_be16 dl_type, struct conn_lookup_ctx *,
>   uint16_t zone);
> @@ -2391,6 +2398,66 @@ conntrack_flush(struct conntrack *ct, const 
> uint16_t
> *zone)
>  return 0;
>  }
> 
> +/* List of parameters that can be read/written at run-time. */
> +struct ct_wk_params wk_params[] = {};
> +
> +int
> +conntrack_set_param(struct conntrack *ct,
> +const char *set_param)
> +{
> +bool valid_param = false;
> +uint32_t max_conn;
> +char bfr[16] = "";
> 
> [Darrell]
> bfr ?
> could we use a few more letters ?

[Antonio]
As it is a temporary buffer, I could rename it to 'temp' or 'buf'?

> 
> +
> +/* Check if the specified param can be managed. */
> +for (int i = 0; i < sizeof(wk_params) / sizeof(struct ct_wk_params);
> i++) {
> +if (!strncmp(set_param, wk_params[i].cli,
> +strlen(wk_params[i].cli))) {
> +valid_param = true;
> +ovs_strzcpy(bfr, wk_params[i].cli, sizeof(bfr) - 1);
> +strncat(bfr, "=%"SCNu32, sizeof(bfr) - 1 - strlen(bfr));
> +if (ovs_scan(set_param, bfr, &max_conn)) {
> +return (wk_params[i].wr
> +? wk_params[i].wr(ct, max_conn)
> +: EOPNOTSUPP);
> +} else {
> +return EINVAL;
> +}
> +}
> +}
> 
> [Darrell] If we reach here, then won’t valid_param be false since we otherwise
> returned.

[Antonio] you're right, will fix.


> 
> 
> +if (!valid_param) {
> +VLOG_DBG("%s: expected valid PARAM=NUMBER", set_param);
> 
> [Darrell] VLOG_WARN ?
>‘PARAM=NUMBER’ capitalization ?
> Could we use a sentence or sentence fragment ?

[Antonio] will change it to
  VLOG_WARN("%s parameter is not managed by this command.", set_param);

> 
> +return EINVAL;
> +}
> +
> +return 0;
> +}
> +
> +int
> +conntrack_get_param(struct conntrack *ct,
> +const char *get_param, uint32_t *val)
> +{
> +bool valid_param = false;
>

Re: [ovs-dev] [PATCH v4 7/7] keepalive: Add support to query keepalive status and statistics.

2017-09-05 Thread Fischetti, Antonio
Comments inline.

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Bhanuprakash Bodireddy
> Sent: Tuesday, August 22, 2017 10:41 AM
> To: d...@openvswitch.org
> Cc: i.maxim...@samsung.com
> Subject: [ovs-dev] [PATCH v4 7/7] keepalive: Add support to query keepalive
> status and statistics.
> 
> This commit adds support to query keepalive status and statistics.
> 
>   $ ovs-appctl keepalive/status
> keepAlive Status: Enabled
> 
>   $ ovs-appctl keepalive/pmd-health-show
> 
>   Keepalive status
> 
> keepalive status  : Enabled
> keepalive interval: 1000 ms
> PMD threads   : 4
> 
>  PMD     CORE    STATE   LAST SEEN TIMESTAMP(UTC)
> pmd62     0      ALIVE   21 Aug 2017 16:29:31
> pmd63     1      ALIVE   21 Aug 2017 16:29:31
> pmd64     2      ALIVE   21 Aug 2017 16:29:31
> pmd65     3      GONE    21 Aug 2017 16:26:31
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  lib/keepalive.c | 97 
> +
>  1 file changed, 97 insertions(+)
> 
> diff --git a/lib/keepalive.c b/lib/keepalive.c
> index 2497f00..119e351 100644
> --- a/lib/keepalive.c
> +++ b/lib/keepalive.c
> @@ -22,10 +22,12 @@
> 
>  #include "keepalive.h"
>  #include "lib/vswitch-idl.h"
> +#include "openvswitch/dynamic-string.h"
>  #include "openvswitch/vlog.h"
>  #include "ovs-thread.h"
>  #include "process.h"
>  #include "timeval.h"
> +#include "unixctl.h"
> 
>  VLOG_DEFINE_THIS_MODULE(keepalive);
> 
> @@ -295,6 +297,95 @@ ka_stats_run(void)
>  return ka_stats;
>  }
> 
> +static void
> +ka_unixctl_status(struct unixctl_conn *conn, int argc OVS_UNUSED,
> +  const char *argv[] OVS_UNUSED, void *aux OVS_UNUSED)
> +{
> +struct ds ds = DS_EMPTY_INITIALIZER;
> +
> +ds_put_format(&ds, "keepAlive Status: %s",
> +  ka_is_enabled() ? "Enabled" : "Disabled");
> +
> +unixctl_command_reply(conn, ds_cstr(&ds));
> +ds_destroy(&ds);
> +}
> +
> +static void
> +ka_unixctl_pmd_health_show(struct unixctl_conn *conn, int argc OVS_UNUSED,
> +   const char *argv[] OVS_UNUSED, void *ka_info_)
> +{
> +struct ds ds = DS_EMPTY_INITIALIZER;
> +ds_put_format(&ds,
> +  "\n\t\tKeepalive status\n\n");
> +
> +ds_put_format(&ds, "keepalive status  : %s\n",
> +  ka_is_enabled() ? "Enabled" : "Disabled");
> +
> +if (!ka_is_enabled()) {
> +goto out;
> +}
> +
> +ds_put_format(&ds, "keepalive interval: %"PRIu32" ms\n",
> +  get_ka_interval());
> +
> +struct keepalive_info *ka_info = (struct keepalive_info *)ka_info_;
> +if (OVS_UNLIKELY(!ka_info)) {
> +goto out;
> +}
> +
> +ds_put_format(&ds, "PMD threads   : %"PRIu32" \n", ka_info->pmd_cnt);
> +ds_put_format(&ds,
> +  "\n PMD\tCORE\tSTATE\tLAST SEEN TIMESTAMP(UTC)\n");
> +
> +struct ka_process_info *pinfo, *pinfo_next;
> +
> +ovs_mutex_lock(&ka_info->proclist_mutex);
> +HMAP_FOR_EACH_SAFE (pinfo, pinfo_next, node, &ka_info->process_list) {
> +char *state = NULL;
> +
> +if (pinfo->core_state == KA_STATE_UNUSED) {
> +continue;
> +}
> +
> +switch (pinfo->core_state) {
> +case KA_STATE_ALIVE:
> +state = "ALIVE";
> +break;
> +case KA_STATE_MISSING:
> +state = "MISSING";
> +break;
> +case KA_STATE_DEAD:
> +state = "DEAD";
> +break;
> +case KA_STATE_GONE:
> +state = "GONE";
> +break;
> +case KA_STATE_DOZING:
> +state = "DOZING";
> +break;
> +case KA_STATE_SLEEP:
> +state = "SLEEP";
> +break;
> +case KA_STATE_UNUSED:
> +break;
> +}

[Antonio]
Similarly to the comment on patch #2, I'd add a default case at the
end of the switch statement, something like:
+default:
+VLOG_DBG("Unexpected %d value for core_state.", pinfo->core_state);
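
As a rough standalone sketch of the idea, the state-to-string mapping
could live in one small helper with a default case, so that an
unexpected value never leaves the string pointer NULL. The helper name
ka_state_name() and the "UNKNOWN" / "UNUSED" strings are hypothetical,
the enum values are assumed, and fprintf() stands in for VLOG_DBG().

```c
#include <stdio.h>

/* State names as used in the patch; the numeric values are assumed. */
enum keepalive_state {
    KA_STATE_UNUSED,
    KA_STATE_ALIVE,
    KA_STATE_MISSING,
    KA_STATE_DEAD,
    KA_STATE_GONE,
    KA_STATE_DOZING,
    KA_STATE_SLEEP
};

/* Returns a printable name for 'state'.  Takes an int so that an
 * out-of-range value is still handled by the default case. */
const char *
ka_state_name(int state)
{
    switch (state) {
    case KA_STATE_ALIVE:
        return "ALIVE";
    case KA_STATE_MISSING:
        return "MISSING";
    case KA_STATE_DEAD:
        return "DEAD";
    case KA_STATE_GONE:
        return "GONE";
    case KA_STATE_DOZING:
        return "DOZING";
    case KA_STATE_SLEEP:
        return "SLEEP";
    case KA_STATE_UNUSED:
        return "UNUSED";
    default:
        /* Stand-in for the VLOG_DBG() call suggested above. */
        fprintf(stderr, "Unexpected %d value for core_state.\n", state);
        return "UNKNOWN";
    }
}
```

A shared helper like this would also remove the duplicated switch in
patches #5 and #7.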

> +
> +char *utc = xastrftime_msec("%d %b %Y %H:%M:%S",
> +pinfo->core_last_seen_times, true);
> +
> +ds_put_format(&ds, "%s\t%2d\t%s\t%s\n",
> +  pinfo->name, pinfo->core_id, state, utc);
> +
> +free(utc);
> +}
> +ovs_mutex_unlock(&ka_info->proclist_mutex);
> +
> +ds_put_format(&ds, "\n");
> +out:
> +unixctl_command_reply(conn, ds_cstr(&ds));
> +ds_destroy(&ds);
> +}
> +
>  /* Dispatch heartbeats. */
>  void
>  dispatch_heartbeats(void)
> @@ -412,6 +503,12 @@ ka_init(const struct smap *ovs_other_config)
>  ka_init_status = ka_init_success;
>  }
> 
> +unixctl_command_register("keepalive/status", "", 0, 0,
> +  ka_unixctl_status, NULL);
> +
> +unixctl_command_register("keepalive/pmd-health-show", "", 0, 

Re: [ovs-dev] [PATCH v4 5/7] keepalive: Retrieve PMD status periodically.

2017-09-05 Thread Fischetti, Antonio
Comments inline.

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Bhanuprakash Bodireddy
> Sent: Tuesday, August 22, 2017 10:41 AM
> To: d...@openvswitch.org
> Cc: i.maxim...@samsung.com
> Subject: [ovs-dev] [PATCH v4 5/7] keepalive: Retrieve PMD status periodically.
> 
> This commit implements APIs to retrieve the PMD thread status and return
> the status in the below format for each PMD thread.
> 
>   Format: pmdid="status,core id,last_seen_timestamp(epoch)"
>   eg: pmd62="ALIVE,2,150332575"
>   pmd63="GONE,3,150332525"
> 
> The status is periodically retrieved by keepalive thread and stored in
> keepalive_stats struct which later shall be retrieved by vswitchd thread.
> In case of four PMD threads the status is as below:
> 
>"pmd62"="ALIVE,0,150332575"
>"pmd63"="ALIVE,1,150332575"
>"pmd64"="ALIVE,2,150332575"
>"pmd65"="ALIVE,3,150332575"
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  lib/dpif-netdev.c |  1 +
>  lib/keepalive.c   | 69 
> +++
>  lib/keepalive.h   |  1 +
>  3 files changed, 71 insertions(+)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 67ee424..8475a24 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -990,6 +990,7 @@ ovs_keepalive(void *f_)
>  int n_pmds = cmap_count(>poll_threads) - 1;
>  if (n_pmds > 0) {
>  dispatch_heartbeats();
> +get_ka_stats();
>  }
> 
>  xusleep(get_ka_interval() * 1000);
> diff --git a/lib/keepalive.c b/lib/keepalive.c
> index 4ee89c0..9fd71b2 100644
> --- a/lib/keepalive.c
> +++ b/lib/keepalive.c
> @@ -23,6 +23,7 @@
>  #include "keepalive.h"
>  #include "lib/vswitch-idl.h"
>  #include "openvswitch/vlog.h"
> +#include "ovs-thread.h"
>  #include "process.h"
>  #include "timeval.h"
> 
> @@ -33,6 +34,9 @@ static bool ka_init_status = ka_init_failure; /* Keepalive
> initialization */
>  static uint32_t keepalive_timer_interval; /* keepalive timer interval */
>  static struct keepalive_info *ka_info = NULL;
> 
> +static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER;
> +static struct smap *keepalive_stats OVS_GUARDED_BY(mutex);
> +
>  inline bool
>  ka_is_enabled(void)
>  {
> @@ -211,6 +215,71 @@ ka_get_timer_interval(const struct smap *ovs_other_config
> OVS_UNUSED)
>  return ka_interval;
>  }
> 
> +static void
> +get_pmd_status(struct smap *ka_pmd_stats)
> +OVS_REQUIRES(ka_info->proclist_mutex)
> +{
> +if (OVS_UNLIKELY(!ka_info)) {
> +return;
> +}
> +
> +struct ka_process_info *pinfo, *pinfo_next;
> +HMAP_FOR_EACH_SAFE (pinfo, pinfo_next, node, &ka_info->process_list) {
> +int core_id = pinfo->core_id;
> +char *state = NULL;
> +if (pinfo->core_state == KA_STATE_UNUSED) {
> +continue;
> +}
> +
> +switch (pinfo->core_state) {
> +case KA_STATE_ALIVE:
> +state = "ALIVE";
> +break;
> +case KA_STATE_MISSING:
> +state = "MISSING";
> +break;
> +case KA_STATE_DEAD:
> +state = "DEAD";
> +break;
> +case KA_STATE_GONE:
> +state = "GONE";
> +break;
> +case KA_STATE_DOZING:
> +state = "DOZING";
> +break;
> +case KA_STATE_SLEEP:
> +state = "SLEEP";
> +break;
> +case KA_STATE_UNUSED:
> +break;
> +}

[Antonio]
Similarly to the comment on patch #2, I'd add a default case at the
end of the switch statement, something like:
+default:
+VLOG_DBG("Unexpected %d value for core_state.", pinfo->core_state);
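
For reference, the value format from the commit message
("status,core id,last_seen_timestamp(epoch)") can be sketched as a small
standalone formatter. The function name ka_format_status() is
hypothetical. The sketch assumes the timestamp is a uint64_t, as the
'last_alive' argument in patch #2 suggests; in that case PRIu64 would be
a more portable specifier than "%ld". It also falls back to "UNKNOWN"
when no state string was set, which is what happens when the switch hits
an unexpected value.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Writes "STATE,CORE,TIMESTAMP" into 'buf'.  Returns 0 on success or -1
 * if the result did not fit in 'size' bytes. */
int
ka_format_status(char *buf, size_t size, const char *state, int core_id,
                 uint64_t last_seen)
{
    int n = snprintf(buf, size, "%s,%d,%" PRIu64,
                     state ? state : "UNKNOWN", core_id, last_seen);

    return n >= 0 && (size_t) n < size ? 0 : -1;
}
```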


> +
> +smap_add_format(ka_pmd_stats, pinfo->name, "%s,%d,%ld",
> +state, core_id, pinfo->core_last_seen_times);
> +}
> +}
> +
> +void
> +get_ka_stats(void)
> +{
> +struct smap *ka_pmd_stats;
> +ka_pmd_stats = xmalloc(sizeof *ka_pmd_stats);
> +smap_init(ka_pmd_stats);
> +
> +ovs_mutex_lock(&ka_info->proclist_mutex);
> +get_pmd_status(ka_pmd_stats);
> +ovs_mutex_unlock(&ka_info->proclist_mutex);
> +
> +ovs_mutex_lock(&mutex);
> +if (keepalive_stats) {
> +smap_destroy(keepalive_stats);
> +free(keepalive_stats);
> +keepalive_stats = NULL;
> +}
> +keepalive_stats = ka_pmd_stats;
> +ovs_mutex_unlock(&mutex);
> +}
> +
>  /* Dispatch heartbeats. */
>  void
>  dispatch_heartbeats(void)
> diff --git a/lib/keepalive.h b/lib/keepalive.h
> index a344006..f5da460 100644
> --- a/lib/keepalive.h
> +++ b/lib/keepalive.h
> @@ -100,6 +100,7 @@ uint32_t get_ka_interval(void);
>  int get_ka_init_status(void);
>  int ka_alloc_portstats(unsigned, int);
>  void ka_destroy_portstats(void);
> +void get_ka_stats(void);
> 
>  void dispatch_heartbeats(void);
>  #endif /* keepalive.h */
> --
> 2.4.11
> 
> 

Re: [ovs-dev] [PATCH v4 2/7] Keepalive: Add initial keepalive support.

2017-09-05 Thread Fischetti, Antonio
Comments inline.

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Bhanuprakash Bodireddy
> Sent: Tuesday, August 22, 2017 10:41 AM
> To: d...@openvswitch.org
> Cc: i.maxim...@samsung.com
> Subject: [ovs-dev] [PATCH v4 2/7] Keepalive: Add initial keepalive support.
> 
> This commit introduces the initial keepalive support by adding
> 'keepalive' module and also helper and initialization functions
> that will be invoked by later commits.
> 
> This commit adds new ovsdb column "keepalive" that shows the status
> of the datapath threads. This is implemented for DPDK datapath and
> only status of PMD threads is reported.
> 
> For eg:
>   To enable keepalive feature.
>   'ovs-vsctl --no-wait set Open_vSwitch . other_config:enable-keepalive=true'

[Antonio]
It would help to say that this command must be run before launching
ovs-vswitchd; otherwise it has no effect.

> 
>   To set timer interval of 5000ms for monitoring packet processing cores.
>   'ovs-vsctl --no-wait set Open_vSwitch . \
>  other_config:keepalive-interval="5000"
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  lib/automake.mk|   2 +
>  lib/keepalive.c| 183 
> +
>  lib/keepalive.h|  87 +
>  vswitchd/bridge.c  |   3 +
>  vswitchd/vswitch.ovsschema |   8 +-
>  vswitchd/vswitch.xml   |  49 
>  6 files changed, 330 insertions(+), 2 deletions(-)
>  create mode 100644 lib/keepalive.c
>  create mode 100644 lib/keepalive.h
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 2415f4c..0d99f0a 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -110,6 +110,8 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/json.c \
>   lib/jsonrpc.c \
>   lib/jsonrpc.h \
> + lib/keepalive.c \
> + lib/keepalive.h \
>   lib/lacp.c \
>   lib/lacp.h \
>   lib/latch.h \
> diff --git a/lib/keepalive.c b/lib/keepalive.c
> new file mode 100644
> index 000..ac73a42
> --- /dev/null
> +++ b/lib/keepalive.c
> @@ -0,0 +1,183 @@
> +/*
> + * Copyright (c) 2014, 2015, 2016, 2017 Nicira, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "keepalive.h"
> +#include "lib/vswitch-idl.h"
> +#include "openvswitch/vlog.h"
> +#include "timeval.h"
> +
> +VLOG_DEFINE_THIS_MODULE(keepalive);
> +
> +static bool keepalive_enable = false;/* Keepalive disabled by default */
> +static bool ka_init_status = ka_init_failure; /* Keepalive initialization */
> +static uint32_t keepalive_timer_interval; /* keepalive timer interval */
> +static struct keepalive_info *ka_info = NULL;
> +
> +inline bool
> +ka_is_enabled(void)
> +{
> +return keepalive_enable;
> +}
> +
> +inline int
> +ka_get_pmd_tid(unsigned core_idx)
> +{
> +if (ka_is_enabled()) {
> +return ka_info->thread_id[core_idx];
> +}
> +
> +return -EINVAL;
> +}
> +
> +void
> +ka_set_pmd_state_ts(unsigned core_id, enum keepalive_state state,
> +uint64_t last_alive)
> +{
> +struct ka_process_info *pinfo;
> +int tid = ka_get_pmd_tid(core_id);
> +
> +ovs_mutex_lock(&ka_info->proclist_mutex);
> +HMAP_FOR_EACH_WITH_HASH (pinfo, node, hash_int(tid, 0),
> + &ka_info->process_list) {
> +if ((pinfo->core_id == core_id) && (pinfo->tid == tid)) {
> +pinfo->core_state = state;
> +pinfo->core_last_seen_times = last_alive;
> +}
> +}
> +ovs_mutex_unlock(&ka_info->proclist_mutex);
> +}
> +
> +/* Retrieve and return the keepalive timer interval from OVSDB. */
> +static uint32_t
> +ka_get_timer_interval(const struct smap *ovs_other_config OVS_UNUSED)
> +{
> +#define OVS_KEEPALIVE_TIMEOUT 1000/* Default timeout set to 1000ms */
> +uint32_t ka_interval;
> +
> +/* Timer granularity in milliseconds
> + * Defaults to OVS_KEEPALIVE_TIMEOUT(ms) if not set */
> +ka_interval = smap_get_int(ovs_other_config, "keepalive-interval",
> +  OVS_KEEPALIVE_TIMEOUT);
> +
> +VLOG_INFO("Keepalive timer interval set to %"PRIu32" (ms)\n",
> ka_interval);
> +return ka_interval;
> +}
> +
> +/*
> + * This function shall be invoked periodically to write the core status and
> + * last seen timestamp of the cores in 

Re: [ovs-dev] [PATCH v4 0/7] Add OVS DPDK keep-alive functionality.

2017-09-05 Thread Fischetti, Antonio
Hi Bhanu,
I added some comments on patches #2, 5 and 7.

Besides that, LGTM. I applied this patch series on top of
commit 84d2723305064e25402cb89a16bf7ad1aa2cda70
and it works as expected.

-Antonio

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Bhanuprakash Bodireddy
> Sent: Tuesday, August 22, 2017 10:41 AM
> To: d...@openvswitch.org
> Cc: i.maxim...@samsung.com
> Subject: [ovs-dev] [PATCH v4 0/7] Add OVS DPDK keep-alive functionality.
> 
> Keepalive feature is aimed at achieving Fastpath Service Assurance
> in OVS-DPDK deployments. It adds support for monitoring the packet
> processing cores(PMD thread cores) by dispatching heartbeats at regular
> intervals. In case of heartbeat misses additional health checks are
> enabled on the PMD thread to detect the failure and the same shall be
> reported to higher level fault management systems/frameworks.
> 
> The implementation uses OVSDB for reporting the health of the PMD threads.
> Any external monitoring application can read the status from OVSDB at
> regular intervals (or) subscribe to the updates in OVSDB so that they get
> notified when the changes happen on OVSDB.
> 
> keepalive info struct is created and initialized for storing the
> status of the PMD threads. This is initialized by main thread(vswitchd)
> as part of init process and will be periodically updated by 'keepalive'
> thread. keepalive feature can be enabled through below OVSDB settings.
> 
> enable-keepalive=true
>   - Keepalive feature is disabled by default.
> 
> keepalive-interval="5000"
>   - Timer interval in milliseconds for monitoring the packet
> processing cores.
> 
> When KA is enabled, 'ovs-keepalive' thread shall be spawned that wakes
> up at regular intervals to update the timestamp and status of pmd cores
> in keepalive info struct. This information shall be read by vswitchd thread
> and write the status in to 'keepalive' column of Open_vSwitch table in OVSDB.
> 
> An external monitoring framework like collectd with ovs events support
> can read (or) subscribe to the datapath status changes in ovsdb. When the 
> state
> is updated, the collectd shall be notified and will eventually relay the 
> status
> to ceilometer service running in the controller. Below is the high level
> overview of deployment model.
> 
>     Compute Node          Controller          Compute Node
>
>     Collectd  <------->   Ceilometer  <------->  Collectd
>
>     OvS DPDK                                     OvS DPDK
>
>      +-----+
>      | VM  |
>      +--+--+
>         |
>     +---+---+    +------------------+     +----------+
>     |  OVS  | -> | ovsevents plugin | --> | collectd |
>     +-------+    +------------------+     +----+-----+
>                                                |
>     +------------+     +----------------------------+  |
>     | Ceilometer | <-- | collectd ceilometer plugin | <-
>     +------------+     +----------------------------+
> 
> github: The patches can be found here:
>   https://github.com/bbodired/ovs (Last master commit e7cd8c363)
> 
> Performance impact:
>   No noticeable performance or latency impact is observed with
>   KA feature enabled.
> 
> -
> v3 -> v4
>   * Split the functionality in to 2 parts. This patch series only updates
> PMD status to OVSDB. The incremental patch series to handle false
> positives,
> negatives and more checking and stats.
>   * Remove code from netdev layer and dependency on rte_keepalive lib.
>   * Merged few patches and simplified the patch series.
>   * Timestamp in human readable form.
> 
> v2 -> v3
>   * Rebase.
>   * Verified with dpdk-stable-17.05.1 release.
>   * Fixed build issues with MSVC and cross checked with appveyor.
> 
> v1 -> v2
>   * Rebase
>   * Drop 01/20 Patch "Consolidate process related APIs" of V1 as it
> is already applied as separate patch.
> 
> RFCv3 -> v1
>   * Made changes to fix failures in some unit test cases.
>   * some more code cleanup w.r.t process related APIs.
> 
> RFCv2 -> RFCv3
>   * Remove POSIX shared memory block implementation (suggested by Aaron).
>   * Rework the logic to register and track threads instead of cores. This way
> in the future any thread can be registered to KA framework. For now only
> PMD
> threads are tracked (suggested by Aaron).
>   * Refactor few APIs and further clean up the code.
> 
> RFCv1 -> RFCv2
>   * Merged the xml and schema commits to later commit where the actual
> implementation is done(suggested by Ben).
>   * Fix ovs-appctl keepalive/* hang issue when KA disabled.
>   * Fixed memory leaks with appctl commands for keepalive/pmd-health-show,
> pmd-xstats-show.
>   * Refactor code and fixed APIs dealing with PMD health monitoring.
> 
> 
> Bhanuprakash Bodireddy (7):
>   process: Extend get_process_info() for additional fields.
>   Keepalive: Add initial keepalive support.
>  

Re: [ovs-dev] [PATCH v4] dpif-netdev: Avoid reading RSS hash when EMC is disabled

2017-08-25 Thread Fischetti, Antonio


> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Friday, August 25, 2017 6:19 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v4] dpif-netdev: Avoid reading RSS hash when EMC
> is disabled
> 
> Hi Antonio
> 
> On 8/25/17, 6:56 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com> wrote:
> 
> When EMC is disabled the reading of RSS hash is skipped.
> Also, for packets that are not recirculated it retrieves
> the hash value without considering the recirc id.
> 
> Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> ---
> 
>  V4
>   - reworked to remove dependencies from other patches in
> patchset "Skip EMC for recirc pkts and other optimizations."
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337320.html
> 
>   - measurements were repeated with the latest head of master.
> 
>  Port-to-Port Test
>  =================
> Software built with "-O2 -march=native -g".
> 
> I measured the Rx rate regardless of pkt loss by sending 1 UDP
> flow,
> 64B packets, at the line-rate.
> 
> 2 PMDs with 3 Tx queues.
> 
> Flows setup:
>   in_port=dpdk0 actions=output:dpdk1
>   in_port=dpdk1 actions=output:dpdk0
> 
> Results
> -------
> Values are for the Rx rate in Mpps, regardless of packet loss.
> 
> RSS column:
>Yes: RSS hash is provided by the NIC
>No: RSS is disabled and the 5-tuple hash must be
>computed in software.
> 
>Note: to simulate RSS disabled I added the line
>dp_packet_rss_valid(struct dp_packet *p)
>{
>+return false;
>#ifdef DPDK_NETDEV
> 
> EMC column:
>Yes: default probability insertion,
>No: EMC disabled.
> 
> Orig means Commit ID:
>   75f9e007e7f7eb91461e238f882d1c539c56bb8d
> 
> [Darrell]
> It looks like the main benefit (+6% throughput) occurs
> when RSS is not calculated by the NIC and emc is disabled. What are the common
> cases when RSS is not generated by the NIC ?

[Antonio]
One common case is packets coming in from a VM.
Another is a NIC that does not provide a hash value, although I guess
that shouldn't be considered a common case.
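
To make the two cases concrete, here is a minimal standalone sketch of
the fallback: use the hash delivered with the packet when it is valid,
otherwise compute one in software over the 5-tuple and cache it in the
packet. 'struct pkt', soft_hash_5tuple() and pkt_get_hash() are
hypothetical stand-ins and not the OVS structures; the software hash is a
simple multiplicative mix and not miniflow_hash_5tuple().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified packet metadata. */
struct pkt {
    bool rss_valid;         /* True if 'rss_hash' holds a usable hash. */
    uint32_t rss_hash;      /* Hash from the NIC, or cached software hash. */
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

/* Stand-in software hash over the 5-tuple. */
static uint32_t
soft_hash_5tuple(const struct pkt *p)
{
    uint32_t words[4] = {
        p->src_ip,
        p->dst_ip,
        ((uint32_t) p->src_port << 16) | p->dst_port,
        p->proto
    };
    uint32_t h = 2166136261u;

    for (int i = 0; i < 4; i++) {
        h = (h ^ words[i]) * 16777619u;
    }
    return h;
}

/* Returns the packet hash, computing and caching it only when the packet
 * arrived without a valid one (e.g. from a VM port in this sketch). */
uint32_t
pkt_get_hash(struct pkt *p)
{
    if (!p->rss_valid) {
        p->rss_hash = soft_hash_5tuple(p);
        p->rss_valid = true;
    }
    return p->rss_hash;
}
```

The software path is the expensive one, which is why skipping the hash
entirely when it is not needed matters most in the "RSS: No" rows.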


> 
> 
> 
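> +-----+-----+--------------+----------------+
> | RSS | EMC |     Orig     |  Orig + this   |
> |     |     |              |     patch      |
> +-----+-----+--------------+----------------+
> | Yes | Yes | 12.02, 11.38 | 12.18, 11.32   |
> | Yes |  No |  8.35,  8.36 |  8.36,  8.38   |
> +-----+-----+--------------+----------------+
> |  No | Yes |  9.91, 11.15 |  9.84, 11.08   |
> |  No |  No |  7.83,  7.87 |  8.33,  8.35   |
> +-----+-----+--------------+----------------+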
> 
> ---
>  lib/dpif-netdev.c | 32 
>  1 file changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e2cd931..1157998 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4567,6 +4567,22 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd,
> struct dp_packet *packet_,
>  }
> 
>  static inline uint32_t
> +dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet,
> +const struct miniflow *mf)
> +{
> +uint32_t hash;
> +
> +if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
> +hash = dp_packet_get_rss_hash(packet);
> +} else {
> +hash = miniflow_hash_5tuple(mf, 0);
> +dp_packet_set_rss_hash(packet, hash);
> +}
> +
> +return hash;
> +}
> +
> +static inline uint32_t
>  dpif_netdev_packet_get_rss_hash(struct dp_packet *packet,
>  const struct miniflow *mf)
>  {
> @@ -4701,10 +4717,18 @@ emc_processing(struct dp_netdev_

Re: [ovs-dev] [PATCH v3 2/4] dpif-netdev: Avoid reading RSS hash when EMC is disabled

2017-08-25 Thread Fischetti, Antonio
Sure Darrell, I'll do that.
Thanks!

Antonio

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Friday, August 25, 2017 9:59 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v3 2/4] dpif-netdev: Avoid reading RSS hash when
> EMC is disabled
> 
> Hi Antonio
> 
> Can the dependency of this Patch 2 on Patch 1 be removed, while Patch 1 is
> being discussed ?
> 
> Thanks Darrell
> 
> On 8/13/17, 11:32 PM, "ovs-dev-boun...@openvswitch.org on behalf of Darrell
> Ball" <ovs-dev-boun...@openvswitch.org on behalf of db...@vmware.com> wrote:
> 
> I did not try it yet, but seems reasonable
> If the hash is needed for something else, it will be read at that point.
> 
> -Original Message-
> From: <ovs-dev-boun...@openvswitch.org> on behalf of
> "antonio.fische...@intel.com" <antonio.fische...@intel.com>
> Date: Friday, August 11, 2017 at 8:52 AM
> To: "d...@openvswitch.org" <d...@openvswitch.org>
> Subject: [ovs-dev] [PATCH v3 2/4] dpif-netdev: Avoid reading RSS hash when
>   EMC is disabled
> 
> When EMC is disabled the reading of RSS hash is skipped.
> Also, for packets that are not recirculated it retrieves the hash
> value without considering the recirc id.
> 
> Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> ---
> This patch depends on previous patch in this series.
> 
> Port-to-Port Test
> =================
> Software built with "-O2 -march=native -g".
> 
> I measured the Rx rate regardless of pkt loss by sending 1 UDP flow,
> 64B packets, at the line-rate.
> 
> 2 PMDs with 3 Tx queues.
> 
> Flows setup:
>   in_port=dpdk0 actions=output:dpdk1
>   in_port=dpdk1 actions=output:dpdk0
> 
> Results
> -------
> Values are for the Rx rate in Mpps, regardless of packet loss.
> 
> RSS column:
>Yes: RSS hash is provided by the NIC
>No: RSS is disabled and the 5-tuple hash must be
>computed in software.
> 
> EMC column:
>Yes: default probability insertion,
>No: EMC disabled.
> 
> Orig OvS-DPDK means Commit ID:
>   6b1babacc3ca0488e07596bf822fe356c9bab646
> 
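> +-----+-----+--------------+----------------+----------------+
> | RSS | EMC |     Orig     | Orig + patch 1 | Orig + patch 1 |
> |     |     |              |                |  + this patch  |
> +-----+-----+--------------+----------------+----------------+
> | Yes | Yes | 11.99, 11.41 | 12.20, 11.31   | 12.20, 11.31   |
> | Yes |  No |  8.32,  8.42 |  8.35,  8.39   |  8.62,  8.62   |
> +-----+-----+--------------+----------------+----------------+
> |  No | Yes |  9.87, 11.15 |  9.79, 11.20   |  9.85, 11.09   |
> |  No |  No |  7.82,  7.84 |  7.84,  7.93   |  8.40,  8.38   |
> +-----+-----+--------------+----------------+----------------+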
> ---
>  lib/dpif-netdev.c | 20 +++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 8f6b96b..0db6f83 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4578,6 +4578,22 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread
> *pmd, struct dp_packet *packet_,
>  }
> 
>  static inline uint32_t
> +dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet,
> +const struct miniflow *mf)
> +{
> +uint32_t hash;
> +
> +if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
> +hash = dp_packet_get_rss_hash(packet);
> +} else {
> +hash = miniflow_hash_5tuple(mf, 0);
> +dp_packet_set_rss_hash(packet, hash);
> +}
> +
> +return hash;
> +}
> +
> +static inline uint32_t
>  dpif_netdev_packet_get_rss_hash(struct dp_packet *packet,
>  const struct miniflow *mf)
>  {
> @@ -4715,7 +4731,6 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>  }
>  miniflow_extract(packet, &key->mf);
>  key->len = 0; /* Not computed yet. */
> -key->hash = dpif_netdev_pack

Re: [ovs-dev] [PATCH v3] netdev-dpdk: Create separate memory pool for each port

2017-08-23 Thread Fischetti, Antonio


> -Original Message-
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Tuesday, August 22, 2017 7:58 PM
> To: Aaron Conole <acon...@redhat.com>; Fischetti, Antonio
> <antonio.fische...@intel.com>
> Cc: Wojciechowicz, RobertX <robertx.wojciechow...@intel.com>;
> d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v3] netdev-dpdk: Create separate memory pool for
> each port
> 
> On 08/22/2017 06:14 PM, Aaron Conole wrote:
> > "Fischetti, Antonio" <antonio.fische...@intel.com> writes:
> >
> >> Hi Kevin, pls see comments inline.
> >> I'm going to rebase and rework this patch on behalf of Robert.
> >>
> >> Thanks,
> >> Antonio
> >>
> >>> -Original Message-
> >>> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org]
> >>> On Behalf Of Kevin Traynor
> >>> Sent: Tuesday, April 11, 2017 9:37 AM
> >>> To: Wojciechowicz, RobertX <robertx.wojciechow...@intel.com>;
> >>> d...@openvswitch.org
> >>> Subject: Re: [ovs-dev] [PATCH v3] netdev-dpdk: Create separate memory pool
> for
> >>> each port
> >>>
> >>> On 04/07/2017 09:20 AM, Robert Wojciechowicz wrote:
> >>>> Since it's possible to delete memory pool in DPDK
> >>>> we can try to estimate better required memory size
> >>>> when port is reconfigured, e.g. with different number
> >>>> of rx queues.
> >>>>
> >>>> Signed-off-by: Robert Wojciechowicz <robertx.wojciechow...@intel.com>
> >>>> Acked-by: Ian Stokes <ian.sto...@intel.com>
> >>>> ---
> >>>> v2:
> >>>> - removing mempool reference counter
> >>>> - making sure mempool name isn't longer than RTE_MEMPOOL_NAMESIZE
> >>>>
> >>>> v3:
> >>>> - adding memory for corner cases
> >>>> ---
> >>>>  lib/netdev-dpdk.c | 118 
> >>>> ++---
> ---
> >>> --
> >>>>  1 file changed, 57 insertions(+), 61 deletions(-)
> >>>>
> >>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >>>> index ddc651b..6b781ac 100644
> >>>> --- a/lib/netdev-dpdk.c
> >>>> +++ b/lib/netdev-dpdk.c
> >>>> @@ -275,14 +275,12 @@ static struct ovs_list dpdk_list
> >>> OVS_GUARDED_BY(dpdk_mutex)
> >>>>  static struct ovs_mutex dpdk_mp_mutex OVS_ACQ_AFTER(dpdk_mutex)
> >>>>  = OVS_MUTEX_INITIALIZER;
> >>>>
> >>>> -static struct ovs_list dpdk_mp_list OVS_GUARDED_BY(dpdk_mp_mutex)
> >>>> -= OVS_LIST_INITIALIZER(&dpdk_mp_list);
> >>>> -
> >>>>  struct dpdk_mp {
> >>>>  struct rte_mempool *mp;
> >>>>  int mtu;
> >>>>  int socket_id;
> >>>> -int refcount;
> >>>> +char if_name[IFNAMSIZ];
> >>>> +unsigned mp_size;
> >>>>  struct ovs_list list_node OVS_GUARDED_BY(dpdk_mp_mutex);
> >>>>  };
> >>>>
> >>>> @@ -463,78 +461,82 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp,
> >>>>  dp_packet_init_dpdk((struct dp_packet *) pkt, pkt->buf_len);
> >>>>  }
> >>>>
> >>>> +/*
> >>>> + * XXX Full DPDK memory pool name must be unique
> >>>> + * and cannot be longer than RTE_MEMPOOL_NAMESIZE
> >>>> + */
> >>>> +static char *
> >>>> +dpdk_mp_name(struct dpdk_mp *dmp)
> >>>> +{
> >>>> +uint32_t h = hash_string(dmp->if_name, 0);
> >>>> +char *mp_name = xcalloc(RTE_MEMPOOL_NAMESIZE, sizeof(char));
> >>>> +int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%u",
> >>>> +   h, dmp->mtu, dmp->mp_size);
> >>>> +if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
> >>>> +return NULL;
> >>>> +}
> >>>> +return mp_name;
> >>>> +}
> >>>> +
> >>>>  static struct dpdk_mp *
> >>>> -dpdk_mp_create(int socket_id, int mtu)
> >>>> +dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
> >>>>  {
> >>>>  struct rte_pktmbuf_pool_private mbp_priv;
> >>>>  struct dpdk_mp *dmp;
> >>>> -   

Re: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for recirc packets

2017-08-23 Thread Fischetti, Antonio

> -Original Message-
> From: Jan Scheurich [mailto:jan.scheur...@ericsson.com]
> Sent: Thursday, August 17, 2017 1:42 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; Darrell Ball
> <db...@vmware.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for
> recirc packets
> 
> Hi Antonio,
> 
> > > Is there a reason to assume that a deterministic selection on some non-
> > random
> > > criteria like the recirculation count will on average (over deployments
> > and
> > > applications) give a better performance than a random selection?
> >
> > [Antonio]
> > If we consider latency and jitter a deterministic solution should be
> > more preferable than a solution which behaves differently depending
> > on the particular values of the packet fields, eg the IP addresses.
> 
> Do you have measurements showing that latency is significantly affected
> by EMC hit vs DPCLS hit?  I wouldn't think so.  Only throughput should vary.
> 

[Antonio]
Agreed.
What I meant is that, broadly speaking, more deterministic solutions
should be preferable, especially in a Telco deployment.
At first glance this approach seems more deterministic than others
such as the "RSS hash threshold method", because the latter can treat
packets differently depending on their headers.

IMHO it could be good to have this approach alongside other strategies,
such as the "RSS hash threshold method", because they address two
different causes/levels of the same problem.
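
The difference between the two kinds of policy can be sketched in a few
lines. This is only a rough standalone illustration and not code from
either proposal: the function names, EMC_ENTRIES and the threshold value
are assumptions, and the hash-threshold check is just one plausible
reading of that method.

```c
#include <stdbool.h>
#include <stdint.h>

#define EMC_ENTRIES 8192u                               /* Assumed size. */
#define RECIRC_NO_INSERT_THRESHOLD (EMC_ENTRIES / 2)    /* Assumed value. */

/* Deterministic policy: once the EMC holds more entries than the
 * threshold, recirculated packets bypass it.  The decision depends only
 * on the recirculation depth and the EMC occupancy. */
bool
emc_use_deterministic(uint32_t recirc_depth, uint32_t emc_occupied)
{
    return recirc_depth == 0 || emc_occupied <= RECIRC_NO_INSERT_THRESHOLD;
}

/* Hash-threshold policy: roughly 1 flow out of 'inv_prob' is admitted,
 * and which ones are admitted depends on the packet hash, i.e. on the
 * header fields.  'inv_prob' == 0 means the EMC is not used at all. */
bool
emc_use_hash_threshold(uint32_t hash, uint32_t inv_prob)
{
    if (inv_prob == 0) {
        return false;
    }
    return hash <= UINT32_MAX / inv_prob;
}
```

The two checks are independent of each other, so they could be combined
with a logical AND if both strategies were adopted.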
 

> Probabilistic EMC lookup should only apply in situations where EMC is
> overloaded, meaning we have thousands of packet flows. In this case we
> maximize the aggregate throughput of the statistical flow mix. But it is not
> that a flow using EMC would see higher throughput than analogous flows that
> don't.
> 
> > > I don't believe so. For example, the number of "EMC flows" in each pass
> > through
> > > the datapath can differ hugely: 1 GRE tunnel flow in first pass (from phy
> > > port), 100K tenant flows after tunnel decapsulation. Or 100K tenant flows
> > in
> > > first pass (from VM) but 1 flow after NSH encapsulation in second pass.
> >
> > [Antonio]
> > Maybe I'm wrong but shouldn't the different flows encapped in a GRE
> > tunnel hit the EMC in different locations? Because even if they all have the
> > same outer IP addresses, they differ in the L4 ports so the 5-tuple hash
> > - and the emc locations - should vary. Same thing for NSH encapsulation?
> 
> Neither GRE nor NSH packets have L4 ports for RSS hashing. GRE is a separate
> IP protocol (not UDP). All packets of a GRE tunnel share the same pair of IP
> addresses. NSH is even a non-IP protocol.
> 
> > > I believe a random selection with dynamically adapted probability is the
> > best
> > > we can do without a priori knowledge about the traffic patterns and
> > pipeline
> > > organization.
> >
> > [Antonio]
> > This proposal is orthogonal to other approaches that look at the usage
> > of the single locations, eg policies not to overwrite active locations or to
> > reduce in general the emc usage.
> > I think we should consider both the two strategies to tackle two different
> > aspects of the thrashing and use emc more efficiently:
> >  1. skip emc lookup/insert for recirc packets (which is only activated when
> >emc entries exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD);
> >  2. any other strategy that limits emc usage or offers a better entries
> > eviction.
> >
> > So - being agnostic of what's the traffic type - if we have 100k flows
> > that could potentially be recirculated:
> >  1. allows to tackle the thrashing due to recirculation, which is activated
> > when the emc entries exceeds a threshold.
> >  2. allows to limit the emc usage to fewer flows because we don't want
> > 100k flows to hit emc.
> 
> First of all: we only discuss limiting EMC lookups in the case of EMC overload.
> I still don't think that it is a good idea to generally skip EMC lookup for
> recirculated flows in that situation. It may be the right thing to do in some
> scenarios (e.g. GRE -> VM), but exactly the wrong thing in others (e.g. VM -> GRE).
> 
> If we go for a probabilistic reduction of EMC lookups we'd statistically have a
> balanced improvement in all (known and unknown) scenarios.
> 
> BR, Jan
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] netdev-dpdk: Create separate memory pool for each port

2017-08-22 Thread Fischetti, Antonio
Hi Kevin, pls see comments inline.
I'm going to rebase and rework this patch on behalf of Robert.  

Thanks,
Antonio

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Kevin Traynor
> Sent: Tuesday, April 11, 2017 9:37 AM
> To: Wojciechowicz, RobertX ;
> d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v3] netdev-dpdk: Create separate memory pool for
> each port
> 
> On 04/07/2017 09:20 AM, Robert Wojciechowicz wrote:
> > Since it's possible to delete memory pool in DPDK
> > we can try to estimate better required memory size
> > when port is reconfigured, e.g. with different number
> > of rx queues.
> >
> > Signed-off-by: Robert Wojciechowicz 
> > Acked-by: Ian Stokes 
> > ---
> > v2:
> > - removing mempool reference counter
> > - making sure mempool name isn't longer than RTE_MEMPOOL_NAMESIZE
> >
> > v3:
> > - adding memory for corner cases
> > ---
> >  lib/netdev-dpdk.c | 118 ++--
> >  1 file changed, 57 insertions(+), 61 deletions(-)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index ddc651b..6b781ac 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -275,14 +275,12 @@ static struct ovs_list dpdk_list OVS_GUARDED_BY(dpdk_mutex)
> >  static struct ovs_mutex dpdk_mp_mutex OVS_ACQ_AFTER(dpdk_mutex)
> >  = OVS_MUTEX_INITIALIZER;
> >
> > -static struct ovs_list dpdk_mp_list OVS_GUARDED_BY(dpdk_mp_mutex)
> > -= OVS_LIST_INITIALIZER(&dpdk_mp_list);
> > -
> >  struct dpdk_mp {
> >  struct rte_mempool *mp;
> >  int mtu;
> >  int socket_id;
> > -int refcount;
> > +char if_name[IFNAMSIZ];
> > +unsigned mp_size;
> >  struct ovs_list list_node OVS_GUARDED_BY(dpdk_mp_mutex);
> >  };
> >
> > @@ -463,78 +461,82 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp,
> >  dp_packet_init_dpdk((struct dp_packet *) pkt, pkt->buf_len);
> >  }
> >
> > +/*
> > + * XXX Full DPDK memory pool name must be unique
> > + * and cannot be longer than RTE_MEMPOOL_NAMESIZE
> > + */
> > +static char *
> > +dpdk_mp_name(struct dpdk_mp *dmp)
> > +{
> > +uint32_t h = hash_string(dmp->if_name, 0);
> > +char *mp_name = xcalloc(RTE_MEMPOOL_NAMESIZE, sizeof(char));
> > +int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%u",
> > +   h, dmp->mtu, dmp->mp_size);
> > +if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
> > +return NULL;
> > +}
> > +return mp_name;
> > +}
> > +
> >  static struct dpdk_mp *
> > -dpdk_mp_create(int socket_id, int mtu)
> > +dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
> >  {
> >  struct rte_pktmbuf_pool_private mbp_priv;
> >  struct dpdk_mp *dmp;
> > -unsigned mp_size;
> >  char *mp_name;
> >
> >  dmp = dpdk_rte_mzalloc(sizeof *dmp);
> >  if (!dmp) {
> >  return NULL;
> >  }
> > -dmp->socket_id = socket_id;
> > +dmp->socket_id = dev->requested_socket_id;
> >  dmp->mtu = mtu;
> > -dmp->refcount = 1;
> > +strncpy(dmp->if_name, dev->up.name, IFNAMSIZ);
> >  mbp_priv.mbuf_data_room_size = MBUF_SIZE(mtu) - sizeof(struct dp_packet);
> >  mbp_priv.mbuf_priv_size = sizeof(struct dp_packet)
> > -  - sizeof(struct rte_mbuf);
> > -/* XXX: this is a really rough method of provisioning memory.
> > - * It's impossible to determine what the exact memory requirements are
> > - * when the number of ports and rxqs that utilize a particular mempool can
> > - * change dynamically at runtime. For now, use this rough heurisitic.
> > +- sizeof(struct rte_mbuf);
> > +/*
> > + * XXX: rough estimation of memory required for port:
> > + * 
> > + * + 
> > + * + 
> > + * + 
> >   */
> > -if (mtu >= ETHER_MTU) {
> > -mp_size = MAX_NB_MBUF;
> > -} else {
> > -mp_size = MIN_NB_MBUF;
> > -}
> >
> > -do {
> > -mp_name = xasprintf("ovs_mp_%d_%d_%u", dmp->mtu, dmp->socket_id,
> > -mp_size);
> > +dmp->mp_size = dev->requested_n_rxq * dev->requested_rxq_size
> > ++ dev->requested_n_txq * dev->requested_txq_size
> > ++ MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
> > ++ MIN_NB_MBUF;
> >
> > -dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SIZE(mtu),
> > +do {
> > +mp_name = dpdk_mp_name(dmp);
> > +dmp->mp = rte_mempool_create(mp_name, dmp->mp_size, MBUF_SIZE(mtu),
> >   MP_CACHE_SZ,
> >   sizeof(struct rte_pktmbuf_pool_private),
> >   rte_pktmbuf_pool_init, &mbp_priv,
> >   ovs_rte_pktmbuf_init, NULL,
> > - socket_id, 0);
> > +

Re: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for recirc packets

2017-08-17 Thread Fischetti, Antonio
Thanks Jan for your feedback and the interesting usecases described. 
Please find below some questions/comments I added inline.

Regards,
-Antonio


> -Original Message-
> From: Jan Scheurich [mailto:jan.scheur...@ericsson.com]
> Sent: Wednesday, August 16, 2017 5:24 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; Darrell Ball
> <db...@vmware.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for
> recirc packets
> 
> Hi,
> 
> I agree that in the event of EMC overload it is beneficial to reduce the number
> of EMC insertions and lookups as they just generate overhead and degrade
> overall throughput. At the same time we want to keep as much of the EMC
> acceleration as possible for a fraction of traffic that can benefit from EMC
> most.

[Antonio]
I fully agree: the goal should be to reserve the EMC acceleration for a
'fraction' of the traffic.


> 
> For EMC insertion we already did this earlier by introducing
> probabilistic EMC insertion, which greatly reduces the costly effect of EMC
> thrashing. But we didn't touch the lookup part. How should we select the
> packets (or rather packet datapath traversals) for which to perform lookup?
> 
> There are several proposals in the air: Only do it for the first pass, not for
> recirculated packets, only do it for RSS hash values below a (dynamic)
> threshold, possibly others.
> 
> For EMC insertion we consciously settled on a random selection as the datapath
> has no a priori insight into which flows are better candidates than others and
> big flows that benefit most have a higher chance of getting cached.
> 
> Is there a reason to assume that a deterministic selection on some non-random
> criteria like the recirculation count will on average (over deployments and
> applications) give a better performance than a random selection?

[Antonio]
If we consider latency and jitter, a deterministic solution should be
preferable to one that behaves differently depending
on the particular values of the packet fields, e.g. the IP addresses.


> 
> I don't believe so. For example, the number of "EMC flows" in each pass through
> the datapath can differ hugely: 1 GRE tunnel flow in first pass (from phy
> port), 100K tenant flows after tunnel decapsulation. Or 100K tenant flows in
> first pass (from VM) but 1 flow after NSH encapsulation in second pass.

[Antonio]
Maybe I'm wrong but shouldn't the different flows encapped in a GRE 
tunnel hit the EMC in different locations? Because even if they all have the 
same outer IP addresses, they differ in the L4 ports so the 5-tuple hash
- and the emc locations - should vary. Same thing for NSH encapsulation?


> 
> I believe a random selection with dynamically adapted probability is the best
> we can do without a priori knowledge about the traffic patterns and pipeline
> organization.

[Antonio]
This proposal is orthogonal to other approaches that look at the usage
of individual locations, e.g. policies that avoid overwriting active
locations or that reduce EMC usage in general.
I think we should consider both strategies, to tackle two different
aspects of the thrashing and use the EMC more efficiently:
 1. skip EMC lookup/insert for recirculated packets (activated only when
    the number of EMC entries exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD);
 2. any other strategy that limits EMC usage or offers better entry
    eviction.

So, regardless of the traffic type, if we have 100k flows
that could potentially be recirculated:
 - strategy 1 tackles the thrashing due to recirculation, and is activated
   when the number of EMC entries exceeds a threshold;
 - strategy 2 limits EMC usage to fewer flows, because we don't want
   100k flows to hit the EMC.
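
Strategy 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: the struct, its field names and the function are hypothetical, the EMC size of 8192 is the "8k" figure from the patch description, and the threshold is left as a plain parameter rather than the macro used in the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EMC_ENTRIES 8192u   /* EMC size quoted in the thread (8k). */

/* Hypothetical per-thread EMC state; field names are illustrative only. */
struct emc_state {
    uint32_t n_entries;         /* Currently occupied EMC entries. */
    uint32_t recirc_threshold;  /* Fill level above which recirculated
                                 * packets bypass the EMC. */
};

/* Returns true if this datapath pass may use the EMC (lookup and insert).
 * 'recirc_depth' is 0 for the first pass and > 0 for recirculated passes. */
static bool
emc_use_for_packet(const struct emc_state *emc, uint32_t recirc_depth)
{
    assert(emc->n_entries <= EMC_ENTRIES);

    if (recirc_depth == 0) {
        return true;            /* Original packets always use the EMC. */
    }
    /* Recirculated packets use the EMC only while it is lightly loaded. */
    return emc->n_entries <= emc->recirc_threshold;
}
```

Passes that return false here would go straight to the dpcls lookup, which is the penalty for recirculated packets discussed in this thread.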

> 
> The RSS hash threshold method looks like the only pseudo-random criterion that
> we can use that produces a consistent result for every packet of a flow and does
> not require more information. Of course elephant flows with an unlucky hash value
> might never get to use the EMC, but that risk we have with any stateless
> selection scheme.
> 
> The new thing required will be the dynamic adjustment of lookup probability to
> the EMC fill level and/or hit ratio. Any ideas for that? I guess we'd need a
> scheme that periodically increases the probability again to probe for changed
> traffic patterns.
> 
> Once we have that I think the same dynamic probability could be possible to use
> also for probabilistic EMC insertion.
> 
> BR, Jan
> 
> > -Original Message-
> > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > boun...@openvswitch.org] On Behalf Of Fischetti, Antonio
> > Sent: Wednesday, 16 August, 2017 14:42
> > To: Darrell Ball <db...

Re: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for recirc packets

2017-08-16 Thread Fischetti, Antonio

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Wednesday, August 16, 2017 9:09 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for
> recirc packets
> 
> 
> 
> -Original Message-
> From: "Fischetti, Antonio" <antonio.fische...@intel.com>
> Date: Tuesday, August 15, 2017 at 6:55 AM
> To: Darrell Ball <db...@vmware.com>, "d...@openvswitch.org"
> <d...@openvswitch.org>
> Subject: RE: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for
> recirc packets
> 
> 
> 
> > -Original Message-
> > From: Darrell Ball [mailto:db...@vmware.com]
> > Sent: Monday, August 14, 2017 7:27 AM
> > To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for
> > recirc packets
> >
> >
> >
> > -Original Message-
> > From: <ovs-dev-boun...@openvswitch.org> on behalf of
> > "antonio.fische...@intel.com" <antonio.fische...@intel.com>
> > Date: Friday, August 11, 2017 at 8:52 AM
> > To: "d...@openvswitch.org" <d...@openvswitch.org>
> > Subject: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for
> > recirc packets
> >
> > When OVS is configured as a firewall, with thousands of active
> > concurrent connections, the EMC gets quickly saturated and may
> > come under heavy thrashing for the reason that original and
> > recirculated packets keep overwriting the existing active EMC
> > entries due to its limited size (8k).
> >
> >
> > The recirculated packet could have been modified, in which case, maybe we
> > still want to do the emc lookup/insert ?
> 
> [Antonio]
> IMPO I'd say we should still skip emc anyway, because the purpose is to
> mitigate thrashing when emc is full. So any recirculated packet should
> be classified at the dpcls/ofproto layers.
> I don't know if I'm missing something from your question?
> 
> We can expect that a recirc pkt that has been modified - similarly to all
> other recirculated pkts - could result in a miss when emc is full.
> Later we should do an emc insertion that is likely to overwrite some
> active entry. And recursively, this new insertion itself could be
> overwritten - due to the shortage of locations - even before it is hit
> again. This proposal is to mitigate the thrashing with the criteria of
> reserving emc usage to original packets only.
> So a limited resource like emc hopefully could be used more efficiently,
> especially when there is more than 1 recirculation.
> I guess that adding an exception for modified recirc pkts could also
> reduce the throughput a bit, as we would have to add another if statement
> inside emc_processing.
> 
> [Darrell]
> I can drop the edited packet case as my concern was really more general.
> The concern is that recirculated packets should still be forwarded quickly if
> possible, and using emc should help that. The first time through, emc is used
> for the packet and then the second time through, emc is not used, so it is
> slower. But, possibly the argument could be made that since it is recirculated,
> it is already slower, in which case, maybe a penalty for recirculated packets
> is reasonable.

[Antonio]
Agree. Other than that, in case of EMC congestion (e.g. a firewall with
say 6,000 connections) with a lot of overwrites, many lookups will fail
and the new insertions will just overwrite active flows. The lookup
failure rate stays high and the continuous overwrites on insertion become
pure overhead. So in this case there is a penalty both for the original
packets (i.e. the first time through) and for the recirculated ones.
With this approach we consider that 6,000 flows would need at
least 12,000 entries with 1 recirculation. So one strategy to reduce thrashing
could be to restrict EMC usage to original packets only. The drawback is
that recirculated packets are slower, but the overall effect should be a
benefit.
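
The arithmetic behind the 6,000 flows / 12,000 entries figure can be written out as a small sketch. The helper names are hypothetical; the only inputs taken from this thread are the EMC size (8k entries) and the assumption that every datapath pass of a flow wants its own EMC entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EMC_ENTRIES 8192u   /* EMC size quoted in the thread (8k). */

/* Each flow occupies one entry for the original pass plus one entry per
 * recirculation, assuming every pass is allowed to use the EMC. */
static uint32_t
emc_entries_needed(uint32_t n_flows, uint32_t n_recirc)
{
    assert(n_recirc < 16);      /* Keep the illustration in a sane range. */
    return n_flows * (1 + n_recirc);
}

/* True if the demand fits in the EMC, ignoring hash collisions (which
 * make the real situation worse than this estimate). */
static bool
emc_demand_fits(uint32_t n_flows, uint32_t n_recirc)
{
    return emc_entries_needed(n_flows, n_recirc) <= EMC_ENTRIES;
}
```

With 6,000 flows and 1 recirculation the demand is 12,000 entries, which exceeds 8192; restricting the EMC to original packets brings it back to 6,000.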


> Instead of having a simple 50% black and white cutoff, maybe a penalty to the
> insertion probability could be used ?

[Antonio]
Yes, at the beginning I was considering this solution. I then preferred 
the current one because it allows not only to skip insertions but also 
to skip lookups, espec

Re: [ovs-dev] [PATCH 0/5] dpif-netdev: Cuckoo-Distributor implementation

2017-08-09 Thread Fischetti, Antonio
Any comment on this patchset?

Adding Jan in CC.

In one of the last bi-weekly meetings there was some interest in testing
this patchset in conjunction with the patch to avoid using EMC for
recirculated packets - this is contained inside the patchset
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335938.html


Thanks,
-Antonio


> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Wang, Yipeng1
> Sent: Tuesday, July 11, 2017 8:59 PM
> To: Darrell Ball ; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 0/5] dpif-netdev: Cuckoo-Distributor
> implementation
> 
> Thank you Darrell for the comments.
> 
> For those who are interested, this patch is mainly for improving the subtable
> lookup process when the subtable count is large. We heard about use cases where
> the current sequential search of subtables is not efficient enough.  With 30
> subtables, this patch could achieve more than 2x speedup.  Basically, a hash
> table is used to direct the packets to the correct subtable.
> 
> We also plan a replacement policy mechanism for version 2; our initial results
> show another 7% improvement on top of the current CD for certain use cases.
> 
> Please feel free to comment and share any thought on this patch.
> 
> Thanks
> Yipeng
> 
> > -Original Message-
> > From: Darrell Ball [mailto:db...@vmware.com]
> > Sent: Friday, July 7, 2017 6:37 PM
> > To: Wang, Yipeng1 ; d...@openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH 0/5] dpif-netdev: Cuckoo-Distributor
> > implementation
> >
> > I just noticed this patch set has not had much discussion since the RFC
> > version.
> > It would be nice if the discussion can be revived.
> >
> > Thanks Darrell
> >
> >
> > On 6/13/17, 4:09 PM, "ovs-dev-boun...@openvswitch.org on behalf of
> > yipeng1.w...@intel.com"  > yipeng1.w...@intel.com> wrote:
> >
> > From: Yipeng Wang 
> >
> > The Datapath Classifier uses tuple space search for flow classification.
> > The rules are arranged into a set of tuples/subtables (each with a
> > distinct mask).  Each subtable is implemented as a hash table and lookup
> > is done with flow keys formed by selecting the bits from the packet header
> > based on each subtable's mask. Tuple space search will sequentially search
> > each subtable until a match is found. With a large number of subtables, a
> > sequential search of the subtables could consume a lot of CPU cycles. In
> > a testbench with a uniform traffic pattern equally distributed across 20
> > subtables, we measured that up to 65% of total execution time is attributed
> > to the megaflow cache lookup.
> >
> > This patch presents the idea of a two-layer hierarchical lookup, where a
> > low overhead first level of indirection is accessed first; we call this
> > level the cuckoo distributor (CD). If a flow key has been inserted in the
> > flow table, the first level will indicate with high probability which
> > subtable to look into. A lookup is performed on the second level (the
> > target subtable) to retrieve the result. If the key doesn’t have a match,
> > then we revert back to the sequential search of subtables. The patch is
> > partially inspired by earlier concepts proposed in "simTable"[1] and
> > "Cuckoo Filter"[2], and DPDK's Cuckoo Hash implementation.
> >
> > This patch can improve the already existing Subtable Ranking when traffic
> > data has high entropy. Subtable Ranking helps minimize the number of
> > traversed subtables when most of the traffic hit the same subtable.
> > However, in the case of high entropy traffic such as traffic coming from
> > a physical port, multiple subtables could be hit with a similar frequency.
> > In this case the average subtable lookups per hit would be much greater
> > than 1. In addition, CD can adaptively turn off when it finds the traffic
> > mostly hit one subtable. Thus, CD will not be an overhead when Subtable
> > Ranking works well.
> >
> > Scheme:
> >
> >           -------
> >          |  CD   |
> >           -------
> >              \
> >               \
> >  -----     -----     -----
> > |sub  |   |sub  |...|sub  |
> > |table|   |table|   |table|
> >  -----     -----     -----
> > Evaluation:
> >
> > We create a set of rules with various src IP. We feed traffic containing
> > various numbers of flows with various src IP and dst IP. All the flows hit
> > 10/20/30 rules creating 10/20/30 subtables.
> >
> > The table below shows the preliminary continuous testing results (full line
> > speed test) we collected with a uni-directional phy-to-phy setup. The
> > machine we tested on is a Xeon E5 server running with 2.2GHz cores. OvS
> > runs with 1 PMD. We use Spirent as the 

Re: [ovs-dev] [PATCH v2 2/2] dpif-netdev: Fix emc replacement policy.

2017-08-08 Thread Fischetti, Antonio


> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Tuesday, August 8, 2017 8:16 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; Ilya Maximets
> <i.maxim...@samsung.com>; ovs-dev@openvswitch.org
> Cc: Heetae Ahn <heetae82@samsung.com>
> Subject: Re: [ovs-dev] [PATCH v2 2/2] dpif-netdev: Fix emc replacement policy.
> 
> Hi Antonio
> 
> Would you mind sharing your distribution algorithm ?
> I would like to understand how you saw some benefit in the 1000-5000 range.
> 
> 1000| 1000, 999   7.85, 8.09  |   1000, 1000  8.77, 8.91
> 3000| 2993, 2987  7.61, 7.87  |   3000, 3000  8.58, 7.89
> 5000| 4336, 4872  7.49, 7.26  |   4496, 5000  8.57, 7.28
> 
> Thanks Darrell

[Antonio]
I set up the generator to loop on the dest IP addr on one side,
and loop instead on the source IP addr on the other side.

For example to generate 10 different flows, I was sending to phy port #1
UDP, IPsrc:10.10.10.10, IPdest: 20.20.20.[20-29], PortSrc: 63, PortDest: 63

Instead to phy port #2 (source and dest IPs are now swapped):
UDP, IPsrc: 20.20.20.[20-29], IPdest: 10.10.10.10, PortSrc: 63, PortDest: 63

I use this setup to quickly simulate some UDP connections, although the
flows created in this way may not resemble a real traffic dataset.

> 
> -Original Message-----
> From: <ovs-dev-boun...@openvswitch.org> on behalf of "Fischetti, Antonio"
> <antonio.fische...@intel.com>
> Date: Thursday, August 3, 2017 at 8:10 AM
> To: Ilya Maximets <i.maxim...@samsung.com>, "ovs-dev@openvswitch.org"  d...@openvswitch.org>
> Cc: Heetae Ahn <heetae82@samsung.com>
> Subject: Re: [ovs-dev] [PATCH v2 2/2] dpif-netdev: Fix emc replacement policy.
> 
> LGTM.
> 
> I think this patch could work fine in conjunction with the one I posted at
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335940.html
> where I'm targeting a congestion usecase with recirculated packets.
> The goal is still to limit thrashing and the criterion is to avoid
> EMC lookup and insertions for recirculated packets.
> 
> I gave a try to your 2 patches, I'm using
> Commit 325b2b1a493a2230072de726bbb53a8337759f39
> 
> On top of that I applied Ciara's patch "dpif-netdev: add EMC entry count
> and %full figure to pmd-stats-show" at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327570.html
> because I wanted to see the count of EMC entries.
> 
> I generated the different traffic streams by looping on IP addresses.
> Of course results are strictly dependent on the particular data traffic.
> 
> I sent 64B UDP packets at line rate and measured the Rx packet
> rate, regardless of pkt loss.
> 
> Flow setup:
> table=0, in_port=dpdk0 actions=output:dpdk1
> table=0, in_port=dpdk1 actions=output:dpdk0
> 
> 2 PMDs, 3 Tx queues.
> 
> To read entries counts:
>  sudo ./utilities/ovs-appctl dpif-netdev/pmd-stats-show | grep entries
> 
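> Results
> =======
> 
> +---------+--------------------------+--------------------------+
> |         | Orig + Ciara's patch     | + these 2 patches        |
> +---------+--------------------------+--------------------------+
> | Streams | Entries     Rx [Mpps]    | Entries     Rx [Mpps]    |
> +---------+--------------------------+--------------------------+
> | 100     | 100, 100    9.76, 10.05  | 100, 100    10.55, 10.72 |
> | 1000    | 1000, 999   7.85, 8.09   | 1000, 1000  8.77, 8.91   |
> | 3000    | 2993, 2987  7.61, 7.87   | 3000, 3000  8.58, 7.89   |
> | 5000    | 4336, 4872  7.49, 7.26   | 4496, 5000  8.57, 7.28   |
> | 7000    | 5870, 7000  8.57, 7.20   | 6039, 7000  8.56, 7.26   |
> | 9000    | 6550, 7572  7.19, 6.91   | 6643, 7836  7.06, 6.90   |
> | 11000   | 7152, 8192  6.81, 6.83   | 7158, 8192  6.81, 6.79   |
> +---------+--------------------------+--------------------------+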
> 
> It's interesting to see how these patches allow more EMC locations to be used
> (see the cases from 3000 to 9000 streams), thereby reducing thrashing.
> 
> 
> -Antonio
> 
> 
> > -Original Message-
> > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org]
> > On

Re: [ovs-dev] [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC probability.

2017-08-08 Thread Fischetti, Antonio


> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Tuesday, August 8, 2017 8:12 AM
> To: Ilya Maximets <i.maxim...@samsung.com>; Wang, Yipeng1
> <yipeng1.w...@intel.com>; ovs-dev@openvswitch.org
> Cc: Heetae Ahn <heetae82@samsung.com>; Kevin Traynor <ktray...@redhat.com>;
> Loftus, Ciara <ciara.lof...@intel.com>; Fischetti, Antonio
> <antonio.fische...@intel.com>
> Subject: Re: [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC
> probability.
> 
> 
> 
> -Original Message-
> From: Ilya Maximets <i.maxim...@samsung.com>
> Date: Monday, August 7, 2017 at 4:54 AM
> To: "Wang, Yipeng1" <yipeng1.w...@intel.com>, Darrell Ball <db...@vmware.com>,
> "ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>
> Cc: Heetae Ahn <heetae82@samsung.com>, Kevin Traynor <ktray...@redhat.com>,
> "Loftus, Ciara" <ciara.lof...@intel.com>, "Fischetti, Antonio"
> <antonio.fische...@intel.com>
> Subject: Re: [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC
> probability.
> 
> On 04.08.2017 23:39, Wang, Yipeng1 wrote:
> >
> >
> >> -Original Message-
> >> From: Darrell Ball [mailto:db...@vmware.com]
> >> Sent: Friday, August 4, 2017 11:35 AM
> >> To: Ilya Maximets <i.maxim...@samsung.com>; ovs-dev@openvswitch.org
> >> Cc: Heetae Ahn <heetae82@samsung.com>; Wang, Yipeng1
> >> <yipeng1.w...@intel.com>; Kevin Traynor <ktray...@redhat.com>; Loftus,
> >> Ciara <ciara.lof...@intel.com>; Fischetti, Antonio
> >> <antonio.fische...@intel.com>
> >> Subject: Re: [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC
> >> probability.
> >>
> >>
> >>
> >> -Original Message-
> >> From: Ilya Maximets <i.maxim...@samsung.com>
> >> Date: Friday, August 4, 2017 at 7:17 AM
> >> To: "ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>
> >> Cc: Heetae Ahn <heetae82@samsung.com>, Darrell Ball
> >> <db...@vmware.com>, Yipeng Wang <yipeng1.w...@intel.com>, Kevin
> >> Traynor <ktray...@redhat.com>, Ciara Loftus <ciara.lof...@intel.com>,
> >> Antonio Fischetti <antonio.fische...@intel.com>, Ilya Maximets
> >> <i.maxim...@samsung.com>
> >> Subject: [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC
> >> probability.
> >>
> >> Currently, real insertion probability is higher than configured
> >> for the maximum case because of wrong usage of the random value.
> >>
> >> i.e. if 'emc-insert-inv-prob' == UINT32_MAX, then 'emc_insert_min'
> >> equals to 1. In this case we're allowing insert if random value
> >> is less or equal to 1. So, two of 2**32 values (0 and 1) are
> >> allowed and real probability is 2 times higher than configured.
> >>
> >> This happens because 'random_uint32()' returns value in range
> >> [0; UINT32_MAX], but for the checking to be correct we should
> >> generate random value in range [0; UINT32_MAX - 1].
> >>
> >>
> >> I understand the calculation is slightly off.
> >> If the user enters 4,294,967,295 then the probability to insert into emc
> >> will be 2 out of 4,294,967,295 rather than 1 out of 4,294,967,295,
> >>
> >> However, is there a general concern about such a low probability anyways ?
> >> This max inverse value would be rarely, if ever used and if used, it would
> >> be impossible to see the difference in any real use case.
> >>
> >> The user might as well just disable emc rather than use such tiny
> >> probabilities.
> >>
> >> This existing api was discussed extensively and was very contentious.
> >>
> >> However, if patch 2 really has an absolute dependency on this patch 1, we
> >> can include it.
> >> I have done various testing and don’t see that, but I have some comments
> >> on the other threads.
> >>
> >>
> >>
> > [Wang, Yipeng] The dependency I can tell is that when do EMC_insert, the
> random bit is chosen by (random_value >> EM_FLOW_INSERT_IN

Re: [ovs-dev] [PATCH v3 2/2] dpif-netdev: Fix emc replacement policy.

2017-08-04 Thread Fischetti, Antonio
LGTM

Acked-by: Antonio Fischetti <antonio.fische...@intel.com>

> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Friday, August 4, 2017 3:17 PM
> To: ovs-dev@openvswitch.org
> Cc: Heetae Ahn <heetae82@samsung.com>; Darrell Ball <db...@vmware.com>;
> Wang, Yipeng1 <yipeng1.w...@intel.com>; Kevin Traynor <ktray...@redhat.com>;
> Loftus, Ciara <ciara.lof...@intel.com>; Fischetti, Antonio
> <antonio.fische...@intel.com>; Ilya Maximets <i.maxim...@samsung.com>
> Subject: [PATCH v3 2/2] dpif-netdev: Fix emc replacement policy.
> 
> Current EMC replacement policy allows an active EMC entry to be replaced
> even if there are dead (empty) entries available. This leads to
> EMC thrashing even with a few hundred flows. In some cases PMD
> threads start to execute classifier lookups even in tests with
> 50 - 100 active flows.
> 
> Looks like the hash comparison rule was introduced to randomly
> choose one of the alive entries to replace. But it doesn't work as
> needed, and hashes have nothing in common with randomness.
> 
> Let's fix the replacement policy by removing hash checking and
> using the random value passed from 'emc_probabilistic_insert()'
> only while considering replacement of an alive entry.
> This should give us a nearly fair way to choose the entry to replace.
> 
> We are avoiding calculation of the new random value by reusing
> bits of already generated random for probabilistic EMC insertion.
> Bits higher than 'EM_FLOW_INSERT_INV_PROB_SHIFT' are used because
> lower bits are less than 'min' and not fully random.
> 
> Not replacing alive entries while dead ones exist allows us to
> significantly decrease EMC thrashing.
> 
> Testing shows stable work of exact match cache without misses
> with up to 3072 - 6144 active flows (depends on traffic pattern).
> 
> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
> ---
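
The replacement policy described in the commit message above can be sketched as follows. This is a simplified illustration and not the code from the patch: the struct, the function name and the way random bits are consumed are assumptions; the only behaviour taken from the description is "use a dead entry if one exists, otherwise pick one of the alive candidates using random bits supplied by the caller".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified EMC entry: only the liveness flag matters here. */
struct emc_entry_sketch {
    bool alive;
    uint32_t hash;
};

/* Pick which of the 'n' candidate entries to overwrite with a new flow.
 * 'random_bits' is assumed to come from the random value already generated
 * for probabilistic insertion, so no extra random number is needed. */
static size_t
emc_pick_victim(const struct emc_entry_sketch *cand, size_t n,
                uint32_t random_bits)
{
    assert(n > 0);

    /* Never evict an alive entry while a dead one is available. */
    for (size_t i = 0; i < n; i++) {
        if (!cand[i].alive) {
            return i;
        }
    }

    /* All candidates are alive: choose one of them at random. */
    return random_bits % n;
}
```

The design point is that random bits are only consulted when every candidate is alive, so dead entries get filled first and the cache reaches a higher fill level before any active flow is evicted.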


Re: [ovs-dev] [PATCH v2 1/2] dpif-netdev: Decrease range of values for EMC probability.

2017-08-04 Thread Fischetti, Antonio
My reply inline.

Acked-by: Antonio Fischetti <antonio.fische...@intel.com>

> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Friday, August 4, 2017 12:38 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; ovs-dev@openvswitch.org
> Cc: Heetae Ahn <heetae82@samsung.com>; Loftus, Ciara
> <ciara.lof...@intel.com>
> Subject: Re: [ovs-dev] [PATCH v2 1/2] dpif-netdev: Decrease range of values for
> EMC probability.
> 
> Hi Antonio,
> 
> Thanks for review and testing. Comments inline.
> 
> Best regards, Ilya Maximets.
> 
> On 03.08.2017 18:37, Fischetti, Antonio wrote:
> > LGTM,
> > just wondering if a further comment would help,
> > eg something like
> >
> > +/* random_uint32() returns a value in the [0; UINT32_MAX] range.
> > +   For our checking to be correct we would need instead a random value
> > +   in the range [0; UINT32_MAX - 1]. To avoid further computation
> > +   we use a decreased range of available values for 'emc-insert-inv-prob'
> > +   ie [0; 2**20 - 1]. */
> > +#define EM_FLOW_INSERT_INV_PROB_SHIFT 20
> > +#define EM_FLOW_INSERT_INV_PROB_MAX  (1 << EM_FLOW_INSERT_INV_PROB_SHIFT)
> > +#define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1)
> >
> > ?
> >
> > -Antonio
> 
> I don't think that such a comment will be useful for the reader.
> This is more like explanation why we made this change and it should
> be in commit message (which already has this information).
> 
> In addition, the next patch adds build time assert which will forbid
> using of EM_FLOW_INSERT_INV_PROB_SHIFT higher than 30.
> 
> One thing we may clarify here is why 20 was chosen.
> Something like this:
> 
> /* Set up maximum inverse EMC insertion probability to 2^20 - 1.
>  * Higher values considered useless in practice. */
> 
> What do you think?

[Antonio] yes, sounds good. Thanks.
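
For reference, the effect of the reduced range can be sketched as follows. The three macros are the ones quoted above (written with an unsigned constant here); the two helper functions and their names are hypothetical and only illustrate how masking the random value to 20 bits makes the check exact, they are not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EM_FLOW_INSERT_INV_PROB_SHIFT 20
#define EM_FLOW_INSERT_INV_PROB_MAX  (1u << EM_FLOW_INSERT_INV_PROB_SHIFT)
#define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1)

/* Hypothetical helper: convert the configured inverse probability into
 * the insertion threshold.  0 disables insertion entirely. */
static uint32_t
emc_insert_min_from_inv_prob(uint32_t inv_prob)
{
    assert(inv_prob <= EM_FLOW_INSERT_INV_PROB_MAX);
    return inv_prob ? EM_FLOW_INSERT_INV_PROB_MAX / inv_prob : 0;
}

/* Hypothetical helper: the masked random value lies in [0; 2**20 - 1], so
 * exactly 'min' of the 2**20 possible values pass the strict comparison. */
static bool
emc_should_insert(uint32_t random_value, uint32_t min)
{
    return (random_value & EM_FLOW_INSERT_INV_PROB_MASK) < min;
}
```

With an inverse probability of 2**20 the threshold is 1 and only the masked value 0 passes, i.e. exactly 1 in 2**20; with an inverse probability of 1 the threshold is 2**20 and every value passes.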


> 
> >
> >> -Original Message-
> >> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> >> On Behalf Of Ilya Maximets
> >> Sent: Monday, July 31, 2017 3:41 PM
> >> To: ovs-dev@openvswitch.org
> >> Cc: Heetae Ahn <heetae82@samsung.com>; Ilya Maximets
> >> <i.maxim...@samsung.com>
> >> Subject: [ovs-dev] [PATCH v2 1/2] dpif-netdev: Decrease range of values for EMC
> >> probability.
> >> Currently, real insertion probability is higher than configured
> >> for the maximum case because of wrong usage of the random value.
> >>
> >> i.e. if 'emc-insert-inv-prob' == UINT32_MAX, then 'emc_insert_min'
> >> equals 1. In this case we're allowing insert if the random value
> >> is less than or equal to 1. So, two of 2**32 values (0 and 1) are
> >> allowed and the real probability is 2 times higher than configured.
> >>
> >> This happens because 'random_uint32()' returns value in range
> >> [0; UINT32_MAX], but for the checking to be correct we should
> >> generate random value in range [0; UINT32_MAX - 1].
> >>
> >> To fix this we have 4 possible solutions:
> >>
> >>  1. need to use uint64_t for 'emc-insert-min' and calculate it
> >> as '(UINT32_MAX + 1) / inverse_prob' to fairly check the full
> >> range [0; UINT32_MAX].
> >>
> >> This may decrease performance because of 64 bit atomic ops.
> >>
> >>  2. Forbid the '2**32 - 1' as the value for 'emc-insert-min'
> >> because it's the only value we have issues with.
> >>
> >> This will require additional explanations and not very friendly
> >> for users.
> >>
> >>  3. Generate random value in range [0; UINT32_MAX - 1].
> >>
> >> This will require heavy division operation.
> >>
> >>  4. Decrease the range of available values for 'emc-insert-inv-prob'.
> >>
> >> Actually, we don't need to have so many different values for
> >> that option. I believe that values higher than 1M are completely
> >> useless. Choosing the upper limit as a power of 2 like 2**20 we
> >> will be able to mask the generated random value in a fast way
> >> and also avoid the range issue, because the same uint32_t can be used to
> >> store 2**20.
> >>
> >> This patch implements solution #4.
> >>
> >> CC: Ciara Loftus <ciara.lof...@intel.com>
> >> Fixes: 4c30b24602c3 ("dpif-netdev: Conditional EMC insert")
> >> Signed-off-by: Ilya Maximets &

Re: [ovs-dev] [PATCH v2 1/2] dpif-netdev: Decrease range of values for EMC probability.

2017-08-03 Thread Fischetti, Antonio
LGTM, 
just wondering if a further comment would help, 
eg something like

+/* random_uint32() returns a value in the [0; UINT32_MAX] range.
+   For our checking to be correct we would need instead a random value
+   in the range [0; UINT32_MAX - 1]. To avoid further computation 
+   we use a decreased range of available values for 'emc-insert-inv-prob'
+   ie [0; 2**20 - 1]. */
+#define EM_FLOW_INSERT_INV_PROB_SHIFT 20
+#define EM_FLOW_INSERT_INV_PROB_MAX  (1 << EM_FLOW_INSERT_INV_PROB_SHIFT)
+#define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1)

?

-Antonio
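
To make the off-by-one concrete, here is a minimal standalone sketch (not
part of the patch) that compares the old and the new acceptance checks.
The helper names old_accepts() and new_accepts() are made up for this
illustration, the random value is passed in as a parameter instead of
calling random_uint32(), and out-of-range 'inv_prob' values are simply
treated as "never insert" rather than falling back to the default as the
patch does.

```c
#include <stdbool.h>
#include <stdint.h>

#define EM_FLOW_INSERT_INV_PROB_SHIFT 20
#define EM_FLOW_INSERT_INV_PROB_MAX  (1u << EM_FLOW_INSERT_INV_PROB_SHIFT)
#define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1)

/* Old check (hypothetical helper): min = UINT32_MAX / inv_prob and the
 * insert is accepted when rnd <= min.  For inv_prob == UINT32_MAX, min
 * is 1, so both 0 and 1 are accepted: two values out of 2**32. */
static bool
old_accepts(uint32_t rnd, uint32_t inv_prob)
{
    uint32_t min = inv_prob ? UINT32_MAX / inv_prob : 0;

    return min && rnd <= min;
}

/* New check (hypothetical helper): only the low 20 bits of rnd are
 * compared, with a strict '<', so exactly 'min' of the 2**20 masked
 * values are accepted. */
static bool
new_accepts(uint32_t rnd, uint32_t inv_prob)
{
    uint32_t min = 0;

    if (inv_prob && inv_prob < EM_FLOW_INSERT_INV_PROB_MAX) {
        min = EM_FLOW_INSERT_INV_PROB_MAX / inv_prob;
    }
    return min && (rnd & EM_FLOW_INSERT_INV_PROB_MASK) < min;
}
```

With the largest allowed value, inv_prob == 2**20 - 1, 'min' is 1 and only
a masked value of 0 passes, which is the intended single value per range.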

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Ilya Maximets
> Sent: Monday, July 31, 2017 3:41 PM
> To: ovs-dev@openvswitch.org
> Cc: Heetae Ahn ; Ilya Maximets
> 
> Subject: [ovs-dev] [PATCH v2 1/2] dpif-netdev: Decrease range of values for 
> EMC
> probability.
> 
> Currently, real insertion probability is higher than configured
> for the maximum case because of wrong usage of the random value.
> 
> i.e. if 'emc-insert-inv-prob' == UINT32_MAX, then 'emc_insert_min'
> equals 1. In this case we're allowing insert if the random value
> is less than or equal to 1. So, two of 2**32 values (0 and 1) are
> allowed and the real probability is 2 times higher than configured.
> 
> This happens because 'random_uint32()' returns value in range
> [0; UINT32_MAX], but for the checking to be correct we should
> generate random value in range [0; UINT32_MAX - 1].
> 
> To fix this we have 4 possible solutions:
> 
>  1. need to use uint64_t for 'emc-insert-min' and calculate it
> as '(UINT32_MAX + 1) / inverse_prob' to fairly check the full
> range [0; UINT32_MAX].
> 
> This may decrease performance because of 64 bit atomic ops.
> 
>  2. Forbid the '2**32 - 1' as the value for 'emc-insert-min'
> because it's the only value we have issues with.
> 
> This will require additional explanations and not very friendly
> for users.
> 
>  3. Generate random value in range [0; UINT32_MAX - 1].
> 
> This will require heavy division operation.
> 
>  4. Decrease the range of available values for 'emc-insert-inv-prob'.
> 
> Actually, we don't need to have so many different values for
> that option. I believe that values higher than 1M are completely
> useless. Choosing the upper limit as a power of 2 like 2**20 we
> will be able to mask the generated random value in a fast way
> and also avoid the range issue, because the same uint32_t can be used to
> store 2**20.
> 
> This patch implements solution #4.
> 
> CC: Ciara Loftus 
> Fixes: 4c30b24602c3 ("dpif-netdev: Conditional EMC insert")
> Signed-off-by: Ilya Maximets 
> ---
> 
> Infrastructure and logic introduced here will be used for fixing
> emc replacement policy.
> 
>  lib/dpif-netdev.c| 12 
>  vswitchd/vswitch.xml |  3 ++-
>  2 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 47a9fa0..123a7c9 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -152,9 +152,12 @@ struct netdev_flow_key {
>  #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1)
>  #define EM_FLOW_HASH_SEGS 2
> 
> +#define EM_FLOW_INSERT_INV_PROB_SHIFT 20
> +#define EM_FLOW_INSERT_INV_PROB_MAX  (1 << EM_FLOW_INSERT_INV_PROB_SHIFT)
> +#define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1)
>  /* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */
>  #define DEFAULT_EM_FLOW_INSERT_INV_PROB 100
> -#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \
> +#define DEFAULT_EM_FLOW_INSERT_MIN (EM_FLOW_INSERT_INV_PROB_MAX /\
>  DEFAULT_EM_FLOW_INSERT_INV_PROB)
> 
>  struct emc_entry {
> @@ -2077,7 +2080,7 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread
> *pmd,
>  uint32_t min;
>  atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);
> 
> -if (min && random_uint32() <= min) {
> +if (min && (random_uint32() & EM_FLOW_INSERT_INV_PROB_MASK) < min) {
>  emc_insert(&pmd->flow_cache, key, flow);
>  }
>  }
> @@ -2894,8 +2897,9 @@ dpif_netdev_set_config(struct dpif *dpif, const struct
> smap *other_config)
>  }
> 
>  atomic_read_relaxed(&dp->emc_insert_min, &cur_min);
> -if (insert_prob <= UINT32_MAX) {
> -insert_min = insert_prob == 0 ? 0 : UINT32_MAX / insert_prob;
> +if (insert_prob < EM_FLOW_INSERT_INV_PROB_MAX) {
> +insert_min = insert_prob == 0
> + ? 0 : EM_FLOW_INSERT_INV_PROB_MAX / insert_prob;
>  } else {
>  insert_min = DEFAULT_EM_FLOW_INSERT_MIN;
>  insert_prob = DEFAULT_EM_FLOW_INSERT_INV_PROB;
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 074535b..61f252e 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -381,7 +381,8 @@
>
> 

Re: [ovs-dev] [PATCH v2 2/2] dpif-netdev: Fix emc replacement policy.

2017-08-03 Thread Fischetti, Antonio
LGTM.

I think this patch could work well in conjunction with the one I posted at
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335940.html
where I'm targeting a congestion use case with recirculated packets.
The goal there is still to limit thrashing, and the criterion is to avoid
EMC lookups and insertions for recirculated packets.

I gave your 2 patches a try, on top of
commit 325b2b1a493a2230072de726bbb53a8337759f39

On top of that I applied Ciara's patch "dpif-netdev: add EMC entry count 
and %full figure to pmd-stats-show" at:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327570.html
because I wanted to see the count of EMC entries.

I generated the different traffic streams by looping on IP addresses.
Of course, the results depend strictly on the particular traffic used.

I sent 64B UDP packets at line rate and measured the Rx packet
rate, regardless of packet loss.

Flow setup:
table=0, in_port=dpdk0 actions=output:dpdk1
table=0, in_port=dpdk1 actions=output:dpdk0

2 PMDs, 3 Tx queues.

To read the entry counts:
 sudo ./utilities/ovs-appctl dpif-netdev/pmd-stats-show | grep entries

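Results
=======

+---------+---------------------------+---------------------------+
|         | Orig + Ciara's patch      | + these 2 patches         |
+---------+------------+--------------+------------+--------------+
| Streams | Entries    | Rx [Mpps]    | Entries    | Rx [Mpps]    |
+---------+------------+--------------+------------+--------------+
| 100     | 100, 100   | 9.76, 10.05  | 100, 100   | 10.55, 10.72 |
| 1000    | 1000, 999  | 7.85, 8.09   | 1000, 1000 | 8.77, 8.91   |
| 3000    | 2993, 2987 | 7.61, 7.87   | 3000, 3000 | 8.58, 7.89   |
| 5000    | 4336, 4872 | 7.49, 7.26   | 4496, 5000 | 8.57, 7.28   |
| 7000    | 5870, 7000 | 8.57, 7.20   | 6039, 7000 | 8.56, 7.26   |
| 9000    | 6550, 7572 | 7.19, 6.91   | 6643, 7836 | 7.06, 6.90   |
| 11000   | 7152, 8192 | 6.81, 6.83   | 7158, 8192 | 6.81, 6.79   |
+---------+------------+--------------+------------+--------------+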

It's interesting to see how these patches make it possible to use more EMC
locations (see the cases from 3000 to 9000 streams), thus reducing thrashing.


-Antonio


> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Ilya Maximets
> Sent: Monday, July 31, 2017 3:41 PM
> To: ovs-dev@openvswitch.org
> Cc: Heetae Ahn ; Ilya Maximets
> 
> Subject: [ovs-dev] [PATCH v2 2/2] dpif-netdev: Fix emc replacement policy.
> 
> The current EMC replacement policy allows replacing an active EMC entry
> even if there are dead (empty) entries available. This leads to
> EMC thrashing even with a few hundred flows. In some cases PMD
> threads start to execute classifier lookups even in tests with
> 50 - 100 active flows.
> 
> It looks like the hash comparison rule was introduced to randomly
> choose one of the alive entries to replace. But it doesn't work as
> needed, and hashes have nothing in common with randomness.
> 
> Let's fix the replacement policy by removing the hash check and
> using the random value passed from 'emc_probabilistic_insert()'
> only when considering replacement of an alive entry.

[Antonio]
I like the approach to re-use the info we already have, in this case
the random value.
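
As a rough sketch of the policy described in the commit message (this is
not the code from the patch), the choice of the slot to overwrite could
look like the following. The struct toy_entry type and the pick_slot()
helper are hypothetical, and using a modulo on the upper bits is my own
simplification; only the two macro names and their values come from the
patches.

```c
#include <stdbool.h>
#include <stdint.h>

#define EM_FLOW_HASH_SEGS 2
#define EM_FLOW_INSERT_INV_PROB_SHIFT 20

/* Hypothetical stand-in for an EMC entry: only liveness matters here. */
struct toy_entry {
    bool alive;
};

/* Returns the index of the candidate slot to overwrite. */
static int
pick_slot(const struct toy_entry cand[EM_FLOW_HASH_SEGS],
          uint32_t random_value)
{
    int i;

    /* Never evict an alive entry while a dead one is available. */
    for (i = 0; i < EM_FLOW_HASH_SEGS; i++) {
        if (!cand[i].alive) {
            return i;
        }
    }

    /* All candidates are alive: decide with the bits above the shift,
     * which the insertion-probability check did not look at. */
    return (int) ((random_value >> EM_FLOW_INSERT_INV_PROB_SHIFT)
                  % EM_FLOW_HASH_SEGS);
}
```

The point of the sketch is only the ordering: dead entries first, and the
random bits are consulted only when every candidate is alive.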

> This should give us a nearly fair way to choose the entry to replace.
> 
> We avoid calculating a new random value by reusing
> bits of the random value already generated for probabilistic EMC insertion.
> Bits higher than 'EM_FLOW_INSERT_INV_PROB_SHIFT' are used because
> the lower bits are less than 'min' and not fully random.
> 
> Not replacing alive entries while dead ones exist allows us to
> significantly decrease EMC thrashing.
> 
> Testing shows stable operation of the exact match cache without misses
> with up to 3072 - 6144 active flows (depending on traffic pattern).
> 
> Signed-off-by: Ilya Maximets 
> ---
>  lib/dpif-netdev.c | 23 ---
>  1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 123a7c9..a714329 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -155,6 +155,9 @@ struct netdev_flow_key {
>  #define EM_FLOW_INSERT_INV_PROB_SHIFT 20
>  #define EM_FLOW_INSERT_INV_PROB_MAX  (1 << EM_FLOW_INSERT_INV_PROB_SHIFT)
>  #define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1)
> +/* We will use bits higher than EM_FLOW_INSERT_INV_PROB_SHIFT of the random
> + * value for EMC replacement policy. */
> +BUILD_ASSERT_DECL(32 - EM_FLOW_INSERT_INV_PROB_SHIFT >= EM_FLOW_HASH_SEGS);
>  /* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */
>  #define DEFAULT_EM_FLOW_INSERT_INV_PROB 100
>  #define DEFAULT_EM_FLOW_INSERT_MIN (EM_FLOW_INSERT_INV_PROB_MAX /\
> @@ -2041,7 +2044,7 @@ emc_change_entry(struct emc_entry *ce, struct
> dp_netdev_flow *flow,
> 
>  static inline void
>  emc_insert(struct emc_cache *cache, const struct netdev_flow_key *key,
> -   struct 

Re: [ovs-dev] [PATCH v2 3/5] dpif-netdev: Skip EMC lookup/insert for recirc packets.

2017-08-02 Thread Fischetti, Antonio

> -Original Message-
> From: O Mahony, Billy
> Sent: Tuesday, August 1, 2017 11:51 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v2 3/5] dpif-netdev: Skip EMC lookup/insert for
> recirc packets.
> 
> Hi Antonio,
> 
> Unfortunately I think the performance deltas of this here probably need to be
> re-worked given the bug discovered & fixed in EMC Insertion algorithm here
> which according to the patch notes will significantly reduce EMC contention 
> for
> a given number of flows.
> 
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336452.html

[Antonio] I think this patch and the one you mentioned are 2 different
approaches with 2 different goals, and they can work fine together.

  "Fix emc replacement policy" patch
  ----------------------------------
It selects, better than the current code does, which location to overwrite,
so that the EMC is used in a smarter way. The use case here is general
EMC replacement management, even with very few flows, i.e. 50 - 100
active flows.
When it has to choose between 2 active flows, it decides
based on a good random value.

  This patch
  ----------
This patch instead targets a 'congestion' use case where the EMC is
already quite full and there are also recirculation(s). A typical example is a
firewall keeping track of tens of thousands of connections. A better
example would be a scenario - as Jan S. mentioned in one of the last
Community calls - with 'more than 1' recirculation.
It also defines a criterion for avoiding lookups.

I think both patches can work together.
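
To illustrate the criterion of this patch, here is a minimal sketch of the
decision, not the actual patch code. The emc_allowed_for_packet() helper
and the *_PCT macro are names I made up for the example; the 8k size and
the 50% threshold are the figures given in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define EMC_ENTRIES 8192   /* "limited size (8k)" from the commit message. */
/* Hypothetical name: the 50% empirical threshold, as a percentage. */
#define EMC_RECIRC_NO_INSERT_THRESHOLD_PCT 50

/* Returns true if the packet may use the EMC (lookup and insert).
 * Original packets always may; recirculated packets are sent straight
 * to the classifier once EMC occupancy exceeds the threshold. */
static bool
emc_allowed_for_packet(uint32_t emc_entries_in_use, bool is_recirculated)
{
    bool congested = emc_entries_in_use * 100
                     > (uint32_t) EMC_ENTRIES
                       * EMC_RECIRC_NO_INSERT_THRESHOLD_PCT;

    return !(congested && is_recirculated);
}
```

So with 4096 entries or fewer in use nothing changes, and above that only
the recirculated packets bypass the EMC.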


> 
> However, before you commit more effort I would like to post a proposal to the
> list on a more generalized EMC load-shedding mechanism which I think could be
> more effective as it would be more granular than shedding just re-circulated
> traffic. I hope to post that today.

[Antonio] I'll have a look.


> 
> Regards,
> /Billy
> 
> > -Original Message-
> > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > boun...@openvswitch.org] On Behalf Of antonio.fische...@intel.com
> > Sent: Wednesday, July 19, 2017 5:05 PM
> > To: d...@openvswitch.org
> > Subject: [ovs-dev] [PATCH v2 3/5] dpif-netdev: Skip EMC lookup/insert for
> > recirc packets.
> >
> > When OVS is configured as a firewall, with thousands of active concurrent
> > connections, the EMC gets quickly saturated and may come under heavy
> > thrashing because original and recirculated packets keep
> > overwriting existing active EMC entries due to its limited size (8k).
> >
> > This thrashing causes the EMC to be less efficient than the dpcls in terms
> > of lookups and insertions.
> >
> > This patch allows to use the EMC efficiently by allowing only the 'original'
> > packets to hit EMC. All recirculated packets are sent to the classifier
> directly.
> > An empirical threshold (EMC_RECIRCT_NO_INSERT_THRESHOLD - of 50%) for
> > EMC occupancy is set to trigger this logic. By doing so when EMC utilization
> > exceeds
> > EMC_RECIRCT_NO_INSERT_THRESHOLD:
> >  - EMC Insertions are allowed just for original packets. EMC insertion
> >and look up is skipped for recirculated packets.
> >  - Recirculated packets are sent to the classifier.
> >
> > This patch is based on patch
> > "dpif-netdev: add EMC entry count and %full figure to pmd-stats-show" at:
> > https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327570.html
> > Also, this patch depends on the previous one in this series.
> >
> > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> > Signed-off-by: Bhanuprakash Bodireddy
> > <bhanuprakash.bodire...@intel.com>
> > Co-authored-by: Bhanuprakash Bodireddy
> > <bhanuprakash.bodire...@intel.com>
> > ---
> > In our Connection Tracker testbench set up with
> >
> >  table=0, priority=1 actions=drop
> >  table=0, priority=10,arp actions=NORMAL  table=0, priority=100,ct_state=-
> > trk,ip actions=ct(table=1)  table=1, ct_state=+new+trk,ip,in_port=1
> > actions=ct(commit),output:2  table=1, ct_state=+est+trk,ip,in_port=1
> > actions=output:2  table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
> > table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1
> >
> > we saw the following performance improvement.
> >
> > We measured packet Rx rate (regardless of packet loss). Bidirectional test
> > with 64B UDP packets.
> > Each row is a test with a different number of traffic streams. The traffic
> > generator is set so that each stream establishes one U

Re: [ovs-dev] [PATCH v2 2/5] dpif-netdev: Avoid reading RSS hash when EMC is disabled.

2017-08-01 Thread Fischetti, Antonio


> -Original Message-
> From: O Mahony, Billy
> Sent: Monday, July 31, 2017 5:22 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v2 2/5] dpif-netdev: Avoid reading RSS hash when
> EMC is disabled.
> 
> Hi Antonio,
> 
> This patch is definitely simpler than the original.
> 
> However on the original patch I suggested:
> 
> "If so it would be less disturbing to the existing code to just add a bool arg
> to dpif_netdev_packet_get_rss_hash() called do_not_check_recirc_depth and use
> that to return early (before the if (recirc_depth) check). Also in that case
> the patch would require none of the  conditional logic changes (neither the
> original or that suggested in this email) and should be able to just set the
> proposed do_not_check_recirc_depth based on md_is_valid."
> 
> I know you checked this and reported the performance gain was lower than with
> the v1 patch. We surmised that it was related to introducing a branch in the
> dpif_netdev_packet_get_rss_hash(). However there are many branches in this
> patch also.
> 
> Can you give details of how you are testing?
> * What is the traffic
> * the flows/rules and
> * how are you measuring the performance difference  (ie. cycles per packet or
> packet throughput or some other measure).

[Antonio]
I'm using a port-to-port setup, sending 1 UDP flow, 64B packets.
2 PMDs with 3 Tx queues.
The biggest difference is with case A where the 5-tuple hash is computed 
in software.

Case A) RSS Hash is disabled. I see for each side the Rx pkt rate:
 Orig OvS + previous patch#1:                1.28    1.29   = 2.57 Mpps
 Orig OvS + previous patch#1 + this patch:   1.47    1.49   = 2.96 Mpps

Case B) In case RSS Hash is enabled I see more or less the same performance:
 Orig OvS + previous patch#1:               11.59   11.72   = 23.31 Mpps
 Orig OvS + previous patch#1 + this patch:  11.45   11.84   = 23.29 Mpps

Case C) RSS Hash is enabled, EMC is disabled.
 Orig OvS + previous patch#1 + No EMC:       7.68    7.65   = 15.33 Mpps
 Orig OvS + previous patch#1 + this patch:   7.62    7.73   = 15.35 Mpps

> 
> Apologies for going on about this but if we can get the same effect with a
> two or three line change rather than a 20-line change I think it'll be worth it.
> 
> One other comment below
> 
> Thanks,
> Billy.
> 
> 
> > -Original Message-
> > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > boun...@openvswitch.org] On Behalf Of antonio.fische...@intel.com
> > Sent: Wednesday, July 19, 2017 5:05 PM
> > To: d...@openvswitch.org
> > Subject: [ovs-dev] [PATCH v2 2/5] dpif-netdev: Avoid reading RSS hash when
> > EMC is disabled.
> >
> > When EMC is disabled the reading of RSS hash is skipped.
> 
> [[BO'M]] I think this is already the case with the existing code?  Just
> addition of OVS_UNLIKELY on the check.
[Antonio] No, in the existing code when EMC is disabled it just skips 
emc_lookup.

miniflow_extract(packet, &key->mf);
key->len = 0; /* Not computed yet. */
key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);  <--- Read Hash

/* If EMC is disabled skip emc_lookup */
flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key);  <--- Skip emc_lookup
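
For comparison, a minimal self-contained sketch of the behaviour this patch
is after, where the hash read is skipped together with the lookup when the
EMC is disabled. Everything prefixed with toy_ is a hypothetical stand-in
(it is not the OVS API), the multiplicative hash is just a placeholder for
the software 5-tuple hash, and the 'hash_reads' counter exists only to make
the difference observable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet carrying an optional NIC RSS hash. */
struct toy_packet {
    bool rss_valid;
    uint32_t rss_hash;
    uint32_t five_tuple;    /* Stand-in for the parsed headers. */
};

static unsigned int hash_reads;

/* Use the RSS hash if the NIC provided one, otherwise compute a
 * software hash and store it in the packet for later reuse. */
static uint32_t
toy_get_hash(struct toy_packet *pkt)
{
    hash_reads++;
    if (!pkt->rss_valid) {
        pkt->rss_hash = pkt->five_tuple * 2654435761u;  /* Placeholder. */
        pkt->rss_valid = true;
    }
    return pkt->rss_hash;
}

/* Returns true if an EMC lookup would be attempted for this packet.
 * With cur_min == 0 (EMC disabled) neither the hash read nor the
 * lookup takes place. */
static bool
toy_emc_stage(struct toy_packet *pkt, uint32_t cur_min, uint32_t *hash)
{
    if (cur_min == 0) {
        return false;
    }
    *hash = toy_get_hash(pkt);
    return true;
}
```

In the existing code quoted above the equivalent of toy_get_hash() runs
unconditionally; here it runs only on the EMC-enabled path.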


> 
> > For packets that are not recirculated it retrieves the hash value without
> > considering the recirc id.
> >
> > This is mostly a preliminary change for the next patch in this series.
> >
> > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> > ---
> >  lib/dpif-netdev.c | 42 ++
> >  1 file changed, 34 insertions(+), 8 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 123e04a..9562827
> > 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -4472,6 +4472,22 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread
> > *pmd, struct dp_packet *packet_,  }
> >
> >  static inline uint32_t
> > +dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet,
> > +const struct miniflow *mf) {
> > +uint32_t hash;
> > +
> > +if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
> > +hash = dp_packet_get_rss_hash(packet);
> > +} else {
> > +hash = miniflow_hash_5tuple(mf, 0);
> > +dp_packet_set_rss_hash(packet, hash);
> > +}
> > +
> > +return hash;
> > +}
> > +
> > +static inline uint32_t
> >  dpif_netdev_packet_get_rss_hash(struct dp_packet *packet,
> > 

Re: [ovs-dev] [patch_v2] dpdk: Fix device cleanup.

2017-07-31 Thread Fischetti, Antonio

> -Original Message-
> From: Aaron Conole [mailto:acon...@redhat.com]
> Sent: Sunday, July 30, 2017 4:19 PM
> To: Darrell Ball <dlu...@gmail.com>
> Cc: d...@openvswitch.org; Fischetti, Antonio <antonio.fische...@intel.com>;
> Ilya Maximets <i.maxim...@samsung.com>
> Subject: Re: [ovs-dev] [patch_v2] dpdk: Fix device cleanup.
> 
> Darrell Ball <dlu...@gmail.com> writes:
> 
> > Commit 5dcde09c80a8 was introduced to make detaching more
> > automatic without using an additional command beyond
> > ovs-vsctl del-port  .
> >
> > Sometimes, since commit 5dcde09c80a8, dpdk devices are
> > not detached when del-port is issued; command example:
> >
> > sudo ovs-vsctl del-port br0 dpdk1
> >
> > This can happen when vswitchd is (re)started with an existing
> > database and devices are already bound to dpdk.
> >
> > A minimal recipe to reproduce the issue is:
> >
> > 1/ Starting with
> >
> > darrell@prmh-nsx-perf-server125:~$ sudo ovs-vsctl show
> > 1c50d8ee-b17f-4fac-a595-03b0da8c8275
> > Bridge "br0"
> > Port "br0"
> > Interface "br0"
> > type: internal
> > Port "dpdk1"
> > Interface "dpdk1"
> > type: dpdk
> > options: {dpdk-devargs=":04:00.1"}
> > Port "dpdk0"
> > Interface "dpdk0"
> > type: dpdk
> > options: {dpdk-devargs=":04:00.0"}
> >
> > darrell@prmh-nsx-perf-server125:~$ /usr/src/dpdk-16.11/tools/dpdk-
> devbind.py --status
> >
> > Network devices using DPDK-compatible driver
> > 
> > :04:00.0 'Ethernet Controller 10-Gigabit X540-AT2'
> drv=uio_pci_generic unused=ixgbe,vfio-pci
> > :04:00.1 'Ethernet Controller 10-Gigabit X540-AT2'
> drv=uio_pci_generic unused=ixgbe,vfio-pci
> >
> > 2/ restart vswitchd
> >
> > 3/ run
> >  sudo ovs-vsctl del-port br0 dpdk1
> >
> > and find the interface is NOT detached; there is
> > no info log ‘Device ':04:00.1' detached’.
> >
> > A more verbose discussion is here:
> > https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/333462.html
> > along with another possible solution.
> >
> > Since we are nearing the end of a release, a safe approach is needed,
> > at this time.
> > One approach is to revert 5dcde09c80a8.  This patch does not do that
> > but reinstates the command ovs-appctl netdev-dpdk/detach to handle
> > cases when del-port will not work.
> >
> > To detach the device, run the reinstated command
> > ovs-appctl netdev-dpdk/detach :04:00.1
> > Observe console output
> > ‘Device ':04:00.1' has been detached’
> >
> > Fixes: 5dcde09c80a8 ("netdev-dpdk: Fix device leak on port deletion.")
> > CC: Ilya Maximets <i.maxim...@samsung.com>
> > Acked-by: Fischetti, Antonio <antonio.fische...@intel.com>
> > Signed-off-by: Darrell Ball <dlu...@gmail.com>
> > ---
> 
> LGTM.
> 
> Acked-by: Aaron Conole <acon...@redhat.com>

LGTM, I see it works as expected.

Acked-by: Antonio Fischetti <antonio.fische...@intel.com>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [patch_v1] dpdk: Fix device cleanup.

2017-07-28 Thread Fischetti, Antonio
LGTM, I agree with this safe approach.

I did some basic testing with a Niantic NIC but I couldn't
replicate the issue. Some details on my testing are below.

To cause the issue I killed and restarted vswitchd only, so
that it could find and run with an existing database with 
the devices already bound to dpdk. Then I deleted one dpdk port
and re-added it later on.

In both cases - with and without this patch - I couldn't see
any difference, in particular when debugging the following 
functions:
 common_construct()
 netdev_dpdk_destruct()
 netdev_dpdk_process_devargs()


With the patch, when I delete a port I don't see messages like
Device ':xx:00.0' has been detached
Should I expect to see that?


I saw that if I call
ovs-appctl netdev-dpdk/detach :xx:00.0
before deleting the port it works fine by displaying
Device YYY is being used by interface... Remove it...
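
As I read the netdev_dpdk_detach() handler quoted below, the order of the
checks behind that reply can be sketched as follows. This is only an
illustration: the enum and the toy_detach_check() helper are hypothetical,
and the two booleans stand in for the DPDK port lookup and the
netdev_dpdk_lookup_by_port_id() result.

```c
#include <stdbool.h>

/* Hypothetical outcome codes for the detach request. */
enum toy_detach_result {
    TOY_DETACH_NOT_FOUND,   /* "Device '...' not found in DPDK" */
    TOY_DETACH_IN_USE,      /* "... is being used by interface ..." */
    TOY_DETACH_OK           /* Device can be closed and detached. */
};

/* Mirrors the order of the checks: first the device must be known to
 * DPDK, then it must not be attached to an OVS interface any more. */
static enum toy_detach_result
toy_detach_check(bool known_to_dpdk, bool used_by_interface)
{
    if (!known_to_dpdk) {
        return TOY_DETACH_NOT_FOUND;
    }
    if (used_by_interface) {
        return TOY_DETACH_IN_USE;
    }
    return TOY_DETACH_OK;
}
```

That matches what I observed: calling detach before del-port hits the
"in use" branch.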


Regards,
-Antonio

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Darrell Ball
> Sent: Wednesday, July 26, 2017 6:10 AM
> To: d...@openvswitch.org
> Cc: Ilya Maximets 
> Subject: [ovs-dev] [patch_v1] dpdk: Fix device cleanup.
> 
> Commit 5dcde09c80a8 was introduced to make detaching more
> automatic without using an additional command.
> 
> Sometimes, since commit 5dcde09c80a8, dpdk devices are
> not detached when del-port is issued; command example:
> 
> sudo ovs-vsctl del-port br0 dpdk1
> 
> This can happen when vswitchd is (re)started with an existing
> database and devices are already bound to dpdk.
> 
> A discussion is here:
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/333462.html
> along with a possible solution.
> 
> Since we are nearing the end of a release, a safe approach is needed,
> at this time.
> One approach is to revert 5dcde09c80a8.  This patch does not do that
> but reinstates the command ovs-appctl netdev-dpdk/detach to handle
> cases when del-port will not work.
> 
> Fixes: 5dcde09c80a8 ("netdev-dpdk: Fix device leak on port deletion.")
> CC: Ilya Maximets 
> Signed-off-by: Darrell Ball 
> ---
>  Documentation/howto/dpdk.rst | 12 ++
>  lib/netdev-dpdk.c| 52
> +++-
>  2 files changed, 63 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index af01d3e..3c198a2 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -331,6 +331,18 @@ Detaching will be performed while processing del-port
> command::
> 
>  $ ovs-vsctl del-port dpdkx
> 
> +Sometimes, the del-port command may not detach the device.
> +Detaching can be confirmed by the appearance of an INFO log.
> +For example::
> +
> +Device ':04:00.0' has been detached
> +
> +If the log is not seen, then the port can be detached using::
> +
> +$ ovs-appctl netdev-dpdk/detach :01:00.0
> +
> +Again, detaching can be confirmed by the above INFO log.
> +
>  This feature is not supported with VFIO and does not work with some NICs.
>  For more information please refer to the `DPDK Port Hotplug Framework
> 
>  >`__.
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index ea17b97..812d262 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1013,7 +1013,7 @@ netdev_dpdk_destruct(struct netdev *netdev)
>  if (rte_eth_dev_detach(dev->port_id, devname) < 0) {
>  VLOG_ERR("Device '%s' can not be detached", dev->devargs);
>  } else {
> -VLOG_INFO("Device '%s' detached", devname);
> +VLOG_INFO("Device '%s' has been detached", devname);
>  }
>  }
> 
> @@ -2449,6 +2449,53 @@ netdev_dpdk_set_admin_state(struct unixctl_conn
> *conn, int argc,
>  unixctl_command_reply(conn, "OK");
>  }
> 
> +static void
> +netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
> +   const char *argv[], void *aux OVS_UNUSED)
> +{
> +int ret;
> +char *response;
> +uint8_t port_id;
> +char devname[RTE_ETH_NAME_MAX_LEN];
> +struct netdev_dpdk *dev;
> +
> +ovs_mutex_lock(&dpdk_mutex);
> +
> +if (!rte_eth_dev_count() || rte_eth_dev_get_port_by_name(argv[1],
> + &port_id)) {
> +response = xasprintf("Device '%s' not found in DPDK", argv[1]);
> +goto error;
> +}
> +
> +dev = netdev_dpdk_lookup_by_port_id(port_id);
> +if (dev) {
> +response = xasprintf("Device '%s' is being used by interface
> '%s'. "
> + "Remove it before detaching",
> + argv[1], netdev_get_name(&dev->up));
> +goto error;
> +}
> +
> +rte_eth_dev_close(port_id);
> +
> +ret = rte_eth_dev_detach(port_id, devname);
> +if (ret < 0) {
> +response = 

Re: [ovs-dev] [PATCH RFC 2/4] dpif-netdev: Skip EMC lookup/insert for recirculated packets.

2017-07-26 Thread Fischetti, Antonio
Thanks for your feedback, good to see this could follow up with some further
solutions. In the meantime - based also on your suggestions - I posted a 
v2 of this patch at 
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335940.html

It's in a patchset that begins at 
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335938.html

-Antonio

> -Original Message-
> From: O Mahony, Billy
> Sent: Wednesday, July 26, 2017 9:55 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH RFC 2/4] dpif-netdev: Skip EMC lookup/insert
> for recirculated packets.
> 
> Hi Antonio,
> 


Re: [ovs-dev] [PATCH v4] dpctl: Add new 'ct-bkts' command.

2017-07-25 Thread Fischetti, Antonio
Hi Darrell, I posted a v5 at http://patchwork.ozlabs.org/patch/792723/
where I added all your suggestions.

Thanks,
-Antonio
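
For anyone skimming the thread, the counting that 'ct-bkts' reports (see
the command description quoted below) boils down to something like this
sketch. It is not the code from the patch: toy_ct_bkts() and its
parameters are hypothetical, and the connections are reduced to an array
holding the bucket index of each one.

```c
#include <stddef.h>

/* Fills 'counts' (tot_bkts elements) with the number of connections in
 * each bucket and returns how many buckets hold more than 'gt'
 * connections, i.e. the buckets that 'gt=N' would display. */
static size_t
toy_ct_bkts(const unsigned int *conn_bkt, size_t n_conns,
            unsigned int *counts, size_t tot_bkts, unsigned int gt)
{
    size_t i;
    size_t n_over = 0;

    for (i = 0; i < tot_bkts; i++) {
        counts[i] = 0;
    }
    for (i = 0; i < n_conns; i++) {
        if (conn_bkt[i] < tot_bkts) {   /* Ignore out-of-range indexes. */
            counts[conn_bkt[i]]++;
        }
    }
    for (i = 0; i < tot_bkts; i++) {
        if (counts[i] > gt) {
            n_over++;
        }
    }
    return n_over;
}
```

A heavily skewed result here is the kind of hint about hash distribution
across buckets that the command is meant to give.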

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Fischetti, Antonio
> Sent: Monday, July 24, 2017 10:45 AM
> To: Darrell Ball <db...@vmware.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v4] dpctl: Add new 'ct-bkts' command.
> 
> Thanks Darrell, I will apply your suggestions.
> 
> -Antonio
> 
> > -Original Message-
> > From: Darrell Ball [mailto:db...@vmware.com]
> > Sent: Sunday, July 23, 2017 6:28 PM
> > To: Fischetti, Antonio <antonio.fische...@intel.com>;
> d...@openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH v4] dpctl: Add new 'ct-bkts' command.
> >
> > Minor comment:
> >
> > When applying the patch, there is a complaint about new whitespace below
> 
> [Antonio] will remove it.
> 
> 
> >
> > +the number of connections in a bucket is greater than \fIThreshold\fR.
> > +
> >
> > Two comments inline
> >
> >
> > -Original Message-
> > From: <ovs-dev-boun...@openvswitch.org> on behalf of
> > "antonio.fische...@intel.com" <antonio.fische...@intel.com>
> > Date: Saturday, July 22, 2017 at 7:38 AM
> > To: "d...@openvswitch.org" <d...@openvswitch.org>
> > Subject: [ovs-dev] [PATCH v4] dpctl: Add new 'ct-bkts' command.
> >
> > With the command:
> >  ovs-appctl dpctl/ct-bkts
> > shows the number of connections per bucket.
> >
> > By using a threshold:
> >  ovs-appctl dpctl/ct-bkts gt=N
> > for each bucket shows the number of connections when they
> > are greater than N.
> >
> > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> > Signed-off-by: Bhanuprakash Bodireddy
> > <bhanuprakash.bodire...@intel.com>
> > Co-authored-by: Bhanuprakash Bodireddy
> > <bhanuprakash.bodire...@intel.com>
> > ---
> >  lib/conntrack.c|   9 ++--
> >  lib/conntrack.h|   2 +-
> >  lib/ct-dpif.c  |   4 +-
> >  lib/ct-dpif.h  |   3 +-
> >  lib/dpctl.c| 108
> > -
> >  lib/dpctl.man  |   8 +++
> >  lib/dpif-netdev.c  |   4 +-
> >  lib/dpif-netlink.c |   4 +-
> >  lib/dpif-provider.h|   2 +-
> >  lib/netlink-conntrack.c|   6 ++-
> >  lib/netlink-conntrack.h|   3 +-
> >  tests/test-netlink-conntrack.c |   3 +-
> >  utilities/ovs-dpctl.c  |   1 +
> >  13 files changed, 140 insertions(+), 17 deletions(-)
> >
> > diff --git a/lib/conntrack.c b/lib/conntrack.c
> > index de46a6b..e290b20 100644
> > --- a/lib/conntrack.c
> > +++ b/lib/conntrack.c
> > @@ -1931,7 +1931,7 @@ conn_key_to_tuple(const struct conn_key *key,
> > struct ct_dpif_tuple *tuple)
> >
> >  static void
> >  conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry
> > *entry,
> > -  long long now)
> > +  long long now, int bkt)
> >  {
> >  struct ct_l4_proto *class;
> >  long long expiration;
> > @@ -1954,11 +1954,12 @@ conn_to_ct_dpif_entry(const struct conn
> *conn,
> > struct ct_dpif_entry *entry,
> >  if (class->conn_get_protoinfo) {
> >  class->conn_get_protoinfo(conn, &entry->protoinfo);
> >  }
> > +entry->bkt = bkt;
> >  }
> >
> >  int
> >  conntrack_dump_start(struct conntrack *ct, struct conntrack_dump
> > *dump,
> > - const uint16_t *pzone)
> > + const uint16_t *pzone, int *ptot_bkts)
> >  {
> >  memset(dump, 0, sizeof(*dump));
> >  if (pzone) {
> > @@ -1967,6 +1968,8 @@ conntrack_dump_start(struct conntrack *ct,
> > struct conntrack_dump *dump,
> >  }
> >  dump->ct = ct;
> >
> > +*ptot_bkts = CONNTRACK_BUCKETS;
> > +
> >  return 0;
> >  }
> >
> > @@ -1991,7 +1994,7 @@ conntrack_dump_next(struct conntrack_dump
> *dump,
> > struct ct_dpif_entry *entry)
> >  INIT_CONTAINER(conn, node, node);
> >  if ((!dump->fi

Re: [ovs-dev] [PATCH v4] dpctl: Add new 'ct-bkts' command.

2017-07-24 Thread Fischetti, Antonio
Thanks Darrell, I will apply your suggestions.

-Antonio

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Sunday, July 23, 2017 6:28 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v4] dpctl: Add new 'ct-bkts' command.
> 
> Minor comment:
> 
> When applying the patch, there is a complaint about new whitespace below

[Antonio] will remove it.


> 
> +the number of connections in a bucket is greater than \fIThreshold\fR.
> +
> 
> Two comments inline
> 
> 
> -Original Message-
> From: <ovs-dev-boun...@openvswitch.org> on behalf of
> "antonio.fische...@intel.com" <antonio.fische...@intel.com>
> Date: Saturday, July 22, 2017 at 7:38 AM
> To: "d...@openvswitch.org" <d...@openvswitch.org>
> Subject: [ovs-dev] [PATCH v4] dpctl: Add new 'ct-bkts' command.
> 
> With the command:
>  ovs-appctl dpctl/ct-bkts
> shows the number of connections per bucket.
> 
> By using a threshold:
>  ovs-appctl dpctl/ct-bkts gt=N
> for each bucket shows the number of connections when they
> are greater than N.
> 
> Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> Signed-off-by: Bhanuprakash Bodireddy
> <bhanuprakash.bodire...@intel.com>
> Co-authored-by: Bhanuprakash Bodireddy
> <bhanuprakash.bodire...@intel.com>
> ---
>  lib/conntrack.c|   9 ++--
>  lib/conntrack.h|   2 +-
>  lib/ct-dpif.c  |   4 +-
>  lib/ct-dpif.h  |   3 +-
>  lib/dpctl.c| 108
> -
>  lib/dpctl.man  |   8 +++
>  lib/dpif-netdev.c  |   4 +-
>  lib/dpif-netlink.c |   4 +-
>  lib/dpif-provider.h|   2 +-
>  lib/netlink-conntrack.c|   6 ++-
>  lib/netlink-conntrack.h|   3 +-
>  tests/test-netlink-conntrack.c |   3 +-
>  utilities/ovs-dpctl.c  |   1 +
>  13 files changed, 140 insertions(+), 17 deletions(-)
> 
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index de46a6b..e290b20 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -1931,7 +1931,7 @@ conn_key_to_tuple(const struct conn_key *key,
> struct ct_dpif_tuple *tuple)
> 
>  static void
>  conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry
> *entry,
> -  long long now)
> +  long long now, int bkt)
>  {
>  struct ct_l4_proto *class;
>  long long expiration;
> @@ -1954,11 +1954,12 @@ conn_to_ct_dpif_entry(const struct conn *conn,
> struct ct_dpif_entry *entry,
>  if (class->conn_get_protoinfo) {
>  class->conn_get_protoinfo(conn, &entry->protoinfo);
>  }
> +entry->bkt = bkt;
>  }
> 
>  int
>  conntrack_dump_start(struct conntrack *ct, struct conntrack_dump
> *dump,
> - const uint16_t *pzone)
> + const uint16_t *pzone, int *ptot_bkts)
>  {
>  memset(dump, 0, sizeof(*dump));
>  if (pzone) {
> @@ -1967,6 +1968,8 @@ conntrack_dump_start(struct conntrack *ct,
> struct conntrack_dump *dump,
>  }
>  dump->ct = ct;
> 
> +*ptot_bkts = CONNTRACK_BUCKETS;
> +
>  return 0;
>  }
> 
> @@ -1991,7 +1994,7 @@ conntrack_dump_next(struct conntrack_dump *dump,
> struct ct_dpif_entry *entry)
>  INIT_CONTAINER(conn, node, node);
>  if ((!dump->filter_zone || conn->key.zone == dump->zone)
> &&
>   (conn->conn_type != CT_CONN_TYPE_UN_NAT)) {
> -conn_to_ct_dpif_entry(conn, entry, now);
> +conn_to_ct_dpif_entry(conn, entry, now, dump-
> >bucket);
>  break;
>  }
>  /* Else continue, until we find an entry in the
> appropriate zone
> diff --git a/lib/conntrack.h b/lib/conntrack.h
> index defde4c..3f48444 100644
> --- a/lib/conntrack.h
> +++ b/lib/conntrack.h
> @@ -108,7 +108,7 @@ struct conntrack_dump {
>  struct ct_dpif_entry;
> 
>  int conntrack_dump_start(struct conntrack *, struct conntrack_dump *,
> - const uint16_t *pzone);
> + const uint16_t *pzone, int *);
>  int conntrack_dump_next(struct conntrack_dump *, struct ct_dpif_entry
> *);
>  int conntrack_

Re: [ovs-dev] [PATCH v3] dpctl: Add new 'ct-bkts' command.

2017-07-22 Thread Fischetti, Antonio
Thanks Darrell, I agree with all your suggestions. 
I'll rework and post a new version.

-Antonio

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Friday, July 21, 2017 8:12 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v3] dpctl: Add new 'ct-bkts' command.
> 
> I did some testing; display looks nice to me
> 
> Other comments inline
> 
> > 2017-07-21T06:38:33.215Z|00053|unixctl|DBG|received request dpctl/ct-
> bkts["netdev@ovs-netdev","gt=0"], id=0
> > 2017-07-21T06:38:33.215Z|00054|dpctl|INFO|set_names=0 verbosity=0
> names=0
> > 2017-07-21T06:38:33.215Z|00055|dpctl|WARN|DARRELL gt 0
> > 2017-07-21T06:38:33.215Z|00056|dpctl|WARN|DARRELL name netdev@ovs-netdev
> > 2017-07-21T06:38:33.215Z|00057|unixctl|DBG|replying with success, id=0:
> "Total Buckets: 256
> > Current Connections: 1
> >
> > +-----------+-------------------------+
> > |  Buckets  | Connections per Buckets |
> > +-----------+-------------------------+
> >0..  7   | ........
> >8.. 15   | ........
> >   16.. 23   | ........
> >   24.. 31   | ........
> >   32.. 39   | ........
> >   40.. 47   | ........
> >   48.. 55   | ........
> >   56.. 63   | ........
> >   64.. 71   | ........
> >   72.. 79   | ........
> >   80.. 87   | ........
> >   88.. 95   | ........
> >   96..103   | ........
> >  104..111   | ........
> >  112..119   | ........
> >  120..127   | ........
> >  128..135   | ........
> >  136..143   | ........
> >  144..151   | ........
> >  152..159   | ........
> >  160..167   | ........
> >  168..175   | 1.......
> >  176..183   | ........
> >  184..191   | ........
> >  192..199   | ........
> >  200..207   | ........
> >  208..215   | ........
> >  216..223   | ........
> >  224..231   | ........
> >  232..239   | ........
> >  240..247   | ........
> >  248..255   | ........
> 
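For reference, the row layout shown in the output above can be sketched as
follows. This is only a minimal sketch and not the code from the patch: the
helper name format_bucket_row, the BKTS_PER_ROW constant and the choice to
print the raw count for a bucket above the threshold are assumptions made
for illustration.

```c
#include <stddef.h>
#include <stdio.h>

#define BKTS_PER_ROW 8   /* Assumed from the 8-column rows in the output. */

/* Formats one histogram row into 'buf'.  A bucket whose count is greater
 * than 'gt' prints its count, any other bucket prints '.'.
 * Returns the number of characters written, excluding the terminator. */
static int
format_bucket_row(char *buf, size_t size, const unsigned int *counts,
                  int first, unsigned int gt)
{
    int n = snprintf(buf, size, "%3d..%3d   | ", first,
                     first + BKTS_PER_ROW - 1);

    for (int i = 0; i < BKTS_PER_ROW && n >= 0 && (size_t) n < size; i++) {
        unsigned int c = counts[first + i];

        if (c > gt) {
            n += snprintf(buf + n, size - n, "%u", c);
        } else {
            n += snprintf(buf + n, size - n, ".");
        }
    }
    return n;
}
```

A caller would invoke it once per group of 8 buckets and print each
resulting line.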
> -Original Message-
> From: <ovs-dev-boun...@openvswitch.org> on behalf of
> "antonio.fische...@intel.com" <antonio.fische...@intel.com>
> Date: Tuesday, July 18, 2017 at 6:03 AM
> To: "d...@openvswitch.org" <d...@openvswitch.org>
> Subject: [ovs-dev] [PATCH v3] dpctl: Add new 'ct-bkts' command.
> 
> From: Antonio Fischetti <antonio.fische...@intel.com>
> 
> With the command:
>  ovs-appctl dpctl/ct-bkts
> shows the number of connections per bucket.
> 
> By using a threshold:
>  ovs-appctl dpctl/ct-bkts gt=N
> for each bucket shows the number of connections when they
> are greater than N.
> 
> Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> Signed-off-by: Bhanuprakash Bodireddy
> <bhanuprakash.bodire...@intel.com>
> Co-authored-by: Bhanuprakash Bodireddy
> <bhanuprakash.bodire...@intel.com>
> ---
>  lib/conntrack.c|  10 ++--
>  lib/conntrack.h|   2 +-
>  lib/ct-dpif.c  |   4 +-
>  lib/ct-dpif.h  |   3 +-
>  lib/dpctl.c| 103
> -
>  lib/dpctl.man  |   8 
>  lib/dpif-netdev.c  |   4 +-
>  lib/dpif-netlink.c |   4 +-
>  lib/dpif-provider.h|   2 +-
>  lib/netlink-conntrack.c|   3 +-
>  lib/netlink-conntrack.h|   3 +-
>  tests/test-netlink-conntrack.c |   2 +-
>  

Re: [ovs-dev] [PATCH RFC v2] Conntrack: Avoid recirculation for established connections.

2017-07-20 Thread Fischetti, Antonio
Thanks Sugesh for your comments, pls see replies inline.
-Antonio

> -Original Message-
> From: Chandran, Sugesh
> Sent: Friday, June 23, 2017 4:20 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH RFC v2] Conntrack: Avoid recirculation for
> established connections.
> 
> Hi Antonio,
> 
> Thank you for the patches,
> 
> Please find my comments below.
> 
> 
> 
> Regards
> _Sugesh
> 
> 
> > -Original Message-
> > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > boun...@openvswitch.org] On Behalf Of Fischetti, Antonio
> > Sent: Tuesday, June 6, 2017 5:15 PM
> > To: Darrell Ball <db...@vmware.com>; d...@openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH RFC v2] Conntrack: Avoid recirculation for
> > established connections.
> >
> > Thanks Darrel for your useful comments, I've tried to replicate the
> usecases
> > you mentioned, please find inline some details.
> >
> > Regards,
> > Antonio
> >
> > > -Original Message-
> > > From: Darrell Ball [mailto:db...@vmware.com]
> > > Sent: Thursday, June 1, 2017 6:20 PM
> > > To: Fischetti, Antonio <antonio.fische...@intel.com>;
> > > d...@openvswitch.org
> > > Subject: Re: [ovs-dev] [PATCH RFC v2] Conntrack: Avoid recirculation
> > > for established connections.
> > >
> > > Comments inline
> > >
> > > On 5/29/17, 8:24 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> > > Fischetti, Antonio" <ovs-dev-boun...@openvswitch.org on behalf of
> > > antonio.fische...@intel.com> wrote:
> > >
> > > Thanks Joe for your feedback and the interesting insights in
> > > conntrack in your earlier communication.
> > > We have added all the details that we considered for this first
> > > implementation. Also, some answers are inline.
> > >
> > > The purpose of this implementation is to avoid recirculation just
> > > for those packets that are part of established connections.
> > >
> > > This shouldn't affect the packet recirculation for actions other
> > > than conntrack. For example in MPLS, after a pop_mpls action the
> > > packet will still be recirculated to follow the usual datapath.
> > >
> > > Most importantly, the CT module isn't by-passed in this
> > > implementation.
> > >
> > > Besides the recirculation, all other action[s] shall be executed
> > > as-is on each packet.
> > > Any new CT change or action set by the controller will be managed
> > > as usual.
> > >
> > > For our testing we set up a simple firewall, below are the flows.
> > >
> > >
> > >  Flow Setup
> > >  --
> > > table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
> > > table=0, priority=10,arp actions=NORMAL
> > > table=0, priority=1 actions=drop
> > > table=1, ct_state=+new+trk,ip,in_port=1
> actions=ct(commit),output:2
> > > table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
> > > table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2
> > > table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1
> > >
> > >
> > >  Basic idea
> > >  --
> > > With the above rules, all packets belonging to an established
> > > connection will first hit the flow
> > > "ct_state=-trk,ip actions=ct(table=1)"
> > >
> > > then on recirculation, they will hit
> > > "ct_state=+est+trk,ip,in_port=.. actions=output:X".
> > >
> > > The basic idea is to do the following 2 steps.
> > > 1. Combine the two sets of actions by removing the recirculation.
> > >a) Original actions:
> > > - "ct(zone=N), recirc(id)" [ i.e ct(table=1) ]
> > > - "output:X"
> > >b) Combined Actions after Removing recirculation:
> > > - "ct(zone=N), output:X".
> > >
> > > 2. All the subsequent packets shall hit a flow with the combined
> > > actions.
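The two steps above can be sketched as below. This is a hedged illustration
only: struct act, enum act_type and combine_actions are hypothetical names
and do not correspond to the datapath action encoding used by OVS. The
sketch just shows the idea of dropping the recirc action and appending the
actions of the flow that would have been hit after recirculation.

```c
#include <stddef.h>

enum act_type { ACT_CT, ACT_RECIRC, ACT_OUTPUT };

struct act {
    enum act_type type;
    int arg;                /* Zone, recirc id or output port. */
};

/* Builds the combined list: the first-pass actions without the recirc
 * action, followed by the actions of the flow hit after recirculation.
 * Returns the number of actions written, or 0 if 'dst' is too small. */
static size_t
combine_actions(const struct act *first, size_t n_first,
                const struct act *second, size_t n_second,
                struct act *dst, size_t dst_size)
{
    size_t n = 0;

    for (size_t i = 0; i < n_first; i++) {
        if (first[i].type == ACT_RECIRC) {
            continue;       /* The recirculation is removed. */
        }
        if (n == dst_size) {
            return 0;
        }
        dst[n++] = first[i];
    }
    for (size_t i = 0; i < n_second; i++) {
        if (n == dst_size) {
            return 0;
        }
        dst[n++] = second[i];
    }
    return n;
}
```

With "ct(zone=N), recirc(id)" as the first list and "output:X" as the
second, the result is "ct(zone=N), output:X".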
> > >
> > >
> > > [Darrell]
> > >
> > > 1) This would be constraining on how rules are written such that we
> > > can’t go back to
> > >   the beginning of the pipeline, just because a packet is
> > > established.
> > >   Meaning we

Re: [ovs-dev] [PATCH 1/4] dpif-netdev: Avoid reading RSS hash when EMC is disabled.

2017-07-19 Thread Fischetti, Antonio
Hi Billy, your suggestion really simplifies the code a lot and improves
readability, but unfortunately it brings no gain in performance.
Anyway, in the next version I'm adding some further changes and I will
try to take your suggestions into account.

/Antonio

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Fischetti, Antonio
> Sent: Friday, June 23, 2017 10:53 PM
> To: O Mahony, Billy <billy.o.mah...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 1/4] dpif-netdev: Avoid reading RSS hash
> when EMC is disabled.
> 
> Hi Billy, thanks for your suggestion, it makes the code more clean
> and readable.
> Once I get back from vacation I'll give it a try and check if this
> still gives a performance benefit.
> 
> /Antonio
> 
> > -Original Message-
> > From: O Mahony, Billy
> > Sent: Friday, June 23, 2017 5:23 PM
> > To: Fischetti, Antonio <antonio.fische...@intel.com>;
> d...@openvswitch.org
> > Subject: RE: [ovs-dev] [PATCH 1/4] dpif-netdev: Avoid reading RSS hash
> > when EMC is disabled.
> >
> > Hi Antonio,
> >
> > > -Original Message-
> > > From: Fischetti, Antonio
> > > Sent: Friday, June 23, 2017 3:10 PM
> > > To: O Mahony, Billy <billy.o.mah...@intel.com>; d...@openvswitch.org
> > > Subject: RE: [ovs-dev] [PATCH 1/4] dpif-netdev: Avoid reading RSS hash
> > > when EMC is disabled.
> > >
> > > Hi Billy,
> > > thanks a lot for you suggestions. Those would really help re-factoring
> > the
> > > code by avoiding duplications.
> > > The thing is that this patch 1/4 is mainly a preparation for the next
> > patch 2/4.
> > > So I did these changes with the next patch 2/4 in mind.
> > >
> > > The final result I meant to achieve in patch 2/4 is the following.
> > > EMC lookup is skipped - not only when EMC is disabled - but also when
> > > (we're processing recirculated packets) && (the EMC is 'enough' full).
> > > The purpose is to avoid EMC thrashing.
> > >
> > > Below is how the code looks like after applying patches 1/4 and 2/4.
> > > Please let me know if you can find some similar optimizations to avoid
> > code
> > > duplications, that would be great.
> > > 
> > > /*
> > >  * EMC lookup is skipped when one or both of the following
> > >  * two cases occurs:
> > >  *
> > >  *   - EMC is disabled.  This is detected from cur_min.
> > >  *
> > >  *   - The EMC occupancy exceeds EMC_FULL_THRESHOLD and the
> > >  * packet to be classified is being recirculated.  When
> this
> > >  * happens also EMC insertions are skipped for
> recirculated
> > >  * packets.  So that EMC is used just to store entries
> which
> > >  * are hit from the 'original' packets.  This way the EMC
> > >  * thrashing is mitigated with a benefit on performance.
> > >  */
> > > if (!md_is_valid) {
> > > > pkt_metadata_init(&packet->md, port_no);
> > > > miniflow_extract(packet, &key->mf);  <== this fn must be called
> > > > after pkt_metadata_init
> > > /* This is not a recirculated packet. */
> > > if (OVS_LIKELY(cur_min)) {
> > > /* EMC is enabled.  We can retrieve the 5-tuple hash
> > >  * without considering the recirc id. */
> > > if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
> > > key->hash = dp_packet_get_rss_hash(packet);
> > > } else {
> > > > key->hash = miniflow_hash_5tuple(&key->mf, 0);
> > > dp_packet_set_rss_hash(packet, key->hash);
> > > }
> > > flow = emc_lookup(flow_cache, key);
> > > } else {
> > > /* EMC is disabled, skip emc_lookup. */
> > > flow = NULL;
> > > }
> > > } else {
> > > /* Recirculated packets. */
> > > > miniflow_extract(packet, &key->mf);
> > > if (flow_cache->n_entries & EMC_FULL_THRESHOLD) {
> > > /* EMC occupancy is over the threshold.  We skip EMC
> > >  * lookup for recirculated packets. */
> > >

Re: [ovs-dev] [PATCH v2] dpctl: Add new 'ct-bkts' command.

2017-07-18 Thread Fischetti, Antonio
Thanks Darrell for your feedback, I'll post a v3 soon.

Antonio

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Monday, July 17, 2017 6:03 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v2] dpctl: Add new 'ct-bkts' command.
> 
> I have not tested it yet, but will do that later today or tomorrow.
> I have a few comments inline.
> 
> On 7/17/17, 6:48 AM, "ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf of
> antonio.fische...@intel.com> wrote:
> 
> With the command:
>  ovs-appctl dpctl/ct-bkts
> shows the number of connections per bucket.
> 
> By using a threshold:
>  ovs-appctl dpctl/ct-bkts gt=N
> for each bucket shows the number of connections when they
> are greater than N.
> 
> Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> Signed-off-by: Bhanuprakash Bodireddy
> <bhanuprakash.bodire...@intel.com>
> Co-authored-by: Bhanuprakash Bodireddy
> <bhanuprakash.bodire...@intel.com>
> ---
>  lib/conntrack.c   |  5 +--
>  lib/conntrack.h   |  4 +--
>  lib/ct-dpif.h |  4 +++
>  lib/dpctl.c   | 88
> +++
>  lib/dpctl.man |  8 +
>  utilities/ovs-dpctl.c |  1 +
>  6 files changed, 105 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index de46a6b..e986115 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -1931,7 +1931,7 @@ conn_key_to_tuple(const struct conn_key *key,
> struct ct_dpif_tuple *tuple)
> 
>  static void
>  conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry
> *entry,
> -  long long now)
> +  long long now, int bkt)
> 
> The parameters can also include the number of buckets.
> 
> 
>  {
>  struct ct_l4_proto *class;
>  long long expiration;
> @@ -1954,6 +1954,7 @@ conn_to_ct_dpif_entry(const struct conn *conn,
> struct ct_dpif_entry *entry,
>  if (class->conn_get_protoinfo) {
>  class->conn_get_protoinfo(conn, &entry->protoinfo);
>  }
> +entry->bkt = bkt;
>  }
> 
>  int
> @@ -1991,7 +1992,7 @@ conntrack_dump_next(struct conntrack_dump *dump,
> struct ct_dpif_entry *entry)
>  INIT_CONTAINER(conn, node, node);
>  if ((!dump->filter_zone || conn->key.zone == dump->zone)
> &&
>   (conn->conn_type != CT_CONN_TYPE_UN_NAT)) {
> -conn_to_ct_dpif_entry(conn, entry, now);
> +conn_to_ct_dpif_entry(conn, entry, now, dump-
> >bucket);
>  break;
>  }
>  /* Else continue, until we find an entry in the
> appropriate zone
> diff --git a/lib/conntrack.h b/lib/conntrack.h
> index defde4c..81eb9df 100644
> --- a/lib/conntrack.h
> +++ b/lib/conntrack.h
> @@ -28,6 +28,7 @@
>  #include "ovs-atomic.h"
>  #include "ovs-thread.h"
>  #include "packets.h"
> +#include "ct-dpif.h"
> 
>  /* Userspace connection tracker
>   * 
> @@ -242,9 +243,6 @@ struct conntrack_bucket {
>  long long next_cleanup OVS_GUARDED;
>  };
> 
> -#define CONNTRACK_BUCKETS_SHIFT 8
> -#define CONNTRACK_BUCKETS (1 << CONNTRACK_BUCKETS_SHIFT)
> 
> Please don’t move these defines from the Userspace conntrack module
> itself.
> The associated datastructures are in the Userspace conntrack module itself
> and hence
> the defines belong here.
> 
> The API conn_to_ct_dpif_entry() can also return the number of buckets.
> 
> 
> -
>  struct conntrack {
>  /* Independent buckets containing the connections */
>  struct conntrack_bucket buckets[CONNTRACK_BUCKETS];
> diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
> index cd35f3e..b2a2f9e 100644
> --- a/lib/ct-dpif.h
> +++ b/lib/ct-dpif.h
> @@ -20,6 +20,9 @@
>  #include "openvswitch/types.h"
>  #include "packets.h"
> 
> +#define CONNTRACK_BUCKETS_SHIFT 8
> +#define CONNTRACK_BUCKETS (1 << CONNTRACK_BUCKETS_SHIFT)
> +
>  union ct_dpif_inet_addr {
>  ovs_be32 ip;
>  ovs_be32 ip6[4];
> @@ -169,6 +172,7 @@ struct ct_dpif_entry {
>  /* Timeout for this en

Re: [ovs-dev] [PATCH 3/3] dpctl: Add new 'ct-bkts' command.

2017-07-17 Thread Fischetti, Antonio
> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: Friday, July 14, 2017 7:29 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>
> Cc: CC: <d...@openvswitch.org>
> Subject: Re: [ovs-dev] [PATCH 3/3] dpctl: Add new 'ct-bkts' command.
> 
> On Fri, Jul 14, 2017 at 10:11:04AM +, Fischetti, Antonio wrote:
> > Thanks Ben for your feedback, my replies inline.
> >
> > > -Original Message-
> > > From: Ben Pfaff [mailto:b...@ovn.org]
> > > Sent: Tuesday, July 11, 2017 8:54 PM
> > > To: Fischetti, Antonio <antonio.fische...@intel.com>
> > > Cc: d...@openvswitch.org
> > > Subject: Re: [ovs-dev] [PATCH 3/3] dpctl: Add new 'ct-bkts' command.
> > >
> > > On Fri, Jun 23, 2017 at 01:28:22PM +0100, antonio.fische...@intel.com
> > > wrote:
> > > > From: Antonio Fischetti <antonio.fische...@intel.com>
> > > >
> > > > With the command:
> > > >  ovs-appctl dpctl/ct-bkts
> > > > shows the number of connections per bucket.
> > > >
> > > > By using a threshold:
> > > >  ovs-appctl dpctl/ct-bkts gt=N
> > > > for each bucket shows the number of connections when they
> > > > are greater than N.
> > > >
> > > > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> > > > Signed-off-by: Bhanuprakash Bodireddy
> <bhanuprakash.bodire...@intel.com>
> > > > Co-authored-by: Bhanuprakash Bodireddy
> > > <bhanuprakash.bodire...@intel.com>
> > >
> > > Is this concept of buckets one that the kernel conntracker has too?
> If
> > > so, then I'm concerned about the definition of conn_per_buckets[] in
> > > dpctl_ct_bkts(), since it has a hardcoded CONNTRACK_BUCKETS size and
> > > nothing checks whether cte.bkt is in the right range.  If not, then
> > > should this command be one that is limited to the userspace
> conntracker?
> >
> > [AF] You're right, I didn't realize the concept of buckets
> > is in the kernel CT too, I did this implementation with the
> > userspace ConnTracker in mind.
> >
> > Would it be ok to have a first implementation limited to the userspace
> CT like:
> >
> > dpctl_ct_bkts(...)
> > {
> > if (!dpctl_p->is_appctl) { /* Not called by ovs-appctl command =>
> exit. */
> > dpctl_print(dpctl_p, "Command is available for UserSpace
> ConnTracker only.\n");
> > return 0;
> > }
> >
> > < implementation for Userspace CT here >
> > ...
> >
> > }
> >
> > so that later on it could be extended to cover also the kernel CT case?
> 
> Is it much work to extend it to cover the kernel?  If not, it would be
> better to cover both.  Otherwise, sure, this approach is an OK way to
> start.

[AF] I'd prefer to start with an implementation for userspace only.
I did some investigation on how to cover the kernel case too but, as I'm not
familiar enough with the details of the kernel CT implementation,
I'd limit this patch to the userspace CT for now.
I'll post a v2 soon.
Thanks,
Antonio
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 3/3] dpctl: Add new 'ct-bkts' command.

2017-07-14 Thread Fischetti, Antonio
Thanks Ben for your feedback, my replies inline.

> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: Tuesday, July 11, 2017 8:54 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>
> Cc: d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 3/3] dpctl: Add new 'ct-bkts' command.
> 
> On Fri, Jun 23, 2017 at 01:28:22PM +0100, antonio.fische...@intel.com
> wrote:
> > From: Antonio Fischetti <antonio.fische...@intel.com>
> >
> > With the command:
> >  ovs-appctl dpctl/ct-bkts
> > shows the number of connections per bucket.
> >
> > By using a threshold:
> >  ovs-appctl dpctl/ct-bkts gt=N
> > for each bucket shows the number of connections when they
> > are greater than N.
> >
> > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> > Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodire...@intel.com>
> > Co-authored-by: Bhanuprakash Bodireddy
> <bhanuprakash.bodire...@intel.com>
> 
> Thanks for the patch!
> 
> checkpatch reports a few style issues:
> 

[AF] my bad, will fix

> WARNING: Line length is >79-characters long
> #145 FILE: lib/dpctl.c:1529:
>  dpctl_print(dpctl_p, "\n %3d..%3d   | ", i, i +
> NUM_BKTS_PER_ROW - 1);
> 
> ERROR: Inappropriate bracing around statement
> #147 FILE: lib/dpctl.c:1531:
> if (conn_per_bkts[i] > gt)
> 
> 
> Is this concept of buckets one that the kernel conntracker has too?  If
> so, then I'm concerned about the definition of conn_per_buckets[] in
> dpctl_ct_bkts(), since it has a hardcoded CONNTRACK_BUCKETS size and
> nothing checks whether cte.bkt is in the right range.  If not, then
> should this command be one that is limited to the userspace conntracker?

[AF] You're right, I didn't realize the concept of buckets 
is in the kernel CT too, I did this implementation with the 
userspace ConnTracker in mind.

Would it be ok to have a first implementation limited to the userspace CT like:

dpctl_ct_bkts(...)
{
    if (!dpctl_p->is_appctl) { /* Not called by ovs-appctl command => exit. */
        dpctl_print(dpctl_p, "Command is available for UserSpace "
                    "ConnTracker only.\n");
        return 0;
    }

    < implementation for Userspace CT here >
    ...
}

so that later on it could be extended to cover also the kernel CT case?
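On the range concern raised above, a bounds-checked counter could look like
the sketch below. It is only an illustration under stated assumptions:
count_connection and buckets_over_threshold are hypothetical helpers, and
the number of buckets is passed in by the caller instead of being taken
from a hardcoded constant.

```c
#include <stddef.h>

/* Adds one connection to bucket 'bkt'.  Returns 0 on success, or -1 and
 * leaves 'counts' untouched when 'bkt' is outside [0, n_buckets). */
static int
count_connection(unsigned int *counts, int n_buckets, int bkt)
{
    if (bkt < 0 || bkt >= n_buckets) {
        return -1;
    }
    counts[bkt]++;
    return 0;
}

/* Returns how many buckets hold more than 'gt' connections. */
static int
buckets_over_threshold(const unsigned int *counts, int n_buckets,
                       unsigned int gt)
{
    int n = 0;

    for (int i = 0; i < n_buckets; i++) {
        if (counts[i] > gt) {
            n++;
        }
    }
    return n;
}
```

An entry reporting an out-of-range bucket is then rejected instead of
writing past the end of the array.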


> 
> I'd update NEWS to describe the new command.
> 
> Thanks,
> 
> Ben.


Re: [ovs-dev] [PATCH v5 0/2] conntrack : Add support for rx checksum offload.

2017-07-13 Thread Fischetti, Antonio
Hi Sugesh and Darrell,
I reviewed this patchset v5 and it LGTM.

I've also tested it with 10 UDP flows, 64B packets and a firewall setup
as below.

table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
table=0, priority=10,arp actions=NORMAL
table=0, priority=1 actions=drop
table=1, ct_state=+new+trk,ip,in_port=dpdk0 actions=ct(commit),output:dpdk1
table=1, ct_state=+new+trk,ip,in_port=dpdk1 actions=drop
table=1, ct_state=+est+trk,ip,in_port=dpdk0 actions=output:dpdk1
table=1, ct_state=+est+trk,ip,in_port=dpdk1 actions=output:dpdk0

PMD threads:  2
2 phy ports
Code was built with CFLAGS="-O2 -march=native -g".


The reason I used only ten flows is that this way the throughput
is high enough to make any performance change more evident.

I ran a continuous bi-dir test, ie data were sent at the line-rate from
both sides, regardless of packet loss.
I saw a performance improvement, below my results with three different runs.
For each run I've reported the Rx pkt rates for both sides.

 
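                                         +--------------+--------------+--------------+
                                         |    Run #1    |    Run #2    |    Run #3    |
-----------------------------------------+--------------+--------------+--------------+
Latest master with commit id:            |              |              |              |
8a8c1b93b1723e022665950ec0b623dc6b57fbb0 | (2.62, 2.67) | (2.43, 2.43) | (2.58, 2.61) |
-----------------------------------------+--------------+--------------+--------------+
Latest master + this patchset            | (2.80, 2.86) | (2.67, 2.67) | (2.75, 2.79) |
-----------------------------------------+--------------+--------------+--------------+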

Below the connections created by the 10 UDP streams.

udp,orig=(src=10.10.10.10,dst=20.20.20.27,sport=63,dport=63),reply=(src=20.20.20.27,dst=10.10.10.10,sport=63,dport=63)
udp,orig=(src=10.10.10.10,dst=20.20.20.26,sport=63,dport=63),reply=(src=20.20.20.26,dst=10.10.10.10,sport=63,dport=63)
udp,orig=(src=10.10.10.10,dst=20.20.20.20,sport=63,dport=63),reply=(src=20.20.20.20,dst=10.10.10.10,sport=63,dport=63)
udp,orig=(src=10.10.10.10,dst=20.20.20.22,sport=63,dport=63),reply=(src=20.20.20.22,dst=10.10.10.10,sport=63,dport=63)
udp,orig=(src=10.10.10.10,dst=20.20.20.25,sport=63,dport=63),reply=(src=20.20.20.25,dst=10.10.10.10,sport=63,dport=63)
udp,orig=(src=10.10.10.10,dst=20.20.20.28,sport=63,dport=63),reply=(src=20.20.20.28,dst=10.10.10.10,sport=63,dport=63)
udp,orig=(src=10.10.10.10,dst=20.20.20.23,sport=63,dport=63),reply=(src=20.20.20.23,dst=10.10.10.10,sport=63,dport=63)
udp,orig=(src=10.10.10.10,dst=20.20.20.21,sport=63,dport=63),reply=(src=20.20.20.21,dst=10.10.10.10,sport=63,dport=63)
udp,orig=(src=10.10.10.10,dst=20.20.20.24,sport=63,dport=63),reply=(src=20.20.20.24,dst=10.10.10.10,sport=63,dport=63)
udp,orig=(src=10.10.10.10,dst=20.20.20.29,sport=63,dport=63),reply=(src=20.20.20.29,dst=10.10.10.10,sport=63,dport=63)


-Antonio

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Sugesh Chandran
> Sent: Tuesday, July 11, 2017 5:00 PM
> To: b...@ovn.org; ovs-dev@openvswitch.org
> Subject: [ovs-dev] [PATCH v5 0/2] conntrack : Add support for rx checksum
> offload.
> 
> Conntrack need not verify the checksum of incoming packets if it is
> validated
> by DPDK physical NIC ports.
> Also make use the DPDK rx checksum mask bits along with flags while
> validating
> the reported hardware checksum state.
> 
> v4->v5
>  (No functional changes in this version)
>  - Rebased on latest master.
>  - Added Darrel as co-author and removed the 'suggested by' tag.
>  - Moved the bad checksum validate functions from patch-2 to patch-1.
> 
> v3->v4
>  - Rebased on latest master
>  - Invoke 'checksum_valid' function only when checksum is not validated in
>hardware. Check the 'validate_checksum' flag first to invoke the
>'checksum_valid' function accordingly.
> 
> v2->v3
>  - Rebased on latest master.
>  - Updated the existing DPDK checksum validation function to honor hw
> offload
>masks along with checksum bits reported by hardware.
>  - As suggested by Darrel, Introduced new functions to validate bad
> checksum
>reported by DPDK.
>  - As proposed by Darrel, modified the conntrack checksum validation to
> check
>bad checksum first on received packets.
>  - Modified conntrack to validate checksum in software only when it failed
> to
>  - do in hardware. Changed the logic to validate bad and good checksum
> flags
>reported by hardware.
>  - Added tag 'Suggested-by: Darrell Ball '.
>  - Removed Acked, Tested by tags from Antonio as the changes has been
>modified afterwards.
> 
> v1->v2
>  - Rebased on master
>  - Added acked-by and tested-by tags in commit message
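The ordering described in the changelog above (check for a bad checksum
reported by hardware first, and fall back to software only when hardware
did not validate) can be sketched as follows. This is a minimal sketch,
not the patch itself: enum hw_csum, inet_csum and csum_ok are hypothetical
names, and the software path is a plain 16-bit ones' complement Internet
checksum over a byte buffer that already contains its checksum field.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Checksum state as reported by the NIC (hypothetical encoding). */
enum hw_csum { HW_CSUM_UNKNOWN, HW_CSUM_GOOD, HW_CSUM_BAD };

/* Ones' complement Internet checksum over 'len' bytes.  The result is 0
 * when the buffer includes a correct checksum field. */
static uint16_t
inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t) data[i] << 8) | data[i + 1];
    }
    if (len & 1) {
        sum += (uint32_t) data[len - 1] << 8;
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Bad checksum reported by hardware is rejected first, a good one is
 * accepted without further work, anything else is verified in software. */
static bool
csum_ok(enum hw_csum hw, const uint8_t *data, size_t len)
{
    if (hw == HW_CSUM_BAD) {
        return false;
    }
    if (hw == HW_CSUM_GOOD) {
        return true;
    }
    return inet_csum(data, len) == 0;
}
```

The software computation only runs for packets whose state the NIC did not
report, which is where the saving comes from.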
> 
> Signed-off-by: Sugesh Chandran 
> Co-authored-by: Darrell Ball 
> Signed-off-by: 

Re: [ovs-dev] [PATCH 4/4] dp-packet: Use memcpy to copy dp_packet fields.

2017-06-23 Thread Fischetti, Antonio
Hi Billy, thanks for your review.
Replies inline.

/Antonio

> -Original Message-
> From: O Mahony, Billy
> Sent: Friday, June 23, 2017 2:27 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH 4/4] dp-packet: Use memcpy to copy dp_packet
> fields.
> 
> Hi Antonio,
> 
> I'm not sure that this approach will work. Mainly the memcpy will not take
> account of any padding that the compiler may insert and the struct.  Maybe
> if the struct was defined as packed it could fix this objection.

[AF] This patch is somewhat similar to the patch for initializing 
the pkt_metadata struct in pkt_metadata_init() at
http://patchwork.ozlabs.org/patch/779696/


> 
> Also anyone editing the structure in future would have to be aware that
> these elements need to be kept contiguous in order for packet_clone to
> keep working.

[AF] Agree, I should add a comment locally to the struct definition. 

> 
> Or if the relevant fields here were all placed in the their own nested
> struct then sizeof that nested struct could be used in the memcpy call as
> the sizeof nested_struct would account for whatever padding the compiler
> inserted.
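The nested-struct idea above can be sketched like this. It is only an
illustration with a made-up layout: struct demo_copied_fields, struct
demo_packet and demo_copy_fields are hypothetical and do not reflect the
real struct dp_packet. The point is that sizeof on the nested struct
already covers any padding the compiler inserts between its members.

```c
#include <stdint.h>
#include <string.h>

/* Fields that the clone helper copies as one unit (hypothetical). */
struct demo_copied_fields {
    uint8_t  l2_pad_size;
    uint16_t l2_5_ofs;
    uint16_t l3_ofs;
    uint16_t l4_ofs;
    uint32_t cutlen;
};

struct demo_packet {
    void *base;                        /* Not copied by the helper. */
    struct demo_copied_fields copied;  /* Copied with a single memcpy. */
};

/* Copies every member of 'copied', padding included, in one call. */
static void
demo_copy_fields(struct demo_packet *dst, const struct demo_packet *src)
{
    memcpy(&dst->copied, &src->copied, sizeof dst->copied);
}
```

A plain struct assignment (dst->copied = src->copied) would have the same
effect, and new members added to the nested struct are picked up without
touching the copy code.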
> 
> Does this change give much of a performance increase?

I tested this while working on the connection tracker. I was using this
particular set of flows (see the 4th line, with "action=ct(table=1),NORMAL")
to trigger a call to dp_packet_clone_with_headroom():

ovs-ofctl del-flows br0;
ovs-ofctl add-flow br0 table=0,priority=1,action=drop;
ovs-ofctl add-flow br0 table=0,priority=10,arp,action=normal;
ovs-ofctl add-flow br0 table=0,priority=100,ip,ct_state=-trk,"action=ct(table=1),NORMAL";
ovs-ofctl add-flow br0 table=1,in_port=1,ip,ct_state=+trk+new,"action=ct(commit),2";
ovs-ofctl add-flow br0 table=1,in_port=1,ip,ct_state=+trk+est,"action=2";
ovs-ofctl add-flow br0 table=1,in_port=2,ip,ct_state=+trk+new,"action=drop";
ovs-ofctl add-flow br0 table=1,in_port=2,ip,ct_state=+trk+est,"action=1";

After running a Hotspot analysis with VTune for 60 secs, in the original code
dp_packet_clone_with_headroom was ranked in 2nd place:
Function                        CPU Time
-------------------------------+---------
__libc_malloc                    5.880s
dp_packet_clone_with_headroom    4.530s
emc_lookup                       4.050s
free                             3.500s
pthread_mutex_unlock             2.890s
...

Instead, after this change the same function was consuming fewer CPU cycles:
Function                        CPU Time
-------------------------------+---------
__libc_malloc                    5.900s
emc_lookup                       4.070s
free                             4.010s
dp_packet_clone_with_headroom    3.920s
pthread_mutex_unlock             3.060s




> 
> Also I commented on 1/4 of this patchset about a cover letter - but if the
> patchset members are independent of each other then maybe they should just
> be separate patches.

[AF] I grouped these patches together because they are all performance
optimizations focused mainly on the conntracker use case.
Splitting them into separate patches may have been a better choice.

> 
> Regards,
> Billy.
> 
> > -Original Message-
> > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > boun...@openvswitch.org] On Behalf Of antonio.fische...@intel.com
> > Sent: Monday, June 19, 2017 11:12 AM
> > To: d...@openvswitch.org
> > Subject: [ovs-dev] [PATCH 4/4] dp-packet: Use memcpy to copy dp_packet
> > fields.
> >
> > From: Antonio Fischetti <antonio.fische...@intel.com>
> >
> > memcpy replaces the single copies inside
> > dp_packet_clone_with_headroom().
> >
> > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> > ---
> >  lib/dp-packet.c | 17 +
> >  1 file changed, 9 insertions(+), 8 deletions(-)
> >
> > diff --git a/lib/dp-packet.c b/lib/dp-packet.c index 67aa406..5b1d416
> 100644
> > --- a/lib/dp-packet.c
> > +++ b/lib/dp-packet.c
> > @@ -157,7 +157,7 @@ dp_packet_clone(const struct dp_packet *buffer)
> >  return dp_packet_clone_with_headroom(buffer, 0);  }
> >
> > -/* Creates and returns a new dp_packet whose data are copied from
> > 'buffer'.   The
> > +/* Creates and returns a new dp_packet whose data are copied from
> > +'buffer'. The
> >   * returned dp_packet will additionally have 'headroom' bytes of
> headroom.
> > */  struct dp_packet *  dp_packet_clone_with_headroom(const struct
> > dp_packet *buffer, size_t headroom) @@ -167,13 +167,14 @@
> > dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t
> > headroom)
> >  new_buf

Re: [ovs-dev] [PATCH 1/4] dpif-netdev: Avoid reading RSS hash when EMC is disabled.

2017-06-23 Thread Fischetti, Antonio
Hi Billy, thanks for your suggestion, it makes the code cleaner
and more readable.
Once I get back from vacation I'll give it a try and check whether it
still gives a performance benefit.

/Antonio

> -Original Message-
> From: O Mahony, Billy
> Sent: Friday, June 23, 2017 5:23 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH 1/4] dpif-netdev: Avoid reading RSS hash
> when EMC is disabled.
> 
> Hi Antonio,
> 
> > -Original Message-
> > From: Fischetti, Antonio
> > Sent: Friday, June 23, 2017 3:10 PM
> > To: O Mahony, Billy <billy.o.mah...@intel.com>; d...@openvswitch.org
> > Subject: RE: [ovs-dev] [PATCH 1/4] dpif-netdev: Avoid reading RSS hash
> > when EMC is disabled.
> >
> > Hi Billy,
> > thanks a lot for your suggestions. Those would really help re-factoring
> > the code by avoiding duplications.
> > The thing is that this patch 1/4 is mainly a preparation for the next
> > patch 2/4. So I did these changes with the next patch 2/4 in mind.
> >
> > The final result I meant to achieve in patch 2/4 is the following.
> > EMC lookup is skipped - not only when EMC is disabled - but also when
> > (we're processing recirculated packets) && (the EMC is full 'enough').
> > The purpose is to avoid EMC thrashing.
> >
> > Below is what the code looks like after applying patches 1/4 and 2/4.
> > Please let me know if you can find some similar optimizations to avoid
> > code duplications, that would be great.
> > 
> > /*
> >  * EMC lookup is skipped when one or both of the following
> >  * two cases occur:
> >  *
> >  *   - EMC is disabled.  This is detected from cur_min.
> >  *
> >  *   - The EMC occupancy exceeds EMC_FULL_THRESHOLD and the
> >  *     packet to be classified is being recirculated.  When this
> >  *     happens, EMC insertions are also skipped for recirculated
> >  *     packets, so that the EMC is used just to store entries which
> >  *     are hit by the 'original' packets.  This way EMC thrashing
> >  *     is mitigated, with a benefit on performance.
> >  */
> > if (!md_is_valid) {
> >     pkt_metadata_init(&packet->md, port_no);
> >     /* miniflow_extract() must be called after pkt_metadata_init(). */
> >     miniflow_extract(packet, &key->mf);
> >     /* This is not a recirculated packet. */
> >     if (OVS_LIKELY(cur_min)) {
> >         /* EMC is enabled.  We can retrieve the 5-tuple hash
> >          * without considering the recirc id. */
> >         if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
> >             key->hash = dp_packet_get_rss_hash(packet);
> >         } else {
> >             key->hash = miniflow_hash_5tuple(&key->mf, 0);
> >             dp_packet_set_rss_hash(packet, key->hash);
> >         }
> >         flow = emc_lookup(flow_cache, key);
> >     } else {
> >         /* EMC is disabled, skip emc_lookup. */
> >         flow = NULL;
> >     }
> > } else {
> >     /* Recirculated packets. */
> >     miniflow_extract(packet, &key->mf);
> >     if (flow_cache->n_entries & EMC_FULL_THRESHOLD) {
> >         /* EMC occupancy is over the threshold.  We skip EMC
> >          * lookup for recirculated packets. */
> >         flow = NULL;
> >     } else {
> >         if (OVS_LIKELY(cur_min)) {
> >             key->hash = dpif_netdev_packet_get_rss_hash(packet,
> >                                                         &key->mf);
> >             flow = emc_lookup(flow_cache, key);
> >         } else {
> >             flow = NULL;
> >         }
> >     }
> > }
> > 
> >
> > Basically patch 1/4 is mostly a preliminary change for 2/4.
> >
> > Yes, patch 1/4 also avoids reading the hash when EMC is disabled.
> > Or - for packets that are not recirculated - it avoids calling
> > recirc_depth_get_unsafe() when reading the hash.
> >
> > Also, as these functions are critical for performance, I tend to avoid
> > adding new Booleans that require new if statements.
> [[BO'M]]
> 
> Can you investigate refactoring this patch with something like below.  I
> thi

Re: [ovs-dev] [PATCH RFC 2/4] dpif-netdev: Skip EMC lookup/insert for recirculated packets.

2017-06-23 Thread Fischetti, Antonio
Thanks a lot Billy, really appreciate your feedback.
My replies inline.

/Antonio

> -Original Message-
> From: O Mahony, Billy
> Sent: Friday, June 23, 2017 6:39 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH RFC 2/4] dpif-netdev: Skip EMC lookup/insert
> for recirculated packets.
> 
> Hi Antonio,
> 
> This is a really interesting patch. Comments inline below.
> 
> Thanks,
> /Billy.
> 
> > -Original Message-
> > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > boun...@openvswitch.org] On Behalf Of antonio.fische...@intel.com
> > Sent: Monday, June 19, 2017 11:12 AM
> > To: d...@openvswitch.org
> > Subject: [ovs-dev] [PATCH RFC 2/4] dpif-netdev: Skip EMC lookup/insert
> > for recirculated packets.
> >
> > From: Antonio Fischetti <antonio.fische...@intel.com>
> >
> > When OVS is configured as a firewall, with thousands of active concurrent
> > connections, the EMC gets quickly saturated and may come under heavy
> > thrashing because original and recirculated packets keep overwriting
> > existing active EMC entries due to its limited size (8k).
> >
> > This thrashing causes the EMC to be less efficient than the dpcls in
> > terms of lookups and insertions.
> >
> > This patch allows the EMC to be used efficiently by allowing only the
> > 'original' packets to hit the EMC. All recirculated packets are sent to
> > the classifier directly.
> > An empirical threshold (EMC_FULL_THRESHOLD - of 50%) for EMC occupancy
> > is set to trigger this logic. When EMC utilization exceeds
> > EMC_FULL_THRESHOLD:
> >  - EMC insertions are allowed just for original packets. EMC insertion
> >    and lookup are skipped for recirculated packets.
> >  - Recirculated packets are sent to the classifier.
> >
> > This patch depends on the previous one in this series. It's based on
> > patch "dpif-netdev: add EMC entry count and %full figure to
> > pmd-stats-show" at:
> > https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327570.html
> >
> > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> > Signed-off-by: Bhanuprakash Bodireddy
> > <bhanuprakash.bodire...@intel.com>
> > Co-authored-by: Bhanuprakash Bodireddy
> > <bhanuprakash.bodire...@intel.com>
> > ---
> > In our Connection Tracker testbench set up with
> >
> >  table=0, priority=1 actions=drop
> >  table=0, priority=10,arp actions=NORMAL
> >  table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
> >  table=1, ct_state=+new+trk,ip,in_port=1 actions=ct(commit),output:2
> >  table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2
> >  table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
> >  table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1
> >
> > we saw the following performance improvement.
> >
> > Measured packet Rx rate (regardless of packet loss). Bidirectional test
> > with 64B UDP packets.
> > Each row is a test with a different number of traffic streams. The
> > traffic generator is set so that each stream establishes one UDP
> > connection. The Mpps columns report the Rx rates on the 2 sides.
> >
> >  Traffic |    Orig    |     Orig      |  +changes  |   +changes
> >  Streams |   [Mpps]   | [EMC entries] |   [Mpps]   | [EMC entries]
> > ---------+------------+---------------+------------+---------------
> >       10 |  3.4, 3.4  |         20    |  3.4, 3.4  |         20
> >      100 |  2.6, 2.7  |        200    |  2.6, 2.7  |        201
> >    1,000 |  2.4, 2.4  |       2009    |  2.4, 2.4  |       1994
> >    2,000 |  2.2, 2.2  |       3903    |  2.2, 2.2  |       3900
> >    3,000 |  2.1, 2.1  |       5473    |  2.2, 2.2  |       4798
> >    4,000 |  2.0, 2.0  |       6478    |  2.2, 2.2  |       5663
> >   10,000 |  1.8, 1.9  |       8070    |  2.0, 2.0  |       7347
> >  100,000 |  1.7, 1.7  |       8192    |  1.8, 1.8  |       8192
> >
> 
> [[BO'M]]
> A few questions on the test:
> Are all the pkts rxd being recirculated?

[AF] Yes, I sent UDP packets with the firewall rules above. Every packet
goes through the CT module, so after that it is recirculated.

> Are there any flows present where the pkts do not require recirculation?

[AF] No. The flow
table=0,priority=100,ct_state=-trk,ip actions=ct(table=1)
implies that any received packet goes through CT and then it is recirculated.

> Was the rxd rss hash calculation offloaded to the NIC?

[AF] Yes.

> For the cases with larger numbers of flows (10K , 100K) did you
> invest

Re: [ovs-dev] [PATCH 1/4] dpif-netdev: Avoid reading RSS hash when EMC is disabled.

2017-06-23 Thread Fischetti, Antonio
Hi Billy,
thanks a lot for your suggestions. Those would really help re-factoring
the code by avoiding duplications.
The thing is that this patch 1/4 is mainly a preparation for the
next patch 2/4. So I did these changes with the next patch 2/4 in mind.

The final result I meant to achieve in patch 2/4 is the following.
EMC lookup is skipped - not only when EMC is disabled - but also when
(we're processing recirculated packets) && (the EMC is full 'enough').
The purpose is to avoid EMC thrashing.

Below is what the code looks like after applying patches 1/4 and 2/4.
Please let me know if you can find some similar optimizations to
avoid code duplications, that would be great.

/*
 * EMC lookup is skipped when one or both of the following
 * two cases occur:
 *
 *   - EMC is disabled.  This is detected from cur_min.
 *
 *   - The EMC occupancy exceeds EMC_FULL_THRESHOLD and the
 *     packet to be classified is being recirculated.  When this
 *     happens, EMC insertions are also skipped for recirculated
 *     packets, so that the EMC is used just to store entries which
 *     are hit by the 'original' packets.  This way EMC thrashing
 *     is mitigated, with a benefit on performance.
 */
if (!md_is_valid) {
    pkt_metadata_init(&packet->md, port_no);
    /* miniflow_extract() must be called after pkt_metadata_init(). */
    miniflow_extract(packet, &key->mf);
    /* This is not a recirculated packet. */
    if (OVS_LIKELY(cur_min)) {
        /* EMC is enabled.  We can retrieve the 5-tuple hash
         * without considering the recirc id. */
        if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
            key->hash = dp_packet_get_rss_hash(packet);
        } else {
            key->hash = miniflow_hash_5tuple(&key->mf, 0);
            dp_packet_set_rss_hash(packet, key->hash);
        }
        flow = emc_lookup(flow_cache, key);
    } else {
        /* EMC is disabled, skip emc_lookup. */
        flow = NULL;
    }
} else {
    /* Recirculated packets. */
    miniflow_extract(packet, &key->mf);
    if (flow_cache->n_entries & EMC_FULL_THRESHOLD) {
        /* EMC occupancy is over the threshold.  We skip EMC
         * lookup for recirculated packets. */
        flow = NULL;
    } else {
        if (OVS_LIKELY(cur_min)) {
            key->hash = dpif_netdev_packet_get_rss_hash(packet,
                                                        &key->mf);
            flow = emc_lookup(flow_cache, key);
        } else {
            flow = NULL;
        }
    }
}


Basically patch 1/4 is mostly a preliminary change for 2/4.

Yes, patch 1/4 also avoids reading the hash when EMC is disabled.
Or - for packets that are not recirculated - it avoids calling
recirc_depth_get_unsafe() when reading the hash.

Also, as these functions are critical for performance, I tend to avoid
adding new Booleans that require new if statements.
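
To make the decision itself easier to review, the lookup/skip rule from
the code above can be written down as a small stand-alone predicate.
This is only a sketch for discussion, not what the patch does:
demo_emc_lookup_allowed() is a hypothetical helper, and
DEMO_EMC_FULL_THRESHOLD is an assumed mask value chosen so that it
becomes non-zero at 50% of an 8k-entry EMC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: ANDing with this mask is non-zero once the entry count
 * reaches 4096, i.e. 50% of an 8192-entry EMC.  The real definition is
 * the one in the patch. */
#define DEMO_EMC_FULL_THRESHOLD 0xF000u

/* Returns true when the EMC should be consulted for a packet.
 * 'md_is_valid' is true for recirculated packets, 'cur_min' is zero when
 * the EMC is disabled, 'n_entries' is the current EMC occupancy. */
static bool
demo_emc_lookup_allowed(bool md_is_valid, uint32_t cur_min,
                        uint32_t n_entries)
{
    if (!cur_min) {
        return false;           /* EMC is disabled. */
    }
    if (md_is_valid && (n_entries & DEMO_EMC_FULL_THRESHOLD)) {
        return false;           /* Recirculated packet, EMC 'full'. */
    }
    return true;
}
```

For example, with the EMC enabled a recirculated packet is looked up at
4095 entries but skipped at 4096, while an 'original' packet is looked
up in both cases.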


Thanks,
Antonio

> -Original Message-
> From: O Mahony, Billy
> Sent: Friday, June 23, 2017 1:54 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH 1/4] dpif-netdev: Avoid reading RSS hash
> when EMC is disabled.
> 
> Hi Antonio,
> 
> In this patch of the patchset there are three lines removed from the
> direct command flow:
> 
> -miniflow_extract(packet, &key->mf);
> -key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);
> -flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key);
> 
> Which are then replicated in several different branches for logic. This is
> a lot of duplication of logic.
> 
> I *think* (I haven't tested it) this can be re-written with less branching
> like this:
> 
>  if (!md_is_valid) {
>      pkt_metadata_init(&packet->md, port_no);
>  }
>  miniflow_extract(packet, &key->mf);
>  if (OVS_LIKELY(cur_min)) {
>      if (md_is_valid) {
>          key->hash = dpif_netdev_packet_get_rss_hash(packet,
>                                                      &key->mf);
>      }
>      else
>      {
>          if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
>              key->hash = dp_packet_get_rss_hash(packet);
>          } else {
>              key->hash = miniflow_hash_5tuple(&key->mf, 0);
>              dp_packet_set_rss_hash(packet, key->hash);
>          }
>          flow = emc_lookup(flow_cache, key);
>

Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search

2017-06-23 Thread Fischetti, Antonio
Hi All,
thanks for your feedback. We published a patchset v1 at

http://patchwork.ozlabs.org/patch/775505/

please feel free to review.

Thanks,
Antonio

> -Original Message-
> From: Wang, Yipeng1
> Sent: Wednesday, May 3, 2017 12:04 AM
> To: Darrell Ball <db...@vmware.com>; d...@openvswitch.org; ja...@ovn.org;
> jan.scheur...@ericsson.com
> Cc: Tai, Charlie <charlie@intel.com>; Wang, Ren <ren.w...@intel.com>;
> Gobriel, Sameh <sameh.gobr...@intel.com>; Fischetti, Antonio
> <antonio.fische...@intel.com>
> Subject: RE: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to
> Accelerate Megaflow Search
> 
> Thank you Darrell for the comment, we collect some data with the scalar
> version,  please see my reply inlined.  Our newest results show good
> speedup for both scalar and AVX version.
> 
> We are still waiting for more feedback before implementing version 2.
> Please feel free to comment on the patch.
> 
> Thank you.
> 
> > -Original Message-
> > From: Darrell Ball [mailto:db...@vmware.com]
> > Sent: Wednesday, April 26, 2017 10:04 PM
> > To: Wang, Yipeng1 <yipeng1.w...@intel.com>; d...@openvswitch.org
> > Cc: Tai, Charlie <charlie@intel.com>; Wang, Ren <ren.w...@intel.com>;
> > Gobriel, Sameh <sameh.gobr...@intel.com>
> > Subject: Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor
> > to Accelerate Megaflow Search
> >
> >
> >
> > On 4/14/17, 6:10 PM, "Wang, Yipeng1" <yipeng1.w...@intel.com> wrote:
> >
> > Thank you Darrell for the comments. Please take a look at my reply
> > inlined.
> >
> > > -Original Message-
> > > From: Darrell Ball [mailto:db...@vmware.com]
> > > Sent: Thursday, April 13, 2017 10:36 PM
> > > To: Wang, Yipeng1 <yipeng1.w...@intel.com>; d...@openvswitch.org
> > > Subject: Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor
> > > to Accelerate Megaflow Search
> > >
> > > On 4/6/17, 2:48 PM, "ovs-dev-boun...@openvswitch.org on behalf of
> > > yipeng1.w...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf of
> > > yipeng1.w...@intel.com> wrote:
> > >
> > > From: Yipeng Wang <yipeng1.w...@intel.com>
> > >
> > > The Datapath Classifier uses tuple space search for flow classification.
> > > The rules are arranged into a set of tuples/subtables (each with a
> > > distinct mask).  Each subtable is implemented as a hash table and lookup
> > > is done with flow keys formed by selecting the bits from the packet
> > > header based on each subtable's mask. Tuple space search will
> > > sequentially search each subtable until a match is found. With a large
> > > number of subtables, a sequential search of the subtables could consume
> > > a lot of CPU cycles. In a testbench with a uniform traffic pattern
> > > equally distributed across 20 subtables, we measured that up to 65% of
> > > total execution time is attributed to the megaflow cache lookup.
> > >
> > > This patch presents the idea of the two-layer hierarchical lookup, where
> > > a low overhead first level of indirection is accessed first; we call
> > > this level cuckoo distributor (CD). If a flow key has been inserted in
> > > the flow table the first level will indicate with high probability
> > > which subtable to look into. A lookup is performed on the second level
> > > (the target subtable) to retrieve the result. If the key doesn’t have a
> > > match, then we revert back to the sequential search of subtables.
> > >
> > > This patch can improve the already existing Subtable Ranking when
> > > traffic data has high entropy. Subtable Ranking helps minimize the
> > > number of

Re: [ovs-dev] [patch_v2 3/3] conntrack: Add hash_finish() to conn_key_hash().

2017-06-13 Thread Fischetti, Antonio
Hi Darrell,
it seems that lib/hash.h already has a hash_finish() function for the
intrinsic mode, where the first parameter is a uint64_t:
static inline uint32_t hash_finish(uint64_t hash, uint64_t final)

so I'm getting some errors when building with CFLAGS="-O2 -march=native -g":

lib/hash.h:180:24: error: conflicting types for 'hash_finish'
 static inline uint32_t hash_finish(uint64_t hash, uint64_t final)

lib/hash.h:95:24: note: previous declaration of 'hash_finish' was here
 static inline uint32_t hash_finish(uint32_t hash, uint32_t final);


Antonio

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Darrell Ball
> Sent: Friday, June 9, 2017 11:31 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [patch_v2 3/3] conntrack: Add hash_finish() to
> conn_key_hash().
> 
> The function conn_key_hash() is updated to include
> a call to hash_finish() and also to make use of a
> new hash abstraction - ct_endpoint_hash_add().
> 
> Fixes: a489b16854b5 ("conntrack: New userspace connection tracker.")
> Signed-off-by: Darrell Ball 
> ---
>  lib/conntrack.c | 10 +++---
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 9584a0a..146edd7 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -1529,14 +1529,10 @@ static uint32_t
>  conn_key_hash(const struct conn_key *key, uint32_t basis)
>  {
>  uint32_t hsrc, hdst, hash;
> -int i;
> 
>  hsrc = hdst = basis;
> -
> -for (i = 0; i < sizeof(key->src) / sizeof(uint32_t); i++) {
> -hsrc = hash_add(hsrc, ((uint32_t *) &key->src)[i]);
> -hdst = hash_add(hdst, ((uint32_t *) &key->dst)[i]);
> -}
> +hsrc = ct_endpoint_hash_add(hsrc, &key->src);
> +hdst = ct_endpoint_hash_add(hdst, &key->dst);
> 
>  /* Even if source and destination are swapped the hash will be the
> same. */
>  hash = hsrc ^ hdst;
> @@ -1546,7 +1542,7 @@ conn_key_hash(const struct conn_key *key, uint32_t
> basis)
>    (uint32_t *) (key + 1) - (uint32_t *) (&key->dst + 1),
>hash);
> 
> -return hash;
> +return hash_finish(hash, 0);
>  }
> 
>  static void
> --
> 1.9.1
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev