[dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Morten Brørup > Sent: Wednesday, June 17, 2015 10:54 AM > To: Thomas Monjalon > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
>
> Dear Thomas,
>
> I don't have time to follow the DPDK Developers mailing list, but since you call for feedback, I would like to share my thoughts regarding these design choices.
>
> Regarding the statistics discussion:
>
> 1. The suggested solution assumes that, when statistics are disabled, the cost of allocating and maintaining zero-value statistics is negligible. If statistics counters are only available through accessor functions, this is probably true.
>
> However, if statistics counters are directly accessible, e.g. as elements in the fast path data structures of a library, maintaining zero-value statistics may have a memory and/or performance impact.

Counters are only accessible through API functions. Since the compile-time flag CONFIG_RTE_<LIBRARY>_STATS_COLLECT already tells the application if the statistics are present or not, the application should simply use this flag to determine if statistics are accessible or not.

> 2. The suggested solution with only one single flag per library prevents implementing statistics with varying granularity for different purposes. E.g. a library could have one set of statistics counters for ordinary SNMP purposes, and another set of statistics counters for debugging/optimization purposes.
>
> Multiple flags per library should be possible. A hierarchy of flags per library is probably not required.

Morten, thank you for your input. It would be good if you could add your contribution to the guidelines documentation patch by replying to the thread that Thomas indicated: http://dpdk.org/ml/archives/dev/2015-June/019461.html.
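The per-library compile-time flag discussed above is typically implemented as increment macros that compile to nothing when the flag is off, so the fast path pays no cost for disabled counters. A minimal sketch; the flag, structure, and function names are illustrative, not actual DPDK symbols:

```c
#include <stdint.h>

/* Hypothetical per-library build flag; the real DPDK option name and
 * mechanism are what the guidelines under discussion will decide. */
#define RTE_PORT_STATS_COLLECT 1

struct rte_port_stats {
	uint64_t n_pkts_in;
	uint64_t n_pkts_drop;
};

/* Increments compile to nothing when collection is off. */
#if RTE_PORT_STATS_COLLECT
#define RTE_PORT_STATS_ADD(counter, val) ((counter) += (val))
#else
#define RTE_PORT_STATS_ADD(counter, val)
#endif

/* Illustrative fast-path function updating the counters. */
uint64_t
port_run(struct rte_port_stats *stats, uint64_t n_pkts, uint64_t n_drop)
{
	RTE_PORT_STATS_ADD(stats->n_pkts_in, n_pkts);
	RTE_PORT_STATS_ADD(stats->n_pkts_drop, n_drop);
	return n_pkts - n_drop;
}
```

With the flag set to 0, the same application code compiles but the counters stay at zero, which is why the stats structure itself must remain in the API for ABI stability.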
Our initial stats patch submission had a much finer granularity of stats configuration: per object type instead of per library, but a lot of people on this mailing list are against this, so we are now looking for one configuration flag per library.

> Regarding the PHY speed ABI:
>
> 1. The Ethernet PHY ABI for speed, duplex, etc. should be common throughout the entire DPDK. It might be confusing if some structures/functions use a bitmask to indicate PHY speed/duplex/personality/etc. and other structures/functions use a combination of an unsigned integer, duplex flag, personality enumeration etc. (By personality enumeration, I am referring to PHYs with multiple electrical interfaces. E.g. a dual personality PHY might have both an RJ45 copper interface and an SFP module interface, of which only one can be active at any time.)
>
> 2. The auto-negotiation standard allows the PHY to announce to its link partner any subset of its capabilities. E.g. a standard 10/100/1000 Ethernet PHY (which can handle both 10 and 100 Mbit/s in both half and full duplex and 1 Gbit/s full duplex) can be configured to announce 10 Mbit/s half duplex and 100 Mbit/s full duplex capabilities to its link partner. (Of course, more useful combinations are normally announced, but the purpose of the example is to show that any combination is possible.)
>
> The ABI for auto-negotiation should include options to select the list of capabilities to announce to the link partner. The Linux PHY ABI only allows forcing a selected speed and duplex (thereby disabling auto-negotiation) or enabling auto-negotiation (thereby announcing all possible speeds and duplex combinations the PHY is capable of). Don't make the same mistake in DPDK.
>
> PS: While working for Vitesse Semiconductors (an Ethernet chip company) a long time ago, I actually wrote the API for their line of Ethernet PHYs. So I have hands-on experience in this area.
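An advertisement API of the kind Morten describes might look like the sketch below: the caller selects an arbitrary subset of the PHY's capabilities to announce, rather than choosing only between "force one speed" and "advertise everything". The bit names and function are hypothetical, not an existing DPDK or Linux interface:

```c
#include <stdint.h>

/* Illustrative capability bits for an auto-negotiation API. */
#define PHY_ADV_10M_HD   (1u << 0)
#define PHY_ADV_10M_FD   (1u << 1)
#define PHY_ADV_100M_HD  (1u << 2)
#define PHY_ADV_100M_FD  (1u << 3)
#define PHY_ADV_1G_FD    (1u << 4)

/* Everything a 10/100/1000 copper PHY could announce. */
#define PHY_CAPS_ALL (PHY_ADV_10M_HD | PHY_ADV_10M_FD | \
                      PHY_ADV_100M_HD | PHY_ADV_100M_FD | PHY_ADV_1G_FD)

/* Restrict the advertised set to any subset of the PHY's capabilities,
 * e.g. 10M half duplex plus 100M full duplex, as in the example above. */
static inline int
phy_set_advertised(uint32_t *adv_reg, uint32_t requested)
{
	if (requested & ~PHY_CAPS_ALL)
		return -1;	/* asked for something the PHY cannot do */
	*adv_reg = requested;
	return 0;
}
```

Forcing a single speed then becomes just the degenerate case of advertising a one-bit subset, so one API covers both styles.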
> > > Regarding the discussion about backwards/forwards compatibility in the ABI: > > 1. Sometimes, ABI breakage is required. That is the cost the users pay for > getting the benefits from upgrading to the latest and greatest version of any > library. The current solution of requiring acknowledgement from several > qualified developers is fine - these developers will consider the cost/benefit > on behalf of all the DPDK users and make a qualified decision. > > 2. It is my general experience that documentation is not always updated to > reflect the fine details of the source code, and this also applies to release > notes. For open source software, the primary point of documentation is > usually the source code itself. > > 2a. It should be clearly visible directly in the DPDK source code (including > makefiles etc.) which ABI (i.e. functions, macros, type definitions etc.) is > the > current, the deprecated, and the future. > > 2b. When a developer migrates a project using DPDK from a previous version > of the DPDK, it should be easy for the
[dpdk-dev] [PATCH v2 0/6] cfgfile: config file parsing extension
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Maciej Gajdzica > Sent: Wednesday, June 17, 2015 3:49 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v2 0/6] cfgfile: config file parsing extension
>
> Added new implementation of section parsing in config file. Refactored existing code by splitting it into smaller functions. Changed section allocation scheme and added new features - variable-length entry values and the line-continuation character '\'.

Acked-by: Cristian Dumitrescu
[dpdk-dev] [PATCH] doc: fix doxygen warnings for QoS API
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michal Jastrzebski > Sent: Wednesday, June 17, 2015 3:37 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH] doc: fix doxygen warnings for QoS API
>
> This patch fixes doxygen warnings when generating documentation for qos_meter and qos_sched.
>
> Signed-off-by: Michal Jastrzebski
> ---
> lib/librte_meter/rte_meter.h | 8
> lib/librte_sched/rte_bitmap.h | 10 +-
> lib/librte_sched/rte_red.h | 44 +---
> lib/librte_sched/rte_sched.h | 2 ++
> 4 files changed, 33 insertions(+), 31 deletions(-)

Acked-by: Cristian Dumitrescu
[dpdk-dev] [PATCH] doc: guidelines for library statistics
> -Original Message- > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > Sent: Thursday, June 11, 2015 1:05 PM > To: Dumitrescu, Cristian > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH] doc: guidelines for library statistics > > Hi Cristian, > > Thanks for trying to make a policy clearer. > We need to make a decision in the coming week. > Below are comments on the style and content. > > 2015-06-08 15:50, Cristian Dumitrescu: > > doc/guides/guidelines/statistics.rst | 42 > > > Maybe we should have a more general file like design.rst. I am not sure I correctly understood your suggestion. Do you want the section on statistics to be called design.rst? Do you want it to be part of doc/guides/guidelines or create a brand new document? To me, the initial idea of doc/guidelines/statistics.rst makes more sense, but I am fine either way. > In order to have a lot of readers of such guidelines, they must be concise. I reordered the sections to provide the guidelines first and the motivation afterwards. I think it is good to keep the motivation together with the guidelines. Do you suggest to remove the motivation from this document? > > Please wrap lines to be not too long and/or split lines after the end of a > sentence. Done in next version. > > > +Library Statistics > > +== > > + > > +Description > > +--- > > + > > +This document describes the guidelines for DPDK library-level statistics > counter support. This includes guidelines for turning library statistics on > and > off, requirements for preventing ABI changes when library statistics are > turned on and off, etc. > > Should we consider that driver stats and lib stats are different in DPDK? Why? I did update the document to make it clear these guidelines are applicable for the case when the stats counters are maintained by the library itself. I think the primary difference is whether the stats counters are implemented by the HW (e.g. NIC) or in SW by the CPU. 
* In the first case, the CPU does not spend cycles maintaining the counters. System resources might be consumed by the NIC for maintaining the counters (e.g. memory bandwidth), so sometimes the NICs provide mechanisms to enable/disable stats, which is done as part of init code rather than at build time.
* In the second case, the CPU maintains the counters, which comes at the usual cost of cycles and system resources. These guidelines focus on this case.

> > > +Motivation to allow the application to turn library statistics on and off > > +- > > + > > +It is highly recommended that each library provides statistics counters to allow the application to monitor the library-level run-time events. Typical counters are: number of packets received/dropped/transmitted, number of buffers allocated/freed, number of occurrences for specific events, etc. > > + > > +Since the resources consumed for library-level statistics counter collection have to be spent out of the application budget and the counters collected by some libraries might not be relevant for the current application, in order to avoid any unwanted waste of resources and/or performance for the application, the application is to decide at build time whether the collection of library-level statistics counters should be turned on or off for each library individually.
>
> It would be good to have acknowledgements or other opinions on this. > Some of them were expressed in other threads. Please comment here.
>
> > +Library-level statistics counters can be relevant or not for specific applications: > > +* For application A, counters maintained by library X are always relevant and the application needs to use them to implement certain features, as traffic accounting, logging, application-level statistics, etc.
In this case, the application requires that collection of statistics counters for library X is always turned on;
> > +* For application B, counters maintained by library X are only useful during the application debug stage and not relevant once the debug phase is over. In this case, the application may decide to turn on the collection of library X statistics counters during the debug phase and later on turn them off;
>
> Users of a binary package do not have this choice. There is a section in the document about allowing stats to be turned on/off without ABI impact.
>
> > +* For application C, counters maintained by library X are not relevant at all. It might be that the application maintains its own set of statistics
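One way to keep the ABI identical whether collection is on or off, as the document's ABI section requires, is to keep the stats structure and its accessor in the public API unconditionally and only vary what the accessor reports. A minimal sketch, with illustrative names (the flag and types below are not real DPDK symbols):

```c
#include <stdint.h>
#include <string.h>

/* The stats structure stays in the public API regardless of the build
 * flag, so turning collection on/off does not change the ABI. */
#define RTE_LIB_X_STATS_COLLECT 0

struct lib_x_stats {
	uint64_t n_events;
};

/* Internal counters, updated by the library's fast path when enabled. */
struct lib_x_stats internal_stats;

int
lib_x_stats_read(struct lib_x_stats *out)
{
	if (out == NULL)
		return -1;
#if RTE_LIB_X_STATS_COLLECT
	*out = internal_stats;		/* live counters */
#else
	memset(out, 0, sizeof(*out));	/* collection off: report zeros */
#endif
	return 0;
}
```

The function signature, structure layout, and symbol set are the same in both builds; only the reported values differ, so a binary built against either configuration keeps working.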
[dpdk-dev] [PATCH v4 1/1] pipeline: add statistics for librte_pipeline ports and tables
> This patch adds statistics collection for librte_pipeline. > Those statistics are disabled by default at build time. Acked-by: Cristian Dumitrescu
[dpdk-dev] [PATCH v4 00/10] table: added table statistics
> Added statistics for every type of table. By default all table statistics are disabled; the user must activate them in the config file. > > Changes in v4: > - created single config option for all table statistics Acked-by: Cristian Dumitrescu
[dpdk-dev] [PATCH v4 00/13] port: added port statistics
> Added statistics for every type of port. By default all port statistics are disabled; the user must activate them in the config file. > > Changes in v4: > - created single config option for all port statistics > Acked-by: Cristian Dumitrescu
[dpdk-dev] 4 Traffic classes per Pipe limitation
No problem, Mike. Enjoy!

From: Michael Sardo [mailto:m...@bandex.io] Sent: Sunday, June 7, 2015 12:24 AM To: Dumitrescu, Cristian Cc: Yeddula, Avinash; dev at dpdk.org Subject: Re: [dpdk-dev] 4 Traffic classes per Pipe limitation

Oops, I should have searched a bit more before asking. I see that they've already been made available: http://dpdk.org/ml/archives/dev/attachments/20150423/17a4d8de/attachment-0001.pdf Thanks. -Mike

On Sat, Jun 6, 2015 at 5:05 PM, Michael Sardo <mike at bandex.io> wrote: Hello Cristian, Are the slides shown in that video available? They're very helpful. -Mike

On Fri, Jun 5, 2015 at 4:50 PM, Dumitrescu, Cristian <cristian.dumitrescu at intel.com> wrote: Hi Avinash, > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yeddula, Avinash > Sent: Friday, June 5, 2015 6:06 PM > To: dev at dpdk.org > Subject: [dpdk-dev] 4 Traffic classes per Pipe limitation > > Hi, > This is related to the QOS scheduler functionality provided by dpdk. > > I see a limit on the number of traffic classes to be 4. I'm exploring the > available options to increase that limit to 8.

Yes, there are 4x traffic classes (scheduled in strict priority), but each traffic class has 4x queues (scheduled using WFQ); for big weight ratios between queues (e.g. 1:4 or 1:8, etc), WFQ becomes very similar to strict priority, a kind of strict priority without starvation. So the 16x queues per pipe can be considered 16x sub-traffic-classes. You might want to watch this video on DPDK QoS: https://youtu.be/_PPklkWGugs

> > This is what I found when I researched on this topic. > The limitation on numbers of TCs (and pipes) comes from the number of > bits available. Since the QoS code overloads the 32 bit RSS field in > the mbuf there aren't enough bits to allot. But then again if you add lots > of pipes or subports the memory footprint gets huge.

It is not that simple.
The number of 4x traffic classes is deeply built into the implementation for performance reasons. Increasing the number of bits allocated to the traffic class in mbuf->sched would not help. > > Any more info or suggestions on increasing the limit to 8 ? Yes, look at the 16x pipe queues as 16x (sub)traffic classes. > > Thanks > -Avinash
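As a rough illustration of treating the 16 pipe queues as sub-traffic-classes, an application could map 8 logical priorities onto the 4 strict-priority TCs and their WFQ queues; with a large weight ratio between the two queues used in each TC (e.g. 8:1), the higher queue behaves almost like strict priority, as described above. This mapping is only an example, not part of the rte_sched API:

```c
#include <stdint.h>

/* Scheduler position of one logical priority class. */
struct pipe_slot {
	uint32_t tc;	/* 0..3, strict priority between TCs */
	uint32_t queue;	/* 0..3 within the TC, WFQ between queues */
};

/* Map 8 application priorities (0 = highest) onto 4 TCs x 2 queues:
 * two priority levels share one TC, separated by the WFQ weight ratio. */
static inline struct pipe_slot
map_priority(uint32_t prio8)
{
	struct pipe_slot s;
	s.tc = prio8 / 2;		/* two priorities per TC */
	s.queue = prio8 % 2;		/* use 2 of the 4 WFQ queues */
	return s;
}
```

Priorities 0 and 1 land in TC 0, priorities 6 and 7 in TC 3; within each TC, the queue weights (configured in the rte_sched pipe profile) decide how close to strict priority the two levels behave.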
[dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Daniel Mrzyglod > Sent: Friday, June 5, 2015 3:55 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros
>
> Fix RTE_MBUF_METADATA macros to allow for unaligned accesses to meta-data fields. > Forcing aligned accesses is not really required, so this is removing an unneeded constraint. > This issue was met during testing of the new version of the ip_pipeline application. There is no performance impact. > This change has no ABI impact, as the previous code that uses aligned accesses continues to run without any issues.
>
> Signed-off-by: Daniel Mrzyglod

Acked-by: Cristian Dumitrescu
[dpdk-dev] [PATCH] qos_sched: example modification to use librte_cfgfile
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michal Jastrzebski > Sent: Friday, June 5, 2015 10:28 AM > To: dev at dpdk.org > Cc: ??cristian.dumitrescu at intel.com > Subject: [dpdk-dev] [PATCH] qos_sched: example modification to use > librte_cfgfile > > This is a modification of qos_sched example to use > librte_cfgfile for parsing configuration file. > > Signed-off-by: Michal Jastrzebski > --- > examples/qos_sched/cfg_file.c | 157 > ++--- > examples/qos_sched/cfg_file.h | 35 ++--- > examples/qos_sched/init.c | 14 ++-- > 3 files changed, 47 insertions(+), 159 deletions(-) > > diff --git a/examples/qos_sched/cfg_file.c b/examples/qos_sched/cfg_file.c > index 05a8caf..71ddabb 100644 > --- a/examples/qos_sched/cfg_file.c > +++ b/examples/qos_sched/cfg_file.c > @@ -233,92 +233,7 @@ int cfg_close(struct cfg_file *cfg) > } > > int > -cfg_num_sections(struct cfg_file *cfg, const char *sectionname, size_t > length) > -{ > - int i; > - int num_sections = 0; > - for (i = 0; i < cfg->num_sections; i++) { > - if (strncmp(cfg->sections[i]->name, sectionname, length) == > 0) > - num_sections++; > - } > - return num_sections; > -} > - > -int > -cfg_sections(struct cfg_file *cfg, char *sections[], int max_sections) > -{ > - int i; > - for (i = 0; i < cfg->num_sections && i < max_sections; i++) { > - snprintf(sections[i], CFG_NAME_LEN, "%s", cfg->sections[i]- > >name); > - } > - return i; > -} > - > -static const struct cfg_section * > -_get_section(struct cfg_file *cfg, const char *sectionname) > -{ > - int i; > - for (i = 0; i < cfg->num_sections; i++) { > - if (strncmp(cfg->sections[i]->name, sectionname, > - sizeof(cfg->sections[0]->name)) == 0) > - return cfg->sections[i]; > - } > - return NULL; > -} > - > -int > -cfg_has_section(struct cfg_file *cfg, const char *sectionname) > -{ > - return (_get_section(cfg, sectionname) != NULL); > -} > - > -int > -cfg_section_num_entries(struct cfg_file *cfg, const char *sectionname) > -{ > - const struct 
cfg_section *s = _get_section(cfg, sectionname); > - if (s == NULL) > - return -1; > - return s->num_entries; > -} > - > - > -int > -cfg_section_entries(struct cfg_file *cfg, const char *sectionname, > - struct cfg_entry *entries, int max_entries) > -{ > - int i; > - const struct cfg_section *sect = _get_section(cfg, sectionname); > - if (sect == NULL) > - return -1; > - for (i = 0; i < max_entries && i < sect->num_entries; i++) > - entries[i] = *sect->entries[i]; > - return i; > -} > - > -const char * > -cfg_get_entry(struct cfg_file *cfg, const char *sectionname, > - const char *entryname) > -{ > - int i; > - const struct cfg_section *sect = _get_section(cfg, sectionname); > - if (sect == NULL) > - return NULL; > - for (i = 0; i < sect->num_entries; i++) > - if (strncmp(sect->entries[i]->name, entryname, > CFG_NAME_LEN) == 0) > - return sect->entries[i]->value; > - return NULL; > -} > - > -int > -cfg_has_entry(struct cfg_file *cfg, const char *sectionname, > - const char *entryname) > -{ > - return (cfg_get_entry(cfg, sectionname, entryname) != NULL); > -} > - > - > -int > -cfg_load_port(struct cfg_file *cfg, struct rte_sched_port_params > *port_params) > +cfg_load_port(struct rte_cfgfile *cfg, struct rte_sched_port_params > *port_params) > { > const char *entry; > int j; > @@ -326,19 +241,19 @@ cfg_load_port(struct cfg_file *cfg, struct > rte_sched_port_params *port_params) > if (!cfg || !port_params) > return -1; > > - entry = cfg_get_entry(cfg, "port", "frame overhead"); > + entry = rte_cfgfile_get_entry(cfg, "port", "frame overhead"); > if (entry) > port_params->frame_overhead = (uint32_t)atoi(entry); > > - entry = cfg_get_entry(cfg, "port", "number of subports per port"); > + entry = rte_cfgfile_get_entry(cfg, "port", "number of subports per > port"); > if (entry) > port_params->n_subports_per_port = (uint32_t)atoi(entry); > > - entry = cfg_get_entry(cfg, "port", "number of pipes per subport"); > + entry = rte_cfgfile_get_entry(cfg, "port", "number of pipes 
per > subport"); > if (entry) > port_params->n_pipes_per_subport = (uint32_t)atoi(entry); > > - entry = cfg_get_entry(cfg, "port", "queue sizes"); > + entry = rte_cfgfile_get_entry(cfg, "port", "queue sizes"); > if (entry) { > char *next; > > @@ -356,7 +271,7 @@ cfg_load_port(struct cfg_file *cfg, struct > rte_sched_port_params *port_params) > > /* Parse WRED min thresholds */ > snprintf(str, sizeof(str), "tc %d
[dpdk-dev] [PATCH 5/5] rte_sched: allow reading without clearing
> -Original Message- > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > Sent: Wednesday, May 27, 2015 7:10 PM > To: Dumitrescu, Cristian > Cc: dev at dpdk.org; Stephen Hemminger > Subject: [PATCH 5/5] rte_sched: allow reading without clearing > > The rte_sched statistics API should allow reading statistics without > clearing. Make auto-clear optional. In this version, this is handled > by deprecating the old API and adding a new one. > > Signed-off-by: Stephen Hemminger > --- > app/test/test_sched.c | 4 +-- > lib/librte_sched/rte_sched.c | 65 > +- > lib/librte_sched/rte_sched.h | 37 ++- > lib/librte_sched/rte_sched_version.map | 2 ++ > 4 files changed, 74 insertions(+), 34 deletions(-) > > diff --git a/app/test/test_sched.c b/app/test/test_sched.c > index c7239f8..03f89b4 100644 > --- a/app/test/test_sched.c > +++ b/app/test/test_sched.c > @@ -198,13 +198,13 @@ test_sched(void) > > struct rte_sched_subport_stats subport_stats; > uint32_t tc_ov; > - rte_sched_subport_read_stats(port, SUBPORT, _stats, > _ov); > + rte_sched_subport_stats(port, SUBPORT, _stats, _ov, > 1); > #if 0 > TEST_ASSERT_EQUAL(subport_stats.n_pkts_tc[TC-1], 10, "Wrong > subport stats\n"); > #endif > struct rte_sched_queue_stats queue_stats; > uint16_t qlen; > - rte_sched_queue_read_stats(port, QUEUE, _stats, ); > + rte_sched_queue_stats(port, QUEUE, _stats, , 1); > #if 0 > TEST_ASSERT_EQUAL(queue_stats.n_pkts, 10, "Wrong queue > stats\n"); > #endif > diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c > index 9c9419d..b4d7edd 100644 > --- a/lib/librte_sched/rte_sched.c > +++ b/lib/librte_sched/rte_sched.c > @@ -965,61 +965,78 @@ rte_sched_port_pkt_read_color(const struct > rte_mbuf *pkt) > } > > int > -rte_sched_subport_read_stats(struct rte_sched_port *port, > - uint32_t subport_id, > - struct rte_sched_subport_stats *stats, > - uint32_t *tc_ov) > +rte_sched_subport_stats(struct rte_sched_port *port, uint32_t > subport_id, > + struct rte_sched_subport_stats 
*stats, > + uint32_t *tc_ov, int clear) > { > struct rte_sched_subport *s; > > /* Check user parameters */ > - if ((port == NULL) || > - (subport_id >= port->n_subports_per_port) || > - (stats == NULL) || > - (tc_ov == NULL)) { > + if (port == NULL || subport_id >= port->n_subports_per_port) > return -1; > - } > + > s = port->subport + subport_id; > > /* Copy subport stats and clear */ > - memcpy(stats, >stats, sizeof(struct rte_sched_subport_stats)); > - memset(>stats, 0, sizeof(struct rte_sched_subport_stats)); > + if (stats) > + *stats = s->stats; > + if (clear) > + memset(>stats, 0, sizeof(struct > rte_sched_subport_stats)); > > /* Subport TC ovesubscription status */ > - *tc_ov = s->tc_ov; > + if (tc_ov) > + *tc_ov = s->tc_ov; > > return 0; > } > > +/* Deprecated API, always clears */ > int > -rte_sched_queue_read_stats(struct rte_sched_port *port, > - uint32_t queue_id, > - struct rte_sched_queue_stats *stats, > - uint16_t *qlen) > +rte_sched_subport_read_stats(struct rte_sched_port *port, uint32_t > subport_id, > + struct rte_sched_subport_stats *stats, > + uint32_t *tc_ov) > +{ > + return rte_sched_subport_stats(port, subport_id, stats, tc_ov, 1); > +} > + > +int > +rte_sched_queue_stats(struct rte_sched_port *port, uint32_t queue_id, > + struct rte_sched_queue_stats *stats, > + uint16_t *qlen, int clear) > { > struct rte_sched_queue *q; > struct rte_sched_queue_extra *qe; > > /* Check user parameters */ > - if ((port == NULL) || > - (queue_id >= rte_sched_port_queues_per_port(port)) || > - (stats == NULL) || > - (qlen == NULL)) { > + if (port == NULL || queue_id >= > rte_sched_port_queues_per_port(port)) > return -1; > - } > + > q = port->queue + queue_id; > qe = port->queue_extra + queue_id; > > /* Copy queue stats and clear */ > - memcpy(stats, >stats, sizeof(struct rte_sched_queue_stats)); > - memset(>stats, 0, sizeof(struct rte_sched_queue_stats)); > + if (stats) > + *stats = qe-&g
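Stripped of the rte_sched types, the read-and-optionally-clear shape of the proposed API can be sketched as follows: output pointers may be NULL to skip a field, and clearing is an explicit caller choice instead of a side effect of reading. Names here are illustrative:

```c
#include <stdint.h>
#include <string.h>

struct queue_stats {
	uint64_t n_pkts;
	uint64_t n_pkts_dropped;
};

/* Copy live counters to the caller's buffer (if any) and clear the
 * live counters only when asked, following the patch's design. */
int
queue_stats_read(struct queue_stats *internal, struct queue_stats *out,
		 int clear)
{
	if (internal == NULL)
		return -1;
	if (out != NULL)
		*out = *internal;
	if (clear)
		memset(internal, 0, sizeof(*internal));
	return 0;
}
```

The deprecated always-clearing entry point then reduces to a one-line wrapper that passes `clear = 1`, which is exactly how the patch keeps the old API alive.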
[dpdk-dev] [PATCH 01/11] ip_pipeline: add parsing for config files with new syntax
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen > Hemminger > Sent: Monday, June 1, 2015 2:34 PM > To: Gajdzica, MaciejX T > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH 01/11] ip_pipeline: add parsing for config files with new syntax
>
> On Fri, 29 May 2015 17:43:08 +0200 > Maciej Gajdzica wrote:
>
> > +/** > > + * Find object of name *name* in *obj_array* which is constant size array of > > + * elements that have field *name*. > > + * > > + * @param obj_array > > + * Constant size array > > + * @param name > > + * name of object to find. > > + * @return > > + * Pointer to object in *obj_array* or NULL if not found. > > + */ > > +#define APP_PARAM_FIND(obj_array, key) \ > > +({ \ > > + ssize_t obj_idx;\ > > + const ssize_t obj_count = RTE_DIM(obj_array); \ > > +\ > > + for (obj_idx = 0; obj_idx < obj_count; obj_idx++) { \ > > + if (!APP_PARAM_VALID(&((obj_array)[obj_idx]))) \ > > + continue; \ > > + \ > > + if (strcmp(key, (obj_array)[obj_idx].name) == 0)\ > > + break; \ > > + } \ > > + obj_idx < obj_count ? obj_idx : -ENOENT;\ > > +})
>
> Converting all the functions to macros is a step backwards in several ways. > * macros are hard to support > * macros lead to lots of programming errors > * macros look ugly > > Why not use real functions, or make the example into C++ if you have to do generic programming.

We are using macros here only because C language does not offer us a better choice (i.e. support for templates). The alternative would be to write a quasi-identical function per each object type, which would lead to unnecessary code duplication. We did our best to keep the number of macros small and to implement each macro as straightforward as possible. All the DPDK sample applications are written in C, so this is the main reason we want to keep this application as C code. As people expect C code from DPDK sample apps, it is easier for people to reuse parts of this application.
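For reference, the core of the statement-expression technique being debated reduces to a few lines that work across unrelated struct types sharing a `name` field, which is the "template" effect Cristian describes. This is a GCC/clang extension, and the sketch omits the patch's validity check:

```c
#include <string.h>
#include <errno.h>

/* Simplified version of the APP_PARAM_FIND idea: find by name in any
 * constant-size array whose element type has a `name` member.
 * Returns the index, or -ENOENT if not found. */
#define PARAM_FIND(arr, key)                                         \
({                                                                   \
	long __i, __n = (long)(sizeof(arr) / sizeof((arr)[0]));      \
	for (__i = 0; __i < __n; __i++)                               \
		if (strcmp((key), (arr)[__i].name) == 0)              \
			break;                                        \
	__i < __n ? __i : -ENOENT;                                    \
})

/* Two unrelated types: the same macro serves both, which a plain C
 * function could not do without one copy per type. */
struct port_params  { const char *name; int id; };
struct table_params { const char *name; int size; };
```

A function-based alternative would need one `find` per struct type (or `void *` plus an element size and offset, losing type safety), which is the duplication trade-off the thread is arguing about.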
[dpdk-dev] [PATCH 0/3] librte_cfgfile rework and extension
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Maciej Gajdzica > Sent: Friday, May 29, 2015 4:27 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH 0/3] librte_cfgfile rework and extension > > From: Pawel Wodkowski > > This patchset provide extension to librte_cfgfile to overcome limit of 64 > characters for entry value and add line continue feature. > > Pawel Wodkowski (3): > librte_cfgfile: fix code formating in header file > librte_compat: fix macro deffinition > librte_cfgfile: rework > > lib/librte_cfgfile/Makefile| 2 +- > lib/librte_cfgfile/rte_cfgfile.c | 792 > ++--- > lib/librte_cfgfile/rte_cfgfile.h | 52 +- > lib/librte_cfgfile/rte_cfgfile_version.map | 8 + > lib/librte_compat/rte_compat.h | 2 +- > 5 files changed, 649 insertions(+), 207 deletions(-) > > -- > 1.9.1 Acked-by: Cristian Dumitrescu
[dpdk-dev] [PATCH v3 01/10] table: added structure for storing table stats
> -Original Message- > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > Sent: Tuesday, May 26, 2015 10:58 PM > To: Dumitrescu, Cristian > Cc: Gajdzica, MaciejX T; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v3 01/10] table: added structure for storing table stats
>
> On Tue, 26 May 2015 21:40:42 +0000 > "Dumitrescu, Cristian" wrote:
>
> > > > -Original Message- > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen > > > Hemminger > > > Sent: Tuesday, May 26, 2015 3:58 PM > > > To: Gajdzica, MaciejX T > > > Cc: dev at dpdk.org > > > Subject: Re: [dpdk-dev] [PATCH v3 01/10] table: added structure for storing > > > table stats > > > > > > On Tue, 26 May 2015 14:39:38 +0200 > > > Maciej Gajdzica wrote:
>
> > > > + > > > > /** Lookup table interface defining the lookup table operation */ > > > > struct rte_table_ops { > > > > rte_table_op_create f_create; /**< Create */ > > > > @@ -194,6 +218,7 @@ struct rte_table_ops { > > > > rte_table_op_entry_add f_add; /**< Entry add */ > > > > rte_table_op_entry_delete f_delete; /**< Entry delete */ > > > > rte_table_op_lookup f_lookup; /**< Lookup */ > > > > + rte_table_op_stats_read f_stats;/**< Stats */ > > > > };
>
> > > Another good idea, which is an ABI change.
>
> > This is simply adding a new API function, this is not changing any function prototype. There is no change required in the map file of this library. Is there anything we should have done and we did not do?
>
> But if I built an external set of code which had rte_table_ops (don't worry I haven't) and that binary ran with the new definition, the core code in the table library would reference outside the (old version of the) rte_table_ops structure and find garbage.

This is just adding a new field at the end of an API data structure. Based on input from multiple people and after reviewing the rules listed on http://dpdk.org/doc/guides/rel_notes/abi.html , I think this is an acceptable change.
There are other patches in flight on this mailing list that are in the same situation. Any typical/well-behaved application will not break due to this change.
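Both sides of this argument can be checked directly: appending a member leaves the offsets of the existing members unchanged (so old callers still find `f_create` and friends where they were), while the struct size grows, which is exactly what bites a binary that allocated the old-sized struct itself. A minimal illustration, not the actual rte_table definitions:

```c
#include <stddef.h>

/* Old and new layouts of an ops-style struct, with one member appended. */
struct ops_v1 {
	void (*f_create)(void);
	void (*f_lookup)(void);
};

struct ops_v2 {
	void (*f_create)(void);
	void (*f_lookup)(void);
	void (*f_stats)(void);	/* appended member */
};
```

If the library only ever reads the new member from structs it allocated itself, old binaries keep working; if a caller-allocated `ops_v1` is passed where `ops_v2` is expected, reading `f_stats` walks off the end, which is Stephen's scenario.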
[dpdk-dev] [PATCH v3] pipeline: add statistics for librte_pipeline
Hi Raja, Thanks for your input. I think we have the following options identified so far for stats collection configuration: 1. Stats configuration through the RTE_LOG_LEVEL 2. Single configuration flag global for all DPDK libraries 3. Single configuration flag per DPDK library It would be good if Thomas and Stephen, as well as others, would reply with their preference order. My personal preference order is: 3., 2., 1., but I can work with any of the above that is identified by the majority of the replies. My goal right now is reaching a conclusion on this item as soon as we can. Regards, Cristian > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Rajagopalan > Sivaramakrishnan > Sent: Wednesday, May 27, 2015 11:45 PM > To: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v3] pipeline: add statistics for > librte_pipeline > > > > > You also reiterate that you would like to have the stats always enabled. > You > > can definitely do this, it is one of the available choices, but why not also > > accommodate the users that want to pick the opposite choice? Why force > > apps to spend cycles on stats if the app either does not want these > counters > > (library counters not relevant for that app, maybe the app is only > interested > > in maintaining some other stats that it implements itself) or do not want > > them anymore (maybe they only needed them during debug phase), etc? > > Jay asked this question, and I did my best in my reply to describe our > > motivation (http://www.dpdk.org/ml/archives/dev/2015- > May/017992.html). > > Maybe you missed that post, it would be good to get your reply on this one > > too. > > > > I want to see DPDK get out of the config madness. > > This is real code, not an Intel benchmark special. 
> > > I agree that statistics will definitely be required in most real-world > production > environments and the overhead > from per-core stats gathering will be minimal if the data structures are such > that CPU cache thrashing is avoided. > However, if there are scenarios where it is desirable to turn stats off, I > think > we can live with a config option. > I am not comfortable with using the log level to enable/disable statistics as > they are not really related. A > separate config option for stats collection seems like a reasonable > compromise. > > Raja
[dpdk-dev] [PATCH v3] pipeline: add statistics for librte_pipeline ports and tables
> -Original Message- > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > Sent: Tuesday, May 26, 2015 10:48 PM > To: Dumitrescu, Cristian > Cc: Gajdzica, MaciejX T; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v3] pipeline: add statistics for > librte_pipeline > ports and tables > > On Tue, 26 May 2015 21:35:22 + > "Dumitrescu, Cristian" wrote: > > > > > > > > -Original Message- > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen > > > Hemminger > > > Sent: Tuesday, May 26, 2015 3:57 PM > > > To: Gajdzica, MaciejX T > > > Cc: dev at dpdk.org > > > Subject: Re: [dpdk-dev] [PATCH v3] pipeline: add statistics for > librte_pipeline > > > ports and tables > > > > > > On Tue, 26 May 2015 15:39:18 +0200 > > > Maciej Gajdzica wrote: > > > > > > > +#if RTE_LOG_LEVEL == RTE_LOG_DEBUG > > > > +#define RTE_PIPELINE_STATS_ADD(counter, val) \ > > > > + ({ (counter) += (val); }) > > > > + > > > > +#define RTE_PIPELINE_STATS_ADD_M(counter, mask) \ > > > > + ({ (counter) += __builtin_popcountll(mask); }) > > > > +#else > > > > +#define RTE_PIPELINE_STATS_ADD(counter, val) > > > > +#define RTE_PIPELINE_STATS_ADD_M(counter, mask) > > > > +#endif > > > > > > This is worse idea than the previous one. > > > I want statistics done on a per lcore basis, and always available > > > because real users on production system want statistics (but they > > > don't want log spam). > > > > Stephen, > > > > Thomas and myself converged towards this solution, Thomas asked if > anybody objects, you did not (http://www.dpdk.org/ml/archives/dev/2015- > May/018099.html) . You say this idea is bad, but what exactly is your > objection? Do you have an alternative proposal? > > Yes. Always keep statistics. > > We use functions like this internally. 
> > struct xxx_stats { > uint64_t mib[XXX_MIB_MAX]; > }; > extern struct xxx_stats xxx_stats[RTE_MAX_LCORE]; > > #define XXXSTAT_INC(type) xxxstats[rte_lcore_id()].mibs[type]++ > > This can only be applied if stats are always enabled. I know this is your preferred choice, but other people have the requirement to be able to have the stats collection configurable, and we should also accommodate the needs of those people. Your code snippet does not provide a solution for this problem. > > You already mentioned in the previous thread you would like to have per > lcore statistics. I was kindly asking you to describe your idea, but you did > not > do this yet (http://www.dpdk.org/ml/archives/dev/2015-May/017956.html ). > Can you please describe it? Each port instance has its own statistics > counters. > Each lcore can run one or more pipeline instances, therefore each lcore > typically runs several port instances of the same or different type (each port > instance with its own statistics), so how is the "per lcore stats" requirement > applicable here? > > I thought you were familiar with the technique of having per-cpu structures and > variables, > widely used in the Linux kernel to prevent cache thrashing. Although you call it > false sharing, > it isn't false; it happens when the same counter ping-pongs between multiple > threads for > no added benefit. The "per lcore stats" is applicable to stats structures that are accessed by more than one lcore. In our case, each port instance is handled by a single lcore, so it is never the case that two lcores would access the stats of the same port instance. So, we can safely say that port stats are "per lcore" already. > > > > You also reiterate that you would like to have the stats always enabled. You > can definitely do this, it is one of the available choices, but why not also > accommodate the users that want to pick the opposite choice? 
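Stephen's flattened snippet above can be reconstructed as the following minimal sketch. The `xxx` names are his placeholders; the cache-line alignment and the `current_lcore` stand-in for `rte_lcore_id()` are assumptions added here so the idea is self-contained:

```c
#include <stdint.h>

#define XXX_MIB_MAX 8
#define MAX_LCORE   16
#define CACHE_LINE  64

/* One counter block per lcore, each aligned to its own cache line so
 * counters updated by different cores never share a line (the
 * "ping-pong" Stephen describes). */
struct xxx_stats {
    uint64_t mib[XXX_MIB_MAX];
} __attribute__((aligned(CACHE_LINE)));

static struct xxx_stats xxx_stats[MAX_LCORE];

/* Stand-in for rte_lcore_id(); in DPDK this returns the id of the
 * core executing the current thread. */
static unsigned current_lcore;

#define XXX_STAT_INC(type) (xxx_stats[current_lcore].mib[(type)]++)
```

Because each core only ever writes its own block, no locking or atomic operations are needed; a reader that wants a global total sums the per-core blocks.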
Why force > apps to spend cycles on stats if the app either does not want these counters > (library counters not relevant for that app, maybe the app is only interested > in maintaining some other stats that it implements itself) or do not want > them anymore (maybe they only needed them during debug phase), etc? > Jay asked this question, and I did my best in my reply to describe our > motivation (http://www.dpdk.org/ml/archives/dev/2015-May/017992.html). > Maybe you missed that post, it would be good to get your reply on this one > too. > > I want to see DPDK get out of the config madness. > This is real code, not an Intel benchmark special. You seem to make this assumption that our real reason for hav
[dpdk-dev] [PATCH v3 01/10] table: added structure for storing table stats
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen > Hemminger > Sent: Tuesday, May 26, 2015 3:58 PM > To: Gajdzica, MaciejX T > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v3 01/10] table: added structure for storing > table stats > > On Tue, 26 May 2015 14:39:38 +0200 > Maciej Gajdzica wrote: > > > + > > /** Lookup table interface defining the lookup table operation */ > > struct rte_table_ops { > > rte_table_op_create f_create; /**< Create */ > > @@ -194,6 +218,7 @@ struct rte_table_ops { > > rte_table_op_entry_add f_add; /**< Entry add */ > > rte_table_op_entry_delete f_delete; /**< Entry delete */ > > rte_table_op_lookup f_lookup; /**< Lookup */ > > + rte_table_op_stats_read f_stats;/**< Stats */ > > }; > > Another good idea, which is an ABI change. This is simply adding a new API function, this is not changing any function prototype. There is no change required in the map file of this library. Is there anything we should have done and we did not do?
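The change under discussion reduces to appending one function pointer at the end of the ops struct. A stripped-down sketch (names abridged from `rte_table_ops`; the prototypes are simplified stand-ins, not the real DPDK signatures):

```c
#include <stddef.h>

/* Simplified operation prototypes; the real rte_table_* callbacks
 * take more parameters. */
typedef int (*table_op_lookup)(void *table);
typedef int (*table_op_stats_read)(void *table, void *stats, int clear);

struct table_ops {
    table_op_lookup     f_lookup;  /* existing member */
    table_op_stats_read f_stats;   /* new member, appended last */
};

/* A table type that does not implement stats leaves f_stats NULL,
 * so callers must check the pointer before invoking it. */
static int dummy_lookup(void *table) { (void)table; return 0; }
static const struct table_ops dummy_ops = { dummy_lookup, NULL };
```

Appending the member keeps the offsets of existing members unchanged, which is the basis of Cristian's argument that no symbol in the library's map file changes; whether resizing a struct that applications instantiate still counts as an ABI break is exactly the point Stephen raises.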
[dpdk-dev] [PATCH v3] pipeline: add statistics for librte_pipeline ports and tables
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen > Hemminger > Sent: Tuesday, May 26, 2015 3:57 PM > To: Gajdzica, MaciejX T > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v3] pipeline: add statistics for > librte_pipeline > ports and tables > > On Tue, 26 May 2015 15:39:18 +0200 > Maciej Gajdzica wrote: > > > +#if RTE_LOG_LEVEL == RTE_LOG_DEBUG > > +#define RTE_PIPELINE_STATS_ADD(counter, val) \ > > + ({ (counter) += (val); }) > > + > > +#define RTE_PIPELINE_STATS_ADD_M(counter, mask) \ > > + ({ (counter) += __builtin_popcountll(mask); }) > > +#else > > +#define RTE_PIPELINE_STATS_ADD(counter, val) > > +#define RTE_PIPELINE_STATS_ADD_M(counter, mask) > > +#endif > > This is worse idea than the previous one. > I want statistics done on a per lcore basis, and always available > because real users on production system want statistics (but they > don't want log spam). Stephen, Thomas and myself converged towards this solution, Thomas asked if anybody objects, you did not (http://www.dpdk.org/ml/archives/dev/2015-May/018099.html) . You say this idea is bad, but what exactly is your objection? Do you have an alternative proposal? You already mentioned in the previous thread you would like to have per lcore statistics. I was kindly asking you to describe your idea, but you did not do this yet (http://www.dpdk.org/ml/archives/dev/2015-May/017956.html ). Can you please describe it? Each port instance has its own statistics counters. Each lcore can run one or more pipeline instances, therefore each lcore typically runs several port instances of the same or different type (each port instance with its own statistics), so how is "per lcore stats" requirement applicable here? You also reiterate that you would like to have the stats always enabled. You can definitely do this, it is one of the available choices, but why not also accommodate the users that want to pick the opposite choice? 
Why force apps to spend cycles on stats if the app either does not want these counters (library counters not relevant for that app, maybe the app is only interested in maintaining some other stats that it implements itself) or do not want them anymore (maybe they only needed them during debug phase), etc? Jay asked this question, and I did my best in my reply to describe our motivation (http://www.dpdk.org/ml/archives/dev/2015-May/017992.html). Maybe you missed that post, it would be good to get your reply on this one too. Thanks, Cristian
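The pattern being debated, compiling the counter updates down to nothing when the flag is off, can be sketched as follows. `STATS_ENABLED` is an illustrative stand-in for the `RTE_LOG_LEVEL == RTE_LOG_DEBUG` check in the patch:

```c
#include <stdint.h>

/* Build-time switch; building with -DSTATS_ENABLED=0 compiles the
 * updates out entirely, leaving zero run-time cost. */
#ifndef STATS_ENABLED
#define STATS_ENABLED 1
#endif

#if STATS_ENABLED
#define PIPELINE_STATS_ADD(counter, val) \
    ((counter) += (val))
#define PIPELINE_STATS_ADD_M(counter, mask) \
    ((counter) += (uint64_t)__builtin_popcountll(mask))
#else
/* Expand to nothing: no load, no store, no branch. */
#define PIPELINE_STATS_ADD(counter, val)    do { } while (0)
#define PIPELINE_STATS_ADD_M(counter, mask) do { } while (0)
#endif
```

The `_M` variant counts set bits in a packet mask, so one macro invocation accounts for a whole burst of packets at once.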
[dpdk-dev] [PATCH v3 00/10] table: added table statistics
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Maciej Gajdzica > Sent: Tuesday, May 26, 2015 1:40 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v3 00/10] table: added table statistics > > Added statistics for every type of table. By default all table statistics > are disabled, user must activate them in config file. > > Changes in v2: > - added missing signoffs > > Changes in v3: > - removed new config options to enable/disable stats > - using RTE_LOG_LEVEL instead > > Maciej Gajdzica (10): > table: added structure for storing table stats > table: added acl table stats > table: added array table stats > table: added hash_ext table stats > table: added hash_key16 table stats > table: added hash_key32 table stats > table: added hash_key8 table stats > table: added hash_lru table stats > table: added lpm_ipv6 table stats > table: added lpm table stats > > lib/librte_table/rte_table.h| 25 +++ > lib/librte_table/rte_table_acl.c| 35 + > lib/librte_table/rte_table_array.c | 34 +++- > lib/librte_table/rte_table_hash_ext.c | 44 > ++ > lib/librte_table/rte_table_hash_key16.c | 41 > > lib/librte_table/rte_table_hash_key32.c | 41 > > lib/librte_table/rte_table_hash_key8.c | 52 > +++ > lib/librte_table/rte_table_hash_lru.c | 44 > ++ > lib/librte_table/rte_table_lpm.c| 34 > lib/librte_table/rte_table_lpm_ipv6.c | 34 > 10 files changed, 383 insertions(+), 1 deletion(-) > > -- > 1.7.9.5 Acked-by: Cristian Dumitrescu
[dpdk-dev] [PATCH v3 00/13] port: added port statistics
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Maciej Gajdzica > Sent: Tuesday, May 26, 2015 10:23 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v3 00/13] port: added port statistics > > Added statistics for every type of port. By default all port statistics > are disabled, user must activate them in config file. > > Changes in v2: > - added missing signoffs > > Changes in v3: > - removed new config options to enable/disable stats > - using RTE_LOG_LEVEL instead > > Maciej Gajdzica (13): > port: added structures for port stats > port: added port_ethdev_reader stats > port: added port_ethdev_writer stats > port: added port_ethdev_writer_nodrop stats > port: added port_frag stats > port: added port_ras stats > port: added port_ring_reader stats > port: added port_ring_writer stats > port: added port_ring_writer_nodrop stats > port: added port_sched_reader stats > port: added port_sched_writer stats > port: added port_source stats > port: added port_sink stats > > lib/librte_port/rte_port.h | 60 +++-- > lib/librte_port/rte_port_ethdev.c | 110 > +- > lib/librte_port/rte_port_frag.c| 36 ++ > lib/librte_port/rte_port_ras.c | 38 +++ > lib/librte_port/rte_port_ring.c| 114 > +++- > lib/librte_port/rte_port_sched.c | 96 +-- > lib/librte_port/rte_port_source_sink.c | 98 > +-- > 7 files changed, 535 insertions(+), 17 deletions(-) > > -- > 1.7.9.5 Acked-by: Cristian Dumitrescu
[dpdk-dev] [PATCH v2] pipeline: add statistics for librte_pipeline ports and tables
From: Jay Rolette [mailto:role...@infiniteio.com] Sent: Thursday, May 21, 2015 4:00 PM To: Dumitrescu, Cristian Cc: Thomas Monjalon; dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH v2] pipeline: add statistics for librte_pipeline ports and tables On Thu, May 21, 2015 at 8:33 AM, Dumitrescu, Cristian mailto:cristian.dumitrescu at intel.com>> wrote: > > The problem I see with this approach is that you cannot turn off debug > > messages while still having the statistics collected. We should allow > > people to collects the stats without getting the debug messages. How do > > we do this? > > By setting build-time log level to debug, you would enable stats and debug > messages. Then you can adjust the log messages level with the run-time > option --log-level. This is a really clever trick! I have to say it took me some time to make sure I got it right :) So when RTE_LOG_LEVEL (build time option) is set to DEBUG (i.e. 8), then we get both the stats and the debug messages, but then we can adjust the level of debug messages at run-time through the --log-level EAL option. I can work with this option, thank you! There is one more simple proposal that came to my mind late last night: how about single config file option RTE_COLLECT_STATS=y/n that should be used by all the libs across the board to indicate whether stats should be enabled or not? This is logically very similar to the solution above, as they both look at a single option in the config file, but maybe it is also more intuitive for people. I will go with your choice. Which one do you pick? As Stephen said previously, any real DPDK app needs stats. You can't support network products operationally without them. What's the point of making them optional other than trying to juice benchmarks? Jay Hi Jay, As explained in this thread, our strategy as a library is to keep all options open for the application: the application should be the one to decide whether the statistics collection should be enabled or not. 
The library should not take application level decisions: a) For application A, these stats are relevant, so they should be collected; b) For application B, these counters are not relevant, so they should not be collected, so we allow the application to spend the CPU cycles doing some other useful work; c) For application C, these counters might only be relevant during debugging phase, so they should be collected during that time, while later on, when debugging is completed, their collection should be disabled. I do not see why we should force the application to collect the stats if/when it does not need them. The library should allow the application to decide what the library should do, not the other way around. Regards, Cristian
[dpdk-dev] [PATCH v2] pipeline: add statistics for librte_pipeline ports and tables
> -Original Message- > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > Sent: Thursday, May 21, 2015 8:30 AM > To: Dumitrescu, Cristian > Cc: Wodkowski, PawelX; dev at dpdk.org; Jastrzebski, MichalX K > Subject: Re: [dpdk-dev] [PATCH v2] pipeline: add statistics for > librte_pipeline > ports and tables > > 2015-05-20 23:41, Dumitrescu, Cristian: > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > > > 2015-05-20 13:57, Dumitrescu, Cristian: > > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > > > > > Thanks for the detailed explanation. > > > > > > > > > > You are raising a trade-off problem about > > > > > feature/maintenance/performance. > > > > > I think we must choose among 4 solutions: > > > > > 1/ always enabled > > > > > 2/ build-time log level > > > > > 3/ run-time option > > > > > 4/ build-time option > > > > > > > > > > It's good to have this discussion and let other contributors giving > > > > > their > > > > > opinion. > > > > > > > > > > 2015-05-19 22:41, Dumitrescu, Cristian: > > > > > > 1. What is the technical solution to avoid performance loss due to > stats > > > > > > support? > > > > > > Generally, I would agree with you that config options should be > avoided, > > > > > > especially those that alter the API (function prototypes, data > structure > > > > > > layouts, etc). Please note this is not the case for any of our > > > > > > patches, > > > > > > as only the stats collection is enabled/disabled, while the data > > > > > > structures and functions are not changed by the build time option. > > > > > > > > > > > > But what can you do for statistics? We all know that collecting the > stats > > > > > > sucks up CPU cycles, especially due to memory accesses, so stats > always > > > > > > have a run-time cost. Traditionally, stats are typically enabled for > > > > > > debugging purposes and then turned off for the release version > when > > > > > > performance is required. 
How can this be done if build time flags > > > > > > are > > > > > > not used? > > > > > > > > > > Statistics in drivers are always enabled (first solution). > > > > > If those statistics are only used for debugging, why not using the > build- > > > > > time option CONFIG_RTE_LOG_LEVEL? (second solution) > > > > > > > > Can you please describe what solution 2 on your list (build-time log > > > > level) consists of? > > > > > > > > I see log level useful for printing messages when an event takes place, > > > > but this is not what these stats patches are about. We want to poll > > > > for those counters on demand: if the build-time flag is off, then the > > > > value of those counters is 0; when the build-time is on, then the stats > > > > counters have the real value. Basically, the build-time flag only > > > > enables/disables the update of the counters at run-time, which is > where > > > > CPU cycles are consumed. I am not sure how the log levels can help > here? > > > > > > I think that counting stats is a kind of logging. > > > Some stats are always counted (see drivers) and you want to use these > ones > > > only for debugging (after rebuilding DPDK with some debug options). > > > So I suggest, as second solution, to check CONFIG_RTE_LOG_LEVEL is at > > > debug > > > level instead of having one option per module. > > > It would be implemented with "#if RTE_LOG_LEVEL == RTE_LOG_DEBUG" > in > > > RTE_PIPELINE_STATS_ADD. > > > > The problem I see with this approach is that you cannot turn off debug > > messages while still having the statistics collected. We should allow > > people to collects the stats without getting the debug messages. How do > > we do this? > > By setting build-time log level to debug, you would enable stats and debug > messages. Then you can adjust the log messages level with the run-time > option --log-level. This is a really clever trick! 
I have to say it took me some time to make sure I got it right :) So when RTE_LOG_LEVEL (build time option) is set to DEBUG (i.e. 8), then we get both the stats and the debug messages, but then we can adjust the level of debug messages at run-time through the --log-level EAL option. I can work with this option, thank you! There is one more simple proposal that came to my mind late last night: how about single config file option RTE_COLLECT_STATS=y/n that should be used by all the libs across the board to indicate whether stats should be enabled or not? This is logically very similar to the solution above, as they both look at a single option in the config file, but maybe it is also more intuitive for people. I will go with your choice. Which one do you pick?
[dpdk-dev] [PATCH v2] pipeline: add statistics for librte_pipeline ports and tables
> -Original Message- > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > Sent: Wednesday, May 20, 2015 11:02 PM > To: Stephen Hemminger; Dumitrescu, Cristian > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v2] pipeline: add statistics for > librte_pipeline > ports and tables > > 2015-05-20 10:59, Stephen Hemminger: > > On Wed, 20 May 2015 16:44:35 +0200 > > Thomas Monjalon wrote: > > > > > Please Cristian, do not top post. > > > I'm replacing your comment in the right context. > > > > > > 2015-05-20 13:57, Dumitrescu, Cristian: > > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > > > > > Thanks for the detailed explanation. > > > > > > > > > > You are raising a trade-off problem about > > > > > feature/maintenance/performance. > > > > > I think we must choose among 4 solutions: > > > > > 1/ always enabled > > > > > 2/ build-time log level > > > > > 3/ run-time option > > > > > 4/ build-time option > > > > > > > > > > It's good to have this discussion and let other contributors giving > > > > > their > > > > > opinion. > > > > > > > > > > 2015-05-19 22:41, Dumitrescu, Cristian: > > > > > > 1. What is the technical solution to avoid performance loss due to > stats > > > > > > support? > > > > > > Generally, I would agree with you that config options should be > avoided, > > > > > > especially those that alter the API (function prototypes, data > structure > > > > > > layouts, etc). Please note this is not the case for any of our > > > > > > patches, > > > > > > as only the stats collection is enabled/disabled, while the data > > > > > > structures and functions are not changed by the build time option. > > > > > > > > > > > > But what can you do for statistics? We all know that collecting the > stats > > > > > > sucks up CPU cycles, especially due to memory accesses, so stats > always > > > > > > have a run-time cost. 
Traditionally, stats are typically enabled for > > > > > > debugging purposes and then turned off for the release version > when > > > > > > performance is required. How can this be done if build time flags > > > > > > are > not > > > > > > used? > > > > > > > > > > Statistics in drivers are always enabled (first solution). > > > > > If those statistics are only used for debugging, why not using the > build-time > > > > > option CONFIG_RTE_LOG_LEVEL? (second solution) > > > > > > > > Can you please describe what solution 2 on your list (build-time log > > > > level) consists of? > > > > > > > > I see log level useful for printing messages when an event takes place, > > > > but this is not what these stats patches are about. We want to poll > > > > for those counters on demand: if the build-time flag is off, then the > > > > value of those counters is 0; when the build-time is on, then the stats > > > > counters have the real value. Basically, the build-time flag only > > > > enables/disables the update of the counters at run-time, which is > where > > > > CPU cycles are consumed. I am not sure how the log levels can help > here? > > > > > > I think that counting stats is a kind of logging. > > > Some stats are always counted (see drivers) and you want to use these > ones > > > only for debugging (after rebuilding DPDK with some debug options). > > > So I suggest, as second solution, to check CONFIG_RTE_LOG_LEVEL is at > debug > > > level instead of having one option per module. > > > It would be implemented with "#if RTE_LOG_LEVEL == RTE_LOG_DEBUG" > in > > > RTE_PIPELINE_STATS_ADD. > > > > > > > In my experience, per-queue or per-cpu statistics have no visible > performance > > impact. And every real world deployment wants statistics > > Yes. Maybe that having a benchmark with per-cpu stats would help to discuss > the first solution (stats always enabled). What are per-cpu stats? 
Are you referring to the false sharing problem when each CPU core maintains its own stats, but stats from several cores end up being stored in the same cache line? We don't have this problem here.
[dpdk-dev] [PATCH v2] pipeline: add statistics for librte_pipeline ports and tables
> -Original Message- > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > Sent: Wednesday, May 20, 2015 3:45 PM > To: Dumitrescu, Cristian > Cc: Wodkowski, PawelX; dev at dpdk.org; Jastrzebski, MichalX K > Subject: Re: [dpdk-dev] [PATCH v2] pipeline: add statistics for > librte_pipeline > ports and tables > > Please Cristian, do not top post. > I'm replacing your comment in the right context. > > 2015-05-20 13:57, Dumitrescu, Cristian: > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > > > Thanks for the detailed explanation. > > > > > > You are raising a trade-off problem about > > > feature/maintenance/performance. > > > I think we must choose among 4 solutions: > > > 1/ always enabled > > > 2/ build-time log level > > > 3/ run-time option > > > 4/ build-time option > > > > > > It's good to have this discussion and let other contributors giving their > > > opinion. > > > > > > 2015-05-19 22:41, Dumitrescu, Cristian: > > > > 1. What is the technical solution to avoid performance loss due to stats > > > > support? > > > > Generally, I would agree with you that config options should be > avoided, > > > > especially those that alter the API (function prototypes, data structure > > > > layouts, etc). Please note this is not the case for any of our patches, > > > > as only the stats collection is enabled/disabled, while the data > > > > structures and functions are not changed by the build time option. > > > > > > > > But what can you do for statistics? We all know that collecting the > > > > stats > > > > sucks up CPU cycles, especially due to memory accesses, so stats always > > > > have a run-time cost. Traditionally, stats are typically enabled for > > > > debugging purposes and then turned off for the release version when > > > > performance is required. How can this be done if build time flags are > not > > > > used? > > > > > > Statistics in drivers are always enabled (first solution). 
> > > If those statistics are only used for debugging, why not using the build-time > > > option CONFIG_RTE_LOG_LEVEL? (second solution) > > > > Can you please describe what solution 2 on your list (build-time log > > level) consists of? > > > > I see log level useful for printing messages when an event takes place, > > but this is not what these stats patches are about. We want to poll > > for those counters on demand: if the build-time flag is off, then the > > value of those counters is 0; when the build-time is on, then the stats > > counters have the real value. Basically, the build-time flag only > > enables/disables the update of the counters at run-time, which is where > > CPU cycles are consumed. I am not sure how the log levels can help here? > > I think that counting stats is a kind of logging. > Some stats are always counted (see drivers) and you want to use these ones > only for debugging (after rebuilding DPDK with some debug options). > So I suggest, as second solution, to check CONFIG_RTE_LOG_LEVEL is at > debug > level instead of having one option per module. > It would be implemented with "#if RTE_LOG_LEVEL == RTE_LOG_DEBUG" in > RTE_PIPELINE_STATS_ADD. The problem I see with this approach is that you cannot turn off debug messages while still having the statistics collected. We should allow people to collect the stats without getting the debug messages. How do we do this? Although most people use the stats for debugging and then turn them off, some other people are OK with the stats-related performance drop for the production code (e.g. they might want to log some stats periodically), but they want to get rid of the debug messages. Potential idea: Would it be acceptable to create a new log level that is dedicated to stats? This would have to be higher priority than all the present ones, so it needs to be: #define RTE_LOG_STATS 1U. 
This way, we can get rid of all the debug messages while still logging stats:

#if (RTE_LOG_STATS <= RTE_LOG_LEVEL)
#define STATS_COUNTER_ADD(level, type, counter, val) (counter) += (val)
#else
#define STATS_COUNTER_ADD(level, type, counter, val)
#endif

However, this won't allow getting the debug messages with stats collection being suppressed. But probably this case is not relevant.
[dpdk-dev] [PATCH v2] pipeline: add statistics for librte_pipeline ports and tables
Hi Thomas, Can you please describe what solution 2 on your list (build-time log level) consists of? I see log level useful for printing messages when an event takes place, but this is not what these stats patches are about. We want to poll for those counters on demand: if the build-time flag is off, then the value of those counters is 0; when the build-time is on, then the stats counters have the real value. Basically, the build-time flag only enables/disables the update of the counters at run-time, which is where CPU cycles are consumed. I am not sure how the log levels can help here? Regards, Cristian > -Original Message- > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > Sent: Wednesday, May 20, 2015 1:06 AM > To: Dumitrescu, Cristian > Cc: Wodkowski, PawelX; dev at dpdk.org; Jastrzebski, MichalX K > Subject: Re: [dpdk-dev] [PATCH v2] pipeline: add statistics for > librte_pipeline > ports and tables > > Hi Cristian, > > Thanks for the detailed explanation. > > You are raising a trade-off problem about > feature/maintenance/performance. > I think we must choose among 4 solutions: > 1/ always enabled > 2/ build-time log level > 3/ run-time option > 4/ build-time option > > It's good to have this discussion and let other contributors giving their > opinion. > > 2015-05-19 22:41, Dumitrescu, Cristian: > > Hi Thomas, > > > > I am asking for your help to identify a technical solution for moving > > forward. Enabling the stats is a very important part of our Packet > > Framework enhancements, as it is a dependency for a set of patches > > we plan to submit next. > > > > 1. What is the technical solution to avoid performance loss due to stats > > support? > > Generally, I would agree with you that config options should be avoided, > > especially those that alter the API (function prototypes, data structure > > layouts, etc). 
Please note this is not the case for any of our patches, > > as only the stats collection is enabled/disabled, while the data > > structures and functions are not changed by the build time option. > > > > But what can you do for statistics? We all know that collecting the stats > > sucks up CPU cycles, especially due to memory accesses, so stats always > > have a run-time cost. Traditionally, stats are typically enabled for > > debugging purposes and then turned off for the release version when > > performance is required. How can this be done if build time flags are not > > used? > > Statistics in drivers are always enabled (first solution). > If those statistics are only used for debugging, why not using the build-time > option CONFIG_RTE_LOG_LEVEL? (second solution) > > > Enabling stats conditionally at run-time has a significant cost > > potentially, as this involves adding branches to the code; this is > > Yes, a run-time option (similar to run-time log level) will cost only > an "unlikely if" branching. (third solution) > > > particularly bad due to branches being added in the library code, > > which is not aware of the overhead of the application running on top of > > it on the same CPU core. We are battling cycles very hard in the > > Packet Framework, trying to avoid any waste, and the cost of branch > > misprediction of ~15 cycles is prohibitive when your packet budget > > (for the overall app, not just the library) is less than ~200 cycles. > > This kind of issue applies to many areas of DPDK. > > > Sometimes, branches are predicted correctly by the CPU, some other > > times not, for example when the application code adds a lot of additional > > branches (which is the typical case for an app). This is outside of the > > control of the library. > > > > I would not recommend the run-time conditional option for stats support > > in DPDK, where performance is key. 
The only option that looks feasible > > to me to avoid performance impact is well-behaved build time option > > (i.e. those that does not change the API functions and data structures). > > The compile-time option makes testing coverage really complex. > (fourth solution) > > > 2. You already agreed that having build time options for performance > > reasons is fine, why change now? > > Right, build time options are tolerated in DPDK because it is a performance > oriented project. We also have to avoid 2 issues: > - increasing number of compile-time options hide some bugs > - binary distributions of DPDK cannot use these compile-time "features" > > > We had a previous email thread on QoS stats > (http://www.dpdk.org/ml/archives/dev/2015-February/0
[dpdk-dev] [PATCH v2 1/3] port: added WRITER_APPROACH == 1 implementation to ring port
Hi Thomas and David, We can remove one of the code branches if you guys feel strongly about it. The reason we recommended to keep both is because both of them are working, and we wanted to keep both of them for a while to get some feedback from other people about which one they prefer. It might be that some apps would prefer one over the other for performance reasons. But again, we can resend this patch with only one code path present. Regards, Cristian > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon > Sent: Sunday, May 17, 2015 10:45 PM > To: Jastrzebski, MichalX K; Gajdzica, MaciejX T > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v2 1/3] port: added WRITER_APPROACH == 1 > implementation to ring port > > 2015-04-30 13:58, Michal Jastrzebski: > > From: Maciej Gajdzica > > > > Added better optimized implementation of tx_bulk for ring writer port > > based on > > similar solution in ethdev_writer port. New implementation sends burst > > without > > copying data to internal buffer if it is possible. > > > > Signed-off-by: Maciej Gajdzica > [...] > > +#if RTE_PORT_RING_WRITER_APPROACH == 0 > > Nack > Maybe you missed the previous comment: > http://dpdk.org/ml/archives/dev/2015-March/015999.html
[dpdk-dev] [PATCH v2] pipeline: add statistics for librte_pipeline ports and tables
Hi Thomas,

I am asking for your help to identify a technical solution for moving forward. Enabling the stats is a very important part of our Packet Framework enhancements, as it is a dependency for a set of patches we plan to submit next.

1. What is the technical solution to avoid performance loss due to stats support?

Generally, I would agree with you that config options should be avoided, especially those that alter the API (function prototypes, data structure layouts, etc). Please note this is not the case for any of our patches, as only the stats collection is enabled/disabled, while the data structures and functions are not changed by the build time option.

But what can you do for statistics? We all know that collecting the stats sucks up CPU cycles, especially due to memory accesses, so stats always have a run-time cost. Traditionally, stats are enabled for debugging purposes and then turned off for the release version when performance is required. How can this be done if build time flags are not used?

Enabling stats conditionally at run-time potentially has a significant cost, as it involves adding branches to the code; this is particularly bad because the branches land in the library code, which is not aware of the overhead of the application running on top of it on the same CPU core. We are battling cycles very hard in the Packet Framework, trying to avoid any waste, and the ~15 cycle cost of a branch misprediction is prohibitive when your packet budget (for the overall app, not just the library) is less than ~200 cycles. Sometimes branches are predicted correctly by the CPU, sometimes not, for example when the application code adds a lot of additional branches (which is the typical case for an app). This is outside of the control of the library. I would not recommend the run-time conditional option for stats support in DPDK, where performance is key.
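The build-time pattern being argued for can be sketched as follows (hypothetical flag and macro names chosen for illustration; the actual patches use flags such as CONFIG_RTE_PIPELINE_STATS_COLLECT): when the flag is off, the counter update compiles away entirely, so the fast path carries neither the memory access nor a run-time branch.

```c
#include <stdint.h>

/* Hypothetical names for illustration only. With the flag set to 0, the
 * macro expands to nothing and the stats update has zero run-time cost. */
#define APP_STATS_COLLECT 1

#if APP_STATS_COLLECT
#define APP_STATS_ADD(counter, val) ((counter) += (val))
#else
#define APP_STATS_ADD(counter, val) /* compiled out */
#endif

static uint64_t n_pkts_in;

/* Fast-path function: the stats update is either a plain add or nothing,
 * decided at compile time, never a run-time branch. */
uint64_t process_burst(uint64_t n_pkts)
{
	APP_STATS_ADD(n_pkts_in, n_pkts);
	return n_pkts_in;
}
```

Flipping APP_STATS_COLLECT to 0 leaves the function prototype and data structures untouched, which is the "well-behaved" property discussed above.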
The only option that looks feasible to me to avoid performance impact is a well-behaved build time option (i.e. one that does not change the API functions and data structures).

2. You already agreed that having build time options for performance reasons is fine, why change now?

We had a previous email thread on QoS stats (http://www.dpdk.org/ml/archives/dev/2015-February/013820.html ), and you stated:
>We must remove and avoid build-time options.
>The only ones which might be acceptable are the ones which allow more
>performance by disabling some features.
I think stats support is the perfect example where this is applicable. We took this as the guidance from you, and we worked out these stats exactly as you recommended. Let me also mention that the format of the API functions and data structures for stats was also designed to match exactly the requirements from this discussion (e.g. the clear on read approach, etc). Our functions match exactly the strategy that Stephen and myself agreed on at that time. Our work was done under this agreement.

3. There are other libraries that already apply the build time option for stats support

Other DPDK libraries provide stats based on build time configuration: ethdev, QoS scheduler, IP fragmentation. Are you suggesting we need to remove the build time stats option for these libraries, too? If yes, what about the performance impact, which would be prohibitive (I can definitely testify to this for the QoS scheduler, where every cycle counts)? This would result in choosing between functionality (stats support) and performance, and it might be that for some of these libraries the decision will be to remove the stats support altogether in order to keep the performance unchanged. I think we should encourage people to add stats to more libraries, and if we forbid the build-time option for stats support we send a strong signal discouraging people from adding stats support to the applicable DPDK libraries.

4.
Coding guidelines

Is there a rule that "build time options are no longer allowed in DPDK" published somewhere in our documentation? Is it part of the coding standard? Am I wrong to say this is not the case at this point? If we intend to have such a rule, then I think it needs to be discussed on this list and published, to avoid any misinterpretation (leading to rework and debates). It should be published together with all the acceptable exceptions that are part of any rule definition. I personally think the stats support should be such an exception.

Thanks for your help!

Regards, Cristian

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, May 18, 2015 12:02 PM
> To: Wodkowski, PawelX
> Cc: dev at dpdk.org; Dumitrescu, Cristian; Jastrzebski, MichalX K
> Subject: Re: [dpdk-dev] [PATCH v2] pipeline: add statistics for librte_pipeline
> ports and tables
>
> 2015-05-05 15:11, Dumitrescu, Cristian:
> > From: dev [mailto:dev-bounces
[dpdk-dev] [PATCH] examples/ip_pipeline: fix build with clang 3.6
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon > Sent: Tuesday, May 19, 2015 11:37 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH] examples/ip_pipeline: fix build with clang 3.6 > > This error is detected: > examples/ip_pipeline/cmdline.c:272:15: error: address of array > 'params->file_path' will always evaluate to 'true' > if (!params->file_path) { > ~^ > > file_path is an array in a structure so it's unneeded to check it. > > Signed-off-by: Thomas Monjalon > --- > examples/ip_pipeline/cmdline.c | 6 -- > 1 file changed, 6 deletions(-) > > diff --git a/examples/ip_pipeline/cmdline.c > b/examples/ip_pipeline/cmdline.c > index 152acb5..3173fd0 100644 > --- a/examples/ip_pipeline/cmdline.c > +++ b/examples/ip_pipeline/cmdline.c > @@ -268,12 +268,6 @@ cmd_run_file_parsed( > struct cmdline *file_cl; > int fd; > > - /* Check params */ > - if (!params->file_path) { > - printf("Illegal value for file path (%s)\n", params->file_path); > - return; > - } > - > fd = open(params->file_path, O_RDONLY, 0); > if (fd < 0) { > printf("Illegal value for file path (%s)\n", params->file_path); > -- > 2.4.0 Acked-by: Cristian Dumitrescu
[dpdk-dev] [PATCH v2 0/2] cmdline: add polling mode for command line
> -Original Message- > From: Wodkowski, PawelX > Sent: Wednesday, May 13, 2015 1:00 PM > To: dev at dpdk.org > Cc: Dumitrescu, Cristian > Subject: [PATCH v2 0/2] cmdline: add polling mode for command line > > This patchset adds the ability to process console input in the same thread > as packet processing by using poll() function and fixes some minor issues. > > v2 changes: > - add doxygen documentation for cmdline_poll() > - map file issue fixed > - use proper email address. > - add addtional missing include in cmdline_parse_ipaddr.h > > Pawel Wodkowski (2): > cmdline: fix missing include files > cmdline: add polling mode for command line > > doc/api/doxy-api.conf | 1 + > lib/librte_cmdline/cmdline.c | 35 > ++ > lib/librte_cmdline/cmdline.h | 24 > lib/librte_cmdline/cmdline_parse_ipaddr.h | 2 ++ > lib/librte_cmdline/cmdline_rdline.h| 1 + > lib/librte_cmdline/cmdline_vt100.h | 2 ++ > lib/librte_cmdline/rte_cmdline_version.map | 8 +++ > 7 files changed, 73 insertions(+) > > -- > 1.9.1 Acked by: Cristian Dumitrescu
[dpdk-dev] [PATCH 0/2] cmdline: add polling mode for command line
> -Original Message-
> From: Pawel Wodkowski [mailto:pwodkowx at stargo]
> Sent: Tuesday, May 12, 2015 12:10 PM
> To: dev at dpdk.org
> Cc: Dumitrescu, Cristian; Jastrzebski, MichalX K
> Subject: [PATCH 0/2] cmdline: add polling mode for command line
>
> This patchset adds the ability to process console input in the same thread
> as packet processing by using poll() function and fixes some minor issues.
>
> Pawel Wodkowski (2):
> cmdline: fix missing include files
> cmdline: add polling mode for command line
>
> lib/librte_cmdline/cmdline.c | 35 ++
> lib/librte_cmdline/cmdline.h | 4
> lib/librte_cmdline/cmdline_rdline.h | 1 +
> lib/librte_cmdline/cmdline_vt100.h | 2 ++
> lib/librte_cmdline/rte_cmdline_version.map | 1 +
> 5 files changed, 43 insertions(+)
>
> --
> 1.9.1

Acked by: Cristian Dumitrescu

The existing cmdline API requires a thread to be completely consumed for running the CLI, so it is not possible to push any other tasks (typically slow, control plane related tasks) onto the same thread. This is because the cmdline_interact() function implements an infinite loop internally, so once called after initialization, it only returns when the application is terminated (e.g. as a result of a quit CLI command). This patch removes this limitation by providing an alternative function called cmdline_poll(), which allows the application to own the thread dispatch loop and work with the CLI in polling mode, hence making it possible for the application to push additional work onto the same thread. The thread dispatch loop should be owned by the application, not by the cmdline library.
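The dispatch-loop shape this enables can be sketched with a hypothetical callback type standing in for cmdline_poll() and the packet work (an illustration of the pattern, not the librte_cmdline API):

```c
/* Application-owned dispatch loop interleaving CLI polling with packet
 * work on the same thread. cli_poll stands in for cmdline_poll(); it
 * must be non-blocking and return nonzero once the CLI has exited. */
typedef int (*poll_fn)(void *ctx);

int run_dispatch_loop(poll_fn cli_poll, poll_fn pkt_work, void *ctx)
{
	int iters = 0;
	for (;;) {
		if (cli_poll(ctx))	/* returns immediately, no blocking */
			break;
		pkt_work(ctx);		/* fast-path work gets the rest of the cycle */
		iters++;
	}
	return iters;
}

/* Demo callbacks: the "CLI" reports exit after five polls. */
static int demo_calls;
static int demo_cli_poll(void *ctx) { (void)ctx; return ++demo_calls > 5; }
static int demo_pkt_work(void *ctx) { (void)ctx; return 0; }
```

With cmdline_interact(), by contrast, the loop lives inside the library and the application never regains the thread until the CLI exits.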
[dpdk-dev] [PATCH 0/6] rte_sched: patches against 2.0
> -Original Message- > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > Sent: Wednesday, April 29, 2015 6:05 PM > To: Dumitrescu, Cristian > Cc: dev at dpdk.org; Stephen Hemminger > Subject: [PATCH 0/6] rte_sched: patches against 2.o > > This is a subset of earlier patches for QoS (rte_sched) subsystem. > Only changes were to fix whitespace and update against DPDK 2.0 > > Stephen Hemminger (6): > rte_sched: make RED optional at runtime > rte_sched: expand scheduler hierarchy for more VLAN's > rte_sched: keep track of RED drops > rte_sched: allow reading without clearing > rte_sched: don't put tabs in log messages > rte_sched: use correct log level > > app/test/test_sched.c| 4 +- > examples/qos_sched/stats.c | 16 -- > lib/librte_mbuf/rte_mbuf.h | 5 +- > lib/librte_sched/rte_sched.c | 113 -- > - > lib/librte_sched/rte_sched.h | 62 +++- > 5 files changed, 130 insertions(+), 70 deletions(-) > > -- > 2.1.4 Acked by: Cristian Dumitrescu
[dpdk-dev] [PATCH v2] pipeline: add statistics for librte_pipeline ports and tables
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michal Jastrzebski > Sent: Thursday, April 30, 2015 1:16 PM > To: dev at dpdk.org > Cc: Wodkowski, PawelX > Subject: [dpdk-dev] [PATCH v2] pipeline: add statistics for librte_pipeline > ports and tables > > From: Pawel Wodkowski > > This patch adds statistics collection for librte_pipeline. > Those statistics ale disabled by default during build time. > > Signed-off-by: Pawel Wodkowski > --- > config/common_bsdapp |1 + > config/common_linuxapp |1 + > lib/librte_pipeline/rte_pipeline.c | 185 > +--- > lib/librte_pipeline/rte_pipeline.h | 99 +++ > 4 files changed, 275 insertions(+), 11 deletions(-) > > diff --git a/config/common_bsdapp b/config/common_bsdapp > index 1d0f5b2..e4f0bf5 100644 > --- a/config/common_bsdapp > +++ b/config/common_bsdapp > @@ -414,6 +414,7 @@ CONFIG_RTE_TABLE_LPM_STATS_COLLECT=n > # Compile librte_pipeline > # > CONFIG_RTE_LIBRTE_PIPELINE=y > +CONFIG_RTE_PIPELINE_STATS_COLLECT=n > > # > # Compile librte_kni > diff --git a/config/common_linuxapp b/config/common_linuxapp > index 8b01ca9..05553d9 100644 > --- a/config/common_linuxapp > +++ b/config/common_linuxapp > @@ -421,6 +421,7 @@ CONFIG_RTE_TABLE_LPM_STATS_COLLECT=n > # Compile librte_pipeline > # > CONFIG_RTE_LIBRTE_PIPELINE=y > +CONFIG_RTE_PIPELINE_STATS_COLLECT=n > > # > # Compile librte_kni > diff --git a/lib/librte_pipeline/rte_pipeline.c > b/lib/librte_pipeline/rte_pipeline.c > index 36d92c9..69bf003 100644 > --- a/lib/librte_pipeline/rte_pipeline.c > +++ b/lib/librte_pipeline/rte_pipeline.c > @@ -48,6 +48,17 @@ > > #define RTE_TABLE_INVALID UINT32_MAX > > +#ifdef RTE_PIPELINE_STATS_COLLECT > +#define RTE_PIPELINE_STATS_ADD(counter, val) \ > + ({ (counter) += (val); }) > + > +#define RTE_PIPELINE_STATS_ADD_M(counter, mask) \ > + ({ (counter) += __builtin_popcountll(mask); }) > +#else > +#define RTE_PIPELINE_STATS_ADD(counter, val) > +#define RTE_PIPELINE_STATS_ADD_M(counter, mask) > +#endif > + > struct 
rte_port_in { > /* Input parameters */ > struct rte_port_in_ops ops; > @@ -63,6 +74,8 @@ struct rte_port_in { > > /* List of enabled ports */ > struct rte_port_in *next; > + > + uint64_t n_pkts_dropped_by_ah; > }; > > struct rte_port_out { > @@ -74,6 +87,8 @@ struct rte_port_out { > > /* Handle to low-level port */ > void *h_port; > + > + uint64_t n_pkts_dropped_by_ah; > }; > > struct rte_table { > @@ -90,6 +105,12 @@ struct rte_table { > > /* Handle to the low-level table object */ > void *h_table; > + > + /* Stats for this table. */ > + uint64_t n_pkts_dropped_by_lkp_hit_ah; > + uint64_t n_pkts_dropped_by_lkp_miss_ah; > + uint64_t n_pkts_dropped_lkp_hit; > + uint64_t n_pkts_dropped_lkp_miss; > }; > > #define RTE_PIPELINE_MAX_NAME_SZ 124 > @@ -1040,6 +1061,8 @@ rte_pipeline_action_handler_port_bulk(struct > rte_pipeline *p, > > port_out->f_action_bulk(p->pkts, _mask, port_out- > >arg_ah); > p->action_mask0[RTE_PIPELINE_ACTION_DROP] |= > pkts_mask ^ mask; > + RTE_PIPELINE_STATS_ADD_M(port_out- > >n_pkts_dropped_by_ah, > + pkts_mask ^ mask); > } > > /* Output port TX */ > @@ -1071,6 +1094,9 @@ rte_pipeline_action_handler_port(struct > rte_pipeline *p, uint64_t pkts_mask) > p- > >action_mask0[RTE_PIPELINE_ACTION_DROP] |= > (pkt_mask ^ 1LLU) << i; > > + RTE_PIPELINE_STATS_ADD(port_out- > >n_pkts_dropped_by_ah, > + pkt_mask ^ 1LLU); > + > /* Output port TX */ > if (pkt_mask != 0) > port_out->ops.f_tx(port_out- > >h_port, > @@ -1104,6 +1130,9 @@ rte_pipeline_action_handler_port(struct > rte_pipeline *p, uint64_t pkts_mask) > p- > >action_mask0[RTE_PIPELINE_ACTION_DROP] |= > (pkt_mask ^ 1LLU) << i; > > + RTE_PIPELINE_STATS_ADD(port_out- > >n_pkts_dropped_by_ah, > + pkt_mask ^ 1LLU); > + > /* Output port TX */ > if (pkt_mask != 0) > port_out->ops.f_tx(port_out- > >h_port, > @@ -1140,6 +1169,9 @@ rte_pipeline_action_handler_port_meta(struct > rte_pipeline *p, > p- > >action_mask0[RTE_PIPELINE_ACTION_DROP] |= > (pkt_mask ^ 1LLU) << i; > > + RTE_PIPELINE_STATS_ADD(port_out- 
> >n_pkts_dropped_by_ah, > +
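The RTE_PIPELINE_STATS_ADD_M macro in the patch above counts action-handler drops by XOR-ing the packet mask before and after the handler and taking the population count. A standalone sketch of that arithmetic (using the GCC/Clang __builtin_popcountll builtin):

```c
#include <stdint.h>

/* Bits set in pkts_mask but cleared in out_mask are packets the action
 * handler dropped. Since the handler can only clear bits, never set new
 * ones, the popcount of the XOR is exactly the number of drops. */
static uint64_t count_ah_drops(uint64_t pkts_mask, uint64_t out_mask)
{
	return (uint64_t)__builtin_popcountll(pkts_mask ^ out_mask);
}
```

This is why the drop counter can be updated with a single popcount per burst instead of a per-packet branch.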
[dpdk-dev] [PATCH v2 00/10] table: added table statistics
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michal Jastrzebski > Sent: Thursday, April 30, 2015 1:14 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v2 00/10] table: added table statistics > > From: Maciej Gajdzica > > Added statistics for every type of table. By default all table statistics > are disabled, user must activate them in config file. > > Maciej Gajdzica (10): > table: added structure for storing table stats > table: added acl table stats > table: added array table stats > table: added hash_ext table stats > table: added hash_key16 table stats > table: added hash_key32 table stats > table: added hash_key8 table stats > table: added hash_lru table stats > table: added lpm_ipv6 table stats > table: added lpm table stats > > config/common_bsdapp|9 ++ > config/common_linuxapp |9 ++ > lib/librte_table/rte_table.h| 25 +++ > lib/librte_table/rte_table_acl.c| 35 + > lib/librte_table/rte_table_array.c | 34 +++- > lib/librte_table/rte_table_hash_ext.c | 44 > ++ > lib/librte_table/rte_table_hash_key16.c | 41 > > lib/librte_table/rte_table_hash_key32.c | 41 > > lib/librte_table/rte_table_hash_key8.c | 52 > +++ > lib/librte_table/rte_table_hash_lru.c | 44 > ++ > lib/librte_table/rte_table_lpm.c| 34 > lib/librte_table/rte_table_lpm_ipv6.c | 34 > 12 files changed, 401 insertions(+), 1 deletion(-) > > -- > 1.7.9.5 Acked by: Cristian Dumitrescu
[dpdk-dev] [PATCH v2 00/13] port: added port statistics
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michal Jastrzebski > Sent: Thursday, April 30, 2015 1:07 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v2 00/13] port: added port statistics > > From: Maciej Gajdzica > > Added statistics for every type of port. By default all port statistics > are disabled, user must activate them in config file. > > Maciej Gajdzica (13): > port: added structures for port stats > port: added port_ethdev_reader stats > port: added port_ethdev_writer stats > port: added port_ethdev_writer_nodrop stats > port: added port_frag stats > port: added port_ras stats > port: added port_ring_reader stats > port: added port_ring_writer stats > port: added port_ring_writer_nodrop stats > port: added port_sched_reader stats > port: added port_sched_writer stats > port: added port_source stats > port: added port_sink stats > > config/common_bsdapp | 12 > config/common_linuxapp | 12 > lib/librte_port/rte_port.h | 60 ++-- > lib/librte_port/rte_port_ethdev.c | 113 > +- > lib/librte_port/rte_port_frag.c| 36 ++ > lib/librte_port/rte_port_ras.c | 38 ++ > lib/librte_port/rte_port_ring.c| 118 > +++- > lib/librte_port/rte_port_sched.c | 96 -- > lib/librte_port/rte_port_source_sink.c | 98 > -- > 9 files changed, 566 insertions(+), 17 deletions(-) > > -- > 1.7.9.5 Acked by: Cristian Dumitrescu
[dpdk-dev] [PATCH v2 0/3] port: added frag_ipv6 and ras_ipv6 ports
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michal Jastrzebski > Sent: Thursday, April 30, 2015 1:03 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v2 0/3] port: added frag_ipv6 and ras_ipv6 ports > > From: Maciej Gajdzica > > Added ipv6 versions of ip fragmentation and ip reassembly ports. > > Maciej Gajdzica (3): > port: removed IPV4_MTU_DEFAULT define > port: added ipv6 fragmentation port > port: added ipv6 reassembly port > > lib/librte_port/rte_port_frag.c | 67 -- > lib/librte_port/rte_port_frag.h |9 ++- > lib/librte_port/rte_port_ras.c | 142 > --- > lib/librte_port/rte_port_ras.h |9 ++- > 4 files changed, 167 insertions(+), 60 deletions(-) > > -- > 1.7.9.5 Acked by: Cristian Dumitrescu
[dpdk-dev] [PATCH v2 0/3] port: added ethdev_writer_nodrop and ring_writer_nodrop ports
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michal Jastrzebski > Sent: Thursday, April 30, 2015 12:58 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v2 0/3] port: added ethdev_writer_nodrop and > ring_writer_nodrop ports > > From: Maciej Gajdzica > > When nodrop writer port fails to send data, it retries until reach maximum > number of retries. Also added new tx_bulk implementation for ring writer > port. > > Maciej Gajdzica (3): > port: added WRITER_APPROACH == 1 implementation to ring port > port: added ethdev_writer_nodrop port > port: added ring_writer_nodrop port > > lib/librte_port/rte_port_ethdev.c | 230 > ++ > lib/librte_port/rte_port_ethdev.h | 19 +++ > lib/librte_port/rte_port_ring.c | 285 > + > lib/librte_port/rte_port_ring.h | 16 +++ > 4 files changed, 550 insertions(+) > > -- > 1.7.9.5 Acked by: Cristian Dumitrescu
[dpdk-dev] QoS Question
Hi Greg, Great question, thank you! Please see my comments below.

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Greg Smith
> Sent: Monday, April 20, 2015 7:40 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] QoS Question
>
> Hi DPDK team,
>
> The docs on QoS
> (http://dpdk.org/doc/guides/prog_guide/qos_framework.html# ) describe
> the traffic class (TC) as follows:
> 1 - The TCs of the same pipe handled in strict priority order.
> 2 - Upper limit enforced per TC at the pipe level.
> 3 - Lower priority TCs able to reuse pipe bandwidth currently unused by
> higher priority TCs.
> 4 - When subport TC is oversubscribed (configuration time event), pipe TC
> upper limit is capped to a dynamically adjusted value that is shared by all
> the subport pipes.
>
> Can someone describe how and when the TC upper limit is "dynamically"
> changed?

This feature is described at length in Programmer's Guide section 21.2.4.6.6, Subport Traffic Class Oversubscription. Please note this feature is not enabled by default in the code base. To enable it, please set the flag CONFIG_RTE_SCHED_SUBPORT_TC_OV=y in the DPDK configuration file (config/common_linuxapp or config/common_bsdapp).

Subport traffic class oversubscription is implemented only for the lowest priority traffic class (TC3, a.k.a. Best Effort), as usually the Best Effort traffic class (where most of the traffic is) is oversubscribed, while the high priority traffic classes (e.g. voice) are usually fully provisioned.

Oversubscription takes place when the service provider is selling more bandwidth than physically available, i.e. when the summation of the _nominal_ rates assigned to the users exceeds the rate of the subport. This does not necessarily represent a problem, as only a fraction of the users are looking to fully utilize their service (i.e.
use up 100% of their nominal rate) at any given time: when the current total demand from all subport users does not exceed the subport rate, no problem exists, as each subscriber has its demand fully serviced; when the current total demand (which changes dynamically) exceeds the limit, it is obviously no longer possible to fully meet the entire demand.

In the latter case, it is important that some fairness takes place. We do not want some subscribers getting close to 0% of their nominal rate while others get close to 100% of their nominal rate, as this would not be fair. On the other hand, we cannot simply reduce the nominal rate of the users (e.g. everybody is now allowed 73% of their rate), as the nominal rate of a subscriber is completely disconnected from its current demand: one user might demand only 10% of its rate at this moment, so reserving 73% of its rate for this user results in wasting 63% of its rate, which could otherwise be awarded to some other user with a higher demand at the same moment.

What we need to do is this: we need to apply a water filling algorithm that computes a user quota (common for all the subport users), so that users with current demand less than this quota are fully serviced, while users with high demand are truncated to the quota. This user quota is updated periodically, with the new value being estimated based on subport consumption from the past: when we see that the previous quota resulted in some subport bandwidth not being consumed, we increase the quota incrementally until the entire subport bandwidth is consumed; when we see that the entire subport bandwidth is consumed, we start lowering the quota incrementally until we see that some subport bandwidth starts to be wasted.

> For example, assume there's a 1Gb/s port and a single 1Gb/s subport and
> 2000 pipes each of 1Mb/s (total pipes = 2Gb/s which is > the 1Gb/s subport
> which I think means "oversubscribed" as used in the doc). Each Pipe has a
> single TC.
Yes, agree this is an example of oversubscription. I used a similar example to describe this feature during the DPDK community readout earlier this week (https://youtu.be/_PPklkWGugs).

> In that case, would each pipe be shaped to an upper limit of 0.5 Mb/s?

Only in the very unlikely event that all 2000 users are active and each one is asking for 0.5 Mbps or more. Typically, some of these users are currently inactive (demand = 0%) and others will ask for less than e.g. 0.5 Mbps; whatever subport bandwidth is left unused by the low demand users can be awarded to the high demand users (of course, no user will ever get more than its nominal rate).

Let's refine the example: let's say that, currently, the demand distribution for the 2000 users is: [500 users: 0 Mbps; 500 users: 0.4 Mbps; 500 users: 0.7 Mbps; 500 users: 1 Mbps]. These users will be awarded the following rates: [500 users: 0 Mbps; 500 users: 0.4 Mbps; 500 users: 0.7 Mbps; 500 users: 0.9 Mbps]. Basically, all users are fully serviced, except the users demanding 1 Mbps, which will be truncated to
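The awarded rates in this refined example are consistent with a water-filling quota of 0.9 Mbps; a small sketch (illustrative only, not the rte_sched implementation) checks that this quota serves exactly the subport rate: 500×0 + 500×0.4 + 500×0.7 + 500×0.9 = 1000 Mbps.

```c
/* Water-filling check for the example above (values in Mbps): demands
 * below the quota are fully serviced, demands above it are truncated
 * to the quota. */
static double served_total(const double *demand, int n_users, double quota)
{
	double total = 0.0;
	for (int i = 0; i < n_users; i++)
		total += demand[i] < quota ? demand[i] : quota;
	return total;
}
```

Running it on one representative user per demand class and scaling by the 500 users in each class gives a served total of 1000 Mbps, i.e. the full 1 Gbps subport rate with no waste.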
[dpdk-dev] DPDK Community Call - Software QoS
> -Original Message- > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > Sent: Friday, April 24, 2015 1:17 PM > To: Dumitrescu, Cristian > Cc: O'Driscoll, Tim; dev at dpdk.org > Subject: Re: DPDK Community Call - Software QoS > > 2015-04-24 09:14, Dumitrescu, Cristian: > > Hi Tim, > > > > Thank you for your help in recording this readout! > > The slides that you attached to the email are dropped by the dpdk.org > email server. > > No, they are not dropped: > http://dpdk.org/ml/archives/dev/2015-April/016842.html > http://dpdk.org/ml/archives/dev/attachments/20150423/17a4d8de/ > attachment-0001.pdf > > > Thomas, > > > > I suggest we setup a page on dpdk.org to keep track of all the community > readouts, > > we can store there the slides and a link to the Youtube video, when > available. > > What do you think? > > I think first that we must focus on having a good and up-to-date official > documentation. > > The document you attached and the audio explanation are a snapshot which > can be > referenced in a page dedicated to not updated or external documents. So it > will > be important to put a date on these entries and sort them by date. > > I suggest to create a page "discussion supports" which will be linked from > http://dpdk.org/doc in the "Archives" section. Yes, any section that provides a list of "discussions" works for me. I think we only need a one-liner per discussion: Date, topic, event, author, link to slides/doc, link to video. Thanks, Thomas!
[dpdk-dev] DPDK Community Call - Software QoS
Hi Tim, Thank you for your help in recording this readout! The slides that you attached to the email are dropped by the dpdk.org email server. Thomas, I suggest we setup a page on dpdk.org to keep track of all the community readouts, we can store there the slides and a link to the Youtube video, when available. What do you think? Regards, Cristian > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of O'Driscoll, Tim > Sent: Thursday, April 23, 2015 2:24 PM > To: dev at dpdk.org > Subject: Re: [dpdk-dev] DPDK Community Call - Software QoS > > Thanks to Cristian for a great presentation on QoS on Tuesday. For anybody > who missed it, the slides are attached and the video is available at: > https://youtu.be/_PPklkWGugs > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of O'driscoll, Tim > > Sent: Wednesday, April 1, 2015 4:41 PM > > To: dev at dpdk.org > > Subject: [dpdk-dev] DPDK Community Call - Software QoS > > > > Agenda: > > - Software Quality of Service (Cristian Dumitrescu) > > > > > > Tue, Apr 21, 4:00-5:30 PM GMT Daylight Time > > > > Please join my meeting from your computer, tablet or smartphone. > > https://global.gotomeeting.com/join/293233645 > > > > You can also dial in using your phone. > > Ireland : +353 (0) 15 290 180 > > United States : +1 (312) 757-3126 > > Australia : +61 2 9087 3604 > > Austria : +43 (0) 7 2088 0034 > > Belgium : +32 (0) 28 08 9321 > > Canada : +1 (647) 497-9350 > > Denmark : +45 (0) 69 91 80 05 > > Finland : +358 (0) 931 58 1746 > > France : +33 (0) 182 880 780 > > Germany : +49 (0) 692 5736 7211 > > Italy : +39 0 699 26 68 58 > > Netherlands : +31 (0) 208 908 267 > > New Zealand : +64 (0) 9 442 7358 > > Norway : +47 21 54 32 44 > > Spain : +34 911 23 0850 > > Sweden : +46 (0) 853 527 835 > > Switzerland : +41 (0) 435 0006 96 > > United Kingdom : +44 (0) 20 3713 5028 > > > > Access Code: 293-233-645 > > Audio PIN: Shown after joining the meeting > >
[dpdk-dev] [PATCH] pipeline: add statistics for librte_pipeline ports and tables
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Maciej Gajdzica > Sent: Monday, March 30, 2015 1:06 PM > To: dev at dpdk.org > Cc: Wodkowski, PawelX > Subject: [dpdk-dev] [PATCH] pipeline: add statistics for librte_pipeline ports > and tables > > From: Pawel Wodkowski > > This patch adds statistics collection for librte_pipeline. > Those statistics ale disabled by default during build time. > > This patchset depends on patchset: > port: added port statistics > > Signed-off-by: Pawel Wodkowski > --- > config/common_bsdapp | 1 + > config/common_linuxapp | 1 + > lib/librte_pipeline/rte_pipeline.c | 185 > ++--- > lib/librte_pipeline/rte_pipeline.h | 99 > 4 files changed, 275 insertions(+), 11 deletions(-) > > diff --git a/config/common_bsdapp b/config/common_bsdapp > index 829591c..25147b8 100644 > --- a/config/common_bsdapp > +++ b/config/common_bsdapp > @@ -413,6 +413,7 @@ CONFIG_RTE_TABLE_LPM_STATS_COLLECT=n > # Compile librte_pipeline > # > CONFIG_RTE_LIBRTE_PIPELINE=y > +CONFIG_RTE_PIPELINE_STATS_COLLECT=n > > # > # Compile librte_kni > diff --git a/config/common_linuxapp b/config/common_linuxapp > index 891b5da..1497aa0 100644 > --- a/config/common_linuxapp > +++ b/config/common_linuxapp > @@ -420,6 +420,7 @@ CONFIG_RTE_TABLE_LPM_STATS_COLLECT=n > # Compile librte_pipeline > # > CONFIG_RTE_LIBRTE_PIPELINE=y > +CONFIG_RTE_PIPELINE_STATS_COLLECT=n > > # > # Compile librte_kni > diff --git a/lib/librte_pipeline/rte_pipeline.c > b/lib/librte_pipeline/rte_pipeline.c > index 36d92c9..69bf003 100644 > --- a/lib/librte_pipeline/rte_pipeline.c > +++ b/lib/librte_pipeline/rte_pipeline.c > @@ -48,6 +48,17 @@ > > #define RTE_TABLE_INVALID UINT32_MAX > > +#ifdef RTE_PIPELINE_STATS_COLLECT > +#define RTE_PIPELINE_STATS_ADD(counter, val) \ > + ({ (counter) += (val); }) > + > +#define RTE_PIPELINE_STATS_ADD_M(counter, mask) \ > + ({ (counter) += __builtin_popcountll(mask); }) > +#else > +#define RTE_PIPELINE_STATS_ADD(counter, val) > +#define 
RTE_PIPELINE_STATS_ADD_M(counter, mask) > +#endif > + > struct rte_port_in { > /* Input parameters */ > struct rte_port_in_ops ops; > @@ -63,6 +74,8 @@ struct rte_port_in { > > /* List of enabled ports */ > struct rte_port_in *next; > + > + uint64_t n_pkts_dropped_by_ah; > }; > > struct rte_port_out { > @@ -74,6 +87,8 @@ struct rte_port_out { > > /* Handle to low-level port */ > void *h_port; > + > + uint64_t n_pkts_dropped_by_ah; > }; > > struct rte_table { > @@ -90,6 +105,12 @@ struct rte_table { > > /* Handle to the low-level table object */ > void *h_table; > + > + /* Stats for this table. */ > + uint64_t n_pkts_dropped_by_lkp_hit_ah; > + uint64_t n_pkts_dropped_by_lkp_miss_ah; > + uint64_t n_pkts_dropped_lkp_hit; > + uint64_t n_pkts_dropped_lkp_miss; > }; > > #define RTE_PIPELINE_MAX_NAME_SZ 124 > @@ -1040,6 +1061,8 @@ rte_pipeline_action_handler_port_bulk(struct > rte_pipeline *p, > > port_out->f_action_bulk(p->pkts, _mask, port_out- > >arg_ah); > p->action_mask0[RTE_PIPELINE_ACTION_DROP] |= > pkts_mask ^ mask; > + RTE_PIPELINE_STATS_ADD_M(port_out- > >n_pkts_dropped_by_ah, > + pkts_mask ^ mask); > } > > /* Output port TX */ > @@ -1071,6 +1094,9 @@ rte_pipeline_action_handler_port(struct > rte_pipeline *p, uint64_t pkts_mask) > p- > >action_mask0[RTE_PIPELINE_ACTION_DROP] |= > (pkt_mask ^ 1LLU) << i; > > + RTE_PIPELINE_STATS_ADD(port_out- > >n_pkts_dropped_by_ah, > + pkt_mask ^ 1LLU); > + > /* Output port TX */ > if (pkt_mask != 0) > port_out->ops.f_tx(port_out- > >h_port, > @@ -1104,6 +1130,9 @@ rte_pipeline_action_handler_port(struct > rte_pipeline *p, uint64_t pkts_mask) > p- > >action_mask0[RTE_PIPELINE_ACTION_DROP] |= > (pkt_mask ^ 1LLU) << i; > > + RTE_PIPELINE_STATS_ADD(port_out- > >n_pkts_dropped_by_ah, > + pkt_mask ^ 1LLU); > + > /* Output port TX */ > if (pkt_mask != 0) > port_out->ops.f_tx(port_out- > >h_port, > @@ -1140,6 +1169,9 @@ rte_pipeline_action_handler_port_meta(struct > rte_pipeline *p, > p- > >action_mask0[RTE_PIPELINE_ACTION_DROP] |= > 
(pkt_mask ^ 1LLU) << i; > > + RTE_PIPELINE_STATS_ADD(port_out- >
[dpdk-dev] [PATCH 00/10] table: added table statistics
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Maciej Gajdzica > Sent: Monday, March 30, 2015 12:42 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH 00/10] table: added table statistics > > Added statistics for every type of table. By default all table statistics > are disabled, user must activate them in config file. > > Maciej Gajdzica (10): > table: added structure for storing table stats > table: added acl table stats > table: added array table stats > table: added hash_ext table stats > table: added hash_key16 table stats > table: added hash_key32 table stats > table: added hash_key8 table stats > table: added hash_lru table stats > table: added lpm_ipv6 table stats > table: added lpm table stats > > config/common_bsdapp|9 ++ > config/common_linuxapp |9 ++ > lib/librte_table/rte_table.h| 25 +++ > lib/librte_table/rte_table_acl.c| 35 + > lib/librte_table/rte_table_array.c | 34 +++- > lib/librte_table/rte_table_hash_ext.c | 44 > ++ > lib/librte_table/rte_table_hash_key16.c | 41 > > lib/librte_table/rte_table_hash_key32.c | 41 > > lib/librte_table/rte_table_hash_key8.c | 52 > +++ > lib/librte_table/rte_table_hash_lru.c | 44 > ++ > lib/librte_table/rte_table_lpm.c| 34 > lib/librte_table/rte_table_lpm_ipv6.c | 34 > 12 files changed, 401 insertions(+), 1 deletion(-) > > -- > 1.7.9.5 Acked by: Cristian Dumitrescu
[dpdk-dev] [PATCH 00/13] port: added port statistics
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Maciej Gajdzica > Sent: Monday, March 30, 2015 11:29 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH 00/13] port: added port statistics > > Added statistics for every type of port. By default all port statistics > are disabled; the user must activate them in the config file. > > This patchset depends on two patchsets: > port: added ethdev_writer_nodrop and ring_writer_nodrop ports > port: added frag_ipv6 and ras_ipv6 ports > > Maciej Gajdzica (13): > port: added structures for port stats > port: added port_ethdev_reader stats > port: added port_ethdev_writer stats > port: added port_ethdev_writer_nodrop stats > port: added port_frag stats > port: added port_ras stats > port: added port_ring_reader stats > port: added port_ring_writer stats > port: added port_ring_writer_nodrop stats > port: added port_sched_reader stats > port: added port_sched_writer stats > port: added port_source stats > port: added port_sink stats > > config/common_bsdapp | 12 > config/common_linuxapp | 12 > lib/librte_port/rte_port.h | 60 ++-- > lib/librte_port/rte_port_ethdev.c | 113 +- > lib/librte_port/rte_port_frag.c| 36 ++ > lib/librte_port/rte_port_ras.c | 38 ++ > lib/librte_port/rte_port_ring.c| 118 +++- > lib/librte_port/rte_port_sched.c | 96 -- > lib/librte_port/rte_port_source_sink.c | 98 -- > 9 files changed, 566 insertions(+), 17 deletions(-) > > -- > 1.7.9.5 Acked-by: Cristian Dumitrescu
[dpdk-dev] [PATCH 0/3] port: added frag_ipv6 and ras_ipv6 ports
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Maciej Gajdzica > Sent: Monday, March 30, 2015 11:15 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH 0/3] port: added frag_ipv6 and ras_ipv6 ports > > Added ipv6 versions of ip fragmentation and ip reassembly ports. > > Maciej Gajdzica (3): > port: removed IPV4_MTU_DEFAULT define > port: added ipv6 fragmentation port > port: added ipv6 reassembly port > > lib/librte_port/rte_port_frag.c | 67 -- > lib/librte_port/rte_port_frag.h |9 ++- > lib/librte_port/rte_port_ras.c | 142 --- > lib/librte_port/rte_port_ras.h |9 ++- > 4 files changed, 167 insertions(+), 60 deletions(-) > > -- > 1.7.9.5 Acked-by: Cristian Dumitrescu
[dpdk-dev] [PATCH 0/3] port: added ethdev_writer_nodrop and ring_writer_nodrop ports
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Maciej Gajdzica > Sent: Monday, March 30, 2015 10:57 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH 0/3] port: added ethdev_writer_nodrop and > ring_writer_nodrop ports > > When the nodrop writer port fails to send data, it retries until it reaches the > maximum number of retries. Also added a new tx_bulk implementation for the ring > writer port. > > Maciej Gajdzica (3): > port: added WRITER_APPROACH == 1 implementation to ring port > port: added ethdev_writer_nodrop port > port: added ring_writer_nodrop port > > lib/librte_port/rte_port_ethdev.c | 230 ++ > lib/librte_port/rte_port_ethdev.h | 19 +++ > lib/librte_port/rte_port_ring.c | 285 + > lib/librte_port/rte_port_ring.h | 16 +++ > 4 files changed, 550 insertions(+) > > -- > 1.7.9.5 Acked-by: Cristian Dumitrescu
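The nodrop behaviour described in the cover letter amounts to a bounded retry loop around the underlying TX call: instead of dropping the unsent remainder of a burst, the port retries until either the burst goes out or a configured retry limit is reached. The sketch below is a hedged illustration with invented names (send_with_retries, flaky_send), not the actual port implementation:

```c
#include <stddef.h>

/* One initial attempt plus up to max_retries further attempts.
 * try_send returns 0 on success, non-zero on a failed TX. */
static int send_with_retries(int (*try_send)(void *ctx), void *ctx,
		int max_retries)
{
	for (int i = 0; i <= max_retries; i++)
		if (try_send(ctx) == 0)
			return 0;	/* burst fully sent */
	return -1;			/* retries exhausted */
}

/* Stub transmit that succeeds on its third call, for demonstration only. */
static int attempts;
static int flaky_send(void *ctx)
{
	(void)ctx;
	return ++attempts >= 3 ? 0 : -1;
}
```

A real nodrop port would additionally re-point the retry at the unsent tail of the burst rather than resending the whole burst.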
[dpdk-dev] FlowDirector Rules for TX packets
Hi Gal, No, flow director is just an RX-side packet filtering NIC feature. Regards, Cristian > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Gal Sagie > Sent: Friday, March 27, 2015 10:44 AM > To: > Subject: [dpdk-dev] FlowDirector Rules for TX packets > > Hello All, > > I can define drop filter rules for RX packets using FlowDirector. > Support for configuring this from the CLI is also available using ethtool with > the --config-ntuple option. > > I am wondering, does FlowDirector have any support to drop TX packets? > (Meaning that I can define rules which, if matched on the TX side, will > prevent the packets from being sent out.) > > At least from ethtool it doesn't seem like this is supported. > Any idea? > > Thanks > Gal.
[dpdk-dev] [PATCH] table: fix a crash during key8 and key32 overload
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Maciej Gajdzica > Sent: Monday, March 23, 2015 3:09 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH] table: fix a crash during key8 and key32 > overload > > hash_key8_ext and hash_key32_ext tables allocate cache entries to > support table overload cases. A crash can occur when a cache entry is > freed after use. The problem is with computing the index of the free > cache entry. The same case for key16 was fixed with an earlier patch. > > Signed-off-by: Maciej Gajdzica > --- > lib/librte_table/rte_table_hash_key32.c |5 ++--- > lib/librte_table/rte_table_hash_key8.c |5 ++--- > 2 files changed, 4 insertions(+), 6 deletions(-) > > diff --git a/lib/librte_table/rte_table_hash_key32.c > b/lib/librte_table/rte_table_hash_key32.c > index da0ce6a..6790594 100644 > --- a/lib/librte_table/rte_table_hash_key32.c > +++ b/lib/librte_table/rte_table_hash_key32.c > @@ -540,9 +540,8 @@ rte_table_hash_entry_delete_key32_ext( > > memset(bucket, 0, > sizeof(struct > rte_bucket_4_32)); > - bucket_index = (bucket - > - ((struct rte_bucket_4_32 *) > - f->memory)) - f->n_buckets; > + bucket_index = (((uint8_t *)bucket - > + (uint8_t *)f->memory)/f->bucket_size) - f->n_buckets; > f->stack[f->stack_pos++] = > bucket_index; > } > > diff --git a/lib/librte_table/rte_table_hash_key8.c > b/lib/librte_table/rte_table_hash_key8.c > index 443ca7d..6803eb2 100644 > --- a/lib/librte_table/rte_table_hash_key8.c > +++ b/lib/librte_table/rte_table_hash_key8.c > @@ -528,9 +528,8 @@ rte_table_hash_entry_delete_key8_ext( > > memset(bucket, 0, > sizeof(struct > rte_bucket_4_8)); > - bucket_index = (bucket - > - ((struct rte_bucket_4_8 *) > - f->memory)) - f->n_buckets; > + bucket_index = (((uint8_t *)bucket - > + (uint8_t *)f->memory)/f->bucket_size) - f->n_buckets; > f->stack[f->stack_pos++] = > bucket_index; > } > > -- > 1.7.9.5 Acked-by: Cristian Dumitrescu Thanks, Maciej!
[dpdk-dev] [PATCH] dpdk: fix a crash during rte_table_hash_key16_ext overload
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of > miroslaw.walukiewicz at intel.com > Sent: Tuesday, March 3, 2015 2:16 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH] dpdk: fix a crash during > rte_table_hash_key16_ext overload > > From: Miroslaw Walukiewicz > > The hash_key16_ext table allocates cache entries to support > table overload cases. > > A crash can occur when a cache entry is freed after use. The problem > is with computing the index of the free cache entry. > > The patch fixes the problem. > > Signed-off-by: Mirek Walukiewicz > --- > lib/librte_table/rte_table_hash_key16.c |5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/lib/librte_table/rte_table_hash_key16.c > b/lib/librte_table/rte_table_hash_key16.c > index ee5f639..e0c99bd 100644 > --- a/lib/librte_table/rte_table_hash_key16.c > +++ b/lib/librte_table/rte_table_hash_key16.c > @@ -535,9 +535,8 @@ rte_table_hash_entry_delete_key16_ext( > > memset(bucket, 0, > sizeof(struct > rte_bucket_4_16)); > - bucket_index = (bucket - > - ((struct rte_bucket_4_16 *) > - f->memory)) - f->n_buckets; > + bucket_index = (((uint8_t *)bucket - > + (uint8_t *)f->memory)/f->bucket_size) - f->n_buckets; > f->stack[f->stack_pos++] = > bucket_index; > } > Acked-by: Cristian Dumitrescu Mirek, an identical issue is found at the same place in rte_table_hash_key8.c and rte_table_hash_key32.c; would you please submit the same fix for those as well? Thanks, Cristian
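The fix in this patch (and its key8/key32 follow-up) replaces typed-pointer subtraction, which implicitly divides the byte offset by sizeof(struct rte_bucket_4_16), with an explicit byte-offset division by the runtime bucket_size, which is larger than the struct because each bucket also carries its entry data. A minimal self-contained sketch of the corrected arithmetic, with simplified stand-in names:

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for a fixed-size bucket header (e.g. rte_bucket_4_32). */
struct bucket { uint64_t sig[4]; };

/* Buckets live in one allocation: n_buckets regular buckets followed by
 * cache buckets, each occupying bucket_size bytes (header + entry data).
 * Dividing the byte offset by the runtime bucket_size, rather than letting
 * pointer subtraction divide by sizeof(struct bucket), yields the correct
 * index of a cache bucket being returned to the free stack. */
static uint32_t cache_index(void *memory, void *bucket,
		size_t bucket_size, uint32_t n_buckets)
{
	return (uint32_t)(((uint8_t *)bucket - (uint8_t *)memory)
			/ bucket_size) - n_buckets;
}
```

With the original typed subtraction, any bucket_size != sizeof(struct bucket) produces a wrong index, which is exactly the overload crash the patch describes.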
[dpdk-dev] Interactive/dynamic QoS scheduler
Hi Alexandre, > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alexandre Frigon > Sent: Wednesday, March 18, 2015 5:10 PM > To: dev at dpdk.org > Subject: [dpdk-dev] Interactive/dynamic QoS scheduler > > Hi, > > I'm currently working with the QoS api. I would like to know if it is > possible to > interact at runtime with the parameters applied to pipes, subports and > queues. Yes, it is. You can call the API functions rte_sched_subport_config() and rte_sched_pipe_config() at run-time to re-configure the subport and pipe rates. This assumes that the port has been configured during initialization, with all the port subports and pipes fully provisioned at this time (through the port configuration parameters n_subports_per_port and n_pipes_per_subport). Please note you need to call the subport and pipe config functions before you put any packets in their associated queues (typically, these functions are first called at initialization for all the port subports and pipes), but you can call them again later on to modify their previously assigned rates. All the pipe profiles are set in stone once the port is initialized (through the port configuration parameter pipe_profiles); what you can do at run-time is re-assign a pipe to a different pre-defined profile, which basically changes the rate of the pipe. > > Looking at the Scheduler sample, I understand that all of these parameters > are predetermined using profiles in the config file. > What I would like to do is interact with the scheduler to change subport, pipe > and tc rate or size with arbitrary values at runtime to have control over > traffic. > This is just a limitation of the qos_sched application, which currently does not provide any mechanism for the dynamic update of the subport/pipe parameters. > For example: > The scheduler sample is running and at any given time I want a number of > pipes to drop from 10 Gbit/s to 4.5 Gbit/s and then put it back to 8 Gbit/s.
> Profiles are not useful in this case because I would have to set a different > profile for each value from rate 1bit to 10Gbit. > > Is it actually possible to do that and how? > Yes, it is relatively simple to do it. The qos_sched application has a CLI that only implements some stats-related commands, but new commands could be added to update the subport/pipe configuration. At this point, you need to write that (relatively low complexity) code yourself unfortunately. Please consider developing a code patch to add this support and share it with the DPDK community! > Thanks, > Alexandre F.
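Since pipe profiles are fixed at port initialization, a run-time rate change amounts to picking the pre-defined profile closest to the requested rate and re-assigning the pipe to it (in DPDK of this era the re-assignment itself is rte_sched_pipe_config(port, subport_id, pipe_id, profile_idx)). The helper below is a hypothetical sketch of the profile-selection step only; the function and parameter names are invented for illustration:

```c
#include <stdint.h>

/* Return the index of the pre-defined pipe profile whose rate (in bytes or
 * bits per second, as long as units are consistent) is closest to the
 * requested target rate. The chosen index would then be passed to
 * rte_sched_pipe_config() to re-assign the pipe. */
static int closest_profile(const uint64_t *profile_rates, int n_profiles,
		uint64_t target_rate)
{
	int best = 0;
	uint64_t best_diff = UINT64_MAX;

	for (int i = 0; i < n_profiles; i++) {
		uint64_t diff = profile_rates[i] > target_rate ?
				profile_rates[i] - target_rate :
				target_rate - profile_rates[i];
		if (diff < best_diff) {
			best_diff = diff;
			best = i;
		}
	}
	return best;
}
```

This also shows why Morten-style arbitrary rates need many profiles: granularity is limited to whatever rates were provisioned in pipe_profiles at init time.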
[dpdk-dev] [PATCH v2 5/6] rte_sched: allow reading statistics without clearing
> -Original Message- > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > Sent: Monday, March 16, 2015 8:11 PM > To: Dumitrescu, Cristian > Cc: dev at dpdk.org; Stephen Hemminger > Subject: Re: [PATCH v2 5/6] rte_sched: allow reading statistics without > clearing > > On Mon, 16 Mar 2015 19:58:51 +0000 > "Dumitrescu, Cristian" wrote: > > > > > > > > -Original Message- > > > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > > > Sent: Thursday, March 12, 2015 11:06 PM > > > To: Dumitrescu, Cristian > > > Cc: dev at dpdk.org; Stephen Hemminger > > > Subject: Re: [PATCH v2 5/6] rte_sched: allow reading statistics without > > > clearing > > > > > > On Thu, 12 Mar 2015 19:28:11 + > > > "Dumitrescu, Cristian" wrote: > > > > > > > Hi Stephen, > > > > > > > > Thank you for adding flexibility over clearing the stats or not. > > > > > > > > I have one concern though: why change the stats read API to add a > clear > > > parameter rather than keep prototype for the stats functions unchanged > and > > > add the flag as part of the port creation parameters in struct > > > rte_sched_port_params? This parameter could be saved into the internal > > > struct rte_sched_port, which is passed (as a pointer) to the stats read > > > functions. In my opinion, this approach is slightly more elegant and it > keeps > > > the changes to a minimum. > > > > > > > > int > > > > rte_sched_queue_read_stats(struct rte_sched_port *port, > > > > uint32_t queue_id, > > > > struct rte_sched_queue_stats *stats, > > > > uint16_t *qlen) > > > > { > > > > ... > > > > if (port->clear_stats_on_read) > > > > memset(...); > > > > } > > > > > > > > I made this suggestion during the previous round, but I did not get any > > > opinion from you on it yet. > > > > > > > > Regards, > > > > Cristian > > > > > > I rejected the config parameter idea because I don't like it is > > > inconsistent > > > with other statistics in DPDK and in related software. 
There is no > > > config parameter that changes what the BSD or Linux kernel API does. > > > > Your approach has the advantage of being able to clear/not clear the stats > per each read operation rather than configuring the behavior globally. I think > this approach allows for the ultimate flexibility, so I am OK to go with it. > > > > > > The only reason for keeping the read and clear in one operation is > > > because you like it, and because somebody might have built code > > > that expects it. > > > > > > > Clearing the stats with a delay after the stats have been read is prone to a > race condition, as during this time more packets could be processed, and > these packets will not show up in the counters that the user read. > > I think it depends on the needs of each particular application whether this > race condition is important or not: if the counters are read rarely (e.g. once > per day) and only coarse accuracy is required, the error is probably > irrelevant; > if the app is looking for fine-grained accuracy (e.g. rate measurement, > debugging, etc.), then the error is not allowed. You seem to favour the > former and ignore the latter case. > > > > > > Changing the function signature is a nice red flag so that people will > > > notice the change. > > > > There is a small API change here. I am OK with it for the above reasons, > provided that there are no objections on this from other contributors. > > > > If you really wanted a fast/safe read and clear operation, how about using an > exchange instruction to exchange in a zero? When stats read is decoupled from stats clear, the exchange operation cannot fix this. To your point: when stats are cleared on read, would it make sense to use the exchange operation? In my opinion, it does not make sense, as the rte_sched API is not thread safe (for performance reasons), which is explicitly stated. As clearly documented, the scheduler enqueue and dequeue operations need to run on the same CPU core.
Therefore, IMHO having a thread safe stats read function while the main functional API is not thread safe is not going to add value.
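Stephen's exchange-instruction suggestion can be illustrated generically: an atomic exchange with zero reads and clears a counter in one indivisible step, so no increments are lost between the read and the clear. This is a hedged sketch using the GCC/Clang atomic builtins, not DPDK code:

```c
#include <stdint.h>

/* Atomically fetch the current counter value and reset it to zero.
 * Any increment racing with this call lands either in the returned
 * snapshot or in the freshly zeroed counter, never in neither. */
static uint64_t read_and_clear(uint64_t *counter)
{
	return __atomic_exchange_n(counter, 0, __ATOMIC_RELAXED);
}
```

As the thread notes, this only helps when read and clear are one operation; once they are decoupled API calls, no single instruction can close the window between them.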
[dpdk-dev] [PATCH v2 5/6] rte_sched: allow reading statistics without clearing
> -Original Message- > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > Sent: Tuesday, March 10, 2015 4:14 PM > To: Dumitrescu, Cristian > Cc: dev at dpdk.org; Stephen Hemminger; Stephen Hemminger > Subject: [PATCH v2 5/6] rte_sched: allow reading statistics without clearing > > From: Stephen Hemminger > > The rte_sched statistics API should allow reading statistics without > clearing. Make auto-clear optional. > > Signed-off-by: Stephen Hemminger > --- > v2 -- change to flag for clear from separate read/clear > > app/test/test_sched.c| 4 ++-- > examples/qos_sched/stats.c | 16 +++- > lib/librte_sched/rte_sched.c | 44 ++-- > lib/librte_sched/rte_sched.h | 18 ++ > 4 files changed, 45 insertions(+), 37 deletions(-) > > diff --git a/app/test/test_sched.c b/app/test/test_sched.c > index 60c62de..3c9c9e3 100644 > --- a/app/test/test_sched.c > +++ b/app/test/test_sched.c > @@ -208,13 +208,13 @@ test_sched(void) > > struct rte_sched_subport_stats subport_stats; > uint32_t tc_ov; > - rte_sched_subport_read_stats(port, SUBPORT, &subport_stats, &tc_ov); > + rte_sched_subport_read_stats(port, SUBPORT, &subport_stats, &tc_ov, 1); > #if 0 > TEST_ASSERT_EQUAL(subport_stats.n_pkts_tc[TC-1], 10, "Wrong > subport stats\n"); > #endif > struct rte_sched_queue_stats queue_stats; > uint16_t qlen; > - rte_sched_queue_read_stats(port, QUEUE, &queue_stats, &qlen); > + rte_sched_queue_read_stats(port, QUEUE, &queue_stats, &qlen, 1); > #if 0 > TEST_ASSERT_EQUAL(queue_stats.n_pkts, 10, "Wrong queue > stats\n"); > #endif > diff --git a/examples/qos_sched/stats.c b/examples/qos_sched/stats.c > index b4db7b5..88caa54 100644 > --- a/examples/qos_sched/stats.c > +++ b/examples/qos_sched/stats.c > @@ -61,7 +61,7 @@ qavg_q(uint8_t port_id, uint32_t subport_id, uint32_t > pipe_id, uint8_t tc, uint8 > average = 0; > > for (count = 0; count < qavg_ntimes; count++) { > -rte_sched_queue_read_stats(port, queue_id, &stats, &qlen); > +rte_sched_queue_read_stats(port, queue_id, &stats, &qlen, 1); > average += qlen; > usleep(qavg_period); > } > @@ -99,7 +99,9 @@ qavg_tcpipe(uint8_t port_id, uint32_t subport_id, > uint32_t pipe_id, uint8_t tc) > for (count = 0; count < qavg_ntimes; count++) { > part_average = 0; > for (i = 0; i < RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS; i++) { > -rte_sched_queue_read_stats(port, queue_id + (tc * > RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + i), &stats, &qlen); > +rte_sched_queue_read_stats(port, > +queue_id + (tc * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + i), > +&stats, &qlen, 1); > part_average += qlen; > } > average += part_average / > RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS; > @@ -138,7 +140,8 @@ qavg_pipe(uint8_t port_id, uint32_t subport_id, > uint32_t pipe_id) > for (count = 0; count < qavg_ntimes; count++) { > part_average = 0; > for (i = 0; i < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE * > RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS; i++) { > -rte_sched_queue_read_stats(port, queue_id + i, &stats, > &qlen); > +rte_sched_queue_read_stats(port, queue_id + i, > +&stats, &qlen, 1); > part_average += qlen; > } > average += part_average / > (RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE * > RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS); > @@ -178,7 +181,9 @@ qavg_tcsubport(uint8_t port_id, uint32_t subport_id, > uint8_t tc) > queue_id = RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE * > RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS * (subport_id * > port_params.n_pipes_per_subport + i); > > for (j = 0; j < RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS; > j++) { > -rte_sched_queue_read_stats(port, queue_id + (tc * > RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + j), &stats, &qlen); > +rte_sched_queue_read_stats(port, > +queue_id + (tc * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + j), > +&stats, &qlen, 1); > part_average += qlen; > } > } > @@ -220,7 +225,8 @@ qavg_subport(uint8_t port_id, uint32_t
[dpdk-dev] [PATCH v2 5/6] rte_sched: allow reading statistics without clearing
> -Original Message- > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > Sent: Thursday, March 12, 2015 11:06 PM > To: Dumitrescu, Cristian > Cc: dev at dpdk.org; Stephen Hemminger > Subject: Re: [PATCH v2 5/6] rte_sched: allow reading statistics without > clearing > > On Thu, 12 Mar 2015 19:28:11 +0000 > "Dumitrescu, Cristian" wrote: > > > Hi Stephen, > > > > Thank you for adding flexibility over clearing the stats or not. > > > > I have one concern though: why change the stats read API to add a clear > parameter rather than keep the prototype for the stats functions unchanged and > add the flag as part of the port creation parameters in struct > rte_sched_port_params? This parameter could be saved into the internal > struct rte_sched_port, which is passed (as a pointer) to the stats read > functions. In my opinion, this approach is slightly more elegant and it keeps > the changes to a minimum. > > > > int > > rte_sched_queue_read_stats(struct rte_sched_port *port, > > uint32_t queue_id, > > struct rte_sched_queue_stats *stats, > > uint16_t *qlen) > > { > > ... > > if (port->clear_stats_on_read) > > memset(...); > > } > > > > I made this suggestion during the previous round, but I did not get any > opinion from you on it yet. > > > > Regards, > > Cristian > > I rejected the config parameter idea because I don't like it: it is inconsistent > with other statistics in DPDK and in related software. There is no > config parameter that changes what the BSD or Linux kernel API does. Your approach has the advantage of being able to clear/not clear the stats per each read operation rather than configuring the behavior globally. I think this approach allows for the ultimate flexibility, so I am OK to go with it. > > The only reason for keeping the read and clear in one operation is > because you like it, and because somebody might have built code > that expects it.
> Clearing the stats with a delay after the stats have been read is prone to a race condition, as during this time more packets could be processed, and these packets will not show up in the counters that the user read. I think it depends on the needs of each particular application whether this race condition is important or not: if the counters are read rarely (e.g. once per day) and only coarse accuracy is required, the error is probably irrelevant; if the app is looking for fine-grained accuracy (e.g. rate measurement, debugging, etc.), then the error is not allowed. You seem to favour the former and ignore the latter case. > Changing the function signature is a nice red flag so that people will > notice the change. There is a small API change here. I am OK with it for the above reasons, provided that there are no objections on this from other contributors.
[dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when read
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon > Sent: Tuesday, February 24, 2015 8:07 PM > To: Stephen Hemminger > Cc: dev at dpdk.org; Stephen Hemminger > Subject: Re: [dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when > read > > 2015-02-24 11:18, Stephen Hemminger: > > On Mon, 23 Feb 2015 23:51:31 + > > Thomas Monjalon wrote: > > > > > 2015-02-05 07:43, Neil Horman: > > > > On Wed, Feb 04, 2015 at 10:13:58PM -0800, Stephen Hemminger wrote: > > > > > + > > > > > +/** > > > > > + * Hierarchical scheduler subport statistics reset > > > > > + * > > > > > + * @param port > > > > > + * Handle to port scheduler instance > > > > > + * @param subport_id > > > > > + * Subport ID > > > > > + * @return > > > > > + * 0 upon success, error code otherwise > > > > > + */ > > > > > +int > > > > > +rte_sched_subport_stats_reset(struct rte_sched_port *port, > > > > > + uint32_t subport_id); > > > > > + > > > > > /** > > > > > * Hierarchical scheduler queue statistics read > > > > > * > > > > > @@ -338,6 +353,20 @@ rte_sched_queue_read_stats(struct > rte_sched_port *port, > > > > > struct rte_sched_queue_stats *stats, > > > > > uint16_t *qlen); > > > > > > > > > > +/** > > > > > + * Hierarchical scheduler queue statistics reset > > > > > + * > > > > > + * @param port > > > > > + * Handle to port scheduler instance > > > > > + * @param queue_id > > > > > + * Queue ID within port scheduler > > > > > + * @return > > > > > + * 0 upon success, error code otherwise > > > > > + */ > > > > > +int > > > > > +rte_sched_queue_stats_reset(struct rte_sched_port *port, > > > > > + uint32_t queue_id); > > > > > + > > > > Both need to be added to the version map to expose them properly. > > > > Neil > > > > > > Stephen, this patchset is partially acked and could enter in 2.0.0-rc1. > > > May you send a v3 addressing comments? Or should I break the serie by > > > applying only some of them? Or postpone the serie to 2.1? 
> > > > I can resend v3. Wasn't clear that a conclusion was reached. > > IMHO read should not clear. > > Me too. I'm just saying that I cannot apply anything. > So you have to decide the strategy to adopt for your patches. How about my latest proposal to have the stats read functions either reset the counters or not, based on init-time user configuration? I did not see any reply on this. Maybe you guys missed my reply, I am pasting it below: "Personally, I think we should avoid proliferating the number of stats functions, I would keep a single set of stats read functions, which can clear the stats or not, depending on behaviour configured per rte_sched object at creation time. Basically, based on the value of configuration parameter struct rte_sched_params::clear_stats_on_reset, the stats read functions do clear the counters or not. In my opinion, this allows a clean init-time selection of the required behaviour, and it also provides backward compatibility. Any issues with this approach?"
[dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when read
> -Original Message- > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > Sent: Saturday, February 21, 2015 1:53 AM > To: Dumitrescu, Cristian > Cc: Thomas Monjalon; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when > read > > On Fri, 20 Feb 2015 21:28:55 +0000 > "Dumitrescu, Cristian" wrote: > > > Agree. > > Stephen, how about a run-time solution (I agree it would be much better, > why did I not consider this in the first place?) of adding a new bool > parameter > in struct rte_sched_port_params: clear_stats_on_reset? > > Both stats read function get the port handle (struct rte_sched_port *) as > parameter, so there should be no ripple effect to propagate this flag. > > Why not read_and_clear function if absolutely necessary. I am not sure I understand what you mean. Are you suggesting different set of stats read functions, one that is read and clear, while the other one is read without clear? Personally, I think we should avoid proliferating the number of stats functions, I would keep a single set of stats read functions, which can clear the stats or not, depending on behaviour configured per rte_sched object at creation time. Basically, based on the value of configuration parameter struct rte_sched_params::clear_stats_on_reset, the stats read functions do clear the counters or not. In my opinion, this allows a clean init-time selection of the required behaviour, and it also provides backward compatibility. Any issues with this approach?
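The init-time selection Cristian proposes can be sketched with illustrative names: the port remembers a clear_stats_on_read flag supplied in its creation parameters, and a single read function either leaves the counters alone or zeroes them after copying them out. This is a simplified stand-in, not the rte_sched API:

```c
#include <stdint.h>
#include <string.h>

struct queue_stats {
	uint64_t n_pkts;
	uint64_t n_pkts_dropped;
};

struct port {
	int clear_stats_on_read;	/* fixed at port creation time */
	struct queue_stats stats;
};

/* One read function for both behaviours: copy the counters out, then
 * clear them only if the port was created with clear-on-read enabled. */
static void queue_read_stats(struct port *p, struct queue_stats *out)
{
	*out = p->stats;
	if (p->clear_stats_on_read)
		memset(&p->stats, 0, sizeof(p->stats));
}
```

The trade-off debated in the thread is visible here: the prototype stays unchanged, but the behaviour is fixed globally at creation instead of being chosen per call as in Stephen's flag-parameter version.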
[dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when read
> -Original Message- > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > Sent: Friday, February 20, 2015 9:01 PM > To: Dumitrescu, Cristian > Cc: dev at dpdk.org; Stephen Hemminger > Subject: Re: [dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when > read > > 2015-02-20 20:23, Dumitrescu, Cristian: > > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > > > On Fri, 20 Feb 2015 18:32:03 + > > > "Dumitrescu, Cristian" wrote: > > > > > > > Stephen, I suggest adding a new build-time configuration option for the > > > > librte_sched library in config/common_* files: > > > > CONFIG_RTE_SCHED_STATS_CLEAR_ON_READ. > > > > > > Build time config options do not work for distributions. > > > > Why? > > > > This does not affect the API, as the new API functions are always compiled > > in, and the prototypes are not changed, and no data structures are > affected. > > Behaviour is an important part of the API. Think comments as part of the API. Makes sense. > > > This only changes the behavior of certain functions, so that user can > > select which mode it needs. > > When user doesn't or cannot rebuild, he has no choice. > > > It also preserves backward compatibility. > > > > We have so many compilation options in config file, why is this one > different? > > We must remove and avoid build-time options. > The only ones which might be acceptable are the ones which allow more > performance by disabling some features. Agree. Stephen, how about a run-time solution (I agree it would be much better, why did I not consider this in the first place?) of adding a new bool parameter in struct rte_sched_port_params: clear_stats_on_reset? Both stats read function get the port handle (struct rte_sched_port *) as parameter, so there should be no ripple effect to propagate this flag. > > > This e-mail and any attachments may contain confidential material for the > > sole use of the intended recipient(s). 
Any review or distribution by others > > is strictly prohibited. If you are not the intended recipient, please > > contact > > the sender and delete all copies. > > Please ask your administrator to remove this disclaimer. I did earlier, this is still work in progress unfortunately. -- Intel Shannon Limited Registered in Ireland Registered Office: Collinstown Industrial Park, Leixlip, County Kildare Registered Number: 308263 Business address: Dromore House, East Park, Shannon, Co. Clare This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
[dpdk-dev] RTE_PIPELINE_ACTION_PORT_META doesn't work properly
> -Original Message- > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > Sent: Friday, February 20, 2015 8:47 PM > To: Dumitrescu, Cristian; imustafin at bk.ru > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] RTE_PIPELINE_ACTION_PORT_META doesn't work > properly > > 2015-02-20 19:26, Dumitrescu, Cristian: > > > > > -Original Message- > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of ?? > > > > Sent: Friday, February 20, 2015 4:25 PM > > > To: dev at dpdk.org > > > Subject: [dpdk-dev] RTE_PIPELINE_ACTION_PORT_META doesn't work > > > properly > > > > > > Hi, list! > > > > > > RTE_PIPELINE_ACTION_PORT_META option in rte_pipeline library > doesn't > > > work for non-default table entries. Is this bug or correct behaviour? > > > > > > This is my patch, that changes this behaviour: > > > > > > --- a/lib/librte_pipeline/rte_pipeline.c > > > +++ b/lib/librte_pipeline/rte_pipeline.c > > > @@ -999,8 +999,10 @@ rte_pipeline_compute_masks(struct > rte_pipeline > > > *p, uint64_t pkts_mask) > > > { > > > p->action_mask1[RTE_PIPELINE_ACTION_DROP] = 0; > > > p->action_mask1[RTE_PIPELINE_ACTION_PORT] = 0; > > > +p->action_mask1[RTE_PIPELINE_ACTION_PORT_META] = 0; > > > p->action_mask1[RTE_PIPELINE_ACTION_TABLE] = 0; > > > > > > + > > > if ((pkts_mask & (pkts_mask + 1)) == 0) { > > > uint64_t n_pkts = __builtin_popcountll(pkts_mask); > > > uint32_t i; > > > @@ -1224,6 +1226,7 @@ rte_pipeline_run(struct rte_pipeline *p) > > > pkts_mask = RTE_LEN2MASK(n_pkts, uint64_t); > > > p->action_mask0[RTE_PIPELINE_ACTION_DROP] = 0; > > > p->action_mask0[RTE_PIPELINE_ACTION_PORT] = 0; > > > +p->action_mask0[RTE_PIPELINE_ACTION_PORT_META] = 0; > > > p->action_mask0[RTE_PIPELINE_ACTION_TABLE] = 0; > > > > > > /* Input port user actions */ > > > @@ -1300,6 +1303,9 @@ rte_pipeline_run(struct rte_pipeline *p) > > > p->action_mask0[RTE_PIPELINE_ACTION_PORT] > > > |= > > > p->action_mask1[ > > > RTE_PIPELINE_ACTION_PORT]; > > > +p->action_mask0[RTE_PIPELINE_ACTION_PORT_META] |= 
> > > +p->action_mask1[ > > > +RTE_PIPELINE_ACTION_PORT_META]; > > > > > > p->action_mask0[RTE_PIPELINE_ACTION_TABLE] |= > > > p->action_mask1[ > > > > > > RTE_PIPELINE_ACTION_TABLE]; > > > > > > > > > Thanks, Ildar > > > > Acked-by: Cristian Dumitrescu > > No Cristian, you cannot ack this patch, even if it fixes the problem. > The format is not OK: indent is wrong and it's not signed. > Please Ildar, check http://dpdk.org/dev#send to know how to submit a > patch. > > > Thank you, Ildar! Sorry, Thomas. Ildar, please fix the signature (currently the above signature is: On Behalf Of ??) and the indentation and resubmit. Regards, Cristian
[dpdk-dev] RTE_PIPELINE_ACTION_PORT_META doesn't work properly
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of ?? > Sent: Friday, February 20, 2015 4:25 PM > To: dev at dpdk.org > Subject: [dpdk-dev] RTE_PIPELINE_ACTION_PORT_META doesn't work > properly > > Hi, list! > > RTE_PIPELINE_ACTION_PORT_META option in rte_pipeline library doesn't > work for non-default table entries. Is this a bug or correct behaviour? > > This is my patch that changes this behaviour: > > --- a/lib/librte_pipeline/rte_pipeline.c > +++ b/lib/librte_pipeline/rte_pipeline.c > @@ -999,8 +999,10 @@ rte_pipeline_compute_masks(struct rte_pipeline > *p, uint64_t pkts_mask) > { > p->action_mask1[RTE_PIPELINE_ACTION_DROP] = 0; > p->action_mask1[RTE_PIPELINE_ACTION_PORT] = 0; > + p->action_mask1[RTE_PIPELINE_ACTION_PORT_META] = 0; > p->action_mask1[RTE_PIPELINE_ACTION_TABLE] = 0; > > + > if ((pkts_mask & (pkts_mask + 1)) == 0) { > uint64_t n_pkts = __builtin_popcountll(pkts_mask); > uint32_t i; > @@ -1224,6 +1226,7 @@ rte_pipeline_run(struct rte_pipeline *p) > pkts_mask = RTE_LEN2MASK(n_pkts, uint64_t); > p->action_mask0[RTE_PIPELINE_ACTION_DROP] = 0; > p->action_mask0[RTE_PIPELINE_ACTION_PORT] = 0; > + p->action_mask0[RTE_PIPELINE_ACTION_PORT_META] = 0; > p->action_mask0[RTE_PIPELINE_ACTION_TABLE] = 0; > > /* Input port user actions */ > @@ -1300,6 +1303,9 @@ rte_pipeline_run(struct rte_pipeline *p) > p->action_mask0[RTE_PIPELINE_ACTION_PORT] |= > p->action_mask1[ > RTE_PIPELINE_ACTION_PORT]; > + p->action_mask0[RTE_PIPELINE_ACTION_PORT_META] |= > + p->action_mask1[ > + RTE_PIPELINE_ACTION_PORT_META]; > p->action_mask0[RTE_PIPELINE_ACTION_TABLE] |= > p->action_mask1[ > RTE_PIPELINE_ACTION_TABLE]; > > > Thanks, Ildar Acked-by: Cristian Dumitrescu Thank you, Ildar!
[dpdk-dev] [PATCH v2 5/7] rte_sched: don't put tabs in log messages
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen > Hemminger > Sent: Thursday, February 5, 2015 6:14 AM > To: dev at dpdk.org > Cc: Stephen Hemminger > Subject: [dpdk-dev] [PATCH v2 5/7] rte_sched: don't put tabs in log > messages > > From: Stephen Hemminger > > syslog does not like tabs in log messages; tab gets translated to #011 > > Signed-off-by: Stephen Hemminger > --- > lib/librte_sched/rte_sched.c | 14 +++--- > 1 file changed, 7 insertions(+), 7 deletions(-) > > diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c > index d891e50..55fbc14 100644 > --- a/lib/librte_sched/rte_sched.c > +++ b/lib/librte_sched/rte_sched.c > @@ -495,10 +495,10 @@ rte_sched_port_log_pipe_profile(struct > rte_sched_port *port, uint32_t i) > struct rte_sched_pipe_profile *p = port->pipe_profiles + i; > > RTE_LOG(INFO, SCHED, "Low level config for pipe profile %u:\n" > - "\tToken bucket: period = %u, credits per period = %u, size = > %u\n" > - "\tTraffic classes: period = %u, credits per period = [%u, %u, > %u, %u]\n" > - "\tTraffic class 3 oversubscription: weight = %hhu\n" > - "\tWRR cost: [%hhu, %hhu, %hhu, %hhu], [%hhu, %hhu, > %hhu, %hhu], [%hhu, %hhu, %hhu, %hhu], [%hhu, %hhu, %hhu, %hhu]\n", > + "Token bucket: period = %u, credits per period = %u, size = > %u\n" > + "Traffic classes: period = %u, credits per period = [%u, %u, > %u, %u]\n" > + "Traffic class 3 oversubscription: weight = %hhu\n" > + "WRR cost: [%hhu, %hhu, %hhu, %hhu], [%hhu, %hhu, > %hhu, %hhu], [%hhu, %hhu, %hhu, %hhu], [%hhu, %hhu, %hhu, %hhu]\n", > i, > > /* Token bucket */ > @@ -716,9 +716,9 @@ rte_sched_port_log_subport_config(struct > rte_sched_port *port, uint32_t i) > struct rte_sched_subport *s = port->subport + i; > > RTE_LOG(INFO, SCHED, "Low level config for subport %u:\n" > - "\tToken bucket: period = %u, credits per period = %u, size = > %u\n" > - "\tTraffic classes: period = %u, credits per period = [%u, %u, > %u, %u]\n" > - "\tTraffic 
class 3 oversubscription: wm min = %u, wm max = > %u\n", + "Token bucket: period = %u, credits per period = %u, size = > %u\n" + "Traffic classes: period = %u, credits per period = [%u, %u, > %u, %u]\n" + "Traffic class 3 oversubscription: wm min = %u, wm max = > %u\n", > i, > > /* Token bucket */ > -- > 2.1.4 Acked by: Cristian Dumitrescu
[dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when read
> -Original Message- > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > Sent: Monday, February 9, 2015 10:55 PM > To: Dumitrescu, Cristian > Cc: dev at dpdk.org; Stephen Hemminger > Subject: Re: [dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when > read > > On Mon, 9 Feb 2015 22:48:36 +0000 > "Dumitrescu, Cristian" wrote: > > > Hi Stephen, > > > > What is the reason not to clear statistics on read? Do you have a use-case / > justification for it? > > > > (BTW, I see you added the reset functions, but was it also your intention to > remove the memset to 0 from the stats read functions? :) ) > > > > Regards, > > Cristian > > Read and clear is a non-standard model. Interface statistics are not > read/clear. > We have lots of scripts that read statistics. Users don't like it when > statistics disappear. Stephen, I suggest adding a new build-time configuration option for the librte_sched library in config/common_* files: CONFIG_RTE_SCHED_STATS_CLEAR_ON_READ. - When set to YES, we clear the stats counters on read. When set to NO, we do not clear stats counters on read; - The default value for this configuration option should be YES, in order to preserve backward compatibility with the current behavior; - The stats reset functions introduced by this patch should always be enabled/compiled into the code, regardless of whether this new option is set to YES or NO. What do you think? Regards, Cristian
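For what it's worth, the option proposed above could gate the read path roughly as follows. This is an illustrative mock-up, not the actual librte_sched code: the struct and function names, and the RTE_SCHED_STATS_CLEAR_ON_READ macro, stand in for the real types and for the CONFIG_RTE_SCHED_STATS_CLEAR_ON_READ setting that would live in config/common_*.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the proposed config/common_* option (YES == 1). */
#define RTE_SCHED_STATS_CLEAR_ON_READ 1

/* Simplified stand-in for rte_sched_queue_stats. */
struct queue_stats {
    uint64_t n_pkts;
    uint64_t n_pkts_dropped;
};

/* Read always copies the counters out; clearing is compile-time selectable. */
static int
queue_stats_read(struct queue_stats *s, struct queue_stats *out)
{
    if (s == NULL || out == NULL)
        return -1;
    *out = *s;
#if RTE_SCHED_STATS_CLEAR_ON_READ
    memset(s, 0, sizeof(*s));   /* legacy clear-on-read behaviour */
#endif
    return 0;
}

/* Explicit reset stays compiled in regardless of the clear-on-read setting. */
static int
queue_stats_reset(struct queue_stats *s)
{
    if (s == NULL)
        return -1;
    memset(s, 0, sizeof(*s));
    return 0;
}
```

With the macro set to 1 the counters vanish after the first read (the current behaviour); setting it to 0 gives the ethdev-style accumulate-until-reset model Stephen asks for, while the reset function works either way.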
[dpdk-dev] [PATCH v2 3/7] rte_sched: keep track of RED drops
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen > Hemminger > Sent: Thursday, February 5, 2015 6:05 AM > To: dev at dpdk.org > Cc: Stephen Hemminger > Subject: [dpdk-dev] [PATCH v2 3/7] rte_sched: keep track of RED drops > > From: Stephen Hemminger > > Add new statistic to keep track of drops due to RED. > > Signed-off-by: Stephen Hemminger > --- > lib/librte_sched/rte_sched.c | 28 +++- > lib/librte_sched/rte_sched.h | 6 ++ > 2 files changed, 29 insertions(+), 5 deletions(-) > > diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c > index 6928c98..8cb8bf1 100644 > --- a/lib/librte_sched/rte_sched.c > +++ b/lib/librte_sched/rte_sched.c > @@ -1028,7 +1028,9 @@ rte_sched_port_update_subport_stats(struct > rte_sched_port *port, uint32_t qindex > } > > static inline void > -rte_sched_port_update_subport_stats_on_drop(struct rte_sched_port > *port, uint32_t qindex, struct rte_mbuf *pkt) > +rte_sched_port_update_subport_stats_on_drop(struct rte_sched_port > *port, > + uint32_t qindex, > + struct rte_mbuf *pkt, uint32_t red) > { > struct rte_sched_subport *s = port->subport + (qindex / > rte_sched_port_queues_per_subport(port)); > uint32_t tc_index = (qindex >> 2) & 0x3; > @@ -1036,6 +1038,9 @@ > rte_sched_port_update_subport_stats_on_drop(struct rte_sched_port > *port, uint32_ > > s->stats.n_pkts_tc_dropped[tc_index] += 1; > s->stats.n_bytes_tc_dropped[tc_index] += pkt_len; > +#ifdef RTE_SCHED_RED > + s->stats.n_pkts_red_dropped[tc_index] += red; > +#endif > } > > static inline void > @@ -1049,13 +1054,18 @@ rte_sched_port_update_queue_stats(struct > rte_sched_port *port, uint32_t qindex, > } > > static inline void > -rte_sched_port_update_queue_stats_on_drop(struct rte_sched_port > *port, uint32_t qindex, struct rte_mbuf *pkt) > +rte_sched_port_update_queue_stats_on_drop(struct rte_sched_port > *port, > + uint32_t qindex, > + struct rte_mbuf *pkt, uint32_t red) > { > struct rte_sched_queue_extra *qe = 
port->queue_extra + qindex; > uint32_t pkt_len = pkt->pkt_len; > > qe->stats.n_pkts_dropped += 1; > qe->stats.n_bytes_dropped += pkt_len; > +#ifdef RTE_SCHED_RED > + qe->stats.n_pkts_red_dropped += red; > +#endif > } > > #endif /* RTE_SCHED_COLLECT_STATS */ > @@ -1206,12 +1216,20 @@ rte_sched_port_enqueue_qwa(struct > rte_sched_port *port, uint32_t qindex, struct > qlen = q->qw - q->qr; > > /* Drop the packet (and update drop stats) when queue is full */ > - if (unlikely(rte_sched_port_red_drop(port, pkt, qindex, qlen) || > (qlen >= qsize))) { > + if (unlikely(rte_sched_port_red_drop(port, pkt, qindex, qlen))) { > +#ifdef RTE_SCHED_COLLECT_STATS > + rte_sched_port_update_subport_stats_on_drop(port, > qindex, pkt, 1); > + rte_sched_port_update_queue_stats_on_drop(port, > qindex, pkt, 1); > +#endif > rte_pktmbuf_free(pkt); > + } > + > + if (qlen >= qsize) { > #ifdef RTE_SCHED_COLLECT_STATS > - rte_sched_port_update_subport_stats_on_drop(port, > qindex, pkt); > - rte_sched_port_update_queue_stats_on_drop(port, > qindex, pkt); > + rte_sched_port_update_subport_stats_on_drop(port, > qindex, pkt, 0); > + rte_sched_port_update_queue_stats_on_drop(port, > qindex, pkt, 0); > #endif > + rte_pktmbuf_free(pkt); > return 0; > } > > diff --git a/lib/librte_sched/rte_sched.h b/lib/librte_sched/rte_sched.h > index dda287f..e9bf18a 100644 > --- a/lib/librte_sched/rte_sched.h > +++ b/lib/librte_sched/rte_sched.h > @@ -140,6 +140,9 @@ struct rte_sched_subport_stats { > subport for each traffic class*/ > uint32_t > n_bytes_tc_dropped[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE]; /**< > Number of bytes dropped by the current >subport for each traffic class due > to subport queues > being full or congested */ > +#ifdef RTE_SCHED_RED > + uint32_t > n_pkts_red_dropped[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE]; /**< > Number of packets dropped by red */ > +#endif > }; > > /** Pipe configuration parameters. 
The period and credits_per_period > parameters are measured > @@ -168,6 +171,9 @@ struct rte_sched_queue_stats { > /* Packets */ > uint32_t n_pkts; /**< Number of packets successfully > written to current queue */ > uint32_t n_pkts_dropped; /**< Number of packets dropped due > to current queue being full or congested */ > +#ifdef RTE_SCHED_RED > + uint32_t n_pkts_red_dropped; > +#endif > > /* Bytes */ > uint32_t n_bytes;/**< Number of bytes successfully > written > to current queue */ > -- > 2.1.4
[dpdk-dev] [PATCH v2 2/7] rte_sched: expand scheduler hierarchy for more VLAN's
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen > Hemminger > Sent: Thursday, February 5, 2015 6:05 AM > To: dev at dpdk.org > Cc: Stephen Hemminger > Subject: [dpdk-dev] [PATCH v2 2/7] rte_sched: expand scheduler hierarchy > for more VLAN's > > From: Stephen Hemminger > > The QoS subport is limited to 8 bits in original code. > But customers demanded ability to support full number of VLAN's (4096) > therefore use the full part of the tag field of mbuf. > > Resize the pipe as well to allow for more pipes in future and > avoid expensive bitfield access. > > Signed-off-by: Stephen Hemminger > --- > v2 use tag area rather than claiming reserved bit which isn't documented > > lib/librte_mbuf/rte_mbuf.h | 5 - > lib/librte_sched/rte_sched.h | 38 - > - > 2 files changed, 28 insertions(+), 15 deletions(-) > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > index 16059c6..8f0c3a4 100644 > --- a/lib/librte_mbuf/rte_mbuf.h > +++ b/lib/librte_mbuf/rte_mbuf.h > @@ -258,7 +258,10 @@ struct rte_mbuf { > /**< First 4 flexible bytes or FD ID, dependent on >PKT_RX_FDIR_* flag in ol_flags. */ > } fdir; /**< Filter identifier if FDIR enabled */ > - uint32_t sched; /**< Hierarchical scheduler */ > + struct { > + uint32_t lo; > + uint32_t hi; > + } sched; /**< Hierarchical scheduler */ > uint32_t usr; /**< User defined tags. See > @rte_distributor_process */ > } hash; /**< hash information */ > > diff --git a/lib/librte_sched/rte_sched.h b/lib/librte_sched/rte_sched.h > index e6bba22..dda287f 100644 > --- a/lib/librte_sched/rte_sched.h > +++ b/lib/librte_sched/rte_sched.h > @@ -195,16 +195,20 @@ struct rte_sched_port_params { > #endif > }; > > -/** Path through the scheduler hierarchy used by the scheduler enqueue > operation to > -identify the destination queue for the current packet. 
Stored in the field > hash.sched > -of struct rte_mbuf of each packet, typically written by the classification > stage and read by > -scheduler enqueue.*/ > +/* > + * Path through the scheduler hierarchy used by the scheduler enqueue > + * operation to identify the destination queue for the current > + * packet. Stored in the field pkt.hash.sched of struct rte_mbuf of > + * each packet, typically written by the classification stage and read > + * by scheduler enqueue. > + */ > struct rte_sched_port_hierarchy { > - uint32_t queue:2;/**< Queue ID (0 .. 3) */ > - uint32_t traffic_class:2;/**< Traffic class ID (0 .. 3)*/ > - uint32_t pipe:20;/**< Pipe ID */ > - uint32_t subport:6; /**< Subport ID */ > - uint32_t color:2;/**< Color */ > + uint16_t queue:2;/**< Queue ID (0 .. 3) */ > + uint16_t traffic_class:2;/**< Traffic class ID (0 .. 3)*/ > + uint16_t color:2;/**< Color */ > + uint16_t unused:10; > + uint16_t subport;/**< Subport ID */ > + uint32_t pipe; /**< Pipe ID */ > }; Extending the number of bits allocated for mbuf->sched makes sense to me. I agree with this partitioning. 
> > /* > @@ -350,12 +354,15 @@ rte_sched_queue_read_stats(struct > rte_sched_port *port, > */ > static inline void > rte_sched_port_pkt_write(struct rte_mbuf *pkt, > - uint32_t subport, uint32_t pipe, uint32_t traffic_class, uint32_t > queue, enum rte_meter_color color) > + uint32_t subport, uint32_t pipe, > + uint32_t traffic_class, > + uint32_t queue, enum rte_meter_color color) > { > - struct rte_sched_port_hierarchy *sched = (struct > rte_sched_port_hierarchy *) &pkt->hash.sched; > + struct rte_sched_port_hierarchy *sched > + = (struct rte_sched_port_hierarchy *) &pkt->hash.sched; > > - sched->color = (uint32_t) color; > sched->subport = subport; > + sched->color = (uint32_t) color; > sched->pipe = pipe; > sched->traffic_class = traffic_class; > sched->queue = queue; > @@ -379,9 +386,12 @@ rte_sched_port_pkt_write(struct rte_mbuf *pkt, > * > */ > static inline void > -rte_sched_port_pkt_read_tree_path(struct rte_mbuf *pkt, uint32_t > *subport, uint32_t *pipe, uint32_t *traffic_class, uint32_t *queue) > +rte_sched_port_pkt_read_tree_path(struct rte_mbuf *pkt, uint32_t > *subport, > + uint32_t *pipe, uint32_t *traffic_class, > + uint32_t *queue) > { > - struct rte_sched_port_hierarchy *sched = (struct > rte_sched_port_hierarchy *) &pkt->hash.sched; > + struct rte_sched_port_hierarchy *sched > + = (struct rte_sched_port_hierarchy *) &pkt->hash.sched; > >
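The repartitioned tree path above can be sanity-checked with a standalone mock-up of the struct (field names copied from the diff; this is not the DPDK header itself). The three small bitfields pack into one uint16_t, so the whole path fits exactly in the 8 bytes made available by widening mbuf hash.sched to the two 32-bit words (sched.lo/sched.hi), and subport now holds all 4096 VLAN IDs with room to spare:

```c
#include <stdint.h>

/* Mock-up of the repartitioned rte_sched_port_hierarchy from the patch. */
struct sched_hierarchy {
    uint16_t queue:2;          /* Queue ID (0 .. 3) */
    uint16_t traffic_class:2;  /* Traffic class ID (0 .. 3) */
    uint16_t color:2;          /* Color */
    uint16_t unused:10;        /* pads the bitfields to a full uint16_t */
    uint16_t subport;          /* Subport ID, now a full 16 bits (>= 4096) */
    uint32_t pipe;             /* Pipe ID, now a full 32 bits */
};
```

The old layout squeezed subport into 6 bits (64 subports); with a full uint16_t, a subport per VLAN is no longer a limit, and dropping the 20-bit pipe bitfield for a plain uint32_t also avoids the bitfield extract/insert cost the commit message mentions.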
[dpdk-dev] [PATCH v2 1/7] rte_sched: make RED optional at runtime
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen > Hemminger > Sent: Thursday, February 5, 2015 6:05 AM > To: dev at dpdk.org > Cc: Stephen Hemminger > Subject: [dpdk-dev] [PATCH v2 1/7] rte_sched: make RED optional at runtime > > From: Stephen Hemminger > > Want to be able to build with RTE_SCHED_RED enabled but > allow disabling RED on a per-queue basis at runtime. > > RED is disabled unless min/max thresholds set. > > Signed-off-by: Stephen Hemminger > --- > lib/librte_sched/rte_sched.c | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c > index 95dee27..6928c98 100644 > --- a/lib/librte_sched/rte_sched.c > +++ b/lib/librte_sched/rte_sched.c > @@ -636,6 +636,12 @@ rte_sched_port_config(struct > rte_sched_port_params *params) > uint32_t j; > > for (j = 0; j < e_RTE_METER_COLORS; j++) { > + /* if min/max are both zero, then RED is disabled */ > + if ((params->red_params[i][j].min_th | > + params->red_params[i][j].max_th) == 0) { > + continue; > + } > + > if (rte_red_config_init(&port->red_config[i][j], > params->red_params[i][j].wq_log2, > params->red_params[i][j].min_th, > @@ -1069,6 +1075,9 @@ rte_sched_port_red_drop(struct rte_sched_port > *port, struct rte_mbuf *pkt, uint3 > color = rte_sched_port_pkt_read_color(pkt); > red_cfg = &port->red_config[tc_index][color]; > > + if ( (red_cfg->min_th | red_cfg->max_th) == 0) > + return 0; > + > qe = port->queue_extra + qindex; > red = &qe->red; > > -- > 2.1.4 Acked by: Cristian Dumitrescu I agree with this approach of leaving RED as a library option that can be enabled/disabled at build-time in order to avoid any performance penalty when RED is not required.
On the enqueue side (when RED is enabled): - Makes sense to avoid the RED processing when RED is not enabled for the current queue. On the dequeue side (when RED is enabled): - Not sure what's best when RED is not enabled for the current queue: either go ahead anyway and apply the timestamp, which is harmless (but might trigger a cache miss), or check first whether RED is enabled (which might also trigger a cache miss). Makes sense to go with the former option, which is what this patch implements.
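The convention this patch relies on, "min_th and max_th both zero means RED disabled for this queue/colour", can be captured in a tiny helper; `(a | b) == 0` tests both thresholds with a single compare and branch. The names below are illustrative, not taken from the library:

```c
#include <stdint.h>

/* Per-(traffic class, colour) RED thresholds, as in rte_red_params. */
struct red_thresholds {
    uint32_t min_th;
    uint32_t max_th;
};

/* RED is considered enabled only when at least one threshold is non-zero;
 * (min_th | max_th) == 0 is true exactly when both are zero. */
static int
red_enabled(const struct red_thresholds *t)
{
    return (t->min_th | t->max_th) != 0;
}
```

Using an all-zero config as the sentinel keeps the ABI unchanged: no new "enabled" flag is needed in the parameter structs, and an unconfigured (zeroed) entry naturally means "RED off".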
[dpdk-dev] [PATCH v2 6/7] rte_sched: eliminate floating point in calculating byte clock
Hi Stephen, Sorry, NACK. 1. Overflow issue As you declare cycles_per_byte as uint32_t, for a CPU frequency of 2-3 GHz, the line of code below results in overflow: port->cycles_per_byte = (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT) / params->rate; Therefore, there is most likely a significant accuracy loss, which might result in more packets allowed to go out than it should. 2. Integer division has a higher cost than floating point division My understanding is we are considering a performance improvement by replacing the double precision floating point division in: double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte; with an integer division: uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT) / port->cycles_per_byte; I don't think this is going to have the claimed benefit, as according to the "Intel 64 and IA-32 Architectures Optimization Reference Manual" (Appendix C), the latency of the integer division instruction is significantly bigger than the latency of the floating point division: Instruction FDIV double precision: latency = 38-40 cycles Instruction IDIV: latency = 56-80 cycles 3. Alternative I do hear your suggestion about replacing the floating point division with a more performant construction, though. One suggestion would be to replace it with an integer multiplication followed by a shift right, probably by using a uint64_t bytes_per_cycle_scaled_up (the inverse of cycles_per_byte). I need to prototype this code myself. Would you be OK to look into providing an alternative implementation? Thanks, Cristian -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Stephen Hemminger Sent: Thursday, February 5, 2015 6:14 AM To: dev at dpdk.org Cc: Stephen Hemminger Subject: [dpdk-dev] [PATCH v2 6/7] rte_sched: eliminate floating point in calculating byte clock From: Stephen Hemminger The old code was doing a floating point divide for each rte_dequeue() which is very expensive. Change to using fixed point scaled math instead. 
This improved performance from 5Gbit/sec to 10 Gbit/sec Signed-off-by: Stephen Hemminger --- lib/librte_sched/rte_sched.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c index 55fbc14..3023457 100644 --- a/lib/librte_sched/rte_sched.c +++ b/lib/librte_sched/rte_sched.c @@ -102,6 +102,9 @@ #define RTE_SCHED_BMP_POS_INVALID UINT32_MAX +/* For cycles_per_byte calculation */ +#define RTE_SCHED_TIME_SHIFT 20 + struct rte_sched_subport { /* Token bucket (TB) */ uint64_t tb_time; /* time of last update */ @@ -239,7 +242,7 @@ struct rte_sched_port { uint64_t time_cpu_cycles; /* Current CPU time measured in CPU cyles */ uint64_t time_cpu_bytes; /* Current CPU time measured in bytes */ uint64_t time;/* Current NIC TX time measured in bytes */ - double cycles_per_byte; /* CPU cycles per byte */ + uint32_t cycles_per_byte; /* CPU cycles per byte (scaled) */ /* Scheduling loop detection */ uint32_t pipe_loop; @@ -657,7 +660,9 @@ rte_sched_port_config(struct rte_sched_port_params *params) port->time_cpu_cycles = rte_get_tsc_cycles(); port->time_cpu_bytes = 0; port->time = 0; - port->cycles_per_byte = ((double) rte_get_tsc_hz()) / ((double) params->rate); + + port->cycles_per_byte = (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT) + / params->rate; /* Scheduling loop detection */ port->pipe_loop = RTE_SCHED_PIPE_INVALID; @@ -2156,11 +2161,12 @@ rte_sched_port_time_resync(struct rte_sched_port *port) { uint64_t cycles = rte_get_tsc_cycles(); uint64_t cycles_diff = cycles - port->time_cpu_cycles; - double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte; + uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT) + / port->cycles_per_byte; /* Advance port time */ port->time_cpu_cycles = cycles; - port->time_cpu_bytes += (uint64_t) bytes_diff; + port->time_cpu_bytes += bytes_diff; if (port->time < port->time_cpu_bytes) { port->time = port->time_cpu_bytes; } -- 2.1.4 -- Intel Shannon Limited 
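The multiply-and-shift alternative suggested above (precomputing a scaled inverse of cycles_per_byte) could look roughly like this. The 20-bit scale factor mirrors RTE_SCHED_TIME_SHIFT from the patch, and the function names are hypothetical, not DPDK API:

```c
#include <stdint.h>

#define TIME_SHIFT 20   /* same scaling as RTE_SCHED_TIME_SHIFT in the patch */

/* Computed once at port-config time: scaled bytes-per-cycle, i.e. the
 * inverse of cycles_per_byte. rate << 20 fits comfortably in 64 bits
 * even for a 100 Gbit/s byte rate. */
static uint64_t
bytes_per_cycle_scaled(uint64_t tsc_hz, uint64_t rate_bytes_per_sec)
{
    return (rate_bytes_per_sec << TIME_SHIFT) / tsc_hz;
}

/* Hot path: one multiply and one shift per resync, no divide at all.
 * cycles_diff * bpc_scaled must stay below 2^64; with typical rate/TSC
 * ratios that leaves well over 2^40 cycles of headroom between resyncs. */
static uint64_t
cycles_to_bytes(uint64_t cycles_diff, uint64_t bpc_scaled)
{
    return (cycles_diff * bpc_scaled) >> TIME_SHIFT;
}
```

This keeps 64-bit precision in the precomputed constant (avoiding the uint32_t overflow called out in point 1) while replacing both the FDIV of the original code and the IDIV of the patch with a single IMUL, which is far cheaper than either.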
[dpdk-dev] Explanation of the QoS offset values used in the QoS scheduler example app.
Hi, These are byte offsets used for reading these packet fields, considering that packet bytes are stored in memory in network order, while the CPU is little endian, so byte swapping takes place on read. This is probably not the best way to write this code, and I agree this portion of the app code is a bit more cryptic than it should be. Using data structures to describe the header format for the input packet (Ethernet/SVLAN/CVLAN/IPv4) and using portable byte swapping macros is probably a better alternative. This being said, the code implementation, code comments and Sample App Guide description seem to be consistent and correct. Regards, Cristian -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Aws Ismail Sent: Saturday, February 14, 2015 7:35 PM To: dev at dpdk.org Subject: [dpdk-dev] Explanation of the QoS offset values used in the QoS scheduler example app. Hi everyone, I am looking at this portion of the code in the app_thread.c file of the QoS scheduler example application: /* * QoS parameters are encoded as follows: * Outer VLAN ID defines subport * Inner VLAN ID defines pipe * Destination IP 0.0.XXX.0 defines traffic class * Destination IP host (0.0.0.XXX) defines queue * Values below define offset to each field from start of frame */ #define SUBPORT_OFFSET 7 #define PIPE_OFFSET 9 #define TC_OFFSET 20 #define QUEUE_OFFSET 20 #define COLOR_OFFSET 19 static inline int get_pkt_sched(struct rte_mbuf *m, uint32_t *subport, uint32_t *pipe, uint32_t *traffic_class, uint32_t *queue, uint32_t *color) { uint16_t *pdata = rte_pktmbuf_mtod(m, uint16_t *); *subport = (rte_be_to_cpu_16(pdata[SUBPORT_OFFSET]) & 0x0FFF) & (port_params.n_subports_per_port - 1); /* Outer VLAN ID*/ *pipe = (rte_be_to_cpu_16(pdata[PIPE_OFFSET]) & 0x0FFF) & (port_params.n_pipes_per_subport - 1); /* Inner VLAN ID */ *traffic_class = (pdata[QUEUE_OFFSET] & 0x0F) & (RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE - 1); /* Destination IP */ *queue = ((pdata[QUEUE_OFFSET] >> 8) & 0x0F) & 
(RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS - 1) ; /* Destination IP */ *color = pdata[COLOR_OFFSET] & 0x03; /* Destination IP */ return 0; } The offset values do not make sense to me. According to the programmer guide, the queue selection is SVID/CVID/TC/QID based. And those offset seem off in this case. Is this because it is assuming that the packet is being altered before it gets to this stage ? Can anyone provide a better explanation or at least the reason behind choosing those offset values shown above. Thanks.
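For the record, the offsets stop looking arbitrary once walked through the double-tagged frame layout the example assumes (Ethernet/SVLAN/CVLAN/IPv4). A sketch deriving them; the byte positions are halved because pdata is a uint16_t pointer, so each index addresses a 2-byte word:

```c
/* Byte layout of the expected frame:
 * dst MAC (6) + src MAC (6) + outer TPID (2) + outer TCI (2)
 * + inner TPID (2) + inner TCI (2) + EtherType (2) + IPv4 header,
 * with the destination address at bytes 16..19 of the IPv4 header. */
enum {
    ETH_ADDR_LEN   = 6,
    TPID_LEN       = 2,
    TCI_LEN        = 2,
    ETHERTYPE_LEN  = 2,
    IPV4_DST_OFS   = 16,

    OUTER_TCI_BYTE = 2 * ETH_ADDR_LEN + TPID_LEN,              /* 14 */
    INNER_TCI_BYTE = OUTER_TCI_BYTE + TCI_LEN + TPID_LEN,      /* 18 */
    IPV4_HDR_BYTE  = INNER_TCI_BYTE + TCI_LEN + ETHERTYPE_LEN, /* 22 */
    DST_IP_BYTE    = IPV4_HDR_BYTE + IPV4_DST_OFS,             /* 38 */
};
```

Dividing by two gives exactly the constants in app_thread.c: the outer TCI word is index 7 (SUBPORT_OFFSET), the inner TCI word index 9 (PIPE_OFFSET), the first two destination-IP octets index 19 (COLOR_OFFSET), and the last two octets index 20 (TC_OFFSET/QUEUE_OFFSET). So the offsets assume an unmodified QinQ+IPv4 frame, not one altered upstream.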
[dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when read
Hi Stephen, What is the reason not to clear statistics on read? Do you have a use-case / justification for it? (BTW, I see you added the reset functions, but was it also your intention to remove the memset to 0 from the stats read functions? :) ) Regards, Cristian -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Stephen Hemminger Sent: Thursday, February 5, 2015 6:14 AM To: dev at dpdk.org Cc: Stephen Hemminger Subject: [dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when read From: Stephen Hemminger Make rte_sched statistics API work like the ethernet statistics API. Don't auto-clear statistics. Signed-off-by: Stephen Hemminger --- lib/librte_sched/rte_sched.c | 30 ++ lib/librte_sched/rte_sched.h | 29 + 2 files changed, 59 insertions(+) diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c index 8cb8bf1..d891e50 100644 --- a/lib/librte_sched/rte_sched.c +++ b/lib/librte_sched/rte_sched.c @@ -935,6 +935,21 @@ rte_sched_subport_read_stats(struct rte_sched_port *port, } int +rte_sched_subport_stats_reset(struct rte_sched_port *port, + uint32_t subport_id) +{ + struct rte_sched_subport *s; + + /* Check user parameters */ + if (port == NULL || subport_id >= port->n_subports_per_port) + return -1; + + s = port->subport + subport_id; + memset(&s->stats, 0, sizeof(struct rte_sched_subport_stats)); + return 0; +} + +int rte_sched_queue_read_stats(struct rte_sched_port *port, uint32_t queue_id, struct rte_sched_queue_stats *stats, @@ -963,6 +978,21 @@ rte_sched_queue_read_stats(struct rte_sched_port *port, return 0; } +int +rte_sched_queue_stats_reset(struct rte_sched_port *port, + uint32_t queue_id) +{ + struct rte_sched_queue_extra *qe; + + /* Check user parameters */ + if (port == NULL || queue_id >= rte_sched_port_queues_per_port(port)) + return -1; + + qe = port->queue_extra + queue_id; + memset(&qe->stats, 0, sizeof(struct rte_sched_queue_stats)); + return 0; +} + static inline uint32_t rte_sched_port_qindex(struct 
rte_sched_port *port, uint32_t subport, uint32_t pipe, uint32_t traffic_class, uint32_t queue) { diff --git a/lib/librte_sched/rte_sched.h b/lib/librte_sched/rte_sched.h index e9bf18a..3d007e4 100644 --- a/lib/librte_sched/rte_sched.h +++ b/lib/librte_sched/rte_sched.h @@ -317,6 +317,21 @@ rte_sched_subport_read_stats(struct rte_sched_port *port, struct rte_sched_subport_stats *stats, uint32_t *tc_ov); + +/** + * Hierarchical scheduler subport statistics reset + * + * @param port + * Handle to port scheduler instance + * @param subport_id + * Subport ID + * @return + * 0 upon success, error code otherwise + */ +int +rte_sched_subport_stats_reset(struct rte_sched_port *port, + uint32_t subport_id); + /** * Hierarchical scheduler queue statistics read * @@ -338,6 +353,20 @@ rte_sched_queue_read_stats(struct rte_sched_port *port, struct rte_sched_queue_stats *stats, uint16_t *qlen); +/** + * Hierarchical scheduler queue statistics reset + * + * @param port + * Handle to port scheduler instance + * @param queue_id + * Queue ID within port scheduler + * @return + * 0 upon success, error code otherwise + */ +int +rte_sched_queue_stats_reset(struct rte_sched_port *port, + uint32_t queue_id); + /* * Run-time * -- 2.1.4
[dpdk-dev] [PATCH] MAINTAINERS: claim metering, sched and pkt framework
Thank you, Thomas! We are working on some enhancements on librte_cfgfile for release 2.1, so in order to avoid unnecessary code churn, it is probably better to have the librte_cfgfile changes done first, then have a subsequent patch on qos_sched. Regards, Cristian -Original Message- From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] Sent: Monday, February 9, 2015 3:15 PM To: Dumitrescu, Cristian Cc: dev at dpdk.org; Gonzalez Monroy, Sergio Subject: Re: [dpdk-dev] [PATCH] MAINTAINERS: claim metering, sched and pkt framework 2015-02-06 13:13, Gonzalez Monroy, Sergio: > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cristian Dumitrescu > > Sent: Wednesday, February 4, 2015 3:53 PM > > To: dev at dpdk.org > > Subject: [dpdk-dev] [PATCH] MAINTAINERS: claim metering, sched and pkt > > framework > > > > As original author of these DPDK components, I am volunteering to maintain > > them going forward: > > - Traffic Metering > > - Hierarchical Scheduler > > - Packet Framework > > - Configuration File > > > > Signed-off-by: Cristian Dumitrescu > > Acked-by: Sergio Gonzalez Monroy Acked-by: Thomas Monjalon Applied, thanks About cfgfile, we are still waiting for the cleanup in qos_sched example: http://dpdk.org/ml/archives/dev/2014-October/006774.html Do you have news?
[dpdk-dev] [PATCH] examples: fix unchecked malloc return value in ip_pipeline
Acked by: -Original Message- From: Richardson, Bruce Sent: Friday, December 12, 2014 12:24 PM To: dev at dpdk.org; Dumitrescu, Cristian Cc: Richardson, Bruce Subject: [PATCH] examples: fix unchecked malloc return value in ip_pipeline Static analysis shows that once instance of rte_zmalloc is missing a return value check in the code. This is fixed by adding a return value check. The malloc call itself is moved to earlier in the function so that no work is done unless all memory allocation requests have succeeded - thereby removing the need for rollback on error. Signed-off-by: Bruce Richardson --- examples/ip_pipeline/cmdline.c | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/examples/ip_pipeline/cmdline.c b/examples/ip_pipeline/cmdline.c index 13d565e..152acb5 100644 --- a/examples/ip_pipeline/cmdline.c +++ b/examples/ip_pipeline/cmdline.c @@ -1093,7 +1093,7 @@ cmd_firewall_add_parsed( __attribute__((unused)) void *data) { struct cmd_firewall_add_result *params = parsed_result; - struct app_rule rule, *old_rule; + struct app_rule rule, *old_rule, *new_rule = NULL; struct rte_mbuf *msg; struct app_msg_req *req; struct app_msg_resp *resp; @@ -1148,6 +1148,18 @@ cmd_firewall_add_parsed( if (msg == NULL) rte_panic("Unable to allocate new message\n"); + /* if we need a new rule structure, allocate it before we go further */ + if (old_rule == NULL) { + new_rule = rte_zmalloc_socket("CLI", sizeof(struct app_rule), + RTE_CACHE_LINE_SIZE, rte_socket_id()); + if (new_rule == NULL) { + printf("Cannot allocate memory for new rule\n"); + rte_ctrlmbuf_free(msg); + return; + } + } + + /* Fill request message */ req = (struct app_msg_req *)rte_ctrlmbuf_data(msg); req->type = APP_MSG_REQ_FW_ADD; @@ -1190,12 +1202,6 @@ cmd_firewall_add_parsed( printf("Request FIREWALL_ADD failed (%u)\n", resp->result); else { if (old_rule == NULL) { - struct app_rule *new_rule = (struct app_rule *) - rte_zmalloc_socket("CLI", - sizeof(struct app_rule), - RTE_CACHE_LINE_SIZE, - 
rte_socket_id()); - memcpy(new_rule, &rule, sizeof(rule)); TAILQ_INSERT_TAIL(_table, new_rule, entries); n_firewall_rules++; -- 1.9.3 -- Intel Shannon Limited Registered in Ireland Registered Office: Collinstown Industrial Park, Leixlip, County Kildare Registered Number: 308263 Business address: Dromore House, East Park, Shannon, Co. Clare This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
[dpdk-dev] [PATCH] table: fix table_array for incomplete bitmask
Acked by: -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson Sent: Thursday, December 4, 2014 2:24 PM To: dev at dpdk.org Subject: [dpdk-dev] [PATCH] table: fix table_array for incomplete bitmask When a lookup was done on a table_array structure with an incomplete bitmask, the result was always zero hits. This was because the pkts_mask value was cleared as we process each entry, and the result was assigned at the end of the loop, when pkts_mask was zero. Changing the assignment to occur at the start, before the pkts_mask gets cleared, fixes this issue. Signed-off-by: Bruce Richardson --- lib/librte_table/rte_table_array.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/librte_table/rte_table_array.c b/lib/librte_table/rte_table_array.c index 0b1d42a..4d3c05e 100644 --- a/lib/librte_table/rte_table_array.c +++ b/lib/librte_table/rte_table_array.c @@ -164,6 +164,8 @@ rte_table_array_lookup( { struct rte_table_array *t = (struct rte_table_array *) table; + *lookup_hit_mask = pkts_mask; + if ((pkts_mask & (pkts_mask + 1)) == 0) { uint64_t n_pkts = __builtin_popcountll(pkts_mask); uint32_t i; @@ -190,8 +192,6 @@ rte_table_array_lookup( } } - *lookup_hit_mask = pkts_mask; - return 0; } -- 1.9.3
[dpdk-dev] Performance impact with QoS
Hi Satish, The QoS traffic manager has a large memory footprint due to the large number of packet queues (e.g. 64K queues of 64 packets each) and large tables (e.g. 4K pipes with one cache line of context per pipe) that far exceeds the amount of CPU cache physically available. There are a lot of data structures that need to be brought into the L1 cache of the traffic manager core in order to take the scheduling decision: bitmap, pipe table entry, queue read/write pointers, queue elements, packet metadata (mbuf), etc. To minimize the penalties associated with the CPU pipeline stalling due to memory accesses, all these data structures are prefetched. So, the point I am trying to make is there are a lot of critical CPU resources involved: size of L1/L2 cache (per CPU core), size of L3 cache (shared by all CPU cores), bandwidth of L1/L2 cache (per core), bandwidth of L3 cache (shared by all CPU cores), number of outstanding prefetches (per CPU core), etc. If you map the QoS traffic manager on the same core with packet I/O (i.e. Poll Mode Driver RX/TX), my guess is these two I/O intensive workloads will both compete for the CPU resources listed above and will also impact each other by thrashing each other's data structures in and out of L1/L2 cache. If you split them on different CPU cores, their operation is more performant and more predictable, as each one has its own L1/L2 cache. Did you try a CPU core chaining setup (through rte_rings) similar to the qos_sched application, like: RX -> (TM enqueue & dequeue) -> TX or RX -> (TM enqueue & TM dequeue & TX)? I am sure you will find the right setup for you by conducting similar experiments. Of course, the result also depends on which other workloads your application is performing. 
Regards, Cristian From: satish [mailto:nsatishb...@gmail.com] Sent: Monday, November 17, 2014 6:03 AM To: dev at dpdk.org Cc: Dumitrescu, Cristian Subject: Re: Performance impact with QoS Hi All, Can someone please provide comments on queries in below mail? Regards, Satish Babu On Mon, Nov 10, 2014 at 4:24 PM, satish mailto:nsatishbabu at gmail.com>> wrote: Hi, I need comments on performance impact with DPDK-QoS. We are working on developing an application based on DPDK. Our application supports IPv4 forwarding with and without QoS. Without QoS, we are achieving almost full wire rate (bi-directional traffic) with 128, 256 and 512 byte packets. But when we enabled QoS, performance dropped to half for 128 and 256 byte packets. For 512 byte packets, we didn't observe any drop even after enabling QoS (achieving full wire rate). Traffic used in both the cases is the same. (One stream with QoS match to the first queue in traffic class 0.) In our application, we are using memory buffer pools to receive the packet bursts (ring buffer is not used). The same buffer is used during packet processing and TX (enqueue and dequeue). All of the above is handled on the same core. For normal forwarding (without QoS), we are using rte_eth_tx_burst for TX. For forwarding with QoS, we are using rte_sched_port_pkt_write(), rte_sched_port_enqueue() and rte_sched_port_dequeue() before rte_eth_tx_burst(). We understood that the performance dip in the case of 128 and 256 byte packets is because of processing a larger number of packets compared to 512 byte packets. Can someone comment on the performance dip in my case with QoS enabled? [1] Can this be because of inefficient use of RTE calls for QoS? [2] Is it the poor buffer management? [3] Any other comments? To achieve good performance in the QoS case, is it a must to use a worker thread (running on a different core) with a ring buffer? Please provide your comments. Thanks in advance. 
Regards, Satish Babu
[dpdk-dev] Max throughput Using QOS Scheduler
Hi Srikanth, >>Is there any difference between scheduler behavior for above two scenarios >>while enqueuing and de-queueing ?? All the pipe queues share the bandwidth allocated to their pipe. The distribution of available pipe bandwidth between the pipe queues is governed by features like traffic class strict priority, bandwidth sharing between pipe traffic classes, weights of the queues within the same traffic class, etc. In the case you mention, you are just using one queue for each traffic class. Let's take an example: -Configuration: pipe rate = 10 Mbps, pipe traffic class 0 .. 3 rates = [20% of pipe rate = 2 Mbps, 30% of pipe rate = 3 Mbps, 40% of pipe rate = 4 Mbps, 100% of pipe rate = 10 Mbps]. Convention is that traffic class 0 is the highest priority. -Injected traffic per traffic class for this pipe: [3, 0, 0, 0] Mbps => Output traffic per traffic class for this pipe: [2, 0, 0, 0] Mbps -Injected traffic per traffic class for this pipe: [0, 0, 0, 15] Mbps => Output traffic per traffic class for this pipe: [0, 0, 0, 10] Mbps -Injected traffic per traffic class for this pipe: [1, 10, 2, 15] Mbps => Output traffic per traffic class for this pipe: [1, 3, 2, 4] Mbps Makes sense? >>Queue size is 64 , and number of packets enqueued and dequeued is 64 as well. I strongly recommend you never use a dequeue burst size that is equal to the enqueue burst size, as performance will be bad. In the qos_sched sample app, we use [enqueue burst size, dequeue burst size] set to [64, 32]; other reasonable values could be [64, 48], [32, 16], etc. An enqueue burst bigger than the dequeue burst will cause the big packet reservoir which is the traffic manager/port scheduler to fill up to a reasonable level that will allow dequeue to function optimally, and then the system regulates itself. The reason is: since we interlace enqueue and dequeue calls, if you push on every iteration e.g. 
64 packets in and then look to get 64 packets out, you'll only have 64 packets into the queues, then you'll work hard to find them, and you get out exactly those 64 packets that you pushed in. >>And what improvements would I gain if I move to DPDK 1.7 w.r.t QoS ? The QoS code is pretty stable since release 1.4, not many improvements added (maybe it's the right time to revisit this feature and push it to the next level?), but there are improvements in other DPDK libraries that are dependencies for QoS (e.g. packet Rx/Tx). Hope this helps. Regards, Cristian From: Srikanth Akula [mailto:srikanth...@gmail.com] Sent: Thursday, October 30, 2014 4:10 PM To: dev at dpdk.org; Dumitrescu, Cristian Subject: Max throughput Using QOS Scheduler Hello All , I am currently trying to implement the QoS scheduler using DPDK 1.6. I have configured 1 subport, 4096 pipes for the subport, and 4 TCs and 4 queues. Currently I am trying to send packets destined to a single queue of the available 16 queues of one of the pipes. Could somebody explain what throughput we can achieve using this scheme? The reason for asking this is, I could sense different behavior each time I send traffic destined to different destination queues. For example: 1. << Only one stream>>> Stream destined Q0 of TC0 .. 2. << 4 streams >>>> 1st stream destined for Q3 of TC3 ... 2nd stream destined for Q2 of TC2 3rd stream destined for Q1 of TC1 4th stream destined for Q0 of TC0 Is there any difference between scheduler behavior for the above two scenarios while enqueuing and de-queueing? Queue size is 64, and the number of packets enqueued and dequeued is 64 as well. And what improvements would I gain if I move to DPDK 1.7 w.r.t QoS? Could you please clarify my queries ? 
Thanks & Regards, Srikanth
[dpdk-dev] Dynamic port/pipe QoS configuration
Hi Satish, Yes, you can change the pipe configuration during run-time, but in a slightly different way. The way to do it is by defining multiple pipe profiles at the port level initialization time and reassigning the pipe to a different level at run-time. The pipe profiles are statically defined per port at initialization time (their number is configurable through rte_sched_port_params:: n_pipe_profiles parameter), so you need to know in advance the set of profiles you want to use at run-time. You can assign pipe X to profile A initially, then re-assign it to profile B later on (by using function rte_sched_pipe_config()). Regards, Cristian From: satish mailto:nsatishb...@gmail.com>> Date: October 17, 2014 at 2:59:35 PM PDT To: mailto:dev at dpdk.org>> Subject: Re: [dpdk-dev] Dynamic port/pipe QoS configuration Hi, Can someone please respond to below? Thank you. On Mon, Oct 13, 2014 at 3:54 PM, satish mailto:nsatishbabu at gmail.com>> wrote: Hi, We are trying to provide QoS support for one of our clients using rte_sched. In our implementation we are treating each pipe as a customer. So, we can have maximum of 4096 customers per sub-port. Customers(pipe) can be added, deleted or modified dynamically. Each customer can have different profiles. Currently we are using DPDK-v1.6. Can I modify pipe profile during run time using rte_sched_pipe_config ()? Our plan is to have initial configs as below (similar to examples in DPDK) [1] Specify port params at the initialization of port as below static struct rte_sched_port_params port_param = { : : .n_subports_per_port = 1, .n_pipes_per_subport = 4096, .qsize = {64, 64, 64, 64}, .pipe_profiles = pipe_profile, .n_pipe_profiles = 1, } [2] static struct rte_sched_subport_params subport_param[] = { { .tb_rate = Link speed (1G/10G..) 
divided by 8 (bits), .tb_size = 100, .tc_rate = {Same as tb_rate, Same as tb_rate, Same as tb_rate, Same as tb_rate}, .tc_period = 10, }, }; [3] static struct rte_sched_pipe_params pipe_profile[] = { { /* Profile #0 */ .tb_rate = Link speed (1G/10G..) divided by 8 (bits)/4096 (maximum number of pipes), .tb_size = 100, .tc_rate = {pipe's tb_rate, pipe's tb_rate, pipe's tb_rate, pipe's tb_rate}, .tc_period = 40, .wrr_weights = {16, 4, 2, 1, 16, 4, 2, 1, 16, 4, 2, 1, 16, 4, 2, 1}, }, }; Our plan here is to initialize the pipe with the default profile and modify each pipe based on user configurations. My questions are [a] Can I modify the pipe profile during run time using rte_sched_pipe_config()? (question repeated) If I can modify at pipe level, [b] Can we have different profiles for pipes, with one default profile at initialization? [c] Can we modify port level params without deleting the port using rte_sched_port_config()? Please provide your valuable comments. Thanks in advance. -- Regards, Satish Babu
[dpdk-dev] [v2 20/23] librte_cfgfile: interpret config files
Hi Thomas, Yes, you're right, we need to close on this pending item. Thanks for bringing it up. I am currently working on a patch series; once I send it out I will come back and look into the qos_sched duplication. Is this OK with you? Regards, Cristian -Original Message- From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] Sent: Thursday, October 16, 2014 5:46 PM To: Dumitrescu, Cristian Cc: dev at dpdk.org; Wu, Jingjing; Liu, Jijiang Subject: Re: [dpdk-dev] [v2 20/23] librte_cfgfile: interpret config files Hi Cristian, 2014-06-04 19:08, Cristian Dumitrescu: > This library provides a tool to interpret config files that have standard > structure. > > It is used by the Packet Framework examples/ip_pipeline sample application. > > It originates from examples/qos_sched sample application and now it makes > this code available as a library for other sample applications to use. > The code duplication with qos_sched sample app to be addressed later. 4 months ago, you said that this duplication will be addressed later. Neither you nor anyone at Intel submitted a patch to clean that up. I just want to be sure that "later" doesn't mean "never" because I'm accepting another "later" word for cleaning the old filtering API. Maybe you just forgot it, so please prove to me that I'm right to accept "later" clean-up, in general. Thanks -- Thomas
[dpdk-dev] [PATCH 07/13] mbuf: use macros only to access the mbuf metadata
Hi Olivier, I agree that your suggested approach for application-dependent metadata makes sense, in fact the two approaches work in exactly the same way (packet metadata immediately after the regular mbuf), there is only a subtle difference, which is related to defining consistent DPDK usage guidelines. 1. Advertising the presence of application-dependent meta-data as supported mechanism If we explicitly have a metadata zero-size field at the end of the mbuf, we basically tell people that adding their own application meta-data at the end of the mandatory meta-data (mbuf structure) is a mechanism that DPDK allows and supports, and will continue to do so for the foreseeable future. In other words, we guarantee that an application doing so will continue to build successfully with future releases of DPDK, and we will not introduce changes in DPDK that could potentially break this mechanism. It is also a hint to people of where to put their application dependent meta-data. 2. Defining a standard base address for the application-dependent metadata - There are also libraries in DPDK that work with application dependent meta-data, currently these are the Packet Framework libraries: librte_port, librte_table, librte_pipeline. Of course, the library does not have the knowledge of the application dependent meta-data format, so they treat it as opaque array of bytes, with the offset and size of the array given as arguments. In my opinion, it is safer (and more elegant) if these libraries (and others) can rely on an mbuf API to access the application dependent meta-data (in an opaque way) rather than make an assumption about the mbuf (i.e. the location of custom metadata relative to the mbuf) that is not clearly supported/defined by the mbuf library. 
- By having this API, we basically say: we define the custom meta-data base address (first location where custom metadata _could_ be placed) immediately after the mbuf, so libraries and apps accessing custom meta-data should do so by using a relative offset from this base rather than each application defining its own base: immediately after mbuf, or 128 bytes after mbuf, or 64 bytes before the end of the buffer, or other. More (minor) comments inline below. Thanks, Cristian -Original Message- From: Olivier MATZ [mailto:olivier.m...@6wind.com] Sent: Friday, September 12, 2014 10:02 PM To: Dumitrescu, Cristian; Richardson, Bruce; dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH 07/13] mbuf: use macros only to access the mbuf metadata Hello Cristian, > What is the reason to remove this field? Please explain the > rationale of removing this field. The rationale is explained in http://dpdk.org/ml/archives/dev/2014-September/005232.html "The format of the metadata is up to the application". The type of data the application stores after the mbuf has not to be defined in the mbuf. These macros limits the types of metadata to uint8, uint16, uint32, uint64? What should I do if I need a void*, a struct foo ? Should we add a macro for each possible type? [Cristian] Actually, this is not correct, as macros to access metadata through pointers (to void or scalar types) are provided as well. This pointer can be converted by the application to the format is defines. > We previously agreed we need to provide an easy and standard > mechanism for applications to extend the mandatory per buffer > metadata (mbuf) with optional application-dependent > metadata. Defining a structure in the application which does not pollute the rte_mbuf structure is "easy and standard(TM)" too. [Cristian] I agree, both approaches work the same really, it is just the difference in advertising the presence of meta-data as supported mechanism and defining a standard base address for it. 
> This field just provides a clean way for the apps to > know where is the end of the mandatory metadata, i.e. the first > location in the packet buffer where the app can add its own > metadata (of course, the app has to manage the headroom space > before the first byte of packet data). A zero-size field is the > standard mechanism that DPDK uses extensively in pretty much > every library to access memory immediately after a header > structure. Having the following is clean too: struct metadata { ... }; struct app_mbuf { struct rte_mbuf mbuf; struct metadata metadata; }; There is no need to define anything in the mbuf structure. [Cristian] I agree, both approaches work the same really, it is just the difference in advertising the presence of meta-data as supported mechanism and defining a standard base address for it. > > The impact of removing this field is that there is no standard > way to identify where the end of the mandatory metadata is, so > each application will have to reinvent this. With no clear > convention, we will end up with a lot of non-standard ways. Eve
[dpdk-dev] [PATCH 07/13] mbuf: use macros only to access the mbuf metadata
Bruce, Olivier, What is the reason to remove this field? Please explain the rationale of removing this field. We previously agreed we need to provide an easy and standard mechanism for applications to extend the mandatory per-buffer metadata (mbuf) with optional application-dependent metadata. This field just provides a clean way for the apps to know where the end of the mandatory metadata is, i.e. the first location in the packet buffer where the app can add its own metadata (of course, the app has to manage the headroom space before the first byte of packet data). A zero-size field is the standard mechanism that DPDK uses extensively in pretty much every library to access memory immediately after a header structure. The impact of removing this field is that there is no standard way to identify where the end of the mandatory metadata is, so each application will have to reinvent this. With no clear convention, we will end up with a lot of non-standard ways. Every time the format of the mbuf structure is changed, this can potentially break applications that use custom metadata, while using this simple standard mechanism would prevent this. So why remove this? Having applications define their optional meta-data is a real need. Please take a look at the emerging IETF Service Chaining protocols (https://datatracker.ietf.org/wg/sfc/documents/), which provide standard mechanisms for applications to define their own packet meta-data and share it between the elements of the processing pipeline (for Service Chaining, these are typically virtual machines scattered amongst the data center). And, in my opinion, there is no negative impact/cost associated with keeping this field. 
Regards, Cristian -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Richardson, Bruce Sent: Tuesday, September 9, 2014 10:01 AM To: Olivier MATZ; dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH 07/13] mbuf: use macros only to access the mbuf metadata > -Original Message- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Monday, September 08, 2014 1:06 PM > To: Richardson, Bruce; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH 07/13] mbuf: use macros only to access the > mbuf metadata > > Hi Bruce, > > On 09/03/2014 05:49 PM, Bruce Richardson wrote: > > Removed the explicit zero-sized metadata definition at the end of the > > mbuf data structure. Updated the metadata macros to take account of this > > change so that all existing code which uses those macros still works. > > > > Signed-off-by: Bruce Richardson > > --- > > lib/librte_mbuf/rte_mbuf.h | 22 -- > > 1 file changed, 8 insertions(+), 14 deletions(-) > > > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > > index 5260001..ca66d9a 100644 > > --- a/lib/librte_mbuf/rte_mbuf.h > > +++ b/lib/librte_mbuf/rte_mbuf.h > > @@ -166,31 +166,25 @@ struct rte_mbuf { > > struct rte_mempool *pool; /**< Pool from which mbuf was allocated. > */ > > struct rte_mbuf *next;/**< Next segment of scattered packet. 
*/ > > - union { > > - uint8_t metadata[0]; > > - uint16_t metadata16[0]; > > - uint32_t metadata32[0]; > > - uint64_t metadata64[0]; > > - } __rte_cache_aligned; > > } __rte_cache_aligned; > > > > #define RTE_MBUF_METADATA_UINT8(mbuf, offset) \ > > - (mbuf->metadata[offset]) > > + (((uint8_t *)&(mbuf)[1])[offset]) > > #define RTE_MBUF_METADATA_UINT16(mbuf, offset) \ > > - (mbuf->metadata16[offset/sizeof(uint16_t)]) > > + (((uint16_t *)&(mbuf)[1])[offset/sizeof(uint16_t)]) > > #define RTE_MBUF_METADATA_UINT32(mbuf, offset) \ > > - (mbuf->metadata32[offset/sizeof(uint32_t)]) > > + (((uint32_t *)&(mbuf)[1])[offset/sizeof(uint32_t)]) > > #define RTE_MBUF_METADATA_UINT64(mbuf, offset) \ > > - (mbuf->metadata64[offset/sizeof(uint64_t)]) > > + (((uint64_t *)&(mbuf)[1])[offset/sizeof(uint64_t)]) > > > > #define RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset) \ > > - (&mbuf->metadata[offset]) > > + (&RTE_MBUF_METADATA_UINT8(mbuf, offset)) > > #define RTE_MBUF_METADATA_UINT16_PTR(mbuf, offset) \ > > - (&mbuf->metadata16[offset/sizeof(uint16_t)]) > > + (&RTE_MBUF_METADATA_UINT16(mbuf, offset)) > > #define RTE_MBUF_METADATA_UINT32_PTR(mbuf, offset) \ > > - (&mbuf->metadata32[offset/sizeof(uint32_t)]) > > + (&RTE_MBUF_METADATA_UINT32(mbuf, offset)) > > #define RTE_MBUF_METADATA_UINT64_PTR(mbuf, offset) \ > > - (&mbuf->metadata64[offset/sizeof(uint64_t)]) > > + (&RTE_MBUF_METADATA_UINT64(mbuf, offset)) > > > > /** > > * Given the buf_addr returns the pointer to corresponding mbuf. > > > > I think it goes in the good direction. So: > Acked-by: Olivier Matz > > Just one question: why not removing RTE_MBUF_METADATA*() macros? > I'd just provide one macro that gives a (void*) to the first byte > after the mbuf structure. > > The
[dpdk-dev] [PATCH 0/4] lib/librte_table: Fix bugs occuring in corner cases
Acked-by: Cristian Dumitrescu -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Balazs Nemeth Sent: Thursday, September 11, 2014 6:47 PM To: dev at dpdk.org Cc: Nemeth, Balazs Subject: [dpdk-dev] [PATCH 0/4] lib/librte_table: Fix bugs occuring in corner cases This set of patches fixes bugs in the packet framework. Some of the bugs occur in corner cases (i.e. when a lookup is performed on a few packets or when buckets are in extended states) while others can cause memory to be accessed beyond what is reserved during initialization time. Balazs Nemeth (4): lib/librte_table: Fix empty bucket removal during entry deletion in rte_table_hash_ext lib/librte_table: Fix checking extended buckets in unoptimized case lib/librte_table: Fix incorrect t->data_size_shl initialization lib/librte_table: Fix pointer calculations at initialization lib/librte_table/rte_table_hash_ext.c | 13 ++--- lib/librte_table/rte_table_hash_key16.c | 4 ++-- lib/librte_table/rte_table_hash_key32.c | 4 ++-- lib/librte_table/rte_table_hash_key8.c | 8 lib/librte_table/rte_table_hash_lru.c | 7 +++ 5 files changed, 17 insertions(+), 19 deletions(-) -- 1.9.1 Intel Corporation NV/SA Kings Square, Veldkant 31 2550 Kontich RPM (Bruxelles) 0415.497.718. Citibank, Brussels, account 570/1031255/09 This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
[dpdk-dev] [PATCH] examples/qos_sched: fix flow pause after 2M packets
Acked-by: Cristian Dumitrescu -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Yong Liu Sent: Thursday, July 3, 2014 10:00 AM To: dev at dpdk.org Subject: [dpdk-dev] [PATCH] examples/qos_sched: fix flow pause after 2M packets After enabling the vector PMD, qos_sched only sends 32 packets every burst. That will cause some packets to not be transmitted and therefore the mempool will be drained after a while. App qos_sched now will re-send the packets which failed to send out in the previous tx function. Signed-off-by: Yong Liu Acked-by: Cristian Dumitrescu Tested-by: Waterman Cao --- examples/qos_sched/app_thread.c | 16 +--- 1 files changed, 5 insertions(+), 11 deletions(-) diff --git a/examples/qos_sched/app_thread.c b/examples/qos_sched/app_thread.c index 7501147..59c4014 100755 --- a/examples/qos_sched/app_thread.c +++ b/examples/qos_sched/app_thread.c @@ -139,17 +139,11 @@ app_send_burst(struct thread_conf *qconf) do { ret = rte_eth_tx_burst(qconf->tx_port, qconf->tx_queue, mbufs, (uint16_t)n); - if (unlikely(ret < n)) { /* we cannot drop the packets, so re-send */ - /* update number of packets to be sent */ - n -= ret; - mbufs = (struct rte_mbuf **)&mbufs[ret]; - /* limit number of retries to avoid endless loop */ - /* reset retry counter if some packets were sent */ - if (likely(ret != 0)) { - continue; - } - } - } while (ret != n); + /* we cannot drop the packets, so re-send */ + /* update number of packets to be sent */ + n -= ret; + mbufs = (struct rte_mbuf **)&mbufs[ret]; + } while (n); } -- 1.7.7.6
[dpdk-dev] [PATCH] examples/qos_sched: fix flow pause after 2M packets
>>How about "mbufs += ret" rather than "mbufs = (struct rte_mbuf >>**)&mbufs[ret]"? Functionally it is the same, but "mbufs = (struct rte_mbuf **)&mbufs[ret]" is likely less prone to compiler warnings, so my vote is to keep it as it is and get this patch integrated asap into 1.7. Regards, Cristian
[dpdk-dev] [PATCH] vfio: open VFIO container at startup rather than during init
Hi Anatoly, I would suggest we add a log message explaining which mechanism is loaded (igb_uio/vfio) and why (e.g. tried vfio first but container could not be opened, so falling back to igb_uio, etc). Regards, Cristian -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Burakov, Anatoly Sent: Wednesday, June 18, 2014 9:57 AM To: Thomas Monjalon Cc: dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH] vfio: open VFIO container at startup rather than during init Hi Thomas, > Subject: Re: [dpdk-dev] [PATCH] vfio: open VFIO container at startup rather > than during init > > > Signed-off-by: Anatoly Burakov > > Please Anatoly, could you provide a text explaining what was broken and > why you fixed it this way? What was broken was if, for some reason, VFIO is loaded but the user can't initialize it (an example would be wrong permissions, or unsupported IOMMU type, which is what Bruce seems to be having... which shouldn't happen as far as I know, but there's nothing I can do on DPDK's side to fix this as this is the kernel reporting wrong kind of IOMMU type), DPDK would fail to load. The fix makes DPDK simply not try VFIO support at all if the container cannot be opened for some reason. Best regards, Anatoly Burakov DPDK SW Engineer
[dpdk-dev] [PATCH] Add an API to query enabled core index
Maybe we could simplify this discussion by simply creating a new function to return the mask of all enabled cores (as provided through -c coremask EAL option) and have the user utilize this mask to derive whatever info it needs? Right now, to get the mask of enabled cores, a for loop is required to test each core index one by one and re-create the mask. In several instances, I needed to know just the number of enabled cores (i.e. number of bits set in -c coremask), and there was no alternative to the for loop above. But given such a function, we can quickly do: uint64_t coremask = rte_eal_coremask(); n_lcores = __builtin_popcountll(coremask); For what Patrick needs: uint32_t lcore_enabled_pos = __builtin_popcountll(coremask & RTE_LEN2MASK(lcore_index)); Regards, Cristian -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Richardson, Bruce Sent: Thursday, June 12, 2014 12:28 AM To: Thomas Monjalon Cc: dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH] Add an API to query enabled core index > -Original Message- > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > Sent: Wednesday, June 11, 2014 3:50 PM > To: Richardson, Bruce > Cc: Lu, Patrick; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH] Add an API to query enabled core index > > 2014-06-11 21:57, Richardson, Bruce: > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon > > > 2014-06-11 13:45, Patrick Lu: > > > > The new API rte_lcore_id2() will return an index from enabled lcores > > > > starting from zero. > > > > > > I think core_id2 is not a representative name. > > > What do you think of renaming core_id as lcore_hwid and core_id2 as > > > lcore_index? > > > > I like lcore_index as the name for the new function. However, I'm not sure > > in that case that we want/need to rename the old one. > > I think it would be not easy to distinguish id and index. So I prefer > hwid/index. And lcore is more precise than core. 
> The function is already called "rte_lcore_id()" so there is no need to change it to make it an "lcore" function. That function has been around for a long time and is commonly used, so I'd prefer it not be changed unless it really is necessary. "rte_lcore_index" is a sufficiently different function name, in my opinion. The API documentation should clear up any confusion for the user anyway.
[dpdk-dev] [v2 22/23] Packet Framework IPv4 pipeline sample app
Hi Olivier, A few notes on using pktmbuf here: 1. As the name implies, pktmbuf should be used for packets and ctrlmbuf should be used for control messages :). IMHO using pktmbuf for control messages is a confusing workaround. 2. Pktmbuf has a lot of overhead that is not needed in order to send short messages between cores. Pktmbuf has a lot of pointers and other fields that do not make sense for messages. I don't think we want people to say DPDK is difficult to use because e.g. sending 2 bytes from core A to core B requires initializing a bunch of pointers and other fields that do not make sense. 3. Once we start using pktmbuf to send messages, it is likely that other people will follow this example, and they might do it incorrectly. I don't think we want to see emails on this list from people asking e.g.: i) Why does my app segfault, when all I want to do is send 2 bytes from core A to core B? ii) Why does my app segfault when core A writes a message to a NIC TX queue? :) Using an app-dependent structure requires duplicating the work to create/free the pool of such structures, and the alloc/free mechanism. And then some people will ask why we are not using ctrlmbuf, as long as ctrlmbuf exists in DPDK. I think that, as long as we have ctrlmbuf and pktmbuf in DPDK, we should follow the existing model. We should not look for workarounds that we know we plan to change anyway; we should look for the right solution. We both agree we need to refresh pktmbuf and ctrlmbuf, but my point is we should not make changes as long as we don't know what the agreed solution will look like. Thanks, Cristian -Original Message- From: Olivier MATZ [mailto:olivier.m...@6wind.com] Sent: Monday, June 9, 2014 1:14 PM To: Dumitrescu, Cristian Cc: dev at dpdk.org Subject: Re: [dpdk-dev] [v2 22/23] Packet Framework IPv4 pipeline sample app Hi Christian, > We need a message type defined for message passing between cores, and > pktmbuf is definitely not the right approach. 
Could you please explain why a pktmbuf is not the right approach? As proposed in http://dpdk.org/ml/archives/dev/2014-May/002759.html I think the control mbuf could be replaced by a packet mbuf or an application private structure. Regards, Olivier
[dpdk-dev] [v2 22/23] Packet Framework IPv4 pipeline sample app
Hi Olivier, We could remove the ctrlmbuf from this app and replace it with something else, but I am afraid we do not have that something else yet defined and agreed. And I would like to avoid doing the same work twice: change this app now to replace the ctrlmbuf with something else, and then replace this something else with whatever we decide to use for message passing as part of the 1.8 mbuf refresh discussion. We need a message type defined for message passing between cores, and pktmbuf is definitely not the right approach. I can also invent something new, but it is unlikely people will accept it now without a debate, so it will only make this problem worse. Not to mention that we do not even have consensus to remove ctrlmbuf :(. My proposal is (as also discussed with Ivan on a different thread) to take the mbuf refresh discussion during the 1.8 timeframe, which should include the decision on what to use for message passing. I can commit now to send a patch for this app at that time to do these changes, would this work? Thanks, Cristian -Original Message- From: Olivier MATZ [mailto:olivier.m...@6wind.com] Sent: Monday, June 9, 2014 10:12 AM To: Dumitrescu, Cristian; dev at dpdk.org Subject: Re: [dpdk-dev] [v2 22/23] Packet Framework IPv4 pipeline sample app Hi Cristian, On 06/04/2014 08:08 PM, Cristian Dumitrescu wrote: > This Packet Framework sample application illustrates the capabilities of the > Intel DPDK Packet Framework toolbox. > > It creates different functional blocks used by a typical IPv4 framework like: > flow classification, firewall, routing, etc. > > CPU cores are connected together through standard interfaces built on SW > rings, with each CPU core running a separate pipeline instance. > > Please refer to Intel DPDK Sample App Guide for full description. > > Signed-off-by: Cristian Dumitrescu Would it be possible to replace the ctrlmbuf by something else (a pktmbuf for instance)? 
As you know this would conflict if we want to remove the ctrlmbuf from the rte_mbuf structure. Regards, Olivier
[dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf
Hi, In order to minimize the number of iterations, Ivan and I had an offline discussion on this. Ivan is concerned with the way the mbuf cloning/scatter-gather feature is currently implemented in DPDK, and not with this particular patch. We agreed to take the discussion on the cloning/scatter-gather implementation during the DPDK 1.8 time-frame, as this belongs to the mbuf refresh discussion. The mbuf library is not just the format of the mbuf data structure, it also includes all the features associated with the mbuf, such as: accessing mbuf fields through get/set methods, buffer chaining, cloning/scatter-gather, enabling HW offloads through mbuf flags and fields, etc. Regards, Cristian -Original Message- From: Ivan Boule [mailto:ivan.bo...@6wind.com] Sent: Monday, June 2, 2014 1:24 PM To: Dumitrescu, Cristian; dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf Hi Cristian, I agree with you, the natural way to store application metadata into mbufs consists in putting it right after the rte_mbuf data structure. This can be simply implemented this way: struct metadata { ... }; struct app_mbuf { struct rte_mbuf mbuf; struct metadata mtdt; }; With such a representation, the application initializes the "buf_addr" field of each mbuf pointed to by a (struct app_mbuf *)amb pointer as: amb->mbuf.buf_addr = ((char *)amb + sizeof(struct app_mbuf)); But such a natural programming approach breaks the assumptions of the macros RTE_MBUF_FROM_BADDR, RTE_MBUF_TO_BADDR, RTE_MBUF_INDIRECT, and RTE_MBUF_DIRECT. For instance, in the function __rte_pktmbuf_prefree_seg(m), after the line struct rte_mbuf *md = RTE_MBUF_FROM_BADDR(m->buf_addr); "md" is always different from "m", and thus systematically (and most of the time wrongly) considered as an indirect mbuf. 
In the same way, the operations done by the function rte_pktmbuf_detach(m) void *buf = RTE_MBUF_TO_BADDR(m); m->buf_addr = buf; do not set buf_addr to its [default] correct value. To summarize, adding metadata after the rte_mbuf data structure is incompatible with the packet cloning feature behind the [wrongly named] RTE_MBUF_SCATTER_GATHER configuration option. By the way, you suggest using the headroom to also store packet metadata. But, then, how can an application store both types of information in a given mbuf at the same time, when the actual length of network headers in a mbuf is variable, as it depends on the protocol path followed by the packet in a networking stack (tunnels, etc)? Regards, Ivan On 05/30/2014 12:28 AM, Dumitrescu, Cristian wrote: > Hi Ivan, > > I hate to disagree with you :), but no, we do not break the scatter-gather > feature. We actually implemented the Packet Framework IPv4 fragmentation and > reassembly ports to validate exactly this. > > Maybe the confusion comes from the RTE_MBUF_TO_BADDR macro. This macro only > works (and it is only used) for direct mbufs, so probably the correct name > for this macro should be RTE_MBUF_DIRECT_TO_BADDR, as it cannot work (and it > is not used) for indirect mbufs. > > I am describing the rationale behind meta-data design below, sorry for the > long description that looks like a design document ... 
> > Quick recap: > - mbuf->buf_addr points to the first address where a byte of data for the > current segment _can_ be placed > - direct mbuf: the packet descriptor (mbuf) sits in the same mempool buffer > with the packet itself, so mbuf->buf_addr = mbuf + 1; > - indirect mbuf: the packet descriptor is located in a different mempool > buffer than the packet itself, so mbuf->buf_addr != mbuf + 1; > - Regardless of the mbuf type, it is not necessarily true that the first byte > of data is located at mbuf->buf_addr, as we save a headroom (configurable, by > default initially of CONFIG_RTE_PKTMBUF_HEADROOM = 128 bytes) at the start of > the data buffer (mbuf->buf_addr) for prepending packet headers ( ... or, why > not, meta-data!). The location of the first real data byte is > mbuf->pkt.data, which is variable, as opposed to mbuf->buf_addr, which does > not change. > > Packet meta-data rationale: > - I think we both agree that we need to be able to store packet meta-data in > the packet buffer. The packet buffer regions where meta-data could be stored > can only be the in the headroom or in the tailroom, which are both managed by > the application and both go up and down during the packet lifetime. > > - To me, packet-metadata is part of the packet descriptor: mbuf is the > mandatory & standard part of the packet descriptor, while meta-data is the > optional & non-standard (per application) part of the packet descriptor. > Therefore, it makes sense to put
[dpdk-dev] IMPORTANT: feature freeze for version 1.7.0
Hi Thomas, I am getting emails from people asking about Intel DPDK docs update for Packet Framework. Is there a way to upload documents to dpdk.org to make them available for review? We have Packet Framework updates to Programmer's Guide and Sample App Guide (MS Word with lots of tables and diagrams), and it would probably be very beneficial to people reviewing the code to have access to these docs as well, as opposed to getting the code now and the docs later once release is done. Thanks, Cristian -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon Sent: Friday, May 30, 2014 2:12 PM To: dev at dpdk.org Subject: [dpdk-dev] IMPORTANT: feature freeze for version 1.7.0 Hello all, We have a lot of new features mostly ready for DPDK 1.7.0. It's now time to be sure that they are well reviewed and integrated. Then release candidate should be tagged in mid-June in order to have some validation tests before end of June (planned released date). So requests for integration of features which have not yet been sent will be delayed to next version. Only reworks or small fixes of pending patches should be accepted. A new branch "rebased-next" will be created for patches which are reviewed and waiting next version. This branch will be regularly rebased on master HEAD. Being currently on vacation, I won't be very active until Monday, 9th. 
During this time, I'd like to see some reviews/acknowledgements for these patches: - socket id detection fallback http://dpdk.org/ml/archives/dev/2014-April/001987.html - fix rte_free run time in O(n) free blocks http://dpdk.org/ml/archives/dev/2014-May/002296.html - fix mbuf corruption in pcap_tx http://dpdk.org/ml/archives/dev/2014-May/002919.html - igb_uio fixes http://dpdk.org/ml/archives/dev/2014-May/002428.html - vfio http://dpdk.org/ml/archives/dev/2014-May/002914.html - ethernet enhancements http://dpdk.org/ml/archives/dev/2014-May/002436.html - splitted statistics http://dpdk.org/ml/archives/dev/2014-May/002707.html - mtu/flow control http://dpdk.org/ml/archives/dev/2014-May/002752.html - link up/down http://dpdk.org/ml/archives/dev/2014-May/002660.html - Tx rate limitation http://dpdk.org/ml/archives/dev/2014-May/002696.html - lpm optimization http://dpdk.org/ml/archives/dev/2014-May/002703.html - generic filter http://dpdk.org/ml/archives/dev/2014-May/002740.html http://dpdk.org/ml/archives/dev/2014-May/002577.html - ip_frag http://dpdk.org/ml/archives/dev/2014-May/002930.html - distributor http://dpdk.org/ml/archives/dev/2014-May/002598.html - link bonding http://dpdk.org/ml/archives/dev/2014-May/002922.html - acl http://dpdk.org/ml/archives/dev/2014-May/002945.html - packet framework http://dpdk.org/ml/archives/dev/2014-May/002820.html Thanks for your participation -- Thomas
[dpdk-dev] Please any one who can help me with librte_sched
Hi Ariel, So how are you making sure you are not getting more packets in while you are waiting for dequeue() to return zero for some time? As stated, I don't like this approach, firstly because it forces the on-the-fly BW configuration changes to result in dropping packets for a considerable amount of time in an uncontrolled way, and it should not be this way. Secondly, because making it work with no buffer leakage and no race conditions is probably not that simple ;) Regards, Cristian From: Ariel Rodriguez [mailto:arodrig...@callistech.com] Sent: Wednesday, May 28, 2014 11:50 AM To: Dumitrescu, Cristian Cc: dev at dpdk.org; Stephen Hemminger Subject: RE: [dpdk-dev] Please any one who can help me with librte_sched Ok, I can do that... but still, is there a way to ask the rte_sched_port something like is_empty? Or simply, if the dequeue function returns 0 packets retrieved from the old port structure running on the other core, can I assume that the port is empty? Regards, Ariel. On May 28, 2014 7:10 AM, "Dumitrescu, Cristian" <cristian.dumitrescu at intel.com> wrote: Hi Ariel, I think you put your finger precisely on the problem associated with your approach: you have to iterate through all the queues and free up the packets, which takes a lot of time. Obviously this is not done by the rte_sched API. Maybe a limited workaround for this approach would be to create and service the parallel rte_sched using a different CPU core, while the previous CPU core takes its time to free up all the packets and data structures correctly. 
Regards, Cristian From: Ariel Rodriguez [mailto:arodriguez at callistech.com] Sent: Wednesday, May 28, 2014 1:46 AM To: Dumitrescu, Cristian Cc: Stephen Hemminger; dev at dpdk.org Subject: Re: [dpdk-dev] Please any one who can help me with librte_sched Thank you, perfect explanation. I think I'm going to create a new parallel rte_sched_port and change the reference with the management core updating the tx/sched core. So, what happens with the packets on the old reference if I just do rte_port_free on it, are they leaked? Is there a way to flush the rte_sched_port, or maybe get the total packet size somewhere? Anyway, the RCU algorithm fits OK in this approach... but maybe there is a way to flush the old reference port, and work from there with the recently created rte_sched_port. Regards, Ariel. On Tue, May 27, 2014 at 3:31 PM, Dumitrescu, Cristian <cristian.dumitrescu at intel.com> wrote: Hi Ariel, What's wrong with calling rte_sched_subport_config() and rte_sched_pipe_config() during run-time? This assumes that: 1. Port initialization is done, which includes the following: a) the number of subports, pipes per subport are fixed b) the queues are all created and their size is fixed c) the pipe profiles are defined d) Basically the maximal data structures get created (maximum number of subports, pipes and queues) with no run-time changes allowed, apart from the bandwidth related parameters. Queues that do not receive packets are not used now, they will be used as soon as they get packets. The packets-to-queues mapping logic can change over time, as well as the level of activity for different users/queues. 2. The CPU core calling the subport/pipe config functions is the same as the core doing enqueue/dequeue for this port (for thread safety reasons). 
a) As you say, the management core can send update requests to the core running the scheduler, with the latter sampling the request queue regularly and performing the updates. Regards, Cristian -Original Message- From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen Hemminger Sent: Tuesday, May 27, 2014 5:35 PM To: Ariel Rodriguez Cc: dev at dpdk.org Subject: Re: [dpdk-dev] Please any one who can help me with librte_sched On Tue, 27 May 2014 10:33:02 -0300 Ariel Rodriguez <arodriguez at callistech.com> wrote: > Hello, this is my third mail; the previous mails have not been answered > yet. > > I just need someone to explain to me how the librte_sched framework behaves > in a specific situation. > > I have a management application, connected through a ring with the tx > core. When a user applies some configuration of the bandwidth management, > the tx core reads the message in the ring, parses the configuration into a > rte_port_params struct, subport_params and pipe_params, then creates a new > rte_sched from scratch, and then changes the pointer of the current > rte_sched_port currently doing scheduling, and then the code executes > rte_sched_port_free() for the unreferenced (referenced by temporal pointer) > rte_sched_port. This is the only way I found for applying dynamic > configuration or changes to the qos framework. > So, with this, what ha
[dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf
There is a tricky typo below (leave of -> live off), correcting ... -Original Message- From: Dumitrescu, Cristian Sent: Thursday, May 29, 2014 11:28 PM To: 'Ivan Boule'; dev at dpdk.org Subject: RE: [dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf Hi Ivan, I hate to disagree with you :), but no, we do not break the scatter-gather feature. We actually implemented the Packet Framework IPv4 fragmentation and reassembly ports to validate exactly this. Maybe the confusion comes from the RTE_MBUF_TO_BADDR macro. This macro only works (and it is only used) for direct mbufs, so probably the correct name for this macro should be RTE_MBUF_DIRECT_TO_BADDR, as it cannot work (and it is not used) for indirect mbufs. I am describing the rationale behind meta-data design below, sorry for the long description that looks like a design document ... Quick recap: - mbuf->buf_addr points to the first address where a byte of data for the current segment _can_ be placed - direct mbuf: the packet descriptor (mbuf) sits in the same mempool buffer with the packet itself, so mbuf->buf_addr = mbuf + 1; - indirect mbuf: the packet descriptor is located in a different mempool buffer than the packet itself, so mbuf->buf_addr != mbuf + 1; - Regardless of the mbuf type, it is not necessarily true that the first byte of data is located at mbuf->buf_addr, as we save a headroom (configurable, by default initially of CONFIG_RTE_PKTMBUF_HEADROOM = 128 bytes) at the start of the data buffer (mbuf->buf_addr) for prepending packet headers ( ... or, why not, meta-data!). The location of the first real data byte is mbuf->pkt.data, which is variable, as opposed to mbuf->buf_addr, which does not change. Packet meta-data rationale: - I think we both agree that we need to be able to store packet meta-data in the packet buffer. 
The packet buffer regions where meta-data could be stored can only be in the headroom or in the tailroom, which are both managed by the application and both go up and down during the packet lifetime. - To me, packet-metadata is part of the packet descriptor: mbuf is the mandatory & standard part of the packet descriptor, while meta-data is the optional & non-standard (per application) part of the packet descriptor. Therefore, it makes sense to put the meta-data immediately after mbuf, but this is not mandatory. - The zero-size field called meta-data is really just an offset: it points to the first buffer location where meta-data _can_ be placed. The reason for having this field is to provide a standard way to place meta-data in the packet buffer rather than hide it in the Packet Framework libraries and potentially conflict with other mbuf changes in the future. It flags that meta-data should be taken into consideration when any mbuf change is done in the future. - For direct mbufs, the same buffer space (the headroom) is shared between packet data (header prepending) and meta-data similar to how the stack and the heap manage the same memory. Of course, since it is managed by the application, it is the responsibility of the application to make sure the packet bytes and the meta-data do not step on one another, but this problem is not at all new, nor introduced by the meta-data field: even currently, the application has to make sure the headroom is dimensioned correctly, so that even in the worst case scenario (application-specific), the packet bytes do not run into the mbuf fields, right? 
- For indirect mbufs, the packet meta-data is associated with the indirect mbuf rather than the direct mbuf (the one the indirect mbuf attaches to, as the direct mbuf contains a different packet that has different meta-data), so it makes sense that meta-data of the indirect mbuf is stored in the same buffer as the indirect mbuf (rather than the buffer of the attached direct mbuf). So this means that the buffer size used to store the indirect mbuf is sized appropriately (mbuf size, currently 64 bytes, plus the max size for additional meta-data). This is illustrated in the code of the IPv4 fragmentation port, where for every child packet (output fragment) we copy the parent meta-data in the child buffer (which stores an indirect buffer attached to the direct buffer of the input jumbo, plus now additional meta-data). - We are also considering having a user_data field in the mbuf itself to point to meta-data in the same buffer or any other buffer. The Packet Framework functionally works with both approaches, but there is a performance problem associated with the mbuf->user_data approach that we are not addressing for this 1.7 release timeframe. The issue is the data dependency that is created, as in order to find out the location of the meta-data, we need to first read the mbuf and then read meta-data from mbuf->user_data. The cost of the additional memory access is high, due t
[dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf
the mbuf or the packet (efficiency). For the time being, if somebody needs yet more meta-data in yet another buffer, they can add (for their application) a user_data pointer as part of their application meta-data (instead of standard mbuf). - As said, the mbuf->metadata points to the first location where meta-data _can_ be placed; if somebody needs a different offset, they can add it on top of the mbuf->metadata (e.g. by having a reserved field in their struct app_pkt_metadata). We have demonstrated the use of meta-data in the examples/ip_pipeline sample app (see struct app_pkt_metadata in "main.h"). Please let me know if this makes sense? Regards, Cristian PS: This is where a white board would help a lot ... -Original Message- From: Ivan Boule [mailto:ivan.bo...@6wind.com] Sent: Wednesday, May 28, 2014 1:03 PM To: Dumitrescu, Cristian; dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf Hi Cristian, Currently, the DPDK framework does not make any assumption on the actual layout of a mbuf. More precisely, the DPDK does not impose any constraint on the actual location of additional metadata, if any, or on the actual location and size of its associated payload data buffer. This is coherent with the fact that mbuf pools are not created by the DPDK itself, but by network applications that are free to choose whatever packet mbuf layout that fits their particular needs. There is one exception to this basic DPDK rule: the mbuf cloning feature available through the RTE_MBUF_SCATTER_GATHER configuration option assumes that the payload data buffer of the mbuf immediately follows the rte_mbuf data structure (see the macros RTE_MBUF_FROM_BADDR, RTE_MBUF_TO_BADDR, RTE_MBUF_INDIRECT, and RTE_MBUF_DIRECT in the file lib/librte_mbuf/rte_mbuf.h). 
The cloning feature prevents building packet mbufs with their metadata located immediately after the rte_mbuf data structure, which is exactly what your patch introduces. At least, a comment that clearly warns the user of this incompatibility might be worth adding into both the code and your patch log. Regards, Ivan On 05/27/2014 07:09 PM, Cristian Dumitrescu wrote: > Added zero-size field (offset in data structure) to specify the beginning of > packet meta-data in the packet buffer just after the mbuf. > > The size of the packet meta-data is application specific and the packet > meta-data is managed by the application. > > The packet meta-data should always be accessed through the provided macros. > > This is used by the Packet Framework libraries (port, table, pipeline). > > There is absolutely no performance impact due to this mbuf field, as it does > not take any space in the mbuf structure (zero-size field). > > Signed-off-by: Cristian Dumitrescu > --- > lib/librte_mbuf/rte_mbuf.h | 17 + > 1 files changed, 17 insertions(+), 0 deletions(-) > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > index 4a9ab41..bf09618 100644 > --- a/lib/librte_mbuf/rte_mbuf.h > +++ b/lib/librte_mbuf/rte_mbuf.h > @@ -201,8 +201,25 @@ struct rte_mbuf { > struct rte_ctrlmbuf ctrl; > struct rte_pktmbuf pkt; > }; > + > + union { > + uint8_t metadata[0]; > + uint16_t metadata16[0]; > + uint32_t metadata32[0]; > + uint64_t metadata64[0]; > + }; > } __rte_cache_aligned; > > +#define RTE_MBUF_METADATA_UINT8(mbuf, offset) (mbuf->metadata[offset]) > +#define RTE_MBUF_METADATA_UINT16(mbuf, offset) > (mbuf->metadata16[offset/sizeof(uint16_t)]) > +#define RTE_MBUF_METADATA_UINT32(mbuf, offset) > (mbuf->metadata32[offset/sizeof(uint32_t)]) > +#define RTE_MBUF_METADATA_UINT64(mbuf, offset) > (mbuf->metadata64[offset/sizeof(uint64_t)]) > + > +#define RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset) (&mbuf->metadata[offset]) > +#define RTE_MBUF_METADATA_UINT16_PTR(mbuf, offset) > 
(&mbuf->metadata16[offset/sizeof(uint16_t)]) > +#define RTE_MBUF_METADATA_UINT32_PTR(mbuf, offset) > (&mbuf->metadata32[offset/sizeof(uint32_t)]) > +#define RTE_MBUF_METADATA_UINT64_PTR(mbuf, offset) > (&mbuf->metadata64[offset/sizeof(uint64_t)]) > + > /** >* Given the buf_addr returns the pointer to corresponding mbuf. >*/ > -- Ivan Boule 6WIND Development Engineer
[dpdk-dev] [PATCH 00/29] Packet Framework
Hi Neil, Packet Framework does not compete against OVS. OVS is an application (for virtual switching), while Packet Framework is a toolbox to build applications. Can somebody pick OVS building blocks and reuse them to build other applications that use the OpenFlow design principles (port, table, pipeline, actions, etc)? Probably not easily, if at all. Can somebody use Packet Framework to build a virtual switch application? Hopefully yes. Can somebody use Packet Framework to develop various applications with custom actions extended outside the small OVS hardwired set (suitable for a switch, but not for e.g. a base station)? Hopefully yes. This being said, OVS and Packet Framework do use similar design principles originating from OpenFlow. Regards, Cristian -Original Message- From: Neil Horman [mailto:nhor...@tuxdriver.com] Sent: Tuesday, May 27, 2014 8:48 PM To: Dumitrescu, Cristian Cc: dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH 00/29] Packet Framework On Tue, May 27, 2014 at 06:09:23PM +0100, Cristian Dumitrescu wrote: > Intel DPDK Packet Framework provides a standard methodology (logically > similar to OpenFlow) for rapid development of complex packet processing > pipelines out of ports, tables and actions. > > A pipeline is constructed by connecting its input ports to its output ports > through a chain of lookup tables. As result of lookup operation into the > current table, one of the table entries (or the default table entry, in case > of lookup miss) is identified to provide the actions to be executed on the > current packet and the associated action meta-data. The behavior of user > actions is defined through the configurable table action handler, while the > reserved actions define the next hop for the current packet (either another > table, an output port or packet drop) and are handled transparently by the > framework. > > Three new Intel DPDK libraries are introduced for Packet Framework: > librte_port, librte_table, librte_pipeline. 
Please check the Intel DPDK > Programmer's Guide for a full description of the Packet Framework design. > > Two sample applications are provided for Packet Framework: app/test-pipeline > and examples/ip_pipeline. Please check the Intel Sample Apps Guide for a > detailed description of these sample apps. > Isn't this at least in part functionality that OVS provides on top of DPDK? Why re-invent the wheel? Neil -- Intel Shannon Limited Registered in Ireland Registered Office: Collinstown Industrial Park, Leixlip, County Kildare Registered Number: 308263 Business address: Dromore House, East Park, Shannon, Co. Clare This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
[dpdk-dev] Please any one who can help me with librte_sched
Hi Ariel, I think you put your finger precisely on the problem associated with your approach: you have to iterate through all the queues and free up the packets, which takes a lot of time. Obviously this is not done by the rte_sched API. Maybe a limited workaround for this approach would be to create and service the parallel rte_sched using a different CPU core, while the previous CPU core takes its time to free up all the packets and data structures correctly. Regards, Cristian From: Ariel Rodriguez [mailto:arodrig...@callistech.com] Sent: Wednesday, May 28, 2014 1:46 AM To: Dumitrescu, Cristian Cc: Stephen Hemminger; dev at dpdk.org Subject: Re: [dpdk-dev] Please any one who can help me with librte_sched Thank you, perfect explanation. I think I'm going to create a new parallel rte_sched_port and change the reference, with the management core updating the tx/sched core. So, what happens with the packets on the old reference if I just do rte_port_free on it, are they leaked? Is there a way to flush the rte_sched_port, or maybe to get the total packet size somewhere? Anyway the RCU algorithm fits OK in this approach... but maybe there is a way to flush the old reference port, and work from there with the recently created rte_sched_port. Regards, Ariel. On Tue, May 27, 2014 at 3:31 PM, Dumitrescu, Cristian <cristian.dumitrescu at intel.com> wrote: Hi Ariel, What's wrong with calling rte_sched_subport_config() and rte_sched_pipe_config() during run-time? This assumes that: 1. Port initialization is done, which includes the following: a) the number of subports and pipes per subport are fixed b) the queues are all created and their size is fixed c) the pipe profiles are defined d) Basically the maximal data structures get created (maximum number of subports, pipes and queues) with no run-time changes allowed, apart from the bandwidth-related parameters. Queues that do not receive packets are not used now; they will be used as soon as they get packets. 
The packets-to-queues mapping logic can change over time, as well as the level of activity for different users/queues. 2. The CPU core calling the subport/pipe config functions is the same as the core doing enqueue/dequeue for this port (for thread safety reasons). a) As you say, the management core can send update requests to the core running the scheduler, with the latter sampling the request queue regularly and performing the updates. Regards, Cristian -Original Message- From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen Hemminger Sent: Tuesday, May 27, 2014 5:35 PM To: Ariel Rodriguez Cc: dev at dpdk.org Subject: Re: [dpdk-dev] Please any one who can help me with librte_sched On Tue, 27 May 2014 10:33:02 -0300 Ariel Rodriguez <arodriguez at callistech.com> wrote: > Hello, this is my third mail; the previous mails have not been answered > yet. > > I just need someone to explain to me how the librte_sched framework behaves > in a specific situation. > > I have a management application that connects through a ring with the tx > core. When a user applies some configuration of the bandwidth management, > the tx core reads the message in the ring, parses the configuration into a > rte_port_params struct, subport_params and pipe_params, then creates a new > rte_sched from scratch, then changes the pointer of the rte_sched_port > currently doing scheduling, and then the code executes > rte_sched_port_free() for the unreferenced (referenced by a temporary pointer) > rte_sched_port. This is the only way I found for applying dynamic > configuration or changes to the QoS framework. > So, with this, what happens with the packets attached to the old > rte_sched_port while it is deleted? Are those lost packets inside the > rte_sched_port generating memory leaks? How can I recover these packets, > just by dequeuing from the port scheduler? Where does the port scheduler indicate > empty packets in the queue state? 
> > Is there a better way to achieve this kind of behaviour? I just need to > update the rte_sched_port configuration dynamically, and I want to change > the current pipe configuration and subport configuration also. > > Regards. If you need to do dynamic changes, I would recommend using an RCU type algorithm where you exchange in new parameters and then cleanup/free after a grace period. See http://lttng.org/urcu
[dpdk-dev] [PATCH] cmdline: finish at EOF
Hi Olivier, Great, so then maybe we should plan for an update of the DPDK command line library to your latest code base. Are the code changes massive, i.e. would this be a big task? Thanks, Cristian -Original Message- From: Olivier MATZ [mailto:olivier.m...@6wind.com] Sent: Monday, May 26, 2014 3:27 PM To: Dumitrescu, Cristian; dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH] cmdline: finish at EOF Hi Cristian, On 05/23/2014 05:32 PM, Olivier MATZ wrote: > On 05/23/2014 05:21 PM, Cristian Dumitrescu wrote: >> Bug fix in cmdline library to allow return on EOF as opposed to >> infinite loop. >> >> Signed-off-by: Cristian Dumitrescu >> --- >> lib/librte_cmdline/cmdline.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> mode change 100644 => 100755 lib/librte_cmdline/cmdline.c > > Acked-by: Olivier Matz By the way, I just noticed that this bug had already been fixed in my cmdline project on my personal website some time ago: http://git.droids-corp.org/?p=libcmdline.git;a=commitdiff;h=52e1d5772d1f465a013e65b8ae6e43a027438ed6 Regards, Olivier
[dpdk-dev] [PATCH RFC 03/11] mbuf: remove rte_ctrlmbuf
I am also using the rte_ctrlmbuf to send messages between cores in one of the Packet Framework sample apps that I am going to send as a patch tomorrow or later this week. Removing rte_ctrlmbuf would require additional rework to this (complex) sample app. It can be done, but it is additional work very close to the code freeze cycle. Thanks, Cristian -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Gilmore, Walter E Sent: Sunday, May 25, 2014 10:39 PM To: Olivier Matz; dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH RFC 03/11] mbuf: remove rte_ctrlmbuf Olivier, you're making an assumption that customer application code running on the Intel DPDK isn't using the rte_ctrlmbuf structure. Remember there are more than 300 customers using the Intel DPDK and there is no way you can predict that this is not used by them. The purpose of this structure is to send commands, events or any other type of information between user application tasks (normally from a manager task). It has been there since the beginning in the original design and it's up to the user to define what is in the data field and how they wish to use it. It's one thing to fix a bug, but to remove a structure like this because you don't see it used in the other parts is asking for trouble with customers. -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Olivier Matz Sent: Friday, May 09, 2014 10:51 AM To: dev at dpdk.org Subject: [dpdk-dev] [PATCH RFC 03/11] mbuf: remove rte_ctrlmbuf The initial role of rte_ctrlmbuf is to carry generic messages (data pointer + data length) but it's not used by the DPDK or its applications. Keeping it implies: - losing 1 byte in the rte_mbuf structure - having some dead code in rte_mbuf.[ch] This patch removes this feature. Thanks to it, it is now possible to simplify the rte_mbuf structure by merging the rte_pktmbuf structure into it. This is done in the next commit. 
Signed-off-by: Olivier Matz
---
 app/test-pmd/cmdline.c                   |   1 -
 app/test-pmd/testpmd.c                   |   2 -
 app/test-pmd/txonly.c                    |   2 +-
 app/test/commands.c                      |   1 -
 app/test/test_mbuf.c                     |  72 +
 examples/ipv4_multicast/main.c           |   2 +-
 lib/librte_mbuf/rte_mbuf.c               |  65 +++-
 lib/librte_mbuf/rte_mbuf.h               | 175 ++-
 lib/librte_pmd_e1000/em_rxtx.c           |   2 +-
 lib/librte_pmd_e1000/igb_rxtx.c          |   2 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c        |   4 +-
 lib/librte_pmd_virtio/virtio_rxtx.c      |   2 +-
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c    |   2 +-
 lib/librte_pmd_xenvirt/rte_eth_xenvirt.c |   2 +-
 14 files changed, 54 insertions(+), 280 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 7becedc..e3d1849 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -5010,7 +5010,6 @@ dump_struct_sizes(void)
 #define DUMP_SIZE(t) printf("sizeof(" #t ") = %u\n", (unsigned)sizeof(t));
 	DUMP_SIZE(struct rte_mbuf);
 	DUMP_SIZE(struct rte_pktmbuf);
-	DUMP_SIZE(struct rte_ctrlmbuf);
 	DUMP_SIZE(struct rte_mempool);
 	DUMP_SIZE(struct rte_ring);
 #undef DUMP_SIZE
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 9c56914..76b3823 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -389,13 +389,11 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
 	mb_ctor_arg = (struct mbuf_ctor_arg *) opaque_arg;
 	mb = (struct rte_mbuf *) raw_mbuf;
-	mb->type = RTE_MBUF_PKT;
 	mb->pool = mp;
 	mb->buf_addr = (void *) ((char *)mb + mb_ctor_arg->seg_buf_offset);
 	mb->buf_physaddr = (uint64_t) (rte_mempool_virt2phy(mp, mb) +
 			mb_ctor_arg->seg_buf_offset);
 	mb->buf_len = mb_ctor_arg->seg_buf_size;
-	mb->type = RTE_MBUF_PKT;
 	mb->ol_flags = 0;
 	mb->pkt.data = (char *) mb->buf_addr + RTE_PKTMBUF_HEADROOM;
 	mb->pkt.nb_segs = 1;
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 1cf2574..1f066d0 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -93,7 +93,7 @@ tx_mbuf_alloc(struct rte_mempool *mp)
 	struct rte_mbuf *m;
 	m = __rte_mbuf_raw_alloc(mp);
-	__rte_mbuf_sanity_check_raw(m, RTE_MBUF_PKT, 0);
+	__rte_mbuf_sanity_check_raw(m, 0);
 	return (m);
 }
diff --git a/app/test/commands.c b/app/test/commands.c
index b145036..c69544b 100644
--- a/app/test/commands.c
+++ b/app/test/commands.c
@@ -262,7 +262,6 @@ dump_struct_sizes(void)
 #define DUMP_SIZE(t) printf("sizeof(" #t ") = %u\n", (unsigned)sizeof(t));
 	DUMP_SIZE(struct rte_mbuf);
 	DUMP_SIZE(struct rte_pktmbuf);
-	DUMP_SIZE(struct rte_ctrlmbuf);
 	DUMP_SIZE(struct rte_mempool);
 	DUMP_SIZE(struct rte_ring);
 #undef DUMP_SIZE
diff --git
[dpdk-dev] 4 Traffic classes per Pipe limitation
Hi Ariel, some comments inlined below. Regards, Cristian -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ariel Rodriguez Sent: Thursday, November 28, 2013 8:53 PM To: dev at dpdk.org Subject: [dpdk-dev] 4 Traffic classes per Pipe limitation Hi, I'm working with the QoS scheduler framework and I have a few questions. Why the 4 traffic classes per pipe limitation? [Cristian] The DPDK hierarchical scheduler allows for 4 traffic classes with 4 packet queues per traffic class for each pipe (user). Traffic classes are scheduled in strict priority, while queues within a pipe traffic class are scheduled using byte-level Weighted Round Robin (WRR) with configured weights. Since we have 16 packet queues per pipe (user), we could argue that actually 16 (sub)traffic classes are supported. When the weight ratio between different queues of the same traffic class is high (e.g. 4:1, 8:1, 16:1, etc.), then WRR starts behaving like strict priority. If your mapping to traffic classes is done using DSCP, you can simply map some DSCP values to different queues within the same traffic class as well. I'm developing a deep packet inspection solution for a telecom company and we need more than just 4 traffic classes per pipe. I'm able to recognise almost all layer 7 applications, such as youtube, p2p, netflix, google-ads, etc., and I really need to map these types of flows into different traffic classes. [Cristian] Not sure I completely agree with you here. The traffic classes are there for the reason of providing different levels of delay (latency), delay variation (jitter), packet loss rate, etc. So, for example, one possible mapping of traffic types to the 4 traffic classes might be: voice = TC0 (highest priority), interactive video = TC1, non-interactive/cached video = TC2, best effort traffic (file downloads, web browsing, email, etc.) = TC3 (lowest priority). In my opinion, youtube and netflix could be considered as being part of the same traffic class (e.g. 
TC2), as they have very similar (if not identical) requirements, and probably the same for p2p, google-ads, email, web browsing, etc. (where the best effort traffic class is probably applicable). If really needed, youtube and netflix could be mapped to different queues of TC2. If different services / actions need to be applied to different applications that have the same scheduling requirements (and are part of the same traffic class), then this would probably have to be decided earlier, during the classification phase: e.g. rate limit youtube traffic per user using traffic metering algorithms, block p2p traffic if you are a firewall, etc.; these are probably actions that could be enforced outside of scheduling/traffic management. The idea is to mark each flow depending on the provisioning information, assign those flows to different subports depending on the information given, and assign a pipe with the subscriber contract rate, but we really need to have more than 4 traffic classes, because we want to control the bandwidth of different layer 7 protocol flows. At most we need 32 or 64 traffic classes per subscriber. [Cristian] Bandwidth control could be done on both the ingress side and the egress side. On ingress, the amount of incoming traffic for a specific user (flow/connection/application) could be limited to predefined values, with potentially different levels for different classes of users (e.g. regular / premium / company / etc.). On egress, several pipe profiles can be defined using the DPDK hierarchical scheduler, which would allow setting up a different rate limit for each traffic class for each user. Likewise, traffic classes can be rate limited at the subport level (group of users). I understand that in a given interval of time a subscriber doesn't use more than 4 protocols simultaneously, generally speaking, or 4 traffic classes in DPDK QoS terminology, but the framework doesn't allow us to configure more traffic classes. 
I'm looking at the code of the QoS scheduler and I'm not seeing why this restriction exists. Is it a performance problem, or a waste-of-resources problem? Maybe when the port grinder searches for the active queues of each traffic class, the delay of iterating over all pipes and each traffic class is too high. Cisco has a bandwidth management solution that claims to control a million subscribers simultaneously with 64 traffic classes per subscriber (pipe) and 4 queues per traffic class (the Cisco solution calls traffic classes "Bandwidth Controllers per service" or BWC; a subscriber can have 64 BWCs simultaneously). Is this possible? Maybe these guys implement the bandwidth management in hardware? Anyway, I really need this feature, but if the QoS scheduler cannot scale to more than 4 traffic classes per pipe I would have to implement a custom packet scheduler from scratch, and I really don't want to do that. Thanks for the patience, and