Hi,

> -----Original Message-----
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Thursday, April 12, 2018 8:52 PM
> To: Kevin Traynor <ktray...@redhat.com>; Stokes, Ian <ian.sto...@intel.com>;
> Jan Scheurich <jan.scheur...@ericsson.com>; Venkatesan Pradeep
> <venkatesan.prad...@ericsson.com>; d...@openvswitch.org
> Cc: Flavio Leitner <f...@redhat.com>; Loftus, Ciara <ciara.lof...@intel.com>;
> Kavanagh, Mark B <mark.b.kavan...@intel.com>; Ben Pfaff (b...@ovn.org)
> <b...@ovn.org>; acon...@redhat.com; disc...@openvswitch.org
> Subject: Re: Mempool issue for OVS 2.9
> 
> On 11.04.2018 20:55, Kevin Traynor wrote:
> > On 04/10/2018 11:12 AM, Stokes, Ian wrote:
> >>>>> -----Original Message-----
> >>>>> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> >>>>> Sent: Monday, 29 January, 2018 09:35
> >>>>> To: Jan Scheurich <jan.scheur...@ericsson.com>; Venkatesan Pradeep
> >>>>> <venkatesan.prad...@ericsson.com>; Stokes, Ian
> >>>>> <ian.sto...@intel.com>; d...@openvswitch.org
> >>>>> Cc: Kevin Traynor <ktray...@redhat.com>; Flavio Leitner
> >>>>> <f...@redhat.com>; Loftus, Ciara <ciara.lof...@intel.com>;
> >>>>> Kavanagh, Mark B <mark.b.kavan...@intel.com>; Ben Pfaff
> >>>>> (b...@ovn.org) <b...@ovn.org>; acon...@redhat.com;
> >>>>> disc...@openvswitch.org
> >>>>> Subject: Re: Mempool issue for OVS 2.9
> >>>>>
> >>>>> On 29.01.2018 11:19, Jan Scheurich wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I'd like to take one step back and look at how many mbufs we
> >>>>>> actually need.
> >>>>>>
> >>>>>> Today mbufs are consumed in the following places:
> >>>>>>
> >>>>>>  1. Rx queues of **physical** dpdk ports:
> >>>>>>     dev->requested_n_rxq * dev->requested_rxq_size
> >>>>>>     Note 1: These mbufs are hogged up at all times.
> >>>>>>     Note 2: There is little point in configuring more rx queues per
> >>>>>>     phy port than there are PMDs to poll them.
> >>>>>>     Note 3: The rx queues of vhostuser ports exist as virtqueues in
> >>>>>>     the guest and do not hog mbufs.
> >>>>>>  2. One batch per PMD during processing: #PMD * NETDEV_MAX_BURST
> >>>>>>  3. One batch per tx queue with time-based tx batching:
> >>>>>>     dev->requested_n_txq * NETDEV_MAX_BURST
> >>>>>>  4. Tx queues of **physical** ports:
> >>>>>>     dev->requested_n_txq * expected peak tx queue fill level
> >>>>>>     Note 1: The maximum of 2K mbufs per tx queue can only be reached
> >>>>>>     if the OVS transmit rate exceeds the line rate for a long time.
> >>>>>>     This can only happen for large packets and when the traffic
> >>>>>>     originates from VMs on the compute node. This would be a case of
> >>>>>>     under-dimensioning and packets would be dropped in any case.
> >>>>>>     Excluding that scenario, a typical peak tx queue fill level would
> >>>>>>     be when all PMDs transmit a full batch at the same time:
> >>>>>>     #PMDs * NETDEV_MAX_BURST.
> >>>>>
> >>>>> Above assumption is wrong. Just look at ixgbe driver:
> >>>>> drivers/net/ixgbe/ixgbe_rxtx.c: tx_xmit_pkts():
> >>>>>
> >>>>>        /*
> >>>>>         * Begin scanning the H/W ring for done descriptors when the
> >>>>>         * number of available descriptors drops below tx_free_thresh.
> >>> For
> >>>>>         * each done descriptor, free the associated buffer.
> >>>>>         */
> >>>>>        if (txq->nb_tx_free < txq->tx_free_thresh)
> >>>>>                ixgbe_tx_free_bufs(txq);
> >>>>>
> >>>>> The default value for 'tx_free_thresh' is 32. So, if I configure
> >>>>> the number of TX descriptors to 4096, the driver will start to free
> >>>>> mbufs only once it holds more than 4063 mbufs in its TX queue, no
> >>>>> matter how frequently the send() function is called.
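To make the effect described above concrete, here is a minimal Python model of that lazy-freeing behaviour (the constants and the all-at-once reclaim are illustrative assumptions, not the actual ixgbe code):

```python
def peak_hogged(nb_desc=4096, tx_free_thresh=32, burst=32, iters=1000):
    """Model a TX ring that only reclaims completed mbufs once the
    number of free descriptors drops below tx_free_thresh."""
    nb_tx_free = nb_desc
    hogged = 0   # mbufs sitting in the TX ring, not yet freed
    peak = 0
    for _ in range(iters):
        if nb_tx_free < tx_free_thresh:
            # ixgbe_tx_free_bufs(): assume every sent mbuf is done
            nb_tx_free += hogged
            hogged = 0
        hogged += burst
        nb_tx_free -= burst
        peak = max(peak, hogged)
    return peak

print(peak_hogged())  # a 4096-deep ring ends up hogging ~4K mbufs
```

So even with frequent send() calls, a deep TX ring can pin almost its full depth in mbufs, which is why tx queue depth has to be counted in mempool dimensioning.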
> >>>>
> >>>> OK, but that doesn't change my general argument. The mbufs hogged
> >>>> in the
> >>> tx side of the phy port driver are coming from all ports (least
> >>> likely the port itself). Considering them in dimensioning the port's
> >>> private mempool is conceptually wrong. In my simplified dimensioning
> >>> formula below I have already assumed full occupancy of the tx queue
> >>> for phy ports. The second key observation is that vhostuser ports do
> >>> not hog mbufs at all. And vhost zero copy doesn't change that.
> >>>
> >>> The formula below may be good for a static environment. I want to
> >>> change the number of PMD threads dynamically in my deployments, and
> >>> that works in the current per-port model and with an oversized shared
> >>> pool. If we try to reduce the memory consumption of the shared pool,
> >>> we will have to reconfigure all the devices each time we change the
> >>> number of PMD threads. That would be really bad.
> >>> So, the size of the memory pool should not depend on dynamic
> >>> characteristics of the datapath or of other ports, to avoid unexpected
> >>> interruptions of traffic after random configuration changes. Of
> >>> course, it could depend on characteristics of the port itself in the
> >>> per-port model. In the shared mempool model the size should only
> >>> depend on the static datapath configuration.
> >>
> >> Hi all,
> >>
> >> Now seems a good time to kick-start this conversation again, as there
> >> are a few patches floating around for mempools on master and 2.9.
> >> I'm happy to work on a solution for this, but before starting I'd like
> >> to agree on the requirements so we're all comfortable with the solution.
> >>
> >
> > Thanks for kicking it off Ian. FWIW, the freeing fix code can work
> > with both schemes below. I already have that between the patches for
> > different branches. It should be straightforward to change to cover
> > both in same code. I can help with that if needed.
> 
> Agreed, there is not much difference between the mempool models as far as
> the freeing fix is concerned.
> 
> >
> >> I see two use cases above, static and dynamic. Each has its own
> >> requirements (I'm keeping OVS 2.10 in mind here as it's an issue we
> >> need to resolve).
> >>
> >> Static environment
> >> 1. For a given deployment, the 2.10 mempool design should use the same
> >> or less memory than the shared mempool design of 2.9.
> >> 2. Mempool size can depend on static datapath configuration, but the
> >> previous provisioning used in OVS 2.9 is also acceptable.
> >>
> >> I think the shared mempool model suits the static environment; it's a
> >> rough way of provisioning memory, but it works for the majority involved
> >> in the discussion to date.
> >>
> >> Dynamic environment
> >> 1. Mempool size should not depend on dynamic characteristics (number of
> >> PMDs, number of ports, etc.), as changing them leads to frequent traffic
> >> interruptions.
> >
> > If that is wanted I think you need to distinguish between port related
> > dynamic characteristics and non-port related. At present the per port
> > scheme depends on number of rx/tx queues and the size of rx/tx queues.
> > Also, txq's depends on number of PMDs. All of which can be changed
> > dynamically.
> 
> Changing the mempool size is too heavy an operation. We should avoid it
> as long as possible.
> 
> It would be nice to have some kind of dynamic mempool-resize API in DPDK,
> but no such concept exists right now. It might also help if the DPDK API
> allowed attaching more than one mempool to a device. Such an API would
> let us dynamically increase/decrease the total amount of memory available
> to a single port. We should definitely think about something like this in
> the future.
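As a thought experiment, the multi-mempool idea could look roughly like this (purely hypothetical: no such DPDK API exists today, and every name below is invented for illustration):

```python
class PortMem:
    """Hypothetical per-port memory made up of several attachable pools."""

    def __init__(self):
        self.pools = []                     # list of (pool_id, n_mbufs)

    def add_pool(self, pool_id, n_mbufs):
        # grow the port's memory without touching the existing pools
        self.pools.append((pool_id, n_mbufs))

    def remove_pool(self, pool_id):
        # shrink the port's memory once the pool's mbufs are returned
        self.pools = [p for p in self.pools if p[0] != pool_id]

    def total_mbufs(self):
        return sum(n for _, n in self.pools)
```

Attaching or detaching whole pools would avoid the heavyweight destroy-and-recreate cycle that a resize implies today.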
> 
> >
> >> 2. In a dynamic environment, clear visibility of per-port memory usage
> >> is preferable (sharing mempools violates this).
> >>
> >> The current per port model suits the dynamic environment.
> >>
> >> I'd like to propose for 2.10 that we implement a model to allow both:
> >>
> >> * When adding a port the shared mempool model would be the default
> behavior. This would satisfy users moving from previous OVS releases to 2.10
> as memory requirements would be in line with what was previously expected
> and no new options/arguments are needed.
> >>
> >
> > +1
> 
> It's OK for me too.
> 
I agree that the shared mempool should be the default but would also ask
whether we really need a per-port mempool mechanism. I think a more
important problem to solve would be ensuring that transmitted mbufs
sitting on tx queues are freed as early as possible. That should help keep
the mempool size down, and it is all the more important if there is a need
to also support per-port mempools. I think most deployments would use only
a handful of MTUs, and the shared mempool approach would suffice so long
as we clearly document the memory usage for different MTU ranges (and
perhaps even allow the mempool size to be configurable).
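For illustration, the per-MTU sharing that the shared model relies on can be sketched as follows (the names and the bucket rounding are assumptions, not actual OVS code; 262144 mirrors the 256K-mbuf figure mentioned elsewhere in the thread):

```python
mempools = {}   # one shared pool per MTU bucket

def mtu_bucket(mtu, align=1024):
    # round up so that ports with similar MTUs land in the same bucket
    return -(-mtu // align) * align

def get_shared_mempool(mtu, n_mbufs=262144):
    """Return (creating on first use) the shared pool for this MTU bucket."""
    key = mtu_bucket(mtu)
    if key not in mempools:
        mempools[key] = {"bucket": key, "n_mbufs": n_mbufs, "refs": 0}
    pool = mempools[key]
    pool["refs"] += 1
    return pool
```

Documenting memory usage per MTU range then reduces to documenting n_mbufs times the per-mbuf buffer size for each bucket actually in use.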

Regards,

Pradeep

> >
> >> * A per-port mempool is available but must be requested by the user; it
> >> would require a new option/argument when adding a port.
> >
> > I'm not sure there needs to be an option *per port*. The implication
> > is that some mempools would be created exclusively for a single port,
> > while others would be available to share and this would operate at the same
> time.
> >
> > I think a user would either have an unknown or high number of ports
> > and be OK with provisioning the amount of memory for shared mempools,
> > or they know they will have only a few ports and can benefit from
> > using less memory.
> 
> An unknown/large but bounded number of ports could also be a scenario for
> the separate mempool model, especially in the dynamic case.
> 
> >
> > Although, while it is desirable to reduce memory usage, I've never
> > actually heard anyone complaining about the amount of memory needed
> > for shared mempools and requesting it to be reduced.
> 
> I agree that a per-port option looks like more than users need.
> Maybe a global config option would be better.
> 
> There is one more thing: users like OpenStack are definitely "dynamic".
> Adding a new special parameter would require them to modify their code
> to keep memory consumption manageable.
> 
> P.S. Meanwhile, I will be out of office until May 3 and will not be able
>      to respond to emails.
> 
> >
> > I don't think it would be particularly difficult to have both schemes
> > operating at the same time, because you could use mempool names to
> > differentiate (some with a unique port-related name, some with a general
> > name) and mostly treat them the same; I'm just not sure that it's
> > really needed.
> >
> >> This would be an advanced feature, as the mempool size can depend on
> >> port configuration; users need to understand this and mempool concepts
> >> in general before using it. Some work is needed in the docs to make
> >> clear how memory requirements are calculated, etc.
> >>
> >> Before going into solution details I'd like to get people's opinions.
> >> There are a few different ways to implement this, but in general would
> >> the above be acceptable? I think with some smart design we could
> >> minimize the code impact so that both approaches share as much code as
> >> possible.
> >>
> >> Ian
> >>
> >>>
> >>>>
> >>>> BTW, is there any reason why phy drivers free tx mbufs only when
> >>>> the tx ring is close to becoming full? I understand the need to
> >>>> free them in batches for performance reasons, but is there no cheap
> >>>> way to do this earlier?
> >>>>
> >>>>>
> >>>>>>     Note 2: Vhostuser ports do not store mbufs in tx queues due
> >>>>>> to copying to virtio descriptors
> >>>>>>
> >>>>>>
> >>>>>> For concreteness let us use an example of a typical, rather large
> >>>>>> OVS deployment in an NFVI cloud:
> >>>>>>
> >>>>>>   * Two cores with 4 PMDs per NUMA socket using HT.
> >>>>>>   * Two physical ports using RSS over 4 rx queues to enable
> >>>>>>     load-sharing over the 4 local PMDs, and 9 tx queues (8 PMDs
> >>>>>>     plus non-PMD).
> >>>>>>   * 100 vhostuser ports with a varying number of rx and tx queue
> >>>>>>     pairs (128 in total).
> >>>>>>
> >>>>>>
> >>>>>> In the above example deployments this translates into
> >>>>>>
> >>>>>>  1. 4 * 2K = 8K mbufs per physical port (16K in total)
> >>>>>>  2. 8 * 32 = 256 mbufs in total
> >>>>>>  3. (128 + 2 * 9) * 32 = 4672 mbufs in total
> >>>>>>  4. 9 * 32 = 288 mbufs per physical port (adding some safety
> >>>>>>     margin, a total of 2K mbufs)
> >>>>>>
> >>>>>> -------
> >>>>>> Total : 23K mbufs
> >>>>>>
> >>>>>> This is way lower than the size of the earlier shared mempool
> >>>>>> (256K mbufs), which explains why we have never observed
> >>>>>> out-of-mbuf drops in our NFVI deployments. The vswitchd crash that
> >>>>>> triggered the change to per-port mempools only happened because
> >>>>>> someone tried to configure 64 rx and tx queues per physical port
> >>>>>> for multiple ports. I can't see any reason for configuring more
> >>>>>> rx and tx queues than polling PMDs, though.
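The arithmetic in the example above can be reproduced with a short sketch (figures copied from the example; NETDEV_MAX_BURST is 32):

```python
NETDEV_MAX_BURST = 32

n_phy, n_rxq, rxq_size = 2, 4, 2048   # two phy ports, 4 rxqs of 2K each
n_pmds = 8
phy_txq = 9                           # 8 PMDs plus the non-PMD thread
vhost_txq_total = 128

rx_hogged = n_phy * n_rxq * rxq_size                  # case 1: phy rxqs
in_flight = n_pmds * NETDEV_MAX_BURST                 # case 2: PMD batches
tx_batches = (vhost_txq_total + n_phy * phy_txq) * NETDEV_MAX_BURST  # case 3
phy_tx_margin = 2048                  # case 4, with the safety margin
total = rx_hogged + in_flight + tx_batches + phy_tx_margin
print(total)  # 23360, i.e. ~23K mbufs
```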
> >>>
> >>> There is at least one reason for having more TX queues than PMDs:
> >>> https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336074.html
> >>> This patch is still in an undecided state, and I have had no time to
> >>> work on it.
> >>>
> >>>>>>
> >>>>>> The actual consumption of mbufs scales primarily with the number
> >>>>>> of physical ports (cases 1, 3 and 4) and only to a much lower
> >>>>>> degree with the number of vhost ports/queues (case 3).
> >>>>>>
> >>>>>> Except for the phy rx queues, all other cases buffer a
> >>>>>> statistical mix of mbufs received on all ports. There seems
> >>>>>> little point in assigning
> >>>>> per-port mempools for these.
> >>>>>>
> >>>>>> I think we should revert to a shared mempool (per MTU size) with
> >>>>>> a simple dimensioning formula that only depends on the number of
> >>>>> physical ports and the number of PMDs, both of which are zero day
> >>> configuration parameters that are set by OVS users.
> >>>>>>
> >>>>>> For example:
> >>>>>> #mbuf = SUM over physical ports [n_rxq * rxq_size +
> >>>>>>         (#PMDs + 1) * txq_size] + 16K
> >>>>>>
> >>>>>> The fixed 16K would cover cases 2 and 3 for up to 512 vhostuser tx
> >>>>>> queues, which should be ample.
> >>>>>> In the above example this results in 2 * [4 * 2K + 9 * 2K] + 16K =
> >>>>>> 68K mbufs.
> >>>>>>
> >>>>>> BR, Jan
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Venkatesan Pradeep
> >>>>>>> Sent: Friday, 26 January, 2018 18:34
> >>>>>>> To: Jan Scheurich <jan.scheur...@ericsson.com>; Stokes, Ian
> >>>>>>> <ian.sto...@intel.com>; ovs-discuss@openvswitch.org
> >>>>>>> Cc: Kevin Traynor <ktray...@redhat.com>; Flavio Leitner
> >>>>>>> <f...@redhat.com>; Ilya Maximets (i.maxim...@samsung.com)
> >>>>>>> <i.maxim...@samsung.com>; Loftus, Ciara
> >>>>>>> <ciara.lof...@intel.com>; Kavanagh, Mark B
> >>>>>>> <mark.b.kavan...@intel.com>; Ben Pfaff
> >>>>>>> (b...@ovn.org) <b...@ovn.org>; acon...@redhat.com
> >>>>>>> Subject: RE: Mempool issue for OVS 2.9
> >>>>>>>
> >>>>>>> Response marked [Pradeep]
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> Pradeep
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Jan Scheurich
> >>>>>>> Sent: Friday, January 26, 2018 10:26 PM
> >>>>>>> To: Stokes, Ian <ian.sto...@intel.com
> >>>>>>> <mailto:ian.sto...@intel.com>>; ovs-discuss@openvswitch.org
> >>>>>>> <mailto:ovs-
> >>>>> disc...@openvswitch.org>
> >>>>>>> Cc: Kevin Traynor <ktray...@redhat.com
> >>>>>>> <mailto:ktray...@redhat.com>>; Flavio Leitner <f...@redhat.com
> >>>>> <mailto:f...@redhat.com>>; Ilya Maximets (i.maxim...@samsung.com
> >>>>> <mailto:i.maxim...@samsung.com>)
> >>>>>>> <i.maxim...@samsung.com <mailto:i.maxim...@samsung.com>>;
> >>>>>>> Loftus, Ciara <ciara.lof...@intel.com
> >>>>> <mailto:ciara.lof...@intel.com>>; Kavanagh, Mark B
> >>>>> <mark.b.kavan...@intel.com <mailto:mark.b.kavan...@intel.com>>;
> >>>>> Ben Pfaff
> >>>>>>> (b...@ovn.org <mailto:b...@ovn.org>) <b...@ovn.org
> >>>>>>> <mailto:b...@ovn.org>>; acon...@redhat.com
> >>>>>>> <mailto:acon...@redhat.com>;
> >>>>> Venkatesan Pradeep <venkatesan.prad...@ericsson.com
> >>>>> <mailto:venkatesan.prad...@ericsson.com>>
> >>>>>>> Subject: RE: Mempool issue for OVS 2.9
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Stokes, Ian [mailto:ian.sto...@intel.com]
> >>>>>>>> Sent: Friday, 26 January, 2018 13:01
> >>>>>>>> To: ovs-discuss@openvswitch.org
> >>>>>>>> <mailto:ovs-discuss@openvswitch.org>
> >>>>>>>> Cc: Kevin Traynor <ktray...@redhat.com
> >>>>>>>> <mailto:ktray...@redhat.com>>; Flavio Leitner <f...@redhat.com
> >>>>>>>> <mailto:f...@redhat.com>>; Ilya Maximets
> (i.maxim...@samsung.com
> >>>>>>>> <mailto:i.maxim...@samsung.com>) <i.maxim...@samsung.com
> >>>>>>>> <mailto:i.maxim...@samsung.com>>; Loftus, Ciara
> >>>>>>>> <ciara.lof...@intel.com
> >>>>> <mailto:ciara.lof...@intel.com>>;
> >>>>>>>> Kavanagh, Mark B <mark.b.kavan...@intel.com
> >>>>>>>> <mailto:mark.b.kavan...@intel.com>>; Jan Scheurich
> >>>>>>>> <jan.scheur...@ericsson.com
> >>>>>>>> <mailto:jan.scheur...@ericsson.com>>;
> >>>>>>>> Ben Pfaff (b...@ovn.org <mailto:b...@ovn.org>) <b...@ovn.org
> >>>>> <mailto:b...@ovn.org>>;
> >>>>>>>> acon...@redhat.com <mailto:acon...@redhat.com>; Venkatesan
> >>>>>>>> Pradeep <venkatesan.prad...@ericsson.com
> >>>>>>>> <mailto:venkatesan.prad...@ericsson.com>>
> >>>>>>>> Subject: Mempool issue for OVS 2.9
> >>>>>>>>
> >>>>>>>> Hi All,
> >>>>>>>>
> >>>>>>>> Recently an issue was raised regarding the move from a single
> >>>>>>>> shared mempool model that was in place up to OVS 2.8, to a
> >>>>>>>> mempool
> >>> per port model introduced in 2.9.
> >>>>>>>>
> >>>>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-January
> >>>>>>>> /04
> >>>>>>>> 6021
> >>>>>>>> .html
> >>>>>>>>
> >>>>>>> The per port mempool model was introduced on September 5th with
> >>>>>>> commit d555d9bded to allow fine-grained control of memory usage
> >>>>>>> on a per-port basis.
> >>>>>>>>
> >>>>>>>> In the 'common/shared mempool' model, ports sharing a similar
> >>>>>>>> MTU would all share the same buffer mempool (e.g. the most
> >>>>>>>> common example of this being that all ports are by default
> >>>>>>>> created with a
> >>>>>>> 1500B MTU, and as such share the same mbuf mempool).
> >>>>>>>>
> >>>>>>> This approach had some drawbacks, however. For example, a user
> >>>>>>> could exhaust the shared memory pool (for instance by requesting
> >>>>>>> a large number of RXQs for a port); this would cause the vSwitch
> >>>>>>> to crash, as any remaining ports would not have the required
> >>>>>>> memory to function. This bug was discovered and reported to the
> >>>>>>> community in late 2016:
> >>>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html
> >>>>>>>>
> >>>>>>>> The per port mempool patch aimed to avoid such issues by
> >>>>>>>> allocating
> >>> a separate buffer mempool to each port.
> >>>>>>>>
> >>>>>>>> An issue has been flagged on ovs-discuss, whereby memory
> >>>>>>>> dimensions provided for a given number of ports on OvS 2.6-2.8
> >>>>>>>> may be insufficient to support the same number of ports in OvS
> >>>>>>>> 2.9, on account of the per-port mempool model without
> >>>>>>>> re-dimensioning extra memory. The effect of this is use case
> >>>>>>>> dependent (number of ports, RXQs, MTU settings, number of PMDs
> >>>>>>>> etc.) The previous
> >>>>> 'common-
> >>>>>>> pool' model was rudimentary in estimating the mempool size and
> >>>>>>> was marked as something that was to be improved upon. The
> >>>>> memory
> >>>>>>> allocation calculation for per port model was modified to take
> >>>>>>> the
> >>> possible configuration factors mentioned into account.
> >>>>>>>>
> >>>>>>>> It's unfortunate that this came to light so close to the
> >>>>>>>> release code freeze - but better late than never as it is a
> >>>>>>>> valid problem to
> >>> be resolved.
> >>>>>>>>
> >>>>>>>> I wanted to highlight some options to the community as I don't
> >>>>>>>> think the next steps should be taken in isolation due to the
> >>>>>>>> impact
> >>> this feature has.
> >>>>>>>>
> >>>>>>>> There are a number of possibilities for the 2.9 release.
> >>>>>>>>
> >>>>>>>> (i) Revert the mempool per port patches and return to the
> >>>>>>>> shared mempool model. There are a number of features and
> >>>>>>>> refactoring in place on top of the change so this will not be a 
> >>>>>>>> simple
> revert.
> >>>>>>>> I'm
> >>>>>>> investigating what exactly is involved with this currently.
> >>>>>>>
> >>>>>>> The shared mempool concept has been working fairly well in our
> >>>>>>> NFVI systems for a long time. One can argue that it is too
> >>>>>>> simplistic
> >>>>> (as it
> >>>>>>> does not at all take the number of ports and queue lengths into
> >>>>>>> account) and may be insufficient in specific scenarios but at
> >>>>>>> least
> >>> it can operate with a reasonable amount of memory.
> >>>>>>>
> >>>>>>> Out of the very many vhostuser ports, only very few actually
> >>>>>>> carry a significant amount of traffic. Reserving the theoretical
> >>>>>>> worst-case amount of buffers for every port separately is
> >>>>>>> complete overkill.
> >>>>>>>
> >>>>>>>> (ii) Leave the per port mempool implementation as is, flag to
> >>>>>>>> users that memory requirements have increased. Extra memory may
> >>>>>>>> have
> >>> to be provided on a per use case basis.
> >>>>>>>
> >>>>>>> [Jan] From my point of view this is a blocker for roll-out of OVS
> >>>>>>> 2.9 in existing NFVI Cloud systems. On these systems the amount of
> >>>>>>> huge pages allocated to OVS is fixed by zero-day configuration and
> >>>>>>> there is no easy way to change that memory allocation as part of a
> >>>>>>> SW upgrade procedure.
> >>>>>>>
> >>>>>>> As such servers host a significant number of vhostuser ports and
> >>>>>>> rx queues (on the order of 100 or more), the new per-port mempool
> >>>>>>> scheme will likely cause OVS to fail after a SW upgrade.
> >>>>>>>
> >>>>>>>> (iii) Reduce the amount of memory allocated per mempool per port.
> >>>>>>>> An RFC to this effect was submitted by Kevin but on follow up
> >>>>>>>> the
> >>> feeling is that it does not resolve the issue adequately.
> >>>>>>>
> >>>>>>> If we can't get the shared mempool back into 2.9 from day one,
> >>>>>>> this might be an emergency measure to support a reasonable number
> >>>>>>> of ports with typically deployed huge-page assignments.
> >>>>>>>
> >>>>>>> [Pradeep] The per-port partitioning scheme uses a port's queue
> >>>>>>> configuration to decide how many mbufs to allocate. That is fine
> >>>>>>> so
> >>>>> long
> >>>>>>> as the buffers are consumed by that port only. However, as
> >>>>>>> discussed on the original thread, mbufs of one port may go and
> >>>>>>> sit on the queues of other ports and if we were to account for
> >>>>>>> that the estimate would bloat up.  Also, since the calculation
> >>>>>>> would depend on the configuration of other ports the
> >>>>>>> requirements can increase long after the port is created. This
> >>>>>>> defeats the original intent behind the
> >>>>> per-
> >>>>>>> port allocation scheme.
> >>>>>>>
> >>>>>>> If retaining the new scheme is the only option for 2.9, perhaps
> >>>>>>> we can consider the following based on what was discussed in the
> >>>>>>> other
> >>>>>>> thread:
> >>>>>>> - the worst-case dimensioning assumes that tx queues will be full.
> >>>>>>> Unless the queues are drained slowly compared to the enqueue
> >>>>>>> rate (due to flow control, mismatched speeds etc) the queue
> >>>>>>> occupancy is
> >>> likely to be low. Instead of using txq_size (default: 2048) we can
> >>> consider using a lower value to calculate the # of mbufs allocated &
> >>> used by the port.
> >>>>>>> - since zero-copy for vhostuser ports is not yet enabled (or at
> >>>>>>> least not by default) and we always end up copying packets, the
> >>>>> occupancy is
> >>>>>>> effectively only 1. We could allocate a lower number of mbufs
> >>>>>>> for vhostuser ports
> >>>>>>>
> >>>>>>> To account for stolen buffers (i.e. mbufs placed on other ports'
> >>>>>>> queues) we can add a fixed value. Since the number of dpdk ports
> >>>>>>> would be much lower compared to the number of vhostuser ports,
> >>>>>>> and since zero-copy is not enabled for vhostuser ports, this
> >>>>>>> fixed value need not be very high.
> >>>>>>>
> >>>>>>> I don't think this will necessarily address all use-cases but if
> >>>>>>> the queue occupancy and the fixed value (based on max expected #
> >>>>>>> of dpdk
> >>>>>>> ports/queues) are chosen properly it should, hopefully, cover
> >>>>>>> most
> >>> common deployments.
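A back-of-the-envelope version of that per-port estimate might look like this (the function and its default values are invented for illustration, not proposed code):

```python
def per_port_mbufs(n_rxq, rxq_size, n_txq, assumed_txq_fill,
                   n_pmds, burst=32, stolen_allowance=2048):
    """Per-port mbuf estimate along the lines discussed above:
    rx queues fully provisioned, tx queues only to an assumed fill
    level (far below the txq_size worst case), plus a fixed allowance
    for mbufs parked on other ports' queues."""
    rx = n_rxq * rxq_size
    tx = n_txq * assumed_txq_fill
    in_flight = n_pmds * burst
    return rx + tx + in_flight + stolen_allowance

# e.g. a phy port with 4 rxqs of 2K and 9 txqs assumed ~one burst deep:
print(per_port_mbufs(4, 2048, 9, 32, 8))
```

Picking assumed_txq_fill and stolen_allowance well is exactly the hard part: too low and a port starves, too high and the per-port model over-allocates again.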
> >>>>>>>
> >>>>>>>> (iv) Introduce a feature to allow users to configure mempool as
> >>>>>>>> shared or on a per port basis: This would be the best of both
> >>>>>>>> worlds but given the proximity to the 2.9 freeze I don't think
> >>>>>>>> it's feasible by the
> >>>>> end
> >>>>>>> of January.
> >>>>>>>
> >>>>>>> Yes, too late now. But something that should be backported to
> >>>>>>> 2.9 as
> >>> soon as we have it on master.
> >>>>>>>
> >>>>>>> I think that we should aim for the combination of adaptive
> >>>>>>> shared mempools as default and explicitly configurable per-port
> >>>>>>> mempools,
> >>> when needed.
> >>>>>>>
> >>>>>>> Regards, Jan