> -----Original Message-----
> From: Kevin Traynor [mailto:[email protected]]
> Sent: Friday, January 26, 2018 3:48 PM
> To: Ilya Maximets <[email protected]>; Stokes, Ian
> <[email protected]>; [email protected]
> Cc: Flavio Leitner <[email protected]>; Loftus, Ciara
> <[email protected]>; Kavanagh, Mark B <[email protected]>;
> Jan Scheurich ([email protected]) <[email protected]>;
> Ben Pfaff ([email protected]) <[email protected]>; [email protected]; Venkatesan
> Pradeep <[email protected]>
> Subject: Re: Mempool issue for OVS 2.9
>
> On 01/26/2018 03:16 PM, Ilya Maximets wrote:
> > On 26.01.2018 15:00, Stokes, Ian wrote:
> >> Hi All,
> >>
> >> Recently an issue was raised regarding the move from a single shared
> mempool model that was in place up to OVS 2.8, to a mempool per port model
> introduced in 2.9.
> >>
> >> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-January/04602
> >> 1.html
> >>
> >> The per port mempool model was introduced on September 5th with commit
> d555d9bded, to allow fine-grained control of memory usage on a per-port basis.
> >>
> >> In the 'common/shared mempool' model, ports sharing a similar MTU would
> all share the same buffer mempool (the most common example being that all
> ports are created with the default 1500B MTU, and as such share the same
> mbuf mempool).
> >>
> >> This approach had some drawbacks, however. For example, with the shared
> memory pool model a user could exhaust the shared memory pool (for
> instance by requesting a large number of RXQs for a port); this would
> cause the vSwitch to crash, as any remaining ports would not have the
> required memory to function. This bug was discovered and reported to the
> community in late 2016: https://mail.openvswitch.org/pipermail/ovs-
> discuss/2016-September/042560.html.
> >>
> >> The per port mempool patch aimed to avoid such issues by allocating a
> separate buffer mempool to each port.
> >>
> >> An issue has been flagged on ovs-discuss whereby the memory dimensioned
> for a given number of ports on OVS 2.6-2.8 may be insufficient to support
> the same number of ports in OVS 2.9 under the per-port mempool model,
> unless extra memory is provisioned. The effect of this is use-case
> dependent (number of ports, RXQs, MTU settings, number of PMDs, etc.).
> The previous 'common-pool' model was rudimentary in estimating the mempool
> size and was marked as something to be improved upon. The memory allocation
> calculation for the per-port model was modified to take the configuration
> factors mentioned above into account.
> >>
> >> It's unfortunate that this came to light so close to the release code
> freeze - but better late than never as it is a valid problem to be
> resolved.
> >>
> >> I wanted to highlight some options to the community as I don't think
> the next steps should be taken in isolation due to the impact this feature
> has.
> >>
> >> There are a number of possibilities for the 2.9 release.
> >>
> >> (i) Revert the mempool per port patches and return to the shared
> mempool model. There are a number of features and refactoring in place on
> top of the change so this will not be a simple revert. I'm investigating
> what exactly is involved with this currently.
> >
> > This looks like a bad thing to do. Shared memory pools have their own
> > issues and hide the real memory usage of each port. Also, reverting
> > seems almost impossible; we'd have to re-implement it from scratch.
I would agree; reverting isn't as straightforward an option due to the number
of commits that were introduced in relation to the per port mempool feature
over time (listed below for completeness).
netdev-dpdk: Create separate memory pool for each port: d555d9b
netdev-dpdk: fix management of pre-existing mempools: b6b26021d
netdev-dpdk: Fix mempool names to reflect socket id: f06546a
netdev-dpdk: skip init for existing mempools: 837c176
netdev-dpdk: manage failure in mempool name creation: 65056fd
netdev-dpdk: Reword mp_size as n_mbufs: ad9b5b9
netdev-dpdk: Rename dpdk_mp_put as dpdk_mp_free: a08a115
netdev-dpdk: Fix mp_name leak on snprintf failure: ec6edc8
netdev-dpdk: Fix dpdk_mp leak in case of EEXIST: 173ef76
netdev-dpdk: Factor out struct dpdk_mp: 24e78f9
netdev-dpdk: Remove unused MAX_NB_MBUF: bc57ed9
netdev-dpdk: Fix mempool creation with large MTU: af5b0da
netdev-dpdk: Add debug appctl to get mempool information: be48173
Although a lot of these are fixes/formatting, we would have to introduce a new
series and re-introduce the shared model by removing the per port specifics.
This would also require extensive testing.
Functionality such as the mempool debug tool would also be impacted
(I'm not sure how valuable it would be with a single shared mempool).
I can look at putting an RFC for this together so as to keep the option on the
table if it's preferred within the 2.9 time window.
> >>> (ii) Leave the per port mempool implementation as is and flag to users
> that memory requirements have increased. Extra memory may have to be
> provided on a per-use-case basis.
> >
> > That's a good point. I prefer this option if possible.
>
> With 4 pmds and 4 rxqs per port a user would only be able to add 7 ports
> per socket now. 9 ports if 1 Rxq. Of course a user may have some headroom
> in the memory they have allocated but that can't be assumed. It will
> surely mean some users' setups will not work after upgrade - then they will
> have to debug why.
>
> If we just reduce the MIN_NB_MBUF to 8192, it would make it 10 ports/4
> rxqs, or 13 ports/1 rxqs without additional memory. It still gives 24K/18K
> buffers per port.
>
> What do you think?
This is similar to the RFC you submitted, I think.
I take it we would still hit the issues flagged by Pradeep, but it increases
the number of ports at least.
The 24K/18K is in and around the region Ilya flagged below, so it might be
OK. It would be great if we had common config/deployment info to help confirm
this.
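For my own sanity I sketched the arithmetic to see where the 24K/18K figures
come from. Back-of-envelope only - it assumes 2048-entry rx/tx rings, 4 tx
queues for the 4 PMDs (ignoring any extra non-PMD txq), 32-packet bursts and
MIN_NB_MBUF reduced to 8192 as suggested, and it simplifies the
MIN(RTE_MAX_LCORE, n_rxq) term since n_rxq is small here. The helper below is
illustrative, not code from the tree:

#include <inttypes.h>
#include <stdio.h>

#define RING_SIZE        2048   /* assumed default rx/tx descriptor count */
#define NETDEV_MAX_BURST 32
#define MIN_NB_MBUF      8192   /* the proposed reduced value */

/* Rough per-port mbuf estimate: mbufs that can sit in the rx rings, the tx
 * rings, one burst of headroom per rxq, plus the baseline MIN_NB_MBUF. */
static uint32_t
estimate_n_mbufs(uint32_t n_rxq, uint32_t n_txq)
{
    return n_rxq * RING_SIZE
           + n_txq * RING_SIZE
           + n_rxq * NETDEV_MAX_BURST
           + MIN_NB_MBUF;
}

int
main(void)
{
    printf("4 rxq/4 txq: %"PRIu32" mbufs\n", estimate_n_mbufs(4, 4)); /* 24704 */
    printf("1 rxq/4 txq: %"PRIu32" mbufs\n", estimate_n_mbufs(1, 4)); /* 18464 */
    return 0;
}

That lands at ~24K and ~18K mbufs respectively, i.e. the figures above.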
>
> >
> >> (iii) Reduce the amount of memory allocated per mempool per port. An
> RFC to this effect was submitted by Kevin but on follow up the feeling is
> that it does not resolve the issue adequately.
> >
>
> Right, we discussed it in the other thread. It fixes some of the missed
> factors for worst case, but Pradeep pointed out there are more. To account
> for them all would be difficult and mean a lot of extra mbufs per port.
>
> > IMHO, we should not reduce the mempool size by hardcoding a small value.
> > Different installations have their own requirements because of
> > different numbers of HW NICs and CPU cores. I believe that 20-30K is a
> normal per-port size for now.
> >
>
> Would be not too far off that range with suggested change above.
>
> > How about adding an upper limit for the mempool size, configurable at
> boot time?
>
> I don't think an OVS user would know how to calculate the number of mbufs.
> Even if they did, setting it would constitute them okaying that they may
> run out of mbufs on a port.
I think the shared memory model as it was before also risks running out of
mbufs, as per the original bug where it was killing the switch.
The patch below helps with the debug side for users. Ignoring for the moment
whether they have the ability to calculate the required memory (I agree BTW, I
don't think some users would know how to do this accurately), would the new
parameter still be an issue for backwards compatibility?
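For reference, I assume such a limit would be set via the usual other_config
mechanism, e.g. something along the lines of
'ovs-vsctl set Open_vSwitch . other_config:per-port-mempool-size-limit=20000'
(key name taken from the example patch below; whether the value should be in
mbufs or MB is obviously still open).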
>
> > See below code for example (not tested):
> > -------------------------------------------------------------------------------
> > diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
> > index 3602180..ed54e06 100644
> > --- a/lib/dpdk-stub.c
> > +++ b/lib/dpdk-stub.c
> > @@ -54,3 +54,9 @@ dpdk_vhost_iommu_enabled(void)
> >  {
> >      return false;
> >  }
> > +
> > +uint32_t
> > +dpdk_mempool_size_limit(void)
> > +{
> > +    return UINT32_MAX;
> > +}
> > diff --git a/lib/dpdk.c b/lib/dpdk.c
> > index 6710d10..219e412 100644
> > --- a/lib/dpdk.c
> > +++ b/lib/dpdk.c
> > @@ -43,6 +43,9 @@ static FILE *log_stream = NULL;       /* Stream for DPDK log redirection */
> >  static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets */
> >  static bool vhost_iommu_enabled = false; /* Status of vHost IOMMU support */
> >  
> > +/* Maximum number of mbufs in a single memory pool. */
> > +static uint32_t per_port_mempool_size_limit = UINT32_MAX;
> > +
> >  static int
> >  process_vhost_flags(char *flag, const char *default_val, int size,
> >                      const struct smap *ovs_other_config,
> > @@ -313,6 +316,7 @@ dpdk_init__(const struct smap *ovs_other_config)
> >      int err = 0;
> >      cpu_set_t cpuset;
> >      char *sock_dir_subcomponent;
> > +    uint64_t size_u64;
> >  
> >      log_stream = fopencookie(NULL, "w+", dpdk_log_func);
> >      if (log_stream == NULL) {
> > @@ -351,6 +355,14 @@ dpdk_init__(const struct smap *ovs_other_config)
> >      VLOG_INFO("IOMMU support for vhost-user-client %s.",
> >                vhost_iommu_enabled ? "enabled" : "disabled");
> >  
> > +    size_u64 = smap_get_ullong(ovs_other_config, "per-port-mempool-size-limit",
> > +                               UINT32_MAX);
> > +    if (size_u64 < UINT32_MAX) {
> > +        per_port_mempool_size_limit = size_u64;
> > +        VLOG_INFO("Mempool size for a single port limited to %"PRIu32" mbufs.",
> > +                  per_port_mempool_size_limit);
> > +    }
> > +
> >      argv = grow_argv(&argv, 0, 1);
> >      argc = 1;
> >      argv[0] = xstrdup(ovs_get_program_name());
> > @@ -494,6 +506,12 @@ dpdk_vhost_iommu_enabled(void)
> >      return vhost_iommu_enabled;
> >  }
> >  
> > +uint32_t
> > +dpdk_mempool_size_limit(void)
> > +{
> > +    return per_port_mempool_size_limit;
> > +}
> > +
> >  void
> >  dpdk_set_lcore_id(unsigned cpu)
> >  {
> > diff --git a/lib/dpdk.h b/lib/dpdk.h
> > index dc58d96..4d78895 100644
> > --- a/lib/dpdk.h
> > +++ b/lib/dpdk.h
> > @@ -38,5 +38,6 @@ void dpdk_init(const struct smap *ovs_other_config);
> >  void dpdk_set_lcore_id(unsigned cpu);
> >  const char *dpdk_get_vhost_sock_dir(void);
> >  bool dpdk_vhost_iommu_enabled(void);
> > +uint32_t dpdk_mempool_size_limit(void);
> >  
> >  #endif /* dpdk.h */
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index e834c7e..b2f1072 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -518,7 +518,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
> >      char mp_name[RTE_MEMPOOL_NAMESIZE];
> >      const char *netdev_name = netdev_get_name(&dev->up);
> >      int socket_id = dev->requested_socket_id;
> > -    uint32_t n_mbufs;
> > +    uint32_t n_mbufs, max_mbufs;
> >      uint32_t hash = hash_string(netdev_name, 0);
> >      struct rte_mempool *mp = NULL;
> >  
> > @@ -534,6 +534,13 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
> >                + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
> >                + MIN_NB_MBUF;
> >  
> > +    max_mbufs = dpdk_mempool_size_limit();
> > +    if (max_mbufs < n_mbufs) {
> > +        VLOG_WARN("Mempool size for port %s limited to %"PRIu32
> > +                  " mbufs. Original estimation was %"PRIu32" mbufs.",
> > +                  netdev_name, max_mbufs, n_mbufs);
> > +    }
> > +
> >      ovs_mutex_lock(&dpdk_mp_mutex);
> >      do {
> >          /* Full DPDK memory pool name must be unique and cannot be
> >
> > -------------------------------------------------------------------------------
> >
> > I think the above approach is a good enough workaround because only the
> > user knows how much memory he/she is able to allocate for OVS and how
> > many ports are expected.
> > I'd like the limit to be in megabytes, but I didn't find a good way to
> > estimate the total size of a mempool.
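(A very rough rule of thumb for a megabyte figure: total mempool memory is
approximately n_mbufs times the per-mbuf footprint, i.e. the data room plus
RTE_PKTMBUF_HEADROOM plus the rte_mbuf header, private area and per-object
mempool overhead - somewhere in the region of 2.5-3 KB per mbuf for a 1500B
MTU, so ~20K mbufs would be roughly 50-60 MB. Back-of-envelope only, and it
shifts with MTU, which is probably part of why a clean MB-based limit is hard
to pin down.)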
> >
> >> (iv) Introduce a feature to allow users to configure mempools as shared
> or on a per port basis. This would be the best of both worlds, but given
> the proximity to the 2.9 freeze I don't think it's feasible by the end of
> January.
> >
> > This could probably solve some issues, but yes, it's definitely not for
> 2.9.
> > A good implementation and extensive testing would be required.
>
> +1. There may be other solutions too, Pradeep had a different
> suggestion, but it's not for now.
OK, we can agree to set this option aside for the moment, at least within the
OVS 2.9 code freeze period. It will definitely be looked at for 2.10,
however.
Thanks
Ian
>
> thanks,
> Kevin.
>
> >
> >>
> >> I'd appreciate people's input on this ASAP so we can reach a consensus
> on the next steps.
> >>
> >> Thanks
> >> Ian
>
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss