On Monday 09 April 2018 11:30 PM, Anatoly Burakov wrote:
This patchset introduces dynamic memory allocation for DPDK (aka memory
hotplug). It is based upon the RFC submitted in December [1].

Dependencies (to be applied in specified order):
- EAL IOVA fix [2]

Deprecation notices relevant to this patchset:
- General outline of memory hotplug changes [3]

The vast majority of changes are in the EAL and malloc; the external API
disruption is minimal: a new flag is added to the memzone API for contiguous
memory allocation, there are a few API additions in rte_memory due to the
switch to memseg_lists as opposed to memsegs, and there are a few new
convenience APIs. Every other API change is internal to EAL, and all of the
memory allocation/freeing is handled through rte_malloc, with no externally
visible API changes.

Quick outline of all changes done as part of this patchset:

  * Malloc heap adjusted to handle holes in address space
  * Single memseg list replaced by multiple memseg lists
  * VA space for hugepages is preallocated in advance
  * Pages are now allocated and freed as needed on rte_malloc/rte_free
  * Added contiguous memory allocation APIs for rte_memzone
  * Added convenience API calls to walk over memsegs
  * Integrated Pawel Wodkowski's patch for registering/unregistering memory
    with VFIO [4]
  * Callbacks for registering memory allocations
  * Callbacks for allowing/disallowing allocations above specified limit
  * Multiprocess support done via DPDK IPC introduced in 18.02
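
To illustrate the "callbacks for allowing/disallowing allocations above
specified limit" item, here is a minimal, self-contained sketch. It does not
use the DPDK API: the names toy_heap, toy_validator_t, toy_heap_grow and
toy_deny are hypothetical and exist only for this example. The assumption is
that growing the heap past a configured limit consults a user callback, which
may veto the allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical validator type: return 0 to allow, negative to deny. */
typedef int (*toy_validator_t)(size_t new_total, void *arg);

/* Hypothetical heap bookkeeping; not a DPDK structure. */
struct toy_heap {
	size_t total;        /* bytes currently backed by pages */
	size_t limit;        /* validator is consulted above this size */
	toy_validator_t cb;  /* optional validator callback */
	void *cb_arg;        /* opaque argument passed to the callback */
};

/* Try to grow the heap by 'bytes'; returns 0 on success, -1 if vetoed. */
static int toy_heap_grow(struct toy_heap *h, size_t bytes)
{
	size_t new_total = h->total + bytes;

	/* Only allocations that cross the limit are subject to validation. */
	if (new_total > h->limit && h->cb != NULL &&
	    h->cb(new_total, h->cb_arg) < 0)
		return -1;

	h->total = new_total;
	return 0;
}

/* Example validator that refuses every allocation above the limit. */
static int toy_deny(size_t new_total, void *arg)
{
	(void)new_total;
	(void)arg;
	return -1;
}
```

With a limit of 4096 bytes and toy_deny registered, growing the heap by 4096
bytes succeeds, while any further growth is rejected and leaves the total
unchanged.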

The biggest difference is that a "memseg" now represents a single page (as
opposed to being a big contiguous block of pages). As a consequence, both
memzones and malloc elements are no longer guaranteed to be physically
contiguous, unless the user asks for it at reserve time. To preserve whatever
functionality depended on the previous behavior, a legacy memory option is
also provided; however, it is expected (or perhaps vainly hoped) to be a
temporary solution.

Why multiple memseg lists instead of one? Since a memseg is a single page now,
the list of memsegs will get quite big, and we need to locate pages somehow
when we allocate and free them. We could of course just walk the list and
allocate one contiguous chunk of VA space for memsegs, but this
implementation uses separate lists instead in order to speed up many
operations with memseg lists.
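
A minimal sketch of why separate lists can make page lookup cheap, under the
assumption (made for this example only) that each list covers one
preallocated VA range with a single page size. The toy_msl struct and
toy_virt2page function are hypothetical and are not the patchset's data
structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memseg list: one VA range holding same-sized pages. */
struct toy_msl {
	uint64_t base_va; /* start of the preallocated VA range */
	uint64_t page_sz; /* page size for every entry in this list */
	uint64_t n_pages; /* number of page slots in the range */
};

/*
 * Find which list contains 'addr' and which page slot it falls into.
 * Returns the list index and stores the slot in *page_idx, or returns
 * -1 if no list covers the address.
 */
static int toy_virt2page(const struct toy_msl *lists, size_t n_lists,
			 uint64_t addr, uint64_t *page_idx)
{
	for (size_t i = 0; i < n_lists; i++) {
		const struct toy_msl *l = &lists[i];
		uint64_t end = l->base_va + l->page_sz * l->n_pages;

		if (addr >= l->base_va && addr < end) {
			/* Within a list, the slot is plain arithmetic. */
			*page_idx = (addr - l->base_va) / l->page_sz;
			return (int)i;
		}
	}
	return -1;
}
```

The search is linear only in the number of lists, which is small; finding the
page inside a list costs one subtraction and one division instead of a walk
over every page.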

For v5, the following limitations are present:
- VFIO support for multiple processes is not well-tested; work is ongoing
   to validate VFIO for all use cases
- There are known problems with PPC64 VFIO code
As below.

- For DPAA and FSLMC platforms, performance will be heavily degraded for
   IOVA as PA cases; separate patches are expected to address the issue

For testing, it is recommended to use the GitHub repository [5], as it will
have all of the dependencies already integrated.

Tested-by: Hemant Agrawal <hemant.agra...@nxp.com>
Tested-by: Santosh Shukla <santosh.shu...@caviumnetworks.com>

Tested-by: Gowrishankar Muthukrishnan <gowrishanka...@linux.vnet.ibm.com>

VFIO-related validation is still being done on powerpc, so I'll post our
arch-specific changes as I test more. This should not block this patch set
from being merged, as the changes we expect are mostly on top of the sPAPR
IOMMU (which is specific to powerpc) and do not affect other architectures.

Thanks,
Gowrishankar

v5:
     - Fixed missing DMA window creation on PPC64 for VFIO
     - fslmc VFIO fixes
     - Added new user DMA map code to keep track of user DMA maps
       when hotplug is in use (also used on PPC64 on remap)
     - A few checkpatch and commit message fixes here and there

v4:
     - Fixed bug in memzone lookup
     - Added draft fslmc VFIO code
     - Rebased on latest master + dependent patchset
     - Documented limitations for *_walk() functions

v3:
     - Lots of compile fixes
     - Fixes for multiprocess synchronization
     - Introduced support for sPAPR IOMMU, courtesy of Gowrishankar @ IBM
     - Fixes for mempool size calculation
     - Added convenience memseg walk() APIs
     - Added alloc validation callback

v2:
     - Fixed deadlock at init
     - Reverted rte_panic changes at init; this is now handled inside IPC

[1] http://dpdk.org/dev/patchwork/bundle/aburakov/Memory_RFC/
[2] http://dpdk.org/dev/patchwork/bundle/aburakov/IOVA_mode_fixes/
[3] http://dpdk.org/dev/patchwork/patch/34002/
[4] http://dpdk.org/dev/patchwork/patch/24484/
[5] https://github.com/anatolyburakov/dpdk

Anatoly Burakov (70):
   eal: move get_virtual_area out of linuxapp eal_memory.c
   eal: move all locking to heap
   eal: make malloc heap a doubly-linked list
   eal: add function to dump malloc heap contents
   test: add command to dump malloc heap contents
   eal: make malloc_elem_join_adjacent_free public
   eal: make malloc free list remove public
   eal: make malloc free return resulting malloc element
   eal: replace panics with error messages in malloc
   eal: add backend support for contiguous allocation
   eal: enable reserving physically contiguous memzones
   ethdev: use contiguous allocation for DMA memory
   crypto/qat: use contiguous allocation for DMA memory
   net/avf: use contiguous allocation for DMA memory
   net/bnx2x: use contiguous allocation for DMA memory
   net/bnxt: use contiguous allocation for DMA memory
   net/cxgbe: use contiguous allocation for DMA memory
   net/ena: use contiguous allocation for DMA memory
   net/enic: use contiguous allocation for DMA memory
   net/i40e: use contiguous allocation for DMA memory
   net/qede: use contiguous allocation for DMA memory
   net/virtio: use contiguous allocation for DMA memory
   net/vmxnet3: use contiguous allocation for DMA memory
   mempool: add support for the new allocation methods
   eal: add function to walk all memsegs
   bus/fslmc: use memseg walk instead of iteration
   bus/pci: use memseg walk instead of iteration
   net/mlx5: use memseg walk instead of iteration
   eal: use memseg walk instead of iteration
   mempool: use memseg walk instead of iteration
   test: use memseg walk instead of iteration
   vfio/type1: use memseg walk instead of iteration
   vfio/spapr: use memseg walk instead of iteration
   eal: add contig walk function
   virtio: use memseg contig walk instead of iteration
   eal: add iova2virt function
   bus/dpaa: use iova2virt instead of memseg iteration
   bus/fslmc: use iova2virt instead of memseg iteration
   crypto/dpaa_sec: use iova2virt instead of memseg iteration
   eal: add virt2memseg function
   bus/fslmc: use virt2memseg instead of iteration
   crypto/dpaa_sec: use virt2memseg instead of iteration
   net/mlx4: use virt2memseg instead of iteration
   net/mlx5: use virt2memseg instead of iteration
   eal: use memzone walk instead of iteration
   vfio: allow to map other memory regions
   eal: add "legacy memory" option
   eal: add rte_fbarray
   eal: replace memseg with memseg lists
   eal: replace memzone array with fbarray
   eal: add support for mapping hugepages at runtime
   eal: add support for unmapping pages at runtime
   eal: add "single file segments" command-line option
   eal: add API to check if memory is contiguous
   eal: prepare memseg lists for multiprocess sync
   eal: read hugepage counts from node-specific sysfs path
   eal: make use of memory hotplug for init
   eal: share hugepage info primary and secondary
   eal: add secondary process init with memory hotplug
   eal: enable memory hotplug support in rte_malloc
   eal: add support for multiprocess memory hotplug
   eal: add support for callbacks on memory hotplug
   eal: enable callbacks on malloc/free and mp sync
   vfio: enable support for mem event callbacks
   bus/fslmc: move vfio DMA map into bus probe
   bus/fslmc: enable support for mem event callbacks for vfio
   eal: enable non-legacy memory mode
   eal: add memory validator callback
   eal: enable validation before new page allocation
   eal: prevent preallocated pages from being freed

  config/common_base                                |   15 +-
  config/defconfig_i686-native-linuxapp-gcc         |    3 +
  config/defconfig_i686-native-linuxapp-icc         |    3 +
  config/defconfig_x86_x32-native-linuxapp-gcc      |    3 +
  config/rte_config.h                               |    7 +-
  doc/guides/rel_notes/deprecation.rst              |    9 -
  drivers/bus/dpaa/rte_dpaa_bus.h                   |   12 +-
  drivers/bus/fslmc/fslmc_bus.c                     |   11 +
  drivers/bus/fslmc/fslmc_vfio.c                    |  195 +++-
  drivers/bus/fslmc/portal/dpaa2_hw_pvt.h           |   27 +-
  drivers/bus/pci/Makefile                          |    3 +
  drivers/bus/pci/linux/pci.c                       |   28 +-
  drivers/bus/pci/meson.build                       |    3 +
  drivers/crypto/dpaa_sec/dpaa_sec.c                |   30 +-
  drivers/crypto/qat/qat_qp.c                       |   23 +-
  drivers/event/dpaa2/Makefile                      |    3 +
  drivers/mempool/dpaa/Makefile                     |    3 +
  drivers/mempool/dpaa/meson.build                  |    3 +
  drivers/mempool/dpaa2/Makefile                    |    3 +
  drivers/mempool/dpaa2/meson.build                 |    3 +
  drivers/net/avf/avf_ethdev.c                      |    4 +-
  drivers/net/bnx2x/bnx2x.c                         |    2 +-
  drivers/net/bnx2x/bnx2x_rxtx.c                    |    3 +-
  drivers/net/bnxt/bnxt_ethdev.c                    |   17 +-
  drivers/net/bnxt/bnxt_ring.c                      |    9 +-
  drivers/net/bnxt/bnxt_vnic.c                      |    8 +-
  drivers/net/cxgbe/sge.c                           |    3 +-
  drivers/net/dpaa/Makefile                         |    3 +
  drivers/net/dpaa2/Makefile                        |    3 +
  drivers/net/dpaa2/dpaa2_ethdev.c                  |    1 -
  drivers/net/dpaa2/meson.build                     |    3 +
  drivers/net/ena/Makefile                          |    3 +
  drivers/net/ena/base/ena_plat_dpdk.h              |    9 +-
  drivers/net/ena/ena_ethdev.c                      |   10 +-
  drivers/net/enic/enic_main.c                      |    9 +-
  drivers/net/i40e/i40e_ethdev.c                    |    4 +-
  drivers/net/i40e/i40e_rxtx.c                      |    4 +-
  drivers/net/mlx4/mlx4_mr.c                        |   18 +-
  drivers/net/mlx5/Makefile                         |    3 +
  drivers/net/mlx5/mlx5.c                           |   25 +-
  drivers/net/mlx5/mlx5_mr.c                        |   19 +-
  drivers/net/qede/base/bcm_osal.c                  |    7 +-
  drivers/net/virtio/virtio_ethdev.c                |    8 +-
  drivers/net/virtio/virtio_user/vhost_kernel.c     |   83 +-
  drivers/net/vmxnet3/vmxnet3_ethdev.c              |    5 +-
  lib/librte_eal/bsdapp/eal/Makefile                |    4 +
  lib/librte_eal/bsdapp/eal/eal.c                   |   83 +-
  lib/librte_eal/bsdapp/eal/eal_hugepage_info.c     |   65 +-
  lib/librte_eal/bsdapp/eal/eal_memalloc.c          |   48 +
  lib/librte_eal/bsdapp/eal/eal_memory.c            |  224 +++-
  lib/librte_eal/bsdapp/eal/meson.build             |    1 +
  lib/librte_eal/common/Makefile                    |    2 +-
  lib/librte_eal/common/eal_common_fbarray.c        |  859 ++++++++++++++++
  lib/librte_eal/common/eal_common_memalloc.c       |  359 +++++++
  lib/librte_eal/common/eal_common_memory.c         |  824 ++++++++++++++-
  lib/librte_eal/common/eal_common_memzone.c        |  235 +++--
  lib/librte_eal/common/eal_common_options.c        |   13 +-
  lib/librte_eal/common/eal_filesystem.h            |   30 +
  lib/librte_eal/common/eal_hugepages.h             |   11 +-
  lib/librte_eal/common/eal_internal_cfg.h          |   12 +-
  lib/librte_eal/common/eal_memalloc.h              |   80 ++
  lib/librte_eal/common/eal_options.h               |    4 +
  lib/librte_eal/common/eal_private.h               |   33 +
  lib/librte_eal/common/include/rte_eal_memconfig.h |   28 +-
  lib/librte_eal/common/include/rte_fbarray.h       |  353 +++++++
  lib/librte_eal/common/include/rte_malloc.h        |   10 +
  lib/librte_eal/common/include/rte_malloc_heap.h   |    6 +
  lib/librte_eal/common/include/rte_memory.h        |  258 ++++-
  lib/librte_eal/common/include/rte_memzone.h       |   12 +-
  lib/librte_eal/common/include/rte_vfio.h          |   41 +
  lib/librte_eal/common/malloc_elem.c               |  433 ++++++--
  lib/librte_eal/common/malloc_elem.h               |   43 +-
  lib/librte_eal/common/malloc_heap.c               |  704 ++++++++++++-
  lib/librte_eal/common/malloc_heap.h               |   15 +-
  lib/librte_eal/common/malloc_mp.c                 |  744 ++++++++++++++
  lib/librte_eal/common/malloc_mp.h                 |   86 ++
  lib/librte_eal/common/meson.build                 |    4 +
  lib/librte_eal/common/rte_malloc.c                |   85 +-
  lib/librte_eal/linuxapp/eal/Makefile              |    5 +
  lib/librte_eal/linuxapp/eal/eal.c                 |   62 +-
  lib/librte_eal/linuxapp/eal/eal_hugepage_info.c   |  218 +++-
  lib/librte_eal/linuxapp/eal/eal_memalloc.c        | 1123 +++++++++++++++++++++
  lib/librte_eal/linuxapp/eal/eal_memory.c          | 1119 ++++++++++++--------
  lib/librte_eal/linuxapp/eal/eal_vfio.c            |  870 ++++++++++++++--
  lib/librte_eal/linuxapp/eal/eal_vfio.h            |   12 +
  lib/librte_eal/linuxapp/eal/meson.build           |    1 +
  lib/librte_eal/rte_eal_version.map                |   30 +-
  lib/librte_ether/rte_ethdev.c                     |    3 +-
  lib/librte_mempool/Makefile                       |    3 +
  lib/librte_mempool/meson.build                    |    3 +
  lib/librte_mempool/rte_mempool.c                  |  149 ++-
  test/test/commands.c                              |    3 +
  test/test/test_malloc.c                           |   30 +-
  test/test/test_memory.c                           |   27 +-
  test/test/test_memzone.c                          |   62 +-
  95 files changed, 8794 insertions(+), 1285 deletions(-)
  create mode 100644 lib/librte_eal/bsdapp/eal/eal_memalloc.c
  create mode 100644 lib/librte_eal/common/eal_common_fbarray.c
  create mode 100644 lib/librte_eal/common/eal_common_memalloc.c
  create mode 100644 lib/librte_eal/common/eal_memalloc.h
  create mode 100644 lib/librte_eal/common/include/rte_fbarray.h
  create mode 100644 lib/librte_eal/common/malloc_mp.c
  create mode 100644 lib/librte_eal/common/malloc_mp.h
  create mode 100644 lib/librte_eal/linuxapp/eal/eal_memalloc.c

