Re: [dpdk-dev] [PATCH v3 00/68] Memory Hotplug for DPDK

2018-04-06 Thread Burakov, Anatoly

On 06-Apr-18 1:01 PM, Hemant Agrawal wrote:
> Hi Thomas
>
> > > -----Original Message-----
> > > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > > Sent: Thursday, April 05, 2018 7:43 PM
> > > To: Shreyansh Jain 
> > > Cc: Anatoly Burakov ; dev@dpdk.org;
> > > keith.wi...@intel.com; jianfeng@intel.com;
> > > andras.kov...@ericsson.com; laszlo.vadk...@ericsson.com;
> > > benjamin.wal...@intel.com; bruce.richard...@intel.com;
> > > konstantin.anan...@intel.com; kuralamudhan.ramakrish...@intel.com;
> > > louise.m.d...@intel.com; nelio.laranje...@6wind.com;
> > > ys...@mellanox.com; peppe...@japf.ch; jerin.ja...@caviumnetworks.com;
> > > Hemant Agrawal ; olivier.m...@6wind.com;
> > > gowrishanka...@linux.vnet.ibm.com
> > > Subject: Re: [dpdk-dev] [PATCH v3 00/68] Memory Hotplug for DPDK
> > > Importance: High
> > >
> > > 05/04/2018 16:24, Shreyansh Jain:
> > > > The physical addressing cases for both dpaa and dpaa2 depend
> > > > heavily on the fact that physical addressing was the base
> > > > addressing scheme and that addresses were available in sorted
> > > > order. This is reversed/negated by hotplugging support, so both
> > > > drivers need to be reworked from this perspective. There are some
> > > > suggestions floated by Anatoly and internally, but work still
> > > > needs to be done.
> > > > It also impacts a lot of virtualization (no-IOMMU) use cases.
> > >
> > > So what is your recommendation?
> > > Can you rework the PA case in the dpaa/dpaa2 drivers within the
> > > 18.05 timeframe?
> > >
> > We would like 2-3 more days before we can ack/nack this patch.
> > We are working on this on priority; the PA case rework is not a
> > trivial change.
>
> The patch is good to go. However, we will be making changes in the
> dpaa/dpaa2 drivers to fix the PA issues shortly (within the 18.05
> timeframe).

That's great to hear!

> Anatoly needs to take care of the following:
> 1. Comment by Shreyansh on "Re: [dpdk-dev] [PATCH v3 50/68] eal:
> replace memzone array with fbarray"

Yes, that is already fixed both on GitHub and in the upcoming v4.

> 2. I could not apply the patches cleanly on current master.

The patchset has dependencies, listed in the cover letter. I'll rebase
on latest master before sending v4 just in case.

> Tested-by: Hemant Agrawal 
>
> Regards,
> Hemant







--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH v3 00/68] Memory Hotplug for DPDK

2018-04-06 Thread Hemant Agrawal
Hi Thomas

> > -----Original Message-----
> > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > Sent: Thursday, April 05, 2018 7:43 PM
> > To: Shreyansh Jain 
> > Cc: Anatoly Burakov ; dev@dpdk.org;
> > keith.wi...@intel.com; jianfeng@intel.com;
> > andras.kov...@ericsson.com; laszlo.vadk...@ericsson.com;
> > benjamin.wal...@intel.com; bruce.richard...@intel.com;
> > konstantin.anan...@intel.com; kuralamudhan.ramakrish...@intel.com;
> > louise.m.d...@intel.com; nelio.laranje...@6wind.com;
> > ys...@mellanox.com; peppe...@japf.ch; jerin.ja...@caviumnetworks.com;
> > Hemant Agrawal ; olivier.m...@6wind.com;
> > gowrishanka...@linux.vnet.ibm.com
> > Subject: Re: [dpdk-dev] [PATCH v3 00/68] Memory Hotplug for DPDK
> > Importance: High
> >
> > 05/04/2018 16:24, Shreyansh Jain:
> > > The physical addressing cases for both dpaa and dpaa2 depend
> > > heavily on the fact that physical addressing was the base
> > > addressing scheme and that addresses were available in sorted
> > > order. This is reversed/negated by hotplugging support, so both
> > > drivers need to be reworked from this perspective. There are some
> > > suggestions floated by Anatoly and internally, but work still
> > > needs to be done.
> > > It also impacts a lot of virtualization (no-IOMMU) use cases.
> >
> > So what is your recommendation?
> > Can you rework the PA case in the dpaa/dpaa2 drivers within the 18.05
> > timeframe?
> >
> We would like 2-3 more days before we can ack/nack this patch.
> We are working on this on priority; the PA case rework is not a trivial
> change.

The patch is good to go. However, we will be making changes in the
dpaa/dpaa2 drivers to fix the PA issues shortly (within the 18.05
timeframe).

Anatoly needs to take care of the following:
1. Comment by Shreyansh on "Re: [dpdk-dev] [PATCH v3 50/68] eal: replace
memzone array with fbarray"
2. I could not apply the patches cleanly on current master.

Tested-by: Hemant Agrawal 
> Regards,
> Hemant
> 



Re: [dpdk-dev] [PATCH v3 00/68] Memory Hotplug for DPDK

2018-04-05 Thread santosh
Hi Anatoly,

On Wednesday 04 April 2018 04:51 AM, Anatoly Burakov wrote:
> This patchset introduces dynamic memory allocation for DPDK (aka memory
> hotplug), based upon the RFC submitted in December [1].
>
> Dependencies (to be applied in specified order):
> - IPC asynchronous request API patch [2]
> - Function to return number of sockets [3]
> - EAL IOVA fix [4]
>
> Deprecation notices relevant to this patchset:
> - General outline of memory hotplug changes [5]
> - EAL NUMA node count changes [6]
>
> The vast majority of changes are in the EAL and malloc; the external API
> disruption is minimal: a new set of APIs is added for contiguous memory
> allocation for rte_memzone, and a few APIs are added to rte_memory due
> to the switch to memseg lists as opposed to memsegs. Every other API
> change is internal to EAL, and all of the memory allocation/freeing is
> handled through rte_malloc, with no externally visible API changes.
>
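To make that concrete, a minimal sketch of what reserving an
IOVA-contiguous memzone could look like. The *_contig() entry point is
an assumption based on the description above; the final patches may
expose this as a reservation flag instead.

/* Sketch: reserve a physically contiguous memzone for a HW ring.
 * rte_memzone_reserve_contig() is an assumed name; its parameters are
 * taken to mirror rte_memzone_reserve(). */
#include <rte_memzone.h>
#include <rte_lcore.h>

static const struct rte_memzone *
reserve_hw_ring(size_t len)
{
        /* unlike the new default behaviour, pages backing this zone
         * are guaranteed to be physically contiguous */
        return rte_memzone_reserve_contig("hw_ring", len,
                                          rte_socket_id(), 0);
}
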
> Quick outline of all changes done as part of this patchset:
>
>  * Malloc heap adjusted to handle holes in address space
>  * Single memseg list replaced by multiple memseg lists
>  * VA space for hugepages is preallocated
>  * Added alloc/free for pages, happening as needed on rte_malloc/rte_free
>  * Added contiguous memory allocation APIs for rte_memzone
>  * Added convenience API calls to walk over memsegs
>  * Integrated Pawel Wodkowski's patch for registering/unregistering memory
>    with VFIO [7]
>  * Callbacks for registering memory allocations (see the sketch below)
>  * Callbacks for allowing/disallowing allocations above a specified limit
>  * Multiprocess support done via DPDK IPC introduced in 18.02
>
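For the allocation callbacks, driver-side usage could look roughly like
the sketch below, e.g. for a device that must DMA-map memory as it
appears. Names and signatures follow the outline above but are
assumptions; the final interface may differ.

#include <rte_memory.h>

/* called by EAL whenever hugepage memory is allocated or freed */
static void
mem_event_cb(enum rte_mem_event event_type, const void *addr, size_t len,
             void *arg)
{
        (void)addr; (void)len; (void)arg; /* used by real mapping code */
        if (event_type == RTE_MEM_EVENT_ALLOC) {
                /* new pages at [addr, addr + len): map them for DMA */
        } else {
                /* RTE_MEM_EVENT_FREE: unmap the range */
        }
}

static void
register_mem_events(void)
{
        /* assumed registration entry point */
        rte_mem_event_callback_register("my-driver", mem_event_cb, NULL);
}
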
> The biggest difference is that a "memseg" now represents a single page
> (as opposed to being a big contiguous block of pages). As a consequence,
> both memzones and malloc elements are no longer guaranteed to be
> physically contiguous, unless the user asks for it at reserve time. To
> preserve whatever functionality was dependent on the previous behavior,
> a legacy memory option is also provided; however, it is expected (or
> perhaps vainly hoped) to be a temporary solution.
>
> Why multiple memseg lists instead of one? Since a memseg is now a single
> page, the list of memsegs will get quite big, and we need to locate
> pages somehow when we allocate and free them. We could of course just
> walk the list and allocate one contiguous chunk of VA space for all
> memsegs, but this implementation uses separate lists instead, in order
> to speed up many memseg-list operations.
>
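In practice this means code can no longer index one flat memseg array;
iteration goes through a walk callback instead. A sketch, with the
callback signature assumed from the outline above:

#include <rte_memory.h>

/* visits every allocated memseg; returning non-zero stops the walk */
static int
sum_cb(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
       void *arg)
{
        size_t *total = arg;

        (void)msl; /* per-list metadata not needed here */
        *total += ms->len;
        return 0;
}

static size_t
total_hugepage_mem(void)
{
        size_t total = 0;

        rte_memseg_walk(sum_cb, &total);
        return total;
}
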
> For v3, the following limitations are present:
> - VFIO support is only smoke-tested (but is expected to work), VFIO support
>   with secondary processes is not tested; work is ongoing to validate VFIO
>   for all use cases
> - FSLMC bus VFIO code is not yet integrated, work is in progress
>
> For testing, it is recommended to use the GitHub repository [8], as it will
> have all of the dependencies already integrated.
>
> v3:
> - Lots of compile fixes
> - Fixes for multiprocess synchronization
> - Introduced support for sPAPR IOMMU, courtesy of Gowrishankar @ IBM
> - Fixes for mempool size calculation
> - Added convenience memseg walk() APIs
> - Added alloc validation callback
>
> v2: - fixed deadlock at init
> - reverted rte_panic changes at init, this is now handled inside IPC

Tested-by: Santosh Shukla 



Re: [dpdk-dev] [PATCH v3 00/68] Memory Hotplug for DPDK

2018-04-05 Thread Hemant Agrawal
Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Thursday, April 05, 2018 7:43 PM
> To: Shreyansh Jain 
> Cc: Anatoly Burakov ; dev@dpdk.org;
> keith.wi...@intel.com; jianfeng@intel.com; andras.kov...@ericsson.com;
> laszlo.vadk...@ericsson.com; benjamin.wal...@intel.com;
> bruce.richard...@intel.com; konstantin.anan...@intel.com;
> kuralamudhan.ramakrish...@intel.com; louise.m.d...@intel.com;
> nelio.laranje...@6wind.com; ys...@mellanox.com; peppe...@japf.ch;
> jerin.ja...@caviumnetworks.com; Hemant Agrawal
> ; olivier.m...@6wind.com;
> gowrishanka...@linux.vnet.ibm.com
> Subject: Re: [dpdk-dev] [PATCH v3 00/68] Memory Hotplug for DPDK
> Importance: High
> 
> 05/04/2018 16:24, Shreyansh Jain:
> > The physical addressing cases for both dpaa and dpaa2 depend heavily
> > on the fact that physical addressing was the base addressing scheme
> > and that addresses were available in sorted order. This is
> > reversed/negated by hotplugging support, so both drivers need to be
> > reworked from this perspective. There are some suggestions floated by
> > Anatoly and internally, but work still needs to be done.
> > It also impacts a lot of virtualization (no-IOMMU) use cases.
> 
> So what is your recommendation?
> Can you rework the PA case in the dpaa/dpaa2 drivers within the 18.05
> timeframe?
> 
We would like 2-3 more days before we can ack/nack this patch.
We are working on this on priority; the PA case rework is not a trivial
change.

Regards,
Hemant




Re: [dpdk-dev] [PATCH v3 00/68] Memory Hotplug for DPDK

2018-04-05 Thread Thomas Monjalon
05/04/2018 16:24, Shreyansh Jain:
> The physical addressing cases for both dpaa and dpaa2 depend heavily on
> the fact that physical addressing was the base addressing scheme and
> that addresses were available in sorted order. This is reversed/negated
> by hotplugging support, so both drivers need to be reworked from this
> perspective. There are some suggestions floated by Anatoly and
> internally, but work still needs to be done.
> It also impacts a lot of virtualization (no-IOMMU) use cases.

So what is your recommendation?
Can you rework the PA case in the dpaa/dpaa2 drivers within the 18.05
timeframe?





Re: [dpdk-dev] [PATCH v3 00/68] Memory Hotplug for DPDK

2018-04-05 Thread Shreyansh Jain

Hello Anatoly,

On Wednesday 04 April 2018 04:51 AM, Anatoly Burakov wrote:

> This patchset introduces dynamic memory allocation for DPDK (aka memory
> hotplug), based upon the RFC submitted in December [1].
>
> Dependencies (to be applied in specified order):
> - IPC asynchronous request API patch [2]
> - Function to return number of sockets [3]
> - EAL IOVA fix [4]
>
> Deprecation notices relevant to this patchset:
> - General outline of memory hotplug changes [5]
> - EAL NUMA node count changes [6]
>
> The vast majority of changes are in the EAL and malloc; the external API
> disruption is minimal: a new set of APIs is added for contiguous memory
> allocation for rte_memzone, and a few APIs are added to rte_memory due
> to the switch to memseg lists as opposed to memsegs. Every other API
> change is internal to EAL, and all of the memory allocation/freeing is
> handled through rte_malloc, with no externally visible API changes.
>
> Quick outline of all changes done as part of this patchset:
>
>  * Malloc heap adjusted to handle holes in address space
>  * Single memseg list replaced by multiple memseg lists
>  * VA space for hugepages is preallocated
>  * Added alloc/free for pages, happening as needed on rte_malloc/rte_free
>  * Added contiguous memory allocation APIs for rte_memzone
>  * Added convenience API calls to walk over memsegs
>  * Integrated Pawel Wodkowski's patch for registering/unregistering memory
>    with VFIO [7]
>  * Callbacks for registering memory allocations
>  * Callbacks for allowing/disallowing allocations above a specified limit
>    (see the sketch after this list)
>  * Multiprocess support done via DPDK IPC introduced in 18.02
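
A sketch of the allocation validator mentioned in the list above, which
an application could use to cap how much memory DPDK grows to at
runtime. Signatures and the registration call are assumptions based on
this changelist:

#include <stdio.h>
#include <rte_memory.h>

#define MEM_LIMIT ((size_t)1 << 30) /* illustrative 1 GiB cap */

/* consulted before the heap on socket_id is grown to new_len bytes;
 * returning -1 rejects the triggering allocation */
static int
validator_cb(int socket_id, size_t cur_limit, size_t new_len)
{
        if (new_len > cur_limit) {
                printf("socket %d: denying growth to %zu bytes\n",
                       socket_id, new_len);
                return -1;
        }
        return 0;
}

static void
register_validator(void)
{
        /* assumed entry point: fire the callback for socket 0 once its
         * heap would grow past MEM_LIMIT */
        rte_mem_alloc_validator_register("my-app", validator_cb, 0,
                                         MEM_LIMIT);
}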

> The biggest difference is that a "memseg" now represents a single page
> (as opposed to being a big contiguous block of pages). As a consequence,
> both memzones and malloc elements are no longer guaranteed to be
> physically contiguous, unless the user asks for it at reserve time. To
> preserve whatever functionality was dependent on the previous behavior,
> a legacy memory option is also provided; however, it is expected (or
> perhaps vainly hoped) to be a temporary solution.
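
In practice, an application that relied on the old behaviour can keep
running unchanged by passing the legacy switch in its EAL arguments,
e.g. with testpmd (everything apart from --legacy-mem is the usual
boilerplate):

# pre-allocate all memory at startup, as before (no hotplug)
./testpmd -l 0-3 -n 4 --legacy-mem -- -i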

> Why multiple memseg lists instead of one? Since a memseg is now a single
> page, the list of memsegs will get quite big, and we need to locate
> pages somehow when we allocate and free them. We could of course just
> walk the list and allocate one contiguous chunk of VA space for all
> memsegs, but this implementation uses separate lists instead, in order
> to speed up many memseg-list operations.

> For v3, the following limitations are present:
> - VFIO support is only smoke-tested (but is expected to work); VFIO
>   support with secondary processes is not tested; work is ongoing to
>   validate VFIO for all use cases
> - FSLMC bus VFIO code is not yet integrated, work is in progress
>
> For testing, it is recommended to use the GitHub repository [8], as it
> will have all of the dependencies already integrated.
>
> v3:
> - Lots of compile fixes
> - Fixes for multiprocess synchronization
> - Introduced support for sPAPR IOMMU, courtesy of Gowrishankar @ IBM
> - Fixes for mempool size calculation
> - Added convenience memseg walk() APIs
> - Added alloc validation callback
>
> v2: - fixed deadlock at init
> - reverted rte_panic changes at init, this is now handled inside IPC
>
> [1] http://dpdk.org/dev/patchwork/bundle/aburakov/Memory_RFC/
> [2] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Async_Request/
> [3] http://dpdk.org/dev/patchwork/bundle/aburakov/Num_Sockets/
> [4] http://dpdk.org/dev/patchwork/bundle/aburakov/IOVA_mode_fixes/
> [5] http://dpdk.org/dev/patchwork/patch/34002/
> [6] http://dpdk.org/dev/patchwork/patch/33853/
> [7] http://dpdk.org/dev/patchwork/patch/24484/
> [8] https://github.com/anatolyburakov/dpdk



Thanks for the huge work, and for continuously answering my barrage of
questions. Here is the update for dpaa/dpaa2, incremental to [1]:

Note: results are based on the GitHub repo (16fbfef04a37bb9d714) rather
than this series applied over master, though that shouldn't make much of
a difference.


PA: Physical Addressing mode
VA: Virtual Addressing mode
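
For reference, a process can check at runtime which mode EAL settled on;
a minimal sketch (the setup_*_path() helpers are hypothetical, and the
header holding enum rte_iova_mode may vary by version):

#include <rte_eal.h>
#include <rte_bus.h> /* enum rte_iova_mode (assumed location) */

static void setup_pa_path(void) { /* PA-specific init (hypothetical) */ }
static void setup_va_path(void) { /* VA-specific init (hypothetical) */ }

static void
choose_addressing(void)
{
        if (rte_eal_iova_mode() == RTE_IOVA_VA)
                setup_va_path(); /* DMA addresses == process VAs */
        else
                setup_pa_path(); /* DMA uses physical addresses */
}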

1. DPAA2 PA and VA modes work OK over the GitHub repo:
   a. Almost equal performance for the VA cases as compared to
      non-hotplug, whether with --legacy-mem or without it. ($ see below)
   b. 70-90% drop in performance for the PA case, depending on the page
      size used. (# see below)

2. DPAA PA (there is no VA mode) works with a minor fix over v3, which
   Anatoly knows about and might incorporate in v4 (patch 50/68):
   a. 70-90% performance drop. (# see below)

($)
There are some changes in the dpaa2 code base which I will share against
the relevant patches in this series; they can be incorporated into v4 to
enable dpaa2 in VA mode.


(#)
The physical addressing cases for both dpaa and dpaa2 depend heavily on
the fact that physical addressing was the base addressing scheme and
that addresses were available in sorted order. This is reversed/negated
by hotplugging support, so both drivers need to be reworked from this
perspective. There are some suggestions floated by Anatoly and
internally, but work still needs to be done.
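
To make the constraint concrete: with hotplug, a PA-dependent driver can
no longer assume the memsegs form one sorted run of physical addresses,
and would have to gather and sort the segments itself. A sketch, using
the same assumed walk API as above:

#include <stdlib.h>
#include <rte_memory.h>

#define MAX_SEGS 1024

struct seg { rte_iova_t iova; size_t len; };
struct seg_array { struct seg segs[MAX_SEGS]; int n; };

/* walk callback: record each memseg's IOVA and length */
static int
collect_cb(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
           void *arg)
{
        struct seg_array *a = arg;

        (void)msl;
        if (a->n == MAX_SEGS)
                return -1; /* abort the walk: too many segments */
        a->segs[a->n].iova = ms->iova;
        a->segs[a->n].len = ms->len;
        a->n++;
        return 0;
}

static int
cmp_iova(const void *p1, const void *p2)
{
        const struct seg *s1 = p1, *s2 = p2;

        return (s1->iova > s2->iova) - (s1->iova < s2->iova);
}

/* rebuild the sorted-by-PA view the PA code path used to get for free */
static int
build_sorted_pa_view(struct seg_array *a)
{
        a->n = 0;
        if (rte_memseg_walk(collect_cb, a) != 0)
                return -1;
        qsort(a->segs, a->n, sizeof(a->segs[0]), cmp_iova);
        return 0;
}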