Re: [dpdk-dev] [PATCH v2 13/41] eal: replace memseg with memseg lists

2018-03-24 Thread Burakov, Anatoly

On 24-Mar-18 12:23 PM, santosh wrote:
> On Saturday 24 March 2018 04:38 PM, Burakov, Anatoly wrote:
>> On 24-Mar-18 6:01 AM, santosh wrote:
>>> Hi Anatoly,
>>>
>>> Thanks for the good work!
>>> A few observations:
>>> # Noticed a performance regression on the thunderx platform for the l3fwd
>>> application: performance drops by 3%. git bisect shows this changeset is
>>> the offending commit. I'm still investigating the reason for the perf dip.
>>> Would like to know: have you noticed any regression on the x86 platform?
>>
>> I haven't noticed any regressions on x86. Would it by any chance be due to
>> the fact that memory segments are now non-contiguous or are allocated from
>> smaller page sizes first?
>>
>> I am in the process of preparing a v3, which moves some things around and is
>> better at git bisect (and fixes all compile issues I am or was made aware
>> of). Does the performance regression also happen in legacy mode?
>
> The test was run in legacy mode only, and I noticed the performance
> regression there.
>
> Thanks.
> [..]

Legacy mode does not do IPC memory allocation, so that is out of the 
question. Does thunderx do any address translation or other memory 
lookups on fast path? That is the only thing that comes to mind that 
could affect performance once all allocations are complete.


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH v2 13/41] eal: replace memseg with memseg lists

2018-03-24 Thread santosh


On Saturday 24 March 2018 04:38 PM, Burakov, Anatoly wrote:
> On 24-Mar-18 6:01 AM, santosh wrote:
>> Hi Anatoly,
>>
>> Thanks for the good work!
>> A few observations:
>> # Noticed a performance regression on the thunderx platform for the l3fwd
>> application: performance drops by 3%. git bisect shows this changeset is
>> the offending commit. I'm still investigating the reason for the perf dip.
>> Would like to know: have you noticed any regression on the x86 platform?
>
> I haven't noticed any regressions on x86. Would it by any chance be due to 
> the fact that memory segments are now non-contiguous or are allocated from 
> smaller page sizes first?
>
> I am in the process of preparing a v3, which moves some things around and is
> better at git bisect (and fixes all compile issues I am or was made aware
> of). Does the performance regression also happen in legacy mode?
>
The test was run in legacy mode only, and I noticed the performance regression there.

Thanks.
[..]



Re: [dpdk-dev] [PATCH v2 13/41] eal: replace memseg with memseg lists

2018-03-24 Thread Burakov, Anatoly

On 24-Mar-18 6:01 AM, santosh wrote:
> Hi Anatoly,
>
> Thanks for the good work!
> A few observations:
> # Noticed a performance regression on the thunderx platform for the l3fwd
> application: performance drops by 3%. git bisect shows this changeset is
> the offending commit. I'm still investigating the reason for the perf dip.
> Would like to know: have you noticed any regression on the x86 platform?

I haven't noticed any regressions on x86. Would it by any chance be due 
to the fact that memory segments are now non-contiguous or are allocated 
from smaller page sizes first?


I am in the process of preparing a v3, which moves some things around
and is better at git bisect (and fixes all compile issues I am or was
made aware of). Does the performance regression also happen in legacy mode?


Thanks for testing!


> Perhaps you could club all of the commits below into a single patch,
> as the changes are identical; that way you'd reduce the patch count by a few.
> 9a1e2a7bd9f6248c680ad3e444b6f173eb92d457 net/vmxnet3: use contiguous allocation for DMA memory
> 46388b194cd559b5cf7079e01b04bf67a99b64d7 net/virtio: use contiguous allocation for DMA memory
> a3d2eb10bd998ba3ae3a3d39adeaff38d2e53a9d net/qede: use contiguous allocation for DMA memory
> 6f16b23ef1f472db475edf05159dea5ae741dbf8 net/i40e: use contiguous allocation for DMA memory
> f9f7576eed35cb6aa50793810cdda43bcc0f4642 net/enic: use contiguous allocation for DMA memory
> 2af6c33009b8008da7028a351efed2932b1a13d0 net/ena: use contiguous allocation for DMA memory
> 18003e22bd7087e5e2e03543cb662d554f7bec52 net/cxgbe: use contiguous allocation for DMA memory
> 59f79182502dcb3634dfa3e7b918195829777460 net/bnx2x: use contiguous allocation for DMA memory
> f481a321e41da82ddfa00f5ddbcb42fc29e6ae76 net/avf: use contiguous allocation for DMA memory
> 5253e9b757c1855a296656d939f5c28e651fea69 crypto/qat: use contiguous allocation for DMA memory
> 297ab037b4c0d9d725aa6cfdd2c33f7cd9396899 ethdev: use contiguous allocation for DMA memory


I would like to keep these as separate patches. It makes it easier to 
track which changes were accepted by maintainers of respective drivers, 
and which weren't.


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH v2 13/41] eal: replace memseg with memseg lists

2018-03-23 Thread santosh
Hi Anatoly,


On Wednesday 07 March 2018 10:26 PM, Anatoly Burakov wrote:
> Before, we were aggregating multiple pages into one memseg, so the
> number of memsegs was small. Now, each page gets its own memseg,
> so the list of memsegs is huge. To accommodate the new memseg list
> size and to keep the under-the-hood workings sane, the memseg list
> is now not just a single list, but multiple lists. To be precise,
> each hugepage size available on the system gets one or more memseg
> lists, per socket.
>
> In order to support dynamic memory allocation, we reserve all
> memory in advance. As in, we do an anonymous mmap() of the entire
> maximum size of memory per hugepage size, per socket (which is
> limited to either RTE_MAX_MEMSEG_PER_TYPE pages or
> RTE_MAX_MEM_PER_TYPE gigabytes worth of memory, whichever is the
> smaller one), split over multiple lists (which are limited to
> either RTE_MAX_MEMSEG_PER_LIST memsegs or RTE_MAX_MEM_PER_LIST
> gigabytes per list, whichever is the smaller one).
>
> So, for each hugepage size, we get (by default) up to 128G worth
> of memory, per socket, split into chunks of up to 32G in size.
> The address space is claimed at the start, in eal_common_memory.c.
> The actual page allocation code is in eal_memalloc.c (Linux-only
> for now), and largely consists of copied EAL memory init code.
>
> Pages in the list are also indexed by address. That is, for
> non-legacy mode, in order to figure out where the page belongs,
> one can simply look at base address for a memseg list. Similarly,
> figuring out IOVA address of a memzone is a matter of finding the
> right memseg list, getting offset and dividing by page size to get
> the appropriate memseg. For legacy mode, old behavior of walking
> the memseg list remains.
>
> Due to switch to fbarray and to avoid any intrusive changes,
> secondary processes are not supported in this commit. Also, one
> particular API call (dump physmem layout) no longer makes sense
> and was removed, according to deprecation notice [1].
>
> In legacy mode, nothing is preallocated, and all memsegs are in
> a list like before, but each segment still resides in an appropriate
> memseg list.
>
> The rest of the changes are really ripple effects from the memseg
> change - heap changes, compile fixes, and rewrites to support
> fbarray-backed memseg lists.
>
> [1] http://dpdk.org/dev/patchwork/patch/34002/
>
> Signed-off-by: Anatoly Burakov 
> ---

Thanks for the good work!
A few observations:
# Noticed a performance regression on the thunderx platform for the l3fwd
application: performance drops by 3%. git bisect shows this changeset is
the offending commit. I'm still investigating the reason for the perf dip.
Would like to know: have you noticed any regression on the x86 platform?

# In the next version, please make sure that each individual patch builds
successfully. Right now, some patches depend on others, which leads to build
breaks; I observed this while git-bisecting.

A few examples are:
>> fa71cdef6963ed795fdd7e7f35085170bb300e39
>> 1037fcd989176c5cc83db6223534205cac469765
>> befdec10759d30275a17a829919ee45228d91d3c
>> 495e60f4e02af8a344c0f817a60d1ee9b9322df4
[The above commits are from your GitHub repo.]

# Nits:
Perhaps you could club all of the commits below into a single patch,
as the changes are identical; that way you'd reduce the patch count by a few.
9a1e2a7bd9f6248c680ad3e444b6f173eb92d457 net/vmxnet3: use contiguous allocation for DMA memory
46388b194cd559b5cf7079e01b04bf67a99b64d7 net/virtio: use contiguous allocation for DMA memory
a3d2eb10bd998ba3ae3a3d39adeaff38d2e53a9d net/qede: use contiguous allocation for DMA memory
6f16b23ef1f472db475edf05159dea5ae741dbf8 net/i40e: use contiguous allocation for DMA memory
f9f7576eed35cb6aa50793810cdda43bcc0f4642 net/enic: use contiguous allocation for DMA memory
2af6c33009b8008da7028a351efed2932b1a13d0 net/ena: use contiguous allocation for DMA memory
18003e22bd7087e5e2e03543cb662d554f7bec52 net/cxgbe: use contiguous allocation for DMA memory
59f79182502dcb3634dfa3e7b918195829777460 net/bnx2x: use contiguous allocation for DMA memory
f481a321e41da82ddfa00f5ddbcb42fc29e6ae76 net/avf: use contiguous allocation for DMA memory
5253e9b757c1855a296656d939f5c28e651fea69 crypto/qat: use contiguous allocation for DMA memory
297ab037b4c0d9d725aa6cfdd2c33f7cd9396899 ethdev: use contiguous allocation for DMA memory

Thanks.



[dpdk-dev] [PATCH v2 13/41] eal: replace memseg with memseg lists

2018-03-07 Thread Anatoly Burakov
Before, we were aggregating multiple pages into one memseg, so the
number of memsegs was small. Now, each page gets its own memseg,
so the list of memsegs is huge. To accommodate the new memseg list
size and to keep the under-the-hood workings sane, the memseg list
is now not just a single list, but multiple lists. To be precise,
each hugepage size available on the system gets one or more memseg
lists, per socket.

In order to support dynamic memory allocation, we reserve all
memory in advance. As in, we do an anonymous mmap() of the entire
maximum size of memory per hugepage size, per socket (which is
limited to either RTE_MAX_MEMSEG_PER_TYPE pages or
RTE_MAX_MEM_PER_TYPE gigabytes worth of memory, whichever is the
smaller one), split over multiple lists (which are limited to
either RTE_MAX_MEMSEG_PER_LIST memsegs or RTE_MAX_MEM_PER_LIST
gigabytes per list, whichever is the smaller one).
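
As a rough illustration of the up-front reservation described above (a
sketch, not code from this patch): an anonymous mmap() with PROT_NONE claims
a range of virtual addresses without backing it with pages. The helper names
reserve_va() and release_va(), and the choice of PROT_NONE with MAP_PRIVATE,
are assumptions made for the example; the patch itself may use different
flags.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: claim `len` bytes of virtual address space.
 * PROT_NONE means the range is reserved but not yet accessible.
 * Returns NULL on failure.
 */
static void *reserve_va(size_t len)
{
    void *addr = mmap(NULL, len, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return addr == MAP_FAILED ? NULL : addr;
}

/* Hypothetical helper: give the reserved range back to the kernel. */
static int release_va(void *addr, size_t len)
{
    assert(addr != NULL);
    return munmap(addr, len);
}
```

A caller would reserve one such range per memseg list and map real hugepages
into it later, at allocation time.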

So, for each hugepage size, we get (by default) up to 128G worth
of memory, per socket, split into chunks of up to 32G in size.
The address space is claimed at the start, in eal_common_memory.c.
The actual page allocation code is in eal_memalloc.c (Linux-only
for now), and largely consists of copied EAL memory init code.
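
The arithmetic behind the 128G and 32G figures can be sketched as follows.
The constants are the defaults from the config/common_base hunk of this
patch; the function names are hypothetical, and the sketch ignores the
separate cap on the total number of lists (RTE_MAX_MEMSEG_LISTS).

```c
#include <assert.h>
#include <stdint.h>

/* Defaults from the config/common_base hunk of this patch. */
#define MAX_MEMSEG_PER_LIST 8192ULL
#define MAX_MEM_PER_LIST_GB 32ULL
#define MAX_MEMSEG_PER_TYPE 32768ULL
#define MAX_MEM_PER_TYPE_GB 128ULL

#define GB (1ULL << 30)

static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/* Hypothetical: address space covered by one memseg list, in bytes. */
static uint64_t list_bytes(uint64_t page_sz)
{
    assert(page_sz != 0);
    return min_u64(MAX_MEMSEG_PER_LIST * page_sz, MAX_MEM_PER_LIST_GB * GB);
}

/* Hypothetical: address space per (page size, socket) type, in bytes. */
static uint64_t type_bytes(uint64_t page_sz)
{
    assert(page_sz != 0);
    return min_u64(MAX_MEMSEG_PER_TYPE * page_sz, MAX_MEM_PER_TYPE_GB * GB);
}

/* Hypothetical: number of lists needed to cover one type (rounded up). */
static uint64_t lists_per_type(uint64_t page_sz)
{
    uint64_t per_list = list_bytes(page_sz);
    return (type_bytes(page_sz) + per_list - 1) / per_list;
}
```

With these defaults, a 2M page size is limited by the page counts (16G per
list, 64G per type), while a 1G page size is limited by the memory caps (32G
per list, 128G per type). Both cases work out to four lists per type.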

Pages in the list are also indexed by address. That is, for
non-legacy mode, in order to figure out where the page belongs,
one can simply look at base address for a memseg list. Similarly,
figuring out IOVA address of a memzone is a matter of finding the
right memseg list, getting offset and dividing by page size to get
the appropriate memseg. For legacy mode, old behavior of walking
the memseg list remains.
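
The address-indexed lookup for non-legacy mode can be sketched as below.
struct msl_sketch is a simplified, hypothetical stand-in for a memseg list
(the real structure in this patch is fbarray-backed and has more fields), and
find_list() and seg_index() are illustrative names, not functions from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one memseg list. */
struct msl_sketch {
    uint64_t base_va; /* start of the reserved address range */
    uint64_t page_sz; /* page size of every memseg in this list */
    uint64_t n_segs;  /* number of page-sized slots in the list */
};

/* Return the index of the list whose range contains addr, or -1. */
static int find_list(const struct msl_sketch *lists, size_t n_lists,
                     uint64_t addr)
{
    for (size_t i = 0; i < n_lists; i++) {
        uint64_t start = lists[i].base_va;
        uint64_t end = start + lists[i].page_sz * lists[i].n_segs;

        if (addr >= start && addr < end)
            return (int)i;
    }
    return -1;
}

/* Offset from the list base divided by page size gives the memseg slot. */
static uint64_t seg_index(const struct msl_sketch *msl, uint64_t addr)
{
    assert(addr >= msl->base_va);
    return (addr - msl->base_va) / msl->page_sz;
}
```

Because every slot in a list has the same page size, finding the memseg is a
division rather than a walk over all segments, which is the walk that legacy
mode still performs.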

Due to switch to fbarray and to avoid any intrusive changes,
secondary processes are not supported in this commit. Also, one
particular API call (dump physmem layout) no longer makes sense
and was removed, according to deprecation notice [1].

In legacy mode, nothing is preallocated, and all memsegs are in
a list like before, but each segment still resides in an appropriate
memseg list.

The rest of the changes are really ripple effects from the memseg
change - heap changes, compile fixes, and rewrites to support
fbarray-backed memseg lists.

[1] http://dpdk.org/dev/patchwork/patch/34002/

Signed-off-by: Anatoly Burakov 
---
 config/common_base                                |  15 +-
 drivers/bus/pci/linux/pci.c                       |  29 +-
 drivers/net/virtio/virtio_user/vhost_kernel.c     | 108 +---
 lib/librte_eal/common/eal_common_memory.c         | 322 +++---
 lib/librte_eal/common/eal_common_memzone.c        |  12 +-
 lib/librte_eal/common/eal_hugepages.h             |   2 +
 lib/librte_eal/common/eal_internal_cfg.h          |   2 +-
 lib/librte_eal/common/include/rte_eal_memconfig.h |  22 +-
 lib/librte_eal/common/include/rte_memory.h        |  33 ++-
 lib/librte_eal/common/include/rte_memzone.h       |   1 -
 lib/librte_eal/common/malloc_elem.c               |   8 +-
 lib/librte_eal/common/malloc_elem.h               |   6 +-
 lib/librte_eal/common/malloc_heap.c               |  92 +--
 lib/librte_eal/common/rte_malloc.c                |  22 +-
 lib/librte_eal/linuxapp/eal/eal.c                 |  21 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c          | 297 +---
 lib/librte_eal/linuxapp/eal/eal_vfio.c            | 164 +++
 lib/librte_eal/rte_eal_version.map                |   3 +-
 test/test/test_malloc.c                           |  29 +-
 test/test/test_memory.c                           |  43 ++-
 test/test/test_memzone.c                          |  17 +-
 21 files changed, 917 insertions(+), 331 deletions(-)

diff --git a/config/common_base b/config/common_base
index ad03cf4..e9c1d93 100644
--- a/config/common_base
+++ b/config/common_base
@@ -61,7 +61,20 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
 CONFIG_RTE_LIBRTE_EAL=y
 CONFIG_RTE_MAX_LCORE=128
 CONFIG_RTE_MAX_NUMA_NODES=8
-CONFIG_RTE_MAX_MEMSEG=256
+CONFIG_RTE_MAX_MEMSEG_LISTS=32
+# each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
+# or RTE_MAX_MEM_PER_LIST gigabytes worth of memory, whichever is the smallest
+CONFIG_RTE_MAX_MEMSEG_PER_LIST=8192
+CONFIG_RTE_MAX_MEM_PER_LIST=32
+# a "type" is a combination of page size and NUMA node. total number of memseg
+# lists per type will be limited to either RTE_MAX_MEMSEG_PER_TYPE pages (split
+# over multiple lists of RTE_MAX_MEMSEG_PER_LIST pages), or RTE_MAX_MEM_PER_TYPE
+# gigabytes of memory (split over multiple lists of RTE_MAX_MEM_PER_LIST),
+# whichever is the smallest
+CONFIG_RTE_MAX_MEMSEG_PER_TYPE=32768
+CONFIG_RTE_MAX_MEM_PER_TYPE=128
+# legacy mem mode only
+CONFIG_RTE_MAX_LEGACY_MEMSEG=256
 CONFIG_RTE_MAX_MEMZONE=2560
 CONFIG_RTE_MAX_TAILQ=32
 CONFIG_RTE_ENABLE_ASSERT=n
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index abde641..ec05d7c 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -119,19 +119,30 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 void