Re: [dpdk-dev] [PATCH v2 0/6] mempool: add bucket driver

2018-04-25 Thread Olivier Matz
Hi,

On Wed, Apr 25, 2018 at 01:00:23AM +0200, Thomas Monjalon wrote:
> Can we have this patchset in 18.05-rc1?
> Or is it candidate to rc2?

I realized I made my comments on v1 instead of v2, sorry.

https://dpdk.org/dev/patchwork/patch/36538/
https://dpdk.org/dev/patchwork/patch/36535/
https://dpdk.org/dev/patchwork/patch/36533/

All of them are minor issues; they could be addressed for rc2.
I will send acks for the generic mempool part.

Olivier


> 
> 16/04/2018 15:33, Andrew Rybchenko:
> > The patch series adds a bucket mempool driver which allows allocation
> > of (both physically and virtually) contiguous blocks of objects, and
> > adds mempool API to do it. It is still capable of providing separate
> > objects, but it is definitely more heavy-weight than the ring/stack
> > drivers. The driver will be used by future Solarflare driver
> > enhancements which make it possible to utilize physically contiguous
> > blocks in the NIC firmware.
> > 
> > The target use case is to dequeue objects in blocks and enqueue
> > separate objects back (where they are collected into buckets to be
> > dequeued again). So, a memory pool with the bucket driver is created
> > by an application and provided to a networking PMD receive queue.
> > The choice of the bucket driver is made using
> > rte_eth_dev_pool_ops_supported(). A PMD that relies upon contiguous
> > block allocation should report the bucket driver as the only
> > supported and preferred one.
> [...]
> > Andrew Rybchenko (1):
> >   doc: advertise bucket mempool driver
> > 
> > Artem V. Andreev (5):
> >   mempool/bucket: implement bucket mempool manager
> >   mempool: implement abstract mempool info API
> >   mempool: support block dequeue operation
> >   mempool/bucket: implement block dequeue operation
> >   mempool/bucket: do not allow one lcore to grab all buckets


Re: [dpdk-dev] [PATCH v2 0/6] mempool: add bucket driver

2018-04-24 Thread Thomas Monjalon
Can we have this patchset in 18.05-rc1?
Or is it candidate to rc2?

16/04/2018 15:33, Andrew Rybchenko:
> The patch series adds a bucket mempool driver which allows allocation
> of (both physically and virtually) contiguous blocks of objects, and
> adds mempool API to do it. It is still capable of providing separate
> objects, but it is definitely more heavy-weight than the ring/stack
> drivers. The driver will be used by future Solarflare driver
> enhancements which make it possible to utilize physically contiguous
> blocks in the NIC firmware.
> 
> The target use case is to dequeue objects in blocks and enqueue
> separate objects back (where they are collected into buckets to be
> dequeued again). So, a memory pool with the bucket driver is created
> by an application and provided to a networking PMD receive queue.
> The choice of the bucket driver is made using
> rte_eth_dev_pool_ops_supported(). A PMD that relies upon contiguous
> block allocation should report the bucket driver as the only
> supported and preferred one.
[...]
> Andrew Rybchenko (1):
>   doc: advertise bucket mempool driver
> 
> Artem V. Andreev (5):
>   mempool/bucket: implement bucket mempool manager
>   mempool: implement abstract mempool info API
>   mempool: support block dequeue operation
>   mempool/bucket: implement block dequeue operation
>   mempool/bucket: do not allow one lcore to grab all buckets


[dpdk-dev] [PATCH v2 0/6] mempool: add bucket driver

2018-04-16 Thread Andrew Rybchenko
The initial patch series [1] (RFCv1 is [2]) is split into two to simplify
processing. This is the second part, which relies on the first one [3].

It should be applied on top of [3].

The patch series adds a bucket mempool driver which allows allocation of
(both physically and virtually) contiguous blocks of objects, and adds
mempool API to do it. It is still capable of providing separate objects,
but it is definitely more heavy-weight than the ring/stack drivers.
The driver will be used by future Solarflare driver enhancements which
make it possible to utilize physically contiguous blocks in the NIC
firmware.

The target use case is to dequeue objects in blocks and enqueue separate
objects back (where they are collected into buckets to be dequeued
again). So, a memory pool with the bucket driver is created by an
application and provided to a networking PMD receive queue. The choice
of the bucket driver is made using rte_eth_dev_pool_ops_supported().
A PMD that relies upon contiguous block allocation should report the
bucket driver as the only supported and preferred one.
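
For illustration only, below is a minimal sketch of how an application
might pick the bucket driver for an Rx mbuf pool. The "bucket" ops name,
the helper name and the pool sizing are assumptions of the sketch, not
something defined by this series.

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_mbuf_pool_ops.h>

/* Create the Rx mbuf pool with the bucket ops if the port reports it as
 * preferred; otherwise fall back to the platform's best mempool ops. */
static struct rte_mempool *
create_rx_pool(uint16_t port_id, unsigned int nb_mbufs, int socket_id)
{
	const char *ops = "bucket";	/* assumed ops name of the driver */

	/* 0 = supported and preferred, 1 = merely supported, <0 = error */
	if (rte_eth_dev_pool_ops_supported(port_id, ops) != 0)
		ops = rte_mbuf_best_mempool_ops();

	/* No per-lcore cache here; cached objects cannot form blocks. */
	return rte_pktmbuf_pool_create_by_ops("rx_pool", nb_mbufs, 0, 0,
					      RTE_MBUF_DEFAULT_BUF_SIZE,
					      socket_id, ops);
}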

The benefit of introducing the contiguous block dequeue operation is
demonstrated by performance measurements using the mempool autotest with
minor enhancements:
 - in the original test the bulk sizes are powers of two, which is
   unacceptable for us, so they are changed to multiples of
   contig_block_size;
 - the test code is duplicated to support both plain dequeue and
   dequeue_contig_blocks (see the sketch after this list);
 - all the extra test variations (with/without cache etc.) are eliminated;
 - a fake read from the dequeued buffer is added (in both cases) to
   simulate mbuf access.
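
For reference, a rough sketch of the shape of the two paths being
compared. The block dequeue API is assumed here to be
rte_mempool_get_contig_blocks(), following the patch titles; the function
below and its bulk sizes are illustrative only, not the actual test code.

#include <rte_mempool.h>

/* Contrast plain bulk dequeue with contiguous block dequeue; in both
 * cases a fake read simulates mbuf access. Building this needs
 * ALLOW_EXPERIMENTAL_API since the block dequeue API is experimental. */
static void
dequeue_paths(struct rte_mempool *mp)
{
	void *objs[15];
	void *first_obj[1];
	unsigned int i;

	/* Plain path: 15 separate objects out, 15 objects back. */
	if (rte_mempool_get_bulk(mp, objs, 15) == 0) {
		for (i = 0; i < 15; i++)
			(void)*(volatile char *)objs[i];	/* fake read */
		rte_mempool_put_bulk(mp, objs, 15);
	}

	/* Block path: one call returns a pointer to the first object of a
	 * (physically and virtually) contiguous block; the block's objects
	 * are enqueued back separately, as usual. */
	if (rte_mempool_get_contig_blocks(mp, first_obj, 1) == 0) {
		(void)*(volatile char *)first_obj[0];	/* fake read */
		rte_mempool_put(mp, first_obj[0]);
		/* ...the remaining objects of the block are returned the
		 * same way once the application is done with them. */
	}
}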

start performance test for bucket (without cache)
mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Srate_persec=   111935488
mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Srate_persec=   115290931
mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Srate_persec=   353055539
mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Srate_persec=   353330790
mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Srate_persec=   224657407
mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Srate_persec=   230411468
mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Srate_persec=   706700902
mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Srate_persec=   703673139
mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Srate_persec=   425236887
mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Srate_persec=   437295512
mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Srate_persec=  1343409356
mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Srate_persec=  1336567397
start performance test for bucket (without cache + contiguous dequeue)
mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Crate_persec=   122945536
mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Crate_persec=   126458265
mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Crate_persec=   374262988
mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Crate_persec=   377316966
mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Crate_persec=   244842496
mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Crate_persec=   251618917
mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Crate_persec=   751226060
mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Crate_persec=   756233010
mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Crate_persec=   462068120
mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Crate_persec=   476997221
mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Crate_persec=  1432171313
mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Crate_persec=  1438829771

The number of objects in a contiguous block is a function of the bucket
memory size (a .config option) and the total element size. In the future,
an additional API making it possible to pass such parameters at mempool
allocation time may be added.
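
Since the block size therefore varies with the build configuration and
the element size, an application should query it at run time rather than
hard-code it. A small sketch, assuming the info API introduced by this
series is rte_mempool_ops_get_info() filling a contig_block_size field:

#include <rte_mempool.h>

/* Return the number of objects per contiguous block for this pool,
 * or 0 if its driver does not report one (no block dequeue support). */
static unsigned int
pool_contig_block_size(struct rte_mempool *mp)
{
	struct rte_mempool_info info;

	if (rte_mempool_ops_get_info(mp, &info) != 0)
		return 0;
	return info.contig_block_size;
}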

It breaks the ABI since it changes struct rte_mempool_ops. The ABI
version is already bumped in [4].
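
For context, the ABI break comes from new callbacks added to struct
rte_mempool_ops. A hedged sketch of their expected shape, with names
inferred from the patch titles (treat the exact prototypes and placement
in the structure as assumptions):

/* Optional driver callback returning driver-specific information such
 * as the number of objects in a contiguous block. */
typedef int (*rte_mempool_get_info_t)(const struct rte_mempool *mp,
		struct rte_mempool_info *info);

/* Optional driver callback dequeuing n contiguous blocks; each entry of
 * first_obj_table points to the first object of one block. */
typedef int (*rte_mempool_dequeue_contig_blocks_t)(struct rte_mempool *mp,
		void **first_obj_table, unsigned int n);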


[1] https://dpdk.org/ml/archives/dev/2018-January/088698.html
[2] https://dpdk.org/ml/archives/dev/2017-November/082335.html
[3] https://dpdk.org/ml/archives/dev/2018-April/097354.html
[4] https://dpdk.org/ml/archives/dev/2018-April/097352.html

v1 -> v2:
  - just rebase

RFCv2 -> v1:
  - rebased on top of [3]
  - clean up the deprecation notice now that the change is done
  - mark the new API as experimental
  - move contig block